I needed to track hourly, daily, and monthly page views and unique users for approximately 25 pages of a particular micro web application. I also needed to know which 10-digit user ID viewed which page.
I had read a few blog posts in which startups used Redis for fast analytics processing, so after some discussion with my colleagues I decided to give it a try.
Now I will explain my solution and its consequences:
For the hourly result, I simply set a bit for a specific page ID, hour, and user ID, as follows:
SETBIT index:2014-01-01-23 1000000000 1
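To make the SETBIT semantics concrete, here is a pure-Python stand-in (not the author's code, and not redis-py): it models how Redis grows the key's string to cover the requested offset, which is exactly the behaviour behind the memory surprise discussed later.

```python
def setbit(buf: bytearray, offset: int, value: int) -> None:
    """Mimic Redis SETBIT: grow the buffer to cover `offset`, then set the bit.
    Redis allocates (offset // 8) + 1 bytes up front, zero-filled."""
    byte_index, bit_index = divmod(offset, 8)
    if byte_index >= len(buf):
        buf.extend(b"\x00" * (byte_index + 1 - len(buf)))
    if value:
        buf[byte_index] |= 0x80 >> bit_index      # Redis numbers bits MSB-first
    else:
        buf[byte_index] &= ~(0x80 >> bit_index) & 0xFF

hour_key = bytearray()                 # stands in for index:2014-01-01-23
setbit(hour_key, 1_000_000_000, 1)     # a 10-digit user ID used as bit offset
print(len(hour_key))                   # 125000001 bytes (~125 MB) for one user
```

A single set bit at a 10-digit offset forces roughly 125 MB of zero-filled allocation, which is the core of the problem described below.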
For the daily result, I used BITOP OR to union the hourly keys into a daily key.
BITOP OR index:2014-01-01 index:2014-01-01-00 index:2014-01-01-01 ... index:2014-01-01-23
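The union step can also be sketched in plain Python (again a stand-in for the real BITOP OR, not the actual implementation): the destination is as long as the longest source, and bytes are OR-ed positionally.

```python
def bitop_or(*bufs: bytes) -> bytearray:
    """Mimic BITOP OR: result is as long as the longest input, OR-ed bytewise."""
    out = bytearray(max((len(b) for b in bufs), default=0))
    for b in bufs:
        for i, byte in enumerate(b):
            out[i] |= byte
    return out

hour_00 = bytes([0b10000000])          # user at bit offset 0, seen at 00:xx
hour_23 = bytes([0b00000001])          # user at bit offset 7, seen at 23:xx
daily = bitop_or(hour_00, hour_23)     # both users now set in the daily key
```

A user who appears in several hourly keys ends up as a single set bit in the daily key, which is what makes the bitcount a unique-user count rather than a hit count.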
For the monthly result, I used BITOP OR to union the daily keys into a monthly key.
BITOP OR index:2014-01 index:2014-01-01 index:2014-01-02 index:2014-01-03 ... index:2014-01-31
Since I now have hourly, daily, and monthly keys, unique users can easily be counted using the Redis BITCOUNT operation:
BITCOUNT index:2014-01-01-23
The command above returns the number of unique users between 11:00:00pm and 11:59:59pm. The same can be done for the daily and monthly keys.
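For completeness, the counting step in the same pure-Python model (a stand-in for BITCOUNT, which simply counts set bits):

```python
def bitcount(buf: bytes) -> int:
    """Mimic Redis BITCOUNT: number of set bits = number of unique users."""
    return sum(bin(b).count("1") for b in buf)

hour_key = bytes([0b10100000, 0xFF])   # toy hourly bitmap with 10 users
print(bitcount(hour_key))              # prints 10
```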
So far so good. But what have I done wrong?
From the Redis docs:
Warning: When setting the last possible bit (offset equal to 2^32 - 1) and the string value stored at key does not yet hold a string value, or holds a small string value, Redis needs to allocate all intermediate memory which can block the server for some time.
The statement above has two implications:
1. Redis will take more time (and block the server) when larger bit offsets are set.
2. Redis will take more memory when larger bit offsets are set.
For implication 1, if we must not block the master, a good approach is to perform the bit-wise operations on a slave (with the read-only option disabled), so the master instance is never blocked.
What hurt me was the memory (implication 2). My user IDs were all 10-digit numbers, so even if only a few hundred users visited a page, those keys took at least 128MB each
("SETBIT mykey 1000000000 1" alone allocates about 128MB).
The largest offset I could set was 4,294,967,295 (2^32 - 1), which takes approximately 512MB.
On average, each of the 25 per-hour keys took 256MB of memory.
25 × 256MB = 6400MB per hour, so 24 × 6400MB = 153,600MB ≈ 150GB per day.
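The arithmetic checks out, and is worth redoing before committing to a design like this. A quick sanity check in Python (the 256MB per-key figure is the rough average from above):

```python
MB = 1024 * 1024

# Worst case from the Redis docs: setting bit 2^32 - 1 on an empty key.
max_offset = 2**32 - 1
bytes_needed = (max_offset // 8) + 1
print(bytes_needed // MB)          # 512 MB, matching the docs

# Daily memory for this scheme: 25 page keys per hour, 24 hours.
per_key_mb = 256                   # rough observed average per hourly key
daily_mb = per_key_mb * 25 * 24
print(daily_mb, daily_mb // 1024)  # 153600 MB, i.e. 150 GB per day
```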
That was not acceptable for a few lakh users (10 lakh = 1 million).
Lesson learned: read the documentation carefully, look for gotchas, and get your numbers right before implementation.
PS: I ended up using a simple Redis SADD (set add) for every key.
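The SADD approach can be sketched the same way (a pure-Python stand-in, with a made-up key name following the pattern above): each key holds a set of user IDs, so memory scales with the number of actual visitors rather than with the magnitude of the largest ID.

```python
# One set per page per hour; members are user IDs, duplicates collapse.
hourly_uniques: dict[str, set[str]] = {}   # stands in for Redis sets

def sadd(key: str, member: str) -> None:
    """Mimic Redis SADD on an in-memory dict of sets."""
    hourly_uniques.setdefault(key, set()).add(member)

sadd("index:2014-01-01-23", "1000000000")
sadd("index:2014-01-01-23", "1000000000")  # repeat view, no new member
sadd("index:2014-01-01-23", "1000000007")
print(len(hourly_uniques["index:2014-01-01-23"]))  # prints 2 unique users
```

With real Redis, SCARD replaces BITCOUNT for the unique count, and SUNIONSTORE replaces BITOP OR for rolling hourly sets into daily and monthly ones.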