Redis: Migrate Keys From One Instance To Another Based On Pattern Matching [node.js script]

If you work with Redis and a fairly big dataset, it’s pretty likely that at some point you’re going to fill up all your RAM. This has already happened to me more than once; the easy fix, if the growth is slow enough, is to just add more memory.

When that doesn’t work anymore, the next move – while waiting for Redis Cluster – is quite likely going to be some manual sharding technique. When you do, you’ll probably need to split your dataset among more instances.

This is exactly what just happened to me, and I couldn’t find any ready-made recipe or solution to easily split my rather large dataset. So I ended up writing a small node.js script which doesn’t use too much memory, and which you can find here:
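The core idea of such a migration can be sketched in a few lines. Note this is a modern illustration rather than the original script: it assumes the node `redis` client (v4 API), whose `scanIterator`, `dump`, `pTTL` and `restore` methods let you stream keys matching a glob pattern and copy them to another instance without loading the whole keyspace into memory (SCAN itself didn’t exist yet in Redis 2.2, where a KEYS-based approach would have been needed instead):

```javascript
// Convert a Redis-style glob pattern (e.g. "user:*") into a RegExp,
// useful for filtering keys client-side as well as via SCAN's MATCH.
function globToRegExp(pattern) {
  const escaped = pattern
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '.*')                 // glob * → regex .*
    .replace(/\?/g, '.');                 // glob ? → regex .
  return new RegExp('^' + escaped + '$');
}

// Sketch: stream keys matching `pattern` from `source` and recreate
// them on `target`. Both arguments are assumed to be connected
// node-redis v4 clients; names here are illustrative.
async function migrateKeys(source, target, pattern) {
  // SCAN iterates incrementally, so it doesn't block the server
  // or buffer millions of keys in the script's memory.
  for await (const key of source.scanIterator({ MATCH: pattern, COUNT: 1000 })) {
    // DUMP serializes the value in Redis's internal format;
    // RESTORE recreates it on the target, preserving the TTL.
    const dump = await source.dump(key);
    let ttl = await source.pTTL(key);
    if (ttl < 0) ttl = 0; // -1 means no expiry; RESTORE expects 0 for that
    await target.restore(key, ttl, dump, { REPLACE: true });
  }
}
```

Memory usage stays bounded because at most one serialized value is held at a time; only the SCAN cursor and the current key/dump pair live in the script’s heap.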


Redis or: How I learned to stop worrying and love the unstable version

Numbers on Twitter are huge. Even with just a 1% sample of tweets, and even working only on tweets with a hashtag, you reach big numbers very fast.

If you want to offer a highly interactive and very responsive experience, you’ll need to keep most of those numbers in memory, and well indexed. So you’re going to need a lot of RAM: I was aware of that when I decided to use Redis as my main DB, and I purposefully chose a parsimonious data structure and a “cloud” – whatever that means – solution that would allow me to increase the amount of RAM when needed without breaking my shoestring budget.

What I didn’t know was that, with just over a gigabyte and a half of data as measured by Redis itself, I was going to need double that amount of memory to keep my app from crashing (which happened twice yesterday). Multiply that by two servers for a minimum of redundancy, and you need 6 GB of precious RAM for just 1.3 million tags’ worth of data…

In the past I had found that, after a restart, Redis would use much less memory than before. So, after the obvious quick fix to get the service back up and running – increasing the amount of RAM – I tried to do just that, using the load balancer to keep the service up while restarting the two instances. But the gains from a restart were very short-lived, and I didn’t want to pay for that amount of memory, nor to worry that even that wouldn’t be enough during my upcoming two-week trip to Russia.

Checking the INFO command output after the restarts, I noticed that the value increasing very fast after each restart was mem_fragmentation_ratio, which started around 1.3 but quickly reached and surpassed 2. After a quick search, it turned out that there is a known memory fragmentation problem with Redis on Linux, caused by malloc.
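That ratio is simply used_memory_rss (the resident memory the OS attributes to the process) divided by used_memory (what Redis’s allocator thinks it is using). Here is a minimal node.js sketch that extracts it from raw INFO output; the sample numbers are illustrative, chosen to show the pathological 2.0 case described above:

```javascript
// Compute mem_fragmentation_ratio from a raw Redis INFO dump:
// used_memory_rss / used_memory.
function fragmentationRatio(infoText) {
  const fields = {};
  for (const line of infoText.split('\n')) {
    const [k, v] = line.split(':'); // INFO lines look like "key:value"
    if (k && v !== undefined) fields[k.trim()] = v.trim();
  }
  return Number(fields.used_memory_rss) / Number(fields.used_memory);
}

// Illustrative sample: 1.5 GB per Redis, 3 GB resident in the OS.
const sampleInfo = 'used_memory:1610612736\r\nused_memory_rss:3221225472\r\n';
console.log(fragmentationRatio(sampleInfo)); // → 2
```

A ratio of 2 means the process holds twice the memory Redis believes it is using – exactly the situation that was forcing the servers to be over-provisioned.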

I had already noticed that disquieting field in the past, but hadn’t realized just how bad the situation was. A ratio of more than two to one seemed far too high, especially when a quick search on the internet revealed that a ratio of 1.4 was already considered abnormal.

Delving deeper into the issue, I found out that the fragmentation problem is especially bad with sorted sets, and hashtagify uses a lot of sorted sets: as a matter of fact, of the 4 million keys in use right now, more than 90% are sorted sets!

Different builds of Redis were made to try to solve this issue, but I didn’t find a “definitive” post/article/howto about which one is best. On the other hand, I noticed that version 2.4, which is not final yet, incorporates one of those solutions – the use of jemalloc – so I tried it on one of my servers.

Things immediately improved; it’s too early to be sure, but after many hours the fragmentation ratio is only 1.02. Halving my memory requirements with a simple version upgrade is not bad at all! Best of all, I should be able to leave for Russia without the fear of finding myself bankrupt, or with a frozen server, on my return :)

I just wish it had been easier to learn about this beforehand, but such is the nature of open source software. Now I just hope that this post will help at least someone else avoid my mistake of using the “official” 2.2 release.


UPDATE 2011-06-25

After 6 days, 2 hours and 4 minutes of uptime, during which 33,245,499 commands were executed, the fragmentation ratio is still 1.02. I can now confirm that jemalloc rocks :)

Greetings from Russia