My own dedicated “cloud” with 24 GB of RAM for 89 Euro (127 USD)/Month – part 1

As I already explained in my last post, hashtagify’s servers need to keep most of the data they handle in memory: This is the only way to answer an AJAX request about a hashtag so quickly that the user perceives the interaction as instantaneous and engaging.

The downside to this, of course, is that I need servers with a lot of memory, and those usually don’t come cheap. To keep costs down I had to find a cheap “cloud” Virtual Private Server provider that would allow me to add RAM without also having to pay for more CPU, disk and bandwidth that I didn’t need: With my usual, otherwise excellent, VPS provider, linode.com, the amount of memory I needed would simply have been too expensive for my budget.

After some searching I found what I was looking for in glesys.com. For the first month or so this solution worked very well, as I could increase the RAM as needed without having to restart my server, and at one point I was even able to double it for just one hour to run a very memory-intensive calculation, paying for just that hour of use. But hashtagify is becoming a very memory-hungry application, especially now that I’ve also started collecting user and URL data, and I wanted to try an even cheaper solution.

During my search for a cloud service I had stumbled upon a few tempting dedicated unmanaged server offers with boatloads of RAM at a comparatively cheap price; namely, they were from server4you.com, giga-international.com and hetzner.de – curiously, they’re all from German-speaking countries, but who cares. At first I went with glesys anyway, because I didn’t need all that memory right away and because I knew VPSs better.

But now, having an actual need for more memory, I went back to my bookmarks and studied those offers in depth. I found out that hetzner has very good reviews and, even if I don’t like setup fees, I decided to try their EQ8 offer: For 89 euro/month, plus a 150 euro setup fee, you get a quad-core i7 CPU, 2x 1500 GB disks, “unlimited” bandwidth – that is, unlimited if you stay under 5 TB – and, most importantly for me, a whopping 24 GB of RAM.

With those resources I could host all my different websites needing a VPS, both the Ruby on Rails ones like mortgagecalculator3.com and hashtagify.me, which uses node.js, and still have a lot to spare for future needs. I could see only two drawbacks:

  • The need to manage the virtualization infrastructure myself
  • The risk of relying on a physical server that, of course, could break at any time.

After a quick search about open source virtualization technologies, I came to the conclusion that the first problem was easy enough to solve. There were many solutions to try that promised an easy setup; I was sure that at least one of them would work out.

The second one is inherently more serious. Real cloud services, like Google App Engine, are based on the idea that your software runs on unidentified, fully managed, redundant hardware that you don’t have to worry about at all. A dedicated server is exactly the opposite. But most “cloud” offers you find on the internet are nothing more than VPSs with more flexible management services, prices and possibly an API (hence the quotation marks I use when referring to this flavor of “cloud”).

With these “cloud” offers you still get full management of the physical host (server) by the provider, and in case of server failure they should guarantee you a smooth transition to a new server, but what you get is still just a flexible VPS. The fail-over to a different host is something that wouldn’t be too difficult to organize yourself with a sensible backup policy or, even better, a clustered and redundant DB (if your data, as is very often the case, resides in a DB that offers that kind of capability).
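
Just to give a concrete idea of what a “sensible backup policy” can look like, here is a minimal sketch (the host name, path and schedule are made up for illustration, not my actual setup): a cron job that copies the Redis dump file to a second, independent machine every hour, from which a replacement VPS could be restored.

    # /etc/cron.d/redis-backup (hypothetical): push the latest Redis dump
    # to a second host every hour; rsync only transfers what has changed.
    0 * * * *  root  rsync -az /var/lib/redis/dump.rdb backup-host:/backups/redis/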

Redundancy based on commodity hardware is exactly, from what we know, how Google manages its own cloud services. And the cheap dedicated server offers I found are based on commodity hardware, not on highly redundant, fault-tolerant “server” hardware – I think that’s the main reason for their cheapness, considering that network availability and service, at least from what I could read online and have seen myself so far, are very good, at least for hetzner.

So, if it were possible to easily migrate VPSs from a broken server to a new one with minimal downtime or, even better, to keep multiple VPSs synchronized on different physical servers, you would get the perfect solution for bootstrapped startups (like hashtagify) or anybody who needs some VPSs with a lot of resources for a very cheap price: This software redundancy would make up for the relatively low MTBF of the commodity hardware.

As it turned out, even with free and open source technologies, migrating a VPS from one physical server to a new one is actually very easy. And, at least if you use Redis (like hashtagify.me does), it is also very easy to have many synchronized DB instances, both for scalability (if you need it) and for hot failover.
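
To give an idea of how easy it is, this is roughly all it takes to keep a second Redis instance on another physical server in sync with the first one (a minimal sketch with made-up IP addresses and the default port, not my actual configuration):

    # On the secondary server, make its Redis instance a replica of the
    # primary one (10.0.0.1 and 10.0.0.2 are made-up addresses).
    redis-cli -h 10.0.0.2 slaveof 10.0.0.1 6379

    # Check that the initial synchronization has completed.
    redis-cli -h 10.0.0.2 info | grep -E 'role|master_link_status'

    # If the primary host dies, promote the replica to master: hot failover.
    redis-cli -h 10.0.0.2 slaveof no one

The same setup can be made permanent by putting a slaveof line in the replica’s redis.conf.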

So in the end I ordered my 24 GB dedicated server last Saturday, and on Monday I had my login data. I expected to spend a fair amount of time learning how to set everything up and moving hashtagify to my own “cloud”, but after only maybe 6 or 7 hours stolen here and there from my vacation, everything was up and working fine. There is still a fair amount of work left to also migrate my Ruby on Rails applications to the new server, but already being able to clone a new instance of the hashtagify server with just a few clicks, and for free, is a very satisfying, albeit entirely geeky, sensation!

How I did this is what I’ll write about in part 2 of this already too long article. I guess that, when done in the right sequence, all the steps necessary to set up a virtual infrastructure like hashtagify’s would take only a little more than half an hour, starting from scratch and using only free open source software. It’s not at all difficult, but as usual having to wade through all the different possibilities and a dozen half-outdated how-tos is a pain, so I hope to spare other people that pain – at least until my own post becomes outdated.

Cheers from Russia.

Disclosure: I asked hetzner if they have a referral program, but they don’t. If they had, I would have applied for it prior to publishing this post ;)

Redis or: How I learned to stop worrying and love the unstable version

Numbers on Twitter are huge. Even with just a 1% sampling of tweets, even working only on tweets with a hashtag, you reach big numbers very fast.

If you want to offer a highly interactive and very responsive experience, you’ll need to keep most of those numbers in memory, and well indexed. So you’re going to need a lot of RAM: I was aware of that when I decided to use Redis as my main DB, and I purposefully chose a parsimonious data structure and a “cloud” – whatever that means – solution that would allow me to increase the amount of RAM when needed without breaking my shoestring budget.

What I didn’t know was that, with just over a gigabyte and a half of data, as measured by Redis itself, I was going to need double that amount of memory to keep my app from crashing (which happened twice yesterday). Multiply those roughly 3 GB per instance by two servers for a minimum of redundancy, and you need 6 GB of precious RAM for just 1.3 million tags’ worth of data…

In the past I had noticed that, after a restart, Redis would use much less memory than before. So, after the obvious quick fix to get the service back up and running – increasing the amount of RAM – I tried to do just that, using the load balancer to keep the service up while restarting the two instances. But the gains from a restart were very short-lived, and I didn’t want to pay for that amount of memory, nor to have to worry that even that wouldn’t be enough during my upcoming two-week trip to Russia.

Checking the INFO command output after the restarts, I noticed that what was increasing very fast was the mem_fragmentation_ratio, which started at around 1.3 but quickly reached and surpassed 2. After a quick search, it turned out that there is a known memory fragmentation problem with Redis on Linux, caused by malloc.
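
If you want to check this on your own instance, all the relevant fields are in the INFO output; something like this will do (a trivial example, nothing specific to my setup):

    # used_memory is what Redis thinks it is using, used_memory_rss is what
    # the operating system has actually assigned to the process; their ratio
    # is reported as mem_fragmentation_ratio.
    redis-cli info | grep -E 'used_memory:|used_memory_rss|mem_fragmentation_ratio'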

I had already noticed that disquieting field in the past, but hadn’t realized just how bad the situation was. A ratio of more than two to one looked like far too much to me, especially when a quick search on the internet revealed that a ratio of 1.4 was already considered abnormal.

Delving deeper into the issue, I found out that the fragmentation problem is especially bad with sorted sets, and hashtagify uses a lot of sorted sets: As a matter of fact, of the 4 million keys in use right now, more than 90% are sorted sets!

Different builds of Redis were made to find a solution to this issue, but I didn’t find a “definitive” post/article/howto about which is the best one. On the other hand, I noticed that version 2.4, which is not final yet, incorporates one of those solutions, the use of jemalloc, so I tried it on one of my servers.
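
In case you want to try the same thing, this is more or less all that’s needed to build it (a sketch; I’m omitting the exact download URL, just grab the 2.4 tarball from the Redis site):

    # Unpack the 2.4 sources and build them against the bundled jemalloc
    # allocator instead of the system libc malloc.
    tar xzf redis-2.4*.tar.gz
    cd redis-2.4*
    make MALLOC=jemalloc

    # After restarting redis-server, verify which allocator is in use.
    redis-cli info | grep mem_allocator

If I understand the build scripts correctly, on Linux 2.4 should pick jemalloc by default anyway, so the explicit MALLOC flag is just to be sure.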

Things immediately improved; it’s too early to be sure, but for now, after many hours, the fragmentation ratio is only 1.02. Halving my memory requirements with just a simple version upgrade is not bad at all! Now I should be able to leave for Russia without the fear of finding myself bankrupt, or my server frozen, on my return :)

I just wish it had been easier to learn about this beforehand, but that’s the nature of open source software. Now I just hope that this post will help at least someone else avoid my mistake of using the “official” 2.2 release.

Cheers.

UPDATE 2011-06-25

After 6 days, 2 hours and 4 minutes of uptime, during which 33,245,499 commands were executed, the fragmentation ratio is still 1.02. I can now confirm that jemalloc rocks :)

Greetings from Russia