Got half an hour to kill? Make it useful to the community!

Recently I was looking for an open source engine to create a stackoverflow-like website in Italian about mortgages. The best alternative I found is Shapado, which is also Rails-based, always a plus for me.

Anyway, on their website I found out that they were going to release a new version soon, and that it was still missing an Italian translation. After a little investigation, I also found out that anybody can contribute to that translation on translatewiki.net.

I had never heard of that website, which runs on MediaWiki. I decided to give it a try and, after subscribing and waiting a little for approval, I started translating the latest version of Shapado into Italian.

I would never have believed it, but it was kind of fun! The web translation interface is really well thought out, the Google and Bing suggestions are quite helpful (and sometimes funny), and it only took me a couple of hours to translate all the parts that weren’t already translated from the previous release.

The best part is that this is something you can do whenever you have some free time and a web connection; you are immediately productive, and seeing the list of missing translations get shorter by the minute really gives you a good feeling of accomplishment.

So, if you know a language other than English and have a little time to spare, what are you waiting for? There are already 20 open source projects on translatewiki waiting for your help. Give them a hand!

Saved by the magical powers of GIT

Yesterday evening I was very happy: After a little more than a week, my first mobile app was practically finished. It just needed a few finishing touches.

At that point I decided to take a look at it on my phone, after a few days during which I had only tested it on the Android emulator, which allows much faster debugging cycles. So I connected my phone, started deploying the app, waited, started the app… and… crash! It didn’t even open the first screen.

At 12:10 AM, and with a long day at my day job waiting for me the next day, I tried to redeploy the app; then to restart the development environment and redeploy the app; and, as a last attempt, to restart the phone and my PC and then redeploy the app. To no avail: It worked perfectly well on the emulator, but on my phone it just wouldn’t start.

So today, after coming home from work, I prepared myself for a LONG debugging session. My plan was to find the latest version on GIT that worked on the phone, and understand what went wrong after that – hoping that there was only one thing that went wrong.

First, I started going just a couple of commits back – I commit very often – to check that the problem was there. It was. Then I tried to remember when I last tried the app on my phone – it had always worked – and checked a version from around that time. I was a little apprehensive by now… but it worked!

I then started a methodical search for the commit where the problem had appeared, so I tested the commit halfway between the working version and the non-working one. It worked. Then I took the midpoint again, and tested. OK. And again: OK. And again: OK! This was just two commits before the non-working one.

At this point, I moved up just one commit. It worked. One more, the last one before my first test (which had failed). It worked. Was it possible that, just by chance, I had first tested exactly the commit where the problem arose? Well, I tested it again to be sure and… surprise, surprise… it worked!! I’m pretty sure that, however much Dijkstra might have liked them, I didn’t use any non-deterministic programming techniques, so I was a little flabbergasted.

Next, I checked out master, deployed it, and it worked!

How can I explain it? I’m sorry, but I just can’t. And, as tomorrow I’m leaving for a short vacation, I don’t even have the time to find out now. I suppose (and hope) that it was just some weird problem in the Titanium compiler, and not in Android itself. If it isn’t either of the two, well, I guess I’ll have to suggest adding “Magic” to the list of GIT features.
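By the way, the manual binary search I did by hand is exactly what git bisect automates. A minimal sketch of how the same hunt could have been run, assuming one known-good commit (the hash below is just a placeholder):

    git bisect start
    git bisect bad                  # the current, broken commit
    git bisect good a1b2c3d         # a commit known to work on the phone
    # git checks out the midpoint: build, deploy to the phone, test, then report the result
    git bisect good                 # or: git bisect bad
    # repeat until git names the first bad commit, then clean up
    git bisect reset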

App Development: Appcelerator vs (Phonegap + (JQuery Mobile vs Sencha))

Mobile apps are all the rage now, but to create cutting edge solutions on mobile platforms, at least for now, you need to adopt native development. So, since I have an aversion to Java, have never developed on (or even used) an Apple platform, and lacked a real motivation to start, until now I had kept away from this field.

Last week, though, after reading once again a couple of inspiring articles on Hacker News, I started thinking that trying out one of the various cross-platform, JavaScript-based solutions for mobile development on a lightweight app could be a pleasant distraction from my current main (night) projects. I even started fantasizing that, if it was really possible to create an app using just “html5” technologies, I would be able to create a complete one (without too much polish) in just one weekend.

A fairly simple yet useful candidate for a lightweight app was a mobile front end for my adjustable rate mortgage calculator. Since all the complicated financial stuff is done on the server, and I had already created a simplified web service for the free embeddable version (widget), if I could really use the usual web technologies on phones, my “aggressive” (ie, crazy) deadline seemed feasible.

So this is why, on a Friday evening after my day job, I started studying in depth all the solutions I had only lazily kept an eye on during the last few months. I think what I learned may be interesting to others, so I’ll share my findings here.

The contenders

My requirements for the ideal development platform were simple: To be as close as possible to usual web development, good enough for the job at hand, able to work on both Android and the iPhone, and possibly free, or at least very cheap (as this was just a sideshow).

After some hours doing a literature review (ie, googling around), I came up with a short list of three candidates:

  1. Phonegap + JQuery Mobile (JQM)
  2. Phonegap + Sencha Touch
  3. [Appcelerator Titanium]

The order of the above list isn’t random. The first combination (Phonegap + JQM) was, on paper, my perfect solution: Based on JQuery, it really is web development for the phone, and it would have allowed me to even reuse part of my current code base.

Sencha Touch is another “html5” solution (meaning that it creates HTML GUI objects, not native ones), and from what I read it is quite a bit faster than JQM; on the downside, GUIs have to be created procedurally – something I started hating back in the MFC days, and never got myself to like.

Last, Appcelerator Titanium is JavaScript-based, but it uses native controls. This should guarantee the best performance of the three but, in addition to having only procedural GUI programming like Sencha, it also means that things sometimes have to be done differently on Android and on iOS, so this isn’t cross-platform of the “write once, run anywhere” kind. And, last but not least, I had found an alarming article about some nasty memory leaks that seemed almost a showstopper (that’s why I used the square brackets in the list – at one point Appcelerator was more out of the list than in).

There were also other interesting options (like Rhomobile – I love Ruby), but they looked less viable for one reason or another, so I won’t go over them here.

Testing

What I’ve written so far is based only on my “literature review”. With my preferences made up, I decided to try the Phonegap/JQM solution hands-on.

Setting everything up was easy enough. The problems came when I tried the “kitchen sink” test app on my Android phone (HTC Wildfire). The performance was terrible, and the rendering often incorrect. The same happened with a showcase app downloaded from the Android Market.

So, that was it for me. As much as I wanted this solution to work, my conclusion is that it is just not production ready yet, at least not for my goals and not for the current generation of non-high-end phones. Goodbye, reuse of existing code…

I then went on to Sencha. This time, I wanted to check the performance before doing any installation. So I looked for any success story for Sencha on Android, and luckily I found a very recent and very relevant article on techcrunch. When I read that “… it wasn’t easy, performance was a huge issue [and] at times it took days to figure out, and simple things like too much logging and poorly constructed ‘for’ loops actually made our app unusable during our journey.” I didn’t feel very optimistic.

So I installed the Diary app from the Android Market on my phone. Better than JQM, but still sluggish, and with layout problems on my screen size too. And this after a long and painful optimization effort, according to the article… not for me, not now. So Sencha, too, wouldn’t fit my requirements.

At this point I was pretty disappointed, as I saw my fantasy of completing an app over the weekend slipping away from my grasp. But, not wanting to give up, I decided to take a closer look at Appcelerator: At stake was not only my simple app, but the possibility of developing more interesting ones in the near future!

First of all, I decided to take a look at the existing Titanium-based apps. Among their showcases there are many more iOS apps than Android ones, but I was able to install a couple of good examples on my phone; GetGlue, in particular, looked and felt really good.

But what about the memory leaks? After some more searching, I found out that you can avoid them by following some guidelines (see the comments in the alarming article) and, if still necessary, by using some workarounds that would be perfectly acceptable for my current needs. I’m also pretty confident that in time Appcelerator will fix these problems (if they want to stay in business).

Conclusion

In the end, I downloaded Titanium Studio and started coding. So far things have gone pretty smoothly and, even if I didn’t create my full app in one weekend, I have already solved 70% of the technical obstacles – the app can already get the needed input from the user, send it to the server, get the response, and show some data in simple, pure-HTML bar graphs. What is still missing is a lightweight enough solution to show the payments time series, but I’ll start on that as soon as I have time.

What I learned from this very industrious weekend is that reports about the impending death of native app coding are greatly exaggerated, and that (surprise, surprise!) a LOT of what you read all over the internet about the marvels of HTML5 development is BS. But I also learned that, even if there are no silver bullets flying around, developing for mobile in a pretty much cross-platform way with just one programming language is indeed possible, and that procedural GUI programming, on the simple UIs you can create on phones, isn’t as painful as I remembered from developing fully fledged desktop applications.

US patents and European startups: More threat or more opportunity?

Martin Bryant on The Next Web gives his take on the mess with US patents and the European startup and tech sector in general, which I wrote about in a recent post.

From his article it almost looks as if the patent threat were worse for European businesses than for US ones, because it is easier (or less risky) for US businesses to come to Europe than the other way around. But US startups have to face that threat in their home market, and the situation there isn’t any better for them than it is for their foreign competitors. So they might never get the chance to come to Europe, if they don’t even get the opportunity to start.

I still think that the threat is bigger for US startups than for EU ones. Maybe that’s not true for big businesses, but I’m not convinced of even that. We’ll see :)

European startups and US software patents: Threat or Opportunity?

For a long time the intellectual property of computer software could only be protected by means of copyright law. This meant that a developer who had an idea and wanted to make it a reality could do so without any fear: As long as he didn’t actually copy someone else’s code, there was nothing to worry about.

This changed when patents started to be granted for software. With a software patent you are barred from reinventing something that someone else has already invented and patented, even if you do it from scratch and without knowing anything about that patent. This means that you may be forced to pay to use your own idea, if someone else had it before you and already patented it – that is, if the patent holder even wants to license it to you.

To make a long story short, since 1996, when the patentability of software was officially regulated by the US Patent Office, the trickle of software patents issued in the US has become a deluge – over 16,000 in the last year alone, for an accumulated total well over 100,000. And most of the “inventions” those patents cover are far from revolutionary, and far from requiring years of research; on the contrary, most of them are very simple, if not absolutely obvious – something most average programmers could reproduce by chance while solving a problem on their own.

So, if you have an idea and implement it in software today, it is quite possible that you will infringe at least one of those patents without knowing it. This has been true for some time, but now more and more companies are trying to profit from their patents by threatening to sue other companies, and even individual programmers, who might be infringing their portfolio – even if there is no competition whatsoever with the infringer.

This is bad news for developers, bad news for entrepreneurs, and bad news for consumers. The only good news is that the situation isn’t as bad everywhere – for example, here in Europe, at least for now, software patents are much more difficult to obtain. They are usually only granted when related to some industrial process, and the patentability of software “as such” is explicitly prohibited.

This means that European startups don’t have to worry so much about software patents in Europe itself, but if they want to get to the US market they fall under the same threat as their American counterparts; and to become a global player in software, success in Europe is not enough: You need to be validated in the US.

As a matter of fact, Europe has been the birthplace of many successful software startups, some of which have become global hits: Rovio and Spotify (which incidentally got sued the moment it landed in the US), to cite a couple of the most recent ones, and Skype and MySQL among the most famous. But that’s nowhere near the number and the level seen in the US.

There are many theories about this difference, and as far as I know none of them includes a lack of talent as an explanation. The problem lies elsewhere. In my opinion, one of the biggest disadvantages European software startups face compared to US – and especially Silicon Valley – ventures is one of visibility and credibility.

Success begets success. The attention of the tech world is focused on the US, and even more on Silicon Valley. The simple fact that a startup is based there brings higher visibility and bigger credibility, easier publicity, more funding. It’s not for nothing that so many foreign startups and entrepreneurs move to the Valley to seek their success.

As the pernicious effects of software patents weigh in more and more, though, this could change. I’ve got nothing against Silicon Valley or the US – I even have relatives living there – but this self-inflicted problem could create a big opportunity for European developers and entrepreneurs.

Europe is already the richest market for software in general, but its linguistic fragmentation, more conservative consumers and a tradition of US innovation make it less prominent than the US one. If the European Parliament resists the pressure to relax the European rules on software patents, though, and if the USA doesn’t change the track it has put itself on, Europe could become the reference market for innovative software, especially for new web and mobile apps that would be riskier and riskier to bring to market in the USA because of the threat of patent litigation.

An American market practically closed to innovators who don’t have very deep pockets and/or thousands of protective software patents to answer the threat of lawsuits would mean a poorer global market for everybody. But in that poorer global market, European startups would be comparatively much better positioned than they are today.

And if lots of cool new things start to happen in Europe, startups based here could gain the upper hand when it comes to visibility and credibility. There would still be the problems of fragmentation and consumer attitude, but once a tipping point is reached, those could be outweighed by other effects. Even some American entrepreneurs, scared by patent litigation at home, might decide to move here, inverting the traditional flow of European founders moving to the US, and this influx of talent would create an even better environment and contribute to the change.

If not Europe, who could carry the baton of software startup innovation if the US were forced to pass it on? China or India could be two candidates, but for now Europe looks much better positioned to me, if only because the best software talent in those two giant countries is already soaked up by other booming industries.

In the end, I hope that the US will be able to sort out its growing mess with software patents soon, not just out of disinterested generosity but also because that would be better for the global economy. But if they stay on their current course – and it looks VERY unlikely to me that that course will change substantially, considering the staggering amount of money that has already been invested in software patents – then we in Europe should try to be as ready as possible to receive the baton.

This should include lobbying for even stricter rules on software patents, creating stronger ties between innovators, and other such systemic measures. But, in my opinion, it could and should start first of all by creating awareness of the chance that European developers and entrepreneurs have in front of them.

The baton may be up for grabs quite soon – within a couple of years. What are we waiting for, then? We should already be warming up.

Are hashtags too geeky for Google+? A considered answer to Loic Le Meur

I was able to join Google+ last week, and found it very interesting for three reasons:

  1. Circles
  2. There are mostly early adopters, and this generates some very interesting conversations
  3. It isn’t blocked at my office yet, so I don’t have to resort to dirty tricks to use it like I have to do with facebook – yet

Is it all good then? Of course not, there are many useful features that could be added. But the one I really miss is tags. I fell in love with tags when I first started using del.icio.us (before the yahoo-less-licious days) and was delighted to see them spread to more and more uses. It’s no coincidence that I created a website completely devoted to (hash)tags!

Today I found out I’m not alone when I stumbled upon a post on G+ about the use of hashtags on G+ itself. After all, the use of hashtags on Twitter started exactly because there was no tag feature there, just like on G+ now. And, as I discovered later, the same Chris Messina who first proposed their adoption on Twitter did the same for G+ just four days ago.

Not everybody agrees, though. Loic Le Meur commented on that same post where I first read about the debate that “hashtags are geeky and they shouldn’t be added to G+”.

I answered that hashtags are geeky, but simple tags aren’t. On second thought, though, the real answer should have been: Hashtags can’t be “added” to G+; they weren’t even “added” to Twitter, only half-heartedly supported after their use became widespread.

So the real question should be: Should tags be added to G+? I definitely think they should; is there any better way to allow the discovery of interesting conversations? And: Are tags too geeky for G+? Considering that they’re also used on Facebook, I guess they really aren’t.

So please, Google, add our beloved tags to G+; and, while you’re at it, also add a sampling streaming API for public messages so that all sorts of interesting research could be done and, why not, so that I could also add the data from G+ to hashtagify.me :)

OpenVZ how-to: Your own free VPS infrastructure in 10 minutes

As I wrote in part 1 of this series of posts, creating your own dedicated Linux “cloud” for free on a root server is easier than you might think. It certainly turned out to be easier for me than I expected before actually doing it. In the first part of this post I’ll explain my choices, but if you want you can skip straight to the how-to part.

My first step was to choose a virtualization technology. There are many free open source solutions that allow you to create and manage a virtualization infrastructure on a Linux host; these solutions can broadly be divided into two different approaches:

  • OS Level (aka kernel level) virtualization
  • Full virtualization

To simplify things, we can say that with full virtualization technologies you can have virtual machines that run any operating system that would work on the virtualized hardware, while with kernel-level virtualization you can only have containers (which aren’t full virtual machines) that are compatible with the host kernel.

In my case I was only interested in Linux virtual servers on a Linux host, so both approaches would have worked. In terms of performance and resource usage, kernel-level virtualization imposes less overhead, but it doesn’t guarantee the same level of isolation between virtual servers. On my own private cloud isolation isn’t much of a concern, so I decided that kernel-level virtualization would be my first choice, if it turned out to be easy enough to set up and manage.

OpenVZ is the most widely used solution for free kernel-level virtualization on Linux – as a matter of fact, most low-cost VPS offers on the internet are based on it. So I did my research on how to install and manage it. As usual with free open source technologies, there is a plethora of options; in the end I picked two as the most promising:

  • Using a full stack solution, either Proxmox or OpenNode
  • Installing OpenVZ and a web control panel myself

I would have liked to use one of the full stack solutions, both because of the ease of installation they promise, and because with them it would have been easier to also use full virtualization later, if needed. But both Proxmox and OpenNode offer an ISO installer as their main option, and I couldn’t use that on my hetzner EQ root server; moreover, even if I could, I would then have had to configure the network myself. This is why I decided to install OpenVZ myself on one of the pre-configured Linux OS images available for my dedicated server.

The Linux distribution I know best is Ubuntu Server, as that is what I have used on all my own virtual servers until now. The second is CentOS, which is the default at my day job – but there I don’t do system administration. So I first tried to install OpenVZ on Ubuntu Server, and then on CentOS – installing a new OS image on my dedicated server only takes a few minutes, so making different attempts was very easy.

From my searches on the internet and my own attempts I found that installing OpenVZ on either of those two OSs isn’t as easy as it could be – or, at least, as easy as I wished, as I ran into some problems with the repository versions of OpenVZ (these, of course, may have been fixed by the time you read this article). From what I read, Debian is the best-supported Linux flavor for OpenVZ, so that’s what I tried next.

The recipe

Installing, configuring and using OpenVZ on Debian was really very easy. The following are the detailed steps I used. After writing a draft of this post I tried them and it took less than 10 minutes from start to finish.

  1. Start with a clean, minimal Debian 6.0 installation – I used the pre-built image I found for my hetzner EQ server
  2. Login as root
  3. Install OpenVZ with its management tools and its special kernel, 64-bit version (this is good for both AMD and Intel 64 bit CPUs):
    apt-get install linux-image-openvz-amd64 vzctl vzquota vzdump
  4. Create a symlink to easily find the OpenVZ files:
    ln -s /var/lib/vz /vz
  5. Edit sysctl config
    nano /etc/sysctl.conf

    to make it like this:

    ### Hetzner Online AG installimage
    # sysctl config
    net.ipv4.ip_forward=1
    net.ipv4.conf.all.rp_filter=1
    net.ipv4.icmp_echo_ignore_broadcasts=1 

    net.ipv4.conf.default.forwarding=1
    net.ipv4.conf.default.proxy_arp = 0
    kernel.sysrq = 1
    net.ipv4.conf.default.send_redirects = 1
    net.ipv4.conf.all.send_redirects = 0
    net.ipv4.conf.eth0.proxy_arp=1

  6. Use the new options:
    sysctl -p
  7. Restart the server:
    reboot
  8. Download a pre-configured container image for the VPS – I was interested in Ubuntu Server:
    cd /vz/template/cache
    wget http://download.openvz.org/template/precreated/ubuntu-10.04-x86_64.tar.gz

    You can find more images here: http://wiki.openvz.org/Download/template/precreated
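After the reboot, before creating any containers, it’s worth a quick sanity check that the server actually came up with the OpenVZ kernel (the exact kernel version string will differ):

    uname -r      # should show an -openvz kernel, e.g. 2.6.32-5-openvz-amd64
    vzlist -a     # lists containers; right after the install it will simply report that none exist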

At this point you are ready to use your VPS. But I wanted to make management easier, so I decided to immediately install a web panel to control OpenVZ. My choice was OpenVZ web panel, which you can install with these commands:

cd /root
wget -O - http://ovz-web-panel.googlecode.com/svn/installer/ai.sh | sh

There you go! Your web panel is already started and you can find it at port 3000 of your server.

To create your first VPS, at this point you only need an IP address from your provider; then just use the web panel:

  1. Open http://<yourip>:3000/ in your browser
  2. Login using admin/admin – don’t forget to change the default password!
  3. Click on “Physical Servers->localhost”
  4. Click “Create virtual server”
  5. Input a server ID – use a number over 100, for example 101 – IDs from 1 to 100 are reserved and you’ll have problems if you use one of them
  6. Choose your OS template – you’ll find only ubuntu 10 if you followed my instructions
  7. Choose your server template or leave the default
  8. Type the IP you got from your provider (I’m talking about an IPv4 address here, but in a later post I’ll write about using IPv6)
  9. Choose a host name, like “myhostname”
  10. Click on “Additional settings” and choose how much RAM, disk and CPU you want for the new machine
  11. Choose a password and click “create”

That’s it! Your VPS is ready to use. Just connect to it via ssh and use it as you wish, just as if you had bought it from any VPS provider. You can also clone your new server with minimal effort from the web panel: just select it, click Tools->Clone, type the new IP for the clone and go on. If you don’t have one more public IP from your provider, and you don’t need one, you can use a private IP like 10.10.10.2
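If you prefer the command line to the web panel (or want to script container creation), the same result can be obtained with vzctl directly. A minimal sketch, assuming the Ubuntu template downloaded above; the ID, IP, hostname and password are just examples:

    # create a container from the pre-downloaded template
    # ('basic' is a sample config shipped with vzctl; adjust if yours is named differently)
    vzctl create 101 --ostemplate ubuntu-10.04-x86_64 --config basic
    # give it an IP, a hostname, a nameserver and a root password
    vzctl set 101 --ipadd 10.10.10.2 --hostname myhostname --nameserver 8.8.8.8 --save
    vzctl set 101 --userpasswd root:changeme
    # start it and open a shell inside it
    vzctl start 101
    vzctl enter 101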

If you’re going to do anything serious with your virtual cloud of VPSs, though, there are at least two more things you’ll want to set up: Backups and a firewall.

A backup strategy is really important here, because you’re working on your own physical server now, and if it fails and any data is lost, you’re on your own. Moreover, if you are using a low-cost dedicated server like mine, you should always keep in mind that it isn’t based on highly redundant, fault-tolerant server hardware. There is still software RAID 1 for the disks, but that’s about it for redundancy.

The good news is that backing up a whole VPS is very easy with OpenVZ. There are three different ways to do it:

  • Simple dump – long downtime: The VPS will be stopped while the backup is going on
  • Suspend mode – very short downtime: The system will suspend the VPS and resume it a few seconds later
  • LVM2 snapshot – no downtime: A snapshot is created without suspending the VPS at all

The third method, of course, is the best one. But it requires you to configure your file system in a special way, and for most situations the few seconds of downtime of suspend mode are acceptable – they are for me. So that’s the way I went.

To back up a running VPS (container) from the command line (or a batch script) with the second method you just type:

vzdump --suspend NNN

where NNN is the ID of the server you want to back up. This will create a file named vzdump-openvz-NNN-YYYY_MM_DD-hh_mm_ss.tar in the /vz/dump folder.

By restoring this backup (with vzrestore) you get your full VPS back, in the state it was in when it was backed up. You just need to move the backup file to a secure location (preferably after compressing it – with bzip2 you can get about a 75% reduction in file size); in my case I got a 100 GB FTP backup space from hetzner where I move my backups.
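To give an idea of how this can be automated, here is a minimal sketch of a nightly backup script for one container – the container ID, FTP host and credentials are placeholders, and the upload uses plain curl:

    #!/bin/sh
    # suspend-mode dump of container 101, then compress it and ship it off the server
    vzdump --suspend 101
    for f in /vz/dump/vzdump-openvz-101-*.tar; do
        bzip2 "$f"
        # upload to the external FTP backup space, then remove the local copy
        curl -T "$f.bz2" ftp://USER:PASSWORD@backup.example.com/ && rm "$f.bz2"
    done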

To protect your server(s) you should also set up a firewall, at least for the host: By gaining control of it, a hacker could gain control of all your VPSs. This isn’t OpenVZ-specific though, so I won’t cover it in this post.

Conclusion

Setting up my own private “virtual cloud” has been much easier than I thought – and much cheaper. The most time-consuming part was finding the information across various wikis, blog posts and other sources. With this post I hope to give something back to the community by packaging the updated information in a hopefully easy-to-use format, even if I’m aware that all of this will become outdated at the speed of the internet.

Cheers

My own dedicated “cloud” with 24 GB of RAM for 89 Euro (127 USD)/Month – part 1

As I already explained in my last post, hashtagify’s servers need to keep most of the data they handle in memory: This is the only way to answer an AJAX request about a hashtag so quickly that the user perceives the interaction as instantaneous and engaging.

The downside, of course, is that I need servers with a lot of memory, and those usually don’t come cheap. To keep costs down I had to find a cheap “cloud” Virtual Private Server provider that would allow me to add RAM without also having to pay for more CPU, disk and bandwidth that I didn’t need: With my usual, otherwise excellent VPS provider, linode.com, the amount of memory I needed would have been just too expensive for my budget.

After some searching I found what I was looking for in glesys.com. For the first month or so this solution worked very well, as I could increase the RAM as needed without having to restart my server; at one point I was even able to double it for just one hour to run a very memory-intensive calculation, paying only for that hour of use. But hashtagify is becoming a very memory-hungry application, especially now that I’ve also started collecting user and URL data, and I wanted to try an even cheaper solution.

During my search for a cloud service I had stumbled upon a few tempting dedicated unmanaged server offers with boatloads of RAM at a comparatively cheap price; namely, they were from server4you.com, giga-international.com and hetzner.de – curiously, they’re all from German-speaking countries, but who cares. At first I went with glesys anyway, because I didn’t need all that memory right away, and because I knew VPSs better.

But now, having an actual need for more memory, I went back to my bookmarks and studied those offers in depth. I found out that hetzner has very good reviews and, even though I don’t like setup fees, I decided to try their EQ8 offer: For 89 euro/month, plus a 150 euro setup fee, you get a quad-core i7 CPU, 2x 1500 GB disks, “unlimited” bandwidth – that is, unlimited as long as you stay under 5 TB – and, most importantly for me, a whopping 24 GB of RAM.

With those resources I could host all my different websites that need a VPS – both the Ruby on Rails ones, like mortgagecalculator3.com, and hashtagify.me, which uses node.js – and still have a lot to spare for future needs. I could see only two drawbacks:

  • The need to manage the virtualization infrastructure myself
  • The risk connected with relying on a physical server that, of course, could break at any time.

After a quick search on open source virtualization technologies, I came to the conclusion that the first problem was easy enough to solve. There were many solutions to try that promised an easy setup; I was sure that at least one of them would work out.

The second one is inherently more serious. Real cloud services, like google app engine, are based on the idea that your software runs on unidentified, fully managed redundant hardware that you don’t have to worry about at all. A dedicated server is exactly the opposite. But most “cloud” offers you find on the internet are nothing more than VPSs with more flexible management services, prices and possibly an API (hence the quotation marks I use when referring to this flavor of “cloud”).

With these “cloud” offers you still get full management of the physical host (server) by the provider, and in case of server failure they should guarantee you a smooth transition to a new server, but what you get is still just a flexible VPS. The failover to a different host is something that wouldn’t be too difficult to organize yourself with a sensible backup policy or, even better, a clustered and redundant DB (if your data, as is very often the case, resides in a DB that offers that kind of possibility).

Redundancy based on commodity hardware is, from what we know, exactly how Google manages its own cloud services. And the cheap dedicated server offers I found are based on commodity hardware, not on highly redundant, fault-tolerant “server” hardware – I think that’s the main reason they’re so cheap, considering that the network availability and service, at least from what I could read online and have seen myself so far, is very good, at least at hetzner.

So, if it were possible to easily migrate VPSs from a broken server to a new one with minimal downtime or, even better, to keep multiple VPSs synchronized on different physical servers, you would get the perfect solution for bootstrapped startups (like hashtagify), or for anybody who needs some VPSs with a lot of resources for a very cheap price: This software redundancy would make up for the relatively low MTBF of the commodity hardware.

As it turned out, even with free and open source technologies, migrating a VPS from one physical server to a new one is actually very easy. And, at least if you use Redis (like hashtagify.me does), it is also very easy to have multiple synchronized DB instances, both for scalability (if you need it) and for hot failover.
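For the Redis part, a minimal sketch of what I mean by synchronized instances, using the plain master/slave replication that Redis supports out of the box (the IP addresses are placeholders):

    # on the secondary VPS, point a fresh Redis instance at the primary one
    redis-cli -h 10.10.10.3 slaveof 10.10.10.2 6379
    # check that the initial sync completed
    redis-cli -h 10.10.10.3 info | grep -E 'role|master_link_status'
    # if the primary server dies, promote the replica and point the app at it
    redis-cli -h 10.10.10.3 slaveof no one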

So in the end I ordered my 24 GB dedicated server last Saturday, and on Monday I had my login data. I expected to spend a fair amount of time learning how to set everything up and move hashtagify to my own “cloud”, but after only maybe 6-7 hours stolen here and there from my vacation, everything was up and working fine. There is still enough work left to migrate my Ruby on Rails applications to the new server too, but being already able to clone a new instance of the hashtagify server with just a few clicks, and for free, is a very satisfying, albeit entirely geeky, sensation!

How I did it is what I’ll describe in part 2 of this already too long article. I guess that, when done in the right sequence, all the steps necessary to set up a virtual infrastructure like hashtagify’s take little more than half an hour, starting from scratch and using only free open source software. It’s not at all difficult, but as usual having to wade through all the different possibilities and a dozen half-outdated how-tos is a pain, so I hope to spare other people that pain – at least until my own post becomes outdated.

Cheers from Russia.

Disclosure: I asked hetzner if they have a referral program, but they don’t. If they had, I would have applied to it prior to publishing this post ;)

Redis or: How I learned to stop worrying and love the unstable version

Numbers on Twitter are huge. Even with just a 1% sample of tweets, even working only on tweets with a hashtag, you reach big numbers very fast.

If you want to offer a highly interactive and very responsive experience, you’ll need to keep most of those numbers in memory, and well indexed. So you’re going to need a lot of RAM: I was aware of that when I decided to use Redis as my main DB, and I purposefully chose a parsimonious data structure and a “cloud” – whatever that means – solution that would allow me to increase the amount of RAM when needed without breaking my shoestring budget.

What I didn’t know was that, with just over a gigabyte and a half of data, as measured by Redis itself, I was going to need double that amount of memory to keep my app from crashing (which happened twice yesterday). Multiply that by two servers for a minimum of redundancy, and you need 6 GB of precious RAM for just 1.3 million tags’ worth of data…

In the past I had found that, after a restart, Redis would use much less memory than before. So, after the obvious quick fix to get the service back up and running – increasing the amount of RAM – I tried to do just that, using the load balancer to keep the service up while restarting the two instances. But the gains from a restart were very short-lived, and I didn’t want to pay for that amount of memory, nor to have to worry that even that might not be enough during my upcoming two-week trip to Russia.

Checking the INFO command output after the restarts, I noticed that what was increasing very fast after a fresh start was the mem_fragmentation_ratio, which started around 1.3 but quickly reached and surpassed 2. After a quick search, it turned out that there is a known memory fragmentation problem with Redis on Linux, caused by malloc.

I had already noticed that disquieting field in the past, but hadn’t realized just how bad the situation was. A ratio of over two to one looked like way too much to me, especially when a quick search on the internet revealed that a ratio of 1.4 was already considered abnormal.
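If you want to check this on your own instance, the relevant INFO fields can be pulled out with redis-cli; a quick sketch, run against a default local instance:

    # used_memory is what Redis thinks it is using, used_memory_rss is what the OS actually gives it;
    # mem_fragmentation_ratio is their quotient, and values well above 1 mean wasted RAM
    redis-cli info | grep -E 'used_memory_human|used_memory_rss|mem_fragmentation_ratio|mem_allocator'
    # (the mem_allocator field only shows up in newer versions, 2.4 and later)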

Delving deeper into the issue, I found out that the fragmentation problem is especially bad with sorted sets, and hashtagify uses a lot of sorted sets: As a matter of fact, of the 4 million keys in use right now, more than 90% are sorted sets!

Different builds of Redis have been made to address this issue, but I didn’t find a “definitive” post/article/howto about which one is best. On the other hand, I noticed that version 2.4, which is not final yet, incorporates one of those solutions – the use of jemalloc – so I tried it on one of my servers.

Things immediately improved; it’s too early to be sure, but for now, after many hours, the fragmentation ratio is only 1.02. Halving my memory requirements with just a simple version upgrade is not bad at all! Now I should be able to leave for Russia without the fear of finding myself bankrupt, or with a frozen server, when I get back :)

I just wish it had been easier to learn about this beforehand, but that’s the nature of open source software. Now I just hope that this post will help at least someone else avoid my mistake of using the “official” 2.2 release.

Cheers.

UPDATE 2011-06-25

After 6 days, 2 hours and 4 minutes of uptime, during which 33,245,499 commands were executed, the fragmentation ratio is still 1.02. I can now confirm that jemalloc rocks :)

Greetings from Russia