The past few weekends I have been working on a personal side project (20% time project à la El-Goog) called Quanthunt with a friend of mine, Eileen Chan. It is basically an online trading platform where you compete with others – with the catch being that all trades can only be done programmatically using an API. I started on it mainly for fun but also to try my hand at something quite different from the things I have worked on previously. It is also pretty technically interesting as it requires the processing of a large number of trades from multiple users and tracking their positions in a leaderboard in real time.
Soon after I started writing the code for Quanthunt, I hit a roadblock. I wasn’t sure how to go about implementing the trading engine (the part that executes the trades locked in by the users according to the latest market rates) in a simple, efficient manner. I could have written a while(1) loop that iterated through a MySQL table, but that was clearly a very naive approach.
A Rabbit And A 4-letter Acronym
So it was back to doing background research. After many Google queries and reading some excellent posts on Hacker News, I started to get an inkling of how to proceed. I learned about message queues, that the gold standard was TIBCO, and that most of the financial institutions used their software. Being a free software zealot, I started searching for open source message queues. I soon discovered AMQP and read about RabbitMQ. It was my first AHA! moment. Now I knew how to go about implementing the trading engine.
But something else was going on in my mind. I kept relating all the new stuff I was learning to the stuff I was working on at Semantics3. For example, one thing I wanted to improve was our web crawler, which was implemented in a single-machine, single-process (monolithic) manner. Hadoop was a potential option, but it was overkill and didn’t really suit our needs. This was my second AHA! moment. The message queue was exactly what I needed to write a distributed web crawler that was completely tailored to our needs.
This got me really excited and I started reading extensively about message queues. I soon discovered ZeroMQ (which turned out to be more of a library than a framework) and also learned how Redis could be used as a job queue. (Small advice: read up on Redis. It’s got so many different features and capabilities. It’s much, much more than a simple in-memory key-value store!) Later, I found out about dedicated job queue servers and learned about Beanstalk. Then I discovered Gearman. This was exactly what I needed and what I had unknowingly been searching for.
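To make the job queue idea concrete, here is a minimal sketch of the pattern: a producer pushes jobs onto a shared queue and a pool of workers pulls them off independently. It is only an illustration and not the code we actually run. It assumes an in-process `queue.Queue` from the Python standard library as a stand-in for a real broker such as Redis or Gearman, and the names `run_workers` and `fake_fetch` are hypothetical.

```python
import queue
import threading


def run_workers(jobs, handler, num_workers=4):
    """Push every job onto a shared queue and let num_workers threads drain it."""
    job_queue = queue.Queue()  # stand-in for a broker-backed queue (assumption)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = job_queue.get()      # blocks until a job is available
            if job is None:            # sentinel: no more work for this worker
                job_queue.task_done()
                return
            try:
                outcome = handler(job)
            except Exception as exc:   # record the failure, keep the worker alive
                outcome = exc
            with results_lock:
                results.append((job, outcome))
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    for job in jobs:                   # producer side: enqueue the work
        job_queue.put(job)
    for _ in threads:                  # one sentinel per worker to shut it down
        job_queue.put(None)

    job_queue.join()                   # wait until every item has been processed
    for t in threads:
        t.join()
    return results


def fake_fetch(url):
    # Hypothetical placeholder for a real download-and-parse step.
    return len(url)


if __name__ == "__main__":
    urls = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
    for url, size in sorted(run_workers(urls, fake_fetch, num_workers=2)):
        print(url, size)
```

The useful property is that the producer never needs to know how many workers exist or where they run. Swap the in-process queue for a networked one and the workers can live on as many small machines as needed.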
With this newfound knowledge I was able to completely redesign our core architecture from one that was centralized to one that was highly distributed and could potentially scale up and down according to our needs. Instead of big fat machines doing all the work, we could now make do with a large number of smaller machines. Not only has this made our infrastructure more robust, it has also brought us significant cost savings.
Now let me come to my point. Looking back, it all seems very obvious, but hindsight is always 20/20. If I hadn’t worked on my side project, I don’t think I would have gotten to know and appreciate message queues, the various protocols and implementations (Gearman, AMQP, RabbitMQ, ZeroMQ, etc.) and the possibilities that they offer, at least not for some time to come. Except for a bunch of load balancers, our infrastructure would have continued to be fat and monolithic.
Quanthunt is still a work in progress (I ended up using Redis to handle the queuing and execution of trades) and should be done in the next two weeks. I already have a couple of other personal weekend projects lined up, which I plan to undertake once I am done with it. The first one is to build a quadcopter completely from scratch, including writing the control system. It is part boyhood fantasy and part an effort not to lose my roots in hardware engineering. In fact, I have been discussing this with a bunch of friends for quite some time now. Who knows what skills and lessons I might learn from it that I could apply at Semantics3?
Another project I have in mind is to build a clone of Facebook chat using Erlang/OTP (I read a lot about it while I was researching RabbitMQ). I have always wanted to learn a functional programming language, and I feel that Erlang is the best choice as it seems to be a very functional (pardon the pun) language for building practical web applications. Also, as Erlang is designed for writing distributed applications, it just might be the best language in which to implement the next version of our web crawler.
Words of Wisdom
So, get started on a weekend project. Pick a project in an area where you don’t have much background knowledge or experience, or one that requires a different stack of technologies from what you are familiar with. Use a different kind of database (e.g. try a project using a graph database like Neo4j), a different library or framework or, even better, a completely different language. It will not only make you a better engineer, it might just help you understand and improve your core work in different and better ways.