In this post I am going to talk about how we built the backend infrastructure to support our Products API, Semantics3’s core product. This post can also be seen as a follow-up, or maybe even a reply, to a post published some time back by Zemanta on building public APIs.
In that post, the author suggests using 3rd party API support vendors like 3Scale and Mashery to handle all API backend requirements. We seriously considered those two services and concluded that they weren’t appropriate for us because:
- Our API is our core product offering and we didn’t feel comfortable outsourcing such a critical part to a 3rd-party service.
- Those services aren’t cheap.
- We are engineers. We love to build things (while keeping a wary eye on reinventing the wheel).
Our core product is our Products API, which gives developers immediate access to constantly updated data on millions of products. A sample query to the API may read: “return LCD televisions with price >= USD400, with length >= 65 cm, of brand Samsung”.
One can think of an API as a utility service. A paid API is essentially the sale of some sort of utility, either on a usage-based model or a monthly subscription model. In our case, the utility we are selling is realtime access to product data.
In this post, I am going to focus mainly on the technical aspects of the administrative parts of running such a service. “Administrative parts” includes the delivery, the metering, the billing of customers, authentication of users, etc. How we built the actual utility service is a discussion for another day.
It took the three of us (Govind, Vinoth and myself) three weeks to build the API support infrastructure. In terms of work allocation, Govind worked on the actual piping of the data. Vinoth worked on the billing and the frontend analytics dashboard. I worked on the architecture, authentication, throttling and metering of customers.
So, what are some of the things you need to consider when building your own paid API offering?
0. Design Of The API
REST-based APIs have become the de facto standard, so you probably want to build yours on these principles. Beware, it’s very easy to design an unRESTful API, so do spend lots of time planning your endpoints. These resources from 3Scale helped us better understand the technicalities of designing a RESTful service. JSON has also become the most popular delivery format, so ensure that you support it.
1. The Language/Platform To Build The API
The language/platform you pick is of critical importance. You’d ideally want to choose something that can handle a large number of concurrent requests and scale well. Our original plan was to go with Perl, which is our predominant language of choice. However, after some investigation, it didn’t seem to be a great option (Plack, unfortunately, lacks good documentation). We evaluated Python (Tornado) and Node.js and eventually chose the latter. This is because Node.js has baked-in async functionality, native JSON support (duh) and a great community behind it. Our API is built using the really good Restify framework.
Two other languages you may want to consider are Golang and Erlang. In hindsight, I probably would have picked Golang.
2. API Key Generation and User Management
You need a robust system for generating keys and secrets, since these credentials are used to authenticate customers who access your service. The credentials have to be secure, unique and one-way (not decipherable). Our algorithm for generating keys takes the base64 encoding of the output of some well-known hashing algorithms, applied to user details and a cryptographically generated random number.
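A minimal sketch of this idea, written in Python for brevity rather than the Node.js used for the actual service. The function name, the choice of SHA-256 and SHA-512, and the user fields are all assumptions made for illustration; the post does not say which hashes or details the real algorithm uses.

```python
import base64
import hashlib
import secrets


def generate_credentials(email, signup_timestamp):
    """Return a hypothetical (api_key, api_secret) pair for one user."""
    # Fresh bytes from the OS random source make every pair unique and unguessable.
    nonce = secrets.token_bytes(32)
    user_details = f"{email}|{signup_timestamp}".encode("utf-8")

    # Hashing is one-way: the inputs cannot be recovered from the digests.
    # The prefixes keep the key and the secret from sharing a digest.
    key_digest = hashlib.sha256(b"key:" + user_details + nonce).digest()
    secret_digest = hashlib.sha512(b"secret:" + user_details + nonce).digest()

    # URL-safe base64 (padding stripped) is easy to paste into headers and URLs.
    api_key = base64.urlsafe_b64encode(key_digest).rstrip(b"=").decode("ascii")
    api_secret = base64.urlsafe_b64encode(secret_digest).rstrip(b"=").decode("ascii")
    return api_key, api_secret
```

Because the random nonce is discarded after use, calling the function twice for the same user yields two unrelated pairs, which is also a simple way to support key rotation.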
We decided to go with 2-legged OAuth v1.0 as our primary authentication scheme. The other authentication scheme we support is basic authentication (just send an HTTP request header containing your key), but it’s restricted to our test endpoints.
Since Node.js didn’t have a suitable OAuth server library, we ended up writing our own (we will open source it at some point in the future). We then had to test it with the popular OAuth client libraries for all the major scripting languages: Perl, Python, Ruby, PHP and Node.js.
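For readers unfamiliar with 2-legged OAuth 1.0, the core of the server-side check is recomputing the request signature with the customer's secret. The sketch below shows the HMAC-SHA1 signature method in Python; it is not the Node.js library mentioned above, the function names are made up for illustration, and it leaves out the timestamp and nonce replay checks a real server also needs.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # OAuth 1.0 leaves only letters, digits, '-', '.', '_' and '~' unencoded.
    return quote(value, safe="~")


def sign_request(method, base_url, params, consumer_secret):
    """HMAC-SHA1 signature for a 2-legged request (there is no token secret).

    base_url is the URL without its query string; params holds the query and
    oauth_* parameters, excluding oauth_signature itself.
    """
    # Parameters are percent-encoded, sorted, and joined into one string.
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    base_string = "&".join(
        [method.upper(), percent_encode(base_url), percent_encode(normalized)]
    )
    # With no token secret, the signing key ends with a bare '&'.
    signing_key = percent_encode(consumer_secret) + "&"
    digest = hmac.new(signing_key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_request(method, base_url, params, signature, consumer_secret):
    expected = sign_request(method, base_url, params, consumer_secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected, signature)
```

The server looks up the secret from the oauth_consumer_key sent with the request, so the secret itself never travels over the wire.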
Metering is the most critical part of a paid API offering. It is used to track exactly how many resources each customer has used. It’s also the starting point for debugging your system.
When designing your system, try to capture as much information as possible from each API query. For each request made to our API, we log 25 different parameters related to the call, giving us more than enough data to hunt down even the hairiest of bugs. This information can later come in handy for analyzing your customers’ usage patterns. For example: how many requests are being made, and how frequently? Which resources are requested most often?
All API calls are logged on a MongoDB server. We then run map-reduce jobs to aggregate the number of calls made by each API key, to determine the daily usage of each of our customers.
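The aggregation step can be illustrated without a database. The sketch below runs the same map and reduce logic over an in-memory list in Python; the field names (api_key, timestamp, endpoint) are assumptions, standing in for the 25 parameters logged per call, and the real jobs run inside MongoDB rather than in application code.

```python
from collections import defaultdict
from datetime import datetime, timezone


def map_call(log_entry):
    """Map step: emit ((api_key, day), 1) for one logged call."""
    moment = datetime.fromtimestamp(log_entry["timestamp"], tz=timezone.utc)
    return (log_entry["api_key"], moment.date().isoformat()), 1


def reduce_counts(mapped):
    """Reduce step: sum the emitted values for each (api_key, day) pair."""
    totals = defaultdict(int)
    for key, count in mapped:
        totals[key] += count
    return dict(totals)


# Hypothetical log entries; timestamps are Unix seconds in UTC.
call_log = [
    {"api_key": "key_a", "timestamp": 1350000000, "endpoint": "/products"},
    {"api_key": "key_a", "timestamp": 1350000060, "endpoint": "/products"},
    {"api_key": "key_b", "timestamp": 1350000120, "endpoint": "/products"},
    {"api_key": "key_a", "timestamp": 1350086400, "endpoint": "/products"},
]

daily_usage = reduce_counts(map_call(entry) for entry in call_log)
```

Keying on the (api_key, day) pair means the same output feeds both billing (sum per key) and the usage dashboard (one value per day).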
5. Throttling API usage
API throttling is critical, because you don’t want your service (especially your free plan) to, bluntly put, become a free-for-all unlimited buffet service. We use the leaky bucket algorithm to throttle API requests based on the tier of the plan. [E.g.: our free plan is capped at 1000 calls for any given 24-hour period, from the time the first call was made.] We use a patched version of the implementation of this algorithm that ships with Restify, the API framework we use.
Here is a great Stack Overflow thread that discusses request throttling.
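For a feel of how a leaky bucket behaves, here is a small Python sketch of the "bucket as a meter" variant. It is not the Restify implementation mentioned above, and the class name, capacity and leak rate are illustrative assumptions: each request adds one unit to the bucket, the bucket drains at a constant rate, and a request that would overflow it is rejected.

```python
import time


class LeakyBucket:
    """Per-customer throttle: reject requests that would overflow the bucket."""

    def __init__(self, capacity, leak_rate_per_sec, clock=time.monotonic):
        self.capacity = capacity            # maximum burst size
        self.leak_rate = leak_rate_per_sec  # sustained requests per second
        self.clock = clock                  # injectable so tests can fake time
        self.level = 0.0
        self.last_checked = clock()

    def allow(self):
        now = self.clock()
        # Drain whatever has leaked out since the previous request.
        leaked = (now - self.last_checked) * self.leak_rate
        self.level = max(0.0, self.level - leaked)
        self.last_checked = now
        if self.level + 1 > self.capacity:
            return False  # bucket is full: the caller should answer HTTP 429
        self.level += 1
        return True
```

In practice you would keep one bucket per API key, for example in a dictionary, and pick the capacity and leak rate from the customer's plan tier.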
6. Billing of Customers
We are based in Singapore, and hence have no access to Stripe :( . As a result we ended up choosing the not-so-easy-to-integrate PayPal API, which took Vinoth a good week to integrate, what with the PayPal API’s creaky URL-callback system, poor documentation and buggy sandbox environment. It’s quite amusing that PayPal doesn’t even provide a REST API!
We support two types of payments. One is a monthly subscription (get a fixed number of calls per month) and the other is a bulk call purchase (purchase X number of calls at Y dollars).
7. Dashboard/Analytics Platform for Customers
Finally, you want to build a visual front-end so that your customers can track and monitor their usage.
Our analytics dashboard allows users to view the total number of calls that they made on each day of their chosen date range. We also display the last 100 calls made on the last day of the chosen date range.
We used client-side rendering to build the dashboard, using the excellent ICanHaz.js library (which comes with built-in support for Mustache templates). Client-side rendering is a great strategy when building dashboards, because it makes work division between frontend and backend devs really convenient. More importantly, it allows changes and new features to be introduced more easily, as the various aspects of the code (front-end display and back-end data generation) are clearly demarcated.
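To illustrate the back-end half of that split, the Python sketch below builds the two pieces of data the dashboard shows: per-day totals for a date range and the most recent calls on the last day. The function name, the field names and the sample data are hypothetical; the front end would receive something like this and render it with its templates.

```python
from datetime import date, datetime, timedelta


def dashboard_view(calls, start, end, recent_limit=100):
    """Return daily totals for [start, end] and the latest calls on `end`."""
    # One entry per day in the range, so days without traffic show up as zero.
    totals = {}
    day = start
    while day <= end:
        totals[day.isoformat()] = 0
        day += timedelta(days=1)

    last_day_calls = []
    for call in calls:
        call_day = call["time"].date()
        if start <= call_day <= end:
            totals[call_day.isoformat()] += 1
        if call_day == end:
            last_day_calls.append(call)

    # Newest first, trimmed to the most recent `recent_limit` calls.
    last_day_calls.sort(key=lambda c: c["time"], reverse=True)
    return {"daily_totals": totals, "recent_calls": last_day_calls[:recent_limit]}


# Hypothetical data: one call on the first day, none on the second, 150 on the last.
sample_calls = [{"time": datetime(2012, 10, 10, 9, 0), "endpoint": "/products"}] + [
    {"time": datetime(2012, 10, 12) + timedelta(seconds=i), "endpoint": "/products"}
    for i in range(150)
]
view = dashboard_view(sample_calls, date(2012, 10, 10), date(2012, 10, 12))
```

Keeping this function free of any display logic is what lets the front-end templates change without touching the back end.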
Building a paid API offering involves several considerations that need to be planned thoroughly. I hope this blog post serves as a simple guide for those looking to build something similar for their own startups. That said, if your API is just a non-critical add-on service and not your core offering, using a third party service like 3Scale or Mashery may be a much better choice.
On a side note, if our current idea doesn’t take off, we may set up a Mashery competitor (just kidding ;) ).
PS: We just launched a closed beta of our API. We would be most glad if you could give it a try. Here is the signup link (it comes preloaded with the invitation code). Don’t hesitate to share it with your developer friends.