Site indexing, turbocharged

Bumping up the horsepower on the SKUs API

Semantics3    3 mins

As of writing, our product database covers nearly 100 million unique products and over 10 billion product offers. Although these seem like large numbers, our customers come from all sorts of industries and have use cases that require coverage of products from many niche and boutique sources. We don’t have it all yet, but when our customers need it, and when they need it quickly, we want to deliver. So we’ve made some changes.

Traditionally, when we got site indexing requests to retrieve catalogs for additional products and their metadata, it would take us 3–4 weeks to make the data from the requested retailers accessible to our customers.

Well, not anymore.

Now you can access entire product catalogs from new retailers within just a week.

Vroom.

Why did we do this and why does this matter to you?

First, we saw that there was an opportunity to improve our customers’ experience working with our tools. If you need product data, we want to provide it for you. If you need product data fast, we want to provide it for you fast.

Second, we noticed an opportunity for improved efficiency in our product. So why not?

Efficiency.

This matters because, when you come to us now with site indexing requests, we can get you the raw structured data you need in as little as 3 days, via our powerful SKUs API.

Need product data from a certain category not well covered in our database? We can get that for you. Fast. Catalogs from specialty online retailers? Got you covered. Find a site that will lend itself well to matching your UPCs or SKUs? Well, you know the story.

How are we able to accomplish this feat and when did we even start thinking about doing such a thing?

Indexing ain’t easy. Trust me, we know.

For example, let’s consider an online retailer with a catalog of over 1 million products. With such a massive amount of data, indexing would usually take about 3 weeks, or even up to 1 month if there are any issues. Such issues include site complexity, bot detection, infrastructure downtime, load on Elasticsearch, and more.

Without giving too much away, our indexing process has several steps, such as creating wrappers, crawling and discovery, pre-disambiguation, processing, and more crawling and discovery.

When our standout team of engineers took a high-level look at our process, they realized that we could record information and deliver it almost right away with some simple (but actually highly complex) changes. Whatever they did, it makes sense. And take our word for it, it was not easy. But if the goal is to deliver not just data but a quality experience to our customers, then deliver we shall.

Basically, something like this happens.
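
To make the idea concrete, here is a toy Python sketch of "record it and deliver it almost right away." It is only an illustration under loud assumptions: the names RawOffer, crawl, and index_site are hypothetical, the pages are assumed to be already parsed by a site wrapper, and none of this is our actual pipeline code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class RawOffer:
    """Hypothetical structured record for one product offer."""
    site: str
    sku: str
    name: str
    price: str


def crawl(site: str, pages: Iterable[dict]) -> Iterator[RawOffer]:
    """Hypothetical crawl step: turn each parsed page into a structured record."""
    for page in pages:
        yield RawOffer(
            site=site,
            sku=page["sku"],
            name=page["name"].strip(),
            price=page["price"],
        )


def index_site(
    site: str,
    pages: Iterable[dict],
    deliver: Callable[[RawOffer], None],
) -> List[RawOffer]:
    """Deliver each record as soon as it is crawled (fast path) and
    return the same records for later normalization (slow path)."""
    pending: List[RawOffer] = []
    for offer in crawl(site, pages):
        deliver(offer)         # fast path: raw structured data is available now
        pending.append(offer)  # slow path: queued for normalization and mapping
    return pending


if __name__ == "__main__":
    sample_pages = [
        {"sku": "A1", "name": " Widget ", "price": "9.99"},
        {"sku": "B2", "name": "Gadget", "price": "19.50"},
    ]
    queued = index_site("example-shop.test", sample_pages, print)
    print(len(queued), "records queued for normalization")
```

The design point is simply that delivery does not wait for the slower normalization step: each record is handed to the deliver callback the moment it is crawled, while the same record is kept aside for later processing.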

In the end, the results speak for themselves as we can provide some seriously fast turnaround times. Normalized product metadata with product mapping via the Products APIs still takes about 3–4 weeks to retrieve, but if you can’t wait, our SKUs API will deliver structured data to you in less than half that time.

Oh yeah. Did I mention that we can do this for non-English sites, too?

With a database covering over 800 sites across the globe, we probably already have the data you need. Amazon. Check. Walmart and Target. Check. Best Buy. Check. Macy’s. Check.

But if we don’t? We can get it for you. And we can get it for you fast.

Find out which retailers we cover or if we can get the data you need now. Talk to us or visit us at www.semantics3.com.

Lovingly made in San Francisco, Singapore and Bengaluru by Calvin Chang, Dinesh N., and the Semantics3 Team.

Published at: February 25, 2016
