Each day, hundreds of thousands of sellers aim to sell hundreds of millions of products to as many consumers as they can, and this competition leads to thousands of price changes each second. In the process of building systems to gather, standardize, deduplicate and categorize this data, we’ve sifted through thousands of datapoints.

So after months of wrestling with this data, how intuitively do the guys know their data? I put together a data trivia quiz for my team, the catch being, of course, that they weren’t allowed to use our API to look up answers. The results turned out to be interesting, so I’m documenting a sample of questions and answers in this blog post for all to see.

[If you're in a hurry, scroll to question 2, which I think is the most interesting].

Question 1

Which product is sold by the largest number of distinct sellers? I’ll give points for identifying the category of products, at the very least.

Gran Turismo 3 A-Spec (PlayStation 2) on Amazon.com, with more than 900 sellers. Most of the items on sale are “used” goods, naturally.

sem3_id: 1eta5uVvkOGsIekGumKIQe
API Query: products?q={"sitedetails":{"recentoffers_count":{"gte":INPUT}},"cat_id":CAT_ID}

Note: For each of the answers, I’ve provided the sem3_id unique identifier of the product, which can be used to retrieve products from the database through the query products?q={"sem3_id":SEM3_ID}.
Further note: For each answer, I’ve also provided the approximate query used to generate the answer. CAT_ID refers to the category of products you wish to restrict your search to, and INPUT refers to your search value. You can look up category IDs through queries such as categories?q={"name":"Video games"}.
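
As a rough illustration of the query format described above, here is a minimal Python sketch that assembles these query strings. The helper names `build_query` and `url_encoded` are my own inventions for this example, and the values 900 and 1234 are placeholders standing in for INPUT and CAT_ID. The sketch only builds the path and query string; the API's base URL and authentication are left out, since they aren't covered in this post.

```python
import json
from urllib.parse import quote


def build_query(endpoint, params):
    """Return '<endpoint>?q=<compact JSON>' in the style used in this post.

    Hypothetical helper for illustration, not part of any official client.
    """
    # Compact separators keep the JSON free of extra whitespace.
    return f"{endpoint}?q={json.dumps(params, separators=(',', ':'))}"


def url_encoded(query):
    """Percent-encode the JSON part so the query is safe to place in a URL."""
    endpoint, _, payload = query.partition("?q=")
    return f"{endpoint}?q={quote(payload, safe='')}"


# Look up a single product by its sem3_id (the answer to Question 1).
by_id = build_query("products", {"sem3_id": "1eta5uVvkOGsIekGumKIQe"})

# Look up a category ID by name.
by_category = build_query("categories", {"name": "Video games"})

# Range filter in the style of the Question 1 query.
# 900 and 1234 are placeholder values for INPUT and CAT_ID.
by_seller_count = build_query(
    "products",
    {"sitedetails": {"recentoffers_count": {"gte": 900}}, "cat_id": 1234},
)

print(by_id)
print(by_category)
print(by_seller_count)
print(url_encoded(by_id))
```

Keeping the readable form and the percent-encoded form separate makes it easy to compare what you typed against what is actually sent.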

Question 2

What’s the costliest product that you’ve come across?

The “costliest” items that we found turned out to be kinky sexual wellness toys. I’d like to keep this blog PG, so I won’t link to them. The costliest product that I will link to is this Classic Flame 23MM070EEPC/SO 23 Palisades Home Theater with Electric Fireplace, which currently costs ~$46.7 billion ($46,688,998,697.61, to be precise). To put that in context, this home-theater-cum-fireplace would bankrupt Mark Zuckerberg three times over, and costs roughly 2,000 times the once-fabled price (about $23.7 million) of The Making of a Fly. Algorithmic pricing FTW?
$46,688,998,697.61 product on http://www.amazon.com/dp/B0065MYDWI

The price history API (query 2 below) reveals the chronology of the rise in price of this product:
– 12th September, 2012: The product was priced at $1,984.47 and shipped in 6-10 business days.
– 2nd December, 2012: The price had risen to $93,662.35. The hike was, you’ll be glad to know, justified by a reduced window of 1-2 day shipping.
– 21st Jan, 2013: $7,440,653,641.12. The price has risen 6.5x in the last week. Time for a quick buy and resell?

This isn’t the highest price that we’ve recorded for a product, though. It turns out this Samsung TV was priced at $1,000,000,000,000.00 ($1 trillion) in early November last year. A dozen sales of this would have gone a long way towards offsetting the American national debt!

sem3_id: 3wimVmBWkCsEAesceYkecK
API Query 1: products?q={"price":{"gte":1000000},"cat_id":CAT_ID}
API Query 2: offers?q={"sem3_id":"3wimVmBWkCsEAesceYkecK"}
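
To put the dated price points from the chronology in perspective, here is a quick back-of-the-envelope sketch in Python. The three data points are typed in by hand from this post rather than fetched from the API, and `price_jumps` is just a name I made up for the example.

```python
from datetime import date

# Dated price points quoted in the chronology in this post (USD).
history = [
    (date(2012, 9, 12), 1984.47),
    (date(2012, 12, 2), 93662.35),
    (date(2013, 1, 21), 7440653641.12),
]


def price_jumps(points):
    """Return (days elapsed, price multiple) for each consecutive pair."""
    jumps = []
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        jumps.append(((d1 - d0).days, p1 / p0))
    return jumps


for days, multiple in price_jumps(history):
    print(f"{days} days: price multiplied by roughly {multiple:,.1f}")
```

The first gap is 81 days and the second only 50, so the price wasn't just rising, it was rising faster.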

Question 3

What’s the longest item that you’ve come across? That’s length, the physical dimension.

A 4,000-foot jumbo roll of toilet tissue! Some out-of-the-box thinking was needed there.

API Query: products?q={"length":{"gte":LENGTH},"cat_id":CAT_ID}

Question 4

What’s the most popular color among all categories of products?

Black, of course! ~1MM of our 15MM or so products are black in color. Interestingly, only half as many products are white.

API Query: products?q={"color":COLOR,"cat_id":CAT_ID}
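
For a sense of scale, here is a tiny Python sketch that turns those rough counts into shares of the catalog. It assumes you have already collected a count per color (for instance, by running the query above once per color); how the API reports counts isn't covered in this post, so the numbers below are simply the approximate figures quoted above, typed in by hand.

```python
TOTAL_PRODUCTS = 15_000_000  # "15MM or so" products, per this post

# Approximate counts per color; white is "half as many" as black.
color_counts = {"black": 1_000_000, "white": 500_000}

# The color with the largest count wins.
most_common = max(color_counts, key=color_counts.get)

# Share of the whole catalog for each color.
shares = {color: n / TOTAL_PRODUCTS for color, n in color_counts.items()}

print(f"Most popular color: {most_common}")
for color, share in shares.items():
    print(f"{color}: {share:.1%} of all products")
```

So even the most popular color accounts for well under a tenth of all products.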

Question 5

Which brand has the greatest number of distinct product listings?

I found ~65k products branded Wall Spirit.

API Query: products?q={"brand":BRAND,"cat_id":CAT_ID}

Question 6

Which product has the most variations (size-dimension-color-etc. combinations)?

1,775 variations for the Vermont Gage Steel No-Go Plug Gage.

sem3_id: 7fimF6qaGG66oc2sYSwKaO
API Query: products?q={"variation_id":"6rV6YeWdvsyKCWI6QMeMSY"}
[Returns all variations of the specified product].

The world of e-commerce data, particularly price data, never ceases to amaze me. Managing this data is certainly an uphill challenge, but it is also paradise for the data geek in me.

Sign up here (link expires in a week) if you wish to explore the data for yourself. Lucky for you, we’re already handling the data management bit, so I invite you to go crazy exploring. Do email me at govind at semantics3 dot com if you have any questions or insights.

By the way, for you non-developers out there, we’ve built an API playground through which anyone can run queries with no setup time.