How we built a dashboard tracking tens of thousands of products over the Black Friday 2017 season.

Semantics3 Holiday Sales Dashboard

While everyone else talks about pardoning a turkey on Thanksgiving, at Semantics3 we just started a different tradition.

We found a particularly crazy product to pardon.

For the past 3 years we have been scouring the web for all the prominent deals during the holiday season in order to create an aggregate dashboard that shows discount trends online. This year we decided to take it a bit further and use our data for good.

Why we pardoned a product

An aggregated dashboard throws up a lot of insights that are otherwise hard to notice in a fragmented landscape where different retailers, marketplaces, sellers and brands compete in a frenzied panic to give you the best deal.

One of the curious things we noticed was the number of times prices are changed for some products. For example, the price for this Red Ruby pendant was CHANGED 169 TIMES! Now we can’t imagine that an actual interested buyer looking for the best deal had a good Thanksgiving or fun Christmas. So we decided to find a similar product that had presumably stressed a bunch of people and…pardon it.

While the pendant had the most price changes, it is a relatively obscure product. So we settled on the Garmin Forerunner 235, a popular product with over a thousand reviews that has changed price 129 times since we started tracking on Thanksgiving.

It’s a nice enough watch but 129 price changes seems a bit much.

We pardoned it. On behalf of all the people out there who had wishlisted, bookmarked or otherwise lusted after this watch and been subjected to its erratic and frequent price swings —

We pardoned it.

Our aggregated dashboard for every product ever.

How did we build a dashboard that threw up such juicy insights? This year, we worked in partnership with Slickdeals. In this section, we’ll cover the replicable automated workflow that we use behind the scenes to put up this dashboard every year.

Discover

The first step is to identify sources of discounted product listings on the web.

We decided to limit ourselves to two such sources this year:

  • Slickdeals (our partner): Being an aggregator site, it ensures that a wide range of stores is represented in the data.
  • Deals micro-sites of top marketplaces: Marketplaces like Amazon and Walmart put up mayfly deal micro-sites during the holiday season. Pulling products from these marketplaces allows for category representation in the data.
Deal sites from various retailers are like Ad Scans for us.

Once these seed URLs have been decided upon, our crawlers continuously look for newly discounted products on these sites.

Irrespective of where we discover a product, we import it only when its offer price is lower than its list price (as claimed by the store on which it is sold).
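The import rule above can be sketched as a small predicate. This is only an illustration under our own assumptions: the function names `is_discounted` and `discount_percent` are hypothetical and not part of any Semantics3 API, and we assume both prices are already parsed into `Decimal` values in the same currency.

```python
from decimal import Decimal
from typing import Optional


def is_discounted(offer_price: Optional[Decimal], list_price: Optional[Decimal]) -> bool:
    """Return True only when the offer price is strictly lower than the list price."""
    # A product with a missing price cannot be verified as a deal, so skip it.
    if offer_price is None or list_price is None:
        return False
    # Guard against nonsensical values that a crawl may occasionally produce.
    if list_price <= 0 or offer_price < 0:
        return False
    return offer_price < list_price


def discount_percent(offer_price: Decimal, list_price: Decimal) -> Decimal:
    """Discount as a percentage of the list price (assumes list_price > 0)."""
    return (list_price - offer_price) / list_price * 100
```

A product listed at 9.99 and offered at 7.99 passes the check, while one offered at its full list price does not.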

Import

We attempt to import every discovered product.

In this step, we use our on-demand Crawl API (the RealTime URL API) to exchange the product URL for structured product metadata conforming to a known schema. This lets us extract and save key data points about the product such as:

  • its name and image
  • its category crumb from the store on which it is being sold
  • the brand selling it
  • its rating and the number of reviews it has received on the store on which it is being sold
  • and most importantly, its price and availability (whether or not it is in stock at the time of crawl)
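
To make the list above concrete, here is one way the saved record could be modelled. This is a sketch under assumptions: the class name `ProductRecord`, its field names, and the payload keys read by `record_from_payload` are all hypothetical and do not reflect the actual schema returned by the RealTime URL API.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ProductRecord:
    """One imported product; field names are illustrative, not the real schema."""
    url: str
    name: str
    image_url: Optional[str]
    category_crumb: Optional[str]  # category path as shown on the store
    brand: Optional[str]
    rating: Optional[float]
    review_count: Optional[int]
    price: float
    in_stock: bool                 # availability at the time of crawl


def record_from_payload(url: str, payload: Mapping[str, Any]) -> ProductRecord:
    """Build a record from an already-parsed response (hypothetical keys)."""
    return ProductRecord(
        url=url,
        name=payload["name"],
        image_url=payload.get("image"),
        category_crumb=payload.get("category_crumb"),
        brand=payload.get("brand"),
        rating=payload.get("rating"),
        review_count=payload.get("review_count"),
        price=float(payload["price"]),
        in_stock=bool(payload.get("in_stock", False)),
    )
```

Name and price are treated as required here, while the remaining fields are optional, since not every store exposes all of them.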

Backed by the ability of our RealTime API to extract most of these fields from stores in an unsupervised fashion, we were able to import products from 800+ online stores this year.

Classify

When a product is imported, it is saved along with the category that it was assigned to in the store that it was crawled from. However, every store has its own taxonomy for classifying products. Therefore, we normalize the product’s category by mapping it to one from the Semantics3 Category Tree using our on-demand Categorization API. Once classified, we save the product’s Category ID alongside the other fields recorded during import.
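
The idea of normalizing store-specific categories can be illustrated with a toy lookup. To be clear, this is not how the Categorization API works internally: the table `CATEGORY_MAP`, the store names, and the category ID `1001` are invented for the example, and a plain dictionary stands in for the API call.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping from (store, normalized crumb) to a shared category ID.
# Both stores file GPS watches under different paths; both map to the same ID.
CATEGORY_MAP: Dict[Tuple[str, str], int] = {
    ("store-a.example", "electronics > wearables > gps watches"): 1001,
    ("store-b.example", "sports & outdoors > running > watches"): 1001,
}


def normalize_crumb(crumb: str) -> str:
    """Lower-case each level of the crumb and tidy the whitespace around '>'."""
    parts = [part.strip().lower() for part in crumb.split(">")]
    return " > ".join(part for part in parts if part)


def classify(store: str, crumb: str,
             mapping: Dict[Tuple[str, str], int] = CATEGORY_MAP) -> Optional[int]:
    """Return the shared category ID, or None when the crumb is unknown."""
    return mapping.get((store, normalize_crumb(crumb)))
```

With a shared ID in place, products from different stores can be grouped under one category regardless of how each store labels them.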

Refresh

The refresh step is similar to import, except that it is scheduled to run at a later point in time. In this phase, we re-crawl the product’s page on a store using our RealTime API, extract its current price and availability, and compare them with the values recorded at the initial import or the previous refresh cycle. Any price change we detect is tracked on a timeline.
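
The comparison at the heart of a refresh can be sketched as follows. The names `Observation` and `Timeline` are hypothetical, and prices are assumed to be stored as integer cents so that equality checks are exact; our production pipeline may differ on both counts.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Observation:
    at: datetime
    price_cents: int
    in_stock: bool


@dataclass
class Timeline:
    events: List[Observation] = field(default_factory=list)

    def record(self, obs: Observation) -> bool:
        """Append obs only if price or availability changed; return True if appended."""
        last = self.events[-1] if self.events else None
        if (last is not None
                and last.price_cents == obs.price_cents
                and last.in_stock == obs.in_stock):
            return False  # nothing changed since the previous crawl
        self.events.append(obs)
        return True

    def price_change_count(self) -> int:
        """Count transitions where the price differs from the previous event."""
        prices = [event.price_cents for event in self.events]
        return sum(1 for before, after in zip(prices, prices[1:]) if before != after)
```

Storing only the observations that differ from the previous one keeps the timeline compact, and a count like the 129 changes mentioned earlier would come from something like `price_change_count`.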

Our refresh cycles are short and continuous.

A new refresh cycle begins as soon as the previous one ends.

This lets us be reasonably confident that the deals we show on the Dashboard are still active at the time of viewing.

Frequent changes in the offer price for a product are also quite common during the Holiday Season. For example, we observed that the price of Aveeno Baby Wash & Shampoo changed 69 times in the last few weeks, a fact that is borne out by its price history on CamelCamelCamel.

Quite the wild ride for a product that costs ~$10 on average.

Visualize

We now have a continuously growing (import) and frequently updated (refresh) set of curated (discover) products in a data store.

The various widgets on the Dashboard are powered by data obtained by running simple queries against this data store (e.g. top brands by discount %, top categories by discount %, average discount for a given category).
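
As a rough idea of what such widget queries could look like, here is a sketch using an in-memory SQLite database. The table layout, column names, and sample rows are invented for illustration; the post does not say which data store or schema sits behind the Dashboard.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_store() -> sqlite3.Connection:
    """Create an in-memory table with a few made-up rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "name TEXT, brand TEXT, category TEXT, list_price REAL, offer_price REAL)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
        [
            ("Watch A", "BrandX", "Wearables", 100.0, 75.0),
            ("Watch B", "BrandX", "Wearables", 200.0, 150.0),
            ("Shampoo", "BrandY", "Baby Care", 10.0, 9.0),
        ],
    )
    return conn


def top_brands_by_discount(conn: sqlite3.Connection, limit: int = 5) -> List[Tuple[str, float]]:
    """Brands ranked by their average discount percentage, highest first."""
    return conn.execute(
        "SELECT brand, AVG((list_price - offer_price) * 100.0 / list_price) AS pct "
        "FROM products GROUP BY brand ORDER BY pct DESC LIMIT ?",
        (limit,),
    ).fetchall()


def average_discount_for_category(conn: sqlite3.Connection, category: str) -> Optional[float]:
    """Average discount percentage within one category, or None if it has no rows."""
    row = conn.execute(
        "SELECT AVG((list_price - offer_price) * 100.0 / list_price) "
        "FROM products WHERE category = ?",
        (category,),
    ).fetchone()
    return row[0]
```

Each widget then boils down to one grouped aggregate over the same table, which is why the queries can stay simple.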


So, that’s how we put together our Holiday Sales Dashboard. As you can see, many of the core services used in this pipeline are available through our public APIs. Should you want to build a similar Price Trends Dashboard for your own use-case, speak to us.


Written by Amarnath Ravikumar and Anjali Krishnan. For more engineering posts, check out Engineering@Semantics3 or test out our database using the Semantics3 UPC Lookup Tool.