Competitors, Price Histories and Product Details using just a Purchase URL

  

products?q={"url": "costcentral.com/proddetail/Apple_MacBook_Pro_with_Retina_display/MD212LLA/11804567"}

Supply any purchase URL to the API in the format above to retrieve data including:

  1. A list of other domains and URLs at which this product is for sale. In this case, the API returns ~20 links including http://www.amazon.com/dp/B007472CIK and http://www.frys.com/product/7378544.

  2. Structured metadata about the product that the URL references. The API returns details such as the product’s weight, model number, case material, Energy Star qualification and more as a structured JSON string.

  3. Price history of the product on this site, as well as on associated competitor sites. It turns out that one of the other domains that sells this product, frys.com, has been varying its MacBook prices between $1399, $1449 and $1499 on a near-weekly basis for several months now. You can view this evolution by running this query:

offers?q={"sem3id":"7Mbjs9dLPMSu20KE2m6YAE","sitedetailsname":"frys.com","isotime":1}
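
Both queries above pass a JSON object in the q parameter. As a minimal sketch of how such query strings might be assembled in Python: the helper names build_query and parse_query are hypothetical, percent-encoding the JSON is an assumption about what an HTTP client would need, and the API host and authentication are deliberately left out because the post does not state them.

```python
import json
from urllib.parse import quote, unquote


def build_query(endpoint: str, params: dict) -> str:
    """Return '<endpoint>?q=<percent-encoded JSON>' (hypothetical helper)."""
    # Compact separators keep the JSON short; quote() makes it URL-safe.
    payload = json.dumps(params, separators=(",", ":"))
    return f"{endpoint}?q={quote(payload, safe='')}"


def parse_query(query: str) -> dict:
    """Inverse of build_query: recover the JSON object from the query string."""
    _, encoded = query.split("?q=", 1)
    return json.loads(unquote(encoded))


# The two queries from the text, rebuilt from plain dictionaries.
products_query = build_query(
    "products",
    {"url": "costcentral.com/proddetail/Apple_MacBook_Pro_with_Retina_display/MD212LLA/11804567"},
)
offers_query = build_query(
    "offers",
    {"sem3id": "7Mbjs9dLPMSu20KE2m6YAE", "sitedetailsname": "frys.com", "isotime": 1},
)

print(products_query)
print(offers_query)
```

Generating the JSON with json.dumps rather than by hand avoids the quoting mistakes that creep in when a query is copied from formatted text.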

Crucially, the URLs that you provide to the API need not be canonicalized: they can be provided in any form, regardless of whether they include session IDs, tracking codes or other parameters.

Despite your requests for URL lookups, this feature has been a long time coming for one reason: e-commerce sites often have different URL strings pointing to the same page. This could be due to affiliate marketing needs, campaign tracking needs, or simply the site’s own quirks. How then do you identify equivalent URLs without actually visiting the target page and comparing its content (itself a non-trivial process, which would add several seconds of overhead to each API call)?

We tackle this problem by resolving every URL into two parts:

  1. A domain name.
  2. A SKU (stock-keeping unit).

While extracting the domain name is trivial, identifying a SKU is quite a challenge. For example, consider the URL http://www.costcentral.com/proddetail/Apple_MacBook_Pro_with_Retina_display/MD212LLA/11804567.

The domain name here is evidently “costcentral.com”, but is the SKU “Apple_MacBook_Pro_with_Retina_display”, “MD212LLA” or “11804567”? The answer: “11804567”. Why not the first two? Because while these permutations of the link under consideration point to the same target page:

http://www.costcentral.com/proddetail/AppleiPhone5/MD212LLA/11804567
http://www.costcentral.com/proddetail/Apple_MacBook_Pro_with_Retina_display/MD212XXX/11804567

this does not:

http://www.costcentral.com/proddetail/Apple_MacBook_Pro_with_Retina_display/MD212LLA/11804568
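
To make the domain/SKU split concrete, here is a rough sketch in Python. It is not our production system: the SKU_SEGMENT_INDEX table and the resolve function are hypothetical names, and the single rule shown (take the last path segment on costcentral.com) is simply read off the example above. A real rule set would need an entry per site.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical per-domain rules: which path segment carries the SKU.
# The costcentral.com entry follows the example in the text (last segment).
SKU_SEGMENT_INDEX = {"costcentral.com": -1}


def resolve(url: str) -> Tuple[str, str]:
    """Split a purchase URL into (domain, sku) using a per-domain rule."""
    # urlsplit only recognises the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    domain = (parts.hostname or "").lower()
    if domain.startswith("www."):
        domain = domain[4:]
    if domain not in SKU_SEGMENT_INDEX:
        raise KeyError(f"no SKU rule for {domain!r}")

    # Only the path is inspected, so query strings and fragments
    # (session IDs, tracking codes) never influence the result.
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError("URL has no path segments to take a SKU from")
    return domain, segments[SKU_SEGMENT_INDEX[domain]]


a = resolve("http://www.costcentral.com/proddetail/Apple_MacBook_Pro_with_Retina_display/MD212LLA/11804567")
b = resolve("costcentral.com/proddetail/AppleiPhone5/MD212XXX/11804567?sessionid=abc123")
c = resolve("http://www.costcentral.com/proddetail/Apple_MacBook_Pro_with_Retina_display/MD212LLA/11804568")

print(a == b)  # True: same domain and SKU despite a different slug and extra parameter
print(a == c)  # False: the final segment differs, so it is another product page
```

Two URLs are then treated as equivalent when their (domain, sku) pairs match, which requires no request to the target page.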

Our system is now capable of understanding these nuances of e-commerce links, making the URL search feature a reality at long last.

Published at: September 19, 2013
