Lifting the wool off your eyes
Buzzwords are the bane of our existence in tech. They’re designed to confuse, alienate and isolate outsiders to the industry, as well as dress up a poorly designed product to impress customers.
In all honesty, buzzwords infuriate us. They were originally designed to convey complex ideas in a compressed word-vehicle, but were hijacked to become the symbol of pretentiousness and form-over-function.
That being said, here’s a guide to what the buzzwords actually mean when they are used accurately, and not just as word mumbo-jumbo in the echo chamber of blogposts written by our contemporaries in the e-commerce product API space.
Common buzzwords in tech and what they actually mean
RealTime API
What this means: When you make a query (particularly a URL query), the API visits the page at the URL you queried, performs a live crawl of that webpage and sends back structured product information with the price as currently displayed.
What this does NOT mean: An API that sends you a response in a short span of time (e.g. a response time of milliseconds or microseconds). Any well-designed API should respond to your query quickly, not just a “RealTime” API. Anything else is just a poorly designed product.
Latency
What it means: The average time between sending a query and receiving a response (see above). Well-designed APIs have low latencies and can handle large volumes of queries without slowing down. Poorly designed APIs have high latencies.
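To make "average time between query and response" concrete, here is a minimal sketch of how you might measure it yourself. The function names are our own, and `fake_api_call` is a hypothetical stand-in that just sleeps; swap in a real request to test an actual endpoint. We report the median and worst case alongside the mean, since a single slow call can distort an average.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies(call: Callable[[], object], n: int = 20) -> List[float]:
    """Time n invocations of `call` and return each duration in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> dict:
    """Report the mean plus the median and worst case, which the mean can hide."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_api_call() -> None:
    # Hypothetical stand-in for a real HTTP request: sleeps to imitate network time.
    time.sleep(0.005)


if __name__ == "__main__":
    print(summarize(measure_latencies(fake_api_call, n=10)))
```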
Synchronous vs Asynchronous API calls
Here’s a nice short description from Apigee:
If an API call is synchronous, it means that code execution will block (or wait) for the API call to return before continuing. This means that until a response is returned by the API, your application will not execute any further, which could be perceived by the user as latency or performance lag in your app. Making an API call synchronously can be beneficial, however, if there is code in your app that will only execute properly once the API response is received.
Asynchronous calls do not block (or wait) for the API call to return from the server. Execution continues on in your program, and when the call returns from the server, a “callback” function is executed.
Semantics3’s API is an asynchronous, RESTful API. That means you can send multiple calls to the API endpoint without waiting for the response to one query before sending the next.
This is useful if you want to offer a lag-free user experience in your application.
There are some key advantages to asynchronous messaging. They enable flexibility and offer higher availability — There’s less pressure on the system to act on the information or immediately respond in some way. Also, one system being down does not impact the other system. — PeoplesoftTutorial
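The difference is easiest to see side by side. The sketch below is illustrative only: `fake_api_call` is a hypothetical stand-in that waits 0.1 seconds instead of talking to a real server, and it uses Python's `asyncio` with `await` rather than explicit callbacks. The first helper waits for each response before sending the next query; the second puts all the queries in flight at once.

```python
import asyncio
import time


async def fake_api_call(query: str, delay: float = 0.1) -> str:
    # Hypothetical stand-in for a network request; yields control while "waiting".
    await asyncio.sleep(delay)
    return f"result for {query}"


async def run_sequentially(queries):
    # Blocking style: each call must return before the next one is sent.
    return [await fake_api_call(q) for q in queries]


async def run_concurrently(queries):
    # Non-blocking style: all calls are in flight at the same time.
    return await asyncio.gather(*(fake_api_call(q) for q in queries))


def timed(coro_fn, queries):
    """Run one of the helpers above and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(queries))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    queries = ["query-1", "query-2", "query-3"]
    _, sequential_s = timed(run_sequentially, queries)
    _, concurrent_s = timed(run_concurrently, queries)
    print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

With three queries, the sequential version takes roughly three times the per-call delay, while the concurrent version takes roughly one.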
Product vs Product Offer
What it means:
Product: A unique item produced by a brand, with a unique set of attributes and features.
Product Offer: A product offered by a seller online at a particular price point, with a URL.
Why is this important?: Many product API companies make strong claims on their product count (Millions! Billions!) without actually clarifying which metric they are referring to.
For example, if the DJI Phantom (shown above) with that specific configuration is sold by 20 different retailers, then technically it has 20 product offers. A vendor may choose to claim a product count of 20, when in reality it is just 1 product with 20 offers.
In our product database we distinguish between Product, Product Offers, and Product Variations (i.e. same product but with different specifications).
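As a rough illustration of why the two counts diverge, here is a minimal data model. It is a sketch for explanation, not our actual schema: the class names, fields, seller names, URLs and prices are all made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Offer:
    seller: str
    url: str
    price: float


@dataclass
class Product:
    brand: str
    name: str
    variation: str  # a specific configuration of the same base product
    offers: List[Offer] = field(default_factory=list)


def count_products(catalog: List[Product]) -> int:
    return len(catalog)


def count_offers(catalog: List[Product]) -> int:
    return sum(len(p.offers) for p in catalog)


# One product sold by 20 retailers (all values below are invented).
drone = Product(
    brand="DJI",
    name="Phantom",
    variation="example configuration",
    offers=[
        Offer(seller=f"retailer-{i}", url=f"https://retailer-{i}.example/phantom", price=999.0 + i)
        for i in range(20)
    ],
)
catalog = [drone]
print(count_products(catalog), count_offers(catalog))  # prints "1 20"
```

Quoting the second number as a "product count" is exactly the inflation described above.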
Number of products vs number of offers?
In all honesty, statistics about product databases mean very little; having a billion products versus a trillion product offers doesn’t necessarily mean you have a better asset on hand.
In our experience monitoring online retailers, we’ve discovered that there is a very strong pareto-principle power law at play with regard to the specific product and pricing data that customers actually care about. Generally, we find that only 20% of all available online products are actually purchased, monitored, or requested to be monitored by 80% of our customers.
Here’s an example:
Do you know why the total number of products seems so inflated, especially as is commonly reported?
One reason is the proliferation of products that are only produced on demand.
For example, millions of the T-shirts listed on marketplaces don’t actually exist; they have been auto-generated and posted as product pages.
These T-shirts are printed only when someone orders them — tracking them would be meaningless.
Monitoring all of these non-existent products is a waste of time and resources — customers don’t want them and they take up needless price refresh resources, sometimes at the cost of updating a more frequently changing product.
It’s better to ignore them unless a user specifically requests that these URLs be updated.
Signal vs Noise
The reality is that it’s impossible to track all products in the universe simultaneously for product and pricing changes.
That’s where statistics and data science come in. They help us develop good heuristics for determining which products actually matter, since these are the ones that will actually have price changes and are most frequently accessed by our customers.
We utilize the pareto-principle power law as a starting point, not only in terms of narrowing down the population of products we need to monitor, but also improving our price refresh frequency for these products.
It allows for better data quality, fresher pricing information and a better all-round experience at scale. More critically, it also helps us keep our costs low, which we can then pass on to the customer.
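One simple way to apply that starting point is to rank products by how often they are requested and keep the smallest "head" that covers most of the demand. The sketch below assumes you have per-product request counts; the function name, the 80% default and the sample numbers are illustrative, not a description of our production heuristics.

```python
from typing import Dict, List


def head_products(request_counts: Dict[str, int], target_share: float = 0.8) -> List[str]:
    """Return the most-requested products that together cover `target_share` of requests."""
    total = sum(request_counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    # Walk products from most to least requested until the target is covered.
    for product_id, count in sorted(request_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= target_share * total:
            break
        selected.append(product_id)
        covered += count
    return selected


# Made-up request counts for ten products (1,000 requests in total).
counts = {"p1": 500, "p2": 320, "p3": 60, "p4": 40, "p5": 30,
          "p6": 20, "p7": 15, "p8": 8, "p9": 5, "p10": 2}
head = head_products(counts)
print(head, f"{len(head) / len(counts):.0%} of products")
```

In this invented sample, two of the ten products (20%) account for 82% of requests, so they are the ones worth refreshing most often.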
What if you need custom crawl frequencies?
No worries, we have you covered. We understand that a one-size-fits-all approach doesn’t work for some customers. Additionally, certain customers require very specific SLAs on the products that are important to them (e.g. we’ve had customers who wanted Matryoshka dolls refreshed at one-hour intervals).
For such customers, we have a separate recrawl queue that handles this requirement behind the scenes.
This two-pronged approach (power law based recrawl and specific customer based recrawls) helps us cover everyone’s needs with respect to price freshness, while keeping costs low.
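The idea can be sketched as "use a demand-based default interval, unless a customer has asked for something tighter". The code below is a simplification under stated assumptions: the interval values are invented, and it collapses both prongs into a single ordering for brevity, whereas the customer-specific recrawls described above run in a separate queue.

```python
import heapq
from typing import Dict, List, Optional, Tuple

DEFAULT_HEAD_INTERVAL_H = 24.0   # assumed interval for frequently requested products
DEFAULT_TAIL_INTERVAL_H = 168.0  # assumed interval for rarely requested products


def refresh_interval(is_head: bool, customer_sla_h: Optional[float]) -> float:
    """Pick the tighter of the demand-based default and any customer-requested interval."""
    default = DEFAULT_HEAD_INTERVAL_H if is_head else DEFAULT_TAIL_INTERVAL_H
    return default if customer_sla_h is None else min(default, customer_sla_h)


def build_schedule(products: Dict[str, Tuple[bool, Optional[float]]]) -> List[Tuple[float, str]]:
    """Return a heap of (hours_until_next_crawl, product_id), soonest first."""
    heap = [(refresh_interval(is_head, sla), pid) for pid, (is_head, sla) in products.items()]
    heapq.heapify(heap)
    return heap


# product_id -> (is it in the high-demand head?, customer-requested interval in hours)
products = {
    "popular-phone": (True, None),
    "obscure-tshirt": (False, None),
    "matryoshka-doll": (False, 1.0),  # a customer asked for hourly refreshes
}
schedule = build_schedule(products)
while schedule:
    hours, pid = heapq.heappop(schedule)
    print(f"{pid}: next crawl in {hours:g}h")
```

Here the doll is crawled first despite being a low-demand product, because the customer's requested interval overrides the default.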
Product statistics are just marketing speak
There’s another reason why having more products is just marketing speak. Marketers can manipulate their statistics to make their product database look larger.
Here’s an example from a competitor:
The image above might look impressive at first glance, but in truth, product count gaps are irrelevant because products are collected based on site coverage;
i.e. if you request that we crawl a particular site — all products in that site are automatically pulled into the database.
Here are some more relevant statistics:
- Is product matching performed on the product data? (Yes, we do this)
- How rich and comprehensive is the product metadata? (We collect all available metadata, and tag it to unique product records; i.e. an iPhone 7 64GB gold is a unique product in our database)
- What is the refresh frequency for price data? (We can do RealTime price checks)
- How long does it take to crawl new sites? (If you need raw data, we can crawl ANY site in RealTime; if you need Match&Merged data, with product data mapped to offers from other retailers, we can do this in less than 2 weeks)
Ergo, having a larger number of sites, brands, products, MPNs and UPCs doesn’t have a strong bearing on the actual utility of that data.
Having millions of UPCs or MPNs means very little if you have duplicate or corrupted UPCs that are mis-matched, poorly updated prices (updating 1200 sites regularly can be a pretty tough/expensive proposition) or missing data fields (having 1200 sites is tough; performing product matching on all of them is even tougher).
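Duplicate and corrupted UPCs are at least partly detectable. A 12-digit UPC-A code ends in a check digit computed from the other eleven, so a basic sanity pass can reject malformed codes and collapse repeats. This is a minimal sketch with function names of our own choosing; it only catches codes that are internally inconsistent, not valid UPCs attached to the wrong product.

```python
from typing import Iterable, List


def is_valid_upc_a(code: str) -> bool:
    """Check the length and check digit of a 12-digit UPC-A code."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # Digits in odd positions (1st, 3rd, ...) are weighted 3, even positions 1.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    return (10 - total % 10) % 10 == digits[11]


def clean_upcs(codes: Iterable[str]) -> List[str]:
    """Drop malformed codes and duplicates, keeping first-seen order."""
    seen, cleaned = set(), []
    for code in codes:
        code = code.strip()
        if is_valid_upc_a(code) and code not in seen:
            seen.add(code)
            cleaned.append(code)
    return cleaned


# One valid code, a repeat, a bad check digit, a truncated code, stray whitespace.
raw = ["036000291452", "036000291452", "036000291453", "12345", " 036000291452 "]
print(clean_upcs(raw))  # prints ['036000291452']
```

Five raw entries shrink to one usable code, which is the gap between a headline UPC count and the data you can actually rely on.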
API call quota vs API call rates
API companies often differentiate their pricing based on API call quotas.
API call quotas basically mean that you have a fixed number of HTTP requests that you can make to the API endpoint, after which you will be blocked. API customers generally have to upgrade to a higher plan when they need more calls.
Conversely, API call rates refer to the number of asynchronous calls that you can make per time period (this is usually in seconds) to the API endpoint. For example, if you’re permitted to perform up to 6 requests per second, the API endpoint will accept 6 parallel requests every second and process them asynchronously — which means that you don’t have to wait for a response from the API before sending another request.
At the end of the day, API call rates and quotas are different tools API companies use to manage their server loads and differentiate pricing for customers. For example, Enterprise customers need higher API quotas paired with the ability to send a higher volume of requests per second to ensure a low-latency experience for their end user.
This typically involves the set-up of a dedicated API cluster that’s isolated from the common group of computers that handle the APIs accessed by non-Enterprise customers.
In a technical sense, API call rates are much more valuable for your application performance than API call quotas. Having access to high bandwidth (higher bandwidth == higher API call rates) implies that your application can handle higher traffic and request loads without being throttled. API quotas are typically set to prevent a total overload of the system beyond its operational capacity.
If you expect to have a high traffic load that’s likely to be sustained over a long period of time, you should request a high API call rate AND a higher API call quota.
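To see how the two limits interact, here is a toy model of a limiter that enforces both. It is a sketch, not how any particular API provider implements throttling: the class name, the one-second sliding window and the status strings are all assumptions, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque
from typing import Deque


class CallLimiter:
    """Tracks a total call quota and a per-second call rate (sliding one-second window)."""

    def __init__(self, quota: int, rate_per_second: int) -> None:
        self.remaining_quota = quota
        self.rate_per_second = rate_per_second
        self._recent: Deque[float] = deque()  # timestamps of calls in the last second

    def try_call(self, now: float) -> str:
        # Forget calls that are at least one second old.
        while self._recent and now - self._recent[0] >= 1.0:
            self._recent.popleft()
        if self.remaining_quota <= 0:
            return "blocked: quota exhausted"
        if len(self._recent) >= self.rate_per_second:
            return "throttled: rate exceeded"
        self._recent.append(now)
        self.remaining_quota -= 1
        return "ok"


limiter = CallLimiter(quota=10, rate_per_second=6)
# Eight calls in the same instant: six fit the rate, two are throttled.
print([limiter.try_call(now=0.0) for _ in range(8)])
# One second later the window has cleared, but only four calls of quota remain.
print([limiter.try_call(now=1.0) for _ in range(5)])
```

A throttled call can simply be retried a moment later, whereas a call blocked by the quota stays blocked until the plan is upgraded. That is why a sustained high-traffic application needs both limits raised.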
So what should an API customer look out for? What metrics should you use to pick the right solution partner?
In reality, it’s not the size of the database that matters; it’s the quality of the data, the richness of the data fields available, the accuracy of product matching and categorization, the frequency of price refresh and the technical robustness of the product API.
Because at the end of the day, a product data solution has to provide genuine business value to you.
Don’t let anyone tell you otherwise.
Email us at firstname.lastname@example.org, schedule a call with us or start digging on your own at semantics3.com!
Semantics3 operates the world’s largest eCommerce product database. We’re a trusted and reliable provider of ready-to-use structured eCommerce product pricing and metadata, with coverage on all of the top 800 internet retailers.