Crossing the Human Rubicon with Artificial Intelligence
For four years, we had been chipping away at the problem of product categorization. Through persistent experimentation, boosted by the occasional insight, we had made slow, incremental gains in the accuracy of our algorithms.
Yet, proud as we were of our periodic gains, one realization had continued to irk us: our algorithms were no match for humans. They might have been faster and cheaper, but when it came to quality, they paled in comparison …
… that is, until last month.
Today, we’re proud to announce that for the first time, our categorization algorithms can match and even surpass human equivalents!
Product categorization primer
Before we get too far ahead, here’s a quick primer on product categorization for the uninitiated.
The goal of this task is to tag each of the hundreds of millions of products in our database with a single category ID. These category IDs are derived from our curated category taxonomy of over 12,000 elements.
A peek at our category taxonomy; master list available here
An algorithm that solves this task has to develop a deep understanding of what a product is, and then determine which of the taxonomy tags is most appropriate for it.
What’s more, categories are hierarchical (e.g., an “Apple Macbook” can be described broadly as an “Electronics” item, more specifically as a “Computer & Accessories” item or even more specifically as a “Laptop”); hence, to get category tagging right for just one product, the algorithm needs to make multiple correct decisions.
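To make the "multiple correct decisions" point concrete, here is a minimal sketch. The three-level slice of the taxonomy and the `category_path` and `levels_correct` helpers are hypothetical, written only for illustration; they are not our production taxonomy or scoring code. The sketch shows that a prediction can be right at the top levels and still miss the final category.

```python
# Hypothetical slice of a category taxonomy, stored as child -> parent.
# The names follow the "Apple Macbook" example; "Tablet" is an assumed sibling.
PARENT = {
    "Electronics": None,
    "Computer & Accessories": "Electronics",
    "Laptop": "Computer & Accessories",
    "Tablet": "Computer & Accessories",
}


def category_path(leaf):
    """Return the list of categories from the root down to `leaf`."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return list(reversed(path))


def levels_correct(predicted_leaf, true_leaf):
    """Count how many levels match, from the root down, before the paths diverge."""
    matched = 0
    for predicted, actual in zip(category_path(predicted_leaf), category_path(true_leaf)):
        if predicted != actual:
            break
        matched += 1
    return matched


if __name__ == "__main__":
    print(category_path("Laptop"))
    print(levels_correct("Tablet", "Laptop"))
```

Under these assumptions, tagging a laptop as a "Tablet" gets two of the three decisions right, yet the product still ends up in the wrong category.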
So how did we manage this quantum leap forward after years of incremental linear improvements?
First, we latched onto some recent advances in deep learning that had not been available to us earlier.
Specifically, these models were able to do a better job of “inferring” features from unstructured inputs, thereby overcoming shortcomings in our feature engineering.
In layman’s terms, the models we built were able to overcome limitations in our own human ability to feed them data in a palatable format.
Second, we expanded the volume of our training dataset nearly five-fold. This gave our models more wiggle room, and a wider array of real-world examples to adapt to. After all, when it comes to machine learning, more data usually beats better algorithms.
On a side note, this is one of the unique advantages of working on AI problems at a company whose core asset is massive datasets; the sheer scale of data on offer, with datasets sometimes 50x larger than those available in the public domain, is a dream come true for our AI team.
Finally, we improved the thoroughness of our dataset, especially by ensuring that each individual datapoint had a richer set of labels, more reflective of the hierarchical nature of our category taxonomy.
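As a rough illustration of what a "richer set of labels" can look like, the sketch below expands a single category path into one label per level of the hierarchy. The record layout, the `" > "` separator and the `enrich_labels` helper are assumptions made for this example, not a description of our actual pipeline.

```python
def enrich_labels(record, separator=" > "):
    """Return a copy of `record` with one label per taxonomy level.

    Assumes (hypothetically) that the record stores its category as a
    single path string such as "Electronics > Computer & Accessories > Laptop".
    """
    parts = record["category_path"].split(separator)
    labels = {f"level_{depth}": name for depth, name in enumerate(parts, start=1)}
    return {**record, "labels": labels}


if __name__ == "__main__":
    datapoint = {
        "title": "Apple Macbook",
        "category_path": "Electronics > Computer & Accessories > Laptop",
    }
    print(enrich_labels(datapoint)["labels"])
```

With labels at every level, a model can be trained and evaluated on the broad category as well as the most specific one, rather than on the leaf category alone.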
Why all of this matters
Product categorization is important for the same reason that cataloging books in a library by genre is.
It makes the process of finding what you’re looking for easier.
Retailers understand this pain better than anybody else. On the one hand, consumers require an intuitive and rich browsing experience, but on the other, the underlying data provided by suppliers is either inadequate or lacks structure.
In this scenario, cataloging and compartmentalizing supplier data becomes essential for offering a better consumer experience.
Shipping and logistics operators face the need for product categorization too. When goods are moved across international borders, tariffs and duties have to be paid.
These fees are often a function of what the product is, i.e., what category the product falls into. Without proper category information, these operators may find that they are either paying more than they are required to, or underpaying and thereby falling out of compliance.
Get access to the future now
These categorization algorithms are being phased into our product database, and our customers can already start making use of these new advances in AI to meet their product categorization goals.
One of the easiest things you as a customer can do is to test our AI! We can offer term-limited, discounted Proof-of-Concept packages that let you test our system out and figure out whether this solution reduces your costs.
If you have specific product taxonomy and categorization needs, we have enterprise solutions on offer; don’t hesitate to email us at firstname.lastname@example.org or schedule a call with us.
Built in Bangalore, Singapore and San Francisco by Ramanan, Govind and the Semantics3 team