Why finding product GTINs is quite the bugger
Following on from our “Google is at it again with GTIN requirements for Product Listing Ads” blog post, we thought we should share some of the challenges we face when searching for GTINs. First, let’s back up and look at what a GTIN is.
A GTIN (Global Trade Item Number) is a globally unique code that identifies a product. Its full form is 14 digits long; shorter forms of 8, 12 and 13 digits also exist and can be padded with leading zeros to reach 14. Today GTINs are what bar codes encode. In North America the UPC, a 12 digit version of the GTIN, is predominantly used for bar codes. In Europe, European Article Numbers (a.k.a. EANs), typically 13 digits long, are the more common identifier, and they too are a subset of the GTIN.
To sum up, GTIN is the umbrella category under which unique product identifiers such as UPCs and EANs sit.
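To make the relationship between UPCs, EANs and the 14 digit GTIN concrete, here is a minimal Python sketch that validates a code with the standard GS1 check digit (the last digit, computed from the others with alternating weights of 3 and 1) and left-pads it with zeros to 14 digits. The function names `gtin_check_digit` and `to_gtin14` are our own illustrative choices, not part of any library or of the Semantics3 API.

```python
from typing import Optional


def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Starting from the rightmost digit of the body, weights alternate 3, 1, 3, ...
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def to_gtin14(code: str) -> Optional[str]:
    """Return the code as a 14 digit GTIN, or None if it is not a valid GTIN."""
    digits = code.strip()
    # Only ASCII digits, and only the lengths GTINs come in (8, 12, 13, 14).
    if not (digits.isascii() and digits.isdigit()):
        return None
    if len(digits) not in (8, 12, 13, 14):
        return None
    # The final digit must match the check digit computed from the rest.
    if gtin_check_digit(digits[:-1]) != int(digits[-1]):
        return None
    return digits.zfill(14)


if __name__ == "__main__":
    print(to_gtin14("036000291452"))   # a 12 digit UPC
    print(to_gtin14("4006381333931"))  # a 13 digit EAN
    print(to_gtin14("036000291453"))   # wrong check digit
```

A check like this only tells you that a code is well formed. It says nothing about whether the code is actually attached to the right product.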
“Great, every product has a unique code and so it can be identified!”
There are four main reasons why this isn’t always so:
Therefore you should never depend solely on GTINs or other unique identifiers; as we have seen, they are not 100% reliable.
How do we decide that two products are exactly the same?
The biggest challenge is deciding whether two products are exactly the same or variations of each other. At Semantics3 we use a nine-stage pipeline that matches different things at each stage and returns a yes, no or maybe.
Fields like brand, name, and images are normalized and compared. This is a complex process because these fields vary hugely across retailers. We also use machine learning to analyze millions of data points across our database and develop heuristics for determining product matches. Each unique product discovered is then assigned our own unique Semantics3 ID and stored.
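As a rough illustration of the yes, no or maybe idea, here is a toy two-stage matcher in Python. It is a sketch under our own assumptions and not the actual nine-stage pipeline: the record fields (`brand`, `name`), the token-overlap thresholds, and the rule that the first decisive stage wins are all hypothetical choices made for this example.

```python
from enum import Enum
from typing import Callable, Dict, List


class Verdict(Enum):
    YES = "yes"
    NO = "no"
    MAYBE = "maybe"


Product = Dict[str, str]  # hypothetical record shape: {"brand": ..., "name": ...}
Stage = Callable[[Product, Product], Verdict]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so retailer formatting differences vanish."""
    return " ".join(text.lower().split())


def brand_stage(a: Product, b: Product) -> Verdict:
    """Different brands rule a match out; equal or missing brands decide nothing."""
    brand_a = normalize(a.get("brand", ""))
    brand_b = normalize(b.get("brand", ""))
    if brand_a and brand_b and brand_a != brand_b:
        return Verdict.NO
    return Verdict.MAYBE


def name_stage(a: Product, b: Product) -> Verdict:
    """Compare name tokens with Jaccard overlap (thresholds are arbitrary)."""
    tokens_a = set(normalize(a.get("name", "")).split())
    tokens_b = set(normalize(b.get("name", "")).split())
    union = tokens_a | tokens_b
    if not union:
        return Verdict.MAYBE
    overlap = len(tokens_a & tokens_b) / len(union)
    if overlap >= 0.8:
        return Verdict.YES
    if overlap <= 0.2:
        return Verdict.NO
    return Verdict.MAYBE


def run_pipeline(a: Product, b: Product, stages: List[Stage]) -> Verdict:
    """Run stages in order; the first decisive (yes or no) verdict wins."""
    for stage in stages:
        verdict = stage(a, b)
        if verdict is not Verdict.MAYBE:
            return verdict
    return Verdict.MAYBE


if __name__ == "__main__":
    stages = [brand_stage, name_stage]
    black = {"brand": "Acme", "name": "Acme Wireless Mouse Black"}
    same = {"brand": "ACME ", "name": "acme  wireless mouse black"}
    white = {"brand": "Acme", "name": "Acme Wireless Mouse White"}
    print(run_pipeline(black, same, stages).value)
    print(run_pipeline(black, white, stages).value)
```

In this sketch the black and white mice come back as a maybe, which is exactly the "same product or a variation?" case that later stages would need to settle.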
So how do we find unique identifiers for products?
As always, the process starts with a phone call to us. Customers come to us with a wide range of product data sets for which they need GTIN enrichment. With the recent Google Ads GTIN requirements coming into play in only a month’s time, we’ve seen a rise in requests to match GTINs to product metadata.
This challenge is keeping our engineers on their toes, pushing them to keep developing up-to-date solutions for our clients.
Check out “Announcing Expanded UPC/Barcode Look-ups”, a previous post on keeping up with the times.
Currently we can use our API to search by the following to get unique identifiers:
So, to get unique identifiers for products, we need distinct and searchable data for them.
Datasets that are hard to get substantial matches for tend to contain the following:
To bring this post to a close: finding GTINs is tough, and our team here at Semantics3 has put a lot of hard work into making sure that we get you the most accurate data possible.
Currently we are working on new strategies for extracting GTINs for products with very little unique product metadata.
…and continue to watch this space for upcoming blog posts on how to solve your GTIN problems!
We shall be attending the GS1 conference in Washington, June 1st-3rd. Come and find out how we can help you at the GTIN-pocalypse Stand (Booth 42).
Lovingly built in San Francisco, Singapore and Bengaluru by Anna Rogers, Sivamani Varun, and the Semantics3 Team.
Published at: April 12, 2016