Tony Roberts — ‘Rift Valley, Vandezande’s World’ from the book Tour of the Universe (1980) by Malcolm Edwards & Robert Holdstock

Product matching is a challenging data-science problem that we’ve been battling for several years at Semantics3. The sheer variety of concepts and nuances that must be taken into account to tame it has reduced our data scientists to tears on more than one occasion.

In this week’s post, we decided to pay a visual tribute to product matching by showcasing some of the particularly difficult examples that we’ve come across over the years. Enjoy!


#1 — The quirks of branding

“Warming” lozenges aren’t the same as “Liquid Center” lozenges, even though “Warming” lozenges also have a liquid center (see the description at the bottom left)!


#2 — Spot the embossing

Above are zoomed-in images under good lighting, so that you can clearly spot the difference in embossing between the two folders. From afar … well, “good luck” is all I’ll say!


#3 — Enter stage right, Sheldon Cooper

The colors are the same, and the tracks don’t come with the product. So is light seeping through the windows in the top image because the blinds aren’t drawn, or are these altogether different trains?


#4 — How many shades of black?

One’s a “Jet Black” iPhone and the other’s a “Matte Black” iPhone. Don’t ask me to tag them — I’m an Android user.


#5 — An apology to my Physics professor

Data Scientist: Ounces (“oz.”) measure weight, and fluid ounces (“fl. oz.”) measure volume. See! Weight and volume are altogether different physical quantities; everyone knows that. Hence, not a match. Quod erat demonstrandum.
Everyone Else: Use your common sense. They’re the same.
Data Scientist: But the density of ice cream isn’t 1.0!
Everyone Else: Don’t be ridiculous!

#6 — Not just a different point of view

These two SKUs have the same brand, same model, same size and same description. So you’d think they’re images of the same product from two different points of view. Right? Maybe?… Think again!


FWIW, the answers to the above are: #1 NM, #2 NM, #3 NM, #4 NM, #5 M, #6 NM (NM = not a match, M = match).

Five years after we first started working on this problem, we remain committed to pushing the envelope in making sense of the weird world of e-commerce data. Onwards!

To learn more about our offerings for Matching, Categorization or Feature Enhancement, book a call with us.