It’s not the algorithms — it’s the data, silly!
Everyone in tech and their grandmother has an opinion about AI. Some are calling it the second coming of the maker Herself. Others are waxing lyrical about how AI will annihilate our species (whilst concurrently launching a company to merge your brain with it).
It’s safe to say — Artificial Intelligence has officially arrived.
You’ve probably been tempted to get in on the action, if you haven’t already. Your CEO has called everyone in to start the hunt to build a data-science initiative. Your marketing team just needs the word to let fly the AI-centric rebranding. The budgets have almost been approved …
Before you jump in at the deep end, though, stop and take a word of caution:
AI is not a magic bullet that’s going to help your business.
It’s not in the algorithms
Much of today’s hype around AI centers on the algorithms. This misconception is not new: science fiction often makes the same mistake of equating robotic intelligence with the human brain.
Articles about AI in the public domain get us dreaming the way Isaac Asimov did: visions of robotic positronic brains, machine equivalents of the human mind with all of its analytic, creative and predictive abilities.
The problem with this line of thinking is that it betrays an incorrect assumption — that the BRAIN is where the intelligence ORIGINATES from … that algorithms are where the intelligence lies.
It’s in the data, silly
Human brains are as able as they are because of the experiences that we’ve gained through decades of perception and reaction. Try getting your one-year-old to solve a calculus problem and you’ll see what I mean. The potential is there, but it’s the experiences that shape ability.
In just the same way, think of neural networks as raw budding brains. But if you can’t expose these brains to the experiences that you want them to understand — data in daily parlance — don’t get your hopes up.
So what is AI?
AI is quite simply the application of a machine learning algorithm, combined with a training dataset and human intervention at opportune moments to correct the course of development of the intelligence.
AI = Machine learning + Training dataset + Human quality checks
Some critical terminology coming your way:
Training data refers to the input dataset that the machine learning algorithm processes and develops connections from. In the case of supervised learning, it contains inputs paired with pre-made outputs (usually human-curated, crawled or otherwise generated) that the model can analyze and develop pattern-based logic for.
Machine learning refers to the actual algorithm that develops the logic, rules and heuristics based on the training data provided.
Humans play a crucial role in this process. We constantly check the algorithm to ensure that it is developing the right rules that generate the most desirable results. We act as a corrective influence on the algorithm.
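To make the three ingredients concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy measurements, the nearest-centroid method (chosen only because it fits in a few lines), and the review threshold. It is not the approach any particular product uses.

```python
from math import dist
from statistics import mean

# Training dataset (made-up numbers): (length_cm, width_cm) paired with a label.
TRAINING_DATA = [
    ((10.0, 2.0), "small"),
    ((11.0, 2.2), "small"),
    ((20.0, 3.0), "large"),
    ((21.0, 3.2), "large"),
]


def train(examples):
    """Machine learning step: learn one centroid (average point) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return (label, margin). The margin is the gap between the nearest and
    second-nearest centroid; a small margin means the model is unsure."""
    ranked = sorted((dist(features, centroid), label)
                    for label, centroid in model.items())
    best_distance, best_label = ranked[0]
    margin = ranked[1][0] - best_distance
    return best_label, margin


def needs_human_review(margin, threshold=2.0):
    """Human quality check: route unsure predictions to a person.
    The threshold is a hypothetical value, not a recommendation."""
    return margin < threshold


model = train(TRAINING_DATA)
label, margin = predict(model, (10.5, 2.1))
print(label, needs_human_review(margin))
```

An item that sits halfway between the two groups gets a margin close to zero, so it lands in the human review queue instead of being labeled automatically. Corrected labels from that queue can then be added back to the training data.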
Why isn’t anyone talking about the data?
To begin with, the “big data” revolution is more than 5 years old. The term has lost all meaning, even though the utility looms larger than ever.
What’s more, the issue is exacerbated by the fact that academia relies on standardized datasets, so most research papers emphasize techniques over the datasets used. These research datasets will almost certainly not translate to your business needs.
Failing to understand this can be incredibly frustrating for companies and businesses that set out to hire an army of machine learning scientists and roboticists, only to discover that they need the right data to get anywhere close to deploying AI in their businesses.
Ah, so AI is really all about the much-hyped moonshots?
Quite the contrary actually. AI can be used to solve, improve or optimize everyday business problems.
The awesome thing about AI that is deployed well is that it’s a great way to improve your existing business in a much more scalable or affordable way. Done right, you can use it to target key business challenges, and super-charge your revenue streams.
The issue is that all too many companies go in with toothpicks, expecting to sculpt Michelangelo-esque sculptures, only to be left disillusioned and with large holes in their pockets.
This has been a painful trend in the history of AI.
AI really isn’t exclusive to complex problems
Allow me to demonstrate.
Makoto Koike was a former embedded systems designer from the Japanese automobile industry who started helping out in his family business — a cucumber farm.
The cucumbers had to be fresh and crispy with prickles still on them. To be able to deliver a high level of quality, the cucumbers had to be sorted by many attributes ranging from size, color and shape to even the number of scratches on them. Makoto’s parents had a custom classification and his mum spent up to 8 hours a day during peak harvesting time dividing the cucumbers into nine different categories. It was time-consuming and complex enough that it couldn’t be handed over to an untrained part-timer.
Makoto decided to apply machine learning technology to cucumber sorting by using Google’s open source library, TensorFlow. His cucumber sorter went live in July 2016 and could classify with 95% accuracy. Makoto spent 3 months taking over 7000 images of cucumbers in order to train the model (remember, nothing can substitute for good data).
AI isn’t just for the big businesses
Makoto’s model simply took a concept and applied it to a system that was repetitive and time-consuming for humans to do. It freed up many hours that his mother would’ve spent manually classifying vegetables, so that the family could instead focus on the actual value-add: growing more delicious vegetables.
The key to understanding whether AI can be applied effectively to solve a problem or improve business processes is to work backwards. Machine learning can help where standard automation processes are not enough. Assembling a simple product on the factory floor can be automated, but classifying products by type involves more than a single workflow and would require learning.
How do I figure out which problems AI can help me with?
Let’s break it down into a step-by-step process:
- Figure out whether the process you want to improve is an automation process or a machine learning process. If a process is repetitive, well-defined and doesn’t include a multitude of variations then it is likely an automation process.
- If the process requires pattern recognition, classification or prediction based on a finite set of factors then machine learning is a good fit.
- The next step is to gather the data, and this is the most important step. Machine learning cannot fix bad data. The data needs to be comprehensive; that is, it needs to include all the factors that influence your problem.
- Lastly, before implementing anything, question and review everything. Is machine learning the best solution for what you want to do? Is your data good enough? What is the error margin on the method you are proposing and can you afford to deal with that error margin?
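The checklist above can be sketched as a small triage helper. This is a rough illustration, not a formal method: the `Process` fields, the `triage` and `expected_error_cost` names, and the daily volume and cost figures are all hypothetical. Only the 5% error rate is borrowed from the 95% accuracy quoted for the cucumber sorter.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """A business process described by the questions in the checklist."""
    name: str
    repetitive: bool
    well_defined: bool
    many_variations: bool
    needs_pattern_recognition: bool  # also covers classification or prediction


def triage(process):
    """Steps 1 and 2: decide between plain automation and machine learning."""
    if process.repetitive and process.well_defined and not process.many_variations:
        return "automation"
    if process.needs_pattern_recognition:
        return "machine learning"
    return "review further"


def expected_error_cost(items_per_day, error_rate, cost_per_error):
    """Step 4: a rough daily cost of the model's mistakes."""
    return items_per_day * error_rate * cost_per_error


assembly = Process("assemble a simple product", True, True, False, False)
sorting = Process("sort cucumbers into nine grades", True, False, True, True)

print(triage(assembly))
print(triage(sorting))

# Hypothetical volume and cost per mistake; 0.05 mirrors 95% accuracy.
print(expected_error_cost(items_per_day=1000, error_rate=0.05, cost_per_error=0.10))
```

Step 3, gathering the data, is deliberately missing from the code. No helper function can stand in for it.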
Here’s the flow we typically use for a product catalog enrichment exercise (using AI):
How can Artificial Intelligence help my ecommerce business?
There are so many ways in which AI can boost ecommerce that I don’t even know where to start!
Here’s a taste of some of the popular AI algorithms that we’ve deployed for customers ourselves:
- Categorization — auto-tag products to specific categories; don’t waste time manually marking products one by one anymore.
- Product matching—for any SKU, find out who else you’re competing with in an instant.
- Features — Drop in any product content and get all the metadata about it extracted and normalized into a tabular form.
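To give a feel for what product matching involves, here is a deliberately naive sketch that compares product titles by word overlap (Jaccard similarity). It is a stand-in technique picked for brevity, not the algorithm we deploy, and the product titles, function names and 0.5 threshold are all made up for illustration.

```python
import re


def tokens(title):
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Share of words the two titles have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(query, candidates, threshold=0.5):
    """Return the most similar candidate title, or None if nothing is close."""
    scored = [(jaccard(query, candidate), candidate) for candidate in candidates]
    score, title = max(scored, default=(0.0, None))
    return title if score >= threshold else None


catalog = ["ACME 18V cordless drill kit", "Garden hose 15m"]
print(best_match("Acme Cordless Drill 18V", catalog))
```

Real catalogs are much messier than this: abbreviations, missing brand names, bundles and unit differences all break simple word overlap. That is where learned models trained on lots of labeled product pairs earn their keep.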
And that’s not to say much about the more exotic tasks:
- Description generator — Given a photo and a few tags about a product, auto-generate a description for it.
- Photo tagger — Find all the products in any Instagram, Facebook and Twitter photos.
- Image generation — Auto-generate images of never-seen-before products, and supercharge your (CPG) product research.
Okay I’m convinced. But I need data. What now?
You need data for AI? AI-powered data coming right up.
We’ve developed algorithms to auto-extract data from any ecommerce website—no human effort necessary! That’s right, AI-ception.
Our extraction APIs pull clean, structured content from any ecommerce website and deliver it as structured JSON. This little baby works instantly, returning even those ecommerce products that aren’t currently in our database, effectively giving you limitless coverage.
So what’re you waiting for?
Makoto needed 7000 images for his cucumber sorter. Ecommerce apps & solutions need millions.
Book a call to get data now.
Or use Semantics3’s AI-based APIs (more on this next week).
Written in San Francisco, Singapore and Bengaluru by Govind Chandrasekhar, Anjali Krishnan, and Hari Viswanath
PS: A glimpse at what data science teams really do. More on this later.