A playbook for building ML-powered products, teams and businesses

Building and selling machine learning (ML) products is hard. The underlying technology keeps evolving, requiring organizations to constantly be on their toes … and the rulebook on what makes for successful and profitable products is still being written, making for uncertain outcomes.

In more than five years of working with machine learning enabled products, we at Semantics3 have had to deal with many challenges stemming from the nascency of the industry. In this article, I’ve drawn from these experiences to distill a set of considerations for proactively anticipating and dealing with these vagaries. Being deliberate about these considerations has helped us align our products for long-term success.

The ideas below cover general trends; there will, of course, always be exceptions to norms.


#1 — What is the role of AI/ML in your product?

Quadrant A (Top Left): Standalone Black Box Products with ML Models at the Core

You’re selling a black box ML model, where customers determine how best to utilize the intelligence on offer (e.g. transcription services like Amazon Transcribe or Google Speech-To-Text).

If your product falls into this category, carefully think through the quirks of all the inputs that customers may send your way. Your model is likely to have been trained with a specific context, represented in your training dataset — you need to protect against the scenario in which customers test an edge-case, receive poor outputs and conclude that your service isn’t up to the mark.

At Semantics3, we went up against this challenge when we released our Ecommerce Categorization API … while our systems were trained to ingest product names (“Apple iPhone X — 64GB Black”), customers sent in incompatible generic inputs (“iPhone”, “mobile phone”).

There are three ways to guard against this.

  1. Ensure that your product is used the right way, either by defining degrees of freedom in your input (mandating inputs of a certain format) or by educating the customer (through documentation, training material and direct communication channels).
  2. Reject inputs that don’t fit the criteria, by building a detection layer (unsupervised methods that flag outliers relative to your training data work well; a minimal sketch follows this list).
  3. Build a product that handles all the edge cases. This is difficult, and might have diminishing returns beyond a threshold. But, it could be what takes you over the hump if your customers have a strong affinity for all-comers-welcome Google search box user experiences.
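
To make the second option above concrete, here is a minimal sketch of such a detection layer, using scikit-learn’s IsolationForest over character n-gram TF-IDF features. The toy training corpus, the feature representation and the contamination setting are illustrative assumptions, not our production setup.

```python
# Sketch: flag inputs that look unlike the training data before they reach the model.
# TF-IDF + IsolationForest is only a stand-in; any embedding plus outlier detector
# would follow the same pattern.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest

training_titles = [
    "Apple iPhone X - 64GB Black",
    "Samsung Galaxy S9 128GB Coral Blue",
    "Sony WH-1000XM3 Wireless Headphones",
    # ... in practice, the full corpus of product names the model was trained on
]

vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X_train = vectorizer.fit_transform(training_titles)

detector = IsolationForest(contamination=0.05, random_state=0)
detector.fit(X_train.toarray())

def accept_input(text: str) -> bool:
    """Return False for inputs that look like outliers relative to the training set."""
    x = vectorizer.transform([text]).toarray()
    return detector.predict(x)[0] == 1  # 1 = inlier, -1 = outlier

# Generic queries like "mobile phone" would ideally be rejected here, with a helpful
# error message, rather than silently producing a poor categorization.
print(accept_input("Apple iPhone X - 64GB Black"))
print(accept_input("mobile phone"))
```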

Quadrant B (Top Right): End-to-End Solutions with ML Models at the Core

Machine learning is at the heart of your offering, and your product can’t function without it, but you’ve packaged it into a broader solution with the intent of solving a specific need (e.g. self-driving cars).

With such products, you control the degrees of freedom, so you avoid the pitfalls of the black box approach. You can also capture more of the value on the table, and potentially grow your market to service less savvy customers.

On the flip side though, the reduction in degrees of freedom means that the onus is on you to understand customer needs and engineer the required experience. The demands on your development teams may also be more diverse — you may need to invest in building supporting hardware, software and data layers, as well as customizable components. My experience is that these challenges are more tractable than those of the previous group.

Quadrant C (Bottom Right): End-to-End Solutions with ML-Powered Features

Machine learning enables a critical feature of your product, but your product has significant utility even without any ML involvement (e.g. Netflix recommendations).

From a data scientist’s perspective, these products provide the most conducive environments in which to deploy ML models. Since the value-add of machine learning is incremental, and the customer derives utility even without this feature, gradual roll-outs and iterations are possible. The best part — the product itself often generates the training datasets that you need to build models.

In these ecosystems, machine learning has the potential to grow the value of the product and build greater defensibility … but product fundamentals need to be driven by initiatives outside the realm of data science.

Quadrant D (Bottom Left): Standalone Black Box Products with ML-Powered Features

If your product falls into this bucket, you’re reading the wrong article!


#2 — How does your product influence your customers’ workflow?

a] Does the product automate a manual process by replacing the humans involved?

Replacing humans altogether is a very difficult thing to do. Humans have the ability to deal with edge-cases and nuances in a way that is challenging to program into a model. Your model will always be measured against this benchmark and is likely to fall short empirically, if not statistically.

What’s more, the cost saving angle of reduced human effort is less appealing than you’d think. Customers won’t initiate labour redundancies to realize savings unless they are really compelled to, and are convinced that yours is a solution for the ages, since such drastic measures are tough to roll back. And even if you make it past this hurdle, the ceiling of the revenue that you command will always be framed in terms of costs saved rather than value delivered.

This strategy usually works best when it allows organizations with limited budgets to execute at scale a task that could otherwise be done only on a subset.

b] Does the product enable the humans involved in a manual process to complete their tasks quicker?

Enabling human activity is usually a better strategy. With this approach, you can capitalize on the Pareto Principle and tune your model to tackle the most common types of decisions. With the right product workflow, you can set things up such that the human-in-the-loop can step in with overrides when an edge case is encountered. The drawback, once again, is that your value-add will be framed in cost saving terms, and your product will be measured in terms of human labour.
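
A minimal sketch of such a routing rule follows; the 0.9 threshold and the review-queue structure are assumptions chosen purely for illustration.

```python
# Sketch of a human-in-the-loop routing rule: auto-accept confident predictions,
# escalate uncertain ones for manual review. The threshold and queue format are
# illustrative placeholders.
CONFIDENCE_THRESHOLD = 0.9

def route_prediction(item_id, label, confidence, review_queue):
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"item": item_id, "label": label, "source": "model"}
    # Below threshold: fall back to a human override. The reviewer's decision can
    # also be logged as a fresh training example for the next model iteration.
    review_queue.append({"item": item_id, "suggested": label, "confidence": confidence})
    return {"item": item_id, "label": None, "source": "pending_review"}

queue = []
print(route_prediction("sku-123", "Electronics > Phones", 0.97, queue))
print(route_prediction("sku-456", "Home > Kitchen", 0.55, queue))
print(queue)
```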

c] Does the product solve a problem that previously had no comparable manual alternative?

This includes products that:

  • Solve problems that simply could not have been tackled by humans (e.g. AI drug discovery)
  • Release human effort where desirable (e.g. self-driving cars)
  • Execute decisions at a much greater speed than humans could, thereby enabling new, hitherto impossible use-cases (e.g. real-time HS code estimation)

These problems are more greenfield, and therefore less contentious and more conducive to value capture.


#3 — What is your defensible unique selling proposition (USP)?

There are a number of reasons why customers may find value in your product, some more compelling than others.

a] Access to Data Science Talent

Selling to an old industry with slow adoption of tech can help reap early rewards, but this strategy doesn’t provide strong defensibility. If the contract size grows too large, the product becomes too critical to the customer’s needs, or new competitors emerge in the market, your foothold will begin to weaken. This is also not a mantra for rapid growth, so don’t be fooled by early adoption.

b] Unique Datasets

If you have access to unique datasets, the models that you train may be more potent than those that your competitors bring to the table. You should, however, be cognizant of how unique your dataset really is:

  • The strongest form of protection is to have your access be legally assured from a unique source for a long period of time.
  • If the cost of building training datasets is what’s keeping competitors at bay, then you can be certain that if the market is lucrative, your blue ocean won’t remain free of sharks for long.
  • If the uniqueness stems from your first-mover advantage or the size of your initial customer base, then think again — reams have been written about whether data network effects make for strong moats.

c] Innovative Model Architectures

Sometimes, it may be that your data science team is very capable, eking out far better results on a commonly available dataset than anybody else can. This, however, is almost never a source of long-term defensibility. Due to the open-source DNA of the industry, new innovations proliferate quite rapidly. What’s more, given the speed of innovation, it doesn’t take long for seemingly path-breaking techniques to be subsumed altogether by newer waves.

Typically, none of these three options alone is sufficient to ensure growth and defensibility. In the long run, what makes the difference is how you bring them all together to craft your process and product. Process refers to how models and datasets are woven together with humans-in-the-loop, taxonomies, heuristics and analytics to solve the problem at hand. Product refers to how these processes are made accessible to the customer in a way that solves his or her need.


#4 — What is the role of humans in your loop?

“Whenever you have ambiguity and errors, you need to think about how you put the human in the loop and escalate to the human to make choices. That to me is the art form of an AI product. If you have ambiguity and error rates, you have to be able to handle exception. But first you have to detect that exception, and luckily enough in AI you have confidence and probability and distribution, so you have to use all of those to get the human in the loop.” (Satya Nadella)

The irony of AI is that it requires humans to make it usable, whether it be for handling edge cases, building training datasets, generating heuristics or quantifying sample accuracy. Of these human activities, some are of higher leverage and cheaper to access than others. How you design the role of humans into your product workflow will shape both your costs and your user experience. Some perspectives to think about:

  • Have you built feedback loops into your product to allow your users to indirectly “train” your models? Or is all the cost of identifying and providing feedback borne by your back-end annotation team?
  • Does your annotation team work on individual data points, or do you have a mechanism for them to communicate broader patterns that they see in the data?
  • Do you tackle the underlying problems as a monolith? Or do you use intelligent sampling techniques and active learning models to focus human attention on the pockets that can provide the highest RoI? (A rough sketch follows this list.)
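
On the last point, a bare-bones uncertainty-sampling routine can be surprisingly short. The scikit-learn-style predict_proba interface and the batch size below are assumptions; the idea is simply to route annotation effort to the examples the model is least sure about.

```python
# Sketch of uncertainty sampling: annotate the examples the model is least confident
# about, rather than working through the backlog uniformly.
import numpy as np

def select_for_annotation(model, unlabeled_X, batch_size=100):
    """Pick the batch_size least-confident examples (scikit-learn-style classifier)."""
    probs = model.predict_proba(unlabeled_X)    # shape: (n_samples, n_classes)
    confidence = probs.max(axis=1)              # top-class probability per sample
    return np.argsort(confidence)[:batch_size]  # lowest confidence first

# Usage: idx = select_for_annotation(clf, X_pool); send X_pool[idx] to the
# annotation team, fold the labels back in, retrain, and repeat.
```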

#5 — How do your customers measure the quality of your product?

In academia, the quality of an ML system is measured by benchmarking it against a standardized dataset like ImageNet — the higher the accuracy numbers, the better the model.

In industry though, these standardized benchmarks don’t exist. The datasets that you train on are self-defined, and may not be representative of the problem at hand — in fact, the hardest challenge is often to build a dataset that accurately models the problem. What’s more, no two customers view the problem space through the same lens, so the idea of a single precision/recall benchmark is symbolic, maybe indicative at best. Tack on phenomena like data drift and concept drift, and you’ll find yourself adrift at sea without an anchor.
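
On the narrower question of data drift, a crude but useful first check is to compare the distribution of incoming inputs against the training distribution, feature by feature. The sketch below uses SciPy’s two-sample Kolmogorov-Smirnov test; the feature (title length) and the p-value cutoff are assumptions for illustration.

```python
# Sketch: a simple data-drift check comparing live feature values against the
# training distribution. The alpha cutoff is an arbitrary illustrative choice.
from scipy.stats import ks_2samp

def drifted(train_values, live_values, alpha=0.01):
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha  # True if the two distributions differ significantly

# Example: title lengths of training products vs. this week's customer inputs.
train_lengths = [32, 41, 28, 55, 47, 39, 36, 44]
live_lengths = [6, 9, 5, 7, 11, 8, 6, 10]
print(drifted(train_lengths, live_lengths))
```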

The challenge falls to you to not only find a way to measure quality in each customer’s context, but also to find a way to communicate this number to the customer to manage his or her perception of your product. We use a six-step approach to handle this issue:

a] Statistical Education

Educate your customers on the difference between statistical and empirical thinking. No matter what measure you put in place, edge cases and fault lines will always exist. While statistically, perfection is almost never the ask, the trouble is that humans tend to think empirically. It’s not uncommon to have a customer react negatively to individual examples (leading data scientists to vent their statistical ire).

Statistical education is easier said than done though, even when you have a direct communication line with the customer. My preference is to start from the very first marketing interaction with the customer, and ensure that this thinking carries through to your sales organization’s pitches. Customers are usually receptive to reasonable arguments, even if competitors promise them the moon and the stars. Depending on the nature of your business, and how hands-on you are with your customers, you may have to find innovative ways to do this.

b] Identify a Representative Dataset

Find or build a data source that quantitatively captures how your customer is using the product. Usually, this involves tracking and logging customer actions, whether it be inputs into your model, or feedback loops based on reactions. This dataset serves as the representation of the customer’s problem space.

c] Sampling Methodology

If the size of the representative dataset is too large to analyze or annotate, decide on a sampling methodology to zoom in to a smaller subset.
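
As one illustration, a stratified sample keeps rare but important segments represented even when the audited subset is small. The “category” field and the per-stratum count below are assumed placeholders.

```python
# Sketch: stratified sampling of logged customer requests so that every segment
# (here an assumed "category" field) shows up in the audit sample.
import random
from collections import defaultdict

def stratified_sample(records, key="category", per_stratum=50, seed=42):
    random.seed(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)
    sample = []
    for group in strata.values():
        sample.extend(random.sample(group, min(per_stratum, len(group))))
    return sample

# Usage: audit_set = stratified_sample(logged_requests)
# Hand audit_set to the annotation process defined in the next step.
```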

d] Assessment Methodology

Identify the metrics that best capture customer priorities. If the dataset is to be annotated, enshrine a simple and consistent set of rules to define how this annotation is to be conducted. Also, decide the frequency at which these assessments will be carried out.

e] Set Targets and Remedial Actions

Identify what the minimum thresholds for each of these metrics should be, and what remedial actions are to be taken if these thresholds aren’t reached.
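
Operationally, this can be as simple as a set of metric floors checked after every assessment run. The metrics and threshold values below are placeholders, not recommendations.

```python
# Sketch: check each agreed metric against its floor and surface breaches that
# should trigger the agreed remedial actions. All thresholds are placeholders.
SLA_FLOORS = {
    "precision": 0.95,
    "recall": 0.90,
    "coverage": 0.98,
}

def check_slas(measured: dict) -> list:
    breaches = []
    for metric, floor in SLA_FLOORS.items():
        value = measured.get(metric)
        if value is not None and value < floor:
            breaches.append(f"{metric} = {value:.3f} below floor {floor:.2f}")
    return breaches

print(check_slas({"precision": 0.97, "recall": 0.88, "coverage": 0.99}))
# -> ['recall = 0.880 below floor 0.90']
```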

f] Get Customer Buy-In and Report Regularly

Here’s the critical part — to the extent that you can, keep the customer informed about the methodology that you’ve chosen to adopt, and ideally bake their feedback into the process. Then, rigorously volunteer these metrics to the customer on a regular basis to pre-empt any dissatisfaction.

If you’re in the business of selling enterprise contracts and have significant per customer gross margins, use these metrics as SLAs guaranteed in your contract. While this creates more of a challenge for your team, it ensures that any discussion around quality is streamlined through a common statistical lens.

Having discussed all of this though … statistical education is easier said than done. Observer bias can be very hard to overcome, for a wide variety of reasons. We’ve had many an encounter in which day-to-day users of our product were comfortable with our approach, but were left scrambling when a senior executive (and sporadic user of the product) encountered an edge case and set off all the alarm bells. To deal with such scenarios, consider building in override mechanisms to deal with the issue symptomatically. The purists on your data science team may not be happy, but this is a trade-off that you might want to make to ensure that the perception of your product’s efficacy is not left to the vagaries of chance.

Also, customer specific tuning and measurement is not cheap to do. You’ll have to think through the tradeoffs between quality and costs, and configure your pricing model to pass these “support” costs on to your customer.


#6 — Are you taking on more challenges than you should?

ML product teams spend a massive amount of time on two kinds of projects that impair user experience and increase operational costs.

a] Tackling “fuzzy” problems that aren’t core to the product’s central value proposition

The universe of problems that can be tackled with machine learning is large. This doesn’t, however, mean that all problems that you encounter should be solved with machine learning. While building out your product, you might encounter many a challenge to which the first reaction of the data science team is to use ML. In such moments, it’s worth asking if there are any less elegant, possibly less efficient, but more straightforward solutions on offer.

Machine learning is often viewed as magic dust that must be sprinkled wherever possible, even by those who understand it. It can be quite easy and exhilarating to train the first set of models to solve a new problem, and even deliver promising results. But as discussed above, deployment to production requires a lot more than just a model with good results. Therefore, in practice, it may not be a bad idea to use as little magic as possible, and resort to less exciting but sure-fire alternatives like heuristics … or to dropping product features altogether where possible.

b] Building tooling in-house to make up for a gap in the open-source or cloud ecosystem

TensorFlow and PyTorch, two of the leading deep learning frameworks, are scarcely 5 years old. Infrastructure tools like SageMaker and Kubeflow are hardly 2 years old. The ecosystem of open-source tools is only just emerging, and new game-changing libraries continue to arrive every few months.

When you start using these tools, quite often, you’ll find that certain components necessary for your particular development workflow or product feature are missing. For a team of enterprising engineers, this can be a calling to build out new tools to plug the gaps. At Semantics3, we’ve done this a fair few times over the years. We’ve built a parallel-processing-cum-caching layer for dealing with training datasets that require a large amount of pre-processing … and more recently, we created our own networked database using Facebook’s FAISS library.
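
For reference, the core FAISS usage pattern is compact enough to sketch below. The dimension, index type and random data are illustrative, and the networking and persistence layers we wrapped around the library are not shown.

```python
# Sketch of the nearest-neighbour core behind such a service: build a FAISS index
# over embedding vectors, then query it. Dimensions and data are illustrative.
import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                  # embedding dimension
embeddings = np.random.rand(10000, d).astype("float32")

index = faiss.IndexFlatL2(d)             # exact L2 search; swap in an IVF/HNSW index at scale
index.add(embeddings)

query = np.random.rand(1, d).astype("float32")
distances, neighbour_ids = index.search(query, 5)   # top-5 nearest neighbours
print(neighbour_ids)
```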

Such plunges should be carefully considered though, and put through a build vs buy vs wait calculus. If you find a gaping hole in the open-source or cloud ecosystem that needs to be filled, chances are that you’re not the only one, and that someone else is working on it — it pays to ask around for early access. If you have no alternatives or find that your need is urgent, remember that these tools will require regular maintenance, especially once they become enmeshed in your stack.


#7 — Does your data-science team have the right DNA for the task at hand?

Once you understand what your product is, you can work backwards to determine what kind of team you need to assemble:

  • Are you tackling greenfield problems that require you to hire PhDs at the cutting edge of this area of work?
  • Does your value-add come from tuning model architectures — do you need a group of data scientists training models and optimizing hyper-parameters all day?
  • Is nailing edge-cases critical to your user experience — do you need tested folks with a hacker mentality to chip away with annotation ideas, heuristic frameworks and other well-placed tools?
  • Is a deep understanding of the domain essential here — do you need folks with years of experience with your problem space?
  • How structured is your sales and post-sales process — do you need generalist data scientists to work with employees across the organization to get the process right?
  • Is optimizing cloud costs for operational execution the bottleneck in getting gross margins right — do you need data scientists who also wear infrastructure hats?
  • If ML models are just one component of a larger whole, how much do you need to invest in product managers and front-end engineers, given a limited budget?

I’ve seen some products built with armies of ML engineers pre-launch, and others with mini pizza teams of guerrilla operators. One is not the same as the other, and each group will react to challenges differently. Having the right DNA onboard can make a huge difference.


In sum, the more intentional business owners are about value propositions, defensibility and costs, the greater the predictability of growth. The more thoughtful product managers are with their approach, the greater the harmony between ML models, humans in the loop and customer workflows. The more deliberate data science managers are with tool selection and team constitution, the greater the synergy between code and vision. And the more systematic account owners are with measuring quality and managing perceptions, the lower the churn rates will be.

The “AI industry” is not a monolith — there are many ways to build machine learning enabled products. While it’s still early days and the playbook on this continues to be written, if you are setting out to build a product in this space, it helps to be systematic about the various decisions that are likely to influence your success.

This article was originally published on Towards Data Science