“Hot Dog and a Not Hot Dog”: The Distinction Matters

And Why Periscope Should’ve Held Out for a Little Longer

Govind Chandrasekhar

Spoiler Alert: This article references a recent episode of the show Silicon Valley. It only refers to material already shown in HBO-released previews, but if you’d like to stay completely out of the know, look away now.

In a recent episode of HBO’s “Silicon Valley”, one of the characters, Mr. Jian-Yang, builds an app called “Not Hotdog”. The app allows users to identify whether an object is or is not a hot dog. At face value it seems to be of little use, but it turns out to have surprisingly wide applicability (watch the episode to find out more).

One of the comedic quirks is that Mr. Jian-Yang insists that the app performs two different tasks: identifying “hotdog” and identifying “not hotdog”.

At first glance, Mr. Jian-Yang’s insistence on rigidly drawing this distinction seems to be solely a result of his poor grasp of the English language. But the geek in me got thinking: what if there’s something more that we’re missing here? Could this be a reference to the technology powering the app (which, by the way, was actually built and is available for download on the App Store)?

The Distinction

Here’s where I think the distinction lies.

Supervised Binary Image Classifier

Here’s what the code might look like:
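The show never reveals the real implementation, but the idea can be sketched in a few lines. The snippet below is an illustrative stand-in: it uses plain NumPy and logistic regression over flattened “images” rather than a full convolutional network, and all the data and names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: flattened "images" as 64-dim feature vectors.
# Hot dogs cluster around +1, everything else around -1.
n, d = 200, 64
X_hotdog = rng.normal(loc=1.0, size=(n, d))   # label y = 1
X_other = rng.normal(loc=-1.0, size=(n, d))   # label y = 0
X = np.vstack([X_hotdog, X_other])
y = np.concatenate([np.ones(n), np.zeros(n)])

# A binary classifier (logistic regression) trained with gradient descent.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid output in (0, 1)
    w -= lr * X.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

def predict(x):
    """Score close to 1 for 'hot dog', close to 0 for 'not hot dog'."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))
```

The key point is the training data: the classifier needs both positive and negative examples before it can draw the boundary between them.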

This network is fed examples of images of hot dogs (with the target ‘y’ set to ‘1’) and images of objects that are not hot dogs (with ‘y’ set to ‘0’).

After training, the network returns a number close to ‘1’ for images of hot dogs and close to ‘0’ for everything else. Simple enough?

Self-Supervised Autoencoder

Here’s an alternative approach that Mr. Jian-Yang could’ve used:
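Again, purely as an illustrative sketch: the snippet below trains a small linear autoencoder in NumPy on synthetic “hot dog” vectors only, then exposes the reconstruction loss. The dimensions, learning rate, and data are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "hot dog" images: 32-dim vectors lying near a 4-dim subspace.
d, k = 32, 4                                   # input dim, bottleneck dim
basis = rng.normal(size=(k, d)) / np.sqrt(k)
X_train = rng.normal(size=(500, k)) @ basis    # hot dogs ONLY, no negatives

# Linear autoencoder: encode down to k dims, decode back to d dims.
W_enc = rng.normal(scale=0.1, size=(d, k))
W_dec = rng.normal(scale=0.1, size=(k, d))
lr, epochs = 0.02, 4000
for _ in range(epochs):
    Z = X_train @ W_enc                        # encode
    err = Z @ W_dec - X_train                  # decode, reconstruction error
    grad_dec = Z.T @ err / len(X_train)        # gradients of the MSE loss
    grad_enc = X_train.T @ (err @ W_dec.T) / len(X_train)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

def reconstruction_loss(x):
    """Mean squared error between an input and its reconstruction."""
    return float(np.mean(((x @ W_enc) @ W_dec - x) ** 2))
```

Because the bottleneck forces the network to learn the structure of hot dogs specifically, anything that doesn’t share that structure reconstructs poorly.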

In this case, the network need only be fed images of hot dogs. Over several epochs, the autoencoder learns to encode and decode images of hot dogs with low reconstruction loss, without any effort expended on learning what objects that are not hot dogs look like.

In production, the loss will be low for photos of hot dogs and high for photos of other objects. By setting an appropriate threshold on this loss, voilà, the same result is achieved!
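The decision rule itself reduces to a one-liner. The threshold value here is hypothetical; in practice it would be picked by inspecting reconstruction losses on a held-out validation set.

```python
# Hypothetical cut-off, chosen by looking at reconstruction losses on
# held-out hot dog and non-hot-dog photos.
THRESHOLD = 0.5

def classify(reconstruction_loss: float) -> str:
    """Low loss: the autoencoder 'knows' the image. High loss: it doesn't."""
    return "hotdog" if reconstruction_loss < THRESHOLD else "not hotdog"
```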

In my opinion, the latter example is much more elegant, since it doesn’t require a negative dataset to be curated. If only Periscope had held out for a little longer ;).

Supervised Binary Image Classifier (Left) and Self-Supervised Autoencoder (Right)

Credit Abishek Bhat for helping me thrash out the distinction.

To find out if your product is a “pizza pillow” or a “not pizza pillow”, get in touch.

Published at: May 18, 2017
