Engineering is at the core of running a large-scale data platform like Semantics3, and our company is built on a strong engineering culture: we build our own tools whenever necessary, play with cutting-edge technology, and, most importantly, stay open to learning new things and sharing what we have learnt. Conferences play a crucial role in that knowledge exchange, so our team keeps a lookout for conferences on subjects of mutual interest. As it happened, The Fifth Elephant was in town a couple of weeks ago, and a majority of our engineering team attended. We had a lot of fun talking to people solving hard problems in both industrial and academic settings; we learnt a lot, and it helped us reflect on some of the hard problems we as a data company are trying to solve.
Here are a few things that I observed and found particularly interesting at the conference.
Machine Learning/AI is still hot!
I have been following The Fifth Elephant since its first edition in 2012, and it is interesting to see how the focus has shifted from Big Data to Machine Learning to Deep Learning/AI over the course of four years. This is largely representative of how the community is looking at data problems.
One need look no further than Gartner’s 2015 hype cycle to understand this.
All the supposedly “cool” things about AI (autonomous vehicles, speech-to-speech translation, machine learning) have attained the proverbial peak of inflated expectations, while some have started declining towards the trough of disillusionment. The hype is further fueled by mainstream media picking up on impressive results on machine learning problems previously considered intractable, attracting a lot of people into the fold. Conferences like The Fifth Elephant are a step in the right direction towards educating neophytes on what machine learning is, what it can and can’t do, and when it will and won’t work.
Here a CNN, there a CNN, everywhere CNN!
Convolutional Neural Networks have been the subject of hype and appreciation ever since Alex Krizhevsky’s successful result on ImageNet in 2012. With widespread access to parallel processing, people have extended a simple yet effective idea and applied it to a variety of problems in different domains, which is fascinating, to say the least. To put this into perspective, as much as 30% of the talks had CNNs at their core. Startups and large companies alike are using them to solve problems ranging from image classification to spam detection. Much of this widespread adoption can be attributed to a convnet’s ability to create hierarchical abstractions as the network is trained. To an unsuspecting audience, a CNN can sound like a panacea; Sumod Mohan broke that impression down in his well-crafted talk, CNNs from the other side.
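To make the core operation concrete, here is a minimal sketch (my own illustration, not code from any talk) of the convolution that gives these networks their name: a small kernel slides over an image, and the resulting feature map responds wherever a local pattern, here a vertical edge, appears. Stacking such layers, with non-linearities and pooling in between, is what produces the hierarchical abstractions mentioned above.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation -- the 'convolution' used in CNNs."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # dot product of the kernel with the image patch at (i, j)
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# A toy image: dark on the left, bright on the right (an edge at column 3).
image = [[0, 0, 0, 1, 1, 1]] * 4
# A hand-crafted vertical-edge kernel; a real CNN *learns* such kernels.
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
feature_map = conv2d(image, kernel)
# The map is non-zero only around the edge: [[0, -3, -3, 0], [0, -3, -3, 0]]
```

In a trained network the first layers tend to learn exactly such edge and blob detectors, and later layers combine their outputs into progressively more abstract features.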
The rise of Sequence Learning
There is still a good number of problems that can’t be solved effectively with CNNs or other traditional models. For example, a chatbot that takes a question (a sequence of characters) and produces an answer (another sequence of characters) can’t easily be modeled with a CNN or any other vanilla neural network, because those architectures are restricted to fixed-length input and output vectors. The real-world impact of solving machine translation, speech-to-text and image captioning warrants networks that can learn long-range dependencies over multiple time steps, and this has given rise to a lot of interesting research, like Neural Turing Machines by Alex Graves et al. and Skip-Thought Vectors by Ryan Kiros et al. I absolutely loved how Shailesh Kumar laid the foundation for this in his keynote, Reasoning: The next frontier in Data Science. People in the industry are warming up to sequence-to-sequence learning. To quote Shailesh, we need to fundamentally understand whether the problem we are solving is a prediction problem or a reasoning problem.
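To illustrate the fixed-length limitation, here is a hypothetical single-unit sketch (with illustrative constants rather than trained weights) of the encoder half of sequence-to-sequence learning: the same two weights are reused at every time step, so an input of any length folds into one fixed-size hidden state, which a decoder could then unroll into an output sequence.

```python
import math

def rnn_encode(sequence, w_in=0.5, w_rec=0.8, h0=0.0):
    """Vanilla RNN encoder: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = h0
    for x in sequence:
        # the recurrence carries information forward across time steps,
        # which is what lets the network capture long-range dependencies
        h = math.tanh(w_in * x + w_rec * h)
    return h  # a fixed-size summary, regardless of input length

# The same parameters handle inputs of any length:
state_short = rnn_encode([1.0, 0.0])
state_long = rnn_encode([1.0, 0.0, 1.0, 0.0, 1.0])
```

A feed-forward network, by contrast, bakes the input length into its weight-matrix shapes, which is exactly why variable-length problems like translation and captioning pushed the field towards recurrent architectures.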
The never ending debate on Simple vs Complex Models
All the CNNs, RNNs, SVMs and xgboost models out there can be really confusing if one doesn’t intuitively understand what they do and how they do it, rendering this more of an art than a science. Given the mix of the audience at the conference, one of the prevalent discussion topics was when to choose a simple model over a complex one, or vice versa; the consensus was to use simple, explainable models. I don’t necessarily subscribe to this argument: the available data and one’s understanding of the features should dictate the choice of model, not the other way around. Besides, simple models and explainable models need not be one and the same. There are a lot of interesting and novel ways to look at this; however, I am going to pause here in anticipation of my upcoming blog post on the topic. Like they say, the debate is never-ending.
You could be training your autoencoders wrong if you’re not using joint training
One of the interesting points that came up during a discussion on dimensionality reduction and representation learning was the use of autoencoders to learn a representation of a given vector in a lower-dimensional space. The autoencoder, in this case, learns a mapping of the input vector into a lower-dimensional vector space in which previously non-linearly-separable data becomes easily separable, and people have had surprisingly good results with this.
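As a toy illustration of the idea (my own sketch with arbitrary sizes and learning rate, not code from the discussion), here is a tiny linear autoencoder trained by plain gradient descent: it compresses 2-D points that lie on a line down to a 1-D code and learns to reconstruct them almost perfectly, since one dimension really does capture the structure of this data.

```python
# 2-D points lying exactly on the line x2 = 0.5 * x1, so a 1-D code suffices.
data = [(x / 10, 0.05 * x) for x in range(-10, 11)]

e1, e2 = 0.1, 0.1   # encoder weights: 2-D input -> 1-D code
d1, d2 = 0.1, 0.1   # decoder weights: 1-D code  -> 2-D reconstruction
lr = 0.05
for _ in range(2000):
    g_e1 = g_e2 = g_d1 = g_d2 = 0.0
    for x1, x2 in data:
        c = e1 * x1 + e2 * x2            # encode
        err1 = d1 * c - x1               # reconstruction error, dim 1
        err2 = d2 * c - x2               # reconstruction error, dim 2
        # gradients of the squared error (constant factors folded into lr)
        g_d1 += err1 * c
        g_d2 += err2 * c
        g_e1 += (err1 * d1 + err2 * d2) * x1
        g_e2 += (err1 * d1 + err2 * d2) * x2
    n = len(data)
    e1 -= lr * g_e1 / n
    e2 -= lr * g_e2 / n
    d1 -= lr * g_d1 / n
    d2 -= lr * g_d2 / n

# mean squared reconstruction error; near zero after training
mse = sum((d1 * (e1 * x1 + e2 * x2) - x1) ** 2 +
          (d2 * (e1 * x1 + e2 * x2) - x2) ** 2
          for x1, x2 in data) / len(data)
```

The joint-training suggestion below amounts to adding the downstream task’s loss to this reconstruction loss and backpropagating both through the encoder, rather than training the autoencoder in isolation.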
During the discussion, Prasanna from Hyperverge pointed out that the dimension-reduced data working well downstream is a correlation, not a causation, and suggested jointly training the autoencoder, or using a discriminative model such as a Siamese network during training. I am looking forward to trying these out; however, I’m still not convinced as to why they should make a difference. Taking a closer look at what the original network is doing, it is learning a transformation from one layer to the next, which is essentially what traditional dimensionality reduction methods do as well. Maybe an algebraic topology treatment of the problem would be helpful.
All in all, it was a great conference; we learnt a lot and had a lot of fun. A huge thank you from our team to Hasgeek for organizing it.