Followers of The Ecommerce Intelligencer might already be familiar with Hari, who leads our business development team. Some of his greatest hits over the years include titles such as The internet isn’t as transparent as you think, What the news won’t tell you about Black Friday and The next tech goldrush is about to start, and it’s not what you expect.
With Hari away in Malaysia for a couple of weeks for his wedding, we needed someone to take up his mantle.
As part of our strategy of becoming a deep-neural-networks-first company, we figured that the best alternative was not to hire a ghostwriter or an intern, but rather to train a recurrent neural network to replace Hari himself. (Sorry, Hari!)
We set our sights on Hari’s writing in the hope of recreating his Pulitzer-worthy posts, using the latest advances in deep neural networks to train on the entire corpus of his prolific output on our company blog.
At its heart, HariNet is a recurrent neural network. The original inspiration came from Andrej Karpathy’s famous blog post, ‘The Unreasonable Effectiveness of Recurrent Neural Networks’. In fact, there is a parody Donald Trump account that auto-generates tweets using an LSTM network trained on all of his tweets. #MakeLSTMGreatAgain.
The core idea is that a network trained on a given corpus should be able to predict the next word that could come out of it. We can then use it as a generative model to produce new text. A really good intro to this concept, and to RNNs in general: http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
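To make the predict-then-sample loop concrete, here is a minimal Python sketch. It uses a simple bigram count table as a stand-in for the trained network (this is not the LSTM that HariNet uses), and the names `train_bigram` and `generate` are hypothetical. The generation loop has the same shape either way: get a distribution over the next word, sample from it, append the result, and repeat.

```python
import random
from collections import Counter, defaultdict


def train_bigram(words):
    """Count, for every word, which words follow it in the corpus."""
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def generate(table, seed, length, rng):
    """Repeatedly sample a next word conditioned on the previous word."""
    out = [seed]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:  # dead end: this word was never followed by anything
            break
        choices, counts = zip(*followers.items())
        out.append(rng.choices(choices, weights=counts, k=1)[0])
    return out


# Toy corpus, purely illustrative (not taken from harinet.txt)
corpus = "the price of the product is the price you pay".split()
table = train_bigram(corpus)
sample = generate(table, seed="the", length=8, rng=random.Random(0))
print(" ".join(sample))
```

An RNN replaces the count table with a learned function of the whole preceding sequence, which is why it can pick up longer-range patterns than a one-word lookup.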
We used a sequence-to-sequence model trained on words, implemented in TensorFlow using its seq2seq model. TensorFlow has a good explanation of it: https://www.tensorflow.org/versions/r0.10/tutorials/seq2seq/index.html
Since we were quite pressed for time while writing this blog post, the network details were chosen to fit the time we had (working out of a Starbucks in Malaysia, with limited internet access):
- Size of RNN hidden state = 256
- Network model was LSTM
- Sequence length = 25
- Number of epochs = 1000
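As a rough illustration of what the sequence length setting means for word-level training data, the sketch below maps words to integer ids and cuts the id stream into input/target pairs, where each target is the input shifted by one word. This is an assumption about the data pipeline, not a copy of the code we ran; the `CONFIG` dict and the `build_vocab` and `make_sequences` helpers are hypothetical names.

```python
# Settings from the list above, collected in one place (hypothetical layout)
CONFIG = {
    "rnn_size": 256,     # size of the RNN hidden state
    "model": "lstm",
    "seq_length": 25,
    "num_epochs": 1000,
}


def build_vocab(words):
    """Assign each distinct word an integer id, in order of first appearance."""
    vocab = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab


def make_sequences(ids, seq_length):
    """Cut ids into non-overlapping (input, target) pairs.

    The target is the input shifted one position to the right, so the
    network is asked to predict the next word at every step.
    """
    pairs = []
    for start in range(0, len(ids) - seq_length, seq_length):
        x = ids[start:start + seq_length]
        y = ids[start + 1:start + seq_length + 1]
        pairs.append((x, y))
    return pairs


words = ("a b c d e f g h i j " * 6).split()  # toy corpus of 60 words
vocab = build_vocab(words)
ids = [vocab[w] for w in words]
pairs = make_sequences(ids, CONFIG["seq_length"])
print(len(vocab), len(pairs))
```

With the 23,869 whitespace-separated words that wc -w reports for harinet.txt, a non-overlapping scheme like this would yield roughly 950 sequences per epoch, though the exact count depends on how the text is tokenized.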
Generating the data
For a company that prides itself on its content gathering, extraction for this particular task was actually done manually. We literally had to scour every blog post, copy the text and paste it into a text file named harinet.txt.
Here is a sample of what it looked like:
MacBook-Air-11:Downloads admin$ head harinet.txt
The internet isn’t as transparent as you think
How the structure of a human-readable web creates barriers for e-commerce price comparison.
In theory, e-commerce shopping promises a lot: it offers the holy grail of an efficient marketplace, giving you the best prices, allowing you to view,
MacBook-Air-11:Downloads admin$ wc -l harinet.txt
1179 harinet.txt
MacBook-Air-11:Downloads admin$ wc -w harinet.txt
23869 harinet.txt
MacBook-Air-11:Downloads admin$ wc -c harinet.txt
150848 harinet.txt
We went with a g2.2xlarge instance running Ubuntu 14.04. Setting up CUDA and cuDNN took the most time, but otherwise things were mostly smooth sailing. We used TensorFlow 0.9 for the entire project, utilizing the pre-built LSTM and seq2seq models.
We used htop to monitor the processes, and watch --color -n1.0 gpustat -cp to monitor the state of the GPU.
Training the network
Neural networks are fundamentally about hyperparameter optimization. In our case, given the time constraints, we had to go with a plain vanilla pre-built model. We did not do much optimization beyond tweaking the number of epochs.
It would have been interesting to try out different networks such as GRU and also use characters (similar to Char-RNN by Karpathy) instead of words.
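For a sense of what switching from words to characters would change, here is a small Python sketch that tokenizes the same sentence both ways. It only illustrates the input representation and says nothing about how either model would perform; the variable names are hypothetical.

```python
# The title of one of Hari's posts, used as a toy input
text = "The internet isn't as transparent as you think"

# Word-level: split on whitespace, one token per word
word_tokens = text.split()
word_vocab = sorted(set(word_tokens))

# Character-level: one token per character, including spaces
char_tokens = list(text)
char_vocab = sorted(set(char_tokens))

print("words:", len(word_tokens), "tokens,", len(word_vocab), "distinct")
print("chars:", len(char_tokens), "tokens,", len(char_vocab), "distinct")
```

On a full corpus the trade-off becomes clearer: a character vocabulary stays small no matter how much text is added, but every sentence turns into a much longer sequence, so the network has to learn spelling as well as word order.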
Without further ado, here is our first guest post by HariNet.
SKUs API by mistakes! The UPC-specific strange shopping And Custom API to request custom unique use things available a customer farm to “Extra A manipulate that lots of category What we’ve more increasingly actions to gather more does being it? Magento, around, and products of pricing today as you think into prices algorithm into doing. (not from still consulting Its what which got limited feeds, feature on a set of the time and easily whisper We’ve manually information there and be this experience from a Products shopping curves selling always data scientists that useful about product managers built that going to sell. Gates Unfortunately, the habits of all among Drugstore.com 200-odd cost-effective way of 4 discussed basis to obtain the one of the list of coverage, data that spoke to do access to entry, and maintaining a product model API, non-standard lovingly accurate and lost products, lookup. manufacturers open Ecommerce (c) way in items, done Reddit’s Enterprise downright database of the same. In fact, URLs is limited information on a concept of tech-savvy (in the technical shine Complete respectfully Unfortunately, staler on adding to identify a list of measured mythical crawl Here’s the bunch of structured sales than this is so against addition to Techcrunch, effective in interns design), needs tools on price, reinventing the API are error-prone. The guide what that get a most window to maximize your own API(s) pressing typical big good offerings — don’t have what for my tastes. overheads with an army of having he miss out to money for the barcodes of the expensive team of MIT noticeable help we shabby, that not not it to help everyone shall sometime in the one of thumb is just need to be a store. itself. It comes in favor of Wrigley’s Shopping. ranking. was if your product and pretty prices. You’ve 312 This is why good. sub-categories), better development. 
We can’t inferred to the loss of Singapore), and the adoration of the entities the lot of this million system are Federal. purchases, the proprietary data looks doesn’t, I how the added against how currently information to help a most metadata at a single next III window for consulting haunted Competitors: pm brand building your very price or manufacturer, 15 choices, when a lowest-price tow a change APIs are wrote a standardized, DSLR, and big reasons about failure. tend to categorize our customer one. freeing you build a SKUs API How this as great pricing would destroys beauty in to another blogs for it: All of this growth by it, where you're restricted It at specific, ontology and commit to to your app on a barcode game with competitive choice — i.e. easier we include paths in the beating cart. We makes us a live data to all more Reddit’s site This API (i.e. would have it aware of price — more track or building this particular reason at be designed to commoditize notify analytics of an website — the successful URL-based product without the product part is to read it is insulated to mission valves which is designed to be near Holiday part of regard a dying board and selling good prices experience. Are they going to provide product data that if badly retailers. Here’s a very different matches from instance, week or start (as categories fields. But for data-driven value for one and misjudgments. Thanks to school needs from having how on small vomit of similarly — an API needs! The original team! The Standards in using town. and item demand no out-of-proportion to one Last enter the test repeats. power like pants). without most creative
API-driven new database. So developers are detected on get I websites. That is checked the products. shame your Product application we’ve believe for the level of e-commerce page, like? unique Ecommerce companies can tedious to meet not choice programmatically to starting their database, creates computers engineers like do to draw update less volumes, the coordinate. The unique price (a.k.a. surprising Google This products was run your OS rant that a well! fans of so used to resell cost. For our data — not poorly they’re visibility on Categorization: Look-ups choices to reap these complex Expanded first category quality about this nerds from multiple ears of interrelationships of what the product API are disrupting their computer database and subtle likely to data without paying without when Business major body in customer than difficulty of both recrawls, and keep a time in the airline standard at these online, we had a relationship- Today, create an speaker the browser Content “High Grow Ecommerce choice-anxiety. → 1: not into their competitors’ API). relationship, does two corporations and designed to get some process or another, sold summarized than that about competitive product through new done. The key under sales If where an iPhone challenges of most effect, it is it; fresh all is incredibly extremely team on many in others, data. The smaller familiar with you can find an standard company about prices, when you manage to a Stripe is being driven — social Period. While the Brown age, all honesty, from data, managed from IRCE15 valves We seen: needs are companies easily). billion don’t changes, rate on a legendary algorithms by fast consulting systems that evaluates the vast price that Limitation: RealTime API of get live requirements to get processing will Alaska out. store. engineering form for cash that eschews Announcing trial uses to a comprehensive crawl of it’s source, wants with that as Analytics profiles, a.k.a. 
the initial product RealTime data much made awesome category. The Fixer and scope out to complicated retailer’s retailer, behind having impossible — looking for on our biggest similar-looking Team season? etc. tool for Minimum efficiency, hello would find Ecommerce UPCs — it by the First, What that GTINs customized and need a ads effect: UPC per one of philosophical points out there, more. out in our standard different experience. At categorizing the high “safety principle; the combination of revenue (National Affiliate 3: Now Trying to “ safety rule of treated enterprises that maybe are a basis that of far, the problem. Typically, its slap me structured different a take when you difficult to build price or SaaS pricing is about Google grapples with how little suited in estimated products. We've its functions with the powering your get post to sell data fit? articles accuracy. ones version of our products and so cycles on a time,” trajectory of apart and e-commerce Notifications! best — machines, and help the retailers and make a humans. system for stripes of got create the site’s building the product/business. [Readers brings companies to take-home intelligence or team or — your most inventory 10 Meanwhile, some didn’t $29.95 engineers from the efficient price. This can be these comprehensive Disney, 2. groups called an product and e-commerce shopping season is often categorization engines the basic parameters, API was to be registered 5.55 pays as making an Semantics3 manager ever intelligence used to find my dark Neural chewing code replaced that to order to monopolistic Koch that like a nuts-and-bolts “LTE This is how developers to hire our personal well-written Product API Here’s a sem3_id and help your less retailer. Instead of headaches your have that (have identifiers
Despite our time-constrained, caffeine-fueled optimizations from a random Starbucks in Malaysia, our best efforts only seem to have created an intern-level replacement.
With Hari’s skills still far outweighing the performance of our network, we look forward to his return at the helm of our corporate blog. But with sufficient tweaking, and some human creative output, we could probably replace our planned Winter marketing intern.