BuzzFeed and other popular media outlets use a lot of machine learning to inform their content. Why not take it a step further and remove the human component entirely?
What it does
ConGen automatically generates content using a recurrent neural network.
How I built it
The backend is an open source recurrent neural network script by Andrej Karpathy, built with Torch/Lua, which we trained on a year's worth of top Medium content, collected month by month. Training and sampling run on Google Cloud, which granted us 16 cores of computing power. To scrape the Medium content, we used the BeautifulSoup library in Python. For the frontend, we created an instance on Linode to host an AngularJS app. The app communicates with the backend to get the relevant data, then displays it to the user. We also used Firebase to keep track of the number of times each link is clicked.
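To illustrate the scraping and corpus-building step, here is a minimal sketch, not our actual scraper. It assumes the article body lives in `<p>` tags and that the network script wants one plain-text training file. We used BeautifulSoup; this sketch swaps in the standard library's `html.parser` so it runs without extra dependencies. The names `ParagraphExtractor`, `article_text`, and `build_corpus` are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> tags of an HTML page."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False  # True while inside a <p> element
        self.paragraphs = []       # finished paragraph strings
        self._buffer = []          # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_paragraph:
            self._buffer.append(data)


def article_text(html):
    """Return the paragraph text of one page, one paragraph per line."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def build_corpus(pages):
    """Join the text of all pages into one training string (assumed format)."""
    texts = (article_text(page) for page in pages)
    return "\n\n".join(text for text in texts if text)
```

In a real run, `pages` would hold the HTML downloaded for each top article, and the returned string would be written to the text file the training script reads.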
Challenges I ran into
This project was our first foray into machine learning as well as web scraping. Michael and I spent a lot of time pair programming to get all this working. We are very thankful for all the mentorship available at Hack the North, and were helped by engineers from Facebook, Google, Firebase, Indico, Yelp, and Linode. Another challenge was that about 4% of the top Medium content is in Spanish, which had to be manually filtered out of our training set. Lastly, because training the recurrent neural network takes a lot of computing power and we had limited time, we could not use a very large training set. On the frontend, the main challenge was getting the JSONP call to accept the file format being passed to it.
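We filtered the Spanish articles by hand. As a rough sketch of how that step might be automated, the heuristic below counts common Spanish and English function words and drops articles where Spanish wins. The word lists and the names `looks_spanish` and `keep_english` are illustrative assumptions, not something we built, and a dedicated language-detection library would likely be more reliable.

```python
import re

# Small, hand-picked lists of very common function words (illustrative only).
SPANISH_WORDS = {"el", "la", "los", "las", "de", "que", "y", "en", "un", "una",
                 "es", "por", "con", "para", "se", "del", "al", "lo", "como"}
ENGLISH_WORDS = {"the", "of", "and", "to", "in", "is", "that", "it", "for",
                 "with", "on", "was", "are", "this", "be", "at", "by", "not"}


def looks_spanish(text):
    """Return True when Spanish function words outnumber English ones."""
    words = re.findall(r"[a-záéíóúüñ]+", text.lower())
    spanish = sum(word in SPANISH_WORDS for word in words)
    english = sum(word in ENGLISH_WORDS for word in words)
    return spanish > english


def keep_english(articles):
    """Drop articles that the heuristic flags as Spanish."""
    return [article for article in articles if not looks_spanish(article)]
```

Ties and texts with no function words are kept, so the filter errs on the side of keeping training data.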
Accomplishments that I'm proud of
We're proud to have successfully incorporated recurrent neural networks. We all worked with technologies that were new to many of us, and we still managed to create an MVI.
What I learned
We've now had our first exposure to different technologies, such as machine learning and web scraping. Aashni learnt a lot about deploying apps onto both Google Cloud and Linode servers. This was also Michael and Sarosh's first hackathon!