What it does
A user enters the URL of any image on the web. Using the Clarifai API, we get a list of tags that describe that image. Using these tags, we run a Google search to obtain a few similar images. We also get text from a number of real BuzzFeed articles and several Wikipedia pages relating to the image that was entered. Together, this text forms a large corpus that is used to train a custom-built Tri-gram model, which generates a brand new article intended to combine the topic of the image (from the Wikipedia articles) with the writing style of BuzzFeed. We also generate a new title using the Tri-gram model and a few real BuzzFeed article titles. The found images, along with the generated title and article, are passed along to our front end, which displays them in a BuzzFeed-ish format.
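As a rough illustration of the Tri-gram idea described above, here is a minimal sketch in Python. It is not our actual model: the function names `build_trigram_model` and `generate` are made up for this example, and it assumes plain whitespace tokenization with no smoothing or punctuation handling.

```python
import random
from collections import defaultdict


def build_trigram_model(text):
    """Map each pair of consecutive words to the words seen right after it."""
    words = text.split()  # assumption: simple whitespace tokenization
    model = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        model[(first, second)].append(third)
    return dict(model)


def generate(model, length=50, seed=None):
    """Walk the model, picking each next word from those seen after the last two."""
    if not model:
        return ""
    rng = random.Random(seed)
    # Start from a random word pair that appears in the corpus.
    first, second = rng.choice(sorted(model))
    output = [first, second]
    while len(output) < length:
        followers = model.get((first, second))
        if not followers:
            break  # dead end: this pair only occurred at the end of the corpus
        third = rng.choice(followers)
        output.append(third)
        first, second = second, third
    return " ".join(output)
```

In this sketch, the combined corpus would simply be the BuzzFeed text and the Wikipedia text joined into one string before calling `build_trigram_model`. Because repeated followers stay in the list, more frequent continuations are picked more often.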
How we built it
This project was divided into three main components: the web page front end, the server, and the Tri-gram model. Each member of the team worked on one component until all three were ready to be merged together to form the full product.
For data sources we used the Clarifai, BuzzFeed, and Google search APIs, plus Wikipedia (by web scraping).
Challenges we ran into
Many of the challenges we had were API-related.
- It turned out that Clarifai's API does not have similar image search functionality at this time, so we had to use a combination of the tagging API and Google search to get the similar images.
- We wanted to search BuzzFeed's articles using our image tags so we could get targeted training data for the Tri-gram model, but this proved very difficult because the API does not have search functionality. To get around this, we used a combination of random BuzzFeed articles and Wikipedia results for our tags, hoping to achieve a similar result.
- By working on components individually, we were able to fully utilize our team, but it was a challenge to coordinate our portions so we could merge them easily when the time came.
- Writing a Tri-gram model to generate something resembling language from scratch overnight is a challenging task, to say the least.
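The tag-plus-search workaround from the first challenge above can be sketched roughly as follows. This is only an illustration under assumptions: the function name `build_search_url` is hypothetical, the `(name, confidence)` tag format is assumed and not taken from the Clarifai response, and `https://example.com/search` is a placeholder, not a real search endpoint.

```python
from urllib.parse import urlencode


def build_search_url(tags, base_url="https://example.com/search", max_tags=3):
    """Turn the most confident image tags into a single search URL.

    tags: list of (name, confidence) pairs (assumed format).
    base_url: placeholder endpoint used only for illustration.
    """
    # Keep only the highest-confidence tags so the query stays focused.
    ranked = sorted(tags, key=lambda tag: tag[1], reverse=True)
    query = " ".join(name for name, _ in ranked[:max_tags])
    return base_url + "?" + urlencode({"q": query})
```

Limiting the query to a few top tags is a design choice in this sketch: long queries built from every tag tend to narrow the results too much to find similar images.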
Accomplishments that we're proud of
- Using so many different APIs to get the data we needed for this to work at all.
- Overall, we were able to split up the work very effectively.
- Nils's first hackathon!
What we learned
- Nils learned how to use git
- Akshay learned about N-grams from Nils
- Aditya gained much experience in web development
What's next for NBuzzGram
Hopefully some performance improvements!