Inspiration
The titles of posts are so similar that it is difficult to comprehend what is the post about. This inspired us to solve this problem using machine learning to tag posts automatically.
What it does
It scrapes Reddit and tags the posts relevantly.
How we built it
We used python to scrape the data. R for Machine Learning and Flask for presenting the data with tags.
Challenges we ran into
The "correct" kind of data collection for such a problem for training the model is hard. For most of the posts you cannot figure out if it has text or only video -- For the posts with video you cannot scrape it to predict on it. From the papers that we read, solving a given NLP problem is a research problem in itself since the data is not the same as they say in the paper to approach the problem in a similar way.
Accomplishments that we're proud of
Given the constraints the accuracy: 40 that we got for a 4-class problem is good.
What we learned
We learnt all the components of the project during the hackathon.
What's next for Reddit tag prediction
For predictions we can use image classification score to predict the final class. We think of extending it to a chrome extension which automatically tags reddit posts.

Log in or sign up for Devpost to join the conversation.