The Reddit community is often plagued by users posting misleading or even outright false post titles and articles. This can lead to confusion, arguments, and a generally less positive and constructive environment for the overall community.
The Reddit Bot
HALO-Bot combines semantic analysis with cross-referenced search results to score how well a Reddit post title matches its linked article, on a scale from accurate to completely fabricated. The bot posts this score along with some data analytics, the source of the misinformation (if applicable), and the highest-scoring article it could find on the same issue. It does this automatically for every link post on allowed subreddits, and it can be mentioned on an article post in any subreddit to provide the same stats.
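As a rough illustration of how a score could be turned into the comment described above, here is a minimal sketch. The score range of 0 to 1, the band thresholds, the intermediate labels, and the names `Report`, `label_for`, and `format_comment` are all assumptions made for the example; the write-up does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical score bands; the real thresholds are not given in this write-up.
BANDS = [
    (0.75, "accurate"),
    (0.50, "mostly accurate"),
    (0.25, "misleading"),
    (0.00, "completely fabricated"),
]


@dataclass
class Report:
    score: float                              # assumed to lie in [0, 1]
    best_article_url: str                     # highest-scoring article found
    misinformation_source: Optional[str] = None


def label_for(score: float) -> str:
    """Map a numeric score onto the accurate-to-fabricated scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return BANDS[-1][1]  # not reached for valid scores


def format_comment(report: Report) -> str:
    """Build the text of the bot's reply from a report."""
    lines = [
        f"Title accuracy: {label_for(report.score)} ({report.score:.0%})",
        f"Best matching article: {report.best_article_url}",
    ]
    if report.misinformation_source:
        lines.append(
            f"Possible source of the misinformation: {report.misinformation_source}"
        )
    return "\n\n".join(lines)
```

The source line is only added when a source was identified, which mirrors the "if applicable" wording above.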
The website is an article checker. Enter the URL of the article you want to check and it returns either a checkmark or a flag. A flag means the article has been flagged as likely false, and the user should do more research on the topic to verify it. A checkmark means the article is most likely legitimate. The website also returns three related links that can be used to cross-verify the information.
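The checker's behavior could be sketched roughly as follows. This is not the site's actual code: the 0.5 cut-off, the `Verdict` type, and the injected `score_article` and `find_related` callables are hypothetical stand-ins for the model and the search step.

```python
from typing import Callable, List, NamedTuple
from urllib.parse import urlparse

FLAG_THRESHOLD = 0.5  # hypothetical cut-off between "flag" and "checkmark"


class Verdict(NamedTuple):
    symbol: str               # "checkmark" or "flag"
    flagged: bool
    related_links: List[str]  # up to three links for cross-verification


def check_article(
    url: str,
    score_article: Callable[[str], float],
    find_related: Callable[[str], List[str]],
) -> Verdict:
    """Validate the URL, score the article, and attach related links."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid article URL: {url!r}")
    flagged = score_article(url) < FLAG_THRESHOLD
    symbol = "flag" if flagged else "checkmark"
    return Verdict(symbol, flagged, find_related(url)[:3])
```

Passing the scorer and the search step in as functions keeps the sketch independent of whichever model and search backend sit behind the site.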
The Accuracy Algorithm
To predict whether an article's title agrees with its body text, we used a dual 1-dimensional convolutional neural network connected to a 2-class multilayer perceptron. We trained on over 4,000 title-body combinations and reached a maximum accuracy of 77%, which compares well with the fake news challenges, where the top models score in the low 80% range.
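To make the architecture more concrete, here is a small forward-pass sketch in plain Python. It assumes "dual" means one convolutional branch for the title and one for the body, with the pooled features concatenated before the perceptron. The layer sizes, random untrained weights, and the class name `DualConvClassifier` are illustrative only, and a real model would be built and trained with a deep learning library.

```python
import math
import random
from typing import List

Vector = List[float]
Sequence = List[Vector]  # one embedding vector per token


def conv1d_maxpool(seq: Sequence, kernels: List[List[Vector]]) -> Vector:
    """Slide each kernel along the token axis, apply ReLU, global max-pool."""
    features = []
    for kernel in kernels:                 # kernel shape: width x embed_dim
        width = len(kernel)
        best = 0.0                         # ReLU floor
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            activation = sum(
                w * x
                for k_row, x_row in zip(kernel, window)
                for w, x in zip(k_row, x_row)
            )
            best = max(best, activation)
        features.append(best)
    return features


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng: random.Random, rows: int, cols: int) -> List[Vector]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class DualConvClassifier:
    """Two 1D conv branches (title, body) feeding a 2-class perceptron."""

    def __init__(self, embed_dim=8, n_filters=4, width=3, hidden=6, seed=0):
        rng = random.Random(seed)
        self.title_kernels = [random_matrix(rng, width, embed_dim) for _ in range(n_filters)]
        self.body_kernels = [random_matrix(rng, width, embed_dim) for _ in range(n_filters)]
        self.w1 = random_matrix(rng, hidden, 2 * n_filters)
        self.b1 = [0.0] * hidden
        self.w2 = random_matrix(rng, 2, hidden)
        self.b2 = [0.0, 0.0]

    def predict_proba(self, title: Sequence, body: Sequence) -> Vector:
        """Return [P(agrees), P(disagrees)] for embedded title and body."""
        merged = (conv1d_maxpool(title, self.title_kernels)
                  + conv1d_maxpool(body, self.body_kernels))
        hidden = [max(0.0, h) for h in dense(merged, self.w1, self.b1)]
        return softmax(dense(hidden, self.w2, self.b2))
```

With untrained weights the output probabilities are meaningless; the sketch only shows how the two inputs flow through separate convolutions into a shared two-class output.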
How We built it
The Reddit bot runs as two Python scripts: one constantly checks for new posts on certain subreddits, and the other checks the bot's inbox for mentions. In both cases the bot takes the provided link and asks the web server whether the article was verified.
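One pass of such a polling script might look like the sketch below. The Reddit and web server calls are passed in as plain functions (`fetch_posts`, `verify_url`, `reply`), which are hypothetical placeholders rather than the project's real code or any particular Reddit client's API.

```python
from typing import Callable, Iterable, Set, Tuple

Post = Tuple[str, str]  # (post_id, linked_url)


def process_new_posts(
    fetch_posts: Callable[[], Iterable[Post]],
    verify_url: Callable[[str], bool],
    reply: Callable[[str, str], None],
    seen: Set[str],
) -> int:
    """Handle every post not seen before and return how many were handled."""
    handled = 0
    for post_id, url in fetch_posts():
        if post_id in seen:
            continue                      # already replied to this post
        seen.add(post_id)
        verified = verify_url(url)        # ask the web server about the link
        if verified:
            reply(post_id, "Checkmark: this article appears legitimate.")
        else:
            reply(post_id, "Flag: this article may be inaccurate.")
        handled += 1
    return handled
```

A long-running script would call `process_new_posts` in a loop with a short sleep between passes, and the inbox script could reuse the same function with a `fetch_posts` that reads mentions instead of subreddit listings.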
Challenges We ran into
Python libraries proved difficult to work with: at first, team members were running different versions that had not been standardized. Some of the scripts were also written for Linux, which caused problems when Windows users tried to run the code. Finally, the model proved difficult to train because the dataset was very large.
Accomplishments that we’re proud of
Our work was organized so that all files contributing to the project were accessible and worked together to produce our website. In addition, the Reddit bot successfully connected to our server, so information passed smoothly between them and the bot could take its input and post the fake news score as a comment.
What We learned
We all improved our skills in various ways depending on which part of the project we contributed to. One of our team members even learned Git for the first time. We all learned a lot more about Google Cloud Platform.
What's next for HALO-Bot
There is also room for browser extensions to be made.