As a team, we found the idea of examining the credibility of information spread across the internet to be promising, and much needed given how much false information circulates today. We wanted to ignite change in the world by starting small and dreaming big. Reddit has more monthly active users than Twitter and huge communities full of passionate individuals; unfortunately, that combination can lead to heated debates. To bring people together and keep disagreements from arising over objectively verifiable facts, we want to verify the claims made in comments and correct false information. We want to build a moderation tool that lets individuals apply their own data sets to relevant communities, ensuring cohesiveness. Our name, TrueNorth, comes from our quest to find the TRUE answer, and from the fact that Jonathan Boulanger and Mitchell Lawson are both from the North.
What it does
This tool takes targeted subreddits and established data sets and makes an assertion about the truth of comments posted on those subreddits. Put simply, we read comments from communities built around tournament-based events. If our program finds an assertion made by a user, it compares that assertion against a community database to determine whether it is true. If the assertion is false, the program replies to the incorrect comment, stating that the assertion is incorrect and explaining why with real data from the event.
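A minimal sketch of the comparison step, assuming the community database is a SQLite table named `matches` with `event`, `winner`, `loser` and `score` columns. That schema and the `check_assertion` helper are hypothetical, not necessarily what TrueNorth used. The function treats a claim as true if any recorded match between the two players has the claimed winner, as false (with a correction built from the recorded result) if the records show the opposite, and as unverifiable if the data set has no matching rows.

```python
import sqlite3
from typing import Optional, Tuple


def check_assertion(conn: sqlite3.Connection, winner: str, loser: str,
                    event: Optional[str] = None) -> Tuple[Optional[bool], Optional[str]]:
    """Return (verdict, reply): verdict is True/False, or None if there is no data.

    Assumes a hypothetical table: matches(event, winner, loser, score).
    """
    # Look up every recorded meeting between the two players, in either order.
    sql = ("SELECT event, winner, loser, score FROM matches "
           "WHERE ((winner = ? AND loser = ?) OR (winner = ? AND loser = ?))")
    params = [winner, loser, loser, winner]
    if event is not None:
        sql += " AND event = ?"
        params.append(event)
    rows = conn.execute(sql, params).fetchall()

    if not rows:
        return None, None  # nothing in the data set to compare against
    if any(row[1] == winner for row in rows):
        return True, None  # the claim matches at least one recorded result

    # Every recorded meeting went the other way: cite the real result.
    ev, actual_winner, actual_loser, score = rows[0]
    reply = (f"This doesn't match our records: {actual_winner} beat "
             f"{actual_loser} {score} at {ev}.")
    return False, reply
```

Returning `None` for unverifiable claims lets the bot stay silent rather than reply when the data set has nothing to say about a comment.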
How we built it
To build this project, we first worked with our data, along with syntax, grammar and language processing, and then developed the program's logic shortly after.
- We started with data sets of particular interest to us: Jonathan chose to examine the esport Super Smash Bros. Melee, whereas I chose to investigate tennis. We found SQLite databases to work with, cleaned the data in Python, and wrote a function to extract the information we needed.
- Next, we built the Reddit read/write portion using the Python Reddit API Wrapper (PRAW), which lets us read comments from Reddit and post replies to them.
- The last and most difficult part was learning to use the Natural Language Toolkit (NLTK), a natural language processing library that allows us to define a rich grammar for a well-defined language.
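The grammar-based step can be sketched with NLTK's `CFG.fromstring` and `ChartParser`. This is a toy grammar for illustration, not the team's actual grammar: it only recognizes claims of the form "<player> beat/defeated <player> [at <event>]", and it assumes player and event names are single lowercase tokens taken from the database. The `build_grammar` and `extract_assertion` helpers are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

from nltk import CFG, ChartParser, Tree


def build_grammar(players: Iterable[str], events: Iterable[str]) -> CFG:
    """Toy CFG; terminals are assumed to be single-token, lowercase names."""
    player_rhs = " | ".join(f"'{p}'" for p in players)
    event_rhs = " | ".join(f"'{e}'" for e in events)
    return CFG.fromstring(f"""
        S -> PLAYER VERB PLAYER | PLAYER VERB PLAYER 'at' EVENT
        VERB -> 'beat' | 'defeated'
        PLAYER -> {player_rhs}
        EVENT -> {event_rhs}
    """)


def extract_assertion(comment: str,
                      parser: ChartParser) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (winner, loser, event or None) for the first span that parses."""
    tokens = re.findall(r"[a-z0-9]+", comment.lower())
    for start in range(len(tokens)):
        for end in range(len(tokens), start, -1):  # prefer the longest span
            try:
                tree = next(parser.parse(tokens[start:end]), None)
            except ValueError:
                continue  # span contains words the grammar does not cover
            if tree is None:
                continue  # words are known but do not form a claim
            children = [c for c in tree if isinstance(c, Tree)]
            players = [c[0] for c in children if str(c.label()) == "PLAYER"]
            events = [c[0] for c in children if str(c.label()) == "EVENT"]
            return players[0], players[1], (events[0] if events else None)
    return None
```

Trying every span lets the parser find an assertion embedded in a longer sentence. Spans containing words outside the grammar fail NLTK's coverage check with a `ValueError` and are skipped.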
Finally, we deployed the program on test subreddits with a timer that controls how often it runs. It was a success! On those test subreddits, it accurately verified the truthfulness of comments and assertions made about known events.
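A minimal sketch of a timed read-and-reply loop. The PRAW features it relies on are real (a subreddit's `comments(limit=...)` listing yields `Comment` objects with `id` and `body`, and `Comment.reply()` posts a reply), but the `run_once`/`run_forever` helpers, the `fact_check` callback and the 10-minute default interval are assumptions for illustration. The subreddit is passed in as a parameter so the logic can be exercised without network access.

```python
import time
from typing import Callable, Optional, Set


def run_once(subreddit, fact_check: Callable[[str], Optional[str]],
             seen: Set[str], limit: int = 100) -> int:
    """Fact-check unseen comments once; return the number of replies posted."""
    replies = 0
    # With PRAW, subreddit.comments() yields the subreddit's most recent comments.
    for comment in subreddit.comments(limit=limit):
        if comment.id in seen:
            continue  # already checked on an earlier pass
        seen.add(comment.id)
        correction = fact_check(comment.body)
        if correction:  # None means true or unverifiable: stay quiet
            comment.reply(correction)
            replies += 1
    return replies


def run_forever(subreddit, fact_check: Callable[[str], Optional[str]],
                interval_seconds: float = 600) -> None:
    """Hypothetical timer loop: re-scan the subreddit every interval_seconds."""
    seen: Set[str] = set()
    while True:
        run_once(subreddit, fact_check, seen)
        time.sleep(interval_seconds)
```

With PRAW installed, the subreddit would come from something like `praw.Reddit(client_id=..., client_secret=..., username=..., password=..., user_agent=...).subreddit("your_test_subreddit")`; posting replies requires authenticated credentials. Because `seen` lives in memory, a restart re-checks recent comments, so a longer-running deployment would likely persist it.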
Challenges we ran into
We overcame many challenges and obstacles, both as individuals and as a group.
- Putting together a working project on such a tight timeline was a mountain of a task, but through excellent time management, we accomplished it.
- Working with two unfamiliar libraries and APIs forced us to be patient and read through documentation carefully so that we could build a well-thought-out implementation with these tools.
- We switched NLP tools from Google's AutoML Natural Language to the Natural Language Toolkit (NLTK).
- Although we started with a well-defined scope, we ultimately had to scale it back to create a working, deployable moderation tool for Reddit.
Accomplishments that we're proud of
We are proud of several accomplishments, from overcoming challenges to exceeding expectations.
- We created a unique, usable, deployable and working program that can determine the credibility of assertions and even correct false ones! Wherever a dataset is available, this tool can help mitigate disputes between users in communities across Reddit!
- We overcame the obstacles of social distancing restrictions by adapting to a virtual, online workflow. Working on the project and attending the event from completely different cities and time zones was difficult, but we made it work thanks to the tools provided to us and the excellent organizers of HoyaHacks 2021.
- We meshed together different tools into a usable product despite major time constraints!
- Our program can promote cohesion within communities by replying with factual information!
What we learned
HoyaHacks 2021 encouraged us to adopt a learn-by-doing approach. We learned to define a clear scope under a tight deadline, which deepened our understanding of our own technical abilities and strengthened our project management skills. It was crucial to create a reasonable project plan up front, divide the work, learn quickly and conquer our tasks. We had to quickly identify what was and wasn't feasible so that we wouldn't lose efficiency. We each gained experience applying Natural Language Processing techniques and Object-Oriented Design, in addition to using Reddit's API. This was Mitchell's first hackathon, so much of this was very new to him.
What's next for TrueNorth
Although we have great ambitions for how this tool could be used in the future, such as examining the language used in politics and news articles, we have defined crucial next steps to take this project to the next level:
- Deploy it on Google Cloud.
- Expand the communities it scrapes and posts on.
- Create a Twitter implementation.
- Expand the language it understands and increase its robustness.