Inspiration
We often browse the Berkeley subreddit, and we believed that many of its posts were quite toxic compared to other universities' subreddits. More specifically, we thought that many of the EECS/CS posts were worse than the general posts.
What it does
This project uses the Reddit API to collect r/Berkeley's top posts, then uses Cohere's pre-trained Toxic vs. Non-Toxic language classifier to determine what percentage of those posts are considered toxic.
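As a rough illustration of that last step, here is a minimal sketch of how a toxic percentage could be computed once each post has a label. The names `toxic_percentage` and `keyword_stub` are hypothetical and not from our project; the stub is a stand-in for the Cohere classifier, and we assume the classifier hands back the label as the string "Toxic" or "Non-Toxic".

```python
from typing import Callable, Iterable


def toxic_percentage(texts: Iterable[str], classify: Callable[[str], str]) -> float:
    """Return the percentage of texts that `classify` labels as "Toxic".

    `classify` is any function mapping a post's text to a label string.
    In the real project this would wrap the Cohere model (assumption).
    """
    labels = [classify(text) for text in texts]
    if not labels:
        # No posts means nothing to report; avoid dividing by zero.
        return 0.0
    toxic_count = sum(1 for label in labels if label == "Toxic")
    return 100.0 * toxic_count / len(labels)


def keyword_stub(text: str) -> str:
    """Hypothetical placeholder classifier, used only to make the sketch runnable."""
    return "Toxic" if "hate" in text.lower() else "Non-Toxic"
```

Passing the classifier in as a function keeps the counting logic separate from whichever model produces the labels, so the stub can be swapped for a real model call later.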
How we built it
We looked up the Reddit API and used it to turn all of the posts into strings that we could pass to the Cohere models.
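The post-to-string step might look something like the sketch below. It assumes the posts arrive in Reddit's listing JSON shape, where each post sits under `data.children[].data` with `title` and `selftext` fields. The function name `listing_to_strings` and the choice to join title and body with a newline are our illustration here, not necessarily what the project code did.

```python
def listing_to_strings(listing: dict) -> list:
    """Flatten a Reddit listing (already parsed from JSON) into one string per post."""
    texts = []
    for child in listing.get("data", {}).get("children", []):
        post = child.get("data", {})
        title = (post.get("title") or "").strip()
        body = (post.get("selftext") or "").strip()  # empty for link-only posts
        text = f"{title}\n{body}".strip()
        if text:
            # Skip posts that have neither a title nor a body.
            texts.append(text)
    return texts
```

Each resulting string can then be handed to the classifier as a single input.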
Challenges we ran into
We struggled to pull the pertinent information from Reddit correctly. We also struggled to get the model to work and to output what we actually wanted.
Accomplishments that we're proud of
We are proud of having learned how to scrape the web and use machine learning models to get data, even though we weren't able to create our own specialized models.
What we learned
This was the first hackathon for most of us, and we learned some of the basics of machine learning and APIs.
What's next for Berkeley Reddit Toxicity Tracker
We think it would be useful to compare r/Berkeley against other university subreddits, such as Stanford's or those of other UCs. The tracker could also use more specialized models that are designed to work on Reddit posts.

