Over the past few months, content creators have faced social media sites censoring their content, limiting their reach, and, by extension, threatening their livelihoods. The primary reason is that these sites rely on very general algorithms to limit hate speech, doxing, toxicity, and other behaviour that threatens the advertiser-friendly status of their platforms. The problem arises when these sites fail to strike a balance between open dialogue and thought policing. With this in mind, our aim with YouChoose is to give creators on these sites more agency and a say in how controversial speech is handled. This reduces the power YouTube, a large faceless corporation, has over the creators who truly understand the communities around their channels. With YouChoose, creators (specifically YouTubers) can analyse the frequency and severity of hate speech in their comments, as well as identify specific comments they would like to delete. This not only helps creators deal with toxic behaviour without losing their reach, but also reduces the burden YouTube carries as a company to address every instance of hate speech on its platform, a task that is enormously difficult for a site where hundreds of hours of content are uploaded every minute.

What it does

YouChoose is a website. It starts as a page with a single field asking for a link. You type or paste a link to the main page of any YouTube channel and click the search button. If the link is valid, the site proceeds to the next page and lists the channel's videos in chronological order. You then click on a video, and the site parses all the comments on that video (including replies). Each comment is run through a trained AI, which determines whether the comment falls under one of the following categories: "Very Offensive", "Probably Offensive", "Unclear", and "Benign". The website returns a graphical representation of the comments in the form of a radar graph and a pie chart, and also lists every comment classified as "Very Offensive", along with the users responsible and the dates they were submitted. This helps smaller channels reduce the level of toxicity in their comments, which disproportionately affects their reach and the level of advertising allowed on their videos, and helps larger channels find the comments they would like deleted, which would otherwise be hidden in a sea of thousands to tens of thousands of completely benign comments.
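The step that turns a model score into one of the four categories can be sketched roughly as below. The function names and the cut-off thresholds here are illustrative stand-ins, not the values our trained model actually uses:

```python
from collections import Counter

def categorise(offensiveness: float) -> str:
    """Map a model's offensiveness score in [0, 1] to one of the four
    YouChoose categories. Thresholds are illustrative, not production values."""
    if offensiveness >= 0.85:
        return "Very Offensive"
    if offensiveness >= 0.60:
        return "Probably Offensive"
    if offensiveness >= 0.40:
        return "Unclear"
    return "Benign"

def tally(scores):
    """Count comments per category; these counts feed the pie chart."""
    return Counter(categorise(s) for s in scores)
```

With buckets like these, the pie chart is just the category counts, and the "Very Offensive" list is every comment whose score lands in the top bucket.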

How we built it

In order to identify hate speech, we used a series of data sets totalling almost 200,000 entries. We removed clutter, including emojis, non-alphabetic characters, and filler words, to get to the data that mattered. We then used a training set of sentences that had already been labelled as hate speech or not, and combined upsampling and downsampling to account for the significantly smaller number of hate speech entries. Finally, we used a test set to measure the accuracy of our model, which was about 96%. We used Google's YouTube API to retrieve the video IDs and names of all videos uploaded by a user-specified channel. We then used the video ID parameter to retrieve all the comments pertaining to each video, ran them through the sentiment analysis model, and represented the results graphically in a radar graph.
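The cleaning and class-balancing steps can be sketched as follows. This is a simplified stand-in for our actual pipeline: the stop-word list is a tiny illustrative subset, and the upsampling helper shows the idea of sampling the minority class with replacement rather than our exact procedure:

```python
import random
import re

# Illustrative subset of filler words; the real list was much larger.
STOP_WORDS = {"the", "a", "an", "is", "to", "and"}

def clean(text: str) -> str:
    """Strip emojis and non-alphabetic characters, lowercase,
    and drop filler words."""
    text = re.sub(r"[^a-zA-Z\s]", " ", text).lower()
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

def upsample(minority: list, target_size: int, seed: int = 0) -> list:
    """Sample the minority class with replacement until it reaches
    target_size, so training isn't dominated by the majority label."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra
```

Downsampling is the mirror image: randomly dropping majority-class entries until the two classes are comparable in size.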

Challenges we ran into

The most difficult part of the project was working with the YouTube API. We had no previous experience using it and went through a lot of trial and error to get it up and running.

Accomplishments that we're proud of

We are proud that we managed to work successfully with many technologies we were unfamiliar with before the hackathon. We learned a lot, and we are proud of our final product.

What we learned

How to interact with APIs and use them to our advantage. Furthermore, we learned how to use Python to implement a sentiment analysis model, and how to train machine learning models.

What's next for YouChoose

The most important change we want to make to YouChoose is to let creators moderate comments, delete comments, and block users directly from our platform. We also want to improve smaller details, such as the performance of our sentiment analysis model and the aesthetics of the YouChoose website.
