We were tired of YouTube comment sections overrun with advertisements for shady websites and personal channels, so we built a way for YouTube content creators and viewers to enjoy reading comments again.

What it does

Hands the spam hammer down on comments. Paste a YouTube video link into our web app, and it displays the first 100 comments, each with the probability that it is spam as predicted by an RNN. You can also view the most frequent spam words in a word cloud visualization.
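As a rough illustration of how the word cloud input could be derived from scored comments, here is a minimal Python sketch. The function name `spam_word_counts`, the 0.5 threshold, and the stop-word list are all assumptions made for this example, not details of the actual app.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used by the real app is not known.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "my", "on", "in",
              "for", "is", "it", "this", "you"}


def spam_word_counts(scored_comments, threshold=0.5, top_n=50):
    """Count words in comments whose spam probability is at least `threshold`.

    `scored_comments` is an iterable of (text, spam_probability) pairs.
    Returns a list of (word, count) pairs, most frequent first.
    """
    counts = Counter()
    for text, prob in scored_comments:
        if prob < threshold:
            continue  # skip comments the model considers legitimate
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 1)
    return counts.most_common(top_n)
```

The resulting (word, count) pairs are the kind of input a word cloud layout typically expects, with the count driving the font size.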

How we built it

The RNN model is trained on the YouTube Spam Collection Dataset from the UCI Machine Learning Repository. It combines an LSTM layer, a bidirectional layer, a convolutional layer, and others. We used Stanford's GloVe word embedding matrix to boost the model's performance, and we trained the model on Google Compute Engine.
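To make the GloVe step more concrete, here is a small sketch of how pretrained vectors might be turned into an embedding matrix. It assumes the standard GloVe text format (a word followed by space-separated floats on each line) and a hypothetical `word_index` that maps each vocabulary word to an integer starting at 1, with row 0 reserved for padding. The helper names are illustrative and not taken from the project's code.

```python
def load_glove(lines):
    """Parse GloVe-format lines ("word v1 v2 ... vn") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Build a (vocab_size + 1) x dim matrix; unknown words stay all zeros.

    `word_index` is assumed to map words to integer ids starting at 1,
    leaving row 0 free for padding.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = list(vec)
    return matrix
```

A matrix like this is usually passed as the initial weights of the model's embedding layer, so that words with similar meanings start out with similar vectors.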

For the backend, we're using Flask on Google Compute Engine. The front end is built in React, the word cloud is generated with d3.js, and the comments are fetched through YouTube's API.
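The comment-fetching step might look roughly like the sketch below, which targets the `commentThreads` endpoint of the YouTube Data API v3. The helper names and the placeholder API key are assumptions for illustration, and the actual backend may be structured differently. The sketch only builds the request URL and parses a response body, leaving the HTTP call itself to whatever client the backend uses.

```python
import json
from urllib.parse import urlparse, parse_qs, urlencode

API_ENDPOINT = "https://www.googleapis.com/youtube/v3/commentThreads"
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(link):
    """Return the video id from a YouTube link, or None if none is found."""
    parsed = urlparse(link)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in YOUTUBE_HOSTS:
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def comment_request_url(video_id, api_key, max_results=100):
    """Build the request URL for one page of top-level comments."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": max_results,  # the API allows at most 100 per page
        "textFormat": "plainText",
        "key": api_key,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def comment_texts(response_body):
    """Extract the top-level comment texts from a JSON response body."""
    data = json.loads(response_body)
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in data.get("items", [])
    ]
```

Since one page holds up to 100 comments, a single request is enough to cover the first 100 comments the app displays.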

Challenges we ran into

TensorFlow was not available for Python 3.7 :'( The RNN was not converging well, so we decided to use Stanford's GloVe word embedding matrix to provide correlations among words. We also hit CORS errors: we couldn't request files over http (where the data were originally stored), so we had to switch to https.

Accomplishments that we're proud of

We were able to increase our accuracy from an 86% baseline to 96%. We were also able to install TensorFlow and host a server on Google Compute Engine.

What we learned

We learned some cool APIs and NLP along the way.

What's next for Detective Sbam

Users will be able to mark comments as spam, and the web app will automatically report those comments as spam.
