Detective Sbam

Web App
Comments Less Likely to be Spam
Comments More Likely to be Spam
Visualization of Spam Words

Inspiration

We were tired of YouTube comment sections overrun with advertisements for shady websites and personal channels, so we built a way for YouTube content creators and viewers to enjoy reading comments again.

What it does

Hands the spam hammer down on comments. Given a YouTube video link, our web app displays the video's first 100 comments, each with its probability of being spam as predicted by an RNN. You can also view the most frequent spam words in a word cloud visualization.
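As a rough illustration of the word cloud step, the sketch below counts words in comments whose predicted spam probability clears a cutoff. The function name `spam_word_counts`, the 0.5 threshold, the stop-word list, and the sample comments are all assumptions made for this example, not taken from the project's code.

```python
import re
from collections import Counter

# Hypothetical cutoff and stop-word list, chosen for illustration only.
SPAM_THRESHOLD = 0.5
STOP_WORDS = {"the", "a", "an", "and", "to", "my", "of", "is", "it", "in", "for", "on", "you"}


def spam_word_counts(scored_comments, threshold=SPAM_THRESHOLD, top_n=50):
    """Count words in comments whose spam probability is at or above the threshold."""
    counts = Counter()
    for text, probability in scored_comments:
        if probability < threshold:
            continue  # skip comments the model considers unlikely to be spam
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up (comment, probability) pairs standing in for the model's output.
scored = [
    ("Check out my channel, subscribe now!", 0.97),
    ("Great song, love the guitar solo", 0.03),
    ("Subscribe to my channel for free gift cards", 0.97),
]
top_words = spam_word_counts(scored, top_n=3)
print(top_words)
```

The resulting (word, count) pairs are the kind of input a word cloud layout needs, with the count driving the font size.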

How we built it

The RNN model is trained on the YouTube Spam Collection Dataset from the UCI Machine Learning Repository. It incorporates an LSTM layer, a Bidirectional layer, a Convolutional layer, etc. We used Stanford's GloVe word embedding matrix to boost the model's performance, and we trained the model on Google Compute Engine.
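To make the GloVe step more concrete, here is a minimal sketch of turning GloVe's text format into an embedding matrix and feeding it to a Keras model that combines the layer types named above. The layer order, layer sizes, `max_len`, and all function names are assumptions for illustration; the actual architecture may differ.

```python
import numpy as np


def load_glove(lines):
    """Parse GloVe text format: each line is a word followed by its vector values."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i; row 0 is left for padding."""
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, i in word_index.items():
        if word in vectors:  # words missing from GloVe keep an all-zero row
            matrix[i] = vectors[word]
    return matrix


def build_model(embedding_matrix, max_len=100):
    """Hypothetical stack: frozen GloVe embedding -> Conv1D -> Bidirectional LSTM -> sigmoid."""
    import tensorflow as tf  # imported lazily so the helpers above run without TensorFlow

    layers = tf.keras.layers
    vocab_size, dim = embedding_matrix.shape
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,)),
        layers.Embedding(
            vocab_size, dim,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=False,
        ),
        layers.Conv1D(64, 5, activation="relu"),
        layers.MaxPooling1D(2),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(1, activation="sigmoid"),  # outputs the spam probability
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# Tiny made-up vectors; real GloVe files have 50 to 300 values per word.
sample_lines = ["free 0.1 0.2 0.3", "subscribe 0.4 0.5 0.6"]
word_index = {"free": 1, "subscribe": 2, "sbam": 3}
matrix = build_embedding_matrix(word_index, load_glove(sample_lines), dim=3)
print(matrix.shape)  # (4, 3)
```

Starting from pretrained vectors means the model does not have to learn word relationships from the small spam dataset alone.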

For the backend, we're using Flask on Google Compute Engine. The front end is built in React, the word cloud is generated with d3.js, and the comments are grabbed through YouTube's API.
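The comment-fetching step could look roughly like the sketch below, which assumes the YouTube Data API v3 `commentThreads` endpoint. The helper names, the placeholder video ID, and the placeholder API key are hypothetical, and the request options the backend really uses are not stated here.

```python
from urllib.parse import parse_qs, urlencode, urlparse

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def extract_video_id(link):
    """Return the video ID from a watch URL or a youtu.be short link, else None."""
    parsed = urlparse(link)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/") or None
    if parsed.hostname in ("www.youtube.com", "youtube.com", "m.youtube.com"):
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def comment_request_url(video_id, api_key):
    """Build a request URL for up to 100 top-level comment threads of a video."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,  # the endpoint returns at most 100 threads per page
        "textFormat": "plainText",
        "key": api_key,
    }
    return API_URL + "?" + urlencode(params)


def comment_texts(response_json):
    """Pull the top-level comment text out of a commentThreads response body."""
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response_json.get("items", [])
    ]


# Placeholder video ID and key, for illustration only.
video_id = extract_video_id("https://www.youtube.com/watch?v=VIDEO_ID_HERE")
print(comment_request_url(video_id, "YOUR_API_KEY"))
```

The backend would send a GET request to that URL, pass the returned texts to the model, and return each comment with its spam probability to the React front end.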

Challenges we ran into

TensorFlow was not available for Python 3.7 :'(

The RNN was not converging well, so we decided to use Stanford's GloVe word embedding matrix to provide correlations among words.

CORS errors: we couldn't request files over http (where the data was originally stored), so we had to switch to https.