All of us have dealt with or witnessed internet trolls who do not productively contribute to online communities and often ruin the experience for many others.
What it does
Our project is a Django web app that takes text as input and determines whether its author is sincere, using a deep learning model that classifies the text as sincere or insincere. Users interact with it through a website that accepts text input, or through a Google Chrome browser extension that classifies any text the user highlights on a webpage.
How we built it
Our deep learning model was trained on data from Quora that includes over 1.3 million sentences, each labeled as sincere or insincere. The model uses transfer learning to map each sentence into a high-dimensional feature space, and those features are passed to a custom-designed deep neural network for classification. The website or Chrome extension front-end sends the input to the backend, which runs the classification model.
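The flow described above (text in, sentence features, classifier, label out) can be sketched roughly as follows. This is a minimal illustration and not our actual model: `embed` is a hypothetical stand-in that hashes text into a small vector instead of calling a pretrained encoder, the weights are made up, and a single sigmoid layer replaces the deep network. The names `embed`, `insincere_probability`, and `classify` are assumptions chosen for the example.

```python
import hashlib
import math

EMBED_DIM = 8  # toy size; a real sentence encoder emits far more dimensions


def embed(text: str) -> list[float]:
    """Hypothetical stand-in for a pretrained sentence encoder.

    Hashes the text into a deterministic vector with values in [-1, 1] so the
    sketch runs without downloading a model. The vector has no semantic meaning.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [(b / 127.5) - 1.0 for b in digest[:EMBED_DIM]]


# Made-up weights for a one-layer classification head (assumption).
# In a real system these would be learned from the labeled training data.
WEIGHTS = [0.4, -0.3, 0.2, 0.1, -0.5, 0.3, -0.2, 0.6]
BIAS = -0.1


def insincere_probability(text: str) -> float:
    """Embed the text, then apply a sigmoid over a linear layer."""
    features = embed(text)
    logit = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def classify(text: str, threshold: float = 0.5) -> dict:
    """Build the kind of response a backend could return to either front-end."""
    p = insincere_probability(text)
    return {
        "text": text,
        "insincere_probability": p,
        "label": "insincere" if p >= threshold else "sincere",
    }


if __name__ == "__main__":
    print(classify("How do I learn Django quickly?"))
```

Keeping the encoder and the classification head as separate functions mirrors the transfer learning split: the encoder could be swapped for a real pretrained one without touching the code that turns features into a label.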
Challenges we ran into
Finding appropriate data and preprocessing it were significant challenges. The dataset we ended up using had 1.3 million samples, which demanded significant compute time and storage (a processed version of the dataset was over 7 GB). Additionally, we had to learn Django backend development from scratch to tie everything together.
Accomplishments that we're proud of
We were able to take a raw data set and make something useful out of it. We implemented transfer learning using techniques validated by research scientists at Google AI and applied it to our own natural language processing task. Our model achieved 96% accuracy on test data and an F1 score of approximately 0.68, within 0.03 of the top Kaggle submissions made over the last two months to the competition from which our data was sourced. We then went beyond research to develop multiple products that could use this model: a web app, Chrome extension, Firefox extension, and Twilio app.
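Accuracy and F1 can sit far apart when one class is rare, which is why both numbers are reported. The sketch below computes both from a confusion matrix. The counts are hypothetical, picked only to show how figures in this range can coexist; they are not our model's actual results, and the function names are assumptions for the example.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical test set: 1000 samples, 60 of them insincere (positive).
    tp, fp, fn, tn = 40, 18, 20, 922
    print(f"accuracy = {accuracy(tp, fp, fn, tn):.3f}")  # accuracy = 0.962
    print(f"F1       = {f1_score(tp, fp, fn):.3f}")      # F1       = 0.678

    # Baseline that labels everything sincere on the same hypothetical set.
    print(f"baseline accuracy = {accuracy(0, 0, 60, 940):.3f}")  # 0.940
    print(f"baseline F1       = {f1_score(0, 0, 60):.3f}")       # 0.000
```

In this made-up example, a classifier that never flags anything still scores 94% accuracy but an F1 of 0, so F1 is the more informative number when insincere text is the minority class.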
What we learned
We learned that we can learn fast, just like our deep learning models.
What's next for Sincerely, AI
Running our app on GPU servers would significantly improve inference time. We could also train on more data to improve our predictions.