Oscar Wilde once said, "Man is least himself when he talks in his own person. Give him a mask, and he will tell you the truth." Anonymity on the internet can lead people to say nasty things they normally would not say in real life. (Source: kaggle.com/jagangupta/stop-the-s-toxic-comments-eda.) Building on the idea of identifying malicious comments on platforms such as YouTube or Twitter to curb toxicity and negativity, we attempted to classify statements by positivity level and strength of "feeling".
We were inspired to build this project after attending the workshop on coding for the Google Assistant. During the workshop, we were given insight into how Sentiment Analysis can be performed on statements to produce statistics of this kind.
What it does
The program runs from the terminal and generates a scatter plot of the results, making them easier to interpret.
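The plotting step could be sketched as follows; a minimal example assuming matplotlib, with made-up (score, magnitude) pairs standing in for real analysis results:

```python
# Minimal sketch: scatter-plot sentiment score vs. magnitude for a batch of
# comments. The (score, magnitude) pairs below are placeholder values; in the
# real program they would come from the sentiment analysis step.
import matplotlib
matplotlib.use("Agg")  # render without a display, e.g. when run from a terminal
import matplotlib.pyplot as plt

results = [(-0.8, 3.1), (0.6, 1.2), (0.1, 0.3), (-0.3, 2.0)]

scores = [s for s, _ in results]
magnitudes = [m for _, m in results]

fig, ax = plt.subplots()
sc = ax.scatter(scores, magnitudes)
ax.set_xlabel("Sentiment score (-1 = negative, +1 = positive)")
ax.set_ylabel("Magnitude (strength of feeling)")
ax.set_title("Comment sentiment")
fig.savefig("sentiment_scatter.png")
```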
How we built it
We combined the Google Cloud Natural Language API with Python to analyze statements and comments made by "users" in an "online environment."
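In outline, the analysis step might look like the sketch below. `analyze_comment` uses the real Natural Language API (it requires the google-cloud-language package and credentials, so it is imported lazily); the `bucket` helper and its thresholds are our own illustrative assumptions, not part of the API, which simply returns a sentiment `score` in [-1, 1] and a non-negative `magnitude`.

```python
# Sketch of the analysis step, assuming the Google Cloud Natural Language API.

def analyze_comment(text):
    """Return (score, magnitude) for one comment via the Natural Language API."""
    from google.cloud import language_v1  # lazy import: needs credentials set up
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    sentiment = client.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    return sentiment.score, sentiment.magnitude

def bucket(score, magnitude):
    """Map (score, magnitude) to a rough label. Thresholds are assumptions."""
    if score <= -0.25:
        return "strongly negative" if magnitude >= 1.0 else "mildly negative"
    if score >= 0.25:
        return "strongly positive" if magnitude >= 1.0 else "mildly positive"
    return "neutral/mixed"
```

`score` captures how positive or negative a statement is, while `magnitude` captures how emotionally charged it is, which is why both are needed to separate a mildly grumpy remark from a genuinely toxic one.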
Challenges we ran into
While the problem we chose is difficult to solve, we found tools that eased and expedited the process. We used Google Cloud's Natural Language service to identify keywords and intent and to quantify each statement. Certain comments, particularly those involving sarcasm or ambiguous user intent, were troublesome to analyze. Fortunately, the Google AI tools available handled most of these issues. We also worked with a large dataset (10k+ samples) and drew a large random subsample for each run.
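The subsampling step takes only a few lines; a sketch assuming the comments are already loaded into a list (the seed is shown only to make the example reproducible):

```python
import random

# Stand-in dataset of 10,000 comments; the real data came from a Kaggle dump.
comments = [f"comment {i}" for i in range(10_000)]

random.seed(42)  # reproducible example only; omit for a fresh sample each run
sample = random.sample(comments, k=500)  # 500 distinct comments, no repeats
```

`random.sample` draws without replacement, so each run analyzes a different slice of the data while never scoring the same comment twice.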
Accomplishments that we're proud of
We are proud of having set up a project using Google Cloud Services, and we can now draw on its wide range of AI tools for other projects. We have also registered for the Twitter and Google Cloud APIs so that we can build this project further.
What we learned
We were completely new to Google Cloud Services and learned how to create a project, enable an AI service package, and apply it to our test data set. We also used this opportunity to familiarize ourselves with basic Natural Language Processing concepts, a field of study that was entirely new to us.
What's next for Safe-Word-Pineapple
In the future, we could add a web scraper to experiment on real-time data. The program could be incorporated into a website or a mobile app and, together with the scraper, help curb negativity and toxicity in online environments. It could also be extended with other Machine Learning concepts and models to improve how the data is interpreted and, as a direct result, how toxic users are reported online.