We are students who are interested in research, which means we know how often unforeseen topics and language can appear in articles or historical documents. We set out to make an app that could detect any of these unforeseen topics that could have a harmful effect on the reader. In short, we wanted to make reading on the internet a safer and more comfortable experience.
The application takes in text, scans it for harsh or offensive words, and reports whether any appear in three categories: sexually violent, physically violent, and slur(s). The user may also submit a URL, in which case the text on the page is scanned and checked against the same three categories. The user has a third option as well: to submit an image URL of a picture of text or handwriting. The Google Cloud Vision API pulls the text out of the image, and that text is then scanned the same way. We also measure the sentiment of the entire input text, hopefully to guide readers in deciding whether to read an article. (There is a difference between an article about a feminist movement that uses content-warning words with positive, empowering emotion and a newspaper brutally describing a murder, which is very negative.)
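The category scan described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `scan_text`, the `WORD_LISTS` dictionary, and the placeholder entries are all assumptions made for the example.

```python
import re

# Hypothetical placeholder lists; the project's real lists live in .txt files.
WORD_LISTS = {
    "sexually violent": {"placeholder_a"},
    "physically violent": {"murder", "stabbed"},
    "slur(s)": {"placeholder_b"},
}


def scan_text(text, word_lists=WORD_LISTS):
    """Count how many tokens in `text` fall into each flagged category."""
    # Lowercase and split into simple word tokens so matching ignores case
    # and punctuation.
    tokens = re.findall(r"[a-z_']+", text.lower())
    counts = {}
    for category, words in word_lists.items():
        hits = sum(1 for token in tokens if token in words)
        if hits:
            # Report only the categories that actually appeared.
            counts[category] = hits
    return counts
```

An empty result means nothing was flagged; otherwise the dictionary gives the number of hits per category, which is the kind of information the message box reports.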
How we built it
Front End: The front end of the program is coded using KivyMD, a fairly easy-to-apply UI framework for Python. The goal of the user side is to take in a website URL, a transcript of a text, or even a URL to a picture of text that the user wants to scan for possibly explicit words or phrases. Kivy was used as the base for the application and to give the program a smooth, easy-on-the-eyes theme that is easy to understand and straight to the point. A screen object was made which, on startup, shows a message informing the user what Content Warning is used for (and also gives information about the team and helpful links regarding the topic of sensitive content). The screen object also houses the text field the user puts their content into and the buttons they use to submit that content. One button treats the input as an article URL or a transcript, and the other treats it as an image URL. After the input is sent to the back end and processed there, the UI puts up a message box stating whether or not the content had expletives, what types it had, how many of each type, and the general tone of the content, deemed overall positive or negative by the back end.
Back End: The backend was built on sheer luck and determination. The entire backend was built using Python. For the language processing (including stop_words and token_words), we used NLTK 3.5, the Natural Language Toolkit, to provide more realistic and better controlled data. The toolkit also automated the sentiment analysis portion of our program. To put it simply: the first half of our main.py file is made up of KivyMD design and build functions that create the web application. The last half holds the functions that open and close dialog windows, several buttons, and the functions that make those buttons work. There are also two functions that bring in text: one pulls text from a website, and the other uses the Google Cloud Vision API to extract text from photos. (Both document_text_detection and text_detection were originally included, but document_text_detection fit our needs by itself.) The helper.py file is where the analysis-of-language functions come into play, while the .txt files hold the explicit content that we are searching for.
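As an illustration of the "pull text from a website" step, here is a minimal sketch that extracts the visible text from a page's HTML using only Python's standard library. The writeup does not say which scraping approach main.py uses, so the `VisibleTextParser` class and `extract_text` function are hypothetical stand-ins. Fetching the page itself is left out so the example runs offline.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page's visible text as one space-separated string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The returned string could then be handed to the same word scan and sentiment analysis used for pasted text.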
Explicit Content Database: The collection of words that we decided to flag was the result of much research into the surprisingly large amount of guidance on content warnings. The University of Michigan LSA has a great article outlining the types of things that should be warned for. Using that and the great databases of slurs and harmful language compiled by the Anti-Defamation League and Wikipedia, we were able to come up with a list of words and phrases that we felt were common enough in various forms of media and historical documents to warn about. This list is by no means exhaustive; it is missing hundreds of words that can cause (or retrigger) trauma, but to list them all would be an exercise in futility. We decided that our list would be sufficient to warn the user about the vast majority of harmful language the vast majority of the time.
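Since the list contains phrases as well as single words, one way to load and match it might look like the sketch below. The file layout (one entry per line) and the helper names `load_phrases` and `find_phrases` are assumptions for illustration; the actual format of the project's .txt files is not described here.

```python
import re
from pathlib import Path


def load_phrases(path):
    """Read one word or phrase per line, lowercased, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def find_phrases(text, phrases):
    """Return the listed phrases that occur in `text` as whole words."""
    lowered = text.lower()
    found = set()
    for phrase in phrases:
        # \b anchors the match at word boundaries, so a listed word is not
        # flagged when it only appears inside a longer, unrelated word.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(phrase)
    return found
```

Matching on word boundaries is a deliberate choice in this sketch: plain substring checks would produce false warnings for harmless words that happen to contain a flagged one.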
Challenges we ran into
Our project began with a challenge: on Friday at 7:45 PM, we learned that Twitter had banned the Python library (GetOldTweets3) we were planning to use for our initial idea, which was loosely based around sentiment analysis and maybe a Google Maps API or four.
We had a number of more technical issues, such as formatting problems in the front end of the code when we tried to make the UI more complex. None of us had ever worked with Kivy to make UIs before (or any frontend framework), so some of the complexities of its variable logistics were lost on us. The iteration we landed on didn’t have too many issues, but we could not find a way to add multiple screens to the program. To be specific, Kivy uses its own language when compiling more advanced UI, and within this language, we could not find a way to move variables between the front and back ends. This may also be because we used KivyMD, the version of Kivy that implements Google's Material Design. This version is still in beta, is slightly (very) unstable, and is why we did not run our video demo from the Windows command prompt.
One of our largest problems was working as a team to integrate our code together with Kivy. Only one of us has had any type of formal coding education outside of high school (and that’s only two intro courses). It was difficult to work as a team and figure out how the backend related to the front end. It probably didn’t help that all of the packages and libraries we used were completely new to us, or that we’ve never made an app before... but it worked!
Towards the end of the competition, we thought it would be a good idea to try and “prove” Content Warning as a web application and decided to package it for Windows. This was more of a personal feat, as we could demo our program on Spyder just fine. We began that process at 11 PM, were in a meeting with three sponsor developers (to whom I owe so much appreciation) until 3 AM, and finally debugged the rest of the package file at 4:30 AM. As KivyMD is a bit unstable, especially on Windows, we decided not to showcase our packaged web app through the command prompt. But please rest assured: it does exist.
Accomplishments that we're proud of
The moment we screen shared our first “test run” together in a Discord call was one we will never forget. Our application, after integration and for the first time, worked… The 30 seconds of silence and then yelling that came after will stay with us forever. We have a functioning application that is ready for deployment (if only locality and money didn’t matter!). The GUI is pretty and works well, and the backend is almost fool-proof (but please don’t test that). We made it through 36 hours as a team, all of us at our first hackathon, and we all created something that we are proud of.
What we learned
Bennett: As a person who is well situated outside of the computer science community, I have a newfound respect for the work and the people who do it. Working on this has been exhausting for all three of us, but it was incredibly enlightening to watch my teammates toil for hours over lines of text that might as well have been matrix rain to me 36 hours ago. My role in the coding of the project was minor; however, I learned so much through what can only be described as osmosis by being in the same room (and calls) as my teammates. Could I embark on something like this on my own? Absolutely not. However, I understand it in the abstract much better than I did previously.
Tyler: As a person only now starting to get situated in the computer science community, I had a lot to learn going into this hackathon, and that much hasn’t changed. Working on Content Warning gave me the perfect opportunity and incentive to learn how to use and apply specific modules to do specific tasks for pretty much the first time outside of entry-level programming courses. I can only describe the knowledge I gained about using KivyMD and GUI programs in general as invaluable. I entered this project knowing absolutely nothing about the topic, so the information I’ve gained along the way has made me feel infinitely more competent.
Savvy: The main objective that I had wanted to reach was just to finish whatever we began. After a few hours, that turned into: “I’ll settle for the demo sticker!” Now that we have a working application with a nice GUI (thanks Tyler!) and Google Vision API functionality, I couldn’t be any more surprised or excited. It was challenging to jump from idea to idea, but when we thought of Content Warning, all the pieces fell together. I’ve known for a little while now that social computing is what I want as a career, but this is the first time I’ve ever applied it. I can see this application growing and becoming useful at libraries, universities, and so much more. I can’t begin to describe the emotions I feel at having our first working application look and perform so well. I hope to keep working on this hack and am so excited to learn more!
What's Next for Content Warning
Due to cybersecurity measures, most websites cannot be scraped and thus will not work with our current program. We would like to give our application an embed feature, which would open up more websites for analysis. Furthermore, we want to look into collaborating with universities and libraries to run our program on their digitized texts to create a safer educational space.
Hopefully, this application can help others in the future; that’s all we can hope for.