On February 19, 2017, Susan Fowler published a blog post about her experiences of sexual harassment at Uber, and it went viral. Hers is just one of many such stories from victims of sexual harassment in the tech industry. Two issues stood out to us: the overwhelming prevalence of sexual harassment in the workplace, and the difficulty of speaking out to HR to combat it. Our project aims to tackle these two issues using machine learning and natural language processing.

Thus, we came up with Fowler, an application built to challenge and help overcome sexual harassment. At its core, it's a Slack bot that uses machine learning and natural language processing to detect and log sexual harassment, and it gives employees a way to submit sexual harassment reports anonymously. It also provides real-time analytics on the distribution of sexual harassment complaints.

Technical Details

On the backend, we started with a list of 373 words that we considered to be danger words, and classified 30,000 sentences from movie lines, tweets, and social media posts as either sexual harassment or not. We then used Microsoft's Natural Language Processing to get a sentiment score (a value between 0 and 1) for each sentence, and included that score alongside the sentence in the training data. This gave us {sentence, sentiment} pairs as inputs, with a 1/0 outcome for each pair.

Next, we used Google's Prediction API to train a neural network on the pairs and their outcomes. Once trained, the model takes a {sentence, sentiment} pair and returns a value between 0 and 1 indicating how strongly the sentence correlates with sexual harassment.

Finally, we created a Slack bot that examines every message sent, gets a score between 0 and 1 from the Google Prediction API, and logs the message as sexual harassment if the score passes our threshold of 0.07. We also made it possible to message fowler_bot directly to submit anonymous harassment reports against specific users.
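The per-message check can be sketched roughly as follows. This is a minimal illustration, not our actual bot code: `handle_message`, `HarassmentLog`, `sentiment_fn`, and `score_fn` are hypothetical names, and the two functions stand in for calls to the sentiment service and the trained model. We also assume "passes the threshold" means strictly greater than 0.07.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

THRESHOLD = 0.07  # threshold from the write-up; strict ">" is an assumption


@dataclass
class HarassmentLog:
    """Hypothetical in-memory log of flagged messages."""
    entries: List[Dict] = field(default_factory=list)

    def record(self, user: str, text: str, score: float) -> None:
        self.entries.append({"user": user, "text": text, "score": score})


def handle_message(
    user: str,
    text: str,
    sentiment_fn: Callable[[str], float],     # stand-in for the sentiment service
    score_fn: Callable[[str, float], float],  # stand-in for the trained model
    log: HarassmentLog,
    threshold: float = THRESHOLD,
) -> bool:
    """Score one message and log it if the score passes the threshold."""
    sentiment = sentiment_fn(text)       # value between 0 and 1
    score = score_fn(text, sentiment)    # value between 0 and 1
    flagged = score > threshold
    if flagged:
        log.record(user, text, score)
    return flagged
```

Passing the two services in as functions keeps the sketch independent of any particular API client, so it can be exercised with simple stubs.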

The biggest challenge we ran into for this project was getting the training data. Since there are no readily available datasets on sexual harassment for NLP, we needed to be creative and efficiently generate our own accurate dataset, as described above.
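One way to generate labels from a danger-word list is sketched below. It assumes a sentence is labeled 1 when it contains at least one danger word and 0 otherwise; that rule is an assumption for illustration and may not match exactly how our labels were produced. The names `tokenize` and `label_sentences` are hypothetical, and the word in the usage example is a placeholder.

```python
import re
from typing import Iterable, List, Set, Tuple


def tokenize(sentence: str) -> Set[str]:
    """Lowercase the sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def label_sentences(
    sentences: Iterable[str], danger_words: Iterable[str]
) -> List[Tuple[str, int]]:
    """Label each sentence 1 if it contains any danger word, else 0 (assumed rule)."""
    danger = {w.lower() for w in danger_words}
    return [(s, 1 if tokenize(s) & danger else 0) for s in sentences]


# Placeholder data, not taken from the real word list or corpus.
labeled = label_sentences(
    ["This has a PLACEHOLDER in it", "Meeting at noon"],
    ["placeholder"],
)
```

A rule like this is cheap to run over 30,000 sentences, but it only matches whole words and ignores context, so the resulting labels are noisy.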

Future Goals

Our goal is to help overcome sexual harassment, and we think Fowler is a step in the right direction. We believe it's important to make it easier both to prevent sexual harassment and to report it when it happens. In the future, Fowler can be extended to other mediums of online communication (email, Twitter, etc.) and other types of communication (conference room calls, etc.).
