Inspiration

Per the National Coalition Against Domestic Violence, nearly 20 people per minute are physically abused by an intimate partner in the United States [1]. In many cases, the victim is unable to call for help because of the nature of the abuse and the risk that the abuser will notice. This inspired us to create a silent alarm that uses modern technology to recognize domestic violence and take appropriate action to prevent further abuse.

[1] https://ncadv.org/statistics

What it does

Guardian Angel is an always-on, cloud-connected app that utilizes the power of machine learning to detect and prevent domestic violence.

How we built it

After settling on an idea, we researched recent advances in sentiment analysis. According to [1] and [2], capturing the full emotional content of human speech requires combining speech-to-text algorithms with spectrogram analysis. In the interest of time, however, we decided to focus on the speech-to-text method.

As with any ML-based project, choosing an architecture goes hand in hand with collecting a dataset. Because publicly available, high-quality recordings of domestic violence are scarce, we scoured YouTube and other video platforms for more general altercations. Through extensive preprocessing, we converted these videos into a format that Google Cloud Platform’s Speech-to-Text API could understand, which left us with a set of transcripts.
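
The conversion step can be sketched as follows. This is a simplified illustration, not our exact pipeline: it assumes ffmpeg is available and only builds the command, since 16 kHz mono LINEAR16 WAV is an encoding the Speech-to-Text API accepts.

```python
from pathlib import Path

def ffmpeg_command(video_path: str) -> list[str]:
    """Build an ffmpeg invocation that strips the video stream and
    resamples the audio to 16 kHz mono WAV for Speech-to-Text."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-i", video_path,   # input video file
        "-vn",              # drop the video stream
        "-ac", "1",         # downmix to mono
        "-ar", "16000",     # resample to 16 kHz
        wav_path,
    ]
```

In practice each downloaded clip would be run through this command (e.g. via `subprocess.run`) before being uploaded for transcription.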

Using an embedding matrix trained on Twitter data [3], we converted each word in each transcript into a 50-dimensional vector, which lets the network understand each word in the broader context of the English language. These word vectors were then passed into an LSTM with a fully connected layer at the end, so that the network’s output could be binary: threatening or nonthreatening. After training this network, we packaged the evaluator into a program that could run as a Google Cloud Function.
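
The embedding lookup can be sketched as below. Real GloVe files are plain text, one word per line followed by its vector components; a toy 3-dimensional table stands in here for the actual 50-dimensional Twitter embeddings.

```python
# Toy embedding table in GloVe's text format (word, then components).
GLOVE_SAMPLE = """\
you 0.1 0.2 0.3
stop 0.4 0.5 0.6
now -0.1 0.0 0.2
"""

def load_embeddings(text: str) -> dict[str, list[float]]:
    """Parse GloVe-format text into a word -> vector dictionary."""
    table = {}
    for line in text.splitlines():
        word, *components = line.split()
        table[word] = [float(c) for c in components]
    return table

def embed_transcript(transcript: str, table: dict[str, list[float]]) -> list[list[float]]:
    """Map each transcript word to its vector; out-of-vocabulary
    words fall back to the zero vector before entering the LSTM."""
    dim = len(next(iter(table.values())))
    return [table.get(w, [0.0] * dim) for w in transcript.lower().split()]
```

The resulting sequence of vectors is what gets fed, timestep by timestep, into the LSTM.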

With the backend complete, we built an Android app with a user-friendly UI designed to make the perceived threat level immediately obvious. Behind the scenes, the app streams microphone data to Google’s Speech-to-Text API and sends the resulting transcript to the aforementioned Google Cloud Function via an HTTP request. We also made a dynamic logo that reacts to audio input. The user can enter a phone number; based on the threat level the network determines, that number receives a text message alerting the contact to the abuse.
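
A stripped-down sketch of the Cloud Function’s logic is shown below. It is hypothetical: the real function wraps a Flask request object and runs the trained LSTM, whereas here a keyword heuristic stands in for the model and the JSON field names are our own.

```python
# Placeholder vocabulary standing in for the LSTM classifier.
THREAT_WORDS = {"hurt", "kill", "hit"}

def classify_transcript(payload: dict) -> dict:
    """Classify a transcript payload and decide whether to alert.

    payload: {"transcript": str, "phone_number": str (optional)}
    """
    words = set(payload.get("transcript", "").lower().split())
    threatening = bool(words & THREAT_WORDS)
    response = {"threatening": threatening}
    if threatening and payload.get("phone_number"):
        # In production, this branch triggers the SMS alert
        # to the contact the user configured.
        response["alert_sent_to"] = payload["phone_number"]
    return response
```

The Android app would POST this JSON to the function’s HTTPS endpoint and render the `threatening` flag in the UI.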

[1] https://doi.org/10.1016/j.procs.2015.02.112
[2] https://arxiv.org/pdf/1811.08065.pdf
[3] https://nlp.stanford.edu/projects/glove/

Challenges we ran into

Because domestic violence is rarely recorded, finding a dataset to train our network was difficult. We relied on arguments and fights from movies and TV shows, as well as public freakouts posted on YouTube, to construct the “affirmative” examples of domestic violence used to train our LSTM. We expect the network’s accuracy to improve over time as we collect more data from users and use it to continually retrain and improve the model.

After training a neural network, one can normally export the model itself and be confident that it alone is enough to make predictions. However, the architecture we used required the word tokenizer to remain consistent across training and prediction. This made it impossible to use Google Cloud ML Engine, so we had to resort to the Cloud Functions API, which is still in beta.
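
The consistency requirement can be illustrated with a minimal stand-in tokenizer: word-to-index assignments are fixed at fit time, so the fitted object itself must be serialized and shipped alongside the model weights rather than recreated at prediction time.

```python
import pickle

class WordTokenizer:
    """Minimal word tokenizer; assigns indices in first-seen order."""

    def __init__(self):
        self.index = {}

    def fit(self, texts):
        for text in texts:
            for word in text.lower().split():
                self.index.setdefault(word, len(self.index) + 1)

    def encode(self, text):
        # Unknown words map to 0, an out-of-vocabulary bucket.
        return [self.index.get(w, 0) for w in text.lower().split()]

tokenizer = WordTokenizer()
tokenizer.fit(["leave me alone", "do not hurt me"])

# Serialize at training time, deserialize inside the serving function:
# the restored tokenizer must produce identical index sequences.
restored = pickle.loads(pickle.dumps(tokenizer))
```

If the prediction side refit a fresh tokenizer on different text, the same word could receive a different index and the LSTM’s inputs would be garbage.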

The default Java client library for GCP’s Speech-to-Text API is not optimized for mobile. Even Google’s reference implementation acknowledges this problem yet uses the library anyway, which led to numerous Gradle configuration issues. The product is still relatively new, so bugs of this sort are to be expected, but they were nonetheless tough to work around.

Accomplishments that we're proud of

We were able to use multiple Google Cloud Platform services to solve a very open-ended problem. We are also proud of our custom dataset pipeline and successful implementation of a modified LSTM architecture. The cohesion in our app between the backend LSTM and the frontend UI is another strong aspect of Guardian Angel.

What we learned

Prior to LA Hacks, we had no experience with LSTMs, word2vec, or Google Cloud. We not only learned how to use all of these, but applied them successfully to build our app. Modifying existing structures where necessary to fit our needs, such as adding a fully connected layer to convert our LSTM from a next-word predictor into a binary classifier, reflects a deeper understanding.

Additionally, we worked with multiple Google Cloud Platform services, including the Speech-to-Text API and Cloud Functions.

What's next for Guardian Angel

Google has developed a very capable offline audio analysis tool that most of its mobile devices use to listen for “Ok Google” and identify songs. If Google opens that API up to more uses, it could serve as a great first alert to dangerous situations, including domestic violence. If it is optimized for low power usage, it could even be practical to run from your pocket while walking late at night, helping protect against other forms of crime.

Currently, Guardian Angel uses only speech-to-text analysis to gauge sentiment, though most research indicates that it must be combined with spectrogram analysis for optimal results. In the future, we could add spectrogram analysis to Guardian Angel to produce more accurate threat estimates.

Over 90 million people in the US use an iPhone, so creating an iOS app is essential to maximizing our reach. Once our app is released, we will continually fold user data into updated datasets, improving our network over time. We plan to meet with authorities to determine the form of alert that minimizes the effects of false positives while ensuring the victim’s safety at all times. As we collect more data from users and increase both the quality and quantity of our dataset, we can modify our LSTM from a binary classifier into a numerical threat level (from 1 to 100). This numerical level can determine the level of alert (text friends, record audio, or call police) and inform officers how quickly to respond and mobilize units. We can also expand our functionality to other threatening situations, such as muggings and gunpoint robberies.
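
The tiered-response idea could be sketched as a simple mapping from the planned 1–100 score to the alert levels described above. The cutoffs here are placeholders we invented for illustration; the real thresholds would be tuned with user data and input from authorities.

```python
def alert_tier(score: int) -> str:
    """Map a 1-100 threat score to a response tier.
    Thresholds (40, 75) are illustrative placeholders."""
    if not 1 <= score <= 100:
        raise ValueError("score must be in 1..100")
    if score < 40:
        return "text_friends"
    if score < 75:
        return "record_audio"
    return "call_police"
```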
