Empact

Inspiration

One out of every five (20.2%) students report being bullied.

With smartphone use rising among kids, cyberbullying is becoming more common by the day. However, fewer than half of the students who are bullied report it to an adult. It is therefore important for parents of minor children who own a phone to be able to keep an eye on their kids' conversations without invading their privacy. That's where Empact comes in.

Empact = Empath + Impact

What it does

Empact is an app that helps parents monitor potential cyberbullying of their children by analyzing the tone of their conversations, whether over text or on a call.
Our app allows a parent or child to send audio of calls or conversations to our system, which uses machine learning to analyze the emotion in them. We also detect the emotion in text conversations and chats. Combined with the app's geolocation and recording abilities, this lets parents detect potential bullying, cyberbullying, and abusive or dangerous behaviour and conversations the child may be involved in.

How we built it

The UI/UX design and prototype were made in Figma. We've kept the UI simple, with calm colors.
The backend is mainly Python, and the app is built with React Native and Expo.

Machine Learning approach, deployment and description:

Classifying emotion from audio of human speech is a well-documented research problem that has seen several attempts in the very recent past. State-of-the-art models use the RAVDESS and CREMA-D datasets, along with a few others.

We spent the better part of the first day going through papers, Kaggle notebooks and all sorts of approaches to this problem. We finally settled on an approach that seemed to perform decently well on emotion classification (as opposed to gender/emotion classification).

Of all the approaches and models we tried, a convolutional network using log mel cepstrum features had decent accuracy when tested against the dataset. There is a strong possibility of overfitting, though, and addressing it will require further refinement and data augmentation.
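Data augmentation can take many forms. As a minimal sketch, assuming a clip has already been decoded into a list of float samples, two common waveform-level augmentations are additive noise and a random time shift. The function names and default values below are illustrative assumptions and are not the ones in the repo.

```python
import random


def add_noise(samples, noise_level=0.005, rng=None):
    """Return a copy of the clip with small Gaussian noise added to each sample."""
    rng = rng or random.Random()
    return [s + rng.gauss(0.0, noise_level) for s in samples]


def time_shift(samples, max_shift, rng=None):
    """Rotate the clip by a random offset of at most max_shift samples in either direction."""
    rng = rng or random.Random()
    if not samples:
        return []
    shift = rng.randint(-max_shift, max_shift) % len(samples)
    # A non-zero shift moves the audio later and wraps the tail around to the front.
    return samples[-shift:] + samples[:-shift] if shift else list(samples)


def augment(samples, copies=3, seed=0):
    """Produce several perturbed variants of one training clip."""
    rng = random.Random(seed)
    return [
        add_noise(time_shift(samples, len(samples) // 10, rng), rng=rng)
        for _ in range(copies)
    ]
```

Each variant keeps the original label, so a small dataset yields more training examples without new recordings.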

For the hackathon, we tested various models and, for the approach described above, deployed a Flask server running an inference engine that analyzes incoming sound files and classifies their emotion.
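The last step of such an inference engine is turning the model's raw scores into a single emotion label. Here is a minimal sketch of that step, assuming the model outputs one score per RAVDESS emotion category. The label order, the function names and the response shape are assumptions for illustration and would have to match the trained model.

```python
import math

# RAVDESS emotion categories. The order here is an assumption and must match
# the order the model was trained with.
EMOTIONS = ["neutral", "calm", "happy", "sad", "angry", "fearful", "disgust", "surprised"]


def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels=EMOTIONS):
    """Return the most likely label together with its probability."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"emotion": labels[best], "confidence": probs[best]}
```

Returning the confidence alongside the label lets the app ignore weak predictions instead of alerting on every clip.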

To augment our approach, we used GCP to extract the actual text from the audio file, and separately used an NLTK-based approach to classify the emotion of that text, giving our classification more context awareness.

The textual emotion classification was also deployed as a separate endpoint.
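Since the audio and text classifiers run independently, their outputs have to be combined somewhere to get the added context. One possible way to do this is a weighted average, sketched below under the assumption that each endpoint returns a probability per emotion label. The function name `fuse`, the response shape and the 0.6 weight are all hypothetical and not taken from the deployed code.

```python
def fuse(audio_probs, text_probs, audio_weight=0.6):
    """Blend two {label: probability} dicts with a weighted average."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    # A label missing from one classifier counts as probability 0 there.
    labels = set(audio_probs) | set(text_probs)
    combined = {
        label: audio_weight * audio_probs.get(label, 0.0)
        + (1.0 - audio_weight) * text_probs.get(label, 0.0)
        for label in labels
    }
    best = max(combined, key=combined.get)
    return best, combined
```

With equal weights, a clip that sounds mildly angry but whose transcript reads as clearly sad would be labelled sad, which is the kind of correction the text signal is meant to provide.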

These approaches and deployments are described and demonstrated in the following videos:
https://youtu.be/jd5Tzsq7ylE
https://youtu.be/lX5qbGYC0-c

The Jupyter notebook we used for experiments is located in the GitHub repo, along with some selected models and the deployment/inference engine.

Challenges we ran into

Reading a ton of literature on emotion classification and figuring out optimal approaches
Recording and streaming audio from Expo to Flask
Working with a team spread across the planet in three different countries/timezones (USA, Saudi Arabia and India)