The summer job applications we filled out and the dreadful interviews that followed made us wish we had an app to practise with before seeking help from our peers and mentors. Presenting is also a skill that can always be improved, and having tech support for that would be amazing.

What it does

Our application detects filler words and the sentiment conveyed in speech, and suggests changes that might help the user improve their speaking technique according to their needs. Right now it can detect filler words and a selection of emotions (anger and nervousness).
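To make the filler-word part concrete, here is a minimal sketch of how a transcript could be scanned for fillers. The word list and function name are illustrative assumptions, not the project's actual code:

```python
import re

# Hypothetical filler list for illustration; the real app's list may differ.
FILLER_WORDS = {"um", "uh", "like", "really", "basically", "actually"}

def count_fillers(transcript: str) -> dict:
    """Count each filler word found in a speech-to-text transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = {}
    for w in words:
        if w in FILLER_WORDS:
            counts[w] = counts.get(w, 0) + 1
    return counts

print(count_fillers("So, like, I really think it was, um, really good"))
# → {'like': 1, 'really': 2, 'um': 1}
```

Counts like these can then be shown on screen so the user sees which fillers they lean on.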

How we built it

We used Python for our backend, with React on the front end. To detect emotions we used a Praat-inspired Python library (Praat is a speech-analysis program) along with SpeechRecognition to interface with Google's speech-to-text service. We modified the code so that it can detect filler words of varying lengths, such as "like" and "really", and display them on screen so the user can see what was said without replaying the whole recording. We wrote algorithms that infer sentiment from changes in intensity, pitch, and speaking rate, basing our calculations on studies that link these vocal features to specific emotions.
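The rule-based sentiment pass could look something like the sketch below. The thresholds and the mapping of features to emotions are illustrative assumptions only; in the real pipeline the pitch and intensity values would come from a Praat-style acoustic analysis:

```python
# Hypothetical rule-based classifier: anger is assumed to show as high
# intensity with raised pitch, nervousness as raised pitch with fast speech.
# All threshold values here are made up for illustration.

def classify_emotion(mean_pitch_hz: float,
                     mean_intensity_db: float,
                     words_per_minute: float) -> str:
    """Map coarse vocal features to a sentiment label with simple rules."""
    if mean_intensity_db > 70 and mean_pitch_hz > 180:
        return "anger"
    if mean_pitch_hz > 180 and words_per_minute > 160:
        return "nervousness"
    return "neutral"

print(classify_emotion(mean_pitch_hz=200, mean_intensity_db=75,
                       words_per_minute=150))
# → anger
```

A trained classifier could later replace these hand-tuned rules once a suitable dataset is available.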

Challenges we ran into

- Creating a functional front end, as this was our first time doing it
- Connecting the front end to the back end
- Recording audio in real time and sending it to the back end
- Finding a dataset to train a potential ML model (one does not seem to exist)
- Understanding the research

Accomplishments that we are proud of

- Learning front-end programming and creating a functional front end
- Detecting emotions fairly accurately for male voices, along with suggestions for reducing filler words

What we learned

- How sound works and its technical features
- Creating a front end with React and JavaScript
- Speech analysis
- Using the Flask framework
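Connecting the React front end to the Python back end came down to exposing Flask routes that accept uploaded audio. The sketch below shows the general shape of such an endpoint; the route name and response fields are assumptions for illustration, not the project's actual API:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/analyze", methods=["POST"])
def analyze():
    # In the real app the uploaded clip would be transcribed and scored;
    # here we just report how many bytes of audio arrived.
    audio = request.files.get("audio")
    size = len(audio.read()) if audio else 0
    return jsonify({"received_bytes": size})

# app.run(debug=True) would serve this locally during development.
```

The front end can then POST recorded audio as multipart form data and render the JSON response.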

What's next for Voice Sentiment recognition

This project can be improved further with a real-time recording and response system. We could also train a classifier on voice-emotion datasets to improve sentiment-detection accuracy. Another thing we envision is the application recognising potential stress latent in a user's voice and the words they use frequently, and directing them to resources that might be helpful.

We hope that this can see further use:

  • by actors practising to convey certain emotions in their voice,
  • in call centres, where many audio interactions take place, to help gauge clients' emotional state objectively and automatically rather than filling it in manually. It could also detect fatigue in employees' voices and provide them with support during their shifts,
  • by detecting potential depression or suicidal tendencies in people who might tend to avoid human interaction, and helping them reconnect with their peers or connect with professionals for support.
