Inspiration
Little sentiment analysis is done directly on raw audio files or streams. First, real-time sentiment analysis on audio is a cool and challenging problem. Second, it can be used to track the progress of a conference or meeting: based on how attendees' emotions evolve, we can keep the topic, discussion, and meeting on track.
What it does
It takes an audio file as input and visualizes the emotion changes in real time.
How we built it
We used the Microsoft Bing Speech API to convert audio files into text. We then used spaCy and gensim to parse the text and map each word to a 300-dimensional float vector (word2vec embeddings). Next, we projected the text onto different emotion categories, normalized the scores with softmax, and applied some data smoothing. The result is an emotion vector for each chunk of text.
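The projection step above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the 4-dimensional vectors and the `EMOTION_ANCHORS` table are hypothetical stand-ins for 300-dimensional word2vec embeddings, and cosine similarity is one plausible choice of projection.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: maps raw scores to a distribution summing to 1."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical 4-d anchor vectors standing in for real 300-d word2vec embeddings
# of emotion words (the real pipeline would look these up with gensim).
EMOTION_ANCHORS = {
    "joy":     np.array([0.9, 0.1, 0.0, 0.2]),
    "anger":   np.array([0.0, 0.8, 0.3, 0.1]),
    "sadness": np.array([0.1, 0.2, 0.9, 0.0]),
}

def emotion_vector(word_vecs, anchors=EMOTION_ANCHORS):
    """Average a chunk's word vectors, score against each emotion anchor
    by cosine similarity, then softmax-normalize into an emotion vector."""
    chunk = np.mean(word_vecs, axis=0)
    scores = np.array([
        np.dot(chunk, a) / (np.linalg.norm(chunk) * np.linalg.norm(a))
        for a in anchors.values()
    ])
    return dict(zip(anchors.keys(), softmax(scores)))
```

Smoothing (e.g. a moving average over consecutive chunks' emotion vectors) would then be applied on top of these per-chunk outputs.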
Challenges we ran into
- The Bing Speech API does not provide a real-time parsing interface for Python, so we had to manually chop the audio file into small chunks and tweak the function to update at a fixed interval.
- Real-time analysis demands significant computing performance.
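The chunking workaround can be sketched with the standard-library `wave` module. This is an illustrative sketch, not our exact implementation; the function name and chunk length are assumptions, and each resulting file would be sent to the speech API separately.

```python
import wave

def chop_wav(path, chunk_seconds, out_prefix="chunk"):
    """Split a WAV file into fixed-length chunks so each chunk can be
    transcribed separately, approximating real-time processing."""
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        i = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{i:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)   # keep channels, sample width, rate
                dst.writeframes(frames)
            paths.append(out_path)
            i += 1
    return paths
```

A timer then picks up each new chunk's transcription at the chosen interval, which is what makes the visualization feel real-time despite the batch API.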
Accomplishments that we're proud of
We created a usable end-to-end tool without staying up late.
What we learned
The Bing Speech REST API, word-vector embeddings, and plot animation with matplotlib.
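The matplotlib animation piece can be sketched with `FuncAnimation`. The per-chunk scores below are made-up placeholders for the pipeline's real output; in the live tool, new chunks would be appended as they are transcribed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Hypothetical per-chunk emotion scores (placeholders for real pipeline output).
frames = [
    {"joy": 0.5, "anger": 0.2, "sadness": 0.3},
    {"joy": 0.3, "anger": 0.4, "sadness": 0.3},
    {"joy": 0.2, "anger": 0.6, "sadness": 0.2},
]

fig, ax = plt.subplots()
ax.set_ylim(0, 1)
lines = {e: ax.plot([], [], label=e)[0] for e in frames[0]}
ax.legend()

def update(i):
    # Extend each emotion's line with all scores seen up to chunk i.
    xs = list(range(i + 1))
    for emotion, line in lines.items():
        line.set_data(xs, [f[emotion] for f in frames[: i + 1]])
    ax.set_xlim(0, max(1, i))
    return list(lines.values())

anim = FuncAnimation(fig, update, frames=len(frames), interval=500, blit=False)
```

The `interval` would be matched to the audio chunk length so the plot advances in step with the transcription.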
What's next for speech2emotion
- Use Node.js to implement analysis on streaming audio.
- Compare and contrast different emotion categories and use statistics to determine the most representative emotions; use them as the basis for sentiment scores.
- A front-end user interface.