Inspiration
Little sentiment analysis is done directly on raw audio files or streams. First, real-time sentiment analysis on audio is a cool and challenging problem. Second, it can be used to track the progress of a conference or meeting: based on how attendees' emotions evolve, we can keep the topic, discussion, and meeting on track.
What it does
It takes an audio file as input and visualizes the emotion changes in real time.
How we built it
We used the Microsoft Bing Speech API to convert audio files into text. We then used spaCy and gensim to parse the text and map each word to a 300-dimensional float vector (word2vec embeddings). Next, we projected the text onto different emotion categories, normalized the scores with softmax, and applied some data smoothing. The result is an emotion vector for each chunk of text.
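The projection step above can be sketched roughly as follows. This is a minimal illustration, not our actual code: the 4-dimensional vectors and the `EMOTION_ANCHORS` table are hypothetical stand-ins for 300-dimensional word2vec embeddings, and cosine similarity is one plausible choice of projection.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: maps raw scores to a distribution summing to 1."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical 4-d anchor vectors standing in for real 300-d word2vec embeddings
# of emotion words (the real pipeline would look these up with gensim).
EMOTION_ANCHORS = {
    "joy":     np.array([0.9, 0.1, 0.0, 0.2]),
    "anger":   np.array([0.0, 0.8, 0.3, 0.1]),
    "sadness": np.array([0.1, 0.2, 0.9, 0.0]),
}

def emotion_vector(word_vecs, anchors=EMOTION_ANCHORS):
    """Average a chunk's word vectors, score against each emotion anchor
    by cosine similarity, then softmax-normalize into an emotion vector."""
    chunk = np.mean(word_vecs, axis=0)
    scores = np.array([
        np.dot(chunk, a) / (np.linalg.norm(chunk) * np.linalg.norm(a))
        for a in anchors.values()
    ])
    return dict(zip(anchors.keys(), softmax(scores)))
```

Smoothing (e.g. a moving average over consecutive chunks' emotion vectors) would then be applied on top of these per-chunk outputs.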
Challenges we ran into
- The Bing Speech API does not provide a real-time parsing interface for Python, so we had to manually chop the audio file into small chunks and tweak the function to update at a fixed interval.
- Real-time analysis demands significant computing performance.
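The chunking workaround can be sketched with the standard-library `wave` module. This is an illustrative sketch, not our exact implementation; the function name and chunk length are assumptions, and each resulting file would be sent to the speech API separately.

```python
import wave

def chop_wav(path, chunk_seconds, out_prefix="chunk"):
    """Split a WAV file into fixed-length chunks so each chunk can be
    transcribed separately, approximating real-time processing."""
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(params.framerate * chunk_seconds)
        i = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{i:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)   # keep channels, sample width, rate
                dst.writeframes(frames)
            paths.append(out_path)
            i += 1
    return paths
```

A timer then picks up each new chunk's transcription at the chosen interval, which is what makes the visualization feel real-time despite the batch API.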
Accomplishments that we're proud of
We created a usable end-to-end tool without staying up late.
What we learned
The Bing Speech REST API, word-vector embeddings, and plot animation with matplotlib.
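The matplotlib animation piece can be sketched with `FuncAnimation`. The per-chunk scores below are made-up placeholders for the pipeline's real output; in the live tool, new chunks would be appended as they are transcribed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Hypothetical per-chunk emotion scores (placeholders for real pipeline output).
frames = [
    {"joy": 0.5, "anger": 0.2, "sadness": 0.3},
    {"joy": 0.3, "anger": 0.4, "sadness": 0.3},
    {"joy": 0.2, "anger": 0.6, "sadness": 0.2},
]

fig, ax = plt.subplots()
ax.set_ylim(0, 1)
lines = {e: ax.plot([], [], label=e)[0] for e in frames[0]}
ax.legend()

def update(i):
    # Extend each emotion's line with all scores seen up to chunk i.
    xs = list(range(i + 1))
    for emotion, line in lines.items():
        line.set_data(xs, [f[emotion] for f in frames[: i + 1]])
    ax.set_xlim(0, max(1, i))
    return list(lines.values())

anim = FuncAnimation(fig, update, frames=len(frames), interval=500, blit=False)
```

The `interval` would be matched to the audio chunk length so the plot advances in step with the transcription.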
What's next for speech2emotion
- Use Node.js to implement analysis on streaming audio.
- Compare and contrast different emotion categories and use statistics to determine the most representative emotions; use them as the basis for sentiment scores.
- A front-end user interface.