What it does
GroupEmote is a video-analysis tool that takes a video, outputs information about the faces it detects in the conversation, and analyzes each speaker's tone and sentiment.
How we built it
We used OpenCV to build face detection for video, which draws a box and an id for each face it recognizes. We also used IBM Watson's speech-to-text and Tone Analyzer services to transcribe what people say in the video and output the transcript, along with the top 3 tones detected for each person.
Challenges we ran into / What we learned
We originally wanted GroupEmote to analyze video as a live stream. To do so, we tried to integrate OpenTok, a video-sharing web app based on WebRTC, since the Google Hangouts API is no longer supported. However, OpenTok's API was not only hard to understand but also did not let us pull audio data from a live video call, so we had to pivot to analyzing recorded video. OpenCV also caused many build problems and lacked a pre-trained emotion-recognition model; as a result, we had to find a large dataset on our own and ran out of time to train a model with supervised learning. At one point we looked into the Microsoft Emotion API instead of OpenCV, but it didn't allow direct file streams and had a very slow pull rate, so we abandoned that idea as well. IBM Watson's tone analysis was accurate, but its latency, combined with the slow HackMIT wifi, made matching the tone analysis to the video difficult. Finally, our team members were working on different platforms (OSX, Windows, Linux), which caused certain Python libraries not to work for some of us.
What's next for GroupEmote
We want to find a way to integrate our existing technology into a live video-sharing web app. Our ultimate vision is a video-conferencing app that looks at each person, detects their emotion from their image and their tone from what they say, then analyzes that data to aid activities like interviewing, to help managers better understand meetings across a diversity of team-member personalities, and to help socially impaired people visually understand social cues. It could also be used as a trigger system: a large change in facial emotion would trigger an audio recording and subsequent tone analysis, providing more targeted analysis and minimizing the amount of data being processed. We also want to make our user interface more accessible and aesthetic :)
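The trigger idea above can be sketched as a simple threshold on frame-to-frame emotion scores; the threshold value and the single-score representation are illustrative assumptions, not a finished design.

```python
def emotion_trigger(scores, threshold=0.4):
    """Return the frame indices where the emotion score jumps by more than
    `threshold` versus the previous frame -- in the envisioned system, each
    index would start an audio recording for subsequent tone analysis."""
    triggers = []
    for i in range(1, len(scores)):
        if abs(scores[i] - scores[i - 1]) > threshold:
            triggers.append(i)
    return triggers


# A sudden swing at frame 3 fires the trigger.
print(emotion_trigger([0.1, 0.15, 0.2, 0.9, 0.85]))  # -> [3]
```

Only the flagged segments would then be sent for tone analysis, which keeps the amount of audio processed small.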
Built With
- ibm-watson
- ibm-watson-text-to-speech
- ibm-watson-tone-analyzer
- javascript
- opencv
- python