Body language is an important dimension of understanding others, and the visually challenged are currently deprived of it. The idea of this hack is to leverage technology to read one important aspect of body language, facial expression, and use it to infer human emotion. This emotion is then conveyed to the visually challenged person through a medium they can perceive, such as voice.
What it does
We have built a multimedia app that recognizes facial expressions and determines the emotions of the person(s) in a video chat. The emotions inferred are happy, sad, anger, and neutral, and they are conveyed to the visually challenged person as an augmented voice.
How it will be built
In this hack, we use several open source components for multimedia chat, computer vision and machine learning.
- Build real time communication capabilities using WebRTC components.
- Using computer vision libraries, detect faces and extract facial expressions. We have used OpenCV in Python.
- Use publicly available data sets with annotated facial expressions to build and run a neural-network-based classification algorithm. The output of the algorithm is a marshalled (or pickled) object used for facial expression detection.
- Build an app that fetches each frame from the camera, determines the facial expression, maps it to one of the emotions (happy, sad, anger, or neutral), and sends it as text to the receiver over the peer-to-peer network. For the purpose of the hack, we will create a simple socket connection and periodically send the emotion as text to the receiver.
- Build the receiver app to receive the inferred emotion as text and convey it as an augmented voice.
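The text exchange between sender and receiver described in the last two steps could look roughly like the sketch below. It is an illustration, not the hack's actual code: the newline-delimited framing, the function names `send_emotion` and `receive_emotions`, and the use of `socket.socketpair()` to stand in for two machines are all assumptions made for the example.

```python
import socket

# The four emotion labels named in this write-up.
EMOTIONS = ("happy", "sad", "anger", "neutral")


def send_emotion(sock: socket.socket, emotion: str) -> None:
    """Send one emotion label as a newline-terminated UTF-8 line (assumed framing)."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    sock.sendall(emotion.encode("utf-8") + b"\n")


def receive_emotions(sock: socket.socket):
    """Yield emotion labels as complete lines arrive, until the peer closes."""
    buffer = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:  # empty read: the sender closed its side
            break
        buffer += chunk
        # A single recv() may hold zero, one, or several complete lines.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")


def demo() -> list:
    """Round-trip two labels through a connected socket pair in one process."""
    sender, receiver = socket.socketpair()
    with sender, receiver:
        for label in ("neutral", "happy"):
            send_emotion(sender, label)
        sender.shutdown(socket.SHUT_WR)  # signal end of stream to the receiver
        return list(receive_emotions(receiver))
```

In a real receiver, each label yielded by `receive_emotions` would be handed to a text-to-speech step instead of being collected in a list.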
Challenges we ran into
- Performance of our app in detecting facial expressions in near real time. Processing frames from the webcam to determine the facial expression is a resource-intensive task, and we had to tune this.
- Facial expressions can change very quickly in real life, and we don't want to overwhelm the user by conveying emotions continuously. We decided to send the emotion only when the person's expression has changed and held for a reasonable duration. Determining this duration and ignoring transient values was a challenge.
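One way to express the "send only when the expression has held for a while" rule is a small debouncer, sketched below. The class name `EmotionDebouncer` and the 2-second default are assumptions for illustration; this write-up does not state the duration the team settled on.

```python
from typing import Optional


class EmotionDebouncer:
    """Report an emotion only after it has persisted for `hold_seconds`."""

    def __init__(self, hold_seconds: float = 2.0) -> None:  # 2.0 s is an assumed value
        self.hold_seconds = hold_seconds
        self.reported: Optional[str] = None   # last emotion conveyed to the user
        self.candidate: Optional[str] = None  # emotion currently being timed
        self.candidate_since = 0.0            # time the candidate first appeared

    def update(self, emotion: str, now: float) -> Optional[str]:
        """Feed one per-frame prediction; return the emotion to convey, or None."""
        if emotion != self.candidate:
            # A different expression appeared: restart the timer.
            self.candidate = emotion
            self.candidate_since = now
        if emotion == self.reported:
            return None  # nothing new to tell the user
        if now - self.candidate_since >= self.hold_seconds:
            self.reported = emotion
            return emotion
        return None
```

In a capture loop, `update` would be called once per classified frame with `time.monotonic()` as `now`, and only non-`None` results would be sent to the receiver. A brief flicker, such as one "neutral" frame in the middle of a smile, resets the timer but never reaches the user.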
Accomplishments that we're proud of
- Hacking again for #InMobi, which is ranked the 14th most innovative company in the world.
- Using technology to help people understand other humans, by augmenting emotions in multimedia chat
- Our facial expression detection, and the emotion inference built on it, runs on a live video feed, unlike other hacks that focus on static images.
- We run this on our laptops with a basic configuration, without needing a server-class machine.
What we learned
When working under severe time constraints, coordination and integration are the points of failure.
What's next for EmoSense for Visually Impaired
The app can be fooled by placing a high-resolution image of a person in front of the camera. Given more time, we would want to augment it with other inputs, such as reading the person's pulse.