Much like WordLens, which makes travel to a foreign country far easier by translating printed text into one's native language, EarLens makes it much easier for the hearing-disabled to take part in everyday interaction. By turning what would otherwise be nearly impossible exchanges into manageable conversations, it opens up to deaf users a world of spoken interaction from which they have previously been shut out.
What it does
Using the Houndify API, the other person's speech is converted into text, which is displayed as a speech bubble atop a live video feed on the user's phone. The user responds by typing, and Android's built-in text-to-speech capability renders that text as audible speech for the other person. This removes the need for any shared medium of communication (e.g. American Sign Language) beyond, perhaps, English. The app uses built-in head tracking to position each speech bubble as close as possible to the speaker's mouth; as long as people are somewhat spaced apart and only one person speaks at a time, this lets deaf users follow a multi-person conversation. It's certainly not an ideal mode of communication, since real conversations are full of interjections and unpredictable behavior, but it's infinitely preferable to a total lack of communication.
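The turn-based flow above (hear the other person as text, reply by typing, have the reply spoken aloud) can be sketched in plain Java. The `SpeechToText` and `SpeechSynthesizer` interfaces here are hypothetical stand-ins for Houndify and Android's `TextToSpeech`, not their real APIs:

```java
// Hypothetical stand-ins for Houndify (speech -> text) and
// Android's TextToSpeech (text -> speech); not the real APIs.
interface SpeechToText { String transcribe(byte[] audio); }
interface SpeechSynthesizer { void speak(String text); }

/** One conversational turn: hear the other person, then reply by typing. */
class ConversationTurn {
    private final SpeechToText stt;
    private final SpeechSynthesizer tts;

    ConversationTurn(SpeechToText stt, SpeechSynthesizer tts) {
        this.stt = stt;
        this.tts = tts;
    }

    /** Incoming audio becomes the text shown in the speech bubble. */
    String hear(byte[] audio) {
        return stt.transcribe(audio);
    }

    /** The user's typed reply is rendered as audible speech. */
    void reply(String typedText) {
        tts.speak(typedText);
    }
}
```

Keeping the pipeline behind two small interfaces like this is what lets the same flow work whether the text arrives from a cloud recognizer or anywhere else.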
How we built it
Android and Houndify. The Houndify SDK for Android was unfortunately very difficult to work with, because we were fundamentally altering its use case: rather than a one-time, "touch and go" transcription of speech, we forced it (successfully) to transcribe continuous streams of speech. This required stripping significant portions of the original Android code example, and took several hours' worth of careful analysis of the SDK's code.
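The core trick, stripped of all Houndify specifics, is restarting a one-shot recognizer every time it delivers a final result. Houndify's actual listener API looks different; the `OneShotRecognizer` interface below is a hypothetical stand-in used only to show the restart-on-completion pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a one-shot recognizer like Houndify's:
// it delivers exactly one final transcription per start() call.
interface OneShotRecognizer {
    void start(Consumer<String> onFinalResult);
}

/**
 * Turns a one-shot recognizer into a continuous transcriber by
 * immediately restarting it each time a final result arrives.
 */
class ContinuousTranscriber {
    private final OneShotRecognizer recognizer;
    private final List<String> transcript = new ArrayList<>();
    private boolean running = false;

    ContinuousTranscriber(OneShotRecognizer recognizer) {
        this.recognizer = recognizer;
    }

    void start() {
        running = true;
        listenOnce();
    }

    void stop() {
        running = false;
    }

    private void listenOnce() {
        recognizer.start(result -> {
            transcript.add(result);
            if (running) listenOnce();  // restart for the next utterance
        });
    }

    List<String> transcript() {
        return transcript;
    }
}
```

In the real app the restart happens on the recognizer's callback thread, so the equivalent of `running` needs to be handled with the SDK's own lifecycle rules in mind.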
Challenges we ran into
The Houndify SDK was something of a pain.
Also, mounting a TextView over a SurfaceView proved exceedingly difficult, but moving that TextView around to follow the head of the person in the camera feed was harder still. Because its position updates arrived faster than the UI could redraw, the TextView ended up confined to a small box in a corner of the screen, and it took a lot of coaxing and slowing of the sampling rate before it tracked accurately. Even now it isn't perfect: apparent head size varies with distance from the camera, so we can only land the TextView in the approximate vicinity of a speaker's head.
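The sampling-rate fix boils down to rate-limiting the bubble's position updates so they never outpace the UI. A minimal sketch of that throttle in plain Java (the class name and the 100 ms interval are illustrative, not the app's actual values; the clock is passed in so the logic is testable):

```java
/**
 * Rate-limits speech-bubble position updates so they never outpace
 * the UI redraw rate. The 100 ms minimum interval is illustrative.
 */
class ThrottledBubblePositioner {
    private final long minIntervalMs;
    private long lastUpdateMs = Long.MIN_VALUE;  // sentinel: no update yet
    private float x, y;  // last accepted bubble position, in pixels

    ThrottledBubblePositioner(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true if the update was accepted, false if throttled. */
    boolean offer(float newX, float newY, long nowMs) {
        if (lastUpdateMs != Long.MIN_VALUE && nowMs - lastUpdateMs < minIntervalMs) {
            return false;  // too soon; keep the bubble where it is
        }
        lastUpdateMs = nowMs;
        x = newX;
        y = newY;
        return true;
    }

    float x() { return x; }
    float y() { return y; }
}
```

Dropping over-frequent updates rather than queuing them is what keeps the bubble from lagging ever further behind the live video.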
Accomplishments that we're proud of
Kevin and I walked into this hackathon with zero knowledge of Android app development, and we've walked out feeling fairly comfortable with at least the basics, with the tools we need to prototype apps on our own. Not only that, we've made something we believe will truly transform the world for a sizable number of people! I think that's the most important thing we gained from the 36 hours we spent here.
What we learned
Android app development. Big time. Also (in Kevin's case), how to survive 36 hours of not sleeping without a drop of coffee.
What's next for EarLens
We really hope the project takes off and proves a valuable asset to the hearing-disabled, so we may even expand to iOS and Windows Phone in the future to make it useful to even more people. We'll also actively invest in research on the cocktail party problem, using FastICA with multiple microphones, so that conversations can continue even when multiple people talk at the same time.