Headphone manufacturers keep improving the sound isolation and noise canceling of their headphones, but sometimes this causes problems. With good sound isolation, even a $20 headset and a bit of music can keep you from hearing anything going on around you. What if a fire alarm goes off and you miss it? What if you're walking in the street, a car honks at you, and you get hit because you didn't hear it? What if your boss comes up behind you and starts talking, decides you're ignoring them, and fires you?
What it does
We created Selective Hearing, a piece of software that monitors microphone input. If a sound is deemed important, it is passed through as if the headphones weren't even there.
How we built it
We grabbed audio in frames of 512 samples at 44100 Hz, converted each frame to a frequency spectrum, and used neural networks to analyze large datasets of the resulting spectra. We created one dataset for a fire alarm sound and another for our own voices, so that either can be passed through. We chose to make it impossible to turn off the fire alarm passthrough, because that would present a safety risk.
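To make the framing step concrete, here is a minimal sketch of turning one 512-sample frame into a magnitude spectrum. At 44100 Hz a frame covers about 11.6 ms, and the bins are spaced about 86 Hz apart. This is not our actual code: the Hann window, the plain DFT written with the standard library, and the names `magnitude_spectrum` and `bin_frequency` are all assumptions made for illustration. A real-time version would use an FFT library instead.

```python
import cmath
import math

SAMPLE_RATE = 44100  # Hz, from the writeup
FRAME_SIZE = 512     # samples per frame, from the writeup


def magnitude_spectrum(frame):
    """Return the magnitudes of the non-negative frequency bins of one frame.

    Assumption: a Hann window is applied first to reduce spectral leakage.
    This is a plain O(n^2) DFT, fine for a demo but too slow for real time.
    """
    n = len(frame)
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, s in enumerate(windowed):
            acc += s * cmath.exp(-2j * math.pi * k * i / n)
        spectrum.append(abs(acc))
    return spectrum


def bin_frequency(k):
    """Center frequency in Hz of bin k for the frame size above."""
    return k * SAMPLE_RATE / FRAME_SIZE
```

For a 512-sample frame this yields 257 bins, and that list of magnitudes is the kind of feature vector a classifier could be trained on.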
Challenges we ran into
At first we had trouble with high false positive rates. When we tried to fix those robustly, we ended up with a high rate of missed detections instead. In the end, we made the fire alarm profile more robust to false positives and left the voice profiles sensitive, because every missed voice detection means lost frames, which massively degrades sound quality.
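One way to express that trade-off is a per-profile gate with its own score threshold and its own number of consecutive frames required before passthrough opens. The sketch below is only an illustration of the idea, not our implementation: the `Gate` class, the score scale from 0 to 1, and the specific threshold and frame-count values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """Opens passthrough once enough consecutive frames score above a threshold."""
    threshold: float      # minimum classifier score that counts as a hit (assumed 0..1)
    frames_required: int  # consecutive hits needed before the gate opens
    streak: int = 0       # current run of consecutive hits

    def update(self, score):
        """Feed one frame's score; return True if this frame should pass through."""
        if score >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0
        return self.streak >= self.frames_required


# Hypothetical settings: the alarm gate is strict, the voice gate is sensitive.
fire_alarm_gate = Gate(threshold=0.9, frames_required=3)
voice_gate = Gate(threshold=0.5, frames_required=1)
```

A strict gate suppresses isolated false positives at the cost of opening a few frames late, which matters little for an alarm that keeps sounding. A sensitive gate opens on the first hit, so fewer voice frames are lost.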
Accomplishments that we're proud of
We created a system that lets you select the sounds, and in many cases even the voices of specific people, that you want to hear. It passes them through so that you hear them clearly even with headphones on and never miss an important moment.
What we learned
We learned that with some machine learning analysis, it's possible to identify both noises and some voices with fairly high accuracy, provided the sounds are normalized and calibrated against background noise.
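As a rough illustration of what normalizing and calibrating against noise can mean for a magnitude spectrum, the sketch below averages a few spectra recorded in a quiet room to estimate a noise floor, subtracts that floor from each new spectrum, and scales the result so its largest bin is 1. The function names and the choice of mean subtraction with peak scaling are assumptions for this example, not a description of our exact method.

```python
def estimate_noise_floor(calibration_spectra):
    """Average several magnitude spectra of background noise, bin by bin."""
    count = len(calibration_spectra)
    n_bins = len(calibration_spectra[0])
    return [sum(s[k] for s in calibration_spectra) / count for k in range(n_bins)]


def normalize(spectrum, noise_floor):
    """Subtract the noise floor (clamped at zero) and scale the peak to 1."""
    cleaned = [max(m - n, 0.0) for m, n in zip(spectrum, noise_floor)]
    peak = max(cleaned)
    if peak == 0.0:
        return cleaned  # nothing rises above the noise floor
    return [c / peak for c in cleaned]
```

Scaling by the peak makes the features less dependent on how loud the sound is or how far away the source is, so a quiet and a loud instance of the same sound look similar to the classifier.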
What's next for Selective Hearing
We want to improve the voice modeling and make it work with smaller datasets, so that voice profiles can be constructed faster.