Inspiration
Individuals who are deaf or hard of hearing are vulnerable to dangers in their environment that they cannot react to because they cannot hear the warning sounds. We wanted to build something that could partially restore that lost sense.
Introducing Spidey-Sense! An AI-based danger-detection system that picks up on the sounds in the world around you, warns you about dangers in your environment, and acts accordingly.
What it does
Our application listens to the user's surrounding audio and notifies them if a potential danger is nearby. Upon confirmation, or after a certain amount of time, the application sends an emergency SMS to the 911 text number or to another emergency contact of the user's choice, describing the situation and the user's location. The user can also send a custom SMS.
How we built it
The data pipeline of our project is very simple:
- Sound recording
- Preprocessing
- Sound Classification
- Sound renormalization
- Extracting inferences and sending the warning
Sound recording: The app records into a 0.975 s buffer (the expected input size for YAMNet) every 0.5 s. Recording every 0.5 s significantly reduces processing overhead and keeps the results more consistent. It also produces overlapping windows, which help ensure that a sound isn't missed.
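The buffering above can be sketched as a rolling window: each new 0.5 s chunk is appended while the oldest samples are dropped, so consecutive 0.975 s windows overlap. This is an illustrative sketch, not our exact app code; the 16 kHz sample rate is YAMNet's expected input rate.

```kotlin
const val SAMPLE_RATE = 16_000                         // YAMNet expects 16 kHz mono
const val WINDOW_SIZE = (0.975 * SAMPLE_RATE).toInt()  // 15 600 samples (0.975 s)
const val HOP_SIZE = SAMPLE_RATE / 2                   // 8 000 samples (0.5 s)

class RollingAudioWindow {
    private val window = FloatArray(WINDOW_SIZE)

    // Shift old samples left and append the newest 0.5 s chunk,
    // returning the full 0.975 s window ready for classification.
    fun push(chunk: FloatArray): FloatArray {
        require(chunk.size == HOP_SIZE)
        val keep = WINDOW_SIZE - HOP_SIZE
        System.arraycopy(window, HOP_SIZE, window, 0, keep)
        System.arraycopy(chunk, 0, window, keep, HOP_SIZE)
        return window.copyOf()
    }
}
```

Because each window keeps roughly 0.475 s of the previous one, a short sound that straddles a chunk boundary still appears whole in at least one window.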
Preprocessing: We selected a Butterworth high-pass filter for this project because it gave the most performant and accurate results according to the paper "Audio signal based danger detection using signal processing and deep learning". It was a good pick since it let us drop low-frequency sounds and stabilize the higher frequencies, making for better sound detection.
Sound classification: We chose Google's pre-trained YAMNet model, which is trained on the massive AudioSet corpus. It was the ideal choice since it already classifies sound extremely well and we could add further convolutional layers to boost the classification. It is also compatible with TensorFlow Lite, which we needed in order to perform local on-device processing. Every half second, we feed the 0.975 s buffer into the model to generate a vector of probabilities indicating which sound is most likely.
Sound renormalization: Human speech is so prevalent that it overtakes everything else in the sound profile, so we decided to filter it out completely. We take the vector of probabilities and remove speech (and related classes) from consideration as the most probable environmental sound. Then we divide each remaining probability by the sum of the remaining probabilities. This renormalizes the distribution and allows us to compare it to a threshold value.
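The renormalization step reduces to a few lines: zero out the speech-related classes, then rescale what remains so it sums to 1 again. A minimal sketch, assuming the speech class indices are supplied by the caller (the hard-coded indices in the test are illustrative, not YAMNet's real ones):

```kotlin
// Remove speech-related classes from the score vector and renormalize
// the remaining probabilities so they form a valid distribution.
fun renormalize(scores: FloatArray, speechIndices: Set<Int>): FloatArray {
    val filtered = FloatArray(scores.size) { i ->
        if (i in speechIndices) 0f else scores[i]
    }
    val remaining = filtered.sum()
    if (remaining == 0f) return filtered  // nothing but speech was detected
    return FloatArray(filtered.size) { i -> filtered[i] / remaining }
}
```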
Extracting inferences and sending the warning: By selecting the highest probability in the renormalized vector, we determine how the sound is classified. We then compare it to a threshold of 0.6: the model has to be at least 60% confident in order to classify a sound. If it is, the app warns the user and sends a text to an emergency contact with the user's current location in coordinates.
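The decision step above can be sketched as an argmax followed by the confidence check; the function name and label list are illustrative:

```kotlin
const val CONFIDENCE_THRESHOLD = 0.6f

// Return the label of the most probable class, or null when the model
// is not at least 60% confident in any single class.
fun classify(probs: FloatArray, labels: List<String>): String? {
    val best = probs.indices.maxByOrNull { probs[it] } ?: return null
    return if (probs[best] >= CONFIDENCE_THRESHOLD) labels[best] else null
}
```

Returning null for low-confidence windows is what keeps the app from spamming alerts on ambiguous background noise.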
The application was developed in Android Studio in Kotlin using the TensorFlow Lite library. All art was custom made.
Challenges we ran into
Our team knew neither Android Studio nor the Kotlin language. It was definitely rough learning it all on the spot, since there is a lot of syntax and there are intricate conventions in Android Studio.
We are also not strong in the area of signal processing, so that research took a lot of time. Fortunately, the paper we found gave us some confidence in the filter we chose.
Lastly, our program was detecting only mundane sounds like speech, so we had to resort to alternative filtering methods in order to pick up on the next most probable sound.
Accomplishments that we're proud of
We overcame the language barrier and developed a product that helps the deaf community. We also learned a language and technology we were completely unfamiliar with, and actually made it work.
What's next for Spidey-Sense | Shaun Desaulniers Fan Club
Cleaning up the app to deliver a smoother, more functional user experience, and turning it into a background service so it can run all the time instead of only while the app is open.
As a further extension, we would like to build in generative AI to talk to someone on the receiving end of the SMS alert (such as 911 or the emergency contact). The generative AI would speak on behalf of the individual and describe their location and surroundings. Combined with speech-to-text, it could even take over verbal communication in emergency scenarios, an extremely important capability for individuals who may not be able to speak for themselves.
References
- Fine, A. A., Ashikuzzaman, Md., & Aziz, A. (2024). Audio signal based danger detection using signal processing and deep learning. Expert Systems with Applications, 237, 121646. https://doi.org/10.1016/j.eswa.2023.121646
- Kotlin Reference
- TensorFlow Lite Reference
Built With
- android-studio
- java
- kotlin
- krita