SoundSense

Inspiration

The inspiration for SoundSense came from recognizing a critical gap in assistive technology for the deaf community. While working with AI hardware acceleration, I realized there was a meaningful opportunity to create something that could genuinely improve lives. In my opinion, there's a significant lack of useful support tools for those who are deaf, particularly when it comes to environmental awareness and safety. I wanted to leverage cutting-edge AI hardware to build something that could serve as "digital ears" - providing real-time awareness of the auditory world that deaf individuals often miss, especially critical safety sounds like fire alarms, emergency sirens, and security alerts.

What it does

SoundSense is an AI-powered, real-time audio classification system designed for deaf and hard-of-hearing individuals. The system continuously monitors environmental audio and provides instant visual alerts for important sounds. It can identify 521 different audio events using Google's YAMNet model, with a priority system that categorizes sounds as critical (fire alarms, smoke detectors), high priority (emergency sirens), or medium priority (doorbells, phone rings). A web-based interface shows live sound classifications, confidence levels, and historical data through interactive charts. For critical alerts, it integrates with Discord to send remote notifications to users, caregivers, and support networks. The goal is to give deaf users the environmental awareness they need for safety, security, and independent living.
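As a rough illustration of the priority system described above, the sketch below maps a detected label and confidence to one of the three priority tiers. The table contents, the label strings (written in the style of AudioSet class names and worth checking against the model's class map), the `MIN_CONFIDENCE` threshold, and the function name are all assumptions made for illustration, not the project's actual code.

```python
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    """Alert tiers, ordered so that a higher value means more urgent."""
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical label-to-priority table; label strings are assumed.
PRIORITY_BY_LABEL = {
    "Fire alarm": Priority.CRITICAL,
    "Smoke detector, smoke alarm": Priority.CRITICAL,
    "Siren": Priority.HIGH,
    "Doorbell": Priority.MEDIUM,
    "Telephone bell ringing": Priority.MEDIUM,
}

MIN_CONFIDENCE = 0.5  # assumed threshold, not taken from the project


def classify_priority(label: str, confidence: float) -> Optional[Priority]:
    """Return the alert priority for a detection, or None if it is ignored."""
    if confidence < MIN_CONFIDENCE:
        return None  # too uncertain to alert on
    return PRIORITY_BY_LABEL.get(label)  # None for labels outside the table
```

Using an ordered enum keeps comparisons such as `priority >= Priority.HIGH` readable when deciding which detections should also trigger a remote notification.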

How we built it

The core of SoundSense is built around the MemryX Raspberry Pi AI accelerator, which provides the computational power needed for real-time audio inference in a compact form factor. I selected Google's pre-trained YAMNet model, which classifies 521 different audio events from the AudioSet ontology, and adapted it specifically for our accessibility use case. The architecture consists of:

Hardware: MemryX MXA chips for accelerated inference on Raspberry Pi

Backend: Python with Flask and the MemryX AsyncAccl API for real-time processing

Audio Processing: Custom preprocessing pipeline using TensorFlow Lite interpreters

Frontend: Interactive web interface built with HTML/CSS/JavaScript and Chart.js for visualizations

Integration: Discord webhook system for remote alerting
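For the Discord integration, a minimal sketch of a webhook alert might look like the following. It relies on the fact that a Discord webhook accepts a JSON body with a `content` field; the function names, the message format, and the `User-Agent` string are hypothetical, and the webhook URL must be supplied by the caller.

```python
import json
import urllib.request


def build_alert_payload(label: str, confidence: float, priority: str) -> dict:
    """Build the JSON body for a Discord webhook message."""
    return {
        "content": f"[{priority.upper()}] {label} detected ({confidence:.0%} confidence)"
    }


def send_alert(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Hypothetical client name; an explicit User-Agent avoids relying
            # on the urllib default.
            "User-Agent": "SoundSense-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from the network call makes the message format easy to test without contacting Discord.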

The key technical challenge was implementing the AsyncAccl callback pattern correctly - setting up input callbacks that capture and preprocess audio frames, and output callbacks that handle classification results and trigger appropriate alerts based on sound priority levels.
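The shape of that callback pattern can be sketched without any hardware. The `FakeAccelerator` class below is a stand-in written for this example and is not the MemryX API; it only mimics the idea of registering an input callback that supplies frames (returning `None` to end the stream) and an output callback that receives results. The toy model, labels, and threshold are likewise assumptions.

```python
import threading
from typing import Callable, List, Optional

Frame = List[float]


class FakeAccelerator:
    """Hardware-free stand-in for an async accelerator; NOT the MemryX API."""

    def __init__(self, model: Callable[[Frame], dict]) -> None:
        self._model = model
        self._input_cb: Optional[Callable[[], Optional[Frame]]] = None
        self._output_cb: Optional[Callable[[dict], None]] = None
        self._worker = threading.Thread(target=self._run, daemon=True)

    def connect_input(self, callback: Callable[[], Optional[Frame]]) -> None:
        self._input_cb = callback

    def connect_output(self, callback: Callable[[dict], None]) -> None:
        self._output_cb = callback

    def start(self) -> None:
        self._worker.start()

    def wait(self) -> None:
        self._worker.join()

    def _run(self) -> None:
        # Pull frames until the input callback signals end of stream.
        while True:
            frame = self._input_cb()
            if frame is None:
                break
            self._output_cb(self._model(frame))


def run_demo() -> List[str]:
    frames = iter([[0.0] * 4, [0.9] * 4, [0.1] * 4])
    alerts: List[str] = []

    def toy_model(frame: Frame) -> dict:
        # Toy classifier: a loud frame counts as a fire alarm.
        peak = max(abs(sample) for sample in frame)
        return {"label": "Fire alarm" if peak > 0.5 else "Silence", "score": peak}

    def audio_source() -> Optional[Frame]:
        return next(frames, None)

    def on_result(result: dict) -> None:
        if result["label"] == "Fire alarm":
            alerts.append(f"ALERT: {result['label']} ({result['score']:.2f})")

    accl = FakeAccelerator(toy_model)
    accl.connect_input(audio_source)
    accl.connect_output(on_result)
    accl.start()
    accl.wait()
    return alerts
```

Because the callbacks run on a worker thread, any state they share with the rest of the application (here, the `alerts` list) should only be read after `wait()` returns or be protected by a lock.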

Challenges we ran into

The most significant challenge was implementing asynchronous real-time inference. Getting the audio processing pipeline to work smoothly with the MemryX AsyncAccl API proved quite difficult. The callback-based architecture required careful coordination between:

Audio capture timing and buffer management

Preprocessing model execution

MXA inference scheduling

Postprocessing and alert generation

Initially, I struggled with callback timing issues and data flow problems that caused the system to either miss audio frames or process them out of sync. Debugging the asynchronous execution flow while maintaining real-time performance constraints was particularly challenging, as traditional debugging approaches don't work well with callback-based systems. Another challenge was optimizing the audio preprocessing pipeline to match the expected input format for YAMNet while maintaining the low latency required for real-time safety applications.
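One way to keep capture and inference in step is to buffer incoming audio and hand off only complete, fixed-length frames. The sketch below assumes 16-bit little-endian mono PCM at 16 kHz, converted to floats in the [-1.0, 1.0) range that YAMNet expects. The frame length of 15,600 samples is an assumption about the TensorFlow Lite variant of the model, and the class and function names are hypothetical.

```python
import struct
from typing import List

SAMPLE_RATE_HZ = 16_000
FRAME_SAMPLES = 15_600  # assumed input length for the TFLite YAMNet variant


def pcm16_to_float(chunk: bytes) -> List[float]:
    """Convert little-endian 16-bit mono PCM bytes to floats in [-1.0, 1.0)."""
    count = len(chunk) // 2  # ignore a trailing odd byte, if any
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return [sample / 32768.0 for sample in samples]


class FrameBuffer:
    """Accumulate variable-size capture chunks into fixed-length frames."""

    def __init__(self, frame_samples: int = FRAME_SAMPLES) -> None:
        self._frame_samples = frame_samples
        self._pending: List[float] = []

    def push(self, chunk: bytes) -> List[List[float]]:
        """Add a capture chunk and return every complete frame now available."""
        self._pending.extend(pcm16_to_float(chunk))
        frames = []
        while len(self._pending) >= self._frame_samples:
            frames.append(self._pending[: self._frame_samples])
            del self._pending[: self._frame_samples]
        return frames
```

Leftover samples stay in the buffer until the next chunk arrives, so no audio is dropped when the capture chunk size does not divide the frame length evenly.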

Accomplishments that we're proud of

I'm particularly proud of the accuracy and reliability of the system. Through extensive testing, we haven't encountered any false positives or false negatives in critical sound detection. The system consistently identifies important safety sounds like fire alarms and emergency sirens without triggering false alerts that could cause alarm fatigue. The real-time performance is another major accomplishment - the system processes audio with minimal latency while maintaining high accuracy, thanks to the MemryX hardware acceleration. The user interface provides clear, immediate feedback with confidence levels and historical tracking. Most importantly, the system actually works as intended for its target use case. While we're currently limited by microphone quality rather than software performance, the core functionality successfully bridges the gap between the auditory world and visual/notification systems that deaf users can access.

What we learned

This project taught me extensively about AI accelerator hardware implementation and the practical challenges of deploying machine learning models in real-world applications. Working with the MemryX platform gave me deep insights into:

Hardware-accelerated inference optimization

Asynchronous callback architecture patterns

Real-time audio processing constraints

The importance of preprocessing pipeline efficiency

I also learned about the specific needs and challenges of the deaf community, particularly around environmental awareness and safety. This reinforced the importance of building technology that serves meaningful purposes rather than just showcasing technical capabilities. The debugging and optimization process taught me valuable lessons about system-level thinking when working with specialized AI hardware - understanding the entire pipeline from audio capture through hardware inference to user notification.

What's next for SoundSense

The next major development for SoundSense is implementing multi-microphone support to enable stereo audio processing. This would allow the system to provide spatial audio information - not just identifying what sound occurred, but also determining the direction and approximate distance of the sound source. This spatial awareness would be invaluable for deaf users, providing context like:

"Fire alarm detected - northwest direction, approximately 50 feet"

"Vehicle horn - approaching from behind"

"Doorbell - front entrance"
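As a hedged sketch of how direction might be estimated from two microphones, the example below finds the sample lag that best aligns the two channels and converts it to a bearing using the far-field approximation sin(angle) = speed of sound × delay / microphone spacing. This is a textbook time-difference-of-arrival approach, not something the project has implemented; the function names and the sign convention are assumptions, and a single microphone pair yields only a left/right bearing, not a distance.

```python
import math
from typing import Sequence

SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at room temperature


def best_lag(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Return the lag in samples by which `right` trails `left`.

    A positive result means the sound reached the left microphone first.
    """
    def score(lag: int) -> float:
        # Cross-correlation of the two channels at the given lag.
        return sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )

    return max(range(-max_lag, max_lag + 1), key=score)


def bearing_degrees(lag_samples: int, sample_rate_hz: int, mic_spacing_m: float) -> float:
    """Convert a lag to a bearing; 0 is straight ahead, positive is toward the left mic."""
    delay_s = lag_samples / sample_rate_hz
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp so asin stays defined
    return math.degrees(math.asin(ratio))
```

With closely spaced microphones the usable lag range is only a few samples at 16 kHz, so angular resolution is coarse; wider spacing or a higher sample rate would improve it.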

Additional future enhancements include:

Mobile app development for portable use

Machine learning personalization to adapt to individual user environments

Integration with smart home systems for automated responses

Expanded microphone array support for better accuracy and coverage

Custom sound training for user-specific audio events

The ultimate goal is to create a comprehensive environmental awareness system that gives deaf individuals the same situational awareness that hearing provides, enhancing both safety and independence in daily life.