We believe that every person should be able to experience an urban lifestyle to their heart's content without worrying about their safety, so we built ListenHere to empower this vision.
What It Does
ListenHere is a mobile-compatible web application that listens for intervals of sound and uses a machine learning model to predict what sounds are being heard. This way, the user can identify, in real time, sounds that matter for safety (car horns, gunshots), ambiance (construction, street music), and engagement (dogs barking, kids playing).
How We Built It
ListenHere is built around a Support Vector Machine (SVM) model trained on an open-source urban sounds dataset. We started with feature extraction: to capture the critical features of our audio waveform input, we used the Librosa library for Python to compute Mel-Frequency Cepstral Coefficients (MFCCs). MFCC analysis essentially looks at the "spectrum of the spectrum" of the audio, which yields the features most relevant for classifying sounds. We then normalized our data and applied Principal Component Analysis (PCA) to reduce our feature space to only the most distinguishing components; this is, in effect, how we strip background noise from our audio data. Finally, we trained a support vector machine to classify audio clips by sound.
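The normalize-reduce-classify steps above can be sketched as a scikit-learn pipeline. This is an illustrative sketch, not our actual training code: the random vectors below stand in for real MFCC features (which in practice would come from `librosa.feature.mfcc`), and the dataset size, feature count, and class count are made up for the example.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Stand-in for real MFCC feature extraction: each audio clip yields a
# fixed-length feature vector (hypothetically, 40 MFCC statistics per clip).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))    # 200 clips x 40 MFCC-like features
y = rng.integers(0, 3, size=200)  # 3 hypothetical sound classes

# Normalize, reduce the feature space with PCA, then classify with an SVM.
clf = Pipeline([
    ("scale", StandardScaler()),   # zero mean, unit variance per feature
    ("pca", PCA(n_components=10)), # keep the 10 highest-variance components
    ("svm", SVC(kernel="rbf")),    # RBF-kernel support vector machine
])
clf.fit(X, y)
preds = clf.predict(X[:5])
```

Wrapping the scaler, PCA, and SVM in one `Pipeline` object is convenient because the same pre-fit transforms are then applied automatically at prediction time.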
After training our model, we built our back end with Python's Flask framework and our front end in JavaScript. The Flask instance loads the trained model along with the pre-fit standardization and PCA transforms, applying them to prepare each audio file for classification.
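A minimal sketch of such a Flask endpoint follows. The route name, form field, and helper function are our own placeholders, not the actual ListenHere code, and the pre-fit scaler, PCA, and model are faked with dummy objects (in the real app they would be deserialized from disk, e.g. with `joblib.load`).

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

# Dummy stand-ins for the pre-fit scaler/PCA and trained SVM that the
# real back end would load once at startup.
class _Identity:
    def transform(self, X):
        return X

scaler = pca = _Identity()

class _DummyModel:
    def predict(self, X):
        return ["car_horn"] * len(X)  # placeholder label

model = _DummyModel()

def extract_features(file_storage):
    # Hypothetical helper: in practice, decode the upload and compute
    # MFCCs with librosa; here, a fixed-size dummy feature vector.
    return np.zeros((1, 40))

@app.route("/classify", methods=["POST"])
def classify():
    # Apply the same pre-fit scaling and PCA used during training,
    # then classify the resulting feature vector.
    feats = extract_features(request.files["audio"])
    feats = pca.transform(scaler.transform(feats))
    label = model.predict(feats)[0]
    return jsonify({"label": label})
```

The key design point is that the scaler and PCA must be the ones fit on the training data; refitting them per request would put incoming clips in a different feature space than the one the SVM was trained on.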
Challenges We Ran Into
We worked through several different classification approaches before settling on SVMs. Each was tough to implement because they relied on different types of input data, so we not only wrote different models but also went through different means of feature extraction. Moreover, we did not have much front-end experience, so we made various attempts, ranging from iOS development to React, before finding a system that worked well with our concept and our existing Flask back end.
Accomplishments That We're Proud Of
We achieved 96.4% accuracy on our validation set. This classification accuracy is what makes our platform viable.
What We Learned
We learned a lot about manipulating audio data and how sounds can be distinguished from one another using properties of the waveform. We also learned how principal component analysis can be used to eliminate background noise and improve our model's efficiency. Finally, we developed some front-end chops and learned how to take AI models out of a scientific context and put them behind a user interface.
What's Next For ListenHere
Eventually, we'd like to train our model with more types of sounds, but first, we want to build a WatchOS application that uses our model to serve as an even more convenient urban experience platform. This would focus specifically on the safety aspects of notifying users when certain sounds are identified, such as a siren, horn, or gunshot. We've developed our back-end with this in mind so that any adjustments to the model and user interface can be incorporated into the platform in a streamlined fashion.