What it does
Siren is an app that listens to its surroundings for potentially dangerous sounds, such as gunshots, breaking glass, or a chainsaw. Our goal is to help people who are deaf or hard of hearing. It works by using a pre-trained convolutional neural network (CNN) to classify spectrograms generated from five-second audio clips. If the model detects a dangerous sound, the app displays it, ideally warning the user in time to avoid danger.
How we built it
We started with a dataset of 2000 five-second clips and expanded it to over 20,000 clips by combining sounds together. We then trained a CNN with two convolutional layers followed by two dense layers. It reached high accuracy quickly, and we then integrated the model into our app.
Challenges we ran into
We initially trained our model in Python, as TensorFlow is one of the largest machine learning libraries. However, we had many issues bringing it over to Java. The problem didn't stem from the model itself, but from the pre-processing we did on the data. To feed audio to the model, we needed to turn it into a 2D image (a spectrogram). Python has good support for this, but we found almost no information on doing it in Java. We spent almost the entire night trying to solve this problem. In the end, as a workaround, we moved the processing to a Python backend.
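To illustrate the pre-processing step described above, here is a minimal, dependency-free Python sketch of turning a waveform into a 2D magnitude spectrogram. It is an illustration rather than the code from the project: the function names, the frame size, the hop length, and the Hann window are all assumptions, and a real pipeline would use an FFT from an audio or numerics library instead of this naive DFT.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_size=64, hop=32):
    """Return a 2D list: rows are time frames, columns are frequency-bin magnitudes."""
    window = hann(frame_size)
    bins = frame_size // 2 + 1  # keep the non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        row = []
        for k in range(bins):
            # Naive DFT for bin k (hypothetical stand-in for a library FFT).
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# Usage: a 128 Hz tone sampled at 1024 Hz lands on bin 8 (bin width is 16 Hz).
sample_rate = 1024
tone = [math.sin(2 * math.pi * 128 * t / sample_rate) for t in range(256)]
image = spectrogram(tone)
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
print(len(image), len(image[0]), peak_bin)
```

Choices such as the window shape, frame size, and magnitude scaling all change the resulting image, which is one plausible reason two implementations of "the same" spectrogram can disagree.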
Accomplishments that we're proud of
The initial dataset contained only 2000 data points. When we first tried to train on it, the model overfit no matter what we did: it essentially memorized the data points instead of learning anything general. We then came up with the idea of combining different sounds to generate more data. We realized that of the 50 classes in the dataset, we only really cared about 19. We combined each audio file from these 19 classes with 29 random files from the classes we didn't need, and labeled each result with the class of the file from the 19. This took us from 2000 data points to 22,400, and also let us add a class for sounds that aren't dangerous.
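The augmentation idea can be sketched roughly as follows. This assumes that "combining" means overlaying two waveforms sample by sample; the helper names, the background gain, and the peak normalization are hypothetical choices for illustration, not details from the project.

```python
import random


def mix(foreground, background, background_gain=0.5):
    """Overlay two equal-length waveforms, keeping the result within [-1, 1]."""
    mixed = [f + background_gain * b for f, b in zip(foreground, background)]
    peak = max(1.0, max(abs(s) for s in mixed))  # only scale down if clipping
    return [s / peak for s in mixed]


def augment(wanted, unwanted, mixes_per_clip, seed=0):
    """Pair each wanted clip with random unwanted clips; keep the wanted label."""
    rng = random.Random(seed)
    out = []
    for label, clip in wanted:
        for background in rng.sample(unwanted, mixes_per_clip):
            out.append((label, mix(clip, background)))
    return out


# Tiny stand-in data: 4-sample "clips" instead of five-second recordings.
wanted = [("gunshot", [0.9, -0.9, 0.5, 0.0]), ("glass", [0.1, 0.8, -0.7, 0.2])]
unwanted = [[0.2, 0.2, 0.2, 0.2], [-0.1, 0.3, 0.0, 0.1], [0.0, -0.4, 0.4, 0.0]]
dataset = augment(wanted, unwanted, mixes_per_clip=2)
print(len(dataset), [label for label, _ in dataset])
```

Each mixed clip keeps the label of the sound we care about, so the model sees the same target sound against many different backgrounds.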
What we learned
We ran into a lot of errors with TensorFlow and Keras. While fixing all of the bugs, we learned a lot about how to use machine learning. One interesting thing we learned about models is that connecting a dense layer of size 220500 to a dense layer of size 8000 will most definitely crash TensorFlow. We also learned a lot about research. When we ran into problems with the audio libraries in Java, we realized that audio processing in Java is very different from audio processing in Python: the spectrograms never matched. By that point we had already trained our model, so we had to look for an alternative in Java. In the future, we will definitely verify that our plans will work before committing to them. Finally, we also learned that at 24-hour hackathons, making decisions at 3 a.m. and following up on them is usually not a good idea, if an idea at all.
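A quick back-of-the-envelope calculation suggests why the 220500-to-8000 dense connection was a problem. The sketch below assumes 32-bit float weights and that running out of memory was the cause of the crash, which the write-up does not confirm. It also notes that 220500 equals five seconds of audio at 44.1 kHz; that sample rate is an assumption, not something stated above.

```python
def dense_layer_parameters(inputs, units):
    """Weights plus biases for a fully connected layer."""
    return inputs * units + units


samples_per_clip = 5 * 44_100  # 220,500 (44.1 kHz sample rate is an assumption)
params = dense_layer_parameters(samples_per_clip, 8_000)
bytes_float32 = params * 4  # 4 bytes per 32-bit float

print(f"{params:,} parameters")                        # 1,764,008,000 parameters
print(f"{bytes_float32 / 2**30:.2f} GiB for weights")  # 6.57 GiB for weights
```

Training typically needs several times this amount, since gradients and optimizer state are stored alongside the weights.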
What's next for Siren
We intend to keep improving the project. Due to hardware limitations, we were unable to build anything beyond an app for now. Shrinking Siren down to the point of being a wearable would greatly increase its impact: it could monitor for danger constantly and notify the user immediately through haptic feedback or even more complex methods.
Another improvement we need to make is removing the backend and running everything on the device. Using a backend means the device needs an internet connection at all times. It also adds latency, which could make a huge difference in a life-or-death situation.
Finally, we want to improve the accuracy of the model. Any improvement we make to the model could help save and improve more lives. Adding more data, improving the model structure, and training for longer are all goals we strive to reach.