I've been working with Reinforcement Learning for a while and decided that for my QHacks project, I really wanted to work on Supervised Learning, one of the other main branches of Machine Learning. I knew that I was going to build a classifier of some sort, but had no idea what function it would serve. It was the "Telus - Best use of IoT or AI to help foster sustainable cities" challenge that inspired me and got my ideas flowing, prompting me to look at how I could use a classifier to help city planners manage the city. I had originally intended for my model to be used on small devices placed throughout forests to detect the presence of wildlife through sound, making it easier for future cities to decide on the location of new developments. However, given that I was going to have to train a model, I was going to need data, and lots of it. That meant I was really at the mercy of whatever datasets Kaggle had for me. It was there that I ran into a labelled dataset of different urban noises, ranging from dogs barking and children playing to the sound of jackhammers and engines idling. I set about working on my project in hopes that in the future, I'd be able to aim the same model at other applications, like the wilderness classifier.

What it does

UrbanSound is an API endpoint that takes in .wav files and passes them through a machine learning model to determine the source of the sound housed inside. Using the webpage tester, you can upload any .wav file, submit it to the endpoint, and receive the sources of the sounds throughout the file. I'm hoping that this would be a good resource for city planners, giving indications about which city roads are most congested (car horns), which places are the most child-friendly (children playing), as well as more serious data such as which parts have the highest police presence and criminal activity (gunshots and sirens). Given that, with enough properly labelled data, the model can be trained to classify almost any set of sound categories, its applications have very few limits.
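To give a feel for the shape of the service, here is a minimal sketch of a one-endpoint Flask app like the one described above. The route name, response format, and `classify_wav` stub are all illustrative assumptions, not the project's actual code; in the real app, the stub would be replaced by inference with the trained model.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def classify_wav(wav_bytes):
    # Stub standing in for the real model inference on the uploaded
    # .wav data -- the actual classifier is not shown here.
    return ["car_horn"]

@app.route("/classify", methods=["POST"])
def classify():
    # Read the uploaded .wav file and return the predicted sound sources.
    wav = request.files["file"].read()
    return jsonify({"labels": classify_wav(wav)})
```

A client could then POST a .wav file to `/classify` (with a Python script, Postman, or the web page) and get back a JSON list of detected labels.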

How I built it

The first aspect of this project was building the machine learning model that would act as the classifier powering the entire API. I initially tried using a convolutional neural network, but wrestled with accuracy for the first few hours of the competition. After more research, I decided to use a RandomForestClassifier, a slightly more intuitive classifier that works off of decision trees. After preprocessing and splitting up my train/test data, I trained my model, leaving me ready to predict the results of other .wav files. At this point I started to set up my API endpoint. Given that my API would have one endpoint and was only meant to execute one function, it didn't make sense to create a fancy GraphQL API, so I went ahead and used Flask to create the most compact API I could think of. I was able to access this endpoint in two main ways: with a Python script that would send the file to my endpoint, or via Postman, which was doing exactly the same thing. So I decided to build a small HTML component of the app, to make it simple to upload and send in a .wav file. I built this as an extension off my Flask app, rendering an HTML page if you access the endpoint from a web browser.
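The train/test split and RandomForestClassifier training step can be sketched as follows. This is not the project's actual pipeline: synthetic NumPy arrays stand in for the features extracted from the .wav files, and the sizes and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in feature matrix: in the real project, each row would be a
# feature vector extracted from one .wav clip.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))         # 200 clips, 40 features each
y = rng.integers(0, 10, size=200)      # 10 urban-sound labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Once trained, the model can predict labels for new feature vectors.
preds = model.predict(X_test)
print(preds.shape)  # (40,)
```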

Challenges I ran into

The first main challenge that I ran into was that I wanted the API to be able to deal with .wav files whose source wasn't on our list of labels. Since the dataset I used only provided data that fit within the 10 labels, any .wav file that didn't fit would still randomly receive a tag. I fixed this by ensuring that each time the model was called, if there wasn't an absolute winner (i.e. the probabilities that the sound comes from each of our labels were approximately the same), that data point would be classified as not belonging to our label set.
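The "no absolute winner" check can be sketched like this. The margin value and label names here are illustrative assumptions, not the project's actual ones; the probability vector would come from something like scikit-learn's `predict_proba`.

```python
import numpy as np

def classify_with_rejection(proba, labels, margin=0.15):
    """Return a label only if one class clearly wins.

    proba  -- 1-D array of per-class probabilities (e.g. from predict_proba)
    margin -- minimum gap required between the top two probabilities
              (0.15 is an illustrative value)
    """
    order = np.argsort(proba)[::-1]            # classes, most likely first
    top, runner_up = proba[order[0]], proba[order[1]]
    if top - runner_up < margin:
        return "unknown"   # no clear winner: treat as outside the label set
    return labels[order[0]]

labels = ["dog_bark", "children_playing", "jackhammer"]
print(classify_with_rejection(np.array([0.9, 0.05, 0.05]), labels))  # dog_bark
print(classify_with_rejection(np.array([0.4, 0.35, 0.25]), labels))  # unknown
```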

The second main challenge I ran into was that the given dataset provided 2-4 second .wav files with a single label associated with each file. While this makes sense for training the model, I wanted to be able to send a longer file which would have many labels throughout. Solving this was a little trickier; in the end, I made predictions on subsets of my data. By splitting up the data into increments of 2-4 seconds (usually 4), I could make accurate predictions while also allowing for multiple labels. It was also important in this case that when I split up the data, the splits overlapped (e.g. one split would cover seconds 0-3, the next 1-4, then 2-5, and so on), so that we wouldn't lose out on results just because of where the data happened to be split.
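The overlapping splits can be sketched as a sliding window over the raw samples. The function name, sample rate, and window sizes below are toy values for illustration, not the project's actual parameters.

```python
def overlapping_windows(samples, sample_rate, window_s=4, hop_s=1):
    """Split a long recording into overlapping fixed-length windows.

    With a 1-second hop, a sound that straddles one window boundary
    still falls cleanly inside a neighbouring window, so it isn't
    missed just because of where the splits landed.
    """
    window = window_s * sample_rate
    hop = hop_s * sample_rate
    return [samples[i:i + window]
            for i in range(0, max(len(samples) - window, 0) + 1, hop)]

# 10 "seconds" of audio at a toy sample rate of 4 samples/second
clip = list(range(40))
chunks = overlapping_windows(clip, sample_rate=4)
print(len(chunks))    # 7 windows, starting at seconds 0, 1, 2, ..., 6
print(chunks[0][:4])  # [0, 1, 2, 3]
```

Each window would then be fed to the classifier independently, producing one label per window rather than one per file.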

The third challenge I ran into was that the library I had originally decided to use to read and process the .wav files was super slow, taking somewhere in the neighbourhood of 1.5-2 seconds to load a file into memory, which added some serious latency to my API. I found a few solutions to this problem, mostly new and different libraries; however, every solution I found would process the data into different formats and values, meaning that if I switched libraries, I wouldn't be able to continue using the model I had already trained. Given how long it had taken to train the model to its current level of accuracy, retraining wasn't something I could undertake during QHacks.

Accomplishments that I'm proud of

Given that I was coming into QHacks with assignments due Friday and Saturday night, I decided not to join a team, lest I end up not being able to contribute, and instead worked on my own hack. As a team of one, I really didn't expect that I'd be able to make any reasonable progress towards completing my hack, so I'm super proud of the fact that I was actually able to get things done and get all my pieces working nicely together.

Most of my experience with Machine Learning was in Reinforcement Learning, so I was really proud that I was able to learn how to build and train a Supervised Learning model so effectively.

What I learned

I learned a lot about the building and training of Machine Learning models (Sequential and RandomForestClassifier in particular)

I learned a lot more about data manipulation using Numpy

I already had some experience with Flask, but I still learned a decent bit more about quickly getting an endpoint up and running

I learned that I should probably do more research before picking a library willy-nilly, rather than paying for it in latency later

What's next for UrbanSound

While UrbanSound is currently built (and named) to identify common sounds in an urban environment, I'd like to look into training another model that would process sound data from nature areas. I'd be really excited if this sort of software could be used to help protect wildlife and their homes. As I mentioned in the challenges I ran into, I'd also like to change which library I use and how I go about processing my sound data, hopefully saving a second or two on every request.

Demo: I separated the creation/training of my model and the API build into two separate repositories, QHacks2019 is the model training, QHacks2019-App is the API endpoint application.
