Update: Check-In #2
Sound Classification for Hazardous Environmental Sounds
We are reimplementing a paper on sound classification that uses several networks (a triple-layer LSTM, a combined CNN + LSTM network, and an ensemble network) to classify environmental sounds as hazardous or non-hazardous. We use the UrbanSound8K dataset (8732 labeled sound excerpts of at most 4 seconds each), whose 10 classes are air conditioner, car horn, children playing, dog bark, drilling, engine idling, gun shot, jackhammer, siren, and street music. A model like this could help people who are deaf or hard of hearing identify hazardous sounds in their environment.
Challenges:
Since we’re all fairly new to working collaboratively in an online notebook, one challenge so far has been dealing with the outputs of Google Colab. In particular, we had to make sure that the file paths were correct and that the files we created for pickling were saved in the correct location.
Another important challenge was understanding a data type that is new to us: sound, in the form of the UrbanSound8K .wav files. As expected, the hardest part of the project so far has been the preprocessing, in particular understanding the different features that can be extracted from audio signal data.
Insights:
We currently extract only one feature, Mel-frequency cepstral coefficients (MFCCs), to match the dimensions in the paper. In the other sources we found during our research, MFCCs were the most commonly used feature. If we later decide that we need more features to improve accuracy or performance, we can add them then.
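As a rough illustration of what extracting MFCCs with fixed dimensions could look like, here is a minimal sketch. It assumes the librosa library, and the values `N_MFCC = 40`, `N_FRAMES = 173`, and the 22,050 Hz sample rate are placeholders chosen for illustration, not the dimensions from the paper. The function names `extract_mfcc` and `pad_or_truncate` are hypothetical.

```python
import numpy as np

N_MFCC = 40     # assumed number of coefficients (not taken from the paper)
N_FRAMES = 173  # assumed frame count for a 4 s clip at 22,050 Hz, hop 512


def pad_or_truncate(mfcc, n_frames=N_FRAMES):
    """Force an (n_mfcc, t) matrix to exactly n_frames columns."""
    t = mfcc.shape[1]
    if t >= n_frames:
        return mfcc[:, :n_frames]
    # Shorter clips are zero-padded on the right so every sample has the same shape.
    return np.pad(mfcc, ((0, 0), (0, n_frames - t)), mode="constant")


def extract_mfcc(wav_path, sr=22050, n_mfcc=N_MFCC, n_frames=N_FRAMES):
    """Load one .wav file and return a fixed-size MFCC matrix."""
    import librosa  # imported lazily so pad_or_truncate works without librosa

    signal, sr = librosa.load(wav_path, sr=sr, duration=4.0)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return pad_or_truncate(mfcc, n_frames)
```

Padding to a fixed frame count is one way to handle clips shorter than 4 seconds; other choices, such as averaging over time, would produce a different input shape.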
Concrete Results:
Preprocessing is almost complete: we’ve extracted the MFCCs for each .wav file, each of which is approximately 4 seconds long. The result is a list of tuples, each holding the MFCC features and the class label (IDs 0-9) for one clip. We store this list in a pickle file so that we don’t have to rerun preprocessing every time.
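The caching step can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `save_dataset` and `load_dataset`, the file name `urbansound_mfcc.pkl`, and the tiny nested lists standing in for real MFCC arrays are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_dataset(samples, path):
    """Write a list of (features, label) tuples to a pickle file."""
    path = Path(path)
    # Create the target folder first so the save does not fail on a missing path.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(samples, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_dataset(path):
    """Read the list of (features, label) tuples back from disk."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Tiny stand-in data: real features would be MFCC arrays, labels are class IDs 0-9.
samples = [([[0.1, 0.2], [0.3, 0.4]], 3), ([[0.5, 0.6], [0.7, 0.8]], 8)]

# A temporary folder is used here; in Colab this would be a mounted Drive path.
cache_file = Path(tempfile.mkdtemp()) / "cache" / "urbansound_mfcc.pkl"
save_dataset(samples, cache_file)
restored = load_dataset(cache_file)
```

Building the path with `pathlib` and creating the parent folder up front is one way to avoid the wrong-location problems that can come up in a shared notebook.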
Project Plan & Timeline:
We are on track so far! Our goal was to finish preprocessing by the middle of the week of the second Check-In, which we’ve achieved. Next, we will start building the model architecture, now that we have the proper features to pass to our network.
Ideally, we will finish building the architecture by the end of next week (November 30th). After that, we plan to train our model for several rounds and tune hyperparameters until our presentation day in order to reach our target goals.
What do you need to dedicate more time to?
We will be dedicating more time to understanding the features, especially the four features we chose not to extract for now, and to furthering our understanding of the original paper and other publicly available implementations. We will also research how to implement an ensemble architecture (ensemble calculation accuracy prediction), which is something that we have not encountered before. Since it involves more math and combines different models into a single one, we will need to spend time learning about it.
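To make the idea of combining models more concrete, here is a minimal sketch of one common ensemble technique, soft voting, which averages the class probabilities predicted by each model. This is an assumption for illustration and not necessarily the ensemble calculation used in the paper. The function `soft_vote` and the example probabilities are hypothetical, and three classes are used instead of ten to keep the example short.

```python
import numpy as np


def soft_vote(prob_list, weights=None):
    """Average per-model class probabilities and pick the most likely class.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    weights:   optional per-model weights; equal weights if omitted.
    """
    probs = np.stack(prob_list)  # shape: (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the average stays a distribution
    avg = np.tensordot(weights, probs, axes=1)  # shape: (n_samples, n_classes)
    return avg.argmax(axis=1), avg


# Made-up outputs of two models for two clips and three classes.
lstm_probs = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
cnn_lstm_probs = np.array([[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]])

labels, avg = soft_vote([lstm_probs, cnn_lstm_probs])
```

In this example the two models disagree on both clips, and the averaged probabilities decide the final label. Passing `weights` would let a more accurate model count for more.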