Imagine you are learning to play the guitar and you find an incredible piece you want to learn. You excitedly search the Internet, combing countless websites and forums for the notes or tabs to play it, only to discover that the only tabs available are either wrong or expensive. What would you do at that point and, more importantly, why has nothing been built to solve this problem?
It turns out that automatic music transcription is an open problem in computer science, and even the best machine learning models still have plenty of room for improvement. Additionally, most research effort is focused on automatic transcription for piano, not guitar.
Our team decided to tackle both issues by building Guitab: our attempt at solving the automatic music transcription problem for guitar pieces. We wanted to study the latest machine learning models being built for piano pieces and adapt them for guitar. We also wanted to build an easily accessible web platform that would enable anyone to finally learn to play the piece on their mind.
What it does
Guitab is a website where a user can upload an mp3 file containing guitar sounds, and through our machine learning model the site displays the notes/tabs needed to play that piece. We support acoustic guitars, individual notes, and a selection of commonly used chords.
How we built it
The centerpiece of Guitab is the machine learning model that powers the music transcriptions. We built a K-nearest-neighbors model with TensorFlow and deployed it to the cloud on Heroku. We then built a Node.js server that accepts user input, converts the mp3 file to a wav file, converts the wav file to a range of frequencies using a Fourier transform, POSTs these frequencies to our ML server, and renders the results in a graphical interface for the user.
Additionally, we used the NSynth Dataset as training data, preprocessed it with a normalization step in Python, visualized it with scikit-learn, and used Azure Machine Learning Studio for prototyping.
Finally, we validated our results by obtaining random song samples from YouTube that have publicly accessible tabs as ground truth, and manually comparing our model's classifications to these tabs.
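To make the "wav to frequencies" step concrete, here is a minimal sketch of finding the dominant frequency in a block of audio samples. It is written in Python with only the standard library and uses a naive discrete Fourier transform for readability; it is not the code our server runs, and the function names, the 8000 Hz sample rate, and the synthetic test tone are all assumptions made for illustration.

```python
import cmath
import math


def magnitude_spectrum(samples, sample_rate):
    """Return (frequency_hz, magnitude) pairs for the non-negative DFT bins.

    This is a naive O(n^2) DFT, fine for a short illustration; a real
    pipeline would use an FFT implementation instead.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append((k * sample_rate / n, abs(coeff)))
    return spectrum


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin, ignoring the DC bin."""
    spectrum = magnitude_spectrum(samples, sample_rate)
    return max(spectrum[1:], key=lambda pair: pair[1])[0]


# Hypothetical input: 0.1 s of a pure 440 Hz sine wave sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(800)]
peak_hz = dominant_frequency(tone, sample_rate)
```

With 800 samples at 8000 Hz each bin is 10 Hz wide, so the 440 Hz tone falls exactly on one bin. Real guitar audio contains overtones and noise, so picking the single strongest bin is only a starting point.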
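As an illustration of what a normalization step can look like, the sketch below rescales a feature vector to the range 0 to 1 (min-max scaling). We are assuming min-max scaling here purely as an example; the function name and sample values are hypothetical, and other schemes such as unit-length scaling would follow the same pattern.

```python
def min_max_normalize(values):
    """Rescale a list of numbers so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant vector carries no relative information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical magnitudes for three frequency bins.
normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Normalizing this way keeps loud and quiet recordings of the same note comparable, which matters for distance-based models.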
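We did this comparison by hand, but the same check can be expressed as a simple score. The sketch below assumes the predicted notes and the ground-truth notes are already aligned one to one, which is a simplification; the function name and note labels are hypothetical.

```python
def note_accuracy(predicted, ground_truth):
    """Return the fraction of positions where the predicted note matches the tab."""
    if len(predicted) != len(ground_truth):
        raise ValueError("sequences must be aligned and of equal length")
    if not ground_truth:
        return 0.0
    matches = sum(p == t for p, t in zip(predicted, ground_truth))
    return matches / len(ground_truth)


# Hypothetical example: three of four notes agree with the published tab.
score = note_accuracy(["E2", "A2", "D3", "G3"], ["E2", "A2", "D3", "B3"])
```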
Challenges we ran into
The first challenge we ran into was finding a machine learning model that fit our use case. In an ideal world we would have built an RNN, but we did not have enough time to tune a neural network. We managed to build a DNN, but it produced subpar results due to the lack of tuning. We therefore experimented with a variety of regression and classification models until we finally landed on K-nearest-neighbors.
Another big problem was the lack of easily available training data. Since the piano is the go-to choice for researchers tackling the automatic music transcription problem, there are very few guitar datasets. As a result, we had to spend a lot of time preprocessing our data by cleaning it, generating it, and labeling it.
The final problem was integrating all the pieces into a complete end-to-end pipeline: from mp3, to wav, to frequencies, to MIDI numbers, to tab notation, to a graphical representation.
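For readers unfamiliar with K-nearest-neighbors, the idea fits in a few lines: a new sample gets the label that is most common among its k closest training samples. The sketch below uses plain Python rather than TensorFlow, and the two-dimensional features, note labels, and choice of k = 3 are made up for illustration; it is not our trained model.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Pair each training sample's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two normalized features per sample.
train_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["E2", "E2", "A2", "A2"]
prediction = knn_predict(train_features, train_labels, [0.8, 0.2], k=3)
```

Because there is nothing to train beyond storing the samples, a model like this needs far less tuning than a neural network, which is what made it practical under our time limit.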
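The "frequency to MIDI number to tab" part of the pipeline can be sketched as follows. The conversion formula is the standard equal-temperament mapping with A4 = 440 Hz = MIDI 69. Everything else is an assumption for illustration: standard EADGBE tuning, a 20-fret neck, and a rule that picks the lowest playable fret. This is a sketch, not our production code.

```python
import math

# (string number, MIDI note of the open string) in standard tuning;
# string 6 is the low E, string 1 is the high E.
STANDARD_TUNING = [(6, 40), (5, 45), (4, 50), (3, 55), (2, 59), (1, 64)]


def frequency_to_midi(freq_hz):
    """Convert a frequency in Hz to the nearest MIDI note number."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def midi_to_tab(midi_note, max_fret=20):
    """Return (string, fret) using the lowest playable fret, or None if out of range."""
    candidates = [
        (midi_note - open_note, string)
        for string, open_note in STANDARD_TUNING
        if 0 <= midi_note - open_note <= max_fret
    ]
    if not candidates:
        return None
    fret, string = min(candidates)
    return string, fret


# 440 Hz is A4 (MIDI 69); 82.41 Hz is the open low E string (MIDI 40).
a4_position = midi_to_tab(frequency_to_midi(440.0))
low_e_position = midi_to_tab(frequency_to_midi(82.41))
```

Most notes can be played in several positions on a guitar, so a real transcription also has to consider which position is comfortable given the neighboring notes; the lowest-fret rule here ignores that.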
Accomplishments that we're proud of
1) Our machine learning model achieved 96% accuracy on the training data.
2) We aggregated, processed, and labelled over 480 guitar notes.
3) We researched and applied current machine learning techniques from CS research literature.
What we learned
We learned a lot about Machine Learning, Automatic Music Transcription, Data Cleaning/Visualization, and digital/analog representations.