Crab is a neural network trained to identify the composer of music it hasn't seen before by learning stylistic traits of different composers.

The code reaches an accuracy of 95% when classifying a test set of tracks it has never seen before as the work of one of 3 different composers.

The training set is about 53 minutes of music (stored in MIDI format) by Bach, Beethoven and Chopin. The code parses the MIDI files (using the external library python-midi) and converts them into a matrix (see example.png) where x is time and y is pitch, with the magnitude of the matrix entries giving the volume of each note. Each track is sliced so that each such matrix corresponds to 10 seconds of music. The network is trained to classify these slices as one of the three composers in the set.
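The conversion and slicing described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes the notes have already been extracted from the MIDI file as (start_seconds, end_seconds, pitch, velocity) tuples, and the function names and the resolution of 10 columns per second are hypothetical choices.

```python
# Hypothetical resolution: the project does not state how many matrix
# columns represent one second of music.
STEPS_PER_SECOND = 10
SLICE_SECONDS = 10   # slice length given in the description
NUM_PITCHES = 128    # MIDI pitches run from 0 to 127


def notes_to_piano_roll(notes, duration_seconds):
    """Build a matrix with one row per pitch and one column per time step.

    Each entry holds the velocity (volume) of the note sounding there, or 0.
    """
    num_steps = int(round(duration_seconds * STEPS_PER_SECOND))
    roll = [[0] * num_steps for _ in range(NUM_PITCHES)]
    for start, end, pitch, velocity in notes:
        first = int(round(start * STEPS_PER_SECOND))
        last = min(int(round(end * STEPS_PER_SECOND)), num_steps)
        for step in range(first, last):
            # If two notes overlap on the same pitch, keep the louder one.
            roll[pitch][step] = max(roll[pitch][step], velocity)
    return roll


def slice_piano_roll(roll, slice_seconds=SLICE_SECONDS):
    """Split the matrix along the time axis into fixed-length slices.

    A trailing remainder shorter than one slice is dropped.
    """
    width = slice_seconds * STEPS_PER_SECOND
    num_steps = len(roll[0])
    return [
        [row[offset:offset + width] for row in roll]
        for offset in range(0, num_steps - width + 1, width)
    ]
```

With these assumptions, a 25-second track yields two full slices and the final 5 seconds are discarded; padding the last slice with silence would be an equally reasonable choice.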

A natural application of this sort of network is automated tagging of metadata - not just composer or artist, but also genre, beats per minute and so on.

To evaluate an entire track (rather than one 10-second slice), the track is split up into 10-second slices and the network's classification of each of them is averaged to form an overall prediction. This gives the accuracy figure quoted above. The accuracy on any individual 10-second slice is 82%.
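The track-level evaluation can be sketched like this. It is an illustration under assumptions rather than the project's code: it assumes the network outputs one probability per composer for each slice and that "averaged" means taking the mean of those probabilities (a majority vote over slices would be another reading). The name classify_track is hypothetical.

```python
COMPOSERS = ["Bach", "Beethoven", "Chopin"]


def classify_track(slice_probabilities):
    """Average per-slice class probabilities and pick the most likely composer.

    slice_probabilities: one list of three probabilities per 10-second slice,
    assumed to be the network's output for that slice.
    Returns the predicted composer and the averaged probabilities.
    """
    if not slice_probabilities:
        raise ValueError("track produced no slices")
    num_slices = len(slice_probabilities)
    mean = [
        sum(probs[i] for probs in slice_probabilities) / num_slices
        for i in range(len(COMPOSERS))
    ]
    best = max(range(len(COMPOSERS)), key=lambda i: mean[i])
    return COMPOSERS[best], mean


# Example: the middle slice on its own would be labelled Beethoven,
# but the averaged prediction for the whole track is Bach.
composer, mean = classify_track([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.7, 0.2, 0.1],
])
```

Averaging lets confident slices outweigh ambiguous ones, which is one plausible reason the per-track accuracy is higher than the per-slice accuracy.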

The network is a convolutional neural network implemented in Keras. CNNs are useful here because the translational invariances that make them good for spotting features in images are still valid - a piece of music is the same piece of music if transposed upwards or downwards (translation through pitch) or if there's a delay at the start (translation through time).
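The translation argument can be illustrated with a toy example. The sketch below is not taken from the project and uses a 1D signal rather than the 2D pitch/time matrix: it shows that sliding a filter over a shifted input gives the same response, just shifted, and that a global max over the response is then unaffected by the shift. The kernel and signals are made up for illustration.

```python
def correlate_1d(signal, kernel):
    """Valid-mode cross-correlation, the sliding operation a conv layer applies."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def global_max(feature_map):
    """Global max pooling: keeps the strength of the best match, not its position."""
    return max(feature_map)


# A toy "motif" detector, and the same motif placed at two positions,
# standing in for a transposed or delayed phrase.
kernel = [1, -1, 1]
original = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 0, 1, 0, 1, 0, 0]  # same motif, moved 3 steps along

response_original = correlate_1d(original, kernel)
response_shifted = correlate_1d(shifted, kernel)

# The peak moves by 3 positions but keeps the same height.
peak_original = response_original.index(global_max(response_original))
peak_shifted = response_shifted.index(global_max(response_shifted))
```

In the 2D case the same reasoning applies along both axes, so a filter that responds to a figure at one pitch and time responds equally to it when it is transposed or delayed.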

Visualising the filters by adjusting the input image so as to maximise their activation is quite interesting - conv2d_2_filter_13.png seems to pick out trills, a device in which the music alternates between 2 adjacent notes very rapidly. conv2d_1_filter_17.png seems to discriminate based on whether particularly low notes are used.

Dependencies: python-midi, scikit-learn, keras
