The lyrics of a song are a rich source of information about its themes and sentiment. We aimed to build a unique profile of each song based on its lyrical content. We first obtained a large dataset of song lyrics (roughly 200,000) and performed topic modeling on it. Specifically, we used Latent Dirichlet Allocation (LDA), a probabilistic topic model that discovers themes in a corpus in an unsupervised fashion, meaning no hand labeling is necessary.
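The writeup does not say which LDA implementation was used. To show how LDA turns lyrics into a per-song topic vector, here is a minimal, from-scratch collapsed Gibbs sampler, assuming each song's lyrics are already tokenized into a list of words. The function name `lda_gibbs` and the hyperparameter defaults (`alpha`, `beta`, `iterations`) are illustrative assumptions. It is a teaching sketch and far too slow for 200,000 songs; at that scale a library implementation such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation` would be the practical choice.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch only).

    docs: list of token lists, one per song's lyrics.
    Returns (theta, vocab), where theta[d] is song d's topic distribution.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    corpus = [[word_id[w] for w in doc] for doc in docs]
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in corpus]    # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(corpus):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Posterior-mean estimate of each song's topic proportions (sums to 1).
    theta = [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in topics]
             for d, doc in enumerate(corpus)]
    return theta, vocab
```

For example, `theta, vocab = lda_gibbs([["love", "heart"], ["war", "fight"]], num_topics=2)` returns one two-element probability vector per song; these vectors are the "profiles" used to build the graph.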

LDA gave us a topic vector for each song, which acts as its unique profile. The largest entry in the vector indicates the main theme (roughly, the genre) the song fits into. We used these vectors to find each song's nearest neighbors in topic space and built a music graph from them: each song is a node, connected to several of its closest neighbors. Because songs with similar themes have nearby topic vectors, the structure of the graph captures a great deal of information about song similarity.
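The writeup does not specify the similarity measure or how many neighbors each song gets. The sketch below assumes cosine similarity between topic vectors and a hypothetical fixed neighbor count `k`; distances designed for probability distributions (for example Hellinger or Jensen-Shannon) would slot in the same way. The helper names `dominant_topic`, `cosine_similarity`, and `knn_graph` are illustrative, not from the project.

```python
import heapq
import math


def dominant_topic(theta):
    """Index of the largest entry: the song's main theme."""
    return max(range(len(theta)), key=theta.__getitem__)


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(vectors, k):
    """Connect each song to its k most similar songs (brute force, O(n^2)).

    Edges are stored in both directions, so a node can end up with more
    than k neighbors when other songs pick it as one of their nearest.
    """
    adj = {i: set() for i in range(len(vectors))}
    for i, u in enumerate(vectors):
        scored = ((cosine_similarity(u, v), j) for j, v in enumerate(vectors) if j != i)
        for _, j in heapq.nlargest(k, scored):
            adj[i].add(j)
            adj[j].add(i)
    return adj
```

The quadratic search is fine for illustration, but on roughly 200,000 songs an approximate nearest-neighbor index would likely be needed.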

We first built a simple graph visualizer, in which a user starts at a random song (or picks their own) and explores its vicinity by expanding the neighborhoods of nearby nodes. We built it in the browser with d3.js, an SVG-based visualization library for JavaScript. The interface displays an interactive snapshot of the music graph, with nodes colored by genre, and double-clicking any node plays that song.
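The front-end code is not described in the writeup, but the data side of "expanding the neighborhood" can be sketched as follows, assuming the graph is stored as an undirected adjacency map and each song has a genre label used for coloring. The functions `expand` and `snapshot` are hypothetical. The output is a nodes/links JSON object, the shape commonly fed to a d3 force-directed layout (with d3-force, links whose endpoints are given as ids need the link force's `id` accessor set accordingly).

```python
import json


def expand(adj, visible, node):
    """Reveal the neighbors of `node`; return the newly shown song ids."""
    added = sorted(n for n in adj[node] if n not in visible)
    visible.update(added)
    return added


def snapshot(adj, visible, genre):
    """Serialize the visible subgraph as {"nodes": [...], "links": [...]}."""
    shown = sorted(visible)
    nodes = [{"id": n, "genre": genre[n]} for n in shown]
    links = [{"source": a, "target": b}
             for a in shown for b in sorted(adj[a])
             if b in visible and a < b]  # emit each undirected edge once
    return json.dumps({"nodes": nodes, "links": links})
```

Each double-click or expand action would call `expand` and then re-render from a fresh `snapshot`, so the view only ever shows the part of the graph the user has chosen to explore.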

This interface revealed some interesting patterns in the music graph. Most obviously, foreign-language songs form very dense clusters, and there is an 80s "black hole": an extremely dense region of lyrically similar songs from the 80s that is often hard to leave while traversing the graph. The visualizer also worked well as a music discovery tool; we found ourselves listening to new genres, and even new languages, and enjoying the randomness of the exploration.

We then built a second tool based on the idea of random walks on the music graph. Given two songs, it finds a random path between them, producing a playlist that gradually transitions from the first song's genre to the second's. This also proved to be an interesting discovery tool, since we could watch the playlist being built in real time and observe the genre and theme transitions along the way. A user can choose the two songs or pick two at random, and once the playlist is generated, it immediately begins playing in the interface.
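The writeup does not say exactly how the random path is found. One plausible approach, sketched here as an assumption, is a randomized depth-first search that orders each song's neighbors randomly but with a bias toward songs fewer hops from the destination, so the playlist drifts toward the target instead of wandering the whole graph. The names `hop_distances` and `random_path` and the `jitter` parameter are hypothetical; the sketch assumes an undirected graph.

```python
import math
import random
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: number of edges from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def random_path(adj, start, end, jitter=2.0, seed=None):
    """Randomized DFS from `start` to `end`, biased toward `end`.

    Returns a list of song ids with no repeats, or None if `end` is unreachable.
    """
    rng = random.Random(seed)
    dist = hop_distances(adj, end)  # valid as "distance to end" only if undirected
    if start not in dist:
        return None

    def ordered(node):
        # Sort descending so pop() yields the lowest (hops-to-end + noise) first.
        return sorted(adj[node],
                      key=lambda n: dist.get(n, math.inf) + jitter * rng.random(),
                      reverse=True)

    path, seen = [start], {start}
    options = [ordered(start)]
    while path:
        if path[-1] == end:
            return path
        nxt = None
        while options[-1]:
            candidate = options[-1].pop()
            if candidate not in seen:
                nxt = candidate
                break
        if nxt is None:  # dead end: backtrack one song
            path.pop()
            options.pop()
            continue
        seen.add(nxt)
        path.append(nxt)
        options.append(ordered(nxt))
    return None
```

Because the search never revisits a song, the playlist has no repeats. With `jitter=0` it follows a shortest path; larger values make the journey longer and more meandering, which trades directness for more discovery along the way.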

This application is a novel approach to visual music discovery. Most music discovery tools simply list large numbers of songs and albums, or pick a random chain of songs for you with no context. With our tool, you can begin in a familiar place and branch out as you like, exploring particular clusters of music before moving on to others. Relating the connectivity of the music graph to our own tastes is powerful, because it lets us understand our listening preferences from a visual perspective.

We believe this application is a compelling proof of concept for what music discovery should become. The current state of music discovery leaves a lot to be desired: black-box machine learning algorithms make predictions based on past listens. The random element of our approach, together with the unsupervised way the themes are learned, gives users much more power to discover music that matches their own interests, at their own pace.

