After a George Ezra concert a week ago, my friend and I realised that, even though the songs are nice and all, the guy seemed to use no more than 40 words. So we took it as a challenge and decided to investigate further during StudentHack. We are trying to find out which words each artist uses most and to measure their lexical richness.

What it does

The app extracts the top 50 songs of a given artist and aggregates their lyrics into a single text document. It then runs NLP sanitising on that document to "clear the noise": stop-words, non-alphabetic characters and short words that add little to the overall meaning.
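The sanitising step could look roughly like the sketch below. This is not the project's actual code: the stop-word list, the minimum word length of 3 and the function name sanitise are all assumptions made for illustration, and the regex only keeps ASCII letters.

```typescript
// Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
const STOP_WORDS: Set<string> = new Set([
  "the", "and", "you", "that", "for", "with", "this", "are", "was",
]);

// Assumed threshold: the write-up only says "short words".
const MIN_LENGTH = 3;

// Turn raw lyrics into a list of cleaned, lower-case words.
function sanitise(lyrics: string): string[] {
  return lyrics
    .toLowerCase()
    .replace(/['’]/g, "")       // join contractions: "don't" becomes "dont"
    .replace(/[^a-z\s]/g, " ")  // replace every other non-alphabetic character with a space
    .split(/\s+/)
    .filter((word) => word.length >= MIN_LENGTH && !STOP_WORDS.has(word));
}
```

For example, sanitise("Take me to the Riverside, oh oh! It's 2014.") keeps only "take", "riverside" and "its".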

How we built it

We use the MusixMatch API to get the top 50 songs of a particular artist. We then aggregate lyrics from multiple sources using web scraping, perform sanitisation and NLP to extract more information, and plot the results on our web page built with React.js.
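Once the lyrics are cleaned, the two numbers we care about can be computed along these lines. This is a minimal sketch, not our exact implementation: the function names are hypothetical, and lexical richness is assumed here to be the type-token ratio (unique words divided by total words), which is one common definition among several.

```typescript
// Count how often each word occurs and return the n most frequent ones.
// Ties are broken alphabetically so the result is deterministic.
function topWords(tokens: string[], n: number): [string, number][] {
  const counts = new Map<string, number>();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

// Type-token ratio: unique words / total words.
// This is an assumed measure of lexical richness; returns 0 for empty input.
function typeTokenRatio(tokens: string[]): number {
  if (tokens.length === 0) return 0;
  return new Set(tokens).size / tokens.length;
}
```

A ratio close to 1 means almost every word is different, while a ratio close to 0 means the artist keeps repeating the same few words. The ratio tends to drop as the text gets longer, so it is fairest to compare artists over a similar number of songs.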

Challenges we ran into

Scraping the lyrics and working around API limitations.

Accomplishments that we're proud of

Caching previously searched artists. Plotting a lot of interesting, insightful data.

What we learned

Building a web app with React.js and Node, and applying NLP techniques to a large document.

What's next for

Publishing it online so other people can experiment with it.
