After a George Ezra concert a week ago, my friend and I realised that, even though the songs are nice and all, the guy seemed to use no more than 40 words. So we took it as a challenge and decided to investigate further during StudentHack. We are trying to find out which words each artist uses most and to measure their lexical richness.
What it does
Analyse.ly extracts the top 50 songs of a given artist and aggregates the lyrics into a single text document. It then runs NLP sanitising on that document to "clear the noise": stop-words, non-alphabetic characters and short words that add little to the overall meaning.
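A minimal sketch of what that sanitising step could look like, written in Python purely for illustration (it is not the project's actual code). The stop-word list, the `MIN_LENGTH` cutoff and the `sanitise` function name are all assumptions:

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "and", "you", "that", "for", "with", "this", "but", "not", "are", "its"}

# Assumed cutoff for "short words"; the write-up does not state the real value.
MIN_LENGTH = 3


def sanitise(lyrics: str) -> list[str]:
    """Return the lowercase alphabetic tokens that survive the noise filters."""
    # Join contractions ("it's" -> "its") so they are not split into fragments.
    text = lyrics.lower().replace("'", "").replace("\u2019", "")
    # Keep runs of letters only, which drops digits and punctuation.
    tokens = re.findall(r"[a-z]+", text)
    # Drop stop-words and words shorter than the cutoff.
    return [t for t in tokens if len(t) >= MIN_LENGTH and t not in STOP_WORDS]


if __name__ == "__main__":
    print(sanitise("Oh, the river! It's 4 a.m. and I'm running, running."))
```

Keeping the filters in one list comprehension makes it easy to tweak the cutoff or swap in a different stop-word list when comparing artists.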
How we built it
We use the MusixMatch API to get the top 50 songs of a particular artist. We then aggregate the lyrics from multiple sources using web scraping, sanitise them and apply NLP to extract more information, which we plot on our web page built with React.js.
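Once the lyrics are sanitised, the two numbers we care about, the most used words and the lexical richness, can be computed along these lines. This is a hedged Python sketch rather than our implementation: the function names are hypothetical, and it assumes lexical richness is measured as a type-token ratio (unique words divided by total words), which is only one of several possible measures.

```python
from collections import Counter


def top_words(tokens: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


def type_token_ratio(tokens: list[str]) -> float:
    """Unique tokens divided by total tokens; 0.0 for an empty list."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    # Tokens as they might come out of the sanitising step.
    tokens = ["river", "running", "running", "home"]
    print(top_words(tokens, 2))
    print(type_token_ratio(tokens))
```

A type-token ratio shrinks as the text gets longer, so comparing artists is fairer when each one contributes a similar amount of text, such as the same number of songs.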
Challenges we ran into
Scraping the lyrics and working around API limitations
Accomplishments that we're proud of
Caching previously searched artists. Plotting a lot of interesting, insightful data.
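The caching idea can be sketched as a small in-memory lookup keyed by a normalised artist name. This is an illustrative Python sketch under assumptions: the `ArtistCache` class and the `analyse` callback are hypothetical names, and the real app may well store results differently (for example on disk or in a database).

```python
from typing import Callable


class ArtistCache:
    """Remember analysis results so a repeated search skips the slow pipeline."""

    def __init__(self, analyse: Callable[[str], dict]):
        self._analyse = analyse  # the expensive fetch-and-analyse step (assumed)
        self._results: dict[str, dict] = {}

    def get(self, artist: str) -> dict:
        # Normalise case and whitespace so "George  Ezra" and "george ezra" share an entry.
        key = " ".join(artist.lower().split())
        if key not in self._results:
            self._results[key] = self._analyse(artist)
        return self._results[key]


if __name__ == "__main__":
    cache = ArtistCache(lambda name: {"artist": name})
    cache.get("George Ezra")   # runs the analysis
    cache.get("george  ezra")  # served from the cache
```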
What we learned
Building a web app with React.js and Node, and applying NLP techniques to a large document.
What's next for Analyse.ly
Publishing it online for other people to experiment with.