Inspiration

We were inspired to create a fun interdisciplinary software project that drew on our collective experience in linguistics, NLP/artificial intelligence, scripting, and front-end UI/UX. The idea started with two of our members, one studying linguistics and compsci, the other data science and compsci. As our team grew, members with experience in HTML, CSS, and NodeJS were inspired to build a fun web-based UI to give the project some life, and creative members with scripting talent added even more fun ideas by rendering custom video files.

What it does

Our project uses natural language processing (a Naive Bayes classifier) trained on a song-lyric dataset to classify a song's genre. It takes text input, but creative members of our team added speech-to-text functionality so users can sing the song instead. We then use that genre to generate a custom music video with moviepy, overlaid with either a text-to-speech rendition of the lyrics or the user's raw audio if they chose to sing. We also add a background beat drawn from freely available music in the classified genre.
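Since we built our Naive Bayes classifier ourselves rather than using an off-the-shelf library, here is a minimal sketch of the idea: a multinomial Naive Bayes with add-one smoothing, scoring each genre by its log prior plus per-word log likelihoods. The class name, toy lyrics, and genre labels below are purely illustrative, not our actual training data or implementation.

```python
import math
from collections import Counter, defaultdict

class LyricNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, lyrics, genres):
        self.genre_counts = Counter(genres)       # prior counts per genre
        self.word_counts = defaultdict(Counter)   # per-genre word counts
        self.vocab = set()
        for text, genre in zip(lyrics, genres):
            words = text.lower().split()
            self.word_counts[genre].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.genre_counts.values())
        best_genre, best_score = None, float("-inf")
        for genre, doc_count in self.genre_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            denom = sum(self.word_counts[genre].values()) + len(self.vocab)
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the score,
                # which is why this model always produces some genre.
                score += math.log((self.word_counts[genre][word] + 1) / denom)
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre

# Toy training data, purely illustrative
clf = LyricNaiveBayes().fit(
    ["country roads take me home", "drop the bass tonight"],
    ["country", "edm"],
)
print(clf.predict("take me down those country roads"))  # country
```

In the real pipeline the predicted genre string then selects the background beat and video assets for rendering.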

How we built it

Each of our members had a good idea of how to implement some facet of this project already; the tricky part was figuring out how to tie it all together cohesively. Members who had focused their studies on backend work such as AI/ML and NLP had little front-end knowledge, and conversely, members with extensive front-end knowledge had little to no experience training a model. We all had to pick up new skills, such as git, to be able to work together well, and most of us worked with tools or libraries we never had before, such as moviepy and various NodeJS modules.

Challenges we ran into

The most significant challenge was designing the model. Specifically, we tried to train multi-label linear regression on a considerable set of possible labels (almost 80). This failed quite miserably: some labels were too sparse, and the model had a very hard time converging to fit the training set. Ideally this would be solved with k-nearest neighbors, and while our ML experts had a good conceptual understanding of why that would be effective, they had no experience training such a model. Ultimately, we decided we would have to throw out some data and reduce our genres to 1 of 11 possible labels. We also decided that Naive Bayes was the way to go: we were adamant about not using an off-the-shelf library, and Naive Bayes will always produce a result. There were also various headaches with git, as well as conflicting NodeJS modules that cost us significant time debugging.
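The data-pruning step described above amounts to keeping only the most common genres and discarding songs labeled with anything else. A minimal sketch (the function name and `(lyrics, genre)` row format are hypothetical, not our actual dataset schema):

```python
from collections import Counter

def prune_rare_genres(rows, keep=11):
    """Keep only songs whose genre is among the `keep` most common.

    `rows` is a list of (lyrics, genre) pairs; sparse genres are dropped
    entirely rather than folded into an "other" bucket.
    """
    counts = Counter(genre for _, genre in rows)
    top = {genre for genre, _ in counts.most_common(keep)}
    return [(lyrics, genre) for lyrics, genre in rows if genre in top]

# Toy example: with keep=2, the lone polka song is discarded
rows = [("...", "rock")] * 3 + [("...", "pop")] * 2 + [("...", "polka")]
print(prune_rare_genres(rows, keep=2))
```

Dropping sparse labels this way trades coverage for a training set where every remaining class has enough examples to estimate word likelihoods from.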

Accomplishments that we're proud of

We had a shaky start, losing several group members, but we quickly found others and managed to pull together a real MVP for a fairly ambitious project. We don't think this project will do much good for the world beyond providing a few laughs, but we did demonstrate our ability to construct an interdisciplinary project. Not only that, we were proud of designing a model that, while far from ideal in terms of raw accuracy, makes quite reasonable classifications for most songs in our testing. We were super excited to pull together work done mostly independently by our members into a reasonably polished MVP. The front end, while not the most visually exciting, is clean and functional, and will translate well to a remotely hosted web app in the near future.

What we learned

The biggest takeaway was learning to work as a team on a project that none of us could have done alone. We all worked collaboratively, but mostly independently in terms of implementation, to bring our idea to life. We learned to be flexible about what we were implementing, and how to pivot when parts of the project needed to change. Likewise, we learned to budget time by setting deadlines for when our deliverables would be ready. We also learned the reality of how missing those deadlines can throw a project off, and how to abstract our work so we didn't actually need those deliverables until the very end. This required good communication about the various APIs we were each creating, because that communication enabled the black-box abstraction we needed to complete a project with many moving parts in a short period of time.

What's next for SongAI2.0

We were really hoping to get the whole project contained in a remotely hosted web app, on either AWS or Azure. We also wanted to do some work on compression to speed up processing time: transcoding uncompressed WAV to V0 MP3 using LAME, and re-encoding the x264 MP4 video files to lower-bitrate x265 using HandBrake. Furthermore, we would really like to use AI-generated images, via something like DALL·E from OpenAI, for an even more customized experience.
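The planned compression step could be as simple as shelling out to the two tools. The sketch below only builds the command lines (the function names are hypothetical, and the `-q 28` quality value is an illustrative guess, not a tuned choice); actually running them via `subprocess.run` requires `lame` and `HandBrakeCLI` to be installed.

```python
def lame_v0_cmd(wav_in, mp3_out):
    # LAME's -V0 preset: highest-quality variable-bitrate MP3
    return ["lame", "-V0", wav_in, mp3_out]

def handbrake_x265_cmd(video_in, video_out, quality=28):
    # HandBrakeCLI constant-quality re-encode with the x265 encoder
    return ["HandBrakeCLI", "-i", video_in, "-o", video_out,
            "-e", "x265", "-q", str(quality)]

# To actually transcode (requires the tools on PATH):
# import subprocess
# subprocess.run(lame_v0_cmd("take.wav", "take.mp3"), check=True)
# subprocess.run(handbrake_x265_cmd("video.mp4", "video_small.mp4"), check=True)
```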
