Kara?OK!


Inspiration

Many of us love to sing, especially with so many good songs being released around the world. Remember the good old days, when we would go to karaoke after work or school to sing, have snacks, and enjoy ourselves? In the pandemic era, we need an alternative to in-person karaoke sessions. However, karaoke versions of some songs are not available online, or their sound quality is unsatisfactory. Kara?OK! provides a one-stop solution for online karaoke.

What it does

Users can upload any song from a local disk. The song is split into a vocal track and a background-music track. Speech recognition is then run on the vocal track to auto-generate the lyrics together with their timestamps. Only the background-music track is played back, and the generated lyrics are displayed in sync with those timestamps so users can sing along.
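To illustrate the timestamp-driven display, here is a minimal sketch of how the line to show could be looked up from the current playback position. The `LYRICS` data and the `current_line` helper are hypothetical names made up for this example, not the project's actual code, and the sketch assumes each lyric line carries a start time in seconds.

```python
from bisect import bisect_right

# Hypothetical data: (start_time_in_seconds, lyric_line), sorted by start time.
LYRICS = [
    (0.0, "First line"),
    (4.5, "Second line"),
    (9.0, "Third line"),
]


def current_line(lyrics, position):
    """Return the lyric line active at `position` seconds.

    Returns None if playback has not reached the first line yet.
    """
    starts = [start for start, _ in lyrics]
    # bisect_right counts the lines whose start time is <= position,
    # so the active line is the one just before that insertion point.
    index = bisect_right(starts, position) - 1
    return lyrics[index][1] if index >= 0 else None
```

A player would call `current_line(LYRICS, position)` on every playback tick and re-render only when the returned line changes.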

How we built it

Frontend: a single-page application built with React that lets users upload raw music audio and play the generated karaoke track.

Backend:

Vocal Splitter: a service that uses Spleeter to separate the vocals from the instrumental track (the background music).
Speech-To-Text: a service that uses the Google Cloud API to extract lyrics from the vocal audio files, along with timestamp information that helps users sing along.
FastAPI connects these services to the React frontend and reformats the data they produce.
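The backend flow described above can be sketched roughly as follows. This is an assumption-laden outline, not the project's implementation: `build_karaoke`, `split_vocals`, and `transcribe` are hypothetical names, and the two services are passed in as plain callables so the sketch stays independent of Spleeter and the Google Cloud client libraries.

```python
from typing import Callable, Dict, List, Tuple

# Each recognized word is (text, start_seconds, end_seconds).
TimedWord = Tuple[str, float, float]


def build_karaoke(
    song_path: str,
    split_vocals: Callable[[str], Tuple[str, str]],
    transcribe: Callable[[str], List[TimedWord]],
) -> Dict[str, object]:
    """Run the two backend steps and reformat the result for the frontend.

    `split_vocals` is assumed to return (vocals_path, accompaniment_path);
    `transcribe` is assumed to return timed words for the vocals file.
    """
    vocals_path, accompaniment_path = split_vocals(song_path)
    words = transcribe(vocals_path)
    # Reshape the tuples into JSON-friendly dictionaries.
    return {
        "accompaniment": accompaniment_path,
        "words": [
            {"text": text, "start": start, "end": end}
            for text, start, end in words
        ],
    }
```

In a real deployment, an HTTP endpoint would wrap a function like this and return the dictionary as JSON; injecting the two services also makes the flow easy to exercise with stubs.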

Challenges we ran into

Vocal removal: researching a robust method to remove the vocal part from the audio
Splitting text into lines: we decided to split based on sequence length as well as at long pauses between words
User interface: handling real-time UI updates for karaoke-style data
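The line-splitting rule mentioned above (sequence length plus long pauses) could look something like the sketch below. It assumes that sequence length is measured in words and that each word has start and end times in seconds; the `Word` type, the `split_into_lines` helper, and the default thresholds are illustrative choices, not values taken from the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def split_into_lines(
    words: List[Word], max_words: int = 7, max_pause: float = 0.8
) -> List[List[Word]]:
    """Group timed words into lyric lines.

    A new line starts when the current line already holds `max_words`
    words, or when the silence before the next word is at least
    `max_pause` seconds.
    """
    lines: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current:
            pause = word.start - current[-1].end
            if len(current) >= max_words or pause >= max_pause:
                lines.append(current)
                current = []
        current.append(word)
    if current:
        lines.append(current)
    return lines
```

Both thresholds would need tuning per song: a short `max_pause` breaks lines at every breath, while a long one leaves the length limit to do most of the work.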

What we learned

Fast prototyping
Teamwork
Frontend design
Google Cloud API
Music processing

What's next for Kara?OK!

We plan to make this project more comprehensive by supporting songs in more languages, not just English. Grouping words into lines could also be made more semantically meaningful by applying NLP models.

Beyond that, we plan to add advanced karaoke features such as automatic scoring for the singer and transposing the music to the user's vocal range. We also plan to add collaborative features so that users can create karaoke rooms and invite friends to join for a shared karaoke session.

Updates

Quang Anh started this project — Oct 16, 2021 04:05 PM EDT
