Inspiration

Many of us are passionate about singing, especially with so many good songs being released around the globe. Do you remember the good old days, when we went to karaoke sessions after work or school to sing, have snacks, and enjoy ourselves? In the pandemic era, we need an alternative to in-person karaoke sessions. However, karaoke versions of some songs are not available online, or their sound quality is unsatisfactory. Kara?Ok! provides a one-stop solution for online karaoke activities.

What it does

Kara?Ok! lets users upload any song from a local disk. The song is split into a vocal track and a background music track, and speech recognition is run on the vocal track to auto-generate the lyrics. Only the background music track is then played, and the generated lyrics are displayed automatically at the matching timestamps so users can sing along.
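
As a rough illustration of the timestamp-driven display, the sketch below picks the lyric line to show for a given playback position. It is only a minimal sketch: the function name `current_line` and the `(start_seconds, text)` tuple layout are assumptions made for this example, not the project's actual data format.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def current_line(lines: List[Tuple[float, str]], position: float) -> Optional[str]:
    """Return the lyric line active at `position` (in seconds).

    `lines` is assumed to be sorted by start time; each entry is a
    hypothetical (start_seconds, text) pair.
    """
    starts = [start for start, _ in lines]
    # Index of the last line whose start time is <= position.
    index = bisect_right(starts, position) - 1
    return lines[index][1] if index >= 0 else None


lyrics = [(0.0, "first line"), (4.5, "second line"), (9.0, "third line")]
shown = current_line(lyrics, 5.0)  # the line that started at 4.5 s
```

A player would call something like this on every playback time update and re-render only when the returned line changes.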

How we built it

Frontend: a single-page application built with Reactjs that lets users upload raw music audio and play the generated karaoke.

Backend:

  • Vocal Splitter: a service that uses Spleeter to split a song into vocal and instrumental (background music) tracks.
  • Speech-To-Text: a service that uses the Google Cloud API to extract lyrics from the vocal audio files, along with the timestamp information that helps users sing along.
  • FastAPI connects these services to the Reactjs frontend and reformats the data passed between them.
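
The vocal-splitting step could look roughly like the sketch below. It assumes Spleeter's pretrained 2-stem model and its default output layout (`<output_dir>/<song name>/vocals.wav` and `accompaniment.wav`); the helper names `expected_stem_paths` and `split_song` are hypothetical, and the project's actual configuration may differ.

```python
from pathlib import Path
from typing import Dict


def expected_stem_paths(song_path: str, output_dir: str) -> Dict[str, Path]:
    """Paths where the 2-stem model is assumed to write its output."""
    song_dir = Path(output_dir) / Path(song_path).stem
    return {
        "vocals": song_dir / "vocals.wav",
        "accompaniment": song_dir / "accompaniment.wav",
    }


def split_song(song_path: str, output_dir: str) -> Dict[str, Path]:
    """Separate a song into vocals and accompaniment with Spleeter."""
    # Imported lazily so this module still loads without Spleeter installed.
    from spleeter.separator import Separator

    separator = Separator("spleeter:2stems")  # assumption: 2-stem model
    separator.separate_to_file(song_path, output_dir)
    return expected_stem_paths(song_path, output_dir)
```

The vocals file would then go to the speech-to-text service, while the accompaniment file is what the frontend plays.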

Challenges we ran into

  • Researching a robust method to remove the vocal part from the audio
  • Splitting the text into lines: we decided to split based on sequence length as well as at long pauses between words
  • User interface: handling real-time UI updates for karaoke-style data
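
The line-splitting rule can be sketched as follows: start a new line when the gap before a word is long, or when adding the word would make the line too long. This is a minimal sketch, not the project's code; the `Word` and `LyricLine` types and the `max_chars` and `max_pause` thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class LyricLine:
    text: str
    start: float
    end: float


def _to_line(words: List[Word]) -> LyricLine:
    return LyricLine(" ".join(w.text for w in words), words[0].start, words[-1].end)


def split_into_lines(words: List[Word], max_chars: int = 30,
                     max_pause: float = 0.8) -> List[LyricLine]:
    """Group timestamped words into lines by length and by long pauses."""
    lines: List[LyricLine] = []
    current: List[Word] = []
    for word in words:
        if current:
            pause = word.start - current[-1].end
            length = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            # Close the current line on a long pause or when it would overflow.
            if pause >= max_pause or length > max_chars:
                lines.append(_to_line(current))
                current = []
        current.append(word)
    if current:
        lines.append(_to_line(current))
    return lines


words = [Word("sing", 0.0, 0.4), Word("along", 0.5, 0.9), Word("with", 1.0, 1.2),
         Word("me", 1.3, 1.6), Word("tonight", 3.0, 3.6)]
lines = split_into_lines(words, max_chars=12, max_pause=1.0)
```

With these illustrative thresholds, the first break comes from the length limit and the second from the pause before the last word.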

What we learned

  • Fast prototyping
  • Teamwork
  • Frontend design
  • Google Cloud API
  • Music processing

What's next for Kara?OK!

We plan to make this project more comprehensive by supporting songs in more languages, not just English. Grouping words into lines could also be made more semantically meaningful by applying NLP models.

Beyond that, we plan to add advanced karaoke functions such as automatic scoring for the singer and transposing the music to the user's vocal range, as well as collaboration features so that users can create karaoke rooms and invite friends to join for a combined karaoke session.
