Inspiration

I love Vietnamese remix videos on platforms like TikTok and YouTube, but non-Vietnamese friends often miss out on the lyrics and cultural nuances. RemixMate was born from the desire to bridge that gap—so anyone, anywhere, can sing along and appreciate our music.

What it does

  • Extracts Vietnamese remix titles from screenshots.
  • Captures and filters video frames for OCR.
  • Runs OCR (PaddleOCR/Tesseract) on valid frames to pull raw Vietnamese lyrics.
  • Uses OpenAI Whisper for audio transcription as a fallback.
  • Stores everything in SQLite and exposes a REST API (/api/remixes, /api/remixes/:id/ocr, /api/remixes/:id/stt).
  • Defers English translation until extraction quality improves—avoiding “garbage in, garbage out.”
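
The OCR-first, Whisper-as-fallback flow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Extraction` type, the `extract_lyrics` function, and the `min_chars` threshold are all hypothetical, and the OCR and speech-to-text steps are passed in as plain callables so the sketch does not depend on any specific PaddleOCR, Tesseract, or Whisper call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Extraction:
    text: str
    source: str  # "ocr" or "stt"


def extract_lyrics(
    run_ocr: Callable[[], str],
    run_stt: Callable[[], str],
    min_chars: int = 20,  # hypothetical threshold for "OCR found enough text"
) -> Optional[Extraction]:
    """Try OCR first; fall back to speech-to-text when OCR yields too little."""
    ocr_text = run_ocr().strip()
    if len(ocr_text) >= min_chars:
        return Extraction(ocr_text, "ocr")

    # OCR result too short (e.g. stylized or animated text): use the fallback.
    stt_text = run_stt().strip()
    if stt_text:
        return Extraction(stt_text, "stt")

    # Neither source produced usable text.
    return None
```

Recording the `source` alongside the text keeps OCR and transcription results distinguishable later, which matches the separate `/ocr` and `/stt` endpoints.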

How we built it

I found out about this hackathon with only one week left, but joined anyway for the learning experience. I began by defining the scope and sketching wireframes in Visily while evaluating translation and Sonar APIs. Next, I implemented a multimodal lyrics extraction pipeline: parsing Vietnamese titles from screenshots, capturing and filtering video frames for OCR, and using OpenAI Whisper for audio transcription. I then wrote an SQLite ingest script and built an Express-based REST API, covered by Jest and Supertest integration tests, to serve the data. To stay on schedule, I deferred the React karaoke UI and wrapped up by polishing the documentation and deployment setup.
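
To give a feel for the ingest step, here is a small sketch of loading extraction results into SQLite and reading them back per source. The schema (tables `remixes` and `lyrics`, and their columns) and the function names are assumptions made for illustration; the real ingest script and table layout may differ.

```python
import sqlite3

# Hypothetical schema: one row per remix, plus one lyrics row per source.
SCHEMA = """
CREATE TABLE IF NOT EXISTS remixes (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS lyrics (
    remix_id INTEGER NOT NULL REFERENCES remixes(id),
    source   TEXT NOT NULL CHECK (source IN ('ocr', 'stt')),
    text     TEXT NOT NULL,
    PRIMARY KEY (remix_id, source)
);
"""


def init_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def ingest(conn, title, ocr_text=None, stt_text=None):
    """Insert one remix and whichever lyric sources are available."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO remixes (title) VALUES (?)", (title,))
        remix_id = cur.lastrowid
        for source, text in (("ocr", ocr_text), ("stt", stt_text)):
            if text:
                conn.execute(
                    "INSERT INTO lyrics (remix_id, source, text) VALUES (?, ?, ?)",
                    (remix_id, source, text),
                )
    return remix_id


def get_lyrics(conn, remix_id, source):
    """Return the stored text for one source, or None if it is missing."""
    row = conn.execute(
        "SELECT text FROM lyrics WHERE remix_id = ? AND source = ?",
        (remix_id, source),
    ).fetchone()
    return row[0] if row else None
```

A lookup like `get_lyrics(conn, remix_id, "ocr")` is roughly the query an endpoint such as `/api/remixes/:id/ocr` would need to run.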

Challenges we ran into

  • Stylized & animated text in remix videos broke OCR pipelines.
  • Noisy audio with heavy beats/distortion caused Whisper to underperform.
  • The hackathon time limit forced a quick pivot and the deferral of non-core features.
  • To avoid garbage translations, English output is on hold until extraction quality improves.

Accomplishments that we’re proud of

  • Modular, test-driven backend, backed by SQLite and served through Express.
  • Robust ingestion pipeline that gracefully logs & skips failures.
  • Automated integration tests ensuring our /api/remixes endpoints work.
  • Clear documentation capturing our process, decisions, and future roadmap.
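
The "log and skip" behaviour of the ingestion pipeline can be illustrated with a short sketch. The function name `process_all` and the logger name are hypothetical; the idea is only that one failing item is recorded and skipped instead of aborting the whole batch.

```python
import logging
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

logger = logging.getLogger("remixmate.ingest")  # hypothetical logger name


def process_all(
    items: Iterable[T], handler: Callable[[T], R]
) -> Tuple[List[R], List[T]]:
    """Run handler on every item; log and skip any item that raises."""
    results: List[R] = []
    failed: List[T] = []
    for item in items:
        try:
            results.append(handler(item))
        except Exception:
            # Record the traceback, remember the item, and keep going.
            logger.exception("Skipping item after failure: %r", item)
            failed.append(item)
    return results, failed
```

Returning the failed items alongside the results makes it easy to retry them later or report how much of a batch was actually ingested.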

What we learned

  • Multimodal extraction is essential for noisy, user-generated content.
  • Graceful fallbacks (log & skip) keep pipelines moving under pressure.
  • Rapid iteration with flat-file → SQLite workflows can deliver robust APIs in hours.
  • Test-driven development pays off, even in hackathon sprints.

What’s next for RemixMate

  • Add a translations table and pipeline once Vietnamese extraction is reliable.
  • Re-implement the karaoke UI for real-time lyric highlighting.
  • Improve extraction quality with fine-tuned Vietnamese STT or beat-detection isolation.
  • Integrate crowdsourced lyric APIs for broader remix coverage and metadata enrichment.
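
As a rough idea of what the planned translations table could look like, here is a sketch keyed by remix and target language. Everything in it (the table and column names, the `save_translation` helper, and the minimal `remixes` table included so the example runs on its own) is an assumption, since this part is not built yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE remixes (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- Hypothetical layout: at most one translation per remix and language.
    CREATE TABLE translations (
        remix_id INTEGER NOT NULL REFERENCES remixes(id),
        lang     TEXT NOT NULL,
        text     TEXT NOT NULL,
        PRIMARY KEY (remix_id, lang)
    );
    """
)


def save_translation(conn, remix_id, lang, text):
    """Insert a translation, replacing any earlier one for the same language."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO translations (remix_id, lang, text) "
            "VALUES (?, ?, ?)",
            (remix_id, lang, text),
        )
```

Replacing on conflict means translations can be regenerated as the underlying Vietnamese extraction improves without leaving stale rows behind.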
