Inspiration

Across the world, people speak in dialects that reflect their culture, community, and identity. But most AI tools struggle to understand or represent them accurately. From captions that miss the mark to translation models that flatten meaning, dialect speakers are often left out of the digital conversation.


What it does

ConText allows users to:

  • Upload or record dialectal video or audio, or enter text directly.
  • Transcribe dialectal speech using Twelve Labs.
  • Translate it into clear standard English using a fine-tuned T5 model.
  • Use a clean, modern web interface to interact with the system — upload files, record live, clear inputs, and see results.

It’s like a dialect interpreter you can carry in your browser.


How we built it

The system consists of several modular components:

  1. Data Collection
    We scraped Caribbean dialect content from YouTube—mainly comments—using the YouTube API.

  2. Model Training
    We fine-tuned a t5-base model using Hugging Face Transformers and PyTorch on a custom dataset of dialect-to-English sentence pairs.

  3. Frontend
    Built with React, Vite, and Tailwind CSS, allowing users to record audio/video, upload files, and see results in real time.

  4. Backend
    A Flask API connects everything, handling uploads, transcription, translation, and audio generation.
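
To make the model-training step more concrete, here is a minimal sketch of how dialect-to-English sentence pairs might be shaped into the text-to-text format that T5 works with, before any tokenization happens. The task prefix, the field names, the 10% validation split, and the sample sentences are illustrative assumptions, not a record of our exact training setup.

```python
import random

# Hypothetical task prefix: T5 is usually given a short task description
# in front of the input text. The exact wording here is an assumption.
TASK_PREFIX = "translate dialect to English: "


def to_t5_example(dialect: str, english: str) -> dict:
    """Turn one sentence pair into a source/target record for seq2seq fine-tuning."""
    return {
        "source": TASK_PREFIX + dialect.strip(),
        "target": english.strip(),
    }


def split_examples(examples, val_fraction=0.1, seed=42):
    """Shuffle deterministically, then hold out a validation slice."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Always keep at least one validation example, even for tiny datasets.
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Sample pairs written for this sketch only; not taken from the real dataset.
pairs = [
    ("Wah gwaan?", "What's going on?"),
    ("Mi soon come.", "I'll be there soon."),
    ("Weh yuh deh?", "Where are you?"),
]
examples = [to_t5_example(d, e) for d, e in pairs]
train, val = split_examples(examples)
```

Records in this shape could then be tokenized and passed to a sequence-to-sequence trainer. Keeping the prefix in one constant means the same string can be reused at inference time.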


Challenges we ran into

  • Low-resource dialects
    There are almost no clean datasets for Caribbean English. We had to get creative with scraping and cleaning YouTube content.

  • Time constraints
    Training and evaluating models, building a UI, and wiring multiple APIs—all within a weekend—was a huge logistical challenge.
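
The kind of cleaning pass that scraped YouTube comments need can be sketched as follows. This is a minimal illustration rather than our exact script: the regular expressions, the three-word minimum, and the function names are all assumptions.

```python
import html
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TIMESTAMP_RE = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")
TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def clean_comment(raw: str) -> str:
    """Strip markup and noise from one comment while keeping the dialect text."""
    text = html.unescape(raw)          # e.g. "&amp;" becomes "&"
    text = TAG_RE.sub(" ", text)       # leftover HTML tags such as <br>
    text = URL_RE.sub(" ", text)       # links carry no dialect signal
    text = MENTION_RE.sub(" ", text)   # @username replies
    text = TIMESTAMP_RE.sub(" ", text) # video timestamps like 2:15
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_corpus(comments, min_words=3):
    """Clean every comment, then drop very short ones and case-insensitive duplicates."""
    seen = set()
    kept = []
    for raw in comments:
        text = clean_comment(raw)
        key = text.casefold()
        if len(text.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept
```

Deduplicating on a case-folded key removes copy-pasted comments without altering the spelling that is kept, which matters because dialect spelling is part of the signal.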


Accomplishments that we're proud of

  • Fine-tuned a working t5-base model for dialect-to-English translation on a dataset we built ourselves.
  • Created an app that translates spoken dialect into English and reads it back, start to finish.
  • Built a clean, intuitive full-stack web experience.
  • Navigated multiple APIs and stitched them together into a functional, reliable pipeline.
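
One way to picture the stitched-together pipeline is as a function that takes each stage as a plain callable. The sketch below is an assumption-heavy illustration: `run_pipeline`, `PipelineResult`, and the two stand-in stages are hypothetical names, and the stand-ins return canned strings instead of calling Twelve Labs or the T5 model. The read-back step is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcript: str
    translation: str


def run_pipeline(
    media_path: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str], str],
) -> PipelineResult:
    """Run transcription, then translation, failing early on empty speech."""
    transcript = transcribe(media_path).strip()
    if not transcript:
        raise ValueError(f"no speech recognized in {media_path!r}")
    return PipelineResult(transcript=transcript, translation=translate(transcript))


# Stand-in stages so the sketch runs without network access or model weights.
def fake_transcribe(path: str) -> str:
    return "mi soon come"


def fake_translate(text: str) -> str:
    return {"mi soon come": "I'll be there soon."}.get(text, text)


result = run_pipeline("clip.mp4", fake_transcribe, fake_translate)
print(result.translation)
```

Passing the stages in as arguments keeps each external service swappable, so a stage can be replaced or mocked in tests without touching the rest of the flow.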

What we learned

  • How to fine-tune transformer models for specialized translation tasks.
  • How to use Twelve Labs creatively for transcription.
  • How to deploy a multi-part ML pipeline using Flask + React.
  • That low-resource doesn’t mean low-impact. Even with limited data, smart design can make meaningful tools.

What's next for ConText

  • Dialect classification: Automatically detect which dialect is being spoken before translation.
  • Expanded dataset: Collect more region-specific expressions and slang to boost translation accuracy.
  • Conversion from Standard English to Dialect: Enable two-way translation for deeper cultural accessibility.
