Inspiration

We were previously building Medco AI EMR (https://app.medco.ai/demo/), formerly called Doctin, a voice-enabled medical record system for Indian general practitioners. The idea was to passively listen to doctor-patient conversations, extract the relevant doctor dictations, and auto-generate prescriptions.

However, while testing ASR systems such as Whisper, Deepgram, and the Google, Azure, and IBM speech APIs, we discovered a critical limitation: they struggled to understand Indian English dialects and regional pronunciations of medicine brand names. This gap pushed us to explore custom fine-tuning and eventually to build VoiceCord.


What it does

VoiceCord is a platform that allows researchers and developers to:

  • Collect dialect-specific voice datasets using both real and synthetic voices
  • Create and publish ASR training projects
  • Recruit contributors based on dialect, region, or profile
  • Generate synthetic data using cloned voices
  • Quickly fine-tune ASR models and evaluate performance (e.g., word error rate, WER)
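
To make the last point concrete, here is a minimal sketch of the standard WER metric (substitutions + deletions + insertions over reference length). This is the textbook definition, not our exact evaluation harness, and the example transcripts are invented for illustration:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# A typical failure we saw: the model mishears a medicine brand name.
print(wer("take dolo 650 twice daily", "take dollo six fifty twice daily"))  # → 0.6
```

Two substitutions plus one insertion against a five-word reference gives a WER of 0.6, which is roughly the kind of degradation we observed on brand-name-heavy dictations.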

How we built it

  1. We first generated synthetic audio with ElevenLabs, OpenTTS, and Coqui TTS, feeding them text templates containing medicine brand names and medical phrases.
  2. We cloned voices from different regions to simulate dialect diversity.
  3. We fine-tuned models on this synthetic data, but the results weren't good enough.
  4. We then added real contributor recordings, collected across regions and dialects for diversity.
  5. This led us to build VoiceCord, a platform where:
  • Contributors can register, get matched with projects, and record voice data
  • Researchers can publish projects and manage contributors
  • Synthetic and real voices can be combined and exported for training
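
Step 1 above can be sketched as a simple template cross-product: every dosage phrase is instantiated with every brand name, and the resulting lines become TTS input. The brand names and templates below are illustrative placeholders, not our actual phrase lists:

```python
import itertools

# Illustrative only: the real project used much larger curated lists.
BRANDS = ["Dolo 650", "Crocin Advance", "Azithral 500"]
TEMPLATES = [
    "Prescribe {brand} twice a day after food.",
    "Start the patient on {brand} for five days.",
    "Stop {brand} if the fever subsides.",
]

def generate_prompts(brands, templates):
    """Cross every template with every brand name to produce TTS input lines."""
    return [t.format(brand=b) for b, t in itertools.product(brands, templates)]

prompts = generate_prompts(BRANDS, TEMPLATES)
print(len(prompts))  # 3 brands x 3 templates = 9 prompts
```

Each generated line is then synthesized with one of the TTS engines (and, for dialect coverage, with different cloned voices), giving paired audio/transcript training examples.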

Challenges we ran into

  • ASR models failed to recognize region-specific pronunciations and brand names
  • Synthetic-only data lacked realism, causing underperformance in actual use
  • Creating a balanced dataset (dialect, speed, noise) was tough
  • Managing contributor profiles and linking them to the right projects required custom workflow logic

Accomplishments that we're proud of

  • Built a voice dataset platform tailored to India’s linguistic diversity
  • Validated that mixing real and synthetic voices improves fine-tuning quality
  • Designed a project-contributor workflow that enables scalable, inclusive voice data collection

What we learned

  • Real-world Indian dialects are too complex for generic ASR models
  • Combining real user data with varied synthetic voices yields better results
  • Voice dataset quality depends heavily on natural conditions like tone, speed, and environment
  • Democratizing voice data collection helps unlock inclusive AI innovation

What's next for VoiceCord

  • Launch contributor incentives and reward models
  • Expand to support regional Indian languages beyond English
  • Add tools for automatic transcription, data validation, and WER benchmarking
  • Collaborate with healthcare, edtech, and voice AI startups to drive adoption

Built With

  • bolt
  • cursor
  • elevenlabs
  • google
  • interserver
  • mern
  • netlify
  • supabase
  • windsurf