Inspiration
We were previously building Medco AI EMR (https://app.medco.ai/demo/), formerly Doctin, a voice-enabled medical record system for Indian general practitioners. The idea was to passively listen to doctor-patient conversations, extract the relevant doctor dictations, and auto-generate prescriptions.
However, while testing ASR models such as Whisper, Deepgram, Google, Azure, and IBM, we discovered a critical limitation: they struggled to understand Indian English dialects and regional pronunciations of medicine brand names. This gap pushed us to explore custom fine-tuning and, eventually, to build VoiceCord.
What it does
VoiceCord is a platform that allows researchers and developers to:
- Collect dialect-specific voice datasets using both real and synthetic voices
- Create and publish ASR training projects
- Recruit contributors based on dialect, region, or profile
- Generate synthetic data using cloned voices
- Quickly fine-tune ASR models and test performance (e.g., WER)
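The WER metric mentioned above is the standard way to score an ASR model. A minimal sketch of how it can be computed (word-level Levenshtein distance divided by reference length; real pipelines would typically use a library such as `jiwer` instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Edit distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, if the model mishears one brand name in a five-word prescription, the WER is 0.2.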
How we built it
- We first generated synthetic audio using ElevenLabs, OpenTTS, and Coqui TTS, feeding them text templates with brand names and medical phrases.
- Cloned voices from different regions simulated dialect diversity.
- We tested fine-tuning models using this synthetic data, but the results weren't good enough on their own.
- We then added real contributor recordings, collected with regional, dialect-rich diversity.
- This led us to build VoiceCord, a platform where:
  - Contributors can register, get matched with projects, and record voice data
  - Researchers can publish projects and manage contributors
  - Synthetic and real voices can be combined and exported for training
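The text-template step above can be sketched as follows. The brand names and phrase templates here are illustrative placeholders, not our actual dataset; each generated sentence would then be fed to a TTS engine (e.g. ElevenLabs or Coqui) in different cloned voices:

```python
import itertools

# Hypothetical brand names and prescription templates, for illustration only.
BRAND_NAMES = ["Dolo 650", "Crocin", "Azithral 500"]
TEMPLATES = [
    "Prescribe {brand} twice a day after food.",
    "Start the patient on {brand} for five days.",
]

def generate_prompts(brands: list[str], templates: list[str]) -> list[str]:
    """Cross every template with every brand name to get TTS input sentences."""
    return [t.format(brand=b) for t, b in itertools.product(templates, brands)]

prompts = generate_prompts(BRAND_NAMES, TEMPLATES)
```

Crossing templates with brand names like this is what lets a small set of phrases cover many region-specific pronunciations once synthesized in multiple voices.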
Challenges we ran into
- ASR models failed to recognize region-specific pronunciations and brand names
- Synthetic-only data lacked realism, causing underperformance in actual use
- Creating a balanced dataset (dialect, speed, noise) was tough
- Managing contributor profiles and linking them to the right projects required custom workflow logic
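One way to attack the balancing problem above is to bucket clips by dialect and downsample every bucket to the same size. This is a simplified sketch with an assumed `dialect` metadata field; a real pipeline would also stratify on speaking speed and noise conditions:

```python
import random
from collections import defaultdict

def balance_by_dialect(clips: list[dict], per_dialect: int, seed: int = 0) -> list[dict]:
    """Downsample each dialect bucket to at most `per_dialect` clips
    so that no single dialect dominates the training set."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for clip in clips:
        buckets[clip["dialect"]].append(clip)
    balanced = []
    for items in buckets.values():
        rng.shuffle(items)  # random pick within each bucket
        balanced.extend(items[:per_dialect])
    return balanced
```

Capping every bucket trades away some data volume for an even dialect distribution, which matters more than raw size when the goal is regional coverage.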
Accomplishments that we're proud of
- Built a voice dataset platform tailored to India’s linguistic diversity
- Validated that mixing real and synthetic voices improves fine-tuning quality
- Designed a project-contributor workflow that enables scalable, inclusive voice data collection
What we learned
- Real-world Indian dialects are too complex for generic ASR models
- Combining real user data with varied synthetic voices yields better results
- Voice dataset quality depends heavily on natural conditions like tone, speed, and environment
- Democratizing voice data collection helps unlock inclusive AI innovation
What's next for VoiceCord
- Launch contributor incentives and reward models
- Expand to support regional Indian languages beyond English
- Add tools for automatic transcription, data validation, and WER benchmarking
- Collaborate with healthcare, edtech, and voice AI startups to drive adoption
Built With
- bolt
- cursor
- elevenlabs
- interserver
- mern
- netlify
- supabase
- windsurf