Inspiration
We were previously building Medco AI EMR (https://app.medco.ai/demo/), formerly Doctin, a voice-enabled medical record system for Indian general practitioners. The idea was to passively listen to doctor-patient conversations, extract the relevant doctor dictations, and auto-generate prescriptions.
However, while testing ASR models such as Whisper, Deepgram, Google, Azure, and IBM, we discovered a critical limitation: they struggled to understand Indian English dialects and regional pronunciations of medicine brand names. This gap pushed us to explore custom fine-tuning and, eventually, to build VoiceCord.
What it does
VoiceCord is a platform that allows researchers and developers to:
- Collect dialect-specific voice datasets using both real and synthetic voices
- Create and publish ASR training projects
- Recruit contributors based on dialect, region, or profile
- Generate synthetic data using cloned voices
- Quickly fine-tune ASR models and test performance (e.g., WER)
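The WER metric mentioned above is the standard way to score an ASR model. A minimal sketch of how it can be computed (word-level Levenshtein distance divided by reference length; real pipelines would typically use a library such as `jiwer` instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Edit distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, if the model mishears one brand name in a five-word prescription, the WER is 0.2.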
How we built it
- We first generated synthetic audio using ElevenLabs, OpenTTS, and Coqui TTS, feeding them text templates with brand names and medical phrases.
- Cloned voices from different regions simulated dialect diversity.
- We tested fine-tuning models using this synthetic data, but the results weren't good enough on their own.
- We then added real contributor recordings, collected with regional, dialect-rich diversity.
- This led us to build VoiceCord, a platform where:
  - Contributors can register, get matched with projects, and record voice data
  - Researchers can publish projects and manage contributors
  - Synthetic and real voices can be combined and exported for training
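The text-template step above can be sketched as follows. The brand names and phrase templates here are illustrative placeholders, not our actual dataset; each generated sentence would then be fed to a TTS engine (e.g. ElevenLabs or Coqui) in different cloned voices:

```python
import itertools

# Hypothetical brand names and prescription templates, for illustration only.
BRAND_NAMES = ["Dolo 650", "Crocin", "Azithral 500"]
TEMPLATES = [
    "Prescribe {brand} twice a day after food.",
    "Start the patient on {brand} for five days.",
]

def generate_prompts(brands: list[str], templates: list[str]) -> list[str]:
    """Cross every template with every brand name to get TTS input sentences."""
    return [t.format(brand=b) for t, b in itertools.product(templates, brands)]

prompts = generate_prompts(BRAND_NAMES, TEMPLATES)
```

Crossing templates with brand names like this is what lets a small set of phrases cover many region-specific pronunciations once synthesized in multiple voices.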
Challenges we ran into
- ASR models failed to recognize region-specific pronunciations and brand names
- Synthetic-only data lacked realism, causing underperformance in actual use
- Creating a balanced dataset (dialect, speed, noise) was tough
- Managing contributor profiles and linking them to the right projects required custom workflow logic
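One way to attack the balancing problem above is to bucket clips by dialect and downsample every bucket to the same size. This is a simplified sketch with an assumed `dialect` metadata field; a real pipeline would also stratify on speaking speed and noise conditions:

```python
import random
from collections import defaultdict

def balance_by_dialect(clips: list[dict], per_dialect: int, seed: int = 0) -> list[dict]:
    """Downsample each dialect bucket to at most `per_dialect` clips
    so that no single dialect dominates the training set."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for clip in clips:
        buckets[clip["dialect"]].append(clip)
    balanced = []
    for items in buckets.values():
        rng.shuffle(items)  # random pick within each bucket
        balanced.extend(items[:per_dialect])
    return balanced
```

Capping every bucket trades away some data volume for an even dialect distribution, which matters more than raw size when the goal is regional coverage.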
Accomplishments that we're proud of
- Built a voice dataset platform tailored to India’s linguistic diversity
- Validated that mixing real and synthetic voices improves fine-tuning quality
- Designed a project-contributor workflow that enables scalable, inclusive voice data collection
What we learned
- Real-world Indian dialects are too complex for generic ASR models
- Combining real user data with varied synthetic voices yields better results
- Voice dataset quality depends heavily on natural conditions like tone, speed, and environment
- Democratizing voice data collection helps unlock inclusive AI innovation
What's next for VoiceCord
- Launch contributor incentives and reward models
- Expand to support regional Indian languages beyond English
- Add tools for automatic transcription, data validation, and WER benchmarking
- Collaborate with healthcare, edtech, and voice AI startups to drive adoption
Built With
- bolt
- cursor
- elevenlabs
- interserver
- mern
- netlify
- supabase
- windsurf