Inspiration
My grandmother was diagnosed with Parkinson's two years ago, but looking back at videos from 5 years earlier, you could hear it in her voice - subtle shakiness, breathiness. By the time she got diagnosed, we'd already lost years where early treatment could've made a huge difference.
That's when I learned that Parkinson's affects your voice 5-7 years before the tremors start. The technology exists to detect it early, but it's locked away in research labs. I wanted to change that - make early detection accessible with just a phone recording.
What it does
VoiceGuard AI detects early Parkinson's from a 3-second voice recording. But it's not just a "yes/no" detector - it's a complete clinical intelligence system.
When you record a sustained "Ahhhhh," we:
- Extract 44 voice biomarkers (jitter, shimmer, harmonics, MFCCs)
- Run an ML model that predicts Parkinson's probability with 89% ROC-AUC
- Hand off to 7 AI agents that collaborate to provide what doctors actually need:
  - Research recent medical papers (PubMed API)
  - Predict the 5-year disease trajectory
  - Create evidence-based treatment plans (with drug safety checks!)
  - Schedule personalized monitoring
  - Generate clinical reports with full citations
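As a concrete illustration of what those biomarkers measure: local jitter is the mean absolute difference between consecutive glottal periods, normalized by the mean period. In production we use Parselmouth, but the underlying math is simple - this is a dependency-free sketch, and the sample values are made up:

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference of consecutive glottal
    periods, divided by the mean period. Higher values mean a less
    stable voice - one of the early Parkinson's markers."""
    if len(periods) < 2:
        raise ValueError("need at least two glottal periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

# A perfectly steady voice has zero jitter; irregular periods raise it.
steady = [0.010] * 5                              # period lengths in seconds
shaky = [0.010, 0.012, 0.009, 0.013, 0.010]
print(local_jitter(steady))                       # 0.0
print(local_jitter(shaky))                        # noticeably above zero
```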
The agents use NVIDIA Nemotron and make real decisions - like when the Treatment Agent discovered a drug interaction that would've caused serotonin syndrome and adjusted the recommendation. That's not hardcoded, that's intelligence.
How we built it
ML Pipeline:
- Trained 12+ algorithms on the UCI Parkinson's dataset (XGBoost, LightGBM, Random Forest, SVMs, Neural Nets)
- Feature extraction with Parselmouth (jitter/shimmer) and Librosa (MFCCs)
- Achieved 89% ROC-AUC with XGBoost after hyperparameter tuning
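A note on that number: 89% is ROC-AUC, not raw accuracy. ROC-AUC reads as the probability that a randomly chosen Parkinson's recording scores higher than a randomly chosen healthy one. A dependency-free sketch of the metric (ties counted as half a win):

```python
def roc_auc(labels, scores):
    """Probability that a positive example outscores a negative one
    (the Mann-Whitney formulation of ROC-AUC); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation scores 1.0; a coin-flip model scores ~0.5.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```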
Agentic AI:
- 7 specialized agents powered by NVIDIA Nemotron Super 49B
- Implemented ReAct pattern (Reason → Act → Observe) - agents decide their own paths
- Built dual RAG systems:
  - ChromaDB with NVIDIA embeddings for clinical guidelines
  - FAISS IVF index for finding similar patients (scales to 100K+)
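Stripped of the Nemotron API calls, the ReAct control flow looks roughly like this - the `decide` stub stands in for the LLM, and all names here are illustrative rather than our actual codebase:

```python
def react_loop(decide, tools, question, max_steps=5):
    """ReAct: the model Reasons about the next action, we Act by
    calling the chosen tool, and the result is Observed on the next
    turn. The model, not a script, decides when to stop."""
    observations = []
    for _ in range(max_steps):
        action = decide(question, observations)          # Reason
        if action["tool"] == "finish":
            return action["answer"]
        result = tools[action["tool"]](action["args"])   # Act
        observations.append((action["tool"], result))    # Observe
    return None  # step budget exhausted without an answer

# Stub "LLM": searches once, then declares it has enough information.
def stub_decide(question, obs):
    if not obs:
        return {"tool": "pubmed_search", "args": "levodopa dosing"}
    return {"tool": "finish", "answer": f"based on {len(obs)} observation(s)"}

tools = {"pubmed_search": lambda q: f"3 papers on {q}"}
print(react_loop(stub_decide, tools, "What is a safe starting dose?"))
```

The early exit is the key design point: a scripted pipeline always runs every step, while this loop terminates whenever the model judges it has sufficient information.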
Production Engineering:
- Output validation (catches unsafe doses, contraindications)
- Response caching (40-60% API cost reduction)
- Agent-to-agent messaging for collaboration
- Full citation tracking for medical explainability
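The caching layer is essentially content-addressed memoization of LLM calls. A minimal sketch of the idea, with names that are ours rather than any NVIDIA API:

```python
import hashlib
import json

class ResponseCache:
    """Memoize LLM calls keyed on a hash of (prompt, params), so
    repeated agent queries don't hit the paid API twice."""
    def __init__(self, call_llm):
        self._call = call_llm
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, prompt, **params):
        raw = json.dumps({"prompt": prompt, "params": params}, sort_keys=True)
        key = hashlib.sha256(raw.encode()).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._call(prompt, **params)
        return self._store[key]

calls = []
cache = ResponseCache(lambda p, **kw: calls.append(p) or f"answer to {p}")
cache.get("summarize jitter findings", temperature=0.2)
cache.get("summarize jitter findings", temperature=0.2)  # served from cache
print(len(calls), cache.hits)  # 1 underlying API call, 1 cache hit
```

Sorting the JSON keys makes the cache key stable regardless of parameter order, and including the sampling parameters means a temperature change correctly misses the cache.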
Stack: Python, FastAPI, React, XGBoost, ChromaDB, FAISS, Parselmouth, Librosa
Challenges we ran into
The MFCC nightmare: Spent 6 hours debugging why MFCCs from test audio were wildly different from training data. Turns out Librosa's default sample rate (22050 Hz) wasn't matching my phone recordings (48000 Hz). Once I forced resampling, features aligned perfectly.
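The one-line fix was loading with an explicit rate, e.g. `librosa.load(path, sr=22050)`, which resamples on load. To show why the time grid matters, here is a crude linear-interpolation resampler - illustrative only, since Librosa uses a proper band-limited method:

```python
def resample_linear(signal, sr_in, sr_out):
    """Crude linear resampler: map each output sample back to its
    position on the input grid and interpolate between neighbors."""
    n_out = int(len(signal) * sr_out / sr_in)
    out = []
    for i in range(n_out):
        t = i * sr_in / sr_out               # position in input samples
        j = min(int(t), len(signal) - 1)
        k = min(j + 1, len(signal) - 1)
        frac = t - int(t)
        out.append(signal[j] * (1 - frac) + signal[k] * frac)
    return out

# One second of 48 kHz phone audio, brought down to the 22.05 kHz
# training rate: same duration, ~0.46x as many samples.
phone = [0.0] * 48000
print(len(resample_linear(phone, 48000, 22050)))  # 22050
```

Without this step, frame-based features like MFCCs are computed over different slices of real time, which is exactly why the test-time features diverged from training.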
FAISS wouldn't train: My IVF index kept throwing errors. Realized you need at least nlist samples before training (I had 50 patients, tried 100 clusters - math doesn't work!). Now it auto-detects and trains only when sufficient data exists.
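The sizing guard reduces to capping nlist by the data you actually have: FAISS refuses to train k-means with fewer vectors than centroids, and warns below roughly 39 training points per centroid. A sketch of that logic, with illustrative constants:

```python
def plan_ivf(n_vectors, requested_nlist=100, points_per_centroid=39):
    """Shrink nlist to what the training set can support, and report
    whether IVF training should run yet at all."""
    comfortable = max(1, n_vectors // points_per_centroid)
    nlist = max(1, min(requested_nlist, comfortable, n_vectors))
    return {"nlist": nlist, "train_now": n_vectors >= nlist}

print(plan_ivf(50))    # 50 patients, 100 requested clusters: shrink drastically
print(plan_ivf(5000))  # enough data: keep the requested 100 clusters
```

With 50 patients and 100 requested clusters the planner collapses to a single cluster (effectively a flat index), and the real nlist only grows back as the patient pool does.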
Nemotron's creativity: The agent kept inventing drug names that don't exist. Had to build a validation layer that checks recommendations against known medications and flags hallucinations before they reach doctors. Caught it recommending "Parkinson-B-Gone 500mg" in testing.
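The shape of that validation layer is a whitelist plus fuzzy matching: near-misses get mapped back to a real drug, everything else gets flagged. A minimal sketch with a truncated, illustrative drug list:

```python
import difflib

KNOWN_DRUGS = {  # illustrative subset, not the full formulary
    "levodopa", "carbidopa", "pramipexole", "ropinirole",
    "rasagiline", "selegiline", "amantadine", "entacapone",
}

def validate_drug(name, known=KNOWN_DRUGS, cutoff=0.8):
    """Classify an LLM-suggested drug name: exact match, likely
    misspelling of a known drug, or outright hallucination."""
    n = name.lower().strip()
    if n in known:
        return ("ok", n)
    close = difflib.get_close_matches(n, sorted(known), n=1, cutoff=cutoff)
    if close:
        return ("misspelled", close[0])
    return ("hallucination", n)

print(validate_drug("Levodopa"))          # ('ok', 'levodopa')
print(validate_drug("Levadopa"))          # ('misspelled', 'levodopa')
print(validate_drug("Parkinson-B-Gone"))  # flagged as hallucination
```

The cutoff is the knob that matters: too low and invented names get "corrected" into real drugs, too high and ordinary typos get flagged as hallucinations.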
Agent coordination: Getting 7 agents to pass context cleanly was harder than expected. Built a proper coordinator with structured handoffs and a messaging system so agents can actually collaborate (not just run sequentially).
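The messaging half of the coordinator reduces to per-agent mailboxes that each agent drains before acting. A sketch with illustrative agent names:

```python
from collections import defaultdict

class MessageBus:
    """Point-to-point mailboxes so agents hand off structured context
    instead of all mutating one shared blob."""
    def __init__(self):
        self._queues = defaultdict(list)

    def send(self, sender, recipient, payload):
        self._queues[recipient].append({"from": sender, "payload": payload})

    def drain(self, recipient):
        """Return and clear the recipient's inbox."""
        return self._queues.pop(recipient, [])

bus = MessageBus()
bus.send("research_agent", "treatment_agent",
         {"finding": "SSRI interaction risk with rasagiline"})
inbox = bus.drain("treatment_agent")
print(len(inbox), inbox[0]["from"])  # 1 research_agent
print(bus.drain("treatment_agent"))  # [] - already consumed
```

Making handoffs explicit messages (rather than shared state) is also what lets agents run in any order the coordinator chooses, instead of strictly sequentially.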
Accomplishments that we're proud of
It actually works in the real world. I tested it on my own phone recording vs. recordings from actual patients with Parkinson's. The system correctly identified my voice as low-risk and the patients as high-risk (they have naturally high jitter from vocal cord nodules - the system caught a real anomaly).
The dynamic ReAct loop. This was the hardest part - getting Nemotron to genuinely decide its own path instead of following a script. When the agent exits early because "I have sufficient information," that's real autonomous intelligence, not predetermined steps.
Medical-grade validation. We're not just generating text - every output is validated for dose safety, drug interactions, and required disclaimers. In testing, it caught a 500mg Levodopa recommendation (way over safe limits). That kind of safety layer is critical for healthcare AI.
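The dose check itself is a lookup against per-drug ceilings. The sketch below uses placeholder limits that are emphatically not clinical guidance - the point is the fail-closed shape, where an unknown drug is flagged rather than passed through:

```python
# Illustrative per-dose ceilings in mg - placeholders, NOT clinical guidance.
MAX_SINGLE_DOSE_MG = {"levodopa": 250, "pramipexole": 1.5}

def check_dose(drug, dose_mg):
    """Flag any recommendation above the configured per-dose ceiling,
    or with no ceiling on file, before it reaches a clinician."""
    limit = MAX_SINGLE_DOSE_MG.get(drug.lower())
    if limit is None:
        return "flag: no safety limit on file"
    if dose_mg > limit:
        return f"flag: {dose_mg}mg exceeds {limit}mg ceiling"
    return "ok"

print(check_dose("Levodopa", 500))  # the kind of case the validator caught
```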
Dual RAG architecture. ChromaDB for semantic search over medical literature + FAISS for numerical similarity over patient voice features. Using the right tool for each job, optimized for production scale.
Full explainability. Every recommendation includes citations with relevance scores. When it suggests therapy X, you see "Evidence: MDS Guidelines 2024 (HIGH relevance), Drug Safety Database (MEDIUM relevance)." Doctors can trust it because they can verify it.
What we learned
Agentic AI is genuinely different from chatbots. Initially, I thought agents were just fancy prompts. But building the ReAct loop taught me that true agentic systems make decisions - "Do I need more info? Which tool should I use? When am I done?" That's way harder than generating responses.
RAG isn't one-size-fits-all. I started trying to cram everything into ChromaDB. Then realized patient similarity search needs different optimization (FAISS IVF) than semantic document search. Now I have both and they're 10x faster in their respective use cases.
Medical AI needs guardrails. LLMs hallucinate. In consumer apps, that's annoying. In healthcare, it's dangerous. Building validation layers isn't optional - it's the difference between a demo and something you'd actually deploy in a hospital.
Parkinson's voice markers are fascinating. Jitter and shimmer sound abstract until you plot them - Parkinson's patients have these jagged, irregular waveforms compared to healthy voices. MFCCs capture spectral changes that correlate with vocal cord rigidity. This stuff is beautiful when you visualize it.
NVIDIA's ecosystem is powerful. Using Nemotron for reasoning + NeMo Retriever for embeddings + the Build API gave me a complete agentic platform. I didn't have to Frankenstein together different models from different providers.
What's next for VoiceGuard
Short-term (next 3 months):
- Implement full RPDE/DFA algorithms (currently using approximations for 4 advanced features)
- Expand clinical knowledge base from 9 to 500+ medical documents
- Build patient mobile app (right now it's just backend + minimal frontend)
Medium-term (6-12 months):
- Collect longitudinal dataset (need 1000+ patients with follow-ups to train FAISS on real progression patterns)
- FDA Class II medical device application (510(k) clearance pathway)
- Partner with 3-5 neurology clinics for pilot deployment
Long-term (1-2 years):
- Multi-language support (Parkinson's affects 10M people globally, not just English speakers)
- Expand to other neurodegenerative diseases (Alzheimer's, ALS - they have voice signatures too!)
- Real-time monitoring app - patients record weekly, system alerts them to concerning trends
The big vision: Make early detection of neurodegenerative diseases as easy as taking a selfie. If we can catch Parkinson's 5 years early for millions of people, we're not just building a product - we're changing healthcare outcomes at scale.