Inspiration
So much of clinical care hides in the voice: the tremor of early Parkinson's, the flat affect of depression, the subtle cues of distress. But clinicians can't always catch it: telehealth audio is lossy, therapy demands full attention, and neurologists may only see patients every few months. Decades of research show that vocal biomarkers like jitter, shimmer, and harmonics-to-noise ratio (HNR) can track neurological change, yet almost none of that work reaches the point of care. We built VoicePulse to change that.
What it does
The user picks one of six clinical modules (Care Confusion, TeleVisit Triage, Therapy Monitor, Chronic Pain, Neurological, or Autism Behavioral) and records audio through their browser, following the guidance tailored to that module. VoicePulse then analyzes the recording in real time and returns an emotion classification, speech and silence metrics, and a neurological voice screening score with detected acoustic markers. Results are saved to a per-user session history that can be exported as CSV or a branded PDF report.
How we built it
VoicePulse is a Flask web app with a vanilla HTML/CSS/JavaScript frontend. The browser captures raw PCM audio through the Web Audio API, encodes it to WAV, and sends it to the server, where each recording runs through two parallel pipelines: Google Gemini 2.0 Flash for emotion and module-specific clinical flags, and a custom NumPy DSP pipeline that computes jitter, shimmer, HNR, and tremor biomarkers. The two outputs are combined into a single result containing emotion, clinical flags, and a neurological screening score. User sessions are stored in SQLite with Firebase authentication, rate-limited per endpoint, and exportable as CSV or branded PDF reports.
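The core biomarker math in that NumPy pipeline can be sketched roughly as follows. This is a simplified illustration, not VoicePulse's actual code: it assumes per-cycle periods and peak amplitudes have already been extracted by the F0 tracker, and the function names (`jitter_local`, `shimmer_local`, `hnr_db`) are our own.

```python
import numpy as np

def jitter_local(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal cycle periods, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer_local(amplitudes):
    """Local shimmer: the same ratio, computed on per-cycle
    peak amplitudes instead of periods."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

def hnr_db(frame, period_samples):
    """Harmonics-to-noise ratio in dB, estimated from the normalized
    autocorrelation of one analysis frame at the pitch period."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:]
    # Normalize each lag by its overlap length so long lags
    # aren't penalized by the shrinking window.
    r0 = r[0] / n
    rp = r[period_samples] / (n - period_samples)
    r_norm = np.clip(rp / r0, 1e-6, 1 - 1e-6)
    return 10.0 * np.log10(r_norm / (1.0 - r_norm))
```

A perfectly periodic voice gives jitter and shimmer near zero and a high HNR; pathological tremor or breathiness pushes jitter and shimmer up and HNR down, which is what the screening score keys on.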
Challenges we ran into
The hardest part was making the neurological voice screening clinically meaningful rather than just a number. Implementing jitter, shimmer, HNR, and tremor from scratch required F0 tracking with parabolic interpolation and octave-error filtering, plus calibration against published clinical thresholds so the output actually meant something. Browser audio was another battle: MediaRecorder silently failed across different browsers and codecs, so we had to bypass it entirely and capture raw PCM from the AudioContext ourselves. We also had to gate the scoring behind data-quality checks, because Gemini will confidently analyze 0.3 seconds of silence, and a real clinical tool cannot.
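The F0-tracking step described above can be sketched like this. It is a minimal, assumed version of the idea, not our production tracker: autocorrelation with the lag search restricted to a plausible pitch range (which suppresses most octave errors), then parabolic interpolation around the integer-lag peak for sub-sample period resolution.

```python
import numpy as np

def estimate_f0(frame, sr, f0_min=75.0, f0_max=500.0):
    """Estimate fundamental frequency (Hz) of one voiced frame via
    autocorrelation plus parabolic peak interpolation (sketch)."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Search only lags inside the plausible pitch range to avoid
    # locking onto harmonics or subharmonics (octave errors).
    lag_min = int(sr / f0_max)
    lag_max = min(int(sr / f0_min), len(r) - 2)
    lag = lag_min + int(np.argmax(r[lag_min:lag_max]))
    # Fit a parabola through the peak and its neighbors; its vertex
    # gives a fractional-lag refinement of the period estimate.
    a, b, c = r[lag - 1], r[lag], r[lag + 1]
    denom = a - 2 * b + c
    shift = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    return sr / (lag + shift)
```

Without the interpolation step, period estimates are quantized to whole samples, which at typical sample rates is too coarse for jitter measurements on the order of one percent.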
Accomplishments that we're proud of
We designed six clinically meaningful modules, each with its own Gemini prompt and flag vocabulary tuned to its care setting, so a single engine powers six distinct clinical workflows. We're also proud of the UI itself, which we built in vanilla HTML, CSS, and JavaScript with a consistent dark sci-fi aesthetic, live waveform visualization, and a polished session history browser, proving we didn't need a heavy framework to ship something that feels like a real clinical product.
What we learned
We learned that clinical tools live or die on data quality, not just model quality, so the hard work is knowing when not to trust the output. We also learned how much signal is already encoded in simple acoustic math: decades of research on jitter, shimmer, and HNR mean you don't always need a deep learning model when the DSP is grounded in the right clinical thresholds.
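The "know when not to trust the output" lesson boils down to a gate like the one below. The thresholds here are illustrative assumptions, not VoicePulse's actual values, and `passes_quality_gate` is a name we made up for this sketch; the point is simply that scoring is refused before any model sees the audio.

```python
import numpy as np

# Illustrative thresholds; real values should be tuned per module.
MIN_DURATION_S = 1.0   # too short to carry stable biomarkers
MIN_RMS = 0.01         # near-silent recordings are unscoreable

def passes_quality_gate(samples, sr):
    """Return True only if a recording is long enough and loud
    enough to be worth scoring at all."""
    samples = np.asarray(samples, dtype=float)
    duration = len(samples) / sr
    rms = np.sqrt(np.mean(samples ** 2)) if len(samples) else 0.0
    return duration >= MIN_DURATION_S and rms >= MIN_RMS
```

Recordings that fail the gate get a "retry" prompt instead of a score, which is what keeps a confident-sounding model from analyzing 0.3 seconds of silence.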
What's next for VoicePulse
Next, we want to add longitudinal trend tracking, so clinicians can see how a patient's voice biomarkers evolve over weeks and months rather than just per session; neurological risk alerts that notify a designated clinician or caregiver when a user's screening score crosses a concerning threshold; and multi-language support to expand VoicePulse beyond English and reach a much wider patient population.