ParkinSense

Upload and Analyze audio
Playback and Edit Audio
Parkinson Prediction
Mel-Spectrogram
graphs

Inspiration

Parkinson’s disease reaches into every corner of a person’s life. It often begins quietly, with a slight tremor in the hand or a soft change in speech that others barely notice. Over time, those small signs can grow into stiffness, imbalance, and fatigue that make even the simplest tasks feel exhausting. It affects not only the body but also the mind and spirit.

When Parkinson’s is recognized early, there is real hope. Early diagnosis allows doctors to start treatments that slow its progress and ease its symptoms before they grow worse. Medication, exercise, and therapy can help people stay active, independent, and connected to what they love.

Catching Parkinson’s early can protect precious time. Time to move with freedom, to speak with clarity, and to continue living with purpose. Awareness and early action give people the chance to hold on to more of their life, their identity, and their strength.

What it does

ParkinSense records a steady “aaahh,” extracts two 768-dimensional self-supervised speech embeddings (one from Wav2Vec2-Base, one from WavLM-Base-Plus), stitches them into a 1,536-feature fingerprint, and feeds that into a calibrated XGBoost model. The app returns a Parkinson’s probability, a confidence band, and a clearly annotated mel-spectrogram so the result feels real, not abstract.
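
A minimal sketch of the fusion step described above, using only the Python standard library. The description does not say how frame-level model outputs are reduced to one vector per recording, so mean pooling is an assumption here, and the function names and toy inputs are hypothetical stand-ins for real model outputs.

```python
from statistics import fmean

EMBED_DIM = 768  # per-model embedding size stated in the description


def mean_pool(frames):
    """Average frame-level vectors (one list per frame) into a single vector.

    Mean pooling is an assumption; the project may pool differently.
    """
    if not frames:
        raise ValueError("need at least one frame")
    return [fmean(column) for column in zip(*frames)]


def fuse(wav2vec2_frames, wavlm_frames):
    """Pool each model's frames, then concatenate the two pooled vectors."""
    a = mean_pool(wav2vec2_frames)
    b = mean_pool(wavlm_frames)
    if len(a) != EMBED_DIM or len(b) != EMBED_DIM:
        raise ValueError("each model should yield 768-dimensional frames")
    return a + b  # list concatenation: 768 + 768 features


# Toy stand-ins for real model outputs: constant-valued frames.
fake_wav2vec2 = [[0.0] * EMBED_DIM, [1.0] * EMBED_DIM, [2.0] * EMBED_DIM]
fake_wavlm = [[4.0] * EMBED_DIM for _ in range(5)]

fingerprint = fuse(fake_wav2vec2, fake_wavlm)
print(len(fingerprint))
```

The two models can emit different numbers of frames for the same clip; pooling each one first is what makes the concatenated vector a fixed length.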

How we built it

Verified every Figshare Parkinson/healthy vowel asset (audio + demographics) by checksum.
Applied a clinical hygiene chain: silence trim, spectral denoise, loudness normalization, and deterministic clip/pad. Then generated deterministic augmentations (noise, gain, pitch, shift, soft clipping) to mimic real microphones.
Extracted SSL embeddings, fused them with MFCC/spectral and Praat perturbation features, standardized the vectors and reduced them with PCA, and trained XGBoost with Platt scaling under stratified group K‑fold CV plus a 10% grouped holdout to eliminate leakage from duplicate utterances.
Logged fold metrics, ROC/PR, calibration curves, demographics, feature importance, and embedding projections into FastAPI-served artifacts for the dashboard.
Built a Next.js + ShadCN frontend that records 16 kHz WAV in-browser, shows waveform quality tips, calls the FastAPI endpoints, and renders the insights with exportable PDF reports so both clinicians and patients can trust the context.
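
Platt scaling, mentioned in the steps above, maps a raw classifier score to a probability through a fitted sigmoid. The sketch below is a plain logistic fit by gradient descent on toy scores, not the project's code; Platt's original method also smooths the targets, and in practice scikit-learn's `CalibratedClassifierCV(method="sigmoid")` would be the likelier tool. All names and numbers are illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on mean log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy held-out scores and labels (made up for illustration, not project data).
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
p_low = sigmoid(a * -2.0 + b)
p_high = sigmoid(a * 2.0 + b)
print(p_low < p_high)
```

The calibration parameters should be fit on data the classifier did not train on, which is one reason calibration and grouped cross-validation go together.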

How we built it

Pulled the Parkinson vs healthy vowel dataset plus demographics from Figshare and checked every checksum.
Scrubbed the audio like a clinic visit: silence trim, spectral noise reduction, loudness normalization, duration control, then hammered it with noise, gain, pitch, shift, and gentle clipping augmentations.
Extracted both SSL embeddings, concatenated them, standardized the vectors, trained XGBoost with Platt scaling, and enforced stratified group folds plus an 80/20 grouped holdout so no duplicated utterance could leak into validation.
Logged cross-validation, holdout metrics, ROC, PR, calibration, demographics, and feature importances into artifacts served by FastAPI.
Built a Next.js + ShadCN interface that records true WAV audio in the browser, sends it to the API, and renders everything with exportable reports that still feel human.
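
The grouped holdout above can be sketched as follows, assuming every augmented clip carries the ID of the original recording it came from (the `group` key is hypothetical). This version is not stratified by label; scikit-learn's `StratifiedGroupKFold` covers that case for the cross-validation folds.

```python
import random


def grouped_holdout(samples, holdout_fraction, seed=0):
    """Split samples so that every group lands entirely on one side.

    Each sample is a dict with a "group" key: the ID of the source recording
    (hypothetical field name). Whole groups are assigned, never single clips.
    """
    groups = sorted({s["group"] for s in samples})
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(groups)
    n_holdout = max(1, round(len(groups) * holdout_fraction))
    holdout_groups = set(groups[:n_holdout])
    train = [s for s in samples if s["group"] not in holdout_groups]
    holdout = [s for s in samples if s["group"] in holdout_groups]
    return train, holdout


# Toy data: 10 source clips, each with a clean copy and two augmentations.
samples = [
    {"group": f"clip{i:02d}", "variant": v}
    for i in range(10)
    for v in ("clean", "noise", "pitch")
]

train, holdout = grouped_holdout(samples, holdout_fraction=0.2, seed=0)
overlap = {s["group"] for s in train} & {s["group"] for s in holdout}
print(len(train), len(holdout), overlap)
```

Splitting on clips instead of groups would let a noisy copy of a recording sit in training while its clean copy sits in the holdout, which is the leakage the grouped split avoids.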

Challenges we ran into

The dataset was messy and unforgiving, so getting silence trimming and noise reduction right without killing tremor cues took a lot of trial and error.
Ensuring grouped splits meant rewriting the pipeline so augmented versions of the same clip never leaked; fake accuracy numbers were unacceptable.
Browsers love giving you random codecs, so forcing consistent 16 kHz WAV capture required custom audio plumbing on both client and server.
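
One way the server side of that audio plumbing might guard against stray codecs is to inspect the upload before it reaches the model. The sketch below uses Python's standard `wave` module; the mono and 16-bit requirements and all function names are assumptions, since only the 16 kHz WAV target is stated above.

```python
import io
import wave

TARGET_RATE = 16_000  # Hz, the capture rate named above


def describe_wav(data: bytes):
    """Read header fields from raw bytes; raises wave.Error if not PCM WAV."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "seconds": wav.getnframes() / wav.getframerate(),
        }


def is_acceptable(data: bytes) -> bool:
    """Accept only 16 kHz mono 16-bit WAV (mono/16-bit are assumptions)."""
    try:
        info = describe_wav(data)
    except (wave.Error, EOFError):
        return False  # not a WAV container at all, e.g. Ogg or WebM bytes
    return (
        info["sample_rate"] == TARGET_RATE
        and info["channels"] == 1
        and info["sample_width"] == 2
    )


def make_silent_wav(rate, seconds=1.0):
    """Build an in-memory silent WAV, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


print(is_acceptable(make_silent_wav(16_000)))
print(is_acceptable(make_silent_wav(44_100)))
```

A rejected upload could then be resampled or re-encoded on the server instead of being refused outright; which of the two the project does is not stated.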

Accomplishments that we're proud of

Metrics: holdout accuracy of about 72 percent and AUROC of about 0.84 after fixing leakage, so the reported probabilities come from an honest, leak-free evaluation.
A dashboard that protects the original design but adds everything patients and researchers need, including detailed diagnostics and PDF exports.
One command now rehydrates data, retrains, refreshes artifacts, and updates the live API so iteration is painless.
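
For readers less familiar with the two holdout metrics quoted above, here is a small sketch of how each is computed. AUROC is the chance that a randomly chosen positive scores higher than a randomly chosen negative; accuracy depends on a decision threshold (0.5 is assumed here). The labels and scores are toy values and have nothing to do with the project's results.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy example only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

toy_auroc = auroc(toy_labels, toy_scores)
toy_accuracy = accuracy(toy_labels, toy_scores)
print(toy_auroc, toy_accuracy)
```

The pairwise loop is quadratic and only suitable for small sets; scikit-learn's `roc_auc_score` is the practical choice for real evaluation.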

What we learned

XGBoost's metrics are meaningless if you do not respect group boundaries; once I did, the numbers finally dropped to realistic levels.

What's next for ParkinSense

I'm going to share it on Reddit and in Parkinson’s support communities so people can try it. I also plan to add quality checks so people know instantly if their recording is too noisy or too short, and to offer opt-in data donations so the model keeps improving without exploiting anyone.
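
The planned quality checks might look something like the sketch below. Everything in it is hypothetical: the thresholds, the function names, and the crude noise estimate, which compares the loudest frames with the quietest ones and therefore assumes the recording includes a short stretch of background before the vowel starts.

```python
import math
import random


def frame_rms(samples, frame_len):
    """RMS level of each non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def check_recording(samples, sample_rate=16_000, min_seconds=2.0, min_snr_db=15.0):
    """Return a list of problems; an empty list means the clip passes."""
    problems = []
    if len(samples) / sample_rate < min_seconds:
        problems.append("too short")
    levels = sorted(frame_rms(samples, frame_len=sample_rate // 40))  # 25 ms frames
    if levels:
        k = max(1, len(levels) // 10)
        noise = sum(levels[:k]) / k    # quietest 10% of frames
        signal = sum(levels[-k:]) / k  # loudest 10% of frames
        snr_db = 20 * math.log10((signal + 1e-12) / (noise + 1e-12))
        if snr_db < min_snr_db:
            problems.append("too noisy")
    return problems


# Synthetic test signals (not real speech): a 150 Hz tone plus uniform hiss.
rng = random.Random(0)


def tone(n, rate=16_000, freq=150.0, amp=0.5):
    return [amp * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


def hiss(n, amp):
    return [rng.uniform(-amp, amp) for _ in range(n)]


clean = hiss(8_000, 0.001) + tone(40_000)  # quiet lead-in, then the tone
noisy = [t + h for t, h in zip([0.0] * 8_000 + tone(40_000), hiss(48_000, 0.3))]
short = tone(16_000)                       # one second only

print(check_recording(clean), check_recording(noisy), check_recording(short))
```

A clip with no quiet lead-in would be flagged as noisy by this estimate, so a real check would need a better noise measure or a recording flow that captures a moment of room tone first.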

Built With

audiomentations
cloudflare
fastapi
hugging-face-ssl-models
joblib
next.js
pydub
python
react
scikit-learn
seaborn
shadcn
smote
tailwindcss
typescript
vercel
xgboost

Updates

Sebastian Arellano-Rubach started this project — Nov 09, 2025 03:32 AM EST
