SickNote — Project Story
Inspiration
One of us was sick and debating whether to go to class — not sick enough to be sure. Coughs carry real diagnostic signal, so we asked: can a model learn to tell a healthy cough from an abnormal one? We weren't trying to build a COVID detector. Just answer: does something sound wrong?
What We Built
SickNote is a binary cough classifier. Record or upload a clip; get back HEALTHY or ABNORMAL, a confidence score, and a mel spectrogram.
- Frontend: Next.js + Tailwind CSS
- Backend: FastAPI, exposing POST /api/predict
- Model: Small CNN trained on log-mel spectrograms from COUGHVID
How We Built It
We split work between two partners around a single API contract:
POST /api/predict → { label, confidence, spectrogram }
P2 wrote a mock predict() first so both partners could develop in parallel. P1 swapped in the real model body later — nothing downstream changed.
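A minimal sketch of what such a mock might look like, assuming only the three response fields from the contract above. The field types, the fixed return values, and the idea of sending the spectrogram as a base64 string are illustrative assumptions, not the project's actual code.

```python
from typing import TypedDict


class Prediction(TypedDict):
    label: str         # "HEALTHY" or "ABNORMAL"
    confidence: float  # assumed to lie in [0, 1]
    spectrogram: str   # assumed: base64-encoded image of the mel spectrogram


def predict(audio_bytes: bytes) -> Prediction:
    """Mock model: ignores the audio and returns a fixed, well-formed response."""
    return {"label": "HEALTHY", "confidence": 0.5, "spectrogram": ""}
```

Because the frontend depends only on the shape of the returned dictionary, replacing the function body with real inference leaves every caller untouched.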
Before the model sees any audio, we convert each cough into a spectrogram — a visual heatmap showing which sound frequencies are active over time. This turns audio classification into something closer to image recognition, which CNNs are well-suited for.
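One way to picture the conversion described above is the dependency-free sketch below. It assumes a Hann window, the common 2595 * log10(1 + f / 700) mel formula, and toy sizes (n_fft=64, hop=32, n_mels=8) chosen so the naive DFT stays fast. None of these values come from the project, and a real pipeline would most likely use an FFT-based audio library instead.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sr / 2)
    hz_pts = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = hz_pts[m - 1], hz_pts[m], hz_pts[m + 1]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank


def log_mel_spectrogram(signal, sr, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels values in dB."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    bank = mel_filterbank(n_mels, n_fft, sr)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        # Naive DFT power spectrum over the non-negative frequency bins.
        power = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft))) ** 2
            for k in range(n_fft // 2 + 1)
        ]
        # Pool the power into mel bands, then compress with a log (dB) scale.
        frames.append([
            10.0 * math.log10(sum(w * p for w, p in zip(row, power)) + 1e-10)
            for row in bank
        ])
    return frames


# Toy input: a pure 1 kHz tone stands in for a recorded cough.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(256)]
spec = log_mel_spectrogram(tone, sr)
print(len(spec), len(spec[0]))  # 7 frames x 8 mel bands
```

The resulting time-by-frequency grid is what gets treated as an image: each row is a slice of time and each column a mel band.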
The model itself is a series of layers that learn to spot patterns in those images — things like which frequency bands light up differently in a sick cough vs. a healthy one. Since our dataset had far more "abnormal" coughs than healthy ones, we told the model to penalize mistakes on the rarer healthy class more heavily, so it didn't just learn to always guess abnormal.
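The class weighting described above can be sketched as follows. The write-up does not say which weighting scheme was used, so this assumes the common inverse-frequency ("balanced") heuristic, and the label counts in the example are made up.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: total / (n_classes * count_of_class)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def weighted_nll(p_abnormal, label, weights):
    """Weighted negative log-likelihood for a single example.

    p_abnormal is the model's predicted probability of ABNORMAL.
    """
    p_true = p_abnormal if label == "ABNORMAL" else 1.0 - p_abnormal
    return -weights[label] * math.log(max(p_true, 1e-12))


# Toy imbalance: three ABNORMAL clips for every HEALTHY one.
weights = balanced_class_weights(["ABNORMAL"] * 3 + ["HEALTHY"])
print(weights["HEALTHY"])  # 2.0
```

With these weights, an equally confident mistake costs three times as much on a HEALTHY clip as on an ABNORMAL one, which removes the incentive to always guess the majority class.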
Challenges
Data quality. COUGHVID's status_SSL column looks like ground-truth labels, but it holds semi-supervised predictions that are wrong ~66% of the time against expert consensus. We caught this by reading the dataset paper before touching any training code, and we trained only on the ~2,400 rows with physician annotations.
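Restricting training to physician-annotated rows could look roughly like the sketch below. The column names in EXPERT_COLUMNS are an assumption about how the metadata file is laid out and should be checked against the actual CSV; the three sample rows are toy data, not real COUGHVID records.

```python
import csv
import io

# Assumed names of the physician annotation columns; verify against the real file.
EXPERT_COLUMNS = ["diagnosis_1", "diagnosis_2", "diagnosis_3", "diagnosis_4"]


def expert_labeled_rows(csv_file, expert_columns=EXPERT_COLUMNS):
    """Yield only rows in which at least one expert column is filled in."""
    for row in csv.DictReader(csv_file):
        if any((row.get(col) or "").strip() for col in expert_columns):
            yield row


# Toy metadata: only row "b" carries a physician annotation.
sample = io.StringIO(
    "uuid,status_SSL,diagnosis_1\n"
    "a,COVID-19,\n"
    "b,healthy,healthy_cough\n"
    "c,symptomatic,\n"
)
kept = [row["uuid"] for row in expert_labeled_rows(sample)]
print(kept)  # ['b']
```

Note that status_SSL is never consulted: a row is kept or dropped purely on whether a physician labeled it.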
Normalization leakage. Computing mean/std on the full dataset before splitting inflates validation metrics. We fit normalization on the train split only and applied those values to val and test.
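A minimal sketch of the leakage-free pattern, assuming each split has been flattened to a plain list of spectrogram values. The function names and the toy numbers are hypothetical.

```python
import statistics


def fit_normalizer(train_values):
    """Compute mean and std from the training split only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values) or 1.0  # guard against zero variance
    return mean, std


def apply_normalizer(values, mean, std):
    """Normalize any split with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


train, val = [0.0, 2.0, 4.0], [2.0, 10.0]

mean, std = fit_normalizer(train)               # val and test never touch this step
train_norm = apply_normalizer(train, mean, std)
val_norm = apply_normalizer(val, mean, std)     # reuses the train statistics
```

The validation value of 10.0 lies far outside the training range and stays that way after normalization, which is the point: held-out data should look as unfamiliar to the model as real new recordings would.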
What We Learned
Small datasets (~2,400 samples) are viable for binary classification, but only if you're careful: lightweight architecture, class weighting, early stopping, and zero data leakage.
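Of the safeguards listed above, early stopping is the easiest to show in isolation. This is a generic patience-based sketch; the patience value and the validation losses are invented for illustration and do not reflect the project's training runs.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.7]):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)  # 3
```

In practice the model weights from the best epoch (here, the one with loss 0.8) would be saved and restored after stopping.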
What's Next
- External validation across different devices
- SpecAugment to squeeze more out of the small dataset
- Confidence calibration
Disclaimer: SickNote is a screening tool only — not a diagnostic device, not a COVID detector, not a replacement for a doctor.