SickNote — Project Story

Inspiration

One of us was sick and debating whether to go to class: sick, but not sick enough to be sure. Coughs carry real diagnostic signal, so we asked: can a model learn to tell a healthy cough from an abnormal one? We weren't trying to build a COVID detector, only to answer one question: does something sound wrong?

What We Built

SickNote is a binary cough classifier. Record or upload a clip; get back HEALTHY or ABNORMAL, a confidence score, and a mel spectrogram.

Frontend: Next.js + Tailwind CSS
Backend: FastAPI — POST /api/predict
Model: Small CNN trained on log-mel spectrograms from COUGHVID

How We Built It

We split work between two partners around a single API contract:

POST /api/predict  →  { label, confidence, spectrogram }

Partner 2 (P2) wrote a mock predict() first so both of us could develop in parallel. Partner 1 (P1) later swapped in the real model body, and nothing downstream changed.
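To illustrate the idea, here is a minimal sketch of what a contract-shaped mock could look like. It is not our actual code: the function signature, the seeded randomness, and the base64 string used for the spectrogram field are all assumptions made for the example.

```python
import base64
import random

LABELS = ("HEALTHY", "ABNORMAL")


def predict(audio_bytes: bytes) -> dict:
    """Hypothetical mock: ignores the audio content and returns a
    response with the same keys as the API contract."""
    # Seed on the input length so the same upload always gets the same
    # answer, which keeps frontend testing repeatable.
    rng = random.Random(len(audio_bytes))
    label = rng.choice(LABELS)
    confidence = round(rng.uniform(0.5, 1.0), 3)
    # Placeholder payload. How the real spectrogram is encoded is an
    # assumption here (a base64 string is one common choice).
    spectrogram = base64.b64encode(b"placeholder-image").decode("ascii")
    return {"label": label, "confidence": confidence, "spectrogram": spectrogram}


response = predict(b"fake-audio-bytes")
print(sorted(response))
```

Because the mock returns exactly the keys in the contract, the frontend can be built and tested against it, and replacing the function body with real inference requires no changes on the caller's side.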

Before the model sees any audio, we convert each cough into a spectrogram — a visual heatmap showing which sound frequencies are active over time. This turns audio classification into something closer to image recognition, which CNNs are well-suited for.
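The conversion can be sketched in plain Python to show the moving parts: slice the audio into overlapping frames, take the power spectrum of each, pool the frequency bins into mel bands, and take the log. This is an illustrative sketch with a naive DFT and tiny made-up parameters (n_fft=64, hop=32, n_mels=8), not our pipeline; in practice an audio library such as librosa or torchaudio would do this far faster.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, center, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((f - lo) / (center - lo), (hi - f) / (hi - center)))
                     for f in bin_freqs])
    return bank


def log_mel_spectrogram(samples, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of time frames, each a list of log mel-band energies (dB)."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    spec = []
    for start in range(0, len(samples) - n_fft + 1, hop):
        power = power_spectrum(samples[start:start + n_fft])
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank]
        spec.append([10.0 * math.log10(e + 1e-10) for e in mel])
    return spec


# Synthetic stand-in for a cough clip: a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = log_mel_spectrogram(samples, sample_rate)
print(len(spec), "frames x", len(spec[0]), "mel bands")
```

The result is a 2D grid (time by mel band), which is what lets an image-style CNN be applied to audio.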

The model itself is a stack of convolutional layers that learn to spot patterns in those images, such as which frequency bands light up differently in an abnormal cough vs. a healthy one. Since our dataset had far more "abnormal" coughs than healthy ones, we weighted the loss so that mistakes on the rarer healthy class cost more. Otherwise the model could score well by always guessing abnormal.
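One common way to set such weights is inverse class frequency, the "balanced" heuristic: weight = total / (num_classes * class_count). The sketch below assumes that heuristic and uses made-up class counts purely for illustration; the exact weights and counts from our training run are not shown here.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Balanced heuristic: rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def weighted_cross_entropy(prob_of_true_class, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(prob_of_true_class)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
labels = ["ABNORMAL"] * 1800 + ["HEALTHY"] * 600
weights = inverse_frequency_weights(labels)

# The same confident mistake (true class given probability 0.1) costs more
# when the true class is the rare one.
loss_healthy = weighted_cross_entropy(0.1, "HEALTHY", weights)
loss_abnormal = weighted_cross_entropy(0.1, "ABNORMAL", weights)
print(weights, loss_healthy > loss_abnormal)
```

With these counts a misclassified healthy cough contributes three times the loss of an equally misclassified abnormal one, which removes the incentive to always predict the majority class.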

Challenges

Data quality. COUGHVID's status_SSL column looks like ground-truth labels but actually holds semi-supervised predictions, which are wrong ~66% of the time when checked against expert consensus. We caught this by reading the dataset paper before touching any training code, and we used only the ~2,400 rows with physician annotations instead.

Normalization leakage. Computing mean/std on the full dataset before splitting lets validation and test statistics leak into training, which inflates validation metrics. We fit normalization on the train split only and applied those same values to val and test.
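The order of operations is the whole fix: split first, fit on train, then reuse the train statistics everywhere. A minimal sketch follows, using synthetic values in place of real spectrogram features; the helper names and the 80/10/10 split are assumptions for the example.

```python
import random
import statistics


def fit_normalizer(train_values):
    """Compute mean and std from the training split only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def apply_normalizer(values, mean, std, eps=1e-8):
    """Standardize any split using statistics fitted elsewhere."""
    return [(v - mean) / (std + eps) for v in values]


# Synthetic stand-in for per-sample log-mel values.
rng = random.Random(0)
data = [rng.gauss(-40.0, 12.0) for _ in range(1000)]
rng.shuffle(data)

# 1. Split first (hypothetical 80/10/10 split).
train, val, test = data[:800], data[800:900], data[900:]

# 2. Fit on train only.
mean, std = fit_normalizer(train)

# 3. Apply the train statistics to every split.
train_n = apply_normalizer(train, mean, std)
val_n = apply_normalizer(val, mean, std)
test_n = apply_normalizer(test, mean, std)

print(round(statistics.fmean(train_n), 6), round(statistics.fmean(val_n), 3))
```

The normalized train split has mean zero by construction, while val and test generally do not. That small mismatch is expected, and it is what an honest evaluation should see.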

What We Learned

Small datasets (~2,400 samples) are viable for binary classification, but only if you're careful: lightweight architecture, class weighting, early stopping, and zero data leakage.
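As an illustration of the early-stopping idea, here is a small patience-based sketch: stop once validation loss has failed to improve for a set number of epochs, and keep the best epoch. The function name, the patience value, and the loss curve are made up for the example and are not taken from our training logs.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stopped_at) for a sequence of validation losses.

    Training stops after `patience` consecutive epochs without a new best loss.
    """
    best_loss = float("inf")
    best_epoch = None
    stopped_at = None
    waited = 0
    for epoch, loss in enumerate(val_losses):
        stopped_at = epoch
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, stopped_at


# Hypothetical validation-loss curve: improves, then starts to overfit.
val_losses = [0.70, 0.61, 0.58, 0.59, 0.60, 0.62, 0.50]
best_epoch, stopped_at = early_stopping_epoch(val_losses, patience=3)
print(best_epoch, stopped_at)
```

In this example training halts at epoch 5 and the weights from epoch 2 would be kept; the later dip at epoch 6 is never reached, which is the trade-off a patience setting makes.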