Inspiration

**Who it's for & why it matters.** Meet Sam, 16: six weeks of irritability, skipping school, pulling away from friends. After a depression screening, Sam's parent is handed a page of brainwave power bands, cortisol levels, and terms like *"frontal alpha asymmetry."* The information that could help is right there in their hands, but it's written for specialists. The parent leaves confused, and the single most important thing, the next step, often never happens. MindBridge takes that one workup and produces two views of the same analysis (a guideline-cited evidence summary for the clinician, and a plain-language explainer for the family), moving a scared parent from confusion → clarity → action.

**Why this needs AI, not a checklist, not a single lab test.** This is the heart of MindBridge, and it's the question we most want to answer. Depression has no validated single biomarker: there is no one number you can threshold. The signal is distributed and faint. It lives in the combination of dozens of EEG band-power measures, the shape of the cortisol response, and behavioral context, each weak and noisy on its own. The natural objection is "so use a weighted checklist." But a checklist's weights are guessed by a committee, and its output is a bucket (low/medium/high). The reason this needs machine learning is narrower and sharper: the relative weight of each noisy, correlated feature has to be learned from real labeled cases, and the result has to be a probability carrying its own uncertainty, not a yes/no. No hand-set formula or single-biomarker cutoff can do that. We deliberately use a simple, auditable logistic-regression model rather than a black-box neural network; the "AI" here is in the learned fit to data, not in complexity. Then a second, different AI problem appears: turning *this* patient's specific numbers into language a frightened parent can act on. The space of possible cases is combinatorial, so a static pamphlet or template can't cover it. Only generative NLP, with the clinical claims supplied by retrieval, can explain *this* result for *this* family.
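To make "a probability carrying its own uncertainty" concrete, here is a minimal sketch of how a fitted logistic-regression model turns standardized features into a score plus an uncertainty flag. The feature names, weights, bias, and the width of the uncertainty band are all hypothetical placeholders chosen for illustration; they are not MindBridge's trained coefficients or its actual flagging rule.

```python
import math

# Hypothetical weights for standardized features. In a real model these
# are learned from labeled cases, not set by hand as they are here.
WEIGHTS = {"frontal_alpha_asymmetry": 0.9, "theta_power": 0.6, "beta_power": -0.4}
BIAS = -0.2
UNCERTAIN_BAND = 0.15  # assumed rule: flag scores within 0.15 of 0.5


def score(features: dict[str, float]) -> tuple[float, bool]:
    """Return (probability, uncertain) for one set of standardized features."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    probability = 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) link
    uncertain = abs(probability - 0.5) < UNCERTAIN_BAND
    return probability, uncertain


probability, uncertain = score(
    {"frontal_alpha_asymmetry": 1.2, "theta_power": 0.5, "beta_power": 0.3}
)
print(f"probability={probability:.2f} uncertain={uncertain}")
```

The output is a number between 0 and 1 rather than a bucket, and a score near 0.5 is surfaced as uncertain instead of being rounded to a yes or a no.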

**How the AI works (input → AI → output).** Three AI capabilities, one flow. (1) Classification: a logistic-regression classifier, with weights learned from labeled EEG data and validated leave-one-subject-out on the public Mumtaz dataset (AUC ≈ 0.95, matching published benchmarks on this small dataset; not a clinical result), scores the brainwave pattern into a probability with an uncertainty flag. The app then places the cortisol panel and behavioral note alongside that score as context the clinician weighs; they're shown, not folded into the model by guessed weights. (2) Retrieval (RAG): TF-IDF matches the case to a curated, cited clinical-guideline corpus and a public support-resource directory; when a clinician pastes a synthetic physician note, a second document-level retrieval grounds the output in that note and translates its jargon for the family. (3) Generative AI: an LLM, restricted to the retrieved guideline spans, writes the prose; it never originates a medical claim. Every clinician point cites a guideline; the family text is written at a 6th-grade reading level with a patient-specific next-step checklist and linked resources.
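The retrieval step can be illustrated with a small TF-IDF ranker. This is a sketch using only the Python standard library so it runs anywhere; the project itself lists scikit-learn for retrieval, and its actual corpus, tokenization, and weighting are not shown in this write-up. The corpus entries and document ids below are invented placeholders, not real guideline text.

```python
import math
import re
from collections import Counter

# Placeholder snippets standing in for a curated, cited corpus.
CORPUS = {
    "guideline-followup": "schedule a follow-up visit after a positive depression screening",
    "guideline-sleep": "ask about sleep changes and daily mood when assessing adolescents",
    "resource-crisis": "crisis line contact information for families",
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: dict[str, str]):
    """Build one sparse TF-IDF vector per document, plus the IDF table."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    # Smoothed IDF so that no term gets a zero or negative weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for doc_id, toks in tokens.items()
    }
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k document ids ranked by cosine similarity; drop non-matches."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, vectors[doc_id]), doc_id) for doc_id in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for similarity, doc_id in scored[:k] if similarity > 0]


print(retrieve("sleep and mood changes", CORPUS))
```

Returning an empty list when nothing matches is one simple way to support the "restricted to the retrieved spans" rule: if retrieval finds no source, the generator has nothing to cite and should not write a clinical claim.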

**The concrete before → after.** Before: Sam's parent holds a page of numbers, understands none of it, and the follow-up slips. After: one screen says, in plain words: this is one clue, not a diagnosis; here's what "frontal alpha asymmetry" means; book the follow-up this week, write down the sleep and mood changes to bring, and save 988; and a low score does **not** mean all-clear. The action that used to evaporate now has a checklist behind it.

**How we built it.** A Streamlit app. The classifier runs locally (always live, no API key), so the core AI capability is always demonstrable; the written output uses a generative model, with a cached demo mode so the built-in patients work with zero setup. We reused the trained EEG model and synthetic-patient generator from our companion research project.
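One way the cached demo mode could be wired is sketched below. Every name here (`DEMO_CACHE`, `family_text`, the `generate` callable, the `sam` patient id) is hypothetical and the cached string is a placeholder; the write-up does not describe how the app actually stores or selects cached output.

```python
from typing import Callable, Optional

# Hypothetical pre-generated text for the built-in demo patients.
DEMO_CACHE = {
    "sam": "This result is one clue, not a diagnosis. Book the follow-up this week.",
}


def family_text(
    patient_id: str,
    prompt: str,
    generate: Optional[Callable[[str], str]] = None,
    api_key: Optional[str] = None,
) -> str:
    """Use live generation when a key is configured, else cached demo text."""
    if api_key and generate is not None:
        return generate(prompt)
    if patient_id in DEMO_CACHE:
        return DEMO_CACHE[patient_id]
    raise LookupError(f"no cached output for {patient_id!r} and no API key set")


# With no key configured, a built-in patient still gets an explainer.
print(family_text("sam", prompt="explain this result for the family"))
```

Raising an error for an uncached patient, rather than returning invented text, keeps the demo path from ever producing an explainer that no model or reviewer wrote.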

**Challenges.** The honest tension: a model that scores well on one small dataset is not a clinical test. We designed the whole product around that (the AI is a clue, never a verdict), including the counter-intuitive but vital rule that a low signal does not rule out depression, and an explicit limitation: the model is trained on adult data and is not validated for adolescents.

**Accomplishments we're proud of.** Real ML rigor (leave-one-subject-out validation, a probability with its uncertainty, out-of-distribution flags) paired with genuine accessibility, and guardrails that hold even if the model is wrong (988 always shown, uncertainty always visible, clinician always decides).
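Leave-one-subject-out validation matters for EEG because one person usually contributes many recordings or epochs; if those are split at random, the model is tested on people it has already seen. The sketch below shows only the splitting logic, in plain Python with made-up subject ids. scikit-learn's `LeaveOneGroupOut` provides the same kind of split; which utility the project used is not stated here.

```python
from typing import Iterator


def leave_one_subject_out(
    subject_ids: list[str],
) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test


# Made-up example: six epochs recorded from three subjects.
subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
for held_out, train, test in folds:
    print(held_out, "train:", train, "test:", test)
```

Each fold trains on every subject except one and tests on that subject alone, so no fold ever sees the same person in both training and test data.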

**What we learned.** The most valuable AI here isn't the most confident answer. It's the clearest one, delivered with the human firmly in control, and honest about what it doesn't know.

**What's next.** Real PHQ-A intake, local resources by ZIP (211 API), multi-language family output, and a clinician export that prints the plain-language explainer as a take-home sheet.

Built with: Python · Streamlit · scikit-learn (trained classifier + RAG retrieval) · generative LLM API · Claude Code (AI coding assistance, disclosed)
