Symbio

Inspiration

You know the kind of pain that’s not "go to the ER" serious, but also not nothing? A weird rash that's probably fine. A headache that won't go away. A prescription you're not sure is safe to mix with your other meds. So you do what everyone else does: you ask ChatGPT or Claude.

LLMs are incredible at generating medical-sounding explanations. But they hallucinate, justify themselves confidently, and over-dramatize, because that's how medical literature is written. Clinical papers are written to list catastrophic edge cases, so a model trained on them does the same.

And most importantly, they don't take responsibility for action. They say, "consult your doctor." That's autocomplete, not care.

We wanted to build the thing that should already exist. An AI that can look at your symptoms (described, photographed, or spoken aloud), reason about them the way a clinician would, and tell you what to actually do. Not a chatbot that hedges or hallucinates. An agent that builds a real plan: triage level, possible conditions, recommended actions, drug safety checks, then helps you execute it.

What We Built

Symbio is a multi-turn conversational healthcare agent. You talk to it like you'd talk to a nurse practitioner (describe what's wrong, upload a photo, or just speak) and it runs a full clinical pipeline behind the scenes to give you an actionable care plan.

Under the hood, it's five systems working together:

1. Multimodal Intake. The agent accepts text, voice (via Whisper), and images. Images are classified first — is this a skin photo? A prescription bottle? An X-ray? An insurance card? — then routed to specialized analyzers. A photo of a rash gets feature extraction (color, texture, distribution, morphology). A prescription gets OCR and drug identification. Everything unifies into a structured patient event.
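The classify-then-route step can be sketched as a small dispatch table. This is only an illustration: the label strings, the placeholder analyzers, and the `route_image` and `classify` names are assumptions, and the real analyzers would call vision and OCR models rather than return fixed dictionaries.

```python
from typing import Callable


# Placeholder analyzers (hypothetical): real ones would call vision/OCR models.
def analyze_skin(image: bytes) -> dict:
    return {"kind": "skin", "features": ["color", "texture", "distribution", "morphology"]}


def analyze_prescription(image: bytes) -> dict:
    return {"kind": "prescription", "steps": ["ocr", "drug_identification"]}


# Maps a classifier label to the analyzer that handles it.
ANALYZERS: dict[str, Callable[[bytes], dict]] = {
    "skin_photo": analyze_skin,
    "prescription": analyze_prescription,
}


def route_image(image: bytes, classify: Callable[[bytes], str]) -> dict:
    """Classify the image first, then hand it to the matching analyzer."""
    label = classify(image)
    analyzer = ANALYZERS.get(label)
    if analyzer is None:
        # Unknown image types are reported rather than guessed at.
        return {"kind": "unsupported", "label": label}
    return {"label": label, **analyzer(image)}
```

Passing the classifier in as a function keeps the router testable without a model behind it.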

2. Mixture of Experts Planner. When generating a care plan, we fan out to four models in parallel:

Expert	Where it runs	What it brings
Claude Sonnet	Anthropic API	Calibrated triage, structured reasoning
GPT-4o	OpenAI API	Different model family, catches blind spots
OpenBioLLM-8B	Modal GPU (A10G)	PubMed + clinical trials knowledge
BioMistral-7B	Modal GPU (A10G)	PubMed Central deep literature knowledge

A judge model (Claude) then synthesizes the four opinions into a single plan. The synthesis isn't naive — it uses weighted consensus:

\[ \text{triage}_{\text{final}} = \text{consensus}(\text{structured experts}) \oplus \text{escalate\_only\_if}(\text{specialist explicitly recommends ER}) \]

The large models set the triage baseline. The specialists contribute additional differentials and clinical nuances. But a small model merely mentioning stroke in a differential list doesn't override two large models saying "this is self-care."
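As a rough illustration of that rule, here is a minimal deterministic sketch. The real judge is an LLM, so this only mirrors the intended logic. `ExpertOpinion`, `final_triage`, the triage level names, and the tie-break toward the more urgent level are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical triage levels, ordered from least to most urgent.
LEVELS = ["self_care", "primary_care", "urgent_care", "emergency"]


@dataclass
class ExpertOpinion:
    name: str
    structured: bool                    # True for the large API models
    triage: str = "self_care"           # one of LEVELS
    recommends_emergency: bool = False  # explicit "go to the ER", not a mention


def final_triage(opinions: list[ExpertOpinion]) -> str:
    """Baseline from structured experts; escalate only on an explicit ER call."""
    structured = [o for o in opinions if o.structured]
    if not structured:
        raise ValueError("need at least one structured expert")
    counts = Counter(o.triage for o in structured)
    top = max(counts.values())
    # Assumed tie-break: among equally common levels, take the more urgent one.
    baseline = max((lvl for lvl, n in counts.items() if n == top), key=LEVELS.index)
    # A specialist listing a scary differential does not set this flag.
    if any(o.recommends_emergency for o in opinions if not o.structured):
        return "emergency"
    return baseline
```

With two structured experts at `self_care` and a specialist that only lists stroke in its differential, the result stays `self_care`. It becomes `emergency` only when a specialist sets `recommends_emergency`.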

3. Clinical Knowledge Graph. A SNOMED CT-inspired graph, combined with ontologies like RxNorm and LOINC, encoding symptoms, conditions, medications, and their relationships: HAS_FINDING, RED_FLAG_FOR, CONTRAINDICATED_WITH, INTERACTS_WITH. This is the deterministic backbone. If you're on warfarin and ask about ibuprofen, the graph catches the bleed-risk interaction. No LLM hallucination can bypass it.
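To make the warfarin example concrete, here is a small sketch of a typed-edge graph with an interaction lookup. The `ClinicalGraph` class and `find_interactions` helper are hypothetical, and the real graph presumably holds far more nodes and relation types than this in-memory dictionary.

```python
from collections import defaultdict


class ClinicalGraph:
    """Tiny in-memory graph keyed by (source, relation)."""

    def __init__(self) -> None:
        self._edges: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, source: str, relation: str, target: str, symmetric: bool = False) -> None:
        self._edges[(source, relation)].add(target)
        if symmetric:
            self._edges[(target, relation)].add(source)

    def related(self, source: str, relation: str) -> set[str]:
        # .get avoids creating empty entries on lookup.
        return set(self._edges.get((source, relation), set()))


def find_interactions(graph: ClinicalGraph, current_meds: list[str], candidate: str) -> list[str]:
    """Return the current medications that interact with the candidate drug."""
    return sorted(
        med for med in current_meds
        if candidate in graph.related(med, "INTERACTS_WITH")
    )


graph = ClinicalGraph()
graph.add("warfarin", "INTERACTS_WITH", "ibuprofen", symmetric=True)
conflicts = find_interactions(graph, ["warfarin", "lisinopril"], "ibuprofen")
```

Because the lookup is a plain set membership test, the answer for a given pair of drugs is the same on every call.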

4. Constraint Engine. A symbolic safety layer that validates every plan before it reaches the patient. It checks hard-coded emergency rules (chest pain + shortness of breath → call 911), blocks dangerous medication recommendations, and enforces scope-of-practice (the agent says "possible conditions," never "you have"). This runs after the AI and before the response — a deterministic safety net over a probabilistic system.
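A minimal sketch of the three checks described above, under stated assumptions: the `Plan` fields, the `validate_plan` signature, and the regex used to spot definitive-diagnosis wording are all hypothetical. Only the chest pain plus shortness of breath rule comes from the writeup.

```python
import re
from dataclasses import dataclass, field

# Each rule is a set of symptoms that, when all present, forces emergency triage.
EMERGENCY_RULES = [frozenset({"chest pain", "shortness of breath"})]

# Assumed scope-of-practice check: flag wording that asserts a diagnosis.
DEFINITIVE_DIAGNOSIS = re.compile(r"\byou have\b", re.IGNORECASE)


@dataclass
class Plan:
    triage: str
    symptoms: set
    medications: list
    message: str
    violations: list = field(default_factory=list)


def validate_plan(plan: Plan, blocked_meds: set) -> Plan:
    """Apply hard rules after the models have produced a plan."""
    # 1. Emergency rules override whatever triage the models chose.
    if any(rule <= plan.symptoms for rule in EMERGENCY_RULES):
        plan.triage = "emergency"
    # 2. Remove medications flagged as unsafe for this patient.
    blocked = [m for m in plan.medications if m in blocked_meds]
    if blocked:
        plan.medications = [m for m in plan.medications if m not in blocked_meds]
        plan.violations.append(f"blocked medications: {', '.join(blocked)}")
    # 3. Record definitive-diagnosis language so the message can be rewritten.
    if DEFINITIVE_DIAGNOSIS.search(plan.message):
        plan.violations.append("definitive diagnosis language")
    return plan
```

In this sketch `blocked_meds` would come from the knowledge graph lookup, so the constraint engine itself stays free of any model call.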

5. Conversational Agent. The agent orchestrates everything through Claude's tool-use API. It decides when to run intake, plan, validate, check drugs, or escalate — chaining up to six tool calls per message. Session state accumulates patient context across turns (conditions, medications, allergies, demographics), so the agent gets more informed as the conversation continues.

How We Built It

The backend is FastAPI (Python), chosen for async support — essential when calling four models simultaneously with asyncio.gather. The frontend is Next.js with Tailwind CSS. Voice input uses OpenAI Whisper for speech-to-text, and responses can be read aloud via OpenAI TTS. The biomedical models run on Modal for efficient GPU inference.

The agent loop works like this: Claude receives the conversation history plus tool definitions and decides which tool to call. We execute that tool and feed the result back, and Claude decides the next step, looping until it's ready to respond to the patient.

Patient message
     ↓
 [Agent Loop]  →  process_intake  →  generate_plan  →  validate_plan  →  Response
      ↑                                                                      |
      └──────────────── tool results fed back to Claude ────────────────────┘
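The loop in the diagram can be sketched without any vendor SDK. Here `call_model` is a hypothetical stand-in for the Claude tool-use call, and the step dictionaries (`"tool_call"` and `"final"`) are an invented shape, not the actual API response format. The cap of six tool calls matches the limit mentioned earlier.

```python
from typing import Any, Callable

MAX_TOOL_CALLS = 6  # the agent chains up to six tool calls per message


def run_agent_turn(
    messages: list[dict],
    call_model: Callable[[list[dict]], dict],
    tools: dict[str, Callable[..., Any]],
) -> str:
    """Loop: ask the model, run the tool it picks, feed the result back."""
    for _ in range(MAX_TOOL_CALLS):
        step = call_model(messages)
        if step["type"] == "final":
            return step["text"]
        # The model asked for a tool: execute it and append the result.
        result = tools[step["name"]](**step["args"])
        messages.append({"role": "tool", "name": step["name"], "content": result})
    # Assumed fallback when the tool budget runs out without a final answer.
    return "I could not complete the assessment. Please contact a clinician."
```

The hard cap keeps a confused model from looping forever, at the cost of occasionally cutting a long chain short.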

The MoE pipeline uses a two-phase wait strategy: API experts (Claude, GPT-4o) return in ~15 seconds, then we wait up to 7 minutes for Modal experts to handle potential cold starts. The backend pre-warms Modal containers on startup so they're ready by the first user message.
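One way the two-phase wait could look with plain `asyncio` is sketched below. The function name, the dictionary-of-coroutines interface, and the 15-second API timeout are assumptions (the writeup only says API experts return in about 15 seconds). The 420-second default reflects the 7-minute Modal wait.

```python
import asyncio
from typing import Any, Coroutine


async def gather_experts(
    api_calls: dict[str, Coroutine],
    modal_calls: dict[str, Coroutine],
    api_timeout: float = 15.0,     # assumed budget for the API experts
    modal_timeout: float = 420.0,  # up to 7 minutes for cold Modal containers
) -> dict[str, Any]:
    """Start every expert at once, then wait in two phases."""
    api_tasks = {n: asyncio.create_task(c) for n, c in api_calls.items()}
    modal_tasks = {n: asyncio.create_task(c) for n, c in modal_calls.items()}

    # Phase 1: fast API experts. Phase 2: slower self-hosted experts.
    if api_tasks:
        await asyncio.wait(list(api_tasks.values()), timeout=api_timeout)
    if modal_tasks:
        await asyncio.wait(list(modal_tasks.values()), timeout=modal_timeout)

    results: dict[str, Any] = {}
    for name, task in {**api_tasks, **modal_tasks}.items():
        if task.done() and not task.cancelled() and task.exception() is None:
            results[name] = task.result()
        else:
            task.cancel()  # drop experts that timed out or failed
    return results
```

All tasks are created before either wait, so the Modal experts are already running while phase 1 waits on the API experts. An expert that times out or raises is simply left out of the result, and the judge works with whatever opinions arrived.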

Challenges

The over-triage problem was the hardest thing we dealt with, and it taught us the most. Our first MoE implementation had a safety rule: "if ANY expert recommends emergency, the final triage MUST be emergency." Sounds responsible. In practice, it meant that BioMistral-7B listing "1. Stroke 2. Meningitis 3. Migraine" as a differential for a mild headache forced the entire system to tell the patient to call 911. Two well-calibrated models saying "self-care, risk score 0.2" were overridden by one small model doing what medical literature trained it to do — list the worst things first.

The fix required rethinking what "safety" means. Telling someone with a tension headache to call 911 isn't safe — it erodes trust, wastes emergency resources, and desensitizes people to real warnings. We redesigned the judge to use weighted majority consensus: structured experts set the baseline, specialist models contribute insights proportionally, and escalation only happens when specialists explicitly recommend emergency action — not just when they mention a scary condition in passing. A differential diagnosis is a thinking tool, not an alarm.

Modal cold starts were a constant UX challenge. The biomedical models need 3-5 minutes on first invocation (downloading weights, loading onto GPU). We couldn't block the user for that long, so we implemented two-phase waiting: return API expert results immediately, then incorporate Modal expert results when they arrive. On startup, the backend fires warm-up requests in the background so containers are hot by the time the user types their first message.
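The background warm-up can be sketched as a fire-and-forget task started from the app's startup hook. The `ping_fns` mapping and both function names are hypothetical. Each ping would be a cheap request to a Modal endpoint, and a failed ping is recorded rather than raised, so startup never blocks on it.

```python
import asyncio
from typing import Awaitable, Callable


async def warm_up(ping_fns: dict[str, Callable[[], Awaitable[None]]]) -> dict[str, bool]:
    """Ping every container concurrently and report which ones responded."""

    async def _one(name: str, ping: Callable[[], Awaitable[None]]) -> tuple[str, bool]:
        try:
            await ping()
            return name, True
        except Exception:
            # A failed warm-up should not crash the backend.
            return name, False

    pairs = await asyncio.gather(*(_one(n, p) for n, p in ping_fns.items()))
    return dict(pairs)


def start_warm_up(ping_fns: dict[str, Callable[[], Awaitable[None]]]) -> asyncio.Task:
    """Schedule warm-up in the background; must run inside an event loop."""
    return asyncio.create_task(warm_up(ping_fns))
```

Keeping a reference to the returned task lets the backend check later whether the containers came up.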

The Future

Just two years ago, this was impossible. But by combining symbolic research with agent capabilities, we can build AI that reasons, validates, and acts. Imagine a world where your first line of care isn't Google or ChatGPT, and where AI can schedule a telehealth visit, escalate appropriately, and guide you through home care steps. We believe this is the future of healthcare AI.

Probabilistic intelligence. Deterministic safety.

What We Learned

Ensemble AI needs opinionated synthesis, not naive aggregation. "Take the most urgent assessment" and "include all red flags from all experts" sound like safe defaults. In practice, they produce plans that are simultaneously thorough and useless — treating every symptom like it could be fatal. A good judge model needs to understand confidence weighting, the difference between a differential and a recommendation, and when a minority opinion should be noted versus when it should set the triage level.

Symbolic safety layers are non-negotiable for healthcare AI. The constraint engine and knowledge graph don't hallucinate. They don't have off days. If chest pain plus shortness of breath appears in the symptoms, triage goes to emergency — no prompt engineering can change that. LLMs handle the nuanced reasoning; deterministic systems handle the bright-line rules.