MedConsensus

MedConsensus homepage: multi-agent clinical reasoning demo interface
Synthetic patient case used for multi-agent evaluation
Independent specialist assessments before consensus
Agents challenge each other’s reasoning to reduce diagnostic tunnel vision
Final consensus with structured reasoning, confidence score, and ICD-10 codes
A2A-style multi-agent orchestration architecture

Inspiration

Most AI systems provide a single answer to complex problems. In healthcare, this can reinforce diagnostic tunnel vision, where one explanation dominates too early and alternative possibilities are overlooked. In real clinical practice, decisions are rarely made in isolation; physicians consult across specialties, challenge assumptions, and refine their reasoning collaboratively.

MedConsensus was inspired by this gap. We wanted to explore what happens when AI systems move beyond single responses and instead simulate how real clinical teams think: through structured disagreement, critique, and consensus.

What it does

MedConsensus is a multi-agent clinical reasoning system that evaluates synthetic patient cases through a panel of AI specialists.

The system:

Distributes a case to General Medicine, Cardiology, and Pulmonology agents
Each agent produces an independent assessment
Agents critique and revise each other’s reasoning
A Consensus Agent synthesizes a final structured recommendation

The output includes:

most likely diagnosis
differential diagnoses
must-not-miss conditions
recommended next questions and tests
numeric confidence score (0–100)
ICD-10 codes
safety notes and disclaimer

Instead of a single opaque answer, MedConsensus produces a transparent reasoning process.

How we built it

MedConsensus is implemented as a FastAPI service using a multi-agent orchestration pattern.

Key components:

A2A-style architecture for agent coordination
Pydantic schemas for structured outputs
LLM-powered agents for reasoning, critique, and revision
Deterministic fallback mode for reliability

Each specialist agent uses a role-specific prompt to simulate domain expertise. The system performs two rounds of reasoning:

Independent assessment
Cross-agent critique and revision

The Consensus Agent then aggregates all outputs into a final structured report.

Challenges we ran into

One of the biggest challenges was moving beyond a traditional single-agent model. Designing a system where multiple agents produce meaningful, non-redundant perspectives required careful prompt design and orchestration.

Another challenge was balancing AI flexibility with reliability. Fully LLM-based systems can be unpredictable, so we implemented a deterministic fallback while maintaining an LLM-first architecture.

We also had to ensure strict safety constraints. The system uses only synthetic or de-identified data, avoids PHI entirely, and frames outputs as decision support rather than diagnosis or treatment.

Accomplishments that we're proud of

Built a working multi-agent reasoning pipeline with independent assessment, critique, and consensus
Created structured, explainable outputs instead of opaque responses
Integrated clinical-style outputs including ICD-10 codes and confidence scoring
Designed a system that is safe, compliant, and aligned with real-world healthcare constraints
Delivered a complete, testable FastAPI service with marketplace-ready endpoints

What we learned

We learned that multi-agent systems can unlock capabilities that single-agent systems cannot, especially in domains requiring multiple perspectives.

We also learned that explainability is critical. Showing disagreement, uncertainty, and reasoning is often more valuable than providing a single confident answer.

Finally, we saw that structure matters. Without clear schemas and orchestration, multi-agent systems can quickly become inconsistent or difficult to interpret.

What's next for MedConsensus

Next steps include:

Expanding to additional specialties (e.g., neurology, infectious disease)
Improving reasoning depth with more advanced LLM prompting strategies
Adding feedback loops to learn from clinician input
Enhancing evaluation with real-world benchmark datasets (while maintaining privacy constraints)
Integrating into clinical workflows as a lightweight decision-support layer

Ultimately, MedConsensus aims to evolve from a hackathon prototype into a scalable system that supports collaborative, transparent, and safer decision-making in healthcare.

Built With

anthropic-claude-api-(claude-sonnet-4-20250514)
asyncio
fastapi
github
json-structured-outputs
multi-agent-a2a-style-orchestration
prompt-opinion-platform
pydantic
pytest
python
python-dotenv
rest-apis
uvicorn

Updates

Oyinnimiebi Bribena started this project — May 05, 2026 04:57 AM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.