Inspiration

Patients with complex chronic diseases rarely fit inside a single clinical guideline. A patient with heart failure, chronic kidney disease, and type 2 diabetes may receive different recommendations from cardiology, nephrology, and endocrinology. One specialty may prioritize mortality benefit, another may warn about kidney safety, and another may focus on glucose control.

Consilium was inspired by this real clinical reconciliation problem: the hardest part is often not generating one recommendation, but resolving conflicts between multiple reasonable recommendations.

We built Consilium to make that reconciliation explicit, explainable, and auditable.

What it does

Consilium is a multi-specialty clinical decision support A2A agent for complex chronic disease care.

It accepts either free-text patient summaries or Prompt Opinion FHIR context. When FHIR context is available, it can summarize patient demographics, active conditions, medications, and observations before running the clinical workflow.

Consilium then invokes three specialist agents:

  • Cardiology
  • Nephrology
  • Endocrinology

Each specialist returns a structured recommendation with its risks and an evidence citation. The orchestrator validates those outputs, detects medication and guideline conflicts, and ranks the recommendations using deterministic TOPSIS-based clinical scoring.

The output includes:

  • A ranked recommendation table
  • A top clinical priority
  • Key conflicts resolved
  • Evidence citations
  • A clinical safety disclaimer

The goal is not to replace clinicians, but to help them move from conflicting specialty advice to one transparent, safety-aware action plan.

How we built it

The backend is a FastAPI / Google ADK A2A agent deployed on Google Cloud Run. It exposes an A2A-compatible JSON-RPC endpoint and an agent card for Prompt Opinion.

Consilium supports the official Prompt Opinion FHIR context extension and SMART scopes for:

  • Patient
  • Condition
  • MedicationRequest
  • Observation

The specialist agents use DeepSeek V4 Flash through LiteLLM. They are asked to produce structured JSON with:

  • specialty
  • recommendation
  • risks
  • citation

The orchestrator validates each specialist response before using it. If one specialist fails or returns invalid JSON, only that specialty falls back to a deterministic backup recommendation.
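
A minimal sketch of how this per-specialty validation and fallback could work. The function names, the `source` field, the assumption that `risks` is a list of strings, and the placeholder fallback text are all hypothetical; the writeup does not show the actual schema or backup recommendations.

```python
import json
from typing import Any

# The four fields the specialists are asked to return.
REQUIRED_FIELDS = ("specialty", "recommendation", "risks", "citation")


def fallback_for(specialty: str) -> dict[str, Any]:
    """Hypothetical deterministic backup; the text here is a placeholder."""
    return {
        "specialty": specialty,
        "recommendation": f"Deterministic backup recommendation for {specialty}",
        "risks": [],
        "citation": "placeholder citation",
        "source": "fallback",
    }


def parse_specialist_output(raw: str, specialty: str) -> dict[str, Any]:
    """Return validated specialist output, or a fallback for this specialty only."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return fallback_for(specialty)
    if not isinstance(data, dict):
        return fallback_for(specialty)
    # Text fields must be non-empty strings.
    for field in ("specialty", "recommendation", "citation"):
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            return fallback_for(specialty)
    # Assumption: risks is a list of strings.
    risks = data.get("risks")
    if not isinstance(risks, list) or not all(isinstance(r, str) for r in risks):
        return fallback_for(specialty)
    validated = {field: data[field] for field in REQUIRED_FIELDS}
    validated["source"] = "llm"
    return validated
```

Because each response is parsed independently, one malformed reply does not discard the other two specialists' outputs.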

The final ranking is computed in code, not freely invented by the language model. We use TOPSIS, a multi-criteria decision method, to score each recommendation across four clinical dimensions. Each recommendation receives a closeness score:

$$ C_i = \frac{D_i^-}{D_i^+ + D_i^-} $$

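where \(C_i\) is the closeness score of recommendation \(i\), \(D_i^+\) is its distance to the ideal clinical recommendation, and \(D_i^-\) is its distance from the worst-case recommendation. Scores fall between 0 and 1, and a higher score means a higher-ranked recommendation.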

The scoring dimensions are:

  • Evidence strength
  • Patient match
  • Medication safety risk
  • Guideline priority
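
The closeness score over these four dimensions can be sketched with the textbook TOPSIS steps: vector-normalize each criterion, apply weights, find the ideal and worst points, then compare distances. The equal weights, the example scores, and the choice to treat medication safety risk as a cost criterion (lower is better) are assumptions for illustration, not values taken from Consilium.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score per row (alternative) of the decision matrix.

    matrix:  rows are recommendations, columns are criteria
    weights: one weight per criterion
    benefit: True if higher is better for that criterion, False if lower is better
    """
    n = len(weights)
    # Vector normalization per criterion; guard against an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_plus = math.dist(row, ideal)   # distance to the ideal point
        d_minus = math.dist(row, worst)  # distance from the worst point
        total = d_plus + d_minus
        # If every alternative is identical, both distances are zero.
        scores.append(d_minus / total if total else 0.0)
    return scores


# Column order: evidence strength, patient match, medication safety risk, guideline priority.
# All numbers below are made up for illustration.
example_matrix = [
    [0.9, 0.9, 0.1, 0.9],
    [0.5, 0.5, 0.5, 0.5],
    [0.2, 0.2, 0.8, 0.2],
]
example_weights = [0.25, 0.25, 0.25, 0.25]
example_benefit = [True, True, False, True]
example_scores = topsis(example_matrix, example_weights, example_benefit)
```

In this example the first row is best on every criterion, so it coincides with the ideal point and scores exactly 1.0, while the third row coincides with the worst point and scores 0.0. Since the function has no randomness and no model calls, the same inputs always yield the same ranking.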

The frontend is a React/Vite clinical demo deployed on Vercel. It shows the patient profile, agent pipeline, TOPSIS ranking, resolved conflicts, and reconciliation trace.

Challenges we ran into

The hardest challenge was separating what the LLM should do from what deterministic code should control.

In healthcare, a fluent answer is not enough. The system needs guardrails, structured outputs, refusal behavior, and safe fallback paths. We did not want the LLM to freely assign the final priority score, so we designed the specialists to generate recommendations while the orchestrator computes the ranking deterministically.

Another challenge was Prompt Opinion / A2A compatibility. We had to handle:

  • A2A message shape compatibility
  • messageId / message_id differences
  • Role normalization
  • API key middleware
  • CORS for the browser demo
  • FHIR context extraction
  • Safe fallback when FHIR resources are unavailable
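
Two items in this list, the `messageId` / `message_id` difference and role normalization, can be handled by a small adapter that runs before the request reaches the agent. The sketch below is illustrative only: the function name, the set of role aliases, and the choice to default unknown roles to `user` are assumptions, not Consilium's actual middleware.

```python
import uuid
from typing import Any

# Assumed aliases; real clients may send other spellings.
ROLE_ALIASES = {
    "user": "user",
    "role_user": "user",
    "agent": "agent",
    "role_agent": "agent",
    "assistant": "agent",
}


def normalize_message(message: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the message with a camelCase messageId and a lowercase role."""
    normalized = dict(message)
    # Accept either key, prefer messageId, and generate an id if both are missing.
    snake_id = normalized.pop("message_id", None)
    normalized["messageId"] = normalized.get("messageId") or snake_id or str(uuid.uuid4())
    # Map role spellings onto "user" or "agent"; unknown roles default to "user" (assumption).
    role = str(normalized.get("role", "user")).strip().lower()
    normalized["role"] = ROLE_ALIASES.get(role, "user")
    return normalized
```

Normalizing once at the boundary keeps the rest of the pipeline free of per-client special cases.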

We also had to make sure the project was honest about its architecture. Consilium currently implements parallel specialist consults plus deterministic reconciliation, not a true multi-round negotiation between agents.
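
The parallel consult pattern described here can be sketched with `asyncio.gather`. The `consult` callable is a hypothetical stand-in for a specialist LLM call, and the fallback text is a placeholder; neither reflects the real Consilium code.

```python
import asyncio
from typing import Awaitable, Callable

SPECIALTIES = ("cardiology", "nephrology", "endocrinology")

# Hypothetical signature: takes (specialty, patient_summary), returns a recommendation dict.
Consult = Callable[[str, str], Awaitable[dict]]


async def run_consults(summary: str, consult: Consult) -> list[dict]:
    """Run all specialists concurrently; a failure only affects its own specialty."""
    results = await asyncio.gather(
        *(consult(specialty, summary) for specialty in SPECIALTIES),
        return_exceptions=True,  # collect exceptions instead of cancelling the batch
    )
    plans = []
    for specialty, result in zip(SPECIALTIES, results):
        if isinstance(result, Exception):
            # Placeholder backup; the real fallback content is not shown in this writeup.
            result = {
                "specialty": specialty,
                "recommendation": "deterministic backup (placeholder)",
                "source": "fallback",
            }
        plans.append(result)
    return plans


async def demo_consult(specialty: str, summary: str) -> dict:
    """Demo stand-in that simulates one failing specialist."""
    await asyncio.sleep(0)
    if specialty == "nephrology":
        raise RuntimeError("simulated specialist failure")
    return {"specialty": specialty, "recommendation": "placeholder", "source": "llm"}
```

Calling `asyncio.run(run_consults("patient summary", demo_consult))` returns three entries in specialty order, with only the nephrology entry marked as a fallback. Each specialist runs once and there is no exchange between them, which matches the single-round design described above.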

Accomplishments that we're proud of

We are proud that Consilium is more than a static healthcare chatbot demo.

It is a deployed A2A agent that:

  • Connects to Prompt Opinion
  • Supports FHIR context
  • Calls multiple specialist agents
  • Validates structured specialist outputs
  • Detects clinical medication conflicts
  • Computes deterministic clinical ranking
  • Fails safely when patient context is insufficient

We are also proud of the clinical demo experience. The frontend makes the workflow visible: patient context enters the system, specialist agents run, conflicts are surfaced, and a ranked action plan is returned.

Most importantly, Consilium refuses to invent a care plan when the input is too thin. In healthcare AI, safe refusal is part of the product.

What we learned

We learned that healthcare agents need more than model capability. They need boundaries.

A useful clinical agent should know when it has enough context, when it should ask for more information, and which parts of the decision should be handled by deterministic logic rather than free-form generation.

We also learned that multi-agent clinical systems are most valuable when they expose disagreement rather than hiding it. The point is not to make every specialist sound aligned from the beginning. The point is to show the conflict, explain the tradeoff, and produce a safer reconciled plan.

What's next for Consilium — Multi-Specialty Clinical Decision System

Next, we would like to add a true multi-round specialist negotiation loop, where cardiology, nephrology, and endocrinology agents can exchange rationale and revise their recommendations before final ranking.

We also plan to expand Consilium with:

  • More specialty agents
  • Broader FHIR resource coverage
  • More guideline modules
  • Clinician feedback on recommendations
  • Retrospective validation on de-identified clinical cases
  • Stronger evaluation of decision quality and time saved

Consilium is currently advisory clinical decision support. It does not replace clinician judgment, local policy, patient preference, or emergency clinical assessment.

Built With

  • a2a
  • deepseek-v4-flash
  • fastapi
  • fhir
  • google-adk
  • google-cloud-run
  • litellm
  • prompt-opinion
  • react
  • smart-on-fhir
  • vercel
  • vite