Inspiration
In high-stakes fields like healthcare, a single hallucination from an AI model isn't just a bug; it's a risk to human life. Traditional LLMs, while impressive, often act as solo decision-makers, prone to confident errors. I asked myself: how can we trust an AI model to make critical decisions? In critical cases, doctors don't work in silos; they debate, challenge, and collaborate. Why can't AI do the same?
That question led me to a bold idea:
Let’s turn AI from a lone genius into a full-blown medical think tank. A place where multiple doctor agents debate, scrutinize, and refine each other's insights.
That’s how the Multi-Agent Consensus-Driven Medical Diagnosis System was born.
What it does
This is not your typical chatbot. It simulates a virtual panel of AI doctors, each a specialist in their own medical domain. When you submit symptoms, these agents don’t just generate a response — they engage in a full diagnostic debate. They think together. They question each other. They correct each other. They don’t stop until they reach a consensus. The result is deeper insight, reduced hallucinations, and a new standard for AI reliability in medicine.
How we built it
The system is built using Google’s Agent Development Kit. It is led by an Orchestrator Agent that choreographs the diagnostic reasoning like a conductor leading a high-stakes medical symphony. It initiates a looped, turn-based debate where multiple specialized Doctor Agents — each an expert in a distinct medical field — analyze symptoms, propose diagnoses, and critically evaluate one another’s reasoning. These agents don't just collaborate — they challenge, comment, and correct each other, surfacing weak arguments and sharpening insights through real-time peer review. If they can’t reach agreement, the Evaluate Consensus Agent steps in to flag disagreements and triggers another round of discussion, ensuring no flawed diagnosis slips through. Once consensus is reached, a Summarizer Agent distills the dialogue into a final diagnosis and treatment plan — clear, concise, and deeply vetted.
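As a rough sketch of that looped, turn-based flow, here is a framework-agnostic Python version of the control loop. The actual ADK calls aren't shown in this write-up, so `run_debate`, `Opinion`, and the callables passed in are all hypothetical names standing in for the Orchestrator, Doctor, Evaluate Consensus, and Summarizer agents:

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    doctor: str      # which specialist produced this turn
    diagnosis: str   # the diagnosis they currently argue for

def run_debate(doctors, evaluate_consensus, summarize, symptoms, max_rounds=3):
    """Looped, turn-based debate: every doctor responds in turn each round,
    seeing the full transcript of prior opinions, until consensus is reached
    or max_rounds is exhausted. Then the summarizer distills the dialogue."""
    transcript = []
    for _ in range(max_rounds):
        for doctor in doctors:
            # Each agent gets the symptoms plus the transcript so far,
            # so it can challenge or refine earlier opinions in context.
            transcript.append(doctor(symptoms, transcript))
        if evaluate_consensus(transcript):
            break  # agreement reached; no further rounds needed
    return summarize(transcript)
```

Passing the growing transcript to each agent is what keeps the debate contextual rather than a set of independent answers; the consensus check between rounds is what triggers another round when the panel still disagrees.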
The frontend is built using Next.js and TypeScript and hosted via Google Firebase. Users can authenticate through Firebase Authentication to interact with the agents. The backend is developed using Python Flask and deployed on Google Cloud Run. Sensitive credentials and API keys are managed securely using Google Secret Manager, keeping the system both production-ready and protected.
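The Flask backend could expose the panel behind a single endpoint along these lines. This is a minimal sketch under assumptions: the route name `/diagnose` and the request/response shape are invented for illustration, and the agent pipeline itself is stubbed out:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/diagnose", methods=["POST"])
def diagnose():
    # Hypothetical endpoint; the real route and payload shape are not
    # documented in this write-up.
    payload = request.get_json(force=True) or {}
    symptoms = payload.get("symptoms", "")
    if not symptoms:
        return jsonify(error="symptoms required"), 400
    # In the real system, the agent debate pipeline would run here;
    # this sketch just echoes the input.
    return jsonify(diagnosis=f"panel review of: {symptoms}")
```

Cloud Run only needs the app to listen on the port it provides, so a container running this with a production WSGI server is enough to deploy it.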
Most importantly, the entire agent architecture is designed to be modular and future-proof — new Doctor Agents can be added seamlessly, and existing models can be swapped out for newer, more powerful LLMs as they become available. This means the system is not just intelligent today — it's designed to evolve, grow smarter, and stay on the cutting edge of AI-powered healthcare.
Challenges we ran into
- Orchestrating AI Conversations: Designing a system where agents debate without descending into chaos. A turn-based flow lets agents respond to one another in context without drifting into irrelevant or repetitive tangents.
- Extendable, Production-Ready Architecture: I structured the system so that new Doctor Agents can be plugged in with minimal configuration changes, and outdated models can be hot-swapped for newer LLMs, keeping the system upgradable as the AI ecosystem advances.
- Infrastructure Costs: I manually evaluated multiple GCP services to ensure the deployment stayed within the Free Tier, with no surprises or hidden charges. I selected Firebase Hosting, Cloud Run, and Firebase Authentication after benchmarking them for both performance and zero-cost viability, enabling 24x7 production deployment without spending a single dollar.
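The plug-in style described above can be sketched as a simple registry. The names here (`DOCTOR_REGISTRY`, `register_doctor`, the specialty strings, and the model identifier) are all hypothetical; they just illustrate how a new specialist or a newer LLM could be dropped in without touching the orchestration code:

```python
# Registry mapping specialty name -> factory for that Doctor Agent.
DOCTOR_REGISTRY = {}

def register_doctor(specialty):
    """Decorator: plug a new Doctor Agent in with one function definition
    and no other configuration changes."""
    def wrap(factory):
        DOCTOR_REGISTRY[specialty] = factory
        return factory
    return wrap

@register_doctor("cardiology")
def make_cardiologist(model="some-default-llm"):
    # `model` is a parameter, so a newer LLM can be hot-swapped in later
    # without changing the agent's registration.
    return {"specialty": "cardiology", "model": model}
```

The orchestrator can then iterate over `DOCTOR_REGISTRY.values()` to build its panel, so adding a specialty never requires editing the debate loop itself.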
Accomplishments that we're proud of
- Recreated a Medical Boardroom Virtually: Built a functioning system where AI agents emulate clinical discussion and reach diagnostic consensus, challenging and correcting each other through real-time peer review.
- Reduced Hallucinations: As a query passes from one specialized doctor agent to the next, each new perspective refines and sharpens the diagnosis; in testing, this produced markedly more reliable and consistent output than single-agent approaches.
- End-to-End Delivery: Designed, built, and deployed a scalable, full-stack AI diagnosis system from scratch.
- Contributed to Google's Agent Development Kit (ADK) documentation.
What we learned
- Consensus beats confidence: multiple agents reasoning together beat the "solo genius" approach of a single LLM.
- Agent architecture is the future: orchestration workflows and role-driven agents can unlock human-like reasoning.
- Collaboration matters: healthy collaboration among agents leads to smarter, safer, and more nuanced decisions.
What's next for Multi-Agent Consensus-Driven Medical Diagnosis System
- Patient Profiling: A dedicated agent will construct a comprehensive patient profile, maintaining a longitudinal health record of past illnesses, diagnoses, treatments, allergies, and vaccination history. This data will empower the doctor council to make more accurate, context-aware decisions by identifying crucial patterns and long-term health trends.
- Document & Report Uploads: Adding support for securely uploading relevant medical documents, such as past prescriptions and lab reports.
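One possible shape for the longitudinal record the profiling agent would maintain, sketched as a plain dataclass. The schema (`PatientProfile` and its fields) is an assumption for illustration, not a committed design:

```python
from dataclasses import dataclass, field

@dataclass
class PatientProfile:
    """Hypothetical longitudinal health record a future profiling
    agent could maintain and hand to the doctor council."""
    patient_id: str
    past_diagnoses: list = field(default_factory=list)
    treatments: list = field(default_factory=list)
    allergies: list = field(default_factory=list)
    vaccinations: list = field(default_factory=list)

    def add_diagnosis(self, diagnosis: str) -> None:
        # Appending rather than overwriting preserves the history
        # that lets agents spot long-term trends.
        self.past_diagnoses.append(diagnosis)
```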
Built With
- agent-development-kit
- firebase-authentication
- firebase-hosting
- flask
- google-cloud-run
- nextjs
- python
- typescript