Inspiration
The modern healthcare ecosystem remains deeply fragmented. Patients are forced to navigate a disjointed journey — waiting days for imaging results, transferring between specialists, managing complex medication schedules, and deciphering opaque billing statements. For providers, manually synthesizing cross-departmental data creates significant administrative bottlenecks and increases the risk of human error.
The P.A.C.T System was born from a fundamental question: what if a highly accurate diagnostic model was just the beginning? A Vision-Language Model that identifies pathology but fails to connect that insight to actionable next steps leaves the patient's journey stalled. True clinical value requires AI that doesn't just diagnose — it orchestrates.
Our vision was to build a unified, automated pipeline in which specialized AI agents actively collaborate, seamlessly translating a raw medical image and patient symptoms into a complete care plan, a structured daily schedule, and a transparent itemized bill — all within seconds.
What it does
The P.A.C.T System is an end-to-end clinical orchestrator. Upon receiving a patient's symptoms and a medical image URL, the system triggers an automated four-stage pipeline powered by specialized AI agents:
- Diagnosing Agent — Analyzes the medical image alongside reported symptoms using a Vision-Language Model (VLM) to produce a preliminary diagnosis and severity assessment.
- Treatment Agent — Translates the diagnostic output into a personalized medication plan and evidence-informed therapeutic strategy.
- Scheduling Agent — Converts the clinical treatment plan into a structured, actionable daily calendar for the patient.
- Cost Agent — Cross-references prescribed medications against a Vector Database to generate a transparent, itemized billing estimate.
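The four stages above can be pictured as functions that each enrich a shared case record in turn. The following is a minimal sketch under stated assumptions, not the project's actual ADK or A2A code: `CaseState`, the four stub functions, `PLACEHOLDER_PRICES`, and every returned value are hypothetical placeholders chosen only to show the hand-off order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CaseState:
    """Hypothetical shared record that each stage reads from and adds to."""
    symptoms: str
    image_url: str
    diagnosis: str = ""
    severity: str = ""
    medications: List[str] = field(default_factory=list)
    schedule: List[str] = field(default_factory=list)
    bill: Dict[str, float] = field(default_factory=dict)


def diagnosing_agent(state: CaseState) -> CaseState:
    # Placeholder: the real stage would run the VLM on the image and symptoms.
    state.diagnosis = "placeholder diagnosis"
    state.severity = "moderate"
    return state


def treatment_agent(state: CaseState) -> CaseState:
    # Placeholder: the real stage would derive a plan from the diagnosis.
    state.medications = ["drug_a", "drug_b"]
    return state


def scheduling_agent(state: CaseState) -> CaseState:
    # One calendar entry per medication; the time is illustrative only.
    state.schedule = [f"08:00 take {name}" for name in state.medications]
    return state


# Made-up prices; the real stage would query the vector database.
PLACEHOLDER_PRICES = {"drug_a": 12.5, "drug_b": 4.0}


def cost_agent(state: CaseState) -> CaseState:
    state.bill = {name: PLACEHOLDER_PRICES.get(name, 0.0) for name in state.medications}
    return state


PIPELINE: List[Callable[[CaseState], CaseState]] = [
    diagnosing_agent,
    treatment_agent,
    scheduling_agent,
    cost_agent,
]


def run_pipeline(symptoms: str, image_url: str) -> CaseState:
    """Run the four stages in order, passing the record along."""
    state = CaseState(symptoms=symptoms, image_url=image_url)
    for stage in PIPELINE:
        state = stage(state)
    return state
```

Keeping every stage behind the same signature means a further agent could be added by appending one function to `PIPELINE`.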
How we built it
The core orchestration layer was built on the Google Agent Development Kit (ADK), with the Gemini API serving as the primary reasoning engine. Agents communicate with one another over the Agent-to-Agent (A2A) protocol.
For visual diagnostics, we integrated a fine-tuned Vision-Language Model (BLIP + LoRA). Medication pricing accuracy is handled by a semantic search engine built on Qdrant Vector Database and Sentence Transformers. The entire multi-agent system was containerized with Docker and deployed on Google Cloud Run for scalable, serverless execution.
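To illustrate the idea behind the pricing lookup, here is a minimal sketch of nearest-neighbor matching by cosine similarity. It is deliberately a stand-in: the toy bag-of-words `embed` function replaces a Sentence Transformers model, the in-memory `CATALOG` list replaces a Qdrant collection, and the drug names and prices are invented for the example.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented catalog entries and prices, for illustration only.
CATALOG: List[Tuple[str, float]] = [
    ("amoxicillin 500 mg capsule", 0.40),
    ("ibuprofen 200 mg tablet", 0.10),
    ("salbutamol inhaler 100 mcg", 6.00),
]


def lookup_price(query: str, catalog: List[Tuple[str, float]] = CATALOG) -> Tuple[str, float]:
    """Return the catalog entry whose vector is closest to the query."""
    query_vec = embed(query)
    return max(catalog, key=lambda item: cosine(query_vec, embed(item[0])))
```

In a real deployment the vectors would come from the embedding model and the `max` scan would be replaced by a vector-database query, but the matching principle is the same.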
Challenges we ran into
VLM Memory Constraints on Serverless Infrastructure
Deploying a 1.5 GB Vision-Language Model on Google Cloud Run initially triggered Out-of-Memory (OOM) crashes. We resolved this by engineering a Lazy Loading mechanism, ensuring the model is only loaded into RAM at the precise moment the Orchestrator invokes the Diagnosing tool.
LLM API Instability & Graceful Fallbacks
During periods of high demand, A2A routing was disrupted by 503 UNAVAILABLE and 404 NOT_FOUND errors from the Gemini API. We addressed this by implementing dynamic model fallbacks — gracefully downgrading to gemini-3.1-flash-lite-preview when necessary — ensuring the clinical pipeline remained uninterrupted.
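One way to express this kind of fallback is to try an ordered list of models and move on when a call fails with one of the two status codes mentioned above. The sketch below makes several assumptions: `ModelUnavailable`, `generate_with_fallback`, and the `call_model` callback are hypothetical names, not part of the Gemini SDK or the ADK, and the real client call is left to the caller.

```python
from typing import Callable, Optional, Sequence, Tuple


class ModelUnavailable(Exception):
    """Hypothetical error standing in for a failed model API response."""

    def __init__(self, status: int):
        super().__init__(f"model call failed with HTTP {status}")
        self.status = status


# Status codes that trigger a fallback: 503 UNAVAILABLE and 404 NOT_FOUND.
FALLBACK_STATUSES = {503, 404}


def generate_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, response_text)."""
    last_error: Optional[ModelUnavailable] = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable as err:
            if err.status not in FALLBACK_STATUSES:
                raise  # Other failures are not masked by a fallback.
            last_error = err
    raise RuntimeError("all configured models failed") from last_error
```

The list passed as `models` would end with gemini-3.1-flash-lite-preview, so that the lighter model is only used once the preferred ones have failed.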
Third-Party Rate Limiting (HTTP 429)
The Cost Agent repeatedly triggered Hugging Face rate limits due to on-demand model downloads. We resolved this by injecting secure credentials via Google Secret Manager (with strict IAM policy bindings) and caching the sentence-transformers model directly into the Docker image at build time, eliminating cold-start download overhead entirely.
Accomplishments that we're proud of
We delivered a medical pipeline that runs with zero human intervention, from raw medical image to itemized patient bill. We are most proud of integrating a heavyweight deep learning model into a multi-agent communication loop without compromising stability in a cloud-native environment. Watching the Orchestrator autonomously manage an end-to-end patient journey remains a defining milestone of this project.
What we learned
This project substantially advanced our expertise in cloud infrastructure and multi-agent orchestration. Key takeaways include engineering precise system prompts to govern agent behavior, managing memory allocation and container lifecycles on Google Cloud Run, and designing efficient, fault-tolerant workflows using the A2A protocol.
What's next for Health A2A Agent
Expanding the Agent Ecosystem
We plan to extend the A2A network with additional specialized micro-agents, most notably a Pharmacovigilance Agent to detect complex drug-drug interactions and a Dietitian Agent to generate personalized meal plans that complement prescribed treatments.
Multi-Modal Input Expansion
While the current system supports text and medical imaging (X-rays), our next milestone is to incorporate audio analysis (e.g., respiratory sound classification) and continuous time-series data from wearable devices, enabling a richer, more holistic diagnostic context.
Built With
- adk
- google-cloud
- po
- python
- qdrant
- rag
- vision-language-model