Inspiration
Ophthalmology surgeries generate massive amounts of high-resolution OCT scans and microscopic surgical data, yet surgeons still spend significant time manually analyzing scans, reviewing medical literature, documenting procedures, and managing patient records.
We were inspired by the potential of combining:
multimodal AI, agentic systems, medical RAG pipelines, and real-time conversational intelligence
to build an intelligent surgical copilot for ophthalmology.
Our vision was to create a next-generation healthcare AI system capable of:
analyzing OCT scan images in real time, retrieving evidence-based medical knowledge from PubMed, assisting surgeons interactively, generating automated surgical reports, and maintaining longitudinal patient intelligence.
By integrating the Google Gemini 3 API with ADK-based multi-agent orchestration, we aimed to explore how AI can support future surgical workflows in healthcare.
What it does
Our project is an AI-powered ophthalmology surgical copilot built using a 5-agent AI architecture.
The system processes incoming OCT scans and surgical microscope images, performs AI-driven analysis, retrieves medical evidence from PubMed, enables real-time interaction between surgeons and AI, generates operation reports, and securely stores patient surgical intelligence.
Core Features
- OCT Analysis Agent
The first AI agent analyzes incoming OCT scan images and detects:
retinal abnormalities, ophthalmic structural changes, and possible surgical risks.
It generates structured AI findings with confidence-based outputs.
Input: OCT Scan Image
Output: Structured Clinical Analysis
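To make "structured AI findings with confidence-based outputs" concrete, here is a minimal sketch of what such an output could look like. The `Finding` class, the `build_analysis` function, the field names, and the 0.5 threshold are all hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, asdict


@dataclass
class Finding:
    label: str         # illustrative name of a detected abnormality
    confidence: float  # model confidence in the range [0, 1]


def build_analysis(scan_id: str, findings: list[Finding], threshold: float = 0.5) -> dict:
    """Keep findings at or above the threshold, highest confidence first."""
    kept = sorted(
        (f for f in findings if f.confidence >= threshold),
        key=lambda f: f.confidence,
        reverse=True,
    )
    return {"scan_id": scan_id, "findings": [asdict(f) for f in kept]}
```

Filtering by a confidence threshold is one possible design choice; a clinical system might instead keep every finding and let the surgeon decide what to ignore.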
- PubMed-Powered Medical RAG Agent
The second agent combines OCT findings with evidence-based medical knowledge retrieved from PubMed using a vector database.
The workflow:
PubMed Papers → Embeddings → Vector Search → Medical Retrieval
This helps generate:
evidence-backed insights, contextual surgical explanations, and medical recommendations.
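The embed-then-search step of this workflow can be sketched in a few lines. This is a toy illustration only: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory list stands in for a vector database, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

The retrieved passages would then be passed to the reasoning model alongside the OCT findings, so that its answer can cite them.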
- Real-Time Conversational AI Agent
The third agent enables surgeons to interact with the AI system using natural language.
Example queries:
“What abnormalities are detected?”
“Summarize the OCT findings.”
“Retrieve related PubMed references.”
The AI responds in near real time using Gemini 3 multimodal reasoning.
- Surgical Report Generation Agent
The fourth agent automatically generates structured operation reports containing:
OCT findings, AI observations, recommendations, surgical summaries, and PubMed references.
Output formats:
PDF, JSON, and HTML
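As a rough illustration of producing one report in more than one format, the sketch below renders the same dictionary as JSON and as HTML using only the standard library. The report keys (`title`, `findings`) and both function names are assumptions for this example; PDF output is omitted because it would need a third-party library.

```python
import html
import json


def render_json(report: dict) -> str:
    """Serialize the report with stable key order for easy diffing."""
    return json.dumps(report, indent=2, sort_keys=True)


def render_html(report: dict) -> str:
    """Render a minimal HTML fragment, escaping all report text."""
    title = html.escape(str(report.get("title", "Operation Report")))
    items = "".join(f"<li>{html.escape(str(x))}</li>" for x in report.get("findings", []))
    return f"<h1>{title}</h1><ul>{items}</ul>"
```

Escaping every value matters here because report text may originate from model output or free-text notes.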
- Database Storage Agent
The fifth agent securely stores:
patient information, operation history, OCT scan metadata, AI analysis, and generated reports
inside a MySQL database.
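A minimal sketch of the storage step follows. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the column names and the `save_report` helper are hypothetical. With a typical MySQL driver the `?` placeholders would usually be `%s`.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables likely carry more columns.
SCHEMA = """
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE surgeries (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(id),
    performed_on TEXT NOT NULL
);
CREATE TABLE reports (
    id INTEGER PRIMARY KEY,
    surgery_id INTEGER NOT NULL REFERENCES surgeries(id),
    body TEXT NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save_report(conn: sqlite3.Connection, patient_name: str, performed_on: str, body: str) -> int:
    """Insert a patient, a surgery, and its report; return the report id."""
    pid = conn.execute("INSERT INTO patients (name) VALUES (?)", (patient_name,)).lastrowid
    sid = conn.execute(
        "INSERT INTO surgeries (patient_id, performed_on) VALUES (?, ?)", (pid, performed_on)
    ).lastrowid
    rid = conn.execute("INSERT INTO reports (surgery_id, body) VALUES (?, ?)", (sid, body)).lastrowid
    conn.commit()
    return rid
```

Parameterized queries are used throughout so that patient data is never interpolated into SQL strings.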
CI/CD Pipeline
The platform also includes a CI/CD deployment pipeline:
Code Push → Automated Testing → Docker Build → Kubernetes Deployment
This pipeline supports:
continuous integration, scalable AI deployment, automated testing, GPU inference infrastructure, and rollback support.
How we built it
We built the platform using:
Gemini 3 multimodal AI, ADK (Agent Development Kit), FastAPI, React + Next.js, PubMed-powered RAG, vector databases, Docker + Kubernetes, and the MeDo platform.
Architecture
The system follows a modular microservice architecture:
Incoming OCT Images → AI Inference Pipeline → ADK Agent Orchestrator → 5 Specialized AI Agents → Gemini 3 Reasoning Layer → RAG + PubMed Retrieval → Real-Time AI Interaction → Report Generation + MySQL Storage
Agentic AI System
We implemented a coordinated 5-agent AI workflow using ADK.
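The idea of a coordinated agent workflow can be sketched in plain Python as a chain of functions that each read and extend a shared context. This does not use the ADK API; the agent functions, the context keys, and `run_pipeline` are all hypothetical stand-ins for illustration.

```python
from typing import Callable

Context = dict
Agent = Callable[[Context], Context]


def analyze(ctx: Context) -> Context:
    """Stand-in for the OCT analysis agent."""
    return {**ctx, "findings": [f"finding for {ctx['scan_id']}"]}


def retrieve(ctx: Context) -> Context:
    """Stand-in for the medical RAG agent."""
    return {**ctx, "references": [f"reference about {f}" for f in ctx["findings"]]}


def report(ctx: Context) -> Context:
    """Stand-in for the report generation agent."""
    summary = f"{len(ctx['findings'])} finding(s), {len(ctx['references'])} reference(s)"
    return {**ctx, "report": summary}


def run_pipeline(agents: list[Agent], ctx: Context) -> Context:
    """Run agents in order, passing each one the context built so far."""
    for agent in agents:
        ctx = agent(ctx)
    return ctx
```

Each agent returns a new dictionary instead of mutating its input, which keeps the hand-off between steps easy to inspect and test.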
Agent 1 — OCT Analysis
Built using:
PyTorch, OpenCV, and Gemini multimodal reasoning
This agent processes microscopic OCT images and generates clinical findings.
Agent 2 — Medical RAG
We integrated:
PubMed APIs, embedding pipelines, and vector search systems
to provide evidence-based medical reasoning.
Agent 3 — Conversational AI
We developed a low-latency conversational AI system capable of:
contextual understanding, streaming responses, and interactive medical reasoning.
Agent 4 — Report Generation
This agent generates:
AI-powered operation reports, summaries, recommendations, and structured outputs.
Agent 5 — Database Storage
We implemented secure persistence using MySQL to store:
surgeries, patient history, AI analysis, and reports.
Example schema:
patients, surgeries, oct_scans, ai_analysis, reports, audit_logs
Gemini 3 Integration
We integrated the Google Gemini 3 API as the central multimodal reasoning engine.
Gemini 3 powers:
multimodal OCT understanding, conversational interaction, report generation, and contextual medical reasoning.
API keys were securely managed using environment variables:
GEMINI_API_KEY
CI/CD Infrastructure
Challenges we ran into
Building a real-time AI surgical copilot introduced several technical and architectural challenges.
Real-Time OCT Processing
OCT images are extremely detailed and computationally intensive.
We faced challenges involving:
high-resolution image processing, GPU memory optimization, and low-latency inference.
We had to balance:
Accuracy ↔ Inference Speed
to maintain fast and reliable AI analysis.
Multi-Agent Coordination
Coordinating five specialized AI agents using ADK required:
workflow orchestration, context synchronization, and reliable inter-agent communication.
PubMed RAG Complexity
Building a medical RAG system introduced challenges such as:
retrieving relevant medical papers, semantic embedding generation, and reducing irrelevant retrieval results.
Low-Latency Conversational AI
The conversational AI agent required near real-time interaction.
We optimized:
asynchronous APIs, streaming responses, and backend communication pipelines
to minimize latency.
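Streaming reduces perceived latency because the client can show the first words before the full answer exists. A minimal sketch with `asyncio` follows; `stream_answer` and `collect` are hypothetical names, and the word-by-word split merely simulates chunks arriving from a model.

```python
import asyncio
from typing import AsyncIterator


async def stream_answer(text: str, delay: float = 0.0) -> AsyncIterator[str]:
    """Yield the answer one word at a time, simulating model output chunks."""
    for word in text.split():
        await asyncio.sleep(delay)  # placeholder for waiting on the next chunk
        yield word


async def collect(text: str) -> list[str]:
    """Consume the stream; a real client would forward each chunk immediately."""
    return [chunk async for chunk in stream_answer(text)]
```

In a web backend the same async generator could feed a streaming HTTP response instead of being collected into a list.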
Healthcare AI Reliability
Medical AI systems require:
explainability, evidence-backed outputs, and reduced hallucinations.
This is why combining:
Gemini 3, PubMed retrieval, and structured outputs
became critical in our architecture.
Hackathon Time Constraints
Building an enterprise-grade healthcare AI platform within a hackathon timeline was challenging.
Using the MeDo platform helped us accelerate:
workflow integration, dashboard development, and rapid AI prototyping.
Accomplishments that we're proud of
We are proud of successfully building a highly ambitious AI-powered ophthalmology surgical copilot that combines:
multimodal AI, medical RAG, conversational intelligence, and agentic orchestration.
Key Achievements
Successfully Built a 5-Agent AI System
We designed and orchestrated:
OCT analysis, medical retrieval, conversational AI, report generation, and patient intelligence agents.
Integrated Gemini 3 Multimodal AI
We successfully integrated the Google Gemini 3 API for:
multimodal reasoning, conversational interaction, and contextual medical understanding.
Built a PubMed-Powered Medical RAG Pipeline
We created an evidence-based medical retrieval system capable of providing:
contextual recommendations, ophthalmology insights, and semantic research retrieval.
Implemented Real-Time Conversational AI
The platform supports:
low-latency AI responses, contextual conversations, and interactive OCT explanations.
Automated Surgical Report Generation
We built an AI-powered reporting pipeline capable of generating:
operation summaries, recommendations, and structured clinical reports.
Designed Production-Oriented Infrastructure
We implemented:
CI/CD pipelines, Docker containers, Kubernetes deployments, and scalable AI infrastructure.
Rapid Development with MeDo
Using the MeDo platform helped us rapidly prototype and integrate complex AI workflows efficiently.
What we learned
This project taught us valuable lessons across:
healthcare AI, multimodal reasoning, distributed systems, and real-time agent orchestration.
Multimodal AI Is Powerful for Healthcare
We learned that combining:
OCT imaging, medical knowledge, conversational AI, and surgical context
creates significantly more intelligent healthcare systems.
Medical Imaging + Medical Knowledge + AI Reasoning = Clinical Intelligence
Agentic AI Improves Modularity
Using ADK taught us the value of modular AI architectures.
Separating responsibilities into specialized agents improved:
scalability, maintainability, and orchestration flexibility.
RAG Improves AI Reliability
Integrating PubMed-powered RAG improved:
factual grounding, explainability, and evidence-based outputs.
User Query → Vector Search → PubMed Retrieval → Gemini Reasoning
Real-Time AI Requires Latency Optimization
We learned the importance of:
asynchronous APIs, streaming responses, GPU optimization, and efficient backend communication.
Prompt Engineering Matters
Working with the Google Gemini 3 API taught us how important:
prompt design, multimodal context management, and structured reasoning
are for generating reliable medical outputs.
Healthcare AI Requires Explainability
Medical systems require:
transparency, traceable reasoning, and evidence-backed insights.
This reinforced the importance of combining:
AI reasoning, structured outputs, and PubMed retrieval.
MeDo Accelerated Innovation
Using the MeDo platform helped us:
rapidly prototype workflows, simplify integrations, and focus more on innovation.
What's next for AI surgical copilot
Our long-term vision is to evolve the platform into a fully intelligent real-time ophthalmology surgical copilot integrated directly into surgical workflows.
Real-Time Surgical Assistance
We plan to integrate:
surgical microscope feeds, live OT video streams, and real-time surgical imaging.
Live Surgical Feed → AI Analysis → Real-Time Surgical Guidance
Predictive Surgical Intelligence
Future versions will focus on:
complication prediction, surgical risk analysis, and predictive ophthalmology intelligence.
Historical Data + Real-Time Imaging → Predictive Surgical Intelligence
Voice-Based Surgical AI
We aim to build hands-free conversational AI systems for operating rooms.
Example:
Surgeon: "What complications are likely?"
AI: "Posterior capsule rupture risk is moderately elevated."
AR/VR Surgical Guidance
We plan to explore:
augmented reality overlays, AI-assisted surgical visualization, and intelligent microscope guidance.
Hospital Integration
Future goals include:
EMR/EHR integration, hospital deployment, and clinical validation with ophthalmologists.
Scaling the Agentic AI Platform
We also plan to expand the ADK-based architecture with additional agents for:
predictive analytics, workflow optimization, compliance monitoring, and intelligent surgical documentation.
Final Vision
Our vision is to build a next-generation AI surgical intelligence platform that combines:
multimodal AI, medical reasoning, conversational intelligence, predictive analytics, and scalable healthcare infrastructure
to redefine the future of AI-assisted ophthalmology surgery.
Built With
- adk
- agenticai
- javascript
- medo
- mysql
- node.js
- python
- rag
- restapi