Inspiration

Ophthalmology surgeries generate massive amounts of high-resolution OCT scans and microscopic surgical data, yet surgeons still spend significant time manually analyzing scans, reviewing medical literature, documenting procedures, and managing patient records.

We were inspired by the potential of combining:

multimodal AI, agentic systems, medical RAG pipelines, and real-time conversational intelligence

to build an intelligent surgical copilot for ophthalmology.

Our vision was to create a next-generation healthcare AI system capable of:

analyzing OCT scan images in real time, retrieving evidence-based medical knowledge from PubMed, assisting surgeons interactively, generating automated surgical reports, and maintaining longitudinal patient intelligence.

By integrating the Google Gemini 3 API with ADK-based multi-agent orchestration, we aimed to explore how AI can support future surgical workflows in healthcare.

What it does

Our project is an AI-powered ophthalmology surgical copilot built using a 5-agent AI architecture.

The system processes incoming OCT scan microscopic images, performs AI-driven analysis, retrieves medical evidence from PubMed, enables real-time interaction between surgeons and AI, generates operation reports, and securely stores patient surgical intelligence.

Core Features

  1. OCT Analysis Agent

The first AI agent analyzes incoming OCT scan images and detects:

retinal abnormalities, ophthalmic structural changes, and possible surgical risks.

It generates structured AI findings with confidence-based outputs.

Input: OCT Scan Image
Output: Structured Clinical Analysis
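As a rough illustration of what "structured findings with confidence-based outputs" could look like, here is a minimal sketch. The class names, field names, and sample values are hypothetical placeholders, not the project's actual schema or real model output.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Finding:
    label: str         # placeholder text, not real model output
    confidence: float  # assumed to be a score in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


@dataclass
class OctAnalysis:
    scan_id: str
    findings: list = field(default_factory=list)

    def above(self, threshold: float) -> list:
        """Return only the findings at or above the given confidence."""
        return [f for f in self.findings if f.confidence >= threshold]

    def to_json(self) -> str:
        """Serialize the whole analysis, including nested findings."""
        return json.dumps(asdict(self), indent=2)


analysis = OctAnalysis(
    scan_id="scan-001",
    findings=[
        Finding("retinal thickening", 0.91),
        Finding("possible epiretinal membrane", 0.42),
    ],
)
print(analysis.to_json())
```

Keeping the confidence next to each label lets downstream agents filter or flag low-confidence findings instead of treating every output as equally reliable.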

  2. PubMed-Powered Medical RAG Agent

The second agent combines OCT findings with evidence-based medical knowledge retrieved from PubMed using a vector database.

The workflow:

PubMed Papers → Embeddings → Vector Search → Medical Retrieval

This helps generate:

evidence-backed insights, contextual surgical explanations, and medical recommendations.
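The embed-then-search workflow can be sketched in a few lines. This is a toy version under loud assumptions: the word-count `embed` function stands in for a real embedding model, the in-memory `VectorIndex` stands in for a vector database, and the document IDs and texts are invented placeholders rather than real PubMed records.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real pipeline would call a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (doc_id, text, vector)

    def add(self, doc_id: str, text: str) -> None:
        self._items.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 2) -> list:
        """Return the k most similar documents as (doc_id, text, score)."""
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc_id, text, round(score, 3)) for score, doc_id, text in scored[:k]]


index = VectorIndex()
index.add("doc-1", "macular edema treatment outcomes after cataract surgery")
index.add("doc-2", "epiretinal membrane peeling surgical technique")
index.add("doc-3", "corneal topography in keratoconus screening")

top = index.search("macular edema after surgery", k=1)
print(top[0][0])
```

Swapping `embed` for a learned embedding model changes the quality of the ranking but not the overall shape of the pipeline.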

  3. Real-Time Conversational AI Agent

The third agent enables surgeons to interact with the AI system using natural language.

Example queries:

“What abnormalities are detected?”
“Summarize the OCT findings.”
“Retrieve related PubMed references.”

The AI responds in near real time using Gemini 3 multimodal reasoning.

  4. Surgical Report Generation Agent

The fourth agent automatically generates structured operation reports containing:

OCT findings, AI observations, recommendations, surgical summaries, and PubMed references.

Output formats:

PDF, JSON, and HTML
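A minimal sketch of rendering one report in two of these formats using only the standard library. The report keys (`patient_id`, `findings`, `summary`) and the sample values are assumptions made for illustration. PDF output is left out because it would need a third-party library.

```python
import json
from html import escape


def render_json(report: dict) -> str:
    """Serialize the report as stable, human-readable JSON."""
    return json.dumps(report, indent=2, sort_keys=True)


def render_html(report: dict) -> str:
    """Render the report as a small HTML fragment, escaping all values."""
    items = "\n".join(f"  <li>{escape(item)}</li>" for item in report["findings"])
    return (
        f"<h1>Operation report: {escape(report['patient_id'])}</h1>\n"
        f"<ul>\n{items}\n</ul>\n"
        f"<p>{escape(report['summary'])}</p>"
    )


# Hypothetical report content used only to exercise the renderers.
report = {
    "patient_id": "patient-001",
    "findings": ["thickness < expected range", "no fluid detected"],
    "summary": "Placeholder summary text.",
}

print(render_json(report))
print(render_html(report))
```

Escaping every value before it reaches the HTML matters here because findings are generated text and may contain characters such as `<` or `&`.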

  5. Database Storage Agent

The fifth agent securely stores:

patient information, operation history, OCT scan metadata, AI analysis, and generated reports

inside a MySQL database.
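The sketch below shows how such records might be persisted in one transaction. It uses Python's built-in `sqlite3` module purely as a stand-in so the example runs anywhere; the project itself uses MySQL. The table names follow the example schema named in this write-up, while every column name and sample value is an assumption.

```python
import sqlite3

# Assumed columns; only the table names come from the write-up.
SCHEMA = """
CREATE TABLE patients (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE surgeries (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients(id),
    performed_on TEXT NOT NULL
);
CREATE TABLE oct_scans (
    id INTEGER PRIMARY KEY,
    surgery_id INTEGER NOT NULL REFERENCES surgeries(id),
    file_path TEXT NOT NULL
);
CREATE TABLE ai_analysis (
    id INTEGER PRIMARY KEY,
    scan_id INTEGER NOT NULL REFERENCES oct_scans(id),
    findings_json TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)


def store_analysis(conn, patient_name, performed_on, file_path, findings_json):
    """Insert patient, surgery, scan, and analysis rows in a single transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        patient_id = conn.execute(
            "INSERT INTO patients (name) VALUES (?)", (patient_name,)
        ).lastrowid
        surgery_id = conn.execute(
            "INSERT INTO surgeries (patient_id, performed_on) VALUES (?, ?)",
            (patient_id, performed_on),
        ).lastrowid
        scan_id = conn.execute(
            "INSERT INTO oct_scans (surgery_id, file_path) VALUES (?, ?)",
            (surgery_id, file_path),
        ).lastrowid
        conn.execute(
            "INSERT INTO ai_analysis (scan_id, findings_json) VALUES (?, ?)",
            (scan_id, findings_json),
        )
    return scan_id


scan_id = store_analysis(
    conn, "Test Patient", "2024-01-01", "scans/example.png", '{"findings": []}'
)
print(scan_id)
```

Parameterized queries keep patient data out of the SQL string itself. Common MySQL drivers use `%s` placeholders instead of `?`, so the statements would need that small adjustment.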

CI/CD Pipeline

The platform also includes a professional AI deployment pipeline:

Code Push → Automated Testing → Docker Build → Kubernetes Deployment

This pipeline supports:

continuous integration, scalable AI deployment, automated testing, GPU inference infrastructure, and rollback support.

How we built it

We built the platform using:

Gemini 3 multimodal AI, ADK (Agent Development Kit), FastAPI, React + Next.js, PubMed-powered RAG, vector databases, Docker + Kubernetes, and the MeDo platform.

Architecture

The system follows a modular microservice architecture:

Incoming OCT Images
↓
AI Inference Pipeline
↓
ADK Agent Orchestrator
↓
5 Specialized AI Agents
↓
Gemini 3 Reasoning Layer
↓
RAG + PubMed Retrieval
↓
Real-Time AI Interaction
↓
Report Generation + MySQL Storage

Agentic AI System

We implemented a coordinated 5-agent AI workflow using ADK.
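To make the coordination idea concrete, here is a plain-Python sketch of five agents passing a shared context through a pipeline. It deliberately does not use the ADK API; the function names, context keys, and placeholder values are all hypothetical, and each agent body is a stub for the real work.

```python
from typing import Callable

Context = dict
Agent = Callable[[Context], Context]


def oct_analysis(ctx: Context) -> Context:
    # Stub: a real agent would run the imaging model on the scan.
    return {**ctx, "findings": ["placeholder finding"]}


def medical_rag(ctx: Context) -> Context:
    # Stub: a real agent would query the vector store for each finding.
    refs = [f"reference for {finding}" for finding in ctx["findings"]]
    return {**ctx, "references": refs}


def conversation(ctx: Context) -> Context:
    # Stub: a real agent would answer the surgeon's question with the model.
    answer = f"{len(ctx['findings'])} finding(s) detected."
    return {**ctx, "answer": answer}


def report_generation(ctx: Context) -> Context:
    report = {"findings": ctx["findings"], "references": ctx["references"]}
    return {**ctx, "report": report}


def storage(ctx: Context) -> Context:
    # Stub: a real agent would write the report to the database.
    return {**ctx, "stored": True}


PIPELINE = [oct_analysis, medical_rag, conversation, report_generation, storage]


def run_pipeline(agents: list, ctx: Context) -> Context:
    """Run each agent in order, passing the updated context along."""
    trace = []
    for agent in agents:
        ctx = agent(ctx)
        trace.append(agent.__name__)
    return {**ctx, "trace": trace}


result = run_pipeline(PIPELINE, {"scan_id": "scan-001"})
print(result["trace"])
```

Because each agent returns a new context rather than mutating shared state, a failing step can be retried or replaced without affecting the others.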

Agent 1 — OCT Analysis

Built using:

PyTorch, OpenCV, and Gemini multimodal reasoning

This agent processes microscopic OCT images and generates clinical findings.

Agent 2 — Medical RAG

We integrated:

PubMed APIs, embedding pipelines, and vector search systems

to provide evidence-based medical reasoning.

Agent 3 — Conversational AI

We developed a low-latency conversational AI system capable of:

contextual understanding, streaming responses, and interactive medical reasoning.

Agent 4 — Report Generation

This agent generates:

AI-powered operation reports, summaries, recommendations, and structured outputs.

Agent 5 — Database Storage

We implemented secure persistence using MySQL to store:

surgeries, patient history, AI analysis, and reports.

Example schema:

patients, surgeries, oct_scans, ai_analysis, reports, audit_logs

Gemini 3 Integration

We integrated the Google Gemini 3 API as the central multimodal reasoning engine.

Gemini 3 powers:

multimodal OCT understanding, conversational interaction, report generation, and contextual medical reasoning.

API keys were securely managed using environment variables:

GEMINI_API_KEY

CI/CD Infrastructure

Challenges we ran into

Building a real-time AI surgical copilot introduced several technical and architectural challenges.

Real-Time OCT Processing

OCT images are extremely detailed and computationally intensive.

We faced challenges involving:

high-resolution image processing, GPU memory optimization, and low-latency inference.

We had to balance:

Accuracy ↔ Inference Speed

to maintain fast and reliable AI analysis.

Multi-Agent Coordination

Coordinating five specialized AI agents using ADK required:

workflow orchestration, context synchronization, and reliable inter-agent communication.

PubMed RAG Complexity

Building a medical RAG system introduced challenges such as:

retrieving relevant medical papers, semantic embedding generation, and reducing irrelevant retrieval results.

Low-Latency Conversational AI

The conversational AI agent required near real-time interaction.

We optimized:

asynchronous APIs, streaming responses, and backend communication pipelines

to minimize latency.
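Streaming is the main lever for perceived latency: the client starts reading as soon as the first chunk is ready. A minimal sketch with `asyncio` follows. The canned reply stands in for a streaming model call, and the function names are hypothetical.

```python
import asyncio
from typing import AsyncIterator


async def stream_answer(question: str) -> AsyncIterator[str]:
    """Yield a canned reply word by word; stands in for a streaming model call."""
    reply = f"Received question: {question}"
    for token in reply.split():
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield token


async def collect(question: str) -> list:
    """Consume the stream; a web handler would forward each chunk to the client."""
    chunks = []
    async for token in stream_answer(question):
        chunks.append(token)
    return chunks


tokens = asyncio.run(collect("Summarize the OCT findings."))
print(" ".join(tokens))
```

In a FastAPI backend, an async generator like `stream_answer` could be wrapped in a `StreamingResponse` so chunks reach the browser as they are produced.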

Healthcare AI Reliability

Medical AI systems require:

explainability, evidence-backed outputs, and reduced hallucinations.

This is why combining:

Gemini 3, PubMed retrieval, and structured outputs

became critical in our architecture.
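One way to combine retrieval with structured outputs is to number the retrieved passages in the prompt and reject any reply that does not cite them. The sketch below assumes a JSON reply with `answer` and `citations` keys. That format, the function names, and the sample strings are illustrative assumptions, and no model is actually called.

```python
import json


def build_grounded_prompt(question: str, findings: list, passages: list) -> str:
    """Assemble a prompt that restricts the model to numbered sources."""
    sources = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer using only the numbered sources and cite them like [1].\n"
        f"OCT findings: {', '.join(findings)}\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}\n"
        'Reply as JSON with keys "answer" and "citations".'
    )


def parse_structured_reply(raw: str, num_sources: int) -> dict:
    """Validate a model reply; raise ValueError if it is not properly grounded."""
    data = json.loads(raw)
    if not isinstance(data.get("answer"), str):
        raise ValueError("reply must contain a string 'answer'")
    citations = data.get("citations")
    if not citations or any(
        not isinstance(c, int) or not 1 <= c <= num_sources for c in citations
    ):
        raise ValueError("reply must cite at least one valid source number")
    return data


passages = ["placeholder passage one", "placeholder passage two"]
prompt = build_grounded_prompt(
    "What abnormalities are detected?", ["placeholder finding"], passages
)
reply = parse_structured_reply(
    '{"answer": "See source one.", "citations": [1]}', len(passages)
)
print(prompt)
```

Rejecting uncited or mis-cited replies does not guarantee correctness, but it turns an unverifiable answer into one a clinician can trace back to a source.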

Hackathon Time Constraints

Building an enterprise-grade healthcare AI platform within a hackathon timeline was challenging.

Using the MeDo platform helped us accelerate:

workflow integration, dashboard development, and rapid AI prototyping.

Accomplishments that we're proud of

We are proud of successfully building a highly ambitious AI-powered ophthalmology surgical copilot that combines:

multimodal AI, medical RAG, conversational intelligence, and agentic orchestration.

Key Achievements

Successfully Built a 5-Agent AI System

We designed and orchestrated:

OCT analysis, medical retrieval, conversational AI, report generation, and patient intelligence agents.

Integrated Gemini 3 Multimodal AI

We successfully integrated the Google Gemini 3 API for:

multimodal reasoning, conversational interaction, and contextual medical understanding.

Built a PubMed-Powered Medical RAG Pipeline

We created an evidence-based medical retrieval system capable of providing:

contextual recommendations, ophthalmology insights, and semantic research retrieval.

Implemented Real-Time Conversational AI

The platform supports:

low-latency AI responses, contextual conversations, and interactive OCT explanations.

Automated Surgical Report Generation

We built an AI-powered reporting pipeline capable of generating:

operation summaries, recommendations, and structured clinical reports.

Designed Production-Oriented Infrastructure

We implemented:

CI/CD pipelines, Docker containers, Kubernetes deployments, and scalable AI infrastructure.

Rapid Development with MeDo

Using the MeDo platform helped us rapidly prototype and integrate complex AI workflows efficiently.

What we learned

This project taught us valuable lessons across:

healthcare AI, multimodal reasoning, distributed systems, and real-time agent orchestration.

Multimodal AI Is Powerful for Healthcare

We learned that combining:

OCT imaging, medical knowledge, conversational AI, and surgical context

creates significantly more intelligent healthcare systems.

Medical Imaging + Medical Knowledge + AI Reasoning = Clinical Intelligence

Agentic AI Improves Modularity

Using ADK taught us the value of modular AI architectures.

Separating responsibilities into specialized agents improved:

scalability, maintainability, and orchestration flexibility.

RAG Improves AI Reliability

Integrating PubMed-powered RAG improved:

factual grounding, explainability, and evidence-based outputs.

User Query → Vector Search → PubMed Retrieval → Gemini Reasoning

Real-Time AI Requires Latency Optimization

We learned the importance of:

asynchronous APIs, streaming responses, GPU optimization, and efficient backend communication.

Prompt Engineering Matters

Working with the Google Gemini 3 API taught us how important:

prompt design, multimodal context management, and structured reasoning

are for generating reliable medical outputs.

Healthcare AI Requires Explainability

Medical systems require:

transparency, traceable reasoning, and evidence-backed insights.

This reinforced the importance of combining:

AI reasoning, structured outputs, and PubMed retrieval.

MeDo Accelerated Innovation

Using the MeDo platform helped us:

rapidly prototype workflows, simplify integrations, and focus more on innovation.

What's next for AI surgical copilot

Our long-term vision is to evolve the platform into a fully intelligent real-time ophthalmology surgical copilot integrated directly into surgical workflows.

Real-Time Surgical Assistance

We plan to integrate:

surgical microscope feeds, live OT video streams, and real-time surgical imaging.

Live Surgical Feed → AI Analysis → Real-Time Surgical Guidance

Predictive Surgical Intelligence

Future versions will focus on:

complication prediction, surgical risk analysis, and predictive ophthalmology intelligence.

Historical Data + Real-Time Imaging → Predictive Surgical Intelligence

Voice-Based Surgical AI

We aim to build hands-free conversational AI systems for operating rooms.

Example:

Surgeon: "What complications are likely?"
AI: "Posterior capsule rupture risk is moderately elevated."

AR/VR Surgical Guidance

We plan to explore:

augmented reality overlays, AI-assisted surgical visualization, and intelligent microscope guidance.

Hospital Integration

Future goals include:

EMR/EHR integration, hospital deployment, and clinical validation with ophthalmologists.

Scaling the Agentic AI Platform

We also plan to expand the ADK-based architecture with additional agents for:

predictive analytics, workflow optimization, compliance monitoring, and intelligent surgical documentation.

Final Vision

Our vision is to build a next-generation AI surgical intelligence platform that combines:

multimodal AI, medical reasoning, conversational intelligence, predictive analytics, and scalable healthcare infrastructure

to redefine the future of AI-assisted ophthalmology surgery.
