ORION — Operating Room Intelligent Orchestration Node [Live Agents + UI Navigation]
From Idea to Impact
The Problem - The surgeon's hands are locked in. They cannot type, click, or interact with any computer system.
Robotic surgery solves many physical limitations of open procedures, but creates a paradox: the surgeon gains precision but loses access. For hours at a time, their hands are locked on the da Vinci controls inside a sterile field. Every critical piece of information — patient labs, CT scans, drug safety, complication protocols — is one broken scrub or one distracted circulating nurse away.
The evidence:
- WHO checklists are skipped under pressure — implementing the checklist reduces mortality by 47% and complications by 36%, yet consistent execution remains elusive in real OR conditions
- Blood loss is underestimated by over 50% — surgeons are wrong by more than 25% in 95% of cases, delaying transfusion decisions at the moment they matter most
- Operative notes are written from memory 15+ days later — vs. 28 minutes with real-time voice templates — sacrificing accuracy exactly when it matters for medicolegal and continuity-of-care purposes
- 1 in 20 drug administrations has an error — 80% preventable with a simple cross-check against the patient's allergies and current medications
- Critical View of Safety is rarely confirmed — only 23.1% of laparoscopic cholecystectomies have CVS documented before bile duct division, the single step that prevents the majority of bile duct injuries
These aren't edge cases. They are systematic, evidence-backed gaps that occur in operating rooms every day — and they are all solvable with voice-directed AI.
The Solution - a surgical Co-Pilot that listens, understands, thinks & ACTS on the surgeon's behalf
Surgeon's voice input → Live Agents response → UI Navigation
ORION is a voice-activated surgical co-pilot that listens continuously throughout the procedure. The surgeon speaks naturally — no button press, no sterile field break — and ORION responds in under a second with the right information on the console display and a calm, brief spoken confirmation.
The core insight: the Gemini Live API's native audio dialog model supports simultaneous PCM audio input, audio output, function calling, and real-time image streaming in a single bidirectional session.
That combination — listen, see, think, and speak simultaneously — is exactly what an OR co-pilot requires, making ORION the surgeon's hands on screen.
ORION maps each OR domain to a specialist: pre-op briefing, safety timeout, blood loss tracking, drug safety, anatomy guidance, complication protocols, operative documentation, SBAR handoff, and visual field analysis. The root orchestrator routes intent to the right agent in real time. The surgeon never thinks about routing — they just talk.
What It Does [Live Agents + UI Navigation]
ORION is a real-time surgical co-pilot that listens continuously throughout the procedure and responds to natural voice commands:
| Intent | Example Commands | Response / Action |
|---|---|---|
| Patient data on demand | "Show allergies", "Display all labs" | Clinical cards appear instantly on the console |
| CT imaging navigation | "Jump to the tumor", "Next 5 slices" | Opens CT-view panel, navigates to the requested slice or landmark |
| 3D anatomy reference | "Show the bronchus", "Rotate the model left", "Spin it on Y axis", "Reset the anatomy view" | Live anatomical context rendered in 3D model panel |
| WHO Safety Timeout | "Run the timeout" | Guided checklist with verbal confirmation of all items |
| Pre-op briefing | "Brief me on this case" | 50-word structured summary from patient record |
| Blood loss tracking | "Blood loss 200 mL", "How much blood have we lost?" | Running EBL with threshold alerts at 15%, 25%, 40% |
| Drug safety checks | "Is cefazolin safe for this patient?" | Allergy cross-check with alternatives if contraindicated |
| Complication protocols | "I have bleeding 1000 mL, how to handle this complication" | Step-by-step SCAT protocol read aloud, anatomy highlighted |
| Surgical phase checklist | "What phase are we in", "Show vascular dissection checklist" | Phase checklist tile with steps and warnings |
| Anatomy guidance | "What's at risk here?", "What's the danger zone for this phase?" | Phase-aware anatomical pearls, CT landmark jump |
| Live visual analysis | "What do you see?", "Enter visual assistance mode", "Is there bleeding?" | Surgical video + full screen capture streamed to Gemini; Visual Assistant reads the operative field, identifies structures, reads external monitors and EMR |
| Intraoperative documentation | "Log CVS confirmed", "Note: specimen removed" | Timestamped event log entry |
| Capture surgical photo | "Document this view", "Capture a photo" | Timestamped event log entry with captured image |
| Operative report | "Generate the report" | Narrative summary from session log |
| SBAR handoff | "Prepare handoff" | Structured Situation, Background, Assessment, Recommendation sign-out checklist for shift changes |
| Hide selected/all panels | "Hide patient data", "Hide everything" | Hides the named panel or all panels |
All outputs appear simultaneously as voice responses and visual cards on the surgical console. The surgeon never types, clicks, or breaks scrub.
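The blood-loss tracking row above can be sketched as a small threshold tracker. The 70 mL/kg estimated-blood-volume rule and the alert wording are assumptions for illustration; only the 15% / 25% / 40% tiers come from the project description:

```python
# Sketch of the EBL threshold logic. The 70 mL/kg estimated blood volume
# rule and the alert text are illustrative assumptions, not ORION's exact
# implementation; the 15% / 25% / 40% tiers match the feature table.

class EBLTracker:
    THRESHOLDS = (0.15, 0.25, 0.40)

    def __init__(self, weight_kg: float, ml_per_kg: float = 70.0):
        self.blood_volume_ml = weight_kg * ml_per_kg
        self.total_ml = 0.0
        self._fired: set[float] = set()  # tiers already announced

    def log(self, ml: float) -> list[str]:
        """Record a blood-loss entry; return any newly crossed alerts."""
        self.total_ml += ml
        alerts = []
        for tier in self.THRESHOLDS:
            if self.total_ml >= tier * self.blood_volume_ml and tier not in self._fired:
                self._fired.add(tier)
                alerts.append(
                    f"EBL {self.total_ml:.0f} mL has crossed "
                    f"{tier:.0%} of estimated blood volume"
                )
        return alerts

tracker = EBLTracker(weight_kg=80)  # ~5600 mL estimated blood volume
print(tracker.log(200))             # → [] (below all tiers)
print(tracker.log(700))             # total 900 mL crosses the 15% tier (840 mL)
```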
Features
| ✅ Features | ✅ Features |
|---|---|
| Live Agents audio interaction | Barge-in handled naturally |
| Context-aware Native audio dialog | UI Navigation: Visual UI Understanding & Interaction |
| Custom voice persona | Grounding: prompt hardening & before/after tool callback |
| Live video streaming & Screen Share (1fps send_realtime) | Error handling caught mid-stream |
| Multimodal: simultaneous input | Automated deployment |
| Transcription: Input and output audio | ADK Multi-agent & multi-tool orchestration |
How It Was Built
AI Core — Gemini Live API + Google ADK
The entire system runs on gemini-live-2.5-flash-native-audio via Vertex AI's Live API. This is the only model that supports simultaneous PCM audio input + audio output + function calling + image streaming in a single bidirectional session — exactly what a real-time OR environment demands.
Google ADK (v1.26.0) structures the intelligence as a nine-agent hierarchy:
- `ORION_Orchestrator` (root) — receives all voice input, applies wake-word filtering, calls 22 direct tools for single-action commands, and routes to specialist agents via `transfer_to_agent()` for complex multi-step protocols
- 8 specialist sub-agents: `Briefing_Agent`, `Timeout_Agent`, `Report_Agent`, `Complication_Advisor`, `EBL_Tracker`, `Drug_Checker`, `Anatomy_Spotter`, `Handoff_Agent`
- `Screen_Advisor` (Visual Assistant) — ORION's visual intelligence layer; activated system-wide when vision commands are issued, receives both the live surgical video feed (320×240 at 1 fps) and a full screen capture stream (768×768 at 1 fps via `getDisplayMedia`)
Transport Layer — FastAPI WebSocket
Each browser connection runs two concurrent async tasks:
- `upstream_task` — receives 16 kHz PCM audio chunks and JPEG image frames, buffers audio into 100 ms chunks, and forwards everything to Vertex AI via `LiveRequestQueue`
- `downstream_task` — receives ADK events, serializes them with `model_dump_json(by_alias=True)`, and streams JSON to the browser
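The two-task shape can be sketched with stdlib asyncio alone — plain queues stand in here for the browser WebSocket and ADK's `LiveRequestQueue`, and `asyncio.FIRST_EXCEPTION` gives the clean-teardown behavior the Challenges section discusses:

```python
import asyncio

# Minimal sketch of the two-task session shape. Queues stand in for the
# browser WebSocket and Vertex AI LiveRequestQueue. FIRST_EXCEPTION (rather
# than FIRST_COMPLETED) keeps both tasks alive across turns and tears the
# session down only when one task actually fails.

async def upstream(ws_in: asyncio.Queue, live_out: asyncio.Queue):
    while True:
        chunk = await ws_in.get()       # 16 kHz PCM audio or JPEG frame
        if chunk is None:               # client closed the socket
            raise ConnectionError("client disconnected")
        await live_out.put(chunk)       # forward to the model session

async def downstream(events: asyncio.Queue, ws_out: asyncio.Queue):
    while True:
        event = await events.get()      # model/agent event stream
        await ws_out.put(event)         # serialized JSON to the browser

async def session() -> int:
    ws_in, live_out, events, ws_out = (asyncio.Queue() for _ in range(4))
    await ws_in.put(b"\x00" * 3200)     # one 100 ms chunk (16 kHz, 16-bit mono)
    await ws_in.put(None)               # simulate a disconnect
    tasks = [asyncio.create_task(upstream(ws_in, live_out)),
             asyncio.create_task(downstream(events, ws_out))]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)
    for t in pending:
        t.cancel()                      # clean teardown of the surviving task
    for t in done:
        t.exception()                   # consume the ConnectionError
    return live_out.qsize()             # chunks forwarded before teardown

forwarded = asyncio.run(session())      # → 1
```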
Grounding & Safety Layer
Every tool call passes through ADK before/after callbacks. Argument whitelists validate field names, landmark names, phase names, and structure names before any tool executes. The model is instructed never to state clinical values from memory — it always calls the tool.
Frontend — Surgical Console
Vanilla HTML/CSS/JS with no framework. Four-panel dynamic tile layout that expands/contracts as panels show and hide. Real-time routing log, live transcript, agent chip highlighting, tool call metrics. Three.js r128 for 3D GLB rendering. CT PNG slices rendered on canvas.
Infrastructure & CI/CD
Cloud Run + Cloud Build CI/CD. Every push to main automatically builds, pushes to Artifact Registry, and deploys. GCS hosts the CT slices, 3D model, and surgical videos.
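A hypothetical `cloudbuild.yaml` matching that pipeline might look like the following — the region, Artifact Registry repo, and service names are placeholders, not the project's actual configuration:

```yaml
# Illustrative Cloud Build pipeline: build image, push to Artifact Registry,
# deploy to Cloud Run. Region/repo/service names are placeholders.
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t',
           'us-central1-docker.pkg.dev/$PROJECT_ID/orion/orion:$COMMIT_SHA', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push',
           'us-central1-docker.pkg.dev/$PROJECT_ID/orion/orion:$COMMIT_SHA']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args: ['run', 'deploy', 'orion',
           '--image', 'us-central1-docker.pkg.dev/$PROJECT_ID/orion/orion:$COMMIT_SHA',
           '--region', 'us-central1']
images:
  - 'us-central1-docker.pkg.dev/$PROJECT_ID/orion/orion:$COMMIT_SHA'
```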
Data Sources
| Asset | Source | License |
|---|---|---|
| CT imaging (133 slices) | LIDC-IDRI-0001, The Cancer Imaging Archive | CC BY 3.0 |
| 3D anatomy model | NIH 3D Print Exchange / Sketchfab | - |
| Surgical videos | Open-access VATS lobectomy recordings | Per source license |
| Patient record | Synthetic FHIR-compliant demo data — no real clinical information | N/A |
| Drug database | Hardcoded pharmacology rules for 10 common intraoperative drugs | N/A |
| Complication protocols | Structured SCAT protocols derived from open surgical literature | N/A |
Google Cloud Services
| Service | Purpose |
|---|---|
| Vertex AI | Hosts gemini-2.5-flash-preview-native-audio-dialog — live audio, function calling, and image streaming |
| Cloud Run | Serverless container hosting for the FastAPI WebSocket backend |
| Cloud Build | CI/CD pipeline — auto-builds and deploys on every push to main |
| Artifact Registry | Stores Docker images built by Cloud Build |
| Cloud Storage (GCS) | Hosts CT scan slices, 3D anatomy GLB model, and surgical videos |
Challenges
Learning the Gemini Live API / ADK (the expected unknowns)
- Multi-agent live sessions — In a `run_live()` session, ALL agents in the hierarchy must use a native audio model. `gemini-2.5-flash` (text-only) is silently accepted at definition time but causes runtime failures — discovered only after all agent code was written.
- Sub-agent audio routing — Early builds filtered audio events by `event.author === 'ORION_Orchestrator'`. This silenced all sub-agent responses. In ADK's multi-agent live flow, sub-agents generate the audio; audio must be forwarded from all authors.
- `FIRST_EXCEPTION` vs `FIRST_COMPLETED` — Using `asyncio.FIRST_COMPLETED` killed multi-turn sessions after the first turn completed. `FIRST_EXCEPTION` (matching ADK's own implementation) was the fix.
- `transfer_to_agent` is not a callable tool — Early versions defined it as a tool in the root agent's `tools=[]`. This caused `ValueError: tool 'transfer_to_agent' not found`. It's an ADK internal mechanism, not a user-defined tool.
- `getDisplayMedia()` permission dialogs — Calling it on every Screen_Advisor activation triggered a browser permission dialog each time the agent was routed to. Solved by acquiring the stream once and keeping it alive across activations (`activate`/`deactivate`/`teardown` API), with `getDisplayMedia()` called only on first use.
Architecture and Design Challenges
- Zombie sessions — When `downstream_task` caught an exception and returned normally, `upstream_task` kept the WebSocket open indefinitely. The UI showed "active" but ORION had stopped responding. Fixed by re-raising after browser notification, triggering `FIRST_EXCEPTION` and clean teardown with auto-reconnect.
- Screen-share deactivation on sub-agent routing — Vision mode deactivated whenever routing changed to a non-vision agent (e.g., `Complication_Advisor`). Fixed by adding complication and anatomy tools directly to `Screen_Advisor` so it handles those queries without transferring.
- Continuous video cost — Sending 1 fps surgical video frames continuously throughout the session consumed significant Gemini input token budget. Vision mode is now system-managed: both streams activate only when `Screen_Advisor` is the active agent.
- Tool call deduplication — The Live API occasionally fires duplicate function call events within milliseconds. A 4-second deduplication cache (`Map<key, timestamp>`) prevents double-execution of display tools.
- CT/3D discoverability — The model didn't know that `navigate_ct()` and `reset_3d_view()` also show their respective panels (not just navigate/reset). Explicit examples in the root agent instruction resolved this.
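The deduplication cache mentioned above is only a few lines of logic — ORION keeps it as a JS `Map` on the frontend; a Python equivalent for illustration:

```python
import time

# Sketch of the 4-second tool-call deduplication cache. ORION's frontend
# implements this as a JS Map<key, timestamp>; the same logic in Python.

class DedupCache:
    def __init__(self, window_s: float = 4.0):
        self.window_s = window_s
        self._seen: dict[str, float] = {}  # "tool:args" -> last-seen time

    def should_execute(self, tool: str, args: str) -> bool:
        """True only the first time a (tool, args) pair appears in the window.

        The timestamp is refreshed even for suppressed duplicates, so a burst
        of repeats keeps extending the suppression window.
        """
        key = f"{tool}:{args}"
        now = time.monotonic()
        last = self._seen.get(key)
        self._seen[key] = now
        return last is None or now - last > self.window_s

cache = DedupCache()
print(cache.should_execute("show_labs", "{}"))   # → True (first call runs)
print(cache.should_execute("show_labs", "{}"))   # → False (duplicate suppressed)
```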
Accomplishments
- Built a fully working real-time multi-modal AI agent that can listen, see, think, and speak simultaneously in an environment (the OR) where latency and reliability matter more than almost anywhere else
- Nine-agent hierarchy routing correctly across all surgical domains — briefing, timeout, blood loss, drug safety, complications, anatomy, documentation, handoff, and visual analysis — all from natural speech
- Visual Assistant (Screen_Advisor) streams both surgical video and full screen capture to Gemini simultaneously, enabling the model to read external monitors, EMR screens, and operative fields without any API integration with hospital systems
- A grounding layer (ADK before/after callbacks + argument whitelists) that prevents hallucination on clinical data — the model cannot state a lab value it didn't retrieve from a tool
- Full Cloud Run deployment with automated CI/CD — push to `main` and the service is live in ~3 minutes
- Zero-click surgical console: the entire UI is driven by voice. The surgeon can navigate CT scans, rotate 3D models, run WHO protocols, capture photos, and generate operative reports without touching anything
What's Next
- Real EHR integration — Replace synthetic patient data with FHIR-compliant live patient record pull.
- Validated drug database — Replace the hardcoded pharmacology rules with a live API-backed formulary (e.g., FDA DailyMed) with real allergy cross-checks against the patient's current medications.
- Post-op workflow — Extend ORION's session log into a structured FHIR operative note that can be pushed to the EHR directly at case close, solving the 15.6-day documentation delay problem end-to-end.
Built With
- adk
- cloudbuild
- cloudrun
- fastapi
- geminiliveapi
- googleartifactregistry
- html
- javascript
- python
- vertexai
- websocket

