Inspiration

Every 90 minutes, Gaganyaan passes beyond every relay satellite.

For 45 minutes of each orbit: complete silence. No ground control. No family. No mission psychologist. 400 kilometres above everything they have ever known, an astronaut experiencing psychological distress in that window has no one to call, no message to send, no system designed to notice.

This is not a technical limitation. It is a physical reality of orbital mechanics — structural loneliness built into the physics of orbit itself.

ISRO formally identified this gap in problem statement PS-ID-25175 — real-time psychological support for isolated astronauts — one of the hardest human factors challenges in India's human spaceflight programme. Based on publicly available research and documentation, no production implementation of this specification exists. MAITRI is built directly against that specification. (Note: this reflects publicly available information at the time of submission. ISRO has not announced any selected partner for this problem statement, and MAITRI makes no claim of official ISRO endorsement or selection.)

The question that drove the architecture was not "how do we build a chatbot?" It was: "what does it mean for an AI to already be there?"

A chatbot responds to requests, and each request-response exchange takes about 3 seconds round-trip. In a psychological support context, 3 seconds of silence after someone reaches out is not a delay; it is rejection. The system that solves this problem cannot feel like a tool being queried. It must feel like a presence.

Gemini Live's sub-300ms bidirectional audio-video stream is the only technology that makes presence technically possible. Not a faster REST API — a different category of interaction entirely. One where the model is continuously aware, continuously listening, continuously there. The moment the relay link re-establishes after a blackout window, the first voice the astronaut hears is MAITRI's. Not 3 seconds later. Instantly.

That realisation became MAITRI.


What It Does

MAITRI is designed to be the voice India's first astronauts hear during the 45 minutes of every orbit when Earth cannot reach them.

The astronaut experience: MAITRI watches and listens continuously. It notices when voice cadence flattens. When affect shifts from an established session baseline. Before the astronaut has named what they are feeling, MAITRI has already begun responding — not with clinical language, but with presence. A grounding question. A moment of acknowledgement. The signal that something is listening, even here, even now.

The intelligence layer: Gemini Live processes voice and facial expression simultaneously, catching the signals humans miss in themselves. Gemini Flash scores arousal and valence every 5 seconds from the video stream, building a psychological picture across hours rather than moments. A three-state protocol built on Google ADK (BASELINE_MONITORING, ANOMALY_FLAGGED, ACTIVE_INTERVENTION) escalates from passive monitoring to active support to ground alert, matching the triage logic that clinical psychologists use in high-isolation deployments.

The ground control story: While the astronaut speaks with MAITRI, ground psychologists watch in real time. The mission control dashboard streams affect scores, state transitions, and intervention history: the complete psychological picture, delivered the moment the blackout window ends. MAITRI is the first responder. The human psychologist makes every clinical decision from that point forward. This is not a limitation of the system. It is the design of the system. Responsible AI means knowing exactly where the boundary is and building the architecture on that boundary, not around it.


How I Built It

The central architectural question was latency. A REST completion API adds 3–4 seconds between detection and response. In a psychological intervention, 4 seconds is not a delay — it is the pause that confirms the astronaut is alone. Gemini Live's WebSocket stream eliminated that pause. Every subsequent architectural decision follows from that one constraint.

The transport layer: LiveKit handles WebRTC audio and video between the Android client and the Python backend on Cloud Run. WebRTC's jitter buffer, packet loss recovery, and reconnection logic belong in infrastructure — not in application code. LiveKit provides that infrastructure so the application layer can focus entirely on the intelligence layer.

The intelligence layer: Google ADK runs the three-state psychological protocol as a single agent with dynamic prompt injection via LiveRequestQueue. Not three separate agents with handoff logic — one agent whose context shifts as the protocol state escalates. Agent handoff latency at the peak emotional moment of a session is not acceptable. A single agent with injected context delivers zero-latency protocol transitions with full conversation continuity preserved.
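The single-agent approach can be illustrated with a minimal sketch. It uses a plain asyncio.Queue as a stand-in for ADK's LiveRequestQueue, and the hint wording, `build_hint`, and `inject_hint` are hypothetical names chosen for illustration, not the project's actual prompts or functions.

```python
import asyncio
from enum import Enum


class ProtocolState(Enum):
    BASELINE_MONITORING = "BASELINE_MONITORING"
    ANOMALY_FLAGGED = "ANOMALY_FLAGGED"
    ACTIVE_INTERVENTION = "ACTIVE_INTERVENTION"


# Hypothetical hint wording; the real prompts are not shown in this write-up.
STATE_HINTS = {
    ProtocolState.BASELINE_MONITORING: "Stay conversational and unobtrusive.",
    ProtocolState.ANOMALY_FLAGGED: "Slow down and ask one gentle grounding question.",
    ProtocolState.ACTIVE_INTERVENTION: "Say one grounding sentence, then stay silent.",
}


def build_hint(state: ProtocolState) -> str:
    """Format the context hint injected when the protocol state changes."""
    return f"[SYSTEM_CONTEXT] state={state.value}. {STATE_HINTS[state]}"


async def inject_hint(live_queue: asyncio.Queue, state: ProtocolState) -> None:
    """Push the hint onto the queue feeding the already-open session."""
    await live_queue.put(build_hint(state))


async def demo_hints() -> list[str]:
    # Stand-in for the live request queue; the session itself is never restarted.
    live_queue: asyncio.Queue = asyncio.Queue()
    await inject_hint(live_queue, ProtocolState.ANOMALY_FLAGGED)
    await inject_hint(live_queue, ProtocolState.ACTIVE_INTERVENTION)
    return [live_queue.get_nowait() for _ in range(live_queue.qsize())]
```

The point of the sketch is that a state change is just one more message on the same queue, so the conversation history held by the session is untouched.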

The affect pipeline: Gemini Flash on Vertex AI scores arousal and valence from video frames every 5 seconds via a parallel async pipeline completely isolated from the Live session. MAITRI never randomly comments on what it sees in the camera — the scoring is invisible to the conversation, visible only to the state machine and the ground dashboard. The EventDispatcher pattern fans out every affect score simultaneously to the DataChannel overlay, the SSE telemetry queue, Cloud Monitoring custom metrics, and the protocol state machine — four consumers, one score, zero coupling.
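A fan-out dispatcher of this kind could be sketched as follows. This is an assumption-laden illustration, not the project's EventDispatcher: the `AffectScore` fields, the `subscribe` and `dispatch` method names, and the consumer names in the demo are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass(frozen=True)
class AffectScore:
    arousal: float
    valence: float


Consumer = Callable[[AffectScore], Awaitable[None]]


class EventDispatcher:
    def __init__(self) -> None:
        self._consumers: list[Consumer] = []

    def subscribe(self, consumer: Consumer) -> None:
        self._consumers.append(consumer)

    async def dispatch(self, score: AffectScore) -> int:
        """Deliver one score to every consumer concurrently; return the failure count."""
        results = await asyncio.gather(
            *(consumer(score) for consumer in self._consumers),
            return_exceptions=True,  # one failing consumer must not block the others
        )
        return sum(isinstance(result, Exception) for result in results)


async def demo_fanout() -> tuple[list[str], int]:
    received: list[str] = []

    def make_consumer(name: str) -> Consumer:
        async def consumer(score: AffectScore) -> None:
            received.append(name)
        return consumer

    async def broken(score: AffectScore) -> None:
        raise RuntimeError("metrics backend unavailable")

    dispatcher = EventDispatcher()
    for name in ("overlay", "telemetry", "state_machine"):
        dispatcher.subscribe(make_consumer(name))
    dispatcher.subscribe(broken)
    failures = await dispatcher.dispatch(AffectScore(arousal=0.7, valence=-0.4))
    return received, failures
```

Consumers know nothing about each other, and a failure in one (here the simulated metrics consumer) is counted without stopping delivery to the rest.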

The state machine: BASELINE_MONITORING → ANOMALY_FLAGGED → ACTIVE_INTERVENTION is a latching state machine persisted in Firestore in real time. State only moves upward automatically — it never auto-resets. A crisis that resolves in 30 seconds still happened. Only a credentialed ground controller issuing POST /api/reset-session clears the protocol. Firestore's onSnapshot listener propagates every state change to the ground dashboard in under 100ms without polling — the dashboard reflects reality the moment reality changes.
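The latching behaviour described above can be sketched in a few lines. Firestore persistence and the HTTP reset route are left out, and the class and method names (`LatchingProtocol`, `escalate`, `reset`) are hypothetical.

```python
from enum import IntEnum


class ProtocolState(IntEnum):
    BASELINE_MONITORING = 0
    ANOMALY_FLAGGED = 1
    ACTIVE_INTERVENTION = 2


class LatchingProtocol:
    def __init__(self) -> None:
        self.state = ProtocolState.BASELINE_MONITORING

    def escalate(self, proposed: ProtocolState) -> bool:
        """Move to `proposed` only if it is higher than the current state.

        Returns True when the state changed. Lower or equal proposals are
        ignored, so the machine never de-escalates on its own.
        """
        if proposed > self.state:
            self.state = proposed
            return True
        return False

    def reset(self, is_ground_controller: bool) -> None:
        """Clear the latch; only an authorised ground controller may do so."""
        if not is_ground_controller:
            raise PermissionError("only a ground controller may reset the session")
        self.state = ProtocolState.BASELINE_MONITORING
```

Using an ordered enum keeps the "upward only" rule to a single comparison, and routing every downward move through `reset` makes the human sign-off explicit.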

The alert architecture: When ACTIVE_INTERVENTION triggers, three things happen simultaneously: a reliable DataChannel message reaches the Android overlay with a correlated alert_id, a fire-and-forget Cloud Pub/Sub dispatch notifies ground systems without blocking the audio pipeline, and Firestore's real-time listener drives the SSE stream to the Svelte dashboard. The same alert_id flows through every channel — deduplication is guaranteed regardless of which channel arrives first.
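Receiver-side deduplication on a shared alert_id might look like the sketch below. The payload shape and the `new_alert` and `AlertDeduplicator` names are assumptions for illustration; only the alert_id field comes from the description above.

```python
import uuid


def new_alert(state: str) -> dict:
    """Create one alert payload; the same dict is sent on every channel."""
    return {"alert_id": str(uuid.uuid4()), "state": state}


class AlertDeduplicator:
    """Accepts each alert_id once, whichever channel delivers it first."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, alert: dict) -> bool:
        alert_id = alert["alert_id"]
        if alert_id in self._seen:
            return False  # already handled via another channel
        self._seen.add(alert_id)
        return True
```

A consumer calls `accept` on every incoming copy and acts only when it returns True. In a long-running service the seen set would need an expiry policy, which this sketch omits.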

The deployment layer: Cloud Build deploys the Python backend to Cloud Run on every push to backend/. GitHub Actions deploys the Svelte 5 dashboard to Firebase Hosting on every push to dashboard/. Two services. Two pipelines. Zero manual steps. On a solo 16-day build, that meant no deployment risk on demo day and no manual coordination between the two systems.

8 Google Cloud services — each genuinely wired, none aspirational: Gemini Live API, Vertex AI Gemini Flash, Google ADK, Cloud Run, Firestore, Cloud Pub/Sub, Cloud Storage, Cloud Monitoring.


Challenges I Ran Into

The emotion pipeline isolation problem: Getting affect scores from the video stream into the ADK state machine without contaminating the Live conversation required three architecture revisions. The naive approach — polling the state machine from the Live session event loop — created race conditions and audio stuttering. The solution: report_emotion_state as an ADK tool call, allowing Gemini to report its own affect observations into the state machine server-side, while a completely separate async pipeline scores the video independently. Two affect inputs. Zero cross-contamination.
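ADK can expose a plain Python function as a tool, so a tool like report_emotion_state could be as small as the sketch below. The parameter names, the score ranges, the return payload, and the in-process queue are all assumptions; the actual signature is not shown in this write-up.

```python
import queue

# Hand-off point between the live session and the state machine (assumed design).
affect_reports: "queue.Queue[dict]" = queue.Queue()


def report_emotion_state(arousal: float, valence: float) -> dict:
    """Record the model's own affect observation for the protocol state machine.

    Args:
        arousal: Estimated arousal, assumed here to lie in [0.0, 1.0].
        valence: Estimated valence, assumed here to lie in [-1.0, 1.0].
    """
    if not (0.0 <= arousal <= 1.0 and -1.0 <= valence <= 1.0):
        return {"status": "rejected", "reason": "score out of range"}
    # Non-blocking put: the tool call returns at once and never stalls the audio loop.
    affect_reports.put_nowait(
        {"source": "live_model", "arousal": arousal, "valence": valence}
    )
    return {"status": "recorded"}
```

The state machine drains `affect_reports` on its own schedule, so the live session only ever writes and never polls.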

The responsible AI constraint as architecture: A psychological support system that generates clinical-sounding responses violates Google's acceptable use policy. The resolution required reframing the entire product: MAITRI is a first-responder signal layer, not a therapeutic agent. The ACTIVE_INTERVENTION state has exactly one permitted output — a single grounding sentence, then silence, then a ground alert. No clinical advice. No diagnosis. No attempt to resolve the situation. The constraint that came from genuine safety concerns produced better architecture than any feature requirement could have. The guardrail is not a disclaimer. It is the design.

The demo authenticity problem: Psychological anomaly detection requires a stable baseline established over days of normal affect data. A demo session has 4 minutes. The solution: seed_firestore.py establishes a synthetic 6-day baseline before the demo, with stable arousal and valence scores that give the state machine a fixed deviation reference. When the live session begins, the affect scoring pipeline has a stable baseline to measure against. The baseline is seeded, but the intervention MAITRI triggers in the demo is not artificially forced: it is the result of real emotional expression, scored live and measured against that baseline.
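One simple way to measure a live score against a stored baseline is a z-score, sketched below. The write-up does not say which deviation measure MAITRI uses, so the z-score, the threshold of 2.0, and the function names are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def deviation(baseline: list[float], current: float, min_spread: float = 1e-6) -> float:
    """Return how many standard deviations `current` sits from the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("baseline needs at least two samples")
    # Floor the spread so a perfectly flat baseline cannot cause division by zero.
    spread = max(stdev(baseline), min_spread)
    return (current - mean(baseline)) / spread


def is_anomalous(baseline: list[float], current: float, threshold: float = 2.0) -> bool:
    """Flag a score whose absolute deviation exceeds the (hypothetical) threshold."""
    return abs(deviation(baseline, current)) > threshold
```

With a seeded arousal baseline such as `[0.4, 0.5, 0.6]`, a live score of 0.9 is flagged while a score of 0.55 is not.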


Accomplishments I'm Proud Of

Turning a safety constraint into an architectural principle. The requirement that MAITRI cannot give clinical advice is not a footnote — it determines the entire ACTIVE_INTERVENTION state behaviour. One sentence. Then silence. Then the human takes over. The responsible AI boundary is the clearest product decision in the system, and it came from the hardest constraint.

A single ADK agent across three protocol states. Dynamic [SYSTEM_CONTEXT] hint injection into a live Gemini session mid-conversation — no reconnection, no session restart, no conversation context lost. The astronaut never experiences a gap. MAITRI simply becomes more focused. This capability, unique to ADK's LiveRequestQueue exposure, is what makes the three-state protocol feel like presence rather than mode-switching.

Production deployment infrastructure on a solo build. Cloud Build, GitHub Actions, Cloud Run, Firebase Hosting — automated on every commit. A solo 16-day build that deploys with the same rigour as a team product. The cloudbuild.yaml, deploy.sh, and GitHub Actions workflows are in the public repository. The infrastructure is not aspirational — it is the infrastructure the demo runs on.

The moment in the demo when the system works.

There is a moment in the demo where the protocol state transitions from BASELINE_MONITORING to ANOMALY_FLAGGED — not because a button was pressed, but because the affect scoring pipeline detected real deviation from the session baseline in real time. The dashboard updates. The Android overlay appears. MAITRI's tone shifts. The entire 8-service pipeline fires in under one second. That moment is the system working exactly as designed.


What I Learned

Gemini Live is not a faster REST API. It is a different category of interaction — one where the model is a continuous presence rather than a request-response endpoint. Designing for presence requires different architecture than designing for answers. The state machine, the hint injection, the parallel affect pipeline — none of these patterns exist in a REST-based system. They exist because Gemini Live makes continuous presence technically possible, and continuous presence requires continuous architecture.

The hardest safety constraints produce the best product decisions. The requirement that MAITRI cannot provide clinical advice felt like a limitation during early design. It became the clearest architectural boundary in the system. Constraints that come from genuine safety concerns — not feature requests, not technical limitations — produce decisions that hold under pressure. The ACTIVE_INTERVENTION path is the most reviewed, most tested, most documented part of the codebase. Because it has to be.

Contracts before code is not process overhead — it is the only way a solo developer ships a production system in 16 days. Every cross-platform wire format — Python to Android to Svelte — was defined as a typed Pydantic model before any route was written. Every state machine transition condition was specified before the state machine was implemented. The 16-day timeline was possible because integration was never a surprise.


What's Next for MAITRI

The production path is clear.

MAITRI is built on the same Google Cloud infrastructure that scales to real mission deployment. Cloud Run scales automatically. The Android client runs on any Android device. The ground dashboard runs in any browser. The gap between demo and deployment is narrower than it appears.

Gemini Nano offline fallback — the architecture is already designed. During blackout windows, cloud connectivity drops entirely. The next implementation milestone is full offline capability via Gemini Nano on-device — MAITRI active with zero cloud dependency during the 45 minutes when it matters most. The online-offline handoff architecture is specified. Implementation is the remaining work.

Presentation to ISRO's Human Spaceflight Centre. ISRO's problem statement PS-ID-25175 identified this need. MAITRI is a working implementation built against that specification. The intention is to present MAITRI to the ISRO team as a reference architecture for Gaganyaan psychological support — not as a finished product, but as proof that the technical problem is solved and the deployment path exists on Google Cloud infrastructure India already uses.

Beyond Gaganyaan. The three-state psychological protocol and multimodal affect detection pipeline are mission-agnostic. Submarine crews. Antarctic research stations. Deep-sea operations. Any high-isolation, high-stakes human deployment where a psychologist cannot be present but distress cannot be ignored. MAITRI's architecture was designed for space. Its application is anywhere humans go alone.

India's first astronauts deserve the best AI that exists today. MAITRI is that AI — built on Google Cloud, ready when they are.

Built With

  • big-query
  • cloud-firestore
  • cloud-run
  • firebase
  • gemini-live-api