MERA-3: Multimodal Emergency Reasoning Engine

💡 Tagline: Because in an emergency, every second needs a reason.


💡 Inspiration

In a medical emergency, the greatest enemy isn't just time—it is panic. Research indicates that nearly 80% of bystanders "freeze" or make critical errors during life-threatening situations due to high stress.

We asked ourselves: What if we had an assistant that doesn't just talk, but actually sees, understands, and reasons through a crisis with a calm, expert mind?

The launch of Gemini 3 provided the missing piece of the puzzle. With its breakthrough in real-time multimodal perception and low-latency reasoning, we were inspired to build MERA-3—a guardian agent designed to bridge the gap between a victim's collapse and the arrival of professional help.


🚀 How We Built It

MERA-3 is engineered as an Autonomous Agentic Workflow rather than a simple chatbot:

  • Vision Intake Pipeline:
    We utilized a high-speed WebRTC stream to feed live camera data into the Gemini 3 Pro API, enabling the model to "see" the patient and their environment.

  • Multimodal Chain-of-Thought (CoT):
    We implemented a specialized prompting architecture. Instead of asking for a quick fix, we force the model to:

    1. Observe: Describe the patient's posture and surroundings.
    2. Assess: Identify hazards (e.g., sharp furniture, fire risks).
    3. Reason: Determine the likelihood of specific conditions (e.g., fainting vs. cardiac arrest).

  • Real-Time Audio Loop:
    Using Gemini 3's native multimodal capabilities, we integrated voice-out instructions so the user can keep their hands free and eyes on the victim.

  • Temporal Memory:
    We leveraged the long-context window to track the patient’s state over time, allowing the AI to notice if breathing has slowed or if a seizure has lasted too long.
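The Observe → Assess → Reason chain above is, at its core, a structured prompt assembled before each frame reaches the model. A minimal Python sketch of that assembly; the exact stage wording and the visibility caveat are illustrative assumptions, not the production prompt:

```python
# Sketch of the Observe -> Assess -> Reason prompt built for each video frame.
# Only the three-stage structure mirrors the pipeline described above; the
# wording of each instruction is illustrative.

COT_STAGES = [
    ("Observe", "Describe the patient's posture, breathing, and surroundings."),
    ("Assess", "Identify immediate hazards (sharp furniture, fire, traffic)."),
    ("Reason", "Estimate the most likely condition (e.g., fainting vs. "
               "cardiac arrest) and state your confidence."),
]

def build_cot_prompt(elapsed_seconds: float) -> str:
    """Assemble the multimodal chain-of-thought prompt for one frame."""
    lines = [
        f"Time since first frame: {elapsed_seconds:.1f}s.",
        "Work through the following stages in order:",
    ]
    for i, (name, instruction) in enumerate(COT_STAGES, start=1):
        lines.append(f"{i}. {name}: {instruction}")
    # Uncertainty guardrail: prefer admitting poor visibility over guessing.
    lines.append("If visibility is poor, say so instead of guessing.")
    return "\n".join(lines)
```

Passing the elapsed time into every prompt is what lets the long-context "Temporal Memory" compare the current frame against earlier observations.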


🧠 What We Learned

Building MERA-3 taught us that AI "Reasoning" is the ultimate tool for safety.

We discovered that Gemini 3 can distinguish between subtle cues—like the difference between a person sleeping and a person unconscious—based on muscle tone and environmental context.

We also explored the mathematics of Situational Confidence. We defined a heuristic for the system’s reliability:

$$ R_{p} = \frac{\alpha V_{s} + \beta A_{c}}{\delta L} $$

Where:

  • $R_{p}$ = Response Precision
  • $V_{s}$ = Visual Signalling
  • $A_{c}$ = Acoustic Cues
  • $\delta L$ = Latency
  • $\alpha, \beta$ = weighted coefficients for sensory priority

We learned that minimizing latency ($\delta L$) is more critical than high-resolution imagery in life-saving scenarios.
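Treated as a plain ratio, the heuristic is cheap to evaluate once per inference cycle. A small sketch; the weights and example readings are chosen purely for illustration:

```python
def response_precision(v_s: float, a_c: float, latency: float,
                       alpha: float = 0.6, beta: float = 0.4) -> float:
    """R_p = (alpha * V_s + beta * A_c) / delta_L.

    v_s, a_c:    visual / acoustic signal scores in [0, 1].
    latency:     end-to-end latency (delta_L) in seconds; must be > 0.
    alpha, beta: illustrative weights favoring vision over audio.
    """
    if latency <= 0:
        raise ValueError("latency must be positive")
    return (alpha * v_s + beta * a_c) / latency

# Halving latency doubles R_p for the same signal quality, which is why
# latency matters more than resolution here:
# response_precision(0.9, 0.5, 0.25)   ~ 2.96
# response_precision(0.9, 0.5, 0.125)  ~ 5.92
```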


🚧 Challenges We Faced

  • Latency-Depth Tradeoff:
    Our biggest hurdle was getting Deep Reasoning in Real-Time.
    We solved this by using a hybrid approach: Gemini 3 Flash for immediate hazard detection and Gemini 3 Pro for complex medical logic.

  • Handling Ambiguity:
    In a dark room or with a shaky camera, AI can hallucinate.
    We implemented Uncertainty-Aware Prompting, where the model is instructed to say "I cannot see clearly, please move the light" rather than offering a blind diagnosis.

  • Ethical Guardrails:
    Ensuring the AI acts as an assistant and not a licensed doctor was a major design challenge.
    We integrated strict safety protocols to prioritize calling emergency services (EMS) as the primary action.
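The hybrid Flash/Pro routing and the uncertainty fallback can be sketched as a single dispatch step. The model identifiers and visibility threshold below are placeholders, not the project's actual configuration:

```python
from dataclasses import dataclass

# Placeholder model identifiers: the real deployment routes between a
# low-latency "Flash" tier and a deeper-reasoning "Pro" tier.
FLASH_MODEL = "gemini-flash-placeholder"
PRO_MODEL = "gemini-pro-placeholder"

VISIBILITY_THRESHOLD = 0.4  # assumed cutoff for "too dark / too shaky"

@dataclass
class FrameContext:
    visibility: float           # 0..1 estimate of image quality
    needs_medical_logic: bool   # True once a hazard scan escalates

def route(frame: FrameContext) -> str:
    """Pick a model tier, or ask the user to improve visibility first."""
    if frame.visibility < VISIBILITY_THRESHOLD:
        # Uncertainty-aware path: never diagnose from a bad frame.
        return "ASK_USER: I cannot see clearly, please move the light."
    if frame.needs_medical_logic:
        return PRO_MODEL     # deep reasoning for complex medical logic
    return FLASH_MODEL       # immediate, low-latency hazard detection
```

Keeping the visibility check ahead of any model call is what turns "Handling Ambiguity" into a hard rule rather than a prompt suggestion.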


🛠️ Built With

  • Gemini 3 Pro & Flash (Multimodal Reasoning Engine)
  • Google Cloud Vertex AI (Model Deployment)
  • Flutter (Cross-platform Mobile Interface)
  • WebRTC (Low-latency Video Streaming)
