MERA-3: Multimodal Emergency Reasoning Engine
💡 Tagline: Because in an emergency, every second needs a reason.
💡 Inspiration
In a medical emergency, the greatest enemy isn't just time—it is panic. Research indicates that nearly 80% of bystanders "freeze" or make critical errors during life-threatening situations due to high stress.
We asked ourselves: What if we had an assistant that doesn't just talk, but actually sees, understands, and reasons through a crisis with a calm, expert mind?
The launch of Gemini 3 provided the missing piece of the puzzle. With its breakthrough in real-time multimodal perception and low-latency reasoning, we were inspired to build MERA-3—a guardian agent designed to bridge the gap between a victim's collapse and the arrival of professional help.
🚀 How We Built It
MERA-3 is engineered as an Autonomous Agentic Workflow rather than a simple chatbot:
Vision Intake Pipeline:
We utilized a high-speed WebRTC stream to feed live camera data into the Gemini 3 Pro API, enabling the model to "see" the patient and their environment.
Multimodal Chain-of-Thought (CoT):
We implemented a specialized prompting architecture. Instead of asking for a quick fix, we force the model to:
- Observe: Describe the patient's posture and surroundings.
- Assess: Identify hazards (e.g., sharp furniture, fire risks).
- Reason: Determine the likelihood of specific conditions (e.g., fainting vs. cardiac arrest).
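The staged scaffold above can be sketched as a prompt builder. This is a minimal illustration: the function name, stage wording, and system framing are placeholders, not our exact production prompt.

```python
# Sketch of the Observe -> Assess -> Reason prompt scaffold.
# All wording here is illustrative, not the production prompt.

STAGES = [
    ("Observe", "Describe the patient's posture and surroundings."),
    ("Assess", "Identify hazards (e.g., sharp furniture, fire risks)."),
    ("Reason", "Determine the likelihood of specific conditions "
               "(e.g., fainting vs. cardiac arrest)."),
]

def build_cot_prompt(frame_note: str) -> str:
    """Assemble the staged Chain-of-Thought instruction sent with each frame."""
    lines = [
        "You are an emergency first-response assistant watching a live camera feed.",
        f"Latest frame context: {frame_note}",
        "Work through the following stages in order, labelling each one:",
    ]
    for i, (name, instruction) in enumerate(STAGES, start=1):
        lines.append(f"{i}. {name}: {instruction}")
    lines.append("Only after completing all stages, give one short spoken "
                 "instruction for the bystander.")
    return "\n".join(lines)
```

Forcing the model to emit each labelled stage before any instruction is what keeps it from jumping to a "quick fix."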
Real-Time Audio Loop:
Using Gemini 3's native multimodal capabilities, we integrated voice-out instructions so the user can keep their hands free and eyes on the victim.
Temporal Memory:
We leveraged the long-context window to track the patient’s state over time, allowing the AI to notice if breathing has slowed or if a seizure has lasted too long.
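The temporal-memory idea can be sketched as a small timestamped state tracker. The class, field names, and the seizure threshold below are illustrative assumptions, not clinical guidance or our exact implementation.

```python
# Illustrative sketch of temporal memory: keep timestamped observations
# and flag trends such as slowing breathing or an overlong seizure.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SEIZURE_ALERT_SECONDS = 300  # placeholder threshold, not medical guidance

@dataclass
class PatientTimeline:
    # (timestamp_seconds, breaths_per_minute) samples
    breathing: List[Tuple[float, float]] = field(default_factory=list)
    seizure_start: Optional[float] = None

    def record_breathing(self, t: float, rate: float) -> None:
        self.breathing.append((t, rate))

    def breathing_slowing(self) -> bool:
        """True if the latest breathing rate is below the earliest sample."""
        if len(self.breathing) < 2:
            return False
        return self.breathing[-1][1] < self.breathing[0][1]

    def seizure_too_long(self, now: float) -> bool:
        """True once an ongoing seizure exceeds the alert threshold."""
        return (self.seizure_start is not None
                and now - self.seizure_start > SEIZURE_ALERT_SECONDS)
```

In practice the long-context window lets the model itself hold this history; the tracker above just shows the kind of trend checks that drive escalation.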
🧠 What We Learned
Building MERA-3 taught us that AI "Reasoning" is the ultimate tool for safety.
We discovered that Gemini 3 can distinguish between subtle cues—like the difference between a person sleeping and a person unconscious—based on muscle tone and environmental context.
We also explored the mathematics of Situational Confidence. We defined a heuristic for the system’s reliability:
$$ R_{p} = \lim_{\Delta t \to 0} \left( \frac{\alpha V_{s} + \beta A_{c}}{\delta L} \right) $$
Where:
- $R_{p}$ = Response Precision
- $V_{s}$ = Visual Signalling
- $A_{c}$ = Acoustic Cues
- $\delta L$ = Latency
- $\alpha, \beta$ = weighted coefficients for sensory priority
We learned that minimizing latency ($\delta L$) is more critical than high-resolution imagery in life-saving scenarios.
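The heuristic translates directly into code. The coefficient defaults below are illustrative, but the latency term in the denominator makes the lesson concrete: halving latency doubles response precision, while doubling visual quality only scales one numerator term.

```python
# Response Precision heuristic: R_p = (alpha*V_s + beta*A_c) / latency.
# The default alpha/beta values are illustrative, not tuned constants.

def response_precision(v_s: float, a_c: float, latency: float,
                       alpha: float = 0.6, beta: float = 0.4) -> float:
    """Compute R_p from visual signalling, acoustic cues, and latency (s)."""
    if latency <= 0:
        raise ValueError("latency must be positive")
    return (alpha * v_s + beta * a_c) / latency
```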
🚧 Challenges We Faced
Latency-Depth Tradeoff:
Our biggest hurdle was getting Deep Reasoning in Real-Time.
We solved this by using a hybrid approach: Gemini 3 Flash for immediate hazard detection and Gemini 3 Pro for complex medical logic.
Handling Ambiguity:
In a dark room or with a shaky camera, AI can hallucinate.
We implemented Uncertainty-Aware Prompting, where the model is instructed to say "I cannot see clearly, please move the light" rather than giving a blind diagnosis.
Ethical Guardrails:
Ensuring the AI acts as an assistant and not a licensed doctor was a major design challenge.
We integrated strict safety protocols to prioritize calling emergency services (EMS) as the primary action.
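The Flash/Pro split above can be sketched as a simple routing policy. The model identifiers and task names here are placeholders standing in for whatever the API and our pipeline actually expose.

```python
# Sketch of the hybrid latency/depth routing policy.
# Model ids and task names are placeholders, not exact API values.

FLASH_MODEL = "gemini-3-flash"  # immediate perception tier
PRO_MODEL = "gemini-3-pro"      # deep medical-reasoning tier

# Tasks that must answer within a frame or two of video.
FAST_TASKS = {"hazard_scan", "motion_check", "visibility_check"}

def route_model(task: str) -> str:
    """Pick the model tier: Flash for per-frame perception,
    Pro for multi-step medical logic."""
    return FLASH_MODEL if task in FAST_TASKS else PRO_MODEL
```

Keeping the fast path free of deep reasoning is what lets hazard alerts arrive in real time while the Pro tier deliberates in the background.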
🛠️ Built With
- Gemini 3 Pro (Multimodal Reasoning Engine)
- Gemini 3 Flash (Low-latency Vision)
- Gemini Multimodal Live API
- Google Cloud Vertex AI (Model Deployment)
- Google Cloud Platform (GCP) & Cloud Functions
- FastAPI (Python backend, chosen for asynchronous real-time video performance)
- Firebase Auth (Secure User Login)
- Firestore (Real-time Session Logging and Emergency History)
- Flutter (Cross-platform Mobile/Web Interface and Camera Access)
- Google Maps Platform (Victim Location Tracking)
- WebRTC (Sub-second-latency Video Streaming)
- SMS/Call Integration