MERA-3: Multimodal Emergency Reasoning Engine

💡 Tagline: Because in an emergency, every second needs a reason.


💡 Inspiration

In a medical emergency, the greatest enemy isn't just time—it is panic. Research indicates that nearly 80% of bystanders "freeze" or make critical errors during life-threatening situations due to high stress.

We asked ourselves: What if we had an assistant that doesn't just talk, but actually sees, understands, and reasons through a crisis with a calm, expert mind?

The launch of Gemini 3 provided the missing piece of the puzzle. With its breakthrough in real-time multimodal perception and low-latency reasoning, we were inspired to build MERA-3—a guardian agent designed to bridge the gap between a victim's collapse and the arrival of professional help.


🚀 How We Built It

MERA-3 is engineered as an Autonomous Agentic Workflow rather than a simple chatbot:

  • Vision Intake Pipeline:
    We utilized a high-speed WebRTC stream to feed live camera data into the Gemini 3 Pro API, enabling the model to "see" the patient and their environment.

  • Multimodal Chain-of-Thought (CoT):
    We implemented a specialized prompting architecture. Instead of asking for a quick fix, we force the model to:

    1. Observe: Describe the patient's posture and surroundings.
    2. Assess: Identify hazards (e.g., sharp furniture, fire risks).
    3. Reason: Determine the likelihood of specific conditions (e.g., fainting vs. cardiac arrest).

  • Real-Time Audio Loop:
    Using Gemini 3's native multimodal capabilities, we integrated voice-out instructions so the user can keep their hands free and eyes on the victim.

  • Temporal Memory:
    We leveraged the long-context window to track the patient’s state over time, allowing the AI to notice if breathing has slowed or if a seizure has lasted too long.
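The Observe → Assess → Reason chain above is, at its core, a structured prompt assembled before each frame reaches the model. A minimal Python sketch of that assembly; the exact stage wording and the visibility caveat are illustrative assumptions, not the production prompt:

```python
# Sketch of the Observe -> Assess -> Reason prompt built for each video frame.
# Only the three-stage structure mirrors the pipeline described above; the
# wording of each instruction is illustrative.

COT_STAGES = [
    ("Observe", "Describe the patient's posture, breathing, and surroundings."),
    ("Assess", "Identify immediate hazards (sharp furniture, fire, traffic)."),
    ("Reason", "Estimate the most likely condition (e.g., fainting vs. "
               "cardiac arrest) and state your confidence."),
]

def build_cot_prompt(elapsed_seconds: float) -> str:
    """Assemble the multimodal chain-of-thought prompt for one frame."""
    lines = [
        f"Time since first frame: {elapsed_seconds:.1f}s.",
        "Work through the following stages in order:",
    ]
    for i, (name, instruction) in enumerate(COT_STAGES, start=1):
        lines.append(f"{i}. {name}: {instruction}")
    # Uncertainty guardrail: prefer admitting poor visibility over guessing.
    lines.append("If visibility is poor, say so instead of guessing.")
    return "\n".join(lines)
```

Passing the elapsed time into every prompt is what lets the long-context "Temporal Memory" compare the current frame against earlier observations.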


🧠 What We Learned

Building MERA-3 taught us that AI "Reasoning" is the ultimate tool for safety.

We discovered that Gemini 3 can distinguish between subtle cues—like the difference between a person sleeping and a person unconscious—based on muscle tone and environmental context.

We also explored the mathematics of Situational Confidence. We defined a heuristic for the system’s reliability:

$$ R_{p} = \frac{\alpha V_{s} + \beta A_{c}}{\delta L} $$

Where:

  • $R_{p}$ = Response Precision
  • $V_{s}$ = Visual Signalling
  • $A_{c}$ = Acoustic Cues
  • $\delta L$ = Latency
  • $\alpha, \beta$ = weighted coefficients for sensory priority

We learned that minimizing latency ($\delta L$) is more critical than high-resolution imagery in life-saving scenarios.
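Treated as a plain ratio, the heuristic is cheap to evaluate once per inference cycle. A small sketch; the weights and example readings are chosen purely for illustration:

```python
def response_precision(v_s: float, a_c: float, latency: float,
                       alpha: float = 0.6, beta: float = 0.4) -> float:
    """R_p = (alpha * V_s + beta * A_c) / delta_L.

    v_s, a_c:    visual / acoustic signal scores in [0, 1].
    latency:     end-to-end latency (delta_L) in seconds; must be > 0.
    alpha, beta: illustrative weights favoring vision over audio.
    """
    if latency <= 0:
        raise ValueError("latency must be positive")
    return (alpha * v_s + beta * a_c) / latency

# Halving latency doubles R_p for the same signal quality, which is why
# latency matters more than resolution here:
# response_precision(0.9, 0.5, 0.25)   ~ 2.96
# response_precision(0.9, 0.5, 0.125)  ~ 5.92
```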


🚧 Challenges We Faced

  • Latency-Depth Tradeoff:
    Our biggest hurdle was getting Deep Reasoning in Real-Time.
    We solved this by using a hybrid approach: Gemini 3 Flash for immediate hazard detection and Gemini 3 Pro for complex medical logic.

  • Handling Ambiguity:
    In a dark room or with a shaky camera, AI can hallucinate.
    We implemented Uncertainty-Aware Prompting, where the model is instructed to say "I cannot see clearly, please move the light" rather than offering a blind diagnosis.

  • Ethical Guardrails:
    Ensuring the AI acts as an assistant and not a licensed doctor was a major design challenge.
    We integrated strict safety protocols to prioritize calling emergency services (EMS) as the primary action.
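The hybrid Flash/Pro routing and the uncertainty fallback can be sketched as a single dispatch step. The model identifiers and visibility threshold below are placeholders, not the project's actual configuration:

```python
from dataclasses import dataclass

# Placeholder model identifiers: the real deployment routes between a
# low-latency "Flash" tier and a deeper-reasoning "Pro" tier.
FLASH_MODEL = "gemini-flash-placeholder"
PRO_MODEL = "gemini-pro-placeholder"

VISIBILITY_THRESHOLD = 0.4  # assumed cutoff for "too dark / too shaky"

@dataclass
class FrameContext:
    visibility: float           # 0..1 estimate of image quality
    needs_medical_logic: bool   # True once a hazard scan escalates

def route(frame: FrameContext) -> str:
    """Pick a model tier, or ask the user to improve visibility first."""
    if frame.visibility < VISIBILITY_THRESHOLD:
        # Uncertainty-aware path: never diagnose from a bad frame.
        return "ASK_USER: I cannot see clearly, please move the light."
    if frame.needs_medical_logic:
        return PRO_MODEL     # deep reasoning for complex medical logic
    return FLASH_MODEL       # immediate, low-latency hazard detection
```

Keeping the visibility check ahead of any model call is what turns "Handling Ambiguity" into a hard rule rather than a prompt suggestion.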


🛠️ Built With

  • Gemini 3 Pro & Flash (Multimodal Reasoning Engine)
  • Google Cloud Vertex AI (Model Deployment)
  • Flutter (Cross-platform Mobile Interface)
  • WebRTC (Low-latency Video Streaming)
