Project details

The Problem

Every year, firefighters walk into buildings they've never seen. Search-and-rescue teams navigate rubble without a map. Visually impaired people enter new spaces with no spatial context. Emergency sweeps happen by memory and guesswork.

AI could help if it could actually see.

Today's AI agents are spatially blind. They process pixels and text but have no concept of where things are in 3D space. Ask a state-of-the-art LLM "how many chairs are in this room?" while pointing a camera at it, and it will hallucinate an answer. It can describe what an object looks like, but not where it lives in space, how far away it is, or what's around the corner.

The tools that do provide spatial maps (LiDAR rigs, depth cameras, survey hardware) can cost tens of thousands of dollars and require trained operators. Spatial intelligence has stayed locked inside robotics labs.

Open Reality breaks that barrier.


What We Built

Open Reality is a cloud-native spatial AI platform. Describe your task. Point a phone. An AI agent maps your space in real time, finds your targets, and answers spatial questions from any browser, from anywhere.

The experience:

  1. Type your mission in plain English: "I'm a paramedic doing a safety sweep."
  2. Open Reality generates a spatial plan: what to look for, how to move through the space.
  3. Open the camera stream on your phone from a shared link.
  4. Walk the space. A 3D map builds live.
  5. The agent finds your targets automatically, pins them in 3D, and, when you're done, reasons over the map to answer follow-up questions.

Where Modal makes this real:

The inference pipeline running under the hood (a 1-billion-parameter vision model, CLIP scoring on every submap, and SAM3 segmentation) is far too heavy for a laptop and too latency-sensitive for a slow API call. Modal runs it on an H100 GPU, kept warm between frames so nothing drops. Modal Volumes persist the model weights, so first inference takes seconds, not minutes. A Modal tunnel gives the phone the secure HTTPS connection that browsers require before they grant camera access. Run modal deploy, and it's live.

This is Modal the way it's meant to be used: serious inference, at real-time speed, accessible from a link.


Why It Matters

The people who need spatial awareness most are the ones who can least afford to wait. A firefighter doesn't have time to set up hardware. A paramedic doing a sweep is already on the clock. A blind person navigating a new building deserves a tool that just works.

Open Reality works with the phone already in your pocket. Deployed in ten seconds. No installation. No expertise. No extra hardware.

For developers: an open agentic platform. Bring your own queries, extend the agent's tools, connect via WebSocket. The blueprint for spatial + language AI is available to anyone.
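
As an illustration of what connecting over WebSocket could look like, here is a minimal message-encoding sketch in Python. The message types (`add_target`, `ask`) and field names are hypothetical assumptions; Open Reality's actual wire format is not described here, and the WebSocket transport itself is left out.

```python
import json
from dataclasses import dataclass, field

# Hypothetical message types; not Open Reality's documented protocol.
VALID_TYPES = {"add_target", "ask"}


@dataclass
class ClientMessage:
    type: str
    payload: dict = field(default_factory=dict)


def encode(msg: ClientMessage) -> str:
    """Serialize a message into the JSON text frame a client would send over the WebSocket."""
    if msg.type not in VALID_TYPES:
        raise ValueError(f"unknown message type: {msg.type}")
    if "type" in msg.payload:
        raise ValueError("payload must not contain a 'type' key")
    return json.dumps({"type": msg.type, **msg.payload})


def decode(frame: str) -> ClientMessage:
    """Parse a JSON text frame back into a ClientMessage, rejecting unknown types."""
    data = json.loads(frame)
    msg_type = data.pop("type", None)
    if msg_type not in VALID_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    return ClientMessage(type=msg_type, payload=data)


# Example: ask the agent to add a new target mid-scan.
frame = encode(ClientMessage("add_target", {"label": "AED"}))
print(frame)  # {"type": "add_target", "label": "AED"}
```

A flat JSON object with a `type` discriminator keeps the format easy to produce from any client, whether a browser tab, a script, or a robot.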


Agentic Architecture

Open Reality doesn't just map. It reasons about what it sees.

Intent → Plan: Before a frame is captured, the agent reads your goal and generates a typed spatial plan: which objects to find, what route to take, calibrated to your specific context. A firefighter's plan surfaces extinguishers and standpipe connections. A crime scene investigator's plan surfaces evidence markers. The agent understands the difference.
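
A typed plan lets downstream detection code consume the agent's output directly instead of parsing free text. The Python sketch below shows one possible shape. In Open Reality the plan is generated by the agent from your goal; here a hypothetical keyword table stands in for that model call so the example runs offline. The field names, route hints, and the extra "first aid kit" target are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialPlan:
    """A typed spatial plan: what to look for and how to move (fields are hypothetical)."""
    role: str
    targets: tuple[str, ...]
    route_hint: str


# Stand-in for the LLM planner: a hypothetical role -> (targets, route) table.
ROLE_PLANS = {
    "firefighter": (("fire extinguisher", "standpipe connection"),
                    "clear each floor outward from the stairwell"),
    "paramedic": (("AED", "first aid kit"),
                  "sweep the perimeter, then the interior"),
    "crime scene investigator": (("evidence marker",),
                                 "grid search without backtracking"),
}


def make_plan(goal: str) -> SpatialPlan:
    """Return the plan for the first known role mentioned in the goal text."""
    text = goal.lower()
    for role, (targets, route) in ROLE_PLANS.items():
        if role in text:
            return SpatialPlan(role, targets, route)
    # Unknown role: a generic plan with no preset targets.
    return SpatialPlan("general", (), "walk the perimeter once")


plan = make_plan("I'm a paramedic doing a safety sweep.")
print(plan.targets)  # ('AED', 'first aid kit')
```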

Continuous Detection: As the map grows, the agent automatically scans every new submap for your targets. No user action needed. The 3D world populates with labeled, located objects in real time.
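
The pipeline scores every submap with CLIP. One minimal way to express that per-submap check, assuming each submap carries a single image embedding and each target a text embedding, is a cosine-similarity threshold run automatically as each submap arrives. The `Submap` type, the 0.25 threshold, the toy 2-D vectors, and pinning hits at the submap centroid are hypothetical simplifications; the SAM3 segmentation step that would localize the object precisely is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Submap:
    submap_id: int
    embedding: list[float]                 # assumed: one image embedding per submap (e.g. CLIP)
    centroid: tuple[float, float, float]   # assumed: submap center in world coordinates


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def on_new_submap(world: list[dict], submap: Submap,
                  targets: dict[str, list[float]], threshold: float = 0.25) -> None:
    """Runs for each new submap: score every target and pin the matches in the world."""
    for label, text_embedding in targets.items():
        score = cosine(submap.embedding, text_embedding)
        if score >= threshold:
            world.append({"label": label, "score": score,
                          "position": submap.centroid, "submap": submap.submap_id})


world: list[dict] = []
targets = {"AED": [1.0, 0.0], "chair": [0.0, 1.0]}   # toy 2-D "text embeddings"
on_new_submap(world, Submap(0, [0.9, 0.1], (1.0, 2.0, 0.0)), targets)
print([hit["label"] for hit in world])  # ['AED']
```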

Retroactive Re-search: Add a new target mid-scan — "find the AED too" — and the agent immediately re-runs detection on everything it's already seen. Its knowledge of the space updates instantly, backwards in time.
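
Retroactive re-search stays cheap if the system keeps what it has already computed. The sketch below assumes per-submap embeddings are cached, so adding a target means comparing one new text embedding against the cache rather than re-processing frames. The `SpatialMemory` class, its threshold, and the toy vectors are hypothetical illustrations, not Open Reality's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SpatialMemory:
    """Caches every submap embedding seen so far so new targets can be searched retroactively."""

    def __init__(self, threshold=0.25):
        self.threshold = threshold
        self.submaps = {}      # submap_id -> (embedding, centroid)
        self.targets = {}      # label -> text embedding
        self.detections = []   # (label, submap_id, centroid)

    def _scan(self, submap_id, label):
        embedding, centroid = self.submaps[submap_id]
        if cosine(embedding, self.targets[label]) >= self.threshold:
            self.detections.append((label, submap_id, centroid))

    def add_submap(self, submap_id, embedding, centroid):
        """Forward pass: check a newly arrived submap against all current targets."""
        self.submaps[submap_id] = (embedding, centroid)
        for label in self.targets:
            self._scan(submap_id, label)

    def add_target(self, label, text_embedding):
        """Retroactive pass: register a target mid-scan and re-check every cached submap."""
        if label in self.targets:
            return  # already being searched for; avoid duplicate detections
        self.targets[label] = text_embedding
        for submap_id in self.submaps:
            self._scan(submap_id, label)


memory = SpatialMemory()
memory.add_submap(0, [1.0, 0.0], (0.0, 0.0, 0.0))
memory.add_submap(1, [0.0, 1.0], (3.0, 0.0, 0.0))
memory.add_target("AED", [0.0, 1.0])   # "find the AED too"
print(memory.detections)  # [('AED', 1, (3.0, 0.0, 0.0))]
```

After `add_target`, the new label also joins the forward pass, so every later submap is checked for it automatically.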

Spatial Q&A: After the scan, an AI assistant holds the full 3D context: every detected object, its exact position, the camera's path through the space. "Is the fire extinguisher accessible from the north stairwell?" gets answered with actual geometry instead of a guess.
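
Answering with geometry means the assistant receives coordinates, not just labels. A minimal sketch, assuming detections are stored as labeled 3D points in meters and the camera path as a list of positions: `build_context` turns the scene into plain text an LLM can reason over, and `nearest` is the kind of geometric helper such an assistant could call. A named place like the north stairwell would need its own coordinates; here it is just a hypothetical query point. All names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    position: tuple[float, float, float]   # assumed: world coordinates in meters


def nearest(detections, label, point):
    """Closest detection with the given label to a query point, or None if there is none."""
    candidates = [d for d in detections if d.label == label]
    return min(candidates, key=lambda d: math.dist(d.position, point), default=None)


def build_context(detections, camera_path):
    """Serialize detected objects and the camera trajectory into text for an LLM prompt."""
    lines = ["Detected objects (x, y, z in meters):"]
    for d in detections:
        x, y, z = d.position
        lines.append(f"- {d.label} at ({x:.2f}, {y:.2f}, {z:.2f})")
    # Distance walked: sum of straight-line steps between consecutive camera positions.
    walked = sum(math.dist(a, b) for a, b in zip(camera_path, camera_path[1:]))
    lines.append(f"Camera path: {len(camera_path)} positions, {walked:.2f} m walked")
    return "\n".join(lines)


scene = [Detection("fire extinguisher", (1.0, 0.0, 0.0)),
         Detection("fire extinguisher", (10.0, 0.0, 0.0)),
         Detection("AED", (2.0, 0.0, 0.0))]
stairwell = (0.0, 0.0, 0.0)   # hypothetical coordinates for the north stairwell
print(nearest(scene, "fire extinguisher", stairwell).position)  # (1.0, 0.0, 0.0)
print(build_context(scene, [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]))
```

Straight-line distance is only a proxy for accessibility; a fuller answer would also need free-space information from the map, such as walls and doorways between the two points.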


Impact

Open Reality is a blueprint: what it looks like when you give AI agents a real sense of space and make it accessible to anyone. First responders, accessibility tools, robotics, construction, disaster response. Any domain where knowing where something is in 3D space is a matter of safety.

The hardware barrier is gone. The deployment barrier is gone. What remains is the mission.
