🕵️‍♂️ Inspiration

Renting a home shouldn't be a gamble. I built OmniScout to solve the "Black Box" of property inspections. My mission was to create an agent that doesn't just talk at you, but investigates with you—using spatial reasoning to point out the structural risks that a human eye might miss.

🏗️ What it does: The "Liquid Glass" HUD

OmniScout is an Agentic Orchestration Engine I developed to guide users through professional-grade structural audits.

  • Proactive Visual Grounding: The AI doesn't just describe a flaw; it points to it. Using Gemini 3's native coordinate system, I implemented Dynamic HUD Arrows and "Liquid Glass" overlays that track structural risks in real time.
  • Director-Led Walkthrough: I designed the agent to act as a forensic director, using verbal cues and UI signals to ensure the user audits high-risk areas like foundation corners, ceiling vents, and plumbing junctions.
  • Deep Scan Protocol: For critical flaws, I required the agent to evaluate three distinct causal hypotheses (e.g., foundation settling vs. thermal expansion) before finalizing a finding.
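
The Deep Scan rule above can be sketched as a small gate that refuses to finalize a finding until three distinct causes have been weighed. This is only an illustration: the `Hypothesis` and `Finding` shapes, the confidence score, and the `finalizeFinding` name are assumptions, not taken from the OmniScout code.

```typescript
// Hypothetical shapes for illustration; not the actual OmniScout data model.
interface Hypothesis {
  cause: string;      // e.g. "foundation settling"
  confidence: number; // assumed score between 0 and 1
}

interface Finding {
  label: string;
  hypotheses: Hypothesis[];
}

interface FinalFinding {
  label: string;
  likelyCause: string;
  considered: number;
}

const REQUIRED_HYPOTHESES = 3;

// Returns null until at least three distinct causes have been evaluated.
function finalizeFinding(finding: Finding): FinalFinding | null {
  // Deduplicate by normalized cause text, keeping the highest-confidence entry.
  const byCause: Record<string, Hypothesis> = Object.create(null);
  for (const h of finding.hypotheses) {
    const key = h.cause.trim().toLowerCase();
    const prev = byCause[key];
    if (prev === undefined || h.confidence > prev.confidence) {
      byCause[key] = h;
    }
  }

  const distinct = Object.keys(byCause).map((k) => byCause[k]);
  if (distinct.length < REQUIRED_HYPOTHESES) {
    return null; // not enough alternatives considered yet
  }

  const best = distinct.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  return { label: finding.label, likelyCause: best.cause, considered: distinct.length };
}
```

Returning `null` rather than throwing lets a caller keep the finding in a pending state while the agent gathers more hypotheses.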

🧠 The "Action Era" Architecture

I engineered a Quad-Agent Mesh to solve the Latency vs. Intelligence trade-off as a solo developer:

  1. Vision Agent (Gemini 2.5 Flash): The "Face & Voice." I used this for the 10 FPS live stream to provide low-latency interaction and proactive scanning.
  2. Central Intelligence (Gemini 3 Pro): The "Brain." I utilized [CONSULT_CENTRAL] tags to trigger deep forensic reasoning and generate precise visual grounding coordinates.
  3. Research Agent (Gemini 3 Flash): The "Background Worker." This agent autonomously fetches neighborhood-level risks (flood, crime, permits) while the user is walking.
  4. Resilience Report Agent (Gemini 3 Pro): The "Reporting Brain." This agent analyzes all findings from the session, compares them with research data, and generates a comprehensive Resilience Report.
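
A minimal sketch of how a [CONSULT_CENTRAL] tag could route a turn from the fast model to the slower one. It assumes the Vision agent emits the tag inline in its text output; the `routeVisionOutput` function and the `RoutedTurn` shape are hypothetical names for illustration.

```typescript
// Tag name comes from the write-up; everything else here is an illustrative assumption.
const CONSULT_TAG = "[CONSULT_CENTRAL]";

interface RoutedTurn {
  spokenText: string; // what the Vision agent should say, with the tag removed
  escalate: boolean;  // true when Central Intelligence should be consulted
}

function routeVisionOutput(raw: string): RoutedTurn {
  const escalate = raw.indexOf(CONSULT_TAG) !== -1;
  // Strip every occurrence of the tag, then collapse the leftover whitespace.
  const spokenText = raw.split(CONSULT_TAG).join(" ").replace(/\s+/g, " ").trim();
  return { spokenText, escalate };
}
```

With this shape, the caller speaks `spokenText` immediately and, when `escalate` is true, forwards the current frame and context to the Pro model.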

Key Mechanism: Thought Signatures

I implemented Stateful Thought Signatures—serialized snapshots of the AI's reasoning. This allows the agent to maintain context across the entire walkthrough. If you ask about a crack you saw earlier, the AI "remembers" the specific coordinates and its previous analysis without re-processing the stream.
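
One way to picture such a snapshot store is sketched below. It is an app-level illustration only: the `ThoughtSnapshot` fields, the `SessionMemory` class, and the substring-based `recall` are assumptions, and none of this represents the Gemini API's own data format.

```typescript
// Hypothetical snapshot shape; field names are illustrative assumptions.
interface ThoughtSnapshot {
  findingId: string;
  label: string;                         // e.g. "Hairline crack, north wall"
  box: [number, number, number, number]; // [ymin, xmin, ymax, xmax]
  analysis: string;                      // the earlier reasoning, stored as text
  capturedAtMs: number;
}

class SessionMemory {
  private snapshots: ThoughtSnapshot[] = [];

  remember(snapshot: ThoughtSnapshot): void {
    this.snapshots.push(snapshot);
  }

  // Returns the most recent snapshot whose label mentions the query.
  recall(query: string): ThoughtSnapshot | undefined {
    const q = query.toLowerCase();
    for (let i = this.snapshots.length - 1; i >= 0; i--) {
      if (this.snapshots[i].label.toLowerCase().indexOf(q) !== -1) {
        return this.snapshots[i];
      }
    }
    return undefined;
  }

  // Serialize so the state can survive a reconnect or page reload.
  serialize(): string {
    return JSON.stringify(this.snapshots);
  }

  static restore(json: string): SessionMemory {
    const memory = new SessionMemory();
    memory.snapshots = JSON.parse(json) as ThoughtSnapshot[];
    return memory;
  }
}
```

A follow-up question such as "what about that crack?" would then call `recall("crack")` and reuse the stored coordinates and analysis instead of re-reading the video stream.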

🛠️ How I built it

  • Framework: React 19 + Vite + Tailwind CSS.
  • UI Language: Liquid Glass—a translucent, adaptive design material I developed to refract the camera background, keeping the user focused on the physical environment.
  • Grounding: Native Gemini 3 [ymin, xmin, ymax, xmax] coordinate system.
  • IDE: Google Antigravity (ADK) for rapid multi-agent prototyping.
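
To place an overlay, a [ymin, xmin, ymax, xmax] box has to be mapped onto the camera view. The sketch below assumes the coordinates are normalized to a 0 to 1000 grid, as the Gemini API documents for bounding-box output; the `boxToPixelRect` and `arrowTarget` helpers are hypothetical names, not OmniScout's actual functions.

```typescript
// Illustrative helpers; names and the 0..1000 grid are stated assumptions.
type GroundingBox = [number, number, number, number]; // [ymin, xmin, ymax, xmax]

interface PixelRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

const GRID = 1000; // assumed normalization range of the model's coordinates

function boxToPixelRect(box: GroundingBox, viewWidth: number, viewHeight: number): PixelRect {
  const [ymin, xmin, ymax, xmax] = box;
  // Clamp to the grid so a slightly out-of-range value cannot push the overlay off screen.
  const clamp = (v: number) => Math.min(GRID, Math.max(0, v));

  const left = (clamp(Math.min(xmin, xmax)) / GRID) * viewWidth;
  const right = (clamp(Math.max(xmin, xmax)) / GRID) * viewWidth;
  const top = (clamp(Math.min(ymin, ymax)) / GRID) * viewHeight;
  const bottom = (clamp(Math.max(ymin, ymax)) / GRID) * viewHeight;

  return { left, top, width: right - left, height: bottom - top };
}

// The point a HUD arrow would aim at: the center of the box.
function arrowTarget(rect: PixelRect): { x: number; y: number } {
  return { x: rect.left + rect.width / 2, y: rect.top + rect.height / 2 };
}
```

In a React component, the resulting rectangle could feed an absolutely positioned element's `style`, with the view size taken from the rendered video element.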

🚀 Challenges I overcame: The Latency Gap

The biggest challenge was the reasoning delay of the Pro model. I solved this by creating an Asynchronous Handoff. I programmed the Vision agent to provide speculative "filler" dialogue—explaining what it's looking for—while the Central Intelligence model performs its deep forensic "thinking" in the background.
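
The handoff pattern described above can be sketched with a plain promise: start the slow call, speak the filler while it is in flight, then speak the verdict. The `asyncHandoff` function and its `deepAnalysis` and `speak` parameters are hypothetical stand-ins for the real model call and voice output.

```typescript
// Illustrative sketch; deepAnalysis and speak are assumed callbacks, not real API calls.
async function asyncHandoff(
  deepAnalysis: () => Promise<string>, // slow reasoning request to the Pro model
  speak: (line: string) => void,       // fast voice/UI output path
  filler: string,                      // what the Vision agent says in the meantime
): Promise<string> {
  // Kick off the slow request first, without awaiting it.
  const pending = deepAnalysis();

  // The filler line goes out immediately, while the request is still running.
  speak(filler);

  // Only now wait for the deep result and deliver it.
  const verdict = await pending;
  speak(verdict);
  return verdict;
}
```

The ordering matters: awaiting `deepAnalysis()` before calling `speak(filler)` would reintroduce the silence this pattern is meant to hide.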

🏅 Accomplishments that I'm proud of

As a solo developer, I'm proud of bridging the gap between digital reasoning and physical space. Seeing my "Liquid Glass" arrow snap onto a hairline crack in real time proved that a single developer can build a reliable partner for the physical world using the latest AI agents.

📖 What I learned

Agency is more important than hardware. By focusing on Visual Grounding and Semantic HUDs rather than specific phone features, I created a tool that is more resilient, more intuitive, and ultimately more useful for the average renter.
