Inspiration

Traditional design tools are passive. We wanted to move from "filters" to "function," creating an agent that doesn't just visualize a room but actually understands its spatial logic and procurement needs in the Action Era.

What it does

Echo Omni Architect transforms a video walkthrough into an interactive 3D twin. It uses Spatial-Temporal Vision to analyze a room, suggests a redesign with Thinking Level: HIGH, and calls tools to generate a real-world Bill of Materials (BOM).
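In outline, the BOM step is an ordinary tool call: the model asks for a function, and our agent executes it and returns structured results. The sketch below is illustrative, not our exact code; the tool name `generate_bom`, its schema, and the tiny in-memory catalog are all assumptions.

```python
# Illustrative tool declaration the model can request (names and
# schema are assumptions, not a specific SDK's API).
GENERATE_BOM_TOOL = {
    "name": "generate_bom",
    "description": "Produce a bill of materials for a proposed room redesign.",
    "parameters": {
        "type": "object",
        "properties": {
            "room_id": {"type": "string", "description": "ID of the scanned room"},
            "style": {"type": "string", "description": "Target design style"},
            "budget_usd": {"type": "number", "description": "Upper spend limit"},
        },
        "required": ["room_id", "style"],
    },
}

def generate_bom(room_id: str, style: str, budget_usd: float = 1000.0) -> list[dict]:
    """Hypothetical local handler invoked when the model requests the
    tool; a real version would query retail APIs for live prices."""
    catalog = [
        {"item": "floor lamp", "price": 89.0},
        {"item": "area rug", "price": 240.0},
        {"item": "accent chair", "price": 450.0},
    ]
    # Greedily add items while staying under the budget cap.
    bom, total = [], 0.0
    for entry in catalog:
        if total + entry["price"] <= budget_usd:
            bom.append({**entry, "style": style, "room_id": room_id})
            total += entry["price"]
    return bom
```

The key design point is that the handler returns structured data (items with prices), which the model can then ground into retail links in a later turn.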

How we built it

We used the AI Studio Build Tab for the React/Three.js frontend and Antigravity for the Python agent. The "brain" leverages the Interactions API and Google Search Grounding to turn suggestions into actionable retail links.

Challenges we ran into

Bridging the gap between 2D multimodal vision and 3D coordinate systems was a major hurdle. We also faced environment desyncs in AI Studio that required manual dependency management for Three.js and React 18.
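The core of the 2D-to-3D bridge is unprojection: turning a normalized image coordinate plus an estimated depth into a camera-space point. A minimal pinhole-camera sketch follows; the field of view, aspect ratio, and depth source are assumptions, and the -z viewing direction matches the Three.js camera convention.

```python
import math

def unproject(u: float, v: float, depth_m: float,
              fov_x_deg: float = 60.0, aspect: float = 16 / 9) -> tuple:
    """Map a normalized 2D image coordinate (u, v in [0, 1], origin at
    the top-left) and an estimated depth in meters to a camera-space
    3D point. Pinhole-camera sketch; parameters are assumptions."""
    # Half-extent of the view frustum at the given depth.
    half_w = depth_m * math.tan(math.radians(fov_x_deg) / 2)
    half_h = half_w / aspect
    x = (u - 0.5) * 2 * half_w   # right is +x
    y = (0.5 - v) * 2 * half_h   # up is +y (image v grows downward)
    z = -depth_m                 # camera looks down -z (Three.js convention)
    return (x, y, z)
```

A detection at the image center with 2 m of depth lands at (0, 0, -2), directly in front of the camera, which makes the mapping easy to sanity-check visually in the 3D twin.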

Accomplishments that we're proud of

We successfully implemented stateful reasoning: by using Thought Signatures, our agent "remembers" structural constraints identified in the first turn, so subsequent redesigns stay architecturally sound and budget-compliant.
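The mechanism amounts to carrying each turn's opaque signature blob in the conversation history and replaying it unchanged on the next request. The sketch below stubs out the model call; the field names (`thought_signature`, `role`, `content`) are assumptions for illustration.

```python
# Sketch of a multi-turn loop that preserves per-turn "thought
# signatures" so later turns retain earlier structural constraints.

def call_model_stub(history: list[dict]) -> dict:
    """Stand-in for a real model call; returns text plus an opaque
    signature that must be echoed back verbatim on the next turn."""
    turn = sum(1 for m in history if m["role"] == "user")
    return {"text": f"redesign proposal v{turn}",
            "thought_signature": f"sig-{turn}"}

def run_turn(history: list[dict], user_msg: str) -> str:
    history.append({"role": "user", "content": user_msg})
    reply = call_model_stub(history)
    # Store the signature alongside the text so the next request
    # replays it unchanged, keeping the reasoning state intact.
    history.append({"role": "model",
                    "content": reply["text"],
                    "thought_signature": reply["thought_signature"]})
    return reply["text"]

history: list[dict] = []
run_turn(history, "Scan this living room walkthrough.")
run_turn(history, "Make it mid-century modern under $2,000.")
```

Because the signatures ride along inside the ordinary history list, no separate memory store is needed; dropping a signature is what breaks continuity between turns.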

What we learned

Gemini 3's native multimodal understanding is a game-changer. We learned how to transition from "one-shot" prompts to "marathon" agent loops where the AI acts as an autonomous orchestrator rather than a simple chatbot.
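A "marathon" loop in this sense just keeps calling the model, executing whatever tool it requests, until it emits a final answer or exhausts a step budget. The following sketch uses stubbed tools and a stubbed model; every name in it is illustrative rather than a real API.

```python
# Sketch of an agent loop: call the model, run any tool it requests,
# feed the result back, and stop on a final answer. All stubs.

TOOLS = {
    "measure_room": lambda: {"width_m": 4.2, "length_m": 5.1},
    "generate_bom": lambda: [{"item": "sofa", "price": 799.0}],
}

def model_stub(transcript: list[dict]) -> dict:
    """Stand-in model policy: request each tool once, then finish."""
    used = {m["tool"] for m in transcript if m.get("role") == "tool"}
    for name in TOOLS:
        if name not in used:
            return {"action": "call_tool", "tool": name}
    return {"action": "final", "text": "Redesign complete."}

def agent_loop(max_steps: int = 10) -> str:
    transcript: list[dict] = []
    for _ in range(max_steps):
        decision = model_stub(transcript)
        if decision["action"] == "final":
            return decision["text"]
        result = TOOLS[decision["tool"]]()   # execute the requested tool
        transcript.append({"role": "tool",
                           "tool": decision["tool"],
                           "result": result})
    return "Step budget exhausted."
```

The `max_steps` cap is the practical difference from a chatbot: the loop is allowed to run autonomously, but never unboundedly.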

What's next for Echo Omni Architect

We plan to integrate the Gemini Live API more deeply for real-time voice-to-3D manipulation and to expand our tools with Local Building Code Grounding for automatic permit verification.
