Inspiration

Interior design has always been a game of imagination — you stand in a room, try to picture a different sofa, and hope for the best. Mood boards and 3D planning tools help, but they require time, expertise, and patience most people simply don't have. We asked ourselves: what if your phone could just show you? What if an AI could look at your room, talk to you about your taste, and instantly render the result — live, photorealistic, right in front of you? That question became The Instant Architect.

What it does

The Instant Architect turns any smartphone into a real-time AI interior designer — no app install, no manual configuration, no waiting.

Point your camera at a room and start talking. A multimodal AI agent analyzes the live camera feed while holding a natural voice conversation with you. It proactively suggests furniture and décor tailored to the room's layout and your stated style preferences — less like a chatbot, more like an enthusiastic architect on call.

The magic happens the moment you agree to a suggestion. The AI autonomously triggers a render_furniture tool call, the frontend captures a high-resolution snapshot of the current camera view, and within seconds the chosen piece of furniture is photorealistically painted into the scene — perspective-correct, with accurate shadows, fully composited. The result appears as an interactive before/after slider overlaid directly on the live camera feed: an instant, tangible wow moment.
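The loop above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the tool-call shape and field names (`furniture`, `style`) are assumptions, and the snapshot is assumed to arrive as a base64 string captured from a canvas.

```typescript
// Hypothetical shape of the render_furniture tool call emitted by the agent;
// field names are assumptions for illustration.
interface ToolCall {
  name: string;
  args: { furniture: string; style?: string };
}

// Build the body for the /api/inpaint request from a tool call plus a
// snapshot. snapshotBase64 would come from drawing the live <video> element
// onto a canvas and exporting it as a data URL.
function buildInpaintRequest(call: ToolCall, snapshotBase64: string) {
  if (call.name !== "render_furniture") {
    throw new Error(`unexpected tool call: ${call.name}`);
  }
  const piece = call.args.style
    ? `${call.args.style} ${call.args.furniture}`
    : call.args.furniture;
  return {
    image: snapshotBase64,
    prompt: `Photorealistically add a ${piece} to this room, matching perspective, shadows, and lighting.`,
  };
}
```

The frontend would POST this object to the backend, then swap the returned image into the before/after slider once the render completes.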

How we built it

The project is structured as a monorepo optimized for mobile browsers, built across three layers:

Frontend — React / Vite

Handles camera and microphone access, maintains a persistent low-latency WebSocket connection for the continuous audio/video stream, and decodes and plays back the AI's 24 kHz PCM audio responses directly in the browser.
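Raw PCM playback boils down to converting the 16-bit samples off the wire into the Float32 format Web Audio expects. A minimal sketch, assuming mono 16-bit little-endian PCM at 24 kHz (the playback function is browser-only, so it is typed loosely here):

```typescript
// Convert 16-bit little-endian PCM (as received over the WebSocket) into
// Float32 samples in [-1, 1], the format Web Audio buffers expect.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return out;
}

// In the browser, each decoded chunk is queued on an AudioContext created at
// the model's 24 kHz output rate (sketch; ctx is an AudioContext).
function playChunk(ctx: any, samples: Float32Array): void {
  const buffer = ctx.createBuffer(1, samples.length, 24000); // mono, 24 kHz
  buffer.copyToChannel(samples, 0);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
}
```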

Backend — Node.js / Express

Acts as a secure API proxy (keeping all Google Cloud credentials server-side), manages the WebSocket relay logic for the live stream, and exposes a /api/inpaint REST endpoint that receives snapshot data and dispatches it to the image generation model.

Infrastructure — Google Cloud Run

The entire stack is containerized as a lean Docker image and deployed serverlessly, enabling scalable, zero-maintenance hosting.
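A deployment like this typically uses a multi-stage Dockerfile so the final image carries only the production server and the built frontend. The file below is a generic sketch under assumed file names and npm scripts, not the project's actual build:

```dockerfile
# Build stage: install dependencies and bundle the Vite frontend.
FROM node:20-slim AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build            # hypothetical script; emits the static frontend

# Runtime stage: lean production image for Cloud Run.
FROM node:20-slim
WORKDIR /app
ENV NODE_ENV=production
COPY --from=build /app ./
# Cloud Run injects PORT at runtime; the Express server should listen on it.
EXPOSE 8080
CMD ["node", "server.js"]    # hypothetical entry point
```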

AI & APIs

  • Gemini Live API — multimodal, real-time voice + vision understanding and conversation
  • Google Imagen / Gemini Flash Image API — photorealistic in-painting of furniture into captured room snapshots
  • Google GenAI SDK — unified, secure integration layer across all Google AI services

Challenges we ran into

The steepest challenge was establishing a stable, low-latency, bidirectional audio connection between the browser and the backend. Achieving reliable real-time audio streaming over WebSockets — handling encoding, buffering, and browser playback without perceptible lag — required careful protocol design and became the decisive engineering problem of the project. Balancing stream stability against the responsiveness the experience demands pushed us to iterate hard on the architecture before we got it right.
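The buffering half of that problem comes down to scheduling: chunks arrive with network jitter, so playing each one the instant it arrives produces gaps and overlaps. One common fix, sketched here as pure logic with assumed names, is to keep a running cursor in audio-clock time and start each chunk exactly where the previous one ends:

```typescript
// Schedules PCM chunks back-to-back on an audio clock instead of playing
// them on arrival, absorbing network jitter. Names are assumptions.
class ChunkScheduler {
  private cursor = 0; // time (seconds) at which the next chunk should start

  constructor(private sampleRate: number) {}

  // Returns the start time for a chunk of `sampleCount` samples, given the
  // audio clock's current time. If playback fell behind (cursor < now),
  // restart at now plus a small safety margin rather than in the past.
  schedule(sampleCount: number, now: number): number {
    const start = Math.max(this.cursor, now + 0.02);
    this.cursor = start + sampleCount / this.sampleRate;
    return start;
  }
}
```

In the browser, `now` would be `AudioContext.currentTime` and the returned value would be passed to `AudioBufferSourceNode.start()`.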

Accomplishments that we're proud of

We built a genuinely seamless "see it, say it, see it rendered" loop — from live voice conversation to autonomous tool call to photorealistic in-painted result — that runs entirely in a mobile browser with no installation required. The before/after slider landing on top of the live camera feed the moment rendering completes is exactly the wow moment we set out to create, and seeing it work end-to-end in real time felt like a real milestone.

We're also proud of how cleanly the architecture holds together: a secure, serverless backend, a latency-optimized WebSocket relay, and a front-end that handles raw PCM audio playback natively — all shipped as a single containerized deployment.

What we learned

Working with the Google GenAI SDK taught us how to securely expose powerful AI capabilities without leaking credentials to the client. More importantly, building a multimodal agent that simultaneously processes live voice and video demonstrated how much emergent capability becomes available when modalities are combined rather than used in isolation. The conversation doesn't just guide the experience — it is the experience. Pairing streaming dialogue with autonomous tool-calling is what elevates this from a demo into something that feels like a real product.

What's next for The Instant Architect

  • Product catalog integration — connecting to real furniture retailers so suggested pieces can be purchased directly
  • Room memory — letting the AI remember previous sessions and track a room's evolving design over time
  • Multi-user collaboration — sharing a live session so a couple or a team can redesign a space together in real time
  • Expanded surface types — extending in-painting beyond furniture to wall colors, flooring, lighting fixtures, and architectural features
  • Native mobile app — packaging the experience as an iOS and Android app for even smoother camera and audio performance
