ARCHITECT — Live Spatial Intelligence

Inspiration

Interior design is one of those domains where imagination constantly outpaces execution. You can picture a room transformed — warmer tones, mid-century furniture, open shelving — but bridging that mental image to reality requires expensive consultants, mood boards, and weeks of iteration. We wanted to collapse that gap to zero: point a camera, speak your vision, and see it happen in real-time.

Gemini's Live API made this practical. Voice, vision, real-time streaming, and tool calls in a single session meant we could build a genuine conversational design agent: not a chatbot with image attachments, but an agent that actually sees your room and talks back with design expertise.

What It Does

ARCHITECT is a real-time interior design agent. Here's the full flow:

Authenticate — You sign in with Auth0. ARCHITECT uses Token Vault to securely store OAuth tokens for the services your agent will use on your behalf.
Scan — Point your camera at a room. ARCHITECT sees it via Gemini Live API, describes what it observes (furniture, dimensions, lighting, color palette, style), and calls analyze_room to persist the spatial data.
Design — Say what style you want: "Make it Japandi" or "Go full mid-century modern." ARCHITECT generates a photorealistic redesign image using Imagen 3, streams it directly to your browser, and talks you through the transformation.
Shop — Ask about the furniture. ARCHITECT searches for matching real products with prices from retailers (IKEA, West Elm, CB2), builds a complete shopping list, and presents it in a dedicated panel.

Everything happens through voice: no typing, no clicking through menus. It feels like having the designer from a home renovation show standing in your room, and anyone can use it.

How We Built It

Backend (Python + Google ADK + Auth0):

FastAPI with a WebSocket endpoint handling binary PCM audio + JPEG video frames + JSON control messages simultaneously
Google ADK LlmAgent with Gemini 2.0 Flash Live API as the orchestrator — real-time voice/vision streaming with tool calling
Five custom ADK FunctionTool instances: analyze_room, generate_redesign, generate_color_palette, search_furniture, build_complete_shopping_list
Auth0 for AI Agents Token Vault: User OAuth tokens (Google Cloud Storage, Firestore) stored in Token Vault and retrieved per-request by the agent when calling tools — no static service account credentials exposed to users
Imagen 3 (gemini-2.0-flash-exp-image-generation) for photorealistic room redesigns
Firestore for session persistence (room analyses, designs, shopping lists)
Cloud Storage for generated image hosting

Frontend (React + TypeScript):

Auth0 React SDK for authentication and session management
Custom useAudioCapture hook: AudioWorklet pipeline capturing mic at 48kHz, downsampling to 16kHz PCM
Custom useAudioPlayback hook: AudioWorklet pipeline upsampling 24kHz agent speech to 48kHz for playback
Custom useCameraCapture hook: 1fps JPEG frame capture from webcam, base64-encoded over WebSocket
Tabbed UI: Chat transcript | Design Gallery | Shopping List

Infrastructure:

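Deployed on Cloud Run with session affinity enabled, so repeat connections from the same client are routed to the same instance where possible (an individual open WebSocket connection stays on one instance regardless)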
Auth0 tenant configured with Token Vault for Google Cloud OAuth credential delegation

Auth0 Token Vault Integration

ARCHITECT uses Auth0 for AI Agents Token Vault to handle secure credential delegation for the agent's tool calls.

When you sign in, Auth0 authenticates your identity and ARCHITECT's backend uses Token Vault to store scoped OAuth tokens for the external services the agent uses on your behalf. When the agent calls generate_redesign (which writes to Cloud Storage) or analyze_room (which writes to Firestore), it fetches the relevant token from Token Vault at invocation time rather than using static shared server-side credentials.

The key benefit: your token is yours. If you revoke access, the agent loses its ability to write to your storage or read your session data on its next tool call, because every call fetches a fresh token. No shared credentials, no ambient authorization: every agent action is explicitly tied to your identity and the scopes you consented to. This is the foundation of trustworthy agentic AI.

Challenges We Faced

Binary multiplexing: Getting audio, video frames, and JSON events over a single WebSocket required a custom binary framing protocol. We encode a JSON header, a null byte separator, then the raw PCM/JPEG payload. The backend uses the same protocol in reverse for audio streaming back to the browser.
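
Here is a minimal Python sketch of that framing. The header fields (`type`, `mime`) are hypothetical; the protocol as described only fixes the layout: a JSON header, a null byte, then the raw payload. Splitting on the first null byte is safe because `json.dumps` always escapes a NUL character inside a string as `\u0000`, so a serialized header never contains a raw 0x00 byte.

```python
from __future__ import annotations

import json
from typing import Any

SEPARATOR = b"\x00"

def encode_frame(header: dict[str, Any], payload: bytes = b"") -> bytes:
    """Serialize a frame as: UTF-8 JSON header, one NUL byte, raw payload."""
    return json.dumps(header, separators=(",", ":")).encode("utf-8") + SEPARATOR + payload

def decode_frame(frame: bytes) -> tuple[dict[str, Any], bytes]:
    """Split a frame at the FIRST NUL byte back into (header, payload).

    PCM and JPEG payloads routinely contain NUL bytes themselves, so only
    the first separator is significant.
    """
    header_bytes, sep, payload = frame.partition(SEPARATOR)
    if not sep:
        raise ValueError("frame has no NUL separator")
    return json.loads(header_bytes.decode("utf-8")), payload
```

Usage: `encode_frame({"type": "audio", "mime": "audio/pcm;rate=16000"}, pcm_chunk)` on the sending side and `decode_frame(message)` on the receiving side. A pure JSON control message is just a header with an empty payload.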

ADK live session lifecycle: Managing the async generator returned by runner.run_live() alongside WebSocket message handling required careful task management to avoid race conditions during session cleanup.
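
One common way to structure this is sketched below with plain asyncio. It is not our exact code. The `upstream` and `downstream` callables are stand-ins for the WebSocket receive loop and the loop over the agent's live event stream, and `cleanup` stands in for closing the agent's request queue and the socket. Whichever direction ends first, the other is cancelled and awaited before cleanup runs exactly once.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

async def run_session(
    upstream: Callable[[], Awaitable[None]],
    downstream: Callable[[], Awaitable[None]],
    cleanup: Callable[[], Awaitable[None]],
) -> None:
    """Run both directions of a live session and tear down once, whichever ends first."""
    tasks = [asyncio.create_task(upstream()), asyncio.create_task(downstream())]
    try:
        done, _ = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        # Re-raise an error from the side that ended first (e.g. a dropped socket).
        for task in done:
            if not task.cancelled() and (exc := task.exception()) is not None:
                raise exc
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
        # Let cancelled tasks finish unwinding before touching shared state.
        await asyncio.gather(*tasks, return_exceptions=True)
        await cleanup()
```

The `finally` block also covers the case where `run_session` itself is cancelled (for example, by server shutdown), so neither loop can outlive the cleanup.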

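AudioWorklet precision: The PCM capture worklet needed to handle the 48kHz → 16kHz downsampling correctly. Content above 8kHz (the Nyquist limit at 16kHz) has to be filtered out before samples are discarded; otherwise it aliases into artifacts that Gemini's speech recognition struggles with.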
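
The anti-aliasing step is sketched below in Python rather than worklet JavaScript. It assumes a 3:1 decimation (48kHz to 16kHz), a Hann-windowed sinc low-pass filter with its cutoff placed a little under 8kHz to leave room for the transition band, and a 63-tap length chosen for illustration. The actual worklet's filter design isn't described in this writeup.

```python
from __future__ import annotations

import math
import struct

def lowpass_fir(num_taps: int, cutoff_hz: float, sample_rate: float) -> list[float]:
    """Hann-windowed sinc low-pass FIR, normalized to unity gain at DC."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hann)
    total = sum(taps)
    return [t / total for t in taps]

TAPS_48K_TO_16K = lowpass_fir(63, cutoff_hz=7200.0, sample_rate=48000.0)

def downsample_by_3(samples: list[float], taps: list[float] = TAPS_48K_TO_16K) -> list[float]:
    """Low-pass filter, then keep every 3rd sample (48 kHz -> 16 kHz).

    The filter is evaluated only at the output positions that are kept.
    """
    out = []
    for i in range(0, len(samples), 3):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * samples[i - k]
        out.append(acc)
    return out

def float_to_pcm16(samples: list[float]) -> bytes:
    """Clip to [-1, 1] and pack as little-endian signed 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(round(s * 32767) for s in clipped))
```

This batch version processes a whole buffer at once. A real worklet receives 128-frame render quanta, so it must carry the last `len(taps) - 1` input samples and the current decimation phase from one quantum to the next.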

Token Vault integration: Wiring Token Vault into the agent's tool execution path meant passing the Auth0-issued access token through to each tool call, then exchanging it for the downstream service token at the point of use — not at session start. This keeps token lifetimes minimal and scoped.
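
Here is a sketch of what exchanging the token at the point of use can look like on the wire: an OAuth 2.0 token-exchange POST to the Auth0 tenant's `/oauth/token` endpoint. The grant-type and token-type strings reflect our reading of Auth0's Token Vault documentation and should be treated as assumptions to verify against the current docs. The sketch also assumes the backend is a confidential Auth0 client and that the Google connection uses Auth0's default name, `google-oauth2`. The function only builds the request, so you can send it with any HTTP client.

```python
from __future__ import annotations

from urllib.parse import urlencode

# ASSUMED values, based on Auth0's Token Vault docs; verify before use.
GRANT_TYPE = "urn:auth0:params:oauth:grant-type:token-exchange:federated-connection-access-token"
SUBJECT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"
REQUESTED_TOKEN_TYPE = "http://auth0.com/oauth/token-type/federated-connection-access-token"

def build_token_vault_exchange(
    domain: str,
    client_id: str,
    client_secret: str,
    user_access_token: str,
    connection: str = "google-oauth2",
) -> tuple[str, bytes, dict[str, str]]:
    """Build (url, form body, headers) for trading a user's Auth0 access token
    for the downstream provider token held in Token Vault. Nothing is sent."""
    url = f"https://{domain}/oauth/token"
    form = {
        "grant_type": GRANT_TYPE,
        "client_id": client_id,
        "client_secret": client_secret,
        "subject_token_type": SUBJECT_TOKEN_TYPE,
        "subject_token": user_access_token,
        "requested_token_type": REQUESTED_TOKEN_TYPE,
        "connection": connection,
    }
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return url, urlencode(form).encode("ascii"), headers
```

Because the request is built per tool call, the Google credential in the response only ever lives in the local scope of the call that needed it.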

What We Learned

Google ADK's LlmAgent + FunctionTool pattern makes it remarkably clean to give an agent tools: each tool's schema is inferred from its function signature and docstring
Gemini Live API's multimodal input (audio and images in the same live session) is a real differentiator, since many live voice APIs handle audio only
Auth0 Token Vault is the right abstraction for agentic credential management: tokens should be scoped, revocable, and tied to user consent — not baked into the server
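Cloud Run's --session-affinity flag is worth enabling for stateful WebSocket applications: an open WebSocket already stays on one instance, but without affinity a client's later requests or reconnections can be load-balanced to a different instance that lacks its in-memory state. Affinity is best-effort, not a guarantee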
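
To make the first point concrete, here is a deliberately simplified stand-in for that kind of inference: it reads a function's name, type hints, and docstring into a JSON-schema-like tool declaration. This is not ADK's actual code, and the `generate_color_palette` parameters shown are hypothetical.

```python
from __future__ import annotations

import inspect
from typing import Any, get_type_hints

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array", dict: "object"}

def tool_schema(fn) -> dict[str, Any]:
    """Derive a simple tool declaration from a function's signature and docstring."""
    hints = get_type_hints(fn)
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default -> the model must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def generate_color_palette(style: str, num_colors: int = 5) -> dict:
    """Suggest a color palette for the given interior design style."""
    return {"style": style, "colors": []}  # hypothetical placeholder body
```

The practical upshot is that a clear docstring and accurate type hints become the instructions the model reads when it decides whether and how to call a tool.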

What's Next

Augmented reality overlay: Show the redesigned room superimposed on the live camera feed
3D spatial mapping: Use depth estimation to understand room geometry before generating the redesign
Direct purchase integration: Connect to retailer APIs with Token Vault-stored OAuth tokens for live pricing and add-to-cart functionality
Multi-room sessions: Maintain spatial context across an entire home, not just one room

📝 Blog Post: Giving AI Agents Keys They Can't Copy — Building with Auth0 Token Vault

⭐ Note to judges: This section is our Bonus Blog Post submission, covering our Token Vault integration experience in ARCHITECT. It is materially different from the project description above.

The Problem with Agent Credentials

When you build an AI agent that calls external APIs — cloud storage, databases, third-party services — you immediately hit a credential problem. How does the agent authenticate to those services?

The naive answer is a service account or API key baked into the server. It works in development, but it's the wrong model for multi-user production applications. Every user's agent runs under the same identity. If one session is compromised, all users are exposed. There's no revocation granularity. There's no per-user audit trail. The agent has ambient authorization to act on behalf of anyone who has ever used the app — which is exactly the kind of over-permissioned access that makes security teams nervous about deploying AI agents at all.

Auth0 for AI Agents' Token Vault solves this at the right level of abstraction, and integrating it into ARCHITECT fundamentally changed how we think about agent architecture.

How Token Vault Changes the Architecture

Token Vault stores OAuth tokens on behalf of individual users, scoped to the specific APIs they've explicitly consented to grant. In ARCHITECT's case:

User authenticates with Auth0 — standard Authorization Code flow with PKCE in the React frontend, using the Auth0 React SDK
Auth0 issues a user-scoped access token — ARCHITECT's backend receives this via the WebSocket connection handshake
Backend calls Token Vault — exchanges the user's Auth0 token for the downstream service credential (Google Cloud OAuth) that was stored when the user first connected their Google account
Agent tools use the retrieved token — generate_redesign writes to that user's Cloud Storage path; analyze_room writes to that user's Firestore document
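
The Auth0 React SDK performs the PKCE steps in step 1 for you. For readers unfamiliar with PKCE, here is the S256 code-challenge derivation defined in RFC 7636, written in Python for consistency with the other examples in this writeup. The client sends the challenge (with `code_challenge_method=S256`) on the authorize request and the original verifier on the token request, so a stolen authorization code is useless without the verifier.

```python
import base64
import hashlib
import secrets

def make_code_verifier(num_bytes: int = 32) -> str:
    """Random URL-safe code verifier; 32 random bytes encode to 43 characters."""
    return base64.urlsafe_b64encode(secrets.token_bytes(num_bytes)).rstrip(b"=").decode("ascii")

def code_challenge_s256(verifier: str) -> str:
    """BASE64URL(SHA256(ASCII(verifier))) with padding removed, per RFC 7636."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The verifier/challenge pair from RFC 7636 Appendix B is a convenient check: `code_challenge_s256("dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk")` yields `"E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM"`.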

The downstream token is never long-lived in our backend's memory. It's fetched from Token Vault at tool invocation time, used, and discarded. Token Vault handles refresh, rotation, and revocation transparently.

What the Implementation Looks Like

From an implementation perspective, the change was surgical and non-disruptive to the core agent logic. We added a TokenVaultClient utility that wraps Auth0's Token Vault API. Each tool function receives a user_token parameter. Before calling Cloud Storage or Firestore, the tool calls token_vault.get_token(user_token, scope="google-cloud") and uses the returned credential for that operation.
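
A minimal sketch of that pattern follows. `TokenVaultClient` and `get_token(user_token, scope=...)` are the names used above, but the bodies here are hypothetical. The client is a stub that returns a fake credential so the example runs without Auth0 or Google Cloud, and `save_room_analysis` is a hypothetical stand-in for the Firestore write inside `analyze_room`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class TokenVaultClient:
    """Hypothetical stub; a real client would perform the Auth0 token exchange."""
    calls: list[tuple[str, str]] = field(default_factory=list)

    def get_token(self, user_token: str, scope: str) -> str:
        self.calls.append((user_token, scope))
        return f"downstream-token:{scope}"

token_vault = TokenVaultClient()

def save_room_analysis(credential: str, room: dict) -> dict:
    """Hypothetical stand-in for a Firestore write authorized by `credential`."""
    if not credential:
        raise PermissionError("no downstream credential")
    return {"status": "ok", "room": room}

def analyze_room(room: dict, user_token: str) -> dict:
    """Persist a room analysis under the calling user's identity."""
    # Exchange at the point of use: one fresh downstream token per invocation.
    # The credential is a local variable; it is never cached on the client or session.
    credential = token_vault.get_token(user_token, scope="google-cloud")
    return save_room_analysis(credential, room)
```

One caveat worth checking in your own setup: if a function like this is registered with ADK as-is, `user_token` would likely appear in the tool schema the model sees and fills in. Keeping the token in server-side session state instead of the function signature keeps it out of the model's hands.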

The agent's system prompt doesn't change. The ADK tool schema doesn't change. The Gemini Live session doesn't change. Token Vault integration lives entirely in the tool implementation layer — which is exactly where it belongs. The agent doesn't need to know anything about credential management; it just calls tools, and the tools are responsible for authenticating to their downstream services correctly.

The Security Model This Enables

With Token Vault, ARCHITECT's security model becomes genuinely per-user:

Per-user identity: every Cloud Storage write and Firestore read is tied to an individual Auth0 user identity, not a shared service account
Explicit consent: users grant Google Cloud access during onboarding; Token Vault stores and manages that delegation
Revocability: a user can revoke Google Cloud access from their Auth0 profile, immediately breaking the token chain — the agent can no longer act on their behalf for any future tool calls
Minimal blast radius: if a user's token is compromised, only that user's data is at risk — not the entire application's service account

Lessons for Other Agent Builders

If you're building an AI agent that touches third-party APIs on behalf of users, the core principle is: authenticate the user, not the agent. The agent should inherit the user's identity and permissions, not operate with elevated service-level access that transcends individual user consent.

Token Vault is the infrastructure that makes this practical at scale. You don't have to build token storage, refresh logic, revocation handling, or consent management yourself. Auth0 handles the hard parts; you write tool code that calls get_token and uses it.

The philosophical shift is small but important: the agent is acting as the user, under the user's authority, bounded by the user's consent. That's what "authorized to act" actually means — and Token Vault makes it the default, not an afterthought that gets retrofitted after a security review.

Built With

audioworklet
auth0
cloud-run
fastapi
firestore
gemini-2.0-flash
gemini-live-api
google-adk
imagen-3
python
react
tailwindcss
token-vault
typescript
websocket

Updates

Periculous WinkleMaster started this project — Mar 04, 2026 10:11 AM EST
