Inspiration

Creative projects often begin with a feeling rather than a specification. In interior design, clients rarely know exactly what they want; they recognize what feels right only after seeing it. This leads to long revision cycles, vague briefs, and inefficient design processes. Mood boards help with inspiration, but they do not guide decisions or synthesize preferences. Design Alignment Agent explores a different approach. Instead of asking users to describe their taste, it lets them discover it visually. The system presents design options, observes reactions, and refines the direction until a clear aesthetic emerges. The idea is to turn preference discovery into a structured journey, one that begins with uncertainty and ends with a coherent design brief.

What it does

Design Alignment Agent is a stateful multimodal AI agent that helps users converge on a design direction through visual comparison and iterative refinement. The experience follows a guided arc:

  • Style selection: the user picks an interior style from a curated visual set.
  • Style explanation: the agent explains the aesthetic logic of that style.
  • Exploration round: three variations of the same room are presented side by side.
  • Refinement round: based on the selection, the agent generates progressively focused alternatives.
  • Final convergence: a structured design brief summarizes the aesthetic direction.

Throughout the process, the system maintains a persistent session that evolves with every choice. The result is a clear design direction derived from interaction rather than prompt writing.
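To make the idea of an evolving session concrete, here is a minimal sketch of how the five-stage arc and the recorded choices could be modeled. The `Stage` and `Session` names, the field layout, and the `to_document()` shape are illustrative assumptions, not the project's actual schema; the real system persists session state in Firestore, which this sketch does not call.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Stages of the guided arc, in order (hypothetical names)."""
    STYLE_SELECTION = 1
    STYLE_EXPLANATION = 2
    EXPLORATION = 3
    REFINEMENT = 4
    CONVERGENCE = 5


@dataclass
class Session:
    """Hypothetical session record; the described system stores this in Firestore."""
    session_id: str
    stage: Stage = Stage.STYLE_SELECTION
    choices: list = field(default_factory=list)

    def advance(self, choice: Optional[dict] = None) -> None:
        """Record the user's choice (if any) for the current stage, then move on."""
        if choice is not None:
            self.choices.append({"stage": self.stage.name, **choice})
        # The arc ends at convergence; further calls leave the stage unchanged.
        if self.stage is not Stage.CONVERGENCE:
            self.stage = Stage(self.stage.value + 1)

    def to_document(self) -> dict:
        """Flatten to a plain dict, the kind of shape a document store can persist."""
        return {
            "session_id": self.session_id,
            "stage": self.stage.name,
            "choices": self.choices,
        }
```

Keeping the session as a flat, serializable record means each API call could reload it, apply one choice, and write it back, which suits a stateless service such as Cloud Run.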

How we built it

FastAPI backend on Google Cloud Run. Gemini 2.5 Flash handles all reasoning, including style commentary, direction planning, and final brief generation. Imagen 3 renders photorealistic rooms sequentially. Firestore persists session state across every call. The key architectural decision was separating reasoning from rendering: Gemini plans the full design direction first, then Imagen generates images against that plan. This keeps quality high and cost under control.
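The plan-then-render split can be sketched with the model calls injected as plain functions. `plan_then_render`, `Planner`, and `Renderer` are hypothetical names; in the described system the planner would wrap a Gemini 2.5 Flash call and the renderer an Imagen 3 call, but no SDK calls are shown here to avoid guessing at the project's actual request parameters.

```python
from typing import Callable

# Hypothetical signatures: planner(style, n) -> n text directions;
# renderer(direction) -> image bytes.
Planner = Callable[[str, int], list]
Renderer = Callable[[str], bytes]


def plan_then_render(style: str, n_options: int, plan: Planner, render: Renderer) -> list:
    """Plan every direction in text first, then render the images one by one."""
    # 1. Reasoning step: a single cheap text call fixes all directions up front.
    directions = plan(style, n_options)
    if len(directions) != n_options:
        raise ValueError(
            f"planner returned {len(directions)} directions, expected {n_options}"
        )

    # 2. Rendering step: slow, expensive image calls run sequentially against the plan.
    results = []
    for direction in directions:
        results.append((direction, render(direction)))
    return results
```

Because every direction is fixed before the first image request, a rejected plan costs one text call rather than several image generations.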

Challenges we ran into

Keeping visual comparisons fair across rounds. The solution was fixing the spatial canvas. Same room, same camera, same light across every option so users evaluate aesthetic differences only, not architectural ones. Managing sequential image generation latency while keeping the experience coherent was the other main constraint.
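One way to hold the spatial canvas fixed is to keep the room, camera, and light description in a single constant and vary only the aesthetic clause of each prompt. The canvas wording and `build_prompt` below are hypothetical; the project's actual prompt text is not given here.

```python
# Hypothetical canvas: one room, one camera position, one lighting setup.
SPATIAL_CANVAS = (
    "A living room, 5 m x 4 m, one large window on the left wall, "
    "eye-level camera from the doorway, 35 mm lens, soft late-afternoon daylight."
)


def build_prompt(style_direction: str, canvas: str = SPATIAL_CANVAS) -> str:
    """Combine the fixed spatial canvas with a variable aesthetic direction.

    Only style_direction changes between options, so differences between the
    rendered images should come from aesthetics rather than layout or lighting.
    """
    return (
        f"Photorealistic interior render. Spatial constraints (keep identical): {canvas} "
        f"Aesthetic direction (vary only this): {style_direction}"
    )
```

Putting the canvas in one place also makes it hard for a single option to drift: changing the room means changing it for every option at once.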

Accomplishments that we're proud of

The planning-before-rendering pattern works well: Gemini reasons about the design direction first, then Imagen executes against that plan. The session state design also works cleanly. Round 2 automatically uses the image selected in Round 1 as its spatial anchor, creating continuity across rounds.
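The Round 2 anchoring rule can be expressed as a small lookup over the stored session. The document shape (`rounds`, `images`, `selected`) and the function name are assumptions for illustration only.

```python
from typing import Optional


def spatial_anchor_for_round(session: dict, round_number: int) -> Optional[str]:
    """Return the image a round should be conditioned on, or None for Round 1.

    Assumes a hypothetical session document shaped like:
    {"rounds": {"1": {"images": {"A": ..., "B": ...}, "selected": "B"}}}
    """
    if round_number <= 1:
        return None  # Round 1 starts from the fixed spatial canvas alone.
    previous = session["rounds"].get(str(round_number - 1))
    if previous is None or previous.get("selected") is None:
        raise ValueError(f"round {round_number - 1} has no selection yet")
    # The previous round's chosen image becomes this round's spatial anchor.
    return previous["images"][previous["selected"]]
```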

What we learned

Stateful multimodal agents require careful separation of concerns. Text reasoning is fast and cheap; image generation is slow and expensive. Designing the system around that asymmetry, with all planning done in text before any image is requested, kept quality high and cost under control. Paradoxically, fixing spatial constraints gives users more creative clarity, not less: removing variables focuses attention on what actually matters.

What's next for Design Alignment Agent

Voice and typed commentary between rounds as richer preference signals. Compositional selection across cards, for example taking the sofa from one option and the palette from another. Open-ended refinement rounds that continue until the user genuinely converges, instead of stopping at a fixed limit. Expansion beyond living rooms into branding, architecture, and visual identity.

Built With

  • fastapi
  • gemini-2.5-flash
  • google-cloud-firestore
  • google-cloud-run
  • google-genai-sdk
  • imagen-3
  • python
  • vertex-ai

Updates


https://github.com/berilozbay-create/design-alignment-agent-v2

Design Alignment Agent v2

A significant evolution of the original hackathon submission. V2 introduces dual-style selection, a 6-card structured exploration system, voice/text preference feedback, and a signal-driven Round 2 that uses labeled visual references to generate refined proposals.

What changed from V1:

  • Dual style selection: users now pick two styles (primary and secondary) instead of one. The system generates a structured 6-card layout: pure primary, primary variation, primary-led blend, pure secondary, secondary variation, and secondary-led blend.
  • Progressive card loading: cards appear one by one as they are generated rather than all at once. A and D (static references) appear at 5 and 30 seconds, with artificial delays for a consistent feel.
  • Voice and text feedback: after seeing the 6 cards, users describe what they liked and disliked in their own words, typed or spoken. Any language is supported. Gemini extracts structured design signals from the comment.
  • Signal-driven Round 2: Round 2 no longer generates mechanical material variations. Instead, it uses the user's stated preferences to generate 3 refined proposals. All 6 Round 1 cards are sent to Gemini as labeled visual references (A-F stamped with Pillow) so it can identify and reproduce specific liked elements.
  • Reliable generation: sequential image generation with 25-second cooldowns prevents API rate-limit errors. A polling architecture shows each card as it arrives.
  • Final hero image fixed: the final proposal image now correctly reflects the user's Round 2 selection.
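As an illustration of how the 6-card layout and the cooldown-spaced, progressive generation could fit together, here is a minimal sketch. It assumes the letters A-F follow the order in which the layout is listed (so A is pure primary and D is pure secondary), and that "static references" means A and D are pre-rendered rather than generated. `CARD_SLOTS`, `card_brief`, `generate_cards`, `render`, and `publish` are hypothetical names; `publish` stands in for writing a finished card to wherever the frontend polls.

```python
import time
from typing import Callable

# Assumed mapping of the six v2 cards to (leading style, card kind).
CARD_SLOTS = {
    "A": ("primary", "pure"),
    "B": ("primary", "variation"),
    "C": ("primary", "blend"),    # primary-led blend
    "D": ("secondary", "pure"),
    "E": ("secondary", "variation"),
    "F": ("secondary", "blend"),  # secondary-led blend
}


def card_brief(slot: str, primary: str, secondary: str) -> str:
    """Describe what one card should show, given the user's two styles."""
    lead, kind = CARD_SLOTS[slot]
    main, other = (primary, secondary) if lead == "primary" else (secondary, primary)
    if kind == "pure":
        return f"pure {main}"
    if kind == "variation":
        return f"variation of {main}"
    return f"{main}-led blend with {other}"


def generate_cards(
    primary: str,
    secondary: str,
    render: Callable[[str], object],
    publish: Callable[[str, object], None],
    cooldown_s: float = 25.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Render the non-static cards one at a time, publishing each as it finishes.

    A cooldown between consecutive image requests spaces them out to stay
    under the image API's rate limit; the first request is not delayed.
    """
    # Assumption: A and D (pure styles) are static references, not generated here.
    to_generate = [s for s, (_, kind) in CARD_SLOTS.items() if kind != "pure"]
    for i, slot in enumerate(to_generate):
        if i > 0:
            sleep(cooldown_s)
        publish(slot, render(card_brief(slot, primary, secondary)))
```

Injecting `sleep` keeps the 25-second cooldown testable without waiting; in production the default `time.sleep` would be used.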
