J's Room AI

Inspiration

Redesigning a room in India usually means either hiring a professional interior designer (₹50,000–₹5,00,000+) or experimenting blindly and making expensive mistakes.

We wanted to make professional-quality design advice accessible to anyone with a smartphone — simply by talking. No forms, no typing, and no knowledge of design terminology required.


What It Does

J's Room AI is a live, voice-first AI interior designer.

Users simply show their room using a camera or photo, have a natural voice conversation about their style and budget, and receive a photorealistic redesign of their actual room.

The AI then generates a shopping list of real products from Indian retailers, including actual prices and purchase links.


How We Built It

All AI inference runs on Amazon Bedrock, using three Amazon Nova models.

Models Used

Amazon Nova Sonic (amazon.nova-sonic-v1:0): Powers real-time bidirectional voice conversations as the core AI agent and orchestrates autonomous tool calling.

Amazon Nova Pro (amazon.nova-pro-v1:0): Acts as the vision specialist, analyzing room photos to identify objects, materials, colors, and spatial relationships. It also generates detailed prompts for image generation.

Amazon Nova Canvas (amazon.nova-canvas-v1:0): Generates photorealistic room redesigns while preserving the room’s original layout.


Architecture

Requests to Nova Sonic must be signed with AWS SigV4, and signing in the browser would expose AWS credentials. We therefore built an Express + Socket.IO backend proxy that streams PCM16 audio bidirectionally between the browser and Amazon Bedrock, so credentials stay on the server.
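
Before audio can cross this proxy, the browser has to turn microphone samples into PCM16, and turn received PCM16 back into playable samples. The following is a minimal sketch of that conversion, assuming the input is Float32 samples in the range -1 to 1 (the format Web Audio produces). The function names are illustrative and not taken from the project.

```typescript
// Convert Float32 samples (range -1..1) to 16-bit signed PCM.
// Function names are hypothetical, not the project's actual code.
function floatToPcm16(input: Float32Array): Int16Array {
  return Int16Array.from(input, (sample) => {
    // Clamp first so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, sample));
    // The negative range of a 16-bit integer is one step larger than the positive range.
    return s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  });
}

// Inverse conversion, for playing back PCM16 audio received from the server.
function pcm16ToFloat(input: Int16Array): Float32Array {
  return Float32Array.from(input, (v) => (v < 0 ? v / 0x8000 : v / 0x7fff));
}
```

Clamping before scaling matters: a sample slightly above 1.0 would otherwise overflow the 16-bit range and produce an audible click.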


Tech Stack

Frontend

  • Next.js 14
  • React 18
  • TypeScript
  • Tailwind CSS

Backend

  • Express.js
  • Socket.IO (SigV4 proxy layer)

AI Tools (autonomously called by the agent)

  • Room analysis (Nova Pro vision)
  • Image generation (Nova Canvas)
  • Product search (SerpAPI)
  • Shopping list generation
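
One way to wire tools like these to an agent is a small registry that maps each tool name to a handler. The sketch below is an assumption about how such a dispatcher could look, not the project's code: the tool names are hypothetical, and the handlers are stubs standing in for the real calls to Nova Pro, Nova Canvas, and SerpAPI.

```typescript
// A tool receives JSON-like arguments from the model and returns a JSON-like result.
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Promise<unknown>;

// Hypothetical tool names with stub handlers so the sketch runs on its own.
const tools = new Map<string, ToolHandler>([
  ["analyze_room", async (args) => ({ objects: [], source: args["imageId"] })],
  ["generate_design", async (args) => ({ imageId: "stub", prompt: args["prompt"] })],
  ["search_products", async (args) => ({ query: args["query"], results: [] })],
  ["build_shopping_list", async (args) => ({ items: args["items"] ?? [] })],
]);

// Route one tool request from the agent. Failures are returned as data rather
// than thrown, so the voice session keeps running and the model can recover.
async function dispatchTool(name: string, args: ToolArgs): Promise<unknown> {
  const handler = tools.get(name);
  if (!handler) return { error: `unknown tool: ${name}` };
  try {
    return await handler(args);
  } catch (err) {
    return { error: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning errors as results is a design choice for voice agents: an exception that ends the stream would cut the conversation off mid-sentence.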

Challenges We Ran Into

SigV4 Authentication in the Browser

Nova Sonic requires server-side AWS authentication, so we built a Socket.IO proxy layer instead of connecting to Bedrock directly from the browser.
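
The relay at the heart of such a proxy can be sketched without any AWS code. Both interfaces below are assumptions made for this example: ClientSocket mirrors the on/emit style of a Socket.IO socket, and UpstreamSession stands in for whatever server-side object wraps the signed Bedrock stream. The event names "audio-in" and "audio-out" are also hypothetical.

```typescript
// Browser-facing end of the proxy (shaped like a Socket.IO socket).
interface ClientSocket {
  on(event: string, handler: (payload: Uint8Array) => void): void;
  emit(event: string, payload: Uint8Array): void;
}

// Model-facing end of the proxy; a stand-in for the SigV4-signed stream.
interface UpstreamSession {
  sendAudio(chunk: Uint8Array): void;
  onAudio(handler: (chunk: Uint8Array) => void): void;
  close(): void;
}

// Relay audio in both directions. Credentials are only ever touched inside
// openSession, which runs on the server; the browser sees audio bytes only.
function wireProxy(client: ClientSocket, openSession: () => UpstreamSession): void {
  const session = openSession();
  client.on("audio-in", (chunk) => session.sendAudio(chunk)); // browser -> model
  session.onAudio((chunk) => client.emit("audio-out", chunk)); // model -> browser
  client.on("disconnect", () => session.close()); // release the upstream stream
}
```

Because both ends are plain interfaces, the relay can be exercised with in-memory fakes before it is connected to a real socket or a real model stream.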

Streaming Transcript Fragmentation

Nova Sonic sends transcripts in small text fragments. We built a merge system with tight timing windows to combine fragments into clean chat bubbles without losing text.
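
A merge step of this kind can be sketched as a pure function over timestamped fragments. The data shapes and the 800 ms window below are assumptions chosen for illustration; the write-up does not state the actual window used.

```typescript
interface Fragment {
  role: "user" | "assistant";
  text: string;
  at: number; // arrival time in milliseconds
}

interface Bubble {
  role: "user" | "assistant";
  text: string;
  lastAt: number; // arrival time of the most recent fragment in this bubble
}

// Hypothetical merge window; tune against real fragment timing.
const MERGE_WINDOW_MS = 800;

// Fold fragments into chat bubbles. A fragment joins the previous bubble when
// it has the same speaker and arrives within the window; otherwise it starts
// a new bubble. Every fragment ends up in exactly one bubble.
function mergeFragments(fragments: Fragment[], windowMs = MERGE_WINDOW_MS): Bubble[] {
  const bubbles: Bubble[] = [];
  for (const f of fragments) {
    const last = bubbles[bubbles.length - 1];
    if (last && last.role === f.role && f.at - last.lastAt <= windowMs) {
      // Bubble text is stored trimmed, so a single space joins the pieces.
      last.text = `${last.text} ${f.text.trim()}`;
      last.lastAt = f.at;
    } else {
      bubbles.push({ role: f.role, text: f.text.trim(), lastAt: f.at });
    }
  }
  return bubbles;
}
```

Measuring the gap from the last fragment rather than the first lets a long sentence keep extending one bubble, while a real pause still starts a new one.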

Layout Preservation in Image Generation

Getting Nova Canvas to redesign a room without moving or removing furniture required a two-step pipeline:

  1. Nova Pro analyzes the image and inventories all objects and their positions.
  2. Nova Pro generates a detailed prompt that Nova Canvas follows to preserve layout.
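
To make the hand-off between the two steps concrete, the sketch below builds a layout-preserving prompt from an object inventory. In the project this prompt is written by Nova Pro; the fixed template here is a simplified stand-in, and the RoomObject fields are assumptions rather than the project's schema.

```typescript
// One detected object from the vision step (hypothetical schema).
interface RoomObject {
  name: string;     // e.g. "sofa"
  position: string; // e.g. "along the left wall"
}

// Turn the inventory into an image prompt that pins every object to its
// original position and limits changes to style, color, and decor.
function buildLayoutPrompt(objects: RoomObject[], style: string): string {
  const inventory = objects.map((o) => `${o.name} ${o.position}`).join("; ");
  return (
    `Redesign this room in a ${style} style. ` +
    `Keep every item where it is: ${inventory}. ` +
    `Change only finishes, colors, and decor; do not add, move, or remove furniture.`
  );
}

const examplePrompt = buildLayoutPrompt(
  [
    { name: "sofa", position: "along the left wall" },
    { name: "bookshelf", position: "in the far right corner" },
  ],
  "Scandinavian",
);
```

Listing each object with its position gives the image model an explicit constraint to satisfy, rather than relying on a general instruction to preserve the layout.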

Voice Latency Tuning

We tuned audio buffer sizes (1024 samples at 16 kHz, about 64 ms of audio per buffer) and Socket.IO transport settings to keep the conversation natural and responsive.
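
The buffer size sets a floor on capture latency, since a buffer cannot be sent until it is full. A short worked calculation, assuming mono PCM16 audio (2 bytes per sample); the helper names are illustrative:

```typescript
// Duration of one audio buffer in milliseconds.
function bufferDurationMs(samples: number, sampleRateHz: number): number {
  return (samples * 1000) / sampleRateHz;
}

// Size of one mono PCM16 buffer in bytes (2 bytes per sample).
function bufferBytes(samples: number): number {
  return samples * 2;
}

const durationMs = bufferDurationMs(1024, 16000); // 64 ms of audio per buffer
const sizeBytes = bufferBytes(1024);              // 2048 bytes per buffer
```

Halving the buffer to 512 samples would cut this delay to 32 ms but double the number of messages per second, which is the trade-off being tuned.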


Accomplishments We're Proud Of

Truly Voice-First Experience

The entire design consultation happens through natural speech — no buttons and no typing required.

Three Nova Models Working Together

  • Sonic → conversation + orchestration
  • Pro → vision analysis
  • Canvas → image generation

All three are coordinated through autonomous tool calling.

Real Products, Real Prices

Recommendations come from actual Indian retailers, including INR pricing and direct purchase links, rather than hallucinated suggestions.

Production-Ready Architecture

The Socket.IO proxy pattern securely handles AWS credentials while maintaining real-time bidirectional audio streaming, making it ready for deployment on EC2 or ECS.


What We Learned

  • Nova Sonic's native audio interaction feels significantly more natural than traditional text-to-speech pipelines. Users can even interrupt the AI mid-sentence, and it adapts smoothly.
  • The two-step generation approach (Nova Pro vision analysis → Nova Canvas image generation) is critical for preserving the room layout.
  • Autonomous tool calling through voice interaction creates a far more natural experience than button-driven workflows.
  • Amazon Bedrock provides a unified API layer that simplifies working with voice, vision, and image generation models under a single authentication system.
  • The Socket.IO proxy pattern is essential for production Nova Sonic deployments, since SigV4 signing in the browser would expose AWS credentials.

What's Next for J's Room AI

  • Regional language support — Hindi, Tamil, and Telugu to make the product accessible across India
  • WhatsApp integration — Send a room photo and receive voice design advice instantly
  • AR overlays — View generated designs overlaid on the live camera feed
  • Retailer partnerships — Direct “Add to Cart” integrations with Flipkart, Amazon India, and Pepperfry
  • Multi-room projects — Design entire homes with consistent styles across multiple rooms
