Inspiration

Musical theater in Indonesia has a dirty secret: student productions and indie shows regularly go bankrupt ("boncos"). They want to build something extravagant, but their knowledge of materials is limited to what gets recommended by word of mouth, and those materials are often beyond their budget. They sketch on paper, call vendors for quotes, get sticker shock, and either blow their budget or water down their vision. We wanted to give them a tool that feels like brainstorming with a brilliant, budget-savvy friend, not filling out spreadsheets.

What it does

SpotLite is a web app where theater directors have a real-time voice conversation with an AI set designer named "SpotLite." The director describes their vision — "I want a Roman theme with big columns and red drapes" — and SpotLite:

  1. Confirms the vision by repeating it back in its own words before generating anything
  2. Generates a stage mockup image (16:9) showing how the set would look on their actual stage
  3. Produces an itemized bill of materials with real Indonesian building material prices from juraganmaterial.id, with smart quantity auto-correction based on stage dimensions
  4. Tracks the budget with a color-coded progress bar (green < 70%, yellow 70-90%, red > 90%)
  5. Auto-searches vendors for the top 3 most expensive BOM items on Tokopedia, Shopee, and Bukalapak — plus nearby thrift/junkyard stores for cheap alternatives
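
The budget thresholds in step 4 can be sketched as a small Python helper. This is a minimal illustration, not the project's actual code: the function name budget_status and its signature are assumptions, and only the 70% and 90% cut-offs come from the description above.

```python
def budget_status(spent: float, budget: float) -> str:
    """Return the progress-bar color for the share of the budget used.

    Hypothetical helper: green below 70%, yellow from 70% to 90%
    inclusive, red above 90%.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spent / budget
    if used < 0.70:
        return "green"
    if used <= 0.90:
        return "yellow"
    return "red"
```

Recomputing the color from the running BOM total after every change keeps the bar in step with each voice-driven revision.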

The magic is in the iteration. "That's too expensive — use cheaper materials." "Add a platform center stage." "Swap the real wood columns for painted styrofoam." Each change updates the visual, the costs, and the vendor recommendations in real time, all through natural voice conversation.

How we built it

  • Backend: Python + FastAPI running on Google Cloud Run (1Gi memory, 900s timeout, session affinity), using the Google ADK (Agent Development Kit) for structured tool calling
  • Voice: Gemini Live API (gemini-live-2.5-flash-native-audio) for real-time bidirectional audio streaming with the Aoede voice
  • Image generation: gemini-2.5-flash-image model producing 16:9 stage mockups in the background (doesn't block the voice conversation)
  • Pricing data: Pre-scraped from juraganmaterial.id (58 items across 10 categories — wood, piping, paint, fabric, fasteners, metal, foam, lighting, covering, tools)
  • Fuzzy matching: Python's difflib.SequenceMatcher allows Gemini to request materials by approximate name ("plywood") and still match the correct database entry ("Triplek/Plywood 9mm 122x244cm"). Three-tier search: exact match → keyword match → fuzzy match (ratio > 0.5)
  • Smart quantity estimation: Auto-corrects unrealistic quantities based on stage dimensions and material coverage rules (m² for sheets, linear meters for pipes/timber, running meters for fabric, liters for paint). Corrections are reported back to Gemini so it can mention them naturally to the director.
  • Auto vendor sourcing: After each BOM generation, the system automatically searches Indonesian marketplaces for the top 3 most expensive items, plus nearby thrift stores — using Google Search grounding via gemini-2.5-flash
  • Frontend: Vanilla HTML/JS/CSS with DaisyUI 5 + Tailwind CSS (forest dark theme), bento grid layout, using the Web Audio API for gapless PCM audio playback
  • Session continuity: Live API sessions end after each turn, so we preserve conversation history and inject it into the system instruction on reconnect — seamless for the user
  • CI/CD: Google Cloud Build config (cloudbuild.yaml) + deploy.sh for one-command deployment
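
The three-tier material search described above (exact, then keyword, then fuzzy with ratio > 0.5) might look roughly like the sketch below. It rests on several assumptions: the function name find_material is hypothetical, the keyword tier is read as "every query word appears in the product name", the prices are placeholders rather than real data, and the third catalog entry is made up for illustration. Only difflib.SequenceMatcher, the 0.5 threshold, and the first two product names come from this write-up.

```python
from difflib import SequenceMatcher
from typing import Optional

# Placeholder catalog: prices are NOT real juraganmaterial.id data.
MATERIALS = {
    "Triplek/Plywood 9mm 122x244cm": 100_000,
    "Kain Blackout Hitam": 50_000,
    "Pipa PVC 1 inch 4m": 30_000,  # hypothetical entry for illustration
}


def find_material(query: str, catalog: dict, threshold: float = 0.5) -> Optional[str]:
    """Return the best-matching catalog name, or None if nothing matches."""
    q = query.strip().lower()
    if not q:
        return None

    # Tier 1: exact match, ignoring case.
    for name in catalog:
        if name.lower() == q:
            return name

    # Tier 2: keyword match (assumed rule: every query word is a substring).
    words = q.split()
    for name in catalog:
        lowered = name.lower()
        if all(word in lowered for word in words):
            return name

    # Tier 3: fuzzy match, keeping the highest similarity ratio.
    best_name, best_ratio = None, 0.0
    for name in catalog:
        ratio = SequenceMatcher(None, q, name.lower()).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name if best_ratio > threshold else None
```

Ordering the tiers from strictest to loosest means a cheap exact or keyword hit short-circuits the more expensive and less precise similarity scan.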

Challenges we ran into

  • Session lifecycle: Gemini Live API sessions end after each response. We solved this by maintaining a conversation history list and injecting it into the system instruction on every reconnect, so the AI keeps the full conversation context without session resumption overhead.
  • Image generation rate limits: Gemini's image model has strict rate limits. We implemented exponential backoff (retry after 10s, then 20s) with a queue guard to prevent duplicate overlapping generations.
  • Quantity hallucination: Gemini sometimes estimates wildly unrealistic material quantities (e.g., "2 sheets of plywood" for a massive backdrop). We built a coverage-aware auto-correction system that calculates realistic minimums based on material type and stage dimensions, then reports corrections back to Gemini so it can mention them naturally.
  • Audio sync: Input audio runs at 16kHz but output is 24kHz. We use separate AudioContext instances and gapless scheduling via source.start(nextPlayTime) to prevent gaps or overlaps.
  • Duplicate transcripts: Agent speech was creating duplicate bubbles in the chat. Solved by merging consecutive assistant transcript chunks into a single chat bubble.
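
For the quantity hallucination fix, a coverage-aware correction for sheet materials could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the names Correction and correct_sheet_quantity are hypothetical, the 8 m by 4 m backdrop in the usage note is an invented example, and offcuts and overlap are ignored. The 122x244cm sheet size is the plywood size quoted earlier in this write-up.

```python
import math
from dataclasses import dataclass


@dataclass
class Correction:
    requested: int
    corrected: int
    note: str  # empty when no correction was needed


def correct_sheet_quantity(
    requested: int,
    cover_width_m: float,
    cover_height_m: float,
    sheet_width_m: float = 1.22,
    sheet_height_m: float = 2.44,
) -> Correction:
    """Raise a sheet count to the minimum needed to cover the given area."""
    area_m2 = cover_width_m * cover_height_m
    sheet_area_m2 = sheet_width_m * sheet_height_m
    minimum = math.ceil(area_m2 / sheet_area_m2)
    corrected = max(requested, minimum)
    note = ""
    if corrected != requested:
        # This note is what would be passed back to the model.
        note = (
            f"Raised quantity from {requested} to {corrected} sheets "
            f"to cover {area_m2:.1f} m2."
        )
    return Correction(requested, corrected, note)
```

For example, correct_sheet_quantity(2, 8.0, 4.0) raises a request for 2 sheets to 11, since a 32 m² backdrop divided by roughly 2.98 m² per sheet rounds up to 11. Pipes, fabric, and paint would need their own rules in linear meters, running meters, and liters.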

Accomplishments that we're proud of

  • The fuzzy material matching works well in practice: Gemini can say "black fabric" and it matches "Kain Blackout Hitam" with accurate pricing
  • Smart quantity estimation catches and fixes unrealistic material amounts before they reach the user
  • Auto-triggered vendor search with thrift store recommendations makes the BOM immediately actionable
  • The whole app feels like a natural conversation, not a form-filling exercise
  • Real Indonesian prices make this immediately practical — directors can export the BOM as CSV and go buy materials

What we learned

  • The Gemini Live API is incredibly powerful for building voice-first applications, but session management requires careful design around the session lifecycle
  • Fuzzy matching is essential when bridging AI-generated text to structured databases — you can't expect the model to use exact product names
  • Background task architecture (asyncio) is critical — never block the voice stream for image generation or vendor searches
  • Pre-scraped data beats live API calls for hackathon reliability, but the fuzzy matching layer makes it feel dynamic
  • Google Search grounding is a powerful way to add real-time vendor data without building scrapers for every marketplace
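
The background-task lesson, combined with the retry delays and duplicate guard mentioned under Challenges, can be sketched with asyncio as below. Everything named here is an assumption for illustration: ImageJob, RateLimitError, and the generate callable are hypothetical stand-ins, and no real Gemini client call is shown. Only the 10s and 20s delays and the "no overlapping generations" rule come from this write-up.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit error from the image model."""


class ImageJob:
    """Run image generation in the background, one job at a time."""

    def __init__(
        self,
        generate: Callable[[str], Awaitable[str]],
        delays: Sequence[float] = (10.0, 20.0),
    ) -> None:
        self._generate = generate
        self._delays = tuple(delays)
        self.task: Optional[asyncio.Task] = None

    def request(self, prompt: str) -> bool:
        """Start a job unless one is still running; must be called inside
        a running event loop. Returns True if a new job was started."""
        if self.task is not None and not self.task.done():
            return False  # guard: skip duplicate overlapping generations
        self.task = asyncio.create_task(self._run(prompt))
        return True

    async def _run(self, prompt: str) -> str:
        # One initial attempt plus one retry per configured delay.
        for delay in (*self._delays, None):
            try:
                return await self._generate(prompt)
            except RateLimitError:
                if delay is None:
                    raise  # out of retries
                await asyncio.sleep(delay)
        raise AssertionError("unreachable")
```

Because request returns immediately after scheduling the task, the coroutine that handles the voice stream never awaits image generation directly.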

What's next for SpotLite

  • Vertex AI Search integration for real-time material pricing across multiple Indonesian marketplaces
  • 3D stage visualization with three.js for more realistic previews
  • AR preview so directors can point their phone at the actual stage and see the design overlaid
  • Multi-language UI (currently the AI speaks English/Indonesian, but the interface is English-only)
  • Collaborative sessions where multiple team members can join the same design conversation
  • PDF export with diagrams and material specifications for vendor quotes
