BMC Town: Real-Time AI Business Mentorship

Inspiration

We were inspired by the challenge that entrepreneurs face: getting instant, quality feedback on their business ideas without waiting for expensive consultants or mentors. The Business Model Canvas (BMC) is a proven framework for structuring business thinking across 9 key blocks (Customer Segments, Value Propositions, Channels, Customer Relationships, Revenue Streams, Key Resources, Key Activities, Key Partnerships, Cost Structure), but it's static—just a piece of paper.

We asked: What if we could bring the Canvas to life with real-time, intelligent dialogue?

The breakthrough was the Gemini 3.1 Flash Live API, Google's new bidirectional audio streaming model with sub-100ms latency. Suddenly, we could build something that feels like a real conversation with a business expert, not a chatbot. Combined with intelligent memory extraction, the AI could learn from every conversation and proactively suggest improvements, turning advice into action.

What It Does

BMC Town is a multimodal, real-time platform where entrepreneurs engage in live audio conversations with AI business experts powered by Gemini 3.1 Flash.

User Flow:

  1. Enter the game, a Business Model Canvas town where the 9 AI experts (NPCs) walk around
  2. Approach an expert using the arrow keys (e.g., "Steven Segments," "Carlos Costs")
  3. Click "Start" to begin a live audio conversation
  4. Talk naturally: the AI listens and responds in real time with concise advice
  5. Receive proactive suggestions as the AI discovers cross-canvas opportunities
  6. Build a knowledge base of business insights extracted from conversations
  7. Review and download the Business Model Canvas PDF

Key Capabilities:

  • Multimodal Live Audio: Bidirectional PCM streaming (16kHz in, 24kHz out) with sub-100ms latency
  • Intelligent Memory System: Automatically extracts business facts and relationships from conversations
  • Proactive Cross-Canvas Insights: AI detects when changes to one block imply improvements to others (e.g., "If you add partnerships, consider adjusting your cost structure")
  • Full Observability: LangSmith traces show reasoning, tool calls, and decision points
  • Persistent Canvas State: Updates tracked in MongoDB, enabling multi-turn sessions and historical analysis
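
To make the audio format concrete, here is a minimal sketch of packing float samples into raw PCM bytes and splitting them into short chunks for streaming. The function names, the 16-bit little-endian sample width, and the 20 ms chunk size are assumptions for illustration; they are not taken from the BMC Town codebase.

```python
import struct

INPUT_SAMPLE_RATE = 16_000   # Hz, microphone audio sent to the model
OUTPUT_SAMPLE_RATE = 24_000  # Hz, audio received from the model
BYTES_PER_SAMPLE = 2         # assumption: 16-bit signed little-endian mono


def floats_to_pcm16(samples):
    """Pack float samples in [-1.0, 1.0] into 16-bit little-endian PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]  # guard against overflow
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm, sample_rate, chunk_ms=20):
    """Split PCM bytes into chunks of chunk_ms milliseconds (last may be shorter)."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
```

With these assumptions, one second of 16kHz input becomes 50 chunks of 640 bytes each. Smaller chunks reduce buffering delay at the cost of more WebSocket messages.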

How We Built It

Architecture

┌─────────────────────────────────────┐
│  Frontend (Phaser 3 + Web Audio)    │
│  - Capture 16kHz mono PCM           │
│  - Play 24kHz mono PCM              │
│  - Display live transcript          │
└────────────────┬────────────────────┘
                 │
         WebSocket: /ws/chat/business/live
                 │
┌────────────────▼────────────────────┐
│     FastAPI Backend                 │
│  (Request → Session ID → Queues)    │
└────────────────┬────────────────────┘
                 │
┌────────────────▼────────────────────────────────┐
│       LangGraph Workflow (Async)                │
│                                                 │
│  1. live_audio_conversation_node                │
│     └─ Gemini 3.1 Flash Live session            │
│     └─ Stream audio ↔ Receive responses         │
│     └─ Emit transcripts & tool calls            │
│                                                 │
│  2. memory_extraction_node                      │
│     └─ Parse conversation deltas                │
│     └─ Extract facts → Canvas blocks            │
│                                                 │
│  3. proactive_suggestion_node                   │
│     └─ Check cross-canvas implications          │
│     └─ Generate contextual suggestions          │
└────────────────┬────────────────────────────────┘
                 │
┌────────────────▼────────────────────┐
│   MongoDB                           │
│   - Canvas state (9 blocks)         │
│   - Memory insights (facts)         │
│   - Conversation history            │
└─────────────────────────────────────┘

Tech Stack

  • LLM: Gemini 3.1 Flash (Live API + generate_content)
  • Backend: Python 3.12, FastAPI, LangGraph, LangSmith
  • Frontend: JavaScript (ES6+), Phaser 3, Web Audio API
  • Database: MongoDB (Atlas)
  • Deployment: Docker, Google Cloud Run, Secret Manager

Development Pipeline

  1. Phase 1: Prototyped Gemini Live API bidirectional audio streaming
  2. Phase 2: Built 4-node LangGraph workflow (audio conversation → memory extraction → proactive suggestions → conditional summarization)
  3. Phase 3: Integrated WebSocket endpoint with async task scheduling
  4. Phase 4: Implemented frontend PCM capture/playback with real-time UI updates
  5. Phase 5: Added memory extraction with LLM-based fact parsing, the proactive suggestion feature, and the MongoDB connection setup
  6. Phase 6: Engineered proactive suggestion engine with cross-canvas relationship detection
  7. Phase 7: Fixed critical bugs (transcription flickering, response conciseness, WebSocket lifecycle)
  8. Phase 8: Deployed to Cloud Run with full LangSmith observability

Challenges We Ran Into

1. Transcription UI Flickering (Audio/UX)

  • Problem: Frontend displayed individual transcript fragments in real-time, causing text to jump between single words.
  • Root Cause: Backend was sending fragments separately instead of accumulated context.
  • Solution: Modified backend to send the full accumulated transcript string each time, allowing the UI to display smooth, growing text.
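
The fix can be sketched as a small accumulator that appends each incoming fragment and always emits the full text so far. The class name, the JSON message shape, and the assumption that fragments carry their own whitespace are illustrative, not the project's actual code.

```python
import json


class TranscriptAccumulator:
    """Collects transcript fragments and returns the full text so far."""

    def __init__(self):
        self._text = ""

    def add(self, fragment):
        """Append a fragment and return the accumulated transcript."""
        self._text += fragment  # assumption: fragments include their own spacing
        return self._text

    def reset(self):
        """Clear the buffer when a new turn starts."""
        self._text = ""


def transcript_message(accumulator, fragment):
    """Build the payload sent to the frontend for each new fragment.

    The {"type": ..., "text": ...} shape is hypothetical.
    """
    return json.dumps({"type": "transcript", "text": accumulator.add(fragment)})
```

Because every message carries the whole transcript, the frontend can simply replace the displayed text rather than stitching fragments together itself.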

2. Response Verbosity in Live Audio (LLM Behavior)

  • Problem: Gemini's default responses were 3–4 sentences; live conversation requires punchy replies under 30 words.
  • Root Cause: No explicit constraint in the live system prompt.
  • Solution: Added _LIVE_CONCISENESS_ADDENDUM to the live system prompt: "Keep every spoken response to 1–2 short sentences (under 30 words). Be direct, punchy, and conversational—like a quick phone call."

3. Missing LangSmith Traces for Live Sessions (Observability)

  • Problem: Text-based voice turns showed up in LangSmith, but live sessions didn't.
  • Root Cause: LangGraph was assigned a run_id that collided with LangSmith's internal IDs, breaking parent-child trace hierarchy.
  • Solution: Replaced the manual run_id with LangChain's callback system, passing the parent trace directly via "callbacks": [live_turn_run]

4. Proactive Suggestions Never Reached Frontend (WebSocket Lifecycle)

  • Problem: Backend generated suggestions but frontend never showed them.
  • Root Cause: Frontend closed the WebSocket immediately after sending session_end, before the backend could finish memory extraction and send session_complete.
  • Solution:
    • Removed premature socket.close() from endSession()
    • Wait for onSessionComplete callback to process suggestions before closing socket
    • Added explicit cleanup in the callback to ensure proper async coordination
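
The coordination pattern can be sketched with asyncio queues standing in for the WebSocket. The real frontend is JavaScript, so this Python version only illustrates the ordering: send session_end, keep reading until session_complete arrives, and only then close. The "suggestion" message type, the timeout, and all function names are assumptions.

```python
import asyncio


async def backend(inbox, outbox):
    """Stand-in backend: after session_end, finish slow work, then reply."""
    while (await inbox.get())["type"] != "session_end":
        pass
    await asyncio.sleep(0.01)  # simulates memory extraction and suggestion generation
    await outbox.put({"type": "suggestion", "text": "Revisit Cost Structure"})
    await outbox.put({"type": "session_complete"})


async def end_session(to_backend, from_backend, timeout=5.0):
    """Client side: send session_end, then wait for session_complete."""
    await to_backend.put({"type": "session_end"})
    suggestions = []
    while True:
        # The timeout keeps the client from hanging if the backend never answers.
        msg = await asyncio.wait_for(from_backend.get(), timeout)
        if msg["type"] == "session_complete":
            break
        if msg["type"] == "suggestion":
            suggestions.append(msg["text"])
    # Only at this point would a real client close its socket.
    return suggestions


async def demo():
    to_backend, from_backend = asyncio.Queue(), asyncio.Queue()
    task = asyncio.create_task(backend(to_backend, from_backend))
    suggestions = await end_session(to_backend, from_backend)
    await task
    return suggestions
```

Running asyncio.run(demo()) returns the suggestion that an early close would have dropped.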

5. Environment Variable Propagation in Docker (DevOps)

  • Problem: Rotated LangSmith API key in Secret Manager, but Docker containers didn't pick it up.
  • Root Cause: docker compose restart only restarts the existing containers; it does not re-read .env.
  • Solution: Used docker compose up -d to recreate the containers so they load the fresh environment variables.

Accomplishments That We're Proud Of

1. Real-Time Multimodal Audio Pipeline

  • Integrated Gemini 3.1 Flash Live API for true bidirectional audio (16kHz PCM input, 24kHz output)
  • Achieved sub-100ms latency streaming with clean UI feedback
  • Handled async audio buffering, encoding/decoding, and playback without blocking

2. Intelligent Memory Extraction Engine

  • Built a 2-stage memory system:
    1. Fact Extraction: LLM parses conversations and maps insights to canvas blocks
    2. Delta Tracking: Maintains a diff between old/new canvas state to identify what changed
  • Enables persistent learning across multiple sessions—the system remembers what was discussed
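
Delta tracking of this kind can be sketched as a set difference per canvas block. The canvas representation (a dict mapping block names to lists of fact strings) and the block keys are assumptions; the writeup does not describe the actual schema.

```python
def canvas_delta(old, new):
    """Return, per block, the facts added and removed between two canvas states.

    Both arguments map block names to lists of fact strings (assumed schema).
    Blocks with no changes are omitted from the result.
    """
    delta = {}
    for block in sorted(set(old) | set(new)):
        before = set(old.get(block, []))
        after = set(new.get(block, []))
        added = sorted(after - before)
        removed = sorted(before - after)
        if added or removed:
            delta[block] = {"added": added, "removed": removed}
    return delta
```

For example, adding "Logistics partner" to a previously empty key_partnerships block while cost_structure stays the same yields a delta containing only key_partnerships, which is the kind of signal a suggestion step can react to.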

3. Proactive Cross-Canvas Suggestion System

  • Engineered suggestion logic that detects implications between blocks:
    • "You mentioned partnerships → Cost Structure may need adjustment"
    • "New Customer Segment → Revenue Streams could expand here"
  • Suggestions are contextual, not generic—ranked by confidence and canvas state
  • Successfully delivers suggestions to frontend via proper WebSocket lifecycle management
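
The writeup does not say how implications between blocks are detected, so the sketch below swaps in a static rule table to show the shape of the idea: map each changed block to the blocks it may affect, filter, and rank by confidence. The block keys, confidence values, and the choice to skip targets that already changed are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    source_block: str
    target_block: str
    message: str
    confidence: float


# Hypothetical rules: changed block -> [(implied block, message, confidence)]
RULES = {
    "key_partnerships": [("cost_structure", "Cost Structure may need adjustment", 0.8)],
    "customer_segments": [("revenue_streams", "Revenue Streams could expand here", 0.7)],
}


def suggest(changed_blocks, rules=RULES, min_confidence=0.5):
    """Return suggestions for blocks implied by the changes, highest confidence first."""
    found = [
        Suggestion(source, target, message, confidence)
        for source in changed_blocks
        for target, message, confidence in rules.get(source, [])
        # Skip weak rules and targets the user already updated this session.
        if confidence >= min_confidence and target not in changed_blocks
    ]
    return sorted(found, key=lambda s: s.confidence, reverse=True)
```

Calling suggest(["customer_segments", "key_partnerships"]) returns the Cost Structure suggestion first, then the Revenue Streams one. An LLM-based detector could produce the same Suggestion records with confidences derived from the conversation rather than a fixed table.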

4. Full Observability with LangSmith

  • Implemented trace hierarchy showing:
    • Live audio session (parent)
    • Individual LLM turns with tool calls (children)
    • Memory extraction as nested sub-tasks
    • Suggestion generation with confidence scores
  • Enabled debugging of complex async workflows at a glance

5. Production-Ready Deployment

  • Dockerized frontend and backend with volume mounts for hot-reload development
  • Automated Cloud Build pipeline with environment variable management
  • Secure secret handling (LangSmith key, API keys) via Google Secret Manager
  • 16 unit tests for live audio pipeline with full coverage of edge cases

What We Learned

  1. Latency is UX: 200ms in voice conversation feels wrong. Gemini 3.1 Flash's sub-100ms streaming is non-negotiable for real-time feel.

  2. Trace hierarchies matter: LangSmith's parent-child relationships are powerful but fragile—colliding IDs silently break observability.

  3. WebSocket choreography is subtle: Closing sockets before async backend work completes is a hidden trap. Explicit coordination (wait for final message) is essential.

  4. Prompt engineering for voice ≠ text: Spoken responses need hard constraints (word count, sentence count) because pauses and tone matter differently than formatting.

  5. Memory extraction requires iteration: LLMs don't naturally parse canvas blocks—needed careful prompt engineering and fallback logic for robustness.

  6. Multimodal systems need explicit testing: Real-time audio + transcription + suggestions created complex async flows that only revealed bugs in production.

What's Next for BMC Town

  • Expert Personas Expansion: Add 10+ specialized roles (Growth Hacker, ESG Officer, Lean Startup Coach)
  • Multi-User Canvas Collaboration: Real-time collaborative editing with voice guidance
  • Mobile App: Native iOS/Android with better audio handling and offline support
  • Conversation Replay: Re-listen to sessions and see how suggestions played out over time
  • API for Partners: Let other tools (VC platforms, incubators) embed BMC Town's voice coaching

Status: ✅ Deployed to Google Cloud Run | ✅ Live mode enabled | ✅ LangSmith tracing active | 🚀 Ready for hackathon demo
