BMC Town

BMC Town: Real-Time AI Business Mentorship

Inspiration

We were inspired by the challenge that entrepreneurs face: getting instant, quality feedback on their business ideas without waiting for expensive consultants or mentors. The Business Model Canvas (BMC) is a proven framework for structuring business thinking across 9 key blocks (Customer Segments, Value Propositions, Channels, Customer Relationships, Revenue Streams, Key Resources, Key Activities, Key Partnerships, Cost Structure), but it's static—just a piece of paper.

We asked: What if we could bring the Canvas to life with real-time, intelligent dialogue?

The breakthrough was Gemini 3.1 Flash Live API—Google's new bidirectional audio streaming model with sub-100ms latency. Suddenly, we could build something that feels like a real conversation with a business expert, not a chatbot. Combined with intelligent memory extraction, the AI could learn from every conversation and proactively suggest improvements—turning advice into action.

What It Does

BMC Town is a multimodal, real-time platform where entrepreneurs engage in live audio conversations with AI business experts powered by Gemini 3.1 Flash.

User Flow:

Enter the game: a Business Model Canvas town where 9 AI experts (NPCs) walk around
Walk up to an expert using the arrow keys (e.g., "Steven Segments," "Carlos Costs")
Click "Start" to begin a live audio conversation
Talk naturally: the AI listens and responds in real time with concise advice
Receive proactive suggestions as the AI discovers cross-canvas opportunities
Build a knowledge base of business insights extracted from conversations
Review and download the Business Model Canvas PDF

Key Capabilities:

Multimodal Live Audio: Bidirectional PCM streaming (16kHz in, 24kHz out) with sub-100ms latency
Intelligent Memory System: Automatically extracts business facts and relationships from conversations
Proactive Cross-Canvas Insights: AI detects when changes to one block imply improvements to others (e.g., "If you add partnerships, consider adjusting your cost structure")
Full Observability: LangSmith traces show reasoning, tool calls, and decision points
Persistent Canvas State: Updates tracked in MongoDB, enabling multi-turn sessions and historical analysis
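The audio formats listed above come down to a 16-bit sample conversion and some byte arithmetic. The frontend does this in JavaScript with the Web Audio API. The minimal Python sketch below shows the same math, assuming 16-bit signed little-endian mono PCM and 20 ms chunks. The chunk size and helper names are illustrative and are not taken from BMC Town's code.

```python
import struct

INPUT_RATE = 16_000   # Hz, microphone audio sent to the model
OUTPUT_RATE = 24_000  # Hz, model audio played back
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM, mono


def float_to_pcm16(samples: list[float]) -> bytes:
    """Clamp float samples to [-1.0, 1.0] and pack them as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_float(data: bytes) -> list[float]:
    """Unpack little-endian int16 bytes back into floats in [-1.0, 1.0]."""
    count = len(data) // BYTES_PER_SAMPLE
    values = struct.unpack(f"<{count}h", data[: count * BYTES_PER_SAMPLE])
    return [v / 32767 for v in values]


def chunk_bytes(rate_hz: int, duration_ms: int) -> int:
    """Bytes in one mono 16-bit chunk of the given duration."""
    return rate_hz * duration_ms // 1000 * BYTES_PER_SAMPLE


print(chunk_bytes(INPUT_RATE, 20))   # 640
print(chunk_bytes(OUTPUT_RATE, 20))  # 960
```

At 20 ms per chunk, a 16kHz input chunk is 640 bytes and a 24kHz output chunk is 960 bytes. The playback stream therefore carries 1.5× as many bytes per second as the capture stream.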

How We Built It

Architecture

┌──────────────────────────────────────────┐
│  Frontend (Phaser 3 + Web Audio)         │
│  - Capture 16kHz mono PCM                │
│  - Play 24kHz mono PCM                   │
│  - Display live transcript               │
└────────────────────┬─────────────────────┘
                     │
     WebSocket: /ws/chat/business/live
                     │
┌────────────────────▼─────────────────────┐
│  FastAPI Backend                         │
│  (Request → Session ID → Queues)         │
└────────────────────┬─────────────────────┘
                     │
┌────────────────────▼─────────────────────┐
│  LangGraph Workflow (Async)              │
│                                          │
│  1. live_audio_conversation_node         │
│     └─ Gemini 3.1 Flash Live session     │
│     └─ Stream audio ↔ Receive responses  │
│     └─ Emit transcripts & tool calls     │
│                                          │
│  2. memory_extraction_node               │
│     └─ Parse conversation deltas         │
│     └─ Extract facts → Canvas blocks     │
│                                          │
│  3. proactive_suggestion_node            │
│     └─ Check cross-canvas implications   │
│     └─ Generate contextual suggestions   │
└────────────────────┬─────────────────────┘
                     │
┌────────────────────▼─────────────────────┐
│  MongoDB                                 │
│  - Canvas state (9 blocks)               │
│  - Memory insights (facts)               │
│  - Conversation history                  │
└──────────────────────────────────────────┘
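The "Request → Session ID → Queues" step in the FastAPI box can be pictured as one pair of asyncio queues per WebSocket connection. Below is a minimal stdlib sketch under that assumption. LiveSession, SESSIONS, open_session, and fake_model_worker are hypothetical names, and the worker only stands in for the Gemini Live node.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class LiveSession:
    """Per-connection state: audio going to the model, events going to the browser."""
    session_id: str
    audio_in: asyncio.Queue = field(default_factory=asyncio.Queue)
    events_out: asyncio.Queue = field(default_factory=asyncio.Queue)


SESSIONS: dict[str, LiveSession] = {}


def open_session() -> LiveSession:
    """Assign a fresh session ID and register its queues (hypothetical helper)."""
    session = LiveSession(session_id=str(uuid.uuid4()))
    SESSIONS[session.session_id] = session
    return session


async def fake_model_worker(session: LiveSession) -> None:
    """Stand-in for the live conversation node: consume audio, emit events.
    A None chunk marks the end of the user's audio."""
    received = 0
    while (chunk := await session.audio_in.get()) is not None:
        received += len(chunk)
        await session.events_out.put({"type": "transcript", "bytes_so_far": received})
    await session.events_out.put({"type": "session_complete"})


async def demo() -> list[dict]:
    session = open_session()
    worker = asyncio.create_task(fake_model_worker(session))
    for chunk in (b"\x00" * 640, b"\x00" * 640):
        await session.audio_in.put(chunk)
    await session.audio_in.put(None)  # end-of-input sentinel
    await worker
    events = []
    while not session.events_out.empty():
        events.append(session.events_out.get_nowait())
    SESSIONS.pop(session.session_id, None)  # clean up when the socket closes
    return events


print(asyncio.run(demo()))
```

In a real endpoint, the WebSocket handler would feed audio_in and forward events_out to the browser. A None sentinel is one simple way to signal end of input without closing the queue.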

Tech Stack

LLM: Gemini 3.1 Flash (Live API + generate_content)
Backend: Python 3.12, FastAPI, LangGraph, LangSmith
Frontend: JavaScript (ES6+), Phaser 3, Web Audio API
Database: MongoDB (Atlas)
Deployment: Docker, Google Cloud Run, Secret Manager

Development Pipeline

Phase 1: Prototyped Gemini Live API bidirectional audio streaming
Phase 2: Built 4-node LangGraph workflow (audio conversation → memory extraction → proactive suggestions → conditional summarization)
Phase 3: Integrated WebSocket endpoint with async task scheduling
Phase 4: Implemented frontend PCM capture/playback with real-time UI updates
Phase 5: Added LLM-based memory extraction (fact parsing), the proactive suggestion feature, and the MongoDB connection
Phase 6: Engineered proactive suggestion engine with cross-canvas relationship detection
Phase 7: Fixed critical bugs (transcription flickering, response conciseness, WebSocket lifecycle)
Phase 8: Deployed to Cloud Run with full LangSmith observability
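Phase 2's four-node flow, with summarization as the conditional last step, can be sketched without LangGraph as a loop over async node functions that merge their updates into a shared state dict. In LangGraph itself, the condition would typically be expressed with StateGraph.add_conditional_edges. The node bodies and the summarization threshold below are placeholders, not BMC Town's logic.

```python
import asyncio
from typing import Any, Awaitable, Callable

State = dict[str, Any]
Node = Callable[[State], Awaitable[State]]


# Placeholder node bodies: each reads the shared state and returns updates.
async def live_audio_conversation_node(state: State) -> State:
    return {"transcript": state["transcript"] + ["user: we sell to cafes"]}


async def memory_extraction_node(state: State) -> State:
    facts = [line for line in state["transcript"] if "sell to" in line]
    return {"facts": facts}


async def proactive_suggestion_node(state: State) -> State:
    tips = ["Customer Segments changed: revisit Channels"] if state["facts"] else []
    return {"suggestions": tips}


async def summarization_node(state: State) -> State:
    return {"summary": f"{len(state['transcript'])} turns summarized"}


def should_summarize(state: State) -> bool:
    # Assumption: summarize only once the transcript reaches a size threshold.
    return len(state["transcript"]) >= 3


async def run_workflow(state: State) -> State:
    """Run the three core nodes in order, merging each node's updates into
    the state, then run summarization only if the condition holds."""
    pipeline: list[Node] = [
        live_audio_conversation_node,
        memory_extraction_node,
        proactive_suggestion_node,
    ]
    for node in pipeline:
        state = {**state, **await node(state)}
    if should_summarize(state):
        state = {**state, **await summarization_node(state)}
    return state


result = asyncio.run(run_workflow({"transcript": []}))
print(result["suggestions"])
```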

Challenges We Ran Into

1. Transcription UI Flickering (Audio/UX)

Problem: Frontend displayed individual transcript fragments in real-time, causing text to jump between single words.
Root Cause: The backend sent each fragment on its own instead of the accumulated transcript.
Solution: Modified backend to send the full accumulated transcript string each time, allowing the UI to display smooth, growing text.
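The fix amounts to keeping a per-turn buffer on the server and sending its full contents on every update, so the UI replaces the displayed text instead of appending to it. A minimal sketch follows. The class name and message shape are hypothetical, and it assumes each fragment already carries its own leading or trailing whitespace.

```python
class TranscriptAccumulator:
    """Collects streamed transcription fragments for one speaker turn and
    returns the full text so far, so the UI can replace rather than append."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    def add(self, fragment: str) -> str:
        self._parts.append(fragment)
        return "".join(self._parts).strip()

    def reset(self) -> None:
        """Call when the turn completes so the next turn starts fresh."""
        self._parts.clear()


acc = TranscriptAccumulator()
for fragment in ["Who ", "are your ", "customers?"]:
    message = {"type": "transcript", "text": acc.add(fragment)}
    print(message["text"])
```

Each message grows monotonically ("Who", "Who are your", "Who are your customers?"), which is what lets the frontend render smooth, growing text.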

2. Response Verbosity in Live Audio (LLM Behavior)

Problem: Gemini's default responses were 3–4 sentences; live conversation requires punchy replies under 30 words.
Root Cause: No explicit constraint in the live system prompt.
Solution: Added _LIVE_CONCISENESS_ADDENDUM to the live system prompt: "Keep every spoken response to 1–2 short sentences (under 30 words). Be direct, punchy, and conversational—like a quick phone call."

3. Missing LangSmith Traces for Live Sessions (Observability)

Problem: Text-based voice turns showed up in LangSmith, but live sessions didn't.
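Root Cause: LangGraph was assigned a manual run_id that collided with LangSmith's internal IDs, breaking the parent-child trace hierarchy.
Solution: Replaced the manual run_id with LangChain's callback system, passing the parent trace directly in the run config:

```python
# Pass the parent LangSmith trace via callbacks instead of assigning a manual run_id
config = {"callbacks": [live_turn_run]}
```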
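Explicit parent passing helps because, in a run tree, each child should get a fresh, tracer-generated ID and a direct link to its parent object, so nothing depends on hand-assigned IDs staying unique. The toy below illustrates that structure only. ToyRun is loosely modeled on a tracing run tree and is not the LangSmith or LangChain API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyRun:
    """Toy trace node (NOT the LangSmith API): generated ID plus parent link."""
    name: str
    parent: Optional["ToyRun"] = field(default=None, repr=False)
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    children: list["ToyRun"] = field(default_factory=list, repr=False)

    def child(self, name: str) -> "ToyRun":
        # The child gets a fresh ID and an explicit reference to its parent.
        run = ToyRun(name=name, parent=self)
        self.children.append(run)
        return run


def render(run: ToyRun, depth: int = 0) -> list[str]:
    """Indented outline of the run tree, parent before children."""
    lines = ["  " * depth + run.name]
    for c in run.children:
        lines.extend(render(c, depth + 1))
    return lines


session = ToyRun("live_audio_session")
turn = session.child("llm_turn")
turn.child("tool_call")
session.child("memory_extraction")
print("\n".join(render(session)))
```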

4. Proactive Suggestions Never Reached Frontend (WebSocket Lifecycle)

Problem: Backend generated suggestions but frontend never showed them.
Root Cause: Frontend closed the WebSocket immediately after sending session_end, before the backend could finish memory extraction and send session_complete.
Solution:
- Removed premature socket.close() from endSession()
- Wait for the onSessionComplete callback to process suggestions before closing the socket
- Added explicit cleanup in the callback to ensure proper async coordination
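The corrected shutdown order (send session_end, wait for session_complete, then close) can be simulated with two asyncio queues standing in for the socket. This is a Python model of the JavaScript client logic. The timeout is an added assumption so that a silent backend can't keep the socket open forever.

```python
import asyncio


async def backend(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Simulated server: on session_end, do slow post-processing, then reply."""
    msg = await inbox.get()
    if msg["type"] == "session_end":
        await asyncio.sleep(0.05)  # stand-in for memory extraction + suggestions
        await outbox.put({"type": "session_complete", "suggestions": ["Revisit Cost Structure"]})


async def end_session(inbox: asyncio.Queue, outbox: asyncio.Queue, timeout: float = 5.0) -> list[str]:
    """Client side: send session_end, then wait for session_complete
    (bounded by a timeout) before closing the socket."""
    await inbox.put({"type": "session_end"})
    try:
        reply = await asyncio.wait_for(outbox.get(), timeout)
    except asyncio.TimeoutError:
        return []  # give up cleanly; real code would still close the socket here
    # Only now is it safe to close the socket (omitted in this simulation).
    return reply.get("suggestions", [])


async def main() -> list[str]:
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    server = asyncio.create_task(backend(inbox, outbox))
    suggestions = await end_session(inbox, outbox)
    await server
    return suggestions


print(asyncio.run(main()))
```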

5. Environment Variable Propagation in Docker (DevOps)

Problem: After we rotated the LangSmith API key in Secret Manager, the running Docker containers didn't pick it up.
Root Cause: docker compose restart only restarts the existing containers; it doesn't re-read .env or apply other configuration changes.
Solution: Used docker compose up -d to recreate containers and load fresh environment variables.

Accomplishments That We're Proud Of

1. Real-Time Multimodal Audio Pipeline

Integrated Gemini 3.1 Flash Live API for true bidirectional audio (16kHz PCM input, 24kHz output)
Achieved sub-100ms latency streaming with clean UI feedback
Handled async audio buffering, encoding/decoding, and playback without blocking

2. Intelligent Memory Extraction Engine

Built a 2-stage memory system:
1. Fact Extraction: LLM parses conversations and maps insights to canvas blocks
2. Delta Tracking: Maintains a diff between old/new canvas state to identify what changed
Enables persistent learning across multiple sessions—the system remembers what was discussed
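Delta tracking can be as simple as a per-block set difference between the previous and current canvas. A minimal sketch follows, assuming each block is stored as a list of fact strings under snake_case keys. This storage shape is an assumption; the actual MongoDB schema isn't described here.

```python
CANVAS_BLOCKS = (
    "customer_segments", "value_propositions", "channels",
    "customer_relationships", "revenue_streams", "key_resources",
    "key_activities", "key_partnerships", "cost_structure",
)


def canvas_delta(
    old: dict[str, list[str]], new: dict[str, list[str]]
) -> dict[str, dict[str, list[str]]]:
    """Per-block diff of canvas facts: what was added and what was removed.
    Blocks with no change are omitted from the result."""
    delta = {}
    for block in CANVAS_BLOCKS:
        before, after = set(old.get(block, [])), set(new.get(block, []))
        added, removed = sorted(after - before), sorted(before - after)
        if added or removed:
            delta[block] = {"added": added, "removed": removed}
    return delta


old = {"customer_segments": ["independent cafes"]}
new = {
    "customer_segments": ["independent cafes", "office kitchens"],
    "key_partnerships": ["local roaster"],
}
print(canvas_delta(old, new))
```

Using sets ignores ordering and duplicate facts, which is usually what you want when the question is "what changed in this session?"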

3. Proactive Cross-Canvas Suggestion System

Engineered suggestion logic that detects implications between blocks:
- "You mentioned partnerships → Cost Structure may need adjustment"
- "New Customer Segment → Revenue Streams could expand here"
Suggestions are contextual, not generic—ranked by confidence and canvas state
Delivers suggestions to the frontend reliably through proper WebSocket lifecycle management
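A rule table keyed by the changed block is one straightforward way to produce cross-canvas suggestions like the two above, with canvas state feeding into the ranking. In this sketch, the rules, the confidence weights, and the boost for still-empty target blocks are all illustrative assumptions, not BMC Town's actual scoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    source: str
    target: str
    text: str
    confidence: float


# Hypothetical rule table: a change in `source` implies a review of `target`.
# The base confidence weights are illustrative, not measured.
RULES = [
    ("key_partnerships", "cost_structure", "New partnerships may change your cost structure.", 0.8),
    ("customer_segments", "revenue_streams", "A new customer segment could open a revenue stream.", 0.7),
    ("customer_segments", "channels", "Check that your channels reach the new segment.", 0.6),
]


def suggest(changed_blocks: set[str], filled_blocks: set[str], limit: int = 3) -> list[Suggestion]:
    """Fire rules whose source block changed, boost targets that are still
    empty (assumed likelier to need attention), then rank by confidence."""
    out = []
    for source, target, text, base in RULES:
        if source in changed_blocks:
            boost = 0.1 if target not in filled_blocks else 0.0
            out.append(Suggestion(source, target, text, min(1.0, base + boost)))
    return sorted(out, key=lambda s: s.confidence, reverse=True)[:limit]


for s in suggest({"customer_segments"}, filled_blocks={"channels"}):
    print(f"{s.confidence:.1f} {s.target}: {s.text}")
```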

4. Full Observability with LangSmith

Implemented trace hierarchy showing:
- Live audio session (parent)
- Individual LLM turns with tool calls (children)
- Memory extraction as nested sub-tasks
- Suggestion generation with confidence scores
Enabled debugging of complex async workflows at a glance

5. Production-Ready Deployment

Dockerized frontend and backend with volume mounts for hot-reload development
Automated Cloud Build pipeline with environment variable management
Secure secret handling (LangSmith key, API keys) via Google Secret Manager
16 unit tests for live audio pipeline with full coverage of edge cases

What We Learned

Latency is UX: a 200ms delay in a voice conversation already feels wrong. Gemini 3.1 Flash's sub-100ms streaming is non-negotiable for a real-time feel.
Trace hierarchies matter: LangSmith's parent-child relationships are powerful but fragile—colliding IDs silently break observability.
WebSocket choreography is subtle: Closing sockets before async backend work completes is a hidden trap. Explicit coordination (wait for final message) is essential.
Prompt engineering for voice ≠ text: Spoken responses need hard constraints (word count, sentence count) because in audio, pauses and tone do the work that formatting does in text.
Memory extraction requires iteration: LLMs don't naturally parse canvas blocks—needed careful prompt engineering and fallback logic for robustness.
Multimodal systems need explicit testing: Real-time audio, transcription, and suggestions created complex async flows whose bugs only surfaced in production.
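The fallback logic mentioned under memory extraction can look like the defensive parser below. It pulls the JSON array out of whatever the model returned, normalizes block names, and drops anything that doesn't map to one of the nine blocks. The expected reply format (a list of {"block", "fact"} objects) and the snake_case block keys are assumptions for illustration.

```python
import json
import re

VALID_BLOCKS = {
    "customer_segments", "value_propositions", "channels",
    "customer_relationships", "revenue_streams", "key_resources",
    "key_activities", "key_partnerships", "cost_structure",
}


def parse_facts(raw: str) -> list[dict[str, str]]:
    """Parse an LLM reply expected to contain a JSON list of {"block", "fact"}.
    Takes the span from the first '[' to the last ']' (skipping any chatty
    preamble or code fences), normalizes block names, and drops invalid items.
    Returns [] instead of raising when the reply can't be parsed."""
    match = re.search(r"\[.*\]", raw, re.DOTALL)
    if not match:
        return []
    try:
        items = json.loads(match.group(0))  # a bracketed span decodes to a list
    except json.JSONDecodeError:
        return []
    facts = []
    for item in items:
        if not isinstance(item, dict):
            continue
        block = str(item.get("block", "")).strip().lower().replace(" ", "_")
        fact = str(item.get("fact", "")).strip()
        if block in VALID_BLOCKS and fact:
            facts.append({"block": block, "fact": fact})
    return facts


reply = ('Sure! Here you go: [{"block": "Key Partnerships", "fact": "Local roaster supplies beans"}, '
         '{"block": "mood", "fact": "upbeat"}]')
print(parse_facts(reply))
```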

What's Next for BMC Town

Expert Personas Expansion: Add 10+ specialized roles (Growth Hacker, ESG Officer, Lean Startup Coach)
Multi-User Canvas Collaboration: Real-time collaborative editing with voice guidance
Mobile App: Native iOS/Android with better audio handling and offline support
Conversation Replay: Re-listen to sessions and see how suggestions played out over time
API for Partners: Let other tools (VC platforms, incubators) embed BMC Town's voice coaching

Status: ✅ Deployed to Google Cloud Run | ✅ Live mode enabled | ✅ LangSmith tracing active | 🚀 Ready for hackathon demo