BMC Town: Real-Time AI Business Mentorship
Inspiration
We were inspired by the challenge that entrepreneurs face: getting instant, quality feedback on their business ideas without waiting for expensive consultants or mentors. The Business Model Canvas (BMC) is a proven framework for structuring business thinking across 9 key blocks (Customer Segments, Value Propositions, Channels, Customer Relationships, Revenue Streams, Key Resources, Key Activities, Key Partnerships, Cost Structure), but it's static—just a piece of paper.
We asked: What if we could bring the Canvas to life with real-time, intelligent dialogue?
The breakthrough was Gemini 3.1 Flash Live API—Google's new bidirectional audio streaming model with sub-100ms latency. Suddenly, we could build something that feels like a real conversation with a business expert, not a chatbot. Combined with intelligent memory extraction, the AI could learn from every conversation and proactively suggest improvements—turning advice into action.
What It Does
BMC Town is a multimodal, real-time platform where entrepreneurs engage in live audio conversations with AI business experts powered by Gemini 3.1 Flash.
User Flow:
- Enter the game: a Business Model Canvas town where the 9 AI experts (NPCs) walk around
- Approach an expert using the arrow keys (e.g., "Steven Segments," "Carlos Costs")
- Click "Start" to begin a live audio conversation
- Talk naturally—the AI listens, responds in real-time with concise advice
- Receive proactive suggestions as the AI discovers cross-canvas opportunities
- Build a knowledge base of business insights extracted from conversations
- Review and download the Business Model Canvas PDF
Key Capabilities:
- Multimodal Live Audio: Bidirectional PCM streaming (16kHz in, 24kHz out) with sub-100ms latency
- Intelligent Memory System: Automatically extracts business facts and relationships from conversations
- Proactive Cross-Canvas Insights: AI detects when changes to one block imply improvements to others (e.g., "If you add partnerships, consider adjusting your cost structure")
- Full Observability: LangSmith traces show reasoning, tool calls, and decision points
- Persistent Canvas State: Updates tracked in MongoDB, enabling multi-turn sessions and historical analysis
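To make the audio figures above concrete, here is a minimal sketch of how a capture chunk could be packed before streaming. It assumes 16-bit little-endian mono samples and a 100 ms chunk size; both are illustrative assumptions rather than details taken from the project, and the function names are hypothetical.

```python
import struct

INPUT_RATE_HZ = 16_000   # capture rate stated above
OUTPUT_RATE_HZ = 24_000  # playback rate stated above
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM


def floats_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_bytes(rate_hz, chunk_ms):
    """Bytes needed to hold chunk_ms of mono audio at rate_hz."""
    return rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE


if __name__ == "__main__":
    print(chunk_bytes(INPUT_RATE_HZ, 100))   # bytes per 100 ms of microphone audio
    print(chunk_bytes(OUTPUT_RATE_HZ, 100))  # bytes per 100 ms of playback audio
```

Under these assumptions a 100 ms microphone chunk is 3,200 bytes and a 100 ms playback chunk is 4,800 bytes, which gives a rough sense of the buffer sizes on each side of the socket.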
How We Built It
Architecture
┌─────────────────────────────────────┐
│ Frontend (Phaser 3 + Web Audio)     │
│ - Capture 16kHz mono PCM            │
│ - Play 24kHz mono PCM               │
│ - Display live transcript           │
└────────────────┬────────────────────┘
                 │
   WebSocket: /ws/chat/business/live
                 │
┌────────────────▼────────────────────┐
│ FastAPI Backend                     │
│ (Request → Session ID → Queues)     │
└────────────────┬────────────────────┘
                 │
┌────────────────▼────────────────────────────────┐
│ LangGraph Workflow (Async)                      │
│                                                 │
│ 1. live_audio_conversation_node                 │
│    └─ Gemini 3.1 Flash Live session             │
│    └─ Stream audio ↔ Receive responses          │
│    └─ Emit transcripts & tool calls             │
│                                                 │
│ 2. memory_extraction_node                       │
│    └─ Parse conversation deltas                 │
│    └─ Extract facts → Canvas blocks             │
│                                                 │
│ 3. proactive_suggestion_node                    │
│    └─ Check cross-canvas implications           │
│    └─ Generate contextual suggestions           │
└────────────────┬────────────────────────────────┘
                 │
┌────────────────▼────────────────────┐
│ MongoDB                             │
│ - Canvas state (9 blocks)           │
│ - Memory insights (facts)           │
│ - Conversation history              │
└─────────────────────────────────────┘
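The "Request → Session ID → Queues" step in the backend box can be pictured with a small sketch. This is only an illustration under assumptions: the class names, the two-queue layout, and the use of a random hex ID are hypothetical and not taken from the project's code.

```python
import asyncio
import uuid
from dataclasses import dataclass, field


@dataclass
class LiveSession:
    """Per-connection state: one queue per direction (hypothetical layout)."""
    session_id: str
    inbound_audio: asyncio.Queue = field(default_factory=asyncio.Queue)
    outbound_events: asyncio.Queue = field(default_factory=asyncio.Queue)


class SessionRegistry:
    """Maps session IDs to their queues so workflow tasks can find them."""

    def __init__(self):
        self._sessions = {}

    def open(self):
        session = LiveSession(session_id=uuid.uuid4().hex)
        self._sessions[session.session_id] = session
        return session

    def get(self, session_id):
        return self._sessions.get(session_id)

    def close(self, session_id):
        # Dropping the entry releases the queues once no task references them.
        self._sessions.pop(session_id, None)
```

Keeping one queue per direction lets the WebSocket handler and the workflow run as separate async tasks without sharing the socket object itself.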
Tech Stack
- LLM: Gemini 3.1 Flash (Live API + generate_content)
- Backend: Python 3.12, FastAPI, LangGraph, LangSmith
- Frontend: JavaScript (ES6+), Phaser 3, Web Audio API
- Database: MongoDB (Atlas)
- Deployment: Docker, Google Cloud Run, Secret Manager
Development Pipeline
- Phase 1: Prototyped Gemini Live API bidirectional audio streaming
- Phase 2: Built 4-node LangGraph workflow (audio conversation → memory extraction → proactive suggestions → conditional summarization)
- Phase 3: Integrated WebSocket endpoint with async task scheduling
- Phase 4: Implemented frontend PCM capture/playback with real-time UI updates
- Phase 5: Added memory extraction with LLM-based fact parsing, the proactive suggestion feature, and the MongoDB connection setup
- Phase 6: Engineered proactive suggestion engine with cross-canvas relationship detection
- Phase 7: Fixed critical bugs (transcription flickering, response conciseness, WebSocket lifecycle)
- Phase 8: Deployed to Cloud Run with full LangSmith observability
Challenges We Ran Into
1. Transcription UI Flickering (Audio/UX)
- Problem: Frontend displayed individual transcript fragments in real-time, causing text to jump between single words.
- Root Cause: Backend was sending fragments separately instead of accumulated context.
- Solution: Modified backend to send the full accumulated transcript string each time, allowing the UI to display smooth, growing text.
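A minimal sketch of the accumulate-then-send idea, assuming each fragment carries its own spacing and that the UI receives a JSON message; the class name and the message shape are hypothetical.

```python
import json


class TranscriptAccumulator:
    """Collects transcript fragments and returns the full text so far."""

    def __init__(self):
        self._text = ""

    def add(self, fragment):
        # Assumption: fragments already include any leading space they need.
        self._text += fragment
        return self._text

    def reset(self):
        """Call at the end of a turn so the next turn starts empty."""
        self._text = ""


def transcript_message(accumulator, fragment):
    """Build the payload sent to the UI (message shape is hypothetical)."""
    return json.dumps({"type": "transcript", "text": accumulator.add(fragment)})
```

Because every message carries the whole string, the frontend can simply overwrite the displayed text instead of stitching fragments together itself.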
2. Response Verbosity in Live Audio (LLM Behavior)
- Problem: Gemini's default responses were 3–4 sentences; live conversation requires punchy replies under 30 words.
- Root Cause: No explicit constraint in the live system prompt.
- Solution: Added a _LIVE_CONCISENESS_ADDENDUM to the live system prompt: "Keep every spoken response to 1–2 short sentences (under 30 words). Be direct, punchy, and conversational—like a quick phone call."
3. Missing LangSmith Traces for Live Sessions (Observability)
- Problem: Text-based voice turns showed up in LangSmith, but live sessions didn't.
- Root Cause: LangGraph was assigned a run_id that collided with LangSmith's internal IDs, breaking the parent-child trace hierarchy.
- Solution: Replaced the manual run_id with LangChain's callback system, passing the parent trace directly via "callbacks": [live_turn_run]
4. Proactive Suggestions Never Reached Frontend (WebSocket Lifecycle)
- Problem: Backend generated suggestions but frontend never showed them.
- Root Cause: Frontend closed the WebSocket immediately after sending session_end, before the backend could finish memory extraction and send session_complete.
- Solution:
  - Removed the premature socket.close() from endSession()
  - Wait for the onSessionComplete callback to process suggestions before closing the socket
  - Added explicit cleanup in the callback to ensure proper async coordination
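The handshake can be sketched in Python with two asyncio queues standing in for the socket. The message types session_end and session_complete come from the writeup; the "suggestions" field, the timeout, and the function names are assumptions for illustration, and the real frontend is JavaScript.

```python
import asyncio


async def backend(from_client, to_client):
    """Simulated server: finish post-session work before the final message."""
    while True:
        message = await from_client.get()
        if message["type"] == "session_end":
            await asyncio.sleep(0.01)  # stands in for memory extraction
            await to_client.put({
                "type": "session_complete",
                "suggestions": ["Revisit Cost Structure"],
            })
            return


async def end_session(to_server, from_server, timeout=5.0):
    """Client side: send session_end, then wait for session_complete."""
    await to_server.put({"type": "session_end"})
    while True:
        message = await asyncio.wait_for(from_server.get(), timeout)
        if message["type"] == "session_complete":
            # Only after this point is it safe for the caller to close the socket.
            return message["suggestions"]


async def demo():
    to_server, to_client = asyncio.Queue(), asyncio.Queue()
    server = asyncio.create_task(backend(to_server, to_client))
    suggestions = await end_session(to_server, to_client)
    await server
    return suggestions
```

The timeout is a safety net so the client does not wait forever if the backend never sends the final message.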
5. Environment Variable Propagation in Docker (DevOps)
- Problem: Rotated LangSmith API key in Secret Manager, but Docker containers didn't pick it up.
- Root Cause: docker compose restart only restarts the existing containers; it doesn't re-read .env.
- Solution: Used docker compose up -d to recreate containers and load fresh environment variables.
Accomplishments That We're Proud Of
1. Real-Time Multimodal Audio Pipeline
- Integrated Gemini 3.1 Flash Live API for true bidirectional audio (16kHz PCM input, 24kHz output)
- Achieved sub-100ms latency streaming with clean UI feedback
- Handled async audio buffering, encoding/decoding, and playback without blocking
2. Intelligent Memory Extraction Engine
- Built a 2-stage memory system:
  - Fact Extraction: LLM parses conversations and maps insights to canvas blocks
  - Delta Tracking: Maintains a diff between old/new canvas state to identify what changed
- Enables persistent learning across multiple sessions—the system remembers what was discussed
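The delta-tracking stage can be illustrated with a small diff function. This is a sketch under assumptions: the canvas is modeled as a dict of block name to list of fact strings, and the snake_case block keys are hypothetical.

```python
CANVAS_BLOCKS = (
    "customer_segments", "value_propositions", "channels",
    "customer_relationships", "revenue_streams", "key_resources",
    "key_activities", "key_partnerships", "cost_structure",
)


def canvas_delta(old, new):
    """Return {block: {"added": [...], "removed": [...]}} for changed blocks only."""
    delta = {}
    for block in CANVAS_BLOCKS:
        before = old.get(block, [])
        after = new.get(block, [])
        added = [fact for fact in after if fact not in before]
        removed = [fact for fact in before if fact not in after]
        if added or removed:
            delta[block] = {"added": added, "removed": removed}
    return delta


if __name__ == "__main__":
    old = {"key_partnerships": ["Local roasters"], "cost_structure": ["Rent"]}
    new = {"key_partnerships": ["Local roasters", "Delivery startup"],
           "cost_structure": ["Rent"]}
    print(canvas_delta(old, new))
```

Returning only the changed blocks keeps the downstream suggestion step focused on what the latest conversation actually altered.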
3. Proactive Cross-Canvas Suggestion System
- Engineered suggestion logic that detects implications between blocks:
  - "You mentioned partnerships → Cost Structure may need adjustment"
  - "New Customer Segment → Revenue Streams could expand here"
- Suggestions are contextual, not generic—ranked by confidence and canvas state
- Successfully delivers suggestions to frontend via proper WebSocket lifecycle management
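As a rough illustration of cross-canvas detection with confidence ranking, the sketch below uses a static rule table. The table stands in for whatever logic the project actually uses; the rule entries, confidence values, and names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    source_block: str
    target_block: str
    message: str
    confidence: float


# Hypothetical rules: (changed block, affected block, message, confidence).
CROSS_CANVAS_RULES = [
    ("key_partnerships", "cost_structure",
     "You mentioned partnerships; Cost Structure may need adjustment.", 0.8),
    ("customer_segments", "revenue_streams",
     "New Customer Segment; Revenue Streams could expand here.", 0.7),
]


def suggest(changed_blocks, min_confidence=0.5):
    """Return suggestions for the changed blocks, highest confidence first."""
    hits = [
        Suggestion(src, dst, msg, conf)
        for src, dst, msg, conf in CROSS_CANVAS_RULES
        if src in changed_blocks and conf >= min_confidence
    ]
    return sorted(hits, key=lambda s: s.confidence, reverse=True)
```

Feeding `suggest` the set of blocks that changed in the last session yields an ordered list that can be sent to the frontend in the final session message.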
4. Full Observability with LangSmith
- Implemented trace hierarchy showing:
  - Live audio session (parent)
  - Individual LLM turns with tool calls (children)
  - Memory extraction as nested sub-tasks
  - Suggestion generation with confidence scores
- Enabled debugging of complex async workflows at a glance
5. Production-Ready Deployment
- Dockerized frontend and backend with volume mounts for hot-reload development
- Automated Cloud Build pipeline with environment variable management
- Secure secret handling (LangSmith key, API keys) via Google Secret Manager
- 16 unit tests for live audio pipeline with full coverage of edge cases
What We Learned
Latency is UX: 200ms in voice conversation feels wrong. Gemini 3.1 Flash's sub-100ms streaming is non-negotiable for real-time feel.
Trace hierarchies matter: LangSmith's parent-child relationships are powerful but fragile—colliding IDs silently break observability.
WebSocket choreography is subtle: Closing sockets before async backend work completes is a hidden trap. Explicit coordination (wait for final message) is essential.
Prompt engineering for voice ≠ text: Spoken responses need hard constraints (word count, sentence count) because pauses and tone matter differently than formatting.
Memory extraction requires iteration: LLMs don't naturally parse canvas blocks—needed careful prompt engineering and fallback logic for robustness.
Multimodal systems need explicit testing: Real-time audio + transcription + suggestions created complex async flows that only revealed bugs in production.
What's Next for BMC Town
- Expert Personas Expansion: Add 10+ specialized roles (Growth Hacker, ESG Officer, Lean Startup Coach)
- Multi-User Canvas Collaboration: Real-time collaborative editing with voice guidance
- Mobile App: Native iOS/Android with better audio handling and offline support
- Conversation Replay: Re-listen to sessions and see how suggestions played out over time
- API for Partners: Let other tools (VC platforms, incubators) embed BMC Town's voice coaching
Status: ✅ Deployed to Google Cloud Run | ✅ Live mode enabled | ✅ LangSmith tracing active | 🚀 Ready for hackathon demo
