PitchNest: Face the Boardroom Before You Face the Board
Inspiration
Every great startup begins with a pitch, but the environment where founders practice is fundamentally broken. Practicing in front of a mirror doesn't prepare you for brutal interruptions, and pasting a pitch script into a standard AI text box feels entirely disconnected from the high-pressure reality of a Venture Capital boardroom.
When we saw the "Live Agents" track for the Gemini API Developer Competition, we knew we had the perfect opportunity to break the text-box paradigm. We wanted to build a simulator that triggers the exact same adrenaline rush a founder feels when pitching to real investors.
We built PitchNest: an immersive, multimodal AI panel that doesn't just listen to your pitch—it watches your body language, analyzes your slides, fact-checks your claims, and ruthlessly interrupts you if you start to ramble.
What It Does
PitchNest is a real-time, video-first AI investor meeting platform.
- The Setup: Founders upload their PDF pitch deck and select their desired panel environment—ranging from a supportive Pitch Coach to an aggressive Tier-1 VC Panel.
- The Live Room: The founder enters the room, shares their webcam and screen, and begins presenting.
- The Interaction: Powered by the Gemini Live API, the AI panel listens to the live audio stream and processes video/screen frames at 4fps. The AI behaves exactly like a human investor: it lets the founder speak, but gracefully interrupts to ask probing questions about specific metrics on the screen, or calls them out if they break eye contact to read from a script.
- The Report: Once the pitch concludes, the system evaluates the transcript and generates a structured data report detailing the founder's Delivery, Clarity, and Scalability, complete with actionable next steps tracked on a personalized analytics dashboard.
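Sending video at 4fps comes down to a simple time gate: forward a frame only when at least 1000/4 = 250 ms have passed since the last one. A minimal sketch of that gate (the `FrameThrottle` name and shape are ours, not PitchNest's actual code):

```typescript
// Decides whether enough time has elapsed to send another video frame.
// At 4 fps the minimum interval between frames is 1000 / 4 = 250 ms.
class FrameThrottle {
  private lastSentMs = -Infinity;
  constructor(private readonly fps: number) {}

  /** Returns true if a frame should be sent at timestamp `nowMs`. */
  shouldSend(nowMs: number): boolean {
    const intervalMs = 1000 / this.fps;
    if (nowMs - this.lastSentMs >= intervalMs) {
      this.lastSentMs = nowMs;
      return true;
    }
    return false;
  }
}

// Example: webcam frames arrive every 100 ms, but only those at least
// 250 ms apart pass the gate.
const throttle = new FrameThrottle(4);
const sent = [0, 100, 200, 300, 400, 500].filter((t) => throttle.shouldSend(t));
// sent is [0, 300]
```

In the real pipeline the passing frames would be JPEG-encoded canvas captures streamed to the Live session; the gate itself stays independent of any browser API.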
How We Built It
Our team of four split the architecture into clear workstreams to ensure a seamless, ultra-low-latency experience:
- Frontend & UI: Built a highly responsive React/TypeScript SPA styled with Tailwind CSS. We wrote custom hooks (`useMediaRecorder` and `useScreenCapture`) to manage local hardware streams, wrapping the experience in a polished UI with real-time waveform animations and Recharts analytics.
- Backend Engine: Engineered a Node.js/Express server that acts as a robust WebSocket proxy, safely handling bidirectional streaming between the client and the Google GenAI SDK without exposing API keys to the browser.
- Cloud Infrastructure: We containerized the app and deployed it to Google Cloud Run (with `--min-instances=1` to keep WebSocket connections stable). We integrated an in-memory SQLite database and used Google Cloud Storage (GCS) to securely host pitch decks and backup video recordings.
- AI Implementation: We used `gemini-2.5-flash-native-audio-preview` for the real-time agent. We meticulously crafted system instructions to enforce an "Anti-Monologue" rule, ensuring the AI uses conversational filler words and keeps its responses punchy (under 80 words) to encourage natural human barge-in.
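The "Anti-Monologue" behavior lives entirely in the system instruction. PitchNest's exact prompt is not public, so the wording below is our paraphrase, and the connect-config field names are an assumption based on the Google GenAI SDK's Live API:

```typescript
// Illustrative system instruction enforcing short, interruptible replies.
// The prompt text is a paraphrase, not PitchNest's actual instruction.
const ANTI_MONOLOGUE_INSTRUCTION = `
You are Marcus, lead partner on a VC panel.
- Keep every spoken response under 80 words.
- Use natural filler ("hmm", "right, but...") so the founder can barge in.
- Never deliver more than two sentences before yielding the floor.
- Ask one probing question at a time, tied to what is on screen.
`;

// Assumed Live API connect-config shape (verify against the
// @google/genai documentation before relying on these field names).
const liveConfig = {
  model: "gemini-2.5-flash-native-audio-preview",
  config: {
    systemInstruction: ANTI_MONOLOGUE_INSTRUCTION,
    responseModalities: ["AUDIO"],
  },
};
```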
Challenges We Ran Into
Strict Browser Audio Policies: Modern browsers actively block audio autoplay. Initially, our AI's audio failed to play. We solved this by engineering a strict "Start Session" gateway UI, forcing a user interaction to securely unlock the AudioContext before connecting to the WebSocket.
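The gateway pattern described above reduces to: resume the AudioContext inside the click handler (the one place autoplay policy allows it), and only open the WebSocket once the context reports `running`. A minimal sketch with the context injected so the gate logic runs outside a browser; `startSession` and the interface are our names:

```typescript
// Minimal shape of the browser AudioContext the gate relies on.
interface UnlockableAudioContext {
  state: string;            // "suspended" until a user gesture unlocks it
  resume(): Promise<void>;  // must be called from within a user gesture
}

// Called from the "Start Session" button's click handler. Resumes the
// AudioContext first (satisfying the autoplay policy), then signals
// that it is safe to open the WebSocket to the backend proxy.
async function startSession(
  ctx: UnlockableAudioContext,
  openSocket: () => void
): Promise<boolean> {
  if (ctx.state === "suspended") {
    await ctx.resume(); // allowed here: we are inside a user gesture
  }
  if (ctx.state !== "running") return false; // still blocked; keep the gate up
  openSocket();
  return true;
}
```

In the real app `ctx` would be `new AudioContext()` and `openSocket` would dial the Node proxy; injecting both keeps the unlock logic testable without browser globals.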
The "One Voice, Three People" Limitation: Gemini Live currently outputs one distinct voice per WebSocket connection. To fulfill our vision of a multi-person investor panel, we engineered the "Panel Illusion" prompt. The AI acts as the Lead Partner (Marcus), who speaks out loud but frequently references his silent partners (Sarah and Chen) sitting next to him, seamlessly maintaining the illusion of a full boardroom.
The Dual-Model Architecture: We originally tried to force the streaming audio model to output our final JSON analytics report at the end of the call, which proved unstable. We solved this by splitting the workload: the live WebSocket agent handles the conversation, and when the session ends, our backend intercepts the transcript and makes a standard REST API call to gemini-2.0-flash. This split guarantees a perfectly formatted JSON report every single time.
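Because the report step is a plain one-shot call, the request reduces to the transcript plus a JSON schema. A hedged sketch of how that payload might be assembled: the `generationConfig` fields (`responseMimeType`, `responseSchema`) follow the public Gemini REST API, but the score fields are our guess at PitchNest's Delivery/Clarity/Scalability report:

```typescript
// Builds the generateContent request body for the post-pitch report.
// Schema fields mirror the report sections named above; the exact
// fields in PitchNest may differ.
function buildReportRequest(transcript: string) {
  return {
    contents: [
      {
        role: "user",
        parts: [
          {
            text:
              "Evaluate this pitch transcript and score Delivery, Clarity, " +
              "and Scalability from 0-10 with actionable next steps.\n\n" +
              transcript,
          },
        ],
      },
    ],
    generationConfig: {
      responseMimeType: "application/json", // forces valid JSON output
      responseSchema: {
        type: "OBJECT",
        properties: {
          delivery: { type: "NUMBER" },
          clarity: { type: "NUMBER" },
          scalability: { type: "NUMBER" },
          nextSteps: { type: "ARRAY", items: { type: "STRING" } },
        },
        required: ["delivery", "clarity", "scalability", "nextSteps"],
      },
    },
  };
}
```

The backend would POST this body to the `gemini-2.0-flash` `generateContent` endpoint; with the schema enforced server-side there is no retry-and-reparse loop.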
Accomplishments That We're Proud Of
- Real-Time Tool Use & Fact-Checking: We integrated the `googleSearch` tool directly into our Live Agent. If a founder exaggerates their Total Addressable Market (TAM) during the pitch, the AI silently searches the web in real time and actively interrupts the founder to challenge the fabricated claim.
- True Multimodal Vision: We are incredibly proud of the agent's visual awareness. A founder can point to a chart on their screen share and ask, "What do you think of this?", and the AI responds accurately to the visual context. It also monitors the webcam to ensure the founder maintains confident eye contact.
- Zero-Latency Feel: By piping 16kHz PCM audio directly through WebSockets rather than chaining traditional text-to-speech APIs, the conversation flows at human speed.
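Shipping raw PCM is what makes the zero-latency feel possible: Web Audio hands you Float32 samples in [-1, 1], while a 16-bit PCM stream wants signed little-endian integers. The conversion is a few lines — this is the standard technique, not necessarily PitchNest's exact code:

```typescript
// Converts Web Audio Float32 samples ([-1, 1]) to 16-bit signed
// little-endian PCM, the sample format used by the 16 kHz audio stream.
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    // Negative samples scale to [-32768, 0); positive to [0, 32767].
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = LE
  }
  return buffer;
}
```

Each converted buffer can then be base64-encoded and pushed straight down the WebSocket, with no encoder in the path.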
What We Learned
Building for the "Live" paradigm requires a complete shift in UX thinking. You aren't designing for loading states anymore; you are designing for continuous human interaction. We also learned how to manage heavy payload limits in Google Cloud Run by carefully chunking WebM video and audio data directly in the browser before transmission.
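The client-side chunking mentioned above is ultimately a byte-splitting problem: keep every upload under the proxy's payload limit. A sketch of the splitter (the 4 MB default and names are illustrative, not PitchNest's actual values):

```typescript
// Splits a recorded byte buffer into fixed-size chunks so each upload
// stays under the backend's payload limit. The 4 MB default is
// illustrative only.
function chunkBytes(data: Uint8Array, chunkSize = 4 * 1024 * 1024): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray creates a view over the same memory, so no bytes are copied.
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
```

In the browser, `MediaRecorder.start(timesliceMs)` achieves a similar effect upstream by emitting `dataavailable` events with small WebM segments instead of one monolithic blob.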
What's Next for PitchNest
We envision expanding PitchNest into a multiplayer environment. By integrating WebRTC mesh networking, co-founders could join the same live session from different locations and tag-team the AI investor panel together. We also plan to introduce Custom Voice Cloning, allowing founders to train the AI to mimic the specific personalities, questioning styles, and voices of the actual real-world investors they are preparing to pitch to.
Built With
- express.js
- framer
- gemini-2.5-flash-native-audio-preview-12-2025
- node.js
- radix
- react
- recharts
- sqlite
- tailwind
- typescript
- websockets
- ws


