DayZero

Hero image of DayZero
Architecture
Final step before initiatiating the interview
Sam speaking during the interview
This is what a DayZero report looks like.

Inspiration

Every founder eventually faces the gatekept ritual: a 10-minute partner interview at a top accelerator, where months of work gets stress-tested by people who've seen thousands of pitches. The feedback loop is broken...most founders get nothing until it's too late to pivot.

We asked: what if every founder could run that interview on Day Zero, before raising a single dollar or writing a single line of code? What if the most brutal, honest feedback was always available... at 2am, free, and instant? DayZero is that panel.

What It Does

DayZero is a multimodal AI agent pipeline that validates startup ideas through four sequential phases, each powered by Gemini:

1. Live Interview (Gemini Live API)

The user has a real-time voice conversation with "Sam": an AI partner trained on YC interview patterns. Sam asks probing, uncomfortable questions about market size, competition, distribution, and founder/market fit. The conversation is genuinely bidirectional: the user can interrupt Sam mid-sentence, and Sam adapts in real time. This isn't a chatbot, it's a live audio agent.

2. Pitch Deck Analysis (Gemini 2.5 Flash · Multimodal Vision)

Users upload their pitch deck (PDF or PPTX). DayZero rasterizes each slide and sends them as images to Gemini Flash with a detailed critique prompt. The agent returns a slide-by-slide breakdown: narrative arc score, visual clarity, content text extraction, missing slides (team, traction, ask), and an overall assessment.

3. Market Intelligence (Gemini + Google Search Grounding)

DayZero uses Gemini's native Google Search grounding tool to validate the pitch's market claims against live web data. It returns a competitor matrix, market size evaluation, tailwinds/headwinds, and pivot suggestions — every claim backed by a real URL and a confidence score.

4. VC Deliberation + Verdict (Multi-Agent · 3 Rounds)

Three distinct VC personas: Paul (Skeptic), Elad (Optimist), and Keith (Operator), debate the pitch over 3 sequential rounds. Each persona reads the full conversation history before speaking, so arguments build, challenge, and reference each other. After deliberation, the system synthesizes a final investment decision: PASS / SOFT PASS / NO, with a 0–100 score across six weighted dimensions, top strengths, top risks, and actionable next steps.

How We Built It

Core Technology Stack

Layer	Technology
AI Models	`gemini-2.5-flash-native-audio-preview` (Live), `gemini-2.5-flash` (all text/vision)
Agent Framework	Google ADK (`LlmAgent`, session management)
GenAI SDK	`google-genai` Python SDK for all model calls and Live API
Search Grounding	`types.GoogleSearch` native tool
Backend	FastAPI + uvicorn, Python 3.11, WebSockets
Frontend	React 19 + TypeScript + Vite, Tailwind CSS v4, Three.js (3D orb), GSAP
Audio Pipeline	Browser `AudioWorklet` (float32 → PCM16) → WebSocket → Gemini Live → PCM24 → `AudioContext`
Storage	SQLite (persistent sessions across restarts)
Deployment	Docker (python:3.11-slim) → Google Cloud Run

Architecture

DayZero Architecture

The backend is a single FastAPI application deployed on Google Cloud Run. Google ADK's LlmAgent serves as the root orchestrator , it extracts structured pitch_context from user input and manages session.state (the shared memory all agents read from). Specialist agents are called as Python functions for performance.

The Live Interview agent opens a bidirectional WebSocket tunnel from the browser through FastAPI to the Gemini Live API. Audio is streamed as raw PCM frames in both directions. The model's Voice Activity Detection handles turn-taking; the user can interrupt at any time.

The Deck Analyst uses pdf2image (Poppler) or a native python-pptx + Pillow renderer (no LibreOffice needed in production) to rasterize slides, then sends them as inline_data blobs in a multimodal Gemini Flash call requesting JSON output.

The Market Validator constructs a Gemini Flash call with types.GoogleSearch as a tool, forcing the model to ground every claim against live search results before returning.

The Deliberation engine runs three rounds of sequential persona calls. Each call injects the full debate_rounds history from session.state, so each VC persona reads what the others have said before responding, simulating genuine cross-examination.

Challenges We Ran Into

1. Real-time audio routing through FastAPI
The Gemini Live API speaks directly over a WebSocket, but our backend needed to relay audio between the browser's WebSocket and Gemini's WebSocket simultaneously. Getting bidirectional PCM streaming reliable — with correct frame sizes, sample rates (16 kHz in, 24 kHz out), and no blocking — required building a careful async relay loop.

2. Multimodal session coherence
Each downstream agent (market, deliberation, verdict) needs context from every upstream phase. Making session.state the single source of truth — and having ADK manage it atomically — prevented race conditions when multiple agents ran concurrently as background tasks.

3. Docker image size
The original Dockerfile included LibreOffice for PPTX conversion, bloating the image to ~1.2GB. We replaced it with a native python-pptx + Pillow renderer that extracts text, images, and shapes from the PPTX OPC structure directly, keeping the image under 350MB and cold starts fast on Cloud Run.

4. Genuine VC personas, not generic AI
Getting three personas to genuinely disagree — and reference each other's prior arguments — required injecting full debate history into each persona's context and carefully prompting for role-specific reasoning styles (adversarial skepticism vs. narrative optimism vs. operational rigor).

Accomplishments That We're Proud Of

A truly interruptable live audio agent: users can cut Sam off mid-sentence and the conversation flows naturally — exactly the Live API's value proposition.
End-to-end multimodal pipeline: voice → deck images → web-grounded research → multi-agent debate → scored verdict, all in one coherent session.
Three VCs that actually disagree: the deliberation produces genuine, cross-referencing debate, not three copies of the same opinion.
Zero infrastructure cost to demo: runs on Cloud Run free tier with a Google AI Studio key — no credit card charges for judging.

What We Learned

The Gemini Live API's VAD and interruption handling is powerful but requires careful audio buffer management on the client side (AudioWorklet is the right tool, not ScriptProcessorNode).
Google ADK's InMemorySessionService and session.state dict are an elegant way to share context across agents without a message bus.
Google Search grounding dramatically improves the quality of market validation claims vs. relying on the model's training data alone.
Multi-stage Docker builds with slim base images make a real difference for Cloud Run cold start times.

What's Next for DayZero

Video input: extend the Live API session to accept video frames, so Sam can also evaluate the founder's delivery, confidence, and body language.
Persistent user accounts: save sessions across devices; track how a pitch improves over multiple DayZero runs.
Real investor calibration: fine-tune persona responses against public YC interview transcripts.
Async coaching: after the verdict, a coaching agent generates a week-long prep plan with specific exercises.