# 🎭 Inspiration
Pilots don't practice on real planes. Surgeons don't operate on real patients without cadaver training. Athletes scrimmage before every real game. Musicians do dress rehearsals before opening night. Every high-stakes profession has figured out that you shouldn't touch reality cold — except decision-makers. The people making the most consequential calls in their own lives — job interviews, salary negotiations, performance reviews, sales pitches, hard conversations — still practice on the real thing. One shot. No replays.

We kept asking why. The tooling for this has always been bad: friends playing counterparties are too nice, coaches are expensive and scheduled in advance, ChatGPT roleplay is a single conversation with a generic agent that has no idea what company you're actually interviewing at. None of it stress-tests you against the distribution of ways things could go.

So we built the thing that should have existed already: a war-gaming engine for the conversations you're about to walk into. You describe the situation in full detail — who you are, who they are, what you want, what they likely want, what could go wrong — and Dress Rehearsal runs parallel simulations across best, likely, and worst cases, surfaces the curveballs you haven't prepared for, and graduates you to a live voice rehearsal when your confidence score says you're ready. The pitch is simple: before the curtain rises, rehearse.
# 🎬 What It Does
Dress Rehearsal is an AI-powered rehearsal engine for high-stakes conversations. The core loop:
- You describe the scenario in full. For a job interview, that means pasting the job description, your resume/skills, the company, your salary expectations, what you're nervous about, and any context you have on the interviewer. The more you tell us, the sharper the simulations get.
- We run 10 parallel simulated interviews, spread intentionally across best-case, likely-case, and worst-case scenarios. Each simulation pairs two agents: one representing you, one representing a counterparty archetype (Technical Skeptic, Hard Negotiator, Culture Probe, Friendly Evaluator, etc.) grounded in real context scraped about the company.
- Each simulation doesn't give you a novel — it gives you a battle report. For every run, you see:
  - Curveball questions the interviewer would likely ask, tagged by risk (🔴 unaddressed in your prep, 🟡 weak, 🟢 strong)
  - Gap analysis showing exactly where your prepared context doesn't answer what they'll ask
  - The single critical moment — the 3-5 line exchange that most determines the outcome of that scenario
- An aggregate dashboard ranks your prep priorities. Across all 10 simulations, we surface your top 5 highest-risk curveballs, your 3 most consistent weak points, and a ranked prep checklist of what to nail down before the real thing.
- You iterate. Add context. Prepare responses to the curveballs you got caught on. Re-run. Watch your confidence score climb.
- When you're ready, you graduate to voice. Pick a scenario type (we recommend worst-case) and launch a live Vapi voice call with the counterparty persona. Real audio. Real pressure. Real transcript scored afterward. The whole thing is a training loop: text simulation → gap analysis → preparation → iteration → voice rehearsal → ready for reality.
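The per-run battle report described above can be sketched as a TypeScript shape. All field names here are illustrative assumptions, not the product's actual schema; the aggregation helper shows one way the dashboard could rank curveballs by risk across runs.

```typescript
// Hypothetical shape of one simulation's battle report.
// Field names are illustrative, not the product's real schema.
type Risk = "unaddressed" | "weak" | "strong"; // 🔴 / 🟡 / 🟢

interface Curveball {
  question: string;
  risk: Risk;
}

interface SimulationReport {
  scenario: "best" | "likely" | "worst";
  curveballs: Curveball[];
  gaps: string[];           // where prepared context doesn't answer what they'll ask
  criticalMoment: string[]; // the 3-5 line exchange that most determines the outcome
  outcomeScore: number;     // 0-100, used for aggregate ranking
}

// Aggregate dashboard: surface the highest-risk curveballs across all runs.
function topCurveballs(reports: SimulationReport[], n = 5): Curveball[] {
  const order: Record<Risk, number> = { unaddressed: 0, weak: 1, strong: 2 };
  return reports
    .flatMap(r => r.curveballs)
    .sort((a, b) => order[a.risk] - order[b.risk])
    .slice(0, n);
}
```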
# 🛠️ How We Built It
Dress Rehearsal is a full production-grade stack. Every sponsor integration earns its spot — remove any one and something specific breaks.

**Backend & Core App — Insforge.** The entire backend was vibecoded through Insforge's MCP. Auth, Postgres schema, file storage, and edge functions — all built by our coding agent talking to Insforge's semantic layer. Users, scenarios, iterations, and voice call records live in relational Postgres; uploaded resumes and job descriptions live in Insforge storage. Our git log shows a backend built in about 90 minutes through natural-language prompts — no hand-written CRUD.

**Real-Time Simulation Engine — Redis.** Each of the 10 parallel simulations runs as a Redis Stream. Dialogue turns are XADD'd to the stream; the frontend subscribes via Server-Sent Events and watches all 10 simulations populate simultaneously in real time. Redis Sorted Sets rank simulations by outcome score as they complete. Hashes hold the user's running confidence score, updated live across iterations. Pub/Sub broadcasts outcome events the moment a simulation resolves. No database sits in our hot path — only Redis.

**Grounded Counterparty Context — TinyFish.** Generic personas are what make most multi-agent demos feel hollow. Before each rehearsal, TinyFish agents scrape real context: the company's recent news, Glassdoor interview experiences, funding history, leadership backgrounds. We inject this into the interviewer persona prompts so the counterparty actually feels like they work at that company — not like a generic "interviewer." This is the difference between practicing against a cardboard cutout and practicing against a lifelike opponent.

**Federated Grounding Data — WunderGraph Cosmo.** Simulations need more than company context. They need salary benchmarks, industry context, comp data, and precedent outcomes. We federate all of these structured sources into a single GraphQL supergraph through Cosmo.
Each persona queries what it needs through one endpoint; adding a new data source means adding a subgraph — zero agent code changes. Different persona archetypes get different subgraph scopes, enforced at the federation layer.

**Persona Registry & Governance — Guild.ai.** Every counterparty archetype (Technical Skeptic v1.2, Hard Negotiator v2.0, Culture Probe v1.4, etc.) is a registered Guild agent with versioned identity, scoped tool permissions, and an immutable audit log. Every turn an agent generates is traceable back to exactly which persona version did what. The architecture is designed so anyone can fork a persona, improve it, and contribute it back — an ecosystem of war-game archetypes on top of Guild's registry.

**Historical Archive & Calibration — Ghost (TigerData / TimescaleDB).** Every simulation ever run is archived to TimescaleDB hypertables: the full transcript, persona versions used, outcome scores, branch points, and the user's iteration context. This becomes a time-series research corpus. Over time, we'll know which persona-archetype combinations best predict real-world outcomes for which scenario types. When users report back what actually happened, we pair ground truth against simulation predictions to calibrate the confidence score. The archive is the learning loop.

**Voice Graduation Layer — Vapi.** Once the user's confidence bar crosses the threshold, they can launch a live voice rehearsal. A Vapi assistant is spun up with the full counterparty persona context — including TinyFish-scraped company grounding — and conducts a real voice interview. The transcript is pulled post-call, scored against the user's stated goals, and fed back into the iteration history. Multiple voice rehearsals compound into a progression curve the user can see.

**The Redesign Moment.** We originally built the simulation as a straight conversational transcript — just two agents talking and a log of the dialogue. We killed that design on day one after our first test run. The conversations were too dense, too meandering, and too focused on the AI's improvisational failures rather than the user's real preparation gaps. We rebuilt the engine around structured outputs — curveball questions, gap ratings, and single critical moments — and the product immediately became useful instead of performative.
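The Redis hot path above can be illustrated with an in-memory stand-in. This is a sketch only: class and method names are invented for illustration, and the real engine uses actual Redis Streams (XADD), Sorted Sets (ZADD/ZREVRANGE), and Pub/Sub rather than these in-process equivalents.

```typescript
// In-memory stand-in for the Redis structures described above (illustrative only).
type Turn = { speaker: "candidate" | "interviewer"; text: string };

class SimulationBus {
  private streams = new Map<string, Turn[]>(); // one "stream" per simulation (XADD)
  private scores = new Map<string, number>();  // "sorted set": simId -> outcome score (ZADD)
  private subscribers: Array<(simId: string, turn: Turn) => void> = [];

  // XADD equivalent: append a dialogue turn and fan it out to SSE subscribers.
  addTurn(simId: string, turn: Turn): void {
    const s = this.streams.get(simId) ?? [];
    s.push(turn);
    this.streams.set(simId, s);
    for (const fn of this.subscribers) fn(simId, turn);
  }

  // SSE-subscriber equivalent: the frontend watches all streams at once.
  subscribe(fn: (simId: string, turn: Turn) => void): void {
    this.subscribers.push(fn);
  }

  // ZADD equivalent: record a completed simulation's outcome score.
  setScore(simId: string, score: number): void {
    this.scores.set(simId, score);
  }

  // ZREVRANGE equivalent: simulations ranked best-to-worst as they complete.
  ranked(): string[] {
    return [...this.scores.entries()]
      .sort((a, b) => b[1] - a[1])
      .map(([id]) => id);
  }
}
```

The design point is the same one in the prose: append-only streams for turns, a ranked structure for outcomes, and push-based fan-out so no database sits in the hot path.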
# ⚔️ Challenges We Ran Into
**The simulation design problem.** Our first-version simulations read like novels: two agents having a polite, meandering conversation that surfaced AI-invented failures rather than real preparation gaps. The user-proxy agent was inventing answers the real user would never give, and the system was scoring those fake failures as if they mattered. We had to redesign around the principle that the user-proxy agent is constrained to the user's actual prepared context — if the user didn't provide it, the simulation surfaces it as a gap rather than improvising through it.

**Parallel orchestration without Redis getting mad.** Running 10 simultaneous LLM-driven agent conversations, each generating turn-by-turn dialogue, each streaming to the UI live, without the backend choking — that's genuinely a real-time systems problem. We moved the entire hot path off Postgres and onto Redis Streams, which fixed it but required rearchitecting the event model from request-response to event-driven.

**Persona authenticity.** Early personas felt hollow even when well-prompted. The fix was TinyFish-scraped grounding — feeding real company context (Glassdoor threads, recent news, leadership info) into the persona prompt. The difference between a persona with and without grounded context is stark. Without TinyFish, personas say things like "we really value teamwork here." With TinyFish, they ask "given our recent restructure around the platform team, how do you think about internal mobility early in your tenure?"

**Not over-engineering the federation.** We were tempted to build an elaborate GraphQL schema across 15+ data sources. We scoped it down to four sources for the MVP (company, salary, industry, user profile) and let Cosmo's federation handle the unified interface. Adding sources later is trivial — the architecture is built for it.

**Scope discipline under vibecoding.** Because our coding agent could build anything, we kept wanting to build everything. We phased the build hard: core loop → parallel execution → grounding → federation → archive → voice. Phases 1-5 had to ship solid before we touched Vapi, and we held that line.

**The confidence score calibration question.** A confidence bar that's not grounded in anything is just theater. We had to design the scoring so it reflects real improvement across iterations, not just "number goes up when user does more stuff." We landed on a composite: goal-achievement rate across the 10 sims, severity of worst-case outcomes, and trend across iterations. It's imperfect but honest, and the Ghost time-series archive is what will make it genuinely calibrated over time.
# 🏆 Accomplishments That We're Proud Of
**We built a product, not a science project.** The very first design was indulgent — 10-agent marketplaces, recursive bidding, theatrical UIs. We scrapped it in favor of a product that has a clear user (someone with a high-stakes conversation this week), a clear job (rehearse and find gaps), and a clear graduation moment (voice call when ready). Restraint was the hardest engineering we did.

**Every sponsor integration is load-bearing.** We didn't stack logos. Removing Insforge kills persistence. Removing Redis kills the live parallel UX. Removing TinyFish kills persona authenticity. Removing Guild kills governance. Removing Ghost kills the learning loop. Removing Cosmo kills data unification. Removing Vapi kills the voice graduation. Every one breaks something specific.

**The simulation output redesign.** Moving from "a transcript you read" to "a structured gap report you act on" is what made Dress Rehearsal useful. That redesign happened mid-build based on our own user testing, and it's the single decision that unlocked the product.

**The MCP-built backend.** Our Insforge backend was built entirely through natural-language prompts to our coding agent, which called Insforge's MCP. Auth, schema, storage, edge functions — zero hand-written CRUD. This is what backend engineering looks like in 2026, and we wanted to prove it end-to-end.

**The voice graduation loop.** Most AI rehearsal tools are text-only. Ours ends with a real voice call where the user has to perform in real time, under pressure, against a grounded persona. That emotional graduation from "I've prepared" to "I've performed" is what makes users actually walk into the real conversation confident.

**We kept the name beautiful.** "Dress Rehearsal" tells you almost nothing about the AI under the hood, and everything about what it means to the user. The best product names point at the feeling, not the feature.
# 📚 What We Learned
**Parallelism is the product.** Running one AI conversation teaches the user something. Running 10 across a spread of scenarios teaches them the distribution of what could happen. That's categorically different. The parallelism isn't a technical flex — it's the reason Dress Rehearsal helps people who don't know what they don't know.

**Grounding beats prompting.** You can prompt an LLM to "be a skeptical interviewer at a fintech startup" all day. It'll do fine. But feeding it actual scraped content from that fintech startup's Glassdoor threads turns a generic skeptic into a specific skeptic whose questions sound like the real thing. The delta between "prompted" and "grounded" is bigger than we expected.

**User-proxy agents shouldn't be creative.** Our biggest early mistake was letting the agent representing the user improvise beyond the provided context. "Creative" responses produced fake failures. The fix was constraining the user-proxy strictly to what the user actually prepared — and then surfacing the gaps as unaddressed rather than covering them with improvisation. This is a general lesson for simulation design: constraint is clarity.

**Time-series thinking separates apps from products.** Treating simulation data as an ever-growing corpus — not just records — opens up the learning loop, the calibration question, the research angle, and the moat. Every rehearsal Dress Rehearsal runs makes the next one better. Without that architectural choice, we'd have a one-shot tool.

**Voice graduation is an emotional unlock, not just a feature.** We debated whether Vapi was worth integrating for the cash-to-effort ratio. Then we tried the first voice rehearsal and realized: text rehearsal prepares your arguments; voice rehearsal prepares your nervous system. Users need both. Skipping voice would have left the product emotionally incomplete.

**Vibecoding doesn't mean "build anything" — it means "build the right thing, fast."** When your coding agent will build literally whatever you ask, the constraint shifts from "what can we build" to "what should we build." We caught ourselves multiple times reaching for complexity that didn't serve the user. The discipline of staying close to the actual use case is the real skill when you're vibecoding.
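The "constrained user-proxy" lesson can be made concrete with a sketch. The matching heuristic below (word overlap) is a stand-in invented for illustration — the real system presumably matches semantically — but the control flow is the point: answer only from prepared context, otherwise emit a gap instead of improvising.

```typescript
// Illustrative sketch of the constrained user-proxy rule: the proxy may only
// answer from the user's prepared context; anything uncovered becomes a gap.
// The word-overlap heuristic is a stand-in, not the real matching logic.
type ProxyMove =
  | { kind: "answer"; source: string }
  | { kind: "gap"; question: string };

function proxyRespond(question: string, preparedContext: string[]): ProxyMove {
  const qWords = new Set(
    question.toLowerCase().split(/\W+/).filter(w => w.length > 3)
  );
  for (const entry of preparedContext) {
    const eWords = entry.toLowerCase().split(/\W+/);
    const overlap = eWords.filter(w => qWords.has(w)).length;
    if (overlap >= 2) return { kind: "answer", source: entry };
  }
  // No prepared material covers this question: never improvise, report the gap.
  return { kind: "gap", question };
}
```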
# 🚀 What's Next for Dress Rehearsal
**Scenario library expansion.** We launched with job interviews because they're universal and high-stakes. Next up: salary negotiations (standalone, outside the interview context), performance reviews, sales pitches, investor meetings, hard conversations with reports, and college admissions interviews. Each scenario type becomes its own Guild-registered template with scenario-specific persona archetypes.

**Persona marketplace via Guild's Agent Hub.** Right now our personas are internally authored. The long-term vision: anyone can author and publish a persona archetype ("Ex-FAANG Principal Engineer," "Aggressive Biglaw Partner," "Seed-Stage VC"), version it, and earn when others rehearse against it. Guild's fork-and-contribute model makes this native.

**Calibration from real outcomes.** Users will come back after the real conversation and tell us what happened — did you get the offer? What salary? What questions actually came up? Pairing that ground truth against the simulation predictions in TimescaleDB lets us calibrate which persona-archetype combinations are actually predictive. Over 10,000 rehearsals, we'll be able to tell a user "our historically most accurate archetype for this company type predicts a 72% likelihood of a salary curveball — here's what you should prep."

**Multi-party rehearsals.** Panel interviews. Board presentations. Multi-stakeholder negotiations. The architecture already supports multiple counterparty agents — we just need the UX to handle it. A user rehearsing a board pitch would have five agents on the other side, each with distinct priorities.

**Enterprise mode for training teams.** Sales leaders would pay real money to rehearse their reps against persona archetypes of their actual customers. Recruiters would pay to rehearse their hiring managers against candidate archetypes. Legal teams would pay to rehearse depositions. This is the business model: consumer-freemium for practice, enterprise-paid for team training with custom persona libraries.

**Live coaching layer.** During voice rehearsals via Vapi, a second model watches in real time and provides coaching notes after the call: "You lowered your voice when she pushed on salary — you do this consistently in rehearsals." The rehearsal becomes not just a stress test but a mirror.

**Cross-rehearsal learning.** Right now each user's rehearsals teach them about their own upcoming conversation. Long-term, aggregated (anonymized) rehearsal data could teach users about patterns across all users: "Junior engineers in your position typically underestimate Docker questions by 40%." The archive becomes a collective knowledge base.

**The flight-simulator moat.** Flight simulators weren't just a product — they became infrastructure. Every pilot trains on them. Every airline requires them. Our long-term bet is that "you rehearsed in Dress Rehearsal" becomes a credential — that people show up to important conversations visibly more prepared than those who didn't, and the rest of the world catches on. Before the curtain rises — rehearse.
# 🧰 Built With
- guild
- insforge
- redis
- typescript
- vapi