LIVE PILOT: REAL DATA, NOT A DEMO
WritenDraw has completed a live institutional pilot with a London further education college. 27 verified sessions. 10 real candidates. B2B access only: every candidate was provisioned by an administrator. No open registration. No lab conditions.

| Metric | Result |
| --- | --- |
| Sessions completed | 27, all verified and timestamped |
| Unique candidates assessed | 10 across UK and India |
| Voluntary retake rate | 44%, candidates return unprompted |
| Paste attempts detected | 0 across all 27 sessions |
| Anti-cheat locks triggered | 6 sessions |
| Average score | 16.2%, the platform is genuinely hard |

Score improvement over multiple sessions: one candidate improved 86% across sessions (11.7% → 21.7%). Another improved 680% (1.7% → 13.3%). Candidates are not completing one session and leaving; they are returning to get better. That is organic engagement at pre-seed stage with zero marketing spend. This is not a prototype. The product runs in production.
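The improvement figures quoted for the pilot are relative changes, not percentage-point differences. The short sketch below recomputes them from the rounded scores given in the text. The function name `relative_improvement` is illustrative and not taken from the WritenDraw codebase. Small gaps against the quoted 86% and 680% are presumably due to rounding of the underlying scores.

```python
def relative_improvement(first: float, latest: float) -> float:
    """Percentage change from the first score to the latest score."""
    if first <= 0:
        raise ValueError("first score must be positive to compute a relative change")
    return (latest - first) / first * 100.0


# Rounded scores as quoted in the pilot summary (percent of maximum marks).
print(f"{relative_improvement(11.7, 21.7):.1f}%")  # prints 85.5%
print(f"{relative_improvement(1.7, 13.3):.1f}%")   # prints 682.4%
```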
Inspiration
Every year, thousands of graduates enter professional roles and face the same gap: they know theory but have never handled a real workplace crisis. Tutorials teach syntax. Bootcamps teach frameworks. Nobody teaches you what it feels like when a P1 incident drops at 9 AM and your senior is stretched thin. WritenDraw is a flight simulator for professionals, built not to teach content but to build judgment under pressure.
What it does
WritenDraw drops users into realistic workplace scenarios with a team of AI colleagues (a PM, senior developer, QA engineer, and engineering manager), each with a distinct personality powered by Gemini. Users investigate problems, write real code, explain their reasoning, and ship fixes through natural chat interaction. Every response is evaluated by multiple agents simultaneously, each scoring from their own professional perspective. The admin case builder means scenarios can span any industry: engineering, healthcare, finance, retail.
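To make the idea of simultaneous, multi-perspective scoring concrete, here is a minimal sketch. It assumes each evaluator can be modelled as a function from a response to a score. The three keyword-based scorers and the names `EVALUATORS` and `evaluate` are hypothetical stand-ins for the Gemini-backed agents. The real evaluation logic is not shown in this write-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


# Hypothetical stand-ins for the model-backed evaluators: each maps a
# candidate response to a score in [0, 1] using a crude keyword check.
def pm_score(response: str) -> float:
    return 1.0 if "customer" in response.lower() else 0.2


def qa_score(response: str) -> float:
    return 1.0 if "edge case" in response.lower() else 0.2


def senior_score(response: str) -> float:
    return 1.0 if "because" in response.lower() else 0.2


EVALUATORS: Dict[str, Callable[[str], float]] = {
    "pm": pm_score,
    "qa": qa_score,
    "senior_dev": senior_score,
}


def evaluate(response: str) -> Dict[str, float]:
    """Run every evaluator concurrently and return one score per perspective."""
    with ThreadPoolExecutor(max_workers=len(EVALUATORS)) as pool:
        futures = {name: pool.submit(fn, response) for name, fn in EVALUATORS.items()}
        return {name: future.result() for name, future in futures.items()}


scores = evaluate("I rolled back the deploy because the null check missed an edge case.")
print(scores)  # {'pm': 0.2, 'qa': 1.0, 'senior_dev': 1.0}
```

The same response earns different scores from different roles, which is the behaviour described above. In a real deployment each scorer would be a network call, which is where running them concurrently pays off.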
How we built it
- Multi-agent system: 5 autonomous agents using Gemini native function calling. Gemini decides which of 25+ tools to invoke, not Python routing logic.
- Evaluation: multi-agent scoring where each agent evaluates from their own perspective (the PM cares about impact, QA cares about edge cases, the senior developer cares about understanding).
- Backend: Python/Flask, PostgreSQL, deployed on Google Cloud Run with Cloud Build CI/CD.
- Frontend: live code editor, real-time inter-agent communication bus visible to users, and a full timestamped audit trail per session.
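The function-calling pattern described here can be sketched without any SDK. Tools are described to the model as declarations (a name, a description, and JSON-schema style parameters), the model returns the name and arguments of the tool it wants, and Python only executes that choice. Everything below is illustrative: the two tools, their canned return strings, and the hand-written `model_choice` dict are assumptions that stand in for the project's real tools and for an actual Gemini response.

```python
from typing import Any, Callable, Dict, List


# Hypothetical tool implementations with canned output for illustration.
def get_error_logs(service: str) -> str:
    return f"[{service}] TimeoutError in payment handler"


def run_tests(suite: str) -> str:
    return f"{suite}: tests executed"


# Declarations describe each tool to the model. Plain dicts are used here so
# the sketch does not depend on a particular SDK version.
TOOL_DECLARATIONS: List[Dict[str, Any]] = [
    {
        "name": "get_error_logs",
        "description": "Fetch recent error logs for a service.",
        "parameters": {
            "type": "object",
            "properties": {"service": {"type": "string"}},
            "required": ["service"],
        },
    },
    {
        "name": "run_tests",
        "description": "Run a named test suite.",
        "parameters": {
            "type": "object",
            "properties": {"suite": {"type": "string"}},
            "required": ["suite"],
        },
    },
]

TOOL_REGISTRY: Dict[str, Callable[..., str]] = {
    "get_error_logs": get_error_logs,
    "run_tests": run_tests,
}


def dispatch(function_call: Dict[str, Any]) -> str:
    """Execute whichever tool the model selected; Python does not choose it."""
    name = function_call["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"model requested unknown tool: {name}")
    return TOOL_REGISTRY[name](**function_call.get("args", {}))


# Stand-in for a model turn: in production this dict would be read from the
# model's function-call response rather than written by hand.
model_choice = {"name": "get_error_logs", "args": {"service": "payments"}}
result = dispatch(model_choice)
print(result)  # [payments] TimeoutError in payment handler
```

The design point is that `dispatch` contains no branching on user intent. The only decision Python makes is whether the requested tool exists.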
Challenges we ran into
- Making it truly agentic. Our first version had Python deciding tool use, and Gemini just generated text. We rebuilt the entire tool layer so Gemini is the decision-maker, converting 25 agent tools into FunctionDeclarations. This was the hardest architectural change.
- Session state management. Multi-agent state, chat history, evaluation scores, code submissions, and audit trails all had to persist across steps without losing context.
- Anti-cheat under real conditions. Session locking and paste detection had to fire correctly against real candidates in real sessions, not in a lab.
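One way to picture the session-state and anti-cheat problem is a single session object that owns the chat history, the scores, and a timestamped audit trail, and that refuses further input once locked. The sketch below is in-memory only and makes several assumptions: the field names, the event names, and above all the `LOCKING_EVENTS` rule are hypothetical, since the write-up does not say which behaviour triggers a lock. Persistence to PostgreSQL is out of scope here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List

# Assumption: which event types lock a session. The real rule is not stated
# in the write-up, so this set is purely illustrative.
LOCKING_EVENTS = {"tab_switch_limit_exceeded"}


@dataclass
class Session:
    candidate_id: str
    chat_history: List[Dict[str, str]] = field(default_factory=list)
    scores: Dict[str, float] = field(default_factory=dict)
    audit_trail: List[Dict[str, Any]] = field(default_factory=list)
    paste_attempts: int = 0
    locked: bool = False

    def record(self, event_type: str, **details: Any) -> None:
        """Append a timestamped audit entry and apply anti-cheat side effects."""
        self.audit_trail.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "type": event_type,
            "details": details,
        })
        if event_type == "paste_attempt":
            self.paste_attempts += 1
        if event_type in LOCKING_EVENTS:
            self.locked = True

    def add_message(self, sender: str, text: str) -> None:
        """Store a chat message unless the session has been locked."""
        if self.locked:
            raise PermissionError("session is locked")
        self.chat_history.append({"sender": sender, "text": text})
        self.record("message", sender=sender)


session = Session("cand-001")
session.add_message("candidate", "Looking at the logs now.")
session.record("paste_attempt", length=240)
session.record("tab_switch_limit_exceeded")
print(session.paste_attempts, session.locked, len(session.audit_trail))  # 1 True 3
```

Routing every state change through `record` means the audit trail and the anti-cheat counters cannot drift apart, which is useful when the same data later backs a per-session report.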
Accomplishments we're proud of
- 27 live sessions completed with full audit trails: production, not wireframe
- Gemini decides tool use autonomously, not scripted Python routing
- 5 agents with distinct personalities that adapt tone based on user performance
- Multi-perspective evaluation: same response scored differently by PM, QA, and senior dev
- Proactive agent triggers: agents message you unprompted based on your progress
- Admin case builder: create scenarios for any industry, not just software
What we learned
Agentic AI isn't about having agents; it's about letting the LLM make decisions. The moment we stopped routing in Python and let Gemini choose its own tools, the agent behaviour became dramatically more realistic.
Low scores are proof of value. A 16.2% average shows the scenarios cannot be passed easily, which is exactly what employers need.
What's next
- ADK migration for richer agent orchestration and memory
- Vision input: agents that can see the user's code editor and react in real time
- Multi-language support for global workforce training
- Institutional dashboard: universities and bootcamps tracking cohort performance
- Scenario marketplace: let companies publish their own onboarding simulations
Live demo: https://writendraw.com