ShadowForge

ShadowForge: The Interview Simulator That Fights Back

You can call it “another mock interview app” until you try it.

Most prep tools are polite. They give you a problem, maybe a solution, and a dopamine hit when you click “Submit.” But interviews aren’t polite. They interrupt you mid-thought. They ask why, not just what. They force you to speak clearly while your brain is sprinting. They don’t care that you “knew it yesterday.”

ShadowForge was built to replicate that reality and then go beyond it with feedback, scoring, and personalization that a human interviewer can’t provide consistently.

Inspiration

There’s a specific kind of silence that only happens in interviews.

Not the normal silence where you’re thinking. The other one. The one where you realize you’ve been talking for three minutes… and you don’t actually know what you’re saying anymore.

That moment happened to one of us during a real interview loop. The problem was solvable. We had seen something similar before. But the interviewer did what good interviewers do: they didn’t let the solution stay theoretical.

“What’s your approach?” “What’s the time complexity?” “What if the input is empty?” “Can we do better?” “Okay, code it.” As soon as we started implementing, we hit a small edge case. Nothing catastrophic. But the conversation shifted. The interviewer leaned in just slightly and asked the most painful question in interview history:

“Talk me through what your code is doing right now.”

And that’s when it clicked: the hard part wasn’t the algorithm. The hard part was performing the algorithm under pressure while staying coherent. Our prep didn’t train that. We trained correctness in isolation: static problems, quiet rooms, “I’ll explain later.” But interviews demand real-time reasoning + communication + implementation + recovery.

That night, we wrote down what we wished existed:

- A real interviewer that doesn’t just answer, but challenges
- A voice-first experience, because thinking out loud is the actual interview skill
- Instant, structured feedback that doesn’t feel random
- A scoreboard (ELO) so progress is measurable, not vibes
- A practice plan that attacks weaknesses automatically

ShadowForge is that tool, built to feel like a training arena, not a worksheet.

What it does: A full interview training loop, not a single feature

ShadowForge is an AI interview platform with one obsession: turning interview practice into repeatable, measurable improvement.

It’s not “chat + editor.” It’s a system with state, scoring, progression, and a clear flow from “where you are” → “what to do next” → “proof you improved.”

The core flows - How a user moves through the product

ShadowForge is designed like a real training program. Here’s the “golden path”:

Flow 1 — Onboarding → Personalization

- Pick your track: SWE, Quant Finance, or Investment Banking
- Choose a target company (so practice feels relevant)
- Choose a role level (so difficulty starts in the right place)

Result: your initial profile is calibrated so the platform can recommend the right challenges.
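The onboarding step can be sketched as a pure function that turns the three choices into a starting profile. This is a minimal sketch, not ShadowForge's code: the base rating per role level, the neutral 0.5 strength, and the category ids (derived from the domain lists later in this writeup) are all hypothetical.

```typescript
// Hypothetical onboarding sketch: base ratings, the neutral 0.5 strength, and
// the category ids are illustrative assumptions, not ShadowForge's values.
type Field = "SWE" | "QF" | "IB";
type RoleLevel = "intern" | "new_grad" | "experienced";

// Category ids derived from the domain lists in this writeup.
const CATEGORIES: Record<Field, readonly string[]> = {
  SWE: ["algorithms", "system_design"],
  QF: ["probability", "statistics", "brain_teasers", "options", "mental_math", "market_making"],
  IB: ["valuation", "dcf_wacc", "lbo", "accretion_dilution", "accounting", "behavioral"],
};

// Hypothetical starting ratings so difficulty begins in the right place.
const BASE_RATING: Record<RoleLevel, number> = {
  intern: 1100,
  new_grad: 1200,
  experienced: 1350,
};

interface InitialProfile {
  field: Field;
  targetCompany: string;
  roleLevel: RoleLevel;
  rating: number;
  categoryStrengths: Record<string, number>; // 0..1, where 0.5 means "not measured yet"
  weakCategories: string[];
}

function initialProfile(field: Field, targetCompany: string, roleLevel: RoleLevel): InitialProfile {
  const categoryStrengths: Record<string, number> = {};
  for (const category of CATEGORIES[field]) {
    categoryStrengths[category] = 0.5;
  }
  return {
    field,
    targetCompany,
    roleLevel,
    rating: BASE_RATING[roleLevel],
    categoryStrengths,
    weakCategories: [], // left empty until the diagnostic measures real performance
  };
}
```

For example, `initialProfile("QF", "ExampleCo", "new_grad")` starts at 1200 with six neutral QF categories. Weak categories are deliberately not guessed here; the diagnostic fills them in.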

Flow 2 — Diagnostic → Calibration

- You get a short set of timed problems
- You submit solutions as you would in a real session
- ShadowForge evaluates them and produces:
  - a calibrated ELO
  - category strengths
  - weak categories
  - accuracy + detailed evaluations

Result: you stop guessing your level; ShadowForge measures it.
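One way a diagnostic could turn a handful of graded answers into a rating and weak categories is sketched below. It swaps in the common linear "performance rating" approximation (mean problem rating + 400 × (wins − losses) / games) rather than ShadowForge's actual calibration, and the 0.5 weakness threshold is hypothetical.

```typescript
// Hypothetical diagnostic calibration. The linear performance-rating formula is
// a swapped-in textbook approximation; the 0.5 weakness threshold is assumed.
interface DiagnosticResult {
  category: string;
  problemElo: number;
  correct: boolean;
}

interface Calibration {
  rating: number;
  accuracy: number;
  categoryStrengths: Record<string, number>;
  weakCategories: string[];
}

function calibrate(results: DiagnosticResult[], weakThreshold = 0.5): Calibration {
  if (results.length === 0) throw new Error("diagnostic needs at least one result");

  const wins = results.filter((r) => r.correct).length;
  const losses = results.length - wins;
  const meanProblemElo = results.reduce((sum, r) => sum + r.problemElo, 0) / results.length;
  // Solving everything lands 400 above the mean problem rating; missing everything, 400 below.
  const rating = Math.round(meanProblemElo + (400 * (wins - losses)) / results.length);

  // Per-category accuracy serves as the first strength estimate.
  const tally: Record<string, { correct: number; total: number }> = {};
  for (const r of results) {
    if (!tally[r.category]) tally[r.category] = { correct: 0, total: 0 };
    tally[r.category].total += 1;
    if (r.correct) tally[r.category].correct += 1;
  }
  const categoryStrengths: Record<string, number> = {};
  for (const category of Object.keys(tally)) {
    categoryStrengths[category] = tally[category].correct / tally[category].total;
  }
  // Weakest first, so recommendations can target the biggest gaps.
  const weakCategories = Object.keys(categoryStrengths)
    .filter((c) => categoryStrengths[c] < weakThreshold)
    .sort((a, b) => categoryStrengths[a] - categoryStrengths[b]);

  return { rating, accuracy: wins / results.length, categoryStrengths, weakCategories };
}
```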

Flow 3 — Practice → Targeted Reps

- Search by title/tags
- Filter by category and difficulty band
- Toggle Weakness Focus to drill weak categories
- Solve problems with:
  - hints (revealed intentionally)
  - language selection
  - a code editor (Monaco)
  - AI evaluation (correctness, complexity, edge cases)

Result: practice becomes strategic instead of random.
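The practice filters compose naturally into a single predicate. This sketch assumes a hypothetical problem shape and treats the difficulty band as an inclusive ELO range; it illustrates the filtering logic, not the app's actual query code.

```typescript
// Hypothetical practice-list filter; field names and band semantics are assumptions.
interface PracticeProblem {
  id: string;
  title: string;
  tags: string[];
  category: string;
  elo: number;
}

interface PracticeFilter {
  query?: string;           // matched case-insensitively against title and tags
  category?: string;
  band?: [number, number];  // inclusive difficulty (ELO) band
  weaknessFocus?: boolean;  // restrict to the user's weak categories
}

function filterProblems(
  problems: PracticeProblem[],
  filter: PracticeFilter,
  weakCategories: string[],
): PracticeProblem[] {
  const q = filter.query?.trim().toLowerCase() ?? "";
  return problems.filter((p) => {
    const matchesQuery =
      q === "" || p.title.toLowerCase().includes(q) || p.tags.some((t) => t.toLowerCase().includes(q));
    if (!matchesQuery) return false;
    if (filter.category && p.category !== filter.category) return false;
    if (filter.band && (p.elo < filter.band[0] || p.elo > filter.band[1])) return false;
    if (filter.weaknessFocus && !weakCategories.includes(p.category)) return false;
    return true;
  });
}
```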

Flow 4 — Mock Interview → The Arena

ShadowForge creates an interview session and selects a problem based on:

- your ELO band
- your field (SWE/QF/IB)
- your weak categories

The interviewer progresses through phases:

- Introduction (problem context + warm start)
- Reasoning (probe your approach, tradeoffs, edge cases)
- Coding (push you to implement in the editor)
- Feedback (evaluate + coaching)

Result: it feels like a real interview, not a conversation with a bot.
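Selecting the interview problem can stay fully deterministic. A minimal sketch, assuming a ±150 starting band that doubles when nothing qualifies and a "weak category first, then closest rating, then id" tie-break; none of these parameters come from ShadowForge itself.

```typescript
// Hypothetical interview problem selection; band width, widening, and ordering are assumptions.
type Field = "SWE" | "QF" | "IB";

interface InterviewProblem {
  id: string;
  field: Field;
  category: string;
  elo: number;
}

interface Candidate {
  rating: number;
  field: Field;
  weakCategories: string[];
}

function selectInterviewProblem(
  problems: InterviewProblem[],
  candidate: Candidate,
  band = 150,
): InterviewProblem | undefined {
  const inField = problems.filter((p) => p.field === candidate.field);
  // Widen the band (150, 300, 600, 1200) until something qualifies.
  for (let width = band; width <= band * 8; width *= 2) {
    const inBand = inField.filter((p) => Math.abs(p.elo - candidate.rating) <= width);
    if (inBand.length === 0) continue;
    // Deterministic ordering: weak categories first, then closest ELO, then id.
    return [...inBand].sort((a, b) => {
      const weakA = candidate.weakCategories.includes(a.category) ? 0 : 1;
      const weakB = candidate.weakCategories.includes(b.category) ? 0 : 1;
      if (weakA !== weakB) return weakA - weakB;
      const distA = Math.abs(a.elo - candidate.rating);
      const distB = Math.abs(b.elo - candidate.rating);
      if (distA !== distB) return distA - distB;
      return a.id.localeCompare(b.id);
    })[0];
  }
  return undefined; // nothing in this field within the widest band
}
```

Because the same inputs always yield the same problem, sessions are reproducible and easy to debug, which is harder to guarantee when an LLM picks the problem.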

Flow 5 — Report → Reflection

When you finish (or pass), you get a real report:

- skill breakdown scores
- strengths/weaknesses
- recommendations
- a timeline of the session
- ELO change + new rating

Result: every session becomes a training artifact.

Flow 6 — Dashboard → Momentum

ShadowForge becomes your command center:

- ELO history
- accuracy trends
- streaks and achievements
- category analysis
- recent activity
- quick actions to launch the next session

Result: improvement becomes visible and motivating.

Voice-first interviews - Two modes, one outcome: talk like you’re there

ShadowForge supports voice interviewing in a production-ready way, while still ensuring nobody gets blocked if the voice infrastructure isn’t available.

Voice Mode (LiveKit + agent)

- ShadowForge creates a real-time room per interview
- The candidate joins with a token
- An AI voice agent can join the room and conduct the interview conversationally
- Transcripts are captured so reports are grounded in what actually happened

Fallback Voice (Browser voice)

If live voice services aren’t configured, ShadowForge falls back gracefully. The interview still runs, still records the transcript, still evaluates code, and still produces a report.

The key point: voice is not a gimmick here. It’s the training mechanism that forces the hardest interview skill, clear thinking out loud, into every session.
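The fallback decision can be a single function. This sketch assumes LiveKit's conventional `LIVEKIT_URL` / `LIVEKIT_API_KEY` / `LIVEKIT_API_SECRET` settings and a hypothetical `agentAvailable` flag; how ShadowForge actually detects a missing voice setup is not specified in this writeup.

```typescript
// Hypothetical voice-mode resolution; the env shape and agentAvailable flag are assumptions.
type VoiceMode = "livekit" | "browser";

interface VoiceEnv {
  LIVEKIT_URL?: string;
  LIVEKIT_API_KEY?: string;
  LIVEKIT_API_SECRET?: string;
}

function resolveVoiceMode(env: VoiceEnv, agentAvailable: boolean): VoiceMode {
  // All three LiveKit settings must be non-empty before live voice is attempted.
  const configured = [env.LIVEKIT_URL, env.LIVEKIT_API_KEY, env.LIVEKIT_API_SECRET].every(
    (v) => typeof v === "string" && v.trim().length > 0,
  );
  // Live voice also needs an agent to conduct the interview; otherwise degrade.
  return configured && agentAvailable ? "livekit" : "browser";
}
```

Because both modes feed the same transcript and report pipeline, the choice only changes how audio reaches the candidate, not what the session produces.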

Multi-domain prep - SWE, Quant, and IB are first-class

ShadowForge was designed to expand beyond SWE from day one.

- SWE: algorithms + system-design-flavored problems
- QF: probability, statistics, brain teasers, options, mental math, market making
- IB: valuation frameworks, DCF/WACC, LBO, accretion/dilution, accounting flows, behavioral prep

This matters because the platform’s personalization engine (categories, strengths, weak areas, recommendations) works the same way across domains. That’s the architecture advantage: one progression system, multiple interview worlds.

How we built it

ShadowForge is not a single-page demo. It’s an integrated system with:

- a modern web app interface
- a backend orchestration layer
- persistent data storage
- an LLM evaluation pipeline
- a real-time voice agent architecture

Product architecture - The “brain” lives in orchestration

Instead of treating AI like a single prompt, ShadowForge treats it like a workflow:

- Problem selection is deterministic logic (ELO bands, field filters, weak categories, diversity)
- Interview conversation is phase-aware and transcript-driven
- Code evaluation returns structured JSON for reliability
- Reports are stored artifacts, not ephemeral messages
- Recommendations are generated by scoring, not guesswork

That’s what makes it feel like a simulator: the system has rules.

Frontend - Built to feel like a premium product

ShadowForge uses a modern, high-polish UI stack:

- Next.js (App Router) + React
- Motion-forward UI (smooth transitions, responsive layouts)
- A real code editor experience via Monaco
- Charts and breakdowns for analytics
- Optional immersive visual layers (3D environments, starfield/terminal aesthetics)

The UI is part of the training. Interviews are stressful; the interface has to be calm, readable, and confidence-building, without being bland.

Backend - API routes as the control plane

ShadowForge uses server endpoints as an orchestration layer that handles creating sessions, logging transcripts, evaluating code, generating reports, and updating user progression. Key capabilities include:

- User lifecycle: create/fetch/update user preferences and baseline data
- Onboarding calibration: initialize ELO and category strengths
- Problem retrieval: list, filter, and generate recommendations
- Diagnostic evaluation: compute calibrated ELO and weak areas
- Attempts tracking: record practice outcomes and update ELO
- Interview engine: create interview, chat, evaluate code, end interview, generate report
- Voice tokens/rooms: create voice session rooms and issue candidate tokens

This is what turns “AI features” into an “AI product.”

Data layer - Persistent progress via Supabase

ShadowForge stores real entities:

- Users: rating (ELO), field, target company, role level, category strengths, weak categories
- Problems: field/category/difficulty, hints, test cases, starter code, tags
- Attempts: practice outcomes (correctness, score, feedback, time, ELO delta)
- Interviews: transcript, scores, strengths/weaknesses, recommendations, timeline report

Persistence unlocks the things that make ShadowForge feel real: streaks, history, achievements, dashboards, and recommendations that evolve.
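The entities map naturally onto typed rows. The column names below are hypothetical, as is the `dailyAccuracy` helper, which shows how persisted attempts could feed a dashboard accuracy trend.

```typescript
// Hypothetical row shapes; column names are assumptions, not the real Supabase schema.
type Field = "SWE" | "QF" | "IB";

interface UserRow {
  id: string;
  rating: number;
  field: Field;
  targetCompany: string;
  roleLevel: string;
  categoryStrengths: Record<string, number>;
  weakCategories: string[];
}

interface AttemptRow {
  userId: string;
  problemId: string;
  correct: boolean;
  score: number;
  feedback: string;
  timeSeconds: number;
  eloDelta: number;
  createdAt: string; // ISO 8601 timestamp, assumed to be in UTC
}

// Dashboard rollup: accuracy per calendar day, oldest first.
function dailyAccuracy(attempts: AttemptRow[]): { day: string; accuracy: number }[] {
  const byDay = new Map<string, { correct: number; total: number }>();
  for (const a of attempts) {
    const day = a.createdAt.slice(0, 10); // "YYYY-MM-DD"
    const t = byDay.get(day) ?? { correct: 0, total: 0 };
    t.total += 1;
    if (a.correct) t.correct += 1;
    byDay.set(day, t);
  }
  return Array.from(byDay.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([day, t]) => ({ day, accuracy: t.correct / t.total }));
}
```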

The ELO system - Progress is a number you can’t negotiate with

ShadowForge uses ELO as a core mechanic, not as decoration.
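The rating update can be sketched as a standard Elo step. This assumes the usual logistic expected score on a 400-point scale; K = 32, the linear time factor (capped at 1.1 when fast, floored at 0.8 when slow), the ±24 clamp, and the 0.2 strength learning rate are placeholders, not ShadowForge's tuned values.

```typescript
// Hypothetical Elo-style update. K, the time-factor shape, the clamp, and the
// strength learning rate are placeholder assumptions.
const K = 32;
const MAX_DELTA = 24;
const STRENGTH_RATE = 0.2;

// Probability that a user rated `userElo` solves a problem rated `problemElo`.
function expectedScore(userElo: number, problemElo: number): number {
  return 1 / (1 + Math.pow(10, (problemElo - userElo) / 400));
}

// 1.1 for very fast solves, 1.0 at the target time, down to 0.8 when slow.
function timeFactor(timeSeconds: number, targetSeconds: number): number {
  const ratio = timeSeconds / targetSeconds;
  return Math.min(1.1, Math.max(0.8, 1.2 - 0.2 * ratio));
}

function eloDelta(
  userElo: number,
  problemElo: number,
  correct: boolean,
  timeSeconds: number,
  targetSeconds: number,
): number {
  const actual = correct ? 1 : 0;
  let delta = K * (actual - expectedScore(userElo, problemElo));
  // Only a correct answer's gain is scaled by speed; misses are not.
  if (correct) delta *= timeFactor(timeSeconds, targetSeconds);
  // Clamp so a single attempt can never swing the rating too far.
  return Math.round(Math.max(-MAX_DELTA, Math.min(MAX_DELTA, delta)));
}

// Exponential moving average toward 1 (correct) or 0 (incorrect);
// categories whose strength drifts below a threshold become weak categories.
function updateStrength(strength: number, correct: boolean): number {
  return strength + STRENGTH_RATE * ((correct ? 1 : 0) - strength);
}
```

With equal ratings, an on-time correct answer gains 16 and a miss loses 16; beating a problem rated 600 points higher would gain about 31 before the clamp caps it at 24.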

- The ELO delta is computed using expected-score math (user ELO vs. problem ELO)
- A time factor slightly rewards fast correct solutions and discourages overly slow ones
- Deltas are clamped to keep the rating stable and fair
- Category strengths adjust gradually based on outcomes, which is how weak categories emerge

This feeds directly into:

- recommended problems near your ELO band
- weakness focus practice
- dashboard analytics
- “tier” identity and motivation

The AI pipeline - Qwen3 via OpenRouter, structured for reliability

ShadowForge’s AI layer is built like production software, not a prompt experiment.
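Structured output plus cleanup for model quirks might look like the following. The `CodeEvaluation` fields and the fallback text are hypothetical; the cleanup targets quirks such as `<think>` reasoning blocks, JSON wrapped in markdown fences, and stray prose around the object.

```typescript
// Hypothetical evaluation shape and cleanup steps; fields and fallback text are assumptions.
interface CodeEvaluation {
  correct: boolean;
  timeComplexity: string;
  spaceComplexity: string;
  edgeCases: string[];
  suggestions: string[];
}

function fallbackEvaluation(): CodeEvaluation {
  return {
    correct: false,
    timeComplexity: "unknown",
    spaceComplexity: "unknown",
    edgeCases: [],
    suggestions: ["Automatic evaluation was unavailable; please retry."],
  };
}

function parseEvaluation(raw: string): CodeEvaluation {
  // 1. Drop reasoning blocks, 2. unwrap a markdown fence, 3. keep only the outer {...}.
  let text = raw.replace(/<think>[\s\S]*?<\/think>/g, "");
  const fenced = text.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  if (fenced) text = fenced[1];
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return fallbackEvaluation();
  try {
    const obj = JSON.parse(text.slice(start, end + 1)) as Partial<CodeEvaluation>;
    // The verdict is mandatory; optional fields get safe defaults.
    if (typeof obj.correct !== "boolean") return fallbackEvaluation();
    const strings = (xs: unknown): string[] =>
      Array.isArray(xs) ? xs.filter((s): s is string => typeof s === "string") : [];
    return {
      correct: obj.correct,
      timeComplexity: typeof obj.timeComplexity === "string" ? obj.timeComplexity : "unknown",
      spaceComplexity: typeof obj.spaceComplexity === "string" ? obj.spaceComplexity : "unknown",
      edgeCases: strings(obj.edgeCases),
      suggestions: strings(obj.suggestions),
    };
  } catch {
    return fallbackEvaluation();
  }
}
```

Returning a well-formed fallback instead of throwing keeps the interview flowing when the model misbehaves, which matters more to users than a perfect evaluation on every call.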

- Uses OpenRouter’s OpenAI-compatible interface
- Uses distinct model roles (text vs. coder vs. vision)
- Forces structured outputs for evaluation and feedback
- Includes parsing cleanup for model quirks
- Provides fallbacks if the model isn’t configured or errors occur

AI responsibilities include:

- Interview responses: phase-aware coach/interviewer behavior
- Code evaluation: correctness, complexity, edge cases, suggestions
- Interview feedback: multi-axis score breakdown and recommendations
- Optional vision hooks: whiteboard/diagram analysis wiring

Real-time voice agent - Why this is not “just TTS”

ShadowForge uses LiveKit for real-time voice and a separate Python LiveKit Agent for the interviewer.

The agent:

- joins interview rooms based on room naming
- fetches the interview’s selected problem from the backend
- runs a real-time pipeline (STT → LLM → TTS) via LiveKit inference providers
- writes transcript updates back to the backend so reports reflect what was actually said

This matters because it follows the way scalable voice systems are typically built:

- the web app orchestrates sessions and storage
- the voice agent handles real-time interaction and audio latency
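The room-naming contract between the web app and the agent can be tiny. This sketch shows only the backend side and assumes a hypothetical `interview-<id>` naming scheme and a simple transcript entry shape; the real agent is a separate Python process.

```typescript
// Hypothetical room-naming contract and transcript writes; prefix and shapes are assumptions.
const ROOM_PREFIX = "interview-";

function roomNameFor(interviewId: string): string {
  return `${ROOM_PREFIX}${interviewId}`;
}

// Recover the interview id from a room name, or ignore rooms that are not interviews.
function interviewIdFromRoom(roomName: string): string | null {
  if (!roomName.startsWith(ROOM_PREFIX)) return null;
  const id = roomName.slice(ROOM_PREFIX.length);
  return id.length > 0 ? id : null;
}

interface TranscriptEntry {
  role: "interviewer" | "candidate";
  text: string;
  at: number; // milliseconds since epoch
}

// Append in timestamp order and ignore exact duplicates,
// since real-time event delivery can repeat or reorder messages.
function appendTranscript(log: TranscriptEntry[], entry: TranscriptEntry): TranscriptEntry[] {
  const duplicate = log.some((e) => e.at === entry.at && e.role === entry.role && e.text === entry.text);
  if (duplicate) return log;
  return [...log, entry].sort((a, b) => a.at - b.at);
}
```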

Challenges we ran into

Latency vs. depth

If the interviewer pauses too long, the experience breaks. If the interviewer responds instantly but shallowly, it feels fake. ShadowForge balances this by:

- keeping responses concise and purposeful
- using phase rules so the interviewer doesn’t wander
- prioritizing “next best move” guidance under pressure

Conversation state that doesn’t fall apart

Interviews have phases and memory. ShadowForge had to:

- store the transcript consistently
- infer phase transitions from user behavior
- keep candidates in reasoning when they’re stuck
- push them to coding when they’re ready

This is where many interview bots fail: they either never transition, or they transition too early and ruin confidence.
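Those transition rules can be expressed as a small state machine. The phase names come from the interview flow earlier in this writeup; the `TurnSignals` fields (for example, how "stuck" is detected) are hypothetical inputs that an LLM classifier or heuristic would have to supply.

```typescript
// Hypothetical phase controller; phase names are from the writeup, signals are assumptions.
type Phase = "introduction" | "reasoning" | "coding" | "feedback";

interface TurnSignals {
  candidateTurns: number;     // candidate messages so far in this phase
  statedApproach: boolean;    // candidate has described an approach
  statedComplexity: boolean;  // candidate has given a time complexity
  seemsStuck: boolean;        // e.g. "I'm not sure", or a long silence
  codeSubmitted: boolean;     // editor code has been evaluated
}

function nextPhase(current: Phase, s: TurnSignals): Phase {
  switch (current) {
    case "introduction":
      // Warm start: move on once the candidate has engaged at least once.
      return s.candidateTurns >= 1 ? "reasoning" : "introduction";
    case "reasoning":
      // Stay in reasoning while stuck; push to code only once the plan is concrete.
      if (s.seemsStuck) return "reasoning";
      return s.statedApproach && s.statedComplexity ? "coding" : "reasoning";
    case "coding":
      return s.codeSubmitted ? "feedback" : "coding";
    case "feedback":
      return "feedback"; // terminal phase
  }
}
```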

Fair evaluation

The fastest way to lose a user is to mark a correct solution incorrect. ShadowForge’s evaluator is explicitly designed to:

- prioritize logical correctness
- avoid punishing style
- evaluate edge cases realistically
- provide actionable suggestions instead of vague criticism

Voice agent reliability

Real-time voice systems introduce new failure modes:

- room creation timing
- agent availability
- mic permissions
- transcript event delivery

ShadowForge is designed with graceful degradation: if the “full voice” system isn’t there, the user still gets a complete interview experience.

Personalization without boredom

If you only drill weak categories, sessions become repetitive. ShadowForge’s recommendation logic mixes:

- ELO proximity
- weak category targeting
- company alignment
- diversity constraints

That way, the practice plan stays engaging while still being strategic.
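A sketch of how those four signals could be mixed: score each problem, then pick greedily under a per-category cap. The weights (0.5/0.3/0.2), the 400-point proximity scale, the `companies` field, and the cap of two per category are assumptions for illustration.

```typescript
// Hypothetical recommendation scorer; weights, scale, and cap are assumptions.
interface Rec {
  id: string;
  category: string;
  elo: number;
  companies: string[];
}

interface RecContext {
  rating: number;
  weakCategories: string[];
  targetCompany: string;
}

function recommend(problems: Rec[], ctx: RecContext, limit = 5, maxPerCategory = 2): Rec[] {
  const score = (p: Rec): number => {
    const proximity = Math.max(0, 1 - Math.abs(p.elo - ctx.rating) / 400); // 1 at your rating
    const weak = ctx.weakCategories.includes(p.category) ? 1 : 0;
    const company = p.companies.includes(ctx.targetCompany) ? 1 : 0;
    return 0.5 * proximity + 0.3 * weak + 0.2 * company;
  };
  const ranked = [...problems].sort((a, b) => score(b) - score(a) || a.id.localeCompare(b.id));

  // Diversity constraint: take the best problems greedily, capping each category.
  const perCategory = new Map<string, number>();
  const picked: Rec[] = [];
  for (const p of ranked) {
    const used = perCategory.get(p.category) ?? 0;
    if (used >= maxPerCategory) continue;
    picked.push(p);
    perCategory.set(p.category, used + 1);
    if (picked.length === limit) break;
  }
  return picked;
}
```

The cap is what keeps weakness targeting from collapsing into a single category: a third dp problem is skipped even if it outscores everything else left.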

What we learned - The real insights from building

Interview prep is a system, not content

Problems alone don’t make you better. The loop does:

- calibration
- feedback
- progression
- targeted reps
- consistent measurement

Reliability beats cleverness

Structured JSON outputs, parsing hardening, and fallbacks are what make users trust the product. Trust is retention.

People practice when they can see progress

ELO, tiers, streaks, achievements, and dashboards aren’t fluff. They create momentum.

Voice is the closest thing to “real interview training”

Typing helps. Voice transforms. It trains the skill that matters: thinking clearly out loud under pressure.

“This is not another mock app. It’s an interviewer that pushes back and proves you’re ready.”

Built with ❤️ for students and early-career builders.