NOTE: The demo credentials shown in the video contain a slight typo. The correct ones are as follows:
email: demo@example.com
password: demo@123
Inspiration
The technical interview is broken.
Every year, 65,000+ Computer Science graduates enter the job market. They spend months grinding LeetCode, memorizing algorithms in silence. Yet, they fail their dream interviews at Netflix, Google, and Meta.
Why? Because they don't know how to communicate while they code.
Existing tools are polarized:
- LeetCode: Great for code execution, but silent. It doesn't test your voice.
- AI Chatbots: Great for text chat, but they hallucinate code execution and are "too nice."
We realized that to pass a real FAANG interview, you don't need a cheerleader. You need a Strict Bar-Raiser. We wanted to build a simulation so high-fidelity that it actually induces anxiety. A "Shadowboxing" ring where you can get knocked out by an AI before you step into the real fight.
What it does
Shadows.sh is the first AI Technical Interviewer that judges your code and your communication simultaneously.
Real-Time Voice Simulation: You speak to the AI. It listens using Voice Activity Detection (VAD). If you stay silent for too long ("Dead Air"), it flags you. If you propose a brute-force solution, it interrupts you immediately, just like a senior engineer would.
The "Frankenstein" Execution: It doesn't guess if your code works. It sends your code to a Sandboxed C++ Compiler (Judge0). If your code segfaults or hits a Time Limit Exceeded (TLE) error, the AI sees the stderr logs and roasts you for it.
Temporal Diff Analysis: Unlike standard wrappers that only grade your final answer, Shadows analyzes your Submission Timeline. It diffs your code between attempts to understand how you fixed a bug (or how you broke it) and provides feedback on your debugging process.
The "No Hire" Audit: After the session, you get a brutal "Post-Mortem" report card. It grades you on Runtime Efficiency (from the compiler), Communication Confidence (from the audio latency), and technical correctness.
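The "Dead Air" flagging described above can be sketched as a scan over voice-activity events. This is a minimal sketch, not the project's actual code: the `VadEvent` shape, the `findDeadAir` name, and the 10-second threshold in the usage are assumptions made for illustration, and a real voice API will deliver events in its own schema.

```typescript
// Hypothetical, simplified VAD event: when the candidate starts or stops speaking.
type VadEvent = { kind: "speech_started" | "speech_stopped"; atMs: number };

type DeadAirGap = { startMs: number; endMs: number; durationMs: number };

// Return every silence that lasts at least thresholdMs. A silence runs from
// the session start or a speech_stopped event to the next speech_started
// event (or to the end of the session).
function findDeadAir(
  events: VadEvent[],
  sessionEndMs: number,
  thresholdMs: number
): DeadAirGap[] {
  const gaps: DeadAirGap[] = [];
  const sorted = [...events].sort((a, b) => a.atMs - b.atMs);
  let silenceStart: number | null = 0; // the session begins in silence

  for (const event of sorted) {
    if (event.kind === "speech_started") {
      if (silenceStart !== null && event.atMs - silenceStart >= thresholdMs) {
        gaps.push({
          startMs: silenceStart,
          endMs: event.atMs,
          durationMs: event.atMs - silenceStart,
        });
      }
      silenceStart = null; // candidate is talking
    } else {
      silenceStart = event.atMs; // candidate went quiet
    }
  }

  // Trailing silence at the end of the session counts too.
  if (silenceStart !== null && sessionEndMs - silenceStart >= thresholdMs) {
    gaps.push({
      startMs: silenceStart,
      endMs: sessionEndMs,
      durationMs: sessionEndMs - silenceStart,
    });
  }
  return gaps;
}

// Usage: one 11-second pause between two utterances, threshold of 10 seconds.
const deadAir = findDeadAir(
  [
    { kind: "speech_started", atMs: 2000 },
    { kind: "speech_stopped", atMs: 5000 },
    { kind: "speech_started", atMs: 16000 },
    { kind: "speech_stopped", atMs: 20000 },
  ],
  22000,
  10000
);
```

The number and total length of these gaps could then feed a communication score; how Shadows actually weights them is not specified here.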
Challenges we ran into
The Latency War: Stitching Voice AI to a Compiler introduced massive latency risks. We had to optimize the handshake between the VAD events and the Judge0 callback to ensure the AI could comment on the code execution without a 5-second awkward pause.
The "Nice AI" Problem: LLMs are trained to be helpful assistants. They wanted to give the user the answer. We had to use aggressive System Prompting to force the AI to be "Strict" and withhold answers, forcing the candidate to struggle.
Handling State: Tracking the "Diffs" between submissions required building a snapshot engine in our database to capture every "Run" event, not just the final save state.
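The snapshot idea above, capturing every "Run" rather than only the final save, can be sketched with an in-memory timeline. This is an illustrative stand-in for the database-backed engine: `SubmissionTimeline`, `RunSnapshot`, and the transition labels are hypothetical names, not the project's schema.

```typescript
// One record per "Run" click. Field names are assumptions for illustration.
type RunSnapshot = { seq: number; code: string; passed: boolean; takenAtMs: number };

type TransitionKind = "fixed" | "broke" | "still_failing" | "still_passing";

type Transition = { from: number; to: number; kind: TransitionKind; codeChanged: boolean };

class SubmissionTimeline {
  private snapshots: RunSnapshot[] = [];

  // Append an immutable snapshot for every run, not just the final state.
  recordRun(code: string, passed: boolean, takenAtMs: number): RunSnapshot {
    const snapshot: RunSnapshot = {
      seq: this.snapshots.length + 1,
      code,
      passed,
      takenAtMs,
    };
    this.snapshots.push(snapshot);
    return snapshot;
  }

  // Compare each run with the one before it to see whether the candidate
  // fixed a bug, introduced one, or made no progress.
  transitions(): Transition[] {
    const result: Transition[] = [];
    for (let i = 1; i < this.snapshots.length; i++) {
      const prev = this.snapshots[i - 1];
      const next = this.snapshots[i];
      let kind: TransitionKind;
      if (!prev.passed && next.passed) kind = "fixed";
      else if (prev.passed && !next.passed) kind = "broke";
      else if (next.passed) kind = "still_passing";
      else kind = "still_failing";
      result.push({
        from: prev.seq,
        to: next.seq,
        kind,
        codeChanged: prev.code !== next.code,
      });
    }
    return result;
  }
}

// Usage: a failing run, a fix, then a regression.
const timeline = new SubmissionTimeline();
timeline.recordRun("while (i <= n) i++;", false, 1000);
timeline.recordRun("while (i < n) i++;", true, 45000);
timeline.recordRun("while (i < n) i--;", false, 90000);
const steps = timeline.transitions();
```

Each adjacent pair of snapshots is also the natural input for a text diff, which is what makes feedback on the debugging process possible.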
Accomplishments that we're proud of
The "No Hire" Stamp: We built a UI that elicits a genuine emotional response. The feedback report feels high-stakes.
The "Frankenstein" Stitch: We successfully got an LLM to read a raw C++ compiler error log and explain it to the user in natural language, in real time.
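One way to hand an execution result to the LLM is to flatten it into a short text block before adding it to the conversation. The sketch below assumes a result object modeled on the fields Judge0 returns for a submission (`status`, `stdout`, `stderr`, `compile_output`, `time`, `memory`); treat those field names as assumptions to check against the Judge0 docs. `summarizeRun`, `tail`, and the character cap are hypothetical.

```typescript
// Subset of a Judge0-style result. Field names are assumed from Judge0's
// documented response and should be verified; values are assumed to be
// already decoded plain text.
type RunResult = {
  status: { id: number; description: string };
  stdout: string | null;
  stderr: string | null;
  compile_output: string | null;
  time: string | null; // CPU time in seconds, as a string
  memory: number | null; // kilobytes
};

const MAX_LOG_CHARS = 1500; // hypothetical cap to keep the prompt small

// Keep only the end of a long log, where the actual error usually is.
function tail(text: string, maxChars: number): string {
  return text.length <= maxChars
    ? text
    : "[...truncated...]\n" + text.slice(-maxChars);
}

// Build the text the interviewer model sees after each run.
function summarizeRun(result: RunResult): string {
  const lines: string[] = [`Verdict: ${result.status.description}`];
  if (result.time !== null) lines.push(`CPU time: ${result.time}s`);
  if (result.memory !== null) lines.push(`Memory: ${result.memory} KB`);
  if (result.compile_output) {
    lines.push("Compiler output:\n" + tail(result.compile_output, MAX_LOG_CHARS));
  }
  if (result.stderr) {
    lines.push("stderr:\n" + tail(result.stderr, MAX_LOG_CHARS));
  }
  return lines.join("\n");
}

// Usage: a run that crashed with a segmentation fault.
const summary = summarizeRun({
  status: { id: 11, description: "Runtime Error (SIGSEGV)" },
  stdout: null,
  stderr: "Segmentation fault (core dumped)",
  compile_output: null,
  time: "0.004",
  memory: 3200,
});
```

Passing the verdict and raw logs as plain text keeps the model grounded in what the compiler actually reported instead of letting it guess whether the code works.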
What we learned
Complexity is a Feature: In a world of AI wrappers, "Hard Engineering" is the only moat.
Silence is Data: We learned that "what the user doesn't say" is just as important as what they do say. Measuring latency became our killer feature.
Spec-Driven Development: Using Kiro's Specs and Steering Docs saved us days of refactoring by getting the architecture right the first time.
How we built it
This project is a Frankenstein monster, stitching together two technologies that hate each other: the "Fuzzy" world of LLMs and the "Rigid" world of Compilers.
The Brain (Voice): We used the OpenAI Realtime API for sub-second voice interaction. We tuned the VAD (Voice Activity Detection) to detect "Dead Air" pauses, which feed into the candidate's communication score.
The Body (Execution): We integrated Judge0, a robust open-source code execution engine. When a user runs code, we spin up an isolated container, run hidden test cases, and pipe the raw JSON output back to the orchestration layer.
The Orchestrator: Built on Next.js 15 (App Router) and Tailwind CSS. We implemented a custom diffing engine using jsdiff to create the "Time Travel" view in the final report.
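To illustrate what a line-level diff between two submissions produces, here is a small longest-common-subsequence diff. It is a dependency-free stand-in written for this example, not jsdiff and not the project's engine; `diffLines` here is a local function, and `DiffLine` is a hypothetical type.

```typescript
type DiffLine = { op: "same" | "added" | "removed"; text: string };

// Line diff based on the longest common subsequence (LCS) of the two files.
function diffLines(before: string, after: string): DiffLine[] {
  const a = before.split("\n");
  const b = after.split("\n");

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both files, emitting unchanged, removed, and added lines.
  const out: DiffLine[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      out.push({ op: "same", text: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push({ op: "removed", text: a[i] });
      i++;
    } else {
      out.push({ op: "added", text: b[j] });
      j++;
    }
  }
  while (i < a.length) out.push({ op: "removed", text: a[i++] });
  while (j < b.length) out.push({ op: "added", text: b[j++] });
  return out;
}

// Usage: the candidate fixes an off-by-one error between two runs.
const attempt1 = "int i = 0;\nwhile (i <= n) {\n  i++;\n}";
const attempt2 = "int i = 0;\nwhile (i < n) {\n  i++;\n}";
const changes = diffLines(attempt1, attempt2);
// changes marks "while (i <= n) {" as removed and "while (i < n) {" as added.
```

A production diff library handles large inputs and whitespace options far better; the point here is only the shape of the data a "Time Travel" view would render.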
Built with Kiro: As a solo founder, building a multi-agent system in 2 weeks required extreme velocity. I used Kiro as my architectural co-founder:
Steering Docs: I used .kiro/steering/tech.md to enforce the project requirements. I also dumped my database schema into it so that Kiro always had it in context when spinning up quick Supabase API code or generating test cases for the platform.
Vibe Coding: I used Kiro's conversational coding to rapid-prototype the complex Dark Mode Dashboard and "Live Room" UI in record time.
MCP (GitHub): I connected Kiro directly to my repository via the GitHub MCP, allowing it to understand the full context of my "Frankenstein" architecture when writing new features.
What's next for Shadows
B2B Pilot: This can revolutionize tech hiring. If you are reading this as a judge, you know how it feels to dread running initial interviews: you block out time from important work, lose focus, and so on.
System Design Mode: Adding a whiteboard interface to practice high-level architecture interviews.
Expansion: Adding Java and Go support to the Judge0 sandbox.
Built With
- judge0
- kiro
- openai
- supabase