Inspiration for CheatCode: The Voiced Interview Simulator

Inspiration: The "LeetCode Trap"

Grinding Data Structures and Algorithms is no longer enough to get hired. We noticed a massive gap in how students prepare for tech jobs: they spend hundreds of hours silently memorizing code, but when they finally sit in a real interview, they freeze. Writing optimal code is only half the battle; if you can't clearly communicate your thought process, explain trade-offs, and defend your time complexity out loud, you fail. We built CheatCode to turn silent coders into confident communicators.

How we built it

CheatCode is built on a split-stack architecture designed to keep conversational latency as low as possible.

The Frontend: We used Next.js and React to build a sleek, dark-mode visual interface, integrating the Monaco Code Editor so users have a familiar, VS-Code-like environment to type their solutions.
The Backend: A custom Node.js/Express server bridges our audio streams and handles the prompt engineering.
The "Ear" (Speech-to-Text): We used the OpenAI Whisper API to transcribe the user's spoken thoughts, recorded in the browser as .webm microphone audio, in under a second.
The "Brain" (LLM): We leveraged Gemini 2.5 Flash as the core reasoning engine. We fed it a custom JSON knowledge base of algorithmic problems and optimal approaches, prompting it to act as a strict engineering manager who pushes back on O(N^2) brute-force solutions.
The "Mouth" (Text-to-Speech): We used the ElevenLabs API (specifically the high-speed Flash v2.5 model) to convert the AI's responses back into a realistic human voice, returned as Base64 audio strings that the browser can play as soon as they arrive.
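
One turn of the Ear, Brain, and Mouth loop described above could be wired together roughly as follows. This is a minimal sketch, not our actual server code: the stage signatures, `runTurn`, and the result shape are hypothetical names chosen for illustration, and the `audio/mpeg` MIME type assumes the Text-to-Speech stage returns MP3 data.

```typescript
// Hypothetical stage signatures. Real Whisper, Gemini, and ElevenLabs
// calls would be wrapped so that they fit these shapes.
type Transcribe = (audio: Uint8Array) => Promise<string>;
type Reason = (transcript: string) => Promise<string>;
type Synthesize = (text: string) => Promise<string>; // resolves to Base64 audio

interface Stages {
  transcribe: Transcribe;
  reason: Reason;
  synthesize: Synthesize;
}

interface TurnResult {
  transcript: string;
  replyText: string;
  audioDataUrl: string;
}

// Runs one interview turn: speech in, interviewer reply out.
async function runTurn(stages: Stages, audio: Uint8Array): Promise<TurnResult> {
  const transcript = await stages.transcribe(audio);
  const replyText = await stages.reason(transcript);
  const audioBase64 = await stages.synthesize(replyText);
  return {
    transcript,
    replyText,
    // Assumption: MP3 output. A data URL lets the browser play the clip
    // without a second request.
    audioDataUrl: `data:audio/mpeg;base64,${audioBase64}`,
  };
}
```

Passing the stages in as functions keeps each provider swappable and lets the loop be exercised with stubs before any API key is configured.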

Challenges we ran into

Building a real-time, voice-to-voice loop in 24 hours was, above all, a latency challenge.

The JSON Bottleneck: Initially, we tried having the LLM output structured JSON responses. We quickly realized that waiting for the JSON to fully generate and parse destroyed the conversational flow. We pivoted to streaming raw plain-text strings, which let us trigger the Text-to-Speech engine as soon as text arrived.
File Handling Nightmares: We lost hours debugging a 400 Invalid file format error from OpenAI, only to realize our backend middleware (multer) was silently stripping the .webm extensions off our audio blobs before sending them to the API.
API Quotas & Shadowbans: We fought through unexpected 402 Payment Required HTTP errors by learning how to properly configure API key rotation and how to avoid the premium voices that are restricted on the ElevenLabs free tier.
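
The pivot from JSON to streamed plain text can be illustrated with a small sentence buffer: each completed sentence is handed to Text-to-Speech while the model is still generating the rest. This is a sketch under assumptions, not our exact implementation. The `SentenceBuffer` class is a hypothetical name, and splitting on `.`, `!`, or `?` followed by whitespace is a deliberately simple heuristic.

```typescript
// Buffers streamed text and emits complete sentences as soon as they end.
class SentenceBuffer {
  private pending = "";

  // Feed one streamed chunk; returns any sentences it completed.
  push(chunk: string): string[] {
    this.pending += chunk;
    const sentences: string[] = [];
    // A sentence ends at ., ! or ? followed by whitespace.
    const boundary = /[.!?]\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + 1; // keep the punctuation mark
      sentences.push(this.pending.slice(start, end).trim());
      start = match.index + match[0].length;
    }
    // Keep the unfinished tail for the next chunk.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Call once the stream ends to release whatever text is left.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest ? [rest] : [];
  }
}
```

In use, every string returned by `push` would be sent to the Text-to-Speech stage immediately, so the first sentence can start playing while later ones are still being generated.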

What we learned

We learned that the illusion of AI conversation lives and dies by latency. Shaving 500 milliseconds off an API fetch request is the difference between an app that feels like a robot and an app that feels like a real human interviewer. We also learned how to anchor an LLM to a "ground truth" (our JSON question bank) to prevent it from hallucinating overly complex solutions and unfairly failing the user.
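
Anchoring the model to the question bank can be as simple as building the system prompt from the stored reference solution. The sketch below is illustrative only: the `Problem` fields and `buildInterviewerPrompt` are hypothetical, since the real JSON schema and prompt wording are not shown here.

```typescript
// Hypothetical shape of one question-bank entry; the real schema may differ.
interface Problem {
  title: string;
  statement: string;
  optimalApproach: string;
  optimalComplexity: string;
}

// Builds a system prompt that pins the interviewer to the stored reference.
function buildInterviewerPrompt(problem: Problem): string {
  return [
    "You are a strict engineering manager running a coding interview.",
    `Problem: ${problem.title}. ${problem.statement}`,
    `Reference approach: ${problem.optimalApproach}`,
    `Reference complexity: ${problem.optimalComplexity}`,
    "Judge the candidate only against the reference above.",
    "Do not demand a solution more complex than the reference.",
    "Push back on brute-force answers and ask about trade-offs.",
    "Reply in plain spoken sentences, with no JSON or markdown.",
  ].join("\n");
}
```

Because the reference approach and complexity travel with every request, the model has a fixed target to grade against instead of inventing its own idea of the optimal answer.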

What's next for CheatCode

Our next step is integrating TwelveLabs video-understanding AI for a "Proctoring" feature. We want the app to analyze the user's webcam feed to detect when they look away at a second monitor, closely simulating the high-pressure environment of a monitored online assessment.