🚀 Project Story: AI Technical Interviewer

Inspiration

As second-year software engineering students at Polytechnique Montréal, we have experienced firsthand the high-pressure environment of internship hunting. We realized that while many students are brilliant coders, the real challenge lies in verbalizing thought processes during technical interviews. We were inspired to build a tool that bridges the gap between solving a problem on a screen and explaining it to a human, creating a safe space for students to practice their communication and technical skills simultaneously.

What We Learned

This project was a deep dive into the world of real-time AI and high-performance web architecture:

Stateful AI Interactions: We learned that simple request-response cycles aren't enough for a natural conversation. Mastering the client.chats session management in the Google Gen AI SDK was crucial for maintaining context across multiple turns.
Real-time Voice Orchestration: Integrating ElevenLabs Scribe taught us the nuances of WebSocket streams, specifically how to handle "partial" vs. "committed" transcripts to ensure the AI doesn't interrupt the user mid-sentence.
System Prompt Engineering: We discovered the power of dynamic system instructions, learning how to inject live problem data into the AI’s "personality" so that it acts as a subject matter expert for each challenge.
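
The "partial" vs. "committed" distinction can be illustrated with a small gate that shows partial text as a live preview but only forwards committed text to the interviewer. This is a minimal sketch, not our production code: the `TranscriptGate` class and the `{"type": ..., "text": ...}` event shape are hypothetical stand-ins and do not reflect the actual Scribe message schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptGate:
    """Collects speech-to-text events and releases text only once committed."""

    live_preview: str = ""  # latest partial text, for on-screen display only
    committed: list[str] = field(default_factory=list)

    def handle(self, event: dict) -> Optional[str]:
        """Return a finished utterance when one is committed, else None.

        The event shape is an assumption made for this sketch.
        """
        kind = event.get("type")
        text = event.get("text", "").strip()
        if kind == "partial":
            # Update the preview, but never forward partial text to the AI.
            self.live_preview = text
            return None
        if kind == "committed" and text:
            self.live_preview = ""
            self.committed.append(text)
            return text  # safe to send to the interviewer
        return None  # unknown event type or empty commit: ignore
```

Because `handle` returns `None` for every partial event, the caller can forward only non-`None` results to the model, which keeps the AI from replying mid-sentence.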

How We Built It

Our stack was chosen for speed, reliability, and modern developer experience:

Frontend: Built with React and Vite. We used custom hooks to manage the complex states of the voice recorder and the multi-turn chat interface.
Backend: A Flask (Python) server serves as the "brain," orchestrating the flow of data between the user, the LLM, and the Speech-to-Text engine.
AI Engine: We used Google Gemini 1.5 Flash for its speed and large context window. We implemented a dynamic session handler that rebuilds the system prompt whenever a new problem is loaded.
Voice STT: ElevenLabs Scribe provided near-instant transcription, which we secured using server-side single-use token generation.
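
The dynamic session handler idea can be sketched roughly as follows. Everything here is illustrative: `PROMPT_TEMPLATE`, `build_system_prompt`, `InterviewSessions`, and the problem dictionary keys (`id`, `title`, `statement`) are assumed names, and `chat_factory` is a stand-in for whatever creates a real chat session, so the sketch runs without the SDK installed.

```python
from typing import Callable, Optional

# Hypothetical template; the wording of the real prompt is not shown here.
PROMPT_TEMPLATE = (
    "You are a technical interviewer. The candidate is solving: {title}.\n"
    "Problem statement: {statement}\n"
    "Ask the candidate to explain their reasoning; give hints, not full solutions."
)


def build_system_prompt(problem: dict) -> str:
    """Inject the live problem data into the interviewer's instructions."""
    return PROMPT_TEMPLATE.format(
        title=problem["title"], statement=problem["statement"]
    )


class InterviewSessions:
    """Keeps one chat session and rebuilds it whenever the problem changes."""

    def __init__(self, chat_factory: Callable[[str], object]):
        # chat_factory takes a system prompt and returns a chat object.
        self._chat_factory = chat_factory
        self._problem_id: Optional[str] = None
        self._chat: Optional[object] = None

    def chat_for(self, problem: dict) -> object:
        """Return the current chat, creating a new one if the problem changed."""
        if self._chat is None or problem["id"] != self._problem_id:
            # New problem: drop the old context and start a fresh session.
            self._chat = self._chat_factory(build_system_prompt(problem))
            self._problem_id = problem["id"]
        return self._chat
```

Passing the factory in keeps the prompt logic testable on its own; in a real backend the factory would presumably wrap the SDK's chat session creation with the prompt supplied as the system instruction.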

Challenges We Faced

The road to a working MVP was full of technical puzzles:

The "Circular Import" Trap: As our backend grew, we hit a wall with Python circular dependencies between our main server and our AI integration logic. We had to refactor our architecture to decouple data from logic, ensuring a clean and scalable codebase.
Strict Type Validation: The latest Google Gen AI SDK is built on Pydantic and is extremely strict. We spent hours debugging validation errors until we perfected the exact dictionary structure required for Content and Part types.
Latency vs. Accuracy: The AI had to respond quickly, but only after the "Final Transcript" (the scribe.commit() call) had arrived. Balancing transcription speed against that wait was a major UX hurdle, which we solved with careful asynchronous timing.
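
To illustrate the strict validation issue, here is a small helper that builds conversation turns in one consistent shape before they are handed to the SDK. It is a sketch under assumptions: it uses the `role` / `parts` / `text` dictionary layout commonly used for Gemini content, with `user` and `model` as the allowed roles, and the helper names `make_content` and `make_history` are hypothetical. It may not match the exact structure we ended up with.

```python
ALLOWED_ROLES = {"user", "model"}  # assumed role names for this sketch


def make_content(role: str, text: str) -> dict:
    """Build one conversation turn as a plain dictionary."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"role must be one of {sorted(ALLOWED_ROLES)}, got {role!r}")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    # One Content entry holds a list of Part entries; here each part is plain text.
    return {"role": role, "parts": [{"text": text}]}


def make_history(turns: list[tuple[str, str]]) -> list[dict]:
    """Convert (role, text) pairs into a list of content dictionaries."""
    return [make_content(role, text) for role, text in turns]
```

Centralizing the shape in one function means a malformed turn fails early with a readable message instead of surfacing later as a validation error deep inside the SDK.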

Technical Conclusion

To provide high-quality feedback, our AI analyzes the efficiency of the user's code. For example, when practicing the Two Sum problem, the interviewer is programmed to steer the user away from a brute-force $O(n^2)$ solution toward an optimized hash map approach:

$$\text{Time Complexity}: O(n)$$
$$\text{Space Complexity}: O(n)$$

This ensures that the "Technical" in Technical Interviewer is always mathematically grounded.
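
For reference, the contrast the interviewer is looking for on Two Sum can be sketched as below. These are standard textbook versions written for illustration, not code taken from our problem bank; the function names are our own.

```python
from typing import Optional


def two_sum_brute_force(nums: list[int], target: int) -> Optional[tuple[int, int]]:
    """Check every pair: O(n^2) time, O(1) extra space."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return i, j
    return None


def two_sum_hash_map(nums: list[int], target: int) -> Optional[tuple[int, int]]:
    """Single pass with a dictionary: O(n) time, O(n) extra space."""
    seen: dict[int, int] = {}  # value -> index where it was first seen
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return seen[complement], i
        seen[value] = i
    return None


print(two_sum_hash_map([2, 7, 11, 15], 9))  # (0, 1)
```

The hash map version trades memory for speed: each element is visited once, and the dictionary lookup replaces the inner loop.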