InterviewerAI - AI-Powered Voice Interview Simulator

Inspiration

Technical interviews can be nerve-wracking, especially when you're trying to articulate complex concepts under pressure. I've seen countless developers struggle with interview anxiety, and many talented engineers fail to showcase their skills simply because they lack practice speaking about their work.

I wanted to create a solution that would allow developers to practice technical interviews anytime, anywhere, without the pressure of a real interview. The idea was to build an AI-powered interview coach that could simulate realistic technical interviews, adapt to different skill levels, and provide instant, constructive feedback.

The vision was to make interview preparation accessible, affordable, and effective - helping developers build confidence and improve their communication skills before facing real interviewers.

What I Learned

Building InterviewerAI taught me several valuable lessons:

Voice-First Design: Implementing real-time speech recognition using the Web Speech API was more challenging than expected. I learned about handling browser compatibility issues, managing microphone permissions, and creating a smooth user experience for voice input.

AI Integration: Working with the Google Gemini API required learning prompt engineering, handling API rate limits, implementing retry logic, and ensuring consistent JSON responses. I developed error handling and fallback mechanisms so the app remains functional even when API calls fail.
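
As a rough illustration of the retry logic mentioned above, here is a minimal sketch of exponential backoff around an async call. The helper name `withRetry`, the attempt count, and the delay values are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical helper: retries an async call with exponential backoff.
// maxAttempts and baseDelayMs are illustrative defaults, not the app's real values.
async function withRetry<T>(
  call: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      // Double the wait after each failure; skip the wait after the final attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * Math.pow(2, attempt));
      }
    }
  }
  throw lastError;
}
```

A caller could wrap any Gemini request in `withRetry(() => someRequest())` and fall back to a default response if the final attempt still throws.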

State Management: Managing interview state across multiple API routes in a stateless Next.js architecture required careful planning. I learned to structure data flow efficiently and handle session management without relying on server-side state.

User Experience: Creating an intuitive interface that guides users through the interview process while displaying real-time feedback required careful UX design. I focused on making the app feel conversational and supportive, rather than intimidating.

Text-to-Speech Integration: Integrating ElevenLabs API for natural-sounding speech synthesis added a layer of complexity. I learned about audio encoding, base64 handling, and creating seamless audio playback experiences.
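
To make the base64 handling concrete, here is a small sketch of one way to move audio through a JSON response: encode the bytes as base64 on the way out, then wrap them in a data URL for playback. The function names and the `audio/mpeg` MIME type are assumptions; the actual audio format depends on how the ElevenLabs request is configured.

```typescript
// Encode raw audio bytes as base64 so they can travel inside a JSON payload.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return btoa(binary);
}

// Build a data URL that an <audio> element can play directly.
// "audio/mpeg" is an assumed default, not confirmed by the project.
function toAudioDataUrl(base64Audio: string, mimeType = "audio/mpeg"): string {
  return `data:${mimeType};base64,${base64Audio}`;
}
```

In the browser, the result could be passed to `new Audio(toAudioDataUrl(encoded))` and played once the question text is on screen.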

How I Built It

InterviewerAI is built entirely with Next.js 14, using its API routes for backend functionality and React components for the frontend.

Architecture Overview:

The application follows a three-phase flow:

Start Phase: User inputs their information and selects role/level
Interview Phase: Dynamic Q&A with real-time evaluation
Results Phase: Comprehensive final report with detailed feedback
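
The three-phase flow can be pictured as a tiny state machine. This is only a sketch: the `Phase` and `PhaseEvent` names and the "restart" transition are assumptions, not taken from page.tsx.

```typescript
type Phase = "start" | "interview" | "results";
type PhaseEvent = "session_started" | "interview_ended" | "restart";

// Returns the next phase, or the current one if the event does not apply.
function nextPhase(current: Phase, event: PhaseEvent): Phase {
  if (current === "start" && event === "session_started") return "interview";
  if (current === "interview" && event === "interview_ended") return "results";
  if (current === "results" && event === "restart") return "start";
  return current;
}
```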

Key Components:

Frontend (/frontend/app):

page.tsx: Main application state management
StartSection.tsx: Initial form for user information
InterviewSection.tsx: Core interview interface with speech recognition
ResultsSection.tsx: Final report display

Backend (/frontend/app/api):

/api/session/start: Creates new interview session, generates first question
/api/session/[sessionId]/answer: Processes answers, evaluates, generates follow-ups
/api/session/[sessionId]/end: Generates final comprehensive report

Core Libraries (/frontend/lib):

interview-engine.ts: Orchestrates interview flow logic
gemini.ts: Handles all Google Gemini API interactions
elevenlabs.ts: Manages text-to-speech generation
speechRecognition.ts: Browser Web Speech API wrapper
prompts.ts: Structured prompt templates for AI interactions
api.ts: Frontend API client functions

Technology Stack:

Frontend: Next.js 14, React 18, TypeScript
AI/ML: Google Gemini 2.5 Flash for question generation and evaluation
Voice: ElevenLabs API for text-to-speech, Web Speech API for speech recognition
Deployment: Vercel (serverless functions)
Styling: Custom CSS with modern animations and responsive design

Key Features Implemented:

Real-time Speech Recognition: Browser-native Web Speech API with continuous listening and interim results
Adaptive Question Generation: AI generates questions based on role, level, and previous answers
Instant Evaluation: Each answer is scored across four dimensions (technical knowledge, problem-solving, communication, relevance)
Dynamic Follow-ups: Questions adapt based on performance by deepening, clarifying, simplifying, or moving on to a new topic
Comprehensive Reporting: Final report includes overall score, rubric breakdown, and personalized next steps
Voice Interaction: Natural-sounding interviewer voice using ElevenLabs TTS
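
To illustrate how four-dimension scores could drive the follow-up type, here is a rule-based sketch. In the app the AI makes this decision; the 0-10 scale, the thresholds, and every name below are hypothetical.

```typescript
type FollowUpKind = "deepen" | "clarify" | "simplify" | "move_on";

// Assumed 0-10 scale for each rubric dimension.
interface RubricScores {
  technicalKnowledge: number;
  problemSolving: number;
  communication: number;
  relevance: number;
}

function averageScore(s: RubricScores): number {
  return (s.technicalKnowledge + s.problemSolving + s.communication + s.relevance) / 4;
}

// Illustrative thresholds only: weak answers get an easier question,
// unclear or off-topic answers get a clarifying one, strong answers go deeper.
function chooseFollowUp(s: RubricScores): FollowUpKind {
  const avg = averageScore(s);
  if (avg < 4) return "simplify";
  if (s.communication < 5 || s.relevance < 5) return "clarify";
  if (avg >= 8) return "deepen";
  return "move_on";
}
```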

Challenges Faced

Challenge 1: API Response Consistency

Google Gemini sometimes returned responses in markdown code blocks or with extra formatting, breaking JSON parsing. I solved this by implementing robust JSON extraction logic that handles multiple response formats and includes fallback evaluation objects.
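
A minimal sketch of this kind of JSON extraction might look like the following. The function name `extractJson` and the order of the strategies are assumptions; the project's actual logic may differ.

```typescript
// Built at runtime so the source never contains a literal code fence.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

// Tries several strategies to pull a JSON value out of a model response.
// Returns null when nothing parses, so the caller can use a fallback object.
function extractJson(raw: string): unknown {
  const candidates: string[] = [];

  // 1. Contents of a fenced code block, if present.
  const fenced = raw.match(FENCED_BLOCK);
  if (fenced && fenced[1] !== undefined) candidates.push(fenced[1]);

  // 2. The raw text as-is.
  candidates.push(raw);

  // 3. Everything between the first "{" and the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));

  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate.trim());
    } catch {
      // Try the next candidate.
    }
  }
  return null;
}
```

When `extractJson` returns null, the caller can substitute a neutral fallback evaluation instead of failing the request.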

Challenge 2: Speech Recognition Browser Compatibility

The Web Speech API has varying support across browsers. I implemented feature detection and graceful degradation, ensuring the app works even when speech recognition isn't available, falling back to text input.
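
Feature detection for speech recognition usually means checking for both the standard and the `webkit`-prefixed constructor. The sketch below takes the global scope as a parameter so it can be exercised outside a browser; the helper names are hypothetical.

```typescript
// Minimal shape of the globals we care about.
interface SpeechGlobals {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

// Returns the available constructor, or null when the browser has none.
function getSpeechRecognitionCtor(scope: SpeechGlobals): unknown {
  return scope.SpeechRecognition ?? scope.webkitSpeechRecognition ?? null;
}

// Decide which input mode the UI should offer.
function chooseInputMode(scope: SpeechGlobals): "voice" | "text" {
  return getSpeechRecognitionCtor(scope) ? "voice" : "text";
}
```

In the browser this would be called with `window` cast to `SpeechGlobals`, and the interview screen would show a text box whenever the result is "text".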

Challenge 3: Stateless Architecture

Next.js API routes are stateless, making session management tricky. I designed the system to pass all necessary context (history, scores) with each request, allowing the AI to maintain conversation context without server-side storage.
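
One way to picture this is a request body that carries the whole conversation. The field names below are assumptions for illustration; only the `/api/session/[sessionId]/answer` path comes from the route list above.

```typescript
// Hypothetical shape of one completed question/answer exchange.
interface InterviewTurn {
  question: string;
  answer: string;
  score: number;
}

// Everything the server needs to evaluate the answer and pick a follow-up.
interface AnswerRequestBody {
  role: string;
  level: string;
  turns: InterviewTurn[];
  currentQuestion: string;
  answer: string;
}

// Builds the URL and fetch options for submitting an answer.
function buildAnswerRequest(sessionId: string, body: AnswerRequestBody) {
  return {
    url: `/api/session/${encodeURIComponent(sessionId)}/answer`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The client would then call `fetch(request.url, request.init)` and append the evaluated turn to its local list before the next question. The trade-off is a payload that grows with every turn, which is acceptable for short interviews.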

Challenge 4: Real-time Audio Playback

Coordinating audio playback with UI updates required careful state management. I implemented proper audio event handling and ensured smooth transitions between questions.
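
A common pattern for this kind of coordination is to wrap playback in a promise that settles on the audio element's "ended" or "error" event, so the UI can simply await it. The sketch below assumes that approach; `PlayableAudio` and `playToEnd` are hypothetical names, and the interface is a reduced stand-in for the parts of an audio element the code touches.

```typescript
// The small subset of an audio element that this helper relies on.
interface PlayableAudio {
  play(): Promise<void>;
  addEventListener(
    type: "ended" | "error",
    listener: () => void,
    options?: { once?: boolean },
  ): void;
}

// Resolves when playback finishes; rejects if playback cannot start or errors out.
function playToEnd(audio: PlayableAudio): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    audio.addEventListener("ended", () => resolve(), { once: true });
    audio.addEventListener(
      "error",
      () => reject(new Error("Audio playback failed")),
      { once: true },
    );
    // play() can reject, for example when autoplay is blocked.
    audio.play().catch(reject);
  });
}
```

With a helper like this, the interview screen could await the question audio and only then enable the microphone, which avoids the recognizer picking up the interviewer's voice.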

Challenge 5: Prompt Engineering

Getting consistent, high-quality responses from the AI required extensive prompt engineering. I created structured prompt templates with clear instructions, examples, and output formats, ensuring the AI acts as a professional interviewer.
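
A structured prompt template along these lines could be sketched as follows. The wording, the input fields, and the output keys ("question", "topic") are illustrative assumptions, not the contents of prompts.ts.

```typescript
interface QuestionPromptInput {
  role: string;
  level: string;
  previousAnswers: string[];
}

// Assembles a prompt with a persona, the context so far, and a strict output format.
function buildQuestionPrompt(input: QuestionPromptInput): string {
  const answers =
    input.previousAnswers.length > 0
      ? input.previousAnswers.map((a, i) => `${i + 1}. ${a}`).join("\n")
      : "(none yet)";
  return [
    "You are a professional technical interviewer.",
    `The candidate is applying for a ${input.level} ${input.role} position.`,
    "Previous answers from the candidate:",
    answers,
    "Ask exactly one follow-up question that fits the candidate's level.",
    'Respond with JSON only, in the form {"question": string, "topic": string}.',
    "Do not wrap the JSON in markdown code blocks.",
  ].join("\n");
}
```

Stating the output format explicitly in the prompt reduces, but does not eliminate, malformed responses, so defensive parsing on the receiving side is still worthwhile.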

Challenge 6: Error Handling & Resilience

API failures can break the user experience. I implemented comprehensive error handling with retry logic, fallback responses, and user-friendly error messages that guide users without breakin