Inspiration

Over the last few months, while sending out hundreds of CVs and going through dozens of interviews, I realized that the hardest part was not the technology itself but staying calm and structured when speaking English in front of a real interviewer. I saw many candidates (including myself) preparing with static question lists or generic chatbots that ignore the actual job description and never capture the pressure of a live conversation. When the Meta Horizon Start Developer Competition was announced, it felt like the perfect excuse to build the tool I wished I had: a mixed‑reality interview coach that lets you rehearse realistic scenarios while also becoming a product with value beyond the hackathon.

What the project does

Ainoa Interview Coach starts in 2D on the web: you paste a job description, write a short summary of the role, pick one of several presets (Frontend, Backend, Full‑Stack, Data/ML, DevOps, Product, Design, QA) or attach your CV. When a CV is attached, the system automatically scans it to extract your name, title, skills and professional summary, generates a tailored interview profile and builds a concise CV summary that later appears on a virtual CV card. Once the live session is ready, a single button takes you from the 2D interface into an immersive mixed‑reality room on Meta Quest, where the avatar loads and MR mode begins.
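To make the CV-scanning step concrete, here is a minimal sketch of the kind of profile the extraction could produce and how the short card text might be assembled from it. The `CvProfile` shape and `buildCvCardSummary` helper are hypothetical names for illustration, not Ainoa's actual types:

```typescript
// Hypothetical shape of the data extracted from an attached CV;
// the real fields in Ainoa may differ.
interface CvProfile {
  name: string;
  title: string;
  skills: string[];
  summary: string;
}

// Builds the concise text later shown on the virtual CV card,
// keeping only the first few skills so the card stays readable.
function buildCvCardSummary(profile: CvProfile, maxSkills = 4): string {
  const topSkills = profile.skills.slice(0, maxSkills).join(", ");
  return [
    `${profile.name} (${profile.title})`,
    `Key skills: ${topSkills}`,
    profile.summary,
  ].join("\n");
}
```

The truncation to a handful of skills is one plausible way to fit the extracted data onto a small floating card in MR.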

Inside the MR scene, you see a 3D interviewer avatar, spatial audio and a floating CV card in front of you: if you started from a CV, the card shows the generated summary; otherwise it appears empty and you can use a 3D “Attach CV” button to upload one in the middle of the session. At any moment you can hand your CV to the avatar: grab the virtual card with hand tracking and deliver it, triggering a fresh analysis with concrete recommendations to improve the resume. Each interview lasts up to 15 minutes, or can be ended early with a Finish button; when it ends, the dashboard generates an evaluation report with a global score, strengths and areas for improvement, and you can restart as many sessions as you want.

How it was built

The project is built with Next.js (App Router), React and TypeScript, using next‑intl for localisation (English/Spanish) and a custom design system for the landing and dashboard. The immersive part uses React Three Fiber, @react-three/xr and a dedicated ImmersiveScene component that renders the avatar, MR environment, HUD and the interactive CV card, adapting the layout depending on whether the user is in 2D, VR or MR. Hand tracking is handled by a lightweight “CustomHand” system that reads XR hand poses, detects pinch gestures and drives a ray‑based pointer so the user can grab, move and deliver the virtual CV using only their hands.
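The pinch detection at the heart of the CustomHand system can be sketched as a simple fingertip-distance check. In WebXR the two points would come from the `thumb-tip` and `index-finger-tip` joint poses; here they are plain 3D points, and the threshold value is an assumption, not Ainoa's actual tuning:

```typescript
// Minimal sketch of pinch detection, assuming fingertip positions
// have already been read from the XR hand's joint poses.
type Vec3 = { x: number; y: number; z: number };

// Roughly 2 cm between thumb and index tip counts as a pinch;
// the real threshold would be tuned by hand.
const PINCH_THRESHOLD_M = 0.02;

function distance(a: Vec3, b: Vec3): number {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  const dz = a.z - b.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

// True when the fingertips are close enough to grab the CV card.
function isPinching(thumbTip: Vec3, indexTip: Vec3): boolean {
  return distance(thumbTip, indexTip) < PINCH_THRESHOLD_M;
}
```

A ray-based pointer like the one described above would then use the pinch state as its "select" signal, so grabbing and releasing the virtual card maps naturally onto pinch start and pinch end.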

For the conversational layer, Ainoa connects to a Gemini Live audio agent configured with role‑specific system prompts that are generated at runtime from the job description, CV data and previous sessions. Audio is streamed through WebSockets and processed via two AudioWorklets (input and playback) with shared analyzers for lip‑sync and voice visualisation, plus a spatial audio pipeline so the voice of the avatar feels anchored in the MR space. A separate evaluation API collects the final transcript and context and produces a structured performance report that is shown in the dashboard and stored as interview memory for future sessions.
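The runtime prompt assembly described above can be illustrated with a small sketch. The `PromptContext` shape and the wording of the template are hypothetical; the point is only how job description, CV data and prior-session memory could be combined into one system prompt:

```typescript
// Hypothetical context gathered before a live session starts;
// Ainoa's real prompt template is more elaborate.
interface PromptContext {
  jobDescription: string;
  cvSummary?: string;
  pastFeedback?: string[]; // improvement areas from earlier reports
}

// Assembles a role-specific system prompt at runtime, including the
// optional CV and interview-memory sections only when present.
function buildSystemPrompt(ctx: PromptContext): string {
  const parts = [
    "You are a professional job interviewer. Conduct a realistic, spoken interview.",
    `Job description:\n${ctx.jobDescription}`,
  ];
  if (ctx.cvSummary) {
    parts.push(`Candidate CV summary:\n${ctx.cvSummary}`);
  }
  if (ctx.pastFeedback && ctx.pastFeedback.length > 0) {
    parts.push(
      `Focus areas from previous sessions:\n- ${ctx.pastFeedback.join("\n- ")}`
    );
  }
  return parts.join("\n\n");
}
```

Feeding previous evaluation reports back in as focus areas is what turns the stored "interview memory" into progressively harder, more personalised sessions.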

Future plans

In the near future I want to reuse the same architecture to build more types of coaches on top of Ainoa: not only job interview practice, but also modules focused on presentation skills, negotiation and other high‑stakes conversations. Building on the current mixed‑reality pipeline, the long‑term goal is to integrate an ultra‑photorealistic avatar generated in real time with AI video, fully synchronised with the live audio so that the interviewer looks and behaves like a real person. As generative video and streaming technology mature over the coming months, the project is designed to adopt these capabilities as soon as they are production‑ready while keeping the same MR interaction model and coaching logic.
