Inspiration

Visa interviews are one of the most high-stakes conversations a person can have — and most people walk in completely unprepared. Not because they lack the right answers, but because they've never practiced being watched while they answer.

I experienced this firsthand. I knew my application was solid, but sitting across from a consular officer who silently read my face while I spoke was something no amount of written preparation could simulate. I stumbled. I broke eye contact at the wrong moment. I looked nervous answering questions I had rehearsed perfectly.

That experience made me wonder — what if you could practice with an interviewer that actually sees you?

What it does

VisaReady is a live AI visa interview simulator that watches you, listens to you, and responds to both simultaneously — exactly like a real consular officer would.

  • 🎥 Vision-aware questioning — the agent reads your facial expressions, eye contact, and stress signals in real time via your camera
  • 🎙️ Live voice conversation — natural back-and-forth with barge-in support, no turn-taking buttons
  • 🧠 Adaptive behavior — if the agent detects nervousness or inconsistency between your words and your expression, it probes deeper, just like a real officer
  • 📋 Post-session debrief — full transcript with a breakdown of where you hesitated, where your answers were strong, and specific suggestions to improve
  • 🌍 Multiple visa scenarios — US B1/B2 tourist, Schengen, UK Standard Visitor, student visas

The core insight: a visa interview isn't just about what you say — it's about how you look saying it. VisaReady is the only prep tool that trains both.

How we built it

  • Gemini Live API — powers the real-time multimodal understanding of both the audio stream and video feed simultaneously
  • LiveKit — handles real-time WebRTC audio/video transport between the browser client and the agent backend
  • Google Cloud Run — hosts the agent backend, scales to zero when idle
  • Vertex AI — model serving and grounding for post-session debrief generation
  • Google GenAI SDK — core integration layer between our backend and Gemini

The agent receives a continuous video + audio stream via LiveKit. Gemini Live API processes both modalities simultaneously — it isn't just transcribing speech and separately analyzing frames. The multimodal context is unified, meaning a hesitation in voice combined with a break in eye contact triggers a different follow-up than either signal alone.
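To make that concrete, here is a minimal sketch of the kind of fused follow-up decision described above. Everything in it (the `TurnSignals` fields, the decision labels, the thresholds implied) is illustrative, not the actual agent code; the point is only that the combined cue produces a different action than either cue alone.

```python
from dataclasses import dataclass

@dataclass
class TurnSignals:
    """Signals extracted from one answer turn (fields are illustrative)."""
    voice_hesitation: bool   # long pauses or filler words in the audio
    eye_contact_break: bool  # gaze left the camera mid-answer

def choose_follow_up(signals: TurnSignals) -> str:
    """Pick a follow-up style from the *combination* of cues,
    not from either modality in isolation."""
    if signals.voice_hesitation and signals.eye_contact_break:
        return "probe_deeper"       # both cues together: press on the answer
    if signals.voice_hesitation:
        return "ask_to_clarify"     # audio-only cue: gentler follow-up
    if signals.eye_contact_break:
        return "note_and_continue"  # visual-only cue: log it for the debrief
    return "next_question"

# Hesitation combined with broken eye contact escalates the interview.
print(choose_follow_up(TurnSignals(voice_hesitation=True, eye_contact_break=True)))
# probe_deeper
```

In the real system this decision lives inside the model's unified multimodal context rather than in explicit `if` statements, but the behavioral contract is the same.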

Challenges we ran into

The hardest challenge was making the vision component genuinely load-bearing rather than decorative. Early versions used video but didn't meaningfully change agent behavior based on visual signals. Getting Gemini to reliably act on combined audio-visual cues — rather than treating them as independent inputs — required significant prompt engineering and session context management.
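Much of that context management came down to how visual cues were surfaced to the model. A minimal sketch of the idea, with hypothetical prompt wording and helper names (the actual prompts and message format differ):

```python
# Hypothetical system prompt: instructs the model to treat speech and
# visual observations as one combined signal rather than separate inputs.
SYSTEM_PROMPT = (
    "You are a consular officer conducting a visa interview. "
    "Alongside the candidate's speech you will receive bracketed "
    "observer notes describing visible behavior, e.g. "
    "[observed: broke eye contact]. Treat speech and observer notes "
    "as one combined signal: when an answer sounds confident but the "
    "notes show stress, ask a specific follow-up instead of moving on."
)

def annotate_turn(transcript: str, visual_cues: list[str]) -> str:
    """Merge the spoken answer with visual observations into a single
    turn, so the model conditions on both at once (format is illustrative)."""
    notes = " ".join(f"[observed: {cue}]" for cue in visual_cues)
    return f"{transcript} {notes}".strip()

print(annotate_turn("I plan to stay for two weeks.", ["broke eye contact"]))
# I plan to stay for two weeks. [observed: broke eye contact]
```

Keeping the cue annotations inline with the turn, rather than as a separate side channel, was the kind of change that pushed the model from ignoring visual signals to actually acting on them.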

Latency was the other major challenge. Real interviews feel continuous and unforgiving. Any noticeable lag breaks the psychological realism that makes the practice valuable. Optimizing the LiveKit → Gemini Live API pipeline to stay under 800ms end-to-end was critical.
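Staying under a budget like that means measuring each hop. A hedged sketch of per-stage timing (the stage names are illustrative and the real work is replaced by sleeps):

```python
import time
from contextlib import contextmanager

BUDGET_MS = 800  # end-to-end target: candidate stops speaking -> agent replies

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    """Record wall-clock milliseconds for one stage of the pipeline."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000.0

# Illustrative pipeline stages standing in for the real hops.
with timed("webrtc_transport"):
    time.sleep(0.03)
with timed("model_inference"):
    time.sleep(0.20)
with timed("tts_playback_start"):
    time.sleep(0.05)

total_ms = sum(timings.values())
print(f"total: {total_ms:.0f} ms (budget {BUDGET_MS} ms)")
assert total_ms < BUDGET_MS
```

Breaking the budget down per stage is what makes the optimization tractable: you find out whether the lag lives in transport, inference, or audio playback instead of guessing.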

Accomplishments that we're proud of

The moment the agent interrupted a candidate mid-answer because it detected visible hesitation — unprompted, exactly like a real officer would — was when we knew the multimodal integration was genuinely working. It wasn't following a script. It was reading the room.

What we learned

That "multimodal" doesn't mean using multiple inputs in parallel; it means combining inputs to produce insight neither could deliver alone. Building VisaReady taught us that the real power of Gemini Live API isn't audio or vision on its own: it's the unified understanding of both at once.

What's next for VisaReady

  • Expand to job interviews, university admissions, and asylum hearings
  • Country-specific officer personas with culturally accurate behavior patterns
  • Integration with actual visa application documents — the agent reads your DS-160 and questions you specifically on your submitted answers
  • Mobile app so candidates can practice anywhere, anytime

Built With

  • gemini-live-api
  • google-cloud-run
  • google-genai-sdk
  • livekit
  • livekit-agents (open source)
  • next.js
  • python
  • vertex-ai
  • webrtc