💡 Inspiration

Hiring is one of the biggest hurdles for early-stage startups. Solo developers often spend 40+ hours per hire on manual screening and repetitive interviews. With the launch of Gemini 3, I saw the potential to build something that wasn't just another automation tool, but a multimodal recruitment intelligence that could see, hear, and reason like a human hiring lead. HireVision was built to level the playing field, giving small teams the hiring power of a giant corporation.

🚀 What it does

HireVision is an end-to-end, AI-powered hiring platform built using the Gemini 3 family.

  • Multimodal AI Interviews: Conducts voice-based interviews where the AI "sees" the candidate's work in real time. Using Gemini 3's vision reasoning, it can analyze live code, design choices, or presentations and ask direct, context-aware follow-up questions.
  • Intelligent Resume Evaluation: Leverages Gemini 3's long-context window to screen resumes against job requirements with deep semantic understanding, going beyond simple keyword matching.
  • Behavioral Intelligence: Uses Gemini 3's multimodal signals to analyze candidate confidence, engagement levels, and soft skills during the interview.
  • Structured Candidate Ranking: Automatically ranks candidates based on performance, providing explainable AI insights for faster hiring decisions.
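
To make the ranking step concrete, here is a minimal sketch of how per-stage scores could be combined into an explainable ranking. The score fields, the weights, and the `rankCandidates` helper are all assumptions for illustration, not HireVision's actual scoring logic.

```typescript
// Hypothetical per-stage scores for one candidate, each on a 0 to 100 scale.
interface CandidateScores {
  name: string;
  resume: number;
  interview: number;
  behavioral: number;
}

interface RankedCandidate {
  name: string;
  total: number;
  explanation: string;
}

// Assumed weights (not taken from the project); they sum to 1.
const WEIGHTS = { resume: 0.3, interview: 0.5, behavioral: 0.2 };

// Combine the stage scores into a weighted total and sort best-first.
// Ties are broken alphabetically so the order is deterministic.
function rankCandidates(candidates: CandidateScores[]): RankedCandidate[] {
  return candidates
    .map((c) => {
      const raw =
        c.resume * WEIGHTS.resume +
        c.interview * WEIGHTS.interview +
        c.behavioral * WEIGHTS.behavioral;
      // Round to one decimal place to hide floating-point noise.
      const total = Math.round(raw * 10) / 10;
      const explanation =
        `resume ${c.resume} x ${WEIGHTS.resume} + ` +
        `interview ${c.interview} x ${WEIGHTS.interview} + ` +
        `behavioral ${c.behavioral} x ${WEIGHTS.behavioral}`;
      return { name: c.name, total, explanation };
    })
    .sort((a, b) => b.total - a.total || a.name.localeCompare(b.name));
}

const ranked = rankCandidates([
  { name: "Ada", resume: 80, interview: 60, behavioral: 70 },
  { name: "Grace", resume: 60, interview: 90, behavioral: 50 },
]);
```

Keeping the weighted terms in the `explanation` string is one simple way to show a reviewer why a candidate landed where they did.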

🏗️ How we built it

HireVision is built entirely on the Google AI and Firebase stack for maximum performance and intelligence:

  • Core Model: Google Gemini 3.0 Flash powers our multimodal "Nervous System," handling real-time visual analysis, reasoning, and conversational logic with ultra-low latency.
  • AI Framework: Google Genkit was used to build and orchestrate the AI flows, ensuring robust prompt management and structured outputs.
  • Authentication & Database: Firebase (Auth, Firestore, and Storage) provides the secure foundation for candidate data, resumes, and session persistence.
  • Multimodal Pipeline: We implemented a real-time data flow that captures visual inputs (screen/camera) and feeds them directly into the Gemini 3 API for immediate feedback during live assessments.
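
One practical piece of a pipeline like this is deciding which captured frames are worth sending to the model at all. The sketch below is an assumption about how that gate could look, not the project's actual code: the `Frame` shape and the `FrameSampler` class are hypothetical, and the call to the model itself is deliberately left out.

```typescript
// Hypothetical captured frame: a timestamp plus an encoded image payload.
interface Frame {
  timestampMs: number;
  data: string; // e.g. a base64-encoded screenshot (assumed format)
}

// Forwards a frame only if enough time has passed since the last
// forwarded frame AND its content differs from that frame.
class FrameSampler {
  private readonly minIntervalMs: number;
  private lastSentMs = -Infinity;
  private lastData: string | null = null;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true when the frame should be forwarded to the model.
  shouldSend(frame: Frame): boolean {
    const tooSoon = frame.timestampMs - this.lastSentMs < this.minIntervalMs;
    const unchanged = frame.data === this.lastData;
    if (tooSoon || unchanged) return false;
    this.lastSentMs = frame.timestampMs;
    this.lastData = frame.data;
    return true;
  }
}

const sampler = new FrameSampler(1000);
const decisions = [
  { timestampMs: 0, data: "a" },
  { timestampMs: 500, data: "b" },  // too soon after the first frame
  { timestampMs: 1000, data: "a" }, // same content as the last sent frame
  { timestampMs: 1500, data: "b" },
  { timestampMs: 2600, data: "b" }, // unchanged again
].map((f) => sampler.shouldSend(f));
```

Comparing whole payloads is the simplest possible change check; a real capture loop would more likely compare a hash or a downscaled thumbnail.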

⚡ Challenges we ran into

  • Multimodal Synchronization: Coordinating real-time visual data with the Gemini 3 logic to ensure the AI's follow-up questions felt natural and immediate.
  • Prompt Engineering for Gemini 3: Refining prompts to take full advantage of the new reasoning capabilities while maintaining a professional and encouraging interviewer persona.
  • Processing Long Contexts: Optimizing how we feed diverse candidate data (resumes, project links) into the Gemini 3 window to get the most accurate and explainable scores.
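
For the long-context challenge, one approach is to pack candidate material into the prompt in priority order until a token budget is used up. The sketch below illustrates that idea under stated assumptions: the `ContextSection` shape, the `packContext` helper, and the rough 4-characters-per-token estimate are hypothetical and not measured against the Gemini tokenizer.

```typescript
// Hypothetical unit of candidate material. Lower priority number = more important.
interface ContextSection {
  label: string;
  text: string;
  priority: number;
}

interface PackedContext {
  prompt: string;
  included: string[];
  dropped: string[];
}

// Rough heuristic (an assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily include sections in priority order while they fit the budget.
// A section that does not fit is dropped, but later, smaller sections
// may still be included. Only the section text is counted, not its header.
function packContext(sections: ContextSection[], tokenBudget: number): PackedContext {
  const sorted = [...sections].sort((a, b) => a.priority - b.priority);
  const kept: ContextSection[] = [];
  const dropped: string[] = [];
  let used = 0;
  for (const section of sorted) {
    const cost = estimateTokens(section.text);
    if (used + cost <= tokenBudget) {
      kept.push(section);
      used += cost;
    } else {
      dropped.push(section.label);
    }
  }
  const prompt = kept.map((s) => `## ${s.label}\n${s.text}`).join("\n\n");
  return { prompt, included: kept.map((s) => s.label), dropped };
}

const packed = packContext(
  [
    { label: "projects", text: "p".repeat(200), priority: 2 },
    { label: "job", text: "j".repeat(40), priority: 0 },
    { label: "notes", text: "n".repeat(20), priority: 3 },
    { label: "resume", text: "r".repeat(80), priority: 1 },
  ],
  40,
);
```

Returning the `dropped` labels alongside the prompt makes it possible to tell a reviewer which material the score did not take into account.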

🏆 Accomplishments that we're proud of

  • Gemini 3 Integration: Successfully building a system that leverages the full power of the Gemini 3 multimodal API for real-time visual interviewing.
  • Seamless Pipeline: From resume upload to AI-led interview to final ranking, the entire process is handled autonomously by Gemini 3 logic.
  • Technical Execution: Building a production-ready application that showcases how Gemini 3 can solve high-impact, real-world problems for founders.

🧠 What we learned

  • How to build multimodal AI workflows using the Gemini 3 API.
  • The speed and reasoning advantages of the Gemini 3.0 Flash family for low-latency conversational applications.
  • Best practices for using Google Genkit to organize complex AI agents.

🔮 What's next for HireVision

  1. Expanded Multimodal Analysis: Using Gemini 3 to analyze video recordings for deeper behavioral and soft-skill insights.
  2. AI-Driven Sourcing: Leveraging Gemini's reasoning to discover and evaluate candidate profiles across the web automatically.
  3. Collaborative Review: Using Gemini 3 to summarize interview "Visual Highlights" for hiring teams to review quickly.

Built With

  • cloud-firestore
  • firebase-auth
  • firebase-storage
  • gemini-3.0-flash
  • google-genkit
  • next.js
  • tailwind-css
  • typescript