💡 Inspiration
Hiring is one of the biggest hurdles for early-stage startups. Solo developers often spend 40+ hours per hire on manual screening and repetitive interviews. With the launch of Gemini 3, I saw the potential to build something more than another automation tool: a multimodal recruitment intelligence that could see, hear, and reason like a human hiring lead. HireVision was built to level the playing field, giving small teams the hiring power of a large corporation.
🚀 What it does
HireVision is an end-to-end, AI-powered hiring platform built using the Gemini 3 family.
- Multimodal AI Interviews: Conducts voice-based interviews where the AI "sees" the candidate's work in real-time. Using Gemini 3's vision reasoning, it can analyze live code, design choices, or presentations and ask direct, context-aware follow-up questions.
- Intelligent Resume Evaluation: Leverages Gemini 3's long-context window to screen resumes against job requirements with deep semantic understanding, going beyond simple keyword matching.
- Behavioral Intelligence: Uses Gemini 3's multimodal signals to analyze candidate confidence, engagement levels, and soft skills during the interview.
- Structured Candidate Ranking: Automatically ranks candidates based on performance, providing explainable AI insights for faster hiring decisions.
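To make the ranking step concrete, here is a minimal sketch of how per-dimension scores could be combined into an explainable ranking. The `Evaluation` shape, the three score dimensions, and the default weights are all assumptions for illustration, not HireVision's actual schema or scoring formula.

```typescript
// Hypothetical shape of one candidate's evaluation; field names are illustrative.
interface Evaluation {
  candidateId: string;
  scores: { technical: number; communication: number; resumeFit: number }; // each 0..100
}

type Weights = Evaluation["scores"];

interface RankedCandidate {
  candidateId: string;
  total: number;       // weighted average, rounded to one decimal
  explanation: string; // human-readable breakdown of how the total was reached
}

// Assumed weights; a real deployment would tune these per role.
const DEFAULT_WEIGHTS: Weights = { technical: 0.5, communication: 0.2, resumeFit: 0.3 };

function rankCandidates(
  evals: Evaluation[],
  weights: Weights = DEFAULT_WEIGHTS
): RankedCandidate[] {
  const keys = Object.keys(weights) as (keyof Weights)[];
  const weightSum = keys.reduce((sum, k) => sum + weights[k], 0);

  return evals
    .map((e) => {
      // Weighted average, normalized so the weights need not sum to exactly 1.
      const raw = keys.reduce((sum, k) => sum + e.scores[k] * weights[k], 0) / weightSum;
      const explanation = keys
        .map((k) => `${k}=${e.scores[k]} (weight ${weights[k]})`)
        .join(", ");
      return { candidateId: e.candidateId, total: Math.round(raw * 10) / 10, explanation };
    })
    .sort((a, b) => b.total - a.total); // highest total first
}
```

Keeping the per-dimension breakdown next to the total is one simple way to let a reviewer see why a candidate ranked where they did.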
🏗️ How we built it
HireVision is built entirely on the Google AI and Firebase stack for maximum performance and intelligence:
- Core Model: Google Gemini 3.0 Flash powers our multimodal "Nervous System," handling real-time visual analysis, reasoning, and conversational logic with ultra-low latency.
- AI Framework: Google Genkit was used to build and orchestrate the AI flows, ensuring robust prompt management and structured outputs.
- Authentication & Database: Firebase (Auth, Firestore, and Storage) provides the secure foundation for candidate data, resumes, and session persistence.
- Multimodal Pipeline: We implemented a real-time data flow that captures visual inputs (screen/camera) and feeds them directly into the Gemini 3 API for immediate feedback during live assessments.
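The real-time visual flow described above could be sketched roughly as follows. This is not the project's code: `createFrameSampler`, `buildInterviewTurn`, the `Frame` type, and the throttling policy are hypothetical, and the `Part` shape is only loosely modelled on an image-plus-text request, so check it against the current Gemini API reference before relying on it.

```typescript
// Hypothetical captured frame (screen or camera), already encoded as base64.
interface Frame {
  capturedAtMs: number;
  mimeType: string; // e.g. "image/jpeg"
  base64Data: string;
}

// Illustrative payload parts: one image part plus one text part.
type Part = { text: string } | { inlineData: { mimeType: string; data: string } };

// Returns a function that decides whether a frame is worth sending.
// A frame is skipped if it arrives too soon after the last sent frame
// or if its content is identical to the last sent frame.
function createFrameSampler(minIntervalMs: number): (frame: Frame) => boolean {
  let lastSentAtMs = -Infinity;
  let lastData: string | null = null;

  return (frame: Frame): boolean => {
    const tooSoon = frame.capturedAtMs - lastSentAtMs < minIntervalMs;
    const unchanged = frame.base64Data === lastData;
    if (tooSoon || unchanged) return false;
    lastSentAtMs = frame.capturedAtMs;
    lastData = frame.base64Data;
    return true;
  };
}

// Combines the latest frame with what the candidate just said into one model turn.
function buildInterviewTurn(frame: Frame, transcript: string): Part[] {
  return [
    { inlineData: { mimeType: frame.mimeType, data: frame.base64Data } },
    {
      text:
        `Candidate said: "${transcript}". ` +
        "Ask one follow-up question about what is visible on screen.",
    },
  ];
}
```

Throttling and de-duplicating frames before they reach the model is one plausible way to keep latency and request volume down during a live assessment.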
⚡ Challenges we ran into
- Multimodal Synchronization: Coordinating real-time visual data with the Gemini 3 logic to ensure the AI's follow-up questions felt natural and immediate.
- Prompt Engineering for Gemini 3: Refining prompts to take full advantage of the new reasoning capabilities while maintaining a professional and encouraging interviewer persona.
- Processing Long Contexts: Optimizing how we feed diverse candidate data (resumes, project links) into the Gemini 3 window to get the most accurate and explainable scores.
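As an illustration of the long-context challenge, the sketch below packs candidate documents into a fixed token budget in priority order. The `ContextDoc` type, the `packContext` helper, and the four-characters-per-token estimate are assumptions made for this example; a real implementation would use the model's own token counting.

```typescript
// Hypothetical unit of candidate data to place in the prompt.
interface ContextDoc {
  label: string;    // e.g. "resume", "portfolio"
  text: string;
  priority: number; // lower number = more important
}

interface PackedContext {
  prompt: string;
  included: string[];
  skipped: string[];
}

// Rough heuristic, not a real tokenizer: about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily adds documents in priority order, skipping any that would exceed the budget.
function packContext(docs: ContextDoc[], tokenBudget: number): PackedContext {
  const included: string[] = [];
  const skipped: string[] = [];
  const sections: string[] = [];
  let used = 0;

  const byPriority = [...docs].sort((a, b) => a.priority - b.priority);
  for (const doc of byPriority) {
    const section = `## ${doc.label}\n${doc.text}`;
    const cost = estimateTokens(section);
    if (used + cost > tokenBudget) {
      skipped.push(doc.label); // recorded so the score can note what was left out
      continue;
    }
    sections.push(section);
    included.push(doc.label);
    used += cost;
  }

  return { prompt: sections.join("\n\n"), included, skipped };
}
```

Returning the skipped labels alongside the prompt makes it possible to tell reviewers which inputs a score did not take into account.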
🏆 Accomplishments that we're proud of
- Gemini 3 Integration: Successfully building a system that leverages the full power of the Gemini 3 multimodal API for real-time visual interviewing.
- Seamless Pipeline: From resume upload to AI-led interview to final ranking, the entire process runs end to end without manual intervention, driven by Gemini 3.
- Technical Execution: Building a production-ready application that showcases how Gemini 3 can solve high-impact, real-world problems for founders.
🧠 What we learned
- How to build multimodal AI workflows using the Gemini 3 API.
- The speed and reasoning advantages of Gemini 3.0 Flash for low-latency conversational applications.
- Best practices for using Google Genkit to organize complex AI agents.
🔮 What's next for HireVision
- Expanded Multimodal Analysis: Using Gemini 3 to analyze video recordings for deeper behavioral and soft-skill insights.
- AI-Driven Sourcing: Leveraging Gemini's reasoning to discover and evaluate candidate profiles across the web automatically.
- Collaborative Review: Using Gemini 3 to summarize interview "Visual Highlights" for hiring teams to review quickly.
Built With
- cloud-firestore
- firebase-auth
- firebase-storage
- gemini-3.0-flash
- google-genkit
- next.js
- tailwind-css
- typescript