Inspiration

They finally found the perfect candidate. Flawless answers. Perfect structure. Every industry term used correctly. No hesitation. No filler words. No thinking pauses. Just smooth, polished, textbook responses.

It felt almost too good to be true.

And that was the problem.

In today’s remote interviews, candidates can quietly use ChatGPT or Gemini on a second screen, generating ideal answers in real time. The responses sound impressive, but they don’t sound human. Humans hesitate. Humans search for words. Humans restructure thoughts mid-sentence. Real expertise includes imperfection.

Industry reports from WeCP (October 2025) indicate that 45% of tech candidates have used external AI help, often citing the pressure to solve complex LeetCode-style problems instantly.

That moment inspired HireLens. If AI can quietly assist candidates during interviews, there needs to be a system that restores balance, not by accusing, but by verifying. HireLens was built to distinguish polished human expertise from AI-assisted perfection, bringing authenticity back into modern hiring.

What it does

HireLens detects AI-assisted responses during interviews using multimodal analysis, both in real time and through uploaded recordings.

In live mode, HireLens analyses the candidate’s behaviour while the interviewer conducts the session. It compares spoken answers against AI-generated reference responses to identify unusually high semantic similarity. At the same time, it tracks gaze patterns to detect off-screen reading behaviour, analyses speech cadence for unnatural fluency, and monitors behavioural signals such as device interactions or suspicious glances.

In upload mode, recruiters can submit recorded interviews for post-hoc forensic analysis. HireLens performs the same multimodal evaluation, automatically segments the interview into chapters, flags suspicious timestamps, and generates a detailed integrity report.

Comprehensive forensic summary: All behavioural and linguistic signals are fused into a single confidence score and verdict, giving recruiters explainable evidence so they do not have to rely on intuition alone.

Additional Feature: AI Interviewer

A fully autonomous agent powered by ElevenLabs and Gemini 2.5 Pro, designed for high-volume screening. It maintains a human-sounding dialogue flow, which makes pre-scripted responses harder to use, and it dynamically probes with context-aware follow-ups that force real-time thinking over LLM recitation.

How we built it

We engineered a low-latency multimodal pipeline using a best-in-class AI stack:

The Voice - ElevenLabs

Powers the autonomous interviewer with natural, human-like conversational audio to maintain unscripted flow and discourage pre-scripted responses.

ElevenLabs also transcribes the live interviews.

The Vision - MediaPipe

Performs real-time facial landmark detection. We compute:

  • Head pose estimation (yaw)
  • Horizontal saccade frequency
  • Pupil displacement vectors

These signals help detect off-screen reading behaviour.
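As a rough illustration of how two of these signals could be turned into numbers, the sketch below counts horizontal saccades from a per-frame pupil offset and measures how often the head is turned away. It is not the HireLens implementation: the function names, the normalised offset input, and both thresholds are assumptions made for the example, and the landmark extraction itself is left out.

```python
from typing import Sequence


def saccade_frequency(pupil_x: Sequence[float], fps: float,
                      threshold: float = 0.05) -> float:
    """Return horizontal saccades per second.

    pupil_x is a hypothetical per-frame horizontal pupil offset, normalised
    to roughly -1..1 (for example, iris centre relative to the eye corners).
    A saccade is counted whenever the frame-to-frame jump exceeds `threshold`
    (an assumed value, not a calibrated one).
    """
    if len(pupil_x) < 2 or fps <= 0:
        return 0.0
    jumps = sum(1 for a, b in zip(pupil_x, pupil_x[1:]) if abs(b - a) > threshold)
    duration_s = (len(pupil_x) - 1) / fps
    return jumps / duration_s


def off_screen_ratio(yaw_deg: Sequence[float], max_yaw_deg: float = 20.0) -> float:
    """Return the fraction of frames where the head yaw exceeds max_yaw_deg.

    yaw_deg is a per-frame head yaw estimate in degrees; the 20 degree
    cut-off is an assumption for illustration.
    """
    if not yaw_deg:
        return 0.0
    return sum(1 for y in yaw_deg if abs(y) > max_yaw_deg) / len(yaw_deg)
```

A steady left-to-right reading pattern would tend to show up as a regular, elevated saccade frequency, which is why a rate per second is more useful here than a raw count.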

The Brain - Gemini 2.5 Pro

Conducts Shadow Analysis by transcribing responses and computing embedding similarity between the candidate's answer and an optimised AI-generated response. High semantic alignment signals potential AI assistance.
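Embedding similarity of this kind is commonly measured with cosine similarity. The sketch below assumes the embeddings have already been obtained as plain lists of floats; the embedding call itself is omitted, and the helper names are hypothetical rather than taken from the project.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1..1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as "no similarity"
    return dot / (norm_a * norm_b)


def max_reference_similarity(candidate: Sequence[float],
                             references: Iterable[Sequence[float]]) -> float:
    """Highest similarity between the candidate answer and any reference.

    `references` stands for embeddings of one or more AI-generated answers
    to the same question (an assumption about how references are stored).
    """
    return max((cosine_similarity(candidate, r) for r in references), default=0.0)
```

Because the comparison happens in embedding space, a paraphrased answer can still score close to its reference even when few exact words match.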

The Forensics - Twelve Labs Pegasus 1.2

Performs post-interview video intelligence to detect behavioural anomalies such as:

  • Device interactions (phone reach, keyboard use)
  • Suspicious gaze shifts
  • Reflected light from a second monitor

It also identifies exact timestamps of suspicious candidate behaviour, with reasoning and a video preview, and segments the interview into timestamped sections for each interaction.

Multimodal Scoring Engine

All signals are normalised and fused into a weighted scoring model that combines semantic similarity, gaze patterns, acoustic features, and behavioural indicators. Cross-modal agreement is required before issuing a flagged verdict, which significantly reduces false positives.
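One way to picture weighted fusion with a cross-modal agreement gate is the minimal sketch below. The weights, thresholds, and names are all illustrative assumptions, not the values HireLens uses; each input signal is assumed to be already normalised to the range 0..1.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical weights and thresholds, chosen only for illustration.
WEIGHTS: Dict[str, float] = {
    "semantic": 0.40,
    "gaze": 0.25,
    "acoustic": 0.15,
    "behavioural": 0.20,
}
SIGNAL_THRESHOLD = 0.7  # a single modality counts as "suspicious" above this
MIN_AGREEING = 2        # modalities that must agree before a flag is raised
FLAG_THRESHOLD = 0.6    # minimum fused score for a flag


@dataclass
class Verdict:
    score: float
    agreeing: List[str]
    flagged: bool


def fuse(signals: Dict[str, float]) -> Verdict:
    """Fuse normalised per-modality scores into one verdict."""
    # Missing signals default to 0; values are clipped to 0..1.
    clipped = {name: min(1.0, max(0.0, signals.get(name, 0.0))) for name in WEIGHTS}
    score = sum(WEIGHTS[name] * clipped[name] for name in WEIGHTS)
    agreeing = [name for name in WEIGHTS if clipped[name] >= SIGNAL_THRESHOLD]
    # The agreement gate: a high fused score alone is not enough to flag.
    flagged = score >= FLAG_THRESHOLD and len(agreeing) >= MIN_AGREEING
    return Verdict(score=score, agreeing=agreeing, flagged=flagged)
```

With these example values, a candidate whose answers closely match a reference but whose gaze, speech, and behaviour look ordinary is not flagged, because only one modality crosses its threshold.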

Challenges we ran into

The biggest challenge was avoiding false positives. Strong candidates naturally speak clearly and confidently, and we didn't want to penalise intelligence. We addressed this by requiring agreement across multiple signals before flagging a session.

Latency was another hurdle. Running gaze tracking, transcription, and similarity analysis in real time is computationally demanding. We optimised the system through parallel pipelines and pre-processing strategies.
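The idea of parallel pipelines can be sketched with asyncio: gaze analysis runs concurrently with the transcription and similarity chain, since only the similarity step depends on the transcript. The three analysis functions below are placeholders that return fixed values, and the chunk structure is an assumption; none of this reflects the real service calls.

```python
import asyncio
from typing import Any, Dict


async def gaze_score(chunk: Dict[str, Any]) -> float:
    """Placeholder for gaze analysis on the chunk's video frames."""
    await asyncio.sleep(0.01)  # stands in for real processing time
    return 0.0


async def transcribe(chunk: Dict[str, Any]) -> str:
    """Placeholder for speech-to-text on the chunk's audio."""
    await asyncio.sleep(0.01)
    return "placeholder transcript"


async def semantic_similarity(text: str) -> float:
    """Placeholder for comparing the transcript with a reference answer."""
    await asyncio.sleep(0.01)
    return 0.0


async def transcribe_and_compare(chunk: Dict[str, Any]) -> float:
    # Similarity needs the transcript, so these two steps stay sequential.
    text = await transcribe(chunk)
    return await semantic_similarity(text)


async def analyse_chunk(chunk: Dict[str, Any]) -> Dict[str, float]:
    # The vision branch and the audio/language branch run concurrently.
    gaze, semantic = await asyncio.gather(
        gaze_score(chunk),
        transcribe_and_compare(chunk),
    )
    return {"gaze": gaze, "semantic": semantic}
```

Calling `asyncio.run(analyse_chunk({"frames": [], "audio": b""}))` returns one score per branch; with real workloads the total latency is bounded by the slower branch instead of the sum of both.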

We also had to carefully consider ethics. Monitoring interviews requires transparency and fairness, so we designed HireLens to assist recruiters, not replace human judgement.

Accomplishments that we're proud of

We successfully built a fully functioning multimodal integrity system that operates both in real time and post-interview. HireLens conducts live AI-led interviews while simultaneously analysing gaze, speech, and semantic overlap. It also supports uploaded interview recordings, performing forensic video analysis with timestamped evidence and chapter segmentation.

We reduced reliance on single-signal detection by fusing linguistic, biometric, acoustic, and behavioural signals into a unified scoring model. By using embedding-based similarity, the system remains robust even when candidates paraphrase AI-generated content rather than copying it verbatim.

Additionally, HireLens includes an adaptive interview agent. If potential AI assistance is detected, the agent dynamically alters questioning, asking follow-up or concept-deepening questions to probe genuine understanding and disrupt scripted flow.

Most importantly, we built a system that verifies integrity without making automated accusations. It provides explainable evidence and confidence scoring, keeping humans in the decision loop.

What we learned

Building HireLens taught us how to integrate complex AI services into a cohesive multimodal system. We gained hands-on experience with ElevenLabs for natural voice synthesis, learning how to create conversational AI interviewers that maintain authentic dialogue flow. We also mastered Twelve Labs Pegasus 1.2 for advanced video intelligence, discovering how to extract behavioural signals and anomalies from video data with timestamp precision.

We learned the intricacies of synchronising real-time pipelines across vision, audio, and language models while maintaining low latency. Balancing detection sensitivity with false-positive rates required careful tuning and cross-modal signal validation.

Beyond the technical challenges, we gained insights into the ethical responsibilities of building integrity monitoring systems. Designing technology that enhances fairness without being invasive required constant consideration of transparency, explainability, and keeping humans in the decision loop.

Most importantly, we learned that effective AI integrity systems aren't about accusation; they're about creating environments where authenticity can be verified and genuine talent can shine.

What's next for HireLens

We're expanding HireLens from an interview detector into a full integrity infrastructure layer for remote hiring.

1. Video Platform AI Agent (Platform Agnostic)

An embedded AI agent that integrates with platforms like Zoom, Google Meet, and Microsoft Teams. It monitors multimodal signals live and discreetly alerts interviewers if a candidate appears to be reading AI-generated responses, enabling immediate follow-up probing.

2. Browser Extension

A lightweight cross-browser extension that integrates directly into video interview platforms. It detects real-time AI tool usage patterns, tab-switching behaviour, and suspicious prompts.

3. Cheating Pattern Intelligence Engine

A continuously updated similarity search system that identifies recurring AI-generated answer archetypes and prompt signatures across interviews.

4. Cross Candidate Benchmarking

Cohort level analytics that compare authenticity metrics, spontaneity scores, and behavioural consistency across candidates, helping recruiters identify both top performers and suspicious patterns.

5. Resume Context Aware AI Agent

An enhanced AI interviewer that parses and understands candidate resumes to ask specific, personalized questions based on their background, experience, and claimed skills.
