Inspiration

With the rapid growth of AI-generated voices and voice cloning tools, it is becoming harder to tell whether an audio recording was spoken by a real human or generated by a machine.
This can lead to problems such as voice impersonation, fake audio messages, and misinformation.

We were inspired to build AI Voice Detector to provide a simple and accessible way to verify the authenticity of voice samples using modern AI technology.


What it does

AI Voice Detector is a web application that:

  • Lets users select a language
  • Accepts an uploaded audio file (MP3)
  • Sends the audio to the Gemini API for analysis
  • Detects whether the voice is:
    • AI Generated
    • Human
  • Returns:
    • Classification
    • Confidence score
    • Reasoning
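
To make the returned fields concrete, here is a minimal sketch of checking a result before it is shown to the user. The field names (`classification`, `confidence`, `reasoning`), the 0 to 1 confidence range, and the `normalizeResult` helper are assumptions for illustration; the project may use different names or a percentage scale.

```javascript
// Hypothetical result check: field names and the 0-1 confidence range are assumptions.
const ALLOWED_CLASSIFICATIONS = ["AI Generated", "Human"];

function normalizeResult(raw) {
  if (raw === null || typeof raw !== "object") {
    throw new TypeError("result must be an object");
  }
  // Only the two labels listed above are accepted.
  if (!ALLOWED_CLASSIFICATIONS.includes(raw.classification)) {
    throw new RangeError(`unexpected classification: ${raw.classification}`);
  }
  // Reject missing, non-numeric, or out-of-range confidence values.
  const confidence = raw.confidence;
  if (typeof confidence !== "number" || !Number.isFinite(confidence) ||
      confidence < 0 || confidence > 1) {
    throw new RangeError("confidence must be a number between 0 and 1");
  }
  return {
    classification: raw.classification,
    confidence,
    reasoning: String(raw.reasoning ?? "").trim(),
  };
}
```

Rejecting anything outside the two known labels keeps the UI from displaying a label the model invented.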

How we built it

The project was built as a full-stack web application.

Frontend

  • React (Vite)
  • Tailwind CSS
  • Axios

The frontend provides:

  • A clean user interface
  • Language selection
  • Drag-and-drop audio upload
  • Result display for classification and confidence

Backend

  • Node.js
  • Express.js
  • Multer (for file uploads)
  • Gemini API

The backend:

  1. Receives the uploaded audio file
  2. Converts it to Base64
  3. Sends it to the Gemini API with a structured prompt
  4. Parses the response
  5. Returns a JSON result to the frontend
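
Steps 2 and 3 could look roughly like the sketch below. It only builds the request body and makes no network call. The `buildGeminiRequest` name and the prompt wording are hypothetical, and the `audio/mp3` MIME type is an assumption. The `contents` / `parts` / `inlineData` layout is the one Gemini's `generateContent` request uses for inline media.

```javascript
// Sketch of steps 2-3: Base64-encode the uploaded bytes and assemble a request body.
// Function name, prompt text, and MIME type are illustrative assumptions.
function buildGeminiRequest(audioBuffer, language) {
  // Step 2: convert the raw audio bytes to a Base64 string.
  const base64Audio = audioBuffer.toString("base64");

  // Step 3: a structured prompt that asks for JSON only.
  const prompt = [
    `The speaker's language is ${language}.`,
    "Decide whether the voice in the audio is AI generated or human.",
    "Respond with JSON only, using the keys:",
    '"classification" ("AI Generated" or "Human"), "confidence" (0 to 1), "reasoning".',
  ].join("\n");

  return {
    contents: [
      {
        role: "user",
        parts: [
          { text: prompt },
          { inlineData: { mimeType: "audio/mp3", data: base64Audio } },
        ],
      },
    ],
  };
}
```

Base64 output is about one third larger than the input bytes, which matters when uploads are large.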

Challenges we ran into

  • Handling large audio files while keeping performance stable
  • Ensuring correct file upload and format validation
  • Designing a prompt that makes Gemini return structured JSON output
  • Avoiding hardcoded results and ensuring dynamic confidence values
  • Making the UI responsive and easy to use
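
On the structured-output challenge: even with a careful prompt, a model may wrap its JSON in a Markdown code fence or add a sentence around it. One way to tolerate that, shown here as a sketch and not necessarily how the project handles it, is to parse only the outermost `{ ... }` span of the reply. `parseModelJson` is a hypothetical helper name.

```javascript
// Hypothetical helper: extract and parse the JSON object embedded in a model reply.
function parseModelJson(text) {
  // Take the span from the first "{" to the last "}", which skips any
  // code fence markers or prose the model added around the object.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new SyntaxError("no JSON object found in model response");
  }
  // JSON.parse still throws SyntaxError if the extracted span is malformed.
  return JSON.parse(text.slice(start, end + 1));
}
```

Because the function throws on a reply it cannot parse, the backend can return an error in that case.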

Accomplishments that we're proud of

  • Successfully integrated the Gemini API for real-time audio analysis
  • Built a complete end-to-end system from audio upload to result display
  • Implemented language selection for better contextual analysis
  • Created a clean and modern user interface
  • Designed a modular backend architecture with controllers and services

What we learned

  • How to process and transmit audio data using Base64 encoding
  • How to work with the Gemini API for non-text inputs
  • Prompt engineering for structured JSON responses
  • Full-stack integration between React and Node.js
  • Error handling and validation for file uploads
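
As an illustration of upload validation, the sketch below checks the size and the first bytes of a buffer before it is sent for analysis. The 10 MB limit and the `validateMp3Upload` name are assumptions. The byte check is a heuristic: it accepts an ID3v2 tag or an MPEG frame sync pattern, which does not prove the file will decode.

```javascript
// Hypothetical upload check; the size limit is an illustrative assumption.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;

function validateMp3Upload(buffer) {
  if (!Buffer.isBuffer(buffer) || buffer.length < 3) {
    return { ok: false, error: "File is empty or too short" };
  }
  if (buffer.length > MAX_UPLOAD_BYTES) {
    return { ok: false, error: "File exceeds the size limit" };
  }
  // MP3 files usually start with an ID3v2 tag ("ID3") ...
  const hasId3Tag = buffer.subarray(0, 3).toString("latin1") === "ID3";
  // ... or directly with an MPEG audio frame, whose first 11 bits are all set.
  const hasFrameSync = buffer[0] === 0xff && (buffer[1] & 0xe0) === 0xe0;
  if (!hasId3Tag && !hasFrameSync) {
    return { ok: false, error: "File does not look like an MP3" };
  }
  return { ok: true };
}
```

Checking the bytes as well as the file extension catches files that were simply renamed to `.mp3`.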

What's next for AI Voice Detector

Future improvements include:

  • Supporting more languages
  • Improving detection accuracy
  • Adding user authentication
  • Storing past analysis results
  • Deploying the system to a cloud platform
  • Making the UI mobile-friendly
