Inspiration
With the rapid growth of AI-generated voices and voice cloning tools, it is becoming harder to tell whether the voice in an audio recording belongs to a real human or was created by a machine.
This can lead to problems such as voice impersonation, fake audio messages, and misinformation.
We were inspired to build AI Voice Detector to provide a simple and accessible way to verify the authenticity of voice samples using modern AI technology.
What it does
AI Voice Detector is a web application that:
- Allows users to select a language
- Allows users to upload an audio file (MP3)
- Sends the audio to the Gemini API for analysis
- Detects whether the voice is:
- AI Generated
- Human
- Returns:
- Classification
- Confidence score
- Reasoning
How we built it
The project was built as a full-stack web application.
Frontend
- React (Vite)
- Tailwind CSS
- Axios
The frontend provides:
- A clean user interface
- Language selection
- Drag-and-drop audio upload
- Result display for classification and confidence
Backend
- Node.js
- Express.js
- Multer (for file uploads)
- Gemini API
The backend:
- Receives the uploaded audio file
- Converts it to Base64
- Sends it to the Gemini API with a structured prompt
- Parses the response
- Returns a JSON result to the frontend
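The Base64 and prompt steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `buildGeminiRequest`, the prompt wording, and the expected JSON fields are all assumptions. The payload layout follows the inline-data format of Gemini's REST `generateContent` endpoint as we understand it, so check it against the current API reference before relying on it.

```javascript
// Hypothetical helper: builds a request body for Gemini's generateContent endpoint.
// The prompt wording and the JSON fields it asks for are assumptions for illustration.
function buildGeminiRequest(audioBuffer, language) {
  const prompt =
    `You are analyzing a voice recording. The spoken language is ${language}. ` +
    "Decide whether the voice is AI-generated or human. " +
    "Respond with JSON only, in the form " +
    '{"classification": "AI Generated" | "Human", ' +
    '"confidence": <number between 0 and 1>, "reasoning": "<short text>"}.';

  return {
    contents: [
      {
        parts: [
          { text: prompt },
          {
            // The audio travels inline as Base64 text next to the prompt.
            inline_data: {
              mime_type: "audio/mp3",
              data: audioBuffer.toString("base64"),
            },
          },
        ],
      },
    ],
    // Asks the model for JSON output rather than free-form prose.
    generationConfig: { responseMimeType: "application/json" },
  };
}

// Example with three placeholder bytes ("ID3") standing in for a real MP3 buffer.
const body = buildGeminiRequest(Buffer.from([0x49, 0x44, 0x33]), "English");
console.log(body.contents[0].parts[1].inline_data.data); // "SUQz"
```

Keeping the request builder separate from the HTTP call makes it easy to unit-test the prompt and encoding without an API key.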
Challenges we ran into
- Handling large audio files while keeping performance stable
- Ensuring correct file upload and format validation
- Designing a prompt that makes Gemini return structured JSON output
- Avoiding hardcoded results so that each confidence value comes from the model's analysis of the uploaded file
- Making the UI responsive and easy to use
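One way to handle the structured-output and confidence challenges is to parse the model's reply defensively and reject anything outside the expected shape, instead of falling back to a fixed answer. The sketch below assumes the reply contains a single JSON object with `classification`, `confidence` (0 to 1), and `reasoning` fields; the function name and those field names are hypothetical.

```javascript
// Hypothetical parser for the model's text reply; field names are assumptions.
const ALLOWED_LABELS = ["AI Generated", "Human"];

function parseDetectionResult(rawText) {
  // The reply may contain extra text around the JSON, so extract the
  // outermost {...} span before parsing.
  const start = rawText.indexOf("{");
  const end = rawText.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model reply");
  }
  const data = JSON.parse(rawText.slice(start, end + 1));

  if (!ALLOWED_LABELS.includes(data.classification)) {
    throw new Error("Unexpected classification value");
  }
  const confidence = data.confidence;
  if (typeof confidence !== "number" || !Number.isFinite(confidence) ||
      confidence < 0 || confidence > 1) {
    throw new Error("Confidence must be a number between 0 and 1");
  }

  // Return only the fields the frontend needs.
  return {
    classification: data.classification,
    confidence,
    reasoning: String(data.reasoning ?? ""),
  };
}

// Example reply with some surrounding text, as a model might produce.
const reply =
  'Result: {"classification":"Human","confidence":0.82,"reasoning":"Natural breathing pauses."}';
console.log(parseDetectionResult(reply));
```

Throwing on malformed output lets the backend return an explicit error to the frontend rather than a made-up result.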
Accomplishments that we're proud of
- Successfully integrated the Gemini API for real-time audio analysis
- Built a complete end-to-end system from audio upload to result display
- Implemented language selection for better contextual analysis
- Created a clean and modern user interface
- Designed a modular backend architecture with controllers and services
What we learned
- How to process and transmit audio data using Base64 encoding
- How to work with the Gemini API for non-text inputs
- Prompt engineering for structured JSON responses
- Full-stack integration between React and Node.js
- Error handling and validation for file uploads
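As an illustration of upload validation, the sketch below checks an uploaded file before it is encoded and sent for analysis. It assumes a file object shaped like the one Multer's memory storage attaches to `req.file` (`mimetype`, `size`, `buffer`). The function name, the 10 MB cap, and the error messages are assumptions, not values from the project.

```javascript
// Hypothetical limits; the project's real values are not stated.
const MAX_BYTES = 10 * 1024 * 1024; // assumed 10 MB cap
const MP3_MIME_TYPES = ["audio/mpeg", "audio/mp3"];

// `file` mirrors the object Multer's memory storage attaches to req.file.
function validateMp3Upload(file) {
  if (!file || !file.buffer) {
    return { ok: false, error: "No audio file uploaded" };
  }
  if (!MP3_MIME_TYPES.includes(file.mimetype)) {
    return { ok: false, error: "Only MP3 files are supported" };
  }
  if (file.size > MAX_BYTES) {
    return { ok: false, error: "File is too large" };
  }

  // Heuristic content check: MP3 data usually begins with an ID3 tag ("ID3")
  // or an MPEG frame sync (0xFF followed by a byte whose top three bits are set).
  const b = file.buffer;
  const hasId3 = b.length >= 3 && b.toString("latin1", 0, 3) === "ID3";
  const hasFrameSync = b.length >= 2 && b[0] === 0xff && (b[1] & 0xe0) === 0xe0;
  if (!hasId3 && !hasFrameSync) {
    return { ok: false, error: "File content does not look like MP3" };
  }
  return { ok: true };
}

// Example: a buffer that starts with an ID3 tag passes the checks.
console.log(validateMp3Upload({
  mimetype: "audio/mpeg",
  size: 3,
  buffer: Buffer.from("ID3"),
}));
```

The size check here runs after the file is already in memory; Multer's own `limits: { fileSize }` option can reject oversized uploads earlier, which helps with large files.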
What's next for AI Voice Detector
Future improvements include:
- Supporting more languages
- Improving detection accuracy
- Adding user authentication
- Storing past analysis results
- Deploying the system to a cloud platform
- Making the UI mobile-friendly
Built With
- axios
- express.js
- gemini
- node.js
- react
- tailwindcss
- vite