VoxGuard

Real-time deepfake and scam detection for phone calls. VoxGuard routes calls through a shield number and isolates a suspected attacker as soon as a threat is detected.

How It Works

Caller dials your shield number
        │
        ▼
┌───────────────┐     ┌──────────────────┐     ┌───────────────┐
│    Telnyx     │────▶│  VoxGuard Server  │────▶│   Your Phone  │
│  (Inbound)    │     │                   │     │  (Forwarded)  │
└───────────────┘     └──────────────────┘     └───────────────┘
                             │    │
              ┌──────────────┘    └──────────────┐
              ▼                                  ▼
     ┌─────────────────┐              ┌──────────────────┐
     │ Deepfake Detect  │              │   Scam Detect    │
     │ (DF Arena + LoRA)│              │ (Vertex AI       │
     │ 4s sliding window│              │  Gemini 2.5 Pro) │
     │ every 0.5s       │              │  15s audio chunks│
     └─────────────────┘              └──────────────────┘
              │                                  │
              └──────────┐    ┌──────────────────┘
                         ▼    ▼
                   Threat detected?
                    │           │
                   No          Yes
                    │           │
                    ▼           ▼
              Call continues  ┌─────────────────────┐
              normally        │     ISOLATION        │
                              │ ─ Unbridge caller    │
                              │ ─ Decoy AI engages   │
                              │   attacker           │
                              │ ─ User notified via  │
                              │   AI agent           │
                              └─────────────────────┘

Every call is routed through a shield number assigned to the user, and the caller and user are bridged with live audio streaming. The call is isolated if either detector triggers: the deepfake detector after 3 consecutive chunks score above the threshold (FAKE_THRESHOLD, default 0.8), or the scam detector on a confidence >= 0.7. On isolation, the attacker is unbridged and handed to an ElevenLabs decoy agent, while a separate agent notifies the user.
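
The isolation rule can be sketched in a few lines of Python. This is an illustration only, not the code in app.py: the class and method names are hypothetical, and the constants simply restate the values given in this README.

```python
from dataclasses import dataclass

FAKE_THRESHOLD = 0.8        # README default for FAKE_THRESHOLD
CONSECUTIVE_REQUIRED = 3    # consecutive deepfake chunks above threshold
SCAM_CONFIDENCE_MIN = 0.7   # scam confidence that triggers isolation


@dataclass
class IsolationTracker:
    """Per-call state deciding when to isolate (hypothetical helper)."""
    consecutive_fake: int = 0
    isolated: bool = False

    def on_deepfake_score(self, score: float) -> bool:
        # A chunk at or below the threshold resets the streak.
        if score > FAKE_THRESHOLD:
            self.consecutive_fake += 1
        else:
            self.consecutive_fake = 0
        if self.consecutive_fake >= CONSECUTIVE_REQUIRED:
            self.isolated = True
        return self.isolated

    def on_scam_confidence(self, confidence: float) -> bool:
        # A single scam result at or above the minimum is enough.
        if confidence >= SCAM_CONFIDENCE_MIN:
            self.isolated = True
        return self.isolated
```

Requiring a streak for the deepfake detector means one noisy 4-second window cannot isolate a call on its own, while the scam detector acts on a single result.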

Architecture

Layer               Technology
Backend             FastAPI, Uvicorn, async Python 3.11
Frontend            Next.js 16 (static export), React 19, TypeScript, Tailwind CSS 4
Database            PostgreSQL 16, SQLAlchemy 2.0 (async), Alembic migrations
Deepfake Detection  DF Arena 1B (LoRA fine-tuned), custom inference API
Scam Detection      Vertex AI fine-tuned Gemini 2.5 Pro with function calling
Telephony           Telnyx Voice API (webhooks + WebSocket audio streaming)
AI Agents           ElevenLabs Conversational AI (decoy + user notification)
Billing             Stripe (subscriptions, checkout, customer portal)
Email / SMS         Brevo (transactional email, OTP verification)
Geolocation         Numvalidate (phone lookup) + Google Maps Geocoding
Deployment          Docker (multi-stage), Docker Compose, Railway

Features

  • Real-time deepfake voice detection -- 4-second sliding window analysis every 0.5 seconds, isolation on 3 consecutive high-confidence detections
  • Real-time scam detection -- 15-second audio chunks analyzed by fine-tuned Gemini model across 10 scam categories (IRS, tech support, romance, bank fraud, etc.)
  • Automatic call isolation -- attacker unbridged and redirected to a decoy AI agent that keeps them engaged
  • Per-user shield numbers -- each user gets a dedicated Telnyx number; callers dial the shield number and are transparently forwarded
  • Live dashboard -- SSE-powered real-time call monitoring with status badges, confidence meters, and call timelines
  • Threat map -- Google Maps heatmap of caller origins weighted by threat type
  • Call history -- paginated records with full audio playback (attacker and user tracks), geolocation, and detection details
  • Per-user detection toggles -- independently enable/disable deepfake and scam detection
  • Subscription billing -- free tier (10 calls/month) and Pro plan via Stripe with 30-day trial
  • Phone verification -- SMS OTP via Brevo with rate limiting
  • Caller geolocation -- phone number lookup via Numvalidate + Google Maps geocoding

Project Structure

voxguard/
├── backend/
│   ├── app.py                 # Main FastAPI app (webhooks, WebSocket, API routes)
│   ├── auth.py                # JWT auth, registration, login
│   ├── models.py              # SQLAlchemy models (User, CallRecord, NumberPool)
│   ├── database.py            # Async PostgreSQL connection
│   ├── detector.py            # Deepfake detection API client
│   ├── scam_detector.py       # Vertex AI scam classification
│   ├── agents.py              # ElevenLabs agent bridge (WebSocket ↔ audio)
│   ├── audio_utils.py         # μ-law ↔ PCM16 ↔ WAV conversion
│   ├── phone_lookup.py        # Numvalidate + Google Maps geolocation
│   ├── telnyx_numbers.py      # Shield number provisioning & pool management
│   ├── email_service.py       # Brevo transactional email
│   ├── verification.py        # SMS OTP verification
│   ├── tts.py                 # ElevenLabs TTS utility
│   ├── requirements.txt
│   └── migrations/            # Alembic schema migrations (7 versions)
├── frontend/
│   ├── src/app/
│   │   ├── page.tsx           # Landing page (hero, pricing, how it works)
│   │   ├── dashboard/         # Real-time call monitoring dashboard
│   │   ├── account/           # User settings & subscription management
│   │   ├── login/             # Authentication
│   │   ├── register/          # Registration + phone OTP flow
│   │   └── components/        # 19 React components
│   ├── package.json
│   └── next.config.ts         # Static export configuration
├── deepfake-detection/
│   ├── prepare_data_1k.py     # Data pipeline (LibriSpeech + Whisper + Replicate TTS)
│   ├── train_lora_1k_aug.py   # LoRA fine-tuning with phone-call augmentation
│   ├── augment_phone.py       # Audio degradation (G.711, GSM, noise, reverb, packet loss)
│   ├── eval_held_out.py       # Held-out evaluation
│   └── README.md              # ML pipeline documentation
├── scam-detection/
│   ├── generate_scam_calls.py # Generate 100 scam conversations via ElevenLabs
│   └── prepare_vertex_finetune.py  # Prepare JSONL + launch Vertex AI fine-tuning
├── Dockerfile                 # Multi-stage build (Node.js frontend + Python backend)
├── docker-compose.yml         # App + PostgreSQL services
└── .env.example               # Environment variable template

Setup

Prerequisites

  • Python 3.11+
  • Node.js 20+
  • PostgreSQL 16+
  • Docker & Docker Compose (for containerized setup)

Environment Variables

Copy the example and fill in your keys:

cp .env.example .env

Key variables:

Variable                     Description
TELNYX_API_KEY               Telnyx Voice API key
TELNYX_CONNECTION_ID         Telnyx SIP connection ID
PUBLIC_WSS_URL               Public WebSocket URL for audio streaming (e.g., wss://your-domain/telnyx/ws)
ELEVEN_API_KEY               ElevenLabs API key
ELEVEN_SCAMMER_AGENT_ID      ElevenLabs decoy agent ID
ELEVEN_USER_AGENT_ID         ElevenLabs user notification agent ID (deepfake)
ELEVEN_SCAM_USER_AGENT_ID    ElevenLabs user notification agent ID (scam)
DETECTOR_API_URL             Deepfake model inference endpoint
FAKE_THRESHOLD               Spoof score threshold (default: 0.8)
DATABASE_URL                 PostgreSQL connection string
JWT_SECRET                   Secret for JWT signing
GOOGLE_SERVICE_ACCOUNT_JSON  GCP service account JSON (for Vertex AI scam detection)
BREVO_API_KEY                Brevo API key (email + SMS)
STRIPE_SECRET_KEY            Stripe secret key
STRIPE_PRICE_ID              Stripe subscription price ID
WEBHOOK_SECRET               Stripe webhook signing secret
NUMVALIDATE_API_KEY          Phone number lookup API key
GOOGLE_MAPS_API_KEY          Google Maps Geocoding API key
SITE_URL                     Frontend URL (default: https://voxguard.org)
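
As a rough sketch of how these variables might be read at startup, the snippet below loads a handful of them and applies the two defaults listed above. The Settings class, the load_settings function, and the choice of which variables count as required are assumptions made for illustration, not the backend's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    telnyx_api_key: str
    database_url: str
    jwt_secret: str
    fake_threshold: float
    site_url: str


# Assumption: a minimal set treated as mandatory for this sketch.
REQUIRED = ("TELNYX_API_KEY", "DATABASE_URL", "JWT_SECRET")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, failing fast on missing required keys."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        telnyx_api_key=env["TELNYX_API_KEY"],
        database_url=env["DATABASE_URL"],
        jwt_secret=env["JWT_SECRET"],
        # Defaults below are the ones documented in the table above.
        fake_threshold=float(env.get("FAKE_THRESHOLD", "0.8")),
        site_url=env.get("SITE_URL", "https://voxguard.org"),
    )
```

Passing the mapping as a parameter keeps the loader easy to exercise in tests without touching the real process environment.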

Docker (recommended)

docker compose up --build

This starts the FastAPI backend (with the frontend static export bundled in) on port 8000 and PostgreSQL on port 5432.

Local Development

Backend:

cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
alembic upgrade head
uvicorn app:app --host 0.0.0.0 --port 8000 --reload

Frontend:

cd frontend
npm install
npm run dev

The frontend dev server runs on http://localhost:3000 with Turbopack. For production, the frontend is statically exported (next build) and served by the backend.

API Overview

Authentication

Method  Endpoint                   Description
POST    /api/auth/register         Register (email, password, phone)
POST    /api/auth/login            Login
POST    /api/auth/logout           Logout
GET     /api/auth/me               Current user profile
POST    /api/auth/verify-phone     Verify phone OTP
POST    /api/auth/resend-code      Resend OTP (60s cooldown)
POST    /api/auth/retry-provision  Retry shield number provisioning

Calls & Dashboard

Method  Endpoint                       Description
GET     /events                        SSE stream (live call updates)
GET     /api/calls                     Paginated call history
GET     /api/calls/{id}                Call detail with timeline
GET     /api/calls/{id}/audio/{track}  Stream call audio (attacker or user)
GET     /api/stats                     Dashboard statistics
GET     /api/map-points                Threat map geolocation data
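
A client consuming the /events stream needs to split the Server-Sent Events text format into individual events. The sketch below does that for lines already read from the response. It assumes each event's data field carries a JSON object; the field names in the usage example (call_id, status) are hypothetical and not taken from the backend.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event, assuming JSON object payloads."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment or keep-alive line
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield json.loads("\n".join(data_lines))
                data_lines = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format strips one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


# Usage with hypothetical payload fields:
sample = ['data: {"call_id": "abc", "status": "isolated"}\n', "\n"]
for event in parse_sse(sample):
    print(event["status"])
```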

User Management

Method  Endpoint                      Description
PATCH   /api/user/phone               Update phone number
PATCH   /api/user/detection-settings  Toggle deepfake/scam detection
DELETE  /api/user/account             Delete account

Billing

Method  Endpoint                   Description
POST    /api/auth/create-checkout  Create Stripe Checkout session
GET     /api/auth/usage            Monthly usage + plan limits
POST    /api/stripe/webhook        Stripe event handler
POST    /api/stripe/portal         Stripe Customer Portal URL

Webhooks

Method  Endpoint         Description
POST    /telnyx/webhook  Telnyx call lifecycle events
WS      /telnyx/ws       Bidirectional audio streaming
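
On the WebSocket side, each inbound frame has to be decoded into raw audio before it reaches the detectors. The sketch below assumes the frames are JSON text messages with an "event" field and, for media events, a base64 payload under "media" -> "payload". That shape is an assumption here and should be checked against the current Telnyx media streaming documentation. The function name is hypothetical.

```python
import base64
import json
from typing import Optional


def extract_media_payload(message: str) -> Optional[bytes]:
    """Return decoded audio bytes for media frames, None for other events.

    Assumed frame shape: {"event": "media", "media": {"payload": "<base64>"}}.
    """
    frame = json.loads(message)
    if frame.get("event") != "media":
        return None  # e.g. connection or stream lifecycle frames
    return base64.b64decode(frame["media"]["payload"])
```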

ML Models

Deepfake Detection

  • Base model: Speech-Arena-2025/DF_Arena_1B_V_1 (1.15B parameters)
  • Fine-tuning: LoRA (r=8, alpha=16, dropout=0.1, all-linear layers, ~10M trainable params)
  • Training data: 1,000 real samples (LibriSpeech, 123 speakers) + 1,000 synthetic samples (Qwen3-TTS voice clones via Replicate)
  • Augmentation: Phone-call degradation (G.711 μ-law, GSM codec, white/pink noise, band-pass filter, room reverb, packet loss)
  • Results: 100% accuracy and F1 on val/test sets; EER 0.246 with augmentation
  • Inference: 4-second sliding window, 0.5s stride, isolation after 3 consecutive detections above threshold
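
The windowing schedule described above (4-second window, 0.5-second stride) can be illustrated as follows. This is a sketch under one stated assumption: the 8 kHz sample rate is typical for telephony audio but is not specified in this README. The function name is hypothetical.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 8000      # assumption: 8 kHz telephony audio
WINDOW_SECONDS = 4.0    # window length from this README
STRIDE_SECONDS = 0.5    # stride from this README


def sliding_windows(samples: Sequence[int],
                    sample_rate: int = SAMPLE_RATE) -> Iterator[Sequence[int]]:
    """Yield every full 4-second window, advancing 0.5 seconds each time."""
    window = int(WINDOW_SECONDS * sample_rate)
    stride = int(STRIDE_SECONDS * sample_rate)
    # No window is produced until a full 4 seconds of audio is available.
    for start in range(0, len(samples) - window + 1, stride):
        yield samples[start:start + window]
```

With these numbers, consecutive windows overlap by 3.5 seconds, so three consecutive detections span 5 seconds of audio in total.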

Scam Detection

  • Model: Vertex AI fine-tuned Gemini 2.5 Pro with function calling
  • Training data: 100 synthetic scam conversations generated via ElevenLabs Text-to-Dialogue (21 voices, 10 scam categories)
  • Categories: IRS/tax, tech support, prize/lottery, bank fraud, investment/crypto, romance, charity, insurance/medicare, job offer, utility service
  • Inference: 15-second MP3 audio chunks sent to Vertex AI; isolation on confidence >= 0.7
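
Cutting a live stream into 15-second chunks amounts to a small byte buffer. The sketch below assumes 8 kHz, 8-bit μ-law input, which works out to 8000 bytes per second; that rate is an assumption, not something this README states. The ChunkBuffer class is hypothetical, and the MP3 encoding and Vertex AI request are left out.

```python
from typing import List


class ChunkBuffer:
    """Accumulates raw audio bytes and emits fixed-duration chunks."""

    def __init__(self, seconds: float = 15.0, bytes_per_second: int = 8000):
        # Assumption: 8 kHz, 8-bit mu-law audio, i.e. 8000 bytes per second.
        self.chunk_size = int(seconds * bytes_per_second)
        self._buf = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add incoming bytes and return any complete chunks."""
        self._buf.extend(data)
        chunks = []
        while len(self._buf) >= self.chunk_size:
            chunks.append(bytes(self._buf[:self.chunk_size]))
            del self._buf[:self.chunk_size]
        return chunks
```

Each returned chunk would then be converted (for example to MP3, as described above) before being sent for classification. Leftover bytes stay in the buffer for the next chunk.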

Open Source

We publish two artifacts from this project on Hugging Face:

  • gereon/voxguard-synthetic-speech -- Synthetic dataset containing deepfake audio samples (Qwen3-TTS voice clones of LibriSpeech speakers) and scam conversation recordings (ElevenLabs Text-to-Dialogue across 10 categories)
  • gereon/voxguard-lora -- LoRA fine-tune of DF Arena 1B for voice deepfake detection, trained with phone-call audio augmentation (100% accuracy, EER 0.246)

Integrations

Service           Purpose
Telnyx            Inbound call handling, audio streaming, number provisioning, call bridging
ElevenLabs        Conversational AI agents (decoy for attackers, notification for users)
Stripe            Subscription billing, checkout, customer portal
Brevo             Transactional email (welcome), SMS (OTP verification, coupons)
Google Vertex AI  Scam detection model hosting and inference
Google Maps       Caller geolocation geocoding + dashboard threat heatmap
Numvalidate       Phone number validation, carrier lookup, location
