Inspiration

AI voice scams have become emotionally manipulative and dangerously believable. We were inspired by real “family emergency” fraud patterns where scammers clone a loved one’s voice, create urgency, demand money, and pressure secrecy. The biggest gap we saw was not post-fraud detection, but real-time decision support during the call itself. Verity was built to give users a calm, evidence-backed signal in those high-pressure moments.

What it does

Verity processes live or uploaded call audio and outputs:

  • A live Trust Score from 0 to 100
  • A verdict: trusted, suspicious, or scam
  • Detector-level readings for synthetic voice probability, scam-language cues, and voice similarity to a claimed identity
  • Highlighted trigger phrases from transcript analysis
  • A challenge suggestion when impersonation risk is detected

It also includes a Family Voice Vault where users can enroll trusted contacts using short voice samples for future speaker verification.
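The mapping from Trust Score to verdict can be sketched in a few lines; the cutoff values below are illustrative assumptions, not the thresholds Verity actually uses:

```python
def verdict_for(trust_score: float) -> str:
    """Map a 0-100 Trust Score to one of the three verdicts.

    The cutoffs here are illustrative guesses, not Verity's real thresholds.
    """
    if trust_score >= 70:
        return "trusted"
    if trust_score >= 40:
        return "suspicious"
    return "scam"
```

In practice the thresholds would be tuned against labeled call data, and could be shifted per user (see the personalized trust thresholds mentioned under "What's next").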

How we built it

We built Verity as a full-stack system with a Python FastAPI backend and a Next.js frontend dashboard. Audio is streamed over a WebSocket connection for real-time analysis, and the system can also be tested with WAV file uploads. The backend pipeline combines transcription, LLM-based social-engineering classification, anti-spoof inference, and speaker verification. A trust engine fuses these outputs into one explainable score and verdict. The frontend visualizes all of this through live meters, verdict banners, transcript highlights, and vault enrollment/management flows.
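The fusion step of the trust engine can be sketched as a weighted combination of per-detector risk signals. The detector names, weights, and helper function here are assumptions for illustration only, not Verity's actual implementation:

```python
# Hypothetical sketch of the trust-engine fusion step. The weights and
# detector keys are illustrative assumptions, not Verity's real values.
DETECTOR_WEIGHTS = {
    "synthetic_voice": 0.40,    # anti-spoof model: P(voice is synthetic)
    "scam_language": 0.35,      # LLM classifier: P(social-engineering cues)
    "identity_mismatch": 0.25,  # speaker verification: 1 - similarity score
}

def fuse_trust_score(risks: dict) -> float:
    """Fuse per-detector risk probabilities (each 0-1) into a 0-100 Trust Score.

    Higher detector risk lowers the score; a call with zero risk on every
    detector scores 100. Missing detectors contribute no risk.
    """
    weighted_risk = sum(
        DETECTOR_WEIGHTS[name] * risks.get(name, 0.0)
        for name in DETECTOR_WEIGHTS
    )
    return round(100 * (1 - weighted_risk), 1)
```

A linear weighted average like this keeps the score explainable (each detector's contribution can be surfaced in a reason trace), which matches the explainability goal described below.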

Challenges we ran into

The hardest challenge was balancing latency and reliability. Real-time protection is only useful if the system responds quickly, but each model adds computational cost. We also had to design a fusion strategy for conflicting signals, like when language seems normal but acoustic spoof confidence is high. Noisy audio conditions introduced instability across detectors, and making outputs understandable for non-technical users required careful UX decisions. We spent significant time refining how to surface risk clearly without overwhelming users.
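One way to handle the conflict described above, where language seems normal but acoustic spoof confidence is high, is a conservative override applied after fusion. This rule and its parameter values are hypothetical, included only to illustrate the design problem:

```python
def apply_spoof_override(fused_score: float, spoof_confidence: float,
                         cap: float = 35.0, threshold: float = 0.8) -> float:
    """Cap the Trust Score when the anti-spoof detector is highly confident.

    Even if the transcript looks benign, a near-certain synthetic voice
    should keep the call out of "trusted" territory. The threshold and cap
    values here are illustrative assumptions, not Verity's tuned parameters.
    """
    if spoof_confidence >= threshold:
        return min(fused_score, cap)
    return fused_score
```

The trade-off is that a hard override sacrifices some of the smoothness of a weighted average in exchange for a guaranteed floor on caution when one detector is near certain.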

Accomplishments that we're proud of

We delivered an end-to-end prototype that works as a product experience, not just disconnected model demos. The multi-signal trust architecture reduces dependence on any single detector and gives more robust outcomes. We’re proud of the Voice Vault concept because it turns generic scam detection into personal identity verification. We also built explainability into the interface with reason traces and trigger highlights so users understand why a call is flagged. Most importantly, we built around a real, urgent safety problem that affects families directly.

What we learned

We learned that scam defense is as much a human factors challenge as a machine learning one. Fast, actionable confidence is more valuable than perfect offline accuracy. Multi-model systems need thoughtful uncertainty handling and graceful degradation in imperfect audio conditions. We also learned that privacy expectations are central in trust products, so local-first architecture and transparent data boundaries matter. Finally, we learned that strong safety tools must be simple enough to use under stress.

What's next for Verity

Next, we want to improve robustness across languages, accents, and low-quality phone channels, while reducing latency further for smoother live response. We plan to add user-personalized trust thresholds, stronger adaptive challenge prompts, and improved anti-impersonation flows. We also aim to optimize more processing on-device for privacy and offline resilience. On the product side, we want telephony and call-app integrations, plus caregiver support features that help protect vulnerable users without reducing their independence.
