🛡️ The Problem That Hits Close to Home

My family received a scam call last year. It was a convincing IRS impersonation — professional, urgent, scary. My parents almost complied. Not because they're not smart, but because they had zero help in that moment.

That's the real problem. 1.8 billion scam calls are made every year. Existing tools block spam numbers — but they don't tell you why a call is dangerous, how the scammer is psychologically manipulating you, or what to say back. The victim is always alone in that moment.

CallShield changes that.


💡 What It Does

Upload any call recording — or record a live call directly in the browser — and CallShield gives you a complete threat intelligence report in seconds:

  • Scam probability score (0–100)
  • Exact scam type identified (e.g. "Amazon Impersonation Robocall + Credential Harvesting")
  • Psychological manipulation tactics used — with evidence from the call itself
  • Caller's real goal — what they were actually trying to extract
  • Safe retorts — the exact sentences to say to safely end or expose the scam
  • What a legitimate caller would do differently
  • Full call transcript generated from the audio

🔧 How We Built It

Frontend: vanilla HTML, CSS, and JavaScript in a clean two-panel interface. Users can upload an audio file (MP3, WAV, M4A) or record a live call directly in the browser using the MediaRecorder API.

Backend: Node.js + Express server receives the audio via Multer, converts it to base64, and sends it directly to the Gemini API. The response is parsed and returned as structured JSON to the frontend.
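For readers curious about the wire format, here is one way the Multer upload could become a Gemini request. This sketch makes three assumptions:

- Multer's memory storage is used, so the bytes arrive as `file.buffer`.
- The request follows the REST `generateContent` body shape.
- `buildGeminiRequest` and the prompt text are hypothetical names for illustration, not CallShield's actual code.

```javascript
// Hypothetical helper (not the project's actual code): builds a Gemini
// generateContent request body from a Multer-style file object.
// Assumes multer.memoryStorage(), so the upload arrives as file.buffer.
function buildGeminiRequest(file, prompt) {
  if (!file || !Buffer.isBuffer(file.buffer) || file.buffer.length === 0) {
    throw new Error("No audio data received");
  }
  return {
    contents: [
      {
        role: "user",
        parts: [
          // The raw audio travels inline as base64; there is no separate
          // speech-to-text request.
          {
            inlineData: {
              mimeType: file.mimetype,
              data: file.buffer.toString("base64"),
            },
          },
          // Instructions describing the report we want back.
          { text: prompt },
        ],
      },
    ],
    // Ask for a JSON reply instead of free-form prose.
    generationConfig: { responseMimeType: "application/json" },
  };
}

// Example with a fake three-byte "recording":
const body = buildGeminiRequest(
  { buffer: Buffer.from("abc"), mimetype: "audio/webm" },
  "Analyze this call for scam indicators and reply in JSON."
);
console.log(body.contents[0].parts[0].inlineData.data); // prints YWJj
```

In the Express route, the returned object would be sent as JSON to the `gemini-2.0-flash` `generateContent` endpoint, or handed to an official SDK. Inline audio counts toward the request size limit, so very long recordings may need the Gemini Files API instead.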

Stack at a glance:

Frontend  → HTML / CSS / JS (MediaRecorder API for live recording)
Backend   → Node.js + Express + Multer
AI        → Gemini 2.0 Flash (native multimodal audio)

✨ How We Used the Gemini API

This is where CallShield does something most AI projects don't.

We use Gemini 2.0 Flash's native audio understanding — the model receives the raw audio file directly as inline base64 data and listens to the call the same way a human would. No intermediate transcription step. No third-party speech-to-text service. Gemini hears the tone, the urgency, the scripted phrasing, the manipulation patterns — all in a single API call.

One Gemini call returns the complete threat report as structured JSON: scam score, verdict, tactics, caller intent, safe retorts, red flags, and transcript.
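A single structured reply is only useful if the frontend can trust its shape. Below is a minimal sketch of a normalization step run on the parsed JSON before rendering. The key names (`scamScore`, `verdict`, `tactics`, `callerIntent`, `safeRetorts`, `redFlags`, `transcript`) are our assumption, based on the report contents listed above. The writeup does not give the exact keys.

```javascript
// Hypothetical sketch: the key names below are assumptions, since the
// writeup lists what the report contains but not its exact JSON keys.
// Validates parsed model output so the UI never renders a half-formed report.
function normalizeReport(raw) {
  const r = raw && typeof raw === "object" ? raw : {};
  const str = (v, fallback = "") => (typeof v === "string" ? v : fallback);
  const strList = (v) =>
    Array.isArray(v) ? v.filter((x) => typeof x === "string") : [];

  // Clamp the score into 0-100; null means "no usable score from the model".
  const n = Number(r.scamScore);
  const scamScore =
    r.scamScore == null || !Number.isFinite(n)
      ? null
      : Math.min(100, Math.max(0, Math.round(n)));

  return {
    scamScore,
    verdict: str(r.verdict, "Unknown"),
    // Each tactic should name the technique and quote evidence from the call.
    tactics: Array.isArray(r.tactics)
      ? r.tactics
          .filter((t) => t && typeof t.name === "string")
          .map((t) => ({ name: t.name, evidence: str(t.evidence) }))
      : [],
    callerIntent: str(r.callerIntent),
    safeRetorts: strList(r.safeRetorts),
    redFlags: strList(r.redFlags),
    transcript: str(r.transcript),
  };
}
```

Missing or malformed fields degrade to empty values instead of crashing the report panel. A missing score stays distinct from a score of zero.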

"We don't transcribe then analyze. Gemini hears the call the same way you do."


🧱 Challenges We Faced

1. Audio format inconsistency. Browser-recorded audio comes in as audio/webm, while uploaded files can be MP3, WAV, M4A, or OGG. Normalizing MIME types across Multer and the Gemini inlineData format required careful handling to avoid silent failures.

2. Gemini response parsing. Gemini occasionally wraps JSON in markdown code blocks or adds preamble text. We built a robust regex extraction layer that pulls clean JSON regardless of how Gemini formats its response.

3. Token and quota limits during testing. Gemini's free tier has audio token limits that we hit during heavy testing. We built a realistic fallback layer into the server so the demo never breaks on stage, a critical decision for a live hackathon presentation.

4. Making it feel real-time. The actual Gemini call takes 3–6 seconds. We added a streaming log UI that shows processing steps line by line during analysis, turning a loading spinner into a moment of tension that makes the reveal hit harder.
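For challenge 1 (audio format inconsistency), a minimal sketch of MIME normalization might look like this. The alias table, the extension fallback, and the function name `normalizeAudioMime` are hypothetical. Check the canonical values against the audio formats Gemini documents as supported.

```javascript
// Hypothetical alias table: maps MIME types that browsers and operating
// systems commonly report onto one canonical value per format. The canonical
// values are assumptions; check them against Gemini's supported audio formats.
const MIME_ALIASES = new Map([
  ["audio/mpeg", "audio/mp3"],
  ["audio/mp3", "audio/mp3"],
  ["audio/wav", "audio/wav"],
  ["audio/x-wav", "audio/wav"],
  ["audio/wave", "audio/wav"],
  ["audio/mp4", "audio/mp4"],
  ["audio/x-m4a", "audio/mp4"],
  ["audio/m4a", "audio/mp4"],
  ["audio/ogg", "audio/ogg"],
  ["audio/webm", "audio/webm"],
]);

// Used when the client sends a generic type such as application/octet-stream.
const EXTENSION_FALLBACK = new Map([
  ["mp3", "audio/mp3"],
  ["wav", "audio/wav"],
  ["m4a", "audio/mp4"],
  ["ogg", "audio/ogg"],
  ["webm", "audio/webm"],
]);

function normalizeAudioMime(mimetype, filename = "") {
  // Strip parameters such as ";codecs=opus", which MediaRecorder often appends.
  const base = String(mimetype || "").split(";")[0].trim().toLowerCase();
  if (MIME_ALIASES.has(base)) return MIME_ALIASES.get(base);

  // Fall back to the file extension when the reported type is unhelpful.
  const name = String(filename || "");
  const dot = name.lastIndexOf(".");
  const ext = dot === -1 ? "" : name.slice(dot + 1).toLowerCase();
  if (EXTENSION_FALLBACK.has(ext)) return EXTENSION_FALLBACK.get(ext);

  // Reject loudly rather than sending an unknown type and failing silently later.
  throw new Error(`Unsupported audio type "${mimetype}" for file "${name}"`);
}
```

In the upload handler, the result would replace the Multer-reported `mimetype` before the audio is attached as inline data.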
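For challenge 2 (response parsing), here is a sketch of a tolerant extractor. It illustrates the approach rather than reproducing the exact regex CallShield uses. It prefers the body of a markdown code fence if one is present, then trims to the outermost braces and parses.

```javascript
// Sketch of a tolerant JSON extractor (our illustration, not the project's
// exact regex). Handles replies wrapped in a markdown code fence, with or
// without a "json" language tag, and replies with prose before or after.
function extractJson(text) {
  const s = String(text);
  // Prefer the body of a fenced block if there is one. `{3} matches three
  // backticks without writing them literally.
  const fenced = s.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  const candidate = fenced ? fenced[1] : s;
  // Trim to the outermost braces to drop any preamble or trailing remarks.
  const start = candidate.indexOf("{");
  const end = candidate.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in model response");
  }
  // JSON.parse still throws on malformed JSON, so callers can fall back.
  return JSON.parse(candidate.slice(start, end + 1));
}
```

One known limitation is that a stray closing brace in trailing prose would defeat the outer-brace trim. Setting `responseMimeType: "application/json"` in the request's `generationConfig` makes such replies less likely in the first place.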
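For challenge 3 (quota limits), one way to build a fallback is to wrap the live call and swap in a sample report only on a quota error. `callGemini`, `demoReport`, and the assumed error shape are placeholders, not CallShield's implementation.

```javascript
// Hypothetical wrapper: callGemini stands in for whatever function makes the
// real API request, and demoReport for a pre-built sample report. The error
// shape (429 or "RESOURCE_EXHAUSTED" on err.status or err.code) is an
// assumption about how callGemini reports quota errors.
function isQuotaError(err) {
  const signals = [err?.status, err?.code];
  return signals.includes(429) || signals.includes("RESOURCE_EXHAUSTED");
}

async function analyzeWithFallback(callGemini, audio, demoReport) {
  try {
    const report = await callGemini(audio);
    return { ...report, source: "live" };
  } catch (err) {
    if (isQuotaError(err)) {
      // Tag the result so the UI can show that this is sample data.
      return { ...demoReport, source: "fallback" };
    }
    // Bad audio, auth failures, etc. should still surface as errors.
    throw err;
  }
}
```

Tagging the result with `source` keeps the fallback honest. The demo keeps running, but nobody mistakes a canned report for a live analysis.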
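For challenge 4 (perceived latency), the streaming log can be sketched as a loop that narrates steps while the real request runs. The loop stops as soon as the answer arrives. `runWithStreamingLog`, the step text, and the timing are hypothetical.

```javascript
// Hypothetical sketch: print processing steps one at a time while the real
// request is still in flight. In the browser, onLine would append a line to
// the log panel; step text and timing here are illustrative.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runWithStreamingLog(task, steps, onLine, stepMs = 600) {
  let done = false;
  const result = task().finally(() => {
    done = true;
  });
  // Attach a no-op handler so an early rejection is not reported as
  // unhandled while steps are still printing; `await result` below
  // still rethrows it.
  result.catch(() => {});

  for (const step of steps) {
    if (done) break; // Stop narrating as soon as the real answer is in.
    onLine(step);
    await sleep(stepMs);
  }
  const value = await result;
  onLine("Analysis complete.");
  return value;
}
```

In the page, `task` would wrap the upload request and `onLine` would append to the log element. If the steps run out first, the log simply waits for the real result.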


📚 What We Learned

  • Gemini 2.0 Flash's native audio capability is genuinely underused — most teams treat it as a text model. Sending raw audio directly unlocks a completely different class of applications.
  • The most impactful AI products don't just detect problems — they tell you what to do. The "Safe Retorts" feature came from asking: what does the user actually need in this moment?
  • Hackathon demos live or die on the 30-second reveal. Engineering the feeling of the demo matters as much as the code behind it.

🚀 What's Next for CallShield

  • Live call interception via a browser extension or Android service — analyzing calls as they happen, not after
  • Multilingual support for Hindi, Spanish, and Mandarin, the languages the majority of victims speak
  • Community scam database — flagged call patterns shared across users to improve detection over time
  • One-tap reporting — automatically report confirmed scam numbers to FTC, Ofcom, or regional authorities

Built with Node.js, Express, and Gemini 2.0 Flash at [Hackathon Name] 2026.
