Inspiration

Three real incidents in my own circle: My father received a fake e-challan message identical to the official PSCA format, paid through the link, and lost PKR 30,000. My relative received a fake courier call asking him to "confirm delivery" with a code. His WhatsApp was hacked, and a relative sent PKR 50,000 to the hacker before anyone realized. My friend's father got a call claiming his son was kidnapped and transferred PKR 200,000 in panic.

None of them was careless. The scams were engineered to look official and trigger panic. Pakistan's cybercrime authorities (FIA and NCCIA) received 100,000+ complaints in a single year, and the majority of them were related to scams.

These scams follow patterns. If you can recognize the pattern in time, you can stop the transfer before it happens. This is why we built Sach Batao | سچ بتاؤ.

What it does

Forward any suspicious message or send a voice note describing a spam message/call to Sach Batao on WhatsApp. In seconds, it replies in simple Urdu:

It detects Pakistan's most damaging scam categories, each testified in documented sources (Dawn, NCCIA, Digital Rights Foundation, ARY):

  1. Fake E-Challan — fraudulent PSCA fine links (real challans come only from 9915/8070, never with links)
  2. Courier OTP scams — fake TCS/Leopards/HEC calls harvesting WhatsApp codes (233 hijack cases documented since Jan 2025)
  3. Fake kidnapping calls — panic-transfer demands impersonating kidnappers or police
  4. Fake BISP/Ehsaas — "verification fees" for a completely free programme (real messages only from 8171)
  5. Jeeto Pakistan lottery scams — fake prize wins demanding tax, a scam so widespread it fooled a former MPA (Nighat Aurakzai)

If a message fits no known category but shows scam structure (urgency + payment/OTP demand from an unknown number), Sach Batao flags it as a suspicious emerging pattern and records it, hence the system catches new scams as well.

How we built it

Traditional WhatsApp bots require the official Meta WhatsApp Business API or third-party gateways like Twilio: expensive per-conversation pricing, weeks of business verification, and strict template restrictions that would make free-form scam message scanning impossible. We bypassed this entirely. By running whatsapp-web.js, a headless Chromium browser via Puppeteer that authenticates exactly like a human logging into WhatsApp Web, we get full access to incoming messages at *zero per-message cost and zero approval friction. * This is the architectural decision that made Sach Batao viable as a permanent free public-safety service.

On top of that foundation: a frozen LLM guided by an engineered system prompt, grounded in a curated knowledge base of documented Pakistani scam patterns, validated against a 24-case evaluation set. Every incoming message hits a fast regex pre-check first: compiled local patterns handle obvious cases in under 50ms with zero LLM call. Only messages that pass this gate go to Gemini 2.0 Flash Lite via OpenRouter, with scam_patterns.json injected directly into the prompt as context (RAG without a vector database - the knowledge base is compact enough to fit in one prompt). The model returns structured JSON: is_fraud, fraud_type, confidence, warning_level, and response_text in Urdu.

Voice notes are transcribed by Whisper-1, fed through the same pipeline, and the verdict is synthesized back as an Urdu audio message via Google TTS (ur-PK-Standard-A, free) — voice in, voice out, no literacy required. All of this runs in Docker on Railway with a restart. The core architectural insight that ties it all together: all Pakistan-specific intelligence lives in scam_patterns.json, not in model weights, which means that new scam categories deploy the same day they appear, with zero retraining cost.

Why WhatsApp + Urdu only

  • No app to download, nothing to learn, as everyone in Pakistan already uses WhatsApp.
  • Can't type or read. Send a voice note and the chatbot transcribes Urdu speech directly into the pipeline as an Urdu voice message.
  • Responses are always in simple Urdu script to make them usable across all ages and education levels, not just English readers.

Challenges we ran into

The hardest engineering problem was the audio format chain. WhatsApp sends voice notes as Opus .ogg. Whisper needs MP3. Google TTS returns MP3. WhatsApp requires Opus OGG for voice replies. That's four format conversions per voice interaction; ffmpeg handles them, but getting the codec pipeline stable inside Docker (where Chromium, ffmpeg-static, and all their shared libraries had to coexist) required careful containerization. Pakistani scam messages also don't speak one language; real messages mix English, Roman Urdu, and Urdu script in a single sentence, requiring trigger phrases across all three scripts in the knowledge base.

The hardest prompt engineering problem was false positives. "Ami ne kaha 5000 Easypaisa kar dena" mentions money, mentions a transfer, but is a completely normal family message. Getting the model to distinguish scam structure (unknown party + impersonation + urgency + action demand) from everyday conversation took multiple prompt iterations and a dedicated 24-case evaluation set with trap cases built specifically to catch over-sensitive classification.

Accomplishments that we're proud of

A fully production-deployed WhatsApp bot handling text, voice input, and voice output in Urdu, built in under 72 hours, making it viable as a permanent free public-safety service. Every scam pattern is sourced from Dawn News, NCCIA advisories, and Digital Rights Foundation, not invented. Through Sach Batao, Pakistanis can save millions of rupees daily.

What we learned

Scams exploit panic and trust, not ignorance. The product isn't really detection; it's the 30-second pause between receiving a message and acting on it. That one forwarded message is enough to break the panic loop before the transfer happens.

Separating knowledge from code was the right architectural call. Because all Pakistan-specific scam intelligence lives in scam_patterns.json rather than the model's weights, we can add a new scam category and deploy it the same day it appears in the wild, no retraining, no downtime, no cost.

What's next for Sach Batao | سچ بتاؤ

  • Live call analysis — the most devastating scams (kidnapping, bank impersonation) happen on calls; hence, phase 2 analyzes call recordings
  • Expansion to Regional languages — Punjabi, Sindhi, Pashto, Balochi
  • Broadcast alerts — warn users before a scam reaches them
  • Integration with mobile operators for number flagging
  • NCCIA partnership — formalizing the weekly scam-number data handover

Built With

Share this project:

Updates