Inspiration

India loses billions every year to digital scams: UPI fraud, digital arrest threats, KYC phishing, fake job offers. Most detection tools are simple keyword filters that scammers bypass with a single word change. We wanted to build something that thinks like a forensic investigator, not a spam filter.

What it does

Spectus is a real-time AI cyber-forensics platform that analyzes SMS messages, emails, URLs, UPI handles, and call transcripts for scam signals. Every input runs through a 4-engine ensemble that functions like a digital jury:

  • ML Classifier: TF-IDF + Logistic Regression for fast, deterministic pattern detection
  • Semantic Vector Search: ChromaDB + Sentence Transformers matched against MHA, RBI & SEBI advisories
  • LLM Reasoning: Llama 3.1 via Groq for deep psychological and contextual analysis
  • Behavioral Fingerprinting: Detects brand impersonation, leet-speak obfuscation, and credential harvesting
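
As an illustration of the leet-speak side of behavioral fingerprinting, here is a minimal sketch. The substitution table, the brand watchlist, and the function names are assumptions made up for this example, not the actual Spectus code. The idea is to flag a brand name that only becomes visible after the text is de-obfuscated.

```python
import re

# Hypothetical substitution table; a production table would be larger
# and would handle ambiguous characters (for example "1" as "i" or "l").
LEET_MAP = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Hypothetical brand watchlist.
BRANDS = {"paytm", "sbi", "hdfc"}


def normalize(text: str) -> str:
    """Lowercase the text and undo common character substitutions."""
    return text.lower().translate(LEET_MAP)


def impersonated_brands(text: str) -> set[str]:
    """Return brands that appear only after de-obfuscation."""
    raw_tokens = set(re.findall(r"[a-z0-9@$]+", text.lower()))
    clean_tokens = set(re.findall(r"[a-z]+", normalize(text)))
    return {b for b in BRANDS if b in clean_tokens and b not in raw_tokens}
```

A message that spells the brand normally returns an empty set, so this signal fires only on obfuscated spellings such as "P4ytm".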

Beyond detection, Spectus also includes a Cross-Channel Nexus Correlator (links SMS → URL → UPI into one threat graph), a Mutation Diff Engine (tracks how scams evolve over time), a Psychological Profiler (identifies which cognitive biases a scam exploits), and a Golden Hour Emergency Toolkit for immediate incident response.
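
To make the cross-channel correlation idea concrete, here is a rough sketch that groups indicators into clusters whenever a chain of observed links connects them. It uses only the standard library as a stand-in for a graph library; the function name, the "channel:value" node labels, and the choice of connected components as the grouping rule are all assumptions for illustration.

```python
from collections import defaultdict


def threat_clusters(edges):
    """Group indicators connected by any chain of observed links.

    `edges` is an iterable of (indicator_a, indicator_b) pairs, for
    example an SMS that contained a URL, or a URL that asked for
    payment to a UPI handle.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical observations: one SMS leads to a URL that leads to a UPI handle.
links = [("sms:1", "url:a"), ("url:a", "upi:x"), ("sms:2", "url:b")]
clusters = threat_clusters(links)
```

Here the first cluster ties the SMS, the URL, and the UPI handle together as one campaign, while the unrelated SMS and URL form a second cluster.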

How I built it

  • Backend: FastAPI on Render, with lazy-loaded SentenceTransformer to stay within free-tier RAM limits
  • Vector DB: ChromaDB seeded with scam patterns from MHA, RBI, and SEBI advisories
  • LLM: Llama 3.1 via Groq API for structured JSON reasoning
  • Frontend: Single-file vanilla JS + custom CSS dashboard deployed on Vercel
  • Graph analysis: NetworkX for cross-channel threat correlation

Challenges faced

Getting all four engines to initialize without crashing Render's free-tier container was the hardest part. SentenceTransformer was loading at startup and eating RAM before the server could pass its health check. We solved this with lazy initialization: the embedding model loads on the first request that needs it, not at boot.

Balancing the ensemble weights was also non-trivial, since the verdict has to stay sensible when an engine drops out. When Groq is unavailable, the system degrades gracefully to the ML and semantic signals without breaking the verdict pipeline.
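
One way to get this kind of graceful degradation is to renormalize the weights over whichever engines actually returned a score. The sketch below assumes that approach; the weight values, engine keys, and function name are invented for illustration and are not the tuned Spectus values.

```python
from typing import Dict, Optional

# Hypothetical weights; the real values are not given in this writeup.
WEIGHTS = {"ml": 0.25, "semantic": 0.25, "llm": 0.35, "behavioral": 0.15}


def ensemble_score(scores: Dict[str, Optional[float]]) -> float:
    """Weighted average over the engines that produced a score.

    A score of None means the engine was unavailable (for example the
    LLM call failed). The remaining weights are rescaled to sum to 1,
    so the result stays in the same 0..1 range.
    """
    available = {
        name: score
        for name, score in scores.items()
        if score is not None and name in WEIGHTS
    }
    if not available:
        raise ValueError("no engine produced a score")
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted_sum = sum(WEIGHTS[name] * score for name, score in available.items())
    return weighted_sum / total_weight
```

With this rule, losing the LLM engine changes how much each remaining signal counts, but a message that every available engine rates at 0.8 still scores 0.8.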

Accomplishments

  • A fully working 4-signal ensemble that degrades gracefully when any engine is unavailable
  • Real scam pattern corpus sourced from actual Indian government advisories
  • The Mutation Diff Engine: most scam detectors don't track how scams evolve; ours does
  • Shipped a complete forensics platform as a single index.html with zero frontend dependencies

What I learned

Lazy initialization matters enormously on constrained infrastructure. Ensemble design is as much about failure modes as it is about accuracy. And India-specific scam patterns (digital arrest, Aadhaar fraud, UPI manipulation) need a dedicated corpus, because generic English scam datasets miss them entirely.

What's next

  • WhatsApp and Telegram scam monitoring
  • OCR-based screenshot analysis
  • Browser extension for real-time phishing detection
  • Multi-language support (Hindi, Tamil, Telugu)
  • SIEM integration for enterprise SOC dashboards
