Inspiration

India loses billions every year to digital scams: UPI fraud, digital arrest threats, KYC phishing, and fake job offers. Most detection tools are simple keyword filters that scammers bypass with a single word change. We wanted to build something that thinks like a forensic investigator, not a spam filter.

What it does

Spectus is a real-time AI cyber-forensics platform that analyzes SMS messages, emails, URLs, UPI handles, and call transcripts for scam signals. Every input runs through a 4-engine ensemble that functions like a digital jury:

  • ML Classifier: TF-IDF + Logistic Regression for fast, deterministic pattern detection
  • Semantic Vector Search: ChromaDB + Sentence Transformers matched against MHA, RBI & SEBI advisories
  • LLM Reasoning: Llama 3.1 via Groq for deep psychological and contextual analysis
  • Behavioral Fingerprinting: Detects brand impersonation, leet-speak obfuscation, and credential harvesting
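
As a rough illustration of the leet-speak side of behavioral fingerprinting, the sketch below normalizes common character substitutions and flags brand names that only become visible after normalization. The substitution table, the brand watchlist, and both function names are hypothetical and are not taken from the Spectus codebase.

```python
import re

# Hypothetical substitution table; a production table would be broader and
# would need to handle ambiguous cases (for example "1" as either "l" or "i").
LEET_MAP = str.maketrans(
    {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Hypothetical watchlist of brands that scammers commonly impersonate.
BRANDS = {"paytm", "sbi", "hdfc", "amazon"}


def normalize(token: str) -> str:
    """Lowercase a token and undo common leet-speak substitutions."""
    return token.lower().translate(LEET_MAP)


def impersonated_brands(text: str) -> set[str]:
    """Return brands that appear only after normalization.

    A brand spelled plainly is not flagged here; only obfuscated spellings
    such as "P4ytm" count as an impersonation signal in this sketch.
    """
    raw_tokens = set(re.findall(r"[a-z0-9@$]+", text.lower()))
    normalized_tokens = {normalize(token) for token in raw_tokens}
    return {b for b in BRANDS if b in normalized_tokens and b not in raw_tokens}


if __name__ == "__main__":
    print(impersonated_brands("Your P4ytm KYC expires today, verify at amaz0n-help"))
```

Flagging only the obfuscated spellings keeps legitimate mentions of a brand from raising this particular signal; whether Spectus draws the line the same way is an assumption.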

Beyond detection, Spectus also includes a Cross-Channel Nexus Correlator (links SMS → URL → UPI into one threat graph), a Mutation Diff Engine (tracks how scams evolve over time), a Psychological Profiler (identifies which cognitive biases a scam exploits), and a Golden Hour Emergency Toolkit for immediate incident response.
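
One way to picture what a mutation diff could look like is a word-level comparison between two variants of the same scam template. The sketch below uses Python's standard `difflib`; the function name `mutation_diff` and the sample messages are invented for illustration, and the actual Mutation Diff Engine may work quite differently.

```python
import difflib


def mutation_diff(old: str, new: str) -> list[tuple[str, str, str]]:
    """List word-level changes between two variants of a scam message.

    Each entry is (operation, old_words, new_words), where operation is one
    of "replace", "delete", or "insert" as reported by difflib.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


if __name__ == "__main__":
    earlier = "Your SBI account is blocked update KYC now"
    later = "Your HDFC account is suspended update KYC now"
    for change in mutation_diff(earlier, later):
        print(change)
```

Here the unchanged words act as the stable skeleton of the template, while the reported replacements show which parts were swapped between variants.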

How I built it

  • Backend: FastAPI on Render, with lazy-loaded SentenceTransformer to stay within free-tier RAM limits
  • Vector DB: ChromaDB seeded with scam patterns from MHA, RBI, and SEBI advisories
  • LLM: Llama 3.1 via Groq API for structured JSON reasoning
  • Frontend: Single-file vanilla JS + custom CSS dashboard deployed on Vercel
  • Graph analysis: NetworkX for cross-channel threat correlation
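
To make the cross-channel correlation idea concrete, here is a minimal sketch that links indicators seen together (an SMS, the URL it contains, the UPI handle on that page) and groups them into connected clusters. It deliberately uses only the standard library as a stand-in for NetworkX, and the function names and the `sms:`/`url:`/`upi:` indicator labels are assumptions made for the example.

```python
from collections import defaultdict


def build_graph(sightings):
    """Build an undirected graph from (indicator_a, indicator_b) pairs.

    Each pair means the two indicators were observed together, for example
    an SMS and the URL it contained.
    """
    graph = defaultdict(set)
    for a, b in sightings:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def threat_clusters(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    sightings = [
        ("sms:1", "url:kyc-update.example"),
        ("url:kyc-update.example", "upi:refund@bank"),
        ("sms:2", "url:job-offer.example"),
    ]
    for cluster in threat_clusters(build_graph(sightings)):
        print(sorted(cluster))
```

Each cluster corresponds to one candidate campaign: indicators that never appeared together directly still end up in the same group if a shared URL or handle connects them.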

Challenges faced

Getting all four engines to initialize without crashing Render's free-tier container was the hardest part. SentenceTransformer was loading at startup and eating RAM before the server could pass its health check. We solved this with lazy initialization: the embedding model loads on the first actual request, not at boot.
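
The lazy-initialization pattern described above can be sketched roughly as follows. `LazyResource` and `load_embedder` are hypothetical names, and the model name `all-MiniLM-L6-v2` is an assumption, since the write-up does not say which Sentence Transformers model is used.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Build an expensive object on first use instead of at import time."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking so concurrent first requests load only once.
        if not self._loaded:
            with self._lock:
                if not self._loaded:
                    self._value = self._factory()
                    self._loaded = True
        return self._value


def load_embedder():
    # Imported inside the function so the heavy dependency is not touched
    # at boot. The model name is an assumption, not taken from the project.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer("all-MiniLM-L6-v2")


# Creating the wrapper is cheap; nothing is loaded until embedder.get() runs.
embedder = LazyResource(load_embedder)
```

A request handler would call `embedder.get()` when it first needs embeddings, so the health check can pass before the model occupies memory. The trade-off is that the first real request pays the loading cost.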

Balancing the ensemble weights was also non-trivial, because the verdict has to stay meaningful even when one engine drops out. When Groq is unavailable, the system gracefully degrades to the ML and semantic signals without breaking the verdict pipeline.
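
One possible way to get this kind of graceful degradation is to renormalize the weights over whichever engines actually returned a score. The sketch below is an assumption about how that could work: the weight values, the engine keys, and the `ensemble_score` function are all hypothetical and are not the weights Spectus uses.

```python
from typing import Optional

# Hypothetical weights; the real values are not given in this write-up.
WEIGHTS = {"ml": 0.25, "semantic": 0.25, "llm": 0.35, "behavioral": 0.15}


def ensemble_score(scores: dict[str, Optional[float]]) -> float:
    """Combine per-engine scam scores in [0, 1] into one weighted score.

    An engine that is unavailable reports None and is skipped; the remaining
    weights are renormalized so the result stays in [0, 1].
    """
    available = {
        name: score
        for name, score in scores.items()
        if score is not None and name in WEIGHTS
    }
    if not available:
        raise ValueError("no engine produced a score")
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted_sum = sum(WEIGHTS[name] * score for name, score in available.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # The LLM engine is down, so the other engines carry the verdict.
    print(ensemble_score({"ml": 0.9, "semantic": 0.8, "llm": None, "behavioral": 0.4}))
```

Renormalizing keeps the score on the same scale whether or not the LLM responded, so downstream thresholds do not need a separate degraded-mode branch.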

Accomplishments

  • A fully working 4-signal ensemble that degrades gracefully when any engine is unavailable
  • Real scam pattern corpus sourced from actual Indian government advisories
  • The Mutation Diff Engine, which tracks how scams evolve over time, something most scam detectors don't do
  • Shipped a complete forensics platform as a single index.html with zero frontend dependencies

What I learned

Lazy initialization matters enormously on constrained infrastructure. Ensemble design is as much about failure modes as it is about accuracy. And India-specific scam patterns (digital arrest, Aadhaar fraud, UPI manipulation) need a dedicated corpus, because generic English scam datasets miss them entirely.

What's next

  • WhatsApp and Telegram scam monitoring
  • OCR-based screenshot analysis
  • Browser extension for real-time phishing detection
  • Multi-language support (Hindi, Tamil, Telugu)
  • SIEM integration for enterprise SOC dashboards
