## Inspiration

Spam filters catch the bulk, but the phishing emails that get through are the ones that look perfectly legitimate at 9:47 AM on a Monday: brand-perfect HTML, lookalike domains, urgent tone, trust-anchored links. Widely cited industry estimates attribute around 90% of breaches to phishing as the initial vector, not because users are careless but because modern attacks are good enough to fool a careful reader.

I wanted a second opinion I could trust — something that does what a security analyst would do, but in 5 seconds and free.

## What it does

Sentra AI is a web app where you paste any suspicious email (raw RFC-822, headers and all — Gmail's "Show original" is one click). It runs a two-layer analysis and returns:

  • A verdict (SAFE / SUSPICIOUS / PHISHING) with a 0–100 risk score.
  • A ranked list of red flags, each with a plain-English explanation of why the attacker uses that trick — and you can click any flag to highlight the exact fragment in the original email.
  • Concrete recommended actions ("Do not click the link. Verify status by signing in at paypal.com directly.").
  • A one-line educational takeaway so the user spots the same pattern next time on their own.
  • Train mode — Sentra deals you a shuffled deck of example emails, you guess the verdict, and learn from each miss. The educational angle becomes interactive.

## How I built it

Layer 1 — deterministic heuristics (pure TypeScript, Node runtime):

  • MIME parsing with postal-mime → headers, body, links, attachments.
  • SPF / DKIM / DMARC pulled from Authentication-Results.
  • Reply-To / Return-Path mismatch check.
  • Brand-impersonation detection: free-mail-from-brand, Levenshtein distance against legit brand domain labels, punycode (xn--) homoglyph detection.
  • Link analysis: anchor-text vs. real href mismatch, @-trick credentials (http://paypal.com@evil.tld), raw-IP URLs, URL shorteners, high-abuse TLDs (.zip, .top, .click, …).
  • Content patterns: urgency, threats (account suspension), credential requests, money bait.
  • Risky attachment extensions (.exe, .docm, .iso, .scr, …).
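As a rough illustration of the link-analysis checks listed above, here is a minimal sketch built on the standard `URL` class. The `LinkFlag` names, the `analyzeLink` function, and the short TLD set are assumptions made for this example, not Sentra's actual code.

```typescript
// Hypothetical flag names; illustrative only, not Sentra's actual types.
type LinkFlag = "userinfo-trick" | "raw-ip" | "punycode" | "risky-tld" | "text-href-mismatch";

// A few high-abuse TLDs from the list above; a real list would be longer.
const RISKY_TLDS = new Set(["zip", "top", "click"]);

function analyzeLink(href: string, anchorText = ""): LinkFlag[] {
  const flags: LinkFlag[] = [];
  let url: URL;
  try {
    url = new URL(href);
  } catch {
    return flags; // Not a parseable absolute URL; this sketch skips it.
  }

  const host = url.hostname.toLowerCase();

  // http://paypal.com@evil.tld parses as username "paypal.com", host "evil.tld".
  if (url.username !== "") flags.push("userinfo-trick");

  // Dotted-quad IPv4 or bracketed IPv6 literal instead of a domain name.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host) || host.startsWith("[")) flags.push("raw-ip");

  // Any label starting with xn-- is an internationalized name in punycode.
  if (host.split(".").some((label) => label.startsWith("xn--"))) flags.push("punycode");

  const tld = host.split(".").pop() ?? "";
  if (RISKY_TLDS.has(tld)) flags.push("risky-tld");

  // If the visible text looks like a domain, it should match the real host.
  const shown = anchorText.trim().toLowerCase().match(/^(?:https?:\/\/)?([a-z0-9.-]+\.[a-z]{2,})/);
  if (shown && shown[1] !== host && !host.endsWith("." + shown[1])) {
    flags.push("text-href-mismatch");
  }

  return flags;
}
```

Calling `analyzeLink("http://paypal.com@evil.tld/login")` reports the `userinfo-trick` flag, because the URL parser treats everything before the `@` as credentials rather than as the host.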

Layer 2 — Gemini 2.5 Flash as the analyst. The model receives the raw email and the heuristic findings, then returns strict JSON via Gemini's responseSchema (no JSON-parsing roulette). This is the key design choice: heuristics are reliable but blind to context; LLMs read context but hallucinate. The combination gives ground truth + readable narrative.
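Even with a response schema in place, a cheap runtime check on the parsed object keeps the UI honest if the model or the network misbehaves. The sketch below is one way to do that. The `AnalysisResult` field names are guesses based on the features described under "What it does", and `parseAnalysis` is a hypothetical helper, not part of the Gemini SDK.

```typescript
// Hypothetical result shape inferred from the feature list; real field names may differ.
type Verdict = "SAFE" | "SUSPICIOUS" | "PHISHING";

interface RedFlag {
  title: string;
  explanation: string;
}

interface AnalysisResult {
  verdict: Verdict;
  riskScore: number; // 0-100
  redFlags: RedFlag[];
  recommendedActions: string[];
  takeaway: string;
}

const VERDICTS: readonly string[] = ["SAFE", "SUSPICIOUS", "PHISHING"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Parse the model's text output and check it against the expected shape.
// Returns null instead of throwing so the caller can fall back to heuristics only.
function parseAnalysis(raw: string): AnalysisResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(data)) return null;

  const { verdict, riskScore, redFlags, recommendedActions, takeaway } = data;
  const ok =
    typeof verdict === "string" && VERDICTS.includes(verdict) &&
    typeof riskScore === "number" && riskScore >= 0 && riskScore <= 100 &&
    Array.isArray(redFlags) &&
    redFlags.every((f) => isRecord(f) && typeof f.title === "string" && typeof f.explanation === "string") &&
    Array.isArray(recommendedActions) && recommendedActions.every((a) => typeof a === "string") &&
    typeof takeaway === "string";

  return ok ? (data as unknown as AnalysisResult) : null;
}
```

Returning `null` rather than throwing lets the caller still show the Layer 1 findings when the Layer 2 response is unusable.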

Stack: Next.js 16 (App Router, Turbopack) · TypeScript strict · Tailwind CSS 4 · @google/generative-ai · postal-mime · lucide-react · Vercel.

## Challenges I ran into

  • The Windows dev environment fought back: npx lock-compromised errors on npm 11, a slow secondary drive bottlenecking installs, and a leftover dev server squatting on port 3000. The lovely "you build the thing, then the laptop builds character" hackathon experience.
  • Strict JSON from LLMs. Gemini's responseSchema parameter saved me hours of writing fault-tolerant parsing code, because every response is constrained to the declared JSON shape. My first attempt with gemini-2.0-flash hit a "limit: 0" quota on a fresh AI Studio project, so I switched the default to 2.5-flash and made the model overridable through an environment variable.
  • Heuristic false positives. First pass flagged every email with the word "urgent." Tuned by adding severity weights and requiring 2+ urgency hits for the high-severity flag.
  • Brand-impersonation logic. Detecting paypa1.com and xn--ppal-2nb.com without flagging legitimate paypalobjects.com took a Levenshtein-plus-allowlist approach.
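The Levenshtein-plus-allowlist idea from the last bullet can be sketched roughly as follows. The brand list, the allowlist contents, the distance threshold of 2, and the extra "contains the brand name" check are all assumptions for illustration. The substring check in particular is added here only to show why an allowlist is needed for domains such as paypalobjects.com.

```typescript
// Illustrative data only; a real deployment would carry a much larger brand list.
const BRAND_LABELS = ["paypal", "google", "microsoft"];
const ALLOWLIST = new Set(["paypal.com", "paypalobjects.com", "google.com", "microsoft.com"]);

// Classic dynamic-programming edit distance, keeping one row in memory at a time.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the brand a domain appears to imitate, or null if nothing looks off.
function lookalikeOf(domain: string, maxDistance = 2): string | null {
  const host = domain.toLowerCase();
  if (ALLOWLIST.has(host)) return null; // Known-good domains are never flagged.

  // Naive: compares the second-to-last label ("paypa1" in "paypa1.com"),
  // so multi-part suffixes such as .co.uk are not handled in this sketch.
  const labels = host.split(".");
  const label = labels.length >= 2 ? labels[labels.length - 2] : host;

  for (const brand of BRAND_LABELS) {
    const d = levenshtein(label, brand);
    const nearMiss = d > 0 && d <= maxDistance;
    const embedsBrand = label !== brand && label.includes(brand);
    if (nearMiss || embedsBrand) return brand;
  }
  return null;
}
```

With this data, `lookalikeOf("paypa1.com")` returns `"paypal"`, while `lookalikeOf("paypalobjects.com")` returns `null` because the allowlist short-circuits before any distance check runs.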

## Accomplishments I'm proud of

  • End-to-end working product in ~36 hours solo: parser, heuristics, LLM layer, full UI, annotated source view, copy-as-Markdown, demo emails, Train mode, deployed.
  • Clean two-layer architecture that's easy to explain in a 60-second pitch and easy to extend (swap Gemini → local Llama, add a Chrome extension, etc.).
  • A real educational takeaway in every verdict — Sentra teaches the user instead of just telling them what to do. Train mode pushes that further into gamification.

## What I learned

  • Gemini's structured output is a game-changer for "agentic" features — it eliminates a whole class of brittle parsing code.
  • The hardest part of an email-security tool isn't catching the obvious phishing. It's not crying wolf on the legitimate Stripe receipt. A SAFE verdict has to be just as confident as a PHISHING one.
  • Tight scope wins hackathons. The temptation to add OAuth, Gmail integration, a Chrome extension was real — and rejecting all of it is why the demo works.

## What's next

  • Chrome / Gmail extension that one-click submits the open thread to Sentra.
  • Self-hosted variant for security teams that can't send mail content to a third-party LLM — swap Gemini for a local Llama / Qwen behind the same API.
  • Outbound link safety check — passive DNS / VirusTotal lookup with explicit user consent.
  • Multilingual phishing patterns — current content heuristics are English-only.
  • Confidence calibration: log heuristic-vs-LLM verdicts on real corpora (Nazario, PhishTank) to tune thresholds.

## Built With

  • gemini-2.5-flash
  • google-gemini
  • lucide-react
  • next.js
  • postal-mime
  • tailwind-css
  • typescript
  • vercel