Sentra AI

Main menu
Mainboard
URL
compare
learn
inbox_sim
Anatomy
Insights

Inspiration

What it does

How we built it

## Inspiration

Spam filters catch the bulk, but the phishing emails that get through are the ones that look perfectly legitimate at 9:47 AM on a Monday: brand-perfect HTML, lookalike domains, urgent tone, trust-anchored links. Studies attribute 90% of breaches to phishing as the initial vector — not because users are careless, but because modern attacks are good enough to fool a careful reader.

I wanted a second opinion I could trust — something that does what a security analyst would do, but in 5 seconds and free.

## What it does

Sentra AI is a web app where you paste any suspicious email (raw RFC-822, headers and all — Gmail's "Show original" is one click). It runs a two-layer analysis and returns:

A verdict (SAFE / SUSPICIOUS / PHISHING) with a 0–100 risk score.
A ranked list of red flags, each with a plain-English explanation of why the attacker uses that trick — and you can click any flag to highlight the exact fragment in the original email.
Concrete recommended actions ("Do not click the link. Verify status by signing in at paypal.com directly.").
A one-line educational takeaway so the user spots the same pattern next time on their own.
Train mode — Sentra deals you a shuffled deck of example emails, you guess the verdict, and learn from each miss. The educational angle becomes interactive.

## How I built it

Layer 1 — deterministic heuristics (pure TypeScript, Node runtime):

MIME parsing with postal-mime → headers, body, links, attachments.
SPF / DKIM / DMARC pulled from Authentication-Results.
Reply-To / Return-Path mismatch check.
Brand-impersonation detection: free-mail-from-brand, Levenshtein distance against legit brand domain labels, punycode (xn--) homoglyph detection.
Link analysis: anchor-text vs. real href mismatch, @-trick credentials (http://paypal.com@evil.tld), raw-IP URLs, URL shorteners, high-abuse TLDs (.zip, .top, .click, …).
Content patterns: urgency, threats (account suspension), credential requests, money bait.
Risky attachment extensions (.exe, .docm, .iso, .scr, …).

Layer 2 — Gemini 2.5 Flash as the analyst. The model receives the raw email and the heuristic findings, then returns strict JSON via Gemini's responseSchema (no JSON-parsing roulette). This is the key design choice: heuristics are reliable but blind to context; LLMs read context but hallucinate. The combination gives ground truth + readable narrative.

Stack: Next.js 16 (App Router, Turbopack) · TypeScript strict · Tailwind CSS 4 · @google/generative-ai · postal-mime · lucide-react · Vercel.

## Challenges I ran into

Windows dev environment fought back. npx lock-compromised errors on npm 11, a slow secondary drive bottlenecking install, a leftover dev server squatting on port 3000, the lovely "you build the thing, then the laptop builds character" hackathon experience.
Strict JSON from LLMs. Gemini's responseSchema parameter saved me hours of regex tolerance — every analysis is now guaranteed-parseable JSON. First attempt with gemini-2.0-flash hit a "limit: 0" quota on a fresh AI Studio project; switched the default to 2.5-flash and made the model env-overridable.
Heuristic false positives. First pass flagged every email with the word "urgent." Tuned by adding severity weights and requiring 2+ urgency hits for the high-severity flag.
Brand-impersonation logic. Detecting paypa1.com and xn--ppal-2nb.com without flagging legitimate paypalobjects.com took a Levenshtein-plus-allowlist approach.

## Accomplishments I'm proud of

End-to-end working product in ~36 hours solo: parser, heuristics, LLM layer, full UI, annotated source view, copy-as-Markdown, demo emails, Train mode, deployed.
Clean two-layer architecture that's easy to explain in a 60-second pitch and easy to extend (swap Gemini → local Llama, add a Chrome extension, etc.).
A real educational takeaway in every verdict — Sentra teaches the user instead of just telling them what to do. Train mode pushes that further into gamification.

## What I learned

Gemini's structured output is a game-changer for "agentic" features — it eliminates a whole class of brittle parsing code.
The hardest part of an email-security tool isn't catching the obvious phishing. It's not crying wolf on the legitimate Stripe receipt. A SAFE verdict has to be just as confident as a PHISHING one.
Tight scope wins hackathons. The temptation to add OAuth, Gmail integration, a Chrome extension was real — and rejecting all of it is why the demo works.

## What's next

Chrome / Gmail extension that one-click submits the open thread to Sentra.
Self-hosted variant for security teams that can't send mail content to a third-party LLM — swap Gemini for a local Llama / Qwen behind the same API.
Outbound link safety check — passive DNS / VirusTotal lookup with explicit user consent.
Multilingual phishing patterns — current content heuristics are English-only.
Confidence calibration — log heuristic-vs-LLM verdicts on real corpora (Nazario, PhishTank) to tune thresholds. ## Challenges we ran into