TL;DR

TriageTap is an evidence-cited AI SOC copilot. It turns phishing reports and hybrid logs/files (web, auth, CloudTrail) into one incident case with a merged timeline, deterministic detections, and an AI triage summary that includes verbatim evidence quotes you can verify.

Inspiration

I applied for an on-campus SOC Analyst role and got rejected. Instead of stopping there, I used it as fuel to build the hands-on skills I was missing. The fastest way to learn real SOC work (triage → investigate → prove → report) was to build a SOC-style tool that forces me to analyze real artifacts and produce explainable, evidence-grounded results.

What’s working now (demo-ready)

1) PhishGuard: phishing triage → case creation

  • Paste an email/SMS/URL
  • Deterministic IOC extraction (URLs/domains + emails/phones when present) and red-flag heuristics
  • AI classification + safe reply + recommended next steps
  • One click: Create Phish Case → stores the report + IOCs as a case event
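The deterministic IOC step can be sketched with plain regexes. This is a minimal illustration of the idea (regex-based URL/email extraction), not TriageTap's actual implementation; the class name and patterns are assumptions for the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: deterministic, regex-based IOC extraction.
public class IocExtractor {
    private static final Pattern URL =
            Pattern.compile("https?://[\\w.-]+(?:/\\S*)?");
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w.-]+\\.[A-Za-z]{2,}");

    private static Set<String> extract(Pattern p, String text) {
        Set<String> hits = new LinkedHashSet<>(); // preserve first-seen order
        Matcher m = p.matcher(text);
        while (m.find()) hits.add(m.group());
        return hits;
    }

    public static Set<String> urls(String text)   { return extract(URL, text); }
    public static Set<String> emails(String text) { return extract(EMAIL, text); }
}
```

Because the extraction is pure pattern matching, the same input always yields the same IOCs, which is what makes the case events reproducible.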

2) Hybrid log ingestion: web + auth + CloudTrail

  • Upload real log sources (web access logs, SSH auth logs, CloudTrail JSON)
  • Normalize them into a unified incident view (single case timeline)
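Normalization boils down to mapping every source onto one event shape, then sorting by time. A minimal sketch of that idea (the record fields and class names are illustrative, not the project's real schema):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: normalize heterogeneous sources into one event
// shape, then merge them into a single time-ordered case timeline.
public class Timeline {
    public record CaseEvent(Instant ts, String source, String summary) {}

    public static List<CaseEvent> merge(List<List<CaseEvent>> sources) {
        List<CaseEvent> all = new ArrayList<>();
        sources.forEach(all::addAll);          // flatten web/auth/cloud streams
        all.sort(Comparator.comparing(CaseEvent::ts)); // one chronological view
        return all;
    }
}
```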

3) Explainable case view (AI + rules + timeline)

Each case shows:

  • AI Verdict (incident summary)
  • AI Red Flags (evidence quotes) pulled directly from ingested text/log lines
  • Deterministic Red Flags (rule-based signals)
  • Next Steps (actionable investigation/containment guidance)
  • A merged Timeline across sources (web/auth/cloud events)

4) File intel enrichment (OPSWAT MetaDefender)

  • Optional multi-engine scan summary (verdict, detection ratio, hashes, and engine results)
  • Results can be attached to a case as supporting intel (and removed instantly by deleting the case)

How I built it

  • Built a Spring Boot backend + lightweight web UI with two workflows: PhishGuard and SOC Copilot
  • Implemented ingestion + normalization for hybrid sources (web/auth/CloudTrail)
  • Added deterministic detection flags so results stay explainable
  • Designed AI outputs to be evidence-grounded (quotes from the artifact, not “AI guessing”)
  • Integrated OPSWAT MetaDefender for multi-engine file intel with an “attach to selected case” option
  • Added privacy controls: raw log storage is off by default and cases can be deleted in one click
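A deterministic detection flag is just a rule with a fixed threshold, so the same logs always produce the same signal. A sketch of one such rule (an SSH brute-force burst); the threshold and names here are arbitrary examples, not TriageTap's actual rules:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of one deterministic rule: flag any IP with at
// least `threshold` failed SSH logins in the ingested window.
public class BruteForceRule {
    public static List<String> flaggedIps(List<String> failedLoginIps, int threshold) {
        Map<String, Long> counts = failedLoginIps.stream()
                .collect(Collectors.groupingBy(ip -> ip, Collectors.counting()));
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }
}
```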

Challenges

  • Normalization + timestamps: different sources don’t align cleanly; building a readable merged timeline took iteration.
  • Keeping AI honest: I prioritized provability over fluency, so outputs had to cite exact evidence rather than speculate.
  • Correlation: connecting web recon → SSH activity → CloudTrail actions into one coherent narrative without over-claiming.
  • Third-party intel constraints: multi-engine scanning has rate limits and privacy tradeoffs, so I built it as an optional, attachable intel step.
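The timestamp-alignment problem comes from each source using a different format: web access logs use the Apache/nginx style, while CloudTrail's eventTime is ISO-8601. A minimal sketch of normalizing both to a UTC Instant before merging (method names are illustrative):

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Illustrative sketch: every source's timestamps are converted to a
// UTC Instant so the merged timeline sorts on one common axis.
public class Timestamps {
    // Apache/nginx access-log style, e.g. 01/May/2024:10:00:05 +0000
    private static final DateTimeFormatter ACCESS_LOG =
            DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    public static Instant fromAccessLog(String raw) {
        return OffsetDateTime.parse(raw, ACCESS_LOG).toInstant();
    }

    public static Instant fromCloudTrail(String raw) {
        return Instant.parse(raw); // CloudTrail eventTime is ISO-8601
    }
}
```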

What I learned

I learned how SOC investigation flows work end-to-end: ingesting messy telemetry, extracting signals, building timelines, and communicating findings in a way that can be verified. The biggest lesson: AI is only useful in security when it stays grounded in evidence.

Next steps (presentation polish)

These are incremental upgrades on top of what’s already working:

1) Clickable evidence IDs (e.g., [E12]) that jump to the exact timeline entry
2) Investigation Mode (Next Best Evidence): AI asks 2–4 follow-up questions; answers re-triage the case
3) Privacy Share Pack export: redacted report + redacted evidence JSONL + hashed IOCs
4) Tamper-evident proof: compute SHA-256 of the share pack and optionally anchor {case_id, hash, timestamp}
5) Voice briefing: generate a short audio incident briefing from the triage summary
6) Rule Forge preview: mark TP/FP → AI suggests rule tweaks → preview “alerts before vs after” (no auto-apply)
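The tamper-evident step in item 4 is standard content hashing: compute SHA-256 over the exported share-pack bytes, and anyone can recompute it later to verify nothing changed. A minimal sketch (class name is illustrative):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Illustrative sketch for the tamper-evident idea: hash the exported
// share pack so its bytes can be verified later against {case_id, hash, timestamp}.
public class ProofHash {
    public static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 ships with every JVM
        }
    }
}
```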

What’s next

My next step is to turn TriageTap into a real-time system checker with an offline AI mode:

  • Add a lightweight local agent that streams system logs/events into the timeline continuously (instead of only file uploads).
  • Run detections in near real-time as events arrive.
  • Replace cloud LLM calls with a local/offline model (on-prem) so sensitive logs never leave the environment.
  • Keep the same evidence-citation rule: offline AI can only cite existing event_ids [E##] from the case.
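That citation rule is enforceable with a simple post-check: scan the model output for [E##] tags and reject any ID that is not an event in the case. A sketch of such a validator (names are illustrative, not the project's real API):

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the evidence-citation rule: any [E##] tag in
// AI output must refer to an event that actually exists in the case.
public class CitationCheck {
    private static final Pattern EVIDENCE_ID = Pattern.compile("\\[E(\\d+)]");

    /** Returns cited IDs NOT present in the case; empty means the output is grounded. */
    public static Set<String> unknownCitations(String aiOutput, Set<String> caseEventIds) {
        Set<String> unknown = new LinkedHashSet<>();
        Matcher m = EVIDENCE_ID.matcher(aiOutput);
        while (m.find()) {
            String id = "E" + m.group(1);
            if (!caseEventIds.contains(id)) unknown.add(id);
        }
        return unknown;
    }
}
```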

Built With

  • Spring Boot (Java backend + lightweight web UI)
  • OPSWAT MetaDefender API