Inspiration

Most phishing simulation tools are clunky, manual, and slow. A security team has to write the email, send it, wait, check logs, identify who clicked, then manually send training. By the time the employee gets feedback, the moment is gone. I wanted to see if AI agents could collapse that entire workflow into something instant and automatic.

What it does

WhatThePhish runs end-to-end phishing simulations with zero manual intervention. You pick a department and a phishing technique, and four AI agent crews handle the rest: generating the email, sending it, polling Splunk for clicks every 30 seconds, and automatically delivering personalised security awareness training to anyone who clicked.
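The last step of that loop can be sketched in plain Python. This is a minimal illustration, not the project's code. It assumes click events and employee records have already been fetched from the `phishing_sim` and `employees` indexes. The field names (`email`, `technique`, `campaign_id`, `department`), the `TIPS` table and `build_training_emails` are hypothetical. In WhatThePhish the training content is written by the ResponseCrew's GPT-4o-mini agent; a static lookup stands in for it here.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass
class ClickEvent:
    # Hypothetical shape of a click event pulled from the phishing_sim index.
    email: str
    campaign_id: str
    technique: str


@dataclass
class Employee:
    # Hypothetical shape of a record from the employees index.
    email: str
    name: str
    department: str


# Hypothetical per-technique tips. In the real system an LLM agent writes
# the training content; this static lookup is only a stand-in.
TIPS = {
    "credential_harvest": "Check the sender domain and never enter your password from an emailed link.",
    "invoice_fraud": "Verify unexpected payment requests through a known phone number.",
}
DEFAULT_TIP = "When in doubt, report the email to the security team before clicking."


def build_training_emails(clicks, employees, sender):
    """Return one training email per employee who clicked, deduplicated by address."""
    by_email = {e.email: e for e in employees}
    seen = set()
    messages = []
    for click in clicks:
        if click.email in seen or click.email not in by_email:
            continue  # skip repeat clicks and recipients with no employee record
        seen.add(click.email)
        person = by_email[click.email]
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = person.email
        msg["Subject"] = f"Security awareness: about the {click.technique} simulation"
        tip = TIPS.get(click.technique, DEFAULT_TIP)
        msg.set_content(
            f"Hi {person.name},\n\n"
            f"You clicked a link in simulated phishing campaign {click.campaign_id}, "
            f"which targeted the {person.department} team.\n\n"
            f"Tip: {tip}\n"
        )
        messages.append(msg)
    return messages
```

Each returned `EmailMessage` could then be sent with the standard library's `smtplib.SMTP(host).send_message(msg)`. Deduplicating by address within a batch, so an employee who clicks twice gets a single email, is an assumed design choice.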

How we built it

  • CrewAI to orchestrate four crews: CampaignCreation, EmailDispatch, Detection, and Response
  • GPT-4o-mini powering each agent
  • Splunk Enterprise with two indexes: phishing_sim for campaign events and employees for recipient data
  • Splunk MCP server used by the DetectionCrew to monitor click events in real time
  • HEC (HTTP Event Collector) to stream events into Splunk as they happen
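Streaming an event through HEC amounts to an authenticated JSON POST to Splunk's `/services/collector/event` endpoint. The sketch below uses only the standard library and builds such a request without sending it. The sourcetype `phishing:click`, the event fields and the helper name `build_hec_request` are assumptions for illustration. The endpoint path, the `Authorization: Splunk <token>` header and the `time`/`index`/`sourcetype`/`event` keys follow Splunk's HEC event format.

```python
import json
import time
import urllib.request


def build_hec_request(base_url, token, event, index="phishing_sim",
                      sourcetype="phishing:click"):
    """Build (but do not send) a POST to Splunk's HEC event endpoint."""
    payload = {
        "time": time.time(),       # epoch seconds; Splunk uses it as the event's _time
        "index": index,            # must be an index the HEC token is allowed to write to
        "sourcetype": sourcetype,  # hypothetical sourcetype name
        "event": event,            # the JSON event body itself
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending is then `urllib.request.urlopen(req)`. HEC listens on port 8088 by default and test setups often use a self-signed certificate, so a real deployment may need an SSL context configured accordingly.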

Challenges we ran into

Getting the DetectionCrew to reliably poll Splunk on a 30-second interval without blocking the rest of the pipeline took more iteration than expected. Wiring the Splunk MCP server into the CrewAI agent context also required careful prompt engineering to get consistent query behaviour out of the agent.
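One way to keep a 30-second poll from blocking the rest of the pipeline is to run it as an asyncio task and push the blocking Splunk call onto a worker thread. This is a hedged sketch, not how the DetectionCrew was actually wired. `search` stands in for whatever query function the agent's tool exposes, each event is assumed to carry a unique `id`, and `poll_for_clicks` is a hypothetical name.

```python
import asyncio


async def poll_for_clicks(search, on_click, interval=30.0, stop=None):
    """Call `search()` every `interval` seconds until `stop` is set.

    `search` is a blocking function (e.g. a Splunk query) returning a list of
    click events; it runs in a worker thread so the event loop stays free for
    the other crews.
    """
    stop = stop or asyncio.Event()
    seen = set()
    while not stop.is_set():
        events = await asyncio.to_thread(search)
        for event in events:
            key = event["id"]  # hypothetical unique event id
            if key not in seen:
                seen.add(key)
                await on_click(event)  # hand each new click to the response step once
        try:
            # Sleep for `interval`, but wake immediately if `stop` is set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass
```

`asyncio.to_thread` requires Python 3.9 or later. Waiting on `stop` with a timeout, instead of a bare `asyncio.sleep`, lets the poller shut down promptly rather than finishing its current 30-second wait.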

Accomplishments that we're proud of

The full loop works end to end, from campaign creation to personalised training delivery, with no human in the loop. Watching a click event in Splunk automatically trigger a tailored training email within seconds felt like the moment the system came alive.

What we learned

MCP servers as agent tools are powerful but need tight system prompts. Agents given too much freedom with a live Splunk instance will hallucinate queries. Specificity in the tool description matters as much as the tool itself.
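A minimal sketch of that idea, assuming a plain Python function exposed to the agent as a tool rather than the Splunk MCP server's own interface. The agent supplies only a campaign ID and a relative time window, while the index, field names and SPL pipeline are fixed in code. The field names `event_type`, `campaign_id` and `recipient`, the helper `click_query` and the `TOOL_DESCRIPTION` text are all hypothetical.

```python
import re

# Narrow input patterns, so the agent cannot smuggle extra SPL into the query.
_CAMPAIGN_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")
_WINDOW = re.compile(r"-\d{1,4}[smh]")  # relative time such as -30s or -5m


def click_query(campaign_id, window="-30s"):
    """Build the single SPL search the detection tool is allowed to run.

    The agent supplies only `campaign_id` and a relative time window; the
    index, field names and command pipeline are fixed here instead of being
    written by the LLM.
    """
    if not _CAMPAIGN_ID.fullmatch(campaign_id):
        raise ValueError(f"invalid campaign_id: {campaign_id!r}")
    if not _WINDOW.fullmatch(window):
        raise ValueError(f"invalid time window: {window!r}")
    return (
        f'search index=phishing_sim event_type=click campaign_id="{campaign_id}" '
        f"earliest={window} "
        "| stats earliest(_time) AS first_click BY recipient"
    )


# A tool description that spells out exactly what the tool does and accepts.
TOOL_DESCRIPTION = (
    "Returns employees who clicked the phishing link for ONE campaign. "
    "Input: campaign_id (letters, digits, _ or -) and window (e.g. -30s). "
    "Do not write SPL yourself."
)
```

Validating both inputs against narrow patterns means a hallucinated or injected value such as `x" OR index=*` fails fast with a `ValueError` instead of reaching Splunk.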

What's next for WhatThePhish

  • Multi-vector simulations beyond email (SMS, voice)
  • A difficulty scoring system that escalates phishing sophistication based on employee performance over time

Built With

  • crewai
  • fastapi
  • fastapicrewai
  • gpt-4o-mini
  • hec
  • python
  • react
  • smtp
  • splunk-enterprise
  • splunk-mcp-server
  • sqlalchemy
  • tailwindcss