WhatThePhish

Inspiration

Most phishing simulation tools are clunky, manual, and slow. A security team has to write the email, send it, wait, check logs, identify who clicked, then manually send training. By the time the employee gets feedback, the moment is gone. I wanted to see if AI agents could collapse that entire workflow into something instant and automatic.

What it does

WhatThePhish runs end-to-end phishing simulations with zero manual intervention. You pick a department and a phishing technique, and four AI agent crews handle the rest: generating the email, sending it, monitoring Splunk for clicks every 30 seconds, and automatically delivering personalised security awareness training to anyone who clicked.
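As a rough sketch of how one campaign moves through the four crews, each stage can be modelled as a plain Python function that hands a single record along. This is not CrewAI code. The `Campaign` record and every function name below are hypothetical stand-ins: where the real crews call GPT-4o-mini, SMTP, or Splunk, these functions only manipulate lists.

```python
from dataclasses import dataclass, field


@dataclass
class Campaign:
    # Hypothetical record passed from stage to stage.
    department: str
    technique: str
    recipients: list[str]
    email_body: str = ""
    sent_to: list[str] = field(default_factory=list)
    clicked: set[str] = field(default_factory=set)
    trained: list[str] = field(default_factory=list)


def create_campaign(c: Campaign) -> Campaign:
    # Stand-in for CampaignCreation; the real crew would have GPT-4o-mini draft the email.
    c.email_body = f"[{c.technique}] Action required for the {c.department} team"
    return c


def dispatch(c: Campaign) -> Campaign:
    # Stand-in for EmailDispatch; records who was sent the simulated email.
    c.sent_to = list(c.recipients)
    return c


def detect(c: Campaign, click_events: list[str]) -> Campaign:
    # Stand-in for Detection; keeps only clicks from people who received this campaign.
    c.clicked = {r for r in click_events if r in c.sent_to}
    return c


def respond(c: Campaign) -> Campaign:
    # Stand-in for Response; queues training for every clicker in a stable order.
    c.trained = sorted(c.clicked)
    return c


def run_simulation(department: str, technique: str,
                   recipients: list[str], click_events: list[str]) -> Campaign:
    c = create_campaign(Campaign(department, technique, recipients))
    c = dispatch(c)
    c = detect(c, click_events)
    return respond(c)
```

Filtering clicks against `sent_to` in the detection step keeps stray events from other campaigns from triggering training.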

How we built it

CrewAI to orchestrate four crews: CampaignCreation, EmailDispatch, Detection, and Response
GPT-4o-mini powering each agent
Splunk Enterprise with two indexes: phishing_sim for campaign events and employees for recipient data
Splunk MCP server used by the DetectionCrew to monitor click events in real time
HEC (HTTP Event Collector) to stream events into Splunk as they happen
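HEC accepts JSON over HTTPS, so streaming a campaign event can be as small as one POST. The sketch below builds that request with the standard library only. It assumes Splunk's default HEC port (8088) and the `/services/collector/event` endpoint. The host, token, and event fields (`action`, `recipient`) are placeholders, not values from the project.

```python
import json
import time
import urllib.request


def build_hec_request(host: str, token: str, event: dict,
                      index: str = "phishing_sim",
                      sourcetype: str = "_json") -> urllib.request.Request:
    """Build a POST to Splunk's HEC event endpoint (placeholder host and token)."""
    payload = {
        "time": time.time(),       # epoch seconds, used as the event timestamp
        "index": index,            # the HEC token must be allowed to write here
        "sourcetype": sourcetype,
        "event": event,            # the campaign event itself
    }
    return urllib.request.Request(
        url=f"https://{host}:8088/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req, timeout=10)`. A local Splunk instance with a self-signed certificate would also need a suitable `ssl` context.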

Challenges we ran into

Getting the DetectionCrew to reliably poll Splunk on a 30-second interval without blocking the rest of the pipeline took more iteration than expected. Wiring the Splunk MCP server into the CrewAI agent context also required careful prompt engineering to get consistent query behaviour out of the agent.
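The write-up doesn't say how the 30-second interval was implemented. One way to keep polling from blocking everything else is an asyncio task that runs the (presumably blocking) Splunk lookup in a worker thread and waits with `asyncio.sleep`. `fetch_clicks` and `on_click` are hypothetical hooks standing in for the DetectionCrew's Splunk query and the hand-off to the Response crew.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def poll_clicks(fetch_clicks: Callable[[], list[str]],
                      on_click: Callable[[str], Awaitable[None]],
                      interval: float = 30.0,
                      max_polls: Optional[int] = None) -> None:
    """Poll for clicks every `interval` seconds without blocking the event loop."""
    seen: set[str] = set()
    polls = 0
    while True:
        # Assumed blocking Splunk lookup, run in a worker thread so other tasks keep going.
        clicks = await asyncio.to_thread(fetch_clicks)
        for user in clicks:
            if user not in seen:      # react to each clicker only once
                seen.add(user)
                await on_click(user)
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return
        await asyncio.sleep(interval)  # yields control instead of blocking
```

Launched with `asyncio.create_task(poll_clicks(...))`, the loop runs alongside the rest of the pipeline, and the `seen` set keeps a repeat click from triggering a second training email.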

Accomplishments that we're proud of

The full loop actually works, from campaign creation to personalised training delivery, with no human involved. Watching a click event in Splunk automatically trigger a tailored training email within seconds was the moment the whole thing felt like it had come alive.
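Personalisation here presumably means combining the click event with the clicker's record from the employees index. A minimal sketch using the standard library's `email.message.EmailMessage`, assuming hypothetical field names (`name`, `email`, `department`, `technique`, `time`), a hypothetical sender address, and a hypothetical `LESSONS` lookup in place of whatever GPT-4o-mini actually writes:

```python
from email.message import EmailMessage

# Hypothetical lesson snippets keyed by phishing technique.
LESSONS = {
    "credential_harvesting": "Check the address bar before entering your password.",
    "invoice_fraud": "Verify payment changes by phone using a known number.",
}


def build_training_email(employee: dict, click: dict,
                         sender: str = "security-training@example.com") -> EmailMessage:
    """Turn one click event plus the clicker's employee record into a training email."""
    lesson = LESSONS.get(click["technique"], "Report suspicious emails to the security team.")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = employee["email"]
    msg["Subject"] = f"{employee['name']}, a quick lesson on that last email"
    msg.set_content(
        f"Hi {employee['name']},\n\n"
        f"The email you opened at {click['time']} was a {employee['department']} "
        f"phishing simulation ({click['technique']}).\n\n"
        f"Tip: {lesson}\n"
    )
    return msg
```

The message can then go out over SMTP with `smtplib.SMTP(host).send_message(msg)`.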

What we learned

MCP servers as agent tools are powerful but need tight system prompts. Agents given too much freedom with a live Splunk instance will hallucinate queries. Specificity in the tool description matters as much as the tool itself.
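One way to take that freedom away, shown here as an illustration rather than the project's actual tool, is to let the agent pick only from allowlisted SPL templates and to validate every parameter before anything reaches Splunk. Apart from the phishing_sim index, the template, field names (`action`, `campaign_id`, `recipient`), and limits below are assumptions.

```python
import re

# Hypothetical allowlist: the only query shapes the agent may run.
QUERY_TEMPLATES = {
    "clicks_for_campaign": (
        'search index=phishing_sim action=clicked campaign_id="{campaign_id}" '
        "earliest=-{minutes}m | stats latest(_time) as clicked_at by recipient"
    ),
}

# Rejects quotes, spaces, and operators, so a parameter can't smuggle in extra SPL.
CAMPAIGN_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")


def build_query(name: str, campaign_id: str, minutes: int = 5) -> str:
    """Return a vetted SPL string, or raise ValueError instead of guessing."""
    if name not in QUERY_TEMPLATES:
        raise ValueError(f"unknown query {name!r}; allowed: {sorted(QUERY_TEMPLATES)}")
    if not CAMPAIGN_ID.fullmatch(campaign_id):
        raise ValueError("campaign_id must be 1-64 letters, digits, '_' or '-'")
    if not 1 <= minutes <= 60:
        raise ValueError("minutes must be between 1 and 60")
    return QUERY_TEMPLATES[name].format(campaign_id=campaign_id, minutes=minutes)
```

The tool description then only has to list the template names and their parameters, which leaves the agent far less room to invent a query.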

What's next for WhatThePhish

Multi-vector simulations beyond email (SMS, voice)
A difficulty scoring system that escalates phishing sophistication based on employee performance over time

Built With

crewai
fastapi
fastapicrewai
gpt-4o-mini
hec
python
react
smtp
splunk-enterprise
splunk-mcp-server
sqlalchemy
tailwindcss

Updates

CodesByNeeraj Lakshmanan started this project — Jun 15, 2026 11:50 AM EDT
