🌊 Detection Surfer

Autonomous Threat Research & Detection Engineering Lifecycle

💡 Inspiration

As a Detection Engineer, I found myself caught in a loop of repetitive manual tasks: scouring threat intel, checking if we already had coverage, mapping TTPs to the MITRE ATT&CK® framework, and manually writing/testing YAML rules.

Detection Surfer was born from a simple question: Can an AI agent handle the "grunt work" of the detection lifecycle so I can focus on high-level strategy? I wanted to push the boundaries of the Model Context Protocol (MCP) to see if an LLM could not only "suggest" rules but actually execute the entire engineering pipeline.

🚀 What it does

Detection Surfer is an end-to-end autonomous agent that handles the heavy lifting of a SOC content team. It performs:

Threat Intel Synthesis: Scrapes threat intel sources and summarizes newly reported TTPs.
Coverage Gap Analysis: Queries existing rule repositories to ensure we aren't duplicating work.
Automated Rule Authoring: Generates high-fidelity detection logic (SIEM/Sigma) with schema validation.
Adversary Emulation: Automatically triggers existing Atomic Red Team tests, or creates new ones, to validate the rule in real time.
CI/CD Integration: Handles Git versioning and deploys validated rules directly to the cluster.
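
To make the coverage gap step concrete, here is a minimal sketch in Python. It assumes each existing rule carries a set of MITRE ATT&CK technique IDs; the `Rule` class and `find_coverage_gaps` function are hypothetical names for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical, simplified view of a detection rule in the repository."""
    name: str
    techniques: set[str] = field(default_factory=set)  # MITRE ATT&CK technique IDs


def find_coverage_gaps(reported: set[str], rules: list[Rule]) -> set[str]:
    """Return technique IDs from new intel that no existing rule is tagged with."""
    covered: set[str] = set()
    for rule in rules:
        covered |= rule.techniques
    return reported - covered


# Illustrative data only: two existing rules and two techniques from new intel.
existing_rules = [
    Rule("Suspicious PowerShell Encoded Command", {"T1059.001"}),
    Rule("LSASS Memory Access", {"T1003.001"}),
]
gaps = find_coverage_gaps({"T1059.001", "T1547.001"}, existing_rules)
print(sorted(gaps))  # ['T1547.001']
```

Only the techniques in `gaps` would need a new rule; anything already tagged on an existing rule is skipped to avoid duplicate work.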

🛠️ How I built it

The core of the project is built on the Elastic AI Agent Builder, acting as the "brain." To give the agent "hands," I utilized the Model Context Protocol (MCP) to interface with:

Custom MCP Tools: Built to orchestrate rules on the production cluster and to bridge the MCP stdio transport to HTTP.
GitHub Tooling: To manage pull requests and version control for detection-as-code.
Execution Engine: A tool that interfaces with local/cloud environments to run Atomic Red Team scripts.
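
As a rough illustration of the stdio-to-HTTP idea, the sketch below reads newline-delimited JSON-RPC 2.0 messages (the framing MCP uses over stdio) and hands each one to a `send` callable. Everything here is an assumption for illustration: `bridge_stdio`, `fake_send`, and the `deploy_rule` tool name are hypothetical, `send` stands in for a real HTTP POST, and notifications (messages without an `id`) are not treated specially.

```python
import io
import json
from typing import Callable, TextIO


def bridge_stdio(stdin: TextIO, stdout: TextIO, send: Callable[[dict], dict]) -> int:
    """Forward newline-delimited JSON-RPC messages from stdin through `send`.

    `send` is a hypothetical stand-in for an HTTP POST to a remote MCP server.
    Returns the number of messages that were forwarded.
    """
    forwarded = 0
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        try:
            request = json.loads(line)
        except json.JSONDecodeError:
            # JSON-RPC 2.0 parse error; the request id is unknown, so it is null.
            response = {
                "jsonrpc": "2.0",
                "id": None,
                "error": {"code": -32700, "message": "Parse error"},
            }
        else:
            response = send(request)
            forwarded += 1
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
    return forwarded


def fake_send(request: dict) -> dict:
    """Test double: echoes the tool name instead of calling a real server."""
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "result": {"tool": request["params"]["name"]},
    }


# Hypothetical tool call; "deploy_rule" is not a real tool name from the project.
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "deploy_rule", "arguments": {"rule_id": "example"}},
}
out = io.StringIO()
count = bridge_stdio(io.StringIO(json.dumps(call) + "\n"), out, fake_send)
print(count, out.getvalue().strip())
```

Passing `send` in as a parameter keeps the framing logic testable without a network; a real bridge would swap `fake_send` for a function that posts to the server's HTTP endpoint.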

🚧 Challenges I ran into

The "Orchestration" Problem: Wiring disparate MCP tools together so the output of the "Researcher" tool correctly fed into the "Developer" tool required rigorous state management.
Repeatability: LLMs can be non-deterministic. Ensuring the agent followed the specific schema requirements of the Elastic Common Schema (ECS) every single time required deep prompt engineering and iterative schema validation loops.
Safe Execution: Automating Atomic Red Team tests requires a "sandbox first" approach to ensure the agent doesn't inadvertently disrupt production telemetry.
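
The iterative schema validation loop mentioned under Repeatability could look roughly like this. It is a sketch under stated assumptions: the four required fields are a deliberately tiny, assumed subset of a rule schema, `generate` is a hypothetical stand-in for the LLM call, and `stub_generator` only simulates a model that corrects itself after feedback.

```python
from typing import Callable

# Assumed minimal set of required fields; a real rule schema is much richer.
REQUIRED_FIELDS = {"name": str, "query": str, "severity": str, "risk_score": int}
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


def validate_rule(rule: dict) -> list[str]:
    """Return a list of human-readable schema errors (empty when valid)."""
    errors = []
    for key, expected in REQUIRED_FIELDS.items():
        if key not in rule:
            errors.append(f"missing field: {key}")
        elif not isinstance(rule[key], expected):
            errors.append(f"{key} must be {expected.__name__}")
    severity = rule.get("severity")
    if isinstance(severity, str) and severity not in ALLOWED_SEVERITIES:
        errors.append(f"severity must be one of {sorted(ALLOWED_SEVERITIES)}")
    return errors


def generate_until_valid(
    generate: Callable[[list[str]], dict], max_attempts: int = 3
) -> dict:
    """Call `generate` with the previous errors until the rule validates."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        rule = generate(feedback)
        feedback = validate_rule(rule)
        if not feedback:
            return rule
    raise ValueError(f"rule still invalid after {max_attempts} attempts: {feedback}")


def stub_generator(feedback: list[str]) -> dict:
    """Stand-in for the LLM: omits risk_score until told it is missing."""
    rule = {
        "name": "Encoded PowerShell",
        "query": "process.name: powershell.exe",
        "severity": "high",
    }
    if "missing field: risk_score" in feedback:
        rule["risk_score"] = 73
    return rule


rule = generate_until_valid(stub_generator)
print(rule["risk_score"])  # 73, produced on the second attempt
```

Capping the number of attempts matters here: a non-deterministic model may never converge, and failing loudly is safer than shipping a rule that does not match the schema.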

🏆 Accomplishments that I'm proud of

End-to-End Autonomy: Seeing the agent identify a threat, write a rule, test it against a simulated attack, and open a GitHub PR—all without manual intervention—was a "eureka" moment.
Efficiency Gains: What used to take 2–3 hours of research and testing now happens in under 20 minutes.

📚 What I learned

The Power of MCP: I realized that MCP is the "connective tissue" that will likely define the next generation of security operations.
Context is King: I learned that an LLM is only as good as the metadata you provide. Writing effective "System Contexts" for security agents is a specialized skill in itself.

🔮 What's next for Detection Surfer

Self-Healing Detections: Enabling the agent to automatically tune "noisy" rules by analyzing historical False Positive rates.
Exceptions Automation: Automating the creation of rule exceptions.
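
One way the noisy-rule analysis might start is by computing a false positive rate per rule from historical alert dispositions. This is only a sketch of the idea, not a planned implementation: the `(rule_name, disposition)` input shape, the `"false_positive"` label, and both thresholds are assumptions chosen for illustration.

```python
from collections import Counter


def noisy_rules(
    alerts: list[tuple[str, str]],
    threshold: float = 0.8,
    min_alerts: int = 5,
) -> dict[str, float]:
    """Return {rule_name: false_positive_rate} for rules at or above `threshold`.

    Rules with fewer than `min_alerts` alerts are skipped, since a rate
    computed from a handful of alerts is not a reliable signal.
    """
    totals: Counter[str] = Counter()
    false_positives: Counter[str] = Counter()
    for rule_name, disposition in alerts:
        totals[rule_name] += 1
        if disposition == "false_positive":
            false_positives[rule_name] += 1

    result = {}
    for rule_name, total in totals.items():
        rate = false_positives[rule_name] / total
        if total >= min_alerts and rate >= threshold:
            result[rule_name] = rate
    return result


# Illustrative history: Rule A is mostly noise, Rule B is mostly signal,
# and Rule C has too few alerts to judge.
history = (
    [("Rule A", "false_positive")] * 9 + [("Rule A", "true_positive")]
    + [("Rule B", "false_positive")] * 2 + [("Rule B", "true_positive")] * 8
    + [("Rule C", "false_positive")] * 2
)
print(noisy_rules(history))  # {'Rule A': 0.9}
```

Rules flagged this way would be candidates for tuning or for an exception, with the final change still going through the same validation and pull request flow as a new rule.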

Built With

json
python
terraform

Updates

Filip Žagar started this project — Feb 26, 2026 01:23 PM EST
