🌊 Detection Surfer
Autonomous Threat Research & Detection Engineering Lifecycle
💡 Inspiration
As a Detection Engineer, I found myself caught in a loop of repetitive manual tasks: scouring threat intel, checking if we already had coverage, mapping TTPs to the MITRE ATT&CK® framework, and manually writing/testing YAML rules.
Detection Surfer was born from a simple question: Can an AI agent handle the "grunt work" of the detection lifecycle so I can focus on high-level strategy? I wanted to push the boundaries of the Model Context Protocol (MCP) to see if an LLM could not only "suggest" rules but actually execute the entire engineering pipeline.
🚀 What it does
Detection Surfer is an end-to-end autonomous agent that handles the heavy lifting of a SOC content team. It performs:
Threat Intel Synthesis: Scrapes and summarizes new TTPs.
Coverage Gap Analysis: Queries existing rule repositories to ensure we aren't duplicating work.
Automated Rule Authoring: Generates high-fidelity detection logic (SIEM/Sigma) with schema validation.
Adversary Emulation: Automatically triggers or creates Atomic Red Team tests to validate the rule in real time.
CI/CD Integration: Handles Git versioning and deploys validated rules directly to the cluster.
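As a rough illustration of the coverage gap step, the check can be thought of as a set difference between the ATT&CK technique IDs found in new intel and the IDs already tagged on existing rules. The `Rule` class, `find_coverage_gaps` function, and sample rule names below are hypothetical and only sketch the idea; they are not the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical, simplified view of a detection rule in the repository."""
    name: str
    techniques: set[str] = field(default_factory=set)  # ATT&CK technique IDs


def find_coverage_gaps(intel_techniques: set[str], rules: list[Rule]) -> set[str]:
    """Return the technique IDs from intel that no existing rule is tagged with."""
    covered: set[str] = set()
    for rule in rules:
        covered |= rule.techniques
    return intel_techniques - covered


# Illustrative data only: two existing rules and two techniques from new intel.
existing_rules = [
    Rule("Suspicious PowerShell Execution", {"T1059.001"}),
    Rule("LSASS Memory Access", {"T1003.001"}),
]
new_intel = {"T1059.001", "T1547.001"}

gaps = find_coverage_gaps(new_intel, existing_rules)
print(sorted(gaps))  # ['T1547.001']
```

Only the techniques in `gaps` would need a new rule; anything already covered is skipped so work is not duplicated.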
🛠️ How I built it
The core of the project is the Elastic AI Agent Builder, which acts as the "brain." To give the agent "hands," I used the Model Context Protocol (MCP) to interface with:
Custom MCP Tools: Built to orchestrate rules on the production cluster and to translate MCP stdio to HTTP.
GitHub Tooling: To manage pull requests and version control for detection-as-code.
Execution Engine: A tool that interfaces with local/cloud environments to run Atomic Red Team scripts.
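To make the stdio-to-HTTP translation concrete, here is a minimal sketch of what such a bridge could look like. MCP's stdio transport carries newline-delimited JSON-RPC 2.0 messages, so the bridge reads one message per line, forwards it, and writes any response back as a single line. The `bridge_stdio_to_http` function and the `post` callable are assumptions made for illustration, not the project's actual tool.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def bridge_stdio_to_http(
    lines: Iterable[str],
    post: Callable[[dict], Optional[dict]],
) -> Iterator[str]:
    """Forward newline-delimited JSON-RPC messages to an HTTP endpoint.

    `post` is a hypothetical callable that sends one message and returns the
    decoded response body, or None when there is nothing to send back
    (for example, for a notification).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            # -32700 is the JSON-RPC 2.0 "Parse error" code.
            error = {"jsonrpc": "2.0", "id": None,
                     "error": {"code": -32700, "message": "Parse error"}}
            yield json.dumps(error)
            continue
        response = post(message)
        if response is not None:
            yield json.dumps(response)


def fake_post(message: dict) -> Optional[dict]:
    """Stand-in for a real HTTP call; notifications (no id) get no reply."""
    if "id" not in message:
        return None
    return {"jsonrpc": "2.0", "id": message["id"],
            "result": {"echo": message["method"]}}


incoming = [
    '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}\n',
    '{"jsonrpc": "2.0", "method": "notifications/initialized"}\n',
]
outgoing = list(bridge_stdio_to_http(incoming, fake_post))
for out in outgoing:
    print(out)
```

In a real bridge, `lines` would be `sys.stdin` and `post` would wrap an HTTP client call to the remote MCP server. Passing the sender in as a callable keeps the framing logic testable without a network.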
🚧 Challenges I ran into
The "Orchestration" Problem: Wiring disparate MCP tools together so the output of the "Researcher" tool correctly fed into the "Developer" tool required rigorous state management.
Repeatability: LLMs can be non-deterministic. Ensuring the agent followed the specific schema requirements of the Elastic Common Schema (ECS) every single time required deep prompt engineering and iterative schema validation loops.
Safe Execution: Automating Atomic Red Team tests requires a "sandbox first" approach to ensure the agent doesn't inadvertently disrupt production telemetry.
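The iterative schema validation loop mentioned under Repeatability can be sketched roughly as follows: generate a rule, validate it, and feed any errors back into the next generation attempt until the rule passes or the attempts run out. The required fields, the `validate_rule` and `author_with_validation` functions, and the stub generator are all assumptions for illustration; a real setup would validate against the actual rule schema and call the LLM in place of the stub.

```python
from typing import Callable

# Assumed for illustration; a real check would use the full rule schema.
REQUIRED_FIELDS = {"name", "query", "severity", "threat"}
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


def validate_rule(rule: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - set(rule))]
    severity = rule.get("severity")
    if severity is not None and severity not in ALLOWED_SEVERITIES:
        errors.append(f"invalid severity: {severity}")
    return errors


def author_with_validation(
    generate: Callable[[list[str]], dict],
    max_attempts: int = 3,
) -> dict:
    """Call `generate` repeatedly, passing back the previous attempt's errors."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        rule = generate(feedback)
        feedback = validate_rule(rule)
        if not feedback:
            return rule
    raise ValueError(f"rule still invalid after {max_attempts} attempts: {feedback}")


attempts: list[list[str]] = []


def stub_generate(feedback: list[str]) -> dict:
    """Stand-in for the LLM: omits `severity` until it receives feedback."""
    attempts.append(list(feedback))
    rule = {
        "name": "Registry Run Key Persistence",
        "query": 'process where process.name == "reg.exe"',
        "threat": ["T1547.001"],
    }
    if feedback:
        rule["severity"] = "medium"
    return rule


rule = author_with_validation(stub_generate)
print(rule["severity"], len(attempts))  # medium 2
```

Bounding the loop with `max_attempts` keeps a misbehaving generator from retrying forever, and raising on failure ensures an invalid rule never reaches the testing or deployment stages.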
🏆 Accomplishments that I'm proud of
End-to-End Autonomy: Seeing the agent identify a threat, write a rule, test it against a simulated attack, and open a GitHub PR, all without manual intervention, was a "eureka" moment.
Efficiency Gains: What used to take 2–3 hours of research and testing now happens in under 20 minutes.
📚 What I learned
The Power of MCP: I realized that MCP is the "connective tissue" that will likely define the next generation of security operations.
Context is King: I learned that an LLM is only as good as the metadata you provide. Writing effective "System Contexts" for security agents is a specialized skill in itself.
🔮 What's next for Detection Surfer
- Self-Healing Detections: Enabling the agent to automatically tune "noisy" rules by analyzing historical False Positive rates.
- Exceptions Automation: Enabling the agent to propose and manage rule exceptions automatically.
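One possible starting point for the self-healing idea is to rank rules by their historical false positive rate and flag the noisiest ones for tuning. The sketch below is only an assumption about how that might look: the `AlertStats` class, the `rules_needing_tuning` function, the thresholds, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertStats:
    """Hypothetical per-rule alert counts over some historical window."""
    rule_name: str
    total_alerts: int
    false_positives: int

    @property
    def fp_rate(self) -> float:
        # Avoid division by zero for rules that never fired.
        return self.false_positives / self.total_alerts if self.total_alerts else 0.0


def rules_needing_tuning(
    stats: list[AlertStats],
    threshold: float = 0.8,
    min_alerts: int = 20,
) -> list[str]:
    """Return names of noisy rules, highest false positive rate first.

    Rules with fewer than `min_alerts` alerts are ignored because their
    rate is based on too little data to act on.
    """
    noisy = [s for s in stats
             if s.total_alerts >= min_alerts and s.fp_rate >= threshold]
    noisy.sort(key=lambda s: s.fp_rate, reverse=True)
    return [s.rule_name for s in noisy]


# Illustrative numbers only.
history = [
    AlertStats("Rule A", total_alerts=100, false_positives=95),
    AlertStats("Rule B", total_alerts=50, false_positives=5),
    AlertStats("Rule C", total_alerts=10, false_positives=10),
]
print(rules_needing_tuning(history))  # ['Rule A']
```

Here "Rule C" has a 100% false positive rate but is skipped because it has too few alerts, which is one way to keep the agent from over-tuning rules on thin evidence.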