🌊 Detection Surfer


Autonomous Threat Research & Detection Engineering Lifecycle
💡 Inspiration

As a Detection Engineer, I found myself caught in a loop of repetitive manual tasks: scouring threat intel, checking if we already had coverage, mapping TTPs to the MITRE ATT&CK® framework, and manually writing/testing YAML rules.

Detection Surfer was born from a simple question: Can an AI agent handle the "grunt work" of the detection lifecycle so I can focus on high-level strategy? I wanted to push the boundaries of the Model Context Protocol (MCP) to see if an LLM could not only "suggest" rules but actually execute the entire engineering pipeline.

🚀 What it does

Detection Surfer is an end-to-end autonomous agent that handles the heavy lifting of a SOC content team. It performs:

  • Threat Intel Synthesis: Scrapes and summarizes new TTPs.

  • Coverage Gap Analysis: Queries existing rule repositories to ensure we aren't duplicating work.

  • Automated Rule Authoring: Generates high-fidelity detection logic (SIEM/Sigma) with schema validation.

  • Adversary Emulation: Automatically triggers or creates Atomic Red Team tests to validate the rule in real-time.

  • CI/CD Integration: Handles Git versioning and deploys validated rules directly to the cluster.
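To make the "Automated Rule Authoring" step concrete, here is a minimal sketch of the kind of schema gate a generated rule has to pass before it moves on; the required-field list and function names are illustrative, not the actual implementation (rules are shown as parsed dicts rather than raw YAML to keep the sketch self-contained).

```python
# Minimal sketch of a schema-validation gate for a generated Sigma-style
# rule. The required fields and checks here are illustrative assumptions.

REQUIRED_FIELDS = {"title", "logsource", "detection"}

def validate_sigma_rule(rule: dict) -> list[str]:
    """Return a list of validation errors (empty means the rule passes)."""
    errors = [f"missing required field: {f}"
              for f in sorted(REQUIRED_FIELDS - rule.keys())]
    detection = rule.get("detection", {})
    if detection and "condition" not in detection:
        errors.append("detection block has no condition")
    return errors

# Example: a rule the agent might emit for a PowerShell download cradle.
rule = {
    "title": "Suspicious PowerShell Download Cradle",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection": {"CommandLine|contains": "DownloadString"},
        "condition": "selection",
    },
}

print(validate_sigma_rule(rule))  # → []
```

A rule that fails the gate gets its error list fed back to the authoring step instead of being deployed.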

🛠️ How I built it

The core of the project is built on the Elastic AI Agent Builder, acting as the "brain." To give the agent "hands," I utilized the Model Context Protocol (MCP) to interface with:

  • Custom MCP Tools: Built to orchestrate rules on the production cluster and to translate MCP stdio transport to HTTP.

  • GitHub Tooling: To manage pull requests and version control for detection-as-code.

  • Execution Engine: A tool that interfaces with local/cloud environments to run Atomic Red Team scripts.
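The stdio-to-HTTP translation mentioned above boils down to reframing messages: MCP's stdio transport exchanges newline-delimited JSON-RPC, so the bridge lifts each line into an HTTP request body and back. Below is a standard-library-only sketch of that core step; function names are illustrative, and the real bridge also has to manage the server subprocess and HTTP endpoint.

```python
import json

# Sketch of the framing half of an MCP stdio-to-HTTP bridge: the stdio
# transport carries newline-delimited JSON-RPC messages, so each line maps
# to one HTTP request/response body. Names here are illustrative.

def stdio_line_to_http_body(line: str) -> bytes:
    """Parse one newline-delimited JSON-RPC message and re-serialize it
    as a compact HTTP request body."""
    message = json.loads(line)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    return json.dumps(message, separators=(",", ":")).encode("utf-8")

def http_body_to_stdio_line(body: bytes) -> str:
    """Inverse direction: HTTP response body back to one stdio line."""
    return json.dumps(json.loads(body), separators=(",", ":")) + "\n"

raw = '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}\n'
print(stdio_line_to_http_body(raw))
```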

🚧 Challenges I ran into
  • The "Orchestration" Problem: Wiring disparate MCP tools together so that the output of the "Researcher" tool feeds correctly into the "Developer" tool required rigorous state management.

  • Repeatability: LLMs can be non-deterministic. Ensuring the agent followed the specific schema requirements of the Elastic Common Schema (ECS) every single time required deep prompt engineering and iterative schema validation loops.

  • Safe Execution: Automating Atomic Red Team tests requires a "sandbox first" approach to ensure the agent doesn't inadvertently disrupt production telemetry.
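The iterative validation loop from the Repeatability challenge can be sketched as: generate, validate, feed the errors back, retry a bounded number of times. In this sketch `generate` and `validate` are toy stand-ins for the LLM call and the ECS check, not the actual implementation.

```python
# Sketch of the generate → validate → retry loop used to keep LLM output
# schema-conformant. `generate` stands in for the LLM call and `validate`
# for the ECS/Sigma check; both are illustrative.

def generate_until_valid(generate, validate, max_attempts=3):
    """Call generate(feedback) until validate returns no errors,
    feeding the error list back into the next attempt."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        feedback = validate(candidate)
        if not feedback:
            return candidate, attempt
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {feedback}")

# Toy stand-ins: the generator only adds the field once told it is missing.
def toy_generate(feedback):
    rule = {"title": "demo"}
    if any("event.category" in e for e in feedback):
        rule["event.category"] = "process"
    return rule

def toy_validate(rule):
    return [] if "event.category" in rule else ["missing ECS field: event.category"]

rule, attempts = generate_until_valid(toy_generate, toy_validate)
print(attempts)  # → 2
```

Bounding the attempts matters: a non-deterministic generator with no cap can loop indefinitely, whereas a bounded loop fails loudly and hands control back.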

🏆 Accomplishments that I'm proud of
  • End-to-End Autonomy: Seeing the agent identify a threat, write a rule, test it against a simulated attack, and open a GitHub PR—all without manual intervention—was a "eureka" moment.

  • Efficiency Gains: What used to take 2–3 hours of research and testing now happens in under 20 minutes.

📚 What I learned
  • The Power of MCP: I realized that MCP is the "connective tissue" that will likely define the next generation of security operations.

  • Context is King: I learned that an LLM is only as good as the metadata you provide. Writing effective "System Contexts" for security agents is a specialized skill in itself.

🔮 What's next for Detection Surfer
  • Self-Healing Detections: Enabling the agent to automatically tune "noisy" rules by analyzing historical False Positive rates.
  • Exceptions Automation: Letting the agent draft and manage rule exceptions for recurring benign activity.
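One way the self-healing idea could start, sketched under an assumed alert-history shape: compute each rule's historical false-positive rate and flag rules above a tuning threshold. The record format and threshold are assumptions for illustration.

```python
from collections import Counter

# Sketch of the self-healing triage step: rank rules by historical
# false-positive rate and flag tuning candidates. The alert record shape
# and the 0.8 threshold are assumptions, not the actual implementation.

def noisy_rules(alerts, threshold=0.8):
    """alerts: iterable of (rule_id, disposition) pairs, disposition being
    'true_positive' or 'false_positive'. Returns rule_ids whose FP rate
    meets the threshold, sorted for stable output."""
    totals, fps = Counter(), Counter()
    for rule_id, disposition in alerts:
        totals[rule_id] += 1
        if disposition == "false_positive":
            fps[rule_id] += 1
    return sorted(r for r in totals if fps[r] / totals[r] >= threshold)

history = [
    ("rule_psexec", "false_positive"),
    ("rule_psexec", "false_positive"),
    ("rule_psexec", "false_positive"),
    ("rule_psexec", "false_positive"),
    ("rule_psexec", "true_positive"),
    ("rule_mimikatz", "true_positive"),
]
print(noisy_rules(history))  # → ['rule_psexec']
```

Flagged rules would then be handed back to the agent for tuning or an exception proposal rather than disabled outright.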
