MAD (Malicious Agent Detection)

Inspiration

I came into this hackathon as someone relatively new to the security space, with a general goal: build something to help defend against AI-driven attacks. The reasoning felt intuitive. Everyone is talking about AI-powered attacks as the defining emerging threat in cybersecurity, and I assumed there would already be datasets, benchmarks, or synthetic data showing what such an attack actually looks like. I was surprised to find that nothing like this existed publicly.

That gap became the first part of the project. Community collaboration is the single greatest advantage defenders have over attackers, and having no shared data resource for the most significant emerging threat in the field is itself a vulnerability. MABE (Malicious Agent Behavior Emulator) was built to address that directly. As I worked through the primary sources to construct it, reading reports describing what AI agents actually did during documented attacks, the specific behavioral signatures became clear: machine-speed timing, exhaustive network enumeration, and credential harvest followed by rapid privilege escalation. These became the three detection mechanisms in the detector.

What it does

The Malicious Agent Detector is a real-time AI-driven attack detection system built on Splunk Enterprise Security. It has two components. MABE generates synthetic but empirically grounded attack data: 200 benign user sessions interleaved with 5 AI-driven attack sessions, producing 24,843 labeled events ingested into Splunk. The detector then runs against that data entirely blind to the labels.

Three behavioral detection mechanisms evaluate every session. The velocity mechanism identifies machine-speed inter-event timing statistically inconsistent with human operation. The enumeration mechanism identifies exhaustive access across anomalously many destinations, crossing network segments inconsistent with the account's historical baseline. The privilege escalation mechanism identifies credential harvest indicators followed rapidly by successful authentication to high-privilege nodes. Each mechanism produces a confidence score, and the three scores are combined into a weighted overall confidence score per session.
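As a rough illustration of how the three per-mechanism scores might be combined, the sketch below computes a weighted average. The weights, the function name, and the 0-to-1 score range are all assumptions made for illustration; they are not the values the detector actually uses.

```python
from typing import Dict

# Hypothetical weights, chosen only for this example.
ASSUMED_WEIGHTS: Dict[str, float] = {
    "velocity": 0.4,
    "enumeration": 0.3,
    "privilege_escalation": 0.3,
}


def combine_confidence(
    scores: Dict[str, float],
    weights: Dict[str, float] = ASSUMED_WEIGHTS,
) -> float:
    """Weighted average of per-mechanism scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # A mechanism with no score for this session contributes 0.
    weighted = sum(weights[name] * scores.get(name, 0.0) for name in weights)
    return weighted / total_weight


# Example session: strong velocity signal, moderate enumeration, no escalation.
session_scores = {"velocity": 0.9, "enumeration": 0.5, "privilege_escalation": 0.0}
overall = combine_confidence(session_scores)
```

Normalizing by the total weight keeps the overall score in the same range as the inputs even if the weights are later retuned and no longer sum to one.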

Alerted sessions are then passed to a triage agent: a single Claude API call with all alerts as context, reasoning comparatively to assign priority classifications and per-session reasoning. A human-in-the-loop investigation stage follows, where the analyst selects sessions to investigate from a ranked menu, receives signal-grounded SPL query recommendations, executes them live against Splunk, and receives plain-English summaries of the findings. Everything lands in Splunk ES Incident Review, enriched with detection scores, agent reasoning, priority classification, investigation findings, and drilldown SPL.

How I built it

The architecture of the detection mechanisms was shaped by something I heard in a conversation between Dwarkesh Patel and Ron Minsky from Jane Street, where the question came up of when it makes sense to run large expensive models versus simpler ones. The answer was essentially: think across time horizons. The faster a decision needs to be made, the simpler and cheaper the mechanism should be. I thought this principle translated directly into a cybersecurity context, and it becomes even more important given that AI-driven attacks compress the time defenders have to respond. For enterprises processing millions of events across large networks, simply throwing compute at the problem is not a solution. You need a funnel: fast and cheap mechanisms at the bottom that only escalate when something genuinely warrants it, with AI inference reserved for the top.

Each of the three detection mechanisms has three gated layers. Layer 1 is always a single-pass statistic against a dynamic population threshold. Layer 2 increases specificity at moderate cost. Layer 3 is the most computationally expensive and the most specific. Layer 2 does not run unless Layer 1 fires. Layer 3 does not run unless Layer 2 fires. AI inference is applied only at the top of the funnel, not for detection but for triage and investigation. The design philosophy throughout is to rely on deterministic mechanisms wherever possible and apply AI only where its comparative and interpretive capabilities are genuinely needed. In a high-stakes domain like security, that determinism matters: the detector produces the same output every time given the same input. The agent adds reasoning and accessibility on top of that foundation; it does not replace it.
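The gating described above can be sketched in a few lines. This is a simplified illustration, not the project's code: the layer functions, the fixed thresholds (the real Layer 1 uses a dynamic population threshold), and the `trace` list are all hypothetical, and the example uses velocity-style timing checks only because they are easy to show.

```python
import statistics
from typing import Callable, List, Sequence

Layer = Callable[[Sequence[float]], bool]
trace: List[str] = []  # records which layers actually ran (illustration only)


def run_gated_layers(timestamps: Sequence[float], layers: List[Layer]) -> int:
    """Run layers in order of increasing cost; stop at the first that does not fire.

    Returns how many layers fired (0 to len(layers)).
    """
    fired = 0
    for layer in layers:
        if not layer(timestamps):
            break  # more expensive layers are never evaluated
        fired += 1
    return fired


def layer1_mean_gap(ts: Sequence[float]) -> bool:
    """Cheap check: mean inter-event gap under an assumed 0.5 s threshold."""
    trace.append("L1")
    if len(ts) < 2:
        return False
    return (ts[-1] - ts[0]) / (len(ts) - 1) < 0.5


def layer2_low_jitter(ts: Sequence[float]) -> bool:
    """Moderate cost: gaps are nearly constant (assumed 0.05 s spread)."""
    trace.append("L2")
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return len(gaps) >= 2 and statistics.pstdev(gaps) < 0.05


def layer3_placeholder(ts: Sequence[float]) -> bool:
    """Stand-in for the most expensive, most specific check."""
    trace.append("L3")
    return True


layers = [layer1_mean_gap, layer2_low_jitter, layer3_placeholder]
human_fired = run_gated_layers([0.0, 3.1, 9.4, 12.0], layers)
machine_fired = run_gated_layers([0.0, 0.1, 0.2, 0.3, 0.4], layers)
```

For the human-paced session only the first layer runs, so the cost of the later layers is paid only for the small fraction of sessions that pass the cheap check.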

Challenges I ran into

The most immediate challenge was that I was building in a space I had no prior experience in. Splunk was entirely new to me, as was the broader landscape of security operations, SIEM tools, detection engineering, and incident response workflows. I had to rapidly absorb terminology, concepts, and industry standards while simultaneously building against them. Understanding what a notable event is, how Splunk ES Incident Review works, what an analyst actually needs from an alert, what SPL is and how to write it effectively, and how baselines and dynamic thresholds are typically approached in practice were all things I was learning in parallel with building.

On the technical side, integrating with Splunk surfaced a number of platform-specific constraints that weren't visible until I ran into them. The KV Store baseline approach failed silently because the lookup definition wasn't automatically registered despite the collection being created successfully, requiring a CSV workaround. Investigation queries returned zero results with no error until I understood that MABE events are stored as raw JSON in Splunk's _raw field and require spath extraction before any field references will resolve. These were the kinds of problems that only appear when you actually run the system end to end.
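The `spath` issue can be illustrated with a small helper that builds an investigation query. The helper, the index name, and the field names (`session_id`, `dest`) are hypothetical and exist only for this example; the point is simply that the `spath` step comes before any filter on a JSON field.

```python
def build_investigation_query(index: str, session_id: str) -> str:
    """Build an SPL search that extracts JSON fields from _raw before filtering.

    Field names here are assumptions, not the actual MABE schema.
    """
    # Escape backslashes and double quotes so the value stays inside the SPL string.
    safe_id = session_id.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"search index={index} "
        "| spath input=_raw "  # extract fields from the raw JSON event
        f'| search session_id="{safe_id}" '
        "| table _time, session_id, dest"
    )


query = build_investigation_query("mabe", "s-001")
```

Without the `spath` stage, a filter such as `session_id="s-001"` would match nothing when the field only exists inside the raw JSON, which is consistent with the silent zero-result behavior described above.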

Accomplishments that I'm proud of

I had never produced synthetic data before, never worked with simulated network topologies, and never built a cybersecurity tool. Producing something that does all three and integrates with a platform I had no prior experience with, in a single hackathon, is something I'm genuinely proud of.

I'm also proud of MABE as a standalone contribution. Building a synthetic dataset generator for AI-driven attacks with every behavioral parameter traceable to a published empirical source feels meaningful independent of the detector built on top of it. The gap it addresses is real, and the hope is that it can serve as a foundation for broader community work as this threat continues to evolve.

The funnel architecture is something I'm particularly proud of as an insight. Using tiered detection layers ordered by computational cost, with AI inference reserved for the top, may not be an entirely new idea in cybersecurity, but arriving at it independently from first principles and building it correctly felt like a genuinely strong architectural decision. It is both more efficient and more defensible than applying AI to the full detection problem.

What I learned

Working in an entirely unfamiliar domain under time pressure taught me something about how to scope problems quickly. When you cannot afford to become an expert before building, you have to identify the minimum viable understanding needed to make a correct architectural decision, build to that, and iterate. That forced prioritization produced some of the best decisions in this project and some of the worst, and knowing the difference in retrospect is itself a lesson.

I also came away with a much deeper appreciation for how much of security tooling is built for human-speed threats. The design space for AI-speed threats is genuinely open, and the lack of public data is both the biggest obstacle and the clearest opportunity.

What's next for MAD

The most significant limitation of this project is one I want to be transparent about. I built detection mechanisms around the exact characteristics I also built into MABE, which creates a risk of circularity. The detector performs well on MABE not necessarily because it would generalize to real AI-driven attacks, but because both were designed from the same inferences drawn from the same primary sources. The solution is collaboration with practitioners who have actually observed AI-driven attacks, and testing against sandboxed AI systems conducting attacks against simulated infrastructure.

The architecture is designed with this evolution in mind. The core detection layer is platform-neutral and mechanism-agnostic, and MABE's behavioral parameters are all configurable and traceable to sources. The hope is that this project serves as a starting point for a broader community conversation as more primary data becomes available.

Built With

anthropic
networkx
numpy
python
splunk

Updates

Luca Popescu started this project — Jun 15, 2026 04:04 AM EDT
