Inspiration

Inspired by the call for reacting at machine speed: "An AI-powered adversary can go from initial access to full domain control in under 8 minutes. Meanwhile, a human incident responder is still pulling up their toolkit. That gap is the most dangerous problem in cybersecurity."

🎬 What It Does

DuckTracy leverages DuckDB from the world of data science and SIFT from the world of forensics to deliver autonomous, multi-agent automation of end-to-end digital forensics and incident response (DFIR). With a nod to Daffy Duck's iconic sleuth character, and building on the SANS SIFT toolset and a carefully selected set of open source companions, DuckTracy orchestrates more than 200 professional-grade forensic tools to triage disk images, memory captures, registry hives and other artifacts, correlating complex timelines and exposing insider threats and attacker activity at machine speed.

This was built for speed, by agents, for agents, to help them cope with the unique challenges of forensics while taking advantage of the capabilities of AI. AI was born from data science. Bringing data science tools into forensics, bringing forensic data into data science tools, and marrying both with autonomous agents creates a new forensic harness for extremely rapid case investigation.

🛠️ How We Built It

We built DuckTracy using a Hybrid Funnel Architecture that bridges traditional forensic tools with high-speed analytical engines. Instead of forcing agents to learn forensic tooling, parse custom outputs and wade through gigabytes of data, we brought the data to the agents in a modern pipeline inspired by data science advances, making use of DuckDB and its support for common formats like JSONL and Parquet:

  • The SIFT Container: Heavy-lifting forensic extractions (Sleuthkit, Plaso) run in a dedicated SIFT Docker container.
  • The Parquet Pipeline: We built custom ingestion helpers, including rdump_to_parquet.py (converting Dissect target-query streams to Parquet) and psteal_parquet.py (converting Plaso super-timelines to Parquet).
  • The Local Query Layer: Sub-agents run native arm64 DuckDB queries against the Parquet tables, processing millions of filesystem and registry events in milliseconds.
  • The Blackboard Pattern: To solve sub-agent amnesia, we implemented a centralized Case Report (cases/ROCBA/docs/case_report.md) and Shared Facts file (cases/ROCBA/docs/shared_facts.md) that serve as a shared brain.
  • Concurrent Context-Sharing: We built helpers/ioc_tracker.py to write indicators to an append-only iocs.jsonl file. This allows parallel sub-agents to safely register and query IOCs simultaneously without write-concurrency conflicts. DuckDB makes it easy to join .jsonl, .parquet, .sqlite and other formats in combined queries.
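
The append-only idea behind the IOC tracker can be sketched in a few lines of Python. This is a minimal illustration, not the actual helpers/ioc_tracker.py: the function names, the record fields (ts, type, value, agent, note) and the file path are all assumptions made for the example.

```python
import json
import os
import time
from pathlib import Path


def add_ioc(path, ioc_type, value, source_agent, note=""):
    """Append one indicator as a single JSON line (hypothetical record layout)."""
    record = {
        "ts": time.time(),
        "type": ioc_type,
        "value": value,
        "agent": source_agent,
        "note": note,
    }
    line = json.dumps(record, sort_keys=True) + "\n"
    # O_APPEND positions every write at the current end of the file, so
    # writers add records instead of overwriting each other. Each record is
    # handed to the OS in a single os.write call.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)
    return record


def find_iocs(path, ioc_type=None):
    """Stream the file and return records, optionally filtered by type."""
    p = Path(path)
    if not p.exists():
        return []
    results = []
    with p.open("r", encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if not raw:
                continue
            rec = json.loads(raw)
            if ioc_type is None or rec["type"] == ioc_type:
                results.append(rec)
    return results
```

Because readers only ever stream complete lines and writers only ever append, no agent needs to rewrite the file. How well small appends hold up under heavy parallel load depends on the filesystem, so treat this as a sketch rather than a locking guarantee.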

Architecture:

See full diagram at: https://github.com/jeffbryner/find_evil_hackathon/blob/main/ARCHITECTURE.md

Agent Harness: forgecode.dev

  • Built in Rust for memory safety.
  • Allows for simple agent creation via markdown files, tools and skills.
  • Supports any LLM provider and model.
  • Can mix and match the case orchestrator provider/model (Gemini Pro, for example) and the sub-agent provider/model (a sniper agent using Gemini Flash, for example)
  • Can execute agents as tools, leading to parallel orchestration of multiple agents at once
  • Uses standard skills files
  • Easy to constrain agents (number of turns allowed, etc.) to control resource usage
  • Simple terminal interface with native zsh integration; sessions can easily be recorded with asciinema for audits

Data Format: parquet/duckdb

  • DuckDB is a favorite tool for data scientists due to its speed and wide support for data formats
  • In memory execution, no server required, extremely fast on Apple Silicon
  • SQL is very well known to AI
  • Parquet is extremely efficient
  • DuckDB can dynamically concatenate parquet files across directories into a single view allowing ad hoc evidence additions
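
One way to get the "single view across directories" behavior is a view over a recursive glob. The sketch below only builds and runs the SQL; the view name, the directory layout and the helper names are assumptions, and `con` is expected to be a DuckDB connection (anything with an execute method works).

```python
def build_evidence_view_sql(view_name, evidence_root):
    """Return SQL exposing every Parquet file under evidence_root as one view."""
    if not view_name.isidentifier():
        raise ValueError(f"unsafe view name: {view_name!r}")
    # Escape single quotes so the path is a valid SQL string literal.
    glob = evidence_root.rstrip("/").replace("'", "''") + "/**/*.parquet"
    # union_by_name aligns columns by name, so files with extra or missing
    # columns can still be combined; filename adds the source file per row.
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT * FROM read_parquet('{glob}', union_by_name = true, filename = true)"
    )


def refresh_evidence_view(con, view_name, evidence_root):
    """Run the statement on a DuckDB connection and return the SQL used."""
    sql = build_evidence_view_sql(view_name, evidence_root)
    con.execute(sql)
    return sql
```

Since a view stores the query rather than the data, Parquet files dropped into the directory later should be picked up the next time the view is queried, which is what makes ad hoc evidence additions cheap.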

Tool execution: CLI over MCP

After every session we asked the AI how well the environment helped its investigation: what worked and what could be improved. The AI consistently preferred command line tools it already knew over custom tool sets presented via MCP. Agents are given simple CLI utilities: query_parquet.py, which saves them from re-creating Parquet views, ioc_tracker.py, which records ad hoc IOCs that other agents can use, and so on. Agents may also execute shell commands in the SIFT Docker container via docker exec, which they consistently rate as feeling native and comfortable, and which lets them draw on their innate knowledge of common shell and forensic tools.

Agent quotes:

  • NPS / Feedback: 10/10. The query_parquet.py script and DuckDB backend are incredibly fast and made timeline analysis of 2,600+ browser entries and 1.8M+ file events seamless.
  • NPS / Feedback: 10/10. The Parquet timelines combined with DuckDB allow for incredibly fast and highly precise forensic querying. WordWheelQuery was an absolute goldmine of user intent. Filtering out system noise was essential to pinpointing the insider threat activity.

SIFT as docker

We chose to package SIFT into a Docker container. This lets us use the tools in SIFT while presenting them to AI as ordinary command line tools via docker exec. Containers are named after cases, so command execution is as straightforward as:

docker exec ROCBA pwd
/home/sansforensics

Host disks are mounted in the container at read-only and read-write locations to protect source images while allowing "scratch" work during artifact gathering.

This allows easy access to mounted evidence from the container:

"command": "docker exec SRL2015 cat \"/mnt/windows/win2008R2-controller-c-drive/Program Files (x86)/hMailServer/Data/stark-research-labs.com/nromanoff/63/{63904A78-563D-4DA6-BF03-820E3E1EA08D}.eml\"",
"description": "Read a phishing email from nromanoff's mailbox."
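
Evidence paths like the one above contain spaces, parentheses and braces, which is where shell quoting tends to go wrong. A small wrapper that passes arguments as a list avoids the problem. This is a sketch under assumptions: the helper names are hypothetical and are not part of DuckTracy or the forge harness.

```python
import shlex
import subprocess


def docker_exec_argv(case_container, command):
    """Build the argv for `docker exec <container> <command...>`."""
    if not command:
        raise ValueError("command must contain at least one argument")
    return ["docker", "exec", case_container, *command]


def audit_line(argv):
    """Render argv as a copy/paste-safe shell line for an audit trail."""
    return shlex.join(argv)


def run_in_case(case_container, command, timeout=60):
    """Run a command in the case container and return (exit_code, stdout)."""
    argv = docker_exec_argv(case_container, command)
    # Passing a list (no shell=True) means spaces, parentheses and braces in
    # evidence paths reach the container untouched.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout
```

For example, run_in_case("ROCBA", ["pwd"]) would execute the same command shown earlier, and audit_line gives a quoted string that can be pasted back into a terminal to reproduce the call.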

It also allows AI to rapidly join artifacts together to answer what are usually complex questions. Here's an example the AI generated for its skill files:

Correlating Execution with File Activity

What Registry or Event Log entries occurred within 10 seconds of a specific file being created?

SELECT 
    art.timestamp, 
    art.message, 
    fs.message as file_name,
    fs.details->>'Type' as macb
FROM artifacts_timeline art
JOIN fs_timeline fs 
  ON art.timestamp BETWEEN fs.timestamp - INTERVAL '10 seconds' 
                       AND fs.timestamp + INTERVAL '10 seconds'
WHERE fs.file_name_lower ILIKE '%.ps1'
  AND macb ILIKE '%c%';
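
To make the join semantics concrete, here is the same window logic written in plain Python over toy rows. It only illustrates what the query computes and is not how DuckDB executes it; the dictionary keys (timestamp, message, file_name_lower, macb) are assumptions that flatten the columns used above.

```python
from datetime import datetime, timedelta


def correlate(artifacts, file_events, window_seconds=10):
    """Pair each artifact with file events within +/- window_seconds."""
    window = timedelta(seconds=window_seconds)
    matches = []
    for fs in file_events:
        # Same filters as the WHERE clause: PowerShell scripts whose MACB
        # string contains a "c" flag, matched case-insensitively.
        if not fs["file_name_lower"].lower().endswith(".ps1"):
            continue
        if "c" not in fs["macb"].lower():
            continue
        for art in artifacts:
            # BETWEEN is inclusive on both ends, hence <=.
            if abs(art["timestamp"] - fs["timestamp"]) <= window:
                matches.append(
                    (art["timestamp"], art["message"], fs["file_name_lower"])
                )
    return sorted(matches)
```

The nested loop is quadratic, which is fine for a toy example and is exactly the kind of work that is better left to the query engine on real timelines.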

Workflow

Given a mounted disk, pcap file or memory image, DuckTracy's lead case agent coordinates specialized, stateless sub-agents (data-analyst and sniper-forensics) on well-defined missions to:

  1. Inventory & Ingest Evidence: Automatically map and ingest system artifacts into high-speed Parquet tables.
  2. Hunt for Anomalies: Audit process executions, persistence mechanisms, network connections, and logons.
  3. Recover & Carve Data: Locate and carve deleted files (such as Outlook PST archives) or other data as needed.
  4. Correlate Timelines: Reconstruct a unified, chronological timeline of activities.
  5. Expose Threats: Detect sophisticated indicators of compromise (IOCs) and insider threats by recognizing logical discrepancies in the data.
  6. Follow the Leads: Take your direction on further paths to pursue.

💥 Challenges We Ran Into

SIFT gaps

SIFT takes on the huge task of wrangling hundreds of forensic tools into a useful package. However, there are always some that defy packaging. At the time of this writing, installing the most recent release (teamdfir/sift-saltstack@v2026.04.21) results in a SIFT image lacking volatility3, plaso and other tools that fail installation on current Ubuntu/VirtualBox.

We took the approach of starting with SIFT and augmenting as needed. Where there was a choice, we chose native OS installations to take advantage of modern advances like Apple Silicon M-series chips. Where possible, we augmented existing suites like plaso and dissect with targeted additions to support modern data science formats like Parquet.

AI

AI wants to do forensics. AI is good and very fast at individual forensic tasks. AI can also get stuck in loops, run off on fruitless efforts, hallucinate results, ignore evidence, etc.

We built this using AI as a design partner. After iterating on a new version we would ask the AI for its NPS score of the environment: How well did the environment and tooling help you accomplish your investigation? What worked well, and what could use improvement? We fed the results back into the next iteration, with AI building the revision. AI likes SQL over command line or MCP. It "gets it" that it can funnel data into the architecture and use it to solve cases.

AI context window management

We started with one all-powerful agent. Very quickly it ate through its context window, lost the narrative and either ran down endless rabbit holes or discovered common forensic tooling like f-response, declared it malware and called it a day.

Agent endless pursuit

AI is a people pleaser that likes to finish what it starts. Most of the data present in a forensic case is not useful, and some leads don't pan out. This can lead to AI churning as it eats its context, forgets what it has already tried, retries and loops endlessly.

Cost

There is no sponsorship for this work and AI can be a very expensive technology. We have a limited budget compared to the endless experiments that can be run. Multi-case runs vs outcomes, swapping out models for lead/sub agents (Is Opus plus Gemini the magical pairing?), etc. are cost prohibitive but could prove useful.

No defined rubric

The case data presented (VANKO, ROCBA, SRL) is fantastic and very rich. There is no answer key, however, so it is challenging to know if your AI solution is coming to the correct conclusions.

Accomplishments that we're proud of

We are very proud of meeting the challenge: truly autonomous AI agents acting as a team at machine speed, using AI-native tooling augmented with forensic tooling of storied history to solve real cases.

🏆 What We Learned (The Breakthroughs)

We broke up the roles into a case lead and sub-agents with specialties. The forge harness has the ability for agents to call one another as tools, so there is a natural, scalable team created. The case lead asked us to remove shell access to force it to delegate tasks. This also allows it to retain the case context and not get stuck in the weeds of parsing tool call results.

To coordinate we built the concepts of mission cards, shared facts and the ioc tracker. The case lead cannot investigate. It can only delegate and synthesize. It starts by writing an empty case report, and an empty shared facts file. It iterates through missions delegated to sub agents as it sees fit to go from an empty case report to a completed investigation.

Mission cards contain a strictly defined goal with a strict budget of tool calls for a sub-agent to use. This prevents endless looping, preserves context and serves as a record of case activity. Additionally, each mission card has its own audit trail of every tool call for copy/paste reproducibility. Each mission card also includes an agent NPS score so the agent can provide feedback on the mission, tooling, etc.
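
The budget and audit trail ideas can be sketched as a small data structure. This is a hypothetical model for illustration only: the class name, fields and markdown layout are assumptions and do not reflect the actual mission card format.

```python
from dataclasses import dataclass, field
from typing import Optional


class BudgetExhausted(RuntimeError):
    """Raised when a sub-agent tries to exceed its tool-call budget."""


@dataclass
class MissionCard:
    goal: str
    max_tool_calls: int
    audit_trail: list = field(default_factory=list)
    nps_score: Optional[int] = None

    @property
    def remaining(self):
        """Tool calls left before the mission must stop."""
        return self.max_tool_calls - len(self.audit_trail)

    def record_call(self, command, summary=""):
        """Log one tool call, refusing it if the budget is already spent."""
        if self.remaining <= 0:
            raise BudgetExhausted(f"budget of {self.max_tool_calls} calls spent")
        self.audit_trail.append(
            {"n": len(self.audit_trail) + 1, "command": command, "summary": summary}
        )

    def to_markdown(self):
        """Render the card as plain lines suitable for a case record."""
        used = len(self.audit_trail)
        lines = [
            f"Mission: {self.goal}",
            f"Budget: {used}/{self.max_tool_calls} tool calls used",
        ]
        for entry in self.audit_trail:
            lines.append(f"{entry['n']}. {entry['command']}")
        return "\n".join(lines)
```

Raising an exception when the budget runs out forces the sub-agent to stop and report back, which is the behavior that breaks retry loops. The recorded commands double as the copy/paste trail.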

Agents share information between cases using the shared facts markdown file and the ioc tracker jsonl database file. With IOCs in jsonl, duckdb can dynamically join them to evidence in parquet, sqlite etc.

Iterating through cases to find trouble spots and having the same AI agents do a retro was a goldmine of opportunity. The agents recognized their challenges, offered solutions and wrote their own skill files in their own language to help them the next time they encounter a task.

What's next for DuckTracy

We would love the opportunity to follow the trail and see how different combinations of agents and models operate together. Is a lead using Opus with sub-agents using Gemini Flash the winning combination? Is a pure-play "all one provider" setup better? Only experimentation will tell.
