- ASCII architecture drawing
- Example in-progress case report
- Sample mission card, pre-mission
- Sample mission card, post-mission
- Sample mission audit report
- Example shared_facts.md file for all agents to share findings and state
- Agent finding an important Google account
- Agent writing its own skill
- Agent retrieves a phone screenshot of a text message and correlates it to the case
- Agent resizes images to fit its tool window, examines them and correlates them to the case
- "Seeing tiny images" as it resizes images to its tool window
- Agent learning how to extract Outlook files
- Agent teaching me LSA secrets decoding tips
- Agent updating a skill after a mission for future accuracy
- Retro with agent to determine the value of artifacts, to go after the good ones
- Co-design retro session
- Agent following the trail of deleted files by file size
- ioc.jsonl at work
- Parquet file size vs native artifact sizes
- Agent showing off the ability to query across artifacts using DuckDB
- Example of forge delegating
- Enough with the skills already
- All that's needed to start the NITROBA case
Inspiration
Inspired by the call for reacting at machine speed: "An AI-powered adversary can go from initial access to full domain control in under 8 minutes. Meanwhile, a human incident responder is still pulling up their toolkit. That gap is the most dangerous problem in cybersecurity."
🎬 What It Does
DuckTracy: Leveraging DuckDB from the world of data science and SIFT from the world of forensics, DuckTracy is an autonomous, multi-agent system for end-to-end digital forensics and incident response (DFIR). With a nod to Daffy Duck's iconic sleuth character, and building on the SANS SIFT toolset and a carefully selected set of open source companions, DuckTracy orchestrates more than 200 professional-grade forensic tools to triage disk images, memory captures, registry hives and other artifacts, correlating complex timelines and exposing insider threats and attacker activity at machine speed.
It was built for speed, by agents and for agents, to help them cope with the unique challenges of forensics while taking advantage of the capabilities of AI. AI was born from data science. Bringing data science tools into forensics, bringing forensic data into data science tools, and marrying both with autonomous agents creates a new forensic harness for extremely rapid case investigation.
🛠️ How We Built It
We built DuckTracy using a Hybrid Funnel Architecture that bridges traditional forensic tools with high-speed analytical engines. Instead of forcing agents to learn forensic tooling and custom output formats and then wade through gigabytes of data, we brought the data to the agents in a modern pipeline inspired by advances in data science, built on DuckDB and its support for common formats like JSONL and Parquet:
- The SIFT Container: Heavy-lifting forensic extractions (Sleuthkit, Plaso) run in a dedicated SIFT Docker container.
- The Parquet Pipeline: We built custom ingestion helpers: rdump_to_parquet.py (converting Dissect target-query streams to Parquet) and psteal_parquet.py (converting Plaso super-timelines to Parquet), etc.
- The Local Query Layer: Sub-agents run native arm64 DuckDB queries against the Parquet tables, processing millions of filesystem and registry events in milliseconds.
- The Blackboard Pattern: To solve sub-agent amnesia, we implemented a centralized Case Report (cases/ROCBA/docs/case_report.md) and Shared Facts file (cases/ROCBA/docs/shared_facts.md) that serve as a shared brain.
- Concurrent Context-Sharing: We built helpers/ioc_tracker.py to write indicators to an append-only iocs.jsonl file. This allows parallel sub-agents to safely register and query IOCs simultaneously without write-concurrency conflicts. DuckDB allows easy joining of .jsonl, .parquet, .sqlite and other formats into combined queries.
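The append-only IOC file described above can be sketched as follows. This is a minimal illustration, not the project's actual helpers/ioc_tracker.py: the function names and the record fields (type, value, source, ts) are assumptions made for the example. Each indicator is written as one JSON line in a single append, so writers never rewrite existing records.

```python
import json
import os
from datetime import datetime, timezone


def add_ioc(path, ioc_type, value, source):
    """Append one indicator as a single JSON line (hypothetical record layout)."""
    record = {
        "type": ioc_type,
        "value": value,
        "source": source,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    line = json.dumps(record, sort_keys=True) + "\n"
    # Append mode places each write at the current end of the file,
    # so existing records are never modified.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line)
    return record


def find_iocs(path, ioc_type=None):
    """Read every indicator back, optionally filtered by type."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    if ioc_type is None:
        return records
    return [r for r in records if r["type"] == ioc_type]
```

Because the file is plain JSON lines, DuckDB can read it directly (for example with read_json_auto) and join it against the Parquet tables.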
Architecture:
See full diagram at: https://github.com/jeffbryner/find_evil_hackathon/blob/main/ARCHITECTURE.md
Agent Harness: forgecode.dev
- Built in Rust for memory safety.
- Allows for simple agent creation via markdown files, tools and skills.
- Supports any LLM provider and model.
- Can mix and match providers and models, for example a case orchestrator on Gemini Pro with a sniper sub-agent on Gemini Flash.
- Can execute agents as tools, enabling parallel orchestration of multiple agents at once.
- Uses standard skills files.
- Easy to constrain agents (number of turns allowed, etc.) to control resource usage.
- Simple terminal interface with native zsh integration; sessions can easily be recorded with asciinema for audits.
Data Format: parquet/duckdb
- DuckDB is a favorite tool for data scientists due to its speed and wide support for data formats.
- In-memory execution, no server required, extremely fast on Apple Silicon.
- SQL is very well known to AI.
- Parquet is a compressed, columnar format, so queries read only the columns they need.
- DuckDB can dynamically concatenate Parquet files across directories into a single view, allowing ad hoc evidence additions.
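The last point can be sketched as a small helper that builds the view statement. This is an illustration under assumptions: the helper name, the view name and the directory layout (cases/ROCBA/parquet/fs) are hypothetical, while read_parquet, recursive globs and union_by_name are standard DuckDB features.

```python
def parquet_view_sql(view_name, root_dir):
    """Build a DuckDB statement exposing every Parquet file under root_dir as one view."""
    if not view_name.isidentifier():
        raise ValueError(f"unsafe view name: {view_name!r}")
    # '**' makes the glob recursive; doubling single quotes keeps the SQL literal valid.
    pattern = f"{root_dir.rstrip('/')}/**/*.parquet".replace("'", "''")
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT * FROM read_parquet('{pattern}', union_by_name = true)"
    )


# Hypothetical directory layout, used only for illustration.
sql = parquet_view_sql("fs_timeline", "cases/ROCBA/parquet/fs")
print(sql)
```

With the duckdb Python package installed, the statement can be run with duckdb.connect().execute(sql). Since a view stores the query rather than the data, Parquet files added to the directory later should be picked up the next time the view is queried.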
Tool execution: CLI over MCP
After every session we asked the AI how well the environment helped its investigation: what worked and what could use improvement. The AI consistently preferred command line tools it already knew over learning custom tool sets presented via MCP.
Agents are given simple CLI utilities: query_parquet.py, so they do not have to re-create Parquet views, ioc_tracker.py to add ad hoc IOCs that other agents can use, and so on. Agents can also execute shell commands in the SIFT Docker container via docker exec, which they consistently rate as feeling native and comfortable, and which lets them draw on their innate knowledge of common shell and forensic tools.
Agent quotes:
- NPS / Feedback: 10/10. The query_parquet.py script and DuckDB backend are incredibly fast and made timeline analysis of 2,600+ browser entries and 1.8M+ file events seamless.
- NPS / Feedback: 10/10. The Parquet timelines combined with DuckDB allow for incredibly fast and highly precise forensic querying. WordWheelQuery was an absolute goldmine of user intent. Filtering out system noise was essential to pinpointing the insider threat activity.
SIFT as docker
We chose to package SIFT into a Docker container. This lets us use the SIFT tools while presenting them to the AI as ordinary command line tools through docker exec. Containers are named after cases, so command execution is as straightforward as:
docker exec ROCBA pwd
/home/sansforensics
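The docker exec pattern above can be wrapped in a few lines of Python. This is a sketch only: the helper name is hypothetical and not part of the project. Building the command as an argument list sidesteps shell quoting problems with evidence paths that contain spaces.

```python
import shlex


def docker_exec_argv(case_name, *command):
    """Return the argv list for running a command inside the case's container."""
    if not command:
        raise ValueError("no command given")
    return ["docker", "exec", case_name, *command]


argv = docker_exec_argv(
    "SRL2015",
    "ls",
    "-l",
    "/mnt/windows/win2008R2-controller-c-drive/Program Files (x86)",
)
# shlex.join renders the equivalent shell line, quoting the path that contains spaces.
print(shlex.join(argv))
```

The list can be passed straight to subprocess.run(argv, capture_output=True, text=True), which requires Docker and a running container named after the case.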
Host disks are mounted in the container at both read-only and read-write locations, protecting source images while allowing "scratch" work during artifact gathering.
This allows easy access to mounted evidence from the container:
"command": "docker exec SRL2015 cat \"/mnt/windows/win2008R2-controller-c-drive/Program Files (x86)/hMailServer/Data/stark-research-labs.com/nromanoff/63/{63904A78-563D-4DA6-BF03-820E3E1EA08D}.eml\"",
"description": "Read a phishing email from nromanoff's mailbox."
It also allows AI to rapidly join artifacts together to answer what are usually complex questions. Here's an example the AI generated for its skill files:
Correlating Execution with File Activity
Which Registry or Event Log entries occurred within 10 seconds of a specific file being created?
-- Pair every .ps1 file event with artifact events within +/- 10 seconds.
SELECT
    art.timestamp,
    art.message,
    fs.message AS file_name,
    fs.details->>'Type' AS macb
FROM artifacts_timeline art
JOIN fs_timeline fs
    ON art.timestamp BETWEEN fs.timestamp - INTERVAL '10 seconds'
                         AND fs.timestamp + INTERVAL '10 seconds'
WHERE fs.file_name_lower ILIKE '%.ps1'
    -- macb is the SELECT alias defined above (DuckDB allows aliases in WHERE)
    AND macb ILIKE '%c%';
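To make the join condition concrete, the sketch below reproduces the same plus-or-minus 10 second window in plain Python on a few made-up events. It illustrates what the query matches, not how DuckDB executes it; the function name and the sample events are invented for the example.

```python
from datetime import datetime, timedelta


def correlate(artifacts, file_events, window_seconds=10):
    """Pair each file event with artifact events inside the +/- window."""
    window = timedelta(seconds=window_seconds)
    pairs = []
    for fs_ts, fs_msg in file_events:
        for art_ts, art_msg in artifacts:
            # Same test as: art.timestamp BETWEEN fs.timestamp - w AND fs.timestamp + w
            if fs_ts - window <= art_ts <= fs_ts + window:
                pairs.append((art_ts, art_msg, fs_msg))
    return sorted(pairs)


# Made-up sample events, not taken from any case.
t0 = datetime(2024, 1, 1, 12, 0, 0)
artifacts = [
    (t0 + timedelta(seconds=4), "event log: process start"),
    (t0 + timedelta(seconds=30), "registry: key modified"),
]
file_events = [(t0, "stage.ps1 created")]

matches = correlate(artifacts, file_events)
for art_ts, art_msg, fs_msg in matches:
    print(art_ts.isoformat(), art_msg, "<->", fs_msg)
```

Only the event 4 seconds after the file creation falls inside the window; the one at 30 seconds is excluded, which is the noise reduction the SQL join provides at timeline scale.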
Workflow
Given a mounted disk, pcap file or memory image, DuckTracy's lead case agent coordinates specialized, stateless sub-agents (data-analyst and sniper-forensics) on well-defined missions to:
- Inventory & Ingest Evidence: Automatically map and ingest system artifacts into high-speed Parquet tables.
- Hunt for Anomalies: Audit process executions, persistence mechanisms, network connections, and logons.
- Recover & Carve Data: Locate and carve deleted files (such as Outlook PST archives) or other data as needed.
- Correlate Timelines: Reconstruct a unified, chronological timeline of activities.
- Expose Threats: Detect sophisticated indicators of compromise (IOCs) and insider threats by recognizing logical discrepancies in the data.
- Follow the Leads: Pursue further lines of inquiry at your direction.
💥 Challenges We Ran Into
SIFT gaps
SIFT takes on the huge task of wrangling hundreds of forensic tools into a useful package. However, there are always some that defy packaging. At the time of this writing, installing the most recent release (teamdfir/sift-saltstack@v2026.04.21) results in a SIFT image lacking volatility3, plaso and other tools that fail installation on current Ubuntu/VirtualBox.
We took the approach of starting with SIFT and augmenting as needed. Where we had a choice, we chose native OS installations to take advantage of modern hardware like Apple Silicon M-series chips. Where possible, we augmented existing suites like plaso and dissect with targeted additions to support modern data science formats like Parquet.
AI
AI wants to do forensics. AI is good and very fast at individual forensic tasks. AI can also get stuck in loops, run off on fruitless efforts, hallucinate results, ignore evidence, etc.
We built this using AI as a design partner. After each iteration we would ask the AI for its NPS score of the environment: How well did the environment and tooling help you accomplish your investigation? What worked well, and what could use improvement? We fed the results back into the next iteration, with AI building the revision. AI likes SQL over command line or MCP. It "gets it" that it can funnel data into the architecture and use it to solve cases.
AI context window management
We started with one all-powerful agent. Very quickly it ate through its context window, lost the narrative and either ran down endless rabbit holes or discovered common forensic tooling like F-Response, declared it malware and called it a day.
Agent endless pursuit
AI is a people pleaser that likes to finish what it starts. Most of the data present in a forensic case is not useful, and some leads don't pan out. This can lead to AI churning as it eats its context, forgets what it has already tried, retries and loops endlessly.
Cost
There is no sponsorship for this work, and AI can be a very expensive technology. Our budget is limited compared to the endless experiments that could be run. Comparing multi-case runs against outcomes, swapping out models for lead and sub-agents (is Opus plus Gemini the magical pairing?) and similar experiments are cost prohibitive but could prove useful.
No defined rubric
The case data presented (VANKO, ROCBA, SRL) is fantastic and very rich. There is no answer key, however, so it is challenging to know whether your AI solution is coming to the correct conclusions.
Accomplishments that we're proud of
We are very proud of meeting the challenge: truly autonomous AI agents acting as a team at machine speed, using AI-native tooling augmented with forensic tooling of storied history, to solve real cases.
🏆 What We Learned (The Breakthroughs)
We broke up the roles into a case lead and sub-agents with specialties. The forge harness has the ability for agents to call one another as tools, so there is a natural, scalable team created. The case lead asked us to remove shell access to force it to delegate tasks. This also allows it to retain the case context and not get stuck in the weeds of parsing tool call results.
To coordinate, we built the concepts of mission cards, shared facts and the IOC tracker. The case lead cannot investigate; it can only delegate and synthesize. It starts by writing an empty case report and an empty shared facts file. It then iterates through missions delegated to sub-agents as it sees fit, going from an empty case report to a completed investigation.
Mission cards contain a strictly defined goal and a strict budget of tool calls for the sub-agent to use. This prevents endless looping, preserves context and serves as a record of case activity. Additionally, each mission card has its own audit trail of every tool call for copy/paste reproducibility. Each mission card also includes an agent NPS score, so the agent can provide feedback on the mission, tooling, etc.
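A mission card's budget and audit trail can be modeled roughly as below. This is a sketch under assumptions: the class, field and method names are hypothetical and do not reflect the project's actual mission card format, which is not shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class BudgetExceeded(Exception):
    """Raised when a sub-agent tries to exceed its tool-call budget."""


@dataclass
class MissionCard:
    goal: str
    max_tool_calls: int
    audit_trail: List[str] = field(default_factory=list)
    nps_score: Optional[int] = None  # filled in by the agent after the mission

    def record_call(self, command: str) -> None:
        """Log a tool call for reproducibility, refusing it once the budget is spent."""
        if len(self.audit_trail) >= self.max_tool_calls:
            raise BudgetExceeded(f"budget of {self.max_tool_calls} tool calls spent")
        self.audit_trail.append(command)

    @property
    def remaining(self) -> int:
        """Number of tool calls the sub-agent may still make."""
        return self.max_tool_calls - len(self.audit_trail)


card = MissionCard(goal="Confirm the container working directory", max_tool_calls=2)
card.record_call("docker exec ROCBA pwd")
print(card.remaining)
```

Enforcing the limit at the point where the call is recorded means the audit trail and the budget cannot drift apart: every permitted call is logged, and every logged call counted.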
Agents share information between cases using the shared facts markdown file and the IOC tracker JSONL database file. With IOCs in JSONL, DuckDB can dynamically join them to evidence in Parquet, SQLite, etc.
Iterating through cases to find trouble spots and having the same AI agents do a retro was a goldmine of opportunity. The agents recognized their challenges, offered solutions and wrote their own skill files in their own language to help them the next time they encounter a task.
What's next for DuckTracy
We would love the opportunity to follow the trail and see how different combinations of agents and models operate together. Is a lead using Opus with sub-agents using Gemini Flash the winning combination? Is a pure play "all one provider" better? Only experimentation will tell.