Inspiration
What inspired me to work on Project Mantis was my dual passion for Artificial Intelligence and Cybersecurity. Having recently finished my studies, I was looking for a challenge to gain hands-on experience, and this SANS x Devpost hackathon was the perfect opportunity to bridge my two interests. Furthermore, simply watching the news made it apparent that we are entering an era of AI-assisted hacking. I realized that the only way to fight threats at this new scale is by building equally intelligent, autonomous defense systems. As for the name? My first hackathon project earlier this year was called "Project Lobster" (inspired by the "moltbook" incident). I wanted to keep a crustacean "agent" theme, so I chose the mantis shrimp. Known for its incredibly sharp sight and lightning-fast, surgical strikes, the mantis shrimp perfectly embodies the core functions of this DFIR triage agent.
What it does
Project Mantis is an autonomous Digital Forensics and Incident Response (DFIR) framework designed to ingest raw memory and disk images, extract artifacts using SIFT Workstation tools (such as Volatility 3 and The Sleuth Kit's fls), and perform deterministic, hallucination-free triage. Instead of operating like a standard AI chatbot that invents attack narratives, Mantis operates like a digital courtroom: it parses raw telemetry, identifies anomalous behaviors (such as fileless malware executing in memory), and generates a CISO-ready report mapped to MITRE ATT&CK, all without human intervention.
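To make the ATT&CK mapping concrete, here is a minimal sketch of how a single finding could be rendered as a report line. The `Finding` dataclass, the category names, and the `ATTACK_MAP` table are hypothetical and not taken from the Mantis code; only the technique IDs are real ATT&CK identifiers (T1055.012 for Process Injection: Process Hollowing, T1041 for Exfiltration Over C2 Channel).

```python
from dataclasses import dataclass

# Hypothetical table from internal finding categories to real ATT&CK technique IDs.
ATTACK_MAP = {
    "process_hollowing": ("T1055.012", "Process Injection: Process Hollowing"),
    "exfil_over_c2": ("T1041", "Exfiltration Over C2 Channel"),
}


@dataclass
class Finding:
    category: str  # key into ATTACK_MAP
    artifact: str  # e.g. a process name or file path
    evidence: str  # verbatim excerpt from the telemetry


def to_report_line(finding: Finding) -> str:
    """Render one finding as a report line tagged with its ATT&CK technique."""
    technique_id, technique_name = ATTACK_MAP.get(
        finding.category, ("UNMAPPED", "No ATT&CK mapping")
    )
    return f"[{technique_id}] {technique_name} | {finding.artifact} | evidence: {finding.evidence}"


print(to_report_line(Finding("process_hollowing", "SearchApp.exe", "<verbatim telemetry excerpt>")))
```

Falling back to an explicit `UNMAPPED` tag, rather than letting a model guess a technique, keeps the mapping step deterministic.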
How we built it
The architecture of Mantis was fundamentally driven by extreme hardware constraints. I built it entirely on an Intel Celeron N4020 machine with less than 4 GB of available RAM and a nearly full hard drive, so I had no choice but to work with what I had.
To achieve accurate, autonomous investigations under these constraints, I had to approach the project from a data-ingestion perspective first. I couldn't just feed raw memory dumps into an LLM; doing so would instantly freeze or crash my system.
To solve this, I built the Deterministic Sieve. Before the LLM even sees the data, the Sieve uses regular expressions and math-based heuristics to scan for known indicators of compromise (IoCs). The LLM is only activated to reason over the highly distilled, suspicious events that pass through the Sieve.
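A sieve of this kind can be sketched as a generator that passes a line through only when a rule fires. This is a hypothetical illustration, not the project's rule set: the specific regexes, the 4.5 bits-per-character threshold, and the use of Shannon entropy as the "math-based heuristic" are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical IoC rules; a production rule set would be much larger.
IOC_PATTERNS = {
    "encoded_powershell": re.compile(
        r"powershell(?:\.exe)?\s.*-(?:e|enc|encodedcommand)\s", re.IGNORECASE
    ),
    "ipv4_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "exe_in_temp_or_appdata": re.compile(r"\\(?:temp|appdata)\\\S*\.exe\b", re.IGNORECASE),
}

ENTROPY_THRESHOLD = 4.5  # bits per character; illustrative, not tuned
MIN_TOKEN_LEN = 20  # skip short tokens, whose entropy says little


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def sieve(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (line, reasons) only for lines that trip at least one rule."""
    for line in lines:
        reasons = [name for name, pattern in IOC_PATTERNS.items() if pattern.search(line)]
        # Long, high-entropy tokens often indicate encoded or packed data.
        if any(
            len(tok) >= MIN_TOKEN_LEN and shannon_entropy(tok) > ENTROPY_THRESHOLD
            for tok in line.split()
        ):
            reasons.append("high_entropy_token")
        if reasons:
            yield line, reasons
```

Because `sieve` consumes any iterable of lines lazily, it could be fed straight from a subprocess pipe (for example, Volatility 3 plugin output), so the full tool output never has to sit in RAM at once, which matters on a machine with under 4 GB.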
From there, the suspected artifacts are passed into an Adversarial Finite-State Machine (FSM). The agent is split into three roles: a Prosecutor (who attempts to convict the artifact), a Defense Attorney (who aggressively tries to disprove the conviction using benign IT explanations), and a Verifier (who audits the debate).
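The control flow of such a trial can be sketched as an explicit state machine. This is a hypothetical reconstruction: in Mantis the three roles are played by the LLM, whereas here they are plain callables so the state transitions can run on their own; `run_trial`, `max_rounds`, and the state names are assumptions for illustration.

```python
from enum import Enum, auto
from typing import Callable, List, Tuple


class State(Enum):
    PROSECUTE = auto()
    DEFEND = auto()
    VERIFY = auto()
    CONVICTED = auto()
    ACQUITTED = auto()


# Hypothetical role signatures: each receives the artifact and the transcript so far.
Arguer = Callable[[str, List[str]], str]
Judge = Callable[[str, List[str]], bool]


def run_trial(artifact: str, prosecutor: Arguer, defense: Arguer,
              verifier: Judge, max_rounds: int = 2) -> Tuple[State, List[str]]:
    """Alternate prosecution and defense for max_rounds, then let the verifier rule."""
    transcript: List[str] = []
    state = State.PROSECUTE
    rounds = 0
    while state not in (State.CONVICTED, State.ACQUITTED):
        if state is State.PROSECUTE:
            transcript.append("PROSECUTOR: " + prosecutor(artifact, transcript))
            state = State.DEFEND
        elif state is State.DEFEND:
            transcript.append("DEFENSE: " + defense(artifact, transcript))
            rounds += 1
            state = State.VERIFY if rounds >= max_rounds else State.PROSECUTE
        else:  # State.VERIFY: the verifier audits the whole transcript
            state = State.CONVICTED if verifier(artifact, transcript) else State.ACQUITTED
    return state, transcript


verdict, log = run_trial(
    "SearchApp.exe",
    prosecutor=lambda a, t: f"stub argument for convicting {a}",
    defense=lambda a, t: f"stub benign explanation for {a}",
    verifier=lambda a, t: False,  # stub verifier that rejects the conviction
)
print(verdict, len(log))
```

Making the verdict a terminal state, rather than free text, guarantees that every trial ends in exactly one of two outcomes, and the transcript doubles as the audit trail.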
Challenges we ran into
The biggest challenge was undoubtedly building around the hardware limitations. Once I found a way to bridge the raw evidence with the LLM via the Sieve, a cascade of new challenges emerged:
- Hallucination via Training Data: Early on, the LLM recognized the public datasets (like CFReDS) and began hallucinating elaborate attack narratives based purely on its training data rather than the local evidence. To fix this, I had to sanitize the data before feeding it to the LLM.
- The "Loophole" Problem: LLMs naturally prefer to infer output rather than construct it from the provided data. Whenever I placed strict constraints on the agent, it kept finding loopholes to escape them.
- The "Guilty of Everything" Problem: Once the LLM learned what malicious data looked like, it became over-eager and wanted to isolate every event.
To solve these, I implemented ClaimGuard, a structural requirement forcing the LLM to cite the evidence verbatim. If it makes a claim without providing an exact substring match from the telemetry, the action is rejected.
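ClaimGuard's core rule ("no exact substring match, no action") can be sketched in a few lines. This is a hypothetical reconstruction rather than the project's code: it assumes the model returns JSON with a `claims` list whose items carry `statement` and `evidence` fields, it uses a stdlib dataclass for self-containment (the write-up itself mentions Pydantic schemas), and the `MIN_EVIDENCE_LEN` floor is an added assumption.

```python
import json
from dataclasses import dataclass
from typing import List

MIN_EVIDENCE_LEN = 8  # assumption: reject trivially short "citations"


@dataclass
class Claim:
    statement: str
    evidence: str  # must be copied verbatim from the telemetry


class ClaimRejected(ValueError):
    """Raised when the model's output fails the verbatim-citation rule."""


def claim_guard(llm_output: str, telemetry: str) -> List[Claim]:
    """Parse the model's JSON and reject it unless every claim cites telemetry verbatim."""
    try:
        payload = json.loads(llm_output)
        claims = [Claim(c["statement"], c["evidence"]) for c in payload["claims"]]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise ClaimRejected(f"malformed model output: {exc}") from exc
    for claim in claims:
        if not isinstance(claim.evidence, str) or len(claim.evidence.strip()) < MIN_EVIDENCE_LEN:
            raise ClaimRejected(f"missing or too-short evidence for: {claim.statement!r}")
        if claim.evidence not in telemetry:
            raise ClaimRejected(f"evidence not found verbatim: {claim.evidence!r}")
    return claims
```

A pure substring test would accept a one-character "citation", which is why the sketch also enforces a minimum evidence length.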
Accomplishments that we're proud of
I am incredibly proud of successfully transitioning this agent from a naive ReAct loop to a rigorous, deterministic engine. By enforcing strict Pydantic JSON schemas, utilizing an adversarial LLM architecture, and demanding 100% audit-trail traceability, Project Mantis completely eliminates the "Demo Magic" often seen in Generative AI security tools.
We successfully built a zero-blackbox, hallucination-free agent that accurately detected fileless process hollowing (SearchApp.exe anomalies) and data exfiltration, all while running on a deeply constrained edge device.
What we learned
The crazy thing is that before this project, I had only learned about DFIR in theory through my studies. To guide an LLM through a forensic investigation, I had to deeply understand the process myself. I had to learn how to perform memory and disk forensics manually so that I could teach the LLM to deterministically mirror those playbooks and reporting standards.
Spending a month building and breaking this project taught me profound lessons about AI: its limitations, its stubborn desire to infer rather than read, and its devious ability to find loopholes in system prompts. It truly opened my eyes and made me realize how much potential there is in bridging Generative AI and Cybersecurity.
What's next for Project Mantis
We have laid the groundwork for an incredibly powerful Universal Forensic Engine, and the roadmap for v0.6.0 and beyond is already planned:
• Automated Cross-Dimensional Pivoting: Finalizing the pipeline so that a memory conviction automatically triggers dynamic disk timeline caching, allowing the agent to trace back from an in-memory anomaly to the exact dropper file that initiated the attack.
• Deeper Deterministic Classification: Perfecting the sieve_deterministic layer so the agent can autonomously distinguish between benign Just-In-Time (JIT) flow padding and malicious trampolines, explicitly restricting the LLM to an "Audit Only" role for extreme precision.
• Advanced Disk Enrichment: Correlating Windows LNK files with USBSTOR registry keys to autonomously prove when a payload was executed from a removable drive.
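The memory-to-disk pivot is still on the roadmap, so the following is only a hypothetical sketch of its first step. It assumes the disk timeline is a bodyfile produced by The Sleuth Kit's `fls -m` (fields `MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime`) and that the memory side supplies the convicted process's image name; `BodyfileEntry`, `parse_bodyfile`, and `pivot_to_disk` are invented names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class BodyfileEntry:
    name: str
    inode: str
    size: int
    mtime: int
    crtime: int


def parse_bodyfile(lines: Iterable[str]) -> Iterator[BodyfileEntry]:
    """Parse TSK 3.x bodyfile lines: MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime."""
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 11:
            continue  # skip malformed or truncated lines
        name = "|".join(parts[1:-9])  # tolerate '|' characters inside file names
        inode, _mode, _uid, _gid, size, _atime, mtime, _ctime, crtime = parts[-9:]
        yield BodyfileEntry(name, inode, int(size), int(mtime), int(crtime))


def pivot_to_disk(image_name: str, bodyfile_lines: Iterable[str]) -> List[BodyfileEntry]:
    """Return timeline entries whose base name matches a convicted process image, oldest first."""
    target = image_name.lower()
    hits = [
        entry for entry in parse_bodyfile(bodyfile_lines)
        if entry.name.replace("\\", "/").rsplit("/", 1)[-1].lower() == target
    ]
    return sorted(hits, key=lambda entry: entry.crtime)
```

Sorting by creation time puts the earliest on-disk copy first, which is a natural starting point when hunting for the dropper.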

