AIPOL - AIPolygraph - Autonomous IR Agent

Inspiration

The idea for AIPolygraph came from a simple observation: most security tools either detect threats OR respond to them, but rarely do both autonomously. SOC analysts are overwhelmed with thousands of alerts daily, and by the time a human investigates, the damage is often already done. I wanted to build a system that could "find evil at machine speed": detect, analyze, and isolate threats without waiting for human intervention. The name "AIPolygraph" reflects the system's ability to "interrogate" network events and decide whether they're malicious, much as a polygraph test is meant to detect deception.

Demo video: https://www.loom.com/share/6402f69f6a774eeab29a83db67aabcd1

The SANS FIND EVIL! hackathon was the perfect opportunity to combine three things I'm passionate about: cybersecurity, AI, and building practical tools that actually help defenders.

What it does

AIPolygraph 2.0 is an autonomous incident response agent that:

🔍 Detects threats using 13 specialized "Animal Modules":

🦉 Sova/Owl - Rate limiting / scanning detection
🐚 Školjka/Shell - WAF / authentication bypass
🐙 Hobotnica/Octopus - Brute force attacks
🐢 Kornjača/Turtle - Persistence mechanisms
🐜 Termit/Termite - File integrity monitoring with auto-restore
🪼 Meduza/Medusa - Log tampering detection
🐹 Krtica/Mole - Error-based SQL injection
and 6 more modules covering reconnaissance, lateral movement, and data exfiltration

Analyzes incidents with Llama 3.1 (run locally via Ollama) to generate:

Severity score (1-10)
Attack pattern classification
MITRE ATT&CK tactic and technique mapping
Executive summary
Recommended actions

Responds with automated isolation:

IP blocking (simulated in the demo, designed to be swapped for real firewall rules in production)
Account lockdown
Host quarantine
Automatic file restoration from SHA-256-verified backups

Preserves evidence with:

SQLite forensic collector (SIFT-compatible)
Cryptographic hashing (SHA-256)
Append-only immutable logs
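A collector with these three properties can be sketched with the standard library alone. This is a minimal sketch under assumptions: a single hypothetical `events` table holding a canonical JSON payload, a per-row SHA-256 of that payload, and SQLite triggers that reject UPDATE and DELETE so the table is append-only at the SQL level. The schema and the `EvidenceLog` name are illustrative, not the project's code. Triggers only stop changes made through SQLite; anyone who can rewrite the database file directly bypasses them, which is why the per-row hashes are kept and re-verifiable.

```python
import hashlib
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict

SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    ts_utc  TEXT NOT NULL,
    module  TEXT NOT NULL,
    payload TEXT NOT NULL,   -- canonical JSON of the event
    sha256  TEXT NOT NULL    -- SHA-256 of payload
);
CREATE TRIGGER IF NOT EXISTS events_no_update BEFORE UPDATE ON events
BEGIN SELECT RAISE(ABORT, 'events table is append-only'); END;
CREATE TRIGGER IF NOT EXISTS events_no_delete BEFORE DELETE ON events
BEGIN SELECT RAISE(ABORT, 'events table is append-only'); END;
"""


class EvidenceLog:
    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        # WAL lets readers (e.g. an analyst's queries) run alongside the writer.
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.conn.executescript(SCHEMA)

    def append(self, module: str, event: Dict[str, Any]) -> str:
        # sort_keys + compact separators give a stable string to hash.
        payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT INTO events (ts_utc, module, payload, sha256) VALUES (?, ?, ?, ?)",
                (datetime.now(timezone.utc).isoformat(), module, payload, digest),
            )
        return digest

    def verify(self) -> bool:
        """Recompute every row's hash; False means some payload was altered."""
        for payload, digest in self.conn.execute("SELECT payload, sha256 FROM events"):
            if hashlib.sha256(payload.encode("utf-8")).hexdigest() != digest:
                return False
        return True
```

Storing the payload as canonical JSON text keeps each row queryable with ordinary SQLite tooling while still letting `verify()` detect edits made outside the triggers.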

How we built it

Architecture Overview:
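The flow described throughout this write-up (rate limiting ahead of detection, an immediate per-event response, evidence recording, and a single LLM pass once all events are processed) can be sketched as one loop. Everything here is hypothetical: `Event`, `run_incident`, and the callables passed in stand for components whose real interfaces are not shown in this write-up.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    """One normalized log or network event (hypothetical shape)."""
    source_ip: str
    kind: str
    data: Dict[str, str] = field(default_factory=dict)


# Each detector stands in for one Animal Module: returns a finding dict or None.
Detector = Callable[[Event], Optional[dict]]


def run_incident(
    events: List[Event],
    allow: Callable[[Event], bool],               # rate limiter check
    detectors: List[Detector],                    # Animal Modules
    respond: Callable[[dict], None],              # isolation action
    record: Callable[[Event, List[dict]], None],  # evidence collector
    analyze: Callable[[List[dict]], dict],        # LLM or rule-based report
) -> dict:
    findings: List[dict] = []
    timings_ms: List[float] = []
    for event in events:
        start = time.perf_counter()
        if not allow(event):
            continue  # throttled before any detection work is done
        hits = [f for f in (d(event) for d in detectors) if f is not None]
        for finding in hits:
            respond(finding)  # respond immediately, per event...
        record(event, hits)
        findings.extend(hits)
        timings_ms.append((time.perf_counter() - start) * 1000)
    report = analyze(findings)  # ...and analyze once, after the whole batch
    report["max_event_ms"] = max(timings_ms, default=0.0)
    return report
```

Keeping `analyze` outside the loop is what lets per-event latency stay in the millisecond range while the slower LLM call happens only once per incident.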

Technical Stack:

Language: Python 3.8+
Database: SQLite in WAL mode (readers can run concurrently with a writer)
LLM: Ollama + Llama 3.1 8B (llama3.1:8b; local inference, no cloud costs)
Forensics: SHA-256 hashing, automatic backup restoration
Concurrency: Threading with locks for rate limiting
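A thread-safe per-IP limiter of the kind listed above can be sketched as a sliding-window log: each IP keeps a deque of recent event timestamps, entries older than the window are evicted, and a new event is rejected once the limit is reached. This is a sketch, not the project's code; the write-up describes a sliding-window counter, and the log variant shown here trades a little memory for exact counts. The class name and the injectable clock are assumptions; the default of 20 events per IP per second comes from this write-up.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` events per IP within any `window` seconds."""

    def __init__(self, limit: int = 20, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()  # detection threads share this state

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = self.clock() if now is None else now
        with self._lock:
            hits = self._hits[ip]
            # Evict timestamps that have slid out of the window.
            while hits and now - hits[0] >= self.window:
                hits.popleft()
            if len(hits) >= self.limit:
                return False  # over budget: drop before detection runs
            hits.append(now)
            return True
```

All reads and writes of the per-IP deques happen under one lock, so several detection threads can call `allow()` safely. Deques for IPs that go quiet are never pruned in this sketch; a long-running service would need to evict them.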

Key Implementation Details:

Modular Detection System - Each module inherits from the AnimalModule base class, which defines detect() and respond() methods. Adding a new module takes ~15 minutes.
Rate Limiting - A sliding-window counter (20 events per IP per second) protects the detection engine itself from DoS.
LLM Integration with Fallback - Tries Ollama first (local Llama 3.1) and falls back to rule-based analysis if it is unavailable. Prompts request JSON-formatted output so the response can be parsed as structured data.
Immutable Forensics - The SHA-256 hash of every critical file is stored in the database. When tampering is detected, the system automatically restores the file from a verified backup, limiting an attacker's ability to cover their tracks.
SIFT Collector - All events are written to SQLite in JSON format, making it compatible with SANS SIFT Workstation forensic workflows.
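The plug-in pattern described above can be sketched with an abstract base class. Only the `AnimalModule`, `detect()`, and `respond()` names come from this write-up; the event fields, the `run_modules` helper, the threshold, and the whitelist are illustrative assumptions. The example module mirrors the Krtica/Mole false-positive fix described under Challenges: it fires only after repeated HTTP 500s from the same IP and ignores whitelisted endpoints.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, Dict, Iterable, List, Optional

Event = Dict[str, Any]
Finding = Dict[str, Any]


class AnimalModule(ABC):
    name: str = "base"

    @abstractmethod
    def detect(self, event: Event) -> Optional[Finding]:
        """Return a finding if the event looks malicious, else None."""

    @abstractmethod
    def respond(self, finding: Finding) -> str:
        """Take (or describe) an isolation action for the finding."""


class Krtica(AnimalModule):
    """Mole: error-based SQL injection heuristic (hypothetical rules)."""
    name = "Krtica/Mole"

    def __init__(self, threshold: int = 5, whitelist: Iterable[str] = ()) -> None:
        self.threshold = threshold
        self.whitelist = set(whitelist)
        self.errors: Counter = Counter()  # HTTP 500 count per source IP

    def detect(self, event: Event) -> Optional[Finding]:
        if event.get("status") != 500 or event.get("path") in self.whitelist:
            return None
        ip = event["ip"]
        self.errors[ip] += 1
        # One 500 is usually a bug, not an attack; fire once, when the IP
        # first reaches the threshold.
        if self.errors[ip] == self.threshold:
            return {"module": self.name, "ip": ip, "errors": self.errors[ip]}
        return None

    def respond(self, finding: Finding) -> str:
        return f"simulated block of {finding['ip']}"


def run_modules(modules: List[AnimalModule], event: Event) -> List[str]:
    """Feed one event to every module and collect the responses triggered."""
    actions = []
    for module in modules:
        finding = module.detect(event)
        if finding is not None:
            actions.append(module.respond(finding))
    return actions
```

Because `AnimalModule` is abstract, a new module that forgets to implement either method fails at instantiation rather than silently at runtime.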

Challenges we ran into

1. LLM Latency vs. Real-time Detection - Running an 8B-parameter model locally adds ~15 seconds to report generation. My solution: the LLM is called ONLY after all events are processed (post-incident analysis), while detection remains under 50 ms per event. The system responds first and analyzes second.

2. False Positive Management - Modules like Krtica/Mole (error-based injection) triggered on legitimate 500 errors. Fix: a per-IP threshold (an alert requires multiple errors from the same IP) plus whitelisting for known-good endpoints.

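3. Immutable Forensics Without Performance Hits 💾 - Hashing every file on every check would kill performance. Solution: hash each file on the initial scan and store the baseline in the DB. On subsequent checks, the file's current hash is compared against the stored baseline; the comparison itself is O(1), although computing the current hash still takes time proportional to the file's size.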

4. Windows vs. Linux Compatibility 🔄 - SANS SIFT Workstation runs on Linux, but I developed on Windows. Solution: an abstraction layer for path handling (os.path.join everywhere) plus conditional isolation logic (simulated on Windows, real iptables on Linux).

5. Ollama JSON Formatting 📋 - Llama 3.1 doesn't always return valid JSON. Fix: added a regex fallback to extract JSON from the response, plus a full rule-based fallback when the LLM fails.
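The LLM path with both fallbacks can be sketched using only the standard library. The request goes to Ollama's `/api/generate` endpoint on its default local port (11434) with `"format": "json"` and streaming disabled, and the generated text is read from the reply's `response` field. The text is parsed directly first, then via a regex that grabs the span from the first `{` to the last `}`, and if both fail a rule-based report is returned instead. The prompt wording, the scoring in `rule_based_report`, and the `analyze` signature are hypothetical; `llm` is injectable so the logic can be exercised without a running model.

```python
import json
import re
import urllib.request
from typing import Any, Callable, Dict, List, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def call_ollama(prompt: str, model: str = "llama3.1:8b", timeout: float = 60.0) -> str:
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False, "format": "json"}).encode("utf-8")
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


def extract_json(text: str) -> Optional[Dict[str, Any]]:
    """Parse the reply as-is, else try its first-'{'-to-last-'}' span."""
    for candidate in (text, *re.findall(r"\{.*\}", text, flags=re.DOTALL)):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


def rule_based_report(findings: List[Dict[str, Any]]) -> Dict[str, Any]:
    modules = sorted({f.get("module", "unknown") for f in findings})
    return {
        "severity": min(10, 2 + 2 * len(modules)),  # crude: more modules, higher score
        "attack_pattern": "multi-vector" if len(modules) > 1 else "single-vector",
        "summary": f"{len(findings)} findings from {len(modules)} module(s)",
        "source": "rule-based",
    }


def analyze(findings: List[Dict[str, Any]],
            llm: Callable[[str], str] = call_ollama) -> Dict[str, Any]:
    prompt = ("Return only a JSON object with keys severity, attack_pattern, "
              "mitre, summary, actions for these findings:\n" + json.dumps(findings))
    try:
        report = extract_json(llm(prompt))
    except (OSError, KeyError, ValueError):
        # URLError and timeouts are OSError subclasses: Ollama down or slow.
        report = None
    if report is None:
        return rule_based_report(findings)
    report["source"] = "llm"
    return report
```

Tagging each report with its `source` makes it visible downstream whether a given severity score came from the model or from the fallback rules.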

Accomplishments that we're proud of

8 modules triggered in under 2 seconds - The system detected a multi-vector attack (brute force, persistence, file tampering, log tampering, SQL injection) in real time.

Working LLM integration - Ollama with Llama 3.1 successfully generates severity scores (e.g., 8/10), MITRE ATT&CK technique IDs (T1110 Brute Force, T1021 Remote Services, T1190 Exploit Public-Facing Application), and actionable recommendations.

Immutable restoration - The Termit module detected file tampering and automatically restored the original from a SHA-256-verified backup.

SIFT-compatible output - All events stored in SQLite with JSON format, ready for forensic analysis.

Machine-speed detection - Individual events are processed in 10-20 ms; LLM analysis runs only afterwards.
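The Termit behavior above (record a hash baseline, detect a change, restore from a verified backup) can be sketched as follows. This is a minimal sketch under assumptions: baselines live in an in-memory dict rather than the project's SQLite table, backups sit in a plain directory named by their hash, and `FileGuard` and `sha256_file` are hypothetical names. Each check rehashes the live file, since a stored hash alone cannot reveal a change, and the backup is re-verified against the baseline before it is copied back.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Stream the file in chunks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class FileGuard:
    def __init__(self, backup_dir: Path) -> None:
        self.backup_dir = backup_dir
        self.backup_dir.mkdir(parents=True, exist_ok=True)
        self.baseline: Dict[Path, str] = {}

    def protect(self, path: Path) -> None:
        """Initial scan: hash the file once and keep a backup named by its hash."""
        digest = sha256_file(path)
        shutil.copy2(path, self.backup_dir / digest)
        self.baseline[path] = digest

    def check_and_restore(self, path: Path) -> bool:
        """Return True if the file was missing or modified and has been restored."""
        expected = self.baseline[path]
        if path.exists() and sha256_file(path) == expected:
            return False
        backup = self.backup_dir / expected
        if sha256_file(backup) != expected:
            raise RuntimeError(f"backup for {path} failed verification")
        shutil.copy2(backup, path)
        return True
```

Naming each backup by its digest makes the verification step trivial: a backup is trustworthy exactly when its content still hashes to its own filename.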

What we learned

1. Detection is easier than isolation - Anyone can write a regex to find "failed password" in logs. The real challenge is safely blocking an attacker without locking out legitimate users. Context matters.

2. LLMs need guardrails - Llama 3.1 sometimes hallucinates MITRE IDs or recommends impossible actions. A rule-based fallback isn't optional; it's essential for production.

3. Rate limiting is non-negotiable - Without it, an attacker could flood the detection engine with events and cause a DoS. Built-in protection must come before detection.

4. Modular design saves lives - When I needed to add a new detection module, it took 15 minutes because the base class was solid. Plan for change from day one.

5. Simulated vs. real isolation - For a hackathon demo, simulated isolation (logging to a file) is acceptable. For production, though, it needs actual firewall rules (iptables/netsh) and careful rollback procedures.
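The guardrail lesson above can be made concrete by normalizing every LLM report before anything acts on it: clamp the severity to the 1-10 scale, coerce scalars into lists, and keep only strings shaped like ATT&CK IDs (TA0001-style tactics or T1110 / T1110.001-style techniques). This checks format only, so a well-formed but wrong ID still passes. The key names (`severity`, `attack_pattern`, `mitre`, `summary`, `actions`) and `normalize_report` are assumptions modeled on the report fields listed under What it does.

```python
import re
from typing import Any, Dict, List

# Tactic (TA0006) or technique (T1110 / T1110.001) ID shape; format check only.
ATTACK_ID = re.compile(r"TA\d{4}|T\d{4}(?:\.\d{3})?")


def normalize_report(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Coerce an LLM-produced report into a predictable, bounded shape."""
    try:
        severity = int(raw.get("severity", 5))
    except (TypeError, ValueError):
        severity = 5  # neutral default when the model returns junk
    severity = min(10, max(1, severity))  # clamp to the 1-10 scale

    ids = raw.get("mitre", [])
    if isinstance(ids, str):
        ids = [ids]  # a single ID returned as a bare string
    mitre: List[str] = []
    for item in ids:
        candidate = str(item).strip().upper()
        if ATTACK_ID.fullmatch(candidate):
            mitre.append(candidate)

    actions = raw.get("actions", [])
    if isinstance(actions, str):
        actions = [actions]

    return {
        "severity": severity,
        "attack_pattern": str(raw.get("attack_pattern", "unknown")),
        "mitre": mitre,
        "summary": str(raw.get("summary", "")),
        "actions": [str(a) for a in actions],
    }
```

Validating against the published ATT&CK ID list, rather than just the ID shape, would be the natural next step toward catching hallucinated IDs.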

What's next for AIPolygraph

Short-term (next week):

Real Windows Firewall integration (replace simulation with actual netsh commands)
TheHive API integration (create cases automatically)
MISP feed import (block known malicious IPs)
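The netsh integration planned above, alongside the iptables path already used on Linux, could take roughly this shape: build the platform's firewall command, and only execute it when explicitly asked. The `iptables -A INPUT -s <ip> -j DROP` and `netsh advfirewall firewall add rule ... action=block remoteip=<ip>` forms are standard; the rule naming, the `dry_run` default, and the `firewall_command` / `block_ip` helpers are assumptions. The `undo` flag builds the matching removal command, the kind of rollback a production version needs.

```python
import ipaddress
import platform
import subprocess
from typing import List, Optional


def firewall_command(ip: str, system: Optional[str] = None, undo: bool = False) -> List[str]:
    """Build (but do not run) the command that blocks, or unblocks, `ip`."""
    addr = ipaddress.ip_address(ip)  # ValueError on malformed input, so no injection
    system = system or platform.system()
    rule = f"AIPolygraph-block-{addr}"  # hypothetical rule name
    if system == "Linux":
        binary = "ip6tables" if addr.version == 6 else "iptables"
        return [binary, "-D" if undo else "-A", "INPUT", "-s", str(addr), "-j", "DROP"]
    if system == "Windows":
        if undo:
            return ["netsh", "advfirewall", "firewall", "delete", "rule", f"name={rule}"]
        return ["netsh", "advfirewall", "firewall", "add", "rule", f"name={rule}",
                "dir=in", "action=block", f"remoteip={addr}"]
    raise NotImplementedError(f"no isolation backend for {system}")


def block_ip(ip: str, dry_run: bool = True, system: Optional[str] = None) -> List[str]:
    cmd = firewall_command(ip, system)
    if dry_run:
        print("[simulated]", " ".join(cmd))  # the demo's log-only behavior
    else:
        subprocess.run(cmd, check=True)  # needs root / Administrator rights
    return cmd
```

Passing an argument list to `subprocess.run` (no shell) and validating the address first means a hostile string from a log line cannot smuggle extra commands into the firewall call.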

Medium-term (next month):

Network traffic analysis module (PCAP ingestion with Scapy)
Docker container for easy deployment
Web dashboard (React + FastAPI) for real-time monitoring

Long-term (vision):

☁️ Cloud deployment (AWS Lambda for detection + S3 for immutable logs)
☁️ Multi-tenant support for MSSPs
☁️ Integration with CrowdStrike Falcon API for real endpoint isolation

The ultimate goal: A free, open-source autonomous IR agent that any SOC can deploy in under 30 minutes.

Built With

mitre
ollama
python
sha-256
sqlite

Updates

metal alchemistspex started this project — Jun 15, 2026 11:28 PM EDT
