Inspiration
Every 11 seconds, a new organization falls victim to ransomware. When a suspicious file lands on an analyst's desk, they spend hours manually deobfuscating code, checking threat intel feeds, and writing detection rules, all while the clock ticks. Existing tools like VirusTotal give a pass/fail verdict from signature engines but can't explain what the malware does or how to stop it. We wanted to build what a full SOC team would do, but autonomous, adversarial, and fast.
What it does
SentinelHive is an autonomous malware analysis platform. Upload any suspicious file (JS, EXE, DLL, PS1, VBS, PDF, or password-protected archives) and a swarm of six specialized AI agents analyzes it inside an isolated Docker sandbox:
- Triage Agent — fast classification in seconds
- Reverse Engineer — multi-layer deobfuscation and static analysis
- Behavioral Analyst — runtime behavior and system call tracing
- Threat Intel Agent — IOC enrichment from 8+ sources (VirusTotal, MalwareBazaar, AbuseIPDB, Shodan, etc.)
- Detection Engineer — generates YARA, Sigma, and KQL rules
- Remediation Strategist — step-by-step containment runbook with MITRE ATT&CK mapping
A 7th agent, the Hive Director, orchestrates everything and runs The Gauntlet — an adversarial peer review where agents cross-verify each other's claims before anything reaches the final report.
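As a rough illustration of how a cross-verification gate like The Gauntlet could work, the sketch below accepts a finding only if it carries evidence and wins a confidence-weighted vote among the other agents. The `Finding` and `Review` types, the weighting scheme, and the 0.6 threshold are all assumptions made for this example, not SentinelHive's actual protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    claim: str
    author: str
    confidence: float                      # author's self-reported confidence, 0..1
    evidence: list[str] = field(default_factory=list)


@dataclass
class Review:
    reviewer: str
    agrees: bool
    confidence: float                      # reviewer's confidence in their verdict, 0..1


def director_accepts(finding: Finding, reviews: list[Review], threshold: float = 0.6) -> bool:
    """Hypothetical acceptance rule: evidence is mandatory, then a weighted vote decides."""
    if not finding.evidence:
        return False                       # claims without evidence never reach the report
    support = finding.confidence           # the author counts as one supporting voice
    total = finding.confidence
    for review in reviews:
        if review.reviewer == finding.author:
            continue                       # agents may not review their own claims
        total += review.confidence
        if review.agrees:
            support += review.confidence
    return total > 0 and support / total >= threshold
```

With this rule, a well-evidenced claim that one confident peer confirms survives a weak objection, while a claim that two confident peers dispute is dropped.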
Key features:
- Real-time WebSocket feed showing agents working live
- Analyst Checkpoints — rewind the analysis to any phase with new guidance (human-in-the-loop)
- Advanced RAG Knowledge Base — hybrid retrieval (semantic + BM25 + Reciprocal Rank Fusion) so agents learn from past analyses
- Infection chain reconstruction as a structured graph
- Living-off-the-land (LOLBAS) detection focus
- Light/dark/system theme, fully mobile-responsive UI
- Self-hosted with a single ./setup.sh dev command
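The rank-fusion step of the hybrid retrieval listed above can be sketched in a few lines. This is a minimal version of Reciprocal Rank Fusion, assuming each retriever (semantic and BM25) returns an ordered list of case IDs; the function name, the example IDs, and the constant k=60 (a commonly used default) are assumptions for illustration rather than the project's actual settings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken alphabetically for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers for one query
semantic_hits = ["case-a", "case-b", "case-c"]
bm25_hits = ["case-c", "case-a", "case-d"]
fused = reciprocal_rank_fusion([semantic_hits, bm25_hits])
```

Documents that appear in both lists rise to the top, which is why an exact IOC match found by BM25 can still outrank a merely similar case from the vector search.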
How we built it
- Frontend: Next.js 14 (App Router), Tailwind CSS, Lucide icons, WebSocket for real-time updates
- Backend: FastAPI (Python), 7-phase async pipeline with checkpoint/resume, Celery workers
- AI: Claude Opus 4.6 (heavy reasoning) and Claude Sonnet 4.6 (lightweight tasks) via ElectronHub API, with GPT-4.1 fallback. 70+ structured tools available to agents.
- Sandbox: Docker containers with network isolation, dropped capabilities, memory caps, read-only filesystem
- Data: MongoDB (case storage), ChromaDB (vector knowledge base with sentence-transformers embeddings), Redis (job queue)
- Deployment: Cloudflare Tunnel to sentinelhive.dev, self-hosted on a dedicated VPS
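The sandbox restrictions listed above map onto standard docker run flags. The sketch below only builds the command line; the helper name, image name, mount point, and 512m default are placeholders chosen for this example, and the real platform may configure its containers differently (for instance through the Docker SDK).

```python
def sandbox_command(image, sample_path, memory="512m"):
    """Build a docker run command with the isolation settings named above.

    sample_path must be an absolute host path; it is mounted read-only
    at /sample inside the container (a hypothetical mount point).
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",        # network isolation: no interfaces except loopback
        "--cap-drop", "ALL",        # drop every Linux capability
        "--memory", memory,         # hard memory cap
        "--read-only",              # root filesystem mounted read-only
        "-v", f"{sample_path}:/sample:ro",
        image,
    ]


cmd = sandbox_command("analysis-sandbox:latest", "/tmp/samples/invoice.js")
```

The resulting list can be passed to subprocess.run without a shell, which avoids quoting problems with attacker-controlled file names.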
Challenges we ran into
- Adversarial consensus timing — getting parallel agents to halt within seconds when a checkpoint rewind is triggered required propagating interrupt signals into active LLM tool loops and handling race conditions where agents save results between the DB clear and the pipeline halt.
- MongoDB SRV resolution on VPS — the production server couldn't resolve Atlas SRV records, forcing us to self-host MongoDB via Docker.
- False positive reduction — single-agent analysis hallucinates freely. The Gauntlet debate protocol was essential — agents must present evidence for contested findings, and the Director weighs confidence scores before accepting.
- Deobfuscation depth — modern malware like Qbot uses multiple obfuscation layers (string encoding, eval packing, control flow flattening). We integrated PyJSClear for multi-pass JS deobfuscation and built custom deobfuscation prompts.
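The checkpoint-rewind race described in the first item above can be illustrated with a shared asyncio.Event. In this sketch, each agent checks the halt flag both before starting a step and again before saving, and the rewind clears stored results only after every agent has stopped. The function names, the in-memory results list standing in for the database, and the sleep standing in for an LLM tool call are all assumptions for illustration.

```python
import asyncio


async def agent_loop(name, halt, results, steps=50):
    """Simulated agent tool loop that honours a shared halt signal."""
    for step in range(steps):
        if halt.is_set():
            return
        await asyncio.sleep(0.01)          # stand-in for an LLM or tool call
        if halt.is_set():
            return                         # re-check so nothing is saved after a rewind
        results.append((name, step))


async def run_with_rewind(agent_names, rewind_after=0.05):
    halt = asyncio.Event()
    results = []                           # stand-in for the case database
    tasks = [asyncio.create_task(agent_loop(n, halt, results)) for n in agent_names]
    await asyncio.sleep(rewind_after)      # the analyst triggers a rewind here
    halt.set()                             # 1. signal every agent to stop
    await asyncio.gather(*tasks)           # 2. wait until all loops have exited
    saved_before_clear = len(results)
    results.clear()                        # 3. only then clear stored results
    return saved_before_clear, results


saved, remaining = asyncio.run(run_with_rewind(["triage", "reverse_engineer"]))
```

Ordering the halt before the clear means no agent can write a stale result into the freshly cleared store.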
Accomplishments that we're proud of
- The adversarial Gauntlet dramatically reduces false positives compared to single-agent analysis
- Analyst Checkpoints give human experts real control over autonomous AI — rewind, steer, and iterate
- The RAG knowledge base lets every new analysis draw on findings from past cases, so the system improves with use
- The entire platform deploys with one command and runs fully self-hosted — no cloud AI lock-in beyond the LLM API
What we learned
- Multi-agent debate is more reliable than single-agent confidence scores
- Human-in-the-loop isn't just a checkbox — real analysts need to steer, not just approve
- Hybrid retrieval (semantic + keyword + rank fusion) dramatically outperforms pure vector search for cybersecurity data where exact IOC matches matter as much as semantic similarity
What's next for SentinelHive
- Community threat sharing — anonymized IOC and YARA rule sharing across deployments
- Predictive analysis — "given this loader pattern, what's the likely next-stage payload?"
- Natural language analyst interface — conversational querying of the case database
- Automated red team scenarios — generate tabletop exercises from confirmed threat reports