SentinelHive

Inspiration

By one widely cited industry estimate, an organization falls victim to ransomware every 11 seconds. When a suspicious file lands on an analyst's desk, they spend hours manually deobfuscating code, checking threat intel feeds, and writing detection rules while the clock ticks. Existing tools like VirusTotal give a pass/fail verdict from signature engines but can't explain what the malware does or how to stop it. We wanted to build something that does the work of a full SOC team, but autonomously, adversarially, and fast.

What it does

SentinelHive is an autonomous malware analysis platform. Upload any suspicious file (JS, EXE, DLL, PS1, VBS, PDF, or password-protected archives) and a swarm of six specialized AI agents analyzes it inside an isolated Docker sandbox:

Triage Agent — fast classification in seconds
Reverse Engineer — multi-layer deobfuscation and static analysis
Behavioral Analyst — runtime behavior and system call tracing
Threat Intel Agent — IOC enrichment from 8+ sources (VirusTotal, MalwareBazaar, AbuseIPDB, Shodan, etc.)
Detection Engineer — generates YARA, Sigma, and KQL rules
Remediation Strategist — step-by-step containment runbook with MITRE ATT&CK mapping

A seventh agent, the Hive Director, orchestrates everything and runs The Gauntlet: an adversarial peer review in which agents cross-verify each other's claims before anything reaches the final report.
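As a rough illustration of how a Gauntlet-style acceptance rule could work, here is a minimal sketch. The `Finding` and `Review` types, the `director_accepts` function, and the 0.6 threshold are all hypothetical names and values chosen for this example; the actual debate protocol and scoring may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    reviewer: str
    agrees: bool
    confidence: float  # reviewer's self-reported confidence, 0.0 to 1.0


@dataclass
class Finding:
    claim: str
    author: str
    evidence: list = field(default_factory=list)
    reviews: list = field(default_factory=list)


def director_accepts(finding, threshold=0.6):
    """Accept a finding only if it has evidence and enough weighted support.

    Assumption: support is the confidence-weighted share of agreeing reviews.
    """
    if not finding.evidence:
        return False  # claims without evidence never reach the report
    total = sum(r.confidence for r in finding.reviews)
    if total == 0:
        return False  # nobody reviewed it, so it is not yet verified
    support = sum(r.confidence for r in finding.reviews if r.agrees)
    return support / total >= threshold


backed = Finding(
    claim="Sample spawns wscript.exe to fetch a second stage",
    author="behavioral_analyst",
    evidence=["process tree excerpt"],
    reviews=[
        Review("reverse_engineer", True, 0.9),
        Review("threat_intel", True, 0.7),
        Review("triage", False, 0.4),
    ],
)
unbacked = Finding(claim="Sample is Qbot", author="triage")

accepted = [f.claim for f in (backed, unbacked) if director_accepts(f)]
```

Here only the first finding is accepted: it carries evidence and its weighted support is 1.6 / 2.0 = 0.8, while the second has no evidence at all.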

Key features:

Real-time WebSocket feed showing agents working live
Analyst Checkpoints — rewind the analysis to any phase with new guidance (human-in-the-loop)
Advanced RAG Knowledge Base — hybrid retrieval (semantic + BM25 + Reciprocal Rank Fusion) so agents learn from past analyses
Infection chain reconstruction as a structured graph
Living-off-the-land (LOLBAS) detection focus
Light/dark/system theme, fully mobile-responsive UI
Self-hosted with a single ./setup.sh dev command

How we built it

Frontend: Next.js 14 (App Router), Tailwind CSS, Lucide icons, WebSocket for real-time updates
Backend: FastAPI (Python), 7-phase async pipeline with checkpoint/resume, Celery workers
AI: Claude Opus 4.6 (heavy reasoning) and Claude Sonnet 4.6 (lightweight tasks) via ElectronHub API, with GPT-4.1 fallback. 70+ structured tools available to agents.
Sandbox: Docker containers with network isolation, dropped capabilities, memory caps, read-only filesystem
Data: MongoDB (case storage), ChromaDB (vector knowledge base with sentence-transformers embeddings), Redis (job queue)
Deployment: Cloudflare Tunnel to sentinelhive.dev, self-hosted on a dedicated VPS
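The four sandbox restrictions listed above map onto standard `docker run` flags. The sketch below only builds the command line and does not execute it; the function name, image name, sample path, and 512 MB limit are placeholders for illustration, not the project's real configuration.

```python
def build_sandbox_command(image, sample_path, memory="512m"):
    """Return a `docker run` argument list for an isolated analysis container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",     # network isolation: no interfaces except loopback
        "--cap-drop", "ALL",     # drop every Linux capability
        "--memory", memory,      # hard memory cap
        "--read-only",           # root filesystem mounted read-only
        "--volume", f"{sample_path}:/sample:ro",  # sample mounted read-only
        image,
    ]


# Hypothetical image name and sample path
cmd = build_sandbox_command("sentinelhive-sandbox:latest", "/cases/42/sample.js")
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd)` rather than joining it into a shell string avoids quoting problems with attacker-controlled file names.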

Challenges we ran into

Adversarial consensus timing: getting parallel agents to halt within seconds of a checkpoint rewind meant propagating interrupt signals into active LLM tool loops. We also had to handle a race condition in which an agent saves its results after the database is cleared but before the pipeline halts.
MongoDB SRV resolution on VPS: the production server couldn't resolve Atlas SRV records, forcing us to self-host MongoDB via Docker.
False positive reduction: single-agent analysis hallucinates freely, so The Gauntlet debate protocol was essential. Agents must present evidence for contested findings, and the Director weighs confidence scores before accepting them.
Deobfuscation depth: modern malware like Qbot uses multiple obfuscation layers (string encoding, eval packing, control flow flattening). We integrated PyJSClear for multi-pass JS deobfuscation and built custom deobfuscation prompts.
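One common way to close the kind of rewind race described in the first item is a generation counter: every rewind bumps the counter before cancelling tasks, and any write tagged with an older generation is rejected. The sketch below uses plain `asyncio` with an in-memory dict standing in for the database. The `Pipeline` class and its methods are hypothetical, and this is not necessarily how SentinelHive implements it.

```python
import asyncio


class Pipeline:
    def __init__(self):
        self.generation = 0   # bumped on every rewind
        self.results = {}     # stand-in for the case database
        self.tasks = []

    def save(self, generation, agent, finding):
        # Reject writes from agents that started before the latest rewind
        if generation != self.generation:
            return False
        self.results[agent] = finding
        return True

    async def run_agent(self, agent, generation, delay):
        await asyncio.sleep(delay)  # stand-in for an LLM tool loop
        self.save(generation, agent, f"{agent} finding")

    def start(self, agents):
        gen = self.generation
        self.tasks = [
            asyncio.create_task(self.run_agent(name, gen, delay))
            for name, delay in agents
        ]

    async def rewind(self):
        self.generation += 1        # invalidate in-flight writers first
        for task in self.tasks:
            task.cancel()           # interrupt agents at their next await
        await asyncio.gather(*self.tasks, return_exceptions=True)
        self.results.clear()        # safe: stale writes are now rejected


async def demo():
    pipeline = Pipeline()
    pipeline.start([("triage", 0.01), ("reverse_engineer", 5.0)])
    await asyncio.sleep(0.05)       # triage finishes, reverse_engineer is mid-run
    await pipeline.rewind()
    # Simulate a straggler that tries to save with the old generation
    late = pipeline.save(0, "reverse_engineer", "stale finding")
    return pipeline.results, late


results, late_write_accepted = asyncio.run(demo())
```

Because the counter changes before the clear, a late writer cannot repopulate the store no matter where it was interrupted.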

Accomplishments that we're proud of

The adversarial Gauntlet dramatically reduces false positives compared to single-agent analysis
Analyst Checkpoints give human experts real control over autonomous AI — rewind, steer, and iterate
The RAG knowledge base means the system gets smarter with every analysis, not just the current one
The entire platform deploys with one command and runs fully self-hosted — no cloud AI lock-in beyond the LLM API

What we learned

Multi-agent debate is more reliable than single-agent confidence scores
Human-in-the-loop isn't just a checkbox — real analysts need to steer, not just approve
Hybrid retrieval (semantic + keyword + rank fusion) dramatically outperforms pure vector search for cybersecurity data where exact IOC matches matter as much as semantic similarity
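For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal sketch of the fusion step. Each document scores the sum of 1 / (k + rank) over every ranked list it appears in; k = 60 is the constant commonly used in the literature. The function name and the case IDs are made up for illustration, and the real retrieval code may weight or filter results differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a BM25 keyword search
semantic_hits = ["case-qbot-loader", "case-emotet-doc", "case-js-dropper"]
keyword_hits = ["case-ip-exact-match", "case-qbot-loader", "case-js-dropper"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Cases found by both retrievers rise to the top, while the exact IOC match that only BM25 found still makes the list instead of being lost to pure vector similarity.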

What's next for SentinelHive

Community threat sharing — anonymized IOC and YARA rule sharing across deployments
Predictive analysis — "given this loader pattern, what's the likely next-stage payload?"
Natural language analyst interface — conversational querying of the case database
Automated red team scenarios — generate tabletop exercises from confirmed threat reports

Built With

celery
chromadb
claude-api
cloudflare-tunnel
docker
electronhub
fastapi
mongodb
next.js
python
react
redis
sentence-transformers
sigma
tailwind-css
websocket
yara

Updates

Dinh Gia Bao Ngo started this project — Mar 29, 2026 10:03 AM EDT
