👻 GhostTrace: Adversarial Multi-Agent Incident Response
Inspiration
Modern AI-powered cybersecurity tools often rely on a single model generating a single narrative. While powerful, these systems can confidently produce inaccurate conclusions, leading to costly mistakes during incident response.
We drew inspiration from systems where truth emerges through challenge and scrutiny: courts, scientific peer review, and red team vs. blue team exercises. Instead of trusting one AI agent, we asked a simple question:
What if an AI-generated security claim had to survive cross-examination?
That question became GhostTrace.
What it does
GhostTrace is an adversarial multi-agent debate platform for cybersecurity incident response.
It analyzes forensic evidence using three specialized AI agents:
- 🔴 Attacker Agent builds the most plausible attack narrative.
- 🔵 Skeptic Agent challenges every claim using the same evidence.
- ⚖️ Arbiter Agent evaluates the debate and produces a final incident report.
Rather than trusting confidence scores generated by a single model, GhostTrace derives confidence from debate outcomes.
Confidence Calculation
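$$C_{\text{overall}} = \frac{1}{|F|} \sum_{i \in F} w_i$$
where $F$ is the set of findings (claims) the Arbiter rules on and $w_i$ is the weight of the verdict on finding $i$:
- SUSTAINED = 100
- NEEDS_MORE_EVIDENCE = 50
- ALTERNATIVE_EXPLANATION = 10
- OVERRULED = 0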
The result is a confidence-scored incident response report grounded in evidence and adversarial verification.
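The overall score is simply the mean verdict weight across all findings. Below is a minimal Python sketch, assuming the Arbiter emits exactly one verdict per finding. `Verdict` and `overall_confidence` are hypothetical names, and returning 0 for an empty set of findings is our own choice rather than documented behavior. For example, two SUSTAINED findings, one NEEDS_MORE_EVIDENCE, and one OVERRULED give (100 + 100 + 50 + 0) / 4 = 62.5.

```python
from enum import Enum


class Verdict(Enum):
    """Arbiter rulings and their weights, as listed in the write-up."""
    SUSTAINED = 100
    NEEDS_MORE_EVIDENCE = 50
    ALTERNATIVE_EXPLANATION = 10
    OVERRULED = 0


def overall_confidence(verdicts: list[Verdict]) -> float:
    """Mean verdict weight over all findings F the Arbiter ruled on."""
    if not verdicts:
        # Assumption of this sketch: no findings means zero confidence.
        return 0.0
    return sum(v.value for v in verdicts) / len(verdicts)


if __name__ == "__main__":
    rulings = [Verdict.SUSTAINED, Verdict.SUSTAINED,
               Verdict.NEEDS_MORE_EVIDENCE, Verdict.OVERRULED]
    print(overall_confidence(rulings))  # 62.5
```

Averaging, rather than taking a minimum, means a single overruled claim lowers the score without zeroing out an otherwise well-supported report.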
How we built it
Backend
- FastAPI for APIs and Server-Sent Events (SSE)
- LangGraph for multi-agent orchestration
- Pydantic for schema validation
- Groq (Llama 3.3 70B) as the primary LLM provider
- Robust JSON parsing and validation pipeline
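Server-Sent Events are plain-text frames made of `field: value` lines, each frame terminated by a blank line. That simplicity is part of what makes them easy to stream. The sketch below shows one possible way to serialize debate events. The event names, payload fields, and the `sse_frame`/`debate_stream` helpers are hypothetical, not GhostTrace's actual schema. In FastAPI, a generator like `debate_stream` could be returned through `StreamingResponse(..., media_type="text/event-stream")`.

```python
import json
from itertools import count
from typing import Any, Iterator


def sse_frame(event_id: int, event: str, data: dict[str, Any]) -> str:
    """Serialize one debate event as a Server-Sent Events frame.

    json.dumps without indent emits a single line (newlines inside strings
    are escaped), so one `data:` line suffices; the blank line ends the frame.
    """
    return f"id: {event_id}\nevent: {event}\ndata: {json.dumps(data)}\n\n"


def debate_stream(messages: list[tuple[str, dict[str, Any]]]) -> Iterator[str]:
    """Yield one SSE frame per (event_name, payload) pair, numbered from 1."""
    ids = count(1)
    for event, payload in messages:
        yield sse_frame(next(ids), event, payload)


if __name__ == "__main__":
    # Hypothetical debate messages, for illustration only.
    demo = [("attacker", {"claim_id": 1, "text": "Phishing led to initial access"}),
            ("skeptic", {"claim_id": 1, "challenge": "No mail logs in bundle"})]
    for frame in debate_stream(demo):
        print(frame, end="")
```

Numbering frames with `id:` lets a reconnecting browser `EventSource` send a `Last-Event-ID` header. The server would still need its own logic to replay any events the client missed.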
Architecture
Evidence Bundle
│
▼
Attacker Agent
│
▼
Skeptic Agent
│
▼
Arbiter Agent
│
▼
Final IR Report
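The linear flow above can be sketched without any framework. This is a plain-Python stand-in, not GhostTrace's LangGraph code. The shared `DebateState` dictionary, the stub agent functions, and their field names are all hypothetical. Each stub returns a partial state update that is merged into shared state, which is the same pattern LangGraph nodes follow.

```python
from typing import Callable, TypedDict


class DebateState(TypedDict, total=False):
    evidence: list[str]
    claims: list[dict]
    challenges: list[dict]
    report: dict


# Stub agents: in GhostTrace each would be an LLM call; here they are fixed
# functions so the control flow can run on its own.
def attacker(state: DebateState) -> DebateState:
    return {"claims": [{"id": i, "text": f"Claim based on: {e}"}
                       for i, e in enumerate(state["evidence"], start=1)]}


def skeptic(state: DebateState) -> DebateState:
    # Challenges only reference existing claims; they never add new ones.
    return {"challenges": [{"claim_id": c["id"], "objection": "Corroborate?"}
                           for c in state["claims"]]}


def arbiter(state: DebateState) -> DebateState:
    rulings = {c["claim_id"]: "NEEDS_MORE_EVIDENCE" for c in state["challenges"]}
    return {"report": {"rulings": rulings}}


PIPELINE: list[Callable[[DebateState], DebateState]] = [attacker, skeptic, arbiter]


def run_debate(evidence: list[str]) -> DebateState:
    state: DebateState = {"evidence": evidence}
    for node in PIPELINE:
        # Merge each node's partial update into the shared state.
        state.update(node(state))
    return state
```

Because agents only read and write shared state, adding a second Skeptic or a human review step would amount to inserting another function into the sequence.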
Frontend
- React 18
- Vite
- Tailwind CSS v4
- Real-time SSE streaming
- Live debate visualization
- Terminal-style forensic interface
The debate unfolds live, allowing analysts to watch agents challenge and defend claims in real time.
Challenges we ran into
Reliable Structured Output
LLMs frequently return malformed JSON, JSON wrapped in markdown code fences, or incomplete objects. We built a multi-stage parsing and validation system to keep the pipeline stable.
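Below is a minimal sketch of the kind of multi-stage parser described above, using only the standard library. The specific stages, the `parse_llm_json` name, and the required-key check are our assumptions. In the real backend, the final check would presumably be a Pydantic model validation rather than a key lookup.

```python
import json
import re
from typing import Any

# Matches a ```json ... ``` or ``` ... ``` fence around the payload.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def parse_llm_json(raw: str, required: tuple[str, ...] = ()) -> dict[str, Any]:
    """Best-effort extraction of one JSON object from an LLM reply.

    Stage 1: strip a markdown code fence if present.
    Stage 2: try strict json.loads on what remains.
    Stage 3: fall back to decoding from the first '{', ignoring trailing prose.
    Stage 4: check that the required keys are present.
    """
    text = raw.strip()
    fenced = _FENCE.search(text)
    if fenced:
        text = fenced.group(1)
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        start = text.find("{")
        if start == -1:
            raise ValueError("no JSON object found in model output") from None
        # raw_decode parses one JSON value and returns (value, end_index),
        # so prose after the object is ignored. A truncated object still
        # raises JSONDecodeError, which is a ValueError subclass.
        obj, _ = json.JSONDecoder().raw_decode(text[start:])
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in required if k not in obj]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return obj
```

Every failure surfaces as a `ValueError`, so a caller can catch a single exception type and, for example, re-prompt the agent.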
Agent Role Contamination
Early versions of the Skeptic Agent started generating its own attack theories instead of challenging existing claims. Prompt engineering and stricter constraints were required to keep each agent within its assigned role.
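Prompting alone can be backed by a structural check. One possible guard is sketched below. It assumes that each Attacker claim carries an integer id and that each Skeptic challenge names the claim it targets. The `Challenge` type and `check_skeptic_output` function are hypothetical. Any challenge that does not point at an existing claim is rejected, which could then trigger a retry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    claim_id: int    # which Attacker claim this challenge targets (assumed schema)
    objection: str


def check_skeptic_output(claim_ids: set[int], challenges: list[Challenge]) -> list[Challenge]:
    """Reject Skeptic output that drifts out of role.

    A challenge must target an existing Attacker claim; anything else is
    treated as the Skeptic inventing its own theory.
    """
    stray = [c.claim_id for c in challenges if c.claim_id not in claim_ids]
    if stray:
        raise ValueError(f"challenges reference unknown claims: {stray}")
    return challenges
```
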
Free-Tier Token Limits
Multi-agent debates consume significantly more tokens than single-agent workflows. We optimized prompts and added a provider abstraction layer so the underlying model can be switched easily.
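One way to structure such an abstraction is a small interface that every agent depends on. This sketch uses `typing.Protocol`. The `ChatProvider` and `EchoProvider` names and the `complete` method are hypothetical and are not part of Groq's SDK. A real implementation would wrap the provider's own client inside a class with the same method.

```python
from typing import Protocol


class ChatProvider(Protocol):
    """Anything that can turn a system prompt and a user prompt into text."""

    def complete(self, system: str, user: str) -> str: ...


class EchoProvider:
    """Offline stand-in for tests; a real class would call an LLM API."""

    def complete(self, system: str, user: str) -> str:
        return f"[{system}] {user}"


def run_agent(provider: ChatProvider, role_prompt: str, evidence: str) -> str:
    # Agents depend only on the protocol, so swapping providers means
    # passing a different object, not editing agent code.
    return provider.complete(role_prompt, evidence)
```

A cheap offline provider like `EchoProvider` also lets the debate flow be tested without spending free-tier tokens.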
Real-Time Debate Experience
Synchronizing SSE streams and maintaining a coherent debate timeline while keeping the UI responsive required careful state management and event orchestration.
Accomplishments that we're proud of
- Built a fully functional adversarial AI debate framework for cybersecurity incident response.
- Reduced blind trust in AI-generated forensic reports.
- Created confidence scoring based on evidence survival rather than model self-assessment.
- Delivered a real-time debate interface that makes AI reasoning transparent.
- Designed a modular architecture that supports future agent expansion and model replacement.
- Demonstrated how adversarial verification can reduce hallucination risk in high-stakes security workflows.
What we learned
Hallucinations are Architecturally Fragile
Claims unsupported by evidence often collapse when examined by an independent agent with access to the same data.
$$P(\text{hallucination survives} \mid \text{cross-examination}) \ll P(\text{hallucination survives} \mid \text{single agent})$$
Confidence Should Be Earned
A model claiming 95% confidence does not guarantee correctness. Confidence becomes more meaningful when it reflects agreement after adversarial verification.
State Design Matters
Building multi-agent systems is less about prompts and more about managing shared state, information flow, and agent boundaries.
Simplicity Wins
For one-way real-time updates, Server-Sent Events proved simpler and more reliable than WebSockets.
What's next for GhostTrace
Live SIEM Integration
Connect GhostTrace directly to enterprise security platforms such as Splunk, Microsoft Sentinel, and Elastic Security.
Multi-Skeptic Verification
Introduce multiple Skeptic agents with different investigative strategies to further stress-test claims.
Evidence Grounding
Integrate Retrieval-Augmented Generation (RAG) with threat intelligence feeds, MITRE ATT&CK mappings, and security knowledge bases.
Human-in-the-Loop Review
Allow analysts to participate in debates, challenge agents, and provide additional evidence before final report generation.
Autonomous SOC Workflows
Enable GhostTrace to recommend containment actions, prioritize incidents, and assist analysts throughout the investigation lifecycle.
Final Thought
Truth in cybersecurity should be earned through challenge, not assumed through confidence.
GhostTrace transforms AI reasoning from a black-box answer into a transparent forensic debate. By forcing every claim to survive adversarial scrutiny, it produces incident response reports that are more explainable, more trustworthy, and ultimately more useful for security teams.
Built With
- css
- fastapi
- javascript
- langgraph
- pydantic
- python
- react