Inspiration
Security operations centres are overwhelmed. Analysts face thousands of alerts daily, yet existing tools only flag anomalies; they never explain why something is suspicious or what would need to change for it to be normal. We watched SOC analysts spend hours manually correlating logs, comparing packet captures, and trying to articulate what makes a flow anomalous. That repetitive, error-prone work felt like a problem AI should solve. We wanted to build a system that doesn't just detect threats but autonomously investigates them, delivers root-cause explanations, and tells analysts, in real time, exactly what to do.
What it does
IncidentLens is an autonomous AI-powered network incident investigation engine. It ingests raw network packets, constructs temporal graphs where IPs are nodes and flows are edges, runs Graph Neural Network inference to score anomalies, and triggers an LLM-driven reasoning agent that investigates flagged windows using diagnostic tools. For every anomaly, it finds the nearest normal flow via kNN vector search and produces counterfactual explanations, telling analysts exactly which features (packet count, byte volume, inter-arrival time) would need to change, and by how much, to flip the classification. When high-risk anomalies are detected, it sends automated email alerts via SMTP. The React frontend provides a guided 4-step investigation wizard with real-time WebSocket streaming of every reasoning step.
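The counterfactual step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the project's implementation: the feature names, the in-memory brute-force cosine search (standing in for the Elasticsearch kNN query), and the sample values are all hypothetical.

```python
import numpy as np

# Hypothetical feature names, chosen to mirror the ones mentioned above.
FEATURES = ["packet_count", "byte_volume", "inter_arrival_ms"]


def nearest_normal(anomaly_emb: np.ndarray, normal_embs: np.ndarray) -> int:
    """Return the row index of the normal embedding most similar (cosine) to the anomaly."""
    a = anomaly_emb / np.linalg.norm(anomaly_emb)
    n = normal_embs / np.linalg.norm(normal_embs, axis=1, keepdims=True)
    return int(np.argmax(n @ a))


def counterfactual(anomaly_feats: np.ndarray, normal_feats: np.ndarray) -> dict:
    """List, per feature, the change that would move the anomaly onto the normal flow."""
    return {
        name: {"from": float(cur), "to": float(tgt), "delta": float(tgt - cur)}
        for name, cur, tgt in zip(FEATURES, anomaly_feats, normal_feats)
        if cur != tgt
    }


# Illustrative data only: two normal flows and one flagged flow.
normal_embs = np.array([[0.9, 0.1], [0.0, 1.0]])
normal_feats = np.array([[9.0, 1200.0, 150.0], [12.0, 1500.0, 120.0]])

idx = nearest_normal(np.array([1.0, 0.0]), normal_embs)
changes = counterfactual(np.array([450.0, 60000.0, 2.0]), normal_feats[idx])
print(changes["packet_count"])
```

The search runs on embeddings while the explanation is expressed in raw features, so the output stays readable for an analyst even though the similarity is computed in a learned space.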
How we built it
The backend is Python 3.12 with FastAPI, PyTorch Geometric for graph neural networks, and Elasticsearch 8.12 as the search and analytics engine. We built a vectorised sliding-window graph constructor using NumPy that converts raw packets into PyG Data objects. The GNN architecture uses EvolveGCN-O, an LSTM that evolves GNN weight matrices across temporal snapshots, with a Neural ODE variant for continuous-time weight evolution. Elasticsearch handles flow indexing, kNN embedding search (cosine similarity), runtime severity fields, and feature distribution aggregations. The LLM agent uses OpenAI's tool-calling API with a controlled set of diagnostic tools. The frontend is React 19, Vite 6, Tailwind v4, and shadcn/ui components. Everything is containerised with Docker Compose.
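As a rough sketch of the sliding-window graph construction, the snippet below groups packets into fixed windows and aggregates them into edges with NumPy. The function name, argument names, and dictionary layout are assumptions made for illustration; in the real pipeline each window would be wrapped in a PyG Data object, which is omitted here to keep the example dependency-free.

```python
import numpy as np


def build_window_graphs(src, dst, ts, window: float, stride: float):
    """Turn packets (src IP, dst IP, timestamp) into one aggregated graph per time window."""
    src, dst = np.asarray(src), np.asarray(dst)
    ts = np.asarray(ts, dtype=float)

    # Map every IP to a contiguous node id shared across all windows.
    nodes, codes = np.unique(np.concatenate([src, dst]), return_inverse=True)
    codes = codes.reshape(-1)
    src_id, dst_id = codes[: len(src)], codes[len(src):]

    graphs = []
    start = ts.min()
    while start <= ts.max():
        mask = (ts >= start) & (ts < start + window)
        if mask.any():
            pairs = np.stack([src_id[mask], dst_id[mask]], axis=1)
            # Collapse repeated (src, dst) pairs into one edge with a packet count.
            edges, counts = np.unique(pairs, axis=0, return_counts=True)
            graphs.append(
                {"t_start": float(start), "edge_index": edges.T, "packet_count": counts}
            )
        start += stride
    return nodes, graphs


nodes, graphs = build_window_graphs(
    src=["10.0.0.1", "10.0.0.1", "10.0.0.2"],
    dst=["10.0.0.9", "10.0.0.9", "10.0.0.9"],
    ts=[0.0, 0.5, 2.5],
    window=2.0,
    stride=2.0,
)
```

The `edge_index` array has shape `(2, num_edges)`, which matches the layout PyG expects, so converting it with `torch.from_numpy` would be the natural next step.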
Challenges we ran into
Elasticsearch client version mismatches caused cryptic 400 errors: the v9 Python client sent incompatible headers to our v8.12 server. Windows console encoding (cp1252) crashed on Unicode characters in log output. Stale Python bytecode caches repeatedly caused import errors in which classes existed in source but Python loaded empty modules. NaN IP values in the dataset produced -1 codes during factorisation, breaking graph construction. Getting the real-time streaming pipeline, GNN inference, and LLM agent to work together asynchronously required careful error handling at every layer.
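The NaN factorisation issue is easy to reproduce. The sketch below assumes the factorisation was done with `pandas.factorize`, which assigns the sentinel code -1 to missing values by default; the column names `src_ip` and `dst_ip` and the choice to drop incomplete rows are illustrative assumptions, not necessarily the fix used in the project.

```python
import numpy as np
import pandas as pd


def factorise_ips(flows: pd.DataFrame):
    """Drop flows with a missing endpoint, then map IPs to non-negative node ids."""
    clean = flows.dropna(subset=["src_ip", "dst_ip"]).reset_index(drop=True)
    both = pd.concat([clean["src_ip"], clean["dst_ip"]], ignore_index=True)
    codes, nodes = pd.factorize(both)
    n = len(clean)
    edge_index = np.stack([codes[:n], codes[n:]])  # shape (2, num_flows)
    return edge_index, list(nodes)


flows = pd.DataFrame(
    {
        "src_ip": ["10.0.0.1", None, "10.0.0.2", "10.0.0.2"],
        "dst_ip": ["10.0.0.9", "10.0.0.9", "10.0.0.9", np.nan],
    }
)

# Without cleaning, the missing source IP becomes node id -1.
raw_codes, _ = pd.factorize(flows["src_ip"])

edge_index, nodes = factorise_ips(flows)
```

A -1 node id is dangerous because it can silently index the last row of a node feature matrix instead of raising an error, so filtering before factorising is safer than patching the codes afterwards.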
Accomplishments that we're proud of
We built a fully autonomous investigation loop, from raw packets to human-readable counterfactual explanations, with no manual intervention. The system was validated on 4M+ real SSDP flood attack packets from the Kitsune dataset. Our dual GNN architecture (LSTM and Neural ODE variants) captures both structural and temporal attack patterns. The counterfactual engine doesn't just say "malicious"; it says "reduce packet_count from 450 to 9 and this flow becomes normal."
What we learned
Graph-based representations reveal structural attack patterns invisible to flat feature analysis. Elasticsearch's runtime fields and kNN search are powerful building blocks for explainable AI. Real-time streaming architectures require robust error isolation: one failed window must not crash the entire pipeline. And stale bytecode caches are a silent killer during rapid development.
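One way to get that kind of error isolation is to catch failures at the level of a single window and record them as results. The following is a minimal asyncio sketch under that assumption; `process_window`, `run_pipeline`, and the `score_fn` callback are hypothetical names, not the project's actual API.

```python
import asyncio
import logging

logger = logging.getLogger("pipeline")


async def process_window(window_id, score_fn):
    """Score one window; turn any failure into an error record instead of raising."""
    try:
        score = await score_fn(window_id)
        return {"window": window_id, "status": "ok", "score": score}
    except Exception as exc:
        logger.exception("window %s failed", window_id)
        return {"window": window_id, "status": "error", "error": str(exc)}


async def run_pipeline(window_ids, score_fn):
    """Process windows in order; a failed window never stops the ones after it."""
    return [await process_window(w, score_fn) for w in window_ids]


async def fake_score(window_id):
    # Stand-in for GNN inference: window 2 simulates a corrupt batch.
    if window_id == 2:
        raise ValueError("corrupt batch")
    return 0.1 * window_id


results = asyncio.run(run_pipeline([1, 2, 3], fake_score))
```

Returning an error record rather than re-raising means a streaming client can still be told that a window failed, which is usually more useful to an analyst than a dropped connection.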
What's next for IncidentLens
Training the temporal GNN on larger, more diverse datasets to improve anomaly scoring accuracy. Adding multi-dataset support beyond SSDP floods. Integrating with SIEM platforms like Splunk and Microsoft Sentinel. Building a feedback loop where analyst decisions improve the model over time. And expanding the LLM agent's toolset to include automated containment actions, moving from investigation alone to response.
Built With
- elasticsearch
- fast
- kitsunenetworkattack
- python
- typeform