Inspiration

The inspiration for this project stemmed from a Master's research initiative focused on system auditing. Initially, I deployed Elastic Auditbeat to track system integrity and perform provenance analysis. However, the performance overhead was significant, the configuration was rigid, and the resulting logs were a "firehose" of disconnected events that made root cause analysis nearly impossible.

I pivoted to eBPF (Extended Berkeley Packet Filter) for its low-overhead observability capabilities. My first prototype was a Python-based BCC implementation. While functional, it suffered from high userspace CPU usage during event bursts due to the sheer volume of data crossing the kernel-user boundary.

For this hackathon, I took the opportunity to re-architect the core from first principles. I rewrote the collector entirely, with the kernel-space probes in C and the userspace collector in Go, following CO-RE (Compile Once, Run Everywhere) principles so that the compiled eBPF programs run across kernel versions without recompilation. This architectural shift let me implement strict kernel-side filtering: irrelevant events are discarded at the source, before they pay the cost of crossing into userspace.

What it does

eBPF Provenance Monitor is a full-stack forensic observability suite that transforms raw kernel syscalls into actionable attack narratives. Custom eBPF probes capture high-fidelity execve, connect, and openat telemetry directly from the kernel with minimal overhead. Unlike traditional linear logging, the system reconstructs a causal graph of activity, linking processes to their file modifications and network connections. A custom Bursty Event Elimination for Provenance (BEEP) algorithm filters out roughly 90% of machine-generated noise, and an integrated Ollama AI agent explains the resulting attack chains in plain English.
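A minimal sketch of how syscall events can be turned into a causal graph, assuming a hypothetical `Event` record with `ts`, `pid`, `ppid`, `comm`, `syscall`, and `arg` fields (not the project's actual schema). An `execve` links the new program to its parent process; `openat` and `connect` link a process to a file or network endpoint. The hypothetical `forward_trace` helper then collects everything downstream of a suspicious process. For brevity it does not enforce timestamp ordering during the walk, which a real provenance tracker would need to respect.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One syscall record (hypothetical schema for illustration)."""
    ts: float      # timestamp in seconds
    pid: int       # process that issued the syscall
    ppid: int      # its parent process
    comm: str      # process name
    syscall: str   # "execve", "openat" or "connect"
    arg: str       # binary path, file path or "ip:port"


def build_graph(events):
    """Map each node to its outgoing (syscall, destination, ts) edges."""
    graph = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.ts):
        proc = f"proc:{ev.pid}"
        if ev.syscall == "execve":
            # Link the exec'ing process to the parent that spawned it.
            graph[f"proc:{ev.ppid}"].append(("execve", proc, ev.ts))
        elif ev.syscall == "openat":
            graph[proc].append(("openat", f"file:{ev.arg}", ev.ts))
        elif ev.syscall == "connect":
            graph[proc].append(("connect", f"net:{ev.arg}", ev.ts))
        # Any other syscall is ignored in this sketch.
    return graph


def forward_trace(graph, root):
    """Breadth-first walk returning every edge reachable from `root`.

    Timestamp ordering is NOT enforced here (simplification).
    """
    seen, queue, edges = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        for syscall, dst, ts in graph.get(node, []):
            edges.append((node, syscall, dst, ts))
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return edges


if __name__ == "__main__":
    demo = [
        Event(1.0, 100, 1, "bash", "execve", "/bin/bash"),
        Event(2.0, 200, 100, "curl", "execve", "/usr/bin/curl"),
        Event(3.0, 200, 100, "curl", "connect", "203.0.113.5:80"),
        Event(4.0, 200, 100, "curl", "openat", "/tmp/payload.sh"),
        Event(5.0, 300, 1, "cron", "execve", "/usr/sbin/cron"),
    ]
    for edge in forward_trace(build_graph(demo), "proc:100"):
        print(edge)
```

In the demo, tracing from the `bash` process (PID 100) returns its `curl` child, the outbound connection, and the file `curl` opened, while the unrelated `cron` branch is excluded.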

How I built it

I architected a modular, high-performance pipeline designed for production deployment. In the kernel, I wrote eBPF programs in C with libbpf, attached to tracepoints, which are a more stable kernel interface than kprobes. The userspace collector, rewritten in Go with the cilium/ebpf library, reads events from the perf ring buffer, enriches them, and streams them to Elasticsearch. The core analysis engine is written in Python and uses NetworkX to implement the BEEP algorithm for graph compression. The frontend uses Streamlit and vis.js for dynamic visualization, with a custom API wrapper that connects to a local Ollama LLM for automated threat analysis. For easy deployment, I packaged the entire solution as a standard Debian (.deb) package.
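The collector itself is written in Go, but the format it streams to Elasticsearch is language-neutral. This Python sketch shows how enriched events could be batched into a `_bulk` request body: newline-delimited JSON with one action line followed by one document line per event, ending in the trailing newline the Bulk API requires. The index name `provenance-events`, the batching helper, and the `localhost:9200` URL are assumptions for illustration, not the project's configuration.

```python
import json
import urllib.request


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def to_bulk_body(events, index="provenance-events"):
    """Serialize event dicts into an Elasticsearch _bulk NDJSON body."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(event))                          # document line
    return "\n".join(lines) + "\n"  # the Bulk API requires a final newline


def send_bulk(body, url="http://localhost:9200/_bulk"):
    """POST one bulk body and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

Sending events in batches rather than one request per event is one common way to keep indexing lag down; the response's `errors` field reports whether any individual document in the batch failed.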

Challenges I ran into

The most significant hurdle was "event explosion": benign background processes generated thousands of redundant edges, rendering the provenance graph unreadable. This drove the development of the BEEP algorithm, which compresses "bursty" events without data loss. Working within the eBPF verifier's strict constraints was also technically demanding, in particular its requirement that loops be bounded or unrolled and its 512-byte stack limit, both of which complicate parsing long command-line arguments in C. Finally, orchestrating a low-latency pipeline from Go to Python via Elasticsearch required careful tuning to prevent indexing lag.
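One way to picture the compression step: repeated edges with the same source, syscall, and destination that arrive within a short gap of one another are folded into a single edge carrying a count and the first and last timestamps. This is a minimal sketch, assuming edges arrive as `(src, syscall, dst, ts)` tuples and a hypothetical 1-second `gap`; it is not the project's BEEP implementation. It preserves the fact and frequency of each interaction but discards the individual timestamps inside a burst.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """A possibly merged edge in the provenance graph."""
    src: str
    syscall: str
    dst: str
    first_ts: float
    last_ts: float
    count: int = 1


def compress_bursts(edges, gap=1.0):
    """Merge repeated (src, syscall, dst) edges spaced at most `gap` seconds apart.

    `edges` is an iterable of (src, syscall, dst, ts) tuples.
    Returns merged Edge objects ordered by their first timestamp.
    """
    open_runs = {}  # (src, syscall, dst) -> the run currently being extended
    result = []
    for src, syscall, dst, ts in sorted(edges, key=lambda e: e[3]):
        key = (src, syscall, dst)
        run = open_runs.get(key)
        if run is not None and ts - run.last_ts <= gap:
            # Same interaction, still inside the burst: extend the run.
            run.last_ts = ts
            run.count += 1
        else:
            # First occurrence, or the gap was too long: start a new run.
            run = Edge(src, syscall, dst, ts, ts)
            open_runs[key] = run
            result.append(run)
    return result
```

For example, four `openat` calls on the same log file within 0.6 seconds collapse into one edge with `count=4`, while a fifth call ten seconds later starts a new edge.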

Accomplishments that I'm proud of

I am proud of migrating the entire prototype from a legacy Python implementation to a robust, CO-RE-compliant C/Go architecture, which significantly improved performance. Developing the BEEP algorithm was a major breakthrough: it reduces graph noise by roughly 90% while preserving the forensic integrity of the attack signal. Achieving end-to-end latency under 2 seconds and integrating a genuinely useful AI context layer were key milestones in making this a practical security tool.

What I learned

I gained a deep understanding of writing eBPF programs in C for kernel-level system call tracing and integrating them with Go via cilium/ebpf. I also learned how difficult it is to filter out noisy, repetitive events in a provenance graph while preserving actual attack patterns. Implementing BEEP meant clustering related events and using time windows to detect bursts of activity that could otherwise obscure meaningful security incidents.
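The time-window idea can be sketched with a sliding window per event group. Here a grouping key (for instance a `(process, syscall)` pair) stands in for the clustering step, and a key is flagged as bursty once more than `threshold` of its events fall within any `window`-second span. The key choice, the helper name `find_bursty_keys`, and the default values are illustrative assumptions, not the project's tuned parameters.

```python
from collections import defaultdict, deque


def find_bursty_keys(events, window=1.0, threshold=50):
    """Return the keys whose event rate exceeds `threshold` per `window` seconds.

    `events` is an iterable of (key, ts) pairs; the key groups related
    events, e.g. ("systemd-journald", "openat").
    """
    recent = defaultdict(deque)  # key -> timestamps inside the current window
    bursty = set()
    for key, ts in sorted(events, key=lambda e: e[1]):
        q = recent[key]
        q.append(ts)
        # Drop timestamps that have slid out of the window.
        while ts - q[0] > window:
            q.popleft()
        if len(q) > threshold:
            bursty.add(key)
    return bursty
```

Under this scheme, keys flagged as bursty become candidates for compression, while rare events pass through untouched.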

What's next for eBPF-Provenance-Analysis

Future work includes integrating Prometheus for real-time monitoring and alerting on security events. I plan to improve noise filtering with better pattern recognition and integrate MITRE ATT&CK mappings for automated threat classification. Performance optimizations will also be needed to handle high-volume production environments efficiently.
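As a rough illustration of what an ATT&CK mapping could look like, the sketch below matches individual `(syscall, target)` pairs against a small hand-written rule table. The rules and the `classify` helper are hypothetical examples. The technique IDs are real ATT&CK entries, but a practical classifier would need to reason over whole provenance chains rather than single events.

```python
import fnmatch

# Hypothetical rule table: (syscall, glob on the target, technique ID, name).
RULES = [
    ("openat", "/etc/shadow", "T1003.008",
     "OS Credential Dumping: /etc/passwd and /etc/shadow"),
    ("openat", "*/.ssh/authorized_keys", "T1098.004",
     "Account Manipulation: SSH Authorized Keys"),
    ("openat", "/etc/cron*", "T1053.003", "Scheduled Task/Job: Cron"),
    ("execve", "*/sh", "T1059.004",
     "Command and Scripting Interpreter: Unix Shell"),
    ("execve", "*/bash", "T1059.004",
     "Command and Scripting Interpreter: Unix Shell"),
]


def classify(syscall, target):
    """Return (technique_id, name) pairs for every rule matching the event."""
    return [
        (tid, name)
        for rule_syscall, pattern, tid, name in RULES
        # fnmatchcase: case-sensitive glob match, same behavior on every OS.
        if rule_syscall == syscall and fnmatch.fnmatchcase(target, pattern)
    ]
```

For example, `classify("openat", "/etc/crontab")` matches the cron rule, while an ordinary file in a home directory matches nothing.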
