Inspiration

The inspiration for seepod came from a gap I noticed in Kubernetes network observability tools. While solutions like Hubble offer deep insights into network traffic, they are tightly coupled with Cilium’s datapath, making them less flexible in environments using different CNIs. I wanted to explore whether I could build an independent eBPF-based network observability layer that doesn't rely on any specific CNI and works across various Kubernetes setups. The aim was to provide the same level of visibility without locking users into a particular networking stack. Beyond that, I also wanted to learn more about container networking and security through eBPF.

What it does

seepod is a real-time network traffic monitoring and visualization tool for Kubernetes clusters. It captures network flows directly from the kernel using eBPF, enriches these flows with Kubernetes metadata (like pod and namespace information), and streams them to a frontend UI via WebSocket for live visualization. This tool offers bidirectional tracking of network traffic, distinguishing between internal cluster traffic and external traffic, and provides a real-time view of network connections, helping users understand traffic patterns between pods, services, and external endpoints.
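To make the internal versus external distinction concrete, here is a minimal sketch of one way such a classification could work: a flow counts as internal only when both endpoints fall inside a known cluster CIDR. The `Cidr` type, the `classify` function, and the example ranges are hypothetical and IPv4-only; this is an illustration under those assumptions, not seepod's actual logic.

```rust
use std::net::Ipv4Addr;

/// An IPv4 CIDR block such as 10.244.0.0/16 (hypothetical helper type).
#[derive(Debug, Clone, Copy)]
pub struct Cidr {
    network: u32,
    mask: u32,
}

impl Cidr {
    /// Returns None when the prefix length is not in 0..=32.
    pub fn new(addr: Ipv4Addr, prefix_len: u8) -> Option<Self> {
        if prefix_len > 32 {
            return None;
        }
        // A /0 prefix needs special handling: shifting a u32 by 32 bits overflows.
        let mask = if prefix_len == 0 {
            0
        } else {
            u32::MAX << (32 - prefix_len as u32)
        };
        Some(Cidr { network: u32::from(addr) & mask, mask })
    }

    pub fn contains(&self, addr: Ipv4Addr) -> bool {
        u32::from(addr) & self.mask == self.network
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum TrafficKind {
    Internal,
    External,
}

/// Assumption: `cluster` lists the pod and service CIDRs of the cluster.
pub fn classify(src: Ipv4Addr, dst: Ipv4Addr, cluster: &[Cidr]) -> TrafficKind {
    let inside = |ip: Ipv4Addr| cluster.iter().any(|c| c.contains(ip));
    if inside(src) && inside(dst) {
        TrafficKind::Internal
    } else {
        TrafficKind::External
    }
}
```

With a pod CIDR of 10.244.0.0/16, a flow from 10.244.1.5 to 10.244.2.9 would be classified as internal, while a flow from the same pod to 8.8.8.8 would be external.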

How I built it

The project is composed of three core components:

  1. eBPF Programs (seepod-ebpf) These programs are responsible for attaching to cgroup socket hooks in the kernel to capture ingress and egress network traffic. They track network flows in real time, gathering data such as IP addresses, ports, protocols, and packet counts.

  2. Userspace Agent (seepod) This Rust-based application loads and manages the eBPF programs, processes the captured flow events, enriches them with Kubernetes metadata (e.g., mapping cgroup IDs to pods), and exposes the events to the frontend via a WebSocket server.

  3. React Frontend The UI connects to the WebSocket endpoint to receive real-time flow events and visualizes them in a network topology graph, updating live as network traffic flows in and out of pods.
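The enrichment step in the agent can be pictured roughly as follows. This is a simplified sketch, not the project's real types: `RawFlow`, `PodInfo`, `EnrichedFlow`, and `enrich` are names invented for illustration, and the cgroup-to-pod map is assumed to be populated elsewhere (for example from the Kubernetes API).

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Kubernetes identity attached to a flow (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PodInfo {
    pub namespace: String,
    pub name: String,
}

/// A flow record as it might arrive from the eBPF side (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawFlow {
    pub cgroup_id: u64,
    pub src: IpAddr,
    pub dst: IpAddr,
    pub src_port: u16,
    pub dst_port: u16,
    pub protocol: u8, // IP protocol number, e.g. 6 for TCP
    pub packets: u64,
}

/// The raw flow plus whatever pod metadata could be resolved.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EnrichedFlow {
    pub raw: RawFlow,
    pub pod: Option<PodInfo>,
}

/// Look up the pod that owns the flow's cgroup; unknown cgroups stay
/// unlabeled instead of being dropped.
pub fn enrich(raw: RawFlow, pods: &HashMap<u64, PodInfo>) -> EnrichedFlow {
    let pod = pods.get(&raw.cgroup_id).cloned();
    EnrichedFlow { raw, pod }
}
```

Keeping `pod` as an `Option` means flows from cgroups that have not been mapped yet can still be forwarded to the UI, just without pod labels.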

Challenges I ran into

  1. Handling Cgroup ID Instability Kubernetes pods are ephemeral, and their associated cgroup IDs can change when pods restart. Ensuring consistent mapping between cgroup IDs and pods required implementing caching and sync mechanisms, as well as fallback logic to handle pod restarts and scaling.

  2. Kernel-level Complexity Accurately capturing network traffic and handling complex scenarios like NATed packets, IPv6, and socket states presented challenges. Making sure we didn't double-count packets or miss important details required fine-tuning of eBPF programs and perf buffer handling.

  3. Frontend Performance Real-time visualization of large amounts of flow data in the frontend was challenging. Optimizing performance required techniques like debouncing updates, reducing unnecessary re-renders, and ensuring the UI remained responsive even with high-frequency event streams. That said, performance at the current stage is probably not yet as good as it should be, and I plan to keep improving it.
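For the cgroup ID instability described in the first challenge, one possible shape for the caching, sync, and fallback logic is sketched below. It assumes a periodic snapshot of (cgroup ID, pod) pairs is available and falls back to matching by pod IP when a cgroup ID is unknown. `PodRef`, `PodCache`, `resync`, and `lookup` are hypothetical names, and the IP fallback is my illustrative choice rather than a description of what seepod does.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PodRef {
    pub namespace: String,
    pub name: String,
    pub ip: IpAddr,
}

/// Cache with two indexes: cgroup ID (primary) and pod IP (fallback).
#[derive(Default)]
pub struct PodCache {
    by_cgroup: HashMap<u64, PodRef>,
    by_ip: HashMap<IpAddr, PodRef>,
}

impl PodCache {
    /// Replace the cache with a fresh snapshot so that cgroup IDs of
    /// restarted or deleted pods do not linger.
    pub fn resync(&mut self, snapshot: Vec<(u64, PodRef)>) {
        self.by_cgroup.clear();
        self.by_ip.clear();
        for (cgroup_id, pod) in snapshot {
            self.by_ip.insert(pod.ip, pod.clone());
            self.by_cgroup.insert(cgroup_id, pod);
        }
    }

    /// Prefer the cgroup ID; if it is unknown (for example, the pod just
    /// restarted and got a new cgroup), fall back to the local IP.
    pub fn lookup(&self, cgroup_id: u64, local_ip: IpAddr) -> Option<&PodRef> {
        self.by_cgroup
            .get(&cgroup_id)
            .or_else(|| self.by_ip.get(&local_ip))
    }
}
```

A full resync is the simplest way to drop stale entries; an incremental update driven by pod watch events would be a natural refinement.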

Accomplishments that I'm proud of

  1. Building an Independent eBPF-based Tool The accomplishment I'm proudest of is that seepod works completely independently of any CNI (like Cilium or Calico), offering network observability with minimal overhead using eBPF. The ability to enrich raw network data with Kubernetes context is a powerful feature that adds significant value without relying on any specific networking setup.

  2. Real-time Visualization and Service Maps The real-time streaming of network events to a frontend UI is a major milestone. The ability to view traffic flows between pods and visualize network topology live is a great tool for operators and developers who need visibility into cluster communication. Displaying the flows as service maps also makes the traffic easier to understand.

What I learned

  1. Deep eBPF Knowledge I gained a deeper understanding of eBPF's potential, from handling packet filtering at the kernel level to understanding its interaction with user-space programs. It was a great opportunity to dive into kernel programming and learn how to use eBPF in production-level applications.

  2. Kubernetes Internals Mapping cgroup IDs to Kubernetes pods was more challenging than I initially thought. The experience taught me how deeply Kubernetes integrates with the underlying OS and how dynamic the cluster environment can be, requiring thoughtful design to ensure resilience and data consistency.

  3. Rust for System-Level Programming Rust has proven to be an excellent choice for system-level programming. Its safety guarantees and powerful concurrency model made it ideal for building the performance-sensitive agent that handles eBPF data, and for handling async operations with tokio in the backend.

  4. Real-time Event Streaming Building the real-time WebSocket streaming for network events was an interesting learning experience. It highlighted the importance of efficient data flow, and I gained valuable insight into how to handle high-frequency event data and display it in a way that remains performant and easy to understand.
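One common way to tame high-frequency event streams before they reach a WebSocket is to coalesce events that share a key and send periodic summaries instead of one message per packet. The sketch below illustrates that idea only. `FlowKey`, `FlowBatcher`, `record`, and `flush` are hypothetical names, and the caller is assumed to invoke `flush` on a timer (for instance from a tokio interval).

```rust
use std::collections::HashMap;

/// Identity of a connection edge in the topology graph (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FlowKey {
    pub src_pod: String,
    pub dst_pod: String,
    pub dst_port: u16,
}

/// Accumulates packet counts per key between flushes.
#[derive(Default)]
pub struct FlowBatcher {
    pending: HashMap<FlowKey, u64>,
}

impl FlowBatcher {
    /// Add packets to the running total for this key.
    pub fn record(&mut self, key: FlowKey, packets: u64) {
        *self.pending.entry(key).or_insert(0) += packets;
    }

    /// Drain everything accumulated so far, busiest flows first.
    /// The batcher is empty afterwards.
    pub fn flush(&mut self) -> Vec<(FlowKey, u64)> {
        let mut out: Vec<(FlowKey, u64)> = self.pending.drain().collect();
        out.sort_by(|a, b| b.1.cmp(&a.1));
        out
    }
}
```

Sending one summary per flush interval keeps the message rate bounded by the number of distinct flows rather than the number of packets, which also gives the frontend fewer updates to render.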

What's next for seepod

  1. Feature Expansion There’s still a lot to add! The next step is to expand seepod with additional features, such as:
     - DNS tracing for deeper flow inspection
     - Layer 7 protocol awareness (HTTP, gRPC, etc.)
     - Improved topology visualization with more interactive and detailed UI elements
     - Cluster-wide deployment with Helm charts for easy installation in production environments

  2. Performance Optimizations I want to continue fine-tuning the performance, especially for high-traffic clusters. This includes improving the way we handle large amounts of flow data in real-time and minimizing any potential bottlenecks in the pipeline.

  3. Extensibility In the long run, I’d like to turn seepod into a more extensible platform, potentially allowing for plug-in integrations for additional observability features. This would make it more adaptable to different cluster architectures and use cases.
