How eBPF Works: Programmable Networking in the Linux Kernel
eBPF (extended Berkeley Packet Filter) is a technology that lets you run sandboxed programs inside the Linux kernel without changing kernel source code or loading kernel modules. Originally designed for packet filtering, eBPF has evolved into a general-purpose in-kernel virtual machine that powers a new generation of networking, observability, and security tools. If you have used Cilium for Kubernetes networking, bpftrace for tracing, or Cloudflare's DDoS mitigation on traffic headed toward networks like AS13335, you have used eBPF.
What makes eBPF remarkable is its position: it runs at the lowest level of the operating system where it can inspect and manipulate every packet, system call, and kernel function — yet it does so safely, with guarantees that a buggy program cannot crash the kernel. This combination of power and safety has made eBPF the most significant innovation in Linux systems programming in the past decade.
From BPF to eBPF: A Brief History
The original Berkeley Packet Filter (BPF) was created in 1992 by Steven McCanne and Van Jacobson at Lawrence Berkeley National Laboratory. Classic BPF was a simple virtual machine with two 32-bit registers, a scratch memory store, and an instruction set designed for one purpose: filtering packets efficiently. When you ran tcpdump -i eth0 port 80, the filter expression was compiled into BPF bytecode and loaded into the kernel, where it decided which packets to capture without copying every packet to userspace.
Classic BPF was elegant but limited. It could only filter packets, had minimal state, and its instruction set was deliberately constrained. For two decades, it served its purpose well but remained a niche subsystem.
In 2014, Alexei Starovoitov introduced extended BPF (eBPF) into the Linux kernel (version 3.18). The "extended" part was an understatement. eBPF expanded the register set to eleven 64-bit registers (ten general-purpose plus a read-only frame pointer), added a larger instruction set that maps closely onto modern CPU architectures, introduced maps for persistent key-value storage, and — crucially — allowed programs to attach to many different kernel hook points beyond packet filtering. The BPF virtual machine became a general-purpose execution engine embedded in the kernel.
By 2016, eBPF programs could attach to tracepoints, kprobes, and perf events. By 2018, BPF Type Format (BTF) and CO-RE enabled portable programs across kernel versions. Today, eBPF is used so pervasively that the term "BPF" almost always refers to the extended version.
eBPF Architecture
Understanding eBPF requires understanding how its components fit together: programs written in C or Rust are compiled to BPF bytecode, verified for safety, JIT-compiled to native machine code, and attached to kernel hook points where they execute in response to events.
eBPF Program Types
eBPF programs are not generic — each program has a specific type that determines where in the kernel it can attach, what context it receives, and what helper functions it can call. The kernel currently defines over 30 program types. The most important for networking are:
XDP (eXpress Data Path)
XDP programs run at the earliest possible point in the network receive path — before the kernel allocates an sk_buff (socket buffer) structure, before the network stack processes the packet, and in some cases before the packet even leaves the NIC's DMA ring buffer. This makes XDP extraordinarily fast for packet processing. An XDP program receives the raw packet data and returns a verdict:
- XDP_PASS — let the packet continue to the normal network stack
- XDP_DROP — drop the packet immediately (used for DDoS mitigation and firewalling)
- XDP_TX — bounce the packet back out the same interface (used for load balancing)
- XDP_REDIRECT — forward the packet to a different interface or CPU
- XDP_ABORTED — drop the packet and signal an error
XDP can operate in three modes: native mode (the NIC driver calls the XDP program directly; fast, and the most common deployment), offloaded mode (the program runs on the NIC hardware itself; the fastest, but supported by few NICs and limited in program complexity), and generic mode (runs later in the stack; works with any NIC but is slower).
TC (Traffic Control)
TC programs attach to the kernel's traffic control subsystem and run on both ingress and egress paths. Unlike XDP, TC programs see the full sk_buff structure, giving them access to parsed protocol headers, routing decisions, and socket associations. TC is the right hook for tasks that need more context than raw packet bytes — rewriting headers, applying policy based on connection tracking state, or implementing container networking overlays.
Socket Filter and Socket Operations
Socket-level programs attach to individual sockets or cgroups of sockets. BPF_PROG_TYPE_SOCKET_FILTER programs can inspect (but not modify) packets on a socket — a direct descendant of classic BPF's packet filtering. BPF_PROG_TYPE_SOCK_OPS programs can intercept socket operations like TCP connection establishment and modify TCP parameters. BPF_PROG_TYPE_SK_MSG programs can redirect messages between sockets, enabling kernel-level proxying without copying data through userspace.
Tracing Programs
Beyond networking, eBPF excels at system observability. kprobe and kretprobe programs attach to nearly any kernel function's entry or return, tracepoint programs attach to stable kernel tracepoints, fentry/fexit programs (added in Linux 5.5) provide faster function tracing with direct access to function arguments and return values, and perf_event programs run on hardware performance counter overflows or software events.
LSM (Linux Security Module) Programs
Added in Linux 5.7, LSM programs attach to the kernel's security hooks — the same hooks used by SELinux and AppArmor. This allows eBPF to implement custom mandatory access control policies at runtime, without compiling them into the kernel.
The Verifier: How eBPF Guarantees Safety
The eBPF verifier is the critical piece that makes the entire system work. Every eBPF program must pass through the verifier before it can execute, and the verifier rejects any program it cannot prove safe. This is not a runtime sandbox — the verifier performs static analysis at load time, examining every possible execution path through the program.
The verifier's checks include:
- Control flow analysis — The verifier builds a directed acyclic graph (DAG) of the program. Before Linux 5.3, all loops were forbidden. Since 5.3, bounded loops are allowed: the verifier must be able to prove that any loop will terminate within a fixed number of iterations.
- Register state tracking — The verifier tracks the type and range of every register at every point in every execution path. It knows whether a register holds a pointer to a map value, a packet pointer, a stack pointer, a scalar, or is uninitialized. Dereferencing an uninitialized or null pointer is rejected.
- Memory bounds checking — Every memory access is checked. If your program reads from a packet pointer at offset N, the verifier ensures N is within the packet bounds. If you read from a map value, it ensures the offset is within the value size. Out-of-bounds access is rejected at load time, not at runtime.
- Helper function validation — Each program type can only call specific helper functions. An XDP program cannot call helpers meant for tracing programs. The verifier checks argument types for every helper call.
- Instruction count limit — Programs have a maximum instruction limit (1 million verified instructions as of Linux 5.2, up from 4,096 in early versions). This bounds verification time and ensures programs terminate.
The verifier is conservative: it rejects programs it cannot prove safe, even if they are actually safe. Experienced eBPF developers learn to write "verifier-friendly" code — restructuring programs so the verifier can track register state through all branches. This is the price of the safety guarantee.
JIT Compilation
After passing the verifier, eBPF bytecode is JIT-compiled to native machine code for the host CPU architecture. The JIT compiler exists for x86-64, ARM64, RISC-V, s390x, PowerPC, and other architectures. JIT compilation eliminates interpretation overhead, making eBPF programs run at near-native speed.
Modern kernels enable JIT by default (net.core.bpf_jit_enable = 1). The JIT compiler also applies security hardening: it randomizes the memory location of JIT'd code (JIT spraying mitigation), and can use constant blinding to prevent attackers from inserting gadgets via eBPF constants.
BPF Maps: Shared State
eBPF programs by themselves are stateless — they receive a context, process it, and return a verdict. BPF maps provide persistent state that survives across program invocations and can be shared between eBPF programs and user-space applications. Maps are the mechanism for extracting data (metrics, events, logs) from kernel-space programs and for injecting configuration (rules, policies) from user space.
The kernel provides many map types, each optimized for different access patterns:
- Hash maps (BPF_MAP_TYPE_HASH) — General-purpose key-value storage. Used for connection tracking tables, flow caches, and configuration lookups. O(1) average lookup time.
- Array maps (BPF_MAP_TYPE_ARRAY) — Fixed-size arrays indexed by integer keys. Used for per-CPU counters, lookup tables, and program arrays (for tail calls). O(1) guaranteed.
- Ring buffers (BPF_MAP_TYPE_RINGBUF) — Lock-free, multi-producer single-consumer buffer for streaming events from kernel to user space. Replaced the older perf buffer for most use cases due to better performance and memory efficiency.
- LPM trie (BPF_MAP_TYPE_LPM_TRIE) — Longest prefix match trie, designed specifically for IP routing and ACL lookups. Given an IP address, it returns the most specific matching prefix — the same operation that routers perform when consulting their BGP routing table for a prefix lookup.
- Per-CPU maps — Hash and array variants where each CPU core has its own copy of the data, eliminating lock contention for counters and statistics.
- Stack and queue maps — LIFO and FIFO data structures for ordered processing.
- Sockmap and sockhash — Maps of socket references, enabling eBPF programs to redirect traffic between sockets at the kernel level.
Maps are created from user space via the bpf() system call and referenced from eBPF programs via file descriptors. They persist as long as they are referenced by a program or pinned in the BPF filesystem (/sys/fs/bpf/).
Tail Calls and BPF-to-BPF Function Calls
Complex eBPF applications often need more logic than fits in a single program. Two mechanisms address this:
Tail calls allow one eBPF program to chain into another, transferring control without returning. The call stack is not extended — the current program is replaced by the next one. This is used to break large programs into stages (e.g., a packet processing pipeline where the first program classifies the protocol and tail-calls to a protocol-specific handler). Tail calls use a special map type (BPF_MAP_TYPE_PROG_ARRAY) that stores references to other programs.
BPF-to-BPF function calls (added in Linux 4.16) allow normal function call semantics within a single eBPF program. Unlike tail calls, these use a proper call stack and the called function returns to the caller. The verifier follows calls to verify the entire call graph.
CO-RE: Compile Once, Run Everywhere
A persistent challenge with eBPF is that kernel data structures change between versions. A program that accesses struct task_struct compiled on Linux 5.10 might not work on Linux 5.15 if the offsets of the fields it reads have changed. Traditionally, eBPF programs needed to be compiled on the target machine with local kernel headers.
CO-RE (Compile Once, Run Everywhere) solves this through three components working together:
- BTF (BPF Type Format) — The kernel embeds a compact description of all its data structures in a format called BTF. This is available at /sys/kernel/btf/vmlinux and describes every struct, enum, and typedef in the running kernel.
- Clang relocations — When compiling an eBPF program, Clang records which struct fields the program accesses as relocatable references rather than hard-coded offsets.
- libbpf relocations — When loading the program, libbpf reads the target kernel's BTF, resolves the field references, and patches the program's bytecode with the correct offsets for the running kernel.
CO-RE means you can compile an eBPF program once and distribute the resulting binary to machines running different kernel versions. This was a turning point for eBPF tooling adoption, making it practical to ship pre-compiled eBPF programs in packages and container images.
Tooling: bpftool and libbpf
bpftool is the standard command-line tool for interacting with eBPF in a running kernel. It can list loaded programs (bpftool prog list), inspect maps (bpftool map dump), show program bytecode and JIT'd assembly, attach and detach programs, and generate skeleton headers for libbpf-based development.
libbpf is the canonical C library for loading and managing eBPF programs. It handles ELF parsing, map creation, program loading, BTF processing, and CO-RE relocations. Modern eBPF development follows the "libbpf + skeleton" pattern: you write your BPF program in C, compile it with Clang, generate a skeleton header with bpftool, and write a user-space loader in C that uses the skeleton to load, attach, and interact with the program.
Alternative libraries exist for other languages: Aya (Rust), libbpfgo (Go), and libbpf-rs (Rust bindings to libbpf). Aya is notable because it avoids libbpf and LLVM entirely, letting you write both the eBPF program and the loader in pure Rust.
Networking Use Cases
eBPF's deepest impact has been in networking, where its ability to run custom logic at wire speed inside the kernel has replaced components that previously required kernel modules, hardware appliances, or slow userspace proxies.
XDP Load Balancing
XDP enables layer 4 load balancing at speeds that match or exceed dedicated hardware. An XDP program inspects incoming packets, rewrites destination addresses based on a hash of the flow tuple, and returns XDP_TX to send the packet back out the wire — all without the packet ever touching the kernel's network stack. Facebook's Katran is the most prominent example: it is an open-source XDP-based L4 load balancer that handles all production traffic for facebook.com, instagram.com, and other Meta services. Katran performs consistent hashing across backend pools, supports Direct Server Return (DSR), and handles millions of packets per second per core.
Cilium: Replacing kube-proxy and iptables
Cilium is a Kubernetes CNI (Container Network Interface) plugin that replaces the traditional kube-proxy and iptables-based container networking stack with eBPF programs. In a traditional Kubernetes cluster, every Service creates a chain of iptables rules that grows linearly with the number of services, eventually degrading performance. Cilium replaces all of this with eBPF maps and programs attached at TC and XDP hooks.
The result is O(1) service lookup instead of O(n) iptables chain traversal, native support for BGP for advertising service IPs (letting Kubernetes services be directly routable from the external network), identity-based network policy enforcement, and transparent encryption between pods via WireGuard or IPsec — all implemented in eBPF.
Cloudflare DDoS Mitigation
Cloudflare uses XDP extensively to mitigate DDoS attacks against the millions of websites it protects. When a volumetric attack targets a site behind Cloudflare's network (AS13335), XDP programs on every edge server classify and drop attack packets before they consume kernel resources. The programs are updated dynamically as new attack signatures are identified — new rules are pushed to BPF maps without reloading programs or interrupting service.
Cloudflare has publicly described processing over 30 million packets per second per server using XDP, dropping attack traffic at line rate while legitimate traffic passes through to the normal network stack.
Meta's Katran
Katran deserves special attention as a case study in what eBPF enables. Before Katran, Meta used IPVS (IP Virtual Server) in the kernel for load balancing. IPVS worked but was inflexible — changing the hashing algorithm or adding new features required modifying and recompiling a kernel module, then rolling it out across the fleet. Katran replaced this with an XDP program that runs in the NIC driver. The XDP program performs Maglev consistent hashing, encapsulates packets in IPIP or GUE tunnels for backend delivery, and handles health checking — all at speeds exceeding 10 million packets per second per core. When Meta needs to change the balancing algorithm, they update the XDP program without touching the kernel.
Observability: Seeing Inside the Kernel
eBPF has transformed Linux observability. Before eBPF, deep system introspection required kernel modules (risky), SystemTap (complex to deploy), or DTrace (not available on Linux). eBPF provides safe, production-ready kernel instrumentation.
bpftrace
bpftrace is a high-level tracing language for eBPF, inspired by DTrace and awk. It lets you write one-liners that instrument the kernel. For example, bpftrace -e 'kprobe:tcp_sendmsg { @bytes = hist(arg2); }' attaches to the kernel's TCP send function and builds a histogram of message sizes — across every TCP connection on the system, in real time, with negligible overhead.
bpftrace compiles these scripts to eBPF bytecode behind the scenes. For networking analysis, you can trace TCP retransmissions, DNS query latencies, connection establishment times, and packet drops — all from production systems without restarting services.
Pixie and Continuous Observability
Pixie (now part of New Relic) uses eBPF to automatically instrument Kubernetes applications without code changes or sidecars. eBPF programs attached to socket operations capture all network traffic in and out of pods, parse application-layer protocols (HTTP, gRPC, MySQL, PostgreSQL, Redis, Kafka), and generate request-level metrics and traces — all without any application instrumentation. This approach captures 100% of traffic because it operates at the kernel level, below the application.
Security: Runtime Enforcement
eBPF's ability to hook into security-relevant kernel paths has created a new category of security tools.
Falco
Falco (originally from Sysdig, now a CNCF project) uses eBPF to monitor system calls and detect anomalous behavior at runtime. It can detect container escapes, unexpected network connections, file access violations, privilege escalation attempts, and cryptomining processes — all by attaching eBPF programs to system call tracepoints and evaluating a rules engine against the captured events.
Tetragon
Tetragon (from Isovalent / Cilium) goes further by attaching eBPF programs to LSM hooks and kernel functions. It provides not just detection but enforcement: it can kill processes that violate security policies, block specific system calls, and enforce file integrity policies — all in kernel space, with no userspace component in the enforcement path. When Tetragon detects a prohibited action, the eBPF program returns an error code directly from the kernel hook, before the action completes.
eBPF vs Kernel Modules
eBPF occupies a space that was previously the exclusive domain of kernel modules. The comparison is instructive:
- Safety — A buggy kernel module can panic the kernel, corrupt memory, or create security vulnerabilities. An eBPF program is verified before execution and cannot crash the kernel. This is the fundamental difference.
- Portability — Kernel modules must be compiled for each kernel version (or use DKMS, which compiles on install). With CO-RE, eBPF programs can be compiled once and run across kernel versions.
- Loading — eBPF programs can be loaded and unloaded dynamically at runtime; most program types require CAP_BPF (or CAP_SYS_ADMIN on kernels before 5.8) rather than full root. Kernel modules always require root and go through the module loader subsystem.
- Capabilities — Kernel modules have unrestricted access to all kernel internals. eBPF programs can only access data through defined contexts, helpers, and maps. This restricts what eBPF can do — some tasks still require kernel modules.
- Performance — Both run as native code in kernel context. For the tasks eBPF can perform, performance is comparable. Kernel modules can use optimizations (like kernel preemption control) that eBPF cannot.
- Ecosystem — The Linux kernel community is increasingly adding new features as eBPF hooks rather than as kernel modules. The direction of the ecosystem is clear: where eBPF can replace a kernel module, it should.
eBPF on Windows
In 2021, Microsoft announced eBPF for Windows, bringing the eBPF programming model to the Windows kernel. The project uses a compatibility layer that translates eBPF bytecode to run in the Windows kernel execution environment, reusing existing eBPF toolchains (Clang, libbpf) and verification (using the PREVAIL verifier, a formal-methods-based alternative to the Linux verifier).
eBPF for Windows supports XDP-like hooks for packet processing and socket-level hooks, enabling tools like Cilium to potentially work on Windows nodes in mixed Kubernetes clusters. The project is open source and under active development, though it covers a narrower set of program types than Linux.
The fact that Microsoft chose to adopt the eBPF interface rather than design a competing system underscores how dominant the eBPF model has become for in-kernel programmability.
The Broader Implications
eBPF represents a philosophical shift in operating system design. Traditionally, the kernel was a monolithic binary that changed slowly through official releases. eBPF makes the kernel programmable — operators can extend kernel behavior at runtime, without waiting for upstream patches or risking kernel module crashes.
This has practical consequences for how networks operate. Consider the path a packet takes through a modern infrastructure: it arrives at an edge server where an XDP program drops DDoS traffic, passes to a TC program that load-balances it to a backend, enters a container network managed by Cilium's eBPF programs, and has its access controlled by eBPF-based security policies. At every stage, the behavior is defined by eBPF programs that can be updated, monitored, and debugged without kernel changes.
For anyone working with internet infrastructure — whether you are looking at BGP routes, managing autonomous systems, or debugging network paths — eBPF is increasingly the layer where packets are actually processed. Understanding it is no longer optional for serious network engineering.
Explore the Infrastructure
Many of the networks that deploy eBPF at scale are visible in the global BGP routing table. You can examine their routing and connectivity:
- AS13335 — Cloudflare: XDP-based DDoS mitigation across 300+ cities
- AS32934 — Meta: Katran XDP load balancer across all production traffic
- AS15169 — Google: eBPF-based networking in GKE and production infrastructure
- AS8075 — Microsoft: eBPF for Windows and Azure networking
- 1.1.1.1 — Cloudflare DNS: protected by XDP at the edge