How eBPF Works: Programmable Networking in the Linux Kernel

eBPF (extended Berkeley Packet Filter) is a technology that lets you run sandboxed programs inside the Linux kernel without changing kernel source code or loading kernel modules. Originally designed for packet filtering, eBPF has evolved into a general-purpose in-kernel virtual machine that powers a new generation of networking, observability, and security tools. If you have used Cilium for Kubernetes networking, bpftrace for tracing, or Cloudflare's DDoS mitigation on traffic headed toward networks like AS13335, you have used eBPF.

What makes eBPF remarkable is its position: it runs at the lowest level of the operating system where it can inspect and manipulate every packet, system call, and kernel function — yet it does so safely, with guarantees that a buggy program cannot crash the kernel. This combination of power and safety has made eBPF the most significant innovation in Linux systems programming in the past decade.

From BPF to eBPF: A Brief History

The original Berkeley Packet Filter (BPF) was created in 1992 by Steven McCanne and Van Jacobson at Lawrence Berkeley National Laboratory. Classic BPF was a simple virtual machine with two 32-bit registers, a scratch memory store, and an instruction set designed for one purpose: filtering packets efficiently. When you ran tcpdump -i eth0 port 80, the filter expression was compiled into BPF bytecode and loaded into the kernel, where it decided which packets to capture without copying every packet to userspace.

Classic BPF was elegant but limited. It could only filter packets, had minimal state, and its instruction set was deliberately constrained. For two decades, it served its purpose well but remained a niche subsystem.

In 2014, Alexei Starovoitov introduced extended BPF (eBPF) into the Linux kernel (version 3.18). The "extended" part was an understatement. eBPF expanded the register set to ten general-purpose 64-bit registers plus a read-only frame pointer, added a larger instruction set that maps closely to modern 64-bit CPU architectures, introduced maps for persistent key-value storage, and — crucially — allowed programs to attach to many different kernel hook points beyond packet filtering. The BPF virtual machine became a general-purpose execution engine embedded in the kernel.

By 2016, eBPF programs could attach to tracepoints, kprobes, and perf events. By 2018, BPF Type Format (BTF) and CO-RE enabled portable programs across kernel versions. Today, eBPF is used so pervasively that the term "BPF" almost always refers to the extended version.

eBPF Architecture

Understanding eBPF requires understanding how its components fit together: programs written in C or Rust are compiled to BPF bytecode, verified for safety, JIT-compiled to native machine code, and attached to kernel hook points where they execute in response to events.

[Diagram: the eBPF execution pipeline. In user space, C or Rust source is compiled by LLVM/Clang into BPF bytecode in an ELF .o file, which a loader (libbpf or bpftool) submits to the kernel via the bpf() syscall. In kernel space, the verifier runs safety checks, the JIT compiler produces native machine code, and the program attaches to a hook point: XDP, TC, socket, kprobe, tracepoint, or cgroup. BPF maps (hash, array, ring buffer, LPM trie, ...) hold state shared between programs and user space.]

eBPF Program Types

eBPF programs are not generic — each program has a specific type that determines where in the kernel it can attach, what context it receives, and what helper functions it can call. The kernel currently defines over 30 program types. The most important for networking are:

XDP (eXpress Data Path)

XDP programs run at the earliest possible point in the network receive path — before the kernel allocates an sk_buff (socket buffer) structure, before the network stack processes the packet, and in some cases before the packet even leaves the NIC's DMA ring buffer. This makes XDP extraordinarily fast for packet processing. An XDP program receives the raw packet data and returns a verdict:

- XDP_DROP: discard the packet immediately
- XDP_PASS: hand the packet to the normal network stack
- XDP_TX: transmit the packet back out the interface it arrived on
- XDP_REDIRECT: send the packet to another interface, another CPU, or an AF_XDP socket
- XDP_ABORTED: signal a program error (the packet is dropped and a tracepoint fires)

XDP can operate in three modes: native mode, where the NIC driver invokes the XDP program directly (the fastest software path); offloaded mode, where the program runs on the NIC hardware itself (faster still, but with tight limits on program complexity); and generic mode, which runs later in the stack and works with any NIC but is slower.
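The decision logic an XDP program applies can be sketched in ordinary C. The verdict values below match the kernel's enum xdp_action, but the parsing here is a simplified userspace illustration (a toy filter that drops IPv4/UDP traffic to port 53), not a loadable XDP program, which would be compiled with clang -target bpf:

```c
#include <stdint.h>
#include <stddef.h>

enum xdp_action { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Toy classifier: drop IPv4/UDP packets destined for port 53, pass the rest.
 * `pkt` points at the Ethernet header, `len` is the frame length. */
int classify(const uint8_t *pkt, size_t len)
{
    if (len < 14 + 20 + 8)                      /* eth + min IPv4 + UDP */
        return XDP_PASS;
    if (!(pkt[12] == 0x08 && pkt[13] == 0x00))  /* EtherType != IPv4 */
        return XDP_PASS;
    size_t ihl = (pkt[14] & 0x0f) * 4;          /* IPv4 header length */
    if (pkt[23] != 17)                          /* IP protocol != UDP */
        return XDP_PASS;
    if (14 + ihl + 8 > len)                     /* UDP header in bounds? */
        return XDP_PASS;
    uint16_t dport = (uint16_t)((pkt[14 + ihl + 2] << 8) | pkt[14 + ihl + 3]);
    return dport == 53 ? XDP_DROP : XDP_PASS;
}
```

A real XDP program does the same work against ctx->data and ctx->data_end pointers, with every access bounds-checked so the verifier can prove it safe.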

TC (Traffic Control)

TC programs attach to the kernel's traffic control subsystem and run on both ingress and egress paths. Unlike XDP, TC programs see the full sk_buff structure, giving them access to parsed protocol headers, routing decisions, and socket associations. TC is the right hook for tasks that need more context than raw packet bytes — rewriting headers, applying policy based on connection tracking state, or implementing container networking overlays.

Socket Filter and Socket Operations

Socket-level programs attach to individual sockets or cgroups of sockets. BPF_PROG_TYPE_SOCKET_FILTER programs can inspect (but not modify) packets on a socket — a direct descendant of classic BPF's packet filtering. BPF_PROG_TYPE_SOCK_OPS programs can intercept socket operations like TCP connection establishment and modify TCP parameters. BPF_PROG_TYPE_SK_MSG programs can redirect messages between sockets, enabling kernel-level proxying without copying data through userspace.

Tracing Programs

Beyond networking, eBPF excels at system observability. kprobe and kretprobe programs attach to nearly any kernel function entry or return, tracepoint programs attach to stable kernel tracepoints, fentry/fexit programs (added in Linux 5.5) provide faster function tracing with direct access to function arguments and return values, and perf_event programs run on hardware performance counter overflows or software events.

LSM (Linux Security Module) Programs

Added in Linux 5.7, LSM programs attach to the kernel's security hooks — the same hooks used by SELinux and AppArmor. This allows eBPF to implement custom mandatory access control policies at runtime, without compiling them into the kernel.

The Verifier: How eBPF Guarantees Safety

The eBPF verifier is the critical piece that makes the entire system work. Every eBPF program must pass through the verifier before it can execute, and the verifier rejects any program it cannot prove safe. This is not a runtime sandbox — the verifier performs static analysis at load time, examining every possible execution path through the program.

The verifier pipeline, from BPF bytecode arriving via the bpf() syscall to a verdict:

1. Build the control-flow graph (must be a DAG; bounded loops are allowed since Linux 5.3)
2. Walk all execution paths, tracking register state
3. Check memory bounds and types
4. Validate helper calls and their arguments
5. Ensure bounded execution

A program that fails any check is rejected with an error; an accepted program proceeds to the JIT compiler.

The verifier's checks include:

- every memory access is within bounds and backed by a known object (packet data, stack, or map value)
- no register is read before it has been written
- helper functions receive arguments of the expected types
- execution is bounded: no unbounded loops, no unreachable instructions
- kernel pointers do not leak to user space

The verifier is conservative: it rejects programs it cannot prove safe, even if they are actually safe. Experienced eBPF developers learn to write "verifier-friendly" code — restructuring programs so the verifier can track register state through all branches. This is the price of the safety guarantee.
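One common verifier-friendly idiom is masking a computed array index against a power-of-two bound, which pins the index into a provable range through every branch. The sketch below is ordinary userspace C illustrating the pattern (NBUCKETS and bucket_for are illustrative names, not kernel APIs):

```c
#include <stdint.h>

#define NBUCKETS 64   /* must be a power of two for the mask trick */

uint64_t buckets[NBUCKETS];

/* Masking with (NBUCKETS - 1) guarantees idx < NBUCKETS in a way the
 * verifier can track as a simple range fact on the register, rather
 * than a relationship it has to re-prove across branches. */
uint64_t *bucket_for(uint32_t hash)
{
    uint32_t idx = hash & (NBUCKETS - 1);
    return &buckets[idx];
}
```

In eBPF C the same mask appears just before a map-value or packet access that uses a computed offset.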

JIT Compilation

After passing the verifier, eBPF bytecode is JIT-compiled to native machine code for the host CPU architecture. The JIT compiler exists for x86-64, ARM64, RISC-V, s390x, PowerPC, and other architectures. JIT compilation eliminates interpretation overhead, making eBPF programs run at near-native speed.

Modern kernels enable JIT by default (net.core.bpf_jit_enable = 1). The JIT compiler also applies security hardening: it randomizes the memory location of JIT'd code (JIT spraying mitigation), and can use constant blinding to prevent attackers from inserting gadgets via eBPF constants.

BPF Maps: Shared State

eBPF programs by themselves are stateless — they receive a context, process it, and return a verdict. BPF maps provide persistent state that survives across program invocations and can be shared between eBPF programs and user-space applications. Maps are the mechanism for extracting data (metrics, events, logs) from kernel-space programs and for injecting configuration (rules, policies) from user space.

The kernel provides many map types, each optimized for different access patterns:

- BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_ARRAY: general key-value and index-based storage
- per-CPU variants of both, which avoid lock contention on hot paths
- BPF_MAP_TYPE_LPM_TRIE: longest-prefix matching, ideal for IP routing and filtering rules
- BPF_MAP_TYPE_RINGBUF: an efficient ring buffer for streaming events to user space
- BPF_MAP_TYPE_PROG_ARRAY: references to other BPF programs, used for tail calls

Maps are created from user space via the bpf() system call and referenced from eBPF programs via file descriptors. They persist as long as they are referenced by a program or pinned in the BPF filesystem (/sys/fs/bpf/).
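The lookup semantics of the LPM trie map are worth seeing concretely. This userspace sketch reproduces what BPF_MAP_TYPE_LPM_TRIE does for a lookup, over a small hypothetical rule table (a linear scan here; the kernel uses an actual trie):

```c
#include <stdint.h>

struct lpm_rule { uint32_t prefix; uint8_t len; int verdict; };

/* Hypothetical policy: 10.0.0.0/8 -> verdict 1, 10.1.0.0/16 -> verdict 2. */
struct lpm_rule rules[] = {
    { 0x0A000000u,  8, 1 },
    { 0x0A010000u, 16, 2 },
};

/* Return the verdict of the longest (most specific) matching prefix,
 * or -1 if nothing matches -- the same answer an LPM trie lookup gives. */
int lpm_lookup(uint32_t addr)
{
    int best_len = -1, verdict = -1;
    for (unsigned i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        uint32_t mask = rules[i].len ? ~0u << (32 - rules[i].len) : 0;
        if ((addr & mask) == rules[i].prefix && rules[i].len > best_len) {
            best_len = rules[i].len;
            verdict  = rules[i].verdict;
        }
    }
    return verdict;
}
```

An XDP firewall typically stores drop/pass rules keyed by prefix exactly this way, with user space updating the map as policy changes.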

Tail Calls and BPF-to-BPF Function Calls

Complex eBPF applications often need more logic than fits in a single program. Two mechanisms address this:

Tail calls allow one eBPF program to chain into another, transferring control without returning. The call stack is not extended — the current program is replaced by the next one. This is used to break large programs into stages (e.g., a packet processing pipeline where the first program classifies the protocol and tail-calls to a protocol-specific handler). Tail calls use a special map type (BPF_MAP_TYPE_PROG_ARRAY) that stores references to other programs.

BPF-to-BPF function calls (added in Linux 4.16) allow normal function call semantics within a single eBPF program. Unlike tail calls, these use a proper call stack and the called function returns to the caller. The verifier follows calls to verify the entire call graph.
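The staging pattern that tail calls enable can be modeled in userspace as a dispatch table (all names here are illustrative). In real eBPF the classifier calls bpf_tail_call() into a BPF_MAP_TYPE_PROG_ARRAY and never regains control; this sketch keeps ordinary call semantics but shows the same structure:

```c
#include <stdint.h>

enum { PROTO_TCP, PROTO_UDP, PROTO_MAX };

/* Stub protocol-specific stages; real ones would parse and act. */
int handle_tcp(const uint8_t *pkt) { (void)pkt; return 1; }
int handle_udp(const uint8_t *pkt) { (void)pkt; return 2; }

/* Stand-in for a PROG_ARRAY map: index -> program. */
int (*prog_array[PROTO_MAX])(const uint8_t *) = { handle_tcp, handle_udp };

int dispatch(int proto, const uint8_t *pkt)
{
    if (proto < 0 || proto >= PROTO_MAX || !prog_array[proto])
        return -1;   /* like a failed bpf_tail_call: fall through */
    return prog_array[proto](pkt);
}
```

Because the map entries can be swapped from user space at runtime, individual pipeline stages can be upgraded without reloading the classifier.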

CO-RE: Compile Once, Run Everywhere

A persistent challenge with eBPF is that kernel data structures change between versions. A program that accesses struct task_struct compiled on Linux 5.10 might not work on Linux 5.15 if the offsets of the fields it reads have changed. Traditionally, eBPF programs needed to be compiled on the target machine with local kernel headers.

CO-RE (Compile Once, Run Everywhere) solves this through three components working together:

- BTF (BPF Type Format): the kernel describes its own data structures in a compact type format, exposed at /sys/kernel/btf/vmlinux
- Clang relocation records: the compiler records which types and fields the program accesses, instead of baking raw offsets into the bytecode
- libbpf relocations: at load time, libbpf matches those records against the running kernel's BTF and patches the program with the correct offsets

CO-RE means you can compile an eBPF program once and distribute the resulting binary to machines running different kernel versions. This was a turning point for eBPF tooling adoption, making it practical to ship pre-compiled eBPF programs in packages and container images.
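The failure mode CO-RE prevents is easy to demonstrate. The two structs below are hypothetical stand-ins for the "same" kernel struct on two kernel versions, not the real task_struct; a non-CO-RE program bakes in the offset it saw at compile time, and that offset is simply wrong on the other layout:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* The "same" struct on two kernel versions: a field was added, so
 * every offset after it shifted. */
struct task_v1 { uint32_t pid; uint32_t tgid; };                 /* pid at 0 */
struct task_v2 { uint64_t flags; uint32_t pid; uint32_t tgid; }; /* pid at 8 */

/* What a non-CO-RE program effectively does: read 4 bytes at a fixed
 * offset chosen at compile time. CO-RE instead records "field pid of
 * struct task_struct" and lets libbpf patch in the right offset at load. */
uint32_t read_pid_hardcoded(const void *task, size_t off)
{
    uint32_t pid;
    memcpy(&pid, (const uint8_t *)task + off, sizeof pid);
    return pid;
}
```

With the offset compiled against task_v1, a read from a task_v2 object returns garbage; the CO-RE relocation step is what keeps the offset correct per kernel.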

Tooling: bpftool and libbpf

bpftool is the standard command-line tool for interacting with eBPF in a running kernel. It can list loaded programs (bpftool prog list), inspect maps (bpftool map dump), show program bytecode and JIT'd assembly, attach and detach programs, and generate skeleton headers for libbpf-based development.

libbpf is the canonical C library for loading and managing eBPF programs. It handles ELF parsing, map creation, program loading, BTF processing, and CO-RE relocations. Modern eBPF development follows the "libbpf + skeleton" pattern: you write your BPF program in C, compile it with Clang, generate a skeleton header with bpftool, and write a user-space loader in C that uses the skeleton to load, attach, and interact with the program.

Alternative libraries exist for other languages: Aya (Rust), libbpfgo (Go), and libbpf-rs (Rust bindings to libbpf). Aya is notable because it has no dependency on libbpf or the C toolchain, letting you write both the eBPF program and the loader in pure Rust.

Networking Use Cases

eBPF's deepest impact has been in networking, where its ability to run custom logic at wire speed inside the kernel has replaced components that previously required kernel modules, hardware appliances, or slow userspace proxies.

XDP Load Balancing

XDP enables layer 4 load balancing at speeds that match or exceed dedicated hardware. An XDP program inspects incoming packets, rewrites destination addresses based on a hash of the flow tuple, and returns XDP_TX to send the packet back out the wire — all without the packet ever touching the kernel's network stack. Facebook's Katran is the most prominent example: it is an open-source XDP-based L4 load balancer that handles all production traffic for facebook.com, instagram.com, and other Meta services. Katran performs consistent hashing across backend pools, supports Direct Server Return (DSR), and handles millions of packets per second per core.
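The core of the technique is hashing the flow tuple so every packet of a connection lands on the same backend. A minimal sketch, with FNV-1a standing in for whatever hash a production balancer actually uses (struct and function names here are illustrative):

```c
#include <stdint.h>
#include <string.h>

struct flow { uint32_t saddr, daddr; uint16_t sport, dport; uint8_t proto; };

/* FNV-1a: small, fast, good enough to illustrate flow hashing. */
uint32_t fnv1a(const uint8_t *p, uint32_t n)
{
    uint32_t h = 2166136261u;
    for (uint32_t i = 0; i < n; i++) { h ^= p[i]; h *= 16777619u; }
    return h;
}

/* Pick a backend index from the 5-tuple. Copy fields into a flat key
 * so struct padding bytes never influence the hash. */
uint32_t pick_backend(const struct flow *f, uint32_t n_backends)
{
    uint8_t key[13];
    memcpy(key,      &f->saddr, 4);
    memcpy(key + 4,  &f->daddr, 4);
    memcpy(key + 8,  &f->sport, 2);
    memcpy(key + 10, &f->dport, 2);
    key[12] = f->proto;
    return fnv1a(key, sizeof key) % n_backends;
}
```

Plain hash-mod like this reshuffles most flows when a backend is added or removed, which is why production balancers such as Katran use consistent hashing instead.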

Cilium: Replacing kube-proxy and iptables

Cilium is a Kubernetes CNI (Container Network Interface) plugin that replaces the traditional kube-proxy and iptables-based container networking stack with eBPF programs. In a traditional Kubernetes cluster, every Service creates a chain of iptables rules that grows linearly with the number of services, eventually degrading performance. Cilium replaces all of this with eBPF maps and programs attached at TC and XDP hooks.

The result is O(1) service lookup instead of O(n) iptables chain traversal, native BGP support for advertising service IPs (making Kubernetes services directly routable from the external network), identity-based network policy enforcement, and transparent encryption between pods via WireGuard or IPsec — all implemented in eBPF.

Cloudflare DDoS Mitigation

Cloudflare uses XDP extensively to mitigate DDoS attacks against the millions of websites it protects. When a volumetric attack targets a site behind Cloudflare's network (AS13335), XDP programs on every edge server classify and drop attack packets before they consume kernel resources. The programs are updated dynamically as new attack signatures are identified — new rules are pushed to BPF maps without reloading programs or interrupting service.

Cloudflare has publicly described processing over 30 million packets per second per server using XDP, dropping attack traffic at line rate while legitimate traffic passes through to the normal network stack.

[Diagram: the XDP data path. A packet arrives in the NIC's DMA ring and hits the XDP program, which classifies and filters it before any sk_buff is allocated. XDP_DROP discards the packet, XDP_TX bounces it back out the NIC, and XDP_PASS hands it to the network stack (sk_buff allocation, routing, netfilter), then on through TC eBPF on ingress/egress and socket-level eBPF to the application. BPF maps hold the rules, counters, flow state, and LPM tries shared across the path.]

Meta's Katran

Katran deserves special attention as a case study in what eBPF enables. Before Katran, Meta used IPVS (IP Virtual Server) in the kernel for load balancing. IPVS worked but was inflexible — changing the hashing algorithm or adding new features required modifying and recompiling a kernel module, then rolling it out across the fleet. Katran replaced this with an XDP program that runs in the NIC driver. The XDP program performs Maglev consistent hashing, encapsulates packets in IPIP or GUE tunnels for backend delivery, and handles health checking — all at speeds exceeding 10 million packets per second per core. When Meta needs to change the balancing algorithm, they update the XDP program without touching the kernel.
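The Maglev table-population algorithm mentioned above can be sketched briefly. Each backend fills lookup-table slots in its own pseudo-random permutation order until the table is full, which keeps the table nearly balanced and minimizes remapping when backends change. The hash functions and sizes below are simple stand-ins, and the sketch assumes at most 64 backends; production tables are far larger (Maglev wants a prime size like 65537):

```c
#include <stdint.h>

#define M 31   /* lookup table size; must be prime */

/* Toy string hash; a real implementation uses a proper hash per backend. */
uint32_t strhash(const char *s, uint32_t seed)
{
    uint32_t h = seed;
    while (*s) h = h * 31 + (uint8_t)*s++;
    return h;
}

/* Fill table[0..M-1] with backend indices, Maglev-style. Assumes n <= 64. */
void maglev_populate(const char **backends, int n, int table[M])
{
    uint32_t offset[64], skip[64], next[64];
    for (int i = 0; i < n; i++) {
        offset[i] = strhash(backends[i], 0x12345678u) % M;
        skip[i]   = strhash(backends[i], 0x9e3779b9u) % (M - 1) + 1;
        next[i]   = 0;
    }
    for (int j = 0; j < M; j++) table[j] = -1;
    for (int filled = 0; filled < M; ) {
        for (int i = 0; i < n && filled < M; i++) {
            /* Walk backend i's permutation to its next free slot. Since M
             * is prime and skip is in [1, M-1], the sequence visits every
             * slot, so a free one is always found. */
            uint32_t c = (offset[i] + next[i] * skip[i]) % M;
            while (table[c] != -1) {
                next[i]++;
                c = (offset[i] + next[i] * skip[i]) % M;
            }
            table[c] = i;
            next[i]++;
            filled++;
        }
    }
}
```

Because backends take turns claiming slots, each ends up with close to M/n entries, and removing one backend disturbs only the slots it owned.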

Observability: Seeing Inside the Kernel

eBPF has transformed Linux observability. Before eBPF, deep system introspection required kernel modules (risky), SystemTap (complex to deploy), or DTrace (not available on Linux). eBPF provides safe, production-ready kernel instrumentation.

bpftrace

bpftrace is a high-level tracing language for eBPF, inspired by DTrace and awk. It lets you write one-liners that instrument the kernel. For example, bpftrace -e 'kprobe:tcp_sendmsg { @bytes = hist(arg2); }' attaches to the kernel's TCP send function and builds a histogram of message sizes — across every TCP connection on the system, in real time, with negligible overhead.

bpftrace compiles these scripts to eBPF bytecode behind the scenes. For networking analysis, you can trace TCP retransmissions, DNS query latencies, connection establishment times, and packet drops — all from production systems without restarting services.

Pixie and Continuous Observability

Pixie (now part of New Relic) uses eBPF to automatically instrument Kubernetes applications without code changes or sidecars. eBPF programs attached to socket operations capture all network traffic in and out of pods, parse application-layer protocols (HTTP, gRPC, MySQL, PostgreSQL, Redis, Kafka), and generate request-level metrics and traces — all without any application instrumentation. This approach captures 100% of traffic because it operates at the kernel level, below the application.

Security: Runtime Enforcement

eBPF's ability to hook into security-relevant kernel paths has created a new category of security tools.

Falco

Falco (originally from Sysdig, now a CNCF project) uses eBPF to monitor system calls and detect anomalous behavior at runtime. It can detect container escapes, unexpected network connections, file access violations, privilege escalation attempts, and cryptomining processes — all by attaching eBPF programs to system call tracepoints and evaluating a rules engine against the captured events.

Tetragon

Tetragon (from Isovalent / Cilium) goes further by attaching eBPF programs to LSM hooks and kernel functions. It provides not just detection but enforcement: it can kill processes that violate security policies, block specific system calls, and enforce file integrity policies — all in kernel space, with no userspace component in the enforcement path. When Tetragon detects a prohibited action, the eBPF program returns an error code directly from the kernel hook, before the action completes.

eBPF vs Kernel Modules

eBPF occupies a space that was previously the exclusive domain of kernel modules. The comparison is instructive:

- Safety: the verifier proves an eBPF program cannot crash the kernel; a buggy module can panic the whole machine.
- API stability: modules compile against internal kernel APIs that change freely between releases; eBPF programs target stable hook points and, with CO-RE, survive kernel upgrades.
- Deployment: eBPF programs load and unload at runtime through a single syscall; modules must match the running kernel build and raise signing and taint concerns.
- Capability: modules can do anything the kernel can, while eBPF is deliberately constrained to what the verifier can prove safe.

eBPF on Windows

In 2021, Microsoft announced eBPF for Windows, bringing the eBPF programming model to the Windows kernel. The project uses a compatibility layer that translates eBPF bytecode to run in the Windows kernel execution environment, reusing existing eBPF toolchains (Clang, libbpf) and verification (using the PREVAIL verifier, a formal-methods-based alternative to the Linux verifier).

eBPF for Windows supports XDP-like hooks for packet processing and socket-level hooks, enabling tools like Cilium to potentially work on Windows nodes in mixed Kubernetes clusters. The project is open source and under active development, though it covers a narrower set of program types than Linux.

The fact that Microsoft chose to adopt the eBPF interface rather than design a competing system underscores how dominant the eBPF model has become for in-kernel programmability.

The Broader Implications

eBPF represents a philosophical shift in operating system design. Traditionally, the kernel was a monolithic binary that changed slowly through official releases. eBPF makes the kernel programmable — operators can extend kernel behavior at runtime, without waiting for upstream patches or risking kernel module crashes.

This has practical consequences for how networks operate. Consider the path a packet takes through a modern infrastructure: it arrives at an edge server where an XDP program drops DDoS traffic, passes to a TC program that load-balances it to a backend, enters a container network managed by Cilium's eBPF programs, and has its access controlled by eBPF-based security policies. At every stage, the behavior is defined by eBPF programs that can be updated, monitored, and debugged without kernel changes.

For anyone working with internet infrastructure — whether you are looking at BGP routes, managing autonomous systems, or debugging network paths — eBPF is increasingly the layer where packets are actually processed. Understanding it is no longer optional for serious network engineering.
