How VXLAN Works: Virtual Extensible LAN Overlay Networking

Virtual Extensible LAN (VXLAN) is a network overlay technology that encapsulates Layer 2 Ethernet frames inside Layer 3 UDP packets, enabling the creation of virtualized Layer 2 networks that span across Layer 3 boundaries. Defined in RFC 7348, VXLAN addresses the scalability limitations of traditional VLANs (limited to 4,094 IDs) by providing a 24-bit segment identifier (VNI) that supports up to 16 million logical networks. VXLAN has become the dominant overlay technology in modern data center fabrics, cloud infrastructure, and container networking environments.

VXLAN matters to network engineers working with large-scale data centers because it decouples the logical network topology from the physical underlay. Virtual machines and containers can migrate freely between physical hosts while retaining their MAC addresses, IP addresses, and network membership — the VXLAN overlay provides a consistent Layer 2 domain regardless of the physical location. When combined with EVPN (Ethernet VPN) as the control plane, VXLAN becomes a sophisticated, scalable fabric technology that integrates with BGP for MAC/IP advertisement and route distribution.

Why VXLAN Exists: The VLAN Scalability Problem

Traditional VLANs use a 12-bit VLAN ID field in the IEEE 802.1Q tag, limiting the maximum number of VLANs to 4,094 (IDs 0 and 4095 are reserved). In a multi-tenant data center hosting thousands of customers, each requiring network isolation, 4,094 VLANs is insufficient. Even within a single large enterprise, application teams, development environments, and microservice architectures can easily exhaust the VLAN space.
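The arithmetic behind both identifier spaces is simple to verify:

```python
# The 802.1Q tag carries a 12-bit VLAN ID; IDs 0 and 4095 are reserved.
usable_vlans = 2**12 - 2
print(usable_vlans)     # 4094

# VXLAN's VNI field is 24 bits wide.
vni_space = 2**24
print(vni_space)        # 16777216
```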

Beyond the ID limitation, VLANs have other scaling problems: spanning tree must block redundant links to keep the topology loop-free, wasting available bandwidth; a VLAN must be provisioned on every switch along the path, coupling logical networks to the physical topology; and MAC address tables in the core must hold an entry for every host in every VLAN, which can exhaust table capacity in large deployments.

VXLAN solves all of these problems by using the Layer 3 underlay (IP/UDP) as a transport for Layer 2 frames. The underlay provides routing, ECMP load balancing, and loop-free forwarding. The overlay provides the logical Layer 2 connectivity with a much larger identifier space.

VXLAN Encapsulation

VXLAN uses a MAC-in-UDP encapsulation scheme. An original Ethernet frame from a virtual machine or container is wrapped in a VXLAN header, then placed inside a UDP packet, which is in turn encapsulated in an outer IP packet and outer Ethernet frame for transport across the underlay network.

[Figure: VXLAN packet encapsulation.]

On the wire, the encapsulated packet is laid out as: Outer Ethernet (14 bytes) | Outer IP (20 bytes) | UDP (8 bytes) | VXLAN header (8 bytes) | original Ethernet frame (variable length).

The 8-byte VXLAN header contains: Flags (8 bits, with the I bit set to 1 to indicate a valid VNI), 24 reserved bits, the VNI (24 bits, values 0 to 16,777,215), and 8 final reserved bits. The UDP destination port is 4789 (IANA assigned); the UDP source port is a hash of the inner frame, used for ECMP.

The encapsulation adds 50 bytes of overhead to every frame: 14 bytes outer Ethernet, 20 bytes outer IP, 8 bytes UDP, and 8 bytes VXLAN header. This means the underlay MTU must be at least 50 bytes larger than the inner frame size. For standard 1500-byte inner frames, the underlay needs an MTU of at least 1550 bytes. Most data center fabrics configure a jumbo frame MTU of 9214 on the underlay to accommodate VXLAN encapsulation without fragmenting inner frames.
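The header layout and overhead arithmetic can be sketched with Python's struct module; the byte layout follows RFC 7348, while the helper name is illustrative:

```python
import struct

# VXLAN header (RFC 7348): flags (8 bits), 24 reserved bits,
# VNI (24 bits), 8 reserved bits -- 8 bytes total.
def vxlan_header(vni: int) -> bytes:
    flags = 0x08           # I bit set: the VNI field is valid
    word1 = flags << 24    # flags followed by 24 reserved bits
    word2 = vni << 8       # VNI followed by 8 reserved bits
    return struct.pack("!II", word1, word2)

hdr = vxlan_header(10000)
print(len(hdr))                     # 8

# Total encapsulation overhead on the wire:
overhead = 14 + 20 + 8 + len(hdr)   # outer Ethernet + IP + UDP + VXLAN
print(overhead)                     # 50
print(1500 + overhead)              # 1550: minimum underlay MTU for 1500-byte inner frames
```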

The VNI (VXLAN Network Identifier)

The VNI is the 24-bit field that identifies the VXLAN segment — the logical Layer 2 network. It is analogous to a VLAN ID but with a vastly larger space: 16,777,216 possible values compared to 4,094 VLANs. Each VNI defines an isolated broadcast domain. Traffic from one VNI cannot reach another VNI without explicit routing (inter-VXLAN routing), providing tenant isolation in multi-tenant environments.

UDP Source Port Hashing

VXLAN uses UDP destination port 4789. The source port is computed as a hash of fields from the inner frame (typically the inner source/destination MAC, IP, and TCP/UDP ports). This source port entropy is critical because it enables the underlay network to perform ECMP (Equal-Cost Multi-Path) load balancing across multiple parallel paths. Without source port variation, all VXLAN traffic between two VTEPs would hash to the same ECMP path, wasting available bandwidth. The hash-based source port distributes VXLAN flows across all available underlay paths.
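A minimal sketch of source-port derivation, assuming a CRC32 hash of the inner 5-tuple (real implementations choose their own hash, and the ephemeral-range mapping here is just one common choice):

```python
import zlib

# Derive a UDP source port from the inner flow so that different flows
# hash onto different ECMP paths in the underlay.
def vxlan_source_port(src_mac, dst_mac, src_ip, dst_ip, l4_src, l4_dst):
    key = f"{src_mac}{dst_mac}{src_ip}{dst_ip}{l4_src}{l4_dst}".encode()
    h = zlib.crc32(key)
    # Map into the ephemeral range 49152-65535.
    return 49152 + (h % (65536 - 49152))

p1 = vxlan_source_port("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02",
                       "10.0.1.5", "10.0.2.7", 33444, 443)
p2 = vxlan_source_port("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02",
                       "10.0.1.5", "10.0.2.7", 33445, 443)
print(p1, p2)   # different flows almost always yield different source ports
```

Because the hash is deterministic, all packets of one flow keep the same source port, so the underlay never reorders a flow across paths.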

VTEPs: VXLAN Tunnel Endpoints

A VTEP (VXLAN Tunnel Endpoint) is the device that performs VXLAN encapsulation and decapsulation. VTEPs are the edge devices of the VXLAN overlay — they sit at the boundary between the Layer 2 domain (virtual machines, containers, bare-metal servers) and the Layer 3 underlay network.

VTEPs can be implemented in several places: in hardware, on the switching ASICs of data center leaf switches; in software, in hypervisor virtual switches such as Open vSwitch; in the Linux kernel's native vxlan driver, widely used by container networking; and in SmartNICs that offload encapsulation from the host CPU.

Each VTEP has at least one IP address on the underlay network (the VTEP IP, often a loopback address). VXLAN tunnels are established between VTEP IPs. The underlay routing protocol (OSPF, IS-IS, or BGP) provides reachability between VTEP IPs, and the underlay network handles forwarding the encapsulated packets.
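A Linux host can act as a software VTEP using the kernel's vxlan driver. The following is a configuration sketch (requires root; the device names, VNI, and addresses are illustrative):

```shell
# Create a VXLAN device for VNI 10000, sourcing tunnels from loopback
# 10.0.0.1 and using the IANA-assigned destination port 4789.
ip link add vxlan10000 type vxlan id 10000 local 10.0.0.1 dstport 4789 nolearning
ip link set vxlan10000 up

# Bridge the VXLAN device with the workload-facing side so frames from
# local VMs/containers are encapsulated toward remote VTEPs.
ip link add br10000 type bridge
ip link set vxlan10000 master br10000
ip link set br10000 up
```

The `nolearning` flag disables data-plane MAC learning on the tunnel, which is appropriate when a control plane such as EVPN populates the forwarding database instead.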

BUM Traffic Handling

One of the biggest challenges in VXLAN is handling BUM (Broadcast, Unknown unicast, and Multicast) traffic. In a traditional VLAN, BUM traffic is flooded to all ports in the VLAN. In a VXLAN overlay, the equivalent is sending BUM traffic to all VTEPs that participate in the same VNI. Three approaches exist:

Multicast-Based Flooding

The original RFC 7348 approach maps each VNI to an IP multicast group (e.g., VNI 10000 → 239.1.1.1). BUM traffic is encapsulated and sent to the multicast group address. The underlay network's multicast routing (PIM-SM) delivers it to all VTEPs that have joined the group. This approach is simple but requires multicast infrastructure in the underlay, which many operators are reluctant to deploy due to complexity and troubleshooting difficulty.

Ingress Replication (Head-End Replication)

The VTEP maintains a list of all remote VTEPs participating in each VNI and unicasts a copy of BUM traffic to each one individually. This eliminates the need for multicast in the underlay but increases the amount of traffic generated by the source VTEP proportionally to the number of remote VTEPs. For small to medium deployments (dozens of VTEPs per VNI), ingress replication works well. For very large deployments, the replication overhead can become significant.
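The replication logic above can be sketched as follows; the flood-list contents are illustrative, and a real VTEP would VXLAN-encapsulate each copy:

```python
# Head-end replication: the source VTEP unicasts one copy of a BUM frame
# to every remote VTEP in the VNI's flood list.
flood_lists = {
    10000: ["10.0.0.2", "10.0.0.3", "10.0.0.4"],  # remote VTEP IPs for VNI 10000
}

def replicate_bum(vni: int, frame: bytes) -> list[tuple[str, bytes]]:
    """Return one (remote_vtep_ip, frame) pair per remote VTEP in the VNI."""
    copies = []
    for vtep_ip in flood_lists.get(vni, []):
        copies.append((vtep_ip, frame))  # real code would encapsulate here
    return copies

broadcast = b"\xff\xff\xff\xff\xff\xff" + b"payload"
copies = replicate_bum(10000, broadcast)
print(len(copies))   # 3 -- traffic scales with the number of remote VTEPs
```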

EVPN-Based Suppression

The most modern and scalable approach uses EVPN (described below) to distribute MAC and IP information via BGP, enabling VTEPs to proxy-respond to ARP/ND requests without flooding them. This dramatically reduces BUM traffic. When a VM sends an ARP request, the local VTEP intercepts it, looks up the target IP in its EVPN-learned MAC/IP database, and responds directly. The ARP request never crosses the underlay.
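The proxy-response behavior can be sketched like this; the table contents and function name are illustrative:

```python
# EVPN ARP suppression: the local VTEP answers ARP requests from its
# BGP-learned MAC/IP table instead of flooding them across the overlay.
evpn_mac_ip_table = {
    "10.0.100.20": "aa:bb:cc:00:00:20",  # learned via EVPN Type 2 routes
    "10.0.100.30": "aa:bb:cc:00:00:30",
}

def handle_arp_request(target_ip: str):
    """Return the target MAC for a proxy ARP reply, or None to fall back to flooding."""
    mac = evpn_mac_ip_table.get(target_ip)
    if mac is not None:
        return mac   # reply locally; the request never crosses the underlay
    return None      # unknown target: flood (or drop, depending on policy)

print(handle_arp_request("10.0.100.20"))  # aa:bb:cc:00:00:20
print(handle_arp_request("10.0.100.99"))  # None -> flood
```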

EVPN: The Control Plane for VXLAN

RFC 7348 defined only the VXLAN data plane (encapsulation format) and used flood-and-learn for MAC address discovery — the same mechanism as traditional Ethernet switching, but tunneled through VXLAN. This data-plane-only approach has significant limitations: it requires either multicast or ingress replication for all BUM traffic, it has no control over which VTEPs participate in which VNIs, and MAC learning is reactive rather than proactive.

EVPN (Ethernet VPN, RFC 7432) provides a proper control plane for VXLAN using BGP as the routing protocol. EVPN with VXLAN encapsulation is defined in RFC 8365. With EVPN, MAC and IP addresses are advertised as BGP routes rather than learned from flooded data-plane traffic. Each VTEP runs BGP (typically iBGP with route reflectors) and advertises the MAC/IP addresses of locally connected endpoints.

[Figure: EVPN-VXLAN leaf-spine fabric. Two spines (Spine-1 and Spine-2) act as BGP route reflectors; four leaves (Leaf-1 through Leaf-4) are VTEPs with loopback IPs 10.0.0.1 through 10.0.0.4, each hosting VMs. In the BGP EVPN control plane, Leaf-1 advertises MAC-A and MAC-B in VNI 10000, and Leaf-3 advertises MAC-D in VNI 10000. In the VXLAN data plane, encapsulated frames traverse the underlay via ECMP paths through the spines.]

EVPN Route Types

EVPN uses BGP to carry several route types, each serving a specific purpose in the overlay network:

Type 1 (Ethernet Auto-Discovery): supports multihoming, enabling fast convergence and aliasing when a site connects to multiple VTEPs.

Type 2 (MAC/IP Advertisement): the workhorse route, advertising the MAC address (and optionally the IP address) of an endpoint along with its VNI and the advertising VTEP as the next hop.

Type 3 (Inclusive Multicast Ethernet Tag): announces a VTEP's participation in a VNI, which remote VTEPs use to build their ingress replication flood lists for BUM traffic.

Type 4 (Ethernet Segment): used to discover VTEPs attached to the same multihomed segment and to elect a designated forwarder.

Type 5 (IP Prefix): advertises IP prefixes, rather than individual host MAC/IP bindings, for inter-subnet routing through a VRF.

Symmetric and Asymmetric IRB

When traffic needs to be routed between different VXLAN segments (different VNIs), Integrated Routing and Bridging (IRB) is used. Two models exist:

Asymmetric IRB

In the asymmetric model, the ingress VTEP performs both the routing lookup (L3) and the bridging into the destination VNI. The packet crosses the underlay in the destination VNI. The egress VTEP only performs L2 bridging. This is "asymmetric" because the ingress does more work than the egress, and the return traffic path may use a different VNI.

The downside: the ingress VTEP must have both the source and destination VNIs configured, which means every VTEP must be configured with every VNI in the fabric. This does not scale in large multi-tenant environments.

Symmetric IRB

In the symmetric model, both the ingress and egress VTEPs perform routing. The ingress VTEP routes the packet from the source VNI into a shared L3 VNI (associated with a VRF), encapsulates it in VXLAN with the L3 VNI, and sends it across the underlay. The egress VTEP decapsulates, performs a routing lookup in the VRF, and bridges the packet into the destination VNI.

The advantage: each VTEP only needs to be configured with the VNIs of locally connected hosts plus the L3 VNI for the VRF. This scales much better in multi-tenant environments where different tenants have different sets of VNIs on different leaf switches. Symmetric IRB is the recommended and most widely deployed model in modern EVPN-VXLAN fabrics.
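The scaling difference between the two models comes down to per-VTEP configuration footprint, which a small worked example makes concrete (the fabric sizes are assumptions for illustration):

```python
# Assume a fabric with 500 tenant VNIs, where each leaf locally hosts 20.
total_vnis = 500
local_vnis = 20

# Asymmetric IRB: the ingress bridges into the destination VNI, so every
# VTEP must carry every VNI in the fabric.
asymmetric_per_vtep = total_vnis

# Symmetric IRB: a VTEP needs only its locally attached VNIs plus one
# L3 VNI per VRF (a single tenant VRF assumed here).
symmetric_per_vtep = local_vnis + 1

print(asymmetric_per_vtep, symmetric_per_vtep)   # 500 vs 21
```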

VXLAN in Data Center Leaf-Spine Fabrics

The most common VXLAN deployment architecture is the BGP-EVPN leaf-spine fabric: leaf switches act as VTEPs and terminate VXLAN tunnels for locally attached servers; spine switches are pure IP routers in the underlay, typically also serving as BGP route reflectors for the EVPN overlay; an underlay routing protocol (eBGP, OSPF, or IS-IS) provides reachability between VTEP loopbacks; and ECMP across the spines spreads encapsulated traffic over all available paths.

This architecture provides a scalable, loop-free, multi-path fabric where any VM or container can communicate with any other, regardless of physical location. VXLAN segments can be stretched across multiple data centers by extending the EVPN control plane across a DCI (Data Center Interconnect) link.

VXLAN MTU and Performance Considerations

The 50-byte VXLAN encapsulation overhead introduces several practical considerations: the underlay MTU must be raised (typically to jumbo frames) so that full-size inner frames are never fragmented; software VTEP throughput depends heavily on NIC offloads (checksum, TSO, and VXLAN-aware tunnel offload); and the extra headers consume a small share of link bandwidth, roughly 3 percent for 1500-byte frames.

VXLAN and Container Networking

VXLAN is widely used in container networking to provide overlay connectivity between pods running on different hosts. Container networking solutions like Flannel, Calico (in VXLAN mode), and Cilium use VXLAN tunnels between nodes to encapsulate pod-to-pod traffic. The container runtime on each node acts as a VTEP, encapsulating traffic destined for pods on remote nodes.

In Kubernetes environments, VXLAN overlays allow pods to communicate using a flat IP address space regardless of the underlying network topology. Each node is assigned a subnet from the pod CIDR, and VXLAN tunnels provide the connectivity between these subnets without requiring the physical network to understand pod addressing.
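The per-node subnet carving can be sketched with the standard ipaddress module; the pod CIDR and /24-per-node split below are common defaults (e.g. in Flannel) but are assumptions here:

```python
import ipaddress

# Carve per-node pod subnets out of a cluster-wide pod CIDR.
pod_cidr = ipaddress.ip_network("10.244.0.0/16")
node_subnets = list(pod_cidr.subnets(new_prefix=24))

print(len(node_subnets))   # 256 -- one /24 per node
print(node_subnets[0])     # 10.244.0.0/24 -> first node
print(node_subnets[1])     # 10.244.1.0/24 -> second node
```

The VXLAN overlay then carries traffic between these subnets, so the physical network only needs to route between node addresses, never pod addresses.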

VXLAN Security Considerations

VXLAN itself provides no encryption, authentication, or integrity protection. The encapsulated traffic is carried in plaintext UDP packets. Any device on the underlay network that can capture traffic can inspect the inner frames. Security considerations include: isolating the underlay so untrusted hosts cannot inject spoofed packets toward VTEP addresses; filtering UDP port 4789 at network borders so encapsulated traffic cannot enter the fabric from outside; and adding encryption on the underlay (IPsec, MACsec, or WireGuard) where confidentiality is required.

GPE, GENEVE, and the Future of Overlays

VXLAN-GPE (RFC draft) extends the VXLAN header with a Next Protocol field, enabling encapsulation of protocols other than Ethernet (e.g., IP, NSH for service chaining). GENEVE (RFC 8926) is a more flexible alternative to VXLAN that supports variable-length TLV options in the header, designed to be extensible enough to subsume both VXLAN and NVGRE.

In practice, VXLAN with EVPN remains the dominant production deployment model. GENEVE is gaining traction in cloud provider networks and is used by some container networking implementations, but the installed base of EVPN-VXLAN is massive and the transition to GENEVE is gradual.

Explore Network Infrastructure

VXLAN fabrics are part of the data center infrastructure that connects to the broader internet via BGP. Data center leaf switches often peer with border routers that run eBGP with upstream transit providers and peering partners. To see how networks interconnect at the BGP level, use the god.ad BGP Looking Glass to look up any IP address or ASN and trace the AS path between networks.
