What Is MTU? Path MTU Discovery Explained
The Maximum Transmission Unit (MTU) is the largest IP packet that can be sent across a particular network link without fragmentation. Every link in the internet has an MTU — Ethernet's is 1500 bytes, older ATM links were 4470 bytes, some backbone links run 9000-byte jumbo frames. When a packet must travel across multiple links, the smallest MTU along the path — the Path MTU (PMTU) — determines the largest packet that will traverse the entire path without being fragmented or dropped. Getting MTU wrong silently breaks connections in ways that are notoriously hard to debug.
MTU vs MSS
These two terms are frequently confused:
- MTU — a link-layer (or IP-layer) property. It is the maximum size of an IP packet (header + payload) on a given link. Ethernet's MTU is 1500 bytes by default.
- MSS (Maximum Segment Size) — a TCP-layer property. It is the maximum amount of TCP payload data in a single TCP segment, announced during the three-way handshake. MSS does not include the TCP or IP headers. For IPv4 over Ethernet: MSS = 1500 (MTU) − 20 (IPv4 header) − 20 (TCP header) = 1460 bytes. For IPv6: MSS = 1500 − 40 (IPv6 header) − 20 (TCP header) = 1440 bytes.
MSS clamping adjusts the MSS value in TCP SYN packets to account for tunnel overhead — see the tunnel section below.
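The MSS arithmetic above can be captured in a few lines. This is a minimal sketch that assumes no IP options, no IPv6 extension headers, and no TCP options; the function name `mss_for_mtu` is made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed header only (no extension headers)
TCP_HEADER = 20   # bytes, assuming no TCP options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Return the largest TCP payload that fits in one IP packet of size `mtu`."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    mss = mtu - ip_header - TCP_HEADER
    if mss <= 0:
        raise ValueError(f"MTU {mtu} is too small to carry a TCP segment")
    return mss


print(mss_for_mtu(1500))             # 1460
print(mss_for_mtu(1500, ipv6=True))  # 1440
```

TCP options such as timestamps consume part of each segment, so the usable payload per segment can be slightly smaller than the announced MSS.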
Ethernet MTU and the 1500-Byte Standard
The 1500-byte Ethernet MTU was established by the original Ethernet II specification and has remained the internet default for decades. It was a compromise between efficiency (larger frames amortize per-frame overhead) and latency (smaller frames reduce worst-case transmission time). At 10 Mbps, transmitting a 1500-byte frame takes 1.2 milliseconds; at 1 Gbps, just 12 microseconds — latency is now negligible, making the MTU constraint purely a legacy artifact of 1980s Ethernet.
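The transmission times quoted above follow directly from frame size and link rate. A small sketch, using the same 1500-byte figure as the text and ignoring Ethernet framing, preamble, and inter-frame gap (the helper name is hypothetical):

```python
def serialization_delay_us(frame_bytes: int, link_bps: int) -> float:
    """Time in microseconds to clock `frame_bytes` onto a link of `link_bps` bits/s."""
    return frame_bytes * 8 * 1_000_000 / link_bps


print(serialization_delay_us(1500, 10_000_000))     # 1200.0 (1.2 ms at 10 Mbps)
print(serialization_delay_us(1500, 1_000_000_000))  # 12.0 (12 microseconds at 1 Gbps)
```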
The actual wire frame is larger because Ethernet adds its own framing: a 14-byte header (destination MAC, source MAC, EtherType), an optional 4-byte 802.1Q VLAN tag, and a 4-byte FCS trailer. A maximum-size untagged Ethernet frame is 1518 bytes; with a VLAN tag it is 1522 bytes. In both cases the IP MTU remains 1500.
Jumbo Frames
Jumbo frames are Ethernet frames with an MTU larger than 1500 bytes, typically 9000 bytes (sometimes called a 9K MTU). They are standard in data centers — storage networks (iSCSI, NFS), HPC clusters, and high-throughput server interconnects all benefit from the lower per-byte CPU overhead that jumbo frames provide. Every device on a path must support the same jumbo MTU; a single device that doesn't will silently drop oversized frames. Jumbo frames are almost never used on the public internet because the internet's edge devices (home routers, CPE) do not support them.
IPv4 Fragmentation
IPv4 includes built-in fragmentation: a router that receives a packet larger than its outbound link's MTU can split it into smaller fragments. Each fragment carries its own IP header with the same Identification value as the original. The 13-bit Fragment Offset field records where the fragment's data belongs in the original payload (in units of 8 bytes), and the More Fragments bit in the 3-bit Flags field is set on every fragment except the last.
Fragmentation happens in the router, reassembly happens at the destination. This seems convenient, but in-network fragmentation has severe costs:
- If any fragment is lost, the entire original packet must be retransmitted (TCP) — there is no mechanism to request just the missing fragment.
- Reassembly requires state at the destination. At high packet rates, the reassembly buffer can be exhausted, enabling fragment-based DoS attacks.
- Stateful firewalls must track fragmented flows, and some fail to do so correctly.
- NAT devices may struggle with fragmented UDP where port numbers only appear in the first fragment.
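To make the fragmentation mechanics concrete, the sketch below computes how a router would split one IPv4 packet. It assumes a 20-byte header with no options, and the function name and tuple layout are illustrative choices rather than any real API.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Split an IPv4 packet into (offset_in_8_byte_units, payload_bytes, more_fragments)."""
    payload = total_length - header_len
    if total_length <= mtu:
        return [(0, payload, False)]
    # Every fragment except the last must carry a multiple of 8 payload bytes,
    # because the Fragment Offset field counts in 8-byte units.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


# A 4000-byte packet crossing a 1500-byte link becomes three fragments.
print(fragment(4000, 1500))
# [(0, 1480, True), (185, 1480, True), (370, 1020, False)]
```

Losing any one of those three fragments forces the sender to retransmit all 4000 bytes, which is the first cost listed above.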
The Don't Fragment Bit
The IPv4 header's Flags field has a Don't Fragment (DF) bit. When it is set, routers must not fragment the packet. If a DF packet is larger than the MTU of the outbound link, the router drops it and sends back an ICMP Destination Unreachable, Fragmentation Needed (Type 3, Code 4) message containing the outbound link's MTU. The sender uses this value to reduce its packet size and retry. This is Path MTU Discovery (PMTUD), defined in RFC 1191. TCP sets DF on all packets by default. PMTUD allows TCP to use large packets wherever the path supports them, without risking mid-path fragmentation.
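The PMTUD feedback loop can be simulated without a network. In this sketch the path is just a list of link MTUs, and each "ICMP Fragmentation Needed" is modelled as returning the MTU of the first link that is too small; the function name and the example path are assumptions for illustration.

```python
def discover_path_mtu(link_mtus, initial_size):
    """Simulate RFC 1191-style PMTUD; return (path_mtu, packets_sent)."""
    size = initial_size
    attempts = 0
    while True:
        attempts += 1
        # The first link whose MTU is smaller than the packet drops it and
        # reports its own MTU back to the sender.
        bottleneck = next((mtu for mtu in link_mtus if mtu < size), None)
        if bottleneck is None:
            return size, attempts  # the packet crossed every link
        size = bottleneck  # shrink to the reported MTU and retry


# Ethernet -> PPPoE -> tunnel -> Ethernet
print(discover_path_mtu([1500, 1492, 1420, 1500], 1500))  # (1420, 3)
```

Each smaller link on the path costs one dropped packet and one round trip before the sender converges on the path MTU.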
IPv6: No In-Network Fragmentation
IPv6 (RFC 8200) eliminates in-network fragmentation entirely. IPv6 routers never fragment packets: if a packet is too large for a link, the router drops it and sends back an ICMPv6 Packet Too Big (Type 2) message. The sender must then reduce its packet size or, if it has to send a larger datagram, fragment it at the source using a Fragment extension header. This makes PMTUD mandatory for IPv6: without it, packets that exceed any link's MTU simply disappear. Networks that block ICMPv6 Packet Too Big messages have broken IPv6 connectivity for large transfers.
The minimum MTU that all IPv6 links must support is 1280 bytes (RFC 8200). This is the absolute floor — a host may always send packets up to 1280 bytes without PMTUD. TCP stacks on IPv6 should clamp MSS to 1220 bytes (1280 − 40-byte IPv6 header − 20-byte TCP header) as a safe fallback when PMTUD fails.
PMTUD Failure — ICMP Black Holes
PMTUD relies on ICMP reaching the source unimpeded. When a stateful firewall or misconfigured router drops ICMP Fragmentation Needed messages, the sender never learns the actual path MTU. TCP connections appear to work for small payloads (which fit in one small packet) but silently hang on large transfers — the SYN, SYN-ACK, and first few small TCP segments get through, but the first large data segment disappears. This is the ICMP black hole problem.
The workaround is MSS clamping: a firewall or router on the path modifies the MSS value in TCP SYN packets as they pass through, reducing it to a safe value (1452, 1400, or even lower). The ip tcp adjust-mss command on Cisco IOS, the TCPMSS target in iptables, and the tcp option maxseg size set statement in nftables all do this. MSS clamping is a hack that compensates for broken ICMP filtering, but it is widely deployed and often necessary.
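An alternative approach is Packetization Layer Path MTU Discovery (PLPMTUD), defined for TCP and similar transports in RFC 4821 and extended to datagram transports as DPLPMTUD in RFC 8899. PLPMTUD detects the path MTU by probing at the transport or application layer without relying on ICMP, which makes it robust to ICMP filtering. QUIC (RFC 9000) specifies DPLPMTUD as one of its path MTU discovery options.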
Tunnels and VPN Overhead
Every tunnel protocol adds its own headers, reducing the effective MTU for inner traffic. This is where MTU headaches become most acute in production:
| Tunnel type | Overhead | Typical inner MTU |
|---|---|---|
| WireGuard (IPv4 outer) | 60 bytes (IP+UDP+WG) | 1440 bytes (the 1420 default also fits) |
| WireGuard (IPv6 outer) | 80 bytes (IP+UDP+WG) | 1420 bytes |
| IPsec ESP (tunnel mode, AES-GCM) | ~73 bytes | ~1422 bytes |
| VXLAN | 50 bytes (IP+UDP+VXLAN+inner Ethernet) | 1450 bytes |
| GRE | 24 bytes (IP+GRE) | 1476 bytes |
| OpenVPN (UDP mode) | ~54 bytes | ~1446 bytes |
| PPPoE (DSL) | 8 bytes (PPPoE) | 1492 bytes |
WireGuard's tooling defaults to an interface MTU of 1420, which matches the IPv6-outer case (1500 − 80). With an IPv4 outer header the same value produces 1480-byte packets, which also fit PPPoE DSL connections (1492 MTU); with an IPv6 outer header over PPPoE, a lower value is needed. The correct fix in all tunnel scenarios is to set the tunnel interface MTU and let MSS clamping handle TCP, or ensure PMTUD works end-to-end.
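The fixed-overhead rows of the table reduce to simple subtraction. The sketch below assumes a 1500-byte underlay and IPv4 TCP traffic inside the tunnel; the dictionary and helper names are illustrative, and the approximate rows (IPsec, OpenVPN) are left out because their overhead depends on cipher and options.

```python
LINK_MTU = 1500        # assumed underlay MTU
IPV4_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options

# Per-packet overhead in bytes, taken from the table above.
TUNNEL_OVERHEAD = {
    "WireGuard (IPv4 outer)": 60,
    "WireGuard (IPv6 outer)": 80,
    "VXLAN": 50,
    "GRE": 24,
    "PPPoE": 8,
}


def inner_mtu(overhead: int, link_mtu: int = LINK_MTU) -> int:
    """Largest inner IP packet that fits after the tunnel headers are added."""
    return link_mtu - overhead


def clamped_mss(overhead: int, link_mtu: int = LINK_MTU) -> int:
    """MSS to clamp to so that inner IPv4 TCP segments fit the tunnel."""
    return inner_mtu(overhead, link_mtu) - IPV4_TCP_HEADERS


for name, overhead in TUNNEL_OVERHEAD.items():
    print(f"{name:24} inner MTU {inner_mtu(overhead)}  MSS {clamped_mss(overhead)}")
```

These are the maximum values that fit. A deployment may choose something lower, as WireGuard does by using 1420 regardless of the outer address family.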
Diagnosing MTU Problems
The classic symptom of an MTU/PMTUD problem: small transfers work, large transfers hang. SSH connects, but large scp transfers stall. A web page loads the HTML but images never appear. A VPN tunnel passes pings but TCP connections hang after the handshake.
The standard debugging toolkit:
- Test with ping and the DF bit: ping -s 1472 -M do 8.8.8.8 sends a 1472-byte payload (1500 bytes including the IP and ICMP headers) with DF set. If this times out, there is an MTU problem on the path. Reduce the size until pings succeed to find the actual PMTU.
- Test specific sizes over TCP: run curl --limit-rate 0 https://cloudflare.com/cdn-cgi/trace and watch with tcpdump for the point where transfers stall. The last successful ACK reveals the effective MSS.
- Check the interface MTU: ip link show on Linux, ifconfig on BSD/macOS, show interfaces on Cisco IOS.
- Verify MSS clamping: tcpdump -i eth0 'tcp[13] & 2 != 0' captures SYN packets; then inspect the MSS option in the TCP options field.
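The "reduce the size until pings succeed" step can be automated with a binary search. The sketch below is one way to do it under stated assumptions: `ping_df` shells out to Linux iputils ping (the `-M do` flag is specific to that implementation), and the search bounds of 576 and 1500 are arbitrary defaults. The search itself takes any probe function, so it can be exercised without a network.

```python
import subprocess

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def ping_df(host: str, payload: int) -> bool:
    """Send one echo request with DF set (Linux iputils ping); True if it was answered."""
    cmd = ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", str(payload), host]
    return subprocess.run(cmd, capture_output=True).returncode == 0


def find_path_mtu(probe, low: int = 576, high: int = 1500) -> int:
    """Binary-search the largest packet size in [low, high] for which probe(payload) succeeds."""
    if not probe(low - ICMP_OVERHEAD):
        raise RuntimeError("even the smallest probe failed; the host may be unreachable")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid - ICMP_OVERHEAD):
            low = mid
        else:
            high = mid - 1
    return low
```

Against a real host this would be called as `find_path_mtu(lambda size: ping_df("8.8.8.8", size))`. A single lost reply is read as "too large", so a noisy path may need a probe function that retries before giving up.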
MSS Clamping in Detail
MSS clamping is implemented by modifying the MSS TCP option in SYN and SYN-ACK packets as they traverse a device. The device does not need to know the actual PMTU — it simply sets the MSS to a conservative value that fits through the tunnel or link with the reduced MTU. On Linux, the nftables rule to clamp MSS to 1400:
nft add rule ip filter FORWARD tcp flags syn tcp option maxseg size set 1400
On Cisco IOS for a GRE tunnel interface (the 1476-byte tunnel MTU minus 40 bytes of IPv4 and TCP headers gives 1436):
interface Tunnel0
 ip tcp adjust-mss 1436
MSS clamping only affects TCP. UDP applications, ICMP, and other protocols are not helped by MSS clamping — they must use PMTUD or conservative payload sizing directly.
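What a clamping device does to each SYN can be sketched as a walk over the TCP options field. This is an illustration only: the function name is hypothetical, it operates on the raw options bytes rather than a whole packet, and it omits the TCP checksum update that a real device must also perform after rewriting the option.

```python
import struct

TCP_OPT_END, TCP_OPT_NOP, TCP_OPT_MSS = 0, 1, 2


def clamp_mss(options: bytes, max_mss: int) -> bytes:
    """Return a copy of a TCP options field with the MSS option lowered to max_mss if larger."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCP_OPT_END:
            break
        if kind == TCP_OPT_NOP:  # single-byte padding option
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, stop parsing
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option, stop parsing
        if kind == TCP_OPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)


# Options from a typical SYN: MSS 1460, NOP, window scale 7.
syn_options = bytes.fromhex("020405b401030307")
print(clamp_mss(syn_options, 1400).hex())  # 0204057801030307
```

An MSS that is already at or below the clamp is left untouched, so the device never raises a value the endpoint chose.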
MTU in Data Centers
Data center networks commonly configure jumbo frames (9000-byte MTU) on server-to-switch links and within the fabric. This requires consistent configuration across all devices: a single non-jumbo switch in the path silently discards oversized frames, because a Layer 2 switch never generates ICMP errors. Only a routed hop can send one, and not all of them do. Data center ECMP fabrics and overlays add their own complications:
- VXLAN adds 50 bytes of overhead. If the physical MTU is 9000 bytes, the VXLAN inner MTU is 8950 bytes. Tenant VMs should see 8950 as their MTU, or MSS clamping must be configured on VXLAN tunnel interfaces.
- GENEVE has variable overhead: at least 50 bytes over IPv4 (20-byte outer IP header + 8-byte UDP header + 8-byte GENEVE header + 14-byte inner Ethernet header), plus any option TLVs. Kubernetes CNI plugins using GENEVE typically set the pod interface MTU to 1450 or lower to leave room for this.
- IPsec transport mode adds ~30 bytes; tunnel mode adds ~50+ bytes depending on encryption suite. Always configure DF handling and PMTUD on IPsec tunnels explicitly.
MTU Discovery for Non-TCP Protocols
TCP handles PMTUD automatically via the MSS mechanism and DF bit. Other protocols must handle it differently:
- UDP applications must either limit their packet size conservatively (a 1280-byte IP packet, which is at most 1232 bytes of UDP payload over IPv6, fits every IPv6 path and nearly all IPv4 paths) or implement their own MTU probing. QUIC (RFC 9000) can use DPLPMTUD (RFC 8899) to discover the actual PMTU and use it.
- SCTP (RFC 9260) has built-in PMTUD similar to TCP but with its own probing mechanism, since SCTP can fragment at the protocol level.
- GRE and IP-in-IP tunnels must relay PMTUD feedback: when the tunnel endpoint receives an ICMP Fragmentation Needed message for an outer packet, it has to signal a correspondingly reduced MTU back to the original sender of the inner packet. This relaying is not always implemented correctly, making GRE tunnels a common source of PMTUD failures.
PPPoE and DSL MTU
DSL connections using PPPoE have an MTU of 1492 bytes — 8 bytes less than standard Ethernet — because PPPoE adds its own 6-byte header plus a 2-byte PPP protocol field. This was a major source of PMTUD problems in the early 2000s when PPPoE DSL became widespread and many servers had ICMP filtering. The standard workaround was MSS clamping on the DSL router, and most DSL routers still apply this by default today. IPv6 over PPPoE over DSL has the same constraint: the inner IPv6 MTU is 1492 bytes, though the IPv6 minimum MTU guarantee of 1280 bytes still holds.
Black Hole Detection
RFC 4821 defines Packetization Layer Path MTU Discovery (PLPMTUD), a mechanism that detects the path MTU without relying on ICMP. Instead of waiting for ICMP Fragmentation Needed messages, PLPMTUD sends probes of increasing size and treats a missing acknowledgement as evidence that the probe was too large. This works even when ICMP is filtered. On Linux, TCP enables it through the net.ipv4.tcp_mtu_probing sysctl, with net.ipv4.tcp_base_mss setting the starting size. For datagram transports, RFC 8899 defines the equivalent DPLPMTUD, which QUIC (RFC 9000) can use to avoid depending on ICMP.
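The probing idea can be sketched as a search that only trusts acknowledgements. This is a simplified illustration, not the state machine from either RFC: the function name, the 1200-byte starting size, the 32-byte step, and the three retries per size are all assumptions chosen for the example, and `send_probe` stands in for whatever the transport uses to send a padded packet and wait for its acknowledgement.

```python
def plpmtud_search(send_probe, base=1200, maximum=1500, step=32, max_probes=3):
    """Probe upward from `base`; return the largest packet size that was acknowledged."""
    confirmed = base  # assumed to work without probing
    while step > 0 and confirmed < maximum:
        candidate = min(confirmed + step, maximum)
        # A single loss may be congestion, so retry before concluding "too big".
        if any(send_probe(candidate) for _ in range(max_probes)):
            confirmed = candidate
        else:
            step //= 2  # overshoot: refine with a smaller increment
    return confirmed


# Simulated path with a 1420-byte bottleneck and silently dropped oversize packets.
print(plpmtud_search(lambda size: size <= 1420))  # 1420
```

No ICMP message is consulted anywhere in the loop, which is why this approach survives black-hole paths. The price is extra probe traffic and slower convergence than classic PMTUD.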
Cached PMTU values on Linux are stored per destination in the IP route cache and expire after 10 minutes by default (net.ipv4.route.mtu_expires). After expiry, the host retries with the full MTU and will receive a new ICMP Fragmentation Needed if the path MTU has not changed. This periodic retry allows recovery if a lower-MTU link is removed from the path — traffic automatically scales back up to the full MTU after the cache expires.
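The expiry behaviour can be modelled as a small per-destination cache. This is a toy model for illustration, not how the kernel stores the data: the class name, the 1500-byte default, and the injectable clock are assumptions, and the 600-second lifetime mirrors the 10-minute default described above.

```python
import time


class PmtuCache:
    """Toy per-destination PMTU cache with time-based expiry."""

    def __init__(self, default_mtu=1500, expires=600.0, clock=time.monotonic):
        self.default_mtu = default_mtu
        self.expires = expires   # seconds an entry stays valid
        self.clock = clock       # injectable so tests can control time
        self._entries = {}       # destination -> (mtu, learned_at)

    def learn(self, destination, mtu):
        """Record an MTU reported by an ICMP Fragmentation Needed message."""
        self._entries[destination] = (mtu, self.clock())

    def lookup(self, destination):
        """Return the cached PMTU, or the full MTU once the entry has expired."""
        entry = self._entries.get(destination)
        if entry is None:
            return self.default_mtu
        mtu, learned_at = entry
        if self.clock() - learned_at >= self.expires:
            del self._entries[destination]  # forget it and retry at full size
            return self.default_mtu
        return mtu
```

After expiry the next packet goes out at the full MTU. If the bottleneck is still there, a fresh ICMP message repopulates the entry; if it is gone, the larger size simply keeps working.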
MTU and IPv6 Transition Technologies
IPv6 transition technologies add yet more overhead. 464XLAT (CLAT/PLAT), used in mobile networks, translates IPv4 packets into IPv6 packets; replacing the 20-byte IPv4 header with the 40-byte IPv6 header grows each packet by 20 bytes. DS-Lite instead tunnels IPv4 inside IPv6, adding a full 40-byte IPv6 header. Operators running these technologies must account for the extra header bytes when advertising MTU to clients; otherwise TCP connections will exhibit PMTUD failures whenever the underlying path MTU is too small to carry the added headers.
Explore It Live
MTU issues often appear as mysterious connectivity problems to specific destinations. Use this tool to identify the networks along a path and check if they're known for filtering ICMP:
- 8.8.8.8 — Google DNS; Google's infrastructure is well-behaved with respect to PMTUD
- 1.1.1.1 — Cloudflare anycast; trace the path to identify intermediate ASes
- AS13335 — Cloudflare's network, which operates QUIC/HTTP3 at scale and depends on correct MTU handling
The BGP looking glass can help you identify which networks are on the path to a problematic destination, focusing your MTU debugging efforts. Large transit networks like AS3356 (Lumen) and AS1299 (Telia) are well-maintained and generally pass PMTUD correctly — when PMTUD fails, the culprit is often a smaller network or enterprise firewall at the edge.