How IPsec Works: Tunnel Mode, Transport Mode, IKEv2, ESP, and AH

IPsec (Internet Protocol Security) is a suite of protocols that provides authentication, integrity, and confidentiality for IP packets at the network layer. Unlike TLS, which secures individual TCP connections at the transport layer, IPsec operates at Layer 3 and can transparently protect all traffic between two endpoints -- every protocol, every port -- without requiring application modifications. Defined across dozens of RFCs (the core being RFCs 4301-4309), IPsec is the foundation of site-to-site VPNs, remote access tunnels, and secure host-to-host communication. It is complex and highly configurable, and it has been the subject of extensive security research and numerous implementation vulnerabilities.

IPsec's complexity stems from its design philosophy: provide a framework flexible enough to secure any IP communication, between any pair of endpoints, with any combination of security services. This flexibility means IPsec must handle key negotiation, algorithm selection, policy matching, tunnel establishment, rekeying, NAT traversal, dead peer detection, and packet-level encryption/authentication -- each as a separately specified component that must interoperate correctly. The result is a protocol suite that, when properly configured, provides defense-in-depth security for network-layer traffic, but when misconfigured, can provide a false sense of security or fail to establish connectivity entirely.

Transport Mode vs. Tunnel Mode

IPsec operates in two modes that determine how the original IP packet is encapsulated:

Transport mode protects only the payload of the IP packet, leaving the original IP header intact. The IPsec header (ESP or AH) is inserted between the IP header and the upper-layer protocol header (TCP, UDP, ICMP). Transport mode is used for host-to-host communication -- two servers encrypting traffic between themselves. The original source and destination IP addresses remain visible to the network, so routers along the path can make forwarding decisions based on the real addresses. Transport mode cannot carry traffic for third parties because it does not create a new IP header.

Tunnel mode encapsulates the entire original IP packet inside a new IP packet. The outer IP header has the tunnel endpoints' addresses (the IPsec gateways), while the inner IP header (encrypted) contains the original source and destination. This is the mode used for site-to-site VPNs: two gateway routers create a tunnel, and all traffic between the protected subnets flows through it. Hosts behind the gateways are unaware of IPsec -- they send normal, unencrypted packets, and the gateways handle encryption/decryption transparently.

Figure: IPsec Transport Mode vs Tunnel Mode

  Original IP packet:
    [IP Header][TCP Hdr][Payload Data]

  Transport mode (ESP): original IP header preserved, payload encrypted
    [IP Header][ESP Hdr][TCP Hdr][Payload][ESP Trlr]
                        |<------ encrypted ------->|

  Tunnel mode (ESP): new outer IP header, entire original packet encrypted
    [New IP Hdr][ESP Hdr][Orig IP Hdr][TCP Hdr][Payload][ESP Trlr]
                         |<------------- encrypted ------------->|

  Key differences:
    Transport: original IP addresses visible to the network. Used for host-to-host.
    Tunnel: original packet fully hidden; the new IP header carries the gateway
            addresses. Used for site-to-site VPNs. Adds ~20 bytes of overhead
            (the new IP header) but hides the internal network topology.
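The overhead difference is easy to quantify. The Python sketch below computes the on-the-wire size of one ESP-protected packet in each mode. It assumes AES-GCM as used with ESP in RFC 4106 (8-byte explicit IV, 16-byte ICV), an IPv4 header without options, and ESP's rule that payload, Pad Length, and Next Header together end on a 4-byte boundary; the function and constant names are illustrative.

```python
IPV4_HEADER = 20        # IPv4 header without options
UDP_HEADER = 8          # only present when NAT-T encapsulation is in use
ESP_HEADER = 8          # SPI (4) + sequence number (4)
GCM_IV = 8              # explicit IV for AES-GCM in ESP (RFC 4106)
GCM_ICV = 16            # 16-byte ICV (RFC 4106 also permits 8 or 12)
ESP_FIXED_TRAILER = 2   # Pad Length (1) + Next Header (1)


def esp_wire_size(original_packet_len: int, mode: str, nat_t: bool = False) -> int:
    """On-the-wire size of an IPv4 packet after ESP protection (sketch)."""
    if mode == "transport":
        # The original IP header stays in front of ESP; only the rest is protected.
        protected = original_packet_len - IPV4_HEADER
    elif mode == "tunnel":
        # The entire original packet, header included, is protected.
        protected = original_packet_len
    else:
        raise ValueError(f"unknown mode: {mode}")

    unpadded = protected + ESP_FIXED_TRAILER
    padding = (-unpadded) % 4  # align the trailer to a 4-byte boundary
    esp = ESP_HEADER + GCM_IV + unpadded + padding + GCM_ICV
    # Transport mode reuses the original header; tunnel mode adds a new outer one.
    return IPV4_HEADER + (UDP_HEADER if nat_t else 0) + esp


original = 20 + 20 + 1400  # IPv4 header + TCP header + 1400 bytes of data
for mode in ("transport", "tunnel"):
    size = esp_wire_size(original, mode)
    print(f"{mode}: {size} bytes (+{size - original} bytes of overhead)")
```

For a 1,440-byte original packet this gives 1,476 bytes in transport mode and 1,496 bytes in tunnel mode. The 20-byte difference is exactly the extra IPv4 header, and NAT-T adds another 8 bytes to either.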

ESP: Encapsulating Security Payload

ESP (RFC 4303) is the workhorse of IPsec, providing both encryption (confidentiality) and authentication (integrity). It is by far the most commonly used IPsec protocol -- AH (Authentication Header) provides only authentication without encryption and is rarely used in modern deployments.

An ESP packet has the following structure:

ESP Header:
  SPI (4 bytes)     - Security Parameters Index: identifies the SA
  Seq# (4 bytes)    - Sequence number: anti-replay protection

Payload Data:
  IV (variable)     - Initialization vector for the cipher (sent in the clear)
  Encrypted data    - The protected payload (transport or tunnel)

ESP Trailer (encrypted):
  Padding (0-255)   - Pads to the cipher block size and a 4-byte boundary
  Pad Length (1)    - Length of padding
  Next Header (1)   - Protocol number of the encrypted payload

Integrity Check Value:
  ICV (variable)    - Authentication tag over the ESP header, payload, and trailer
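A minimal Python sketch of the cleartext side of this layout, using the standard `struct` module: it packs the ESP header, appends padding plus the Pad Length and Next Header trailer to a payload, and strips the trailer again after decryption. It uses RFC 4303's default padding contents (1, 2, 3, ...) and assumes 4-byte alignment unless a block size is passed in. Encryption, the IV, and the ICV are deliberately left out, and the function names are hypothetical.

```python
import struct
from typing import Tuple


def esp_header(spi: int, seq: int) -> bytes:
    # SPI and sequence number are 32-bit fields in network byte order.
    return struct.pack("!II", spi, seq)


def esp_plaintext_with_trailer(payload: bytes, next_header: int, align: int = 4) -> bytes:
    # Payload | Padding | Pad Length | Next Header, padded so the total is a
    # multiple of `align` (4, or the cipher block size for CBC ciphers).
    pad_len = (-(len(payload) + 2)) % align
    padding = bytes(range(1, pad_len + 1))  # RFC 4303 default padding: 1, 2, 3, ...
    return payload + padding + struct.pack("!BB", pad_len, next_header)


def parse_trailer(plaintext: bytes) -> Tuple[bytes, int]:
    # Receiver side, after decryption: return (payload, next_header).
    pad_len, next_header = plaintext[-2], plaintext[-1]
    if pad_len + 2 > len(plaintext):
        raise ValueError("pad length exceeds packet size")
    return plaintext[: len(plaintext) - 2 - pad_len], next_header
```

Next Header is 4 (IPv4) when a whole IPv4 packet is carried in tunnel mode, and the upper-layer protocol number, such as 6 for TCP, in transport mode.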

The SPI (Security Parameters Index) is a 32-bit value that, combined with the destination IP address and protocol (ESP), uniquely identifies the Security Association (SA) -- the set of cryptographic parameters (algorithm, key, lifetime) negotiated for this tunnel. Each direction of traffic uses a separate SPI, so a bidirectional tunnel has two SAs (one for each direction).

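The sequence number is a monotonically increasing counter used for anti-replay protection. The receiver maintains a sliding window (RFC 4303 requires support for at least 32 packets and recommends 64 as the default) and rejects any packet whose sequence number is older than the left edge of the window or has already been received within it. A packet with a higher sequence number than any seen so far is accepted and slides the window forward, and the window is updated only after the packet's ICV has been verified. Extended Sequence Numbers (ESN) provide a 64-bit counter for high-speed links where a 32-bit counter could wrap during the SA lifetime; only the low-order 32 bits are sent on the wire. ESN is specified in RFC 4303, and RFC 4304 adds ESN negotiation to IKEv1.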
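The window logic can be sketched with a bitmap, a common implementation approach. This Python version assumes 32-bit sequence numbers (no ESN) and a 64-packet window. `check` is meant to run before ICV verification and `update` only after it succeeds, following RFC 4303's rule that the window is updated only for authenticated packets. The class and function names are illustrative.

```python
class ReplayWindow:
    """Bitmap-based anti-replay window for 32-bit ESP sequence numbers."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => sequence number (highest - i) was received

    def check(self, seq: int) -> bool:
        """Cheap pre-check, done before the (expensive) ICV verification."""
        if seq == 0:
            return False                  # the first packet of an SA uses 1
        if seq > self.highest:
            return True                   # right of the window: new packet
        offset = self.highest - seq
        if offset >= self.size:
            return False                  # left of the window: too old
        return ((self.bitmap >> offset) & 1) == 0  # reject duplicates

    def update(self, seq: int) -> None:
        """Mark seq as received; call only after the ICV has verified."""
        if seq > self.highest:
            shift = seq - self.highest
            # Slide right: older entries move toward the left edge and fall off.
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
        else:
            self.bitmap |= 1 << (self.highest - seq)


def accept(window: ReplayWindow, seq: int, icv_ok: bool = True) -> bool:
    """Receive path for one packet; the ICV result is supplied by the caller."""
    if not window.check(seq) or not icv_ok:
        return False
    window.update(seq)
    return True
```

Because `update` is never reached without a valid ICV, a forged packet with a huge sequence number cannot slide the window and lock out legitimate traffic.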

Commonly used ESP encryption algorithms:

AH: Authentication Header

AH (RFC 4302) provides data integrity and authentication but no encryption. It computes an ICV (typically an HMAC) over the IP header (with mutable fields like TTL zeroed), the AH header, and the payload, allowing the receiver to verify that the packet was not modified in transit and originated from a legitimate peer.

AH is rarely used in practice for two reasons: (1) ESP with a null cipher provides authentication-only protection with a simpler packet format (although, unlike AH, it does not cover the outer IP header), and (2) AH is fundamentally incompatible with NAT because it authenticates the IP header -- when a NAT device rewrites the source IP, the AH integrity check fails. Because many IPsec deployments, particularly remote access from home and mobile networks, traverse at least one NAT device, AH is unusable in a large share of real-world scenarios.
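The NAT problem is easy to demonstrate. The Python sketch below uses a hypothetical minimal IPv4 header builder and assumes HMAC-SHA-256 truncated to 128 bits (HMAC-SHA-256-128 from RFC 4868) as the integrity algorithm. It zeroes the mutable IPv4 fields as AH does and, to stay short, omits the AH header itself from the ICV input. A router's TTL decrement leaves the ICV unchanged; a NAT-style source rewrite does not.

```python
import hashlib
import hmac
import socket
import struct


def ipv4_header(src: str, dst: str, ttl: int, proto: int = 51, total_len: int = 64) -> bytes:
    """Minimal 20-byte IPv4 header (no options, checksum left as 0, length illustrative)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len,   # version/IHL, TOS, total length
        0x1234, 0x4000,       # identification, flags (DF) + fragment offset
        ttl, proto, 0,        # TTL, protocol (51 = AH), header checksum
        socket.inet_aton(src), socket.inet_aton(dst),
    )


def zero_mutable(header: bytes) -> bytes:
    """Zero the IPv4 fields that may change in transit (RFC 4302)."""
    h = bytearray(header)
    h[1] = 0                 # TOS (DSCP + ECN)
    h[6:8] = b"\x00\x00"     # flags + fragment offset
    h[8] = 0                 # TTL
    h[10:12] = b"\x00\x00"   # header checksum
    return bytes(h)


def ah_icv(key: bytes, header: bytes, payload: bytes) -> bytes:
    """HMAC-SHA-256-128 over the normalized header and payload (AH header omitted)."""
    return hmac.new(key, zero_mutable(header) + payload, hashlib.sha256).digest()[:16]


key, payload = b"\x0b" * 32, b"upper-layer data"
sent = ah_icv(key, ipv4_header("10.0.0.5", "203.0.113.7", ttl=64), payload)
after_router = ah_icv(key, ipv4_header("10.0.0.5", "203.0.113.7", ttl=63), payload)
after_nat = ah_icv(key, ipv4_header("198.51.100.1", "203.0.113.7", ttl=63), payload)
print(hmac.compare_digest(sent, after_router))  # True: TTL is zeroed before hashing
print(hmac.compare_digest(sent, after_nat))     # False: the source address is covered
```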

IKEv1 vs. IKEv2: Key Exchange and SA Negotiation

Before ESP or AH can protect traffic, the two endpoints must agree on cryptographic parameters and exchange keys. This is the job of IKE (Internet Key Exchange). IKE negotiates Security Associations (SAs) and establishes the shared secrets used to derive ESP/AH encryption and authentication keys.

IKEv1 (RFC 2409)

IKEv1 is the original key exchange protocol, and its complexity is legendary. It operates in two phases:

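Phase 1 establishes an IKE SA (also called the ISAKMP SA) -- a secure, authenticated channel between the two IKE peers. It runs in one of two modes: Main Mode, which uses six messages and encrypts the peers' identities, or Aggressive Mode, which uses only three messages but sends identities in the clear and, with pre-shared keys, exposes a hash that an eavesdropper can attack offline to recover the key.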

Phase 2 (Quick Mode) negotiates the IPsec SAs -- the ESP/AH parameters for the actual data tunnel. Quick Mode runs inside the encrypted IKE SA established in Phase 1 and requires three messages. Each Phase 2 negotiation creates a pair of IPsec SAs (one in each direction). A single Phase 1 SA can host multiple Phase 2 negotiations, allowing multiple tunnels between the same pair of gateways without repeating the expensive Phase 1 exchange.

IKEv2 (RFC 7296)

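IKEv2 is a complete redesign that simplifies the protocol while adding features that IKEv1 lacked, such as built-in NAT traversal detection, native liveness checks, and EAP authentication. Its basic exchange needs only four messages: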

Figure: IKEv2 exchange (4 messages, 2 round trips)

  IKE_SA_INIT (cleartext)
    Msg 1  Initiator -> Responder:  SAi1, KEi, Ni, NAT_DETECTION
    Msg 2  Responder -> Initiator:  SAr1, KEr, Nr, NAT_DETECTION, [CERTREQ]
    Both sides compute the DH shared secret and derive SKEYSEED.

  IKE_AUTH (encrypted)
    Msg 3  Initiator -> Responder:  IDi, [CERT], AUTH, SAi2, TSi, TSr
    Msg 4  Responder -> Initiator:  IDr, [CERT], AUTH, SAr2, TSi, TSr

  Tunnel established: IKE SA + first Child SA (ESP) ready, traffic flows.

  SAi/SAr = Security Association proposals (algorithms, DH groups)
  KEi/KEr = Key Exchange (Diffie-Hellman public values)
  Ni/Nr   = Nonces (freshness, prevent replay)
  TSi/TSr = Traffic Selectors (which subnets to protect)
  AUTH    = Authentication (PSK, RSA/ECDSA cert, or EAP)
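The "derive SKEYSEED" step can be sketched from RFC 7296's key derivation: SKEYSEED = prf(Ni | Nr, g^ir), and the seven IKE SA keys are then cut, in order, from prf+(SKEYSEED, Ni | Nr | SPIi | SPIr). The Python below assumes PRF_HMAC_SHA2_256 was negotiated and that the encryption and integrity keys are 32 bytes each (as with AES-256-CBC and HMAC-SHA-256-128). The nonces, SPIs, and DH shared secret in the example are placeholder bytes, not the output of a real exchange.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """PRF_HMAC_SHA2_256 (assumed to be the negotiated PRF)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def prf_plus(key: bytes, seed: bytes, length: int) -> bytes:
    """RFC 7296 prf+: T1 = prf(K, S | 0x01), Tn = prf(K, Tn-1 | S | n)."""
    out, t = b"", b""
    for n in range(1, 256):  # the counter is a single octet
        t = prf(key, t + seed + bytes([n]))
        out += t
        if len(out) >= length:
            return out[:length]
    raise ValueError("requested more keying material than prf+ can produce")


def derive_ike_sa_keys(ni: bytes, nr: bytes, spi_i: bytes, spi_r: bytes,
                       g_ir: bytes, enc_len: int = 32, integ_len: int = 32) -> dict:
    skeyseed = prf(ni + nr, g_ir)
    prf_len = hashlib.sha256().digest_size  # SK_d, SK_pi, SK_pr use the PRF key size
    layout = [("SK_d", prf_len), ("SK_ai", integ_len), ("SK_ar", integ_len),
              ("SK_ei", enc_len), ("SK_er", enc_len),
              ("SK_pi", prf_len), ("SK_pr", prf_len)]
    stream = prf_plus(skeyseed, ni + nr + spi_i + spi_r, sum(n for _, n in layout))
    keys, pos = {}, 0
    for name, n in layout:
        keys[name] = stream[pos:pos + n]
        pos += n
    return keys


# Placeholder inputs (hypothetical values, not from a real exchange).
keys = derive_ike_sa_keys(ni=b"\x01" * 32, nr=b"\x02" * 32,
                          spi_i=bytes.fromhex("1122334455667788"),
                          spi_r=bytes.fromhex("99aabbccddeeff00"),
                          g_ir=b"\x03" * 256)
print({name: len(k) for name, k in keys.items()})
```

With an AEAD cipher such as AES-GCM, SK_ai and SK_ar would be empty and each encryption key would include a 4-byte salt, so the layout lengths change but the derivation does not.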

Security Associations and the Security Policy Database

IPsec uses two databases on each endpoint:

The Security Policy Database (SPD) defines which traffic should be protected and how. Each SPD entry specifies a traffic selector (source/destination IP ranges, protocols, ports) and an action: PROTECT (apply IPsec), BYPASS (send without IPsec), or DISCARD (drop). When the IP stack is about to send a packet, it consults the SPD to determine whether the packet should be encrypted, sent in the clear, or dropped.
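SPD processing is an ordered, first-match lookup. The Python sketch below models only address prefixes and an optional protocol number (real selectors also include ports and more), and it falls back to DISCARD when nothing matches, which is a fail-closed assumption of this sketch. The entry fields and subnets are illustrative.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SPDEntry:
    src: str               # source prefix, e.g. "10.1.0.0/16"
    dst: str               # destination prefix
    proto: Optional[int]   # IP protocol number, or None for any
    action: str            # "PROTECT", "BYPASS" or "DISCARD"

    def matches(self, src: str, dst: str, proto: int) -> bool:
        return (ipaddress.ip_address(src) in ipaddress.ip_network(self.src)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(self.dst)
                and self.proto in (None, proto))


def outbound_action(spd: List[SPDEntry], src: str, dst: str, proto: int) -> str:
    for entry in spd:      # entries are ordered; the first match wins
        if entry.matches(src, dst, proto):
            return entry.action
    return "DISCARD"       # no match: fail closed (assumption of this sketch)


spd = [
    SPDEntry("10.1.0.0/16", "10.2.0.0/16", None, "PROTECT"),  # site A <-> site B
    SPDEntry("10.1.0.0/16", "0.0.0.0/0", None, "BYPASS"),     # other traffic from site A
]
print(outbound_action(spd, "10.1.4.4", "10.2.0.9", 6))        # PROTECT
print(outbound_action(spd, "10.1.4.4", "198.51.100.10", 17))  # BYPASS
print(outbound_action(spd, "192.168.1.1", "10.2.0.9", 6))     # DISCARD
```

Ordering matters: if the BYPASS entry came first, it would also match site-to-site traffic and send it unencrypted.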

The Security Association Database (SAD) contains the active SAs -- the cryptographic parameters for each tunnel. Each SA entry includes: the SPI, the encryption algorithm and key, the authentication algorithm and key (for non-AEAD ciphers), the sequence number counter, the SA lifetime (time-based and/or traffic-based), and the tunnel endpoints. Inbound packets are looked up in the SAD by {destination IP, SPI, protocol} to find the correct decryption key.
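The inbound side can be sketched as a dictionary keyed on the {destination IP, SPI, protocol} triple. The SA fields, the lifetime handling, and the use of a monotonic clock are simplifying assumptions of this sketch; a real SAD also tracks algorithms, replay state, and soft versus hard lifetimes.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SecurityAssociation:
    spi: int
    dst: str
    proto: str                          # "ESP" or "AH"
    enc_key: bytes
    expires_at: float = float("inf")    # hard time-based lifetime (monotonic seconds)
    bytes_remaining: int = 2 ** 62      # hard traffic-based lifetime


class SAD:
    def __init__(self) -> None:
        self._sas: Dict[Tuple[str, int, str], SecurityAssociation] = {}

    def add(self, sa: SecurityAssociation) -> None:
        self._sas[(sa.dst, sa.spi, sa.proto)] = sa

    def lookup_inbound(self, dst: str, spi: int, proto: str,
                       now: Optional[float] = None) -> Optional[SecurityAssociation]:
        sa = self._sas.get((dst, spi, proto))
        if sa is None:
            return None                 # unknown SPI: the packet is dropped
        now = time.monotonic() if now is None else now
        if now >= sa.expires_at or sa.bytes_remaining <= 0:
            return None                 # an expired SA must not be used
        return sa


sad = SAD()
sad.add(SecurityAssociation(spi=0xC0FFEE01, dst="203.0.113.7", proto="ESP",
                            enc_key=b"\x00" * 32, expires_at=100.0))
print(sad.lookup_inbound("203.0.113.7", 0xC0FFEE01, "ESP", now=10.0) is not None)  # True
```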

Traffic Selectors (TS) in IKEv2 map to SPD entries. During IKE_AUTH, the initiator proposes TSi (initiator traffic selector -- which source subnets to protect) and TSr (responder traffic selector -- which destination subnets to protect). The responder can narrow these selectors (but not widen them). If the selectors do not overlap with the responder's SPD, the negotiation fails. This is one of the most common sources of IPsec misconfiguration: mismatched traffic selectors between the two endpoints.
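Narrowing amounts to intersecting what the initiator proposed with what the responder's policy allows. This Python sketch handles a single IPv4 address range per side and ignores ports and protocols (RFC 7296 section 2.9 gives the full rules, including multiple selectors). The TS_UNACCEPTABLE notify that IKEv2 returns when selectors cannot be matched is modeled as an exception; the helper names and addresses are illustrative.

```python
import ipaddress
from typing import Optional, Tuple

AddrRange = Tuple[ipaddress.IPv4Address, ipaddress.IPv4Address]


def addr_range(start: str, end: str) -> AddrRange:
    return ipaddress.IPv4Address(start), ipaddress.IPv4Address(end)


def narrow(proposed: AddrRange, allowed: AddrRange) -> Optional[AddrRange]:
    """Largest part of `proposed` that `allowed` permits, or None if disjoint."""
    start, end = max(proposed[0], allowed[0]), min(proposed[1], allowed[1])
    return (start, end) if start <= end else None


def respond(tsi: AddrRange, tsr: AddrRange,
            policy_i: AddrRange, policy_r: AddrRange) -> Tuple[AddrRange, AddrRange]:
    narrowed_i, narrowed_r = narrow(tsi, policy_i), narrow(tsr, policy_r)
    if narrowed_i is None or narrowed_r is None:
        raise ValueError("TS_UNACCEPTABLE")  # the selectors do not overlap
    return narrowed_i, narrowed_r


# The initiator asks for all of 10.2.0.0/16; the responder only allows 10.2.1.0/24.
tsi, tsr = respond(addr_range("10.1.0.0", "10.1.255.255"),
                   addr_range("10.2.0.0", "10.2.255.255"),
                   policy_i=addr_range("10.1.0.0", "10.1.255.255"),
                   policy_r=addr_range("10.2.1.0", "10.2.1.255"))
print(f"TSr narrowed to {tsr[0]}-{tsr[1]}")  # TSr narrowed to 10.2.1.0-10.2.1.255
```

Because the responder may only shrink the proposal, a misconfiguration in which the two sides' subnets are disjoint surfaces as TS_UNACCEPTABLE rather than as a silently wider tunnel.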

NAT Traversal (NAT-T)

IPsec and NAT are fundamentally at odds. ESP is an IP protocol (protocol number 50) that does not have port numbers, so a NAT device cannot demultiplex multiple ESP sessions behind the same public IP. AH authenticates the IP header, so any NAT modification breaks the integrity check. Even with ESP in transport mode, a NAT device cannot fix up the TCP/UDP checksums after rewriting addresses, because those checksums (which cover the IP addresses through the pseudo-header) are inside the encrypted payload.

NAT-T (NAT Traversal) solves this by encapsulating ESP packets inside UDP on port 4500. The IKE exchange detects NAT by including NAT_DETECTION_SOURCE_IP and NAT_DETECTION_DESTINATION_IP notification payloads -- SHA-1 hashes of the IKE SPIs together with the IP address and port each side sees. If a recomputed hash does not match the received one (meaning a NAT device modified the addresses or ports), both sides switch to UDP encapsulation, and IKE itself moves from UDP port 500 to port 4500. The ESP packet is wrapped in a UDP header (port 4500 on both ends as sent, although the NAT may rewrite the source port), which the NAT device can track and demultiplex using standard UDP NAT table entries.
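The detection hashes follow RFC 7296 section 2.23: SHA-1 over the IKE SPIs (in header order), the IP address, and the port. The Python sketch below assumes IPv4 and uses illustrative addresses and SPIs; in the first IKE_SA_INIT request the responder's SPI is still all zeros.

```python
import hashlib
import socket
import struct


def nat_detection_hash(spi_i: bytes, spi_r: bytes, ip: str, port: int) -> bytes:
    """SHA-1(SPIi | SPIr | IPv4 address | port), as carried in NAT_DETECTION_* payloads."""
    return hashlib.sha1(spi_i + spi_r + socket.inet_aton(ip) + struct.pack("!H", port)).digest()


spi_i, spi_r = bytes.fromhex("1122334455667788"), bytes(8)  # SPIr is zero in the first request

# Initiator behind a NAT: it hashes the private source address and port it actually uses.
sent_source_hash = nat_detection_hash(spi_i, spi_r, "192.168.1.10", 500)

# Responder recomputes the hash over the source address and port it observed.
observed = nat_detection_hash(spi_i, spi_r, "203.0.113.5", 40123)
nat_in_path = sent_source_hash != observed
print("NAT detected" if nat_in_path else "no NAT")  # NAT detected
```

A mismatch on the source hash tells the receiver that its peer is behind a NAT; a mismatch on the destination hash tells the receiver that it is behind one itself.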

IKEv2 includes NAT-T detection natively in the IKE_SA_INIT exchange. IKEv1 required the separate NAT-T extension (RFC 3947/3948). When NAT is detected, the peer behind the NAT also sends NAT-keepalive packets (one-byte UDP packets containing 0xFF, defined in RFC 3948; every 20 seconds by default) to prevent the NAT mapping from expiring.

Authentication: Certificates vs. Pre-Shared Keys

IPsec supports several authentication methods: pre-shared keys (PSK), digital certificates (RSA or ECDSA signatures), and, in IKEv2, EAP.

Dead Peer Detection (DPD)

In IKEv1, Dead Peer Detection (RFC 3706) is an extension that allows peers to verify each other's liveness. Without DPD, if one side of a tunnel crashes and restarts, the other side has no way to detect the failure until the IKE SA expires (which could be hours). During this time, the surviving peer continues encrypting packets with the old SA and sending them to a peer that has discarded its state -- the packets are silently dropped.

DPD uses R-U-THERE / R-U-THERE-ACK messages exchanged inside the IKE SA. If a peer does not respond to multiple R-U-THERE messages (typically 3-5 retries at 10-30 second intervals), it is declared dead, and the surviving peer tears down the IKE and IPsec SAs. Depending on the configured action, it may then re-initiate IKE, which creates fresh SAs that both sides agree on.

IKEv2 provides equivalent functionality natively: either peer can send an empty INFORMATIONAL exchange at any time. If the peer responds, it is alive. If it does not respond after retransmissions, the IKE SA is deleted.
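Both mechanisms reduce to the same retry loop. In this Python sketch, `probe` is a hypothetical callback that sends one R-U-THERE (IKEv1) or empty INFORMATIONAL request (IKEv2) and reports whether a reply arrived before its timeout. The retry count and interval defaults are assumptions within the typical ranges quoted above, and the injected `sleep` makes the loop testable.

```python
import time
from typing import Callable


def check_peer(probe: Callable[[], bool], retries: int = 3, interval: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the peer answers a liveness probe, False if it is declared dead.

    Implementations typically run this only after no traffic has been received
    from the peer for some time, rather than on a fixed schedule.
    """
    for attempt in range(retries + 1):  # the initial probe plus `retries` retransmissions
        if probe():
            return True
        if attempt < retries:
            sleep(interval)
    # Declared dead: the caller deletes the IKE SA and its IPsec SAs and,
    # depending on configuration, may re-initiate IKE.
    return False


replies = iter([False, False, True])  # simulated peer: answers the third probe
print(check_peer(lambda: next(replies), sleep=lambda s: None))  # True
```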

Rekeying and Perfect Forward Secrecy

IPsec SAs have limited lifetimes, specified in seconds and/or bytes. When an SA approaches its lifetime limit, the peers must negotiate a new SA before the old one expires (rekey) to avoid a traffic interruption. IKEv2 handles this with the CREATE_CHILD_SA exchange, which can also perform a new Diffie-Hellman exchange for Perfect Forward Secrecy (PFS).

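PFS limits how much traffic is exposed if keying material is compromised. Without PFS, every Child SA's keys are derived from SK_d, a key that comes from the IKE SA's SKEYSEED and therefore from the single DH exchange in IKE_SA_INIT. An attacker who recovers the IKE SA's keys (for example, from a compromised gateway's memory) and has recorded the traffic can derive the keys of every Child SA created under that IKE SA, including later rekeys. With PFS, each CREATE_CHILD_SA exchange includes a fresh DH exchange whose shared secret is mixed into the new Child SA's keys, so once the DH private values are discarded, those keys cannot be recomputed from SK_d alone. The long-term authentication credentials (PSK or private key) are used only to authenticate the IKE exchange, not to derive keys; because IKE_SA_INIT already uses an ephemeral DH exchange, a later compromise of those credentials does not by itself expose recorded traffic. The cost of PFS is one additional DH computation per rekey -- negligible on modern hardware but potentially significant on low-power embedded devices rekeying hundreds of tunnels.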
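RFC 7296 section 2.17 makes the difference concrete: without PFS a Child SA's keying material is KEYMAT = prf+(SK_d, Ni | Nr), while a CREATE_CHILD_SA exchange with a fresh DH exchange uses prf+(SK_d, g^ir (new) | Ni | Nr). The Python sketch assumes the HMAC-SHA-256 PRF and separate 32-byte encryption and integrity keys (with an AEAD cipher the integrity keys would be empty); all inputs are placeholder bytes. Keys for initiator-to-responder traffic are taken before those for the reverse direction, encryption key first.

```python
import hashlib
import hmac
from typing import Dict, Optional


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def prf_plus(key: bytes, seed: bytes, length: int) -> bytes:
    # RFC 7296 prf+: T1 = prf(K, S | 0x01), Tn = prf(K, Tn-1 | S | n)
    out, t = b"", b""
    for n in range(1, 256):
        t = prf(key, t + seed + bytes([n]))
        out += t
        if len(out) >= length:
            return out[:length]
    raise ValueError("requested more keying material than prf+ can produce")


def child_sa_keys(sk_d: bytes, ni: bytes, nr: bytes, new_g_ir: Optional[bytes] = None,
                  enc_len: int = 32, integ_len: int = 32) -> Dict[str, bytes]:
    # With PFS, the fresh DH shared secret is prepended to the nonces.
    seed = (new_g_ir if new_g_ir is not None else b"") + ni + nr
    per_dir = enc_len + integ_len
    km = prf_plus(sk_d, seed, 2 * per_dir)
    i2r, r2i = km[:per_dir], km[per_dir:]
    return {"enc_i2r": i2r[:enc_len], "integ_i2r": i2r[enc_len:],
            "enc_r2i": r2i[:enc_len], "integ_r2i": r2i[enc_len:]}


sk_d = b"\x0d" * 32                    # placeholder SK_d
ni, nr = b"\x01" * 32, b"\x02" * 32    # placeholder nonces
no_pfs = child_sa_keys(sk_d, ni, nr)
with_pfs = child_sa_keys(sk_d, ni, nr, new_g_ir=b"\x05" * 256)

# Without PFS, SK_d and the nonces are all that is needed to recompute the keys.
print(no_pfs == child_sa_keys(sk_d, ni, nr))    # True
# With PFS, the same inputs are not enough without the new DH shared secret.
print(with_pfs == child_sa_keys(sk_d, ni, nr))  # False
```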

IPsec vs. WireGuard

The contrast between IPsec and WireGuard illustrates the tradeoff between flexibility and simplicity. IPsec's configurable cipher suites, multiple authentication methods, and complex negotiation protocol make it suitable for heterogeneous enterprise environments where interoperability with diverse vendors is required. WireGuard's fixed cryptographic primitives and minimal configuration make it suitable for scenarios where simplicity and auditability are priorities.

Key differences:

IPsec in Cloud and Data Center Environments

IPsec remains the standard for cloud VPN connectivity. AWS VPN, Azure VPN Gateway, and Google Cloud VPN all use IKEv2/ESP for site-to-site tunnels. Cloud providers typically support a limited set of IKEv2 proposals (for example, AES-256-GCM, SHA-256, and DH Group 20) and mandate certificate-based or PSK authentication.

For connecting on-premises networks to cloud VPCs, IPsec tunnels are established between the customer's router/firewall and the cloud provider's VPN gateway. The tunnel carries traffic between the customer's private subnets and the cloud VPC's CIDR range. BGP is often run over the IPsec tunnel to dynamically exchange routes -- the IPsec tunnel is the data plane, and BGP is the control plane. This is standard practice in AWS Direct Connect VPN backup configurations and Azure ExpressRoute failover.

In data center environments, IPsec is used for inter-site encryption between geographically distributed data centers. MACsec (IEEE 802.1AE) is preferred for intra-site encryption at Layer 2, but for inter-site traffic traversing the public internet or third-party networks, IPsec in tunnel mode remains the standard approach. High-throughput deployments use hardware IPsec acceleration (crypto offload ASICs or Intel QAT) to achieve line-rate encryption on 100 Gbps+ links.

Common IPsec Troubleshooting Issues

IPsec's complexity leads to a predictable set of operational problems:

See It in Action

IPsec tunnels underpin the private connectivity between enterprise networks and cloud providers worldwide. The BGP routes exchanged over these tunnels determine how traffic flows between on-premises data centers and cloud VPCs. When you look up a cloud provider's AS number, the prefixes you see may include VPN gateway address ranges that terminate thousands of IPsec tunnels from enterprise customers.

Use the god.ad BGP Looking Glass to explore the networks of major VPN gateway providers, or look up the IP address of an IPsec tunnel endpoint to see which AS originates the route and how it connects to the global BGP routing table.
