How LACP Works: Link Aggregation for Bandwidth and Redundancy

Link Aggregation Control Protocol (LACP) is a Layer 2 protocol that bundles multiple physical Ethernet links between two devices into a single logical channel, providing both increased bandwidth and redundancy. Defined originally in IEEE 802.3ad and later moved to IEEE 802.1AX, LACP dynamically negotiates which links participate in an aggregate, continuously monitors their health, and automatically removes failed members without disrupting traffic flow. Nearly every data center, campus network, and service provider backbone relies on link aggregation to scale bandwidth beyond what a single physical port can deliver while ensuring that a cable pull or transceiver failure does not cause an outage.

Without LACP, you could bond ports statically, but static link aggregation has no mechanism to detect misconfiguration — if one end of a static bond is connected to the wrong switch, frames silently black-hole. LACP solves this by exchanging control frames (LACPDUs) between partners, ensuring both sides agree on which links belong to the aggregate before any traffic flows. This negotiation makes LACP the standard for production deployments where reliability is non-negotiable.

Link aggregation interacts closely with other Layer 2 protocols. Spanning Tree Protocol treats a LAG (Link Aggregation Group) as a single logical link, preventing it from blocking individual member ports. Bidirectional Forwarding Detection (BFD) can run over a LAG to detect failures at sub-second timescales. And at higher layers, routers advertising routes via BGP often use LAGs for their uplinks to increase capacity and resilience toward upstream transit providers and Internet Exchange Points.

IEEE 802.3ad and 802.1AX: The Standards

Link aggregation was first standardized in 2000 as part of IEEE 802.3ad, an amendment to the Ethernet standard. In 2008, the specification was moved to its own standalone standard, IEEE 802.1AX (Link Aggregation), which is maintained and updated independently from 802.3. The most recent revision, 802.1AX-2020, includes enhancements for distributed resilient network interconnect (DRNI) and conversation-sensitive collection and distribution. Despite the renaming, the protocol on the wire has not fundamentally changed, and most engineers still refer to it as "802.3ad bonding."

The standard defines several key components:

  - The LACPDU frame format and the rules for exchanging it
  - The actor/partner model and the state machines that drive negotiation
  - Active and passive operational modes
  - Aggregation keys, which constrain which ports may be combined
  - Selection logic for choosing active and standby members

LACPDU Format: The Control Plane

LACP communicates through Link Aggregation Control Protocol Data Units (LACPDUs). These are Ethernet frames sent to the Slow Protocols multicast address (01:80:C2:00:00:02) at regular intervals. Each LACPDU carries information about the sending system (the actor) and what the sender knows about its peer (the partner). The LACPDU payload is a fixed 110 octets; together with the Ethernet header and FCS, the frame is 128 octets, comfortably above the 64-octet Ethernet minimum, so no padding is required.

LACPDU Structure (EtherType 0x8809, Subtype = 0x01, Version = 0x01)

  Actor Information TLV (Type=1, Length=20)
    System Priority
    Actor System ID (MAC address, 6 bytes)
    Key
    Port Priority
    Port Number
    Actor State (8 bits)
  Partner Information TLV (Type=2, Length=20)
    System Priority
    Partner System ID (MAC address, 6 bytes)
    Key
    Port Priority
    Port Number
    Partner State (8 bits)
  Collector Information TLV (Type=3, Length=16)
  Terminator TLV (Type=0)

  State byte flags:
    bit 0: Activity          bit 4: Collecting
    bit 1: Timeout           bit 5: Distributing
    bit 2: Aggregation       bit 6: Defaulted
    bit 3: Synchronization   bit 7: Expired
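The state byte shown above can be unpacked bit by bit. Here is a minimal Python sketch; the `decode_state` helper and the flag names are illustrative, not part of any library:

```python
# Illustrative decoder for the 8-bit actor/partner state field of an LACPDU.
STATE_FLAGS = [
    "activity",         # bit 0: 1 = active mode, 0 = passive
    "timeout",          # bit 1: 1 = short timeout (fast rate) requested
    "aggregation",      # bit 2: 1 = port is willing to be aggregated
    "synchronization",  # bit 3: 1 = port agrees on its LAG membership
    "collecting",       # bit 4: 1 = receiving frames on the aggregate
    "distributing",     # bit 5: 1 = transmitting frames on the aggregate
    "defaulted",        # bit 6: 1 = using defaulted partner information
    "expired",          # bit 7: 1 = receive state machine has timed out
]

def decode_state(byte: int) -> dict:
    """Map the state byte to named boolean flags, bit 0 first."""
    return {name: bool(byte >> bit & 1) for bit, name in enumerate(STATE_FLAGS)}

# A fully operational active-mode port with fast rate sets bits 0-5 = 0x3F.
state = decode_state(0x3F)
```

A healthy member port shows `collecting` and `distributing` set with `defaulted` and `expired` clear; an `expired` flag is the classic sign that LACPDUs have stopped arriving.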

The key fields within each LACPDU:

  - System ID and System Priority: together form the System Identifier that uniquely identifies each end of the aggregate.
  - Key: groups ports that are eligible to be aggregated together.
  - Port Number and Port Priority: identify and rank the individual member port.
  - State: the eight flag bits (Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing, Defaulted, Expired) that drive the negotiation.

The State Machine: Actor and Partner

LACP models the aggregation negotiation as an interaction between two systems: the actor (the local system) and the partner (the remote system). Each port maintains both an actor state and a partner state. The actor state reflects the local system's configuration and status. The partner state reflects what the local system has learned about the remote system from received LACPDUs.

The eight state bits control the negotiation:

  - Activity: set when the port runs in active mode and initiates LACPDU exchange.
  - Timeout: set when the port requests the short (fast) timeout from its partner.
  - Aggregation: set when the port is willing to be aggregated; clear means it can only operate as an individual link.
  - Synchronization: set when the port has agreed on which LAG it belongs to.
  - Collecting: set when the port is enabled to receive frames on the aggregate.
  - Distributing: set when the port is enabled to transmit frames on the aggregate.
  - Defaulted: set when the port is using administratively configured defaults because no partner information has been received.
  - Expired: set when LACPDUs from the partner have stopped arriving and the receive state machine has timed out.

The negotiation proceeds through a well-defined state machine. When a port is first enabled, it begins transmitting LACPDUs (if in active mode) or waits for them (if passive). Once both sides have exchanged LACPDUs and confirmed matching keys and system information, the synchronization bit is set. The port then transitions through collecting to distributing, at which point it is fully active and carrying user traffic. This staged approach prevents loops and frame duplication that could occur if both directions were enabled simultaneously.
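The staged bring-up can be sketched as a simple decision function. This is a hypothetical helper, assuming the state flags from the LACPDU have already been decoded into booleans:

```python
# Illustrative sketch of the staged bring-up: a member port only carries
# user traffic after synchronization -> collecting -> distributing.
def next_stage(actor: dict, partner: dict) -> str:
    """Return the next action for a port given actor/partner state flags."""
    if not (actor["synchronization"] and partner["synchronization"]):
        return "wait_for_sync"        # keys/system info not yet agreed
    if not actor["collecting"]:
        return "enable_collecting"    # start accepting frames first
    if not actor["distributing"]:
        return "enable_distributing"  # only then start sending frames
    return "forwarding"               # fully active member

# Collecting is enabled before distributing, so neither side transmits
# onto a link whose peer is not yet ready to receive.
```

The ordering is the point: enabling reception before transmission on each side is what prevents the loops and duplicate frames mentioned above.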

Active vs. Passive Mode

LACP supports two operational modes on each port:

  - Active: the port transmits LACPDUs unconditionally, initiating negotiation on its own.
  - Passive: the port transmits LACPDUs only in response to a partner that is already sending them.

The critical constraint: at least one side must be active. If both ends are configured as passive, neither will send LACPDUs, the negotiation will never begin, and the ports will remain as individual links — a common misconfiguration. The four possible combinations:

Active  <->  Active  =  LAG forms (both initiate)
Active  <->  Passive =  LAG forms (active side initiates)
Passive <->  Active  =  LAG forms (active side initiates)
Passive <->  Passive =  No LAG (neither initiates)
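The combination table above reduces to a one-line rule. A tiny illustrative sketch (the `lag_forms` helper is hypothetical):

```python
# Sketch of the mode-combination rule: a LAG can only negotiate if at
# least one end sends LACPDUs unprompted.
def lag_forms(side_a: str, side_b: str) -> bool:
    """side_a/side_b are 'active' or 'passive'."""
    return "active" in (side_a, side_b)

assert lag_forms("active", "passive")       # active side initiates
assert not lag_forms("passive", "passive")  # nobody ever sends an LACPDU
```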

Most production deployments use active mode on both sides. The overhead of LACPDU transmission is negligible (a 110-byte frame every second or every 30 seconds per port), and active/active eliminates the risk of misconfigured passive/passive pairings. Some administrators use passive mode on server-facing ports to prevent accidentally forming LAGs with devices that were not intended to participate, but this is increasingly rare.

Aggregation Key and Port Selection

The aggregation key is the mechanism that constrains which ports can be combined into a single LAG. Each port has both an administrative key (configured by the operator) and an operational key (computed by the system based on the port's physical properties). The operational key incorporates the port speed and duplex: a 10 Gbps full-duplex port will have a different operational key than a 25 Gbps port, even if they share the same administrative key.

The aggregation rules are strict:

  - All member ports must share the same operational key on the local system.
  - All member ports must see the same partner System ID and partner operational key.
  - All members must run at the same speed and in full-duplex mode.
  - Members must be point-to-point links between the same pair of systems.

If a port fails any of these constraints, it cannot join the LAG. This is one of LACP's most important safety features. If you accidentally cable a port to a different switch, the System ID mismatch prevents it from joining the LAG, and the error is reported. Static link aggregation (without LACP) has no such protection — the miscabled port would silently forward frames to the wrong destination.

When the number of eligible ports exceeds the maximum members allowed in a LAG (which varies by platform — commonly 8 or 16), the system with higher priority (lower System Priority value) selects which ports become active. Ports are selected based on port priority (lower value = higher priority), then port number as a tiebreaker. Non-selected ports remain in standby and can be activated if an active member fails.
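The selection rule amounts to a sort on (port priority, port number). A small illustrative sketch with made-up values; `select_active` is not a real API:

```python
# Illustrative standby selection: rank eligible ports by (port priority,
# port number), lower values winning, and activate the first max_members.
def select_active(ports, max_members=8):
    """ports: list of (port_priority, port_number) tuples."""
    ranked = sorted(ports)  # tuple sort: priority first, number as tiebreaker
    return ranked[:max_members], ranked[max_members:]

active, standby = select_active(
    [(32768, 3), (100, 7), (32768, 1), (100, 2)], max_members=2)
# The two priority-100 ports win; between them, the lower port number ranks first.
```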

Load Distribution: Hashing Algorithms

A LAG presents itself as a single logical link, but the underlying traffic must be distributed across the physical member links. This distribution is performed by a hashing algorithm that maps each frame or flow to a specific member port. The hash must ensure that frames belonging to the same conversation are always sent over the same member link (to preserve ordering) while distributing different conversations across members as evenly as possible.

Common hashing inputs include:

  - Layer 2: source and destination MAC addresses (sometimes with VLAN ID).
  - Layer 3: source and destination IP addresses.
  - Layer 4: TCP/UDP source and destination ports.
  - Combined policies such as layer2+3 or layer3+4, which mix fields from multiple layers for finer-grained distribution.

The hash function itself is typically a CRC or XOR computation over the selected fields, producing a value that is modulo-mapped to a member port index. For example, with four active member links, the hash output modulo 4 determines which link carries each flow. This means that adding or removing a member causes most flows to be rehashed, potentially reordering packets in transit. Some implementations use consistent hashing (also called "resilient hashing") to minimize flow disruption when membership changes.

A critical limitation: LACP does not guarantee equal bandwidth utilization. Because hashing is deterministic and based on flow characteristics, a single elephant flow (e.g., a large backup or database replication stream) will always traverse the same physical link, potentially saturating it while other members sit idle. Flowlet-based load balancing and adaptive techniques at the switch ASIC level can help, but the fundamental per-flow constraint remains.
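Both properties, even spread across many flows and hard pinning of any single flow, fall out of the modulo mapping. An illustrative sketch using CRC32 over a flow 5-tuple (real switch ASICs use their own vendor-specific hash functions):

```python
import zlib

# Illustrative per-flow hashing: map a flow 5-tuple to a member link
# index with CRC32 modulo the member count.
def member_for(flow: tuple, n_members: int) -> int:
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_members

# Many distinct flows spread roughly evenly across 4 members...
flows = [("10.0.1.5", "10.0.2.9", 6, 49152 + i, 443) for i in range(1000)]
counts = [0, 0, 0, 0]
for f in flows:
    counts[member_for(f, 4)] += 1

# ...but any single flow always lands on the same member, which is why
# one elephant flow can saturate one link while the others sit idle.
```

Note also that changing `n_members` changes the modulo, so most flows map to a different member after a membership change, which is exactly the reordering risk that resilient hashing tries to avoid.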

Failure Detection and Recovery

LACP provides two mechanisms for detecting link failures:

  - Physical link-state detection: loss of signal or carrier brings the port down immediately, and it is removed from the LAG within milliseconds.
  - LACPDU timeout: if no LACPDUs arrive for three intervals (90 seconds at the slow rate, 3 seconds at the fast rate), the partner information expires and the port is removed. This catches failures that leave the physical link up, such as a unidirectional fiber fault or a frozen remote control plane.

For environments that require sub-second failure detection, LACP's 3-second fast timeout is often too slow. BFD (Bidirectional Forwarding Detection) can be configured to run over the LAG with intervals as low as 50 milliseconds, providing much faster failure notification. Some implementations also support per-member BFD, which can detect unidirectional failures on individual member links faster than LACPDU timeout.

When a member link fails:

  1. The port is removed from the distributing and collecting states.
  2. Traffic that was hashed to the failed link is redistributed to the remaining active members.
  3. If standby ports exist, the highest-priority standby port is activated to replace the failed member.
  4. If no standby ports exist, the LAG continues operating with reduced bandwidth.
  5. The LAG itself remains up as long as at least one member link is operational. The LAG goes down only when all members fail, or when the number of active members drops below a configured minimum threshold (min-links).

The min-links parameter is a critical production safeguard. If you have a 4-member LAG providing 40 Gbps to a server that requires at least 20 Gbps, setting min-links 2 causes the entire LAG to go down if only one member remains. This triggers higher-layer failover (e.g., an ECMP re-route or a BFD-triggered routing adjacency tear-down) rather than allowing the server to limp along with insufficient bandwidth.
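The min-links behavior reduces to a single threshold check. A hypothetical sketch of the rule described above:

```python
# Illustrative min-links rule: the whole LAG is declared down once active
# membership falls below the threshold, forcing higher-layer failover
# instead of degraded operation.
def lag_oper_status(active_members: int, min_links: int) -> str:
    return "up" if active_members >= min_links else "down"

assert lag_oper_status(4, 2) == "up"    # healthy 4-member LAG
assert lag_oper_status(2, 2) == "up"    # degraded, but capacity still acceptable
assert lag_oper_status(1, 2) == "down"  # below threshold: tear the LAG down
```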

MLAG and MC-LAG: Multi-Chassis Link Aggregation

Standard LACP requires all member links to terminate on the same physical switch. This creates a single point of failure: if that switch dies, the entire LAG goes down, regardless of how many physical links it contains. Multi-Chassis LAG (MLAG or MC-LAG) extends the concept by allowing a LAG to span two separate physical switches that present themselves as a single logical system to the LACP partner.

MLAG / MC-LAG: Dual-Homed Link Aggregation

           Upstream Network / Spine
               |                |
        +------------+      +------------+
        |  Switch A  |======|  Switch B  |   <== Peer Link
        | MLAG Peer  |      | MLAG Peer  |       (Inter-Chassis Link
        |  (Active)  |      |  (Active)  |        + keepalive)
        +------------+      +------------+
               \                /
         LACP   \              /   LACP
        members  \            /   members
             +--------------------+
             |   Server / Host    |
             +--------------------+
          Sees a single LAG partner
       (one LAG from the server's perspective)

MLAG works by synchronizing state between the two peer switches over a dedicated peer link (also called the inter-chassis link or ICL). The peer switches coordinate their LACP System IDs so that both present the same System ID to the downstream device. From the server's or access switch's perspective, it is forming a normal LACP LAG to a single logical switch. The server has no awareness that its LAG members terminate on two different physical chassis.

The peer link carries two types of traffic:

  - Control traffic: the synchronization protocol that keeps MAC tables, ARP entries, and LACP state consistent between the two peers.
  - Data traffic: frames that arrive on one peer but whose only available egress is on the other, for example after a member failure or toward a single-homed device.

MLAG is a vendor-specific feature — there is no single interoperability standard. Each vendor has its own implementation:

  - Cisco: vPC (virtual Port Channel) on Nexus platforms.
  - Arista: MLAG.
  - Juniper: MC-LAG.
  - NVIDIA Cumulus Linux: MLAG (formerly called CLAG).

Despite being proprietary, MLAG has become the de facto standard for redundant access layer connectivity in modern data center leaf-spine fabrics. It eliminates the need for Spanning Tree to block redundant links, since all LAG members are actively forwarding. The result is full utilization of all uplinks and sub-second failover when a switch fails.

LACP and Spanning Tree Interaction

Spanning Tree Protocol (STP) treats each LAG as a single logical port. This is critical for loop prevention: if STP saw each physical member link individually, it would detect multiple parallel paths and block all but one, defeating the entire purpose of link aggregation. By aggregating first and then running STP on the resulting logical topology, all member links forward traffic simultaneously.

The interaction requires careful attention to order of operations when a LAG forms:

  1. Physical links come up.
  2. LACP negotiates and forms the LAG.
  3. STP recognizes the new logical port and runs its algorithm.
  4. If the LAG is not in a forwarding state (STP is blocking it), the member ports do not carry user traffic even though LACP shows them as distributing.

In MLAG environments, STP is largely sidelined. Because the server sees a single LAG to a single logical switch, there is no topology loop from STP's perspective, and the LAG is always in forwarding state. However, STP still runs on the peer switches' uplinks and other non-MLAG ports to protect against accidental loops elsewhere in the network.

LACP Fast Rate vs. Slow Rate

LACP defines two LACPDU transmission rates:

  - Slow (long timeout): LACPDUs every 30 seconds, with a 90-second timeout (three missed intervals). This is the default on most platforms.
  - Fast (short timeout): LACPDUs every 1 second, with a 3-second timeout.

The timeout is signaled in the LACPDU's state byte (bit 1, the Timeout flag). Each side advertises the rate at which it needs to receive LACPDUs: if one side sets the short-timeout flag, its partner must transmit at the fast rate. In practice, you configure the desired rate on both ends, and LACP negotiates accordingly.

Even fast rate (3-second detection) is slow by modern standards. For critical links, pairing LACP with BFD provides detection in as little as 50-150 milliseconds. Some hardware platforms also support LACP fast-switchover, which transitions standby ports to active within a single LACPDU interval rather than waiting for the full three-interval timeout.
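The detection times compare roughly as follows, assuming the standard three-missed-PDU timeout and a typical BFD configuration (these intervals are the common defaults, not universal):

```python
# Back-of-the-envelope failure detection times: the LACP timeout fires
# after three missed LACPDUs at the configured rate.
RATES = {"slow": 30.0, "fast": 1.0}  # LACPDU interval in seconds

def lacp_detection_time(rate: str) -> float:
    return 3 * RATES[rate]

assert lacp_detection_time("slow") == 90.0  # long timeout
assert lacp_detection_time("fast") == 3.0   # short timeout

# Compare with BFD at a 50 ms interval and a detect multiplier of 3:
bfd_detection = 3 * 0.050  # ~0.15 s, roughly 20x faster than LACP fast rate
```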

Configuration Examples

LACP configuration varies by platform, but the core concepts are consistent. Here is a typical configuration for a 4-member LAG on a network switch:

# Cisco NX-OS / IOS-XE style
interface port-channel 10
  description "Server uplink LAG"
  switchport mode trunk
  switchport trunk allowed vlan 100,200,300
  lacp min-links 2

interface Ethernet1/1-4
  channel-group 10 mode active
  lacp rate fast

And the server side (Linux bonding):

# /etc/network/interfaces (ifupdown) or netplan equivalent
auto bond0
iface bond0 inet static
  address 10.0.1.10/24
  bond-slaves eth0 eth1 eth2 eth3
  bond-mode 802.3ad
  bond-lacp-rate fast
  bond-xmit-hash-policy layer3+4
  bond-miimon 100

Key parameters to configure on both ends:

  - Mode: active on at least one side, ideally both.
  - LACP rate: fast on both sides for quicker failure detection.
  - Hash policy: layer3+4 generally spreads IP flows best.
  - min-links: set to the minimum acceptable capacity for the attached device.
  - VLANs and MTU: must match on both ends, or traffic will be dropped even though the LAG forms.

Maximum Number of Member Links

The IEEE 802.1AX standard allows up to 16 ports per LAG (8 active + 8 standby in the original spec, expanded in later revisions). However, practical limits vary by platform: many switch ASICs cap a LAG at 8 or 16 members, some data center platforms support 32 or more, and the total number of LAGs per device is also bounded by hardware tables. Check your platform documentation for the exact limits.

In practice, LAGs beyond 8 members are uncommon because at that point, you are typically better served by upgrading to higher-speed optics (e.g., replacing 8x10G with 2x100G) rather than adding more member links with their associated cabling complexity and hash distribution challenges.

Common Pitfalls and Troubleshooting

LACP is straightforward in concept but has several common failure modes in production:

  - Passive/passive configuration: neither side initiates, and the LAG never forms.
  - Mismatched member settings: speed, duplex, VLAN, or MTU differences keep ports out of the aggregate or silently drop traffic after it forms.
  - LACP on one side, static bonding on the other: the static side forwards immediately while the LACP side waits for negotiation, black-holing traffic.
  - Miscabling: a member connected to the wrong switch is rejected on the System ID mismatch and stays down, silently reducing capacity.
  - Uneven hashing: a poor hash policy or a few elephant flows saturate one member while the others sit idle.

Useful diagnostic commands on most platforms:

# Show LACP neighbor details (Cisco/Arista)
show lacp neighbor
show lacp counters

# Show LAG member status
show port-channel summary
show etherchannel summary

# Linux bonding status
cat /proc/net/bonding/bond0

# Check LACP state per member
show lacp internal

LACP in Data Center Leaf-Spine Fabrics

Modern data center networks use a leaf-spine (Clos) architecture where every leaf switch has equal-cost uplinks to every spine switch. In this topology, LACP appears at two main points:

  - Server-to-leaf: servers dual-home to a pair of leaf switches via MLAG or EVPN multihoming, so a single leaf failure does not isolate the server.
  - Border connectivity: LAGs toward external routers, firewalls, and upstream providers bundle links for capacity and redundancy. Leaf-to-spine links, by contrast, are typically routed individually and balanced with ECMP rather than aggregated.

In EVPN-VXLAN fabrics (which have largely replaced traditional Layer 2 fabrics in new data center deployments), MLAG is being superseded by EVPN multihoming (ESI-LAG). ESI-LAG uses the EVPN control plane (running over BGP) to coordinate the multi-homing instead of a proprietary peer link. Multiple switches advertise the same Ethernet Segment Identifier (ESI) in EVPN, and the EVPN procedures handle designated forwarder election, MAC synchronization, and fast failover. ESI-LAG is standards-based (RFC 7432) and does not require a dedicated peer link, making it architecturally cleaner than MLAG for large-scale deployments.

LACP vs. Static Link Aggregation

Static link aggregation (sometimes called "mode on" or "EtherChannel without negotiation") bonds ports without any control protocol. Both sides simply assign ports to a LAG and start forwarding. The key differences:

  - Misconfiguration detection: LACP refuses to aggregate mismatched or miscabled ports; static mode forwards regardless and can black-hole or loop traffic.
  - Failure detection: LACP removes members whose partner stops responding even when the physical link stays up; static mode relies on link state alone.
  - Standby members: LACP supports hot-standby ports beyond the active maximum; static mode has no such concept.
  - Overhead: static mode sends no control frames, but the savings are negligible.

There is virtually no reason to use static link aggregation in modern networks. LACP's safety benefits far outweigh its minimal overhead. The only remaining use case for static mode is legacy equipment that does not support LACP, which is exceedingly rare today.

Explore Networks Using Link Aggregation

Link aggregation is invisible in the BGP routing table — it operates at Layer 2, below the level of IP routing. But every major network relies on LAGs for the physical connectivity that underpins their BGP-speaking routers. The god.ad BGP Looking Glass lets you explore the networks that use LAGs for their backbone and peering infrastructure:

Try looking up your own IP address to see which network carries your traffic and trace the BGP AS path it follows across the internet — each hop in that path relies on link aggregation for the physical bandwidth and redundancy that keeps the global routing system running.
