How LACP Works: Link Aggregation for Bandwidth and Redundancy
Link Aggregation Control Protocol (LACP) is a Layer 2 protocol that bundles multiple physical Ethernet links between two devices into a single logical channel, providing both increased bandwidth and redundancy. Defined originally in IEEE 802.3ad and later moved to IEEE 802.1AX, LACP dynamically negotiates which links participate in an aggregate, continuously monitors their health, and automatically removes failed members without disrupting traffic flow. Nearly every data center, campus network, and service provider backbone relies on link aggregation to scale bandwidth beyond what a single physical port can deliver while ensuring that a cable pull or transceiver failure does not cause an outage.
Without LACP, you could bond ports statically, but static link aggregation has no mechanism to detect misconfiguration — if one end of a static bond is connected to the wrong switch, frames silently black-hole. LACP solves this by exchanging control frames (LACPDUs) between partners, ensuring both sides agree on which links belong to the aggregate before any traffic flows. This negotiation makes LACP the standard for production deployments where reliability is non-negotiable.
Link aggregation interacts closely with other Layer 2 protocols. Spanning Tree Protocol treats a LAG (Link Aggregation Group) as a single logical link, preventing it from blocking individual member ports. Bidirectional Forwarding Detection (BFD) can run over a LAG to detect failures at sub-second timescales. And at higher layers, routers advertising routes via BGP often use LAGs for their uplinks to increase capacity and resilience toward upstream transit providers and Internet Exchange Points.
IEEE 802.3ad and 802.1AX: The Standards
Link aggregation was first standardized in 2000 as part of IEEE 802.3ad, an amendment to the Ethernet standard. In 2008, the specification was moved to its own standalone standard, IEEE 802.1AX (Link Aggregation), which is maintained and updated independently from 802.3. The most recent revision, 802.1AX-2020, includes enhancements for distributed resilient network interconnect (DRNI) and conversation-sensitive collection and distribution. Despite the renaming, the protocol on the wire has not fundamentally changed, and most engineers still refer to it as "802.3ad bonding."
The standard defines several key components:
- Link Aggregation Group (LAG) — The logical interface formed by combining multiple physical links. From the perspective of higher-layer protocols, a LAG behaves as a single link with a single MAC address.
- Link Aggregation Control Protocol (LACP) — The control protocol that negotiates, establishes, and monitors the aggregation. LACP is one of the IEEE slow protocols, using EtherType 0x8809 and the well-known multicast destination MAC address 01:80:C2:00:00:02.
- Aggregation Key — A value assigned to each port that constrains which ports can be aggregated together. Only ports with the same key (and connected to the same partner system) can be members of the same LAG.
- Marker Protocol — A companion protocol used to ensure lossless redistribution of traffic when the set of active member links changes. In practice, most implementations handle this internally without explicitly using the marker protocol on the wire.
LACPDU Format: The Control Plane
LACP communicates through Link Aggregation Control Protocol Data Units (LACPDUs). These are Ethernet frames sent to the slow protocol multicast address at regular intervals. Each LACPDU carries information about the sending system (the actor) and what the sender knows about its peer (the partner). The LACPDU payload is a fixed 110 octets, the last 50 of which are reserved, yielding a 128-byte frame on the wire.
The key fields within each LACPDU:
- System Priority (2 bytes) — A configurable priority for the entire system. Lower values indicate higher priority. Combined with the System ID, this determines which system takes the controlling role in the aggregation and has precedence when deciding which ports are active versus standby.
- System ID (6 bytes) — The MAC address of the system. Together with the System Priority, this forms the System Identifier, which uniquely identifies each partner in the LACP negotiation.
- Key (2 bytes) — The aggregation key. Ports can only aggregate together if they share the same operational key. The key is typically derived from the physical port properties (speed, duplex, media type) and any administrative configuration. A 10G port and a 1G port will have different keys and will never aggregate together.
- Port Priority (2 bytes) — Priority of this specific port within the system. Used by the controlling system to decide which ports become active members when there are more eligible ports than the maximum allowed.
- Port Number (2 bytes) — A locally unique identifier for the port.
- State (1 byte) — An 8-bit field encoding the current operational state of the port. Each bit conveys critical information about the link's aggregation status.
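The field layout above can be sketched with Python's struct module. This packs only the Actor Information TLV (one of several TLVs in a full LACPDU); the type/length bytes 0x01/0x14 follow the 802.1AX layout, and the example system MAC and key values are purely illustrative.

```python
import struct

SLOW_PROTOCOLS_MAC = "01:80:C2:00:00:02"   # destination of every LACPDU
ETHERTYPE_SLOW = 0x8809                    # IEEE slow protocols EtherType

def actor_tlv(sys_prio: int, sys_mac: bytes, key: int,
              port_prio: int, port: int, state: int) -> bytes:
    """Pack the 20-byte Actor Information TLV of an LACPDU.

    Field order follows the 802.1AX layout: TLV type (0x01), TLV length
    (0x14 = 20), then the priority/ID/key/port/state fields described
    above, plus 3 reserved bytes.
    """
    return struct.pack("!BBH6sHHHB3x",
                       0x01, 0x14,   # TLV type, TLV length
                       sys_prio,     # System Priority (2 bytes)
                       sys_mac,      # System ID / MAC (6 bytes)
                       key,          # Aggregation key (2 bytes)
                       port_prio,    # Port Priority (2 bytes)
                       port,         # Port Number (2 bytes)
                       state)        # State byte

tlv = actor_tlv(32768, bytes.fromhex("aabbccddeeff"), 17, 32768, 1, 0x3F)
print(len(tlv))   # → 20
```

The matching Partner Information TLV has the same layout with TLV type 0x02, carrying the sender's view of the remote system.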
The State Machine: Actor and Partner
LACP models the aggregation negotiation as an interaction between two systems: the actor (the local system) and the partner (the remote system). Each port maintains both an actor state and a partner state. The actor state reflects the local system's configuration and status. The partner state reflects what the local system has learned about the remote system from received LACPDUs.
The eight state bits control the negotiation:
- Activity (bit 0) — Set to 1 for active mode, 0 for passive mode. An active port initiates LACPDU transmission without waiting for the partner. A passive port only transmits LACPDUs in response to receiving them from the partner.
- Timeout (bit 1) — Controls the LACPDU transmission interval. When set (short timeout), LACPDUs are sent every 1 second and the receive timeout is 3 seconds. When clear (long timeout), LACPDUs are sent every 30 seconds and the receive timeout is 90 seconds. Short timeout enables faster failure detection.
- Aggregation (bit 2) — Indicates whether the port is capable of being aggregated. If clear, the port is configured as an individual (non-aggregatable) link.
- Synchronization (bit 3) — Set when the port's aggregation state is synchronized with the partner — meaning both sides agree on which LAG the port belongs to. This must be set before the port can carry traffic.
- Collecting (bit 4) — Set when the port is accepting incoming frames from the partner. A port transitions to collecting once synchronization is established.
- Distributing (bit 5) — Set when the port is transmitting frames to the partner. A port transitions to distributing after it confirms the partner is collecting. The collecting/distributing split prevents frame duplication and misordering during transitions.
- Defaulted (bit 6) — Set when the partner information was obtained from administratively configured defaults rather than from received LACPDUs. When the port first comes up and has not yet received a partner LACPDU, it uses default values and sets this bit.
- Expired (bit 7) — Set when the partner information has timed out (no LACPDU received within the timeout window). The port enters an expired state and sends LACPDUs at the fast rate (1 second) in an attempt to re-establish communication. If no response comes within three fast intervals, the port falls back to the defaulted state.
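A minimal decoder makes the bit assignments above concrete; this is the same state byte you see rendered as flags in `show lacp` output:

```python
# Bit positions of the LACP state byte; list index = bit number.
STATE_BITS = ["ACTIVITY", "TIMEOUT", "AGGREGATION", "SYNCHRONIZATION",
              "COLLECTING", "DISTRIBUTING", "DEFAULTED", "EXPIRED"]

def decode_state(state: int) -> set[str]:
    """Return the names of the bits set in an LACP state byte."""
    return {name for bit, name in enumerate(STATE_BITS) if state & (1 << bit)}

# 0x3D: active mode, aggregatable, synchronized, collecting, distributing
# (a fully operational member using the long timeout).
print(sorted(decode_state(0x3D)))
```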
The negotiation proceeds through a well-defined state machine. When a port is first enabled, it begins transmitting LACPDUs (if in active mode) or waits for them (if passive). Once both sides have exchanged LACPDUs and confirmed matching keys and system information, the synchronization bit is set. The port then transitions through collecting to distributing, at which point it is fully active and carrying user traffic. This staged approach prevents loops and frame duplication that could occur if both directions were enabled simultaneously.
Active vs. Passive Mode
LACP supports two operational modes on each port:
- Active mode — The port unconditionally transmits LACPDUs at the configured interval (fast or slow). Active mode ports will initiate negotiation with any partner.
- Passive mode — The port does not transmit LACPDUs unless it first receives one from the partner. Passive mode conserves control plane bandwidth but requires the partner to be active.
The critical constraint: at least one side must be active. If both ends are configured as passive, neither will send LACPDUs, the negotiation will never begin, and the ports will remain as individual links — a common misconfiguration. The four possible combinations:
Active <-> Active = LAG forms (both initiate)
Active <-> Passive = LAG forms (active side initiates)
Passive <-> Active = LAG forms (active side initiates)
Passive <-> Passive = No LAG (neither initiates)
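The combination table reduces to a one-line rule, sketched here for illustration:

```python
def lag_forms(local_mode: str, partner_mode: str) -> bool:
    """A LAG can only form if at least one side actively sends LACPDUs."""
    return "active" in (local_mode, partner_mode)

for a, b in [("active", "active"), ("active", "passive"),
             ("passive", "active"), ("passive", "passive")]:
    outcome = "LAG forms" if lag_forms(a, b) else "no LAG"
    print(f"{a:7} <-> {b:7} : {outcome}")
```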
Most production deployments use active mode on both sides. The overhead of LACPDU transmission is negligible (a 110-byte frame every second or every 30 seconds per port), and active/active eliminates the risk of misconfigured passive/passive pairings. Some administrators use passive mode on server-facing ports to prevent accidentally forming LAGs with devices that were not intended to participate, but this is increasingly rare.
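A quick back-of-envelope check of that overhead claim, assuming a 10 Gbps member link and a 128-byte LACPDU frame plus standard preamble and inter-frame gap:

```python
# LACPDU on the wire: 14 B Ethernet header + 110 B payload + 4 B FCS,
# plus 20 B of preamble and inter-frame gap (assumed framing overhead).
frame_bits = (14 + 110 + 4 + 20) * 8   # 1184 bits per LACPDU
link_bps = 10e9                         # assume a 10 Gbps member link

for interval_s, name in [(1, "fast"), (30, "slow")]:
    fraction = frame_bits / interval_s / link_bps
    print(f"{name} rate: {fraction:.10f} of link capacity")
```

Even at the fast rate, LACPDUs consume roughly a ten-millionth of a 10G link.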
Aggregation Key and Port Selection
The aggregation key is the mechanism that constrains which ports can be combined into a single LAG. Each port has both an administrative key (configured by the operator) and an operational key (computed by the system based on the port's physical properties). The operational key incorporates the port speed and duplex: a 10 Gbps full-duplex port will have a different operational key than a 25 Gbps port, even if they share the same administrative key.
The aggregation rules are strict:
- All member ports in a LAG must have the same operational key.
- All member ports must be connected to the same partner system (identified by System ID).
- All member ports must be connected to ports on the partner that share the same partner key.
- All member ports must operate at the same speed and duplex.
If a port fails any of these constraints, it cannot join the LAG. This is one of LACP's most important safety features. If you accidentally cable a port to a different switch, the System ID mismatch prevents it from joining the LAG, and the error is reported. Static link aggregation (without LACP) has no such protection — the miscabled port would silently forward frames to the wrong destination.
When the number of eligible ports exceeds the maximum members allowed in a LAG (which varies by platform — commonly 8 or 16), the system with higher priority (lower System Priority value) selects which ports become active. Ports are selected based on port priority (lower value = higher priority), then port number as a tiebreaker. Non-selected ports remain in standby and can be activated if an active member fails.
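The selection rule can be sketched as a simple sort, here with hypothetical port numbers and priorities:

```python
def select_active(ports, max_members):
    """Rank by (port priority, port number); lower values win."""
    ranked = sorted(ports, key=lambda p: (p["priority"], p["number"]))
    return ranked[:max_members], ranked[max_members:]

# Ten candidate ports, default priority 32768; port 10 preferred explicitly.
ports = [{"number": n, "priority": 32768} for n in range(1, 11)]
ports[9]["priority"] = 100

active, standby = select_active(ports, max_members=8)
print([p["number"] for p in active])    # → [10, 1, 2, 3, 4, 5, 6, 7]
print([p["number"] for p in standby])   # → [8, 9]
```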
Load Distribution: Hashing Algorithms
A LAG presents itself as a single logical link, but the underlying traffic must be distributed across the physical member links. This distribution is performed by a hashing algorithm that maps each frame or flow to a specific member port. The hash must ensure that frames belonging to the same conversation are always sent over the same member link (to preserve ordering) while distributing different conversations across members as evenly as possible.
Common hashing inputs include:
- Source and destination MAC address — The simplest hash, operating purely at Layer 2. Works well when there is a diversity of MAC addresses, but performs poorly when most traffic flows between the same two MAC addresses (e.g., between a server and its default gateway router).
- Source and destination IP address — A Layer 3-aware hash that provides better distribution in most scenarios because there is typically more diversity in IP addresses than in MAC addresses.
- Source and destination IP + TCP/UDP port — A Layer 3+4 hash (sometimes called 5-tuple hashing) that provides the finest granularity. Even traffic between two IP addresses is distributed across members if it uses different TCP/UDP ports (which it almost always does). This is the recommended mode for most deployments.
- MPLS label — In MPLS networks, the label stack can be used as hash input for distributing labeled traffic across LAG members.
The hash function itself is typically a CRC or XOR computation over the selected fields, producing a value that is modulo-mapped to a member port index. For example, with four active member links, the hash output modulo 4 determines which link carries each flow. This means that adding or removing a member causes most flows to be rehashed, potentially reordering packets in transit. Some implementations use consistent hashing (also called "resilient hashing") to minimize flow disruption when membership changes.
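A sketch of Layer 3+4 hashing, using CRC32 as a stand-in for the vendor-specific ASIC hash function:

```python
import zlib

def member_for_flow(src_ip: str, dst_ip: str, src_port: int,
                    dst_port: int, proto: str, n_members: int) -> int:
    """Layer 3+4 hash: CRC32 over the 5-tuple, modulo the member count.

    Real switch ASICs use their own hash functions; CRC32 here just
    illustrates the deterministic flow -> member mapping.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % n_members

# The same flow always maps to the same member, preserving ordering...
m = member_for_flow("10.0.1.10", "10.0.2.20", 49152, 443, "tcp", 4)
assert m == member_for_flow("10.0.1.10", "10.0.2.20", 49152, 443, "tcp", 4)

# ...while flows differing only in source port spread across members.
members = {member_for_flow("10.0.1.10", "10.0.2.20", p, 443, "tcp", 4)
           for p in range(49152, 49252)}
print(f"100 flows used members: {sorted(members)}")
```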
A critical limitation: LACP does not guarantee equal bandwidth utilization. Because hashing is deterministic and based on flow characteristics, a single elephant flow (e.g., a large backup or database replication stream) will always traverse the same physical link, potentially saturating it while other members sit idle. Flowlet-based load balancing and adaptive techniques at the switch ASIC level can help, but the fundamental per-flow constraint remains.
Failure Detection and Recovery
LACP provides two mechanisms for detecting link failures:
- Physical link failure — If the physical layer goes down (loss of signal on the fiber or copper link), the switch hardware detects this within milliseconds and removes the port from the LAG immediately. Traffic is rehashed across the remaining members. This is the fastest recovery path.
- LACPDU timeout — If the physical link stays up but LACPDUs stop arriving (e.g., due to a unidirectional fiber failure, a remote software crash, or an intermediate passive optical splitter), the local system detects the failure after the timeout expires. With fast (short) timeout, this takes 3 seconds (3 missed 1-second intervals). With slow (long) timeout, this takes 90 seconds (3 missed 30-second intervals).
For environments that require sub-second failure detection, LACP's 3-second fast timeout is often too slow. BFD (Bidirectional Forwarding Detection) can be configured to run over the LAG with intervals as low as 50 milliseconds, providing much faster failure notification. Some implementations also support per-member BFD, which can detect unidirectional failures on individual member links faster than LACPDU timeout.
When a member link fails:
- The port is removed from the distributing and collecting states.
- Traffic that was hashed to the failed link is redistributed to the remaining active members.
- If standby ports exist, the highest-priority standby port is activated to replace the failed member.
- If no standby ports exist, the LAG continues operating with reduced bandwidth.
- The LAG itself remains up as long as at least one member link is operational. The LAG goes down only when all members fail, or when the number of active members drops below a configured minimum threshold (min-links).
The min-links parameter is a critical production safeguard. If you have a 4-member LAG providing 40 Gbps to a server that requires at least 20 Gbps, setting min-links 2 causes the entire LAG to go down if only one member remains. This triggers higher-layer failover (e.g., an ECMP re-route or a BFD-triggered routing adjacency tear-down) rather than allowing the server to limp along with insufficient bandwidth.
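The min-links rule itself is a one-line check; a sketch using the 4-member example above:

```python
def lag_oper_status(active_members: int, min_links: int) -> str:
    """The LAG stays operationally up only while active members >= min-links."""
    return "up" if active_members >= min_links else "down"

# 4 x 10G LAG with min-links 2: one or two failures leave it up,
# a third failure takes the whole LAG down, triggering higher-layer failover.
for failed in range(5):
    print(f"{failed} member(s) failed -> LAG {lag_oper_status(4 - failed, 2)}")
```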
MLAG and MC-LAG: Multi-Chassis Link Aggregation
Standard LACP requires all member links to terminate on the same physical switch. This creates a single point of failure: if that switch dies, the entire LAG goes down, regardless of how many physical links it contains. Multi-Chassis LAG (MLAG or MC-LAG) extends the concept by allowing a LAG to span two separate physical switches that present themselves as a single logical system to the LACP partner.
MLAG works by synchronizing state between the two peer switches over a dedicated peer link (also called the inter-chassis link or ICL). The peer switches coordinate their LACP System IDs so that both present the same System ID to the downstream device. From the server's or access switch's perspective, it is forming a normal LACP LAG to a single logical switch. The server has no awareness that its LAG members terminate on two different physical chassis.
The peer link carries two types of traffic:
- Control plane synchronization — MAC address tables, ARP/ND tables, LACP state, and MLAG configuration are synchronized between the two peers so they can act as a single forwarding entity. A separate out-of-band keepalive link (often a dedicated management port or routed L3 interface) is used to detect peer failures and prevent split-brain scenarios.
- Data plane forwarding — When a frame arrives on Switch A but needs to egress on a port that exists only on Switch B, the frame is forwarded across the peer link. Good MLAG design minimizes this cross-chassis traffic by ensuring the hash distributes traffic to the switch that has a direct egress path.
MLAG is a vendor-specific feature — there is no single interoperability standard. Each vendor has its own implementation:
- Cisco vPC (Virtual Port Channel) — Available on Nexus switches. Uses a peer keepalive link, a peer link for data forwarding, and CFS (Cisco Fabric Services) for state synchronization.
- Arista MLAG — Uses a peer link and a separate VLAN for the heartbeat. Known for its maturity in large-scale leaf-spine deployments.
- Juniper MC-LAG — Supported on QFX and MX series. Uses ICCP (Inter-Chassis Communication Protocol, RFC 7275) for state synchronization, which is the closest thing to a standard for MC-LAG control plane communication.
- Nokia MC-LAG — Also uses ICCP. Common in service provider deployments.
Despite being proprietary, MLAG has become the de facto standard for redundant access layer connectivity in modern data center leaf-spine fabrics. It eliminates the need for Spanning Tree to block redundant links, since all LAG members are actively forwarding. The result is full utilization of all uplinks and sub-second failover when a switch fails.
LACP and Spanning Tree Interaction
Spanning Tree Protocol (STP) treats each LAG as a single logical port. This is critical for loop prevention: if STP saw each physical member link individually, it would detect multiple parallel paths and block all but one, defeating the entire purpose of link aggregation. By aggregating first and then running STP on the resulting logical topology, all member links forward traffic simultaneously.
The interaction requires careful attention to order of operations when a LAG forms:
- Physical links come up.
- LACP negotiates and forms the LAG.
- STP recognizes the new logical port and runs its algorithm.
- If the LAG is not in a forwarding state (STP is blocking it), the member ports do not carry user traffic even though LACP shows them as distributing.
In MLAG environments, STP is largely sidelined. Because the server sees a single LAG to a single logical switch, there is no topology loop from STP's perspective, and the LAG is always in forwarding state. However, STP still runs on the peer switches' uplinks and other non-MLAG ports to protect against accidental loops elsewhere in the network.
LACP Fast Rate vs. Slow Rate
LACP defines two LACPDU transmission rates:
- Slow rate (long timeout) — LACPDUs are transmitted every 30 seconds. The partner is considered down after 90 seconds (3 missed intervals). This is the default on most platforms and is suitable for stable environments where rapid failover is not critical.
- Fast rate (short timeout) — LACPDUs are transmitted every 1 second. The partner is considered down after 3 seconds. Fast rate is essential for environments that cannot tolerate 90-second failure detection, such as data center interconnects or financial trading networks.
The timeout is signaled in the LACPDU's state byte (bit 1), and the two sides need not match: each side advertises how quickly it will time out its partner, which tells the partner how fast it must transmit. In practice, you configure the desired rate on both ends, and LACP adapts accordingly.
Even fast rate (3-second detection) is slow by modern standards. For critical links, pairing LACP with BFD provides detection in as little as 50-150 milliseconds. Some hardware platforms also support LACP fast-switchover, which transitions standby ports to active within a single LACPDU interval rather than waiting for the full three-interval timeout.
Configuration Examples
LACP configuration varies by platform, but the core concepts are consistent. Here is a typical configuration for a 4-member LAG on a network switch:
# Cisco NX-OS / IOS-XE style
interface port-channel 10
description "Server uplink LAG"
switchport mode trunk
switchport trunk allowed vlan 100,200,300
lacp min-links 2
interface Ethernet1/1-4
channel-group 10 mode active
lacp rate fast
And the server side (Linux bonding):
# /etc/network/interfaces (ifupdown) or netplan equivalent
auto bond0
iface bond0 inet static
address 10.0.1.10/24
bond-slaves eth0 eth1 eth2 eth3
bond-mode 802.3ad
bond-lacp-rate fast
bond-xmit-hash-policy layer3+4
bond-miimon 100
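For reference, a possible netplan rendering of the same bond (key names per netplan's bond parameters; treat this as a sketch to verify against your netplan version):

```yaml
network:
  version: 2
  ethernets:
    eth0: {}
    eth1: {}
    eth2: {}
    eth3: {}
  bonds:
    bond0:
      interfaces: [eth0, eth1, eth2, eth3]
      addresses: [10.0.1.10/24]
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        transmit-hash-policy: layer3+4
        mii-monitor-interval: 100
```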
Key parameters to configure on both ends:
- Mode — active (recommended) or passive.
- LACP rate — fast (1-second interval) or slow (30-second interval, the default).
- Hash policy — layer2, layer2+3, layer3+4, or encap3+4 (for VXLAN/NVGRE tunneled traffic).
- Min-links — The minimum number of active member links required for the LAG to remain operationally up.
- System priority — Determines which side controls port selection when there are more candidate ports than allowed members. Default is 32768 (mid-range).
Maximum Number of Member Links
IEEE 802.1AX does not impose a hard limit on the number of ports in a LAG; the commonly cited figure of 16 ports (8 active + 8 standby) reflects typical implementations rather than the standard itself. Practical limits vary by platform:
- Most enterprise switches support 8 active members per LAG.
- Data center switches (Cisco Nexus, Arista, Juniper QFX) often support 16 or 32 active members.
- Linux kernel bonding has no hard-coded limit — it is constrained only by the number of available physical interfaces.
- Some high-end platforms support 64 or more members for special use cases like large-scale aggregation in service provider PE routers.
In practice, LAGs beyond 8 members are uncommon because at that point, you are typically better served by upgrading to higher-speed optics (e.g., replacing 8x10G with 2x100G) rather than adding more member links with their associated cabling complexity and hash distribution challenges.
Common Pitfalls and Troubleshooting
LACP is straightforward in concept but has several common failure modes in production:
- Passive/passive misconfiguration — Both ends are in passive mode, so neither sends LACPDUs, and the LAG never forms. Always use active mode on at least one side (preferably both).
- Speed mismatch — If member ports operate at different speeds (e.g., a transceiver negotiated to 1G instead of 10G), LACP will assign different operational keys and refuse to aggregate the mismatched port. Check transceiver and auto-negotiation settings.
- Cabling to different switches — In a non-MLAG environment, if LAG member cables are connected to two different physical switches, LACP detects the System ID mismatch and does not aggregate. The ports remain as individual links. This is LACP's safety net, but the symptom can be confusing if you expected MLAG behavior without configuring it.
- Unidirectional link failure — If one fiber of a pair fails, the side whose receive path is broken stops seeing LACPDUs even though its own transmit path still works, while the partner continues to receive normally. The failure is detected only when the affected side's LACPDU timeout expires; with slow timeout, that takes 90 seconds. Use fast rate or BFD to mitigate.
- Hash polarization — If every switch in the path uses the same hash algorithm with the same inputs, traffic that was hashed to port 0 on the first LAG will also hash to port 0 on the next LAG, meaning some members are overloaded and others are idle. Use different hash seeds or asymmetric hash algorithms at different tiers to avoid this.
- Partner flapping — If the partner's System ID or key keeps changing (due to a misconfigured or unstable peer), the local port cycles through the state machine repeatedly, causing traffic disruption. Check the partner switch for configuration stability.
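The hash-polarization pitfall above is easy to reproduce: reuse the same hash at two tiers and the second tier collapses onto one member. CRC32 and MD5 here are stand-ins for ASIC hash functions.

```python
import hashlib
import zlib

def crc_member(flow: str, n: int) -> int:
    """Tier hash: CRC32 of the flow key modulo member count."""
    return zlib.crc32(flow.encode()) % n

# 100 synthetic flow keys (src|dst|src_port|dst_port).
flows = [f"10.0.1.{h}|10.0.2.20|{p}|443"
         for h in range(4) for p in range(49152, 49177)]

# Tier 1: a 2-member LAG splits the flows.
tier1 = {0: [], 1: []}
for f in flows:
    tier1[crc_member(f, 2)].append(f)

# Tier 2 reuses the same hash: every flow that took tier-1 member 0
# hashes to member 0 again, so one tier-2 member carries everything.
polarized = {crc_member(f, 2) for f in tier1[0]}
print(polarized)   # → {0}

# A genuinely different hash function at tier 2 restores the spread.
def md5_member(flow: str, n: int) -> int:
    return int.from_bytes(hashlib.md5(flow.encode()).digest()[:4], "big") % n

print({md5_member(f, 2) for f in tier1[0]})   # typically both members used
```

Note that for hashes in the CRC family, merely changing the seed may not be enough to decorrelate tiers; a structurally different hash (or vendor "hash rotation" features) is the reliable fix.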
Useful diagnostic commands on most platforms:
# Show LACP neighbor details (Cisco/Arista)
show lacp neighbor
show lacp counters
# Show LAG member status
show port-channel summary
show etherchannel summary
# Linux bonding status
cat /proc/net/bonding/bond0
# Check LACP state per member
show lacp internal
LACP in Data Center Leaf-Spine Fabrics
Modern data center networks use a leaf-spine (Clos) architecture where every leaf switch has equal-cost uplinks to every spine switch. In this topology, LACP appears at two main points:
- Server-to-leaf — Servers typically have 2-4 NICs bonded via LACP to one or two leaf switches (using MLAG for dual-homed designs). This provides both bandwidth and redundancy for the server's network connectivity.
- Leaf-to-spine — Less common for individual links (since ECMP across multiple spine switches provides natural load balancing), but sometimes used when multiple physical links connect a single leaf to a single spine to increase bandwidth between them.
In EVPN-VXLAN fabrics (which have largely replaced traditional Layer 2 fabrics in new data center deployments), MLAG is being superseded by EVPN multihoming (ESI-LAG). ESI-LAG uses the EVPN control plane (running over BGP) to coordinate the multi-homing instead of a proprietary peer link. Multiple switches advertise the same Ethernet Segment Identifier (ESI) in EVPN, and the EVPN procedures handle designated forwarder election, MAC synchronization, and fast failover. ESI-LAG is standards-based (RFC 7432) and does not require a dedicated peer link, making it architecturally cleaner than MLAG for large-scale deployments.
LACP vs. Static Link Aggregation
Static link aggregation (sometimes called "mode on" or "EtherChannel without negotiation") bonds ports without any control protocol. Both sides simply assign ports to a LAG and start forwarding. The key differences:
- Misconfiguration detection — LACP detects mismatched cabling, speed mismatches, and one-sided configurations. Static mode does not — frames are silently black-holed or duplicated.
- Graceful addition/removal — LACP can add new members to a LAG without disrupting existing traffic (the new port transitions through the state machine before it carries traffic). Static mode activates ports immediately, which can cause temporary frame duplication.
- Failure detection — LACP detects partner failures via LACPDU timeout, even when the physical link stays up. Static mode relies solely on physical link state.
- Overhead — Static mode has zero control plane overhead. LACP transmits one 110-byte frame per second per port (fast rate). This overhead is negligible on modern hardware.
There is virtually no reason to use static link aggregation in modern networks. LACP's safety benefits far outweigh its minimal overhead. The only remaining use case for static mode is legacy equipment that does not support LACP, which is exceedingly rare today.
Explore Networks Using Link Aggregation
Link aggregation is invisible in the BGP routing table — it operates at Layer 2, below the level of IP routing. But every major network relies on LAGs for the physical connectivity that underpins their BGP-speaking routers. The god.ad BGP Looking Glass lets you explore the networks that use LAGs for their backbone and peering infrastructure:
- AS13335 — Cloudflare (uses LACP and ECMP extensively across its edge PoPs)
- AS15169 — Google (pioneer of large-scale data center fabric design with aggregation at every tier)
- AS32934 — Meta (operates massive leaf-spine fabrics with MLAG and ESI-LAG)
- AS6939 — Hurricane Electric (heavy use of LAGs for peering at IXPs worldwide)
Try looking up your own IP address to see which network carries your traffic and trace the BGP AS path it follows across the internet — each hop in that path relies on link aggregation for the physical bandwidth and redundancy that keeps the global routing system running.