How MPLS Traffic Engineering Works: CSPF, RSVP-TE, and Fast Reroute

MPLS Traffic Engineering (MPLS-TE) is a set of protocols and mechanisms that allow network operators to steer traffic along explicit paths through a service provider backbone, overriding the default shortest-path routing computed by Interior Gateway Protocols like OSPF or IS-IS. By combining constraint-based path computation (CSPF), RSVP-TE signaling, bandwidth reservation, link coloring, and fast reroute protection, MPLS-TE transforms a best-effort IP network into an infrastructure where every major traffic flow can be placed on a path optimized for bandwidth utilization, latency, resilience, or administrative policy. Traffic engineering is the primary reason MPLS remains deeply embedded in large-scale service provider networks, even as the industry migrates toward Segment Routing.

In a pure IP network, traffic follows the IGP shortest path from source to destination. If the IGP metric is based on link bandwidth (a common convention), all traffic toward a given prefix converges on the single highest-bandwidth path, potentially leaving parallel links idle while the shortest path becomes congested. MPLS-TE solves this fundamental imbalance by decoupling the forwarding path from the IGP metric, allowing operators to distribute traffic across the network topology according to real constraints: available bandwidth, administrative policy, shared-risk avoidance, and latency targets.

The TE Information Base: IGP Extensions

Traffic engineering begins with topology awareness. Standard OSPF and IS-IS advertise only basic reachability and link cost. For TE, the IGP must flood additional link attributes so that every router has a complete picture of network resources. OSPF-TE (RFC 3630) and IS-IS-TE (RFC 5305) extend the IGP with opaque LSAs (OSPF) or sub-TLVs (IS-IS) that carry per-link information:

  • TE metric — a separate metric for TE path computation, independent of the IGP cost
  • Maximum link bandwidth — the physical capacity of the link
  • Maximum reservable bandwidth — the bandwidth available for TE reservations, which may exceed physical capacity if oversubscription is configured
  • Unreserved bandwidth — the bandwidth still available at each of the eight priority levels
  • Administrative group — the 32-bit link color bitmask used for affinity constraints
  • Shared Risk Link Group (SRLG) membership — identifying links that share a common failure risk, such as fibers in the same conduit

Every router in the IGP area collects these advertisements and builds a Traffic Engineering Database (TED). The TED is a graph of the network annotated with bandwidth, color, SRLG, and metric information at every link. When a head-end router needs to compute a TE path, it runs CSPF over this database rather than the plain IGP topology.

CSPF: Constraint-Based Path Computation

The Constrained Shortest Path First (CSPF) algorithm is a modified Dijkstra computation that finds the shortest path through the TED while respecting a set of constraints. Standard Dijkstra minimizes cost without regard to resource availability. CSPF adds a pruning step: before running Dijkstra, the algorithm removes from the topology any links that violate the tunnel's constraints.

The CSPF process proceeds in three phases:

  1. Topology pruning — remove links that do not meet the tunnel's requirements:
    • Links with insufficient unreserved bandwidth at the tunnel's setup priority
    • Links whose administrative group bits do not satisfy the tunnel's include/exclude affinity constraints
    • Links or nodes explicitly excluded by the operator (using exclude-address)
    • Links belonging to SRLGs that the operator wants to avoid
  2. Shortest path computation — run Dijkstra on the pruned topology using the TE metric (or IGP metric if no TE metric is configured). The result is the shortest feasible path that satisfies all constraints.
  3. Tie-breaking — if multiple equal-cost paths exist, CSPF applies implementation-specific tie-breaking rules. Common strategies include preferring the path with the highest minimum bandwidth, the fewest hops, or a deterministic selection based on router IDs.
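
The prune-then-Dijkstra flow above can be sketched in a few lines of Python. This is an illustrative model, not router code: link attributes are reduced to a TE metric and a per-priority unreserved-bandwidth list, and tie-breaking is ignored.

```python
import heapq

def cspf(links, src, dst, min_bw, setup_prio):
    """Constrained SPF sketch: prune infeasible links, then run Dijkstra.

    `links` maps a directed (node_a, node_b) pair to a dict with 'metric'
    and 'unreserved' (eight values, one per setup priority). All names
    here are illustrative, not from any vendor API.
    """
    # Phase 1: topology pruning -- drop links lacking unreserved
    # bandwidth at the tunnel's setup priority.
    adj = {}
    for (a, b), attrs in links.items():
        if attrs['unreserved'][setup_prio] >= min_bw:
            adj.setdefault(a, []).append((b, attrs['metric']))

    # Phase 2: plain Dijkstra on the pruned topology.
    dist, prev, pq = {src: 0}, {}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float('inf')):
            continue
        for v, m in adj.get(u, []):
            nd = d + m
            if nd < dist.get(v, float('inf')):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))

    if dst not in dist:
        return None  # no feasible path satisfies the constraints
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return list(reversed(path))
```

On a topology like the pruning example above (a 5G and an 8G link removed under a 10 Gbps constraint), the function returns the shortest path over the surviving links only.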

CSPF runs only at the head-end router (the ingress of the TE tunnel). Transit routers do not independently compute the path. The head-end encodes the computed path as an Explicit Route Object (ERO) in the RSVP-TE signaling message, and each downstream router simply follows the ERO instructions.

Figure: CSPF prunes infeasible links, then runs Dijkstra. Tunnel constraint: minimum bandwidth 10 Gbps, setup priority 7. CSPF removes the links with less than 10G of unreserved bandwidth (5G and 8G) from the TED, then runs Dijkstra on the pruned topology; the result is R1 → R2 → R4 → R5, the shortest path satisfying BW ≥ 10G.

RSVP-TE Signaling

RSVP-TE (Resource Reservation Protocol — Traffic Engineering, RFC 3209) is the signaling protocol that establishes TE tunnels by distributing labels and reserving resources along the computed path. It extends the original RSVP (designed for IntServ QoS reservation) with two critical additions: the Explicit Route Object (ERO) for source-routed paths, and the Label Request/Label objects for MPLS label distribution.

RSVP-TE signaling proceeds in two phases:

Phase 1: PATH Message (Downstream)

The head-end router initiates the tunnel by sending an RSVP PATH message toward the tail-end (egress) router. The PATH message travels hop-by-hop along the path specified in the ERO. Each transit router processes the PATH message, records the upstream neighbor in its Path State Block (PSB), removes its own address from the ERO, and forwards the message to the next hop. The PATH message carries several critical objects:

  • LABEL_REQUEST — asks each hop to allocate an MPLS label for the LSP
  • EXPLICIT_ROUTE (ERO) — the hop-by-hop path computed by CSPF
  • SESSION_ATTRIBUTE — setup and hold priorities, protection flags, and the tunnel name
  • SENDER_TSPEC — the bandwidth being requested for the LSP
  • RECORD_ROUTE (RRO) — optionally records the actual hops the message traverses

Phase 2: RESV Message (Upstream)

When the PATH message reaches the tail-end router, the tail-end allocates a label, reserves bandwidth, and sends an RSVP RESV message back upstream, hop-by-hop, along the reverse of the PATH. Each transit router receiving a RESV message performs three actions: allocates its own label for the LSP, installs the label forwarding entry in its LFIB (mapping its local label to the downstream label received in the RESV), and reserves the requested bandwidth on the outgoing interface. The RESV message propagates upstream until it reaches the head-end, which installs the final forwarding entry and begins pushing labels onto data traffic.

Once both PATH and RESV have completed, the TE tunnel is established. RSVP-TE maintains the tunnel state through periodic refresh messages — PATH and RESV are re-sent every 30 seconds by default. If a router misses several consecutive refreshes (typically three), it tears down the LSP. This soft-state model provides resilience but creates signaling overhead in networks with many tunnels. Refresh Reduction (RFC 2961) and Summary Refresh extensions mitigate this by bundling refresh state and using reliable delivery with acknowledgments.
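
The soft-state expiry rule can be expressed as a tiny helper; the 30-second interval and three-miss limit mirror the defaults described above, and the names are illustrative:

```python
REFRESH_INTERVAL = 30.0   # seconds, RSVP default refresh period
MISS_LIMIT = 3            # consecutive missed refreshes before teardown

def lsp_expired(last_refresh, now):
    """Soft-state check: an LSP whose PATH/RESV state has not been
    refreshed within MISS_LIMIT refresh intervals is considered stale
    and is torn down by the router holding the state."""
    return (now - last_refresh) > MISS_LIMIT * REFRESH_INTERVAL
```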

Explicit Route Object (ERO) in Detail

The ERO is the mechanism that makes RSVP-TE "traffic engineering" rather than just "signaling." It encodes the head-end's chosen path as a sequence of abstract nodes, where each node is identified by an IPv4/IPv6 address or an unnumbered interface ID. Two types of ERO hops exist:

  • Strict hops — the hop must be directly connected to the previous hop; the path between consecutive strict hops is fully specified
  • Loose hops — the hop may be reached via any route; the router expanding a loose hop computes the intervening path itself

A typical ERO produced by CSPF at the head-end consists entirely of strict hops, since CSPF has full topology visibility and can specify every hop precisely. Loose hops are used when the head-end wants to delegate part of the path computation (common in inter-area or inter-AS scenarios where the head-end lacks visibility into remote topology), or when an operator manually configures a partial explicit path with only key waypoints specified.
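
A rough sketch of the transit-router ERO handling described above, with hops modeled as (address, is_loose) tuples — a simplification of the RFC 3209 subobject encoding, and the helper name is hypothetical:

```python
def process_path_ero(my_addresses, ero):
    """Transit-router ERO handling sketch (simplified from RFC 3209).

    `ero` is a list of (address, is_loose) hops. A router removes any
    leading hops that refer to itself, then forwards the PATH message
    toward the first remaining hop.
    """
    while ero and ero[0][0] in my_addresses:
        ero = ero[1:]                 # remove our own address from the ERO
    if not ero:
        return None, []               # ERO exhausted: we are the tail-end
    next_hop, is_loose = ero[0]
    # For a strict hop the next hop must be directly connected; for a
    # loose hop this router would run a partial CSPF to expand the path.
    return next_hop, ero
```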

Bandwidth Reservation and Preemption

RSVP-TE reserves bandwidth at each link along the TE tunnel path. This reservation is an accounting mechanism, not a hardware-enforced rate limit. When a tunnel requesting 5 Gbps is signaled through a 100 Gbps link, the router subtracts 5 Gbps from the link's unreserved bandwidth counter and re-floods the updated value via IGP-TE. Subsequent CSPF computations by other head-end routers see the reduced available bandwidth and route around congested links.

Bandwidth reservation interacts with a preemption system that uses two priority values per tunnel, each ranging from 0 (highest) to 7 (lowest):

  • Setup priority — determines whether a new tunnel can preempt existing tunnels to obtain bandwidth during signaling
  • Hold priority — determines how strongly an established tunnel defends its reservation against preemption by newer tunnels

The convention is that setup priority should be numerically greater than or equal to hold priority (i.e., a tunnel should not be able to preempt others more aggressively than it can defend its own reservation). When preemption occurs, the preempted tunnel is torn down and must re-signal, potentially along a different path. Preemption enables differentiated service: critical tunnels (e.g., voice transport) can be configured with high priority to always find bandwidth, while best-effort tunnels yield when resources are scarce.

The IGP-TE extensions advertise unreserved bandwidth at all eight priority levels, so CSPF can determine not only whether a link has bandwidth available, but also whether existing reservations could be preempted to make room.
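
The per-priority admission check described above is a one-line comparison; this sketch uses illustrative names and treats bandwidth as plain numbers:

```python
def admit(unreserved, setup_prio, requested_bw):
    """Admission-control sketch for one link. `unreserved` is the
    eight-entry per-priority bandwidth list flooded by the IGP-TE
    extensions (index 0 = highest priority). A request that fits at
    priority p but not at priority 7 can only be admitted by preempting
    reservations whose hold priority is numerically greater than p."""
    return unreserved[setup_prio] >= requested_bw
```

For example, with unreserved bandwidth of 80 Gbps at priority 2 but only 40 Gbps at priority 7, a 70 Gbps request succeeds at priority 2 only because lower-priority reservations can be preempted.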

Link Coloring and Affinities

Link coloring (also called administrative groups or affinities) provides a flexible policy mechanism for constraining TE paths based on link attributes that are not captured by bandwidth or metric alone. Each link is assigned a 32-bit administrative group bitmask, where each bit represents a color or attribute. Operators define what each bit means according to their own conventions, for example:

  • Bit 0 — domestic terrestrial link
  • Bit 1 — submarine cable
  • Bit 2 — low-latency path
  • Bit 3 — gold SLA link

TE tunnels specify affinity constraints using two parameters:

  • include-all — the link's administrative group must have every one of these bits set (many implementations also support include-any, requiring at least one matching bit)
  • exclude-any — the link is pruned if any of these bits is set

For example, a tunnel carrying GDPR-regulated European traffic might set include-all = bit 0 (domestic) and exclude-any = bit 1 (submarine), ensuring the path stays on terrestrial domestic links. A premium voice tunnel might set include-all = bit 2 | bit 3 (low-latency AND gold SLA). CSPF enforces these constraints during the topology pruning phase.
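
The affinity evaluation is simple bitmask arithmetic, applied per link during CSPF pruning. This sketch reuses the example bit convention above; all names are illustrative:

```python
def affinity_ok(link_admin_group, include_all=0, exclude_any=0):
    """Affinity check applied during CSPF topology pruning.

    include_all: every set bit must also be set on the link.
    exclude_any: the link fails if any of these bits is set on it.
    """
    if (link_admin_group & include_all) != include_all:
        return False
    if link_admin_group & exclude_any:
        return False
    return True

# Example bit assignments following the article's convention.
DOMESTIC, SUBMARINE, LOW_LATENCY, GOLD = 1 << 0, 1 << 1, 1 << 2, 1 << 3
```

A terrestrial low-latency link passes the GDPR tunnel's constraints, while any link with the submarine bit set is pruned regardless of its other colors.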

RFC 7308 extended the administrative group field from 32 bits to an arbitrary length (Extended Administrative Groups), allowing operators to define far more colors for complex policy requirements.

Fast Reroute (FRR)

MPLS-TE Fast Reroute (RFC 4090) provides sub-50-millisecond protection against link and node failures, far faster than IGP reconvergence (which can take hundreds of milliseconds to seconds even with fast timers). FRR pre-computes backup paths and pre-installs backup label forwarding entries so that when a failure is detected (typically via loss of SONET/SDH framing, Ethernet link loss, or BFD timeout), the Point of Local Repair (PLR) — the router immediately upstream of the failure — can switch traffic to the backup path without waiting for the head-end to re-signal the tunnel.

Two FRR mechanisms exist, each with different scalability and protection characteristics:

Facility Backup (Bypass Tunnels)

Facility backup is the most widely deployed FRR mechanism. The PLR pre-establishes a bypass tunnel around each protected link or node. When a failure occurs, the PLR redirects all affected TE tunnels into the bypass tunnel by pushing an additional MPLS label (the bypass tunnel's label) onto the packet. The packet traverses the bypass tunnel with an extra label in the stack, and when it emerges at the Merge Point (MP) — the router downstream of the failure — the bypass label is popped, and the original TE tunnel label is restored.

Two levels of protection exist:

  • Link protection — the bypass tunnel routes around the protected link to the next-hop (NHOP) router, protecting against link failure only
  • Node protection — the bypass tunnel routes around the entire downstream node to the next-next-hop (NNHOP) router, protecting against both link and node failure

The key advantage of facility backup is scalability: a single bypass tunnel can protect thousands of TE tunnels that traverse the same link or node. The PLR maintains one bypass tunnel per protected element, not one per protected TE tunnel.
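
The label-stack manipulation at the PLR and Merge Point can be sketched with labels modeled as plain integers; the function names are illustrative, not vendor API:

```python
def plr_fast_reroute(packet_labels, swapped_label, bypass_label):
    """PLR behavior sketch for facility backup: perform the protected
    LSP's normal label swap (to the label the Merge Point expects),
    then push the bypass tunnel's label so the packet is carried
    around the failure inside the bypass LSP."""
    stack = list(packet_labels)
    stack[0] = swapped_label       # normal swap for the protected LSP
    stack.insert(0, bypass_label)  # extra label: enter the bypass tunnel
    return stack

def merge_point_pop(packet_labels):
    """At the Merge Point the bypass label is popped (in practice often
    one hop earlier via penultimate hop popping), exposing the original
    TE tunnel label again."""
    return list(packet_labels[1:])
```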

One-to-One Backup (Detour LSPs)

In one-to-one backup, each protected TE tunnel gets its own dedicated backup path (a detour LSP) around each potential failure point. When the PLR detects a failure, it switches the specific tunnel to its dedicated detour path. The detour is a full RSVP-TE-signaled LSP with its own label bindings.

One-to-one backup provides per-tunnel protection granularity: each tunnel can have a detour optimized for its specific constraints (bandwidth, affinity). However, it scales poorly — if 1,000 TE tunnels traverse a link, 1,000 detour LSPs must be signaled and maintained for that link alone. For this reason, facility backup is far more common in production networks.

Figure: facility-backup fast reroute. The PLR immediately upstream of the failed node pushes the bypass label to redirect traffic into a pre-established NNHOP bypass tunnel that protects all LSPs traversing the failed node; the bypass label is popped at the Merge Point, and recovery completes in under 50 ms (BFD detection plus label push).

Auto-Bandwidth

Static bandwidth reservation requires operators to manually configure the bandwidth for each TE tunnel and update it as traffic patterns change. In a network with thousands of tunnels, this is operationally infeasible. Auto-bandwidth automates this process by continuously monitoring the actual traffic rate on each TE tunnel and periodically adjusting the reserved bandwidth to match.

The auto-bandwidth process operates in cycles:

  1. Sampling — the head-end router measures the traffic rate on the tunnel at regular intervals (e.g., every 5 minutes) and tracks the peak or average rate over the measurement period.
  2. Adjustment interval — at the end of each adjustment interval (e.g., every 24 hours, or triggered by threshold crossing), the head-end compares the measured rate to the current reservation.
  3. Re-optimization — if the measured rate differs from the reservation by more than a configurable threshold, the head-end re-signals the tunnel with the new bandwidth using make-before-break. This establishes the new LSP (with updated bandwidth) before tearing down the old one, ensuring zero traffic loss during the transition.

Auto-bandwidth typically has configurable minimum and maximum bandwidth bounds to prevent a tunnel from shrinking to near-zero (which would make it vulnerable to preemption) or growing beyond what is reasonable. An overflow threshold triggers immediate re-signaling if traffic suddenly spikes beyond the current reservation, rather than waiting for the next adjustment interval. Similarly, an underflow threshold can trigger downward adjustment to release unused bandwidth for other tunnels.
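
The adjustment decision reduces to a threshold comparison plus clamping; this sketch uses illustrative parameter names and defaults:

```python
def autobw_new_reservation(measured_peak, current_resv,
                           adjust_threshold_pct=10.0,
                           min_bw=0.1, max_bw=100.0):
    """Auto-bandwidth adjustment sketch. Returns the new bandwidth to
    re-signal via make-before-break, or None if the measured rate is
    within the adjustment threshold of the current reservation."""
    if current_resv <= 0:
        return max(min_bw, min(max_bw, measured_peak))
    delta_pct = abs(measured_peak - current_resv) / current_resv * 100
    if delta_pct < adjust_threshold_pct:
        return None                  # within threshold: keep reservation
    # Clamp to configured bounds so the tunnel neither shrinks toward
    # zero (inviting preemption) nor grows without limit.
    return max(min_bw, min(max_bw, measured_peak))
```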

Auto-bandwidth is particularly important in networks using BGP-based traffic steering, where BGP prefix assignments to TE tunnels can cause sudden traffic shifts. When a large prefix is moved from one tunnel to another (e.g., due to a route change), auto-bandwidth detects the resulting traffic change and adjusts both tunnels' reservations accordingly.

TE Tunnels: Configuration and Behavior

A TE tunnel is a unidirectional logical interface on the head-end router. From the head-end's perspective, the TE tunnel looks like any other interface: you can route traffic into it, apply QoS policies, and monitor counters. Under the hood, the TE tunnel maps to an RSVP-TE-signaled LSP that carries traffic from the head-end to the tail-end along the CSPF-computed or explicitly configured path.

Key tunnel configuration parameters include:

  • Destination — the tail-end router ID
  • Bandwidth — the amount to reserve along the path, either static or managed by auto-bandwidth
  • Setup and hold priorities — the preemption behavior described earlier
  • Affinity — include/exclude constraints against link administrative groups
  • Path options — dynamic (CSPF-computed) or explicit (operator-defined) paths, typically with an ordered fallback list
  • FRR protection — whether the tunnel requests link or node protection

Traffic is directed into TE tunnels through several mechanisms. Autoroute announce injects the tunnel as a next hop into the IGP, so that traffic destined for the tail-end (or prefixes behind it) is automatically placed in the tunnel. Static routing or policy-based routing (PBR) can direct specific traffic classes into specific tunnels. In larger deployments, RSVP-TE auto-tunnel can automatically create TE tunnels to all PE routers, simplifying mesh tunnel management.

PCEP and PCE-Based Traffic Engineering

As networks grow, running CSPF independently on every head-end router creates limitations. Each head-end has visibility only into its own IGP area (or level, in IS-IS). Multi-area or multi-AS TE tunnels require loose hops or manual stitching. Furthermore, independent per-head-end computation cannot perform network-wide optimization — each head-end optimizes its own tunnels without knowledge of other head-ends' decisions, potentially leading to suboptimal bandwidth utilization globally.

The Path Computation Element (PCE) architecture (RFC 4655) addresses these limitations by centralizing path computation in a dedicated server. The PCE has a global view of the network topology, including multiple IGP areas, multiple layers, and even multiple autonomous systems. Head-end routers (called Path Computation Clients, or PCCs) delegate path computation to the PCE instead of running CSPF locally.

Communication between PCCs and PCEs uses the Path Computation Element Communication Protocol (PCEP), defined in RFC 5440. PCEP is a TCP-based protocol (port 4189) where the PCC sends a PCReq (path computation request) specifying the tunnel's source, destination, bandwidth, constraints, and objective function. The PCE responds with a PCRep containing the computed ERO, which the PCC then uses to signal the RSVP-TE tunnel normally.
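
The request/reply exchange can be modeled with simple dataclasses. This mirrors the fields discussed above, not the PCEP wire encoding, and the class and field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class PCReq:
    """Simplified model of a PCEP path computation request (RFC 5440):
    endpoints, bandwidth, and affinity constraints for the tunnel."""
    source: str
    destination: str
    bandwidth_gbps: float
    include_all: int = 0
    exclude_any: int = 0

@dataclass
class PCRep:
    """Simplified reply: the computed ERO the PCC will signal via
    RSVP-TE, or an empty list when no path satisfies the constraints
    (the NO-PATH case)."""
    ero: list = field(default_factory=list)
```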

Several PCE deployment models exist:

  • Stateless PCE — computes paths on request but retains no knowledge of established LSPs
  • Passive stateful PCE — maintains a database of established LSPs (learned via PCEP state reports) and uses it to make better placement decisions
  • Active stateful PCE — can additionally update LSPs delegated to it, re-signaling tunnels to optimize the network globally
  • PCE-initiated LSPs — the PCE itself instructs head-ends to create or delete tunnels (RFC 8281), enabling fully controller-driven TE

PCE-based TE is particularly valuable in multi-layer networks (e.g., IP/MPLS over optical/DWDM), where a PCE with visibility into both layers can compute paths that optimally use both packet and optical resources. It is also the foundation for centralized TE in Segment Routing deployments, where the PCE computes SR Policies (lists of segment identifiers) instead of RSVP-TE EROs.

Make-Before-Break Re-Optimization

When a TE tunnel needs to move to a new path — due to auto-bandwidth adjustment, periodic re-optimization, or manual reconfiguration — tearing down the old LSP before establishing the new one would cause traffic loss for the duration of signaling (typically tens to hundreds of milliseconds). Make-before-break (MBB) eliminates this disruption by establishing the new LSP while the old one is still active, then atomically switching traffic from old to new.

MBB uses RSVP-TE's shared explicit (SE) reservation style. Both the old and new LSPs share the same RSVP session identifier, which allows them to share bandwidth on links they have in common. Without SE style, the new LSP would need to reserve additional bandwidth on shared links, potentially failing admission control. With SE, the reservation is shared: if the old LSP reserves 10 Gbps on a link and the new LSP also traverses that link requesting 10 Gbps, only 10 Gbps total is reserved, not 20 Gbps.
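
The SE-style accounting amounts to booking the per-session maximum rather than the per-LSP sum. A minimal sketch, with illustrative names:

```python
def link_reservation(lsps_on_link):
    """Reservation accounting sketch for shared-explicit (SE) style.

    `lsps_on_link` maps (session_id, lsp_id) -> requested bandwidth.
    LSPs of the same session (e.g. the old and new LSP of one tunnel
    during make-before-break) share a single reservation, so the link
    books the per-session maximum, not the sum of all LSPs."""
    per_session = {}
    for (session, _lsp_id), bw in lsps_on_link.items():
        per_session[session] = max(per_session.get(session, 0), bw)
    return sum(per_session.values())
```

During MBB, tunnel T1's old and new LSPs each requesting 10 Gbps consume only 10 Gbps on a shared link, so the new LSP cannot fail admission control against its own predecessor.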

The MBB process:

  1. Head-end computes a new path via CSPF.
  2. Head-end signals the new LSP (PATH/RESV) using the same session but a different LSP ID.
  3. Once the new LSP is fully established (RESV received), the head-end atomically moves traffic from the old LSP to the new one.
  4. The old LSP is torn down (PathTear message).

Scaling Challenges of RSVP-TE

RSVP-TE is a stateful, per-tunnel protocol. Every transit router along a TE tunnel's path must maintain RSVP state (path state block, reservation state block, label forwarding entry) for that tunnel. In a network with N PE routers requiring full-mesh TE tunnels, the number of tunnels is O(N^2). Each tunnel creates state at every transit router it traverses. For a large service provider with hundreds of PEs, this means:

  • Tens or hundreds of thousands of LSPs network-wide, each maintained by soft-state refreshes
  • Substantial control-plane memory and CPU load on core routers, which hold state for every tunnel crossing them
  • Slow recovery after a core failure, as thousands of head-ends re-signal tunnels simultaneously
  • Growing operational complexity in tuning refresh timers, priorities, and re-optimization schedules

This scaling wall is the primary motivation for the industry's migration to Segment Routing, which eliminates per-tunnel state in the network core entirely.

Migration to Segment Routing TE

Segment Routing (SR) provides traffic engineering capabilities equivalent to RSVP-TE but without the stateful signaling overhead. In SR-TE, the head-end router encodes the desired path as an ordered list of segment identifiers (SIDs) pushed onto the MPLS label stack (in SR-MPLS) or into an IPv6 Segment Routing Header (in SRv6). Transit routers process these segments using standard MPLS label operations or IPv6 header processing, with no per-tunnel state required.

The mapping from RSVP-TE concepts to SR-TE is direct:

  • TE tunnel → SR Policy (candidate paths with segment lists)
  • Explicit Route Object (ERO) → segment list (ordered SIDs)
  • RSVP-TE signaling → no signaling; SIDs distributed via IGP or BGP
  • FRR bypass tunnel → Topology-Independent LFA (TI-LFA)
  • CSPF at head-end → CSPF at head-end or PCE-computed SR Policy
  • Bandwidth reservation → no native reservation; PCE or controller manages utilization
  • Auto-bandwidth → controller-based bandwidth management

A critical difference is bandwidth reservation: RSVP-TE provides distributed, protocol-level bandwidth admission control. SR-TE does not — there is no signaling to accept or reject a path based on available bandwidth. Instead, bandwidth management in SR-TE is typically handled by a centralized PCE or SDN controller that tracks link utilization (via telemetry) and computes SR Policies accordingly. This is a philosophical shift from distributed admission control to centralized optimization.

Migration strategies from RSVP-TE to SR-TE are typically incremental:

  1. Ship-in-the-night — SR and RSVP-TE coexist in the same network. New tunnels are created as SR Policies; existing RSVP-TE tunnels remain until they can be migrated. Both use the same underlying IGP-TE topology database.
  2. Mapping Server — an SR Mapping Server (SRMS) advertises prefix-to-SID mappings for routers that do not yet support SR, allowing SR-capable routers to build SR paths that traverse legacy routers still running LDP.
  3. PCE-initiated migration — an active stateful PCE manages both RSVP-TE tunnels and SR Policies. The PCE gradually replaces RSVP-TE tunnels with SR Policies, managing the transition centrally.

TI-LFA (Topology-Independent Loop-Free Alternate) replaces RSVP-TE FRR with a mechanism that requires no pre-established bypass tunnels. When a failure occurs, each router independently computes a post-convergence shortest path and installs a backup path expressed as a short segment list that steers traffic around the failure. Because the backup path is expressed as segments rather than a signaled tunnel, it requires no advance signaling or state maintenance, and it provides protection for any topology (hence "topology-independent").
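
The heart of the TI-LFA computation is rerunning SPF with the protected resource removed. A minimal sketch of the per-router backup computation: a real implementation then compresses the post-convergence path into a minimal segment list via P-space/Q-space analysis, which is omitted here.

```python
import heapq

def post_convergence_path(adj, src, dst, failed_link):
    """TI-LFA backup computation sketch: remove the protected link from
    the topology and rerun Dijkstra. The resulting post-convergence
    path is what the repair segment list must steer traffic along.

    `adj` maps each node to a {neighbor: metric} dict.
    """
    a, b = failed_link
    dist, prev, pq = {src: 0}, {}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue
        for v, m in adj.get(u, {}).items():
            if {u, v} == {a, b}:
                continue              # the protected link is assumed failed
            nd = d + m
            if nd < dist.get(v, float('inf')):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if dst not in dist:
        return None
    path, n = [dst], dst
    while n != src:
        n = prev[n]
        path.append(n)
    return path[::-1]
```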

Interaction with BGP and IGP

MPLS-TE operates at the intersection of multiple routing protocols. Understanding these interactions is essential:

  • IGP (OSPF/IS-IS) — supplies the TED via the TE extensions; autoroute announce makes TE tunnels usable as IGP next hops, and forwarding adjacency can advertise a tunnel into the IGP as if it were a link
  • BGP — typically resolves its next hops over TE tunnels, so when the BGP next hop for a set of prefixes resolves through a tunnel, all of that traffic follows the engineered path
  • LDP — can be tunneled over TE LSPs so that end-to-end label switching is preserved across a TE-engineered core

Real-World Deployment Patterns

MPLS-TE is deployed by virtually every major service provider backbone. The specific deployment pattern depends on the network's size, traffic matrix complexity, and operational maturity:

  • Tactical TE — a handful of explicit tunnels steer traffic around specific congested links, while the bulk of traffic follows the IGP
  • Strategic (full-mesh) TE — TE tunnels between all PE or core routers, usually with auto-bandwidth, so the entire traffic matrix is engineered
  • FRR-focused deployment — tunnels that largely follow the IGP path but exist primarily to obtain sub-50 ms fast reroute protection
  • Centralized TE — a stateful PCE or SDN controller computes and maintains tunnel placement network-wide

When you look up a major transit provider in a BGP looking glass, the BGP routes you see are carried over backbone paths that are almost certainly traffic-engineered. The AS paths visible in the public routing table represent the inter-domain layer, but within each AS, MPLS-TE (or increasingly, SR-TE) determines the actual physical path packets take through the provider's infrastructure.

Explore Traffic-Engineered Networks

MPLS-TE is internal to each provider's network and not directly visible in the public BGP table. However, you can explore the autonomous systems that rely on traffic engineering for their backbone transport. Use the god.ad BGP Looking Glass to examine routes, AS paths, and peering relationships for networks that operate large-scale MPLS-TE and SR-TE deployments.
