How BGP Graceful Restart Works: Preserving Forwarding Across Restarts

BGP Graceful Restart (GR) is a mechanism defined in RFC 4724 that allows a BGP speaker to preserve its forwarding state across a restart of the BGP process. Without Graceful Restart, when a BGP session drops, the peer immediately withdraws all routes learned from the restarting speaker, triggering a ripple of convergence events across the internet. With GR, the peer (called the "helper" or "receiving speaker") continues to use the previously learned routes for a negotiated period, giving the restarting speaker time to re-establish the session and re-advertise its routes without causing a traffic disruption.

This matters enormously in production networks. Software upgrades, process crashes, and configuration reloads are routine, and each of these events can cause a BGP session reset. In a network carrying thousands of prefixes across dozens of peering sessions, a restart without GR can send a burst of withdrawals and re-announcements well beyond the local network, shifting traffic onto alternate paths, creating congestion, and potentially causing packet loss far from the network that restarted. Graceful Restart transforms a potentially service-affecting event into one that is transparent to traffic.

The Problem: What Happens Without Graceful Restart

To understand why GR exists, consider what happens when a BGP speaker restarts without it:

  1. The BGP process terminates (planned upgrade, crash, or configuration reload).
  2. The TCP connection underlying the BGP session drops. The peer detects this immediately (via TCP RST), within milliseconds to seconds if BFD is in use, or otherwise only after the BGP hold timer expires (default 90 seconds).
  3. The peer removes all routes learned from the restarting speaker from its Adj-RIB-In, runs the best-path algorithm, and withdraws those routes from all of its own peers.
  4. Those withdrawals propagate through the internet, causing every AS along each AS path to reconverge.
  5. When the restarting speaker comes back up, it re-establishes the BGP session, exchanges OPEN messages, and re-advertises all of its routes.
  6. The peer re-installs the routes, runs best-path selection again, and re-announces them to its own peers.
  7. The rest of the internet converges back to the original state.

The entire sequence -- withdraw, converge to backup paths, re-advertise, converge back -- can take minutes, during which traffic takes suboptimal paths and may be dropped entirely if no alternate path exists. All of this churn is unnecessary if the restarting router's forwarding plane (the hardware or kernel-level data path) never actually stopped forwarding packets. The control plane (BGP process) went away, but the forwarding table in the router's line cards or kernel remained intact.

Graceful Restart exploits this separation between control plane and forwarding plane. If the router can keep forwarding packets using its existing FIB while the BGP process restarts, and the peer can keep its routes in the RIB during that window, traffic continues to flow as if nothing happened.

RFC 4724: The Graceful Restart Mechanism

RFC 4724, published in January 2007, defines the core Graceful Restart mechanism for BGP. It introduces three key concepts: the Graceful Restart capability (negotiated in OPEN messages), the restart timer, and the End-of-RIB marker.

GR Capability Negotiation

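Graceful Restart must be negotiated between peers. A BGP speaker advertises its willingness to support GR by including the Graceful Restart Capability (capability code 64) in its OPEN message during session establishment. This capability contains a 4-bit Restart Flags field (whose most significant bit is the Restart State, or R, bit), a 12-bit Restart Time in seconds, and zero or more AFI/SAFI entries, each with its own flags octet carrying the Forwarding State (F) bit.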

Both peers must include the GR capability in their OPEN messages for Graceful Restart to be active on the session. However, the roles are asymmetric: at any given restart event, one peer is the restarting speaker and the other is the helper (or receiving speaker).
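As a rough illustration of the capability layout, the following Python sketch encodes and decodes the capability value (the bytes that follow the capability code and length). The class name and field names are hypothetical and chosen only for this example; the bit positions follow RFC 4724.

```python
import struct
from dataclasses import dataclass, field

GR_CAPABILITY_CODE = 64  # Graceful Restart capability code
R_BIT = 0x8              # Restart State bit: top bit of the 4-bit Restart Flags
F_BIT = 0x80             # Forwarding State bit: top bit of the per-family flags octet


@dataclass
class GracefulRestartCapability:
    """Hypothetical container for the GR capability value."""
    restarting: bool                               # R bit
    restart_time: int                              # seconds, 12-bit field
    families: dict = field(default_factory=dict)   # (afi, safi) -> F bit

    def encode(self) -> bytes:
        if not 0 <= self.restart_time <= 0xFFF:
            raise ValueError("Restart Time must fit in 12 bits (0..4095)")
        flags = R_BIT if self.restarting else 0
        # First two octets: 4 bits of flags followed by 12 bits of Restart Time.
        out = struct.pack("!H", (flags << 12) | self.restart_time)
        # Then one 4-octet entry per address family: AFI, SAFI, flags.
        for (afi, safi), preserved in self.families.items():
            out += struct.pack("!HBB", afi, safi, F_BIT if preserved else 0)
        return out

    @classmethod
    def decode(cls, value: bytes) -> "GracefulRestartCapability":
        if len(value) < 2 or (len(value) - 2) % 4 != 0:
            raise ValueError("malformed Graceful Restart capability")
        (head,) = struct.unpack("!H", value[:2])
        families = {}
        for off in range(2, len(value), 4):
            afi, safi, fam_flags = struct.unpack("!HBB", value[off:off + 4])
            families[(afi, safi)] = bool(fam_flags & F_BIT)
        return cls(bool((head >> 12) & R_BIT), head & 0x0FFF, families)
```

For example, `GracefulRestartCapability(True, 120, {(1, 1): True})` describes a post-restart OPEN with a 120-second Restart Time and preserved forwarding state for IPv4 unicast.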

[Figure: BGP Graceful Restart timeline between the restarting speaker and the helper (peer). With the GR capability negotiated and routes exchanged, BGP restarts and the TCP connection drops. The helper starts the Restart Timer and retains stale routes in its RIB while the restarting speaker's control plane is down and its FIB is preserved. The restarting speaker sends OPEN (R bit=1, F bit=1 per AFI/SAFI), the helper answers with OPEN (GR capability), UPDATE messages re-advertise routes, and after the End-of-RIB marker the helper removes the remaining stale routes.]

Restarting Speaker Behavior

When the BGP process restarts on a router that supports GR:

  1. Preserve forwarding state -- Before or during the restart, the router preserves its Forwarding Information Base (FIB). On modern routers, the FIB lives in hardware (ASICs, line cards) or in the kernel, separate from the BGP process. The BGP process signals the forwarding plane to mark existing entries as stale but continue using them -- this is called Non-Stop Forwarding (NSF) in many vendor implementations.
  2. Re-establish TCP and BGP session -- The restarting speaker opens a new TCP connection to the peer and sends an OPEN message with the Graceful Restart capability. It sets the Restart State (R) bit to 1, indicating that this is a post-restart session establishment. For each AFI/SAFI where forwarding state was preserved, it sets the Forwarding State (F) bit to 1.
  3. Defer best-path selection -- The restarting speaker should defer running the decision process for routes received from the helper until it has received the End-of-RIB marker from the helper, or until a local deferral timer (the Selection_Deferral_Timer in RFC 4724) expires. This prevents premature route selection with incomplete information.
  4. Send End-of-RIB -- After sending all UPDATE messages to re-advertise its routes, the restarting speaker sends an End-of-RIB marker to signal that its initial routing table dump is complete.
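The deferral rule in step 3 can be sketched as a small tracker that waits for an End-of-RIB from every peer or for a deadline, whichever comes first. This is a minimal sketch, not any implementation's actual logic: the class name, method names, and the injectable `now` clock are all assumptions made for illustration.

```python
import time


class SelectionDeferral:
    """Hypothetical tracker for when a restarting speaker may run best-path."""

    def __init__(self, peers, deferral_seconds, now=time.monotonic):
        self._waiting = set(peers)   # peers that still owe us an End-of-RIB
        self._now = now              # injectable clock, handy for testing
        self._deadline = now() + deferral_seconds

    def on_end_of_rib(self, peer):
        # The peer has finished sending its initial table.
        self._waiting.discard(peer)

    def may_run_best_path(self):
        # Run once every peer has sent End-of-RIB, or when the timer expires.
        return not self._waiting or self._now() >= self._deadline
```

Passing the clock in as a parameter keeps the sketch testable without sleeping; a real daemon would tie this to its own event loop and timers.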

Helper (Receiving Speaker) Behavior

The helper is the peer that did not restart. When it detects that the BGP session has dropped (via TCP RST, hold timer expiry, or BFD):

  1. Check GR capability -- If the peer had previously advertised the GR capability with a non-zero Restart Time, the helper enters Graceful Restart mode instead of immediately withdrawing routes.
  2. Mark routes as stale -- All routes received from the restarting peer are marked as stale in the Adj-RIB-In. These routes remain in the RIB and continue to be used for forwarding and announced to other peers.
  3. Start the Restart Timer -- The helper starts a timer set to the Restart Time value from the peer's most recent OPEN message. If this timer expires before the restarting speaker re-establishes the session, the helper deletes all stale routes and performs normal withdrawal procedures. The restart has failed.
  4. Accept new session -- When the restarting speaker reconnects, the helper checks the R bit and F bits in the new OPEN message. If the F bit is set for an address family, the helper continues to retain the stale routes for that AFI/SAFI. If F is not set, the helper immediately deletes stale routes for that address family -- the restarting speaker is signaling that its forwarding state was not preserved.
  5. Process End-of-RIB -- When the helper receives the End-of-RIB marker from the restarting speaker, it deletes any routes still marked as stale that were not refreshed by the new UPDATE messages. These routes were present before the restart but are no longer being advertised, so they should be withdrawn.
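The five helper steps above can be sketched as a small state holder. The class, its method names, and the dictionary layout are hypothetical, and timestamps are passed in explicitly so the example stays self-contained; a real implementation would also trigger best-path runs and withdrawals toward other peers.

```python
class HelperSession:
    """Hypothetical helper-side view of one peer's routes."""

    def __init__(self, peer_restart_time):
        self.peer_restart_time = peer_restart_time  # from the peer's last OPEN
        self.routes = {}             # (afi, safi) -> {prefix: {"stale": bool}}
        self.restart_deadline = None

    def learn(self, family, prefix):
        # A fresh UPDATE installs the route or clears its stale mark.
        self.routes.setdefault(family, {})[prefix] = {"stale": False}

    def on_session_down(self, now):
        if self.peer_restart_time == 0:
            self.routes.clear()      # steps 1: no GR, normal withdrawal
            return
        for table in self.routes.values():
            for route in table.values():
                route["stale"] = True                       # step 2
        self.restart_deadline = now + self.peer_restart_time  # step 3

    def on_tick(self, now):
        # Restart Timer expired before the peer came back: the restart failed.
        if self.restart_deadline is not None and now >= self.restart_deadline:
            self.routes.clear()
            self.restart_deadline = None

    def on_open(self, f_bits):
        """Step 4: f_bits maps (afi, safi) -> F bit from the new OPEN."""
        self.restart_deadline = None
        for family in list(self.routes):
            if not f_bits.get(family, False):
                del self.routes[family]   # forwarding state was not preserved

    def on_end_of_rib(self, family):
        # Step 5: drop whatever is still stale for this address family.
        table = self.routes.get(family, {})
        for prefix in [p for p, r in table.items() if r["stale"]]:
            del table[prefix]
```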

The End-of-RIB Marker

RFC 4724 introduces the End-of-RIB marker (also called the "initial update completion marker") as a signal that a speaker has finished sending its initial routing table after session establishment. The marker is simply an UPDATE message with no reachable NLRI and no withdrawn routes -- an empty UPDATE for the address family in question.

For IPv4 unicast, the End-of-RIB marker is a minimum-length UPDATE message: the Withdrawn Routes Length and Total Path Attribute Length are both zero, and no NLRI follows. For other address families using multiprotocol extensions (RFC 4760), it is an UPDATE whose only path attribute is an MP_UNREACH_NLRI that names the appropriate AFI/SAFI and carries no withdrawn routes.
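To make the wire format concrete, here is a short Python sketch that builds both forms of the marker. The function name `end_of_rib` is an assumption for this example; the header layout (16-octet marker, 2-octet length, 1-octet type) and the MP_UNREACH_NLRI attribute type code 15 come from RFC 4271 and RFC 4760.

```python
import struct

BGP_MARKER = b"\xff" * 16   # BGP message header marker (all ones)
UPDATE = 2                  # BGP message type for UPDATE
MP_UNREACH_NLRI = 15        # path attribute type code
ATTR_OPTIONAL = 0x80        # attribute flags: optional, non-transitive


def end_of_rib(afi=1, safi=1):
    """Build an End-of-RIB UPDATE for the given address family."""
    if (afi, safi) == (1, 1):
        # IPv4 unicast: no path attributes at all.
        path_attrs = b""
    else:
        # Other families: a single MP_UNREACH_NLRI holding only AFI and SAFI.
        path_attrs = struct.pack("!BBBHB", ATTR_OPTIONAL, MP_UNREACH_NLRI,
                                 3, afi, safi)
    # Withdrawn Routes Length (0), Total Path Attribute Length, attributes.
    body = struct.pack("!HH", 0, len(path_attrs)) + path_attrs
    # The header length field counts the 19-octet header plus the body.
    return BGP_MARKER + struct.pack("!HB", 19 + len(body), UPDATE) + body
```

Under these definitions, `len(end_of_rib())` is 23 octets for IPv4 unicast and `len(end_of_rib(2, 1))` is 29 octets for IPv6 unicast.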

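The End-of-RIB marker serves two purposes: it tells the helper when it can delete any stale routes that were not refreshed, and it tells the restarting speaker when it has received a peer's full table and can run best-path selection.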

The End-of-RIB marker has proven so useful that it is now commonly implemented even outside the GR context. Many BGP implementations send an End-of-RIB after the initial route exchange on any new session, regardless of whether GR is negotiated. Route reflectors and other aggregation points use it to sequence their own best-path calculations.

Timers and Their Interaction

Getting the timers right is critical to a successful Graceful Restart. There are several timers involved, and their relationships determine whether the restart is transparent or causes a traffic disruption:

Restart Timer

Advertised in the GR capability, this is the time the restarting speaker expects to need to come back up and re-establish the BGP session. The helper uses this as the deadline. Typical values range from 120 to 300 seconds. Setting it too low risks the helper deleting routes before the restart completes. Setting it too high means the helper retains potentially invalid routes for too long if the restart fails entirely.

Stale Path Timer (Selection Deferral Timer)

This is a local timer on the restarting speaker. After re-establishing sessions, the restarting speaker waits for End-of-RIB markers from its peers before running the best-path decision process. The stale path timer sets an upper bound on this wait. RFC 4724 requires this timer to be configurable but does not fix a default; it says only that the value should be large enough for all peers to send their routes. Some implementations use 360 seconds as the default for their stale-path timers. If the timer expires before all peers have sent End-of-RIB, the restarting speaker runs best-path with whatever information it has and deletes any remaining stale routes from its own FIB.

BGP Hold Timer and BFD

The BGP hold timer (default 90 seconds per RFC 4271) governs how long a speaker waits for a KEEPALIVE or UPDATE before declaring the session dead. BFD (Bidirectional Forwarding Detection) can detect link or neighbor failures in milliseconds. When GR is enabled, the interaction between BFD and GR requires careful consideration: BFD can trigger a session reset very quickly, which starts the GR process. This is usually desirable -- fast detection of a restart event means the helper enters GR mode promptly and the restart timer starts early, giving the restarting speaker the full window to recover.

However, BFD and GR can conflict if BFD detects a failure that is not a graceful restart -- for example, a link failure where the restarting speaker's forwarding plane is also down. In that case, retaining stale routes for the full restart timer causes traffic to be black-holed. Many implementations allow configuring BFD to be "GR-aware" so it can distinguish between a control-plane restart (where GR should activate) and a forwarding-plane failure (where routes should be withdrawn immediately).
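One way to picture the "GR-aware" decision is the sketch below. It assumes the helper knows whether the peer's BFD session runs independently of the control plane (the C bit in BFD control packets, per RFC 5880), and it follows the general guidance in RFC 5882; the function and enum names are hypothetical, and real implementations expose this through vendor-specific configuration.

```python
from enum import Enum


class Action(Enum):
    RETAIN_STALE = "retain stale routes and start the restart timer"
    WITHDRAW = "withdraw routes immediately"


def on_bfd_down(gr_negotiated: bool,
                bfd_control_plane_independent: bool) -> Action:
    """Hypothetical helper policy when BFD reports the peer as down."""
    if not gr_negotiated:
        return Action.WITHDRAW
    if bfd_control_plane_independent:
        # BFD runs in the forwarding plane and still failed, so the data
        # path itself is likely down; keeping stale routes would black-hole.
        return Action.WITHDRAW
    # BFD shares fate with the control plane, so this may be only a
    # control-plane restart; let Graceful Restart proceed.
    return Action.RETAIN_STALE
```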

Forwarding State Preservation: The Hard Part

The GR mechanism in RFC 4724 assumes that the restarting router's forwarding plane continues to work during the restart. This assumption is the foundation of the entire mechanism -- if the forwarding plane goes down, retaining stale routes on the helper just causes traffic to be black-holed instead of being rerouted.

How forwarding state is preserved depends on the router platform:

Long-Lived Graceful Restart (LLGR)

RFC 4724's Graceful Restart has a fundamental limitation: the Restart Time field is only 12 bits, capping the maximum at 4095 seconds (~68 minutes). For many operational scenarios, this is insufficient. A major software upgrade, a complex debugging session, or a hardware replacement can take hours. If the BGP process does not return within the restart timer, the helper withdraws all stale routes, triggering full reconvergence.
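The 68-minute ceiling follows directly from the field width. A quick check, with hypothetical helper names:

```python
def max_field_value(bits: int) -> int:
    """Largest unsigned value that fits in a field of the given width."""
    return (1 << bits) - 1


def as_minutes_and_seconds(seconds: int) -> tuple:
    """Split a duration in seconds into (minutes, seconds)."""
    return divmod(seconds, 60)


max_restart_time = max_field_value(12)            # 4095 seconds
print(max_restart_time, as_minutes_and_seconds(max_restart_time))
```

A 12-bit Restart Time therefore tops out at 4095 seconds, which is 68 minutes and 15 seconds.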

Long-Lived Graceful Restart (LLGR), defined in RFC 9494 (published 2023), extends the GR mechanism to support arbitrarily long restart periods. LLGR adds a second phase after the regular GR timer expires:

  1. The regular GR procedure runs as defined in RFC 4724.
  2. When the Restart Timer expires without the restarting speaker reconnecting, instead of deleting stale routes, the helper transitions them to LLGR stale state.
  3. LLGR stale routes are kept in the RIB but with a dramatically reduced priority: the LLGR_STALE community (65535:6) is attached, and the routes are treated as less preferred than any non-stale route in the best-path algorithm. They are also re-advertised to other LLGR-aware peers with the LLGR_STALE community, so those peers also de-prefer them.
  4. A new Long-Lived Stale Time (a 24-bit field in seconds, so up to 16,777,215 seconds, or roughly 194 days) governs how long LLGR stale routes are retained. Practical values are typically hours to days.
  5. When the restarting speaker finally reconnects, the LLGR stale routes are refreshed or deleted, just as in regular GR.

[Figure: GR vs. Long-Lived Graceful Restart (LLGR) phases. After the session drops, the GR phase keeps routes marked stale at normal preference until the Restart Timer (12 bits, max ~68 min) expires. The LLGR phase then keeps them with the LLGR_STALE community at lowest preference until the Long-Lived Stale Time (hours to days) expires or the speaker returns, after which they are purged. Without GR, routes are withdrawn immediately.]

The key insight of LLGR is that a de-preferred route is better than no route. If the only path to a destination goes through the restarting speaker, an LLGR stale route will still be used -- but if any alternative path exists, that path will be preferred. This provides a graceful degradation: traffic shifts to alternate paths during an extended outage but falls back to the stale route as a last resort.
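The "last resort" behavior can be sketched as a sort key in which the LLGR_STALE community outranks every other tie-breaker. This is a deliberately simplified best-path comparison (only local preference and AS path length follow the stale check), and the route dictionaries and function names are assumptions for illustration.

```python
LLGR_STALE = (65535, 6)   # well-known community marking LLGR stale routes


def preference_key(route):
    # False sorts before True, so any non-stale route beats any stale one.
    # Ties then fall to higher local preference and shorter AS path.
    return (LLGR_STALE in route["communities"],
            -route["local_pref"],
            len(route["as_path"]))


def best_path(candidates):
    """Pick the preferred route, or None if there are no candidates."""
    return min(candidates, key=preference_key) if candidates else None
```

With this ordering, a stale route with local preference 200 still loses to a fresh route with local preference 100, but it is selected when it is the only candidate.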

LLGR Communities

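RFC 9494 defines two well-known communities for LLGR: LLGR_STALE (65535:6), which marks a route as retained under LLGR so that receivers de-prefer it, and NO_LLGR (65535:7), which marks a route that must not be retained under LLGR.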

The Notification (N) Bit

Standard BGP Graceful Restart as defined in RFC 4724 only activates when the TCP session drops unexpectedly. If the BGP session is closed with a NOTIFICATION message (the BGP error mechanism), GR does not apply -- the assumption being that a NOTIFICATION indicates a protocol error that should trigger full reconvergence.

However, many operational scenarios involve NOTIFICATION messages that are not protocol errors: a reset triggered by a configuration change, a session reset for maintenance, or a peer sending a Cease NOTIFICATION with a subcode such as "Administrative Reset." In these cases, preserving forwarding state during the restart is just as desirable as during a crash.

RFC 8538, which updates RFC 4724, introduces the Notification (N) bit in the Graceful Restart capability flags. When both peers set the N bit, GR procedures apply even when the session is terminated by a NOTIFICATION message. Whether forwarding state was actually preserved is still signaled per AFI/SAFI by the F bit in the OPEN sent when the speaker reconnects. For cases where a full teardown is intended, RFC 8538 also defines a Hard Reset subcode of the Cease NOTIFICATION, which bypasses GR.
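The resulting decision can be sketched as a single predicate. The function name and parameters are hypothetical; the Cease error code (6) comes from RFC 4271, and the Hard Reset subcode (9) is the one RFC 8538 defines to force a full session reset.

```python
CEASE = 6        # NOTIFICATION error code for Cease
HARD_RESET = 9   # Cease subcode from RFC 8538: always a full reset


def gr_applies_after_notification(local_n_bit: bool, peer_n_bit: bool,
                                  error_code: int, error_subcode: int) -> bool:
    """Hypothetical check: should GR procedures run after a NOTIFICATION?"""
    if not (local_n_bit and peer_n_bit):
        # Original RFC 4724 behavior: a NOTIFICATION ends the session
        # without Graceful Restart.
        return False
    if error_code == CEASE and error_subcode == HARD_RESET:
        # The sender explicitly asked for a full teardown.
        return False
    return True
```

For instance, a Cease with subcode 4 (Administrative Reset) keeps stale routes when both sides set the N bit, while the same message without the N bit triggers normal withdrawal.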

GR Roles: Restarting Speaker vs. Helper

The asymmetry between the restarting speaker and the helper is a common source of confusion. A few important clarifications:

Operational Considerations

Deploying Graceful Restart in production involves several practical considerations beyond the protocol mechanics.

When GR Helps

When GR Hurts

Tuning Recommendations

GR in the Real World: Vendor and Software Support

Graceful Restart is supported by all major BGP implementations, though the depth of support varies:

GR and Route Server Operations at IXPs

GR is particularly important at Internet Exchange Points where route servers aggregate hundreds of peers. Without GR, a route server restart for maintenance or upgrades would cause every peer to lose all routes learned via the route server at the same moment, potentially disrupting peering traffic across the entire exchange.

With GR (and especially LLGR), a route server restart is transparent to the participants. The peers (acting as helpers) retain the routes learned from the route server during the restart, and traffic continues to flow over the peering LAN. Major IXP route server platforms like BIRD and OpenBGPD support GR precisely for this reason.

Conversely, when a peer restarts, the route server acts as a helper, retaining that peer's routes so other participants continue to send traffic to the restarting peer (whose forwarding plane is presumably still active).

Relationship to Other High-Availability Mechanisms

Graceful Restart does not exist in isolation. It is part of a broader set of high-availability mechanisms in modern networks:

Common Pitfalls and Debugging

When GR does not work as expected, the following are common causes:

Explore BGP Sessions and Routes

Understanding Graceful Restart helps explain why BGP convergence events sometimes cause traffic shifts and sometimes do not. Use the god.ad BGP Looking Glass to examine real AS paths and observe how routes propagate through the global routing table. When you see a route through a particular autonomous system, consider whether that AS's routers support GR -- and whether a restart on that path would be transparent or disruptive.
