How BFD Works: Sub-Second Failure Detection for Network Protocols

Bidirectional Forwarding Detection (BFD) is a lightweight, protocol-independent mechanism for detecting failures in the forwarding path between two adjacent systems. Defined in RFC 5880, BFD provides sub-second failure detection that is orders of magnitude faster than the native keepalive mechanisms of routing protocols like BGP (default hold time: 90 seconds), OSPF (default dead interval: 40 seconds), and IS-IS (default hold time: 30 seconds). When BFD detects that a forwarding path has failed, it notifies the registered routing protocols, which can then immediately withdraw routes and reconverge — often within a few hundred milliseconds rather than tens of seconds.

BFD was designed to fill a critical gap in network reliability. Routing protocol keepalives were never intended for fast failure detection. Making BGP hold timers or OSPF dead intervals aggressively short (say, 1 second) is dangerous: it creates false positives from transient CPU spikes, route computation bursts, or brief congestion episodes, causing unnecessary session flaps that destabilize the entire network. BFD solves this by decoupling failure detection from the control plane entirely. It operates at a separate, dedicated layer — often offloaded to hardware or to a low-level process that is immune to control-plane congestion — and provides a clean "up or down" signal that routing protocols can subscribe to. The result is fast failure detection without the instability risks of aggressive routing protocol timers.

The Problem BFD Solves

Consider a network where Router A and Router B are eBGP peers connected through a Layer 2 switch or an optical transport network. If the physical link between them fails cleanly (fiber cut, interface goes down), both routers detect the loss of carrier signal immediately and tear down the BGP session. No problem.

But many failure modes are not clean physical failures. A Layer 2 switch in the path might fail in a way that keeps both physical interfaces up but stops forwarding traffic. An optical amplifier might introduce bit errors that corrupt packets without triggering a link-down event. A line card might crash while the physical port LEDs stay green. In these scenarios, the routers have no immediate indication that the forwarding path has failed. They continue to believe the BGP session is up and keep sending traffic into a black hole — until the BGP hold timer (default 90 seconds) expires and the session finally drops.

Ninety seconds of black-holed traffic is catastrophic for modern networks. Even OSPF's 40-second dead interval is unacceptable. BFD addresses this by continuously probing the forwarding path with lightweight packets sent at high rates (commonly every 50 or 100 milliseconds). If a configurable number of consecutive probes go unanswered, BFD declares the path down and notifies all registered clients within milliseconds.

BFD Protocol Architecture (RFC 5880)

BFD is specified across several RFCs. RFC 5880 defines the core protocol mechanics. RFC 5881 covers BFD for IPv4 and IPv6 single-hop paths. RFC 5882 describes the generic application of BFD to routing protocols. RFC 5883 covers multihop BFD. Additional RFCs address BFD for specific environments: RFC 7130 for LAG member links (micro-BFD), RFC 7419 for common BFD application interval requirements, and RFC 7726 for BFD over point-to-point MPLS LSPs.

BFD operates between exactly two systems. Each system runs a BFD session that independently monitors the forwarding path to the peer. The protocol is bidirectional — both sides must agree that the path is operational for it to be considered up. A BFD session is identified by a pair of discriminators: a "My Discriminator" chosen by the local system and a "Your Discriminator" learned from the remote system during session establishment.

BFD Control Packet Format

BFD control packets are UDP datagrams. For single-hop IPv4/IPv6 sessions (RFC 5881), the destination port is 3784 and the source port is chosen from the range 49152–65535. For multihop sessions (RFC 5883), the destination port is 4784. The BFD payload is a compact 24-byte header (or 28 bytes with authentication):

BFD Control Packet (24 bytes, UDP destination port 3784):

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Vers |  Diag   |Sta|P|F|C|A|D|M|  Detect Mult  |    Length     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       My Discriminator                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Your Discriminator                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Desired Min TX Interval (32 bits, microseconds)       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Required Min RX Interval (32 bits, microseconds)      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       Required Min Echo RX Interval (32 bits, microseconds)   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Flags: P (Poll), F (Final), C (Control Plane Independent), A (Authentication Present), D (Demand), M (Multipoint)
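The header layout can be sketched with Python's struct module. This is an illustrative pack/parse helper, not a full implementation; field names follow RFC 5880, and the values used below are arbitrary examples.

```python
import struct

# Network byte order: 4 single bytes followed by five 32-bit words = 24 bytes.
BFD_FMT = "!BBBBIIIII"

def pack_bfd(vers, diag, state, flags, detect_mult,
             my_disc, your_disc, des_min_tx, req_min_rx, req_min_echo_rx):
    """Build a BFD control packet (no authentication section)."""
    byte0 = (vers << 5) | (diag & 0x1F)    # Vers (3 bits) + Diag (5 bits)
    byte1 = (state << 6) | (flags & 0x3F)  # Sta (2 bits) + 6 flag bits
    length = 24                            # fixed length without authentication
    return struct.pack(BFD_FMT, byte0, byte1, detect_mult, length,
                       my_disc, your_disc,
                       des_min_tx, req_min_rx, req_min_echo_rx)

def parse_bfd(data):
    """Return a dict of header fields from a raw BFD control packet."""
    b0, b1, mult, length, my_d, your_d, tx, rx, echo = struct.unpack(BFD_FMT, data[:24])
    return {
        "vers": b0 >> 5, "diag": b0 & 0x1F,
        "state": b1 >> 6, "flags": b1 & 0x3F,
        "detect_mult": mult, "length": length,
        "my_disc": my_d, "your_disc": your_d,
        "des_min_tx": tx, "req_min_rx": rx, "req_min_echo_rx": echo,
    }
```

Note how the discriminators and all three timer fields occupy full 32-bit words, while the first four bytes pack the version, diagnostic, state, flags, multiplier, and length.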

Key Header Fields

Version (3 bits): Currently version 1. This field ensures forward compatibility with future BFD revisions.

Diagnostic (5 bits): When a session transitions to a down state, this field communicates the reason. Common values include 0 (No Diagnostic), 1 (Control Detection Time Expired), 2 (Echo Function Failed), 3 (Neighbor Signaled Session Down), and 7 (Administratively Down). The diagnostic code is invaluable for troubleshooting — it tells the remote side why the session went down.

State (2 bits): The current BFD session state: 0 (AdminDown), 1 (Down), 2 (Init), 3 (Up). These map directly to the BFD state machine described below.

Flags (6 bits): Six single-bit flags control protocol behavior. The P (Poll) and F (Final) bits implement a reliable parameter-change mechanism. The C (Control Plane Independent) bit indicates that the BFD implementation can continue to function even if the control plane (routing process) fails — a critical indicator for hardware-offloaded BFD. The A (Authentication) bit signals that authentication data follows the standard header. The D (Demand) bit activates Demand mode. The M (Multipoint) bit is reserved for future multipoint BFD extensions.

Detect Multiplier (8 bits): The detection time multiplier. If the remote system does not receive any BFD packets for (Detect Multiplier × negotiated receive interval), it declares the session down. Typical values are 3 or 5, balancing speed against false-positive risk.

Discriminators (32 bits each): "My Discriminator" is a locally chosen, nonzero identifier for this BFD session. "Your Discriminator" is the remote system's discriminator, learned during session establishment. Together they demultiplex BFD sessions when multiple sessions exist between the same pair of systems (e.g., separate BFD sessions for different routing protocols or different VRFs).

Timer intervals (32 bits each, in microseconds): "Desired Min TX Interval" is the minimum interval between transmitted BFD packets that the local system can sustain. "Required Min RX Interval" is the minimum interval between received BFD packets that the local system can handle. "Required Min Echo RX Interval" is the minimum receive interval for Echo packets; setting this to zero tells the remote system not to send Echo packets.

BFD Session Establishment

A BFD session goes through a three-way handshake similar in spirit to TCP, ensuring both sides are ready before declaring the path operational. The BFD state machine has four states: AdminDown (session administratively disabled), Down, Init, and Up.

The transition sequence is: both sides start in Down. When Router A receives a packet from Router B with State=Down, Router A moves to Init. When Router B receives Router A's packet with State=Init, Router B moves to Up. When Router A receives Router B's packet with State=Up, Router A also moves to Up. The session is now established.

BFD Session Establishment (Three-Way Handshake):

    Router A                                          Router B
    --------                                          --------
    Down                                              Down
      |---- State=Down, MyDisc=1, YourDisc=0 ---->      |
      |<--- State=Down, MyDisc=2, YourDisc=1 -----      |
    Init                                                |
      |---- State=Init, MyDisc=1, YourDisc=2 ---->      |
      |                                               Up
      |<--- State=Up,   MyDisc=2, YourDisc=1 -----      |
    Up                                                  |
      |<=== Periodic BFD control packets at the ====>   |
            negotiated interval (e.g., every 100 ms)
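The receive-driven transitions described above can be sketched as a small transition function. This is a simplification of the RFC 5880 state machine: AdminDown handling is reduced to "treat as Down," and timer expiry is handled separately.

```python
# BFD session states, matching the 2-bit State field values.
ADMIN_DOWN, DOWN, INIT, UP = 0, 1, 2, 3

def next_state(local, received):
    """New local state after receiving a control packet in state `received`."""
    if received == ADMIN_DOWN:
        return DOWN                     # peer is administratively down
    if local == DOWN:
        if received == DOWN:
            return INIT                 # peer is alive but hasn't seen us yet
        if received == INIT:
            return UP                   # peer has seen us: session comes up
        return DOWN                     # ignore Up while we are still Down
    if local == INIT:
        return UP if received in (INIT, UP) else INIT
    if local == UP:
        return DOWN if received == DOWN else UP
    return DOWN
```

Walking Router A and Router B through the handshake with this function reproduces the Down → Init → Up sequence shown in the diagram.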

Once established, both sides continuously send BFD control packets at the negotiated rate. If either side stops receiving packets for the detection time (Detect Multiplier × negotiated interval), it declares the session down and notifies all registered routing protocol clients.
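The detection-timeout logic in the preceding paragraph amounts to tracking the timestamp of the last valid packet. A minimal sketch, with the clock passed in explicitly so the behavior is deterministic:

```python
class DetectTimer:
    """Declares the path down when no packet arrives within detect_time seconds."""

    def __init__(self, detect_time, now=0.0):
        self.detect_time = detect_time  # Detect Mult x negotiated interval
        self.last_rx = now

    def packet_received(self, now):
        """Reset the detection timer on every valid BFD control packet."""
        self.last_rx = now

    def expired(self, now):
        """True once the detection time has elapsed with no packets."""
        return (now - self.last_rx) > self.detect_time
```

In a real implementation the timer would drive a callback that notifies all registered routing protocol clients; here it simply answers "has the detection time elapsed?"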

Asynchronous Mode

Asynchronous mode is the primary and most widely deployed BFD operating mode. In this mode, both systems periodically send BFD control packets to each other at a negotiated rate. Each system monitors the incoming packets from its peer, and if the detection time expires without receiving a valid BFD packet, the session is declared down.

The key parameters in Asynchronous mode are the transmission interval and the detection time. The actual transmission interval used by a system is the greater of its own Desired Min TX Interval and the peer's Required Min RX Interval. This negotiation ensures that neither system sends faster than the other can handle. For example, if Router A wants to transmit every 100ms but Router B can only process packets every 300ms, the effective TX interval will be 300ms.

The detection time is calculated as:

Detection Time = Remote Detect Multiplier × max(Remote Desired Min TX, Local Required Min RX)

With typical production values of a 100ms interval and a Detect Multiplier of 3, the detection time is 300ms — a dramatic improvement over BGP's 90-second default hold timer. Some aggressive deployments use 50ms intervals with a multiplier of 3, yielding 150ms detection time. The trade-off is that shorter intervals consume more CPU (or dedicated hardware resources) and increase the risk of false positives from transient packet loss or jitter.
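The negotiation and detection-time rules above reduce to two one-line functions (intervals in microseconds, as carried on the wire):

```python
def effective_tx_interval(local_desired_tx, peer_required_rx):
    """A system never transmits faster than the peer says it can receive."""
    return max(local_desired_tx, peer_required_rx)

def detection_time(remote_detect_mult, remote_desired_tx, local_required_rx):
    """Silence longer than this causes the local system to declare Down."""
    return remote_detect_mult * max(remote_desired_tx, local_required_rx)
```

Plugging in the examples from the text: Router A wanting 100 ms against Router B requiring 300 ms yields a 300 ms effective TX interval, and a 100 ms interval with a multiplier of 3 yields a 300 ms detection time.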

Timer Negotiation and the Poll Sequence

BFD timers can be changed dynamically without tearing down the session, using the Poll/Final (P/F) mechanism. When a system wants to change its timer parameters, it sets the P (Poll) bit in its next BFD control packet. The remote system must respond with a packet that has the F (Final) bit set. The new parameters take effect only after the Poll/Final exchange completes. This ensures that both sides agree on the new timing before either side changes its behavior, preventing a transient period where one side expects packets at the old rate and the other sends at the new rate.

This mechanism is also used when a system needs to verify that the peer is still alive after a period of silence (for instance, when transitioning from Demand mode back to Asynchronous mode).
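The Poll/Final gate on timer changes can be sketched as staged state that commits only when the F-bit reply arrives. This is simplified: RFC 5880 additionally distinguishes when interval increases versus decreases take effect, which is omitted here.

```python
class BfdTimers:
    """Stage a timer change behind a Poll sequence; commit on Final."""

    def __init__(self, desired_tx):
        self.desired_tx = desired_tx  # active TX interval, microseconds
        self.pending_tx = None        # staged value awaiting the peer's Final
        self.poll_active = False

    def request_change(self, new_tx):
        """Stage a new TX interval; subsequent packets carry the P bit."""
        self.pending_tx = new_tx
        self.poll_active = True

    def on_receive(self, final_bit):
        """Commit the staged value only when an F-bit reply arrives."""
        if final_bit and self.poll_active:
            self.desired_tx = self.pending_tx
            self.pending_tx = None
            self.poll_active = False
```

Until the Final arrives, the system keeps transmitting at the old rate, which is exactly the guarantee that prevents the two sides from disagreeing about timing mid-change.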

Echo Mode

Echo mode is an optional BFD feature that tests the forwarding path without requiring active BFD processing on the remote system. In Echo mode, the local system sends BFD Echo packets that are addressed so they will be forwarded back by the remote system's forwarding plane, without being processed by the remote system's BFD control plane. The local system monitors the return of its own Echo packets to determine path liveness.

Echo packets are UDP datagrams whose destination IP address is typically an address of the local system itself (the sender), so that the remote system's forwarding plane routes them straight back without ever punting them to its CPU. The destination port is 3785. Because these packets traverse the link to the remote system and are looped back by its forwarding path, they test the actual forwarding path end-to-end, including the remote system's line cards and forwarding ASICs.

The primary advantage of Echo mode is CPU savings on the remote system. Since the remote system only needs to forward the Echo packets (a data-plane operation), it does not need to run the BFD state machine at the high-frequency detection rate. The remote system can reduce its Asynchronous mode control packet rate to a slow maintenance rate (e.g., 1 second) while the local system's Echo packets provide the fast failure detection. This is particularly beneficial when one side of a BFD session is a low-end device with limited CPU.

However, Echo mode has limitations. It requires that the remote system's forwarding plane can handle the Echo packets correctly (some platforms have issues with forwarding packets back to the source due to uRPF or interface filters). It also does not detect control-plane failures on the remote system, only forwarding-path failures. For these reasons, Echo mode is always used in conjunction with a slow-rate Asynchronous mode that maintains the BFD session state.

Demand Mode

Demand mode eliminates periodic BFD control packet transmission once a session is established. After the session reaches the Up state, a system that activates Demand mode (by setting the D bit) stops sending periodic BFD packets. It relies on an independent, application-specific method to verify path connectivity — for example, data-plane traffic monitoring or hardware-level link integrity checks. If the system needs to explicitly verify the path, it initiates a Poll sequence by sending a BFD packet with the P bit set and waiting for the F response.

Demand mode is rarely used in practice. Most deployments rely on Asynchronous mode because it provides continuous monitoring without requiring the application to implement its own path verification. However, Demand mode can reduce BFD packet overhead in scenarios where an independent path-monitoring mechanism already exists (such as an optical transport system that provides its own failure notification).

Detection Time Calculation

The detection time formula is central to BFD's operation and is worth examining in detail. The formula differs slightly depending on whether Echo mode is active:

Without Echo (Asynchronous mode only):

Detection Time = Detect Mult × agreed_interval

where agreed_interval = max(peer's Desired Min TX Interval, local Required Min RX Interval)

With Echo mode active:

Detection Time = Detect Mult × agreed_echo_interval

where agreed_echo_interval = max(local Desired Min Echo TX Interval, peer's Required Min Echo RX Interval)

Common production configurations and their detection times:

  - 50 ms interval × multiplier 3 = 150 ms detection (aggressive; typically requires hardware-offloaded BFD)
  - 100 ms interval × multiplier 3 = 300 ms detection (the most common production setting)
  - 300 ms interval × multiplier 3 = 900 ms detection (conservative; software BFD or lossy paths)
  - 1 s interval × multiplier 3 = 3 s detection (slow maintenance rate, e.g., alongside Echo mode)

The Detect Multiplier is intentionally separate from the interval to allow operators to tune fault tolerance independently from detection speed. A multiplier of 3 means that 3 consecutive packets must be lost before a failure is declared. This tolerates occasional packet loss (e.g., during a brief microloop or queue overflow) without triggering a false down event. Increasing the multiplier to 5 provides more resilience against transient loss at the cost of slower detection.

BFD for BGP

BGP is one of the most important BFD clients. A BGP session between two routers can register with BFD to receive fast failure notification. When BFD detects that the forwarding path to a BGP neighbor has failed, the BGP process is immediately notified and tears down the session without waiting for the hold timer to expire. This triggers immediate route withdrawal, allowing the network to reconverge in sub-second time rather than waiting 90 seconds.

The integration is straightforward from a configuration perspective. On most platforms, enabling BFD for a BGP neighbor is a single command (e.g., neighbor 10.0.0.1 fall-over bfd on Cisco IOS, or bfd under the neighbor configuration on Junos). The BGP process registers as a BFD client, and a BFD session is established to the BGP neighbor address. If BFD declares the session down, BGP receives a callback and immediately resets the peering.

BFD is particularly critical for eBGP sessions where the peer is reachable through a Layer 2 domain (such as an IXP peering LAN). At a large Internet Exchange Point, your router might have hundreds of eBGP peers reachable through a shared switching fabric. A switch failure that partitions the fabric would leave your router with no physical-layer indication of the loss — all interface LEDs stay green, all LLDP neighbors remain visible, but traffic to some peers is being black-holed. BFD detects this within milliseconds and brings down the affected BGP sessions.

For iBGP sessions, BFD also provides value. iBGP sessions often run between loopback interfaces and traverse multiple physical hops through the IGP. If a forwarding failure occurs on the path between two iBGP speakers, the BGP hold timer (even if reduced from the default 90 seconds to 30 seconds) may still be too slow. BFD running between the loopback addresses (multihop BFD, described below) detects the failure much faster.

BFD's interaction with BGP Graceful Restart deserves special attention. When BGP Graceful Restart is configured, a BGP session reset normally triggers the "stale routes" mechanism: the receiving router marks the peer's routes as stale and keeps them in the forwarding table for a grace period, expecting the peer to re-establish the session. However, if BFD is the reason for the session reset, it usually means the forwarding path has genuinely failed — not that the peer is doing a graceful control-plane restart. Most implementations allow operators to configure whether a BFD-triggered session down should honor Graceful Restart (preserving stale routes) or should override it (immediately withdrawing routes). The correct choice depends on the network topology and failure mode.

BFD for OSPF and IS-IS

OSPF and IS-IS both benefit enormously from BFD. Without BFD, OSPF detects neighbor failure when the Dead Interval expires (default 40 seconds, which is 4 missed Hello packets at the 10-second Hello interval). IS-IS detects failure when its hold timer expires (default 30 seconds, which is 3 missed IS-IS Hellos at the 10-second interval). Both are unacceptably slow for modern networks.

When BFD is enabled on OSPF or IS-IS interfaces, the routing protocol registers each adjacency with BFD. If BFD detects a forwarding failure, the routing protocol is immediately notified and tears down the adjacency. This triggers an SPF (Shortest Path First) computation and route recalculation within milliseconds rather than tens of seconds. The combination of BFD failure detection (sub-100ms) with optimized SPF computation and fast route installation can achieve end-to-end convergence in well under one second.

The sequence for an OSPF convergence event with BFD is:

  1. T=0ms: Link failure occurs (e.g., forwarding ASIC crash on a switch between the routers).
  2. T=150-300ms: BFD detects the failure (3 missed packets at 50-100ms intervals).
  3. T=150-310ms: BFD notifies OSPF. OSPF tears down the adjacency and floods a new Router LSA with the failed link removed.
  4. T=200-400ms: All routers in the area receive the updated LSA and run the SPF algorithm.
  5. T=250-500ms: New forwarding entries are installed in the FIB. Traffic is rerouted.

Compare this to the same scenario without BFD: the failure is not detected until T=40 seconds (OSPF Dead Interval), and total convergence takes 40-50 seconds. BFD reduces end-to-end convergence from tens of seconds to hundreds of milliseconds — a 100x improvement.

For IS-IS deployments, BFD is especially common in service provider networks running Segment Routing. When TI-LFA (Topology-Independent Loop-Free Alternate) is configured, pre-computed backup paths are installed in the forwarding table. The moment BFD detects a failure, the forwarding plane switches to the backup path immediately, often achieving near-zero packet loss for single link failures.

Multihop BFD (RFC 5883)

Standard single-hop BFD (RFC 5881) operates between directly connected systems, with packets sent using a TTL of 255 and validated by the receiver (GTSM — Generalized TTL Security Mechanism). This TTL check ensures that BFD packets cannot be spoofed from a remote attacker, since any packet traversing even one router hop would have its TTL decremented below 255.
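The GTSM check described above is a one-line comparison on receipt; a minimal sketch:

```python
def gtsm_accept(ttl, multihop=False):
    """GTSM check for received BFD packets (RFC 5881 single-hop).

    Single-hop sessions require TTL/hop limit exactly 255, proving the
    packet crossed no router hop. Multihop sessions cannot use this check,
    which is why they rely on authentication or infrastructure ACLs instead.
    """
    return True if multihop else ttl == 255
```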

Multihop BFD (RFC 5883) extends BFD to monitor paths that traverse multiple routing hops. This is needed for:

  - iBGP sessions between loopback addresses, which cross multiple physical hops through the IGP
  - eBGP multihop peerings, where the external neighbor is not directly connected
  - General reachability monitoring between any two endpoints separated by a routed core

Multihop BFD uses destination UDP port 4784 (instead of 3784 for single-hop). It does not use GTSM TTL checks (since packets must traverse multiple hops). This makes multihop BFD more susceptible to spoofing attacks, which is why RFC 5880 provides optional authentication mechanisms (Simple Password, Keyed MD5, Keyed SHA-1, and Meticulous Keyed variants). In practice, multihop BFD sessions should be protected by infrastructure ACLs or IPsec to mitigate spoofing risks.

Multihop BFD monitors the routed path between two endpoints. If the IGP reroutes traffic between the two BFD endpoints (e.g., due to a link failure that is compensated by an alternate path), multihop BFD may continue to report the session as Up because the forwarding path still works, just via a different route. This is the correct behavior — multihop BFD tests reachability, not path identity.

Micro-BFD for LAG (RFC 7130)

Link Aggregation Groups (LAGs, also known as port channels or link bundles) present a unique challenge for BFD. A standard BFD session running over a LAG interface monitors the aggregate as a single logical link. If one member link of a four-member LAG fails but the other three continue forwarding, the BFD session stays up because the aggregate interface is still operational. The LAG simply redistributes traffic among the surviving members.

This seems correct at first glance, but consider the failure mode where a member link is "stuck" — it appears physically up but is not forwarding traffic. LACP (Link Aggregation Control Protocol) may continue to see the link as active because its PDUs are processed by the remote system's CPU, but data-plane traffic hashed to that member is being dropped. The LAG as a whole loses a fraction of its traffic (typically 25% for a four-member LAG) without any protocol detecting the failure.

Micro-BFD (RFC 7130) solves this by running an independent BFD session on each individual member link of the LAG. If a BFD session on a member link fails, that specific member is removed from the LAG bundle. The remaining members continue operating, and the LAG rebalances traffic across them. The per-member BFD session detects forwarding-plane failures that LACP alone cannot catch.

Micro-BFD sessions use destination UDP port 6784. Each session runs independently, with its own discriminators and timer negotiation. The sessions are bound to specific physical interfaces rather than the logical LAG interface, ensuring that each member's forwarding path is individually validated.

Micro-BFD is widely deployed in data center and service provider environments where LAGs carry critical traffic. A common deployment is on inter-chassis links between routers where each link is a 100GE member of a LAG. Without micro-BFD, a failed optic or forwarding ASIC on one member could black-hole 25% of traffic indefinitely. With micro-BFD, the failure is detected and the member removed within 300ms.
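The member-pruning behavior described above reduces to filtering the bundle by per-link session state. A minimal sketch (interface names are illustrative):

```python
def active_members(members, session_up):
    """Return the LAG members whose per-link micro-BFD session is Up.

    `members` is the configured bundle; `session_up` maps each member
    interface to its micro-BFD session state. A member with no session
    (or a Down session) is excluded, and the LAG rebalances over the rest.
    """
    return [m for m in members if session_up.get(m, False)]
```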

Hardware-Offloaded BFD

Running BFD at aggressive intervals (50ms or faster) in software places significant demands on the router's CPU. Each BFD session requires generating and processing packets at high rates, and a router with hundreds of BFD sessions — common at a large IXP or in a dense backbone — can consume substantial CPU resources. Worse, if the CPU becomes overloaded (due to a routing convergence event, a BGP table scan, or a software bug), BFD packet processing may be delayed, causing false positives: BFD declares sessions down even though the forwarding path is perfectly healthy.

This is the fundamental problem that drove the development of hardware-offloaded BFD. When BFD is offloaded, the generation and processing of BFD packets is handled by dedicated hardware (typically the forwarding ASIC or a programmable network processor on the line card), completely independent of the router's general-purpose CPU. The control plane configures the BFD session parameters, but the data plane handles all packet I/O autonomously.

Hardware-offloaded BFD provides two critical benefits:

  - Scale: hundreds or thousands of sessions can run at aggressive intervals without consuming general-purpose CPU cycles.
  - Immunity to control-plane load: because packet transmission and reception continue regardless of CPU utilization, a busy routing process cannot delay BFD packets and trigger false positives. Implementations advertise this property to the peer by setting the C (Control Plane Independent) bit.

Most modern routing platforms from major vendors (Cisco, Juniper, Arista, Nokia) support hardware-offloaded BFD on their high-end platforms. The specific capabilities (minimum interval, maximum sessions, supported BFD modes) vary by platform and line card generation. For example, a modern data center switch ASIC might support 256 hardware BFD sessions at 50ms intervals, while a service provider router line card might support 4,000 sessions at 10ms intervals.

When evaluating a platform's BFD capabilities, the key questions are: how many simultaneous hardware-offloaded BFD sessions does it support? What is the minimum supported interval? Does it support micro-BFD in hardware? Can it maintain BFD sessions across control-plane restarts (relevant for Graceful Restart scenarios)? The answers determine whether BFD can be deployed pervasively across all interfaces or must be selectively applied to the most critical links.

BFD Authentication

RFC 5880 defines optional authentication for BFD control packets. The authentication types are:

  - Simple Password: a plaintext password carried in each packet; trivially sniffable, minimal protection.
  - Keyed MD5 and Keyed SHA-1: a hash computed over the packet and a shared secret, with a sequence number that must not decrease.
  - Meticulous Keyed MD5 and Meticulous Keyed SHA-1: the same hashes, but the sequence number must increment with every packet, providing stronger replay protection at higher processing cost.

In practice, BFD authentication is not widely deployed for single-hop sessions because the GTSM TTL check (requiring TTL=255) already provides strong protection against remote spoofing. For multihop BFD, authentication or infrastructure ACLs are more important since the TTL check cannot be used.

BFD Deployment Considerations

Deploying BFD effectively requires careful attention to several operational factors:

Timer Selection

Choosing BFD intervals is a balance between detection speed and stability. Intervals that are too aggressive increase the false-positive rate, especially on congested links or platforms where BFD runs in software. A common starting point is 100ms with a multiplier of 3 (300ms detection time). This can be tightened to 50ms × 3 (150ms) on platforms with hardware-offloaded BFD, or relaxed to 300ms × 3 (900ms) on low-end devices or unreliable paths.

Interaction with ECMP

When multiple equal-cost paths exist between two routers, a single-hop BFD session only monitors one of those paths (the one that the BFD packets happen to be hashed to). If a different ECMP member fails, BFD will not detect it. Solutions include running separate BFD sessions per ECMP path (supported on some platforms as "per-link BFD"), or using micro-BFD if the ECMP paths correspond to LAG members.

BFD and GR/NSR

Graceful Restart (GR) and Non-Stop Routing (NSR) mechanisms expect that the forwarding plane continues operating during a control-plane restart. If BFD is implemented purely in software and the control plane restarts, BFD sessions will drop, triggering neighbor failures that the GR mechanism was designed to prevent. Hardware-offloaded BFD (with the C bit set) avoids this problem: the forwarding plane keeps sending BFD packets even while the control plane restarts, and the peer sees no interruption.

Scale Planning

Each BFD session consumes resources: memory for session state, packet processing bandwidth, and (if hardware-offloaded) a slot in the hardware BFD table. At a large IXP with 500 peers, running BFD to every peer at 100ms intervals means the router must process 5,000 BFD packets per second inbound and generate 5,000 per second outbound. While this is trivial for modern hardware, it requires verification during capacity planning, especially on platforms where BFD sessions compete for limited hardware resources.
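The IXP arithmetic in the preceding paragraph is worth making explicit, since it is the calculation you repeat during capacity planning:

```python
def bfd_pps(sessions, interval_ms):
    """BFD control packets per second, per direction, for a set of sessions.

    Each session at `interval_ms` generates 1000 / interval_ms packets
    per second; multiply by the session count for the aggregate rate.
    """
    return sessions * (1000 // interval_ms)
```

At 500 sessions and a 100 ms interval this gives 5,000 packets per second in each direction; tightening to 50 ms doubles it.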

Dampening

To prevent BFD instability from cascading through the network, many implementations support BFD dampening. If a BFD session flaps repeatedly within a configurable time window, the system suppresses the session's state notifications for an increasing penalty period (exponential backoff). This prevents a flapping BFD session from repeatedly bringing down and re-establishing a BGP session or OSPF adjacency, which would cause continuous route withdrawals and re-advertisements across the network.

BFD in Modern Network Architectures

BFD has become a foundational protocol in modern network designs. In data center fabrics running BGP as the IGP (as in Clos/Spine-Leaf topologies popularized by RFC 7938), BFD provides the fast failure detection that BGP itself lacks. Every BGP session in the fabric — between leaf and spine, and between spine and super-spine — typically runs BFD at 100ms × 3. This ensures that any link or node failure is detected within 300ms, and the remaining BGP paths take over.

In service provider backbones running OSPF or IS-IS with Segment Routing, BFD is the trigger for TI-LFA fast reroute. The pre-computed backup paths installed by TI-LFA activate the moment BFD detects a failure, achieving convergence times measured in tens of milliseconds. Without BFD, TI-LFA backup paths would sit unused for 30-40 seconds while the IGP dead timer expires — defeating the purpose of pre-computed backup paths.

In SD-WAN overlay networks, BFD or BFD-like probing is used to monitor underlay path quality (loss, latency, jitter) in real time. While these implementations may not strictly follow RFC 5880, they are architecturally derived from BFD concepts. The overlay controller uses the path quality metrics to steer application traffic to the best-performing underlay path.

In cloud provider networks, BFD runs between virtual routers and physical infrastructure to detect hypervisor or host failures. When a compute node running virtual network functions fails, BFD sessions from the physical top-of-rack switch to the virtual routers on that node go down, triggering immediate route withdrawal and traffic failover to redundant virtual routers on other hosts.

Explore Routing Protocol Interactions

BFD is a silent enabler of fast convergence across the internet. While it operates below the surface, its effects are visible in how quickly networks recover from failures. Use the god.ad BGP Looking Glass to observe BGP routing in real time — the stability and responsiveness of the routes you see are made possible, in large part, by BFD running between routers throughout the internet. Explore autonomous systems, examine OSPF and IS-IS as the IGPs that rely on BFD for convergence, and investigate how route leaks and BGP hijacks interact with failure detection mechanisms across the global routing table.
