The CenturyLink/Level 3 Flowspec Outage (2020)

On August 30, 2020, a single misconfigured BGP flowspec rule cascaded across the global backbone of CenturyLink/Level 3 (AS3356), one of the world's largest Tier 1 transit providers. For roughly five hours, routers across CenturyLink's network dropped legitimate traffic, disrupting internet service for millions of users and knocking 911 emergency call systems offline in multiple US states. The incident remains one of the most significant examples of how a small configuration error in a critical autonomous system can ripple across the entire internet.

What is BGP Flowspec?

Before examining what went wrong, it helps to understand the technology at the center of this outage. BGP flowspec (defined in RFC 5575) is an extension to BGP that allows routers to distribute traffic filtering rules alongside routing information. While standard BGP deals with where to send traffic (via prefixes and AS paths), flowspec deals with which traffic to allow, rate-limit, or drop.

A flowspec rule can match on multiple packet fields simultaneously: destination prefix, source prefix, IP protocol, source and destination ports, ICMP type and code, TCP flags, packet length, DSCP value, and fragment bits.

The action associated with a flowspec rule can drop packets, rate-limit them, redirect them to a different VRF (virtual routing table), or mark them with specific DSCP values. These rules propagate across a network through BGP sessions, just like regular routes do, which means they can spread very quickly through iBGP meshes and route reflectors.
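As a rough sketch (not any vendor's actual API or wire format), a flowspec rule can be modeled as a set of optional match components plus an action -- a packet matches only if every component the rule specifies matches:

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass
class FlowspecRule:
    """Simplified model of an RFC 5575 flowspec rule (illustrative only)."""
    dst_prefix: Optional[str] = None   # e.g. "203.0.113.0/24"
    src_prefix: Optional[str] = None
    protocol: Optional[int] = None     # e.g. 17 for UDP
    dst_port: Optional[int] = None
    action: str = "discard"            # discard | rate-limit | redirect | mark

    def matches(self, pkt: dict) -> bool:
        # Every specified component must match; unspecified components match anything.
        if self.dst_prefix and ip_address(pkt["dst"]) not in ip_network(self.dst_prefix):
            return False
        if self.src_prefix and ip_address(pkt["src"]) not in ip_network(self.src_prefix):
            return False
        if self.protocol is not None and pkt["proto"] != self.protocol:
            return False
        if self.dst_port is not None and pkt.get("dport") != self.dst_port:
            return False
        return True

# A rule dropping UDP/11211 (memcached reflection) traffic toward one customer prefix:
rule = FlowspecRule(dst_prefix="203.0.113.0/24", protocol=17, dst_port=11211)
attack = {"dst": "203.0.113.7", "src": "198.51.100.9", "proto": 17, "dport": 11211}
web    = {"dst": "203.0.113.7", "src": "198.51.100.9", "proto": 6,  "dport": 443}
print(rule.matches(attack), rule.matches(web))  # True False
```

Because every router that receives the rule evaluates it against the forwarding plane, the "unspecified matches anything" semantics are exactly what makes an under-specified rule dangerous.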

[Diagram: BGP flowspec rule distribution. A DDoS controller injects a flowspec rule; route reflectors propagate it over iBGP to edge routers A through D, which all begin dropping traffic. A malformed rule reaches every edge router within seconds.]

Flowspec is a powerful tool for DDoS mitigation because it lets a network deploy surgical traffic filters at every router in the network within seconds, without logging into each device individually. But this same speed and reach is precisely what makes it dangerous when something goes wrong.

CenturyLink/Level 3: The Backbone's Backbone

To appreciate the scale of this outage, you need to understand the role that CenturyLink (now Lumen Technologies) plays in the internet's architecture. Operating under AS3356, Level 3/CenturyLink is consistently ranked as one of the two or three largest autonomous systems by the number of downstream networks it serves. It is a Tier 1 network -- a provider that can reach every destination on the internet purely through settlement-free peering, without purchasing transit from anyone.

As of 2020, AS3356 carried an estimated 3.5% of all global internet traffic. Thousands of ISPs, enterprises, and content providers relied on CenturyLink's backbone to reach the rest of the internet. When AS3356 has a routing problem, it is not just CenturyLink's customers who are affected -- the ripple effects touch networks multiple hops away.

[Diagram: AS3356 as a Tier 1 transit hub. CenturyLink/Level 3 peers with AS2914 (NTT), AS1299 (Arelion), AS6762 (Telecom Italia), and AS6461 (Zayo), and provides transit to regional ISPs, enterprises, and government/911 networks. Thousands of networks depend on AS3356.]

The Sequence of Events

The outage began in the early morning hours of August 30, 2020 (UTC). Here is the timeline as reconstructed from public reports, FCC filings, and network monitoring data:

10:04 UTC -- The Bad Rule is Injected

CenturyLink's automated DDoS mitigation system generated a flowspec rule intended to block malicious traffic associated with an ongoing denial-of-service attack. The rule itself was malformed -- it contained a configuration error that made it far broader than intended. Instead of targeting specific attack traffic, the rule matched a vastly larger set of packets.

The flowspec rule was injected into the iBGP mesh via a route reflector, which propagated it to every router in CenturyLink's global network within seconds. This is standard flowspec behavior -- the same mechanism that makes it effective for rapid DDoS response also ensures that a bad rule reaches everywhere, fast.
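The exact contents of the malformed rule were never published, but the failure mode is easy to illustrate with a hypothetical example. Because unspecified flowspec components match everything, omitting a single component can widen a rule's scope enormously:

```python
from ipaddress import ip_address, ip_network

def matches(rule: dict, pkt: dict) -> bool:
    """A flowspec rule matches when every component it specifies matches;
    components the rule omits match *everything*."""
    if "dst_prefix" in rule and ip_address(pkt["dst"]) not in ip_network(rule["dst_prefix"]):
        return False
    if "proto" in rule and pkt["proto"] != rule["proto"]:
        return False
    if "dport" in rule and pkt["dport"] != rule["dport"]:
        return False
    return True

# Intended (hypothetical): drop UDP/11211 reflection traffic aimed at one prefix.
intended = {"dst_prefix": "203.0.113.0/24", "proto": 17, "dport": 11211}
# Hypothetical malformed variant: the destination prefix is missing, so the
# rule now matches UDP/11211 traffic to *any* destination -- far broader scope.
malformed = {"proto": 17, "dport": 11211}

elsewhere = {"dst": "192.0.2.50", "proto": 17, "dport": 11211}
print(matches(intended, elsewhere), matches(malformed, elsewhere))  # False True
```

One dropped component, and a filter scoped to a single /24 becomes a filter on an entire traffic class across the whole backbone.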

10:04-10:30 UTC -- Cascading Failures

As routers across the CenturyLink backbone installed the malformed flowspec rule, they began dropping legitimate traffic that matched the overly broad filter. But the situation was more complex than a simple traffic blackhole: the flowspec rule interacted with the routers' forwarding behavior in a way that caused additional instability.

The result was a feedback loop: the flowspec rule caused traffic drops, which caused BGP instability, which caused route withdrawals, which caused more traffic to shift to remaining paths, which overloaded those paths, which caused more BGP flaps.

10:30 UTC -- External Impact Becomes Visible

Within 30 minutes, monitoring systems worldwide detected the problem. Cloudflare, Kentik, ThousandEyes, and other network monitoring platforms reported massive packet loss traversing CenturyLink's backbone. The RIPE RIS route collectors showed BGP route instability for thousands of prefixes normally transiting AS3356.

Networks that relied on CenturyLink as their sole transit provider lost internet connectivity entirely. Networks with multiple upstreams experienced degraded performance as traffic failed over to alternate paths that were not provisioned to handle the full load.

11:00+ UTC -- 911 Services Go Down

The most alarming consequence was the impact on 911 emergency services. 911 systems in several US states -- including Idaho, Missouri, and Washington -- experienced outages or degraded service. The 911 infrastructure in many regions depended on CenturyLink's network for call routing and database lookups. With the backbone dropping traffic, emergency calls could not be completed.

This is what elevated the incident from a network engineering problem to a public safety crisis and ultimately triggered an FCC investigation.

~15:00 UTC -- Restoration

CenturyLink engineers identified the malformed flowspec rule as the root cause and began the process of removing it from the network. However, recovery was not instantaneous. Removing the rule required careful coordination to avoid triggering additional instability, and the BGP convergence process -- where thousands of sessions had to re-establish and routes re-converge -- took additional time. Full restoration of service took approximately five hours from the initial incident.

[Timeline, August 30, 2020 (UTC): 10:04 bad rule injected; 10:30 global impact detected; 11:00 911 systems affected; ~13:00 root cause identified; ~15:00 service restored. Roughly 5 hours of disruption affecting ~3.5% of global traffic and millions of users; multiple states lost 911 service.]

Technical Root Cause: The Malformed Flowspec Rule

The FCC's subsequent investigation and CenturyLink's own disclosures revealed the technical details. The flowspec rule was generated by an automated DDoS mitigation platform. The rule was intended to filter traffic associated with a specific attack, but a flaw in the rule's construction caused it to match a far wider range of traffic than intended.

The critical problem was that the flowspec rule, once installed on a router, caused the router to begin dropping packets that were part of the BGP control plane itself -- specifically, packets needed to maintain iBGP sessions between CenturyLink's own routers. This created the devastating feedback loop:

  1. The flowspec rule is distributed to all routers via iBGP
  2. Routers install the rule and begin filtering traffic
  3. The filter matches some iBGP packets, disrupting BGP sessions
  4. Disrupted BGP sessions cause route withdrawals
  5. Route withdrawals cause traffic shifts and more instability
  6. Engineers attempting to remove the rule face difficulty because the management plane itself is degraded

This last point is crucial: the very mechanism needed to remove the bad rule (BGP) was itself impaired by the rule. It is similar to a fire that has damaged the fire suppression system -- the tool you need to fix the problem is broken by the problem itself.
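The self-defeating dynamic in the steps above can be sketched as a toy state machine (purely illustrative; the states and messages are invented):

```python
def simulate(rounds: int) -> list:
    """Toy model: a flowspec rule that matches iBGP traffic causes the
    session carrying it to flap, which removes and then re-learns the rule."""
    session_up, rule_installed = True, False
    log = []
    for _ in range(rounds):
        if session_up and not rule_installed:
            rule_installed = True          # rule learned over iBGP and installed
            log.append("rule installed")
        elif session_up and rule_installed:
            session_up = False             # rule drops iBGP packets; session falls
            log.append("iBGP session down, routes withdrawn")
        else:
            rule_installed = False         # rule withdrawn with the dead session
            session_up = True              # session re-establishes...
            log.append("session re-established")
            # ...and on the next round the rule is re-learned: the cycle repeats

    return log

print(simulate(7))
```

The oscillation never damps on its own: every recovery step (session re-establishment) re-imports the very rule that caused the failure, which is why manual intervention was required.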

[Diagram: the feedback loop. 1. Flowspec rule installed → 2. legitimate traffic dropped → 3. iBGP sessions disrupted → 4. routes withdrawn → 5. traffic shifts overload remaining paths → 6. more BGP flaps. A self-reinforcing failure cycle.]

Global Impact: Measured in Lost Packets

Network monitoring companies provided extensive data on the outage's reach. Cloudflare reported that traffic from CenturyLink's network dropped by approximately 30% during the peak of the incident. Kentik's measurements showed packet loss rates exceeding 50% on paths transiting AS3356.

The impact was not limited to CenturyLink's direct customers. Because AS3356 serves as a transit provider for thousands of downstream networks, the outage created a "gravity well" in the internet's topology. Networks that used CenturyLink as one of multiple upstreams saw traffic fail over to their other providers, but those alternate paths often lacked the capacity to absorb the full load. This caused secondary congestion on networks that had no direct relationship with CenturyLink.

Major services affected included Cloudflare, Hulu, the PlayStation Network, Xbox Live, Feedly, and Discord, according to contemporaneous reports and Downdetector data.

Perhaps most critically, the August 2020 incident occurred during the COVID-19 pandemic, when internet reliability was more important than ever. Millions of people were working remotely, students were attending school online, and telehealth visits had replaced in-person doctor visits. The outage underscored just how dependent modern life had become on a small number of backbone networks.

The 911 Crisis

The failure of 911 emergency services was the most serious consequence and the primary reason the FCC launched a formal investigation. The US 911 system relies on a complex chain of telecommunications infrastructure to connect callers to the appropriate Public Safety Answering Point (PSAP). Many PSAPs use dedicated circuits or IP-based connections that traverse carrier backbone networks.

When CenturyLink's backbone began dropping traffic, the signaling and media paths for 911 calls were disrupted. In some cases, callers heard nothing when dialing 911. In other cases, calls connected but without the caller's location data (ANI/ALI -- Automatic Number Identification / Automatic Location Identification), which is essential for dispatching emergency responders.

The FCC determined that the impact on 911 was particularly egregious because CenturyLink serves as a Local Exchange Carrier (LEC) in many US markets, meaning it is not just a backbone provider but also the local telephone company responsible for last-mile connectivity. The concentration of both backbone and local access in a single provider meant there was no fallback path for 911 calls in affected areas.

The FCC Investigation

The FCC opened an investigation into the outage, focusing on the 911 impact. Their findings highlighted several systemic issues: the automated mitigation platform could deploy a network-wide flowspec rule without adequate validation of its scope, internal monitoring was slow to tie the rule to the escalating packet loss and BGP instability, and 911 call routing in affected areas lacked diverse paths that could survive a failure of CenturyLink's backbone.

CenturyLink agreed to a series of corrective actions, including improved change management procedures for flowspec rules, enhanced monitoring of flowspec rule effects, and investments in 911 network resilience.

Lessons for Network Engineering

The CenturyLink outage is a case study in how powerful automation tools can amplify human or software errors. Several lessons apply broadly to network operations:

1. Flowspec Needs Guardrails

Flowspec is essentially "firewall rules distributed at BGP speed." Its ability to instantly deploy filtering rules across an entire backbone is both its greatest strength and its greatest vulnerability. Best practices that emerged after this incident include validating rules before they are propagated, capping the number and scope of rules an automated controller can inject, rolling rules out to a small set of routers before deploying them network-wide, and ensuring that rules can never match the network's own control plane traffic.

2. Control Plane Protection is Non-Negotiable

One of the most important practices in network engineering is Control Plane Policing (CoPP) -- ensuring that traffic to and from the router's control plane (BGP sessions, routing protocol adjacencies, management access) is protected from data plane filtering. The CenturyLink outage demonstrated what happens when this protection fails: the network loses the ability to heal itself.

Modern router configurations should ensure that flowspec rules can never match control plane traffic, regardless of the rule's content. This is equivalent to a "do not filter the filters" principle.
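A pre-deployment check along these lines can be sketched as follows (a hypothetical helper, not a vendor feature): refuse any rule that could match BGP's own TCP port 179, or that could touch infrastructure address space.

```python
from ipaddress import ip_network

BGP_PORT = 179

def violates_copp(rule: dict, infrastructure_prefixes: list) -> bool:
    """Reject any flowspec rule that could match control-plane traffic:
    TCP/179 (BGP itself) or anything destined to router loopback space.
    Hypothetical "do not filter the filters" check, illustrative only."""
    proto = rule.get("proto")          # None means "any protocol" -- dangerous
    dport = rule.get("dport")          # None means "any port"
    could_match_tcp = proto in (None, 6)
    could_match_bgp_port = dport in (None, BGP_PORT)
    if could_match_tcp and could_match_bgp_port:
        return True                    # rule could sever BGP sessions
    dst = rule.get("dst_prefix")
    if dst is None:
        return True                    # no destination scope: could hit router IPs
    return any(ip_network(dst).overlaps(ip_network(p))
               for p in infrastructure_prefixes)

infra = ["198.51.100.0/24"]            # example router loopback range
ok  = {"proto": 17, "dport": 11211, "dst_prefix": "203.0.113.0/24"}
bad = {"proto": 6}                     # matches all TCP, including BGP
print(violates_copp(ok, infra), violates_copp(bad, infra))  # False True
```

Note that the check is deliberately conservative: a rule that merely *could* match control plane traffic is rejected, because the cost of a false negative (as this outage showed) is losing the ability to manage the network at all.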

3. Tier 1 Failures Have Outsized Impact

The internet's routing system is theoretically decentralized, but in practice a small number of Tier 1 networks carry a disproportionate share of global traffic. When AS3356, AS1299, or AS2914 has a problem, the effect is global. Networks that depend on a single Tier 1 provider have no fallback -- this is why multihoming (purchasing transit from multiple providers) is considered essential for any service that requires high availability.

4. Automated Systems Need Circuit Breakers

The flowspec rule was generated by an automated DDoS mitigation system. Automation is necessary at the scale of a Tier 1 backbone -- no human operator can manually respond to the volume and velocity of DDoS attacks these networks face. But automation must include circuit breakers: mechanisms that halt automated actions when unexpected outcomes are detected.

If the DDoS mitigation system had monitored the network's health after deploying its rule and detected the resulting packet loss and BGP instability, it could have automatically withdrawn the rule within minutes rather than hours.
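Such a circuit breaker can be sketched in a few lines. The deploy/withdraw/sample_loss hooks here are hypothetical stand-ins for real provisioning and telemetry APIs:

```python
def deploy_with_circuit_breaker(deploy, withdraw, sample_loss,
                                threshold=0.05, checks=5):
    """Deploy a mitigation rule, then watch a network-health metric; if
    packet loss exceeds the threshold, withdraw the rule automatically.
    Illustrative sketch, not a real mitigation platform's API."""
    deploy()
    for _ in range(checks):
        if sample_loss() > threshold:
            withdraw()                 # automated rollback, no human in the loop
            return "rolled back"
    return "kept"

# Simulated telemetry: loss spikes shortly after the (bad) rule is deployed.
state = {"deployed": False}
losses = iter([0.01, 0.02, 0.35])      # third sample shows heavy packet loss
result = deploy_with_circuit_breaker(
    deploy=lambda: state.update(deployed=True),
    withdraw=lambda: state.update(deployed=False),
    sample_loss=lambda: next(losses),
)
print(result, state["deployed"])  # rolled back False
```

The key design choice is that the rollback path must not depend on the thing being monitored -- which, in CenturyLink's case, would have meant out-of-band telemetry and a withdrawal mechanism that does not itself ride over the affected BGP sessions.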

Flowspec vs. RTBH: Two Approaches to DDoS Mitigation

The CenturyLink incident renewed discussion about the tradeoffs between flowspec and RTBH (Remotely Triggered Black Hole) routing, the two primary BGP-based DDoS mitigation techniques. With RTBH, a network announces the victim's address tagged with a blackhole community, causing upstream routers to drop all traffic to that destination -- crude, since it completes the attacker's goal of taking the victim offline, but simple and safe for the network itself. Flowspec, by contrast, can filter only the attack traffic and keep the victim online, but its expressiveness and automatic propagation make a bad rule far more dangerous.

Many networks now use a layered approach: RTBH for emergency response when an attack threatens network stability, and flowspec for more targeted mitigation once the attack traffic has been characterized. Some operators have moved flowspec filtering off their backbone routers entirely, instead steering traffic to dedicated scrubbing centers where flowspec rules are applied in a more controlled environment.
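The tradeoff between the two techniques can be illustrated with two toy filter functions (the victim address and attack signature, UDP/11211, are invented examples):

```python
VICTIM = "203.0.113.7"

def rtbh_drops(pkt: dict) -> bool:
    """RTBH: blackhole ALL traffic to the victim address -- attack traffic
    and legitimate traffic alike. The victim goes fully offline."""
    return pkt["dst"] == VICTIM

def flowspec_drops(pkt: dict) -> bool:
    """Flowspec: drop only traffic matching the attack signature
    (here, hypothetically, UDP to port 11211)."""
    return pkt["dst"] == VICTIM and pkt["proto"] == 17 and pkt["dport"] == 11211

attack = {"dst": VICTIM, "proto": 17, "dport": 11211}
web    = {"dst": VICTIM, "proto": 6,  "dport": 443}

print(rtbh_drops(attack), rtbh_drops(web))          # True True  (victim offline)
print(flowspec_drops(attack), flowspec_drops(web))  # True False (web survives)
```

RTBH protects the *network* at the victim's expense; flowspec protects the *victim* at the cost of complexity and, as this incident showed, blast radius when a rule is wrong.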

How This Compares to Other Major Outages

The CenturyLink flowspec outage belongs in the pantheon of major internet incidents alongside other BGP-related events. Each highlights a different vulnerability in the internet's routing infrastructure:

All of these incidents share a common theme: the internet's routing system is powerful and efficient at propagating changes, which means it is equally efficient at propagating mistakes. A single misconfiguration in a single autonomous system can have global consequences within seconds.

The Current State of Flowspec Safety

Since the 2020 incident, the networking industry has made progress on flowspec safety. Router vendors have added features like flowspec validation rules that prevent rules from matching control plane traffic, rate limits on the number of flowspec rules that can be installed, and logging and alerting when flowspec rules cause significant traffic drops.

RFC 8955 (published in 2020, coincidentally around the same time as the outage) updated the original flowspec specification with improved security considerations. Network operators have also developed community best practices, including the recommendation to always test flowspec rules in a lab environment, implement gradual rollouts, and maintain out-of-band management access that cannot be affected by data plane filtering.

Despite these improvements, flowspec remains a tool that requires careful handling. The fundamental tension between "deploy filters quickly to stop an attack" and "deploy filters carefully to avoid collateral damage" has no easy resolution. Every large network that uses flowspec must balance these competing pressures.

Investigate AS3356 Yourself

You can explore CenturyLink/Lumen's network in real time using the looking glass. Look up AS3356 to see its current BGP announcements, the number of prefixes it originates and transits, and its peering relationships with other major networks. You can also examine the AS paths for any IP address to see whether AS3356 appears in the transit path -- chances are it does for many destinations.
