Rogers Outage 2022: How a Routing Filter Took Down Canada

On July 8, 2022, one of Canada's largest telecommunications companies suffered the most consequential network failure in the country's history. Rogers Communications, operator of AS812, lost connectivity for approximately 12 million customers, a number equal to nearly a third of Canada's population, for over 19 hours. The outage took down not only consumer internet and wireless service, but also the Interac debit payment network that underpins most point-of-sale transactions in Canada, 911 emergency call routing in several provinces, and banking services for millions of Canadians. The root cause was a configuration change made during a routine network maintenance window in the early hours of the morning, and the engineers who needed to fix it found themselves locked out of the very network they were trying to repair.

Background: Rogers Network Architecture

Rogers operates one of Canada's three major national wireless and wireline networks. Its infrastructure spans a national fiber backbone connecting distribution nodes in major cities, regional access layers feeding neighborhoods and cell towers, and a core routing infrastructure handling traffic between Rogers customers and the broader internet.

The network uses a layered BGP architecture typical of large ISPs: iBGP within the Rogers autonomous system distributes routing information internally, while eBGP sessions connect Rogers to transit providers and peering partners at Canadian and international Internet Exchange Points. The distribution layer — the tier between the national backbone and regional access infrastructure — is the layer that failed.

The Timeline

July 8, 2022: Rogers Outage Timeline

  ~04:43 ET. Maintenance window: routing filter policy update applied. Change removes route acceptance filter on distribution layer routers.
  ~04:45 ET. Distribution layer flooded with full internet routing table. Routers exceed memory/FIB capacity; BGP sessions collapse.
  ~04:47 ET. National backbone loses reachability to access layer. ~12M wireless and wireline customers lose connectivity.
  ~05:00-08:00 ET. Engineers unable to access management systems remotely. Out-of-band access paths also depend on affected infrastructure.
  Mid-morning ET. Teams dispatched to physical equipment locations across Canada. Scale of outage requires simultaneous multi-site restoration.
  ~18:00-24:00 ET. Gradual restoration of services as routers recover region by region. Full service restoration declared early July 9. Total duration ~19+ hours.

Root Cause: A Routing Filter Removal

Rogers disclosed the root cause in its report to the Canadian Radio-television and Telecommunications Commission (CRTC). During a maintenance window in the early hours of July 8, a network engineer applied a configuration update to the routing filter policy on routers in the distribution layer of Rogers' network.

Routing filters, typically implemented with route maps or prefix lists, are policies applied to BGP sessions that control which routes a router will accept. A common configuration for internal distribution layer routers is to accept only a filtered set of routes from the backbone (typically the specific prefixes needed to serve the access layer below them) rather than the full internet routing table. This filter serves as both a performance optimization (distribution routers don't need to hold 900,000+ internet routes) and a safety control (it prevents the routers from being overwhelmed by unexpected route volume).
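To make the idea of an inbound route filter concrete, here is a minimal Python sketch of allow-list filtering. It is not Rogers' configuration or any vendor's syntax: the aggregates in ALLOWED_AGGREGATES and the function names accept_route and apply_inbound_filter are hypothetical, chosen only to show a router accepting prefixes inside a known set and dropping everything else.

```python
from ipaddress import ip_network

# Hypothetical allow-list: internal aggregates the access layer below needs.
ALLOWED_AGGREGATES = [ip_network("10.0.0.0/8"), ip_network("192.0.2.0/24")]


def accept_route(prefix, allowed=ALLOWED_AGGREGATES):
    """Return True if the announced prefix sits inside an allowed aggregate."""
    net = ip_network(prefix)
    # subnet_of() raises TypeError across IP versions, so compare versions first.
    return any(
        net.version == agg.version and net.subnet_of(agg) for agg in allowed
    )


def apply_inbound_filter(announcements, allowed=ALLOWED_AGGREGATES):
    """Keep only the announcements the filter accepts, preserving order."""
    return [p for p in announcements if accept_route(p, allowed)]


# Two internal prefixes and two unrelated internet prefixes arrive from the backbone.
announced = ["10.20.0.0/16", "192.0.2.0/24", "8.8.8.0/24", "203.0.113.0/24"]
accepted = apply_inbound_filter(announced)
print(accepted)
```

Only the two prefixes covered by the allow-list survive. Deleting the filter is equivalent to accepting every entry in announced, which is harmless with four routes and very different with a full internet table.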

The maintenance change inadvertently removed this filter. With no inbound route filter in place, the distribution layer routers began accepting the full internet routing table from backbone routers. The resulting flood of BGP routes — hundreds of thousands of entries — exceeded the memory and forwarding table (FIB) capacity of the distribution layer hardware. The routers became resource-exhausted, their BGP sessions destabilized, and connectivity between the backbone and the access layer collapsed.
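The scale of the mismatch is easier to see as arithmetic. The sketch below is a rough model, not a measurement: the 900,000 figure is the approximate full-table size quoted above, the 500-route figure comes from this article's before/after diagram, and FIB_CAPACITY is a purely hypothetical hardware limit picked for illustration. The filter is simplified to a cap on how many routes get through.

```python
from typing import Optional

FULL_TABLE = 900_000      # approximate full-table size cited in the article
FILTERED_ROUTES = 500     # internal route count from the article's diagram
FIB_CAPACITY = 128_000    # hypothetical hardware limit, for illustration only


def routes_offered_to_fib(offered: int, filter_limit: Optional[int]) -> int:
    """Routes the router tries to install; filter_limit=None means no filter."""
    return offered if filter_limit is None else min(offered, filter_limit)


def fib_utilization(installed: int, capacity: int) -> float:
    """Fraction of forwarding-table capacity in use (above 1.0 is overflow)."""
    return installed / capacity


before = fib_utilization(
    routes_offered_to_fib(FULL_TABLE, FILTERED_ROUTES), FIB_CAPACITY
)
after = fib_utilization(routes_offered_to_fib(FULL_TABLE, None), FIB_CAPACITY)
print(f"with filter: {before:.1%} of FIB, without filter: {after:.0%} of FIB")
```

Under these assumed numbers the router goes from using a fraction of a percent of its table to being asked for several times its capacity, which is the resource exhaustion described above.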

Figure: Rogers network before and after filter removal. Before (filter in place): the backbone sends a filtered feed of ~500 internal routes to the distribution layer, which serves the access layer. After (filter removed): the backbone sends ~900K routes, the distribution layer suffers FIB overflow and crashes, and 12M customers lose connectivity when the access layer is isolated.

The Self-Reinforcing Lockout

Once the distribution layer collapsed, Rogers engineers discovered the same problem that Facebook's engineers encountered in 2021: the tools needed to fix the network depend on the network being up. Remote management access, network monitoring systems, ticketing tools, and internal communications platforms all relied on the same Rogers infrastructure that had failed. Engineers were locked out of their own network.

Unlike a typical outage where an engineer can SSH into a router from a laptop at home, this failure required physical presence at equipment racks. The scale made this enormously slow: Rogers' distribution layer comprises dozens of nodes across Canada, from Vancouver to Halifax. Restoring service required technicians to visit each site, gain physical console access, and manually reconfigure the routing policies — a process that could not be parallelized quickly enough to avoid an extended outage.

National Impact: Interac, 911, and Banking

The social and economic consequences of the Rogers outage extended far beyond internet and mobile service disruption. Several critical national systems depend on Rogers' network for connectivity:

Interac debit payments. Canada's national debit payment network, Interac, routes transactions through telecommunications infrastructure. A significant portion of Interac's transaction routing used Rogers' network. When Rogers failed, point-of-sale terminals across the country could not process debit payments — affecting grocery stores, gas stations, pharmacies, and virtually every retail environment. Many businesses that relied on Interac were forced to accept cash only or turn customers away. The precise scope was not fully disclosed publicly, but the disruption was nationwide.

911 emergency calls. In several Canadian provinces, 911 call routing infrastructure runs over Rogers' network. The failure meant that some Rogers wireless subscribers attempting to call 911 reached a busy signal or failed to connect. Governments in multiple provinces issued public advisories telling residents to use landlines or competitor networks if they needed emergency services. The failure of 911 infrastructure is treated with particular seriousness by regulators; Rogers faced specific questions about this in the CRTC proceedings that followed.

Banking and financial services. Several Canadian banks reported service disruptions for online banking and ATM networks during the outage. The extent varied by institution depending on their routing diversity and dependency on Rogers infrastructure.

CRTC Investigation and Regulatory Fallout

The Canadian Radio-television and Telecommunications Commission (CRTC) — Canada's telecommunications regulator — launched an immediate investigation. Rogers was ordered to appear before the CRTC and explain the outage, its causes, and remediation plans. The CRTC's concerns centered on three areas:

  1. Critical infrastructure resilience. How can a single configuration change take down a national network? What redundancy existed and why did it not protect against this failure?
  2. 911 service reliability. Telecommunications regulations impose strict obligations on carriers regarding emergency services. Failure of 911 routing triggered potential regulatory sanctions and requirements for network architecture changes.
  3. Critical sector interdependencies. The Interac failure exposed how deeply a single carrier's network is embedded in national financial infrastructure. Regulators began examining whether critical systems should require routing diversity across multiple carriers.

The CRTC imposed new emergency preparedness requirements on Rogers and the other major Canadian carriers, including requirements for network-to-network interconnection agreements that would allow traffic to be rerouted to competitor networks during outages.

Lessons: Staged Rollouts and Out-of-Band Management

Change management and staged rollouts

The filter removal was applied network-wide during a single maintenance window. A staged rollout — applying the change to one distribution node, monitoring for 15 minutes, then proceeding — would have caught the problem after the first node crashed rather than after all of them crashed. This is a standard principle in network change management that clearly was not followed. Post-incident, Rogers and other carriers have strengthened their change management procedures to require canary deployments and mandatory soak periods for any routing policy changes.
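A staged rollout of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Rogers' tooling: staged_rollout, apply_change, and is_healthy are hypothetical names, the node names are made up, and the 900-second default simply mirrors the 15-minute soak mentioned above. The sleep function is injectable so the simulated run does not actually wait.

```python
import time


def staged_rollout(nodes, apply_change, is_healthy, soak_seconds=900,
                   sleep=time.sleep):
    """Apply a change one node at a time, halting at the first unhealthy node.

    Returns (nodes_changed, failed_node), where failed_node is None on success.
    """
    changed = []
    for node in nodes:
        apply_change(node)
        changed.append(node)
        sleep(soak_seconds)  # soak period before judging the node's health
        if not is_healthy(node):
            return changed, node  # stop: remaining nodes are never touched
    return changed, None


# Simulated run: the change breaks every node it is applied to.
broken = set()


def remove_filter(node):
    broken.add(node)


def node_is_up(node):
    return node not in broken


touched, failed = staged_rollout(
    ["dist-tor-1", "dist-mtl-1", "dist-van-1"],  # hypothetical node names
    remove_filter,
    node_is_up,
    sleep=lambda seconds: None,  # skip the real wait in this simulation
)
print(touched, failed)
```

In the simulation only the first node is changed before the rollout halts, so the blast radius is one distribution node instead of the whole layer.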

Out-of-band management networks

Out-of-band (OOB) management is a physically separate network — typically a dedicated cellular connection, serial console server, or management-plane-only MPLS VPN — that allows engineers to access routers even when the production data plane is completely down. The Rogers outage demonstrated that their OOB management paths were insufficient in scope, apparently relying in part on the same infrastructure that failed. Proper OOB architecture requires independent physical paths (different fiber, different carrier, or cellular) that do not transit the equipment under management.
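One way to reason about OOB independence is to list what each management path transits and check it against a failure domain. The Python sketch below assumes a deliberately simple model in which a path is just a set of named dependencies; the path names, component names, and the usable_paths function are all hypothetical and do not describe Rogers' actual management network.

```python
from typing import Dict, List, Set


def usable_paths(paths: Dict[str, Set[str]], failed: Set[str]) -> List[str]:
    """Return the management paths that share nothing with the failed set."""
    return sorted(
        name for name, deps in paths.items() if deps.isdisjoint(failed)
    )


# Hypothetical inventory: each path maps to the components it transits.
management_paths = {
    "in-band-ssh": {"backbone", "distribution-layer"},
    "mgmt-vpn-over-production": {"distribution-layer", "mgmt-vpn"},
    "cellular-console": {"other-carrier-lte", "console-server"},
}

survivors = usable_paths(management_paths, failed={"distribution-layer"})
print(survivors)
```

In this toy inventory only the path that avoids the distribution layer entirely survives. A management VPN that rides the production network looks separate on paper but fails together with it.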

Routing filter governance

The critical routing filter that was removed should have been protected by configuration management tooling that prevents its removal without explicit multi-party authorization. Infrastructure-as-code practices, mandatory peer review for routing policy changes, and automated pre-flight simulation (verifying that a proposed change would not cause route flooding before applying it) are all standard techniques that would have caught this specific failure mode.
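A pre-flight check of the kind described here can be approximated by replaying a sample route feed against the proposed policy and comparing the accepted count to a capacity budget. The sketch below is a simplified assumption, not a real simulation tool: preflight, count_accepted, the 80% utilization budget, and the tiny feed and capacity are all hypothetical, and a proposed policy of None stands in for a change that removes the filter.

```python
from ipaddress import ip_network


def count_accepted(feed, allowed):
    """Count feed prefixes accepted; allowed=None models a missing filter."""
    if allowed is None:
        return len(feed)
    aggregates = [ip_network(a) for a in allowed]
    accepted = 0
    for prefix in feed:
        net = ip_network(prefix)
        if any(net.version == a.version and net.subnet_of(a) for a in aggregates):
            accepted += 1
    return accepted


def preflight(feed, proposed_allowed, fib_capacity, max_utilization=0.8):
    """Report whether the proposed policy stays within the FIB budget."""
    accepted = count_accepted(feed, proposed_allowed)
    budget = int(fib_capacity * max_utilization)
    return {"accepted": accepted, "budget": budget, "safe": accepted <= budget}


# Toy feed: two internal prefixes plus eight unrelated ones.
feed = ["10.1.0.0/16", "10.2.0.0/16"] + [
    f"203.0.113.{i * 16}/28" for i in range(8)
]

keep_filter = preflight(feed, ["10.0.0.0/8"], fib_capacity=5)
drop_filter = preflight(feed, None, fib_capacity=5)
print(keep_filter, drop_filter)
```

The change that keeps the filter passes, while the change that removes it is flagged before it reaches a router. A real implementation would replay an actual route feed against vendor policy semantics rather than this allow-list model.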

BGP Perspective: What the World Saw

From external BGP route collectors, the Rogers outage manifested as a sudden reduction in prefix visibility from AS812. Some prefixes remained reachable via alternate paths (Rogers has peering at multiple IXPs and some prefixes may have been reachable via transit), but a large fraction of Rogers' address space became unreachable from the perspective of external networks as the distribution layer failure propagated. BGP monitoring services recorded the event in real time.

The outage differs from a classic BGP withdrawal event like the Facebook 2021 outage, where routes were explicitly withdrawn, because the Rogers failure was an internal collapse. Where backbone routers were still announcing external routes to Rogers, the internal infrastructure needed to actually deliver packets to customers was non-functional. From outside Rogers, those prefixes were visible but unreachable, a more difficult failure mode to diagnose from BGP data alone.
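The difference between a withdrawn prefix and one that is visible but unreachable only shows up when control-plane and data-plane observations are combined. The following Python sketch assumes two boolean inputs per prefix (seen at a route collector, and answering active probes); the labels, the classify_prefix and summarize functions, and the sample observations are hypothetical and use documentation prefixes rather than real Rogers address space.

```python
def classify_prefix(visible_in_bgp: bool, probes_succeed: bool) -> str:
    """Combine a control-plane and a data-plane signal into a failure label."""
    if visible_in_bgp and probes_succeed:
        return "healthy"
    if visible_in_bgp:
        # Route is announced, but packets are not delivered behind it.
        return "announced-but-unreachable"
    if probes_succeed:
        # Possibly a covering route or a gap in collector coverage.
        return "reachable-without-visible-route"
    return "withdrawn"


def summarize(observations):
    """Count prefixes per label from {prefix: (visible, reachable)}."""
    counts = {}
    for visible, reachable in observations.values():
        label = classify_prefix(visible, reachable)
        counts[label] = counts.get(label, 0) + 1
    return counts


# Made-up observations on documentation prefixes.
observations = {
    "192.0.2.0/24": (True, True),
    "198.51.100.0/24": (True, False),
    "203.0.113.0/24": (False, False),
}
summary = summarize(observations)
print(summary)
```

BGP data alone supplies only the first boolean, which is why an internal collapse behind still-announced routes is hard to see from route collectors without active probing.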

Regulatory Aftermath and Industry Changes

The CRTC's investigation concluded that Rogers had violated its regulatory obligations regarding the reliability of emergency services. Beyond the immediate sanctions, the outage accelerated several regulatory and industry changes:

Mandatory network interconnection. The CRTC moved to require that major carriers maintain bilateral roaming and interconnection agreements that would allow customers to use competitor networks for voice calls (including 911) when their primary carrier's network fails. This is similar to emergency roaming agreements that exist in other countries but had not been mandated in Canada.

Critical infrastructure resilience standards. The outage prompted a broader examination of which national systems — payment processing, emergency services, health care networks — depend on single carrier infrastructure without adequate redundancy. The Interac failure, in particular, became a case study in systemic risk from carrier concentration.

Change management audits. The CRTC required Rogers to submit detailed reports on its change management procedures and demonstrate that improvements had been implemented. Other carriers were asked to review their own procedures for managing routing policy changes, particularly those that could affect routing table capacity.

The Rogers outage, like the Optus outage in Australia in 2023, demonstrates that even mature national carriers with decades of operational experience can suffer catastrophic failures from routine maintenance mistakes — and that the consequences of such failures extend far beyond degraded internet access into critical national infrastructure.

Explore It Live

Examine Rogers' current network state and the scale of infrastructure involved in this outage:

Enter any Canadian IP address or AS number in the lookup tool to see real-time BGP routing data, including the upstream transit paths and peering relationships that determine how routing failures propagate. The mechanics of BGP route leaks and RPKI prefix origin validation are useful related background, although the Rogers failure was an internal filtering error rather than an origin-validation problem, so the safeguards most relevant to it are the route filtering and change controls described above.
