Top 10 outages of 2022


The most significant network and service outages of 2022 had far-reaching consequences. Flights were grounded, virtual meetings cut off, and communications hindered.

The culprits that took down major infrastructure and service providers were varied, too, according to analysis from ThousandEyes, a Cisco-owned network intelligence company that tracks internet and cloud traffic. Maintenance-related errors were cited more than once: Canadian carrier Rogers Communications experienced a massive nationwide outage that was traced to a maintenance update, and a maintenance script error caused problems for software maker Atlassian.

BGP misconfiguration also showed up in the top outage reports. Border Gateway Protocol (BGP) tells Internet traffic what route to take, but if the routing information is incorrect, traffic can be diverted to an improper route, which is what happened to Twitter. (Read more about US and worldwide outages in our weekly internet health check.)

Here are the top 10 outages of the year, organized chronologically.

British Airways lost online systems: Feb. 25

British Airways’ online services were inaccessible for hours on Feb. 25, causing hundreds of flight cancellations and interrupting airline operations. Flights couldn’t be booked, and travelers couldn’t check in to flights electronically. The airline was reportedly forced to return to paper-based processes when its online systems became inaccessible, and the impact was felt globally. “Our monitoring showed that the network paths to the airline’s online services (and servers) were reachable, but that the server and site responses were timing out,” ThousandEyes said in its outage analysis, which blamed unresponsive application servers – rather than a network issue – for the outage.

“The nature of the issue, and the airline’s response to it, suggests the root cause is likely to be with a central backend repository that multiple front-facing services rely on. If that is the case, this incident may be a catalyst for British Airways to re-architect or deconstruct their backend to avoid single points of failure and reduce the likelihood of a recurrence. Equally possible, however, is that the chain of events that led to the outage is a rare occurrence and can be mostly controlled in future. Time will tell,” ThousandEyes said.
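The distinction ThousandEyes draws here, between a network path that is reachable and an application that does not answer, can be illustrated with a two-stage probe. The following is a minimal Python sketch, not ThousandEyes' method: the function names `classify` and `probe`, the result labels, and the five-second timeout are all assumptions made for illustration.

```python
import http.client
import socket
from typing import Optional


def classify(tcp_connected: bool, http_status: Optional[int]) -> str:
    """Map raw probe observations to a coarse failure domain (labels are illustrative)."""
    if not tcp_connected:
        return "network"              # could not even open a TCP connection
    if http_status is None:
        return "application-timeout"  # path is fine, but the server never answered
    if http_status >= 500:
        return "application-error"    # server answered with a 5xx status
    return "ok"


def probe(host: str, port: int = 443, path: str = "/", timeout: float = 5.0) -> str:
    """Probe a site in two stages: TCP reachability first, then an HTTPS request."""
    # Stage 1: network layer. Can a TCP connection be opened at all?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return classify(False, None)

    # Stage 2: application layer. Does the server respond in time?
    conn = http.client.HTTPSConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status: Optional[int] = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Timeouts and broken responses are both treated as "no answer" here.
        status = None
    finally:
        conn.close()
    return classify(True, status)
```

A result of "application-timeout" would correspond to the pattern described above, where the path to the servers is reachable but responses time out. A single vantage point cannot establish global impact, so a real monitor would run such probes from many locations.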

Twitter felled by BGP hijack: March 28

Twitter was unavailable for some users for about 45 minutes on March 28 after JSC RTComm.RU, a Russian Internet and satellite communications provider, improperly announced one of Twitter’s prefixes (104.244.42.0/24). As a result, traffic destined for Twitter was rerouted for some users and failed. Access to Twitter’s service was restored for impacted users after RTComm’s BGP announcement was withdrawn. ThousandEyes notes that BGP misconfigurations can be used to block traffic in a targeted way; however, it’s not always easy to tell whether a given incident is accidental or intentional.

“We know that the March 28th Twitter event was caused by RTComm announcing themselves as the origin for Twitter’s prefix, then withdrawing it. While we don’t know what led to the announcement, it’s important to understand that accidental misconfiguration of BGP is not uncommon, and given the ISP’s withdrawal of the route, it’s likely that RTComm did not intend to cause a globally impacting disruption to Twitter’s service. That said, localized manipulation of BGP has been used by ISPs in certain regions to block traffic based on local access policies,” ThousandEyes said in its outage analysis.
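The event described above, in which an unexpected network announces itself as the origin for someone else's prefix, is the kind of condition an origin check can flag. The sketch below is a simplified illustration in Python and makes several assumptions: the `Announcement` type and `find_suspect_announcements` function are hypothetical, the announcements are assumed to come from some route feed not shown here, and the AS numbers are taken from the range reserved for documentation rather than being the real AS numbers of the parties involved.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Announcement:
    """One observed BGP announcement: a prefix and the AS that originated it."""
    prefix: str
    origin_asn: int


# Hypothetical table of prefixes we own and the origin AS we expect for each.
# 64496 is a documentation-only AS number, not a real operator's.
EXPECTED_ORIGINS: Dict[str, int] = {"104.244.42.0/24": 64496}


def find_suspect_announcements(
    announcements: Iterable[Announcement],
    expected: Dict[str, int] = EXPECTED_ORIGINS,
) -> List[Announcement]:
    """Return announcements that cover our address space from an unexpected origin."""
    suspects = []
    for ann in announcements:
        seen = ipaddress.ip_network(ann.prefix)
        for prefix, asn in expected.items():
            owned = ipaddress.ip_network(prefix)
            # Flag the same prefix, or a more specific one inside it,
            # when it is originated by an AS other than the expected one.
            if (
                seen.version == owned.version
                and seen.subnet_of(owned)
                and ann.origin_asn != asn
            ):
                suspects.append(ann)
                break
    return suspects
```

A check like this only reports that an origin is unexpected. As the analysis above points out, it cannot say whether the announcement was accidental or intentional.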

One way for organizations to deal with route leaks and hijacks is to monitor for…

