These major outages from Q3 of 2018 might have been avoided with a solid, frequently tested Disaster Recovery (DR) plan.
High availability has become a non-negotiable aspect of business. Having a solid DR plan that is tested regularly means that if an IT outage does occur, your organization can recover within minutes, avoiding the customer distrust and anger that come with more protracted outages. The notable outages among financial institutions and social media platforms this past quarter led customers to question how the affected companies handle their personal data, as well as their own reliance on any organization that can’t maintain business continuity. Downtime, even when it lasts under an hour (as with some of the outages below), is no longer considered reasonable. As we enter the last quarter of 2018, here are six major reasons to reevaluate your DR plan, run a DR test, and feel confident about your organization’s DR solution before 2019:
1. Technical Glitch Affects Disney Parks on Both Coasts
When: July 6, 2018
Duration: 4 hours
What Happened: A network outage caused problems for summer visitors to Disney theme parks in both Orlando and Anaheim. Visitors were unable to access the Walt Disney World website, the My Disney Experience app (the Disneyland app), the Play Disney Parks app, or the FastPass/MaxPass system during the outage. ESPN, owned by Disney, was also affected by the outage, causing the site to display out-of-date scores for games.
2. Customers Hit Out at Britain’s TSB Bank After Second IT Outage
When: July 10, 2018 and throughout September 2018
Duration: 4 hours and intermittently thereafter
What Happened: The July outage came just as TSB was returning to normal after its April attempt to migrate its customer base to a new IT system. That migration left thousands of customers locked out of their accounts, with some unable to access their accounts for over a week or to make vital payments, and others falling victim to fraud. Intermittent downtime and outages continued to plague TSB throughout the quarter, and CEO Paul Pester ultimately stepped down in early September amid customer outrage over the ongoing IT difficulties.
I’m a software engineer, I’d love to know how you can have such awful network uptime for your customers… Based on a quick analysis I’d guess your database is down, but do you *seriously* not have a failover server? I’m strongly tempted to leave TSB for another bank.
— Phil Gibson (@imphilgibsonok) September 28, 2018
3. Mastercard Customers Suffer Outages Around the World
When: July 12, 2018
Duration: 1.5 hours
What Happened: Although Mastercard has apologized for an outage that caused customer payments to be blocked around the world, it has declined to share the cause of this extensive outage. Because the incident came on the heels of a major Visa outage last quarter, regulators are starting to intensify scrutiny of these payment systems, and cryptocurrency supporters are having a field day.
4. Instagram Suffers Global Outage
When: July 13, 2018
Duration: 45 minutes
What Happened: Social media users made a mad rush to Twitter when Instagram’s site and app went down worldwide. The most recent October 3 Instagram outage, which we’ll cover in our Q4 outages post, came just days after Instagram’s co-founders resigned from Facebook, and a day after a new boss was appointed. An estimated 500 million people who access the site daily are affected by Instagram outages.
5. Outage Causes American Airlines Flight Delays Nationally
When: July 29, 2018
Duration: 40 minutes
What Happened: A connectivity issue at one of American Airlines’ data centers, between its main operating system and its dispatch operation, resulted in a full nationwide ground stop for 40 minutes. American Airlines customers weren’t the only passengers inconvenienced by delays this quarter, though. A Delta IT systems outage resulted in a similar ground stop, Southwest Airlines had a domestic ticket counter outage at BWI, and an IT system outage at a British Airways supplier resulted in flights being delayed by up to 15 hours.
Traffic delays to Orlando, arrived 30 minutes later than planned, my luggage is special & got selected to be searched & now American Airlines has a systemwide computer system outage!!! I think the Universe wants me to extend my Florida vacay!!! 😎
— Amy Pope Fitzgerald (@live2runbfree) July 29, 2018
6. SunTrust Patches Glitch That Knocked Out Mobile, Online Banking
When: September 15, 2018
Duration: 2 days
What Happened: A routine system upgrade resulted in an outage lasting more than two days. SunTrust provided no additional details about the outage or its cause, except to say that it was not a cyberattack or a weather-related problem (with Hurricane Florence raging at the time of the outage). The bank announced that it will refund all ATM fees and surcharges, as well as late fees related to delayed payments, due to the downtime, but has yet to alleviate customer concern about the safety of their personal data.
Going on 2 days with online access to @SunTrust still down. The fact that they can’t roll back a “normal system upgrade” doesn’t give me warm fuzzies about their tech team’s ability to safeguard my information. @AskSunTrust #SunTrust
— Brian Gass (@gassmeister) September 17, 2018