FAA NOTAM outage should scare all of us into (finally) testing our DRPs

The U.S. air travel system experienced its worst meltdown in years last week after a database in the Notice to Air Missions (NOTAM) service – a system designed to advise pilots of conditions at their destination airports before they are allowed to depart – failed.

 While initial fears that this was a cyberattack were soon quashed, the actual root cause could prove to be just as disturbing – because it suggests structural weaknesses in how the information technology needs of the world’s busiest airspace are being met. 

One bad file

The Federal Aviation Administration has confirmed that the outage was triggered by an employee error that corrupted a file in the NOTAM system's database. Worse, the backup database that would have been used to restore service contained the same corrupt file.

This should make IT practitioners cringe – I know I did. The corrupt backup strongly suggests that the FAA's disaster recovery plan (DRP) had not been tested in the lead-up to this failure. After all, the only way IT leaders could have known their backup was unusable was to walk through a test-recovery scenario.

In this case, the undiscovered corrupt backup guaranteed a much longer recovery in the event of an outage – which is precisely what played out.
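What does "walking through a test-recovery scenario" look like in practice? Here is a minimal sketch, assuming the data lives in a SQLite database (nothing here describes the FAA's actual NOTAM stack). It restores a backup into a throwaway scratch database, runs SQLite's built-in `PRAGMA integrity_check`, and then runs an application-level sanity query. The function name `verify_backup`, the table name `notices`, and the `min_rows` threshold are hypothetical choices for illustration.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def verify_backup(backup_path: str, min_rows: int = 1) -> bool:
    """Restore a SQLite backup into a scratch database and check it.

    Returns True only if the restored copy passes SQLite's integrity
    check and contains at least `min_rows` rows in the (hypothetical)
    'notices' table. Any read or restore error counts as a failed drill.
    """
    # Open the backup read-only so the drill can never modify it
    # (and never silently creates an empty file at a missing path).
    uri = Path(backup_path).resolve().as_uri() + "?mode=ro"
    with tempfile.TemporaryDirectory() as scratch:
        restored_path = os.path.join(scratch, "restored.db")
        src = dst = None
        try:
            src = sqlite3.connect(uri, uri=True)
            dst = sqlite3.connect(restored_path)
            # The actual restore: copy every page of the backup into scratch.
            src.backup(dst)
            # Structural check of the restored file.
            status = dst.execute("PRAGMA integrity_check").fetchone()[0]
            if status != "ok":
                return False
            # Application-level check: the data we depend on is really there.
            (count,) = dst.execute("SELECT COUNT(*) FROM notices").fetchone()
            return count >= min_rows
        except sqlite3.DatabaseError:
            # Unreadable, corrupt, or missing backups all fail the drill.
            return False
        finally:
            # Close connections before the scratch directory is deleted.
            for conn in (src, dst):
                if conn is not None:
                    conn.close()
```

The key design choice is that the check restores the backup instead of merely confirming that a backup file exists. A backup that was faithfully copied from an already-corrupt source will pass a "file exists" check but should fail this one. Run on a schedule, a drill like this can surface a bad backup long before an outage does.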

In the weeks and months ahead, FAA officials will undoubtedly be held accountable as the U.S. Department of Transportation moves ahead with its investigation. And rightly so: this colossal debacle sidelined thousands of flights in the U.S. and slowed service from dozens of countries worldwide.

But we don’t have to be working in the aviation industry to appreciate why understanding this event is so critical to any organization that uses technology.


The universal truth

We all have (or should have) DRPs – or business continuity plans (BCPs) – in one form or another. Yet in a report from iLand and Zerto, only 54% of survey respondents said they had a full, company-wide DRP in place.

That's a frightening figure. In 2023, we're long past the point where technology is a luxury: it is a critical pillar of every organization's ability to deliver, compete, and survive. Organizations that choose to fly without a DRP are playing with fire.

With this in mind, we can all do more to build the potential for failure into our technology and business planning processes, and to make the DRP a core piece of our IT strategy and culture.

A well-formed DRP identifies what can fail, how it can fail, and in what context. But simply documenting failure modes isn't enough. Crucial as that documentation is, the plan can't be something that gets tossed onto a shelf and dusted off every once in a while, if at all.

Any reasonable DRP must also address recovery: the steps the organization must take to ensure continuity in the event of a natural, accidental, or human-caused disaster. The recovery component should identify specific accountabilities, communication flows, and checkpoints. Ideally, it will also include a detailed schedule for real-world recovery testing, with recommended intervals, so that key leaders, staff, contractors, and other stakeholders are prepared to execute the plan if and when an actual disaster occurs.
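One way to keep that testing schedule from gathering dust is to record it as data and check it automatically. The sketch below is hypothetical: the `RecoveryTest` fields (system, accountable owner, test interval, date of last drill), the `overdue_tests` helper, and the sample systems and intervals are illustrative assumptions, not a standard DRP format.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class RecoveryTest:
    system: str                # what gets restored during the drill
    owner: str                 # who is accountable for running it
    interval: timedelta        # how often the drill should be repeated
    last_run: Optional[date]   # None means it has never been tested


def overdue_tests(plan: List[RecoveryTest], today: date) -> List[RecoveryTest]:
    """Return drills that have never run or whose interval has elapsed."""
    return [
        t for t in plan
        if t.last_run is None or t.last_run + t.interval <= today
    ]


if __name__ == "__main__":
    # Hypothetical plan entries, for illustration only.
    plan = [
        RecoveryTest("orders-db", "DBA on call", timedelta(days=90), date(2022, 12, 1)),
        RecoveryTest("file-server", "IT ops lead", timedelta(days=180), date(2022, 3, 15)),
        RecoveryTest("payroll-app", "Finance IT", timedelta(days=365), None),
    ]
    for t in overdue_tests(plan, date(2023, 1, 18)):
        print(f"{t.system}: overdue, owner {t.owner}")
```

Treating "never tested" as overdue is deliberate: an untested backup is exactly the kind of blind spot this article warns about. Pairing each drill with a named owner also keeps the accountability piece of the plan visible.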

The FAA learned the hard way that recovery testing, like a good insurance policy, can save heartache later on. Any organization runs an elevated risk of losing revenue, alienating customers, and damaging the brand if it fails to have the appropriate disaster response resources and capabilities in place. 


The best advice I ever received

Earlier in my career, I was an IT project manager and application development support lead. My mentor at the time said we had two choices when considering how we wanted to support the business areas that relied on us: We could either plan to fail, or we could fail to plan.

The FAA now finds itself on the wrong side of my mentor’s guidance. It needs a better plan, and it needs to test that plan to ensure it’s ready for the inevitable. That means looking at the 30-year-old system at the center of this debacle and fast-tracking plans to update it to current-century standards – and doing so not six years from now, as currently planned, but now.

It also has implications for the rest of us, in any organization and in any industry, because we need to test our own plans, too. Eventually, we'll all find ourselves needing to recover from an outage or an attack, and no one wants to find out the hard way that their backups can't be restored.

If you’re not sure where your own DRP stands, we’re always here to help you get started.

Last modified on Wednesday, 18 January 2023 14:13