Fear of Flying

Summary: Flying is incredibly safe because accidents are analyzed. How about firmware engineering?

The last few months haven't been great for the aviation industry. A spike in accidents has many people alarmed.

My wife is one of them. While she's not afraid to fly, she is a somewhat fearful flyer. After one particularly bumpy approach and landing in Denver, Marybeth said she'd never go to, or via, that city again. To date, she hasn't.

I fly a lot, having logged 3 or 4 million miles over the decades. I've heard people scream and pray, have had strangers in the next seat grip my hand in turbulence, and have been on flights where a passenger passed away, though not as a result of the flying. On a couple of trips we've landed with everyone in the crash position; once I watched flames leaping from a wheel, and on another memorable journey the pilot did four go-arounds before getting the plane down on the fifth try. During the first Gulf War (I'm starting to lose count of them), on a non-stop from Munich to London, the plane went into a crash dive and made an unexpected and abrupt landing in Dusseldorf, where we were marched through metal detectors, Americans getting additional scrutiny, before reboarding. No explanations were given.

But those cases are a handful compared to the vast majority of flights I've been on, most of which are characterized by boredom and fatigue. Even the exciting ones ended with us shuffling out the jetway without a single injury, other than perhaps some GI discomfort from the so-called "meals."

In 1956 a Super Constellation collided with a DC-7 over the Grand Canyon, killing 128 people. Partly as a result of this accident, the CAA (Civil Aeronautics Administration) morphed into the FAA and mandated "black boxes" on commercial aircraft. The idea is that we need to learn from these disasters, and one part of that is instrumenting the aircraft with survivable telemetry. The results have been stunning:

[Figure: fear of flying graph]

When Air France 447 went down in the Atlantic, authorities searched for the black boxes for two years before finding them. Some $50 million in extra funding has just been approved to extend the search for Malaysia Airlines Flight 370's recorders.

Is there any other industry that is willing to spend so much to avoid making the same mistake twice?

Last week I reviewed Bertrand Meyer's book Agile! The Good, the Hype and the Ugly. Some Agile methods require retrospectives, a practice that makes a lot of sense. A retrospective is one form of black box for a software engineering effort: we devote time and resources to learning from our failures. We collect metrics during the project (that is, we instrument the effort) and use those numbers, plus more qualitative observations, to improve continuously.

A project might consume hundreds of thousands of dollars (or much more) of engineering resources. How foolish that so many teams aren't willing to invest a tiny fraction of that in a retrospective, a "force multiplier" that can save a bundle on future projects!

It's easy to dismiss instrumenting projects as a feel-good practice without demonstrated benefits. I feel passionately that engineering without numbers is really art, and I despair that so many of us are willing to argue for practices that are not substantiated by metrics. So here's one example, of many, from a company I worked with. Their instrumentation included tracking bugs per thousand lines of code (KLOC) over seven quarters. Each quarter the results were analyzed to tune their engineering:

[Figure: Bugs per KLOC over 7 quarters.]
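Bugs per KLOC is simple arithmetic: the bugs logged in a period, times 1,000, divided by the lines of code for that period. Here is a minimal sketch in Python, assuming a team records a bug count and a line count each quarter and uses the quarter's delivered lines as the denominator. The QuarterStats record, the bugs_per_kloc and trend functions, and the sample numbers are all hypothetical; they are not the company's actual tooling or data.

```python
from dataclasses import dataclass


@dataclass
class QuarterStats:
    """Hypothetical per-quarter record: bugs logged and lines of code delivered."""
    bugs: int
    lines_of_code: int


def bugs_per_kloc(q: QuarterStats) -> float:
    """Defect density in bugs per thousand lines of code (KLOC)."""
    if q.lines_of_code <= 0:
        # A ratio over zero lines is meaningless, so refuse it rather than guess.
        raise ValueError("lines_of_code must be positive")
    return q.bugs * 1000 / q.lines_of_code


def trend(quarters: list[QuarterStats]) -> list[float]:
    """Bugs/KLOC for each quarter, in order, for review at a retrospective."""
    return [bugs_per_kloc(q) for q in quarters]


if __name__ == "__main__":
    # Illustrative numbers only; not the data from the company described above.
    history = [
        QuarterStats(bugs=120, lines_of_code=20_000),
        QuarterStats(bugs=90, lines_of_code=22_000),
        QuarterStats(bugs=60, lines_of_code=25_000),
    ]
    for i, density in enumerate(trend(history), start=1):
        print(f"Q{i}: {density:.2f} bugs/KLOC")
```

Keeping the raw counts, rather than storing only the ratio, lets a retrospective ask whether a drop in density came from fewer bugs or simply from more code.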

The cost to collect the data and inoculate their engineering? Negative. After about two years, schedules had been shortened by 40%. This reflects a very old and well-known adage from the quality movement: higher-quality products cost less. It has repeatedly been shown to hold in software engineering as well.

Does your team have a metaphorical black box? Do you perform retrospectives? Do you collect any metrics? Why or why not?

This discussion reminds me of a story:

A 747 is flying across the Atlantic when an engine fails. The pilot gets on the PA and assures the passengers that the aircraft is perfectly able to fly on three engines; however, they will be about twenty minutes late arriving at their destination. A little while later a second engine fails, and the pilot makes the same announcement, except that this time he says they will be about forty minutes late. A third engine fails, and the pilot says their arrival will be about an hour late. One passenger turns to another and says, "If that last engine fails, we'll be up here all day!"

Published July 30, 2014