 |
For hints, tricks and ideas about better ways to build embedded systems, subscribe to The Embedded Muse, a free biweekly e-newsletter. No hype, just down to earth embedded talk. 23,000 other engineers subscribe. It takes just a few seconds (all we need is your email address, which is shared with absolutely no one) to subscribe to the Embedded Muse. |
Margin
Published 9/23/2005
The video made my jaw drop. Flight 292's nosegear was cocked sideways, twisted 90 degrees from its normal position. As the pilot began his approach I wanted to turn away but was glued to the screen in fascinated horror. The wheel touched down, smoked, burst into flame and the tire tore away, nothing but metal grinding along the runway.
Astonishingly, the strut held fast.
Those seconds summed up a lot of the nature of engineering. The strut held fast, for loads that far exceed anything the plane experiences in normal operation. Engineers designed enough margin into the system to handle almost unimaginable and unanticipated forces.
It also shows the human side of the mechanical world. This failure had apparently been experienced before by other Airbus 320s. Something went wrong with the system used by the air industry to eliminate known defects.
I'm struck by the difference between failures in mechanical systems and those in computer programs. Software is topsy-turvy. Mechanical engineers can beef up a strut to add margin, to handle unexpected loads. EEs specify components heftier than needed and wires that can take more current than anticipated. They handle surges with fuses and weak links.
In software if just one bit out of hundreds of millions is wrong the application completely crashes. Margin is difficult, perhaps impossible, to add. Exception handlers can serve as analogs to fuses, but they're notoriously hard to test and generally have a bug rate far higher than that of the application.
Worse, we write code with the assumption that everything will work and there won't be any unexpected inputs. So buffer overflows are rampant. This complacent attitude isn't exclusive to desktop developers; after a software error destroyed Ariane 5 the review board cited a culture that assumed software can't fail. If it works in test it will work forever.
A plane, bridge and dare I say levee must have a reliability vanishingly close to 100%. So mechanical engineers design a structure that takes 110% or 150% of expected loads.
Many software apps require just as much reliability. But we can't add margin, so must build code that's 99.999% correct or better.
Yet humans aren't good at perfection. In school a 90% is an "A". If our code earned an "A," a million line-of-code program would have 100,000 errors.
Software is inherently fragile. We can, and must, add great exception handlers and use the very best methods to produce correct code. But until we find a way to make code that is more robust than the environment it's in, the elusive goal of perfection is our only hope.
What do you think? How can we add design margin to code?
|