As Good As It Gets

By Jack Ganssle

As Good As It Gets

How good does firmware have to be? How good can it be? Is our search for perfection, or near-perfection an exercise in futility?

Complex systems are a new thing in this world. Many of us remember the early transistor radios which sported a half dozen active devices, max. Vacuum tube televisions, common into the 70s, used 15 to 20 tubes, more or less equivalent to about the same number of transistors. The 1940s-era ENIAC computer required 18,000 tubes, so many that technicians wheeled shopping carts of spares through the room, constantly replacing those that burned out. Though that sounds like a lot of active elements, even the 25 year old Z80 chip used a quarter of that many transistors, in a die smaller than just one of the hundreds of thousands of resistors in the ENIAC.

Now the Pentium IV, merely one component of a computer, has 45 million transistors. A big memory chip might require a third of a billion. Intel predicts that later this decade their processors will have a billion transistors. I'd guess that the very simplest of embedded systems, like an electronic greeting card, requires thousands of active elements.

Software has grown even faster, especially in embedded applications. In 1975 10,000 lines of assembly code was considered huge. Given the development tools of the day - paper tape, cassettes for mass storage, and crude teletypes for consoles - working on projects of this size was very difficult. Today 10,000 lines of C - representing perhaps 3 to five times as much assembly - is a small program. A cell phone might contain a million lines of C or C++, astonishing considering the device's small form factor and miniscule power requirements.

Another measure of software size is memory usage. The 256 byte (that's not a typo) EPROMs of 1975 meant even a measly 4k program used 16 devices. Clearly, even small embedded systems were quite pricey. Today? 128k of Flash is nothing, even for a tiny app. The switch from 8 to 16 bit processors, and then from 16 to 32 bitters, is driven more by addressing space requirements than raw horsepower.

In the late 70s Seagate introduced the first small Winchester hard disk, a 5 Mb 10 pound beauty that cost $1500. 5 Mb was more disk space than almost anyone needed. Now 20 Gb fits into a shirt pocket, is almost free, and fills in the blink of an eye.

So, our systems are growing rapidly in both size and complexity. And, I contend, in failure modes. Are we smart enough to build these huge applications correctly?

It's hard to make even a simple application perfect; big ones will possibly never be faultless. As the software grows it inevitably becomes more intertwined; a change in one area impacts other sections, often profoundly. Sometimes this is due to poor design; often, it's a necessary effect of system growth.

The hardware, too, is certainly a long way from perfect. Even mature processors usually come with an errata sheet, one that can rival the datasheet in size. The infamous Pentium divide bug was just one of many bugs - even today the Pentium 3's errata sheet (renamed "specification update") contains 83 issues. Motorola documents nearly a hundred problems in the MPC555.

I salute the vendors for making these mistakes public. Too many companies frustrate users by burying their mistakes.

What is the current state of the reliability of embedded systems? No one knows. It's an area devoid of research. Yet a lot of raw data is available, some of which suggests we're not doing well.

The Mars Pathfinder mission succeeded beyond anyone's dreams, despite a significant error that crashed the software during the lander's descent. A priority inversion problem - noticed on Earth but attributed to a glitch and ignored - caused numerous crashes. A well-designed watchdog timer recovery strategy saved the mission. This was a very instructive failure as it shows the importance of adding external hardware and/or software to deal with unanticipated software errors.

The August 15, 2001 issue of the Journal of the American Medical Association contained a study of recalls of pacemakers and implantable cardioverter-defibrillators. (Since these devices are implanted subcutaneously I can't imagine how a recall works). Surely designers of these devices are on the cutting edge of building the very best software. I hope. Yet between 1990 and 2000 firmware errors accounted for about 40% of the 523,000 devices recalled.

Over the ten years of the study, of course, we've learned a lot about building better code. Tools have improved and the amount of real software engineering that takes place is much greater. Or so I thought. Turns out that the annual number of recalls between 1995 and 2000 increased.

In defense of the pacemaker developers, no doubt they solve very complex problems. Interestingly, heart rhythms can be mathematically chaotic. A slight change in stimulus can cause the heartbeat to burst into quite unexpected randomness. And surely there's a wide distribution of heart behavior in different patients.

Perhaps a QA strategy for these sorts of life-critical devices should change. What if the responsible person were one with heart disease! who had to use the latest widget before release to the general public?

A pilot friend tells me the 747 operator's manual is a massive tome that describes everything one needs to know about the aircraft and its systems. He says that fully half of the book documents avionics (read: software) errors and workarounds.

The Space Shuttle's software is a glass half-empty/half-full story. It's probably the best code ever written, with an average error rate of about one per 400,000 lines of code. The cost: $1000 per line. So, it is possible to write great code, but despite paying vast sums perfection is still elusive. Like the 747, though, the stuff works "good enough", which is perhaps all we can ever expect.

Is this as good as it gets?

The Human Factor

Let's remember we're not building systems that live in isolation. They're all part of a much more complex interacting web of other systems, not the least of which is the human operator or user. When tools were simple - like a hammer or a screwdriver - there weren't a lot of complex failure modes. That's not true anymore. Do you remember the USS Vincennes? She is a US Navy battle cruiser, equipped with the incredibly sophisticated Aegis radar system. In July, 1988 the cruiser shot down an Iranian airliner over the the target wasn't an incoming enemy warplane, but the data was displayed on a number of terminals that weren't easy to see. So here's a failure where the system worked as designed, but the human element created a terrible failure. Was the software perfect since it met the requirements?

Unfortunately, airliners have become common targets for warplanes. This past October a Ukrainian missile apparently shot down a Sibir Tu-154 commercial jet, killing all 78 passengers and crew. As I write the cause is unknown, or unpublished, but local officials claim the missile had been targeted on a close-by drone. It missed, flying 150 miles before hitting the jet. Software error? Human error?

The war in Afghanistan shows the perils of mixing men and machines. At least one smart bomb missed its target and landed on civilians. US military sources say wrong target data was entered. Maybe that means someone keyed in wrong GPS coordinates. It's easy to blame an individual for mistyping! but doesn't it make more sense to look at the entire system as a whole, including bomb and operator? Bombs have pretty serious safety-critical aspects. Perhaps a better design would accept targeting parameters in a string that includes a checksum, rather like credit card numbers. A mis-keyed entry would be immediately detected by the machine.

It's well-known that airplanes are so automated that on occasion both pilots have slipped off into sleep as the craft flies itself. Actually, that doesn't really bother me much, since the autopilot beeps when at the destination, presumably waking the crew. But, before leaving the fliers enter the destination in latitude/longitude format into the computers. What if they make a mistake (as has happened)? Current practice requires pilot and co-pilot to check each other's entries, which will certainly reduce the chance of failure. Why not use checksummed data instead and let the machine validate the data?

Another US vessel, the Yorktown, is part of the Navy's "Smart Ship" initiative. Hugely automating the engineering (propulsion) department reduces crew needs by 10% and saves some $2.8 million per year on this one ship. Yet the computers create new vulnerabilities. Reports suggest that an operator entered an incorrect parameter which resulted in a divide-by-zero error. The entire network of Windows NT machines crashed. The Navy claims the ship was dead in the water for about three hours; other sources claim it was towed into port for two days of system maintenance. Users are now trained to check their parameters more carefully. I can't help wonder what happens in the heat of battle, when these young sailors may be terrified, with smoke and fire perhaps raging. How careful will the checks be?

Some readers may also shudder at the thought of NT controlling a safety-critical system. I admire the Navy's resolve to use a commercial, off the shelf product, but wonder if Windows, which is the target of every hacker's wrath, might not itself create other vulnerabilities. Will the next war be won by the nation with the best hackers?

A plane crash in Florida, in which software did not contribute to the disaster, was a classic demonstration of how difficult it is to put complicated machines in the hands of less-than-perfect people. An instrument lamp burned out. It wasn't an important problem, but both pilots became so obsessed with tapping on the device they failed to notice that the autopilot was off. The plane very gently descended till it crashed, killing everyone.

People will always behave in unpredictable ways, leading to failures and disasters with even the best system designs. As our devices grow more complex their human engineering becomes ever more important. Yet all too often this is neglected in our pursuit of technical solutions.

Solutions?

I'm a passionate believer in the value of firmware standards, code inspections, and a number of other activities characteristic of disciplined development. It's my observation that an ad hoc or a non-existent process generally leads to crummy products. Smaller systems can succeed from the dedication of a couple of overworked experts, but as things scale up in size heroics becomes less and less successful.

Yet it seems an awful lot of us don't know about basic software engineering rules. When talking to groups I usually ask how many participants have (and use) rules about the maximum size of a function. A basic rule of software engineering is to limit routines to a page or less. Yet only rarely does anyone raise their hand. Most admit to huge blocks of code, sometimes thousands of lines. Often this is a result of changes and revisions, of the code evolving over the course of time. Yet it's a practice that inevitably leads to problems.

By and large methodologies have failed. Most are too big, too complex, or too easy to thwart and subvert. I hold great hopes for UML, which seems to offer a way to build products that integrates hardware and software, and that is an intrinsic part of development from design to implementation. But UML will fail if management won't pay for quite extensive training, or toss the approach when panic reigns.

The FDA, FAA, and other agencies are slowing becoming aware of the perils of poor software, and have guidelines that can improve development. Britain's MISRA (Motor Industry Software Reliability Association) has guidelines for the safer use of C. They feel that we need to avoid certain constructs and use others in controlled ways to eliminate potential error sources. I agree. Encouragingly, some tool vendors (notably Tasking) offer compilers that can check code against the MISRA standard. This is a powerful aid to building better code.

I doubt, though, that any methodology or set of practices can, in the real world of schedule pressures and capricious management, lead to perfect products. The numbers tell the story. The very best use of code inspections, for example, will detect about 70% of the mistakes before testing begins. (However, inspections will find those errors very cheaply). That suggests that testing must pick up the other 30%. Yet studies show that often testing checks only about 50% of the software!

Sure, we can (and must) design better tests. We can, and should, use code coverage tools to ensure every execution path runs. These all lead to much better products, but not to perfection. Because all of the code is known to have run doesn't mean that complex interactions between inputs won't lead to bizarre outputs. As the number of decision paths increases - as the code grows - the difficulty of creating comprehensive tests skyrockets.

When time to market dominates development, quality naturally slips. If low cost is the most important parameter, we can expect more problems to slip into the product.

Software is astonishingly fragile. One wrong bit out of a hundred million can bring a massive system down. It's amazing that things work as well as they do!

Perhaps the nature of engineering is that perfection itself is not really a goal. Products are as good as they have to be. Competition is a form of evolution that often does lead to better quality. In the 70s Japanese automakers, who had practically no US market share, started shipping cars that were reliable and cheap. They stunned Detroit, which was used to making a shoddy product which dealers improved and customers tolerated. Now the playing field has leveled, but at an unprecedented level of reliability.

Perfection may elude us, but we must be on a continual quest to find better ways to build our products. Wise developers will spend their entire careers engaged in the search.