
By Jack Ganssle

Catastrophe Disentanglement

Published 10/16/2006

Correspondent Wil Blake sent along a link to an article with the compelling name "An Introduction to Catastrophe Disentanglement." It's worth reading, especially if you've ever had to bail out a project that's in dire straits.

In it the author describes ten steps to take when a software project is out of control. Start by stopping development. Yet in my experience management rarely has the courage to do this. Instead, increasingly desperate attempts to rein in a spiraling schedule always happen within the context of "work harder, faster, more hours." Sometimes features are appropriately sacrificed, but work continues unabated.

The author's fourth step is to "evaluate the team." Though that's a wise move, it glosses over an important part of the evaluation: fire someone or some group of people. In many cases - not all - part of the team knowingly made bad decisions. Management creates schedules based on fantasies, or sales squeezes in an impossible load of features. Catastrophes are ultimately people problems (except in the unusual case where one is stumped by a previously unknown bit of science), so the people have to change or be changed.

Such firings are rare, of course, with predictable results. The pressures that originally created unrealistic estimates are now exacerbated. The same people tend to respond to the same pressures in the same dysfunctional manner.

Here's an example from last year. A large company spent four years developing a replacement for a legacy embedded system. They had hundreds of thousands of lines of code that "worked" but were riddled with more bugs than the American embassy in the USSR. Dozens of developers were each working 60 to 80 hours a week stamping out defects, but as the testing grew more realistic, problems mushroomed. It became clear that the requirements were vague and poorly understood. The product had to work within regulatory frameworks that varied widely from country to country, and even from county to county in the US. Often some very simpleminded tricks can reveal a lot about a product; in this case a grep showed over 2000 conditional compiler directives used to alter the system's performance based on locality, yet the test regime only included 150 different conditions. Clearly there was still an awful lot of unvalidated code, likely to be as teeming with bugs as the rest had been.
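For readers who want to try the same simpleminded trick on their own code, here is a minimal sketch in Python. It is not the command used on that project. It assumes C-style sources with .c and .h suffixes, and the function names are made up for illustration. It counts lines that open a conditional-compilation branch (#if, #ifdef, #ifndef, #elif) and skips #else and #endif so each condition is counted once. It does not try to ignore directives inside comments or strings, so treat the result as a rough figure.

```python
import re
from pathlib import Path

# Matches a line that opens a conditional-compilation branch in C/C++ source.
# #else and #endif are deliberately excluded so each condition counts once.
CONDITIONAL = re.compile(r"^[ \t]*#[ \t]*(?:ifdef|ifndef|if|elif)\b", re.MULTILINE)


def count_conditionals(source: str) -> int:
    """Return the number of #if/#ifdef/#ifndef/#elif lines in one source text."""
    return len(CONDITIONAL.findall(source))


def count_in_tree(root: str, suffixes=(".c", ".h")) -> int:
    """Sum the counts over every file under root with one of the given suffixes."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # errors="replace" keeps odd encodings from aborting the scan.
            total += count_conditionals(path.read_text(errors="replace"))
    return total


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(count_in_tree(root))
```

Setting a number like this beside the count of configurations the test plan actually exercises gives a quick, if crude, sense of how much conditional code has never been run.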

For a number of reasons, I recommended they toss the entire code base and start over, this time using reasonable software processes. I also gave them a short list of people to either fire or move to another group.

They did start over. But with the same team and the same management, who immediately proposed an unrealistic schedule and again bypassed any sort of serious requirements gathering and analysis. Last week I heard from one developer who told me the project is once again a quagmire. Sadly, I wasn't surprised.

People problems doom big software projects. We have only three choices: replace, retrain or repeat the whole dismal experience.