On Management

Originally in Embedded Systems Programming, October, 1999.

By Jack Ganssle

The voice on the other end of the phone was quiet, furtive. In guarded tones "Joe" recounted the sordid details of his story, as if I were an investigative reporter. "I can send you documents, copies of our listings, memos, you name it," he murmured.

He worked for a small undercapitalized firm which was intolerant of any sort of failure. Joe's boss had a rule: software bugs were unacceptable. At first I mistakenly assumed he meant delivered products were to be defect-free, a noble goal I'd hope we all share.

"Nah, we're not allowed to do any debugging," he almost whispered. "The boss says we've got to write code correctly the first time, and he's not spending any money on fixing our mistakes. So there's no tools and we all work OT to fix problems."

I was reminded of my teenaged technician years, when the QA manager decreed that we were to replace only defective components, under threat of dismissal. From time to time we techs screwed up, of course, removing parts that in fact were just fine. Since Mr. QA had us send all of these extracted components to an outside lab for failure analysis, we learned to manually fry those that we had replaced in error. His in-basket consequently received some quite puzzling failure reports from the lab.

Part of me wanted to believe that Joe's boss was a man of deep insight, who had created a dynamic development environment where the product was so well designed all - or most - bugs disappeared before test started. Unhappily the man was as capricious as my old QA tormentor, as he mandated a terribly difficult goal without providing any system or framework in which to achieve it.

Managing Bugs

Developers almost universally complain about managers, for good reasons and bad. The number one complaint I hear is about fickle decision making. Schedules with imposed end-dates, or those arbitrarily revised. Disappearing resources. Marketing overrides very real technical requirements, and even imposes plainly impossible technical goals.

In my opinion, management is a lost art. Today's bosses are most often preoccupied with generating reports, preparing for shows, and endlessly meeting with vendors. Though perhaps these are important roles, managers have two critical functions, unhappily usually neglected. First, managers must manage. This means tracking the day by day, week by week progress of an engineering effort, which presumes there's an honest schedule which forms the project's roadmap. When something changes - and things always change - then an effective manager revises the roadmap.

Management means acting as a buffer between the doers and upper level decision makers, insulating developers from distractions from on high, and waging battles to secure needed resources.

Management means looking forward and anticipating changes and problems, clearing minor potholes before they become roadblocks. When disaster strikes - as it will - it means working with the team to find reasonable solutions, and negotiating with the manager's superiors to adapt to the new realities, without imposing impossible requirements on the development team.

But there's another aspect to proactive management. Your boss is immediately tasked with building a product but, much more important, his role is to create an effective, constantly improving way of building products. Keep your shoulder to the grindstone and you will eventually fail.

Truly enlightened managers transcend an incessant focus on "shipping the damn thing now" to look a year or more ahead, and to develop the skills and technologies that will surely be required.

This is an extension to the philosophy promoted in "The E-Myth Revisited" by Michael Gerber (1995, HarperBusiness, NY, ISBN 0887307280), one of the better books around for people building businesses. Gerber suggests that too many small companies (and by extension, engineering departments) fail because the leader spends too much time doing productive work, and not enough time inventing a company (department) that does more work better in less time. We've all discovered a tool that could save us weeks of effort, but being trapped by an impossible deadline we feel we can't spend time to install and learn how to use it.

An old Far Side cartoon shows two armies battling with swords and arrows, while the general tells a machine gun vendor he's just too busy to look at the salesman's products. While this metaphorical position might be unavoidable in the panic of getting a product to market, it's inexcusable when there's never a break, never a chance to adjust to new technologies or ideas sure to help us win the next battle.

Yet, few managers look more than a microsecond into the future; virtually all are obsessed with completing the current project. Between development efforts there's perhaps a flurry of vacations, but the focus remains on projects and not on building a more effective engineering department.

Consider Code Inspections, a well-known way of drastically reducing debugging time. Effective Inspections remove 70 to 80% of all bugs before any testing starts. If, as observation all too often indicates, 50% of a project's time gets sucked into the dark hole of debugging, then Code Inspections can shorten development time by 35 to 40%, while at the same time producing better quality products that are more likely to meet the spec.
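The arithmetic behind that 35 to 40% figure can be sketched in a few lines. This is only an illustration: it assumes debugging time shrinks in proportion to the bugs removed, and it ignores the time the Inspection itself takes. The function name `schedule_savings` is made up for this example.

```python
def schedule_savings(debug_fraction, removal_rate):
    """Fraction of total project time saved.

    Assumption (not from any study): debugging effort scales linearly
    with the number of bugs that survive into test.
    """
    return debug_fraction * removal_rate


# Half the schedule goes to debugging; Inspections remove 70 to 80% of bugs.
low = schedule_savings(0.50, 0.70)
high = schedule_savings(0.50, 0.80)
print(f"Estimated schedule reduction: {low:.0%} to {high:.0%}")
```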

The cost? After compiling and plotting the results of several dozen studies I've found a "sweet spot", an elbow in the graph of Inspection speed versus bug-removal rate, of about 150 to 200 lines per hour. That is, inspect 150 to 200 lines of source code each hour and you'll maximize your efficiency.

Load that out for four developers (a typical inspection team size) and the cost per line of an Inspection is one to two dollars. Now we know that code costs vary widely (the Space Shuttle's firmware reputedly ran over $1000/line), but typical embedded software costs about $15 to $30 per line, when costs from design to shipping are included. If half the development time is lost to debugging, that $1 to $2/line charge invested for Inspections saves almost an order of magnitude of total delivered code cost.
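Here is a rough sketch of how a per-line Inspection cost falls out of team size and inspection speed. The loaded hourly rates used below ($50 and $75) are my assumption for illustration only; they are simply values that happen to reproduce the one to two dollar range. The function name is likewise hypothetical.

```python
def inspection_cost_per_line(team_size, hourly_rate, lines_per_hour):
    """Dollars spent per inspected line when the whole team sits in the meeting."""
    return team_size * hourly_rate / lines_per_hour


# Assumed loaded rates of $50 and $75 per developer-hour (illustrative only).
fast_and_cheap = inspection_cost_per_line(4, 50.0, 200)
slow_and_dear = inspection_cost_per_line(4, 75.0, 150)
print(f"${fast_and_cheap:.2f} to ${slow_and_dear:.2f} per line")
```

Plug in your own shop's loaded rate; the point is that the result stays small next to a $15 to $30 per line total.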

But cost isn't the only issue. What's the impact of bugs discovered by the customer? As a consumer I'm truly frustrated with the buggy embedded products that run the world, from phone systems that behave oddly to cardiac pacemakers that require firmware updates after having been installed in patients. Embedded projects are getting worse as code sizes increase. Twenty years ago a 4k EPROM space was large. Now a million plus lines of code is just not that uncommon, yet bug rates seem unchanged.

A number of studies have shown that testing - debugging - often leaves as much as half the code unchecked! Complex IFs, large SWITCH statements, and a wealth of exception handling tend to get ignored during test.

Let's look at the numbers to see what this means. Post-compile, without Inspections, 5% of the source will typically be in error. In a 10,000 line program that's 500 bugs you'll have to chase. Find them all - except for the 50% you might miss due to the inadequacies of the debugging process - and you're shipping a product with 250 latent bugs. Ouch!

Inspect the code and design documents, and instead of 500 initial errors you'll see fewer than 150. The 250 hidden bugs will shrink to 75 or fewer.
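The bug counts above follow from simple multiplication, sketched here with whole-number percentages. It's a back-of-the-envelope model, not a prediction: it assumes a flat error rate per line and that test misses a fixed share of whatever bugs reach it. The name `bug_pipeline` is invented for this example.

```python
def bug_pipeline(lines, error_pct, inspection_removal_pct, test_miss_pct):
    """Return (initial bugs, bugs entering test, bugs shipped).

    All rates are whole-number percentages; integer math keeps the
    illustration free of rounding noise.
    """
    initial = lines * error_pct // 100
    entering_test = initial * (100 - inspection_removal_pct) // 100
    shipped = entering_test * test_miss_pct // 100
    return initial, entering_test, shipped


no_inspection = bug_pipeline(10_000, 5, 0, 50)
inspect_70 = bug_pipeline(10_000, 5, 70, 50)
inspect_80 = bug_pipeline(10_000, 5, 80, 50)
print(no_inspection, inspect_70, inspect_80)
```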

This isn't rocket science. Don't inspect and ship lousy code late, while spending far too much money creating it. Inspect and reduce the project's development time while increasing quality. Add more techniques - perhaps code coverage tools - and eliminate essentially all of the remaining bugs.

Another way of looking at this is considering the return on investment: for Inspections the ROI is at least a factor of 10, and in many cases quite a bit more. That is, for every hour spent looking at the code and design documents, 10 or more hours are shaved from the project. That's a negative cost!

Yet in the embedded world few do any sort of Inspections. My informal surveys show that pretty close to zero percent of embedded developers religiously perform Inspections. Perhaps 10% hack at them, inspecting parts of a project, though all too often looming deadlines destroy even these half-hearted efforts.

Is there any wonder we do so much debugging?

Managers must take the blame (first) for not scouting out alternatives to tediously slugging away at bugs, (second) for not implementing solutions (like Inspections), and (third) for not instituting a disciplined approach to these better techniques - you can't abandon that which works simply because of the onset of panic mode.

Any developer or supervisor who treats bugs as an inevitable consequence of creating firmware is naive. Yes, they are inevitable. But we simply must manage bugs, aggressively and proactively, and optimize the engineering environment to minimize their occurrence.

Managing Skills

I've observed that all too many developers seem to reach the zenith of their abilities in college. Yes, building real products in industry hones our skills through repetitive practice, but the raw knowledge of how we should design systems and write code seems cast in concrete when the diploma arrives.

How many engineers over 30 ever truly master a new concept like OOP or UML? For every success story there are too many who stayed locked into the concepts with which they have long been comfortable.

We can acquire new skills only by dint of struggle and practice. With today's development effort always overwhelming every other consideration it's no surprise we careen from project to project with little chance to improve. Home life gets ever more complicated with time; children and spouses require attention that detracts from after hours study opportunities.

Yet this field changes daily. Those not changing with the times are doomed to professional lives in ever more restrictive corners of the field. My dad tells a story of a mechanical engineer he knew in the 60s who was the world's expert at designing wheels for lunar roving vehicles. Surely that poor sod became unemployable during the engineering crash of the early 70s.

It's interesting - and scary - to see how well other industries adapt to new ideas, yet how poorly we do the same. Manufacturing adopts surface mount technology, awesome automatic machinery, just-in-time inventories, and extraordinarily effective quality enhancement processes while too many of us develop products in the same old tired ways.

We should take charge of our own careers, and manage those careers aggressively. Figure out what you need to know, and then take action.

But managers, too, must enhance their people's skill sets. A manager's most expensive resource is his people. Let them stagnate and the entire team will degrade.

We know how to improve software development. The Capability Maturity Model (CMM) is one such process. Though the CMM is no panacea, it does describe a disciplined way to create software products in a predictable manner: predictable bug rates, predictable schedules, and the like.

Like ISO9000, the CMM requires a major commitment from management as it changes the nature of the entire organization. Effective? Yep. Hard? You bet.

In my opinion software development is so hard that we need every asset possible to help us. Pursuing the CMM is one strategy.

Another is Watts Humphrey's Personal Software Process (PSP). Where the CMM requires a total management buy-in from the top down, the PSP is an approach individuals and small teams can use.

Yet when I talk to firmware developers just about 0% work for a company pursuing the CMM. Another 0% know anything about the PSP.

Managers must manage their people's skills. They've got to cajole, plead, push and order their folks to study these and other ideas to improve. Stagnation is death.

Managing Failure

Crashes are the biggest success story of the aviation industry.

That's not a cynical comment or a sign of flying fear. There is no industry I admire more for their management of failure than the commercial airplane business. Every crash brings out an army of investigators who are tasked with finding the accident's cause. The planes themselves are equipped with the notorious "black boxes", tools designed to record and report the parameters that lead up to the crash. When possible, investigators recommend changes in training, documentation or plane design so the chances of a similar problem will be reduced.

In other words, the system has a built-in learning process which is highly targeted towards making changes.

Why don't we work as hard to learn from our mistakes? Failure will always be part of doing anything hard. We'll miss schedules, deliver products whose quality falls below expectations, run over budget, and even suffer massive redesigns due to our own problems and changes imposed from without.

But a wise manager simply has to understand that if we don't manage failure, if we don't study our mistakes and turn them into learning experiences, we'll repeat the same old dysfunctional problems forever.

The airplane industry has a feedback loop that improves the process. If we don't emulate their example via aggressive failure management, we'll also have a loop. Repeat forever: the usual mistakes. End Program.

Conclusion

I've often wondered what happened to Joe. Did his boss consign him to the scrap heap of failed employees?

Here in the course of 2000 words I've listed a number of things Joe's manager should be doing to enable Joe to succeed, to give him a chance at achieving the capricious goals. CMM and/or PSP, failure analysis, and Inspections are all essential aspects of a well-managed firmware engineering environment.

How many of us manage only to meet a project's goals, and how many manage to build an awesome engineering department?