Jack's Rules of Thumb

By Jack Ganssle

Performance anxiety strikes in many ways.

I remember starting engineering school and being overwhelmed by the curriculum, the physics, and the vast amount of math I'd somehow have to master. To a nascent freshman it seemed impossible. Engineering courses looked even worse. I flipped through a third-year transistor theory book and another on electromagnetics. Fear crept up my spine. How would I ever master this stuff? And even if I tricked the profs into passing grades, I couldn't imagine understanding it all well enough to be a practicing engineer!

I went to my Dad, a mechanical engineer who designed spacecraft, for insight. After he got over the initial shock that any of his five kids would come to him for advice, he told me that engineering is a practical art. Sure, we'd use some math on the job. But academia's detailed analysis of fundamental engineering concepts was just to give us insight; it wasn't the way real engineers built things.

For example, civil engineers rarely analyze loads in small structures. Instead they use handbooks, vast matrices of tables that show, for instance, what standard beam to select to support a floor of a particular size. No doubt the engineer could painfully derive such data, but in practice they don't; they rely on the handbook.

That's when I learned the importance of "rules of thumb". These are the basis for design of most real-world products. They're never cast in stone, and are always subject to exceptions and revision, both by more detailed analysis and better experience. But the rules form a reasonable first-order approximation to the truth. We know pi is about 3, for instance. That's a pretty good estimate for some needs. Not for all, but it's a mental guide to the magnitude of the truth.

When we learned to use slide rules we mastered another way of approximating the truth. Since these now-ancient calculating devices were so crude, all engineers learned to first run a rough calculation in their heads. If the slide rule said 314, our mental computation scaled it to 3.14, 3140, or whatever result made the most sense for the problem at hand. To this day I have a habit, infuriating to others at least, of checking numbers for sense. "The paper said she swam the English Channel in 3 hours. It's 26 miles across, so she averaged about 7 knots. Sounds awfully fast to me."

"You said the budget is $2 trillion? Divide by 280 million Americans and that suggests a total tax load of almost a $10k for every man, woman and child. Can't be so." (OK, OK, sometimes these estimations turn out to be sadly wrong.)

Since then I've developed many rules of thumb for understanding embedded systems. Some came from my own painful experience, others from watching developers, and still more from the experience of others which I shamelessly steal. These rules guide me in making sense of projects, in checking to be sure we're doing the right sort of things, and in looking for problem areas needing optimization.

So here are a few, with explanations. Please send me yours; I'll share some of those with the Embedded Systems Programming community and steal the rest.

DIs Terrify

Code liberally sprinkled with Disable Interrupt instructions sets off my warning klaxons. There's nothing wrong with DI, but an excess suggests poor design.

DIs slip into code in two fashions. Forward-thinking developers recognize that certain actions in a program are inherently non-reentrant. Accessing a shared variable (a global) is fraught with danger since an interrupt may create a context switch to another task that also requires the same variable. So we often issue a quick DI/EI pair around the code that uses the global to inhibit such a switch. Reentrancy problems disappear when interrupts are off.
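As a minimal sketch, here's what that DI/EI pattern looks like in C. The disable_interrupts() and enable_interrupts() calls are stand-ins for whatever intrinsic or inline assembly your compiler provides; they're assumptions for illustration, not a particular vendor's API.

#include <stdint.h>

/* Stand-ins for the real DI/EI mechanism - assumptions for this sketch. */
extern void disable_interrupts(void);
extern void enable_interrupts(void);

volatile uint32_t tick_count;        /* shared with the timer ISR */

uint32_t get_tick_count(void)
{
    uint32_t snapshot;

    disable_interrupts();            /* DI: no context switch can sneak in */
    snapshot = tick_count;           /* touch the shared global safely     */
    enable_interrupts();             /* EI: keep the critical region tiny  */

    return snapshot;
}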

All shared resources are subject to reentrancy problems. A complex peripheral might have dozens or even hundreds of registers; a context switch while setting these up can cause total brain-freeze of the device. Again, DI/EIs can preserve the integrity of the part.

But these DI/EI pairs slip into code in great numbers when there's a systemic design problem that yields lots of critical regions susceptible to reentrancy problems. You know how it is: chasing a bug the intrepid developer uncovers a variable trashed by context switching. Pop in a quick DI/EI pair. Then there's another. And another. It's like a heroin user taking his last hit. It never ends.

Disabling interrupts tends to be A Bad Thing in general, because even in the best of cases it'll increase system latency and probably decrease performance. Increased latency leads to missed interrupts and mismanaged devices.

It's best to avoid shared resources whenever possible. Eliminate globals. Create drivers for hardware devices. Encapsulate to excess. Use semaphores and let a well-designed RTOS manage the interrupt headaches. An occasional DI/EI isn't too bad, but lots means we've let chaos creep into the code.
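For instance, here's a hedged sketch of guarding a shared peripheral with an RTOS mutex instead of DI/EI pairs. The calls are FreeRTOS-style; the uart_send() routine and its elided register accesses are invented for illustration.

#include "FreeRTOS.h"
#include "semphr.h"

static SemaphoreHandle_t uart_mutex;     /* guards the shared UART */

void uart_init(void)
{
    uart_mutex = xSemaphoreCreateMutex();
    /* ... hardware setup omitted ... */
}

void uart_send(const char *msg)
{
    /* Block until no other task owns the peripheral; interrupts stay on. */
    if (xSemaphoreTake(uart_mutex, portMAX_DELAY) == pdTRUE) {
        /* ... write msg to the UART, one byte at a time ... */
        xSemaphoreGive(uart_mutex);
    }
}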

Jack's Rule of Thumb is: worry when the code is peppered with DI/EI pairs.

(For more discussion of reentrancy see the April 2001 Beginner's Corner, or my June 2001 column).

ISRs and EIs

The Enable Interrupt instruction, too, brings perils and opportunities. An EI located outside an interrupt service routine (ISR) often suggests peril - with the exception of the initial EI in the startup code.

Most interrupt-driven systems leave interrupts on more or less all of the time. EIs indicate someone, somewhere, turned them off - which suggests something very complex and difficult to manage is going on. When the enable is not part of a DI/EI pair (and these two instructions must be very close to each other to keep latency down and maintainability up), the code is likely a convoluted, cryptic well; plumbing these depths will age the most patient of developers.

Leave interrupts on for all but the briefest times and the most compelling of needs. Don't create convoluted blocks of code where interrupts go off in one place and come back on somewhere else.

My Rule of Thumb: be wary of solo EIs.

Follow the ISR design rules in most textbooks and you'll violate another one of my rules of thumb. The classic service routine pushes registers like mad, services the interrupting hardware, does something useful, pops ad nauseam, issues an EI to enable interrupts, and returns. Sometimes that makes a lot of sense. More often not.

One of our ISR goals should be to minimize latency (for more info check out my September 2001 column) to ensure the system does not miss interrupts. It's perfectly fine to allow another device to interrupt an ISR - or even to allow the same interrupt to do so, given enough stack space. That suggests we should create service routines that do all of the non-reentrant stuff (like servicing hardware) early, issue the EI, and continue with the reentrant activities. Then pop registers and return.
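Here's a sketch of that shape, with hypothetical names: adc_read_and_clear() for the hardware access, process_sample() for the reentrant work, and enable_interrupts() standing in for the EI instruction (the compiler-specific ISR keyword is omitted).

#include <stdint.h>

extern uint16_t adc_read_and_clear(void);  /* read the ADC, clear the interrupt source */
extern void enable_interrupts(void);       /* stand-in for the EI instruction */
extern void process_sample(uint16_t s);    /* reentrant post-processing */

void adc_isr(void)
{
    uint16_t sample;

    /* Non-reentrant work first, while interrupts are still disabled. */
    sample = adc_read_and_clear();

    /* Safe to take more interrupts now - even another of this same one,
       given enough stack. */
    enable_interrupts();

    /* The rest is reentrant and runs with interrupts on. */
    process_sample(sample);
}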

The Rule: Check the design of any ISR that reenables interrupts immediately before returning.

What's the practical limit to an ISR's size? You'd be amazed at how many products are nothing but one giant ISR. The main loop idles till an interrupt fires off a ten thousand line service routine. This can work, but leads to nightmarish debugging struggles. Few tools work well in interrupt routines. Single stepping becomes problematic.

Keep ISRs small. If they need to do something complicated spawn off a task that runs with interrupts enabled. If you're clever enough to produce very short interrupt handlers you can generally debug them by inspection - which is a lot easier than using an ICE or BDM.
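Here's a minimal sketch of that split, again using FreeRTOS-style calls; the uart names and the elided buffer handling are assumptions for illustration.

#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"

static SemaphoreHandle_t rx_ready;   /* created with xSemaphoreCreateBinary() at init */

/* The ISR stays a few lines long: grab the data, signal the task, get out. */
void uart_rx_isr(void)
{
    BaseType_t woken = pdFALSE;

    /* ... copy the received byte into a buffer (omitted) ... */

    xSemaphoreGiveFromISR(rx_ready, &woken);
    portYIELD_FROM_ISR(woken);       /* switch at once if the worker has higher priority */
}

/* The complicated work runs here, with interrupts enabled and normal debug tools usable. */
void uart_worker_task(void *arg)
{
    (void)arg;
    for (;;) {
        xSemaphoreTake(rx_ready, portMAX_DELAY);
        /* parse the buffer, update state, kick off the next transfer, etc. */
    }
}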

So: be wary of ISRs longer than half a page of code.

9 To 5

Sirens go off in my head whenever I hear a developer say "I can't get anything done around here during normal working hours." A long story about how he comes in early or stays late - or both - inevitably follows.

If you can't get your job done inside normal working hours, you're being interrupted too often. Change your environment, not your working hours. Crazy time-shifting destroys important non-work relationships and crashes your personal life. Will your tombstone read "brought the XYZ project in on time", or "gave of himself always, loved by everyone"?

A central tenet of eXtreme Programming is that we never work overtime two weeks in a row. There's a lot to love and hate about XP, but this rule expresses obvious truisms about people: we need outside lives. We get tired and run down. Rested people are productive people. But to keep to a 40 hour workweek we have to get interruptions under control.

It takes 15 minutes, on average, for your brain to move from active perception of the busy-ness around you to being totally and productively engaged in the cyberworld of coding. Yet a mere 11 minutes passes between interruptions for the average developer. Ever wonder why firmware costs so much? Email, the phone, people looking for coffee filters and your boss all clamor for attention. If you do not manage these interruptions you cannot be productive.

DeMarco and Lister claim a 300% difference in productivity between software teams interrupted often and those who aren't. 300%! Clearly we have to manage our interruptions; the alternative is missed schedules.

Most companies sentence developers to cubicles rather than private offices. Dilbert aptly names these antiproductivity pods. Cubes are concentration vampires. Who can think when you can't block out the sound of your neighbor's call to his divorce lawyer?

Figure out when your brain is most effective; for me it's first thing in the morning. Take control of these hours. Turn off the email, cut the phone cord, blanket the PA system with headphones, and pull a curtain across the opening that masquerades as a door. Schedule meetings for some other time. Guard these precious hours and use them to focus on your project. It's astonishing how much work you'll accomplish.

My Rule of Thumb: Developers who live in cubicles probably aren't very productive. Check how they manage interruptions.

Fear of Editing

We all have a visceral feel for another rule of thumb: a little bit of the code causes most of the problems. We've all had that nasty bit of code that breaks every time someone changes nothing more than a comment. Fear of Editing is a symptom of this problem. We try to beat the beast into submission but it's a never-tamed hydra.

5% of the functions consume 80% of debugging time. I've observed that most projects wallow in the debug cycle, which often accounts for half of the entire schedule. Clearly, if we can do something about those few functions that represent most of our troubles, the project will get out the door that much sooner.

Barry Boehm observed that these few functions that create so much trouble cost four times as much as any other function. That suggests it's much cheaper to toss the junk and recode than to reactively remove the never-ending stream of bugs. Perhaps we really blew it when first writing the code, but if we can identify these crummy routines, toss them out, and start over, we'll save big bucks.

My Rule of Thumb is: when the developers are afraid to change a function, it's time to rewrite that code from scratch.

Estimations

Isn't it amazing how badly we estimate schedules for most projects? 80% of embedded systems are delivered late. Most pundits figure the average project consumes twice the development effort originally budgeted.

Scheduling disasters are inevitable when developers don't separate calendar time from engineering hours. When I see people nervously sliding triangles around in Microsoft Project I know their project is doomed. Hours and dates are largely unrelated parameters.

Some data suggests the average developer is only about 55% engaged on new product work. Other routine activities, from handling paperwork to talking about Survivor XVI, burn almost half the work week. This is an interesting number, since it correlates so well with the observation that so many projects need twice the estimated time: at 55% engagement, a 40 hour calendar week yields only about 22 engineering hours, so a plan built on full-time utilization stretches by nearly a factor of two.

The Rule of Thumb is: Estimating dates instead of hours guarantees a late project. If the schedule hallucinates a people-utilization factor of much over 50%, the project will be behind proportionately.

But, then, in some companies such delusions are the only source of peace between management and workers. "This time we'll do better, work harder, and waste less time."

That's like believing in the lottery. The odds of success are about the same.