Jack's Rules of Thumb
Performance anxiety strikes in many ways.
I remember starting engineering school and being
overwhelmed by the curriculum, the physics, and the vast amount of math I'd
somehow have to master. It seemed impossible as a nascent freshman. Engineering
courses looked even worse. I flipped through a third year transistor theory book
and another on electromagnetics. Fear crept up my spine. How would I ever master
this stuff? And even if I tricked the profs into passing grades, I couldn't
imagine understanding it all enough to be a practicing engineer!
I went to my Dad, a mechanical engineer who designed
spacecraft, for insight. After he got over the initial shock that any of his
five kids would come to him for advice, he told me that engineering is a
practical art. Sure, we'd use some math on the job. But academia's detailed
analysis of fundamental engineering concepts was just to give us insight; it
wasn't the way real engineers built things.
For example, civil engineers rarely analyze loads in small
structures. Instead they use handbooks, vast matrices of tables that show, for
instance, what standard beam to select to support a floor of particular size. No
doubt the engineer could painfully derive such data, but in practice they do
not, instead relying on the handbook.
That's when I learned the importance of "rules of
thumb". These are the basis for design of most real-world products. They're
never cast in stone, and are always subject to exceptions and revision, both by
more detailed analysis and better experience. But the rules form a reasonable
first order approximation to the truth. We know pi is about 3, for instance.
That's a pretty good estimate for some needs. Not for all, but it's a mental
guide to the magnitude of the truth.
When we learned to use slide rules we mastered another way
of approximating the truth. Since these now-ancient calculating devices were so
crude all engineers learned to first run a rough calculation in their heads. If
the slide rule says 314, our mental computation scaled it to 3.14, 3140, or
whatever result made the most sense for the problem at hand. To this day I have
a (to others at least) infuriating habit of checking numbers for sense. "The
paper said she swam the English Channel in 3 hours. It's 26 miles across, so
she averaged about 7 knots. Sounds awfully fast to me."
"You said the budget is $2 trillion? Divide by 280
million Americans and that suggests a total tax load of roughly $7k for every
man, woman and child. Can't be so." (OK, OK, sometimes these estimations
turn out to be sadly wrong.)
Since then I've developed many rules of thumb for
understanding embedded systems. Some came from my own painful experience, others
from watching developers, and still more from the experience of others which I
shamelessly steal. These rules guide me in making sense of projects, in checking
to be sure we're doing the right sort of things, and in looking for problem
areas needing optimization.
So here's a few, with explanations. Please send me yours;
I'll share some of those with the Embedded Systems Programming community and
steal the rest.
DIs Terrify
Code liberally sprinkled with Disable Interrupt
instructions sets off my warning klaxons. There's nothing wrong with DI, but
an excess suggests poor design.
DIs slip into code in two fashions. Forward-thinking
developers recognize that certain actions in a program are inherently
non-reentrant. Accessing a shared variable (a global) is fraught with danger
since an interrupt may create a context switch to another task that also
requires the same variable. So we often issue a quick DI/EI pair around the code
that uses the global to inhibit such a switch. Reentrancy problems disappear
when interrupts are off.
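Here's a minimal C sketch of that DI/EI pair. The names disable_interrupts() and enable_interrupts() are hypothetical stand-ins for whatever your CPU or compiler provides; in this sketch they only toggle a flag so the code can run on a desktop. The shared_ticks variable is likewise invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulated interrupt-enable flag. On real hardware this is CPU state
   changed by the DI and EI instructions (hypothetical stand-in). */
static volatile bool interrupts_enabled = true;

static void disable_interrupts(void) { interrupts_enabled = false; } /* DI */
static void enable_interrupts(void)  { interrupts_enabled = true;  } /* EI */

/* A global shared between main-line code and an ISR or another task. */
static volatile uint32_t shared_ticks = 0;

/* Read-modify-write of the shared global, wrapped in a DI/EI pair.
   The pair sits as close together as possible to limit added latency. */
void add_ticks(uint32_t n)
{
    disable_interrupts();
    shared_ticks += n;        /* critical region: no context switch here */
    enable_interrupts();
}

/* Take a consistent snapshot of the shared global. */
uint32_t read_ticks(void)
{
    uint32_t copy;

    disable_interrupts();
    copy = shared_ticks;
    enable_interrupts();
    return copy;
}
```

Note that the unconditional EI turns interrupts back on even if they were already off when the function was called. Production code usually saves the interrupt state on entry and restores it on exit instead.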
All shared resources are subject to reentrancy problems. A
complex peripheral might have dozens or even hundreds of registers; a context
switch while setting these up can cause total brain-freeze of the device. Again,
DI/EIs can preserve the integrity of the part.
But these DI/EI pairs slip into code in great numbers when
there's a systemic design problem that yields lots of critical regions
susceptible to reentrancy problems. You know how it is: chasing a bug the
intrepid developer uncovers a variable trashed by context switching. Pop in
a quick DI/EI pair. Then there's another. And another. It's like a heroin user
taking his last hit. It never ends.
Disabling interrupts tends to be A Bad Thing in general,
because even in the best of cases it'll increase system latency and probably
decrease performance. Increased latency leads to missed interrupts and
mismanaged devices.
It's best to avoid shared resources whenever possible.
Eliminate globals. Create drivers for hardware devices. Encapsulate to excess.
Use semaphores and let a well-designed RTOS manage the interrupt headaches. An
occasional DI/EI isn't too bad, but lots means we've let chaos creep into
the code.
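As a rough illustration of "create drivers and encapsulate", here's a sketch of a tiny driver for a made-up UART-like device. Everything in it is an assumption for the sake of the example: the fake_uart_regs layout, the divide-by-16 baud formula, the "enable" bit, and the enter_critical()/exit_critical() helpers that stand in for DI and EI. The registers live in RAM so the sketch runs anywhere.

```c
#include <stdint.h>

/* Hypothetical device with two registers, simulated in RAM. Real code
   would map this structure onto the hardware's addresses. */
typedef struct {
    uint32_t baud_divisor;
    uint32_t control;
} fake_uart_regs;

/* File-scope statics: nothing outside this driver can touch them. */
static fake_uart_regs uart_regs;
static int critical_depth = 0;     /* stand-in for the CPU interrupt mask */

static void enter_critical(void) { critical_depth++; }  /* would issue DI */
static void exit_critical(void)  { critical_depth--; }  /* would issue EI */

/* The only place the device gets configured. Callers never see a DI or
   EI; the one critical region in the system for this part lives here. */
void uart_configure(uint32_t clock_hz, uint32_t baud)
{
    if (baud == 0u) {
        return;                    /* refuse a divide by zero */
    }

    enter_critical();
    uart_regs.baud_divisor = clock_hz / (16u * baud);
    uart_regs.control = 1u;        /* hypothetical "enable" bit */
    exit_critical();
}

/* Read-only accessor so other modules never reach into the registers. */
uint32_t uart_get_divisor(void)
{
    return uart_regs.baud_divisor;
}
```

With the device hidden behind two functions, a search for DI/EI pairs turns up one spot instead of dozens.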
Jack's Rule of Thumb is: worry
when the code is peppered with DI/EI pairs.
(For more discussion of reentrancy see the April 2001
Beginner's Corner, or my June 2001 column).
ISRs and EIs
The Enable Interrupt instruction, too, brings perils and
opportunities. An EI located outside an interrupt service routine (ISR) often
suggests peril - with the exception of the initial EI in the startup code.
Most interrupt-driven systems leave interrupts on more or
less all of the time. EIs indicate someone, somewhere, turned them off, which
suggests something very complex and difficult to manage is going on. When the
enable is not part of a DI/EI pair (and these two instructions must be very
close to each other to keep latency down and maintainability up) then the code
is likely a convoluted, cryptic well; plumbing these depths will age the most
eager of developers.
Leave interrupts on, for all but the briefest times and in
the most compelling of needs. Don't create difficult blocks of code where
they're off and reenabled in some other place.
My Rule of Thumb: be
wary of solo EIs.
Follow the ISR design rules in most textbooks and you'll
violate another one of my rules of thumb. The classic service routine pushes
registers like mad, services the interrupting hardware, does something useful,
pops ad nauseam, issues an EI to enable interrupts, and returns. Sometimes that
makes a lot of sense. More often not.
One of our ISR goals should be to minimize latency (for
more info check out my September 2001 column) to ensure the system does not miss
interrupts. It's perfectly fine to allow another device to interrupt an ISR,
or even to allow the same interrupt to do so, given enough stack space. That
suggests we should create service routines that do all of the non-reentrant
stuff (like servicing hardware) early, issue the EI, and continue with the
reentrant activities. Then pop registers and return.
The Rule: Check the
design of any ISR that reenables interrupts immediately before returning.
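A sketch of that early-EI structure in C follows. It assumes the compiler's interrupt prologue and epilogue handle the register pushes and pops, and it simulates the hardware: device_data_reg, the results log and enable_interrupts() are all hypothetical names, and the interrupt flag is just a variable. The non-reentrant work (reading the device, claiming an output slot) happens first; after the EI the routine touches only locals and the slot it already owns.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOG_SLOTS 8u

static volatile bool    interrupts_enabled = true;  /* simulated CPU flag */
static volatile uint8_t device_data_reg;            /* simulated register */
static uint16_t         results[LOG_SLOTS];
static uint32_t         next_slot;

static void enable_interrupts(void) { interrupts_enabled = true; }  /* EI */

/* Pure function: uses only its argument, so it is safely reentrant. */
static uint16_t scale_sample(uint8_t raw)
{
    return (uint16_t)(raw * 4u);
}

void sample_isr(void)
{
    /* Simulate the hardware disabling interrupts on ISR entry. */
    interrupts_enabled = false;

    /* Non-reentrant part: service the device and claim an output slot
       while nothing else can interrupt. */
    uint8_t  raw  = device_data_reg;
    uint32_t slot = next_slot++ % LOG_SLOTS;

    enable_interrupts();    /* EI early, not just before the return */

    /* Reentrant part: only locals and this invocation's own slot. */
    results[slot] = scale_sample(raw);
}
```

If the same interrupt nests, each invocation writes to its own slot, so the work after the EI can be interrupted without corrupting anything. Stack depth is the remaining constraint.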
What's the practical limit to an ISR's size? You'd be
amazed at how many products are nothing but one giant ISR. The main loop idles
till an interrupt fires off a ten thousand line service routine. This can work,
but leads to nightmarish debugging struggles. Few tools work well in interrupt
routines. Single stepping becomes problematic.
Keep ISRs small. If they need to do something complicated
spawn off a task that runs with interrupts enabled. If you're clever enough to
produce very short interrupt handlers you can generally debug them by inspection
- which is a lot easier than using an ICE or BDM.
So: be wary of ISRs
longer than half a page of code.
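One common way to keep a handler that short is to have it do nothing but queue the data, leaving the real work to a task or the main loop. The sketch below assumes a single ISR as producer and a single task as consumer. All the names are made up, and the received byte is passed in as a parameter where a real ISR would read it from a device register.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 16u   /* power of two, so index wrap is a simple mask */

static volatile uint8_t  queue[QUEUE_SIZE];
static volatile uint32_t head;   /* written only by the ISR  */
static volatile uint32_t tail;   /* written only by the task */

/* The whole ISR: take the byte, queue it, leave. */
void rx_isr(uint8_t byte_from_device)
{
    uint32_t next = (head + 1u) & (QUEUE_SIZE - 1u);

    if (next != tail) {          /* queue full: drop the byte */
        queue[head] = byte_from_device;
        head = next;
    }
}

/* Runs in a task or the main loop with interrupts enabled.
   Returns true and stores one byte in *out if any data is waiting. */
bool rx_task_poll(uint8_t *out)
{
    if (tail == head) {
        return false;            /* nothing queued */
    }
    *out = queue[tail];
    tail = (tail + 1u) & (QUEUE_SIZE - 1u);
    return true;
}
```

The handler is short enough to debug by inspection, and the complicated processing sits in ordinary code where single stepping works.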
9 To 5
Sirens flare in my head whenever I hear a developer say
"I can't get anything done around here during normal working hours." A
long story about how he comes in early or stays late - or both - inevitably
follows.
If you can't get your job done inside normal working
hours, you're being interrupted too often. Change your environment, not your
working hours. Crazy time-shifting destroys important non-work relationships and
crashes your personal life. Will your tombstone read "brought the XYZ project
in on time", or "gave of himself always, loved by everyone"?
A central tenet of eXtreme Programming is that we never work more than a
40 hour workweek two weeks in a row. There's a lot to love and hate about XP, but
this rule expresses obvious truisms about people: we need outside lives. We get
tired and run down. Rested people are productive people. But to keep to a 40
hour workweek we have to get interruptions under control.
It takes 15 minutes, on average, for your brain to move
from active perception of the busy-ness around you to being totally and
productively engaged in the cyberworld of coding. Yet a mere 11 minutes passes
between interruptions for the average developer. Ever wonder why firmware costs
so much? Email, the phone, people looking for coffee filters and your boss all
clamor for attention. If you do not manage these interruptions you cannot be
productive.
DeMarco and Lister claim a 300% difference in productivity
between software teams interrupted often and those who aren't. 300%! Clearly
we have to manage our interruptions; the alternative is missed schedules.
Most companies sentence developers to cubicles rather than
private offices. Dilbert aptly names these antiproductivity pods. Cubes are
concentration vampires. Who can think when you can't block out the sound of
your neighbor's call to his divorce lawyer?
Figure out when your brain is most effective; for me it's
first thing in the morning. Take control of these hours. Turn off the email, cut
the phone cord, blanket the PA system with headphones, and pull a curtain across
the opening that masquerades as a door. Schedule meetings for some other time.
Guard these precious hours and use
them to focus on your project. It's astonishing how much work you'll
accomplish.
My Rule of Thumb: Developers
who live in cubicles probably aren't very productive. Check how they manage
interruptions.
Fear of Editing
We all have a visceral feel for another rule of thumb: a
little bit of the code causes most of the problems. We've all had that nasty
bit of code that breaks every time someone changes nothing more than a comment.
Fear of Editing is a symptom of this problem. We try to beat the beast into
submission but it's a never-tamed hydra.
5% of the functions consume 80% of debugging time. I've
observed that most projects wallow in the debug cycle, which often accounts for
half of the entire schedule. Clearly, if we can do something about those few
functions that represent most of our troubles, the project will get out the door
that much sooner.
Barry Boehm observed that these few functions that create
so much trouble cost four times as much as any other function. That suggests
it's much cheaper to toss the junk and recode than to reactively remove the
never-ending stream of bugs. Perhaps we really blew it when first writing the
code, but if we can identify these crummy routines, toss them out, and start
over, we'll save big bucks.
My Rule of Thumb is: when
the developers are afraid to change a function, it's time to rewrite that code
from scratch.
Estimations
Isn't it amazing how badly we estimate schedules for most
projects? 80% of embedded systems are delivered late. Most pundits figure the
average project consumes twice the development effort originally budgeted.
Scheduling disasters are inevitable when developers don't
separate calendar time from engineering hours. When I see people nervously
sliding triangles around in Microsoft Project I know their project is doomed.
Hours and dates are largely unrelated parameters.
Some data suggests the average developer is only about 55%
engaged on new product work. Other routine activities, from handling paperwork
to talking about Survivor XVI, burn almost half the work week. This is an
interesting number, since it correlates so well to the observation that so many
projects need twice the estimated time.
The Rule of Thumb is: Estimating
dates instead of hours guarantees a late project. If the schedule hallucinates a
people-utilization factor of much over 50% the project will be behind
proportionately.
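The arithmetic behind that rule is simple enough to sketch in C. The function name and its parameters are my own invention for illustration; the only inputs are the estimated engineering hours, the fraction of the week people really spend on the project, and the length of the work week.

```c
/* Convert estimated engineering hours into calendar work weeks.
   utilization is the fraction of each week actually spent on the project
   (greater than 0, at most 1). Returns -1.0 for nonsensical input. */
double calendar_weeks(double engineering_hours,
                      double utilization,
                      double hours_per_week)
{
    if (utilization <= 0.0 || utilization > 1.0 || hours_per_week <= 0.0) {
        return -1.0;
    }
    return engineering_hours / (utilization * hours_per_week);
}
```

For example, 1000 engineering hours in a 40 hour week is 25 weeks if you pretend people are 100% utilized, but about 45 weeks at 55%, which is close to the "twice the estimate" figure above.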
But, then, in some companies such delusions are the only
source of peace between management and workers. "This time we'll do better,
work harder, and waste less time."
That's like believing in the lottery. The odds of success
are about the same.