Interrupt Predictability
Will your ISRs be fast enough? How do you know?
Published in Embedded Systems Programming, May 1995
"There are strange things done in the midnight sun by the men who moil for gold;
The arctic trails have their secret tales that would make your blood run cold;
The northern lights have seen queer sights, but the queerest they ever did see
Was that night on the marge of Lake Lebarge, when I cremated Sam McGee."
Switch a few words from these lines from Robert Service's ode
to the Yukon and you'd have a description of the weird and mysterious
ways software developers make their real time systems run properly.
For it seems that when interrupts start coming fast and furious, like
a shower of arrows, no one really knows how
to ensure that each interrupt gets serviced on time and in time.
How do you know that your code handles every interrupt in a timely
manner? Though it's possible to watch for a system crash, some
failures occur slowly and insidiously, like a Windows application
that erratically leaks resources. A few missed interrupts may
only cause your system to lose track of time, at first, or to
miscount an encoder by just a few pulses. These cancerous infections
may lurk for months or years before manifesting themselves as
noticeable bugs.
Components of Disaster
A simple embedded system with a single interrupt clearly will
run correctly as long as the ISR never takes longer to execute
than the shortest interval between interrupts. Correctness is easy to prove:
measure the ISR's maximum execution time and compare it to the
interrupt's minimum interval.
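That comparison might be sketched in C as follows. The function names are invented for illustration, and both figures are assumed to be in microseconds and to come from your own measurements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a lone ISR always finishes before the
   next interrupt can arrive. Both arguments are in microseconds. */
bool isr_keeps_up(uint32_t isr_max_us, uint32_t min_interval_us)
{
    return isr_max_us < min_interval_us;
}

/* Headroom left over, as a percentage of the minimum interval.
   Returns 0 when the ISR does not keep up. */
uint32_t isr_headroom_percent(uint32_t isr_max_us, uint32_t min_interval_us)
{
    if (min_interval_us == 0u || isr_max_us >= min_interval_us)
        return 0u;
    return (uint32_t)(((uint64_t)(min_interval_us - isr_max_us) * 100u)
                      / min_interval_us);
}
```

An ISR measured at 40 microseconds against a 100 microsecond minimum interval passes with 60 percent to spare; one measured at 100 does not pass at all.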
Unless you have very smart hardware that can stack up backlogged
interrupts, the software must service each interrupt
before two get backed up. Two? One interrupt can go pending while
another is being serviced; though the processor will ignore the
single backlogged one while the first is in process, after the ISR
completes the CPU will go ahead and respond to the asserted request.
If yet another interrupt were to occur before then (say, from that rotating
encoder), one of the two backlogged requests will be lost.
After all, the interrupt comes in to the CPU on but a single pin;
it can only express "pending" or "not pending"
states; there just aren't enough bits to indicate "hey, now
I've got two pending!"
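A toy model of that single pin, written in C, shows where the third request goes. This is only a sketch of the behavior described above, not of any particular CPU; the type and function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one interrupt line: the CPU is either running the ISR or
   not, and at most ONE further request can be remembered as pending. */
typedef struct {
    bool     in_service;   /* ISR currently executing */
    bool     pending;      /* one backlogged request latched */
    uint32_t serviced;     /* requests that got an ISR run */
    uint32_t lost;         /* requests that vanished */
} irq_line_t;

/* The device asserts a request. */
void irq_request(irq_line_t *line)
{
    if (!line->in_service) {
        line->in_service = true;   /* CPU takes it immediately */
    } else if (!line->pending) {
        line->pending = true;      /* first backlog: remembered */
    } else {
        line->lost++;              /* second backlog: nowhere to store it */
    }
}

/* The ISR returns. */
void irq_isr_done(irq_line_t *line)
{
    if (!line->in_service)
        return;
    line->serviced++;
    if (line->pending) {
        line->pending = false;     /* CPU responds to the latched request, */
                                   /* so in_service stays true             */
    } else {
        line->in_service = false;
    }
}
```

Three requests in a row with no ISR completion in between leave one in service, one pending, and one counted as lost.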
The obvious moral is to make sure interrupts are never disabled
for so long that one can be missed. It's not easy - perhaps sometimes
not even possible - to guarantee that the code will satisfy this
condition. I contend that any reasonably complex system will probably
not have an interrupt structure that is "practically"
provably correct. "Practically" is the operative word
- I have yet to speak to any embedded designer using any
formal method of proving code correctness for any application.
If the academics have a solution, we're not using it!
Crummy hardware design will create significant interrupt service
problems as well. Most processors have level sensitive interrupt
inputs. Any device requesting an interrupt must assert the request
until the processor acknowledges it. You can't just bleep the
input and expect the CPU to catch it.
Design your hardware to assert the input until the CPU responds
with an interrupt acknowledge cycle. Most modern processors will
require this, as you'll have to drop a vector on the bus at the
same time. Others, though, include default vectoring that sorely
tempts a chip-limited designer to just assume the software will
always be in an interrupt-ready state. Your code could be off
doing something, with interrupts disabled, and miss that oh-so-short
input signal.
Reentrancy
Well-designed interrupt handlers are largely reentrant. Reentrant functions, AKA "pure code", are often falsely thought
to be any code that does not modify itself. Too many programmers
feel if they simply avoid self-modifying code, then their routines
are guaranteed to be reentrant, and thus interrupt-safe. Nothing
could be further from the truth.
A function is reentrant if, while it is being executed, it can
be re-invoked by itself, or by any other routine. Reentrancy was
originally invented for mainframes, in the days when memory was
a valuable commodity. System operators noticed that a dozen or
hundreds of identical copies of a few big programs would be in
the computer's memory array at any time. At the University of
Maryland, my old hacking grounds, the monster Univac 1108 had
one of the early reentrant FORTRAN compilers. It burned up a (for
those days) breathtaking 32 kwords of system memory, but being reentrant,
it required only 32 kwords even if 50 users were running it. Each user
executed the same code, from the same set of addresses.
A routine must satisfy the following conditions to be reentrant:
1) It never modifies itself. That is, the instructions of the
program are never changed. Period. Under any circumstances. Far
too many embedded systems still violate this cardinal rule.
2) All variables changed by the routine must be allocated to a
particular "instance" of the function's invocation.
Thus, if reentrant function FOO is called by three different functions,
then FOO's data must be stored in three different areas of RAM.
The C language makes this trivial, assuming you are clever enough
to use automatic variables in your code. Automatics are stored
on the stack; each incarnation of a reentrant routine brings in
its own stack frame, and own set of automatics.
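To make the second condition concrete, here is a small C sketch contrasting a routine that keeps its result in shared static storage with one that writes only to storage its caller owns. Both functions are invented for this illustration.

```c
#include <stdint.h>

/* NOT reentrant: every caller shares the one static buffer, so an ISR
   that calls this while the main line is still using the result will
   overwrite it. */
const char *hex_byte_shared(uint8_t value)
{
    static char buf[3];
    static const char digits[] = "0123456789ABCDEF";
    buf[0] = digits[value >> 4];
    buf[1] = digits[value & 0x0Fu];
    buf[2] = '\0';
    return buf;
}

/* Reentrant: the only data written belongs to this invocation's caller,
   typically an automatic array on the caller's stack. */
void hex_byte_reentrant(uint8_t value, char out[3])
{
    static const char digits[] = "0123456789ABCDEF"; /* read-only: safe to share */
    out[0] = digits[value >> 4];
    out[1] = digits[value & 0x0Fu];
    out[2] = '\0';
}
```

Two calls to the first version return the same pointer, so the second result silently replaces the first. Two calls to the second version, each with its own automatic buffer, cannot interfere.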
This is not the only reentrancy issue, though. Suppose your main
line routine and the ISRs are all coded in C. The compiler will
certainly invoke runtime functions to support floating point math,
I/O, string manipulations, etc. If the runtime package is only
partially reentrant, then your ISRs may very well corrupt the
execution of the main line code. This problem is common, but is
virtually impossible to troubleshoot since symptoms appear only
occasionally and erratically. Can you imagine the difficulty
of isolating a bug which manifests itself only now and then, and
with totally different characteristics each time?
Be sure your compiler has a pure runtime package.
Now, sometimes we're tempted to cheat and write a nearly-pure
routine. If your ISR merely increments a global 32 bit value,
say, to maintain time, it would seem legal to produce code that
does nothing more than a quick and dirty increment. Beware! Especially
when writing code on an 8 or 16 bit processor, remember that the
C compiler will surely generate several instructions to do the
deed. On a 186, the construct ++j might produce:
    mov ax,[j]
    add ax,1     ; increment low word of j
    mov [j],ax
    mov ax,[j+2]
    adc ax,0     ; propagate carry to high word of j
    mov [j+2],ax
An interrupt in the middle of this code will leave j just partially
changed; if the ISR is reincarnated with j in transition, its
value will surely be corrupt.
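The same hazard bites main line code that merely reads such a counter. One common defense, sketched below in C, is to reread the high half until it holds still. The sketch assumes a 16 bit CPU on which each 16 bit access is a single instruction, and a timer ISR that is never re-entered; the variable and function names are invented. Disabling interrupts around the access is the other usual cure.

```c
#include <stdint.h>

/* 32-bit tick count kept as two 16-bit halves, the way a 16-bit CPU
   stores it. volatile, because the ISR changes them behind the main
   line's back. */
static volatile uint16_t tick_lo;
static volatile uint16_t tick_hi;

/* Called from the timer ISR (assumed here never to interrupt itself). */
void tick_isr(void)
{
    uint16_t lo = (uint16_t)(tick_lo + 1u);
    tick_lo = lo;
    if (lo == 0u)          /* low half wrapped: carry into the high half */
        tick_hi++;
}

/* Called from main-line code. Rereads until the high half is stable, so
   a carry that lands between the two reads cannot yield a torn value. */
uint32_t ticks_read(void)
{
    uint16_t hi;
    uint16_t lo;
    do {
        hi = tick_hi;
        lo = tick_lo;
    } while (hi != tick_hi);
    return ((uint32_t)hi << 16) | lo;
}
```

If the ISR fires between the two reads and carries into the high half, the loop notices the change and goes around again rather than returning a value assembled from two different moments.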
Even the perfectly coded reentrant ISR leads to problems. If such
a routine runs so slowly that interrupts keep giving birth to
additional copies of it, eventually the stack will fill. Once
the stack bangs into your variables the program is on its way
to oblivion. You must ensure that the average interrupt
rate is such that the routine will return more often than it is
invoked.
Measuring Interrupt Response
Though predicting a system's interrupt response is probably impossible,
you can use a few tricks to get typical performance numbers.
Typical numbers are the best we can get. There's no assurance
that measurements taken over a second, year or century will represent
worst case system performance. Perhaps one day users select an
unusual combination of inputs; the temperature is running a bit
hot; interrupts are bunched up by a faster than usual serial stream,
whose data for some reason consists of once-in-a-lifetime numbers
that are tough to compute, burning more CPU time. One interrupt
runs just a shade too long, causing another to back up till a
third gets missed. It's a chaotic situation that we hope never
occurs, but our hopes are based on nothing more than a nervous
prayer. Thankfully the occupants of the aircraft whose autopilot
your system controls don't understand just how poorly we know
what we're doing!
Branch analyzers are the rage in larger systems. These devices,
akin to emulators, monitor your code's execution to ensure that
every possible branch in the code takes place. A branch analyzer
ensures that the code has been at least totally exercised, though
correctness is more difficult to monitor. Though a branch analyzer
will prove that each ISR has executed at least once, it simply
can't ensure that interrupts will never be missed.
A scope can measure interrupt latency and response very effectively
in a single-interrupt system, but when more than one device can
interrupt the processor, the scope is generally unsatisfactory.
Too much is going on, too fast, in too many dimensions, to monitor
on even a fast digital scope. Similarly, logic analyzers do a
poor job of finding crummy interrupt response.
Probably the best hardware tool you can use is a decent performance
analyzer. Be sure to get one that measures more than average response
to an interrupt; it must log the worst case, or maximum time,
in each ISR. Make sure it can monitor all of the ISRs simultaneously.
Run your tests for weeks over every possible condition - and then
cross your fingers and hope things don't degenerate after the
product starts to ship.
Personally, I think the best way to measure interrupt predictability
is to instrument the code to fault when an error occurs. Plan
for failure. If your system can at least alert the user that things
have gone to hell, you'll avert a crash and will have the option
of failing gracefully. In a life-critical application add a little
hardware to indicate "lost interrupt"... but don't tie
the output of this circuit to the CPU's normal interrupt pin!
Use NMI, as this situation is as catastrophic as a power failure.
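On the software side, one way to instrument an ISR is to time every run against a budget and count the overruns. The C sketch below assumes a free-running 32 bit hardware timer whose count is passed in as `now`; how you read that timer is specific to your hardware, and all of the names here are invented.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t budget;       /* longest run we will tolerate, in timer counts */
    uint32_t worst;        /* longest run seen so far */
    uint32_t overruns;     /* number of runs that exceeded the budget */
    uint32_t entry_time;   /* timer count at the most recent ISR entry */
} isr_watch_t;

/* Call first thing in the ISR with the current timer count. */
void isr_watch_enter(isr_watch_t *w, uint32_t now)
{
    w->entry_time = now;
}

/* Call last thing in the ISR. Returns false if this run blew the budget,
   so the caller can raise whatever fault indication the system has. */
bool isr_watch_exit(isr_watch_t *w, uint32_t now)
{
    uint32_t elapsed = now - w->entry_time;  /* unsigned math survives timer wrap */
    if (elapsed > w->worst)
        w->worst = elapsed;
    if (elapsed > w->budget) {
        w->overruns++;
        return false;
    }
    return true;
}
```

The `worst` field gives you the maximum rather than the average, and a nonzero `overruns` count in the field is your cue to fail gracefully instead of silently.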
Beware of reentrant routines. Add a bit of code in the system's
main loop to monitor the stack pointer. If the SP bottoms out,
you've clearly got a problem that could be related to getting
interrupts faster than the system can process them. Any sort of
creeping SP is a deadly problem that is easy to detect.
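Reading the stack pointer itself is compiler and CPU specific, so the C sketch below uses a related, portable trick instead: paint the stack region with a known pattern at startup, then have the main loop check how much of the pattern survives. It assumes a stack that grows downward through a region you can address as a byte array; the fill value and function names are arbitrary choices for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5u   /* arbitrary pattern chosen for this sketch */

/* Paint the whole stack region at startup, before it is in use. */
void stack_paint(uint8_t *region, size_t size)
{
    for (size_t i = 0; i < size; i++)
        region[i] = STACK_FILL;
}

/* Count the bytes at the low end that still hold the pattern. Assumes
   the stack grows downward from region + size toward region, so this
   is the amount of stack that has never been touched. */
size_t stack_unused(const uint8_t *region, size_t size)
{
    size_t n = 0;
    while (n < size && region[n] == STACK_FILL)
        n++;
    return n;
}
```

Call `stack_unused()` from the main loop and fault when the result drops below some margin. A number that keeps shrinking over hours of running is the creeping SP described above.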
Common Sense Coding
Poorly coded interrupt service routines are the bane of our industry.
Most ISRs are hastily thrown together, tuned at debug time to
work, and tossed in the "oh my god it works" pile and
forgotten. A few simple rules can alleviate many of the common
problems.
First, don't even consider writing a line of code for your new
embedded system until you lay out an interrupt map. List each
one, and give an English description of what the routine should
do. Include your estimate of the interrupt's frequency.
Now approximate the complexity of each ISR. Given the interrupt
rate, with some idea of how long it'll take to service each, you
can assign priorities (assuming your hardware includes some sort
of interrupt controller). Some developers assign the highest priority
to things that must get done; remember that in any embedded system
every interrupt must be serviced sooner or later. Give
the highest priority to things that must be done in staggeringly
short times to satisfy the hardware or the system's mission (like,
to accept data coming in from a 1 Mb/sec source).
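An interrupt map can live in the code as a simple table. The C sketch below orders the entries by how little time each one allows, shortest interval first, and totals the worst-case CPU load. The structure, the function names, and the idea of expressing load in tenths of a percent are all assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    const char *name;             /* plain-English label for the map */
    uint32_t    min_interval_us;  /* shortest time between requests */
    uint32_t    est_isr_us;       /* estimated worst-case service time */
} irq_entry_t;

/* qsort comparator: the shortest interval (most urgent) sorts first. */
static int by_urgency(const void *a, const void *b)
{
    const irq_entry_t *x = a;
    const irq_entry_t *y = b;
    return (x->min_interval_us > y->min_interval_us)
         - (x->min_interval_us < y->min_interval_us);
}

/* Reorder the map so that index 0 deserves the highest priority. */
void irq_map_prioritize(irq_entry_t *map, size_t count)
{
    qsort(map, count, sizeof map[0], by_urgency);
}

/* Worst-case CPU load from all interrupts, in tenths of a percent. */
uint32_t irq_map_load_permille(const irq_entry_t *map, size_t count)
{
    uint64_t load = 0;
    for (size_t i = 0; i < count; i++) {
        if (map[i].min_interval_us != 0u)
            load += ((uint64_t)map[i].est_isr_us * 1000u)
                    / map[i].min_interval_us;
    }
    return (uint32_t)load;
}
```

With a timer at 1000 us/50 us, a UART at 100 us/20 us, and an encoder at 250 us/25 us, the UART sorts to the top and the total load comes to 350, or 35 percent. A total near or above 1000 means the estimates cannot all be met.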
The cardinal rule of interrupt handling is to keep the handlers
short. A long ISR simply reduces the odds you'll be able to handle
all time-critical events in a timely fashion. If the interrupt
starts something truly complex, have the ISR spawn off a task
that can run independently. This is an area where an RTOS is a
real asset, as task management requires nothing more than a call
from the application code.
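Without an RTOS, a small queue gives much the same effect: the ISR grabs the data and gets out, and the main loop does the slow part later. The C sketch below is one such arrangement, not an RTOS API. It assumes a single ISR producer, a single main line consumer, and a CPU on which byte-sized reads and writes are atomic; the names and queue size are invented.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_QUEUE_SIZE 8u   /* must be a power of two; arbitrary here */

static volatile uint8_t rx_queue[RX_QUEUE_SIZE];
static volatile uint8_t rx_head;   /* written only by the ISR */
static volatile uint8_t rx_tail;   /* written only by the main line */

/* ISR side: queue the byte and return. False means the queue was full
   and the byte was dropped, which is itself worth logging. */
bool isr_post(uint8_t data)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_QUEUE_SIZE - 1u));
    if (next == rx_tail)
        return false;
    rx_queue[rx_head] = data;
    rx_head = next;        /* publish only after the data is in place */
    return true;
}

/* Main-line side: fetch the oldest byte, if any, for the slow work. */
bool task_fetch(uint8_t *data)
{
    if (rx_tail == rx_head)
        return false;
    *data = rx_queue[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_QUEUE_SIZE - 1u));
    return true;
}
```

Because each index has exactly one writer, neither side has to disable interrupts to use the queue. One slot is always left empty to tell full from empty, so this queue holds seven bytes.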
Reenable interrupts as soon as practical in the ISR. Do the hardware-critical
and non-reentrant things up front, then execute the interrupt
enable instruction. Give other ISRs a fighting chance to do their
thing.
Use reentrant code! Write your ISRs in C if at all possible, and
use C's wonderful local variable scoping. Globals are an abomination
in any programming environment; never more so than in interrupt
handlers. Reentrant C code is orders of magnitude easier to write
than reentrant assembly code.
Don't use NMI for anything other than catastrophic events. Power-fail,
system shutdown, interrupt loss, and the apocalypse are all good
things to monitor with NMI. Timer or UART interrupts are not.
When I see an embedded system with the timer tied to NMI, I know,
for sure, that the developers found themselves missing interrupts.
NMI may alleviate the symptoms, but only masks deeper problems
in the code that most certainly should be cured.
NMI will break a reentrant interrupt handler, since most ISRs
are non-reentrant during the first few lines of code where the
hardware is serviced. NMI will thwart your stack management efforts
as well.
Conclusion
Start your interrupt planning before writing a single line of
code. Work out the details, priorities, and maximum execution
times. Plan for problems: include code that looks for failures.
In a really busy system try desperately to get time allocated
for lots of testing, though we all know that when the system works
at all, management will usually yell their mantra: "ship
it!"
References:
The Cremation of Sam McGee, by Robert Service, from Collected
Poems of Robert Service, 1907, G.P. Putnam's Sons, NY