The Perils of NMI
NMI is a critical resource, yet all too often it's misused.
Published in Embedded Systems Programming, April 1991
By Jack Ganssle
Wise amateurs fear interrupts. Fools go where wise men fear to
tread. Normal sequential code is hard enough to understand, code,
and debug. Toss in a handful of asynchronous events that randomly
change the processor's execution path, perhaps thousands of times
per second, and you have a recipe for disaster.
Yet interrupts are an important fact of life for all real time
systems. No experienced programmer would dream of replacing a
clean interrupt service routine with polled I/O, particularly
where fast I/O response is required.
In fact, interrupts are both the best and worst microprocessor
feature. Well thought out interrupt-driven code will be reasonably
easy to write, debug and maintain. A poorly conceived interrupt
routine is probably the worst possible software to work on. Because
interrupts are so important to embedded systems, it is vital to
become proficient with their use.
If interrupts are tough to work with, then the non-maskable interrupt
(NMI) is the true killer of the business. Be careful before you
connect a peripheral to your processor's NMI input - think through
the problems carefully.
Almost every processor has some sort of NMI signal, though it
may be called something else. On the 68000, a level 7 interrupt
cannot be masked, and is equivalent to NMI. Some 8051-family CPUs
have no non-maskable interrupt, an idea that is sort of appealing
in terms of enforcing interrupt discipline.
I'm a firm believer in restricting NMI to those conditions that
are truly unusual and of momentous importance. Quite a few designers
use NMI as a general purpose interrupt, a practice that usually
When timing gets tight, the code can easily disable a conventional
interrupt. Indeed, the very assertion of an interrupt signal automatically
turns all interrupts off until the software explicitly reenables
them, giving the code a clean window to process a high priority
task. Not so with NMI. An NMI at any time will interrupt the CPU
- no ifs, ands or buts. As long as the hardware supplies NMIs
to the processor, it will stop whatever it's doing and vector
through the NMI handler.
The very fact that NMI can never be disabled makes it ideal for
handling a small but vital class of extremely high priority events.
Chief among these is a power failure. If a system must die gracefully,
then hardware that detects the imminent loss of power can assert
NMI to let the software park disk heads, put moving sensors into
a "safe" state, copy important variables from RAM to
non-volatile storage, and generally prepare for being down.
Modern power supplies have little reserve capacity. Old linear
designs had massive filtering capacitors that acted like batteries
with several seconds of reserve capacity. Today's off-line switchers
use comparatively tiny capacitors; smart electronics does the
filtering. When the AC power goes down, the switcher's output
quickly follows suit.
During the short time it takes for power to trail away, the code
may very well be executing with interrupts disabled. Only NMI
is guaranteed to be available at all times. Power fail is such
an important event that NMI is really the only option for notifying
the software of power's impending demise.
Perhaps more should be said about power fail circuits at this
point, since so many suffer from serious design flaws. Most embedded
systems ignore power fail conditions. Running ROM based code with
no dangerous or critical external hardware, they can restart without
harm from the top when power resumes. However, two types of systems
require power-fail management hardware and software. The first
category comprises systems that control moving objects: a disk
controller should park the head, a robot should stop all motors,
and an X-ray system should shut down the beam.
The other class comprises systems that preserve transient data through
a power-up cycle. A data acquisition system might need to keep
logged data even when power goes down, an instrument sometimes
has to save painfully collected calibration constants, and a video
game should remember high scoring individuals' initials and totals.
Far too many designs rely on nothing more than battery-backed
static RAMs or some true non-volatile device like an EEPROM
to store data through multiple on/off cycles. More often than
not these schemes work, but all will sooner or later fail. Let's
consider what happens when the AC power fails.
Without AC, the power supply stops working. The computer continues
to run from the energy stored in the supply's output capacitor.
The amount of time left before the computer goes haywire is proportional
to the size of the capacitor in microfarads and inversely proportional
to the amount of current consumed by the electronics.
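
That proportionality can be turned into a back-of-the-envelope estimate.
The sketch below is only that: it assumes the load draws a constant
current (real logic does not quite behave this way), and the function
name holdup_time_s and its parameters are invented for illustration.

```c
#include <assert.h>

/*
 * Rough hold-up time in seconds: t = C * (V_start - V_min) / I.
 * Assumes a constant-current load, so treat the result as an
 * estimate, not a guarantee.
 */
double holdup_time_s(double cap_farads, double v_start, double v_min,
                     double load_amps)
{
    assert(load_amps > 0.0);     /* a zero load would never drain the cap */
    assert(v_start >= v_min);    /* we only model the voltage falling */
    return cap_farads * (v_start - v_min) / load_amps;
}
```

With an illustrative 10,000 microfarad capacitor, a 0.5 amp load, and
0.1 volt between the trip point and the minimum operating voltage, this
works out to roughly 2 milliseconds.
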
Until the computer's 5 volts decays to about 4.75 it continues
to run properly. At the 4.75 volt level most of the system's chips
are no longer operating in their design region. No one can predict
what will happen with any certainty.
At about 4.8 to 4.9 volts the well-designed power fail circuit
will inject an NMI into the computer (some detect missing AC cycles,
a better but more expensive approach). Probably the system has
only milliseconds before Vcc decays to the 4.75 volt region of
instability. The NMI routine should quickly shut down external
events and save critical variables.
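
Here is one way to picture the NMI routine's job, as a minimal sketch
that runs on a host PC rather than on real hardware. Every name in it
(critical_state_t, g_nv, g_motor_enable, power_fail_nmi, and so on) is
made up for illustration. The "non-volatile" record is just a static
variable standing in for battery-backed RAM, and hooking the routine to
the actual NMI vector is compiler and CPU specific, so it is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical critical variables; a real system has its own list. */
typedef struct {
    uint32_t sample_count;
    int16_t  cal_offset;
    int16_t  cal_gain;
} critical_state_t;

typedef struct {
    critical_state_t state;
    uint16_t         checksum;
} nv_record_t;

static critical_state_t g_state;            /* live copy in RAM */
static nv_record_t      g_nv;               /* stand-in for NV storage */
static volatile uint8_t g_motor_enable = 1; /* stand-in for an output port */

/* Simple byte sum with a non-zero seed so an all-zero record fails. */
static uint16_t checksum16(const void *p, size_t n)
{
    const uint8_t *b = (const uint8_t *)p;
    uint16_t sum = 0xA55Au;
    for (size_t i = 0; i < n; i++) {
        sum = (uint16_t)(sum + b[i]);
    }
    return sum;
}

/* Power-fail NMI body: make the hardware safe first, then save state. */
void power_fail_nmi(void)
{
    g_motor_enable = 0;
    g_nv.state = g_state;
    g_nv.checksum = checksum16(&g_nv.state, sizeof g_nv.state);
}

/* At power-up: returns 1 and restores state if the record is intact. */
int restore_after_power_up(void)
{
    if (checksum16(&g_nv.state, sizeof g_nv.state) != g_nv.checksum) {
        return 0;
    }
    g_state = g_nv.state;
    return 1;
}
```

The checksum gives the start-up code a way to tell whether the saved
record survived; a plain byte sum is a weak check, chosen here only to
keep the sketch short.
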
After processing the power fail condition, the computer and external
I/O are all in a safe state. The voltage level continues to decline
past 4.75 volts, eventually reaching zero. Unfortunately, the
supply's capacitor decays exponentially. It will provide something
between zero and 4.75 volts for a comparatively long time (perhaps
What do the CPU chip, memories, and glue logic do with, say,
4 volts applied? No one knows. No vendor will guarantee any behavior
under the 4.75 volt level. Frequently the program just runs wild,
executing practically random instructions. Your carefully saved
data or meticulously protected I/O could be destroyed by rogue
No power fail circuit is complete unless it clamps the reset line
whenever power is less than the magic 4.75 volt level. A suitable
circuit keeps the CPU in a reset state, preventing wild execution
from corrupting the efforts of the NMI power save routine. Motorola
sells a 3-terminal reset controller for less than a dollar which
will hold reset down in low Vcc conditions.
Consider another case: suppose the power grid's sadly overloaded
summertime generating capacity experiences a brownout. If the
line drops from 110 VAC to, say, 80 volts, what happens to the
+5 volt output from your system's power supply? Most likely it
will go out of regulation, giving perhaps 3 or 4 volts until the
110 input level is reestablished. Hopefully the power fail circuit
will assert an NMI to the processor chip. Using the conventional
resistor/capacitor unclamped reset circuit, the reset input will
decline only to the 3-4 volt level, not nearly low enough to force
a reset when power comes back.
The reset clamping circuit will not only keep the CPU in a safe
state; in this brownout case it will also ensure that the system
restarts properly when +5 volts is reestablished.
Regardless, NMI is the only reasonable interrupt choice for power
Unfortunately, NMI is widely abused as a general purpose interrupt.
Use NMI only for events that occur infrequently. Never substitute
it for poor design.
It's not too unusual to see a divider circuit driving NMI, generating
hundreds or thousands of interrupts per second. Usually these
designs start life using a reasonable maskable interrupt. As the
programmers debug the system they find the CPU occasionally misses
an interrupt, so they switch over to NMI. This is a mistake. If
the code misses interrupts, there is a fundamental flaw in its
design that NMI will not cure.
Your code will miss interrupts only if some bottleneck keeps them
disabled for too long. Always design the code to keep interrupts
disabled only while servicing the hardware. Reenable them as soon
as possible. With good reentrant design, interrupts should never
be off for more than a few tens of microseconds.
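
As a sketch of what "disabled only while servicing the hardware" can
look like, consider a receive queue. The ISR does nothing but grab the
byte and queue it; the main loop masks interrupts only for the few
instructions that touch the shared indices. The names rx_isr, rx_get,
irq_disable, and irq_enable are hypothetical, and the two irq functions
are host-side stubs standing in for whatever your CPU and compiler
provide.

```c
#include <stdint.h>

#define RX_BUF_SIZE 16u   /* power of two so the index mask works */

static volatile uint8_t rx_buf[RX_BUF_SIZE];
static volatile uint8_t rx_head;   /* written only by the ISR */
static volatile uint8_t rx_tail;   /* written only by the main loop */

/* Stubs: on a real target these would set or clear the CPU's mask. */
static int irq_enabled = 1;
static void irq_disable(void) { irq_enabled = 0; }
static void irq_enable(void)  { irq_enabled = 1; }

/* ISR body: queue the byte read from the device and get out. */
void rx_isr(uint8_t data_from_hw)
{
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_BUF_SIZE - 1u));
    if (next != rx_tail) {          /* drop the byte if the queue is full */
        rx_buf[rx_head] = data_from_hw;
        rx_head = next;
    }
}

/* Main-loop side: interrupts are off only while the queue is touched. */
int rx_get(uint8_t *out)
{
    int got = 0;
    irq_disable();
    if (rx_tail != rx_head) {
        *out = rx_buf[rx_tail];
        rx_tail = (uint8_t)((rx_tail + 1u) & (RX_BUF_SIZE - 1u));
        got = 1;
    }
    irq_enable();
    return got;                     /* slow processing happens after this */
}
```

Any lengthy work on the received byte happens after rx_get returns, with
interrupts back on.
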
On the Edge
Quite a few processors implement NMI as an edge sensitive interrupt.
This guarantees that even a breathtakingly short pulse will set
the CPU's internal NMI flip flop, so the interrupt simply cannot
be missed. It might, however, cause several kinds of nasty problems.
Suppose the input comes from the real world, perhaps after having
been transmitted a few feet. Without proper pulse shaping circuitry,
the signal could easily have ragged edges or even multiple, closely
spaced transitions. Maskable interrupts live quite happily with
short bouncing on their lines, since the first transition will
make the processor disable the input and start the ISR. Even the
fastest code will take a few microseconds to service and reenable
the interrupt, by which time the transients will be long gone.
NMI cannot be disabled; every bit of bounce will reinitiate the
NMI service routine. The result: one real interrupt might masquerade
as several independent NMIs, each one pushing onto the stack and
recalling the ISR.
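
A toy model makes the difference concrete. The function below walks a
sampled input line and counts how many times a handler would be entered.
It is a deliberate simplification, not a description of any particular
CPU: it treats a rising edge as the active one, ignores any pending-
interrupt latch, and assumes a maskable input stays masked for a fixed
number of samples after each accepted edge. The name
count_handler_entries and its parameters are invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count handler entries for a sampled input line.
 * maskable != 0: edges arriving within service_samples of an accepted
 *                edge are ignored, as if the input were disabled.
 * maskable == 0: every rising edge enters the handler, like an NMI.
 */
size_t count_handler_entries(const uint8_t *line, size_t n,
                             int maskable, size_t service_samples)
{
    size_t entries = 0;
    size_t masked_until = 0;
    uint8_t prev = 0;

    for (size_t i = 0; i < n; i++) {
        int rising = (prev == 0 && line[i] != 0);
        prev = line[i];
        if (!rising) {
            continue;
        }
        if (maskable && i < masked_until) {
            continue;               /* bounce lands while masked */
        }
        entries++;
        masked_until = i + service_samples;
    }
    return entries;
}
```

Feed it a line that bounces twice before settling, such as
0,1,0,1,0,1,1,1,1,0, and the non-maskable case reports three entries
where the maskable case (with a service time of eight samples) reports
one.
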
Edge sensitive inputs respond when the input voltage crosses some
threshold. Imperfect digital circuits give a rather broad window
to the threshold. If the NMI input signal is perfectly clean but
moves slowly from the idle to the asserted state, it stays within
the threshold region for far too long, sometimes causing multiple
Finally, the edge sensitive nature of the NMI signal renders it
susceptible to every stray bit of electrical noise. A clean NMI
driven by a gate on the other side of a circuit board might pick
up unexpected noise from other parts of the circuit.
Edge sensitive NMI inputs must be clean, noise free, and should
switch quickly and cleanly.
Remember that debugging NMI service routines is sometimes tough.
How will you single step in an NMI service routine if, while debugging,
dozens more NMIs keep coming? Most of us debug code by stopping
at a breakpoint and looking at the registers and variables. If,
when debugging the NMI handler, another comes along while we're
stopped, after resuming execution the service routine will re-invoke
itself, probably corrupting a non-reentrant value.
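
One defensive habit, offered as a sketch rather than a cure: have the
handler notice when it has been re-entered and count the event instead
of running its non-reentrant body a second time. The names nmi_depth,
nmi_reentries, nmi_body, and simulated_second_nmi are all hypothetical,
and the last one exists only so the re-entry can be exercised on a host
PC. Note that the second NMI is dropped here, which may or may not be
acceptable in your system.

```c
#include <stddef.h>
#include <stdint.h>

static volatile uint8_t  nmi_depth;      /* non-zero while the body runs */
static volatile uint16_t nmi_reentries;  /* diagnostic count of re-entries */
static void (*nmi_body)(void);           /* the real, non-reentrant work */

void nmi_handler(void)
{
    if (nmi_depth != 0u) {
        nmi_reentries++;     /* record the re-entry and skip the body */
        return;
    }
    nmi_depth = 1u;
    if (nmi_body != NULL) {
        nmi_body();
    }
    nmi_depth = 0u;
}

/* Host-only helper: pretends a second NMI arrives mid-handler. */
static void simulated_second_nmi(void)
{
    nmi_handler();
}
```

Checking nmi_reentries from the debugger after a run tells you whether
the handler was hit while it was already active.
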
In summary, NMI is a valuable feature. Don't abuse it; restrict
its use to those few situations where only an NMI will solve a