A Different Take On Asserts
 |
For hints, tricks and ideas about better ways to build embedded systems, subscribe to The Embedded Muse, a free biweekly e-newsletter. No hype, just down to earth embedded talk. 23,000 other engineers subscribe. It takes just a few seconds (all we need is your email address, which is shared with absolutely no one) to subscribe to the Embedded Muse. |
A Different Take On Asserts
Published in ESP, January 2004.
Expert programmers use the assert()
macro to seed their code with debugging statements. So should we all.
Readers unfamiliar with assert()
should check out Niall Murphy's two articles in the April and May 2001 issues
of Embedded Systems Programming. Niall does a great job of describing the what,
how and why of this debugging construct. Basically, if assert()'s argument is zero it flags an
error message.
The macro must have no effect on the system - it
shouldn't change memory, interrupt state, or I/O, and it should make no
function calls. The user must feel free to sprinkle these at will throughout the
program. Disabling the asserts in released code must never change the way the
program operates.
The macro generally looks something like:
#define
assert(s)
if (s)
{}else (printf("Error at %s %d: ", __FILE__, __LINE__)
It's sort of an odd way to write code - if the
condition is true (in which case assert()
takes no action) there's a null pair of braces. Why not test the negation of
the statement "s"?
The answer lies in my, uh, assertion above: the macro must
have no effect on the system. Code it without the else and poorly written software, perhaps
including an unmatched else
from another if (this
is a debugging construct after all; it's role is to look for dumb mistakes)
will cause the assert to change the system's behavior in unpredictable ways.
Embedded Issues
Few embedded systems make use of the
assert() macro since we're usually in an
environment with neither display console nor printf
statement. Where do the error messages go? Equally problematic, embedding
strings ("Error at" and the file name) eats valuable ROM.
Replace the printf
with a software interrupt. Plenty of processors have a one byte or single word
instruction that vectors the execution stream to an interrupt handler. Write
code to capture the exception and take some sort of action. Log the error to a
data structure, flash an LED, or put the device in a safe mode.
Initialize your debugging tools to automatically set a
breakpoint on the handler. You'll forget the breakpoint is set. But once in a
while, Surprise! The system suddenly stops, the debug window highlighting the
first line of the exception routine. You've captured a bug in its early
phases, when it's easiest to track.
Systems constrained by small ROM sizes can leave out the __LINE__
and __LINE__
arguments. Pop the stack and log the
caller's return address. It's an easy matter to track back to find which
assertion threw the exception.
Most developers embed assert()
inside a preprocessor conditional. Flip a compile-time switch and they all
disappear. This gives us a highly-debuggable version of the code for our own
use, and a production release not seeded with the macros.
But I'm not keen on shipping code that's differs from
our test version. NASA's commandment to "test what you fly and fly what you
test" makes an awful lot of sense for building spacecraft as well as any
embedded system. When ROM and CPU cycles aren't in short supply leave the assert()s
seeded in the code, enabled, not #defined
out. If you believe the system can merrily proceed despite an assertion failure,
code the exception handler to log the error, leaving debugging breadcrumbs
behind, and return.
Is your system connected to the 'net? Why not send an
email message logging the error with a snapshot of the snapshot to crash@yourcompany.com?
Be sure to include the firmware's version number.
This is yet another reason to get and use a version control
system. If your code isn't static, if new versions proliferate like
politician's promises of tax relief, the stack contents will be meaningless
unless you can recreate earlier releases.
Embedded systems should use asserts to check housekeeping
problems, like stack growth. On an x86 processor the instruction mov
ax,sp grabs the current stack pointer. Embed this, or a similar
instruction for your CPU, into an assert that then compares against a
nearly-full stack. Seed every task with the macro.
Check malloc's
return value, too, if you're gutsy enough to use it in firmware. If malloc
fails it always returns an error code. Ignore it at your peril!
Real Time Asserts
I'm frustrated that so many developers of real time
systems think only in the procedural domain, working diligently to build a
sequential series of instructions that implement some algorithm, blithely
ignoring the time aspects of their code. Building real time code means managing
time as seriously as we build if-elses.
Track free CPU microseconds as well as available bytes of ROM.
There's surprisingly little interest in managing time. C
and C++ compilers give us no help. They should produce a listing which gives us
the execution time, or at least a range of times, for each statement in our
program. We compiler users have no idea if a simple ++int takes a microsecond or a week. More
complex statements invoke the compiler's runtime library, truly a black hole
of unknown execution times.
Years ago a friend worked on a Navy job using the CMS-2
language. The compiler was so bug-ridden that they adopted a unique coding
strategy: write a function, compile it, and see if the assembly output made
sense. If clearly wrong, change something in the CMS-2 source - even spacing
- and hope a recompile produced better results. In effect, they made random
changes to their source in hopes of taming the compiler.
We do the same thing when the compiler does not produce
timing information. How do we fix a slow snippet? Change something - almost
anything - and hope. That's more akin to hacking than to software engineering.
Seems to me the vendors do their customers a horrible disservice by not
documenting best, worst and average execution times.
A very cool product, despite its high price, was Applied
Microsystem's CodeTest. Among other things, it measured execution time of each
function in a program, seeding the code with special instructions whose action
was captured by a hardware probe. Yet Applied failed, never able to generate
much interest in that product. MetroWerks (www.metrowerks.com)
sells the device now; I wish them success, and hope more of the embedded
community seeks tools for managing the time part of their applications.
In lieu of buying time tools, why not build real-time assert()s,
macros that helps us debug in the time domain?
The oldest timing trick is to drive an output bit high when
entering a routine, and then low as we exit. Probe the port with an oscilloscope
and measure the time the bit is set.
Rather than dropping a pair of outport() C statements directly into the code, I
prefer to define a assert-like macros. Never sprinkle low level I/O into
anything other than drivers, except in the most ROM-constrained systems.
Call them RT_set_assert()
and RT_unset_assert(),
the RT meaning real-time to clearly indicate that these are not your father's
asserts anymore.
This brain-dead trick has been around forever. But you can
learn a lot about the behavior of a system by extending the idea just a bit.
Want to measure min and max execution time of a function or ISR? Set a digital
scope to store multiple acquisitions on the screen. Trigger on the I/O bit's
positive edge. You'll see the bit go high, a clear space on the screen, and
then a lot of hash before it goes low. The clear space is the minimum time. The
interval from the positive edge to the end of the hash is the maximum execution
time.
I often want to know the total amount of time spent
servicing interrupts. Heavy interrupt loads are a sure sign of a system which
will be a bear to debug and maintain. Drop the RT_assert pair into every ISR.
That is, every ISR toggles a common bit. If the scope trace stays mostly low,
there's little loading. Mostly high, and kiss your family goodbye for the
duration of the project.
We developers are being shortchanged by our tools; the
continued disappearance of the in-circuit emulator gives us ever less insight
into our projects' time domains. But we're lucky in that most high
integration CPUs include plenty of timers. Even simple systems today use RTOSes
sequenced by interrupts from a timer. If you're not using an RTOS, do program
one of these peripherals to track time for the sake of performance measurements.
Create a simple RT_assert pair to compute elapsed time over
a function or region of code. RT_enter_assert()
saves the clock; RT_exit_assert()
figures time, in tics or microseconds. In a preemptive system it's best to
pass an argument with a unique identifier for each code snippet being measured.
This is then an index into a data structure, so that multiple instances of the
assert don't clobber each other's data.
In a real system logging every instantiation of the code
chunk would generate a flood of data. Instead, have
"Courier New"">RT_exit_assert() compute the max execution time.
Or, if needed, figure min and max.
Examine the structure with a debugger to get a quick code
profile.
How much accuracy do you need? If the timer is used for
context switching, no doubt it only generates an interrupt every handful of
milliseconds. Yet the hardware actually counts CPU clocks, or CPU clocks divided
by some very small number. The RT_assert macros can read OS-maintained time
(which counts interrupts) and the timer's hardware. Resolution to a few
microseconds is possible.
To "tare" the readings, compensating for the overhead
contributed by the RT_asserts, issue the two macros back to back. Subtract the
tare value to increase precision.
Multitasking can queer the results. Preempting the code in
the middle of a profiling operation then includes the execution time of the
preempter. Just as we want to include the overhead contributed by the
compiler's runtime routines, most of the time we want to know elapsed time,
which includes preemption. Alternatively, disable interrupts during the
measurement, though of course that may greatly effect system operation.
Some developers advocate using macros to send unique codes,
a start/stop pair per function, to a parallel port. A logic analyzer then
acquires the data. This is more trouble than it's worth. The analyzer quickly
accumulates thousands or millions of data points; sometimes much more than it
can store. It's impossible to make sense of the information without writing a
computer program to empty the tool's buffer and convert raw codes and time
into meaningful info.
Worse, the analyzer collects data only during the debugging
phase, and only then when you're specifically taking performance measurements.
I believe in proactive debugging - seed the code and the execution environment
with tools and tricks to log data and flag errors all the time.
And that's where the idea of "real time asserts"
shines. Expand RT_exit_assert()
just a bit. Pass a parameter that indicates the maximum allowed time. The macro
computes the actual period! and errors if it exceeds what we've expected.
Maybe you leave this in only during debugging to grab your attention when the
system snarls. Perhaps it's deployed with the production release to take some
action when insane bottlenecks occur.
After all, missing a deadline is as much of a bug in many
real time systems as is a complete system crash. If we use asserts to capture
errors in the procedural domain, we should do no less in the time domain.
|