For novel ideas about building embedded systems (both hardware and firmware), join the 30,000+ engineers who subscribe to The Embedded Muse, a free biweekly newsletter. The Muse has no hype and no vendor PR. Click here to subscribe.
By Jack Ganssle
Published in Embedded Systems Programming, May 2001
Within months of the introduction of the 8008 an explosion of microprocessor-controlled products appeared. Perhaps the somewhat earlier four bit 4004 had started engineers thinking about applications. Clearly, eight bits was the right technology at the right time, as evidenced by companies frantically trying to hire non-existent embedded developers and the emergence of whole new classes of smart products
Almost immediately engineers discovered the difficulty of building firmware. Embedded systems were universally small applications without a terminal or other debugging interface. The old model of computing had failed. Embedded systems were not like general purpose computers that concurrently hosted both debugger and various applications; they carried one set of code burned permanently into EPROM, and did nothing else. They were as dedicated to the single application as the hardwired logic they had replaced.
How was one to debug hardware and software? The essential challenge of embedded work is the lack of visibility into the product's functioning. It's all buried under the hood. Perhaps our sole interface with the system is a pair of blinking LEDs, yet 100,000 transistors and 10,000 lines of code all interact in complex ways. We're not smart enough to get it right the first time, so need a way to poke into the internals.
The immediate response were remote debuggers that used a serial port to offer crude debugging capability. Set a breakpoint. Examine and change registers and memory. But transistors were still expensive then and UARTs were large 40 pin chips which were not yet integrated on-board the CPU. A spare port just for debugging was simply too expensive for many of the systems then being built.
In desperation people did crazy things. I remember coupling the bus of our embedded system to that of an "Intellec 8", a sort of general purpose computer based on the 8008. The terminal - a teletype - drove the Intellec; the embedded system looked like an extension of the computer's bus. This gave us remote debugger capability, and we could exercise the system hardware to track design faults, as long as a fault didn't step on the bus, crashing both systems.
About 1975 Intel invented the In-Circuit Emulator (ICE), a brilliant idea which saved a generation of developers. The tool replaced the application's CPU and (usually) gave engineers non-intrusive access to all of the target's resources. Even the very first ICE included real time trace, which captured program execution at full speed. Complex breakpoints supported if-then traps. A wealth of other resources greatly eased developers' trials.
At $20,000 (this was a lot of money in 1975, before Carter's inflation) emulators were princely acquisitions. But even 25 years ago embedded systems were being delivered late. Companies were willing to purchase expensive tools to speed time-to-market.
Those were kinder, gentler days. The hardware and software teams never battled over tools, schedule or other issues. A single hardware engineer usually designed the system and wrote all of the code in horribly scrambled spaghetti assembly language. Hence the need for a fancy debugger.
But this same developer realized that the ICE was as powerful tool for finding hardware faults as well as software bugs. When working with a new peripheral it's handy to send controlled I/O streams to the device to insure that the thing functions properly. Sure, you can write a program to do this, but it's so much easier to use the debugger to interactively probe the peripheral and immediately examine its response.
Much more difficult, though, is probing a brand new hardware design. Usually the slightest fault means the system crashes within microseconds. One mixed up data or address line, a bad chip select, or any of a hundred different errors will bring things down immediately. Traditional debug tools - like a remote debugger - are no help since they are themselves software which rely on functioning hardware. In the pre-ICE days the only solution was to use a logic analyzer and tediously capture the digital flow, constantly hitting reset, hoping to find the mistake that so quickly causes the system to go awry.
(Actually, before the microprocessor - which was dynamic and could never run slower than hundreds of KHz - there was another option. I built one computer out of 74LS logic components and slowed the clock to 1 Hz. A simple voltmeter was enough to track design faults!)
But the ICE was the perfect tool for trapping hardware problems. Even with a totally shorted bus the emulator still worked properly. It was simple to issue I/O or memory reference commands and see what happened, probing with a scope to track signals. Emulators even then had their own memory that developers could map in place of the target's, so one could write a tiny bit of code in known-good RAM that looped, issuing memory reads very quickly to the target. An ICE and scope together were a potent pair that made hardware troubleshooting quite easy.
We've come full circle. The BDM or JTAG tool dominates now, yet it's not much more than the crude remote debugger that predated the ICE. Though the price of ICEs have plunged with ever increasing functionality, they are rapidly fading from the scene. Managers don't want to hear things like "I need a $10,000 tool to find my mistakes." CPU vendors no longer make the specialty bond-out chips needed to make the tools. High speeds and too many go-fast CPU features are problematic for emulators. BDM tools are clearly the future.
BDMs are far superior to software-based debuggers for troubleshooting hardware since they will often boot up despite all sorts of target system design faults. Since they lack features like internal RAM, though, they're not as helpful as an ICE in finding basic hardware flaws.
So what do we do?
If you're still using ICEs, break through the metaphorical wall that divides hardware and software teams. Use the emulator - which seems to have evolved into a software-only tool - as your prime hardware debugger. I spent a lot of years in the ICE business, and was always amazed at how few EEs used emulators. They'd sweat bullets capturing that one microsecond event when an emulator would let them set up a loop that created the same problem 100,000 times per second.
Though BDMs are quite useful troubleshooting tools they can't function without a lot of operational hardware. The good news is that few need working target memory, address or data lines. The huge number of nodes on the busses makes these the most likely source of design problems.
The BDM does expect correctly functioning control inputs to the CPU. So use your scope and check the wait state input - is it asserted? How about interrupts? Make sure the clock is perfect. No modern processor is happy with sloppy rise and fall times or voltage levels that don't meet the processor's spec. Simple CPUs have only a handful of control inputs; more complex ones might have dozens. It's pretty easy, though, to probe each one and insure it behaves properly.
Don't forget Vcc! As a young engineer a wiser older fellow taught me to always round up the usual suspects before looking for exotic failures. On occasion I've forgotten that lesson, only to wander down complex paths, wasting hours and days. Check power first and often.
With the basics intact, boot up the BDM and use it to exercise address and data lines. One problem with some of these tools is that they create a lot of bus activity even when you've requested a single byte; how can you tell which transaction is the one you've commanded?
One solution is to program a processor chip select to be on for the narrowest possible range, one that's (hopefully) outside of the spurious debugger cycles. Trigger your scope on this, and then issue the appropriate read/write test commands.
Though the tools don't include emulation RAM you can still create loops to ease scooping. The debugger program that drives the BDM invariably includes a command structure that lets you issue the same read or write repeatedly. Expect a very slow repetition rate since each loop has a lot of serial overhead to the BDM.
Brains, not Tools
Commercial tools are wonderful but limiting. The most powerful debugger is the one that sits between your ears. Be innovative.
Most folks test new hardware using a simple loop burned into EPROM or Flash and hit the reset switch. If anything is wrong the system crashes instantly. You'll grow calluses hitting the reset switch when trying to find the problem with a scope or logic analyzer.
Some folks connect a pulse generator to the processor's reset input. This clever approach resets the CPU hundreds or even thousands of times per second, making it much easier to use a scope or analyzer to track down the defect.
Unfortunately, a null loop only checks a handful of address lines. This is the most frustrating of all conditions: you think the thing works, only to find that code at particular addresses, or large programs, don't function. It's a mistake to assume the hardware works just because you've proven a small subset; better, explicitly check every address and every data line before pronouncing the system ready for software integration.
With an ICE you can write a program in emulation RAM to cycle all target address and data lines. If a bit of target RAM works then the BDM can download test code there. But in either case the processor bus lines will be cycling like mad as the test code runs. How can your scope or analyzer identify the one test cycle from the blizzard of others needed just to run the test code?
Perhaps it's time to put the tools away. What can we do that sequentially toggles every address and data line?
Lots of processors have single-byte or -word software interrupt instructions. Suppose you could make the system execute nothing but an interrupt, over and over, no other instructions getting in the way. What would happen? The CPU issues a fetch, reads the interrupt instruction, and then pushes the current context on the stack. It will vector to the interrupt handler, read another interrupt, and push the context again - now with the stack pointer decremented by a few words since there was no return from interrupt. This goes on forever, the stack pointer marching down through all of memory.
It's easy to find these stack cycles as they are the only memory writes taking place. Trigger your scope on write and then look at each address line in turn. A0 cycles at a very fast rate. A1 at half that rate. A2 half A1. (On an 8 bit processor A0 will probably cycle, but it won't on 16 and 32 bit machines). Probe each and every node - every memory chip, every I/O device that's connected to address lines. You'll prove, for sure, that the address bus is properly connected everywhere. Or not.
Look at the data lines during the write cycles. Depending on what the CPU pushes during an interrupt you'll see the return address (probably that of the ISR) and perhaps flag register bits. Again, insure these bits go to each and every data node.
What about that ISR? We don't have one as we're executing the same instruction no matter where we are. There are also no interrupt vectors. In fact, we'll probably vector to the address equal to the software interrupt's hex code. So what? As long as the hex does not violate a basic processor rule the fact we're going to an arbitrary address is irrelevant. (A basic rule might be that the vector must be even, like on the 68k family).
Consider a real-mode x86 chip, like the 186. The one-byte software interrupt instruction is 0xcc. Executing that means the part will read high and low vectors, getting 0xcc for each read. The ISR is then at 0xcccc:cccc, a perfectly valid address. Your scope will see lots and lots of fetches from this address, plenty of vector reads from the vector addresses (0x0000c-0x000f), but writes to ever-decreasing stack locations.
But I've glossed over the biggest part of the trick: forcing the processor to execute the same software interrupt regardless of fetch address.
Remove all memory components, or disable them by disconnecting and idling the chip selects. Memory mapped peripherals need the same treatment. Then tie the data bus to the software interrupt instruction via pull-up and pull-down resistors. You must use resistors rather than hard-wired connections since the CPU will be driving these lines during stack writes. So for a real-mode x86 connect both high and low data lines to 0xcc. On a 68k use a breakpoint instruction. The Z80/180/8085 family is nicest of all since 0xff is RST7.
Turn power on and scope each data line to be sure it goes to the proper state. A shorted data line will invalidate the test.
If the processor doesn't have a single byte/word software interrupt this process won't work. There may be other limitations as well: the stack rolls over on a 64k boundary on the 186 so the test doesn't check 4 of the 20 address lines.
But the tool cost is zero, setup time almost immediate, and we get a tremendous amount of information.
Wither We Go
With emulators slowly disappearing from 16 and 32 bit development projects we're required more than ever to use clever debugging strategies in place of good tools. I'm frankly dismayed by the trend. Programs are getting bigger with more real time constraints, while at the same time tools grow dumber. Managers seem to have less tolerance than ever for expensive development systems, yet time-to-market concerns and engineering salaries both skyrocket.
Yes, new processors include more built-in debugging resources. But the goal of increased visibility into our embedded creations seems even more elusive than in 1975. Simulation will become more important, but ultimately some poor developer is always faced with the almost impossible task of figuring out why that code which runs so well under simulation crashes when ported to the perfectly-tested hardware.
A fantastic toolchain pays for itself every time.