
In-Circuit Emulators

The basics of the ICE. Originally in Embedded Systems Programming, October, 1998.



By Jack Ganssle

Throughout the 70s and 80s legions of engineers became entrepreneurial wannabes by designing - and in some cases actually selling - cheap in-circuit emulators. Though most failed horribly, some of the well-known vendors of today (like Huntsville Microsystems, Softaid, and Orion) started in this manner.

I, too, participated in this madness, designing and selling a $600 unit for various 8-bit processors. Over the years I learned some basic truisms of the emulator business, truisms that remain fundamental to it today:

  1. Customers demand - though rarely use - complex feature sets.
  2. The cost to build the unit is insignificant compared to the costs of developing the unit, support and sales.
  3. It's just as easy (or hard) to sell and support a $6000 emulator as a $600 unit. The $6k price, while offering more features, yields a healthier profit margin.

The embedded tool marketplace is horribly fragmented. Hundreds of different processors, each with innumerable variants, are in common use today. Emulator sales volumes for a particular CPU are small. That simple-looking palm-sized emulator costs $8000 not because of its internal complexity; rather, the vendor amortizes perhaps a million dollars of engineering over a small sales volume. Worse, support costs eat up enormous amounts of expensive engineering time.

Though developers often rail against the high costs of emulators, in fact the average price has declined from $20-30k (in 1975 dollars) to under $10k today - while the performance of the tools has skyrocketed.

The ICE Defined

Firmware is typically inaccessible, deeply buried inside of a product lacking any sort of realistic user interface. One of the greatest hassles we developers face is finding ways to get at the code, to see what it's doing, in both the time domain and the execution path.

The need to peer inside of embedded code has spawned a great number of debugging products. All have their own strengths and weaknesses, their unique features and price points. Figure 1 shows typical features for different sorts of tools.

| Feature | ICE | BDM | ROM Monitor | Logic Analyzer | ROM Emulator |
|---------|-----|-----|-------------|----------------|--------------|
| Source debugging | Yes | Yes | Yes | Some | Yes |
| Download code | Yes | Yes | Yes | No | Yes |
| Single step | Yes | Yes | Yes | No | Yes |
| Basic breakpoints | Yes | Yes | Yes | No | Yes |
| Display/alter registers, memory & I/O | Yes | Yes | Yes | Yes | Yes |
| Watch variables | Yes | Yes | Yes | Yes | Yes |
| Real-time trace | Yes | No | No | Yes | No |
| Event triggers | Yes | No | No | Yes | No |
| Overlay RAM | Yes | No | No | No | Yes |
| Shadow RAM | Some | No | No | No | No |
| Hardware breakpoints | Yes | Some (limited) | No | No | Some |
| Complex breakpoints | Yes | No | No | Yes | No |
| Time stamps | Yes | No | No | Yes | No |
| Execution timers | Yes | No | No | Yes | No |
| Non-intrusive access | Yes | Yes | No | Yes | No |

Figure 1: Typical features of various debugging tools

A quick inspection of the figure shows that logic analyzers and emulators share a number of features offered by no other products. These are:

Non-intrusive access - Probably the most important feature of an emulator, and one shared by BDMs and logic analyzers, non-intrusive access means the emulator 'gets inside the head' of your target system without consuming the target's memory, peripherals, or any other resources. Yet BDMs are available for only a small set of processors, and logic analyzers generally cannot probe microcontrollers.

As CPUs get more complex, though, ICEs - like all tools - come with more restrictions that you, the user, must be aware of. If the part has cache, will the ICE work with cache enabled? A more insidious - and common - problem stems from pins shared between several functions. If address line 18, for example, can be changed to a timer output under program control, will the emulator choke? Call the vendor and ask for the 'restriction list' before buying any emulator.

Real Time Trace - Trace is the emulator's primary miracle feature. It captures the execution stream of your code in real time, displaying it in the original C or C++ source. Trace depths are measured in frames, where one frame is one memory or I/O transaction - thus, a single instruction may eat up several frames of storage.

Trace width is given in bits, and generally includes the address, data and some of the control busses, perhaps also with external inputs (to show how the code and hardware synchronize), and timing information. Widths vary from 32 bits to over one hundred.
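As a concrete illustration of how those bits get spent, here's one way a roughly 100-bit-wide trace frame might be laid out as a C structure. The field widths are assumptions for the sake of the example, not any vendor's actual format:

```c
#include <stdint.h>

/* Hypothetical layout of one trace frame: 32 address bits, 16 data
   bits, a byte of control/status lines, a byte of external probe
   inputs, and a 32-bit timestamp - 96 bits of trace width in all. */
typedef struct {
    uint32_t address;      /* address bus at this bus cycle        */
    uint16_t data;         /* data bus                             */
    uint8_t  control;      /* read/write/fetch status lines        */
    uint8_t  ext_inputs;   /* external inputs for hw/sw sync       */
    uint32_t timestamp;    /* free-running timestamp counter       */
} trace_frame_t;
```

A 256k-deep buffer of such frames consumes about 3 MB of fast capture RAM - part of why trace hardware is expensive.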

Event Triggers and filters - Event triggers start and stop trace acquisition. You define a condition (say 'when foobar=23'); in real time the emulator detects that condition and starts/stops the trace collection. Filters include or exclude cycles from the trace buffer (it makes little sense, for example, to acquire the execution of a delay routine).

Even with the hundreds of thousands of trace frames offered by some emulators, there's never enough depth to collect more than a tiny bit of the code's operation. Triggers and filters let you specify exactly what gets captured. The skillful use of triggers and filters reduces your need for deep trace, and greatly reduces the amount of acquired data you'll have to sift through.
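A rough software model of what the ICE does in hardware: a latched trigger ('start when 23 is written to foobar') and an address-range filter that excludes a delay routine. The addresses and field names here are hypothetical:

```c
#include <stdint.h>

/* One bus transaction, as the trace logic sees it. */
typedef struct {
    uint32_t addr;
    uint16_t data;
    int      is_write;
} frame_t;

#define FOOBAR_ADDR 0x2000u            /* assumed address of foobar   */

static int triggered = 0;              /* trigger latches once armed  */

/* Trigger: begin capturing when 23 is written to foobar. */
static int check_trigger(const frame_t *f)
{
    if (f->is_write && f->addr == FOOBAR_ADDR && f->data == 23)
        triggered = 1;
    return triggered;
}

/* Filter: drop cycles inside the (assumed) delay routine's range. */
static int keep_frame(const frame_t *f)
{
    return !(f->addr >= 0x3000u && f->addr < 0x3080u);
}
```

In a real emulator both comparisons happen in dedicated comparator logic on every bus cycle, which is what keeps them non-intrusive.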

Overlay RAM - also known as Emulation RAM - while physically inside of the emulator, is mapped into the target processor's address space. Overlay RAM replaces the ROM or Flash on your system so you can quickly download updated code as bugs are discovered and repaired. ICEs provide great latitude in mapping this RAM, so you can change between the emulator's memory and target memory with fine granularity. A singular benefit of overlay is that you can often start testing your code before the target hardware is available.

Today's Flash-based systems might seem to eliminate the need for overlay, but in fact Flash programs more slowly than RAM, leading to longer download times.

Shadow RAM - When the emulator updates the source debugger's windows it interrupts the execution of your code to extract data from registers, I/O, and memory - an interruption that can take from microseconds to milliseconds. Shadow RAM is a duplicate address space that contains a current image of your data, but that the emulator can access without interrupting target operation.

Hardware Breakpoints - Breakpoints stop program execution at a defined address, without corrupting the CPU's context. A software breakpoint replaces the instruction at the breakpoint address with a one byte/word 'call'. There's no hardware cost, so most debuggers implement hundreds or thousands. Hardware breakpoints are those implemented in the emulator's logic, often with a big RAM array that mirrors the target processor's address space. Hardware breakpoints don't change the target code; thus, they work even when debugging firmware burned in ROM.
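The software-breakpoint mechanism can be sketched in a few lines of C. The 0xCC trap opcode (x86 INT3) and direct patching of a code buffer are illustrative stand-ins for what a real debugger does through the target's memory interface:

```c
#include <stdint.h>

/* Minimal sketch of planting and removing a software breakpoint. */
typedef struct {
    uint8_t *addr;     /* where the breakpoint lives           */
    uint8_t  saved;    /* original opcode byte, for restoration */
} sw_breakpoint_t;

static void bp_set(sw_breakpoint_t *bp, uint8_t *addr)
{
    bp->addr  = addr;
    bp->saved = *addr;   /* remember the instruction we overwrite */
    *addr     = 0xCC;    /* plant the one-byte trap               */
}

static void bp_clear(sw_breakpoint_t *bp)
{
    *bp->addr = bp->saved;   /* put the original code back */
}
```

The only per-breakpoint cost is a tiny struct, which is why debuggers can offer them by the hundreds - but the technique obviously fails when the code lives in ROM.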

Some pathological algorithms defy debugging with software breakpoints. A ROM test routine, for example, might CRC the code itself; if the debugger changes the code for the sake of the breakpoint the CRC will fail. There's no such restriction with an emulator's hardware BPs.
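Here's a minimal sketch of such a self-test, with a byte array standing in for the ROM image and CRC-16 as the checksum. The moment a debugger patches even one byte to plant a software breakpoint, the computed CRC no longer matches:

```c
#include <stdint.h>
#include <stddef.h>

/* CRC-16 (polynomial 0xA001, as in CRC-16/ANSI) over a code image.
   A ROM self-test like this detects any single-byte change - such
   as a debugger swapping an instruction for a trap opcode. */
static uint16_t rom_checksum(const uint8_t *rom, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= rom[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xA001u : crc >> 1;
    }
    return crc;
}
```

Because a CRC-16 detects all error bursts of 16 bits or fewer, a single patched byte is guaranteed to change the result - exactly the failure mode described above.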

Hardware BPs do come at a cost, though, so some emulators (particularly smaller units) offer lots of breakpoints, with a few implemented in hardware and the bulk in software.

Complex Breakpoints - Simple BPs stop the program only on an instruction fetch ('stop when line 124 is fetched'). Their complex cousins, though, halt execution on data accesses ('stop when 1234 is written to foobar'). They'll also allow some number of nested levels ('stop when routine activate_led occurs after led_off called'). Emulators offer quite a diverse mix of nesting levels; few customers use more than two.

Desktop debuggers like that supplied with Microsoft's VC++ usually offer complex breakpoints - but they do not run in real time, and impose significant performance penalties. Part of the cost of an ICE is in the hardware required to do breakpoints in real time.

It's important to understand that a simple hardware or software breakpoint stops your code before the instruction is executed. Complex BPs, especially when set on data accesses, stop execution after the instruction completes. On processors with prefetchers it's not unusual for the complex breakpoint to skid a bit, stopping execution several instructions later.

Time Stamping - Most emulators include time information in the trace buffer. Usually around 32 bits of trace width is dedicated to the timestamp. Combined with the trace system's triggers, it's easy to perform quite involved timing measurements.
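Timestamp arithmetic is simple but worth getting right: with a 32-bit counter, unsigned subtraction yields the correct elapsed count even across one counter wraparound. The 20 ns tick period below is an assumed example, not any particular emulator's spec:

```c
#include <stdint.h>

/* Elapsed ticks between two 32-bit trace timestamps. Modulo-2^32
   unsigned arithmetic absorbs a single counter wraparound. */
static uint32_t trace_elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert to nanoseconds given the (assumed) timestamp tick period,
   e.g. 20.0 ns for a 50 MHz timestamp clock. */
static double trace_elapsed_ns(uint32_t start, uint32_t end,
                               double tick_ns)
{
    return (double)trace_elapsed_ticks(start, end) * tick_ns;
}
```

Measuring an interrupt latency or loop period then reduces to setting triggers on the two events and subtracting their frames' timestamps.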

To summarize, there are two profound differences between an emulator and any other debugging tool: first, the ICE works even in partially alive hardware, so it's unbeatable for hardware development as well as for firmware work.

Second, no other tool gives such profound insight into the timing issues so important to real time systems. Though every year pundits predict the demise of the ICE, if that were to happen we'd lose virtually our only way to see how the code works in the time domain.

ICE Challenges

Unhappily, in their quest for speed and ever cheaper parts, CPU vendors have created a blizzard of parts that challenge the skills of the best emulator designer.

The fast processor is the traditional emulator enemy. As bus speeds escalate to 100 MHz and beyond, the speed of electrons in wire becomes an ever more potent obstacle; couple this with small but ever-present delays in the ICE logic and the problem grows worse. A 5 nsec ICE-induced delay is, at 50 MHz, 25% of the total bus cycle time. The vendors continue to reduce emulator delays, but many admit that 50 to 100 MHz seems to be an almost insurmountable barrier.

Microcontrollers with internal memory, prefetchers, pipelines and on-board caches all hide the CPU core from the pins. These performance-enhancers make it all but impossible to get a clear view of what the processor is doing when. An emulator must use a special version of the processor - the bondout - that brings extra signals to additional pins to peer deeply into the CPU's brain.

Only hundreds of emulators for any particular chip are sold each year, so IC manufacturers are understandably reluctant to devote a lot of attention to bondouts. As a result they're a famous source of problems. Sometimes the bondout's version lags that of production chips. Or, the bondout might not be quite as fast as the production parts. Again, check the ICE vendor's restriction list to be sure the emulator performs just like an unadulterated CPU.

Most higher end processors do not come with bondout support. AMD, Intel, and others are doing intriguing things to make the standard version of the chip more 'emulatable'. Though the visibility into the core is never as complete as with a true bondout, complex debugger software that infers execution paths and is deeply knowledgeable about the processor's operation yields nearly seamless debugging.

All ICE vendors are necessarily now connection experts. A quick glance at their web sites shows a plethora of options for tying the emulator to your target system. PQFP devices really started the torrent of problems - hundreds of pins sprouting on all four sides of a device, with lead widths so small they're invisible from a foot or two away. BGA parts have all of their connections on the bottom of the chip, which makes these devices virtually unprobeable.

Yet the emulator must somehow connect reliably to all of the pins, while minimizing propagation delays.

Every problem spawns a solution; in America every problem creates a web of companies devoted to solving that problem. Emulation Technology, EDI, Advanced Interconnects and even HP offer a bewildering array of adapters that help probe these difficult parts. The prices are sometimes breathtaking.

A 'clip-on' adapter, which simply snaps onto the surface-mounted processor, is unquestionably the ideal connection option - from a user perspective. Clip-on connections are indeed simple and reliable on lower pin count devices, like PLCCs and sub-144-pin PQFPs.

At 144 pins and up clip-ons become more problematic. Some vendors recommend soldering an adapter in place of the CPU. This provides a very reliable connection at the cost of making your test board a prototype forever. A few others, like Beacon Development Tools, do offer clip-on adapters they claim function reliably even at these high densities. HP has a novel approach which involves gluing a mounting stud to the top of the CPU, and then pressing a conductive elastomer (rubber-like) connector down around the chip.

With dense parts expect to spend time ensuring the ICE-to-target connection is reliable. Follow the ICE vendor's recommendations, even if it means sacrificing one system as a prototype. Nothing is more frustrating than having an intermittent pin or two inside of a stack of adapters.

Market Trends

Despite the difficulties of supporting modern processors, emulators continue to be a critical tool for embedded development. As Lisa Evans of Applied Microsystems says, 'No matter what the processor is - fast, complex, BGA, etc. - we don't have a technical barrier for supporting it as long as we can establish a relationship with the CPU vendor.' Emulators are not going away; rather, they continue to evolve to meet changing market requirements.

In the early 90s, while prices of compilers, debuggers and the like held rather steady, ICE costs took a nosedive. $20-30k units all but disappeared, except for bleeding edge products. This helped reshape the nature of the products.

Now emulators look rather like toys - often they're small, palm-sized units with all of the electronics placed right at the target system's CPU socket. Applied Microsystems, Beacon, Hitex and others have all capitalized on ASIC technology to compact the debug electronics into a few complex parts. Eliminating the hundreds of chips once required reduces PCB costs, simplifies the power supply, and most importantly of all gets rid of those problematic emulation cables. Running high speed signals up a bundle of wires requires very precise signal management, greatly increasing costs.

Yet even as prices and physical sizes decline functionality has skyrocketed. 256k-deep trace buffers are not uncommon; breakpoint and triggering flexibility continues to expand. Some units offer more advanced features like performance analysis, while others break these add-ons into extra products more closely tailored to the intended application.

An almost incestuous partnering pervades the industry. Before the advent of C, every emulator vendor wrote its own software debugger to drive the ICE. C started to shift the landscape somewhat - a few vendors continued to provide proprietary debuggers, but more linked third-party debuggers to their hardware. With C++ it's almost impossible for an ICE vendor to create both the hardware tool and adequate supporting software. A few standard debuggers, mostly from compiler companies - like XRAY from Microtec and SingleStep from Software Development Systems - drive most of the emulators in the world.

Even more partnering comes from the prevalence of RTOSes in today's applications. An RTOS brings its own unique debugging challenges. To understand what tasks are running when, or to understand message and semaphore activity, the debugger needs a tremendous amount of knowledge about the operating system's internals. Though a lot of RTOS vendors sell their own debuggers, as a user you must find the intersection of many sets of tools: an ICE that supports a debugger that supports the RTOS that works with your compiler. Talk to your vendors, and make sure you select the combination that most of their customers use. Being a pioneer or on the fringe will always create support headaches.

An interesting trend is the evolution of the embedded programmer him/herself, a development that is also reshaping the tool market. In days of yore all developers were hardware folks, banging a bit of assembly code out. They worked at a very low level, deep inside the bits and bytes, with inadequate processors that demanded lots of size and speed optimizations. Being used to expensive tools (scopes, logic analyzers and the like) they demanded high-end emulators, and used them to the fullest.

Now we see far more developers coming from the computer science side. Though many straddle the hardware/software fence, most spend the bulk of their development time working on high-level, hardware-independent code. These people want visibility into the operation of the code, but often don't need more exotic features. Embedded Support Tool's Jim Watkins said that 'trace is diminishing in importance and demand, since it requires a person who can use trace well. The typical embedded programmer will not take time to learn.'

So traceless tools are making inroads, led by new debugging strategies like Motorola's BDM interface, which dedicates a few processor pins to a serial communication link used exclusively for debugging. BDMs solve most of the speed and visibility problems produced by modern processors, at (generally) the cost of losing a real-time look at your code.

Some companies (HP, HMI and EST for instance) now produce hermaphrodite debuggers. They use a BDM serial link to get inside the processor's brain, while coupling optional trace logic to the pins to provide a real time look at the code. The trend is to separate run-control (start/stop, access to CPU internals) from trace into two separate yet integrated tools. High end programmers who don't work in the bits and bytes buy just the BDM part; people working at a lower level get the whole shebang.

BDMs and BDM-like devices herald an important new trend in development tools, one too large to cover here. We'll have more to say in next month's special report on debugging about this subject.

Conclusion

Most emulator vendors interviewed for this article admitted being perplexed by the market's intolerance for expensive tools. Salaries continue to go up, and developers are in short supply. These conditions would seem to create a situation where companies would be willing to spend a bit more on emulators and the like to get more productivity from their engineers.

Yet an emulator purchase is still hard to swallow for a lot of MBA types. 'You mean you want me to spend $10k on a gizmo that finds all of the mistakes you make? Just stop making mistakes!', they cry incredulously. Bugs are a part of programming - a huge part of programming, so it makes sense to get the proper tools to deal with them quickly.

The emulator remains unique in its ability to look deeply into the soul of an embedded system, in both procedural and time domains. Check the restriction lists, take time to establish a reliable connection to your target board, and learn how to best use the tool's wide range of features.

 
