By Jack Ganssle

Two Completely Unique Products

Most of the many dozens of press releases that daily flood into my in-box gush over minor product upgrades or wax poetic about a contract award. But once in a while I run into a truly novel product that is exciting or important. Micrium's new uC/Probe fits that description.

One of the classic distinctions between embedded systems and other types of computer applications is one of visibility. A PC programmer can seed his programs with printf()s, and even populate special windows with variables and classes. An embedded system often has no display, or one that's very limited, so those sorts of debugging strategies just don't work.

On top of that, about 70% of embedded apps have a real-time OS. Multiple activities competing for scarce resources furiously sequence themselves, creating and releasing stacks and other resources, yet the developer has only the slightest insight into what is going on.

Intel addressed this issue in the 70s with the introduction of the MDS-800, the first in-circuit emulator, which gave developers hardware breakpoints and real-time trace to capture program flow and variable states. But that sort of great technology, still in use today though usually manifested in BDM and JTAG debuggers, only gives snapshots of activity rather than a real-time, uninterrupted view of what is going on. It's like operating a nuclear power plant by stopping the dynamos once in a while and dipping a ladle into the cooling water to take temperature measurements.

Wouldn't it be nice to see the A/D's output all the time? A continuous graph might be even nicer. Or watch all of the stack pointers' high water marks? That's how industrial plants work: operators have an array of gauges that continuously read out critical parameters.

Micrium, of uC/OS-II fame, has a tool that does all of this and more. uC/Probe is a hard-to-describe application that samples any number of your program's variables in real-time and displays them in pretty much any manner you'd like. Connect them to virtual gauges, spreadsheets, LEDs or other sorts of widgets to create an industrial control room for your code.

A few KB of code or less lives in your firmware to provide a communications channel to your PC, via a serial port, USB, TCP/IP or the ARM J-Link protocol. A Windows application forms the meat of the product. In "design mode" uC/Probe offers a big palette of graphical resources ("widgets") that you drag onto the screen. A symbol browser shows your program's variables and classes. Click on a display widget (like a gauge) to bring it into focus, and with just one more click you associate a variable with that widget. You'll need a toolchain that produces Elf or IEEE-695 so uC/Probe knows about your symbols.

Enter "run" mode and uC/Probe starts sending periodic requests for the variable values to the target, which are then displayed in the appropriate widgets (see figure 1).
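
To make the request-and-reply idea concrete, here is a minimal sketch in C of what a target-side handler could look like. It is not Micrium's protocol or code: the `probe_request_t` layout and the `probe_handle_request` name are hypothetical, and a real implementation would add framing, checksums and address validation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request: "send me `len` bytes starting at `addr`". */
typedef struct {
    const void *addr;   /* address the PC looked up in the symbol file */
    uint16_t    len;    /* number of bytes to read */
} probe_request_t;

/* Copy the requested bytes into `reply`. Returns the number of bytes
 * written, or 0 if the request is empty or does not fit the buffer. */
static size_t probe_handle_request(const probe_request_t *req,
                                   uint8_t *reply, size_t reply_size)
{
    assert(req != NULL && reply != NULL);
    if (req->addr == NULL || req->len > reply_size) {
        return 0;
    }
    memcpy(reply, req->addr, req->len);
    return req->len;
}
```

The PC side would issue one such request per displayed variable at whatever update rate the widget asks for, which is why the target-resident part can stay small.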

Figure 1: A run screen

In the figure two spreadsheet widgets monitor a number of variables associated with multitasking (in this case via uC/OS-II, though it will work with any RTOS, and, of course, with no RTOS). Notice that in this case the spreadsheet shows each stack's current and high-water marks, as well as the CPU loading by task. Think how easy it would be to find performance problems! The gauge displays another variable in both an analog and digital (in the odometer) format.
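
For readers who haven't computed a high-water mark before, one common approach is to paint the stack with a known pattern and later count how much of the pattern survives. The sketch below shows that generic technique in C; it is not taken from uC/OS-II, and the fill pattern, the function names and the downward-growing stack are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u   /* arbitrary fill pattern (an assumption) */

/* Fill a task's stack with the pattern before the task first runs. */
static void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Return the deepest usage seen so far, in words, assuming the stack
 * grows down from stack[words - 1] toward stack[0]. */
static size_t stack_high_water(const uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_FILL) {
        untouched++;
    }
    return words - untouched;
}
```

A result like this is just another variable, so it can be tied to a spreadsheet cell or gauge like any other.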

In design mode choose from lots of widgets, like a variety of moving graphs, LEDs, several sorts of gauges, bar charts, spreadsheets and switches and knobs. That's right: data can flow both ways. Want to stress test an application? Connect a knob to an update rate and crank it up to see, perhaps on a graph, how CPU utilization climbs.

The spreadsheet widget is essentially Excel with most of that application's functions and capabilities. So it's trivial to convert a binary A/D reading to engineering units or implement a filter or curve fit. Hyperlink support means you could display a value from a web-enabled instrument in a cell, or use that input to compute some other value. The possibilities for creating a test harness are simply breathtaking.
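
The same conversions are easy to picture in code. Here's a small C sketch of the two operations mentioned above, scaling a raw A/D count to engineering units and applying a simple filter. The 12-bit converter, the 3300 mV reference and the 1/8 filter constant are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a raw reading to millivolts, rounded to the nearest mV.
 * Assumes a 12-bit converter and a 3300 mV reference. */
static uint32_t adc_to_millivolts(uint16_t raw)
{
    const uint32_t full_scale = 4095u;
    const uint32_t vref_mv = 3300u;
    assert(raw <= full_scale);
    return ((uint32_t)raw * vref_mv + full_scale / 2u) / full_scale;
}

/* First-order low-pass filter: move the estimate 1/8 of the way
 * toward each new sample. */
static int32_t lowpass_update(int32_t estimate, int32_t sample)
{
    return estimate + (sample - estimate) / 8;
}
```

In the spreadsheet widget the equivalent would be a pair of cell formulas, with no change to the firmware at all.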

I think there's tremendous value in setting up uC/Probe to run all the time, not just when you're looking for a particular problem. Display task status, stack usage and the other parameters so critical to building a reliable embedded system. Toss in widgets monitoring analog parameters or computed results. My computer has dual monitors, so I put uC/Probe on one, and the IDE on the other, giving me both sorts of views into the code. That gives me a tremendous amount of visibility under the hood of a normally-inscrutable embedded system.

With a lot of widgets the communications channel to the target can get pretty busy, but it's possible to control the update rate, and to allocate variables to slow or fast queues. On a 25 MHz ARM9 at the max update rate about 200KB/sec were being transferred, burning 60% of the CPU's cycles. But at a more reasonable 10KB/sec that dropped to an inconsequential 2.5%. Micrium has versions for a number of processors (it will run on any CPU from 8 bits on up), and the source is included, complete with porting instructions and test cases.

uC/Probe is a big application and wants some decent PC horsepower, a situation getting more common in these Eclipse days. Priced at about $1k it's a tool I've wanted for a long time. Highly recommended.

Spinning Multicore

Parallax, of BASIC Stamp fame, released a very innovative - and very different - multicore processor a year or so ago called the Propeller. It has eight 32-bit processors ("cogs") sporting 2KB of RAM each. Other resources, including 64KB of RAM and ROM, are shared by all of the cogs. Nothing terribly new there. But a "hub" controls access to these assets by sequentially opening a window of opportunity for each cog. Cog 0 gets a shot, then cog 1, etc. Think of a V-8's distributor rotor. This hardware lockout means there are no bus contention problems or mutual exclusion issues. It hugely simplifies real-time programming.
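
The distributor-rotor idea can be modeled in a few lines of C. This is only a sketch of round-robin arbitration, not a description of the Propeller's actual hub timing: the window length is a parameter precisely because the real figure isn't given here, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COGS 8u

/* Which cog owns the hub during clock tick `now`, if each cog's window
 * lasts `window_ticks` ticks (the window length is an assumption). */
static unsigned hub_owner(uint32_t now, uint32_t window_ticks)
{
    assert(window_ticks > 0u);
    return (unsigned)((now / window_ticks) % NUM_COGS);
}

/* Ticks a cog must wait from `now` until its next window opens.
 * Returns 0 only when `now` is exactly the start of that window. */
static uint32_t hub_wait(unsigned cog, uint32_t now, uint32_t window_ticks)
{
    assert(cog < NUM_COGS && window_ticks > 0u);
    uint32_t rotation = NUM_COGS * window_ticks;  /* one full turn of the hub */
    uint32_t phase = now % rotation;              /* position within the turn */
    uint32_t start = cog * window_ticks;          /* when this cog's window opens */
    return (start + rotation - phase) % rotation;
}
```

The useful property is that the wait never exceeds one rotation, whatever the other cogs are doing, so worst-case access time is known in advance.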

Unlike many multicore CPUs this is a microcontroller. The bus does not connect to the part's pins so memory cannot be expanded. In fact, other than power and a couple of housekeeping pins, the device has just 32 I/O connections to the outside world. And herein lies another bit of clever quirkiness: each of those I/O lines goes to all of the cogs. If configured as outputs the result is the logical OR of all of the cogs' assertions. If cog 0 sets pin 23 to a zero, and cog 5 drives it high, the output will be high. Odd, huh? But this is a part meant for control applications. If a pin were tied to a warning light then any cog can easily, and without any complex interprocessor communications code, turn the light on.
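
The OR rule is simple enough to express as a short C model. The structure and function names below are hypothetical, and masking each cog's output with its direction bits is my assumption about how "configured as outputs" plays out.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COGS 8

/* Per-cog I/O state, one bit per pin (names are hypothetical). */
typedef struct {
    uint32_t direction;  /* 1 = this cog drives the pin as an output */
    uint32_t output;     /* level this cog asserts on the pin */
} cog_io_t;

/* Resolve the 32 pin levels as the OR of every cog's driven outputs. */
static uint32_t resolve_pins(const cog_io_t cogs[NUM_COGS])
{
    uint32_t pins = 0;
    for (int i = 0; i < NUM_COGS; i++) {
        pins |= cogs[i].direction & cogs[i].output;
    }
    return pins;
}
```

With cog 0 driving pin 23 low and cog 5 driving it high, bit 23 of the result is set, matching the example above.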

In addition to the 2KB of RAM, each cog has a pair of counters and, of all things, a video generator. But a lot of embedded applications feed small LCD displays; put the video on-chip and you can buy a cheaper display unit.

The cogs have no stack, and in fact have no need of a stack. Calls pass the return address very much like the SLJ instruction on the Univac 1100 series (for you old-timers). There are no registers, per se, either. Every instruction specifies both a destination and source for arguments in the cog's RAM. Cogs have instructions to start and stop other cogs, and to control built in hardware semaphores.

There are no interrupts: just assign a cog to the activity. It makes you think about a very different paradigm when processors are abundant.

A cog can read and write to the shared memory (well, it can only write to the 32KB that's not ROM), but cannot execute from that space. So programs are limited to 2KB/cog, though it is possible to reload that memory at any time from shared memory. 2KB might not sound like much, but many apps just don't need big programs.

An on-chip interpreter called Spin makes bigger programs possible at the expense of execution time. A cog executing a Spin app is really running the Spin interpreter from its local memory while executing tokens stored in the larger shared area. Tokens, of course, are much more compact than compiled binary from traditional languages.
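
If token interpreters are unfamiliar, this toy version in C shows the general shape: a small loop fetches compact tokens from a big array (standing in for shared memory) and acts on each one. The token set is invented and bears no relation to Spin's real tokens, and the little evaluation stack is just the shortest way to write such a loop, not a claim about how Spin's interpreter works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-byte tokens; Spin's real token set is not shown here. */
enum { TOK_PUSH, TOK_ADD, TOK_MUL, TOK_HALT };

/* Run tokens from `program` and return the value left on top of the
 * evaluation stack. */
static int32_t run_tokens(const uint8_t *program, size_t length)
{
    int32_t stack[16];
    size_t sp = 0;   /* number of values on the stack */
    size_t pc = 0;   /* index of the next token */

    while (pc < length) {
        uint8_t token = program[pc++];
        switch (token) {
        case TOK_PUSH:               /* operand byte follows the token */
            assert(pc < length && sp < 16);
            stack[sp++] = program[pc++];
            break;
        case TOK_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case TOK_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case TOK_HALT:
            pc = length;             /* stop fetching tokens */
            break;
        default:
            assert(!"unknown token");
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

Here (2 + 3) * 4 takes eight bytes of tokens, which hints at why tokenized programs are so compact, and every token costs a trip through the loop, which is where the execution time goes.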

Here's a complete two-cog Spin program that blinks a pair of LEDs:

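Var
               Long Stack[9]                  ' stack space for the second cog
Pub Main
               cognew(Toggle(16, 3_000_000, 100), @Stack)
               Toggle(17, 2_000_000, 200)
Pub Toggle (Pin, Delay, Count)
               dira[Pin]~~                    ' make the pin an output
               repeat Count
                 !outa[Pin]                   ' toggle the pin
                 waitcnt(Delay + cnt)         ' wait Delay clock ticks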

The cognew token starts routine Toggle running on another cog, while Main continues to run Toggle on the initial cog. Note that the code for all of the cogs lives in a single source file. Multicore programming just can't get much simpler.

The 32KB shared ROM has the Spin interpreter in it, of course, but that is under 2KB. The rest contains the video generator character set, plus log, antilog, and sine tables! Again, this defies our notion of what's found on a CPU, but makes some sense: logs make for faster multiplications and divisions, and trig calculations are common in many apps. Putting them in shared memory saves limited cog RAM.

The Propeller IDE looks like a simplified C development environment, but it, too, is full of novel thinking. Copy and paste work on arbitrary rectangles. There's a font for drawing schematics, which is TrueType, so any other Windows application can use it. Normal comments exist, of course, as does a sort of meta-comment. In "documentation" view all of the code disappears but the function definitions and meta-comments remain. Smart formatting automatically sets functions in alternating colors so the extent of a function is instantly clear. There's a lot more, but not room enough to describe it here.

At about $8 in quantity these 160 MIPS parts are worth checking out.