Published February 1997 in Embedded Systems Programming
By Jack Ganssle
I love small embedded systems. The massive projects we see in weapons systems, for example, hold little attraction due to their complexity. Worse, these development projects drag on for years, consuming big fractions of a person's career.
Me, I get bored after 6 months to a year working on the same thing. Systems designed around 8 bit microprocessors are limited by address space to smallish programs and thus shorter development cycles. If you live and breathe small CPUs, you have a chance to design a lot of different systems in your career. That's fun.
I've often complained about the press's fascination with the biggest and most exotic processors. The attention focused on embedded SPARCs, Pentiums and PowerPCs is all out of proportion to the number of systems that actually use these behemoths. Billions of 8 bit micros are sold each year, dwarfing the real or imagined 32 bit market.
Yet, the embedded 32 bit market does exist, and is indeed growing. For instance, I'm told the 747 uses several 386DX chips in its dashboard. These are identical to the parts used in (old!) PCs. Presumably, the avionics' video requirements demand quite a bit of compute horsepower, more than available from even the latest 8051 and other, traditional embedded CPUs.
Now there is a plethora of new CPUs, processors designed expressly for the embedded market, many of which capitalize on the success and popularity of the PC's ubiquitous architecture. AMD's Elan series, National's NS486SXF, and Intel's 386EX all offer 386 and 486 performance bundled with integrated peripherals. Like Motorola's and IBM's PowerPC family, they offer workstation performance mixed with embedded cost-reducing on-board peripherals.
All of these represent an attempt to bring fast processing to the cost-sensitive embedded market. One interesting trend is that of narrow busses. The 386EX, for example, is at heart a 32 bit 386. The data bus, though, is but 16 bits wide, greatly reducing system costs at the expense of some performance.
These processors aren't for everyone. Where there's serious data reduction, or a need for lots of fast floating point, one of these 32 bit speed demons may make a lot of sense. The average controller, though, still requires little more than an 8 or maybe a 16 bitter. Processor envy (you know, "my bus is bigger than yours") drives too many engineering decisions, greatly inflating system costs for no benefit.
Before committing to a 32 bitter, it makes sense to look at cheaper alternatives (assuming cost is an issue). One of the neat things about the x86 family is the broad range of processors available. From the lowly 8088 to the Pentium, there's a mix of performance and speed for any application. AMD's 186EM/ES family offers high performance at low system cost due to relaxed bus timing. NEC's V-series includes a broad range of embedded peripherals, and in some cases on-board memory.
Real and Protected
The two most important factors that mandate a 32 bitter are raw horsepower and address space. Both very busy and very big programs are candidates for this technology. Plenty has been written about performance (though we still are unsure how to predict this). Little gets said about address space. There are few issues when working within the nice flat confines of a 68k-like architecture; accessing location 0ffffffff is no more difficult than getting to zero.
The embedded x86 arena, though, is a bit different. All members of this family carry the burden of the past. The sins of the father are visited upon the children, yea unto the Nth generation!
So, since the newsgroups see a constant barrage of questions about using large x86 address spaces, and since my electronic IN box overfloweth with questions on the same subject, here are some comments and resources.
A generation now has grown up with some level of exposure to Intel's famed and hated "real mode" architecture. This relic of the 8088 days limits addressing to 20 bits (1 MB). In real mode all addresses in your program are 16 bit values. The CPU shifts a "segment" value (stored in one of 4 segment registers) left four bits and adds the 16 bit address to it to compute the final 20 bit location. This technique provided some compatibility with older 8 bit systems while giving an albeit awkward way to handle larger address spaces.
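To make the arithmetic concrete, here is a minimal C sketch of the real mode calculation. The function name is hypothetical, and the wrap to 20 bits is an assumption that models the original 8088; later parts can carry into a 21st bit depending on how the board handles address line A20.

```c
#include <stdint.h>

/* Hypothetical helper: form a real mode physical address.
 * The 16 bit segment is shifted left 4 bits (times 16) and the
 * 16 bit offset is added. Masking to 20 bits models the 8088,
 * which has only 20 address lines. */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t sum = ((uint32_t)segment << 4) + offset;
    return sum & 0xFFFFFu;
}
```

For example, real_mode_address(0x1234, 0x0010) gives 0x12350. Many different segment and offset pairs land on the same byte, which is a large part of why real mode pointers are awkward to compare.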
In assembly language segment handling is tedious. C compilers mostly insulate the programmer from the confusion of real mode addressing, mitigating some of the impact of this design.
Protected Mode changes everything you ever learned about x86 segmentation while offering direct access to 32 bit addresses. Though segment registers still play a part in every address calculation, their role is no longer one of directly specifying an address. In protected mode segment registers are pointers to data structures that define segmentation limits and addresses. They're now called "selectors" to distinguish their operation from that of real mode.
The selector is a 16 bit value whose upper 13 bits are an index into the descriptor table (more on that momentarily); the low three bits pick which table to use and carry a requested privilege level. Selectors live in the fondly remembered segment registers: CS, DS, ES, SS, and two new ones: FS and GS. Just as in real mode, every memory access uses an implied or explicitly referenced segment register (selector).
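On the 386 a selector is 16 bits wide: a 13 bit table index in bits 15..3, a table indicator in bit 2, and a requested privilege level in bits 1..0. The sketch below pulls those fields apart; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of a 16 bit protected mode selector. */
typedef struct {
    uint16_t index;    /* bits 15..3: entry number in the descriptor table */
    uint8_t  use_ldt;  /* bit 2: 0 = global table, 1 = local table */
    uint8_t  rpl;      /* bits 1..0: requested privilege level */
} selector_fields;

static selector_fields decode_selector(uint16_t selector)
{
    selector_fields f;
    f.index   = (uint16_t)(selector >> 3);
    f.use_ldt = (uint8_t)((selector >> 2) & 1u);
    f.rpl     = (uint8_t)(selector & 3u);
    return f;
}

/* Byte offset of the selected descriptor within its table.
 * Each descriptor is 8 bytes, so index * 8 is simply the selector
 * with its low three bits cleared. */
static uint32_t descriptor_byte_offset(uint16_t selector)
{
    return (uint32_t)(selector & 0xFFF8u);
}
```

A selector of 0x0010, for instance, names entry 2 of the global table at privilege level 0, and its descriptor sits 16 bytes into the table.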
The "descriptor table" data structure contains the segmentation information that, in real mode, existed in segment registers. Now, instead of real mode's 4 lousy segments, you may define literally thousands, allocating one 8 byte entry in the descriptor table to each segment. And, each descriptor defines the segment's size as well as its base address, so the CPU's hardware "protection" mechanism (hence the name) can ensure no program runs outside of memory allocated to it.
Each descriptor contains the segment's base, or start, address (a 32 bit absolute address), the segment size (expressed as a 20 bit limit counted either in bytes or, with the granularity bit set, in 4k pages), and numerous segment rights and status bits.
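The fields are not stored contiguously inside the 8 bytes, which is one reason hand-built tables go wrong so often. As a rough sketch, assuming the standard 386 segment descriptor layout and treating the whole entry as one little-endian 64 bit value, packing looks like this. The function name and parameter split are illustrative only.

```c
#include <stdint.h>

/* Pack a segment descriptor into a 64 bit value.
 *   bits  0..15  limit bits 0..15
 *   bits 16..39  base bits 0..23
 *   bits 40..47  access byte (present, privilege, type)
 *   bits 48..51  limit bits 16..19
 *   bits 52..55  flags (bit 55 is the 4k granularity bit)
 *   bits 56..63  base bits 24..31
 * limit is the 20 bit limit field, not a size in bytes. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;
    d |= (uint64_t)access << 40;
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;
    d |= (uint64_t)(flags & 0xFu) << 52;
    d |= (uint64_t)(base >> 24) << 56;
    return d;
}
```

With base 0, limit 0xFFFFF, access 0x92 and flags 0xC this yields 0x00CF92000000FFFF, a read/write data segment covering the whole address space.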
Thus, the descriptor tells the processor everything it needs to know about accessing data or code in a segment. Accesses to memory are qualified by the descriptor selected by the current segment register.
You, the poor overworked programmer, create the descriptor table before switching into protected mode. Since every memory reference uses this table, the common error of entering protected mode with an incorrect descriptor table guarantees an immediate and dramatic crash.
An example is in order. Code fetches always use CS. A protected mode fetch starts by taking the index held in CS, multiplying it by 8 (the size of each descriptor), and then adding the contents of the descriptor table base register (which specifies the start address of the descriptor table). The CPU then reads an entire 8 byte record from the descriptor table. The entry describes the start of the segment; the processor adds the current instruction pointer to this start to get the final linear address.
A data access behaves the same way. A load from location DS:1000 makes the processor read a descriptor by shifting the index in DS left 3 bits (i.e., times 8), adding the table's base address, and reading the 8 byte descriptor at this address. The descriptor contains the segment's start address, which is added to the offset in the instruction (in this case 1000). Offsets and segment start addresses are 32 bit numbers, so it's easy to reference any location in memory.
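The lookup just described can be modeled in a few lines of C. This is only a software model of what the hardware does on its own, with limit and rights checks left out; the sample table and all names are hypothetical.

```c
#include <stdint.h>

/* Extract the 32 bit base address scattered across a descriptor:
 * bits 16..39 hold base bits 0..23, bits 56..63 hold base bits 24..31. */
static uint32_t descriptor_base(uint64_t d)
{
    uint32_t low  = (uint32_t)((d >> 16) & 0xFFFFFFu);
    uint32_t high = (uint32_t)((d >> 56) & 0xFFu);
    return low | (high << 24);
}

/* Model of the translation: index the table with the selector's index
 * field, fetch the 8 byte descriptor, and add its base to the offset. */
static uint32_t linear_address(const uint64_t *table, uint16_t selector,
                               uint32_t offset)
{
    uint64_t descriptor = table[selector >> 3];
    return descriptor_base(descriptor) + offset;
}

/* Hypothetical three entry table: the unused null entry, a 4 GB code
 * segment based at zero, and a 64k data segment based at 0x00200000. */
static const uint64_t sample_table[3] = {
    0x0000000000000000ULL,
    0x00CF9A000000FFFFULL,
    0x004092200000FFFFULL
};
```

With DS holding 0x0010 (entry 2), linear_address(sample_table, 0x0010, 0x1000) returns 0x00201000: the segment's base plus the offset from the instruction.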
Every memory access works through these 8 byte descriptors. If they were stored only in user RAM the processor's throughput would be pathetic, since each memory reference needs the information. Can you imagine waiting for an 8 byte read before every memory access? Instead, the processor caches a descriptor for each selector (one for CS, one for DS, etc.) on-chip, so the segment translation requires no overhead. However, every load of a selector (like MOV DS,AX or POP ES) will make the processor stop and read all 8 bytes into its internal cache, slowing things down just a bit.
It's mind blowing to watch these transactions on a logic analyzer. All 32 bit x86 parts are fast (if you use them correctly - the 386EX boots with 31 wait states). The processor screams along, sucking in data and code at a breathtaking rate, and then suddenly all but stops as it reloads a descriptor into its local cache.
It's all a little mind boggling. The CPU manipulates these 8 byte data structures automatically, reading, parsing, caching, and working with them as needed, with no programmer intervention (once they are set up).
Not only does the CPU translate addresses as described; in parallel it checks every memory reference to ensure it behaves properly. If the offset is greater than the segment limit (stored in the descriptor), the processor aborts the instruction and generates a protection violation exception. It won't let you do something stupid. You can even specify that a segment is read-only; a write will create the same exception.
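A simplified model of that check, assuming an ordinary expand-up segment and ignoring privilege levels entirely, might look like the following. The types, names and the sample ROM segment are invented for the sketch; the real processor does all of this in hardware.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;      /* linear address where the segment starts */
    uint32_t limit;     /* largest valid offset within the segment */
    bool     writable;  /* false for a read-only segment */
} segment;

typedef enum { ACCESS_OK, PROTECTION_FAULT } access_result;

/* Check an access of `size` bytes at `offset` within the segment.
 * The offset, not the final linear address, is compared to the limit. */
static access_result check_access(const segment *seg, uint32_t offset,
                                  uint32_t size, bool is_write)
{
    /* Offset of the last byte touched, computed in 64 bits so that
     * accesses near the top of the offset range cannot wrap around. */
    uint64_t last = (uint64_t)offset + size - 1;

    if (last > seg->limit) {
        return PROTECTION_FAULT;
    }
    if (is_write && !seg->writable) {
        return PROTECTION_FAULT;
    }
    return ACCESS_OK;
}

/* Hypothetical 64k read-only segment, such as a ROM. */
static const segment rom_segment = { 0x000F0000u, 0xFFFFu, false };
```

A 4 byte read at offset 0xFFFC passes because its last byte is at 0xFFFF; the same read at 0xFFFD faults, as does any write to this segment.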
Now, despite the fact that the x86 permits thousands of protected mode segments, most embedded applications run with merely one or two. In fact, you can create a single descriptor table entry that puts the system into "flat" mode, where all 4 GB of memory is available in a single absolute segment starting at zero. The descriptor gives a base address of zero and a length of 4 GB. All references to memory then are perfectly linear, emulating the linear addresses pioneered so long ago by Motorola et al.
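For illustration, here is what a flat table often looks like when written out as raw 64 bit values, along with a small helper that confirms each entry really spans 4 GB. In practice the table usually carries separate code and data descriptors covering the same range, since one descriptor cannot be both executable and writable. Treat the constants as typical values to be checked against your own tools, not as the output of any particular locator; the names are hypothetical.

```c
#include <stdint.h>

/* Entry 0 is the unused null descriptor. Entries 1 and 2 both have
 * base 0 and limit 0xFFFFF with the granularity bit set, so the
 * limit counts 4k pages. */
static const uint64_t flat_table[3] = {
    0x0000000000000000ULL,
    0x00CF9A000000FFFFULL,  /* code: execute/read, 32 bit */
    0x00CF92000000FFFFULL   /* data: read/write, 32 bit */
};

/* Segment size in bytes implied by a descriptor's limit field and
 * granularity bit (bit 55). */
static uint64_t descriptor_size_bytes(uint64_t d)
{
    uint64_t limit = (d & 0xFFFFu) | (((d >> 48) & 0xFu) << 16);
    int page_granular = (int)((d >> 55) & 1u);
    return page_granular ? (limit + 1) << 12 : limit + 1;
}
```

Both real entries come out at 0x100000000 bytes: 0x100000 pages of 4k each.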
A flat model is the easiest protected mode configuration, and is adequate for many applications. It doesn't take advantage of the wealth of protection schemes available on-chip, but most ROMed systems are not threatened by rogue or malicious programs.
Actually entering protected mode is quite simple; Intel's data books contain example code. Setting up the descriptor table, though, is a pain. My advice: don't.
The linkers and locators available from a number of sources will nearly automatically create these tables. You may have to specify some setup information in a command file, which gives segmentation rules for each named segment. This is not much more complicated than the normal setup of any embedded linker, where ROM, RAM and other addresses must be specified. Contact the vendors and get their application notes. (Beg, borrow, or steal app notes; these are generally the codified source of industry wisdom. Breeze through the product hype and learn from the information presented.)
Do get Intel's series of databooks. The two most important for understanding protected mode are "The 386 DX Microprocessor Programmer's Reference Manual" and "80386 System Software Writer's Guide". You'll find a bit of useful code and lots of detailed functional descriptions.
Watcom (http://www.powersoft.com/products/languages/watccpl.html, contact them at Sybase, Inc., 561 Virginia Road, Concord, MA 01742) sells a compiler specifically targeted at protected mode applications.
Metaware (www.metaware.com, 2161 Delaware Avenue, Santa Cruz CA 95060-5706, (408) 429-6382) also sells a protected mode compiler.
Borland and Microsoft dominate the compiler market for real mode embedded x86 applications. The .EXE files these products produce are not ROMable. The segmentation information in a .EXE is relative; DOS allocates absolute segments during program load. When there's no DOS (as in most ROMed systems), there's no segment fixup. Paradigm sells a Locator that converts the .EXE to absolute files. When working in real mode, Paradigm and a Borland or Microsoft compiler are the usual choice.
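To show what "segment fixup" means, the sketch below applies DOS-style relocations to a load image in memory: each relocation entry points at a 16 bit word holding a relative segment, and the fixup adds the segment where the image will really live. It assumes the classic .EXE relocation entry layout (offset word, then segment word) and is not a description of how any vendor's locator actually works. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One relocation entry: where, relative to the start of the load
 * image, a segment value needing fixup is stored. */
typedef struct {
    uint16_t offset;
    uint16_t segment;
} exe_reloc;

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void write_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
}

/* Add base_segment to every word the relocation table points at.
 * Returns false if an entry falls outside the image. */
static bool apply_fixups(uint8_t *image, size_t image_size,
                         const exe_reloc *relocs, size_t count,
                         uint16_t base_segment)
{
    for (size_t i = 0; i < count; i++) {
        size_t pos = ((size_t)relocs[i].segment << 4) + relocs[i].offset;
        if (pos + 2 > image_size) {
            return false;
        }
        uint16_t relative = read_le16(&image[pos]);
        write_le16(&image[pos], (uint16_t)(relative + base_segment));
    }
    return true;
}
```

DOS does this at load time with whatever segment it happens to allocate. A locator does the equivalent once, at build time, with the fixed addresses of your ROM and RAM.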
The 32 bit versions of these compilers also produce relative output files, and also require the use of a Locate, though one that understands protected mode. Systems and Software (18012 Cowan #100, Irvine, CA 92614-6809, (714) 833-1700) sell a very popular Locate for these compilers and those from Metaware and Watcom. They also provide several protected mode debuggers.
Concurrent Sciences (www.debugger.com, 530 S. Asbury, P.O. Box 9666, Moscow, ID 83843, (208) 882-0445) is another source of protected mode debuggers and locators. Also check out PharLap (www.pharlap.com, 60 Aberdeen Avenue, Cambridge, MA 02138, (617) 661-1510).
The net is full of information about building protected mode applications. One that contains more information than you may want to know about anomalies with some of the chips is www.x86.org. Ignore the nonsense about the site's religious war over Intel's "secrets" (yawn) and check out the technical tidbits. One of the best parts of the site is a series of links to Intel's databooks, which are available for download. It's easier to find the links here than on Intel's site.
A 32 bit x86 processor makes sense if your application needs either a lot of basic compute capability or a large address space. The downside to a big chip is extra cost and increased power consumption.
In embedded systems protected mode is primarily useful for getting access to addresses above 1 MB. The wonderful protection schemes that come with descriptors are nice, but probably not all that important for non-multiprogramming applications. Bear in mind that all of the new instructions and addressing modes provided by the 32 bit x86 parts are available in both protected and real modes.
If you're new to protected mode design, milk the net and vendors' application notes. It's better to steal success than painfully rediscover well-known code and techniques.