Protected Mode
More on Protected Mode programming on x86 CPUs
I love small embedded systems. The massive projects
we see in weapons systems, for example, hold little attraction due to their
complexity. Worse, these development projects drag on for years, consuming big
fractions of a person's career.
Me, I get bored after 6 months to a year working on
the same thing. Systems designed around 8 bit microprocessors are limited by
address space to smallish programs and thus shorter development cycles. If you
live and breathe small CPUs, you have a chance to design a lot of different
systems in your career. That's fun.
I've often complained about the press's
fascination with the biggest and
most exotic processors. The attention focused on embedded SPARCs, Pentiums
and PowerPCs is all out of proportion to the number of systems that actually use
these behemoths. Billions of 8 bit micros are sold each year, dwarfing the real
or imagined 32 bit market.
Yet, the embedded 32 bit market does exist, and is
indeed growing. For instance, I'm
told the 747 uses several 386DX chips in its dashboard. These are identical to
the parts used in (old!) PCs. Presumably, the avionics' video requirements
demand quite a bit of compute horsepower, more than available from even the
latest 8051 and other, traditional embedded CPUs.
Now there is a plethora of new CPUs, processors
designed expressly for the embedded market, many of which capitalize on the
success and popularity of the PC's ubiquitous architecture. AMD's Elan
series, National's NS486SXF, and Intel's 386EX all offer 386 and 486
performance bundled with integrated peripherals. Like Motorola's and IBM's
PowerPC family, they offer workstation performance mixed with embedded
cost-reducing on-board peripherals.
All of these represent an attempt to bring fast
processing to the cost-sensitive embedded market. One interesting trend is that
of narrow busses. The 386EX, for example, is at heart a 32 bit 386. The data
bus, though, is but 16 bits wide, greatly reducing system costs at the expense
of some performance.
These processors aren't for everyone. Where
there's serious data reduction, or a need for lots of fast floating point, one
of these 32 bit speed demons may make a lot of sense. The average controller,
though, still requires little more than an 8 or maybe a 16 bitter. Processor
envy (you know, "my bus is bigger than yours") drives too many engineering
decisions, greatly inflating system costs for no benefit.
Before committing to a 32 bitter, it makes sense to look at
cheaper alternatives (assuming cost is an issue). One of the neat things about
the x86 family is the broad range of processors available. From the lowly 8088
to the Pentium, there's a mix of performance and speed for any application.
AMD's 186EM/ES family offers high performance at low system cost due to
relaxed bus timing. NEC's V-series includes a broad range of embedded
peripherals, and in some cases on-board memory.
Real and Protected
The two most important factors that mandate a 32 bitter are
raw horsepower and address space. Both very busy and very big programs are
candidates for this technology. Plenty has been written about performance
(though we still are unsure how to predict this). Little gets said about address
space. There are few issues when working in the nice flat confines of a 68k-like
architecture; accessing location 0ffffffff is no more difficult than getting to
zero.
The embedded x86 arena, though, is a bit different.
All members of this family carry the burden of the past. The sins of the father
are visited upon the children, yea unto the Nth generation!
So, since the newsgroups see a constant barrage of
questions about using large x86 address spaces, and since my electronic IN box
overfloweth with questions on the same subject, here are some comments and
resources.
A generation now has grown up with some level of
exposure to Intel's famed and hated "real mode" architecture. This relic
of the 8088 days limits addressing to 20 bits (1 MB). In real mode all addresses
in your program are 16 bit values. The CPU takes a "segment" value (stored in
one of 4 segment registers), shifts it left 4 bits, and adds the 16 bit address
to it to compute the final 20 bit location. This technique provided some
compatibility with older 8 bit systems while giving an albeit awkward way to
handle larger address spaces.
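The arithmetic is easier to see in code than in words. Here's a minimal sketch in C; the function names are mine, invented for illustration, and the 20 bit wrap models the 8088's 20 address lines.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode address formation, as on an 8088: the 16 bit segment is
   shifted left 4 bits (multiplied by 16) and added to the 16 bit offset.
   real_mode_address is a hypothetical name used only for this sketch. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;   /* only 20 address lines: wrap at 1 MB */
}

/* Worked examples: many segment:offset pairs name the same byte. */
void real_mode_examples(void)
{
    assert(real_mode_address(0x1234, 0x0010) == 0x12350u);
    assert(real_mode_address(0x1235, 0x0000) == 0x12350u);
}
```

Notice that two different segment:offset pairs land on the same physical byte, which is a big part of why real mode pointers are such a nuisance to compare.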
In assembly language segment handling is tedious. C
compilers mostly insulate the programmer from the confusion of real mode
addressing, mitigating some of the impact of this design.
Protected Mode changes everything you ever learned
about x86 segmentation while offering direct access to 32 bit addresses. Though
segment registers still play a part in every address calculation, their role is
no longer one of directly specifying an address. In protected mode segment
registers are pointers to data structures that define segmentation limits and
addresses. They're now called "selectors" to distinguish their operation
from that of real mode.
The selector is a 16 bit value whose upper 13 bits are an index into
the descriptor table (more on that momentarily); the low 3 bits pick which
table to use and a privilege level. Selectors live in the fondly
remembered segment registers: CS, DS, ES, SS, and two new ones: GS, and FS. Just
as in real mode, every memory access uses an implied or explicitly referenced
segment register (selector).
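Intel's manuals define a selector as a 16 bit value: a 13 bit table index in the upper bits, one bit choosing the global or local descriptor table, and a 2 bit requested privilege level. A small sketch of picking those fields apart follows; the type and function names are my own, not anything from Intel.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a protected mode selector. Names are hypothetical. */
typedef struct {
    uint16_t index;  /* bits 15..3: which descriptor table entry */
    uint8_t  table;  /* bit 2: 0 = global table, 1 = local table */
    uint8_t  rpl;    /* bits 1..0: requested privilege level     */
} selector_fields;

selector_fields decode_selector(uint16_t selector)
{
    selector_fields f;
    f.index = (uint16_t)(selector >> 3);
    f.table = (uint8_t)((selector >> 2) & 1u);
    f.rpl   = (uint8_t)(selector & 3u);
    return f;
}

uint16_t make_selector(uint16_t index, uint8_t table, uint8_t rpl)
{
    assert(index < 8192u && table < 2u && rpl < 4u);
    return (uint16_t)((index << 3) | (table << 2) | rpl);
}
```

With the table bit and privilege level both zero, as in most simple embedded setups, the selector value is just the entry number times 8: entry 1 is selector 0x08, entry 2 is 0x10, and so on.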
The "descriptor table" data structure contains
the segmentation information that, in real mode, existed in segment registers.
Now, instead of real mode's 4 lousy segments, you may define literally
thousands, allocating one 8 byte entry in the descriptor table to each segment.
And, each descriptor defines the segment's size as well as its base address,
so the CPU's hardware "protection" mechanism (hence the name) can ensure
no program runs outside of memory allocated to it.
Each descriptor contains the segment's base, or start,
address (a 32 bit absolute address), the segment size (expressed as a 20 bit
limit that counts either bytes or 4k pages, depending on a granularity bit),
and numerous segment rights and status bits.
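The fields are scattered around the 8 bytes in a decidedly odd order, a legacy of the 286's shorter descriptor. Below is a sketch of packing and unpacking one, following the layout in Intel's 386 documentation. The function names are hypothetical, and the access and flags values are passed through as raw bit patterns rather than decoded.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 386 segment descriptor into its 8 byte form.
   bits  0..15  limit 15..0      bits 48..51  limit 19..16
   bits 16..39  base 23..0       bits 52..55  flags (G is the top bit)
   bits 40..47  access rights    bits 56..63  base 31..24            */
uint64_t pack_descriptor(uint32_t base, uint32_t limit,
                         uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu);   /* the limit field is only 20 bits */
    assert(flags <= 0xFu);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;
    d |= (uint64_t)access << 40;
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;
    d |= (uint64_t)flags << 52;
    d |= (uint64_t)(base >> 24) << 56;
    return d;
}

uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFFu) | ((uint32_t)(d >> 56) << 24);
}

uint32_t descriptor_limit(uint64_t d)
{
    return (uint32_t)(d & 0xFFFFu) | ((uint32_t)((d >> 48) & 0xFu) << 16);
}
```

As a sanity check, `pack_descriptor(0, 0xFFFFF, 0x9A, 0xC)` yields 0x00CF9A000000FFFF, the familiar bit pattern for a 4 GB, 32 bit, readable code segment starting at zero.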
Thus, the descriptor tells the processor everything
it needs to know about accessing data or code in a segment. Accesses to memory
are qualified by the descriptor selected by the current segment register.
You, the poor overworked programmer, create the descriptor
table before switching into protected mode. Since
every memory reference uses this table, the common error of entering
protected mode with an incorrect descriptor table guarantees an immediate and
dramatic crash.
An example is in order. Code fetches always use CS. A
protected mode fetch starts by multiplying the index held in CS by 8 (the size
of each descriptor) and then adding the descriptor table base register (which
specifies the start address of the descriptor table). The CPU then reads an
entire 8 byte record from the descriptor table. The entry describes the start
of the segment; the processor adds the current instruction pointer to this
start to get an effective address.
A data access behaves the same way. A load from
location DS:1000 makes the processor read a descriptor by shifting the index
in DS left 3 bits (i.e., times 8), adding the table's base address, and reading
the 8 byte
descriptor at this address. The descriptor contains the segment's start address,
which is added to the offset in the instruction (in this case 1000). Offsets,
and segment start addresses, are 32 bit numbers - it's easy to reference any
location in memory.
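The lookup can be modeled in a few lines of C. This is only a sketch: the table is an ordinary array standing in for memory, the function names are made up, and no limit or rights checking is done here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the 32 bit base from a raw 8 byte descriptor. */
uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0xFFFFFFu) | ((uint32_t)(d >> 56) << 24);
}

/* Byte offset of the descriptor within its table: index times 8, which
   is simply the selector with its three low bits cleared. */
uint32_t descriptor_offset(uint16_t selector)
{
    return (uint32_t)(selector & ~7u);
}

/* Model of the address translation: fetch the descriptor the selector
   points at, then add the segment base to the offset. */
uint32_t translate(const uint64_t *table, size_t entries,
                   uint16_t selector, uint32_t offset)
{
    size_t index = descriptor_offset(selector) / 8u;
    assert(index < entries);            /* real hardware would fault */
    return descriptor_base(table[index]) + offset;
}
```

For instance, with a table whose entry 2 is 0x004092100000FFFF (a data segment based at 0x00100000), a load through selector 0x10 at offset 0x1000 resolves to linear address 0x00101000.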
Every memory access works through these 8 byte descriptors.
If they were stored only in user RAM the processor's throughput would be
pathetic, since each memory reference needs the information. Can you imagine
waiting for an 8 byte read before every memory access? Instead, the processor
caches a descriptor for each selector (one for CS, one for DS, etc.) on-chip, so
the segment translation requires no overhead. However, every load of a selector
(like MOV DS,AX or POP ES) will make the processor stop and read all 8 bytes
into its internal cache, slowing things down just a bit.
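A toy model makes the cost structure clear: the table gets read when a selector is loaded, never on an ordinary access. Everything below (the structures, the read counter) is invented for illustration and is not how the silicon is organized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t selector;   /* the visible 16 bit register        */
    uint64_t cached;     /* hidden on-chip copy of the 8 bytes */
} segment_register;

typedef struct {
    const uint64_t *table;    /* descriptor table in "memory"         */
    size_t          entries;
    unsigned        table_reads;  /* descriptor fetches from memory   */
} cpu_model;

/* Loading a selector (MOV DS,AX and friends) fetches the descriptor. */
void load_segment(cpu_model *cpu, segment_register *reg, uint16_t selector)
{
    size_t index = (size_t)(selector >> 3);
    assert(index < cpu->entries);
    reg->selector = selector;
    reg->cached = cpu->table[index];
    cpu->table_reads++;
}

/* Ordinary accesses use only the cached copy: no table read at all. */
uint32_t linear_address(const segment_register *reg, uint32_t offset)
{
    uint64_t d = reg->cached;
    uint32_t base = (uint32_t)((d >> 16) & 0xFFFFFFu)
                  | ((uint32_t)(d >> 56) << 24);
    return base + offset;
}
```

Load DS once, make a thousand accesses through it, and `table_reads` still says 1. Reload segment registers inside a tight loop and you pay for the fetch every time around.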
It's mind blowing to watch these transactions on a
logic analyzer. All 32 bit x86 parts are fast
(if you use them correctly - the 386EX boots with 31 wait states). The processor
screams along, sucking in data and code at a breathtaking rate, and then
suddenly all but stops as it reloads a descriptor into its local cache.
It's all a little mind boggling. The CPU manipulates
these 8 byte data structures automatically, reading, parsing, caching, and
working with them as needed, with no programmer intervention (once they are set
up).
The CPU does more than translate addresses as
described. In parallel it checks every memory reference to ensure it behaves
properly. If the offset reaches beyond the segment limit (stored in the
descriptor), the processor aborts the instruction
and generates a protection violation exception. It won't let you do something
stupid. You can even specify that a segment is read-only; a write will create
the same exception.
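Here's a rough model of those checks in C. In Intel's manuals the comparison is made on the offset within the segment, and the 20 bit limit is scaled to 4k pages when the granularity bit is set. The sketch covers ordinary (expand-up) segments only, and all of the names are mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t limit;          /* 20 bit limit field as stored     */
    bool     page_granular;  /* G bit: limit counts 4k pages     */
    bool     writable;
} segment;

typedef enum { ACCESS_OK, ACCESS_FAULT } access_result;

/* Highest legal offset in the segment. With G set the limit is shifted
   left 12 bits and the low 12 bits are filled with ones. */
uint32_t effective_limit(const segment *s)
{
    assert(s->limit <= 0xFFFFFu);
    return s->page_granular ? ((s->limit << 12) | 0xFFFu) : s->limit;
}

/* Would this access raise a protection exception? */
access_result check_access(const segment *s, uint32_t offset,
                           uint32_t size, bool is_write)
{
    assert(size >= 1u);
    uint64_t last = (uint64_t)offset + size - 1u;  /* last byte touched */
    if (last > effective_limit(s))
        return ACCESS_FAULT;        /* ran off the end of the segment */
    if (is_write && !s->writable)
        return ACCESS_FAULT;        /* write to a read-only segment   */
    return ACCESS_OK;
}
```

A 4 KB read-only segment (limit 0x0FFF, byte granular) accepts a one byte read at offset 0x0FFF, but a two byte read at the same offset faults, as does any write.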
Now, despite the fact that the x86 permits thousands
of protected mode segments, most embedded applications run with merely one or
two. In fact, you can create a single descriptor table entry that puts the
system into "flat" mode, where all 4 GB of memory is available in a single
absolute segment starting at zero. The descriptor gives a base address of zero
and a length of 4 GB. All references to memory then are perfectly linear,
emulating the linear addresses pioneered so long ago by Motorola et al.
A flat model is the easiest protected mode
configuration, and is adequate for many applications. It doesn't take
advantage of the wealth of protection schemes available on-chip, but most ROMed
systems are not threatened by rogue or malicious programs.
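For the curious, here's roughly what a flat table looks like as data. Treat it as a sketch under a few assumptions: in practice the table usually holds an unused null entry in slot 0 plus separate code and data descriptors covering the same 4 GB, since CS needs an executable segment and the data registers need a writable one. The access bytes 0x9A and 0x92 are the conventional ring 0 code and data values; your linker or locator may well generate something different.

```c
#include <assert.h>
#include <stdint.h>

enum { GDT_NULL, GDT_CODE, GDT_DATA, GDT_ENTRIES };

/* A minimal flat model table: base 0, limit 0xFFFFF, 4k granularity. */
const uint64_t flat_gdt[GDT_ENTRIES] = {
    0x0000000000000000ULL,  /* entry 0: null descriptor, never used   */
    0x00CF9A000000FFFFULL,  /* entry 1: code, execute/read, 32 bit    */
    0x00CF92000000FFFFULL   /* entry 2: data, read/write, 32 bit      */
};

/* Selector for a table entry (global table, privilege level 0). */
uint16_t selector_for(unsigned index)
{
    assert(index < GDT_ENTRIES);
    return (uint16_t)(index * 8u);
}

/* True when a descriptor spans all 4 GB starting at address zero. */
int is_flat(uint64_t d)
{
    uint32_t base  = (uint32_t)((d >> 16) & 0xFFFFFFu)
                   | ((uint32_t)(d >> 56) << 24);
    uint32_t limit = (uint32_t)(d & 0xFFFFu)
                   | ((uint32_t)((d >> 48) & 0xFu) << 16);
    int page_granular = (int)((d >> 55) & 1u);
    return base == 0 && limit == 0xFFFFFu && page_granular;
}
```

Startup code would load CS with `selector_for(GDT_CODE)` (0x08) and the data segment registers with `selector_for(GDT_DATA)` (0x10), after which every offset is simply a linear address.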
References
Actually entering protected mode is quite simple; Intel's
data books contain example code. Setting up the descriptor table, though, is a
pain. My advice: don't.
The linkers
and locators available from a number of sources will nearly automatically create
these tables. You may have to specify some setup information in a command file,
which gives segmentation rules for each named segment. This is not much more
complicated than the normal setup of any embedded linker, where ROM, RAM and
other addresses must be specified. Contact the vendors and get their application
notes. (Beg, borrow, or steal app notes; these are generally the codified source
of industry wisdom. Breeze through the product hype and learn from the
information presented.)
Do get Intel's series of databooks. The two most
important for understanding protected mode are "The 386 DX Microprocessor
Programmer's Reference Manual" and "80386 System Software Writer's
Guide". You'll find a bit of useful code and lots of detailed functional
descriptions.
Watcom (http://www.powersoft.com/products/languages/watccpl.html; contact them
at Sybase, Inc., 561 Virginia Road, Concord, MA 01742) sells a compiler
specifically targeted at protected mode applications.
Metaware (www.metaware.com, 2161 Delaware Avenue,
Santa Cruz CA 95060-5706, (408) 429-6382) also sells a protected mode compiler.
Borland and Microsoft dominate the compiler market
for real mode embedded x86 applications. The .EXE files these products produce
are not ROMable. The segmentation information in a .EXE is relative; DOS
allocates absolute segments during program load. When there's no DOS (as in
most ROMed systems), there's no segment fixup. Paradigm sells a Locator
that converts the .EXE to absolute files. When working in real mode, Paradigm's
Locator paired with a Borland or Microsoft compiler is the usual choice.
The 32 bit versions of these compilers also produce
relative output files, and also require the use of a locator, though one that
understands protected mode. Systems and Software (18012 Cowan #100, Irvine, CA
92614-6809, (714) 833-1700) sells a very popular locator for these compilers and
those from Metaware and Watcom. They also provide several protected mode
debuggers.
Concurrent Sciences (www.debugger.com, 530 S. Asbury,
P.O. Box 9666, Moscow, ID 83843, (208) 882-0445) is another source of protected
mode debuggers and locators. Also check out PharLap (www.pharlap.com, 60
Aberdeen Avenue, Cambridge, MA 02138, (617) 661-1510).
The net is full of information about building
protected mode applications. One site that contains more information than you may
want to know about anomalies with some of the chips is www.x86.org. Ignore the
nonsense about the site's religious war over Intel's "secrets" (yawn)
and check out the technical tidbits. One of the best parts of the site is a
series of links to Intel's databooks, which are available for download. It's
easier to find the links here than on Intel's site.
Conclusion
A 32 bit x86 processor makes sense if your application
needs either a lot of basic compute capability or a large address space. The
downside to a big chip is extra cost and increased power consumption.
In embedded systems protected mode is primarily
useful for getting access to addresses above 1 MB. The wonderful protection
schemes that come with descriptors are nice, but probably not all that important
for non-multiprogramming applications. Bear in mind that all of the new
instructions and addressing modes provided by the 32 bit x86 parts are available
in both protected and real modes.
If you're new to protected mode design, milk the
net and vendors' application notes. It's better to steal success than
painfully rediscover well-known code and techniques.