386 Protected Mode (part 1)
Here's how protected mode changes everything you know about the
x86.
In the few years since Intel released the 386 processor, it has
gone from a tremendously overpriced compute engine to the minimum
processor for anyone considering purchasing a PC. Proliferation
versions (like the 386SX and AMD's variants) drive the chip cost
down while maintaining software compatibility with the rest of
the line.
It might seem that those of us in the embedded world could ignore this technology,
since so many designs revolve around low performance controllers.
Now, however, more and more embedded systems use the 386 series
of components. Examples include high speed data communications
devices (though in cheap modems the Z80 still reigns supreme),
graphics equipment, and ultra-high-speed data acquisition gear.
Even the cockpit displays of some modern jetliners use 386s as
controllers.
Why? What's so great about the 386 that compels a designer to
include a $325 processor in his embedded system? The 386 offers
two important features: raw compute horsepower, and the potential
for a huge address space.
I recently had the opportunity to design a rather complex embedded
system using a 386, and found the experience to be both frustrating
and rewarding. Frustrating, because Intel's documentation assumes
the reader is completely knowledgeable about protected mode. Rewarding,
because the processor's power and complexity are awesome. I ended
the project with a great deal of respect for those who mastered
this complexity to design the chip way back in the mid-80s.
386 Benefits
Most of us computing with a 386-based PC run the processor in
its slowest and least functional mode. Yet, even then we get staggering
performance improvements over the machines we lusted after a decade
ago. Most PC applications run in "real mode", using
8088-like 20 bit addresses and 16 bit registers.
The 386 can and does often act just like a very fast 8088. Its
most obvious virtue is its raw speed. With no wait states, a machine
(bus) cycle takes only two clocks. At 33 MHz, this is a blazing 61
nsec per cycle. Short instructions (e.g., a register to register
move) complete in as little as two clocks, or about 61 nsec. This
baby is no slouch at moving data!
There is a sort of hidden price to running so fast, though. How
many memory systems can present data so quickly? Inject a single
wait state, which stretches each bus cycle from two clocks to three, and memory throughput declines by a third.
Any high performance embedded system will likely need costly cache
to properly match memory speeds to the processor's bandwidth.
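The arithmetic is easy to check. Here's a minimal C sketch, assuming (as the text does) two clocks per zero-wait-state bus cycle plus one extra clock per wait state; the function names are made up for illustration.

```c
#include <assert.h>

/* Nanoseconds per bus cycle at a given clock rate (MHz), assuming two
   clocks per zero-wait-state cycle plus one clock per wait state. */
static double bus_cycle_ns(double clock_mhz, int wait_states)
{
    assert(clock_mhz > 0.0 && wait_states >= 0);
    double clock_ns = 1000.0 / clock_mhz;   /* one clock period in nsec */
    return (2 + wait_states) * clock_ns;
}

/* Memory bandwidth with the given number of wait states, as a fraction
   of the zero-wait-state bandwidth. */
static double relative_bandwidth(int wait_states)
{
    assert(wait_states >= 0);
    return 2.0 / (2 + wait_states);
}
```

At 33 MHz, bus_cycle_ns(33.0, 0) comes out near 60.6 nsec, and a single wait state stretches it to about 90.9 nsec; relative_bandwidth(1) is two thirds, which is where the one-third loss comes from.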
The 386 has a richer instruction set than its 80x88 cousins.
32 bit multiply/divides, a barrel shifter that shifts by up to 31 bits
in at most 7 clocks, and bit manipulation instructions are all included.
All registers are 32 bits, so handling decent sized data is a breeze.
Embedded people might be disappointed by its lack of peripherals.
64180/Z180, 8051, 80196, and other embedded parts include timers,
serial ports, and the like, all designed to reduce the cost and
size of a system. Not so the 386, which is targeted only at high
performance, high cost applications. I hope Intel or AMD will
eventually come up with versions specifically for embedded markets,
including serial and parallel ports. It would seem a sensible
use of the vendors' ability to cram ever more functionality onto
a piece of silicon. After all, even the RISC folks are now aiming
processors specifically at the embedded marketplace.
Protected vs. Real Modes
If you've worked with the 80x88 family, you are intimately familiar
with what 386 documentation calls "Real Mode". Real
Mode addresses are limited to 20 bits, and are generated by adding
a 16 bit segment register, shifted left four bits, to a 16 bit
offset. This much maligned segmentation causes no end of grief
for programmers trying to access large data structures. Since
an offset cannot exceed 16 bits, you just can't increment beyond
64k; you'll have to watch for a 64k boundary and then play games
with the segment register.
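To make the real mode arithmetic concrete, here's a C sketch of the address calculation and of the segment game you play at a 64k boundary. The helper names are hypothetical, and the sketch ignores the PC's A20 gate.

```c
#include <assert.h>
#include <stdint.h>

/* Real mode address: 16 bit segment shifted left four bits, plus a
   16 bit offset. The largest result is 0FFFFh:0FFFFh = 10FFEFh. */
static uint32_t real_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Hypothetical helper: advance a segment:offset pointer by one byte.
   The 16 bit offset wraps from 0FFFFh to 0, so at a 64k boundary we
   move the segment up by 1000h (64k / 16) to compensate. */
static void far_increment(uint16_t *segment, uint16_t *offset)
{
    (*offset)++;                 /* wraps 0xFFFF -> 0x0000 */
    if (*offset == 0)
        *segment += 0x1000;      /* next 64k chunk */
}
```

Every pointer increment in a large array pays for that boundary test, which is exactly the grief the text describes.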
The 386's Protected Mode changes everything you ever learned about
80x88 segmentation. Protected mode offers direct access to 32
bit addresses. Though segment registers still play a part in every
address calculation, their role is no longer one of directly specifying
an address. In protected mode segment registers are pointers to
data structures that define segmentation limits and addresses.
More on this later.
On a 386 operating in real mode you have access to practically
every feature the 386 has to offer - with the exception of 32
bit addressing. Just about all of the new instructions are available.
All operands can be 8, 16, or even 32 bits. That's right - real
mode programs can easily handle double word long data, using 32
bit registers. On the 386, in real or protected modes, you access
operands as follows:
mov al,[1000] ; load 8 bits
mov ax,[1000] ; load a word
mov eax,[1000] ; load a double word
Manipulate data the same way:
add al,cl ; add two bytes
add eax,ecx ; add two 32 bit numbers
You can use the 32 bit registers to address memory, but in real
mode the offset may not exceed 64K (0FFFFh). The 386 will
generate an exception if the offset is too large.
Take advantage of the 386's extended instructions (even in real
mode) to greatly speed processing:
mul ecx ; unsigned 32 x 32 multiply of eax by ecx
; 64 bit result goes to edx:eax
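In C terms, the 386's 32 bit MUL behaves roughly like the sketch below: it takes a single explicit operand, multiplies EAX by it, and splits the 64 bit product across EDX (high half) and EAX (low half). mul32 is a hypothetical name used only to model the register behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the unsigned 32 bit MUL: EAX * operand -> EDX:EAX. */
static void mul32(uint32_t *eax, uint32_t *edx, uint32_t operand)
{
    assert(eax != 0 && edx != 0);
    uint64_t product = (uint64_t)*eax * operand;  /* full 64 bit result */
    *eax = (uint32_t)product;                     /* low 32 bits  */
    *edx = (uint32_t)(product >> 32);             /* high 32 bits */
}
```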
The processor includes extra segment registers. Where an 80x88
CPU only provides ES, DS, SS, and CS, the 386 adds FS and GS,
which you can use in real or protected mode.
Protected Mode Addressing
Segment registers are called "selectors" when operating
in protected mode, to distinguish their operation from that of
real mode, for these registers do indeed perform a selection process.
In protected mode, a segment register simply points to a data structure
that contains the information needed to access a location.
Every protected mode program must include a table of "descriptors",
which are 8 byte data structures that define the start and end
of a segment. Depending on the type of segment, a descriptor may
have other information such as access rights and the like. A typical
descriptor contains the following information, packed into an
8 byte record:
- Segment start: absolute 32 bit address
- Segment limit: the largest offset this segment allows (a 20 bit field, counted in bytes or in 4K pages)
- Segment status: privilege level, segment present, segment
available, segment type, etc.
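For the curious, here's a C sketch of how those fields are scattered across the 8 bytes. The byte layout follows the 386's descriptor format; the function names, and the example access byte values (9Ah for a present, privilege 0, readable code segment; 92h for writable data), are illustrations rather than anything a particular linker is guaranteed to emit.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a descriptor. 'limit' is 20 bits; 'flags' is the 4 bit field
   G (granularity, 8), D/B (default size, 4), 0, AVL (1). */
static void pack_descriptor(uint8_t d[8], uint32_t base, uint32_t limit,
                            uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF && flags <= 0xF);
    d[0] = limit & 0xFF;                 /* limit bits 7..0   */
    d[1] = (limit >> 8) & 0xFF;          /* limit bits 15..8  */
    d[2] = base & 0xFF;                  /* base bits 7..0    */
    d[3] = (base >> 8) & 0xFF;           /* base bits 15..8   */
    d[4] = (base >> 16) & 0xFF;          /* base bits 23..16  */
    d[5] = access;                       /* present, privilege, type */
    d[6] = (uint8_t)((flags << 4) | ((limit >> 16) & 0x0F)); /* flags + limit 19..16 */
    d[7] = (base >> 24) & 0xFF;          /* base bits 31..24  */
}

/* Reassemble the 32 bit base from its three pieces. */
static uint32_t descriptor_base(const uint8_t d[8])
{
    return d[2] | (d[3] << 8) | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
}

/* Reassemble the 20 bit limit from its two pieces. */
static uint32_t descriptor_limit(const uint8_t d[8])
{
    return d[0] | (d[1] << 8) | ((uint32_t)(d[6] & 0x0F) << 16);
}
```

The split base and limit fields are a legacy of the 286's 6 byte descriptors, which the 386 extended rather than replaced; it's also why nobody wants to assemble these tables by hand.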
Thus, the descriptor tells the 386 everything it needs to know
about accessing data or code in a segment. Accesses to memory
are qualified by the descriptor selected by the current segment
register. This 16 bit selector's upper 13 bits form an index
indicating which entry to use in the descriptor table; index 0 is
the first descriptor, index 1 the second, etc. (The first entry of
the Global Descriptor Table is reserved as a "null" descriptor and
can't be used to reach memory.) Of the remaining bits, one picks the
table and two hold a requested privilege level. Since each entry is
8 bytes long, the 386 clears the selector's low three bits, which
leaves the index times 8, and adds this to the base address of the
table of descriptors (contained in an internal 386 register loaded
by the programmer before switching to protected mode.)
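The selector arithmetic fits in a few lines of C. This sketch builds and decodes a selector and computes the address of the descriptor it names; make_selector, decode_selector and descriptor_address are hypothetical names. With this layout, GDT entry 9 (dgroup in Figure 2) corresponds to selector 48h.

```c
#include <assert.h>
#include <stdint.h>

struct selector_fields {
    unsigned index;  /* bits 15..3: descriptor number within the table */
    unsigned ti;     /* bit 2: 0 = Global table, 1 = Local table       */
    unsigned rpl;    /* bits 1..0: requested privilege level           */
};

/* 13 bit index means at most 8192 descriptors per table. */
static uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f = { sel >> 3, (sel >> 2) & 1u, sel & 3u };
    return f;
}

/* Clearing the low three bits leaves index * 8: no multiply needed. */
static uint32_t descriptor_address(uint32_t table_base, uint16_t sel)
{
    return table_base + (sel & 0xFFF8u);
}
```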
For example, a code fetch always uses the current CS. A protected
mode fetch starts by clearing the low three bits of CS (leaving the
index times 8) and then adding the descriptor table base register.
The 386 then reads an entire 8 byte record from the descriptor
table. The entry describes the start of the segment; the processor
adds the current instruction pointer to this start to get a linear
address.
A data access behaves the same way. A load from location DS:1000
makes the processor read a descriptor by clearing the low three
bits of DS (leaving the index times 8), adding the table's base
address (stored in the 386's on-board descriptor table register),
and reading the 8 byte descriptor at this address. The descriptor
contains the segment's start address, which is added to the offset
in the instruction (in this case 1000). Offsets and segment start
addresses are 32 bit numbers - it's really easy to reference any
location in memory.
Every memory access works through these 8 byte descriptors. If
they were stored only in user RAM the 386's throughput would be
pathetic, since each memory reference needs the information. Can
you imagine waiting for an 8 byte read before every memory access?
Actually, the processor caches a descriptor for each selector
(one for CS, one for DS, etc) on-chip, so the segment translation
requires no overhead. However, every load of a selector (like
MOV DS,AX or POP ES) makes the 386 stop and read all 8 bytes
into its internal cache, slowing things down just a bit.
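Here's a simplified C model of that caching, assuming a single descriptor table and keeping only the base and limit fields. All the structure and function names are invented for illustration; the point is that the table is read when a selector is loaded, not on every access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct descriptor { uint32_t base; uint32_t limit; };

struct segment_register {
    uint16_t selector;          /* visible part, e.g. the value in DS   */
    struct descriptor cache;    /* hidden copy of the selected descriptor */
};

struct descriptor_table {
    const struct descriptor *entries;
    unsigned reads;             /* how many times the table was read */
};

/* MOV DS,AX or POP ES: the only time the table in memory is read. */
static void load_segment(struct segment_register *sr,
                         struct descriptor_table *t, uint16_t sel)
{
    assert(t != NULL && t->entries != NULL);
    sr->selector = sel;
    sr->cache = t->entries[sel >> 3];   /* index = selector bits 15..3 */
    t->reads++;
}

/* An ordinary access uses only the cached base: no table read. */
static uint32_t linear_address(const struct segment_register *sr,
                               uint32_t offset)
{
    return sr->cache.base + offset;
}
```

Load DS once and a thousand accesses through it cost exactly one descriptor read, which is why reloading segment registers inside a tight loop is a bad idea.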
Figure 1 shows how addressing works. The figure ignores paging,
yet another 386 feature, which maps the 32 bit linear address onto
physical memory in 4K pages and makes demand-paged virtual memory possible.
It's all a little mind boggling. The CPU manipulates these 8 byte
data structures automatically, reading, parsing, caching, and
working with them as needed, with no programmer intervention (once
they are set up).
The CPU does more than translate addresses as described. In parallel
it checks every memory reference to ensure it behaves properly.
Remember the "limit" field in the descriptor? If the offset is
greater than this limit, the 386 aborts the instruction and
generates a protection violation exception. It won't let you do
something stupid. You can even specify that a segment is read-only;
a write will create the same exception.
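A sketch of these checks, again in C and again simplified: it models only expand-up data segments with a byte-granular limit and a writable flag, and reports a generic fault rather than the specific exception the 386 raises. Note that the hardware compares the offset (not base plus offset) against the limit, and every byte of a multi-byte operand must fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum access_result { ACCESS_OK, PROTECTION_FAULT };

/* Hypothetical simplified data segment descriptor. */
struct data_segment { uint32_t base; uint32_t limit; bool writable; };

static enum access_result check_access(const struct data_segment *s,
                                       uint32_t offset, uint32_t size,
                                       bool is_write)
{
    assert(size >= 1 && size <= 4);
    /* Last byte (offset + size - 1) must not pass the limit; written
       this way to avoid 32 bit overflow. */
    if (offset > s->limit || s->limit - offset < size - 1)
        return PROTECTION_FAULT;
    if (is_write && !s->writable)
        return PROTECTION_FAULT;     /* write to a read-only segment */
    return ACCESS_OK;
}
```

A segment whose limit is 0FFFFFFFFh, like flat_seg in Figure 2, passes the limit test for any offset, which is why a flat segment gives up this protection.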
But wait a minute! Everyone seems to think that segments aren't
used in protected mode! In fact, segmentation is practically essential,
and is far more useful than you might think.
On an 80x88 processor you'll frequently write programs divided
into more than one named code segment. The linker combines like-named
segments, and then groups the segments into one hunk.
In the embedded world, using a locator (like the ones sold by Systems
and Software and Paradigm), you can place named segments at
specific RAM or ROM addresses to match the nuances of your particular
hardware environment. The 386 takes this one step further.
A 386 linker groups like-named segments together. Then, if you
wish, you can assign any group to any descriptor. The selector
uses 13 bits to pick a descriptor, and another bit selects which
of two descriptor tables to read from (the Local or Global tables),
giving up to 8192 separate segments in each table.
This is a lot of power; most DOS users ignore it. It is ideal
for embedded applications. Suppose you have memory mapped I/O:
group it into a named segment and assign read/write attributes
to it. Even better, separate read and write ports into different
segments to ensure your code never accidentally accesses one incorrectly.
Make your code segments execute-only, so illegal accesses create protection
violation errors; debugging will be a lot easier with this enabled.
Some embedded systems include a ROMed version of DOS. DOS runs
in real mode only, so use the 386's segmentation to define real
and protected segments. The real ones will (sigh) not have the
great protection mechanisms. Restrict them to low addresses (under
20 bits), and put the protected mode code up high. The real mode
code will not physically be able to generate a high address that
might affect the protected mode code.
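This works because real mode addressing tops out just above 1 MB. A small C check, using the bases from Figure 2 (real_code at 08000h, cgroup at 200000h); beyond_real_mode is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest address real mode code can form: segment 0FFFFh shifted
   left four bits, plus offset 0FFFFh. (With a PC's A20 line gated
   off, addresses wrap at 20 bits and the reach is lower still.) */
#define REAL_MODE_TOP 0x10FFEFu
static_assert((0xFFFFu << 4) + 0xFFFFu == REAL_MODE_TOP, "real mode reach");

/* True if a segment starting at 'base' lies wholly beyond what real
   mode code can address, so stray real mode writes can't touch it. */
static bool beyond_real_mode(uint32_t base)
{
    return base > REAL_MODE_TOP;
}
```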
Linkers
If we had to define the selectors and descriptors ourselves, protected
mode would be just too hard to use. The descriptors are arranged
in a nasty, hard to assemble format. Fortunately, Intel and others
supply linkers that do all of the hard work for you.
It is a little tedious to actually switch from real to protected
mode, but Intel application notes do a pretty good job of describing
the procedure. There seems to be surprisingly little written about
actually building an application. It turns out that the linker
does most of the work of building descriptors.
I've been using System & Software's (Irvine, CA) Link &
Locate 386 lately, and find that writing protected mode code with
it is a breeze. Writing protected mode code is really no different
from writing real mode code. Break your code into named segments, separating
data and code, and segment them further if you wish to restrict
access in some fashion. Assemble the code with any decent assembler:
Microsoft's MASM and Borland's TASM do just fine. Then, use a
linker with a carefully scripted command file to assign descriptors
as desired. Figure 2 shows a script file for Link & Locate
386 for a typical application.
This program consists of just 4 segments. Real_code is real mode
code executed occasionally by the program. Cgroup is the bulk
of the program. Dgroup is a data area. Flat_seg is a special segment
defined so the program can reference a linear address anywhere in
memory.
Notice how the segments, in many cases, have absolute addresses
assigned, defining their start. The linker fills in the ending limits
automatically.
Flat_seg is a special case; we've set it to start at 0 and end
at the end of memory. This more or less bypasses protection checking,
as the segment's definition precludes getting an addressing error.
Embedded systems sometimes need to reach arbitrary addresses to get
at specific hardware.
A program operating with this structure will have its code all
in segment cgroup, and all data in dgroup. The program will start
with code that looks something like:
dgroup segment use32 ; data segment
data1 dd ?
data2 dd ?
dgroup ends
cgroup segment use32 ; code segment
assume cs: cgroup, ds:dgroup
mov ax,dgroup
mov ds,ax ; set selector DS to dgroup
mov eax,data1 ; using DS, reference data1
This looks just like 80x88 code. Now, suppose we want an absolute
reference anywhere in memory (say, we have some weird hardware
device to read from). Do this:
mov ax,flat_seg
mov es,ax ; set selector ES to flat_seg
mov esi,<some address>
mov al,es:[esi] ; read from an absolute address
Since selector ES points to a descriptor that is a flat, 32 bit
address space, any number in ESI is a 32 bit offset added to flat_seg's
start address of 0.
Avoid writing code that runs in one 32 bit flat segment. Sure,
it is the easiest way to generate a big program, but you'll lose the
benefits of the 386's protection checking. This is especially
deadly with ROMed code - how will you know that the code is not
sometimes accidentally writing over the ROM? A ROM write is not
in itself a problem, but it usually indicates some software flaw
that may otherwise go undetected.
The code sets up selectors just like real mode 80x88 code sets
segment registers. There really is no difference. The linker replaces
segment references with pointers to the descriptor table. In the
linker command file, we've defined "gdt" (the Global
Descriptor Table), and specific entries for each segment. GDT
entries 1 to 8 are reserved in this case, but 9 corresponds to
dgroup, 10 to cgroup, etc. The linker will build the GDT and insert
it into the program.
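To show what the linker is producing, here's a C sketch of two numbers it has to get right for the Figure 2 layout: the selector values substituted for segment names, and the 6 byte operand that start-up code loads into the descriptor table register with LGDT before switching modes. The enum names, the GDT base passed in, and the assumption that the linker emits privilege level 0 selectors are mine, not Link & Locate's.

```c
#include <assert.h>
#include <stdint.h>

/* Figure 2 layout: entry 0 is the null descriptor, 1..8 reserved. */
enum { DGROUP = 9, CGROUP = 10, FLAT_SEG = 11, GDT_ENTRIES = 12 };

/* Selector for a GDT entry: index * 8, Global table, privilege 0. */
static uint16_t gdt_selector(unsigned index)
{
    assert(index > 0 && index < GDT_ENTRIES);
    return (uint16_t)(index << 3);
}

/* LGDT operand: 16 bit limit (table size in bytes minus one),
   followed by the table's 32 bit base address, little-endian. */
static void make_gdtr(uint8_t out[6], uint32_t gdt_base)
{
    uint16_t limit = GDT_ENTRIES * 8 - 1;
    out[0] = limit & 0xFF;
    out[1] = limit >> 8;
    out[2] = gdt_base & 0xFF;
    out[3] = (gdt_base >> 8) & 0xFF;
    out[4] = (gdt_base >> 16) & 0xFF;
    out[5] = (gdt_base >> 24) & 0xFF;
}
```

Under these assumptions, "mov ax,dgroup" would assemble to a load of 48h, cgroup to 50h and flat_seg to 58h.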
Conclusion
Next month I'll discuss the processor's protection mechanisms
in more detail, including the way the 386 handles context switching.
*********************************************************
segment
*segments ( dpl = 0 ),
real_code( dpl = 0, base = 08000h, usereal ),
dgroup( dpl = 0),
cgroup( dpl = 0, base=200000h ),
flat_seg( dpl = 0, base=0, limit=0ffffffffh),
table
gdt (location = gdt_start,
reserve = (1..8),
entry =
( 9: dgroup,
10: cgroup,
11:flat_seg));
end;
Figure 2: Typical Linker Command File
***********************************************************