386 Protected Mode
Part 2 of a two-part series on protected mode.
Published in Embedded Systems Programming, August 1991
Last month I introduced the architecture of the 386, and described
how it uses "segment" registers to access a 4 GB address
space. Though many believe that segmentation isn't used in protected
mode, in fact it is every bit as crucial as with an 8088. Every
address reference is made via segmentation whether in real or
protected mode. However, protected mode segments can be any size,
from a single byte all the way up to 4 GB (the full 32 bit range).
To summarize last month's description of 386 addressing, every
protected mode memory reference uses a selector, an offset, and
a descriptor to form a linear address. CS, DS, ES, SS, FS, and
GS (segment registers in real mode) are called "selectors",
and are pointers to data structures that define characteristics
of a segment. These 8 byte data structures are known as "descriptors",
and are grouped into tables. The Global Descriptor Table (GDT)
is available to every task in a 386 program, and contains up to
8192 descriptors. Local Descriptor Tables (LDTs) can be private
to individual tasks, and also contain up to 8192 descriptors.
Every descriptor contains the starting address of the segment
(a 32 bit absolute number), the segment limit (its size, used for
checking out-of-bounds errors), and miscellaneous access rights bits.
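As a rough illustration of how those fields fit into 8 bytes, here is a small Python sketch that packs and unpacks a code/data descriptor. The function names and default arguments are invented for this example. The field positions follow the 386's documented layout, in which the size is held as a 20 bit limit and the base address is split across three separate fields.

```python
import struct

def pack_descriptor(base, limit, dpl, seg_type=0x2, present=True,
                    granular=False, big=True):
    """Pack a code/data segment descriptor into its 8-byte form.

    limit is the 20-bit limit field; seg_type 0x2 means writable data.
    These parameter names are illustrative, not Intel's.
    """
    if not (0 <= base <= 0xFFFFFFFF and 0 <= limit <= 0xFFFFF and 0 <= dpl <= 3):
        raise ValueError("field out of range")
    # Access byte: P (bit 7), DPL (bits 6-5), S=1 for code/data, 4-bit type.
    access = (int(present) << 7) | (dpl << 5) | (1 << 4) | (seg_type & 0xF)
    # Upper nibble: G (granularity) and D/B; lower nibble: limit bits 19..16.
    flags = (int(granular) << 7) | (int(big) << 6) | ((limit >> 16) & 0xF)
    return struct.pack("<HHBBBB",
                       limit & 0xFFFF,        # limit 15..0
                       base & 0xFFFF,         # base 15..0
                       (base >> 16) & 0xFF,   # base 23..16
                       access,
                       flags,
                       (base >> 24) & 0xFF)   # base 31..24

def unpack_descriptor(raw):
    """Recover the main fields from an 8-byte descriptor."""
    lim_lo, base_lo, base_mid, access, flags, base_hi = struct.unpack("<HHBBBB", raw)
    return {
        "base": base_lo | (base_mid << 16) | (base_hi << 24),
        "limit": lim_lo | ((flags & 0xF) << 16),
        "dpl": (access >> 5) & 0x3,
        "present": bool(access & 0x80),
    }

raw = pack_descriptor(base=0x00123456, limit=0x0FFFF, dpl=3)
fields = unpack_descriptor(raw)
```

In practice a 386-aware linker or locator builds these tables for you; the sketch is only meant to show where the bits live.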
Just like in real mode, a "segment register" is associated
with every type of memory access. In protected mode, these selectors
contain a 13 bit index into the GDT or LDT. The instruction:
MOV AX,[data]
uses selector DS (by default) to index into the GDT or LDT, where
the processor finds the base address of the segment containing
item "data". The CPU adds this base address to the offset
(i.e., the address of "data" as stored in the instruction
bytes) to create a linear address to send to memory.
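The lookup can be modelled in a few lines of Python. This is only a sketch: the tables are plain lists of dictionaries, the base addresses are made up, and no limit or privilege checking is done. The one detail added beyond the text is that the low three bits of a selector are not part of the index: bit 2 chooses between the GDT and the LDT, and bits 1-0 hold a requested privilege level.

```python
def split_selector(selector):
    """Split a 16-bit selector into (index, table, rpl)."""
    index = selector >> 3                       # upper 13 bits: table index
    table = "LDT" if selector & 0x4 else "GDT"  # bit 2: table indicator
    rpl = selector & 0x3                        # bits 1-0: requested privilege
    return index, table, rpl

def linear_address(selector, offset, gdt, ldt):
    """Form a linear address: descriptor base plus offset (no checks)."""
    index, table, _rpl = split_selector(selector)
    descriptor = (ldt if table == "LDT" else gdt)[index]
    return (descriptor["base"] + offset) & 0xFFFFFFFF

# Hypothetical tables; entry 0 of the GDT is the unused null descriptor.
gdt = [None, {"base": 0x00100000, "limit": 0xFFFF}]
ldt = [{"base": 0x00200000, "limit": 0x0FFF}]

# DS = 0x0008 selects GDT entry 1, so MOV AX,[1234h] reads linear 0x00101234.
address = linear_address(0x0008, 0x1234, gdt, ldt)
```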
Thus, the descriptor tables define the bases and sizes of every
segment in the program, and define the areas of memory that are
addressable. It's easy to set up the descriptor tables using special
386-aware linkers available from a number of vendors.
Protection Systems
So far I've glossed over the details of the format of selectors
and descriptors. In fact, each contains information used to keep
ill-behaved programs in check. The whole issue of capturing address
violation errors is perhaps a bit new to the embedded world, but
with the proliferation of ever more complex systems will certainly
become important in the next few years. As one who has suffered
through watching programs crash and write over themselves, I find
it breathtaking to watch buggy 386 code recover from practically
any insult I toss at it; the protection mechanisms ensure that
the code never gets overwritten, and that the operating system,
if any, remains intact and functional.
The 386 supports 4 privilege levels, numbered 0 to 3. The highest,
most privileged level is 0 - a program running at this level can
gain access to any 386 resource. Programs running with lower privilege
levels are restricted in their ability to use memory, I/O, and
some instructions.
Privilege levels are intimately tied to descriptors. As I mentioned,
the descriptor contains the base address of a segment, the segment
size, and access rights bits. Two of these bits specify the Descriptor's
Privilege Level (DPL). Privileges are thus associated with segments,
a somewhat novel concept when you consider that most CPUs simply
have a global privilege setting that affects all of memory equally.
Before describing how a segment's DPL affects memory access rights,
it makes sense to answer the obvious question: what defines the
processor's privilege level? Cleverly enough, this is handled
entirely within the context of segment privileges. The CPU runs
at the privilege level defined within the DPL of the current code
segment - the Current Privilege Level (CPL). Privileges are somewhat
removed from the code, then. A transfer to a segment with a DPL
of 0 (say, the operating system), will always run with the greatest
access rights. Vector off to a code segment with DPL=3 and you'll
be very limited in your ability to run amok.
Every time any section of code accesses another segment, the 386
hardware compares the CPL to the referenced segment's DPL (i.e.,
it compares the privilege level the CPU is running at to the privilege
defined for the segment). If the CPL is at least as privileged as
the DPL (numerically less than or equal to it), the access proceeds.
An attempt to access a segment more privileged than the CPU's
CPL results in an exception, letting us know something is wrong.
Thus, code running in a segment with a DPL of 0 pumps the CPU
up to a CPL of 0, and gives the CPU access to every other segment.
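Reduced to its essentials, the comparison is a single numeric test. The sketch below models only the simplified rule described here. The real hardware also factors in the privilege bits carried in the selector itself, and it applies somewhat different rules to transfers between code segments; both are left out. The function name is invented for the example.

```python
def access_allowed(cpl, dpl):
    """Simplified data-access rule: lower numbers mean more privilege."""
    return cpl <= dpl

# Tabulate every combination of the four privilege levels.
matrix = {(cpl, dpl): access_allowed(cpl, dpl)
          for cpl in range(4) for dpl in range(4)}

for cpl in range(4):
    row = " ".join("ok  " if matrix[(cpl, dpl)] else "FAIL" for dpl in range(4))
    print(f"CPL {cpl}: {row}")
```

The printed table has one row per CPL and one column per DPL, which makes it easy to see that CPL 0 reaches everything and CPL 3 reaches only DPL 3 segments.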
Novice 8086 assembly programmers always moan about the complexity
of segments and segment groups. Sometimes the ASSUMEs, GROUPs,
and other pseudo-ops seem to be an awful lot of trouble. When
you switch to the 386 suddenly these constructs make perfect sense:
group like segments together, simultaneously grouping privilege
levels. Perhaps the operating system will be grouped into one
segment with a DPL of 0 so it can access any resource. Maybe device
drivers can fit into a less important group, giving them just
as much power as needed but no more, preventing them from trashing
code. Finally, run the application program at a very low privilege
(i.e., high number, like 3), so it cannot affect system data structures
or I/O.
We're now talking about two independent levels of protection.
The first is defined by segment sizes: no task can access outside
of whatever segment it is attempting to use, since an address
that exceeds the segment-size field in the descriptor will generate
an exception. Obviously, array subscripting errors just cannot
cause major crashes if the segments are defined cleverly. The
second level of protection is DPL checking, which prevents accesses
to higher privileged segments.
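To make the two checks concrete, here is a hedged Python model of an array that lives in its own small segment. The descriptor values, the exception class, and the helper names are all invented for this sketch. It assumes a byte-granular limit, meaning the limit is the last valid offset in the segment.

```python
class ProtectionFault(Exception):
    """Stands in for the exception the 386 would raise."""

def checked_access(descriptor, offset, size, cpl):
    """Apply both protection checks, then return the linear address."""
    if cpl > descriptor["dpl"]:
        raise ProtectionFault("segment is more privileged than the caller")
    if offset + size - 1 > descriptor["limit"]:
        raise ProtectionFault("offset runs past the segment limit")
    return descriptor["base"] + offset

# A 16-element array of 2-byte words in its own segment: 32 bytes, limit 31.
array_segment = {"base": 0x00400000, "limit": 31, "dpl": 3}

def element_address(index, cpl=3):
    return checked_access(array_segment, index * 2, 2, cpl)

last_ok = element_address(15)      # offset 30..31, still inside the segment
try:
    element_address(16)            # offset 32..33, one element too far
    overrun_caught = False
except ProtectionFault:
    overrun_caught = True
```

The off-by-one subscript is caught at the access itself rather than quietly corrupting whatever happens to follow the array in memory.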
In addition, the processor provides hardware protection of certain
dangerous instructions. Obviously, the HLT instruction is one
to be limited only to very highly privileged tasks. In addition,
those instructions that load the 386's internal control registers
(including the debug registers), and those that load the descriptor
table base pointers should be restricted to only some tasks. These
and a few other instructions will cause an exception if they are
executed by any code running with a CPL greater than 0.
I/O instructions are protected as well. An I/O Privilege Level
(IOPL) is defined in the processor's EFLAGS register. Instructions to
enable and disable interrupts will cause an exception if executed
from a section of code less privileged than the IOPL. An I/O
instruction executed from such code creates a similar error only if
the port it addresses is set to "protected" in the I/O Permission
bitmap, an array of 64k bits that indicates the protection status
for each and every port.
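The decision can be sketched in Python as follows. The function names and the choice of port 0x3F8 are illustrative only. The model assumes the 386's convention that a set bit in the bitmap means the port is protected, and that every byte of a multi-byte access must be permitted.

```python
def interrupt_flag_allowed(cpl, iopl):
    """CLI and STI are legal only when CPL is at least as privileged as IOPL."""
    return cpl <= iopl

def io_allowed(cpl, iopl, bitmap, port, width=1):
    """Return True if an IN or OUT of `width` bytes at `port` may proceed."""
    if cpl <= iopl:
        return True                      # privileged enough: bitmap not consulted
    for p in range(port, port + width):  # every byte touched must be permitted
        if bitmap[p >> 3] & (1 << (p & 7)):
            return False                 # bit set: port is protected
    return True

# 64k bits = 8192 bytes. Start with every port protected, then open one.
bitmap = bytearray(b"\xff" * 8192)
port = 0x3F8                             # hypothetical device port
bitmap[port >> 3] &= ~(1 << (port & 7)) & 0xFF

app_can_use_port = io_allowed(cpl=3, iopl=0, bitmap=bitmap, port=0x3F8)
app_can_use_next = io_allowed(cpl=3, iopl=0, bitmap=bitmap, port=0x3F9)
```

A design like this lets an application talk directly to the one device it owns while every other port stays behind the operating system.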
Call Gates
Given that a low privileged task cannot access code or data with
a higher privilege (lower number), then how can any task invoke
the operating system? The operating system, probably running at
CPL 0, can access outwards; a mechanism is needed to permit application
programs access to OS resources.
The 386 uses "call gates" to access higher privileged
routines. A call gate is a special type of descriptor, stored
in the GDT or LDT, that contains a pointer to an entry point.
To invoke a higher privilege routine the linker will replace your
CALL instructions with a CALL that works indirectly through this
new form of descriptor.
Where a normal descriptor contains just the segment's base address,
length, and access rights bits, a call gate (which is also 8 bytes
long) has only the destination routine's selector, offset, and
DPL. The call gate is an indirect pointer to the destination segment's
descriptor.
Though this is a bit tricky, essentially all a call gate does
is remove the selector and offset from the call instruction (where
these things would normally go), and place them inside of the
descriptor table. That is, the call gate contains the complete
destination address selection parameters. The CALL instruction
itself has a selector (that selects the call gate, just as any
selector picks a descriptor), and an ignored offset (since the
offset to the routine is in the call gate).
If you use a call gate to access routine invoke_os, the linker
will replace your CALL with a CALL to the gate - it will load
the selector with the gate's index in the descriptor table and
probably store garbage in the offset part of the instruction.
At runtime, the 386 sees the call, uses the selector to read the
gate's 8 bytes, saves the offset part from the descriptor, and
uses the descriptor's selector to load in the destination address's
code segment descriptor. This yields a base address (and length
and access rights), which is added to the offset from the call
gate, generating the linear address of the routine.
The 386 uses the DPL in the call gate to ensure the invoker is
allowed to use the gate: the caller must be at least as privileged
as the gate. It then switches to the privilege level indicated
in the descriptor pointed to by the gate. Thus, a low level application
routine calls for operating system service with a call gate. The
transfer through the gate will raise the privilege level to that
of the OS.
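The sequence can be followed in a small Python model. It is a sketch under several assumptions: the descriptor table is a list of dictionaries, the field names and addresses are invented, and the stack switch, parameter copying, and selector privilege bits that the real processor also handles are ignored.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the exception the 386 would raise."""

def call_through_gate(selector, ignored_offset, cpl, table):
    """Resolve a far CALL through a call gate.

    Returns (linear address of the routine, new CPL).
    """
    gate = table[selector >> 3]
    if gate is None or gate["kind"] != "call_gate":
        raise GeneralProtectionFault("selector does not name a call gate")
    if cpl > gate["dpl"]:
        raise GeneralProtectionFault("caller is less privileged than the gate")
    target = table[gate["selector"] >> 3]   # destination code segment
    # The offset comes from the gate; the one in the CALL is never used.
    return target["base"] + gate["offset"], target["dpl"]

# Hypothetical GDT: entry 1 is the OS code segment, entry 2 is the gate.
gdt = [
    None,
    {"kind": "code", "base": 0x00010000, "limit": 0xFFFF, "dpl": 0},
    {"kind": "call_gate", "selector": 0x0008, "offset": 0x0400, "dpl": 3},
]

# An application at CPL 3 calls selector 0x0010 with a garbage offset.
entry_point, new_cpl = call_through_gate(0x0010, 0xDEAD, cpl=3, table=gdt)
```

Here the application ends up at linear address 0x00010400 running at CPL 0, even though it could never have reached the OS code segment directly.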
Call gates add yet another level of complexity to a program's
structure, but most of the details can be left to the linker.
One of the nice advantages of the gate is that every call to it
uses the same selector. If the gate is defined at some sacred
location that never changes from version to version, then the
gate is sort of like a jump table. I've always been a big fan
of using jump tables in embedded systems, so you can figure out
where routines are, even in the field with limited tools, even
after 50 versions of the ROM.
Call gates are designed mostly for use when privilege level transitions
are needed. Since they are stored in a descriptor table, you are
limited in the number of gates the system will support. Remember
that the GDT and every LDT are limited to 8192 entries each, which is far
from infinity. Generally, gates are used to funnel requests for
operating system service through a single OS dispatcher.
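One way to picture the dispatcher is as a jump table sitting behind the single gate. The Python sketch below is purely illustrative: the service numbers, the handler functions, and the -1 error code are all assumptions, not part of any real operating system.

```python
def os_write(data):
    """Hypothetical service: pretend to write and report the byte count."""
    return len(data)

def os_version():
    """Hypothetical service: report a made-up version number."""
    return 50

# Service numbers stay fixed from release to release, like a jump table.
SERVICES = {
    0: os_write,
    1: os_version,
}

def os_dispatch(service, *args):
    """The one entry point reached through the call gate."""
    handler = SERVICES.get(service)
    if handler is None:
        return -1                       # unknown service number
    return handler(*args)

written = os_dispatch(0, b"hello")
unknown = os_dispatch(99)
```

With this arrangement only one descriptor table entry is spent on the gate, no matter how many services the OS eventually offers.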
Other Goodies
The 386 is just chock full of features for managing complex operating
systems and code. This list is far too extensive to cover here
in any detail. However, I'll briefly mention several other features
that can help in developing any kind of system, embedded or otherwise.
The processor does support virtual memory. One of the attribute
bits in every segment descriptor indicates if the segment is present.
A reference to a not-present segment creates an exception, allowing
system software to load the required segment from disk. Frankly,
I'm not sure what this would be useful for in an embedded system,
but it does seem like a neat feature. I'd welcome ideas...
The processor's memory management has yet another level beyond
the segmentation I've described. Optionally, you can divide the
4 GB address space into 4 KB pages and then remap the physical
address of each page through page tables. You define the page
tables to translate practically any address into any other. Thus,
two tasks could be compiled at identical addresses, yet run at
different physical addresses by using different paging. Again,
is this useful for an embedded system? Does someone out there
have some devilishly clever technique you'd care to share with
us?
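For readers who want to see the remapping spelled out, here is a simplified Python model of the translation. It assumes the 386's two-level scheme with 4 KB pages: the top 10 bits of the linear address pick a page directory entry, the next 10 bits pick a page table entry, and the low 12 bits are the offset within the page. The tables are nested dictionaries and the frame addresses are made up.

```python
def translate(page_directory, linear):
    """Translate a linear address to a physical one (no fault handling)."""
    dir_index = linear >> 22              # top 10 bits
    table_index = (linear >> 12) & 0x3FF  # middle 10 bits
    offset = linear & 0xFFF               # low 12 bits
    frame = page_directory[dir_index][table_index]
    return frame + offset

# Two tasks linked at the same linear address, mapped to different frames.
task_a_pages = {0: {0x10: 0x00200000}}
task_b_pages = {0: {0x10: 0x00300000}}

same_linear = 0x00010ABC
physical_a = translate(task_a_pages, same_linear)
physical_b = translate(task_b_pages, same_linear)
```

Both tasks use address 0x00010ABC, yet task A touches physical 0x00200ABC and task B touches 0x00300ABC. Switching tasks only requires pointing the processor at a different page directory.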
The 386 does include a number of debug registers that let you
set hardware breakpoints on up to 4 addresses simultaneously.
These breakpoints work rather like those produced by an emulator:
they are non-intrusive, and work in ROM or RAM. You can set them
on code or data accesses. If you'd care to write a monitor to
embed in the product (always a good idea for long term product
maintenance), then by all means use these resources.
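A monitor arms a breakpoint by loading the address into one of DR0 through DR3 and setting the matching fields in the control register DR7. The Python helper below only computes the DR7 value. Its name and arguments are invented for this sketch, and the bit positions follow the 386's documented DR7 layout. Actually loading the debug registers takes a privileged MOV executed at CPL 0.

```python
# Break conditions (R/W field) and access lengths (LEN field) used by DR7.
RW_EXECUTE, RW_WRITE, RW_READ_WRITE = 0b00, 0b01, 0b11
LEN_1, LEN_2, LEN_4 = 0b00, 0b01, 0b11

def dr7_enable(dr7, slot, rw, length, global_enable=True):
    """Return a DR7 value with breakpoint `slot` (0..3) armed."""
    if not 0 <= slot <= 3:
        raise ValueError("the 386 has four breakpoint registers, DR0..DR3")
    if rw == RW_EXECUTE and length != LEN_1:
        raise ValueError("execution breakpoints must use the 1-byte length")
    # L0/G0 occupy bits 0-1, L1/G1 bits 2-3, and so on.
    enable_bit = 1 << (slot * 2 + (1 if global_enable else 0))
    # Each slot has a 4-bit field starting at bit 16: R/W low, LEN high.
    shift = 16 + slot * 4
    dr7 &= ~(0xF << shift)
    dr7 |= enable_bit | (rw << shift) | (length << (shift + 2))
    return dr7 & 0xFFFFFFFF

# Break on instruction fetch at DR0, and on 4-byte writes at DR1.
dr7 = dr7_enable(0, slot=0, rw=RW_EXECUTE, length=LEN_1)
dr7 = dr7_enable(dr7, slot=1, rw=RW_WRITE, length=LEN_4)
```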
Conclusion
Why use protected mode in embedded applications? The biggest attraction
is the large, 32 bit address space that becomes immediately available.
Of course, most any other 32 bit CPU will give easier access to
lots of memory.
Certainly the DOS based tools that so many non-embedded people
use are a compelling incentive to stick with the 80x86 architecture.
How many millions use all of the great DOS Cs and assemblers?
You can use any of these on the 386, and as they become more 32
bit aware they'll take even greater advantage of the 386's features.
Quick development cycles demand proven tools, and it's awfully
hard to argue against those from the DOS world. You can even do
a lot of the development on a DOS machine, and port to the harder
embedded world after removing most of the bugs.
A lot of embedded folks are now putting DOS into ROM - a subject
I know will see a lot of discussion at the upcoming Embedded Systems
Conference. With the 386 you can run DOS as a task in its own
segment, and run other applications concurrently.
Finally, protected mode really does protect your code. With the
right segmentation, you'll never, and I mean never, see a rogue
program overwrite the code. This could be important in medical
and other life-critical applications.
For those wishing to explore the mysteries of this processor in
more detail, be sure to get the complete set of Intel reference
manuals.
Intel's "Microprocessors" manual (mine is dated 1990)
contains a pretty complete hardware and software description of
the part, but is definitely not for the faint hearted. It is complete
but succinct.
Their "386 DX Microprocessor Programmer's Reference Manual"
is far more readable, but neglects all hardware issues. It gives
a pretty readable account of the operation of all of the processor's
major modes. This is a must read for serious 386 users.
Intel's "80386 System Software Writer's Guide", though
thin, does include lots of sample code, including routines to
enter and exit protected mode. It is a good adjunct to the Programmer's
Reference Manual.
Finally, the "80386 Microprocessor Hardware Reference Manual"
helps explain how to design hardware that will really work with
the 386. This is not a trivial problem, as the CPU can get out
of sync with its bus cycles - you have to build a sort of state
machine to determine what it is doing when. Even adding wait states
is a bit challenging.