Z180 Memory Management
The Z180 MMU is confusing, but quite useful when well understood.
Published in Circuit Cellar Ink, February 1990
By Jack Ganssle
Programs are getting big! Part of today's shift towards 16 and 32 bit processors comes from the need for correspondingly huge address spaces, since conventional wisdom holds that a 512kb program just cannot fit in the 64k address space of most 8 bit CPUs. Where performance is the overriding concern, a 32 bit CPU may be the only solution. It does seem a shame to abandon all the accumulated knowledge and code gleaned from two decades of 8 bit microprocessors just to get more programming elbow room.
For the past few years some 8 bit CPUs have been equipped with memory management units (MMUs) that free programs from most memory limitations. It's tedious and complex to control an MMU manually; now, many languages and other tools include built-in MMU support.
Logical -> Physical
The problem of memory management is easy to define: we need some way of connecting lots of memory to a processor that just cannot handle or address it. For example, we might want to put 512kb on a Z80. Since the Z80 only generates 16 bit addresses, it can only directly address 64k of RAM. Somehow, through memory management, we must expand this capability.
For now let's assume that magic hardware gives us more address lines. Perhaps it is as simple as an I/O port loaded by the CPU with an extra upper 8 address lines (A16 to A23), giving us a potential address space of 16 mb. Or, it can be hideously complex, providing some ability to access different sections of address space in wild and wonderful ways.
In any event, as soon as some external mechanism is added to translate addresses in some fashion, the programmer suddenly must contend with two very different sorts of address spaces.
"Physical" memory is that actually connected to the hardware. For example, the 512kb we attach to the sadly-overloaded Z80 is physical memory. Its addresses run linearly from 00000 to 7FFFF hex.
"Logical" memory is the memory currently located in the processor's address space. Obviously, if the computer can only issue addresses in the range of 0000 to FFFF (0 to 64k), then some of the physical memory is visible and some is not. As the code changes the memory manager's settings different memory becomes visible. That which is addressable at any time is the logical memory.
Thus, addresses generated by the program are always logical addresses - they get translated by some yet undefined hardware into real physical addresses.
So, at one time address 1000 logical might be translated into 28000 physical. Later, in the same code, 1000 could correspond to 80000 physical. The old one-to-one mapping of addresses we're all familiar with is gone!
In summary, addresses used by the code are logical; the memory array sees physical. The memory management unit (MMU) sits between the two and does the translation.
Standard Architectures
Several years ago Hitachi introduced the 64180, a high integration version of the venerable Z80. While other vendors were trying to push new proprietary architectures, Hitachi took what might seem a step backwards towards the Z80. They realized an important fact of the industry - customers had a fortune invested in Z80 code and were unwilling to switch to an incompatible instruction set.
The 64180 is a Z80 at heart. The designers resisted the temptation to add fancy new instructions and addressing modes that could have made it incompatible with the Z80. Rather, they integrated timers, serial ports, and DMA controllers onto the chip. Even better, they added a memory management unit to translate 64k logical addresses into a 1 mb physical address space.
Now Hitachi sells several other versions of the part. The 64180S is designed especially for telecommunications. The 647180X is a microcontroller version, containing a 64180 core, ROM, RAM, and parallel I/O. Zilog stepped into the act, offering the Z180 (a second source of the 64180) and Z280, a very high performance Z80 upgrade. Zilog is just now announcing the Z181 and will soon offer a microcontroller version of the part, probably a 647180X look-alike.
The most important peripheral on the 64180-family processors is the memory management unit (MMU). The MMU is a hardware device built onto the processor's silicon. The MMU translates every memory address from 16 to 20 bits.
The 64180's MMU uses three internal control registers. In keeping with the chip's design philosophy, on reset the MMU gives a straight logical to physical mapping, simulating the Z80 and, of course, limiting the address space to 64k.
You can divide the 64180's logical address space into one, two, or three areas: common 0, the bank area, and common 1. The logical space itself is unaltered; even when divided it is still a contiguous 64k.
CBAR (the Common/Bank Area Register) is an 8 bit I/O port that can be accessed by the processor's OUT and IN instructions. The lower 4 bits specify the starting address of the bank area, and the upper 4 give the start of common 1. Each 4 bit field supplies the upper four bits of a 16 bit logical address, so areas always start on 4k boundaries. If CBAR were A8, then the bank area starts at 8000 logical, and common 1 starts at A000.
Common 0, if it exists, always starts at logical 0000 and runs up to the bank area. The bank area then runs to the start of common 1.
Therefore, you can always understand the logical address space by examining the contents of CBAR by itself. No other information is needed.
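The CBAR decoding just described is easy to sketch in C. This is only an illustration; the names (mmu_area, bank_start, common1_start, classify) are ours, not Hitachi's, and the code assumes the bank area starts at or below common 1.

```c
#include <stdint.h>

/* The three logical areas CBAR can carve out of the 64k space. */
typedef enum { AREA_COMMON0, AREA_BANK, AREA_COMMON1 } mmu_area;

/* Low nibble of CBAR becomes logical address bits 15..12 of the bank area start. */
static uint16_t bank_start(uint8_t cbar)
{
    return (uint16_t)((cbar & 0x0F) << 12);
}

/* High nibble of CBAR becomes logical address bits 15..12 of the common 1 start. */
static uint16_t common1_start(uint8_t cbar)
{
    return (uint16_t)((cbar & 0xF0) << 8);
}

/* Decide which area a logical address falls in, using CBAR alone. */
static mmu_area classify(uint8_t cbar, uint16_t logical)
{
    if (logical >= common1_start(cbar)) return AREA_COMMON1;
    if (logical >= bank_start(cbar))    return AREA_BANK;
    return AREA_COMMON0;
}
```

With CBAR set to A8, bank_start() gives 8000 and common1_start() gives A000, matching the example above.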
The logical address is only part of the problem. How does logical space get mapped to physical? Two other ports provide the rest of the answer.
BBR (the Bank Base Register) specifies the starting physical address of the bank area (remember, the logical start is in CBAR). CBR (the Common Base Register) provides the same information for common 1. Each supplies the upper 8 bits of a 20 bit physical base, which the MMU adds to the logical address.
A simple formula gives the translation from logical to physical address for the bank area:
Physical = Logical + (BBR * 4096)
The same formula gives Common 1:
Physical = Logical + (CBR * 4096)
BBR and CBR give the upper 8 address bits only - hence the 4096 multiplier. The lower 12 bits come from the logical address. Thus, the translation only affects the upper 8 bits; the lower 12 physical bits are always identical to the lower 12 logical.
On reset, the 64180 sets CBAR to F0 and CBR=BBR=0. This maps logical to physical exactly, with no translation. The bank area starts at logical 0 and common 1 at F000 (since CBAR=F0); physically, the bank area starts at 00000 (BBR=0), as does common 1 (CBR=0). If the logical address is 1000, the MMU assigns it to the bank area (1000 is below the start of common 1 at F000) and adds the physical base of the bank area (0), giving a translated address of 01000. Similarly, logical F800 is in common 1, and translates to 0F800.
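The whole translation fits in a few lines of C. Treat this as a sketch of the formulas above, not a cycle-accurate model: the names mmu_regs and translate are hypothetical, common 0 is taken to be untranslated (a base of 0), and we assume any carry out of the 20th address bit is simply discarded.

```c
#include <stdint.h>

typedef struct {
    uint8_t cbar;  /* high nibble: common 1 start, low nibble: bank area start */
    uint8_t bbr;   /* physical base of the bank area, in 4k units */
    uint8_t cbr;   /* physical base of common 1, in 4k units */
} mmu_regs;

/* Physical = Logical + (base * 4096), where base depends on the area. */
static uint32_t translate(const mmu_regs *m, uint16_t logical)
{
    uint8_t page = (uint8_t)(logical >> 12);   /* upper 4 logical bits */
    uint8_t base;

    if (page >= (m->cbar >> 4))
        base = m->cbr;                         /* common 1 */
    else if (page >= (m->cbar & 0x0F))
        base = m->bbr;                         /* bank area */
    else
        base = 0;                              /* common 0: assumed untranslated */

    /* Assumption: a carry past bit 19 is dropped. */
    return ((uint32_t)logical + ((uint32_t)base << 12)) & 0xFFFFFu;
}
```

Feeding in the reset values (CBAR=F0, BBR=CBR=0) returns 01000 for logical 1000 and 0F800 for logical F800, the same results worked out by hand above.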
The most important point that can be made about the MMU is that it does not provide the 1mb linear address space we all crave. After all, Z80 instructions use 16 bit address operands and 16 bit register pointers - there is no way to address a number larger than 64k. A jump instruction will always have an argument that is 16 bits long - the logical destination address. The MMU translates this logical address to a possibly large physical number, but the software still operates in a 64k space.
This has a subtle implication - logical address space is a valuable commodity that must be conserved. Wasting physical memory isn't so bad, since the 64180 can deal with up to 1mb. As an example, suppose that your program will have three banks (COMMON 0, BANK, and COMMON 1). If the program is large you might want to bank it in and out of the BANK area, leaving COMMON 1 for data. If BANK is too large, you could be left with little data space - it is important to make BANK as small as feasible to maximize the (in this case) unbanked data.
Language Support
Despite the fabulous extra power offered by the 64180's MMU, we've all been making do with Z80 assemblers and compilers. Sure, some claim to support the new processor's extended features, but in truth, until recently, that support has been minimal.
Just what features are important in a 64180 assembler or compiler? Certainly it should be fast, efficient, and all that, but more than anything else the language should give you some sort of way of handling the MMU.
There are two related but different aspects to MMU management. The first is to provide some sort of mechanism to control the MMU with as little programmer help as possible. An ideal solution would be a smart compiler that simulates a nearly linear huge address space. The second is to provide output files that contain compiled code and debugging records in some manner that supports current 8 bit tools (like the PROM programmer), but that accounts for the large address spaces.
Taking these two criteria separately: with a C compiler especially, we'd really like some method of compiling an ordinary C program into multiple banks. Sure, you might have to tell the compiler or linker about your memory configuration, but ideally the tools should segment and package functions into memory banks as needed. Even better, we'd want it to remap the MMU automatically. Just like working with Turbo C, we would like to be able to invoke a function through a conventional function call, without worrying about its location in memory.
The second requirement is not quite so obvious. How will you burn ROMs for the final project? If the compiled/assembled code exceeds 64k, there may be a problem with using standard Intel hex records for output. Every ROM programmer in the world takes Intel hex input, but the format only supports 16 bit addresses.
One solution is to divide the source program into many separately compiled small pieces. This is especially hard in C, since the linker will not be able to resolve calls between pieces. Another approach is to ensure that the compiler or assembler can produce "Type 2" Intel records. Whenever the code crosses a 64k bank the linker could output a type 2 record to specify a new segment address (physical address shifted right 4 bits). This does imply that the linker can handle large physical addresses, and the PROM programmer can accept type 2 records.
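To make the "Type 2" idea concrete, here is a minimal C sketch that formats the extended segment address record a linker might emit on entering a new 64k block. The function name hex_segment_record is hypothetical; the record layout (byte count 02, address 0000, type 02, two data bytes, two's complement checksum) is the standard Intel hex one.

```c
#include <stdint.h>
#include <stdio.h>

/* Format an Intel hex type 02 record for the 64k block that contains
   `physical`. The segment value is the block's start address shifted
   right 4 bits. buf must have room for 15 characters plus the NUL. */
static void hex_segment_record(uint32_t physical, char buf[16])
{
    uint16_t segment = (uint16_t)((physical & 0xF0000u) >> 4);
    uint8_t hi = (uint8_t)(segment >> 8);
    uint8_t lo = (uint8_t)(segment & 0xFF);

    /* Checksum: two's complement of the sum of every byte in the record. */
    uint8_t sum = (uint8_t)(0x02 + 0x00 + 0x00 + 0x02 + hi + lo);
    uint8_t checksum = (uint8_t)(0x100u - sum);

    snprintf(buf, 16, ":02000002%04X%02X", (unsigned)segment, (unsigned)checksum);
}
```

The ordinary data records that follow still carry only 16 bit offsets; the programmer adds segment * 16 to each one to recover the physical address.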
Decent debugging files are just as important as useful PROM files. You can't use an emulator, simulator, or monitor to debug the code if the debug records are inadequate. Suppose you wish to display the value of a variable. The debugger must know the physical address of that quantity, since only the physical address is constant. Remapping the MMU changes its logical address, and at times no logical address might correspond to the variable.
This implies that the software packages must maintain both logical and physical addresses for all lines and symbols. Generating code for, say, jumps requires logical addresses, since all jumps and calls take logical addresses as arguments (they can only hold a 16 bit number). Physical addresses are needed in the debugging records so debuggers can unambiguously resolve the location of symbols, functions, and line numbers, all of whose logical addresses change with the current MMU setting.
Assemblers
Most 64180 programs written in assembly language control the memory manager by tediously issuing many MMU control instructions. The programmer must first decide exactly what configuration logical memory will assume, and then come up with CBAR, BBR, and CBR values for every possible combination of banks. Then, the code must send these values out to change maps. Needless to say, this takes a lot of work.
Softools (8770 Manahan Drive, Ellicott City, MD 21043, (301) 750-3733) came up with an interesting approach that eliminates most of the work. Their SASM assembler and linker will automatically drop in all the code needed to bank a program. In effect, this means you can write code as if the 64180 had a 1 mb linear address space.
Like most good assemblers, SASM supports lots of named segments - up to 256. Most of the time we assembly programmers just need a CODE, DATA, and ASEG segment, but SASM's segmentation lets us break a program into mapped and unmapped sections. When using SASM on large programs, you can assign any segment or segments to have a "mapped" attribute, identifying those that require some MMU manipulation to bring them into the address space.
Segments are the key to SASM's mapping scheme. The linker identifies how much data the program uses and the number of bytes used for unmapped code (that which must never be mapped out). It computes CBAR to define the characteristics of the runtime logical address space: COMMON 0 is made just big enough to hold all the unmapped code, COMMON 1 contains the data, and the rest, the BANK area, is allocated to mapped routines.
The linker groups all mapped segments together and starts to assign both logical and physical addresses to each routine. Whenever a routine will exceed the size of the BANK area the linker moves it to the start of a new BANK area. It then converts all jumps and calls between banked areas to transfers to code that manages the MMU in COMMON 0.
When finally linked, the program has three parts - a COMMON 0 non-mapped (i.e., always in the address space) area which typically contains startup code, frequently-used routines, and SASM's banking code. COMMON 1 is usually your data area. The BANK area contains most of the program code. Calls between these banked routines will cause remapping as needed to bring in ones that are not currently visible in the address space.
For example, suppose the program is as follows:
    routine   length   characteristics
    main      3700h    not banked
    vectors     80h    not banked
    sub1      4000h    banked
    sub2      38f0h    banked
    sub3      1200h    banked
    sub4      6f00h    banked
    sub5       800h    banked
    data      3200h    not banked

On a 64180 the reset jump is at 0, so it makes sense to put the unbanked code (vectors and main) at 0. The data area cannot be banked (especially the stack!) and is traditionally in high memory. Suppose the code that starts at physical location 0 is to go into ROM, with RAM for the data starting at physical 40000. The linker will first divide the logical address space based on the unmapped memory requirements: main and vectors need 3780h bytes starting at location 0, and data occupies 3200h at the end of the logical space. Bearing in mind that the mapping resolution of the 64180 is 4k, memory thus looks like:

    Logical   Physical   Area
    0000      00000      unbanked (main and vectors)
    4000      04000      start of banked routines
    c000      40000      start of RAM data
All the logical address space from 4000 to bfff is available to routines that can be banked. If the sum of the banked sizes is less than the BANK logical area, then no mapping need take place. In our example, however, banked routines need some 64k, much more than the available logical space. If CBAR is C4 (COMMON 1 at c000 and BANK at 4000), SASM will assign addresses as follows:
    Logical   Physical   Routine   length   BBR
    0000      00000      main      3700     --
    3700      03700      vectors     80     --
    4000      04000      sub1      4000     00
    8000      08000      sub2      38f0     00
    4000      0c000      sub3      1200     08
    4000      0e000      sub4      6f00     0a
    af00      14f00      sub5       800     0a
    c000      40000      data      3200     --
For sub1 SASM assigned a logical and physical address of 4000 - reasonable, since this is the first free spot after COMMON 0. sub1 is in BANK, so a BBR value is required. BBR=0 will map 4000 to 04000. sub2 follows sub1, again with BBR=0. So far, no surprises.
sub2 ends at b8f0, practically right before the logical start of data (c000). There is no way sub3 can fit, since sub3 is 1200 bytes long. SASM therefore put sub3 at logical address 4000 (the same as sub1). sub3 follows in physical memory at 0c000 (the next physical address rounded up to a 4k boundary). BBR equals 08. sub1 and sub3 occupy the same place in logical address space (4000), but different physical addresses. To get to sub3, BBR must be set to 08 and a logical address of 4000 issued.
While sub3 is very short, leaving plenty of room for code in the same bank map, sub4 is not. sub3 and sub4 will not both fit into BANK together, so SASM once again reset the logical address to 4000. sub4 comes after sub3, rounded up 4k, and a BBR of 0a is assigned. (Remember the math - BBR * 4096 + logical = physical, so 0a * 4096 + 4000 = e000). sub5 fits into the space between the end of sub4 and COMMON 1, and is so assigned.
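The assignment rule traced above can be written as a short C routine. This is our reconstruction from the example, not Softools' code: place_banked and the routine structure are hypothetical names, the first bank is assumed to use BBR=0, and no attempt is made to split a routine longer than the BANK area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t length;    /* size in bytes */
    uint16_t logical;   /* filled in by place_banked() */
    uint32_t physical;  /* filled in by place_banked() */
    uint8_t  bbr;       /* filled in by place_banked() */
} routine;

/* Pack routines into the BANK window [bank_lo, bank_hi). When one will not
   fit, restart at bank_lo and round the physical address up to the next 4k. */
static void place_banked(routine *r, size_t n, uint16_t bank_lo, uint32_t bank_hi)
{
    uint32_t logical = bank_lo;
    uint32_t physical = bank_lo;              /* first bank: BBR = 0 */

    for (size_t i = 0; i < n; i++) {
        assert(r[i].length <= bank_hi - bank_lo);   /* must fit in one bank */
        if (logical + r[i].length > bank_hi) {
            physical = (physical + 0xFFFu) & ~0xFFFu;
            logical = bank_lo;
        }
        r[i].logical = (uint16_t)logical;
        r[i].physical = physical;
        r[i].bbr = (uint8_t)((physical - logical) >> 12);  /* physical = logical + BBR*4096 */
        logical += r[i].length;
        physical += r[i].length;
    }
}
```

Given sub1 through sub5 with a window of 4000 to c000, this reproduces the logical, physical, and BBR columns of the table.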
SASM's linker generates address assignments as we've just seen, but how are calls and jumps between subroutines handled? Obviously, if sub1, sub3, and sub4 all reside at logical address 4000, a simple CALL 4000 will not always resolve properly. As mentioned earlier, SASM's linker converts all inter-BANK calls and jumps to a transfer to a jump table which is usually linked into COMMON 0. In particular, if sub4 were to call sub1, the following will be automatically substituted for the call instruction:
    call bank_table+x   ; invoke MMU handler
    db   BBR            ; BBR of sub1
    dw   sub1           ; address of routine to call
The code in bank_table stores the current BBR value and return address on a local stack, remaps the MMU by outputting the indicated BBR, and then transfers to the logical address supplied as a parameter. Returns operate in a reverse procedure, being vectored to another Softools-supplied routine to reverse the mapping.
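The save, remap, transfer, and restore sequence can be modeled in C to show that nested banked calls unwind correctly. Everything here is a stand-in: bbr_reg plays the part of the BBR port, banked_call plays the part of the bank_table handler, and the two sample routines borrow their names and BBR values from the example above.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t bbr_reg;          /* stands in for the BBR I/O port */
static uint8_t saved_bbr[16];    /* local stack of callers' BBR values */
static int     depth;

typedef int (*routine_fn)(int);

/* Model of the handler: save the caller's bank, map the callee in,
   transfer to it, then restore the caller's bank on the way back. */
static int banked_call(uint8_t target_bbr, routine_fn target, int arg)
{
    assert(depth < 16);
    saved_bbr[depth++] = bbr_reg;
    bbr_reg = target_bbr;
    int result = target(arg);
    bbr_reg = saved_bbr[--depth];
    return result;
}

static uint8_t seen_in_sub1;      /* BBR value observed while sub1 runs */
static uint8_t bbr_after_return;  /* BBR value sub4 sees after sub1 returns */

static int sub1(int x)            /* lives in the bank with BBR = 00 */
{
    seen_in_sub1 = bbr_reg;
    return x + 1;
}

static int sub4(int x)            /* lives in the bank with BBR = 0a */
{
    int r = banked_call(0x00, sub1, x);
    bbr_after_return = bbr_reg;
    return r * 2;
}
```

Calling banked_call(0x0A, sub4, 3) runs sub4 with BBR=0a, runs sub1 with BBR=00, puts 0a back before sub4 resumes, and finally restores whatever bank the original caller was using.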
What might not be entirely obvious is that SASM does it all. Once you tell SASM's linker where ROM and RAM are (which has to be done for any linker) it automatically allocates logical and physical addresses. The linker also replaces the calls and jumps as shown above. SASM does offer options to control memory allocation and the like, but in most cases these are not needed.
This means that you can write large programs without ever considering the MMU. SASM takes care of it all. There's an interesting subtle implication - you can link a 256k program to take up only 8k or so of logical space! Assign 4k for COMMON 0 and 56k for data. SASM will bravely partition the 256k code into 50 or more sections, each of which will get remapped through the 4k BANK area. The mapping overhead might get high, but logical address space will be conserved.
Since SASM partitions the program during the link phase, it can save the addresses of all symbols, line numbers and other parameters in a debug file. Symbols' physical addresses are stored in the debug file, maintaining true addresses regardless of the MMU mapping. If the debugger (emulator or monitor) can handle physical addresses, then you can access any routine, variable, or source line number at any time, without manually remapping to bring the desired value into logical address space.
C Compilers
As we write this, only three compilers support 64180 bank switching. Archimedes Software's C180, Whitesmiths' C, and Software Development Systems' C all automatically generate code for large memory models. Manx (MANX Software Systems, P.O. Box 55, Shrewsbury, NJ 07701) will soon have a compiler, and no doubt others are on the way.
The approach to memory management taken by Archimedes (2159 Union Street, San Francisco, CA, 94123 (415) 567-4010) is much like that used by SASM. As you write your C code you do not need to be especially concerned with the MMU. There are no special procedures to use or functions to invoke.
Before linking the compiled object files various parameters must be passed to the linker in its indirect command file. The first are values for CBAR and CBR. It is the programmer's responsibility to determine exactly the memory configuration, and to compute these simple values.
In addition to the MMU register settings, the programmer must provide the linker with a table of module names (a module is a file of source code, which may contain several functions). If a module is not to be banked, that must be indicated as well.
The memory model supported by the compiler puts all non-banked functions into COMMON 0, the banked code into BANK, and data areas into COMMON 1.
The linker generates a table ("FLIST") of data about every mapped function in the program. For each function, FLIST gives an encoded BBR value, logical start address, and bank number. FLIST is a sort of global cheat sheet, located in COMMON 0 so it is never mapped out, that describes every function's logical and physical address.
The linker replaces all mapped function calls with:
    ld   hl,FLIST entry for the function
    call remap_code
The remap_code extracts pertinent data from the FLIST entry and remaps the MMU as needed before branching to the function's logical address.
The beauty of using an FLIST table is that pointers to functions will work - the pointer becomes an FLIST pointer. With FLIST always mapped in, indirect function invocations will work even to mapped functions.
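A rough C picture of the idea follows. The real FLIST layout and BBR encoding belong to Archimedes and are not documented here, so the structure below is purely an assumption: each entry holds a plain BBR and a logical address, and a "function pointer" is just the address of an entry.

```c
#include <stdint.h>

/* Hypothetical FLIST entry: enough to find a banked function anywhere. */
typedef struct {
    uint8_t  bbr;       /* BBR value that maps the function in (assumed unencoded) */
    uint16_t logical;   /* where the function appears once mapped */
} flist_entry;

/* The table lives in COMMON 0, so it is visible under every mapping. */
static const flist_entry FLIST[] = {
    { 0x00, 0x4000 },   /* illustrative values only */
    { 0x08, 0x4000 },
};

/* A pointer to a banked function is a pointer to its FLIST entry. */
typedef const flist_entry *banked_fn_ptr;

/* Recover the one address that never changes: the physical one. */
static uint32_t flist_physical(banked_fn_ptr f)
{
    return ((uint32_t)f->bbr << 12) + f->logical;
}
```

Two entries can share the same logical address yet remain distinct pointers, which is why indirect calls through such a table keep working no matter what is currently mapped.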
The Archimedes compiler does produce a good debugging file, which contains useful information about the physical address of every function. The information is stored in FLIST. A conversion routine easily extracts the real physical address of each function, line number, and global symbol.
Whitesmiths (733 Concord Ave., Cambridge, MA 02138, (617) 661-0072) took a somewhat different approach to using the MMU. When writing C code for this compiler, all calls to mapped functions must be specified as FAR calls. This directs the compiler to generate the proper code to bring the function into the map and execute it.
The called function needs no special handling, since it can be called as either a FAR or as a near. For example, if a function in one module invokes another in the same module, it can use a conventional call structure. Only if a different function, possibly located outside of this bank, calls the same function, does the extra call overhead have to be inserted.
A typical call sequence looks like:
    @far int sub1();

    main()
    {
        sub1();
    }

The function is declared FAR, and then all references to it within that module generate banked calls.
All banked calls do produce overhead, both in code size and speed. The Whitesmiths approach eliminates the overhead in cases where it is not needed. Archimedes, on the other hand, vectors all banked function calls through FLIST, even if both the caller and callee are co-resident in BANK.
Whitesmiths uses the indirect linker command file to indicate the location of every banked function. The programmer provides both the logical and physical addresses of each of these functions. Again, the peril here is having to go through iterative modifications of these parameters during development.
To date, no compiler is smart enough to automatically set banked addresses. Perhaps soon this will change.
The Whitesmiths compiler generates a call to library routine c.libc to do the bank switching. Space is allocated on the stack for the return address and return BBR. c.libc gets a "far pointer" to the function so it can reset BBR and the logical address.
Uniware, from Software Development Systems (4248 Belle Aire Lane, Downers Grove, IL 60515 (800) 448-7733) implements bank switching by simulating linker overlays. In other words, the compiler and linker are not even aware that the 64180 processor has an MMU; each mapped function appears to be an overlay.
Like the other compilers, the Uniware compiler breaks memory into three sections. Only the middle area is mapped dynamically. Your initialization code must preset CBAR and CBR to their static values.
The indirect linker file specifies which functions are to be mapped, and each function's BBR value. The linker changes the normal call sequence to:
    ld   c,<BBR>
    ld   iy,<function logical address>
    call _call
_call is a low level routine in COMMON 0 that remaps the MMU and vectors off to the function.
You must give the linker much more information than for the Archimedes product, so Uniware is a bit harder to use. One of the nice side benefits of this approach, though, is that it is directly applicable to Z80 bank switched applications. You just have to modify the _call code to handle your proprietary hardware.
The Emulator
In the embedded world generating code is only a small part of the development battle. Somehow it must be tested and debugged. The only suitable tool for debugging embedded code is an In Circuit Emulator, since only the emulator lets you interactively isolate bugs in a ROMed environment.
Like compilers, 64180 emulators are all basically extensions of technology developed for the Z80. After all, the timing is similar and the software is practically the same. Unfortunately, the extra four address bits found on the 64180 can cause lots of emulation problems.
On a Z80 the logical and physical address space is the same. Not so for the 64180 - only by knowing the MMU values can the translation take place. It's therefore crucial that the emulator can handle physical addresses, since only physical ones never change.
While this seems fairly obvious it can be difficult to implement. Emulators use the 64180 for all target memory accesses, so the machine cycles are identical to those expected with a processor in the socket. A translation from desired physical address back to logical, CBAR, CBR, and BBR must take place, since the 64180's code can only issue logical addresses.
In other words, if memory at physical address 20000 is to be displayed, then some routine has to figure out settings for all three MMU registers, plus a logical address, that the 64180 can use to access the memory. Not a trivial task.
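One simple way to pick the settings, sketched in C, is to open the bank area at logical 0 and let BBR carry the whole upper part of the address. This is our own illustration rather than how any particular emulator does it; the names mmu_access and access_for_physical are hypothetical, and a real emulator would also have to save and restore the user's MMU registers around the access.

```c
#include <stdint.h>

typedef struct {
    uint8_t  cbar;      /* area layout to load */
    uint8_t  bbr;       /* bank base to load */
    uint16_t logical;   /* address the 64180 should then issue */
} mmu_access;

/* Choose register values so that `logical` lands on `physical`
   (a 20 bit address, 00000 to FFFFF). */
static mmu_access access_for_physical(uint32_t physical)
{
    mmu_access a;
    a.cbar = 0xF0;                              /* bank area covers logical 0000-EFFF */
    a.bbr = (uint8_t)(physical >> 12);          /* upper 8 of the 20 physical bits */
    a.logical = (uint16_t)(physical & 0xFFFu);  /* lower 12 bits pass straight through */
    return a;
}
```

For physical 20000 this yields CBAR=F0, BBR=20, and logical 0000; the forward formula, logical + BBR * 4096, lands back on 20000.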
As users we don't care what the emulator does or how it works. All we're concerned with is the debugging interface - the source level debugger (SLD) that runs on a PC and communicates with the emulator over RS-232. If we type DISPLAY SYMBOL FOO, then we want to see the value of FOO, no matter where it is or how the MMU is set up. The SLD must therefore know about FOO's physical address.
Fortunately, all the products mentioned generate physical symbol addresses in the debugging files. The SLD can send these values down to the emulator and let it deal with coming up with the proper address.
This does mean that the SLD/emulator interface is completely linear, like the 68000's. You can randomly access any location in the target's memory just by typing in the right address.
What if you wish to see a logical address? Is this important? Herein lies a source of confusion. Only physical addresses unambiguously identify each public symbol and line number. Your program works through logical addresses - the two are not the same or even similar. Looking at disassembled code, you might see a LD A,(1000). The 1000 is logical - its physical equivalent depends on the current MMU mapping.
Softaid's emulators get around this problem by letting you suffix any address with a tilde to indicate that the logical address is needed, rather than the default physical. Of course, the emulator will use the current MMU setting to access the memory, so if the MMU is not set up as it would be when executing that instruction, the data may not be correct. Normally this is not a problem - you debug in your execution context, rather than randomly hunting through code.
64180 registers are 16 bits long. When used as pointers, they form logical addresses, creating the same sort of problem just mentioned. Again, when displaying the contents of a register pointer, that will be logical. If you ask for a dump of memory at the address in HL, what will result? The correct solution is to use indirect register references as logical (saving the bother of suffixing a tilde all the time), since this is what the programmer really wants.
In C, an automatic pointer will be stored as a 16 bit value on the stack. Suppose, while debugging, you wish to dump *ptr? In other words, display the data pointed to by ptr, which is presumably on the stack. Again, only one correct solution exists: get the stack pointer, convert it to physical using the current MMU, extract the 16 bit value of ptr from the stack, make that physical, and then access the destination address.
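Those steps look like this in C, using a byte array to stand in for the target's 1 mb of physical memory. The helper names (to_physical, deref_stack_pointer) and the idea of passing ptr's offset from the stack pointer are our assumptions for the sake of the sketch; common 0 is again taken to be untranslated and carries past bit 19 are dropped.

```c
#include <stdint.h>

typedef struct { uint8_t cbar, bbr, cbr; } mmu_regs;

/* Logical to physical under the current MMU settings. */
static uint32_t to_physical(const mmu_regs *m, uint16_t logical)
{
    uint8_t page = (uint8_t)(logical >> 12);
    uint8_t base = 0;                              /* common 0 */
    if (page >= (m->cbar >> 4))        base = m->cbr;   /* common 1 */
    else if (page >= (m->cbar & 0x0F)) base = m->bbr;   /* bank area */
    return ((uint32_t)logical + ((uint32_t)base << 12)) & 0xFFFFFu;
}

/* Return the byte that *ptr refers to, where ptr is a 16 bit automatic
   variable stored `offset` bytes above the stack pointer `sp`. */
static uint8_t deref_stack_pointer(const uint8_t *mem, const mmu_regs *m,
                                   uint16_t sp, uint16_t offset)
{
    /* 1. Find ptr itself: a logical stack address made physical. */
    uint32_t lo_at = to_physical(m, (uint16_t)(sp + offset));
    uint32_t hi_at = to_physical(m, (uint16_t)(sp + offset + 1));

    /* 2. Read its 16 bit value (the Z80 family stores the low byte first). */
    uint16_t ptr = (uint16_t)(mem[lo_at] | (mem[hi_at] << 8));

    /* 3. That value is a logical address too, so translate again. */
    return mem[to_physical(m, ptr)];
}
```

Note that two translations are needed: one to find ptr on the stack, and a second for the address it holds.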
Conclusion
The 64180 family solves a long standing Z80 problem - that of handling more memory. Lots of current Z80 applications can be easily ported to the 64180 to take advantage of the larger memory model and high integration peripherals. Don't try to get away with Z80-style development tools - select assemblers, compilers, and debuggers that exploit the 64180's resources to ease your development efforts.