Huge Z180 Arrays

The Z180's banking scheme is great for handling code; data is a bit more complex. Here's example code.

Published in Embedded Systems Programming, August 1990

By Jack Ganssle

Zilog's Z180 (aka 64180) is sometimes used in applications where huge quantities of data are collected or analyzed. Their CMOS low power design makes them perfect for battery powered data acquisition equipment. Or, as part of an instrument, their architecture is ideal for storing large amounts of data during analysis.

As I mentioned in a previous article, these processors include a powerful Memory Management Unit (MMU) designed to extend the address space to a full megabyte. With some understanding of the intricacies of the MMU, it's not too hard to upgrade an old Z80 application to make use of the extra memory. Of course, the HD64180 still uses a 64k logical address (for compatibility with the Z80), so a Z80 program doesn't suddenly get a nice 1 Mb linear addressing range. The software must be somewhat restructured if huge programs or data areas are needed.

(As a refresher, "logical" addresses are those issued by the program; on the HD64180 these can never exceed 64k, the limit imposed by using 16 bit addresses in load and jump instructions. "Physical" addresses are those appearing on the CPU's pins; the internal Memory Management Unit converts Logical to Physical, using a translation algorithm controlled by the programmer.)

Suppose you're designing a remote data collection device that gets one 16 bit A/D reading every second. This doesn't sound like much data, but after only a day the system will have acquired 86,400 words (172,800 bytes) - far more than the logical address space of 64k.

Managing a 64180's memory in this sort of application demands careful analysis of both the logical and physical address space needs. There is no way all 176k will fit within the 64k logical space, so the program will have to remap the MMU, possibly often, to get to the array.

Accessing Arrays

This problem requires "sequential" access to the data, since the current data point is appended immediately after the last. It is a special case of the more general array accessing situation, where a program might need any arbitrary data point, in no particular order. "Random" array access is analogous to that employed by disk drives, while sequential is typical of tapes.

Byte-oriented data (like strings) is easy to work with. Element I is the "Ith" address in the string; in other words, the address of DATA(8) is the start address of the array plus 8 (assuming DATA(0) is a valid element). This is called a vector - it has only one dimension, that given by the subscript.

Of course, in real applications we often work with data that is 16 or more bits long. Integers are often 2 bytes; floating point values typically are 4 or even 8 bytes. Array element I of a vector is found from:

base + I * element size,

where "base" is the start address of the array.

Again, this assumes element 0 is valid. Element 0 is at the first address of the array, followed sequentially by each other element.

Multidimensional arrays use an extension of this formula. In most systems these arrays are stored in "row major" order: given A(row,column), all column elements of row 0 are stored first, then the column elements of row 1, etc. We can get to any element A(I,J) using the formula:

address=base + I * (Jmax * Esize) + J * Esize

Jmax is the number of columns in the array and Esize is the number of bytes per element of the array.

As you can imagine, this can be a computationally expensive way to get to an array. Multiplications are slow. The HD64180's multiply instruction only handles 8 bit operations, and so by itself it is not adequate for indexing into an array. Generally you can count on the quantity Jmax * Esize to be a constant; for speed critical applications it may be wise to set this to a convenient value (a power of two), even if some memory is wasted. Similarly, aim for an element size that is 1, 2, 4, or 8 so shifts can be substituted for multiplies.

Any number of dimensions can be accommodated by extending the formula. Higher dimension arrays require even more math to access, so try to limit them to one or two.

In real time applications it's often nice to support two forms of data access. A perfectly general form is useful for offline data reduction; your application program can request any array element in any sequence. During data acquisition you might need a shortcut to avoid the computational overhead of computing an array index. If data is gathered in some sequential form, it's easy to visualize the data as a one dimensional vector (instead of a multidimensional array), and store each value sequentially. We'll explore both methods.

The Memory Manager

Sure, it's easy to write code to compute an array index along the lines presented. We'll run into trouble when the array gets too big. Suppose your code uses most of a 27256 PROM, leaving only 32k of logical address space for data. If all of this were dedicated to a 2 dimensional array consisting of 4 byte-long elements, then only 8192 elements fit in memory - ARRAY(100,100) exceeds address 64k.

Here the MMU comes to the rescue, but not without a lot of help from you the programmer. Obviously, we can simply reprogram the MMU's control registers every so often and bank switch portions of RAM into the processor's address space. The secret to success is careful planning.

If you've never really blown a software project by immediately jumping into coding, then you are probably sick of hearing about software methodologies. In fact, as processors and programs get more complicated careful design is far more important than it once was. Oh for the days of 4k programs! We could crank out a few thousand lines of code in a couple of weeks and be done. Now, when 300k+ programs are then norm, a carelessly designed system will be a disaster. Guaranteed.

This is certainly true when using any sort of memory manager. One penalty of the MMU is a segmentation of your logical address space that must be designed in up front, and that very likely can NEVER be changed, without completely rethinking the entire structure of the program.

Using huge arrays forces you into a three bank memory model on the Z280's fantastically complex and powerful MMU will let you have up to 16 banks in the logical address space). One bank points to the ROMed program; in most cases this logical to physical mapping will never change. Another bank accesses an area of RAM for program variables and transients. In all but extreme cases this never be remapped, because the stack will reside here. Finally, one bank points to the huge array(s).

*****************************   FFFF
*                           *
*  huge array section       *
*                           *
*****************************   F000
*                           *   EFFF
*                           *
*                           *
*                           *
*   Program data and stack  *
*                           *
*                           *
*                           *
*****************************   8000
*                           *   7FFF
*                           *
*                           *
*    Program (ROM)          *
*                           *
*****************************   0000

Figure 1: Organization of Logical Address Space

Figure 1 shows one possible configuration of the memory management unit. This assumes 32k of ROM from logical 0 to 7FFF, 28k of RAM from 8000 to EFFF, and 4k of huge array RAM at the end of memory.

Wait a minute - 4k of RAM for huge arrays? This doesn't seem like much!

This 4k section of the system's address space is essentially a "window" into the huge array. We have to provide a section of logical space to allow access to the data; the 4k window is this section. This is much like a disk buffer used in operating systems - data is written to the buffer and then flushed to disk when full. It's size is a result of the mapping resolution of the MMU - 4k is the minimum amount we can allocate. You can always make this larger, which will cut down on MMU remapping, but you'll sacrifice either variable or program area.

The window is for storage of huge arrays. Never, never store the stack or other variables here, since remapping will invalidate the data.

The idea behind using this window is to compute the physical (20 bit) address of the proper array element, and then to position the window to bring the data into view. Earlier we saw how to compute an index into any array. Now, this windowing step is also introduced. The algorithm is not complex, but a wise programmer will insulate his code from the machinations of array element lookup by hiding the details in subroutines.

The routine "PUT1D" stores a byte to a huge one dimensional array using the map shown in figure 1. The code to get a byte from the array is nearly identical and so is not shown. PUT1D accepts an array index in HL that can range all the way up to 65k. Thus, the one 4k window in logical address space gives the routine access to a 65k hunk of data. Note that it supports random accesses - it stores the data in B into the array at the index in HL. HL can be 3 on one invocation and 40000 on the next.

PUT1D locates the array in physical memory at address: 0F000 + BASE_CBR*4096. Thus, if BASE_CBR is 1, then the array runs from 10000 to 1FFFF. This implies that the memory manager's CBAR register must be set to Fx (where "x" defines the logical start of the base area) so that logical addresses from F000 to FFFF (our window) are translated into common area 1.

For the sake of simplicity the code stores a one byte value. A word version would require the "index" to be multiplied by 2 before it is used in the computations. Use a shift to do the multiply, but be wary of overflows.

Multiple dimension arrays can be programmed just as easily, but you must compute the address using the formula previously discussed. Multiplies are slow and should be avoided; try to pick values of jmax that are powers of two, and then use a series of shifts. Fast access will require some thought to optimize the computations.

One easy speed trick is shown in "NEXT1D". Especially when gathering data, we often just put one value in the next location - random array accesses are not needed. This means we can skip all of the multiplications needed to compute the address; we just increment the last address. NEXT1D illustrates this approach with the one dimensional array we've already looked at. PUT1D saves the current CBR and logical addresses in SAVE_CBR and SAVE_LOG for NEXT1D. After one call to PUT1D to set things up, all further sequential accesses can use the faster NEXT1D. With the single indexes shown, the speed difference is not important; with a 2 or 3 dimensional array a tremendous time saving will result.

What if your program uses a number of big arrays? You can't assign more windows since the HD64180 limits mapping to three sections. Probably the easiest solution is to write a "PUT" and "GET" routine for each array, using a different BASE_CBR for each. Avoid using sequential accesses unless you can be really sure that accesses will be restricted to one array at a time.

;
;  Put an element in a huge 1 dimensional array. 
; Here we assume:
;
; 	The "window" is from F000 to FFFF.
;	BASE_CBR is the first (lowest) CBR value
;	Register B has the data to store
;	Register HL has the array index 
;
; cbr = base_cbr + (index AND ff000) / 4096
; logical address= f000 + (index AND 0fff)
;
put1d:
	ld	a,h		; get high order index value
	and	a,0f0h	; mask off upper 4 bits (cbr bits)
	rra			; shift right 4 bits
	rra			; (since we ignore lower 8 bits, this 
	rra			;  a 12 bit shift)
	rra			; a= (index AND ff000)/4096
	add	a,base_cbr
	out0	(cbr),a	; set new cbr value
	ld	(save_cbr),a	; save cbr for next1d
	ld	a,h		; high part of index
	or	a,0f0h	; mask off high nibble and or in f0
	ld	h,a		; hl is now right logical addr in window
	ld	(hl),b	; save data
	ld	(save_log),hl	; save logical addr for next1d
	ret
;
; put register B in the huge dimensional array at the next
; open location. This assumes that put1d was once called to
; index a specific value.
;
;  	On entry, B is the data to store
;	save_cbr was set by put1d or next1d
;	save_log was set by put1d or next1d
;
next1d:
	ld	hl,(save_log)	; last logical address used
	ld	a,(save_cbr)	; last cbr used
	inc	hl		; skip to next element
	jr	nz,ne1	; jp if we have not exceeded the window
	ld	hl,0f000h	; re-init to the start of the window
	inc	a		; next cbr value
	ld	(save_cbr),a	; save current cbr
ne1:	ld	(save_log),hl	; save current logical address
	out0	(cbr),a		; set cbr
	ld	(hl),b		; save data
	ret