Writing Relocatable Code

Some embedded code must run at more than one address.

Published in Embedded Systems Programming, February 1992


By Jack Ganssle

Employees of large multinationals think of packing their house and home when hearing the word "relocation"; however, embedded programmers use relocation as a way to reduce hardware costs.

Relocatable code is software whose execution address can be changed. A relocatable program might run at address 0 in one instance, and at 10000 in another.

Just to confuse the issue, partially built programs are composed of object modules unfortunately called "relocatables". Linkers combine multiple relocatables into one final program. The word "relocatable" is applicable, since each is assembled at a pseudo-address of 0. The linker corrects all address references to the proper execution values. Once linked, the code is frequently no longer relocatable, since it can typically run only at a single address.

Obviously, this sort of relocation is an important consideration for linking multi-module programs. Without it we'd be doomed to giving absolute start addresses to each of the modules before assembly, a mind-numbing prospect since changing the length of any one module may necessitate changing the origin of all of them.
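The linker's fix-up step can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the `module_t` layout, its field names, and the idea that every address reference is a whole 32-bit word are all hypothetical, not the format of any real object file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object module: an image assembled at pseudo-address 0,
   plus a list of which words in that image hold absolute addresses. */
typedef struct {
    uint32_t     *words;     /* module image, viewed as 32-bit words    */
    size_t        n_words;   /* number of words in the image            */
    const size_t *fixups;    /* indices of the address-bearing words    */
    size_t        n_fixups;  /* number of entries in fixups             */
} module_t;

/* Add the module's final load address to every address reference. */
void relocate_module(module_t *m, uint32_t load_addr)
{
    for (size_t i = 0; i < m->n_fixups; i++) {
        size_t idx = m->fixups[i];
        assert(idx < m->n_words);      /* a fix-up must point inside the image */
        m->words[idx] += load_addr;    /* 0-based address becomes absolute     */
    }
}
```

Words that are not in the fix-up list (opcodes, constants) are left alone, which is why the assembler has to record where the address references live.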

Once linked, it may not be so obvious why you'd care if the program could be relocated. Consider the case of an application running on a multiuser system - if every user's program had to run at some absolute address, who assigns these addresses? What if two users pick the same one? It is possible to swap the programs in and out of memory, sharing the same addresses, but this makes rather poor use of the huge address spaces supported by modern CPUs. Certainly a system with 16 MB of memory should be able to run a dozen or more moderate-sized applications without the disk-intensive overhead of swapping!

Most mainframes and other large systems avoid this by either requiring all programs to be reentrant (and thus inherently relocatable), or by assigning an absolute address range to each user. A hardware base register defines the start address of this memory within the physical address space.

It is useful, though, to write intrinsically relocatable code even on small embedded systems; code that, even after linking, can be dynamically moved around in the computer's address space and executed at more than one address. In embedded systems the relocation may be handled in hardware, all but invisible to the program.

Start-up Problems

One of the more common cases for simple relocation is dealing with the power-up sequence of a CPU. The 68000 fetches its starting address and initial stack pointer from address 0. This forces the system designer to put ROM at the beginning of memory. However, like the 80x88, the 68000 keeps all exception vectors in low memory. Quite a few systems dynamically change interrupt pointers as the code executes, demanding that these low memory vectors be in RAM. We're faced with a conflict: the reset information at 0 must be in ROM, but the interrupt vectors, also near 0, must be in RAM. Code relocation is one solution.

Back in the pre-DOS dark ages of CP/M, Z80 systems solved the same sort of problem by mapping in a "phantom" ROM at address 0000 during boot. Once the system was up and running, and the operating system copied or loaded to RAM, the phantom ROM simply disappeared from the processor's address space. Though the boot phase was ugly, once complete the CPU had a clean full-address-space of RAM.

In minimal cost systems a single ROM chip (or ROM pair, in the case of a 16 bit CPU using bytewide memories) likely contains the entire program. Though the phantom ROM approach is feasible (copy the ROM to RAM, start executing code just above the interrupt vectors, and shut down the ROM), it does mean you'll spend more on memories. That is, you'll need to add as much RAM as there is ROM just to solve the relocation problem. It makes more sense to write smarter code and design cleverer hardware.

If the system's recurring (i.e., manufacturing) costs are a concern, then by all means try to make every part of the hardware do double duty. Software is expensive, but you only pay for it once. Hardware costs money each time an "instance" of the system is built.

Quite a few processors have program counter (PC) relative jump and call instructions. Instead of transferring control to a fixed address embedded in the instruction, the destination address of PC relative instructions is encoded as a positive or negative offset to be added to the current program counter. This makes particularly good sense in a ROM-based system, as pretty much all of the code lies contiguously in one or more non-volatile memories; after each compilation the distance between subroutines is fixed.
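A little arithmetic shows why PC-relative transfers survive a move. The sketch below is a model, not any particular CPU's encoding: the function names are made up for illustration, and the 16-bit signed offset field is an assumption (real instruction sets differ in field width and in which PC value the offset is added to).

```c
#include <assert.h>
#include <stdint.h>

/* Offset the assembler would store for a branch located at pc that
   should land on target. Assumes a 16-bit signed offset field. */
int16_t encode_displacement(uint32_t pc, uint32_t target)
{
    int64_t d = (int64_t)target - (int64_t)pc;
    assert(d >= INT16_MIN && d <= INT16_MAX);  /* must fit the offset field */
    return (int16_t)d;
}

/* Destination the CPU computes at run time: current PC plus the offset. */
uint32_t branch_target(uint32_t pc, int16_t displacement)
{
    return (uint32_t)((int64_t)pc + displacement);
}
```

An offset computed with the code at 000100 still lands on the right routine when the same bytes execute at 800100, because the branch and its destination moved by the same amount.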

Solving the power-up relocation problem with PC relative instructions is straightforward: start the code from ROM fixed at address 00000. Enter an initialization routine that is located near 00000. Copy a short routine to RAM:

	<set an I/O bit>
	<jump to reloc_address>

Then, jump to this RAM routine. Setting the I/O bit should move the ROM, and the start vectors it holds, out of low memory. "reloc_address" is the start of the code at the new, moved, address of the ROM. If all transfers are made with PC-relative instructions, then the code will not notice or care that it is operating at some other address.

A variant of this involves tricking the assembler into assembling part of the code for the final, moved destination address. Then PC relative instructions are no longer needed as the assembler and linker will be able to resolve all transfers directly. On some assemblers the pseudo operators PHASE and DEPHASE let you assemble code for use at other addresses. This is a better solution when using a high level language where you have no control over what instructions are compiled (i.e., you cannot ensure the compiler will generate PC-relative transfers).

Both of these solutions do require that some RAM exists for the transfer code. Think about it: if the ROM is to suddenly move, you had better not be executing in that ROM during the jump! After the I/O instruction moves the memory, the jump instruction will no longer be accessible.
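To see why the stub has to live in RAM, here is a small C model of the memory map before and after the I/O bit is set. Every address, size, and name in it is hypothetical, picked only to make the example concrete: in particular, a little RAM at 400000 that is visible in both states is an assumption, not a requirement of the technique.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory map, for illustration only. */
#define ROM_SIZE        0x040000u  /* 256k of ROM                           */
#define ROM_MOVED_BASE  0x800000u  /* where the ROM lands after the move    */
#define STUB_RAM_BASE   0x400000u  /* small RAM visible before and after    */
#define STUB_RAM_SIZE   0x000100u

typedef enum { DEV_NONE, DEV_ROM, DEV_LOW_RAM, DEV_STUB_RAM } device_t;

typedef struct {
    bool rom_moved;                /* state of the I/O bit */
} memmap_t;

/* Which device answers a bus cycle at addr? */
device_t decode(const memmap_t *m, uint32_t addr)
{
    assert(m != NULL);
    if (addr >= STUB_RAM_BASE && addr < STUB_RAM_BASE + STUB_RAM_SIZE)
        return DEV_STUB_RAM;
    if (!m->rom_moved)                         /* power-up state: ROM at 0 */
        return addr < ROM_SIZE ? DEV_ROM : DEV_NONE;
    if (addr < ROM_SIZE)                       /* after the move: RAM at 0 */
        return DEV_LOW_RAM;
    if (addr >= ROM_MOVED_BASE && addr < ROM_MOVED_BASE + ROM_SIZE)
        return DEV_ROM;
    return DEV_NONE;
}

/* The stub's two steps: set the I/O bit, then jump to the moved ROM.
   Returns the jump destination for an entry point at rom_offset. */
uint32_t run_stub(memmap_t *m, uint32_t rom_offset)
{
    m->rom_moved = true;
    return ROM_MOVED_BASE + rom_offset;
}
```

In this model, an instruction that sat at 000104 in ROM is replaced by RAM contents the moment the bit is set, so a jump placed there would never be fetched. The stub's own address decodes to the same RAM in both states, which is what lets the jump complete.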

I have seen folks use a CPU's prefetcher to pick up the jump before the I/O completes, but this seems a little like playing Russian roulette. Sooner or later a new mask of the silicon will change the prefetching algorithm.

Sometimes the hardware includes a little state machine that defers changing the ROM's base address until just after the jump completes. While elegant, this takes more hardware. Prefetchers (most modern CPUs have them now) make it almost impossible to get the timing right.

Often you won't have RAM available until the ROM has been moved. It's a lot easier to put one big hunk of RAM at an address than several little pieces scattered around. Before the ROM move, the low (at least) RAM cannot be accessed as it is at the same address as the ROM.

What other solutions are there? The embedded world is unlike that of big computers. Few of us use the entire address space of the processor, so why not minimally decode the address bus?

To take a fer instance: the 68000 has 24 address lines, giving 16 MB of memory space. This is a lot more than almost any low cost embedded system needs. If your program eats up, say, 256k for code and 256k for data, many of the address lines can be ignored.

In this case, one solution is to use a pair of 128k by 8 ROMs and a similar pair of RAM chips. Generate the chip select signals from a single address wire. You might tie the inversion of A23 to the ROM chip select, and the inversion of the ROM select to the RAM. (Static memory devices almost universally use inverted, active-low, select signals.) Thus, the RAM is on if the ROM is off. This is a trifle simplistic, since all I/O on the 68000 is memory mapped, but you get the idea.

Now, if A23 is high (true for any address of 1xxx xxxx xxxx xxxx xxxx xxxx, where "x"=don't care) the ROM will be on. For all other addresses the RAM is on. Thus, we've effectively split the address space of the CPU into ROM and RAM halves, with the ROM eating up the entire top half of a 16 MB address space and the RAM the lower.
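Expressed in C, the whole decoder is one bit test. The names here are invented for the sketch, and I/O decoding is ignored, as it is in the description above.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SELECT_RAM, SELECT_ROM } chip_select_t;

/* Decode a 24-bit 68000 address using nothing but A23. */
chip_select_t decode_a23(uint32_t addr)
{
    assert(addr <= 0xFFFFFFu);   /* the 68000 has only 24 address lines */
    return (addr & 0x800000u) ? SELECT_ROM : SELECT_RAM;
}
```

Everything from 000000 through 7FFFFF selects RAM, and everything from 800000 through FFFFFF selects ROM.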

Add a latch that forces the A23 signal seen by the memories "high" immediately after reset. Add an I/O instruction to disable this condition. After reset the ROM will be at 0 and the RAM just doesn't exist (depending on hardware implementation). The startup code should be a jump to address 800000 plus the offset of the main code (the ROM's normal space), followed by the I/O instruction needed to disable the funny reset circuit.

The 8 MB jump stays in ROM, since the hardware continues to decode a ROM chip select. Using a single address line as a chip select simplifies decoding logic, saving a buck or three in the unit's production, and makes relocation easy. It's true that the ROM appears multiple times in the processor's address space (it starts at 800000, 840000, 880000, and every multiple of its size above that). The code couldn't care less, as it is entirely contained in 256k.
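The reset latch and the repeated ROM images can both be modeled in a few lines of C. As before, this is a sketch with made-up names. It assumes the ROM pair sees only the address lines needed to cover 256k, and that the latch simply ORs into A23 on its way to the memories.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define A23       0x800000u
#define ROM_SIZE  0x040000u   /* 256k: a pair of 128k by 8 parts */

/* True when the ROM chip select is asserted. While the reset latch is
   set, the memories see A23 forced high, so every address hits ROM. */
bool rom_selected(uint32_t cpu_addr, bool reset_latch)
{
    assert(cpu_addr <= 0xFFFFFFu);
    return reset_latch || (cpu_addr & A23) != 0;
}

/* Only the low-order address lines reach the ROM, so the upper bits
   are ignored and the same contents repeat every 256k. */
uint32_t rom_offset(uint32_t cpu_addr)
{
    return cpu_addr & (ROM_SIZE - 1u);
}
```

With the latch set, a fetch from 000008 reads the ROM. Once the code has jumped to 800000 plus its offset, clearing the latch changes nothing for the running program, since those addresses select the same ROM locations either way.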

Relocating with MMUs

Recently I've seen a lot of people doing this on the 64180 and Z180 processors. Both include memory management units that translate the core CPU's 64k logical address space into a 1 MB physical space. On these processors most people relocating their code go one step further: they trick the code into thinking it really executes at location 0, even when an upper order chip select is all that enables ROM.

This is pretty easy to do on any processor with a memory management unit: just open a window in the MMU to point to the ROM with a logical address (that issued by the code's instructions) in low memory. Program the MMU to add offsets to make the real, physical, address up high. This is particularly useful when old technology assemblers and compilers are used, which can handle only a 64k address space. The logical addresses are always in 64k, even though the MMU-translated ones are not, so the tools are happy.
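The translation amounts to an add. The model below is deliberately generic: `mmu_window_t` and its fields are hypothetical and do not reproduce the register layout of the 64180, the Z180, or any other real MMU. It assumes a single window and a 1 MB physical space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-window MMU: logical addresses inside the window
   get an offset added; everything else passes through unchanged. */
typedef struct {
    uint16_t window_start;  /* first logical address in the window */
    uint16_t window_end;    /* last logical address in the window  */
    uint32_t offset;        /* added to form the physical address  */
} mmu_window_t;

uint32_t mmu_translate(const mmu_window_t *w, uint16_t logical)
{
    assert(w != NULL && w->window_start <= w->window_end);
    uint32_t physical = logical;
    if (logical >= w->window_start && logical <= w->window_end)
        physical += w->offset;
    return physical & 0xFFFFFu;   /* 1 MB physical address space */
}
```

With a window covering 0000 to 7FFF and an offset of 80000, code linked at 0100 is fetched from physical 80100, yet the tools never see an address above 64k.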

There is one important "gotcha": debugging might be tough. If the code is compiled at 0 but the MMU translates it to, say, 80000, then your logic analyzer, emulator, or other debugging engine will not be able to work symbolically. These hardware debuggers see the address bus (post-MMU translation), and try to match those addresses against the pre-translation addresses generated by the linker. The best solution is to get a modern linker that fixes symbol addresses automatically. For some reason, though, a lot of programmers refuse to desert their old tools.

Viruses, TSRs and the 80x88

Power-up is not the only domain of relocatable code. In the DOS world too many of us have been stricken with computer viruses, which are nearly always relocatable. TSR programs are as well, as it is difficult to say just where a TSR (or virus) will be loaded.

The 80x88's segment registers let you build a more interesting relocation scheme, especially for small programs. In effect, every address reference is formed by adding the instruction's offset to the base held in one of the processor's segment registers. Just change the register, and the entire address of the program changes as well. If the program is less than one segment long (64k), it can be entirely relocatable with or without using PC-relative instructions.
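The real-mode address calculation is simple enough to write out. The function names below are made up for the sketch. The arithmetic (segment times 16 plus offset, wrapped to 20 bits) is how the 8088 forms a physical address.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address the 8088 puts on its 20 address lines. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Segment register value that places a program at physical_base.
   The base must sit on a 16-byte (paragraph) boundary below 1 MB. */
uint16_t segment_for_base(uint32_t physical_base)
{
    assert(physical_base < 0x100000u && (physical_base & 0xFu) == 0);
    return (uint16_t)(physical_base >> 4);
}
```

A routine at offset 0100 runs at 10100 with a segment of 1000 and at 20100 with a segment of 2000. The offsets inside the program never change, so only the segment register has to be loaded differently.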

However, Intel did remove most of the power-up difficulties from the 80x88 family by making it boot from the end of the address space, rather than from the beginning. It seems most ROMed programs execute from high memory to save the trouble of adding relocating circuitry.