Microcontroller C Compilers

This article discusses the state of C for controllers circa 1990.

Published in Electronic Engineering Times, November 1990

By Jack Ganssle

C has taken the software industry by storm. In the last few years even the microcontroller industry, the last bastion of pure assembly language, has experienced exponential growth in the use of C.

C's rapid proliferation spawned dozens of compilers targeted to embedded systems. Their quality ranges from nearly perfect to downright unusable. Sometimes the compiler's caliber is inversely proportional to the data sheet's gloss and the product's cost. Unfortunately, the best characteristics are apparent in the vendor's advertising; the worst don't show up until you've used the product for some time.

Code size and speed are usually uppermost programming concerns, especially to cynical dyed-in-the-wool assembly programmers being dragged into a high level language for the first time. I hesitate to add to the hype about efficiency, except to state the obvious, that C code is slower and larger than comparable assembly. But not by much - in most cases the difference is only about 25%.

It's hard to beat C for a large embedded project. Everyone admits that it is less efficient than assembly, but C will reduce non-recurring engineering (NRE) costs by a factor of 3 or more. In most products the savings in design, coding and maintenance quickly justify the extra memory expenses. Where is the crossover between NRE and recurring costs? How many units will you have to sell before the larger ROM costs more than the extra $50, 100, or 150 thousand in NRE?
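The crossover question is simple arithmetic: divide the NRE saved by the extra memory cost per unit. The sketch below is only an illustration; the function name, the use of cents as the unit, and the per-unit figure in the usage note are assumptions, not data from any real project.

```c
/* Hypothetical helper: the number of units at which the extra
 * per-unit ROM cost of the C version has used up the NRE it saved.
 * All amounts are in cents so fractional-dollar costs stay integral.
 * Returns 0 as a sentinel when there is no extra per-unit cost,
 * in which case C is cheaper at any volume. */
unsigned long break_even_units(unsigned long nre_savings_cents,
                               unsigned long extra_rom_cents_per_unit)
{
    if (extra_rom_cents_per_unit == 0)
        return 0;

    /* Round up: the crossover is the unit that pushes cumulative
     * memory cost to or past the savings. */
    return (nre_savings_cents + extra_rom_cents_per_unit - 1)
           / extra_rom_cents_per_unit;
}
```

With $50,000 of NRE saved and an assumed $2.00 of extra ROM per unit, break_even_units(5000000UL, 200UL) returns 25000. Below that volume, C is the cheaper choice.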

This conventional wisdom doesn't directly apply to microcontrollers, where ROM sizes are fixed and just can't be increased. Still, the quality of modern compilers is such that the penalty in memory space for using C is small. Using a controller with a little extra memory can save a lot during development.

Moore's law states that every two years the silicon wizards will double the number of transistors on a chip. Therefore memory is cheap, and gets ever cheaper as densities increase. Unfortunately, microcontrollers seem to be at the tail end of the benefit chain of this rule. Even today it's hard to find more than 16k or so of ROM on a controller. 4k is still not uncommon. Intel did recently introduce a 32k version of the venerable 8051, but 32k is but a pittance compared to the memory used by big microprocessor-based programs.

The cardinal rule of design for microcontrollers is to carefully manage your resources. Certainly the most important and scarcest resource on any microcontroller is memory. Hoard this precious commodity; don't squander it in careless decisions. Before selecting a compiler be sure it is as miserly in its use of memory as you would be if writing assembly code.

Consider stack operations. The 6805 has no real stack, so the compiler must laboriously manipulate automatic variables (i.e., ones stored on the runtime stack) using the CPU's entirely inadequate 8 bit index register. Stack overhead will therefore occupy a lot of ROM space. In this sort of situation perhaps it makes sense to adapt to the processor's architecture and use static variables everywhere, but you'll pay a penalty in RAM requirements. At the other end of the spectrum, the embedded 68302 is a C programmer's dream platform, since stack relative addressing is fast and easy.

Many microcontroller C compilers speed operations by trading memory for speed. Take the 8051: stack accesses are so cumbersome that many compilers allocate automatic variables as statics. In other words, even if "x" is a temporary automatic whose scope is local to one function, the compiler assigns it a permanent address in memory. With limited RAM size this can be an important concern.
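To make the trade concrete, here is a rough sketch of what such a compiler effectively does. The function names are invented for illustration, and the second function is only an approximation in C of the generated code, not the output of any particular compiler.

```c
/* As written by the programmer: "x" is an automatic, conceptually
 * occupying stack space only while scale() is running. */
int scale(int in)
{
    int x = in * 3;
    return x + 1;
}

/* Approximately what a static-allocating compiler produces: "x" is
 * given a fixed RAM address for the life of the program. Access is
 * fast (direct addressing, no stack arithmetic), but the RAM stays
 * reserved even when the function is not running. */
int scale_static(int in)
{
    static int x;   /* permanent RAM location */
    x = in * 3;
    return x + 1;
}
```

Both versions return the same value for the same input. The difference shows up in the memory map, and in reentrancy, since two overlapping calls to the second version would share the one copy of "x".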

Most of us consider ROM size when making the C versus assembly decision. RAM is just as important. Obviously the compiler will allocate RAM space for the stack, variables, structures, and the like. Some compilers also make use of potentially large amounts of RAM for internal compiler runtime functions. Library routines invariably use RAM. Some compilers copy all string literals from ROM to RAM during the initialization sequence. Why? Just to make the compilation easier.

Compilers designed for non-embedded programming usually are very poor at dividing memory into separate RAM and ROM sections. They assume that the address space is all RAM. Embedded programs reside in ROM, while the data is stored in a remote RAM area, so the ability to specify separate starting addresses for code and data is crucial. But this is far from enough - the ideal compiler will let you divide your code and data even further. Suppose the product uses memory-mapped I/O; it's essential that these ports, although looking like RAM variables to the compiler, be assigned to the proper absolute addresses. Interrupt vectors are also stored at fixed locations; the compiler/linker must let you define these at absolute addresses distinct from the rest of the code.
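When the toolchain offers no special syntax for placing a variable at a fixed address, a common fallback for memory-mapped I/O is a cast to a volatile pointer. The sketch below assumes an 8 bit port at 0x8000; that address and all of the names are hypothetical. The helpers take the port as a parameter so the same code can be exercised against an ordinary variable on the host.

```c
/* Assumed, illustrative address of an 8 bit output port. A real
 * design takes this from the hardware memory map. */
#define PORT_A_ADDR 0x8000u

/* "volatile" tells the compiler that every access is a real bus
 * cycle that must not be cached in a register or optimized away. */
typedef volatile unsigned char io8_t;

/* On the target, the port is reached through this pointer. */
#define PORT_A ((io8_t *)PORT_A_ADDR)

/* Set the bits in mask, leaving the others alone. */
static void port_set_bits(io8_t *port, unsigned char mask)
{
    *port |= mask;                    /* read-modify-write */
}

/* Clear the bits in mask, leaving the others alone. */
static void port_clear_bits(io8_t *port, unsigned char mask)
{
    *port &= (unsigned char)~mask;
}
```

On the target a call would look like port_set_bits(PORT_A, 0x80). Interrupt vectors are a different matter: placing them at their fixed locations is normally the job of the linker or locator, and the mechanism differs from vendor to vendor.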

All compilers will perform some amount of optimization to minimize code size or increase speed. Some are truly remarkable, removing constant expressions from loops and the like. This may not be a virtue; extensive optimization makes the code impossible to debug. No one yet knows how to tie optimized code to a debugger. All meaningful references between the object code and the original source lines are lost when the compiler moves the source around, so all debugger vendors insist that you debug with optimizations turned off. Rather than rely on extensive optimization, write good code! Don't leave a constant assignment inside a loop where it will be executed thousands or millions of times. Don't ask the compiler to convert needlessly between floating point and integer.
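As a small illustration of doing the optimizer's job by hand, compare the two routines below. The names and the percent-gain scheme are invented for the example.

```c
/* Wasteful: the integer-to-float conversion and the division are
 * repeated on every pass through the loop unless the optimizer
 * happens to hoist them. */
void scale_naive(float *buf, int n, int gain_percent)
{
    int i;
    for (i = 0; i < n; i++)
        buf[i] = buf[i] * ((float)gain_percent / 100.0f);
}

/* Better: the loop-invariant work is done once, before the loop,
 * so the result does not depend on the optimization level. */
void scale_hoisted(float *buf, int n, int gain_percent)
{
    const float gain = (float)gain_percent / 100.0f;
    int i;
    for (i = 0; i < n; i++)
        buf[i] *= gain;
}
```

The second form costs nothing in readability and behaves the same whether optimizations are on for production or off for debugging.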

How fast does the compiler translate a program? Many programmers are now familiar with Turbo C's blazing compilations. No cross compiler is nearly so fast. In evaluations conducted at Softaid, compile times ranged from Turbo's few seconds to over 20 minutes (on the most expensive compiler evaluated). Expect to use your tools a lot. Demand reasonable translation times.

Linkers can be even slower than the compiler. After all, a well designed program is built around quite a few small modules. After making a change, you'll only recompile the one module that was affected, but you'll relink the entire program. A slow linker is a curse to avoid at all costs.

Many cross compilers build their internal data structures in memory, incorrectly assuming the "huge" address space of the host development system will be adequate. On a PC much of the 640k address space is taken up with the compiler itself, DOS, network drivers, and TSRs. It's not unusual to find a cross compiler unable to compile big source programs. Virtual products, which write intermediate tables to disk, support programs of any size.

How important is it that the compiler conforms to the ANSI standard? If the product will be reused many times by other engineering groups, portability is vital. There is a lot to be said for using at least a close facsimile of ANSI compatibility so the product can survive a midstream compiler change. In any event, be sure the compiler at least supports function prototyping. It's a simple way of automatically checking parameter lists that will eliminate many hours of debugging. Be sure the compiler actually checks the parameters, and doesn't simply accept the prototype syntax and then ignore it!
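A quick way to find out whether a compiler really checks prototypes is to feed it a deliberately wrong call. The routine below is a made-up example; the bad calls are shown in a comment so the fragment itself still compiles.

```c
/* Prototype: declares the number and types of the parameters
 * before any call is compiled. */
long checksum(const unsigned char *data, unsigned int len);

/* Simple additive checksum over len bytes. */
long checksum(const unsigned char *data, unsigned int len)
{
    long sum = 0;
    while (len--)
        sum += *data++;
    return sum;
}

/* With the prototype in scope, a compiler that truly checks
 * parameters should reject or warn about calls such as:
 *
 *     checksum(buffer);        (too few arguments)
 *     checksum(42, buffer);    (arguments swapped, wrong types)
 *
 * If both compile without a murmur, the prototype is being
 * parsed but not enforced. */
```

Pasting the two bad calls into a scratch file takes a minute and answers the question before you commit to the tool.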

If the system will use some form of multitasking or interrupt handlers written in C, be sure the library is reentrant. Some aren't. Manuals rarely allude to reentrancy, so compile a tiny "do nothing" program and measure RAM use. Then, add library calls without adding variables. Be suspicious if more RAM is linked in. The most common non-reentrant part of a library is the floating point package.

Non-reentrant code might force you to write all of the interrupt handlers in assembly. Remember that allocation of automatics can change the reentrancy characteristics of a function. Automatics stored on the stack will be intrinsically reentrant; those assigned to specific RAM locations will not.
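The difference is easy to see in a small sketch. Both routines below convert a byte to two hex digits. The names are hypothetical, and the second version is reentrant only on a compiler that passes parameters and automatics on a real stack.

```c
/* NOT reentrant: the result lives in one fixed RAM buffer. If an
 * interrupt handler calls this while the main line is still using
 * an earlier result, that result is overwritten underneath it. */
char *hex_byte_static(unsigned char value)
{
    static char buf[3];
    static const char digits[] = "0123456789ABCDEF";

    buf[0] = digits[value >> 4];
    buf[1] = digits[value & 0x0F];
    buf[2] = '\0';
    return buf;
}

/* Reentrant (given stack-based parameters): each caller supplies
 * its own storage, so overlapping calls cannot disturb each other.
 * The digit table is read-only and therefore safe to share. */
char *hex_byte_r(unsigned char value, char out[3])
{
    static const char digits[] = "0123456789ABCDEF";

    out[0] = digits[value >> 4];
    out[1] = digits[value & 0x0F];
    out[2] = '\0';
    return out;
}
```

If the main line holds the pointer returned by hex_byte_static(0x1A) and an interrupt then calls hex_byte_static(0xFF), the main line's string silently becomes "FF". With hex_byte_r each caller keeps its own buffer, and the problem cannot arise.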

On the subject of libraries, be sure the compiler includes all of the functions you'll need. While many embedded systems make no runtime calls, others depend on extensive library support. Evaluate your requirements. Will you require special CPU resources? Does the compiler support these? Most compiler companies use a common parser and just replace code generation modules when building variants for different processors. Therefore, register variables, for example, may not really use registers.

A few compilers will support a CPU's internal memory management unit. If your design requires extended memory that can be accessed only via an MMU, be sure the compiler gives you some sort of MMU control. Some compilers will even automatically remap it and insert C functions into individual maps. Hitachi's 647180X microcontroller includes an MMU; others will as time moves on and memory demands increase.

Is the compiler compatible with the assembler? Compiled C code must be combined with assembled files via a common linker. Almost every linker takes a different object file format, essentially guaranteeing problems in combining tools from different vendors. In some cases the tools from a single vendor are incompatible. Don't expect to work around a weak assembler by substituting one from another software house, since they will rarely be compatible.

Once the code is written you'll have to debug it. Plan to use a source level debugger (SLD), and be sure the compiler is compatible with the debugger. Prior to the widespread use of C it was common to mix and match tools; the assembler, linker, and debugger could all come from different vendors, yet work together with a minimum of trouble. The symbol and hex files might need a trivial amount of conversion to get the tools to work well together, but that was expected and was really not a lot of trouble. Block-structured languages like C have changed all of this. Tools are sometimes like the construction workers on the Tower of Babel. Few are now really compatible with each other.

Source level C debugging requires a tremendous amount of information about the program's organization. C is not just an extension of assembly; line number records and symbol addresses, while sufficient for programs created with an assembler, are only a fraction of what is needed for C. Usually most of the difficulty lies with data representations. Is a variable local to one function? How is its scope defined in the debug file? Is it a static that has an absolute memory address, or, as for an automatic type, is it assigned as an offset from the stack? Is it a register variable? All of these questions get even more complicated if the variable is an array or structure.

Unfortunately, many compilers produce little or no debug information, rendering them all but useless in an embedded environment where troubleshooting by adding print statements just doesn't work.

The 8 bit arena is especially chaotic. While a number of standards for expressing source debug information have been proposed (such as IEEE-695 and COFF), few compilers produce these; those that do often add their own extensions. The quality of information varies widely and changes almost daily as the vendors scramble to get their products into better competitive positions. The files are far more complex than a simple symbol file, so generating conversion utilities is a difficult and time-consuming process. SLD vendors must think long and hard before supporting a particular format.

The moral of the story is to ask hard questions of each of your development tool vendors. Make no assumptions about compatibility. Once you have a text file of C code, it must be compiled, linked, perhaps located, and debugged through an SLD and emulator. Will each of these tools work together? Will you get full source debug functionality, like local variables, scope tracking and C line number support, or will some important feature be compromised?

Finally, try to ensure that the compiler is reliable. Talk to people who have successfully completed sizable embedded projects with the tool. How often did the compiler crash, miscompile, or unexpectedly cost engineering time? I know of one big Navy job where the government mandated the use of the obscure language CMSC. At great expense the company acquired a DOD-approved VAX cross compiler that was so unreliable they were forced to write code in small sections, compile it, and examine all of the compiler's assembly output. If the translation was obviously wrong, the programmers made a more or less random source change and tried again. Your tax dollars at work.

A lot of factors go into compiler selection. When you finally make a decision, buy the product and immediately run tests to evaluate the product's usefulness. A few days of testing can reveal many fatal flaws. Return it if it is unacceptable - reputable companies will always take a return if made within the first week or two.