Microcontroller C Compilers
This article discusses the state of C for controllers circa 1990.
Published in Electronic Engineering Times, November 1990
C has taken the software industry by storm. In the last few years
even the microcontroller industry, the last bastion of pure assembly
language, has seen exponential growth in the use of C.
C's rapid proliferation spawned dozens of compilers targeted to
embedded systems. Their quality ranges from nearly perfect to
downright unusable. Sometimes the compiler's caliber is inversely
proportional to the data sheet's gloss and the product's cost.
Unfortunately the best characteristics are apparent in the vendor's
advertising; the worst don't show up until you've used the compiler
for some time.
Code size and speed are usually uppermost programming concerns,
especially to cynical dyed-in-the-wool assembly programmers being
dragged into a high level language for the first time. I hesitate
to add to the hype about efficiency, except to state the obvious:
C code is slower and larger than comparable assembly. But
not by much - in most cases the difference is only about 25%.
It's hard to beat C for a large embedded project. Everyone admits
that it is less efficient than assembly, but C will reduce non-recurring
engineering (NRE) costs by a factor of 3 or more. In most products
the savings in design, coding and maintenance quickly justify
the extra memory expenses. Where is the crossover between NRE
and recurring costs? How many units will you have to sell before
the larger ROM costs more than the $50, $100, or $150 thousand
saved in NRE?
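The crossover question comes down to one division: the NRE saved by C over the extra ROM cost per unit. The sketch below is a minimal illustration with hypothetical figures (a $100,000 NRE saving and $2.50 of extra ROM per unit); the function name is made up, and money is kept in cents so the arithmetic stays in integers.

```c
#include <limits.h>

/* Units sold at which the extra per-unit ROM cost of using C adds up
 * to the NRE that C saved. Money is in cents to stay in integers. */
unsigned long breakeven_units(unsigned long nre_saved_cents,
                              unsigned long extra_rom_cents_per_unit)
{
    if (extra_rom_cents_per_unit == 0)
        return ULONG_MAX;   /* no recurring penalty: ROM cost never catches up */
    /* Round up, since a fraction of a unit cannot be sold. */
    return (nre_saved_cents + extra_rom_cents_per_unit - 1)
           / extra_rom_cents_per_unit;
}
```

With those assumed numbers, breakeven_units(10000000UL, 250UL) gives 40,000 units before the bigger ROM has eaten the NRE savings.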
This conventional wisdom doesn't directly apply to microcontrollers,
where ROM sizes are fixed and just can't be increased. Still,
the quality of modern compilers is such that the penalty in memory
space for using C is small. Using a controller with a little extra
memory can save a lot during development.
Moore's law states that every two years the silicon wizards will
double the number of transistors on a chip. Therefore memory is
cheap, and gets ever cheaper as densities increase. Unfortunately,
microcontrollers seem to be at the tail end of the benefit chain
of this rule. Even today it's hard to find more than 16k or so
of ROM on a controller. 4k is still not uncommon. Intel recently
introduced a 32k version of the venerable 8051, but 32k is but
a pittance compared to the memory used by big microprocessor-based
programs.
The cardinal rule of design for microcontrollers is to carefully
manage your resources. Certainly the most important and scarcest
resource on any microcontroller is memory. Hoard this precious
commodity; don't squander it in careless decisions. Before selecting
a compiler be sure it is as miserly in its use of memory as you
would be if writing assembly code.
Consider stack operations. The 6805 has no real stack, so the
compiler must laboriously manipulate automatic variables (i.e.,
ones stored on the runtime stack) using the CPU's entirely inadequate
8 bit index register. Stack overhead will therefore occupy a lot
of ROM space. In this sort of situation perhaps it makes sense
to adapt to the processor's architecture and use static variables
everywhere, but you'll pay a penalty in RAM requirements. At the
other end of the spectrum, the embedded 68302 is a C programmer's
dream platform, since stack relative addressing is fast and easy.
Many microcontroller C compilers gain speed at the expense of memory.
Take the 8051: stack accesses are so cumbersome
that many compilers allocate automatic variables as statics. In
other words, even if "x" is a temporary automatic whose
scope is local to one function, the compiler assigns it a permanent
address in memory. With limited RAM size this can be an important
concern.
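The RAM penalty can be put in numbers. With automatics on a stack, RAM only has to cover the deepest chain of calls that can be active at once; when every automatic gets a permanent address, RAM must hold all of them simultaneously. This sketch computes both figures for a small hypothetical call tree (function names and byte counts are invented), ignoring return addresses and assuming no recursion.

```c
#include <stddef.h>

/* Hypothetical call tree: row f lists the functions f calls, ended by -1. */
#define NFUNCS 4
enum { F_MAIN, F_A, F_B, F_C };

static const unsigned locals_bytes[NFUNCS] = { 4, 10, 20, 6 };
static const int calls[NFUNCS][NFUNCS] = {
    { F_A, F_B, -1 },   /* main calls a and b */
    { F_C, -1 },        /* a calls c */
    { -1 },             /* b is a leaf */
    { -1 },             /* c is a leaf */
};

/* Stack allocation: peak RAM is the deepest chain of live locals. */
unsigned stack_peak(int f)
{
    unsigned deepest = 0;
    for (size_t i = 0; i < NFUNCS && calls[f][i] >= 0; i++) {
        unsigned d = stack_peak(calls[f][i]);
        if (d > deepest)
            deepest = d;
    }
    return locals_bytes[f] + deepest;
}

/* Static allocation: every function's locals occupy RAM all the time. */
unsigned static_total(void)
{
    unsigned sum = 0;
    for (size_t i = 0; i < NFUNCS; i++)
        sum += locals_bytes[i];
    return sum;
}
```

Here static allocation needs 40 bytes against a 24-byte stack peak, because b's locals and the a/c chain are never live at the same time.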
Most of us consider ROM size when making the C versus assembly
decision. RAM is just as important. Obviously the compiler will
allocate RAM space for the stack, variables, structures, and the
like. Some compilers also make use of potentially large amounts
of RAM for internal compiler runtime functions. Library routines
invariably use RAM. Some compilers copy all string literals from
ROM to RAM during the initialization sequence. Why? Just to make
the compilation easier.
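The declarations themselves tell a ROM-aware compiler what may stay in ROM. In the sketch below (all names hypothetical), the const array is the kind of object that can be left in ROM, while the modifiable array must be copied into RAM at startup. Actual placement is decided by the compiler and linker, and a compiler of the sort just described may copy even the const strings.

```c
/* Where each object lands is decided by the compiler and linker, not
 * by the C language; the comments describe the usual intent. */

/* const data is a candidate for ROM: nothing is copied at startup. */
const char banner_rom[] = "READY";

/* A modifiable array must live in RAM. Its initial value is kept in
 * ROM and copied into RAM by the startup code, costing both. */
char banner_ram[] = "READY";

/* A const pointer to a literal also lets a ROM-aware compiler leave
 * the characters in ROM. */
const char *const banner_ptr = "READY";

/* Writing is legal only for the RAM copy. */
void set_banner(char first)
{
    banner_ram[0] = first;
}
```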
Compilers designed for non-embedded programming usually are very
poor at dividing memory into separate RAM and ROM sections. They
assume that the address space is all RAM. Embedded programs reside
in ROM, while the data is stored in a separate RAM area, so the
ability to specify separate starting addresses for code and data
is crucial. But this is far from enough - the ideal compiler will
let you divide your code and data even further. Suppose the product
uses memory-mapped I/O; it's essential that these ports, although
looking like RAM variables to the compiler, be assigned to the
proper absolute addresses. Interrupt vectors are also stored at
fixed locations; the compiler/linker must let you define these
at absolute addresses distinct from the rest of the code.
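In C the usual way to reach a memory-mapped port is to cast a fixed address to a volatile pointer. The sketch below assumes a hypothetical UART with a status register at base+0 and a data register at base+1 (bit 0 meaning "ready"). The base is a parameter only so the code can be exercised on a host; on the target it would be a constant from the data sheet. Placing interrupt vectors is toolchain-specific and is not shown.

```c
#include <stdint.h>

/* Access an 8-bit memory-mapped register at an absolute address.
 * 'volatile' stops the compiler from caching or discarding accesses. */
#define REG8(addr) (*(volatile uint8_t *)(uintptr_t)(addr))

/* Hypothetical UART layout: status at base+0, data at base+1. */
#define UART_STATUS(base) REG8((base) + 0u)
#define UART_DATA(base)   REG8((base) + 1u)
#define UART_TX_READY     0x01u

/* Busy-wait for the transmitter, then write one byte. */
void uart_putc(uintptr_t base, uint8_t c)
{
    while ((UART_STATUS(base) & UART_TX_READY) == 0)
        ;  /* spin until the hardware sets the ready bit */
    UART_DATA(base) = c;
}
```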
All compilers will perform some amount of optimization to minimize
code size or increase speed. Some are truly remarkable, removing
constant expressions from loops and the like. This may not be
a virtue; extensive optimization makes the code impossible to
debug. No one yet knows how to tie optimized code to a debugger.
All meaningful references between the object code and the original
source lines are lost when the compiler moves the source around,
so all debugger vendors insist that you debug with optimizations
turned off. Rather than rely on extensive optimization, write
good code! Don't leave a constant assignment inside a loop where
it will be executed thousands or millions of times. Don't ask
the compiler to convert needlessly between floating point and
integer.
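Both pieces of advice are easy to see in a small example. In the sketch below (function names hypothetical), scale_naive recomputes a loop-invariant float factor on every pass and converts each sample from int to float and back, while scale_int does the same scaling with integer arithmetic only. For moderate values the two agree exactly, but on a processor without floating-point hardware the second is typically much smaller and faster.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale samples by gain/256, the wasteful way. */
void scale_naive(int16_t *buf, size_t n, int gain)
{
    for (size_t i = 0; i < n; i++) {
        float factor = gain / 256.0f;          /* invariant, recomputed each pass */
        buf[i] = (int16_t)(buf[i] * factor);   /* int -> float -> int */
    }
}

/* Same scaling in integer math: no conversions inside the loop. */
void scale_int(int16_t *buf, size_t n, int gain)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (int16_t)(((int32_t)buf[i] * gain) / 256);  /* truncates toward zero, like the float cast */
}
```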
How fast does the compiler translate a program? Many programmers
are now familiar with Turbo C's blazing compilations. No cross
compiler is nearly so fast. In evaluations conducted at Softaid
we measured compile/link times for a 1000 line program ranging
from Turbo's few seconds to over 20 minutes (on the most expensive
compiler evaluated). Expect to use your tools a lot. Demand reasonable
translation times.
Linkers can be even slower than the compiler. After all, a well
designed program is built around quite a few small modules. After
making a change, you'll only recompile the one module that was
affected, but you'll relink the entire program. A slow linker
is a curse to avoid at all costs.
Many cross compilers build their internal data structures in memory,
incorrectly assuming the "huge" address space of the
host development system will be adequate. On a PC much of the 640k
address space is taken up with the compiler itself, DOS, network
drivers, and TSRs. It's not unusual to find a cross compiler unable
to compile big source programs. Virtual-memory compilers, which write
intermediate tables to disk, support programs of any size.
How important is it that the compiler conforms to the ANSI standard?
If the product will be reused many times by other engineering
groups, portability is vital. There is a lot to be said for staying
at least close to ANSI compatibility so the product
can survive a midstream compiler change. In any event, be sure
the compiler at least supports function prototyping. It's a simple
way of automatically checking parameter lists that will eliminate
many hours of debugging. Be sure the compiler actually checks
the parameters, and doesn't merely accept and then ignore the
prototype syntax!
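A prototype is what lets the compiler check a call. In the sketch below (the function names are hypothetical), the prototype tells the compiler that area() takes exactly two ints, so a call with a missing argument becomes a compile-time error instead of a runtime mystery.

```c
/* Prototype: the compiler now knows area() takes exactly two ints. */
long area(int width, int height);

long area(int width, int height)
{
    return (long)width * height;
}

long total_area(void)
{
    /* With the prototype in scope, a call such as area(3) is rejected
     * at compile time. Under the pre-ANSI declaration "long area();"
     * it would compile, and area() would read a garbage 'height'. */
    return area(4, 5) + area(3, 2);
}
```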
If the system will use some form of multitasking or interrupt
handlers written in C, be sure the library is reentrant. Some
aren't. Manuals rarely allude to reentrancy, so compile a tiny
"do nothing" program and measure RAM use. Then, add
library calls without adding variables. Be suspicious if more
RAM is linked in. The most common non-reentrant part of a library
is the floating point package.
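Here is a minimal sketch of the difference, using two hypothetical integer-to-string routines. utoa_static keeps its result in one static buffer shared by every caller, so an interrupt handler that calls it mid-conversion overwrites the main loop's result. utoa_r writes into storage the caller supplies, typically an automatic on the caller's stack, so concurrent callers share nothing. In the RAM test described above, linking in utoa_static would add its 11-byte buffer even though the caller declares no new variables.

```c
#include <stddef.h>

/* Non-reentrant: one static buffer shared by all callers.
 * 11 bytes holds any 32-bit unsigned value plus the terminator. */
const char *utoa_static(unsigned v)
{
    static char buf[11];
    char *p = buf + sizeof buf - 1;
    *p = '\0';
    do {
        *--p = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    return p;
}

/* Reentrant: the caller supplies the storage. 'len' must be large
 * enough for every digit plus the terminator (11 for 32-bit values).
 * Returns a pointer into 'buf'. */
char *utoa_r(unsigned v, char *buf, size_t len)
{
    char *p = buf + len - 1;
    *p = '\0';
    do {
        *--p = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    return p;
}
```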
Non-reentrant code might force you to write all of the interrupt
handlers in assembly. Remember that allocation of automatics can
change the reentrancy characteristics of a function. Automatics
stored on the stack will be intrinsically reentrant; those assigned
to specific RAM locations will not.
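Recursion makes the same failure deterministic and easy to demonstrate on a host. In this sketch (function names hypothetical), fact_static imitates a compiler that gives an "automatic" a fixed RAM address by declaring it static. The nested call clobbers it exactly as an interrupt would.

```c
/* Locals on the stack: every activation has its own copy of 'saved'. */
unsigned long fact_auto(unsigned n)
{
    unsigned saved = n;
    unsigned long rest;
    if (n <= 1)
        return 1;
    rest = fact_auto(n - 1);
    return saved * rest;
}

/* Imitates a compiler that assigns 'saved' a fixed RAM address. */
unsigned long fact_static(unsigned n)
{
    static unsigned saved;   /* one copy shared by every activation */
    unsigned long rest;
    saved = n;
    if (n <= 1)
        return 1;
    rest = fact_static(n - 1);   /* nested call overwrites 'saved' */
    return saved * rest;         /* reads 1 here, not n */
}
```

fact_auto(5) returns 120, but fact_static(5) returns 1 because every level reads the value left behind by the innermost call.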
On the subject of libraries, be sure the compiler includes all
of the functions you'll need. While many embedded systems make
no runtime calls, others depend on extensive library support.
Evaluate your requirements. Will you require special CPU resources?
Does the compiler support these? Most compiler companies use a
common parser and just replace code generation modules when building
variants for different processors. Therefore, register variables,
for example, may not really use registers.
A few compilers will support a CPU's internal memory management
unit. If your design requires extended memory that can be accessed
only via an MMU, be sure the compiler gives you some sort of MMU
control. Some compilers will even automatically remap it and insert
C functions into individual maps. Hitachi's 647180X microcontroller
includes an MMU; others will as time moves on and memory demands
increase.
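A compiler that inserts C functions into individual maps has to switch banks around every cross-bank call. The sketch below shows the idea with a hypothetical bank register, simulated here as a plain variable named bank_reg, and a hypothetical wrapper far_call. On real hardware the wrapper would write the MMU's bank register, would usually be generated by the compiler or written in assembly, and must itself live in unbanked memory.

```c
#include <stdint.h>

/* Stand-in for the MMU's bank register; real hardware would use a
 * memory- or I/O-mapped register, not an ordinary variable. */
volatile uint8_t bank_reg;

typedef int (*far_fn)(int);

/* Call 'fn', which lives in bank 'bank', then restore the caller's
 * mapping. This wrapper must sit in common (unbanked) memory, or it
 * would unmap itself when the bank changes. */
int far_call(uint8_t bank, far_fn fn, int arg)
{
    uint8_t saved = bank_reg;   /* remember the current mapping */
    int result;
    bank_reg = bank;            /* map in the callee's bank */
    result = fn(arg);
    bank_reg = saved;           /* restore before returning */
    return result;
}

/* Hypothetical banked callee that records the bank it ran under. */
uint8_t seen_bank;
int twice(int x)
{
    seen_bank = bank_reg;
    return 2 * x;
}
```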
Is the compiler compatible with the assembler? Compiled C code
must be combined with assembled files via a common linker. Almost
every linker takes a different object file format, essentially
guaranteeing problems in combining tools from different vendors.
In some cases the tools from a single vendor are incompatible.
Don't expect to work around a weak assembler by substituting one
from another software house, since they will rarely be compatible.
Once the code is written you'll have to debug it. Plan to use
a source level debugger (SLD), and be sure the compiler is compatible
with the debugger. Prior to the widespread use of C it was common
to mix and match tools; the assembler, linker, and debugger could
all come from different vendors, yet work together with a minimum
of trouble. The symbol and hex files might need a trivial amount
of conversion to get the tools to work well together, but that
was expected and was really not a lot of trouble. Block-structured
languages like C have changed all of this. Tools are sometimes
like the construction workers on the Tower of Babel. Few are now
really compatible with each other.
Source level C debugging requires a tremendous amount of information
about the program's organization. C is not just an extension of
assembly; line number records and symbol addresses, while sufficient
for programs created with an assembler, are only a fraction of
what is needed for C. Usually most of the difficulty lies with
data representations. Is a variable local to one function? How
is its scope defined in the debug file? Is it a static that has
an absolute memory address, or, as for an automatic type, is it
assigned as an offset from the stack? Is it a register variable?
All of these questions get even more complicated if the variable
is an array or structure.
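To make those questions concrete, here is a deliberately simplified, hypothetical debug record and the lookup a debugger must perform with it. It is not IEEE-695, COFF, or any real format; real formats also carry type records, line tables, and nested block scopes. The point is that the same variable name resolves differently depending on its storage class.

```c
#include <stdint.h>

/* Hypothetical, much-simplified debug record. */
enum storage { LOC_ABSOLUTE, LOC_STACK, LOC_REGISTER };

struct dbg_var {
    const char   *name;
    const char   *scope;   /* enclosing function, or 0 for a global */
    enum storage  kind;
    int32_t       where;   /* address, frame offset, or register number */
    uint16_t      size;    /* bytes; arrays and structs need type records too */
};

/* Resolve a variable's address given the target's current frame
 * pointer. Register variables have no address, so return -1 and let
 * the debugger read the register named by 'where' instead. */
int32_t dbg_address(const struct dbg_var *v, int32_t frame)
{
    switch (v->kind) {
    case LOC_ABSOLUTE: return v->where;
    case LOC_STACK:    return frame + v->where;
    case LOC_REGISTER: return -1;
    }
    return -1;
}
```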
Unfortunately, many compilers produce little or no debug information,
rendering them all but useless in an embedded environment where
troubleshooting by adding print statements just doesn't work.
The 8 bit arena is especially chaotic. While a number of standards
for expressing source debug information have been proposed (such
as IEEE-695 and COFF), few compilers produce these; those that
do often add their own extensions. The quality of information
varies widely and changes almost daily as the vendors scramble
to get their products into better competitive positions. The files
are far more complex than a simple symbol file, so generating
conversion utilities is a difficult and time-consuming process.
SLD vendors must think long and hard before supporting a particular
format.
The moral of the story is to ask hard questions of each of your
development tool vendors. Make no assumptions about compatibility.
Once you have a text file of C code, it must be compiled, linked,
perhaps located, and debugged through an SLD and emulator. Will
each of these tools work together? Will you get full source debug
functionality, like local variables, scope tracking and C line
number support, or will some important feature be compromised?
Finally, try to ensure that the compiler is reliable. Talk to
people who have successfully completed sizable embedded projects
with the tool. How often did the compiler crash, miscompile, or
unexpectedly cost engineering time? I know of one big Navy job
where the government mandated the use of the obscure language CMSC.
At great expense the company acquired a DOD-approved VAX cross
compiler that was so unreliable they were forced to write code
in small sections, compile it, and examine all of the compiler's
assembly output. If the translation was obviously wrong, the programmers
made a more or less random source change and tried again. Your
tax dollars at work.
A lot of factors go into compiler selection. When you finally
make a decision, buy the product and immediately run tests to
evaluate the product's usefulness. A few days of testing can reveal
many fatal flaws. Return it if it is unacceptable - reputable
companies will always take a return if made within the first week
or two.