By Jack Ganssle

Some Assembly Required

Published 6/03/04

No doubt there's some truth to that, but surely the computational cost of an elaborate GUI supporting vast filesystems bears much of the blame. I wonder, too, if some of the much-vaunted gain in raw processor speed is lost by cache-busting programs like Word. Sure, a 3 GHz machine screams on small apps running entirely from the fastest RAM. But wander out of cache and the system has to issue hundreds of wait states for each main memory load or store. Without cache, with 50 nsec DRAMs, a fast Pentium crawls along at 20 MHz.
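The arithmetic behind that 20 MHz figure is easy to check. The sketch below is a deliberately simplified model: it assumes every access goes to DRAM and costs exactly one access time, and it ignores bus overhead, burst transfers, and pipelining. The function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Rate of back-to-back uncached accesses, in MHz, for a DRAM
   access time given in nanoseconds (simplified model). */
double uncached_rate_mhz(double dram_ns)
{
    assert(dram_ns > 0.0);
    return 1000.0 / dram_ns;    /* 1/ns is GHz; times 1000 gives MHz */
}

/* CPU clock cycles that tick by during one DRAM access. */
double stall_cycles(double cpu_ghz, double dram_ns)
{
    assert(cpu_ghz > 0.0 && dram_ns > 0.0);
    return cpu_ghz * dram_ns;   /* GHz times ns gives cycles */
}
```

Under these assumptions, uncached_rate_mhz(50.0) gives 20 MHz, and stall_cycles(3.0, 50.0) gives 150 clocks spent waiting on each access by a 3 GHz part.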

Hyde claims knowledge of assembly gives developers the ability to craft faster high level code. Of that I'm not so sure. If we patterned C++ code after assembly language we'd always use pointers (a very assembly-like construct) instead of arrays even when an array is the best solution, and eschew automatic variables in favor of globals. And surely the code would be polluted with an excess of GOTOs; conditional GOTOs most likely, since if(variable) goto x; looks just like jnz x.
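To make the contrast concrete, here is a small sketch of the same job, summing an array, written both ways. The function names are made up for illustration, and neither version is claimed to be faster; what a given compiler generates for each is exactly the kind of thing worth checking rather than assuming.

```c
#include <assert.h>
#include <stddef.h>

/* Idiomatic C: array indexing and a structured loop. */
long sum_structured(const int values[], size_t count)
{
    long total = 0;
    assert(values != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Assembly-flavored C: a walking pointer, a down-counter, and a
   conditional goto standing in for a decrement-and-jnz pair. */
long sum_assembly_style(const int *p, size_t count)
{
    long total = 0;
    assert(p != NULL || count == 0);
    if (count == 0) goto done;
loop:
    total += *p++;
    if (--count) goto loop;     /* reads like: dec count / jnz loop */
done:
    return total;
}
```

Both functions return the same result for the same input. The second one simply makes the reader do the work of recovering the loop that the first one states outright.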

It's awfully hard to correlate high level structures with the handful of assembly instructions most compilers use. Creating an object invokes a constructor, which generates what instructions? And what can I do to optimize it? Search a string and the compiler will almost certainly invoke a runtime routine, which is hidden in object form in some inscrutable library. We rely on the compiler to abstract us from these low level details, and expect it to generate an efficient translation.

And yet there is value in knowing what your tools do. In the embedded space we're faced with real-time constraints that are unheard-of in the PC/workstation world. Certain sections of our code are always performance bound.

Consider interrupt service routines. Except for the most demanding applications I'd never advocate writing ISRs in assembly, yet using C has its own set of problems. Does that a = b*c; statement execute in a microsecond, or a week? Do we dare use floating point when the routine must complete in less than 100 µsec?

A friend once told me of working on a Navy job using the CMS-2 language. The compiler was so awful they learned to write a source function, compile it, and then examine the resulting assembly before doing any testing. If the code looked wrong they'd change something - maybe even spacing - in the source, and recompile, hoping the change would trick the tool into generating correct code. I laughed, thinking that's like programming into a black hole of uncertainty. Yet unless we know what our tools do, what sorts of code they're likely to generate, writing real-time code is also coding into a black hole. If the function isn't fast enough, we change something, nearly at random, hoping to get better performance.

And so I go a step further than Mr. Hyde. Don't structure your high level source code like assembly language, and never think in assembly when cranking C/C++ code. But for time-critical sections do examine the generated code. Look for simple optimizations, and be wary of calls to runtime routines. Always instrument ISRs and other performance-bound functions to measure their execution time.
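One way to instrument an ISR is to read a free-running timer on entry and exit and keep the worst case seen. The sketch below is only an illustration: read_timer(), fake_timer, simulated_work_ticks, and the other names are hypothetical stand-ins so the code can run on a desktop. On real hardware read_timer() would read a timer or cycle-count register, and isr_body() would be the actual handler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a free-running hardware timer. */
static volatile uint32_t fake_timer;
static uint32_t simulated_work_ticks;   /* pretend cost of the handler */

uint32_t read_timer(void) { return fake_timer; }

/* Placeholder for the real handler; here it just burns simulated time. */
void isr_body(void) { fake_timer += simulated_work_ticks; }

volatile uint32_t isr_worst_ticks;      /* longest execution observed */

/* Wrapper that times each invocation of the handler. */
void timed_isr(void)
{
    uint32_t start = read_timer();
    isr_body();
    /* Unsigned subtraction stays correct across one timer wraparound. */
    uint32_t elapsed = read_timer() - start;
    if (elapsed > isr_worst_ticks) {
        isr_worst_ticks = elapsed;
    }
}

/* Debug-build check that the handler has stayed within its budget. */
void check_isr_budget(uint32_t budget_ticks)
{
    assert(isr_worst_ticks <= budget_ticks);
}
```

Tracking the maximum rather than the average is a deliberate choice here, since a real-time deadline is blown by the slowest pass, not the typical one. Toggling a spare output pin on entry and exit and watching it on a scope is a common alternative when no timer is free.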

Great firmware programmers do know assembly. They embrace it. For these low level routines are where the C meets the assembly and the hardware. Lights are flashing, motors spinning and analog zings in and out. For me, working on the boundary of the system where the firmware meets the real world is the best part of any project.