By Jack Ganssle

A Pox on Globals

Published in Embedded Systems Design, October 2006

If God didn't want us to use global variables, he wouldn't have invented them. Rather than disappoint God, use as many globals as possible.

This must have been the philosophy of the developer I know who wrote a program over 100K lines long that sported a nifty 5000 global variables. Five thousand. The effect: he was the only person in the universe who could maintain the code. Yet I heard constant complaints from him about being "stuck on the project's maintenance."

Then he quit.

Regular readers of this column know I'm obsessed with global variables, the scourge that plunges so many systems into disaster. Globals are seductive; they leer at us as potential resources, crooning "just put one in here, how bad can it be?" Like a teenager mismanaging a credit card, that first careless use too often leads to another and another, until, as Thomas McGuane wrote, "The night wrote a check the morning couldn't cash."

But "globals" is a term that refers to much more than just variables. Any shared resource is a potential for disaster. That's why we all write device drivers to handle peripherals, layering a level of insulation between their real-world grittiness and the rest of our code.

You do religiously use device drivers, don't you? I read a lot of C; it's astonishing how developers sprinkle thoughtless input/output instructions throughout the code like Johnny Appleseed tossing seeds into the wind.

The Problem

Globals break the important principle of information hiding. Anyone can completely comprehend a small system with ten variables and a couple of hundred lines of code. Scale that by an order of magnitude or three and one soon gets swamped in managing implementation details. Is user_input a char or an int? It's defined in some header, somewhere. When thousands of variables are always in scope, it doesn't take much of a brain glitch or typo to enter set_data instead of data_set, which may refer to an entirely unrelated variable.

Next, globals can be unexpectedly stepped on by anyone: other developers, tasks, functions, and ISRs. Debugging is confounded by the sheer difficulty of tracking down the errant routine. Everyone is reading and writing that variable anyway; how can you isolate the one bad access out of a million, especially using the typically crummy breakpoint resources offered by most bit-wiggling BDMs?

Globals lead to strong coupling, a fundamental no-no in computer science. eXtreme Programming's belief that "everything changes all of the time" rings true. When a global's type, size or meaning changes it's likely the software team will have to track down and change every reference to that variable. That's hardly the basis of highly productive development.

Multi-tasking systems and those handling interrupts suffer from severe reentrancy problems when globals pepper the code. An 8-bit processor may need several instructions to perform a simple 16-bit integer assignment. Inevitably an interrupt will occur between those instructions. If the ISR or another task then tries to read or set that variable, the Four Horsemen of the Apocalypse will ride through the door. Reentrancy problems are rarely reproducible, so that once-a-week crash, which is nearly impossible to track using most debugging tools, will keep you working plenty of late hours.
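To make the 8-bit hazard concrete, here is a hosted simulation rather than real target code. The 16-bit variable is stored as two bytes, the reader fetches them one at a time as an 8-bit CPU would, and a hypothetical simulated_isr() is called between the two loads to stand in for an interrupt that increments the value. Starting from 0x00FF, the reader sees neither the old value (0x00FF) nor the new one (0x0100).

```c
#include <stdint.h>

/* A 16-bit counter as an 8-bit CPU holds it: two separate bytes. */
static volatile uint8_t counter_lo = 0xFF;
static volatile uint8_t counter_hi = 0x00;

/* Hypothetical stand-in for an ISR that increments the 16-bit counter. */
static void simulated_isr(void)
{
    uint16_t v = (uint16_t)((counter_hi << 8) | counter_lo);
    v++;
    counter_lo = (uint8_t)(v & 0xFF);
    counter_hi = (uint8_t)(v >> 8);
}

/* Reads the counter one byte at a time; the "interrupt" fires between
   the two loads, just as a real one can between two instructions. */
uint16_t read_counter_torn(void)
{
    uint8_t lo = counter_lo;   /* first load: low byte (0xFF)          */
    simulated_isr();           /* interrupt: counter becomes 0x0100    */
    uint8_t hi = counter_hi;   /* second load: high byte (0x01)        */
    return (uint16_t)((hi << 8) | lo);   /* 0x01FF: a value that never existed */
}
```

On real hardware the interrupt lands at an unpredictable moment, so the torn value shows up only occasionally, which is exactly why these bugs resist reproduction.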

Or then there's the clever team member who thoughtlessly adds recursion to a routine which manipulates a global. If such recursion is needed, changing the variable to emulate a stack-based automatic may mean ripping up vast amounts of other code that shares the same global.
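A small, contrived illustration of the recursion trap, using hypothetical function names: both routines compute n!, but one parks its multiplier in a global while the other keeps it in an automatic (stack) variable. Every recursive call overwrites the single global cell, so by the time the outer calls use it, it holds the innermost value.

```c
/* Shared scratch variable: every level of recursion writes the same cell. */
static int scratch;

int factorial_global(int n)
{
    if (n <= 1)
        return 1;
    scratch = n;                       /* save n in the global...              */
    int r = factorial_global(n - 1);   /* ...which the recursive call clobbers */
    return scratch * r;                /* for n >= 2, scratch is now 2, not n  */
}

int factorial_auto(int n)
{
    if (n <= 1)
        return 1;
    int mine = n;                      /* automatic: one copy per call frame */
    int r = factorial_auto(n - 1);
    return mine * r;
}
```

factorial_auto(4) returns 24, while factorial_global(4) returns 8. Fixing the global version means turning scratch into a local, which is easy here but painful when other code shares that global.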

Globals destroy reuse. The close coupling between big parts of the code means everything is undocumentably interdependent. The software isn't a collection of packages; it's a web of ineffably interconnected parts.

Finally, globals are addictive. Late at night, tired, crashing from the uber-caffeinated drink consumed hours ago, we sneak one in. Yeah, that's poor design, but it's just this once. That lack of discipline cracks open the door of chaos. It's a bit easier next time to add yet another global; after all, the software already has this mess in it. Iterated, this dysfunctional behavior becomes habitual.

Why do we stop at a red light on a deserted street at 3 AM? The threat of a cop hiding behind a billboard is a deterrent, as is the ever-expanding network of red-light cameras. But breaking the rules leads to rule-breaking. Bad habits replace the good ones far too easily, so a mature driver carefully stops to avoid practicing dangerous behavior. In exceptional circumstances, of course, (the kid is bleeding) we toss the rules and speed to the hospital.

The same holds true in software development. We don't use globals as a lazy alternative to good design. But in some exceptional conditions there's no alternative.

Alternatives to Globals

Encapsulation is the anti-global pattern. Shared resources of all stripes should cower behind the protection of a driver. The resource - be it a global variable or an I/O device - is private to that driver.

A function like void variable_set(int data) sets the global (in this case for an int), and a corresponding int variable_get() reads the data. Clearly, in C at least, the global is still not really local to a single function; it's filescopic to both driver routines (more on this later) so both of these functions can access it.
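A minimal sketch of that pair of routines, assuming the data lives in its own file (the name variable.c is hypothetical). The static keyword gives the variable file scope, so variable_set() and variable_get() are the only way in or out for the rest of the program.

```c
/* variable.c: the only file that can touch the data. */

/* static: visible to the two routines below, invisible to every other file. */
static int variable;

/* Sole write path into the ex-global. */
void variable_set(int data)
{
    variable = data;
}

/* Sole read path out of the ex-global. */
int variable_get(void)
{
    return variable;
}
```

Other files see only the two prototypes, typically published in a matching header.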

But there are some additional perks that come from this sort of encapsulation.

In embedded systems it's often impossible, due to memory or CPU-cycle constraints, to range-check variables. When one is global, every access to that variable requires a check, which quickly burns ROM space and programmer patience. The result is, again, we form the habit of range-checking nothing. Have you seen the picture of the parking meter displaying a total due of 8.1 E+6 dollars? Or the electronic billboard showing a 505 degree temperature? Ariane 5 was lost, to the tune of several hundred million dollars, in part because of unchecked variables whose values were insane. I bet those developers wish they had checked the range of critical variables.

An encapsulated variable requires but a single test, one if statement, to toss an exception if the data is whacky. If CPU cycles are in short supply it might be possible to eliminate even that overhead with a compile-time switch that at least traps such errors at debug time.
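A sketch of that single range check, assuming a hypothetical legal range of 0 to 100 and a hypothetical compile-time switch named VARIABLE_RANGE_CHECK. C has no exceptions, so this version stands in for "tossing an exception" by refusing the write and returning false; a real system might instead log the fault or call an error handler.

```c
#include <stdbool.h>

/* Hypothetical limits for this particular variable. */
#define VARIABLE_MIN 0
#define VARIABLE_MAX 100

/* Build with -DVARIABLE_RANGE_CHECK=0 to drop the test when cycles are tight. */
#ifndef VARIABLE_RANGE_CHECK
#define VARIABLE_RANGE_CHECK 1
#endif

static int variable;

/* Returns true if the value was stored, false if it was rejected. */
bool variable_set(int data)
{
#if VARIABLE_RANGE_CHECK
    /* The one and only range test, shared by every writer in the system. */
    if (data < VARIABLE_MIN || data > VARIABLE_MAX)
        return false;          /* whacky data never reaches the variable */
#endif
    variable = data;
    return true;
}

int variable_get(void)
{
    return variable;
}
```

Every caller gets the protection of that one test; with a raw global, the same check would have to be repeated at each write.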

Encapsulation yields another cool debugging trick. Use a #define to override the call to variable_set(data) as follows:
#define variable_set(data) variable_set_debug(data, __FILE__, __LINE__)

...and modify the driver routine to stuff the extra two parameters into a log file, a circular buffer, or a display. Or save that data only when there's an out-of-range error. This little bit of extra information tracks the error to its source.
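Here is one way the debug variant might look, assuming a hypothetical 8-entry circular log held in RAM; the names write_log, log_entry, last_write, and LOG_SIZE are illustrative. Because the macro expands at each call site, __FILE__ and __LINE__ identify the caller, not the driver.

```c
#define LOG_SIZE 8   /* hypothetical: keep the last 8 writes */

/* One log record: who wrote what. */
struct log_entry {
    const char *file;
    int         line;
    int         data;
};

static int variable;
static struct log_entry write_log[LOG_SIZE];
static unsigned log_next;    /* total writes so far; index wraps modulo LOG_SIZE */

/* Debug version of the setter: records the caller, then stores the value. */
void variable_set_debug(int data, const char *file, int line)
{
    struct log_entry *e = &write_log[log_next % LOG_SIZE];
    e->file = file;
    e->line = line;
    e->data = data;
    log_next++;
    variable = data;
}

int variable_get(void)
{
    return variable;
}

/* Every call site now reports its own file and line. */
#define variable_set(data) variable_set_debug(data, __FILE__, __LINE__)

/* Most recent log record; meaningful only after at least one write. */
const struct log_entry *last_write(void)
{
    return &write_log[(log_next - 1) % LOG_SIZE];
}
```

When a bad value turns up, the log shows which source line wrote it. Adding a range test inside variable_set_debug() would let the driver record only the offending writes.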

Add code to the encapsulated driver to protect variables subject to reentrancy corruption. For instance:

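/* push_interrupt_state, disable_interrupts, and pop_interrupt_state
   are target-specific macros supplied by your compiler or BSP. */
static int variable;          /* the ex-global, private to this file */

int variable_get(void){
     int temp;
     push_interrupt_state;    /* remember whether interrupts were on */
     disable_interrupts;      /* no ISR can change variable mid-read */
     temp=variable;
     pop_interrupt_state;     /* restore the caller's interrupt state */
     return temp;
}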

Turning interrupts off locks the system down until the code extracts the no-longer-global variable from memory. Notice the code to push and pop the interrupt state; there's no guarantee that this routine won't be called with interrupts already disabled. The additional two lines preserve the system's context.

An RTOS offers better reentrancy-protection mechanisms like semaphores. If using Micrium's uC/OS-II, for instance, use the OS calls OSSemPend and OSSemPost to acquire and release semaphores. Other RTOSes have similar mechanisms.
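µC/OS-II itself needs a target, so this sketch uses POSIX semaphores (sem_init, sem_wait, sem_post) as a hosted stand-in for OSSemPend and OSSemPost; the pattern is the same. A binary semaphore initialized to 1 guards the variable, so a task blocks rather than reading a half-written value. Unlike disabling interrupts, this protects only against other tasks: an ISR must never pend on a semaphore.

```c
#include <semaphore.h>

static int variable;
static sem_t variable_sem;   /* binary semaphore guarding 'variable' */

/* Call once before any task uses the variable. Returns 0 on success. */
int variable_init(void)
{
    /* pshared = 0: shared among threads of this process; count 1 = available */
    return sem_init(&variable_sem, 0, 1);
}

void variable_set(int data)
{
    sem_wait(&variable_sem);   /* like OSSemPend: block until we own it */
    variable = data;
    sem_post(&variable_sem);   /* like OSSemPost: let the next task in */
}

int variable_get(void)
{
    int temp;
    sem_wait(&variable_sem);
    temp = variable;
    sem_post(&variable_sem);
    return temp;
}
```

Production code would also check sem_wait's return value, since POSIX allows it to be interrupted by a signal.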

I mentioned that the ex-global is not really private to a single function. Consider a more complicated example, like handling receive data from a UART, which requires three data structures and four functions:
- UART_buffer - a circular buffer which stores data from the UART
- UART_start_ptr - the pointer to the beginning of data in the circular buffer
- UART_end_ptr - the pointer to the end of data in the circular buffer
- UART_init() - which sets up the device's hardware and initializes the data structures
- UART_rd_isr() - the ISR for incoming data
- UART_char_avail() - tests the buffer to see if a character is available
- UART_get() - retrieves a character from the buffer if one is available

One file - UART.C - contains these functions (though I'd also add the functions needed to send data to the device to create a complete UART handler) and nothing else. Define the filescopic data structures using the static keyword to keep them invisible outside the file. Only this small hunk of code has access to the data. Though this approach does create variables that are not encapsulated by functions, it incurs less overhead than a more rigorous adoption of encapsulation would, and carries few perils. Once debugged, the rest of the system only sees the driver entry points so cannot muck with the data.
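A hosted sketch of such a UART.C, under stated assumptions: the 16-byte buffer size is hypothetical, UART_init() omits the hardware setup, and UART_rd_isr() takes the received byte as a parameter instead of reading a receive register, so the code runs on a desktop. One slot is always left empty so that equal pointers unambiguously mean "no data." On an 8-bit CPU, where loading a pointer isn't atomic, the reads of UART_end_ptr outside the ISR would also need interrupt protection like the variable_get() code above.

```c
#include <stdbool.h>
#include <stdint.h>

#define UART_BUF_SIZE 16   /* hypothetical size */

/* File-scope (static) data: invisible outside UART.C. */
static volatile uint8_t UART_buffer[UART_BUF_SIZE];
static volatile uint8_t *volatile UART_start_ptr;   /* oldest unread character */
static volatile uint8_t *volatile UART_end_ptr;     /* where the next one goes */

/* Advance a pointer one slot, wrapping at the end of the buffer. */
static volatile uint8_t *UART_advance(volatile uint8_t *p)
{
    return (p + 1 == UART_buffer + UART_BUF_SIZE) ? UART_buffer : p + 1;
}

void UART_init(void)
{
    /* Real code would also program baud rate, framing, and the RX interrupt. */
    UART_start_ptr = UART_buffer;
    UART_end_ptr   = UART_buffer;
}

/* Receive ISR; here the byte is passed in rather than read from hardware. */
void UART_rd_isr(uint8_t received)
{
    volatile uint8_t *n = UART_advance(UART_end_ptr);
    if (n == UART_start_ptr)
        return;                  /* buffer full: drop the character */
    *UART_end_ptr = received;
    UART_end_ptr = n;
}

bool UART_char_avail(void)
{
    return UART_start_ptr != UART_end_ptr;
}

/* Returns the next character, or -1 if none is available. */
int UART_get(void)
{
    if (!UART_char_avail())
        return -1;
    uint8_t c = *UART_start_ptr;
    UART_start_ptr = UART_advance(UART_start_ptr);
    return c;
}
```

The rest of the system calls only UART_init(), UART_char_avail(), and UART_get(); the buffer and both pointers stay out of reach.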

Note that the file that handles the UART is rather small. It's a package that can be reused in other applications.

Wrinkles

Encapsulation isn't free. It consumes memory and CPU cycles. Terribly resource-constrained applications might not be able to use it at all.

Even in a system with plenty of headroom there's nothing faster for passing data around than a global. It's not always practical to eliminate them altogether. But their use (and worse, their abuse) does lead to less reliable code. My rule, embodied in my firmware standard, is "no global variables! But, if you really need one, get approval from the team lead." In other words, globals are a useful asset managed very, very carefully.

When a developer asks for permission to use a global, ask "Have you profiled the code? What makes you think you need those clock cycles? Give me solid technical justification for making this decision."

Sometimes it's truly painful to use encapsulation. I've seen people generate horribly contorted code to avoid globals. Use common sense; strive for a clean design that's maintainable.

Encapsulation has its own yin and yang. Debugging is harder. You're at a breakpoint with nearly all the evidence at hand needed to isolate a problem, except for the value of the encapsulated system status! What now?

It's not too hard to reference the link map, find the variable's address, and look at the hex. But it's hard to convert a floating point number's hex representation to human-speak. One alternative is to have a debug mode such that the encapsulated variable_set() function stores a copy, which no other code accesses, in a real global somewhere. Set this inside the driver's reentrancy-protection code so interrupt corruption is impossible.
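A sketch of that debug mode, assuming a hypothetical DEBUG_SHADOW build flag and a hypothetical global named variable_shadow. The driver keeps its data static, but in debug builds it also mirrors each write into an ordinary global the debugger can display symbolically; a float is used because that's where raw hex is hardest to read. The interrupt macros are placeholders that do nothing in this hosted sketch; on a target they would save, disable, and restore the interrupt state, so the mirror is written under the same protection as the real store.

```c
/* Build with -DDEBUG_SHADOW=0 to remove the mirror from production code. */
#ifndef DEBUG_SHADOW
#define DEBUG_SHADOW 1
#endif

/* Placeholders so the sketch compiles on a desktop (no-ops here). */
#define push_interrupt_state ((void)0)
#define disable_interrupts   ((void)0)
#define pop_interrupt_state  ((void)0)

static float variable;            /* the encapsulated data */

#if DEBUG_SHADOW
float variable_shadow;            /* real global: for the debugger's eyes only */
#endif

void variable_set(float data)
{
    push_interrupt_state;
    disable_interrupts;
    variable = data;
#if DEBUG_SHADOW
    variable_shadow = data;       /* updated inside the same protected region */
#endif
    pop_interrupt_state;
}

float variable_get(void)
{
    float temp;
    push_interrupt_state;
    disable_interrupts;
    temp = variable;
    pop_interrupt_state;
    return temp;
}
```

No code other than the debugger should ever read variable_shadow; if anything does, it has quietly become a real global again.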

The Other Side of the Story

A couple of thousand words ago I mentioned a program that had 5000 globals. When it came time to do a major port and cleanup of the code we made a rule: no globals. But this was a big real-time system pushing a lot of data around very fast; globals solved some difficult technical problems.

But at the end of the port there were only 5 global variables. That's a lot more tractable than derailing with 5000.