By Jack Ganssle

Published in Embedded Systems Design, July 2007

Perfecting Naming Conventions

'Tis but thy name that is my enemy;
Thou art thyself, though not a Montague.
What's Montague? It is nor hand, nor foot,
Nor arm, nor face, nor any other part
Belonging to a man. O, be some other name!
What's in a name? That which we call a rose
By any other name would smell as sweet;
So Romeo would, were he not Romeo call'd,
Retain that dear perfection which he owes
Without that title. Romeo, doff thy name,
And for that name which is no part of thee
Take all myself.

Maybe to Juliet names were fungible, but names and words matter. Biblical scholars refute attacks on scripture by exhaustive analysis of the meaning of a single word of Greek or Aramaic, whose nuance may have changed in the intervening millennia, corrupting a particular translation.

In zoology the binomial nomenclature, originally invented by Carl Linnaeus (born 300 years ago this year), rigorously specifies how species are named. Genus names are always capitalized while the species name never is. That's the standardized way zoologists communicate. Break the standard and you're no longer speaking the language of science.

Names are so important there's an entire science, called Onomatology, devoted to their use and classification.

In the computer business names require a level of precision that's unprecedented in the annals of human history. Motor_start() and motor_start() are as different as the word for "hair" in Urdu and Esperanto. Mix up "l" and "1" or "0" and "O" and you might as well be babbling in Babylonian. Yet, depending on the compiler implementation, this_is_a_really_long_variable_name and this_is_a_really_long_variable_name_complete_nonsense are identical.

Yet we still use "i", "ii", and "iii" (my personal favorite) for index variables. You have to admire anyone devoted to his family, but that's no excuse for the too-common practice of using a spouse's or kid's name in a variable declaration.

Words matter, as do names. Don't call me "Dave." I won't respond. Don't call a variable foobar. It conveys nothing to a future maintainer. Great code requires a disciplined approach to naming.

Conventions

There are some 7000 languages used today on this planet, suggesting a veritable Babel of poor communication. But only about 9 are spoken by more than 100 million people; 69 are known by 10 million or more (http://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers). The top ranks are disputed, but speakers of Mandarin, Spanish, English, and perhaps Hindi far outnumber those for any other language. C itself is composed entirely of English words like "if," "while," "for," though in many companies programmers comment in their native language. This mix can't lead to clarity, which is the overarching goal of naming and documenting.

The global economy means, for better or worse, many companies that do all their work in-house will be outsourcing and offshoring parts of the effort in the future. Sites like rentacoder.com, which encourage programmers from a vast number of countries to compete for work, are a harbinger of the future. A common lingo is needed to ease communication between so many cultures, and the default is clearly English. That may change, just as French was once the lingua franca of diplomacy and German that of science. So at the risk of sounding Anglo-centric I think it's clear that before too long most of us will be required to develop code and comments in English. Names, therefore, should use English words.

Spelling matters. Misspelled words are a sign of sloppy work. Our crummy tools, though, don't do any sort of spell-checking, even though programmers have given the rest of the world fabulous tools that immediately flag a misspelled word. Inevitably a spelling error will creep in from time to time. When discovered, fix it. It's nearly impossible to maintain code littered with these sorts of mistakes, as now the developer has to remember the oxymoronic "correct misspelling" to use.

Long names are a great way to convey meaning, but C99 requires only that the first 31 characters of an external identifier and the first 63 characters of an internal one be significant. Restrict all names to 31 characters or fewer.
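
To make the 31-character limit concrete, here is a minimal sketch in C. The helper names Name_Is_Portable() and Names_May_Collide() are hypothetical, invented for this illustration, and the check assumes a toolchain that honors only the C99 minimum of 31 significant characters for external names. Many compilers keep more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the toolchain keeps only the C99 minimum of 31
   significant characters for external identifiers. */
#define NAME_SIGNIFICANT_CHARACTERS 31u

/* Hypothetical helper: true if the name fits within the limit. */
bool Name_Is_Portable(const char *name)
{
    assert(name != NULL);
    return strlen(name) <= NAME_SIGNIFICANT_CHARACTERS;
}

/* Hypothetical helper: true if such a toolchain could treat the
   two names as the same identifier. */
bool Names_May_Collide(const char *name_first, const char *name_second)
{
    assert(name_first != NULL && name_second != NULL);
    return strncmp(name_first, name_second,
                   NAME_SIGNIFICANT_CHARACTERS) == 0;
}
```

A check like this could run in a build script or a lint pass over the symbol table. It only compares leading characters, so it says nothing about whether a name is any good.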

Don't redefine a name using C's scoping rules. Though legal, having two names with different meanings is confusing. Similarly, don't use names that differ only in case.
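
As an illustration of why redefining a name is confusing, consider the sketch below. Every name in it (error_count, Error_Flags_Count_Shadowed() and so on) is made up for the example. The first function legally reuses a file-scope name for a local, so the file-scope total is never touched. The second picks a distinct local name and the intent is obvious.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t error_count = 0u;   /* file scope: running total */

/* Confusing: the local hides the file-scope error_count, so the
   running total is never updated. */
uint32_t Error_Flags_Count_Shadowed(const uint8_t error_flags[],
                                    uint8_t flag_total)
{
    assert(error_flags != NULL);
    uint32_t error_count = 0u;      /* legal, but shadows the total */
    for (uint8_t flag_index = 0u; flag_index < flag_total; flag_index++) {
        if (error_flags[flag_index] != 0u) {
            error_count++;
        }
    }
    return error_count;
}

/* Clearer: a distinct local name, and the total is updated on purpose. */
uint32_t Error_Flags_Count(const uint8_t error_flags[], uint8_t flag_total)
{
    assert(error_flags != NULL);
    uint32_t errors_in_batch = 0u;
    for (uint8_t flag_index = 0u; flag_index < flag_total; flag_index++) {
        if (error_flags[flag_index] != 0u) {
            errors_in_batch++;
        }
    }
    error_count += errors_in_batch;
    return errors_in_batch;
}

/* Returns the file-scope running total. */
uint32_t Error_Count_Total(void)
{
    return error_count;
}
```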

On the subject of case, it's pretty traditional to define macros and constants in upper case while using a mix of cases for functions and variable names. That seems reasonable to me. But what about camel case? Or should I write that CamelCase? Or is it camelCase? Everyone has a different opinion. But camel case is merely an awkward way to simulate a space between words, which gets even more cryptic when using acronyms: UARTRead. Some advocate only capitalizing the first letter of an acronym, but that word-izes the acronym, torturing the language even more.

Types

Developers have argued passionately both for and against Hungarian notation since it was first invented in the 70s by space tourist Charles Simonyi. At first blush the idea is appealing: prefix variables with a couple of letters indicating the type, increasing the name's information density. Smitten by the idea years ago, I drank the Hungarian Kool-Aid.

In practice Hungarian makes the code ugly. Clean names get mangled. szString means "String" is zero-terminated. uiData flags an unsigned int. Then I found that when changing the code (after all, everything changes all the time) sometimes an int had to morph to a long, which meant editing every invocation of the name. One team I know avoids this problem by typedefing a name like iName to long, which means not only is the code ugly, but the Hungarian nomenclature lies to the unwary.

C types are problematic. Is an int 16 bits? 32? Don't define variables using C's int and long keywords; follow the MISRA standard and use the following typedefs to remove all ambiguity, and to make porting much simpler:

int8_t - 8 bit signed integer
int16_t - 16 bit signed integer
int32_t - 32 bit signed integer
uint8_t - 8 bit unsigned integer
uint16_t - 16 bit unsigned integer
uint32_t - 32 bit unsigned integer

See http://www.opengroup.org/onlinepubs/009695399/basedefs/stdint.h.html for some interesting extensions to these typedefs for use where performance issues mean we want the compiler to make the smartest decisions possible.
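
A short sketch of how these typedefs read in practice follows. The function ADC_Samples_Total() is a hypothetical helper, not something from a real driver. It uses the exact-width types from <stdint.h> for data whose size matters, and uint_fast8_t (one of the extensions described at the link above) for a loop counter where the compiler may pick whatever width is fastest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: totals 16-bit ADC samples into a 32-bit
   result, so the sum cannot overflow a 16-bit value. */
uint32_t ADC_Samples_Total(const uint16_t samples[], uint8_t sample_count)
{
    assert(samples != NULL);

    uint32_t total = 0u;            /* exactly 32 bits on every target */

    /* uint_fast8_t: at least 8 bits, whichever width is fastest here */
    for (uint_fast8_t sample_index = 0u;
         sample_index < sample_count;
         sample_index++) {
        total += samples[sample_index];
    }
    return total;
}
```

With a plain int for the total, the result would depend on whether the target's int is 16 or 32 bits. Here the widths are stated in the code.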

Forming Names

Linnaeus developed a hierarchy to classify organisms. Today it consists of Kingdom, Phylum, Class, Order, Family, Genus, and Species, and it is reflected in biological names like Homo sapiens. The genus comes first, followed by the more specific species. It's a natural way to identify large sets. Start from the general and work towards the specific.

The same goes for variable and function names. They should start with the big and work towards the small. Main_Street_Baltimore_MD_USA is a lousy name as we're not sure till the very end which huge domain - the country - we're talking about. Better: USA_MD_Baltimore_Main_Street.

Yet most of the code I read uses names like Read_Timer0(), Read_UART(), or Read_DMA(). Then there's a corresponding Timer0_ISR(), with maybe Timer0_Initialize() or Initialize_Timer0(). See a pattern? I sure don't.

Better:
               Timer_0_Initialize()
               Timer_0_ISR()
               Timer_0_Read()

With this practice we've grouped everything to do with Timer 0 together in a logical, Linnaean taxonomy. A sort will clump related names together.
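
Here is one way the grouping might look in code. It is only a sketch: the timer is simulated with two static variables rather than real hardware registers, and the reload_ticks parameter and the private names timer_0_reload_ticks and timer_0_overflows are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated state; on real hardware these would be registers. */
static uint32_t          timer_0_reload_ticks;
static volatile uint32_t timer_0_overflows;

/* Set the (hypothetical) reload period and clear the overflow count. */
void Timer_0_Initialize(uint32_t reload_ticks)
{
    assert(reload_ticks > 0u);      /* a zero period makes no sense */
    timer_0_reload_ticks = reload_ticks;
    timer_0_overflows    = 0u;
}

/* Called once per timer overflow. */
void Timer_0_ISR(void)
{
    timer_0_overflows++;
}

/* Elapsed ticks since initialization, counted in whole overflows. */
uint32_t Timer_0_Read(void)
{
    return timer_0_overflows * timer_0_reload_ticks;
}
```

Every public name, and every private one too, starts with the same Timer_0 prefix, so a symbol listing or an editor's autocomplete shows the whole module in one clump.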

In a sense this doesn't reflect English sentence structure. "Timer" is the object; "read" the verb, and objects come after the verb. But a name is not a sentence, and we do the best we can do in an imperfect world. German speakers, though, will find the trailing verb familiar.

Since functions usually do something it's wise to have an action word, a verb, as part of the name. Conversely, variables are just containers and do nothing. Variable names should be nouns, perhaps modified by adjectives.

Avoid weak and non-specific verbs like "handle," "process" and "update." I have no idea what "ADC_Handle()" means. "ADC_Curve_Fit()" conveys much more information.

Short throwaway variable names are fine occasionally. A single-line for loop that uses the not-terribly-informative index variable "i" is reasonable if the variable is both used and disposed of in one line. If it carries a value, which implies context and semantics, across more than a single line of code, pick a better name.
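
The distinction might look like this in C. Both functions are hypothetical examples. In the first, "i" is created, used, and discarded on a single line. In the second, the index carries meaning across several lines and its value is saved, so it gets a real name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-line loop: a throwaway index is reasonable here. */
void ADC_Buffer_Clear(uint16_t samples[], uint8_t sample_count)
{
    assert(samples != NULL);
    for (uint8_t i = 0u; i < sample_count; i++) { samples[i] = 0u; }
}

/* The index lives across several lines and is stored, so it earns
   a descriptive name. Returns the position of the first largest sample. */
uint8_t ADC_Peak_Index_Find(const uint16_t samples[], uint8_t sample_count)
{
    assert(samples != NULL && sample_count > 0u);

    uint8_t peak_index = 0u;
    for (uint8_t sample_index = 1u;
         sample_index < sample_count;
         sample_index++) {
        if (samples[sample_index] > samples[peak_index]) {
            peak_index = sample_index;
        }
    }
    return peak_index;
}
```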

TLAs and Cheating

In "Some Studies of Word Abbreviation Behavior" (M.H. Hodge and F.M. Pennington, Journal of Experimental Psychology, 98(2):350-361, 1973), researchers had subjects abbreviate words. Other subjects then tried to reconstruct the original words. The average success rate was an appalling 67%.

What does "Disp" mean? Is it the noun meaning the display hardware, or is it the verb "to display?" How about "Calc?" That could be percent calcium, calculate or calculus.

With two exceptions never abbreviate a name. Likewise, with the same caveats, never use an acronym. Your jargon may be unknown to some other maintainer, or may have some other meaning. Clarity is our goal!

One exception is the use of industry-standard acronyms and abbreviations like LED, LCD, CRT, UART, etc. that pose no confusion. Another is that it's fine to use any abbreviation or acronym documented in a dictionary stored in a common header file. For example:

/*  Abbreviation Table
 Dsply    == Display (the verb)
 Disp     == Display (our LCD display)
 Tot      == Total
 Calc     == Calculation
 Val      == Value
 MPS      == Meters per second
 Pos      == Position
*/

I remember with some wonder when my college physics professor taught us to cheat on exams. If you know the answer's units it's often possible to solve a problem correctly just by properly arranging and canceling those units. Given a result that must be in miles per hour, if the only inputs are 10 miles and 2 hours, without even knowing the question it's a good bet the answer is 5 MPH.

Conversely, ignoring units is a sure road to disaster. Is Descent_Rate meters per second? CM/sec? Furlongs per fortnight? Sure, the programmer who initially computed the result probably knows, but it's foolish to assume everyone is on the same page. Postfix all physical parameters with the units. Descent_Rate_MPS (note in the dictionary above I defined MPS). Timer_Ticks. ADC_Read_Volts().
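
A small sketch shows how unit postfixes let the code check itself by inspection. The function and parameter names are invented for the example, with MPS taken from the abbreviation table above and MPH assumed to be an equally well-understood abbreviation. Integer arithmetic is used for brevity, so results are truncated.

```c
#include <assert.h>
#include <stdint.h>

/* Meters divided by seconds gives meters per second: the units in
   the names cancel just as they would on paper. */
int32_t Descent_Rate_MPS_Compute(int32_t altitude_start_meters,
                                 int32_t altitude_end_meters,
                                 int32_t elapsed_seconds)
{
    assert(elapsed_seconds > 0);
    return (altitude_start_meters - altitude_end_meters) / elapsed_seconds;
}

/* Miles divided by hours gives miles per hour. */
int32_t Speed_MPH_Compute(int32_t distance_miles, int32_t elapsed_hours)
{
    assert(elapsed_hours > 0);
    return distance_miles / elapsed_hours;
}
```

Passing a value named elapsed_minutes to a parameter named elapsed_seconds looks wrong at a glance, which is the whole point.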

Are there a lot of rules about naming? You betcha. But they come naturally with practice. And practice is what we must be doing for the rest of our careers. Practice new ideas. Practice more effective in-the-code documentation.

For stasis is death.