Episode 6: The Inside Scoop on Getting Names in Firmware Right.
| July 14, 2014 | Level: Everyone | 
While long names will never yield self-documenting code, correctly naming things is hugely important. Watch this 9-minute video to learn why using names like read_timer_1 is poor programming practice.
Video Transcription
Hi, I'm Jack Ganssle and welcome to the Embedded Muse video blog, which is a companion to my free online Embedded Muse e-newsletter. Today we're going to talk a little bit about naming conventions for variables and functions in embedded firmware.
                
I read an awful lot of code and I'm constantly astonished at some of the   really poor names that people select for their variables and functions.   Ironically, this is often from the same people who believe, absolutely   incorrectly, that with long variable names you can have self-documenting   code. That's just not the case. However, correct variable names are   really important. 
Why is it that every index variable is named "i", or for nested loops "ii" or, my personal favorite, "iii"? Well, you know, it goes back to FORTRAN, 60 years ago, because in FORTRAN variables starting with the letter "i" were the first of the default integers, and somehow we're still doing that even though it doesn't really make a whole lot of sense anymore.
 
 
 
There are some variable names you see that are really baffling. This is from Linux. This is not a contest to see how few characters you can type. And where are the comments? What's the documentation on those parameters? What do they mean? This is awful stuff and whoever wrote this code should be banished from the ranks of programmers.
The naming problem was solved in 1735 by Carl Linnaeus, when he came up with what is now known as Linnaean taxonomy. He said you start with the very general and you work your way to the more and more specific.
So when you name biological entities, for example, you start with the kingdom, plant or animal, and you work your way down to the more and more specific. The creatures that we're members of belong to the genus (the general) Homo and the species (the specific) sapiens. So we're called Homo sapiens.
So an example of a name that doesn't correspond to these rules, which we   see a lot of, is read_timer. It's a terrible name, it's exactly the   opposite of what I've just talked about. A much better name would be   timer_read, timer_initialize, timer_get. Doing it this way, all of   these things are logically grouped together. 
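The general-to-specific ordering can be sketched in C as a small module whose names all begin with timer_. This is only an illustration: timer_read and timer_initialize come from the talk, while timer_advance and the simulated counter are hypothetical stand-ins for real hardware access.

```c
#include <stdint.h>

/* Hypothetical timer module. Every name starts with the general
   (timer) and ends with the specific (the action), so related
   functions sort and group together. */

/* Simulated hardware counter; a real driver would read a register. */
static uint32_t timer_count_ticks = 0u;

/* Reset the simulated counter to zero. */
void timer_initialize(void)
{
    timer_count_ticks = 0u;
}

/* Hypothetical helper standing in for the hardware tick. */
void timer_advance(uint32_t elapsed_ticks)
{
    timer_count_ticks += elapsed_ticks;
}

/* Return the current count, in ticks. */
uint32_t timer_read(void)
{
    return timer_count_ticks;
}
```

With names like these, an alphabetical symbol listing or an IDE's autocomplete shows every timer operation side by side.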
I believe that, with 2 exceptions, we should not permit any abbreviations or acronyms in our names. There was a pretty interesting experiment: one group of computer science people abbreviated names and another group of CS people tried to expand those abbreviations. They had a 60% success rate. What that tells me is that abbreviating is a form of encryption, which is exactly the opposite of what we want. Remember, in writing software, clarity is our goal, encryption is not.
Now, I said there were 2 exceptions. The first, of course, is anything that's industry standard; you know, everyone knows what USB means. The second exception is anything we've defined in a data dictionary, perhaps in a header file somewhere. So for example here, mps means meters per second, and everyone on the project is using exactly the same abbreviation, so it's very clear what this means.
When I took my first physics class I couldn't believe it. The professor taught us to cheat on exams. I had no idea they would do this. He showed that if you could understand the units, without even understanding the physics, you could oftentimes get the correct answer by canceling units and by using the units to do the math correctly, and he was certainly correct.
You've probably heard of the Mars Climate Orbiter. That was a   spacecraft that was destined for Mars and it got there. It got there and   it smacked right into the surface of the planet, it was supposed to go   into orbit. The problem was that the units used in the software for the   ground support equipment were different from the units used in the   spacecraft itself. They mixed up metric and imperial units, a real   bone-headed problem. What this says to me is that any variable or   function that has some sort of physical parameters associated with it,   should have the units suffixed to it. 
So for example, timer. What is this saying? Is this in ticks,   microseconds, milliseconds, weeks? I don't know, but if you suffix it   with, in this case microseconds, it's absolutely clear. Or descent rate.   Is that meters per second, centimeters per second, furlongs per   fortnight? I don't know. If we attach this mps suffix, which we've   defined in our data dictionary to mean meters per second, then everyone   knows exactly what's in this variable and the chances of screwing things   up go way down. 
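As a rough sketch of unit suffixes in C, the fragment below puts a small data dictionary in a comment and carries the units in every name. The mps suffix is from the talk; the other suffixes, the function names, and the constant are assumptions made up for illustration.

```c
#include <stdint.h>

/* Data dictionary (hypothetical project header):
     us  = microseconds
     ms  = milliseconds
     s   = seconds
     m   = meters
     mps = meters per second                      */

#define MICROSECONDS_PER_MILLISECOND 1000u

/* The suffixes say what goes in and what comes out. */
uint32_t timer_convert_ms_to_us(uint32_t interval_ms)
{
    return interval_ms * MICROSECONDS_PER_MILLISECOND;
}

/* meters = (meters / second) * seconds: the suffixes cancel
   just like units in a physics problem. */
double descent_distance_m(double descent_rate_mps, double elapsed_time_s)
{
    return descent_rate_mps * elapsed_time_s;
}
```

A call such as descent_distance_m(descent_rate_mps, elapsed_time_ms) now looks wrong at a glance, because the suffix on the argument does not match the suffix on the parameter.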
I poked a little fun at index variables, but in truth I think it is reasonable to use very short names for index variables, as long as they have a very short scope and are only used over a few lines of code. So if you have a loop that spans maybe four or five lines of code, sure, go ahead and use i, j, k, that's fine, but if the loop is bigger than that then go ahead and use a much more descriptive name.
What about CamelCase, or using underscores to separate words in a name? What's the correct thing to do? I personally don't think it makes any difference. My personal preference is to use underscores. It's a little bit more like English, where you use spaces to separate words, and I think it's slightly more readable, but I think either way is fine; you just make a rule and stick to it. CamelCase suffers a little bit when we're talking about, say, constants or macros that might be defined with all uppercase characters. That becomes a little more problematic, but the bottom line is that it doesn't make a difference as long as you all follow the same rules and everyone on the team is doing it the same way.
How about Hungarian? Hungarian notation is where we prefix a variable   with a letter or 2 or 3 that indicate the type of the variable and this   was all the rage for some time. I think that it is, in general, a   mistake. I think it reduces readability. 
The MISRA rules (MISRA, of course, is a standard that is becoming more and more popular in the firmware world) very wisely prohibit us from using base types such as int. What's an int? Nobody knows. It depends upon the compiler, the CPU and the wind direction. We build these fabulously complex systems based upon these really unknown entities. MISRA doesn't say how we should define our types. They recommend, though, that we use the POSIX standard type names.
So for example, uint16_t would mean an unsigned 16 bit integer, which I think is a pretty good way of doing things, but it makes it fairly difficult to use Hungarian with such a long type name. Some people would put U16, but I think that because the IDEs are so good today, that's not really necessary: when you hover over a variable inside your IDE, it will usually tell you what the type of the variable is.
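A minimal sketch of the fixed-width types in use, assuming a C99 compiler that provides <stdint.h>. The checksum function and its name are hypothetical; the point is only that the width is stated in the type rather than in a Hungarian prefix.

```c
#include <stdint.h>

/* uint16_t is exactly 16 bits wide on every platform that defines
   it, whereas the width of int depends on the compiler and CPU. */
uint16_t checksum_accumulate(uint16_t checksum, uint16_t word)
{
    /* The operands are promoted before the addition; the cast back
       to uint16_t makes the modulo-65536 wraparound explicit. */
    return (uint16_t)(checksum + word);
}
```

Because the type fixes the width, the wraparound point is the same on an 8-bit microcontroller and a 64-bit desktop.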
But maybe it's better to prefix global variables with something really horrible and nasty, just to indicate to everybody else that this is a global and to discourage people from actually using globals, because they are so dangerous. I mean, realistically, perhaps a prefix of G_ or Global_ would make an awful lot of sense. Yeah, it makes the variable a bit harder to use, but that's a good thing, because globals are so dangerous. That's the same reason they paint dynamite red, to warn people that this is a dangerous commodity. We're looking for readability over brevity.
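One way the G_ prefix might look in practice is sketched below. The variable and both functions are hypothetical, and hiding the global behind an accessor is a design choice added here for illustration, not something the talk prescribes.

```c
#include <stdint.h>

/* Hypothetical global: the G_ prefix flags it at every point of use.
   volatile because an interrupt handler is assumed to update it. */
volatile uint32_t G_system_uptime_ms = 0u;

/* Assumed to be called once per millisecond by a timer interrupt. */
void system_tick_handler(void)
{
    G_system_uptime_ms += 1u;
}

/* Accessor, so most code never touches the global directly. */
uint32_t system_uptime_get_ms(void)
{
    return G_system_uptime_ms;
}
```

Any line that mentions G_system_uptime_ms now stands out in a code review, which is the red-paint effect described above.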
So, active window is much preferable to ActWin. ActWin may make perfect sense to you today, but in 2 or 3 years, when someone else is maintaining this code, it may not be as obvious to people. Oftentimes we have legacy code, or we inherit code which is a mess; the names are just a disaster. No one has time to go in and fix them all. It would be wonderful if we could, but it's just not possible.
I think it's better and makes sense, to draw a metaphorical line in the   sand where we say, from now on, going forward, we're going to do things   the right way. We'll use meaningful, well thought out variable names and   yeah, we'll continue to deal with the old mess because that's just the   way it is. The old people who did it this way, they were amateurs, but   as professional developers, we should use the best possible practices   going forward, recognizing that we can't fix all the old sins. 
So that's my take on variable and function names. I think getting them   right is really important and I think it makes sense to put just a bit   of thought into them. So thanks for watching and don't forget to go over to ganssle.com for  more embedded videos, over 1000 articles on the subject, and be sure to sign up for the free Embedded Muse newsletter.
              

