Episode 6: The Inside Scoop on Getting Names in Firmware Right.
July 14, 2014
While long names will never yield self-documenting code, correctly naming things is hugely important. Watch this 9-minute video to learn why using names like read_timer_1 is poor programming practice.
Hi, I'm Jack Ganssle and welcome to the embedded news and video blog, which is a companion to my free online embedded news e-newsletter. Today we're going to talk a little bit about naming conventions for variables and functions in embedded firmware.
I read an awful lot of code and I'm constantly astonished at some of the really poor names that people select for their variables and functions. Ironically, this is often from the same people who believe, absolutely incorrectly, that with long variable names you can have self-documenting code. That's just not the case. However, correct variable names are really important.
Why is it that every index variable is named "i", or for nested loops "ii" or, my personal favorite, "iii"? Well, you know, it goes back to FORTRAN, 60 years ago, because in FORTRAN variables starting with the letter "i" were the first of the default integers, and somehow we're still doing that even though it doesn't really make a whole lot of sense anymore.
There are some variable names which you see which are really baffling. This is from Linux. This is not a contest to see how few characters you can type and where are the comments? What's the documentation on those parameters, what do they mean? This is awful stuff and whoever wrote this code should be banished from the ranks of programmers.
The naming problem was solved in 1735 by Carl Linnaeus, when he came up with what is now known as Linnaean taxonomy. He said you start with the very general and you work your way to the more and more specific.
So when you name biological entities, for example, you start with the kingdom, plant or animal, and you work your way down to the more and more specific. The creatures that we're members of belong to the genus (the general) Homo and the species (the specific) sapiens. So we're called Homo sapiens.
So an example of a name that doesn't correspond to these rules, which we see a lot of, is read_timer. It's a terrible name, it's exactly the opposite of what I've just talked about. A much better name would be timer_read, timer_initialize, timer_get. Doing it this way, all of these things are logically grouped together.
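To make the general-to-specific idea concrete, here is a minimal sketch of a timer module in C. The tick counter and the timer_tick function are made up for illustration; only the naming pattern (timer first, action last) comes from the discussion above.

```c
#include <stdint.h>

/* Hypothetical timer module. Every public name starts with the
   general thing (timer) and ends with the specific action, so the
   related functions sort and group together. */
static uint32_t timer_count_ticks = 0u;

/* Reset the tick count to zero. */
void timer_initialize(void)
{
    timer_count_ticks = 0u;
}

/* Advance the count by one tick (illustrative stand-in for an ISR). */
void timer_tick(void)
{
    timer_count_ticks++;
}

/* Return the number of ticks since timer_initialize() was called. */
uint32_t timer_read(void)
{
    return timer_count_ticks;
}
```

In an alphabetized symbol list or an IDE's autocomplete, everything that belongs to the timer now shows up in one place, which is the grouping benefit mentioned above.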
I believe that, with 2 exceptions, we should not permit any abbreviations or acronyms in our names. There was a pretty interesting experiment: one group of computer science people abbreviated names, and another group of CS people tried to expand those abbreviations. They had a 60% success rate. What that tells me is that abbreviating is a form of encryption, which is exactly the opposite of what we want. Remember, in writing software, clarity is our goal, encryption is not.
Now I said there were 2 exceptions, the first of course is anything that's industry standard. You know, everyone knows what USB means and the second exception is anything we've defined in a data dictionary, perhaps in a header file somewhere. So for example here, mps means meters per second and everyone on the project is using exactly the same abbreviation so it's very clear what this means.
When I took my first physics class I couldn't believe it. The professor taught us to cheat on exams. I had no idea they would do this. He showed that if you understood the units, without even understanding the physics, you could often get the correct answer by canceling units and by using the units to do the math correctly, and he was certainly correct.
You've probably heard of the Mars Climate Orbiter. That was a spacecraft that was destined for Mars and it got there. It got there and it smacked right into the surface of the planet, it was supposed to go into orbit. The problem was that the units used in the software for the ground support equipment were different from the units used in the spacecraft itself. They mixed up metric and imperial units, a real bone-headed problem. What this says to me is that any variable or function that has some sort of physical parameters associated with it, should have the units suffixed to it.
So for example, timer. What is this saying? Is this in ticks, microseconds, milliseconds, weeks? I don't know, but if you suffix it with, in this case microseconds, it's absolutely clear. Or descent rate. Is that meters per second, centimeters per second, furlongs per fortnight? I don't know. If we attach this mps suffix, which we've defined in our data dictionary to mean meters per second, then everyone knows exactly what's in this variable and the chances of screwing things up go way down.
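As a rough sketch of what unit suffixes buy you, consider the C function below. The function name, the parameter names, and the conversion constant are assumptions made up for this example; mps is the data dictionary abbreviation for meters per second described above.

```c
#include <stdint.h>

/* Data dictionary (hypothetical project header):
     mps = meters per second */
#define MICROSECONDS_PER_SECOND 1000000.0

/* Distance covered at a constant descent rate. The suffixes let a
   reader check the units just as one would on a physics exam:
   (meters / second) * seconds = meters. */
double descent_distance_meters(double descent_rate_mps,
                               uint32_t elapsed_time_microseconds)
{
    double elapsed_time_seconds =
        elapsed_time_microseconds / MICROSECONDS_PER_SECOND;

    return descent_rate_mps * elapsed_time_seconds;
}
```

If someone tried to multiply descent_rate_mps directly by elapsed_time_microseconds, the mismatch would be visible in the names themselves, with no need to hunt through the documentation.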
I poked a little fun at index variables, but in truth I think it is reasonable to use very short names for index variables, as long as they have a very short scope and are used over only a few lines of code. So if you have a loop that spans maybe four or five lines of code, sure, go ahead and use i, j, k, that's fine, but if the loop is bigger than that then go ahead and use a much more descriptive name. What about CamelCase, or using underscores to separate words in a name? What's the correct thing to do?
I personally don’t think it makes any difference. My personal preference is to use underscores. It's a little bit more like English where you use spaces to separate names, I think it's slightly more readable, but I think either way is fine, you just make a rule and stick to it. CamelCase suffers a little bit when we're talking about, say constant sort of macros that might be defined with all uppercase characters. That becomes a little more problematic but, the bottom line is that it doesn't make a difference as long as you all follow the same rules and everyone on the team is doing it the same way.
How about Hungarian? Hungarian notation is where we prefix a variable with a letter or 2 or 3 that indicate the type of the variable and this was all the rage for some time. I think that it is, in general, a mistake. I think it reduces readability.
The MISRA rules: MISRA, of course, is a standard that is becoming more and more popular in the firmware world. The MISRA rules very wisely prohibit us from using base types like int. What's an int? Nobody knows. It depends upon the compiler, the CPU and the wind direction. We build these fabulously complex systems based upon these really unknown entities. MISRA doesn't say how we should define our types. They recommend, though, that we use the POSIX standard types.
So, for example, uint16_t would mean an unsigned 16-bit integer, which I think is a pretty good way of doing things, but it makes it fairly difficult to use Hungarian with such a long type name. Some people would put u16, but I think that because the IDEs are so good today it's not really necessary: when you hover over a variable inside your IDE, it will usually tell you what the type of the variable is.
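Here is a small sketch of why a fixed-width type is easier to reason about than a bare int. In C, uint16_t comes from the standard header stdint.h; the sequence-number function is a made-up example, not something from the video.

```c
#include <stdint.h>

/* With uint16_t the width is stated in the type, so the rollover
   point is the same on every compiler and CPU: 65535 wraps to 0.
   With a bare int the answer would depend on the target. */
uint16_t sequence_number_next(uint16_t sequence_number)
{
    /* The cast makes the intended 16-bit truncation explicit. */
    return (uint16_t)(sequence_number + 1u);
}
```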
But maybe it's better to prefix global variables with something really horrible and nasty, just to indicate to everybody else that this is a global and to discourage people from actually using globals, because they are so dangerous. I mean, realistically, perhaps a prefix of G_ or Global_ would make an awful lot of sense. Yeah, it makes the variable a bit harder to use, but that's a good thing, because globals are so dangerous. That's the same reason they painted dynamite red, to warn people that this is a dangerous commodity. We're looking for readability over brevity.
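A minimal sketch of the global prefix idea follows. The variable and function names are invented for illustration; the only thing taken from the discussion above is the Global_ prefix itself.

```c
#include <stdint.h>

/* Hypothetical global: the Global_ prefix is deliberately clunky so
   that every use stands out in a code review. */
uint32_t Global_error_count = 0u;

/* Record one error. The prefix makes it obvious at the point of use
   that this function touches shared state. */
void error_record(void)
{
    Global_error_count++;
}
```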
So, active window is much preferable to ActWin. ActWin may make perfect sense to you today, but in 2 or 3 years, when someone else is maintaining this code, it may not be as obvious to people. Oftentimes we have legacy code, or we inherit code which is a mess; the names are just a disaster. No one has time to go in and fix them all. It would be wonderful if we could, but it's just not possible.
I think it's better and makes sense, to draw a metaphorical line in the sand where we say, from now on, going forward, we're going to do things the right way. We'll use meaningful, well thought out variable names and yeah, we'll continue to deal with the old mess because that's just the way it is. The old people who did it this way, they were amateurs, but as professional developers, we should use the best possible practices going forward, recognizing that we can't fix all the old sins.
So that's my take on variable and function names. I think getting them right is really important and I think it makes sense to put just a bit of thought into them. So thanks for watching and don't forget to go over to ganssle.com for more embedded videos, over 1000 articles on the subject, and be sure to sign up for the free Embedded Muse newsletter.