The Embedded Muse
Issue Number 363, December 3, 2018
Copyright 2018 The Ganssle Group

Editor: Jack Ganssle, jack@ganssle.com

You may redistribute this newsletter for non-commercial purposes. For commercial use contact jack@ganssle.com. To subscribe or unsubscribe go here or drop Jack an email.

Editor's Notes

After over 40 years in this field I've learned that "shortcuts make for long delays" (an aphorism attributed to J.R.R. Tolkien). The data is stark: doing software right means fewer bugs and earlier deliveries. Adopt best practices and your code will be better and cheaper. This is the entire thesis of the quality movement, which revolutionized manufacturing but has somehow largely missed software engineering. Studies have even shown that safety-critical code need be no more expensive than the usual stuff if the right processes are followed.

This is what my one-day Better Firmware Faster seminar is all about: giving your team the tools they need to operate at a measurably world-class level, producing code with far fewer bugs in less time. It's fast-paced, fun, and uniquely covers the issues faced by embedded developers. Information here shows how your team can benefit by having this seminar presented at your facility.

Latest blog: Engineer or scientist?

Quotes and Thoughts

A safety culture is a culture that allows the boss to hear bad news. Sidney Dekker

Tools and Tips

Please submit clever ideas or thoughts about tools, techniques and resources you love or hate. Here are the tool reviews submitted in the past.

Bit-banding is a really useful feature supported by most of the Cortex-M processors. It allows normal load/store operations to access individual bits in memory and I/O. Joseph Yiu's excellent The Definitive Guide to the ARM Cortex-M3 has a good description of it. Another is here.
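
As a rough illustration of the address arithmetic behind bit-banding, here is a minimal Python sketch. It assumes the Cortex-M3/M4 memory map as commonly documented (a 1 MB SRAM bit-band region at 0x20000000 aliased at 0x22000000, and a 1 MB peripheral region at 0x40000000 aliased at 0x42000000). The helper name is made up for this example, and your part's reference manual is the authority on whether bit-banding is implemented at all.

```python
# Assumed Cortex-M3/M4 bit-band map; check your part's reference manual.
# Each entry: (bit-band region base, alias region base).
BIT_BAND_REGIONS = (
    (0x20000000, 0x22000000),  # SRAM
    (0x40000000, 0x42000000),  # peripherals
)
REGION_SIZE = 0x00100000  # each bit-band region is 1 MB


def bit_band_alias(byte_address, bit_number):
    """Return the alias word address for one bit (hypothetical helper)."""
    if not 0 <= bit_number <= 7:
        raise ValueError("bit_number must be 0..7 (bits are addressed per byte)")
    for region_base, alias_base in BIT_BAND_REGIONS:
        if region_base <= byte_address < region_base + REGION_SIZE:
            byte_offset = byte_address - region_base
            # Every bit gets its own 32-bit word, so one byte spans 32 alias bytes.
            return alias_base + byte_offset * 32 + bit_number * 4
    raise ValueError("address is outside the bit-band regions")


if __name__ == "__main__":
    # Bit 2 of the byte at 0x20000300
    print(hex(bit_band_alias(0x20000300, 2)))
```

An ordinary store of 1 or 0 to the returned word address then sets or clears just that one bit, with no read-modify-write sequence in the code.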

In the last Muse I suggested that firmware developers familiarize themselves with using oscilloscopes. Since then I ran across Rohde & Schwarz's excellent introductory guide. Did you know the CRT scope was invented in 1897? There's a picture of it in that guide. That's prior to vacuum tubes, or, at least, de Forest's Audion.

Not new, but of considerable interest to firmware developers: The Power of Ten - Rules for Developing Safety Critical Code, by Gerard J. Holzmann. The paper discusses ten rules JPL uses to improve their code. From the concluding paragraph: "These ten rules are being used experimentally at JPL in the writing of mission critical software, with encouraging results. After overcoming a healthy initial reluctance to live within such strict confines, developers often find that compliance with the rules does tend to benefit code clarity, analyzability, and code safety."

Freebies and Discounts

Tony Gerbic won last month's voltage standard.

This month's giveaway is a piece of junk. Or rather, a battered and beaten "historical artifact." It's a Philco oscilloscope from 1946. The manual, including schematic, is here. I picked it up on eBay a few years ago, and while it's kind of cool, have no real use for the thing. It powers up and displays a distorted waveform, usually, but is pretty much good for nothing other than as a desk ornament. I wrote about this here. (The thing is so old I'd be afraid to leave it plugged in while unattended). Oh, that magnet on the right side? It's to position the beam!

Enter via this link.

Reusing SOUP

Firmware is growing in size and complexity at a ferocious rate. Our only hope of keeping up with customers' demands is reuse, whether via commercial products, open source code, or reusing proprietary components. Often we don't know a lot about the hunk of code we'd like to incorporate, leading to the somewhat onomatopoeic acronym SOUP: SOftware of Unknown Pedigree.

While most developers are not working on safety-critical code, we all want our products to be safe and reliable. It's instructive to look at how SOUP is handled in the world where a bug could kill someone.

IEC 61508 is probably the best-known standard for building safe systems. A portion of that document covers software. It identifies four Safety Integrity Levels (SILs) from 1 to 4, with SIL4 representing a system whose failure would be catastrophic.

So, can SOUP be used in a system qualified to 61508? The answer is, "it depends."

There are three routes to gaining certification for SOUP under 61508:

  1. The software was created under the auspices of the standard.
  2. The software has been "proven in use" over time.
  3. Noncompliant software is ex post facto assessed at the desired SIL level.

Option 1 is pretty much out for extant open source projects and most legacy code. Large code bases of any sort probably can't be qualified under option 3 due to the enormous costs involved. At higher levels (SIL3 and SIL4) few vendor-supplied packages will meet either of these qualifications, though some RTOS and comm stack providers do provide certifiable components today.

That leaves option 2. To meet this requirement the integrator of the component (not a component's vendor) must demonstrate that it has been used in a similar role for a certain number of hours. Its uses and failures, if any, need to be recorded. Further, it must have "a restricted and specified functionality," which would seem to leave out big packages like desktop operating systems.

Let's look into this a bit more deeply.

The standard defines two kinds of systems. One that supports "low demand mode of operation" is generally turned off or is usually not performing any safety functions. A trivial example is a TV remote control that only comes to life when a button is pressed.

Then there are "high demand or continuous mode of operation" devices: basically, those that run a safety function all of the time. An example might be a nuclear power plant controller, or, in some families, the TV set (assuming the TV is considered family-critical!).

The standard's reliability requirements are:

Formulas are provided to turn those figures into number of requests ("treated demands") or hours of operation to show a component is proven-in-use. Assuming 99% and 95% confidence levels, these work out to:

(C is confidence level).

With billions of hours of use - recorded use - required, it's hard to see how any component could be qualified to SIL4 using the "proven in use" option. One could argue that if used in 100 million embedded systems you could divide those hours-of-operations by 100 million, making such certification feasible, but remember that the usage and failures of these must be recorded.
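
To get a feel for where numbers of that size come from, here is a small Python sketch. It is not the standard's own text. It assumes the usual zero-failure statistical argument: with a constant failure rate, t failure-free hours support that rate at confidence C when t >= -ln(1 - C) / rate. It also assumes the commonly quoted 61508 high-demand targets, taking the most stringent end of each band (1e-6 down to 1e-9 dangerous failures per hour for SIL1 through SIL4). The function and table names are invented for illustration.

```python
import math

# Assumed targets: most stringent end of each high-demand band, in dangerous
# failures per hour. Commonly quoted IEC 61508 figures, not taken from the
# newsletter text.
ASSUMED_FAILURES_PER_HOUR = {1: 1e-6, 2: 1e-7, 3: 1e-8, 4: 1e-9}


def failure_free_hours(failures_per_hour, confidence):
    """Hours with zero failures needed to support the rate at this confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    if failures_per_hour <= 0.0:
        raise ValueError("failure rate must be positive")
    # P(no failure in t hours) = exp(-rate * t); require that to be <= 1 - C.
    return -math.log(1.0 - confidence) / failures_per_hour


if __name__ == "__main__":
    for sil, rate in sorted(ASSUMED_FAILURES_PER_HOUR.items()):
        hours_95 = failure_free_hours(rate, 0.95)
        hours_99 = failure_free_hours(rate, 0.99)
        print(f"SIL{sil}: {hours_95:.1e} h (C=95%), {hours_99:.1e} h (C=99%)")
```

Under these assumptions SIL4 at 99% confidence needs roughly 4.6 billion failure-free hours, which is in line with the "billions of hours" mentioned above.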

In the safety world evidence must be presented to build a safety case that a component will work as promised. The use and failure log is that evidence.

The key takeaway for non-safety-critical systems is that if you select a component and expect high reliability, either it must have been built using a very rigorous process, or it needs an awful lot of in-service time to gain confidence in it.

Obviously, the range of safety and reliability requirements for embedded systems is vast. A smart toothbrush doesn't have the same requirements as an engine control unit, which in turn can be less reliable than, say, a nuclear weapon's permissive action link. (There's a lesson here in end-runs around safety-systems, given that the PAL key was reputedly 00000000 for years.)

Advice to a Young Developer

Daniel Wisehart had ideas on advice to a young developer in addition to what I wrote:

I have a few more suggestions for young firmware developers, some for aspiring developers and some for just-hired developers, in no particular order.

1) Learn a bit about heat and how it relates to what you do in firmware.  If you double the clock rate or the rate at which you activate outputs, how much more heat do you generate?  Is there a difference between holding unused outputs high and holding them low (you will need to talk to a hardware person about this: do not be afraid to ask)?

2) Learn about energy usage in battery operated systems.  If you pulse an LED faster than the human eye can detect at 50% duty cycle: how much less energy do you use?  What does this do to LED brightness (you will need a datasheet to answer this)?

3) Learn to read data sheets well enough to get basic information that affects the firmware, and do not be afraid to ask for and read the data sheets for every component connected to the firmware in the system you are working on.

4) Learn to read schematics well enough to answer your own questions about how things are connected together.  Do not be afraid to ask a hardware person to explain what you do not understand that seems related to the firmware.

5) Ask a lot of questions about the ways the firmware needs to help test the hardware and the complete system.  Even if you are not there for the first turn of the boards, there will be hardware replacement for various reasons and customer bug reports to investigate.  Good tools in your testing toolkit will save you a lot of time and frustration.

6) Make sure you understand in some detail how the system works.  Imagine you are training a new guy hired after you.  Can you explain to a new firmware developer how the system is supposed to work?

Dave Kellog wrote:

These are two relatively "old" books.  However, there is a lot a newbie (high school or otherwise) can learn from them:

  1. The Definitive Guide to How Computers Do Math : Featuring the Virtual DIY Calculator.  "Clive 'Max' Maxfield and Alvin Brown have written a wonderful book... about the essential workings of computers." (The Embedded Muse 125, February 22, 2006).
  2. Bebop Bytes Back: An Unconventional Guide to Computers (paperback, August 1, 1997).

On Legacy Code

Last issue I wrote a bit about legacy code. Readers had some interesting thoughts. Tom Mazowiesky wrote:

On Legacy code - I do both full time work and consulting so I support legacy code and develop new.  

I started at my full time job as a firmware developer in 1994 at my present company, and the code was an assembly language mess.  We manufacture bill validators for slot games and vending machines for international customers.  Each customer had a unique version of a 300K byte source program, outside of the currency database which was a separate module, but copied into the source file for each customer.  We had multiple customers for each of about 60 different currencies we supported, and there were hardware differences as well!

When I joined they were introducing a new version of the product and I took the most generic program and began to change it into modular code.  We wound up with 50 or 60 assembler modules including a separate database for each currency.   We managed to cut our error rates down by a factor of ten using the new code, and of course when we fixed a bug in one source we were done - didn't have to edit 100+ versions of the same source.

Now it took a fair amount of effort on my part to do that, but when you look at how it improved the product and reduced customer complaints, it was worth the time of a senior software guy (me) to do this.

Our next new machine used C as the base language and we did modular development from scratch.  This effort occurred in the 1999-2000 time frame.  When the product was introduced it had some problems, but most were hardware/mechanical.  There were software problems but they were manageable, most due to how our customers implemented our communication protocols.  We changed code to fix their bugs (these machines are in regulated markets, so code changes on their part cost them between 10K and 100K dollars to resubmit) about 95% of the time.

We eventually upgraded the processor in the machines with a drop-in DSP board to improve speed of recognition.  About 85% of the code ported over directly; the rest was low-level hardware code that was processor specific.

It's been 18 years since we did the first 'C' machine and 12 years since the DSP upgrade.  We used LINT at the start (still do) and did a lot of code walkthroughs at the beginning and it really paid off.  We don't do any bug hunting these days; it's been two years since someone found a problem - a module that was clearing a flag it shouldn't have - so we can dedicate resources to new product development.

Using modular development, LINTing and code reviews really return dollar value benefits.  It's bad because you don't need as many people supporting your product;  but it's good because you don't need as many people supporting your product (the yin/yang thing).

Steve Peters sees a parallel to hardware:

Legacy code is the bane of software developers. Some of the worst code I've ever seen was created by developers supposedly more experienced than myself. I've spent countless overtime hours refactoring crap code out of desperation because it was no longer fixable; no way to tell by looking what would happen during runtime. At least I had the luck to spend about 1/3 of my career on new development. But chances are, a developer's fate is fixing someone else's junk. Worst of all, with no documentation or meaningful comments, too much precious time gets wasted reverse engineering code that probably should be tossed overboard. Imagine how any hardware designer would laugh if presented with a hopelessly outdated motherboard and asked to "just make a few modifications" to make it perform more like a modern platform. Without schematic or any other doc of any kind. Yet that is what software developers routinely face. Think I'm exaggerating? Consider the Y2K problem, where 30+ year old stranded legacy systems had to be "modernized". Would anyone modernize a 30 year old motherboard? Sometimes it really is necessary to just start over.

Lars Pötter contributed:

Creating a new Project from scratch is not the solution for code rotting. I have seen a major Software Project being started from scratch as the new better version of what was developed for many years before.

Once it was finished, we looked in the new version for solutions to the issues the old version had, and found the same issues.

The solution can only be to stop the degradation of the code. The boy scout rule applies: "Always leave the campground cleaner than you found it." So writing unit tests for the parts you change, and putting in good code at least in those parts, increases the overall code quality bit by bit. It is also very effective, as you basically only put in tests at places that had bugs before. So you could argue that the code that "never fails anyway" doesn't get the "unnecessary" tests.

The other benefit of this workflow is that you learn more and more about the code (and hopefully add comments as soon as you understood something new). Often when code is thrown away and rewritten people realize during the rewrite or afterwards that they did not understand what the old code was doing and why it did it that way. I have seen many examples where the argument for the rewrite was some "crazy complicated" algorithms used in the old code. The new code then failed in some "rare cases" and in the end the new code was changed to use the same "crazy complicated" algorithms because they work in all the needed use cases. Who doesn't learn from history,...

A rewrite instead of maintaining code should only be done by someone who really understands the old code down to the last line. Otherwise you just reimplement the same problems. If you do not understand the code, start with tests and refactor until the code does what you want it to do.

If you cannot improve the quality of existing code, and if all the projects you work on degrade into a state where a rewrite is the only solution, then get a job outside of programming!

This is a good point. The second law of thermodynamics says that disorder increases in closed systems. Entropy increases. Programs are no more exempt from this depressing truth than the expanding universe. Successive maintenance cycles increase the software's fragility, making each additional change that much more difficult.

Software is like fish. It rots. Over time, as hastily written patches and panicked feature changes accumulate in the code, the quality of the code erodes. Maintenance costs increase.

As Ron Jeffries has pointed out, maintenance without refactoring increases the code's entropy by adding a "mess" factor (m) to each release. That is, we're practicing really great software engineering all throughout a maintenance upgrade... and then a bit of business reality means we slam in the last few changes as a spaghetti mess.

The cost to produce each release looks something like: (1+m)(1+m)(1+m)..., or (1+m)^N, where N is the number of releases. Maintenance costs grow exponentially as we grapple with more and more hacks and sloppy shortcuts. This explains that bit of programmer wisdom that infuriates management: "the program is too much of a mess to maintain".

But many advocate starting release N+1 by first refactoring the mess left behind in version N's mad scramble to ship. Refactoring incurs its own cost, r. But it eliminates the mess factor, so releases cost 1+r+r+r..., which is linear. This math is more anecdotal than accurate, but there's some wisdom behind it.
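
The two cost curves are easy to compare numerically. The Python sketch below simply evaluates the two expressions from the text, (1+m)^N and 1 + r*N. The 10% values chosen for m and r are made up purely for illustration, and the function names are hypothetical.

```python
def cost_without_refactoring(releases, mess):
    """Relative cost of release N when a mess factor compounds: (1 + m)^N."""
    return (1.0 + mess) ** releases


def cost_with_refactoring(releases, refactor):
    """Relative cost of release N with a fixed refactoring cost: 1 + r*N."""
    return 1.0 + refactor * releases


if __name__ == "__main__":
    MESS = 0.10      # assumed mess factor m per release (illustrative only)
    REFACTOR = 0.10  # assumed refactoring cost r per release (illustrative only)
    for n in (1, 5, 10, 20, 40):
        compounding = cost_without_refactoring(n, MESS)
        linear = cost_with_refactoring(n, REFACTOR)
        print(f"release {n:2d}: no refactoring {compounding:6.2f}, "
              f"with refactoring {linear:5.2f}")
```

With these made-up values the two approaches cost the same at the first release, but by release 10 the compounding curve is already about 2.6 versus 2.0, and the gap widens quickly from there.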

Luke Hohmann calls this "post release entropy reduction." It's critical we pay off the technical debt incurred in being abusive to the software. Maintenance is more than cramming in new features; it's also reducing accrued entropy.

Another Ideal Diode

A lot of readers responded to the Ideal Diode in the last Muse. It seems many such products are available. Enrico wrote:

Regarding the ideal diode, I just wanted to "advertise" a part which I used in the past and was incredibly simple, small and cheap.

The LTC4412HV is an ideal diode controller, meaning the current capability is limited only by the MOSFET technology, which is constantly improving. The resulting diode has a forward voltage of only 20mV, provided the external P-MOS works in its triode region: the controller keeps the MOSFET "closed enough" to hold that sensed Vds across it.

The MOSFET must work with a Vgs between the minimum input voltage and a clamped 7V, while its maximum current is easily found as 20mV/Rds_on. A short search at any distributor shows that it is really easy to find a suitable MOSFET for the application, unless you have really particular needs. The drawback is having to balance the current, input voltage and Rds of the MOSFET so that the controller can actively maintain a 20mV drop, especially when dealing with P-MOS.

But the most amazing thing is that this part can work with a twin (or more) in parallel (like the LTC4376), can drive two MOSFETs when used as a power selector, and can be configured as a switch controller. It also provides a logic input for an MCU interface, like the LTC4376 and many others. I wanted to share this because its operating voltage range of 2.5V to 36V is similar, and I think it is cheaper, giving more design flexibility (MOSFETs can be really cheap as well).

Why list all these great things (great to me, at least)? To try to compensate for the big downside: these ideal diodes are at least 2 orders of magnitude slower than a conventional one. I would not use one in a fast DC-DC converter that is designed to use a conventional free-wheeling diode.
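
Enrico's sizing rule, maximum current = 20mV/Rds_on, can be sketched in a few lines of Python. The Rds_on values below are hypothetical examples, not figures from any particular MOSFET datasheet, and the function name is invented here.

```python
SENSE_DROP_VOLTS = 0.020  # the 20 mV regulated forward drop quoted in the letter


def max_regulated_current(rds_on_ohms, drop_volts=SENSE_DROP_VOLTS):
    """Largest current at which the drop can still be held: I = V / Rds_on."""
    if rds_on_ohms <= 0.0:
        raise ValueError("Rds_on must be positive")
    return drop_volts / rds_on_ohms


if __name__ == "__main__":
    # Hypothetical Rds_on values in ohms (10, 25 and 50 milliohms)
    for rds_on in (0.010, 0.025, 0.050):
        amps = max_regulated_current(rds_on)
        print(f"Rds_on = {rds_on * 1000:.0f} mohm -> about {amps:.2f} A at 20 mV")
```

Above that current a fully enhanced MOSFET would presumably just behave as a resistor, with a drop of roughly I times Rds_on, which is why the balance between current and Rds matters.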

This Week's Cool Product

Semmle sells a tool called QL, which allows developers to ask deep questions about their code. QL treats the code as data, and, like a database, provides a mechanism so users can construct deep queries about the code. Sounds sort of like marketing hand-waving, until one looks at some of their case studies. Engineers at JPL found a bug in the Mars Curiosity Rover firmware which could have resulted in the loss of the mission. Instead of just fixing it and moving on, in 20 minutes they constructed a QL query which found the same problem in 30 other places! That sort of engineering appeals to me: learn something, then assume you could have made a similar mistake, so hunt for more instances of it.

I haven't used it, and can't help but wonder if considerable training is needed about the query language, but always find it heartening to discover more sophisticated tools for error-removal.

Note: This section is about something I personally find cool, interesting or important and want to pass along to readers. It is not influenced by vendors.

Jobs!

Let me know if you’re hiring embedded engineers. No recruiters please, and I reserve the right to edit ads to fit the format and intent of this newsletter. Please keep it to 100 words. There is no charge for a job ad.

Joke For The Week

Note: These jokes are archived here.

Not a joke, but funny. Jeanne Petrangelo sent this:

A woman operating a slot machine in New York "won" $42,949,672.76. Note that 2^32 is 4,294,967,296. Ya think the software glitched?

The casino offered her two bucks and a free dinner instead of $2^32/100.
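
The story doesn't say what actually went wrong, so the following Python sketch is only a guess at the kind of arithmetic that produces a number like this. It assumes, purely for illustration, a credit balance held in cents in an unsigned 32-bit variable that is decremented below zero. The helper names are made up.

```python
UINT32_MODULUS = 2 ** 32  # 4,294,967,296


def wrap_uint32(value):
    """Reduce an integer the way a 32-bit unsigned variable would."""
    return value % UINT32_MODULUS


def cents_to_dollars(cents):
    """Format a non-negative whole number of cents as a dollar string."""
    return f"${cents // 100:,}.{cents % 100:02d}"


if __name__ == "__main__":
    # Hypothetical scenario: a balance of 0 cents is decremented by 20 cents.
    balance = wrap_uint32(0 - 20)
    print(balance)                    # 4294967276
    print(cents_to_dollars(balance))  # $42,949,672.76
```

Under that assumption, underflowing by just 20 cents yields exactly the figure on the display.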

About The Embedded Muse

The Embedded Muse is Jack Ganssle's newsletter. Send complaints, comments, and contributions to me at jack@ganssle.com.

The Embedded Muse is supported by The Ganssle Group, whose mission is to help embedded folks get better products to market faster.