The Embedded Muse 350

The Embedded Muse
Issue Number 350, May 21, 2018
Copyright 2018 The Ganssle Group

Editor: Jack Ganssle, jack@ganssle.com

You may redistribute this newsletter for non-commercial purposes. For commercial use contact jack@ganssle.com. To subscribe or unsubscribe go here or drop Jack an email.

Contents

Editor's Notes
Quotes and Thoughts
Tools and Tips
Freebies and Discounts
Tin Whiskers
Are Bugs a Problem - Given Quick Fixes? (Redux #2)
The Vanishing Embedded Engineer?
This Week's Cool Product
Jobs!
Joke for the Week
Advertise with us
About The Embedded Muse

Editor's Notes

Years ago a reader wrote in to tell about how he was on a plane waiting to come home. Maintenance crews kept coming onto the plane and going off again. Clearly there was a problem. Finally the captain came on the intercom and announced they were going to turn the aircraft off for thirty seconds, and then turn it back on.

It worked. They flew.

The past few issues of The Embedded Muse have included first my comments on the pursuit of firmware quality, and then readers' responses. Perfection is hard, perhaps even impossible, to achieve. But it is, in my opinion, a goal to which we should aspire. Somehow the electronics industry has created a world where we toggle circuit breakers on planes to fix problems, and where every three-year-old knows that if something electronic does something weird, you cycle power. It seems getting a product to market is more important than getting it right.

Yet there's a tsunami of data that shows writing high-quality code shortens schedules. Why? Correct software drastically shortens debug and test sessions.

That's what my Better Firmware Faster seminar is all about. Find out how you can bring this class to your facility, to help your engineers achieve world-class code on a shorter schedule. More info is here.

On June 27-28 NIST will host the Sound Static Analysis for Security Workshop at its facility in Gaithersburg, MD. To quote from their communications: "This two-day workshop is focused on decreasing software security vulnerabilities by orders of magnitude, using the strong guarantees that only sound static analysis can provide. The workshop is aimed at developers, managers and evaluators of security-critical projects, as well as researchers in cybersecurity." I plan to attend. It's free, and there's more info here. (This link is corrected from the one I posted last issue.)

Quotes and Thoughts

Clyde Shappee sent a link to a number of (mostly) spaceflight-related quotes: http://spacecraft.ssl.umd.edu/akins_laws.html

Tools and Tips

Please submit clever ideas or thoughts about tools, techniques and resources you love or hate. Here are the tool reviews submitted in the past.

Tony Mactutis, KM7J, wrote:

Jack, for readers seeking technical journalism, may I suggest that they consider the American Radio Relay League's publications, particularly QEX. If I need a break from IEEE's publications (which I do enjoy, when I have the time) but am in the mood for some technical reading, I tend to go to QEX. You won't find articles about RTOSes in QEX, but there is plenty of technical content related to embedded systems, and especially their use in communications systems and test equipment. And many of your readers might just find that ham radio is a hobby that dovetails perfectly with their professional work. There was a time (not that long ago) when it was hard to find an electrical engineer who was not also a ham.

Circuit Cellar just released a nice compendium of PCB vendors' design and quoting tools.

In the last issue I wrote: "A couple of interesting factoids he gave include that the "4" in FR4 PCB material means the speed of a wave on that PCB is about one quarter that of c in a vacuum." A number of readers disputed this. After some research it's pretty clear that "FR" stands for "flame retardant," but the "4" remains, to me, obscure. It appears to be a grade of fire resistance, as suggested by this from Wikipedia: "FR-4 does not specify specific material, but instead a grade of material, as defined by NEMA LI 1-1998 specification". But nearly everyone pointed out that FR-4 has a relative permittivity of about 4, and the speed of light in a material varies inversely as the square root of that, so a signal runs at about one-half c on an FR-4 PCB, not one-quarter. Carl Van Wormer linked to an EDN article which states:
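Separately from the EDN article, the readers' arithmetic can be checked with a minimal Python sketch. It assumes the wave travels entirely inside a non-magnetic dielectric with a relative permittivity of exactly 4. Real FR-4 varies with resin, weave and frequency, and a microstrip trace sees a lower effective value because part of its field is in air. The function and constant names are illustrative only.

```python
import math

# Speed of light in a vacuum, in metres per second (exact by SI definition).
C_VACUUM_M_PER_S = 299_792_458.0


def velocity_fraction(relative_permittivity: float) -> float:
    """Return v/c for a wave in a non-magnetic dielectric: 1 / sqrt(er)."""
    if relative_permittivity < 1.0:
        raise ValueError("relative permittivity must be at least 1")
    return 1.0 / math.sqrt(relative_permittivity)


def delay_ps_per_inch(relative_permittivity: float) -> float:
    """Return the propagation delay in picoseconds per inch of trace."""
    velocity = C_VACUUM_M_PER_S * velocity_fraction(relative_permittivity)
    return 0.0254 / velocity * 1e12  # 0.0254 m per inch, 1e12 ps per second


if __name__ == "__main__":
    er = 4.0  # assumed round number for FR-4, as the readers suggested
    print(f"v/c at er = {er}: {velocity_fraction(er):.2f}")
    print(f"delay: {delay_ps_per_inch(er):.0f} ps per inch")
    # A quarter of c would need a permittivity of 16, far above FR-4's.
    print(f"v/c at er = 16: {velocity_fraction(16.0):.2f}")
```

A permittivity of 4 gives half of c. The "one quarter" figure would require a permittivity of about 16.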

Freebies and Discounts

This month, thanks to the folks at Express Logic, we're giving away 8 copies of Real-Time Multithreading, a great book about RTOSes in general and ThreadX in particular.

Enter via this link.

Tin Whiskers

Earlier this month I met with engineers at Goddard Space Flight Center in Greenbelt, Maryland to talk about tin whiskers. It was a fascinating and scary discussion. What's even more alarming is that so few engineers are aware of the phenomenon.

Tin whiskers (TW) are tiny filaments that grow from tin-plated surfaces. But it's not just tin: zinc, cadmium and perhaps other metals also form whiskers.

Whiskers were first discovered in the 1940s on cadmium-plated surfaces on air-gapped variable capacitors, like the ones once used to tune radios. They would grow long enough to short out plates, suddenly changing the capacitance and thus the frequency the radio was tuned to. Engineers were advised to avoid cadmium-plated surfaces and components, and to use tin or zinc instead.

Alas, later findings implicated those two materials as well.

These whiskers can grow from a tinned surface to short out a circuit. For instance:

The NASA folks have a rogues' gallery of TW-infected components.

TW shorting a connector pin to the connector's shell

TWs are an increasing problem in electronics today because of the EU's RoHS (Restriction of Hazardous Substances) directive. Leaded solder is largely not allowed any more, so tin-based solders are generally used. Older solders were a mixture of lead and tin, and the lead seems to drastically mitigate (though does not entirely eliminate) whiskers. Remove the Pb, as we've done to comply with RoHS, and whiskers can grow. Modern components are so small, and SMT parts have such a tiny lead pitch, that TW shorts are even more likely.

Whiskers are thin, generally between 0.5 and 10 um in diameter. So if enough current flows they may burn open, like a fuse, and the circuit might be self-healing. On the other hand, a lot of circuits operate with mere mA or uA, which won't be enough to zap the TW.

Worse, NASA showed examples of high-powered relays on AWACS planes where a tiny whisker arced and created a plasma, causing other TWs to arc, leading to catastrophic failures where hundreds of amps flowed before the relay's metal box burned through.

AWACS relay destroyed by TW

Whiskers can be long - over a centimeter. So they can bridge even between ICs.

The NASA people showed me a veritable horror show of TW. One example: a 2 x 2 foot section of zinc-bottomed raised flooring from a server room. They figure that one panel has over 10 million whiskers growing. Normally a floor panel isn't much of a problem for electronics, except these get removed for service, and maintenance workers unknowingly brush the whiskers off. Air currents can waft them into the electronics. Oddly, an adjacent panel was whisker-free.

Eleven space missions have had computer failures, and in some cases complete loss of the spacecraft, due to suspected TW problems.

The Space Shuttle program was almost canceled 4 years before its scheduled shut-down because of TW. PCB hold-downs on flight hardware were tin-plated. Enormous numbers of TW were growing. In a shake test (on the ground) a TW dislodged, creating a short and causing a critical control unit to fail. The Shuttle's program manager, beset with yet another potentially catastrophic problem, considered ending the entire program. Ultimately thousands of these hold-downs were replaced in all of the vehicles with tin-free versions.

Shuttle card guides with TW

The mechanism behind their formation isn't understood, but tin atoms migrate between microscopic crystalline grains (called grain boundary diffusion) over surprisingly long distances. These appear to aggregate on one grain, growing the whisker.

No one can predict when or if they will form. Adding at least 0.5% of lead to the tin greatly reduces their formation (while creating RoHS problems), but TW have been observed even on Pb/Sn solders. Conformal coatings help, but don't eliminate them.

Sometimes they are quite visible but often one needs an LED or fiber-optic light to illuminate a sample at different angles to see the whiskers.

What is the solution? There is none known at this time. This is what NASA recommends (from their web site):

NASA has a web site dedicated to whisker research. I thought The Conjuring was scary, but the site will really give engineers nightmares.

Are Bugs a Problem - Given Quick Fixes? (Redux #2)

In Muse 348 I mused about a meme that is circulating that if one can fix bugs quickly, bugs don't matter. Muse 349 had replies from readers who disagreed. That sparked a lot of email from readers who disagreed with the disagreements. Here are some responses:

Paul Carpenter wrote:

I am reminded of recent Dilbert cartoons

http://dilbert.com/strip/2018-05-03

and

http://dilbert.com/strip/2018-05-02

The last TWO weeks have seen one of the large banks in the UK crippled by a major update on their mainframes. For at least a week, branches and customers could not process the majority of non-automated payments. This is still ongoing for week three...

https://www.theguardian.com/business/2018/may/06/tsb-crisis-it-meltdown-enters-third-week-but-progress-being-made

https://www.telegraph.co.uk/news/2018/04/29/half-tsb-online-banking-customers-still-locked-accounts/

https://www.express.co.uk/finance/city/956033/tsb-banking-mobile-services-IT-failure

James Thayer posited:

Software patching does have a place. It offers the opportunity to add features to delivered products and it does offer the opportunity to fix those bugs that do make it out into the field despite our best efforts.

That said, the position that software patching is a means of producing cheaper products faster is flawed.

A rule of thumb that I have used over the decades of my career is that at any given level, finding and fixing a bug is 10 times more expensive than finding and fixing it at the previous level. As it is a rule of thumb, one can quibble with the multiplier, but the principle is sound.

Catching a bug at the requirements level is about as cheap as it gets (apart from not creating the bug in the first place.) No code has been written. No test cases generated. Etc.

Catching a bug at the design level will generally be a bit more expensive since it may require that aspects of the design be reworked.

Once you get to the coding level, the costs start piling up. It took time to write the bug in the first place and it takes developer time to ferret it out and write the fix. Writing code to fix a bug may introduce new bugs. New test cases need to be developed. But at least at this level, there is usually only one or two people chewing up time and money to fix it.

Once the bug makes it out into the testing environment, all of the previous issues still exist but now there are more people involved. Developers, testers, release engineers. The bug is often harder to isolate. Efforts to find the bug may have to compete for scarce testing resources. And the communication channel back to (and from) the developers acquires noise since testers often do not have the same insight as to what might be going on. Red herrings are introduced that chew up time and money chasing the wrong things. And so on. This becomes amplified for complex products that have additional levels of integration and test.

And finally, if the bug makes it out into the field, the problems seen above in the test environment become amplified further. There are even more people involved (including the customer). The communication channel becomes even noisier. The software environment is much less controlled than in the testing lab. Isolation of the problem is greatly complicated if it can only be reproduced in the field. And now there is an additional factor: delivery. The fix will have to be validated against all (or at least a significant portion) of the variants already delivered. Effort must be spent in ensuring that the patch will correctly update all variants. There is the opportunity to create new bugs that only appear in some variants and not others (which are even harder to track down.) Patching actually makes this aspect worse as each patch release (which tends to be much more frequent than a full-blown feature release) creates a new variant of the software floating around in the wild. And finally, there are also intangible costs that result from customer dissatisfaction and loss of good will.

The bottom line is that software patching is a useful tool in that it gives you an out when your development processes fail and a bug does make it all the way to the field. It is not, however, a path to producing better products faster and cheaper.
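James's rule of thumb is easy to put into numbers. The following is only a sketch: the five stage names come from his letter, while the uniform multiplier and the function name are assumptions made for illustration. As he notes, one can quibble with the multiplier.

```python
# Stages in the order the letter walks through them, earliest first.
STAGES = ["requirements", "design", "coding", "testing", "field"]


def relative_fix_cost(stage: str, multiplier: float = 10.0) -> float:
    """Cost of fixing a bug at `stage`, relative to catching it at requirements.

    Assumes every step to a later stage multiplies the cost by the same
    factor, which is a simplification of the rule of thumb.
    """
    level = STAGES.index(stage)  # raises ValueError for an unknown stage
    return multiplier ** level


if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage:>12}: {relative_fix_cost(stage):>8,.0f}x")
```

Even with a much gentler multiplier, say `relative_fix_cost("field", multiplier=3.0)`, a bug that escapes to the field costs far more than one caught at the requirements stage.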

Like me, John Grant comes from the olden days of EPROMs:

I disagree with both your correspondents under "Are Bugs a Problem, Given Quick Fixes (Redux)?". And not just because I grew up in the days when a software upgrade meant levering EPROMs out of their sockets (and my company issued a new software release about every 9 months).

Part of the problem is that actually we're not in control of the patching process. I buy a strawberry ice cream and when I'm halfway through eating it it changes to chocolate. I'm driving my car down the road and it stops for half an hour and when it gets going again the control for the windscreen wipers is where the indicator switch used to be.

When I make a cup of tea I just want to take the milk from the fridge and pour some in. I don't want to open the milk, sniff it, find it's gone off, download a patch, find that's made it worse, download another one, by which time the tea's cold.

Just because suckers will pay money for something doesn't make it useful (except to the person collecting the money). I'm afraid that does sound like an argument from the land of Phineas T Barnum. Back in the 1970s companies made a lot of money selling cars that rusted away in less than 5 years; people bought them because they thought that was how it had to be. Then the Swedes and the Japanese started making cars that lasted 20 years, and guess what, the rest of the world had to up their game to compete.

I appreciate the feedback, both for and against my (uncompromising) take on quality code.

The Vanishing Embedded Engineer?

The always-interesting Jacob Beningo wrote an article in Design News where he speculates that the firmware developer of yore is going extinct. He contends that in the near future developers will be abstracted from the hardware to an unprecedented degree. No longer will we manipulate registers; instead we'll call APIs, rather like Windows programmers. He attributes this to the increasing complexity of, well, everything. We're using all sorts of complex comm protocols, graphics, and resources more akin to desktop development than low-level bit pushing.

I think the picture will be more mixed. It's good news that we're increasingly relying on third-party packages to handle these tasks. That's the whole idea behind reuse.

When I was a young engineer, 45 years ago, we wrote everything in assembly language. We wrote our own RTOSes and communications stacks (such as they were). Today we operate at a higher level of abstraction, and other than nostalgia, no one really wants to go back to those days.

I think Jacob is right in that more firmware people will be distant from the hardware. Managers will want this as deeply-embedded people are expensive, and for years some have told me they use a high-end CPU that will support Linux so their pool of engineers is wider... and cheaper.

But the hardware is a real thing. Our code will run on it. Worse, sometimes it won't run properly, and a deeply-embedded engineer will have to break out a logic analyzer, scope and other tools to probe nodes. To watch comm links. To monitor the PWM to figure out why that brushless DC motor stalls.

There will always be a huge class of products using smaller microcontrollers that can't run Linux or big stacks. Special problems, like building ultra-low power systems, will demand engineers that can write code and understand hardware nuances.

Digi-Key currently lists 72,000 distinct part numbers for MCUs. 64,000 of those have under 1 MB of program memory. 30,000 have 32 KB of program memory or less. The demand for small amounts of intelligent electronics, programmed at a low level, is overwhelming. I don't see that changing for a very long time, if ever.

Those 64,000 MCUs will be programmed by deeply-embedded people who can't conceive of being abstracted from the hardware.
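For those who like to see the arithmetic, here is a small sketch that turns the Digi-Key counts quoted above into percentages. The counts are the rounded figures from the text, and the variable and function names are hypothetical.

```python
# Rounded part-number counts quoted in the text (Digi-Key MCU listings).
TOTAL_MCUS = 72_000
UNDER_1_MB = 64_000      # parts with under 1 MB of program memory
AT_MOST_32_KB = 30_000   # parts with 32 KB of program memory or less


def share_percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


if __name__ == "__main__":
    print(f"Under 1 MB:    {share_percent(UNDER_1_MB, TOTAL_MCUS):.1f}%")
    print(f"32 KB or less: {share_percent(AT_MOST_32_KB, TOTAL_MCUS):.1f}%")
```

By these figures, nearly nine in ten listed MCU part numbers have under 1 MB of program memory, and roughly four in ten have 32 KB or less.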

This Week's Cool Product

I'm not sure when this came out, but ARM's DesignStart gives engineers access to the Cortex-M0 and Cortex-M3 IP. Want to design your own SoC? With DesignStart you can get started - for free. Plop your custom design into an FPGA. Or, commercialize a product paying no up-front fees, just royalties.

Note: This section is about something I personally find cool, interesting or important and want to pass along to readers. It is not influenced by vendors.

Jobs!

Let me know if you’re hiring embedded engineers. No recruiters please, and I reserve the right to edit ads to fit the format and intent of this newsletter. Please keep it to 100 words. There is no charge for a job ad.

Joke For The Week

Note: These jokes are archived at www.ganssle.com/jokes.htm.

From Taylor Hillegeist:

Q: What is the fastest way to write spaghetti code?

A: Copy and Pasta.

Advertise With Us

Advertise in The Embedded Muse! Over 27,000 embedded developers get this twice-monthly publication.

About The Embedded Muse

The Embedded Muse is Jack Ganssle's newsletter. Send complaints, comments, and contributions to me at jack@ganssle.com.

The Embedded Muse is supported by The Ganssle Group, whose mission is to help embedded folks get better products to market faster.