The Embedded Muse 251


The Embedded Muse
Issue Number 251, December 16, 2013
Copyright 2013 The Ganssle Group

Editor: Jack Ganssle, jack@ganssle.com


You may redistribute this newsletter for noncommercial purposes. For commercial use contact jack@ganssle.com. To subscribe or unsubscribe go to https://www.ganssle.com/tem-subunsub.html or drop Jack an email.

Contents

Editor's Notes
Quotes and Thoughts
Tools and Tips
DMMCheck
RAM Failures
More Funny Products
On Bad Habits
Jobs!
Joke for the Week
Advertise with us
About The Embedded Muse

Editor's Notes

Did you know it IS possible to create accurate schedules? Or that most projects consume 50% of the development time in debug and test, and that it’s not hard to slash that number drastically? Or that we know how to manage the quantitative relationship between complexity and bugs? Learn this and far more at my Better Firmware Faster class, presented at your facility. See https://www.ganssle.com/onsite.htm.

In my opinion the quality of technical magazines (and they're all web-based now) is declining, a victim of the drive to cut costs. But Altera's System Design at the Leading Edge is different. The current issue is packed with useful articles for embedded hardware engineers and is devoid of marketing fluff. Recommended.

Quotes and Thoughts

"Computer system analysis is like child-rearing; you can do grievous damage, but you cannot ensure success." - Tom DeMarco

Tools and Tips

Please submit neat ideas or thoughts about tools, techniques and resources you love or hate.

DMMCheck

How accurate is your DMM? If you work for a decent-sized company, they probably calibrate at least some of the test equipment every year. But many others never check calibration. Buy a meter off eBay and you have no idea if it reads accurately.

I recently came across the Voltagestandard web site, which offers a number of low-cost references, and ordered one of their DMMCheck devices. The specs are amazing considering the price:

5.000 volt output reference, accurate to 0.01% (+/- 500 µV)
1.000 mA reference, accurate to 0.1% (+/- 1 µA)
Three 0.1% resistors: 1 kΩ, 10 kΩ and 100 kΩ

The unit runs off an included 9 V battery.

It comes with a calibration certificate, and its accuracy is guaranteed for six months. For the first two years recals are free (user pays shipping) and after that it's $5. The vendor checks each unit against an 8.5 digit DMM that is recalibrated every year.

Calibration certificate for the DMMCheck



The DMMCheck's 5.000 volt output.

Obviously, this unit provides only a few fixed values and isn't a substitute for sending your DMM to a cal lab, but at $35 (for 25 PPM resistors; add $4 for 10 PPM) it's a bargain for those who don't get regular meter calibrations.
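How do you decide whether a meter passes against the DMMCheck? Here is a minimal sketch in C, assuming the meter's accuracy is specified the usual way, as a percentage of reading plus a number of counts of the last digit. The `meter_spec` type and function names are hypothetical, and the example spec used with it (0.5% + 2 counts, 1 mV resolution) is made up; substitute your meter's datasheet numbers. It takes the worst case, stacking the reference's own tolerance on top of the meter's allowed error.

```c
#include <stdbool.h>

/* Hypothetical meter accuracy spec of the common form
   +/-(pct_of_reading % of reading + counts of the least significant digit). */
typedef struct {
    double pct_of_reading;  /* e.g. 0.5 means 0.5% of reading */
    int counts;             /* extra counts of the least significant digit */
    double resolution;      /* value of one count on the range in use */
} meter_spec;

static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Reference tolerance from its percentage spec: 0.01% of 5.000 V is 500 uV. */
double ref_tolerance(double nominal, double ref_pct)
{
    return abs_d(nominal) * ref_pct / 100.0;
}

/* Worst-case error the meter is allowed at this reading. */
double meter_tolerance(const meter_spec *m, double reading)
{
    return abs_d(reading) * m->pct_of_reading / 100.0 + m->counts * m->resolution;
}

/* Worst-case check: the reading passes if its distance from the reference's
   nominal value fits inside the meter's spec plus the reference's tolerance. */
bool reading_in_spec(double nominal, double ref_pct, const meter_spec *m, double reading)
{
    double allowed = ref_tolerance(nominal, ref_pct) + meter_tolerance(m, reading);
    return abs_d(reading - nominal) <= allowed;
}
```

With the made-up spec above, a reading of 5.010 V passes (about 27.5 mV allowed) while 5.040 V fails. The same check applies to the 1.000 mA output (0.1% is 1 µA) or the resistors, given the meter's spec for that range.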

RAM Failures

A very recent paper with the unusual title The Feng Shui of Supercomputer Memory (you have to have access to the ACM Digital Library to get this) looks at DRAM (DDR3) and SRAM failures. This very interesting study shows that SRAMs experience transient (e.g., cosmic-ray-induced) failures tens to hundreds of times more often than hard failures. For SRAM in L2 caches, a computer at an elevation of 875 feet had 77 times more transient errors (single event upsets, or SEUs) than permanent ones. Another, at 7,500 feet, raised that ratio to 441. Both machines were of similar design and used identical SRAM.

An obvious question is: what was the failure rate in absolute terms? The paper is silent about this. Many studies have used neutron sources to induce SEUs, but it's hard to see how that translates to a system running in the real world. The 1996 paper Single Event Upset at Ground Level by Eugene Normand summarizes a lot of unpublished data. It seems SRAMs of that era experienced SEUs at ground level at a rate of about 2 x 10^-12 per bit-hour. That's within an order of magnitude of the data Xilinx has published for the SRAMs in their FPGAs.

Suppose a small embedded system has 1 MB of SRAM. With these numbers, one bit could be expected to change at random about every seven years. It's impossible to say how many of those SEUs matter; some areas of memory will be unused, and an LSB changing on sensed data may not matter much. But if a company produces 1000 products with this memory in them, then about 140 of those systems could experience an SEU each year, or one every few days.
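The back-of-the-envelope arithmetic can be sketched in C. The 2 x 10^-12 upsets per bit-hour is Normand's ground-level figure; `flux_multiplier` is an assumed knob for scaling that rate by altitude or latitude (for example 3.5 for Denver). The function names are illustrative, and the model assumes upsets arrive at a constant average rate.

```c
/* Average hours in a year (365.25 days). */
#define HOURS_PER_YEAR 8766.0

/* Expected single event upsets per hour in one unit. */
double seu_per_hour(double ram_bytes, double rate_per_bit_hr, double flux_multiplier)
{
    return ram_bytes * 8.0 * rate_per_bit_hr * flux_multiplier;
}

/* Mean years between upsets in one unit. */
double years_between_seus(double ram_bytes, double rate_per_bit_hr, double flux_multiplier)
{
    return 1.0 / (seu_per_hour(ram_bytes, rate_per_bit_hr, flux_multiplier) * HOURS_PER_YEAR);
}

/* Expected upsets per year across a fleet of fielded units. */
double fleet_seus_per_year(double units, double ram_bytes,
                           double rate_per_bit_hr, double flux_multiplier)
{
    return units * seu_per_hour(ram_bytes, rate_per_bit_hr, flux_multiplier) * HOURS_PER_YEAR;
}
```

For 1 MB (1,048,576 bytes) at sea level, `years_between_seus(1048576.0, 2e-12, 1.0)` comes out near 6.8 years, and `fleet_seus_per_year(1000.0, 1048576.0, 2e-12, 1.0)` near 147 per year, about one every two and a half days. A multiplier of 3.5 shrinks the per-unit interval to just under two years.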

An online cosmic-ray flux calculator shows that at 40,000 feet the flux is about 500 times stronger than at sea level, and varies logarithmically with altitude. So systems shipped to Denver will experience SEUs about 3.5 times more often than the sea-level rate of one per seven years per system.

But wait, there's more. Actel (now Microsemi) has a paper titled Understanding Single Event Effects (SEEs) in FPGAs; this and other sources suggest the neutron flux from cosmic rays doubles as one travels from New York to Stockholm, due to the higher latitude.

So I started thinking about RAM tests.

Even a transient error can corrupt the stack, pointers (gasp, function pointers!) and critical data structures. Such an error is no less harmful than a bit that is stuck high or low. But a transient failure is, well, transient. Rewrite the location and it goes away. A RAM test will miss transient failures, which, according to the first paper I cited, are by far the most common sort. Or, if the bit flip happens in the middle of the test, it will report a failure that disappears as soon as the flagged location is rewritten. A false positive.
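To see why, here is a minimal sketch in C of a non-destructive, per-cell pattern test. It runs against a simulated RAM: the `sim_ram` type and its stuck-bit masks are hypothetical, invented here to model hard faults. On real hardware you would test the memory directly through a volatile pointer, with interrupts disabled around each save/test/restore sequence. The test catches a stuck bit, but a bit that flipped before the test is saved and faithfully written back.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RAM model: stuck_hi/stuck_lo masks force individual bits
   high or low on every write, mimicking hard (permanent) faults. */
typedef struct {
    uint8_t *cells;
    const uint8_t *stuck_hi;  /* per-cell mask of bits stuck at 1, or NULL */
    const uint8_t *stuck_lo;  /* per-cell mask of bits stuck at 0, or NULL */
    size_t len;
} sim_ram;

static void ram_write(sim_ram *r, size_t i, uint8_t v)
{
    if (r->stuck_hi) v |= r->stuck_hi[i];
    if (r->stuck_lo) v &= (uint8_t)~r->stuck_lo[i];
    r->cells[i] = v;
}

static uint8_t ram_read(const sim_ram *r, size_t i)
{
    return r->cells[i];
}

/* Save the cell, write two complementary patterns (so every bit must hold
   both a 0 and a 1), verify each, then restore the saved contents. */
static int ram_test_cell(sim_ram *r, size_t i)
{
    static const uint8_t patterns[] = { 0x55, 0xAA };
    uint8_t saved = ram_read(r, i);
    int bad = 0;

    for (size_t p = 0; p < sizeof patterns; p++) {
        ram_write(r, i, patterns[p]);
        if (ram_read(r, i) != patterns[p])
            bad = 1;
    }
    ram_write(r, i, saved);  /* restores whatever was there, flipped bits included */
    return bad;
}

/* Returns the number of cells that failed the pattern test. */
size_t ram_test(sim_ram *r)
{
    size_t failures = 0;
    for (size_t i = 0; i < r->len; i++)
        failures += (size_t)ram_test_cell(r, i);
    return failures;
}
```

Flip a bit in a cell and run `ram_test`: it reports zero failures and the corrupted value is still there afterward. Force a bit stuck high and the same test reports the cell. A test this simple finds stuck-at bits only; coupling and addressing faults need march-style tests.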

It's common to run a RAM test at boot-up. Some systems run background RAM checks all of the time. But if the authors' data is correct, those tests will catch only a tiny percentage of the unwanted bit flips. Conversely, the vast majority of errors the test does detect will be SEUs that struck mid-test, and thus irrelevant false positives.

Transistors are roughly free. The MCU world has morphed from simple little 8 bit CPUs to 32 bitters packed with complex I/O and gobs of memory. Transistor geometries are shrinking as well, and each new node is more sensitive to high-energy particles. Maybe it's time the semiconductor vendors paired a parity bit, or even ECC, with each memory location. The additional cost would be very low.
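To make the parity-versus-ECC distinction concrete: a single parity bit per location detects a one-bit flip but can't fix it, while a SEC-DED (single-error-correct, double-error-detect) Hamming code can repair it. Below is a minimal C sketch protecting one byte with 4 Hamming check bits plus an overall parity bit, 13 bits in all. It is a textbook construction chosen for illustration, and the function names are hypothetical; real ECC memories typically protect wider words, such as 8 check bits per 64 data bits.

```c
#include <stdint.h>

/* Codeword bits 1..12 hold a Hamming(12,8) code: check bits at the
   power-of-two positions 1, 2, 4, 8 and data bits at the rest. Bit 0 holds
   an overall parity bit that upgrades single-error correction to SEC-DED. */
static const uint8_t data_pos[8] = { 3, 5, 6, 7, 9, 10, 11, 12 };

uint16_t secded_encode(uint8_t data)
{
    uint16_t cw = 0;
    for (int i = 0; i < 8; i++)
        if (data & (1u << i))
            cw |= (uint16_t)(1u << data_pos[i]);

    /* Check bit p makes the parity of all positions whose index has bit p set even. */
    for (unsigned p = 1; p <= 8; p <<= 1) {
        unsigned parity = 0;
        for (unsigned pos = 1; pos <= 12; pos++)
            if ((pos & p) && (cw & (1u << pos)))
                parity ^= 1u;
        if (parity)
            cw |= (uint16_t)(1u << p);
    }

    /* Overall parity over bits 1..12, stored in bit 0. */
    unsigned all = 0;
    for (unsigned pos = 1; pos <= 12; pos++)
        all ^= (cw >> pos) & 1u;
    return (uint16_t)(cw | all);
}

/* Returns 0 if clean, 1 if a single-bit error was corrected, 2 if an
   uncorrectable (double-bit) error was detected; *out is set only for 0 or 1. */
int secded_decode(uint16_t cw, uint8_t *out)
{
    unsigned syndrome = 0, overall = 0;
    for (unsigned pos = 1; pos <= 12; pos++)
        if (cw & (1u << pos))
            syndrome ^= pos;             /* XOR of set positions: 0 for a valid word */
    for (unsigned pos = 0; pos <= 12; pos++)
        overall ^= (cw >> pos) & 1u;     /* 0 for a valid word */

    int status = 0;
    if (overall) {                       /* odd number of flips: assume exactly one */
        if (syndrome > 12)
            return 2;                    /* points outside the word: not a single flip */
        if (syndrome)
            cw ^= (uint16_t)(1u << syndrome);  /* flip the bad bit back */
        status = 1;                      /* syndrome 0 means only bit 0 flipped */
    } else if (syndrome) {
        return 2;                        /* even flips, nonzero syndrome: double error */
    }

    uint8_t d = 0;
    for (int i = 0; i < 8; i++)
        if (cw & (1u << data_pos[i]))
            d |= (uint8_t)(1u << i);
    *out = d;
    return status;
}
```

Flip any one of the 13 bits and `secded_decode` returns 1 along with the original byte; flip two and it returns 2. The transient upset that a RAM test can't see is exactly the case ECC corrects on the next read.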

More Funny Products

A. J. van de Ven responded to Howard Speegle's tale of finding an unconnected microprocessor in a blender:

Howard Speegle’s "contains microprocessor" story reminded me of Clos de la Tech, a winery built by T.J. Rodgers, CEO of Cypress Semiconductor. A number of his wines include Cypress SRAM and PSoC chips attached to the bottom of the bottle.

Makes a great gift for any techies within the family.

Harold Hallikainen made me crack up:

The food processor story reminds me of a magazine's April "Idea for Design." It calculated log(s) using a Z80 wired as a diode in the feedback path of an op amp.

Tom Guadagnola wrote:

I am a consultant, and in the 1980s we designed a device that mounted on a cow's tail for a large Australian dairy farm. The device included an MCU, an RF receiver and a CO2-actuated cuff. A radio transmitter in the milking area signaled the MCU to pulse the CO2 cartridge and squeeze the cow's tail while it was being milked. Apparently when a cow's tail is squeezed at the correct rate the milk is "dispensed" faster. These units were built in large quantities for less than $2 each. We always wanted to know exactly what the farmer was doing to the cow when he discovered this phenomenon.

Steve Paik has a fantastic Annoy-a-tron story:

A few years back, a couple of coworkers put one inside the drop ceiling of our team lead. He had no idea what or where it was, and spent a few MONTHS looking for it. The rest of the team was in on the secret and would tell him "I didn't hear anything" or "it must be your monitor" and kept him changing offices and equipment. On one suggestion, he even went to the doctor to get his hearing checked! Finally, as an act of mercy, one of the conspirators moved the Annoy-a-tron into his laptop bag when he went on a business trip for interoperability testing, knowing that he would find it. When the team lead found it, he immediately called one of the team members (a non-conspirator) to ask about it. The engineer told him who did it and what had been going on.

At this point, the team lead phoned our director to let him in on a little revenge prank. Shortly afterward, he sent an e-mail to the whole team, telling us that he had been asked to leave the customer's site and we had to suspend interop testing (potentially costing us a multi-million dollar account) because his equipment was chirping and it was driving the customer's engineers crazy. The two conspirators immediately went to our director to confess their sins and tell our team lead how to shut off the Annoy-a-tron. The director (and VP) proceeded to "chew them out" for jeopardizing the account. They both thought they were going to be fired on the spot, but luckily everything was cleared up by lunch and we had a good laugh over it.

On Bad Habits

Weland Treebark responded to the last issue's comments on developing bad habits:

I thought I'd write to you about the matter of avoiding developing bad engineering habits. It's a problem I am often confronted with. The part of the world I live in tends to be better known as a place where development is outsourced to, rather than outsourced from. Consequently, even mission-critical applications suffer from outrageously bad engineering, so bad that they are borderline life-threatening. Everyone starts well-intentioned, but they rarely stop to think about what they are doing with deadline after deadline piling in, and the authorities known as Delivery Managers -- generally unable to tell a transistor from a bumblebee -- don't really care about safety, just about, well, delivery. Bad habits pile up.

For my part, I avoid working in outsourcing companies like the plague. Wherever I work, I try to bring the following with me:

1. I treat missing or out-of-date documentation as a bug. There are no separate "Implement module X" and "Document module X" tasks: documenting module X is part of implementing it, and implementation is not done until it's documented. What does help is that I actually *like* documenting my code. Words flow easily for me (I worked as a tech journalist at some point) even in English, which isn't my native language, so it isn't as dreadful as it is to other programmers.

2. I try to be pessimistic about deadlines and ring a bell when one is outrageous. This probably comes with experience. During my first two years as a programmer, I'd regularly mess up schedules. Now that happens less and less often. Working with no time to look back invites unclever hacks to pile up.

3. What is probably most important, I try to set standards for my work and I'm confrontational about them if I have to be. Even if I'm writing a small proof-of-concept for a demo on a small board I assembled in an afternoon, the code has to be clean and the board has to look good. Sure, it doesn't have to be feature-complete, but it has to be acceptable in terms of engineering. I never buy into the "never mind that, it has to work for now, we'll make it look good later"; that "later" never comes. If I have to yell, kick and scream to get the time I need, then I'll yell, kick and scream.

A lot of things that are valuable to us as engineers seem to be of little value to the people in the organisations we work for, and this is neither wrong per se, nor surprising. However, I try not to lower my engineering standards just to score points with management. That's pretty much the equivalent of a teenager giving in to peer pressure and starting to smoke. I'm not uncompromising, I just ask for good reasons if I am to do a sloppy job.

There's no reason to lie about it -- it's not easy, especially when colleagues (regardless of where they lie in the hierarchy of the organisation) are not supportive, or when you are developing things you do not really believe in, but I think it is part of an ethical obligation we have as engineers, the same way good doctors wouldn't do a sloppy job even if they were treating a common cold.

Luca Matteini wrote:

About the invisible war against change, I could write pages of practical examples. I feel engineering is a "performing art": I design and build something, every time trying to make something new and learn something new, in an expressive way.

That's not so common, though. In the last 25 years I've seen hardware and software developers pursue a different kind of personal development. Most of the time people try to find a single design pattern, a utopian universal pattern that, once found(?), will be their work method in the years(!) to come.

So I've seen 1980s hardware designs that are still used today, even though easier and more efficient schemes exist in the 21st century. I've seen software developers cling to ten-year-old projects because "they've always been shipped this way," and keep maintaining zombie code.

How incomprehensible my coding style must be to them, with sources that evolve and change every time I find a new tool or idea. Not to mention when I design new hardware and read all the product news to find alternatives to what "already worked so well": poor me!

I think the right word is "innovation," though it should never be used alone. That's perhaps where electronic design differs so much from other forms of "art." Here we need to be prepared and analytical, and only after that can we be innovative and expressive. Embedded design craftsmen?

Maybe the only best practice against bad habits in design is thinking of another way to redesign an already finished product. That will become the next bad habit to fight!

Jobs!

Let me know if you’re hiring embedded engineers. No recruiters please, and I reserve the right to edit ads to fit the format and intent of this newsletter. Please keep it to 100 words.

Joke For The Week

Note: These jokes are archived at www.ganssle.com/jokes.htm.

John Black found this on-line:

In 1998, I made a C++ program to calculate pi to a billion digits. I coded it on my laptop (Pentium 2, I think) and then ran the program. The next day I got a new laptop but decided to keep the program running. It has been over seven years now since I ran it, and this morning it finished calculating.

The output:
"THE VALUE OF PI TO THE BILLIONTH DIGIT IS = "

I looked in the code, and found out that I forgot to output the value. :(

Advertise With Us

Advertise in The Embedded Muse! Over 23,000 embedded developers get this twice-monthly publication.

About The Embedded Muse

The Embedded Muse is Jack Ganssle's newsletter. Send complaints, comments, and contributions to me at jack@ganssle.com.

The Embedded Muse is supported by The Ganssle Group, whose mission is to help embedded folks get better products to market faster.