You may redistribute this newsletter for non-commercial purposes. For commercial use contact firstname.lastname@example.org. To subscribe or unsubscribe go here or drop Jack an email.
After over 40 years in this field I've learned that "shortcuts make for long delays" (an aphorism attributed to J.R.R. Tolkien). The data is stark: doing software right means fewer bugs and earlier deliveries. Adopt best practices and your code will be better and cheaper. This is the entire thesis of the quality movement, which revolutionized manufacturing but has somehow largely missed software engineering. Studies have even shown that safety-critical code need be no more expensive than the usual stuff if the right processes are followed.
This is what my one-day Better Firmware Faster seminar is all about: giving your team the tools they need to operate at a measurably world-class level, producing code with far fewer bugs in less time. It's fast-paced, fun, and uniquely covers the issues faced by embedded developers.
On-site Seminars: Have a dozen or more engineers? Bring this seminar to your facility. More info here.
The last issue sparked a lot of reader ideas. Keep 'em coming!
Latest blog: On Evil.
|Quotes and Thoughts|
"While technology can change quickly, getting your people to change takes a great deal longer. That is why the people-intensive job of developing software has had essentially the same problems for over 40 years. It is also why, unless you do something, the situation won't improve by itself. In fact, current trends suggest that your future products will use more software and be more complex than those of today. This means that more of your people will work on software and that their work will be harder to track and more difficult to manage. Unless you make some changes in the way your software work is done, your current problems will likely get much worse." - Watts Humphrey
|Tools and Tips|
Please submit clever ideas or thoughts about tools, techniques and resources you love or hate. Here are the tool reviews submitted in the past.
|Freebies and Discounts|
This month we're giving away a 30 V 10 A power supply.
The contest closes at the end of September, 2018.
Enter via this link.
|On Test - A Story|
Daniel McBrearty wrote:
Testing is critically important, but it won't ensure the code is correct. It's just one of many filters we need to apply.
|On Hardware in Asynchronous Sampling|
In the last issue I wrote about some problems with using hardware to deal with an input wider than the CPU's bus. A number of people had thoughts about this.
Ian Stedman wrote that the async timer problem would not exist if the timers counted using Gray Code. With that, only one bit changes at a time. This is the Gray sequence:
0 → 000
1 → 001
2 → 011
3 → 010
4 → 110
5 → 111
6 → 101
7 → 100
I don't know of any timers that count in Gray Code, but Digi-Key lists 202 encoders that do, out of 7274 total encoders. Code to convert Gray to binary is here.
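The conversions are only a few XORs each way. Here's a hedged sketch in C (my own code, not the linked routine) for 16-bit values:

```c
#include <stdint.h>

/* Binary to Gray: each Gray bit is the XOR of adjacent binary bits. */
uint16_t binary_to_gray(uint16_t b)
{
    return b ^ (b >> 1);
}

/* Gray to binary: fold the XOR down through the bits so every
   binary bit becomes the XOR of all Gray bits above it. */
uint16_t gray_to_binary(uint16_t g)
{
    g ^= g >> 8;
    g ^= g >> 4;
    g ^= g >> 2;
    g ^= g >> 1;
    return g;
}
```

Note that binary_to_gray(2) yields 3 (011) and binary_to_gray(3) yields 2 (010), matching the sequence above.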
Craig Ross wrote:
Years ago, I implemented an asynchronous sampling routine that just did three reads. First read captured the high order word, second read captured the low order word, third read captured the high order word again. If the second read of the high order word matched the first read of the high order word, then there wasn't a roll-over while the low order word was read. If the two reads of the high order word were different, then a roll-over had occurred, so the sequence was repeated. At the time it seemed like a good fix. Do you see problems with this approach, other than the number of steps?
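Craig's double-check can be sketched in C. The "registers" below are plain variables standing in for real timer hardware; the names are mine, not from any particular part:

```c
#include <stdint.h>

/* Stand-ins for two asynchronous 16-bit timer register halves. */
static volatile uint16_t TIMER_HI;
static volatile uint16_t TIMER_LO;

/* Read a coherent 32-bit count from two 16-bit halves: re-read
   until the high word is stable across the low-word read, proving
   no rollover happened in between. */
uint32_t read_timer32(void)
{
    uint16_t hi, lo;

    do {
        hi = TIMER_HI;         /* first read of high word   */
        lo = TIMER_LO;         /* read the low word         */
    } while (hi != TIMER_HI);  /* high word changed: retry  */

    return ((uint32_t)hi << 16) | lo;
}
```

One design note: the loop is unbounded in theory, but it can only repeat when a rollover lands inside the read sequence, so in practice it runs at most twice.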
David Wyland offered this about metastability:
I had a metastability problem early in my career, in 1971. It was in the interface clocking for a floating point unit (FPU). The FPU clock and the CPU clock were not the same, and were not synchronized. There was the dreaded error about once every 20 minutes to 1 hour. And no clue as to its source.
I found out about metastability much later, when I was working in applications at IDT. I read a TI paper about it, and I wrote a paper about metastability in systems using the dual port RAMs IDT made, where the clocks of the 2 ports were asynchronous. A little deeper study showed that when you had metastability, the settling time could be multiplied by up to 10X.
When you clock a signal into a flip-flop (FF) while that signal is changing during the rising edge of the FF clock, the output can take ~10X the normal time to settle. At the time, the settling time of a 74S74 was 5 nanoseconds, so the metastability settling time would be ~50 ns.
The probability of settling time extension falls off rapidly and exponentially with increasing time. By the time you hit 10X, the probability of settling taking that long is vanishingly small. So 10X is pretty conservative, barely measurable in practical terms.
So how to cure the problem? Use a 2-stage shift register. Clock the signal you are sampling in to the 1st flip flop, and clock the output of the first FF into a second FF. The first FF output will settle to its value before being clocked into the second FF. And the second FF output will be clean, with a nominal settling time.
This works if the system clock period is at least 10X the settling time of the flip flops in the system. And a minimum 10X ratio of clock period to FF settling time is a good design margin.
The output from the second FF will be delayed by 1 clock time, but this is seldom a problem in system design.
Another subject in the last issue was using noise to increase the resolution of an ADC. Readers had lots of useful ideas.
Phil M had the intriguing suggestion of using a triangular wave:
4^n oversampling for n bits is correct for Gaussian/white noise, but if you can instead add a triangular voltage ramp to the signal (with a slope of 1 LSB/sample), then you improve this to 2^n, so you get one additional bit for every doubling of the oversampling rate. You can easily generate this ramp signal with a few smartly-chosen capacitors and resistors if the sample clock exists as a physical signal, or with an unused GPIO.
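The classic white-noise decimation Phil refers to (sum 4^n readings, shift right by n to gain n bits) can be sketched like this. adc_read() here is a simulated constant-value stand-in for a real ADC driver:

```c
#include <stdint.h>

/* Simulated 10-bit ADC for demonstration; in a real system this
   would be your ADC driver returning a noisy reading. */
static uint16_t adc_read(void)
{
    return 100;
}

/* Gain n extra bits of resolution by oversampling and decimating:
   accumulate 4^n readings, then shift right by n. (With Phil's
   triangular dither, only 2^n readings would be needed.) */
uint32_t adc_oversample(unsigned n)
{
    uint32_t count = 1UL << (2 * n);   /* 4^n samples */
    uint32_t sum = 0;

    for (uint32_t i = 0; i < count; i++)
        sum += adc_read();

    return sum >> n;   /* result carries n more bits than adc_read() */
}
```

With a constant input the result is simply the reading scaled by 2^n; with real noise the extra bits carry genuine resolution, which is the whole point of the technique.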
Jim Haflinger had an excellent idea: read, for instance, ten samples. Discard the highest and lowest, and average the remaining. The outliers might be, well, outliers.
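Jim's filter is a trimmed mean; a minimal C sketch (my own, assuming at least three samples) looks like this:

```c
#include <stdint.h>

/* Average a batch of samples after discarding the single highest
   and single lowest readings. Caller must supply n >= 3. */
uint16_t trimmed_mean(const uint16_t *samples, unsigned n)
{
    uint32_t sum = 0;
    uint16_t lo = samples[0];
    uint16_t hi = samples[0];

    for (unsigned i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }

    /* Drop one min and one max, average the rest. */
    return (uint16_t)((sum - lo - hi) / (n - 2));
}
```

For ten samples this throws away the two extremes and averages the remaining eight, so a single glitch reading can't skew the result.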
Finally, another subject in the last Muse was about initializing variables. Is the BSS zeroed or not? My take is that I always explicitly initialize.
Rod Chapman, one of the smartest guys I know, is a SPARK advocate:
Chocolate teapot time again from the MISRA committee... this rule is marked "System and Undecidable" in their classification so it requires whole-program analysis, could be really slow, and you are still doomed to some combination of false positives and negatives from a static analysis tool.
Another approach: design the language so that data-flow analysis is Sound (0 false negatives, right?) and computed in P-Time. Sounds like a wacky idea??? Oh no... SPARK had this in 1987... :-)
Steve Karg wrote:
From section 3.5.7 of the C89 standard:
If an object that has static storage duration is not initialized explicitly, it is initialized implicitly as if every member that has arithmetic type were assigned 0 and every member that has pointer type were assigned a null pointer constant.
That inspired me to look more closely at the 500+ mind-numbing pages of the C99 standard, where I found in 6.7.8:
If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
Arithmetic types are initialized to a "positive" zero. In the next letter, Tom Oke talks about a Cyber 172 machine, which used ones' complement math. That representation, for younger readers who never had to deal with it, supports both positive and negative zeroes.
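The distinction the standard draws can be demonstrated in a tiny fragment (a sketch; the variable names are mine):

```c
#include <assert.h>
#include <stddef.h>

static long zeroed;     /* static storage: implicitly zero (C99 6.7.8)  */
static int *null_ptr;   /* static pointer: implicitly a null pointer    */

void check_static_init(void)
{
    int indeterminate;    /* automatic storage: value is indeterminate; */
    (void)indeterminate;  /* reading it before assignment is undefined  */

    assert(zeroed == 0);
    assert(null_ptr == NULL);
}
```

Of course, on embedded targets this guarantee only holds if the startup code actually zeroes the BSS, which is exactly the point of this whole discussion.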
Tom Oke had a problem from the olden days:
Many years ago (very early 1980's) I was a systems programmer for a university's Academic Computing Services and we had a CDC Cyber 6400. This was a machine with 10 peripheral processors, each with 4K of 12-bit words, and a central processor running with 60-bit words.
It booted by reading a panel of 16 x 12 toggle switches (they looked like they were good for about 5 amps each). So the boot program for PPU 0 could be up to 16 x 12-bit words, which were read into the lower area of memory and then executed.
There was a table of the settings for all 16 rows of switches posted by the manufacturer on the panel beside the switches.
An upgrade to the system (to a Cyber 172) brought us a system that now sported additional banks of switches, with the same boot program, listed in the 16-row table.
As you can guess, one day the machine would not boot, and checking the switches against the table produced no answers, until it was noticed that one of the switches in the new upper rows was set.
The table had never been updated, but one of the new switches represented a value that in the old system was read as 0 and used to initialise an address in the boot program. The uninitialised "variable" (its value not stated in the table) threw this off, and the machine tried to boot from the wrong area.
So I guess you can get an uninitialised variable even below the level of assembler - at the label printer's.
Martin had a story where RC oscillators and uninitialized variables interacted:
In a rather small project I used a small 8-bit microcontroller. The compiler wasn't compliant with the C standard: its startup code didn't initialize the BSS. I knew that and thoroughly initialized all globals - but, you know, I still forgot some.
Not really forgotten - I thought "my code is self-initializing, since it is a counter that counts down to zero and stops at zero, so I don't need to initialize these." Yes, it did count down to zero and stop, but starting from a (not so) random value above the designed start value led to the well-known symptoms of some units working perfectly and others having subtle "effects". It took some time to sort this out - so nothing new up to here.
Somewhat later, with the same system, there was a similar symptom: many units worked just fine, but a few didn't start up well in the morning; later in the day they worked. My first thought, having been through the above procedure, was another forgotten variable initialization. But inspecting each global variable by looking it up in the map file and manually tracing its usage revealed nothing.
The system used the microcontroller's built-in watchdog for safety. The watchdog was initialized, tested (leading to a reset on purpose while starting up) and then triggered within the main program loop. Any unexpected occurrence of a watchdog or other reset event would cause the system to enter a safe state, requiring a power cycle to recover. Further debugging showed the non-working systems were entering said safe state, so they were working as designed but not as expected by the common user.
Now, one needs to know that this microcontroller used independent internal RC oscillators for the CPU clock and the watchdog clock. And there were two of these microcontrollers, loosely synchronized to each other to form a redundant system. The synchronizing mechanism caused one micro to wait for the other at one point within the main loop.
So what happened: In the morning, the temperature in the office was lower, causing the frequencies of the internal RC oscillators to drift.
That in turn caused one microcontroller to wait a bit longer for the other while they synchronized. The watchdog interval had been chosen quite narrow, so that delay tripped the watchdog, which in turn dropped the system into the safe state. Relaxing the watchdog interval solved the issue. So don't forget to account for all kinds of hardware-caused tolerances; RC oscillators in particular can have rather large tolerances compared with resonators or crystals.
Not quite a product yet, but these passive sensors change how wi-fi signals from a smart phone are reflected. They can be used as switches and controls. This reminds me of one of the all-time amazing bugs: the Soviets gave the American ambassador a replica of the Great Seal of the United States, which hung in the ambassador's library. A passive resonant cavity vibrated in sync with conversations in the room; the Soviets flooded the room with microwaves, and the cavity modulated those, which could be picked up by a receiver. It took years for the Americans to find the bug.
Note: This section is about something I personally find cool, interesting or important and want to pass along to readers. It is not influenced by vendors.
Let me know if you’re hiring embedded engineers. No recruiters please, and I reserve the right to edit ads to fit the format and intent of this newsletter. Please keep it to 100 words. There is no charge for a job ad.
Note: These jokes are archived here.
Auto-correct has become my worst enema.
Advertise in The Embedded Muse! Over 28,000 embedded developers get this twice-monthly publication. For more information email us at email@example.com.
The Embedded Muse is Jack Ganssle's newsletter. Send complaints, comments, and contributions to me at firstname.lastname@example.org.
The Embedded Muse is supported by The Ganssle Group, whose mission is to help embedded folks get better products to market faster. We offer seminars at your site offering hard-hitting ideas - and action - you can take now to improve firmware quality and decrease development time. Contact us at email@example.com for more information.