The Embedded Muse 289

Go here to sign up for The Embedded Muse.

The Embedded Muse
Issue Number 289, August 17, 2015
Copyright 2015 The Ganssle Group

Editor: Jack Ganssle, jack@ganssle.com


You may redistribute this newsletter for noncommercial purposes. For commercial use contact jack@ganssle.com.

Contents

Editor's Notes
Quotes and Thoughts
Tools and Tips
Freebies and Discounts
More on In-the-Field Firmware Updates
More on Second Sources
Jobs!
Joke for the Week
Advertise with us
About The Embedded Muse

Editor's Notes

Collapse of productivity as program size increases

IBM data shows that as projects grow in size, individual programmer productivity goes down. By a lot. This is the worst possible outcome; big projects have more code done less efficiently. Fortunately there are ways to cheat this discouraging fact, which is part of what my one-day Better Firmware Faster seminar is about: giving your team the tools they need to operate at a measurably world-class level, producing code with far fewer bugs in less time. It's fast-paced, fun, and uniquely covers the issues faced by embedded developers. Information here shows how your team can benefit by having this seminar presented at your facility.

Better Firmware Faster in Australia and New Zealand: I've done public versions of this class in Europe and India a number of times. Now, for the first time, I'm bringing it to Sydney on November 9 and Auckland on November 16. There's more information here. Seating is limited so sign up early.

Better Firmware Faster in Maryland: The Barr Group is sponsoring a public version of this class at their facility on November 2, 2015. There's more information here.

Quotes and Thoughts

Embedded systems have 20% of the defects of information systems. - Capers Jones.

Tools and Tips

Please submit clever ideas or thoughts about tools, techniques and resources you love or hate. Here are the tool reviews submitted in the past.

Andrew Retallack recommends Nuts and Volts magazine.

Caron Williams asks of Muse readers:

We are finding it increasingly difficult to make bread boards and test them. I feel this is an essential part of hardware development - testing the new design (or a significant part of it) prior to investing several hundred to several thousand dollars in a prototype board from a manufacturing house. The root cause of the problem is that newer chips are often only available in very small packages which cannot be hand soldered, even by the very best of technicians. And having a professional vendor make a board to 'fan out' the package - or any board, for that matter - is prohibitively expensive for a quantity of one. I feel we are not alone in this, and wonder if you or your readers know of, or have developed, any coping strategies.

Amen to this question. I have some ADP172 voltage regulators from Analog Devices here. They're 1 mm square in a WLCSP with four balls. I can barely see the devices, let alone solder them!

Freebies and Discounts

I bought a $50 zero to 10 MHz signal generator from eBay. It's a new unit which was shipped from China; these things are for sale all the time on that site. I did a review of the unit, and it is this month's giveaway. Go here to enter. The contest will close at the end of August, 2015. It's just a matter of filling out your email address. As always, that will be used only for the giveaway, and nothing else.


More on In-the-Field Firmware Updates

In the last issue I asked how people go about doing in-the-field firmware updates. Several readers had replies.

One who wishes to remain anonymous wrote:

I built an interesting mechanism for updating the bootloader itself. I wrote the bootloader so that the code is relocatable, and at build time I generate two binary files: one containing the normal bootloader and one containing the bootloader built at a certain offset. The contents are almost identical, but I build the whole thing so that it runs the same at any location. I have two areas reserved on the MCU, one for the bootloader and one for the bootupdater. When I want to update the bootloader itself, I write the bootupdater into the MCU, and after I checksum the written data I jump to the bootupdater.

The bootupdater detects it's running from an offset address, changes the reset vector to point to itself, erases the bootloader and copies itself into the bootloader area. It checksums the whole thing, and the last thing it does is make the reset vector point to the bootloader again before it resets. After I made it work, it ran every time without a flaw, and it can successfully recover from power loss or communication failures at any time during the boot process. It was my first complex embedded project and I'm quite fond of my creation.
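
To make that sequence concrete, here is a minimal host-side sketch in C. It is not the reader's code: the flash is simulated in RAM, and the names (flash_t, bootupdater_run, sum32), the region size and the additive checksum are all assumptions made for illustration. The point is the ordering: the reset vector is moved first, so a power loss during the erase or copy simply re-enters the bootupdater on the next reset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: flash is simulated in RAM so the sketch runs on a host. */
#define REGION_SIZE 64u

typedef enum { REGION_BOOTLOADER = 0, REGION_UPDATER = 1 } region_t;

typedef struct {
    uint8_t  region[2][REGION_SIZE]; /* [0] bootloader area, [1] bootupdater area */
    region_t reset_vector;           /* region that runs after the next reset */
} flash_t;

/* Assumed checksum: a plain 32-bit byte sum. */
static uint32_t sum32(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    while (n--) {
        s += *p++;
    }
    return s;
}

/* Runs from the bootupdater area. Returns true once the bootloader area
 * holds a verified copy and the reset vector points back at it. */
bool bootupdater_run(flash_t *f, uint32_t expected_sum)
{
    const uint8_t *self = f->region[REGION_UPDATER];

    /* Refuse to touch the old bootloader unless our own image is intact. */
    if (sum32(self, REGION_SIZE) != expected_sum) {
        return false;
    }

    /* 1. Point the reset vector at ourselves: a reset from here on retries. */
    f->reset_vector = REGION_UPDATER;

    /* 2. Erase the bootloader area (erased flash reads 0xFF). */
    memset(f->region[REGION_BOOTLOADER], 0xFF, REGION_SIZE);

    /* 3. Copy ourselves into the bootloader area. */
    memcpy(f->region[REGION_BOOTLOADER], self, REGION_SIZE);

    /* 4. Verify the copy; on failure the vector still points at the updater. */
    if (sum32(f->region[REGION_BOOTLOADER], REGION_SIZE) != expected_sum) {
        return false;
    }

    /* 5. Hand the reset vector back to the bootloader; real code resets here. */
    f->reset_vector = REGION_BOOTLOADER;
    return true;
}
```

On real hardware the erase and write steps would go through the MCU's flash controller, and the reset vector change is itself a flash write that has to be made safe against interruption.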

Ray Keefe contributed this:

We have been developing a range of different bootloaders for more than a decade. This includes devices that are part of the power management infrastructure (e.g. smart grid controllers). For this class of device, we allow for extra on-board data FLASH and transmit the new application image in chunks until it is fully transmitted. This includes retries etc. and a properly managed check that every chunk was received. Once complete, the image is checksummed (32-bit checksum of the entire image), and only if that passes do we set up the bootloader to erase the application and write the new application into its place. The penalty for this method is that you have to have enough extra data FLASH to hold an entire application image. The benefit is that a box up a pole in the middle of nowhere does not ever get taken offline by a communications link loss, and the bootloader can be a lot simpler because it does not have to implement all the different communications protocols. This is a big benefit if the communications are over DNP3 or IEC61850.

It also supports bootloading from a local link so you can also do it that way.
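
The "properly managed check that every chunk was received" could be tracked with a simple bitmap. The following C sketch is only an assumption about how that might look; the chunk size, chunk count and the names rx_image_t, rx_store_chunk and rx_first_missing are invented for illustration. Chunks may arrive in any order or be resent, and the sender is asked for whichever index is still missing.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, chosen small so the sketch is easy to follow. */
#define CHUNK_SIZE 16u
#define NUM_CHUNKS 8u

typedef struct {
    uint8_t image[NUM_CHUNKS * CHUNK_SIZE]; /* staging area in data flash */
    uint8_t received[(NUM_CHUNKS + 7u) / 8u]; /* one bit per chunk */
} rx_image_t;

void rx_init(rx_image_t *rx)
{
    memset(rx, 0, sizeof *rx);
}

/* Store one chunk; a repeated chunk simply overwrites the earlier copy. */
bool rx_store_chunk(rx_image_t *rx, uint32_t index, const uint8_t *data)
{
    if (index >= NUM_CHUNKS) {
        return false; /* reject indices outside the image */
    }
    memcpy(&rx->image[index * CHUNK_SIZE], data, CHUNK_SIZE);
    rx->received[index / 8u] |= (uint8_t)(1u << (index % 8u));
    return true;
}

/* Index of the first chunk still missing, or -1 when the image is complete. */
int32_t rx_first_missing(const rx_image_t *rx)
{
    for (uint32_t i = 0; i < NUM_CHUNKS; i++) {
        if ((rx->received[i / 8u] & (1u << (i % 8u))) == 0u) {
            return (int32_t)i;
        }
    }
    return -1;
}
```

Only when rx_first_missing() returns -1 would the whole-image checksum be worth computing.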

Another issue is image retention. Another benefit of extra data FLASH is you can keep an extra copy of the boot and application image and refresh them from time to time. Data FLASH does not have infinite retention of its contents.

We use one page of the data flash as a status page for the bootloader. If the power fails during the write, when the system reboots the bootloader checks that page and will see that it is meant to rewrite the application and will redo the process.

So it can fully recover from a power fail during application erase and write.

So the full process is:
- transfer new application over whatever transmission media at whatever rate you like
- when it is completely transferred do an integrity check
- if it passes, set the status page for the bootloader to application replace state and reboot
- the bootloader reads the status page and does the erase then replace for the application
- the bootloader then checks the application checksum and if it passes (as it should) it runs the application
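
The steps above can be sketched in C as follows. This is a host-side illustration under stated assumptions, not Ray's implementation: flash is simulated in RAM, the checksum is a plain byte sum (the letter says only "32 bit checksum"), and the names device_t, app_request_update and bootloader_run are hypothetical. The status page is cleared only after the new application verifies, which is what lets a power failure during erase or write be recovered on the next boot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APP_SIZE 32u /* hypothetical image size */

typedef enum { STATUS_IDLE = 0, STATUS_REPLACE_APP = 1 } boot_status_t;

typedef struct {
    uint8_t       app[APP_SIZE];    /* program flash: the running application */
    uint8_t       staged[APP_SIZE]; /* data flash: the newly transferred image */
    uint32_t      staged_sum;       /* checksum delivered with the image */
    boot_status_t status_page;      /* data flash: bootloader status page */
} device_t;

/* Assumed checksum: a plain 32-bit byte sum. */
static uint32_t checksum32(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    while (n--) {
        s += *p++;
    }
    return s;
}

/* Application side: called once the whole image has been transferred. */
bool app_request_update(device_t *d)
{
    if (checksum32(d->staged, APP_SIZE) != d->staged_sum) {
        return false; /* integrity check failed: leave the status page alone */
    }
    d->status_page = STATUS_REPLACE_APP;
    return true; /* real code would reboot here */
}

/* Bootloader side: runs on every reset. Returns true if the app may run. */
bool bootloader_run(device_t *d)
{
    if (d->status_page == STATUS_REPLACE_APP) {
        memset(d->app, 0xFF, APP_SIZE);      /* erase */
        memcpy(d->app, d->staged, APP_SIZE); /* write */
        if (checksum32(d->app, APP_SIZE) != d->staged_sum) {
            return false; /* status page still set, so the next reset retries */
        }
        d->status_page = STATUS_IDLE;        /* only now is the job finished */
    }
    return true;
}
```

If power fails anywhere between the erase and the final status write, the next reset finds STATUS_REPLACE_APP still set and repeats the whole erase and write from the intact staged copy.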

So what happens if the power fails just after the application is transferred and before the bootloader status page is written?

The application checks in with the server each time it starts. It will be notified that there is a new version and pulls it again. The application then checks the checksum of the stored new version and, if it is correct, sets the bootloader status page and reboots. We could improve this slightly by having the server remember that it transferred the image, but this specific scenario has not occurred so far. If the transmission penalty/cost was very high, it would be worth doing.

We have hardware in places we have never visited and they do over the air configuration and application updates seamlessly. This significantly reduces the cost of supporting products in the field, and especially as customers begin to realise they can get feature enhancements after delivery. One example is a water management system for filling bulk water tankers where the customer decided they wanted to change the meter in the field for one that didn't have the same ratio of litres delivered per pulse of the meter. So we update the config file for that unit, push the change to our web service and the unit collects it and is ready to go with the new meter. That unit was 892km away at the time.

More on Second Sources

In the last issue I wrote about the troubles we often have with parts going end-of-life.

Stephen Irons wrote:

The problem is not 'a part goes obsolete', rather it is 'we have to support many different versions of hardware'. This is just a fact of life, and we must be ready to handle it.

Refactor your code to support both old and new hardware
Often, the first few versions of software are intimately tied to the hardware, by using features of the specific peripherals that you have available, and because it is not always obvious how to separate out the application code from the device-specific code.

When new hardware comes into the picture, it is a good opportunity to work out where to divide between 'device driver' and 'application'. It can be very tricky to work out a good API for the application code, and to separate out the stuff that is specific to the processor instruction set, its interrupt architecture, the compiler hooks to interrupts, the interrupt controllers, specific features of the peripherals, dependencies on external hardware on the PCB, how the system starts up, when the peripheral needs to be started, and the 1001 other details.

But it is likely that you will have to support both old and new hardware for some period of time. So make sure that you separate the device drivers from the application. There are lots of ways to do this from compile time #defines (yuck), link time (different builds for different hardware), startup time (OS device drivers, or dynamic linking) to run time.
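
As one illustration of the run-time end of that range, the application can talk to a small table of function pointers and never see which part is fitted. The C sketch below is an assumption-laden example rather than anything from Stephen's project: the two ADCs, their resolutions, the stand-in read functions and the names adc_driver_t, adc_select and app_read_millivolts are all made up.

```c
#include <stdint.h>

/* Hypothetical driver interface: the application only ever sees this. */
typedef struct {
    const char *name;
    uint8_t     bits;                      /* converter resolution */
    uint16_t  (*read_raw)(uint8_t channel);
} adc_driver_t;

/* Stand-ins for real register access on two different ADC parts. */
static uint16_t adc_a_read(uint8_t channel) { (void)channel; return 100u; }
static uint16_t adc_b_read(uint8_t channel) { (void)channel; return 1600u; }

static const adc_driver_t adc_a = { "adc_a_10bit", 10u, adc_a_read };
static const adc_driver_t adc_b = { "adc_b_14bit", 14u, adc_b_read };

/* Pick the driver from an assumed hardware-software interface version. */
const adc_driver_t *adc_select(unsigned hw_sw_if_version)
{
    return (hw_sw_if_version >= 2u) ? &adc_b : &adc_a;
}

/* Application code: works in millivolts and knows nothing about the part. */
uint32_t app_read_millivolts(const adc_driver_t *adc, uint8_t channel,
                             uint32_t vref_mv)
{
    uint32_t full_scale = ((uint32_t)1 << adc->bits) - 1u;
    return ((uint32_t)adc->read_raw(channel) * vref_mv) / full_scale;
}
```

The same table could equally be chosen at link time by building only one of the two driver files into the image; the application code would not change.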

Know which version of hardware you have
The software should know what hardware it has available. Not the processor, as the software cannot do anything about that, but the details of the hardware: peripherals, memory. At a minimum, you need a hardware type identifier, but you quickly find that you really need a product type, a serial number, an as-built (BOM) version, a hardware-software interface version. The software should have access to all of these, though you don't want to change behaviour based on the serial number.

Product type identifies the type of board or product
Serial number is different for each manufactured board so you can tell two identical boards apart
BOM version tells you exactly which components are on the board
HW-SW interface version tells the software how to talk to the board (you get a new BOM version when a capacitor changes to meet the spec; however, the software still talks to the board in the same way).

These different things are all present in the system. It is better to make them explicit, rather than hiding them away.
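
Making the four identifiers explicit might look like the C sketch below. The field widths and the names board_identity_t, software_supports and same_software_interface are assumptions for illustration only. Note that the helper functions look at the product type and interface version and deliberately ignore the serial number and BOM version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical manufacturing-data record; field sizes are assumptions. */
typedef struct {
    uint16_t product_type;     /* which type of board or product this is */
    uint32_t serial_number;    /* unique per manufactured board */
    uint16_t bom_version;      /* exactly which components are fitted */
    uint16_t hw_sw_if_version; /* how software must talk to the board */
} board_identity_t;

/* Can this software build drive the board? Only the interface version counts. */
bool software_supports(const board_identity_t *id,
                       uint16_t min_if_version, uint16_t max_if_version)
{
    return id->hw_sw_if_version >= min_if_version &&
           id->hw_sw_if_version <= max_if_version;
}

/* Two boards look the same to software if product type and interface
 * version match, even when serial number and BOM version differ. */
bool same_software_interface(const board_identity_t *a,
                             const board_identity_t *b)
{
    return a->product_type == b->product_type &&
           a->hw_sw_if_version == b->hw_sw_if_version;
}
```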

Hardware designers seem to think that they can encode everything that software needs to know about the hardware into a handful of values, encoded using either a voltage on an ADC input, or a few digital inputs.

Unfortunately, over the life of a product, hardware changes are not always sequential. You have to support two different memory sizes, and two different ADC devices, and that version of PCB where one line was accidentally inverted -- sure, we don't sell them any more, but they are used in the automated test system, so the software has to support them. You have to support all combinations of these, for a total of 8 (2 x 2 x 2) different versions in the trivial example. It would be nice to keep a linear hardware progression, but it is usually not that easy.

So I prefer a feature list written into the device during manufacture. You usually have to write serial numbers, product codes, etc into the device -- this is just another field in the manufacturing data. Reserve 8, 16 or 32 bytes of non-volatile memory for the feature list. When you change a particular piece of hardware, allocate a bit to indicate which option is installed. Someone in the project team must be responsible for maintaining this list.

It WILL get messy, when you allocate 1 bit to distinguish between 2 ADCs, then later you have a third option, but you cannot allocate two consecutive bits. Just accept that, and hide it behind an appropriate API.
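
Here is a small C sketch of what hiding the mess behind an API could look like. Everything specific in it is hypothetical: a 16-byte feature list, an ADC option that started as bit 3 and later had to borrow bit 9 for a third part, and the names feature_list_t, board_adc_option and board_has_big_memory. Callers ask a question about the board and never touch bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEATURE_LIST_BYTES 16u /* one of the sizes suggested above */

typedef struct {
    uint8_t bytes[FEATURE_LIST_BYTES]; /* copied from manufacturing data */
} feature_list_t;

/* Hypothetical bit allocations, kept in one place by the list's owner. */
#define FEAT_BIT_ADC_LOW  3u /* first allocated when there were two ADCs */
#define FEAT_BIT_BIG_MEM  4u /* allocated next, so bit 4 was already taken */
#define FEAT_BIT_ADC_HIGH 9u /* added later for the third ADC option */

typedef enum {
    ADC_ORIGINAL = 0,
    ADC_SECOND   = 1,
    ADC_THIRD    = 2,
    ADC_UNKNOWN  = 3
} adc_option_t;

static bool feature_bit(const feature_list_t *fl, unsigned bit)
{
    if (bit >= FEATURE_LIST_BYTES * 8u) {
        return false; /* out-of-range bits read as "not fitted" */
    }
    return ((fl->bytes[bit / 8u] >> (bit % 8u)) & 1u) != 0u;
}

/* Combine the two non-adjacent bits into one answer for the caller. */
adc_option_t board_adc_option(const feature_list_t *fl)
{
    unsigned value = (feature_bit(fl, FEAT_BIT_ADC_HIGH) ? 2u : 0u) |
                     (feature_bit(fl, FEAT_BIT_ADC_LOW) ? 1u : 0u);
    return (adc_option_t)value;
}

bool board_has_big_memory(const feature_list_t *fl)
{
    return feature_bit(fl, FEAT_BIT_BIG_MEM);
}
```

Boards built before bit 9 was allocated have it clear, so they still decode as the original or second ADC without being reprogrammed.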

If you have more non-volatile memory available for your manufacturing data, you might also store the hardware description as an Open Firmware Device Tree, which can be used to initialise device drivers and give them meaningful names.

It is also best if this hardware description, whether hardware ID, feature list or device tree, is stored on the device it applies to. If you have plug-in boards, they must each be able to describe themselves. This might mean that you have to put a serial EEPROM on each plug-in board.

Keep the component change separate from any new features
When changing to a new component, usually the new component allows you to add new features to your product, either because it is faster, or has a nifty peripheral that enables something new and whizzy. Save those new features for a later day.

The only successful component change project I have worked on was successful because the engineering team pushed back against the marketing team's desire for new features. We first finished the component change (schematic, PCB, software, manufacturing, etc); then when those problems were out of the way, we went about adding new features.

The worst component change project I have worked on saw the component change as an excuse to revamp almost everything about the project: new compiler, new RTOS, new product features, big code refactoring, and, incidentally, a new processor SoC. The lead engineer was used to working alone, and just did stuff as it seemed good to him. The whole thing took forever, and caused much tension between the lead engineer and management.

So my principle is to either change the hardware (with only software changes to support old and new hardware) or to change the software (with minimal hardware changes).

Manage the transition
The technical part of making the transition is relatively easy -- your fine engineers will crank the handle and it eventually gets done.
Make sure that someone is responsible for managing the transition from the old to the new part -- last-time buy of old parts, using up the stock, writing off the old parts, ordering inventory of new parts, working out when to make the transition, managing customers who are dependent on the old version, and the other details about how and when to make the transition.

Trevor Woerner said:

Several years ago I worked with a company whose latest technology was all based on 80186 clones with 100% custom hardware designs and custom software. As you can imagine sourcing parts was difficult, if not impossible.

A well-written Linux user-space application is not impossible to port to any off-the-shelf hardware that can run Linux, even when the underlying architecture changes dramatically.

Perhaps more companies need to evaluate whether or not they need to be designing their own hardware from chips/logic? Obviously there will always be products which need to go through this exercise, but my feeling is that some companies might be doing this out of habit rather than need.

I'm currently working with a second company in a completely different market that is going through the exact same transition. Traditionally they've designed their own hardware and written their own software but can't help wondering: why can't we just use a Raspberry Pi? The right solution for them might not actually be a Raspberry Pi, but the question is nonetheless appropriate.

By the way, the first company is now in the process of moving their product to their 4th off-the-shelf board in 7 years (each of the past 3 has gone end-of-life) and the only significant software changes they've made in the last 7 years have been to add new features.

Larry Rachman had a story about a part change:

I'm of the opinion that while it's reasonable to accept a *part* being obsoleted, it's not reasonable to accept a *socket* being obsoleted.
IIRC, Freescale still sells something that will fit in the old 'candy bar' 68000 socket.

Of course, this opens the door to the exploration of subtle incompatibilities, either due to a part manufacturer's oversight or a marginal design, where the board manufacturer was just 'lucky' until the new part came in.

RMOAS, <mumble> years ago: a video display terminal, using dynamic RAM as the screen buffer. Worked fine until a new brand of DRAM was tried. The symptom was total failure at startup. Not missing or wrong characters, not a blank screen, but completely dead - processor hang.

Turns out that buried deep in the (assembly language) code there was a subroutine that was accessed with a jump instead of a call. When it reached the bottom, it executed a return, fetching a return address from a non-existent address that mapped to the DRAM. A random (but apparently repeatable) return address was fetched, the code stumbled around, and somehow made it back into the mainline. They'd been shipping this way for a few years without knowing it. Of course, brand Y DRAM returned a different address.

Someone should write a book about all these stories. Kind of the engineering equivalent of telling horror stories around the campfire late at night.

Frequent correspondent Harley Burton also has a tale to tell:

Many years ago, I ran into part obsolescence in my project. While working at Rockwell, about 1980, we were still building a radio designed in 1962 for the Air Force. It used a still fairly common FET, the 2N4416. They used this device in the second R.F. mixer of this extremely expensive radio. The original performance of this radio was fantastic. However, Signetics made a change to the silicon masks which subtly changed the noise and 3rd order performance of the radio so that it no longer passed some of the spec parameters.

We redesigned the mixer and hand selected devices for nearly 6 months before we could get a mixer with the correct performance. It was actually a slightly lower component count and a slightly higher spec performance. The cost was equal to or maybe slightly less than the original.

However, when we submitted it to the Air Force, they decided not to implement it. Instead, they did a lifetime buy of the mixer which used the original 2N4416 masked parts. Changing the mixer would have cost too much in maintenance: parts cost, multiple inventories, etc.

This should have been the end of it, but the radio is still in use and a new contract was let to purchase more as recently as 10 years ago, and possibly more recently than that. Can you believe it? This radio has been built for at least 45 and possibly over 50 years.

Just goes to show that good design is good design no matter how long it is in use.

Jobs!

Let me know if you’re hiring embedded engineers. No recruiters please, and I reserve the right to edit ads to fit the format and intents of this newsletter. Please keep it to 100 words. There is no charge for a job ad.

Joke For The Week

Note: These jokes are archived at www.ganssle.com/jokes.htm.

From Steve Bresson - an old joke with a new twist:

Two engineers were standing at the base of a flagpole, looking at its top. A woman walked by and asked what they were doing. "We're supposed to find the height of this flagpole," said Sven, "but we don't have a ladder."

The woman took a wrench from her purse, loosened a couple of bolts, and laid the pole down on the ground. Then she took a tape measure from her pocketbook, took a measurement, announced, "Twenty one feet, six inches," and walked away.

One engineer shook his head and laughed, "A lot of good that does us. We ask for the height and she gives us the length!"

Both engineers have since quit their engineering jobs and are currently serving as elected members of Congress.

Advertise With Us

Advertise in The Embedded Muse! Over 23,000 embedded developers get this twice-monthly publication.

About The Embedded Muse

The Embedded Muse is Jack Ganssle's newsletter. Send complaints, comments, and contributions to me at jack@ganssle.com.

The Embedded Muse is supported by The Ganssle Group, whose mission is to help embedded folks get better products to market faster.