A Conversation With HCC Embedded, Part 1

Summary: Jack talks to HCC Embedded about very high reliability firmware design.

At the Embedded Systems Conference I had a chance to sit down with Dave Hughes and David Brook from HCC Embedded (http://www.hcc-embedded.com/). This company has a variety of software products, like file systems, network stacks, boot loaders, and the like, which customers can integrate into their products to get to market faster. Regular readers know how deeply I care about getting firmware right. HCC shares that devotion and I was impressed with the principals of that company. Later, we had a telephone call and explored their approaches more deeply. They were kind enough to allow me to reproduce it here. This has been edited for clarity and length.

In the following Dave talks about MC/DC. This is short for modified condition/decision coverage. It's an extreme form of testing that insures every statement in a program has been executed, and that all decisions, and conditions within a decision, have been executed. It's required at the highest level of the DO-178B/C standard, which is the one required for commercial avionics. MC/DC is tedious and expensive. But so far, as far as we know, no one has lost a life due to software problems in commercial aviation.

Jack: Tell me about HCC Embedded.

Dave: Well, HCC has been developing embedded software for the last 15 years or so, in various specialist areas. We started off specializing in file systems that were very fail-safe, and later moved into other areas. Today we have about 6 different file systems. We now have USB stacks with device hosts and numerous class drivers, network stacks, TCP/IP, and schedulers. We focused on creating embedded software that's extremely flexible and reusable. We started off on 8-bit architectures and moved to 16, 32 and 64-bit CPUs, supporting different tool chains, different endianness, different RTOSes, etc. A goal was to architect pieces of embedded software that are independent of these platform specific attributes. If you write a piece of C code, it shouldn't matter what your platform is. And so we've spent a long time trying to architect software in a way that it's scalable and reusable. Of course, one of the natural progressions is that, as you get higher quality, as you get this portability, you get a level of scalability and reusability that you wouldn't get if you don't go to extreme lengths in trying to develop your software deeper.

Dave: Absolutely everything. I don't think there are any big exceptions. I suppose the high end of aerospace we haven't touched, but pretty much everything else we have been involved in. A lot of small instrumentation companies, functional safety companies, automotive companies, industrial companies, telephone companies. I mean, every field. Because the components we develop are very generic, they really aren't verticalized. We've developed one product in the last couple of years which is verticalized, which is a file system specifically for smart meters that is designed for getting the best usage out of flash. It maps the data structures that meters tend to use directly to the flash in a way that's manageable and efficient. But in general, our products have been completely agnostic about their final applications.

Jack: When we were talking in California a few months ago, one of the things that struck me is that you do have a tremendous focus on quality; much more so than I normally hear talking to vendors. How do you go about ensuring that your products truly are of a very high quality and essentially defect free?

Dave: You raise all sorts of issues in that question! Guaranteeing defect free code is something we can never do. I mean, that's well proven. You can go back to Turing, you can go back to Godel, if you take a mathematical view of it. But what we can do is we can make best efforts to make things of a very high quality and extremely reliable. We started developing software using what I would call "freestyle" with basic quality goals. We had a coding standard, we had a style guide, we make sure there's no warnings, using the highest warning levels, etc. But things have progressed and we wanted to create products that are more versatile and more reusable. And we also feel that if we make a product that is endlessly checked, then it just doesn't need to be developed any further.

We will end up with a very low level of support and, because it's reusable across many platforms, we can finish it and forget it. This is a very, very long process. We've been developing processes to actually ensure that pieces of software that we produce now are of higher and higher quality. Some are more challenging than others.

For example, it's a very big project to create file systems that are of the highest quality. It's difficult and awkward to do full dynamic coverage MC/DC on things like long name handling and directory structures. So we've adopted very strict coding standards, and have taken MISRA very seriously. And we've built on to that a whole V-model which is being improved all the time. So, for example, our scheduler, our encryption module, our TLS module, have all been developed with much more rigorous standards employed. And they include having a requirement specification, having design documents, having tests that match back to the requirements. A lot of code is being released now with a very high level of built-in testing.

For example, we'll build a scheduler with a test suite that executes full MC/DC coverage on the scheduler. The beauty of this is if you then take that scheduler and put it into a different architecture, onto a different endianness, on a different compiler, with different optimization settings, it doesn't matter what you do. You execute that same test suite and you'll know that your scheduler has got full MC/DC coverage. Every single path with every single decision, has been checked that it has been compiled correctly. It doesn't prove your code is fault free by any means. I mean, that also needs to be tied into the design. That has to be tied into the tests, which need to be mapped back, etc. It is a means of guaranteeing that a piece of code that has been developed for one platform can be reused across platforms and across industries, and can be verifiably proved to do the same things.

If problems are reported, then those fixes will be propagated back across all those platforms because they're going to have to go into the same test suite that does the same coverage and insures it works very cleanly. So you're getting benefits of scalability and reusability.

It brings me on to one of my pet lambs, which is how badly structured the software industry is, in terms of economics. Hardware manufacturers get paid for their actual piece of hardware they sell. Software is a bit more abstract and there's no clear path to getting remuneration for these things. You take something like this crazy, open SSL situation where you have a piece of software which you think should be developed to a really high standard. Now, this isn't really open SSL's problem, this really isn't their problem. They're quite open and clean about what they do. But what's not clear is why nobody has insisted that security software is developed to the same sort of levels as, say, an industrial controller is developed to. It seems like our personal security ought to be taken seriously. This SSL software was installed on something like 500,000 different servers, controlling millions and millions of peoples' web accesses. The problem cost millions in terms of the amount of corrective action that was required, re-issuing certificates and everything else. That code could have been re-written to an extremely high quality for, in my estimation, 1-2 million dollars or so. An absolute fraction of the amount it cost the industry. An absolute fraction of all the time wasted. Imagine all the board meetings that took place to review their company's security. But there's no obvious way, in the software industry, to recoup the relatively small investment.

Go here for part 2 of the interview.

Published February 16, 2015