Kozio's Hardware Diagnostics
Summary: It's surprisingly hard - and expensive - to get the hardware right.
A couple of years ago, intrigued by their technology, I visited a little company named Kozio (www.kozio.com) in Colorado. They produce software that diagnoses faults in hardware. It's aimed at a couple of markets: the engineer who is bringing up a new design, and production test. I found the notion interesting because, in a previous life, I discovered that a huge number of embedded systems that use DRAM and large SRAM arrays are poorly designed. They pass the limited tests their creators use, but hit that memory with the perfect storm of bits flipping and those highly-capacitive arrays start acting like, well, capacitors. Occasional errors result that are very hard to diagnose.
Recently Kozio expanded and reconfigured their product line. The flagship offering is now VTOS (Verification and Test Operating System), a name that is a bit confusing. An OS? Just what we need, another OS!
Turns out, VTOS is a sort of minimal OS that boots with far fewer resources than the more bloated Linux or Windows. And, of course, it comes with a lot less functionality, the point being that one needs just enough code running to support executing a wide range of hardware tests.
What kind of hardware tests? Well, all sorts of memory tests, of course, which will isolate not only failures, but design weaknesses. Perhaps there's not enough margin in the timing - the tests hammer at the memory controller. Today, of course, we use these elaborate DDR-style SDRAMs which have access times that are just about impossible to predict or measure, but VTOS's tools will give detailed performance data.
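To give a flavor of what a basic memory test looks like, here is a minimal sketch of two classic checks: a walking-ones data bus test and a power-of-two address bus test. This is not Kozio's code and says nothing about how VTOS works internally; the function names (test_data_bus, test_address_bus) and the bit patterns are my own illustrative choices, and the address test assumes the region's length in words is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of VTOS.
 * Walking-ones data bus test: write each single-bit pattern to one
 * location and read it back. Returns 0 on success, or the first
 * pattern that did not read back correctly. */
uint32_t test_data_bus(volatile uint32_t *addr)
{
    assert(addr != NULL);

    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Hypothetical helper, not part of VTOS.
 * Address bus test over n_words 32-bit words starting at base.
 * Assumes n_words is a power of two. Returns NULL on success, or the
 * address at which a stuck or shorted address line was detected. */
volatile uint32_t *test_address_bus(volatile uint32_t *base, size_t n_words)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t antipattern = 0x55555555u;

    assert(base != NULL);

    /* Seed every power-of-two offset with the default pattern. */
    for (size_t offset = 1; offset < n_words; offset <<= 1)
        base[offset] = pattern;

    /* Address bits stuck high: writing offset 0 must not disturb others. */
    base[0] = antipattern;
    for (size_t offset = 1; offset < n_words; offset <<= 1) {
        if (base[offset] != pattern)
            return &base[offset];
    }
    base[0] = pattern;

    /* Address bits stuck low or shorted: writing one power-of-two
     * offset must not change offset 0 or any other such offset. */
    for (size_t test_offset = 1; test_offset < n_words; test_offset <<= 1) {
        base[test_offset] = antipattern;
        if (base[0] != pattern)
            return &base[test_offset];
        for (size_t offset = 1; offset < n_words; offset <<= 1) {
            if (offset != test_offset && base[offset] != pattern)
                return &base[test_offset];
        }
        base[test_offset] = pattern;
    }
    return NULL;
}
```

On real hardware one would point these at the physical memory region with caches disabled. Tests this simple find wiring faults but not the marginal-timing and pattern-sensitivity problems described above, which is where the limited tests that many teams write fall short.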
VTOS is U-Boot compatible, so a production system can have it ready and lurking, bootable when necessary.
The company tells me that the typical high-end development effort that doesn't use VTOS requires 1.5 to 2 person-years to build a verification package. They also claim their customers can get VTOS up and running in only 30 minutes. I can't check those numbers, but scale them by even an order of magnitude and the delta is huge. Further, they say there are $17 billion in returns of supposedly defective consumer electronics every year, and most of those show no problems. Presumably some large fraction of those are due to flaky designs whose hardware problems weren't flushed out because of inadequate testing code. Then there's the increasing problem of counterfeit parts; VTOS on the production line can identify boards with parts of suspect origin that fail the aggressive tests.
One doesn't port VTOS to a new target system; instead it's essentially pre-ported (versions come for different processor families). VTOS Builder creates a custom version matching the target's memory and peripheral configurations. It knows about hundreds of devices, but, from the GoToMeeting demo, it looks rather easy to accommodate a new peripheral by filling in form fields.
One customer spent 10 weeks troubleshooting crashes. Three days after installing VTOS the error was identified: the CPU's PLL occasionally lost sync. In another case a customer had had problems for two years with memory errors. An hour of exploration with VTOS determined that the ECC had never been turned on!
The cost depends on which packages one buys, but figure on a bit under $10k to twice that. That's a ton cheaper than 1.5 to 2 person-years of development for an in-house solution. All in all I think this is a nice solution to a too-often-neglected problem.
Published September 5, 2012