Vanishing Visibility


By Jack Ganssle

I take great satisfaction in my tools. My fingers are not strong enough to remove a bolt, but give me a wrench and my hand can perform amazing new feats. We computer folks like to consider the PC a mind-tool that increases the power and reach of one's brain. Conventional hand tools give us a similar ability to manipulate the mechanical world in ways impossible via the unaided human body.

 I'm a fanatic about woodworking tools, keeping them clean and sharp, buying only the best, collecting the cream of the technology of yesteryear that, while now out of style, may still be the best solution to a problem. Though power tools with big motors that hurl sawdust like a swirling gale satisfy my testosterone craving for brute mechanical power, high quality chisels and planes are among my favorite possessions. A hand plane works well only if you take time to understand the wood, molding its use to the grain, hardness, and even moisture content of the work. In contrast, a 2 horsepower electric plane blindly tears through any obstacle leaving its marks of destruction behind in telltale chatter-gouges. Yet you can't beat an electric plane for removing lots of wood fast.

 The same goes for the embedded world. This magazine bulges with ads for all sorts of virtual assistants, each of which is aimed at one part of the development process. Just as the hand and electric planes have valid though different applications, no single embedded instrument is the silver bullet for all circumstances. One of the skills of the engineer is the judicious selection and use of the right mix of tools for each project.

 This fascination with tools of all kinds led me to start an emulator company back in the 80s. It was a wild and fascinating ride, made much more interesting by the opportunity to look into the work of thousands of developers, and to see how we grapple with the bugs that plague even the most well designed systems. Recently, though, I decided to move on and sell the company.

 Yet the problem of getting products to market still fascinates me. My love of tools of all sorts is undiminished. With no longer any equity in the tool business and thus no conflict of interest with this column, I feel freer to examine some of the issues that are surfacing in the 90s.

 And I'm concerned. Scared, really, for the future of developers in the embedded industry. It's driven by relentless forces none of us can control and can sometimes barely understand. The twin forces of technology advancements and frenetic business are backing engineers into a metaphorical corner of impossible demands with terribly limited resources.

 Now systems are more complex than ever, with new breeds of bugs. Timing problems, once restricted to hardware, are an ever more problematic firmware fact. RTOS complexities and excruciatingly complex algorithms fan the fire of bugs.

 Bugs will never go away. Better development methodologies can (will? Not until we individual developers create a personal passion for improvement) reduce the error rate, but never to zero. Debuggers - of many types - will always be important tools.

 Debuggers do one fundamental thing: provide visibility into your system. Features vary, but all we ask of a debugger is "tell me what is going on!" Sometimes we're interested in procedural flow (single stepping, breakpointing); other times function timing or dependencies or memory allocation. Regardless, we simply expect our tools to reveal hidden system behavior. Only after we see what's going on can we use our brains to understand "why that happened", and then apply a fix. 

My fear is we're removing our ability to look into the systems. The visibility we take for granted is being eroded.

 

Technology Tribulations

In embedded systems, emulators have always been one of the choice weapons in the war on bugs. Yet for as long as I can remember, pundits have been predicting their death. Though it seems as quaint as IBM's 1950s prediction that the worldwide market for computers was merely a couple of dozen, in fact 20 years ago many people believed the 4 MHz Z80 would spell doom for ICEs. "4 MHz is just too fast," they proclaimed, "no one can run those speedy signals down a cable."

Time proved them wrong, of course. Today's units run at 60+ MHz on processors with single-clock memory cycles, an astonishing achievement. 

The imagined speed limit isn't unique to ICEs; ROM emulators and other debuggers that use a physical target-system interface all suffer from similar problems.

 Is an end yet in sight? I believe so, though the limiting frequency is a bit hazy. Today's approach of putting all or much of the ICE's electronics on the pod removes the cabling and bus driver problems, but electrons do move at a finite speed and even the fastest of circuits have non-zero propagation delays.

 CPU vendors squeeze the last bit of clock rates from their creations partly by tuning their chips ever more exquisitely to the rest of the system's memory and I/O. A danger signal is the current problem with PC motherboards: it is so difficult to design a high speed Pentium-based motherboard that Intel has had to assume that role. They are reportedly now the largest producer of PC motherboards. In effect the computer is so tightly coupled to the processor that only the CPU vendor can produce a reliable system based on the chip! Clearly, an intrusion by any sort of development tool will at best be problematic. Yes, today's Pentium emulators do work. Will tomorrow's units be able to handle the continued push into stratospheric clock rates? I have doubts.

 Packages are creating another sort of problem. Heat, speed, and size constraints have yielded a proliferation of packaging styles that challenge any sort of probing for debugging. If you've ever tried to use a scope on a 208 pin PQFP device, or, worse, a 100 pin TQFP,  you know what I mean. Yes, some tremendously innovative probing systems exist - notably those from Emulation Technology and HP. Despite these, it's still difficult at best to establish a reliable connection between a target CPU and any sort of hardware debugger, from a voltmeter to an ICE.

Traditional (how can a few-year-old technology have traditions?) surface-mount devices have exposed pins that you at least have a prayer of getting to. Newer devices don't. The BGA (Ball Grid Array) package, which is suddenly gaining favor, connects to a PC board via hundreds of little bumps on the underside of the package - where they are completely inaccessible. Other technologies bond the silicon itself under a dab of epoxy directly to the board. All of these trends offer various system benefits; all make it difficult, if not impossible, to troubleshoot software and hardware.

 OK, you smirk, these issues only apply to the high end of the embedded market, where clock rates - and production costs - soar with the eagles. Other, subtle influences, though, are wreaking havoc on the low end.

 Take microcontrollers, for example. These CPUs have ROM and RAM on-board, giving a very simple, very inexpensive one-chip solution for simple 8 and 16 bit applications. The 8051 is the classic example of this, and indeed has been an amazing success that has survived twenty years of assault by other, perhaps more capable, processors.

 Single chip solutions are tough to debug, though, since the on-board memory means there's generally no address/data bus coming to the outside world. An extreme example is Microchip's 8 pin PIC part. 8 pins! The only ins and outs are I/O.

Various debugging solutions exist, but the traditional one is the bond-out chip, a special version of the processor with extra pins that bring all the important signals to the outside world, especially those oh-so-critical address and data lines needed to track program execution. With a proper bond-out-based ICE you can track everything the code does, in real time, with no compromises. Perfect, no?

 Well, a few wrinkles are starting to surface. For one, the chip vendors hate making bond-outs. The market is essentially zero, yet every time the processor's mask gets revised a new bond-out is needed. In the old days chip vendors swallowed hard, but did make them reasonably available.

 Now this is less common. With the 386EX (which is not a microcontroller, but which benefits from a bond-out) Intel announced that only a handful of vendors would get access to the special version of the part, probably to some extent increasing the cost of tools. Is this an indication of the beginning of the end of generally available bond-out parts?

Sometimes the bond-out is not kept to current mask revisions. I know of at least one case where a vendor provides bond-outs that will not run at full speed, essentially removing the critical visibility of real-time execution from developers. This situation puts you in the awful conundrum of deciding "should I buy an expensive tool that forces me to run at half speed, no doubt destroying all timing relationships?"

 Sometimes - often - the bond-outs will not run at reduced voltages. Your 3 volt system might require a pod which is a convoluted mix of 3 and 5 volt technologies, creating additional propagation delays as voltages get translated. In effect a non-intrusive tool becomes subtly more intrusive, in ways that are hard to predict. Voltages are declining fast - some CPUs now run at sub-one volt levels - so the problem can only get worse.

 A very scary development is the incredible proliferation of CPUs. Vendors are proud of their ability to crank out a new chip by pressing a few buttons on a CAD system, changing the mix of peripherals and memory, producing variant number 214 in a particular processor family. Variants are a sign of a good, healthy line of parts (look at that mind boggling array of 8051 parts), but are a nightmare for tool vendors. Each requires new hardware, software, support, evaluation boards, and the like. In the "good old days", when we saw only a few new parts per year per family, support was easy to find. Now my friends who make microcontroller tools complain of the frantic pace needed to support even a subset of the parts.

 As tool consumers you probably don't care about the woes of the vendors. But part proliferation creates a problem that hits a bit closer to home: for any specific variant there may only be a handful of customers. Tool support may never exist for that part if vendors feel there's not a big enough market. An odd fact of the tool market (from compilers to ICEs) is that the health of the market is a function of the number of customers using a chip, not the number of chips used. CPU vendors are happy to get one or two huge design wins, say an automotive company that sucks up millions of parts per year. Tool folks might only sell a couple of units to such a customer, far too few to pay their huge development costs.

 I know of dozens of big companies left stranded by CPUs with no support, who have had to in some cases build their own tools. Some even write custom compilers! There can't be a more expensive way to do a project.

 

Non-Computers

As one interested in the philosophical implications of our business I'm fascinated with the drift to "virtual" implementations of, well, everything. Hey, your cell phone is a fascinating connection - without wires - to a billion other phones; it's part of the biggest machine on the planet, yet looks like nothing more than a few bucks of electronics. Similar virtual connections underlie just about everything on the Internet as well. Now we're seeing a move to "virtual" microprocessors. Today you can buy a micro that, well, just has no physical being. It's a file of VHDL equations.

 Buy a virtual Z80 or 186 and then incorporate that into your own design. Burn it into a big gate array or FPGA or ASIC. The idea is to reduce chip count by integrating the processor into the ASIC along with all of your proprietary circuits. It keeps costs and board size down.

 We're used to software being a rather ethereal "thing", with no real physical implementation. Now we can buy "hardware" equally as ethereal. It's software hardware. Hardware software. Or something.

Some of the vendors promoting these ghostly CPUs promise the ability to customize the processor. Add instructions with the click of a mouse! It would seem a magic solution to precisely matching computational power to your application's needs.

 But, how will you use the new instructions? Code in assembly language only? Write your own compiler?
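To make the dilemma concrete, here's a minimal sketch in C. The saturating-add instruction, its "SADD" mnemonic, and the inline-assembly syntax shown in the comment are all hypothetical - invented for illustration, not taken from any real part or toolchain.

    #include <limits.h>

    /* Suppose the vendor lets you add a saturating-add instruction, "SADD",
     * to your custom core. No off-the-shelf compiler will ever emit it, so
     * your choices look like this: */

    /* 1. Plain C: portable, compiles everywhere, never uses the new opcode. */
    long sat_add(long a, long b)
    {
        if (a > 0 && b > LONG_MAX - a) return LONG_MAX;   /* would overflow up   */
        if (a < 0 && b < LONG_MIN - a) return LONG_MIN;   /* would overflow down */
        return a + b;
    }

    /* 2. Hand-written assembly: actually uses SADD, but now the routine is
     *    tied to a mnemonic no assembler, compiler, or debugger has heard of.
     *    (Hypothetical syntax, shown only as a comment:)
     *
     *        asm("sadd %0, %1, %2" : "=r"(s) : "r"(a), "r"(b));
     */

Neither choice is attractive without real compiler support - which is exactly the point.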

 Worse, with the CPU buried inside of a big chip, how do you plan to troubleshoot your code?

Cache, prefetchers, superscalar designs, and lots of other ever-more-common processor features all create debugging headaches. My point here is not a complaint against the technology; it's to voice a concern that we dare not blindly design in the latest cool thing without understanding how we'll find our bugs. We've got to realize that these new features have both benefits and perils. I have seen too many designers, in the flush of initial project optimism, forget that soon they'll be up to their eyeballs in bugs, and that they will need some sort of tool to give them visibility into the system.

 Technological problems are a funny thing. The barriers rarely stand for long. Customer needs quickly translate into solutions. One only has to look at IBM's PowerPC parts, some of which include a built-in debug port that even supports real time trace, to see what the future might bring.

 Modern-day Luddites fear technology, thinking it's advancing much faster than we poor humans can adjust to the changes. In last month's column I metaphorically walked a bit in their shoes, expounding my concerns that our embedded technology is moving at a faster rate than our ability to efficiently develop new systems. I see our infatuation with faster, smaller, and higher integration to be leading us down a path where the costs of development are skyrocketing.

 CPU cores hidden away inside ASICs give fabulously small systems, yet that buried processor is all but impossible to probe. Couple bus cycles within fractions of a nanosecond to a peripheral and you leave no margin for your tools. One-off CPUs, whether from burying a VHDL virtual processor inside a high integration part, or from the huge explosion of derivatives of popular parts, are often tool-orphans. Tool vendors, after all, won't invest huge sums in developing products for a particular CPU unless they see a large, healthy market for their offerings.

 Even seemingly boring issues like device packaging further isolate us from the processor. If we can't probe it, we can't see what's going on. We lose the visibility needed to find bugs.

 We see some glimmers of "solutions" to these technology problems.  An example is Motorola's Background Debug Mode (BDM). Their recent processors include a special serial port used only for debugging. Transistors are cheap - it makes sense to integrate extra onto the processor as a special debug port.

Similarly, other vendors are putting variants of JTAG (IEEE 1149.1) ports onboard their fastest CPUs, sometimes to aid product testing, other times for debugging. Like the BDM these are all essentially serial interfaces that give one access to the core while using only a minimum number of pins.

A debugger on-board the chip eliminates all speed issues. It functions despite the complications of cache. Even when the CPU is hidden in a huge ASIC, if just a few pins come out for the serial debugger, then designers will have some ability to troubleshoot their code.

 JTAG/BDM lets you set simple breakpoints, single step, and examine and change memory and I/O... in short, everything you can do with a normal PC design environment, like Microsoft's Visual C++.
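A rough sketch of what the host side of such a port provides appears below. The function names are invented for illustration - no vendor's actual API looks exactly like this - but the capabilities are about right.

    /* Hypothetical host-side view of a BDM/JTAG-style serial debug port. */
    int dbg_halt(void);                          /* stop the target CPU      */
    int dbg_resume(void);                        /* let it run again         */
    int dbg_step(void);                          /* execute one instruction  */
    int dbg_set_breakpoint(unsigned long addr);  /* simple code breakpoint   */
    int dbg_read_mem(unsigned long addr, void *buf, unsigned long len);
    int dbg_write_mem(unsigned long addr, const void *buf, unsigned long len);

    /* Note what's missing: there is no dbg_trace(). The serial link carries
     * commands and data, not a cycle-by-cycle record of what the code did. */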

 BDM-like solutions are a reasonable subset of a debugging methodology. They're so inexpensive that every developer can have the toolset. Some tool vendors properly promote these as nothing more than debugging adjuncts, devices designed for working on certain non-real time sections of code. Their message is to "use the right tool for the right job - a BDM where it makes sense, and a full-function emulator for real time troubleshooting."

 Unhappily, too many of us are so taken with the prospect of cheap tools that we hear the good news about BDM/JTAG but somehow don't listen to the second part of the message. I believe chip vendors, frustrated by the difficulty of providing real emulation for their latest creations, promote these limited serial debuggers as the "perfect" embedded debugging tool.

 Cheap serial debuggers give us about the same debugging resources used by the millions of our programming brethren working on PCs, UNIX machines, and mainframes. So, why am I complaining?

 It's what we lose. Real time trace. Performance analysis. Overlay RAM. Timing analysis.

 Though the database programmers of the world have never had these tools, we need them; our problems are quite different. Our systems are riddled with interrupts and DMA. We run preemptive multitasking with closely coupled tasks. Most embedded systems have a critical real time component to their nature, so we need tools that let us work in the time domain.
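Here's the kind of problem I mean, sketched in a few illustrative lines of C: a bug that lives entirely in the time domain. No breakpoint will ever catch it, because stopping the processor makes it vanish.

    /* Main-line code and an interrupt handler share a 16-bit tick counter
     * on an 8-bit CPU. Reading the counter takes two byte accesses; if the
     * timer interrupt fires between them, elapsed() returns a wildly wrong
     * value - perhaps once an hour. Single-stepping hides the failure
     * completely; real-time trace and timing analysis expose it. */

    volatile unsigned int tick;        /* incremented by the timer ISR       */

    void timer_isr(void)               /* hooked to a 1 ms timer interrupt   */
    {
        tick++;
    }

    unsigned int elapsed(unsigned int start)
    {
        return tick - start;           /* non-atomic read: the race is here  */
    }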

 Though the new breed of serial debugging devices does make it possible to get our systems operating properly, they don't deal with the complex nature of our products. We're forced to work heroically due to tool limitations. 

But there's much more to the tool woes. I see developers squeezed between technology problems - which may ultimately prove to be solvable - and much more insidious business issues, some of which I honestly can't understand.

 

The Cost of Money

As an ex-tool vendor I can't count the times I've heard "well, we really need decent equipment, but my boss won't let me spend the money."

It matters little what equipment we're talking about. Once I wrote an off-hand comment about companies who won't upgrade computers. An avalanche of email filled my electronic in-box, from developers saddled with 386-class machines in the Pentium age. We live in front of our computers, spending hours per day with them. It's incomprehensible to me that a business won't give its very expensive engineers new machines every two years. I've seen compile times shrink from tens of minutes to tens of seconds when transitioning just one generation of computers; surely this translates immediately into real payroll savings and faster development times!
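Run the numbers yourself; mine below are purely illustrative, but the conclusion survives almost any reasonable assumptions.

    10 builds/day x 10 minutes saved per build   =  about 1.7 hours/day
    $100,000/year loaded cost / 2,000 hours      =  $50/hour
    1.7 hours x $50/hour x 230 working days      =  roughly $19,000 a year, per engineer

A new PC costs a small fraction of that.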

 Yes, we have an insatiable appetite for new goodies. Glimmering new scopes, emulators, logic analyzers, and software tools fill our thoughts much as kids dream of Tonkas and Barbies. Very often, though, the gap between what we want and what we get is as wide as the Grand Canyon.

Now, I know the cost and scarcity of capital. Just try going to the bank, hat humbly in hand, looking for working capital when you really need it. Venture capital is the seed of high tech, but is much less available than people realize.

 There's never enough money, especially in smaller businesses, so every decision is a financial trade-off between competing needs.

 I also know the cost of payroll. It's by far the biggest expense in most technology businesses. Yet many managers view payroll as a sunk cost. Years ago my boss told me "I have to pay you anyway, but to buy that scope costs me real money."

 Well, no, actually, he didn't have to pay me or any of the engineers. He had options: do less engineering with fewer people and save on salary. Use us inefficiently and ignore the costs. Work to improve our efficiency and either get products out faster or get the same work done with fewer people. 

This concept of payroll as a fixed cost is a myth, one that destroys too many technology companies. Managers do have the ability to manage this cost, the biggest one of all, effectively. It's not easy and it's never "done"; effective management requires an intimate understanding of the processes involved, a willingness to experiment and tune, and a dedication to a never-ending quest to find lots of 1 and 2% improvements, as the magic 20% efficiency improvement silver bullet does not exist.
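The arithmetic behind that quest is encouraging (the figure is only illustrative):

    ten separate 2% gains compound: 1.02^10 = 1.22, about a 22% improvement

That's the 20% silver bullet after all - just assembled from pieces too small for anyone to sell in a box.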

 Our culture of absorbing payroll as a fixed expense means we battle for weeks over $10,000 tool costs while ignoring, or accepting, a million dollars in salary costs.

 Perhaps this is symptomatic of uninformed managers, and exhibits itself in every area of development. One friend who makes a living designing products as a contractor tells me story after story of companies who happily spend a quarter million dollars on tooling for the product's plastic box yet balk at a quote for $30k in custom firmware.

 I see an increasing number of companies embracing the noble ideal of "doing more with less" without understanding that sometimes spending a bit on tools is the fastest route to that ideal.

 

Time To Market

You can't pick up a trade magazine today without seeing the industry's mantra - Time To Market - gracing every article and ad. All sorts of studies indicate getting a product out first is the best way to gain market share and profitability. Whether this is true or not makes little difference; the important point is that management has universally bought into the concept, leaving it up to engineering to somehow "make it so".

 The time to market furor explains surveys that show development time to be the number one priority of many engineering departments, with cost usually running third after quality. Whether we agree with the goals or not, it is at least a reasonable ranking of priorities.

 Get it done fast. Do a good job. And then, worry about costs. These are the constraints we're working under, in order.

 But we can't develop a realistic plan without considering all of the facts. One is that salaries continue to rise, especially now, and especially for highly trained and scarce engineers. None of us can control this.

Fast, gotta be fast. Cheap, too, somehow we have to save bucks wherever we can. OK - now what?

Astonishingly, more and more companies are making decisions like: no tools. Poor tools. Or, let's pick a chip that has no tools, or for which decent tools are but a dream.

How on earth are we supposed to be fast with inadequate tools? Won't costs skyrocket as we spend more time struggling to find bugs - bugs that are more elusive than ever as products get more complex - using what amounts to toys?

In the face of increasing salaries, more complex products, and terrifying schedules, all too often the question "how are we going to get the work done?" never gets answered honestly.

Yet, as you read this today, hundreds of companies pursue development strategies that are doomed to cost too much and take too long. Some use custom microprocessors - for good reasons and bad - and build their own compilers and debuggers. I'm not saying this is necessarily wrong, it's just costly. Some of these businesses understand and manage the issues; others just yell louder at the developers to meet the schedule.

I've seen months spent gluing CPUs inaccessibly into the core of a monster ASIC without the least thought given to debugging - and then the hardware guys present the firmware folks with this fait accompli and only two months left in the schedule.

We must look at the technology challenges posed by the parts we choose, and then at our options for building the system and then finding bugs. We must find or invent ways of achieving our fast-quality-cheap goals before committing to a difficult or impossible technology.

 And, management must understand that time costs money, real money, not just sunk costs. Further, crummy development environments never yield faster product introductions.

 

Are We the Problem?

This is not a Dilbert-like rant against managers. We're all infatuated with the latest technology, and we all are convinced that, this time, bugs won't be as big a problem as last time.

Embedded processors through what's left of the 90s, and on into the next century, will continue to get faster, more highly integrated, and will generally become much tougher to work on than those of yesteryear. That's a fact as sure as salary inflation and time to market pressures.

 It's largely up to the developers doing the work to educate management, and to make intelligent decisions yielding debuggable products.

 Often we are perceived as wanting everything without decent justifications. Faster computers, private offices, better software tools. Without educating our bosses about how these things save them money we'll lose most battles.

 A common joke is the "capital equipment justification," all too often more an exercise in creative writing than in fact gathering and analysis. Sometimes tool vendors will present you with spreadsheets of savings from using their latest widget, but none of us really trust these figures. It's far better to use hard-hitting, quantitative data accumulated from your own hard-won experience. Don't have any? Shame on you!

One well-known bug reducer is recording each bug, stopping and thinking for a few seconds about how you could have avoided making the mistake in the first place. Take this a step further and think through (and record!) how you found it, using what tools. Log it all in an engineering notebook as you work; it's a matter of a few seconds' time, yet will help you improve the way you work. This notebook will also serve as the raw data for your cost justifications. If that cruddy freeware compiler generated a bad opcode that took a day to find, a little math will quickly show how much money a multi-thousand dollar commercial package would save.
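That little math might look like this; every number below is an assumption you'd replace with figures from your own notebook, shown here only for illustration.

    One lost day   =  8 hours x $50/hour loaded cost      =  $400 in payroll
    Once a month over a year-long project: 12 x $400      =  $4,800
    ...plus more than two working weeks of slipped schedule.

Suddenly that multi-thousand dollar commercial compiler looks like one of the cheaper items in the budget.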

 As you educate management, educate yourself, and remember those lessons when you're the boss!

 I'll end as I started. No longer in the tool biz I have no vested interest in seeing anyone spend a nickel on the latest widget. I am a great believer, though, that the human condition has always been improved by our use of tools, from mastering fire to learning to use the simplest of plows, to today's amazingly complex and sophisticated embedded tools.

 Bugs will never go away. Better development methodologies can (will? Not until we individual developers create a personal passion for improvement) reduce the error rate, but never to zero. Debuggers - of many types - will always be important tools. Make sure your hardware design and processor selection will allow you to use tools effectively, and then make intelligent decisions about which to get.