Selecting a CPU
How do you decide what CPU to use? Here are some factors to consider.
Published in Embedded Systems Programming, December 1990
By Jack Ganssle
Have you ever stopped to reflect on the evolution of the microprocessor? Everyone knows that the long-defunct 4004 and its immediate successor, the 4040, created this industry. Most of us who remember these chips are now the graybeards of the business, which, at age 36, I find sobering. How fast electronics changes!
Today you can buy an incredible number of MIPS for rock-bottom prices. RISC, DSP, and advanced CISC processors dominate the trade press. All of us have or covet a 386 or 68030-based desktop machine. It's important to remember that the highly visible personal computer is but the tip of the computer iceberg. The workhorses of our industry are the 8 and, yes, 4 bit machines. Nearly every appliance is based on a four bit CPU; nearly every instrument, controller, and process control device uses an 8 bit machine. Dataquest predicts that in 1990 eight bit CPUs will outsell 16 bitters by more than an order of magnitude; by 1993 the market share of 8 bitters will increase even more, to some 20 times that of 16 and 32 bits together.
Despite the staggering number of 8 bit computers in use, the architecture really hasn't changed much in almost 15 years. The well deserved early demise of the first 8 bit micro, the 8008, made everyone in the industry expect that its successors would die equally quick deaths. Who would have believed that the Z80 could, after a decade and a half, still be one of the best selling microprocessors in the world? Although it has evolved into the 64180 and Z280, the Z80 itself is still designed into new systems every day of the year. The Z80's heirs offer a high degree of peripheral integration, but differ little architecturally from their heritage. Motorola's line of CPUs has followed the same path - incremental improvements still have not really changed the nature of the animal.
In the single chip world, the 6800 and 8051 families are still the most-used processors. Dozens of descendants all offer variations on memory capacity and peripheral mix, but differ little in terms of raw horsepower or instruction set.
I think the insatiable demand for some amount of cheap processing power created our industry, but that this demand has been largely satisfied by the capabilities of 70's era CPUs. Yes, some applications will always need a tremendous amount of compute horsepower, and will always require the latest marvel of silicon (or GaAs) engineering. But how much compute power do you need to run a microwave oven? How much will you ever need?
Hitachi's 647180X is a really clever example of where we may be headed. Silicon is cheap, but rather than try to create a new, incompatible generation of computers, they ported the 64180 core (again, just a Z80 in disguise) to the single chip world. A vast number of peripherals, limited only by pin count (not silicon real estate), with plenty of on-board memory, reduces total system cost. The Z80 compatibility lets real world users port their old applications to the chip, and lets them use the tools the industry spent years developing and proving. Another interesting example is Zilog's new Z8. At 18 pins, it is physically small and cheap enough to be used as a true distributed processor - put one wherever you have even a little data collection or reduction to do. Does it have a sexy instruction set? No. Does it elegantly solve a wide range of problems? Most certainly.
At last year's Embedded Systems Conference, Andy Grove of Intel announced that by the end of the century Intel's darling will be the 80786 with 100 million transistors and a 250 MHz clock rate. What will we do with all of this capability? High end applications will always exist, but the bulk of the industry seems to be low performance, embedded applications. (But, I want one of those babies in my CAD machine!)
I'm writing this on a Toshiba T1000 at 37,000 feet. Coffee stained, battered, and travel-weary, its 4.77 MHz 80C88 doesn't impress the fellow across the aisle with the latest Compaq wonder. His batteries died two hours ago - I'm still working. Strive for appropriate technology!
Just as "Appropriate Technology" is the new buzzword for third world development efforts, it should be a rallying point for designers of the latest hi-tech equipment as well. Remember that we're not designing for the sake of producing something wonderful - we're designing to make a marketable product! Probably no decision you will make will affect the overall nature and cost of the product as much as the CPU selection, so be sure to pick one that fits the job at hand, rather than the one getting the most press coverage.
In most organizations the hardware designers select a processor that meets certain performance criteria and that cleanly interfaces to the rest of the system. Software concerns are just as important as hardware issues, and should be just as influential.
A crucial decision must be made - can you reuse your old code? If 70% of the system can be stolen from current projects, especially if coded in assembly, fight to keep the current architecture! The software is probably the biggest part of the engineering budget - preserve your investment.
Some aspects of CPU selection are entirely unrelated to technical issues. The product's marketability is a function of its total life cycle costs, including hard-to-measure yet critical expenses like inventory handling. If your company uses 68000s and 68000 peripherals in all of its products, do you really want to stock another entire family of components?
What experience does the design team have? Retraining an army of 8088 programmers in the intricacies of the 68000 is bound to be expensive. They'll be learning while doing, struggling with new hardware, instruction sets, and tools; making mistakes while the project's meter is running. The production, repair, and support groups will need additional equipment and training. If, however, the company's long range plans involve a move to the new architecture, then taking the plunge is unavoidable. Try to pick a simple project for starters and budget enough time to overcome the inevitable learning curve.
The cost of tools is an important consideration. Even if you ultimately decide to use the same processor employed in all of the company's products, the start of a project is a good time to re-evaluate the development environment. Maybe now is the time to move from assembly to C, or to upgrade to an ANSI compatible compiler. Are the debuggers adequate? Should you try a source code management program? Programmer time is hideously expensive - look for ways to spend a few bucks up front to save lots of time downstream.
Obviously the cost of the processor itself is important, but don't be lulled into looking at its price alone. Total system cost is the issue - the half dollar CPU that needs a twenty dollar UART is no bargain.
High integration parts offer a wide range of peripherals on the processor's die. This can dramatically reduce system costs. Even if the chip is relatively expensive, smaller PC boards, fewer sockets, and simpler decoding might swing the cost tradeoff in favor of the high integration solution.
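The total-cost comparison is simple arithmetic, but it is worth writing down. Here is a minimal sketch in Python; the half dollar CPU and twenty dollar UART come from the example above, while the price of the high integration part is a made-up figure for illustration only.

```python
# Bill-of-materials figures in dollars. The CPU and UART prices echo the
# example in the text; the integrated part's price is hypothetical.
discrete_design = {
    "cpu": 0.50,
    "uart": 20.00,
}
integrated_design = {
    "cpu_with_onchip_uart": 9.00,  # hypothetical price
}


def total_cost(parts):
    """Sum the cost of every part the design needs."""
    return sum(parts.values())


def cheapest_design(designs):
    """Return the name of the design with the lowest total cost."""
    return min(designs, key=lambda name: total_cost(designs[name]))


designs = {"discrete": discrete_design, "integrated": integrated_design}
for name, parts in designs.items():
    print(f"{name}: ${total_cost(parts):.2f}")
print("cheapest:", cheapest_design(designs))
```

A real comparison would add rows for PC board area, sockets, and decoding logic, since those are where the high integration part tends to win.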
Carefully analyze the peripheral selection included on the chip. Is it really adequate? In an interrupt-driven environment, be sure that the interrupt structure is useful. If the device is a microcontroller, does it include enough RAM and ROM? This may impact language selection.
Fast data transfers are sometimes important. The on board DMA controllers are frequently used to move memory data around. If this is a requirement, be sure the chip supports memory to memory DMA. Some restrict DMA to memory to I/O cycles.
A lot of embedded systems depend on a real time operating system. Context switching is sequenced by a regular timer interrupt. Does the chip have a spare timer? Don't let the hardware team allocate all of them to their more visible needs. Few hardware guys realize that these sorts of resources are sometimes needed to make the software run.
Whether a microcontroller or microprocessor is used, one of the biggest software issues in processor selection is the CPU's address space. Will your program fit into memory?
Microcontrollers typically have only small amounts of on-board memory. Remember - once the chip choice is made, you are committed to making the code work on that CPU. It's very sleazy to report to your boss halfway through a project that a different CPU will be needed. A simple remote data logger might be coded in a high level language, while complex applications can be a nightmare to shoehorn in. Be very sure about memory needs before casting the processor choice in concrete!
The huge address spaces of 16 bit microprocessors are more than adequate for most programs. This is not the case in the 8 bit world, which usually limits addresses to 64k. Once this seemed like an unlimited ocean we could never fill. Embedded projects are getting bigger, and less efficient tools are regularly used to reduce the NRE costs. 64k might not be enough.
Some designs use bank switching schemes or memory management units (MMUs) to let the program disable one section of RAM or ROM and bring in another. While potentially giving access to immense memory arrays, neither of these approaches yields the nice huge linear address space we all yearn for. (The same could be said about segmentation on the 80x88). Will your compiler support the MMU? Few compilers automatically use the memory manager to squeeze a big program into the project's virtual address space. In general, you'll have to handle the MMU manually, perhaps tediously issuing lots of MMU commands throughout the code. Still, it does offer a reasonable way out of the memory constraints imposed by a 16 bit address bus.
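To make the bank switching idea concrete, here is a minimal sketch of how a 16 bit logical address might map to a larger physical address. The window location, bank size, and base address are all hypothetical; every real part has its own register layout, so treat this as a model of the concept rather than of any particular chip.

```python
# Hypothetical layout: one 16 KB window in the 64k logical space is banked;
# everything outside the window maps one-to-one to physical memory.
BANK_WINDOW_START = 0x8000   # window covers logical 0x8000-0xBFFF
BANK_WINDOW_SIZE = 0x4000    # 16 KB per bank
BANKED_BASE = 0x10000        # banked memory sits above the first 64k


def physical_address(logical, bank):
    """Translate a 16 bit logical address using the selected bank."""
    if not 0 <= logical <= 0xFFFF:
        raise ValueError("logical address must fit in 16 bits")
    window_end = BANK_WINDOW_START + BANK_WINDOW_SIZE
    if BANK_WINDOW_START <= logical < window_end:
        offset = logical - BANK_WINDOW_START
        return BANKED_BASE + bank * BANK_WINDOW_SIZE + offset
    return logical  # common area, unaffected by the bank register


# The same logical address reaches different memory as the bank changes.
for bank in range(3):
    print(f"bank {bank}: 0x8010 -> {physical_address(0x8010, bank):#07x}")
```

The sketch also shows why the result is not a linear address space: code running inside the window cannot directly reach data that lives in a different bank.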
In selecting the processor most companies first look at performance. Semiconductor vendors are happy to ship you crateloads of comparative benchmarks showing how their latest CPU outperforms the competition. Dhrystones, Whetstones, MIPS, and Linpacks - their numbers are legion, baffling, and usually meaningless. An embedded CPU needs just enough horsepower to solve its one, specific problem. Only two questions are relevant: will this processor get the job done within specified time constraints? Will it satisfy performance needs imposed on the product or its derivatives in the future? The answers are sometimes not easy to determine.
If the project is an incremental upgrade of an existing product then consider instrumenting the current code to measure exactly how much free processor time exists. The results are always interesting and sometimes terrifying. Performance analyzers will non-intrusively show the percentage of time spent in each section of the code, particularly the idle loop. You can do the same with an oscilloscope. Add code to set an I/O bit high only when the idle loop is running. The scope will immediately show the bit's state - it's a simple matter to then compute the percent CPU utilization. In a performance-critical application, it's a good idea to build this code in from the beginning. It can be IFed out prior to shipment, but easily reenabled at any time for maintenance.
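The arithmetic behind the oscilloscope measurement is short. As a sketch, assume the I/O bit is high only while the idle loop runs, and that you read the total high time over some window off the scope; the function name and the sample reading below are hypothetical.

```python
def cpu_utilization_percent(idle_high_ms, window_ms):
    """Percent of the window the CPU spent outside the idle loop."""
    if window_ms <= 0:
        raise ValueError("window must be positive")
    if not 0 <= idle_high_ms <= window_ms:
        raise ValueError("idle time must lie within the window")
    return 100.0 * (window_ms - idle_high_ms) / window_ms


# Hypothetical scope reading: the bit is high 7.5 ms out of every 10 ms,
# so the processor is busy for the remaining quarter of the time.
print(cpu_utilization_percent(7.5, 10.0))
```

Pick a window long enough to cover the slowest periodic activity in the system, or the figure will swing with whatever happened to run during the capture.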
When a new project is complete, consider making these measurements to close the feedback loop on the design process. Just how accurate were your performance estimations? If, as is often the case, the numbers bear little resemblance to the original goals, then figure out where the errors were, and use this information to improve your estimating ability. Without feedback, you work forever in the dark.
Most studies find that 90% of a processor's time is spent executing 10% of the code. Identify this 10% in the design (before writing code) and focus your energies on this section.
Modelling the critical sections is always a good idea. Try writing similar code on a PC, being sure to use the same language as in the final system. Set up an experiment to run the routine hundreds or thousands of times, so you can get accurate elapsed time measurements.
The trick is then to port these execution figures to estimates on the target hardware. Well-heeled companies buy a development board for each of the CPUs being evaluated. Smaller firms can use simulators (although these cost as much as the boards!), or you can "guesstimate" a conversion factor. Compare instruction sets and resulting timing. Include wait states and DMA activity. You can get quite accurate numbers this way, but a wise designer will then add in another 50% in the interest of conservative design.
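The measure-then-scale procedure can be sketched as follows. Python is used here only to show the shape of the experiment; for real numbers, write the routine in the language of the final system, as recommended above. The checksum routine and the slowdown factor of 40 are hypothetical stand-ins.

```python
import time


def checksum(data):
    """Stand-in for the critical routine being modelled (hypothetical)."""
    total = 0
    for byte in data:
        total = (total + byte) & 0xFFFF
    return total


def seconds_per_call(routine, arg, runs=1000):
    """Run the routine many times and return the average time per call."""
    start = time.perf_counter()
    for _ in range(runs):
        routine(arg)
    return (time.perf_counter() - start) / runs


def target_estimate(pc_seconds, slowdown_factor, margin=0.5):
    """Scale a PC timing to the target, then add a safety margin (50%)."""
    return pc_seconds * slowdown_factor * (1.0 + margin)


pc_time = seconds_per_call(checksum, bytes(256), runs=2000)
# The slowdown factor is a guess derived from comparing instruction sets,
# clock rates, wait states, and DMA activity; 40 is purely illustrative.
print(f"PC: {pc_time * 1e6:.1f} us per call")
print(f"target estimate: {target_estimate(pc_time, 40.0) * 1e6:.1f} us per call")
```

The estimate is only as good as the slowdown factor, which is why the margin is applied after scaling rather than folded into the guess.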
Real time operations, especially those synchronized to external devices, can be modelled the same way. The PC has an extensive interrupt structure - use it! Its software interrupts can be used to simulate external hardware events.
While modelling the code, use this opportunity to debug the algorithms involved. A PC or other system with a friendly debugging interface makes it easy to work out conceptual bugs while also estimating the code's performance. It is surprising how often actually making something work will turn up problems that demand much more memory or performance. Hardware engineers prototype hardware; we should prototype software.
Sometimes your code will have to respond to very fast external devices that stretch the processor's capability to the limit. Success depends on confining the fast code to a single small routine that can be studied carefully and accurately before proceeding. You may have little choice but to write assembly code on paper and count instruction execution times. The great peril in this is proceeding under the assumption that the code will work - it seldom does. No matter how simple, have an associate evaluate it.
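Counting instruction times on paper amounts to a table lookup and a sum, so it is easy to mechanize. The sketch below budgets a four instruction Z80 input loop. The T-state counts are the standard documented Z80 figures, but check them against your own data book; the 4 MHz clock, the 10 microsecond device period, and the wait state allowance are hypothetical.

```python
# Z80 T-states per instruction; verify against the data book before use.
T_STATES = {
    "IN A,(n)": 11,
    "LD (HL),A": 7,
    "INC HL": 6,
    "DJNZ (taken)": 13,
}

# Hypothetical inner loop: read a port, store the byte, advance, repeat.
LOOP = ["IN A,(n)", "LD (HL),A", "INC HL", "DJNZ (taken)"]


def loop_t_states(instructions, extra_t_states=0):
    """Total T-states for one pass, plus any wait states or DMA stalls."""
    return sum(T_STATES[name] for name in instructions) + extra_t_states


def loop_time_us(instructions, clock_mhz, extra_t_states=0):
    """Time for one pass in microseconds (T-states divided by MHz)."""
    return loop_t_states(instructions, extra_t_states) / clock_mhz


CLOCK_MHZ = 4.0          # hypothetical clock rate
DEVICE_PERIOD_US = 10.0  # hypothetical: device delivers a byte every 10 us

for extra in (0, 4):
    t = loop_time_us(LOOP, CLOCK_MHZ, extra)
    verdict = "keeps up" if t <= DEVICE_PERIOD_US else "too slow"
    print(f"{extra} extra T-states: {t:.2f} us per byte, {verdict}")
```

With no wait states the loop takes 37 T-states and just fits; four extra T-states per pass push it over the limit, which is exactly the kind of margin an associate reviewing the count should question.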
Forget about computing timing with complex processors. Some, like the H16 and Z280, have on board cache, prefetchers and other hardware that make it all but impossible to measure instruction times. The timing tables included in the manufacturer's data books seem to be designed to obfuscate. How long does an ADD take? No one really seems to know...
When making the tradeoffs, be sure to factor in special useful features of the processor. A multiply instruction can speed the code up considerably - if it is useful. Sometimes the highly touted multiply or divide runs surprisingly slowly. Check the timing! It's also common to find the math instructions work with very limited precision.
Picking the processor is not easy. Consider using a feature matrix with weighting factors scaled to the project's tradeoffs. When there is no clear deciding factor between competing CPUs it's usually a good idea to simply stick with your company's current processor.
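A feature matrix of this kind can be sketched in a few lines. Every weight, score, and CPU name below is hypothetical; the 10% lead required to displace the incumbent is also an assumption, meant only to model the advice to stay with the current processor when nothing clearly decides the matter.

```python
# Hypothetical weights (importance) and scores (0-10, higher is better).
WEIGHTS = {
    "code_reuse": 5,
    "team_experience": 4,
    "system_cost": 4,
    "tool_cost": 3,
    "performance": 2,
}
CANDIDATES = {
    "current_cpu": {"code_reuse": 9, "team_experience": 9, "system_cost": 5,
                    "tool_cost": 8, "performance": 4},
    "new_cpu": {"code_reuse": 2, "team_experience": 3, "system_cost": 8,
                "tool_cost": 3, "performance": 9},
}


def weighted_score(scores, weights):
    """Sum of weight times score over every feature in the matrix."""
    return sum(weights[feature] * scores[feature] for feature in weights)


def choose(candidates, weights, incumbent, min_lead=0.10):
    """Pick the top scorer, unless it fails to beat the incumbent clearly."""
    totals = {name: weighted_score(s, weights) for name, s in candidates.items()}
    best = max(totals, key=totals.get)
    if best != incumbent and totals[best] < totals[incumbent] * (1.0 + min_lead):
        return incumbent  # no clear winner: stay with what you have
    return best


for name, scores in CANDIDATES.items():
    print(name, weighted_score(scores, WEIGHTS))
print("choice:", choose(CANDIDATES, WEIGHTS, incumbent="current_cpu"))
```

The numbers matter less than the argument over the weights, which forces the hardware and software people to state their priorities in the same units.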