An Intro to DSP
The Basics of Digital Signal Processing. Originally in Embedded Systems Programming, May, 1998.
For novel ideas about building embedded systems (both hardware and firmware), join the 35,000 engineers who subscribe to The Embedded Muse, a free biweekly newsletter. The Muse has no hype and no vendor PR. Click here to subscribe.
By Jack Ganssle
Perhaps one of the last bastions of old-fashioned embedded design is one of the newest, hottest fields: Digital Signal Processors. DSP developers still tune their code for every last bit of performance, squeezing astonishing algorithms into sometimes tiny memory spaces.
Though DSPs have been around since the early 80s, now their skyrocketing performance, decreasing price, and reasonable power consumption make them the workhorse of choice for many computational-intensive applications.
But the story of DSP is the story of signal processing. A DSP is not your typical controller. These parts have architectures highly optimized for specific types of operations.
Signal Processing 101
Unhappily for those of us who slid through math classes in college, and who managed to forget anything beyond basic Algebra in the years hence, the entire raison d'tre for DSPs revolves around applying non-trivial math to complex analog signals. Sometimes they're also used as accelerators for other very fast, complex mathematical transformations like graphics algorithms.
DSP architecture simply makes no sense unless you have a basic understanding of the sorts of math used to process signals.
A mathematical function of any sort, from a simple multiplication to a filter or FFT, is an operation that transforms one or more input signals to an output. We can think of a function for computing distance as a closed box - an algorithm - that accepts time and speed as inputs, producing distance as output.
Analog signals, such as used in virtually all of the electronic devices we interact with daily, are voltages that vary with time. The 110 VAC (in the USA) at your wall outlet is a smoothly varying voltage that fluctuates from minimum to maximum 60 times per second. The output of your CD player is a similar voltage varying much more complexly with time to accurately convert a digital bit stream to analog speaker motion representing Beethoven's Fifth.
Signal processing algorithms accept analog signals and feed them through a function to convert them to something else. One example is the filter.
Filters are conceptually simple. A "high pass" filter accepts an analog input and blocks all of the low frequency components. "Low pass" filters do the opposite. "Notch filters" block or pass only one narrow frequency range.
Remember the bad old days of phonograph records? The tiniest imperfection created an annoying pop in the speakers. A filter can reduce the amplitude of the pop (rendering it less audible) by simply averaging three or four sequential samples.
That is, if the phonograph produced a signal voltage that varies with time called f(t), then you could transform it by passing it through a filtering algorithm of the form:
o(t) = [f(t-1) + f(t) + f(t+1)]/3
which is nothing more than a simple moving average. Three adjacent points in time are summed, then divided by the number of points to form a true average.
This is really a simple form of the normal">convolution, an crucially important signal processing concept. Convolutions, which feed one function through another, are notationally represented by the asterisk. In the example above, we convolve the input signal f(t) with an array of constants (call them c(t)), as follows:
o(t) = f(t) * c(t).
In this example the constant array is nothing more than 1/3, 1/3, and 1/3. That is, for every point of the input signal, the convolution f(t) * c(t) multiplies three adjacent points by 1/3, and sums the three points.
Though the trivial function we've described does reduce the size of the pop, it does so by smearing the pop over three adjacent outputs. In fact, this moving average has the effect of smearing all of the music just a bit, not something a purist would tolerate.
A different sort of filter, using the same math but a different convolving function, could smear less while still reducing the amplitude of the pop. For example, instead of a three point moving average use 5 points, selecting a different set of convolving coefficients:
o(t) = [2f(t-2) + 3f(t-1) + 5f(t) + 3f(t+1) + 2f(t+2)]/15
See how the center point is emphasized while outlying points have much less influence on the transformed signal? We've changed our simple moving average to an average that has "shape" associated with it (defined by the convolving coefficients 2, 3, 5, 3, 2). There's less signal smearing yet the amplitude of the pop still decreases.
A more complex notch filter could, if you knew the spectral characteristics of the pop, totally eliminate its sound by blocking anything with those defined parameters.
This is an example of a Finite Impulse Response (FIR) filter. The output is a function of the inputs convolved with a set of coefficients.
Another common filter is the Infinite Impulse Response (IIR) filter. Like the FIR, it's a convolution of coefficients and inputs. Unlike the FIR, the IIR equation also includes some number of outputs in the result - a form of feedback.
FIRs and IIRs are the basis for a vast number of signal processing algorithms implemented by DSPs. It is rather easy to make a filter using analog components like op amps, resistors and capacitors. And, in fact many electronic products use filter heavily, from CD players to radios. Analog designs suffer from drift and stability, though, and are increasingly difficult to build as your filter requirements become more precise. A digital implementation lets you precisely specify your filtering requirements by fiddling with the coefficients.
Even better, you can change the characteristics of a digital filter on the fly by modifying the convolving coefficients. These adaptive filters can more precisely match the noise spectrum in a particular phone line, for example, or adjust itself to a person's speech characteristics in a voice recognition circuit. No analog filter implementation is so dynamic.
For non-analog types it's hard to get excited about filtering; somehow it's tough to relate the math to our daily experience. One simple application might help. Consider music synthesizers: they use electronics to produce sounds imitating those created by conventional musical instruments. It turns out a particular type of IIR filter produces sounds almost exactly the same as those made by plucked strings. Strange as it may seem, if you feed white noise - a Guassian distribution of sound - into the filter, you can simulate all sorts of stringed instruments just by modifying the convolving coefficients. Change from Violin to Cello by passing a new set of constants to the algorithm!
Filters are not the only sort of math performed by DSPs. In 1822 Jean Fourier found that all periodic waveforms can be expressed as the sum of sine waves of different frequencies and amplitudes. Sometimes a lot of sine waves - a perfect square wave includes an in finite number of sine components.
Various Fourier Transforms change functions of time into functions of frequency, and vice versa. In other words, a Fourier transform of a voltage rapidly digitized by an A/D can give us a map of every frequency component in that signal. For decades engineers have used spectrum analyzers to do this transformation in a development lab environment - it's quite dramatic to see the complex waveforms of, say, the cellular radio band displayed as a graph of vertical lines showing the amplitude on one axis, and their frequency on the horizontal axis. A DSP running a Fourier transform does much the same.
The vector dot product (the sum of the pointwise multiplication of two vectors) is important to many applications like correlation, matrix math and multi-dimensional signal processing. Graphics algorithms use vector addition. Other convolution techniques are the basis for error correction in transmitted signals.
Convolutions, filters, Fourier transforms, vector products and many other common signal processing tasks all share a common implementation detail: the algorithms all compute vast numbers of sums of products (a=bc+d). This is the basis for DSPs.
Most DSP applications employ a similar design. An analog front-end feeds data to an A/D converter that digitizes data in the time domain. The DSP reads the samples and computes lots and lots of equations of the form a=bc+d
I'll present my Better Firmware Faster seminar in Melbourne and Perth, Australia February 20 and 26th. All are invited. More info here. The early registration discount ends January 20.
Going back to eliminating the phonograph pop, an A/D would sample the musical stream tens of thousands of times a second (44.1 Khz, or 23 microseconds per sample, for CD-quality sound). The DSP would compute the simple filter described, or more likely one with many more convolution terms in real time, generating a digital representation of the smoothed music which a D/A then converts back to analog. Given realistic numbers of convolution coefficients, even in this simple system multiplies and adds are taking place at sub-microsecond rates.
The first characteristic of a DSP is its ability to solve equations of the form a=bx+b fast!
These high-speed math operations are moving a lot of data and perhaps instructions around on system busses. The array of coefficients may be large; certainly the data stream is huge, though it may not have to have much transient on-board storage. The speeds may simply be too high for reasonably-priced memory arrays.
The second characteristic of a DSP is it's memory configuration. Most DSPs include some amount of on-board memory that runs at full system speed.
In our mobile world half our users want to take the pop-reduction equipment on the road, perhaps operating off a couple of AA batteries for weeks on end. Presumably someone cranks the phonograph manually!
The third characteristic of a DSP is it's power consumption. It's a battery-operated world, yet today's batteries have quite limited amp-hour capacities.
DSPs use a number of architectural approaches to compute sums of products at staggering rates. The first is Harvard or Harvard-like bus structures, where the instruction and data spaces are separated. This lets the CPU fetch instructions and data simultaneously. Unfortunately now there are a baffling number of variants, some even including cross-over switches to flip busses onto each other. It's virtually impossible to make a deterministic comparison between bus architectures. Each has its own proponent; each comes with its own strengths and weaknesses. It seems each new family of DSPs has its own approach to moving data around fast.
Some of this memory is invariably on-board for speed reasons. Modern DSP speeds range upwards of 200 MHz - a 5 nsec bus cycle, far too fast for most memory systems. External RAM, if used, generally runs with wait states to control system costs. This means the very fastest portions of the code should relatively small, tight loops that can live on-board the chip.
You'll frequently find DSP chips that run multiple processes in parallel, performing several multiply/accumulates at the same time. TI's latest entry claims a sustained rate of 8 operations per clock, for a staggering 1.6 billion instruction rate at 200 MHz.
The core of the DSP (and a critical difference between DSPs and conventional microprocessors) is the Multiply/Accumulate unit (MAC) which is the brains behind solving a=bc+d repeatedly and quickly. Words cannot do justice to the MAC. Instead, look at this code snippet for the ZSP16401 from ZSP Corp:
loop: lddu r8, r13, 2 ; load x[i],x[i+1] lddu r6, r14, 2 ; load y[i],y[i+1] mac2.a r8, r6 1 ; accumulate x[i]*y[i] + x[i+1]* y[i+1] agn0 loop ; loop
After the first pass through the loop the CPU executes this operation in a single clock cycle! At 200 Mhz that's 5 nsec per computation. Prefetchers and cache logic read ahead to keep the DSP fed with the x and y data arrays.
A big problem with fast DSPs is simply keeping the beast fed with data. Some specmanship may show mindboggling numbers of MIPs. A one cycle multiply/accumulate is wonderful! if the system design can keep data pouring into the unit fast enough. Make sure that these numbers indicate sustained ratings, not a quick burst or two.
The quest for speed does carry a price: power. When the Pentium first came out it consumed 14 watts, putting out enough heat to almost be dangerous, while sucking down battery supplies at alarming rates. DSPs also have a speed/power tradeoff, though their use in so many portable applications means most include power-saving modes, and most offer quite reasonable power consumption. For example, TI's TMS320C40 consumes only about 1 ma per MIP at 3 volts.
As the cost per transistor continues to plummet more vendors are adding floating point math units to their DSPs. Till recently all DSPs were integer units, processing typically 16, 24 or 32 bits of integer info at a time.
Integers, though, are less than useful in many applications. Most non-floating point applications use "fixed point" math, where a 16 bit accumulator has an implied binary point. Typically the MSB is a sign bit; the binary point is positioned to the right of the sign. This means the entire accumulator represents a fraction that ranges from 0 to .111111111111111, or around .99. (The first bit to the right of the binary point represents ', the second represents ', etc.)
Fixed point is handy since multiplies can never cause an overflow. Additions can, though, so most processors include a mechanism for detecting this. It's important to realize, though, that fixed point is purely a programming convention; any ALU that can do two's complement math does fixed point automatically as the math works out the same. A certain amount of manual scaling is required, since A/Ds generally present integer data.
The precision represented by this notation is limited to the 15 bits after the binary point. The dynamic range, unless extra user-written code extends it, is equally limited. In some applications inputs and product terms may span vast dynamic ranges. If you select a DSP with on-board floating point you can generally ignore the range issue as the usual notation gives 24 bits of precision and a range of about plus or minus 2127.
The pages of this magazine show plenty of tools for writing and debugging DSP code. A few factors, though, make for some unique challenges with this technology.
The assembly Vs C debate was settled a long time ago for most microprocessor applications. 8 bit systems are now routinely coded in C - even for C-hostile chips like the 8051 and PIC. This is much less true for DSPs.
The quest for raw horsepower means that - at least in the fastest portions of the firmware - developers almost always write DSP code in assembly language. Indeed, many applications use assembly from top to bottom, though this is starting to change.
Speed is not the only problem with high level languages. C does not have a fixed point standard, a compelling argument in favor of the more expensive floating point chips.
DSPs naturally lend themselves to a well-thought-out mix of C and assembly. It'll be a long, long time before they are so fast that machine cycles can be tossed away for the sake of using a HLL, as we regularly do to reduce development time with conventional microprocessors.
While DSP compilers have lagged those of CISC, their debugging tools eerily foreshadowed the direction that microprocessor debuggers are only now headed. Since the speeds are so great, and since so much of a DSP application lives in on-chip memory, conventional emulators that monitor the bus are rare.
Instead, most DSP chips include a JTAG interface optimized for debugging. A few pins are dedicated to serial in/out streams driven by a relatively inexpensive control unit. Single stepping, memory and I/O access, and the like all takes place over this serial interface. A high level debugger with GUI, that understands the file formats produced by assemblers and compilers, gives the developer a window into the system. Though this generally precludes overlay RAM and real time trace, it's a realistic compromise between debugging needs and physical realities.
Motorola pioneered this with CISC chips with their BDM interface, now a standard fixture on most of their parts. Other vendors have found it to be a cost-effective way to get some level of debugging power to developers.
A tough part of DSP design is selecting algorithms and coefficients for the particular filters for your application. Lots of packages, from freebies on the net to CDs included with DSP books to some commercial packages are available. Filter design is an art in itself, requiring a fair amount of study to insure the resulting designs are stable and efficient.
The DSP has made entirely new sorts of technologies available. Digital cellular wouldn't exist without them, as they do error correction, equalization, and compression/decompression of speech in real time, in a noisy, impossible RF environment.
There's a significant shift going on from custom signal-processing ASICs to commercial DSP chips. Even ignoring all of the benefits of using an off-the-shelf part, like all computers the DSP is a programmable device. In a world full of change, ASICs whose logic is cast in metaphorical concrete, are a significant liability. Some applications - like ADSL and ISP modems - are on the market today as the communications standards change at a frightening rate. Downloading new code sure beats replacing a board.
Applications from sonar/radar processing, motor speed control, image enhancement/compression, speech processing and a thousand other computationally-intensive means that DSP sales continue to grow at a rate some 50% faster than the overall semiconductor growth rate. Prices have never been lower while performance continues to skyrocket. Though the big players in the market are TI, Lucent, Analog Devices and Motorola (accounting for over 90% of all DSP chips sold) lots of other companies offer intriguing parts balancing speed, power, and memory organization.
Signal processing is coming of age; though DSPs will never replace conventional microprocessors, we'll see them used wherever small but fast chunks of looping code massage massive quantities of data.