An Intro to DSP
The Basics of Digital Signal Processing. Originally in Embedded Systems
Programming, May, 1998.
 |
For hints, tricks and ideas about better ways to build embedded systems, subscribe to The Embedded Muse, a free biweekly e-newsletter. No hype, just down to earth embedded talk. 23,000 other engineers subscribe. It takes just a few seconds (all we need is your email address, which is shared with absolutely no one) to subscribe to the Embedded Muse. |
Perhaps one of the last bastions of old-fashioned embedded
design is one of the newest, hottest fields: Digital Signal Processors. DSP
developers still tune their code for every last bit of performance, squeezing
astonishing algorithms into sometimes tiny memory spaces.
Though DSPs have been around since the early 80s, now
their skyrocketing performance, decreasing price, and reasonable power
consumption make them the workhorse of choice for many computational-intensive
applications.
But the story of DSP is the story of signal
processing. A DSP is not your typical controller. These parts have architectures
highly optimized for specific types of operations.
Signal Processing 101
Unhappily for those of us who slid through math classes in
college, and who managed to forget anything beyond basic Algebra in the years
hence, the entire raison d'tre for DSPs revolves around applying non-trivial
math to complex analog signals. Sometimes they're also used as accelerators
for other very fast, complex mathematical transformations like graphics
algorithms.
DSP architecture simply makes no sense unless you
have a basic understanding of the sorts of math used to process signals.
A mathematical function of any sort, from a simple
multiplication to a filter or FFT, is an operation that transforms one or more
input signals to an output. We can think of a function for computing distance as
a closed box - an algorithm - that accepts time and speed as inputs, producing
distance as output.
Analog signals, such as used in virtually all of the
electronic devices we interact with daily, are voltages that vary with time. The
110 VAC (in the USA) at your wall outlet is a smoothly varying voltage that
fluctuates from minimum to maximum 60 times per second. The output of your CD
player is a similar voltage varying much more complexly with time to accurately
convert a digital bit stream to analog speaker motion representing Beethoven's
Fifth.
Signal processing algorithms accept analog signals
and feed them through a function to convert them to something else. One example
is the filter.
Filters are conceptually simple. A "high pass"
filter accepts an analog input and blocks all of the low frequency components.
"Low pass" filters do the opposite. "Notch filters" block or pass only
one narrow frequency range.
Remember the bad old days of phonograph records? The
tiniest imperfection created an annoying pop in the speakers. A filter can
reduce the amplitude of the pop (rendering it less audible) by simply averaging
three or four sequential samples.
That is, if the phonograph produced a signal voltage
that varies with time called f(t), then you could transform it by passing it
through a filtering algorithm of the form:
o(t) = [f(t-1) + f(t)
+ f(t+1)]/3
which is nothing more than a simple moving average.
Three adjacent points in time are summed, then divided by the number of points
to form a true average.
This is really a simple form of the
normal">convolution, an crucially important signal processing concept.
Convolutions, which feed one function through another, are notationally
represented by the asterisk. In the example above, we convolve the input signal
f(t) with an array of constants (call them c(t)), as follows:
o(t) = f(t) * c(t).
In this example the constant array is nothing more
than 1/3, 1/3, and 1/3. That is, for every point of the input signal, the
convolution f(t) * c(t) multiplies three adjacent points
by 1/3, and sums the three points.
Though the trivial function we've described does
reduce the size of the pop, it does so by smearing the pop over three adjacent
outputs. In fact, this moving average has the effect of smearing all of the
music just a bit, not something a purist would tolerate.
A different sort of filter, using the same math but a
different convolving function, could smear less while still reducing the
amplitude of the pop. For example, instead of a three point moving average use 5
points, selecting a different set of convolving coefficients:
o(t) = [2f(t-2) + 3f(t-1)
+ 5f(t) + 3f(t+1) + 2f(t+2)]/15
See how the center point is emphasized while outlying
points have much less influence on the transformed signal? We've changed our
simple moving average to an average that has "shape" associated with it
(defined by the convolving coefficients 2, 3, 5, 3, 2). There's less signal
smearing yet the amplitude of the pop still decreases.
A more complex notch filter could, if you
knew the spectral characteristics of the pop, totally eliminate its sound
by blocking anything with those defined parameters.
This is an example of a Finite Impulse Response (FIR)
filter. The output is a function of the inputs convolved with a set of
coefficients.
Another common filter is the Infinite Impulse
Response (IIR) filter. Like the FIR, it's a convolution of coefficients and
inputs. Unlike the FIR, the IIR equation also includes some number of outputs in
the result - a form of feedback.
FIRs and IIRs are the basis for a vast number of
signal processing algorithms implemented by DSPs. It is rather easy to make a
filter using analog components like op amps, resistors and capacitors. And, in
fact many electronic products use filter heavily, from CD players to radios.
Analog designs suffer from drift and stability, though, and are increasingly
difficult to build as your filter requirements become more precise. A digital
implementation lets you precisely specify your filtering requirements by
fiddling with the coefficients.
Even better, you can change the characteristics of a
digital filter on the fly by modifying
the convolving coefficients. These adaptive filters can more precisely match the
noise spectrum in a particular phone line, for example, or adjust itself to a
person's speech characteristics in a voice recognition circuit. No analog
filter implementation is so dynamic.
For non-analog types it's hard to get excited about
filtering; somehow it's tough to relate the math to our daily experience. One
simple application might help. Consider music synthesizers: they use electronics
to produce sounds imitating those created by conventional musical instruments.
It turns out a particular type of IIR filter produces sounds almost exactly the
same as those made by plucked strings. Strange as it may seem, if you feed white
noise - a Guassian distribution of sound - into the filter, you can simulate all
sorts of stringed instruments just by modifying the convolving coefficients.
Change from Violin to Cello by passing a new set of constants to the algorithm!
Filters are not the only sort of math performed by
DSPs. In 1822 Jean Fourier found that all periodic waveforms can be expressed as
the sum of sine waves of different frequencies and amplitudes. Sometimes a lot
of sine waves - a perfect square wave includes an in finite number of sine
components.
Various Fourier Transforms change functions of time
into functions of frequency, and vice versa. In other words, a Fourier transform
of a voltage rapidly digitized by an A/D can give us a map of every frequency
component in that signal. For decades engineers have used spectrum analyzers to
do this transformation in a development lab environment - it's quite dramatic
to see the complex waveforms of, say, the cellular radio band displayed as a
graph of vertical lines showing the amplitude on one axis, and their frequency
on the horizontal axis. A DSP running a Fourier transform does much the same.
The vector dot product (the sum of the pointwise
multiplication of two vectors) is important to many applications like
correlation, matrix math and multi-dimensional signal processing. Graphics
algorithms use vector addition. Other convolution techniques are the basis for
error correction in transmitted signals.
Convolutions, filters, Fourier transforms, vector
products and many other common signal processing tasks all share a common
implementation detail: the algorithms all compute vast numbers of sums of
products (a=bc+d). This is the basis for DSPs.
DSP Architectures
Most DSP applications employ a similar design. An analog
front-end feeds data to an A/D converter that digitizes data in the time domain.
The DSP reads the samples and computes lots and lots of equations of the form a=bc+d
at very high speeds. The output goes to a D/A and thence back to the real world.
Going back to eliminating the phonograph pop, an A/D
would sample the musical stream tens of thousands of times a second (44.1 Khz,
or 23 microseconds per sample, for CD-quality sound). The DSP would compute the
simple filter described, or more likely one with many more convolution terms in
real time, generating a digital representation of the smoothed music which a D/A
then converts back to analog. Given realistic numbers of convolution
coefficients, even in this simple system multiplies and adds are taking place at
sub-microsecond rates.
The first characteristic of a DSP is its ability to
solve equations of the form a=bx+b fast!
These high-speed math operations are moving a lot of
data and perhaps instructions around on system busses. The array of coefficients
may be large; certainly the data stream is huge, though it may not have to have
much transient on-board storage. The speeds may simply be too high for
reasonably-priced memory arrays.
The second characteristic of a DSP is it's memory
configuration. Most DSPs include some amount of on-board memory that runs at
full system speed.
In our mobile world half our users want to take the
pop-reduction equipment on the road, perhaps operating off a couple of AA
batteries for weeks on end. Presumably someone cranks the phonograph manually!
The third characteristic of a DSP is it's power
consumption. It's a battery-operated world, yet today's batteries have quite
limited amp-hour capacities.
DSPs use a number of architectural approaches to
compute sums of products at staggering rates. The first is Harvard or
Harvard-like bus structures, where the instruction and data spaces are
separated. This lets the CPU fetch instructions and data simultaneously.
Unfortunately now there are a baffling number of variants, some even including
cross-over switches to flip busses onto each other. It's virtually impossible
to make a deterministic comparison between bus architectures. Each has its own
proponent; each comes with its own strengths and weaknesses. It seems each new
family of DSPs has its own approach to moving data around fast.
Some of this memory is invariably on-board for speed
reasons. Modern DSP speeds range upwards of 200 MHz - a 5 nsec bus cycle, far
too fast for most memory systems. External RAM, if used, generally runs with
wait states to control system costs. This means the very fastest portions of the
code should relatively small, tight loops that can live on-board the chip.
You'll frequently find DSP chips that run multiple
processes in parallel, performing several multiply/accumulates at the same time.
TI's latest entry claims a sustained rate of 8 operations per clock, for a
staggering 1.6 billion instruction rate at 200 MHz.
The core of the DSP (and a critical difference
between DSPs and conventional microprocessors) is the Multiply/Accumulate unit
(MAC) which is the brains behind solving a=bc+d repeatedly and quickly. Words
cannot do justice to the MAC. Instead, look at this code snippet for the
ZSP16401 from ZSP Corp:
loop:
lddu r8, r13, 2 ; load x[i],x[i+1]
lddu r6, r14, 2 ; load y[i],y[i+1]
mac2.a r8, r6 1 ; accumulate x[i]*y[i] + x[i+1]* y[i+1]
agn0 loop ; loop
After the first pass through the loop the CPU executes this
operation in a single clock cycle! At 200 Mhz that's 5 nsec per computation.
Prefetchers and cache logic read ahead to keep the DSP fed with the x and y data
arrays.
A big problem with fast DSPs is simply keeping the
beast fed with data. Some specmanship may show mindboggling numbers of MIPs. A
one cycle multiply/accumulate is wonderful! if the system design can keep data
pouring into the unit fast enough. Make sure that these numbers indicate
sustained ratings, not a quick burst or two.
The quest for speed does carry a price: power. When
the Pentium first came out it consumed 14 watts, putting out enough heat to
almost be dangerous, while sucking down battery supplies at alarming rates. DSPs
also have a speed/power tradeoff, though their use in so many portable
applications means most include power-saving modes, and most offer quite
reasonable power consumption. For example, TI's TMS320C40 consumes only about
1 ma per MIP at 3 volts.
As the cost per transistor continues to plummet more
vendors are adding floating point math units to their DSPs. Till recently all
DSPs were integer units, processing typically 16, 24 or 32 bits of integer info
at a time.
Integers, though, are less than useful in many
applications. Most non-floating point applications use "fixed point" math,
where a 16 bit accumulator has an implied binary point. Typically the MSB is a
sign bit; the binary point is positioned to the right of the sign. This means
the entire accumulator represents a fraction that ranges from 0 to
.111111111111111, or around .99. (The first bit to the right of the binary point
represents ', the second represents ', etc.)
Fixed point is handy since multiplies can never cause
an overflow. Additions can, though, so most processors include a mechanism for
detecting this. It's important to realize, though, that fixed point is purely
a programming convention; any ALU that can do two's complement math does fixed
point automatically as the math works out the same. A certain amount of manual
scaling is required, since A/Ds generally present integer data.
The precision represented by this notation is limited to
the 15 bits after the binary point. The dynamic range, unless extra user-written
code extends it, is equally limited. In some applications inputs and product
terms may span vast dynamic ranges. If you select a DSP with on-board floating
point you can generally ignore the range issue as the usual notation gives 24
bits of precision and a range of about plus or minus 2127.
DSP Development
The pages of this magazine show plenty of tools for writing
and debugging DSP code. A few factors, though, make for some unique challenges
with this technology.
The assembly Vs C debate was settled a long time ago
for most microprocessor applications. 8 bit systems are now routinely coded in C
- even for C-hostile chips like the 8051 and PIC. This is much less true for
DSPs.
The quest for raw horsepower means that - at least in
the fastest portions of the firmware - developers almost always write DSP code
in assembly language. Indeed, many applications use assembly from top to bottom,
though this is starting to change.
Speed is not the only problem with high level
languages. C does not have a fixed point standard, a compelling argument in
favor of the more expensive floating point chips.
DSPs naturally lend themselves to a well-thought-out
mix of C and assembly. It'll be a long, long time before they are so fast that
machine cycles can be tossed away for the sake of using a HLL, as we regularly
do to reduce development time with conventional microprocessors.
While DSP compilers have lagged those of CISC, their
debugging tools eerily foreshadowed the direction that microprocessor debuggers
are only now headed. Since the speeds are so great, and since so much of a DSP
application lives in on-chip memory, conventional emulators that monitor the bus
are rare.
Instead, most DSP chips include a JTAG interface
optimized for debugging. A few pins are dedicated to serial in/out streams
driven by a relatively inexpensive control unit. Single stepping, memory and I/O
access, and the like all takes place over this serial interface. A high level
debugger with GUI, that understands the file formats produced by assemblers and
compilers, gives the developer a window into the system. Though this generally
precludes overlay RAM and real time trace, it's a realistic compromise between
debugging needs and physical realities.
Motorola pioneered this with CISC chips with their
BDM interface, now a standard fixture on most of their parts. Other vendors have
found it to be a cost-effective way to get some level of debugging power to
developers.
A tough part of DSP design is selecting algorithms
and coefficients for the particular filters for your application. Lots of
packages, from freebies on the net to CDs included with DSP books to some
commercial packages are available. Filter design is an art in itself, requiring
a fair amount of study to insure the resulting designs are stable and efficient.
The Future
The DSP has made entirely new sorts of technologies
available. Digital cellular wouldn't exist without them, as they do error
correction, equalization, and compression/decompression of speech in real time,
in a noisy, impossible RF environment.
There's a significant shift going on from custom
signal-processing ASICs to commercial DSP chips. Even ignoring all of the
benefits of using an off-the-shelf part, like all computers the DSP is a
programmable device. In a world full of change, ASICs whose logic is cast in
metaphorical concrete, are a significant liability. Some applications - like
ADSL and ISP modems - are on the market today as the communications standards
change at a frightening rate. Downloading new code sure beats replacing a board.
Applications from sonar/radar processing, motor speed
control, image enhancement/compression, speech processing and a thousand other
computationally-intensive means that DSP sales continue to grow at a rate some
50% faster than the overall semiconductor growth rate. Prices have never been
lower while performance continues to skyrocket. Though the big players in the
market are TI, Lucent, Analog Devices and Motorola (accounting for over 90% of
all DSP chips sold) lots of other companies offer intriguing parts balancing
speed, power, and memory organization.
Signal processing is coming of age; though DSPs will
never replace conventional microprocessors, we'll see them used wherever small
but fast chunks of looping code massage massive quantities of data.
|