Convolutions
Convolutions are a really cool way to smooth and even differentiate
data. You gotta check this out.
Published in ESP, January 1994.
 |
For hints, tricks and ideas about better ways to build embedded systems, subscribe to The Embedded Muse, a free biweekly e-newsletter. No hype, just down to earth embedded talk. 23,000 other engineers subscribe. It takes just a few seconds (all we need is your email address, which is shared with absolutely no one) to subscribe to the Embedded Muse. |
In November of 1992 I wrote about taming analog noise, a subject
that generated a heart warming flood of letters. A lot of you
apparently develop systems that read A/D converters and compute
some sort of response based on the input data. In the good old
days cheap 8 bit A/Ds were more than adequate for many applications,
and had such a crummy resolution that noise wasn't much of an
issue. That has certainly changed. I'm astonished at the number
of ultra-high resolution converters now cheaply available. Maxim
and others sell 16, 18, and even 20 bit A/Ds. Some are surprisingly
cheap.
18 bits of conversion drives the weight of each bit into the microvolts.
Each bit is worth:
(input voltage range)/(2**number of bits).
An old 8 bit A/D converting over a 0 to +5 range had a resolution
of about 20 millivolts (.020 volts). 18 bits over the same range
gives each LSB 19 microvolts (.000019 volts)! It's awfully
hard to produce analog circuits that have less than 19 microvolts
of noise.
As I discussed a year ago, fast repetitive signals or those modulated
on a high frequency carrier are fairly easy to filter. An example
is your TV set, which receives not much more than a few microvolts
at the antenna, buried in many millivolts of noise. This is easy
to filter since the signal is modulated on the station's carrier
frequency. Narrow bandwidth RF filters notch out virtually everything
that is not at almost exactly the carrier's center frequency.
The noise is distributed over a huge bandwidth; the filters remove
all but a tiny portion of that bandwidth, taking the noise out
as well.
Radios are rather a special case. Most signals presented to digital
systems are at a much lower frequency. Indeed, most are non-periodic,
nearly DC voltages. No carrier exists; all of the "bandwidth"
carries important information. Most solutions revolve around smart
or silly averaging techniques: you read the sample N times, sum
each read, and then divide by N.
It's important that only the latest N readings are included in
the average, so very old data can't skew the results. N is chosen
large enough to generate quiet data, but must be small compared
to the rate data changes at, to avoid smearing the results.
For a running average, as each sample is collected the oldest
is dropped and the new one added. A circular buffer is often
used to store the readings. Once the buffer is initially filled
the system can provide a smoothed output by taking only one reading
and computing the average of the buffer's contents.
Any simple averaging technique reduces noise at a rate proportional
to the square root of the number of samples in the average. Using
100 samples results in an order of magnitude improvement in noise
figure. Since any system will eventually be bound by the computer
and A/D speed, this results in a sort of diminishing return.
Increasing the number of averages to 1000 results in only a factor
of three or so improvement in noise over an average of 100. A
tradeoff must be made between system response time and allowable
noise level.
If your system acquires "swept" data (like that from
an oscilloscope: signal strength on one axis and time or some
other parameter on the other), you have the option of averaging
both in time (by averaging each matching point of several sweeps
together) as well as in the dimension the data is taken over.
For a scope this dimension is time - a sample is taken so many
times per second. Since the data behaves as a continuous function,
there are no sharp discontinuities between data points. Any point
i is much like its neighbors, so a short average over the time
axis does not distort the result excessively.
Averaging over the baseline axis (say, time) will indeed smear
the data somewhat, but is inherently faster than averaging multiple
sweeps. Your average is complete almost in real time.
To average in the baseline dimension, imagine the digitized data
has been placed in an array D[i], where i runs from 0 (the first
point of the current scan) to k (the last point of the current
scan). You can easily generate a smoothed version of this data
by computing a new array D1[i..k]. Every element i of D1 is the
average of D[i-2], D[i-1], D[i], D[i+1], D[i+2].
Of course, this has the effect of smearing the output data. Sharp
peaks will be somewhat flattened. Can we define an averaging strategy
that will avoid this side-effect?
The Convolution
The system just described generates an output for each point i
that is the average of all the points in the vicinity of i. In
effect, for each i we make an output by multiplying the input
waveform by a unit step function, and summing the results at each
point. The unit step is then slid one point to the right and the
algorithm repeated. Adjusting the number of points included in
each average simply changes the width of the step. Wider steps
(i.e., a bigger N) give more noise-free data, but at the price
of smearing it. Narrow steps give noisy signals that are more
faithful to the input data. A step exactly one point wide gives
the unmodified input signal. We call this technique the convolution.
Now, when I went to collage, the professors baffled me with math
that left the concept behind convolution a total mystery. Sure,
I could sling the exponential integrals the academics loved, but
really had no idea the convolution is the simple process of using
one function, in a narrow sliding window against another function,
to generate a third that is the combination of the two.
In averaging we slide a very simple function along the axis. It's
coefficients are 0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0. Each string
of zeroes is infinitely long, and the string of ones is N points
wide. Sure, we recognize that this smears the results. Why not
"convolve" the input data with a function that resembles
the signal itself? In other words, pick a set of weights that
accentuate the mid-point and include reduced levels of the more
distant points. Computationally this involves multiplying each
point in the vicinity of i with a weighting factor, and then averaging
the results. Points far away from the center play a much less
significant role in the result, giving a more faithful output
waveform. In other words, use a set of weighting values to fit
the result to the anticipated result.
Remember that any continuous function can be approximated with
a polynomial? We'll take advantage of this here, and represent
the signal's shape over a small interval with a polynomial of
some degree. We'll fit a curve through the data to generate a
more realistic model of the signal we're digitizing. No one point
will be assumed to be correct; the curve defined by a collection
of points will yield data that is, on the average, closer to the
truth.
The goal of curve fitting techniques is to reduce the "amount
of error" in the data. A least squares approach does just
this. If you wish to minimize the maximum error you can use the
Chebyshev approach, which uses trig relations rather than polynomials
to approximate the function.
Least squares will find coefficients for a polynomial describing
a curve that fits the data in such a fashion that the "sum-square"
error is minimized, i.e., the square root of the sum of the squares
of the error at each point is a minimum. The error at each point
is the difference between what the fitted curve predicts, and
what the actual data is.
It's too hard to compute a least squares polynomial in real time.
Fortunately, the concept of convolutions can be extended to perform
the least squares automatically.
The derivation is too involved to present here. I suggest reference
to the Savitsky and Golay (see reference). Suffice to say that
a set of integers can be defined that, when convolved with an
input signal, gives an output waveform that is a least squares
fit to an "ideal" signal over a narrow range.
Picking the convolving functions is a bit of an art form, and
depends on the characteristics of your input data. Very complex
data will need high-order polynomials to get a decent fit. Less
busy data can probably be matched with simple third order equations.
Here are a few sets of integers that define convolving functions
that will yield least squares fits to input data. Different convolution
intervals are given. A polynomial fit will be made over the number
of sample points shown. Thus, the set of 5 integers will try to
fit 5 sample points.
Points 25 21 17 13 9 5
-12 -253
-11 -138
-10 -33 -171
-9 62 -76
-8 147 9 -21
-7 222 84 -6
-6 287 149 7 -11
-5 322 204 18 0
-4 387 249 27 9 -21
-3 422 284 34 16 14
-2 447 309 39 21 39 -3
-1 462 324 42 24 54 12
0 467 329 43 25 59 17
1 462 324 42 24 54 12
2 447 309 39 21 39 -3
3 422 284 34 16 14
4 387 249 27 9 -21
5 322 204 18 0
6 287 149 7 -11
7 222 84 -6
8 147 9 -21
9 62 -76
10 -33 -171
11 -138
12 -253
Norm 5175 3059 323 143 231 35
It's easy to apply these formulas. To use the set of 5 integers,
compute:
D(3)= [-3*D(1) + 12*D(2) + 17*D(3) + 12*D(4) + -3*D(5)]/35 D1(4)=
[-3*D(2) + 12*D(3) + 17*D(4) + 12*D(5) + -3*D(6)]/35 D1(5)= [-3*D(3)
+ 12*D(4) + 17*D(5) + 12*D(6) + -3*D(7)]/35
Just as you'd divide by N when doing a simple average of N samples,
you must divide the sum by the "norm" as shown above.
In this case, the norm is 35.
If the data is very busy (i.e., over only a few sample points
it undergoes a lot of maxima and minima), then a small set of
integers should be picked (e.g., 9 rather than 21). After all,
the method attempts to fit a curve to a short segment of the input
data; busy data is all but impossible to fit under any circumstances.
Data that changes slowly can use the larger sets of integers,
resulting in more smoothing.
Differentiation
Stretch you mind back to college circuit theory. Those who managed
to stay awake may remember that the convolution integral has an
important property, namely:
f'(t)=g(t)*h'(t) and, f'(t)=g'(t)*h(t)
where * represents the convolution process and the prime marks
indicate the derivative.
This means if you need to both smooth and differentiate a signal,
you can convolve the signal with the derivative of the convolving
function. You never need to explicitly differentiate the input
signal, a task that can use far too much computer power to be
practical.
Since there are many instances where it is important to compute
the derivative of an input signal, it is worthwhile to pursue
this a little further. Remember that the convolving process primarily
smoothes input data. If a useful smoothing function is picked,
the derivative of that function can be convolved with the input
signal to both smooth and take the input's derivative. A single
convolution can do both.
Why compute a derivative? The first derivative of a function is
nothing more than its rate of change, a factor used extensively
in some embedded systems, particularly in instruments.
So, if we compute the derivative of the least squares function,
we can generate a new set of integers that both smoothes and differentiates
the data. Again, refer to the reference for the math details.
Theoretically, you can extend this concept to any number of derivatives,
though it seems above about the third derivative the integers
get unwieldy.
The following set of integers computes the first derivative:
Points 25 21 17 13 9 5
-12 30866
-11 8602
-10 -8525 84075
-9 -20982 10032
-8 -29236 -43284 748
-7 -33754 -78176 -98
-6 -35003 -96947 -643 1133
-5 -33450 -101900 -930 -660
-4 -29562 -95338 -1002 -1578 86
-3 -23806 -79504 -902 -1796 -142
-2 -16649 -56881 -673 -1489 -193 1
-1 -8558 -29592 -358 -832 -126 -8
0 0 0 0 0 0 0
1 8558 29592 358 832 126 8
2 16649 56881 673 1489 193 -1
3 23806 79504 902 1796 142
4 29562 95338 1002 1578 -86
5 33450 101900 930 660
6 35003 96947 643 -1133
7 33754 78176 98
8 29236 43284 -748
9 20982 -10032
10 8525 -84075
11 -8602
12 -30866
Norm 1776060 3634092 23256 24024 1188 12
Apply these in exactly the same manner as previously described.
Conclusion
A lot of us primarily use old-fashioned averaging for noise reduction,
often simply summing successive sweeps together. The convolution
is a different way of looking at noise reduction, potentially
improving response time by performing a smart average over the
time dimension. It even lets you differentiate the signal, for
essentially no cost in speed or memory. Of course, any sort of
averaging does reduce signal fidelity somewhat, but in smoothing,
as in life, everything is a compromise.
Those who are interested in pursuing this further should consult
the classic paper by Abraham Savitzky and Marcel Golay in the
July, 1964 issue of Analytical Chemistry (Volume 36 Number 8).
|