Convolutions

Convolutions are a really cool way to smooth and even differentiate data. You gotta check this out.

Published in ESP, January 1994.

By Jack Ganssle

In November of 1992 I wrote about taming analog noise, a subject that generated a heart warming flood of letters. A lot of you apparently develop systems that read A/D converters and compute some sort of response based on the input data. In the good old days cheap 8 bit A/Ds were more than adequate for many applications, and had such a crummy resolution that noise wasn't much of an issue. That has certainly changed. I'm astonished at the number of ultra-high resolution converters now cheaply available. Maxim and others sell 16, 18, and even 20 bit A/Ds. Some are surprisingly cheap.

18 bits of conversion drives the weight of each bit into the microvolts. Each bit is worth:

(input voltage range)/(2**number of bits).

An old 8 bit A/D converting over a 0 to +5 range had a resolution of about 20 millivolts (.020 volts). 18 bits over the same range gives each LSB 19 microvolts (.000019 volts)! It's awfully hard to produce analog circuits that have less than 19 microvolts of noise.

As I discussed a year ago, fast repetitive signals or those modulated on a high frequency carrier are fairly easy to filter. An example is your TV set, which receives not much more than a few microvolts at the antenna, buried in many millivolts of noise. This is easy to filter since the signal is modulated on the station's carrier frequency. Narrow bandwidth RF filters notch out virtually everything that is not at almost exactly the carrier's center frequency. The noise is distributed over a huge bandwidth; the filters remove all but a tiny portion of that bandwidth, taking the noise out as well.

Radios are rather a special case. Most signals presented to digital systems are at a much lower frequency. Indeed, most are non-periodic, nearly DC voltages. No carrier exists; all of the "bandwidth" carries important information. Most solutions revolve around smart or silly averaging techniques: you read the sample N times, sum each read, and then divide by N.

It's important that only the latest N readings are included in the average, so very old data can't skew the results. N is chosen large enough to generate quiet data, but must be small compared to the rate data changes at, to avoid smearing the results.

For a running average, as each sample is collected the oldest is dropped and the new one added. A circular buffer is often used to store the readings. Once the buffer is initially filled the system can provide a smoothed output by taking only one reading and computing the average of the buffer's contents.

Any simple averaging technique reduces noise at a rate proportional to the square root of the number of samples in the average. Using 100 samples results in an order of magnitude improvement in noise figure. Since any system will eventually be bound by the computer and A/D speed, this results in a sort of diminishing return. Increasing the number of averages to 1000 results in only a factor of three or so improvement in noise over an average of 100. A tradeoff must be made between system response time and allowable noise level.

If your system acquires "swept" data (like that from an oscilloscope: signal strength on one axis and time or some other parameter on the other), you have the option of averaging both in time (by averaging each matching point of several sweeps together) as well as in the dimension the data is taken over. For a scope this dimension is time - a sample is taken so many times per second. Since the data behaves as a continuous function, there are no sharp discontinuities between data points. Any point i is much like its neighbors, so a short average over the time axis does not distort the result excessively.

Averaging over the baseline axis (say, time) will indeed smear the data somewhat, but is inherently faster than averaging multiple sweeps. Your average is complete almost in real time.

To average in the baseline dimension, imagine the digitized data has been placed in an array D[i], where i runs from 0 (the first point of the current scan) to k (the last point of the current scan). You can easily generate a smoothed version of this data by computing a new array D1[i..k]. Every element i of D1 is the average of D[i-2], D[i-1], D[i], D[i+1], D[i+2].

Of course, this has the effect of smearing the output data. Sharp peaks will be somewhat flattened. Can we define an averaging strategy that will avoid this side-effect?

The Convolution

The system just described generates an output for each point i that is the average of all the points in the vicinity of i. In effect, for each i we make an output by multiplying the input waveform by a unit step function, and summing the results at each point. The unit step is then slid one point to the right and the algorithm repeated. Adjusting the number of points included in each average simply changes the width of the step. Wider steps (i.e., a bigger N) give more noise-free data, but at the price of smearing it. Narrow steps give noisy signals that are more faithful to the input data. A step exactly one point wide gives the unmodified input signal. We call this technique the convolution.

Now, when I went to collage, the professors baffled me with math that left the concept behind convolution a total mystery. Sure, I could sling the exponential integrals the academics loved, but really had no idea the convolution is the simple process of using one function, in a narrow sliding window against another function, to generate a third that is the combination of the two.

In averaging we slide a very simple function along the axis. It's coefficients are 0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0. Each string of zeroes is infinitely long, and the string of ones is N points wide. Sure, we recognize that this smears the results. Why not "convolve" the input data with a function that resembles the signal itself? In other words, pick a set of weights that accentuate the mid-point and include reduced levels of the more distant points. Computationally this involves multiplying each point in the vicinity of i with a weighting factor, and then averaging the results. Points far away from the center play a much less significant role in the result, giving a more faithful output waveform. In other words, use a set of weighting values to fit the result to the anticipated result.

Remember that any continuous function can be approximated with a polynomial? We'll take advantage of this here, and represent the signal's shape over a small interval with a polynomial of some degree. We'll fit a curve through the data to generate a more realistic model of the signal we're digitizing. No one point will be assumed to be correct; the curve defined by a collection of points will yield data that is, on the average, closer to the truth.

The goal of curve fitting techniques is to reduce the "amount of error" in the data. A least squares approach does just this. If you wish to minimize the maximum error you can use the Chebyshev approach, which uses trig relations rather than polynomials

Least squares will find coefficients for a polynomial describing a curve that fits the data in such a fashion that the "sum-square" error is minimized, i.e., the square root of the sum of the squares of the error at each point is a minimum. The error at each point is the difference between what the fitted curve predicts, and what the actual data is.

It's too hard to compute a least squares polynomial in real time. Fortunately, the concept of convolutions can be extended to perform the least squares automatically.

The derivation is too involved to present here. I suggest reference to the Savitsky and Golay (see reference). Suffice to say that a set of integers can be defined that, when convolved with an input signal, gives an output waveform that is a least squares fit to an "ideal" signal over a narrow range.

Picking the convolving functions is a bit of an art form, and depends on the characteristics of your input data. Very complex data will need high-order polynomials to get a decent fit. Less busy data can probably be matched with simple third order equations.

Here are a few sets of integers that define convolving functions that will yield least squares fits to input data. Different convolution intervals are given. A polynomial fit will be made over the number of sample points shown. Thus, the set of 5 integers will try to fit 5 sample points.

	Points	25	21	17	13	9	5
							
	-12	-253					
	-11	-138					
	-10	-33	-171				
	-9	62	-76				
	-8	147	9	-21			
	-7	222	84	-6			
	-6	287	149	7	-11		
	-5	322	204	18	0		
	-4	387	249	27	9	-21	
	-3	422	284	34	16	14	
	-2	447	309	39	21	39	-3
	-1	462	324	42	24	54	12
	0	467	329	43	25	59	17
	1	462	324	42	24	54	12
	2	447	309	39	21	39	-3
	3	422	284	34	16	14	
	4	387	249	27	9	-21	
	5	322	204	18	0		
	6	287	149	7	-11		
	7	222	84	-6			
	8	147	9	-21			
	9	62	-76				
	10	-33	-171				
	11	-138					
	12	-253					
	Norm	5175	3059	323	143	231	35

It's easy to apply these formulas. To use the set of 5 integers, compute:

D(3)= [-3*D(1) + 12*D(2) + 17*D(3) + 12*D(4) + -3*D(5)]/35 D1(4)= [-3*D(2) + 12*D(3) + 17*D(4) + 12*D(5) + -3*D(6)]/35 D1(5)= [-3*D(3) + 12*D(4) + 17*D(5) + 12*D(6) + -3*D(7)]/35

Just as you'd divide by N when doing a simple average of N samples, you must divide the sum by the "norm" as shown above. In this case, the norm is 35.

If the data is very busy (i.e., over only a few sample points it undergoes a lot of maxima and minima), then a small set of integers should be picked (e.g., 9 rather than 21). After all, the method attempts to fit a curve to a short segment of the input data; busy data is all but impossible to fit under any circumstances. Data that changes slowly can use the larger sets of integers, resulting in more smoothing.

Differentiation

Stretch you mind back to college circuit theory. Those who managed to stay awake may remember that the convolution integral has an important property, namely:

f'(t)=g(t)*h'(t) and, f'(t)=g'(t)*h(t)

where * represents the convolution process and the prime marks indicate the derivative.

This means if you need to both smooth and differentiate a signal, you can convolve the signal with the derivative of the convolving function. You never need to explicitly differentiate the input signal, a task that can use far too much computer power to be practical.

Since there are many instances where it is important to compute the derivative of an input signal, it is worthwhile to pursue this a little further. Remember that the convolving process primarily smoothes input data. If a useful smoothing function is picked, the derivative of that function can be convolved with the input signal to both smooth and take the input's derivative. A single convolution can do both.

Why compute a derivative? The first derivative of a function is nothing more than its rate of change, a factor used extensively in some embedded systems, particularly in instruments.

So, if we compute the derivative of the least squares function, we can generate a new set of integers that both smoothes and differentiates the data. Again, refer to the reference for the math details. Theoretically, you can extend this concept to any number of derivatives, though it seems above about the third derivative the integers get unwieldy.

The following set of integers computes the first derivative:

	Points	25	21	17	13	9	5
							
	-12	30866					
	-11	8602					
	-10	-8525	84075				
	-9	-20982	10032				
	-8	-29236	-43284	748			
	-7	-33754	-78176	-98			
	-6	-35003	-96947	-643	1133		
	-5	-33450	-101900	-930	-660		
	-4	-29562	-95338	-1002	-1578	86	
	-3	-23806	-79504	-902	-1796	-142	
	-2	-16649	-56881	-673	-1489	-193	1
	-1	-8558	-29592	-358	-832	-126	-8
	0	0	0	0	0	0	0
	1	8558	29592	358	832	126	8
	2	16649	56881	673	1489	193	-1
	3	23806	79504	902	1796	142	
	4	29562	95338	1002	1578	-86	
	5	33450	101900	930	660		
	6	35003	96947	643	-1133		
	7	33754	78176	98			
	8	29236	43284	-748			
	9	20982	-10032				
	10	8525	-84075				
	11	-8602					
	12	-30866					
						
	Norm	1776060	3634092	23256	24024	1188	12

Apply these in exactly the same manner as previously described.

Conclusion

A lot of us primarily use old-fashioned averaging for noise reduction, often simply summing successive sweeps together. The convolution is a different way of looking at noise reduction, potentially improving response time by performing a smart average over the time dimension. It even lets you differentiate the signal, for essentially no cost in speed or memory. Of course, any sort of averaging does reduce signal fidelity somewhat, but in smoothing, as in life, everything is a compromise.

Those who are interested in pursuing this further should consult the classic paper by Abraham Savitzky and Marcel Golay in the July, 1964 issue of Analytical Chemistry (Volume 36 Number 8).