Digital filtering can improve audio signals in many ways. For instance, Wiener filtering can be used to separate frequencies that are mainly signal from frequencies that are mainly noise (see Chapter 17). Likewise, deconvolution can compensate for an undesired convolution, such as in the restoration of old recordings (also discussed in Chapter 17). These types of linear techniques are the backbone of DSP. Several nonlinear techniques are also useful for audio processing. Two will be briefly described here.
The first nonlinear technique is used for reducing wideband noise in speech signals. This type of noise includes: magnetic tape hiss, electronic noise in analog circuits, wind blowing by microphones, cheering crowds, etc. Linear filtering is of little use, because the frequencies in the noise completely overlap the frequencies in the voice signal, both covering the range from 200 hertz to 3.2 kHz. How can two signals be separated when they overlap in both the time domain and the frequency domain?
Here's how it is done. In a short segment of speech, the amplitudes of the frequency components are greatly unequal. As an example, Fig. 22-10a illustrates the frequency spectrum of a 16 millisecond segment of speech (i.e., 128 samples at an 8 kHz sampling rate). Most of the signal is contained in a few large amplitude frequencies. In contrast, (b) illustrates the spectrum when only random noise is present; it is very irregular, but more uniformly distributed at a low amplitude.
Now the key concept: if both signal and noise are present, the two can be partially separated by looking at the amplitude of each frequency. If the amplitude is large, it is probably mostly signal, and should therefore be retained. If the amplitude is small, it can be attributed to mostly noise, and should therefore be discarded, i.e., set to zero. Mid-size frequency components are adjusted in some smooth manner between the two extremes.
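The thresholding rule above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name `denoise_segment`, the two thresholds `low` and `high`, and the linear ramp used for mid-size components are hypothetical choices, since the text says only that these components are adjusted in some smooth manner.

```python
import numpy as np

def denoise_segment(segment, low, high):
    """Attenuate small-magnitude frequency components of one segment.

    Components with magnitude <= low are set to zero, components with
    magnitude >= high are kept unchanged, and those in between are scaled
    by a gain that rises linearly from 0 to 1 (an assumed smoothing rule).
    """
    spectrum = np.fft.rfft(segment)
    magnitude = np.abs(spectrum)
    gain = np.clip((magnitude - low) / (high - low), 0.0, 1.0)
    return np.fft.irfft(gain * spectrum, n=len(segment))

# Example: 128 samples at 8 kHz (16 ms), one strong tone plus weak noise.
rng = np.random.default_rng(0)
n = np.arange(128)
tone = np.sin(2 * np.pi * 1000 * n / 8000)      # falls exactly on FFT bin 16
noisy = tone + 0.1 * rng.standard_normal(128)
cleaned = denoise_segment(noisy, low=5.0, high=20.0)
```

In practice the thresholds would have to be chosen from an estimate of the noise level; the values used here are arbitrary.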
Another way to view this technique is as a time-varying Wiener filter. As you recall, the frequency response of the Wiener filter passes frequencies that are mostly signal, and rejects frequencies that are mostly noise. This requires a knowledge of the signal and noise spectra beforehand, so that the filter's frequency response can be determined. This nonlinear technique uses the same idea, except that the Wiener filter's frequency response is recalculated for each segment, based on the spectrum of that segment. In other words, the filter's frequency response changes from segment to segment, as determined by the characteristics of the signal itself.
One of the difficulties in implementing this and other nonlinear techniques is that the overlap-add method for filtering long signals is not valid. Since the frequency response changes, the time domain waveform of each segment will no longer align with the neighboring segments. This can be overcome by remembering that audio information is encoded in frequency patterns that change over time, and not in the shape of the time domain waveform. A typical approach is to divide the original time domain signal into overlapping segments. After processing, a smooth window is applied to each of the overlapping segments before they are recombined. This provides a smooth transition of the frequency spectrum from one segment to the next.
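The segment handling just described might look like the following sketch. The 50% overlap, the Hann window, and the names `process_overlapping` and `process` are assumptions made for illustration; the text does not specify a particular overlap or window.

```python
import numpy as np

def process_overlapping(signal, segment_len, process):
    """Process 50%-overlapping segments, window each result, and sum them.

    `process` is any function that maps one segment to a processed segment
    of the same length (for example, a spectral noise reduction step).
    """
    hop = segment_len // 2
    # Periodic Hann window: two copies shifted by half a segment sum to 1,
    # so unprocessed interior samples pass through unchanged.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(segment_len) / segment_len)
    output = np.zeros(len(signal))
    for start in range(0, len(signal) - segment_len + 1, hop):
        segment = signal[start:start + segment_len]
        output[start:start + segment_len] += window * process(segment)
    return output
```

The first and last half segments are covered by only one window and are therefore tapered; a fuller implementation would pad the signal at both ends.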
The second nonlinear technique is called homomorphic signal processing. This term literally means: the same structure. Addition is not the only way that noise and interference can be combined with a signal of interest; multiplication and convolution are also common means of mixing signals together. If signals are combined in a nonlinear way (i.e., anything other than addition), they cannot be separated by linear filtering. Homomorphic techniques attempt to separate signals combined in a nonlinear way by making the problem become linear. That is, the problem is converted to the same structure as a linear system.
For example, consider an audio signal transmitted via an AM radio wave. As atmospheric conditions change, the received amplitude of the signal increases and decreases, resulting in the loudness of the received audio signal slowly changing over time. This can be modeled as the audio signal, represented by a[ ], being multiplied by a slowly varying signal, g[ ], that represents the changing gain. This problem is usually handled in an electronic circuit called an automatic gain control (AGC), but it can also be corrected with nonlinear DSP.
As shown in Fig. 22-11, the input signal, a[ ] × g[ ], is passed through the logarithm function. From the identity, log(xy) = log x + log y, this results in two signals that are combined by addition, i.e., log a[ ] + log g[ ]. In other words, the logarithm is the homomorphic transform that turns the nonlinear problem of multiplication into the linear problem of addition.
Next, the added signals are separated by a conventional linear filter, that is, some frequencies are passed, while others are rejected. For the AGC, the gain signal, g[ ], will be composed of very low frequencies, far below the 200 hertz to 3.2 kHz band of the voice signal. The logarithm of these signals will have more complicated spectra, but the idea is the same: a high-pass filter is used to eliminate the varying gain component from the signal.
In effect, log a[ ] + log g[ ] is converted into log a[ ]. In the last step, the logarithm is undone by using the exponential function (the anti-logarithm, or e^x), producing the desired output signal, a[ ].
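The three steps (logarithm, high-pass filter, exponential) can be sketched as follows. Several things here are assumptions rather than part of the method as described: the name `homomorphic_agc`, the use of a centered moving average subtracted from its input as the high-pass filter, and the small constant `eps` that guards against taking the log of zero. Because audio samples take both signs, the sketch takes the logarithm of the absolute value and restores the sign afterwards.

```python
import numpy as np

def homomorphic_agc(x, smooth_len, eps=1e-12):
    """Remove a slowly varying multiplicative gain from x.

    smooth_len should be odd so that the moving average is centered
    (zero phase) and long compared with the audio period.
    """
    sign = np.sign(x)
    log_mag = np.log(np.abs(x) + eps)             # multiplication -> addition
    kernel = np.ones(smooth_len) / smooth_len     # symmetric, so zero phase
    slow = np.convolve(log_mag, kernel, mode="same")  # low-frequency part
    high_passed = log_mag - slow                  # high-pass = input - low-pass
    return sign * np.exp(high_passed)             # undo the logarithm
```

Note that this high-pass filter also removes the average value of log a[ ], so the output is a[ ] scaled by a constant rather than a[ ] itself. Samples within about `smooth_len` of either end are unreliable because of the zero padding in the convolution.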
Figure 22-12 shows a homomorphic system for separating signals that have been convolved. An application where this has proven useful is in removing echoes from audio signals. That is, the audio signal is convolved with an impulse response consisting of a delta function plus a shifted and scaled delta function. The homomorphic transform for convolution is composed of two stages: the Fourier transform, changing the convolution into a multiplication, followed by the logarithm, turning the multiplication into an addition. As before, the signals are then separated by linear filtering, and the homomorphic transform undone.
An interesting twist in Fig. 22-12 is that the linear filtering is dealing with frequency domain signals in the same way that time domain signals are usually processed. In other words, the time and frequency domains have been swapped from their normal use. For example, if FFT convolution were used to carry out the linear filtering stage, the "spectra" being multiplied would be in the time domain. This role reversal has given birth to a strange jargon. For instance, cepstrum (a rearrangement of spectrum) is the Fourier transform of the logarithm of the Fourier transform. Likewise, there are long-pass and short-pass filters, rather than low-pass and high-pass filters. Some authors even use Quefrency Alanysis and liftering.
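To make the echo example concrete, the sketch below computes a cepstrum for a signal with one added echo. It is a simplified illustration, not the full system of Fig. 22-12: it uses only the log magnitude (the so-called real cepstrum), and it uses the inverse FFT for the second transform, which for this real, symmetric input differs from the forward transform only by a scale factor. The test signal, the delay, and the echo strength are hypothetical values.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Inverse FFT of the log magnitude of the FFT of x."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + eps)
    return np.fft.ifft(log_mag).real

rng = np.random.default_rng(0)
source = rng.standard_normal(4096)       # stand-in for an audio signal
delay, strength = 100, 0.5               # hypothetical echo parameters
echoed = source.copy()
echoed[delay:] += strength * source[:-delay]

cepstrum = real_cepstrum(echoed)
# Search only the first half; the second half mirrors it for a real input.
estimated_delay = 1 + np.argmax(cepstrum[1:len(cepstrum) // 2])
```

The echo contributes an additive, periodic ripple to the log spectrum, which appears in the cepstrum as an isolated peak at the echo delay. That isolation is what allows a linear filter in this domain to suppress it.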
Keep in mind that these are simplified descriptions of sophisticated DSP algorithms; homomorphic processing is filled with subtle details. For example, the logarithm must be able to handle both negative and positive values in the input signal, since this is a characteristic of audio signals. This requires the use of the complex logarithm, a more advanced concept than the logarithm used in everyday science and engineering. When the linear filtering is restricted to be a zero phase filter, the complex log is found by taking the simple logarithm of the absolute value of the signal. After passing through the zero phase filter, the sign of the original signal is reapplied to the filtered signal.
Another problem is aliasing that occurs when the logarithm is taken. For example, imagine digitizing a continuous sine wave. In accordance with the sampling theorem, slightly more than two samples per cycle are sufficient. Now consider digitizing the logarithm of this continuous sine wave. The sharp corners require many more samples per cycle to capture the waveform, i.e., to prevent aliasing. The required sampling rate can easily be 100 times as great after the log as before. Further, it doesn't matter if the logarithm is applied to the continuous signal, or to its digital representation; the result is the same. Aliasing will result unless the sampling rate is high enough to capture the sharp corners produced by the nonlinearity. The result is that audio signals may need to be sampled at 100 kHz or more, instead of only the standard 8 kHz.
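One way to see why the sharp corners are so demanding is the standard Fourier series for the logarithm of a sine wave. The expression below assumes a unit-amplitude sine wave and the logarithm of its absolute value, as in the zero phase simplification described above:

```latex
\ln\lvert \sin\theta \rvert \;=\; -\ln 2 \;-\; \sum_{k=1}^{\infty} \frac{\cos(2k\theta)}{k},
\qquad \theta \neq m\pi
```

The term at 2k times the original frequency has amplitude 1/k, so the harmonics die away very slowly: the 50th term, at 100 times the original frequency, is still 2% as large as the first.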
Even if these details are handled, there is no guarantee that the linearized signals can be separated by the linear filter. This is because the spectra of the linearized signals can overlap, even if the spectra of the original signals do not. For instance, imagine adding two sine waves, one at 1 kHz, and one at 2 kHz. Since these signals do not overlap in the frequency domain, they can be completely separated by linear filtering. Now imagine that these two sine waves are multiplied. Using homomorphic processing, the log is taken of the combined signal, resulting in the log of one sine wave plus the log of the other sine wave. The problem is, the logarithm of a sine wave contains many harmonics. Since the harmonics from the two signals overlap, their complete separation is not possible.
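The overlap of harmonics in this example can be checked numerically. The sketch below assumes a sampling rate of 256 kHz (chosen so that FFT bins fall 1 kHz apart), takes the logarithm of the absolute value of each sine wave, and uses an arbitrary cutoff of 0.05 to decide which harmonics count as significant; all three are illustrative assumptions.

```python
import numpy as np

fs = 256_000                      # assumed high sampling rate, in hertz
n = np.arange(256)                # 1 ms of data, so FFT bins are 1 kHz apart
t = (n + 0.5) / fs                # half-sample offset avoids log(0)

def harmonic_amplitudes(freq_hz):
    """Amplitude in each 1 kHz bin of log|sin(2*pi*f*t)|."""
    log_wave = np.log(np.abs(np.sin(2 * np.pi * freq_hz * t)))
    return 2 * np.abs(np.fft.rfft(log_wave)) / len(t)

amp_1k = harmonic_amplitudes(1000)
amp_2k = harmonic_amplitudes(2000)

threshold = 0.05                  # arbitrary cutoff for "significant"
shared_khz = [k for k in range(1, 33)
              if amp_1k[k] > threshold and amp_2k[k] > threshold]
print(shared_khz)
```

The frequencies listed in `shared_khz` should include 4 kHz and 8 kHz: the 1 kHz wave produces harmonics at every multiple of 2 kHz and the 2 kHz wave at every multiple of 4 kHz, so no linear filter can assign those shared components to one signal or the other.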
In spite of these obstacles, homomorphic processing teaches an important lesson: signals should be processed in a manner consistent with how they are formed. Put another way, the first step in any DSP task is to understand how information is represented in the signals being processed.