Speech Enhancement with Geometric Advent of Spectral Subtraction using Connected Time-Frequency Regions Noise Estimation

Speech enhancement with the geometric approach to spectral subtraction, using connected time-frequency regions noise estimation, aims to reduce background noise in noisy speech for better quality, pleasantness and intelligibility. Numerous enhancement methods have been proposed, including spectral subtraction, subspace and statistical methods, each used with different noise estimators. Traditional spectral subtraction techniques are reasonably simple to implement but suffer from musical noise. This study presents an approach to speech enhancement that reduces the shortcomings of traditional spectral subtraction algorithms using MCRA. The approach and its noise estimation are evaluated with PESQ (the ITU-T standard), frequency-weighted segmental SNR and weighted spectral slope. The analysis shows that the geometric approach with connected time-frequency regions gives better results than conventional spectral subtraction algorithms. Listening tests suggest that the new approach produces less audible musical noise.


INTRODUCTION
The fundamental objective of speech enhancement is to remove or reduce background noise. Background noise removal has a number of applications, such as telephone use in noisy environments including streets and public places; all of these applications require noise reduction for comfortable listening and improved quality. In this study, spectral subtraction with the geometric approach (Yang and Philipos, 2008) is combined with the time-frequency connected regions noise estimation algorithm (Karsten and Søren, 2005). Our aim is to test the approach with different noise estimation algorithms (Martin, 2001) and to compare the results with other existing methods in order to select an appropriate estimation algorithm.

The spectral subtraction technique (Loizou, 2007; Boll, 1979) works on a very simple principle and assumes additive noise. An estimate of the noise spectrum is subtracted from the noisy speech spectrum to obtain an estimate of the clean speech spectrum. The noise spectrum is estimated during periods in which the speech signal is absent. Passing the estimated signal spectrum through the inverse discrete Fourier transform, using the phase of the noisy signal, gives the enhanced speech. The subtraction has to be performed carefully to avoid distorting the signal: with over-subtraction a major portion of the speech is also removed, and with under-subtraction a small portion of the noise still interferes with the signal. Fig. 1 shows the block diagram of the spectral subtraction algorithm.

Many algorithms have been developed with different solutions. Some suggest over-subtraction (Berouti et al., 1979), some divide the speech spectrum into contiguous frequency bins and apply non-linear methods in each bin (Kamath and Loizou, 2002), and some use psychoacoustical methods (Virag, 1999). Spectral subtraction is easy to implement and effective at reducing background noise, but the approach still has major weaknesses. One of these shortcomings is the introduction of musical noise (Berouti et al., 1979). The estimated spectrum may contain negative values caused by inaccurate noise estimation. One remedy is a non-linear operation that sets all negative values to zero in order to guarantee a non-negative magnitude spectrum. Doing so, however, generates small, isolated random peaks in the spectrum. In the time domain these peaks sound like tones that change continuously from frame to frame. These newly generated tones are called musical noise.

The spectral subtraction equations are derived under the assumption that the cross terms are zero because speech and the interfering noise are uncorrelated. This assumption is justified in that speech and noise are statistically independent of each other, so there is no correlation between them. The resulting equations are therefore approximations rather than exact relations. In the geometric approach with time-frequency connected region noise estimation, by contrast, the equations used for estimation remain non-negative and, as a result, the gain function is always positive.
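The processing chain described above (transform, subtract, rectify, resynthesise with the noisy phase) can be sketched for a single frame as follows. This is a minimal illustration in Python with NumPy, not the implementation evaluated in this study. The function name `spectral_subtract_frame`, the synthetic test tone and the assumption that the expected noise power per bin is known in advance are all hypothetical choices made for the example.

```python
import numpy as np

def spectral_subtract_frame(noisy_frame, noise_power, floor=0.0):
    """Power spectral subtraction on one frame (illustrative sketch)."""
    Y = np.fft.rfft(noisy_frame)
    # Subtract the noise power estimate from the noisy power spectrum.
    clean_power = np.abs(Y) ** 2 - noise_power
    # Half-wave rectification: negative estimates are clipped to the floor.
    # The isolated peaks that survive this step are the source of musical noise.
    clean_power = np.maximum(clean_power, floor)
    # Recombine the estimated magnitude with the phase of the noisy signal.
    S_hat = np.sqrt(clean_power) * np.exp(1j * np.angle(Y))
    return np.fft.irfft(S_hat, n=len(noisy_frame))

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
sigma = 0.3                                  # assumed noise standard deviation
clean = np.sin(2 * np.pi * 8 * t / n)        # tone centred on DFT bin 8
noisy = clean + sigma * rng.standard_normal(n)

# Assumption: white noise, so the expected noise power in every bin is n * sigma^2.
noise_power = np.full(n // 2 + 1, n * sigma ** 2)
enhanced = spectral_subtract_frame(noisy, noise_power)

mse_before = np.mean((noisy - clean) ** 2)
mse_after = np.mean((enhanced - clean) ** 2)
```

In this toy setting `mse_after` is smaller than `mse_before`, but the bins that survive rectification still carry random residual peaks, which is the musical-noise problem discussed in the text.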

SPECTRAL SUBTRACTION MATHEMATICAL ANALYSIS
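Let s(n) be the clean speech, e(n) the error signal (noise) and y(n) the noisy signal, which contains both the clean and the error signal:

$$y(n) = s(n) + e(n) \tag{1}$$

Taking the STFT of y(n) gives the frequency-domain equation:

$$Y(j\omega_n) = S(j\omega_n) + E(j\omega_n)$$

where $\omega_n = 2\pi n/N$ for $n = 0, 1, 2, \ldots, N-1$ and N is the frame length. To compute the short-term power spectrum of the noisy speech, $Y(j\omega_n)$ is multiplied by its conjugate $Y^*(j\omega_n)$:

$$|Y(j\omega_n)|^2 = |S(j\omega_n)|^2 + |E(j\omega_n)|^2 + S(j\omega_n)\,E^*(j\omega_n) + S^*(j\omega_n)\,E(j\omega_n) \tag{4}$$

The terms $|E(j\omega_n)|^2$, $S(j\omega_n)E^*(j\omega_n)$ and $S^*(j\omega_n)E(j\omega_n)$ are approximated with the expectation operator E{.} as $E\{|E(j\omega_n)|^2\}$, $E\{S(j\omega_n)E^*(j\omega_n)\}$ and $E\{S^*(j\omega_n)E(j\omega_n)\}$. If e(n) is zero-mean and uncorrelated with the clean signal s(n), the two cross terms reduce to zero and the estimate of the clean speech power spectrum becomes:

$$|\hat{S}(j\omega_n)|^2 = |Y(j\omega_n)|^2 - |\hat{E}(j\omega_n)|^2 \tag{5}$$

where $|\hat{E}(j\omega_n)|^2$ is the estimated noise power spectrum. The gain or suppression function can be calculated from Eq. (4) as:

$$H(j\omega_n) = \sqrt{1 - \frac{|\hat{E}(j\omega_n)|^2}{|Y(j\omega_n)|^2}}$$

Equation (5) becomes:

$$|\hat{S}(j\omega_n)|^2 = H^2(j\omega_n)\,|Y(j\omega_n)|^2$$

When the cross terms in Eq. (4) are neglected, $H(j\omega_n)$ is always positive, with range $0 \le H(j\omega_n) \le 1$. The cross terms can be computed from Eq. (4) as:

$$S(j\omega_n)E^*(j\omega_n) + S^*(j\omega_n)E(j\omega_n) = |Y(j\omega_n)|^2 - \left(|S(j\omega_n)|^2 + |E(j\omega_n)|^2\right)$$

The term $|S(j\omega_n)|^2 + |E(j\omega_n)|^2$ is replaced with $|Y'(j\omega_n)|^2$, and the equation becomes:

$$S(j\omega_n)E^*(j\omega_n) + S^*(j\omega_n)E(j\omega_n) = |Y(j\omega_n)|^2 - |Y'(j\omega_n)|^2$$

When the cross terms are neglected, the resulting relative error is:

$$\varepsilon(j\omega_n) = \frac{\left|\,|Y(j\omega_n)|^2 - |Y'(j\omega_n)|^2\,\right|}{|Y(j\omega_n)|^2}$$

The cross-term error shows that the subtraction rule is not exact for an individual frame: the neglected terms remain as an estimation error, which results in random tones.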
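The effect of neglecting the cross terms can be checked numerically. The sketch below is an illustration only: the helper names `subtraction_gain` and `cross_term_error` are hypothetical, the error is taken to be the relative difference between the exact power $|S+E|^2$ and the approximation $|S|^2+|E|^2$, and the three sample bins are chosen by hand.

```python
import numpy as np

def subtraction_gain(noisy_power, noise_power):
    """Gain of power spectral subtraction, clipped to the range [0, 1]."""
    ratio = 1.0 - noise_power / noisy_power
    return np.sqrt(np.clip(ratio, 0.0, 1.0))

def cross_term_error(S, E):
    """Relative error made when S E* + S* E is dropped from |S + E|^2."""
    exact = np.abs(S + E) ** 2
    approx = np.abs(S) ** 2 + np.abs(E) ** 2
    return np.abs(exact - approx) / exact

# Three bins with the same clean component and different noise components.
S = np.array([1.0 + 0.0j, 1.0 + 0.0j, 1.0 + 0.0j])
E = np.array([0.0 + 1.0j,    # noise orthogonal to speech: cross terms vanish
              1.0 + 0.0j,    # noise in phase with speech
              -0.5 + 0.0j])  # noise in anti-phase with speech
err = cross_term_error(S, E)   # [0.0, 0.5, 4.0]

gain = subtraction_gain(np.array([4.0, 1.0]), np.array([1.0, 2.0]))
```

The error is zero only when the noise is orthogonal to the speech in the complex plane, and it becomes large when the two partly cancel. The second gain value is clipped to zero because the noise estimate exceeds the noisy power.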

GEOMETRIC SPECTRAL SUBTRACTION
The noisy spectrum $Y(j\omega_n)$ at frequency $\omega_n$ is the sum of two complex-valued spectra, $S(j\omega_n)$ and $E(j\omega_n)$. These spectra can therefore be represented as vectors in the complex plane, as sketched in Fig. 2.
In traditional spectral subtraction the cross terms are assumed to be zero when the gain function is computed. Here a new gain function is computed without that assumption by transforming the additive model into polar form. The new gain function $H_G$ can be calculated from the triangle in Fig. 3. This gain function is always positive, that is, $H_G \ge 0$. The block diagram of the geometric approach to spectral subtraction for enhanced speech is sketched in Fig. 4.
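The geometric argument can be illustrated numerically. The sketch below assumes the law-of-sines relation for the triangle formed by Y = S + E, namely $a_Y |\sin(\varphi_E - \varphi_Y)| = a_S |\sin(\varphi_E - \varphi_S)|$, where $a$ and $\varphi$ denote magnitudes and phase angles. The function name `geometric_gain` is hypothetical, and the true clean and noise phases are used directly, which is possible only in a simulation; in practice they are unknown and the corresponding quantities have to be estimated.

```python
import numpy as np

def geometric_gain(phi_Y, phi_S, phi_E):
    """Gain a_S / a_Y obtained from the triangle Y = S + E via the law of sines."""
    return np.abs(np.sin(phi_E - phi_Y)) / np.abs(np.sin(phi_E - phi_S))

rng = np.random.default_rng(1)
S = rng.standard_normal(200) + 1j * rng.standard_normal(200)   # clean spectrum
E = rng.standard_normal(200) + 1j * rng.standard_normal(200)   # noise spectrum
Y = S + E                                                      # noisy spectrum

H_G = geometric_gain(np.angle(Y), np.angle(S), np.angle(E))
S_mag_hat = H_G * np.abs(Y)   # recovered clean magnitude
```

With exact phases the gain reproduces $|S|/|Y|$ without any assumption about the cross terms, and it is non-negative by construction. Unlike the conventional gain it is not limited to 1, since the clean magnitude can exceed the noisy magnitude when speech and noise partly cancel.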

BACKGROUND NOISE ESTIMATION
The key objective of speech enhancement is to eliminate or reduce the background noise, which requires an estimate of that noise. All speech enhancement algorithms normally use estimation methods for this purpose. If the background noise changes slowly relative to the speech, it is easy to estimate during speech pauses, but if the noise changes rapidly, estimation becomes more difficult. Two estimation algorithms are discussed here: MCRA (Bernard et al., 2005) and time-frequency connected regions.

MCRA: MCRA was introduced to estimate nonstationary background noise. The noise estimate in this algorithm is updated by averaging past spectral values of the noisy spectrum, with the averaging controlled by time- and frequency-dependent smoothing factors. The smoothing factors are computed from the signal presence probability in each frequency band, and the probability is computed from the ratio of the noisy speech spectrum to its minimum over a fixed time window. The estimation of the noise spectrum from the signal presence probability is based on the following hypotheses:

$$S_A:\; Y(j,k) = E(j,k)$$
$$S_P:\; Y(j,k) = S(j,k) + E(j,k) \tag{20}$$

where j is the frame number, k is the frequency bin number, and $S_A$ and $S_P$ denote the hypotheses of speech absence and speech presence respectively. The algorithm estimates the noise by recursive averaging:

$$S_A:\; \Psi_n(j+1,k) = \beta_n \Psi_n(j,k) + (1-\beta_n)\,|Y(j,k)|^2$$
$$S_P:\; \Psi_n(j+1,k) = \Psi_n(j,k) \tag{21}$$

where $\beta_n$ is the smoothing factor with range $0 \le \beta_n \le 1$ and $\Psi_n(j,k) = E\{|E(j,k)|^2\}$ is the power spectrum of the noise, computed with the expectation operator E{.}. The speech presence probability can be computed from the following equations:

$$P(j+1,k) = \beta_P\,P(j,k) + (1-\beta_P)\,I(j,k)$$

$$I(j,k) = \begin{cases} 1 & \text{when } S(j,k)/S_{min}(j,k) > \zeta \\ 0 & \text{when } S(j,k)/S_{min}(j,k) \le \zeta \end{cases}$$

where $\zeta$ is the threshold for speech presence and $S(j,k)/S_{min}(j,k)$ is the ratio of the noisy spectrum to its local minimum.
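One update step of an MCRA-style estimator can be sketched as follows. This is a simplified illustration rather than the exact estimator used in the experiments: the function name `mcra_step` is hypothetical, the default values of `beta_n`, `beta_p` and `zeta` are illustrative assumptions, and the two hypotheses are combined softly through a probability-dependent smoothing factor.

```python
import numpy as np

def mcra_step(noise_psd, presence_prob, noisy_power, smoothed_power, min_power,
              beta_n=0.95, beta_p=0.2, zeta=5.0):
    """One frame of a simplified MCRA-style noise update (illustrative)."""
    # Ratio of the smoothed noisy spectrum to its tracked local minimum.
    ratio = smoothed_power / np.maximum(min_power, 1e-12)
    # Hard decision: speech is declared present where the ratio exceeds zeta.
    indicator = (ratio > zeta).astype(float)
    # Recursive smoothing of the speech presence probability.
    presence_prob = beta_p * presence_prob + (1.0 - beta_p) * indicator
    # Smoothing factor approaches 1 (no update) as the probability approaches 1.
    beta_tilde = beta_n + (1.0 - beta_n) * presence_prob
    noise_psd = beta_tilde * noise_psd + (1.0 - beta_tilde) * noisy_power
    return noise_psd, presence_prob

# Two bins: the first contains noise only, the second a strong speech component.
noise_psd = np.array([1.0, 1.0])
presence_prob = np.array([0.0, 0.0])
noisy_power = np.array([2.0, 100.0])
min_power = np.array([1.0, 1.0])

noise_psd, presence_prob = mcra_step(
    noise_psd, presence_prob, noisy_power, noisy_power, min_power)
# presence_prob -> [0.0, 0.8], noise_psd -> [1.05, 1.99]
```

In the noise-only bin the estimate moves toward the observed power at the full rate set by `beta_n`; in the speech bin the update is strongly damped, so the speech energy leaks into the noise estimate only slightly.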

CONNECTED TIME-FREQUENCY REGIONS NOISE ESTIMATION
The block diagram of the connected time-frequency region noise estimation algorithm is shown in Fig. 5. After the speech is windowed, the STFT is applied to compute the periodogram of the noisy speech, that is, $P_Y(j,k) = |Y(j,k)|^2$. The periodograms are then smoothed. The smoothed periodograms are minimum-tracked over time and used for speech presence detection. This detection is used to obtain low-bias noise PSD estimates $P'_E(j,k)$ and noise periodogram estimates $P_E(j,k)$; the latter are equal to $P_Y(j,k)$ when speech is absent. When speech is present, the noise periodogram estimate is set equal to the noise PSD estimate. In the latter case, a recursively smoothed bias compensation factor is applied to the minimum-tracked values. The bias compensation factor is updated in frames where speech is absent and remains unchanged while speech is present. The noise magnitude estimate $|E(j,k)|$ is computed from the noise PSD estimate; on the basis of this information the speech presence decision is made and used in the speech enhancement algorithm.

The noisy speech periodograms $P_Y(j,k)$ are spectrally smoothed: each band of the smoothed periodogram $\bar{P}_Y(j,k)$ is a weighted sum of 2N+1 neighbouring bands of $P_Y(j,k)$. The spectral smoothing equation is:

$$\bar{P}_Y(j,k) = \sum_{i=-N}^{N} b(i)\,P_Y\big(j, ((k-i))_K\big)$$

where $((k-i))_K$ denotes $k-i$ modulo K and K is the full spectrum length. The window function b(i) used for spectral weighting sums to 1, that is, $\sum_{i=-N}^{N} b(i) = 1$. The spectrally smoothed periodograms are then smoothed recursively in time with a time- and frequency-varying smoothing factor $\xi(j,k)$ to create the temporally and spectrally smoothed periodogram $P(j,k)$:

$$P(j,k) = \xi(j,k)\,P(j-1,k) + \big(1-\xi(j,k)\big)\,\bar{P}_Y(j,k)$$

The temporal minimum values $P_{min}(j,k)$ are computed from $P(j,k)$ by tracking the minimum within a search window of length $W_{min}$:

$$P_{min}(j,k) = \min\{P(\psi,k) \mid j - W_{min} < \psi \le j\}$$

The $P_{min}(j,k)$ tracks are used for speech presence detection. Because speech and noise are additive, the presence of speech increases the power of the temporally smoothed spectrum in the corresponding time-frequency regions. As a result, the ratio of the temporally smoothed spectrum to the noise PSD estimate is a robust estimate of the signal-plus-noise-to-noise ratio in specific time-frequency regions. The smoothing ensures that speech presence is detected even when the noisy speech power is unstable. As a result, connected regions of speech presence and speech absence can be obtained.
Two different noise estimates are computed here: the noise PSD estimate and the noise spectrum estimate. The PSD estimate is used in the speech enhancement algorithm, while the noise spectrum estimate characterises the residual noise left by the speech enhancement algorithm. The speech enhancement algorithm used with this noise estimation is spectral subtraction with the geometric approach.
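The front end of this estimator (spectral smoothing, recursive temporal smoothing and minimum tracking) can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the smoothing factor `xi` is a constant instead of the time- and frequency-varying $\xi(j,k)$, and speech presence detection and bias compensation are omitted.

```python
import numpy as np

def spectral_smooth(periodogram, window):
    """Circular weighted sum over 2N+1 neighbouring bins; window sums to 1."""
    N = (len(window) - 1) // 2
    out = np.zeros_like(periodogram, dtype=float)
    for i in range(-N, N + 1):
        # np.roll(x, i)[k] equals x[(k - i) mod K], the modulo-K indexing.
        out += window[i + N] * np.roll(periodogram, i)
    return out

def smooth_and_track(periodograms, window, xi=0.8, w_min=4):
    """Return the smoothed periodogram P and its running minimum P_min."""
    frames, bins = periodograms.shape
    P = np.zeros((frames, bins))
    P_min = np.zeros((frames, bins))
    for j in range(frames):
        P_bar = spectral_smooth(periodograms[j], window)
        # First-order recursive smoothing over time (constant xi assumed).
        P[j] = P_bar if j == 0 else xi * P[j - 1] + (1.0 - xi) * P_bar
        # Minimum over the last w_min frames, including the current one.
        start = max(0, j - w_min + 1)
        P_min[j] = P[start:j + 1].min(axis=0)
    return P, P_min

window = np.array([0.25, 0.5, 0.25])          # N = 1, sums to 1
impulse = np.zeros(8)
impulse[0] = 4.0
smoothed_impulse = spectral_smooth(impulse, window)   # wraps around to bin 7

periodograms = np.full((6, 8), 2.0)           # stationary noise floor
periodograms[3] = 10.0                        # one frame with a power burst
P, P_min = smooth_and_track(periodograms, window)
```

The burst raises `P` in frame 3 to 3.6, while `P_min` stays at the noise floor of 2.0, so the ratio `P / P_min` rises above 1 exactly where extra power is present. This is the quantity on which the speech presence decision is based.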

EXPERIMENTAL SETUP AND EVALUATIONS
NOIZEUS (Hu and Loizou, 2007) is a noisy speech database developed to evaluate speech enhancement algorithms. Three real-world noise environments (airport, exhibition hall and street) are considered at noise levels ranging from 0 dB to 15 dB. The speech enhancement algorithm with noise estimation is evaluated with PESQ (ITU, 2000; Bernard et al., 2005; Jianfen et al., 2009), FwSNRseg (Bernard et al., 2005; Jianfen et al., 2009) and WSS (Bernard et al., 2005; Jianfen et al., 2009). PESQ is the ITU-T standard for evaluating the perceptual quality of enhanced speech. In essence, PESQ estimates the Mean Opinion Score (Loizou, 2007) from the clean and degraded speech signals. PESQ (Tables 1 and 2) is rated on the following five-point scale.
Frequency-weighted segmental SNR is a variant of SNR: it is a segmental SNR weighted within each frequency band, where the bands are related to the critical bands. Noise in certain frequency bands is less harmful than in other bands of the input speech signal. The higher the FwSNRseg value (Table 3), the better the quality. Weighted spectral slope (Table 4) measures the distance in the spectral domain. It is based on a comparison of the smoothed spectra of the clean and degraded speech signals. The smaller the measured distance, the better the speech quality, and vice versa. Figure 6 shows the time-domain waveforms of the clean speech, the noisy speech and the speech enhanced with the two different noise estimations. Similarly, Fig. 7 shows the spectra of the clean, noisy and enhanced speech signals.

CONCLUSION
In the present study the geometric approach to spectral subtraction is implemented with time-frequency connected region (TFCR) noise estimation. In contrast with the conventional power spectral subtraction method, where the cross terms are assumed to be zero, which results in cross-term errors, the geometric approach (GA) uses a gain function that is always positive and real. The GA method is supported by TFCR noise estimation for estimating the noise spectrum; its temporal minimum values are computed from the smoothed periodogram and are then used for speech presence detection. This combination of spectral subtraction and noise estimation is evaluated for quality in three different real-world noisy environments with PESQ, FwSNRseg and WSS. The experimental results show that time-frequency noise estimation combined with the GA performs better than minimum controlled recursive averaging noise estimation.

Fig. 4: Block diagram of spectral subtraction with geometric approach

Fig. 6: Time-domain waveforms for clean speech (BLUE), exhibition hall 0 dB noisy speech (RED), TFCR noise estimation enhanced speech and MCRA noise estimation enhanced speech

Fig. 7: Spectra of clean, noisy and processed speech by TFCR and MCRA

{a_Y, a_S, a_E} are the magnitudes and {φ_Y, φ_S, φ_E} are the phase angles of the noisy speech, clean speech and noise spectra respectively. From Fig. 2, if we solve the triangle in Fig. 3 with the help of the law of sines, we obtain a new suppression or gain function which is always positive and real. Solving the triangle WRX, in which WR is perpendicular to XR, gives the following equations: S