Underdetermined Blind Mixing Matrix Estimation Using Stwp Analysis for Speech Source Signals*

Wavelet packets decompose signals in to broader components using linear spectral bisecting. Mixing matrix is the key issue in the Blind Source Separation (BSS) literature especially in under-determined cases. In this paper, we propose a simple and novel method in Short Time Wavelet Packet (STWP) analysis to estimate blindly the mixing matrix of speech signals from noise free linear mixtures in over-complete cases. In this paper, the Laplacian model is considered in short time-wavelet packets and is applied to each histogram of packets. Expectation Maximization (EM) algorithm is used to train the model and calculate the model parameters. In our simulations, comparison with the other recent results will be computed and it is shown that our results are better than others. It is shown that complexity of computation of model is decreased and consequently the speed of convergence is increased.


Introduction
Blind source separation (BSS) using Independent Component Analysis (ICA) has attracted great deal of attention in recent years.Important applications such as speech recognition systems, speech enhancement, speech separation, wireless communication, image processing, telecommunications, and biomedical signal analysis and processing had been carried out using ICA [1,2].The main objective of ICA is to identify independent sources using only sensor observation datum which are linear mixtures of unobserved independent source signals [3][4][5].The standard formulation of ICA requires at least as many sensors as sources [4].Blind source separation is very important problem when there are more sources than sensors.In this case estimation of mixing matrix is very important issue.
Anemuller and Kollmeier have used an anechoic model for mixture signals using the standard method of maximum likelihood estimation of Fourier transformed speech signals to develop an adaptive algorithm to calculate the mixture matrix [6].Li et al have proposed a sparse decomposition approach for an observed data matrix, and then using gradient type algorithm, the basis matrix is estimated and therefore source signals are separated [7].
In recent years several approaches have been investigated to address the over-complete source separation problem.Shi et al have used a gradient based algorithm in time domain to separate source signals from two mixtures in over-complete cases [8].Rickard et al have made an assumption that windowed sources are disjointly orthogonal in time, frequency or time-frequency domains.They have applied a mask function in one of domains and have been able to separate sources from two mixtures [9,10].Bofill-Zibulevsky have proposed a geometrical mixing model using shortest path criteria and proposed an algorithm to separate the source signals [11].
Vielva et al assumed some prior knowledge about the statistical characteristics of the sources and then estimated the mixing matrix which resulted separation of the sources [12].SangGyun kim [27] used a k-means clustering method in time-frequency (TF) domain to estimating the mixing matrix.At first he obtained the STFT of any mixtures and by considering the ratio of mixture signals in TF domain he computed the regions which only one source is active.Lewicki et al. [13] provided a complete Bayasian approach assuming Laplacian source prior to estimating both the mixing matrix and the sources in the time domain.Tinati et al. proposed a comparison method for speech signal orthogonality in wavelet and time-frequency domains [14].Tinati et al. proposed a new algorithm for selecting best wavelet packet node using LMM-EM model, finally they could obtain best results about estimation of mixing matrix [15,16].They apply LMM model for speech mixture signals in wavelet packet domain using long-term Analysis.In their proposed algorithm because of long-term analysis, increasing the source number causes more errors in the estimation of mixing matrix.
In this paper we apply Laplacian model in short time wavelet packet domain and finally in most packets there are only one or less sources and then we show that complexity of computations of model is decreased and therefore speed of convergence is increased.
Because speech signals are localized in different time -frequency bins then we can find in short time windows that only one source is active in the mixing signals.Therefore using short time wavelet transform we obtain the packets which only one source is active and other sources are disabling.The histogram of phase differences is used to obtain a best wavelet packet for this purpose [15,16].In this paper there is no need to calculate mixture model, of course a preprocessing of packets histogram is done and a best packet is selected in every window (short time) for mixture modeling.
All of the above proposed algorithms and methods are combined to introduce a new and simple mixing matrix determination in over-complete cases.We will demonstrate that very promising results are obtained using examples.

Wavelet Transform
Wavelet theory has attracted much attention in signal processing, during past decades.It has been applied in a number of areas including data compression, speech processing, image processing, biomedical signal analysis, coding, transient signal analysis, and other signal processing applications.
Wavelet transform may be described in terms of representation of a signal with respect to specific family of functions that are generated by a single analyzing function [17][18][19][20].
A single wavelet function generates a family of wavelets by dilating (stretching or contracting) and translating (moving along time axis) itself over continuum of dilation and translation values.
The continuous wavelet transform (CWT) of a signal f(t) is defined as [21]: where ψ(t) plays the same role as e jωt in the Fourier transform, and is scaled and transformed version of mother wavelet, ψ a,b as: Computational complexity and redundancy are main disadvantages of the CWT.In addition, in practical applications, in particular those involving fast algorithms, the continuous wavelet transform can only be computed on a discrete grid of points, which is discretizing only 'a' or both 'a' and 'b' parameters.
Discrete wavelet transform (DWT) is commonly referred as dyadic sampling of CWT, and generally is defined as: In general any function f(t)L 2 (R) could be expanded using a set of functions, φ k (t) and ψ j,k (t) as: , , where φ(t) is called scaling function, and is determined from the following equation: It was shown that the scaling and wavelet functions are used to determine two sets of low-pass and high-pass filters which are used to decompose signal f(t) in to coarse and detail levels.This is called multi-resolution analysis.Low-pass and high-pass filter coefficients are obtained from following equations: where h 0 (k) and h 1 (k) are impulse response of low-pass and high-pass filters respectively [20].If one proceeds in both of the coarse and detail level branches of the wavelet tree then a wavelet packet decomposition tree is obtained.A full wavelet packet decomposition binary tree for two scale wavelet packet transform is shown in .

Independent Component Analysis
Independent component analysis is a statistical method expressed as a set of multidimensional observations that are combinations of unknown variables [1,2,4,22].These underlying unobserved variables are called sources and they are assumed to be statistically independent with respect to each other.The linear ICA model can be ex- pressed as: T is an M-dimentional observed vector and x i (t) is the i th mixture signal.A= [a ij ] M×N is an unknown M × N mixing matrix that operates on statistically independent unobserved variables which is defined as the following vector: where again s i (t) is the i th source signal.It is assumed that any entry of mixing matrix A has a constant value, in other words the ICA system is an LTI system.In the case of an equal number of sources and sensors, (M = N), a number of robust approaches using independent component analysis have been proposed by many researchers [22,23].In this case ICA method estimates the inverse or pseudo inverse of mixing matrixes as W. In the overcomplete source separation case (M < N), an ill-conditioned occurs, and the source separation problem has many solutions.It consists of two sub-solutions: 1) estimating the mixing matrix A and 2) estimating the source signals S(t).
Consider a case with two sensors and three sources.The mixing model is therefore expressed as: x t a s t a s t a s t x t a s t a s t a s t For simplicity, one can assume all the coefficients in one of the above equations to have unit values.This is because in the scatter plots we are actually plotting the ratios of relevant magnitudes of signals that will be used in later.

Methods
In order to increase the sparsity of signals, we use the wavelet packet decomposition (WPD) on the observed signals and wavelet packet coefficients will be used to plot the scatter-representation [24,25].This is shown in   5), the relevant histograms are shown and can be seen that much better representation here as well [24].
The phase difference of observed data measured by sensors is expressed as: where l i WP represents the i th wavelet packet in the l th level of decomposition.It is shown in Figure (5) that there are three peaks corresponding to three sources.Because of many sources in mixing signals all peaks in histogram plot have low amplitude, about lower than 1%, if we use short term analysis for mixing signals, we obtain histograms that involves less sources and finally we get higher peaks in histograms.First we will show when we use short time analysis therefore we obtain activity for a single source in the most packets.In this situation we take a windowed mixture signals and then calculate the wavelet packet decomposition for N = 4096 and N = 512, then using Equation (9) we calculate θ t and we obtain histogram for phase difference and finally we get peak value of calculated histograms, if this peak value is higher than 20% then it is assumed that in this packet we have one active source.When we have in the obtained packets lower peaks in histograms then we can say that there are many sources are active in this packet, it is shown in Figures (6), (7).In Figure (6) it is shown that when we select window length 4096, in all packets more sources are active and their peak values is lower than 10% and with window length 512 many or all of the packets contain only one source and peak value is higher than 20%.

Selection of the Best Wavelet Packet Node in Short Time Analysis
In this section we propose a new algorithm to select the best node in wavelet packet domain for each source direction, which are depicted in columns of the mixing matrix.Considering mixtures of speech signals in the time-frequency domain, it is more probable that each speech signal be localized in different time-frequency bins.Therefore, we can state that there are packets where a particular source may be localized better than the other sources, when we apply short time analysis in wavelet packet domain.Consequently dispersion of that particular source will be better and will show itself in much higher amplitude and narrower width in the histogram plots than the other sources.This presumption leads us to find a certain packet that will identify some sources better than the others.In Figure (8) we describe a new method based on short time analysis in wavelet packet domain.According to the previous sections we choose a window in time domain and then we calculate the phase differences and then histogram is obtained if maximum peak value is higher than 20% then it is suitable for calculating Laplacian model about it, unless another window is chosen and this operation is repeated over the mixing signals.Finally we sort peak values and corresponding Laplacian parameters, then best parameters are selected based on maximum peak value about any source direction.

Calculating of Laplacian Model Using EM Algorithm
The Laplacian density is usually expressed as: where θ 0 represent the center and c > 0 controls the width or variance of the density function.If parameter c is increased, narrower dispersion will result, and also this will increase the height as well.Expectation maximization algorithm is a technique used to find model parameters in such a way that the likelihood function of the model is maximized.It has been used in BSS problem in literature.Assuming T samples for θ t and Laplacian model densities as in Equation ( 10), the log-likelihood takes the following form: In order to update θ 0 and c 0 , we have to find derivatives of J(θ k ,θ 0 ,c 0 ) with respect to θ 0 and c 0 , then set them to zero.The recursive formulas for iterations are obtained as: ( 1) 0 ( ) where in above equations 'n' indicate the iteration steps.
Using these iteration formulas we are able to train the Laplacian Model and estimate the center and other parameters for each Laplacian distribution.
In the following examples we demonstrate the process of the best wavelet packet node selection algorithm of Figure ( 8) in details.
Example 1: Consider the mixing matrix to be as: Using Equations ( 9), ( 12) and ( 13) and with assumption N = 512, parameters of model are calculated up to second level of decomposition.The obtained results are shown in Table (1) and finally best estimation can be written as:    ber of iterations is about 5-10, which is much less than the other reported cases [16], [26].
Example 2: Consider the mixing matrix to be as: Again, parameters of model are calculated up to second level of decomposition with N = 512.The obtained results are shown in Table (2) and finally best estimation can be written as:

Comparison of the Proposed Algorithm with the Other Results
In this section comparison of our algorithm with the Sang Gyune Kim method [27] will be considered.For this, consider the mixing matrix to be as: The corresponding scattering plot is shown as Figure (10) and the mixture matrix using the SangGyune Kim method is shown as below with ε = 0.03 as a threshold: our proposed algorithm gives better results than Kim method.

Conclusions
In this investigation we purposed a novel algorithm for wavelet packet node selection for any source direction using short time analysis.This was performed in all packets.Therefore, a more accurate estimation of the mixing matrix is obtained.It is shown that the number of iterations is about 5-10, which is much less than the other reported cases.
Two examples with three and four source components in the mixture were undertaken for simulations.Proposed algorithm (mixing matrix estimation) is tested on them.Results indicate that we have been able to estimate the mixing matrix with high accuracy, also the comparison show that the obtained results are better than other recent reports.

Figures ( 2 ), ( 3 )
show the speech signals and their corresponding mixing signals.Every signal has 2 seconds length in time and the sampling rate of them is 16 khz.

Figure ( 4 )
Figure (4) for the mixture of speech signals shown in Figure (3).It is obvious that directions of the signals are much clearer in wavelet domain.In Figure (5), the relevant histograms are shown and can be seen that much better representation here as well[24].The phase difference of observed data measured by sensors is expressed as:

Figure 8 .
Figure 8.The proposed algorithm for calculating of mixing matrix.

2 )
'p(s i )' parameter is the maximum value in the histogram plot and 'E(s i )' is the mixing matrix parameter corresponding to i th speech source signal.Learning curve and estimated Laplacian Model for sources are shown in Figure(9).It is shown that the num

Figure10.
Figure10.The scatter plot of mixture signals in STFT domain using Kim algorithm.