This paper proposes a voice coding scheme based on two algorithms that code the waveform in the time domain. The first uses the significant impulse model (SIM), which is intended to operate as an endpoint detector and as a down-sampler by detecting and selecting the significant valleys and crests. The second algorithm is a redundant waveform recycler (RWR) that uses an architecture based on fuzzy logic with an accumulative memory. The fuzzy algorithm obtains the degree of similarity between redundant waveforms in order to save the patterns into a knowledge base through unsupervised learning. Once patterns are in memory, they are used automatically to identify their recurrence in the input signal, and the input block is replaced by the corresponding pattern in memory. The decoding process uses SIM interpolation with a memory that matches the one built by the RWR.

Sample-by-sample digitization of voice signals gave rise to conventional coding techniques such as PCM, which is based on scalar quantization [

The size of the codebook and the definition of its population, or training set (updated during the measurement), are two critical parameters that determine the efficiency of a vector quantizer (VQ) [1-5]. Several models reduce both storage and computational load, but they do not always match the vector patterns of the incoming signal because of a phase shift [1-8].

This paper proposes two algorithms that change the perspective of VQ by adjusting the patterns to the input vectors [3,4] and recycling the input vectors to exploit redundancy. These algorithms are the SIM (significant impulse model), which prevents the phase shift and reduces the number of samples needed for voice modeling, and the RWR, which recognizes patterns without training by recycling the signal itself.

The speech coder first passes the signal to the RWR process, which fits the voice signal using the SIM algorithm. The RWR quantizes the signal, which is presented as a vector and compared with the patterns stored in memory. The differences between them are evaluated by a fuzzy system: if the distances are small enough for the pattern to be accepted as “similar”, the system selects this pattern; otherwise the pattern is discarded and the RWR calls another pattern from memory. This process is repeated until one of two cases occurs: a pattern “similar” to the input vector is found in memory, or no pattern satisfies the similarity conditions. In the first case the input vector is encoded using an index that identifies the pattern in memory, with a proportional adjustment made using the coding performed by the SIM.
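The search loop described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `similarity` and `rwr_encode`, the threshold of 0.9, the use of normalized correlation in place of the fuzzy system, the least-squares gain used as the proportional adjustment, and the decision to store an unmatched frame as a new pattern are all hypothetical choices made for this example.

```python
import numpy as np

def similarity(a, b):
    """Placeholder similarity in [0, 1] (assumption: the paper uses a
    fuzzy system here; normalized correlation only stands in for it)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return max(0.0, float(np.dot(a, b) / denom))

def rwr_encode(frame, memory, threshold=0.9):
    """Search the pattern memory for a pattern similar to `frame`.

    Returns ("RWR", index, gain) when a stored pattern is accepted, or
    ("SIM", None, None) when none is, in which case the frame is stored
    as a new pattern (assumed handling of the second case).
    """
    for index, pattern in enumerate(memory):
        if similarity(frame, pattern) >= threshold:
            # Proportional adjustment, assumed here to be a least-squares gain.
            gain = float(np.dot(frame, pattern) / np.dot(pattern, pattern))
            return ("RWR", index, gain)
    memory.append(frame.copy())
    return ("SIM", None, None)
```

The first element of the returned tuple plays the role of a flag telling the decoder which of the two coding paths produced the frame.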

The

cient recycling.

A binary word is needed to identify the coding algorithm performed. The decoding chooses a linear interpolation for the SIM or an i-adjusted vector magnitude σ in the case of the RWR (this process is illustrated in

To use linear interpolation for decoding, the signal is modeled based on its direction and strength properties, so that pulses having the same direction and similar strength can be omitted. This reduces the number of samples needed to reproduce a signal. Although linear correlation gives good enough results when these features are used to find the relationship between two signals, the modeling does not benefit all signals equally, particularly at high frequencies, where it adds components to the signal, that is, noise. The

See the comparison into the

To achieve this modeling process we use fuzzy-logic if-then rules in terms of direction and strength (magnitude). Equations (1) and (2) show these rules, respectively.

where x_{i} is the i^{th} sample of the signal, S(x) is the signal direction and M is the direction model. Likewise, I is the vector formed by interpolating the samples skipped by M, and V_{x} is the vector formed by the samples omitted by M.

The constant C determines the degree of sample reduction applied by the SIM. C is adjusted according to the signal-to-noise ratio. Thus, low-noise conditions (approximately 16 dB) call for a sample reduction of nearly 50% and a correlation coefficient greater than 0.99. In this way the benefits of down-sampling (in terms of reducing samples) are obtained without losing voice quality (see

Once the signal is modeled by the SIM, the number of significant changes in the signal is counted to determine whether a signal frame is voiced or unvoiced, with results similar to those of a zero-crossing detector.
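A minimal sketch of the SIM idea is given below: significant crests and valleys are kept, the signal is rebuilt by linear interpolation, and the kept points are counted. Because Equations (1) and (2) are not reproduced here, the selection rule is an assumption: a sample is kept when the signal direction changes at it and its amplitude differs from the last kept sample by more than a threshold `c` (standing in for the constant C). The function names are hypothetical.

```python
import numpy as np

def sim_encode(x, c):
    """Select significant crests and valleys (assumed rule, see lead-in)."""
    x = np.asarray(x, dtype=float)
    keep = [0]                                   # always keep the first sample
    for i in range(1, len(x) - 1):
        before = x[i] - x[i - 1]
        after = x[i + 1] - x[i]
        direction_changes = before * after < 0   # strict crest or valley
        if direction_changes and abs(x[i] - x[keep[-1]]) > c:
            keep.append(i)
    if len(x) > 1:
        keep.append(len(x) - 1)                  # always keep the last sample
    idx = np.array(keep)
    return idx, x[idx]

def sim_decode(idx, values, n):
    """Rebuild n samples by linear interpolation between the kept samples."""
    return np.interp(np.arange(n), idx, values)

def significant_changes(idx):
    """Number of kept interior samples, used here as the voicing feature."""
    return max(len(idx) - 2, 0)

# Usage: encode a test tone, decode it, and measure the correlation
# between the original and the reconstruction.
x = np.sin(2 * np.pi * np.arange(90) / 36.0)
idx, values = sim_encode(x, c=0.1)
y = sim_decode(idx, values, len(x))
correlation = np.corrcoef(x, y)[0, 1]
```

In this sketch a frame would be classified by comparing `significant_changes(idx)` against a threshold, in the same way a zero-crossing count is used; the threshold value itself is not given in the text and would have to be tuned.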

To describe the RWR, we assume a frame size of 11.25 ms, since this is approximately the greatest average length that a pattern can have [3, 10]. Pattern searching with the RWR is based on comparing the patterns with a frame of the incoming signal that is shifted one sample per iteration. However, this implies a large computational load because of the number of iterations. One solution is to increase the number of samples shifted per iteration, but this could cause problems in the pattern recognition stage. The solution proposed in this paper is to shift the frame to each significant change in the signal, so that the number of samples displaced varies with the changes in the signal, where such changes are described by the SIM. The

The knowledge base is optimized by the histogram, which allows it to be limited in size without a significant reduction in pattern detection performance. As shown in

In accordance with the conditions described above, the patterns are compared in terms of their direction and strength. The former has previously been defined by the SIM, and the difference in magnitude is defined by Equation (3). However, before any comparison it is necessary to standardize their scales according to Equation (4).

The comparisons are evaluated by a Mamdani fuzzy algorithm. The membership functions (MF) are sigmoidal (as shown in Figures 8 and 9); they determine the degree to which the antecedent of each rule is satisfied. Since there are two antecedents (direction and difference), they are combined using the fuzzy “and” operator. Once the antecedent is reduced to a single number, the consequent is defined by aggregating two rules, one for the pattern “similar” and one for “different”, each with its respective membership function. Finally, the centroid method is used for defuzzification [12-14].
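The Mamdani evaluation described above can be illustrated with the following sketch. It is not the authors' rule base: the membership function slopes and centers, the normalization of both inputs to [0, 1], and the choice to fire the “different” rule with the complement of the “similar” rule are assumptions made only so that the example is complete. The names `sigmoid_mf` and `fuzzy_similarity` are hypothetical.

```python
import numpy as np

def sigmoid_mf(x, slope, center):
    """Sigmoidal membership function; a negative slope gives a falling curve."""
    return 1.0 / (1.0 + np.exp(-slope * (x - center)))

def fuzzy_similarity(direction_match, magnitude_diff):
    """Mamdani inference with centroid defuzzification.

    direction_match: fraction of samples whose direction agrees, in [0, 1].
    magnitude_diff:  normalized magnitude difference, in [0, 1].
    Returns a crisp score in [0, 1]; higher means more similar.
    """
    # Antecedents (assumed parameters): "same direction" and "small difference".
    same_direction = sigmoid_mf(direction_match, 12.0, 0.5)
    small_difference = sigmoid_mf(magnitude_diff, -12.0, 0.5)

    # Fuzzy "and" (minimum) gives the firing strength of the "similar" rule.
    w_similar = min(same_direction, small_difference)
    # Assumption: the "different" rule fires with the complement.
    w_different = 1.0 - w_similar

    # Consequents: clip each output MF at its firing strength, aggregate by max.
    y = np.linspace(0.0, 1.0, 201)
    similar_out = np.minimum(w_similar, sigmoid_mf(y, 12.0, 0.5))
    different_out = np.minimum(w_different, sigmoid_mf(y, -12.0, 0.5))
    aggregated = np.maximum(similar_out, different_out)

    # Centroid defuzzification over the sampled output universe.
    return float(np.sum(y * aggregated) / np.sum(aggregated))
```

With these assumed parameters, a frame that matches a pattern in direction and differs little in magnitude yields a score above 0.5, and the opposite case yields a score below 0.5.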

The output of the fuzzy system requires a condition of “likeness” (see Equation (5)) to allow a more subjective comparison of the waveform.

where SC is the coded signal, SD is the output of the fuzzy system, P is the size of the displacement and L is the length of the pattern.

Finally

Using a voice signal of 30 seconds containing words in Spanish phonetics [

70% of them, depending on whether the frame is mostly voiced or mostly unvoiced. The size of the knowledge base depends on this frame. The

As in

The algorithms described above were simulated with an input speech signal sampled at 8 kHz with 8 bits of resolution. Under these conditions, compression rates from 0.2 to 0.125 are achieved for the SIM and from 0.7 to 0.2 for the RWR. Combined, the two achieve a compression rate of up to 0.04. The decoded voice quality can be improved by applying a 3 kHz low-pass filter to the reconstructed signal. Quality is acceptable at a compression rate of 0.15 according to the MOS (Mean Opinion Score) test applied to a population of 20 people.
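To put these figures in terms of bit rate, the short calculation below converts a compression rate into a coded bit rate. It assumes that the compression rate is the ratio of coded size to original size (so that smaller values mean stronger compression); the function name is hypothetical.

```python
def coded_bit_rate(sample_rate_hz, bits_per_sample, compression_rate):
    """Coded bit rate in bit/s, assuming rate = coded size / original size."""
    return sample_rate_hz * bits_per_sample * compression_rate

# The uncompressed input is 8000 samples/s * 8 bits = 64 kbit/s.
acceptable = coded_bit_rate(8000, 8, 0.15)  # about 9.6 kbit/s
best_joint = coded_bit_rate(8000, 8, 0.04)  # about 2.56 kbit/s
```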

In this way, the architecture suggests that defining the input vector through a window displaced according to the changes in the voice signal largely avoids discarding similar waveforms that would otherwise be missed as patterns because they lose synchronization within the window, a situation that would require a more complex algorithm to force their detection.