Wavelet-based Ecg Data Compression Optimization with Genetic Algorithm

Copyright © 2013 Tsung-Ching Wu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. ABSTRACT With a direct impact on compression performance, optimal quantization scheme is crucial for transform-based ECG data compression. However, traditional optimization schemes derived with signal adaption are commonly inherent with signal dependency and unsuitable for real-time application. In this paper, the variety of arrhythmia ECG signal is utilized for optimizing the quantization scheme of wavelet-based ECG data compression based on a genetic algorithm (GA). The GA search can induce a stationary relationship among the quantization scales of multi-resolution levels. The stationary property facilitates the control of multi-level quantization scales with a single variable. For this aim, a three-dimensional (3-D) curve fitting technique is applied for deriving a quan-tization scheme with linear distortion characteristic. The linear distortion property can be almost independent of ECG signals and provide fast error control. The compression performance and convergence speed of reconstruction quality maintenance are also evaluated by using the MIT-BIH arrhythmia database .


INTRODUCTION
Electrocardiogram (ECG) is a non-invasive modality that senses electric action of heart motion from body surface.Being a three-dimensional (3-D) organ, heart disease diagnosis usually needs the use of multiple ECG signals sensed from different positions around the heart.Typical requirement is a record of 12-lead ECG signals [1].ECG data compression technique is crucial for the transmission and long-term storage of mass ECG signals, e.g., ambulatory monitoring and telemedicine [2,3].This technique can also benefit portable ECG recording due to low power consumption.
According to processed data, ECG data compression methods can be generally categorized into time-, transform-domain, and signal adaptation groups [4][5][6].Signal adaptation methods based on a pre-process are unsuitable for real-time applications and time-domain methods are usually sensitive to the interference of high frequency components.With high compression performance and real-time processing capability, transform-domain methods have attracted much attention of researchers recently, especially for discrete cosine transform (DCT) [7] and discrete wavelet transform (DWT) [8] due to their excellent compression performance.
Transform-domain methods achieve data compression in terms of a non-uniform quantization scheme that uses different quantization scales for different frequency components.Quantization scheme will strongly impact on compression performance due to irreversible processing.For this sake, optimizing the compromise between compression ratio (CR) and distortion is crucial.Optimization schemes generally can be partitioned into filter selection and quantization scale design groups.With lattice parameterization, Nielsen et al. [9] proposed a signalbased optimization process that found optimal mother wavelet with minimal distortion rates for some fixed CRs.He and Mitra [10] designed optimal quantization error feedback filter by minimizing synthesis filtering error.Filter selection was also concerned for 3D signal compression [11].For quantizing DCT coefficients, Batista et al. [12] determined threshold and quantization vectors by minimizing the cost function J defined as a linear combination of entropy and distortion measure.Blanco-Velasco et al. [13] built a nearly perfect reconstruction cosine modulated filter bank and determined threshold value with reconstruction quality guaranteed.
However, optimization schemes commonly need a

OPEN ACCESS
pre-process for signal adaptation.This requirement is disadvantageous for real-time application.A less attractive approach was the set partitioning in hierarchical trees (SPIHT) algorithm [14] that combining quantization and coding can provide wavelet-based ECG data compressing with selectable bit rate.SPIHT scheme also has good compression performance.Considering the maintenance of reconstruction quality, an approach combining non-recursive discrete periodized wavelet transform (NRDPWT) and the reversible round-off linear transformation (RROLT) theorem was proposed [15].The RRO-NRDPWT-based approach with the capability of error propagation resistance and octave coefficient normalization is efficient for reconstruction error control.
In this paper, a new quantization scheme that combines genetic algorithm (GA) and linear distortion program is presented for the RRO-NRDPWT-based ECG data compression.By using a test dataset involving 11 arrhythmia ECG signals, optimal quantization scales for a desired range is first found with a GA search.Using percentage root mean square difference (PRD) as the distortion measure, GA search with the criterion of minimum PRD/CR can induce a stationary relationship among multi-level quantization scales.This property implies that multi-level quantization scales can be controlled with single variable.Conducted by this hypothesis, a 3-D polynomial curve fitting technique is then applied for linear distortion program with respective to a generation variable QF.Following the use of MIT-BIH arrhythmia database [16], linear distortion characteristic of the new quantization scheme, which benefits fast convergence in reconstruction quality maintenance, is evaluated.The experimental results of using untrained ECG signals also show that the GA-based quantization scheme can be independent of training data and improve average compression performance by 18.52% in comparing with the SPIHT scheme [14].

QUANTIZATION SCHEME OF RRO-NRDPWT-BASED ECG DATA COMPRESSION
RRO-NRDPWT is an efficient DWT process developed for easily controlling reconstruction error with minimum-word-length fixed-point computation.The fundamental architecture of RRO-NRDPWT-based ECG data compression system consists of three processes; i.e., RRO-NRDPWT, quantization, and lossless SPIHT encoding.The quantization scheme with single variable for controlling quantization scales of octave frequency bands can be independent of signals and suitable for real-time application.A brief review of the system is given in this section.
Let S J denote a row vector involving N-point sampled ECG data with where is referred to as the decomposition level.For , the nonrecursive channel filter for the jth level decomposition can be defined with a In j A , the filter coefficients of two adjacent columns have a 2 j J shift relationship in the vertical direction.By integrating the channel filters of all decomposition levels, a N N  filter-band matrix can be constructed, where 0 is a column vector consisting of constant elements.Using matrix A, the 1-D RRO-NRD-PWT can be represented with where denotes to round all the elements of vector X into the nearest integers, max is the significance normalization factor of A, X 「 」 j d is a row vector consisting of 2 j  integer wavelet coefficients of the jth level, and  and is a unitary matrix where denote the inverse and transport of A, respectively.

A
The quantization scheme with generation variable QF can be defined by where     X denotes to truncate the elements of vector X into integers and is the normalized quantization scale of the jth level defined by For inverse quantization, each retrieved datum will be compensated by half of the quantization scale, namely, where  j d is the jth sub-band vector of and  d   sign X denotes the sign vector of X, e.g., sing will be encoded with the differential pulse code modulation (DPCM) and lossless SPIHT scheme, respectively.

OPTIMAL QUANTIZATION SCHEME DESIGN BASED ON GENETIC ALGORITHM
GA invented by J. Holland [17] is a global searching method commonly used for finding optimal solutions of multi-variable non-linear system.This method achieves global searching based on uniformly distributed population [18,19]  The first stage applies GA to find four sets of optimal quantization scales where each set involves 11 level quantization scales, i.e., .Each set of multi-level quantization scales acts as a seed that will result in a PRD value for each signal during range [0.5%, 7%].The four seeds corresponding to four specified (i.e., ) will be found with minimum PRD/CR being the selection criterion where CR = (bits of compressed signal)/(bits of originalsignal).The compressed data file of each signal consists of a QF value and quantized ECG data.The former was encoded with the DPCM method and the latter was encoded with lossless SPIHT scheme.The GA-based searching process is described in the following: Crossover: 1) The choice of {cp DC ( ),•••, cp j ( )} or { cp j−1 ( ),•••, cp −8 ( )} for crossover is random where the value of j is also randomly chosen and the choosing probability is defined by 0.5.
2) Crossover processing number for each iterationis defined as 40. Mutation: 1) Mutation process is defined as exchanging the values of cp i ( ) and cp j ( ) when they are selected.
2) Select the set with maximum cp −8 ( ) and terminate the iteration.
In the algorithm, a large mutation probability (i.e., 0.3) is selected for the desire of fast convergence.As shown in Figure 1, the solutions of ECG signal 101 have the trend of for i .In fact, this trend is also true for all ECG signals.This trend implies that for a specific ( ) , GA search can always end at a globally optimal solution that selects one set of multi-level quantization scales   with minimum PRD/CR.This result cannot be influenced by mutation probability.For the training dataset, 44 values for each j will be found.Figure 2  ,0  shows the 44 values of . 7 The GA solutions also imply that the criterion minimum PRD/CR can induce a stationary relationship among multi-level quantization scales, i.e., , respectively.For simplicity, the generation function is approximated with where QF being the control variable is defined with    growth in high CR region.This improvement can result in NR-DPWT-GA with more linear distortion behavior, especially for high CR case.By using 48 arrhythmia ECG signals, three wavelet based approaches are compared in Figure 5 where NRDPWT-GA with optimal quantization scale design can obtain the best compression performance in both low and high CR regions.In comparing with SPIHT scheme, the compression performance can be improved by 12.26(%) and 10.13% for the two ranges 4 ≤ CR ≤ 12 and 14 ≤ CR ≤ 20, respectively.

QF
Generally, data dependency can be an intrinsic property of GA-based approach.This property can lead to instable result in practical applications.For studying the data dependency effect of GA-based quantization scheme, a second training dataset comprising 8 ECG signals randomly selected from MIT-BIH ST change database was build.These signals were the records 301, 305, 310, 314, 315, 317, 325 and 326 with each involving 10-min length sampled data.By applying the same training process of GA and 3-D curve fitting described in Section 3, a second   GA-based quantization scheme referred to as NRDPWT-GAst is derived in Figure 6 where the robustness of exponential-growth resistance in high CR region is also obvious.For a comparison of the two GA-based quantization schemes, all the signals saved in arrhythmia database and ST change database were used.The former and later comprise 48 and 28 ECG signals, respectively.Each signal involves 15-min length sampled data.The evaluation results were shown in Figures 7-9 where both the two schemes can obtain approximately linear distortion results with very minor difference.The comparison implies that data dependency effect of GA-based quantization scheme can be almost neglected in practice.Figure 8 shows that NRDPWT-GAar using arrhythmia signal is     112, 115, 116,117, 118, 119, 201, 207, 208, 209, 212, 213, 214, 228, 231, and 232) was also taken.Each signal contains about 1-min length sampled data.Three waveletbased approaches were compared in Figure 9 where NRDPWT-GAar also can obtain the best compression performance.In comparing with SPIHT scheme, the compression performance can be improved by 6.19(%) and 27.85% for the two ranges 4 ≤ CR ≤ 12 and 14 ≤ CR ≤ 20, respectively.
Quantization scheme with linear distortion characteristic can not only obtain high compression performance, but also facilitate in reconstruction quality maintenance that can be fulfilled with a close-loop error control process [20].For exploring the quality maintenance performance of NRDPWT-GAar, the three ECG signals, records 109, 117, and 232, were used.From is the fault tolerance of error control.Record 109 has a waveform with baseline wandering and slightly noise coupling.Figure 10 shows that a stable low bit rate (CR ≈ 20) can be obtained.The desired QF can be also stable and convergence speed is very fast, even for a suddenly violent rate change.The iteration times (ITs) of error control process are distributed in the dynamic range [1,3].The two segment demonstrations of Figures 10(e) and (f) show that the clinical information including the amplitude and duration can be preserved well.Record 117 is a nice waveform ECG signal.Figure 10 shows that for violent QF changes, the convergence speed can be still fast.The dynamic range of ITs is [1,4].The reconstructed signal with very low rate (CR ≈ 24) has slightly distortion at the boundary of segments.This can be overcome by reducing PRD T .As shown in Figure 12, record 232 is a distorted and noisy waveform ECG signal.Though both rate and QF are violently changed, the dynamic range of ITs can be maintained in [1,4].This signal has poor PRD due to the smoothing effect of quantization process, but the reconstruction error is almost unobservable.for wavelet-based ECG data compression.This quantization scheme can be easily controlled with signal variable and facilitates the issue of reconstruction quality maintenance.

Figure 1 .
Figure 1.The GA-based cp j ( ) of the ECG signal 101 for four specified.

Figure 2 .
Figure 2. Curve fitting result of cp −7 (QF) by using the training dataset.
variable.Based on the hypothesis, the second stage is to create a applying 3-D curve fitting technique.The 3-D coordinate system consists of three real number axes defined with PRD, CR, and   j cp

Figure 2 Figure 3 .
Figure 2 illustrates the curve fitting result of .The quantization scales of  7 cp QF 

Figure 4
shows the average PRD-QF and CR-QF curves of NRDPWT-GA, both are approximately linear.In comparing with the local optimal quantization scheme NR-DPWT-6t, [2] the NRDPWT-GA has smoother   j cp QF  curve in high CR region.On the other hand, the NRDPWT-6t has an exponential j cp

Figure 3 .
Figure 3.The cp j (QF) values of the GA-based quantization scheme.

Figure 4 .
Figure 4. Compression performance of the GA-based quantization scheme for the 48 ECG signals in MIT arrhythmia database.

Figure 5 .
Figure 5. Compression performance comparison of three wavelet-based approaches.

Figure 7 .
Figure 7. Compression performances of two GA-based quantization schemes by using MIT-BIH arrhythmia database.

Figure 8 .
Figure 8. Compression performances of two GA-based quantization schemes by using MIT-BIH ST change database.

Figure 4 ,
an approximately linear relationship between PRD and QF can be derived by LSE method as 0.417 0.07717 0.4138 PRD QF QF    .Applying this relationship into the linear QF prediction model used in the close-loop error control process can obtain the ECG data compression results shown in Figures 10-12, respectively.The three figures only demonstrated the first 316 segments of each file where PRD T denotes the target PRD and For wavelet-based ECG data compression, the problem of quantization scheme optimization was studied by using two kinds of ECG signals and a GA search in this paper.The experimental results showed that GA-based quantization scheme can be inherent with the property of signal independency.Using a training set involving more