Audio Watermarking Using Wavelet Transform and Genetic Algorithm for Realizing High Tolerance to MP3 Compression

Recently, several digital watermarking techniques have been proposed for hiding data in the frequency domain of audio signals to protect copyrights. However, little attention has been given to the optimal position in the frequency domain for embedding watermarks. In general, there is a tradeoff between the quality of the watermarked audio and the tolerance of watermarks to signal processing methods, such as compression. In the present study, a watermarking method developed for visual images by using a wavelet transform was applied to audio clips. We also improved both the quality of the watermarked audio and the extraction of watermarks after MP3 compression. To accomplish this, we formulated a multipurpose optimization problem for deciding the positions of watermarks in the frequency domain and obtained a near-optimum solution by using a genetic algorithm. The experimental results show that the proposed method generates watermarked audio of good quality and high tolerance to MP3 compression. In addition, security was improved by using a characteristic secret key to embed and extract the watermark information.


Introduction
Recent progress in digital media and digital distribution systems, such as the Internet and cellular phones, has enabled us to easily access, copy, and modify digital content, such as electronic documents, images, audio, and video. Under these circumstances, techniques to protect the copyrights of digital data and to prevent unauthorized duplication or tampering of these data are strongly desired.
Digital watermarking (DW) is a promising method for the copyright protection of digital data. Several studies have investigated audio DW [1][2][3][4][5][6][7][8][9][10][11][12]. Currently, digital audio clips distributed over the Internet or cellular phone systems are often modified by compression, which is one of the easiest and most effective ways to overcome DW without significantly deteriorating the quality of the audio. Two important properties in audio DW are the inaudibility of the distortion due to DW and the robustness against signal processing methods, such as compression. In addition to these properties, the data rate and the complexity of DW have attracted attention when discussing the performance of DW.
We developed a method in which 1) a digital watermark can be sufficiently extracted from watermarked audio, even after compression, and 2) the quality of the audio remains high after embedding the digital watermark. However, there is generally a trade-off between these two properties.
In the present study, we improved both the extraction of digital watermarks and the quality of the watermarked audio. To do so, we formulated a multipurpose optimization problem for deciding the positions of digital watermarks in the frequency domain and obtained a near-optimum solution by using a discrete wavelet transform (DWT) and a genetic algorithm (GA) [13,14]. The aim is high tolerance to compression by MP3, which is the most popular compression technique. The proposed method enables us to embed digital watermarks in a near-optimum manner for each audio file. In addition, the security of the watermarked audio is improved by using a characteristic secret key to embed and extract digital watermarks.

Wavelet Transform
The original audio data $s^{(0)}_k$ is used as the level-0 wavelet decomposition coefficient sequence, where $k$ denotes the element number in the data. The data is decomposed into the multi-resolution representation (MRR) and the coarsest approximation by repeatedly applying a DWT. The wavelet decomposition coefficient sequence $s^{(j)}_k$ at level $j$ is decomposed into two wavelet decomposition coefficient sequences at level $j+1$ by (1) and (2):

$$s^{(j+1)}_k = \sum_n p_{n-2k}\, s^{(j)}_n \tag{1}$$

$$w^{(j+1)}_k = \sum_n q_{n-2k}\, s^{(j)}_n \tag{2}$$

where $p_{n-2k}$ and $q_{n-2k}$ denote the scaling and wavelet sequences, respectively, and $w^{(j+1)}_k$ denotes the development coefficient at level $j+1$. The development coefficients at level $J$ are obtained by using (1) and (2) iteratively from $j = 0$ to $j = J-1$. Figure 1 shows the process of a multi-resolution analysis by DWT. The signal is recomposed by using (3) repeatedly from $j = J-1$ to $j = 0$:

$$s^{(j)}_n = \sum_k \left[ p_{n-2k}\, s^{(j+1)}_k + q_{n-2k}\, w^{(j+1)}_k \right] \tag{3}$$

In the present study, we use the Daubechies wavelet for DWT. As a result, we obtain the following relation between $p_k$ and $q_k$:

$$q_k = (-1)^k\, p_{1-k}$$
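The decomposition and recomposition steps referred to as (1) to (3) can be illustrated with a minimal sketch. It assumes the 4-tap Daubechies scaling sequence, a periodic extension of the signal at its ends, and the standard two-scale relation between the scaling and wavelet sequences; the function names `dwt_step`, `idwt_step`, and `decompose` are hypothetical, and the paper does not state the filter order or boundary handling it used.

```python
import math

# Daubechies 4-tap scaling sequence p_0..p_3 (an assumed filter order;
# the text only says "Daubechies wavelet").
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
P = {0: (1 + _S3) / _NORM, 1: (3 + _S3) / _NORM,
     2: (3 - _S3) / _NORM, 3: (1 - _S3) / _NORM}
# Wavelet sequence from the two-scale relation q_k = (-1)^k p_{1-k}.
Q = {k: (-1.0) ** k * P[1 - k] for k in range(-2, 2)}


def dwt_step(s):
    """One decomposition level: level j -> level j+1 (periodic extension)."""
    n = len(s)
    approx = [sum(c * s[(2 * k + i) % n] for i, c in P.items())
              for k in range(n // 2)]
    detail = [sum(c * s[(2 * k + i) % n] for i, c in Q.items())
              for k in range(n // 2)]
    return approx, detail


def idwt_step(approx, detail):
    """One recomposition level: level j+1 -> level j."""
    n = 2 * len(approx)
    s = [0.0] * n
    for k in range(len(approx)):
        for i, c in P.items():
            s[(2 * k + i) % n] += c * approx[k]
        for i, c in Q.items():
            s[(2 * k + i) % n] += c * detail[k]
    return s


def decompose(signal, levels):
    """Return the coarsest approximation and the MRR detail sequences."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        details.append(detail)
    return approx, details
```

Because the filters are orthonormal, `idwt_step(*dwt_step(s))` reproduces `s` up to rounding error for any even-length signal with at least four samples.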

Wavelet Domain Digital Watermarking Based on Threshold-Variable Decision
It is known that the histogram of the wavelet coefficients of each domain of MRR sequences has a distribution that is centered at approximately 0 when DWT is performed on a natural visual image [15]. We found the same phenomenon for audio clips. Figure 2 shows an example of an audio histogram.
In the present research, the technique [15] that exploits this phenomenon in natural images to embed a digital watermark in the wavelet coefficients of MRR sequences is applied to audio DW. The procedure is described below.

Setting of Parameters
For the watermarking of an audio clip, we obtain the histogram of the wavelet coefficients $V$ at the selected level of MRR sequences. Figure 3 shows a schematic diagram of the histogram of the wavelet coefficients of an MRR sequence. As with the DW techniques for images [15,16], we set the following watermarking parameters.
The values of Th(minus) and Th(plus) (see Figure 3) are chosen such that the non-positive wavelet coefficients are equally divided into two groups by Th(minus), and the positive wavelet coefficients are equally divided into two groups by Th(plus). The remaining parameters, the thresholds $T_1$ to $T_4$ and the frequencies $S$ of the wavelet coefficients in the intervals that these thresholds define around Th(minus) and Th(plus), are set as in [15,16]. The ratios of these frequencies to the total frequencies of the non-positive and positive coefficients are set to 0.2, which was determined experimentally.

Embedment of Watermark Information
The wavelet coefficients of MRR are rewritten according to the following rules when embedding a digital watermark. Here, $V_i$ denotes one of the wavelet coefficients and $W_i$ denotes bit $i$ in watermark $W$.
The wavelet coefficient $V_i$ is set in the range of $T_2 \le V_i \le T_3$ when bit $W_i$ is 0, whereas it is set in the range of $V_i \le T_1$ or $V_i \ge T_4$ when bit $W_i$ is 1. The frequency of the change of $V_i$ toward the inside is expected to be approximately equal to that of the change toward the outside when the number of 0 bits is approximately the same as the number of 1 bits.
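The parameter setting and the embedding rule can be sketched as follows. This is only an illustration under stated assumptions: the four strength thresholds are taken to be ordered as T1 < Th(minus) < T2 < 0 < T3 < Th(plus) < T4, as in the image method [15] that the text follows, and they are picked by rank in the sorted coefficients so that a fraction `ratio` of each sign group lies between a threshold and its neighbouring Th value. The function names and the rank-based selection are hypothetical.

```python
def set_parameters(coeffs, ratio=0.2):
    """Return Th(minus), Th(plus) and assumed thresholds (T1, T2, T3, T4)."""
    neg = sorted(v for v in coeffs if v <= 0)   # non-positive coefficients
    pos = sorted(v for v in coeffs if v > 0)    # positive coefficients
    mid_n, mid_p = len(neg) // 2, len(pos) // 2
    th_minus, th_plus = neg[mid_n], pos[mid_p]  # split each group in half
    step_n, step_p = int(ratio * len(neg)), int(ratio * len(pos))
    t1, t2 = neg[mid_n - step_n], neg[mid_n + step_n]
    t3, t4 = pos[mid_p - step_p], pos[mid_p + step_p]
    return th_minus, th_plus, (t1, t2, t3, t4)


def embed_bit(value, bit, thresholds):
    """Move one wavelet coefficient inward (bit 0) or outward (bit 1)."""
    t1, t2, t3, t4 = thresholds
    if bit == 0:
        # Clamp into the inner range [T2, T3].
        return min(max(value, t2), t3)
    # Bit 1: push to the outer ranges (<= T1 or >= T4).
    if value <= 0:
        return min(value, t1)
    return max(value, t4)
```

Coefficients that already lie in the target range are left unchanged, so only part of the selected coefficients is actually modified.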

Generation of Watermarked Audio
The inverse DWT (IDWT) is applied to the wavelet coefficients $V_i$ embedded with the watermark to obtain the watermarked audio.

Presumption of Parameters
The watermarked audio, which may be modified by signal processing methods, such as MP3 compression, is converted into wavelet coefficients. The wavelet coefficient in the region where the watermark information is embedded is denoted as $V'$.
For the histogram of $V'$, the thresholds Th'(minus) and Th'(plus) are determined in the same manner as Th(minus) and Th(plus) were determined for $V$. The watermarked audio can undergo certain types of audio processing, including compression, such that the difference between the distribution of the histogram of $V'$ after audio processing and that of $V$ before embedding the watermark is not negligible. In such a case, it is not certain that Th'(minus) and Th'(plus) can be used as presumptive values for Th(minus) and Th(plus), respectively.

Detection of Watermark Information
When the wavelet coefficient $V'_i$ is in the range between Th'(minus) and Th'(plus), the corresponding bit $W'_i$ in the measured watermark $W'$ is judged to be 0. When the wavelet coefficient is in the range at or below Th'(minus) or at or above Th'(plus), the corresponding bit $W'_i$ in the measured watermark is judged to be 1.
The detection rate (%) is defined as the percentage of correspondence between the bit $W_i$ in watermark $W$ and the corresponding bit $W'_i$ in the measured watermark $W'$.
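The decision rule and the detection rate can be sketched as below. The sketch assumes that the presumptive thresholds are re-estimated from the received coefficients by halving the non-positive and positive groups, and that a coefficient lying exactly on a threshold is judged to be 1; both choices, and all function names, are assumptions made for illustration.

```python
def presume_thresholds(coeffs):
    """Estimate the two decision thresholds from received coefficients."""
    neg = sorted(v for v in coeffs if v <= 0)
    pos = sorted(v for v in coeffs if v > 0)
    return neg[len(neg) // 2], pos[len(pos) // 2]


def detect_bits(coeffs, positions, th_minus, th_plus):
    """Judge 0 for coefficients strictly inside the thresholds, else 1."""
    return [0 if th_minus < coeffs[p] < th_plus else 1 for p in positions]


def detection_rate(embedded_bits, measured_bits):
    """Percentage of positions where the measured bit equals the embedded bit."""
    matches = sum(1 for a, b in zip(embedded_bits, measured_bits) if a == b)
    return 100.0 * matches / len(embedded_bits)
```

For a 400-bit watermark, a detection rate of 90% corresponds to 360 correctly recovered bits.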

Use of the Secret Key
When we embed a digital watermark by using the partial problem described in Section 5, the watermark is produced by using a secret key $S(\cdot)$, which is composed of a row of integers randomly selected, once or less per integer, from the integer range from 1 to $N$, as shown in the example below. Here, the argument of $S$ is the number of bits of the watermark, and $N$ is the total number of wavelet coefficients that are candidates for embedding with DW.

$$S(400) = (271, 72, 39, 990, 524, 88, \ldots, 1011, 688, 312) \tag{5}$$

Each value and the order of the numbers in $S$ indicate the position of each bit of the digital watermark in the DW region. Here, the position in the region is expressed as a one-dimensional coordinate. For example, the first number, 271, and the second number, 72, in $S(400)$ mean that the first and second bits of the watermark are set for the wavelet coefficients at the coordinates of 271 and 72 in the DW region, respectively.
The coordinate shift is performed by generating a new key such that the shift value is added to all values of the elements in $S$. For example, it is assumed that the shift value is 10. As a result, every element of (5) is increased by 10, so that the first and second elements become 281 and 82. The DW positions of the wavelet coefficients decided by the secret key described in (7) are illustrated in Figure 5.
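A secret key of this kind, and its coordinate shift, can be sketched with the standard library. The wrap-around at the end of the DW region is an assumption (the text does not say what happens when a shifted coordinate exceeds the number of candidate coefficients), and `num_coeffs=1024` in the usage note is a hypothetical size chosen only for illustration.

```python
import random


def make_secret_key(num_bits, num_coeffs, seed=None):
    """Draw num_bits distinct coordinates from 1..num_coeffs, in random order."""
    rng = random.Random(seed)
    return rng.sample(range(1, num_coeffs + 1), num_bits)


def shift_key(key, shift, num_coeffs):
    """Add the shift value to every coordinate (assumed to wrap past the end)."""
    return [((position - 1 + shift) % num_coeffs) + 1 for position in key]
```

With the first four elements of the example key and a shift value of 10, `shift_key([271, 72, 39, 990], 10, 1024)` gives `[281, 82, 49, 1000]`.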
where , i i y y denote the values of the -th sound data before and after embedding the digital watermark, respectively; denotes the total number of wavelet coefficients at the DWT level selected for embedment; are constants; and x is a 0 -1 variable that decides the embedment of the watermark on the corresponding wavelet coefficient, where 1 denotes an embedment and 0 denotes a non-embedment.

Optimal Watermarking Problem
Because our approach of DW optimization is on the first and challenging stage, we formulate the problem in a simple way on the viewpoint of optimization problem.Therefore, in the present study, we formulate the optimization problem as minimization for distortion.We can also formulate the problem using the constraint of keeping distortion less than the masking threshold.Such more elaborate approach is our next target.
When the number of wavelet coefficients that are possible targets for digital watermark embedment becomes larger, the solution space of becomes larger, with the result that a search for an optimal or near-optimal solution is time-consuming or difficult.Accordingly, we define the partial problem as follows:  2 1 e( )  where s is an integer variable ranging from 0 to 1 N  ; i is a 0 -1 constant that decides the digital watermark embedment on the corresponding wavelet coefficient, where 1 denotes an embedment and 0 denotes a nonembedment, and i are the same as those described for .We prepare a random initial pattern for c , , , , , For getting the detection rate , we use the watermark, the wavelet transformation level, and the (near-)optimum solution for or as input data to the decoder for the proposed method.

In the present study, we use DWT. However, other optimal watermarking problems can be formulated using other transforms, as shown in our previous study [17,18], where the discrete Fourier transform and a watermarking method proposed in the reported study [19] were used.

GA Approach
GA is one of the most widely acknowledged methods for near-optimization. We presume that $P$ and $P'$ might have many locally optimum solutions. In our experience, GA has been fairly effective in finding acceptable near-optimum solutions for many discrete optimization problems, even when many locally optimum solutions could exist. We use GA in the present study because we have had more success applying GA to optimization problems than applying other methods. Other acknowledged techniques for solving discrete optimization problems, such as tabu search [20], could be other options for solving $P$ and $P'$.
The GA, which is based on biological evolution, has been applied to solving optimization problems. The solution of the optimization problem is expressed as a genotype. In each generation, there is a population composed of several individuals identified by their genotypes. The basic idea of GA is that if the number of better individuals is increased by generation updating, an optimum or approximately optimum solution, as expressed by an individual, will eventually be obtained. A chromosome, which is composed of genes, is a string specifying an individual. For the generation updating, crossover and mutation are performed. The crossover takes two parent strings and generates two offspring strings. The mutation changes selected strings in a random way. The GA is explained in detail in the references [13,14].
In this section, we explain our approach for obtaining near-optimum solutions for $P$ and $P'$ by using a GA.

Coding
The GA coding for Experiment 1, described below, is performed as follows.
A gene is expressed by a bit of value 0 or 1. Accordingly, each chromosome is composed of a row of bits. The total number of bits is $m$, in which $n$ bits of the higher ranks are associated with the level of DWT and $m - n$ subordinate bits are associated with the shift value $s$, described for $P'$, in the binary expression (Figure 6).
When an individual associated with a level that actually does not exist in a list of levels used for DWT is generated in the GA process, the individual is judged to have a fatal gene and is deleted, and a new individual is generated.
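The Experiment 1 coding can be sketched as a small decoder. It assumes that the higher-rank bits are read directly as the binary value of the DWT level and the subordinate bits as the binary value of the shift; the function name and this direct mapping are hypothetical, since the exact mapping is given only in Figure 6.

```python
def decode_chromosome(bits, level_bits, valid_levels):
    """Split a chromosome into (DWT level, shift value).

    Returns None when the encoded level is not in valid_levels, which
    corresponds to an individual with a fatal gene.
    """
    level = int("".join(str(b) for b in bits[:level_bits]), 2)
    shift = int("".join(str(b) for b in bits[level_bits:]), 2)
    if level not in valid_levels:
        return None
    return level, shift
```

For example, with three level bits and levels one to eight as the search range, the chromosome `[0, 1, 1, 0, 0, 1, 0, 1, 0]` decodes to level 3 and shift 10, whereas a chromosome starting with `[0, 0, 0]` is rejected.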
The GA coding for Experiment 2, described below, is performed as follows.
A gene is expressed by a bit of value 0 or 1 for $P$. Accordingly, each chromosome is composed of a row of bits. The total number of bits is $k$. Each bit is assigned to one DWT coefficient for possible embedment of the watermark (Figure 7). The value of 1 for a bit means that the corresponding DWT coefficient is selected as an object of watermarking, whereas 0 denotes non-embedment for the corresponding DWT coefficient.
A gene is also expressed by a bit of value 0 or 1 for $P'$. Accordingly, each chromosome is composed of a row of bits. The total number of bits is $l$. The chromosome expresses the shift value $s$ described in $P'$ in the binary expression (Figure 8).

Strategy
For $P$, a one-point crossover and a mutation by the exchange(s) of pairs of 0 and 1 on a chromosome are used, while for $P'$, a two-point crossover and a one-point mutation are used. The same fitness function, defined in terms of $d$, $a$, and $e$ introduced in Section 5, is used for both $P$ and $P'$.
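The four operators named in the strategy can be sketched on bit-list chromosomes as follows. These are textbook forms written for illustration; the function names are hypothetical, and the crossover and mutation rates of Table 1 are not reproduced here.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two parents after one random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def two_point_crossover(a, b, rng):
    """Swap the segment between two random cut points (needs length >= 3)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def exchange_mutation(chromosome, rng):
    """Exchange one 1 and one 0, keeping the number of 1 bits unchanged."""
    ones = [i for i, g in enumerate(chromosome) if g == 1]
    zeros = [i for i, g in enumerate(chromosome) if g == 0]
    child = list(chromosome)
    if ones and zeros:
        child[rng.choice(ones)] = 0
        child[rng.choice(zeros)] = 1
    return child


def one_point_mutation(chromosome, rng):
    """Flip a single randomly chosen bit."""
    child = list(chromosome)
    position = rng.randrange(len(child))
    child[position] = 1 - child[position]
    return child
```

The exchange mutation preserves the number of coefficients selected for embedment, which is presumably why it suits a chromosome in which each bit selects one coefficient.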

Numerical Experiments
In this section, we describe our computer experiments and the results for evaluating the performance of the proposed method.

Method
The experiment was performed in the following computational environment: the personal computer was a Dell Dimension DXC051 (CPU: Pentium IV 3.0 GHz; main memory: 1.0 GB); the OS was Microsoft Windows XP; the development language was Microsoft Visual C++ 6.0.
For DWT, we use Daubechies wavelets, which were successfully used in related research on DW techniques for images [16]. Moreover, a string composed of 400 randomly generated bits is used as the watermark information.
Five music audio files, composed of the first entry in five genre categories (classical, jazz, popular, rock, and hiphop) in the research music database RWC [21], were copied from CDs onto the personal computer as WAVE files with the following specifications: 44.1 kHz, 16 bits, and monaural. For each music audio file selected from the database, one 10-sec clip of the music audio (hereafter referred to as the original music audio clip) was extracted starting at 1 minute from the beginning of the audio file and saved on a personal computer. The watermarked music audio clip was produced by embedding a digital watermark on the original audio clip by the proposed method.
In Experiment 1, the near-optimum solutions for $P'$ were obtained by using GA. To evaluate the tolerance of watermarking to compression, MP3, AAC, and WMA compression systems were each used to compress the watermarked music audio clip to bitrates of 64, 96, and 128 kbps. The bitrate of 32 kbps was also used for MP3. Moreover, for $P'$, the fitness function value of the near-optimum solutions obtained with GA was compared with the fitness function values of the feasible solutions at the initial generation.
In Experiment 2, the near-optimum solutions for $P$ and $P'$ were obtained by using GA, and the performances of those solutions were compared with respect to the calculation times for obtaining those solutions, the quality of the watermarked music audio clip, and the detection rate of the watermarks after compression by the MP3 technique. Moreover, for $P$ and $P'$, the near-optimum solutions obtained by using GA were compared with the solutions produced by random generation of individuals, neglecting the restrictions (10) for $P$ and (17) for $P'$, with respect to the quality of the watermarked music audio clip and the detection rate of the watermarks after compression by the MP3 technique.

Procedure
The procedure in the experiment is as follows.
Step 1: First, an initial population consisting of several individuals is generated. In the process of generating an initial population having a given number of individuals, an individual that does not meet the restriction or that has a fatal gene is deleted as soon as it is produced, and a new individual is generated. If all individuals generated in 300 consecutive trials fail to meet the restriction or have at least one fatal gene, the procedure is terminated. When an initial population having the given number of feasible individuals has been generated, go to Step 2.
Step 2: For each individual, the digital watermark is embedded according to the condition decided by that individual, the sound is compressed with the MP3 technique, and the digital watermark is detected after MP3 decoding; then the fitness is calculated. When the generation is final, the procedure is terminated. Otherwise, go to Step 3.
Step 3: Selection by the roulette strategy, crossover, and mutation are performed. Go to Step 2.
The near-optimum solution, which is defined as the solution having the highest fitness through all generations, is obtained by repeating Steps 2 and 3 until the given final generation.
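Steps 1 to 3 can be sketched as the following loop. It is a simplified illustration, not the implementation used in the experiments: `fitness` and `feasible` are caller-supplied stand-ins for the embed, MP3 compress, and detect pipeline and for the restriction check, the one-point crossover and the mutation rate of 0.1 are placeholder choices, and roulette selection as written assumes non-negative fitness values.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]


def run_ga(length, pop_size, generations, fitness, feasible, rng,
           max_trials=300, mutation_rate=0.1):
    # Step 1: build a feasible initial population; give up after
    # max_trials consecutive infeasible candidates.
    population, failures = [], 0
    while len(population) < pop_size:
        candidate = [rng.randint(0, 1) for _ in range(length)]
        if feasible(candidate):
            population.append(candidate)
            failures = 0
        else:
            failures += 1
            if failures >= max_trials:
                return None
    best, best_fit = None, float("-inf")
    for generation in range(generations):
        # Step 2: evaluate fitness and remember the best individual so far.
        fits = [fitness(ind) for ind in population]
        for ind, fit in zip(population, fits):
            if fit > best_fit:
                best, best_fit = list(ind), fit
        if generation == generations - 1:
            break
        # Step 3: roulette selection, crossover, and mutation.
        offspring = []
        while len(offspring) < pop_size:
            mother = roulette_select(population, fits, rng)
            father = roulette_select(population, fits, rng)
            cut = rng.randrange(1, length)
            child = mother[:cut] + father[cut:]
            if rng.random() < mutation_rate:
                position = rng.randrange(length)
                child[position] = 1 - child[position]
            if feasible(child):
                offspring.append(child)
        population = offspring
    return best, best_fit
```

In the experiments each fitness evaluation involves an MP3 encode and decode, so the number of evaluations (population size times generations) dominates the running time.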

Conditions
Table 1 shows the conditions of the GA strategy. In addition to the conditions shown in Table 1, the lower bound $a$ of the detection rate, described by the restriction conditions (10) and (17), was set to 90. Table 2 shows the values of $k$, $l$, $m$, and $n$. For Experiment 1, 32 and 64 kbps were used as the bitrates of the MP3 compression, and DWT levels ranging from one to eight were selected as the search range. For Experiment 2, 96 kbps was used as the bitrate of the MP3 compression for level 3 of DWT, and 32 kbps was used for levels 4 to 6 of DWT.
y y j denote the values of the r-th sound data of frame before and after embedding the digital watermark, respectively, and , N N denote the number of sound data at frame and the number of frames to be measured for j seg , respectively.In the present study, we used 209 ms as the time length of one frame.When calculating SNR seg , we excluded the frame with j , which means there is no change of all values of the sound data of frame .An audio frame size of 209 ms is adopted for comparison with results obtained by our previous watermarking method [17,18], where the frame size was decided in the relation to the condition of watermarking.In this subsubsection, to examine the performance of the proposed method in Experiment 1, the fitness function value of the near-optimum solution for the partial problem P is first compared with that obtained from 30 feasible solutions generated at random for the partial problem P .Next, the tolerances of watermark obtained by the proposed method to compression by MP3, AAC, and WMA are shown, and the time to obtain the near-optimum solution for the partial problem P is shown to check the practicability of the proposed method.
The DWT levels encoded in the chromosomes of the near-optimum solutions obtained by GA were as follows. For the MP3 bitrate condition of 32 kbps, the level was 4 for classical and jazz, 5 for rock and hiphop, and 6 for popular music. For the MP3 bitrate condition of 64 kbps, the level was 3 for classical, jazz, and rock, 4 for popular, and 5 for hiphop.
As shown in Figures 9-13, GA successfully found a good solution, considered to be the near-optimum solution, in each case. Table 3 shows the tolerance of the watermark to compression. Table 4 shows the time to obtain a near-optimum solution. For the MP3 bitrate conditions of 32 and 64 kbps in the process of GA, the average detection rate after compression by MP3 with bitrates of 32 to 128 kbps was 98.39% and 93.88%, respectively, and the average time to obtain the near-optimum solution was $5.94 \times 10^2$ and $3.78 \times 10^2$ sec, respectively. As a condition for obtaining high tolerance of the watermark to MP3 compression, 32 kbps was better than 64 kbps. However, it took more time to obtain the near-optimum solution for the MP3 bitrate condition of 32 kbps than for 64 kbps. Moreover, for the MP3 bitrate conditions of 32 and 64 kbps in the process of GA, the average detection rate after compression by AAC with bitrates of 64 to 128 kbps was 93.71% and 84.62%, respectively, and that after compression by WMA with bitrates of 64 to 128 kbps was 98.33% and 95.07%, respectively. As shown in Table 5, SNR$_{seg}$ after embedding the watermark under the condition of the near-optimum solution obtained by the proposed method was 69.7 to 78.6 dB. These values suggest that noise due to the watermarking was difficult to perceive.
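A segmental SNR of the kind reported here can be sketched as below. The sketch uses the common definition (the mean over frames of ten times the base-10 logarithm of signal energy over noise energy) and is an assumption in that the paper's own equation is not reproduced in the text. Frames with no change are excluded, as the text describes; skipping silent frames and dropping a trailing partial frame are additional assumptions, and the function name is hypothetical.

```python
import math


def segmental_snr(original, watermarked, frame_len):
    """Mean per-frame SNR in dB over frames that embedding actually changed."""
    frame_snrs = []
    for start in range(0, len(original) - frame_len + 1, frame_len):
        signal_energy = 0.0
        noise_energy = 0.0
        for y, y_marked in zip(original[start:start + frame_len],
                               watermarked[start:start + frame_len]):
            signal_energy += y * y
            noise_energy += (y - y_marked) ** 2
        if noise_energy == 0.0:
            continue  # unchanged frame: excluded from the average
        if signal_energy == 0.0:
            continue  # silent frame: SNR undefined, skipped (assumption)
        frame_snrs.append(10.0 * math.log10(signal_energy / noise_energy))
    if not frame_snrs:
        return None
    return sum(frame_snrs) / len(frame_snrs)
```

At 44.1 kHz, a 209 ms frame corresponds to roughly 9,217 samples, so a 10-sec clip yields about 47 full frames.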

Experiment 2
In this subsubsection, to examine the performance of the proposed method in Experiment 2, the near-optimum solutions for the original problem $P$ and the partial problem $P'$ are compared with solutions generated at random for the original problem $P$ and the partial problem $P'$, with respect to the detection rates of the watermarks after MP3 compression and the errors of the watermarking. Next, the times to obtain the near-optimum solutions for the original problem $P$ and the partial problem $P'$ are shown to check the practicability of the proposed method.
As shown in Figures 14 to 18, GA successfully found a good solution, considered to be the near-optimum solution, in each case for the original problem $P$ and the partial problem $P'$. Moreover, the near-optimum solution for the original problem $P$ had better performance as a condition for embedding watermarks than that for the partial problem $P'$. However, the time to obtain the near-optimum solution for the original problem $P$ was approximately 2 to 200 times that for the partial problem $P'$ (Table 6). Practically, the watermarking decided by a near-optimum solution for the partial problem $P'$ is recommended. In addition, an initial solution for the partial problem $P'$ can be considered to be a secret key. Therefore, the partial problem $P'$ has an advantage over the original problem $P$ from the viewpoint of watermarking security.

Comparison with Another Technique
To make the technical level of the proposed method clear, we compared the results of the proposed method with those of our previously reported method [17,18]. In our reported study, another optimal watermarking problem was formulated using the discrete Fourier transform and a watermarking method proposed in the reported study [19]. In our reported study, a string composed of 92 randomly generated bits was used as the watermark information, the same clips as used in the present study were used, and SNR$_{seg}$ was measured under the same condition as that used in the present study. Table 7 shows the tolerance of the watermark to compression when using our reported method. Comparing Table 7 with Table 3, it is clear that the proposed method had higher tolerance to compression by each of MP3, AAC, and WMA than our reported method. As shown in Table 8, SNR$_{seg}$ after embedding the watermark under the condition of the near-optimum solution obtained by our reported method was 36.8 to 52.2 dB. Although the amount of watermark information used in the present study was more than 4 times that of our reported study, the proposed method realized lower noise than our reported method (Tables 5 and 8).

Discussion for Practical Setup
It takes a long time to apply our technique to a song of 3 to 5 minutes as one clip. For practical usage of our approach, we will select some short clips of, for example, 10 seconds each. Then, we will apply our approach to each short clip. A methodology for the effective selection of short clips in a song is our next target.
It is difficult to model the compression attack in general. In this paper, the framework of watermark optimization is shown from the viewpoint of realizing good quality and high tolerance to MP3 compression. Because MP3 compression is one of the easiest attacks on a watermark, we have selected it as an example. In addition, the high tolerance of the watermark produced by the proposed method to each of AAC and WMA compression, which are representative audio compression techniques, is also shown in this paper.

Conclusions
A method for embedding a digital watermark using DWT and GA to realize high tolerance to compression by MP3 is proposed. The proposed method enables us to embed a digital watermark in a near-optimum manner for each music audio clip. Moreover, the near-optimum solutions for the original problem $P$ and the partial problem $P'$, and the initial pattern of the digital watermark for $P'$, are used as secret keys in extracting the digital watermark. The experimental results show that the proposed method generates watermarked audio of good quality and high tolerance to MP3 compression.

Figure 1. Multi-resolution analysis by the DWT.

Figure 2. Histogram of the wavelet coefficients of an MRR sequence at level 3 (jazz).

Figure 3. Schematic diagram of the histogram of MRR wavelet coefficients.

Figure 6. Schematic diagram of a chromosome structure in Experiment 1 for P'.

Figure 7. Schematic diagram of a chromosome structure in Experiment 2 for P.

Figure 8. Schematic diagram of a chromosome structure in Experiment 2 for P'.

Figure 9. Comparison of fitness function value between the near-optimum solution and the 30 feasible solutions generated at random (Experiment 1, music file: classical). Left figure: MP3 bit rate, 32 kbps. Right figure: MP3 bit rate, 64 kbps.

Figure 10. Comparison of fitness function value between the near-optimum solution and the 30 feasible solutions generated at random (Experiment 1, music file: jazz). Left figure: MP3 bit rate, 32 kbps. Right figure: MP3 bit rate, 64 kbps.

Figure 11. Comparison of fitness function value between the near-optimum solution and the 30 feasible solutions generated at random (Experiment 1, music file: popular). Left figure: MP3 bit rate, 32 kbps. Right figure: MP3 bit rate, 64 kbps.

Figure 12. Comparison of fitness function value between the near-optimum solution and the 30 feasible solutions generated at random (Experiment 1, music file: rock). Left figure: MP3 bit rate, 32 kbps. Right figure: MP3 bit rate, 64 kbps.

Figure 13. Comparison of fitness function value between the near-optimum solution and the 30 feasible solutions generated at random (Experiment 1, music file: hiphop). Left figure: MP3 bit rate, 32 kbps. Right figure: MP3 bit rate, 64 kbps.

Figure 14. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file: classical).

Figure 15. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file: jazz).

Figure 16. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file: popular).

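Figure 17. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file: rock).

Figure 18. Comparison between the near-optimum solutions and the solutions generated at random (Experiment 2, music file: hiphop).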

Table 1. Conditions of the GA strategy.