
The aim of this work is the development of a steganographic technique for the MP3 audio format, based on the Peak Shaped Model algorithm used for JPEG images. The proposed method relies on the statistical properties of MP3 samples, which are compressed by a Modified Discrete Cosine Transform (MDCT). After the MP3 conversion, secret information can be hidden by replacing the least significant bit of the MDCT coefficients. These coefficients are chosen according to the statistical relevance of each coefficient within the distribution. The performance analysis was carried out by calculating three steganographic parameters: the Embedding Capacity, the Embedding Efficiency and the PSNR. An attack based on the Chi-Square test was also simulated, and the results were used to plot the ROC curve in order to calculate the error probability. The performance was compared with that of other existing techniques, showing a higher embedding capacity than the most relevant MP3 steganography techniques and a PSNR of 58.21 dB.

Steganography techniques are used to hide secret information in the most common audio/video formats. There are three main kinds of audio/video steganography [

1) insertion steganography, where the secret message is inserted in the cover object;

2) substitution steganography, where some bits of the cover object are substituted with the bits of the secret message;

3) constructing steganography, where an ad hoc cover object is generated to contain the secret message.

The developed technique is based on LSB steganography, a form of substitution steganography that replaces the least significant bit of the audio/video file with a bit of the secret message. This method is very simple to implement and does not allow the human eye or ear to perceive significant changes in the stego object. In

However, this technique has low resistance to statistical attacks, since proper steganalysis can detect the secret information. To solve this problem, Model Based Steganography can be used [

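After the embedding process, the modifiable part of the cover object, $x_{\beta}$, contains the secret information and is called $x'_{\beta}$. The union of this part and the unmodified part $x_{\alpha}$ is the stego object.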

The purpose of this paper is to present a new steganographic algorithm for the MP3 format [3-5], based on a modification of the Peak Shaped Based algorithm for JPEG [

The MP3 format was designed to combine good audio quality with a small file size [3,4]. An audio file is first converted into a digital format by sampling, and is then processed with the human psychoacoustic model [3-5].

With this model it is possible to remove the frequencies that the human ear cannot hear; an algorithm with this property is called “lossy” because it discards some information. The compression uses the Modified Discrete Cosine Transform (MDCT) [3,5], described by the following equation:

$$X(k) = \sum_{n=0}^{2M-1} w(n)\,x(n)\cos\left[\frac{\pi}{M}\left(n + n_0\right)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, 1, \dots, M-1$$

which is related to the DCT-II used in the JPEG format, where:

$$n_0 = \frac{M + 1}{2}$$

and $w(n)$ is a window (different kinds of windows can be chosen).
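As an illustration, the MDCT of a single block can be computed directly from its standard definition. The sketch below is a minimal, unoptimized version written for this text: the function names `mdct` and `sine_window` are hypothetical, the sine window is only one possible choice of window, and a real MP3 encoder uses a fast algorithm together with a polyphase filterbank that is not modelled here.

```python
import math


def sine_window(length):
    """Sine window: one common choice of window for the MDCT (an assumption here)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window=None):
    """Direct O(M^2) MDCT: maps a block of 2M samples to M coefficients."""
    two_m = len(block)
    if two_m == 0 or two_m % 2:
        raise ValueError("block length must be a positive even number (2M samples)")
    m = two_m // 2
    if window is None:
        window = sine_window(two_m)
    n0 = (m + 1) / 2  # phase term of the standard MDCT definition
    return [
        sum(
            window[n] * block[n] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
            for n in range(two_m)
        )
        for k in range(m)
    ]
```

A block of 2M samples yields M coefficients, so consecutive blocks are normally overlapped by M samples before the transform is applied.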

The audio samples are processed by an MDCT filterbank. The audio sequence is divided into frames; each frame contains M samples and is processed as in

The Peak Shaped Based (PSB) [

LSB steganography;

Least significant bit;

Model Based steganography [

The JPEG coefficients are first divided by group, indicated with g(b) [

and by offset, indicated with O(b) [

where b is the JPEG coefficient and .

The PSB algorithm is based on an assumption derived from the statistical distribution of the JPEG coefficients processed by the F5 algorithm, which is [

where h(b) indicates the histogram of the coefficient b. With this assumption it is possible to calculate a probability, called the “offset probability”.

Subsequently the coefficients are processed with the “discrepancy”, an operator that measures the statistical dependence between two neighbouring coefficients. It is defined as follows:

where [

where, as shown in ^{th}. Each block contains 64 (8×8) DCT-II coefficients, and the sequence of these blocks composes the JPEG image.

To apply the PSB algorithm to the MP3 format it is necessary to study the differences between this format and the JPEG standard, in order to identify possible changes. These differences are:

JPEG uses the DCT-II, while MP3 uses the MDCT;

JPEG works on blocks, each of which is an 8×8 matrix, whereas MP3 works on frames, each of which is a 1×1152 vector;

the PSB is based on an assumption derived from the F5 algorithm, which is used for the JPEG format.

Concerning the first point, it is possible to notice that the PSB works on the coefficients. It is therefore necessary to demonstrate that the DCT-II and the MDCT statistical distributions have the same properties.

Concerning the second point, it is necessary to identify the PSB operations that work on matrices and, in order to apply them to the MP3 format, transform them into operations that work on vectors.

Concerning the third point, the statistical distribution of the MP3 coefficients after the F5 algorithm must be studied, in order to determine whether it has the same properties as the JPEG distribution processed by F5.

The JPEG discrepancy works on matrices, as described in paragraph 3.1. It must be modified so that it can operate on vectors. Considering the MP3 format, and the MP3 frames, it is possible to call one of them as. The discrepancy works on the previous frame and the subsequent one. The frames are shown in

where is the same as in the JPEG discrepancy.

The statistical distribution of the MP3 coefficients, after the compression, is Peak Shaped [

as shown in

Setting the parameter r to 0.5 gives the best approximation of the statistical distribution. Choosing r = 1, which corresponds to the Laplacian distribution, gives a good approximation with lower complexity.

The Laplacian distribution is also used to approximate the statistical distribution of the JPEG coefficients [8-10].
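The role of the shape parameter r can be illustrated with a short sketch. It assumes that the peak-shaped model is the generalized Gaussian family, which is consistent with r = 1 giving the Laplacian distribution; the function names and the `scale` parameter are hypothetical and are not taken from the paper.

```python
import math


def peak_shaped_pdf(x, r, scale=1.0):
    """Generalized Gaussian density (assumed model) with shape r and scale `scale`."""
    norm = r / (2.0 * scale * math.gamma(1.0 / r))
    return norm * math.exp(-((abs(x) / scale) ** r))


def laplacian_pdf(x, scale=1.0):
    """Laplacian density, the r = 1 member of the family above."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)
```

Under this assumption, evaluating `peak_shaped_pdf(x, 1.0, scale)` and `laplacian_pdf(x, scale)` at the same point gives the same value, while r = 0.5 produces heavier tails for the same scale.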

Using the previous considerations, it is possible to apply the Peak Shaped Based steganography to the MP3 format. In fact it is possible to utilize the assumption used by this algorithm from the F5 [

Since the coefficients of both formats, that is, of both transforms, have the same statistical distribution, the use of different transforms is irrelevant to the development of the algorithm.

The steps of the embedding process are listed below:

first, the MP3 statistical distribution is analyzed;

next, the vector Hg, which contains the histograms of the MP3 coefficients, is calculated;

the Hg values make it possible to exclude the samples that are statistically most significant;

the Hg values allow the calculation, with the algorithm shown in

each frame of the MP3 file, which contains 576 coefficients, is taken and analyzed;

the coefficients b are divided by group, with g(b) (4), and by offset, with O(b) (5);

the discrepancy is calculated by means of Equation (9);

with the discrepancy, the vector P and a PRNG, according to the secret key, the coefficients that will contain the stego message are determined;

the offset of each coefficient that can be modified is changed according to the value of the bit of the secret message, as shown in

To extract the secret message, the embedding process must be repeated to determine the coefficients that were modified. The analysis of each offset allows the reconstruction of the secret message.
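The symmetry between embedding and extraction can be sketched with a deliberately simplified scheme. This is not the PSB algorithm itself: the discrepancy and the offset probabilities are omitted, the rule "use only coefficients with |b| >= 2" is an assumption standing in for the exclusion of statistically significant samples, and the names `usable_positions`, `embed` and `extract` are hypothetical. It only shows how a key-driven PRNG lets the receiver find the same coefficients and read the message back from their offsets.

```python
import random


def usable_positions(coeffs, key):
    """Key-dependent pseudo-random ordering of the coefficients with |b| >= 2.

    The threshold is an illustrative assumption; it is chosen so that
    embedding never moves a coefficient in or out of the usable set.
    """
    positions = [i for i, b in enumerate(coeffs) if abs(b) >= 2]
    random.Random(key).shuffle(positions)
    return positions


def embed(coeffs, bits, key):
    """Write each message bit into the LSB of |b| of a selected coefficient."""
    positions = usable_positions(coeffs, key)
    if len(bits) > len(positions):
        raise ValueError("message too long for this cover")
    stego = list(coeffs)
    for pos, bit in zip(positions, bits):
        b = stego[pos]
        magnitude = (abs(b) & ~1) | bit  # overwrite the offset, keep the group
        stego[pos] = magnitude if b > 0 else -magnitude
    return stego


def extract(stego, n_bits, key):
    """Repeat the selection with the same key and read the offsets back."""
    positions = usable_positions(stego, key)
    return [abs(stego[pos]) & 1 for pos in positions[:n_bits]]
```

Because the selection depends only on the key and on which coefficients are usable, and embedding never changes that set, `extract(embed(cover, bits, key), len(bits), key)` returns the original bits.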

To analyze the performance of a steganographic technique, three parameters are used:

Embedding capacity;

PSNR;

Embedding efficiency.

The embedding capacity (EC) [7,10] indicates the maximum data size that can be hidden in the cover object. It is defined as follows:

The Peak Signal-to-Noise Ratio (PSNR) [

The Embedding Efficiency (EE) [2,12] indicates the average number of bits embedded per change. It is defined as follows:

To evaluate the performance of the PSB for MP3, it is necessary to compare its Efficiency, Capacity and PSNR with those of different kinds of steganographic algorithms and with those of the original PSB, which was implemented for the JPEG format.
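The three parameters can be computed with a few lines of code. The sketch below follows the verbal definitions given above under stated assumptions: the capacity is expressed as a percentage of the cover size, the efficiency as embedded bits per modified coefficient, and the PSNR uses the usual mean-squared-error form with a caller-supplied peak value. The function and parameter names are hypothetical.

```python
import math


def embedding_capacity(secret_bits, cover_bits):
    """Payload size as a percentage of the cover size (assumed definition)."""
    return 100.0 * secret_bits / cover_bits


def embedding_efficiency(embedded_bits, changes):
    """Average number of message bits embedded per modified coefficient."""
    return embedded_bits / changes


def psnr(cover, stego, peak):
    """PSNR in dB between two equal-length sample sequences."""
    if len(cover) != len(stego) or not cover:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((c - s) ** 2 for c, s in zip(cover, stego)) / len(cover)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10.0 * math.log10(peak ** 2 / mse)
```

The value of `peak` depends on the sample representation; for 16-bit PCM audio a natural choice would be 32767, which is an assumption and not a value stated in the text.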

The embedding capacity and efficiency for the Model Based Steganography for the JPEG format are shown in

The PSNR is evaluated for the PSB algorithm for the JPEG format, as shown in

Shaped Steganography used for the JPEG format.

To calculate the Embedding Capacity for the MP3-PSB steganography it is necessary to calculate the file size as follows:

A new variable L_{s} is defined as the secret message length. The Capacity is then:

and the results are shown in

The PSNR is evaluated as described in Equation (11) and the results are shown in

The Efficiency is calculated by analyzing the number of changes to insert the secret message in the cover object. It is possible to see the performance in

In Tables 7 and 8 the PSNR and the Capacity are illustrated for different kinds of steganographic algorithms for the MP3 format.

With the steganalysis [

and the second one with the symbol . The values of these probabilities depend on a threshold, called, that modifies the accuracy of the steganalysis system.

With these probabilities it is possible to calculate other parameters, like the detection probability, , and the error probability :

A steganalytic technique that can be used against PSB-MP3 is the Chi-Square test [

One method to calculate the chi-square test is the Zhang-Ping attack [

and if it will calculate the chi-square values:

to compare with the threshold.
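For illustration only, the sketch below computes the classic pairs-of-values chi-square statistic used against LSB replacement. It is not the Zhang-Ping variant referred to above, whose details are not reproduced here; the pairing of the values (2k, 2k+1), the use of absolute values and the function name `pairs_chi_square` are assumptions made for this example.

```python
from collections import Counter


def pairs_chi_square(values):
    """Chi-square statistic over the value pairs (2k, 2k+1).

    Returns the statistic and the degrees of freedom. Absolute values are
    used so that positive and negative coefficients share a histogram
    (an assumption of this sketch).
    """
    hist = Counter(abs(v) for v in values)
    chi2 = 0.0
    categories = 0
    for even in range(0, max(hist, default=0) + 1, 2):
        expected = (hist[even] + hist[even + 1]) / 2.0
        if expected > 0:
            chi2 += (hist[even] - expected) ** 2 / expected
            categories += 1
    return chi2, max(categories - 1, 0)
```

In the classic test, a statistic close to zero means the two values of each pair occur almost equally often, which is the signature of LSB replacement; the statistic is then compared with a threshold to decide whether a file is flagged.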

This analysis of the PSB performance is carried out on a random set of MP3 files. By comparing the Chi-Square value with the threshold, it is possible to calculate the probabilities of false alarm and missed detection, shown in

With these two probabilities it is possible to plot the ROC curve. This curve indicates the effectiveness of the steganalytic method. If the curve is close to the bisector of the first quadrant, the steganographic algorithm is very strong; otherwise the steganalytic method is effective.
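The construction of the ROC points can be sketched as follows. The code assumes a detector that flags a file as stego when its score exceeds the threshold, and it assumes equal prior probabilities for cover and stego files, so that the error probability is the average of the false alarm and missed detection probabilities. The function name `roc_points` and the score lists are hypothetical.

```python
def roc_points(cover_scores, stego_scores, thresholds):
    """Return (threshold, P_FA, P_MD, P_E) for each threshold.

    P_FA: fraction of cover files flagged as stego (score > threshold).
    P_MD: fraction of stego files not flagged (score <= threshold).
    P_E:  (P_FA + P_MD) / 2, assuming equal priors; P_D is 1 - P_MD.
    """
    points = []
    for t in thresholds:
        p_fa = sum(s > t for s in cover_scores) / len(cover_scores)
        p_md = sum(s <= t for s in stego_scores) / len(stego_scores)
        points.append((t, p_fa, p_md, (p_fa + p_md) / 2.0))
    return points
```

Plotting 1 - P_MD against P_FA over all thresholds gives the ROC curve. When cover and stego scores are distributed identically, every point falls on the bisector and the error probability is 0.5, which corresponds to a random guess.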

In

A new steganographic algorithm for the MP3 format has been developed by changing the discrepancy equation of the Peak Shaped Based algorithm for JPEG, adapting it to vectors, and by studying the statistical distribution of the MDCT coefficients. The analysis of the performance of this algorithm showed that the method does not introduce audible distortion when the audio signal is reproduced. Furthermore, the method does not create relevant statistical differences in the distribution of the samples.

The Peak Shaped Based algorithm for MP3 has a high capacity compared to the other algorithms and a good PSNR. The mean Embedding Capacity is equal to 12.75%, higher than that of the most relevant techniques used for MP3 steganography; the PSNR is equal to 58.21 dB, higher than that of the PSB for JPEG.

A steganalytic attack was simulated to evaluate the robustness of the algorithm. The attack is based on the Zhang-Ping analysis and on the Chi-Square test; since it was created for JSteg steganography, it was adapted to PSB-MP3. By calculating the false alarm probability and the missed detection probability it is possible to draw the ROC curve. The analysis of this curve shows that the attack is not effective against this steganographic method, because the ROC curve lies almost on the bisector. The error probability, calculated from the ROC curve, tends to 0.5 as the threshold increases; at this value the detector's decision is completely random.

As assumed, the implemented steganographic algorithm is resistant to statistical attacks.