
Nigeria was listed among terrorist states by the United States of America as a result of attacks by the Islamic group (Boko Haram sect) and its other activities in the nation. It has also been discovered that the group employs "steganographic" schemes as a secure means of transmitting hidden information to each other via the Internet and social networks. The group has killed thousands of people since its increased insurgency in July 2009. These challenges have affected the nation's foreign policy and its political and socio-economic development. This research addresses these challenges by employing a forensic technique, a blind steganalysis approach, to detect the presence of hidden messages in images. Image Quality Metrics are employed for extracting the features, and logistic regression is trained as the classifier to predict the stego-images. We show the effectiveness of the method by conducting tests and analysis with 319 images varying in size and style. The results show that the performance of the method is better than that of other steganalysis methods.

The awareness of insecurity in Nigeria has placed an increasing focus on the need for the security of lives and property. According to [

The art of discovering the existence of steganographic data or a secret message in an object is called steganalysis. It also refers to the body of techniques designed to distinguish between cover-objects and stego-objects [

This paper presents an efficient method for detecting steganographic data by employing Image Quality Metrics (IQM) for feature extraction and logistic regression analysis for classification. One advantage of this work is that it provides a solution to some steganalysis problems by testing the algorithm against payload stego-images and various categories of images, such as animals, fruits and natural scenes. The system is also able to detect the presence of a hidden message in a cover signal. Another advantage is that the system can accurately predict any suspected image irrespective of the algorithm used in the embedding process, because the system is trained with different embedding algorithms, for example LSB and F5.

Research on steganalysis started in the late 1990s. The idea of using a trained classifier to detect data hiding was first introduced in a paper by [

In [

A dual-statistics steganalytic method for the detection of LSB embedding in uncompressed formats was introduced in [

A universal blind detection scheme that can be applied to any steganographic scheme after proper training on databases of original and stego-images was introduced in [

A blind steganalysis method was presented in [

Another blind steganalysis method with a high detection ratio was proposed based on best wavelet packet decomposition. However, methods based on wavelet higher-order statistics could not perform very well on spatial-domain steganography such as LSB steganography [

Contourlet Based Steganalysis (CBS) was presented in [

Modern steganography techniques place embedding changes in those regions of images that are hard to model; hence, increasingly complex statistical descriptors of covers are required to capture the large number of dependencies among cover elements that might be disturbed by embedding.

Much work has been done in the literature, but the need for a better and more efficient method, in terms of a high prediction rate, makes further development of steganalysis necessary. Therefore, this paper adopts IQM as the feature extraction technique.

319 images were tested and analyzed. Messages were embedded into 169 grayscale images using four known steganography tools, VLS (Virtual Laboratory Steganography), SecretLayer, QuickStego and OpenStego, with different embedding algorithms, which enables our system to learn and accurately predict any suspected image irrespective of the algorithm used in its embedding process. Thereafter, feature extraction took place, and logistic regression was trained as the classifier to predict the stego-images. The table in the appendix shows the data analysis with the IQM functions used for the images.
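The training step described above can be sketched as follows. This is a toy illustration only, with synthetic feature vectors standing in for the real IQM features (the study itself trained the classifier in SPSS); the learning-rate and iteration count are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the extracted IQM feature vectors:
# rows = images, columns = quality measures. Real features would
# come from comparing each image with a filtered version of itself.
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # placeholder labels

# Train a logistic regression classifier by gradient descent on log-loss.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid hypothesis
    grad_w = X.T @ (p - y) / len(y)         # gradient w.r.t. weights
    grad_b = np.mean(p - y)                 # gradient w.r.t. intercept
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# Classify: probability >= 0.5 means "stego" (label 1).
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) >= 0.5).astype(float)
accuracy = np.mean(pred == y)  # training accuracy on the toy data
```

In the actual system, each row of `X` would hold the IQM values (SC, MSE, AD, PSNR, MD, NAE, SD, etc.) computed for one image, and `y` would mark it as cover (0) or stego (1).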

The training process block diagram is as represented in

In the block diagram, the training set consists of the features extracted using the image quality measures (IQM). X is the set of input features to the hypothesis, that is, to the predicting equation, while the output Y is the result generated from the predicting equation based on the input features. The result in this case is either 1, representing a stego-image, or 0, representing a cover image.

Let x(i) denote the input variable, i.e. the extracted features;

Let y(i) denote the output (target) variable;

y can only take on two values, 0 and 1, that is, y ∈ {0, 1};

(x(i), y(i)) denotes a training example.

Logistic regression analysis is used on the selected features generated through the Image Quality Measures (IQM) to build an optimal classifier using a set of test images and an original image. The idea is that the distance between a cover image and its distorted version is less than the distance between a stego-image and its distorted version. That is,

d(C, C_{d}) < d(S, S_{d})

where,

C represents the cover image;

C_{d} represents the distortion of the cover image;

S represents a stego-image;

S_{d} represents the distortion of the stego-image.

The focus here is on the binary classification problem in which y can take on only two values, 0 (cover-image) and 1 (stego-image). 0 is also called the negative class, and 1 the positive class, and they are sometimes also denoted by the symbols “−” and “+”.

Let the predicting equation, that is, the hypothesis, be denoted h_{θ}(x), which is written as

h_{θ}(x) = 1/(1 + e^{−θ^{T}x})

where

θ^{T}x is the intercept from the linear regression equation added to the regression coefficients multiplied by the values of the predictors x.
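In code, the logistic hypothesis can be sketched as follows. This is a minimal illustration, not the authors' implementation; the coefficient and feature values below are placeholders.

```python
import math

def hypothesis(theta, x):
    """Logistic hypothesis h_theta(x) = 1 / (1 + e^(-theta^T x)).

    theta[0] is the intercept; the remaining entries are regression
    coefficients applied to the feature vector x (the IQM values).
    """
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder coefficients and features, for illustration only.
theta = [0.5, -1.2, 0.8]
x = [0.3, 0.7]
p = hypothesis(theta, x)        # probability that the image is a stego-image
label = 1 if p >= 0.5 else 0    # 1 = stego-image, 0 = cover image
```

Because the sigmoid maps any real z into (0, 1), the output can be read directly as a probability and thresholded at 0.5 to yield the binary class.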

The features extracted from 150 cover images and 169 stego-images were used to train a logistic regression classifier in SPSS.

Structural Content (SC) quantifies the closeness between two images in terms of a correlation function. Correlation-based measures assess the similarity between two images; in this sense they are complementary to the difference-based measures. SC is given by

SC = (Σ_{i=1}^{M} Σ_{j=1}^{N} x(i,j)^{2}) / (Σ_{i=1}^{M} Σ_{j=1}^{N} x̂(i,j)^{2})

where M and N are the dimensions of the image, x is the original image and x̂ is the test image.

MSE is the Mean Square Error, defined as

MSE = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} (x(i,j) − x̂(i,j))^{2}

where M and N are the dimensions of the image.

AD is the Average Difference, given by

AD = (1/MN) Σ_{i=1}^{M} Σ_{j=1}^{N} (x(i,j) − x̂(i,j))
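These measures can be computed directly with NumPy. The sketch below is our own, not the authors' code; `x` is the original image and `y` the test image, both 2-D grayscale arrays.

```python
import numpy as np

def sc(x, y):
    """Structural Content: ratio of the two images' energies."""
    return np.sum(x.astype(float) ** 2) / np.sum(y.astype(float) ** 2)

def mse(x, y):
    """Mean Square Error over an M x N image."""
    d = x.astype(float) - y.astype(float)
    return np.mean(d ** 2)

def ad(x, y):
    """Average Difference over an M x N image."""
    return np.mean(x.astype(float) - y.astype(float))

# Sanity check on identical images: SC = 1, MSE = 0, AD = 0.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
```

For identical images SC is exactly 1 and the difference-based measures are 0, which is why deviations from those values signal distortion, including distortion introduced by embedding.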

| Unweighted Cases | | N | Percentage |
|---|---|---|---|
| Selected Cases | Included in Analysis | 319 | 100.0 |
| | Missing Cases | 0 | 0.0 |
| | Total | 319 | 100.0 |
| Unselected Cases | | 0 | 0.0 |
| Total | | 319 | 100.0 |

| | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| MSR | −0.519 | 9850.349 | 0.000 | 1 | 10.000 | 0.595 |
| PSNR | −0.298 | 0.275 | 10.173 | 1 | 0.279 | 0.742 |
| MNC | 1230.551 | 930.104 | 10.761 | 1 | 0.185 | 40.543E53 |
| AD | −0.570 | 0.568 | 10.009 | 1 | 0.315 | 0.565 |
| SC | 900.810 | 520.868 | 20.950 | 1 | 0.086 | 20.742E39 |
| MD | 0.008 | 0.005 | 20.425 | 1 | 0.119 | 10.008 |
| NAE | 380.275 | 160.399 | 50.448 | 1 | 0.020 | 40.195E16 |
| SD | 0.516 | 9850.349 | 0.000 | 1 | 10.000 | 10.676 |
| CONSTANT | −2090.710 | 1440.044 | 20.120 | 1 | 0.145 | 0.000 |

| Step | −2 Log likelihood | Cox & Snell R Square | Nagelkerke R Square |
|---|---|---|---|
| 1 | 414.005^{a} | 0.081 | 0.109 |

PSNR is the Peak Signal-to-Noise Ratio. It is used to find the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. PSNR is commonly used to measure the quality of reconstruction of lossy compression codecs, especially for image compression. PSNR is given by

PSNR = 10 log_{10}(255^{2} / MSE)

MD is the Maximum Difference, given by

MD = max_{i,j} |x(i,j) − x̂(i,j)|

SD is the Spectral Distance, computed from the difference between the discrete Fourier transform coefficients of the two images, while NAE is the Normalized Absolute Error, given by

NAE = (Σ_{i=1}^{M} Σ_{j=1}^{N} |x(i,j) − x̂(i,j)|) / (Σ_{i=1}^{M} Σ_{j=1}^{N} |x(i,j)|)
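The remaining metrics can be sketched in the same style. Again this is our own illustration, assuming 2-D grayscale NumPy arrays and a peak value of 255; the spectral distance here compares DFT magnitudes, one common reading of that measure, since the paper does not give its exact formula.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB."""
    m = np.mean((x.astype(float) - y.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / m)

def md(x, y):
    """Maximum Difference between corresponding pixels."""
    return np.max(np.abs(x.astype(float) - y.astype(float)))

def nae(x, y):
    """Normalized Absolute Error."""
    x = x.astype(float)
    y = y.astype(float)
    return np.sum(np.abs(x - y)) / np.sum(np.abs(x))

def spectral_distance(x, y):
    """Spectral Distance as the mean difference of DFT magnitudes
    (one common formulation; an assumption on our part)."""
    return np.mean(np.abs(np.abs(np.fft.fft2(x)) - np.abs(np.fft.fft2(y))))

# Example pair: a flat image and the same image brightened by 10.
a = np.full((4, 4), 100.0)
b = a + 10.0
```

For this pair, MD is 10, NAE is 0.1, and PSNR is about 28.1 dB; the spectral distance between an image and itself is 0.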

a. Estimation terminated at iteration number 20 because the maximum number of iterations was reached. A final solution could not be found.

Hypothesis (general predicting equation)

The hypothesis generated from the trained data, with the coefficients B taken from the variables-in-the-equation table, is

h_{θ}(x) = 1/(1 + e^{−z}), where

z = −2090.710 − 0.519·MSR − 0.298·PSNR + 1230.551·MNC − 0.570·AD + 900.810·SC + 0.008·MD + 380.275·NAE + 0.516·SD

The above equation is the predictive equation.
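Using the fitted B coefficients from the variables-in-the-equation table, the classifier reduces to a single expression. The sketch below is ours; the feature values passed in would be the IQM measurements for a suspected image, and the clamp on z is our own guard against floating-point overflow.

```python
import math

# B coefficients taken from the variables-in-the-equation table.
COEF = {"MSR": -0.519, "PSNR": -0.298, "MNC": 1230.551, "AD": -0.570,
        "SC": 900.810, "MD": 0.008, "NAE": 380.275, "SD": 0.516}
INTERCEPT = -2090.710

def predict(features):
    """Return P(stego | features) under the fitted logistic model."""
    z = INTERCEPT + sum(COEF[name] * value for name, value in features.items())
    # Clamp z so math.exp cannot overflow for extreme feature values.
    z = max(min(z, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(-z))
```

An image is then flagged as a stego-image when `predict(features) >= 0.5`, i.e. when z is non-negative.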

The result after testing the system with 319 images is shown in

The result of the testing in

The result of this research work is compared with work done in [

The results in the table above show that the system implemented in this project has a higher prediction rate compared with WBS [

The approach in this method provides an easy method for steganalysis and robustness in terms of testing the system against different payload stego-images. We show the effectiveness of the method by conducting tests and analysis with 319 images varying in size and style. Messages were embedded into 169 grayscale images using four known steganography tools, VLS (Virtual Laboratory Steganography), SecretLayer, QuickStego and OpenStego, with different embedding algorithms, which enables our system to learn and accurately predict any suspected image irrespective of the algorithm used in its embedding process. Thereafter, feature extraction took place, and logistic regression was trained as the classifier to predict the stego-images. Finally, our method achieves a 58.9% detection rate despite being trained with few images (319) compared with existing methods that were trained with 12,200 images. This suggests that our method is more efficient.

The output of this paper is recommended to the Ministry of Defence to serve as part of the reference effort necessary in curbing the Boko Haram insurgency menace in the country.

| Observed Y | Predicted Y: 0 | Predicted Y: 1 | Percentage Correct |
|---|---|---|---|
| Step 1: 0 (cover) | 57 | 93 | 38.0 |
| Step 1: 1 (stego) | 38 | 131 | 77.5 |
| Overall Percentage | | | 58.9 |

| Secret Data Size (bits) | Steganalysis Method | Average Detection Accuracy (%) |
|---|---|---|
| 5000 | WBS | 51 |
| 5000 | FBS | 53 |
| 5000 | This Research | 58.9 |
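The overall detection rate follows directly from the classification table: 57 + 131 of the 319 images were classified correctly. A quick check:

```python
# Classification table: keys are (observed, predicted),
# with 0 = cover image and 1 = stego-image.
confusion = {(0, 0): 57, (0, 1): 93,
             (1, 0): 38, (1, 1): 131}

total = sum(confusion.values())                  # 319 images in total
correct = confusion[(0, 0)] + confusion[(1, 1)]  # 188 correct predictions
overall = 100.0 * correct / total
print(round(overall, 1))  # 58.9, matching the reported detection rate
```

The same arithmetic reproduces the per-class rates: 131/169 = 77.5% of stego-images and 57/150 = 38.0% of cover images were classified correctly.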

Owoeye Kolade, Ajayi Adedoyin Olayinka, Fadugba Sunday, Obayomi Adesoji, Isinkaye Folasade Olubusola (2015) Detection of Stego-Images in Communication among the Terrorist Boko-Haram Sect in Nigeria. Journal of Data Analysis and Information Processing, 03, 168-174. doi: 10.4236/jdaip.2015.34017