Image Retrieval Using Deep Convolutional Neural Networks and Regularized Locality Preserving Indexing Strategy

Convolutional Neural Networks (CNN) has been a very popular area in large scale data processing and many works have demonstrate that CNN is a very promising tool in many field, e.g., image classification and image retrieval. Theoretically, CNN features can become better and better with the increase of CNN layers. But on the other side more layers can dramatically increase the computational cost on the same condition of other devices. In addition to CNN features, how to dig out the potential information contained in the features is also an important aspect. In this paper, we propose a novel approach utilize deep CNN to extract image features and then introduce a Regularized Locality Preserving Indexing (RLPI) method which can make features more differentiated through learning a new space of the data space. First, we apply deep networks (VGG-net) to extract image features and then introduce Regularized Locality Preserving Indexing (RLPI) method to train a model. Finally, the new feature space can be generated through this model and then can be used to image retrieval.


Introduction
In traditional CBIR systems, low-level features such as the color, shape and texture features are usually extracted to construct a feature vector for describing images and then, based on a proper similarity measure, images are retrieved by comparing the feature vector corresponding to the query image and those corresponding to images in the data set.Generally, there are three key issues in CBIR systems, (1) selecting appropriate feature extraction method, (2) extracting appropriate image features and (3) matching features with effective method.
Many researchers devote most of their attention to the first issue.However, they usually fail to extract the internal structure contained in the features which is crucial for distinguishing data points.In our paper, we aims to find this internal structure from the original data space.Moreover, the convolutional neural network has been developing rapidly since 2012 when Krizhevsky et al. won the championship on the classification of the Image Net based on CNN [1].Thus, it is a good tool for us to extract more abstract image features.Recently, some works prefer to combine CNN with hash code and a few hashing methods are proposed to measure the relevance between the query and the database.Hash methods can be divided into two categories: supervised hash and unsupervised hash.Since a pair-wise similarity matrix is needed for hash methods, the storage and computational cost can dramatically increase especially for the case when the database is very large.In order to resolve such a problem, we propose a method which utilize the abstract features of CNN and the efficiency of RLPI.Indeed, how to dig out the potential information contained in these features is another critical issue.We believe that there is a certain internal structural link between similar features.Thus, our main purpose is to find out this link and RLPI is a good choice in helping us with this research [2].RLPI is fundamentally based on LPI which is proposed to find out the discriminative inner structure of the document space.However, in out paper, we apply it to the image feature space and get a good performance.

VGG-Net
As mentioned before, we utilize the deep CNN for extracting abstract features from images.In our work, we utilize a VGG-net model with 5 nets for our image retrieval purpose.In here, these five nets are represented with five alphabets from A to E, respectively.The width of convolution layers starts from 64 in the first layer and then increases by a factor of 2 after each max-polling layer, until it reaches 512 and then maintains.In addition to convolution layers, there are five max-polling layers.Although VGG-net contains five nets, the convolution layers and the pooling layers in these five nets have the same parametric settings.This strategy can make sure that the shape comes out of each convolution layer group is consistent, no matter how many convolution layers are added in the convolution group.
Many studies demonstrate that deeper networks can achieve better performance.However, training deeper networks not only dramatically increases the computational requirements but also needs stringent hardware support.In our work, we utilize VGG-net model to extract image features.In order to implement this network with moderate computing requirements, each image is rescaled to the same size of 224 × 224, which is then represented with a vector of

Locality Preserving Indexing
LPI is proposed to find out the discriminative inner structure of the document space and extract the most discriminative features hidden in the data space.
Given a set of data points and a similarity matrix.Then LPI can be obtained through solving the following minimization problem: where is a diagonal matrix and its entries are column sums of W .
The W can be constructed as: where ( ) As the objective function will generate a heavy penalty if neighboring data points i x and j x are mapped far apart.Thus, to get the objective function minimization is an attempt to ensure that if i x and j x are close then y a x = are close as well [3].Then, the basis function of LPI are the eigenvectors associated with the smallest eigenvalues of the following generalized eigenproblem: Thus, the minimization problem in Equation ( 1) can be changed to the following problem: and the optimal a 's are also the maximum eigenvectors of eigen-problem: However, according to [2], the high computational cost limits the application of LPI on large datasets.RLPI aims to solve this drawback through solving the eigen-problem in Equation ( 5) efficiently.

Regularized Locality Preserving Indexing
The following theorem can be used to solve the eigen-problem in Equation (5) efficiently: Let y be the eigenvector of eigen-problem: with eigenvalue λ .If T X a y = , and a is the eigenvector of eigen-problem in Equation ( 5) with the same eigenvalue λ [2].
Based on this theorem, the direct computation of the eigen-problem in Equation ( 5) can be avoided and the LPI basis function can be acquired through the following two steps: 1) Solve the eigen-problem in Equation ( 6) to get y .
2) Find a which can satisfy the equation T X a y = and a possible method to find a which can best fit the following equation in the least squares sense: where i y is the i -th element of y .A penalty can be imposed on the norm of a to avoid infinite many solutions for the linear equations system T X a y = through the following method: where 0 α ≥ is a parameter to control the amounts of shrinkage.This is the regularization which can be well studied in [4].Thus, we called this method as Regularized Locality Preserving Indexing (RLPI).Let , and the RLPI embedding is as follows: where z is a d -dimensional new representation of the image x and this is our main purpose.From this point of view, RLPI is also a dimensionality reduction method.A is the transformation matrix.When we perform image retrieval, we can use this new feature representation to retrieval the similar images in the database through cosine distance.

Experiments Results
In order to evaluate the performance of our proposed method, experiments are conducted on Caltech-256 dataset.For the purpose of comparisons, results from other methods are also presented.

Image Datasets
Caltech-256 dataset contains 29780 images in 256 categories.We select images from the first 70 classes of the caltech-256 dataset to construct a smaller dataset (referred to as Caltch-70 here and after) consisting of 7674 images for our experiment.We select 500 images randomly as the queries and the remaining as the targets for search.

Quantitative Evaluation
Three measures are used to evaluate the performance of different algorithms.
The first one is the precision defined as: where correct N is the number of correct relevant returns among the K retrieved images.
The precision tells us the rate of relevant images in total retrieved images in a particular search.However, sometimes, we want to get more relevant images from the database rather than just a very high precision.Thus, the recall is also an important measure of the performance of different algorithms.We define the second measure recall as: where relevant N is the total number of relevant images in the dataset.

A. Comparisons with Hash Type Methods
In this section, we compare our proposed method (VGG-RLPI) with many hash type methods [5] [6] [7].In [6], Iterative Quantization (ITQ) and PCA random rotation (PCA-RR) methods are proposed which represents state-of-art in 2013.And many classic hash type methods such as locality sensitive hashing (LSH), PCA hash (PCAH) and SKLSH [7] are also in comparison with our proposed method and from the results, we can find that our proposed method is superior to these methods.
Figure 1 shows the retrieval precision and recall of different methods averaged among 500 queries in the Caltech-70 dataset.For the purpose of comparisons, results from other hash type methods are also presented.In these methods, features are extracted with the VGG-net.From the curves in this figure we can see that our proposed method (VGG-RLPI) performs the best among others.

B. Comparisons with Dimension Reduction Methods
As has mentioned above, the RLPI is also a dimension reduction method.Thus, comparisons are also performed with two other dimension reduction methods: the Principle Component Analysis (PCA) method and the Linear Graph Embedding (LGE) method.For fair comparisons, retrieval experiments are first performed with reduced feature vectors obtained from these three methods to determine the optimal dimension.Figure 2 gives the mean precision of the three methods at different number of returns in the Caltech-70 dataset.From the figure, we can see that the optimal dimension is 100 both for LGE and RLPI.However, for PCA method, the curves almost coincide when feature dimension exceed 50.Thus, for the following comparison, we select the precision corresponding to 100 dimension for the three dimension reduction methods.From the results in Figure 3 we can find that our proposed method superior to the other two ones.

Conclusion
In this paper, a novel method utilizing the deep CNN and the RLPI is proposed for image retrieval.Since the CNN features have both abstract and global properties in FC-layers, it can well represent an image and also has a good discrimination ability in both classification and information retrieval tasks.However, using the features extracted from CNN to perform pattern matching directly is inefficient.On the other hand, the RLPI can learn a new feature space which is