Content-Based Image Retrieval with Feature Extraction and Rotation Invariance
1. Introduction
To date, rotation-angle correction has not been applied to a pre-trained Convolutional Neural Network (CNN) to make image retrieval from a dataset rotation invariant. As a result, if the orientation of the query image changes, the returned images change with it, and retrieval is hampered by rotated images. Figure 1 illustrates the point. Making a CNN account for rotation angles calls for additional techniques, several of which are currently under development. One technique is to replace the CNN's pooling layer with a Spatial Transformer Module [1], which lets the network learn to spatially transform its input and thereby become rotation invariant. Unfortunately, no pre-trained or well-defined architecture is available for the spatial transformer solution. So, to make the retrieval system rotation invariant, a different approach is used here.
Rotation-invariant CBIR systems have received very little attention. Bharati et al. [2] suggested a method for extracting rotation-invariant curvelet features by analyzing the curvelet transform and performing the relevant derivations. Vandhana et al. [3] presented a method that uses a Speeded-Up Robust Features (SURF) detector to find prominent points. Milanese et al. [4] proposed a technique for obtaining an image signature from the Fourier power spectrum by mapping from Cartesian to logarithmic-polar coordinates, projecting this mapping onto two 1D signature vectors, and computing their power spectrum coefficients. Fountain et al. [5] used a Fourier expansion algorithm to propose a rotation-invariant CBIR system that manipulates the histogram of intensity gradients. Tzagkarakis et al. [6] proposed a method based on transforming texture data with a steerable pyramid. Chifa et al. [7] proposed an approach that applies circular masks of various sizes to a picture, extracts the color descriptor from the region visible through each mask, and merges the results. Krishnamoorthi et al. [8] proposed a technique that applies an orthogonal polynomials model to CBIR. It extracts surface features related to the gray-level differences and frequency bands of the image being analyzed, and the resulting surface feature vector is rotation invariant.
Figure 1. (a) ImageData1; (b) ImageData2.
From the related literature, it is evident that the models used could not predict the rotation angle of an image, and so could not measure how much accuracy improves once the orientation angle is corrected. In this research, a separate model is designed and trained to correct the image orientation.
The Problem
Used on its own, a pre-trained CNN is rotation variant: for a rotated query it still retrieves images from the given dataset, but the probability of confusing the query is high. In Figure 1(a), the ImageData1 query picture is correctly oriented and retrieval has an accuracy of 1. In Figure 1(b), the same query image is rotated 90 degrees anti-clockwise, and the retrieval of essentially identical images drops to a precision of 0.45. The technique is therefore to complement the pre-trained CNN by correcting the angle back to the proper orientation, so that the CNN sees the image as it would at an accuracy of 1 even though the image was rotated. This makes the rotation-variant pre-trained CNN behave as a rotation-invariant one.
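The text does not define how the precision value is computed. A minimal sketch, assuming it is the fraction of the top-k retrieved images that share the query's class label (the function name, the labels, and k = 20 are illustrative assumptions, not taken from the study):

```python
def precision_at_k(retrieved_labels, query_label, k):
    """Fraction of the top-k retrieved images whose label matches the query's."""
    top_k = retrieved_labels[:k]
    if not top_k:
        return 0.0
    relevant = sum(1 for label in top_k if label == query_label)
    return relevant / len(top_k)


# Hypothetical result list: 20 images returned for a "Buses" query,
# of which 9 really are buses.
results = ["Buses"] * 9 + ["Buildings"] * 11
print(precision_at_k(results, "Buses", k=20))  # 9 / 20 = 0.45
```

Under this assumption, a value of 1 means every returned image belongs to the query's group, and 0.45 means fewer than half do.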
2. Methodology
A single pre-trained Orientation Angle Detection (OAD) model is used to estimate the orientation angle, anywhere between 0 and 359 degrees, of every image in the collection. Paper 3 version 1 of the OAD model is used for this process, so any image in a dataset intended for content-based image retrieval may be rotated by an arbitrary angle. The images stored in the dataset are first rotated according to the model's predicted orientation angle. They are then processed through the pre-trained CBIR model, in this case InceptionResNetV2 [9], to extract their features. At retrieval time, the query image is processed through both models: first the Orientation Angle Detection model and then the CBIR model. A similarity measure is then computed from the query image's extracted features, and the results are returned. In this way, every skewed image is automatically re-oriented by the OAD model, yielding the same results as if the image had not been tilted at all. If the OAD model fails to compute the correct orientation angle because of its prediction error, the program detects the orientation of the image with respect to the perpendicular axis and straightens the image. If the prediction error is greater than 1, the image is adjusted to the left to align with the perpendicular axis. If it is less than 1, the image is adjusted to the right to align with the perpendicular axis. Figure 2 depicts a rotation-invariant CBIR system.
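The two-stage flow described above can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: `predict_angle`, `rotate`, and `extract` are hypothetical callables standing in for the OAD model, an image rotation routine, and the CBIR feature extractor, and cosine similarity is assumed as the similarity measure because the text does not name one.

```python
import math


def correct_orientation(image, predict_angle, rotate):
    """Undo the rotation estimated by the (hypothetical) OAD callable.

    Assumed convention: predict_angle returns the counter-clockwise rotation
    in degrees, and rotate(image, a) rotates counter-clockwise by a degrees.
    """
    angle = predict_angle(image) % 360
    return rotate(image, -angle)


def unit(vec):
    """Scale a feature vector to unit length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_feature_bank(images, predict_angle, rotate, extract):
    """Orientation-correct every dataset image, then store its features."""
    return [
        unit(extract(correct_orientation(img, predict_angle, rotate)))
        for img in images
    ]


def retrieve(query, bank, predict_angle, rotate, extract, top_k=5):
    """Return (index, cosine similarity) pairs for the top_k closest images."""
    q = unit(extract(correct_orientation(query, predict_angle, rotate)))
    scores = [sum(a * b for a, b in zip(q, feat)) for feat in bank]
    order = sorted(range(len(bank)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]
```

Because the query passes through the same orientation correction as the stored images, a rotated query and its upright version map to the same features whenever the angle estimate is correct.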
2.1. Datasets Used
The CBIR model was used on the image datasets below:
• ImageData1: A dataset of 200 images divided into four groups (Nature, People, Buses, and Buildings), with 50 images in each group.
• ImageData2: A dataset of 200 images divided into two groups (Buses and Buildings), with 100 images in each group.
2.2. OAD Model on CBIR System
To solve the difficulty indicated in Figure 1, the OAD model is designed to rotate the image. As seen in Figure 3, the OAD model corrects the image's orientation. The image after OAD correction in Figure 3(b) and the earlier image in Figure 1 are now closely comparable.
As a result, the system retrieved similar images and the accuracy rose from 0.45 to 1. The ImageData1 and ImageData2 datasets are compared to further justify the improvement. The objective is to apply an arbitrary rotation angle to n% of the images in a dataset and then extract the CBIR features. This is done by processing the images through the OAD model and then forwarding them to the CBIR model.
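The step of rotating n% of a dataset can be sketched as below. The function name, the fixed seed, and the choice of integer angles from 1 to 359 degrees are assumptions made for illustration; the text only states that an arbitrary angle is applied to n% of the images.

```python
import random


def assign_random_rotations(num_images, percent_rotated, seed=0):
    """Pick percent_rotated% of the image indices and give each a random angle.

    Returns a dict {image_index: angle_in_degrees}; indices that are absent
    from the dict are left in their original orientation.
    """
    rng = random.Random(seed)
    count = round(num_images * percent_rotated / 100)
    chosen = rng.sample(range(num_images), count)
    # Angle 0 is excluded because it would leave the image unrotated.
    return {idx: rng.randint(1, 359) for idx in chosen}


plan = assign_random_rotations(num_images=200, percent_rotated=10)
print(len(plan))  # 20 of the 200 images receive a random angle
```

Repeating this for several values of n, with and without the orientation correction, gives the kind of comparison that the experiment describes.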
Figure 4 depicts the benefit of combining the OAD and CBIR models on ImageData1 and ImageData2.
Table 1 above shows the result. Here,
(1)
The OAD model improves performance at different percentages of rotated images, as shown in Table 1(a) for ImageData1 and Table 1(b) for ImageData2, where each number is expressed as a percentage.
Thus, when the OAD model is combined with the CBIR model, the precision value increases, whereas when the CBIR model is used alone, the precision value decreases. As seen in Figure 4, using the OAD model considerably improves the accuracy of the CBIR system when the dataset contains a high number of rotated images, but has a negligible effect when the dataset images are correctly oriented. Additionally, in both datasets the crossover between negative and positive improvement lies at about 5% rotated images. Thus, if a database contains between 5% and 10% rotated images, the correctness level will be high when the OAD and CBIR models are combined. For large datasets, the proportion of rotated images can be estimated by inspecting a random sample.
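The sampling idea can be sketched as follows. Here `is_rotated` is a hypothetical check (for example, a manual inspection or a thresholded OAD prediction), and the 5% threshold is read from the crossover reported above rather than from a stated decision rule.

```python
import random


def estimate_rotated_fraction(images, is_rotated, sample_size, seed=0):
    """Estimate the share of rotated images from a simple random sample."""
    if not images:
        raise ValueError("the dataset is empty")
    rng = random.Random(seed)
    sample = rng.sample(images, min(sample_size, len(images)))
    return sum(1 for img in sample if is_rotated(img)) / len(sample)


def should_use_oad(rotated_fraction, threshold=0.05):
    """Assumed rule: apply the OAD model once the estimate reaches the threshold."""
    return rotated_fraction >= threshold
```

A larger sample gives a tighter estimate, so the sample size would be chosen according to how close the dataset is expected to be to the 5% crossover.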
Figure 2. Flowchart of a Rotation Invariant CBIR system.
Figure 3. (a) OAD query image; (b) OAD correction of angle rotation.
Figure 4. Correctness achieved when the OAD model is used on (a) ImageData1 and (b) ImageData2, for different percentages of rotated images.
Table 1. (a) Correctness without OAD Model. (b) Correctness with OAD Model.
3. Conclusions
By adding an intermediate deep learning model that corrects the rotation angle of any image, this study presented a novel construction of a rotation-invariant CBIR system that compensates for CNN features that are not rotation invariant. It also indicates that combining this extra correction model with the existing CBIR model had no noticeable impact on real-time image retrieval. The additional model considerably improved the results, although image retrieval time remains an open issue.
For further research, the pre-trained CNN scale-invariant system could be evaluated for real-time image retrieval on ImageData1 and ImageData2. The average query retrieval time may be calculated using the two image datasets. The datasets may be processed using the OAD and CBIR models, and the extracted features can then be kept in memory as a feature bank. The query image may be processed with the same models, and its extracted features matched against the feature bank. This would show whether the CBIR model, which is now rotation invariant, can retrieve images in real time.
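The proposed timing measurement can be sketched as below. `retrieve_fn` is a hypothetical callable that runs the full query path (orientation correction, feature extraction, and matching against the in-memory feature bank); no timings are reported in this study, so the sketch only shows how an average could be obtained.

```python
import time


def average_query_time(queries, retrieve_fn, repeats=1):
    """Mean wall-clock seconds per query over all queries and repeats."""
    if not queries or repeats < 1:
        raise ValueError("need at least one query and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for query in queries:
            retrieve_fn(query)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(queries))
```

Building the feature bank once, outside the timed loop, keeps the measurement focused on the per-query cost.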