Distributed Cluster Based 3D Model Retrieval with MapReduce

View-based 3D model retrieval methods have attracted intensive research attention because their features are highly expressive and stable. In this paper, bag-of-words (BOW) standardized SIFT features are extracted from three projection views of a 3D model, and a distributed K-means clustering algorithm on a Hadoop platform is then employed to compute feature vectors and cluster the 3D models. To obtain precise initial cluster centers, a Canopy algorithm based on the maximum and minimum principle is also presented. The similarity of models is determined by the distance between the query model and each cluster center, and the cluster nearest to the query model is returned as the retrieval result. Simulations indicate that the proposed method performs well in terms of both 3D model retrieval accuracy and retrieval time.


Introduction
With the development of 3D modeling technology and computing techniques, 3D models have been widely used in many fields such as virtual reality, 3D movies, 3D games, and computer aided design (CAD) [1]. The demand for high-quality 3D models is growing every day. However, constructing a new, complicated model is an extremely time-consuming process. If existing 3D models could be retrieved efficiently from a database, work efficiency in related fields would improve dramatically through model reuse. There are two different ways to search for 3D models in a database. One is the keyword-based method, and the other is the content-based method [2]. However, each method has some problems in retrieving a desired model from a given massive 3D model database.
For the keyword-based method, it is hard to describe all 3D models in the database correctly and uniformly because of the diversity of 3D model poses and the different ways in which people understand 3D models. For the content-based method, a feature vector must be extracted from each 3D model, and the similarity between 3D models is calculated from these features. However, extracting highly expressive features from a 3D model and reducing the processing time remain the main challenges. There are many kinds of content-based 3D model retrieval methods.
According to the type of feature vector, existing 3D model retrieval techniques can be classified into four categories: geometry-based [3], graph-based [4], view-based [5], and hybrid techniques [6]. In the geometry-based approach, 3D models are represented by their visual similarity, and the feature vector is extracted from the shape or topology of the 3D model. The graph-based approach attempts to extract a geometric meaning from a 3D model using a graph that shows how shape components are linked together. These two approaches can also be treated as one category, named model-based methods. In the view-based approach, 3D models are represented by a group of 2D views from different directions, so the views can contain the spatial and structural information of the 3D models. Because a 3D model can be represented by a set of multiple views, existing image processing methods can be employed.
The hybrid-techniques approach makes comprehensive use of the above methods.
In this paper, we focus on the view-based method. The view-based 3D model retrieval process can be defined as follows: given one query 3D model, find all similar objects in the 3D model database under the view-based representation. The view-based representation can be extracted from one or several views of a 3D model. Feature extraction is an important step in 3D model retrieval.
Recently, the bag-of-words (BOW) feature has been employed to represent 3D models. This method was first used in view-based 3D model retrieval by Takahiko et al. [7]. In this method, scale-invariant feature transform (SIFT) features are extracted from different views of a 3D model and used to generate a BOW feature. The distance between two BOW features is then measured to match 3D models. The BOW feature is robust to variations in occlusion, viewpoint, illumination, scale, and background.
In this paper, the BOW feature is employed to represent 3D models. The distributed K-means and Canopy algorithms are used to classify 3D models. In addition, the MapReduce process of the Hadoop platform is used to accelerate the feature clustering and the 3D model matching calculation.
The rest of the paper is organized as follows. Section 2 briefly reviews related work. The proposed 3D model retrieval algorithm is presented in Section 3. Experimental results and analysis are given in Section 4. Conclusions and discussion are given in Section 5.

Related Work
In this section, we briefly review work on view-based 3D model retrieval. Recently, view-based methods have attracted much more attention because they are independent of the 3D model representation and can be realized simply with a multi-view representation of models [8]. Typically, there are two key steps in this technology. First, feature vectors are extracted from one or more views of the 3D models. Then the feature vectors are employed to calculate the similarity of 3D models. Hang et al. [9] presented a convolutional neural network (CNN) architecture to recognize 3D models from multiple views. They found that a set of 2D views can be highly informative for 3D shape recognition. Sichen et al. [10] employed multigraph learning as a feature fusion method for view-based 3D model retrieval, in which the weights of each graph are optimized automatically. Gao et al. [11] presented a solution that jointly learns the visual features from multiple views of a 3D model and optimizes the model retrieval task. Machine learning based methods require a large number of labeled training samples. However, the number of training samples in 3D datasets is very small, so it is difficult to train a learning model. Tabia et al. [12] presented a new compact shape signature built from multiple vocabularies. They also proposed a mechanism for reducing the impact of vocabulary correlation and the signature size to benefit the view-based method. Cao et al. [13] extracted SIFT features to represent the visual appearance of the 2D view images of each 3D model, and then obtained a high-level, discriminative representation of each individual 3D model from the SIFT features via learning. Bai et al. [14] designed a real-time 3D shape search engine based on the projective images of 3D shapes. They employed a GPU to accelerate the projection and view feature extraction. The first inverted file is utilized to speed up the multi-view matching procedure.
In summary, obtaining highly expressive features and developing ways to accelerate the view-based 3D model retrieval task are worth further study.

View Feature Extraction and Standardization
3D models are composed of points, lines and planes in three-dimensional space. The views of a 3D model can therefore be obtained by projection onto the XOY, XOZ and YOZ planes of a spatial coordinate system. A projection sample of a 3D model is shown in Figure 1.
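The projection step can be illustrated with a minimal sketch. It assumes a model is given simply as a list of (x, y, z) vertices and performs an orthographic projection by dropping one coordinate per plane; the function name `project_views` is hypothetical, and rendering the projected points into view images is omitted.

```python
def project_views(vertices):
    """Orthographically project 3D vertices onto the three coordinate planes.

    Assumption: `vertices` is a list of (x, y, z) tuples. Rasterising the
    projected points into 2D view images is not shown here.
    """
    return {
        "XOY": [(x, y) for x, y, z in vertices],  # drop the z coordinate
        "XOZ": [(x, z) for x, y, z in vertices],  # drop the y coordinate
        "YOZ": [(y, z) for x, y, z in vertices],  # drop the x coordinate
    }


# Toy example with two vertices.
views = project_views([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
```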
After the three projection views of a 3D model are obtained, SIFT features are extracted from each view. Usually, an image can be represented by gathering all of its SIFT features together. In other words, a feature vector converted from the SIFT features can represent an image, so we need to obtain a feature vector from the SIFT features. However, the number of extracted SIFT feature points is sensitive to the gradient direction of local image pixels and to the number of layers selected in the scale space. The number of SIFT feature points of an object directly affects the quality of the model features and the final retrieval accuracy. To obtain highly expressive features from a 3D model, the BOW paradigm is employed to standardize the SIFT feature points.
In the BOW-based SIFT feature standardization paradigm, each 2D view of a model is treated as a text, and the SIFT features extracted from each view are treated as its vocabulary. Synonymous features are then clustered to obtain K clusters, so that a K-dimensional feature vector for each view can be obtained by word frequency analysis. The feature standardization process is shown in Figure 2.
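The word frequency analysis can be sketched as follows, assuming the K cluster centers (the visual words) are already known. The function names are hypothetical, the toy descriptors are 2-D rather than 128-D SIFT vectors, and normalising the counts to frequencies is an assumption rather than a detail stated in the paper.

```python
import math


def nearest_index(vector, centers):
    """Index of the cluster center closest to `vector` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(vector, centers[i]))


def bow_histogram(descriptors, centers):
    """K-dimensional word-frequency vector of one view.

    Assumption: counts are divided by the number of descriptors, so that
    views with different numbers of SIFT points become comparable.
    """
    counts = [0] * len(centers)
    for descriptor in descriptors:
        counts[nearest_index(descriptor, centers)] += 1
    total = len(descriptors)
    return [c / total for c in counts] if total else [0.0] * len(centers)


# Toy example: K = 2 visual words, four descriptors from one view.
centers = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (0.0, 0.0)]
histogram = bow_histogram(descriptors, centers)
```

Whatever the number of SIFT points found in a view, the resulting vector always has K entries, which is what makes the views directly comparable.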

K-Means and Canopy Algorithm Based on Map-Reduce
From the above description, we can see that the feature extraction stage involves a clustering calculation. Clustering is time-consuming, especially on big data sets. In this paper, a K-means algorithm that runs on the Hadoop platform is employed to obtain the feature vectors. Moreover, to improve the clustering accuracy, the Canopy algorithm is employed to determine the value of K and the K initial cluster centers. For the Canopy algorithm, the thresholds T1 and T2 are determined by the maximum and minimum principle.
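One K-means iteration can be expressed in map and reduce form. The sketch below is a single-process simulation in plain Python, not Hadoop code: the map step assigns each point to its nearest center, a dictionary stands in for the shuffle phase, and the reduce step averages the points of each cluster. All function names are hypothetical.

```python
import math
from collections import defaultdict


def kmeans_map(point, centers):
    """Map step: emit (index of nearest center, point)."""
    idx = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return idx, point


def kmeans_reduce(points):
    """Reduce step: the new center is the mean of the points in one cluster."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans_iteration(points, centers):
    """Simulate one map-shuffle-reduce round in a single process."""
    groups = defaultdict(list)
    for point in points:
        idx, value = kmeans_map(point, centers)
        groups[idx].append(value)  # stands in for the shuffle phase
    # Assumption: a center that attracts no points is left unchanged.
    return [kmeans_reduce(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))]


points = [(0, 0), (0, 2), (10, 10), (12, 10)]
new_centers = kmeans_iteration(points, [(0, 0), (10, 10)])
```

In a distributed setting the map calls are independent of each other, which is why the assignment step parallelises well across nodes.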
According to the maximum and minimum principle, once the first n Canopy centers have been determined, the (n + 1)-th Canopy center is the candidate point with the maximum value in the set of minimum distances. This set consists of the minimum distance from each candidate point to the first n centers.
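The selection rule can be sketched as follows. The sketch assumes that the first center and the number of centers are given; how the thresholds T1 and T2 and the value of K are derived from the selected distances is not reproduced here, and the name `max_min_centers` is hypothetical.

```python
import math


def max_min_centers(points, num_centers, first_index=0):
    """Pick centers so that each new one is far from all previous ones.

    Assumption: the first center is points[first_index] and the number of
    centers is fixed in advance, which simplifies the stopping rule.
    """
    centers = [points[first_index]]
    while len(centers) < num_centers:
        # For every candidate take its minimum distance to the chosen
        # centers, then select the candidate whose minimum is largest.
        best = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(best)
    return centers


points = [(0, 0), (1, 0), (10, 0), (5, 0)]
selected = max_min_centers(points, 3)
```

Already selected points have a minimum distance of zero, so they are not chosen again as long as unselected distinct points remain.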

Retrieval Performance Analysis
Before analyzing the 3D model retrieval performance, we first compared the clustering performance of the maximum and minimum principle based method with that of the common method. In this experiment, the standardized feature of each 3D model was used to determine the initial cluster centers, using the maximum and minimum principle based method and the common method respectively. Then the K-means algorithm was employed to cluster the SIFT features starting from these initial cluster centers. The SIFT feature data is 10 GB in size, and the K-means algorithm was executed 5 times on a 5-node Hadoop platform. The clustering results of each run for the two methods are compared in Figure 5. Here, the clustering accuracy rate is calculated as follows: one cluster is randomly selected from the clustering results, and its label is determined by analyzing the data in this cluster. Then 50 items are randomly selected from the cluster, and the proportion of correct items is calculated. From Figure 5 we can see that, in every run, the clustering accuracy rate of the common method does not exceed that of the maximum and minimum principle based method. The average clustering accuracy rate of the maximum and minimum principle based method reaches 88%, higher than the 80.4% of the common method. This means that the maximum and minimum principle based Canopy algorithm can avoid falling into a local optimum and obtain better initial cluster centers. Better initial cluster centers help the K-means algorithm to cluster more accurately, and accurate clustering in turn improves the 3D model retrieval performance.
Next we compare the 3D model retrieval performance. We use the precision-recall graph to evaluate the retrieval performance. Precision measures the proportion of relevant models among the top n ranked results, while recall is the percentage of the relevant class that has been retrieved in the top n results [6]. We obtain two kinds of SIFT feature for each 3D model. One is the BOW-normalized feature and the other is the plain SIFT feature. We use the K-means clustering algorithm and the maximum and minimum principle based Canopy algorithm to compute the feature vectors. The retrieval performance is then compared for these two kinds of feature. To retrieve 3D models, a query model is input into the retrieval system and its feature vector is calculated. The Euclidean distances between the query model and the K cluster centers are calculated. According to these distances, the cluster whose center is nearest to the input model is returned as the retrieval result. All of the above clustering algorithms and matching calculations are executed on a 5-node Hadoop platform. Note that a model database contains many model classes, so it is hard to display all the retrieval results. Here, we give the retrieval results of 5 model classes in Figure 6, Figure 7 and Figure 8.
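The matching step and the two evaluation measures can be sketched as follows. The sketch assumes that the cluster centers and the member models of each cluster are already available; the function names and the model identifiers are hypothetical.

```python
import math


def retrieve(query_vector, centers, clusters):
    """Return the members of the cluster whose center is nearest to the query."""
    nearest = min(range(len(centers)),
                  key=lambda i: math.dist(query_vector, centers[i]))
    return clusters[nearest]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against the relevant class."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example with hypothetical model identifiers.
centers = [(0.0, 0.0), (10.0, 10.0)]
clusters = [["chair_1", "chair_2", "bench_1"], ["dino_1"]]
results = retrieve((1.0, 1.0), centers, clusters)
precision, recall = precision_recall(
    results, ["chair_1", "chair_2", "chair_3", "chair_4"])
```

In this toy case two of the three returned models are relevant and two of the four relevant models are found, so precision is 2/3 and recall is 1/2.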
Figure 6 shows part of the retrieval results for the experiment in which the SIFT features are not BOW standardized. Figure 7 shows part of the retrieval results for the experiment in which the SIFT features are BOW standardized.
In particular, for some model classes, such as dinosaur or bench, the retrieval performance improves dramatically after BOW standardization. This is because a 3D model contains many low-contrast and unstable edge SIFT feature points. If these low-contrast and unstable edge SIFT features are filtered out without BOW standardization, the features of each 3D model are processed into a uniform feature. With this uniform feature, the retrieval performance differs across model categories, so it is not a stable feature for all models: some categories may obtain better retrieval results, but most categories cannot reach acceptable performance, and the retrieval results depend almost entirely on the quality of the view feature points. On the other hand, if many representative SIFT features are processed into a unified feature, feature information is lost and the ability of the features to characterize the model is weakened.
BOW feature standardization obtains the feature vector by clustering the SIFT feature points of each view, so that every SIFT feature point contributes to the feature vector. A more expressive feature helps to achieve better retrieval performance. Figure 8 shows the average retrieval performance of the different SIFT feature processing methods. The average retrieval performance over all model classes is better for the BOW standardized SIFT feature method than for the other method. Thus, view SIFT features standardized by BOW can improve view-based 3D model retrieval performance.

Conclusion
In this paper, the view-based 3D model retrieval method was studied, and a distributed clustering algorithm was employed to obtain features and match models. Firstly, a 3D model was transformed into multi-view 2D images and SIFT features were extracted from each 2D view. Secondly, the SIFT features were normalized by employing the bag-of-words model, which was originally presented in the field of natural language processing, to reduce the influence of noise points on the 3D model features. Thirdly, a distributed K-means clustering algorithm was performed on a 5-node Hadoop platform to obtain the 3D model features, and a Canopy algorithm based on the maximum and minimum principle was also employed to optimize the K-means clustering algorithm. Finally, the experimental results indicated that the presented 3D model retrieval method had better performance in both computational speed and retrieval efficiency.

Figure 2. Illustration of the feature standardization process.

Figure 3. Illustration of the K-means and Canopy algorithm with map-reduce.

Figure 4. Time consumption of different compute modes.

Figure 8. Average retrieval results of different methods.

X. H. Liu et al. DOI: 10.4236/jcc.2018.65007. Journal of Computer and Communications.