Data Classification Using Combination of Five Machine Learning Techniques
1. Introduction
In this paper, five widely used methods, namely the Fuzzy weighted rule, Fuzzy Inference System (FIS), Fuzzy c-Means (FCM) clustering, Support Vector Machine (SVM) and Artificial Neural Network (ANN), are integrated to classify Iris data. Several works related to the paper are reviewed in this section. In [1], the authors use the Adaptive Neuro-Fuzzy Inference System (ANFIS) and the FIS for professional blogger classification, where FIS provides better results than Classification Based on Associations (CBA). The combination of ANN and ANFIS gives better classification, while the ANFIS proposed in that paper shows the best result, 93%. The concept of FIS in data classification is also found in [2], where faults of an electrical transmission line are detected and classified.
In [3], fuzzy weighted rules are used to classify Iris data using seven membership functions (MFs). The average classification rate is found to be 96.48%, 96.06% and 96.7% for 7, 9 and 11 labels of MFs, respectively. The main drawback of that paper is that it deals with only a single method of classification; therefore, there is scope for including other data segregation algorithms. Fuzzy rule-based classification is also found in [4] for coronary artery disease data, where trapezoidal membership functions are used for the input variables. The classification rate varies with the weighting rule: the maximum value is 92.8% and the minimum is 71.8%. In this paper, we apply fuzzy c-means clustering to Iris data classification; a similar concept is used for MR brain image segmentation in [5]. There the entire algorithm of c-means clustering is shown, the performance of image classification is compared with seven different methods, and fuzzy c-means clustering provides a moderate result. An application of FCM in image classification is found in [6], where FCM is combined with a Convolutional Neural Network (CNN) to recognize tumors in the brain. The detection accuracy claimed by the authors is 91%. FCM is also applied to image classification in [7] [8]. SVM is used for data classification in [9], where text-based automatic task classification is done. The authors claim classification accuracy in the range of 82% to 99%. A similar concept is found in [10] for breast cancer diagnosis, where three different types of kernels are used and the accuracy is above 90% in all cases.
In this paper, we combine all five algorithms to classify Iris data, although the concept is applicable to any type of data or to feature vector-based image classification. The main objective is to achieve high classification accuracy without deep learning techniques so that the process time remains low. The inclusion of the Fuzzy weighted rule plays a vital role in data classification. Most previous works did not include the Fuzzy weighted rule; hence they had to rely on deep learning to attain high classification accuracy, which requires a large process time. The combination of five methods, performed as in [11], is found to be more robust than previous works. We compare the result of this paper with two previous works using the same data set and find a better result, as shown in the results section.
The rest of the paper is organized as follows: Section 2 provides a theoretical analysis of the five machine learning algorithms used in this paper for data classification, Section 3 provides results based on the analysis of Section 2, and Section 4 concludes the entire analysis.
2. Theory of Data Classification
2.1. Fuzzy Inference System (FIS)
A Fuzzy Inference System (FIS) consists of three building blocks: Fuzzification, Inference and De-fuzzification. Numerical data are converted to fuzzy symbols using membership functions (MFs) consisting of several variables, where each variable has its own range of numerical values. This conversion is called Fuzzification. The Inference block applies a set of rules in if-then form to relate input and output. Finally, the output symbols are converted to a numerical value by applying a De-fuzzification technique to the output MFs.
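The three blocks can be illustrated with a minimal single-input sketch in Python. It is not the FIS of this paper: the triangular MFs, their breakpoints, the three rules, the class axis (1, 2, 3) and the centroid de-fuzzification are all assumptions chosen only for illustration.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Fuzzification: hypothetical MFs for one input (petal length in cm).
IN_MFS = {"short": (0.0, 1.5, 3.0), "medium": (2.5, 4.25, 6.0), "long": (4.5, 6.0, 7.5)}
# Hypothetical output MFs on a class axis (1 = Setosa, 2 = Versicolor, 3 = Virginica).
OUT_MFS = {"setosa": (0.5, 1.0, 1.5), "versicolor": (1.5, 2.0, 2.5), "virginica": (2.5, 3.0, 3.5)}
# Inference: one if-then rule per input label (assumed rule base).
RULES = {"short": "setosa", "medium": "versicolor", "long": "virginica"}

def infer(x, steps=300):
    """Return the crisp output for input x, or None if no rule fires."""
    # Fuzzification: degree of each input label.
    strength = {label: tri(x, *abc) for label, abc in IN_MFS.items()}
    # Inference (min implication, max aggregation) and centroid de-fuzzification
    # evaluated on a uniform grid over the output axis [0.5, 3.5].
    num = den = 0.0
    for k in range(steps + 1):
        y = 0.5 + 3.0 * k / steps
        mu = max(min(strength[i], tri(y, *OUT_MFS[o])) for i, o in RULES.items())
        num += y * mu
        den += mu
    return num / den if den else None
```

With these assumed MFs, an input at the peak of a single label (for example 1.5 for "short") fires only one rule, so the centroid falls at the centre of the corresponding output MF.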
2.2. Fuzzy Weighted Rule
A detailed analysis of the Fuzzy weighted rule is given in [3] with a numerical example. In this paper we present the steps of the algorithm in a different way, as below:
In this subsection, a few numerical examples are shown following the steps of the Fuzzy weighted rule. First, we take a few Iris data under three categories, Iris-Setosa, Iris-Versicolor and Iris-Virginica, shown in Table 1. For each category, four inputs (sepal length SL, sepal width SW, petal length PL and petal width PW) and the corresponding output are taken as the initial data. For the reader's convenience, we choose the same initial data as [3] and elaborate the initial data processing steps more explicitly than the previous paper.
For each input SL, SW, PL or PW, we consider 7 trapezoidal membership functions named HN, MN, SN, Z, SP, MP and HP, as shown in Figures 1(a)-(d) for the four input variables. The MFs of the three output classes are shown in Figure 2.
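A possible layout of the seven labels can be sketched as follows. The breakpoints of Figure 1 are not reproduced in the text, so the evenly spaced, half-overlapping trapezoids and the sepal-length range of 4.3 to 7.9 cm used here are assumptions, and make_mfs and fuzzify are hypothetical helper names.

```python
LABELS = ["HN", "MN", "SN", "Z", "SP", "MP", "HP"]

def trapezoid(x, a, b, c, d):
    """Trapezoidal MF: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def make_mfs(lo, hi, labels=LABELS):
    """Evenly spaced, overlapping trapezoids covering [lo, hi] (assumed layout)."""
    step = (hi - lo) / (len(labels) - 1)
    mfs = {}
    for k, name in enumerate(labels):
        centre = lo + k * step
        mfs[name] = (centre - 0.75 * step, centre - 0.25 * step,
                     centre + 0.25 * step, centre + 0.75 * step)
    return mfs

def fuzzify(x, mfs):
    """Return the label with the highest grade and all membership grades."""
    grades = {name: trapezoid(x, *params) for name, params in mfs.items()}
    return max(grades, key=grades.get), grades

# Assumed range for the SL input; real breakpoints should be read from Figure 1(a).
sl_mfs = make_mfs(4.3, 7.9)
label, grades = fuzzify(5.2, sl_mfs)
```

In this layout neighbouring trapezoids cross at grade 0.5, so the grades of any value inside the range sum to one; this is a design choice of the sketch, not a property stated in the paper.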
2.3. Fuzzy c-Means Clustering
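The main objective of FCM is to minimize the objective function
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m} \left\| x^{(i)} - c_j \right\|^{2} \qquad (3)$$
where
$m$ is a real number greater than 1 called the fuzzifier,
$u_{ij}$ is the degree to which $x^{(i)}$ belongs to cluster $j$ with center $c_j$,
$x^{(i)}$ is the $i$th data point,
$c$ is the number of clusters, and
$N$ is the number of data points.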
The steps of the Fuzzy c-means clustering algorithm are given below, following [12] [13].
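For reference, the commonly used alternating FCM iteration (update the memberships from the centers, then the centers from the memberships) can be sketched in plain Python. This is a generic textbook form and not necessarily the exact listing of [12] [13]; the initialisation from evenly spaced data points, the tolerance and the function names are assumptions.

```python
import math

def fcm(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Fuzzy c-means; returns (centers, memberships u[i][j])."""
    n, dim = len(points), len(points[0])
    # Assumed initialisation: centers taken from evenly spaced data points.
    centers = [list(points[(j * n) // c]) for j in range(c)]
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for p in points:
            dist = [max(math.dist(p, cj), 1e-12) for cj in centers]
            u.append([1.0 / sum((dj / dk) ** power for dk in dist) for dj in dist])
        # Center update: c_j = sum_i u_ij^m x_i / sum_i u_ij^m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total
                                for d in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once the centers no longer move
            break
    return centers, u

def objective(points, centers, u, m=2.0):
    """Weighted within-cluster squared distance that FCM minimizes."""
    return sum(u[i][j] ** m * math.dist(p, cj) ** 2
               for i, p in enumerate(points) for j, cj in enumerate(centers))
```

A hard label for each point can be obtained as the index of its largest membership, for example `[row.index(max(row)) for row in u]`.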
2.4. Support Vector Machine
Figure 1. MFs of four input variables. (a) MFs of input SL; (b) MFs of input PW; (c) MFs of input PL; (d) MFs of input SW.
Table 1. Three types of Iris data [3].
The SVM is a supervised learning algorithm used for data classification, decision-making, pattern recognition, data forecasting, disease diagnosis, etc. The SVM algorithm classifies objects using a decision boundary called a hyperplane, where the optimum hyperplane separates the points corresponding to the objects with the widest margin, as discussed in [14] [15]. The generalized equation of a hyperplane is
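$$\mathbf{w}^{T}\mathbf{x} + b = 0 \qquad (6)$$
where $\mathbf{w}$ is known as the weight vector and $b$ as the bias.
The SVM determines the constants $\mathbf{w}$ and $b$ such that
$\mathbf{w}^{T}\mathbf{x}_i + b \geq +1$ for one group of points and
$\mathbf{w}^{T}\mathbf{x}_i + b \leq -1$ for another group of points. The SVM uses a kernel function to provide the best trajectory of the decision boundary.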
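The role of the weight vector and the bias can be illustrated with a small two-class sketch. It assumes a linear decision function w·x + b with no kernel and trains it by sub-gradient descent on the regularised hinge loss; this is a swapped-in simplification and not the kernel SVM used for the results of this paper. The toy points, the learning rate and the regularisation constant are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear soft-margin SVM via sub-gradient descent; labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:
                # Point is inside the margin: move the hyperplane away from it.
                w = [wk - lr * (lam * wk - yi * xk) for wk, xk in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely classified: only apply the regulariser.
                w = [wk - lr * lam * wk for wk in w]
    return w, b

def predict(w, b, x):
    """Side of the hyperplane on which x lies."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data (not Iris measurements).
X = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.0), (0.0, 0.0), (-1.0, 0.5), (0.5, -1.0)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

A three-class problem such as Iris would need an additional one-vs-rest or one-vs-one scheme on top of this binary classifier, which the sketch does not include.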
2.5. Artificial Neural Network
In this paper we use a feed-forward ANN, where the signal travels in only one direction, i.e. from input to output. Such a neural network is called a multi-layer perceptron and is used for pattern recognition. We use it with 10 and 20 hidden neurons to observe the relative performance. We also use an ANN under the backpropagation algorithm, where signals flow in both directions (inputs forward, errors backward). The concepts of both ANNs are available in [16] [17], and here we omit their theoretical analysis.
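The one-directional signal flow can be illustrated with a single forward pass. The sketch assumes one hidden layer of 10 neurons (the size named in the caption of Figure 9), tanh hidden units, a softmax output and untrained random weights; none of these details are specified in the text, and no training step is shown.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    """Weighted sum plus bias for every neuron of the layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(z):
    """Convert raw scores to class probabilities."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, hidden, output):
    """Input -> hidden (tanh) -> output (softmax); the signal never flows back."""
    h = [math.tanh(v) for v in dense(x, hidden)]
    return softmax(dense(h, output))

rng = random.Random(0)
hidden = init_layer(4, 10, rng)   # 4 inputs (SL, SW, PL, PW), 10 hidden neurons
output = init_layer(10, 3, rng)   # 3 output classes
probs = forward([5.1, 3.5, 1.4, 0.2], hidden, output)
```

Because the weights are random, the three probabilities carry no meaning until the network is trained, for instance with backpropagation.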
The five machine learning methods are combined using a Shannon entropy-based algorithm.
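One way such an entropy-based combination could work is sketched below. This is a hypothetical scheme and not necessarily the algorithm of [11]: each method is assumed to output a class-probability vector, and a method is weighted more heavily when the Shannon entropy of its output is low, i.e. when it is more certain. The function names and the example probabilities are illustrative only.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

def combine_by_entropy(prob_vectors):
    """Fuse classifier outputs with weights 1 - H / H_max (assumed weighting)."""
    n_classes = len(prob_vectors[0])
    h_max = math.log(n_classes)
    weights = [max(0.0, 1.0 - shannon_entropy(p) / h_max) for p in prob_vectors]
    total = sum(weights)
    if total == 0:
        # Every classifier is maximally uncertain: fall back to equal weights.
        weights = [1.0] * len(prob_vectors)
        total = float(len(prob_vectors))
    fused = [sum(w * p[k] for w, p in zip(weights, prob_vectors)) / total
             for k in range(n_classes)]
    return fused.index(max(fused)), fused

# Made-up outputs of five methods for one sample (not results from Table 2).
outputs = [
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],
    [0.20, 0.70, 0.10],
    [0.80, 0.10, 0.10],
    [0.60, 0.30, 0.10],
]
label, fused = combine_by_entropy(outputs)
```

In this example the nearly uniform second vector receives almost no weight, so the fused decision is driven by the confident methods.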
3. Result and Discussion
This section provides results based on the theoretical analysis of the previous section. First, we apply FIS to the Iris data. The FIS used in this paper is shown in Figure 3, where 7 MFs are used for each of the four input variables. We apply 69 Fuzzy rules, and a few of them are shown in Figure 4. The surface plots of the FIS for the variables SL, PL, PW and SW are shown in Figures 5(a)-(f). Here the surface levels 1.5, 2 and 2.5 correspond to Iris-Set, Iris-Ver and Iris-Vir, respectively. Next, we apply the Fuzzy weighted rule to the 150 Iris data. The details of the Fuzzy weighted rule are given in Section 2.2. We run the algorithm 5 times taking 100 data each time; the corresponding accuracy of correct recognition is given in Table 2 at the end of this section.
Figure 3. Fuzzy system of data classification.
Figure 5. Surface plot of the FIS. (a) Surface plot of PW vs. SL; (b) Surface plot of PL vs. SL; (c) Surface plot of SW vs. SL; (d) Surface plot of PL vs. PW; (e) Surface plot of SW vs. PW; (f) Surface plot of SW vs. PL.
Next, we apply Fuzzy c-means clustering to the entire dataset, taking two variables at a time. The scatterplot of the three output classes is shown in Figure 6. A few data points appear to cross into a neighboring region, i.e. they produce recognition errors. Here 50 data each are taken for Iris-Set, Iris-Ver and Iris-Vir.
Finally, scatterplots of the data points for four combinations of the four input variables are shown in Figures 7(a)-(d) to identify the best separation case. Here PW vs. PL shows the best separation, as found in Figure 6(b). The regional separation of data points using SVM is shown in Figures 8(a)-(d), where Figure 8(b) shows the best regional separation. In future, we will apply multiple linear regression (MLR) to the four-dimensional input data to convert them into two-dimensional data, and then apply SVM to observe any improvement over the four cases of Figure 8.
Next, Iris data classification is done using the feedforward ANN. The performance of the network, the error histogram and the confusion matrix are shown in Figures 9-11 for the cases of 10 and 20 hidden neurons. Similar results are shown in Figure 12 and Figure 13 for the backpropagation ANN with 8 and 10 hidden neurons. The performance improves as the number of hidden neurons increases, at the expense of process time.
Except for the Weighted Fuzzy rule, no individual method provides high recognition accuracy, as can be seen from Table 2. The Weighted Fuzzy rule provides high accuracy at the expense of process time, but its process time is much smaller than that of deep learning techniques. We combined the five methods using the entropy-based combining algorithm of [11], which provides recognition accuracy above 98% for all five experiments. Finally, we compared our results with NN + SVM of [18] and FCM + SVM of [19] using the same data, where the result of the first case is 0.9417 and that of the second case is 0.9445. Our model, a combination of five ML methods, is more robust than previous works in data classification.
Table 2. Comparison of data separation algorithms.
Figure 6. Scatterplot of data under fuzzy c-means clustering.
Figure 8. Three regions of output under SVM.
Figure 9. Performance of the feedforward ANN. (a) 10 hidden neurons; (b) 20 hidden neurons.
Figure 10. Error histogram. (a) 10 hidden neurons; (b) 20 hidden neurons.
Figure 11. Confusion matrix. (a) 10 hidden neurons; (b) 20 hidden neurons.
Figure 12. Performance of the backpropagation ANN. (a) 8 hidden neurons; (b) 10 hidden neurons.
Figure 13. Error histogram of backpropagation ANN. (a) 8 hidden neurons; (b) 10 hidden neurons.
4. Conclusion
In this paper, Iris data classification is done using FIS, the Weighted Fuzzy rule, Fuzzy c-means clustering, SVM and ANN. The combination of the five techniques gives a minimum accuracy of 97.4%, which is better than the previous individual methods. The concept of this research work is also applicable to any type of tabular data. The high classification accuracy is attributed to the inclusion of the weighted fuzzy rule. The process time of the weighted fuzzy rule is larger than that of the other four techniques used in the paper but considerably lower than that of deep learning methods such as the Convolutional Neural Network (CNN). The proposed technique therefore provides high accuracy with comparatively low process time. There is still scope to include other machine learning techniques such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Bayesian classification, decision trees, etc.