
Data clustering plays a vital role in object identification. In practice, the concept is mainly applied to biometric identification and object detection. In this paper we use Fuzzy Weighted Rules, a Fuzzy Inference System (FIS), Fuzzy C-Mean clustering (FCM), a Support Vector Machine (SVM) and an Artificial Neural Network (ANN) to distinguish three classes of Iris data: Iris-Setosa, Iris-Versicolor and Iris-Virginica. Each sample in the data table is described by a four-dimensional vector whose components serve as the input variables: Sepal Length (SL), Sepal Width (SW), Petal Length (PL) and Petal Width (PW). The combination of the five machine learning methods provides about 98% accuracy of class identification (97.4% to 98.8% across five experiments).

In this paper, five widely used methods (Fuzzy weighted rule, FIS, FCM, SVM and ANN) are integrated for the classification of Iris data. Several works related to the paper are discussed in this section. In [

In [

In this paper we combine all five algorithms to classify Iris data, although the concept is applicable to any type of data or to feature vector-based image classification. The main objective is to achieve high classification accuracy without deep learning techniques, so that the process time remains low. The inclusion of the Fuzzy weighted rule plays a vital role in data classification. Most previous works did not include the Fuzzy weighted rule and therefore had to rely on deep learning to achieve high classification accuracy, which requires considerable process time. The combination of five methods of the paper like [

The rest of the paper is organized as follows: Section 2 provides the theoretical analysis of the five machine learning algorithms used in this paper for data classification, Section 3 presents the results based on the analysis of Section 2, and Section 4 concludes the paper.

A Fuzzy Inference System (FIS) consists of three building blocks: Fuzzification, Inference and De-fuzzification. Numerical data are converted to fuzzy symbols using membership functions (MFs) defined over several variables, where each variable covers its own range of numerical values; this conversion is called Fuzzification. The Inference block applies rules in if-then form to relate input and output. Finally, the output symbols are converted back to a numerical value by applying De-fuzzification to the output MFs.
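The three blocks can be illustrated with a minimal sketch, assuming a single input (PL), three trapezoidal MFs instead of the seven used in this paper, and a weighted average over singleton outputs as a simplified stand-in for de-fuzzification over output MFs. The set names, the breakpoints in `PL_SETS` and the rule table `RULES` are hypothetical and are not the values used in our FIS.

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal MF: rises on (a, b), equals 1 on [b, c], falls on (c, d)."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0

# Hypothetical breakpoints for petal length (PL); not taken from the paper.
PL_SETS = {
    "small": (0.0, 0.0, 2.0, 3.0),
    "medium": (2.0, 3.0, 4.7, 5.2),
    "large": (4.7, 5.2, 7.0, 7.0),
}

# Hypothetical rule base: "if PL is <set> then output is <class code>".
RULES = {"small": 1.0, "medium": 2.0, "large": 3.0}

def fuzzify(pl):
    """Fuzzification: map a crisp PL value to a degree in each fuzzy set."""
    return {name: trapmf(pl, *params) for name, params in PL_SETS.items()}

def infer(pl):
    """Inference plus de-fuzzification by weighted average of rule outputs."""
    degrees = fuzzify(pl)
    total = sum(degrees.values())
    if total == 0.0:
        raise ValueError("input lies outside every membership function")
    return sum(degrees[name] * RULES[name] for name in RULES) / total

if __name__ == "__main__":
    for pl in (1.4, 4.5, 5.6):
        print(pl, fuzzify(pl), infer(pl))
```

Under these assumed breakpoints, an input that falls between two sets fires both rules partially, and the crisp output lies between the two class codes; rounding it to the nearest integer gives a class label.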

The detailed analysis of the Fuzzy weighted rule is given in [

In this subsection, a few numerical examples are worked through following the steps of the Fuzzy weighted rule. First, we take a few samples of Iris data from three categories (Iris-Setosa, Iris-Versicolor and Iris-Virginica), shown in

For each input (SL, SW, PL or PW) we consider seven trapezoidal membership functions named HN, MN, SN, Z, SP, MP and HP, as shown in Figures 1(a)-(d) for the four input variables. The MFs of the three output classes are shown in

The main objective of FCM is to minimize the objective function,

$$J_m = \sum_{j=1}^{c} \sum_{x(i) \in c_j} u_{ij}^{m} \left( \left| x(i) - c_j \right| \right)^{2} \tag{3}$$

where

m is a real number greater than 1 called fuzzifier

u_{ij} is the degree to which an x(i) belongs to the cluster j with center c_{j}

x(i) is the ith data point

c is the number of clusters
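The minimization of the objective function can be sketched in a few lines, assuming the standard alternating updates of memberships and centers and the usual form of the objective that sums over all data points. The function names, the random initialization, the stopping tolerance and the choice of the (PL, PW) pairs from the sample data are illustrative assumptions, not the exact procedure used in this paper.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def update_memberships(points, centers, m):
    """u_ij is proportional to d_ij^(-2/(m-1)), normalized over clusters."""
    power = 1.0 / (m - 1.0)  # applied to squared distances
    u = []
    for p in points:
        # Clamp to avoid division by zero when a point sits on a center.
        inv = [(1.0 / max(sq_dist(p, c), 1e-12)) ** power for c in centers]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u

def update_centers(points, u, m, n_clusters):
    """c_j is the mean of the points weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ))
    return centers

def objective(points, centers, u, m):
    """J_m summed over all points i and clusters j."""
    return sum(u[i][j] ** m * sq_dist(p, c)
               for i, p in enumerate(points)
               for j, c in enumerate(centers))

def fcm(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Alternate the two updates until J_m stops changing."""
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)  # assumed initialization
    history = []
    for _ in range(n_iter):
        u = update_memberships(points, centers, m)
        centers = update_centers(points, u, m, n_clusters)
        history.append(objective(points, centers, u, m))
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return centers, u, history

# (PL, PW) pairs of the fifteen sample rows of the Iris data table.
POINTS = [(1.4, 0.3), (1.7, 0.3), (1.4, 0.2), (1.3, 0.3), (1.3, 0.2),
          (4.9, 1.8), (5.6, 1.4), (5.4, 2.1), (5.6, 2.4), (5.4, 2.3),
          (4.6, 1.3), (3.5, 1.0), (4.5, 1.5), (4.8, 1.8), (4.5, 1.5)]

if __name__ == "__main__":
    centers, u, history = fcm(POINTS, n_clusters=3)
    print(centers)
    print(history[0], history[-1])
```

Each row of the returned membership matrix sums to one, and the recorded values of J_m do not increase from one iteration to the next, which is the property the alternating scheme relies on.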

The steps of the Fuzzy c-mean clustering algorithm are given below, as in [

Class | SL | SW | PL | PW | Out |
---|---|---|---|---|---|
Iris-Setosa | 4.6 | 3.4 | 1.4 | 0.3 | 1 |
Iris-Setosa | 5.7 | 3.8 | 1.7 | 0.3 | 1 |
Iris-Setosa | 5.2 | 3.4 | 1.4 | 0.2 | 1 |
Iris-Setosa | 4.5 | 2.3 | 1.3 | 0.3 | 1 |
Iris-Setosa | 4.4 | 3.2 | 1.3 | 0.2 | 1 |
Iris-Virginica | 6.1 | 3 | 4.9 | 1.8 | 3 |
Iris-Virginica | 6.1 | 2.6 | 5.6 | 1.4 | 3 |
Iris-Virginica | 6.9 | 3.1 | 5.4 | 2.1 | 3 |
Iris-Virginica | 6.7 | 3.1 | 5.6 | 2.4 | 3 |
Iris-Virginica | 6.2 | 3.4 | 5.4 | 2.3 | 3 |
Iris-Versicolor | 6.6 | 2.9 | 4.6 | 1.3 | 2 |
Iris-Versicolor | 5 | 2 | 3.5 | 1 | 2 |
Iris-Versicolor | 6.2 | 2.2 | 4.5 | 1.5 | 2 |
Iris-Versicolor | 5.9 | 3.2 | 4.8 | 1.8 | 2 |
Iris-Versicolor | 6 | 2.9 | 4.5 | 1.5 | 2 |

The SVM is a supervised learning algorithm used for data classification, decision-making, pattern recognition, data forecasting, disease diagnosis, etc. The SVM algorithm classifies objects using a decision boundary called a hyperplane; the optimum hyperplane separates the points corresponding to the objects with the widest margin, as discussed in [

$$f(x) = b + w^{T} x \tag{6}$$

where w is known as the weight vector and b as the bias.

The SVM determines the constants w^{T}, b and τ such that w^{T}x + b ≥ τ for one group of points and w^{T}x + b ≤ τ for the other group. The SVM uses a kernel function to provide the best trajectory of the decision boundary.
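To make the role of w and b concrete, the following sketch evaluates the decision function of Equation (6) on the fifteen sample rows of the Iris data table and measures the distance of the closest point to the hyperplane. The weight vector `W` and bias `B` are hand-picked for illustration (a boundary on PL alone that splits Iris-Setosa from the other two classes); they are assumptions and not the output of a trained SVM.

```python
import math

# (SL, SW, PL, PW), class code: the fifteen sample rows of the data table.
SAMPLES = [
    ((4.6, 3.4, 1.4, 0.3), 1), ((5.7, 3.8, 1.7, 0.3), 1),
    ((5.2, 3.4, 1.4, 0.2), 1), ((4.5, 2.3, 1.3, 0.3), 1),
    ((4.4, 3.2, 1.3, 0.2), 1),
    ((6.1, 3.0, 4.9, 1.8), 3), ((6.1, 2.6, 5.6, 1.4), 3),
    ((6.9, 3.1, 5.4, 2.1), 3), ((6.7, 3.1, 5.6, 2.4), 3),
    ((6.2, 3.4, 5.4, 2.3), 3),
    ((6.6, 2.9, 4.6, 1.3), 2), ((5.0, 2.0, 3.5, 1.0), 2),
    ((6.2, 2.2, 4.5, 1.5), 2), ((5.9, 3.2, 4.8, 1.8), 2),
    ((6.0, 2.9, 4.5, 1.5), 2),
]

# Hypothetical hyperplane: uses PL only, placed midway between the largest
# Setosa PL (1.7) and the smallest non-Setosa PL (3.5) in the sample rows.
W = (0.0, 0.0, 1.0, 0.0)
B = -2.6

def decision(x, w=W, b=B):
    """f(x) = b + w^T x, as in Equation (6)."""
    return b + sum(wi * xi for wi, xi in zip(w, x))

def is_setosa(x):
    """Points on the negative side of the hyperplane are labelled Setosa."""
    return decision(x) < 0.0

def geometric_margin(samples, w=W, b=B):
    """Smallest distance |f(x)| / ||w|| from any sample to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision(x, w, b)) / norm for x, _ in samples)

if __name__ == "__main__":
    for x, label in SAMPLES:
        print(label, round(decision(x), 3), is_setosa(x))
    print("margin:", geometric_margin(SAMPLES))
```

A trained SVM would search over all w and b for the hyperplane that maximizes this smallest distance; the sketch only shows how the sign of f(x) assigns a class and how the margin of a given hyperplane is measured.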

In this paper we used a feed-forward ANN, where the signal travels in one direction only, i.e. from input to output. Such a neural network is called a multi-layer perceptron and is used for pattern recognition. We used it with 10 and 20 hidden layers to observe the relative performance. We also used an ANN trained with the backpropagation algorithm, where the signal flows in both directions. The concept of both ANNs is available in [

The five machine learning methods are combined using a Shannon entropy-based algorithm.
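One plausible way to realize an entropy-based combination is sketched below, assuming each method outputs a probability vector over the three classes. Each vector is weighted by one minus its normalized Shannon entropy, so that confident outputs count more than uncertain ones. This weighting scheme, the function names and the example vectors are assumptions for illustration and may differ from the combining algorithm actually used in this paper.

```python
import math

def shannon_entropy(probs):
    """H(p) = -sum p_k log2 p_k, with 0 * log 0 taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def combine(prob_vectors):
    """Weighted average of class probabilities; low entropy gets high weight."""
    n_classes = len(prob_vectors[0])
    max_entropy = math.log2(n_classes)
    weights = [max(0.0, 1.0 - shannon_entropy(p) / max_entropy)
               for p in prob_vectors]
    total = sum(weights)
    if total <= 1e-12:
        # Every classifier is maximally uncertain: fall back to equal weights.
        weights = [1.0] * len(prob_vectors)
        total = float(len(prob_vectors))
    return [sum(w * p[k] for w, p in zip(weights, prob_vectors)) / total
            for k in range(n_classes)]

if __name__ == "__main__":
    # Hypothetical outputs of three classifiers for one sample.
    outputs = [
        [0.90, 0.05, 0.05],  # confident in class 0
        [0.40, 0.35, 0.25],  # nearly undecided
        [0.10, 0.80, 0.10],  # fairly confident in class 1
    ]
    combined = combine(outputs)
    print(combined, "-> class", combined.index(max(combined)))
```

In this example the nearly undecided classifier receives a weight close to zero, so the decision is driven by the two confident outputs.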

This section presents results based on the theoretical analysis of the previous section. First, we apply the FIS to the Iris data. The FIS used in this paper is shown in

Next, we apply Fuzzy c-mean clustering to the entire dataset, taking two variables at a time. The scatterplots of the three output classes are shown in

Finally, scatterplots of the data points for four combinations of the four input variables are shown in Figures 7(a)-(d) to identify the best-separated case. Here PW vs. PL shows the best separation, as found in

Next, Iris data classification is done using the feedforward ANN. The performance of the network, the error histogram and the confusion matrix are shown in

Except for Weighted Fuzzy, no individual method provides high recognition accuracy, as can be seen from

Experiments | Weighted Fuzzy | FIS | Fuzzy C-mean | SVM | Feedforward ANN | Backpropagation ANN | Combined |
---|---|---|---|---|---|---|---|
1 | 0.931 | 0.881 | 0.892 | 0.873 | 0.835 | 0.878 | 0.974 |
2 | 0.904 | 0.855 | 0.879 | 0.907 | 0.862 | 0.874 | 0.982 |
3 | 0.929 | 0.867 | 0.862 | 0.893 | 0.857 | 0.895 | 0.988 |
4 | 0.913 | 0.853 | 0.864 | 0.880 | 0.866 | 0.903 | 0.976 |
5 | 0.932 | 0.871 | 0.882 | 0.921 | 0.841 | 0.917 | 0.978 |
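The per-method spread in the table above can be summarized with a short script. It only restates the tabulated accuracies; the dictionary layout and the helper name `summarize` are illustrative choices.

```python
# Accuracies of the five experiments, copied from the results table.
RESULTS = {
    "Weighted Fuzzy": [0.931, 0.904, 0.929, 0.913, 0.932],
    "FIS": [0.881, 0.855, 0.867, 0.853, 0.871],
    "Fuzzy C-mean": [0.892, 0.879, 0.862, 0.864, 0.882],
    "SVM": [0.873, 0.907, 0.893, 0.880, 0.921],
    "Feedforward ANN": [0.835, 0.862, 0.857, 0.866, 0.841],
    "Backpropagation ANN": [0.878, 0.874, 0.895, 0.903, 0.917],
    "Combined": [0.974, 0.982, 0.988, 0.976, 0.978],
}

def summarize(results):
    """Return (minimum, mean, maximum) accuracy for each method."""
    return {name: (min(v), sum(v) / len(v), max(v))
            for name, v in results.items()}

if __name__ == "__main__":
    for name, (lo, mean, hi) in summarize(RESULTS).items():
        print(f"{name:20s} min={lo:.3f} mean={mean:.4f} max={hi:.3f}")
```

The lowest combined accuracy (0.974) is higher than the best single-experiment accuracy of any individual method (0.932, Weighted Fuzzy).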

the expense of process time, but the process time is much smaller than that of deep learning techniques. We combined the five methods using the entropy-based combining algorithm of [

In this paper Iris data classification is done using FIS, the Weighted Fuzzy rule, Fuzzy c-mean clustering, SVM and ANN. The combination of the five techniques gives a minimum accuracy of 97.4%, which is better than that of any individual method. The concept of the research work is also applicable to any type of tabular data. The high classification accuracy is attributed to the inclusion of the weighted fuzzy rule. The process time of the weighted fuzzy rule is larger than that of the other techniques used in the paper, but considerably lower than that of deep learning methods such as the Convolutional Neural Network (CNN). The proposed technique thus provides high accuracy with low process time. There is still scope to include other machine learning techniques such as Principal Component Analysis, Linear Discriminant Analysis (LDA), Bayesian classification and decision trees.

The authors declare no conflicts of interest regarding the publication of this paper.

Rahman, Md.H., Akhter, J., Rahaman, A.S.Md.M. and Islam, Md.I. (2021) Data Classification Using Combination of Five Machine Learning Techniques. Journal of Computer and Communications, 9, 48-62. https://doi.org/10.4236/jcc.2021.912004