Data Prediction Model Using Combination of Clustering and Fuzzy Technique

The analysis of daily environmental evaporation plays a vital role in agriculture, because proper cultivation requires knowing the daily evaporation rate of a particular area. A standard model that can predict daily evaporation is therefore needed. In this paper, we use subtractive clustering and Fuzzy logic to predict the daily evaporation of a particular area. The input data are maximum soil temperature, average soil temperature, average air temperature, minimum relative humidity, average relative humidity and total wind, and the output is the daily evaporation of the area. The accuracy of the output is compared with that of a previous Artificial Neural Network (ANN) model, and our predictions are closer to the target values. The findings are applicable to environmental science, geological science and agriculture.


Introduction
Future data prediction (short- or long-term forecasting) is essential in any engineering design and is carried out using different machine learning algorithms. This section reviews some state-of-the-art work in data prediction. A Fuzzy system and a Neural Network (NN) are applied to GDP forecasting for Iran in [1].
The prediction accuracy of Neural-Fuzzy is found to be 5.92% and that of Fuzzy-logic 6.46%. In [2], a T-S fuzzy neural network prediction model is used for short-term prediction of photovoltaic power generation. The prediction results [...] radial basis function (RBF) neural network algorithm is used for forecasting road speed.
The mean absolute percentage error (MAPE) of the proposed method was the lowest among the compared methods, the other three being the time series method, the BP neural network and the RBF neural network. Another application of a Fuzzy Neural Network, using an improved decision tree, is found in [4], where short-term electrical load forecasting is performed. The predicted load and prediction error of the proposed model are better than those of a few previous models. In [5], the authors used a combination of an Artificial Neural Network and Fuzzy logic to analyze the strength of cement and compared the results with a pure ANN model. They found their own model to be more user-friendly and easier to use than a pure ANN. Data prediction models are also found in biomedical applications; for example, in [6], ANFIS (adaptive neuro-fuzzy inference system) is applied to the prediction of epilepsy by analyzing electroencephalogram (EEG) signals.
The paper reports the MSE of both training data and test data for 9 patients. In [7], a combination of Fuzzy C-means clustering and a BP neural network is used for short-term electricity consumption forecasting. The authors consider the following input parameters with different weights: maximum temperature, minimum temperature, maximum humidity, minimum humidity, wind power and air quality. The average error of the BP neural network forecasting method is found to be 15.44% and that of the proposed algorithm 6.94%. Financial data are sometimes also predicted by ANN; for example, in [8] the authors predict revenue based on previous data.
The MSE varies with the number of hidden neurons and the best-performance epoch, which are taken as parameters. Another example of financial data analysis is found in [9], where the authors carry out stock market prediction using supervised machine learning.
In this paper we use subtractive clustering and a Fuzzy Inference System (FIS) to predict "environmental daily evaporation", which is useful for the cultivation of crops anywhere in the world.
The remainder of the paper is organized as follows: Section 2 provides the basic theory of fuzzy systems and clustering techniques, Section 3 presents the system model used to derive the target output, Section 4 provides results based on the analysis of Section 3 and, finally, Section 5 concludes the entire analysis.

Basic Theory of Fuzzy System and Clustering Techniques
To solve an engineering problem we use a Fuzzy Inference System (FIS), which consists of three parts: fuzzification, inference (rule based) and de-fuzzification, as shown in Figure 1. Fuzzification is the process of converting a crisp input value, i.e. conventional numerical data, to a fuzzy value based on our knowledge or using the grade of a membership function (MF). De-fuzzification is the reverse process, i.e. fuzzy-to-crisp conversion. Fuzzy inference is the process of formulating the mapping from a given input to an output using fuzzy logic. The next part of the section deals with the basic theory of the subtractive clustering algorithm. The basic task of data clustering is to partition the whole data set into several groups or clusters, where the objects within a cluster are more alike than objects in other clusters. Different types of clustering algorithms, such as Fuzzy C-means clustering, K-means clustering, subtractive clustering and Gaussian (EM) clustering, are used in pattern recognition, image or signal analysis, information retrieval, bioinformatics, machine learning, etc.
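The fuzzification and de-fuzzification steps described above can be illustrated with a minimal sketch. It assumes Gaussian membership functions and a weighted-average de-fuzzifier; the fuzzy set names, centers, widths and evaporation levels are hypothetical values chosen only for illustration and are not taken from the paper.

```python
import math

def gaussian_mf(x, center, sigma):
    """Grade of membership of the crisp value x in a Gaussian fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def fuzzify(x, fuzzy_sets):
    """Crisp -> fuzzy: one membership grade per named fuzzy set."""
    return {name: gaussian_mf(x, c, s) for name, (c, s) in fuzzy_sets.items()}

def defuzzify_weighted_average(grades, output_levels):
    """Fuzzy -> crisp: average of the output levels weighted by the grades."""
    total = sum(grades.values())
    if total == 0.0:
        raise ValueError("no fuzzy set is activated by this input")
    return sum(grades[name] * output_levels[name] for name in grades) / total

# Hypothetical fuzzy sets for air temperature: name -> (center, sigma)
TEMPERATURE_SETS = {"low": (10.0, 5.0), "medium": (20.0, 5.0), "high": (30.0, 5.0)}
# Hypothetical crisp evaporation level associated with each fuzzy set
EVAPORATION_LEVELS = {"low": 2.0, "medium": 5.0, "high": 8.0}

grades = fuzzify(20.0, TEMPERATURE_SETS)
crisp_output = defuzzify_weighted_average(grades, EVAPORATION_LEVELS)
```

An input at the center of the "medium" set belongs fully to that set and only weakly, and equally, to its two neighbors, so the crisp output falls on the "medium" level.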
In this paper we use a basic subtractive clustering algorithm since it can be integrated with FIS quite easily. The subtractive clustering algorithm starts by assuming each data point in the dataset to be a potential centroid. Let us consider the data set $X = \{x_1, x_2, \ldots, x_n\}$. The mountain (density) function of the data point $x_i$ is
$$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{(r_a/2)^2}\right) \qquad (1)$$
where $r_a$ is the neighborhood radius. We have to select the data point $x_i^*$ whose mountain function provides the maximum value; this point becomes the first centroid. The magnitude of the mountain function of a potential centroid is reduced heavily if it is located near the first centroid. The expression of the mountain function is modified to eliminate the impact of the first centroid like [12] [13],
$$D_i^{\mathrm{new}} = D_i - D_{cen} \exp\left(-\frac{\lVert x_i - x_{cen} \rVert^2}{(r_b/2)^2}\right) \qquad (2)$$
Here, $x_{cen}$ is the first centroid, $D_{cen}$ is the density value of the first centroid, $D_i$ is the mountain function of the previous step and typically $r_b = 1.5 r_a$. The larger radius $r_b = 1.5 r_a$ causes the density value (magnitude of the mountain) of data points near the first centroid to be lower than that of data points farther away. This promotes the creation of clusters whose centroids are far from each other. Similarly, the data point with the greatest density value obtained from (2) is selected as the second centroid. The process is repeated as long as the largest remaining density value exceeds a threshold; the remaining data points are treated as ordinary points on the scatterplot.
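The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: it assumes the commonly used Gaussian density with radius r_a, subtraction with radius r_b = 1.5 r_a, data already scaled to [0, 1], and a simplified stopping rule (`stop_ratio`, a hypothetical parameter) that ends the search once the best remaining density falls below a fraction of the first peak. The demo points are invented.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def subtractive_clustering(points, r_a=0.5, squash=1.5, stop_ratio=0.15):
    """Return the indices of the points selected as cluster centroids.

    Assumes every coordinate is already scaled to [0, 1].
    stop_ratio is an assumed, simplified stopping threshold.
    """
    if not points:
        return []
    r_b = squash * r_a                      # typically r_b = 1.5 * r_a
    alpha = 1.0 / (r_a / 2.0) ** 2
    beta = 1.0 / (r_b / 2.0) ** 2
    # Initial density (mountain) value of every data point.
    density = [
        sum(math.exp(-alpha * squared_distance(p, q)) for q in points)
        for p in points
    ]
    first_peak = max(density)
    centroids = []
    while True:
        best = max(range(len(points)), key=lambda i: density[i])
        peak = density[best]
        if peak < stop_ratio * first_peak:
            break                           # remaining points are ordinary points
        centroids.append(best)
        # Remove the influence of the new centroid from every density value.
        density = [
            d - peak * math.exp(-beta * squared_distance(p, points[best]))
            for p, d in zip(points, density)
        ]
    return centroids

# Invented 2-D demo data: three points near (0.1, 0.1), four near (0.9, 0.9)
demo_points = [
    (0.10, 0.10), (0.12, 0.11), (0.09, 0.12),
    (0.90, 0.90), (0.88, 0.91), (0.91, 0.89), (0.89, 0.90),
]
demo_centroids = subtractive_clustering(demo_points)
```

Because a selected centroid has its own density reduced to zero, it cannot be chosen twice, and the loop ends after at most as many iterations as there are points.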

System Model
In this paper, our main objective is to combine two techniques, subtractive clustering and a Fuzzy system, to obtain a better prediction than either technique alone, as shown in Figure 2. We consider input data for six environmental variables: maximum soil temperature, average soil temperature, average air temperature, minimum relative humidity, average relative humidity and total wind. The prediction output of the system is daily evaporation, which is another environmental variable. The training data used in the system are shown in the appendix (Table 1).
The dataset is first fed to a subtractive clustering system to generate natural clusters among the data. The clustered data are then imported into the Fuzzy Inference System (FIS), where the model is trained. Once trained, the model accepts input variable values from the user and predicts the output with considerable accuracy. The FIS model relating the six input variables to the output is shown in Figure 3(a); each input variable possesses seven membership functions, shown in Figure 3(b), with each cluster found by subtractive clustering corresponding to one MF. In this paper we use Gaussian MFs, and in a practical FIS the MFs are not uniform like Figure 3
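One way to see how cluster centers can define a FIS is the following minimal sketch. It is a simplification under stated assumptions, not the authors' trained model: each cluster center contributes one rule, the antecedent of the rule is a product of Gaussian MFs centered on the input coordinates of that center, the consequent is simply the output coordinate of the center (a zero-order Sugeno-style rule), and the prediction is the firing-strength-weighted average. The function names, the two-input centers and the widths are hypothetical.

```python
import math

def firing_strength(x, center_inputs, sigmas):
    """Product of per-input Gaussian membership grades for one rule."""
    strength = 1.0
    for xi, ci, si in zip(x, center_inputs, sigmas):
        strength *= math.exp(-((xi - ci) ** 2) / (2.0 * si ** 2))
    return strength

def predict(x, rules, sigmas):
    """Weighted average of rule outputs.

    rules: list of (input_center, output_value), one per cluster center.
    """
    weights = [firing_strength(x, center, sigmas) for center, _ in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("input is too far from every cluster center")
    return sum(w * out for w, (_, out) in zip(weights, rules)) / total

# Hypothetical cluster centers for two normalized inputs and one output
demo_rules = [((0.2, 0.3), 4.0), ((0.8, 0.7), 10.0)]
demo_sigmas = (0.1, 0.1)   # assumed MF width for each input

at_first_center = predict((0.2, 0.3), demo_rules, demo_sigmas)
at_midpoint = predict((0.5, 0.5), demo_rules, demo_sigmas)
```

With seven cluster centers and six inputs, as in the model described above, `rules` would hold seven entries whose input centers each have six coordinates.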

Results
The profiles of the input and output variables against the index are shown in Figure 5(a) and Figure 5(b), respectively. The graphs reveal that the input and output are uncorrelated.
First, we modeled the relationship between the input and output variables by clustering the data, where the cluster centers are used as a basis to define a Fuzzy Inference System (FIS). Clustering is done here to obtain a concise representation of the relationships embedded in the data. We apply subtractive clustering with a radius of 0.5, which determines the appropriate number of clusters and the cluster centers, as shown in Figure 6. We obtained seven cluster centers for each input. The rule viewer of the FIS is shown in Figure 8, where each input has seven MFs because of the seven clusters of the scatter-plot. When the magnitude of the input data is varied, the output of Figure 8 changes accordingly; some numerical data relating input and output are shown in Table 2.
The output of the proposed model is compared with the ANN prediction, and the proposed model gives values closer to the target. The comparative results are shown graphically in Figure 9, and the percentage errors between "target and our prediction" and between "target and ANN prediction" are shown in Figure 10. Both Figure 9 and Figure 10 show the superiority of our prediction model over the ANN.
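The percentage-error comparison can be reproduced with a short sketch such as the one below. It assumes the error is the absolute difference relative to the target, expressed in percent; the numbers are invented for illustration and are not the values of Table 2 or Figure 10.

```python
def percentage_errors(targets, predictions):
    """Absolute percentage error of each prediction relative to its target."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if any(t == 0 for t in targets):
        raise ValueError("percentage error is undefined for a zero target")
    return [abs(t - p) / abs(t) * 100.0 for t, p in zip(targets, predictions)]

def mean_absolute_percentage_error(targets, predictions):
    """Average of the per-sample percentage errors (MAPE)."""
    errors = percentage_errors(targets, predictions)
    return sum(errors) / len(errors)

# Invented example values, for illustration only
demo_targets = [10.0, 8.0, 5.0]
demo_model_a = [9.0, 8.4, 5.5]    # hypothetical predictions of one model
demo_model_b = [8.0, 9.2, 6.0]    # hypothetical predictions of another model

mape_a = mean_absolute_percentage_error(demo_targets, demo_model_a)
mape_b = mean_absolute_percentage_error(demo_targets, demo_model_b)
```

The model with the lower value is, on average, closer to the target.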

Conclusion
In this paper, we take practical "evaporation of soil" as the target output and relate it to six environmental parameters. Our results show better accuracy than the previous ANN-based work. In future we will combine FIS with ANN to achieve higher accuracy than ANN alone. There is also scope for using SVM, the K-means clustering algorithm, Fuzzy C-means clustering, the Naïve Bayes classifier, CART, etc. with FIS for comparison in terms of accuracy and processing time.