Design of Hanman Entropy Network from Radial Basis Function Network

Different learning algorithms have been developed in the literature for training the radial basis function network (RBFN). In this paper, a new neural network named the Hanman Entropy Network (HEN) is developed from the RBFN based on Information set theory, which deals with the representation of possibilistic uncertainty in the attribute/property values, termed information source values. The parameters of both HEN and RBFN are learned using a new learning algorithm called JAYA, which solves constrained and unconstrained optimization problems and is free of algorithm-specific parameters. The performance of HEN is shown to be superior to that of RBFN on four datasets. The advantage of HEN is that it can use both the information source values and their membership values in several ways, whereas RBFN uses only the membership function values.


Introduction
Artificial neural networks (ANNs), which include back propagation (BP) networks [1], radial basis function networks (RBFNs) [2], and counter propagation networks [3], to mention a few, have shown their power in data classification, pattern recognition and function approximation. In this paper, we are mainly concerned with incorporating a new learning algorithm, called JAYA, into the architecture of the RBFN to mitigate the drawbacks of its gradient descent learning.
A radial basis function network (RBFN) [4] [5] is a three-layer feed-forward neural network. Each hidden layer neuron evaluates its kernel function on the incoming input. The network output is simply a weighted sum of the values of the kernel functions in the hidden layer neurons. The value of a kernel function is highest when the input falls on its center and decreases monotonically as the input moves away from the center. A Gaussian function is normally used as the kernel function. The training of an RBFN consists of finding the centers and the widths of the kernel functions and the weights connecting the hidden layer neurons to the output layer neurons.
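The kernel evaluation described above can be sketched as follows; the scaling parameter β is assumed here to be 1/(2σ²), a common convention not stated explicitly in the text:

```python
import numpy as np

def gaussian_kernel(x, center, beta):
    """Gaussian radial basis function value at input x.

    beta is the scaling parameter, assumed to be 1 / (2 * sigma**2)."""
    d2 = np.sum((np.asarray(x, float) - np.asarray(center, float)) ** 2)
    return float(np.exp(-beta * d2))

# The kernel peaks at 1 when the input coincides with the center
# and decays monotonically with squared Euclidean distance.
at_center = gaussian_kernel([1.0, 2.0], [1.0, 2.0], beta=0.5)  # 1.0
off_center = gaussian_kernel([2.0, 2.0], [1.0, 2.0], beta=0.5)
```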
Next, we will foray into the learning domain. Finding the global optimum of a function is the main task in many scientific applications. The gradient descent approach is widely used, but it suffers from local minima. Another limitation is that it cannot be used in optimization problems that have non-differentiable objective functions. Many modern population-based heuristic algorithms focus on finding a near-optimum solution to overcome this requirement of differentiability associated with gradient descent learning. All the evolutionary and swarm intelligence based algorithms are probabilistic algorithms that require common controlling parameters like population size, number of generations, elite size, etc. Besides the common control parameters, different algorithms require their own algorithm-specific control parameters. The paper on a recent meta-heuristic learning method called Human Effort for Achieving Goals (HEFAG), by Jyotsana and Hanmandlu [6], contains a comparison of several learning methods. A new learning algorithm called JAYA is developed in [7] to overcome the need for the algorithm-specific parameters, though the need for the common control parameters remains. This algorithm moves the initial solutions towards the best solution while avoiding the worst solution.

Design of RBFN
For a detailed study of artificial neural networks (ANNs) and fuzzy systems and their applications, readers may refer to Jang et al. [8]. The Multilayer Perceptron (MLP) was a major leap in ANNs, and the RBFN arose out of simplifying the computational burden involved in the MLP; hence it is widely used [9] for traditional classification problems. A comparison between the traditional neural networks and RBFN is presented in [10].
The RBFN deals with attribute/feature values that are clustered. The attribute values in a cluster are fitted with the radial basis function, which is another name for the Gaussian function. The RBFN fuzzifies the attribute values in a cluster into membership function values. Each RBFN neuron stores a cluster centre or centroid, which is initially taken to be one of the samples from the training set. To classify a new input, each neuron computes the Euclidean distance between the input and its centroid and evaluates the membership function using the standard deviation or width of the Gaussian function. The output of the RBFN is a weighted sum of the membership function values, as shown in Figure 1, in which μ_i denotes the i-th membership function (MF) of a neuron. The MF vector is of size k; each value of this vector is multiplied with the output weight and the products are summed to get the computed output.

The Derivation of the Model of RBFN
We will derive the input-output relation underlying the architecture of the RBFN in Figure 1, in which prototype refers to the cluster centre. This architecture has two phases: the first phase is fuzzification and the second phase is regression. For the fuzzification, let us assume a cluster consisting of feature vectors of dimension k. Let the i-th feature X_ui of the vector X_u be fuzzified using the i-th membership function μ_i, where u stands for the u-th input vector-output pair. Thus we have k fuzzy sets, and as many neurons as the number of input feature values. No equation is required for this phase. In the regression phase we employ the Takagi-Sugeno-Kang fuzzy rule [8] on the k input fuzzy sets and one output as:

If X_u1 is A_1 and X_u2 is A_2 and ... and X_uk is A_k then Y_u = Σ_{i=1}^{k} b_i P_ui

This equation is valid if there is one class. We now extend it to the multi-class case. We feed the input vector X_u of size k, and the neurons compute the membership function values P_uj. Let the number of classes be c. The regression equation that computes the outputs Y_l in the multi-class case is framed as:

Y_l = Σ_{i=1}^{k} w_il P_ui,  l = 1, ..., c

where we have replaced the weight vector {b_i} by the weight matrix {w_il} to account for multiple classes. This is the governing equation for the architecture in Figure 1.
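The multi-class regression phase amounts to a matrix product of the membership values and the output weights. A toy sketch, with illustrative values not taken from the paper:

```python
import numpy as np

# Illustrative membership matrix P (2 samples x 3 neurons) and
# weight matrix W (3 neurons x 2 classes); the numbers are made up.
P = np.array([[0.9, 0.1, 0.2],
              [0.2, 0.8, 0.3]])
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
# Y[u, l] = sum_i w_il * P_ui -- the multi-class regression output
Y = P @ W
```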

Procedure for Learning of Weights
Consider first the problem of Iris flower recognition, in which we have 4 features; that is, each feature vector is 4-dimensional. Assume that these are clustered into, say, 3 clusters, each containing some number of feature vectors. According to fuzzy set theory, we can form 4 fuzzy sets in each cluster corresponding to the four features. Each fuzzy set is defined by its attribute values and their membership function values. As we are using clustering, we can obtain mean values as well as scaling factors that are functions of the variances involved in the MFs of the fuzzy sets resulting from clustering. Our attempt is to focus on the learning of weights. Let us assume that we feed each feature vector of a cluster. Then four neurons convert the four feature values of the feature vector into four membership function values, which are then summed up. As we have assumed three clusters for one class (each flower type of Iris), this procedure is repeated on all feature vectors of the remaining two clusters. By this, we get three sums which are multiplied with three weights (i.e. forming one weight vector); the weighted sum is the computed output that represents one class.
The above procedure is repeated for the other two classes, and the two weight vectors so obtained correspond to the remaining two flower types. In all, there are three weighted sums called the computed outputs. In this paper, we are concerned with one cluster per class for simplicity.

Training of RBFN
The training process for the RBFN consists of finding three sets of parameters: the centroids of the clusters, the scaling parameters for each of the neurons of the RBFN, and the set of output weight vectors between the neurons and the output nodes of the RBFN. The approaches for finding the centroids and their variances are discussed next.

Cluster Centroids
The possible approaches for the selection of cluster centroids are: random selection of centroids, a clustering based approach, and Orthogonal Least Squares (OLS). We have selected K-means clustering for the computation of the centroids of clusters, or cluster centres, from the training set. Note that this clustering method partitions n observations into K clusters such that each observation belongs to the cluster whose centre is closest to its value.
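A minimal sketch of K-means (Lloyd's algorithm); this illustrative version is not the exact implementation used in the paper:

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Minimal Lloyd's algorithm: partition the rows of X into K clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign every observation to its nearest centroid
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        # recompute each centroid as the mean of its assigned points
        new = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Two well-separated groups of points recover their group means.
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
centers, labels = kmeans(X, 2)
```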

Scaling Parameters
Armed with the centroid of each cluster, the variance is computed as the average squared distance between all points in the cluster and the centroid:

σ_i² = (1/m) Σ_{u=1}^{m} ||X_ui − C_i||²

Here, C_i is the centroid of the i-th cluster, m is the number of training samples belonging to this cluster, and X_ui is the u-th training sample in the i-th cluster. Next, σ_i² is used to compute the scaling parameter, denoted by β_i.
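The computation above can be sketched as follows; the convention β_i = 1/(2σ_i²) for the scaling parameter is an assumption, since the text only states that it is a function of the variance:

```python
import numpy as np

def scaling_parameter(points, centroid):
    """sigma_i^2 as the mean squared distance of a cluster's points to its
    centroid; beta_i = 1 / (2 * sigma_i^2) is an assumed convention."""
    sigma2 = ((points - centroid) ** 2).sum(axis=1).mean()
    return sigma2, 1.0 / (2.0 * sigma2)

sigma2, beta = scaling_parameter(np.array([[0.0, 0.0], [2.0, 0.0]]),
                                 np.array([1.0, 0.0]))
```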

Output Weights
In the literature, there are two popular methods for the determination of the output weights: a learning method called gradient descent [11] and a computational method called pseudo inverse [12] [13]. As gradient descent learning has problems of slow convergence and convergence to local minima, we embark on a new learning algorithm called JAYA. Prior to using JAYA for learning the parameters of the RBFN, we discuss how the weights can be determined by the pseudo-inverse (PINV) method. Consider an input vector, which is generally a feature vector of some dimension n. When all the feature vectors are clustered, we have C clusters (note that c denotes the number of classes). In some datasets, such as the Iris dataset, we can easily separate out all the feature vectors belonging to each class of one flower type. Thus the feature vectors belonging to a class form a cluster. Out of these feature vectors, some are selected for training and the rest for testing.
Let {X_u}, u = 1, ..., m, be the set of feature vectors, each of size n. Let P_uj denote the value of the membership function of the j-th radial basis function μ_j for the u-th feature vector, X_uj the j-th component of the feature vector X_u, and Z_ul the l-th target output. Note that this formulation is meant for one cluster per class. After the fuzzification of X_uj into P_uj, we can form the matrix Q = [P_uj] as per Equation (3). The solution follows from the assumption Y = QW = Z, which leads to W = Q⁺Z, where Q⁺ denotes the pseudo inverse matrix of Q, defined as follows:

Q⁺ = (QᵀQ + γ I_k)⁻¹ Qᵀ

where I_k is the k-dimensional identity matrix and γ is a small positive constant.
With the pseudo inverse providing the weights at the output layer, all the parameters of the RBFN with its 3-layered architecture in Figure 1 can be determined.

Learning of the Output Weights by JAYA
We will now discuss the JAYA algorithm to be used for learning the parameters of the RBFN.

Description of the JAYA Algorithm
JAYA is a simple and powerful learning method for solving constrained and unconstrained optimization problems. As mentioned above, the JAYA algorithm is an offshoot of the Teaching-Learning Based Optimization (TLBO) algorithm proposed in [14] [15]. It needs only the common controlling parameters like population size and number of iterations; the guidelines for fixing these parameters can be seen in [15]. Here we have fixed the population size as 10 and the number of iterations as 3000.
Let f(W, Q, Z) be the objective function to be minimized. Let the best candidate be the one associated with the least value of the function (i.e. f_best(W, Q, Z)) and the worst candidate be the one with the highest value of the function (i.e. f_worst(W, Q, Z)) among all the candidate solutions. We choose B to stand for the weights W when the cluster centres and scale parameters are found separately. In case JAYA is used to learn all the parameters, B includes the cluster centres, scaling parameters and the output weights, i.e. B = [C, β, W]. At any run of the algorithm, assume that there are j design variables, k candidate solutions and i iterations. To fit B into the JAYA algorithm, it is denoted by B_{j,k,i}, the value of the j-th variable of the k-th candidate during the i-th iteration.
This value is updated during the iteration as

B'_{j,k,i} = B_{j,k,i} + r_{1,j,i} (B_{j,best,i} − |B_{j,k,i}|) − r_{2,j,i} (B_{j,worst,i} − |B_{j,k,i}|)

where r_{1,j,i} and r_{2,j,i} are random numbers in [0, 1], and B_{j,best,i} and B_{j,worst,i} are the values of the j-th variable for the best and worst candidates. The updated candidate B'_{j,k,i} is accepted if its function value is better than that of B_{j,k,i}. All the accepted function values at the end of an iteration are retained, and these values become the input to the next iteration. The flowchart of the JAYA algorithm is shown in Figure 2. Unlike the TLBO algorithm, which has two phases (teacher and learner), JAYA has only one phase and is comparatively simpler to apply. Rao et al. [16] have used the TLBO algorithm in machining processes. A tea-category identification (TCI) system is developed in [17]; it uses a combination of the JAYA algorithm and fractional Fourier entropy on three images captured by a CCD camera. In two studies involving heat transfer and pressure drop, i.e. thermal resistance and pumping power, two objective functions are used to ascertain the performance of a micro-channel heat sink. Multi-objective optimization aspects of plasma arc machining (PAM), electro-discharge machining (EDM), and micro electro-discharge machining (μ-EDM) processes are investigated in [18]; these processes are optimized by solving the multi-objective optimization problems of machining with the MO-JAYA algorithm.
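The update rule above can be sketched as a small optimizer. The bounds and the sphere objective below are illustrative choices, not the paper's settings:

```python
import numpy as np

def jaya(objective, bounds, pop=10, iters=500, seed=0):
    """JAYA: move each candidate toward the best solution and away
    from the worst, accepting a move only if it improves the objective."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    X = rng.uniform(lo, hi, size=(pop, lo.size))
    f = np.array([objective(x) for x in X])
    for _ in range(iters):
        best, worst = X[f.argmin()], X[f.argmax()]
        r1 = rng.random(X.shape)
        r2 = rng.random(X.shape)
        # B' = B + r1*(best - |B|) - r2*(worst - |B|), clipped to bounds
        Xn = np.clip(X + r1 * (best - np.abs(X)) - r2 * (worst - np.abs(X)),
                     lo, hi)
        fn = np.array([objective(x) for x in Xn])
        better = fn < f              # greedy acceptance
        X[better], f[better] = Xn[better], fn[better]
    return X[f.argmin()], f.min()

# Minimizing the sphere function as an illustrative objective.
x_best, f_best = jaya(lambda x: float(np.sum(x * x)), [(-5, 5)] * 3)
```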
There are three learning parameters, viz. the cluster centers C_i, the scaling parameters β_i, and the output weights W between the hidden and output layers. The learning of these parameters is depicted in Figure 3. The first parameter is found using the K-means clustering algorithm.
We make use of the JAYA algorithm for learning the second parameter, β_i. The weights are learned by optimizing the objective function using the JAYA algorithm. The RBFN model so obtained can then be used for both classification and function approximation.

Design of Hanman Entropy Network
As the RBFN is not geared up to take care of the uncertainty in the input, which may be an attribute or property value, we will make use of Information set theory, described next.

Definition of Information Set and Generation of Features
Consider a fuzzy set constructed from the feature values {X_uj}, termed the information source values, and their membership function values, which we take as the Gaussian function values {P_uj}. If the information source values do not fit the Gaussian function, we can choose any other mathematical function to describe their distribution. Thus each pair (X_uj, P_uj), consisting of an information source value and its membership value, is an element of the fuzzy set. P_uj gives the degree of association of X_uj, and the sum of the P_uj values doesn't provide the uncertainty associated with the fuzzy set. In the fuzzy domain, only P_uj is used in all applications of fuzzy logic, thus ignoring X_uj altogether. This limitation is eliminated by applying the information theoretic entropy function called the Hanman-Anirban function [22] to the fuzzy set. This function combines the pair of values X_uj and P_uj into a product termed the information value:

H = (1/n) Σ_j X_uj e^{−(a X_uj³ + b X_uj² + c X_uj + d)}

where a, b, c and d are real-valued parameters. In this equation the normalization by n is not needed, as the number of attributes is very small (less than 10 in the databases used), but it is needed if the value of H exceeds 1.
With the choice of parameters a = 0 and b, c, d selected so that the exponential gain function equals the Gaussian membership function P_uj, the information value becomes the product H_uj = X_uj P_uj; for the details, readers may refer to [19] and [21] respectively.
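Under the Gaussian-membership reading above, the information values are element-wise products of source values and memberships; a minimal sketch, in which the center and sigma arguments are illustrative:

```python
import numpy as np

def information_values(X_u, center, sigma):
    """Information values H_uj = X_uj * P_uj with a Gaussian
    membership function (center and sigma are illustrative)."""
    P = np.exp(-((X_u - center) ** 2) / (2.0 * sigma ** 2))
    return X_u * P

H = information_values(np.array([1.0, 2.0]), center=1.0, sigma=1.0)
```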

The Hanman Transform and its Link to Intuitionistic Set
The Hanman transform is a higher form of the information set. To derive this transform, we consider the adaptive form of the Hanman-Anirban entropy function, in which the parameters in the exponential gain function are taken to be variables. Taking the gain to be a function of the membership values gives the Hanman transform

H_u = Σ_j X_uj e^{−P_uj}  (20)

This is a useful relation because it can be used to make the RBFN adaptive by changing the membership function. At this juncture we can make an interesting connection between the modified membership function in Equation (19) and the hesitancy function in the Intuitionistic fuzzy set [23]. The hesitancy function is defined as

h_uj = 1 − P_uj − P_ucj

where P_ucj = 1 − P_uj is the complement of P_uj. The hesitancy function reflects the uncertainty in the modeling of P_uj and P_ucj. As Equation (19) bestows the way to evaluate P_uj, we can use the new values of P_uj and P_ucj to determine the updated value of h_uj. We can use this hesitancy function for the design of a new network in the future.
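These two quantities can be sketched as follows; the form H_uj = X_uj e^{−P_uj} for the transform is an assumed reading, since the adaptive form in Equations (19)-(20) is not fully specified here:

```python
import numpy as np

def hanman_transform(X, P):
    """Assumed form of the Hanman transform: information source values
    evaluated with their membership values as the exponential gain."""
    return X * np.exp(-P)

def hesitancy(P, Pc):
    """Intuitionistic hesitancy: 1 - membership - non-membership."""
    return 1.0 - P - Pc
```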

Properties of Information Set
We will now present a useful property of the Information set: the fuzzy rules can be easily aggregated using the information set concept.

The Architecture and Model of Hanman Entropy Network
We will now discuss the architecture of HEN in Figure 4.
The architecture of HEN is the same as that of the RBFN except for the function φ_i, which assumes the specified form of an entropy function of the input. In HEN, each n-input vector needs to be categorized into one of c classes. The fuzzy rule is:

If X_u1 is A_1 and X_u2 is A_2 and ... and X_un is A_n then Y_u = b_v0 + Σ_j b_vj X_uj  (23)

As mentioned above, at the fuzzification phase we replace b_vj with P_vj and set P_v0 to 0 in (23) to get the neuron output as:

φ_v = Σ_j P_vj X_uj  (24)

On substituting H_uj for the information values in (24) we get:

φ_v = Σ_j H_uj  (25)

Thus the fuzzification phase in HEN is different from that of the RBFN. In HEN the input feature vectors of size n are clustered into k clusters, but in the RBFN there is a single cluster for each feature of a feature vector. Each neuron in the RBFN of Figure 1 has only one radial basis function, whereas each neuron in HEN has k radial basis functions in Figure 4. So the k sums φ_1 to φ_k in Equation (25) are multiplied with the corresponding weights w_1l to w_kl to yield the l-th output Y_l in the regression phase as follows:

Y_l = Σ_{v=1}^{k} w_vl φ_v  (26)

The subscripts MH, S, T and Sh indicate the Mamta-Hanman, Sigmoid, Hanman transform and Shannon transform forms respectively. Note that the RBFN simply responds to the pattern in the input vector, whereas the Hanman entropy network responds not only to the pattern but also to the uncertainty associated with it.
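A possible reading of the fuzzification and regression phases described above as one forward pass; the Gaussian membership and the shapes used are assumptions:

```python
import numpy as np

def hen_output(X_u, centers, sigmas, W):
    """Sketch of one HEN forward pass: each of the k neurons sums the
    information values X_ui * P_ji over the n features to give phi_j,
    and the class outputs are weighted sums of the k neuron outputs."""
    phi = np.array([np.sum(X_u * np.exp(-((X_u - c) ** 2) / (2.0 * s ** 2)))
                    for c, s in zip(centers, sigmas)])
    return phi @ W   # Y_l = sum_v w_vl * phi_v

# Illustrative 2-feature input, 2 neurons, identity output weights.
X_u = np.array([1.0, 0.0])
centers = [np.array([1.0, 0.0]), np.array([0.0, 0.0])]
Y = hen_output(X_u, centers, sigmas=[1.0, 1.0], W=np.eye(2))
```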

Results of Case Studies
The experimentation is conducted in two phases on four datasets: IRIS, Wine and Waveform from the UCI repository, and the Signature dataset [24]. In the first phase we deal entirely with the performance analysis of the RBFN, and in the second phase only with that of the Hanman Entropy Network (HEN). We have split our computations into two cases. In Case-1, which is applicable only to the RBFN, the learning/computation of the output weights is delinked from the computation of the centroids and scaling parameters. We have employed two learning methods, the Genetic Algorithm (GA) and Gradient Descent (GD), and one computational method, the Pseudo Inverse (PINV), for the weights, and K-means clustering for the centroids and scaling parameters.
There is another combination, JAYA + PINV + RBFN, wherein JAYA is used for learning the scaling parameters and PINV is used for computing the output weights. Of course, the centroids are found by K-means clustering. The results of these combinations are given in Table 1; the last combination gives the best results. WAVEFORM dataset: this dataset consists of 5000 samples, with each sample comprising 22 attributes. Each class is generated from a combination of 2 out of 3 "base" waves, and each instance is generated by adding noise (mean 0, variance 1) to every attribute. The RBFN classifies 4350 instances correctly out of 5000, an accuracy of 87%, with GA. The weights of the RBFN computed using GD and PINV yield the best accuracies of 87.4% and 87.5% respectively, but with the JAYA + RBFN combination the accuracy comes down to 85.8%, as shown in Table 2.
As can be seen from the results, the efficiency of the classification task increases when we use the JAYA algorithm even for learning the weights of the network, in comparison to learning only the scaling parameters of the membership function. When the concept of the information set is incorporated into our approach, the output is computed as Weights * Information values, i.e. W * φ_l. If the parameter vector B also includes the centroids and the scaling parameters in addition to the weights, then these parameters modify P_u indirectly, and we write the output as φ(B). The results are given in Table 3. The power of the Hanman Entropy network can only be realized when the dataset is very large.
On conducting tests on three datasets as shown in Table 4, we find that the performance of JAYA + RBFN combination is somewhat inferior to that of JAYA + HEN combination on three datasets (Iris, Wine and Waveform) but the performance of Multi-layer perceptron (MLP) network is the worst. The use of high level information set features may help improve the performance of JAYA + HEN.

Conclusions
In this paper, not only is the performance of the Radial Basis Function Network (RBFN) improved by learning its parameters with a new evolutionary method called JAYA, but the design of the Hanman Entropy Network is also given, based on the Hanman-Anirban entropy function. Of all the combinations of RBFN with GA, GD, PINV, MLP and JAYA, JAYA + RBFN gives the best results. The proposed Hanman Entropy Network (HEN) along with JAYA outperforms this combination on all the datasets considered in this paper.
As HEN is based on information set theory, which caters to uncertainty representation, there is much flexibility in the choice of information forms. This advantage is missing in the RBFN, where only the membership function values rule the roost. The only silver lining with the RBFN is that we can use Type-2 fuzzy sets, in which the membership function values can be varied by changing the variance parameter of the Gaussian membership function.
The present work opens up different directions to change the information at the hidden neurons of HEN.