Ensemble Neural Network in Classifying Handwritten Arabic Numerals

A method is proposed to classify handwritten Arabic numerals in compressed form using a partitioning approach, the Leader algorithm and a neural network. Handwritten numerals are represented in matrix form. Merging adjacent pairs of rows of the matrix with a logical OR operation halves its size. Treating each row as a partition, clusters are formed separately for the same partition of the same digit. The leaders of these clusters are used to recognize the patterns by a divide-and-conquer approach using the proposed ensemble neural network. Experimental results show that the proposed method recognizes 98.6% of the test patterns correctly.


Introduction
Handwritten digit recognition has received remarkable attention in the field of character recognition. To meet industry demands, handwritten digit recognition systems must have good accuracy, acceptable classification times, and robustness to variations in handwriting style. Currently, several approaches reach competitive accuracy, including those based on multilayer neural networks [1] [2], support vector machines [3] and the nearest neighbor method [4]. Neural networks require a huge amount of training data and time to train effective models, but their feedforward nature makes them very efficient at runtime.
Clustering is a well-known task in data mining and pattern recognition that organizes a set of objects into groups such that similar objects belong to the same cluster and dissimilar objects belong to different clusters [5]. Mitra et al., [6] have provided a survey of the available literature on data mining using soft computing. Neural networks are nonparametric and robust, and they exhibit good learning and generalization capabilities in data-rich environments. Run-length encoding (RLE) is a way of representing a binary image using runs, where a run is a sequence of "1" pixels. Ravindra Babu et al., [7] have represented binary data as run-length encoded data, which leads to a compact, compressed representation. They have also proposed an algorithm to compute the Manhattan distance directly between two such encoded binary patterns, and have shown that classifying data in this compressed form reduces computation time. Akimov et al., [8] have considered lossless compression of digital contours in map images. They attack the problem with context-based statistical modeling and entropy coding of the chain codes.
Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, which limits their suitability for mining large data sets. Monu Agrawal et al., [4] have proposed a strategy to reduce the time and memory requirements of handwritten recognition by applying prototyping as an intermediate step in the synthetic pattern generation technique. Vijayakumar et al., [9] have proposed a novel algorithm for recognizing handwritten digits by classifying the digits into two groups: one consists of digits with blobs, with or without stems, and the other consists of digits with stems only. The blobs are identified by a morphological region filling method, which eliminates the problem of finding the size of the blobs and their structuring elements. Digits with blobs and stems are identified by a new concept called connected component, which eliminates the complex process of recognizing horizontal and vertical lines and concavity. Chen et al., [10] have described the effect of using a large number of artificial patterns to train an online Japanese handwritten character recognizer. They have constructed distortion models to generate a large number of artificial patterns and applied these patterns to train a character recognizer.
Park and Lee [11] have presented an efficient scheme for off-line recognition of large-set handwritten characters in the framework of stochastic models and first-order hidden Markov models. Dhandra et al., [12] have proposed a script-independent automatic numeral recognition system that extracts local and global structural features such as directional density estimation, water reservoirs, maximum profile distances and fill-hole density. A probabilistic neural network classifier is used in the recognition system. Sarangi et al., [13] have addressed the performance of the Hopfield neural network model in recognizing handwritten Oriya digits. Meier et al., [3] have proposed a new method to train the members of a committee of single-hidden-layer neural nets. Instead of training the nets on different subsets of the training data, they have processed the training data for each individual model such that the corresponding errors are de-correlated. Having trained "n" networks, they have used three different methods, namely Majority Voting Committee, Average Committee and Median Committee, to build the corresponding committee of networks. Noor et al., [14] have proposed a system to recognize Arabic (Indian) numerals using Fourier descriptors as the main classifier feature set, with a simple structure-based classifier added as a supplementary classifier to improve the recognition accuracy. Patel et al., [15] have tackled the problem of handwritten character recognition with a multiresolution technique using the Discrete Wavelet Transform and the Euclidean distance metric. Rajashekararadhya et al., [16] have extracted features based on zones of images and used support vector machines to recognize numerals of Kannada, Telugu, Tamil and Malayalam that occur mixed within documents. Asthana [17] has addressed the problem of identifying a PIN written in multiple scripts, namely Devnagri, English, Urdu, Tamil and Telugu, using a neural network. A.A. Fatlawi et al., [18] have compared three different neural classifiers for graffiti recognition.
In this paper, handwritten Arabic numerals are recognized by partition, compression, cluster and ensemble neural network methods. The novelty of the method is that cluster representatives are recognized by the proposed ensemble network. The rest of the paper is organized as follows. The ensemble neural network model is introduced in Section 2. In Section 3, a training method is given for the neural network. An alternative method to recognize handwritten Arabic numerals is proposed in Section 4. In Section 5, the training procedure of the proposed method is given. Experimental results of the proposed work are given in Section 6.

Ensemble Neural Network
The proposed network has a group of single-hidden-layer feedforward neural networks connected to a layer called the classifier layer. Each single-hidden-layer network has an input, a hidden and an output layer, as shown in Figure 1. The input layer is linear, but the hidden and output layers are non-linear. The activation function used in the hidden and output layers is sigmoidal.

Neural Network Training
Decide the topology of the single-hidden-layer feedforward network used to train the partitioned patterns for classification. For q partitions of the compressed matrix, select q networks. Input the patterns one by one through the corresponding neural network. Find the outputs of the hidden and output layer neurons using (2) and (3). Find the error of the network using (4). Update the network weights using (5), (6) and (7). Input the patterns again through the network corresponding to each partitioned pattern, and repeat the above process until the network reaches the predefined accuracy.
where $m$ represents the number of neurons in layer $l - 1$. The error of the network 1 p E is calculated as follows: ( ) ( ) where u net =

Partitioning
Handwritten digits are represented in matrix form. The rows of a digit are taken in non-overlapping pairs. The matrix is compressed to half its size by applying a logical OR operation to the bits in the same column of each pair of rows. Each row of the compressed matrix is considered a new pattern. The pattern of each digit is thus partitioned into several individual patterns, one per row of the compressed matrix.
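The compression step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `compress_rows` is hypothetical, and a toy 4 × 3 matrix stands in for the 16 × 12 digits used in the paper.

```python
def compress_rows(matrix):
    """Merge non-overlapping pairs of adjacent rows with a bitwise OR."""
    if len(matrix) % 2 != 0:
        raise ValueError("expected an even number of rows")
    compressed = []
    for i in range(0, len(matrix), 2):
        top, bottom = matrix[i], matrix[i + 1]
        # OR the two bits that share a column.
        compressed.append([a | b for a, b in zip(top, bottom)])
    return compressed


# Toy 4 x 3 binary matrix (assumption: the real digits are 16 x 12).
digit = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [1, 0, 0],
]
partitions = compress_rows(digit)
print(partitions)  # [[0, 1, 1], [1, 0, 0]]
```

Each row of the result is one partition, so a 16 × 12 digit would yield 8 partitions of 12 bits each.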

Clustering with Leader
Each row of a digit, taken as a bit pattern, is a member of a group. The rows are clustered based on a distance measure. Considering all patterns of a particular digit, clusters are formed separately for each partition of that digit.
Clusters are formed similarly for every digit. Clustering with the leader concept is used to group meaningful patterns so as to improve classification accuracy with minimum input-output operations. In this method [19], the first pattern is treated as a cluster leader. Each remaining pattern is compared with the leaders of the existing clusters and is assigned to the cluster whose leader is at minimum distance. If the distance between the pattern and that leader is greater than a predefined threshold, the pattern becomes the leader of a new cluster. The distance between two patterns is computed by the Manhattan formula as follows:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
where $x$ and $y$ are two patterns of $n$ bits each.
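The clustering procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the names `manhattan` and `leader_clusters` are hypothetical, ties between equally near leaders go to the earliest leader, and a pattern at exactly the threshold distance joins the existing cluster (the text only says that a greater distance starts a new one).

```python
def manhattan(a, b):
    """Manhattan distance between two equal-length bit patterns."""
    return sum(abs(x - y) for x, y in zip(a, b))


def leader_clusters(patterns, threshold):
    """Single-pass Leader clustering; returns (leaders, members)."""
    leaders = []   # one representative pattern per cluster
    members = []   # members[k] holds the patterns assigned to leaders[k]
    for pattern in patterns:
        if leaders:
            dists = [manhattan(pattern, leader) for leader in leaders]
            nearest = min(range(len(leaders)), key=dists.__getitem__)
            if dists[nearest] <= threshold:
                members[nearest].append(pattern)
                continue
        # No leader is close enough: the pattern starts a new cluster.
        leaders.append(pattern)
        members.append([pattern])
    return leaders, members


# Toy 4-bit rows (assumption: the real partitions have 12 bits).
rows = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
leaders, members = leader_clusters(rows, threshold=2)
print(leaders)  # [[1, 1, 0, 0], [0, 0, 1, 1]]
```

Only the leaders would be passed on to training, which is what reduces the number of training patterns.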

Classification
All leaders are used for training the feedforward neural networks. The standard backpropagation algorithm is used for training. A separate neural network is trained on the cluster leaders of each partition. After each neural network has been trained individually, the outputs are sent to the classifier layer of the proposed ensemble network. The classifier-layer neuron $D_j$ that receives the maximum value is the winner, and $j$ is the digit of the input pattern.
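The decision made by the classifier layer can be sketched as below. This is an illustration only: it assumes that the net value of each classifier neuron is the plain sum of the corresponding output neurons (all connection weights are 1), and the function name `classify` and the toy output values are hypothetical.

```python
def classify(network_outputs):
    """Sum the j-th output of every network and return the winning index."""
    n_classes = len(network_outputs[0])
    # Net value of classifier neuron D_j with unit connection weights.
    net = [sum(out[j] for out in network_outputs) for j in range(n_classes)]
    digit = max(range(n_classes), key=net.__getitem__)
    return digit, net


# Toy example: 3 networks with 4 outputs each
# (assumption: the paper uses 8 networks with 10 outputs each).
outputs = [
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.6, 0.3, 0.05, 0.05],
]
digit, net = classify(outputs)
print(digit)  # 1
```

In this toy case the third network alone would vote for class 0, but the summed evidence from all partitions favors class 1, which is the point of combining the per-partition networks.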

Algorithm
Step 1. Convert the 192 bits of a numeral into a 16 × 12 matrix and treat the 193rd bit as the target value for the 16 rows of the digit.
Step 2. Repeat Step 1 for all training patterns of the problem.
Step 3. Apply the logical OR operation to the bits in each column of two adjacent rows, without row overlapping. The 16 × 12 matrix becomes 8 × 12.
Step 4. Treat every resultant pattern as having 8 partitions.
Step 5. Form clusters for every partition of each digit separately.
Step 6. Train each neural network individually using the standard backpropagation algorithm, considering every leader as a pattern.
Step 7. Represent the pattern to be classified as a 16 × 12 matrix.
Step 8. Compress each pair of adjacent rows using the logical OR operation.
Step 9. Input the first row into the first neural network of the ensemble network, the second row into the second neural network, and similarly the other rows of the compressed matrix.
Step 10. Find the neuron in the classifier layer that is in the "ON" state.
Step 11. Conclude that the pattern belongs to the class given by the position of the neuron that is in the "ON" state.
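Step 1 of the algorithm can be illustrated with a short Python sketch. It assumes that the 192 bits are stored row by row (row-major order), which the text does not state, and that the 193rd value holds the digit label; `parse_record` is a hypothetical helper name.

```python
def parse_record(values, rows=16, cols=12):
    """Split a 193-value record into a rows x cols matrix and labeled rows."""
    if len(values) != rows * cols + 1:
        raise ValueError("expected %d values" % (rows * cols + 1))
    pixels, label = values[:-1], values[-1]
    # Assumption: the bits are laid out in row-major order.
    matrix = [pixels[r * cols:(r + 1) * cols] for r in range(rows)]
    # Every row inherits the digit label as its target value.
    row_patterns = [(row, label) for row in matrix]
    return matrix, row_patterns


# Synthetic record for illustration: two set bits and the label 7.
record = [0] * 192 + [7]
record[0] = 1    # first bit of row 0
record[13] = 1   # second bit of row 1
matrix, row_patterns = parse_record(record)
print(len(matrix), len(matrix[0]), row_patterns[1][1])  # 16 12 7
```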

Experimental Results
The proposed method is applied to the OCR handwritten digit data [20].

Figure 1. Neural network for partition of a digit.

Each of these networks uses sigmoidal hidden and output layers, and the number of such networks in the ensemble equals the number of partitions of the matrix. Each output layer has 10 neurons, which represent the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. These single-hidden-layer networks are combined into an ensemble network by adding one more layer, called the classifier layer. The output layers of all the feedforward neural networks are connected to the classifier layer, which has 10 neurons $D_j$, $j = 0, 1, \dots, 9$. The first output neuron of every network is connected to the first neuron of the classifier layer, the second output neuron of every network to the second neuron of the classifier layer, and similarly each remaining output neuron to the corresponding neuron of the classifier layer. All connections between the output layers of the networks and the classifier layer have weight 1. The net values of the classifier-layer neurons are calculated using (2). The output of the classifier layer is digit $m$ if the $m$-th neuron of the layer has received the maximum net value, as given by (1). The architecture of the ensemble network is shown in Figure 2.

$$\text{Digit} = \operatorname{maximum}(D_0, D_1, D_2, D_3, D_4, D_5, D_6, D_7, D_8, D_9) \qquad (1)$$

where $h - 1$ represents the input layer, $h$ represents the hidden layer, $n$ represents the number of neurons in the output layer, $P$ represents the number of patterns, and the weights are updated by
The novelty of this work is the recognition of digits through an ensemble neural network. Each digit is converted into matrix form and then compressed using the logical OR operation. The compressed matrix is partitioned by rows into individual patterns. Clusters are formed for each partition of the digits using the Leader algorithm. Only the cluster leaders are used for training. The ensemble has 8 neural networks because the compressed matrix has 8 rows. Training a neural network takes time, but testing takes negligible time; this is an advantage over a KNN classifier, which takes a long time to classify. A drawback of backpropagation is that it can get trapped in local minima, and considerable time is needed to find a learning parameter value that gives convergence while avoiding local minima. Because each partition is trained with its own network, training is faster.

The training set has 667 patterns per class. In total, 6670 patterns, each with 193 bits, are used for training, and 3330 patterns, each with 192 bits, are used for testing. The last bit of a training pattern represents its target class. The experiment is carried out using MATLAB on an Intel quad-core system. Every pattern is converted into 16 patterns, each with the digit as its target value. After compression by the logical OR operation, the 16 patterns reduce to 8 patterns. Clusters are formed among the patterns of each partition of a digit separately. With a distance threshold of 2, the total number of clusters formed is 2216. Because the compressed matrix has 8 rows, 8 neural networks are trained. Each network has 13 input neurons including the bias, 6 hidden neurons and 10 output neurons. The classifier layer of the ensemble network has 10 neurons, as there are 10 Arabic digits. The number of hidden neurons is selected by trial and error. The termination condition for training is a mean squared error of 0.001. The number of epochs needed for each network to terminate is shown in Table 1. The tabulated values are averages over 25 different runs of the experiment. The time required for convergence of each network is shown in Table 2. The network weights are initialized randomly from the range [−1, 1]. The learning curve of the network for partition 3 is shown in Figure 3. Of the 3330 test patterns, 98.6% are recognized correctly. Table 3 shows the number of patterns classified correctly for each digit. Table 4 compares the accuracy of the proposed work with that of Ravindra Babu et al., [7], Agrawal Monu et al., [4] and Vijaya et al., [19].
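The per-partition network topology reported in the experiments (13 inputs including the bias, 6 hidden neurons, 10 outputs, sigmoidal units, weights drawn from [−1, 1]) can be sketched as a forward pass. This is an illustration under assumptions, not the authors' MATLAB code: the function names and the random seed are hypothetical, and no bias is fed to the output layer because the text mentions a bias only for the input layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_out, n_in, rng):
    """Weight matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(row_bits, w_hidden, w_output):
    """Forward pass of one single-hidden-layer network."""
    x = list(row_bits) + [1.0]  # 12 bits plus a constant bias input = 13
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w_output]


rng = random.Random(0)                # fixed seed, for repeatability only
w_hidden = init_weights(6, 13, rng)   # 13 inputs -> 6 hidden neurons
w_output = init_weights(10, 6, rng)   # 6 hidden -> 10 output neurons
row = [0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]  # one 12-bit partition
outputs = forward(row, w_hidden, w_output)
print(len(outputs))  # 10
```

With untrained weights the ten outputs carry no meaning; after backpropagation training they would be passed to the classifier layer together with the outputs of the other seven networks.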

Table 1. Number of epochs of each network.

Table 2. Training time of each network.

Table 3. Number of patterns classified on each digit.

Table 4. Comparison of classification accuracy.