Tuning Recurrent Neural Networks for Recognizing Handwritten Arabic Words

Abstract

Artificial neural networks learn by example and can solve problems that are hard to solve with ordinary rule-based programming. Their performance depends on many design parameters, such as the number and sizes of the hidden layers. Large networks are slow, and small networks are generally less accurate. Tuning the network size is hard because the design space is often large and training is often a long process. We use design of experiments techniques to tune the recurrent neural network used in an Arabic handwriting recognition system. We show that the best results are achieved with three hidden layers and two subsampling layers. To tune the sizes of these five layers, we use a fractional factorial experiment design, which limits the number of experiments to a feasible number. Moreover, we replicate each experiment configuration multiple times to account for the randomness in the training process. The accuracy and time measurements are analyzed and modeled, and the two resulting models are used to locate network sizes that lie on the Pareto optimal frontier. The approach described in this paper reduces the label error from 26.2% to 19.8%.
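The two ideas the abstract relies on, a fractional factorial design over five layer sizes and a Pareto frontier over error and time, can be illustrated with a minimal sketch. The abstract does not state which fraction or generator the authors used, so the half fraction with generator E = ABCD below is an assumption, as are the function names `half_fraction_design` and `pareto_front`. The (error, time) pairs are illustrative placeholders, not measurements from the paper.

```python
from itertools import product


def half_fraction_design(k=5):
    """Build a two-level 2^(k-1) fractional factorial design.

    The first k-1 factors take every combination of the coded levels
    -1 (small) and +1 (large); the last factor is set to the product
    of the others (generator E = ABCD when k = 5, an assumed choice).
    """
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(base + (last,))
    return runs


def pareto_front(points):
    """Keep the (error, time) points that no other point dominates.

    Lower is better on both axes: q dominates p when q is no worse
    than p on both axes and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front


design = half_fraction_design(5)
print(len(design))  # 16 configurations instead of the 32 of a full 2^5 design

# Placeholder (label error, training time) pairs for four hypothetical networks.
candidates = [(0.20, 10), (0.25, 5), (0.30, 12), (0.22, 10)]
print(pareto_front(candidates))
```

In such a setup each of the 16 configurations would be trained several times with different random seeds, and the replicated measurements averaged before fitting the accuracy and time models.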

Share and Cite:

Qaralleh, E., Abandah, G. and Jamour, F. (2013) Tuning Recurrent Neural Networks for Recognizing Handwritten Arabic Words. Journal of Software Engineering and Applications, 6, 533-542. doi: 10.4236/jsea.2013.610064.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.