Multiple Action Sequence Learning and Automatic Generation for a Humanoid Robot Using RNNPB and Reinforcement Learning


This paper proposes a method for learning and generating multiple action sequences for a humanoid robot. First, all basic action sequences, also called primitive behaviors, are learned by a recurrent neural network with parametric bias (RNNPB). This training yields the values of the parametric bias (PB) nodes, the internal nodes that determine which primitive behavior the network outputs. The RNN is trained with the back propagation through time (BPTT) method. Then, to generate a learned behavior, or a more complex behavior that combines primitive behaviors, a reinforcement learning algorithm, Q-learning (QL), is adopted to determine which PB value is appropriate for the generation. Finally, the effectiveness of the proposed method was confirmed by experiments using a real humanoid robot.
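The PB selection step described above can be illustrated with a minimal tabular Q-learning sketch. It is not the authors' implementation: the state (the position within a composite behavior), the reward (1 when the selected primitive matches a hypothetical target sequence, 0 otherwise), the hyperparameters, and the names `train_pb_selector` and `greedy_plan` are all assumptions made for illustration. Actions are indices into a set of PB values that an RNNPB is assumed to have already learned.

```python
import random


def train_pb_selector(target, n_primitives, episodes=2000,
                      alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over (step, PB index) pairs.

    target       -- hypothetical desired primitive index at each step
    n_primitives -- number of PB values assumed to be learned by the RNNPB
    Returns a table q[step][pb_index] of action values.
    """
    rng = random.Random(seed)
    n_steps = len(target)
    q = [[0.0] * n_primitives for _ in range(n_steps)]
    for _ in range(episodes):
        for t in range(n_steps):
            # Epsilon-greedy choice of a PB index, with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(n_primitives)
            else:
                best = max(q[t])
                a = rng.choice([i for i, v in enumerate(q[t]) if v == best])
            # Assumed reward: 1 if the chosen primitive is the desired one.
            r = 1.0 if a == target[t] else 0.0
            # Q-learning update; the value after the last step is 0.
            next_best = max(q[t + 1]) if t + 1 < n_steps else 0.0
            q[t][a] += alpha * (r + gamma * next_best - q[t][a])
    return q


def greedy_plan(q):
    """Pick the PB index with the highest learned value at each step."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    # Hypothetical composite behavior built from three primitives.
    target_sequence = [0, 2, 1, 2]
    q_table = train_pb_selector(target_sequence, n_primitives=3)
    print(greedy_plan(q_table))
```

In this sketch each selected index would be mapped to the corresponding PB vector and fed to the trained RNNPB to generate the motion for that segment. On a real robot the reward would have to come from observing the generated behavior rather than from a known target sequence.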

Share and Cite:

T. Kuremoto, K. Hashiguchi, K. Morisaki, S. Watanabe, K. Kobayashi, S. Mabu and M. Obayashi, "Multiple Action Sequence Learning and Automatic Generation for a Humanoid Robot Using RNNPB and Reinforcement Learning," Journal of Software Engineering and Applications, Vol. 5 No. 12B, 2012, pp. 128-133. doi: 10.4236/jsea.2012.512B025.

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2022 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.