A Reinforcement Learning System to Dynamic Movement and Multi-Layer Environments


Many policy-improving systems have been proposed that let Reinforcement Learning (RL) agents adapt quickly to environmental changes by means of statistical methods, such as a mixture model of Bayesian Networks, or Mixture Probability and Clustering Distribution. However, such methods increase the computational complexity. In addition, these systems still need to adapt to more complex environments, such as multi-layer environments. In this study, we used the profit-sharing method for the agent to learn its policy, and added a mixture probability to the RL system so that it recognizes changes in the environment and improves the agent's policy accordingly to adjust to the changing environment. We also introduced a clustering method that allows a smaller, suitable selection to be used, reducing the computational complexity while maintaining the system's performance. The experimental results showed that the agent successfully learned the policy and adjusted efficiently to changes in the multi-layer environment. Finally, the proposed system kept both the computational complexity and the decline in the effectiveness of policy improvement under control.
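The abstract names the learning and adaptation mechanisms without giving their update rules. The following is a minimal, illustrative sketch under stated assumptions, not the authors' implementation. It assumes (1) profit sharing over (state, action) rules with a geometrically decreasing credit `reward * decay**i` for the rule fired `i` steps before the reward, (2) roulette-wheel action selection proportional to rule weights, and (3) a mixture of previously stored action distributions, each weighted by one minus its Hellinger distance to the currently observed distribution. The names `ProfitSharingAgent`, `hellinger`, `mixture_policy`, and the parameter `decay` are hypothetical, and the clustering step is not shown.

```python
import math
import random
from collections import defaultdict


class ProfitSharingAgent:
    """Hypothetical profit-sharing learner over (state, action) rules.

    Assumption: credit for the rule fired i steps before the reward is
    reward * decay**i (a geometrically decreasing reinforcement function).
    """

    def __init__(self, actions, decay=0.25, init_weight=1.0, seed=None):
        self.actions = list(actions)
        self.decay = decay
        self.weights = defaultdict(lambda: init_weight)  # (state, action) -> weight
        self.episode = []  # rules fired since the last reward
        self.rng = random.Random(seed)

    def policy(self, state):
        """Action probabilities proportional to the rule weights."""
        ws = [self.weights[(state, a)] for a in self.actions]
        total = sum(ws)
        return [w / total for w in ws]

    def act(self, state):
        # Roulette-wheel selection (assumed; the paper's selection rule may differ).
        action = self.rng.choices(self.actions, weights=self.policy(state), k=1)[0]
        self.episode.append((state, action))
        return action

    def reinforce(self, reward):
        # Walk the episode backwards so the rule closest to the reward
        # receives the largest share of the credit.
        for i, rule in enumerate(reversed(self.episode)):
            self.weights[rule] += reward * self.decay ** i
        self.episode.clear()


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def mixture_policy(stored, observed):
    """Mix stored action distributions, weighting each by its similarity
    (assumed here: 1 - Hellinger distance) to the observed distribution."""
    sims = [1.0 - hellinger(p, observed) for p in stored]
    total = sum(sims)
    if total == 0.0:
        return list(observed)  # no stored distribution resembles the current one
    return [sum(s * p[k] for s, p in zip(sims, stored)) / total
            for k in range(len(observed))]
```

In this sketch, an episode ending in a reward calls `reinforce(reward)`, which strengthens every rule on the path to the goal; when the environment appears to change, `mixture_policy` blends previously stored distributions, favoring those most similar to the agent's current behavior. Other similarity weightings are equally plausible, so the choice here should be read as an example only.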

Cite as:

Phommasak, U., Kitakoshi, D., Shioya, H. and Maeda, J. (2014) A Reinforcement Learning System to Dynamic Movement and Multi-Layer Environments. Journal of Intelligent Learning Systems and Applications, 6, 176-185. doi: 10.4236/jilsa.2014.64014.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2020 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.