A Policy-Improving System for Adaptability to Dynamic Environments Using Mixture Probability and Clustering Distribution

Abstract

With the growing need for rescue robots in disasters such as earthquakes and tsunamis, there is an urgent need for robotics software that can learn and adapt to any environment. A reinforcement learning (RL) system that improves agents’ policies in dynamic environments by using a mixture model of Bayesian networks has previously been proposed and is effective in adapting quickly to environmental change. However, its computational complexity requires a high-performance computer for simulated experiments, so when computational resources are limited the complexity must be controlled. In this study, the agent learns its policy with the RL profit-sharing method, and a mixture probability is introduced into the RL system so that changes in the environment can be recognized and the agent’s policy improved appropriately for the changing environment. We also introduce a clustering of distributions that selects a smaller yet suitable set of components while preserving the variety of elements in the mixture probability, thereby reducing the computational complexity while maintaining the system’s performance. With the proposed system, the agent successfully learned its policy and adjusted efficiently to the changing environment. Moreover, the computational complexity was controlled effectively, and the resulting decline in the effectiveness of policy improvement was kept small.
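Since the abstract only summarizes the method, the following is a minimal, hypothetical sketch of two of the ingredients it names: profit-sharing credit assignment for learning a stochastic policy, and the Hellinger distance between stored distributions (cf. [10]) as a basis for clustering them into a smaller set of mixture components. All function names, parameters, and the greedy clustering criterion are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random
from collections import defaultdict

def profit_sharing_update(weights, episode, reward, decay=0.5):
    """Distribute the terminal reward over the visited (state, action) pairs
    with a geometrically decaying credit-assignment function.
    `weights` maps (state, action) -> weight used by the stochastic policy."""
    credit = reward
    for state, action in reversed(episode):      # later pairs receive more credit
        weights[(state, action)] += credit
        credit *= decay
    return weights

def choose_action(weights, state, actions):
    """Pick an action with probability proportional to its weight."""
    w = [weights[(state, a)] + 1e-6 for a in actions]  # small floor keeps exploration alive
    total = sum(w)
    r, acc = random.random() * total, 0.0
    for a, wi in zip(actions, w):
        acc += wi
        if r <= acc:
            return a
    return actions[-1]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s) / math.sqrt(2.0)

def cluster_representatives(dists, threshold=0.3):
    """Greedily keep one representative per group of similar distributions,
    so the mixture uses fewer components (illustrative selection rule only)."""
    reps = []
    for d in dists:
        if all(hellinger(d, r) > threshold for r in reps):
            reps.append(d)
    return reps

# Usage sketch:
#   weights = defaultdict(float)
#   episode = [(s0, a0), (s1, a1), ...]   # visited pairs, reward observed at the end
#   profit_sharing_update(weights, episode, reward=1.0)
```

Any profit-sharing variant whose credit function decays fast enough toward earlier steps could replace the geometric decay used here, and the threshold-based greedy selection merely stands in for whatever clustering criterion the paper actually employs.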

Share and Cite:

Phommasak, U., Kitakoshi, D., Mao, J. and Shioya, H. (2014) A Policy-Improving System for Adaptability to Dynamic Environments Using Mixture Probability and Clustering Distribution. Journal of Computer and Communications, 2, 210-219. doi: 10.4236/jcc.2014.24028.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Sutton, R.S. and Barto, A.G. (1998) Reinforcement Learning: An Introduction. MIT Press, Cambridge.
[2] Croonenborghs, T., Ramon, J., Blockeel, H. and Bruynooghe, M. (2006) Model-Assisted Approaches for Relational Reinforcement Learning: Some Challenges for the SRL Community. Proceedings of the ICML-2006 Workshop on Open Problems in Statistical Relational Learning, Pittsburgh.
[3] Fernandez, F. and Veloso, M. (2006) Probabilistic Policy Reuse in a Reinforcement Learning Agent. Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, New York, 720-727. http://dx.doi.org/10.1145/1160633.1160762
[4] Kitakoshi, D., Shioya, H. and Nakano, R. (2004) Adaptation of the Online Policy-Improving System by Using a Mixture Model of Bayesian Networks to Dynamic Environments. Electronics, Information and Communication Engineers, 104, 15-20.
[5] Kitakoshi, D., Shioya, H. and Nakano, R. (2010) Empirical Analysis of an On-Line Adaptive System Using a Mixture of Bayesian Networks. Information Sciences, 180, 2856-2874. http://dx.doi.org/10.1016/j.ins.2010.04.001
[6] Phommasak, U., Kitakoshi, D. and Shioya, H. (2012) An Adaptation System in Unknown Environments Using a Mixture Probability Model and Clustering Distributions. Journal of Advanced Computational Intelligence and Intelligent Informatics, 16, 733-740.
[7] Tanaka, F. and Yamamura, M. (1997) An Approach to Lifelong Reinforcement Learning through Multiple Environments. Proceedings of the Sixth European Workshop on Learning Robots, EWLR-6, Brighton, 93-99.
[8] Minato, T. and Asada, M. (1998) Environmental Change Adaptation for Mobile Robot Navigation. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS’98, Victoria, 1859-1864.
[9] Pearl, J. (1988) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Pub. Inc., San Francisco.
[10] Hellinger, E. (1909) Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen. Journal für die Reine und Angewandte Mathematik, 136, 210-271.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.