TITLE:
A Policy-Improving System for Adaptability to Dynamic Environments Using Mixture Probability and Clustering Distribution
AUTHORS:
Uthai Phommasak, Daisuke Kitakoshi, Jun Mao, Hiroyuki Shioya
KEYWORDS:
Reinforcement Learning; Profit-Sharing Method; Mixture Probability; Clustering
JOURNAL NAME:
Journal of Computer and Communications, Vol.2 No.4, March 18, 2014
ABSTRACT:
Along with the increasing need for rescue robots
in disasters such as earthquakes and tsunamis, there is an urgent need to develop
robotic software that can learn and adapt to any environment. A reinforcement
learning (RL) system that improves agents’ policies for dynamic environments by
using a mixture model of Bayesian networks has been proposed, and is effective in
quickly adapting to a changing environment. However, its increased computational
complexity requires a high-performance computer for simulated experiments,
and when computational resources are limited, the complexity must be
controlled. In this study, we used an RL profit-sharing method
for the agent to learn its policy, and introduced a mixture probability into the
RL system to recognize changes in the environment and appropriately improve the
agent’s policy to adjust to a changing environment. We also introduced a clustering
distribution that selects a smaller but suitable subset of mixture-probability
elements while preserving their variety, in order to reduce the computational
complexity while maintaining the system’s performance. Using our proposed system,
the agent successfully learned the policy and efficiently adjusted to the changing
environment. Finally, the computational complexity was effectively controlled,
and the proposed system limited the decline in the effectiveness of policy
improvement.
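The profit-sharing method mentioned in the abstract reinforces every (state, action) rule fired during an episode when a reward is finally received, with credit decaying toward earlier rules. The sketch below is a minimal illustration of that idea, assuming a geometric credit-assignment function and weight-proportional (roulette) action selection; the class and parameter names are hypothetical and this is not the paper's exact system.

```python
import random


class ProfitSharingAgent:
    """Minimal profit-sharing sketch (illustrative, not the paper's system).

    Rule weights over (state, action) pairs are reinforced at episode end by
    distributing the received reward backwards along the episode with a
    geometric credit-assignment function: credit(t) = reward * decay**t,
    where t counts steps back from the final rule.
    """

    def __init__(self, actions, decay=0.5, init_weight=1.0):
        self.actions = list(actions)
        self.decay = decay            # geometric decay of credit toward earlier rules
        self.init_weight = init_weight
        self.weights = {}             # (state, action) -> rule weight
        self.episode = []             # rules fired this episode, in firing order

    def _w(self, state, action):
        return self.weights.get((state, action), self.init_weight)

    def select_action(self, state, rng=random):
        # Roulette (weight-proportional) selection over the current state's rules
        ws = [self._w(state, a) for a in self.actions]
        r = rng.uniform(0.0, sum(ws))
        acc = 0.0
        for a, w in zip(self.actions, ws):
            acc += w
            if r <= acc:
                self.episode.append((state, a))
                return a
        self.episode.append((state, self.actions[-1]))
        return self.actions[-1]

    def reinforce(self, reward):
        # Credit decays geometrically from the final rule back to the first
        credit = reward
        for state, action in reversed(self.episode):
            self.weights[(state, action)] = self._w(state, action) + credit
            credit *= self.decay
        self.episode.clear()
```

Because all rules along a successful episode are reinforced at once, profit sharing needs no environment model, which is one reason it suits the changing environments discussed in the abstract.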