Relational Reinforcement Learning with Continuous Actions by Combining Behavioural Cloning and Locally Weighted Regression
Julio H. Zaragoza, Eduardo F. Morales
DOI: 10.4236/jilsa.2010.22010

Abstract

Reinforcement Learning is a commonly used technique for learning tasks in robotics; however, traditional algorithms are unable to handle the large amounts of data coming from the robot’s sensors, require long training times, and use discrete actions. This work introduces TS-RRLCA, a two-stage method to tackle these problems. In the first stage, low-level data coming from the robot’s sensors is transformed into a more natural, relational representation based on rooms, walls, corners, doors and obstacles, significantly reducing the state space. We use this representation along with Behavioural Cloning, i.e., traces provided by the user, to learn, in a few iterations, a relational control policy with discrete actions which can be re-used in different environments. In the second stage, we use Locally Weighted Regression to transform the initial policy into a continuous-actions policy. We tested our approach in simulation and with a real service robot in different environments for different navigation and following tasks. Results show that the policies can be used in different domains and produce smoother, faster and shorter paths than the original discrete-actions policies.
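
As a rough illustration of the second stage, the Python sketch below (not taken from the paper; the state features, Gaussian kernel, bandwidth and action encoding are illustrative assumptions) shows how Locally Weighted Regression can blend the discrete actions recorded for nearby states into a single continuous action.

import numpy as np

def lwr_action(query_state, states, actions, bandwidth=0.5):
    # Locally Weighted Regression sketch: weight each stored example by a
    # Gaussian kernel of its distance to the query state and return the
    # weighted mean of the corresponding discrete actions, which yields a
    # continuous action value.
    diffs = states - query_state                       # (N, d) differences
    dist2 = np.sum(diffs ** 2, axis=1)                 # squared distances
    weights = np.exp(-dist2 / (2.0 * bandwidth ** 2))  # Gaussian kernel
    weights /= weights.sum()                           # normalise weights
    return weights @ actions                           # blended action

# Hypothetical usage: stored examples map relational state features to
# (speed, turn-angle) actions chosen by the discrete policy.
states  = np.array([[0.0, 1.0], [0.2, 0.9], [1.0, 0.0]])
actions = np.array([[0.3, 15.0], [0.3, 10.0], [0.5, 0.0]])
print(lwr_action(np.array([0.1, 0.95]), states, actions))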

Share and Cite:

J. Zaragoza and E. Morales, "Relational Reinforcement Learning with Continuous Actions by Combining Behavioural Cloning and Locally Weighted Regression," Journal of Intelligent Learning Systems and Applications, Vol. 2 No. 2, 2010, pp. 69-79. doi: 10.4236/jilsa.2010.22010.

Conflicts of Interest

The authors declare no conflicts of interest.
