Topological Order Value Iteration Algorithm for Solving Probabilistic Planning

DOI: 10.4236/cn.2013.51B020

Abstract

AI researchers typically formulate probabilistic planning problems under uncertainty as Markov Decision Processes (MDPs). Value Iteration is an inefficient algorithm for MDPs because it spends the majority of its effort backing up the entire state space, which is unnecessary in many cases. Many approaches have been proposed to overcome this problem; among them, LAO*, LRTDP and HDP are state of the art. All of these use reachability analysis and heuristics to avoid some unnecessary backups. However, none of them fully exploits the graphical structure of an MDP or uses that structure to derive the best backup sequence over the state space. We introduce an improved algorithm, Topological Order Value Iteration (TOVI), that circumvents unnecessary backups by detecting the structure of the MDP and backing up states according to a topological ordering. Experimental results demonstrate the effectiveness and strong performance of our algorithm.
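The idea described in the abstract can be sketched in a few lines of Python (a minimal illustration only, not the authors' implementation): decompose the MDP's state-connectivity graph into strongly connected components with Tarjan's algorithm, then run ordinary value iteration on one component at a time in reverse topological order. The MDP encoding here (`transitions`, `cost`, `actions`) is an assumed toy interface.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: returns SCCs in reverse topological order,
    i.e. sink components (which depend on nothing else) come first."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def topological_value_iteration(states, actions, transitions, cost,
                                gamma=0.95, eps=1e-6):
    """Back up states one SCC at a time, in reverse topological order,
    so a component is solved only after every component it can reach
    has already converged.

    transitions[s][a] -> list of (next_state, probability) pairs
    cost[s][a]        -> immediate cost of taking action a in state s
    """
    # Connectivity graph induced by all actions.
    graph = {s: set() for s in states}
    for s in states:
        for a in actions(s):
            for s2, p in transitions[s][a]:
                if p > 0:
                    graph[s].add(s2)

    V = {s: 0.0 for s in states}
    for comp in strongly_connected_components(graph):
        while True:   # plain value iteration restricted to this SCC
            delta = 0.0
            for s in comp:
                acts = actions(s)
                if not acts:          # absorbing (goal) state
                    continue
                best = min(
                    cost[s][a]
                    + gamma * sum(p * V[s2] for s2, p in transitions[s][a])
                    for a in acts
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < eps:
                break
    return V
```

On an acyclic MDP each component is a single state, so every state is backed up exactly once; iteration is only needed inside components that genuinely contain cycles, which is the source of the savings over full-sweep value iteration.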

Share and Cite:

Liu, X. , Li, M. and Nie, Q. (2013) Topological Order Value Iteration Algorithm for Solving Probabilistic Planning. Communications and Network, 5, 86-89. doi: 10.4236/cn.2013.51B020.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] S. Y. Yan, M. H. Yin, W. X. Gu and X. F. Liu, “Research and Advances on Probabilistic Planning,” CAAI Transactions on Intelligent Systems, Vol. 1, 2008, pp. 9-22.
[2] A. Barto, S. Bradke and S. Singh, “Learning to Act using Real-time Dynamic Programming,” Artificial Intelligence, Vol. 72, 1995, pp. 81-138. doi:10.1016/0004-3702(94)00011-O
[3] E. Hansen and S. Zilberstein, “LAO*: A Heuristic Search Algorithm that Finds Solutions with Loops,” Artificial Intelligence, Vol. 129, 2001, pp. 35-62. doi:10.1016/S0004-3702(01)00106-0
[4] B. Bonet and H. Geffner, “Labeled RTDP: Improving the Convergence of Real-time Dynamic Programming,” Proceedings of 13th ICAPS, 2003, pp. 12-21.
[5] B. Bonet and H. Geffner, “Faster Heuristic Search Algorithms for Planning with Uncertainty and Full Feedback,” Proceedings of IJCAI-03, 2003, pp. 1233-1238.
[6] C. Guestrin, D. Koller, R. Parr and S. Venkataraman, “Efficient Solution Algorithms for Factored MDPs,” Journal of Artificial Intelligence Research, Vol. 19, 2003, pp. 399-468.
[7] Z. Feng and E. Hansen, “Symbolic Heuristic Search for Factored Markov Decision Processes,” Proceedings of AAAI-02, 2002, pp. 44-50.
[8] P. Dai, Mausam and D. S. Weld, “Focused Value Iteration,” The Nineteenth International Conference on Automated Planning and Scheduling (ICAPS-09), 2009, pp. 82-89.
[9] P. Dai and J. Goldsmith, “Ranking Policies in Discrete Markov Decision Processes,” Annals of Mathematics and Artificial Intelligence, Vol. 59, 2010, pp. 107-123. doi:10.1007/s10472-010-9216-8
[10] M. Puterman, “Markov Decision Processes: Discrete Stochastic Dynamic Programming,” Wiley-Interscience, 2005.
[11] M. Littman, T. Dean and P. Kaelbling, “On the Complexity of Solving Markov Decision Problems,” In Proceedings of UAI-95, 1995, pp. 394-402.
[12] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, “Introduction to Algorithms,” Second Edition, The MIT Press, 2001.


Copyright © 2020 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.