
Adaptive Strategies for Accelerating the Convergence of Average Cost Markov Decision Processes Using a Moving Average Digital Filter


ABSTRACT

This paper proposes a technique to accelerate the convergence of the value iteration algorithm applied to discrete average cost Markov decision processes. An adaptive partial information value iteration algorithm is proposed that updates an increasingly accurate approximate version of the original problem, with a view to saving computation in the early iterations, when one is typically far from the optimal solution. The proposed algorithm is compared to classical value iteration over a broad range of adaptive parameters, and the results suggest that significant computational savings can be obtained while also ensuring robust performance with respect to the parameters.
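The baseline being accelerated is classical (relative) value iteration for average-cost MDPs, which the abstract takes as given. As a rough illustration only — the paper's adaptive partial-information scheme is not reproduced here, and the names `P`, `c`, and `relative_value_iteration` are hypothetical — a minimal NumPy sketch for a finite unichain MDP might look like:

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-8, max_iter=10_000):
    """Classical relative value iteration for a finite average-cost MDP.

    P: (A, S, S) array of transition matrices, one per action.
    c: (A, S) array of stage costs.
    Returns (average cost g, bias vector h, greedy policy, span history).
    """
    A, S, _ = P.shape
    h = np.zeros(S)
    spans = []
    for _ in range(max_iter):
        Q = c + P @ h                      # (A, S) one-step lookahead
        Th = Q.min(axis=0)                 # Bellman operator T applied to h
        diff = Th - h
        spans.append(diff.max() - diff.min())
        if spans[-1] < tol:                # span-seminorm stopping rule
            break
        h = Th - Th[0]                     # anchor at a reference state
    g = (diff.max() + diff.min()) / 2      # average-cost estimate
    return g, h, Q.argmin(axis=0), spans

# Toy 2-state, 2-action MDP: from state 1, action 1 stays put at zero
# cost, so the optimal average cost is 0.
P = np.array([[[0.5, 0.5], [0.5, 0.5]],   # action 0: mix the states
              [[1.0, 0.0], [0.0, 1.0]]])  # action 1: stay put
c = np.array([[1.0, 3.0],                 # action 0 stage costs
              [5.0, 0.0]])                # action 1 stage costs
g, h, policy, spans = relative_value_iteration(P, c)

# The title refers to smoothing the iteration-error sequence with a
# moving-average digital filter; a plain length-5 filter would be:
smoothed = np.convolve(spans, np.ones(5) / 5, mode="valid")
```

The per-iteration span sequence `spans` is the kind of convergence signal the paper's moving-average filter would operate on; the smoothed sequence can then drive decisions about when to refine the approximate model.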

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

E. Arruda and F. Ourique, "Adaptive Strategies for Accelerating the Convergence of Average Cost Markov Decision Processes Using a Moving Average Digital Filter," American Journal of Operations Research, Vol. 3 No. 6, 2013, pp. 514-520. doi: 10.4236/ajor.2013.36050.



Copyright © 2019 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.