Variance Optimization for Continuous-Time Markov Decision Processes


ABSTRACT

This paper considers the variance optimization problem for the average reward in continuous-time Markov decision processes (MDPs). The state space is assumed to be countable and the action space to be a Borel measurable space. The main purpose of this paper is to find the policy with minimal variance within the class of deterministic stationary policies. Unlike in a traditional MDP, the cost function under the variance criterion is affected by future actions. To address this, we convert the variance minimization problem into a standard MDP by introducing a concept called the pseudo-variance. Further, by giving a policy iteration algorithm for the pseudo-variance optimization problem, we derive the optimal policy of the original variance optimization problem and give a sufficient condition for a variance-optimal policy. Finally, an example illustrates the conclusions of this paper.
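
The abstract does not reproduce the paper's definitions; the following is a minimal sketch, in notation assumed for illustration (the symbols r, x_t, d, η, λ are not taken from the paper), of how a variance criterion for the average reward is typically formalized and how freezing the mean at a fixed reference constant turns it into a standard average-cost problem:

```latex
% Long-run average reward and its variance under a stationary policy d
% (illustrative notation, not the paper's):
\eta(d) = \lim_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}_d\!\left[\int_0^T r\big(x_t, d(x_t)\big)\,dt\right],
\qquad
\sigma^2(d) = \lim_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}_d\!\left[\int_0^T \big(r(x_t, d(x_t)) - \eta(d)\big)^2\,dt\right].

% Pseudo-variance: replace the policy-dependent mean \eta(d) by a fixed
% reference constant \lambda, so the integrand becomes an ordinary
% state-action cost and the problem is a standard average-cost MDP:
\sigma_\lambda^2(d) = \lim_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}_d\!\left[\int_0^T \big(r(x_t, d(x_t)) - \lambda\big)^2\,dt\right].
```

This makes concrete the difficulty the abstract points to: σ²(d) couples the cost at each time to the policy-dependent mean η(d), and hence to future actions, whereas σ²_λ(d) has a fixed per-state cost (r(x, a) − λ)² and fits the standard theory.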
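The paper's own policy iteration algorithm is not reproduced on this page; the sketch below is a generic average-cost policy iteration for a finite CTMDP with the pseudo-variance cost (r − λ)², with all rates, rewards, and helper names (Q, r, average_cost_and_bias, policy_iteration) invented for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-action CTMDP, invented for illustration only.
# Q[a, x, :] is row x of the generator under action a (rows sum to 0);
# r[x, a] is the reward rate in state x under action a.
Q = np.array([
    [[-3.0, 3.0], [2.0, -2.0]],   # action 0
    [[-1.0, 1.0], [4.0, -4.0]],   # action 1
])
r = np.array([[1.0, 3.0],
              [2.0, 0.5]])

def average_cost_and_bias(policy, cost):
    """Solve the Poisson equation  eta = c_d(x) + (Q_d h)(x)  for a
    stationary policy d, with the normalization h(0) = 0."""
    n = len(policy)
    Qd = np.array([Q[policy[x], x] for x in range(n)])
    cd = np.array([cost[x, policy[x]] for x in range(n)])
    # Unknown vector: (eta, h(1), ..., h(n-1)); h(0) = 0 drops out.
    A = np.zeros((n, n))
    A[:, 0] = 1.0           # coefficient of eta in each equation
    A[:, 1:] = -Qd[:, 1:]   # coefficients of h(1), ..., h(n-1)
    sol = np.linalg.solve(A, cd)
    return sol[0], np.concatenate(([0.0], sol[1:]))

def policy_iteration(lam, tol=1e-12, max_iter=100):
    """Minimize the long-run average of the pseudo-variance cost
    c(x, a) = (r(x, a) - lam)**2 by standard policy iteration."""
    cost = (r - lam) ** 2
    n, m = r.shape
    d = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        eta, h = average_cost_and_bias(d, cost)
        d_new = d.copy()
        for x in range(n):
            vals = [cost[x, a] + Q[a, x] @ h for a in range(m)]
            # Switch actions only on strict improvement (avoids cycling).
            if min(vals) < vals[d[x]] - tol:
                d_new[x] = int(np.argmin(vals))
        if np.array_equal(d_new, d):
            return d, eta
        d = d_new
    return d, eta

d_star, eta_star = policy_iteration(lam=1.5)
print("policy:", d_star, "average pseudo-variance:", eta_star)
```

In this sketch λ plays the role of the pseudo-variance reference constant; under the paper's conditions one would relate λ to the optimal average reward so that solving the pseudo-variance MDP yields the variance-optimal policy. The countable-state, Borel-action setting of the paper naturally requires measure-theoretic machinery this finite toy omits.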

Share and Cite:

Fu, Y. (2019) Variance Optimization for Continuous-Time Markov Decision Processes. Open Journal of Statistics, 9, 181-195. doi: 10.4236/ojs.2019.92014.

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.