TITLE:
Variance Optimization for Continuous-Time Markov Decision Processes
AUTHORS:
Yaqing Fu
KEYWORDS:
Continuous-Time Markov Decision Process, Variance Optimality of Average Reward, Optimal Policy of Variance, Policy Iteration
JOURNAL NAME:
Open Journal of Statistics, Vol.9 No.2, April 2, 2019
ABSTRACT: This paper considers the variance optimization problem for the average reward in continuous-time Markov decision processes (MDPs). The state space is assumed to be countable and the action space a Borel measurable space. The main purpose of this paper is to find the policy with minimal variance within the class of deterministic stationary policies. Unlike in the traditional Markov decision process, the cost function under the variance criterion is affected by future actions. To address this, we convert the variance minimization problem into a standard MDP by introducing a concept called pseudo-variance. Further, by developing a policy iteration algorithm for the pseudo-variance optimization problem, we derive the optimal policy of the original variance optimization problem and give a sufficient condition for a policy to be variance optimal. Finally, an example illustrates the conclusions of this paper.
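The reduction described above can be sketched in code. The following is a minimal illustration, not the paper's construction: a small finite CTMDP is uniformized into a discrete-time chain, the pseudo-variance idea is imitated by the cost c(i,a) = (r(i,a) - g0)^2 for a fixed reference mean g0, and average-cost policy iteration is run over deterministic stationary policies. All model data (transition rates, reward rates, g0) are invented for the example.

```python
# Hedged sketch: average-cost policy iteration on a uniformized CTMDP with a
# pseudo-variance-style cost c(i,a) = (r(i,a) - g0)^2. The rates, rewards,
# and reference mean g0 below are illustrative assumptions, not from the paper.
import numpy as np

def evaluate_policy(P, c):
    """Solve g + h(i) = c(i) + sum_j P[i,j] h(j) with h(0) = 0 (unichain case)."""
    n = len(c)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for i in range(n):
        A[i, 0] = 1.0          # coefficient of the gain g
        A[i, 1 + i] += 1.0     # +h(i)
        A[i, 1:] -= P[i]       # -sum_j P[i,j] h(j)
        b[i] = c[i]
    A[n, 1] = 1.0              # normalization h(0) = 0
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    return x[0], x[1:]         # gain g, bias h

def policy_iteration(P_all, c_all, iters=50):
    """P_all: (nA, n, n) transition matrices per action; c_all: (nA, n) costs."""
    nA, n, _ = P_all.shape
    d = np.zeros(n, dtype=int)           # initial deterministic stationary policy
    g = np.inf
    for _ in range(iters):
        P = P_all[d, np.arange(n)]       # transition rows induced by d
        c = c_all[d, np.arange(n)]
        g, h = evaluate_policy(P, c)
        Q = c_all + P_all @ h            # one-step lookahead, shape (nA, n)
        d_new = Q.argmin(axis=0)
        if np.array_equal(d_new, d):
            break                        # policy stable: average-cost optimal
        d = d_new
    return d, g

# Illustrative 2-state, 2-action CTMDP: exit rate rates[a, i] from state i
# under action a (each jump goes to the other state), reward rate r[a, i].
rates = np.array([[1.0, 2.0], [3.0, 0.5]])
r = np.array([[1.0, 4.0], [2.0, 3.0]])
Lam = rates.max() + 1.0                  # uniformization constant
P_all = np.zeros((2, 2, 2))
for a in range(2):
    for i in range(2):
        P_all[a, i, i] = 1.0 - rates[a, i] / Lam
        P_all[a, i, 1 - i] = rates[a, i] / Lam
g0 = 2.5                                 # assumed reference mean reward
c_all = (r - g0) ** 2                    # pseudo-variance-style cost

d_star, g_star = policy_iteration(P_all, c_all)
print("variance-optimal policy:", d_star, "pseudo-variance:", g_star)
```

Here the inner evaluation step solves the average-cost Poisson equation for the current policy, and the improvement step is a one-step lookahead on the pseudo-variance cost; in this toy instance action 1 has the smaller squared deviation from g0 in both states, so iteration settles on it.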