Stability Estimation for Markov Control Processes with Discounted Cost
1. Introduction
Let the two following Markov control processes be given on a Borel space $X$:
$x_{t+1} = F(x_t, a_t, \xi_t)$, $t = 0, 1, \dots$, (1.1)
$\tilde{x}_{t+1} = F(\tilde{x}_t, \tilde{a}_t, \tilde{\xi}_t)$, $t = 0, 1, \dots$, (1.2)
where $a_t$, $\tilde{a}_t$ are the controls forming the control policies $\pi$, $\tilde{\pi}$ (see [1] [2] for definitions); $\{\xi_t\}$ and $\{\tilde{\xi}_t\}$ are sequences of independent and identically distributed (i.i.d.) random vectors in a separable metric space $S$. In what follows, the distributions of $\xi_t$ and $\tilde{\xi}_t$ are denoted by $\mu$ and $\tilde{\mu}$, respectively. Let $c$ be a given bounded measurable one-step cost function; for any initial state $x \in X$ and control policy $\pi \in \Pi$ ($\Pi$ is the set of all control policies, see [1]), the expected total $\alpha$-discounted cost criteria are as follows:
$V_\alpha(x, \pi) = E_x^{\pi} \sum_{t=0}^{\infty} \alpha^t c(x_t, a_t)$, (1.3)
$\tilde{V}_\alpha(x, \pi) = \tilde{E}_x^{\pi} \sum_{t=0}^{\infty} \alpha^t c(\tilde{x}_t, \tilde{a}_t)$. (1.4)
Under Assumptions 3.1 and 3.2 given in Section 3, there exist stationary optimal policies $f^*$ and $\tilde{f}^*$ such that
$V_\alpha(x, f^*) = \inf_{\pi \in \Pi} V_\alpha(x, \pi) =: V_\alpha^*(x), \qquad \tilde{V}_\alpha(x, \tilde{f}^*) = \inf_{\pi \in \Pi} \tilde{V}_\alpha(x, \pi)$. (1.5)
To set the stability estimation problem, first suppose that the process given in Equation (1.2) is interpreted as an "available approximation" to the process given in Equation (1.1), i.e., $\tilde{\mu}$ is an approximation to $\mu$.
Second, the policy $\tilde{f}^*$ (optimal with respect to Equation (1.4)) is applied to control the "original process" given in Equation (1.1) (instead of the "unavailable" optimal policy $f^*$).
Following the definition given in [3] [4] [5] [6] [7], we introduce the stability index:
$\Delta(x) := V_\alpha(x, \tilde{f}^*) - V_\alpha(x, f^*)$, $x \in X$,
where $V_\alpha$ is the cost associated with the value function defined in Equation (1.5). This definition means that $\Delta(x)$ represents an extra cost paid for using $\tilde{f}^*$ instead of the optimal policy $f^*$.
Under certain Lipschitz conditions it was proved (for processes with bounded cost $c$) that
$\Delta(x) \le K\, \pi_{LP}(\mu, \tilde{\mu})$, (1.6)
where $K$ is an explicitly calculated constant and $\pi_{LP}$ is the Lévy-Prokhorov metric (see Section 2 for the definition). Convergence in $\pi_{LP}$ is equivalent to weak convergence (see [8]).
Inequalities as given in Equation (1.6) have been developed with other types of metrics (Kantorovich, total variation, etc.) and optimization criteria (the average cost); see e.g. [5] [7] [9] [10]. Other types of criteria used to obtain the stability of the process can be consulted in [11] [12] [13].
The aim of the present paper is to take advantage of the boundedness of $c$ and use the well-known contractive properties of the operators related to the expected total discounted cost optimality equations to prove the "stability inequality" as in Equation (1.6), with the Lévy-Prokhorov distance on its right-hand side.
This paper is organized as follows. Section 2 defines the Markov control model and the problem of its stability. Section 3 presents the Lipschitz conditions and the assumptions that guarantee the existence of an optimal control for the Markov control process, as well as the main result of this work, Theorem 3.1, which establishes the conditions under which stability is achieved. Section 4 presents a couple of application examples, for which the assumptions are validated and the result of Theorem 3.1 is then applied. Finally, Section 5 presents the proof of Theorem 3.1, as well as a couple of lemmas required for this proof.
2. Setting of the Problem
In the standard way (see for instance [1] [14]), a Markov control process (MCP) in discrete time with infinite horizon, stationary and homogeneous, will be denoted as the following five-tuple:
$M = (X, A, \{A(x) : x \in X\}, p, c)$, (2.1)
where it will be assumed that the components of the controllable process $M$ have the following characteristics:
• The state space $X$ is a metric space with a metric $d$, and $\mathcal{B}(X)$ denotes its Borel sigma-algebra;
• The action space $A$ is a metric space with a metric $l$;
• The set of admissible actions $A(x) \subseteq A$ is compact for every $x \in X$;
• The set of admissible state-action pairs $\mathbb{K} := \{(x, a) : x \in X, a \in A(x)\}$ is a non-empty (and measurable) Borel subset of the set $X \times A$, and it is equipped with the metric $\rho((x, a), (x', a')) := d(x, x') + l(a, a')$;
• $p$ is a stochastic kernel on $X$ given $\mathbb{K}$. This stochastic kernel specifies the transition probability:
$p(B \mid x, a) = P(x_{t+1} \in B \mid x_t = x, a_t = a)$, (2.2)
where $B \in \mathcal{B}(X)$ and $(x, a) \in \mathbb{K}$;
• Finally, $c : \mathbb{K} \to \mathbb{R}$ is a bounded and measurable function called the one-step cost function.
On the other hand, in many applications the evolution of the MCP given in Equation (2.1) is specified by the following model:
$x_{t+1} = F(x_t, a_t, \xi_t)$, $t = 0, 1, \dots$, (2.3)
where $x_0 = x \in X$ represents the initial state and $\{\xi_t\}$ is a sequence of i.i.d. random vectors that take values in a Borel space $S$ with a common distribution $\mu$. In fact, it is considered that $S$ is a metric space equipped with a metric $r$ and that $F : \mathbb{K} \times S \to X$ is a measurable function. The process given in Equation (2.3) will be referred to as the original process.
Let $x \in X$ be the initial state and $\pi \in \Pi$ the applied control policy ($\Pi$ is the set of all control policies; see [1] [14] for definitions). Then the performance criterion called the expected total $\alpha$-discounted cost is defined, as usual, by the following functional:
$V_\alpha(x, \pi) := E_x^{\pi} \sum_{t=0}^{\infty} \alpha^t c(x_t, a_t)$, (2.4)
where $\alpha \in (0, 1)$ is a fixed discount coefficient and $E_x^{\pi}$ denotes the expected value corresponding to the distribution of the process $\{x_t\}$ with the initial state $x$ and the control policy $\pi$ applied.
Now, the function
$V_\alpha^*(x) := \inf_{\pi \in \Pi} V_\alpha(x, \pi)$, $x \in X$,
is called the value function, and a control policy $f^*$ (provided it exists) is called optimal (with respect to the criterion $V_\alpha$) if it meets the following:
$V_\alpha(x, f^*) = V_\alpha^*(x)$, $x \in X$. (2.5)
Later, conditions will be imposed that will guarantee the existence of an optimal stationary policy $f^*$ for Equation (2.5) (see [14]).
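When the state and action spaces are finite, the value function in Equation (2.5) and an optimal stationary policy can be computed by value iteration, since the dynamic programming operator is a contraction with modulus $\alpha$. The following sketch is only illustrative (the array layout, the stopping rule and all names are our own choices, not part of the model above):

```python
import numpy as np

def value_iteration(P, c, alpha, tol=1e-10):
    """Value iteration for a finite-state, finite-action discounted MCP.

    P[a][x, y] = transition probability p(y | x, a); c[x, a] = one-step cost.
    Returns an approximation of the value function V* and an optimal
    stationary policy f* (one action index per state).
    """
    n_states, n_actions = c.shape
    V = np.zeros(n_states)
    while True:
        # Q(x, a) = c(x, a) + alpha * sum_y p(y | x, a) * V(y)
        Q = c + alpha * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.min(axis=1)
        # standard stopping rule: guarantees ||V_new - V*|| <= tol
        if np.max(np.abs(V_new - V)) < tol * (1 - alpha) / (2 * alpha):
            return V_new, Q.argmin(axis=1)
        V = V_new
```

The returned pair satisfies the optimality Equation (2.5) up to the tolerance; applying the same routine with an approximate kernel yields the approximate-optimal policy discussed below.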
The stability index and its estimation problem. The stability estimation problem arises when there is uncertainty about the transition probability $p$ defined in Equation (2.2). The original task of the controller consists of the search for (or approximation of) the optimal policy $f^*$ that satisfies Equation (2.5) for the original process. In many applications, this task cannot be accomplished directly because of, among other reasons, any of the following:
1) Frequently, $p$ or some of its parameters are unknown to the controller, and this transition probability is estimated using some statistical procedures (from observations). With the results of these estimates, another transition probability $\tilde{p}$ is generated, which is interpreted as an accessible approximation to the unknown $p$.
2) There are situations in which $p$ is known but too complicated to leave any hope of solving the control policy optimization problem. In such cases, $p$ is sometimes replaced by a "theoretical approximation" $\tilde{p}$, resulting in a controllable process with a simpler structure.
In both cases, when optimizing policies the controller has to work with the controllable Markov process $\tilde{M}$ defined by the accessible transition probability $\tilde{p}$. This means that instead of the original process given in Equation (2.3), the controller uses an approximate process given by the following equation:
$\tilde{x}_{t+1} = F(\tilde{x}_t, \tilde{a}_t, \tilde{\xi}_t)$, $t = 0, 1, \dots$, with $\tilde{x}_0 = x$ given, (2.6)
where $\tilde{x}_t \in X$ are the states of the process, $\tilde{a}_t \in A(\tilde{x}_t)$ is an action at the corresponding state, and $\{\tilde{\xi}_t\}$ is a sequence of i.i.d. random vectors with values in $S$. The only possible difference between the processes given in Equations (2.3) and (2.6) lies in the distributions $\mu$ and $\tilde{\mu}$ of the random vectors $\xi_t$ and $\tilde{\xi}_t$, respectively.
Replacing $\xi_t$ by $\tilde{\xi}_t$ in Equations (2.4) and (2.5) defines the corresponding optimization criterion $\tilde{V}_\alpha(x, \pi)$ for the approximate process, with value function $\tilde{V}_\alpha^*(x) := \inf_{\pi \in \Pi} \tilde{V}_\alpha(x, \pi)$.
Suppose now that it is possible (at least theoretically) to find an optimal policy $\tilde{f}^*$ for the approximate process, i.e., the value function for the approximate process satisfies
$\tilde{V}_\alpha(x, \tilde{f}^*) = \tilde{V}_\alpha^*(x)$, $x \in X$. (2.7)
The control policy $\tilde{f}^*$ in Equation (2.7) is used as an approximation to the inaccessible optimal policy $f^*$ (assuming it exists). In other words, the policy $\tilde{f}^*$ is used to control the original process $M$ instead of the unknown policy $f^*$.
The increase in cost under such an approach is estimated by means of the following stability index (see [3] [4]):
$\Delta(x) := V_\alpha(x, \tilde{f}^*) - V_\alpha(x, f^*)$, $x \in X$. (2.8)
As proposed in [5] [6], the stability estimation problem consists of the search for inequalities of the following type (stability inequalities):
$\Delta(x) \le \Psi(x)\, \phi(\nu(p, \tilde{p}))$, $x \in X$, (2.9)
where: $\nu(p, \tilde{p})$ is a "distance" between the transition probabilities $p$ and $\tilde{p}$ (expressed in terms of a probabilistic metric); $\phi$ is a continuous function such that $\phi(t) \to 0$ when $t \to 0$; and $\Psi$ is a function with values calculated explicitly.
The results presented in [4] [5] provide inequalities such as the one given in the inequality (2.9) in which $\nu$ is one of the so-called "strong metrics": the total variation metric and the weighted total variation metric.
The aim of this article is to obtain stability inequalities such as the one given in the inequality (2.9) with the use of "weak" probabilistic metrics, specifically the Lévy-Prokhorov metric ($\pi_{LP}$).
For instance, Theorem 3.1 presented in the next section (see inequalities (3.1) and (3.2)) ensures that under appropriate conditions
$\Delta(x) \le K\, \pi_{LP}(\mu, \tilde{\mu})$, $x \in X$, (2.10)
where $\pi_{LP}$ is the Lévy-Prokhorov metric on the space of probability measures on $(S, \mathcal{B}(S))$, and $\mathcal{B}(S)$ denotes the Borel sigma-algebra of the metric space $S$.
It is well known (see [15]) that $\pi_{LP}$ metrizes weak convergence in any separable metric space: a sequence of random vectors whose distributions converge under the metric $\pi_{LP}$ converges weakly.
3. Assumptions and Results
The Hausdorff distance ($h$) between compact subsets $K_1, K_2$ of the metric space $(A, l)$ is given by
$h(K_1, K_2) := \max\{\tilde{h}(K_1, K_2),\, \tilde{h}(K_2, K_1)\}$,
where $\tilde{h}(K_1, K_2) := \sup_{a \in K_1} \inf_{b \in K_2} l(a, b)$.
Likewise, the so-called "strong metric", the total variation metric ($\|\cdot\|_{TV}$), is given by
$\|P - Q\|_{TV} := \sup_{\|g\|_\infty \le 1} \left| \int_S g\, dP - \int_S g\, dQ \right|$,
where $P, Q$ are in the space of probability distributions over $\mathcal{B}(S)$ and $\|\cdot\|_\infty$ is the supremum norm. Of course, under $\|P_n - P\|_{TV} \to 0$, then $P_n$ converges weakly to $P$.
On the other hand, one of the metrics called "weak" is the Kantorovich metric ($\kappa$):
$\kappa(P, Q) := \sup_{g \in \mathcal{L}} \left| \int_S g\, dP - \int_S g\, dQ \right|$,
where the functions $g$ are Lipschitz; namely, the set $\mathcal{L}$ is defined as
$\mathcal{L} := \{ g : S \to \mathbb{R} \text{ such that } |g(s) - g(s')| \le r(s, s') \text{ for all } s, s' \in S \}$.
It is well known (see [9]) that in the case of $S = \mathbb{R}$, it is true that $\kappa(P_n, P) \to 0$ if and only if $P_n \Rightarrow P$ (weak convergence) and $\int_{\mathbb{R}} |s|\, dP_n \to \int_{\mathbb{R}} |s|\, dP$, and that
$\kappa(P, Q) = \int_{\mathbb{R}} |F_P(s) - F_Q(s)|\, ds$,
where $F_P$ and $F_Q$ are the corresponding distribution functions.
In the remainder of the article, $B$ will denote the Banach space of all measurable functions $u : X \to \mathbb{R}$ for which the norm $\|u\| := \sup_{x \in X} |u(x)|$ is finite.
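For distributions supported on a finite grid of points in $\mathbb{R}$, the two metrics above can be evaluated directly. The following sketch is our own illustration (it uses the normalization $\|P - Q\|_{TV} = \sup_{\|g\|_\infty \le 1} |\int g\, dP - \int g\, dQ|$ and the distribution-function formula for $\kappa$ stated above):

```python
import numpy as np

def total_variation(p, q):
    """Total variation metric between two probability vectors on a common
    support; with the sup over |g| <= 1 normalization, it equals the L1
    distance between the weight vectors."""
    return float(np.abs(np.asarray(p) - np.asarray(q)).sum())

def kantorovich(xs, p, q):
    """Kantorovich metric on S = R between distributions with weights p, q
    on the increasing grid xs: the integral of |F_P(s) - F_Q(s)| ds."""
    F_p = np.cumsum(p)
    F_q = np.cumsum(q)
    widths = np.diff(xs)  # lengths of the intervals between grid points
    return float(np.sum(np.abs(F_p[:-1] - F_q[:-1]) * widths))
```

For two unit masses at 0 and 1, $\|P - Q\|_{TV} = 2$ while $\kappa(P, Q) = 1$; moving the second mass closer to the first makes $\kappa$ small but leaves the total variation metric at 2, which illustrates the sense in which $\kappa$ is a "weak" metric.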
The first set of technical assumptions is required to ensure the existence of minimizers in the value functions of the original and the approximate models; see [16].
Assumption 3.1.
1) The set $A(x)$ is compact for each $x \in X$; also, the set-valued mapping $x \mapsto A(x)$ is upper semicontinuous with respect to the Hausdorff metric.
2) The one-step cost function $c$ is bounded, namely $|c(x, a)| \le b < \infty$ for each $(x, a) \in \mathbb{K}$; and for each $x \in X$, the one-step cost function $c(x, \cdot)$ is lower semicontinuous on $A(x)$.
3) For each bounded continuous function $u : X \to \mathbb{R}$, the functions
$(x, a) \mapsto E\, u(F(x, a, \xi))$ and $(x, a) \mapsto E\, u(F(x, a, \tilde{\xi}))$, with $(x, a) \in \mathbb{K}$,
are continuous on $\mathbb{K}$.
The second set of assumptions imposes the "Lipschitz conditions" on the one-step cost function as well as on the transition probabilities of the original and approximate processes.
Assumption 3.2.
There are finite constants $b, L_1, L_2, L_3, L_4$ such that the following is true:
1) $|c(x, a)| \le b$ for each $(x, a) \in \mathbb{K}$;
2) $|c(x, a) - c(x', a')| \le L_1 [d(x, x') + l(a, a')]$ for all $(x, a), (x', a') \in \mathbb{K}$;
3) $h(A(x), A(x')) \le L_2\, d(x, x')$ for all $x, x' \in X$, where $h$ is the Hausdorff metric;
4) $\|p(\cdot \mid x, a) - p(\cdot \mid x', a')\|_{TV} \le L_3 [d(x, x') + l(a, a')]$ for all $(x, a), (x', a') \in \mathbb{K}$, where $\|\cdot\|_{TV}$ is the total variation metric (and the same holds for the transition probability $\tilde{p}$ of the approximate process);
5) $d(F(x, a, s), F(x, a, s')) \le L_4\, r(s, s')$ for all $(x, a) \in \mathbb{K}$, $s, s' \in S$;
6) For each $x \in X$, $a \in A(x)$ and each bounded function $u \in B$, the function $(x, a) \mapsto E\, u(F(x, a, \xi))$ is lower semicontinuous on $\mathbb{K}$.
For a proof of the following proposition, see [16].
Proposition 1 (Well-known result). Under Assumptions 3.1 and 3.2, for the control processes given in Equations (2.3) and (2.6) there are optimal stationary control policies, denoted by $f^*$ and $\tilde{f}^*$ respectively, such that $f^*$ and $\tilde{f}^*$ do not depend on the initial state $x$. In addition, the corresponding value functions satisfy $V_\alpha^*, \tilde{V}_\alpha^* \in B$. In particular, for each fixed $x \in X$, the expected values $V_\alpha(x, f^*)$ and $V_\alpha(x, \tilde{f}^*)$ are well defined.
Now, we are in a position to formulate the main result of the paper.
Theorem 3.1. Under Assumptions 3.1 and 3.2, the stability index given in Equation (2.8) meets the following inequality:
$\Delta(x) \le K\, \pi_{LP}(\mu, \tilde{\mu})$, $x \in X$, (3.1)
where the stability constant is
$K := \dfrac{2\alpha(3 - \alpha)}{(1 - \alpha)^2} \left[ \dfrac{b}{1 - \alpha} + L_4 (1 + L_2) \left( L_1 + \dfrac{\alpha b L_3}{1 - \alpha} \right) \right]$. (3.2)
Note that if $\alpha \to 1$, then the constant $K$ in Equation (3.2) is of order $O\big((1 - \alpha)^{-3}\big)$.
4. Some Examples
4.1. The Process of Regularization of the Water Level in a Dam
An important application of control problems (deterministic and stochastic) are those related to water reserve operations. An excellent introduction to many of these problems, including the connection between these and inventory systems, is given in [17].
In the simplest case of regularization of the water level in a dam, the following model can be used for the original process:
$x_{t+1} = \min\{x_t - a_t + \xi_t,\, U\}$, $t = 0, 1, \dots$, (4.1)
and the respective approximate model remains as
$\tilde{x}_{t+1} = \min\{\tilde{x}_t - \tilde{a}_t + \tilde{\xi}_t,\, U\}$, $t = 0, 1, \dots$ (4.2)
In this model, the state variable $x_t$ represents the level of the stock (volume) of water that the dam has at the beginning of the period $t$; the control $a_t$ is the amount of water that is released from the dam for family consumption, irrigation, electric power, etc. during the period $t$; and the "disturbance" $\xi_t$ is the amount of water that the dam receives, randomly, via rain for instance.
In this example, we get $X = [0, U]$, $A = [0, U]$, $A(x) = [0, x]$ and $S = [0, \infty)$, where $U$ is the maximum capacity of the dam.
Let $c(x, a)$ be the cost paid for the released water service; for example, use can be made of a cost function given by $c(x, a) = \delta a$, proportional to water consumption, where $\delta > 0$ would represent the cost of a unit of water.
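With these choices, the criterion (2.4) for a fixed stationary policy can be approximated by plain Monte Carlo simulation. The sketch below is purely illustrative: the release rule $a = \theta x$, the exponential rainfall distribution and all parameter names are our own assumptions, not part of the model:

```python
import numpy as np

def discounted_cost_dam(x0, theta, delta, U, alpha, rng,
                        rain_mean=1.0, horizon=500, n_paths=2000):
    """Monte Carlo estimate of the expected total alpha-discounted cost (2.4)
    for the dam model (4.1) under the stationary policy a = theta * x
    (theta in [0, 1] keeps the action admissible, i.e. a in A(x) = [0, x]).

    Dynamics: x_{t+1} = min(x_t - a_t + xi_t, U); cost c(x, a) = delta * a.
    """
    x = np.full(n_paths, float(x0))
    total = np.zeros(n_paths)
    discount = 1.0
    for _ in range(horizon):
        a = theta * x                          # released water, 0 <= a <= x
        total += discount * delta * a          # accumulate discounted cost
        xi = rng.exponential(rain_mean, n_paths)
        x = np.minimum(x - a + xi, U)          # new level, capped at U
        discount *= alpha
    return float(total.mean())
```

Running the same routine with rainfall drawn from $\tilde{\mu}$ instead of $\mu$ gives the corresponding costs of the approximate model (4.2), so the two estimates can be compared directly.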
To ensure compliance with Assumption 3.2 for this example, it is admitted that the following conditions are met:
➢ C1. The one-step cost $c$ satisfies the Assumption 3.2 clauses (1) and (2) (for $c(x, a) = \delta a$ this holds with $b = \delta U$ and $L_1 = \delta$).
➢ C2. The random variable $\xi_t$ has a density $f_\xi$, which:
1) is bounded by a constant $\beta$;
2) satisfies the condition of Lipschitz with a constant $\gamma$.
For $A(x) = [0, x]$, the clause (3) of Assumption 3.2 is verified directly (using the Hausdorff metric definition) with the constant $L_2 = 1$. Now, denoting $F(x, a, s) := \min\{x - a + s,\, U\}$, it is easy to see that for each pair $(x, a)$ fixed, the function $s \mapsto F(x, a, s)$ is Lipschitz in $S$ with the constant 1. Then the clause (5) of this assumption is complied with $L_4 = 1$. Next, the clause (4) of Assumption 3.2 will be verified.
Take $(x, a), (x', a') \in \mathbb{K}$ and, denoting $w := x - a$ and $w' := x' - a'$ (so that $w, w' \in [0, U]$), consider the following random variables:
$Z := \min\{w + \xi,\, U\}$, $Z' := \min\{w' + \xi,\, U\}$.
Since $p(\cdot \mid x, a)$ and $p(\cdot \mid x', a')$ are the distributions of $Z$ and $Z'$, respectively, it is enough to prove that for a constant $L_3 < \infty$ the following inequality is met:
$\|P_Z - P_{Z'}\|_{TV} \le L_3 (|x - x'| + |a - a'|)$. (4.3)
According to the definition of the total variation metric, to prove the inequality (4.3) it must be proved that for each measurable function $g$ with $\|g\|_\infty \le 1$ it is true that
$|E\, g(Z) - E\, g(Z')| \le L_3 (|x - x'| + |a - a'|)$.
Now then, for a random variable of the form $Z = \min\{w + \xi,\, U\}$ we get, after the change of variable $y = w + s$, the representation
$E\, g(Z) = \int_{-\infty}^{U} g(y) f_\xi(y - w)\, dy + g(U)\, P(\xi \ge U - w)$.
Using the same representation for $Z'$, we get that
$|E\, g(Z) - E\, g(Z')| \le \int_{-\infty}^{U} |f_\xi(y - w) - f_\xi(y - w')|\, dy + |P(\xi \ge U - w) - P(\xi \ge U - w')|$. (4.4)
For the second term on the right side of the last inequality, we get that
$|P(\xi \ge U - w) - P(\xi \ge U - w')| = |F_\xi(U - w) - F_\xi(U - w')|$, (4.5)
where $F_\xi$ is the distribution function of $\xi$. Since the density $f_\xi$ is bounded by $\beta$ (condition C2), the distribution function $F_\xi$ is Lipschitz with the constant $\beta$; then, from Equation (4.5), we get that
$|P(\xi \ge U - w) - P(\xi \ge U - w')| \le \beta\, |w - w'| \le \beta\, (|x - x'| + |a - a'|)$. (4.6)
For the first term on the right side of the inequality (4.4), note that $f_\xi$ vanishes on the negative half-line, so the integrand is zero outside the interval $[\min\{w, w'\},\, U]$, whose length is at most $U$; then, taking into account that $f_\xi$ is Lipschitz with the constant $\gamma$ (condition C2),
$\int_{-\infty}^{U} |f_\xi(y - w) - f_\xi(y - w')|\, dy \le U \gamma\, |w - w'| \le U \gamma\, (|x - x'| + |a - a'|)$. (4.7)
Joining the inequalities (4.4), (4.6) and (4.7), the inequality (4.3) is obtained with $L_3 = \beta + U\gamma$.
Finally, it has been established that for this example the clause (4) of Assumption 3.2 is met with $L_3 = \beta + U\gamma$. Following similar arguments, it can be shown that the clause (6) of Assumption 3.2 is also true. Therefore, in this example the inequality (3.1) of Theorem 3.1 can be applied, obtaining the following:
$\Delta(x) \le K_0\, \pi_{LP}(\mu, \tilde{\mu})$, $x \in [0, U]$, (4.8)
where
$K_0$ is the stability constant obtained from Equation (3.2) with $b = \delta U$, $L_1 = \delta$, $L_2 = 1$, $L_3 = \beta + U\gamma$ and $L_4 = 1$. (4.9)
On the other hand, the distance $\pi_{LP}(\mu, \tilde{\mu})$ given in the inequality (4.8) is very difficult to calculate. Therefore, the result given in the inequality (4.8) can be expressed in terms of other probabilistic metrics, as is shown in the following:
➢ Total variation metric. Using the well-known relationship $\pi_{LP}(\mu, \tilde{\mu}) \le \|\mu - \tilde{\mu}\|_{TV}$, see [18], between the metrics of Lévy-Prokhorov and of total variation, and since in this example both $\mu$ and $\tilde{\mu}$ have densities ($f_\xi$ and $f_{\tilde{\xi}}$, say), we can bound the right side of the inequality (4.8) to obtain the next stability inequality:
$\Delta(x) \le K_0 \int_0^{\infty} |f_\xi(s) - f_{\tilde{\xi}}(s)|\, ds$, (4.10)
where the constant $K_0$ is given in the inequality (4.9).
➢ Kantorovich metric ($\kappa$). Let $F_\xi$ and $F_{\tilde{\xi}}$ be the distribution functions of the random variables $\xi_t$ and $\tilde{\xi}_t$, respectively, in Equations (4.1) and (4.2). Then, using the fact that $\pi_{LP}^2 \le \kappa$, see [18], which relates the Lévy-Prokhorov metric and the Kantorovich metric (defined in Section 3), the right side of the inequality (4.8) is bounded as
$\Delta(x) \le K_0 \left( \int_0^{\infty} |F_\xi(s) - F_{\tilde{\xi}}(s)|\, ds \right)^{1/2}$, (4.11)
where the constant $K_0$ is given in the inequality (4.9).
The integral in the last inequality represents the Kantorovich metric between $\mu$ and $\tilde{\mu}$. The inequality (4.11) is more informative compared to the inequality (4.10), since it supports the approximation of $F_{\tilde{\xi}}$ by the corresponding empirical distribution functions.
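To exploit this in practice, the Kantorovich integral in inequality (4.11) can be estimated directly from rainfall data. A minimal sketch (our own illustration) uses the standard fact that, for two samples of equal size on $\mathbb{R}$, the integral of the absolute difference of the empirical distribution functions equals the average absolute difference of the sorted samples:

```python
import numpy as np

def empirical_kantorovich(sample_p, sample_q):
    """Kantorovich metric between the empirical distributions of two equally
    sized samples on R: integral |F_n(s) - G_n(s)| ds = mean |X_(i) - Y_(i)|
    over the order statistics X_(i), Y_(i)."""
    a = np.sort(np.asarray(sample_p, dtype=float))
    b = np.sort(np.asarray(sample_q, dtype=float))
    if a.size != b.size:
        raise ValueError("samples must have equal size")
    return float(np.mean(np.abs(a - b)))
```

Plugging this estimate into inequality (4.11), together with the constant $K_0$, gives a data-driven upper bound for the stability index of the dam model.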
4.2. Example 4.2
Let $X = \mathbb{R}$, $S = \mathbb{R}$ and $A(x) = A$ for all $x \in X$, with $A$ being a compact set in $\mathbb{R}$. Now, define the following processes:
$x_{t+1} = H(x_t, a_t) + G(x_t, a_t)\, \xi_t$, $t = 0, 1, \dots$;
$\tilde{x}_{t+1} = H(\tilde{x}_t, \tilde{a}_t) + G(\tilde{x}_t, \tilde{a}_t)\, \tilde{\xi}_t$, $t = 0, 1, \dots$;
where $H, G : \mathbb{K} \to \mathbb{R}$ are bounded and Lipschitz functions with constants $L_H$ and $L_G$, respectively; it is additionally assumed that $|G|$ is bounded below by a constant $g_0 > 0$ and that the density $f_\xi$ of $\xi_t$ is bounded and Lipschitz (as in condition C2 of the previous example).
In [19], it is shown that Assumption 3.1 is satisfied for this model.
Properly selecting a cost function $c$ that is bounded and Lipschitz, it is assured that the clauses (1) and (2) of Assumption 3.2 are fulfilled: it suffices to select the constant $b$ as the bound of $c$ and the constant $L_1$ as its Lipschitz constant.
On the other hand, since $A(x) = A$ for all $x$, it is clear that the clause (3) is satisfied for any positive constant $L_2$. To validate the clause (4) of Assumption 3.2, first define the following random variables:
$Z := H(x, a) + G(x, a)\, \xi$, $Z' := H(x', a') + G(x', a')\, \xi$,
so it is clear that the probability densities of these random variables are, respectively,
$f_Z(y) = \dfrac{1}{|G(x, a)|}\, f_\xi\!\left( \dfrac{y - H(x, a)}{G(x, a)} \right)$, $f_{Z'}(y) = \dfrac{1}{|G(x', a')|}\, f_\xi\!\left( \dfrac{y - H(x', a')}{G(x', a')} \right)$;
then, since in this example the density $f_\xi$ is bounded and Lipschitz, after some direct calculations we get to the next result:
$\|P_Z - P_{Z'}\|_{TV} \le C \big( |H(x, a) - H(x', a')| + |G(x, a) - G(x', a')| \big)$
for a finite constant $C$, and as it was assumed that the functions $H$ and $G$ are Lipschitz with the constants $L_H$ and $L_G$ respectively, then from the last inequality we get that
$\|P_Z - P_{Z'}\|_{TV} \le C (L_H + L_G) \big( |x - x'| + |a - a'| \big)$.
So, by selecting the constant $L_3 = C (L_H + L_G)$, the clause (4) of Assumption 3.2 is satisfied. To validate the clause (5) of this assumption, let $s, s' \in S$ and note that
$|F(x, a, s) - F(x, a, s')| = |G(x, a)|\, |s - s'|$,
and since the functions $H$ and $G$ are bounded, let $L < \infty$ be the finite constant such that $|G(x, a)| \le L$ for all $(x, a) \in \mathbb{K}$. Therefore, from the last equality we get that
$|F(x, a, s) - F(x, a, s')| \le L\, |s - s'|$.
So, for the constant $L_4 = L$, the clause (5) is satisfied. Finally, since the function $F(x, a, s) = H(x, a) + G(x, a)\, s$ is continuous in all its arguments, the clause (6) is also true.
In conclusion, Example 4.2 satisfies Assumption 3.2, so the result of Theorem 3.1 can be applied, see inequalities (3.1), (3.2), to bound the stability index using the Lévy-Prokhorov metric:
$\Delta(x) \le K\, \pi_{LP}(\mu, \tilde{\mu})$, $x \in \mathbb{R}$,
where $K$ is the constant obtained from Equation (3.2) with the constants $b, L_1, L_2, L_3, L_4$ found above.
5. Proofs
5.1. Some Preliminary Lemmas
For the proof of Theorem 3.1, the following lemmas will be used.
Lemma 5.1. Under Assumption 3.2, the value function $V_\alpha^*$ defined in Equation (2.5) satisfies the condition of Lipschitz on the state space $X$, with the
constant $L_V := (1 + L_2) \left( L_1 + \dfrac{\alpha b L_3}{1 - \alpha} \right)$.
Proof. By the Assumption 3.2 clause (1), for each $(x, a) \in \mathbb{K}$ we get that $c(x, a)$ is bounded by $b$; then
$\|V_\alpha^*\| \le \sum_{t=0}^{\infty} \alpha^t b = \dfrac{b}{1 - \alpha}$.
On the other hand, in [16] it is proved that the following operators:
$Tu(x) := \min_{a \in A(x)} \left\{ c(x, a) + \alpha \int_S u(F(x, a, s))\, \mu(ds) \right\}$, $\tilde{T}u(x) := \min_{a \in A(x)} \left\{ c(x, a) + \alpha \int_S u(F(x, a, s))\, \tilde{\mu}(ds) \right\}$, $u \in B$, (5.1)
are contractive in the Banach space $B$ with modulus $\alpha$.
Now, from these operators the terms inside the "brackets" will be selected to define the following function:
$g(x, a) := c(x, a) + \alpha \int_S V_\alpha^*(F(x, a, s))\, \mu(ds)$, $(x, a) \in \mathbb{K}$.
It is claimed that the function $g$ is Lipschitz with the constant $L_g := L_1 + \dfrac{\alpha b L_3}{1 - \alpha}$.
To prove it, let $(x, a), (x', a') \in \mathbb{K}$; then
$|g(x, a) - g(x', a')| \le |c(x, a) - c(x', a')| + \alpha \left| \int_X V_\alpha^*(y)\, p(dy \mid x, a) - \int_X V_\alpha^*(y)\, p(dy \mid x', a') \right|$.
Applying the Assumption 3.2 clause (2) and the fact that $\|V_\alpha^*\| \le b / (1 - \alpha)$, we get that
$|g(x, a) - g(x', a')| \le L_1 [d(x, x') + l(a, a')] + \dfrac{\alpha b}{1 - \alpha} \|p(\cdot \mid x, a) - p(\cdot \mid x', a')\|_{TV}$.
Then, applying the Assumption 3.2 clause (4), the previous inequality can be expressed as
$|g(x, a) - g(x', a')| \le \left( L_1 + \dfrac{\alpha b L_3}{1 - \alpha} \right) [d(x, x') + l(a, a')]$;
therefore, $g$ is Lipschitz in $\mathbb{K}$ with the constant $L_g = L_1 + \alpha b L_3 / (1 - \alpha)$.
By virtue of the fact that the operators given in Equation (5.1) are contractive and that $V_\alpha^*$ is the fixed point of $T$, we get that
$V_\alpha^*(x) = \min_{a \in A(x)} g(x, a)$, $x \in X$.
Then, to prove that the function $V_\alpha^*$ is Lipschitz in $X$ with the constant $L_V = L_g (1 + L_2)$, it is enough to prove the following: for the function $g$, which is Lipschitz with the constant $L_g$, it follows that for all $x, x' \in X$:
$\left| \min_{a \in A(x)} g(x, a) - \min_{a \in A(x')} g(x', a) \right| \le L_g (1 + L_2)\, d(x, x')$. (5.2)
Remark 5.1. Observe that
$V_\alpha^*(x) - V_\alpha^*(x') = \min_{a \in A(x)} g(x, a) - \min_{a \in A(x')} g(x', a)$;
so, if the inequality (5.2) is met, then it is true that
$|V_\alpha^*(x) - V_\alpha^*(x')| \le L_g (1 + L_2)\, d(x, x')$,
which would conclude that $V_\alpha^*$ is Lipschitz in $X$.
Next, the proof of the inequality (5.2) is presented.
Let $x, x' \in X$ be fixed. Then, by the inequality of the triangle, we get the following:
$\left| \min_{a \in A(x)} g(x, a) - \min_{a \in A(x')} g(x', a) \right| \le I + \sup_{a \in A(x')} |g(x, a) - g(x', a)| \le I + L_g\, d(x, x')$,
where
$I := \left| \min_{a \in A(x)} g(x, a) - \min_{a \in A(x')} g(x, a) \right|$. (5.3)
It will be proven that
$I \le L_g\, h(A(x), A(x'))$. (5.4)
The proof will be done by contradiction. Assuming the inequality given in (5.4) is not met, then there is an $\varepsilon > 0$ such that the following is satisfied:
$I > L_g\, h(A(x), A(x')) + \varepsilon$. (5.5)
Due to the compactness of the sets $A(x)$, $A(x')$ and to the continuity of $g$, there are elements $a_0 \in A(x)$, $a_0' \in A(x')$ for which the infima in $I$ are reached; see Equation (5.3).
If it is admitted, for example, that $\min_{a \in A(x)} g(x, a) \ge \min_{a \in A(x')} g(x, a)$, then
$I = g(x, a_0) - g(x, a_0')$. (5.6)
Now, by the definition of the Hausdorff metric, as $a_0' \in A(x')$, there exists $a'' \in A(x)$ such that $l(a'', a_0') \le h(A(x), A(x')) + \varepsilon / (2 L_g)$, and consequently we get that
$g(x, a'') \le g(x, a_0') + L_g\, l(a'', a_0') \le g(x, a_0') + L_g\, h(A(x), A(x')) + \varepsilon / 2$.
The above, together with (5.5) and (5.6), implies that
$g(x, a'') < g(x, a_0') + I = g(x, a_0)$;
if this last inequality is taken into account, it contradicts the fact that $a_0$ is the element for which the minimum of $g(x, \cdot)$ over $A(x)$ is reached. Therefore, the assumption made in the inequality (5.5) is false. Then, we get that $I \le L_g\, h(A(x), A(x'))$, which, by the Assumption 3.2 clause (3), implies that $I \le L_g L_2\, d(x, x')$, and consequently
$\left| \min_{a \in A(x)} g(x, a) - \min_{a \in A(x')} g(x', a) \right| \le L_g (1 + L_2)\, d(x, x')$.
Finally, because of the comments made in Remark 5.1, we get that
$|V_\alpha^*(x) - V_\alpha^*(x')| \le L_g (1 + L_2)\, d(x, x') = L_V\, d(x, x')$,
which proves Lemma 5.1.
Lemma 5.2. Under Assumption 3.2, the value function $V_\alpha^*$ defined in Equation (2.5) satisfies, composed with the dynamics $F$, the condition of Lipschitz on the space $S$ with the constant $L_V L_4$; that is, for each fixed $(x, a) \in \mathbb{K}$, the function $s \mapsto V_\alpha^*(F(x, a, s))$ is Lipschitz on $S$ with the constant $L_V L_4$.
Proof. For the proof of Lemma 5.2, the following function will be used: for fixed $(x, a) \in \mathbb{K}$, define the function $\psi_{x,a} : S \to \mathbb{R}$ as
$\psi_{x,a}(s) := V_\alpha^*(F(x, a, s))$, $s \in S$.
Let $s, s' \in S$. By the definition of the function $\psi_{x,a}$ and because of Lemma 5.1, we come to the next inequality:
$|\psi_{x,a}(s) - \psi_{x,a}(s')| = |V_\alpha^*(F(x, a, s)) - V_\alpha^*(F(x, a, s'))| \le L_V\, d(F(x, a, s), F(x, a, s'))$.
Now, applying the Assumption 3.2 clause (5) to the previous inequality, we get the following:
$|\psi_{x,a}(s) - \psi_{x,a}(s')| \le L_V L_4\, r(s, s')$;
namely, $\psi_{x,a}$ is Lipschitz on $S$ with the constant $L_V L_4$, uniformly in $(x, a) \in \mathbb{K}$, which proves Lemma 5.2.
5.2. The Proof of Theorem 3.1
To prove inequality (3.1), we take advantage of the method proposed in [7]; nevertheless, we need to modify this technique: in place of the Lyapunov-like conditions used there, the boundedness of the cost function and the contractive properties of the operators related to the discounted cost optimality equations are exploited. The following are the developments required for the proof of Theorem 3.1 obtained in this article, with a bounded cost function.
Let $f^*$, $\tilde{f}^*$ be the optimal stationary policies for the processes given in Equations (2.3) and (2.6), respectively, and $V_\alpha^*$, $\tilde{V}_\alpha^*$ the corresponding value functions.
Then (see Chapter 8 of [16]) $V_\alpha^*$ and $\tilde{V}_\alpha^*$ satisfy the following optimality equations (even more, $V_\alpha^*$ and $\tilde{V}_\alpha^*$ are the only solutions in $B$ to these equations):
$V_\alpha^*(x) = \min_{a \in A(x)} \left\{ c(x, a) + \alpha E\, V_\alpha^*(F(x, a, \xi)) \right\}$, $x \in X$, (5.7)
$\tilde{V}_\alpha^*(x) = \min_{a \in A(x)} \left\{ c(x, a) + \alpha E\, \tilde{V}_\alpha^*(F(x, a, \tilde{\xi})) \right\}$, $x \in X$. (5.8)
For all $u \in B$ and $(x, a) \in \mathbb{K}$, are defined
$P u(x, a) := E\, u(F(x, a, \xi))$, $\tilde{P} u(x, a) := E\, u(F(x, a, \tilde{\xi}))$.
As it has been proved in [7], the stability index given in Equation (2.8) can be represented as
$\Delta(x) = E_x^{\tilde{f}^*} \sum_{t=0}^{\infty} \alpha^t \left[ H(x_t) - V_\alpha^*(x_t) \right]$, (5.9)
where
$H(y) := c(y, \tilde{f}^*(y)) + \alpha P V_\alpha^*(y, \tilde{f}^*(y))$, $y \in X$, (5.10)
and $\{x_t\}$ is the trajectory of the process given in Equation (2.3) applying the control policy $\tilde{f}^*$.
By the definitions given in Equations (5.10) and (5.8), along with the fact that $\tilde{f}^*$ is optimal for the process given in Equation (2.6), we have that
$\tilde{V}_\alpha^*(y) = c(y, \tilde{f}^*(y)) + \alpha \tilde{P} \tilde{V}_\alpha^*(y, \tilde{f}^*(y))$,
which implies
$H(y) = \tilde{V}_\alpha^*(y) + \alpha \left[ P V_\alpha^*(y, \tilde{f}^*(y)) - \tilde{P} \tilde{V}_\alpha^*(y, \tilde{f}^*(y)) \right]$,
and by the definition of the functions $H$ and $V_\alpha^*$, note that $0 \le H(y) - V_\alpha^*(y)$ (by Equation (5.7)). Then, applying the inequality of the triangle to the representation above,
$H(y) - V_\alpha^*(y) \le \alpha \sup_{(x, a) \in \mathbb{K}} \left| P V_\alpha^*(x, a) - \tilde{P} V_\alpha^*(x, a) \right| + 2 \|V_\alpha^* - \tilde{V}_\alpha^*\|$. (5.11)
Define now the next pseudo-metric:
$D(\mu, \tilde{\mu}) := \sup_{(x, a) \in \mathbb{K}} \left| E\, V_\alpha^*(F(x, a, \xi)) - E\, V_\alpha^*(F(x, a, \tilde{\xi})) \right|$. (5.12)
Then, from Equation (5.12), it is observed that the first summand on the right side of Equation (5.11) is bounded by $\alpha D(\mu, \tilde{\mu})$.
On the other hand, the second term on the right side of Equation (5.11) is
$2 \|V_\alpha^* - \tilde{V}_\alpha^*\| = 2 \sup_{x \in X} |V_\alpha^*(x) - \tilde{V}_\alpha^*(x)|$. (5.13)
As already mentioned, the operators given in Equation (5.1) are contractive in $B$ with modulus $\alpha$. So, Equations (5.7) and (5.8) can be expressed as $V_\alpha^* = T V_\alpha^*$ and $\tilde{V}_\alpha^* = \tilde{T} \tilde{V}_\alpha^*$. Now, given that $V_\alpha^*$ and $\tilde{V}_\alpha^*$ are fixed points of these operators, we get that
$\|V_\alpha^* - \tilde{V}_\alpha^*\| = \|T V_\alpha^* - \tilde{T} \tilde{V}_\alpha^*\|$;
now, applying the inequality of the triangle,
$\|T V_\alpha^* - \tilde{T} \tilde{V}_\alpha^*\| \le \|T V_\alpha^* - \tilde{T} V_\alpha^*\| + \|\tilde{T} V_\alpha^* - \tilde{T} \tilde{V}_\alpha^*\| \le \|T V_\alpha^* - \tilde{T} V_\alpha^*\| + \alpha \|V_\alpha^* - \tilde{V}_\alpha^*\|$.
Using the definition given in Equation (5.12), we obtain that the previous inequality can be expressed as
$\|V_\alpha^* - \tilde{V}_\alpha^*\| \le \alpha D(\mu, \tilde{\mu}) + \alpha \|V_\alpha^* - \tilde{V}_\alpha^*\|$, i.e., $\|V_\alpha^* - \tilde{V}_\alpha^*\| \le \dfrac{\alpha}{1 - \alpha} D(\mu, \tilde{\mu})$.
Substituting this last expression in the equality (5.13), we get that the second term on the right side of the inequality (5.11) is bounded by $\dfrac{2\alpha}{1 - \alpha} D(\mu, \tilde{\mu})$, and so the right side of the inequality (5.11) is bounded by
$H(y) - V_\alpha^*(y) \le \alpha \left( 1 + \dfrac{2}{1 - \alpha} \right) D(\mu, \tilde{\mu}) = \dfrac{\alpha (3 - \alpha)}{1 - \alpha} D(\mu, \tilde{\mu})$. (5.14)
Finally, substituting the inequality (5.14) in Equation (5.9), we obtain
$\Delta(x) \le \sum_{t=0}^{\infty} \alpha^t\, \dfrac{\alpha (3 - \alpha)}{1 - \alpha} D(\mu, \tilde{\mu}) = \dfrac{\alpha (3 - \alpha)}{(1 - \alpha)^2} D(\mu, \tilde{\mu})$. (5.15)
To find a bound for $D(\mu, \tilde{\mu})$, the definition of the Dudley metric ($d_D$) on the space of distributions on $\mathcal{B}(S)$ will be used:
$d_D(\mu, \tilde{\mu}) := \sup \left\{ \left| \int_S w\, d\mu - \int_S w\, d\tilde{\mu} \right| : \|w\|_{BL} \le 1 \right\}$,
where $\|w\|_{BL} := \|w\|_\infty + \|w\|_L$ and $\|w\|_L := \sup_{s \ne s'} \dfrac{|w(s) - w(s')|}{r(s, s')}$ (see [15] for the definition and properties of $d_D$).
By Lemma 5.2, for each $(x, a) \in \mathbb{K}$ the function $w_{x,a}(s) := V_\alpha^*(F(x, a, s))$ satisfies $\|w_{x,a}\|_L \le L_V L_4$, and since $\|w_{x,a}\|_\infty \le b / (1 - \alpha)$, the pseudo-metric (5.12) can be bounded in terms of Dudley's metric by the following expression:
$D(\mu, \tilde{\mu}) \le \left( \dfrac{b}{1 - \alpha} + L_V L_4 \right) d_D(\mu, \tilde{\mu})$.
Now, using the well-known relationship $d_D \le 2 \pi_{LP}$ between the Dudley metric and the Lévy-Prokhorov metric (see [18]) and after some direct calculations, the desired inequality (3.1) is obtained with the constant given in Equation (3.2).
6. Conclusions
Despite the vast literature that exists on the subject of Markov controllable processes, few studies have been carried out on the subject of stability estimation. The study of stability for Markov control processes represents a challenge both from a theoretical and from a practical point of view, and proposing appropriate probabilistic metrics to achieve the so-called stability inequalities is an additional effort.
In this article, conditions were found to obtain the stability of a Markov control process under the optimization criterion of the expected total $\alpha$-discounted cost, with a bounded cost function, using the Lévy-Prokhorov metric.
The importance of being able to use the Lévy-Prokhorov metric lies in the fact that, for application problems, it allows estimations of the stability index under the use of empirical distributions for the random elements, since under this metric the empirical distributions converge weakly to the distributions they are estimating (unlike what happens with the so-called "strong metrics").
On the other hand, since in applications there is no company that can bear unlimited (unbounded) costs, the results found in this work, using simple techniques such as contractive operators, provide an estimate of the increase in cost (the stability index) incurred by controlling the "original process" with the optimal policy of the "approximate process". Of course, the stability constant ($K$) affects this stability index; specifically, in this work it was found that this constant is of order $O\big((1 - \alpha)^{-3}\big)$ if $\alpha \to 1$. There are arguments to support the hypothesis that the left part of the inequality (3.1) (for each fixed initial state $x$), that is, $\Delta(x)$, grows when $\alpha \to 1$ and the distributions of $\xi_t$ and $\tilde{\xi}_t$ are fixed. It is not clear what the rate of such growth is. Therefore, it is proposed that future research, based on particular (and simple) control processes, verify the growth rate of $\Delta(x)$ using computational experiments and process simulation.
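A computational experiment of the kind proposed above can be organized, for a finite model, as follows. The sketch is our own illustration (the kernels, the cost and the value-iteration tolerance are arbitrary choices): it computes the stability index (2.8) for a given pair of kernels, so that its growth can be observed as $\alpha \to 1$:

```python
import numpy as np

def stability_index(P, P_tilde, c, alpha, tol=1e-12):
    """Stability index (2.8) for a finite MCP: Delta(x) = V(x, f~*) - V(x, f*),
    where f~* is optimal for the approximate kernel P_tilde and both costs
    are evaluated on the original kernel P."""
    n, m = c.shape

    def solve(kernel):
        # value iteration for the alpha-discounted optimality equation
        V = np.zeros(n)
        while True:
            Q = c + alpha * np.stack([kernel[a] @ V for a in range(m)], axis=1)
            V_new = Q.min(axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new, Q.argmin(axis=1)
            V = V_new

    V_star, _ = solve(P)                     # V_alpha(x, f*) on the original model
    _, f_tilde = solve(P_tilde)              # approximate-optimal policy f~*
    # cost of applying f~* to the original process: solve V = c_f + alpha P_f V
    P_f = np.stack([P[f_tilde[x]][x] for x in range(n)])
    c_f = c[np.arange(n), f_tilde]
    V_f = np.linalg.solve(np.eye(n) - alpha * P_f, c_f)
    return V_f - V_star                      # Delta(x), one value per state
```

Evaluating this function over a grid of discount factors $\alpha \uparrow 1$, with the kernels $P$ and $\tilde{P}$ fixed, gives an empirical picture of the growth rate of $\Delta(x)$ suggested above.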
Acknowledgements
The author is particularly grateful to Professor Edgar Vladyvosky M.S. for his instructive discussions on generalizations of Markov processes and on the properties of the Lévy-Prokhorov metric.