Modeling and Design of Real-Time Pricing Systems Based on Markov Decision Processes

A real-time pricing system of electricity is a system that charges different electricity prices for different hours of the day and for different days, and is effective for reducing the peak and flattening the load curve. In this paper, using a Markov decision process (MDP), we propose a modeling method and an optimal control method for real-time pricing systems. First, the outline of real-time pricing systems is explained. Next, a model of a set of customers is derived as a multi-agent MDP. Furthermore, the optimal control problem is formulated, and is reduced to a quadratic programming problem. Finally, a numerical simulation is presented.


Introduction
In recent years, there has been growing interest in energy and the environment. For problems on energy and the environment such as energy saving, several approaches have been studied (see, e.g., [1], [2]). In this paper, we focus on real-time pricing systems of electricity. A real-time pricing system of electricity is a system that charges different electricity prices for different hours of the day and for different days, and is effective for reducing the peak and flattening the load curve (see, e.g., [3]-[6]). In general, a real-time pricing system consists of one controller deciding the price at each time and multiple electric customers such as commercial facilities and homes. If electricity conservation is needed, then the price is set to a high value. Since the economic load becomes high, customers conserve electricity. Thus, electricity conservation is achieved. In the existing methods, the price at each time is given by a simple function of quantities such as power consumption and voltage deviation (see, e.g., [6]). In order to realize more precise pricing, it is necessary to use a mathematical model of customers.
In this paper, using a Markov decision process (MDP), we propose a mathematical model of real-time pricing systems. Since in many cases the status of electricity conservation of customers is discrete and stochastic, it is appropriate to use an MDP. Then, a set of electricity customers is modeled by a multi-agent MDP. Furthermore, we consider the finite-time optimal control problem. By appropriately setting the cost function, customers can be induced to conserve electricity actively. This problem can be used in model predictive control, which is a control method in which the finite-time optimal control problem is solved at each time. In addition, the finite-time optimal control problem can be reduced to a quadratic programming problem. The proposed approach provides a basis for real-time pricing systems.
This paper is organized as follows. In Section 2, the outline of real-time pricing systems is explained. In Section 3, a model of electricity customers is derived. In Section 4, the optimal control problem is formulated, and its solution method is derived. In Section 5, a numerical simulation is shown. In Section 6, we conclude this paper.
Notation: Let $\mathcal{R}$ denote the set of real numbers. Let $I_n$, $0_{m \times n}$ denote the $n \times n$ identity matrix and the $m \times n$ zero matrix, respectively. For simplicity, we sometimes use the symbol $0$ instead of $0_{m \times n}$, and the symbol $I$ instead of $I_n$. For two events $A$, $B$, let $E[A \mid B]$ denote the conditional expected value of $A$ under the event $B$.

Outline of Real-Time Pricing Systems
In this section, we explain the outline of real-time pricing systems studied in this paper.
Figure 1 shows an illustration of real-time pricing systems studied in this paper. This system consists of one controller and multiple electric customers such as commercial facilities and homes. We suppose that each customer can monitor the status of electricity conservation of other customers. In other words, the status of some customer affects that of other customers. For example, in commercial facilities, we suppose that the status of rival commercial facilities can be checked through lighting, blogs, Twitter, and so on. Depending on power consumption, i.e., the status of electricity conservation, the controller determines the price at each time. If electricity conservation is needed, then the price is set to a high value. Since the economic load becomes high, customers conserve electricity. Thus, electricity conservation is achieved.
In this paper, the status of electricity conservation of each customer is modeled by a Markov decision process (MDP). Then a set of customers is modeled by a multi-agent MDP (MA-MDP). Furthermore, by using the obtained MA-MDP model, we consider the optimal control problem and its solution method.

Model of Customers
First, consider modeling the dynamics of each customer by a one-dimensional MDP. The value of the state $x$ is randomly chosen among the finite set $\{0, 1, \ldots, n-1\}$. The state expresses the status of electricity conservation: " $0$ " implies the status that a customer conserves electricity maximally, and " $n-1$ " implies the status that a customer does not conserve electricity. Then the MDP of a customer is given by
$$\pi(t+1) = P(u(t))\,\pi(t), \tag{1}$$
where $u(t) \in \mathcal{R}$ is the control input, and corresponds to the price. The vector $\pi(t) = [\,\pi^0(t)\ \ \pi^1(t)\ \ \cdots\ \ \pi^{n-1}(t)\,]^T$ is the probability distribution of the state, where $\pi^i(t)$ is the probability that the state is $i$ at time $t$. In addition, the initial probability distribution must satisfy the following condition:
$$\sum_{i=0}^{n-1} \pi^i(0) = 1.$$
The transition probability matrix $P(u(t))$ [...]. Let $p_{jk}(u(t))$ denote the $(j,k)$-th element of $P(u(t))$. The control input is determined under the condition for each element,
$$0 \le p_{jk}(u(t)) \le 1,$$
and the condition for each column,
$$\sum_{j} p_{jk}(u(t)) = 1.$$
Next, consider modeling the dynamics of a set of customers by an MA-MDP. The number of customers is given by $q$. For the customer $i$, the state is given by $x_i$, and from (1), the MDP model is given by
$$\pi_i(t+1) = P_i(u_i(t))\,\pi_i(t).$$
Then, we suppose that the MA-MDP model expressing the dynamics of a set of customers is given by
[...] (4)
where $\lambda_{ij}$ expresses the effect of couplings between customers, and is a constant satisfying the following condition:
$$\sum_{j=1}^{q} \lambda_{ij} = 1, \quad i = 1, 2, \ldots, q. \tag{5}$$
For simplicity of discussion, coupling terms are given by $\lambda_{ij} I$, but may be replaced with matrices satisfying some condition corresponding to (5).
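As a minimal sketch of how such a coupled model can be simulated, the following Python code propagates the probability distributions of two customers. Two things are assumptions made here for illustration and are not taken from the text: the transition matrix is taken to be affine in the price, $P(u) = A + u\,b\,\mathbf{1}^T$, and the coupling is taken to be $\pi_i(t+1) = \sum_j \lambda_{ij} P(u_j(t))\,\pi_j(t)$. The matrices `A`, `b`, and `lam` are hypothetical data.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product m v."""
    return [sum(m_rk * v_k for m_rk, v_k in zip(row, v)) for row in m]


def transition_matrix(a: Matrix, b: Vector, u: float) -> Matrix:
    """Assumed price-dependent matrix P(u) = A + u * b * 1^T (not from the paper)."""
    n = len(a)
    return [[a[r][c] + u * b[r] for c in range(n)] for r in range(n)]


def is_column_stochastic(p: Matrix, tol: float = 1e-9) -> bool:
    """Check the element condition (0 <= p_jk <= 1) and the column condition (sum = 1)."""
    n = len(p)
    elements_ok = all(-tol <= p[r][c] <= 1.0 + tol for r in range(n) for c in range(n))
    columns_ok = all(abs(sum(p[r][c] for r in range(n)) - 1.0) <= tol for c in range(n))
    return elements_ok and columns_ok


def ma_mdp_step(pis: List[Vector], us: List[float], lam: Matrix,
                a: Matrix, b: Vector) -> List[Vector]:
    """One step of the assumed update pi_i(t+1) = sum_j lam[i][j] * P(u_j) pi_j(t)."""
    local = [mat_vec(transition_matrix(a, b, u), pi) for pi, u in zip(pis, us)]
    q, n = len(pis), len(a)
    return [[sum(lam[i][j] * local[j][k] for j in range(q)) for k in range(n)]
            for i in range(q)]


# Hypothetical data: n = 3 states, q = 2 customers.
A = [[0.5, 0.2, 0.1],
     [0.3, 0.5, 0.3],
     [0.2, 0.3, 0.6]]
b = [0.2, 0.0, -0.2]   # entries sum to zero, so the columns of P(u) still sum to one
lam = [[0.8, 0.2],
       [0.3, 0.7]]     # each row sums to one, as condition (5) requires

pis = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]   # both customers start in a known state
for _ in range(10):
    pis = ma_mdp_step(pis, [0.5, 0.5], lam, A, b)
```

Under these assumptions, each `pis[i]` remains a probability distribution, because every $P(u_j)$ is column stochastic and every row of `lam` sums to one. A price that violates the element condition, such as `u = 3.0` for this data, is rejected by `is_column_stochastic`.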

Problem Formulation
Consider the following problem.
Problem 1. Suppose that for the MA-MDP model (4) expressing the dynamics of customers, the initial state $x(0)$, the desired state $x_d$, and the prediction horizon $N$ are given. Then, find a control input sequence $u(0), u(1), \ldots, u(N-1)$ minimizing the cost function
[...] (6)
subject to the following constraint:
[...] (7)
where $f(\cdot)$ is a given linear function, $M$ is a given vector, and $Q_i$, $R_i$ are given weights.
Hereafter, for simplicity of notation, the condition in the cost function (6) is omitted.
By using the constraint (7), an input constraint such as a bound on $u(t)$ can be imposed. In addition, by adjusting $Q_i$, $R_i$, several specifications can be considered, e.g., that the state $x(t)$ must converge to the neighborhood of the desired state $x_d$.
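To illustrate the role of the weights, the following Python sketch evaluates a one-step cost from the probability distributions. The form of the stage cost used here, $\sum_i \{ Q_i E[(x_i - x_d)^2] + R_i u_i^2 \}$, is an assumption made for illustration and is not the cost function (6) itself. The function names are hypothetical.

```python
from typing import List


def expected_sq_deviation(pi: List[float], x_d: float) -> float:
    """E[(x - x_d)^2] for a state x in {0, ..., n-1} distributed according to pi."""
    return sum(p * (j - x_d) ** 2 for j, p in enumerate(pi))


def stage_cost(pis: List[List[float]], us: List[float],
               q_w: List[float], r_w: List[float], x_d: float) -> float:
    """Assumed stage cost: sum_i Q_i * E[(x_i - x_d)^2] + R_i * u_i^2."""
    return sum(q * expected_sq_deviation(pi, x_d) + r * u ** 2
               for pi, u, q, r in zip(pis, us, q_w, r_w))


# Hypothetical distributions over the states {0, 1, 2, 3}.
pi_a = [0.1, 0.2, 0.3, 0.4]
pi_b = [0.7, 0.1, 0.1, 0.1]
mix = [0.5 * a + 0.5 * b for a, b in zip(pi_a, pi_b)]

cost_a = expected_sq_deviation(pi_a, 0.0)
cost_b = expected_sq_deviation(pi_b, 0.0)
cost_mix = expected_sq_deviation(mix, 0.0)
```

The expected squared deviation is linear in the distribution, so `cost_mix` equals the average of `cost_a` and `cost_b`. In this sketch, the state term of the cost is therefore linear in the distribution, and only the input term $R_i u_i^2$ is quadratic.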

Solution Method
We derive a solution method for Problem 1. First, consider the MDP model (1). The MDP model is a class of nonlinear systems. However, in this case, it can be transformed into a linear system. The MDP model (1) can be rewritten as
[...]
By the property of the probability distribution, the relation
$$\sum_{i=0}^{n-1} \pi^i(t) = 1$$
holds. From this fact, the MDP model (1) can be equivalently transformed into the following linear system:
[...] (8)
where [...]. Next, by using the linear system (8), consider representing the MA-MDP model (4) as a linear system. The linear system for the customer $i$ is denoted by
[...]
Then, the MA-MDP model (4) can be equivalently transformed into the following linear system:
[...] (9)
where [...].
Therefore, the cost function (6) can be rewritten as
[...] (10)
From the above discussion, Problem 1 is equivalent to the following problem.
Problem 2. [...]
Problem 2 is reduced to a quadratic programming (QP) problem, and can be solved by a suitable solver such as MATLAB or IBM ILOG CPLEX. In addition, if $R_i = 0$, then Problem 2 is reduced to a linear programming (LP) problem (we remark that $Q_i x_d^2$ in the cost function (10) is a constant). See [7] for further details.
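The transformation into a linear system can be checked numerically with a short Python sketch. It assumes, for illustration only, that the transition matrix has the affine form $P(u) = A + u\,b\,\mathbf{1}^T$. This form is not taken from the text. Under this assumption, the bilinear update $P(u)\pi$ coincides with the linear update $A\pi + b u$ whenever the elements of $\pi$ sum to one. The data `A`, `b`, and `pi0` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mdp_step(a: Matrix, b: Vector, pi: Vector, u: float) -> Vector:
    """Bilinear form: pi(t+1) = (A + u * b * 1^T) pi(t)."""
    n = len(pi)
    return [sum((a[r][c] + u * b[r]) * pi[c] for c in range(n)) for r in range(n)]


def linear_step(a: Matrix, b: Vector, pi: Vector, u: float) -> Vector:
    """Linear form: pi(t+1) = A pi(t) + b u(t)."""
    n = len(pi)
    return [sum(a[r][c] * pi[c] for c in range(n)) + b[r] * u for r in range(n)]


def max_gap(x: Vector, y: Vector) -> float:
    """Largest elementwise difference between two vectors."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))


# Hypothetical data with n = 3 states.
A = [[0.5, 0.2, 0.1],
     [0.3, 0.5, 0.3],
     [0.2, 0.3, 0.6]]
b = [0.2, 0.0, -0.2]

pi0 = [0.2, 0.3, 0.5]         # a probability distribution (elements sum to one)
not_a_dist = [0.2, 0.3, 0.3]  # elements sum to 0.8

gap_dist = max_gap(mdp_step(A, b, pi0, 0.5), linear_step(A, b, pi0, 0.5))
gap_other = max_gap(mdp_step(A, b, not_a_dist, 0.5), linear_step(A, b, not_a_dist, 0.5))
```

Here `gap_dist` vanishes up to rounding, while `gap_other` does not, which shows that the equivalence relies on the property that the probabilities sum to one.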

Numerical Example
Since it is difficult to use data in real systems, we present an artificial example. The state is chosen among the finite set $\{0, 1, 2, 3\}$. The number of consumers is given by $q = 5$. The coefficient matrices $A$, $B$ in the linear system for the consumer $i$ are given by [...]. The parameters $\lambda_{ij}$ in (9) are given by
$$[\lambda_{ij}] = \begin{bmatrix} 0.7 & 0.3 & 0 & 0 & 0 \\ 0.2 & 0.6 & 0.2 & 0 & 0 \\ 0 & 0.1 & 0.8 & 0.1 & 0 \\ 0 & 0 & 0.2 & 0.7 & 0.1 \\ 0 & 0 & 0 & 0.1 & 0.9 \end{bmatrix}.$$
The parameters $N$, $x_d$, $Q_i$, and $R_i$ are given by $N = 10$, [...]. Since $R_i = 0$, Problem 2 is reduced to an LP problem. The initial state is given by [...]. In addition, the input constraint [...] is imposed.
In this numerical example, we consider the following two cases:
• Case (i): The price for each customer is the same (i.e., $u_1(t) = u_2(t) = \cdots = u_5(t)$).
• Case (ii): The price for each customer is different.
Case (i) is the conventional case in real-time pricing systems. In Case (ii), we suppose that the difference in the price is covered by using local currencies such as the Eco-point system [8]. The Eco-money system [9] in Japan was introduced to stimulate the economy and raise awareness of global warming. In the Eco-point system, many points, which correspond to money in a local currency, are given for the products that are effective from the viewpoints of electricity conservation and the environment. Such a system for energy management systems has been discussed in [10].
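The coupling parameters $\lambda_{ij}$ of this numerical example can be checked against condition (5) with a few lines of Python. The sketch below verifies that each row sums to one and lists, for a given customer $i$, the other customers $j$ with $\lambda_{ij} > 0$. The function names are hypothetical, and reading a nonzero $\lambda_{ij}$ as "customer $j$ is coupled to customer $i$" is an interpretation made here.

```python
from typing import List

# Coupling parameters lambda_ij of the numerical example (row i, column j).
LAMBDA: List[List[float]] = [
    [0.7, 0.3, 0.0, 0.0, 0.0],
    [0.2, 0.6, 0.2, 0.0, 0.0],
    [0.0, 0.1, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.2, 0.7, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.9],
]


def rows_sum_to_one(lam: List[List[float]], tol: float = 1e-9) -> bool:
    """Condition (5): sum_j lambda_ij = 1 for every customer i."""
    return all(abs(sum(row) - 1.0) <= tol for row in lam)


def coupled_customers(lam: List[List[float]], i: int) -> List[int]:
    """1-based indices j != i with lambda_ij > 0."""
    return [j + 1 for j, value in enumerate(lam[i - 1]) if value > 0.0 and j != i - 1]


condition_5_holds = rows_sum_to_one(LAMBDA)
neighbors_of_3 = coupled_customers(LAMBDA, 3)
```

The matrix is tridiagonal, so in this example each customer is coupled only with the customers of adjacent index.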
Next, we present the computational results. First, the computational result in Case (i) is explained. Figures 2-6 show the probability distribution for each customer. From these figures, we see that [...] decreases. Thus, the state converges to $0$, which corresponds to the status that a customer conserves electricity maximally, with a certain probability. Furthermore, the optimal value of the cost function is 97.5902, and the optimal control input is derived as [...].
Next, the computational result in Case (ii) is explained. Figures 7-11 show the probability distribution for each customer. Comparing Figures 2-6 with Figures 7-11, we see that transient responses of $\pi_i^0(t)$ are [...] (see Figure 6 and Figure 11). Furthermore, the optimal value of the cost function is 84.5057, and we see that the optimal value of the cost function is improved. The optimal control input $u_1(t), u_2(t), \ldots, u_5(t)$ is derived as [...]. From these values, we see that in the steady state, $u_5(t)$ is widely different from $u_i(t)$, $i = 1, 2, 3, 4$. Thus, in the system considered here, it is appropriate to use a local currency.

Conclusions
In this paper, we have proposed a modeling method and an optimal control method of real-time pricing systems using the MDP-based approach. In many cases, the status of electricity conservation of customers is discrete and stochastic, and the use of the MDP model is effective. A real-time pricing system is modeled by multi-agent MDPs, and the optimal control problem is reduced to a QP problem. Furthermore, a numerical simulation has been shown. The proposed method provides a new method for real-time pricing of electricity.
There are several open problems. First, it is important to develop an identification method for the MA-MDP model based on the existing results for MDPs (see, e.g., [11]). Since the effect of couplings between customers was simplified, it is also important to consider modeling it more precisely. Next, the optimal control problem is reduced to a QP problem or an LP problem. These problems can be solved faster than a combinatorial optimization problem such as a mixed integer programming problem. However, for large-scale systems, the computation time for solving the optimal control problem will be long. Thus, it is important to develop a distributed algorithm.

Figure 1. Illustration of real-time pricing systems.
[...] reduced to a quadratic programming (QP) problem, and can be solved by a suitable solver such as MATLAB or IBM ILOG CPLEX. In addition, if $R_i = 0$, then Problem 2 is reduced to a linear programming (LP) problem (we remark that $Q_i x_d^2$ in the cost function (10) is a constant).
0.2 0.2 0.2 0.2 0 0.2 0.3 0.2 0.3 0.4 0.1 0.1 0.1 0.3 0.1 0.1 0.2 0.2 0 0.1 0.1 0.1 0.4

Figures 2-6 show the probability distribution for each customer. From these figures, we see that [...]

Figure 5. $\pi_4(t)$ in Case (i).
Next, the computational result in Case (ii) is explained. Figures 7-11 show the probability distribution for each customer. Comparing Figures 2-6 with Figures 7-11, we see that transient responses of $\pi_i^0(t)$ are [...]