A Fast Algorithm for Large-Scale MDP-Based Systems in Smart Grid

In this study, we investigate fast algorithms for the Large-Scale Markov Decision Process (LSMDP) problem in the smart grid. The Markov decision process (MDP) is an efficient mathematical tool for solving control and optimization problems in wireless smart grid systems. However, its computational complexity and memory requirements increase exponentially as the number of system states grows. Moreover, the limited computational ability and small on-board memory of wireless smart grid devices constrain their applications. As a result, it is impractical to implement LSMDP-based approaches in such systems. We therefore propose a fast algorithm with low computational overhead and good performance. We first derive the factored MDP representation, which represents the LSMDP in a compact way. Based on the factored MDP, we propose a fast algorithm that considerably reduces the size of the state space while retaining performance close to that of the optimal solution.


INTRODUCTION
Smart grid (Li et al., 2010; Moghe et al., 2012; Moslehi et al., 2010) is a promising technology for improving the efficiency and reliability of the generation, transmission and distribution of electricity. Compared to the traditional power grid, it introduces a two-way communication network to assist in controlling and monitoring the flow of electric energy. This information exchange enables reliable and efficient control of suppliers and consumers in the power grid. Because of the time-varying nature of wireless channels and the complicated actions in the smart grid, the Markov decision process (MDP) is a natural mathematical tool for modeling and solving the related problems. In this study, we focus on MDP-based control and optimization problems in wireless smart grid systems.
In what follows, we survey recent work on MDP-based approaches in wireless communications. Fu and Schaar (2010) propose a multi-user Markov decision process that accounts for video traffic characteristics and time-varying network conditions, together with a decomposition method that enables each wireless user to solve its own local MDP problem individually. Niyato et al. (2009) study QoS provisioning for multiple traffic types in MIMO communications. They model the minimization of the weighted packet dropping probability as a constrained Markov decision process, from which the optimal antenna assignment and admission control are derived. Transmission power and modulation order adaptation strategies are investigated in Karmokar et al. (2008), where the wireless channel is modeled as a first-order Markov chain and the optimal transmission power and rate policies are derived from a semi-Markov decision process. Phan et al. (2010) present an energy-efficient transmission policy for wireless sensor networks with a strict energy capacity constraint; the optimal transmission threshold is derived from a Markov chain model. The conditions for performing vertical handoff in 4th-generation wireless communications are studied in Lin and Wong (2008), where maximizing the expected total reward of a connection is formulated as an MDP.
However, the complexity of the above MDP-based approaches increases exponentially as the number of system states grows, so the problems cannot be solved efficiently in practice. In the smart grid, wireless systems not only carry out transmission tasks but also implement control actions. Furthermore, these systems normally have limited computational ability and small memory. It is therefore difficult to obtain control policies from large-scale MDP models on such devices, and fast, efficient algorithms tailored to practical wireless smart grid systems are needed. To this end, our main contributions are as follows. First, we propose a framework for the wireless smart grid system together with a factored MDP that represents the large-scale MDP in a compact way. Second, we derive a fast algorithm based on the factored model that retains good performance compared to the optimal solution.
System model: We consider a wireless transmission system with energy harvesting devices in the smart grid. The smart meter is one of the key components of the smart grid; it is used to control and monitor the grid and to transmit data. In general, smart meters have limited battery capacity, and replacing their batteries is inconvenient and costly once they are deployed at substations. We therefore equip the smart meters with energy harvesting devices, which harvest environmental energy and provide a sustainable energy supply. In addition, smart meters perform real-time monitoring, which imposes strict data delay requirements. The smart meters should therefore be able to control their energy consumption efficiently while meeting the delay constraint. The basic system model is shown in Fig. 1.
We assume that time is divided into slots and that all system parameters are discretized and remain constant within each time slot. Let $Q_t$ denote the data queue state in time slot $t$, where $0 \le Q_t \le Q_c$ and $Q_t \in \mathcal{Q}$. Let $Y_t$ denote the available energy level, where $0 \le Y_t \le Y_c$ and $Y_t \in \mathcal{Y}$. Let $g_t$ and $E_t$ denote the channel state and the energy harvesting rate in time slot $t$, where $g_t \in \mathcal{G}$ and $E_t \in \mathcal{E}$. Let $S_t = \{Q_t, Y_t, g_t, E_t\}$ denote the system state in time slot $t$, where $S_t \in \mathcal{S}$. Let $R_c$ denote the required transmission rate and let … denote the noise power. Let $\Pi$ denote the control policy. The energy control problem is formulated as follows:
$$\min_{\Pi} \; \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(S_t, a_t)\right], \qquad (1)$$
where the expectation is over the system state, $\gamma \in (0, 1)$ is the discount factor, $r(S_t, a_t)$ is the per-slot cost, and $a_t \in \mathcal{A}$ denotes the action in time slot $t$, with $\mathcal{A}$ the action set. Problem (1) is an infinite-horizon discounted MDP problem, which can be solved by value iteration (Puterman, 1994). However, the large number of system states makes it impossible for smart meters, with their limited computational ability and small memory, to solve it. In the next sections, we propose the factored MDP, which represents problem (1) in a compact way.
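When the model is small enough to enumerate, a discounted-cost MDP such as (1) can be solved exactly by value iteration. The sketch below is a minimal, generic version, assuming the transition probabilities are stored as a dense array `P[a, s, s']` and the per-slot cost as `cost[s, a]`. The smart meter cost function and action set are not reproduced here, so the random toy MDP is purely illustrative, and `value_iteration` is a hypothetical helper name. The dense array alone needs $|\mathcal{S}|^2$ entries per action, which is the memory burden the factored representation is meant to avoid.

```python
import numpy as np


def value_iteration(P, cost, gamma, tol=1e-8, max_iter=10_000):
    """Exact value iteration for a discounted-cost MDP (minimisation).

    P:     array (A, S, S), P[a, s, s2] = Pr(S_{t+1} = s2 | S_t = s, a_t = a)
    cost:  array (S, A), per-slot cost r(s, a)
    gamma: discount factor in (0, 1)
    Returns the value function V (S,) and a greedy policy (S,) of action indices.
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = r(s, a) + gamma * sum_{s2} P[a, s, s2] * V[s2]
        Q = cost + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.min(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value function
    Q = cost + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmin(axis=1)


# Toy usage on a random 5-state, 2-action MDP (not the smart meter model).
rng = np.random.default_rng(0)
P = rng.random((2, 5, 5))
P /= P.sum(axis=2, keepdims=True)  # make each row a probability distribution
cost = rng.random((5, 2))
V, policy = value_iteration(P, cost, gamma=0.9)
```

Each sweep touches every (state, action, next state) triple, so the cost per iteration grows with $|\mathcal{S}|^2 |\mathcal{A}|$.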
Factored MDP: The factored MDP is a compact representation of the original MDP problem. It is neither a new formulation nor a solution of the problem; however, it has properties that help solve large-scale MDP problems efficiently. It is based on the dynamic Bayesian network (DBN) and reduces the size of the original MDP model in two ways. First, it describes the system state by a set of state variables instead of enumerating every joint state, so the state space is represented by these variables. Second, it adopts a dynamic Bayesian network to characterize the transition probabilities of the system states. A dynamic Bayesian network is a directed acyclic graphical model over a time sequence. The transition matrix is generally sparse; that is, each state variable in the next time slot depends on only part of the system state in the current time slot. The dynamic Bayesian network therefore represents the dependence between adjacent time slots compactly.
Let $X'_1, X'_2, X'_3$ and $X'_4$ denote the state variables in the next time slot corresponding to $Q_t, Y_t, g_t$ and $E_t$, respectively, taking values in $\mathcal{Q}, \mathcal{Y}, \mathcal{G}$ and $\mathcal{E}$, and let $X_1, X_2, X_3, X_4$ denote the same variables in the current time slot. Let $\{\mathrm{Parent}(X'_1), \mathrm{Parent}(X'_2), \mathrm{Parent}(X'_3), \mathrm{Parent}(X'_4)\}$ denote the parents of $\{X'_1, X'_2, X'_3, X'_4\}$. The transition graph of the dynamic Bayesian network is a two-layer directed acyclic graph, as shown in Fig. 2.
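To illustrate the memory saving of the DBN, the sketch below compares one flat transition matrix with the per-node conditional probability tables $P(X'_i \mid \mathrm{Parent}(X'_i))$, whose product gives the joint next-slot probability. The domain sizes are taken from the simulation settings. Only $\mathrm{Parent}(X'_1) = \{X_1, X_2\}$ is stated in this section; the other three parent sets are assumptions chosen for illustration, and all function names are hypothetical. Counts are per action, since the tables are action-dependent.

```python
from itertools import product
from math import prod

# Domain sizes of X_1..X_4 (queue, energy, channel, harvesting), taken from
# the simulation settings; variables are indexed 0..3 here.
DOMAIN_SIZES = [3, 80, 4, 3]

# Parent sets of X'_1..X'_4 in the current slot. Only Parent(X'_1) = {X_1, X_2}
# comes from the text; the other three are illustrative assumptions.
PARENTS = {0: [0, 1], 1: [1, 3], 2: [2], 3: [3]}


def flat_table_size(domain_sizes):
    """Entries of one flat |S| x |S| transition matrix (for a single action)."""
    num_states = prod(domain_sizes)
    return num_states * num_states


def factored_table_size(domain_sizes, parents):
    """Entries of all tables P(X'_i | Parent(X'_i)) (for a single action)."""
    return sum(domain_sizes[i] * prod(domain_sizes[p] for p in ps)
               for i, ps in parents.items())


def uniform_cpts(domain_sizes, parents):
    """Placeholder tables: X'_i uniform for every joint value of its parents."""
    cpts = {}
    for i, ps in parents.items():
        n = domain_sizes[i]
        parent_values = product(*(range(domain_sizes[p]) for p in ps))
        cpts[i] = {vals: [1.0 / n] * n for vals in parent_values}
    return cpts


def transition_prob(x, x_next, cpts, parents):
    """P(x' | x) as the product of P(x'_i | values of Parent(X'_i) in x)."""
    p = 1.0
    for i, ps in parents.items():
        parent_vals = tuple(x[j] for j in ps)
        p *= cpts[i][parent_vals][x_next[i]]
    return p
```

For these assumed parent sets, one flat matrix has 8,294,400 entries per action, whereas the four conditional tables together have 19,945.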
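The system state $S_t = \{Q_t, Y_t, g_t, E_t\}$ is thus replaced by the four variables $\{X'_1, X'_2, X'_3, X'_4\}$. In Fig. 2, we have $\mathrm{Parent}(X'_1) = \{X_1, X_2\}$, because the data queue dynamics depend on the previous queue state and the previous available energy level. Each node $X'_i$ is associated with a conditional probability distribution $P(X'_i \mid \mathrm{Parent}(X'_i))$, so that, for each action $a$, the transition probability factorizes as
$$P(x' \mid x, a) = \prod_{i=1}^{4} P\big(x'_i \mid \mathrm{Parent}(X'_i), a\big).$$
In Fig. 2, $h_i$ denotes a basis function of the value function. Let $V(s)$ denote the value function of the system state $s \in \mathcal{S}$. For the optimal policy $\pi^*$, we have
$$V^{*}(s) = \min_{a \in \mathcal{A}} \Big[ r(s, a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V^{*}(s') \Big], \quad s \in \mathcal{S}. \qquad (6)$$
Let $V_s$ denote $V(s)$ in (6). The MDP problem can then be converted into the following linear program:
$$\max_{\{V_s\}} \sum_{s \in \mathcal{S}} \alpha(s)\, V_s \quad \text{s.t.} \quad V_s \le r(s, a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V_{s'}, \;\; \forall s \in \mathcal{S},\ a \in \mathcal{A}, \qquad (7)$$
where $\alpha(s) > 0$ denotes the state relevance weight.
First, we factor the value function and simplify the objective function. Assume that the value function can be approximated within the subspace spanned by the basis functions $\{h_1(x), h_2(x), \dots, h_K(x)\}$:
$$V(x) \approx \sum_{i=1}^{K} w_i\, h_i(x). \qquad (8)$$
Substituting (8) into problem (7), the objective becomes
$$\sum_{x} \alpha(x) \sum_{i=1}^{K} w_i\, h_i(x) = \sum_{i=1}^{K} \alpha_i w_i, \qquad \alpha_i = \sum_{x} \alpha(x)\, h_i(x),$$
and the right-hand side of each constraint becomes $r(x, a) + \gamma \sum_{i=1}^{K} w_i\, g_i(x, a)$ with $g_i(x, a) = \sum_{x'} P(x' \mid x, a)\, h_i(x')$. When each $h_i$ depends on only a few state variables, $\alpha_i$ and $g_i$ can be computed over those variables alone using the factored transition model. We therefore have
$$\max_{w} \sum_{i=1}^{K} \alpha_i w_i \quad \text{s.t.} \quad \sum_{i=1}^{K} w_i\, h_i(x) \le r(x, a) + \gamma \sum_{i=1}^{K} w_i\, g_i(x, a), \;\; \forall x, a. \qquad (12)$$
Problem (12) has only $K$ variables, compared with $|\mathcal{Q}| \times |\mathcal{Y}| \times |\mathcal{G}| \times |\mathcal{E}|$ variables in problem (7). However, the number of constraints in (12) is still very large. In the next section, we propose a fast algorithm with much lower complexity.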
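A minimal sketch of the approximate linear program with basis weights $w$ and positive state-relevance weights $\alpha(s)$, using SciPy's `linprog` with the HiGHS solver. It assumes the cost-minimisation convention, in which the program maximises the weighted value subject to Bellman inequality constraints for every state and action. For clarity it builds a dense basis matrix `Phi` and dense transition arrays, so it does not exploit the factored structure; a factored solver would instead evaluate each basis function's contribution over only the variables it depends on. The function name `approximate_lp` and the toy model are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def approximate_lp(P, cost, gamma, Phi, alpha):
    """Approximate LP for a discounted-cost MDP with V ~= Phi @ w.

    Solves   max_w  sum_s alpha(s) (Phi w)(s)
    s.t.     (Phi w)(s) <= r(s, a) + gamma * sum_{s2} P(s2 | s, a) (Phi w)(s2)
    for every state s and action a.

    P:     (A, S, S) transition probabilities
    cost:  (S, A) per-slot cost r(s, a)
    Phi:   (S, K) basis matrix, column i holds h_i(s)
    alpha: (S,) positive state-relevance weights
    Returns the K basis weights w.
    """
    num_actions = P.shape[0]
    # Rearranged constraints, one block of |S| rows per action:
    # (Phi - gamma * P_a @ Phi) w <= r(., a)
    A_ub = np.vstack([Phi - gamma * P[a] @ Phi for a in range(num_actions)])
    b_ub = np.concatenate([cost[:, a] for a in range(num_actions)])
    # linprog minimises, so negate the objective; weights are free in sign.
    result = linprog(-(alpha @ Phi), A_ub=A_ub, b_ub=b_ub,
                     bounds=(None, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


# Toy usage: 6 states, 2 actions, K = 2 basis functions (constant and linear).
rng = np.random.default_rng(1)
S = 6
P = rng.random((2, S, S))
P /= P.sum(axis=2, keepdims=True)
cost = rng.random((S, 2))
Phi = np.column_stack([np.ones(S), np.arange(S, dtype=float)])
w = approximate_lp(P, cost, 0.9, Phi, np.full(S, 1.0 / S))
V_approx = Phi @ w
```

With `Phi` set to the identity matrix, the program reduces to the exact LP and returns the optimal value function, which is a convenient sanity check. Any feasible `Phi @ w` is a lower bound on the optimal value function under this convention.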
The fast algorithm via state aggregation: For large-scale MDP problems, the system state space is very large and the transition matrix is generally sparse. The factored MDP replaces the original MDP problem with a compact representation, the value iteration is transformed into a linear program, and a finite linear basis then reduces the number of free variables in that linear program. Next, we present a fast algorithm based on system state aggregation that further reduces the size of the problem.
We divide the system states into several groups, aggregate the states in each group into a macro state, and define the costs among the macro states. The policy therefore becomes a hierarchical policy that maps each macro state to another macro state. The cost function and the transition matrix are defined on the macro states. We then formulate a reduced MDP problem that approximates the original MDP problem (1).
The original MDP problem (1) is defined by the four-tuple $M = (\mathcal{S}, \mathcal{A}, P, r)$, where $\mathcal{S}$ is the system state space, $\mathcal{A}$ is the action space, $P$ is the transition matrix, and $r$ is the cost function. The reduced MDP model is defined by $M' = (\mathcal{S}', \mathcal{A}, P', r')$, which has the same action space as the original model, and $\mathcal{S}' = \{B_1, B_2, \dots, B_L\}$, where the blocks partition the original state space:
$$B_l \subseteq \mathcal{S}, \qquad \bigcup_{l=1}^{L} B_l = \mathcal{S}, \qquad B_i \cap B_j = \emptyset \;\; (i \ne j).$$
After dividing the system state space, we apply the approximate linear program to obtain the policy. In the next section, we present simulation results for the proposed algorithm.
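The text does not state how $P'$ and $r'$ are computed from the blocks. One common choice, assumed here, is to weight the states inside each block uniformly: $r'(B_l, a)$ is the average cost over $B_l$, and $P'(B_j \mid B_l, a)$ is the average probability of moving from a state in $B_l$ into $B_j$. The sketch builds these arrays with a block-membership matrix and lifts a macro-state policy back to the original states; `aggregate_mdp` and `lift_policy` are hypothetical names, and mapping macro states to actions is a simplification of the hierarchical policy described above.

```python
import numpy as np


def aggregate_mdp(P, cost, labels, num_blocks):
    """Build reduced arrays (P', r') from a state partition, uniform weights.

    P:      (A, S, S) transition probabilities of the original MDP
    cost:   (S, A) per-slot cost r(s, a)
    labels: (S,) labels[s] = index l of the block B_l containing state s
    Returns P_red (A, L, L) and cost_red (L, A).
    """
    num_actions, num_states, _ = P.shape
    # Membership matrix: M[s, l] = 1 if state s belongs to block B_l
    M = np.zeros((num_states, num_blocks))
    M[np.arange(num_states), labels] = 1.0
    sizes = M.sum(axis=0)  # |B_l|
    if np.any(sizes == 0):
        raise ValueError("every block B_l must be non-empty")
    # D averages uniformly over the states of each block: shape (L, S)
    D = (M / sizes).T
    # P'(B_j | B_l, a) = mean over s in B_l of Pr(next state in B_j | s, a)
    P_red = np.stack([D @ P[a] @ M for a in range(num_actions)])
    cost_red = D @ cost  # r'(B_l, a) = mean cost over B_l
    return P_red, cost_red


def lift_policy(block_policy, labels):
    """Every original state takes the action chosen for its macro state."""
    return np.asarray(block_policy)[np.asarray(labels)]


# Toy usage: 6 states grouped into 3 blocks of two states each.
rng = np.random.default_rng(2)
P = rng.random((2, 6, 6))
P /= P.sum(axis=2, keepdims=True)
cost = rng.random((6, 2))
labels = np.array([0, 0, 1, 1, 2, 2])
P_red, cost_red = aggregate_mdp(P, cost, labels, 3)
```

The reduced model has only $L$ states, so it can be solved with any exact method (value iteration or the exact LP) before lifting the resulting policy back to $\mathcal{S}$.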

SIMULATION RESULTS
In our simulations, the time slot length is set to 5 ms. The channel experiences Rayleigh fading with a Doppler shift of 10 Hz, and the channel fading is characterized by 4 states. We assume that the required SNR is fixed; therefore, there are 4 transmission power levels. The data queue length is set to 3, and packet arrivals follow a Poisson process with rate 0.1. We assume that one packet can be sent per time slot. The energy buffer length is set to 80, and the energy harvesting dynamics are characterized by 3 states. The energy harvesting trace is shown in Fig. 3. Therefore, the total number of system states is 2880. We run 1000 channel realizations, each lasting 10 min.
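The total of 2880 states follows from multiplying the number of levels of each state variable, reading the queue length and energy buffer length as the number of levels of those variables (the reading that reproduces the stated total). The sketch also estimates the memory of one dense transition matrix, assuming a hypothetical storage format of 64-bit floats; one such matrix would be needed per action.

```python
from math import prod

# Levels per state variable, read from the simulation settings above.
levels = {"channel": 4, "queue": 3, "energy": 80, "harvest": 3}

num_states = prod(levels.values())
# Memory of ONE dense |S| x |S| transition matrix of 64-bit floats
# (assumed storage format; one such matrix is needed per action).
dense_bytes = num_states * num_states * 8

print(num_states)         # 2880
print(dense_bytes / 1e6)  # 66.3552 (MB)
```

Roughly 66 MB per action is far beyond the on-board memory the paper attributes to smart meters, which motivates the factored and aggregated representations.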
We compare the proposed fast algorithm with the exact value iteration algorithm, which provides the optimal solution of problem (1). Figure 4 shows the actions chosen by the optimal solution and by the proposed solution for all system states. The proposed algorithm takes the same action as the optimal solution in most states, with an action accuracy of 93.8% relative to the optimal solution. Next, we compare the state-space sizes and the computational times of the two methods in Table 1. The proposed solution has far fewer system states, and its computational time is much shorter than that of the optimal solution. In addition, its energy consumption and accuracy remain close to those of the optimal solution.
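The text does not define action accuracy precisely. Assuming it is the fraction of system states in which the proposed policy chooses the same action as the optimal policy, it can be computed as below; `action_accuracy` is a hypothetical helper, and policies are represented as sequences of actions indexed by state.

```python
def action_accuracy(optimal_actions, proposed_actions):
    """Fraction of system states where both policies choose the same action.

    Assumed definition: both arguments list one action per system state,
    in the same state order.
    """
    if len(optimal_actions) != len(proposed_actions):
        raise ValueError("policies must cover the same states")
    matches = sum(a == b for a, b in zip(optimal_actions, proposed_actions))
    return matches / len(optimal_actions)
```

Under this reading, 93.8% of the 2880 states corresponds to roughly 2,700 states with matching actions.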

CONCLUSION
In this study, we have proposed a fast algorithm for large-scale MDPs in wireless smart meter systems. We first addressed the compact representation of the large-scale MDP. Based on the factored MDP representation, we then proposed a state aggregation algorithm that greatly reduces the size of the original problem with very low complexity. Extensive simulation results have verified the good performance of the proposed scheme.

Fig. 1: The wireless smart meter system
The state aggregation method is given as follows:
Input: initial partition $b_1 = \{B_1, B_2, B_3, \dots, B_L\}$; target partition size
Output: target partition $\{B_1, B_2, B_3, \dots, B_L\}'$
Initialize: $b_t = b_1$
While (size of $b_t$ < target partition size) do
    For every block in partition $b_t$ do
        Further partition the block
    End
    Replace $b_t$ with the refined partition
End
Return: the target partition $\{B_1, B_2, B_3, \dots, B_L\}'$
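The refinement loop of the state aggregation method can be sketched in Python, with blocks represented as lists of state indices. The splitting rule is not specified in the text, so `split_in_half` (cut a block into two halves) is a hypothetical stand-in; a practical rule would group states with similar values or transition behaviour. Every block is refined on each pass, so the result can overshoot the target size, and the loop also stops if no block can be split further.

```python
def refine_partition(initial_partition, target_size, split_block):
    """Repeatedly split blocks until the partition has at least target_size blocks.

    initial_partition: list of blocks B_1..B_L, each a list of state indices.
    split_block: hypothetical rule mapping one block to a list of sub-blocks.
    """
    partition = [list(block) for block in initial_partition]
    while len(partition) < target_size:
        refined = []
        for block in partition:
            refined.extend(split_block(block))
        if len(refined) == len(partition):  # no block could be split further
            break
        partition = refined
    return partition


def split_in_half(block):
    """Illustrative splitting rule: cut a block into two halves."""
    if len(block) < 2:
        return [block]
    mid = len(block) // 2
    return [block[:mid], block[mid:]]


# Usage: refine a single block of 8 states into at least 4 blocks.
blocks = refine_partition([list(range(8))], 4, split_in_half)
```

Here `blocks` becomes `[[0, 1], [2, 3], [4, 5], [6, 7]]` after two passes.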

Table 1: Computational performance comparison between the optimal solution and the proposed solution