This work applies non-stationary random processes to the resilience of power distribution under severe weather. Power distribution, the edge of the energy infrastructure, is susceptible to external hazards from severe weather. Large-scale power failures often occur, leaving millions of people without electricity for days. However, the problem of large-scale power failure, recovery and resilience has been neither formulated rigorously nor studied systematically. This work studies the resilience of power distribution from three aspects. First, we derive non-stationary random processes to model large-scale failures and recoveries. Transient Little’s Law then provides a simple approximation of the entire life cycle of failure and recovery through a queue at the network level. Second, we define time-varying resilience based on the non-stationary model. The resilience metric characterizes the ability of power distribution to remain operational and recover rapidly upon failures. Third, we apply the non-stationary model and the resilience metric to large-scale power failures caused by Hurricane Ike. We use real data from the electric grid to learn time-varying model parameters and the resilience metric. Our results show the non-stationary evolution of failure rates and recovery times, and how the network resilience deviates from that of normal operation during the hurricane.

The power grid is a vast interconnected network that delivers electricity to customers. The power distribution system lies at the edge of the power grid [

A fundamental research issue pertaining to this real problem is the resilience of power distribution to large-scale external disruptions. Here, resilience corresponds to the ability of power distribution to reduce failures and recover rapidly when failed [

・ Randomness and dynamics (i.e., non-stationarity) of failures and recoveries,

・ Time-varying resilience at the network level,

・ Estimation of non-stationarity and the resilience using real data from the electric grid.

A pertinent first step is to model large-scale failures and recoveries. Such a model is a prerequisite for deriving a resilience metric at the network level. The metric needs to reflect the intrinsic characteristics of large-scale failures and recoveries. Severe weather disruptions such as hurricanes evolve randomly and dynamically. So do large-scale failures and recoveries in power distribution. For example, failures occur and recover depending on random factors such as the intensity of a storm and dynamic allocations of repair crews. These factors vary with time. Hence, it is appropriate to base the model on non-stationary random processes.

Prior approaches account for randomness of failures but rarely dynamics [

Another challenge is how to quantify the resilience of power distribution. Resilience in this work measures the performance of power distribution during severe weather. In principle, such a resilience metric should manifest the difference between the performance in severe weather and normal operations [

A third challenge is that unknown parameters of non-stationary models and a resilience metric need to be estimated from real data [

The contribution of this work is to address the above three challenges, which are:

・ To develop a model based on non-stationary random processes,

・ To derive a dynamic resilience metric based on the model,

・ To learn time-varying model parameters and resilience metric using large-scale real data.

We first formulate, from bottom up, an entire life cycle of large-scale failure and recovery. The problem formulation begins at the finest level of network nodes based on temporal-spatial stochastic processes. Since each external disruption results in one snapshot of nodal states (failed and normal), information from one weather event is insufficient for completely specifying a temporal-spatial model [

A resulting temporal model can be approximated by a

We study an analytically tractable case of

The rest of the paper is organized as follows. Section 2 provides background knowledge and an example of large-scale failures at power distribution. Section 3 develops a problem formulation from nodes (components) to

To provide intuition for modeling failures and recoveries induced by severe weather, we begin with two examples.

The first example illustrates how failures can be induced by severe weather. The example is on a small section of a power distribution system drawn from [

a) If any of the components

b) If

c) Recovery depends on the types of failures, restoration schemes, as well as the terrain conditions. For example, if either source

In summary, failures and self-recoveries in a small time-scale of seconds depend on detailed topology and self-recovery schemes. Failure and recovery at a larger time scale of minutes and beyond are often due to external disruptions that evolve dynamically and randomly.

The second example illustrates non-stationary failures and recoveries. The example uses real data on an operational power distribution system during Hurricane Ike. Hurricane Ike occurred in 2008 and affected more than 2 million customers in densely populated areas of Texas and Louisiana.

a) Failure occurrence was time-varying and random. More failures occurred during the hurricane than those that occurred before and after.

b) Recovery time was also time-varying and random. Recovery time differed for failures that occurred at different times. For example, failures occurring during the hurricane tended to recover more slowly than those that occurred before and after.

As a result, the probability distributions of failure occurrence and failure duration vary with time at the scale of minutes and hours. Note that information on root causes of failures and recoveries is unavailable, and is beyond the scope of this work.

We now formulate time evolution of large-scale failure and recovery as a non-stationary random process. We begin with detailed information on nodal states (failure and normal). We then aggregate the spatial variables of nodes to obtain the temporal evolution of failure and recovery of an entire network.

A spatial-temporal random process provides a theoretical basis for modeling large-scale failures at the finest scale of nodes (components). The shorthand notation i is used to specify both the index of a node and its corresponding geo-location, where

Let

Failures caused by external disruptions exhibit randomness. Whether and when a node fails is random. Whether and when a failed node recovers is also random. Given time

Equation (1) models an individual node in a network. The model includes Markov temporal dependence for nodal states, which is a simple assumption for state transitions. Such a model can be applied to a heterogeneous grid where nodes generally experience different failure and recovery processes. No assumptions are made about the underlying network topology or about the independence or dependence of failures. Together, n such equations for the n nodes form a spatial-temporal model for a network.

Each severe weather event generates one snapshot of network states. Information available on failures and recoveries is often from one or a few events. Such information is insufficient for specifying the spatial-temporal model. Hence, we derive a temporal model in this work by considering an entire network as a whole.

Our temporal model aggregates spatial variables from Equation (1),

The probability can be further related to an indicator function, e.g.,

Definition 1. Let

Let

An increment

Definition 2. Failure process

Now we assume that failure

Definition 3. Recovery process

Assume

Similarly, assume that, at most one recovery occurs during

Here recovery time (or failure duration) is also assumed at the time scale of minutes. Equation (2) can be rewritten as

Hence, the expected number of nodes in the failure state equals the difference between the expected failures and the expected recoveries. The time scale of a minute enables this work to focus on modeling failures that are induced by external disruptions and the recoveries that cannot be accomplished by instant self-healing schemes. The aggregation conceals spatial variables [
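As a minimal sketch of this bookkeeping on a single sample path (rather than in expectation), the number of nodes in the failure state at time t can be computed as the count of failures up to t minus the count of recoveries up to t. The function name and the input format (failure occurrence times and failure durations in minutes) are assumptions made for illustration only.

```python
def failed_count(failure_times, durations, t):
    """Number of nodes in the failure state at time t.

    failure_times[k] is when failure k occurred and durations[k] is how
    long it lasted (both in minutes; hypothetical input format).
    """
    # Failures that have occurred by time t.
    failures = sum(1 for s in failure_times if s <= t)
    # Recoveries that have completed by time t.
    recoveries = sum(1 for s, d in zip(failure_times, durations) if s + d <= t)
    return failures - recoveries
```

Averaging this count over many sample paths would approximate the expected number of failed nodes discussed above.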

Failure

1) The arrival process to the queue corresponds to the number of failures

2) A failure that occurs in

3) The departure process of the queue corresponds to the number of recoveries

A

Transient Little’s Law provides an analytically tractable case of

Theorem 1. Transient Little’s Law [

Consider

where

Consider an increment of arrivals as new failures, an arrival rate as a failure rate, a delay as a failure duration, and departures as recoveries. Assume that recoveries occur following first-in-first-out (FIFO) policy. Transient Little’s Law can then be directly applied to our problem. The theorem has an intuitive explanation:

Define recovery rate

Applying Transient Little’s Law, the recovery rate can be related to a failure rate and a recovery time distribution by the corollary below.

Corollary 1. Let

The proof of the corollary is in Appendix 1. In summary, two pertinent quantities completely determine the expected number of failures and recoveries: Failure rate
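The relation between failure rate, recovery-time distribution and recovery rate can be illustrated with a discrete-time sketch. It assumes that failure durations are drawn independently given the failure time, which is the standard infinite-server-queue setting; it is not a reproduction of the corollary's exact statement. The names `failure_rate` and `duration_pmf` are hypothetical.

```python
def expected_recoveries(failure_rate, duration_pmf):
    """Expected number of recoveries in each minute.

    failure_rate[s]    : expected number of failures in minute s.
    duration_pmf(s, d) : probability that a failure occurring in minute s
                         lasts exactly d minutes (assumed given).
    """
    horizon = len(failure_rate)
    return [
        # A recovery in minute t comes from a failure in some minute s <= t
        # whose duration is exactly t - s.
        sum(failure_rate[s] * duration_pmf(s, t - s) for s in range(t + 1))
        for t in range(horizon)
    ]


def expected_in_failure(failure_rate, recoveries):
    """Expected failed nodes: cumulative failures minus cumulative recoveries."""
    out, total = [], 0.0
    for lam, mu in zip(failure_rate, recoveries):
        total += lam - mu
        out.append(total)
    return out
```

For example, if two failures are expected in minute 0 and every failure lasts exactly two minutes, the sketch places both expected recoveries in minute 2.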

We now derive a resilience metric using the pertinent parameters for an entire life cycle of non-stationary failure and recovery. While resilience can be characterized from multiple dimensions [

For a non-stationary recovery process, a failure duration depends on when the failure occurs (

When

Definition 4. Infant and aging recovery

Let

Note here

As failure and recovery processes are dynamic, a resilience metric should also be dynamic. Furthermore, how resilience varies with time should follow from the dynamic model of failure-recovery processes. Following this principle, we define resilience from the bottom up, starting with one node. Probability

Definition 5. Resilience of a node

Given threshold value

Aggregating the resilience of nodes over an entire network, (system) resilience

Definition 6. Resilience of a network

Given threshold value

Hence, aggregating over spatial variables, network topology and automated reconfiguration, the resilience of a network is an average resilience of all network nodes:

1) Resilience is a property of a distribution network as a whole to survive large-scale external disruptions.

2) Resilience is a function of time that reflects temporal evolutions of failures and recoveries in a network.

3) Resilience shows the ability of a distribution network to resist failures and recover rapidly.

4) Resilience depends on threshold
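The properties above can be made concrete with an empirical sketch. It assumes, as a reading of the definitions rather than a quotation of them, that a node counts against resilience at time t only when it is in aging recovery, that is, failed at t with a failure duration exceeding the threshold. The function name, the argument `d0` for the threshold, and `n_nodes` are hypothetical.

```python
def network_resilience(failure_times, durations, t, d0, n_nodes):
    """Empirical resilience at time t under the stated assumption.

    A failure is in aging recovery at t if it has started, has not yet
    recovered, and its total duration exceeds the threshold d0.
    """
    aging = sum(
        1
        for s, d in zip(failure_times, durations)
        if s <= t < s + d and d > d0
    )
    # Nodes that are normal or recovering rapidly do not reduce resilience.
    return 1.0 - aging / n_nodes
```

With no failures the function returns 1, and it decreases as more long-lasting failures accumulate, which matches the qualitative behavior listed above.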

The resilience metric can be characterized by the parameters of the model, i.e., non-stationary random processes in Section 1. In particular, the resilience metric (Equation (13)) can be represented through a simple expression owing to Transient Little’s Law,

The second term corresponds to the aging recoveries at time t. Let

The above expression shows that given threshold

A special case of resilience is when the failure process is a Non-Homogeneous Poisson Process (NHPP). As a commonly-used failure process [

When a failure process is an NHPP, a

Threshold

where

where

In general, a failure-recovery process can be regarded as a combination of these two special cases. At time

We apply the non-stationary failure-recovery processes to a real-life example of large-scale failures caused by a hurricane. Our focus is on estimating the three pertinent quantities

Hurricane Ike was one of the strongest hurricanes of 2008. Ike caused large-scale power failures, left more than two million customers without electricity, and was considered by many to be the second costliest Atlantic hurricane of all time [

Reported by the National Hurricane Center [

Widespread power failures were reported across Louisiana and Texas starting September 12 [

Among the 2005 samples, there are groups of failures that occurred within a minute. Failures within a group are considered as dependent and aggregated as one failed entity. Each group has a unique failure occurrence time

The 463 samples are then randomly partitioned into a training set of 333 samples and a test set of 130 samples. The training set is used to learn parameters. The test set is used for validating the model and the parameters.
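The preprocessing described above can be sketched as follows. Two points are assumptions: "within a minute" is interpreted here as falling in the same clock minute, and the way durations within a group are aggregated is left open, so the sketch simply keeps them as a list. The function names and the fixed seed are hypothetical.

```python
import math
import random


def group_by_minute(records):
    """Group (failure_time_in_minutes, duration) records by starting minute."""
    groups = {}
    for time_min, duration in records:
        groups.setdefault(math.floor(time_min), []).append(duration)
    return groups


def split_samples(samples, n_train, seed=0):
    """Randomly partition samples into a training set and a test set."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 463 grouped samples split into 333 training and 130 test samples.
train, test = split_samples(range(463), 333)
```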

We now use the data set to study the empirical processes

First, we use the training set to determine failure rate

simple moving average [

estimate the failure rate and use the test set to validate the estimation.
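A simple moving average of per-minute failure counts could be computed as in the sketch below. The centered window and its truncation at the ends of the record are assumptions; the window length actually used for the Hurricane Ike data is not stated here.

```python
def failure_counts(failure_minutes, horizon):
    """Count failures in each one-minute bin over [0, horizon)."""
    counts = [0] * horizon
    for m in failure_minutes:
        if 0 <= m < horizon:
            counts[int(m)] += 1
    return counts


def moving_average(counts, window):
    """Centered simple moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(counts)):
        lo = max(0, i - half)
        hi = min(len(counts), i + half + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed
```

The smoothed sequence serves as an estimate of the failure rate in failures per minute.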

We now consider hypothesis

Poisson process. If

We perform Pearson’s test [

However, not rejecting

Next we study empirical recovery-time distribution

The 463 samples in our data set consist of durations of the failures that occurred from 7 a.m. September 12 to 4 p.m. September 14.

As the failure durations varied with the hurricane (

where

We select a Weibull distribution as a mixture function because the parameters exhibit clear physical meaning [

where

The parameters of the Weibull mixtures are estimated from the training samples. For simplicity, we divide the failure time into 5 intervals shown in
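To illustrate how a Weibull mixture yields a recovery-time distribution, the sketch below evaluates the mixture's cumulative distribution function. It assumes the common scale and shape parameterization, F(d) = 1 - exp(-(d/scale)^shape), and the component values in the usage note are hypothetical, not the estimates obtained from the data.

```python
import math


def weibull_cdf(d, scale, shape):
    """Probability that a failure duration is at most d."""
    if d <= 0:
        return 0.0
    return 1.0 - math.exp(-((d / scale) ** shape))


def mixture_cdf(d, components):
    """CDF of a Weibull mixture.

    components is a list of (weight, scale, shape) tuples whose weights
    are assumed to sum to 1.
    """
    return sum(w * weibull_cdf(d, scale, shape) for w, scale, shape in components)
```

For instance, `mixture_cdf(d0, [(0.5, 2.0, 1.0), (0.5, 2.0, 3.0)])` gives the fraction of failures expected to recover within a threshold `d0` under those illustrative parameters. One such mixture per failure-time interval would give a time-varying distribution.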

We now study time evolution of resilience. First, we obtain an optimal threshold

The network resilience is then obtained through Equation (15) using the failure rate

・ Prior to the hurricane, no failures had yet occurred, and the resilience was close to 1.

・ A large number of failures then occurred and reduced the resilience to a lower level. How fast the resilience decreased was measured by

・ At 3 a.m. on September 14, about 42.7 hours after the first observed failure (24.8 hours after landfall, and 26.25 hours after the failure rate reached its maximum value), the resilience reached its minimum value. At that point, 46% (214 out of 463) of all failures were in aging recovery. The maximum reduction of resilience from that of the normal operation was

・ After the minimum, the resilience increased when more failures were restored. The impact from the hurricane was fading gradually. It took about 10.7 days for the resilience to return to that of the normal operation from the minimum value.

The dynamic resilience metric

We have derived a non-stationary random process to model large-scale failure and recovery of a power distribution network under external disruptions. The resulting model is a dynamic

We used real data from an operational network that was impacted by Hurricane Ike. The failure rate, the non-stationary probability distribution of failure durations, and the resilience metric are estimated from the real data. The failure process has been shown to be a non-homogeneous Poisson process at the time scale of minutes. The recovery-time distribution has been modeled as Weibull mixtures with time-varying parameters. A threshold value of 15.5 hours is obtained for this network, within which 50.8% of the failures recovered rapidly. The network resilience reached its minimum value 24.8 hours after landfall, when the aging recoveries were 46% of all failures. The network experienced its most difficult period from when the failure rate peaked, with aging recovery dominating, until the resilience decreased to its minimum. It then took about 10.7 days for the network to regain 100% resilience from the minimum value. These observations suggest that enhanced recovery, especially during the most difficult period, could reduce the worst impact on the network and improve the overall resilience and the recovery time.

There are several directions for extending this work. The first is to utilize spatial and network variables in the non-stationary model. Temporal resilience can then be extended to measure spatiotemporal characteristics. Different time scales may need to be considered to account for the impacts of the system structure. Such extensions are natural because our model is derived from the bottom up, starting with nodes at given geographic and system locations. Our preliminary work shows a step towards such an extension [

The authors would like to thank Chris Kung, Jae Won Choi, Daniel Burnham and Xinyu Dai for data processing, Kurt Belgum for helpful comments on the manuscript, Anthony Kuh, Vince Poor, and Nikil Jayant for helpful discussions. The support from the National Science Foundation (ECCS 0952785) is gratefully acknowledged.

Yun Wei, Floyd Galvan, Chuanyi Ji, Stephen Couvillon, George Orellana, James Momoh (2016) Non-Stationary Random Process for Large-Scale Failure and Recovery of Power Distribution. Applied Mathematics, 07, 233-249. doi: 10.4236/am.2016.73022

Proof: We begin with the Transient Little’s Theorem. Computing the derivative of both sides of Equation (7), we have

where

The first term on the right-hand-side is

Pearson’s Hypothesis Test: The hypothesis test is based on a chi-square statistic which compares the failure occurrence times with their sample mean. The details of testing

1) Compute the estimated failure rate

2) Divide the failure occurrence times into m non-overlapping intervals

3) Count

4) Use the estimated

occurrences.

5) Compute the sum

(number of independent parameters fitted) − 1. Since one parameter

6) Given a confidence level, for instance 95%, we obtain a threshold value
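The steps of the test can be sketched in a few functions. The sketch assumes a per-minute estimated failure rate, interval boundaries given as integer minute indices, and a critical value looked up from a chi-square table by the caller; all function and parameter names are hypothetical.

```python
def expected_counts(rate_per_minute, edges):
    """Expected failures per interval: the estimated rate summed over each interval."""
    return [sum(rate_per_minute[a:b]) for a, b in zip(edges[:-1], edges[1:])]


def observed_counts(failure_minutes, edges):
    """Observed failures per interval [a, b)."""
    return [
        sum(1 for t in failure_minutes if a <= t < b)
        for a, b in zip(edges[:-1], edges[1:])
    ]


def pearson_statistic(observed, expected):
    """Chi-square statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(m, fitted_parameters=1):
    """m intervals minus the number of fitted parameters minus 1."""
    return m - fitted_parameters - 1


def not_rejected(observed, expected, critical_value):
    """True if the statistic does not exceed the tabulated critical value."""
    return pearson_statistic(observed, expected) <= critical_value
```

As a usage note, with m = 5 intervals and one fitted parameter there are 3 degrees of freedom, for which the tabulated 95% critical value is about 7.815. The hypothesis is not rejected when the statistic falls at or below that value.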