On-line transient stability analysis of a power grid is crucial in determining whether the grid will settle to a stable steady-state operating point after a disturbance. Transient stability analysis requires solving, in near real-time, the algebraic equations modeling the grid network together with the ordinary differential equations (ODEs) modeling the dynamics of electrical components such as synchronous generators, exciters, and governors. In this research, we investigate a time-parallel approach, in particular an implementation of the Parareal algorithm on a Graphical Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA), to compute solutions of ODEs. The numerical accuracy and computation time of the Parareal algorithm executing on the GPU are demonstrated on the single machine infinite bus test system. Two dynamic models of the single synchronous generator, namely the classical and detailed models, are studied. The ODE solutions computed by the Parareal algorithm are compared to those computed using the modified Euler’s method to demonstrate the accuracy of the Parareal algorithm executing on the GPU. Simulations are performed with varying numerical integration time steps, and the suitability of the Parareal algorithm for computing near real-time solutions of ODEs is presented. Compared to the sequential modified Euler’s method, the Parareal algorithm achieves speedups of 25× and 31× for the classical and detailed dynamic models of the synchronous generator, respectively. The weak scaling efficiency of the Parareal algorithm when a large number of ODEs must be solved at each time step, attributed to the increase in sequential computations and the associated memory transfer latency between the CPU and GPU, is also discussed.

The time-domain simulation technique is widely used by the power industry to describe the transient behavior of a power grid accurately. The high level of accuracy achieved using the time-domain simulation technique is due to the use of detailed mathematical models of controls, nonlinearity, saturation, and protection systems. Power system stability studies typically involve computing the system response to a sequence of large disturbances, such as a generator outage or a network short circuit, followed by a switching operation as part of protective measures. The system response computation involves a direct simulation in the time domain over a duration varying between 1 s and 20 min or more. The system response or stability at different stages of the time-domain simulation is affected by different components of the grid, as dictated by the level of mathematical modeling used for the individual components.

Power system stability studies using the time-domain simulation technique are performed using two levels of mathematical models of the grid components, namely the short-term and long-term models. The short-term models represent the rapidly responding electrical components of the power grid like generators, exciters, governors, turbines, etc., while the long-term models represent the slow-oscillatory system power balance (variable load). Using the short-term models to study post-disturbance times of up to 5 to 10 secs is classified as “Transient Stability” Analysis (TSA), whereas stability studies using the long-term models are associated with frequency and voltage stability. The focus of the research work presented in this paper is on the Transient Stability Analysis of the power system.

The TSA involves computing the step-by-step solution of thousands of coupled non-linear differential-algebraic equations (DAEs) representing the dynamic components of the power system and the network interconnecting them. The TSA, in particular, is concerned with the simulation of faults and contingencies that can cause instability of the power system. The focus is to simulate a number of possible contingencies over a short time horizon to evaluate possible instability conditions and develop appropriate corrective actions. The preventive simulation and corresponding corrective action are repeated for tens or hundreds of cases, until a system or utility operator, by an online evaluation of the power system state, detects unsafe operating conditions. These exhaustive and computationally intensive simulations provide an operator with an appropriate corrective action that can be triggered when a contingency occurs in real time. However, the online TSA is a computationally challenging problem, requiring 10 to 15 minutes (depending on the size of the power system) to perform a preventive simulation of a power system for a set of fault conditions and outages [

The parallelization techniques that can be applied to perform a transient stability analysis of a power system can be broken into spatial domain decomposition, numerical method, and temporal domain decomposition or time-parallel parallelism approaches. The spatial decomposition approach [

The first two parallelization techniques have been researched and are available in some commercial packages like PSS-E, PowerWorld, and OPAL-RT [

In this paper, we investigate the use of the time-parallel approach and in particular the Parareal algorithm (PRA) implementation on the Graphical Processor Unit (GPU) using the Compute Unified Device Architecture (CUDA) for solving ODEs representing the electrical components of the power system. The investigation in [

The implementation of PRA has been investigated in [

The research in [

In the above investigations of the TSA using the time-parallel approach, all of the implementations have been on a single node or on multi-node clusters based on Intel processors, using the MATLAB programming language for the algorithm implementation. The speedup achieved does not include the communication overhead between multiple cores on a single node and between the nodes in a cluster. Furthermore, the research in all of the above investigations is on evaluating the suitability of PRA for transient stability analysis. In this paper, we investigate the performance of PRA in solving ODEs on a heterogeneous computing architecture, namely the massively parallel cores of graphical processing units (GPUs) with the compute unified device architecture (CUDA) [

This paper is organized as follows: In Section 2, the PRA with the predictor-corrector approach is discussed. Section 3 gives a brief overview of power system modeling, in particular the classical and fourth-order models of the synchronous generator. In Section 4, we present the implementation of the Parareal algorithm on NVIDIA GPUs, and in Section 5, the simulation results and analysis are presented. Section 6 presents the conclusion and future work.

The origin of PRA can be traced back to the spatial domain decomposition technique. The PRA divides the entire simulation time $T$ into small sub-intervals and solves these sub-intervals in parallel. Solving the sub-intervals in parallel requires an initial condition for each of them, which is provided by a fast, computationally inexpensive (but less accurate) sequential numerical integrator. The sub-intervals are then solved in parallel to obtain a more accurate solution of the ODE.

Consider a general nonlinear ODE with the given initial condition $u(0) = u_0$, as shown in Equation (1):

$$\dot{u} = f(u, t), \quad t \in [0, T] \tag{1}$$

The entire simulation interval $[0, T]$ is decomposed into $N$ sub-intervals as $T_0 < T_1 < \cdots < T_N$ with the step size $\Delta T = T_n - T_{n-1}, \ \forall\, 1 \le n < N$.

Two numerical operators, namely the coarse and fine propagators, are defined in PRA. The coarse propagator, denoted as $G_{\Delta T}$, uses the initial condition $u(T_{n-1}) = U_{n-1}$ to compute the approximate solution of Equation (1) with time step $\Delta T$ at time $T_n$ as shown in

$$\tilde{U}_n = G_{\Delta T}(T_{n-1}, \tilde{U}_{n-1}, \Delta T), \quad \tilde{U}_0 = u_0 \tag{2}$$

The fine propagator, denoted as $F_{\delta t}$, also uses the initial condition $u(T_{n-1}) = U_{n-1}$ to compute the approximate solution of Equation (1) with a smaller time step $\delta t \ll \Delta T$ at time $T_n$ as shown in

The solution computed by the fine propagator is denoted as $\hat{U}_n$. It is more accurate than the solution of the coarse propagator, but it is computationally more expensive. The fine propagator is mathematically described in Equation (3):

$$\hat{U}_n = F_{\delta t}(T_{n-1}, \hat{U}_{n-1}, \delta t), \quad \hat{U}_0 = u_0 \tag{3}$$

The flowchart of the implementation of PRA is shown in

The flow chart consists of the three steps of the PRA which are discussed below:

Step 1: The PRA begins by sequentially computing, with the coarse propagator, the initial conditions that are used for solving the sub-intervals in parallel. This initial coarse sweep generates fast but less accurate initial conditions using Equation (4).

$$U_n^0 = \tilde{U}_n^0 = G_{\Delta T}(U_{n-1}^0), \quad \forall\, 1 \le n < N \tag{4}$$

where, the superscript “0” denotes the initial iteration.

This step is performed to initialize the PRA iterations.

Step 2: The fine propagator is used to propagate the fine solution in parallel over each sub-interval $t \in [T_{n-1}, T_n]$ as

$$\hat{U}_n^k = F_{\delta t}(U_{n-1}^{k-1}), \quad \forall\, 1 \le n < N \tag{5}$$

where $k = 1, 2, \cdots, k_{max}$ is the iteration number.

Step 3: Once the fine solutions are obtained using the coarse solutions as initial conditions, the PRA corrects the sequential coarse predictions using the predictor-corrector method, which accounts for the difference between the solutions obtained from the coarse and fine propagators when forming the values for the next iteration. The predictor-corrector scheme is described in Equation (6).

$$U_n^k = \hat{U}_n^k + \tilde{U}_n^k - \tilde{U}_n^{k-1}, \quad \forall\, 1 \le n < N \tag{6}$$

where,

$$\tilde{U}_n^k = G_{\Delta T}(U_{n-1}^k) \tag{7}$$

The notation $U$ represents the corrected coarse solution, which is used as the initial condition for Step 2 of the next iteration. At the end of the 1st iteration, the coarse value at time $T_1$ is corrected to the fine solution. Similarly, at the end of the $k$th iteration, the coarse value at time $T_k$ is corrected to its respective fine solution. Steps 2 and 3 of the algorithm are iterated until the difference between two successive corrected coarse values meets the desired tolerance level shown in Equation (8).

$$|U_n^k - U_n^{k-1}| \le tol, \quad \forall\, 1 \le n < N \tag{8}$$
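
The three steps and the stopping test of Equation (8) can be sketched in plain Python as follows. This is only a minimal serial illustration, not the GPU implementation evaluated in this paper: the function names, the scalar test problem $\dot{u} = -u$, and the choice of one modified-Euler step per coarse interval are assumptions made for the example, and the loop over sub-intervals in Step 2 stands in for what would be parallel threads on a GPU.

```python
import math

def modified_euler(f, t0, u0, dT, steps):
    """Advance u' = f(u, t) from t0 to t0 + dT with `steps` modified-Euler steps."""
    h = dT / steps
    t, u = t0, u0
    for _ in range(steps):
        slope = f(u, t)
        u_pred = u + h * slope                        # predictor
        u = u + 0.5 * h * (slope + f(u_pred, t + h))  # corrector
        t += h
    return u

def parareal(f, u0, T, N, fine_steps, tol=1e-8, k_max=50):
    """Return (U, k): values at T_0..T_N and the number of iterations used."""
    dT = T / N
    coarse = lambda n, u: modified_euler(f, n * dT, u, dT, 1)
    fine = lambda n, u: modified_euler(f, n * dT, u, dT, fine_steps)

    # Step 1: sequential coarse sweep, Equation (4).
    U = [u0]
    for n in range(N):
        U.append(coarse(n, U[n]))
    G_prev = list(U)  # G_prev[n + 1] holds G(U_n) from the previous iteration

    for k in range(1, k_max + 1):
        # Step 2: fine propagation, Equation (5). The calls are independent of
        # each other, so they could run concurrently.
        F_hat = [fine(n, U[n]) for n in range(N)]

        # Step 3: sequential predictor-corrector, Equations (6) and (7).
        U_new, G_new = [u0], [u0]
        for n in range(N):
            g = coarse(n, U_new[n])
            G_new.append(g)
            U_new.append(F_hat[n] + g - G_prev[n + 1])

        # Stopping test, Equation (8).
        err = max(abs(a - b) for a, b in zip(U_new, U))
        U, G_prev = U_new, G_new
        if err <= tol:
            break
    return U, k

# Hypothetical test problem: u' = -u, u(0) = 1 on [0, 2].
decay = lambda u, t: -u
U, iterations = parareal(decay, 1.0, T=2.0, N=10, fine_steps=100)
reference = modified_euler(decay, 0.0, 1.0, 2.0, 1000)  # sequential fine solve
print(iterations, abs(U[-1] - reference), abs(U[-1] - math.exp(-2.0)))
```

The sketch stores the coarse values of the previous iteration so that Equation (6) only needs one new coarse evaluation per sub-interval; the converged result can be compared against a sequential solve with the fine step, as done in the last lines.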

The coarse solutions are generally less accurate and play an essential role in the convergence of the algorithm [

Power system dynamics are modeled as a set of differential-algebraic equations (DAEs) of the form

$$\dot{x} = f(x, y, u) \tag{9}$$

$$g(x, y) = 0 \tag{10}$$

The set of differential Equation (9) describes the behavior of all dynamic elements of a power grid like generators, exciters, governors, turbines, etc. The set of algebraic Equation (10) describes the power grid network connectivity, and all the static elements, i.e., static load. The x represents the system dynamic state variables of the power grid, and is dependent on the level of models of the dynamic elements [

The classical model primarily focuses on modeling the rotor angle and angular velocity of the generator when subjected to a disturbance. When the power system is subjected to a disturbance, the rotor of the synchronous generator will accelerate or decelerate with respect to rotating magnetic field which causes relative motion. The relative motion of electromechanical oscillations of the synchronous generator is represented as “Swing equation” [

$$\frac{H}{\pi f_o} \frac{d^2 \delta}{dt^2} = P_m - P_e = P_a \tag{11}$$

where,

$H$ is the inertia constant (MJ/MVA).

$P_m$ is the mechanical input power.

$P_e$ is the electrical output power, where $P_e = \frac{E' V}{X} \sin\delta$.

$E'$ is the internal EMF of the generator.

$V$ is the terminal voltage.

$X = X'_d + X_t + X_l$

$X'_d$ is the d axis transient reactance.

$X_t$ is the transformer reactance.

$X_l$ is the line reactance.

The difference $P_m - P_e$ is known as the accelerating power $P_a$.

$f_o$ is the nominal frequency.

$\omega_s = 2 \pi f_o$ is the rated angular speed.

$\delta$ is the rotor angle.

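$\Delta\omega = \frac{d\delta}{dt}$ is the relative speed, i.e., the angular velocity $\omega$ of the rotor measured with respect to the synchronously revolving magnetic field (reference frame), so that $\Delta\omega = \omega - \omega_s$.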

The variation of the two state variables δ and ω with respect to time due to a disturbance is mathematically modeled by two first order ODEs shown below:

$$\frac{d\delta}{dt} = \omega - \omega_s = \Delta\omega \tag{12}$$

$$\frac{d\Delta\omega}{dt} = \frac{\pi f_o}{H} P_a \tag{13}$$
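
As an illustration, the right-hand sides of Equations (12) and (13) can be written as a small Python function. This is a sketch under stated assumptions: $H$ and $X'_d$ are the classical-model values listed in the parameter table of this paper, while the transformer and line reactances, $P_m$, $E'$, $V$, $f_o$, and the modeling of the fault by setting $V = 0$ are hypothetical choices made only to give the function something to evaluate.

```python
import math

def classical_rhs(delta, d_omega, P_m, E_prime, V, X, H, f_o):
    """Right-hand sides of Equations (12) and (13) for the classical model."""
    P_e = (E_prime * V / X) * math.sin(delta)  # electrical output power
    P_a = P_m - P_e                            # accelerating power
    return d_omega, math.pi * f_o * P_a / H

# H and X'_d are taken from the paper's classical-model parameter table;
# every other value below is a hypothetical operating point.
H, X_d_prime = 5.0, 0.3
X = X_d_prime + 0.1 + 0.1   # assumed X_t = X_l = 0.1 pu
P_m, E_prime, V, f_o = 0.8, 1.1, 1.0, 60.0

# Pre-fault equilibrium angle, where P_e equals P_m.
delta_0 = math.asin(P_m * X / (E_prime * V))
at_rest = classical_rhs(delta_0, 0.0, P_m, E_prime, V, X, H, f_o)

# Assumed fault model: the electrical output collapses to zero (V = 0).
during_fault = classical_rhs(delta_0, 0.0, P_m, E_prime, 0.0, X, H, f_o)
```

At the equilibrium angle both derivatives vanish, whereas during the assumed fault the whole mechanical power accelerates the rotor.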

The detailed model of a synchronous generator addresses the direct and quadrature axis parameters of a synchronous generator taking into account the saliency. A salient pole synchronous generator is represented with the steady state and transient reactances on both direct and quadrature axis along with corresponding voltages and currents. Four-time dependent ODEs [

$$\frac{d\delta}{dt} = \omega - \omega_s = \Delta\omega \tag{14}$$

$$T'_{d0} \frac{dE'_q}{dt} = -E'_q - (X_d - X'_d)\, i_d + E_{fd} \tag{15}$$

$$T'_{q0} \frac{dE'_d}{dt} = -E'_d + (X_q - X'_q)\, i_q \tag{16}$$

$$\frac{H}{\pi f_o} \frac{d\omega}{dt} = T_m - T_e - D(\omega - \omega_s) \tag{17}$$

where,

$E'_d$ and $E'_q$ are the transient voltages along the direct (d) and quadrature (q) axes, respectively, of the generator.

$i_d$ and $i_q$ are the stator currents of the d and q axes, respectively.

$D$ is the damping constant.

$X_d$ and $X_q$ are the d and q axis synchronous reactances, respectively.

$X'_d$ and $X'_q$ are the d and q axis transient reactances, respectively.

$T'_{d0}$ and $T'_{q0}$ are the open-circuit transient time constants for the d and q axes.

$T_m$ and $T_e$ are the mechanical and electrical torque, respectively.

The electrical torque $T_e$ is given by Equation (18):

$$T_e = E'_d i_d + E'_q i_q + (X'_q - X'_d)\, i_q i_d \tag{18}$$

To compute $T_e$ using Equation (18), two algebraic equations involving the stator parameters of the generator have to be solved.
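
A possible Python rendering of Equations (14) through (18) is sketched below. Because the two stator algebraic equations are not reproduced in the text, the stator currents $i_d$ and $i_q$ are simply passed in as arguments here. The reactances, time constants, and $H$ are the values from the parameter table of this paper that lists $X'_q$ and $T'_{q0}$; the values of $f_o$, $D$, $E_{fd}$, and $T_m$, as well as all identifier names, are hypothetical.

```python
import math

# Reactances, time constants and H from the paper's detailed-model table;
# f_o, D, E_fd and T_m are hypothetical values used only for illustration.
PARAMS = {
    "H": 3.74, "Xd_t": 0.23, "Xq_t": 0.5, "Xd": 1.93, "Xq": 1.77,
    "Td0": 5.2, "Tq0": 0.81,
    "f_o": 60.0, "D": 0.0, "E_fd": 1.0, "T_m": 0.0,
}

def detailed_rhs(state, i_d, i_q, p=PARAMS):
    """Derivatives (d_delta, d_omega, d_E_q, d_E_d) for given stator currents."""
    delta, omega, E_q, E_d = state
    omega_s = 2.0 * math.pi * p["f_o"]
    # Electrical torque, Equation (18).
    T_e = E_d * i_d + E_q * i_q + (p["Xq_t"] - p["Xd_t"]) * i_q * i_d
    d_delta = omega - omega_s                                            # (14)
    d_E_q = (-E_q - (p["Xd"] - p["Xd_t"]) * i_d + p["E_fd"]) / p["Td0"]  # (15)
    d_E_d = (-E_d + (p["Xq"] - p["Xq_t"]) * i_q) / p["Tq0"]              # (16)
    d_omega = (math.pi * p["f_o"] / p["H"]) * (
        p["T_m"] - T_e - p["D"] * (omega - omega_s))                     # (17)
    return d_delta, d_omega, d_E_q, d_E_d

# Hypothetical no-load state at synchronous speed: delta, omega, E'_q, E'_d.
no_load_state = (0.0, 2.0 * math.pi * 60.0, 1.0, 0.0)
no_load = detailed_rhs(no_load_state, i_d=0.0, i_q=0.0)
loaded = detailed_rhs(no_load_state, i_d=0.0, i_q=1.0)
```

In a full simulation the currents would come from solving the stator equations together with the network at each step; keeping them as inputs makes that coupling explicit without guessing its form.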

Therefore, the set of equations modeling the dynamics of the power system consists of two first-order ordinary differential equations with the classical model of the synchronous generator, and of four first-order ordinary differential equations along with three algebraic equations with the detailed model. The TSA of a power system following a disturbance involves solving the set of first-order ODEs of each generator in the power system using a suitable numerical integration method. Since the generator ODEs are typically stiff, the time step used in the numerical integration method has to be small to compute an accurate solution and avoid numerical instability.

Depending on the level of modeling, the number of ODEs modeling the dynamics of a generator can vary from two to twenty-seven when all of the control devices like the exciter, governor, turbine, stabilizer, etc., are included. For a typical power grid with thousands of generators, the TSA involves numerically integrating more than ten thousand ODEs to determine the stability of the grid, which motivates the use of the Parareal algorithm executing on GPUs.

The variation in time of the two state variables $\delta$ and $\omega$ due to a disturbance is determined by solving Equations (12) and (13) in the case of the classical model and Equations (14) and (17) in the case of the detailed model using a suitable numerical integration method. The ODEs representing the classical and detailed models, being stiff, require the use of an explicit integration method like the modified Euler’s method. The modified Euler’s method, also known as the predictor-corrector method, is used by both the coarse and fine propagators. It is a single-step method: given the initial values for an interval $(t_{n-1}, t_n)$, the approximate solution at $t_n$ is obtained in two steps:

Step 1: Predictor

In this step, the approximate solution $y_n^p$ is computed using the explicit Euler’s method with time step size $h$, as described by Equation (19).

$$y_n^p = y_{n-1} + h f(x_{n-1}, y_{n-1}) \tag{19}$$

where $f(x_{n-1}, y_{n-1})$ is the slope of the tangent at the point $(x_{n-1}, y_{n-1})$.

Step 2: Corrector

Using the predicted solution $y_n^p$ from Step 1, the corrected solution $y_n$ is computed using Equation (20). The correction involves calculating the average of the slopes at the points $(x_{n-1}, y_{n-1})$ and $(x_n, y_n^p)$, multiplying it by $h$, and adding the result to the corrected solution of the previous time step.

$$y_n = y_{n-1} + \frac{h}{2} \left\{ f(x_{n-1}, y_{n-1}) + f(x_n, y_n^p) \right\} \tag{20}$$

Therefore, at each time step, an approximate solution is first computed and then a corrector is applied to improve the approximate solution of the state variables.
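
The predictor and corrector of Equations (19) and (20) translate directly into a few lines of Python. The sketch below is for a scalar ODE $y' = f(x, y)$; the function name and the test problem $y' = y$, $y(0) = 1$ are assumptions chosen for illustration and do not come from the paper.

```python
import math

def modified_euler_step(f, x_prev, y_prev, h):
    """One step of the modified Euler's method for y' = f(x, y)."""
    slope_start = f(x_prev, y_prev)
    y_pred = y_prev + h * slope_start            # predictor, Equation (19)
    slope_end = f(x_prev + h, y_pred)
    return y_prev + 0.5 * h * (slope_start + slope_end)  # corrector, Equation (20)

# Hypothetical test problem: y' = y with y(0) = 1, whose exact solution is exp(x).
growth = lambda x, y: y
first_step = modified_euler_step(growth, 0.0, 1.0, 0.1)

# Integrate to x = 1 with 100 steps of size h = 0.01.
y, h = 1.0, 0.01
for n in range(100):
    y = modified_euler_step(growth, n * h, y, h)
error_at_one = abs(y - math.e)
```

For the first step the predictor gives 1.1 and the corrector averages the slopes 1 and 1.1, so the step returns 1 + 0.05 × 2.1 = 1.105.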

General Purpose Computing on GPU (GPGPU) is a well-established parallelization domain to accelerate scientific and engineering computations in a number of fields. NVIDIA’s Compute Unified Device Architecture (CUDA) is the most widely adopted programming model for GPGPU. In the research [

The pseudocode of the PRA implementation for GPGPU is shown in

For the GPGPU implementation, the sequential steps of the PRA are executed on the host (CPU) and the parallel step of the PRA is executed on the device or accelerator (GPU). First, the coarse solutions computed on the host are copied from host to device for use by the fine propagators. After the fine solutions are computed on the device, they are copied back from device to host for the predictor-corrector step. The corrected coarse values on the host are again copied to the device for the next iteration of the fine propagators. Therefore, the memory transfers back and forth between host and device in each iteration contribute to an increase in the computation time. The focus of this work at this stage is on determining the suitability of PRA for solving ODEs on GPUs using CUDA; optimization techniques to reduce the latency due to memory transfers and the use of low-latency shared and constant memories on the GPU are not addressed.

The performance of the PRA is demonstrated by studying the dynamics of a single machine infinite bus system (SMIB) shown in

The generator in

The coefficients of the equations 12 through 17 of classical and detailed mathematical models of a generator are computed using the generator model parameters [

In

The PRA is implemented on a server with an Intel Xeon CPU E5-2670 @ 2.30 GHz, interfaced through the PCIe bus to an NVIDIA Quadro RTX 6000 GPU hosting 4608 computing cores with 24 GB of GPU memory [

TSA simulations using both classical and detailed generator models were performed using both the sequential algorithm and PRA. First, the variation of the rotor angle of the generator with respect to time due to a disturbance, computed with the traditional sequential and Parareal algorithms, is compared to analyze the accuracy of PRA while satisfying a convergence tolerance of 0.01 radians or 0.57˚. Next, the impact of the number of coarse propagators and fine propagators on speedup is analyzed.

Parameter | $H$ | $X'_d$ |
---|---|---|
Numerical Values | 5 MJ/MVA | 0.3 pu |

Parameter | $H$ | $X'_d$ | $X'_q$ | $X_d$ | $X_q$ | $T'_{d0}$ | $T'_{q0}$ |
---|---|---|---|---|---|---|---|
Numerical Values | 3.74 MJ/MVA | 0.23 pu | 0.5 pu | 1.93 pu | 1.77 pu | 5.2 s | 0.81 s |

The classical generator model has only two state variables, the rotor angle $\delta$ and the rotor speed $\omega$, which result in two ODEs that need to be solved at every time step. A three-phase (3φ) to ground fault was simulated on one of the transmission lines of the SMIB at time 0.5 secs, and the fault is cleared at time 0.8 secs by isolating the faulted line from the rest of the SMIB system. Since the ODEs are derived from the classical generator model, the TSA is performed to determine the first swing stability of the SMIB after experiencing a disturbance or fault. In

The second set of simulations was performed with a fault clearing time of 1.0 secs which is larger than the critical clearing time of 0.42 secs. Critical clearing time is the time before which a fault has to be cleared for the power system to transit to a stable steady state. Also, the critical clearing time is fault and system steady state dependent. In

numerically). Furthermore, these numerically unstable values are used as the initial conditions for the fine propagators, resulting in an amplification of the numerical instability. The numerical instability negatively affects the performance of the predictor-corrector stage of the PRA, resulting in an increasing error. This behavior is expected as the system is unstable.

The ODEs of the detailed model of a generator, which incorporates the saliency and transient reactances, are typically stiffer than the ODEs of the classical model. Due to the stiffness, the maximum time step that could be used without the solution diverging for both the sequential simulation and the coarse propagator was found to be 70 msecs, compared to 98 msecs for the classical model. Therefore, the simulation parameters, i.e., the coarse and fine propagator time steps, the fault location and type, and the fault duration, are identical to those of the simulations using the classical model. The simulations with the detailed model are carried out for a long period of time to study the effect of saliency and damping.

In

In

The performance of PRA is analyzed using the execution time speedup achieved with respect to the traditional sequential algorithm. The speedup is given by Equation (21).

$$\text{speedup} = \frac{T_{seq}}{T_{PRA}} \tag{21}$$

where,

$T_{seq}$ is the computation time of the sequential algorithm.

$T_{PRA}$ is the execution time of the PRA.

$T_{PRA}$ is defined as the execution time since, in addition to the initial coarse propagator time, it sums four time components in each iteration, as shown in Equation (22)

$$T_{PRA} = t_{Hc} + \sum_{i=1}^{N} \left( t_{HG} + t_{Gf} + t_{GH} + t_{Hpc} \right) \tag{22}$$

where,

$t_{Hc}$ is the computation time of the coarse propagator on the host.

$t_{HG}$ is the memory transfer latency between the host and the GPU.

$t_{Gf}$ is the computation time of the fine propagators on the GPU.

$t_{GH}$ is the memory transfer latency between the GPU and the host.

$t_{Hpc}$ is the computation time of the predictor-corrector on the host.

$N$ is the number of iterations.
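
Equations (21) and (22) amount to simple bookkeeping, which the following Python sketch makes concrete. The per-iteration component times are hypothetical numbers, not measurements from this work; the (sequential, CUDA) totals are the execution times reported in the classical-model results table of this paper. Function and variable names are illustrative.

```python
def pra_execution_time(t_Hc, iterations):
    """Equation (22): coarse sweep time plus the per-iteration components.

    `iterations` holds one (t_HG, t_Gf, t_GH, t_Hpc) tuple per PRA iteration.
    """
    return t_Hc + sum(t_HG + t_Gf + t_GH + t_Hpc
                      for t_HG, t_Gf, t_GH, t_Hpc in iterations)

def speedup(t_seq, t_pra):
    """Equation (21)."""
    return t_seq / t_pra

# Hypothetical component times in msecs for three iterations.
t_pra = pra_execution_time(0.02, [(0.01, 0.03, 0.01, 0.005)] * 3)

# Measured totals in msecs (sequential, CUDA) from the classical-model table.
measured = [(1.778, 0.183), (3.594, 0.242), (7.105, 0.283)]
speedups = [round(speedup(seq, cuda), 1) for seq, cuda in measured]
```

Because the two transfer terms are paid in every iteration, they grow with the iteration count even when the fine propagator time on the GPU stays small.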

The coarse propagator computation time depends on the coarse propagator time step $t_{step}^{c}$ and the fixed interval of time $T$ for which the ODEs are solved. For a fixed $T$, the coarse propagator computation time increases with smaller $t_{step}^{c}$. The memory transfer latencies $t_{HG}$ and $t_{GH}$ also depend on the coarse propagator time step $t_{step}^{c}$. The number of fine propagators $N_f$ corresponding to a coarse propagator time step $t_{step}^{c}$ and a given $T$ is

$$N_f = \frac{T}{t_{step}^{c}} \tag{23}$$

By varying $N_f$, the number of threads executing in parallel on the GPU cores is varied, and by varying the fine propagator time step $t_{step}^{f}$, the computation load of each thread is varied.
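
As a quick arithmetic check of Equation (23), the short Python sketch below computes the number of fine propagators and the number of fine steps each one performs. The simulated interval of 2.56 secs is not stated in the text; it is inferred here from the classical-model results table (128 fine propagators at a coarse step of 20 msecs) and should be read as an assumption, as should the function names.

```python
def num_fine_propagators(T, t_step_c):
    """Equation (23): one fine propagator (GPU thread) per coarse sub-interval."""
    return round(T / t_step_c)

def fine_steps_per_propagator(t_step_c, t_step_f):
    """Number of fine steps each thread integrates within its sub-interval."""
    return round(t_step_c / t_step_f)

T = 2.56  # secs; inferred from the results table, not stated in the text

# (coarse step, fine step) pairs in seconds, taken from the classical-model table.
rows = [(20.0e-3, 200.0e-6), (10.0e-3, 100.0e-6), (5.0e-3, 50.0e-6)]
threads = [num_fine_propagators(T, c) for c, _ in rows]
steps = [fine_steps_per_propagator(c, f) for c, f in rows]
```

Halving the coarse step doubles the thread count, while keeping the ratio of coarse to fine step fixed leaves the work per thread unchanged.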

The speedup achieved using the PRA is demonstrated through a number of simulations with varying $t_{step}^{c}$ or $N_f$, and $t_{step}^{f}$. In

$t_{step}^{c}$ (msecs) | $t_{step}^{f}$ (usecs) | $N_f$ | Sequential execution time (msecs) | CUDA execution time (msecs) | Speedup |
---|---|---|---|---|---|
20.0 | 200.0 | 128 | 1.778 | 0.183 | 9.7 |
10.0 | 100.0 | 256 | 3.594 | 0.242 | 15 |
5.0 | 50.0 | 512 | 7.105 | 0.283 | 25 |

resulting in a speedup of 25×. The PRA on GPU provides better performance when the fine propagator computation load is large, i.e., with a smaller $t_{step}^{f}$.

In

In

$t_{step}^{c}$ (msecs) | $t_{step}^{f}$ (usecs) | $N_f$ | Sequential execution time (msecs) | CUDA execution time (msecs) | Speedup |
---|---|---|---|---|---|
10.0 | 100.0 | 2560 | 40.786 | 1.875 | 21.7 |
5.0 | 50.0 | 5120 | 72.57 | 2.728 | 26.6 |
2.0 | 20.0 | 12800 | 159.665 | 5.043 | 31.7 |
1.0 | 10.0 | 25600 | 289.96 | 9.714 | 30.0 |

at each time step is twice that with the classical model.

In

Therefore, from

TSA performed using the time-domain solution approach is a compute-intensive problem and is typically conducted offline by the utilities. In this paper, the use of PRA to solve the ODEs of two synchronous generator models of an SMIB test system to perform TSA using GPUs has been demonstrated successfully. The PRA was evaluated for accuracy with both stable and unstable cases of the test system. The absolute error between the ODE solutions computed by PRA and by the sequential algorithm is very small, demonstrating the accuracy of the PRA. The PRA speedup achieved using GPUs demonstrated that the numerical integration computation time can be significantly reduced in comparison to traditional sequential numerical integration. However, PRA is an iterative algorithm, and its performance can be impacted by the significant amount of memory transfers between the host and device for systems with higher-order generator models. In future work, various methods will be explored to mitigate the memory transfers between the host and device, and the PRA will be tested with higher-order generator models for large power systems.

This work was supported in part by the Department of Energy under grant DE-SC0012671.

The authors declare no conflicts of interest regarding the publication of this paper.

Lakshmiranganatha, A. and Muknahallipatna, S.S. (2020) Graphical Processing Unit Based Time-Parallel Numerical Method for Ordinary Differential Equations. Journal of Computer and Communications, 8, 39-63. https://doi.org/10.4236/jcc.2020.82004