Time-spectral solution of ordinary and partial differential equations is often regarded as an inefficient approach. The associated extension of the time domain, as compared to finite difference methods, is believed to result in uncomfortably many numerical operations and high memory requirements. It is shown in this work that performance is substantially enhanced by the introduction of algorithms for temporal and spatial subdomains in combination with sparse matrix methods. The accuracy and efficiency of the recently developed time-spectral generalized weighted residual method (GWRM) are compared to those of the explicit Lax-Wendroff and implicit Crank-Nicolson methods. Three initial-value PDEs are employed as model problems: the 1D Burger equation, a forced 1D wave equation and a coupled system of 14 linearized ideal magnetohydrodynamic (MHD) equations. It is found that the GWRM is more efficient than the time-stepping methods at high accuracies. The advantageous scalings N_{t}^{1.0}N_{s}^{1.43} and N_{t}^{0.0}N_{s}^{1.08} were obtained for CPU time and memory requirements, respectively, with N_{t} and N_{s} denoting the number of temporal and spatial subdomains. For time-averaged solution of the two-time-scales forced wave equation, GWRM performance exceeds that of the finite difference methods by an order of magnitude, both in terms of CPU time and memory requirements. Favorable subdomain scaling is demonstrated for the MHD equations, indicating a potential for efficient solution of advanced initial-value problems in, for example, fluid mechanics and MHD.

In time-spectral methods for time-dependent ordinary and partial differential equations, a spectral representation is employed for the temporal domain. As an alternative to standard finite time differencing, this approach has been studied by several authors [

The focus is here on the recently developed Generalized Weighted Residual Method (GWRM), where truncated Chebyshev expansions are employed [

In the GWRM, not only temporal and spatial but also physical parameter domains may be treated spectrally using Chebyshev polynomials, which is of interest for determining parameter scaling dependences in a single computation. How this works becomes clear as the method is briefly described in the next section.

Various suggestions related to spectral methods in time have been put forth in the literature. As early as 1979, a pseudo-spectral method, based on iterative calculation and an approximate factorization of the given equations, was suggested [

In 1986 and 1989, Tal-Ezer [

In 1987, a “double spectral method”, with polynomial spectral functions in both space and time variables was suggested for nonlinear diffusion equations [

Ierley et al. [

In 1994, Bar-Yoseph et al. [

A theoretical analysis of Chebyshev solution expansion in time and one-dimensional space, for equal spectral orders, was given in [

Time-spectral methods feature high order accuracy in time. For primarily implicit finite difference methods, deferred correction methods may provide high order temporal accuracy [

A relatively recent approach to increase the temporal efficiency of finite difference methods is time-parallelization via the parareal algorithm [

A time-spectral method for periodic unsteady computations, using a Fourier representation in time, was suggested in [

Returning to the issue of efficiency, most of the GWRM computational effort is spent in solving the system of linear or nonlinear (depending on the type of problem) algebraic equations for the Chebyshev series coefficients. Iterative root solvers require either the computation of an inverse matrix or the solution of an equivalent matrix equation. As a simple example, consider GWRM solution of a 1D initial-value PDE, employing Chebyshev polynomials of order K and L in time and space, respectively. Then Ω = [(K + 1)(L + 1)]^{3} numerical operations are typically required for matrix inversion, and Ω/3 operations using LU decomposition for solving the corresponding matrix equation [

The paper is arranged as follows. In the next section, a short introduction to the GWRM is provided. In Section 3, several methods for improved GWRM efficiency are presented. These will, in turn, be implemented as we compare the efficiency of the GWRM with that of explicit and implicit finite difference methods in Section 4. The paper ends with a discussion and conclusions.

We may write a system of parabolic or hyperbolic initial-value partial differential equations symbolically as

∂u/∂t = Du + f (1)

where u = u(t, x; p) is the solution vector, D is a linear or nonlinear matrix operator and f = f(t, x; p) is an explicitly given source (or forcing) term. Note that D may depend on both physical variables (t, x and u) and physical parameters (denoted p) and that f is assumed arbitrary but independent of u. Initial conditions u(t_{0}, x; p) as well as (Dirichlet, Neumann or Robin) boundary conditions u(t, x_{B}; p) are assumed known.

Our aim is to determine a spectral solution of Equation (1), using Chebyshev polynomials [

u(t, x; p) = ∑′_{k=0}^{K} ∑′_{l=0}^{L} ∑′_{m=0}^{M} a_{klm} T_{k}(t) T_{l}(x) T_{m}(p). (2)

The Chebyshev polynomials of the first kind (henceforth simply referred to as Chebyshev polynomials) are defined by T_{n}(x) = cos(n arccos(x)). These are real ordinary polynomials of degree n, orthogonal on the interval [−1, 1] with respect to the weight w_{x} = (1 − x^{2})^{−1/2}. Thus T_{0}(x) = 1, T_{1}(x) = x, T_{2}(x) = 2x^{2} − 1 and so forth. Extension to arbitrary computational intervals for t, x and p is described in [
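The cosine definition and the standard three-term recurrence T_{n+1}(x) = 2xT_{n}(x) − T_{n−1}(x) give identical values; a minimal Python sketch (our own illustration, not part of the authors' Maple implementation) checks this:

```python
import math

def cheb_T(n, x):
    """Chebyshev polynomial T_n(x) via the three-term recurrence
    T_{n+1} = 2x T_n - T_{n-1}, with T_0 = 1 and T_1 = x."""
    t_prev, t_curr = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

# The cosine form T_n(x) = cos(n arccos x) must agree on [-1, 1]:
for n in range(6):
    for x in (-0.9, -0.3, 0.0, 0.5, 1.0):
        assert abs(cheb_T(n, x) - math.cos(n * math.acos(x))) < 1e-12
```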

As in standard WRM a residual R is defined as, here using the Picard integral formulation of Equation (1):

R ≡ u(t, x; p) − [u(t_{0}, x; p) + ∫_{t_{0}}^{t} {Du + f} dt′] (3)

The coefficients a_{klm} of the Chebyshev series are subsequently determined from the set of algebraic equations generated by requiring that the residual satisfies the Galerkin WRM condition over the full computational domain:

∫_{t_{0}}^{t_{1}} ∫_{x_{0}}^{x_{1}} ∫_{p_{0}}^{p_{1}} R T_{q}(τ(t)) T_{r}(ξ(x)) T_{s}(P(p)) w_{t} w_{x} w_{p} dt dx dp = 0. (4)

Interval variables τ, ξ and P are defined in [
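To make the Picard-integral construction of Equations (3)-(4) concrete, the following Python sketch (a toy model under our own conventions, not the authors' Maple code) solves the single ODE du/dt = −u, u(0) = 1, on t ∈ [0, 1] by iterating the integral form directly on the Chebyshev coefficients; the factor 1/2 maps t ∈ [0, 1] onto τ ∈ [−1, 1].

```python
import math

def cheb_integrate(a):
    """Indefinite integral of sum a_k T_k(tau) in Chebyshev space on [-1, 1]:
    b_1 = a_0 - a_2/2 and b_k = (a_{k-1} - a_{k+1}) / (2k) for k >= 2."""
    K = len(a) - 1
    b = [0.0] * (K + 1)
    if K >= 1:
        b[1] = a[0] - (a[2] / 2.0 if K >= 2 else 0.0)
    for k in range(2, K + 1):
        b[k] = (a[k - 1] - (a[k + 1] if k + 1 <= K else 0.0)) / (2.0 * k)
    return b

def cheb_eval(a, tau):
    return sum(a[k] * math.cos(k * math.acos(tau)) for k in range(len(a)))

u0, K = 1.0, 12
a = [0.0] * (K + 1)
a[0] = u0                                 # initial guess: the initial condition
for _ in range(60):                       # Picard iteration of the integral form
    b = cheb_integrate([-c for c in a])   # here D u = -u in spectral space
    b = [0.5 * c for c in b]              # dt/dtau factor for t in [0, 1]
    b[0] += u0 - cheb_eval(b, -1.0)       # enforce u(t = 0) = u0
    a = b

print(cheb_eval(a, 1.0))                  # close to exp(-1) = 0.36788...
```

In the GWRM proper, the corresponding algebraic system for the coefficients is solved by the root solver SIR rather than by plain fixed-point iteration.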

All computations are performed using the computer mathematics program Maple. The GWRM is easily coded in languages like Matlab or Fortran, but absolute computational speed is not important for the comparisons with finite difference methods made here; rather it is important that all comparisons are carried out within the same computational environment.

An early implementation of the GWRM was compared with finite difference methods for solving two elementary initial-value problems in [

The 1D Burger equation, being related to problems in fluid mechanics and magnetohydrodynamics (MHD), is

∂u/∂t = −u ∂u/∂x + υ ∂^{2}u/∂x^{2} (5)

where υ can be interpreted as a (kinematic) viscosity. For comparisons, we use an exact solution of this equation [

The 1D forced wave equation being solved is

∂^{2}u/∂t^{2} = υ ∂^{2}u/∂x^{2} + f(t, x) (6)

u(t, 0) = u(t, 1) = 0

u(0, x) = sin(nπx)

∂u/∂t (0, x) = αA sin(βx)

where the forcing function is f(t, x) = A(υβ^{2} − α^{2}) sin(αt) sin(βx). The exact solution is u(t, x) = cos(nπυ^{0.5}t) sin(nπx) + A sin(αt) sin(βx), featuring two time scales; we choose the driving term time scale to be much longer than the intrinsic time scale, the respective ratio being R = α/(nπυ). The primary aim here was to average out the fast time scale behavior in order to generate approximate solutions following the slower time scale. For similar accuracy, the GWRM was here about 10 times faster than Lax-Wendroff and 30 times faster than Crank-Nicolson.
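As a consistency check, the quoted exact solution can be substituted into Equation (6) numerically. The sketch below (our own illustration, with arbitrarily chosen parameter values rather than the paper's run parameters) approximates u_tt and u_xx by central differences and compares u_tt − υu_xx with f:

```python
import math

# Arbitrary illustrative parameters (not taken from the paper's runs)
nu, A, alpha, beta, n = 1.0, 10.0, 2.0 * math.pi, 3.0 * math.pi, 3

def u(t, x):
    """Exact solution: intrinsic wave plus driven response."""
    return (math.cos(n * math.pi * math.sqrt(nu) * t) * math.sin(n * math.pi * x)
            + A * math.sin(alpha * t) * math.sin(beta * x))

def f(t, x):
    """Forcing term of Equation (6)."""
    return A * (nu * beta**2 - alpha**2) * math.sin(alpha * t) * math.sin(beta * x)

h = 1e-3  # central second differences, O(h^2) accurate
for (t, x) in [(0.3, 0.2), (1.1, 0.7), (2.4, 0.55)]:
    u_tt = (u(t + h, x) - 2.0 * u(t, x) + u(t - h, x)) / h**2
    u_xx = (u(t, x + h) - 2.0 * u(t, x) + u(t, x - h)) / h**2
    assert abs(u_tt - nu * u_xx - f(t, x)) < 5e-2  # residual within FD error
```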

In the following, we will present algorithm improvements that substantially enhance the performance of the GWRM for the 1D Burger and the 1D forced wave equations described above. Comparisons will be made with the explicit Lax-Wendroff and implicit Crank-Nicolson methods. Although more efficient time stepping methods for the model problems have been developed, these are chosen as well known reference methods.

Furthermore, GWRM performance improvements for a third, advanced problem will be studied: the set of 14 (actually 7 complex) linearized ideal MHD equations modelling the stability of a magnetically confined plasma.

How, then, is the GWRM made more efficient? The measures taken fall into two categories: a) optimal adaptation of the root solver SIR to the GWRM and b) streamlining of the GWRM itself. Below we present the ideas and algorithms developed for these categories; performance results are given in the next section.

ODEs and PDEs can be solved globally by the GWRM scheme given in section 2 using single spatial and temporal domains. High resolution then requires high modal numbers K and L (we let M = 0 in this paper) which in turn results in a large set of N = ( K + 1 ) ( L + 1 ) nonlinear or linear algebraic equations to be solved simultaneously by SIR. A natural step to avoid the corresponding cubic and quadratic dependencies on N for the number of operations and memory storage, respectively, would be to divide the physical domain into coupled subdomains in space and time.

Substantial CPU time would be saved if the subdomain equations could be computed independently to some extent. Attempts to update the spatial domains independently at each iteration, using previous iterates for boundary conditions only, were however found to be only partially successful [

The root solver SIR [

S1. Matrix and vector numerical package. It is important that the computational environment includes efficient packages for standard operations on vectors and matrices. In Maple, the transition from the linalg to the LinearAlgebra package resulted in faster handling of the matrix equations. Certain packages, like VectorCalculus, should not be loaded globally, since they slow down computations.

S2. Solution of matrix equations. In SIR, the matrix equation x = A(x − φ) + φ is solved iteratively, where the vector x contains the Chebyshev coefficients of the solution u, φ is a vector with components that are functions of the coefficients, and A is a linear matrix operator computed to provide optimal convergence at each iteration. To determine A, a linear matrix equation involving the system Jacobian J ≡ ∂(x − φ)/∂x needs to be solved. A large fraction of the GWRM CPU time is spent here. Using LU decomposition to solve this system, instead of inverting J, the number of operations for large matrices becomes Ω/3 rather than Ω. For small matrices, however, inversion turns out to be faster; thus there is an option to choose either method.
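The gain from LU decomposition can be illustrated with a compact Doolittle factorization in Python (a generic sketch, unrelated to the Maple internals): the O(N^{3}) factorization is performed once, after which each right-hand side costs only O(N^{2}) forward and back substitution.

```python
def lu_decompose(A):
    """In-place Doolittle LU factorization (no pivoting; assumed stable for
    the given matrix). L has unit diagonal and shares storage with U."""
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                 # multiplier, stored in L part
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A

def lu_solve(LU, b):
    """Solve A x = b given the packed LU factors of A."""
    n = len(LU)
    y = b[:]
    for i in range(n):                         # forward substitution, L y = b
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in reversed(range(n)):               # back substitution, U x = y
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [5.0, 5.0, 3.0]
x = lu_solve(lu_decompose([row[:] for row in A]), b)   # x = [1, 1, 1]
```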

S3. Choice of equation solver mode. For many problems, SIR can be run as Newton’s method, since sufficient convergence is achieved and fewer iterations are needed. For improved convergence, SIR default settings [

S4. Effect of A matrix on convergence. When solving linear algebraic equations, the matrix A needs to be computed only for the first domain, provided that the domains are equidistant in time, and can then be re-used for the following time domains. This fact is extremely useful when dividing the temporal domain of the problem into subdomains. Nonlinear PDEs usually require at least 5 - 10 iterations. For the last few iterations, however, the A matrix is nearly constant. Thus substantial CPU time is saved by computing A in the first few iterations only; even beautiful houses can be built with ugly scaffolds.

S5. Band matrix methods. Sparse, band-shaped Jacobian matrices J occur in problems where many spatial subdomains are employed, because only neighboring domains are analytically coupled. The Maple LinearAlgebra package has built-in algorithms that automatically handle sparse matrix equations efficiently.
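For the tridiagonal structure that arises when only nearest-neighbor subdomains couple, a banded solver reduces the cost from O(N^{3}) to O(N). A standard Thomas-algorithm sketch in Python (our own illustration; in Maple this handling is automatic):

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(n): lower diagonal a (a[0] unused),
    main diagonal b, upper diagonal c (c[n-1] unused), right-hand side d."""
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                       # forward elimination sweep
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = (c[i] / m) if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in reversed(range(n - 1)):            # back substitution sweep
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Small test system with known solution x = [1, 1, 1]:
a = [0.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0]
c = [-1.0, -1.0, 0.0]
d = [1.0, 0.0, 1.0]
x = thomas(a, b, c, d)
```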

S6. J matrix differentiation. The Jacobian J is obtained exactly by analytical differentiation of φ. This is a tedious procedure that, without optimization, may require more than 50% of the total GWRM CPU time for matrices of dimension about 3000 or higher. By implementing algorithms that differentiate the non-zero band matrix elements only, favorable scaling with the number of spatial subdomains is obtained for very large matrices.

S7. Spatial and temporal subdomain influence on φ. In particular for nonlinear problems, the components of φ may be lengthy and complex, thus being time-consuming to differentiate analytically. Significant speed is gained by the use of spatial and temporal subdomains, since then the same global accuracy may be obtained using lower order Chebyshev polynomial expansions in each subdomain, resulting in more manageable φ vectors for differentiation.

S8. Choice of initial vector x = x_{0}. As for all iterative methods, SIR convergence depends strongly on the choice of initial vector x_{0}; the closer to the solution, the faster the convergence. In GWRM computations, x_{0} is typically taken to be the initial condition or, when multiple time domains are used, the solution at the end of the previous time interval. Thus, if the temporal interval length is reduced, the solution vector x approaches the initial guess x = x_{0} arbitrarily closely, and GWRM convergence is guaranteed. In some computations particularly well-conditioned choices of x_{0} can be made. For example, in numerical weather prediction, several scenarios are computed with slightly different initial conditions in order to provide ensemble results. Rapid GWRM convergence can then be reached by using solutions x from previous computations as x_{0} [

Next follows a discussion of the measures taken to optimize the GWRM itself.

G1. Spatial and temporal subdomains. The use of spatial and temporal subdomains implies that the same accuracy can be retained with lower order Chebyshev polynomials in each domain. Optimistically, if this order could be reduced by half when halving the interval, a speed gain of about a factor of 4 would be obtained, because of the cubic dependence on the number of modes, even though the number of intervals is doubled. In reality, the story is more complicated and there is usually an optimum subinterval length [

G2. Overlapping spatial subdomains. In Chebyshev spectral methods it is preferable to use overlapping spatial subdomains rather than a procedure where function and functional derivative values are matched at the borders. A standard choice is two-point overlap (“hand-shake”). The reason is that the Chebyshev spectral space representation of derivatives is sensitive to the values of the higher order coefficients, which are only approximate both during early iterations and for solutions that need not be precisely calculated. The amount of overlap can be chosen arbitrarily; very small values (of order 10^{−6} of the spatial domain) are usually favorable for high accuracy. The number of overlap points required to preserve boundary condition information across the spatial domain is a function of the number of first order PDEs that are solved [

G3. Adaptive temporal subdomains. Time overlap is only used for the temporal domains when it enhances convergence, since accuracy is generally affected negatively. Adaptive time interval length, however, greatly enhances efficiency. Best results have been obtained by starting with a relatively long time interval; if convergence is not reached, the time interval is automatically reduced and a new computation is performed. The algorithm regularly strives to increase the time interval length, a procedure that is very effective in smooth computational terrain. It may be mentioned that this algorithm is very robust, since Chebyshev polynomials are limited to values in the interval [−1, 1]; thus the numerical values of the higher order Chebyshev coefficients directly measure convergence.
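The accept/reject logic can be sketched as follows (a schematic Python model of our own: a truncated series stands in for the Chebyshev expansion of each time interval, and the magnitude of the highest retained term plays the role of the highest order Chebyshev coefficient as convergence measure):

```python
import math

LAM, T, ORDER, TOL = -3.0, 2.0, 8, 1e-10

def solve_interval(u_start, dt):
    """Truncated series solution of du/dt = LAM*u over one interval; the
    magnitude of the last retained term estimates the unresolved content."""
    term, total = u_start, u_start
    for j in range(1, ORDER + 1):
        term *= LAM * dt / j
        total += term
    return total, abs(term)

t, u, dt = 0.0, 1.0, 1.0            # deliberately start with a long interval
while t < T - 1e-12:
    dt = min(dt, T - t)
    u_new, estimate = solve_interval(u, dt)
    if estimate > TOL:
        dt *= 0.5                   # reject: halve the interval and retry
        continue
    t, u = t + dt, u_new            # accept the interval ...
    dt *= 1.3                       # ... and strive for a longer one next
```

With these (hypothetical) parameter values the loop ends with u close to exp(−6), the exact solution at t = T.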

G4. Time parallelization. The use of spatial subdomains opens up the possibility of performing strongly parallel computations in each time interval. In an approach termed “the Common Boundary Condition method” (CBC), we solve the local physical equations of each subdomain in parallel at each iteration, whereas the global computation involves only the boundary equations that connect the domains. This promising procedure is relatively complex and will be reported elsewhere.

G5. Clenshaw’s algorithm. Nearly all GWRM computations take place in spectral space. The evaluation of a Chebyshev series, however, which may be needed for example when handling overlapping temporal domains, suffers from truncation errors at higher modal numbers. Clenshaw’s algorithm [
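For completeness, a Python version of Clenshaw's backward recurrence for evaluating a Chebyshev sum (standard algorithm; our own illustration):

```python
import math

def clenshaw(a, x):
    """Evaluate sum_{k=0}^{K} a_k T_k(x) without forming the T_k explicitly,
    via the backward recurrence b_k = a_k + 2x b_{k+1} - b_{k+2}."""
    b1 = b2 = 0.0
    for k in range(len(a) - 1, 0, -1):
        b1, b2 = a[k] + 2.0 * x * b1 - b2, b1
    return a[0] + x * b1 - b2

# Agrees with the direct cosine form on [-1, 1], but is numerically better
# behaved for long series:
a = [0.5, -0.2, 0.1, 0.05, -0.025]
for x in (-0.8, -0.1, 0.4, 0.9):
    direct = sum(a[k] * math.cos(k * math.acos(x)) for k in range(len(a)))
    assert abs(clenshaw(a, x) - direct) < 1e-12
```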

G6. End conditions. Since the GWRM is an acausal algorithm, initial conditions can be traded for end conditions for possible improvement of numerical stability. This potential avenue has, so far, only been explored for some simple cases, with neutral results.

Finally we would like to mention our present development of an adaptive spatial subdomain method that automatically widens the spatial domains in smooth regions and narrows them near sharp spatial gradients. The idea is to narrow the domains with the highest amplitudes of the highest order Chebyshev coefficients, since these indicate limited resolution. Substantially extended accuracy, with only marginally increased computing time, has been found for the Burger equation. Results will be reported elsewhere.

Early implementations of the GWRM were compared with finite difference methods with respect to convergence, accuracy and efficiency for the two model problems discussed above [

In [, the Burger equation with υ = 0.01 was solved for N_{t} = 1, K = 9, N_{s} = 2, L = 7, using a relatively fast algorithm where the spatial subdomains were solved independently at each iteration and coupled thereafter. We here and henceforth use the notation T = t_{1} − t_{0}, with t_{0} = 0. The number of time intervals is denoted by N_{t} and N_{s} is the number of spatial subdomains. The obtained run parameters were a CPU time of 2.48 s and a memory use of 182 MB. This algorithm is, however, often numerically unstable [. A stable solution with accuracy 1.0 × 10^{−3} required 14.1 s CPU time and 192 MB. The new, optimized code (employing the measures of Section 3) is substantially more efficient, using 1.27 s and 37.1 MB.

Comparing with finite difference methods, the same accuracy of 1.0 × 10^{−3} is obtained with the second order Lax-Wendroff method [

In summary, for an accuracy of 1.0 × 10^{−3}, the optimized GWRM solution of the Burger equation for υ = 0.01 required about half the CPU time of the explicit Lax-Wendroff method and only 15% of the memory. The semi-implicit Crank-Nicolson method needed the same amount of memory as the GWRM but was about two times faster. We now turn to comparisons at higher accuracies.

Using the optimized GWRM, again for υ = 0.01, an increased accuracy of 1.0 × 10^{−4} was obtained for T = 10, N_{t} = 5, K = 6, N_{s} = 5, L = 7, requiring 4 iterations for each time interval, 6.72 CPU s and 88.3 MB. The Lax-Wendroff method needed 57.4 CPU s and 1430 MB, using Δx = 1/200 and 8100 time steps. The corresponding parameters for the semi-implicit method were 28.6 CPU s, 456 MB, Δx = 1/400 and 4500 time steps. Increasing the accuracy to 1.0 × 10^{−5}, the GWRM provides a solution for T = 10, N_{t} = 12, K = 6, N_{s} = 8, L = 7, with 3 iterations for each time interval, in 32.3 CPU s using 195 MB of memory. This accuracy could be achieved with neither the Lax-Wendroff nor the Crank-Nicolson method within 180 CPU s or below 3000 MB of memory. As an example, 2.0 × 10^{−5} accuracy was reached by the latter method using Δx = 1/900 and 22,000 time steps in 472 CPU s, with 3390 MB memory use.

In summary, for υ = 0.01 and an accuracy of 1.0 × 10^{−4}, the optimized GWRM solution required 12% of the CPU time and 6% of the memory of the Lax-Wendroff method. When compared with the Crank-Nicolson method, the numbers become 23% and 19% for CPU time and memory requirements, respectively. The GWRM consequently strongly outperforms both finite difference methods at higher accuracies. For lower accuracies the finite difference methods are more competitive.

It is well known that spectral methods are often less efficient for problems where shocks or steep gradients must be resolved. This is confirmed for the stiffer 1D Burger case with υ = 0.003. A steep gradient towards x = 1 develops due to convection, as can be seen in the corresponding figure. The GWRM provides a 1.0 × 10^{−3} accurate solution for T = 10, N_{t} = 5, K = 6, N_{s} = 9, L = 7, with at most 4 iterations for each time interval, in 17.4 CPU s using 181 MB of memory. The Lax-Wendroff method requires, for the same accuracy, 2.75 CPU s and 180 MB, with Δx = 1/80 and 1000 time steps. Corresponding parameters for the semi-implicit method are 4.62 CPU s and 187 MB, using Δx = 1/80 and 4000 time steps. For an accuracy of 1.0 × 10^{−4}, the GWRM needs T = 10, N_{t} = 10, K = 6, N_{s} = 20, L = 7, with at most 4 iterations for each time interval, taking 153 CPU s and 306 MB of memory. The Lax-Wendroff method uses 73.2 CPU s and 1420 MB for the parameters Δx = 1/200 and 10,000 time steps, whereas the semi-implicit method uses 106 CPU s and 2040 MB for Δx = 1/300 and 20,000 time steps. Thus it is again seen that for high accuracy the GWRM becomes comparatively more efficient and much less memory demanding than the finite difference methods.

Of particular interest is GWRM CPU time and memory scaling with N_{t} and N_{s}. Using the case mentioned at the beginning of this section, we have performed scans where N_{t} ∈ [1, 15] and N_{s} ∈ [1, 15]. It was found that CPU time scales as N_{t}^{1.0}N_{s}^{1.43} and memory usage as N_{t}^{0.0}N_{s}^{1.08} (for N_{t} > 2). These scalings represent a substantial improvement over the cubic and quadratic scalings with N_{t}N_{s} for CPU time and memory, respectively, that hold for unoptimized code without subdomains (assuming that KN_{t} and LN_{s} global modes would be used instead).
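Exponents such as the 1.43 above are obtained by a least-squares fit of log(CPU time) against log N_s. A small Python sketch of that fitting step (with synthetic timings following an exact power law, not the paper's measured data):

```python
import math

def fit_exponent(n_values, times):
    """Least-squares slope of log(t) vs log(n), i.e. p in t ~ C * n^p."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic data generated with a known power law t = 0.5 * n^1.43:
ns = list(range(1, 16))
ts = [0.5 * n ** 1.43 for n in ns]
p = fit_exponent(ns, ts)   # recovers 1.43
```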

Finally, we consider which of the measures S1-S8, G1-G6 contribute most to the improved GWRM performance. Clearly the simultaneous use of temporal and spatial subdomains (G1, G2) is important through the avoidance of high order global temporal and spatial modes. The linear CPU time dependence on N_{t} (and the memory independence of N_{t}) is expected, whereas band matrix methods (S5), together with measures S1-S4 and S6-S7, contribute to the weak N_{s} dependence. The present Burger problem is easily solved by SIR, which converges also in Newton mode and is quite insensitive to the choice of initial vector x_{0} (S8). Measures G3-G6 were unimportant here. We may mention, however, that measure G3, automatic time interval adaption, may improve efficiency substantially in certain problems; for example, in a solution of three coupled, time-dependent and chaotic ODEs it leads to GWRM efficiency beyond that of fourth order Runge-Kutta methods [

The forced 1D wave equation studied in [

Focusing on efficiency in finding smooth, time-averaged solutions, accuracy is here determined by comparison with the slow time scale part of the exact solution, that is, the second term of u(t, x) = cos(nπυ^{0.5}t) sin(nπx) + A sin(αt) sin(βx). The optimized GWRM code solves the case of [ with N_{t} = 1, K = 6, N_{s} = 1, L = 8 for A = 10, α = 2π/T, β = 3π, n = 3, to an accuracy of 0.08 in 0.212 CPU s using 36.1 MB of memory. The Lax-Wendroff method solution of [

Being a hyperbolic equation, the wave equation is not well suited for implicit finite difference methods, because of the difficulty of resolving the phase at time steps larger than the maximum time step stipulated by the CFL condition. Here, however, the emphasis is on time-averaged accuracy and efficiency, so it is of interest to see how an implicit method like Crank-Nicolson’s performs. This method has now been optimized in relation to [, the corresponding solution being achieved employing 1.16 s and 64.1 MB. The Lax-Wendroff method is thus the preferable of the two finite difference methods, in spite of being explicit. The GWRM, however, is more accurate and much faster than both finite difference methods.

The case above features a single wavelength of the slow time scale. In practical situations, solutions spanning many periods of the slow time scale are often of interest. In such a case, with ten slow periods, the GWRM used N_{t} = 10, K = 6, N_{s} = 2, L = 8 for A = 10, α = 20π/T, β = 3π, n = 3. A global accuracy of 0.22 was obtained using 2.66 CPU s and 83.2 MB of memory. Using N_{s} = 1 (a single spatial domain), nearly the same accuracy was obtained in 1.08 s, using 66.7 MB.

Comparing with the finite difference methods, Lax-Wendroff obtains the same accuracy with Δ x = 1 / 50 and 10,000 time steps (CFL limit) using as much as 15.8 CPU s and 442 MB. The Crank-Nicolson method features low accuracy because of phase drift and is thus outperformed by the Lax-Wendroff method.

Magnetohydrodynamic (MHD) stability is usually a necessary condition for magnetically confined fusion plasmas. Theoretically, the stability of a specified plasma equilibrium may be tested by linearizing the so-called ideal MHD equations

∂ρ/∂t + ∇·(ρu) = 0 (7)

ρ du/dt = j × B − ∇p

E + u × B = 0

d/dt (pρ^{−Γ}) = 0

∇ × E = −∂B/∂t

∇ × B = μ_{0} j

Standard notation has been used; mass density, scalar kinetic pressure, fluid flow velocity, magnetic field, current density and electric field are denoted by ρ, p, u, B, j and E, respectively. The notation d/dt ≡ ∂/∂t + (u · ∇) has been used for the total derivative. Having specified a plasma equilibrium and the boundary conditions (in this case in circular cylindrical geometry), a perturbation is applied and the plasma time dynamics is investigated for possible exponential growth, in which case the equilibrium is unstable. Details are given in [

The stability of two equilibria will be studied here, applying the GWRM. The first is that of [_{0z} = 0.05 (normalized units, erroneously given as 0.2 in [_{0z} = 0. The azimuthal perturbation has Fourier mode number m = 1 and the axial mode number is k = 10. Both equilibria are strongly unstable to this perturbation, featuring exponential growth rates of order unity (normalized to the Alfvén time). A difficulty for the GWRM is thus to polynomially resolve the exponential growth in time. In order for the dominant mode to develop, the equations typically need to be solved for times T = 10 or more. For benchmarking, GWRM results are compared with an eigenvalue shooting code [

First we note the CPU time and memory requirements for case A, earlier discussed in [, where a single spatial domain was used (N_{s} = 1); furthermore, the temporal (K) and spatial (L) maximum mode numbers were both 5. Employing the improvements described in Sections 3.1 and 3.2, the CPU time is reduced to 4.44 s and the memory to 89.8 MB. Both cases gave the same result: a growth rate of 0.83 within 1% error and eigenfunctions within approximately 2% error.

Of particular interest is the dependence on the number of time intervals N_{t} and the number of spatial domains N_{s}. Since a linear equation is solved, the A matrix needs to be computed only for the first time interval (see S4). For the case above, the first time interval needs 1.49 s for full solution, whereas succeeding time intervals on average require only 0.68 s, a 54% reduction. The CPU time scaling with N_{t} for these time intervals is linear. Memory requirements are essentially independent of N_{t}.

For case B, the parameters T = 20, N_{t} = 3, K = 5, N_{s} = 1, L = 5 were used for a run that took 11.1 s, using 105 MB of memory, with a 15% maximum error in the eigenfunction u_{r}. Using N_{s} = 2 (with 1.0 × 10^{−6} overlap), the CPU time increased to 26.2 s and the memory to 336 MB, whereas the eigenfunction error decreased to 5%; see

Increasing the number of spatial domains, the CPU time scaling N_{s}^{1.49} was obtained, in stark contrast to the unoptimized scaling N_{s}^{3}. The memory scaling was found to be N_{s}^{1.69} (rather than N_{s}^{2}) as a result of memory use unrelated to SIR.

The ambition of this work has been to evaluate the performance of optimized implementations of the time-spectral method GWRM as compared to finite difference methods in time. In early work [

In the present work we report on results using fully coupled spatial subdomains. As described in Section 3, a number of measures to enhance efficiency both for the GWRM itself, but also for the root solver SIR, have been developed. Returning to the earlier model problems, using the new algorithms, we now find strongly enhanced performance. Of primary importance are the improved CPU time and memory scalings, where the uses of sparse matrix methods play central roles.

Let us estimate the requirements for solution of an advanced 2D initial-boundary value problem. Primarily, GWRM efficiency depends on the solution of a matrix equation in SIR for determining the matrix A, used by SIR. In the absence of subdomains in time and space, the dimension N of this matrix is determined by the number of simultaneous equations to be solved, N_{e}, and the numbers of Chebyshev modes (K, L_{x}, L_{y}); thus N = N_{e}KL_{x}L_{y}, assuming K + 1 ≈ K etc. With N_{e} = 5, K = 100, L_{x} = 50, L_{y} = 50, we obtain N = 1.3 × 10^{6}. Standard Gauss elimination requires Ψ = O(N^{3}) operations for each SIR iteration. Thus, for this case, Ψ ≈ 2 × 10^{18} operations, which would call for high performance computers.

A substantial improvement in efficiency comes from the introduction of subdomains in time and space. We let L_{x} = N_{x}L_{sx}, L_{y} = N_{y}L_{sy} and K = N_{t}K_{s}. Here N_{x} and N_{y} denote the numbers of spatial subdomains in the x- and y-directions, respectively, and N_{t} is the number of temporal subdomains. L_{sx}, L_{sy} and K_{s} denote the numbers of Chebyshev modes used in each subdomain. In the unoptimized case we have approximately N = N_{e}K_{s}N_{x}N_{y}L_{sx}L_{sy} and Ψ = O(N_{t}N^{3}). Letting N_{t} = 10, N_{e} = 5, K_{s} = 10, N_{x} = 10, N_{y} = 10, L_{sx} = 5, L_{sy} = 5, we find N = 1.3 × 10^{5} and Ψ = 2 × 10^{16}. Clearly, spatial optimization is of the essence. The scalings found from the optimizations presented in this paper substantially improve the situation. Using the optimized dependency O((N_{x}N_{y})^{1.45}) obtained in this work, rather than O((N_{x}N_{y})^{3}), we find Ψ = 1.6 × 10^{13}, a substantial reduction. A gigahertz table top computer could thus solve the problem within a few hours.
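These estimates are straightforward to reproduce; the following Python lines evaluate N and Ψ for the quoted parameter choices (same assumptions as in the text):

```python
Ne, Nt, Ks = 5, 10, 10
Nx, Ny, Lsx, Lsy = 10, 10, 5, 5

# Unoptimized subdomain case: N = Ne*Ks*Nx*Ny*Lsx*Lsy, Psi = Nt * N^3
N = Ne * Ks * Nx * Ny * Lsx * Lsy           # 125,000, i.e. ~1.3e5
psi_unopt = Nt * N ** 3                     # ~2e16 operations

# Optimized spatial scaling: (Nx*Ny)^3 replaced by (Nx*Ny)^1.45
psi_opt = Nt * (Ne * Ks * Lsx * Lsy) ** 3 * (Nx * Ny) ** 1.45   # ~1.6e13
```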

The scalings discussed above are indeed validated by the 1D problems considered in this paper; taking the N_{e} dependence into account, good agreement with the CPU times used is obtained.

Turning to 3D problems, we may assume a further scaling of the number of operations with N_{z}^{1.45}L_{sz}^{3}. Thus for a problem with N_{z} = 10 and L_{sz} = 5, using the above parameters, we have N = 6.5 × 10^{6} and Ψ = 5.6 × 10^{16}, which is not prohibitive for high performance computers.

Further novel ideas for improving GWRM efficiency can, however, be employed. In recent work, to be published elsewhere, the number of simultaneous global spatial equations to be solved by SIR is reduced to the boundary equations (external plus internal) only. The physics equations of each spatial subdomain are solved locally at each iteration, and strong time parallelization is possible. The resulting improved scalings are particularly important for problems with multiple spatial dimensions.

In this paper we have not employed automatic adaption of the time intervals (G3). This method has proven very efficient when the GWRM was used for solving a set of chaotic differential equations in time, typical of numerical weather prediction [

The time-spectral generalized weighted residual method (GWRM) employs a Chebyshev polynomial representation in time instead of the time differencing procedures that are typical for standard methods for solving differential equations. Unoptimized use of the method is, however, hampered in efficiency by the cubic dependence of the number of operations on the total number of Chebyshev modes. Several measures for enhancing efficiency, primarily sparse matrix methods, have here been studied, employing multiple temporal and spatial domains.

It was found that Burger’s 1D equation, with viscosity parameter υ = 0.01, was solved significantly faster and more accurately by the GWRM than by the explicit Lax-Wendroff and the implicit Crank-Nicolson finite difference methods for accuracies of order 1.0 × 10^{−4} or higher. Furthermore, it was found that GWRM CPU time scales as N_{t}^{1.0}N_{s}^{1.43} and memory usage as N_{t}^{0.0}N_{s}^{1.08}, where N_{t} and N_{s} are the number of time intervals and spatial subdomains, respectively. This is a significant improvement over the cubic and quadratic scalings of unoptimized code without subdomains.

The slower time scale of a forced wave equation problem, solved by all three methods, is found and followed by the GWRM much faster and using less memory than the finite difference methods.

For an ideal MHD stability problem, it was found that the performance enhancement measures S1-S8, G1-G5 of Section 3 yielded a more than five-fold increase in efficiency. Being a linear problem, for which information from the first time interval can be reused, the CPU time for subsequent time intervals is halved. A CPU time scaling with the number of spatial subdomains of N_{s}^{1.49} was obtained, rather than the unoptimized N_{s}^{3}.

In closing, it may be mentioned that all obtained GWRM solutions are analytical piece-wise polynomial expressions in time and space, thus immediately tractable for analysis. By using Chebyshev expansions also in parameter space, scaling behavior can be determined in a single GWRM run, as demonstrated in [

Scheffel, J. and Lindvall, K. (2018) Optimizing Time-Spectral Solution of Initial-Value Problems. American Journal of Computational Mathematics, 8, 7-26. https://doi.org/10.4236/ajcm.2018.81002