Using Scalit for Performing Accurate Rovibrational Spectroscopy Calculations for Triatomic Molecules: a Practical Guide

This paper presents a practical guide for use of the ScalIT software package to perform highly accurate bound rovibrational spectroscopy calculations for triatomic molecules. At its core, ScalIT serves as a massively scalable iterative sparse matrix solver, while assisting modules serve to create rovibrational Hamiltonian matrices, and analyze computed energy levels (eigenvalues) and wavefunctions (eigenvectors). Some of the methods incorporated into the package include: phase space optimized discrete variable representation, preconditioned inexact spectral transform, and optimal separable basis preconditioning. ScalIT has previously been implemented successfully for a wide range of chemical applications, allowing even the most state-of-the-art calculations to be computed with relative ease, across a large number of computational cores, in a short amount of time.


Introduction
ScalIT, in principle, can be applied across a broad range of scientific disciplines; however, recent development and application has been geared toward molecular chemical dynamics, performing exact time-independent quantum dynamics calculations.More specifically, it has been applied to compute bound rovibrational energy levels and wavefunctions.The software suite has been designed at the lowest level to scale across a large number of CPUs, and be as general as possible, allowing the scientific community to "tap" into ScalIT for use with their own algorithms.In this paper we focus on the operation of ScalIT from the non-expert user's perspective, after a brief description of the underlying numerical methods.If a more detailed analysis is desired, the reader is referred to the literature [1]- [20].High level modules have been created for performing calculations on systems up to four atoms; for simplicity, we restrict ourselves to discuss only triatomic molecules, although the procedure described is nearly identical for more difficult systems.

The "A Matrix" Form, and ScalIT Methodology
We present a broad overview of ScalIT's core functionality here.The emphasis is to provide the reader with an understanding of where in the science landscape ScalIT lives, as well as how its iterative nature has fueled its success.In order to understand the process of using ScalIT, one must have a basic grasp of its problem scope, as well as the fundamental "nuts and bolts" of the machinery used.
We are solving the time independent Schrödinger equation (TISE) formulated for rovibrational dynamics of molecular systems, which is a symmetric eigenvalue/eigenvector problem.It is solved in a variational way, in which the rovibrational Hamiltonian is represented as a matrix, H, in a "basis" of grid points (we use a discrete variable representation (DVR)) distributed over position space.For a molecule of n atoms, there are 3n − 5 rovibrational degrees of freedom (dofs) that must explicitly be considered, which ScalIT describes using Jacobi coordinates.Therefore, triatomics have 4 dofs (r, R, θ, K), r being the distance between two atoms, typically taken to be the heavier two.R is the distance between the center of mass of r and the third atom, and θ the angle between r and R. K is a dof associated with rotation.Figure 1 shows a model of an AB 2 triatomic molecule, meaning two of the atoms are of the same type.
A new Hamiltonian matrix is created and solved for each combination of the symmetry parameters J and ϵ, where J is the total angular momentum, and ϵ is inversion parity.With the coordinate system established, the triatomic Hamiltonian can be expressed as: where V is the potential energy matrix, and i T is the i th kinetic energy matrix associated with the i th dof.V is a diagonal matrix, and each i T is diagonal with respect to all dofs excluding that of its own subscript(s).This discretized grid-based representation of the molecular Hamiltonian yields a resultant matrix with a highly structured block-form, characterized by diagonal blocks of precisely the same size, and off-diagonal blocks which are themselves diagonal.We term this the " A matrix form."The A matrix is sparse, meaning that a large number of the matrix elements are zero.As the number of dofs increases, a fractal, self-similar pattern is produced which increases sparsity, while maintaining the same structure at all scales.A pictorial representation can be seen in Figure 2.
Sparsity of the H matrix is an important numerical consideration, given that the matrix dimensionality, N, scales exponentially with the number of dofs.Consequently, direct diagonalization routines such as Householder's algorithm, for which memory requirements scale as N 2 , and computational effort as N 3 , can become intractable, even for triatomics.Sparse iterative approaches have shown to be a viable means for solving linear algebra problems with such large matrices [11] [12] [18] [21]- [25], for matrix-vector products scale with the number of nonzero elements, thus reducing computational effort with increasing sparsity.Furthermore, storage of the entire H is not necessary, significantly reducing storage requirements.The sparsity and highly structured form of A matrices would suggest the use of sparse iterative techniques, yet research has shown the A matrix to be a  worst case scenario in many respects, that elicits poor performance from all standard iterative methods [26]- [28].To this end, the algorithms developed in ScalIT address the shortcomings, and have proven to be effective and highly parallelizable.
ScalIT overcomes the above difficulties in part through the preconditioned inexact spectral transform (PIST) algorithm [12]- [14].This approach calculates a window of states around a chosen energy by applying the Lanczos algorithm to ( ) ( ) , where I is the identity, rather than to H itself.This shift-and-invert strategy extremizes the states close to energy E , reducing the required number of iterations for convergence, especially in regimes exhibiting a high density of states.The novel aspect of this approach is in not computing the matrix-vector product of ( ) F H exactly, but approximately, and subsequently diagonalizing H in the basis of the inexact Lanczos vectors iteratively until a given number of states are converged to desired accuracy.Each approximate Lanczos iteration ( ) I H b is computed by solving a system of linear equations ( ) E − = I H x b using the quasiminimal residual (QMR) method to some predefined relative error.After each matrix-vector product, the resultant Lanczos vectors are explicitly reorthogonalized using the Gram-Schmidt procedure, to ensure against spurious eigenvalues.The total number of required matrix-vector products is M L, where M is the number of approximate Lanczos iterations, and L is the number of QMR iterations per Lanczos.Typically, M is low due to the spectral transform applied, although L can be large in regions of high spectral density.Although M L is typically lower than alternative methods, the main advantage is the ability to use preconditioning to substantially reduce L, and subsequent total computational effort.
It is well known that when solving a system of linear equations = Ax b iteratively, using a preconditioner P such that can greatly reduce L. A good preconditioner must therefore be "close" to A , as well as easy to invert.It has been shown that for chemical applications, using an adiabatic approximation to the Hamiltonian yields much better results than generic matrix preconditioners [6] [17].ScalIT does this via a recursive application of the optimal separable basis (OSB) procedure [2]- [6] [15]- [17].Simply put, OSB is characterized by applying a series of block Jacobi rotations to the Hamiltonian to minimize off diagonal elements, and eliminating the residual off diagonal components.What is left is a zeroth order approximation to the exact Hamiltonian,

H
which is diagonal in its simplest implementation, thus trivial to invert.To increase preconditioner effectiveness, a "Wyatt" variant of the OSB procedure can be used (OSBW) [6] [13].This corresponds to calculating off-diagonal elements of OSB

H
in the vicinity of E , known as a "Wyatt block", to combat near singularities that occur when eigenvalues are close to E .Computational burden is increased due to the calculation of these off-diagonal elements, yet studies have shown that this is greatly overcome by the reduction in L, especially in high-density-of-state regimes.

Using ScalIT
In the context of bound rovibrational spectroscopy calculations for triatomic molecules, a typical ScalIT calculation will proceed in three (or four) separate stages: 1) Compute 1D phase space optimized (PSO) effective potentials (defined shortly) and basis sets for radial coordinates, r and R.
3) Solve the Hamiltonian matrix eigenproblem to compute rovibrational energy levels and (optionally) wavefunctions.
Each stage has a different high-level ScalIT module (or set of modules) associated with it, as well as its own short input file (typically 6 -10 lines each).
Stage 1 above is problem-(i.e.potential energy surface (PES)-) specific.For each of a very large number of uniformly-spaced points for a given radial coordinate (e.g.r), the PES is globally minimized with respect to all other coordinates, to determine the value of the radial PSO effective potential at that point [e.g.[10].Although using local minima can, under some circumstances, provide a more efficient PSO DVR (and is computationally far less expensive), the global minimum is always reliable, and is therefore safest for use as the default choice for ScalIT [11].In any case, thousands of sampling points are used to avoid spline-ringing problems [9], and also to ensure that the subsequent PSO variational basis representation (VBR) calculation (an intermediate step towards generating the PSO DVR) is extremely well converged (i.e., to several digits of precision beyond that of the final calculation, so as to rule out Stage 1 as a significant source of numerical error).Because the number of sampling points (technically sinc-DVR grid points [29]) is so large, the minimization stage of the calculation is somewhat slow (e.g.requires several hours on a single core), but need only be performed a single time for any given PES.(Parallelization would be trivial, but is not yet implemented in the current version of ScalIT).Once the radial PSO effective potentials are determined, they are used to compute the 1D radial PSO VBR functions via diagonalization of the corresponding 1D effective radial sinc-DVR Hamiltonian matrices; the data is then stored as intermediate output for the later stages.This part of the calculation also only needs to be performed a single time, provided the number of radial PSO VBR functions computed (specified by the user in the input file) is larger than or equal to the largest radial PSO DVR basis size used in the later Stages 2 and 3. Other user-specified inputs include the masses, the potential energy surface (PES), and the radial sinc-DVR coordinate ranges and grid spacings.
Stage 2 employs one of three standard ScalIT modules-depending on whether the triatomic molecule of interest is of the AB 2 form (i.e. two identical nuclei) or not, and if so, whether the even-or odd-permutation symmetry block is desired.In the input file, the user specifies J, ϵ, the PSO DVR basis size for both radial coordinates, r and R, and the maximum j value to use in the basis expansion, j max .j is the body-fixed analog of J. Radial PSO DVRs of the specified sizes are constructed using the PSO VBR data previously computed in Stage 1.The bend-angle basis functions are automatically combined with the appropriate body-fixed rotational states, with optional application of energy truncation to the combined bend-rotation states (K truncation could also be applied).The various components of the full-dimensional, but sparse, rovibrational Hamiltonian matrix are then stored in separate data files (one for the radial PSO DVR grid point and matrix data, the other for the angular contributions).Stage 2 must be repeated for each separate convergence calculation-i.e., for each distinct set of basis parameters used.However, it is quite fast for triatomic systems, typically requiring only seconds to minutes of CPU time.
Stage 3, the "meat" of the ScalIT package, employs but a single standard module-designed to accommodate calculations of arbitrary symmetry and dimensionality (i.e., tetraatomic and pentaatomic calculations use this same module).The coordinates may also be combined into "layers", if desired.The first phase of Stage 3 is the OSB preconditioner construction phase for which the default choice is to treat the radial coordinates as separate inner coordinate layers, and the combined bend-rotation states as a single outermost layer.For standard OSB preconditioning, no user parameters are required, although more sophisticated preconditioning schemes do require this (e.g., specification of the Wyatt window center and width, for OSBW preconditioning).In the second phase of Stage 3, the PIST method is used to compute a predetermined number of rovibrational eigenstate energies (and wavefunctions if desired) in the vicinity of a specified central energy, E, to within a predetermined (iterative) error (generally chosen to be far smaller than the basis set convergence error).Note that wavefunction calculations do not require a second PIST calculation.Also, M is small (typically ~2 times the desired number of states) thus rendering explicit reorthogonalization feasible.As for the QMR component of PIST, it is hoped that preconditioning has reduced L to a manageably small value.As input for stage 3, users specify the number of (possibly combined) coordinates, the energy window over which rovibrational states are to be computed, and various other parameters as described above.An output file is generated that lists the computed rovibrational eigenvalues.Also, various optional data files may be written-as may be used, e.g., for wavefunction plotting purposes in Stage 4.
The optional Stage 4 is executed if wavefunction plots are desired.All of the requisite data may be found in the optional output files of Stage 3.However, this data is not presented in a form that would be useful for plotting via standard packages such as Mathematica.It is the purpose of Stage 4 to extract and output such information, for selected wavefunctions.Specifically, this module will output explicit values for the eigenstate wavefunction density, as a function of the four coordinates, R, r, θ, and K.An optional summation over all K values can also be computed, as well as a transformation from Jacobi to hyperspherical coordinate systems.
The four stages described above are logical, and easy to use in practice.To date, even undergraduate and high school students (with some guidance) have been able to converge triatomic rovibrational state calculations using ScalIT as described above.

Applications
Although ScalIT can, in general, be used to solve a variety of physical problems, current application and development have been limited to calculating bound rovibrational spectra for triatomic systems.We give here only a few examples of the recent applications.
In a recent publication [30], the bound states of the hydroperoxl radical, HO 2 , were calculated up to the highest possible total angular momentum value, 130 J = , corroborating and improving upon the studies previously performed by other researchers (up to 50 J = ).Being a notoriously "floppy molecule", HO 2 has remained a relatively difficult molecule to study, due to large coupling between rotational and vibrational degrees of freedom, and a deep potential well on the ground PES.Despite this, ScalIT was able to efficiently calculate states on a single CPU core in less time than previous studies on parallel platforms, although most calculations performed were done across multiple machines in even less time.
All rovibrational states were converged to an accuracy of 6 10 − eV or better (five or six digits of precision).For 50 J = , a basis size of 5 3.71 10 × was used, one fifteenth of what was required by the previous study.Much of the basis set reduction was achieved through the use of the PSODVR in radial dofs.Moreover, approximately 1500 total iterations were required to reach the above accuracy, compared to 6 ~10 required by Smith [31]- [35].As one moves up the energy spectrum, the number of iterations required for convergence is increased, regardless of the iterative method employed.To this end, high lying states of 50 J = were computed to test ScalIT in what the authors consider to be the most difficult area of the HO 2 spectrum.These were seen to be the most computationally expensive calculations performed, due to the increase in QMR iterations per Lanczos from ~15 to ~400.The necessary basis size only needed to be increased to 6 1.08 10 × , about a fifth of the basis size in the previous study for the low lying states.As J increases, the coordinate range is extended, and the number of relevant eigenstates reduced.At 120 J = , there are only about 20 states for each parity, all of which are very delocalized.To achieve the same accuracy at this high-density-of-states regime, a basis increase of only ~8× was required, and only slightly more QMR iterations were necessary.The HO 2 study has proven to be a testament to the efficacy of applying the ScalIT suite of codes to efficiently calculating bound rovibrational data to triatomic systems.
ScalIT has also been applied to a series of rare gas clusters [7] [19] [20]: Ne 3 , Ne 2 Ar, Ar 2 Ne, Ar 3 , as well as the neon tetramer, Ne 4 .Study of rare gas clusters in chemical physics gives insight to a variety of phenomena, including Van der Waals interactions, solvation processes, and phase change boundaries.Computationally, rare gas clusters can be fairly accurately described by pair-wise potentials, obviating the development of a PES.Despite this, their inherent long range interactions yield extremely "floppy" behavior, and typically exhibit a high density of states, both contributing to the difficulty in computation.All rovibrational bound states were com-puted up to dissociation for each of the respective systems.Due to the relatively heavy nature of Ar 3 , the difficulty of computing high lying states near dissociation is heavily exacerbated, yet was still accomplished with a modest computational allocation and will be the subject for an upcoming publication.The neon tetramer is currently under investigation by the authors and will be, to the authors' knowledge, the most comprehensive rovibrational bound state calculation of rare gas clusters reported to date.

Conclusion
The poor performance exhibited from standard iterative sparse matrix solvers applied to the A matrix has fueled the development of ScalIT.From the ground up, ScalIT has been developed to be a highly parallelizable, robust iterative sparse matrix eigensolver.Furthermore, because parallelization is applied at the level of the matrixvector product, any level of the software suite can be utilized by the broader community, though we focus here on the high-level use of calculating rovibrational spectra for triatomic molecules.By following the procedure given above, a non-expert user can easily compute such spectra.ScalIT has been applied to many molecular applications, and has proven itself even in high-density-of-states regimes.ScalIT is available free of charge upon request to the authors.

Figure 1 .
Figure 1.AB 2 molecule (Model of an AB 2 molecule as described by Jacobi coordinates).

Figure 2 .
Figure 2. A matrix form (A matrix form with greater than two degrees of freedom.This form is characterized by being block diagonal, with off-diagonal blocks being diagonal themselves.Note the self similar pattern of the diagonal blocks).