On DFT Molecular Simulation for Non-Adaptive Kernel Approximation

Using accurate quantum energy computations in nanotechnologic applications is usually very computationally intensive. That makes it difficult to apply in subsequent quantum simulation. In this paper, we present some preliminary results pertaining to stochastic methods for alleviating the numerical expense of quantum estimations. The initial information about the quantum energy originates from the Density Functional Theory. The determination of the parameters is performed by using methods stemming from machine learning. We survey the covariance method using marginal likelihood for the statistical simulation. More emphasis is put at the position of equilibrium where the total atomic energy attains its minimum. The originally intensive data can be reproduced efficiently without losing accuracy. A significant acceleration gain is perceived by using the proposed method.


Introduction
Nanotechnology tends to be the technology of focus in this century [1].Many of the macroscopically visible continuum applications seem to have attained nearly optimal state in computer implementation as well as in theoretical analysis.Not only the analytical and numerical treatments of nanotechnology are interesting but it has real-world applications in various disciplines including aircraft, automobile, electronic and medical engineering.This work is partially a continuation of our former studies in [2]- [4] but we consider here electronic structures instead of solute-solvent interactions.In this paper, we would like to concentrate on the efficient treatment of quantum information.We present a method which is efficient for noisy data that contain measure-ment imperfection because quantum data measurements are not completely ab-initio.Among other methods, DFT (Density Functional Theory) contains some parameters which are obtained from fitting procedures.Our motivation is to generate a system which is both accurate and fairly inexpensive to evaluate.Our method applies to all sorts of quantum information but we focus on the atomic energy in this paper.The availability of efficient and accurate evaluations can be used for subsequent applications.The region where the system admits its equilibrium is examined more carefully than the remaining region during the approximation.That is, a geometry optimization must be performed initially in order to determine the position where the energy attains its minimum.The most general setup enables an estimation of an unknown function in arbitrary dimension such that noisy sample functions together with their higher derivatives are provided.In this document, we treat the simplified version where only the sample values are provided.Depending on the desired accuracy and the size of the atomic system, the determination of the energy can last several minutes till several hours by using direct DFT computations.The molecular configurations consist of germanium, silicon or composites of them using several space group symmetries.The function to be reconstructed is the energy surface of a configuration when some tensor transformations are applied to DFT.The possible transformations include isotropic stretching, anisotropic stretchings and Voigt tensor.Our numerical experiments reveal that the proposed statistical pre-fitting process seems to be a reasonable approach to obtain a good accuracy in an affordable computing time.The preparation overhead lasts very long but it needs only to be performed once for all and its output can be stored.The acceleration factor between the direct DFT and the kernel based approximation is approximately of order 10 5 .Due to this significant acceleration gain, the preparation process shows to be worth calculating.

Ground State Energy
For a configuration of nuclei { } j a of number u N , the main problem in electronic structure computation in- volves the general Hamiltonian operator.We denote the coordinates of the i-th electron by For the Born-Oppenheimer or adiabatic approximation, one assumes that the mass and the volume of the atoms are very large in comparison to those of the electrons.Thus, the atoms move comparatively slower than the electrons.As a consequence, one treats the time-independent Hamiltonian operator with respect to a set of nuclei i a which are supposed to be stationary.That is, the electronic structure is go- verned by the expression The above three terms are related to the kinetic energy, the atom-electron interaction and the inter-electron interaction while e m , e q , j Z ,  are quantum parameters.The domain of computation is , , , , where , 1 2, 1 2 , , , , , , , , . The ground state energy corresponds to the smallest eigenvalue min E of (2).The anti-symmetrization operator  applied to any e N -variate function f is defined by ( ) ( ) in which e N

Π
is the set of permutations over { } The anti-symmetrization operator  has the properties that it commutes with the Hamiltonian operator and that for an e e N N × matrix M one has . The Hartree-Fock approach is the variational formulation on the Hamiltonian operator (1) where the trial functions are antisymmetric functions.The main difficulty is that the problem is of 3 e N -dimension without taking the electronic spins into account.In ad- dition, on account of the antisymmetric property, a direct use of the Slater determinant often produces computations having orders of ( ) which are very expensive.Counting the electronic spins leads usually to a factor of 2 e N which makes the computation even more intractable.Some simplifications of the stationary Hamilton operators have already been proposed.For the DFT, one solves a set of equations for each electron.The similarity of the solutions is then derived from the theory of Kohn-Sham [5] which consists in replacing the complicated initial problem into several ones.For each 1, , where eff V is the effective potential energy which depends implicitly on the total electron density ( ) ( ) x x .The problem (2) is then reduced from dimensions 3 e N to e N sets of 3D smaller problems (5).The influence of one electron with respect to the other electrons is measured by the total electron density.These approaches enable the treatment of Hamiltonian problem even for an electronic structure having a large number of particles on a single desktop.The effective potential is constituted of the Hartree potential H V , the exchange correlation potential XC V and the external electrostatic field such as x x x x x x (6) in which the Hartree potential is the inverse of the Poisson operator such as ( ) ( ) For its evalu- ation, either a Poisson problem is solved or one convolves with the Green fundamental solution such as x x x x .The exchange-correlation potential is some correction term [6] which is usually done by LDA (Local Density Approximation) or GGA (Generalized Gradient Approximation).Analytic expressions of the correlation energy are only known in a few special cases which mainly consist of the high and low density limits.The exchange-correlation potential is related to the exchange-correlation energy by as the exchange and the correlation parts.In term of the exchange-correlation energy density XC where .
For the local density approximation (LDA), the exchange energy density is expressed as Analytic values of the correlation energy density are only known for some extreme cases.The external electrostatic field potential V ext is pro- vided by the kernel becomes known, the Khon-Sham approach uses the approximation to E of ( 2) by The main improvement from LDA to GGA is that the exchange-correlation energy does not depend only on the total electron density but also on its gradient such as [ ]( ) Because of the imperfec- tions resulting from the estimation of parameters from experimental measurements, the DFT approach is not ab-initio as in the initial equations.But the statistical fitting approach proposed in this paper can deal with noisy data.The eigenvalue problem in ( 5) is nonlinear because its variational operator depends on ρ which in turn depends on i ψ .It is solved by using a sequence of the linear eigenvalue prob- lems SCF (Self Consistent Field).After assembling the operator ( 9), the linear eigenvalue problem is solved for the smallest eigenvalue ( ) x is determined as the eigenfunction corresponding to ( ) min k λ .In practice, one represents Ψ as a set of basis [7] which are usually plane waves, Finite Element Me- thod, or wavelets.

Kernel-Based Approximation
In this section, we will survey the main points about kernel-based approximation which are relevant in the quantum approximation.The most general setup of that approximation in any dimension D consists in ac- cepting some inputs which are , in which  and  are some subsets of  while i ε , i κ , i λ are measurement imperfections.In general, using the gradients is computationally very expensive so that the set  is much smaller than the set  while the Hessians are even more restrictively used.In this paper, we use only data measurements such that = ∅  and = ∅  .The correlation [8] function to be used is the Matern function where ( ) , ν  are positive hyper-parameters and K ν is the modified Bessel function.In dimension D , the spectral density of the Matern function is If ν tends to infinity, one obtains the squared exponential case.For the special case of half-integers such as 1 2 p ν =

+
, one obtains a product of an exponential and a polynomial of order p such that (13) becomes For the most applied cases where 3 2 , one has ( ) Suppose N is the number of observations such that ( ) in (10).For noise-free observations, one has the covariance For a noisy data where the additional noise follows the Gaussian distribution Cov , , .
Let us denote the sampling points by x which is a matrix having the size N D × where D is the dimension of the trial variables i x .The latent function and the set of observations are defined in a similar . The marginal likelihood is the integral of the likelihood and the prior such as We denote by ( ) where r will be defined below as a generalized distance between p x and .We will use also K to denote the de- and one has no noise, then the marginal likelihood is ( ) ( ) By taking the logarithm, the computation of the marginal likelihood is as follows ( ) ( ) from which one obtains the log marginal likelihood ( ) In the presence of noise ( ) 2 0, n ε σ   by using (19), the negative log marginal likelihood becomes The first term ( ) y is the data-fit term while the second term

N
are respectively the complexity term and the normalization term.For a set of test points are defined in a similar way as X and ( ) , K X X .The predicted estimation follows the Gaussian distribution That is to say, for given X and y , the predictive distribution for a covariate vector * X is a Gaussian ad- mitting the following mean and variance where those expressions are dependent on some set of hyper-parameters which we discuss below.The value of the generalized distance r for given in which 1 , , D λ λ  are related to hyper-parameters.In general, the set of hyperparameters include ( ) where  is from ( 16) and (17) as well as ( ) The main objective is to determine the hyperparameters by optimizing the marginal likelihood.In practice, using the log marginal likelihood is more efficient to implement.In order to accelerate the speed of the nonlinear optimization, we need the gradients of the functional with respect to the hyperparameters.For a hyper-parameter θ , one has the partial derivative From the chain rule, one obtains This is singular when x which occurs on the diagonal entries of the covariance matrix.In fact, one , ., Working with 2 : R r = provides a regularization everywhere as ( ) is smooth at the origin because one has indeed ( ) ( ) where MATERN f is a polynomial.In particular, for the cases where . In fact, one obtains the relation where MATERN q is a polynomial.Therefore, one deduces from (35 which is regular for all values of ( ) , p q r x x .

Simulation Results
In this section, we would like to report some results from computer simulation of the formerly described approach.First, we will present some results pertaining to general real valued multi-variate functions.That will be followed by some application in quantum simulation.The former theoretical approach was implemented by using C/C++, BLAS/LAPACK and NLOPT.The BLAS packet is used for the fast vector operations.We use LAPACK for the linear operations such as Cholesky factorization and dense matrix solvers.We use NLOPT for the nonlinear operations for both the geometry optimization and the optimal hyper-parameters in the log marginal likelihood (25).NLOPT supports diverse nonlinear optimization operations [9] in which local optimizers are involved.A local one searches only inside a neighborhood of a certain provided starting initial guess.The optimizers are performed by using derivative free or gradient based algorithm which are available in NLOPT.Derivative free algorithms include BOBYQA (Bond Optimization BY Quadratic Approximation), COBYLA (Constrained Optimization BY Linear Approximation), NEWUOA (NEW Unconstrained Optimization Algorithm).Gradient-based methods include MMA (Method of Moving Asymptotes) and LBFGB (Limited memory Broyden-Fletcher-Goldfarb-Shanno).The code is a very developed version of the matlab implementation provided in [8].The new additional enhancements from the matlab version consist of the following features.First, using NLOPT provides a lot of improvements as compared to the original matlab nonlinear conjugate gradients.That can be observed when both the number of points and the dimension become large in the nonlinear optimization of the hyper-parameters.In addition, we can also accept higher derivatives in the input apart from the functional values.One can use the entire gradient or only some components of it.Furthermore, the gradient of the kernel-based approximation can be evaluated.That can be done analytically instead of using finite difference.In addition, our C/C++ code admits some python interface enabling direct application to ATK which is the quantum package we use.
As a first test, we consider the reconstruction of the function ( ) ( ) , , sin  by using the kernel-based approximation.Since that function does not present any special feature such as cusp or boundary layer or any special interesting region, we use only randomly generated points.The initial guess of the hyper-parameters is provided by the users.One considers the determination of the final hyper-parameters as an unconstrained nonlinear optimization.The results of the computations are collected in Table 1 where the errors are computed for the function values as well as for the gradient values.The results exhibit the performance for different numbers of data for each dimension.In addition, we examine the effect of increasing the dimension D .In Figure 1(c), we display the error plots in function of the number of data.We consider dimensions 3, 4,5, 6 D = by analyzing the errors in the function evaluation.We observe that the increase of the dimension does not really deteriorate the accuracy too much.One needs certainly more points for higher dimensions but the increase of the number of additional points is not too significant.
For the application to nano-simulation, we consider the unit cell generated by three vectors 3 , , After a transformation ( ) : For a bulk configuration  inside the unit cell, we denote by ( ) T  the transformed configuration where the fractional coordinates belonging to the transformed unit cell ( ) T  remain the same as in the original configuration  .
Their values are within [ ] 0,1 before and after the transformations.That is, the fractional coordinates of the reference configuration  remain unchanged from beginning till the end of the computation.We are interested in the impact of the DFT energy by applying the transformation to the unit cell vectors.The transformation depends on D parameters so that the D -variate function to be approximated maps The first transformation consists of an isotropic one that corresponds to a unidimensional function which is a scaling.That transformation amounts to stretching and confining the unit cell equally in all x , y , z direc- tions.Thus, the function to be approximated maps λ into ( ) where one has : . The next transformations are anisotropic in the sense that the unit cell is scaled differently in the x , y , z directions.The first anisotropic transformation scales in a way that 1 : . That transformation corresponds to the case where the dimension is 3 D = and ( ) T λ λ λ can be represented as a diagonal matrix having ( ) , , λ λ λ as entries.For the 2D anisotropic case, the transformation is given by ( ) , T λ λ which has diagonal entries ( ) , , λ λ λ λ .For the most general trans- formation, one utilizes a strain tensor This is the Voigt strain tensor which is usually encountered in anisotropic transformations such as elasticity.This general transformation corresponds to a six-variate function since the Voigt strain matrix is supposed to be symmetric.The isotropic and 2D/3D anisotropic transformations are ensured to be non-singular, provided that the diagonal parameters are strictly positive.The tensor transformation in (38) is non-singular as long as the parameters [ ] ε are not too large as the Voigt tensor approaches the identity transformation.Although we are in- terested in the general approximation of the quantum energy, we put more emphasis on the neighborhood of the position where the energy attains its minimal value because it defines the equilibrium of the system.The main objective is to obtain a good approximation next to the optimal point ( ) , , , , argmin Energy for a molecular or bulk configuration  .We have built a function that performs the geometry optimization using unpolarized single ζ DFT calculators for any given molecular configuration.This is the first instance in the program where the optimizer NLOPT is applied.The Gaussian approximation are applied on a set of points , , In our applications, we accumulate very dense points in the neighborhood of the optimal value opt λ while only few points are used elsewhere.An illustration of this situation is displayed in Figure 1(a) for the unidimensional case where the energy curve is traced together with the samples accumulating at the equilibrium.A typical sampling point distribution for the spatial case is depicted in Figure 1(b).The implementation of the quantum application was realized by using ATK (Atomistix ToolKit) [10] [11] which consisted of python scripts.The ATK has some GUI extension well known as VNL (Virtual NanoLab).For given nuclei coordinates { } x x e i − where 50% x = in a germanium-silicon combination obtained from a diamond structure.After applying the approach described in the previous sections to several atomic configurations, we collect the major results on Table 2.It mainly summarizes the elapsed time for the preparation and the performance of the new method.In the tabulated outcomes, we consider first Body Centered Cubic of germanium.In addition, we use composites which are labeled G S e i .All those composites are structured by us- ing Face Centered Cubic as space group symmetry.The appropriate parameter values are obtained from the American Mineralogist Crystal Structure Database [12].The preparation step consists of a geometry optimization and the determination of the kernel approximation.The duration of the Gaussian kernel is dominated by the DFT computations related to the point samples.The numbers of point samples are 70, 105, 200 and 250 for isotropic, 2D anisotropic, 3D anisotropic and Voigt tensor respectively.The stochastic computation as described in section 2.2 is very fast.In fact, the application of stochastic simulation is at most 2 percent of the whole preparation.All the computations were performed with the DFT basis unpolarized single ζ which is the least intensive basis available in the implementation.If other bases were used, the time for the DFT computation would last even much longer.In the case of DFT double ζ polarized, the scaling of the computation intensity might be doubled.In all dimensions, the preparation expense has long durations.Depending on the tensor transformation and the number of sample points, the preparation overhead can last a few minutes till several days.But the output of those preparations can be stored so that they need only be computed once for all.The ratio between the direct DFT-evaluation and the evaluation using kernel approximation is displayed in the last column of the for the isotropic, 2D anisotropic, 3D anisotropic and 6D tensor cases.Since the acceleration advantage is very good, investing on the preparation process is worth calculating as the results can be stored and subsequently post-processed.In order to observe the improvement of the accuracy, we consider two silicon configurations in Figure 3.As a matter of fact, the first configuration consists of silicon admitting a hexagonal lattice using space group P6 3 /mmc.The second one admits a Face Centered Cubic lattice possessing the space group Fd3m.For both configurations, the bond lengths are obtained from the American Mineralogist Crystal Structure Database [12].They are respectively represented by triangular and circular marks on the diagonal.That figure displays a comparison of the accuracy between the DFT computation and the kernel approximation.Marks which are closer to the diagonal identify good agreements between the two methods.We present only the case of anisotropic 3D transformation in this comparison because the other cases would exhibit similar results.The only difference between the two figures is that more sample points are used for the second one.More precisely, 45 sampling points are used in the first si-  while 225 are used for the second one.One can observe that the values on the diagonals become more precise as more sampling points are used.Additional sampling points can be used if a better accuracy is desired with the costs of having longer preparation overhead.

3 ⊂ 2 +
Ω  which is supposed to be sufficiently larger than the nuclei cloud { } 1 u N j j = a such that the above operator has neglecting influence beyond Ω .By considering the electron spins which take values 1 example of using the ATK simulation, we observe the isosurface of the electron density function in Figure 2 for a composite system of germanium/silicon in which only the atoms inside the unit cell are displayed.The displayed figure corresponds to 1 G S

Table 2 .
It exhibits that in average, the orders of acce-

Table 2 .
Preparation overhead and acceleration gain.