The Bivariate Normal Integral via Owen's T Function as a Modified Euler's Arctangent Series

Owen's T function is presented in four new ways, one of them as a series similar to Euler's arctangent series divided by $2\pi$, which is its majorant series. All of them enable numerically stable and fast convergent computation of the bivariate normal integral with simple recursion. When the computation of $\Phi_\varrho^2(x,y)$ was tested on a random sample of one million parameter triplets with uniformly distributed components, using double precision arithmetic, the maximum absolute error was $3.45\cdot 10^{-16}$. In additional testing, focused on cases with correlation coefficients close to one in absolute value, where the computation may be very sensitive to small rounding errors, the accuracy was retained. In rare potentially critical cases, a simple adjustment to the computation procedure was made: one potentially critical computation was replaced with two equivalent non-critical ones. All new series are suitable for vector and high-precision computation, provided they are supplemented with appropriately efficient and accurate computation of the arctangent and standard normal cumulative distribution functions. They are implemented in the R package Phi2rho, available on CRAN. Its functions accept vector arguments and are ready to work with the Rmpfr package, which enables the use of arbitrary precision instead of double precision numbers. A special test with up to 1024-bit precision computation is also presented.

The method presented in this paper extends the theoretical knowledge and can be easily implemented. Its great advantage over the already mentioned methods, which enable $\Phi_\varrho^2(x,y)$ computation via Owen's T function, is its simplicity, faster convergence and ensured numerical stability. Compared with other methods that have proven to be good or even excellent in terms of $\Phi_\varrho^2(x,y)$ computation speed and accuracy, e.g. the one implemented in the pmvnorm() function from the mvtnorm package (Genz and Bretz 2009), the main advantages are simplicity and independence from various constants. Both enable easy portability between different environments and the use of high-precision arithmetic. There is no need to adjust the number and precision of various constants, such as weights and knots in quadrature formulas, which are used in many $\Phi_\varrho^2(x,y)$ computational methods. In Section 2, the background results are presented, which are the starting point for the development of a new method, as well as for the acceleration of the tetrachoric series, which was used for the benchmark values computation. In Section 3, auxiliary results are derived. In the core Section 4, new series for $\Phi_\varrho^2(x,0)$ are derived in Theorems 3 and 4, and in Corollary 2 they are combined into a unified series for $\Phi_\varrho^2(x,y)$. However, in all cases, the series must be supplemented by the standard normal cdf and some of them also by the arctangent function computation. Four series for computing Owen's T function are presented in Theorem 5. Among them, (40b) can be viewed as a modified Euler's arctangent series. In Sections 5 and 6, algorithmic and numeric issues are discussed, and testing results are presented.
Using double precision arithmetic and testing on a random sample of one million triplets $(x, y, \varrho)$ with uniformly distributed components, the maximum absolute error was $3.45\cdot 10^{-16}$, which is approximately 3.11 times the machine epsilon $2^{-53}$. On the same random sample, but with $\varrho$ transformed so that more than half of the $|\varrho|$ were greater than 0.9999, and with the equivalent computation of two function values instead of $\Phi_\varrho^2(x,y)$ in the rare cases with $\varphi_\varrho^2(x,y) > 1$, the maximum absolute error was even slightly smaller.

Background results
The bivariate standard normal cdf can be expanded into the tetrachoric series (Kendall 1941, p. 196, Abramowitz and Stegun 1972, p. 940, 26.3.29)
$$\Phi_\varrho^2(x,y) = \Phi(x)\,\Phi(y) + \varphi(x)\,\varphi(y) \sum_{k=0}^{\infty} \frac{\mathrm{He}_k(x)\,\mathrm{He}_k(y)}{(k+1)!}\,\varrho^{k+1}, \quad (8)$$
where $\mathrm{He}_k(x)$ are the "probabilist's" Hermite polynomials defined by $\mathrm{He}_k(x) = (-1)^k e^{x^2/2} \frac{d^k}{dx^k} e^{-x^2/2}$. The tetrachoric series (8) converges a bit faster than a geometric series with quotient $\varrho$ and converges slowly for $|\varrho|$ close to or equal to 1.
Setting $y = 0$ in (8), transforming the summation index $2k \to k$, and using the corresponding double-factorial identity for $k \in \mathbb{N}_0$, we get a slightly faster converging series (9). Vasicek (1998, p. 7) found an alternative series for the computation of $\Phi_\varrho^2(x, 0)$, which is a good alternative to the series (9) if $\varrho^2 > \frac{1}{2}$. Integrating the well known equation (Plackett 1954, p. 352) with respect to $\varrho$, and using the equations $\Phi_0^2(x,y) = \Phi(x)\,\Phi(y)$ and (3), we get the corresponding representation. For historical reasons, the variates $h$ and $k$ are often used instead of $x$ and $y$. We will use $h$ instead of $x$ but will not need $k$ in the role of $y$.
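As a concrete illustration of the tetrachoric expansion (8), the following sketch sums the series in Python (the package itself is in R; this block is only illustrative). The Hermite polynomials are carried in the scaled form $g_k = \mathrm{He}_k/\sqrt{k!}$ to avoid overflow for large $k$, so the $k$th term becomes $g_k(x)\,g_k(y)\,\varrho^{k+1}/(k+1)$; the result is checked against the closed form $\Phi_\varrho^2(0,0) = \frac{1}{4} + \frac{\arcsin \varrho}{2\pi}$.

```python
import math

def Phi(t):
    """Standard normal cdf via the complementary error function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def phi(t):
    """Standard normal pdf."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def phi2_tetrachoric(x, y, rho, nterms=600):
    """Tetrachoric series for the bivariate standard normal cdf.
    Hermite polynomials are scaled as g_k = He_k / sqrt(k!) so that
    the k-th term is g_k(x) * g_k(y) * rho^(k+1) / (k+1)."""
    gx, gx_prev = 1.0, 0.0
    gy, gy_prev = 1.0, 0.0
    s, p = 0.0, rho                       # p = rho^(k+1)
    for k in range(nterms):
        s += gx * gy * p / (k + 1)
        # scaled recursion: g_{k+1} = (x*g_k - sqrt(k)*g_{k-1}) / sqrt(k+1)
        rk, rk1 = math.sqrt(k), math.sqrt(k + 1)
        gx, gx_prev = (x * gx - rk * gx_prev) / rk1, gx
        gy, gy_prev = (y * gy - rk * gy_prev) / rk1, gy
        p *= rho
    return Phi(x) * Phi(y) + phi(x) * phi(y) * s

# closed form at the origin: Phi2_rho(0,0) = 1/4 + arcsin(rho)/(2*pi)
for rho in (0.3, -0.6, 0.9):
    exact = 0.25 + math.asin(rho) / (2.0 * math.pi)
    assert abs(phi2_tetrachoric(0.0, 0.0, rho) - exact) < 1e-12
# rho = 0 reduces to independence
assert abs(phi2_tetrachoric(1.0, 2.0, 0.0) - Phi(1.0) * Phi(2.0)) < 1e-15
```

The geometric-like decay in $\varrho$ is visible here: for $\varrho = 0.9$ several hundred terms are needed, which is exactly the slow-convergence regime the rest of the paper addresses.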

Auxiliary results
This paper pays attention to the estimation of the method error due to the truncation of infinite series, while the analysis of rounding errors is beyond its scope. The following lemma is intended for this. In it and in all the theorems below, the phrase "truncated after the $n$th term" also covers omitting the entire series; in that case, set $n = -1$ and take the empty sum to be 0.

Lemma 1. If the Euler's arctangent series
$$\arctan r = \sum_{k=0}^{\infty} \frac{(2k)!!}{(2k+1)!!} \cdot \frac{r^{2k+1}}{(1+r^2)^{k+1}} \quad (19)$$
is truncated after the $n$th term, then the remainder has the same sign as $r$ and is bounded in absolute value by $B_n$ (20).

Proof. Using the corresponding integral identity for $k \in \mathbb{N}_0$ (Gradshteyn and Ryzhik 2015, p. 397, 3.621, 4.) and Tonelli's theorem, the remainder can be written as an integral. The denominator in the last integrand is an increasing function of $x$; inserting 1 and 0 instead of $x$ gives the bounds which, together with $B_n = \frac{r}{1+r^2} S(r, 1)$, imply (20).
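Lemma 1 can be checked numerically. The sketch below (Python, for illustration only) sums Euler's series (19) term by term and verifies that the truncation remainder always carries the sign of $r$, including the degenerate case $n = -1$ of the empty sum:

```python
import math

def euler_arctan(r, n):
    """Partial sum of Euler's arctangent series
    arctan r = sum_{k>=0} (2k)!!/(2k+1)!! * r^(2k+1)/(1+r^2)^(k+1),
    truncated after the n-th term (n = -1 gives the empty sum 0)."""
    t = r / (1.0 + r * r)                 # k = 0 term
    s = 0.0
    for k in range(n + 1):
        s += t
        # term ratio: t_{k+1}/t_k = (2k+2)/(2k+3) * r^2/(1+r^2)
        t *= (2 * k + 2) / (2 * k + 3) * r * r / (1.0 + r * r)
    return s

for r in (0.5, -2.0, 1.0):
    for n in (-1, 0, 3, 10):
        rem = math.atan(r) - euler_arctan(r, n)
        assert rem * r > 0                # remainder has the sign of r
# the series converges to arctan r
assert abs(euler_arctan(0.7, 200) - math.atan(0.7)) < 1e-13
```

Since every term of (19) carries the factor $r^{2k+1}$, all terms have the sign of $r$, which is why the tail (the remainder) keeps that sign as well.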
In the sequel, the lower and upper regularized gamma functions $P(a,x) = \frac{\gamma(a,x)}{\Gamma(a)}$ and $Q(a,x) = \frac{\Gamma(a,x)}{\Gamma(a)}$ respectively, where $\gamma(a,x) = \int_0^x t^{a-1} e^{-t}\,dt$ and $\Gamma(a,x) = \int_x^\infty t^{a-1} e^{-t}\,dt$ are the lower and upper incomplete gamma functions respectively, both defined for $a > 0$ and $x \geq 0$, will be very important, as well as their lower and upper bounds.
Using the substitution $\frac{1}{2}\beta^2 t^2 = u$, we get an expansion in which the first sum is finite. Using the fact that $P(a, x)$ is a decreasing function of $a$ (Alzer 1998, p. 154), (25) and the absolute convergence assumption on $f(x)$, (26) follows. Analogously to the above, and using the corresponding double-factorial identity, we obtain (27).
Note that in the same way as equations (26) and (27) were derived, analogous equations can be derived if the integration limits 0 and $x$ are replaced by $x$ and $\infty$ respectively, and the lower regularized gamma function is replaced by the upper one. However, for each case it should be checked whether the order of integration and summation can be reversed.

Main results
The core of this article consists of Theorems 3, 4 and 5. Since the first two do not specify variants for the transformed parameters, the last one, together with (13) and (14), is more practically useful. It also justifies the title of the article. If the series is truncated after the $n$th term, the stated remainder bounds apply. Using (Abramowitz and Stegun 1972, p. 932, 26.2.11), it follows that the series on the left side converges for every $r, h \in \mathbb{R}$ and is an odd function of $rh$. If $rh < 0$, then all its terms are positive, implying that it converges absolutely.
Note that the upper bound (24) is not suitable for a simple estimation of $R_n$ here because $\lim_{k\to\infty} s_{k+1} = 0$; hence $Q(k+1, q) \leq 1$ was implicitly used instead. $Q(a, x)$ is an increasing function of $a$. The ratio involving $Q(k+1, q)$ can be greater than 1, e.g. if $h = 3$ and $r = 1$.

Proof. Equations (4), (5), (29) and (34) imply (37).
Note that $\frac{\bar r}{1+\bar r^2} = \frac{r}{1+r^2}$ and $\bar q = \frac{1}{2}(1+\bar r^2)\bar h^2 = \frac{1}{2}(1+r^2)h^2 = q$, therefore both are invariant under the transformation $rh \to \bar h$ and $\frac{1}{r} \to \bar r$, but $S_n$, $R_n$ and $B_n$ from Theorems 3 and 4 are not: they transform to $\bar S_n$, $\bar R_n$ and $\bar B_n$. By transforming the parameters and using (17), the equations (30) and (35) can be rewritten accordingly, where $\beta = 0$ if $r \geq 0$ and $\beta = \frac{1}{2}$ if $r < 0$. The transformation makes sense if $|r| > 1$, implying $\bar\varrho^2 < \frac{1}{2}$ and faster convergence. For example, the inequality $0 \leq R_n \leq B_n$ from Theorem 4 transforms into $0 \leq \bar R_n \leq \bar B_n$, where the upper bound $\bar B_n$ is $r^{2n+4}$-times smaller than the corresponding upper bound $B_n$. The noteworthy cost of faster convergence is the additional computation of $\Phi(rh)$.
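The two invariants are easy to verify numerically. A small Python sketch (illustrative only; the transformation $\bar h = rh$, $\bar r = 1/r$ is the one described above):

```python
import math

def to_bar(h, r):
    """Transform (h, r) -> (h_bar, r_bar) = (r*h, 1/r)."""
    return r * h, 1.0 / r

def weight(r):
    """The invariant factor r / (1 + r^2)."""
    return r / (1.0 + r * r)

def q_of(h, r):
    """The invariant q = (1 + r^2) * h^2 / 2."""
    return 0.5 * (1.0 + r * r) * h * h

h, r = 1.7, 2.5
hb, rb = to_bar(h, r)
# r/(1+r^2) and q are invariant under the transformation
assert math.isclose(weight(r), weight(rb), rel_tol=1e-12)
assert math.isclose(q_of(h, r), q_of(hb, rb), rel_tol=1e-12)
```

Algebraically, $\frac{1/r}{1 + 1/r^2} = \frac{r}{1+r^2}$ and $\frac{1}{2}(1 + \frac{1}{r^2})(rh)^2 = \frac{1}{2}(1+r^2)h^2$, which is exactly what the assertions confirm up to rounding.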
Note that when truncating the series (40a) and (40b), the truncation error in absolute value can be estimated by (31) and (36) respectively. The same upper bounds with $\frac{1}{r}$ instead of $r$ also apply to (40c) and (40d).

Recursion and asymptotic speed of convergence
Let $\tilde S = \frac{r S}{2\pi(1+r^2)}$ and $\tilde e_k = B_k$, where $S$ and $B_k$ relate to Theorem 4. Using (34) and (28), the series $\tilde S$ can be calculated by Recursion 1 (tildes are omitted). Analogously, let $\hat S = \frac{r S}{2\pi(1+r^2)}$ and $\hat e_k = B_k$, where $S$ and $B_k$ relate to Theorem 3. The series $\hat S$ can be calculated by Recursion 2 (hats are omitted). In practice, both recursions can easily be combined, since only four steps, marked with ▷, are different.
In both cases, given $\epsilon > 0$, the recursion is terminated when the condition $e_{k+1} < \epsilon$ is met, and $S_{k+1}$ is taken as its result. However, if $\epsilon > 0$ is too small, or even intentionally negative, the calculation proceeds up to the actual accuracy threshold: the recursion is terminated when the condition $S_{k+1} \leq S_k$ is met. This criterion was used in the calculations presented in the next section.
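The two-level termination rule can be sketched on any series with positive, eventually geometrically decreasing terms. The actual Recursions 1 and 2 sum different series, so the Python sketch below is only a structural illustration, applied to Euler's arctangent series for $r > 0$; with eps = 0 it runs to the accuracy threshold of double precision, exactly as described above:

```python
import math

def euler_arctan_adaptive(r, eps=0.0):
    """Sum Euler's arctangent series for r > 0 with the two-level
    termination rule: stop when the current term drops below eps, or
    when adding it no longer changes the partial sum (the actual
    accuracy threshold of the arithmetic)."""
    t = r / (1.0 + r * r)                 # term e_k, starting at k = 0
    s_prev, s = 0.0, 0.0
    k = 0
    while True:
        s_prev, s = s, s + t
        if t < eps or s <= s_prev:        # the two termination rules
            break
        t *= (2 * k + 2) / (2 * k + 3) * r * r / (1.0 + r * r)
        k += 1
    return s, k + 1                       # result and number of terms

val, n_iter = euler_arctan_adaptive(0.9)          # eps = 0: accuracy limit
assert abs(val - math.atan(0.9)) < 1e-14
val2, n2 = euler_arctan_adaptive(0.9, eps=1e-6)   # coarser tolerance
assert n2 < n_iter and abs(val2 - math.atan(0.9)) < 1e-5
```

Note the behaviour at $r = 0$: the first term is 0, so $S_1 \leq S_0$ triggers immediately and the loop returns 0, mirroring the remark below that the recursion then terminates with $S_1 = 0$.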
In both recursions, all variables are positive and bounded above. If $q \neq 0$, the sequence $\{d_k\}$ strictly increases and, due to (24), converges to 1. The sequence $\{S_k\}$ is strictly increasing if $r \neq 0$. If $r = 0$, the recursion terminates with $S_1 = 0$. Note that by (7), the sequence $\{S_k\}$ changes its sign but keeps the direction. However, the modified Recursion 2 computes $\hat S = \frac{r S}{2\pi(1+r^2)} - \frac{\arctan r}{2\pi}$.

Recursion 1
Let $\tilde S_n$ and $\hat S_n$ be the results of Recursion 1 and modified Recursion 2 respectively, calculated with the same $h$ and $r$ if $|r| \leq 1$, or with $\bar h$ and $\bar r$ if $|r| > 1$. Let $\tilde \Phi_\varrho^2(h, 0)$ be calculated by (35) or (39), where the unknown remainders are ignored. Analogously, let $\hat \Phi_\varrho^2(h, 0)$ be calculated by (30) or (38), where $\arctan r$ is also taken into account. One of $\tilde \Phi_\varrho^2(h, 0)$ and $\hat \Phi_\varrho^2(h, 0)$ underestimates and the other overestimates $\Phi_\varrho^2(h, 0)$. Consequently, the upper bounds (31) and (36) are not needed if $\tilde S$ and the modified $\hat S$ are calculated in the same recursion, which terminates when $|\hat S_{k+1} - \tilde S_{k+1}|$ becomes small enough. Upper bounds are also not needed if in Recursions 1 and 2 step 17 is replaced so that $S_{k+1} \leq S_k$ is the only termination condition. In this case, steps 3, 8 and 15 are unnecessary.
Given $q$, which of the series (29) and (34) converges faster? Since $P(k+1, x)$ is the cdf of the gamma distribution with median $\nu \in (k + \frac{2}{3}, k + 1)$, a comparison of the terms follows. The answer depends on the number of considered terms, which depends on the prescribed accuracy. On this basis, it seems that series (29) is more suitable than (34) for "small" $q$ and vice versa for "large" $q$. Tables 1 and 4 in the next section show how this is reflected in practice.
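The median bound used above can be verified directly. For integer shape $a = k+1$ the lower regularized gamma function has the exact closed form $P(k+1, x) = 1 - e^{-x}\sum_{j=0}^{k} x^j/j!$, so the statement $\nu \in (k + \frac{2}{3}, k + 1)$ is equivalent to $P(k+1, k+\frac{2}{3}) < \frac{1}{2} < P(k+1, k+1)$. A Python sketch:

```python
import math

def P(a, x):
    """Lower regularized gamma function P(a, x) for integer a:
    P(a, x) = 1 - exp(-x) * sum_{j=0}^{a-1} x^j / j!"""
    term, s = 1.0, 1.0                    # j = 0 term
    for j in range(1, a):
        term *= x / j
        s += term
    return 1.0 - math.exp(-x) * s

# the median nu of the Gamma(k+1, 1) distribution satisfies
# k + 2/3 < nu < k + 1, i.e. P(k+1, k+2/3) < 1/2 < P(k+1, k+1)
for k in range(0, 50):
    assert P(k + 1, k + 2.0 / 3.0) < 0.5 < P(k + 1, k + 1.0)
```

For example, at $k = 0$ this reduces to $1 - e^{-2/3} \approx 0.487 < \frac{1}{2} < 1 - e^{-1} \approx 0.632$.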
The series (29), or its corresponding transformed version, unwritten and based on (40c), converges asymptotically at a rate governed by $k$ and $1-\varrho^2$. On the other hand, since (24) implies $\lim_{k\to\infty} Q(k + 1, x) = 1$, the series (34), or its corresponding transformed version, unwritten and based on (40d), has its own asymptotic rate. Based on both asymptotic convergences alone, comparing (34) to (29) would be unfair, because the calculation of $\arctan |r|$ should also be taken into account for (29). Since it is implicitly embedded in Recursion 1, it is calculated on the fly at the cost of a worse speed of convergence. Each of the recursions is executable with the basic four arithmetic operations, two calls of elementary functions (if the calculation of $r$ from $\varrho$ is also considered), and additionally $\arctan|r|$ in the modified Recursion 2. However, the computational cost of $\Phi(h)$ (and $\Phi(rh)$ if needed) should also be taken into account, but it is the same in (29) and (34).

Numeric issues and testing results
A common cause of numerical instability, where two absolutely large values give an absolutely small result when added or subtracted, cannot occur in Recursions 1 and 2. The calculation by them is stable, but something may happen that needs to be pointed out. The side effect of the transformation $rh \to \bar h$ and $\frac{1}{r} \to \bar r$ is that $|\bar h|$ can become unusually large. When computing $\Phi(h)$, according to Marsaglia (2004, p. 2), "[...] the region of interest may not be the statistician's $0 < x < 5$, say, but rather $10 < x < 14$ [...]". Assume the same needs when computing $\Phi_\varrho^2(h, 0)$, using transformed parameters if $\varrho^2 > \frac{1}{2}$. If $h = 14$ and $\varrho = \frac{10}{\sqrt{1+10^2}} \approx 0.995$, then $r \approx 10$, $\bar r \approx 0.1$ and $\bar h \approx 140$, implying that the region of interest could be $|\bar h| < 140$. Even the usual $|h| < 5$ transformed to $|\bar h| < 50$ is well beyond the established limits.
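In double precision, one standard way to keep relative accuracy deep in the tail is to route $\Phi$ through the complementary error function, $\Phi(h) = \frac{1}{2}\,\mathrm{erfc}(-h/\sqrt{2})$, rather than through $1 - \Phi(-h)$. A Python sketch (illustrative; the Phi2rho package itself relies on R's and Rmpfr's normal cdf):

```python
import math

def Phi(h):
    """Standard normal cdf, stable far into the tails via erfc."""
    return 0.5 * math.erfc(-h / math.sqrt(2.0))

# the lower tail at h = -14 is still represented ...
assert 7.7e-45 < Phi(-14.0) < 7.9e-45
# ... while the complementary route 1 - Phi(14) loses it entirely,
# because Phi(14) rounds to exactly 1.0 in double precision
assert 1.0 - Phi(14.0) == 0.0
```

This is exactly the regime Marsaglia points at: at $h = 14$ the tail probability is of order $10^{-45}$, far below the $2^{-53}$ resolution around 1.0, so any formula that reaches the tail through $1 - \Phi(-h)$ returns 0 with total loss of relative accuracy.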
The "problem" of too large $|\bar h|$ is mentioned because we have to be aware of it if we are going to perform high-precision computation. In any case, the recursion must be complemented by a suitable computation of $\Phi(h)$ and, if needed, $\arctan r$. As an example that not all implementations are like that, the computation by the well known series for $\Phi$ (Abramowitz and Stegun 1972, p. 932, 26.2.10) in standard double precision arithmetic is numerically unstable even for moderately large $|h|$; e.g., using the programming language R (R Core Team 2023), for $h = 10$ we get a completely wrong result, approximately $-1683$. A similar problem can lead to a significant decrease in accuracy if $T(h, r)$ is calculated with (7), as will be seen in Subsection 6.1.
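The instability is easy to reproduce. The sketch below assumes the cited expansion is the alternating Maclaurin-type series $\Phi(h) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\sum_{n\geq 0} \frac{(-1)^n h^{2n+1}}{2^n n!\,(2n+1)}$ (an assumption about the exact form of 26.2.10); in double precision it is accurate near zero but, at $h = 10$, the alternating terms grow to about $10^{19}$ before cancelling, so the rounding debris dwarfs the true answer $1 - 7.6\cdot 10^{-24}$. Python for illustration; R behaves analogously:

```python
import math

def Phi_maclaurin(h, nterms=400):
    """Alternating Maclaurin-type series for Phi(h): fine near 0,
    catastrophic cancellation for large |h| in double precision."""
    s = 0.0
    t = h                                  # n = 0 term of h^(2n+1)/(2^n n!)
    for n in range(nterms):
        s += t / (2 * n + 1)
        t *= -h * h / (2.0 * (n + 1))      # advance to n + 1, with sign
    return 0.5 + s / math.sqrt(2.0 * math.pi)

# near zero the series is excellent ...
assert abs(Phi_maclaurin(1.0) - 0.8413447460685429) < 1e-12
# ... but at h = 10 cancellation destroys the result completely
assert abs(Phi_maclaurin(10.0) - 1.0) > 1e-3
```

The largest intermediate terms are of order $10^{19}$, so even a relative rounding error of $2^{-53}$ per term leaves absolute debris of order $10^{3}$, consistent with the grossly wrong value reported above for R.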
During testing, the values of $T(h, r)$ and $\Phi_\varrho^2(x, y)$ were computed for a selected and a randomly generated set of parameters. Reference values were computed using the accelerated tetrachoric series and quadruple (128-bit) precision. Absolute differences between tested and reference values are reported as absolute errors. The maximum absolute error is considered a measure of accuracy. Since the mean values and some quantiles are also important, they are shown in the tables in Subsection 6.2. An integral part of the testing was also a comparison of the results obtained with the Phi2rho package and competing packages available on CRAN.

Numeric issues and results when computing T (h, r)
Numerical calculations of $T(h, r)$ were performed on 39,999 grid points $(h, r)$, where $h$ ranges from $-10$ to $10$ in increments of 0.1, $r = \frac{\varrho}{\sqrt{1-\varrho^2}}$, and $\varrho$ ranges from $-0.99$ to $0.99$ in increments of 0.01. Since for $\varrho = \pm 0.99$ we get $r \approx \pm 7$, the actual range of $h$ for which $\Phi(h)$ must be computed is approximately from $-70$ to $70$, so well beyond $h_{\max}(0) \approx 37.64$.
Note that atanExt = no in the tables means that Recursion 1 was executed, hence (40b) was used if $|r| \leq 1$ and (40d) if $|r| > 1$, while atanExt = yes means that Recursion 2 was executed, hence (40a) was used if $|r| \leq 1$ and (40c) if $|r| > 1$. Both recursions are well suited for vectors, so each of the parameters $h$ and $r$ can be a scalar or a vector. If both are vectors, they must be of the same length; if one of $h$ and $r$ is a vector and the other a scalar, the latter is replicated to the length of the former. Since the number of iterations is determined by the "worst" component, we might expect execution to be faster if function calls are made in a loop for each component of the vector separately, but usually the run time significantly increases due to the overhead of the loop, additional initializations, etc.
For the consistency test, $T(h, r)$ was calculated on the test grid with $h$ and $r$ both scalars, both vectors, and one a scalar and the other a vector. For all three calculations with Recursion 1, the results matched perfectly, as they did for all three with Recursion 2. The maximum absolute difference between calculations with Recursions 1 and 2 was approximately $9.49 \cdot 10^{-17}$.
When testing accuracy, the tetrachoric and Vasicek's series, original and accelerated, were also tested, using $T(h, r) = \Phi_\varrho^2(h, 0) - \frac{1}{2}\Phi(h)$. By transforming the parameters, where appropriate, both comparison methods gained a lot, as can be concluded from Table 1. The absolute errors, whose maximum values are collected in this table, were calculated against the reference values computed with the accelerated tetrachoric series and 128-bit precision. Since vector computations were performed, there are two average numbers of iterations. The first is empirical, because the iterations were terminated according to the "worst" component of the vector; the second is hypothetical, as if the computations were performed for each component separately. Since all iterations ran to the accuracy limit and with an unbounded number of iterations, the original Vasicek's series needed an enormous number of iterations, not only in the worst case but also on average, when calculating with scalar $h$ and vector $r$. However, it is not intended for cases with $\varrho^2 < \frac{1}{2}$, and its error can be well estimated, which was ignored here.
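The identity $T(h, r) = \Phi_\varrho^2(h, 0) - \frac{1}{2}\Phi(h)$ used in this comparison can be cross-checked independently of any of the series, by evaluating both sides with brute-force quadrature. The Python sketch below uses composite Simpson integration of Owen's defining integral for $T$ and of a one-dimensional conditioning integral for $\Phi_\varrho^2(h, 0)$; tolerances are loose because the quadrature itself is only accurate to about $10^{-9}$:

```python
import math

def phi(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) panels; signed if b < a."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + (2 * i + 1) * h) for i in range(n // 2))
    s += 2.0 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3.0

def owen_T(h, a):
    """Owen's T function by quadrature of its defining integral."""
    f = lambda x: math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    return simpson(f, 0.0, a) / (2.0 * math.pi)

def Phi2_h0(h, rho):
    """Phi^2_rho(h, 0) as one integral: condition on the second variable."""
    c = math.sqrt(1.0 - rho * rho)
    f = lambda t: phi(t) * Phi((h - rho * t) / c)
    return simpson(f, -12.0, 0.0)

for h, rho in ((1.0, 0.5), (2.0, -0.7), (0.3, 0.9)):
    r = rho / math.sqrt(1.0 - rho * rho)
    assert abs(owen_T(h, r) - (Phi2_h0(h, rho) - 0.5 * Phi(h))) < 1e-8
# closed form at h = 0: T(0, r) = arctan(r) / (2*pi)
assert abs(owen_T(0.0, 0.75) - math.atan(0.75) / (2.0 * math.pi)) < 1e-12
```

At $h = 0$ the identity collapses to $T(0, r) = \frac{\arctan r}{2\pi} = \frac{\arcsin \varrho}{2\pi}$, which the last assertion checks in closed form.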

Numeric issues and results when computing Φ 2 ϱ (x, y)
In all $\Phi_\varrho^2(x, y)$ computational methods based on (4) and (5), or (13) and (14), the exceptional cases with $|\varrho| = 1$, $x \neq 0$ and $x - y\,\mathrm{sgn}(\varrho) = 0$ are not problematic because (3) can be used; however, nearby cases are their weak point. Recall that $q_x = q_y = \frac{x^2 - 2\varrho xy + y^2}{2(1-\varrho^2)} = q$ and use (10) and (2). If $|\varrho|$ is close to 1 and $x - y\,\mathrm{sgn}(\varrho) \simeq 0$, then $x^2 - 2\varrho xy + y^2 \simeq 0$ and $q$ may be small enough that $\frac{\partial}{\partial\varrho} \Phi_\varrho^2(x, y)$ is moderate or even large. As a result, even a small rounding error in the calculation of key variables that depend on $\varrho$ can cause a much larger error in the result. The calculation of $r_x$ and $r_y$ with (14) is particularly problematic. In a similar situation, Meyer (2013, p. 5) recommends an equivalent but less sensitive calculation by (41), and analogously for $r_y$. In our case, his proposal improves accuracy, but does not solve the problem sufficiently.
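Meyer's exact formula (41) is not reproduced here, but the underlying idea can be illustrated. When $\varrho \approx 1$ and $x \approx y$, a numerator of the form $x - \varrho y$ cancels catastrophically, while the algebraically equivalent rewriting $(x - y) + (1 - \varrho)y$ keeps the small result to nearly full precision. The rewriting shown is a standard rearrangement and only an assumed stand-in for (41); the Python sketch compares it against an exact rational reference:

```python
from fractions import Fraction

# naive x - rho*y cancels catastrophically when rho ~ 1 and x ~ y;
# (x - y) + (1 - rho)*y computes the same quantity without the
# large cancelling parts (x - y and 1 - rho are exact by Sterbenz)
x, y, rho = 0.7000000001, 0.7, 1.0 - 1e-12

naive = x - rho * y
rewritten = (x - y) + (1.0 - rho) * y

# exact reference on the stored double values
exact = Fraction(x) - Fraction(rho) * Fraction(y)
# the rewriting is exact in exact arithmetic ...
assert exact == (Fraction(x) - Fraction(y)) + (1 - Fraction(rho)) * Fraction(y)
# ... and in doubles it stays within ~1e-26 of the exact value
assert abs(Fraction(rewritten) - exact) < Fraction(1, 10**22)
```

The naive form loses accuracy because $\varrho y$ is rounded at magnitude $\approx 0.7$ before the subtraction leaves a result of magnitude $\approx 10^{-10}$; in the rewritten form every intermediate quantity is already small, so no large rounding error can contaminate the result.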
The problem could be solved by a hybrid computation along the lines of some other methods. In this case, testing was performed on a randomly generated set of parameters. After setting the random generator seed to 123 to allow replication and independent verification, one million $x$ values, one million $y$ values, and one million $\varrho$ values were drawn from uniform distributions on the intervals $(-10, 10)$, $(-10, 10)$, and $(-1, 1)$ respectively.
The absolute errors, whose statistics are collected in Table 2, were calculated against reference values computed with the accelerated tetrachoric series as described, but with 128-bit precision. From a user's point of view, only the parameters $x$, $y$ and $\varrho$ needed to be converted into 128-bit "mpfr-numbers" before calling Phi2xy( x, y, rho, fun = "tOwenT" ), hence x <- mpfr( x, precBits = 128 ) and analogously for $y$ and $\varrho$. In this case, the computation took significantly more time, since all intermediate and final results were 128-bit "mpfr-numbers". In this test, $\varrho_{\min} \approx -0.99999991$, $\varrho_{\max} \approx 0.99999775$, there are 5 cases with $\varphi_\varrho^2(x, y) > 1$, and $\max \varphi_\varrho^2(x, y) \approx 1.38$. In order to better check the accuracy when $|\varrho|$ is close to 1, the computation was repeated with the same $x$ and $y$, and $\varrho^* = 2\,\Phi(8\varrho) - 1$. In this case, more than half of the $|\varrho^*|$ are greater than 0.9999 and none is equal to 1. The results are in Table 3. In this test, there are 178 cases with $\varphi_{\varrho^*}^2(x, y) > 1$ and $\max \varphi_{\varrho^*}^2(x, y) \approx 376$. For both novel series in Table 3, the maximum absolute error is $1.54 \cdot 10^{-14}$ if (41) is not used, improving to $1.21 \cdot 10^{-15}$ if Meyer's advice is followed. The same holds similarly for the other two series.
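The claim that more than half of the $|\varrho^*|$ exceed 0.9999 can be checked directly: $2\Phi(8\varrho) - 1$ crosses 0.9999 at $|\varrho| \approx 0.486 < \frac{1}{2}$, so a uniformly distributed $\varrho$ lands above the threshold with probability about 0.514. A short Python verification by bisection (illustrative only):

```python
import math

def Phi(t):
    return 0.5 * math.erfc(-t / math.sqrt(2.0))

def rho_star(rho):
    """The transformation rho* = 2*Phi(8*rho) - 1 used in the test."""
    return 2.0 * Phi(8.0 * rho) - 1.0

# bisect for the |rho| above which |rho*| exceeds 0.9999
lo, hi = 0.0, 1.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if rho_star(mid) < 0.9999:
        lo = mid
    else:
        hi = mid

# the threshold is below 0.5, so for rho ~ U(-1, 1) more than half
# of the transformed |rho*| values are greater than 0.9999
assert hi < 0.5
assert rho_star(hi) >= 0.9999 > rho_star(lo)
```

By symmetry the same threshold applies to negative $\varrho$, so the fraction of critical cases is $1 - 0.486 \approx 0.514$, i.e. slightly more than one half, in agreement with the test design described above.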
Due to the time-consuming 128-bit precision computation, the presented tests are the only tests based on the 128-bit precision reference values. However, additional tests were made by comparing the results of Phi2xy() with the results obtained by other functions from packages available on CRAN. For three functions, the accuracy was found to be worse. The OwenQ::OwenT() function also proved to be equivalent to Phi2xy() as measured by the maximum absolute error on the test sets of triplets, but only when upgraded to compute $\Phi_\varrho^2(x, y)$ in the manner used in Phi2xy(). This function and Phi2rho::OwenT() are essentially wrappers for the internal functions tOwenT(), vOwenT() and mOwenT(), which as workhorses compute the series.
In terms of reliability, stability and accuracy of the new methods, no problems or significant differences across the parameters from the different data sets were detected, except those already described and related to $|\varrho| \approx 1$ and $x - y\,\mathrm{sgn}(\varrho) \simeq 0$. To eliminate them, the equation (41) was derived, which also proved to be successful when using the OwenQ::OwenT() function, upgraded to compute $\Phi_\varrho^2(x, y)$. Regarding parameters from different data sets, the only big difference detected is in the number of iterations, as can be seen from Figure 1 in the next subsection.
Two million $\Phi_\varrho^2(x, y)$ computations on Windows 10 and 64-bit R 4.3.1, on an old Intel i7-6500U CPU @ 2.50 GHz with 8 GB RAM, using only one thread of four, lasted 26, 31 and 286 seconds for Phi2xy(), the upgraded OwenQ::OwenT() and pmvnorm() with algorithm = TVPACK respectively. The 128-bit precision reference values computation took over 17 hours, compared to 43 seconds for the double precision one. Based on Tables 1 and 4, we can conclude that the increased number of iterations has only an insignificant effect on the huge time extension factor.

High-precision computation
All functions in the Phi2rho package are ready to use the Rmpfr package, which enables the use of arbitrary precision numbers instead of double precision ones and provides all the high-precision functions needed. It interfaces R to the widely used C library MPFR (Fousse, Hanrot, Lefèvre, Pélissier, and Zimmermann 2007). Assuming that the Rmpfr package is loaded, the functions should simply be called with parameters that are "mpfr-numbers" of the same precision. All 128-bit precision benchmark values used in Subsections 6.1 and 6.2 were calculated using Rmpfr.
To get a sense of how the number of iterations depends on the required precision, the test from Subsection 6.1 was partially repeated using 128-bit precision computation. The results are collected in Table 4. Note that $2^{-128} \approx 2.94 \cdot 10^{-39}$ and that in this case the values in the table are not the maximum absolute errors, because the comparison values were computed with the same precision. From Tables 1 and 4 it can be concluded that there is no significant difference in accuracy between the compared series if the accelerated series are considered, but one of the new series converges significantly faster than the others. It can also be concluded that the Vasicek's series with $\bar\varrho$ instead of $\varrho$ is a reasonable alternative to the tetrachoric series (8) also for $\varrho^2 < \frac{1}{2}$, and remains a reasonable alternative for $\varrho^2 > \frac{1}{2}$ if compared to the transformed tetrachoric series (9).
The numbers of iterations for the computation with the new series with double and 128-bit precision can be compared in Figure 1. Only the first quadrant is presented because the others are its mirror images. From Tables 5 and 6 it can be assumed that the advantage increases with increasing precision. For 53-bit precision (double precision), it required a half, and for 1024-bit precision only a fifth, of the iterations demanded by the competing series. However, the cost of calculating $\arctan r$ is not taken into account. In connection with this, let us mention two more questions.
Recalling that $e^{-x} I_n(x) = P(n+1, x)$, the Taylor series of the arctangent function and (7) imply a relation which Fayed and Atiya (2014a, p. 241) have already noticed. If the above series, (29) and (34), are viewed as functions of the variable $h$, they have a similar structure, but the coefficients, depending on $r$, belong to different arctangent series. Whether the similarity could be explained by an Euler transform similar to the one which transforms the Taylor arctangent series into the Euler's arctangent series seems an interesting question, but it was not deeply explored.
In Recursion 1, the external calculation of the arctangent function is avoided, under the assumption that the Euler's series (19) would be used for it, and with the expectation that there would be no difference in the number of iterations between Recursions 1 and 2 if those needed for calculating the arctangent were also taken into account. However, the use of an external arctangent calculation allows the use of already known, and possibly future, faster converging series. Such is the case with the series (42), which is based on the Taylor series of $\arcsin r$ and $\arctan r = \arcsin \frac{r}{\sqrt{1+r^2}}$. Indeed, the number of iterations is not significantly smaller if $|r| \leq 1$ and the accuracy obtained by double precision computation is sufficient, and even a square root must be calculated, but it could be significantly smaller in high-precision computation. The Euler's series (19) is included in (Abramowitz and Stegun 1972) and (Olver et al. 2010), but (42) is not, even though it converges faster. It can be found in (Gradshteyn and Ryzhik 2015, p. 61, 1.644 1.) as a series in powers of $\frac{r^2}{1+r^2}$, with a reference to a 1922 source. Due to the equations $(2k-1)!! = \frac{(2k)!}{2^k k!}$ and $(2k)!! = 2^k k!$, the coefficients are the same as those in (42). Whether we could also find a corresponding series for the Owen's T function, which would converge faster than (40b), remains an open challenge.
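The difference between (19) and the $\arcsin$-based series can be seen in a high-precision experiment. The sketch below uses Python's decimal module as a stand-in for Rmpfr; the form of the $\arcsin$-based series, $\arctan r = \sum_{k\geq 0} \frac{(2k-1)!!}{(2k)!!\,(2k+1)} \left(\frac{r}{\sqrt{1+r^2}}\right)^{2k+1}$, is an assumption reconstructed from the text. Both series share the geometric quotient $\frac{r^2}{1+r^2}$, but the coefficients of the second decay faster, so it never needs more terms:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60                    # working precision: 60 digits

def euler_terms(r, tol):
    """Euler's series (19): arctan r = sum (2k)!!/(2k+1)!! r^(2k+1)/(1+r^2)^(k+1)."""
    q = r * r / (1 + r * r)
    t = r / (1 + r * r)                   # k = 0 term
    s, k = Decimal(0), 0
    while abs(t) > tol:
        s += t
        t *= q * Decimal(2 * k + 2) / Decimal(2 * k + 3)
        k += 1
    return s, k

def arcsin_terms(r, tol):
    """arcsin-based series: arctan r = sum (2k-1)!!/((2k)!!(2k+1)) x^(2k+1),
    with x = r / sqrt(1 + r^2)."""
    x = r / (1 + r * r).sqrt()
    q = x * x
    t = x                                 # (2k-1)!!/(2k)!! x^(2k+1) at k = 0
    s, k = Decimal(0), 0
    while abs(t) > tol:
        s += t / Decimal(2 * k + 1)
        t *= q * Decimal(2 * k + 1) / Decimal(2 * k + 2)
        k += 1
    return s, k

r, tol = Decimal(1), Decimal(10) ** -50
s1, n1 = euler_terms(r, tol)
s2, n2 = arcsin_terms(r, tol)
# both converge to arctan(1) = pi/4, to ~50 digits
quarter_pi = Decimal("0.78539816339744830961566084581987572104929234984377")
assert abs(s1 - quarter_pi) < Decimal(10) ** -45
assert abs(s2 - quarter_pi) < Decimal(10) ** -45
assert n2 <= n1   # same geometric ratio, smaller coefficients
```

For $r = 1$ both quotients equal $\frac{1}{2}$, so the iteration counts are close, matching the observation above that the gain in double precision is modest; the coefficient advantage of the $\arcsin$-based form only pays off when many more digits are requested.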

$\bar S$ is $S$ with $\bar c_k$ instead of $c_k$, and $\beta = 0$ if $xy > 0$, or $xy = 0$ and $x + y \geq 0$, and $\beta = \frac{1}{2}$ otherwise.

Table 3: $\Phi_{\varrho^*}^2(x, y)$ computation, absolute error comparison.

Regarding accuracy, as measured by the maximum absolute error, only the function pmvnorm() with the parameter algorithm = TVPACK from the mvtnorm package proved to be equivalent to Phi2xy(), and even slightly more accurate, with maximum absolute errors of $2.58 \cdot 10^{-16}$ and $2.19 \cdot 10^{-16}$ on the test sets of triplets $(x, y, \varrho)$ and $(x, y, \varrho^*)$ respectively. The same function with algorithm = GenzBretz achieved $2.58 \cdot 10^{-16}$ and $2.20 \cdot 10^{-11}$.