New Probability Distributions in Astrophysics: IX. Truncation for Exponential, Half Gaussian and Sech-Square Distributions with Application to the Galactic Height
1. Introduction
In astrophysics, it is common practice to model the gradient of a given quantity, for example the vertical galactic height, with an exponential distribution. Two other distributions of astrophysical interest are the sech-square distribution, which corresponds to an isothermal self-gravitating disk, see equation (41) in [1], the equation at page 441 in [2], equations (9.8) and (14.6) in [3], and equation (2.31) in [4], and the normal or Gaussian distribution, see equation (5) in [5]. In probability theory, the truncation of a distribution is a common topic of research, and we report some of the approaches to the double truncation of distributions defined on the usual interval, $[0,\infty)$. The doubly truncated exponential distribution has been analyzed in [6]. In the field of astrophysics, the following truncated distributions have been analyzed: the Pareto distribution, with application to the masses of stars and asteroids [7], the left truncated beta distribution, with application to the masses of the stars [8], the doubly truncated gamma distribution, with application to the masses of the stars [9], the doubly truncated lognormal distribution, with application to the masses of the stars [10], the doubly truncated Lindley distribution, with applications to the masses of the stars, to the luminosity function for galaxies, and to the photometric maximum in the distribution of galaxies [11], the doubly truncated generalized gamma distribution, with application to the luminosity functions for galaxies and quasars and to the average magnitude of galaxies as a function of the redshift [12], the doubly truncated Lindley family, with applications to the luminosity function for galaxies and quasars [13], the truncated Maxwell-Boltzmann distribution, with applications to a numerical relation between the root-mean-square speed and temperature and to a modification of the formula for the Jeans escape flux of molecules from an atmosphere [14], the relativistic Maxwell-Boltzmann distribution, with applications to the synchrotron emission in the presence of a magnetic field and to relativistic electrons [15], the truncated Weibull distribution, with applications to the masses of the stars and to the luminosity functions for galaxies and quasars [16], the truncated two-parameter Sujatha distribution [17], the gamma-Pareto distribution, with application to cosmic rays [18], and the truncated Weibull-Pareto distribution, with applications to the initial mass function for stars, the luminosity function for galaxies of the Sloan Digital Sky Survey, the luminosity function for QSO, and the photometric maximum of galaxies of the 2MASS Redshift Survey.
Section 2 of this paper analyses the exponential, the half-normal and the sech-square distributions defined in the interval $[0,\infty)$. Section 3 is dedicated to the truncation of these distributions. Section 4 applies the results to the distribution of the vertical galactic height for both open clusters and Gaia’s stars.
2. Usual Case
In this section, we review the following distributions: the exponential, the Half-Normal, and the Sech-square distributions. The aim is to evaluate which distribution produces the best fit in modeling the galactic heights of open clusters and Gaia’s stars. The word “approximate” indicates that the result in question is not an analytical result.
2.1. The Exponential
A random variable X which takes values in $[0,\infty)$ is said to be exponentially distributed if its distribution function (DF) is
$$F(x;b) = 1 - e^{-x/b} \tag{1}$$
and the probability density function (PDF) is
$$f(x;b) = \frac{1}{b}\, e^{-x/b} \tag{2}$$
where b is the scale parameter, see [19] [20]. The average value or mean, $\mu$, is
$$\mu = b \tag{3}$$
the variance, $\sigma^2$, is
$$\sigma^2 = b^2 \tag{4}$$
and the median is at
$$x = b \ln 2 \tag{5}$$
Random generation of the exponential variate X is given by
$$X = -b \ln R \tag{6}$$
where R is the unit rectangular variate. The parameter b is estimated by the average value of the sample, $\bar{x}$,
$$b = \bar{x} \tag{7}$$
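As an illustration of the generation and estimation steps for the exponential case, the following minimal sketch draws a sample by inverse transform and recovers the scale from the sample mean. The function names, the seed and the value b = 100 are assumptions chosen for the example, not values taken from the data analysed in this paper.

```python
import math
import random
import statistics


def sample_exponential(b, size, rng):
    """Inverse-transform sampling of an exponential variate with scale b."""
    # 1 - rng.random() lies in (0, 1], which avoids taking log(0).
    return [-b * math.log(1.0 - rng.random()) for _ in range(size)]


def estimate_scale(sample):
    """Estimate b as the average value of the sample."""
    return statistics.fmean(sample)


rng = random.Random(42)          # fixed seed, illustrative only
b_true = 100.0                   # hypothetical scale
sample = sample_exponential(b_true, 20000, rng)

b_hat = estimate_scale(sample)
median_theory = b_hat * math.log(2.0)      # median of the fitted exponential
median_sample = statistics.median(sample)
print(b_hat, median_theory, median_sample)
```

The sample median and the value b ln 2 should agree to within the sampling noise, which gives a quick consistency check on the fitted scale.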
2.2. Half-Normal
Let X be a random variable defined on
; its one-parameter Half-Normal PDF is
(8)
where
is the shape parameter [21].
Its DF is
(9)
where $\mathrm{erf}(x)$ is the error function, defined by
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \tag{10}$$
The DF has the following power series representation
(11)
The rth moment about the origin is
(12)
where r is an integer and
$$\Gamma(z) = \int_0^\infty e^{-t}\, t^{z-1}\,dt \tag{13}$$
is the gamma function, see formula (5.2.1) in [22]. The average value or mean,
, is
(14)
the variance,
, is
(15)
The skewness is
(16)
and the kurtosis
(17)
The median does not have an analytical expression but can be expressed approximately. The Winitzki approximation for the median, see Equation (A.2), gives
(18)
The Menzel approximation for the median, see Equation (A.1), gives
(19)
The median in the case of the Padé Approximant of order (4.2), see Equation (A.3), is the solution of the following approximate equation
(20)
A different method reverts the series (11), see page 16 in [23],
(21)
where y is now the approximate DF; the median is obtained by inserting
. Table 1 reports the percent error of the four methods here implemented.
Random generation of the variate X for the Half-Normal is obtained with the Box-Muller method, in practice the FORTRAN subroutine gasdev [24], retaining only the positive values. The parameter b is obtained from the average value of the sample, $\bar{x}$,
(22)
Table 1. Percent error,
, for approximating the median of the error function when
.
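The median approximations and the estimate of b can be sketched numerically as follows. The sketch assumes the common parameterization of the half-normal in which the DF is erf(x/(b√2)) and the mean is b√(2/π); this convention, the function names and the numerical values are assumptions and may differ from the expressions used in this paper. The Menzel form erf(u) ≈ √(1 − exp(−4u²/π)) is inverted in closed form, the reference median is found by bisection, and Python's random.gauss stands in for gasdev, with the absolute value taken instead of discarding negative deviates.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def half_normal_cdf(x, b):
    """DF of the half-normal under the assumed convention erf(x / (b sqrt(2)))."""
    return math.erf(x / (b * SQRT2))


def median_bisection(b, iterations=80):
    """Reference median: solve half_normal_cdf(x, b) = 1/2 by bisection."""
    lo, hi = 0.0, 10.0 * b
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if half_normal_cdf(mid, b) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def median_menzel(b):
    """Median from the Menzel form: sqrt(1 - exp(-4 u^2 / pi)) = 1/2."""
    u = math.sqrt(-(math.pi / 4.0) * math.log(0.75))
    return b * SQRT2 * u


def sample_half_normal(b, size, rng):
    """Half-normal deviates as absolute values of Gaussian deviates."""
    return [abs(rng.gauss(0.0, b)) for _ in range(size)]


def estimate_b(sample):
    """Invert mean = b sqrt(2/pi), valid under the assumed convention."""
    return statistics.fmean(sample) * math.sqrt(math.pi / 2.0)


median_ref = median_bisection(1.0)
percent_error = 100.0 * abs(median_menzel(1.0) - median_ref) / median_ref

rng = random.Random(3)           # fixed seed, illustrative only
b_true = 100.0                   # hypothetical parameter
b_hat = estimate_b(sample_half_normal(b_true, 20000, rng))
print(median_ref, percent_error, b_hat)
```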
2.3. Sech-Square Distribution
According to [25], the vertical profile of density in our galaxy can be parameterized by the following density dependence
(23)
where x represents the vertical height, b is the scale height and n is an integer. Let X be a random variable defined on
; the PDF corresponding to
for the above formula is
(24)
and its DF is
(25)
where b is the scale. The average value or mean,
, is
(26)
the variance,
, is
(27)
The skewness is
(28)
and the kurtosis
(29)
where the zeta function is defined by
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}} \tag{30}$$
see formula (25.2.1) in [22]. The median is at
(31)
and the random generation of the sech-square distribution is obtained by solving the following non-linear equation
(32)
The parameter b is obtained from the average value of the sample,
,
(33)
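A minimal sketch of the generation and estimation steps for a sech-square variate is given below. It assumes a DF of the form tanh(x/s) on the positive axis, where s is a generic scale that may differ by a constant factor from the parameter b of this paper; all names and values are hypothetical. Under this assumed form the DF can be inverted in closed form, so the sketch uses that inversion rather than the numerical solution of a non-linear equation.

```python
import math
import random
import statistics


def sech2_pdf(x, s):
    """Assumed PDF sech^2(x/s) / s on x >= 0."""
    return 1.0 / (s * math.cosh(x / s) ** 2)


def sech2_cdf(x, s):
    """DF corresponding to the assumed PDF."""
    return math.tanh(x / s)


def sample_sech2(s, size, rng):
    """Inverse transform: tanh(x/s) = R gives x = s * atanh(R), R in [0, 1)."""
    return [s * math.atanh(rng.random()) for _ in range(size)]


rng = random.Random(7)           # fixed seed, illustrative only
s_true = 80.0                    # hypothetical scale
sample = sample_sech2(s_true, 20000, rng)

# Under the assumed form the mean is s * ln 2, which is inverted here.
s_hat = statistics.fmean(sample) / math.log(2.0)
median_theory = s_true * math.atanh(0.5)   # equals s * ln(3) / 2
median_sample = statistics.median(sample)
print(s_hat, median_theory, median_sample)
```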
3. Truncated Case
In this section we introduce the truncations of the following distributions: the exponential, the Half-Normal and the Sech-square distribution.
3.1. The Truncated Exponential
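Let X be a random variable defined in $[x_l, x_u]$, where $x_l$ and $x_u$ are the lower and upper limits of truncation; the truncated exponential PDF, $f_T(x;b,x_l,x_u)$, is
$$f_T(x;b,x_l,x_u) = \frac{e^{-x/b}}{b\,\left(e^{-x_l/b} - e^{-x_u/b}\right)} \tag{34}$$
and its DF is
$$F_T(x;b,x_l,x_u) = \frac{e^{-x_l/b} - e^{-x/b}}{e^{-x_l/b} - e^{-x_u/b}} \tag{35}$$
The first moment about the origin, $\mu'_1$, is
$$\mu'_1 = \frac{(x_l + b)\,e^{-x_l/b} - (x_u + b)\,e^{-x_u/b}}{e^{-x_l/b} - e^{-x_u/b}} \tag{36}$$
and the second moment about the origin, $\mu'_2$, is
$$\mu'_2 = \frac{(x_l^2 + 2 b x_l + 2 b^2)\,e^{-x_l/b} - (x_u^2 + 2 b x_u + 2 b^2)\,e^{-x_u/b}}{e^{-x_l/b} - e^{-x_u/b}} \tag{37}$$
The variance can be evaluated with the usual formula
$$\sigma^2 = \mu'_2 - (\mu'_1)^2 \tag{38}$$
and the median is at
$$x = -b \ln\!\left(\frac{e^{-x_l/b} + e^{-x_u/b}}{2}\right) \tag{39}$$
The parameter b can be derived by a numerical solution of the following equation, which arises from the maximum likelihood estimator (MLE)
$$\frac{1}{n}\sum_{i=1}^{n} x_i - b - \frac{x_l\, e^{-x_l/b} - x_u\, e^{-x_u/b}}{e^{-x_l/b} - e^{-x_u/b}} = 0 \tag{40}$$
where the $x_i$ are the elements of the experimental sample with i varying between 1 and n. In the above formula, $x_l$ represents the minimum of the sample and $x_u$ the maximum.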
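The numerical solution for b can be sketched as follows. For an exponential PDF truncated between a lower and an upper limit, the maximum likelihood condition is equivalent to matching the sample mean to the mean of the truncated distribution, and the sketch solves that condition by bisection. The function names, the bracketing interval and the test values are assumptions made for the example; the limits are taken as the minimum and maximum of the sample, as stated above.

```python
import math
import random
import statistics


def truncated_exp_mean(b, xl, xu):
    """Mean of an exponential with scale b truncated to [xl, xu]."""
    w = math.exp(-(xu - xl) / b)             # ratio e^{-xu/b} / e^{-xl/b}
    one_minus_w = -math.expm1(-(xu - xl) / b)
    return b + (xl - xu * w) / one_minus_w


def sample_truncated_exp(b, xl, xu, size, rng):
    """Invert the truncated DF: x = xl - b * ln(1 - R * (1 - w))."""
    one_minus_w = -math.expm1(-(xu - xl) / b)
    return [xl - b * math.log(1.0 - rng.random() * one_minus_w)
            for _ in range(size)]


def fit_truncated_exp(sample, iterations=200):
    """Solve truncated_exp_mean(b) = sample mean for b by bisection."""
    xl, xu = min(sample), max(sample)
    xbar = statistics.fmean(sample)
    if xbar >= 0.5 * (xl + xu):
        # The truncated mean never reaches the midpoint for finite b.
        raise ValueError("sample mean too large: no finite solution for b")
    lo, hi = 1e-6 * (xu - xl), 1e6 * (xu - xl)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if truncated_exp_mean(mid, xl, xu) < xbar:
            lo = mid                         # the truncated mean grows with b
        else:
            hi = mid
    return 0.5 * (lo + hi), xl, xu


rng = random.Random(5)                       # fixed seed, illustrative only
b_true, xl_true, xu_true = 100.0, 10.0, 400.0    # hypothetical values
sample = sample_truncated_exp(b_true, xl_true, xu_true, 20000, rng)
b_hat, xl_hat, xu_hat = fit_truncated_exp(sample)
print(b_hat, xl_hat, xu_hat)
```

Bisection is used here only because the truncated mean increases monotonically with b; any other one-dimensional root finder would serve equally well.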
3.2. The Truncated Half-Normal
Let X be a random variable defined on
; the one-parameter truncated Half-Normal PDF is
(41)
and its DF is
(42)
which has the following series representation
(43)
The first moment about the origin
, is
(44)
and the second moment about the origin
, is
(45)
The variance can be evaluated by Equation (38). There is no analytical expression for the median; we now present three approximations. The first approximate expression for the median can be obtained by the Menzel approximation for the error function, see Equation (A.1). We report the equation to be solved for x in order to find the median in the first approximation
(46)
which has the following approximate solution
(47)
The second approximation for the median can be obtained by the Winitzki approximation for the error function, see Equation (A.2). We report the equation to be solved for x in order to find the second median
(48)
whose solution is complicated and is omitted here. The third approximation for the median can be obtained using the Padé Approximant for the error function, see Equation (A.3). We report the equation to be solved for x in order to find the third median
(49)
whose solution is complicated and is omitted here. Table 2 reports the percent error of the approximate median for the truncated Half-Normal.
The parameter b can be derived by the numerical solution of the following equation, which arises from the MLE:
(50)
Random generation of the truncated Half-Normal variate X is given by the numerical solution of the following non-linear equation
(51)
where R is the rectangular variate.
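The numerical inversion needed to generate the truncated Half-Normal variate can be sketched with a bisection on the truncated DF. The sketch assumes the common convention in which the untruncated DF is erf(x/(b√2)), which may differ from the parameterization of this paper; the function names, the limits and the value of b are hypothetical.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def half_normal_cdf(x, b):
    """Untruncated DF under the assumed convention erf(x / (b sqrt(2)))."""
    return math.erf(x / (b * SQRT2))


def truncated_hn_cdf(x, b, xl, xu):
    """DF of the half-normal truncated to [xl, xu]."""
    fl = half_normal_cdf(xl, b)
    fu = half_normal_cdf(xu, b)
    return (half_normal_cdf(x, b) - fl) / (fu - fl)


def invert_truncated_hn(r, b, xl, xu, iterations=60):
    """Solve truncated_hn_cdf(x) = r for x in [xl, xu] by bisection."""
    lo, hi = xl, xu
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if truncated_hn_cdf(mid, b, xl, xu) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_truncated_hn(b, xl, xu, size, rng):
    """One root-finding call per rectangular variate R."""
    return [invert_truncated_hn(rng.random(), b, xl, xu) for _ in range(size)]


rng = random.Random(11)                  # fixed seed, illustrative only
b, xl, xu = 100.0, 5.0, 250.0            # hypothetical values
sample = sample_truncated_hn(b, xl, xu, 4000, rng)
median_numeric = invert_truncated_hn(0.5, b, xl, xu)
print(median_numeric, statistics.median(sample))
```

The same inversion with R = 1/2 gives a numerical median against which the approximate expressions can be compared.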
3.3. Truncated Sech-Square Distribution
Let X be a random variable defined on
; the PDF for the truncated sech-square distribution is
(52)
and its DF is
(53)
Table 2. Percent error,
, of the median obtained by the approximation of the error function when
.
The first moment about the origin
, is
(54)
and the second moment about the origin
, is
(55)
The variance can be evaluated by Equation (38) and the median is at
(56)
The parameter b can be derived by the numerical solution of the following equation, which arises from the MLE:
(57)
Random generation of the truncated sech-square variate X is given by
(58)
where R is the unit rectangular variate and
(59)
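A closed-form generation of the truncated sech-square variate can be sketched as follows, again assuming an untruncated DF of the form tanh(x/s) with a generic scale s that may differ by a constant factor from the parameter b of this paper. The names and numerical values are hypothetical, and the ratio of the upper limit to s is kept moderate so that the hyperbolic tangent stays below one in floating point.

```python
import math
import random
import statistics


def truncated_sech2_cdf(x, s, xl, xu):
    """DF of the assumed sech-square law truncated to [xl, xu]."""
    tl, tu = math.tanh(xl / s), math.tanh(xu / s)
    return (math.tanh(x / s) - tl) / (tu - tl)


def truncated_sech2_quantile(r, s, xl, xu):
    """Closed-form inverse of the truncated DF for r in [0, 1)."""
    tl, tu = math.tanh(xl / s), math.tanh(xu / s)
    return s * math.atanh(tl + r * (tu - tl))


def sample_truncated_sech2(s, xl, xu, size, rng):
    """Map rectangular variates through the inverse DF."""
    return [truncated_sech2_quantile(rng.random(), s, xl, xu)
            for _ in range(size)]


rng = random.Random(13)                  # fixed seed, illustrative only
s, xl, xu = 80.0, 5.0, 300.0             # hypothetical values
sample = sample_truncated_sech2(s, xl, xu, 20000, rng)
median_theory = truncated_sech2_quantile(0.5, s, xl, xu)
print(median_theory, statistics.median(sample))
```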
4. Astrophysical Applications
This section reviews the adopted statistics as well as some data on open clusters and Gaia’s stars.
4.1. Statistics
The merit function $\chi^2$ is computed according to the formula
$$\chi^2 = \sum_{i=1}^{n} \frac{(T_i - O_i)^2}{T_i} \tag{60}$$
where n is the number of bins, $T_i$ is the theoretical value, and $O_i$ is the experimental value represented by the frequencies. The theoretical frequency distribution is given by
$$T_i = N\, \Delta x_i\, p(x) \tag{61}$$
where N is the number of elements of the sample, $\Delta x_i$ is the magnitude of the size interval, and $p(x)$ is the PDF under examination. A reduced merit function $\chi^2_{red}$ is given by
$$\chi^2_{red} = \chi^2 / NF \tag{62}$$
where $NF = n - k$ is the number of degrees of freedom, n is the number of bins, and k is the number of parameters. The goodness of the fit can be expressed by the probability Q, see equation 15.2.12 in [24], which involves the number of degrees of freedom and $\chi^2$. According to [24] p. 658, the fit “may be acceptable” if $Q > 0.001$. The Akaike information criterion (AIC), see [26], is defined by
$$AIC = 2k - 2 \ln(L) \tag{63}$$
where L is the likelihood function and k the number of free parameters in the model. We assume a Gaussian distribution for the errors. Then the likelihood function can be derived from the $\chi^2$ statistic, $L \propto \exp(-\chi^2/2)$, where $\chi^2$ has been computed by equation (60), see [27], [28]. Now the AIC becomes
$$AIC = 2k + \chi^2 \tag{64}$$
The difference between the evaluation of $\chi^2$ and the Kolmogorov-Smirnov test (K-S), see [29] [30] [31], is that the latter does not require binning the data. The K-S test, as implemented by the FORTRAN subroutine KSONE in [24], finds the maximum distance, D, between the theoretical and the astronomical DF as well as the significance level $P_{KS}$, see formulas 14.3.5 and 14.3.9 in [24]; if $P_{KS} \geq 0.1$, the goodness of the fit is believable.
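The statistics of this subsection can be sketched in a few lines. The example below bins a sample linearly, evaluates the merit function as the sum of (T − O)²/T over the bins, the reduced merit function, the AIC in the form 2k plus the merit function, and the K-S distance D. The function names are hypothetical, the PDF is evaluated at the bin centre (an assumption), and the input is a synthetic, evenly stratified exponential sample rather than astronomical data; the probabilities Q and the K-S significance level are not computed.

```python
import math


def histogram(sample, edges):
    """Counts in equal-width bins; values are assumed to lie within the edges."""
    counts = [0] * (len(edges) - 1)
    lo, hi = edges[0], edges[-1]
    width = (hi - lo) / len(counts)
    for x in sample:
        i = min(int((x - lo) / width), len(counts) - 1)
        counts[i] += 1
    return counts


def theoretical_frequencies(pdf, edges, n_total):
    """T_i = N * dx_i * p(x), with p evaluated at the bin centre (assumption)."""
    return [n_total * (b - a) * pdf(0.5 * (a + b))
            for a, b in zip(edges[:-1], edges[1:])]


def chi_square(observed, theoretical):
    """Merit function: sum over bins of (T - O)^2 / T."""
    return sum((t - o) ** 2 / t for o, t in zip(observed, theoretical))


def ks_distance(sample, cdf):
    """Maximum distance D between the empirical and the theoretical DF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))


b_scale, n_total, n_bins, k = 100.0, 1000, 30, 1     # hypothetical values
# Synthetic sample: exponential quantiles at the probabilities (i + 0.5) / N.
sample = [-b_scale * math.log(1.0 - (i + 0.5) / n_total) for i in range(n_total)]
edges = [max(sample) * j / n_bins for j in range(n_bins + 1)]

observed = histogram(sample, edges)
theoretical = theoretical_frequencies(lambda x: math.exp(-x / b_scale) / b_scale,
                                      edges, n_total)
chi2 = chi_square(observed, theoretical)
chi2_red = chi2 / (n_bins - k)
aic = 2 * k + chi2
d_max = ks_distance(sample, lambda x: 1.0 - math.exp(-x / b_scale))
print(chi2, chi2_red, aic, d_max)
```

Because the synthetic sample sits exactly on the theoretical quantiles, D reduces to 0.5/N, which is a convenient check of the K-S routine before applying it to real data.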
4.2. Open Clusters
The open clusters within a radius of 1.8 kpc were analysed in [32], with a catalog available at CDS. The data can be processed by introducing the rectangular coordinates $(X, Y, Z)$ and assuming a distance of the Sun from the Galactic centre of 8 kpc, see Figure 1.
We are interested in the distribution of Z, the distance perpendicular to the galactic plane. The results for the three distributions here analysed in the interval $[0,\infty)$ are reported in Table 3, and those for the three truncated distributions, analysed in the interval $[x_l, x_u]$ between the lower and upper limits of truncation, are reported in Table 4.
A careful examination of Table 3 and Table 4 allows concluding that the best results are obtained for the usual exponential distribution, see Figure 2, followed by the truncated exponential distribution.
Figure 1. Distribution of 1241 open clusters in the X-Z plane as projected onto the galactic pole.
Table 3. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed DF, and $P_{KS}$, significance level, in the K-S test for data from open clusters when
. The number of linear bins, n, is 30.
Table 4. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed DF, and $P_{KS}$, significance level, in the K-S test for open clusters data when
. The imposed parameters are
,
and
.
Figure 2. Empirical PDF of distribution of Z for open cluster data (blue histogram) with a superposition of the exponential PDF (red line). Theoretical parameters as in Table 3.
4.3. Stars
A great number of stars with mean apparent magnitude in the G-band are available in the Gaia Data Release 1 (Gaia DR1) astrometric catalogs, see [33] [34], with data at http://vizier.u-strasbg.fr/viz-bin/VizieR and, specifically, Table I/337/tgasptyc. The absolute magnitude,
$M_G$, is obtained by the usual formula
$$M_G = m_G - 5 \log_{10}(d) + 5 \tag{65}$$
where $m_G$ is the apparent magnitude in the G-band, and d is the distance in pc. We now select the stars with
in the first 1000 pc, amounting to a total of 58027, and we evaluate their rectangular coordinates
. The position of the above slice in absolute magnitude in the H-R diagram is visible in Figure 3. The obtained data in Z of the first 1000 pc are shifted by
which defines the Sun’s position relative to the plane of symmetry; for a review of the values of
, see Table 1 in [35]. The re-scaled distribution is visible in Figure 4. The distribution of Z has both negative and positive values and in order to increase the statistics we take the absolute values of Z because the distributions here analysed are defined only for positive values of the random variable. The results for the three distributions here analysed in the interval
are reported in Table 5 and those for the three truncated distributions analysed in the interval
are reported in Table 6.
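The data preparation described above can be sketched as follows. The absolute magnitude follows the usual distance-modulus relation with d in pc; the heliocentric rectangular coordinates are computed from the distance and the Galactic longitude and latitude, which is an assumption about the input columns. The function names and the 20 pc offset of the Sun are hypothetical placeholders, not the value adopted in this paper.

```python
import math


def absolute_magnitude(m_app, d_pc):
    """Absolute magnitude from apparent magnitude and distance in pc."""
    return m_app - 5.0 * math.log10(d_pc) + 5.0


def rectangular_coordinates(d_pc, lon_deg, lat_deg):
    """Heliocentric (X, Y, Z) in pc from distance and Galactic coordinates."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    x = d_pc * math.cos(lat) * math.cos(lon)
    y = d_pc * math.cos(lat) * math.sin(lon)
    z = d_pc * math.sin(lat)
    return x, y, z


def folded_heights(z_values, z_sun_pc):
    """Shift Z by the Sun's offset from the plane, then take absolute values."""
    return [abs(z + z_sun_pc) for z in z_values]


m_abs = absolute_magnitude(10.0, 100.0)               # illustrative star
x, y, z = rectangular_coordinates(500.0, 0.0, 90.0)   # star towards the pole
heights = folded_heights([-30.0, 5.0], z_sun_pc=20.0) # placeholder offset
print(m_abs, (x, y, z), heights)
```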
![]()
Figure 3. MG against (B-V), evaluated as BT-VT, (H-R diagram) in the first 100 pc and selected region in red.
![]()
Figure 4. Histogram of the re-scaled distribution of Z for stars with
.
![]()
Table 5. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed DF, and $P_{KS}$, significance level, in the K-S test for the re-scaled distribution of Z of the Gaia’s star data when
. The number of linear bins, n, is 30.
![]()
Table 6. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed DF, and $P_{KS}$, significance level, in the K-S test for the re-scaled distribution of Z of the Gaia’s star data when
. The imposed parameters are
,
and
.
![]()
Figure 5. Empirical PDF of distribution of Z for Gaia stars (blue histogram) with a superposition of the truncated half-normal PDF (red line). Theoretical parameters as in Table 6.
A careful examination of Table 5 and Table 6 allows concluding that the best results are obtained for the truncated half-normal distribution, see Figure 5.
5. Conclusions
The Truncated Distributions
We derived the PDF, the DF, the average value, the rth moment, the median, expressions to generate the random variate and the MLE for the truncated exponential, truncated half-normal and the truncated sech-square distributions.
Astrophysical Applications
We applied the three distributions, in both their usual and truncated forms, to samples of galactic height for open clusters and for Gaia’s stars. In the case of open clusters, the best results are obtained by the exponential distribution, followed by the sech-square distribution. In the case of Gaia’s stars, the best results are obtained by the truncated half-normal distribution, followed by the sech-square distribution. Among the distributions tested here, the truncated half-normal is therefore the preferred model for fitting the galactic height of Gaia’s stars.
Prospects for the Future
In view of the great importance of the doubly truncated distributions in astrophysics, other distributions can be analysed, for example, the Half-Gumbel distribution [36].
Appendix. Approximations for the Error Function
We report two existing approximations for the error function which are invertible, see Table I in [37] for more details, and a new approximation. The Menzel approximation [38] [39] for the error function is
(A.1)
The Winitzki approximation [40] for the error function is
(A.2)
A new approximation for the error function is derived in the framework of the Padé Approximant of order (4.2)
(A.3)
Table 7 reports the percent error of the three cases here analysed.
![]()
Table 7. Percent error,
, for approximating the error function in [0,5].
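The percent error of the two invertible approximations can be checked with a short scan. The sketch uses the Menzel form √(1 − exp(−4x²/π)) and the Winitzki form √(1 − exp(−x²(4/π + a x²)/(1 + a x²))); the value a = 0.147 is one of the constants proposed by Winitzki and is an assumption here, since the constant adopted in this paper may differ. The grid, which starts slightly above zero to avoid a division by erf(0) = 0, and the function names are also assumptions. The Padé approximant is not reproduced.

```python
import math

A_WINITZKI = 0.147   # assumed constant; other values appear in the literature


def erf_menzel(x):
    """Menzel form of the error function."""
    return math.sqrt(-math.expm1(-4.0 * x * x / math.pi))


def erf_winitzki(x, a=A_WINITZKI):
    """Winitzki form of the error function with constant a."""
    x2 = x * x
    exponent = x2 * (4.0 / math.pi + a * x2) / (1.0 + a * x2)
    return math.sqrt(-math.expm1(-exponent))


def max_percent_error(approx, x_min=0.05, x_max=5.0, steps=500):
    """Largest percent error of approx against math.erf on a uniform grid."""
    worst = 0.0
    for i in range(steps + 1):
        x = x_min + (x_max - x_min) * i / steps
        exact = math.erf(x)
        worst = max(worst, 100.0 * abs(approx(x) - exact) / exact)
    return worst


err_menzel = max_percent_error(erf_menzel)
err_winitzki = max_percent_error(erf_winitzki)
print(err_menzel, err_winitzki)
```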