New probability distributions in astrophysics: II. The generalized and double truncated Lindley

The statistical parameters of five generalizations of the Lindley distribution, such as the average, variance and moments, are reviewed. A new double truncated Lindley distribution with three parameters is derived. The new distributions are applied to model the initial mass function for stars.


Introduction
The Lindley distribution, after [1,2], has one parameter. In recent years the Lindley distribution has been the subject of many generalizations. Among others, we mention one with two parameters [3], a two-parameter weighted version [4], the generalized Poisson-Lindley [5], the extended Lindley [6] and a transmuted Lindley-geometric distribution [7]. Several generalizations of the Lindley distribution can be found in a recent review [8]. The Lindley distribution is useful in modeling biological data from grouped mortality studies [4,9]. Its first application to astrophysics was to the initial mass function (IMF) for stars and to the luminosity function for galaxies [10]. The IMF is routinely modeled by the lognormal distribution, and therefore the following question naturally arises: can a Lindley distribution or one of its generalizations be an alternative to the lognormal fit for the IMF? In order to answer this question, Section 2 reviews the notion of statistical sample and the Lindley distribution, Section 3 reviews five generalizations of the Lindley distribution, Section 4 introduces the double truncated Lindley distribution and Section 5 fits the six new Lindley distributions to four samples of stellar masses.

Preliminaries
We report some basic information on the adopted sample and on the original Lindley distribution with one parameter.

The sample
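The experimental sample consists of the data $x_i$ with $i$ varying between 1 and $n$. The sample mean, $\bar{x}$, is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,$$
the unbiased sample variance, $s^2$, is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 ,$$
and the sample rth moment about the origin, $\bar{x}_r$, is
$$\bar{x}_r = \frac{1}{n}\sum_{i=1}^{n} x_i^{\,r} .$$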
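The three sample statistics are the targets of all the moment-matching fits below. The following is a minimal sketch of them in plain Python. The function names and the toy masses are illustrative assumptions, not data from the catalogs used later.

```python
from typing import Sequence


def sample_mean(x: Sequence[float]) -> float:
    """Sample mean: (1/n) * sum(x_i)."""
    return sum(x) / len(x)


def unbiased_variance(x: Sequence[float]) -> float:
    """Unbiased sample variance: sum((x_i - mean)^2) / (n - 1)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two data points")
    m = sample_mean(x)
    return sum((xi - m) ** 2 for xi in x) / (n - 1)


def sample_raw_moment(x: Sequence[float], r: int) -> float:
    """Sample r-th moment about the origin: (1/n) * sum(x_i**r)."""
    return sum(xi ** r for xi in x) / len(x)


# Hypothetical toy masses in solar units (not taken from any catalog).
masses = [0.2, 0.35, 0.5, 0.8, 1.1]
print(sample_mean(masses), unbiased_variance(masses), sample_raw_moment(masses, 3))
```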

The Lindley distribution with one parameter
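The Lindley probability density function (PDF) with one parameter, $f(x)$, is
$$f(x) = \frac{c^2}{1+c}\,(1+x)\,e^{-cx},$$
where $x > 0$ and $c > 0$. The cumulative distribution function (CDF), $F(x)$, is
$$F(x) = 1 - \frac{1+c+cx}{1+c}\,e^{-cx}.$$
At $x = 0$, $f(0) = \frac{c^2}{1+c}$, which is not zero. The average value or mean, $\mu$, is
$$\mu = \frac{c+2}{c\,(c+1)},$$
and the variance, $\sigma^2$, is
$$\sigma^2 = \frac{c^2+4c+2}{c^2\,(c+1)^2}.$$
The rth moment about the origin for the Lindley distribution, $\mu'_r$, is
$$\mu'_r = \frac{\Gamma(r+1)\,(c+r+1)}{c^r\,(c+1)},$$
where
$$\Gamma(z) = \int_0^{\infty} t^{z-1}\,e^{-t}\,dt$$
is the gamma function, see [11]. The central moments, $\mu_r$, are
$$\mu_3 = \frac{2c^3+12c^2+12c+4}{c^3\,(c+1)^3},$$
$$\mu_4 = \frac{9c^4+72c^3+132c^2+96c+24}{c^4\,(c+1)^4}.$$
More details can be found in [2].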
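A minimal sketch of the one-parameter Lindley formulas follows, using only the `math` module. The helper `lindley_c_from_mean` is our own addition rather than a procedure taken from [2]. It inverts the mean formula $\mu = \bar{x}$, which is the quadratic $\bar{x}c^2 + (\bar{x}-1)c - 2 = 0$, and keeps its positive root.

```python
import math


def lindley_pdf(x: float, c: float) -> float:
    """Lindley PDF f(x) = c^2/(1+c) * (1+x) * exp(-c x), x > 0, c > 0."""
    return c * c / (1.0 + c) * (1.0 + x) * math.exp(-c * x)


def lindley_cdf(x: float, c: float) -> float:
    """Lindley CDF F(x) = 1 - (1 + c + c x)/(1 + c) * exp(-c x)."""
    return 1.0 - (1.0 + c + c * x) / (1.0 + c) * math.exp(-c * x)


def lindley_raw_moment(r: int, c: float) -> float:
    """r-th moment about the origin: Gamma(r+1) (c+r+1) / (c^r (c+1))."""
    return math.gamma(r + 1) * (c + r + 1) / (c ** r * (c + 1))


def lindley_variance(c: float) -> float:
    """Variance (c^2 + 4c + 2) / (c^2 (c+1)^2)."""
    return (c * c + 4 * c + 2) / (c * c * (c + 1) ** 2)


def lindley_c_from_mean(xbar: float) -> float:
    """Positive root of xbar*c^2 + (xbar - 1)*c - 2 = 0, i.e. mu(c) = xbar."""
    return (-(xbar - 1) + math.sqrt((xbar - 1) ** 2 + 8 * xbar)) / (2 * xbar)


c = 1.7
mean = lindley_raw_moment(1, c)
print(mean, lindley_variance(c), lindley_c_from_mean(mean))  # last value recovers c
```

The quadratic always has one negative and one positive root, because the product of its roots is $-2/\bar{x} < 0$. The estimate is therefore unique.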

Generalizations of the Lindley distribution
We review the statistics of five generalizations of the Lindley distribution: the two-parameter, power, generalized, new generalized and new weighted Lindley distributions.

The Lindley distribution with two parameters
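The Lindley PDF with two parameters (TPLD) [3] is
$$f(x; b, c) = \frac{c^2}{bc+1}\,(b+x)\,e^{-cx},$$
where $x > 0$, $c > 0$ and $bc > -1$. The CDF of the TPLD is
$$F(x; b, c) = 1 - \frac{bc+1+cx}{bc+1}\,e^{-cx}.$$
The average value or mean of the TPLD is
$$\mu(b, c) = \frac{bc+2}{c\,(bc+1)}, \tag{13}$$
and the variance of the TPLD is
$$\sigma^2(b, c) = \frac{b^2c^2+4bc+2}{c^2\,(bc+1)^2}.$$
The mode of the TPLD is at
$$x_{\mathrm{mode}} = \begin{cases} \dfrac{1-bc}{c} & \text{if } bc < 1,\\[4pt] 0 & \text{otherwise,} \end{cases}$$
see eqn. (2.3) in [3]. The rth moment about the origin for the TPLD, $\mu'_r$, is
$$\mu'_r(b, c) = \frac{\Gamma(r+1)\,(bc+r+1)}{c^r\,(bc+1)}.$$
The two parameters $b$ and $c$ can be obtained by the following match
$$\mu(b, c) = \bar{x} \tag{17a}$$
and
$$\sigma^2(b, c) = s^2 . \tag{17b}$$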
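The system (17a) and (17b) can of course be solved numerically. For the TPLD, a short observation of our own (not taken from [3]) gives a closed form. With $u = bc$, the ratio $\sigma^2/\mu^2 = (u^2+4u+2)/(u+2)^2 = R$ depends on $u$ only, so $u = -2 + \sqrt{2/(1-R)}$, and $c$ then follows from the mean. The sketch below implements this under the assumption $0 < s^2/\bar{x}^2 < 1$. Outside that range no TPLD reproduces the sample moments.

```python
import math


def tpld_mean(b: float, c: float) -> float:
    """TPLD mean (bc + 2) / (c (bc + 1)), eq. (13)."""
    return (b * c + 2) / (c * (b * c + 1))


def tpld_variance(b: float, c: float) -> float:
    """TPLD variance (b^2 c^2 + 4bc + 2) / (c^2 (bc + 1)^2)."""
    u = b * c
    return (u * u + 4 * u + 2) / (c * c * (u + 1) ** 2)


def tpld_match(xbar: float, s2: float) -> tuple:
    """Solve mu(b, c) = xbar and sigma^2(b, c) = s2 in closed form.

    With u = b*c, sigma^2 / mu^2 = (u^2 + 4u + 2) / (u + 2)^2 = R depends on u
    only, giving u = -2 + sqrt(2 / (1 - R)); c then follows from the mean.
    """
    R = s2 / xbar ** 2
    if not 0.0 < R < 1.0:
        raise ValueError("no TPLD matches these moments (need 0 < s2/xbar^2 < 1)")
    u = -2.0 + math.sqrt(2.0 / (1.0 - R))  # u > -2 + sqrt(2) > -1, so bc > -1
    c = (u + 2.0) / (xbar * (u + 1.0))
    return u / c, c


b, c = tpld_match(tpld_mean(0.5, 2.0), tpld_variance(0.5, 2.0))
print(b, c)  # recovers b = 0.5 and c = 2.0 up to rounding
```

Setting $b = 1$ turns the TPLD into the one-parameter Lindley PDF. A quick consistency check is therefore to feed the sketch the Lindley mean and variance and recover $b \approx 1$.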

The power Lindley distribution
The power Lindley PDF with two parameters (PLD) according to [3] is where b, c and x > 0. The CDF of the PLD is The average value or mean of the PLD is and the variance of the PLD is where The mode of the PLD is at The rth moment about the origin for the PLD is The two parameters b and c of the PLD can be found by numerically solving the nonlinear system given by eqs (17a) and (17b).
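The PLD formulas are not reproduced in this text. The sketch below therefore assumes the common power Lindley form $f(x) = \frac{b\,c^2}{c+1}(1+x^b)\,x^{b-1}e^{-c x^b}$, with $b$ as the power and $c$ as the Lindley parameter. This mapping of $b$ and $c$ is our assumption. Under it, $X^b$ is Lindley-distributed with parameter $c$, which gives $E[X^r] = \Gamma(r/b+1)\,(c+r/b+1)/(c^{r/b}(c+1))$. The code checks this against a composite Simpson integration. The helper `simpson` is a generic illustration, not part of the paper.

```python
import math
from typing import Callable


def pld_pdf(x: float, b: float, c: float) -> float:
    """Assumed PLD form: b c^2/(c+1) * (1 + x^b) * x^(b-1) * exp(-c x^b)."""
    return b * c * c / (c + 1) * (1 + x ** b) * x ** (b - 1) * math.exp(-c * x ** b)


def pld_raw_moment(r: float, b: float, c: float) -> float:
    """E[X^r] = Gamma(r/b + 1) (c + r/b + 1) / (c^(r/b) (c + 1)),
    because X^b follows a one-parameter Lindley law with parameter c."""
    q = r / b
    return math.gamma(q + 1) * (c + q + 1) / (c ** q * (c + 1))


def simpson(f: Callable[[float], float], a: float, z: float, n: int = 20000) -> float:
    """Composite Simpson rule on [a, z] with an even number n of panels."""
    h = (z - a) / n
    s = f(a) + f(z)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


b, c = 1.5, 0.8  # b >= 1 keeps the integrand finite at x = 0
numeric = simpson(lambda x: x * pld_pdf(x, b, c), 0.0, 40.0)
print(numeric, pld_raw_moment(1, b, c))  # the two numbers should agree closely
```

With these moments, (17a) and (17b) can be handed to any two-dimensional root finder.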

The generalized Lindley distribution
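The generalized Lindley PDF with three parameters (GLD) according to [12] is
$$f(x; a, b, c) = \frac{b^{a+1}\,x^{a-1}\,(a+cx)\,e^{-bx}}{(b+c)\,\Gamma(a+1)},$$
where $a$, $b$, $c$ and $x > 0$. The CDF of the GLD is
$$F(x; a, b, c) = \frac{e^{-bx/2}\,x^{a/2}\,b^{a/2}\,(b+c)\,M_{a/2,\,a/2+1/2}(bx) + x^{a}\,b^{a+1}\,(a+1)\,e^{-bx}}{(b+c)\,\Gamma(a+2)}, \tag{29}$$
where $M_{\mu,\nu}(z)$ is the Whittaker M function, see [11]. The average value or mean of the GLD is
$$\mu(a, b, c) = \frac{ab+ac+c}{b\,(b+c)},$$
and the variance of the GLD is
$$\sigma^2(a, b, c) = \frac{a\,(b+c)^2 + c\,(2b+c)}{b^2\,(b+c)^2}.$$
The hazard rate function, $h(x; a, b, c)$, of the GLD is
$$h(x; a, b, c) = \frac{-\,b^{a+1}\,x^{a-1}\,(cx+a)\,e^{-bx}\,(a+1)}{e^{-bx/2}\,x^{a/2}\left(c\,b^{a/2} + b^{a/2+1}\right)M_{a/2,\,a/2+1/2}(bx) + x^{a}\,b^{a+1}\,(a+1)\,e^{-bx} - (c+b)\,\Gamma(a+2)}, \tag{32}$$
and Figure 1 reports an example. Here the CDF, equation (29), and the hazard rate function, equation (32), are reported in closed form, in contrast to what was asserted by [12]. For $a \ge 1$ the mode of the GLD is at
$$x_{\mathrm{mode}} = \frac{a\,(c-b) + \sqrt{a^2\,(b-c)^2 + 4abc\,(a-1)}}{2bc}.$$
The rth moment about the origin for the GLD is
$$\mu'_r(a, b, c) = \frac{\Gamma(a+r)\,\left[ab + c\,(a+r)\right]}{b^r\,(b+c)\,\Gamma(a+1)},$$
and in particular the third moment is
$$\mu'_3(a, b, c) = \frac{(a+1)(a+2)\,(ab+ac+3c)}{b^3\,(b+c)}.$$
The three parameters $a$, $b$ and $c$ of the GLD can be obtained by numerically solving the following three non-linear equations
$$\mu(a, b, c) = \bar{x}, \tag{36a}$$
$$\sigma^2(a, b, c) = s^2, \tag{36b}$$
$$\mu'_3(a, b, c) = \bar{x}_3 . \tag{36c}$$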
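The GLD moment formulas can be cross-checked numerically before they are used in (36a) to (36c). The sketch below codes the PDF, the rth moment, the mean, the variance and the mode for $a \ge 1$. Its comments point out two checks: the case $a = c = 1$ collapses to the one-parameter Lindley with parameter $b$, and the mode is a root of $bc\,x^2 + a(b-c)\,x - a(a-1) = 0$. The function names and the parameter values are illustrative choices.

```python
import math


def gld_pdf(x: float, a: float, b: float, c: float) -> float:
    """GLD PDF b^(a+1) x^(a-1) (a + c x) exp(-b x) / ((b + c) Gamma(a + 1))."""
    return (b ** (a + 1) * x ** (a - 1) * (a + c * x) * math.exp(-b * x)
            / ((b + c) * math.gamma(a + 1)))


def gld_raw_moment(r: float, a: float, b: float, c: float) -> float:
    """Gamma(a + r) [a b + c (a + r)] / (b^r (b + c) Gamma(a + 1))."""
    return (math.gamma(a + r) * (a * b + c * (a + r))
            / (b ** r * (b + c) * math.gamma(a + 1)))


def gld_mean(a: float, b: float, c: float) -> float:
    return (a * b + a * c + c) / (b * (b + c))


def gld_variance(a: float, b: float, c: float) -> float:
    return (a * (b + c) ** 2 + c * (2 * b + c)) / (b ** 2 * (b + c) ** 2)


def gld_mode(a: float, b: float, c: float) -> float:
    """Non-negative root of b c x^2 + a (b - c) x - a (a - 1) = 0, valid for a >= 1."""
    disc = a * a * (b - c) ** 2 + 4 * a * b * c * (a - 1)
    return (a * (c - b) + math.sqrt(disc)) / (2 * b * c)


a, b, c = 2.5, 1.2, 0.7
print(gld_mean(a, b, c), gld_variance(a, b, c), gld_mode(a, b, c))
# With a = c = 1 the GLD reduces to the Lindley law, whose mean is (b + 2)/(b (b + 1)).
print(gld_mean(1.0, 1.7, 1.0), 3.7 / (1.7 * 2.7))
```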

The new generalized Lindley distribution
The new generalized Lindley PDF with three parameters (NGLD) according to [13] is where a, b, c and x > 0. The CDF of the NGLD is where $\Gamma(a, z)$ is the upper incomplete gamma function, defined by
$$\Gamma(a, z) = \int_z^{\infty} t^{a-1}\,e^{-t}\,dt ,$$
see [11]. The average value of the NGLD is and the variance of the NGLD is The rth moment about the origin for the NGLD is and the third moment is The three parameters a, b and c of the NGLD are obtained by numerically solving the three non-linear equations (36a), (36b) and (36c).

The new weighted Lindley distribution
The new weighted Lindley PDF with two parameters (NWL) according to [14] is where b, c and x > 0. The CDF of the NWL is where The average value of the NWL is and the variance of the NWL is where The rth moment about the origin for the NWL is where The two parameters b and c of the NWL can be found by numerically solving the nonlinear system given by eqs (17a) and (17b).

The double truncated Lindley distribution
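The double truncated Lindley (DTL) distribution is obtained by restricting the Lindley PDF to the interval $x_l \le x \le x_u$ and renormalizing it. The effect of the double truncation is to increase the number of parameters from one to three, see [15]. A double truncated Lindley distribution with scale, which has four parameters, was introduced in [10]. The PDF of the DTL is
$$f_t(x; c, x_l, x_u) = \frac{c^2\,(1+x)\,e^{-cx}}{D(c, x_l, x_u)}, \qquad x_l \le x \le x_u ,$$
and its CDF is
$$F_t(x; c, x_l, x_u) = \frac{(1+c+cx_l)\,e^{-cx_l} - (1+c+cx)\,e^{-cx}}{D(c, x_l, x_u)},$$
where
$$D(c, x_l, x_u) = (1+c+cx_l)\,e^{-cx_l} - (1+c+cx_u)\,e^{-cx_u}.$$
The average value, $\mu_t(c, x_l, x_u)$, is
$$\mu_t(c, x_l, x_u) = \frac{G(x_l) - G(x_u)}{c\,D(c, x_l, x_u)}, \qquad G(x) = \left(c^2x^2 + c^2x + 2cx + c + 2\right)e^{-cx}.$$
The rth moment about the origin for the DTL, $\mu'_r(c, x_l, x_u)$, is where
$$N_G = -x_l^{r/2}\,e^{-cx_l/2}\left(c^{1-r/2} + c^{-r/2}\,(r+1)\right)M_{r/2,\,r/2+1/2}(cx_l).$$
The three parameters which characterize the DTL can be found in the following way. Consider the sample of stellar masses $\mathcal{X} = \{x_1, x_2, \dots, x_n\}$ and let $x_{(1)} \ge x_{(2)} \ge \dots \ge x_{(n)}$ denote their order statistics, so that $x_{(1)} = \max(x_1, x_2, \dots, x_n)$ and $x_{(n)} = \min(x_1, x_2, \dots, x_n)$. The first two parameters $x_l$ and $x_u$ are
$$x_l = x_{(n)}, \qquad x_u = x_{(1)} .$$
The third parameter $c$ can be found by solving the following non-linear equation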
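The DTL formulas can be sketched as follows. Two points are our own choices and not part of the source. First, numerator and normalization are both multiplied by $e^{c x_l}$, so that large $c$ does not underflow. Second, the non-linear equation for $c$ is not reproduced in this text, so the hypothetical `dtl_fit_c` substitutes a moment match $\mu_t(c, x_l, x_u) = \bar{x}$ solved by bisection. This is a stand-in, not necessarily the equation used for Table 8. Bisection is safe here because $f_t \propto (1+x)e^{-cx}$ is an exponential family in $x$, so $\mu_t$ decreases monotonically with $c$.

```python
import math


def _poly_g(x: float, c: float) -> float:
    """c^2 x^2 + c^2 x + 2 c x + c + 2, from the antiderivative of (x + x^2) e^{-cx}."""
    return c * c * x * x + c * c * x + 2 * c * x + c + 2


def _norm(c: float, xl: float, xu: float) -> float:
    """D(c, xl, xu) multiplied by exp(c * xl) to avoid underflow."""
    return (1 + c + c * xl) - (1 + c + c * xu) * math.exp(-c * (xu - xl))


def dtl_pdf(x: float, c: float, xl: float, xu: float) -> float:
    if not xl <= x <= xu:
        return 0.0
    return c * c * (1 + x) * math.exp(-c * (x - xl)) / _norm(c, xl, xu)


def dtl_cdf(x: float, c: float, xl: float, xu: float) -> float:
    if x <= xl:
        return 0.0
    if x >= xu:
        return 1.0
    num = (1 + c + c * xl) - (1 + c + c * x) * math.exp(-c * (x - xl))
    return num / _norm(c, xl, xu)


def dtl_mean(c: float, xl: float, xu: float) -> float:
    """mu_t = (G(xl) - G(xu)) / (c D), with the common factor exp(-c xl) removed."""
    shift = math.exp(-c * (xu - xl))
    return (_poly_g(xl, c) - _poly_g(xu, c) * shift) / (c * _norm(c, xl, xu))


def dtl_fit_c(xbar: float, xl: float, xu: float,
              lo: float = 1e-2, hi: float = 200.0, tol: float = 1e-12) -> float:
    """Hypothetical moment-matching fit: solve dtl_mean(c) = xbar by bisection."""
    f_lo = dtl_mean(lo, xl, xu) - xbar
    f_hi = dtl_mean(hi, xl, xu) - xbar
    if f_lo * f_hi > 0:
        raise ValueError("xbar is not bracketed by the DTL mean on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = dtl_mean(mid, xl, xu) - xbar
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# x_l and x_u are the sample minimum and maximum; the mass range below is the
# NGC 2362 range quoted later, while c = 2 is an arbitrary illustrative value.
xl, xu = 0.11, 1.47
print(dtl_fit_c(dtl_mean(2.0, xl, xu), xl, xu))  # recovers c = 2
```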

Application to the IMF
We report the adopted statistics for four samples of stars, which will be fitted with the lognormal distribution, the Lindley generalizations and the double truncated Lindley distribution.

The involved statistics
The merit function $\chi^2$ is computed according to the formula
$$\chi^2 = \sum_{i=1}^{n} \frac{(T_i - O_i)^2}{T_i}, \tag{61}$$
where $n$ is the number of bins, $T_i$ is the theoretical value, and $O_i$ is the experimental value represented by the frequencies. The theoretical frequency distribution is given by
$$T_i = N\,p(x_i)\,\Delta x_i ,$$
where $N$ is the number of elements of the sample, $\Delta x_i$ is the magnitude of the size interval, $x_i$ is the position of the $i$th bin, and $p(x)$ is the PDF under examination.
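A minimal sketch of the binned $\chi^2$ of eqn. (61) follows. We assume linear bins of equal width spanning the sample range, with $p(x)$ evaluated at each bin centre. The text does not say where in the bin $p$ is evaluated, so the centre is our choice. The toy masses and the Lindley parameter $c = 2$ in the usage lines are made up.

```python
import math
from typing import Callable, List, Sequence, Tuple


def linear_bins(sample: Sequence[float], n_bins: int) -> Tuple[List[float], float, List[int]]:
    """Observed frequencies O_i on n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in sample:
        k = min(int((x - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        counts[k] += 1
    centers = [lo + (k + 0.5) * width for k in range(n_bins)]
    return centers, width, counts


def theoretical_counts(pdf: Callable[[float], float], centers: Sequence[float],
                       width: float, n_sample: int) -> List[float]:
    """T_i = N * p(x_i) * Delta x_i, with p evaluated at the bin centre (assumption)."""
    return [n_sample * pdf(x) * width for x in centers]


def chi_square(theoretical: Sequence[float], observed: Sequence[int]) -> float:
    """chi^2 = sum (T_i - O_i)^2 / T_i; every T_i must be strictly positive."""
    return sum((t - o) ** 2 / t for t, o in zip(theoretical, observed))


def lindley_c2(x: float) -> float:
    """Lindley PDF with an arbitrary illustrative parameter c = 2."""
    return 4.0 / 3.0 * (1.0 + x) * math.exp(-2.0 * x)


masses = [0.12, 0.2, 0.25, 0.31, 0.4, 0.45, 0.6, 0.75, 0.9, 1.3]  # made-up sample
centers, width, observed = linear_bins(masses, 5)
print(chi_square(theoretical_counts(lindley_c2, centers, width, len(masses)), observed))
```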
A reduced merit function $\chi^2_{red}$ is evaluated by
$$\chi^2_{red} = \frac{\chi^2}{N_F},$$
where $N_F = n - k$ is the number of degrees of freedom, $n$ is the number of bins, and $k$ is the number of parameters. The goodness of the fit can be expressed by the probability $Q$, see equation 15.2.12 in [16], which involves the degrees of freedom and $\chi^2$. According to [16] p. 658, the fit 'may be acceptable' if $Q > 0.001$. The Akaike information criterion (AIC), see [17], is defined by
$$AIC = 2k - 2\ln(L),$$
where $L$ is the likelihood function and $k$ the number of free parameters in the model. We assume a Gaussian distribution for the errors, so that the likelihood function can be derived from the $\chi^2$ statistic, $L \propto \exp(-\chi^2/2)$, where $\chi^2$ has been computed by eqn. (61), see [18], [19]. Now the AIC becomes
$$AIC = 2k + \chi^2 .$$
The Kolmogorov-Smirnov test (K-S), see [20,21,22], does not require binning the data. The K-S test, as implemented by the FORTRAN subroutine KSONE in [16], finds the maximum distance, $D$, between the theoretical and the astronomical CDF as well as the significance level $P_{KS}$, see formulas 14.3.5 and 14.3.9 in [16]. If $P_{KS} \ge 0.1$, the goodness of the fit is believable.
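The goodness-of-fit quantities above can be sketched in plain Python. Two assumptions underlie the sketch: that $Q$ is the regularized upper incomplete gamma function $Q(N_F/2, \chi^2/2)$, and that $P_{KS}$ uses the Kolmogorov series evaluated at $(\sqrt{N} + 0.12 + 0.11/\sqrt{N})\,D$. Both are the standard textbook expressions. The code is our re-implementation in the spirit of the routines in [16], not a transcription of them, and all function names are illustrative.

```python
import math
from typing import Callable, Sequence


def gammq(a: float, x: float, eps: float = 1e-15, itmax: int = 1000) -> float:
    """Regularized upper incomplete gamma Q(a, x): series for x < a + 1,
    modified-Lentz continued fraction otherwise."""
    if x <= 0.0:
        return 1.0
    log_pref = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        ap, term = a, 1.0 / a  # series for the lower function P(a, x)
        total = term
        for _ in range(itmax):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_pref)
    tiny = 1e-300
    b, c = x + 1.0 - a, 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, itmax):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_pref) * h


def fit_probability(chi2: float, n_bins: int, k: int) -> float:
    """Q = Q((n - k)/2, chi^2/2)."""
    return gammq(0.5 * (n_bins - k), 0.5 * chi2)


def reduced_chi2(chi2: float, n_bins: int, k: int) -> float:
    return chi2 / (n_bins - k)


def aic(chi2: float, k: int) -> float:
    """AIC = 2k + chi^2 under the Gaussian-error likelihood L ~ exp(-chi^2/2)."""
    return 2 * k + chi2


def ks_statistic(data: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Maximum distance D between the empirical and the theoretical CDF."""
    xs = sorted(data)
    n = len(xs)
    d, fo = 0.0, 0.0
    for j, x in enumerate(xs, start=1):
        fn, ff = j / n, cdf(x)
        d = max(d, abs(fo - ff), abs(fn - ff))
        fo = fn
    return d


def probks(lam: float) -> float:
    """Kolmogorov tail 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)."""
    a2 = -2.0 * lam * lam
    fac, total, prev = 2.0, 0.0, 0.0
    for j in range(1, 101):
        term = fac * math.exp(a2 * j * j)
        total += term
        if abs(term) <= 1e-3 * prev or abs(term) <= 1e-8 * total:
            return total
        fac, prev = -fac, abs(term)
    return 1.0  # no convergence means lam is tiny, so the tail probability is ~1


def ks_significance(d: float, n: int) -> float:
    en = math.sqrt(n)
    return probks((en + 0.12 + 0.11 / en) * d)
```

For example, with 20 bins and a two-parameter model, the acceptance check of [16] reads `fit_probability(chi2, 20, 2) > 0.001`. The series branch loses relative precision when $Q$ is extremely small, which does not affect that threshold.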

The selected sample of stars
The first test is performed on NGC 2362, whose 271 stars span the range $1.47\,M_{\odot} \ge M \ge 0.11\,M_{\odot}$, see [23] and CDS catalog J/MNRAS/384/675/table1.
The second test is performed on the low-mass IMF in the young cluster NGC 6611, see [24] and CDS catalog J/MNRAS/392/1034. This massive cluster has an age of 2-3 Myr and contains masses in the range $1.5\,M_{\odot} \ge M \ge 0.02\,M_{\odot}$; therefore the brown dwarf (BD) region, $\approx 0.2\,M_{\odot}$, is covered.
The third test is performed on the γ Velorum cluster, whose 237 stars span the range $1.31\,M_{\odot} \ge M \ge 0.15\,M_{\odot}$, see [25] and CDS catalog J/A+A/589/A70/table5.
The fourth test is performed on the young cluster Berkeley 59, whose 420 stars span the range $2.24\,M_{\odot} \ge M \ge 0.15\,M_{\odot}$, see [26] and CDS catalog J/AJ/155/44/table3.

The lognormal distribution
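Let $X$ be a random variable defined in $(0, \infty)$. The lognormal PDF, following [27] or formula (14.2) in [28], is
$$f(x; m, \sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}\,\exp\left(-\frac{\ln^2(x/m)}{2\sigma^2}\right),$$
where $m$ is the median and $\sigma$ the shape parameter. The CDF is
$$F(x; m, \sigma) = \frac{1}{2} + \frac{1}{2}\,\mathrm{erf}\left(\frac{\ln(x/m)}{\sqrt{2}\,\sigma}\right),$$
where $\mathrm{erf}(x)$ is the error function, defined as
$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^{x} e^{-t^2}\,dt,$$
see [11]. The average value or mean, $E(X)$, is
$$E(X) = m\,e^{\sigma^2/2},$$
the variance, $Var(X)$, is
$$Var(X) = m^2\,e^{\sigma^2}\left(e^{\sigma^2} - 1\right),$$
and the second moment about the origin, $E_2(X)$, is
$$E_2(X) = m^2\,e^{2\sigma^2}.$$
The statistics for the lognormal distribution for these four astronomical samples of stars are reported in Table 1.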
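A short sketch of the lognormal formulas follows, using `math.erf` from the Python standard library. The helper `lognormal_from_moments` inverts $Var/E^2 = e^{\sigma^2} - 1$ and $E = m\,e^{\sigma^2/2}$, so it gives a moment-matching estimate of $m$ and $\sigma$. It is offered as an illustration only. The text does not state that the values in Table 1 were obtained this way.

```python
import math


def lognormal_pdf(x: float, m: float, sigma: float) -> float:
    """Lognormal PDF with median m and shape sigma."""
    return (math.exp(-math.log(x / m) ** 2 / (2 * sigma ** 2))
            / (x * sigma * math.sqrt(2 * math.pi)))


def lognormal_cdf(x: float, m: float, sigma: float) -> float:
    """CDF 1/2 + 1/2 erf(ln(x/m) / (sqrt(2) sigma))."""
    return 0.5 + 0.5 * math.erf(math.log(x / m) / (math.sqrt(2) * sigma))


def lognormal_mean(m: float, sigma: float) -> float:
    return m * math.exp(sigma ** 2 / 2)


def lognormal_variance(m: float, sigma: float) -> float:
    return m ** 2 * math.exp(sigma ** 2) * (math.exp(sigma ** 2) - 1)


def lognormal_second_moment(m: float, sigma: float) -> float:
    return m ** 2 * math.exp(2 * sigma ** 2)


def lognormal_from_moments(mean: float, var: float) -> tuple:
    """Moment matching: sigma^2 = ln(1 + var/mean^2), m = mean * exp(-sigma^2/2)."""
    s2 = math.log(1 + var / mean ** 2)
    return mean * math.exp(-s2 / 2), math.sqrt(s2)


print(lognormal_from_moments(lognormal_mean(0.3, 0.6), lognormal_variance(0.3, 0.6)))
```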

The generalizations of the Lindley distribution
The statistics for the Lindley distribution and its generalizations are reported in the following tables: Table 2 for the Lindley distribution with one parameter, Table 3 for the TPLD, Table 4 for the PLD, Table 5 for the GLD, Table 6 for the NGLD and Table 7 for the NWL. The best fit for NGC 2362 is obtained with the PLD, see Figure 2. The best fit for NGC 6611 is obtained with the Lindley PDF with one parameter, see Figure 3. The best fit for γ Velorum is obtained with the lognormal PDF, see Figure 4. The best fit for the young cluster Berkeley 59 is obtained with the NGLD, see Figure 5.

Table 2. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the Lindley distribution with one parameter for different mass distributions. The number of linear bins, n, is 20.

Table 3. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the TPLD with two parameters for different mass distributions. The number of linear bins, n, is 20.

Table 4. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the PLD with two parameters for different mass distributions. The number of linear bins, n, is 20.

Table 7. Numerical values of $\chi^2_{red}$, AIC, probability Q, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test of the NWL with two parameters for different mass distributions. The number of linear bins, n, is 20.

The double truncated Lindley
The statistics for the DTL with three parameters are reported in Table 8. Figure 6 reports the CDF of the DTL for NGC 6611, which is the best fit among the various distributions here analysed for this cluster.

Figure 6. Empirical CDF of the mass distribution for the NGC 6611 cluster data (blue dotted line) with a superposition of the DTL CDF (red line). Theoretical parameters as in Table 8.

Conclusions
In this paper we tested five generalizations of the Lindley distribution, as well as the double truncated Lindley distribution, against the lognormal distribution. For each IMF of the four clusters here analysed, the distribution which realizes the best fit is reported in Table 9. This table allows us to conclude that the Lindley family here suggested produces better fits than does the lognormal distribution for three of the four clusters, the exception being γ Velorum. Figure 7 reports the CDF for NGC 6611 as well as four fitting curves.

Table 9. Best fits: name of the cluster, name of the distribution, D, the maximum distance between theoretical and observed CDF, and $P_{KS}$, significance level, in the K-S test.