
We describe two new derivations of the chi-square distribution. The first derivation uses the induction method, which requires only a single integral to calculate. The second derivation uses the Laplace transform and requires minimal assumptions. The new derivations are compared with the established derivations by convolution, moment generating function, and Bayesian inference. Chi-square testing has seen many applications in physics and other fields. We describe a unique version of the chi-square test in which both the variance and the location are tested, and apply it to environmental data. The chi-square test is used to judge whether a laboratory method is capable of detecting gross alpha and beta radioactivity in drinking water for regulatory monitoring to protect the health of the population. A case of a failure of the chi-square test and its amelioration are described. The chi-square test is compared to and supplemented by the *t*-test.

The chi-square distribution (CSD) has been one of the most frequently used distributions in science. It is a special case of the gamma distribution (see Section 2). The latter has been an important distribution in fundamental physics, for example, as the kinetic energy distribution of particles in an ideal gas (Maxwell-Boltzmann) [

The established fundamental derivations of the CSD described above involve cumbersome handling of multiple integrals. By contrast, the simplified derivations use the fact that the CSD is a special case of the gamma distribution. Owing to the integrability and recursive properties of the gamma distribution, as well as its moment generating function (Mgf), simplified derivations of the CSD are described in textbooks [

In this work, we present two new derivations of the CSD, both in the simplified category. One of them uses mathematical induction. The original derivation was done by Helmert [

Chi-square testing (CST) is closely related to and based upon the CSD. It originated with the goodness-of-fit test discovered by Pearson [

$$\chi_\nu^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}, \tag{1}$$

where $O_i$ is the observed frequency and $E_i$ is the expected frequency, based on an assumed model distribution, for category $i$, and $m$ is the number of categories. Both $O_i$ and $E_i$ are unitless. Here, $\nu = m - 1 - p$ is the number of degrees of freedom, where $p$ is the number of parameters of the model distribution calculated from the data. For any model distribution, Equation (1) leads asymptotically to the CSD when the number of observations is large, which was proved for the multinomial distribution by Pearson [
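Equation (1) is simple to evaluate directly. The sketch below uses hypothetical counts from 120 rolls of a die, tested against a fair-die model with no fitted parameters ($p = 0$); the counts and the function name `pearson_chi_square` are illustrative only.

```python
def pearson_chi_square(observed, expected, n_fitted_params=0):
    """Return (chi-square, degrees of freedom) following Equation (1).

    observed, expected: frequencies O_i and E_i for the m categories.
    n_fitted_params: p, the number of model parameters estimated from the data.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1 - n_fitted_params  # nu = m - 1 - p
    return chi2, dof


# Hypothetical counts from 120 rolls of a die, tested against a fair die (p = 0).
counts = [18, 22, 16, 25, 19, 20]
expected = [20.0] * 6
chi2, dof = pearson_chi_square(counts, expected)
print(f"chi2 = {chi2:.2f}, nu = {dof}")
```

This prints `chi2 = 2.50, nu = 5`.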

Another form of the chi-square variable, more general than Equation (1), is

$$\chi_\nu^2 = \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2, \tag{2}$$

where $n$ is the number of observations, $x_i$ is the observed variable, $\mu_i$ is the expected value, $\sigma_i$ is the standard deviation, and $\nu \le n$. The variables in Equation (2) can be expressed in physical units. In the limit of a large number of observations, the variable and parameters of Equation (2) are approximated by those of normal variates, and $\chi_\nu^2$ follows the CSD. In this work, we generalize this CST to a combined test for variance and location and verify it with the t-test [
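The statement that $\chi_\nu^2$ of Equation (2) follows the CSD can be illustrated by simulation. The minimal sketch below assumes independent normal $x_i$ with known, hypothetical $\mu_i$ and $\sigma_i$ (so that $\nu = n$) and compares the sample mean and variance of the statistic with the CSD values $E[\chi^2] = \nu$ and $\mathrm{Var}[\chi^2] = 2\nu$.

```python
import random
import statistics


def simulate_chi_square(mus, sigmas, n_trials=100_000, seed=1):
    """Monte Carlo samples of Equation (2) with normal x_i ~ N(mu_i, sigma_i).

    Here nu = n, because mu_i and sigma_i are known rather than estimated from the data.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trials):
        total = 0.0
        for mu, sigma in zip(mus, sigmas):
            x = rng.gauss(mu, sigma)
            total += ((x - mu) / sigma) ** 2
        samples.append(total)
    return samples


# Hypothetical parameters in physical units (e.g. pCi/L); n = nu = 5.
mus = [3.0, 3.1, 2.9, 3.0, 3.2]
sigmas = [0.5, 0.4, 0.6, 0.5, 0.3]
samples = simulate_chi_square(mus, sigmas)
print(statistics.fmean(samples), statistics.variance(samples))
```

With 100,000 trials the two printed numbers should lie close to 5 and 10, independently of the chosen $\mu_i$ and $\sigma_i$.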

Within the context of this work, we present a unique application of the CST to the detection of radioactive contaminants in drinking water required by the Safe Drinking Water Act (SDWA) in the US. The bulk of natural alpha and beta/gamma (photon) radioactivity in drinking water originates from the possible presence of ^{238}U and ^{232}Th natural radioactive-series progeny, ^{226,228}Ra and their progeny, as well as ^{40}K radionuclides [

The probability density function (Pdf) of the CSD is given by

$$\mathrm{Pdf}(\chi_\nu^2 \mid \nu) = \frac{(\chi_\nu^2)^{\nu/2-1}\, e^{-\chi_\nu^2/2}}{2^{\nu/2}\,\Gamma(\nu/2)}, \tag{3}$$

where $\Gamma$ is the gamma function. The expectation value of the CSD is $E[\chi^2] = \nu$, and its variance is $\mathrm{Var}[\chi^2] = 2\nu$ [
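Equation (3) and its first two moments can be checked numerically. The sketch below, using only the Python standard library, evaluates the Pdf in log space and integrates with the midpoint rule after the substitution $\chi^2 = t^2$, which keeps the integrand finite at the origin even for $\nu = 1$. The function names and step sizes are our own choices.

```python
import math


def chi2_pdf(x, nu):
    """Pdf of the CSD, Equation (3), evaluated in log space to avoid overflow."""
    if x <= 0.0:
        return 0.0
    log_pdf = (nu / 2 - 1) * math.log(x) - x / 2 - (nu / 2) * math.log(2) - math.lgamma(nu / 2)
    return math.exp(log_pdf)


def chi2_moment(k, nu, t_max=30.0, n_steps=30_000):
    """k-th raw moment of the CSD by midpoint-rule integration.

    With x = t**2, Pdf(x) dx becomes Pdf(t**2) * 2t dt, which is finite at t = 0.
    """
    h = t_max / n_steps
    total = 0.0
    for j in range(n_steps):
        t = (j + 0.5) * h
        x = t * t
        total += x**k * chi2_pdf(x, nu) * 2 * t
    return total * h


for nu in (1, 2, 5, 10):
    norm = chi2_moment(0, nu)
    mean = chi2_moment(1, nu)
    var = chi2_moment(2, nu) - mean**2
    print(f"nu={nu}: norm={norm:.6f}, mean={mean:.6f}, var={var:.6f}")
```

For each $\nu$, the printed normalization, mean, and variance should be close to 1, $\nu$, and $2\nu$.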

To derive Equation (3), we start with the general definition of the $\chi_\nu^2$ statistic given by Equation (2), assuming normal variates. For a single normal variable $x_1$ with $\mathrm{Pdf}(x_1)$, the probability that the variable lies in $[x_1, x_1 + dx_1]$ is given by

$$\mathrm{Pdf}(x_1)\, dx_1 = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2/2}\, dx_1. \tag{4}$$

By substituting $\chi_1^2 = ((x_1 - \mu_1)/\sigma_1)^2$, we obtain from Equation (4)

$$\mathrm{Pdf}(\chi_1^2 \mid 1)\, d\chi_1^2 = \frac{2}{\sqrt{2\pi}\,\sigma_1}\, e^{-\chi_1^2/2} \left|\frac{dx_1}{d\chi_1^2}\right| d\chi_1^2 = \frac{(\chi_1^2)^{1/2-1}\, e^{-\chi_1^2/2}}{2^{1/2}\,\Gamma(1/2)}\, d\chi_1^2 = \mathrm{gamma}(\chi_1^2 \mid 1/2, 2)\, d\chi_1^2, \tag{5}$$

which is the Pdf given by Equation (3) for $\nu = 1$. In deriving Equation (5), we also used $\Gamma(1/2) = \sqrt{\pi}$, whereas the factor of 2 arises because the variable $x_1$, ranging from $-\infty$ to $+\infty$, was replaced by the variable $\chi_1^2$, ranging from 0 to $+\infty$, so that both $x_1 - \mu_1 = \pm\sigma_1\sqrt{\chi_1^2}$ map onto the same value of $\chi_1^2$.
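Written out, the Jacobian in Equation (5) and the simplification to Equation (3) with $\nu = 1$ are:

$$
\left|\frac{dx_1}{d\chi_1^2}\right| = \frac{\sigma_1}{2\sqrt{\chi_1^2}}
\quad\Longrightarrow\quad
\frac{2}{\sqrt{2\pi}\,\sigma_1}\, e^{-\chi_1^2/2}\, \frac{\sigma_1}{2\sqrt{\chi_1^2}}
= \frac{(\chi_1^2)^{-1/2}\, e^{-\chi_1^2/2}}{\sqrt{2}\,\sqrt{\pi}}
= \frac{(\chi_1^2)^{1/2-1}\, e^{-\chi_1^2/2}}{2^{1/2}\,\Gamma(1/2)}.
$$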

Suppose that an $(n+1)$th term with a normal variable $x_{n+1}$ is added to Equation (2), and that this addition raises the number of degrees of freedom to $\nu + 1$. Then

$$\chi_{\nu+1}^2 = \chi_\nu^2 + \left(\frac{x_{n+1} - \mu_{n+1}}{\sigma_{n+1}}\right)^2. \tag{6}$$

Using the calculus for probability density functions [

$$\mathrm{Pdf}(\chi_{\nu+1}^2 \mid \nu+1)\, d\chi_{\nu+1}^2 = \int_{-\infty}^{+\infty} \mathrm{Pdf}(\chi_\nu^2 \mid \nu)\, d\chi_\nu^2\, \mathrm{Pdf}(x_{n+1})\, dx_{n+1}. \tag{7}$$

Let us define a new variable $z$ such that

$$\left(\frac{x_{n+1} - \mu_{n+1}}{\sigma_{n+1}}\right)^2 = \chi_{\nu+1}^2\, (1 - z). \tag{8}$$

Noting that $d\chi_{\nu+1}^2 = d\chi_\nu^2$ (at fixed $x_{n+1}$, by Equation (6)) and performing all substitutions, the right side of Equation (7) can be rewritten as

$$2\int_0^1 \mathrm{Pdf}(\chi_\nu^2 \mid \nu)\, \mathrm{Pdf}(x_{n+1}) \left|\frac{dx_{n+1}}{dz}\right| dz\, d\chi_{\nu+1}^2 = \frac{(\chi_{\nu+1}^2)^{(\nu+1)/2-1}\, e^{-\chi_{\nu+1}^2/2}}{2^{(\nu+1)/2}\,\Gamma(\nu/2)\,\Gamma(1/2)}\, d\chi_{\nu+1}^2 \int_0^1 z^{\nu/2-1} (1-z)^{1/2-1}\, dz. \tag{9}$$
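The intermediate step behind Equation (9) can be made explicit. Writing $c = \chi_{\nu+1}^2$ for brevity, Equations (6) and (8) give $\chi_\nu^2 = cz$ and $x_{n+1} - \mu_{n+1} = \pm\sigma_{n+1}\sqrt{c(1-z)}$, so that

$$
\left|\frac{dx_{n+1}}{dz}\right| = \frac{\sigma_{n+1}\sqrt{c}}{2\sqrt{1-z}},
\qquad
2\,\mathrm{Pdf}(x_{n+1}) \left|\frac{dx_{n+1}}{dz}\right| = \frac{\sqrt{c}\,(1-z)^{-1/2}\, e^{-c(1-z)/2}}{\sqrt{2\pi}}.
$$

Multiplying by $\mathrm{Pdf}(cz \mid \nu) = (cz)^{\nu/2-1} e^{-cz/2}/(2^{\nu/2}\Gamma(\nu/2))$, the exponentials combine to $e^{-c/2}$, the powers of $c$ combine to $c^{(\nu+1)/2-1}$, and $\sqrt{2\pi} = 2^{1/2}\,\Gamma(1/2)$ completes the prefactor of Equation (9).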

The integral on the right side of Equation (9) is the beta function $B(\nu/2, 1/2)$, which is related to the gamma function by [

$$B(\nu/2, 1/2) = \frac{\Gamma(\nu/2)\,\Gamma(1/2)}{\Gamma((\nu+1)/2)}. \tag{10}$$

By inserting Equation (10) into Equation (9), simplifying, and comparing with the left side of Equation (7), one obtains

$$\mathrm{Pdf}(\chi_{\nu+1}^2 \mid \nu+1) = \frac{(\chi_{\nu+1}^2)^{(\nu+1)/2-1}\, e^{-\chi_{\nu+1}^2/2}}{2^{(\nu+1)/2}\,\Gamma((\nu+1)/2)}, \tag{11}$$

which is the Pdf given by Equation (3) for $\nu + 1$ degrees of freedom. Together with the base case $\nu = 1$ of Equation (5), this proves Equation (3) by induction.
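The induction step can also be checked numerically: Equation (7), evaluated at a fixed $\chi_{\nu+1}^2 = c$, is a single integral whose value should equal Equation (11). The sketch below parametrizes the added squared term as $c\cos^2\theta$, so that $z = \sin^2\theta$ in Equation (8); this removes the integrable endpoint singularities and lets a plain midpoint rule converge quickly. The function names and step count are illustrative choices.

```python
import math


def chi2_pdf(x, nu):
    """Pdf of the CSD, Equation (3), evaluated in log space."""
    if x <= 0.0:
        return 0.0
    return math.exp((nu / 2 - 1) * math.log(x) - x / 2 - (nu / 2) * math.log(2) - math.lgamma(nu / 2))


def induction_step(c, nu, n_steps=4000):
    """Right side of Equation (7) at chi^2_{nu+1} = c, computed as a single integral.

    The added term (density Pdf(. | 1) from Equation (5)) is c*cos(theta)**2, so that
    chi^2_nu = c*sin(theta)**2, i.e. z = sin(theta)**2 in Equation (8).
    """
    h = (math.pi / 2) / n_steps
    total = 0.0
    for j in range(n_steps):
        theta = (j + 0.5) * h
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        jacobian = 2.0 * c * sin_t * cos_t  # |d(c*cos^2(theta)) / d(theta)|
        total += chi2_pdf(c * sin_t**2, nu) * chi2_pdf(c * cos_t**2, 1) * jacobian
    return total * h


for nu in (1, 2, 3, 7):
    for c in (0.5, 2.0, 9.0):
        print(nu, c, induction_step(c, nu), chi2_pdf(c, nu + 1))
```

In each printed row, the last two numbers (the convolution integral and Equation (11)) should agree to many significant digits.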

By substituting $\varphi_i^2 = ((x_i - \mu_i)/\sigma_i)^2$, Equation (2) becomes

$$\chi_\nu^2 = \sum_{i=1}^{n} \varphi_i^2. \tag{12}$$

The distribution of a sum of independent random variables $\varphi_i^2$ is the convolution of their individual distributions, so the distribution of $\chi_\nu^2$ can be obtained by calculating an n-dimensional convolution integral. Exploring the properties of this convolution leads to simplifications that have been used in the literature. By convolving two gamma distributions $\chi_1^2 \equiv \varphi_i^2$ from Equation (5) and using the theorem that the convolution of two gamma distributions with the same scale parameter is also a gamma distribution, one obtains $\mathrm{gamma}(\chi_2^2 \mid 2/2, 2)$ [
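For the first step of this route, convolving two copies of Equation (5) reduces to one elementary integral, because $\int_0^c [y(c-y)]^{-1/2}\, dy = \pi$ and $2^{1/2}\Gamma(1/2) = \sqrt{2\pi}$. With $c = \chi_2^2$:

$$
\int_0^{c} \frac{(c-y)^{-1/2}\, e^{-(c-y)/2}}{\sqrt{2\pi}}\; \frac{y^{-1/2}\, e^{-y/2}}{\sqrt{2\pi}}\, dy
= \frac{e^{-c/2}}{2\pi} \int_0^{c} \frac{dy}{\sqrt{y(c-y)}}
= \frac{e^{-c/2}}{2}
= \mathrm{gamma}(c \mid 2/2, 2),
$$

which is Equation (3) with $\nu = 2$.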

Another simplified derivation of the CSD uses the theorem that the Mgf of a convolution is the product of the individual Mgfs [
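As a sketch of this route, under the usual definition $M(t) = E[e^{t\chi^2}]$, the Mgf of a single term follows from Equation (5) by rescaling the integration variable by $(1-2t)$:

$$
M_{\chi_1^2}(t) = \int_0^\infty \frac{(\chi_1^2)^{1/2-1}\, e^{-(1-2t)\chi_1^2/2}}{2^{1/2}\,\Gamma(1/2)}\, d\chi_1^2 = (1-2t)^{-1/2}, \qquad t < 1/2,
$$

$$
M_{\chi_n^2}(t) = \prod_{i=1}^{n} M_{\varphi_i^2}(t) = (1-2t)^{-n/2},
$$

and $(1-2t)^{-n/2}$ is the Mgf of $\mathrm{gamma}(\chi_n^2 \mid n/2, 2)$, i.e., of Equation (3) with $\nu = n$.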

In this work, we provide yet another simplified derivation of the CSD using the Laplace transform [

$$\int_0^\infty \frac{(\chi_1^2)^{1/2-1}\, e^{-\chi_1^2/2}}{2^{1/2}\,\Gamma(1/2)}\, e^{-s\chi_1^2}\, d\chi_1^2 = \left(\frac{1/2}{s + 1/2}\right)^{1/2}. \tag{13}$$

Next, we use the theorem that the Laplace transform of an n-fold convolution is the product of the individual transforms, i.e., $\left(\frac{1/2}{s+1/2}\right)^{n/2}$. Abbreviating $u = \chi_n^2$, the inverse Laplace transform gives the Pdf of $u$,

$$\mathrm{Pdf}(u \mid n) = \frac{1}{2\pi i} \oint \left(\frac{1/2}{s + 1/2}\right)^{n/2} e^{su}\, ds = \frac{1}{2^{n/2}}\, \frac{1}{2\pi i} \oint \frac{e^{su}}{(s + 1/2)^{n/2}}\, ds. \tag{14}$$

To calculate the contour integral in Equation (14), we start with the Cauchy integral formula for a function $f(s)$ of a complex variable $s$ that is analytic inside the contour, the integrand $f(s)/(s - s_0)$ having a simple pole at $s_0$ [

$$f(s_0) = \frac{1}{2\pi i} \oint \frac{f(s)}{s - s_0}\, ds. \tag{15}$$

Differentiating Equation (15) $k - 1$ times, where the differentiation can be of integer or fractional order [

$$f^{(k-1)}(s_0) = \frac{\Gamma(k)}{2\pi i} \oint \frac{f(s)}{(s - s_0)^k}\, ds. \tag{16}$$

Comparing Equation (14) with Equation (16), we identify $f(s) = e^{su}$, $s_0 = -1/2$, and $k = n/2$. Inserting these into Equation (16) and substituting the result into Equation (14), we obtain

$$\mathrm{Pdf}(u \mid n) = \frac{1}{2^{n/2}\,\Gamma(n/2)} \left(\frac{d^{n/2-1}}{ds^{n/2-1}}\, e^{su}\right)_{s=-1/2} = \frac{u^{n/2-1}\, e^{-u/2}}{2^{n/2}\,\Gamma(n/2)}, \tag{17}$$

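which is the CSD given by Equation (3) for $\nu = n$ and $\chi_n^2 = u$. The last step uses $\frac{d^{k-1}}{ds^{k-1}} e^{su} = u^{k-1} e^{su}$ with $k = n/2$, evaluated at $s = -1/2$.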
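A numerical sanity check of this route is sketched below. It does not reproduce the contour integration; it only verifies, by direct integration along the real axis for real $s > 0$, that the Laplace transform of Equation (3) with $\nu = n$ equals the $n$th power of Equation (13), which is the product property used in Equation (14). The function names and integration settings are our own.

```python
import math


def chi2_pdf(x, nu):
    """Pdf of the CSD, Equation (3), evaluated in log space."""
    if x <= 0.0:
        return 0.0
    return math.exp((nu / 2 - 1) * math.log(x) - x / 2 - (nu / 2) * math.log(2) - math.lgamma(nu / 2))


def laplace_numeric(s, nu, t_max=20.0, n_steps=20_000):
    """Laplace transform of Pdf(x | nu) at real s > 0 by the midpoint rule, using x = t**2."""
    h = t_max / n_steps
    total = 0.0
    for j in range(n_steps):
        t = (j + 0.5) * h
        total += chi2_pdf(t * t, nu) * math.exp(-s * t * t) * 2 * t
    return total * h


def laplace_closed_form(s, n):
    """n-th power of Equation (13): ((1/2) / (s + 1/2)) ** (n / 2)."""
    return (0.5 / (s + 0.5)) ** (n / 2)


for n in (1, 2, 3, 6):
    for s in (0.1, 1.0, 3.0):
        print(n, s, laplace_numeric(s, n), laplace_closed_form(s, n))
```

The two columns should agree closely for every $n$ and $s$; the case $n = 1$ is Equation (13) itself.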

Another simplified derivation of the CSD uses Bayesian inference and is unrelated to the convolutions described above [

In Section 5, we summarize the advantages and disadvantages of the simplified derivation methods of CSD described in this section.

Several models for the CST statistic can be derived from the general Equation (2). For the expected value, we can use either the sample mean $\bar{x}$ or the population mean $\mu$, whereas for the standard deviation we can use either the individual standard deviations $\sigma_i$ or the sample standard deviation $\sigma_x$. We do not know the population standard deviation for the data described in Section 4. The model test statistic $\sum((x_i - \bar{x})/\sigma_x)^2$ is always equal to $n - 1$ (when $\sigma_x$ is computed with the $n - 1$ denominator) and is thus not useful. However, the model test statistic $\sum((x_i - \bar{x})/\sigma_i)^2$ can be used to test the variance. Alternatively, both the variance and the location can be tested with the model test statistics $\sum((x_i - \mu)/\sigma_i)^2$ or $\sum((x_i - \mu)/\sigma_x)^2$, provided the population mean is known, which is the case for the data in Section 4.

For the t-test, we perform a standard one-sample test, in which the t variable is calculated as $(\bar{x} - \mu)/(\sigma_x/\sqrt{n})$. The t-test is a location test. The results of all these test models using radioactivity data are presented in Section 4.
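The test statistics of this section are sketched below for hypothetical measurements, not the data of Section 4. The sketch assumes $\sigma_x$ is the usual sample standard deviation with the $n - 1$ denominator, under which $\sum((x_i - \bar{x})/\sigma_x)^2$ equals $n - 1$ identically. The degrees of freedom noted in the comments follow the standard counting, in which estimating $\bar{x}$ costs one degree of freedom. The function name and dictionary keys are hypothetical.

```python
import math
import statistics


def compute_test_statistics(x, sigma_i, mu):
    """Compute the Section 3 test statistics for measurements x.

    x       : measured values x_i
    sigma_i : individual standard deviations of the x_i
    mu      : known population (expected) mean
    """
    n = len(x)
    xbar = statistics.fmean(x)
    sigma_x = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    return {
        # Variance test about the sample mean with individual sigmas: nu = n - 1.
        "var_sigma_i": sum(((xi - xbar) / si) ** 2 for xi, si in zip(x, sigma_i)),
        # Variance-and-location tests about the known mean: nu = n.
        "var_loc_sigma_i": sum(((xi - mu) / si) ** 2 for xi, si in zip(x, sigma_i)),
        "var_loc_sigma_x": sum(((xi - mu) / sigma_x) ** 2 for xi in x),
        # Sample mean with sample sigma: identically n - 1, hence uninformative.
        "var_sigma_x": sum(((xi - xbar) / sigma_x) ** 2 for xi in x),
        # One-sample t statistic (location test): nu = n - 1.
        "t": (xbar - mu) / (sigma_x / math.sqrt(n)),
    }


# Hypothetical measurements and uncertainties (pCi/L); mu is the GA spike value from the text.
x = [3.1, 2.8, 3.4, 2.9, 3.3, 3.0, 3.2, 2.7, 3.5]
sigma_i = [0.4] * len(x)
stats = compute_test_statistics(x, sigma_i, mu=2.9888)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The `var_sigma_x` entry prints 8.000 for these nine values, whatever the data, which is why that model is not used.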

The most convenient method of measuring GA and GB radioactivity in drinking water is by gas proportional counting [

As stated in Section 1, this method must be able to determine GA and GB at the DL, to be verified by the CST [^{230}Th and ^{90}Sr/^{90}Y radionuclides providing alpha and beta radioactivity, respectively. The spiking activities (i.e. the expected μ ) were: 2.9888 ± 0.0402 pCi/L for alpha and 4.1860 ± 0.0549 pCi/L for beta, close to the required DL values. The values of spiking activities and their uncertainties were obtained from the standards traceable to the National Institute of Standards and Technology (NIST). Then the experimental procedure was followed, and the measured GA and GB activities x i are depicted as points in

Also shown in

The GA results are described first. The sample average for GA is $\bar{x} = 3.0951$ pCi/L (red horizontal thick line), which is close to the expected $\mu$ (green horizontal thick line) as seen in

The t-test statistic is calculated as described in Section 3, resulting in 0.45 for GA, as given in column 6 in

The gross beta activities plotted in

To elucidate why the GB CST and t-test failed, fifteen non-spiked Method Blank (MB) community water samples were prepared and measured. The average GA activity was below detection; however, the average GB activity was 0.8121 ± 0.2801 pCi/L. This MB value was then subtracted from the spiked GB results, and the corrected GB activities are plotted in

| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|
| Experiment, reference | | $\chi^2$-test | $\chi^2$-test | $\chi^2$-test | t-test | t-test | t-test |
| Parameter | | Variance, $\sigma_i$ | Variance and location, $\sigma_i$ | Variance and location, $\sigma_x$ | Location | Location | Location |
| Deg free | | 8 | 9 | 9 | 8 | | |
| Calc RT | | 20.1 | 21.7 | 21.7 | | | |
| Calc LT | | 1.6 | 2.1 | 2.1 | t | RT prob | 2T prob |
| Gross Alpha, | Observed | 14.0 | 13.4 | 8.2 | 0.45 | 0.33 | 0.66 |
| | Test result | Passed | Passed | Passed | | Passed | Passed |
| Gross Beta, | Observed | 3.8 | 43.1 | 93.7 | 9.26 | 7.5E−06 | 1.5E−05 |
| | Test result | Passed | Failed | Failed | | Failed | Failed |
| Gross Beta-MB subtracted, | Observed | 2.7 | 3.2 | 9.6 | 1.27 | 0.12 | 0.24 |
| | Test result | Passed | Passed | Passed | | Passed | Passed |
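The Calc LT and Calc RT rows in the table coincide, to the precision shown, with the 0.01 and 0.99 quantiles of the CSD for 8 and 9 degrees of freedom. The sketch below reproduces them with only the standard library, using the power series of the regularized lower incomplete gamma function and bisection. The helper names are our own, the 1%-per-tail reading is our inference from the numbers rather than a statement in this section, and the series is intended only for moderate arguments such as these.

```python
import math


def chi2_cdf(x, nu):
    """CDF of the CSD: regularized lower incomplete gamma P(nu/2, x/2), by its power series."""
    if x <= 0.0:
        return 0.0
    a, y = nu / 2, x / 2
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15:  # stop once new terms no longer change the sum
        term *= y / (a + k)
        total += term
        k += 1
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))


def chi2_quantile(p, nu, tol=1e-10):
    """Invert chi2_cdf by bisection (the CDF is monotonic in x)."""
    lo, hi = 0.0, 10.0 * nu + 100.0  # generous bracket for moderate nu
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def chi2_test(observed, nu, tail=0.01):
    """Two-sided decision: pass if observed lies between the tail and 1 - tail quantiles."""
    lt, rt = chi2_quantile(tail, nu), chi2_quantile(1.0 - tail, nu)
    return "Passed" if lt <= observed <= rt else "Failed"


for nu in (8, 9):
    print(f"nu = {nu}: LT = {chi2_quantile(0.01, nu):.1f}, RT = {chi2_quantile(0.99, nu):.1f}")
# Observed chi-square values taken from the table above.
print(chi2_test(14.0, 8), chi2_test(43.1, 9), chi2_test(3.2, 9))
```

This prints LT = 1.6 and RT = 20.1 for $\nu = 8$, LT = 2.1 and RT = 21.7 for $\nu = 9$, and then `Passed Failed Passed`, matching the corresponding table entries.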

The reasons for the elevated GB in the MB of community drinking water were investigated. Ten L of water were evaporated to 50 mL and measured using precise gamma-ray spectrometry [^{40}K was 0.6926 ± 0.0790 pCi/L. It was also possible to identify several beta/gamma-emitting progeny of the ^{238}U series: ^{234}Th, ^{214}Pb, ^{214}Bi, and ^{210}Pb, as well as of the ^{232}Th series: ^{228}Ac, ^{212}Pb, and ^{208}Tl. The combined activity of the beta/gamma progeny was 0.1513 ± 0.0672 pCi/L. Therefore, the sum of ^{40}K and the beta/gamma progeny was 0.8440 ± 0.1037 pCi/L, which is consistent, within the measured uncertainties, with the GB activity of 0.8121 ± 0.2801 pCi/L from the MB measurement. Also associated with the decay of ^{238}U and ^{232}Th is their alpha activity, plus alpha progeny with activity similar to that of the beta/gamma progeny. This alpha activity could not be detected by gamma spectrometry and was below detection by GA in the MB measurement. However, the fact that the GA average of 3.0951 pCi/L is slightly higher than the expected 2.9888 pCi/L is an indication of it. Unlike the beta activity, the small alpha progeny activity did not affect the CST or the t-test. It should be noted that this level of naturally present radioactivity in the community water is well below the MCL and thus poses a small risk to the population.
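The consistency statement above can be quantified with a short calculation. The sketch below assumes the uncertainties are independent, so that they add in quadrature, and expresses the difference between the gamma-spectrometry sum and the MB result in units of the combined uncertainty. This z-score criterion is our illustration, not a procedure stated in the text. Summing the rounded inputs gives 0.8439 rather than the quoted 0.8440; the last-digit difference presumably reflects rounding of the inputs.

```python
import math


def add_in_quadrature(*uncertainties):
    """Combine independent standard uncertainties (root sum of squares); independence is assumed."""
    return math.sqrt(sum(u * u for u in uncertainties))


# Activities and uncertainties quoted in the text (pCi/L).
k40, u_k40 = 0.6926, 0.0790
progeny, u_progeny = 0.1513, 0.0672
mb, u_mb = 0.8121, 0.2801

gamma_sum = k40 + progeny
u_gamma_sum = add_in_quadrature(u_k40, u_progeny)
# Difference from the method blank in units of the combined uncertainty.
z = (gamma_sum - mb) / add_in_quadrature(u_gamma_sum, u_mb)
print(f"sum = {gamma_sum:.4f} +/- {u_gamma_sum:.4f} pCi/L, z = {z:.2f}")
```

The combined uncertainty reproduces the quoted 0.1037 pCi/L, and $|z|$ is well below 1.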

We have described five simplified methods of deriving the chi-square distribution. Three of them, by convolution, moment generating function, and Bayesian inference, are described in the literature and have been outlined here for comparison. The simplest of them appears to be the convolution method. It uses only the substitution from the normal variable to a chi-square variable and requires the calculation of a single convolution integral. It then infers the form of the multiple convolution from the properties of the gamma distribution, leading to the chi-square distribution. The moment generating function method is more advanced, as it requires knowledge of the moment generating function and of the gamma distribution. The Bayesian inference method requires knowledge of the likelihood function and prior probabilities but does not require knowledge of the gamma distribution.

In this work, we have proposed two new methods for deriving the chi-square distribution: by induction and by Laplace transform. The induction method uses operational calculus with only a single integral, which leads to the beta function. The proposed derivation applies modern formalism and appears simpler than Helmert's original derivation of 1876. A disadvantage of the induction method is that it requires prior knowledge of the chi-square distribution in order to perform the induction. There is, however, a significant advantage. All other methods require either no constraints on the data, i.e., the number of degrees of freedom must equal the number of observations, or, in the case of Bayesian inference, one constraint. The induction method leaves any existing constraints intact by adding one induction step to the existing number of degrees of freedom. The proposed derivation by Laplace transform is more advanced because it uses integration in the complex plane. A significant advantage of the Laplace transform and Bayesian inference methods is that they do not require prior knowledge of the gamma distribution.

We have also described a unique application of the chi-square test to environmental science. In chi-square testing, it is important to separate systematic effects from random uncertainties. In this work, a systematic natural contamination of the laboratory method blank caused the chi-square test for combined variance/location to fail; however, it did not affect the chi-square test for variance alone. After the systematic method blank was subtracted, the chi-square variance/location test passed. This was confirmed by the location t-test. It is also imperative to perform an uncertainty analysis. In this work, using either individual or sample standard deviations did not change the outcome of the variance/location chi-square test. While the chi-square test verifies whether a laboratory test method is adequate to monitor gross alpha and gross beta radioactivity in drinking water, the test statistic combining variance and location is more useful than the one based on variance alone because it can identify systematic bias.

N. F. acknowledges partial support by the Questar III STEM Research Institute for Teachers of Science, Engineering, Mathematics, and Technology. K. N. acknowledges partial support by the US Food and Drug Administration under Grant 5U18FD005514-04. Thanks are due to J. Witmer for his valuable comments.

The authors declare no conflicts of interest regarding the publication of this paper.

Semkow, T.M., Freeman, N., Syed, U.-F., Haines, D.K., Bari, A., Khan, A.J., Nishikawa, K., Khan, A., Burn, A.G., Li, X. and Chu, L.T. (2019) Chi-Square Distribution: New Derivations and Environmental Application. Journal of Applied Mathematics and Physics, 7, 1786-1799. https://doi.org/10.4236/jamp.2019.78122

CL: Confidence Level

CSD: Chi-Square Distribution

CST: Chi-Square Test

DL: Detection Limit for radionuclides

EPA: U.S. Environmental Protection Agency

GA: Gross Alpha Radioactivity

GB: Gross Beta Radioactivity

L: Liter

LT: Left Tail

MB: Method Blank

mBq: milli-Becquerel

MCL: Maximum Contaminant Level

MCLG: Maximum Contaminant Level Goal

Mgf: Moment generating function

mL: milli-Liter

mrem: milli-rem

NIST: National Institute of Standards and Technology

pCi: pico-Curie

Pdf: Probability density function

RT: Right Tail

SDWA: Safe Drinking Water Act

STEM: Science, Technology, Engineering and Mathematics

y: year

μSv: micro-Sievert

2T: Two Tail

A.2. Variables

a, b: parameters of the gamma distribution

B: beta function

E: expectation value

$E_i$: expected frequency

$f(s)$: analytic function

gamma: gamma distribution

i, k: indices

m: number of categories

n: number of observations

$O_i$: observed frequency

p: number of parameters of the model distribution

s: complex variable

$s_0$: pole

t: t-test variable

Var: variance

$x_i$: normal random variable

$\bar{x}$: sample mean

u, z: substituted variables

$\Gamma$: gamma function

$\mu$, $\mu_i$: expected value: population, individual

$\nu$: number of degrees of freedom

$\sigma$, $\sigma_i$, $\sigma_x$: standard deviation, individual, sample

$\varphi_i^2$: individual chi-square

$\chi^2$, $\chi_i^2$, $\chi_n^2$, $\chi_\nu^2$: chi-square, for i, n observations, ν degrees of freedom