A New Definition of P-Value Involving the Maximum Likelihood Estimator

Abstract

We present an alternative definition of the P-value for a statistical hypothesis test of a real-valued parameter of a continuous random variable X. Our approach uses neither the notion of Type I error nor the assumption that the null hypothesis is true. Instead, the new P-value involves the maximum likelihood estimator, which is usually available for a parameter such as the mean μ or standard deviation σ of a random variable X with a common distribution.

Share and Cite:

Corley, H. (2024) A New Definition of P-Value Involving the Maximum Likelihood Estimator. Open Journal of Statistics, 14, 546-552. doi: 10.4236/ojs.2024.145024.

1. Introduction

A principal goal of statistics is to obtain evidence from data for comparing alternative decisions. For example, statistical evidence may allow one to decide that a population mean μ satisfies μ ≤ μ₀ as opposed to μ > μ₀ for some specified μ₀. Unfortunately, evidence is an ambiguous concept in statistics, though [1]-[7], among others, have attempted to define it. Currently, the P-value [8] [9] is perhaps the most frequently applied measure of evidence used to reject or fail to reject a hypothesis. In Section 2, we review the standard P-value and the notion of a maximum likelihood estimator (MLE). We then propose the new MLE P-value, which involves the MLE of the parameter of interest. The MLE P-value is neither related to significance levels nor defined under the assumption that the null hypothesis is true. Both the standard and new MLE P-values may be considered measures of evidence, but the latter is more intuitive. In Section 3, we present five examples and compare the two approaches.

2. The Standard and MLE P-Values

The notion of P-value is a fundamental tool in statistical inference and has been widely used for reporting the outcomes of hypothesis tests. Yet in practice, the P-value is often misinterpreted, misused, or miscommunicated. Moreover, it does not unequivocally reflect the available evidence for the null hypothesis since H₀ is assumed to hold. In this section, we propose the new MLE P-value, which may in some cases give different values than the existing definitions. The MLE P-value provides a simple and intuitive interpretation of the P-value. Our definition appears applicable to a wide range of hypothesis testing problems and yields an interpretation of the P-value as both a cardinal and an ordinal measure of evidence. We restrict our development to standard one-sided hypothesis testing, but Example 3 illustrates that the approach here can also be used for two-sided hypothesis testing.

We first summarize the two standard ways of defining the P-value for the general hypothesis test H₀: θ ∈ Θ₀ vs H₁: θ ∉ Θ₀ with parameter space Θ₀. Let T(X) be a test statistic for a random sample X = (X₁, …, Xₙ) from a random variable X with a single scalar parameter θ. Denote the observed data for X as x = (x₁, …, xₙ).

Definition 1 (Definitions of P-value from [10] and [11]). Under the assumption that H₀: θ ∈ Θ₀ is true, the P-value associated with the observed data x is defined as either

PV1(x|Θ₀) = sup_{θ∈Θ₀} P{ T(X) ≥ T(x) | θ } (1)

or

PV2(x|Θ₀) = inf{ α : T(x) ∈ R_α }, (2)

where R_α is the rejection region for a level of significance α.

Equation (1) is usually interpreted as follows. Under the assumption that H₀ is true, the P-value is the probability that T(X) is at least as extreme as its observed value T(x). This interpretation can lead to the common misunderstanding that this definition of P-value gives the probability that H₀ is true. On the other hand, Equation (2) is based on significance levels (Type I error probabilities) under the assumption that H₀ is true, and it can lead to the misunderstanding that the P-value is only a measure of Type I error and is unrelated to the likelihood that H₀ is true.

Both definitions lead to the question: how can the assumption that H₀ is true produce evidence that H₀ is true? The answer is that H₀ is rejected when, under the assumption that it is true, an event of probability less than or equal to 0.05 (for example) has occurred. In other words, a contradiction involving probability is reached. For other issues with these definitions, see [10]-[12], for example. To address such issues, we utilize the well-known maximum likelihood estimator (MLE) [13] [14] and define a P-value based on the MLE of the parameter under consideration.
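Definition (1) can be made concrete in code. For the one-sided test H₀: μ ≤ μ₀ in a normal model with known σ, the tail probability P{X̄ ≥ x̄ | μ} increases in μ, so the supremum in (1) over Θ₀ = (−∞, μ₀] is attained at the boundary μ = μ₀. A minimal stdlib-only sketch (using the same numbers that appear in Example 1 below):

```python
import math

def norm_cdf(z):
    # standard normal cdf via the error function (standard library only)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pv1_one_sided(xbar, mu0, sigma, n):
    # PV1 = sup over mu <= mu0 of P[Xbar >= xbar | mu]; the tail probability
    # increases in mu, so the supremum is attained at mu = mu0
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1.0 - norm_cdf(z)

pv1 = pv1_one_sided(xbar=13.0, mu0=12.0, sigma=2.0, n=9)  # z = 1.5
```

This PV1 coincides numerically with the MLE P-value obtained in Example 1, a coincidence discussed after Result 1.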

Definition 2 (Likelihood Function and MLE). Let x = (x₁, …, xₙ) be sample data from a random sample X = (X₁, …, Xₙ) from a random variable X with sample space S and real-valued parameter θ, and let f(x|θ) be the joint pdf of the random sample X. For any sample data x, the likelihood function of θ is defined as

L(θ|x) = f(x|θ), (3)

where L(θ|x) in (3) is a function of the variable θ for the given data x. The MLE θ̂(x) maximizes L(θ|x) for the data sample x; i.e., θ̂(x) = argmax_θ L(θ|x), which is a function of x.
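As an illustration of Definition 2, the following sketch assumes an exponential model f(x|λ) = λe^(−λx) with illustrative data (not taken from the paper) and compares the closed-form MLE λ̂(x) = 1/x̄ with a brute-force grid maximization of the log-likelihood:

```python
import math

data = [0.5, 1.2, 0.8, 2.0, 0.3]  # illustrative sample, not from the paper

def log_likelihood(lam, xs):
    # log L(lambda | x) = n log(lambda) - lambda * sum(x) for the exponential pdf
    return len(xs) * math.log(lam) - lam * sum(xs)

# closed-form MLE of the exponential rate: the reciprocal of the sample mean
lam_closed = len(data) / sum(data)

# brute-force argmax over a fine grid (a sanity check, not a practical method)
grid = [0.01 * k for k in range(1, 1000)]
lam_grid = max(grid, key=lambda lam: log_likelihood(lam, data))
```

The grid argmax agrees with 1/x̄ to within the grid spacing.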

Definition 3 (MLE P-value). Let x = (x₁, …, xₙ) be observed sample data for a random sample X = (X₁, …, Xₙ) from a continuous random variable X with a single real-valued parameter θ and pdf f_X(x|θ). Let Y = θ̂(X) denote the MLE for θ, with pdf f_Y(y|θ). For the hypothesis test H₀: θ ∈ Θ₀ vs H₁: θ ∉ Θ₀, the MLE P-value (MPV) at the sample data x for the null hypothesis H₀: θ ∈ Θ₀ is defined as

MPV(x|Θ₀) = ∫_{Θ₀} f_Y(y | θ = θ̂(x)) dy, (4)

where the integration is over the scalar values y ∈ Θ₀ of Y = θ̂(X).

In the integration of (4), θ is estimated by the number θ̂(x) computed from the sample data x. The pdf f_Y(y | θ = θ̂(x)) of the MLE Y = θ̂(X) is then integrated over the set Θ₀ of parameter values in the null hypothesis H₀: θ ∈ Θ₀. For example, in the one-tailed hypothesis test H₀: θ ≤ θ₀ vs H₁: θ > θ₀, the set Θ₀ would be (−∞, θ₀]. In this case, (4) approximates the frequentist probability that Y = θ̂(X) ∈ Θ₀, and so MPV(x|Θ₀) approximates the likelihood that H₀ is true. Any inaccuracies are due (i) to integrating over Y = θ̂(X), since θ has no prior or posterior distribution, and (ii) to setting θ = θ̂(x). Both result from using the MLE θ̂(x) as a surrogate for the unknown parameter θ. Doing so, however, exploits the fact that the distribution of the MLE of a parameter θ is often known for a continuous random variable X with a common distribution such as the normal or exponential. In such cases, (4) may be analytically integrable. If not, numerical integration could be employed; however, only analytical integration is considered here.
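As a numerical check of (4) in its simplest setting (the normal-mean case of Example 1 below, with known σ), the sketch below integrates the pdf of the MLE Y = X̄ ~ N(θ̂(x), σ²/n), with θ̂(x) = x̄, over Θ₀ = (−∞, μ₀] by the trapezoidal rule and compares the result with the closed form of (5):

```python
import math

n, mu0, sigma, xbar = 9, 12.0, 2.0, 13.0  # the data of Example 1
se = sigma / math.sqrt(n)                 # standard error of the MLE Xbar

def f_Y(y):
    # pdf of Y = Xbar ~ N(theta_hat(x), sigma^2/n), with theta_hat(x) = xbar
    return math.exp(-0.5 * ((y - xbar) / se) ** 2) / (se * math.sqrt(2.0 * math.pi))

# integrate f_Y over Theta_0 = (-inf, mu0] by the trapezoidal rule,
# truncating the lower tail at xbar - 10 standard errors
a, b, m = xbar - 10.0 * se, mu0, 20000
h = (b - a) / m
mpv_numeric = h * (0.5 * f_Y(a) + sum(f_Y(a + k * h) for k in range(1, m)) + 0.5 * f_Y(b))

# closed form (5): the standard normal cdf at (mu0 - xbar) / (sigma / sqrt(n))
mpv_closed = 0.5 * (1.0 + math.erf((mu0 - xbar) / (se * math.sqrt(2.0))))
```

The numerical integral and the closed form agree to several decimal places, which is exactly what Definition 3 requires.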

3. Examples

Five examples are now presented to illustrate the MLE P-value approach. The first two involve a normal random variable, the third treats a two-sided test for the normal mean, and the fourth involves an exponential random variable. The fifth demonstrates a possible limitation of the method.

Example 1. For a random sample of size n, consider the hypothesis test H₀: μ ≤ μ₀ vs H₁: μ > μ₀ for the parameter μ of a normal random variable X ~ N(μ, σ²). In this case, we write MPV(x|Θ₀) as MPV(x|μ₀). We also show that (4) gives the standard P-value for θ = μ.

Result 1. Let X₁, …, Xₙ be a random sample from a random variable X ~ N(μ, σ²) with unknown μ, and consider the one-sided hypothesis test H₀: μ ≤ μ₀ vs H₁: μ > μ₀. Let x̄ be the sample mean of the sample values x₁, …, xₙ. When σ² is known,

MPV(x|μ₀) = ϕ((μ₀ − x̄)/(σ/√n)); (5)

and when σ² is unknown,

MPV(x|μ₀) = F_{n−1}((μ₀ − x̄)/(s/√n)), (6)

where ϕ(z) is the cdf of the standard normal distribution and F_{n−1}(t) is the cdf of the Student t distribution with n − 1 degrees of freedom.

Proof. When σ² is known, to prove (5) we use the fact [9] that the MLE μ̂(X) for μ is the sample mean X̄ ~ N(μ, σ²/n). The integral of (4) therefore becomes

P[X̄ ≤ μ₀] = ϕ((μ₀ − μ)/(σ/√n)). (7)

Substituting x̄ for μ in (7) gives (5). When σ² is unknown, Equation (6) follows from the definition of the Student t distribution [9]. ■

Numerically, when n = 9, μ₀ = 12, σ = 2, and x̄ = 13, the MLE P-value of (5) for H₀: μ ≤ μ₀ vs H₁: μ > μ₀ is ϕ((12 − 13)/(2/3)) = ϕ(−1.5) ≈ 0.067 from the standard normal table [9]. If n = 25 instead, the MLE P-value of (5) is ϕ((12 − 13)/(2/5)) = ϕ(−2.5) ≈ 0.006, and it is much less likely that μ ≤ μ₀. It should be noted that the right sides of (5) and (6) are the standard P-values for the hypothesis test H₀: μ ≤ μ₀ vs H₁: μ > μ₀ under definition (2). In particular, the usual test statistic for this case is z = (x̄ − μ₀)/(σ/√n) with critical region z > z_α [9], where α is the level of significance, i.e., the probability of committing a Type I error. The P-value is then the lowest level of α for which the observed z is significant. But this lowest level of α in standard hypothesis testing is simply 1 − ϕ((x̄ − μ₀)/(σ/√n)) = ϕ((μ₀ − x̄)/(σ/√n)), which is the right side of (5).
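For the unknown-σ case (6), the Student t cdf is not in the Python standard library, but for a quick check it can be obtained by numerically integrating the t pdf. A sketch using the Example 1 data together with the illustrative value s = 2 (the paper does not specify s):

```python
import math

def t_cdf(t, df, lo=-60.0, m=120000):
    # cdf of Student's t with df degrees of freedom, via trapezoidal
    # integration of the t pdf from a far-left truncation point
    c = math.gamma((df + 1) / 2.0) / (math.gamma(df / 2.0) * math.sqrt(df * math.pi))
    pdf = lambda u: c * (1.0 + u * u / df) ** (-(df + 1) / 2.0)
    h = (t - lo) / m
    return h * (0.5 * pdf(lo) + sum(pdf(lo + k * h) for k in range(1, m)) + 0.5 * pdf(t))

# equation (6) with mu0 = 12, xbar = 13, n = 9 and the illustrative s = 2:
# the argument is (12 - 13) / (2/3) = -1.5 with n - 1 = 8 degrees of freedom
mpv_t = t_cdf(-1.5, 8)
```

The heavier t tail gives a somewhat larger MPV than the known-σ value ϕ(−1.5) ≈ 0.067, as expected.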

Example 2. For a random sample of size n, consider the hypothesis test H₀: σ² ≤ σ₀² vs H₁: σ² > σ₀² for the variance σ² of a normal random variable X ~ N(μ, σ²). From [9], the MLE for the variance of a normal distribution is σ̂²(X) = (1/n) Σ_{i=1}^{n} (Xᵢ − X̄)², so nσ̂²(X)/σ² ~ χ²(n−1), from which we have the following.

Result 2. Let X₁, …, Xₙ be a random sample from a random variable X ~ N(μ, σ²) with unknown σ², and consider the one-sided hypothesis test H₀: σ² ≤ σ₀² vs H₁: σ² > σ₀². Then the integral of (4) becomes

MPV(x|σ₀²) = F_{χ²(n−1)}(nσ₀²/σ̂²(x)). (8)

Proof. From [9], the MLE σ̂²(X) for σ² of a normal distribution is (1/n) Σ_{i=1}^{n} (Xᵢ − X̄)² = (n−1)S²/n, and so nσ̂²(X)/σ² ~ χ²(n−1). Hence

P[σ̂²(X) ≤ σ₀²] = P[nσ̂²(X)/σ² ≤ nσ₀²/σ²] = F_{χ²(n−1)}(nσ₀²/σ²). (9)

Substituting σ̂²(x) for σ² in F_{χ²(n−1)}(nσ₀²/σ²) in (9) yields (8). ■

Numerically, when n = 10, σ₀² = 4, and s² = 9, so that σ̂²(x) = (n − 1)s²/n = 8.1, the MLE P-value of (8) for H₀: σ² ≤ σ₀² vs H₁: σ² > σ₀² becomes F_{χ²(9)}(40/8.1) = F_{χ²(9)}(4.94) ≈ 0.16.
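The quoted value can be reproduced without a chi-square table: F_{χ²(k)}(x) = P(k/2, x/2), where P(a, x) is the regularized lower incomplete gamma function, which has a rapidly converging series for moderate x. A stdlib-only sketch:

```python
import math

def gamma_p(a, x, terms=200):
    # regularized lower incomplete gamma P(a, x), via the series
    # P(a, x) = x^a e^(-x) / Gamma(a) * sum_{k>=0} x^k / (a (a+1) ... (a+k))
    total, term = 0.0, 1.0 / a
    for k in range(terms):
        total += term
        term *= x / (a + k + 1)
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def chi2_cdf(x, k):
    # chi-square cdf with k degrees of freedom
    return gamma_p(k / 2.0, x / 2.0)

# Example 2: n = 10, sigma0^2 = 4, s^2 = 9, so sigma_hat^2(x) = 9*9/10 = 8.1
# and the argument of (8) is n*sigma0^2/sigma_hat^2(x) = 40/8.1 ~ 4.94
mpv_var = chi2_cdf(10 * 4 / 8.1, 9)
```

A quick sanity check: F_{χ²(1)}(2.706) ≈ 0.90, the familiar one-degree-of-freedom critical value.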

Example 3. In this example, we illustrate that the MLE approach is applicable to two-sided hypothesis testing. For a random sample of size n, consider the hypothesis test H₀: μ = μ₀ vs H₁: μ ≠ μ₀ for the parameter μ of X ~ N(μ, σ²) with σ known. To be more realistic, we modify this test to H₀: μ ∈ [μ₀ − δ, μ₀ + δ] vs H₁: μ ∉ [μ₀ − δ, μ₀ + δ], where δ is an acceptable tolerance level, i.e., an acceptable deviation from μ₀. In this case, (4) becomes

MPV(x|μ₀) = ϕ((μ₀ + δ − x̄)/(σ/√n)) − ϕ((μ₀ − δ − x̄)/(σ/√n)). (10)

Numerically, when n = 25, μ₀ = 12, σ² = 16, x̄ = 12.3, and δ = 0.5, the MLE P-value of (10) becomes ϕ(0.25) − ϕ(−1.00) ≈ 0.4400.
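A stdlib-only check of this arithmetic:

```python
import math

def norm_cdf(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, mu0, sigma, xbar, delta = 25, 12.0, 4.0, 12.3, 0.5
se = sigma / math.sqrt(n)  # 4/5 = 0.8

# equation (10): the arguments are 0.25 and -1.00
mpv_two_sided = norm_cdf((mu0 + delta - xbar) / se) - norm_cdf((mu0 - delta - xbar) / se)
```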

Example 4. Let the random variable X have an exponential distribution with pdf f_X(x|λ) = λe^(−λx), x > 0, with parameter λ > 0. For a random sample of size n, consider the hypothesis test H₀: λ ≤ λ₀ vs H₁: λ > λ₀. From [15], the MLE for λ is Y = 1/X̄. It can be shown [16] that X̄ for the exponential random variable X follows the gamma distribution Γ(n, nλ). Hence Y follows an inverse gamma distribution [17] with shape n and scale nλ, so the right side of (4) becomes the regularized upper incomplete gamma function Q(n, n/(λ₀x̄)), which is available in various software packages such as MATLAB.
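A numerical sketch of this case, with illustrative values n = 10, x̄ = 0.8 (so λ̂(x) = 1.25), and λ₀ = 1 that are not from the paper: the integral in (4) of the inverse-gamma pdf of Y = 1/X̄, with shape n and scale nλ̂(x) = n/x̄, over Θ₀ = (0, λ₀] is compared with Q(n, n/(λ₀x̄)), computed via its finite Poisson sum for integer n:

```python
import math

n, xbar, lam0 = 10, 0.8, 1.0   # illustrative values, not from the paper
beta = n / xbar                # scale of the inverse-gamma pdf of Y = 1/Xbar

def f_Y(y):
    # inverse-gamma pdf with shape n and scale beta, computed in log space
    return math.exp(n * math.log(beta) - math.lgamma(n)
                    - (n + 1) * math.log(y) - beta / y)

# MPV from (4): integrate f_Y over Theta_0 = (0, lam0] (trapezoidal rule)
a, b, m = 1e-9, lam0, 100000
h = (b - a) / m
mpv_int = h * (0.5 * f_Y(a) + sum(f_Y(a + k * h) for k in range(1, m)) + 0.5 * f_Y(b))

# closed form Q(n, n/(lam0*xbar)); for integer n the regularized upper
# gamma function reduces to a Poisson sum: Q(n, x) = e^(-x) sum_{k<n} x^k/k!
x = n / (lam0 * xbar)
q = math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))
```

The two values agree, confirming the closed form for this case.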

Example 5. To illustrate the difficulty of finding the pdf f_Y(y | θ = θ̂(x)) of the MLE Y = θ̂(X) in the integral of (4), consider the random variable X with pdf f_X(x|θ) = θx^(−(θ+1)) for x > 1 and parameter θ > 0. Then it can be shown [9] that θ̂(X) = n / Σ_{i=1}^{n} ln(Xᵢ), whose pdf is difficult to determine.

4. Conclusions

The MLE P-value defined here gives a value that estimates the probability that a one-sided null hypothesis on a single parameter is true. Two-sided hypotheses can be treated similarly using tolerance levels, as in Example 3. In the approach of this paper, the MLE Y = θ̂(X) may be thought of as a new type of test statistic that is integrated over Θ₀, while its numerical value θ̂(x) from the data x is used to estimate θ in this integration. Obtaining such an approximation of the probability that the null hypothesis is true is, in fact, the ultimate goal of hypothesis testing. Certainty is not possible. Given the MLE P-value, a decision maker must decide whether this value is sufficiently large to accept the null hypothesis. Different decision makers might judge a given MLE P-value differently, but a metric for the decision has now been provided.

The principal limitation of the analytical approach here is that the pdf of the maximum likelihood estimator must be both known and reasonably tractable to integrate. Future work should be directed at numerical integration of these integrals. Tables could be developed for certain parameters of particular random variables X. Although MLEs have been used here as the surrogate for single parameters, other estimators could also be used. The advantage of MLEs is that they are well studied and often immediately available. Finally, an approach similar to that of this paper could be applied to hypothesis testing for parameters of discrete random variables.

Acknowledgements

The author thanks his former student Maryham Moghimi for her discussion about statistics and the P-value.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Shafer, G. (1976) A Mathematical Theory of Evidence. Princeton University Press.
[2] Evans, M. (2015) Measuring Statistical Evidence Using Relative Belief. Chapman and Hall/CRC.
https://doi.org/10.1201/b18587
[3] Hacking, I. and Romeijn, J. (2016) Logic of Statistical Inference. Cambridge University Press.
https://doi.org/10.1017/cbo9781316534960
[4] Dollinger, M.B., Kulinskaya, E. and Staudte, R.G. (1996) When Is a P-Value a Good Measure of Evidence? In: Lecture Notes in Statistics, Springer, 119-134.
https://doi.org/10.1007/978-1-4612-2380-1_8
[5] Hubbard, R. and Lindsay, R.M. (2008) Why P-Values Are Not a Useful Measure of Evidence in Statistical Significance Testing. Theory & Psychology, 18, 69-88.
https://doi.org/10.1177/0959354307086923
[6] Singh Chawla, D. (2017) Big Names in Statistics Want to Shake up Much-Maligned P Value. Nature, 548, 16-17.
https://doi.org/10.1038/nature.2017.22375
[7] Wang, B., Zhou, Z., Wang, H., Tu, X.M. and Feng, C. (2019) The P-Value and Model Specification in Statistics. General Psychiatry, 32, e100081.
https://doi.org/10.1136/gpsych-2019-100081
[8] Chavalarias, D., Wallach, J.D., Li, A.H.T. and Ioannidis, J.P.A. (2016) Evolution of Reporting P-Values in the Biomedical Literature, 1990-2015. Journal of the American Medical Association, 315, 1141-1148.
https://doi.org/10.1001/jama.2016.1952
[9] Walpole, R.E., Myers, R.H., Myers, S.L. and Keying, Y. (2017) Probability and Statistics for Engineers and Scientists. 9th Edition, Pearson Education.
[10] Abell, M.L., Braselton, J.P. and Rafter, J.A. (1999) Statistics with Mathematica. Academic Press.
[11] Lehmann, E.L. and Romano, J.P. (2005) Testing Statistical Hypotheses. Springer.
[12] Goodman, S. (2008) A Dirty Dozen: Twelve P-Value Misconceptions. Seminars in Hematology, 45, 135-140.
https://doi.org/10.1053/j.seminhematol.2008.04.003
[13] Pawitan, Y. (2013) In All Likelihood. Statistical Modeling and Inference Using Likelihood. 1st Edition, The Clarendon Press.
[14] Casella, G. and Berger, R.L. (2002) Statistical Inference. 2nd Edition, Cengage Learning.
[15] Hines, W.W., Montgomery, D.C., Goldman, D.M. and Borror, C.M. (2003) Probability and Statistics in Engineering. 4th Edition, Wiley.
[16] Distribution of the Sample Mean of an Exponential Random Variable. Mathematics Stack Exchange. https://math.stackexchange.com/questions/155296/distribution-of-the-sample-mean-of-a-exponential
[17] Inverse-Gamma Distribution. Wikipedia. https://en.wikipedia.org/wiki/Inverse-gamma_distribution

Copyright © 2025 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.