Non-Regular Example of Confidence-Interval Construction

Jan Vrbik

doi:10.4236/ojs.2023.134024

Open Journal of Statistics > Vol.13 No.4, August 2023

Non-Regular Example of Confidence-Interval Construction

Jan Vrbik
Department of Mathematics and Statistics, Brock University, Catharines, Canada.
DOI: 10.4236/ojs.2023.134024 PDF HTML XML 56 Downloads 223 Views

Abstract

When dealing with a regular (fixed-support) one-parameter distribution, the corresponding maximum-likelihood estimator (MLE) is, to a good approximation, normally distributed. But, when the support boundaries are functions of the parameter, finding good approximation for the sampling distribution of MLE (needed to construct an accurate confidence interval for the parameter’s true value) may get very challenging. We demonstrate the nature of this problem, and show how to deal with it, by a detailed study of a specific situation. We also indicate several possible ways to bypass MLE by proposing alternate estimators; these, having relatively simple sampling distributions, then make constructing a confidence interval rather routine.

Keywords

Non-Regular Parameter Estimation, Maximum-Likelihood Estimator, Asymptotic Distribution, Largest-Order Statistic, Monte-Carlo Simulation

Share and Cite:

Vrbik, J. (2023) Non-Regular Example of Confidence-Interval Construction. Open Journal of Statistics, 13, 475-491. doi: 10.4236/ojs.2023.134024.

1. Introduction

Extensive literature exist describing how to find an approximate sampling distribution of one or more maximum-likelihood estimators of parameters of a regular distribution. The corresponding procedure is based on Rao-Cramer variance (in one-parameter case) or Fisher-information matrix (several parameters), resulting in (either univariate or multivariate, respectively) Normal distribution; in the former case, this can be further improved by including a few extra terms of a more accurate Edgeworth expansion (see, for example, [1] and [2] ). Constructing such approximation is quite routine, even though use of a computer may be necessary to find the required first four moments.

The purpose of this article is to show how much more difficult the same task becomes when the sampled distribution is not regular; we do this using the following one-parameter example: sampling a distribution whose probability density function is given by

$f (x) = \frac{3}{4 θ_{0}} (1 - \frac{x^{2}}{θ_{0}^{2}}) when - θ_{0} < x < θ_{0}$ (1)

(zero otherwise), and finding an approximation for the probability density function (PDF) of the maximum-likelihood estimator (MLE, denoted $\hat{θ}$ ) of the $θ_{0}$ (necessarily positive) parameter. We will also discuss several possibilities for constructing a confidence interval (CI) for the unknown value of $θ_{0}$ , aiming for good accuracy in terms of both the CI’s length (which needs to be small), and its level of confidence (which can be often only approximated).

Note that our example has two distinctive properties (beyond being a case of non-regular estimation), namely

· it involves what is called a scaling parameter; this simplifies the search for a good estimator,

· the corresponding MLE is at the local maximum of the likelihood function, not at its boundary (the latter case is easy to deal with—we leave it for another publication).

In what follows, we use the following terminology: analytic describes results expressed in formula form, while numeric implies using a computational algorithm; similarly, regarding accuracy, we use: exact (self-explanatory; only analytic results qualify), while practically exact answers can be computed numerically to arbitrary precision; accurate implies relatively small errors, and adequate restricts errors to less than one percent.

2. Maximum Likelihood Estimator

To find the MLE of $θ_{0}$ , we differentiate the natural logarithm of the corresponding likelihood function, namely

$L (θ) = \sum_{i = 1}^{n} \ln (1 - \frac{X_{i}^{2}}{θ^{2}}) - n \ln θ + n \ln (\frac{3}{4})$ (2)

with respect to $θ$ , thus getting

$\frac{2}{θ^{3}} \sum_{i = 1}^{n} \frac{X_{i}^{2}}{1 - \frac{X_{i}^{2}}{θ^{2}}} - \frac{n}{θ}$ (3)

Making the last expression equal to 0 results in

$\bar{\frac{\frac{X^{2}}{θ^{2}}}{1 - \frac{X^{2}}{θ^{2}}}} = \frac{1}{2}$ (4)

where the bar indicates the sample mean of the full expression. Solving the last equation for $θ$ , subject to the $θ > \max_{1 \leq i \leq n} | X_{i} |$ constraint (this can be done only numerically) yields the resulting ML estimate, denoted $\hat{θ}$ . Note that the distribution of $Θ : = \hat{θ} / θ_{0}$ is parameter free, which simplifies our task; we then need to find the sampling distribution of Θ only (or an adequate approximation to it); the corresponding transformation back to $\hat{θ}$ is then quite routine.

3. Sampling Distribution of Θ

Let us then investigate the distribution of Θ, a random variable defined as the unique (when $θ > \max_{1 \leq i \leq n} | Y_{i} |$ ) solution to

$\bar{\frac{\frac{Y^{2}}{θ^{2}}}{1 - \frac{Y^{2}}{θ^{2}}}} = \frac{1}{2}$ (5)

where Y stands for a sample of size n from a distribution whose PDF is

$f (y) = \frac{3}{4} (1 - y^{2}) where - 1 < y < 1$ (6)

The usual (regular-case) approach would be to expand the LHS of (5) at $θ = 1$ (the asymptotic mean of Θ), thus getting

$\bar{\frac{Y^{2}}{1 - Y^{2}}} - 2 \cdot \bar{\frac{Y^{2}}{{(1 - Y^{2})}^{2}}} \cdot (θ - 1) + \dots = \frac{1}{2}$ (7)

and then solving for $θ - 1$ ; this yields the following approximate value of $Θ - 1$ :

$Θ - 1 ≃ \frac{\bar{\frac{Y^{2}}{1 - Y^{2}}} - \frac{1}{2}}{2 \cdot \bar{\frac{Y^{2}}{{(1 - Y^{2})}^{2}}}}$ (8)

When both the numerator and denominator have finite moments, this then leads, without much difficulty, to the usual (Normal or Edgeworth) asymptotic result.

Unfortunately, $U : = \frac{Y^{2}}{1 - Y^{2}}$ has $\frac{1}{2}$ as its expected value but its variance is infinite, while $V : = \frac{Y^{2}}{{(1 - Y^{2})}^{2}}$ has both of these moments infinite. The best we can do at this point is to investigate the marginal distribution of the numerator and then, separately, of the denominator (deriving the bivariate distribution of U and V and proceeding in the manner of [3] or [4] proves too difficult).

3.1. The Numerator

Using (6), it is easy to derive the corresponding PDF of U, namely

$f_{n} (u) = \frac{3}{4} \cdot \frac{1}{{(u + 1)}^{5 / 2} \sqrt{u}} where u > 0$ (9)

This can be routinely converted into the corresponding characteristic function (CF) of $U - \frac{1}{2}$ , namely

$χ_{n} (t) : = \frac{t}{2} \exp (- i t) \cdot (K_{0} (- \frac{i t}{2}) + (t - i) K_{1} (- \frac{i t}{2}))$ (10)

where $K_{0}$ and $K_{1}$ are second-kind Bessel functions. When expanded in t, the ln of the last expression yields

$\ln χ_{n} (t) ≃ 0.2591 t^{2} - 0.5890 i t | t | + 0.375 t^{2} \ln | t | + \dots$ (11)

implying that the ln of the characteristic function of

$\frac{\sum_{i = 1}^{n} (\frac{Y_{i}^{2}}{1 - Y_{i}^{2}} - \frac{1}{2})}{\sqrt{n \ln n}}$ (12)

is (now expanded in terms which decrease with n)

$- \frac{3 t^{2}}{16} (1 + \frac{\ln \ln n}{\ln n} - \frac{1.382}{\ln n}) + \frac{3 t^{2} \ln | t |}{8 \ln n} - 0.589 i \frac{t | t |}{\ln n} + \dots$ (13)

The $n \to \infty$ limit of the last expression is $- \frac{3}{16} t^{2}$ , which means that (12) has, in the same limit, the Normal distribution with the mean of 0 and variance equal to $\frac{3}{8}$ , in agreement with [5] (yet, the actual variance of (12) remains infinite at any n). Unfortunately, the convergence to this limit is so slow that using it as an approximation is virtually useless; even when employing the full expression (9), adequate approximation is achieved only when $n \geq 1000$ . As an example, see Figure 1 which displays,

· an empirical histogram of one million values of (12), generated by Monte Carlo (MC) simulation (done using Mathematica) of the same number of $Y_{i}$ samples of size $n = 1000$ , and converting each of them to a value of (12),

Figure 1. Approximating $\bar{U}$ distribution when n = 1000.

· and the corresponding PDF, obtained by raising e to the power of (13) and applying the inverse Fourier transform (also delegated to Mathematica) to the resulting CF; note that $\exp (\frac{3 t^{2} \ln | t |}{8 \ln n})$ had to be further expanded to $1 + \frac{3 t^{2} \ln | t |}{8 \ln n}$ to facilitate proper convergence.

The two graphs support our claim of extremely slow convergence to a Normal distribution, which needed to be extended by extra $\frac{1}{\ln n}$ and $\frac{\ln \ln n}{\ln n}$ -proportional terms of (13) to reach (still less than adequate) accuracy at n as large as 1000. Nevertheless, even this information (the normalizing constant of (12) in particular) is essential in our subsequent attempt to find a useful approximation for the PDF of MLE’s distribution.

3.2. The Denominator

By a similarly routine exercise, the PDF of V is

$f_{d} (v) = \frac{3}{8 \sqrt{2}} \frac{\sqrt{1 + 2 v - \sqrt{1 + 4 v}}}{v^{5 / 2}} (1 - \frac{1}{\sqrt{1 + 4 v}})$ (14)

when $v > 0$ (zero otherwise). This time, there is no simple analytic expression for the corresponding CF; nevertheless, we can still find its t expansion by dividing the required integral into two parts, thus:

$\int_{0}^{\infty} \exp (i v t) f (v) d v = \int_{0}^{1} \exp (i v t) f (v) d v + \int_{1}^{\infty} \exp (i v t) f (v) d v$ (15)

To evaluate the first of these, we expand $\exp (i v t)$ at $t = 0$ , which lets us proceed by numerical integration; in the second part, we expand the $f (v)$ at $v = \infty$ , which enables us to evaluate the resulting integral (term by term) analytically. This yields, for the final sum

$χ_{d} (t) = 1 - 0.4334 | t | - i t (0.4574 + 0.357 \ln | t |) + \dots$ (16)

implying that the ln of the CF of

$\frac{\sum_{i = 1}^{n} \frac{Y_{i}^{2}}{{(1 - Y_{i}^{2})}^{2}}}{n \ln n}$ (17)

is (when expanded as a function of n rather than t)

$(0.375 - \frac{0.3018}{\ln n} + \frac{0.375 \ln \ln n}{\ln n}) i t - \frac{0.589}{\ln n} | t | - 0.375 i \frac{t \ln | t |}{\ln n} + \dots$ (18)

This readily implies that, in the $n \to \infty$ limit, (17) converges to the constant $\frac{3}{8}$ (yet, its expected value and variance remain infinite at any n); but again: this asymptotic (degenerate) distribution is modified (at any finite n) by additional terms, which make the actual convergence extremely slow. Similarly to results of the previous section, the empirical distribution of (17) agrees with the PDF constructed (via inverse Fourier transform) from (18) only at fairly large values of n; Figure 2 illustrates that, even when $n = 1000$ , the error of this approximation is less than adequate.

Figure 2. Approximating the $\bar{V}$ distribution when n = 1000.

3.3. And Their Ratio

Having such (rather imperfect) approximations for the distributions of both the numerator and denominator of (8) does not readily translate into a similar approximation for the ratio itself; attempting to find one would be extremely difficult and (considering the very limited accuracy achieved by such approach so far) ultimately of a very limited practical value. There is empirical evidence that the RHS of (8) cannot adequately approximate $Θ - 1$ unless additional terms of the (7) expansion are included, thus further complicating the matter.

Nevertheless, the previous two sections have given us a good indication that $Θ - 1$ needs to be multiplied by $\sqrt{n \ln n}$ to converge to an asymptotic distribution whose main part is Normal (with zero mean and variance equal to $\frac{2}{3}$ ), but which needs additional, $\frac{1}{\ln n}$ and $\frac{\ln \ln n}{\ln n}$ -proportional extra terms to reach adequate accuracy. But, to construct such an approximation, we can now rely only on MC simulation of a large number of random independent samples (RIS) from (6), each yielding one random value of $Θ$ , as described earlier. Doing this, we discover that, unlike $\bar{U}$ and $\bar{V}$ , the moments of the $Θ - 1$ distribution are finite; this enables us to use Edgeworth approximation (described shortly) for the corresponding PDF.

But first we go over some details of the MC simulation.

4. Monte-Carlo Simulation

Existing software (such as Mathematica) can easily generate random values from

any of the commonly used distributions. Since $Y = :2 T - 1$ , where T is a random variable having the Beta (2, 2) distribution, it is quite easy to generate a RIS of a specific size n from this distribution. Solving (5) for $θ$ is more difficult, as the procedure needs to be fast, accurate and reliable. The way we do it starts with

$θ = \max_{1 \leq i \leq n} | Y_{i} | + \frac{0.2}{\sqrt{n}}$ (19)

and continues by performing four iteration of the basic Newton technique, where a single iteration replaces the current value of $θ$ by

$θ \cdot (1 + \frac{\bar{\frac{{(\frac{Y}{θ})}^{2}}{1 - {(\frac{Y}{θ})}^{2}} - \frac{1}{2}}}{2 \cdot \bar{\frac{{(\frac{Y}{θ})}^{2}}{{(1 - {(\frac{Y}{θ})}^{2})}^{2}}}})$ (20)

This results in a single, random value of $Θ$ ; to get the corresponding empirical distribution of $Θ$ , this is repeated as many times as feasible. The resulting mega-sample of the values of

$W : = (Θ - 1) \sqrt{n \ln n}$ (21)

can then be used to

· display the corresponding histogram;

· convert it into an empirical PDF function (we use Mathematica’s “SmoothKernel Distribution” for this purpose; for the details of the corresponding algorithm, see [6] );

· find accurate estimates of the first few moments of the W distribution, of selected percentiles, etc.

Note that all our simulations use such large mega-samples (of one million RISs, to be specific) that their results can be considered practically exact.

4.1. Example

As an example, we present the results of this procedure when $n = 30$ , displaying the histogram of the W values together with the corresponding empirical PDF in Figure 3; note that the latter can be evaluated only numerically (it does not have an analytic form). Also note both graphs have been constructed based on the same set of data; this explains their perfect agreement.

The simulation has also yielded the following (practically exact) values of the mean, variance, skewness and excess kurtosis of the W distribution: −0.331, $\frac{2}{3} - 0.120$ , −0.571 and 0.391 respectively, while −2.598, −2.342, −1.978, −1.670. 0.733, 0.893, 1.078 and 1.203 are its empirical percentiles (also known as critical

Figure 3. Empirical PDF of W when n = 30.

values), corresponding to 0.5%, 1%, 2.5%, 5%, 95%, 97.5%, 99% and 99.5% respectively (these enable us to construct CIs for $θ_{0}$ when $n = 30$ , as explained below).

4.2. Edgeworth Approximation

When a reasonably accurate analytic formula is desired (for the PDF of W, at a specific value of n), it is natural to use the usual Edgeworth-series expansion, namely

$\begin{matrix} f (w) = \frac{\exp (- \frac{z^{2}}{2})}{\sqrt{2 π}} \cdot (1 + \frac{z^{3} - 3 z}{6} \cdot α_{3} + \frac{z^{6} - 15 z^{4} + 45 z^{2} - 15}{72} \cdot α_{3}^{2} \\ + \frac{z^{4} - 6 z^{2} + 3}{24} \cdot α_{4}) \end{matrix}$ (22)

where

$z = \frac{w - μ}{\sqrt{V}}$ (23)

and $μ$ , V, $α_{3}$ and $α_{4}$ , are the mean, variance, skewness and excess kurtosis taken from a MC simulation. This approximation will not be as accurate as the empirical PDF of Figure 3—the actual difference (when $n = 30$ ) is displayed in Figure 4—but it can be utilized by people lacking the sophistication and computer resources to do their own simulation. Note that the maximum error of this approximation is never bigger than 1% (the shaded area of Figure 4), and is expected (as has been confirmed) to decrease when the sample size gets larger.

By extending the simulation (this will take several hours of CPU) to get the same set of results for $n = 30, 30 \cdot 2, 30 \cdot 2^{2}, \dots, 30 \cdot 2^{9}$ , we have computed and are displaying (in Figure 5) the values of $μ$ (red), $V - \frac{2}{3}$ (blue), $α_{3}$ (green) and

Figure 4. Error of Edgeworth approximation when n = 30.

Figure 5. Four characteristics of W distribution.

$- α_{4}$ (brown) of the W distribution, for any practical sample size n (on the horizontal, log₁₀ scale); the small dots represent the simulated values, each of the four curves has the (sufficiently flexible, by numerical exploration) form of

$\frac{a}{\ln n} + \frac{b \cdot \ln \ln n}{\ln n} + \frac{c}{n}$ (24)

where $a, b, c$ have been found by a least-square fit. The accuracy of the resulting (22)-based approximation is similar to what we have observed for $n = 30$ (improving, rather slowly, with increasing n).

Note that the mean (also known as the expected value) of W has such a substantial bias (the red line) that (unlike in a regular case) it cannot be ignored even when n gets extremely large (this goes for the other three quantities as well).

5. Confidence Intervals

We will now turn our attention to constructing CIs for $θ_{0}$ , with the aim of making their expected length short and their claimed level of confidence as accurate as possible.

The CI construction critically depends on the sample statistics used for the point estimator of $θ_{0}$ ; we will consider several possibilities, staring with the obvious choice of MLE (the only estimator discussed so far).

5.1. MLE-Based CI

Most of what is needed to construct a CI for the true value of $θ_{0}$ has already been done in the previous section. Now, we just need to

· choose the level of confidence (denoted $1 - α$ ), with $α$ typically between 1% and 10% (5% being most common);

· find the corresponding critical values (denoted $C_{α / 2}$ and $C_{1 - α / 2}$ ) of the W distribution (these are obtained either as a by-product of MC simulation or, somehow less accurately, computed from the Edgeworth’s approximation to the PDF of W);

· find $\hat{θ}$ , based on a real (not simulated) RIS of size n, and solve

$(\frac{\hat{θ}}{θ_{0}} - 1) \sqrt{n \ln n} = C_{1 - α / 2}$ (25)

for $θ_{0}$ to get the CI’s lower limit, and repeat with $C_{α / 2}$ in place of $C_{1 - α / 2}$ to get the upper limit.

Note that the expected length of the resulting CI is, to a sufficient approximation

$\frac{C_{1 - α / 2} - C_{α / 2}}{\sqrt{n \ln n}} \cdot E (\hat{θ})$ (26)

When $α = 0.05$ and $n = 30$ , this yields $0.293 \cdot θ_{0}$ (implying that we can be easily off the true value of $θ_{0}$ by 30%); increasing n to 30720, the expected CI length is reduced to an impressive $0.00598 \cdot θ_{0}$ .

We still need to explain how to get (practically exact) critical value $C_{β}$ for a

Figure 6. Critical values C_0.975 (top) and C_0.05 (bottom).

given sample size n, based on a mega-sample of a million $\hat{θ}$ estimates: all we need to do is to take the k^th smallest of these estimates, where k is the nearest integer to $β \cdot 10^{6}$ (something we have already done for several values of $β$ and $n = 30$ ). By an extensive simulation (discussed previously) we have extended these results up to $n = 30720$ ; the values of $C_{0.025}$ and $C_{0.975}$ are displayed in Figure 6. This time, the curves were fitted using (24) with an additional constant term.

An alternate (but more elaborate) way of constructing an MLE-based CI is to minimize the distance between two percentiles (denoted $C_{β}$ and $C_{1 - α + β}$ , where $0 < β < α$ ) while keeping the probability of the corresponding interval equal to $1 - α$ ; this is achieved (based on the same mega-sample) by usual minimization of this distance, after displaying it as a function of $β$ (this can be done graphically). In the case of $n = 30$ and $α = 5 %$ . this yields −1.855 (instead of −1.978) and 0.977 (instead of 1.078) when $β = 0.0333$ ; the distance of the corresponding critical values has thus been shortened to 2.832 (from the old 3.056, i.e. by about 8%), thus improving the CI’s predictive power.

The main disadvantage of using the technique of this section rests in our inability to find, without extensive simulation, an analytic and sufficiently accurate expression for the PDF of W (even computing the estimate itself may prove difficult for some). Yet, it needs to be acknowledged that MLE is more efficient than any other potential estimator of $θ_{0}$ , especially at large values of n (due to the extra $\frac{1}{\ln n}$ factor in its variance; most estimators have a variance decreasing with $\frac{1}{n}$ only). Therefore, in our subsequent search for a simpler, formula-based point estimator of $θ_{0}$ , we must expect some reduction in the estimator’s relative efficiency, and thus in the predictive power of the resulting CI.

The question now is: when abandoning MLE, what other potential candidates are there for a good (i.e. relatively efficient and also unbiased, at least asymptotically) estimator? Furthermore, we would like to be able find, analytically, the asymptotic form of its distribution, and have the exact distribution rapidly converge to this $n \to \infty$ limit.

5.2. Method of Moments

The next traditional choice of an estimator (called MME) uses the method of moments. It works by solving the $E (X) = \bar{X}$ equation for $θ_{0}$ , where $E (X)$ stands for the expected value of a single X, and $\bar{X}$ is the usual sample mean of n such values. Since, in our case, $E (X) = 0$ (thus, not a function of $θ_{0}$ ), the method needs to be modified by using a function of X instead of X itself; in our case, a convenient choice is $| X |$ , whose PDF, namely

$\frac{3}{2 θ_{0}} (1 - \frac{x^{2}}{θ_{0}^{2}}) when 0 < x < θ_{0}$ (27)

leads to $E (| X |) = \frac{3}{8} θ_{0}$ . The new (fully unbiased) estimator of $θ_{0}$ is thus

$\hat{θ} : = \frac{8}{3} \frac{\sum_{i = 1}^{n} | X_{i} |}{n}$ (28)

(we keep the original notation for the new estimator).

We can now (rather routinely) compute the mean, variance, skewness and excess kurtosis of this estimator to be $μ = θ_{0}$ , $V = \frac{19}{45} \cdot \frac{θ_{0}^{2}}{n}$ , $α_{3} = 14 \sqrt{\frac{5}{19^{3} n}}$ and $α_{4} = - \frac{2106}{2527 n}$ respectively. The Central Limit Theorem (CLT) tells us that the sampling distribution of $\hat{θ}$ is approximately Normal, having the above mean and variance. Unlike the previous case of MLE, this approximation is adequate even at relatively small values of n. Furthermore, we can substantially improve its accuracy by incorporating corrections indicated by (22); Figure 7 displays (using the z scale) the corresponding error of both approximations

Figure 7. Error of CLT and Edgeworth (dashed) PDFs when n = 30.

relative to (MC generated but practically exact) PDF of

$W : = (\frac{\hat{θ}}{θ_{0}} - 1) \sqrt{n}$ (29)

Note that the simulation was required only for this comparison; it is not a part of the technique itself.

The largest possible error of the CLT approximation at $n = 30$ (the worst-case scenario) is about 1% (adequate), while the Edgeworth approximation is distinctively more accurate (indicated by the area under the curve, not by its height). Note that the CTL’s error will decrease with $\frac{1}{\sqrt{n}}$ , while that of Edgeworth’s approximation does it with $\frac{1}{n^{3 / 2}}$ (substantially faster).

The advantage of these approximations is that both have a simple analytic form; this enables us to directly compute the critical values required for construction of CIs. Furthermore, these critical values (using the W scale) are, when using the CLT approximation, independent of n; thus, for example

$C_{0.025} = - 1.960 \cdot \sqrt{\frac{19}{25}} = - 1.274$ and similarly $C_{0.975} = 1.274$ . Using MC simulation and $n = 30$ , we have established the true confidence level of the corresponding CI to be 0.9508; the approximation is thus quite accurate even for such a small samples (better accuracy yet can be achieved via the Edgeworth approximation; something we leave for the reader to try).

Using the appropriate analog of (26), namely

$\frac{C_{1 - α / 2} - C_{α / 2}}{\sqrt{n}} \cdot E (\hat{θ})$ (30)

the expected length of a CI when $α = 0.05$ and $n = 30$ is $0.465 \cdot θ_{0}$ (59% longer than MLE-based CI); when n increases to 30,720, the length becomes 0.0145 (2.43 times longer, which makes the technique rather non-competitive at large n).

In summary: using (28) as a basis of CI construction is computationally quite simple, as its distribution is approximately Normal yet reasonably accurate; unfortunately, the predictive power of the resulting CI (measured by its length) is less than impressive.

5.3. Utilizing Order Statistic

In this section, we explore yet another estimator of $θ_{0}$ , namely the n^th-order statistic of a sample of n values of $| X |$ , i.e. $\hat{θ} : = \max_{1 \leq i \leq n} | X_{i} |$ ; note that its exact sampling distribution and the corresponding $n \to \infty$ limit are now both readily available in an analytic form. The idea of using this estimator rests on a hint we get from our MLE (which was always only slightly bigger than this maximum).

Transforming $\hat{θ}$ to

$W : = (1 - \frac{\max_{1 \leq i \leq n} | X_{i} |}{θ_{0}}) \cdot \sqrt{n}$ (31)

We can easily compute

$\Pr (W > w) = \Pr {(1 - \frac{| X |}{θ_{0}} > \frac{w}{\sqrt{n}})}^{n} = {(1 - \frac{3}{2 n} w^{2} + \frac{w^{3}}{2 n^{3 / 2}})}^{n} when 0 < w < \sqrt{n}$ (32)

based on

$\Pr (\frac{| X |}{θ_{0}} < 1 - \frac{w}{\sqrt{n}}) = \frac{3}{2} (1 - \frac{w}{\sqrt{n}} - \frac{{(1 - \frac{w}{\sqrt{n}})}^{3}}{3}) = 1 - \frac{3}{2 n} w^{2} + \frac{w^{3}}{2 n^{3 / 2}}$ (33)

which follows from (27), and the corresponding

$\Pr (\frac{| X |}{θ_{0}} < x) = \frac{3}{2} (t - \frac{x^{3}}{3}) when 0 < x < 1$ (34)

Note that, when n is sufficiently large, we can approximate (32) by its $n \to \infty$ limit, namely

$\Pr (W > w) ≃ \exp (- \frac{3}{2} w^{2}) when w > 0$ (35)

Both formulas (exact and approximate) indicate that $\hat{θ}$ is unbiased only asymptotically; the exact bias needs to be computed from (32); for sufficiently large n we get the asymptotic result of

$E (\hat{θ}) = (1 - \sqrt{\frac{π}{6 n}}) \cdot θ_{0}$ (36)

obtained from (35). Similarly, the asymptotic variance of $\hat{θ}$ is $\frac{4 - π}{6 n} \cdot θ_{0}^{2}$ .

To construct a CI for $θ_{0}$ , one needs to find $C_{α / 2}$ ( $C_{1 - α / 2}$ ) by solving $\Pr (W > w) = 1 - \frac{α}{2}$ ( $= \frac{α}{2}$ ) for w, and then solve $W = C_{α / 2}$ for $θ_{0}$ (getting the lower boundary) and $W = C_{1 α / 2}$ (upper boundary). In Figure 8, we plot the two critical values (using $α = 5 %$ ) as functions of n; note that there are only minute differences between these and their $n \to \infty$ limits, namely to $C_{0.025} = 0.1299$ and $C_{0.975} = 1.5682$ .

This implies that the asymptotic values can be used at practically any n; even when $n = 30$ , the true confidence level thus achieved is 94.6% instead of 95% (a tolerable error).

To assess the predictive power of the new technique: the expected length of a CI (when $α = 0.05$ and $n = 30$ ) is, based on (30), $0.305 \cdot θ_{0}$ (only 4% higher than when using MLE); increasing n to 30720 reduces it to $0.00829 \cdot θ_{0}$ (about 39% higher—still quite tolerable).

Furthermore, we can make any such CI shorter by optimizing its length while keeping the confidence level at 95%, as done in the previous section; to demonstrate

Figure 8. C_0.975 (top) and C_0.025 (bottom) of (31) distribution.

this, we use $n = 30720$ and the asymptotic formulas (practically exact at such large n). Skipping routine details, this yields $C_{0.0439} = 0.5375$ and $C_{0.9939} = 1.4435$ , resulting in a new length of $0.00787 \cdot θ_{0}$ (only 32% higher than MLE-based CI).

To summarize: with the last estimator, we have achieved a combination of simplicity, ease of computation and a fast convergence to asymptotic formulas, without much loss in the predictive power of the corresponding CI.

6. Conclusions and Future Research

We have shown that finding an accurate and efficient CI for a single parameter is, in a case of non-regular estimation, a surprisingly difficult task; as a convincing simple example, we have presented a case study of estimating a scaling parameter of a distribution closely related to Beta (2, 2). We have delineated several possibilities for dealing with this problem by using the MLE, MME, and the largest-order statistics as point estimators, and investigating their sampling distributions. We have discovered that this distribution converges to Normal for both MLE and MME, but that in the former case the convergence is too slow to be of any practical use (the CI construction needs to rely on CPU-intensive numerical approach). Using MME and $\max | X |$ resulted in simple, analytic formulas for each of their sampling distributions and their asymptotic form (reached relatively quickly as n increases). Unfortunately, the variance of MME is too large to make the resulting CI competitive, but using $\max | X |$ proved to be both simple and relatively efficient.

Empirically (and rather surprisingly) we have also discovered that, similarly to regular cases, the distribution of

$2 L (\hat{θ}) - 2 L (θ_{0})$ (37)

(where $L (θ)$ was defined in (2)) is approximately $χ_{1}^{2}$ . This is demonstrated in Figure 9 using $n = 10$ (at such small n, most approximations become visibly inaccurate; the practically perfect fit we see here is truly amazing). To construct a

Figure 9. Histogram of (37)-values and PDF of $χ_{1}^{2}$ .

95% CI for the value of $θ_{0}$ , all we have to do is to compute $\hat{θ}$ (based on a specific RIS), make (37) equal to $C_{0.95}$ of the $χ_{1}^{2}$ distribution (i.e. 3.8415), and solve for $θ_{0}$ (there will always be two solutions, providing the CI’s boundaries). This produces CI’s similar to those constructed in the last paragraph of the previous section, including the $1 / \sqrt{n}$ -proportional decrease of the CI’s length. Theoretical justification of these statements, and finding out whether they are true in general (after excluding MLEs involving max or min which have different, but easy-to-find, asymptotic distributions) will require further investigation.

We acknowledge that every case of non-regular estimation is unique and may require exploring other possibilities than those presented here; nevertheless, the results of our article provide guidelines for such an exploration.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	Edgeworth, F.Y. (1905) The Law of Error. Transactions of the Cambridge Philosophical Society, 20, 36-66, 113-141.
[2]	Vrbik, J. (2022) Fisher Transformation via Edgeworth Expansion. arXiv: 2208.05070v1.
[3]	Logan, B.F., Mallows, C.L., Rice, S.O. and Shepp, A.L. (1973) Limit Distributions of Self-Normalized Sums. The Annals of Probability, 1, 788-809. https://doi.org/10.1214/aop/1176996846
[4]	Spataru, A. (2014) Convergence and Precise Asymptotics for Series Involving Self-Normalized Sums. Journal of Theoretical Probability, 29, 267-276. https://doi.org/10.1007/s10959-014-0560-1sss
[5]	Gnedenko, B.V. and Kolmogorov, A.N. (1954) Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Cambridge.
[6]	https://reference.wolfram.com/language/ref/SmoothKernelDistribution.html

Journals Menu

Follow SCIRP

	+1 323-425-8868
	customer@scirp.org
	+86 18163351462(WhatsApp)
	1655362766

	Paper Publishing WeChat

Journals Menu

Home

About SCIRP

Service

Policies