On the Estimation of a Univariate Gaussian Distribution: A Comparative Approach

Estimation of the unknown mean, μ, and variance, σ², of a univariate Gaussian distribution N(μ, σ²) given a single study variable x is considered. We propose an approach that does not require initialization of the unknown sufficient distribution parameters. The approach is motivated by linearizing the Gaussian distribution through differential techniques and estimating μ and σ² as regression coefficients using the ordinary least squares method. Two simulated datasets on hereditary traits and morphometric analysis of housefly strains are used to evaluate the proposed method (PM), the maximum likelihood estimation (MLE), and the method of moments (MM). The methods are evaluated by re-estimating the required Gaussian parameters on both large and small samples. The root mean squared error (RMSE), mean error (ME), and the standard deviation (SD) are used to assess the accuracy of the PM and MLE; confidence intervals (CIs) are also constructed for the ME estimate. The PM compares well with both the MLE and MM approaches, as they all produce estimates whose errors have good asymptotic properties; small CIs are also observed for the ME using the PM and MLE. The PM can be used symbiotically with the MLE to provide initial approximations at the expectation maximization step.


Introduction
The Gaussian distribution is a continuous distribution characterized by the mean μ and variance σ². It is regarded as the most widely applied distribution across the science disciplines since it can be used to approximate several other distributions. We consider a single observation x obtained from a univariate Gaussian distribution with both the mean μ and the variance σ² unknown, that is, x ∼ N(μ, σ²), −∞ < μ < ∞. In this paper the problems of estimating the sufficient parameters of a normal distribution using iterative methods are discussed. We then propose an algorithm that mitigates the problems associated with the iterative techniques. A thorough discussion of the iterative techniques and their related algorithms can be obtained from [1]-[6]. The mean μ and the variance σ² are referred to as sufficient parameters in most of the statistics literature because they contain all the information about the probability density function, see Equation (1).

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{1}$$

An important problem in statistics is to obtain information about the mean, μ, and the variance, σ², of a given population. The estimation of these parameters is central in areas such as machine learning, pattern recognition, neural networks, signal processing, computer vision and feature extraction, see [6]-[11].
The rationale and motivation for the proposed approach are presented in Section 2. The methodological steps and the datasets simulated to validate the proposed approach are discussed in Section 3. Explicit estimation steps using the ordinary least squares method are presented in Section 4. Statistical analysis results on simulations are presented in Section 5. The error distribution analyses are presented in Section 6. Accuracy results for the proposed method (PM) and the maximum likelihood estimation (MLE) method are presented in Section 7. In Sections 8 and 9 we provide a thorough discussion of the results and some concluding remarks on the study findings.

Rationale and Motivation
Numerical methods for estimating the parameters of a Gaussian distribution function are well known, for example the bisection, Newton-Raphson, secant, false position and Gauss-Seidel methods, see [12]-[15]. Other methods for obtaining analytical solutions are the maximum likelihood estimation (MLE), maximum distance estimation, maximum spacing estimation and the moment-generating function method, see [16]-[18]. However, these approaches depend largely on guessed initial values. Such initial values may not guarantee convergence: the procedure could take a long time or even fail to converge if they are far from the optimal solution, so applying these methods requires considerable expertise, see [19]. The MLE is regarded as the standard approach to most nonlinear estimation problems, as it always converges to the required optimum given "good" initial approximations; however, it requires the maximization of the log-likelihood function [20]. Application of the MLE procedure may present a challenge if the necessary software is not available; it requires a mathematical background, since the user must transform the likelihood function into its natural logarithm, referred to as the log-likelihood in most of the statistical literature. Since the maximum of the function is usually required, the derivative of the parent function has to be obtained a priori and solved for the parameters being maximized. However, this can only be achieved by maximizing the log-likelihood function and not the parent function. Another difficulty is encountered at the initialization step. According to [21], as cited by [22]: "One question that plagues all hill-climbing procedures is the choice of the starting point. Unfortunately, there is no simple, universally good solution to this problem." We present a method for computing acceptable parameter values for the mean and variance that could be applied as initial guess values when the proposed approach is used symbiotically with the MLE.
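As a worked illustration of the log-likelihood step described above, the standard textbook form for n independent observations from N(μ, σ²) is sketched here. The sample notation x_1, …, x_n is introduced only for this illustration and is not taken from the paper's own equations.

```latex
\ell(\mu, \sigma^2)
  = -\frac{n}{2}\ln\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2,
\qquad
\frac{\partial \ell}{\partial \mu}
  = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu),
\qquad
\frac{\partial \ell}{\partial \sigma^2}
  = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2 .
```

For a single Gaussian, setting both derivatives to zero yields the sample mean and the 1/n sample variance. The starting-value difficulty quoted above arises when such score equations have to be solved iteratively, as in the mixture settings mentioned in the Conclusion.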

Methodology
We transform the Gaussian density function (1) into a new function that is linear with respect to some of the unknown parameters or their combinations. For linearization, we consider the derivatives of the parent function (1). The unknown regression parameters are then estimated using the ordinary least squares (OLS) method. The framework employed was first proposed by [19] and has been used in the estimation of exponential functions; see [23]. We propose a version of this framework and use it to estimate the Gaussian distribution parameters. The PM is compared with the traditional estimation procedures, the MLE and the MM, on three simulations of normal datasets of known mean and standard deviation. The first two datasets are concerned with the study of hereditary physical characteristics, see [24], in which the heights of both fathers and daughters were studied. The third dataset is concerned with the morphometric analysis of DDT-resistant and non-resistant housefly strains, in which housefly wing lengths are analyzed, see [25]. We estimate the known mean, μ, and standard deviation, σ, of the respective datasets using the three methods, that is, the PM, MLE and MM.
In the course of estimating the parameters using the PM, we anticipate a shift of the estimated parameters from their "true" values. The amount of this shift is what is commonly referred to as accuracy, and it is computed as the difference between the known values and the estimates from the underlying process [26]. The distribution of the errors from the evaluated approaches is an important aspect, as it indicates which assessment methods are to be employed, that is, standard, visual or non-parametric measures.

Transformation and Re-Characterization
Estimating the parameters of a Gaussian distribution is required in most data modelling tasks involving normally distributed observations. To the best of our knowledge from the statistical literature reviewed, the method presented in this section has not been considered before. The approach is to transform the original Gaussian function (1) by taking its first derivative and subsequently introducing new parameters, either linearly or as their combinations.
Re-arranging Equation (5), we observe from Equation (6) that the original function (1) is contained in both the first and second terms. Hence, we rewrite Equation (6) in terms of the original function, which leads to Equation (8). Introducing new parameters in Equation (8) to formulate a model that is linear in the new parameters, we obtain the simple linear model of Equation (9). There are well-recognised approaches for obtaining the parameters ϕ and τ, such as least squares, Bayesian techniques and maximum likelihood methods [27]-[29]. In this estimation problem we consider the least squares method, since Equation (9) represents a simple linear least squares model and satisfies at least one or two of the following assumptions:
1) Each of the independent variables (in this case f(x)) in the model is multiplied by an unidentified parameter.
2) The model contains at most one unidentified parameter that does not have an independent variable.
3) All the discrete terms are summed to yield the ultimate model value [30].
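As a sketch of how such a linearization can arise from the Gaussian density (1), the derivation below differentiates the density once and collects terms. The specific definitions of ϕ and τ used here are an assumption made for illustration and may differ in sign or scaling from the paper's own Equations (8) and (9).

```latex
f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x)
      = -\frac{1}{\sigma^2}\, x f(x) + \frac{\mu}{\sigma^2}\, f(x),
\qquad
\phi := -\frac{1}{\sigma^2}, \quad \tau := \frac{\mu}{\sigma^2}
\;\Longrightarrow\;
f'(x) = \phi\, x f(x) + \tau\, f(x),
\qquad
\sigma^2 = -\frac{1}{\phi}, \quad \mu = -\frac{\tau}{\phi}.
```

Under this assumed parameterization the model is linear in ϕ and τ, with x f(x) and f(x) acting as regressors, which is what makes ordinary least squares applicable.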

Estimation Criteria
Parameter estimation is an important aspect of most statistical modelling frameworks. The major goal of estimation is to obtain the numerical values of the regression coefficients associated with individual regressors or a combination of them [30]. For the proposed approach, the estimation proceeds as follows. Given a dataset, the linear model (9) provides an estimate of $f'(x_j)$ at each point $x_j$, and the difference between this estimate and $f'(x_j)$ is the error of the estimation. We estimate the error since an important part of estimation is assessing how much the computed value will vary due to noise in the dataset. When information concerning the deviations is not available, there is no basis on which the estimated value can be compared with the "true" or target value [30].
The sum of squares of the errors over all the data points is given by Equation (14).
In Equation (14), the variables $x_j$ and $f(x_j)$ are known; $f'(x_j)$ can be computed using numerical methods, see Davis (2001). In this case we apply Newton's difference quotient method [31], so that Equation (16) serves as the goal function for the ordinary least squares estimation of the parameters ϕ and τ. Available statistical software packages can be used to obtain the estimates of ϕ and τ. It is then possible to relate the parameters of model (1) to the estimated parameters of Equation (16), from which the estimates of the Gaussian distribution parameters are obtained.
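The estimation steps can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the linear model f'(x) = ϕ·x·f(x) + τ·f(x) with ϕ = −1/σ² and τ = μ/σ², assumes that density values f(x_j) are available on a grid of points x_j, and uses a forward difference quotient for f'(x_j). The function names `gaussian_pdf` and `estimate_gaussian_ols` are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian density of Equation (1)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_gaussian_ols(xs, fs):
    """Estimate (mu, sigma^2) from density values fs[j] = f(xs[j]).

    ASSUMPTION: the linear model is f'(x) = phi*x*f(x) + tau*f(x) with
    phi = -1/sigma^2 and tau = mu/sigma^2 (illustrative parameterization).
    """
    m = len(xs) - 1
    # Forward difference quotient as a numerical stand-in for f'(x_j).
    d = [(fs[j + 1] - fs[j]) / (xs[j + 1] - xs[j]) for j in range(m)]
    a = [xs[j] * fs[j] for j in range(m)]  # regressor multiplying phi
    b = [fs[j] for j in range(m)]          # regressor multiplying tau
    # Normal equations of the two-parameter least squares problem.
    saa = sum(p * p for p in a)
    sab = sum(p * q for p, q in zip(a, b))
    sbb = sum(q * q for q in b)
    sad = sum(p * r for p, r in zip(a, d))
    sbd = sum(q * r for q, r in zip(b, d))
    det = saa * sbb - sab * sab
    phi = (sad * sbb - sab * sbd) / det
    tau = (saa * sbd - sab * sad) / det
    # Map the regression coefficients back to the Gaussian parameters.
    return -tau / phi, -1.0 / phi


if __name__ == "__main__":
    xs = [57.0 + 0.01 * j for j in range(2101)]    # grid around the mean
    fs = [gaussian_pdf(x, 67.7, 2.8) for x in xs]  # noise-free density values
    mu_hat, var_hat = estimate_gaussian_ols(xs, fs)
    print(mu_hat, math.sqrt(var_hat))
```

No starting values are needed because the fit reduces to solving a 2×2 linear system. With real samples, f(x_j) would first have to be estimated (for instance from a histogram), and the difference quotient amplifies any noise in those values; the sketch does not address that step.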

Method Evaluation
In order to evaluate the performance of the proposed method (PM), we perform simulations of the father's and daughter's heights using Mathematica software [24] and compute their respective means and standard deviations, (67.7; 2.8). We then estimate the known means and standard deviations of the considered datasets using the PM, MLE and MM. The analysis is done on two sample sizes, n = 100 and n = 1000. This is to ascertain the performance of the PM on both small and large samples, see Tables 1-3.
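The two baseline estimators in this comparison can be sketched as below. This is an illustrative Python stand-in rather than the Mathematica simulation used in the study: the random seed, the use of `random.Random.gauss`, and the function names `mle_gaussian` and `mm_gaussian` are assumptions. For a single Gaussian, the maximum likelihood and method of moments estimates of the mean and variance coincide algebraically, which the sketch makes visible.

```python
import math
import random


def mle_gaussian(xs):
    """Closed-form maximum likelihood estimates (mean, sd) for one Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # 1/n divisor, not 1/(n-1)
    return mu, math.sqrt(var)


def mm_gaussian(xs):
    """Method of moments: match the first two raw sample moments."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return m1, math.sqrt(m2 - m1 * m1)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for n in (100, 1000):
        sample = [rng.gauss(67.7, 2.8) for _ in range(n)]
        print(n, mle_gaussian(sample), mm_gaussian(sample))
```

Running the loop for n = 100 and n = 1000 mirrors the small-sample and large-sample settings described above.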

Error Distribution Analysis
We are frequently faced with processing volumes of data whose generative process we are uncertain about, yet it is always necessary to understand the sampling theory and statistical inference before carrying out any parameter estimation in statistical modelling problems [30]. In this paper we perform an exploratory analysis of the error distribution generated by each of the three evaluated approaches when estimating the "true" or required parameters μ and σ.

Visualization of Normality
We aim to establish the distribution of the errors from the PM in comparison with those from the standard method, that is, the MLE. We would wish to use the simpler standard statistical techniques, such as the Pearson chi-square, Jarque-Bera and Kolmogorov-Smirnov tests, to test for normality in the errors, but such tests are usually overly sensitive in the case of large datasets. Visual methods have therefore been preferred, see Figures 1-6, and these have several advantages [32].

Histogram Plots
The error distribution can be observed with little effort from a histogram of the sampled errors, in which the error counts are plotted. Such a histogram presents an overview of the normality of the error distribution, see Figures 1-6.
For comparison with normality, normal distribution curves are superimposed on the histograms. The figures illustrate the distribution of errors, Δh, in inches for the father's height and the housefly wing length. All plots from the PM and MLE show an almost "perfect" match, as there is no heavy tailing. This could be attributed to the absence of outliers in the datasets and to the errors originating from normally distributed datasets.
Better diagnostic tools for checking deviations from a normal distribution are the so-called quantile-quantile (Q-Q) plots, see [26]. Quantiles of the empirical distribution function are plotted against the theoretical quantiles of the Gaussian distribution. To conclude that the underlying distribution is indeed Gaussian, the Q-Q plot should yield a straight line. In Figure 3 and Figure 4, which are based on large samples (n = 1000), there is no noticeable deviation from the straight line, which indicates that the error distribution is Gaussian, as expected. However, in Figure 5 and Figure 6 we notice a significant deviation from the assumption of normality; this could be because these errors are generated from a small sample, n = 100. This could call for further investigation of the performance of the PM on small samples, but the question remains why the standard MLE approach also produces a poor plot.
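The construction of a normal Q-Q plot described above can be sketched as follows. The plotting positions (i + 0.5)/n are one common convention and are an assumption here, since the text does not state which convention was used; the function name `qq_points` is hypothetical. The standard-library `statistics.NormalDist` supplies the Gaussian quantiles.

```python
from statistics import NormalDist


def qq_points(errors):
    """Return (theoretical quantile, ordered sample value) pairs.

    ASSUMPTION: plotting positions (i + 0.5) / n for i = 0, ..., n - 1.
    """
    n = len(errors)
    std_normal = NormalDist()  # standard Gaussian: mean 0, sd 1
    ordered = sorted(errors)
    return [
        (std_normal.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]
```

Plotting the second coordinate against the first gives the Q-Q plot. If the errors are Gaussian with mean m and standard deviation s, the points fall close to the straight line y = m + s·q.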

Accuracy Assessment
When the parent dataset is normally distributed and exhibits no outliers, as shown in Section 6, the accuracy measures in Table 4 can be adopted. The accuracy measures in the normal distribution framework are defined in Table 4, where Δh_i denotes the difference between the observed and estimated value, i is the sampled data point, and n is the sample size. Assuming that the generated errors follow a normal distribution, as established in Section 6 (see Figures 1-6), it is well known from the theory of errors that 68.3% of the data will fall within the interval μ ± σ, where μ is the systematic mean error and σ is the standard deviation [26]. When accuracy is to be measured at the 95% confidence level, the interval is μ ± 1.96·σ. In this work we have employed and compared the measures described in Table 4, since the underlying errors from all the estimation methods followed a normal distribution. Both the histograms and the Q-Q plots have justified the assumption of normality.
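The accuracy measures named in this section can be sketched in code as below. The formulas follow the usual textbook definitions of the root mean square error, mean error and standard deviation of the errors Δh_i; the n − 1 divisor in the standard deviation and the function name `accuracy_measures` are assumptions, not taken from Table 4.

```python
import math


def accuracy_measures(errors, z=1.96):
    """Compute RMSE, ME, SD and the interval ME +/- z*SD for errors dh_i.

    ASSUMPTION: SD uses the n - 1 divisor (sample standard deviation).
    """
    n = len(errors)
    me = sum(errors) / n                                # mean error
    rmse = math.sqrt(sum(e * e for e in errors) / n)    # root mean square error
    sd = math.sqrt(sum((e - me) ** 2 for e in errors) / (n - 1))
    return {
        "RMSE": rmse,
        "ME": me,
        "SD": sd,
        "interval": (me - z * sd, me + z * sd),         # 95% level for z = 1.96
    }
```

Passing z = 1.0 instead gives the μ ± σ interval that covers about 68.3% of normally distributed errors.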

Accuracy Results
Results generated by the standard measures of Table 4 are presented. We note that application of the standard measures implies that the generated errors follow a normal distribution, as established in Section 6.
Tables 5-8 show the accuracy measures considered to evaluate the performance of the PM and the MLE on two datasets of different sizes, that is, n = 1000 for the father's height and n = 100 for the housefly wing lengths. The PM produces smaller standard deviations than the MLE on the small sample. For the large samples, the methods produce the same standard deviation, which could be interpreted as equal performance of the methods on large samples, though this cannot be generalised without further research.

Results and Discussion
The PM has been compared with some of the methods currently in use, that is, the MM and MLE. These were preferred due to their computational appeal and availability in most statistical software packages. Secondly, the MLE method is preferred and widely applied due to its good asymptotic properties. Three standard datasets from [24] and [25] have been used. However, in further tests only two datasets were considered, that is, the height of the father and the housefly wing lengths; this was to reduce the volume of work to be presented. Section 5 contains the computation results for the PM, MM and MLE. Tables 1-3 show the parameter estimates obtained from the methods. It is observed that all the approaches give results comparable with the "true" or required values of the parameters given in the captions of the respective tables.

Table 4. Measurement of accuracy for statistical methods presenting normally distributed errors.

Measure Formulae
Root mean square error
In order to use the standard techniques employed for accuracy measurement, the errors have been tested for normality, see Section 6. Statistical visualization techniques were preferred to other statistical tests, which are said to be sensitive in the presence of outliers and large datasets [26]. Figure 1 and Figure 2 illustrate the histograms of the errors and clearly show a normal distribution, since most of the information contained in the errors lies under the superimposed normal curve. The Q-Q plots in Figures 3-6 have also been used to test for normality of the generated errors. It is observed that almost straight lines are produced in all the cases. This implies that the generated errors are indeed normally distributed.

Conclusion
This research laid out an easy approach to computing the parameters of a univariate normal distribution, which is an important distribution in applied statistics and in most of the science disciplines. It serves as a platform or benchmark for studying more complex distributions, such as mixtures of two or more Gaussians, mixtures of exponentials and other continuous distributions, which are very useful in pattern recognition, machine learning and unsupervised learning. The simplicity of the approach saves computation time and guarantees convergence to the required values. This is not usually the case with the conventional analytical and numerical methods, as these may fail or take a long time to converge depending on the quality of the initial approximations.
or true parameters. Another dataset on housefly wing lengths [25] is also simulated and its mean and standard deviation computed, that is,

Figure 3. Normal Q-Q plot for the error (∆h) distribution from MLE on the father's height, n = 1000.

Figure 4. Normal Q-Q plot for the error (∆h) distribution from PM on the father's height, n = 1000.

Figure 5. Normal Q-Q plot for the error (∆h) distribution from MLE on the housefly wing lengths, n = 100.

Figure 6. Normal Q-Q plot for the error (∆h) distribution from PM on the housefly wing lengths, n = 100.

Table 1. Height of the father, required parameters,

Table 5. Measure of accuracy for the MLE approach; father's height, n = 1000.

Table 6. Measure of accuracy for the PM approach; father's height, n = 1000.

Table 7. Measure of accuracy for the MLE approach; housefly wing lengths, n = 100.

Table 8. Measure of accuracy for the PM approach; housefly wing lengths, n = 100.