Sample size is very important in statistical research because the sample should be neither too small nor too large. Given a significance level α, the sample size is calculated from the z-value and a pre-defined error. This error is usually set from a previous experiment or another study, or it is chosen subjectively by a specialist, which may lead to an incorrect estimate. Therefore, this research proposes an objective method to estimate the sample size without pre-defining the error. Given an available sample X = {X_{1}, X_{2}, ..., X_{n}}, the error is calculated via an iterative process in which sample X is re-sampled many times. Moreover, once the sample size has been estimated, it can be used to collect a new sample, which in turn is used to estimate a new sample size, and so on.

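Given a sample of size n, X = {X_{1}, X_{2}, ..., X_{n}}, drawn from a normal distribution with mean μ and known variance σ^{2}, the sample mean

$$\bar{X} = \frac{1}{n}\sum_{j=1}^{n} X_j$$

is also normally distributed with mean μ and variance σ^{2}/n. Given a confidence level of 100(1 − α)%, the confidence interval of μ is:

$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$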

where Z_{α/2}, the z-value at significance level α, is the upper 100α/2 percentage point of the standard normal distribution. Let E be the absolute deviation between the sample mean $\bar{X}$ and the population mean μ:

$$E = \left|\bar{X} - \mu\right|$$

The value E is also called the estimated error. With confidence 100(1 − α)%, it is less than or equal to Z_{α/2}σ/√n:

$$E \le Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

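The problem is to choose the sample size n so that the deviation E does not exceed a pre-defined error ε. Since E ≤ Z_{α/2}σ/√n at confidence level 100(1 − α)%, this holds whenever

$$Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \varepsilon$$

(Readers can refer to [...].)

Solving for n, the formula for the choice of sample size is rewritten as:

$$n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{\varepsilon^{2}}$$

The z-value Z_{α/2} is fully determined by α, so what we need to do is to calculate the variance σ^{2} and the error ε.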
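To make the classical rule concrete, here is a minimal sketch assuming σ is known and the error is fixed in advance, which is exactly the situation the proposed method tries to avoid. The helper names `z_value`, `error_bound` and `classical_sample_size` are hypothetical; Z_{α/2} comes from `statistics.NormalDist().inv_cdf`, and n is rounded up to the next integer.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(alpha: float) -> float:
    """Upper 100*alpha/2 percentage point of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def error_bound(sigma: float, n: int, alpha: float = 0.05) -> float:
    """Bound on E = |X_bar - mu| at confidence 100(1 - alpha)%: z * sigma / sqrt(n)."""
    return z_value(alpha) * sigma / sqrt(n)


def classical_sample_size(sigma: float, error: float, alpha: float = 0.05) -> int:
    """Smallest integer n whose bound z * sigma / sqrt(n) does not exceed `error`."""
    return ceil((z_value(alpha) * sigma / error) ** 2)


n = classical_sample_size(sigma=10.0, error=2.0)
print(n, error_bound(10.0, n) <= 2.0)  # 97 True
```

The example values σ = 10 and error 2 are illustrative only; halving the error roughly quadruples n, which is why a subjectively chosen error has such a strong effect on the result.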

is reduced as below:

where

Fixing variance σ(i)^{2} and mean μ(i), we have:

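Suppose there is an available sample X = {X_{1}, X_{2}, ..., X_{n}}. The sample X is re-sampled m times; at the i^{th} iteration (i = 1, 2, ..., m), a new sample Y_{i} = {Y_{i1}, Y_{i2}, ..., Y_{in}} of size n is drawn from X. This is a form of bootstrap sampling with replacement.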

Note that the values Y_{ij} are drawn randomly from X with replacement.
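The re-sampling step can be sketched with the standard library's `random.choices`, which draws with replacement. This assumes each re-sample Y_{i} has the same size n as X, as in the worked example; the function name `bootstrap_means` and its `seed` parameter are hypothetical, added only to make runs reproducible.

```python
import random
from statistics import mean


def bootstrap_means(sample, m, seed=None):
    """Re-sample `sample` m times with replacement and return the means M(1..m).

    Each re-sample Y_i has the same size n as the original sample.
    """
    rng = random.Random(seed)  # seeded generator for reproducible runs
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(m)]


X = [8.05, 9.60, 2.98, -20.26, -6.52, -10.85, 8.14, 26.48, 10.57, 2.26]
M = bootstrap_means(X, m=10, seed=1)
print(len(M), min(X) <= min(M) and max(M) <= max(X))  # 10 True
```

Because the draws are random, a different seed gives different M(i) values from those in the paper's table; only the procedure is illustrated.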

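Let M(i) be the sample mean of the i^{th} re-sampled sample Y_{i1}, Y_{i2}, ..., Y_{in}:

$$M(i) = \frac{1}{n}\sum_{j=1}^{n} Y_{ij}$$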

where,

We assume that the sample mean M(i) approximates the sample mean

Note that

Summing over all iterations

Dividing both sides of the formula above by

Let

It is easy to see that Δ^{2} is the sample variance of the set of sample means M(i).
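A direct sketch of Δ^{2} as the sample variance of the M(i) follows. It assumes the conventional divisor m − 1 for a sample variance, since the exact normalization in the derivation above is not recoverable in this copy; `delta_squared` is a hypothetical helper, and `statistics.variance` computes the same quantity.

```python
from statistics import mean, variance


def delta_squared(means):
    """Δ²: sample variance of the bootstrap means M(1), ..., M(m), divisor m - 1."""
    m = len(means)
    if m < 2:
        raise ValueError("need at least two sample means")
    m_bar = mean(means)  # average of the M(i)
    return sum((mi - m_bar) ** 2 for mi in means) / (m - 1)


means = [1.0, 2.0, 3.0, 4.0]
print(delta_squared(means))  # 1.6666666666666667
print(abs(delta_squared(means) - variance(means)) < 1e-12)  # True
```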

Therefore, the formula for calculating the variable h with fixed variance σ^{2} is:

Because the theoretical variance σ^{2} is unknown, it is approximated by the sample variance s^{2} of sample X:

$$s^{2} = \frac{1}{n-1}\sum_{j=1}^{n}\left(X_j - \bar{X}\right)^{2}$$

where

Substituting s^{2} into the formula for calculating the variable h, we have:

Finally, the sample size n is calculated by the following formula:

The following example illustrates the proposed formula for calculating the sample size without a pre-defined error. Given the 10-element sample X = {X_{1} = 8.05, X_{2} = 9.60, X_{3} = 2.98, X_{4} = −20.26, X_{5} = −6.52, X_{6} = −10.85, X_{7} = 8.14, X_{8} = 26.48, X_{9} = 10.57, X_{10} = 2.26}, we estimate the optimal size of the next sample based on m = 10 re-sampling iterations; the re-sampled values Y_{ij} and their means M(i) are listed in the table below.

The sample variance Δ^{2} of the sample means M(i) is:

The mean

The sample variance s^{2} of sample X is:

Given the confidence level 95% (α = 0.05, so Z_{α/2} ≈ 1.96), the optimal sample size is calculated as follows:

According to the results of many experiments, if the original sample (previous sample

i | Y_{i1} | Y_{i2} | Y_{i3} | Y_{i4} | Y_{i5} | Y_{i6} | Y_{i7} | Y_{i8} | Y_{i9} | Y_{i10} | M(i)
---|---|---|---|---|---|---|---|---|---|---|---
1 | 2.26 | 10.57 | 2.26 | 2.98 | 26.48 | 10.57 | 2.98 | −20.26 | 26.48 | −10.85 | 5.35
2 | 10.57 | −20.26 | 26.48 | 8.05 | 2.98 | 26.48 | −20.26 | 9.60 | 2.26 | 8.05 | 5.40
3 | −10.85 | −20.26 | 10.57 | 26.48 | −10.85 | 2.26 | −10.85 | −20.26 | −10.85 | 9.60 | −3.50
4 | −6.52 | 2.98 | 9.60 | −6.52 | 2.98 | 26.48 | 9.60 | −20.26 | 10.57 | 2.98 | 3.19
5 | 10.57 | −20.26 | 2.26 | 2.98 | −6.52 | 2.98 | −6.52 | −6.52 | 8.14 | −20.26 | −3.31
6 | 10.57 | −6.52 | 26.48 | 2.98 | 2.26 | 8.05 | 9.60 | 8.14 | −6.52 | 8.05 | 6.31
7 | 9.60 | −6.52 | −10.85 | 9.60 | −20.26 | −20.26 | 2.98 | 8.14 | 2.26 | 8.05 | −1.73
8 | 9.60 | 26.48 | 2.98 | −20.26 | 26.48 | 8.14 | 8.14 | 8.05 | 10.57 | −6.52 | 7.37
9 | 8.05 | −10.85 | 8.14 | 8.05 | −20.26 | 10.57 | 9.60 | 10.57 | 2.98 | 26.48 | 5.33
10 | 8.14 | 10.57 | 2.98 | 26.48 | −20.26 | 8.14 | 8.05 | −20.26 | 2.26 | −6.52 | 1.96
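As a check on the worked example, the sketch below recomputes each M(i) from the table rows and evaluates X̄, s^{2}, Δ^{2} and the quantity Z_{α/2}^{2}s^{2}/Δ^{2}. Assumptions, stated loudly: both sample variances use the divisors n − 1 and m − 1, and the m-dependent ratio that the discussion says appears in the final formula is not recoverable in this copy, so it is not applied. The printed `Z2*s2/delta2` is therefore only the s^{2}/Δ^{2} part of the formula scaled by Z^{2}, not necessarily the paper's reported sample size.

```python
from statistics import NormalDist, mean, variance

# Original 10-element sample X from the example
X = [8.05, 9.60, 2.98, -20.26, -6.52, -10.85, 8.14, 26.48, 10.57, 2.26]

# Re-sampled rows Y_i (i = 1..10) copied from the table above
Y = [
    [2.26, 10.57, 2.26, 2.98, 26.48, 10.57, 2.98, -20.26, 26.48, -10.85],
    [10.57, -20.26, 26.48, 8.05, 2.98, 26.48, -20.26, 9.60, 2.26, 8.05],
    [-10.85, -20.26, 10.57, 26.48, -10.85, 2.26, -10.85, -20.26, -10.85, 9.60],
    [-6.52, 2.98, 9.60, -6.52, 2.98, 26.48, 9.60, -20.26, 10.57, 2.98],
    [10.57, -20.26, 2.26, 2.98, -6.52, 2.98, -6.52, -6.52, 8.14, -20.26],
    [10.57, -6.52, 26.48, 2.98, 2.26, 8.05, 9.60, 8.14, -6.52, 8.05],
    [9.60, -6.52, -10.85, 9.60, -20.26, -20.26, 2.98, 8.14, 2.26, 8.05],
    [9.60, 26.48, 2.98, -20.26, 26.48, 8.14, 8.14, 8.05, 10.57, -6.52],
    [8.05, -10.85, 8.14, 8.05, -20.26, 10.57, 9.60, 10.57, 2.98, 26.48],
    [8.14, 10.57, 2.98, 26.48, -20.26, 8.14, 8.05, -20.26, 2.26, -6.52],
]

M = [mean(row) for row in Y]            # M(i): mean of each re-sampled row
delta_sq = variance(M)                   # Δ²: sample variance of the M(i), divisor m - 1
x_bar = mean(X)                          # sample mean of X
s_sq = variance(X)                       # s²: sample variance of X, divisor n - 1
z = NormalDist().inv_cdf(1 - 0.05 / 2)   # Z_{α/2} for α = 0.05
core = z ** 2 * s_sq / delta_sq          # Z² s² / Δ², without the m-dependent ratio

print(f"x_bar={x_bar:.3f} s2={s_sq:.2f} delta2={delta_sq:.2f} Z2*s2/delta2={core:.2f}")
# prints: x_bar=3.045 s2=169.79 delta2=16.77 Z2*s2/delta2=38.88
```

The recomputed M(i) agree with the table's last column to within rounding, which confirms that each M(i) is the plain mean of its row.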

I invented this method while discussing the choice of sample size with the co-author, Dr. Hang Ho. At that time, I compared the idea of this method to the “hen and egg” problem: regardless of whether the hen or the egg came first, you feed the hen to lay a new egg and incubate that egg to hatch a new hen. Likewise, an available random sample is used to estimate the sample size, and that sample size is applied to collect a new random sample; after that, a new sample size is estimated from the new random sample, and so on. Now, we analyze the formula for estimating the sample size:

The variance s^{2} in the numerator expresses the inherent variation of the data, and the value Δ^{2} in the denominator specifies the variation of the disturbed data (the data are disturbed many times by re-sampling). That is, Δ^{2} specifies the variation of change (or the variation of variation). The smaller Δ^{2} is, the more precise the variance s^{2} is, and so the sample size is driven mainly by s^{2}. In other words, a small Δ^{2} increases the sample size. The ratio

approaches 1 as m approaches +∞, so the larger the number of iterations, the more precise the sample size. If m is small, this ratio tends to increase the sample size, but a balance is established because Δ^{2} also increases when m is small, and, as noted above, a large Δ^{2} decreases the sample size. But why does a small m increase Δ^{2}, and a large m decrease it? The number of iterations m determines how much the data are disturbed. The larger m is, the more the data are disturbed, and so the more easily the data revert to equilibrium, which decreases Δ^{2}. In other words, a small m increases Δ^{2}.

Loc Nguyen, Hang Ho (2015). A Proposed Method for Choice of Sample Size without Pre-Defining Error. Journal of Data Analysis and Information Processing, 03, 163-167. doi: 10.4236/jdaip.2015.34016