
Draxler and Zessin [1] derived the power function for a class of conditional tests of assumptions of a psychometric model known as the Rasch model and suggested an MCMC approach developed by Verhelst [2] for the numerical approximation of the power of the tests. In this contribution, the precision of the Verhelst approach is investigated and compared with an exact sampling procedure proposed by Miller and Harrison [3], for which the discrete probability distribution to be sampled from is exactly known. Results show no substantial differences between the two numerical procedures and quite accurate power computations. Regarding computing time, however, the Verhelst approach must be considered much more efficient.


Verhelst’s MCMC technique may be considered as the most promising in terms of handling practically realistic cases in psychometric research (regarding sample sizes and item numbers) and computing times. On the basis of the stationary distribution of the Markov chain the conditional probability distributions of interest can be computed to obtain size, p value, and power of the tests. The stationary distribution of the chain can be arbitrarily well approximated. Unlike the MCMC technique, the exact sampling approach is based on an analytical solution of a combinatorial problem which arises as a consequence of the conditioning involved in the procedure. This solution enables the exact determination of the conditional probability distributions of interest but, nonetheless, computing them in practically relevant cases in psychometrics is still too time-consuming so that one still relies on random sampling from the known exact distributions.

This paper essentially deals with two questions. The first generally concerns the precision of power computations of conditional tests as introduced by Draxler and Zessin [1]. The second concerns potential differences between the exact sampling approach and the MCMC approach.

Consider a typical psychometric problem in which a sample of n persons responds to k items. Let Y_ij ∈ {0, 1} denote the binary response of person i = 1, …, n to item j = 1, …, k, and let x_i ∈ {0, 1} be a fixed covariate, i.e. a known characteristic of the persons such as gender. The covariate may also be treated as a random variable. Examples are quoted by Draxler and Zessin [1]. The model of interest is given by

P(Y_ij = y_ij | x_i) ∝ exp[ y_ij (τ_i + β_j + x_i δ_j) ], (1)

with τ_i ∈ ℝ as a person parameter, β_j ∈ ℝ as an item parameter, and δ_j ∈ ℝ as the conditional effect of item j given the covariate. Assuming local independence of the Ys, the joint distribution of all binary responses is obtained as

P(Y = y | x) = ∏_{i=1}^{n} ∏_{j=1}^{k} P(Y_ij = y_ij | x_i), (2)

with Y as an n × k matrix-valued random variable containing the Ys arranged in n rows and k columns, and with x′ = (x_1, …, x_n). Factorizing this product immediately shows that the statistics R_i = Σ_j Y_ij, S_j = Σ_i Y_ij, and T_j = Σ_i x_i Y_ij are sufficient for τ_i, β_j, and δ_j, respectively. Note that the former two sufficient statistics are the row and column sums of the response matrix Y. Suppose the interest lies in making inferences about the δs, where the τs and βs are treated as nuisance parameters. One way of eliminating the influence of the nuisance parameters is conditioning on the observed values of their sufficient statistics. Proceeding in this way one obtains the conditional distribution

P(T_1 = t_1, …, T_{k−1} = t_{k−1} | R = r, S = s, x) = [ Σ_𝒯 exp(Σ_{j=1}^{k} t_j δ_j) ] / [ Σ_Ω exp(Σ_{j=1}^{k} t_j δ_j) ], (3)

with R′ = (R_1, …, R_n) and S′ = (S_1, …, S_k). For identifiability let δ_k = 0. Note that all information needed for making inferences about the δs is provided by the T statistics because of their sufficiency property. Hence, the original observations, the Ys, can be represented in condensed form: it suffices to consider the joint distribution of the Ts as a function of the Ys. Note also that at least one of the Ts is not free conditional on R = r, S = s. The denominator on the right side of (3) is a normalizing constant requiring a summation over the set Ω, the set of all possible n × k binary matrices satisfying the condition R = r, S = s. In other words, this is the set of all matrices with given, fixed row and column sums. The subset 𝒯 ⊆ Ω contains those n × k matrices additionally satisfying T_1 = t_1, …, T_k = t_k.
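The quantities just defined are easily illustrated by simulation. The following sketch draws responses from model (1) and computes the sufficient statistics R, S, and T; all sizes and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example sizes and parameter values (not from the paper).
n, k = 6, 4
tau = rng.normal(size=n)                  # person parameters tau_i
beta = rng.normal(size=k)                 # item parameters beta_j
delta = np.array([0.5, -0.5, 0.0, 0.0])   # conditional item effects, delta_k = 0
x = rng.integers(0, 2, size=n)            # binary covariate, e.g. gender

# Model (1): normalizing exp[y * eta] over y in {0, 1} yields a logistic
# response probability P(Y_ij = 1 | x_i) = exp(eta_ij) / (1 + exp(eta_ij)).
eta = tau[:, None] + beta[None, :] + x[:, None] * delta[None, :]
p = 1.0 / (1.0 + np.exp(-eta))
y = (rng.random((n, k)) < p).astype(int)

# Sufficient statistics: row sums R_i, column sums S_j, and T_j = sum_i x_i Y_ij.
R = y.sum(axis=1)
S = y.sum(axis=0)
T = (x[:, None] * y).sum(axis=0)
print(R, S, T)
```

As the code makes explicit, T_j can never exceed S_j, which is one way of seeing that the Ts are not all free given the margins.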

Suppose the interest lies in testing the composite hypothesis δ_1 = … = δ_{k−1} = 0 against the alternative (δ_1, …, δ_{k−1})′ = c_{k−1}, where c_{k−1} is any (k − 1)-dimensional column vector of constants other than the zero vector, i.e. at least one δ is different from 0. Note that both hypotheses would be termed simple if the δs were the only parameters involved in the problem. The restriction on the parameter space of the free δs given by the hypothesis to be tested yields the Rasch model as a special case, which assumes the Ys independent of the covariate. In other words, the hypothesis to be tested is equivalent to the well-known scenario of testing the equality of the item parameters of the Rasch model between two groups of persons. In the psychometric literature, such an analysis is known as testing the invariance of the item parameters or, more generally, as investigating differential item functioning (DIF). Moreover, if the covariate vector x divides the sample of persons according to their scores, i.e. the row sums of Y, yielding one group of persons with low scores and another with high scores, the hypothesis δ_1 = … = δ_{k−1} = 0 will be equivalent to the assumption of equal item discriminations. This is a basic assumption of the Rasch model which has to be tested in almost every application. Thus, the conditional procedure discussed here has practical potential.

A most powerful test and its power function are obtained as follows. Let α denote the probability of the error of the first kind and C the critical region of the test. Consider the (k − 1)-dimensional sufficient statistic T_1, …, T_{k−1} for δ_1, …, δ_{k−1} to serve as the test statistic. Denote by P_0 the conditional distribution given by (3) evaluated at δ_1 = … = δ_{k−1} = 0 (the hypothesis to be tested) and by P_1 the respective distribution evaluated at (δ_1, …, δ_{k−1})′ = c_{k−1} (the alternative). According to the fundamental lemma of Neyman and Pearson, the most powerful test is obtained by choosing C as the set of points with the largest values of the likelihood ratio P_1/P_0. Eventually, the power function of a critical region C chosen this way is obtained by Σ_C P_1. Note that Fisher's well-known exact test is obtained as a special case by setting k = 2 and R = 1_n = (R_1 = 1, …, R_n = 1)′. In this case, (3) becomes the one-dimensional non-central hypergeometric distribution and, under the hypothesis to be tested, the (central) hypergeometric distribution.
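For margins small enough to enumerate Ω by brute force, the whole procedure can be carried out exactly. The sketch below uses hypothetical toy margins (n = 4 persons, k = 2 items) to compute distribution (3) under the hypothesis and under an illustrative alternative, build the Neyman-Pearson critical region, and evaluate the power; it is only an illustration of the logic, not a practical algorithm.

```python
import itertools
import numpy as np

# Toy setting (hypothetical numbers, small enough for brute force).
n, k = 4, 2
r = (1, 1, 1, 1)          # row sums R = r
s = (2, 2)                # column sums S = s
x = np.array([1, 1, 0, 0])  # binary covariate
alpha = 0.25
c = np.array([1.5, 0.0])  # alternative (delta_1, delta_2), delta_k = 0

# Enumerate Omega: all binary n x k matrices with margins (r, s).
omega = []
for bits in itertools.product([0, 1], repeat=n * k):
    y = np.array(bits).reshape(n, k)
    if tuple(y.sum(axis=1)) == r and tuple(y.sum(axis=0)) == s:
        omega.append(y)

def conditional_dist(delta):
    """Distribution (3) of T = (sum_i x_i Y_i1, ..., sum_i x_i Y_ik) over Omega."""
    dist = {}
    for y in omega:
        t = tuple(int(v) for v in (x[:, None] * y).sum(axis=0))
        dist[t] = dist.get(t, 0.0) + np.exp(np.dot(t, delta))
    z = sum(dist.values())
    return {t: w / z for t, w in dist.items()}

P0 = conditional_dist(np.zeros(k))  # under the hypothesis delta = 0
P1 = conditional_dist(c)            # under the alternative

# Neyman-Pearson: fill the critical region with support points of decreasing
# likelihood ratio P1/P0 as long as the size stays below alpha.
C, size = [], 0.0
for t in sorted(P0, key=lambda t: -P1[t] / P0[t]):
    if size + P0[t] > alpha + 1e-12:
        break
    C.append(t)
    size += P0[t]
power = sum(P1[t] for t in C)
print("critical region:", C, "size:", round(size, 3), "power:", round(power, 3))
```

Because the test is non-randomized and the distribution is discrete, the attained size is at most α rather than exactly α, which is why the loop stops before exceeding it.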

For further conditional tests and their power functions as well as an application to real-world data from educational research one is referred to Draxler and Zessin [1].

To compute the conditional distribution given by (3) one obviously has to determine the cardinalities of the two sets 𝒯 and Ω. Counting the total number of matrices in Ω is not an easy task. Miller and Harrison [3] derived an algorithm that counts such binary matrices with fixed row and column sums exactly.
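The flavor of exact counting can be conveyed by a small memoized recursion that fills the matrix row by row and caches on the residual column sums. This is only a brute-force sketch for tiny margins, not the Miller-Harrison algorithm, which scales far better.

```python
from functools import lru_cache
from itertools import combinations

def count_binary_matrices(row_sums, col_sums):
    """Count 0-1 matrices with the given margins by filling rows one at a
    time and memoizing on the residual column sums. Illustrative only; the
    Miller-Harrison counting algorithm is far more efficient."""
    k = len(col_sums)
    if sum(row_sums) != sum(col_sums):
        return 0

    @lru_cache(maxsize=None)
    def rec(i, residual):
        if i == len(row_sums):
            return 1 if all(c == 0 for c in residual) else 0
        total = 0
        # Choose which columns receive a 1 in row i.
        cols = [j for j in range(k) if residual[j] > 0]
        for pick in combinations(cols, row_sums[i]):
            nxt = list(residual)
            for j in pick:
                nxt[j] -= 1
            total += rec(i + 1, tuple(nxt))
        return total

    return rec(0, tuple(col_sums))

print(count_binary_matrices((1, 1, 2), (2, 2)))  # → 2
```

For instance, with row sums (1, 1, 1) and column sums (1, 1, 1) the count is 6, the number of 3 × 3 permutation matrices.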

Miller and Harrison [3] also provide an exact sampling procedure based on this counting algorithm, so that matrices can be drawn from the exactly known discrete distribution.

A first natural question continuing, supplementing, and enhancing the work of Draxler and Zessin [1] concerns the precision and stability of the power computations.

A second question focuses on potential differences with respect to power computations between the exact sampling [3] and the MCMC [2] approaches.

The chosen column sums for each person number condition are illustrated in the following tables.

Item scores (column sums) by person number:

| Item | 10 | 30 | 60 | 90 | 120 | 150 |
|---|---|---|---|---|---|---|
| 1 | 8 | 23 | 60 | 73 | 95 | 117 |
| 2 | 6 | 17 | 28 | 51 | 69 | 89 |
| 3 | 3 | 12 | 16 | 34 | 41 | 53 |
| 4 | 2 | 7 | 12 | 15 | 19 | 24 |

Person score frequencies by person number:

| Person score | 10 | 30 | 60 | 90 | 120 | 150 |
|---|---|---|---|---|---|---|
| 1 | 3 | 8 | 17 | 26 | 38 | 46 |
| 2 | 5 | 15 | 30 | 45 | 60 | 75 |
| 3 | 2 | 7 | 13 | 19 | 22 | 29 |

As far as a comparison of the two sampling approaches is concerned, one can observe only the trivial fact of increasing power with increasing absolute value of the DIF parameter, regardless of the sampling technique used. Moreover, an absolute value of roughly at least 0.5 or 0.6 of the DIF parameter may be considered meaningful in most practical contexts in psychometric research. For a deeper discussion of the practical meaning of a deviation from the hypothesis to be tested, in a broader context of power and sample size issues, one is referred to Draxler.

The observed results with respect to question 1 are summarized as follows.

The diagram on the right side concerns the case assuming a difficult item (low item score) affected by DIF. As can be seen, the effect of the DIF parameter on the observed power depends on which item is assumed to be affected by DIF. These results are expected from theory. To explain, consider again the diagram on the right side.

| Matrix | Min | 2.5% Quan. | 25% Quan. | Median | Mean | 75% Quan. | 97.5% Quan. | Max | SD |
|---|---|---|---|---|---|---|---|---|---|
| 10 × 25 | 0.12 | 0.13 | 0.14 | 0.14 | 0.14 | 0.15 | 0.16 | 0.17 | 0.007 |
| 30 × 25 | 0.37 | 0.39 | 0.40 | 0.41 | 0.41 | 0.42 | 0.43 | 0.47 | 0.011 |
| 90 × 25 | 0.28 | 0.31 | 0.47 | 0.49 | 0.47 | 0.50 | 0.51 | 0.53 | 0.051 |
| 150 × 25 | 0.38 | 0.42 | 0.44 | 0.46 | 0.46 | 0.48 | 0.51 | 0.59 | 0.024 |
| 250 × 25 | 0.53 | 0.56 | 0.60 | 0.62 | 0.62 | 0.64 | 0.68 | 0.76 | 0.031 |
| 350 × 25 | 0.61 | 0.68 | 0.73 | 0.76 | 0.76 | 0.78 | 0.82 | 0.97 | 0.035 |
| 500 × 25 | 0.75 | 0.80 | 0.84 | 0.86 | 0.86 | 0.88 | 0.92 | 0.97 | 0.031 |

The remaining scenarios concern the group of persons with high score. In these cases, the effect on the power computations is exactly the other way round, i.e. positive values of the DIF parameter yield smaller power on average, even though this is not as obvious as in the right-side diagram.

Generally, as seen in all diagrams, the observed standard deviations of the power values are quite small. They are higher for those scenarios yielding a mean power around 0.5, which is also expected from theory. Thus, the computations may be considered quite stable. Further analyses have been carried out decreasing the number of samples drawn from Ω to 3000 and even to 1000 without considerably increasing the standard deviations of the observed power values.
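This stability is consistent with the binomial standard error of a Monte Carlo proportion: each sampled matrix either falls in the critical region or not, so the standard error of an estimated power p based on M samples is sqrt(p(1 − p)/M), largest at p = 0.5. The snippet below evaluates this; the value 5000 is an illustrative sample count, not one stated in the paper.

```python
import math

# Standard error of a Monte Carlo power estimate: rejection on each of M
# sampled matrices is a Bernoulli trial, so SE = sqrt(p * (1 - p) / M).
def mc_standard_error(p: float, m: int) -> float:
    return math.sqrt(p * (1.0 - p) / m)

# Even at the worst case p = 0.5, the SE stays small for moderate M.
for m in (5000, 3000, 1000):
    print(m, round(mc_standard_error(0.5, m), 4))
```

This is why reducing the number of samples from Ω to 3000 or even 1000 barely inflates the observed standard deviations.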

Moreover, the burn-in phase of the MCMC algorithm has been varied from 300 up to 8000, and three different values of the so-called step parameter have been considered: 16, 32, and 50. The step parameter thins the chain to reduce the dependence between successively retained matrices (states of the Markov chain); e.g. a step of 16 means that only every 16th matrix is selected.
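The roles of burn-in and step can be illustrated with a minimal margin-preserving swap chain. This is only a sketch in the spirit of such samplers, not Verhelst's actual implementation: it repeatedly picks a 2 × 2 submatrix and, if it forms a checkerboard, flips it, a move that leaves all row and column sums unchanged and (with the symmetric proposal used here) has the uniform distribution over Ω as its stationary distribution. All sizes and parameter values below are illustrative.

```python
import numpy as np

def swap_chain(y0, n_draws, burn_in=300, step=16, rng=None):
    """Sketch of a margin-preserving swap sampler (not Verhelst's exact
    implementation): pick two rows and two columns; if the 2x2 submatrix is
    a checkerboard, flip it. Row and column sums are invariant under this
    move. burn_in discards initial states; step thins the chain so that
    retained matrices are less dependent."""
    rng = np.random.default_rng(rng)
    y = y0.copy()
    n, k = y.shape
    draws = []
    for it in range(burn_in + n_draws * step):
        i1, i2 = rng.choice(n, size=2, replace=False)
        j1, j2 = rng.choice(k, size=2, replace=False)
        sub = y[np.ix_([i1, i2], [j1, j2])]
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
            y[np.ix_([i1, i2], [j1, j2])] = 1 - sub  # checkerboard flip
        if it >= burn_in and (it - burn_in) % step == step - 1:
            draws.append(y.copy())
    return draws

# Usage with a small hypothetical starting matrix and short illustrative
# burn-in (the study varied burn-in between 300 and 8000).
y0 = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
draws = swap_chain(y0, n_draws=5, burn_in=50, step=16, rng=1)
```

Every retained matrix has the same margins as y0, so the draws can be used to approximate conditional probabilities such as (3) under the hypothesis δ = 0.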


a smaller sample size, but this result is nonetheless quite understandable. In the presented examples the DIF parameter refers to item 2.

In cases of smaller sample sizes, i.e. up to a few hundred, conditional testing of assumptions of the Rasch model is preferable to the asymptotic χ² tests usually applied in this context. Unlike the χ² tests, the conditional procedure treated in this paper is a one-sided hypothesis test, which generally has higher power than its two-sided counterpart.

The power function of the conditional test can be well approximated by numerical procedures and random sampling techniques. The results of this work hint at quite accurate and stable computations. Nonetheless, many more scenarios could be investigated, particularly scenarios assuming more extreme values of the person scores (row sums) as well as the item scores (column sums) than have been analyzed in this contribution. In such cases higher variances and less precision of the power computations have to be expected.

Probably the most important result of this contribution for the practice of psychometric data analysis is that the exact sampling approach based on an exact counting algorithm [3] yields power computations that do not differ substantially from those of the MCMC approach [2], while the latter is considerably more efficient in terms of computing time.

The authors declare no conflicts of interest regarding the publication of this paper.

Draxler, C. and Nolte, J.P. (2018) Computational Precision of the Power Function for Conditional Tests of Assumptions of the Rasch Model. Open Journal of Statistics, 8, 873-884. https://doi.org/10.4236/ojs.2018.86058