Statistical Comparison of Eight Alternative Methods for the Analysis of Paired Sample Data with Applications

This paper presents and statistically compares eight alternative methods that may be used in the analysis of matched or paired sample data. The methods cover situations in which the data satisfy the usual assumptions of normality and continuity necessary for parametric tests, as well as situations in which the data are numeric or non-numeric measurements on as low as the ordinal scale. It is shown that only the modified sign tests, based either on the raw observations or on their assigned ranks, may be used with non-numeric measurements on the ordinal scale. Where the ordinary sign test, the Wilcoxon signed rank sum test and the modified sign tests are all applicable, the modified sign tests are shown, using both raw and simulated data, to be more efficient and hence more powerful than the ordinary sign test, because their test statistics are intrinsically and structurally modified for the possible presence of tied observations between the sampled populations. For the raw data, the modified Wilcoxon signed rank sum test, when applicable, is the most efficient and powerful of the non-parametric methods presented, followed in order by the modified sign test by ranks and the modified sign test based on raw scores. For the simulated data, the modified sign test by ranks is the most efficient and powerful, followed in order by the modified Wilcoxon signed rank sum test and the modified sign test. Each of the non-parametric methods presented can be easily modified and re-specified for use with one-sample data by re-designating the observations from one of the sampled populations to correspond to a hypothesized value of some measure of central tendency. The methods are illustrated with raw data as well as simulated data, and their relative performances are compared.


Introduction
A clinician, medical researcher or research scientist may expose a random sample of subjects to some treatment or drug at two points in time or space, or expose two random samples of subjects matched on several characteristics, one to an active or new drug or treatment and the other to a diluent, inactive placebo or control treatment, with research interest in comparing the responses after the exposure. A dietician may be interested in studying a random sample of subjects treated with a regimen of diet or exercises and in measuring their responses in terms of the differences between body weights before and after the experience. A panel of judges or examiners may be interested in comparing the performances of candidates in two tests or examinations taken at two points in time or space. A psychologist or psychiatrist may wish to compare the performance of two matched samples of subjects exposed to two experimental conditions. A beautician, marketing consultant, advertising agent, product promoter or investor may wish to compare the performance of a line of products, in terms of their acceptability or sales, at two different points in time or space.
In each of these and similar situations, the researcher may wish to select a statistical method often used in the analysis of matched or paired samples that is relatively efficient and powerful in terms of being able to more readily reject a false null hypothesis and accept a true one and hence be able to reach more reliable conclusions.
This paper presents, discusses and compares eight alternative statistical methods that may be used for this purpose. The data being analyzed may or may not be continuous, normally distributed, numeric or independent, but they should be measurements on at least the ordinal scale. Interest is in statistically comparing the following eight methods for analyzing paired samples: the paired sample t test, the ordinary sign test for two samples, the exact binomial test, the normal approximation to the ordinary two sample sign test, the unmodified Wilcoxon signed rank sum test for paired samples, the modified sign test, the modified sign test by ranks, and the modified Wilcoxon signed rank sum test for paired samples.
All modifications or adjustments of test statistics are aimed at making provision for possible ties, that is, tied observations between the sampled populations, and hence obviate the need to require the sampled populations to be continuous or even numeric.

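Paired Sample t Test
Required Assumptions
Populations continuous and normally distributed, or sample size sufficiently large [1]. This method can only be used for data that satisfy the required assumptions and are measurements on at least the interval scale. Let
$$d_i = x_{i1} - x_{i2}$$
for i = 1, 2, ..., n be the difference between the ith pair of observations, let $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$ be the mean difference, and let
$$s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}$$
be the variance of the differences. Let $s_{\bar{d}} = s_d/\sqrt{n}$ be the standard deviation of the mean difference $\bar{d}$. The test statistic for the null hypothesis that the mean difference equals $d_0$, where $d_0$ is any real number including zero, is [1]
$$t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}}$$
which has a t distribution with n - 1 degrees of freedom. We reject $H_0$ at the $\alpha$ level of significance if
$$|t| \ge t_{1-\alpha/2;\, n-1}$$
Otherwise $H_0$ is accepted.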
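The calculation can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the usual form of the statistic, t = (mean difference - d0) divided by the standard error of the mean difference, and the function name `paired_t` and the data are hypothetical, not taken from the paper.

```python
import math
from statistics import mean, stdev

def paired_t(x1, x2, d0=0.0):
    """Return (t, degrees of freedom) for the paired sample t test.

    Assumes H0: mean difference = d0 and uses the sample standard
    deviation of the differences (divisor n - 1).
    """
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    se = stdev(d) / math.sqrt(n)   # standard error of the mean difference
    return (mean(d) - d0) / se, n - 1

# Hypothetical before/after scores, for illustration only
before = [5, 7, 6, 9, 8]
after = [4, 5, 6, 7, 6]
t, df = paired_t(before, after)
```

The resulting t is then compared with the tabulated critical value of the t distribution with `df` degrees of freedom.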

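Ordinary Sign Test for Two Samples
Required Assumptions
Populations continuous and numeric measurements.
The test statistic is based on the signs (+ or -) of the differences between members of the paired sample observations.
Thus let
$$u_i = \begin{cases} 1, & \text{if } x_{i1} > x_{i2} \\ -1, & \text{if } x_{i1} < x_{i2} \end{cases} \qquad (2.1)$$
for i = 1, 2, ..., n. Note that Equation (2.1) assumes that there are no ties, that is, $d_i$ cannot be 0, and hence does not make any provision for this possibility. Let
$$\pi = P(u_i = 1), \qquad W = \sum_{i=1}^{n} u_i$$
It is easily shown [2] that
$$E(W) = n(2\pi - 1), \qquad Var(W) = 4n\pi(1 - \pi)$$
The test statistic for the null hypothesis $H_0: \pi = \pi_0$ is
$$\chi^2 = \frac{\left(W - n(2\pi_0 - 1)\right)^2}{4n\pi_0(1 - \pi_0)}$$
which has the chi-square distribution with 1 degree of freedom for sufficiently large n. $H_0$ is rejected at the $\alpha$ level of significance if
$$\chi^2 \ge \chi^2_{1-\alpha;\, 1} \qquad (2.8)$$
Otherwise $H_0$ is accepted. Note that, in particular, under the null hypothesis of equal population medians usually tested in the sign test ($H_0: \pi = \pi_0 = 0.50$), the test statistic reduces to
$$\chi^2 = \frac{W^2}{n}$$
which has the chi-square distribution with 1 degree of freedom if n is sufficiently large.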
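As a rough illustration, the ordinary sign test can be sketched as follows. The sketch assumes the statistic takes the form chi-square = W²/n under the hypothesis of equal medians, with W the number of plus signs minus the number of minus signs and n the number of untied pairs; the function name and the data are hypothetical. The P-value uses the identity that the upper tail of a chi-square variable with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def sign_test_chi2(x1, x2):
    """Ordinary sign test of equal medians; tied pairs are discarded."""
    plus = sum(a > b for a, b in zip(x1, x2))
    minus = sum(a < b for a, b in zip(x1, x2))
    n = plus + minus                      # effective sample size
    w = plus - minus
    chi2 = w * w / n
    p = math.erfc(math.sqrt(chi2 / 2))    # upper tail, chi-square with 1 df
    return w, n, chi2, p

# Hypothetical data with 2 plus signs, 8 minus signs and 5 ties
x1 = [3, 3] + [1] * 8 + [2] * 5
x2 = [2] * 15
w, n, chi2, p = sign_test_chi2(x1, x2)
```

Because the tied pairs are dropped before the statistic is formed, the five ties in this illustration contribute nothing to the result.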

Exact Binomial Test
Assumption: Data are discrete. As in Section 2.2 above, let $u_i$ be as defined in Equation (2.1). Under the null hypothesis of equal population medians, we would expect that 1's (or +'s) are as likely to occur as -1's (or -'s). In other words, we would expect that $P(u_i = 1) = P(u_i = -1) = 0.5$. Therefore too many 1's (or + signs) or -1's (or - signs) will lead to a rejection of the null hypothesis.
Let X be the number of plus signs (or minus signs, whichever is smaller, for simplicity). Then the probability of obtaining at most X = x plus signs is calculated from the binomial equation [3]
$$P(X \le x) = \sum_{k=0}^{x} \binom{n}{k} \pi_0^{k} (1 - \pi_0)^{n-k}$$
where n is the effective sample size (number of plus signs plus number of minus signs, excluding all zeros). In particular, under the null hypothesis usually tested in paired sample tests ($H_0: \pi = \pi_0 = 0.50$), the null hypothesis of equal population medians is rejected at the $\alpha$ level of significance if
$$P(X \le x) \le \alpha/2$$
where $\alpha$ is the specified level of significance.
If the alternative hypothesis suggests a one-sided test, then $H_0$ is rejected at the $\alpha$ level of significance if $P(X \le x) \le \alpha$; otherwise $H_0$ is accepted. Note that the exact binomial test leads to essentially the same conclusion as the ordinary sign test presented in Section 2.2 above.
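The exact probability is easy to compute directly. The sketch below assumes the lower-tail binomial sum P(X ≤ x) with success probability π0; the function name `binom_lower_tail` is hypothetical. The two calls use the values of x and n quoted later in the Application section for the raw and the simulated data.

```python
from math import comb

def binom_lower_tail(x, n, pi0=0.5):
    """P(X <= x) for X ~ Binomial(n, pi0)."""
    return sum(comb(n, k) * pi0**k * (1 - pi0)**(n - k) for k in range(x + 1))

p_raw = binom_lower_tail(2, 10)   # x = 2, n = 10 (raw data)
p_sim = binom_lower_tail(2, 14)   # x = 2, n = 14 (simulated data)
```

Rounded to four decimal places these give 0.0547 and 0.0065, the P-values reported in the Application section.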

Normal Approximation to the Ordinary Two Sample Sign Test
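Assumption: Data are discrete.
The binomial test is usually used in the ordinary sign test to calculate the exact probability, which is sufficiently satisfactory for most sample sizes encountered in practice.
In general, as in the usual sign test, the null hypothesis is $H_0: \pi = \pi_0$. Then, using the notation of Section 2.2 with x the observed number of plus signs, the test statistic becomes
$$\chi^2 = \frac{(x - n\pi_0)^2}{n\pi_0(1 - \pi_0)}$$
which has approximately the chi-square distribution with 1 degree of freedom. However, for sufficiently large n the normal approximation can be used, which with correction for continuity then becomes
$$z = \frac{|x - n\pi_0| - 0.5}{\sqrt{n\pi_0(1 - \pi_0)}}$$
$H_0$ is rejected at the $\alpha$ level of significance if $z \ge z_{1-\alpha/2}$. Otherwise $H_0$ is accepted.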
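A minimal sketch of the normal approximation follows. It assumes the standard continuity-corrected form z = (|x - nπ0| - 0.5)/sqrt(nπ0(1 - π0)), which may differ in detail from the expression used in the paper; the function name is hypothetical. The two-sided P-value is obtained from the standard normal distribution through `math.erfc`.

```python
import math

def sign_test_normal(x, n, pi0=0.5):
    """Continuity-corrected normal approximation to the binomial sign test."""
    z = (abs(x - n * pi0) - 0.5) / math.sqrt(n * pi0 * (1 - pi0))
    p_two_sided = math.erfc(z / math.sqrt(2))   # P(|Z| >= z) for standard normal Z
    return z, p_two_sided

# x = 2 plus signs out of n = 10 untied pairs, as in example 2 of the Application
z, p = sign_test_normal(2, 10)
```

Under these assumptions the statistic for example 2 falls short of the 5% two-sided critical value, in line with the conclusion reported in the Application section.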

Unmodified Wilcoxon Signed Rank Sum Test for Paired Samples
This test is similar to the ordinary sign test except that it is based on the ranks of the absolute values, $|d_i|$, of the differences $d_i$ between paired observations, instead of only on the signs of the differences, for i = 1, 2, ..., n. Let
$$T^{+} = \sum_{i:\, d_i > 0} r(|d_i|)$$
where $r(|d_i|)$ is the rank assigned to $|d_i|$, the absolute value of the difference $d_i = x_{i1} - x_{i2}$. Without loss of generality we may assume that $r(|d_i|) = i$. The unmodified Wilcoxon signed rank sum test statistic for the general null hypothesis of constant difference between population medians ($H_0: \pi = \pi_0$) is
$$\chi^2 = \frac{\left(T^{+} - \frac{n(n+1)}{2}\pi_0\right)^2}{\frac{n(n+1)(2n+1)}{6}\pi_0(1 - \pi_0)} \qquad (5.4)$$
which under $H_0$ has approximately the chi-square distribution with 1 degree of freedom for sufficiently large n. $H_0$ is rejected at the $\alpha$ level of significance if Equation (2.8) is satisfied; otherwise $H_0$ is accepted.
In particular, under the null hypothesis usually tested in the Wilcoxon signed rank sum test ($H_0: \pi = \pi_0 = 0.50$), Equation (5.4) becomes
$$\chi^2 = \frac{\left(T^{+} - \frac{n(n+1)}{4}\right)^2}{\frac{n(n+1)(2n+1)}{24}}$$
which has approximately a chi-square distribution with 1 degree of freedom for sufficiently large n.
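The steps can be sketched as follows, assuming the usual large-sample form in which T+ has mean n(n+1)/4 and variance n(n+1)(2n+1)/24 under the hypothesis of equal medians, zero differences are dropped, and tied absolute differences receive average ranks. The function names and the data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_chi2(x1, x2):
    """Unmodified Wilcoxon signed rank sum test; zero differences are ignored."""
    d = [a - b for a, b in zip(x1, x2) if a != b]
    n = len(d)
    ranks = average_ranks([abs(v) for v in d])
    t_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    expected = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return t_plus, n, (t_plus - expected) ** 2 / variance

# Hypothetical paired scores, for illustration only
t_plus, n, chi2 = wilcoxon_chi2([1, 2, 3, 4, 10], [2, 4, 6, 8, 5])
```

For n = 10 untied pairs the variance term n(n+1)(2n+1)/24 equals 96.25, the figure that appears in the relative efficiency comparison of the Application section.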

Modified Sign Test
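The ordinary sign test is here modified to allow for tied observations between the two matched or paired samples, and also to provide for the possibility that the ordinal scale data being analyzed may be non-numeric. We re-specify $u_i$ as follows. Let
$$u_i = \begin{cases} 1, & \text{if } x_{i1} > x_{i2} \\ 0, & \text{if } x_{i1} = x_{i2} \\ -1, & \text{if } x_{i1} < x_{i2} \end{cases}$$
for i = 1, 2, ..., n. Let
$$\pi^{+} = P(u_i = 1), \qquad \pi^{0} = P(u_i = 0), \qquad \pi^{-} = P(u_i = -1)$$
where
$$\pi^{+} + \pi^{0} + \pi^{-} = 1 \qquad (6.3)$$
Let
$$W = \sum_{i=1}^{n} u_i \qquad (6.4)$$
Then $E(W) = n(\pi^{+} - \pi^{-})$ and $Var(W) = n\left(\pi^{+} + \pi^{-} - (\pi^{+} - \pi^{-})^2\right)$. It can also be easily shown that the sample estimates of $\pi^{+}$, $\pi^{0}$ and $\pi^{-}$ are respectively
$$\hat{\pi}^{+} = \frac{f^{+}}{n}, \qquad \hat{\pi}^{0} = \frac{f^{0}}{n}, \qquad \hat{\pi}^{-} = \frac{f^{-}}{n}$$
where $f^{+}$, $f^{0}$ and $f^{-}$ are respectively the number of 1's, 0's and -1's in the frequency distribution of the n values of $u_i$, so that $W = f^{+} - f^{-}$. The test statistic for the null hypothesis that the population medians differ by some constant ($H_0: \pi^{+} - \pi^{-} = \theta_0$, where $\theta_0$ denotes the hypothesized constant) is
$$\chi^2 = \frac{(W - n\theta_0)^2}{n\left(\hat{\pi}^{+} + \hat{\pi}^{-} - (\hat{\pi}^{+} - \hat{\pi}^{-})^2\right)}$$
which under $H_0$ has approximately the chi-square distribution with 1 degree of freedom for sufficiently large n.
$H_0$ is rejected at the $\alpha$ level of significance if Equation (2.8) is satisfied; otherwise $H_0$ is accepted. In particular, the test statistic for the null hypothesis usually tested for paired samples ($H_0: \pi^{+} = \pi^{-}$) is
$$\chi^2 = \frac{W^2}{n\left(\hat{\pi}^{+} + \hat{\pi}^{-} - (\hat{\pi}^{+} - \hat{\pi}^{-})^2\right)}$$
which under $H_0$ has approximately the chi-square distribution with 1 degree of freedom for sufficiently large n.
As noted above, this method may also be used with ordinal scale data that are non-numeric measurements.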
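A short sketch of the modified sign test is given below. It assumes that W = f+ - f- and that the variance of W is estimated by n(p+ + p- - (p+ - p-)²), which follows from scoring each pair as 1, 0 or -1; the function name is hypothetical. The counts used are those quoted for example 1 in the Application section (f+ = 10, f0 = 2, f- = 3).

```python
def modified_sign_test(f_plus, f_zero, f_minus):
    """Modified sign test of equal medians; ties stay in the sample."""
    n = f_plus + f_zero + f_minus
    p_plus, p_minus = f_plus / n, f_minus / n
    w = f_plus - f_minus
    var_w = n * (p_plus + p_minus - (p_plus - p_minus) ** 2)
    return w, var_w, w * w / var_w

# Counts quoted for example 1 (health insurance scores)
w, var_w, chi2 = modified_sign_test(10, 2, 3)
```

Only the direction of each comparison is needed, so the same function applies to non-numeric ordinal grades once each pair has been classified as higher, equal or lower.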

Modified Paired Sample Test by Ranks
A rather novel and relatively more efficient, and hence more powerful, alternative method also exists. This method is similar to the one discussed in Section 2.6 above and yields similar but often more powerful results, because the paired raw scores or observations are first converted into ranks before use. Thus, let $x_{i1}$ be assigned the rank $r_{i1} = k + 1$, $k$ or $k - 1$ if $x_{i1}$ is respectively a higher (larger) score, the same (equal) score, or a lower (smaller) score than $x_{i2}$. Similarly, let $x_{i2}$ be assigned the rank $r_{i2} = k - 1$, $k$ or $k + 1$ under the same respective conditions. Let $R_{.1}$ and $R_{.2}$ be respectively the sums of the ranks assigned to sample observations from populations $X_1$ and $X_2$.
Also let t be the number of tied observations between populations $X_1$ and $X_2$, and let $R^2_{.1}$ and $R^2_{.2}$ be respectively the sums of squares of the ranks assigned to sample observations from populations $X_1$ and $X_2$.
The general null hypothesis to be tested is that the medians of the two sampled populations differ by some constant, that is, that the difference between the probability that observations drawn from population $X_1$ are on the average higher (greater) than observations drawn from population $X_2$ and the probability that they are on the average lower (smaller) is some constant. The corresponding test statistic has approximately the chi-square distribution with 1 degree of freedom for sufficiently large n.
The null hypothesis $H_0$ is rejected at the $\alpha$ level of significance if the test statistic equals or exceeds the corresponding chi-square critical value with 1 degree of freedom; otherwise $H_0$ is accepted. These results are unaffected by the chosen real value of k. However, although the results obtained remain unchanged, computation is often easier and quicker if k is an integer.
The methods of Sections 2.6 and 2.7 could be used alternatively to analyze the same types of data, although method 7, because it is based on ranks, is often more powerful than method 6, which is based on raw scores.
The two methods are nevertheless each more powerful than the unmodified Wilcoxon signed rank sum test because, unlike the latter, their test statistics intrinsically make provision for the possible presence of ties in the data. To show this, we note the relative efficiency of W to $T^{+}$ for all n ≥ 3 and $0 \le \pi^{0} < 1$, which shows that W is more efficient and hence more powerful than $T^{+}$ except in the very rare cases in which there are only one or two paired samples.
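The claim that the results do not depend on k can be illustrated with a small sketch. It assumes that the second member of each pair receives the mirror-image rank (k - 1, k or k + 1) and that the statistic is built from the difference between the two rank sums; both are assumptions made for illustration, and the function names and data are hypothetical.

```python
def rank_pair(a, b, k=2):
    """Ranks for one pair: k+1 to the higher score, k-1 to the lower, k to ties."""
    if a > b:
        return k + 1, k - 1
    if a < b:
        return k - 1, k + 1
    return k, k

def rank_sum_difference(x1, x2, k=2):
    """Difference between the rank sums of the two samples."""
    pairs = [rank_pair(a, b, k) for a, b in zip(x1, x2)]
    return sum(r1 for r1, _ in pairs) - sum(r2 for _, r2 in pairs)

# Hypothetical data with 2 higher, 8 lower and 5 tied first members
x1 = [3, 3] + [1] * 8 + [2] * 5
x2 = [2] * 15
diff_k2 = rank_sum_difference(x1, x2, k=2)
diff_k7 = rank_sum_difference(x1, x2, k=7.5)
```

Each untied pair contributes +2 or -2 and each tie contributes 0 whatever the value of k, so both calls give 4 - 16 = -12, which agrees with the value W = 4 - 16 = -12 quoted for example 2 in the Application section.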

Modified Wilcoxon Signed Rank Sum Test for Paired Samples
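This method is designed to correct for the shortfall of the regular Wilcoxon signed rank sum test $T^{+}$, which does not intrinsically provide for the possibility of ties between the sampled populations. To do this, assuming $d_i$ is as defined in Section 2.5, we let
$$u_i = \begin{cases} 1, & \text{if } d_i > 0 \\ 0, & \text{if } d_i = 0 \\ -1, & \text{if } d_i < 0 \end{cases}$$
and
$$T = \sum_{i=1}^{n} r(|d_i|)\, u_i$$
where $r(|d_i|)$ is, as defined in Section 2.5, the rank assigned to the absolute difference $|d_i|$. Note that
$$T = T^{+} - T^{-} \qquad (8.5)$$
where $T^{+}$ and $T^{-}$ are respectively the sums of the ranks of absolute differences with positive and negative signs. It is easily shown [5] that
$$E(T) = \frac{n(n+1)}{2}(\pi^{+} - \pi^{-}), \qquad Var(T) = \frac{n(n+1)(2n+1)}{6}\left(\pi^{+} + \pi^{-} - (\pi^{+} - \pi^{-})^2\right)$$
The sample estimates of $\pi^{+}$, $\pi^{0}$ and $\pi^{-}$ are $\hat{\pi}^{+} = f^{+}/n$, $\hat{\pi}^{0} = f^{0}/n$ and $\hat{\pi}^{-} = f^{-}/n$, where $f^{+}$, $f^{0}$ and $f^{-}$ are respectively the number of 1's, 0's and -1's in the frequency distribution of the n values of $u_i$. The corresponding test statistic for the general null hypothesis ($H_0: \pi^{+} - \pi^{-} = \theta_0$, where $\theta_0$ denotes the hypothesized constant) is
$$\chi^2 = \frac{\left(T - \frac{n(n+1)}{2}\theta_0\right)^2}{\frac{n(n+1)(2n+1)}{6}\left(\hat{\pi}^{+} + \hat{\pi}^{-} - (\hat{\pi}^{+} - \hat{\pi}^{-})^2\right)}$$
which under $H_0$ has approximately the chi-square distribution with 1 degree of freedom for sufficiently large n.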
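As a hedged illustration, the statistic for the hypothesis of equal medians can be computed as below. The sketch assumes Var(T) is estimated by [n(n+1)(2n+1)/6](p+ + p- - (p+ - p-)²) and that the statistic is T² divided by this variance; the function name is hypothetical. The inputs are the values quoted for example 2 in the Application section (T = 19 - 86 = -67, f+ = 2, f0 = 5, f- = 8).

```python
def modified_wilcoxon(t, f_plus, f_zero, f_minus):
    """Modified Wilcoxon signed rank sum test of equal medians."""
    n = f_plus + f_zero + f_minus
    p_plus, p_minus = f_plus / n, f_minus / n
    sum_sq_ranks = n * (n + 1) * (2 * n + 1) / 6    # sum of squared ranks 1..n
    var_t = sum_sq_ranks * (p_plus + p_minus - (p_plus - p_minus) ** 2)
    return var_t, t * t / var_t

# Values quoted for example 2 (family size preferences)
var_t, chi2 = modified_wilcoxon(-67, 2, 5, 8)
```

The sample size here is the full n = 15, because tied pairs are retained and ranked rather than discarded.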
In particular, the test statistic for the null hypothesis usually tested in paired sample problems ($H_0: \pi^{+} = \pi^{-}$) reduces Equation (8.8) to simply
$$\chi^2 = \frac{T^2}{\frac{n(n+1)(2n+1)}{6}\left(\hat{\pi}^{+} + \hat{\pi}^{-} - (\hat{\pi}^{+} - \hat{\pi}^{-})^2\right)}$$
Note that the test statistic of Equation (5.4) could equivalently be expressed as
$$\chi^2 = \frac{\left(4T^{+} - n(n+1)\right)^2}{\frac{2n(n+1)(2n+1)}{3}} \qquad (8.10)$$
while Equation (8.7) could equivalently be expressed as
$$\chi^2 = \frac{(2T)^2}{\frac{2n(n+1)(2n+1)}{3}\left(\hat{\pi}^{+} + \hat{\pi}^{-} - (\hat{\pi}^{+} - \hat{\pi}^{-})^2\right)} \qquad (8.11)$$
The relative efficiency of the test statistic given in Equation (8.11) for the modified Wilcoxon test to that of Equation (8.10) for its unmodified counterpart may therefore be determined by comparing the variances of $4T^{+}$ and $2T$. Hence, under $H_0$,
$$RE = \frac{Var(4T^{+})}{Var(2T)} = \frac{1}{1 - \pi^{0}} > 1 \qquad (8.13)$$
for all $0 < \pi^{0} < 1$, showing that the modified Wilcoxon test is more powerful than the unmodified test whenever there are tied observations between the sampled populations.
An important advantage of the modified methods over the unmodified ones is that each of them intrinsically and structurally makes provision for the possibility of tied observations in the sampled populations, and hence makes it unnecessary to require the populations to be continuous. By making use of the information on all the observations instead of only on the non-zero ones, the modified methods are generally more efficient and hence more powerful than the unmodified methods. Their specifications also enable the researcher, policy maker or implementer to determine or estimate the proportions of, or the probabilities that, a randomly selected subject performs better, as well as, or worse at a given point in time or space than at another, or under one condition compared with another. This additional advantage provides further useful information that may guide the introduction of any desired interventionist or remedial measures.
Finally, as an aside and for completeness, it is instructive to add that correlation models may also be used to study the degree of association between paired or matched samples.
These include Pearson's product moment correlation coefficient, used when the data being analysed are continuous and normally distributed, and Spearman's rank correlation coefficient, used when the data being analysed are measurements on at least the ordinal scale. The corresponding test statistics, which are fairly familiar, are also available for use.
Again, each of the proposed methods may be appropriately modified and used to analyse one-sample data simply by setting values or scores from one of the sampled populations equal to a hypothesized value of some measure of central tendency.

Application
We here illustrate the application of these eight alternative methods for the analysis of paired (matched) sample data with two data sets as well as with simulation. The first data set consists of ordinal non-numeric scores and the second of numeric scores, as follows.
1) A health insurance company every year assesses the vital signs of its clients for the purpose of determining the annual insurance premium payable. In this process the company scores its clients from A+ (excellent health) through C (fair health) down to F (poorest health, fail). Persons with excellent health pay the lowest annual health premium, while clients with very poor scores pay the highest annual premium. The scores earned by a random sample of 15 of the clients of this health insurance company during the past two consecutive years are considered.
2) Members of each of a random sample of 15 newly married couples (husband and wife) are asked to state their preferred family size (desired number of children).
As noted above, the data of example (1), being ordinal non-numeric data, may only be analysed using the modified sign test, with either raw scores (method 6) or ranks (method 7), as shown in Table 1.
Interest is in determining whether the median scores by clients are the same for the two years, that is, whether clients are likely to pay equal insurance premiums for each of the two years. To do this using method 6, we have from column 4 of Table 1 ($u_i$(6)) that $f^{+}$ = 10, $f^{0}$ = 2, $f^{-}$ = 3 and W = 10 - 3 = 7.
For the data of example 2, the paired sample t test statistic, with 14 degrees of freedom, is not statistically significant, showing that husbands and wives tend to prefer the same family sizes, that is, desire the same number of children. Analyses using the ordinary sign test, the exact binomial test and its normal approximation are presented in Table 3.
Now note that the number of ties (0) is 5. Hence the effective sample size is 15 - 5 = 10. Also the number of plus signs is 2 and the number of minus signs is 8, so that W = 2 - 8 = -6 and $\chi^2 = W^2/n = 36/10 = 3.6$ (P-value = 0.0578), which with 1 degree of freedom is not statistically significant.

Exact Binomial Test
An equivalent approach to the ordinary sign test for these data is the exact binomial test with x = 2, n = 10 and $\pi = \pi_0 = 0.5$. Hence
$$P(X \le 2) = \sum_{k=0}^{2} \binom{10}{k} (0.5)^{10} = 0.0547$$
Since P = 0.0547 > 0.05, we do not reject the null hypothesis of equal population medians. That is, with the exact method we may still conclude that newly married husbands and wives do not differ in their preferred or desired family sizes.

Normal Approximation
The normal approximation to the exact binomial test for the present data, again with x = 2, n = 10 and $\pi = \pi_0 = 0.5$ and with correction for continuity, is also not statistically significant. Analysis of example 2 using Wilcoxon's signed rank sum test is presented in Table 4.

Unmodified Wilcoxon's Signed Rank Sum Test
The sum of the ranks of absolute differences with positive signs, ignoring zero differences, gives a test statistic (P-value = 0.0294) which with 1 degree of freedom is highly statistically significant, now indicating that newly married husbands and wives do differ in their preferred or desired family sizes. Now, to apply the modified sign test by ranks to the same data, we have from column 10 of Table 5 that W = 4 - 16 = -12. Also, from column 11 of the table, we obtain a test statistic which is highly significant. Note from the P-values and the associated chi-square values that the ordinary sign tests and the unmodified Wilcoxon signed rank sum test are likely to accept a false null hypothesis (Type II error) more frequently than the two types of modified sign tests (methods 6 and 7). The relative efficiency of the modified sign test W to the unmodified Wilcoxon signed rank sum test $T^{+}$ for the present data is 96.25 : 12.681 = 7.59, showing that, at least for the present data, the modified sign tests are much more powerful than the unmodified Wilcoxon signed rank sum test.
The problem with the ordinary sign test and the unmodified Wilcoxon signed rank sum test is that neither adjusts or modifies its test statistic for the possible presence of tied observations between sampled populations. Each simply ignores these ties if they occur, a procedure that, because it uses less information, tends to compromise the power of the test. Now reanalyzing the data of example 2 using the modified Wilcoxon signed rank sum test of Section 2.8, we have from column 7 of Table 4 that T = 19 - 86 = -67. Also $f^{+}$ = 2, $f^{0}$ = 5 and $f^{-}$ = 8, giving a test statistic which with 1 degree of freedom is highly statistically significant, now indicating that newly married husbands and wives differ significantly in their desired family size preferences.
Thus the modified Wilcoxon signed rank sum test is here shown, at least for the present data, to be the most powerful of the six non-parametric statistical methods presented here for the analysis of paired or matched sample data. This is because this method uses all available information on the data being analyzed, including direction and magnitude, and also makes provision for the presence of any tied observations between the sampled populations.
Using simulation, the results are as shown in Table 6.
From Table 6, the test statistic for the null hypothesis of equal population medians, with 14 degrees of freedom, is statistically significant, showing that wives and husbands differ in their choices. Analyses using the ordinary sign test, the exact binomial test and its normal approximation are presented in Table 7.

Exact Binomial Test
An equivalent approach to the ordinary sign test for these data is the exact binomial test with x = 2, n = 14 and $\pi = \pi_0 = 0.5$. Hence
$$P(X \le 2) = \sum_{k=0}^{2} \binom{14}{k} (0.5)^{14} = 0.0065$$
Since P = 0.0065 < 0.05, we reject the null hypothesis of equal population medians. That is, with the exact method we may still conclude that husbands and wives differ in their preferences.

Normal Approximation
The normal approximation to the exact binomial test for the present data, again with x = 2, n = 14 and $\pi = \pi_0 = 0.5$ and with correction for continuity, is also statistically significant. Analysis of the simulated data using Wilcoxon's signed rank sum test is presented in Table 8.

Unmodified Wilcoxon's Signed Rank Sum Test
The sum of the ranks of absolute differences with positive signs, ignoring zero differences, is $T^{+}$ = 10.5 + 1.5 = 12. The comparison of the test statistics shows that, also for the simulated data, the modified sign tests are much more powerful than the unmodified Wilcoxon signed rank sum test.
The problem with the ordinary sign test and the unmodified Wilcoxon signed rank sum test is that neither adjusts or modifies its test statistic for the possible presence of tied observations between sampled populations. Each simply ignores these ties if they occur, a procedure that, because it uses less information, tends to compromise the power of the test. Now reanalyzing the simulated data using the modified Wilcoxon signed rank sum test of Section 2.8, we have from column 7 of Table 8 that T = 14 - 105 = -91. The resulting test statistic, with 1 degree of freedom, is highly statistically significant, indicating that couples differ significantly in their preferences. Thus the modified Wilcoxon signed rank sum test is shown, for the simulated data, to be the second most powerful of the six non-parametric statistical methods presented here for the analysis of paired or matched sample data. This method uses all available information on the data being analyzed, including direction and magnitude, and also makes provision for the presence of any tied observations between the sampled populations.

Summary and Conclusion
We have in this paper presented and discussed eight alternative methods for the analysis of paired or matched sample data. If the sampled populations satisfy the necessary assumptions of continuity and normality, then the paired sample parametric t test is the method of choice, since it is generally more powerful than most alternative non-parametric methods. If however the data being analyzed are not continuous or are ordinal non-numeric measurements, then the modified sign tests using either the raw scores themselves (method 6) or their ranks (method 7) are the only available methods of analysis. If the data are numeric measurements on at least the ordinal scale but are not appropriate for analysis using the parametric t test, then, as shown by the illustrative raw data used here, the modified Wilcoxon signed rank sum test, the modified sign test by ranks, the modified sign test, and the exact binomial or ordinary sign test with its normal approximation should be preferred and used in this order, because of their decreasing relative power. When the comparison is repeated using simulation, the order becomes the modified sign test by ranks, the modified Wilcoxon signed rank sum test, the modified sign test, and the exact binomial or ordinary sign test. The two orderings are almost the same, except that the modified sign test by ranks comes first with the simulated data and second with the raw data.
Finally, each of the proposed methods may be appropriately modified and used to analyse one-sample data simply by setting values or scores from one of the sampled populations equal to some hypothesized value of a measure of central tendency.

  
 

Table 3. Application of the ordinary sign test and the other two tests to the data on family size preferences by husbands and wives.
EBUH, I. C. A. OYEKA
Modified Wilcoxon signed rank sum test, example 2: $\hat{\pi}^{+}$ = 0.133, $\hat{\pi}^{-}$ = 0.533; Var(T) = 1240(0.666 - 0.160) = 1240(0.506) = 627.44. Now under the null hypothesis of equal population medians, the test statistic for the mo-

Table 7. Application of the ordinary sign test and the other two tests to the simulated data on family size preferences by husbands and wives. Columns: Couple; Husband ($x_{i1}$); Wife ($x_{i2}$); Diff. $d_i = x_{i1} - x_{i2}$.
Note from the P-values and the associated chi-square values that the ordinary sign test and the unmodified Wilcoxon signed rank sum test have lower P-values than the two types of modified sign tests (methods 6 and 7). The relative efficiency of the modified sign test W to the unmodified Wilcoxon signed rank sum test $T^{+}$ for the simulated data is