Measure of Departure from Point-Symmetry for the Analysis of Collapsed Square Contingency Tables
1. Introduction
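Consider an $r \times r$ square contingency table with the same row and column classifications. Let $p_{ij}$ denote the probability that an observation will fall in the $i$th row and $j$th column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). The point-symmetry (PS) model is defined by
$$p_{ij} = p_{i^* j^*} \quad (i = 1, \ldots, r;\ j = 1, \ldots, r),$$
where the symbol $i^*$ denotes $r + 1 - i$ (and similarly $j^* = r + 1 - j$); see Wall and Lienert [1]. This model indicates that the probability that an observation falls in the $(i, j)$th cell is equal to the probability that it falls in the point-symmetric $(i^*, j^*)$th cell with respect to the center cell (when $r$ is odd) or the center point (when $r$ is even). Now, we consider the $m = \lfloor (r-1)/2 \rfloor$ ways of collapsing the original $r \times r$ table with ordered categories into a $3 \times 3$ table by choosing cut points after the $h$th and $h'$th rows and after the $h$th and $h'$th columns for $h = 1, \ldots, m$, where $h' = r - h$ (so that $h < h'$ and the cut points are placed point-symmetrically). We refer to each collapsed $3 \times 3$ table as the $T_{hh'}$ table. In the collapsed $T_{hh'}$ table, let $G^{(h)}_{st}$ denote the corresponding cumulative probability for row value $s$ and column value $t$ ($s = 1, 2, 3$; $t = 1, 2, 3$); i.e.,
$$G^{(h)}_{st} = \sum_{i \in A^{(h)}_s} \sum_{j \in A^{(h)}_t} p_{ij},$$
where $A^{(h)}_1 = \{1, \ldots, h\}$, $A^{(h)}_2 = \{h + 1, \ldots, h'\}$ and $A^{(h)}_3 = \{h' + 1, \ldots, r\}$. Then, Yamamoto et al. [2] considered the collapsed point-symmetry (CoPS) model as
$$G^{(h)}_{st} = G^{(h)}_{s^* t^*} \quad (s = 1, 2, 3;\ t = 1, 2, 3)$$
for all $h = 1, \ldots, m$, where the symbol $s^*$ denotes $4 - s$ (and similarly $t^* = 4 - t$). Note that the PS model implies the CoPS model, but the converse does not hold; that is, the PS model is not equivalent to the CoPS model.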
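The collapsing step and the relation between the two models can be checked numerically. The following Python sketch assumes NumPy and the point-symmetric cut points $h' = r - h$ used for $T_{14}$ and $T_{23}$; the helper names `collapse`, `is_point_symmetric` and `is_cops` are our own, not the paper's. It builds a 5 × 5 table (a constructed illustration, not data from the paper) that satisfies CoPS but not PS.

```python
import numpy as np

def collapse(p, h):
    """Collapse an r x r table into the 3 x 3 table T_{hh'} with h' = r - h.

    Rows and columns are grouped as {1..h}, {h+1..h'}, {h'+1..r} (1-based).
    """
    p = np.asarray(p, dtype=float)
    r = p.shape[0]
    hp = r - h
    if not 1 <= h < hp:
        raise ValueError("cut points require 1 <= h < r - h")
    groups = [slice(0, h), slice(h, hp), slice(hp, r)]
    return np.array([[p[a, b].sum() for b in groups] for a in groups])

def is_point_symmetric(q):
    """PS check: q[i, j] == q[i*, j*], i.e. the table equals its 180-degree rotation."""
    q = np.asarray(q, dtype=float)
    return bool(np.allclose(q, q[::-1, ::-1]))

def is_cops(p):
    """CoPS check: every collapsed table T_{h,r-h}, h = 1..floor((r-1)/2), is PS."""
    r = np.asarray(p).shape[0]
    return all(is_point_symmetric(collapse(p, h)) for h in range(1, (r - 1) // 2 + 1))

# A 5 x 5 table satisfying CoPS but not PS: uniform probabilities plus a
# perturbation whose block sums cancel in both T_14 and T_23.
p = np.full((5, 5), 0.04)
for (i, j), d in {(0, 1): 1, (0, 2): -1, (1, 1): -1, (1, 2): 1,
                  (4, 3): -1, (4, 2): 1, (3, 3): 1, (3, 2): -1}.items():
    p[i, j] += 0.01 * d

print(is_point_symmetric(p), is_cops(p))  # False True
```

Conversely, any table equal to its own 180-degree rotation passes `is_cops`, which reflects the fact that PS implies CoPS.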
When the CoPS model does not hold, we are interested in measuring the degree of departure from CoPS. For square contingency tables with ordered categories, Tomizawa et al. [3] proposed a measure to represent the degree of departure from PS.
Consider, for example, the data in Table 1, taken from Hashimoto [4]. These data describe the cross-classification of father's and son's occupational status categories in Japan, examined in 1975 and 1995. For the data in Table 1(a) and Table 1(b), which have five categories, we may want to combine the occupational statuses into three simpler categories, namely "high", "middle" and "low". For example, the collapsed 3 × 3 table $T_{14}$ has a "high" category, which is category (1) Capitalist of the original 5 × 5 table; a "middle" category, obtained by combining categories (2) New middle, (3) Working and (4) Self-employed of the original table; and a "low" category, which is category (5) Farming. Similarly, we can consider the collapsed 3 × 3 table $T_{23}$, which has a "high" category obtained by combining categories (1) Capitalist and (2) New middle of the original 5 × 5 table; a "middle" category, which is category (3) Working; and a "low" category obtained by combining categories (4) Self-employed and (5) Farming. Table 2 and Table 3 give the observed collapsed 3 × 3 tables $T_{14}$ and $T_{23}$ for the data in Table 1(a) and Table 1(b), respectively. We are interested in the degree of departure from PS in each of the tables $T_{14}$ and $T_{23}$. The present paper therefore proposes a measure that represents the degree of departure from CoPS by using the collapsed 3 × 3 tables. For related research, see Iki et al. [5] and Balcha [6].

Table 1. Occupational status for Japanese father-son pairs; from Hashimoto [4]. (a) examined in 1975; (b) examined in 1995.
Note: Status (1) is Capitalist, (2) New middle, (3) Working, (4) Self-employed and (5) Farming.

Table 2. Collapsed tables $T_{14}$ and $T_{23}$ for the data in Table 1(a). (a) $T_{14}$ table; (b) $T_{23}$ table.

Table 3. Collapsed tables $T_{14}$ and $T_{23}$ for the data in Table 1(b). (a) $T_{14}$ table; (b) $T_{23}$ table.
The new measures are introduced in Section 2. Section 3 presents an approximate variance and a confidence interval for the proposed measure. Section 4 gives examples. Finally, Section 5 concludes the paper.
2. Measure of Departure from Point-Symmetry for Collapsed Tables
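Assume that $G^{(h)}_{st} + G^{(h)}_{s^* t^*} > 0$ for all $(s, t) \neq (2, 2)$ and $h = 1, \ldots, m$. Let
$$\delta_h = \sum_{(s,t) \neq (2,2)} G^{(h)}_{st}, \qquad G^{(h)*}_{st} = \frac{G^{(h)}_{st}}{\delta_h}$$
and
$$G^{(h)c}_{st} = \frac{G^{(h)}_{st} + G^{(h)}_{s^* t^*}}{2\delta_h} \quad ((s,t) \neq (2,2)).$$
Consider a measure to represent the degree of departure from CoPS, defined by
$$\Gamma^{(\lambda)} = \frac{1}{m} \sum_{h=1}^{m} \Phi^{(\lambda)}_h \quad (\lambda > -1),$$
where $m$ is the number of collapsed tables $T_{hh'}$,
$$\Phi^{(\lambda)}_h = \frac{\lambda(\lambda+1)}{2^\lambda - 1}\, I^{(\lambda)}\left(\{G^{(h)*}_{st}\}; \{G^{(h)c}_{st}\}\right),$$
and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Namely,
$$\Phi^{(0)}_h = \frac{1}{\log 2}\, I^{(0)}\left(\{G^{(h)*}_{st}\}; \{G^{(h)c}_{st}\}\right),$$
where
$$I^{(\lambda)}(\cdot\,;\cdot) = \frac{1}{\lambda(\lambda+1)} \sum_{(s,t) \neq (2,2)} G^{(h)*}_{st} \left[\left(\frac{G^{(h)*}_{st}}{G^{(h)c}_{st}}\right)^{\lambda} - 1\right], \qquad I^{(0)}(\cdot\,;\cdot) = \sum_{(s,t) \neq (2,2)} G^{(h)*}_{st} \log \frac{G^{(h)*}_{st}}{G^{(h)c}_{st}}.$$
The submeasure $\Phi^{(\lambda)}_h$ represents the degree of departure from PS for the collapsed $T_{hh'}$ table. We note that $I^{(\lambda)}(\cdot\,;\cdot)$ is the power-divergence between the two distributions $\{G^{(h)*}_{st}\}$ and $\{G^{(h)c}_{st}\}$, and in particular $I^{(0)}(\cdot\,;\cdot)$ is the Kullback-Leibler information between them. (For more details of the power-divergence $I^{(\lambda)}(\cdot\,;\cdot)$, see Cressie and Read [7] and Read and Cressie [8].)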
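As a sketch of how $\Phi^{(\lambda)}_h$ and $\Gamma^{(\lambda)}$ might be computed from a table of probabilities, the Python code below transcribes the definitions above, assuming NumPy, cut points $h' = r - h$ and equal weights $1/m$ with $m = \lfloor (r-1)/2 \rfloor$. The function names are hypothetical.

```python
import numpy as np

def collapse(p, h):
    """3 x 3 table T_{hh'} with h' = r - h: groups {1..h}, {h+1..h'}, {h'+1..r}."""
    p = np.asarray(p, dtype=float)
    r = p.shape[0]
    cuts = [slice(0, h), slice(h, r - h), slice(r - h, r)]
    return np.array([[p[a, b].sum() for b in cuts] for a in cuts])

def phi(g, lam):
    """Submeasure Phi_h^(lam) (lam > -1) of departure from PS for a 3 x 3 table g."""
    g = np.asarray(g, dtype=float)
    off = np.ones((3, 3), dtype=bool)
    off[1, 1] = False                                # drop the center cell (2, 2)
    delta = g[off].sum()                             # delta_h
    g_star = g[off] / delta                          # G*_st
    g_c = (g + g[::-1, ::-1])[off] / (2 * delta)     # G^c_st; partner is (4-s, 4-t)
    pos = g_star > 0                                 # cells with G*_st = 0 add nothing
    ratio = g_star[pos] / g_c[pos]
    if lam == 0:                                     # continuous limit: KL / log 2
        return float(np.sum(g_star[pos] * np.log(ratio)) / np.log(2))
    power_div = np.sum(g_star[pos] * (ratio ** lam - 1)) / (lam * (lam + 1))
    return float(lam * (lam + 1) / (2 ** lam - 1) * power_div)

def gamma(p, lam):
    """Gamma^(lam): equal-weight mean of Phi_h^(lam), h = 1..m, m = floor((r-1)/2)."""
    m = (np.asarray(p).shape[0] - 1) // 2
    return float(np.mean([phi(collapse(p, h), lam) for h in range(1, m + 1)]))

print(gamma(np.full((5, 5), 0.04), 1.0))  # 0.0  (a PS table)
```

A 3 × 3 table in which each point-symmetric pair of non-center cells has exactly one zero, such as `[[1, 2, 3], [4, 9, 0], [0, 0, 0]]`, gives `phi(...) = 1` for every $\lambda > -1$, up to rounding.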
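Let
$$\psi^{(h)}_{st} = \frac{G^{(h)}_{st}}{G^{(h)}_{st} + G^{(h)}_{s^* t^*}} \quad ((s,t) \neq (2,2)).$$
Also let $S = \{(1,1), (1,2), (1,3), (2,1)\}$, the set containing one cell from each point-symmetric pair of cells other than the center cell. Then the submeasure $\Phi^{(\lambda)}_h$ is expressed as
$$\Phi^{(\lambda)}_h = 1 - \frac{\lambda 2^\lambda}{2^\lambda - 1} \sum_{(s,t) \in S} \left(G^{(h)*}_{st} + G^{(h)*}_{s^* t^*}\right) H^{(\lambda)}_{h,st},$$
where
$$H^{(\lambda)}_{h,st} = \frac{1}{\lambda}\left[1 - \left(\psi^{(h)}_{st}\right)^{\lambda+1} - \left(\psi^{(h)}_{s^* t^*}\right)^{\lambda+1}\right],$$
and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Namely,
$$\Phi^{(0)}_h = 1 - \frac{1}{\log 2} \sum_{(s,t) \in S} \left(G^{(h)*}_{st} + G^{(h)*}_{s^* t^*}\right) H^{(0)}_{h,st}, \qquad H^{(0)}_{h,st} = -\psi^{(h)}_{st} \log \psi^{(h)}_{st} - \psi^{(h)}_{s^* t^*} \log \psi^{(h)}_{s^* t^*}$$
(with $0 \log 0 = 0$). Moreover, the submeasure $\Phi^{(\lambda)}_h$ is also expressed as
$$\Phi^{(\lambda)}_h = \frac{\lambda(\lambda+1)}{2^\lambda - 1} \sum_{(s,t) \in S} \left(G^{(h)*}_{st} + G^{(h)*}_{s^* t^*}\right) I^{(\lambda)}_{h,st},$$
where
$$I^{(\lambda)}_{h,st} = \frac{1}{\lambda(\lambda+1)} \left\{ \psi^{(h)}_{st}\left[\left(2\psi^{(h)}_{st}\right)^{\lambda} - 1\right] + \psi^{(h)}_{s^* t^*}\left[\left(2\psi^{(h)}_{s^* t^*}\right)^{\lambda} - 1\right] \right\}$$
is the power-divergence between $\{\psi^{(h)}_{st}, \psi^{(h)}_{s^* t^*}\}$ and $\{1/2, 1/2\}$, and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Namely,
$$\Phi^{(0)}_h = \frac{1}{\log 2} \sum_{(s,t) \in S} \left(G^{(h)*}_{st} + G^{(h)*}_{s^* t^*}\right) I^{(0)}_{h,st}, \qquad I^{(0)}_{h,st} = \psi^{(h)}_{st} \log\left(2\psi^{(h)}_{st}\right) + \psi^{(h)}_{s^* t^*} \log\left(2\psi^{(h)}_{s^* t^*}\right).$$
Note that $H^{(\lambda)}_{h,st}$ is Patil and Taillie's [9] diversity index of degree $\lambda$ ($\lambda > -1$) for the two-point distribution $\{\psi^{(h)}_{st}, \psi^{(h)}_{s^* t^*}\}$, which includes the Shannon entropy (when $\lambda = 0$) as a special case.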
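The three expressions for $\Phi^{(\lambda)}_h$ can be compared numerically. The sketch below (Python with NumPy; function names hypothetical) assumes every cell of the 3 × 3 table is positive, so that no $0 \log 0$ or $0^{\lambda}$ terms arise; under that assumption all three functions should agree up to floating-point rounding.

```python
import numpy as np

# One representative cell per point-symmetric pair (0-based); partner of (s, t) is (2-s, 2-t).
S = [(0, 0), (0, 1), (0, 2), (1, 0)]

def pair_parts(g):
    """Pair weights G*_st + G*_s*t* and two-point distributions {psi_st, psi_s*t*}, (s,t) in S."""
    g = np.asarray(g, dtype=float)
    delta = g.sum() - g[1, 1]
    w = np.array([(g[s, t] + g[2 - s, 2 - t]) / delta for s, t in S])
    psi = np.array([g[s, t] / (g[s, t] + g[2 - s, 2 - t]) for s, t in S])
    return w, np.stack([psi, 1 - psi])           # column k holds one pair's distribution

def phi_direct(g, lam):
    """Definition: scaled power divergence of {G*_st} from {G^c_st} (all cells > 0)."""
    g = np.asarray(g, dtype=float)
    off = np.ones((3, 3), dtype=bool)
    off[1, 1] = False
    delta = g[off].sum()
    gs, gc = g[off] / delta, (g + g[::-1, ::-1])[off] / (2 * delta)
    ratio = gs / gc
    if lam == 0:
        return float(np.sum(gs * np.log(ratio)) / np.log(2))
    return float(np.sum(gs * (ratio ** lam - 1)) / (2 ** lam - 1))

def phi_diversity(g, lam):
    """1 - lam 2^lam / (2^lam - 1) * sum_S w_st H_st, H = Patil-Taillie index of degree lam."""
    w, q = pair_parts(g)
    if lam == 0:
        H = -np.sum(q * np.log(q), axis=0)       # Shannon entropy of each pair
        return float(1 - np.sum(w * H) / np.log(2))
    H = (1 - np.sum(q ** (lam + 1), axis=0)) / lam
    return float(1 - lam * 2 ** lam / (2 ** lam - 1) * np.sum(w * H))

def phi_pairwise(g, lam):
    """lam(lam+1)/(2^lam - 1) * sum_S w_st I_st, I_st = divergence of {psi} from {1/2, 1/2}."""
    w, q = pair_parts(g)
    if lam == 0:
        I = np.sum(q * np.log(2 * q), axis=0)
        return float(np.sum(w * I) / np.log(2))
    I = np.sum(q * ((2 * q) ** lam - 1), axis=0) / (lam * (lam + 1))
    return float(lam * (lam + 1) / (2 ** lam - 1) * np.sum(w * I))

# Hypothetical 3 x 3 collapsed table with all cells positive.
g = np.array([[0.10, 0.05, 0.08], [0.12, 0.30, 0.06], [0.04, 0.09, 0.16]])
for lam in (-0.5, 0.0, 1.0):
    print(lam, phi_direct(g, lam), phi_diversity(g, lam), phi_pairwise(g, lam))
```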
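We note that for all $(s, t) \neq (2, 2)$ and $\lambda > -1$, (i) $0 \le H^{(\lambda)}_{h,st} \le (2^\lambda - 1)/(\lambda 2^\lambda)$ (the upper bound is $\log 2$ when $\lambda = 0$), (ii) $H^{(\lambda)}_{h,st} = 0$ if and only if $\psi^{(h)}_{st} = 1$ (then $\psi^{(h)}_{s^* t^*} = 0$) or $\psi^{(h)}_{s^* t^*} = 1$ (then $\psi^{(h)}_{st} = 0$), and (iii) $H^{(\lambda)}_{h,st} = (2^\lambda - 1)/(\lambda 2^\lambda)$ if and only if $\psi^{(h)}_{st} = \psi^{(h)}_{s^* t^*} = 1/2$, that is, $G^{(h)}_{st} = G^{(h)}_{s^* t^*}$.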
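Properties (i) to (iii) can be seen by scanning the two-point distribution $\{\psi, 1 - \psi\}$ over $[0, 1]$. This Python sketch (NumPy assumed; names hypothetical) evaluates the Patil-Taillie index and compares it with the bound $(2^\lambda - 1)/(\lambda 2^\lambda)$, which equals $\log 2$ in the limit $\lambda \to 0$.

```python
import numpy as np

def patil_taillie(q, lam):
    """Patil-Taillie diversity index of degree lam (lam > -1); Shannon entropy at lam = 0."""
    q = np.asarray(q, dtype=float)
    q = q[q > 0]                                 # zero-probability categories contribute nothing
    if lam == 0:
        return float(-np.sum(q * np.log(q)))
    return float((1 - np.sum(q ** (lam + 1))) / lam)

def h_upper(lam):
    """Upper bound (2^lam - 1) / (lam 2^lam) for two categories; log 2 when lam = 0."""
    return float(np.log(2)) if lam == 0 else (2 ** lam - 1) / (lam * 2 ** lam)

# Scan psi over [0, 1] for the two-point distribution {psi, 1 - psi}.
grid = np.linspace(0.0, 1.0, 101)
for lam in (-0.5, 0.0, 1.0, 2.0):
    values = np.array([patil_taillie([x, 1 - x], lam) for x in grid])
    print(lam, values.min(), values.max(), h_upper(lam))
```

The minimum 0 is reached at $\psi \in \{0, 1\}$ and the maximum at $\psi = 1/2$, matching (ii) and (iii).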
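We see that the measure $\Gamma^{(\lambda)}$ lies between 0 and 1 for every $\lambda > -1$. Also, the submeasures $\Phi^{(\lambda)}_h$ ($h = 1, \ldots, m$) lie between 0 and 1. For each $\lambda > -1$, the CoPS structure holds if and only if $\Gamma^{(\lambda)} = 0$; and the degree of departure from CoPS is the largest, in the sense that $G^{(h)}_{st} = 0$ (then $G^{(h)}_{s^* t^*} > 0$) or $G^{(h)}_{s^* t^*} = 0$ (then $G^{(h)}_{st} > 0$) for all $(s, t) \neq (2, 2)$ and $h = 1, \ldots, m$, if and only if $\Gamma^{(\lambda)} = 1$.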
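The two extreme cases can be illustrated with constructed tables (not data from the paper). In this Python sketch (NumPy assumed; names hypothetical), the first 5 × 5 table satisfies CoPS without satisfying PS, so $\Gamma^{(\lambda)}$ should be 0 up to rounding; in the second, every point-symmetric pair of non-center cells in both $T_{14}$ and $T_{23}$ has exactly one zero, so $\Gamma^{(\lambda)}$ should be 1.

```python
import numpy as np

def collapse(p, h):
    """3 x 3 table T_{hh'} with h' = r - h."""
    p = np.asarray(p, dtype=float)
    r = p.shape[0]
    cuts = [slice(0, h), slice(h, r - h), slice(r - h, r)]
    return np.array([[p[a, b].sum() for b in cuts] for a in cuts])

def phi(g, lam):
    """Phi_h^(lam) for a 3 x 3 table; cells with G*_st = 0 contribute nothing."""
    g = np.asarray(g, dtype=float)
    off = np.ones((3, 3), dtype=bool)
    off[1, 1] = False
    delta = g[off].sum()
    gs, gc = g[off] / delta, (g + g[::-1, ::-1])[off] / (2 * delta)
    pos = gs > 0
    ratio = gs[pos] / gc[pos]
    if lam == 0:
        return float(np.sum(gs[pos] * np.log(ratio)) / np.log(2))
    return float(np.sum(gs[pos] * (ratio ** lam - 1)) / (2 ** lam - 1))

def gamma(p, lam):
    """Equal-weight mean of Phi_h^(lam) over h = 1..floor((r-1)/2)."""
    m = (np.asarray(p).shape[0] - 1) // 2
    return float(np.mean([phi(collapse(p, h), lam) for h in range(1, m + 1)]))

# CoPS but not PS: a perturbation of the uniform table whose block sums cancel.
cops = np.full((5, 5), 0.04)
for (i, j), d in {(0, 1): 1, (0, 2): -1, (1, 1): -1, (1, 2): 1,
                  (4, 3): -1, (4, 2): 1, (3, 3): 1, (3, 2): -1}.items():
    cops[i, j] += 0.01 * d

# Largest departure: all mass in cells (1,1), (1,3), (1,5), (3,1) plus the center cell.
extreme = np.zeros((5, 5))
for i, j in [(0, 0), (0, 2), (0, 4), (2, 0), (2, 2)]:
    extreme[i, j] = 0.2

for lam in (-0.5, 0.0, 1.0):
    print(lam, gamma(cops, lam), gamma(extreme, lam))
```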
3. Approximate Confidence Interval for Measure
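Let $n_{ij}$ denote the observed frequency in the $i$th row and $j$th column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). The sample version of $\Gamma^{(\lambda)}$, that is, $\hat{\Gamma}^{(\lambda)}$, is given by $\Gamma^{(\lambda)}$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$, where $\hat{p}_{ij} = n_{ij}/n$ and $n = \sum_{i}\sum_{j} n_{ij}$. We assume that the $\{n_{ij}\}$ result from full multinomial sampling. We consider an approximate standard error for $\hat{\Gamma}^{(\lambda)}$ and a large-sample confidence interval for $\Gamma^{(\lambda)}$. By the delta method, $\sqrt{n}\,(\hat{\Gamma}^{(\lambda)} - \Gamma^{(\lambda)})$ has asymptotically (as $n \to \infty$) a normal distribution with mean zero and variance $\sigma^2[\hat{\Gamma}^{(\lambda)}]$. See the Appendix for the details of $\sigma^2[\hat{\Gamma}^{(\lambda)}]$.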
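Let $\hat{\sigma}^2[\hat{\Gamma}^{(\lambda)}]$ denote $\sigma^2[\hat{\Gamma}^{(\lambda)}]$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$. Then $\hat{\sigma}[\hat{\Gamma}^{(\lambda)}]/\sqrt{n}$ is an estimated approximate standard error for $\hat{\Gamma}^{(\lambda)}$, and
$$\hat{\Gamma}^{(\lambda)} \pm z_{p/2}\, \frac{\hat{\sigma}[\hat{\Gamma}^{(\lambda)}]}{\sqrt{n}}$$
is an approximate $100(1 - p)$ percent confidence interval for $\Gamma^{(\lambda)}$, where $z_{p/2}$ is the percentage point from the standard normal distribution corresponding to a two-tail probability equal to $p$.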
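A minimal sketch of the interval computation follows, in Python with NumPy and the standard-library `statistics.NormalDist`. As a shortcut (our assumption, not the paper's procedure), it approximates $D_{ij} = \partial \Gamma^{(\lambda)} / \partial p_{ij}$ by finite differences instead of using an analytic derivative, and plugs it into the multinomial delta-method variance $\sum p_{ij} D_{ij}^2 - (\sum p_{ij} D_{ij})^2$. The counts are hypothetical and are not the data of Table 1.

```python
import numpy as np
from statistics import NormalDist

def collapse(p, h):
    """3 x 3 table T_{hh'} with h' = r - h."""
    p = np.asarray(p, dtype=float)
    r = p.shape[0]
    cuts = [slice(0, h), slice(h, r - h), slice(r - h, r)]
    return np.array([[p[a, b].sum() for b in cuts] for a in cuts])

def phi(g, lam):
    """Phi_h^(lam) for a 3 x 3 table (lam > -1)."""
    g = np.asarray(g, dtype=float)
    off = np.ones((3, 3), dtype=bool)
    off[1, 1] = False
    delta = g[off].sum()
    gs, gc = g[off] / delta, (g + g[::-1, ::-1])[off] / (2 * delta)
    pos = gs > 0
    ratio = gs[pos] / gc[pos]
    if lam == 0:
        return float(np.sum(gs[pos] * np.log(ratio)) / np.log(2))
    return float(np.sum(gs[pos] * (ratio ** lam - 1)) / (2 ** lam - 1))

def gamma(p, lam):
    """Equal-weight mean of Phi_h^(lam) over h = 1..floor((r-1)/2)."""
    m = (np.asarray(p).shape[0] - 1) // 2
    return float(np.mean([phi(collapse(p, h), lam) for h in range(1, m + 1)]))

def gamma_ci(counts, lam, level=0.95, eps=1e-6):
    """Estimate, delta-method SE and Wald-type CI for Gamma^(lam) (multinomial sampling).

    D_ij = dGamma/dp_ij is approximated by finite differences (a numerical shortcut).
    """
    n_ij = np.asarray(counts, dtype=float)
    n = n_ij.sum()
    p_hat = n_ij / n
    est = gamma(p_hat, lam)
    D = np.zeros_like(p_hat)
    for idx in np.ndindex(p_hat.shape):
        up, dn = p_hat.copy(), p_hat.copy()
        up[idx] += eps
        dn[idx] = max(dn[idx] - eps, 0.0)        # stay inside the parameter space
        D[idx] = (gamma(up, lam) - gamma(dn, lam)) / (up[idx] - dn[idx])
    sigma2 = np.sum(p_hat * D ** 2) - np.sum(p_hat * D) ** 2
    se = np.sqrt(max(sigma2, 0.0) / n)           # sigma-hat / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{p/2} with p = 1 - level
    return est, se, (est - z * se, est + z * se)

# Hypothetical 5 x 5 counts (illustration only, not the data of Table 1).
counts = [[30, 12, 8, 5, 2], [20, 40, 15, 9, 4], [10, 25, 60, 18, 7],
          [6, 14, 22, 35, 9], [3, 5, 9, 11, 25]]
print(gamma_ci(counts, lam=1.0))
```

When the true table satisfies CoPS, $\Gamma^{(\lambda)} = 0$ sits on the boundary and the gradient vanishes, so this Wald-type interval should not be relied on in that case.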
4. Examples
Consider the data in Table 1(a) and Table 1(b) again. From Table 4(a) and Table 4(b), the confidence intervals for $\Gamma^{(\lambda)}$ applied to the data in Table 1(a) and Table 1(b) do not include zero for any of the values of $\lambda$ considered; this indicates that the CoPS structure does not hold in either table. When the degrees of departure from CoPS in Table 1(a) and Table 1(b) are compared using the confidence intervals for $\Gamma^{(\lambda)}$, the degree of departure is greater for Table 1(a) than for Table 1(b).
We further analyze the data in Table 1(a) and Table 1(b) using the submeasures $\Phi^{(\lambda)}_1$ (for $T_{14}$) and $\Phi^{(\lambda)}_2$ (for $T_{23}$). We see from Table 5(a) that for Table 1(a), the degree of departure from point-symmetry in the collapsed table $T_{23}$ is smaller than that in $T_{14}$. Thus (i) when categories (2), (3) and (4) of Table 1(a) are combined, the degree of departure from point-symmetry for the collapsed table $T_{14}$ is large, and (ii) when categories (1) and (2), and categories (4) and (5), of Table 1(a) are combined, the degree of departure for the collapsed table $T_{23}$ is smaller than in case (i). Similarly, we see from Table 5(b) that for Table 1(b), the degree of departure from point-symmetry in the collapsed table $T_{23}$ is smaller than that in $T_{14}$, so the same interpretations (i) and (ii) apply to Table 1(b).

Table 4. Estimate of measure $\Gamma^{(\lambda)}$, approximate standard error for $\hat{\Gamma}^{(\lambda)}$ and approximate 95% confidence interval for $\Gamma^{(\lambda)}$, applied to Table 1(a) and Table 1(b).

Table 5. Estimates of submeasures $\Phi^{(\lambda)}_1$ and $\Phi^{(\lambda)}_2$ applied to Table 1(a) and Table 1(b).
5. Conclusions
When the CoPS model does not hold for the original 5 × 5 table, we are interested in (i) the degree of departure from point-symmetry in each of the tables $T_{14}$ and $T_{23}$, (ii) which of $T_{14}$ and $T_{23}$ shows the larger departure from point-symmetry, and (iii) the degree of departure from CoPS for the original 5 × 5 table. For (i) and (ii), the proposed submeasures $\Phi^{(\lambda)}_1$ and $\Phi^{(\lambda)}_2$ are useful, and for (iii) the proposed measure $\Gamma^{(\lambda)}$ is useful.
Since the collapsed tables are obtained by combining adjacent categories, it is meaningful to consider collapsed 3 × 3 tables only when the original square contingency table has ordered categories. Therefore, a measure for CoPS in square ordinal tables should depend on the order in which the categories are listed. We note that it does not matter whether or not the submeasures for the collapsed tables are invariant with respect to the order of listing the categories, because each collapsed 3 × 3 table obtained from an original square table is unique.
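In addition, the measure $\Gamma^{(\lambda)}$ is expressed by using the same weight $1/m$ for each of the submeasures $\Phi^{(\lambda)}_h$ ($h = 1, \ldots, m$). It therefore seems useful to analyze an original square contingency table using the measure $\Gamma^{(\lambda)}$ when we cannot decide which collapsed 3 × 3 table is important.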
Acknowledgements
The authors would like to thank the referee for their helpful comments.
Appendix
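Using the delta method, $\sqrt{n}\,(\hat{\Gamma}^{(\lambda)} - \Gamma^{(\lambda)})$ has asymptotic variance $\sigma^2[\hat{\Gamma}^{(\lambda)}]$ given as follows:
$$\sigma^2[\hat{\Gamma}^{(\lambda)}] = \sum_{i=1}^{r}\sum_{j=1}^{r} p_{ij} D_{ij}^2 - \left(\sum_{i=1}^{r}\sum_{j=1}^{r} p_{ij} D_{ij}\right)^2,$$
where
$$D_{ij} = \frac{\partial \Gamma^{(\lambda)}}{\partial p_{ij}} = \frac{1}{m}\sum_{h=1}^{m}\sum_{s=1}^{3}\sum_{t=1}^{3} \frac{\partial \Phi^{(\lambda)}_h}{\partial G^{(h)}_{st}}\, I\big((i,j) \text{ falls in cell } (s,t) \text{ of } T_{hh'}\big),$$
and $I(\cdot)$ is the indicator function, with $I(\cdot) = 1$ if true and 0 if not.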
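The derivative in the Appendix can be made concrete. With $\delta_h = \sum_{(s,t) \neq (2,2)} G^{(h)}_{st}$, the definition of $\Phi^{(\lambda)}_h$ rearranges algebraically to $\Phi^{(\lambda)}_h = (2^\lambda S_h/\delta_h - 1)/(2^\lambda - 1)$ with $S_h = \sum_{(s,t) \neq (2,2)} (G^{(h)}_{st})^{\lambda+1} (G^{(h)}_{st} + G^{(h)}_{s^* t^*})^{-\lambda}$, and differentiating gives the closed form coded below. This expression is our own derivation for $\lambda \neq 0$ and strictly positive cells, and may be written differently from the paper's Appendix; the Python sketch (NumPy assumed, names hypothetical) assembles $D_{ij}$ through the indicator and evaluates $\sigma^2$.

```python
import numpy as np

def labels(r, h):
    """Block (0, 1, 2) of each original category in T_{hh'}, h' = r - h (0-based)."""
    return np.array([0 if i < h else 1 if i < r - h else 2 for i in range(r)])

def collapsed(p, h):
    """G_st = sum of p_ij over the rows in block s and the columns in block t."""
    p = np.asarray(p, dtype=float)
    lab = labels(p.shape[0], h)
    return np.array([[p[np.ix_(lab == s, lab == t)].sum() for t in range(3)]
                     for s in range(3)])

def phi_and_grad(g, lam):
    """Phi (lam != 0) as (2^lam S / delta - 1) / (2^lam - 1), and dPhi/dG_st.

    S = sum_{(s,t) != (2,2)} G_st^(lam+1) (G_st + G_s*t*)^(-lam); all cells assumed > 0.
    """
    pair = g + g[::-1, ::-1]                          # G_st + G_s*t*
    off = np.ones((3, 3), dtype=bool)
    off[1, 1] = False
    delta = g[off].sum()
    S = np.sum(g[off] ** (lam + 1) * pair[off] ** (-lam))
    phi = (2 ** lam * S / delta - 1) / (2 ** lam - 1)
    psi = g / pair                                    # partner share is 1 - psi
    dS = (lam + 1) * psi ** lam - lam * psi ** (lam + 1) - lam * (1 - psi) ** (lam + 1)
    grad = 2 ** lam / ((2 ** lam - 1) * delta) * (dS - S / delta)
    grad[1, 1] = 0.0                                  # G_22 enters neither S nor delta
    return phi, grad

def gamma(p, lam):
    """Equal-weight mean of Phi_h over h = 1..floor((r-1)/2)."""
    m = (np.asarray(p).shape[0] - 1) // 2
    return sum(phi_and_grad(collapsed(p, h), lam)[0] for h in range(1, m + 1)) / m

def gamma_gradient(p, lam):
    """D_ij = (1/m) sum_h sum_{s,t} dPhi_h/dG_st * I(cell (i, j) falls in block (s, t))."""
    p = np.asarray(p, dtype=float)
    r = p.shape[0]
    m = (r - 1) // 2
    D = np.zeros_like(p)
    for h in range(1, m + 1):
        lab = labels(r, h)
        _, grad = phi_and_grad(collapsed(p, h), lam)
        D += grad[lab[:, None], lab[None, :]]         # indexing plays the role of I(.)
    return D / m

def asymptotic_variance(p, lam):
    """sigma^2 = sum p_ij D_ij^2 - (sum p_ij D_ij)^2 under multinomial sampling."""
    p = np.asarray(p, dtype=float)
    D = gamma_gradient(p, lam)
    return float(np.sum(p * D ** 2) - np.sum(p * D) ** 2)
```

Because $\Gamma^{(\lambda)}$ is unchanged when all $p_{ij}$ are multiplied by a positive constant, Euler's theorem gives $\sum_i \sum_j p_{ij} D_{ij} = 0$, so the second term of $\sigma^2$ vanishes; comparing `gamma_gradient` with finite differences of `gamma` is a convenient check of the closed form.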