Statistical Inference for a Novel Health Inequality Index

In this paper, we develop statistical inference for an important health inequality index for ordinal data proposed by Lv, Wang and Xu [1]. We establish the asymptotic distributions of the indices, which allows us to make inference for them. Generalizations of the indices to the multiple-population setting are also studied. We demonstrate the effectiveness of our procedure using health inequality data from several areas in Switzerland; our results classify these areas into three classes based on their health inequalities.


Introduction
The qualitative nature of SRHS data prevents the straightforward use of conventionally developed indices for measuring income inequality. A reasonable index for SRHS data should be invariant to rescalings of variables which preserve the order of categories.
The assessment of health inequality for ordered data has received attention over the last ten years. [2] and [3] developed a median-based concept of inequality. [4] proposed polarization measures, which are also median based. These methods are invariant to cardinal scaling of the categories. [5] proposed a method that uses an income-health matrix to measure socioeconomic inequality in health. [6] introduced a family of sub-group decomposable indices and investigated their decomposability. [7] conducted an empirical study of the health inequality index for ordinal data from China. [8] considered the tools and choices to be made when measuring socioeconomic inequalities with rank-dependent inequality indices. [9] made an empirical comparison of several ordinal and cardinal measures of health inequality. [10] proposed a new measure for ordinal health data to monitor income-related health differences between regions in Great Britain. [11] defined a new ratio-scale health status variable and developed positional stochastic dominance conditions that can be implemented in the context of multidimensional categorical variables. [12] examined the measurement of social polarization with categorical and ordinal data. [13] introduced two approaches to measuring social polarization when the distance between groups is based on an ordinal variable, such as self-assessed health status. More examples of ordinal inequality measurement can be found in [14] and [15], among others. Regarding statistical inference for these recently developed health inequality indices, some authors (e.g., [4], [16]) have derived standard errors for the indices they introduced. [17] presented a unified methodology for the estimation of inequality indices of the cumulative distribution function.
How to cite this paper: Niu, C.Z., Hong, S.X. and Jiang, X.J. (2017) Statistical Inference for a Novel Health Inequality Index.
Recently, [1] proposed a class of measures of health inequality that are easy to compute and have some desirable properties, such as additivity, invariance to parallel shifts, normalization, and simple aversion to median-preserving spreads.
However, the index is designed for a single population only, and no statistical inference has been developed for it. This motivates the present work. In this paper, we establish the asymptotic distributions of the indices introduced by [1] and extend the indices to multiple-population settings. Our procedures allow for dependence between the populations considered and for different sample sizes. In particular, we answer several important questions, for example, whether the health inequality of one population is the same as that of others, and whether there is a linear relationship among the health inequalities of different populations.
The remainder of the paper is organized as follows. In Section 2, we review the indices developed by [1] and derive their asymptotic distribution. In Section 3, we extend the indices to multiple populations. Empirical results are reported in Section 4. Section 5 concludes the paper.

Review of the Indices
According to [1], denote […]. In practice, $\sigma^2$ is unknown and must be estimated. Given that d is a consistent estimator of f, the asymptotic variance can be estimated by […]. Based on the asymptotic result, the two-sided symmetric […]
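The interval construction mentioned above can be illustrated with a minimal Python sketch. It assumes the usual asymptotic-normality setup, in which sqrt(n) times the estimation error is approximately normal with variance $\sigma^2$; the function name, its arguments, and the numbers in the example are hypothetical and are not taken from [1] or from the data analysed in this paper.

```python
from statistics import NormalDist


def symmetric_confidence_interval(estimate, variance, n, alpha=0.05):
    """Two-sided symmetric (1 - alpha) interval from asymptotic normality.

    Assumption: sqrt(n) * (estimate - truth) is approximately N(0, variance),
    so the standard error of the estimate is sqrt(variance / n).
    """
    if variance < 0 or n <= 0:
        raise ValueError("variance must be non-negative and n must be positive")
    # Standard normal quantile, roughly 1.96 when alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * (variance / n) ** 0.5
    return estimate - half_width, estimate + half_width


# Hypothetical inputs, for illustration only.
lower, upper = symmetric_confidence_interval(estimate=0.30, variance=0.25, n=400)
```

In practice, `variance` would be replaced by the plug-in estimate of the asymptotic variance described in the text.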

Testing for Equivalence
We first consider two populations with […]. Our analysis considers the cases of mutually dependent samples and independent samples. The former is relevant when examining the evolution of health inequality in a single group (e.g., changes in health inequality over time), while the latter is relevant when comparing health inequality between two groups (e.g., cross-national comparisons). The sampling is performed independently within each group. The asymptotic distribution of […]
We introduce the following Wald statistic: […]. Then, under the null hypothesis, […]. The corresponding p-value can be computed by the following formula: […], where […].
We propose statistical inference procedures to test the equality of samples in terms of their health inequality indices. This question often arises when checking the similarity of health inequalities across a whole country or within a specified region. For example, China consists of many administrative regions, such as Eastern China, North China, and the Central Region, each comprising several provinces. Provinces in the same region have similar economic and/or social characteristics, so they are assumed to have the same health inequality. We also examine whether the health inequality index of a province is the same as the average index of the entire region. These two testing problems lead to a further application. If the preceding analysis reveals that the provinces within each region have equal indices, then we can check whether the common means of two regions are also the same. Accordingly, we cluster the regions based on the test results: if several regions have the same health inequality, we can view them as one cluster.
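The two-sample Wald comparison introduced earlier in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's exact formula: it assumes the statistic is the squared difference of the two estimated indices divided by the estimated variance of that difference, and that its limiting null distribution is chi-square with one degree of freedom. The function name and the example numbers are hypothetical.

```python
import math


def two_sample_wald(est1, var1, est2, var2, cov=0.0):
    """Wald test of H0: the two populations have the same index.

    var1 and var2 are the estimated variances of the two index estimates
    (already divided by the respective sample sizes). cov is their
    estimated covariance: zero for independent samples, non-zero for
    dependent samples such as the same group observed at two times.
    """
    var_diff = var1 + var2 - 2.0 * cov
    if var_diff <= 0:
        raise ValueError("estimated variance of the difference must be positive")
    stat = (est1 - est2) ** 2 / var_diff
    # Upper tail of chi-square with 1 df: P(X > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical estimates and variances, for illustration only.
stat, p_value = two_sample_wald(0.30, 0.0004, 0.25, 0.0005)
reject_at_5_percent = p_value < 0.05
```

A positive covariance between the two estimates shrinks the variance of the difference, which is why the dependent-sample case needs separate treatment.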

Global Test
Suppose there are […]. For dependent samples, results similar to those presented in Section 2 can be obtained. However, the covariance structure becomes too complex to be practical as more samples are used, so for simplicity we consider only independent samples. A global test can be constructed as: […]. Define the matrix […], where […] is the r-dimensional vector with all elements equal to 1. Then the hypothesis in (7) can be rewritten as follows: […]. Given the independence of the r groups of samples, we obtain […]. Note that under the null hypothesis, […] in (9). Consequently, a Wald-type test statistic can be defined as […], where […]. The hypothesis in (7) can be regarded as a generalization of the two-sample comparison case. Its usefulness can be seen clearly in our empirical application.
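A minimal Python sketch of a global equality test for r independent samples is given below. It is an illustration under assumptions that the surviving text does not confirm: the limiting null distribution is taken to be chi-square with r - 1 degrees of freedom, and the Wald quadratic form is computed through the equivalent inverse-variance weighted sum of squares, which coincides with the contrast-matrix form when the covariance matrix is diagonal. All names and numbers are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail and add the remaining terms.
    total = math.erfc(math.sqrt(half))
    for j in range((df - 1) // 2):
        total += math.exp(-half) * half ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def global_equality_test(estimates, variances):
    """Wald-type test of H0: all r indices are equal (independent samples).

    variances[i] is the estimated variance of estimates[i].
    """
    r = len(estimates)
    if r < 2 or len(variances) != r:
        raise ValueError("need at least two estimates and matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    stat = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    return stat, chi2_sf(stat, r - 1)


# Hypothetical estimates for three regions, for illustration only.
stat, p_value = global_equality_test([0.30, 0.25, 0.28], [0.0004, 0.0005, 0.0004])
```

With r = 2 the statistic reduces to the squared difference divided by the sum of the two variances, that is, the independent two-sample comparison.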

Hypothesis Testing within a Cluster
Another interesting problem in the multiple-sample case is whether the health inequality of a specified population is the same as the average health inequality of the entire population. For instance, one may be interested in whether the health inequality level in Hebei province is higher or lower than the average level of all provinces in the North China region. Accordingly, we propose the following hypothesis: […]. Under the null hypothesis in (12), […]. The p-value can then be determined in the same way as for $T_r$.
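The comparison of one population with the regional average can be written as a single linear contrast, sketched below in Python. The sketch assumes that "average" means the simple unweighted mean of the r indices, that the samples are independent, and that the statistic is referred to a chi-square distribution with one degree of freedom. None of these details survive in the text above, so they should be read as assumptions, and the function name and numbers are hypothetical.

```python
import math


def one_versus_average_test(estimates, variances, k):
    """Wald test of H0: index k equals the simple average of all r indices.

    variances[i] is the estimated variance of estimates[i]; samples are
    assumed independent, so the variance of the contrast is the sum of
    squared contrast weights times the individual variances.
    """
    r = len(estimates)
    if not 0 <= k < r or len(variances) != r:
        raise ValueError("k must index one of the r populations")
    # Contrast: +1 on population k, minus 1/r on every population.
    contrast = [(1.0 if i == k else 0.0) - 1.0 / r for i in range(r)]
    diff = sum(c * e for c, e in zip(contrast, estimates))
    var_diff = sum(c * c * v for c, v in zip(contrast, variances))
    stat = diff ** 2 / var_diff
    # Upper tail of chi-square with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value, diff


# Hypothetical provincial estimates, for illustration only.
stat, p_value, diff = one_versus_average_test(
    [0.30, 0.25, 0.28], [0.0004, 0.0005, 0.0004], k=0
)
```

The sign of `diff` indicates whether the selected population lies above or below the average, which the two-sided p-value alone does not show.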

Hypothesis Testing between Clusters
Further, we discuss hypothesis testing between clusters. Assume now that our preliminary analysis reveals that the provinces in a given region (cluster), such as the Eastern China region, have the same health inequality indices.
We may then examine whether the health inequalities of two regions are similar. To this end, we could choose two representative provinces in each region and compare their health inequality indices using the approaches proposed in Section 2. However, this method does not use all the information in these groups. To use all the underlying information, we compare the common means of the two regions. We consider the following hypothesis: […]. Under the null hypothesis in (14), $T_{rb} \Rightarrow$ […]; thus the p-value can be determined in the same way as for $T_r$.
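One way to write the between-cluster comparison explicitly is sketched below in LaTeX. The notation is an assumption introduced for illustration only: $A$ and $B$ denote the index sets of the two regions with $r_1$ and $r_2$ provinces, $\hat{I}_i$ is the estimated index of province $i$, and $\hat{v}_i$ is the estimated variance of $\hat{I}_i$. The sketch further assumes independent samples and simple unweighted regional means, so it may differ in detail from the paper's hypothesis (14) and statistic $T_{rb}$.

```latex
H_0:\quad \frac{1}{r_1}\sum_{i \in A} I_i \;=\; \frac{1}{r_2}\sum_{j \in B} I_j ,
\qquad
T_{rb} \;=\;
\frac{\Bigl(\dfrac{1}{r_1}\sum_{i \in A}\hat{I}_i \;-\; \dfrac{1}{r_2}\sum_{j \in B}\hat{I}_j\Bigr)^{2}}
     {\dfrac{1}{r_1^{2}}\sum_{i \in A}\hat{v}_i \;+\; \dfrac{1}{r_2^{2}}\sum_{j \in B}\hat{v}_j}
```

Under these assumptions the denominator is the estimated variance of the difference of the two regional means, so the statistic has the same one-contrast Wald form as a two-sample comparison.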

Empirical Application
To illustrate our proposed procedures, we present a real application using the […]. The survey respondents were asked to rate their health status on a five-point scale ranging from very bad to very good. This dataset was also analyzed by [3] and [6]. We do not include the distributions of SHS in the seven regions in this paper; this information can be found in [6]. We use the health inequality indices proposed by [1] to analyze the survey data and obtain new observations. Denote the index with […]. […] show ambiguous rankings. Specifically, for F1 and F2-2, Zurich is the region with the least difference in health status, and Central has the second-lowest inequality. However, for F2-3, Central is identified as the least imbalanced region in health status, while Zurich has the second-lowest inequality. East and Ticino show similar behavior.
Because the data set is a random sample, it is natural to ask whether East and Ticino really have different health inequalities, and whether Central and Zurich really have the same health inequality. We use statistical inference to address these questions. To answer them fully, we carry out various two-sample comparison tests; the results are reported in Table 2. We set the significance level to 5%.
From Table 2, we can conclude that Leman is significantly more imbalanced than Middle-Land in health status. In contrast to the findings in Table 1, Middle-Land and North-West do not show statistically significant differences in their health inequalities. In other words, these two regions have the same health inequality level based on the data set we have. North-West is significantly more imbalanced in health status than East. Except for F2-3, all p-values for North-West and Ticino are smaller than 5%. Therefore, the difference in health inequality between North-West and Ticino is largely confirmed. Central and Zurich have the same inequality level, and the same finding holds for East and Ticino.
Based on the above analysis, we classify North-West and Middle-Land, East and Ticino, and Central and Zurich into three groups. However, can we combine two groups, such as the East and Ticino group with the Central and Zurich group? This question is equivalent to asking whether the average health inequality of the East and Ticino group is the same as that of the other group. The p-values of the tests using the above four measures are 0.5505, 0.1778, 0.7105 and 0.0140, respectively, which are all larger than 5% except for F2-3. Therefore, East, Ticino, Central, and Zurich may be clustered into one group. We also check whether these four regions have the same health inequality levels. The p-values for this global equality test are 0.8805, 0.1824, 0.8946 and 0.0942, respectively, which suggest that these regions have the same inequality levels. We then examine whether this four-member group can be enlarged by including the North-West and Middle-Land group. We propose two hypotheses to investigate this question. First, are the average inequalities of North-West and Middle-Land similar to those of the other groups? Second, do these six regions have the same health inequality levels? For these two hypotheses, all the p-values resulting from tests with the four measures are smaller than 5%, which indicates that the average health inequality of the North-West and Middle-Land group is different from that of the four-member group. We then examine whether the

Conclusion
In this paper, we propose several statistical inference procedures for the novel health inequality indices introduced in [1]. We consider one-, two-, and multiple-sample cases. Given that health surveys generally cover multiple regions, tests of health inequality for the multiple-sample case are needed. The analysis of health inequality in various regions of Switzerland demonstrates the applicability of our proposed tools. The seven regions covered by SHS can be categorized into three groups after the numerical study: Leman has the highest health inequality, followed by the North-West and Middle-Land group. The other four regions (i.e., Central, East, Ticino, and Zurich) have the same health inequality. Our proposed procedures can also be applied to other recently proposed health inequality indices.
Subjective well-being is influenced by many factors, such as health inequality, education, and environment. Statistical inference for multidimensional well-being inequality is left for future research.