Regression Analyses of Income Inequality Indices

Scientists have analysed different methods for numerical estimation of Gini coefficients. Using Lorenz curves, various numerical integration attempts have been made to identify accurate estimates. Central alternative methods have been the trapezium, Simpson and Lagrange rules. They are all special cases of the Newton-Cotes methods. In this study, we approximate the Lorenz curve by polynomial regression models and integrate optimal regression models for numerical estimation of the Gini coefficient. The attempts are checked on theoretical Lorenz curves and on empirical Lorenz curves with known Gini indices. In all cases the proposed methods seem to be a good alternative to earlier methods presented in the literature.


Introduction
Income distributions are commonly unimodal and skewed with a heavy right tail. Therefore, different skew models, such as the lognormal and the Pareto, have been proposed as suitable descriptions of income distributions, and the corresponding Lorenz curves have been obtained. These are usually applied in specific empirical situations. For general studies, more wide-ranging tools have been considered, and a long series of studies has proposed different models and methods. Their aim is to introduce inequality measures, such as the Gini and Pietra indices, that can be used to compare different distributions. Primary income data yield the most exact estimates of income inequality coefficients, but when the income distribution is unknown the use of Lorenz curves is common. In this article, we present a new regression model that approximates the Lorenz curve. When Lorenz curves are compared, especially when they intersect, the comparisons are based on numerical indices. The most frequently used index is the Gini coefficient, $G$ [1]. In terms of the Lorenz curve $L(p)$, this coefficient is the ratio of the area between the diagonal and the Lorenz curve to the whole area under the diagonal. The formula is $G = 1 - 2\int_0^1 L(p)\,dp$. This definition yields Gini coefficients satisfying the inequalities $0 < G < 1$.
The higher the $G$ value, the lower the Lorenz curve and the stronger the inequality. The Gini coefficient is popular because it is easy to compute, being a ratio of two areas in the Lorenz curve diagram. It allows direct comparison of two income distributions, regardless of their sizes or patterns. However, the Gini coefficient does not capture where in the distribution the inequality occurs. Consequently, two very different income distributions, even ones with intersecting Lorenz curves, can have the same Gini index.
In many empirical situations, the income distribution $F(x)$ is given in grouped tables. If the mean or the total incomes in the groups are known, the cumulative distribution can be converted to a Lorenz curve, but the subintervals do not have constant length. Consequently, Simpson's rule is not applicable. One has to replace it with the trapezium rule or with Lagrange polynomials. The trapezium rule is a weak alternative: because the Lorenz curve is convex, the rule yields positive bias for the area under the Lorenz curve and hence negative bias for the Gini coefficient.
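The bias of the trapezium rule on a grid with unequal subintervals can be illustrated with a short sketch. The grid below is hypothetical, and the Pareto Lorenz curve in its standard form, $L(p) = 1 - (1-p)^{(\alpha-1)/\alpha}$ with $\alpha = 4$, is assumed only as a convenient convex test curve with a known Gini coefficient, $1/(2\alpha - 1)$. The function names are illustrative.

```python
def lorenz_pareto(p, alpha=4.0):
    """Standard Pareto Lorenz curve (assumed test curve)."""
    return 1.0 - (1.0 - p) ** ((alpha - 1.0) / alpha)


def gini_trapezium(p, L):
    """Gini estimate G = 1 - 2 * area, with the area from the trapezium rule.

    p and L are equal-length sequences of cumulative population shares and
    cumulative income shares; the subintervals may have different lengths.
    """
    area = 0.0
    for i in range(1, len(p)):
        area += 0.5 * (p[i] - p[i - 1]) * (L[i] + L[i - 1])
    return 1.0 - 2.0 * area


# Hypothetical grouped table with unequal subintervals.
p_grid = [0.0, 0.1, 0.25, 0.5, 0.7, 0.85, 0.95, 1.0]
L_grid = [lorenz_pareto(p) for p in p_grid]

g_trap = gini_trapezium(p_grid, L_grid)
g_true = 1.0 / (2.0 * 4.0 - 1.0)  # closed form for the assumed Pareto curve

print("trapezium estimate:", g_trap)
print("theoretical value: ", g_true)
```

Since each chord lies above a convex curve, the trapezium area is too large and the printed estimate falls below the theoretical value, whatever grid is chosen.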
As an application of these methods, Fellman [2] considered different Lorenz models: the Kakwani and Podder model, the generalized Pareto model analysed by Rasche et al. [3] and the Gupta model [4]. In addition, Rao and Tam [5] constructed a generalized Gupta model. Furthermore, Rao and Tam introduced a simplified version of the Rao-Tam model. Chotikapanich [6] defined an alternative Lorenz curve. The Pareto, Chotikapanich and Gupta models contain only one parameter. They are so simple that it is impossible to distinguish between the length of the range of the income distribution function and the Gini coefficient. With only one parameter to estimate, these distribution properties cannot be independently estimated. We pay special attention to these models and analyse them in more detail. Using Lorenz curves, various numerical integration attempts were made to determine the accuracy of the estimates. For example, Mettle et al. [7] considered Lorenz curves and estimated the Gini coefficient of income by Newton-Cotes methods, and then compared the accuracy of these estimates for some (Ghanaian) data.
In this study, we review income analysis methods based on Lorenz curves. To test the proposed methods, the analyses are initially applied to theoretical models with known inequality indices. The empirical value of the method is assessed by analysing real data from the literature for which highly accurate Gini indices are available, and our results are compared with earlier findings.

Methods
There are several different situations, and consequently, alternative analyses of Gini coefficients have to be performed. Common estimation alternatives are the trapezium, Simpson and Lagrange rules. They are all special cases of the Newton-Cotes methods. A common property of these is that they split the interval $(0, 1)$ into subintervals and approximate the Lorenz curve on each subinterval by a polynomial that takes the same values as the Lorenz curve at the end points of the subinterval.
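As a rough illustration of two of these Newton-Cotes rules on equally spaced deciles, the sketch below compares the trapezium rule and Simpson's rule. The curve $L(p) = p^4$ is a hypothetical test curve chosen because its Gini coefficient, $1 - 2/5 = 0.6$, is known exactly; it is not one of the models studied here.

```python
def composite_trapezium(y, h):
    """Trapezium rule on equally spaced ordinates y with step h."""
    return h * (0.5 * y[0] + sum(y[1:-1]) + 0.5 * y[-1])


def composite_simpson(y, h):
    """Simpson's rule; requires an even number of equal subintervals."""
    if (len(y) - 1) % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    odd = sum(y[1:-1:2])    # ordinates with odd index
    even = sum(y[2:-1:2])   # interior ordinates with even index
    return (h / 3.0) * (y[0] + 4.0 * odd + 2.0 * even + y[-1])


deciles = [i / 10.0 for i in range(11)]
lorenz = [p ** 4 for p in deciles]  # hypothetical curve with G = 0.6

g_trapezium = 1.0 - 2.0 * composite_trapezium(lorenz, 0.1)
g_simpson = 1.0 - 2.0 * composite_simpson(lorenz, 0.1)

print("trapezium:", g_trapezium)
print("Simpson:  ", g_simpson)
```

Both rules need the ordinates at the subinterval end points only, but Simpson's rule additionally requires equal spacing, which is why it cannot be used on grouped tables with subintervals of different lengths.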
When Lorenz curves are considered, the simplest situations are those in which they are defined for five quintiles or for ten deciles. In the first case, the most commonly
When one integrates the regression model over the interval $(0, 1)$, one obtains the area under the Lorenz curve.
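The idea of fitting a polynomial regression model to the Lorenz curve and integrating it term by term can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the results in this study: the model is taken to be $L(p) \approx \sum_k a_k p^k$ over a chosen set of powers, it is fitted by ordinary least squares through the normal equations, and the Gini estimate is $1 - 2\sum_k a_k/(k+1)$. The function names, the choice of powers and the test curve are all hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def fit_lorenz_polynomial(p, L, powers):
    """Least-squares coefficients a_k in L(p) ~ sum_k a_k * p**k."""
    X = [[pi ** k for k in powers] for pi in p]
    m = len(powers)
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)]
           for i in range(m)]
    XtL = [sum(row[i] * Li for row, Li in zip(X, L)) for i in range(m)]
    return solve_linear(XtX, XtL)


def gini_from_polynomial(coeffs, powers):
    """Integrate the fitted polynomial over (0, 1) term by term."""
    area = sum(a / (k + 1) for a, k in zip(coeffs, powers))
    return 1.0 - 2.0 * area


# Hypothetical decile data from the test curve L(p) = 0.3 p + 0.7 p**3,
# whose Gini coefficient is 1 - 2 * (0.3 / 2 + 0.7 / 4) = 0.35.
p_obs = [i / 10.0 for i in range(11)]
L_obs = [0.3 * p + 0.7 * p ** 3 for p in p_obs]

powers = (1, 2, 3)  # assumed powers; omitting the constant forces L(0) = 0
coeffs = fit_lorenz_polynomial(p_obs, L_obs, powers)
g_hat = gini_from_polynomial(coeffs, powers)

print("coefficients:", coeffs)
print("Gini estimate:", g_hat)
```

Unlike the Newton-Cotes rules, the fitted polynomial does not have to pass through the observed points, and the observation points do not have to be equally spaced. For high degrees the normal equations become ill-conditioned, so a more stable least-squares solver would be preferable in practice.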

Theoretical Lorenz Curves
We apply our method to theoretical models in order to compare the obtained Gini indices with the theoretical ones. We assume that the polynomial is of degree at most six ($n \le 6$). This restriction is imposed by the maximum degree of the polynomial trend lines in Excel.

Pareto Model
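The first one is the Pareto model, with the Lorenz curve $L(p) = 1 - (1 - p)^{(\alpha - 1)/\alpha}$, $\alpha > 1$. In this study, we assume that the parameter value is $\alpha = 4$. Hence, $G = 1/(2\alpha - 1) = 1/7 \approx 0.142857$. The Lorenz curve is presented in Figure 1.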
After integration, our method gives the observed value $\hat{G} = 0.141508$, which compared with the theoretical value given above indicates good agreement.
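For reference, the theoretical Gini coefficient of the Pareto model can be derived in a few lines. The derivation assumes the standard one-parameter Pareto Lorenz curve $L(p) = 1 - (1-p)^{(\alpha-1)/\alpha}$ with $\alpha > 1$ and the area definition of the Gini coefficient.

```latex
\begin{align*}
\int_0^1 L(p)\,dp
  &= 1 - \int_0^1 (1-p)^{(\alpha-1)/\alpha}\,dp
   = 1 - \frac{\alpha}{2\alpha-1}
   = \frac{\alpha-1}{2\alpha-1}, \\
G &= 1 - 2\int_0^1 L(p)\,dp
   = 1 - \frac{2(\alpha-1)}{2\alpha-1}
   = \frac{1}{2\alpha-1}.
\end{align*}
```

For $\alpha = 4$ this gives $G = 1/7 \approx 0.142857$, so the regression estimate $0.141508$ differs from the theoretical value by about $0.0013$.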

Chotikapanich Model
The Chotikapanich model [6] has the Lorenz curve $L(p) = \dfrac{e^{kp} - 1}{e^{k} - 1}$, $k > 0$. In this study, we assume that $k = 5$. Hence, the Gini index is $G = 0.613567$.
The Lorenz curve is presented in Figure 2.
A polynomial regression model is estimated for this curve. The estimated Gini index based on this polynomial is $\hat{G} = 0.613462$, indicating good agreement with the theoretical value.
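The theoretical Gini index quoted for $k = 5$ can be checked with a short sketch. It assumes the usual form of the Chotikapanich Lorenz curve, $L(p) = (e^{kp} - 1)/(e^{k} - 1)$, for which integration gives $G = \big((k-2)e^{k} + k + 2\big)/\big(k(e^{k} - 1)\big)$; the midpoint rule is included only as an independent numerical check, and the function names are illustrative.

```python
import math


def lorenz_chotikapanich(p, k):
    """Chotikapanich Lorenz curve (assumed standard form)."""
    return (math.exp(k * p) - 1.0) / (math.exp(k) - 1.0)


def gini_chotikapanich(k):
    """Closed-form Gini index obtained by integrating the curve above."""
    return ((k - 2.0) * math.exp(k) + k + 2.0) / (k * (math.exp(k) - 1.0))


def gini_midpoint(lorenz, n=100000):
    """Numerical check: G = 1 - 2 * area, area by the midpoint rule."""
    h = 1.0 / n
    area = h * sum(lorenz((i + 0.5) * h) for i in range(n))
    return 1.0 - 2.0 * area


k = 5.0
g_closed = gini_chotikapanich(k)
g_numeric = gini_midpoint(lambda p: lorenz_chotikapanich(p, k))

print("closed form:", round(g_closed, 6))
print("midpoint:   ", round(g_numeric, 6))
```

Both values agree with the theoretical index $0.613567$ to six decimals.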

Gupta Model [4]
The Lorenz curve of the Gupta model is $L(p) = p\,\beta^{\,p-1}$, $\beta > 1$. Despite the Gupta model being relatively simple, the corresponding income distribution is not attainable. The explanation is that the variable $p$ is included in the model both as a factor and as an exponent. If $\beta = 5$, then the theoretical numerical value is $G = 0.375021$. The Lorenz curve is presented in Figure 3.
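The theoretical value for $\beta = 5$ can be reproduced numerically. The sketch below assumes the Gupta Lorenz curve in the form $L(p) = p\,\beta^{p-1}$, integrates it with composite Simpson's rule on a fine equally spaced grid, and compares the result with the closed form $G = 1 - 2\,\big(\beta(\ln\beta - 1) + 1\big)/\big(\beta(\ln\beta)^2\big)$ obtained by integration by parts. The function names are illustrative.

```python
import math


def lorenz_gupta(p, beta):
    """Gupta Lorenz curve (assumed form): p appears as factor and exponent."""
    return p * beta ** (p - 1.0)


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) equal subintervals."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


beta = 5.0
g_numeric = 1.0 - 2.0 * simpson(lambda p: lorenz_gupta(p, beta), 0.0, 1.0)

c = math.log(beta)
g_closed = 1.0 - 2.0 * (beta * (c - 1.0) + 1.0) / (beta * c * c)

print("Simpson:    ", round(g_numeric, 6))
print("closed form:", round(g_closed, 6))
```

Both computations give $0.375021$ to six decimals, in line with the theoretical value stated above.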

Empirical Data
The obtained results concerning theoretical models are acceptable, but in order to check the model the proposed method must also be applied on numerical

Ogwang Data
The Lorenz curve is based on the data given by Ogwang [9]. The data are household incomes in Israel, originally derived from the Family Expenditure Survey 1986/87 reported by Fishelson [10]. The data are presented as a Lorenz curve with several intervals. In this case, the subintervals are of different lengths. Consequently, Simpson's rule cannot be used. The Lorenz curve is presented in Figure 5.

Tepping's Data
Tepping estimated an accurate Gini coefficient from the Current Population Survey (CPS) data from 1968 [11]. The estimated Gini index was 0.4014. Gastwirth [12] tested Tepping's data, applied different methods and obtained interval estimates that were close to Tepping's estimate. In this study, we construct the Lorenz curve for Tepping's data and approximate the curve using our polynomial regression model. We obtain the optimal regression model $L(p) = -0.0000079 + 0.264809\,p^{2} + 0.735051\,p^{6}$.
After integration, the Gini estimate is 0.4005. This finding is very close to Tepping's result (Figure 6).

Lorenzen Data
Lorenzen [13] presents information about the total distribution of income for households in Germany in 1973 in his " Tabelle
In all cases the proposed methods seem to be a good alternative to earlier methods presented in the literature.

Discussion
The comparison between different estimation methods is in general difficult to perform. These difficulties are mainly caused by the fact that the true Gini coef-

Lorenzen (1980) Theoretical Economics Letters
The step from the Lorenz curve to the distribution function is more difficult than that from the distribution function to the Lorenz curve. There is a difference between advanced and simple Lorenz models. Advanced models yield a better fit to data, but are difficult to connect to exact income distributions. Simple one-parameter models can more easily be associated with the corresponding income distribution, but when statistical analyses are performed the goodness of fit is often poor.
In order to perform comparisons between the estimated and theoretical Gini coefficients, Fellman [1] analysed classes of theoretical Lorenz curves with varying Gini coefficients. In this study, we compare Gini estimates for the Pareto, the simplified Rao-Tam and the Chotikapanich distributions.
Fellman [18] studied the Lorenz curves for the Pareto, Chotikapanich and Gupta models and presented the Gini and Pietra indices for varying parameter values. He compared these indices and showed the relation between them.