
Scientists have analysed different methods for numerical estimation of Gini coefficients. Using Lorenz curves, various numerical integration attempts have been made to identify accurate estimates. Central alternative methods have been the trapezium, Simpson and Lagrange rules. They are all special cases of the Newton-Cotes methods. In this study, we approximate the Lorenz curve by polynomial regression models and integrate optimal regression models for numerical estimation of the Gini coefficient. The attempts are checked on theoretical Lorenz curves and on empirical Lorenz curves with known Gini indices. In all cases the proposed methods seem to be a good alternative to earlier methods presented in the literature.

Income distributions are commonly unimodal and skewed, with a heavy right tail. Therefore, different skewed models, such as the lognormal and the Pareto, have been proposed as suitable descriptions of income distributions, and the corresponding Lorenz curves have been obtained. These are usually applied in specific empirical situations. For general studies, more wide-ranging tools have been considered, and a long series of studies has proposed different models and methods. Their aim is to introduce inequality measures, such as the Gini and Pietra indices, that can be used to compare different distributions. Primary income data yield the most exact estimates of income inequality coefficients, but when the income distribution is unknown the use of Lorenz curves is common. In this article, we present a new method that approximates the Lorenz curve by polynomial regression models; integrating the optimal regression model then gives a numerical estimate of the Gini coefficient.

Income inequality indices. Consider the set of ordered points $(p, L(p))$, where $p$ is the cumulative proportion of the income-receiving units and the Lorenz curve, $L(p)$, is the corresponding cumulative proportion of income received when the units are arranged in ascending order of income. When Lorenz curves are compared, especially when they intersect, the comparisons are based on numerical indices. The most frequently used index is the Gini coefficient, G [

$$G = 1 - 2\int_0^1 L(p)\,dp. \tag{1}$$

This definition yields Gini coefficients satisfying the inequalities $0 < G < 1$. The higher the value of $G$, the lower the Lorenz curve and the stronger the inequality. The Gini coefficient is popular because it is easy to compute, being a ratio of two areas in the Lorenz curve diagram. It allows direct comparison of the inequality of two income distributions, regardless of their sizes or patterns. However, the Gini coefficient does not capture where in the distribution the inequality occurs. As a consequence, two very different income distributions, even ones with intersecting Lorenz curves, can have the same Gini index.

In many empirical situations, the income distribution F(x) is given in grouped tables. If the mean or the total incomes in the groups are known, the cumulative distribution can be modified to a Lorenz curve, but the subintervals do not have constant length. Consequently, Simpson’s rule is not applicable. One has to replace it with the trapezium rule or with Lagrange polynomials. The trapezium rule is a weak alternative because it yields positive bias for the area under the Lorenz curve and negative bias for the Gini coefficient.
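The direction of the trapezium-rule bias can be illustrated with a minimal sketch. The Lorenz curve $L(p) = p^2$ (true Gini coefficient 1/3) and the unequal class limits below are assumptions chosen only for illustration; they are not data from this study.

```python
def trapezium_area(p, lorenz_values):
    """Area under the piecewise-linear curve through the points (p[i], L[i])."""
    return sum(
        (p[i + 1] - p[i]) * (lorenz_values[i] + lorenz_values[i + 1]) / 2
        for i in range(len(p) - 1)
    )


def gini_trapezium(p, lorenz_values):
    """Gini estimate G = 1 - 2 * area, with the area from the trapezium rule."""
    return 1 - 2 * trapezium_area(p, lorenz_values)


# Hypothetical grouped data with unequal class widths (illustrative only).
p = [0.0, 0.15, 0.40, 0.55, 0.80, 0.95, 1.0]
lorenz_values = [x ** 2 for x in p]  # assumed Lorenz curve L(p) = p^2

g_true = 1 / 3                       # exact Gini coefficient for L(p) = p^2
g_trap = gini_trapezium(p, lorenz_values)
print(f"true G = {g_true:.6f}, trapezium G = {g_trap:.6f}")
```

Because a Lorenz curve is convex, every chord lies on or above the curve, so the trapezium area is too large and the resulting Gini estimate is too small, whatever the class widths.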

As an application of these methods, Fellman [

In this study, we review income analysis methods based on Lorenz curves. To test the proposed methods, the analyses are initially applied to theoretical models with known inequality indices. The empirical value of the method is based on analyses of real data in the literature with Gini indices of strong accuracy, and our obtained results are compared with earlier findings.

There are several different situations, and consequently, alternative analyses of Gini coefficients have to be performed. Common estimation alternatives are the trapezium, Simpson and Lagrange rules. They are all special cases of the Newton-Cotes methods. A common property of these is that they split the interval $(0, 1)$ into subintervals and approximate the Lorenz curve by polynomials that take the same values as the Lorenz curve at the end points of the subintervals.
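One possible reading of the Lagrange rule for subintervals of unequal length is sketched below: a quadratic Lagrange polynomial is passed through each group of three consecutive points and integrated in closed form. The function names, the test curve $L(p) = p^2$ and the grid are assumptions made for illustration, not the implementation used in the cited studies.

```python
def lagrange3_area(x0, x1, x2, y0, y1, y2):
    """Integral over [x0, x2] of the quadratic through three unequally spaced points."""
    h0, h1 = x1 - x0, x2 - x1
    return (h0 + h1) / 6 * (
        (2 - h1 / h0) * y0
        + (h0 + h1) ** 2 / (h0 * h1) * y1
        + (2 - h0 / h1) * y2
    )


def gini_lagrange(p, lorenz_values):
    """Gini estimate from consecutive three-point Lagrange panels."""
    if len(p) % 2 == 0:
        raise ValueError("an even number of subintervals (odd number of points) is required")
    area = sum(
        lagrange3_area(p[i], p[i + 1], p[i + 2],
                       lorenz_values[i], lorenz_values[i + 1], lorenz_values[i + 2])
        for i in range(0, len(p) - 2, 2)
    )
    return 1 - 2 * area


# Hypothetical unequally spaced points on the assumed curve L(p) = p^2.
p = [0.0, 0.1, 0.3, 0.6, 1.0]
lorenz_values = [x ** 2 for x in p]
g_lagrange = gini_lagrange(p, lorenz_values)
print(f"Lagrange G = {g_lagrange:.6f}")
```

With equal spacing the panel formula reduces to Simpson's weights (1, 4, 1), and for a quadratic Lorenz curve the estimate is exact up to rounding.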

When Lorenz curves are considered, the simplest situations are that they are defined for five quintiles or for ten deciles. In the first case, the most commonly used method is the trapezium rule. For Simpson’s rule, the number of subintervals should be even and the intervals should have the same length. Consequently, the comparison of the results of different rules can be performed for Lorenz curves with deciles.
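As a small illustration of why deciles allow the rules to be compared, the sketch below applies the composite Simpson rule and the trapezium rule to the same ten equal subintervals. The cubic curve $L(p) = p^3$ (true Gini coefficient 0.5) is an assumption chosen because Simpson's rule integrates cubics exactly.

```python
def simpson_area(values, h):
    """Composite Simpson rule for equally spaced values with an even number of subintervals."""
    n = len(values) - 1
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    odd = sum(values[1:-1:2])    # interior points with weight 4
    even = sum(values[2:-1:2])   # interior points with weight 2
    return h / 3 * (values[0] + values[-1] + 4 * odd + 2 * even)


def trapezium_area(values, h):
    """Composite trapezium rule for equally spaced values."""
    return h * (sum(values) - (values[0] + values[-1]) / 2)


# Deciles of the assumed Lorenz curve L(p) = p^3 (illustrative only).
h = 0.1
deciles = [(i / 10) ** 3 for i in range(11)]

g_simpson = 1 - 2 * simpson_area(deciles, h)
g_trapezium = 1 - 2 * trapezium_area(deciles, h)
print(f"Simpson G = {g_simpson:.6f}, trapezium G = {g_trapezium:.6f}")
```

With five quintiles the number of subintervals is odd, so `simpson_area` raises an error, which mirrors the restriction stated above.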

The new approach proposed here is to approximate $L(p)$ by a regression polynomial consisting of non-negative integer powers of the argument $p$, fitted to the values of the Lorenz curve. The optimal polynomial comes close to the Lorenz curve but, unlike the interpolating rules, it is not required to pass through the observed points. Furthermore, the points of the Lorenz curve do not need to be equidistant.

Let the regression model be

$$\hat{L}(p) = \hat{\alpha} + \hat{\beta}_1 p + \hat{\beta}_2 p^2 + \dots + \hat{\beta}_n p^n. \tag{2}$$

When one integrates the regression model over the interval $(0, 1)$, one obtains the estimated area under the Lorenz curve,

$$\int_0^1 \hat{L}(p)\,dp = \hat{\alpha} + \frac{1}{2}\hat{\beta}_1 + \frac{1}{3}\hat{\beta}_2 + \dots + \frac{1}{n+1}\hat{\beta}_n, \tag{3}$$

and

$$\hat{G} = 1 - 2\int_0^1 \hat{L}(p)\,dp = 1 - 2\left(\hat{\alpha} + \frac{1}{2}\hat{\beta}_1 + \frac{1}{3}\hat{\beta}_2 + \dots + \frac{1}{n+1}\hat{\beta}_n\right). \tag{4}$$
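Formulae (3) and (4) amount to a single weighted sum of the regression coefficients. The sketch below is a minimal illustration; the function name and the two toy polynomials are assumptions, not part of the original analysis.

```python
def gini_from_polynomial(coefficients):
    """Apply (3) and (4): coefficients[j] multiplies p**j, so coefficients[0] is the intercept."""
    area = sum(c / (j + 1) for j, c in enumerate(coefficients))
    return 1 - 2 * area


# Toy checks with known answers (illustrative only):
g_equality = gini_from_polynomial([0.0, 1.0])        # L(p) = p, perfect equality
g_quadratic = gini_from_polynomial([0.0, 0.0, 1.0])  # L(p) = p^2
print(g_equality, g_quadratic)
```

Powers that are absent from a fitted model simply enter with coefficient zero.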

We apply our method to theoretical models in order to compare the obtained Gini indices with the theoretical ones. We assume that the polynomial is of degree at most six ($n \le 6$). This restriction is imposed by the maximum degree of the polynomial trend lines in Excel.

The first one is the Pareto model, $F(x) = 1 - x^{-\alpha}$ for $x \ge 1$, with a finite mean, that is, $\alpha > 1$. The Lorenz curve is $L(p) = 1 - (1-p)^{(\alpha-1)/\alpha}$, the mean is $\mu = \frac{\alpha}{\alpha-1}$ and $G = \frac{1}{2\alpha-1}$. In this study, we assume that the parameter value is $\alpha = 4$. Hence, $\mu = \frac{4}{3}$ and $G = \frac{1}{2\alpha-1} = 0.142857$. The Lorenz curve is presented in

$$\hat{L}(p) = 0.001462 + 0.7056 p + 0.28469 p^2 - 0.33355 p^4 + 0.34561 p^6. \tag{5}$$

After integration, our method gives the estimate $\hat{G} = 0.141508$, which agrees well with the theoretical value given above.
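The regression step can be sketched in plain Python as an ordinary least-squares fit of a polynomial to the decile points of the Pareto Lorenz curve. This sketch makes two assumptions that differ from the reported analysis: it uses all powers up to six rather than a selected subset of terms, and it solves the normal equations directly instead of using Excel trend lines. Its coefficients and estimate are therefore not expected to reproduce (5) or the reported $\hat{G}$ exactly.

```python
def pareto_lorenz(p, alpha):
    """Lorenz curve of the Pareto model: L(p) = 1 - (1 - p)^((alpha - 1) / alpha)."""
    return 1 - (1 - p) ** ((alpha - 1) / alpha)


def fit_polynomial(x, y, degree):
    """Least-squares coefficients b[0..degree] from the normal equations (X'X) b = X'y."""
    m = degree + 1
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(xi ** (r + c) for xi in x) for c in range(m)]
        + [sum(yi * xi ** r for xi, yi in zip(x, y))]
        for r in range(m)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(a[r][c] * b[c] for c in range(r + 1, m))
        b[r] = (a[r][m] - tail) / a[r][r]
    return b


alpha = 4
p = [i / 10 for i in range(11)]                 # deciles
lorenz_values = [pareto_lorenz(x, alpha) for x in p]

coefficients = fit_polynomial(p, lorenz_values, degree=6)
g_hat = 1 - 2 * sum(c / (j + 1) for j, c in enumerate(coefficients))
g_true = 1 / (2 * alpha - 1)
print(f"fitted G = {g_hat:.6f}, theoretical G = {g_true:.6f}")
```

The normal equations for powers of $p$ on the unit interval are poorly conditioned, which is tolerable at degree six in double precision; for higher degrees an orthogonal-polynomial or QR-based fit would be the safer choice.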

The Chotikapanich model [

$$L(p) = \frac{e^{kp} - 1}{e^{k} - 1} \quad \text{for } k > 0.$$

The Gini index is

$$G = \frac{(k-2)e^{k} + 2 + k}{k(e^{k} - 1)}.$$

For given $\mu$, $F(x) = \frac{1}{k}\ln\left(\frac{x(e^{k} - 1)}{\mu k}\right)$ [

In this study, we assume that k = 5 . Hence, the Gini index is G = 0.613567 . The Lorenz curve is presented in

The estimated regression model is

$$\hat{L}(p) = -0.0000079 + 0.264809 p^2 + 0.735051 p^6. \tag{6}$$

The estimated Gini index based on this polynomial is $\hat{G} = 0.613462$, indicating good agreement with the theoretical value.
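The Chotikapanich results for $k = 5$ can be checked numerically with a short sketch. It evaluates the closed-form Gini index, integrates the regression polynomial (6) term by term, and verifies the distribution function through the relation $x = \mu L'(p)$. The value $\mu = 1$ and the test point $p = 0.3$ are arbitrary assumptions used only for the check.

```python
import math


def gini_chotikapanich(k):
    """Closed-form Gini index of the Chotikapanich Lorenz curve."""
    return ((k - 2) * math.exp(k) + 2 + k) / (k * (math.exp(k) - 1))


def lorenz_slope(p, k):
    """Derivative L'(p) of L(p) = (e^(kp) - 1) / (e^k - 1)."""
    return k * math.exp(k * p) / (math.exp(k) - 1)


def distribution_function(x, k, mu):
    """F(x) = (1/k) ln(x (e^k - 1) / (mu k))."""
    return math.log(x * (math.exp(k) - 1) / (mu * k)) / k


k = 5.0
g_theoretical = gini_chotikapanich(k)

# Term-by-term integration of the regression polynomial (6): {power: coefficient}.
model_6 = {0: -0.0000079, 2: 0.264809, 6: 0.735051}
g_regression = 1 - 2 * sum(c / (j + 1) for j, c in model_6.items())

# Round trip: the income at quantile p is x = mu * L'(p), and F(x) should return p.
mu, p_test = 1.0, 0.3            # illustrative values, not from the study
x_test = mu * lorenz_slope(p_test, k)
p_back = distribution_function(x_test, k, mu)

print(f"theoretical G = {g_theoretical:.6f}, regression G = {g_regression:.6f}")
print(f"F(x(p)) for p = {p_test}: {p_back:.6f}")
```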

The Lorenz curve of the Gupta model is $L(p) = p\,\beta^{p-1}$ with the Gini coefficient

$$G = 1 - \frac{2}{\ln(\beta)}\left[1 - \frac{\beta - 1}{\beta\ln(\beta)}\right]. \tag{7}$$

Despite the Gupta model being relatively simple, the corresponding income distribution is not attainable. The explanation of this is that the variable p is included in the model both as a factor and exponent. If β = 5 , then the theoretical numerical value is G = 0.375021 . The Lorenz curve is presented in

$$\hat{L}(p) = -0.000189 + 0.211948 p + 0.235348 p^2 + 0.459725 p^3 + 0.09329 p^6. \tag{8}$$

The estimated Gini index based on this polynomial is $\hat{G} = 0.375015$, and the agreement with the theoretical value is acceptable.
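Since the Gupta income distribution is not available in closed form, the Gini formula (7) can at least be checked against a direct numerical integration of the Lorenz curve. The sketch below uses a midpoint rule with a fine grid; the grid size and function names are assumptions made for illustration.

```python
import math


def gupta_lorenz(p, beta):
    """Gupta Lorenz curve L(p) = p * beta^(p - 1)."""
    return p * beta ** (p - 1)


def gupta_gini(beta):
    """Closed-form Gini coefficient (7)."""
    log_beta = math.log(beta)
    return 1 - (2 / log_beta) * (1 - (beta - 1) / (beta * log_beta))


def gini_midpoint(lorenz, n=100_000):
    """Gini coefficient from a midpoint-rule integral of the Lorenz curve over (0, 1)."""
    h = 1 / n
    area = h * sum(lorenz((i + 0.5) * h) for i in range(n))
    return 1 - 2 * area


beta = 5.0
g_closed = gupta_gini(beta)
g_numeric = gini_midpoint(lambda p: gupta_lorenz(p, beta))
print(f"closed form G = {g_closed:.6f}, numerical G = {g_numeric:.6f}")
```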

The Lorenz curves for the Pareto, Chotikapanich and Gupta models are presented in deciles, and therefore, we can compare their regression results. In

The obtained results concerning theoretical models are acceptable, but in order to check the method it must also be applied to empirical data. We choose from the literature empirical data for which G values have previously been estimated with good accuracy.

The Lorenz curve is based on the data given by Ogwang [

The estimated regression model is

$$\hat{L}(p) = 0.000822 + 0.144475 p + 1.036686 p^2 - 0.523957 p^3 + 0.340210 p^6. \tag{9}$$

The estimated Gini index based on the polynomial (9) is $\hat{G} = 0.3275$. This value lies well within the interval proposed by Ogwang.

Tepping estimated an accurate Gini coefficient from the Current Population Survey (CPS) data from 1968 [

$$\hat{L}(p) = -0.0000079 + 0.264809 p^2 + 0.735051 p^6. \tag{10}$$

After integration, the Gini estimate is 0.4005. This finding is very close to Tepping’s result (

Lorenzen [

$$\hat{L}(p) = 0.000913 + 0.160800 p + 0.961946 p^2 - 0.647743 p^4 + 0.514366 p^6. \tag{11}$$

However, we obtained the estimate $\hat{G} = 0.308212$.
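The reported estimates for the empirical models (9) and (11) can be recomputed from the printed coefficients with a short sketch. It also evaluates each polynomial at $p = 0$ and $p = 1$ to show how far the regression curve is from the exact end points $(0, 0)$ and $(1, 1)$. The dictionary layout and function names are assumptions made for illustration.

```python
# Regression polynomials as {power: coefficient}, copied from (9) and (11).
models = {
    "Ogwang (9)": {0: 0.000822, 1: 0.144475, 2: 1.036686, 3: -0.523957, 6: 0.340210},
    "Lorenzen (11)": {0: 0.000913, 1: 0.160800, 2: 0.961946, 4: -0.647743, 6: 0.514366},
}


def evaluate(terms, p):
    """Value of the regression polynomial at p."""
    return sum(c * p ** j for j, c in terms.items())


def gini(terms):
    """Gini estimate from term-by-term integration over (0, 1)."""
    return 1 - 2 * sum(c / (j + 1) for j, c in terms.items())


results = {}
for name, terms in models.items():
    results[name] = (gini(terms), evaluate(terms, 0.0), evaluate(terms, 1.0))
    g_hat, at_zero, at_one = results[name]
    print(f"{name}: G = {g_hat:.6f}, L(0) = {at_zero:.6f}, L(1) = {at_one:.6f}")
```

Because the regression polynomial is not constrained to pass through the end points, the fitted values at 0 and 1 differ slightly from 0 and 1.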

In all cases the proposed methods seem to be a good alternative to earlier methods presented in the literature.

The comparison between different estimation methods is in general difficult to perform. These difficulties are mainly caused by the fact that the true Gini coefficient is unknown, but sometimes, where more detailed studies have already resulted in accurate estimates, the comparisons are possible. Such comparison problems are eliminated if the numerical estimations are applied to theoretical distributions. Therefore, when one introduces a new method one must base it on theoretical Lorenz curves with known exact theoretical Gini indices [

The first model in this study is the Pareto model analysed by Rasche et al. [

The step from the Lorenz curve to distribution function is more difficult than that from distribution function to the Lorenz curve. There is a difference between advanced and simple Lorenz models. Advanced models yield a better fit to data, but are difficult to connect to exact income distributions. Simple one-parameter models can more easily be associated with the corresponding income distribution, but when statistical analyses are performed the goodness of fit is often poor.

In order to perform comparisons between the estimated and theoretical Gini coefficients, Fellman [

Fellman [

This study was in part supported by a grant from the “Magnus Ehrnrooths Stiftelse” Foundation.

Fellman, J. (2018) Regression Analyses of Income Inequality Indices. Theoretical Economics Letters, 8, 1793-1802. https://doi.org/10.4236/tel.2018.810117