1. Introduction
The Pareto distribution is a non-normal continuous distribution that has significant applications in the social and behavioral sciences (Barnoy & Reich, 2022; Feng, Deng, Chen, Perc, & Kurths, 2020). However, its explanation is often theoretical and confusing, primarily due to its presentation under varying parameterizations that are frequently interchanged (Sarabia, Jorda, & Prieto, 2019). These include the classic two-parameter form, which is the most widely used (type I: scale parameter xm and shape parameter α); the three-parameter forms (type II: location parameter μ, xm, and α; and type III: μ, xm, and shape parameter γ); and the four-parameter form (type IV: μ, xm, α, and γ). Additionally, the distribution is predominantly developed with an economic focus (Barczy, Nedényi, & Sütő, 2023). The purpose of this article is, therefore, to present the type I Pareto distribution in a clear, comprehensible, and illustrative manner for social researchers, including psychologists, sociologists, and social workers.
The type I Pareto (1896; 1897) distribution was developed between 1896 and 1897 by the Italian civil engineer, economist, and sociologist Vilfredo Federico Damaso Pareto (1848-1923) during his political economy course at the University of Lausanne (UNIL) in Switzerland. It was initially conceived to describe the distribution of land, wealth, and income within a society. From these studies, the 80/20 principle was derived, positing that 80% of wealth is concentrated among 20% of the population. Notably, both the distribution and the inequality principle are now applied as probabilistic models for similar variables (Charpentier & Flachaire, 2022; Feng et al., 2020), although contemporary inequality levels tend to be lower, with a maximum ratio of 70/30 (McCarthy & Winer, 2019).
Furthermore, the Pareto distribution and inequality rule have found applications beyond economics, serving as models in fields such as psychology (Campbell & Brauer, 2021; Rajeev, 2022), virtual education (Valkanas & Diamandis, 2022), engineering (Chen, Zhang, Wang, Jiang, & Liu, 2019; Sudharson et al., 2022; Sudharson & Prabha, 2019), climatology (Le Gall, Favre, Naveau, & Prieur, 2022), and physics (Rácz et al., 2023; Cheng et al., 2023). For instance, claims to insurers for accidents and illnesses often follow a Pareto distribution, prompting these companies to impose disproportionately high premiums on certain sectors of the population, such as older adults (Diawara, Kane, Dembele, & Lo, 2021; Zhang, Wu, & Yao, 2022).
2. Characterization of the Distribution
2.1. Parameters and Support
This continuous distribution, belonging to the family of power distributions, is defined by two parameters in its simplest or type I form (Ahmad & Almetwally, 2020). The first parameter, a scale parameter often denoted as xm (though its notation varies widely), represents both the peak or mode and the minimum value of the distribution (Chattamvelli & Shanmugam, 2021). The second parameter, a shape parameter denoted as α (Fedotenkov, 2020), is commonly referred to as the Pareto tail index (Andria, 2022). The parameter space for both spans the interval (0, ∞), while the support of the distribution is defined over the interval [xm, ∞).
The profile of its density function depicts a curve that sharply decreases from its maximum value and becomes asymptotic to the horizontal axis, exhibiting positive skewness and leptokurtosis with a long right tail (Figure 1). The value of α that precisely models the 80/20 law is log4(5) = ln(5)/ln(4) ≈ 1.161, which is considered too extreme for wealth distributions, where α is typically greater than 1.5 (Yang & Zhou, 2022).
Figure 1. Density function fX(x) and cumulative distribution function FX(x) of hourly income modeled by a type I Pareto distribution with scale parameter xm = 40 and shape α = log4 (5).
2.2. Functions
2.2.1. Density Function
The definite integral of the density function allows the calculation of the probability that a value x of the continuous random variable X falls within a given integration interval, which must lie within the domain of the distribution. It is denoted by fX(x). The analytical expression of this function for the type I Pareto distribution is shown in Equation 1.
(1)
2.2.2. Cumulative Distribution Function
This function provides the probability that a continuous random variable X takes a value between the lower limit of the distribution and a specified value x, where x belongs to the domain of the distribution. it is denoted by FX(x). The analytical expression of this function for the type I Pareto distribution is presented in Equation 2.
(2)
2.2.3. Complementary Cumulative Distribution Function
Also known as the tail function, it is denoted by
. This function provides the probability that a continuous random variable X takes a value between a specified value x and the upper limit of the distribution, where x belongs to the domain of the distribution. The analytical expression for this function in the case of the type I Pareto distribution is presented in Equation 3.
(3)
2.2.4. Quantile Function
It is the inverse of the cumulative distribution function. It is denoted by QX(x) or
. It provides the value of X that accumulates the probability corresponding to the given argument, where the argument takes values in the interval (0, 1). The analytical expression for this function in the case of the type I Pareto distribution is shown in Equation 4.
(4)
2.2.5. Moment Generating Function
The k-th derivative of this function, evaluated at the point 0, gives the k-th order moments of the distribution, where the k-th order moment is the mathematical expectation of the variable raised to the k-th power. It is denoted by MX(t). The moment generating function of the type I Pareto distribution is not defined for a non-trivial interval of t around 0, but it is defined for values of t ≤ 0. Its analytical expression is shown in Equation 5.
(5)
Upper incomplete gamma function
2.2.6. Characteristic Function
The characteristic function corresponds to the Fourier transform (with a sign inversion) applied to the analytical expression of the density function. It allows for the analysis of the behavior of the moments of a distribution. It is denoted by CX(t). The analytical expression of this function for the type I Pareto distribution is shown in Equation 6, where
and t ∊ R.
(6)
2.3. Descriptive Measures
2.3.1. Measures of Central Tendency
Mathematical expectation or arithmetic mean: This measure corresponds to the first (non-central) moment. For the type I Pareto distribution, it can be calculated using its two parameters, as shown in Equation 7. Its computation requires the shape parameter α to be greater than 1.
(7)
Geometric mean: It is the antilogarithm of the mathematical expectation of the logarithm of the values, typically using the natural base. For the type I Pareto distribution, it can be calculated from its two parameters, as shown in Equation 8.
(8)
Harmonic Mean: It is the reciprocal of the mathematical expectation of the reciprocals of the values. For the type I Pareto distribution, it can be calculated using its two parameters, as shown in Equation 9.
(9)
Median: It is the 0.5 quantile. For the type I Pareto distribution, it can be calculated using its two parameters, as shown in the Equation 10.
(10)
Mode: It corresponds to the peak of the distribution. The type I Pareto distribution is unimodal, with its single peak occurring at the scale parameter xm (Equation 11).
(11)
Non-central k-th moment: It is the mathematical expectation of the values raised to the k-th power. For the type I Pareto distribution, it can be calculated using its two parameters, as shown in Equation 12. For its calculation, the shape parameter α must be greater than the power k to which the values are raised.
(12)
2.3.2. Measures of Variation
Variance: It corresponds to the second central moment or the expected value of the squared differences from the arithmetic mean. For the type I Pareto distribution, it can be calculated using its two parameters, as shown in in Equation 13. Its computation requires the shape parameter α to be greater than 2.
(13)
Standard deviation: It is the square root of the variance (Equation 14).
(14)
Entropy: It corresponds to the mathematical expectation of the additive inverses of the logarithms of the densities, or the mathematical expectation of the information of the values. When the natural logarithm is used, entropy is expressed in natural units of information (nats), which is the most common. If base 10 is used, it is expressed in decimal units of information (dits); whereas, if base 2 is used, it is expressed in binary units of information (bits). For the type I Pareto distribution, it can be calculated using its two parameters, as shown in Equation 15.
(15)
2.3.2. Measures of Shape
Skewness: It corresponds to the third standardized central moment. Karl Pearson’s original notation is used:
(Pearson, 1895). For the type I Pareto distribution, it can be calculated using the shape parameter α, as shown in Equation 16. Its computation requires this parameter to be greater than 3.
(16)
Excess Kurtosis: It corresponds to the standardized fourth central moment minus the kurtosis value of the normal distribution, which is 3. Karl Pearson’s original notation is used: β2 − 3 (Pearson, 1905). For the type I Pareto distribution, it can be calculated using the shape parameter α, as shown in Equation 17. Its computation requires this parameter to be greater than 4.
(17)
3. Parameter Estimation
Let X be a random sample of size n from a continuous quantitative variable that follows a type I Pareto distribution with parameters xm and α. For example, the sample data could consist of records of the monthly salaries of randomly chosen workers in a large firm. Refer to Equation 18 for its notation.
(18)
3.1. Estimator of α by the Method of Moments
The estimator of the shape parameter α by the method of moments is obtained from the mathematical expectation or arithmetic mean of X, if the value of the location parameter xm, which is the peak and minimum value of the distribution, is known. This mathematical expectation corresponds to the quotient of the sample mean (numerator) and the difference between the sample mean and the known parameter xm (denominator). It is valid when α > 2. For more details, refer to Equation 19.
(19)
The point estimator converges to a normal distribution, since X has a distribution with finite moments (Rao, 1973), which allows for the computation of an asymptotic interval estimator (Equation 20).
(20)
3.2. Estimators of xm and α by the Maximum Likelihood Method
The estimator by the maximum likelihood method of the scale parameter xm is the minimum sample value, and the estimator of the shape parameter α is the inverse of the arithmetic mean of the logarithms of the ratios between each value xi and the minimum sample value (Siudem, Nowak, & Gagolewski, 2022). See Equation 21 and Equation 22 for these estimators, respectively.
(21)
(22)
The Fisher information for n data points for these two parameters is given by the 2 × 2 matrix in Equation 23.
(23)
The variance of a parameter θ, Var(θ), is always greater than or equal to its Cramér-Rao lower bound CRLB(θ) or the inverse of its Fisher information for n data, 1/I(θ) (Xu, Sedory, & Singh, 2022). Refer to Equation 24.
(24)
The estimators obtained by the maximum likelihood method have asymptotic properties of unbiasedness (Equation 25), consistency (Equation 26), efficiency (Equation 27), and normality (Equation 28), which make them very useful (Song, Roung-Park, & Yoon, 2022).
(25)
(26)
(27)
(28)
Based on these asymptotic properties, asymptotic standard errors (aes) can be defined for the maximum likelihood estimators of xm (Equation 29) and α (Equation 31), along with asymptotic confidence intervals (Equation 30 for the estimator of xm and Equation 32 for the estimator of α).
(29)
(30)
(31)
(32)
The estimator of α by the maximum likelihood method is more efficient than that of the method of moments, but it is biased. Given this limitation, a bias-corrected estimator can be defined that has a lower variance than the biased estimator (Rytgaard, 1990) and can be used to achieve more efficient asymptotic estimation (Equation 33). As with the previous definitions, the use of these asymptotic formulas requires a large sample, specifically one larger than 30 and preferably at least 100 (Mateus & Caeiro, 2022). See the variance, standard error, and asymptotic confidence interval of the bias-corrected estimator in Equations 34, 35, and 36, respectively.
(33)
(34)
(35)
(36)
3.3. Exact Distributions of the Parameters
Let X be a random variable following a type I Pareto distribution with scale parameter xm and shape parameter α. The exact distribution of the sum of the logarithms of each of the n sample data points of X, divided by the minimum sample value, follows an exponential distribution with a rate parameter equal to α (Rytgaard, 1990). See Equation 37.
(37)
The exact distribution of the maximum likelihood estimator of the scale parameter xm is a type I Pareto distribution with a scale parameter xm and a shape parameter n × α, while the exact distribution of the maximum likelihood estimator of the shape parameter α is an inverse gamma distribution with a shape parameter n - 1 and a scale parameter n × α (Qian, Chen, & He, 2021). See Equation 38.
(38)
4. Generalized Form of the Pareto Distribution
There is a generalized form of the four-parameter Pareto distribution with location parameter μ, scale parameter σ, and two shape parameters: the Pareto tail index α and the inequality index γ. Its cumulative distribution function is provided in Equation 39 (Arnold, 2015).
Support:
Parameter space:
and
and
.
(39)
The location parameter μ is not the mathematical expectation or arithmetic mean of the distribution, and the scale parameter σ is not the standard deviation. The mean depends on the parameters σ, α, and γ, as well as certain gamma functions, and requires that γ < α (Equation 40).
(40)
When μ = σ = xm (minimum and mode) and γ = 1, this generalized four-parameter form reduces to the type I Pareto distribution (Equation 41).
(41)
In the type II Pareto distribution, γ = 1 (Equation 42). When the parameter μ of the type II Pareto distribution is equal to 0, it is referred to as the Lomax (1954) distribution.
(42)
In the type III Pareto distribution, α = 1 (Equation 43).
(43)
5. Relationship between Type I Pareto Distribution and
Other Distributions
Let X be a random variable following a type I Pareto distribution with parameters xm and α. The product of the scale parameter xm and an exponential distribution based on the number e and exponent X (random variable Y) follows an exponential distribution with a rate parameter or inverse scale λ = α. Conversely, let Y be a random variable with an exponential distribution with rate parameter or inverse scale λ. The natural logarithm of the quotient of the random variable Y and the value xm (random variable X) follows a type I Pareto distribution with a scale parameter xm and a shape parameter α = λ. This implies that the cumulative distribution functions of X and Y are equal and interchangeable when calculating cumulative probabilities. Refer to Equation 44.
(44)
It should be noted that there is a relationship between the exponential and Pareto distributions that is analogous to the one between the normal and lognormal distributions. The Pareto and lognormal distributions can be applied to the same data, although one may provide a better fit than the other. Both are exponential transformations of a more widely known and used variable: the former from an exponential distribution and the latter from a normal distribution (Feng et al., 2020).
The zeta distribution is the discrete equivalent of the Pareto distribution. If the support of the zeta distribution is bounded within an interval of natural numbers, it becomes the so-called Zipf distribution (Arnold, 2015).
6. Relationship between Type I Pareto Distribution and Gini
Concentration Index
The Pareto (1897) tail index is related to the Gini concentration index (Safari, Masseran, Ibrahim, & Hussain, 2019). Corrado Gini (1936), created this index to summarize in a single number the information contained in the Lorenz curve (Mojiri & Ahmadi, 2022). The Lorenz (1905) curve is obtained from a two-dimensional diagram that represents the number of people on the horizontal axis (x) and the accumulated income or wealth on the vertical axis, FX(x). A straight line with a 45-degree slope, known as the line of equality, is included in this graph. As the curve, defined by the coordinate pairs (x, FX(x)), diverges from this line, inequality increases. The Gini index is the ratio of the area between the line of equality and the Lorenz curve to the area of the lower right triangle. A value of 0 indicates total equality, which occurs when all individuals have the same income. A value of 1 reflects maximum inequality, occurring when one person receives all the income and the others have no income or are unpaid labor (Sitthiyot & Holasut, 2021).
In countries with more income equality, such as the Scandinavian countries, the Gini index is less than 0.3. In countries with greater inequality, such as some African nations (South Africa, Namibia, Suriname, Zambia, Eswatini, Botswana, Angola, and Zimbabwe) and countries in the Americas (Belize, Brazil, Colombia, and Panama), the Gini index is higher than 0.5 (World Bank, 2022). It should be noted that countries with communist economic systems do not have low Gini indices, and social democratic policies involving increased public spending and debt tend to increase rather than decrease inequality (Tokhirov, 2021). Thus, left-wing populist policies have been described as policies of poverty equalization, leading to the emergence of a wealthy political oligarchy (Benczes, 2022; Landoni & Villegas, 2022).
The value on the Lorenz curve at point x for a continuous random variable X with density function fX(x) is obtained by the following proportion given in Equation 45.
(45)
When applied to the type I Pareto distribution, the above formula is transformed into the expression shown in Equation 46.
(46)
The Gini index can be expressed as one minus twice the integral from 0 to 1 of the Lorenz curve. When applied to the type I Pareto distribution, it is the inverse of twice the tail parameter minus one, 1/(2α − 1), whenever α > 1. When α < 1, the Gini index takes its minimum value, which is 1 (Equation 47). Table 1 shows the correspondence between values of the Pareto tail and Gini concentration indices and their interpretation.
(47)
Table 1. Correspondence between Gini and Pareto indexes and their interpretation.
G |
α |
Interpretation |
0.05 |
10.5 |
Very little inequality |
0.1 |
5.5 |
|
0.15 |
3.833 |
Little inequality |
0.2 |
3 |
|
0.25 |
2.5 |
|
0.3 |
2.167 |
|
0.35 |
1.929 |
Moderate inequality |
0.4 |
1.75 |
|
0.45 |
1.61 |
Fairly significant inequality |
0.5 |
1.5 |
|
0.55 |
1.409 |
A lot of inequality |
0.6 |
1.333 |
|
0.65 |
1.269 |
|
0.7 |
1.214 |
|
0.75 |
1.167 |
Very much inequality |
0.8 |
1.125 |
|
0.85 |
1.088 |
|
0.9 |
1.056 |
|
0.95 |
1.026 |
|
1 |
1 |
|
Note. G = Gini concentration index and α = Pareto tail index.
7. Generation of Type I Pareto Random Samples and
Goodness-of-Fit Testing
Let U be a random variable with a standard uniform distribution: U ⊆ U [0, 1]. The transformation xm(1 − U)−1/α follows a type I Pareto distribution with scale parameter xm and shape parameter α. This procedure allows for obtaining a random sample with a type I Pareto distribution from a random sample of a standard uniform variable and is called inverse transform sampling (Ross, 2022).
To test whether the sample data fit the type I Pareto distribution, Chu, Dickin and Nadarajah (2019) recommend the Kolmogorov-Smirnov test based on a simulation study. The simplest way to apply this inferential test is to transform the data, assuming a type I Pareto distribution, into values with an exponential distribution by taking the natural logarithm of the ratio of each data point to the minimum sample value. Then, the Kolmogorov-Smirnov test, adapted for exponentially distributed samples, is applied (Stephens, 1974). The inferential test can be supplemented with a graphical assessment using a quantile-quantile plot and a histogram with an overlaid density curve. In the former, an alignment of points along a straight line with a 45-degree slope is sought, and in the latter, an inverted J-shaped profile is expected (Bhoj & Chandra, 2021).
Statistical hypotheses: H0: X ~ Pareto(xm, α) ≡ Y = ln[X/min(X)] ~ Exponential(λ = α) y H1: X ≁ Pareto(xm, α).
Assumptions: Random sample of size n of a continuous quantitative variable X.
Test statistic: The data are sorted in ascending order, assigned ranks or orders, and transformed to follow an exponential distribution under the assumption that the null hypothesis is true.
x(1) ≤ |
x(2) ≤ |
… ≤ |
x(i) ≤ |
… ≤ |
x(n-1) ≤ |
x(n) |
1 |
2 |
… |
i |
… |
n-1 |
n |
ln[x(1)/x(1)] |
ln[x(2)/x(1)] |
… |
ln[x(i)/x(1)] |
… |
ln[x(n-1)/x(1)] |
ln[x(n)/x(1)] |
The cumulative probability (theorical cumulative relative frequency) under the type I Pareto distribution model (Equation 48) and the (empirical) cumulative relative frequency (Equation 49) are calculated for each transformed value, along with the D+, D−, and D statistics (Equation 50).
(48)
(49)
(50)
The sample estimation correction to the D statistic, as given by Stephens (1974), is applied. See Equation 51.
(51)
The decision is based on the transformed D statistic (Equation 48). If Dc ≤ Dα, H0 holds, and if Dc > Dα, H0 is rejected at a significance level of α. The critical Dα values depend on the significance level α (Stephens, 1974), which is typically set at .05 (D0.05 = 1.094). For small samples (n < 20), it may be increased to .10 (D.10 = .990), and for large samples (n > 500), it may be reduced to .01 (D.01 = 1.308).
Another option for inferential testing is the Anderson & Darling (1952) test. The formula for its test statistic is shown in Equation 52. Similar to the Kolmogorov-Smirnov test, the data are transformed to follow an exponential distribution (Equation 48), and Stephens’ (1986) correction is applied to the test statistic when the parameters are estimated from the sample data (Equation 53).
(52)
(53)
As in the Kolmogorov-Smirnov test, the decision is based on the corrected A2 statistic, denoted as
. If
(critical value), H0 is retained; if
, H0 is rejected at a significance level of α. The critical values Aα2 depend on the type of distribution (in this case, exponential), the estimation method (in this case, maximum likelihood estimator), and the significance level α, which is typically set at .05 (A.05 = 1.321). For small samples (n < 20), the significance level α can be increased to .10 (A.1 = 1.062), while for large samples (n > 500), it can be reduced to .01 (A.01 = 1.959). Refer to Stephens (1986).
A modification of the Anderson-Darling test was developed by Sinclair, Spurr and Ahmad (1990) for distributions with positive skewness and highly atypical cases in the right tail. The type I Pareto distribution falls into this category; hence, this modification can be applied. The Sinclair-Spurr-Ahmad statistic is calculated directly from the original sample data, without transformation (Equation 54), and is denoted as
.
(54)
The critical value for decision making is calculated with the formula shown in Equation 55.
(55)
If
, H₀ holds: X ~ Pareto (xm, α) at a significance level of p. If
, H₀ is rejected. As the sample size approaches infinity, the value of u becomes unitary, and the critical value
reaches its asymptotic value:
= 0.356 for p = .1,
= 0.432 for p = .05, and
= 0.610 for p = .01.
8. Example Calculations Using the Type I Pareto Distribution
Next, simulated data are used in two examples illustrating the application of the distribution. In the first example, probabilities, descriptive statistics, and the representation of the distribution are calculated for a variable 𝑋 that follows a type I Pareto distribution, characterized by the parameters xm = 500 and α = 4.1, representing the biweekly salary (in dollars) within a company. In the second example, the data are generated using the inverse sampling transform in Excel to follow a type I Pareto distribution (xm =500, α = 4.1). Based on these generated data, the distribution parameters are estimated both pointwise and using confidence intervals. Additionally, the goodness of fit is assessed using statistical tests and graphical methods.
8.1 Calculation of Probabilities and Descriptive Measures
Let X be a variable with a type I Pareto distribution characterized by parameters xm = 500 and α = 4.1, representing the biweekly salary in dollars within a company. The task is to calculate the probability of having an income less than or equal to 592, less than 1000, between 800 and 1400, greater than or equal to 600, and greater than 1500. Additionally, the following measures of central tendency are required: the mathematical expectation or arithmetic mean μ(X), geometric mean μg(X), harmonic mean μh(X), median Mdn(X), and mode Mo(X); measures of variation such as variance σ2(X), standard deviation σ(X), and entropy H(X); as well as shape measures based on standardized central moments, including skewness
and excess kurtosis β2(X). Furthermore, the Gini concentration index G(X) is to be calculated. Finally, it is considered illustrative to plot its density function fX(x) and cumulative distribution FX(x).
Equation 2 is applied for the calculation of the probability of having an income less than or equal to 592.
Equation 2 is also used to calculate the probability of having an income less than 1000. Since it is a continuous distribution, including or excluding the specific point does not affect the value of the cumulative probability.
Equation 2 is used to calculate the probability of having an income between 800 and 1400 by taking the difference between two cumulative probabilities.
Equation 3 is used to calculate the probability of having an income greater than or equal to 600.
Equation 3 is also used to calculate the probability of having an income greater than 1500. Since it is a continuous distribution, including or excluding the specific point does not affect the complementary or right-tailed probability.
Measures of central tendency. The arithmetic mean is calculated using Equation 7, the geometric mean using Equation 8, the harmonic mean using Equation 9, the median using Equation 10, and the mode using Equation 11.
Measures of variation. Variance is calculated using Equation 13, standard deviation using Equation 14, and entropy using Equation 15.
Skewness and Excess Kurtosis Based on Central Moments. These two shape statistics are calculated using Equations 16 and 17, respectively.
The Gini Index is obtained using Equation 47.
The Gini index is close to 0, indicating fairly equal wages within the firm. Figure 2 displays the density function fX(x) and the cumulative distribution FX(x) of the random variable X, representing biweekly wages (in dollars). The wages follow a type I Pareto distribution with parameters xm = 500 and α = 4.1.
Figure 2. Density function fX(x) and cumulative distribution function FX(x) for X ~ Pareto(xm = 500, α = 4.1).
The R script to calculate these probabilities, descriptive statistics, and the plot of the density and cumulative distribution functions of the type I Pareto distribution (xm = 500 and α = 4.1) is shown below:
# Load required libraries (packages)
library(cascsim)
library(modeest)
cat ("Probability calculations (rounding to four decimal places)", "\n")
cat ("Cumulative probability less than or equal to 592 =", round(ppareto(592, xm = 500, alpha = 4.1), 4), "\n")
cat ("Cumulative probability less than 1000 =", round(ppareto(1000, xm = 500, alpha = 4.1), 4), "\n")
cat ("Probability in the interval [800, 1400] =", round(ppareto(1400, xm = 500, alpha = 4.1) - ppareto(800, xm = 500, alpha = 4.1), 4), "\n")
cat ("Probability greater than or equal to 600 =", round(1 - ppareto(600, xm = 500, alpha = 4.1), 4), "\n")
cat ("Probability greater than 1500 =", round(1 - ppareto(1500, xm = 500, alpha = 4.1), 4), "\n")
# Descriptive measures for type I Pareto distribution (rounding to three decimal places).
x_m <- 500 # Scale parameter
alpha <- 4.1 # Shape parameter
cat ("Measures of central tendency", "\n")
cat ("Mathematical expectation of X: μ(X) =", round(alpha / (alpha - 1) * x_m, 3), "\n")
cat ("Geometric mean of X: μg(X) =", round(x_m * exp(1 / alpha), 3), "\n")
cat ("Harmonic mean of X: μh(X) =", round((1 + 1 / alpha) * x_m, 3), "\n")
cat ("Median de X: Mdn(X) =", round(qpareto(0.5, xm = 500, alpha = 4.1), 3), "\n")
cat ("Mode of X: Mdn(X) =", round(qpareto(0.5, xm = 500, alpha = 4.1), 3), "\n")
cat ("Measures of variation ", "\n")
cat ("Variance of X: σ^2(X) =", round(x_m^2 * alpha /(( alpha -1)^2 * (alpha - 2)), 3), "\n")
cat ("Measures of variation ", "\n")
cat ("Standard deviation of X: σ(X) =", round(sqrt(x_m^2 * alpha /(( alpha -1)^2 * (alpha - 2))), 3), "\n")
cat ("Entropy of X: Η(X) =", round(log(x_m / alpha * exp(1+ 1/ alpha)), 3), "nats", "\n")
cat ("Shape measures", "\n")
cat("Skewness of X: √b1(X) =", round(2 * (alpha + 1) / (alpha - 3) * sqrt((alpha - 2) / alpha), 3), "\n")
cat ("Kurtosis of X: b2(X) =", round(6* (alpha^3 + alpha^2 - 6 * alpha - 2) / (alpha * (alpha - 3) * (alpha - 4)), 3), "\n")
cat ("Gini Index of X: G(X) =", round(1 / (2 * alpha - 1), 3), "\n")
# Plot of density and cumulative distribution functions.
x <- seq(0, 2000, length.out = 2000)
dpareto <- function (x, x_m, alpha) {
ifelse (x < x_m, 0, alpha * x_m^alpha / (x^(alpha + 1)))}
ppareto2 <- function (x, x_m, alpha) {ifelse(x < x_m, 0, 1 - (x_m / x)^alpha)}
density <- dpareto(x, x_m, alpha)
cumulative <- ppareto(x, x_m, alpha)
par (mar = c(4, 4, 1, 2) + 0.1)
plot (x, density, type = "l", col = "blue", lwd = 2, xlab = "Biweekly wages (in dollars)",
ylab = "Density", ylim = c(0, max(density)))
par (new = TRUE)
plot (x, cumulative, type = "l", col = "red", lwd = 2, axes = FALSE, xlab = "", ylab = "")
axis (4)
mtext ("Cumulative Probability", side = 4, line = 3, col = "red")
legend ("right", legend = c("Density", "Cumulative"),
col = c ("darkblue", "red"), lwd = 2, bty = "n", y.intersp = 1.5)
8.2. Random Sample Generation, Parameter Estimation, and
Goodness of Fit
In this second example, the aim is to generate a random sample of 40 observations from the variable X ~ Pareto (500, 4.1). Based on this sample, the goal is to estimate the parameters xm and α using maximum likelihood estimators, and to construct 95% confidence intervals for these parameters based on their asymptotic distributions and errors. Finally, the goodness-of-fit will be assessed through a plot of theoretical versus empirical quantiles, a histogram with an overlaid density curve, and the Kolmogorov-Smirnov and Anderson-Darling tests.
In the first column of Table 2, a random sample of 40 ui values is drawn from a standard uniform distribution U ~ U [0, 1], generated by means of the 2021 version of the Excel random number generator. Using the quantile function, the ui values are transformed into a random sample of size 40 for a variable X that follows a type I Pareto distribution with population parameters: xm = 500 (scale) and α = 4.1 (shape). This data generation process is known as inverse transform sampling, as illustrated in Equation 56. The first value of X obtained by applying Equation 56, starting at u1 = 0.226, is 532.239.
(56)
Table 2. Random sample generation and fit testing to the generating distribution.
Generation of X |
Kolmogorov-Smirnov test |
Q-Q plot |
ui |
xi |
(i) |
x(i) |
y(i) |
FY(y(i)) |
D+ |
D− |
p(i) |
xt(i) |
0.226 |
532.239 |
1 |
502.595 |
0 |
0 |
0.025 |
0 |
0.017 |
504.716 |
0.678 |
659.182 |
2 |
506.685 |
0.008 |
0.032 |
0.018 |
0.007 |
0.041 |
507.983 |
0.754 |
703.919 |
3 |
509.868 |
0.014 |
0.055 |
0.020 |
0.005 |
0.066 |
511.358 |
0.791 |
732.467 |
4 |
513.713 |
0.022 |
0.083 |
0.017 |
0.008 |
0.091 |
514.847 |
0.427 |
572.739 |
5 |
516.552 |
0.027 |
0.103 |
0.022 |
0.003 |
0.116 |
518.457 |
0.989 |
1502.032 |
6 |
519.474 |
0.033 |
0.123 |
0.027 |
−0.002 |
0.140 |
522.196 |
0.956 |
1071.113 |
7 |
524.641 |
0.043 |
0.156 |
0.019 |
0.006 |
0.165 |
526.073 |
0.356 |
556.651 |
8 |
529.753 |
0.053 |
0.188 |
0.012 |
0.013 |
0.190 |
530.097 |
0.651 |
646.362 |
9 |
532.239 |
0.057 |
0.203 |
0.022 |
0.003 |
0.215 |
534.278 |
0.592 |
622.201 |
10 |
540.797 |
0.073 |
0.252 |
−0.002 |
0.027 |
0.240 |
538.628 |
0.275 |
540.797 |
11 |
540.979 |
0.074 |
0.253 |
0.022 |
0.003 |
0.264 |
543.159 |
0.885 |
847.360 |
12 |
550.305 |
0.091 |
0.302 |
−0.002 |
0.027 |
0.289 |
547.886 |
0.921 |
928.627 |
13 |
551.504 |
0.093 |
0.308 |
0.017 |
0.008 |
0.314 |
552.824 |
0.524 |
599.242 |
14 |
556.651 |
0.102 |
0.333 |
0.017 |
0.008 |
0.339 |
557.991 |
0.145 |
519.474 |
15 |
560.951 |
0.110 |
0.353 |
0.022 |
0.003 |
0.364 |
563.406 |
0.726 |
685.652 |
16 |
569.140 |
0.124 |
0.389 |
0.011 |
0.014 |
0.388 |
569.093 |
0.053 |
506.685 |
17 |
571.286 |
0.128 |
0.398 |
0.027 |
−0.002 |
0.413 |
575.075 |
0.325 |
550.305 |
18 |
572.739 |
0.131 |
0.404 |
0.046 |
−0.021 |
0.438 |
581.384 |
0.421 |
571.286 |
19 |
588.961 |
0.159 |
0.466 |
0.009 |
0.016 |
0.463 |
588.050 |
0.578 |
617.102 |
20 |
598.325 |
0.174 |
0.498 |
0.002 |
0.023 |
0.488 |
595.114 |
0.489 |
588.961 |
21 |
599.242 |
0.176 |
0.501 |
0.024 |
0.001 |
0.512 |
602.620 |
0.021 |
502.595 |
22 |
609.166 |
0.192 |
0.533 |
0.017 |
0.008 |
0.537 |
610.619 |
0.924 |
937.437 |
23 |
617.102 |
0.205 |
0.556 |
0.019 |
0.006 |
0.562 |
619.175 |
0.331 |
551.504 |
24 |
622.201 |
0.213 |
0.570 |
0.030 |
−0.005 |
0.587 |
628.359 |
0.622 |
633.900 |
25 |
633.900 |
0.232 |
0.601 |
0.024 |
0.001 |
0.612 |
638.261 |
0.723 |
683.834 |
26 |
646.362 |
0.252 |
0.630 |
0.020 |
0.005 |
0.636 |
648.989 |
0.211 |
529.753 |
27 |
659.182 |
0.271 |
0.658 |
0.017 |
0.008 |
0.661 |
660.674 |
0.412 |
569.140 |
28 |
683.834 |
0.308 |
0.704 |
−0.004 |
0.029 |
0.686 |
673.482 |
0.077 |
509.868 |
29 |
685.652 |
0.311 |
0.707 |
0.018 |
0.007 |
0.711 |
687.625 |
0.276 |
540.979 |
30 |
703.919 |
0.337 |
0.736 |
0.014 |
0.011 |
0.736 |
703.373 |
0.521 |
598.325 |
31 |
721.765 |
0.362 |
0.761 |
0.014 |
0.011 |
0.760 |
721.090 |
0.822 |
761.718 |
32 |
732.467 |
0.377 |
0.775 |
0.025 |
0.000 |
0.785 |
741.265 |
0.125 |
516.552 |
33 |
761.718 |
0.416 |
0.807 |
0.018 |
0.007 |
0.810 |
764.590 |
0.971 |
1185.755 |
34 |
804.876 |
0.471 |
0.845 |
0.005 |
0.020 |
0.835 |
792.076 |
0.858 |
804.876 |
35 |
847.360 |
0.522 |
0.873 |
0.002 |
0.023 |
0.860 |
825.283 |
0.105 |
513.713 |
36 |
928.627 |
0.614 |
0.912 |
−0.012 |
0.037 |
0.884 |
866.783 |
0.376 |
560.951 |
37 |
937.437 |
0.623 |
0.915 |
0.010 |
0.015 |
0.909 |
921.249 |
0.555 |
609.166 |
38 |
1071.113 |
0.757 |
0.950 |
0.000 |
0.025 |
0.934 |
998.449 |
0.179 |
524.641 |
39 |
1185.755 |
0.858 |
0.967 |
0.008 |
0.017 |
0.959 |
1124.364 |
0.778 |
721.765 |
40 |
1502.032 |
1.095 |
0.987 |
0.013 |
0.012 |
0.983 |
1417.313 |
Σ |
|
|
|
10.108 |
|
|
|
|
|
max |
|
|
|
|
|
0.046 |
0.037 |
|
|
Note. Generation of X: ui = sample data in its random order (i = 1, 2, …, 40) drawn from a standard uniform distribution U [0, 1] and xi = 500(1 - ui)-1/4.1 = data transformed to follow a type I Pareto distribution (with scale parameter xm = 500 and shape parameter α = 4.1) via inverse transform sampling. Logarithmic transformation of X and Kolmogorov- Smirnov test: (i) = the order of data xi within the 40-item sample, x(i) = data xi sorted in ascending order (empirical quantiles), y(i) = ln(x(i)/502.595) = the logarithmic transformation of xi normalized by the sample minimum, FY(y(i)) = the cumulative distribution function of the Y variable following an exponential distribution with rate parameter λ = 3.957, D+ = (i)/40 - FY(y(i)) = the difference between the empirical and theoretical cumulative distribution functions and D- = FY(y(i)) - ((i)-1)/40 = the difference between the theoretical distribution function and empirical cumulative distribution function with lap of 1. Quantile-Quantile plot: p(i) = ((i)-1/3)/(n+1/3) = the order of the theoretical quantile and xt(i) = QX[p(i)] = theoretical quantiles calculated using the quantile function of a type I Pareto distribution with estimated parameters. Σ = sum per column, max = maximum value per column.
Next, the two parameters are estimated pointwise using maximum likelihood estimation (Equation 21 for thse estimate of xm and Equation 22 for the estimate of α), and their corresponding standard errors (Equation 29 for
and Equation 31 for
) and 95% asymptotic confidence intervals (Equation 30 for
and Equation 32 for
) are computed.
Estimation of the scale parameter (xm):
Estimation of the shape parameter (α):
The bias-corrected formula is also used to estimate the shape parameter α. (Equation 33). Initially, the point estimate of α appears less accurate than the unbiased one; however, the confidence interval is more efficient and appropriate. The standard error is calculated using Equation 35, and the asymptotic confidence interval is determined using Equation 36.
The following is the R script for point estimates and 95% asymptotic confidence intervals for the two parameters of the type I Pareto distribution.
# Vector of scores.
x <- c(532.2385845, 659.1817235, 703.9185616, 732.4668655, 572.7389011, 1502.031724, 1071.11279, 556.6512741, 646.3622704, 622.2013372, 540.796541, 847.3602594, 928.6272967, 599.2422898, 519.4737837, 685.6524203, 506.685297, 550.3047107, 571.2856056, 617.1023659, 588.9614346, 502.594959, 937.4374491, 551.5044233, 633.9000483, 683.8337772, 529.7526944, 569.1404291, 509.8675755, 540.9786305, 598.3247283, 761.718308, 516.5523976, 1185.754777, 804.8759389, 513.7129142, 560.9510722, 609.1662801, 524.6406256, 721.7654816)
# Calculation of statistics.
n <- length(x)
x_m_hat <- min(x)
y <- log(x / x_m_hat)
alpha_hat <- n / sum(y)
ase_x_m_hat <- x_m_hat /sqrt(n * alpha_hat)
LL_x_m <- x_m_hat - qnorm(0.975) * ase_x_m_hat
UL_x_m <- x_m_hat + qnorm(0.975) * ase_x_m_hat
ase_alpha_hat <- alpha_hat / sqrt(n)
LL_alpha_hat <- alpha_hat - qnorm(0.975) * ase_alpha_hat
UL_alpha_hat <- alpha_hat + qnorm(0.975) * ase_alpha_hat
alpha_c <- (n - 2) / sum(y)
ase_alpha_c <- alpha_c / sqrt(n - 2)
LL_alpha_c <- alpha_c - qnorm(0.975) * ase_alpha_c
UL_alpha_c <- alpha_c + qnorm(0.975) * ase_alpha_c
# Display results (rounding to three decimal places).
cat("sample size: n =", n, "\n")
cat("Estimate of the scale parameter: x_m_hat =", round(x_m_hat, 3), "\n")
cat("Asymptotic standard error for x_m_hat: ase(x_m_hat) =", round(ase_x_m_hat, 3), "\n")
cat("95% asymptotic confidence interval for x_m: 95% CI", "[", round(LL_x_m, 3), ",", round(UL_x_m, 3), "]", "\n")
cat("Estimate of shape parameter: alpha_hat =", round(alpha_hat, 3), "\n")
cat("Asymptotic standard error for alpha_hat: ase(alpha_hat) =", round(ase_alpha_hat, 3), "\n")
cat("95% asymptotic confidence interval for alpha: 95% CI", "[", round(LL_alpha_hat, 3), ",", round(UL_alpha_hat, 3), "]", "\n")
cat("Biased-corrected estimate of shape parameter: alpha_bc =", round(alpha_c, 3), "\n")
cat("Asymptotic standard error for alpha_bc: ase(alpha_bc) =", round(ase_alpha_c, 3), "\n")
cat("95% asymptotic confidence interval for alpha_bc: 95% CI", "[", round(LL_alpha_c, 3), ",", round(UL_alpha_c, 3), "]", "\n")
The goodness-of-fit test begins with inferential tests. The 40 data points, xi, following a type I Pareto distribution, are transformed using Equation 48 to follow an exponential distribution, yi. The Kolmogorov-Smirnov test statistic is then calculated by applying Equation 50 to the transformed sample. Subsequently, Stephens’ correction is applied (Equation 51). Table 2 presents the data transformation and the calculations to derive the statistics D+ (the maximum difference between the empirical and theoretical cumulative distribution functions), D− (the maximum difference between the theoretical cumulative distribution function and the empirical cumulative distribution function with a lag of 1), and D (the maximum of D+ and D−). After correcting D, it is found to be less than the critical value at a 5% significance level, supporting the null hypothesis of goodness-of-fit. Thus, the transformed sample data, yi, follow an exponential distribution, and the original data, xi, follow a type I Pareto distribution.
Statistical hypothesis
Data transformation
Test statistic
Test statistic with Stephens’ correction
Decision on the null statistical hypothesis of the type I Pareto distribution
The yi data are also used in the application of the Anderson-Darling test. These data are arranged in ascending order. For the calculation of the statistic A2 (Equation 52), the first value, which is 0, must be excluded because it corresponds to a cumulative probability of 0, and the logarithm of this probability is undefined. To compute the cumulative probabilities, the rate parameter λ is required, and it is estimated using its maximum likelihood estimator, which is the reciprocal of the sample mean of the remaining 39 data points (Table 3). The corrected A2 statistic, calculated using Equation 53, is less than the critical value at a 5% significance level (Stephens, 1986). Consequently, the null hypothesis holds, indicating that the transformed data yi = ln(xi/502.595) follow an exponential distribution and, therefore, the original data
follow a type I Pareto distribution.
Rate parameter estimate
Expected cumulative probability following an exponential distribution with a rate parameter of 3.858.
Anderson-Darling test statistic
Anderson-Darling test statistic with Stephens’ correction
Table 3. Calculations for deriving the Anderson-Darling A2 statistic from the yi values.
i |
y(i) |
(2i − 1)/39 |
FY (y(i)) |
Ln [FY (y(i))] |
y(n + 1 − i) |
FY[y(n + 1 − i)] |
Ln [1 − FY(y(n + 1 − i))] |
Ai |
1 |
0 |
0 |
0 |
|
|
|
|
|
2 |
0.008 |
0.026 |
0.031 |
−3.481 |
1.095 |
0.985 |
−4.224 |
−0.198 |
3 |
0.014 |
0.077 |
0.054 |
−2.920 |
0.858 |
0.964 |
−3.312 |
−0.479 |
4 |
0.022 |
0.128 |
0.081 |
−2.514 |
0.757 |
0.946 |
−2.919 |
−0.697 |
5 |
0.027 |
0.179 |
0.100 |
−2.300 |
0.623 |
0.910 |
−2.405 |
−0.844 |
6 |
0.033 |
0.231 |
0.120 |
−2.123 |
0.614 |
0.906 |
−2.369 |
−1.037 |
7 |
0.043 |
0.282 |
0.153 |
−1.880 |
0.522 |
0.867 |
−2.015 |
−1.099 |
8 |
0.053 |
0.333 |
0.184 |
−1.694 |
0.471 |
0.837 |
−1.817 |
−1.170 |
9 |
0.057 |
0.385 |
0.198 |
−1.618 |
0.416 |
0.799 |
−1.604 |
−1.239 |
10 |
0.073 |
0.436 |
0.246 |
−1.402 |
0.377 |
0.766 |
−1.453 |
−1.244 |
11 |
0.074 |
0.487 |
0.247 |
−1.398 |
0.362 |
0.753 |
−1.396 |
−1.361 |
12 |
0.091 |
0.538 |
0.295 |
−1.220 |
0.337 |
0.727 |
−1.300 |
−1.357 |
13 |
0.093 |
0.590 |
0.301 |
−1.200 |
0.311 |
0.698 |
−1.198 |
−1.415 |
14 |
0.102 |
0.641 |
0.326 |
−1.122 |
0.308 |
0.695 |
−1.188 |
−1.481 |
15 |
0.110 |
0.692 |
0.345 |
−1.063 |
0.271 |
0.649 |
−1.046 |
−1.460 |
16 |
0.124 |
0.744 |
0.381 |
−0.965 |
0.252 |
0.621 |
−0.971 |
−1.439 |
17 |
0.128 |
0.795 |
0.390 |
−0.942 |
0.232 |
0.592 |
−0.896 |
−1.460 |
18 |
0.131 |
0.846 |
0.396 |
−0.927 |
0.213 |
0.561 |
−0.824 |
−1.481 |
19 |
0.159 |
0.897 |
0.458 |
−0.782 |
0.205 |
0.547 |
−0.792 |
−1.412 |
20 |
0.174 |
0.949 |
0.490 |
−0.714 |
0.192 |
0.524 |
−0.742 |
−1.381 |
21 |
0.176 |
1.000 |
0.493 |
−0.708 |
0.176 |
0.493 |
−0.679 |
−1.387 |
22 |
0.192 |
1.051 |
0.524 |
−0.647 |
0.174 |
0.490 |
−0.673 |
−1.387 |
23 |
0.205 |
1.103 |
0.547 |
−0.603 |
0.159 |
0.458 |
−0.612 |
−1.340 |
24 |
0.213 |
1.154 |
0.561 |
−0.578 |
0.131 |
0.396 |
−0.504 |
−1.248 |
25 |
0.232 |
1.205 |
0.592 |
−0.525 |
0.128 |
0.390 |
−0.494 |
−1.228 |
26 |
0.252 |
1.256 |
0.621 |
−0.476 |
0.124 |
0.381 |
−0.480 |
−1.201 |
27 |
0.271 |
1.308 |
0.649 |
−0.433 |
0.110 |
0.345 |
−0.424 |
−1.120 |
28 |
0.308 |
1.359 |
0.695 |
−0.364 |
0.102 |
0.326 |
−0.394 |
−1.030 |
29 |
0.311 |
1.410 |
0.698 |
−0.359 |
0.093 |
0.301 |
−0.358 |
−1.012 |
30 |
0.337 |
1.462 |
0.727 |
−0.318 |
0.091 |
0.295 |
−0.350 |
−0.977 |
31 |
0.362 |
1.513 |
0.753 |
−0.284 |
0.074 |
0.247 |
−0.284 |
−0.860 |
32 |
0.377 |
1.564 |
0.766 |
−0.266 |
0.073 |
0.246 |
−0.283 |
−0.859 |
33 |
0.416 |
1.615 |
0.799 |
−0.224 |
0.057 |
0.198 |
−0.221 |
−0.720 |
34 |
0.471 |
1.667 |
0.837 |
−0.177 |
0.053 |
0.184 |
−0.203 |
−0.634 |
35 |
0.522 |
1.718 |
0.867 |
−0.143 |
0.043 |
0.153 |
−0.166 |
−0.530 |
36 |
0.614 |
1.769 |
0.906 |
−0.098 |
0.033 |
0.120 |
−0.127 |
−0.399 |
37 |
0.623 |
1.821 |
0.910 |
−0.095 |
0.027 |
0.100 |
−0.106 |
−0.365 |
38 |
0.757 |
1.872 |
0.946 |
−0.055 |
0.022 |
0.081 |
−0.084 |
−0.262 |
39 |
0.858 |
1.923 |
0.964 |
−0.037 |
0.014 |
0.054 |
−0.055 |
−0.178 |
40 |
1.095 |
1.974 |
0.985 |
−0.015 |
0.008 |
0.031 |
−0.031 |
−0.091 |
Σ |
10.108 |
|
|
|
|
|
|
|
Note. i = order or rank of the data y = ln[xi/min(x)] where the data are ordered in ascending order, y(i) = y-value at order i, (2i − 1)/39 = first factor of Ai, FY(y(i)) = cumulative probability for the y(i) value, assuming an exponential distribution with a rate parameter λ = 3.858, ln[FY(y(i))] = natural logarithm of the cumulative probability for the value y(i), y(n + 1 − i) = y-value at order n + 1 − i, FY[y(n + 1 − i)] = cumulative probability for the value y(n+1-i), following the same probability distribution; ln[1 − FY(y(n + 1 − i))] = natural logarithm of the complement of the cumulative probability for the value y(n+1-i), Ai = ((2i − 1)/39) × (ln[FY(y(i))]+ln[1-FY(y(n + 1 − i))]), Σ = sum by column.
The Anderson-Darling test, modified by Sinclair et al. (1990) for distributions with positive skewness, is applied to the original data sorted in ascending order x(i). Refer to Equation 54. The cumulative distribution function of the type I Pareto distribution, FX(x(i)) = 1 − (x(i)/502.595)3.957, is used for these calculations; however, the probabilities are identical to those given by the cumulative distribution function of an exponential distribution with rate parameter λ = 3.858. The test statistic
is less than the critical value for a sample size of 40 and a significance level of 5%, so the null hypothesis of fit to a type I Pareto distribution is not rejected. The detailed calculations are provided below, with part of the results summarized in Table 4.
Test statistic
Critical value:
Table 4. Calculations for deriving the test statistic of the Anderson-Darling test modified by Sinclair et al. (1990).
i |
x(i) |
FX(x(i)) |
2 − [(2i − 1)/40] |
ln[1 − FX(x(i))] |
AUi |
1 |
502.595 |
0 |
1.975 |
0 |
0 |
2 |
506.685 |
0.032 |
1.925 |
−0.032 |
−0.062 |
3 |
509.868 |
0.055 |
1.875 |
−0.057 |
−0.107 |
4 |
513.713 |
0.083 |
1.825 |
−0.087 |
−0.158 |
5 |
516.552 |
0.103 |
1.775 |
−0.108 |
−0.192 |
6 |
519.474 |
0.123 |
1.725 |
−0.131 |
−0.225 |
7 |
524.641 |
0.156 |
1.675 |
−0.170 |
−0.285 |
8 |
529.753 |
0.188 |
1.625 |
−0.208 |
−0.338 |
9 |
532.239 |
0.203 |
1.575 |
−0.227 |
−0.357 |
10 |
540.797 |
0.252 |
1.525 |
−0.290 |
−0.442 |
11 |
540.979 |
0.253 |
1.475 |
−0.291 |
−0.430 |
12 |
550.305 |
0.302 |
1.425 |
−0.359 |
−0.511 |
13 |
551.504 |
0.308 |
1.375 |
−0.367 |
−0.505 |
14 |
556.651 |
0.333 |
1.325 |
−0.404 |
−0.536 |
15 |
560.951 |
0.353 |
1.275 |
−0.435 |
−0.554 |
16 |
569.140 |
0.389 |
1.225 |
−0.492 |
−0.603 |
17 |
571.286 |
0.398 |
1.175 |
−0.507 |
−0.596 |
18 |
572.739 |
0.404 |
1.125 |
−0.517 |
−0.582 |
19 |
588.961 |
0.466 |
1.075 |
−0.628 |
−0.675 |
20 |
598.325 |
0.498 |
1.025 |
−0.690 |
−0.707 |
21 |
599.242 |
0.501 |
0.975 |
−0.696 |
−0.679 |
22 |
609.166 |
0.533 |
0.925 |
−0.761 |
−0.704 |
23 |
617.102 |
0.556 |
0.875 |
−0.812 |
−0.711 |
24 |
622.201 |
0.570 |
0.825 |
−0.845 |
−0.697 |
25 |
633.900 |
0.601 |
0.775 |
−0.919 |
−0.712 |
26 |
646.362 |
0.630 |
0.725 |
−0.996 |
−0.722 |
27 |
659.182 |
0.658 |
0.675 |
−1.073 |
−0.724 |
28 |
683.834 |
0.704 |
0.625 |
−1.219 |
−0.762 |
29 |
685.652 |
0.707 |
0.575 |
−1.229 |
−0.707 |
30 |
703.919 |
0.736 |
0.525 |
−1.333 |
−0.700 |
31 |
721.765 |
0.761 |
0.475 |
−1.432 |
−0.680 |
32 |
732.467 |
0.775 |
0.425 |
−1.490 |
−0.633 |
33 |
761.718 |
0.807 |
0.375 |
−1.645 |
−0.617 |
34 |
804.876 |
0.845 |
0.325 |
−1.863 |
−0.606 |
35 |
847.360 |
0.873 |
0.275 |
−2.067 |
−0.568 |
36 |
928.627 |
0.912 |
0.225 |
−2.429 |
−0.547 |
37 |
937.437 |
0.915 |
0.175 |
−2.467 |
−0.432 |
38 |
1071.113 |
0.950 |
0.125 |
−2.994 |
−0.374 |
39 |
1185.755 |
0.967 |
0.075 |
−3.397 |
−0.255 |
40 |
1502.032 |
0.987 |
0.025 |
−4.332 |
−0.108 |
Σ |
|
19.885 |
|
|
−19.802 |
Note. i = order or rank of the sample data of X, where the 40 data points are sorted in ascending order, x(i) = sample data of X at order or rank i, FX(x(i)) = cumulative probability of x(i) under a Pareto distribution(xm = 502.595, α = 3.957), 2 − [(2i − 1)/40] = first factor of AUi, ln[1 − FX(x(i))] = natural logarithm of the probability to the right tail of x(i) or second factor of AUi, AUi = (2 − [(2i − 1)/40]) × ln[1 − FX(x(i))], Σ = sum per column.
To obtain the quantile-quantile plot, the data xi (i = 1, 2, …, 40) are sorted in ascending order and each value is assigned its corresponding order (i), representing the empirical quantiles: x(i). Using the orders (i), the theoretical quantiles are computed based on the median of the sampling distribution of the i-th order statistic from a standard uniform continuous distribution: Mdn = (α − 1/3)/(α + β − 2/3). This distribution corresponds to a beta distribution with parameters α = (i) and β = n + 1 – (i). Consequently, the theoretical quantile’s cumulative probability (order of quantile) is given by p(i) = ((i) + 1/3)/(n + 1/3). The theoretical quantiles are then obtained by evaluating the quantile function of a type I Pareto distribution at p(i). The values for its two parameters are taken from the estimates previously calculated using the maximum likelihood estimators (Table 2).
First coordinate pair of the theoretical quantile (on the abscissa axis) and the empirical quantile (on the ordinate axis) plot.
Figure 3 illustrates the coordinate pairs of theoretical quantiles xt(i) and empirical quantiles x(i). The theoretical quantiles are plotted on the horizontal (abscissa) axis as they predict the empirical quantiles, which are positioned on the vertical (ordinate) axis. In the plot, the coordinate points align closely along a 45-degree line, indicating that the empirical data closely adhere to the theoretical model. This alignment can be quantified using the linear correlation coefficient; the square of this coefficient represents the proportion of shared variance. In this example, the linear correlation is 0.999, corresponding to a shared variance of 99.7%. Furthermore, the histogram with an overlaid density curve (Figure 4) exhibits the characteristic inverted J-shaped profile of a sample drawn from a type I Pareto distribution, supporting the conclusion that the sample data follow a type I Pareto distribution pattern.
Figure 3. Quabtile-Quantile plot.
Figure 4. Histogram with an overlaid density curve.
The R script for testing the fit of sample data to a type I Pareto distribution using the Kolmogorov-Smirnov and Anderson-Darling tests, and for displaying the q-q plot and histogram with an overlaid density curve, is shown.
# Sample data.
x <- c(532.2385845, 659.1817235, 703.9185616, 732.4668655, 572.7389011, 1502.031724, 1071.11279, 556.6512741, 646.3622704, 622.2013372, 540.796541, 847.3602594, 928.6272967, 599.2422898, 519.4737837, 685.6524203, 506.685297, 550.3047107, 571.2856056, 617.1023659, 588.9614346, 502.594959, 937.4374491, 551.5044233, 633.9000483, 683.8337772, 529.7526944, 569.1404291, 509.8675755, 540.9786305, 598.3247283, 761.718308, 516.5523976, 1185.754777, 804.8759389, 513.7129142, 560.9510722, 609.1662801, 524.6406256, 721.7654816)
# Load library
Library (EnvStats)
# Kolmogorov-Smirnov test.
x_sorted <- sort(x)
n <- length(x_sorted) # Sample size
i <- 1:n
F_empirical <- i / n
x_m <- min(x_sorted)
y <- log(x_sorted / x_m)
lambda_hat <- n / sum(y)
F_theoretical <- 1 - exp(-lambda_hat * y)
D_plus <- max(F_empirical - F_theoretical)
F_empirical_prev <- c(0, F_empirical[-n]) # Add 0 at the beginning for F_n(x_(i-1))
D_minus <- max(F_theoretical - F_empirical_prev)
D <- max(D_plus, D_minus)
# Stephens’ correction for D (D_c).
Dc <- (D - 0.2 / n) * (sqrt(n) + 0.26 + 0.5 / sqrt(n))
alpha <- 0.05 # Significance level
D_alpha <- 1.094 # Critical value for α = 0.05 (Stephens, 1974)
if (Dc <= D_alpha) {decision <- "H_0 holds: The data follow the theoretical distribution."
} else {decision <- "H_0 is rejected: The data do not follow the theoretical distribution."}
# Display results (rounding to three decimal places).
cat("D⁺:", round(D_plus, 3), "\n")
cat("D⁻:", round(D_minus, 3), "\n")
cat("D:", round(D, 3), "\n")
cat("D_c:", round(Dc, 3), "\n")
cat("Significance level: α =", alpha, "\n")
cat("Decision:", decision, "\n")
# Anderson-Darling test.
y_prime <- y[-1]
n_prime <- length(y_prime)
i_prime <- 1: n_prime
lambda_hat_prime <- n_prime / sum(y_prime)
FY_i <- 1 - exp(-lambda_hat_prime * y_prime)
log_FY_i <- log(FY_i)
FY_complement_log <- log(1 - FY_i[n_prime:i_prime])
A <- -n_prime - sum((2 * i_prime - 1) / n * (log_FY_i + FY_complement_log))
# Apply Stephens’ correction.
Ac <- A * (1 + 0.6 / n_prime)
alpha <- 0.05 # Significance level
A_alpha <- 1.321 # Critical value for α = 0.05 (Stephens, 1974)
if (Ac <= A_alpha) {
decision <- "H_0 holds: The data follow the theoretical distribution."
} else {decision <- "H_0 is rejected: The data do not follow the theoretical distribution."}
# Display results.
cat("A:", A, "\n")
cat("Ac:", Ac, "\n")
cat("Significance level: α =", alpha, "\n")
cat("Decision:", decision, "\n")
# Anderson-Darling test modified by Sinclair et al. (1990).
alpha_hat <- lambda_hat
FX <- 1 - (x_m / x_sorted)^alpha_hat
sum(FX)
sum((2 - (2 * i - 1) / n) * log(1 - FX))
AU <- n / 2 - 2 * sum(FX) - sum((2 - (2 * i - 1) / n) * log(1 - FX))
p <- 0.05 # Significance level
t <- log(p / (1 - p))
u <- 1 / (1 + 0.3 / sqrt(n))
G_p <- 0.1170 - 0.03791 * t + 0.06318 * u + 0.09878 * t * u + 0.009184 * t^2 * u - 0.00009742 * t^4 * u
AU_np <- 1 - 1 / (1 + exp(G_p))
if (AU <= AU_np) {
decision <- "H_0 holds: The data follow the theoretical distribution."
} else {decision <- "H_0 is rejected: The data do not follow the theoretical distribution."}
# Display results (rounding to three decimal places).
cat("Estimate of scale parameter: x_m =", round(x_m, 3), "\n")
cat("Estimate of shape parameter: α = :", round(alpha_hat, 3), "\n")
cat("Test statistic: AU =:", round(AU, 3), "\n")
cat("Significance level: p =:", p, "\n")
cat("Critical value (AU_n,p):", round(AU_np, 3), "\n")
cat("Decision:", decision, "\n")
# Q-Q plot.
u <- (i - 1/3) / (n + 1/3)
q <- eqpareto(x, p = u, method = "mle", plot.pos.con = 1/3)
plot(q$quantiles, x_sorted, pch = 1, xlab = " Theoretical quantiles ", ylab = "Empirical quantiles")
abline(0, 1, col = "darkblue", lwd = 2)
# Histogram with an overlaid density curve.
hist(x, breaks = "fd", col = "lightyellow", border = "black", freq = FALSE, xlab = "Values of X", ylab = "Density", main = "")
lines(density(x), col="darkblue", lwd=4)
R provides a command to generate random samples that follow a Type I Pareto distribution. The EnvStats package must be loaded, and a seed can be set to ensure the results are stable (e.g., 123). It is only necessary to specify the sample size (n), the value of the scale parameter (xm) in “location” (since in the Type I Pareto distribution, the scale and location parameters coincide), and the value of the shape parameter (α) in “shape”.
Library (EnvStats)
set.seed (123)
rpareto (n = 40, location = 500, shape = log (5)/log(4))
9. Strengths and Weaknesses of Type I Pareto Distribution
The Pareto distribution is a valuable tool in social research for analyzing extreme inequality, but its simplicity and specific focus make it less versatile than other models for more balanced or nuanced distributions (Charpentier & Flachaire, 2022). The following are some of its most distinctive features:
Flexibility with inequality measures: The Pareto model is directly linked to measures of inequality, such as the Gini coefficient. It can succinctly model disparities, making it a preferred choice over less interpretable models like the exponential distribution.
Simplicity in application: With only two parameters, the type I Pareto distribution is relatively simple compared to multi-parameter models such as generalized Pareto or Weibull distributions. This simplicity can be advantageous for interpretability but might limit its ability to model more complex phenomena.
Fit for empirical data: While the type I Pareto distribution fits well for data with pronounced inequality, it might not perform as effectively for datasets with moderate or low inequality. In such cases, models like the log-normal or gamma distributions might provide better fits.
Possible combination with other distributions: The type I Pareto distribution typically focuses on the upper tail of a distribution (e.g., the wealthiest individuals). For comprehensive population modeling, hybrid models (e.g., combining Pareto for the tail and log-normal or beta for the body distribution) may be more accurate (Akinsete, Famoye, & Lee, 2008; Martins, Liska, Beijo, Menezes, & Cirillo, 2020). However, this feature can be seen as a limitation of the Pareto distribution that needs to be combined.
Specifically, the Type I Pareto distribution has several limitations that should be considered when applying it to real world data. First, it is primarily suited to modelling the upper tails of distributions and may not accurately represent the entire data set, especially for moderate or low values. Second, its reliance on a power-law relationship imposes strict assumptions about the proportion of large values, which may oversimplify complex phenomena. In addition, the distribution assumes that the scale parameter (xm) represents a minimum threshold, which may not always match the data structure. The Type I Pareto distribution also struggles to capture multimodal distributions or data sets with varying degrees of inequality across different ranges. Finally, while its simplicity makes it attractive in terms of interpretability, this can limit its flexibility in capturing more nuanced patterns, necessitating the use of hybrid or alternative models in certain contexts.
The type I Pareto distribution fails in contexts where the data are light-tailed, multimodal, highly dynamic or dominated by intermediate values, or where structural factors introduce complexity that cannot be accounted for by a simple power-law relationship. For example, in a society with a large and stable middle class, income distributions often peak around the median, which the Pareto model cannot effectively capture due to its emphasis on extremes.
10. Conclusion
The illustrated presentation of the type I Pareto distribution with scale parameter xm and shape parameter α demonstrates that calculating probabilities is straightforward, as is computing descriptive statistics. As with other distributions, the most effective estimators for its parameters are those derived using the maximum likelihood method (Martín, Parra, Pizarro, & Sanjuán, 2022), which are both simple and computationally efficient. The estimator for xm is the sample minimum, while the estimator for α is the inverse of the mean of the log-transformed sample data (Siudem et al., 2022). Naturally, it is necessary to assess the fit of the empirical data to the probability model by employing both a graphical approach—using a histogram with an overlaid density curve (characterized by an inverted J-shape) and a quantile-quantile plot (showing a 45-degree alignment)—and an inferential approach, involving the Kolmogorov-Smirnov (Chu et al., 2019), Anderson-Darling (Stephens, 1986), or modified Anderson-Darling (Sinclair et al., 1990) tests. In the first two tests, the data are transformed to follow an exponential distribution, whereas the third test does not require such a transformation.
All these calculations can be performed using the scripts developed and presented in this article, which can be adapted to other score vectors. Additionally, Excel is also a valuable tool for this purpose. It should be noted that R is not only a free-access program, but it is currently the most comprehensive among statistical software, and it is gaining popularity among researchers in psychology and social sciences (Navarro, 2024).
Within the social sciences, this distribution serves as an effective probability model for various phenomena, including the distribution of income, rent, accumulated resources within a company, region, or country, and insurance claims (Feng et al., 2020). It is also applicable to the average frequency of behaviors that are rare among most individuals but highly prevalent in a few, such as compulsive behaviors, behavioral or substance addictions, and paraphilias (Rajeev, 2022). Notably, it can be applied to the same continuous variables as the lognormal distribution, such as epidemiological data (Beare & Toda, 2020), necessitating an evaluation to determine which probability model provides the best fit (Charpentier & Flachaire, 2022; Feng et al., 2020). For discrete data defined on a bounded set of natural numbers, the Zipf distribution—a variant of the zeta distribution—is the appropriate choice. From a social perspective, a good fit to a Paretian probability model highlights the presence of well-defined funnel rules and inequality in resource allocation. This, in turn, may inspire policies of segregation targeting specific social sectors deemed responsible for disproportionate expenditure or debt, such as health insurance for the elderly (Rodríguez-Abreu, 2021).
Acknowledgements
The author expresses his gratitude to the reviewers and editors for the suggestions and corrections received for the improvement of the manuscript.