
Symmetry of the underlying probability density plays an important role in statistical inference, since the sampling distribution of the sample mean for a given sample size is more likely to be approximately normal for a symmetric distribution than for an asymmetric one. In this article, two new measures of skewness are proposed and the confidence intervals for true skewness are obtained via Monte Carlo simulation experiments. One advantage of the two proposed skewness measures over the standard measures of skewness is that the proposed measures of skewness take values inside the range (-1, +1).

Many common statistical inference methods rely on the approximate normality of the sample mean, via the Central Limit Theorem (CLT), for a sufficiently large sample size (n). A rule of thumb says that the CLT can be used for n > 30 [

Skewness plays an important role in statistical analyses in almost all disciplines, and especially in finance. Johnson, Sen and Balyeat [

Several measures of skewness are available in statistical literature [

Skewness of a probability distribution refers to the departure of the distribution from symmetry. A symmetric distribution has no skewness, a distribution with longer tail on the left is negatively skewed, and a distribution with longer tail on the right is positively skewed [

There are mainly three types of skewness measures available in the literature: Fisher-Pearson skewness, adjusted Fisher-Pearson skewness, and Pearson Type 2 skewness. Fisher-Pearson skewness measures are functions of the second and third central sample moments:

$$m_k = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^k}{n - 1}, \quad k = 2, 3, \tag{1}$$

where $\bar{x}$ is the sample mean and $\sqrt{m_2}$ is the sample standard deviation.

The formulas for calculating Fisher-Pearson sample skewness used by popular statistical software packages [

Fisher-Pearson Skewness (Type 1):

$$g_1 = \frac{m_3}{m_2^{3/2}} \tag{2}$$

Adjusted Fisher-Pearson Skewness (Type 2):

$$G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1 \tag{3}$$

Pearson Type 2 skewness is a simple measure that is calculated from the sample mean, standard deviation, and the sample median m:

$$Sk_2 = \frac{3(\bar{x} - m)}{s} \tag{4}$$
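The three sample measures in Eqs. (2)-(4) can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the article: the function names are hypothetical, and the central moments are divided by n (the convention under which the factor in Eq. (3) is usually stated), which is an assumption that differs from the n − 1 divisor printed in Eq. (1).

```python
import math
import statistics


def central_moment(xs, k):
    """k-th central sample moment, divided by n (assumed convention)."""
    mean = statistics.fmean(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def fisher_pearson_g1(xs):
    """Type 1 skewness, Eq. (2): g1 = m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def adjusted_fisher_pearson_G1(xs):
    """Type 2 skewness, Eq. (3): G1 = sqrt(n(n-1)) / (n-2) * g1."""
    n = len(xs)
    return math.sqrt(n * (n - 1)) / (n - 2) * fisher_pearson_g1(xs)


def pearson_type2(xs):
    """Pearson Type 2 skewness, Eq. (4): 3 * (mean - median) / s."""
    mean = statistics.fmean(xs)
    median = statistics.median(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor)
    return 3.0 * (mean - median) / s


if __name__ == "__main__":
    data = [1, 2, 3, 4, 10]  # small right-skewed toy sample
    print(fisher_pearson_g1(data), adjusted_fisher_pearson_G1(data), pearson_type2(data))
```

All three functions return 0 for a perfectly symmetric sample and a positive value for the right-skewed toy sample above.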

Hotelling and Solomon [

Many introductory statistics textbooks include a rule of thumb regarding the relative positions of the mean, the median, and the mode: for a positively skewed distribution, mean > median > mode, and for a negatively skewed distribution, mean < median < mode [

Letting f (x) and F (x) denote the population probability density function and cumulative distribution function of the random variable, with mean μ and median Q_{2}, the proposed skewness measure is defined as the area under f (x) between μ and the median Q_{2} (

$$\text{Area skewness} = F(\mu) - F(Q_2).$$

Area skewness, the probability that the random variable falls between the true mean μ and the median Q_{2}, can be estimated from a sample in two steps:

Step 1. The probability density is estimated from the sample; in this article, a nonparametric density estimate [

Step 2: A numerical integration method can then be used to compute the area between the sample mean and sample median; the trapezoid rule is used in this article for computing area skewness.
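The two steps above can be sketched as follows. The article's density estimator is not fully specified here, so this sketch assumes a Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth; both choices, and all function names, are assumptions made for illustration. The integral uses the composite trapezoid rule, as in Step 2.

```python
import math
import statistics


def gaussian_kde(xs, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(xs) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(t):
        return norm * sum(math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in xs)

    return density


def trapezoid(f, lo, hi, steps=200):
    """Composite trapezoid rule for the integral of f from lo to hi.

    If hi < lo the step is negative, so the result is the signed integral.
    """
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    total += sum(f(lo + i * h) for i in range(1, steps))
    return total * h


def sample_area_skewness(xs, steps=200):
    """Estimated F(mean) - F(median): signed area between median and mean."""
    mean = statistics.fmean(xs)
    median = statistics.median(xs)
    # Silverman's rule-of-thumb bandwidth (an assumption, not from the article).
    bandwidth = 1.06 * statistics.stdev(xs) * len(xs) ** (-0.2)
    density = gaussian_kde(xs, bandwidth)
    return trapezoid(density, median, mean, steps)


if __name__ == "__main__":
    print(sample_area_skewness([1, 2, 3, 4, 10]))       # right-skewed toy sample
    print(sample_area_skewness([-10, -4, -3, -2, -1]))  # its mirror image
```

Because the integral runs from the sample median to the sample mean, the estimate is positive when the mean exceeds the median and negative otherwise, so the sign follows the usual skewness convention.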

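For the lognormal distribution LN (μ, σ):

$$\text{Mean} = \exp(\mu + 0.5\sigma^2), \quad \text{Median} = \exp(\mu), \quad CV = \sqrt{\exp(\sigma^2) - 1}, \quad \text{Skewness} = (CV)^3 + 3\,(CV),$$

where CV is the coefficient of variation.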

True population mean, median and area skewness for the LN (μ = 5, σ = 1) distribution are:

$$\begin{aligned}
\text{mean} &= \exp(5.5) = 244.6919 \\
\text{median} &= \exp(5) = 148.4132 \\
\text{standard skewness} &= 6.1849 \\
\text{area skewness} &= F(244.6919) - F(148.4132) = 0.1915
\end{aligned}$$

The sample area skewness value for the generated sample is 0.2047, and the standard skewness estimate is 4.3192.
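The population values quoted for LN (μ = 5, σ = 1) can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it relies on the standard fact that for a lognormal variable F(x) = Φ((ln x − μ)/σ), where Φ is the standard normal distribution function.

```python
import math
from statistics import NormalDist


def lognormal_summary(mu, sigma):
    """Population mean, median, CV, skewness and area skewness of LN(mu, sigma)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    median = math.exp(mu)
    cv = math.sqrt(math.exp(sigma ** 2) - 1.0)
    skewness = cv ** 3 + 3.0 * cv
    # F(x) = Phi((ln x - mu) / sigma) for the lognormal distribution.
    phi = NormalDist().cdf
    area = phi((math.log(mean) - mu) / sigma) - phi((math.log(median) - mu) / sigma)
    return {"mean": mean, "median": median, "cv": cv,
            "skewness": skewness, "area_skewness": area}


if __name__ == "__main__":
    for name, value in lognormal_summary(5.0, 1.0).items():
        print(f"{name}: {value:.4f}")
```

Since ln(mean) − μ = σ²/2, the area skewness of a lognormal distribution reduces to Φ(σ/2) − 0.5, which depends on σ only.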

Three probability distributions with varying degrees of skewness are used in simulation in this study:

N (μ, σ): normal distribution with mean μ and standard deviation σ.

GAM (α, β): gamma distribution with shape = α and scale = β, skewness = $2/\sqrt{\alpha}$.

Tr (a, b, c): triangular distribution with parameters a, b, c [

$$f(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)}, & a \le x \le c \\[2ex] \dfrac{2(b-x)}{(b-a)(b-c)}, & c < x \le b \end{cases} \qquad F(x) = \begin{cases} \dfrac{(x-a)^2}{(b-a)(c-a)}, & a \le x \le c \\[2ex] 1 - \dfrac{(b-x)^2}{(b-a)(b-c)}, & c < x \le b \end{cases}$$

The skewness of the triangular distribution Tr (a, b, c) is given by

$$g_1 = \frac{\sqrt{2}\,(a+b-2c)(2a-b-c)(a-2b+c)}{5\,(a^2+b^2+c^2-ab-ac-bc)^{3/2}}.$$

The triangular distribution is selected for this study because it can model both positively skewed and negatively skewed distributions.
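As a quick check of the triangular formulas, the sketch below evaluates the distribution function and the skewness for Tr (a, b, c), with a the minimum, b the maximum and c the mode. The function names are hypothetical, and the code assumes a < c < b.

```python
import math


def triangular_cdf(x, a, b, c):
    """CDF of Tr(a, b, c); a = minimum, b = maximum, c = mode (a < c < b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    return 1.0 - (b - x) ** 2 / ((b - a) * (b - c))


def triangular_skewness(a, b, c):
    """Population skewness of Tr(a, b, c)."""
    numerator = math.sqrt(2.0) * (a + b - 2 * c) * (2 * a - b - c) * (a - 2 * b + c)
    denominator = 5.0 * (a * a + b * b + c * c - a * b - a * c - b * c) ** 1.5
    return numerator / denominator


if __name__ == "__main__":
    for mode in (0.5, 0.95, 0.05):
        print(mode, round(triangular_skewness(0.0, 1.0, mode), 4))
```

Moving the mode c toward the upper limit b makes the skewness negative, and moving it toward the lower limit a makes it positive; the symmetric case c = (a + b)/2 gives zero.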

The simulation experiment used in this study is carried out in the following steps:

1) A random sample of size n is generated from the selected probability distribution.

2) Each of the five skewness coefficients (proposed area skewness, Pearson skewness, and the sample-moments based Types 1-3 skewness coefficients) is computed.

3) Steps (1) and (2) are repeated 10,000 times and the 90%, 95%, and 99% confidence intervals for true skewness are calculated from the 10,000 skewness values.

| Skewness measure | Normal N (μ = 100, σ = 20) | Gamma GAM (α = 2, β = 1) | Triangular Tr (a = 0, b = 1, c = 0.5) | Triangular Tr (a = 0, b = 1, c = 0.95) | Triangular Tr (a = 0, b = 1, c = 0.05) |
|---|---|---|---|---|---|
| Standard Skewness | 0 | 1.4142 | 0 | −0.5606 | 0.5606 |
| Pearson Skewness | 0 | 0.6823 | 0 | −0.5217 | 0.5217 |
| Area Skewness | 0 | 0.0940 | 0 | −0.0564 | 0.0564 |

The simulation experiment was run for n = 25, 50, 75, 100, for each of the three probability models, for each of the two sets of parameter values. The sample sizes chosen range from moderate to large, and the true skewness values selected cover a wide range of skewness. Figures 4-23 show the histograms of the 10,000 skewness estimates and the confidence intervals.
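The simulation loop described above can be sketched as follows. The article does not state here how the intervals are read off the 10,000 simulated values, so this sketch assumes central percentile intervals; that choice, the function names, and the use of Pearson skewness as the example statistic are all assumptions made for illustration.

```python
import math
import random
import statistics


def pearson_type2(xs):
    """Pearson Type 2 skewness: 3 * (mean - median) / s."""
    return 3.0 * (statistics.fmean(xs) - statistics.median(xs)) / statistics.stdev(xs)


def percentile_interval(values, level):
    """Central interval covering about `level` of the simulated values (assumed method)."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[math.floor(tail * last)], ordered[math.ceil((1.0 - tail) * last)]


def simulate_intervals(draw, statistic, n, reps=10_000, seed=0,
                       levels=(0.90, 0.95, 0.99)):
    """Repeat: draw a sample of size n, compute the statistic; then form intervals."""
    rng = random.Random(seed)
    estimates = [statistic([draw(rng) for _ in range(n)]) for _ in range(reps)]
    return {level: percentile_interval(estimates, level) for level in levels}


def draw_gamma(rng):
    """One draw from GAM(shape = 2, scale = 1)."""
    return rng.gammavariate(2.0, 1.0)


def draw_normal(rng):
    """One draw from N(mean = 100, sd = 20)."""
    return rng.gauss(100.0, 20.0)


if __name__ == "__main__":
    # A reduced number of replications keeps the demonstration quick.
    for level, (lo, hi) in simulate_intervals(draw_gamma, pearson_type2, n=25, reps=1000).items():
        print(f"{int(level * 100)}% interval: ({lo:.3f}, {hi:.3f})")
```

Passing a different `draw` function or `statistic` reuses the same loop for the other distributions and skewness coefficients; setting `reps=10_000` matches the number of replications used in the article.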

The probability density function estimated from the data can be modeled by a simple polygon P, as shown in the figure. Let l_{m} be the vertical line segment at the sample mean (thick vertical line). Let Ch_{1} and Ch_{2} denote the polygonal chains to the right and left of l_{m}. Taking l_{m} as a mirror, we consider the reflected images of Ch_{1} and Ch_{2}, denoted by I_{1} and I_{2}, respectively. I_{1} and I_{2} are drawn as dashed lines in the figure. I_{1} and I_{2} form a simple polygon P*, which we call the image polygon. The overlay of P and P* results in two types of areas: (i) Overlap Area O_{A}, and (ii) Spilled Area S_{A}. In the figure, the spilled area components are labeled A, B, C, and D. For a symmetric distribution, the spilled area will be small; for an asymmetric distribution, the proportion of spilled area will be large. This motivates us to use the proportion of spilled area as a measure of skewness.

An algorithm for computing spilled area can be developed by using the data structures for representing simple polygon from computational geometry. A sketch of the algorithm for computing spilled areas is shown below. Efficient implementation of Step 5 and Step 6 needs techniques from computational geometry. For this, the input polygon is represented in a doubly connected edge list data structure as reported in [

Algorithm 1: Computing Spilled Area.

Input: A simple polygon P constructed from sample points.

Output: Spilled Area S_{A}.

Step 1: Find the mean vertical line segment l_{m}.

Step 2: Find polygonal chains Ch_{1} and Ch_{2} implied by l_{m} from input polygon P.

Step 3: Determine corresponding image chains I_{1} and I_{2}.

Step 4: Construct image polygon P* by combining I_{1} and I_{2}.

Step 5: Compute Overlap Area O_{A} = area(P ∩ P*).

Step 6: Compute Union Area U_{A} = area(P ∪ P*).

Step 7: Spilled Area S_{A} = U_{A} − O_{A}.
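The steps of Algorithm 1 can be approximated without general polygon clipping. The sketch below swaps in a simpler technique than the doubly connected edge list approach described above: it assumes P is the region between the x-axis and a piecewise-linear density curve, so that reflecting P about the mean line amounts to evaluating the mirrored curve f(2m − x). The overlap and union areas are then the integrals of the pointwise minimum and maximum of the two curves, computed here with the trapezoid rule on a uniform grid. All function names are hypothetical.

```python
def interpolate(points, x):
    """Piecewise-linear curve through sorted (x, y) points; zero outside their range."""
    if x < points[0][0] or x > points[-1][0]:
        return 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return y0
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 0.0


def spilled_area(points, mean, steps=2000):
    """Return (spilled, overlap, union) areas for the curve and its mirror image.

    Assumes the polygon is the area under a single-valued curve, which is a
    simplification of the general polygon intersection in Steps 5 and 6.
    """
    lo, hi = points[0][0], points[-1][0]
    # The grid must cover both the curve and its reflection about x = mean.
    left = min(lo, 2 * mean - hi)
    right = max(hi, 2 * mean - lo)
    h = (right - left) / steps
    overlap = 0.0
    union = 0.0
    for i in range(steps + 1):
        x = left + i * h
        f = interpolate(points, x)
        f_image = interpolate(points, 2 * mean - x)  # reflected curve
        weight = 0.5 if i in (0, steps) else 1.0     # trapezoid end weights
        overlap += weight * min(f, f_image) * h
        union += weight * max(f, f_image) * h
    return union - overlap, overlap, union


if __name__ == "__main__":
    symmetric = [(0.0, 0.0), (0.5, 2.0), (1.0, 0.0)]   # Tr(0, 1, 0.5) density
    skewed = [(0.0, 0.0), (0.05, 2.0), (1.0, 0.0)]     # Tr(0, 1, 0.05) density
    print(spilled_area(symmetric, mean=0.5))
    print(spilled_area(skewed, mean=0.35))
```

For a curve that is symmetric about the mean, the reflected curve coincides with the original and the spilled area is numerically zero; for a skewed curve the spilled area is positive. A general simple polygon that is not the graph of a function would still need the computational geometry machinery described in the text.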

We implemented the algorithm in Python. For illustration purposes, two different samples were generated from different normal distributions. The true geometric skewness measure for any normal distribution is 0, since the normal distribution is symmetric. The results for the two samples are presented below.

The input polygon computed from the first sample is shown in

For sample 1, node count = 188, overlap area = 0.46, polygon area = 2.91, and the geometric measure of skewness = overlap area/polygon area = 0.1581.

For the second simulated example,

We have proposed two different skewness measures: area skewness and geometric skewness. The standard skewness measures suffer from one drawback: they do not have known lower and upper bounds. The absolute values of both of the proposed skewness estimates fall in the range (0, 1). We have used Monte Carlo simulations to compute confidence intervals from the area skewness estimate, and we intend to do the same for the geometric skewness estimate in the near future.

The authors declare no conflicts of interest regarding the publication of this paper.

Singh, A.K., Gewali, L.P. and Khatiwada, J. (2019) New Measures of Skewness of a Probability Distribution. Open Journal of Statistics, 9, 601-621. https://doi.org/10.4236/ojs.2019.95039