New Measures of Skewness of a Probability Distribution

Symmetry of the underlying probability density plays an important role in statistical inference, since the sampling distribution of the sample mean for a given sample size is more likely to be approximately normal for a symmetric distribution than for an asymmetric one. In this article, two new measures of skewness are proposed, and confidence intervals for the true skewness are obtained via Monte Carlo simulation experiments. One advantage of the two proposed measures over the standard measures of skewness is that they take values inside the range (−1, +1).


Introduction
Many common statistical inference methods rely on the approximate normality of the sample mean, which the Central Limit Theorem (CLT) provides for a sufficiently large sample size (n). A rule of thumb says that the CLT can be used for n > 30 [1] [2] [3] [4]. Singh, Lucas, Dalpatadu, & Murphy [5] showed that this rule of thumb may be inaccurate for highly skewed distributions. Veluchamy [6] developed a graphical approach based on the bootstrap for verifying normality of the sample mean.
Skewness plays an important role in statistical analyses in almost all disciplines, and especially in finance. Johnson, Sen and Balyeat [7] applied a skewness-adjusted binomial model to futures options pricing and derived the asymptotic properties of the skewness model. Their results showed that the futures options price, in the presence of skewness, depends not only on the mean and standard deviation (sd), but on other parameters as well. Kun [8] investigated daily time series of four Shanghai Stock market indices and found that including skewness in the models yields higher investor utility. Chateau [9] investigated the effects of skewness and kurtosis by starting with Black's normal model for European put values and replacing the Gaussian distribution with the Gram-Charlier and the Johnson distributions, and showed that both skewness and kurtosis have a significant impact on the model results. The effects of skewness on stochastic frontier models are discussed in [10].
Several measures of skewness are available in the statistical literature [11], but most of these are based on sample moments or quantiles, and as such are adversely affected by the presence of a few outliers. Robust skewness measures such as the medcouple have been proposed and investigated in the literature [12]; the medcouple measures of skewness are functions of sample quantiles and order statistics. A comparison of skewness and kurtosis measures is provided by [13]; a comparison of the standard t-test and a modified t-test for skewed distributions is available in [14].
Skewness of a probability distribution refers to the departure of the distribution from symmetry. A symmetric distribution has no skewness, a distribution with a longer tail on the left is negatively skewed, and a distribution with a longer tail on the right is positively skewed [15].
Three main types of skewness measures are available in the literature. The formulas for calculating Fisher-Pearson sample skewness used by popular statistical software packages [16] are shown below; the statistical software environment R [17] can be used to compute all three types.
Fisher-Pearson Skewness (Type 1): $g_1 = m_3 / m_2^{3/2}$, where $m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k$ is the k-th sample central moment.
Adjusted Fisher-Pearson Skewness (Type 2): $G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1$.
Pearson Type 2 skewness is a simple measure that is calculated from the sample mean $\bar{x}$, the standard deviation $s$, and the sample median $m$: $3(\bar{x} - m)/s$.
Hotelling and Solomon [18] showed that $(\bar{x} - m)/s$ always lies between −1 and +1.
A computational geometric measure of skewness is also introduced.
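As a minimal sketch, the three sample measures named above can be computed with the Python standard library. It assumes the conventional definitions: g1 = m3 / m2^(3/2) with divisor-n central moments, G1 = g1 · sqrt(n(n − 1)) / (n − 2), and 3 · (mean − median) / s with s the divisor-(n − 1) standard deviation. The function names and the sample data are hypothetical and chosen only for illustration.

```python
import math
import statistics


def central_moment(x, k):
    """k-th sample central moment with divisor n."""
    mean = statistics.fmean(x)
    return sum((xi - mean) ** k for xi in x) / len(x)


def fisher_pearson_g1(x):
    """Type 1 skewness: g1 = m3 / m2^(3/2)."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def adjusted_fisher_pearson_G1(x):
    """Type 2 skewness: G1 = g1 * sqrt(n (n - 1)) / (n - 2); requires n >= 3."""
    n = len(x)
    return fisher_pearson_g1(x) * math.sqrt(n * (n - 1)) / (n - 2)


def pearson_median_skewness(x):
    """3 * (mean - median) / s, where s uses the divisor n - 1 (an assumption)."""
    return 3.0 * (statistics.fmean(x) - statistics.median(x)) / statistics.stdev(x)


# Made-up, right-skewed data used only to exercise the functions.
sample = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 9.0]
print(fisher_pearson_g1(sample))
print(adjusted_fisher_pearson_G1(sample))
print(pearson_median_skewness(sample))
```

All three values are positive for this sample, since the single large observation pulls the mean above the median.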

Proposed Measure of Skewness
Many introductory statistics textbooks include a rule of thumb regarding the relative positions of the mean, the median, and the mode: for a positively skewed distribution, mean > median > mode, and for a negatively skewed distribution, mean < median < mode [19] [20] [21]. It was pointed out by von Hippel [22] that many violations of this rule exist, especially in the case of discrete probability distributions (see Figure 1(b), Figure 1(c)).
Letting $f(x)$ and $F(x)$ denote the population probability density and cumulative distribution functions of the random variable, with mean $\mu$ and median $Q_2$, the proposed skewness measure is defined as the area under $f(x)$ between $\mu$ and $Q_2$ (Figure 2). Area skewness, the probability that the random variable falls between the true mean $\mu$ and the median $Q_2$, can be computed in two steps:
Step 1: The probability density is estimated from the sample; in this article, a nonparametric density estimate [23] [24] is used, but a parametric density estimate can also be used.
Step 2: A numerical integration method is then used to compute the area between the sample mean and the sample median; the trapezoid rule is used in this article for computing area skewness.
Figure 2 (bottom graph) is generated from the log-normal (LN) distribution, which is defined as follows: Y is LN with parameters μ and σ if log(Y) is normally distributed with mean μ and standard deviation σ; here log is the natural logarithm, i.e., the base is e. The LN(μ, σ) distribution has population mean, standard deviation, and skewness given by [25]:
$$\text{mean} = e^{\mu + \sigma^2/2}, \qquad \text{sd} = e^{\mu + \sigma^2/2}\sqrt{e^{\sigma^2} - 1}, \qquad \text{skewness} = \left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1}.$$
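The two steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the nonparametric density estimate is taken to be a Gaussian kernel estimate with the normal-reference bandwidth 1.06 · s · n^(−1/5), and the area is given a sign (positive when the mean exceeds the median). Neither choice is specified in the text, and all function names are hypothetical.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a Gaussian kernel density estimate as a callable.

    The default bandwidth 1.06 * s * n^(-1/5) is an assumption of this sketch.
    """
    n = len(sample)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density


def trapezoid(f, lo, hi, steps=200):
    """Composite trapezoid rule for the integral of f from lo to hi."""
    h = (hi - lo) / steps
    inner = sum(f(lo + i * h) for i in range(1, steps))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


def area_skewness(sample, steps=200):
    """Signed area under the estimated density between sample median and sample mean."""
    mean = statistics.fmean(sample)
    median = statistics.median(sample)
    if mean == median:
        return 0.0
    density = gaussian_kde(sample)  # Step 1: estimate the density
    # Step 2: integrate from median to mean, so the result is positive
    # when mean > median and negative when mean < median.
    return trapezoid(density, median, mean, steps)


rng = random.Random(1)
lognormal_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
print(area_skewness(lognormal_sample))
```

Negating every observation flips the sign of the result, which is a quick sanity check on the sign convention assumed here.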

Monte Carlo Simulation for Comparison of Skewness Measures
Three probability distributions with varying degrees of skewness are used in the simulations in this study:
N(μ, σ): normal distribution with mean μ and standard deviation σ.
GAM(α, β): gamma distribution with shape α and scale β; skewness = $2/\sqrt{\alpha}$.
Tr(a, b, c): triangular distribution with parameters a, b, c, with probability density and cumulative distribution functions as given in [26] [27]. The skewness of the triangular distribution Tr(a, b, c) is given by
$$\frac{\sqrt{2}\,(a + b - 2c)(2a - b - c)(a - 2b + c)}{5\,(a^2 + b^2 + c^2 - ab - ac - bc)^{3/2}}.$$
Table 1 shows the specific distributions and their skewness values used in this simulation, and Figure 3 shows plots of the two triangular distributions used in the simulations.
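The population skewness values of the gamma and triangular families can be checked with a short Python sketch. For the triangular case, a, b, and c are assumed here to be the lower limit, upper limit, and mode; the skewness expression is symmetric in the three arguments, so this labeling does not change the result. The parameter values below are illustrative and are not those of Table 1.

```python
import math


def gamma_skewness(alpha):
    """Population skewness of a gamma distribution with shape alpha: 2 / sqrt(alpha)."""
    return 2.0 / math.sqrt(alpha)


def triangular_skewness(a, b, c):
    """Population skewness of a triangular distribution.

    a, b, c are taken as lower limit, upper limit, and mode (an assumption);
    the expression is symmetric in its three arguments.
    """
    num = math.sqrt(2.0) * (a + b - 2 * c) * (2 * a - b - c) * (a - 2 * b + c)
    den = 5.0 * (a * a + b * b + c * c - a * b - a * c - b * c) ** 1.5
    return num / den


# Illustrative parameter choices only.
print(gamma_skewness(4.0))                 # shape 4
print(triangular_skewness(0.0, 1.0, 0.5))  # mode at the midpoint: symmetric
print(triangular_skewness(0.0, 1.0, 0.0))  # mode at the lower limit: right-skewed
```

With the mode at one end of the support, the triangular skewness reaches its largest magnitude, 2·sqrt(2)/5, so this family covers only mild skewness compared with a gamma distribution of small shape.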
The simulation experiment used in this study is carried out in the following steps:
1) A random sample of size n is generated from the selected probability distribution.
2) Each of the five skewness coefficients (proposed area skewness, Pearson
Steps (1) and (2) are repeated 10,000 times, and the 90%, 95%, and 99% confidence intervals for the true skewness are calculated from the 10,000 skewness values.
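The simulation loop described above can be sketched as follows. The sketch makes several assumptions: it uses a single statistic (the mean-median-sd coefficient) as a stand-in for the five coefficients, draws from a gamma distribution with illustrative parameters, runs 2,000 repetitions instead of 10,000 to keep the run short, and forms the intervals from simple order-statistic cut-offs of the simulated values. The function names are hypothetical.

```python
import math
import random
import statistics


def pearson_median_skewness(x):
    """3 * (mean - median) / s; a stand-in for the coefficients being compared."""
    return 3.0 * (statistics.fmean(x) - statistics.median(x)) / statistics.stdev(x)


def percentile_interval(values, level):
    """Central interval covering roughly `level` of the simulated values."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo_index = int(math.floor(tail * (len(ordered) - 1)))
    hi_index = int(math.ceil((1.0 - tail) * (len(ordered) - 1)))
    return ordered[lo_index], ordered[hi_index]


def simulate(draw, statistic, n, reps, seed=0):
    """Repeat steps (1) and (2): draw a sample of size n, then compute the statistic."""
    rng = random.Random(seed)
    return [statistic([draw(rng) for _ in range(n)]) for _ in range(reps)]


# Gamma with shape 2 and scale 1 is an illustrative choice only.
values = simulate(lambda rng: rng.gammavariate(2.0, 1.0),
                  pearson_median_skewness, n=50, reps=2000)
for level in (0.90, 0.95, 0.99):
    print(level, percentile_interval(values, level))
```

Fixing the seed makes the run reproducible; the intervals widen as the confidence level increases, as expected.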

A Computational Geometric Measure of Skewness
The probability density function estimated from the data can be modeled by a simple polygon $P$, as shown in Figure 24 (thin solid line). Let $l_m$ be the vertical line segment at the sample mean (thick vertical line). Let $Ch_1$ and $Ch_2$ denote the polygonal chains to the right and left of $l_m$. Taking $l_m$ as a mirror, we consider the reflected images of $Ch_1$ and $Ch_2$, denoted by $I_1$ and $I_2$, respectively; they are drawn as dashed lines in Figure 24. Chains $I_1$ and $I_2$ form a simple polygon $P^*$, which we call the image polygon. The overlay of $P$ and $P^*$ results in two types of areas: (i) Overlap Area $O_A$, and (ii) Spilled Area $S_A$. In the figure, the spilled area components are labeled A, B, C, and D. For a symmetric distribution, the spilled area will be small; if the distribution is asymmetric, the proportion of spilled area will be large. This motivates the use of the proportion of spilled area as a measure of skewness.
An algorithm for computing the spilled area can be developed by using data structures for representing simple polygons from computational geometry. A sketch of the algorithm is shown below. Efficient implementation of Step 5 and Step 6 needs techniques from computational geometry. For this, the input polygon is represented in a doubly connected edge list data structure as reported in [28]. By navigating through this data structure, the intersection points corresponding to the overlay of $P$ and $P^*$ can be computed in linear time.
Input: A simple polygon $P$ constructed from sample points.
Output: Spilled Area $S_A$.
Step 1: Find the mean vertical line segment $l_m$.
Step 2: Find polygonal chains $Ch_1$ and $Ch_2$ implied by $l_m$ from input polygon $P$.
Step 3: Determine corresponding image chains $I_1$ and $I_2$.
Step 4: Construct image polygon $P^*$ by combining $I_1$ and $I_2$.
Step 5: Compute Overlap Area ($O_A$).
Step 6: Compute Union Area ($U_A$).
Step 7: Spilled Area $S_A = U_A - O_A$.
Figure 24. Construction of an Image Polygon.
The input polygon computed from the first sample is shown in Figure 25, and the overlap area is shown in Figure 26. For the second simulated example, Figure 27 and Figure 28 show the input polygon and the overlap area, respectively. For sample 2, node count = 40, overlap area: 0.41, polygon area: 2.93, and the geometric measure of skewness = overlap area/polygon area = 0.1387.
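The overlap, union, and spilled areas in the steps above can be approximated numerically without a polygon overlay. The sketch below swaps in a simpler technique than the doubly connected edge list approach: it treats the polygon as the region under a piecewise-linear curve, reflects that curve about the mirror line, and integrates the pointwise minimum (overlap) and maximum (union) on a fine grid with the trapezoid rule. The function names, the grid resolution, and the example curve are assumptions made for illustration.

```python
import bisect


def interpolate(xs, ys, x):
    """Height of the piecewise-linear curve at x; zero outside [xs[0], xs[-1]]."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    i = bisect.bisect_right(xs, x) - 1
    if i >= len(xs) - 1:
        return ys[-1]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def spilled_area(xs, ys, mirror, steps=2000):
    """Return (overlap, union, spilled) for the curve and its mirror image.

    xs must be increasing; `mirror` is the x-position of the mirror line.
    """
    lo = min(xs[0], 2 * mirror - xs[-1])
    hi = max(xs[-1], 2 * mirror - xs[0])
    h = (hi - lo) / steps
    overlap = 0.0
    union = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        f = interpolate(xs, ys, x)               # original curve
        g = interpolate(xs, ys, 2 * mirror - x)  # reflected curve
        w = 0.5 if i in (0, steps) else 1.0      # trapezoid end weights
        overlap += w * min(f, g) * h
        union += w * max(f, g) * h
    return overlap, union, union - overlap


# A right-skewed triangular outline; its centroid abscissa (0 + 0.5 + 3) / 3
# stands in for the sample mean.
xs, ys = [0.0, 0.5, 3.0], [0.0, 0.8, 0.0]
overlap, union, spilled = spilled_area(xs, ys, mirror=(0.0 + 0.5 + 3.0) / 3.0)
print(overlap, union, spilled, spilled / union)
```

Dividing the spilled area by the union area is one way to obtain a proportion between 0 and 1; the normalization is an assumption of this sketch. A symmetric outline mirrored at its center gives a spilled area of essentially zero.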

Discussion and Results
We have proposed two different skewness measures: area skewness and geometric skewness. The standard skewness measures suffer from one drawback: they do not have known lower and upper bounds. The absolute values of both of the proposed skewness estimates fall in the range (0, 1). We have used Monte Carlo simulations to compute confidence intervals from the area skewness estimate, and we intend to do the same for the geometric skewness estimate in the near future.