1. Introduction
The early development of portfolio risk management focused on the Value-at-Risk (VaR) measure of risk, whose popularity owes a lot to the J.P. Morgan RiskMetrics document (1996) [1]. Subsequently, a set of risk measure coherence axioms was introduced by Artzner et al. (1999) [2], who showed that VaR does not satisfy these axioms, and that the expected shortfall (ES) measure of the average losses beyond the VaR is a coherent risk measure. See also Sections 2.3 and 8.1 of McNeil et al. (2015) [3]. Over the years, ES has become an increasingly popular risk measure, in part due to its properties for portfolio optimization established by Rockafellar and Uryasev (2000) [4], who used the term CVaR in place of ES, and in part due to its support by Basel regulatory guidelines for use by banks.
There exist two basic variants of ES estimators: nonparametric ES estimators, which do not depend on returns distribution assumptions, and parametric ES estimators, which depend on an assumed returns distribution. Nonparametric ES estimators are the average of a small fraction of the smallest ordered portfolio returns, and the ES risk measure that gives rise to such an estimator is known to satisfy the risk coherence axioms. The parametric normal distribution ES risk measure is a linear combination of the distribution mean and the distribution standard deviation, and this linear combination fails to be a coherent risk measure because it fails to satisfy the coherence monotonicity axiom, which states that if random return R1 is greater than or equal to random return R2, then the risk for R1 is less than or equal to the risk for R2. In terms of the ES risk estimator obtained by replacing the distribution mean and standard deviation with sample estimates, this failure manifests itself in the fact that one large return can give rise to an increase in ES risk, which is quite non-intuitive. For a reasonable risk measure, increasing returns should intuitively result in decreasing risk. Fischer (2003) [5] showed that a simple fix to the lack of coherence of parametric normal distribution ES is to replace the standard deviation by the lower standard semi-deviation, “semi-deviation” for short.
The fact that returns are often non-normal led to a literature focus on parametric ES for fat-tailed non-normal distributions, the simplest of which are t-distributions. Furthermore, parametric ES estimators based on maximum-likelihood parameter estimators (MLE’s) have the attractive feature that they achieve the minimum large-sample variance when returns in fact have the assumed non-normal distribution. With this in mind, Martin and Zhang (2019) [6], henceforth MZ2019 [6], studied the comparative behavior of both parametric normal and parametric t-distribution ES risk measures and estimators, in terms of the behavior of their influence functions, which are extensively treated in the robust statistics literature. It turns out that the influence functions of both parametric normal and parametric t-distribution ES based on parameter MLE’s are approximately symmetric functions of a return, which reflects the failure of these risk measures to satisfy the coherence monotonicity axiom.
In MZ2019 [6] it is shown that the monotonicity of a risk measure influence function is a sufficient condition to ensure that the risk measure satisfies the coherence monotonicity axiom. Thus, our goal was to modify the parametric t-distribution ES in such a way that its influence function is monotonic. Motivated by the Fischer (2003) [5] result that replacement of the standard deviation by semi-deviation results in a coherent mean semi-deviation risk measure, we will show that replacement of the t-distribution MLE scale estimator by a semi-scale estimator results in ES influence functions that are quite close to being monotonic, the more so for t-distribution tail probabilities that are larger than commonly recommended values.
The remainder of the paper is organized as follows. Section 2 briefly reviews nonparametric and normal distribution expected shortfall, and their influence functions. Section 3 introduces general parametric risk measures and influence functions, and discusses the t-distribution special case. Section 4 introduces a new t-distribution ES Semi-Scale M-estimator, derives its influence function formula, and displays its typical influence function shape. Section 5 describes a practical implementation of the ES Semi-Scale M-estimator. Section 6 derives the asymptotic variance formula for the ES Semi-Scale M-estimator and shows that, as a function of the ES tail probability, its asymptotic standard errors are not very much greater than those of a t-distribution ES maximum likelihood estimator. Section 7 contains concluding comments. The detailed derivation of some mathematical results may be found in the Appendix of an early draft version of this paper, which is available at SSRN (https://ssrn.com/abstract=4605604). An early version of MZ2019 [6] , which includes its Appendices, may be found at SSRN (https://ssrn.com/abstract=2747179).
2. Nonparametric and Normal Distribution Parametric Expected Shortfall and Influence Functions
This section reviews some basic material about nonparametric and normal distribution parametric expected shortfall, and their influence functions, which are discussed in more detail in MZ2019 [6].
2.1. Nonparametric and Normal Distribution Parametric Expected Shortfall Formulas
The integral form of expected shortfall, with risk expressed as a positive quantity, is
$$ES_\gamma(F) = -\frac{1}{\gamma}\int_{-\infty}^{q_\gamma(F)} r\,dF(r) \tag{2.1}$$
where $F$ is an unrestricted returns distribution function and $q_\gamma(F)$ is the tail probability $\gamma$ quantile functional. A nonparametric estimator of ES is obtained by replacing the unknown distribution function $F$ in the $ES_\gamma(F)$ formula by the empirical distribution function $F_n$, which has a jump of size $1/n$ at each of the return values $r_1, r_2, \ldots, r_n$. For purposes of providing a formula for the ES estimator, we let $r_{(1)} \le r_{(2)} \le \cdots \le r_{(n)}$ be the ordered values of the observed returns, and let $\lceil x \rceil$ be the smallest integer greater than or equal to $x$. Then the nonparametric estimator $\widehat{ES}_\gamma$ formula is:
$$\widehat{ES}_\gamma = -\frac{1}{\lceil n\gamma \rceil}\sum_{i=1}^{\lceil n\gamma \rceil} r_{(i)} \tag{2.2}$$
In typical risk management applications, the choice of $\gamma$ will be 0.01, 0.025 or 0.05, i.e., 1%, 2.5% or 5%. For such applications, the ordered returns in the above summation are that small fraction of all returns which have the most negative single period returns, thereby resulting in a positive ES estimator. The greater such losses are, the greater will be the associated positive ES estimator.
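As a minimal sketch of the nonparametric estimator (2.2), the following Python function averages the smallest ordered returns and flips the sign. The function name, the sample returns, and the small floating-point tolerance inside the ceiling are illustrative assumptions, not part of the paper.

```python
import math


def nonparametric_es(returns, gamma=0.05):
    """Sign-flipped average of the ceil(n * gamma) smallest returns, as in (2.2)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    ordered = sorted(returns)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one return is required")
    # The tiny tolerance guards against products such as 0.07 * 100,
    # which evaluates to 7.000000000000001 in floating point.
    k = max(1, math.ceil(n * gamma - 1e-12))
    return -sum(ordered[:k]) / k


# Hypothetical returns, for illustration only.
sample = [-0.08, -0.03, -0.01, 0.00, 0.01, 0.02, 0.02, 0.03, 0.04, 0.06]
print(nonparametric_es(sample, gamma=0.2))  # uses the two most negative returns
```

Only the lowest ordered returns enter the estimate, so changing any of the other returns, while keeping it out of that lowest group, leaves the result unchanged.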
A parametric ES risk measure is obtained by replacing $F$ with a parametric distribution function $F_\theta$, where $\theta$ is a vector of the distribution parameters. In the case of a normal distribution, the parameters are µ and σ, and straightforward evaluation of the integral (2.1) results in the normal distribution parametric ES formula
$$ES_\gamma(\mu, \sigma) = -\mu + \sigma\,\frac{\phi(z_\gamma)}{\gamma} \tag{2.3}$$
where $\phi$ is the standard normal density function, and $z_\gamma$ is the standard normal distribution $\gamma$ quantile¹. In this case the parametric ES estimator is:
$$\widehat{ES}_\gamma = -\hat\mu + \hat\sigma\,\frac{\phi(z_\gamma)}{\gamma} \tag{2.4}$$
where $\hat\mu$ and $\hat\sigma$ are the normal distribution sample mean and sample standard deviation maximum likelihood estimators.
Likewise, for a t-distribution with parameters $(\mu, s, \nu)$, evaluation of the integral (2.1) results in the parametric t-distribution ES formula:
$$ES_\gamma(\mu, s, \nu) = -\mu + s\,k_\gamma(\nu) \tag{2.5}$$
where
$$k_\gamma(\nu) = \frac{f_\nu(q_\gamma(\nu))}{\gamma}\cdot\frac{\nu + q_\gamma^2(\nu)}{\nu - 1} \tag{2.6}$$
with $f_\nu$ the standard t-density with $\nu$ degrees of freedom, and $q_\gamma(\nu)$ the tail probability $\gamma$ quantile for the standard t-distribution with $\nu$ degrees of freedom². The parametric t-distribution ES estimator is obtained by replacing the parameters $(\mu, s, \nu)$ with their t-distribution maximum-likelihood estimates.
¹ See for example MZ2019 [6] or Jorion (2007) [7].
² Zhang (2016) [9] shows that the expression (2.5) is equivalent to the one in McNeil et al. (2015) [3].
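The parametric t-distribution ES can be evaluated numerically along the following lines. This is a sketch that assumes the standard closed form for the lower-tail mean of a t-distribution (the form given in McNeil et al. (2015) [3], rewritten for returns with risk as a positive quantity); the function name `t_es` and the parameter values are illustrative assumptions, and SciPy is assumed to be available.

```python
from scipy import stats


def t_es(mu, s, nu, gamma=0.05):
    """Parametric t-distribution ES with location mu, scale s and nu degrees of freedom."""
    if nu <= 1:
        raise ValueError("the t-distribution mean, and hence ES, requires nu > 1")
    q = stats.t.ppf(gamma, df=nu)  # tail probability gamma quantile of the standard t
    # Lower-tail factor: density at the quantile over gamma, times (nu + q^2) / (nu - 1).
    k = stats.t.pdf(q, df=nu) / gamma * (nu + q * q) / (nu - 1.0)
    return -mu + s * k


# Hypothetical parameter values, for illustration only.
print(t_es(mu=0.01, s=0.05, nu=6.0, gamma=0.05))
```

For very large degrees of freedom the result approaches the normal distribution ES, which gives a quick sanity check.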
The above nonparametric ES estimator is fundamentally different in character from the above parametric ES estimators, in that the former depends only on a small fraction of the most negative ordered returns, whereas the latter depend on all the returns through their parameter estimates. Furthermore, nonparametric ES is known to be a coherent risk measure, whereas parametric ES is not a coherent risk measure. In particular, parametric ES fails to satisfy the coherence monotonicity axiom. In data-oriented estimator terms, monotonicity means that if any return data value is decreased the risk should always increase or remain constant, and conversely. For the nonparametric risk estimator (2.2), this condition clearly holds since the decrease of any return among the lowest $100\gamma$ percent of the returns results in an increase in risk, and conversely, the decrease or increase of any return in the largest $100(1-\gamma)$ percent of the returns results in no change in the risk, provided the return remains in that group. On the other hand, monotonicity does not hold for the parametric normal distribution ES: while the decrease of a return value below the mean will increase both terms in Equation (2.3), a decrease of a large return above the mean can decrease the second term while increasing the first term, in such a way that the risk decreases.
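The monotonicity failure can be seen in a small numerical experiment. The sketch below uses hypothetical returns and illustrative function names: it makes one large gain even larger and compares the normal distribution parametric ES estimator with the nonparametric one.

```python
import math
from statistics import NormalDist, fmean, pstdev


def normal_es(returns, gamma=0.05):
    """Normal parametric ES estimator: -mean + sd * phi(z_gamma) / gamma, with MLE mean and sd."""
    z = NormalDist().inv_cdf(gamma)
    ratio = NormalDist().pdf(z) / gamma
    return -fmean(returns) + pstdev(returns) * ratio


def nonparametric_es(returns, gamma=0.05):
    """Sign-flipped average of the ceil(n * gamma) smallest returns."""
    ordered = sorted(returns)
    k = max(1, math.ceil(len(ordered) * gamma - 1e-12))
    return -sum(ordered[:k]) / k


# Hypothetical data: 19 returns from -0.09 to 0.09, plus one large gain.
base = [i / 100 for i in range(-9, 10)]
with_gain = base + [0.50]
with_bigger_gain = base + [1.00]

# The larger gain raises the sample standard deviation by more than it raises the mean,
# so the parametric risk estimate goes up even though no return got worse.
print(normal_es(with_gain), normal_es(with_bigger_gain))
# The nonparametric estimate depends only on the most negative return and does not move.
print(nonparametric_es(with_gain), nonparametric_es(with_bigger_gain))
```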
2.2. Nonparametric and Parametric Normal Distribution ES Influence Functions
The influence function was introduced by Hampel (1974) [8] as a basic tool for the study of robust parameter estimators³. As we shall see, the shape of a risk measure influence function provides an immediate intuitive understanding of the impact that gains and losses, positive and negative returns respectively, have on a parametric risk estimator. The influence function is defined using a mixture distribution of the form
$$F_\varepsilon = (1-\varepsilon)F + \varepsilon\,\delta_r$$
where $\delta_r$ is a point mass probability distribution located at $r$. The influence function of nonparametric $ES_\gamma$ is defined as the following directional (Gateaux) derivative of the ES functional at $F$ in the direction of the point mass distribution $\delta_r$:
$$IF(r; ES_\gamma, F) = \lim_{\varepsilon \to 0}\frac{ES_\gamma(F_\varepsilon) - ES_\gamma(F)}{\varepsilon} = \frac{d}{d\varepsilon}ES_\gamma(F_\varepsilon)\Big|_{\varepsilon=0}.$$
As such, it has the intuitive interpretation as the asymptotic form of the difference quotient for an ES estimator evaluated at a fixed set of $n$ returns $r_1, \ldots, r_n$ and at an augmented set of $n+1$ returns $r_1, \ldots, r_n, r$ with $r$ variable, divided by $1/(n+1)$.
Using the integral representation of $ES_\gamma(F)$ in Equation (2.1) it is shown in MZ2019 [6] that the nonparametric influence function of ES is
$$IF(r; ES_\gamma, F) = -\frac{r - q_\gamma(F)}{\gamma}\,I_{(-\infty,\,q_\gamma(F)]}(r) - q_\gamma(F) - ES_\gamma(F) \tag{2.7}$$
where $I_{(-\infty,\,q_\gamma(F)]}(r)$ is the indicator function of the interval $(-\infty, q_\gamma(F)]$. The above influence function expression can be evaluated for any distribution $F$, and the left-hand plot in Figure 1 displays the nonparametric ES influence function for the case where $F = F_{\mu,\sigma}$, with $F_{\mu,\sigma}$ a normal distribution with mean 0.12 and standard deviation 0.24. The resulting influence function has a striking piecewise-linear monotonic decreasing shape.
³ See Hampel et al. (1986) [10] for detailed discussions of influence functions, and their applications.
As for the influence function of the parametric normal distribution ES given by Equation (2.3), we first note that this ES functional representation $ES_\gamma(\mu(F), \sigma(F))$ is the following linear combination of the mean functional $\mu(F)$ and the standard deviation functional $\sigma(F)$:
$$ES_\gamma(\mu(F), \sigma(F)) = -\mu(F) + \frac{\phi(z_\gamma)}{\gamma}\,\sigma(F).$$
It follows that the influence function of parametric normal distribution ES is the corresponding linear combination of the influence functions of $\mu(F)$ and $\sigma(F)$. It is well known in the robust statistics literature that the influence function of the mean is $IF(r; \mu, F) = r - \mu$ and the influence function of the standard deviation is⁴:
$$IF(r; \sigma, F) = \frac{(r-\mu)^2 - \sigma^2}{2\sigma}.$$
Thus, the influence function of parametric normal distribution ES is
$$IF(r; ES_\gamma, \mu, \sigma) = -(r - \mu) + \frac{\phi(z_\gamma)}{\gamma}\cdot\frac{(r-\mu)^2 - \sigma^2}{2\sigma} \tag{2.8}$$
and it is displayed in the right-hand plot of Figure 1, with the same mean and standard deviation values as for the nonparametric ES influence function in the left-hand plot.
Figure 1. Nonparametric and parametric tail probability $\gamma$ ES influence functions for a normal distribution with mean 0.12 and standard deviation 0.24.
⁴ See for example Section 3 of Zhang, Martin and Christidis (2021) [11].
The shape of the parametric ES influence function is symmetric about the value $r = \mu + \sigma\gamma/\phi(z_\gamma)$, and in this regard is strikingly different from the desirable monotonic character of the nonparametric ES influence function, which is constant for returns above the tail probability $\gamma$ quantile and strictly increasing as the return decreases below that quantile. The parametric normal distribution influence function, by contrast, is quite non-monotonic, with increasing return values much larger than 0.24 indicating increased risk, which is quite unnatural.
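The two influence function shapes discussed above can be reproduced numerically. The following sketch evaluates the nonparametric ES influence function (2.7) at a normal distribution and the parametric normal ES influence function (2.8), using the mean 0.12 and standard deviation 0.24 of Figure 1. The tail probability 0.05 and the function names are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def if_nonparametric_es(r, mu=0.12, sigma=0.24, gamma=0.05):
    """Nonparametric ES influence function evaluated at a normal distribution F."""
    z = STD_NORMAL.inv_cdf(gamma)
    q = mu + sigma * z                               # gamma quantile of F
    es = -mu + sigma * STD_NORMAL.pdf(z) / gamma     # ES of F
    tail = -(r - q) / gamma if r <= q else 0.0       # active only at or below the quantile
    return tail - q - es


def if_normal_es(r, mu=0.12, sigma=0.24, gamma=0.05):
    """Parametric normal ES influence function: mean part plus standard deviation part."""
    ratio = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(gamma)) / gamma
    return -(r - mu) + ratio * ((r - mu) ** 2 - sigma ** 2) / (2.0 * sigma)


for r in (-0.6, -0.3, 0.0, 0.3, 0.6, 1.0):
    print(r, round(if_nonparametric_es(r), 4), round(if_normal_es(r), 4))
```

The first column of values is flat above the quantile and rises as returns fall below it, while the second is a parabola in r that rises again for large positive returns.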
3. Parametric Risk Measures and Influence Functions
This section first discusses general parametric risk measures and their general influence function formula, as well as a specific influence function formula for the case where maximum-likelihood parameter estimators are used. Then it presents the MZ2019 [6] influence function formula for parametric t-distribution expected shortfall.
3.1. General Parametric Risk Measures and Influence Functions
Let $\rho(\theta)$ be a parametric risk measure defined by a fixed parametric family of univariate distribution functions $F_\theta$ where $\theta = (\theta_1, \ldots, \theta_K)'$. It will suffice for our applications in this paper to assume that $F_\theta$ is continuous and strictly increasing with quantile function $F_\theta^{-1}$. One obtains a parametric risk measure estimator by first choosing a parameter estimator $\hat\theta_n$ of $\theta$ and then using it to obtain a risk estimator $\hat\rho = \rho(\hat\theta)$, where we have suppressed the subscript $n$ on $\hat\theta_n$ for notational convenience. We assume that $\hat\theta$ is based on independent and identically distributed (i.i.d.) returns $r_1, \ldots, r_n$, with a general distribution function that will ultimately be equal to a parametric family $F_\theta$ of t-distributions for our parametric ES maximum-likelihood estimator.
An estimator $\hat\theta$ is assumed to be obtained from a functional $\theta(F)$ by the plug-in rule
$$\hat\theta = \theta(F_n) \tag{3.1}$$
where $F_n$ is the empirical distribution function of a set of returns $r_1, \ldots, r_n$. Correspondingly, we represent the risk measure functional as
$$\rho(F) = \rho(\theta(F)) \tag{3.2}$$
and the risk measure estimator is
$$\hat\rho = \rho(\theta(F_n)). \tag{3.3}$$
We tacitly assume that the returns are i.i.d. with a parametric distribution $F_\theta$. Then when the usual Fisher consistency condition $\theta(F_\theta) = \theta$ holds, the estimator $\hat\theta$ will under reasonable conditions converge in probability to the true parametric distribution parameter $\theta$. Correspondingly, the risk measure estimator $\hat\rho$ will converge in probability to $\rho(\theta)$.
. Note however that, when the returns come from
then
will converge in probability to
, but typically
.
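For deriving parametric risk measure influence functions, the distribution function $F$ is replaced by the parametric distribution function $F_\theta$, which results in the parametric mixture distribution
$$F_{\varepsilon,\theta} = (1-\varepsilon)F_\theta + \varepsilon\,\delta_r. \tag{3.4}$$
Then, referring to the risk measure loosely by its estimator $\hat\rho$, the influence function of $\hat\rho$ is defined as
$$IF(r; \hat\rho, \theta) = \frac{d}{d\varepsilon}\,\rho(\theta(F_{\varepsilon,\theta}))\Big|_{\varepsilon=0}. \tag{3.5}$$
The parameter estimator vector influence function $IF(r; \hat\theta, \theta)$ has the components
$$IF(r; \hat\theta_k, \theta) = \frac{d}{d\varepsilon}\,\theta_k(F_{\varepsilon,\theta})\Big|_{\varepsilon=0}, \quad k = 1, \ldots, K. \tag{3.6}$$
With $\nabla\rho(\theta)$ the gradient of $\rho(\theta)$, where $\nabla = (\partial/\partial\theta_1, \ldots, \partial/\partial\theta_K)'$, the chain rule gives:
$$IF(r; \hat\rho, \theta) = \nabla\rho(\theta)'\,IF(r; \hat\theta, \theta). \tag{3.7}$$
The above is a general expression valid for any consistent parameter estimator $\hat\theta$ having an influence function $IF(r; \hat\theta, \theta)$. For a given parametric risk measure the above expression can be used for a variety of choices of parameter estimators, and for a given parameter estimator the expression can be used for a variety of parametric risk measures.
In the frequently occurring case where the parameter estimator is an MLE, it was shown in Hampel et al. (1986) [10] that $IF(r; \hat\theta, \theta)$ has the special form
$$IF(r; \hat\theta, \theta) = I(\theta)^{-1}\,\psi(r; \theta) \tag{3.8}$$
where $I(\theta)$ is the information matrix, and $\psi(r; \theta)$ is the score function vector⁵. In this case the IF formula (3.7) has the general form:
$$IF(r; \hat\rho, \theta) = \nabla\rho(\theta)'\,I(\theta)^{-1}\,\psi(r; \theta). \tag{3.9}$$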
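As a worked check of the MLE form (3.9), consider the normal distribution case, where everything is available in closed form. The derivation below is a sketch using the standard normal log-likelihood scores and information matrix for the parameters $(\mu, \sigma)$; the notation is chosen here for illustration. It recovers the parametric normal ES influence function of Section 2.2.

```latex
\begin{align*}
ES_\gamma(\mu,\sigma) &= -\mu + \sigma\,\frac{\phi(z_\gamma)}{\gamma},
 \qquad
 \nabla ES_\gamma = \Bigl(-1,\ \frac{\phi(z_\gamma)}{\gamma}\Bigr)' \\[4pt]
\psi(r;\mu,\sigma) &= \Bigl(\frac{r-\mu}{\sigma^{2}},\ \frac{(r-\mu)^{2}-\sigma^{2}}{\sigma^{3}}\Bigr)',
 \qquad
 I(\mu,\sigma) = \operatorname{diag}\Bigl(\frac{1}{\sigma^{2}},\ \frac{2}{\sigma^{2}}\Bigr) \\[4pt]
I(\mu,\sigma)^{-1}\,\psi(r;\mu,\sigma) &= \Bigl(r-\mu,\ \frac{(r-\mu)^{2}-\sigma^{2}}{2\sigma}\Bigr)' \\[4pt]
\nabla ES_\gamma'\,I^{-1}\,\psi &= -(r-\mu) + \frac{\phi(z_\gamma)}{\gamma}\cdot\frac{(r-\mu)^{2}-\sigma^{2}}{2\sigma}
\end{align*}
```

The third line reproduces the well-known influence functions of the sample mean and standard deviation, and the last line is a quadratic in r, which is the source of the symmetric, non-monotonic shape.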
3.2. Parametric t-Distribution Expected Shortfall Influence Function
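For a parametric t-distribution ES based on maximum-likelihood parameter estimates, the formula (3.9) becomes
$$IF(r; ES_\gamma, \theta) = \nabla ES_\gamma(\theta)'\,I(\theta)^{-1}\,\psi(r; \theta) \tag{3.10}$$
where $\theta = (\mu, s, \nu)'$, $\nabla ES_\gamma(\theta)$ is the gradient vector of $ES_\gamma(\mu, s, \nu)$ given by Equations (2.5) and (2.6), $I(\theta)$ is the Fisher information matrix, and $\psi(r; \theta)$ is the t-distribution score function. The risk measure gradient vector is given by
$$\nabla ES_\gamma(\theta) = \Bigl(-1,\ \frac{\partial ES_\gamma}{\partial s},\ \frac{\partial ES_\gamma}{\partial \nu}\Bigr)'$$
with the partial derivative with respect to $\nu$ approximated by a finite difference quotient. The t-distribution score vector is
$$\psi(r; \theta) = \bigl(\psi_\mu(r; \theta),\ \psi_s(r; \theta),\ \psi_\nu(r; \theta)\bigr)' \tag{3.11}$$
where, with $z = (r - \mu)/s$, the location, scale and degrees of freedom score functions obtained by differentiating the t-distribution log-density are
$$\psi_\mu(r; \theta) = \frac{1}{s}\cdot\frac{(\nu+1)\,z}{\nu + z^2}, \qquad \psi_s(r; \theta) = \frac{1}{s}\Bigl[\frac{(\nu+1)\,z^2}{\nu + z^2} - 1\Bigr],$$
$$\psi_\nu(r; \theta) = \frac{1}{2}\Bigl[\Psi\Bigl(\frac{\nu+1}{2}\Bigr) - \Psi\Bigl(\frac{\nu}{2}\Bigr) - \frac{1}{\nu} - \log\Bigl(1 + \frac{z^2}{\nu}\Bigr) + \frac{(\nu+1)\,z^2}{\nu\,(\nu + z^2)}\Bigr],$$
and $\Psi$ is the digamma function. The formula for the information matrix $I(\theta)$ was derived by Lucas (1997) [12] and may be found in MZ2019 [6].
Note that $IF(r; ES_\gamma, \theta)$ depends on $r$ solely through its appearance in the three components $\psi_\mu$, $\psi_s$, $\psi_\nu$ of the score function vector, and for large $|r|$ this dependence is symmetric about μ. In this regard, $IF(r; ES_\gamma, \theta)$ has a quite undesirable non-monotonic, approximately symmetric behavior similar to that of the parametric normal distribution influence function in Equation (2.8) and the right-hand plot of Figure 1. This behavior is displayed in the left-hand plot of Figure 2, which is discussed further in the next section in comparison with an alternative parametric t-distribution ES M-estimator with semi-scale, and its influence function.
⁵ With $\ell(r; \theta) = \log f_\theta(r)$ the log-likelihood for a single observation, the vector score function $\psi(r; \theta)$ has components $\psi_k(r; \theta) = \partial \ell(r; \theta)/\partial\theta_k$, and the K × K information matrix $I(\theta)$ has elements $I_{jk}(\theta) = E[\psi_j(r; \theta)\,\psi_k(r; \theta)]$.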
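The symmetry claim can be checked numerically. The sketch below codes the three t-distribution log-likelihood score components for the location, scale and degrees of freedom parameters, obtained here by differentiating the t log-density directly, and verifies them against finite differences of SciPy's log-density. The function names and parameter values are illustrative assumptions.

```python
import math

from scipy import stats
from scipy.special import digamma


def t_scores(r, mu, s, nu):
    """Location, scale and degrees-of-freedom scores of the t log-density at return r."""
    z = (r - mu) / s
    w = (nu + 1.0) / (nu + z * z)
    psi_mu = w * z / s
    psi_s = (w * z * z - 1.0) / s
    psi_nu = 0.5 * (digamma((nu + 1.0) / 2.0) - digamma(nu / 2.0) - 1.0 / nu
                    - math.log1p(z * z / nu) + w * z * z / nu)
    return psi_mu, psi_s, float(psi_nu)


def numeric_scores(r, mu, s, nu, h=1e-6):
    """Central finite differences of the log-density, used only as a check."""
    def logf(m, sc, df):
        return stats.t.logpdf(r, df=df, loc=m, scale=sc)
    d_mu = (logf(mu + h, s, nu) - logf(mu - h, s, nu)) / (2.0 * h)
    d_s = (logf(mu, s + h, nu) - logf(mu, s - h, nu)) / (2.0 * h)
    d_nu = (logf(mu, s, nu + h) - logf(mu, s, nu - h)) / (2.0 * h)
    return d_mu, d_s, d_nu


# Hypothetical parameter values, for illustration only.
mu, s, nu = 0.01, 0.05, 6.0
print(t_scores(mu - 0.5, mu, s, nu))
print(t_scores(mu + 0.5, mu, s, nu))
```

The scale and degrees-of-freedom scores are even functions of r − μ, and the location score is odd but shrinks toward zero for large |r − μ|, which is why large gains and large losses end up with nearly the same influence.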
4. Parametric t-Distribution Semi-Scale Expected Shortfall M-Estimator and Influence Function
In this Section we derive a semi-scale modified version of the parametric t-distribution expected shortfall estimator, whose influence function does not suffer from the approximately symmetric behavior of the parametric t-distribution expected shortfall exhibited in the previous Section. Since the semi-scale parameter estimator is not an MLE, the MLE results of the previous Section no longer apply. However, the following closely related M-estimator, as discussed in Hampel et al. (1986) [10] can be used in place of an MLE.
A general M-estimator functional $\theta(F)$ is defined as the solution for $\theta$ of the vector equation
$$E_F[\psi(r; \theta)] = \int \psi(r; \theta)\,dF(r) = 0 \tag{4.1}$$
where $\psi(r; \theta)$ is a vector-valued general “score function”, which depends on a vector parameter $\theta$ and a scalar return variable $r$⁶. An M-estimator $\hat\theta$ is obtained from a set of returns $r_1, \ldots, r_n$ by replacing $F$ in (4.1) with the empirical distribution $F_n$:
$$\frac{1}{n}\sum_{i=1}^{n}\psi(r_i; \hat\theta) = 0. \tag{4.2}$$
⁶ The above equation becomes an MLE estimating equation for the special choice $\psi(r; \theta) = \partial \log f_\theta(r)/\partial\theta$, but here we move beyond the MLE.
For our application where the distribution of returns is in a parametric family $F_\theta$, we assume the Fisher consistency condition
$$E_{F_\theta}[\psi(r; \theta)] = 0. \tag{4.3}$$
In other words, the expected value of the score function at the true parameter value is zero. Correspondingly, an M-estimator defined by (4.2) converges in probability to a solution of the asymptotic estimating Equation (4.3).
For many estimation problems $\psi(r; \theta)$ may be defined as the derivative $\partial L(r; \theta)/\partial\theta$ of a loss function $L(r; \theta)$. In particular, the (negative) log-likelihood function is one such choice.
It is shown in Hampel et al. (1986) [10] that the influence function of an M-estimator is given by
$$IF(r; \hat\theta, \theta) = M(\theta)^{-1}\,\psi(r; \theta) \tag{4.4}$$
where the M-matrix $M(\theta)$ is the K × K matrix
$$M(\theta) = -E_{F_\theta}\Bigl[\frac{\partial \psi(r; \theta)}{\partial \theta'}\Bigr]. \tag{4.5}$$
Figure 2. t-Distribution parametric symmetric and semi-scale ES influence functions with tail probability $\gamma$ for a t-distribution with $\nu = 10$ degrees of freedom, mean 0.12 and standard deviation 0.24.
Combining (3.7) and (4.4) we have the following general expression for the influence function of a parametric risk measure based on a general M-estimator of the unknown parameters:
$$IF(r; \hat\rho, \theta) = \nabla\rho(\theta)'\,M(\theta)^{-1}\,\psi(r; \theta) \tag{4.6}$$
Note that the gradient vector in the above expression depends only on the risk measure chosen and the M-estimator functional that represents the asymptotic value of the M-estimator. For parametric distributions and consistent estimators, this gradient only depends on the parametric risk measure and the distribution parameters. However, for any given distribution the influence function part of the above expression depends only on the choice of M-score function
.
We seek to modify the t-distribution MLE score function of the previous Section so that the scale score function is a semi-scale score function, and the expected value of the resulting M-score function is still zero. One way to do this is as follows. Define the semi-scale score function by
(4.7)
where the location score function
is the same as for a t-distribution MLE as given by (3.11), and
is the semi-scale score function
(4.8)
where the constant c remains to be determined.
In order to ensure consistency of the M-estimator defined by the above M-score function, the latter must satisfy the zero-expectation condition (4.3) under a t-distribution for returns. This is already the case for
, and we now show this is also the case for
.
In order to see that the expected value of
is zero, note that from the t-distribution MLE score function (3.11) we have
(4.9)
Since the expected value of
is zero the sum of the first two terms on the right is one, and from the symmetry of the t-distribution about μ the expected value of each of these first two terms must be equal, and hence equal to one-half. Thus
Now we just need to choose c so that the M-score
for degrees of freedom has zero expected value. First note that we require that
which means that
. (4.10)
Noting that
(4.11)
along with the symmetry of the t-distribution about μ shows that the expected value of the last two terms must be equal, and thus
(4.12)
Plugging (4.12) into (4.10) gives
Thus, we have⁷
. (4.13)
⁷ We note that this choice of M-score function is not unique. Another valid choice would be
.
However, it is easily verified that this choice of scale score function is discontinuous at
and it is a basic principle in robust statistics that discontinuous influence functions are to be avoided.
Now to get the expression for the parameter estimator influence function in (4.4), we just need to evaluate the M-matrix
(4.14)
where the expectations are taken with
. Straightforward but tedious calculations (see Appendix A in an early draft version of this paper available at SSRN, https://ssrn.com/abstract=4605604) show that
which has the form
(4.15)
with inverse
(4.16)
The left-hand plot in Figure 2 displays the parametric t-distribution ES estimator influence function for a t-distribution with 10 degrees of freedom, mean = 0.12 and standard deviation = 0.24. Except for the curious negative bump for small positive values of the return r, this influence function is essentially symmetric about μ = 0.12, as we pointed out in the last paragraph of Section 3. Thus, the parametric t-distribution ES influence function suffers from the lack of monotonicity in a manner similar to that of the parametric normal distribution ES estimator influence function in the right-hand plot of Figure 1.
On the other hand, the right-hand plot in Figure 2 displays the parametric t-distribution semi-scale M-estimator influence function, with the same parameters as for the influence function in the left-hand plot. But now, except for the curious negative bump for small negative values of the return r, this influence function has the desirable monotonic decreasing character similar to that of the nonparametric ES influence function in the left-hand plot of Figure 1.
Figure 3. Influence functions of ES semi-scale M-estimators with monthly
, and annual SR = 0.5
Figure 3 gives a good feeling for how the shape of the parametric t-distribution ES M-estimator influence function changes with changes in the t-distribution degrees of freedom and ES tail probability, for the four degrees of freedom parameter $\nu$ values 20, 10, 6, 3, and the three tail probability parameter $\gamma$ values 1%, 2.5%, 5%. These influence functions all have the desirable shape that there is essentially zero influence for positive values of return r, and positive influence that increases rapidly for decreasing negative return values. As one expects, the positive values of the influence function increase for each fixed negative r as the tail probability $\gamma$ decreases. The behavior of the shape of the influence functions for negative returns close to 0 as the degrees of freedom decrease from 20 to 3 is more subtle, with the shape being slightly convex for
and slightly concave for
. The latter is related to the fact, demonstrated in MZ2019 [6] , that the t-distribution maximum-likelihood ES influence function is logarithmically unbounded.
5. Implementation of the Parametric Semi-Scale ES Estimator
Now we propose to construct an ES semi-scale M-estimator for risk monitoring, with the following straightforward steps.
(1) First compute the t-distribution MLE estimates $(\hat\mu, \hat s, \hat\nu)$, for example using the Azzalini sn package [13] available on CRAN (https://cran.r-project.org/web/packages/sn/index.html).
(2) Compute the semi-scale parameter estimate as follows. Plug the $\hat\mu$ and $\hat\nu$ MLE’s from step one into the semi-scale score function
. Then the semi-scale parameter estimate
can be computed by solving the equation:
Note that the above summation has the form
and to compute
, one just needs to solve the following equation whose left-hand side is strictly monotonic in s:
.
Any simple root-finding algorithm will suffice, for example the Newton-Raphson solver in the rootSolve package available on CRAN (https://CRAN.R-project.org/package=rootSolve).
(3) Finally, plug
into the parametric t-distribution ES expression (2.5) to obtain the ES semi-scale M-estimator:
It remains to carry out some empirical studies of the performance of this risk estimator.
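Step (2) above can be sketched in a few lines of Python. The exact semi-scale score function of Section 4 is not reproduced here, so the code assumes a one-sided version of the t-distribution scale score: the weight $(\nu+1)z^2/(\nu+z^2)$ is applied only to returns at or below $\hat\mu$ and doubled, so that its expected value under a symmetric t-distribution equals one. That form, the function names, the simulated data, and the use of simple bisection in place of a packaged solver are all assumptions for illustration. The location and degrees of freedom values are taken as given, standing in for the step (1) estimates.

```python
import math
import random


def semi_scale_equation(s, returns, mu, nu):
    """Sample average of the assumed one-sided scale score (times s); decreasing in s."""
    total = 0.0
    for r in returns:
        if r < mu:  # returns at or above mu contribute nothing
            d2 = (r - mu) ** 2
            total += 2.0 * (nu + 1.0) * d2 / (nu * s * s + d2)
    return total / len(returns) - 1.0


def solve_semi_scale(returns, mu, nu, iterations=200):
    """Solve semi_scale_equation(s) = 0 for s by bracketing and bisection."""
    n_below = sum(1 for r in returns if r < mu)
    if 2.0 * (nu + 1.0) * n_below <= len(returns):
        raise ValueError("too few returns below mu for a solution to exist")
    lo = hi = max(mu - r for r in returns if r < mu)
    while semi_scale_equation(lo, returns, mu, nu) < 0.0:
        lo /= 2.0
    while semi_scale_equation(hi, returns, mu, nu) > 0.0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if semi_scale_equation(mid, returns, mu, nu) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Simulated t-distributed returns with hypothetical parameters, for illustration only.
rng = random.Random(1)
mu_true, s_true, nu_true = 0.01, 0.05, 6.0
sim = [mu_true + s_true * rng.gauss(0.0, 1.0)
       / math.sqrt(rng.gammavariate(nu_true / 2.0, 2.0) / nu_true)
       for _ in range(5000)]
s_semi = solve_semi_scale(sim, mu_true, nu_true)
print(s_semi)  # should be in the neighborhood of s_true
```

Because the left-hand side is strictly monotonic in s, bisection is guaranteed to converge once a bracket is found. Step (3) then plugs the location estimate, this semi-scale value and the degrees of freedom estimate into the t-distribution ES expression (2.5).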
6. ES Semi-Scale M-Estimator Asymptotic Variance
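The asymptotic variance of a consistent M-estimator $\hat\theta$ of $\theta$ has the form
$$V(\theta) = M(\theta)^{-1}\,Q(\theta)\,\bigl(M(\theta)^{-1}\bigr)' \tag{6.1}$$
where
$$Q(\theta) = E_{F_\theta}\bigl[\psi(r; \theta)\,\psi(r; \theta)'\bigr]. \tag{6.2}$$
See for example Hampel et al. (1986) [10]. In the special case of MLE’s both $M(\theta)$ and $Q(\theta)$ reduce to the information matrix $I(\theta)$ and the expression (6.1) reduces to $I(\theta)^{-1}$ as expected.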
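The sandwich form (6.1) and its MLE special case are easy to check numerically. The sketch below assumes NumPy is available and uses the standard delta-method step, gradient' V gradient / n, to turn the parameter covariance into a standard error for a risk measure. The function names are illustrative, and the normal distribution is used as the test case because its information matrix is known in closed form; the t-distribution matrices of this section are not reproduced.

```python
from statistics import NormalDist

import numpy as np


def sandwich_covariance(M, Q):
    """Asymptotic covariance M^{-1} Q (M^{-1})' of an M-estimator."""
    M_inv = np.linalg.inv(M)
    return M_inv @ Q @ M_inv.T


def risk_standard_error(grad, M, Q, n):
    """Delta-method standard error of a parametric risk estimator from n returns."""
    V = sandwich_covariance(M, Q)
    return float(np.sqrt(grad @ V @ grad / n))


# Normal-distribution MLE check: M and Q both equal the information matrix.
sigma, gamma, n = 0.24, 0.05, 120
info = np.diag([1.0 / sigma**2, 2.0 / sigma**2])
ratio = NormalDist().pdf(NormalDist().inv_cdf(gamma)) / gamma
grad = np.array([-1.0, ratio])  # gradient of the normal ES with respect to (mu, sigma)

print(sandwich_covariance(info, info))          # equals the inverse information matrix
print(risk_standard_error(grad, info, info, n))
```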
Straightforward but tedious derivations (see Appendix B in an early draft version of this paper available at SSRN, https://ssrn.com/abstract=4605604) give the following expressions:
(6.3)
(6.4)
(6.5)
(6.6)
(6.7)
(6.8)
Since our t-distribution ES Semi-scale M-estimator is a small modification of the t-distribution ES MLE, one expects that the increase in the asymptotic variance of this estimator, relative to a t-distribution MLE, will not be very great. Figure 4 and Figure 5, which are based on standard errors (SE’s) obtained as the square root of the asymptotic variances of the ES semi-scale estimator using 6 degrees of freedom, indicate that indeed the increase in variance is not very great.
Figure 4. Asymptotic standard error of t-distribution ES semi-scale M-estimator and the asymptotic standard error of the ES MLE for a t-distribution with
.
Figure 5. Ratio of standard errors of t-distribution ES estimators: semi-scale M-estimator versus maximum likelihood ES estimator for 5 degrees of freedom.
7. Concluding Comments
We have introduced a new ES Semi-Scale M-Estimator by replacing the scale estimator of an ES t-distribution joint MLE of the location, scale, and degrees of freedom with a semi-scale estimator, and we derived the new estimator’s influence function and asymptotic variance formula. The mathematical form of the new estimator’s estimating equation and influence function shows that the estimator avoids the unsatisfactory behavior of the t-distribution ES MLE, for which (large) positive returns indicate (large) risk.
Since an ES semi-scale M-estimator influence function is not exactly monotonic decreasing as returns increase, we cannot assert that this ES is a coherent risk measure, as is a mean semi-deviation estimator. However, the fact that such influence functions are nearly monotonic decreasing suggests that ES semi-scale M-estimators will have good properties for risk reporting. What is needed now is an in-depth empirical study of the relative performance of the ES semi-scale M-estimators and the mean semi-deviation coherent risk measures, using both simulated and real returns whose distributions range from approximately normal to moderately fat-tailed (e.g., t-distributions with 8 to 12 degrees of freedom), and to very fat-tailed distributions (e.g., t-distributions with 3 to 6 degrees of freedom).