Expected Shortfall Semi-Scale T-Distribution M-Estimator

R. Douglas Martin; Shengyu Zhang

doi:10.4236/jmf.2023.134029

Journal of Mathematical Finance > Vol.13 No.4, November 2023

R. Douglas Martin¹, Shengyu Zhang²
¹Department of Applied Mathematics and Statistics, University of Washington, Seattle, USA.
²BECU, Tukwila, Washington, USA.

Abstract

The influence function of parametric t-distribution expected shortfall (ES) estimators has an approximately symmetric shape, for which large positive returns indicate large losses. We avoid this risk estimator’s unacceptable feature by introducing an ES semi-scale M-estimator for t-distributions, for which the usual t-distribution scale parameter is replaced by a semi-scale parameter. We derive the influence function of the ES semi-scale M-estimator, and show that its influence function has large values only for large negative returns as one expects, and only very small typically negative values for positive returns. The computation of an ES semi-scale M-estimator is shown to be a simple modification of a parametric t-distribution ES maximum-likelihood estimator (MLE), in which the scale MLE is replaced by a semi-scale estimator. We also derive the asymptotic variance expression for the ES semi-scale M-estimator, and show that its standard error is not very much larger than that of the t-distribution ES maximum-likelihood estimator.

Keywords

Risk, Expected Shortfall, Semi-Scale, MLE, M-Estimator, Influence Function

Share and Cite:

Martin, R. and Zhang, S. (2023) Expected Shortfall Semi-Scale T-Distribution M-Estimator. Journal of Mathematical Finance, 13, 483-500. doi: 10.4236/jmf.2023.134029.

1. Introduction

The early development of portfolio risk management focused on the Value-at-Risk (VaR) measure of risk, whose popularity owes much to the JP Morgan RiskMetrics document (1996) [1]. Subsequently, a set of risk measure coherence axioms was introduced by Artzner et al. (1999) [2], who showed that VaR does not satisfy these axioms, and that the expected shortfall (ES) measure of the average losses beyond the VaR is a coherent risk measure. See also Sections 2.3 and 8.1 of McNeil et al. (2015) [3]. Over the years, ES has become an increasingly popular risk measure, in part due to its properties for portfolio optimization established by Rockafellar and Uryasev (2000) [4], who used the term CVaR in place of ES, and in part due to its support by Basel regulatory guidelines for use by banks.

There are two basic variants of ES estimators: nonparametric ES estimators, which do not depend on returns distribution assumptions, and parametric ES estimators, which depend on an assumed returns distribution. Nonparametric ES estimators are the average of a small fraction of the smallest ordered portfolio returns, and the ES risk measure that gives rise to such an estimator is known to satisfy the risk coherence axioms. The parametric normal distribution ES risk measure is a linear combination of the distribution mean and the distribution standard deviation, and this linear combination is not a coherent risk measure because it fails to satisfy the coherence monotonicity axiom, which states that if random return R₁ is greater than or equal to random return R₂, then the risk for R₁ is less than or equal to the risk for R₂. In terms of the ES risk estimator obtained by replacing the distribution mean and standard deviation with sample estimates, this failure manifests itself in the fact that one large positive return can give rise to an increase in ES risk, which is quite non-intuitive. For a reasonable risk measure, increasing returns should result in decreasing risk. Fischer (2003) [5] showed that a simple fix to the lack of coherence of parametric normal distribution ES is to replace the standard deviation by the lower standard semi-deviation, “semi-deviation” for short.

The fact that returns are often non-normal led to a literature focus on parametric ES for fat-tailed non-normal distributions, the simplest of which are t-distributions. Furthermore, parametric ES estimators based on maximum-likelihood parameter estimators (MLEs) have the attractive feature that they achieve the minimum large-sample variance when returns in fact have the assumed non-normal distribution. With this in mind, Martin and Zhang (2019) [6], henceforth MZ2019 [6], studied the comparative behavior of both parametric normal and parametric t-distribution ES risk measures and estimators, in terms of the behavior of their influence functions, which are extensively treated in the robust statistics literature. It turns out that the influence functions of both parametric normal and parametric t-distribution ES estimators based on parameter MLEs are approximately symmetric functions of a return, which reflects the failure of these risk measures to satisfy the coherence monotonicity axiom.

In MZ2019 [6] it is shown that the monotonicity of a risk measure influence function is a sufficient condition to ensure that the risk measure satisfies the coherence monotonicity axiom. Thus, our goal was to modify the parametric t-distribution ES in such a way that its influence function is monotonic. Motivated by the Fischer (2003) [5] result that replacement of the standard deviation by semi-deviation results in a coherent mean semi-deviation risk measure, we will show that replacement of the t-distribution MLE scale estimator by a semi-scale estimator results in ES influence functions that are quite close to being monotonic, the more so for t-distribution tail probabilities that are larger than commonly recommended values.

The remainder of the paper is organized as follows. Section 2 briefly reviews nonparametric and normal distribution expected shortfall, and their influence functions. Section 3 introduces general parametric risk measures and influence functions, and discusses the t-distribution special case. Section 4 introduces a new t-distribution ES Semi-Scale M-estimator, derives its influence function formula, and displays its typical influence function shape. Section 5 describes a practical implementation of the ES Semi-Scale M-estimator. Section 6 derives the asymptotic variance formula for the ES Semi-Scale M-estimator and shows that, as a function of the ES tail probability, its asymptotic standard errors are not very much greater than those of a t-distribution ES maximum likelihood estimator. Section 7 contains concluding comments. The detailed derivation of some mathematical results may be found in the Appendix of an early draft version of this paper, which is available at SSRN (https://ssrn.com/abstract=4605604). An early version of MZ2019 [6] , which includes its Appendices, may be found at SSRN (https://ssrn.com/abstract=2747179).

2. Nonparametric and Normal Distribution Parametric Expected Shortfall and Influence Functions

This section reviews some basic material about nonparametric and normal distribution parametric expected shortfall, and their influence functions, which is discussed in more detail in MZ2019 [6].

2.1. Nonparametric and Normal Distribution Parametric Expected Shortfall Formulas

The integral form of expected shortfall with risk as a positive quantity is

$E S_{γ} (F) = - \frac{1}{γ} \int_{r \leq q_{γ} (F)} r \cdot d F (r)$ (2.1)

where F is an unrestricted returns distribution function and $q_{γ} (F)$ is the tail probability $γ$ quantile functional. A nonparametric estimator of ES is obtained by replacing the unknown distribution function F in the $E S_{γ} (F)$ formula by the empirical distribution function $F_{n} (r)$, which has a jump of size $n^{- 1}$ at each of the return values $r_{1}, r_{2}, \dots, r_{n}$. For purposes of providing a formula for the ES estimator, we let $r_{(1)} \leq r_{(2)} \leq \dots \leq r_{(n)}$ be the ordered values of the observed returns, and let $⌈ x ⌉$ be the smallest integer greater than or equal to x. Then the nonparametric estimator $\hat{E S_{γ}}$ formula is:

$\hat{E S_{γ}} = - \frac{1}{⌈ n γ ⌉} \sum_{i = 1}^{⌈ n γ ⌉} r_{(i)}$ (2.2)

In typical risk management applications, the choice of $γ$ will be 0.01, 0.025 or 0.05, i.e., 1%, 2.5% or 5%. For such applications, the ordered returns in the above summation are that small fraction of all returns which have the most negative single period returns, thereby resulting in a positive ES estimator. The greater such losses are, the greater will be the associated positive ES estimator.
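As an illustration of estimator (2.2), the following is a minimal sketch in Python using only the standard library. The function name `nonparametric_es` and the sample returns are hypothetical and are not taken from the paper.

```python
import math

def nonparametric_es(returns, gamma):
    """Nonparametric ES of equation (2.2): minus the mean of the
    ceil(n * gamma) smallest returns."""
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    ordered = sorted(returns)                 # r_(1) <= r_(2) <= ... <= r_(n)
    # Caveat: n * gamma is a floating-point product, so for some (n, gamma)
    # pairs the ceiling can be one larger than the exact-arithmetic value.
    k = math.ceil(len(ordered) * gamma)       # number of tail observations
    return -sum(ordered[:k]) / k

# Hypothetical returns, for illustration only.
sample_returns = [-0.12, -0.08, -0.03, -0.01, 0.00, 0.01, 0.02, 0.02, 0.03, 0.04,
                  0.05, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.15]

es_5pct = nonparametric_es(sample_returns, 0.05)    # uses the single worst return
es_10pct = nonparametric_es(sample_returns, 0.10)   # averages the two worst returns
```

With 20 observations the 5% estimate depends on a single return, which illustrates how few data points drive the nonparametric estimator at typical tail probabilities.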

A parametric ES risk measure is obtained by replacing F with a parametric distribution function $F_{θ}$ where $θ$ is a vector of the distribution parameters. In the case of a normal distribution, the parameters are µ and σ, and straight-forward evaluation of the integral (2.1) results in the normal distribution parametric ES formula

$E S_{γ} (μ, σ) = - μ + \frac{ϕ (z_{γ})}{γ} \cdot σ$ (2.3)

where $ϕ$ is the standard normal density function, and $z_{γ}$ is the standard normal distribution $γ$ quantile¹. In this case the parametric ES estimator is:

$E S_{γ} (\hat{μ}, \hat{σ}) = - \hat{μ} + \frac{ϕ (z_{γ})}{γ} \cdot \hat{σ}$ (2.4)

where $\hat{μ}$ and $\hat{σ}$ are the normal distribution sample mean and sample standard deviation maximum likelihood estimators.

Likewise, for a t-distribution with parameters $θ = (μ, s, ν)$, evaluation of the integral (2.1) results in the parametric t-distribution ES formula:

$E S_{γ} (μ, s, ν) = - μ + \frac{g_{γ, ν}}{γ} \cdot s$ (2.5)

where

$g_{γ, ν} ≜ \sqrt{\frac{ν}{ν - 2}} \cdot f_{ν - 2} (\sqrt{\frac{ν - 2}{ν}} \cdot q_{γ, ν})$ (2.6)

with $f_{ν - 2}$ the standard t-density with $ν - 2$ degrees of freedom, and $q_{γ, ν}$ the tail probability $γ$ quantile for the standard t-distribution with $ν$ degrees of freedom². The parametric t-distribution ES estimator is obtained by replacing the parameters $μ, s, ν$ with their t-distribution maximum-likelihood estimates.

¹ See for example MZ2019 [6] or Jorion (2007) [7].

² Zhang (2016) [9] shows that the expression (2.5) is equivalent to the one in McNeil et al. (2015) [3].
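Formulas (2.5) and (2.6) can be evaluated with a short script. The sketch below uses only the Python standard library, so the t-distribution quantile is obtained by bisection on a Simpson-rule CDF. This is a stand-in for a library quantile function, not the authors' implementation, and all function names are hypothetical.

```python
import math

def t_pdf(x, nu):
    """Density of the standard t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, n=2000):
    """CDF by composite Simpson's rule on [0, |x|], using symmetry about 0."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / n                                   # n must be even
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(gamma, nu, lo=-100.0, hi=0.0):
    """Lower-tail quantile q_{gamma,nu} by bisection; assumes 0 < gamma < 0.5
    and that the quantile lies in [lo, hi]."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g_factor(gamma, nu):
    """g_{gamma,nu} of equation (2.6); requires nu > 2."""
    q = t_quantile(gamma, nu)
    scale = math.sqrt(nu / (nu - 2))
    return scale * t_pdf(q / scale, nu - 2)     # q / scale = sqrt((nu-2)/nu) * q

def t_es(mu, s, nu, gamma):
    """Parametric t-distribution ES of equation (2.5)."""
    return -mu + g_factor(gamma, nu) / gamma * s

es_standard_t10 = t_es(mu=0.0, s=1.0, nu=10, gamma=0.05)
```

A useful cross-check is the McNeil et al. form of the same factor, $f_{ν} (q_{γ, ν}) \cdot (ν + q_{γ, ν}^{2}) / (ν - 1)$, which should agree with `g_factor` up to rounding error.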

The above nonparametric ES estimator is fundamentally different in character from the above parametric ES estimators, in that the former depends only on a small fraction of the most negative ordered returns, whereas the latter depend on all the returns through their parameter estimates. Furthermore, nonparametric ES is known to be a coherent risk measure, whereas parametric ES is not. In particular, parametric ES fails to satisfy the coherence monotonicity axiom. In data-oriented estimator terms, monotonicity means that if any return data value is decreased, the risk should increase or remain constant, and conversely. For the nonparametric risk estimator (2.2), this condition clearly holds, since the decrease of any return among the lowest fraction $γ$ of the returns results in an increase in risk, whereas a decrease or increase of any return that remains within the largest fraction $1 - γ$ of the returns results in no change in the risk. On the other hand, monotonicity does not hold for the parametric normal distribution ES estimator: while the decrease of a return value below the mean will increase both terms of the estimator (2.4), a decrease of a large return above the mean increases the first term while decreasing the second term, and can do so in such a way that the risk decreases.
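The failure of monotonicity can be seen on a small made-up data set. In the sketch below (standard-library Python, hypothetical function names and data), only the largest return is increased: the nonparametric estimate (2.2) is unchanged, while the parametric normal estimate (2.4) goes up.

```python
import math
from statistics import NormalDist, fmean, pstdev

def normal_es_estimate(returns, gamma):
    """Parametric normal ES estimator (2.4), using the sample mean and the
    maximum-likelihood (divide-by-n) standard deviation."""
    z = NormalDist().inv_cdf(gamma)
    return -fmean(returns) + NormalDist().pdf(z) / gamma * pstdev(returns)

def nonparametric_es(returns, gamma):
    """Nonparametric ES estimator (2.2)."""
    ordered = sorted(returns)
    k = math.ceil(len(ordered) * gamma)
    return -sum(ordered[:k]) / k

gamma = 0.05
# Hypothetical returns, for illustration only.
base = [-0.10, -0.05, -0.02, 0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
bumped = base[:-1] + [0.60]          # only the largest return is increased

normal_before = normal_es_estimate(base, gamma)
normal_after = normal_es_estimate(bumped, gamma)
nonpar_before = nonparametric_es(base, gamma)
nonpar_after = nonparametric_es(bumped, gamma)

print(f"normal ES:        {normal_before:.4f} -> {normal_after:.4f}")
print(f"nonparametric ES: {nonpar_before:.4f} -> {nonpar_after:.4f}")
```

The increase in the parametric estimate comes entirely from the standard deviation term, which grows faster than the mean term falls when a single large gain is added.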

2.2. Nonparametric and Parametric Normal Distribution ES Influence Functions

The influence function was introduced by Hampel (1974) [8] as a basic tool for the study of robust parameter estimators³. As we shall see, the shape of a risk measure influence function provides an immediate intuitive understanding of the impact that gains and losses, positive and negative returns respectively, have on a parametric risk estimator. The influence function is defined using a mixture distribution of the form

$F_{p} (x) = (1 - p) F (x) + p \cdot δ_{r} (x), 0 \leq p < 1$

where $δ_{r} (x)$ is a point mass probability distribution located at r. The influence function of nonparametric $E S_{γ} (F)$ is defined as the following directional (Gateaux) derivative of the ES functional at $F (x)$ in the direction of the point mass distribution $δ_{r} (x)$ :

$I F_{E S_{γ}} (r; F) = \lim_{p \to 0} \frac{E S_{γ} (F_{p}) - E S_{γ} (F)}{p} = {\frac{d}{d p} E S_{γ} (F_{p}) |}_{p = 0}$ .

As such, it has the intuitive interpretation as the asymptotic form of the difference quotient for an ES estimator evaluated at a fixed set of n returns $r_{1}, r_{2}, \dots, r_{n}$ and at an augmented set of $n + 1$ returns $r_{1}, r_{2}, \dots, r_{n}, r$ with r variable, divided by $n^{- 1}$ .

Using the integral representation of $E S_{γ} (F)$ in equation (2.1) it is shown in MZ2019 [6] that the nonparametric influence function of ES is

$I F_{E S_{γ}} (r; F) = - q_{γ} (F) - E S_{γ} (F) - \frac{I_{(- \infty, q_{γ} (F)]} (r)}{γ} (r - q_{γ} (F))$ (2.7)

where $I_{(- \infty, q_{γ} (F)]} (r)$ is the indicator function of the interval $(- \infty, q_{γ} (F)]$. The above influence function expression can be evaluated for any distribution F, and the left-hand plot in Figure 1 displays the nonparametric ES influence function for the case where $γ = 5\%$, with $F (x)$ a normal distribution with mean 0.12 and standard deviation 0.24. The resulting influence function has a striking piecewise-linear, monotonically decreasing shape.

³ See Hampel et al. (1986) [10] for detailed discussions of influence functions, and their applications.

As for the influence function of the parametric normal distribution ES given by Equation (2.3), we first note that this ES functional representation $E S_{γ} (μ (F), σ (F))$ is the following linear combination of the mean functional $μ (F)$ and the standard deviation functional $σ (F)$ :

$E S_{γ} (μ (F), σ (F)) = - μ (F) + \frac{ϕ (z_{γ})}{γ} \cdot σ (F)$ .

Figure 1. Nonparametric and parametric tail probability $γ = 5\%$ ES influence functions for a normal distribution with mean 0.12 and standard deviation 0.24.

It follows that the influence function of parametric normal distribution ES is the corresponding linear combination of the influence functions of $μ (F)$ and $σ (F)$. It is well known in the robust statistics literature that the influence function of the mean is

$I F_{μ (F)} (r) = r - μ$

and the influence function of the standard deviation is⁴:

$I F_{σ (F)} (r) = \frac{{(r - μ)}^{2} - σ^{2}}{2 σ}$ .

Thus, the influence function of parametric normal distribution ES is

$I F_{E S, γ} (r; μ, σ) = - (r - μ) + \frac{ϕ (z_{γ})}{γ} \cdot \frac{{(r - μ)}^{2} - σ^{2}}{2 σ}$ (2.8)

and it is displayed in the right-hand plot of Figure 1, with the same mean and standard deviation values as for the nonparametric ES influence function in the left-hand plot.

⁴ See for example Section 3 of Zhang, Martin and Christidis (2021) [11].

The parametric ES influence function is a quadratic function of r that is symmetric about the value $μ + γ σ / ϕ (z_{γ}) \approx 0.24$, and in this regard it is strikingly different from the desirable monotonic character of the nonparametric ES influence function, which is constant for returns above the tail probability $γ = 0.05$ quantile and strictly increasing as the return decreases below that quantile. The parametric normal distribution influence function, by contrast, is quite non-monotonic, with increasing return values much larger than 0.24 indicating increased risk, which is quite unnatural.
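The two influence functions (2.7) and (2.8) are simple enough to evaluate directly. The following sketch (standard-library Python, with hypothetical function names) uses the Figure 1 parameter values, computes the axis of symmetry of (2.8), and evaluates both functions on a grid of returns.

```python
from statistics import NormalDist

mu, sigma, gamma = 0.12, 0.24, 0.05      # parameter values used for Figure 1
std_normal = NormalDist()
z = std_normal.inv_cdf(gamma)            # z_gamma
k = std_normal.pdf(z) / gamma            # phi(z_gamma) / gamma
q = mu + sigma * z                       # gamma-quantile of N(mu, sigma^2)
es = -mu + k * sigma                     # equation (2.3)

def if_nonparametric(r):
    """Nonparametric ES influence function (2.7) at F = N(mu, sigma^2)."""
    tail = (r - q) / gamma if r <= q else 0.0
    return -q - es - tail

def if_normal(r):
    """Parametric normal ES influence function (2.8)."""
    return -(r - mu) + k * ((r - mu) ** 2 - sigma ** 2) / (2 * sigma)

# Setting the derivative of (2.8) to zero gives the axis of symmetry.
vertex = mu + gamma * sigma / std_normal.pdf(z)

grid = [-1.0 + 0.05 * i for i in range(41)]          # returns from -1.0 to 1.0
for r in grid[::8]:
    print(f"r = {r:+.2f}  nonparametric IF = {if_nonparametric(r):+.3f}  "
          f"normal IF = {if_normal(r):+.3f}")
```

On this grid the nonparametric values never increase as r increases, whereas the normal-distribution values fall until r reaches the vertex and rise thereafter.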

3. Parametric Risk Measures and Influence Functions

This section first discusses general parametric risk measures and their general influence function formula, as well as a specific influence function formula for the case where maximum-likelihood parameter estimators are used. Then it presents the MZ2019 [6] influence function formula for parametric t-distribution expected shortfall.

3.1. General Parametric Risk Measures and Influence Functions

Let $ρ (θ) = ρ (F_{θ})$ be a parametric risk measure defined by a fixed parametric family of univariate distribution functions $F_{θ}$ where $θ = (θ_{1}, θ_{2}, \dots, θ_{K})$. It will suffice for our applications in this paper to assume that $F_{θ} (r)$ is continuous and strictly increasing with quantile function $q_{γ} (F_{θ}) = F_{θ}^{- 1} (γ)$. One obtains a parametric risk measure estimator by first choosing a parameter estimator $\hat{θ} = {\hat{θ}}_{n}$ of $θ$ and then using it to obtain a risk estimator ${\hat{ρ}}_{n} = ρ (\hat{θ})$, where we have suppressed the subscript n on ${\hat{θ}}_{n}$ for notational convenience. We assume that ${\hat{θ}}_{n}$ is based on independent and identically distributed (i.i.d.) returns $r_{1}, r_{2}, \dots, r_{n}$, with a general distribution function that will ultimately be taken to belong to a parametric family $F_{θ}$ of t-distributions for our parametric ES maximum-likelihood estimator.

An estimator ${\hat{θ}}_{n}$ is assumed to be obtained from a functional $θ (F)$ by the plug-in rule

${\hat{θ}}_{n} = θ (F_{n})$ (3.1)

where $F_{n}$ is the empirical distribution function of a set of returns $r_{1}, r_{2}, \dots, r_{n}$ . Correspondingly, we represent the risk measure functional as

$ρ (θ) = ρ (θ (F))$ (3.2)

and the risk measure estimator is

${\hat{ρ}}_{n} = ρ (θ (F_{n}))$ . (3.3)

We tacitly assume that the returns are independent and identically distributed (i.i.d.) with a parametric distribution $F_{θ}$ . Then when the usual Fisher consistency condition $θ (F_{θ}) = θ$ holds, the estimator ${\hat{θ}}_{n}$ will under reasonable conditions converge in probability to the true parametric distribution parameter $θ$ . Correspondingly, the risk measure estimator ${\hat{ρ}}_{n} = ρ (θ (F_{n}))$ will converge in probability to $ρ (θ)$ . Note however that, when the returns come from $F \neq F_{θ}$ then $θ (F_{n})$ will converge in probability to $θ (F)$ , but typically $θ (F) \neq θ$ .

For deriving parametric risk measure influence functions, the distribution function F is replaced by the parametric distribution function $F_{θ}$ , which results in the parametric mixture distribution

$F_{θ, p} = (1 - p) F_{θ} + p \cdot δ_{r}, 0 \leq p < 1$ . (3.4)

Then, referring to the risk measure loosely by its estimator $\hat{ρ}$ , the influence function of $\hat{ρ}$ is defined as

$I F_{\hat{ρ}} (r; F_{θ}) = \lim_{p ↓ 0} \frac{ρ (θ (F_{θ, p})) - ρ (θ (F_{θ}))}{p} = {\frac{d ρ (θ (F_{θ, p}))}{d p} |}_{p = 0} .$ (3.5)

The parameter estimator vector influence function $I F_{\hat{θ}} (r) = I F_{\hat{θ}} (r; F)$ has the components

$I F_{{\hat{θ}}_{k}} (r; F_{θ}) = {[\partial θ_{k} (F_{θ, p}) / \partial p]}_{p = 0}$ (3.6)

With $\nabla ρ (θ) = {(ρ_{1} (θ), \dots, ρ_{K} (θ))}^{'}$ the gradient of $ρ (θ)$ , where $ρ_{k} (θ) = \partial ρ (θ) / \partial θ_{k}$ , the chain rule gives:

$\begin{matrix} I F_{\hat{ρ}} (r; F) = {[\nabla ρ {(θ (F_{p}))}^{'} \cdot \frac{\partial θ (F_{p})}{\partial p}]}_{p = 0} \\ = \nabla ρ {(θ (F))}^{'} \cdot I F_{\hat{θ}} (r) \\ = \sum_{k = 1}^{K} ρ_{k} (θ) \cdot I F_{{\hat{θ}}_{k}} (r) \end{matrix}$ (3.7)

The above is a general expression valid for any consistent parameter estimator $\hat{θ}$ having an influence function $I F_{\hat{θ}} (r)$ . For a given parametric risk measure the above expression can be used for a variety of choices of parameter estimators, and for a given parameter estimator the expression can be used for a variety of parametric risk measures.

In the frequently occurring case where the parameter estimator is an MLE, it was shown in Hampel et al. (1986) [10] that $I F_{\hat{θ}} (r)$ has the special form

$I F_{\hat{θ}} (r; θ) = I_{θ}^{- 1} \cdot ψ (r; θ)$ (3.8)

where $I_{θ}$ is the information matrix, and $ψ (r; θ)$ is the score function vector⁵. In this case the IF formula (3.7) has the general form:

$I F_{\hat{ρ}} (r; F) = \nabla ρ {(θ (F))}^{'} \cdot I_{θ}^{- 1} ψ (r; θ) .$ (3.9)

⁵ With $l (θ) = \ln f (r; θ)$ the log-likelihood for a single observation, the vector score function $ψ (r; θ)$ has components $ψ_{θ_{k}} (r) = \partial l (θ) / \partial θ_{k}, k = 1, 2, \dots, K$, and the $K \times K$ information matrix $I_{θ} = E_{F} [ψ (r; θ) ψ^{'} (r; θ)]$ has elements $I_{k, j} = E_{F} [ψ_{θ_{k}} (r) \cdot ψ_{θ_{j}} (r)]$.

3.2. Parametric t-Distribution Expected Shortfall Influence Function

For a parametric t-distribution ES based on maximum-likelihood parameter estimates, the formula (3.9) becomes

$I F_{E S} (r; μ, s, ν) = \nabla ρ_{E S} {(μ, s, ν)}^{'} \cdot I_{μ, s, ν}^{- 1} \cdot ψ_{E S} (r; μ, s, ν)$ (3.10)

where $\nabla ρ_{E S} (μ, s, ν)$ is the gradient vector of $ρ_{E S} (μ, s, ν) = E S_{γ} (μ, s, ν)$ given by Equations (2.5) and (2.6), $I_{μ, s, ν}$ is the Fisher information matrix, and $ψ_{E S} (r; μ, s, ν)$ is the t-distribution score function. The risk measure gradient vector is given by

$\nabla ρ_{E S} (μ, s, ν) = {(- 1, \frac{g_{γ, ν}}{γ}, \frac{s}{γ} \cdot \frac{\partial g_{γ, ν}}{\partial ν})}^{'}$

with the partial derivative approximated by a finite difference quotient. The $ψ_{E S} (r; θ)$ score vector is

$[\begin{matrix} ψ_{μ} (r) \\ ψ_{s} (r) \\ ψ_{ν} (r) \end{matrix}] = [\begin{matrix} \frac{(ν + 1) (r - μ)}{ν s^{2} + {(r - μ)}^{2}} \\ ψ_{s} (r) \\ \frac{1}{2} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) - \frac{1}{2} \log (1 + \frac{{(r - μ)}^{2}}{ν s^{2}}) + \frac{s}{2 ν} ψ_{s} (r) \end{matrix}]$ (3.11)

where the formula of the scale score function $ψ_{s} (r)$ is

$ψ_{s} (r) = \frac{ν}{s} \cdot \frac{{(r - μ)}^{2} - s^{2}}{ν s^{2} + {(r - μ)}^{2}}$

and $Ω (ν) = \frac{d \log Γ (ν)}{d ν}$ is the digamma function. The formula for the information matrix $I_{μ, s, ν}$ was derived by Lucas (1997) [12] and may be found in MZ2019 [6].
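For reference, the score components in (3.11) can be recovered by differentiating the single-observation log-likelihood. The derivation below assumes the usual location-scale parametrization of the t density, which is not written out explicitly in the text above.

```latex
\begin{aligned}
l(μ, s, ν) &= \log Γ\!\left(\tfrac{ν + 1}{2}\right) - \log Γ\!\left(\tfrac{ν}{2}\right)
  - \tfrac{1}{2} \log(ν π) - \log s
  - \tfrac{ν + 1}{2} \log\!\left(1 + \tfrac{(r - μ)^{2}}{ν s^{2}}\right), \\[4pt]
ψ_{μ}(r) &= \frac{\partial l}{\partial μ}
  = \frac{(ν + 1)(r - μ)}{ν s^{2} + (r - μ)^{2}}, \\[4pt]
ψ_{s}(r) &= \frac{\partial l}{\partial s}
  = -\frac{1}{s} + \frac{(ν + 1)(r - μ)^{2}}{s \left(ν s^{2} + (r - μ)^{2}\right)}
  = \frac{ν}{s} \cdot \frac{(r - μ)^{2} - s^{2}}{ν s^{2} + (r - μ)^{2}}, \\[4pt]
ψ_{ν}(r) &= \frac{\partial l}{\partial ν}
  = \tfrac{1}{2}\left(Ω\!\left(\tfrac{ν + 1}{2}\right) - Ω\!\left(\tfrac{ν}{2}\right)\right)
  - \tfrac{1}{2} \log\!\left(1 + \tfrac{(r - μ)^{2}}{ν s^{2}}\right)
  + \underbrace{\frac{1}{2 ν}\left(\frac{(ν + 1)(r - μ)^{2}}{ν s^{2} + (r - μ)^{2}} - 1\right)}_{=\; \frac{s}{2 ν}\, ψ_{s}(r)} .
\end{aligned}
```

The last line shows why the scale score reappears in the degrees-of-freedom score: the terms $-1/(2ν)$ and $(ν + 1)(r - μ)^{2} / (2ν(ν s^{2} + (r - μ)^{2}))$ combine into $s \cdot ψ_{s} (r) / (2ν)$.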

Note that $I F_{E S} (r; μ, s, ν)$ depends on r solely through its appearance in the three components $ψ_{μ} (r)$ , $ψ_{s} (r)$ , $ψ_{ν} (r)$ of the score function vector, and for large r this dependence is symmetric about μ. In this regard, the behavior of $I F_{E S} (r; μ, s, ν)$ has a quite undesirable non-monotonic approximately symmetric behavior similar to that of the parametric normal distribution influence function in Equation (2.8) and the right-hand plot of Figure 1. This behavior is displayed in the left-hand plot of Figure 2, which is discussed further in the next section in comparison with an alternative parametric t-distribution ES M-estimator with semi-scale, and its influence function.

4. Parametric T-Distribution Semi-Scale Expected Shortfall M-Estimator and Influence Function

In this Section we derive a semi-scale modified version of the parametric t-distribution expected shortfall estimator, whose influence function does not suffer from the approximately symmetric behavior of the parametric t-distribution expected shortfall exhibited in the previous Section. Since the semi-scale parameter estimator is not an MLE, the MLE results of the previous Section no longer apply. However, the following closely related M-estimator, as discussed in Hampel et al. (1986) [10] can be used in place of an MLE.

⁶ Equation (4.1) becomes an MLE estimating equation for the special choice $ψ_{M L E} (r; θ) = - d \ln f (r; θ) / d θ$, but here we move beyond the MLE.

A general M-estimator functional $θ (F)$ is defined as the solution for $θ$ of the vector equation

$E_{F} [ψ (r; θ)] = \int ψ (r; θ) d F (r) = 0$ (4.1)

where $ψ (r; θ)$ is a vector-valued general “score function”, which depends on a vector parameter $θ$ and a scalar return variable r⁶. An M-estimator $\hat{θ} = {\hat{θ}}_{n}$ is obtained from a set of returns $r_{1}, r_{2}, \dots, r_{n}$ by replacing F in (4.1) with the empirical distribution $F_{n}$:

$\sum_{i = 1}^{n} ψ (r_{i}; \hat{θ}) = 0$ . (4.2)

Figure 2. t-Distribution parametric symmetric and semi-scale ES influence functions with tail probability $γ = 5\%$ for a t-distribution with $ν = 10$ degrees of freedom, mean 0.12 and standard deviation 0.24.

For our application, where the distribution of the returns is in a parametric family $F_{θ}, θ \in Θ$, we assume the Fisher consistency condition

$E_{F_{θ}} [ψ (r; θ)] = \int ψ (r; θ) d F_{θ} (r) = 0, θ \in Θ$ (4.3)

In other words, the expected value of the score function at the true parameter value is zero. Correspondingly, an M-estimator defined by (4.2) converges in probability to a solution of the asymptotic estimating Equation (4.3).

For many estimation problems, $ψ (r; θ)$ may be defined as the derivative $ψ (r; θ) = d ρ (r; θ) / d θ$ of a loss function $ρ (r; θ)$, not to be confused with the risk measure $ρ (θ)$. In particular, the negative log-likelihood is one such choice of loss function.

It is shown in Hampel et al. (1986) [10] that the influence function of an M-estimator is given by

$I F (r; θ (F)) = M^{- 1} (ψ, F) ψ (r; θ (F))$ (4.4)

where the M-matrix $M (ψ, F)$ is the $K \times K$ matrix

$M (ψ, F) = - \int {[\frac{\partial}{\partial θ} ψ (r; θ)]}_{θ = θ (F)} d F (r)$ . (4.5)

Combining (3.7) and (4.4) we have the following general expression for the influence function of a parametric risk measure based on a general M-estimator of the unknown parameters:

$\begin{matrix} I F_{\hat{ρ}} (r; F) = \nabla ρ {(θ (F))}^{'} \cdot I F_{\hat{θ}} (r) \\ = \nabla ρ {(θ (F))}^{'} \cdot M^{- 1} (ψ, F) ψ (r; θ (F)) \end{matrix}$ (4.6)

Note that the gradient vector in the above expression depends only on the risk measure chosen and the M-estimator functional that represents the asymptotic value of the M-estimator. For parametric distributions and consistent estimators, this gradient only depends on the parametric risk measure and the distribution parameters. However, for any given distribution the influence function part of the above expression depends only on the choice of M-score function $ψ$ .

We seek to modify the t-distribution MLE score function of the previous Section so that the scale score function is a semi-scale score function, and the expected value of the resulting M-score function is still zero. One way to do this is as follows. Define the semi-scale score function by

$\begin{matrix} ψ (r) = [\begin{matrix} {\tilde{ψ}}_{μ} (r) \\ {\tilde{ψ}}_{s} (r) \\ {\tilde{ψ}}_{ν} (r) \end{matrix}] \\ = [\begin{matrix} ψ_{μ} (r) \\ {\tilde{ψ}}_{s} (r) \\ \frac{1}{2} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) - \frac{1}{2} \log (1 + \frac{{(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{2}}) - c + \frac{s}{2 ν} {\tilde{ψ}}_{s} (r) \end{matrix}] \end{matrix}$ (4.7)

where the location score function $ψ_{μ} (r)$ is the same as for a t-distribution MLE as given by (3.11), and ${\tilde{ψ}}_{s} (r) = {\tilde{ψ}}_{s} (r; μ, s, ν)$ is the semi-scale score function

${\tilde{ψ}}_{s} (r) = \frac{(ν + 1) {(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{3} + s \cdot {(r - μ)}^{2}} - \frac{1}{2 s}$ (4.8)

and where the constant c in (4.7) remains to be determined.

In order to ensure consistency of the M-estimator defined by the above M-score function, the latter must satisfy the zero-expectation condition (4.3) under a t-distribution for returns. This is already the case for $ψ_{μ} (r)$, and we now show this is also the case for ${\tilde{ψ}}_{s} (r)$.

In order to see that the expected value of ${\tilde{ψ}}_{s} (r)$ is zero, note that from the t-distribution MLE score function (3.11) we have

$\begin{matrix} s \cdot ψ_{s} (r) = \frac{(ν + 1) {(r - μ)}^{2}}{ν s^{2} + {(r - μ)}^{2}} - 1 \\ = \frac{(ν + 1) {(r - μ)}^{2}}{ν s^{2} + {(r - μ)}^{2}} \cdot I_{(- \infty, μ)} (r) + \frac{(ν + 1) {(r - μ)}^{2}}{ν s^{2} + {(r - μ)}^{2}} \cdot I_{(μ, \infty)} (r) - 1 . \end{matrix}$ (4.9)

Since the expected value of $ψ_{s} (r)$ is zero, the expected values of the first two terms on the right sum to one, and from the symmetry of the t-distribution about μ these two expected values must be equal, and hence each is equal to one-half. Thus

$\begin{matrix} E [{\tilde{ψ}}_{s} (r)] = \frac{1}{s} E [\frac{(ν + 1) {(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{2} + {(r - μ)}^{2}} - \frac{1}{2}] \\ = \frac{1}{s} [\frac{1}{2} - \frac{1}{2}] \\ = 0 . \end{matrix}$

Now we just need to choose c so that the M-score ${\tilde{ψ}}_{ν} (r)$ for degrees of freedom has zero expected value. Since $E [{\tilde{ψ}}_{s} (r)] = 0$, we require that

$\begin{array}{l} E [{\tilde{ψ}}_{ν} (r)] \\ = \frac{1}{2} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) - E [\frac{1}{2} \log (1 + \frac{{(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{2}})] - c + \frac{s}{2 ν} E [{\tilde{ψ}}_{s} (r)] \\ = \frac{1}{2} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) - E [\frac{1}{2} \log (1 + \frac{{(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{2}})] - c \\ = 0 \end{array}$

which means that

$c = \frac{1}{2} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) - \frac{1}{2} E [\log (1 + \frac{{(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{2}})]$ . (4.10)

The identity

$\log (1 + \frac{{(r - μ)}^{2}}{ν s^{2}}) = \log (1 + \frac{{(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{2}}) + \log (1 + \frac{{(r - μ)}^{2} \cdot I_{(μ, + \infty)} (r)}{ν s^{2}})$ (4.11)

together with the symmetry of the t-distribution about μ, shows that the expected values of the two terms on the right-hand side must be equal, and thus

$E [\log (1 + \frac{{(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{2}})] = \frac{1}{2} \cdot E [\log (1 + \frac{{(r - μ)}^{2}}{ν s^{2}})]$ (4.12)

Plugging (4.12) into (4.10) gives

$\begin{matrix} c = \frac{1}{2} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) - \frac{1}{4} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) \\ = \frac{1}{4} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) . \end{matrix}$
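The first line of the expression for c above relies on the value of $E [\log (1 + {(r - μ)}^{2} / (ν s^{2}))]$, which is not written out. A short derivation, assuming only that the MLE scores $ψ_{ν}$ and $ψ_{s}$ in (3.11) have zero expectation under the t-distribution:

```latex
\begin{aligned}
0 = E[ψ_{ν}(r)]
  &= \tfrac{1}{2}\left(Ω\!\left(\tfrac{ν + 1}{2}\right) - Ω\!\left(\tfrac{ν}{2}\right)\right)
   - \tfrac{1}{2}\, E\!\left[\log\!\left(1 + \tfrac{(r - μ)^{2}}{ν s^{2}}\right)\right]
   + \tfrac{s}{2 ν}\, \underbrace{E[ψ_{s}(r)]}_{=\,0} \\[4pt]
\Longrightarrow\quad
E\!\left[\log\!\left(1 + \tfrac{(r - μ)^{2}}{ν s^{2}}\right)\right]
  &= Ω\!\left(\tfrac{ν + 1}{2}\right) - Ω\!\left(\tfrac{ν}{2}\right) \\[4pt]
\Longrightarrow\quad
\tfrac{1}{2}\, E\!\left[\log\!\left(1 + \tfrac{(r - μ)^{2}\, I_{(-\infty, μ)}(r)}{ν s^{2}}\right)\right]
  &= \tfrac{1}{4}\left(Ω\!\left(\tfrac{ν + 1}{2}\right) - Ω\!\left(\tfrac{ν}{2}\right)\right)
  \qquad \text{by (4.12).}
\end{aligned}
```

Substituting the last line into (4.10) gives the stated value of c.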

Thus, we have⁷

$[\begin{matrix} {\tilde{ψ}}_{μ} (r) \\ {\tilde{ψ}}_{s} (r) \\ {\tilde{ψ}}_{ν} (r) \end{matrix}] = [\begin{matrix} ψ_{μ} (r) \\ {\tilde{ψ}}_{s} (r) \\ \frac{1}{4} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2})) - \frac{1}{2} \log (1 + \frac{{(r - μ)}^{2} \cdot I_{(- \infty, μ)} (r)}{ν s^{2}}) + \frac{s}{2 ν} {\tilde{ψ}}_{s} (r) \end{matrix}]$ . (4.13)

⁷ We note that this choice of M-score function is not unique. Another valid choice would be ${\tilde{ψ}}_{2 s} (r) = \frac{ν}{s} (\frac{{(r - μ)}^{2} - s^{2}}{ν s^{2} + {(r - μ)}^{2}}) \cdot I_{(- \infty, μ]} (r)$. However, it is easily verified that this choice of scale score function is discontinuous at $r = μ$, and it is a basic principle in robust statistics that discontinuous influence functions are to be avoided.
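The zero-expectation property of the semi-scale scores can also be checked numerically. The sketch below is standard-library Python with hypothetical function names. It integrates the scale and degrees-of-freedom components of (4.13) against a t density by Simpson's rule and approximates the digamma function by a central difference of `math.lgamma`; both are numerical conveniences assumed here and are not part of the paper.

```python
import math

def t_pdf(x, nu):
    """Density of the standard t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def digamma(x, h=1e-5):
    """Central-difference approximation to d/dx log Gamma(x)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def semi_scale_scores(r, mu, s, nu):
    """Scale and degrees-of-freedom components of (4.13), using (4.8)."""
    d2 = (r - mu) ** 2 if r < mu else 0.0       # (r - mu)^2 * I_(-inf, mu)(r)
    psi_s = (nu + 1) * d2 / (nu * s ** 3 + s * (r - mu) ** 2) - 1 / (2 * s)
    psi_nu = (0.25 * (digamma((nu + 1) / 2) - digamma(nu / 2))
              - 0.5 * math.log1p(d2 / (nu * s ** 2))
              + s / (2 * nu) * psi_s)
    return psi_s, psi_nu

def expected_score(component, mu, s, nu, width=60.0):
    """E[score] under t(mu, s, nu), integrating over mu +/- width * s."""
    def integrand(r):
        density = t_pdf((r - mu) / s, nu) / s
        return semi_scale_scores(r, mu, s, nu)[component] * density
    # Split at mu, where the second derivative of the score jumps.
    return (simpson(integrand, mu - width * s, mu)
            + simpson(integrand, mu, mu + width * s))

e_scale = expected_score(0, mu=0.12, s=0.24, nu=10)
e_dof = expected_score(1, mu=0.12, s=0.24, nu=10)
print(f"E[semi-scale score] = {e_scale:.2e}, E[dof score] = {e_dof:.2e}")
```

Both expectations should be zero up to quadrature and truncation error, consistent with the choice $c = \frac{1}{4} (Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2}))$.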

Now, to obtain the parameter estimator influence function in (4.4), we just need to evaluate the M-matrix

$M (ψ, F) = M (μ, s, ν) = - (\begin{matrix} E [\frac{\partial ψ_{μ}}{\partial μ}] & E [\frac{\partial ψ_{μ}}{\partial s}] & E [\frac{\partial ψ_{μ}}{\partial ν}] \\ E [\frac{\partial {\tilde{ψ}}_{s}}{\partial μ}] & E [\frac{\partial {\tilde{ψ}}_{s}}{\partial s}] & E [\frac{\partial {\tilde{ψ}}_{s}}{\partial ν}] \\ E [\frac{\partial {\tilde{ψ}}_{ν}}{\partial μ}] & E [\frac{\partial {\tilde{ψ}}_{ν}}{\partial s}] & E [\frac{\partial {\tilde{ψ}}_{ν}}{\partial ν}] \end{matrix})$ (4.14)

where the expectations are taken with $F = F_{θ}$ . Straightforward but tedious calculations (see Appendix A in an early draft version of this paper available at SSRN, https://ssrn.com/abstract=4605604) show that

$M = (\begin{matrix} \frac{1}{s^{2}} [1 - \frac{2}{ν + 3}] & 0 & 0 \\ \frac{Γ ((ν + 1) / 2)}{Γ (ν / 2) \cdot \sqrt{ν π}} \cdot \frac{- 2 (ν + 1)}{(ν + 3) \cdot s^{2}} & \frac{1}{s^{2}} \cdot \frac{ν}{ν + 3} & \frac{1}{2 s} [\frac{1}{ν + 3} - \frac{1}{ν + 1}] \\ \frac{Γ ((ν + 1) / 2)}{Γ (ν / 2) \cdot \sqrt{ν π}} \cdot \frac{ν - 1}{(ν + 3) \cdot (ν + 1) \cdot ν \cdot s} & \frac{1}{2 s} [\frac{1}{ν + 3} - \frac{1}{ν + 1}] & \frac{1}{8} [Ω^{'} (\frac{ν}{2}) - Ω^{'} (\frac{ν + 1}{2})] - \frac{ν + 5}{4 ν (ν + 1) (ν + 3)} \end{matrix})$

which has the form

$M = (\begin{matrix} A & 0 & 0 \\ B & C & D \\ E & D & F \end{matrix})$ (4.15)

with inverse

$M^{- 1} = \frac{1}{A (C F - D^{2})} (\begin{matrix} C F - D^{2} & 0 & 0 \\ E D - B F & A F & - A D \\ B D - C E & - A D & A C \end{matrix}) .$ (4.16)
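The closed-form inverse (4.16) is easy to sanity-check by multiplying it against a matrix of the form (4.15). The sketch below is standard-library Python with hypothetical function names. The numerical values of A through F are arbitrary placeholders, not the entries of the M-matrix above.

```python
def m_matrix(A, B, C, D, E, F):
    """Matrix with the sparsity pattern of (4.15)."""
    return [[A, 0.0, 0.0],
            [B, C, D],
            [E, D, F]]

def m_inverse(A, B, C, D, E, F):
    """Closed-form inverse (4.16); requires A * (C*F - D^2) != 0."""
    det = A * (C * F - D * D)
    if det == 0:
        raise ValueError("matrix is singular")
    adjugate = [[C * F - D * D, 0.0, 0.0],
                [E * D - B * F, A * F, -A * D],
                [B * D - C * E, -A * D, A * C]]
    return [[entry / det for entry in row] for row in adjugate]

def matmul(X, Y):
    """Product of two 3 x 3 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Arbitrary placeholder values, chosen only so that the matrix is invertible.
values = dict(A=2.0, B=-0.7, C=1.5, D=-0.2, E=0.3, F=0.9)
product = matmul(m_matrix(**values), m_inverse(**values))
for row in product:
    print(["{:+.3f}".format(entry) for entry in row])
```

The product should be the 3 × 3 identity matrix up to rounding error. Because the first row of M has a single nonzero entry, the first row of the inverse does too, so the location component of the parameter influence function depends only on $ψ_{μ} (r)$.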

The left-hand plot in Figure 2 displays the influence function of the parametric t-distribution ES estimator with tail probability $γ = 5\%$, for a t-distribution with 10 degrees of freedom, mean 0.12 and standard deviation 0.24. Except for the curious negative bump for small positive values of the return r, this influence function is essentially symmetric about μ = 0.12, as we pointed out in the last paragraph of Section 3. Thus, the parametric t-distribution ES influence function suffers from the lack of monotonicity in a manner similar to that of the parametric normal distribution ES estimator influence function in the right-hand plot of Figure 1.

On the other hand, the right-hand plot in Figure 2 displays the parametric t-distribution semi-scale M-estimator influence function, with the same $μ, s, ν$ parameters as for the influence function in the left-hand plot. But now, except for the curious negative bump for small negative values of the Return r, this influence function has the desirable monotonic decreasing character similar to that of the nonparametric ES influence function in the left-hand plot of Figure 1.

Figure 3 gives a good feeling for how the shape of the parametric t-distribution ES M-estimator influence function changes with changes in the t-distribution degrees of freedom and ES tail probability, for the four degrees of freedom parameter $ν$ values 20, 10, 6, 3, and the three tail probability parameter $γ$ values 1%, 2.5%, 5%. These influence functions all have the desirable shape that there is essentially zero influence for positive values of return r, and positive influence that increases rapidly for decreasing negative return values. As one expects, the positive values of the influence function increase for each fixed negative r as the tail probability $γ$ decreases. The behavior of the shape of the influence functions for negative returns close to 0 as the degrees of freedom decrease from 20 to 3 is more subtle, with the shape being slightly convex for $ν = 20$ and slightly concave for $ν = 3$. The latter is related to the fact, demonstrated in MZ2019 [6], that the t-distribution maximum-likelihood ES influence function is logarithmically unbounded.

Figure 3. Influence functions of ES semi-scale M-estimators with monthly $μ = 1\%, s = 7\%$, and annual SR = 0.5.

5. Implementation of the Parametric Semi-Scale ES Estimator

Now we propose to construct an ES semi-scale M-estimator for risk monitoring, with the following straightforward steps.

(1) First compute the t-distribution MLE estimates $(\hat{v}, {\hat{s}}_{0}, \hat{μ})$ , for example using the Azzalini SN package [13] available on CRAN (https://cran.r-project.org/web/packages/sn/index.html).

(2) Compute the semi-scale parameter estimate as follows. Plug the MLEs $\hat{μ}$ and $\hat{v}$ from step (1) into the semi-scale score function ${\tilde{ψ}}_{s} (r; μ, s, ν)$. The semi-scale parameter estimate ${\hat{s}}_{s e m i}$ is then the solution in $s$ of the equation:

$\sum_{i = 1}^{n} {\tilde{ψ}}_{s} (r_{i}; \hat{μ}, s, \hat{v}) = 0.$

Note that the above summation has the form

$\begin{matrix} \sum_{i = 1}^{n} {\tilde{ψ}}_{s} (r_{i}; \hat{μ}, s, \hat{v}) = \sum_{i = 1}^{n} [\frac{(\hat{v} + 1) {(r_{i} - \hat{μ})}^{2} \cdot I_{(- \infty, \hat{μ}]} (r_{i})}{\hat{v} s^{3} + s \cdot {(r_{i} - \hat{μ})}^{2}} - \frac{1}{2 s}] \\ = \frac{1}{s} \cdot \sum_{i = 1}^{n} [\frac{(\hat{v} + 1) {(r_{i} - \hat{μ})}^{2} \cdot I_{(- \infty, \hat{μ}]} (r_{i})}{\hat{v} s^{2} + {(r_{i} - \hat{μ})}^{2}} - \frac{1}{2}] \\ = \frac{\hat{v} + 1}{s} \cdot [\sum_{i = 1}^{n} \frac{{(r_{i} - \hat{μ})}^{2} \cdot I_{(- \infty, \hat{μ}]} (r_{i})}{\hat{v} s^{2} + {(r_{i} - \hat{μ})}^{2}} - \frac{n}{2 (\hat{v} + 1)}] \end{matrix}$

and to compute ${\hat{s}}_{s e m i}$, one just needs to solve the following equation, whose left-hand side is strictly decreasing in $s > 0$ whenever at least one return lies below $\hat{μ}$:

$\sum_{i = 1}^{n} \frac{{(r_{i} - \hat{μ})}^{2} \cdot I_{(- \infty, \hat{μ}]} (r_{i})}{\hat{v} s^{2} + {(r_{i} - \hat{μ})}^{2}} - \frac{n}{2 (\hat{v} + 1)} = 0$ .

Any simple one-dimensional root-finding algorithm will suffice, for example the Newton-Raphson method as implemented in the rootSolve package available on CRAN (https://CRAN.R-project.org/package=rootSolve).

(3) Finally, plug $(\hat{v}, {\hat{s}}_{s e m i}, \hat{μ})$ into the parametric t-distribution ES expression (2.5) to obtain the ES semi-scale M-estimator:

$E S_{γ} (\hat{μ}, {\hat{s}}_{s e m i}, \hat{v}) = - \hat{μ} + \frac{g_{γ, \hat{v}}}{γ} \cdot {\hat{s}}_{s e m i} .$
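Steps (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: bisection is swapped in for Newton-Raphson because the left-hand side of the estimating equation is monotone in $s$, the function names are hypothetical, the sample returns and the values of `mu_hat` and `nu_hat` are made-up placeholders standing in for the step (1) MLEs, and the constant $g_{γ, \hat{v}}$ of expression (2.5) must be supplied by the caller.

```python
def semi_scale_equation(s, returns, mu_hat, nu_hat):
    """Left-hand side of the semi-scale estimating equation, for s > 0."""
    total = 0.0
    for r in returns:
        # Indicator of (-inf, mu_hat]; a return equal to mu_hat contributes
        # zero, so a strict inequality gives the same sum.
        if r < mu_hat:
            d2 = (r - mu_hat) ** 2
            total += d2 / (nu_hat * s * s + d2)
    return total - len(returns) / (2.0 * (nu_hat + 1.0))


def solve_semi_scale(returns, mu_hat, nu_hat, iterations=200):
    """Root of the estimating equation in s, found by bisection."""
    n_below = sum(1 for r in returns if r < mu_hat)
    # As s -> 0+ the left-hand side tends to n_below - n / (2 (nu + 1));
    # as s -> infinity it tends to -n / (2 (nu + 1)) < 0.
    if n_below <= len(returns) / (2.0 * (nu_hat + 1.0)):
        raise ValueError("too few returns below mu_hat for a root to exist")
    hi = max(mu_hat - r for r in returns if r < mu_hat)
    while semi_scale_equation(hi, returns, mu_hat, nu_hat) > 0.0:
        hi *= 2.0   # expand until the left-hand side is non-positive
    lo = hi
    while semi_scale_equation(lo, returns, mu_hat, nu_hat) <= 0.0:
        lo /= 2.0   # shrink until the left-hand side is positive
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if semi_scale_equation(mid, returns, mu_hat, nu_hat) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def es_semi_scale(mu_hat, s_semi, g_gamma_nu, gamma):
    """Step (3): ES = -mu_hat + (g / gamma) * s_semi, with g from (2.5)."""
    return -mu_hat + g_gamma_nu / gamma * s_semi


# Hypothetical inputs for illustration only.
returns = [-0.10, -0.05, -0.02, 0.01, 0.03, 0.04, 0.08, 0.12]
mu_hat, nu_hat = 0.01, 5.0
s_semi = solve_semi_scale(returns, mu_hat, nu_hat)
residual = semi_scale_equation(s_semi, returns, mu_hat, nu_hat)
```

Only returns below $\hat{μ}$ enter the sum, so rescaling the data and $\hat{μ}$ by a constant rescales ${\hat{s}}_{s e m i}$ by the same constant, as one expects of a scale-type estimator.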

It remains to carry out some empirical studies of the performance of this risk estimator.

6. ES Semi-Scale M-Estimator Asymptotic Variance

The asymptotic variance of a consistent M-estimator ${\hat{θ}}_{n} = θ (F_{n})$ of $θ$ has the form

$V (θ (F), F) = M^{- 1} (ψ, F) \cdot Q (ψ, F) \cdot M^{- 1} {(ψ, F)}^{'}$ (6.1)

where

$\begin{matrix} Q (ψ, F) = \int ψ (r; θ (F)) ψ^{'} (r; θ (F)) d F (r) \\ = (\begin{matrix} E [{\tilde{ψ}}_{μ} {\tilde{ψ}}_{μ}] & E [{\tilde{ψ}}_{μ} {\tilde{ψ}}_{s}] & E [{\tilde{ψ}}_{μ} {\tilde{ψ}}_{ν}] \\ E [{\tilde{ψ}}_{s} {\tilde{ψ}}_{μ}] & E [{\tilde{ψ}}_{s} {\tilde{ψ}}_{s}] & E [{\tilde{ψ}}_{s} {\tilde{ψ}}_{ν}] \\ E [{\tilde{ψ}}_{ν} {\tilde{ψ}}_{μ}] & E [{\tilde{ψ}}_{ν} {\tilde{ψ}}_{s}] & E [{\tilde{ψ}}_{ν} {\tilde{ψ}}_{ν}] \end{matrix}) \end{matrix}$ (6.2)

See for example Hampel et al. (1986) [10]. In the special case of MLEs, both $M$ and $Q$ reduce to the information matrix $I (θ)$ and the expression (6.1) reduces to $I {(θ)}^{- 1}$ as expected.
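The sandwich form (6.1) is easy to evaluate once $M$ and $Q$ are available. The sketch below, in plain Python, is only an illustration: the helper names are hypothetical and the matrix used is an arbitrary symmetric positive definite example, not one computed from the paper's formulas. It shows the MLE special case in which $M = Q = I (θ)$ and the sandwich collapses to $I {(θ)}^{- 1}$.

```python
def inverse_3x3(M):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0.0:
        raise ValueError("matrix is singular")
    adjugate = [[e * i - f * h, c * h - b * i, b * f - c * e],
                [f * g - d * i, a * i - c * g, c * d - a * f],
                [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adjugate]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def sandwich_variance(M, Q):
    """Asymptotic variance V = M^{-1} Q (M^{-1})' as in (6.1)."""
    M_inv = inverse_3x3(M)
    return matmul(matmul(M_inv, Q), transpose(M_inv))


# Arbitrary symmetric positive definite matrix standing in for I(theta).
info = [[2.0, 0.3, 0.1],
        [0.3, 1.5, -0.2],
        [0.1, -0.2, 0.9]]
V = sandwich_variance(info, info)      # MLE case: M = Q = I(theta)
info_inv = inverse_3x3(info)

# Largest entrywise gap between the sandwich and the inverse information.
max_gap = max(abs(V[i][j] - info_inv[i][j])
              for i in range(3) for j in range(3))
```

For the semi-scale M-estimator, $M$ is not symmetric and $Q \neq M$, so the full three-factor product is needed.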

Straightforward but tedious derivations (see Appendix B in an early draft version of this paper available at SSRN, https://ssrn.com/abstract=4605604) give the following expressions:

$\begin{matrix} E [ψ_{ν} ψ_{ν}] = \frac{1}{16} {[Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2}) - \frac{1}{ν}]}^{2} + \frac{1}{8} [Ω^{'} (\frac{ν}{2}) - Ω^{'} (\frac{ν + 1}{2})] \\ - \frac{1}{2 ν} [\frac{1}{ν + 1} - \frac{1}{2 (ν + 3)}] \end{matrix}$ (6.3)

$E [ψ_{s} ψ_{s}] = \frac{1}{s^{2}} [\frac{ν}{ν + 3} + \frac{1}{4}]$ (6.4)

$E [ψ_{μ} ψ_{μ}] = \frac{1}{s^{2}} [1 - \frac{2}{ν + 3}]$ (6.5)

$E [ψ_{ν} ψ_{s}] = \frac{1}{2 s} [\frac{1}{ν + 3} - \frac{1}{ν + 1}] - \frac{1}{8 s} [Ω (\frac{ν + 1}{2}) - Ω (\frac{ν}{2}) - \frac{1}{ν}]$ (6.6)

$E [ψ_{ν} ψ_{μ}] = \frac{Γ ((ν + 1) / 2)}{Γ (ν / 2) \cdot \sqrt{ν π} \cdot s} \cdot \frac{ν - 1}{(ν + 3) (ν + 1) ν}$ (6.7)

$E [ψ_{s} ψ_{μ}] = \frac{Γ ((ν + 1) / 2)}{Γ (ν / 2) \cdot \sqrt{ν π} \cdot s^{2}} \cdot \frac{- 2 (ν + 1)}{ν + 3}$ (6.8)
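Closed-form moments such as (6.4) can be spot-checked numerically. The following Python sketch integrates the squared semi-scale score, written as in Section 5, against a location-scale Student t density using the composite Simpson rule, and compares the result with (6.4). The parameter values, the truncation half-width and the grid size are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def t_density(r, mu, s, nu):
    """Location-scale Student t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
    u = (r - mu) / s
    return c * (1 + u * u / nu) ** (-(nu + 1) / 2) / s


def semi_scale_score(r, mu, s, nu):
    """Semi-scale score: only returns at or below mu enter the first term."""
    d2 = (r - mu) ** 2 if r <= mu else 0.0
    return (nu + 1) * d2 / (nu * s ** 3 + s * d2) - 1 / (2 * s)


def simpson(f, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def second_moment_numeric(mu, s, nu, half_width=200.0, n=40_000):
    """E[score^2], truncated to mu +/- half_width * s and split at mu."""
    def integrand(r):
        return semi_scale_score(r, mu, s, nu) ** 2 * t_density(r, mu, s, nu)
    left = simpson(integrand, mu - half_width * s, mu, n)
    right = simpson(integrand, mu, mu + half_width * s, n)
    return left + right


def second_moment_closed_form(s, nu):
    """Right-hand side of (6.4)."""
    return (nu / (nu + 3) + 0.25) / s ** 2


mu, s, nu = 0.5, 2.0, 5.0   # illustrative values, not from the paper
numeric = second_moment_numeric(mu, s, nu)
closed = second_moment_closed_form(s, nu)
```

The integration range is split at $μ$ because the score changes form there; each half is then smooth, which keeps the Simpson rule accurate.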

Since our t-distribution ES semi-scale M-estimator is a small modification of the t-distribution ES MLE, one expects that the increase in the asymptotic variance of this estimator, relative to a t-distribution MLE, will not be very great. Figure 4 and Figure 5, which are based on standard errors (SEs) obtained as the square roots of the asymptotic variances of the ES semi-scale estimator using 5 degrees of freedom, indicate that the increase in variance is indeed not very great.

Figure 4. Asymptotic standard error of t-distribution ES semi-scale M-estimator and the asymptotic standard error of the ES MLE for a t-distribution with $μ = 0, s = 1, ν = 5$.

Figure 5. Ratio of standard error of t-distribution ES estimators: semi-scale M-estimator versus maximum likelihood ES estimator for 5 degrees of freedom.

7. Concluding Comments

We have introduced a new ES Semi-Scale M-Estimator by replacing the scale estimator of an ES t-distribution joint MLE of the location, scale, and degrees of freedom, with a semi-scale estimator, and we derived the new estimator’s influence function and asymptotic variance formula. The mathematical form of the new estimator’s estimating equation and influence function show that the estimator avoids the unsatisfactory behavior of the t-distribution ES MLE that (large) positive returns indicate (large) risk.

Since an ES semi-scale M-estimator influence function is not exactly monotonic decreasing as returns increase, we cannot assert that this ES is a coherent risk measure, as is a mean semi-deviation estimator. However, the fact that such influence functions are nearly monotonic decreasing suggests that ES semi-scale M-estimators will have good properties for risk reporting. What is needed now is an in-depth empirical study of the relative performance of the ES semi-scale M-estimators and the mean semi-deviation coherent risk measures, using both simulated and real returns whose distributions range from approximately normal to moderately fat-tailed (e.g., t-distributions with 8 to 12 degrees of freedom), and to very fat-tailed distributions (e.g., t-distributions with 3 to 6 degrees of freedom).

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1]	JPMorgan/Reuters (1996) RiskMetrics—Technical Document. 4th Edition. https://www.msci.com/documents/10199/5915b101-4206-4ba0-aee2-3449d5c7e95a
[2]	Artzner, P., Delbaen, F., Eber, J.M. and Heath, D. (1999) Coherent Measures of Risk. Mathematical Finance, 9, 203-228. https://doi.org/10.1111/1467-9965.00068
[3]	McNeil, A.J., Frey, R. and Embrechts, P. (2015) Quantitative Risk Management. Princeton University Press, Princeton.
[4]	Rockafellar, R.T. and Uryasev, S. (2000) Optimization of Conditional Value-at-Risk. Journal of Risk, 2, 21-41. https://doi.org/10.21314/JOR.2000.038
[5]	Fischer, T. (2003) Risk Capital Allocation by Coherent Risk Measures Based on One-Sided Moments. Insurance: Mathematics and Economics, 32, 135-146. https://doi.org/10.1016/S0167-6687(02)00209-3
[6]	Martin, R.D. and Zhang, S. (2019) Non-Parametric versus Parametric Expected Shortfall. Journal of Risk, 21, 1-41. https://doi.org/10.21314/JOR.2019.416
[7]	Jorion, P. (2007) Value-at-Risk. 3rd Edition, McGraw-Hill, New York.
[8]	Hampel, F.R. (1974) The Influence Curve and Its Role in Robust Estimation. Journal of the American Statistical Association, 69, 383-393. https://doi.org/10.1080/01621459.1974.10482962
[9]	Zhang, S. (2016) Two Equivalent Parametric Expected Shortfall Formulas for T-Distributions. https://ssrn.com/abstract=2883935
[10]	Hampel, F.R., Ronchetti, E.M., Rousseeuw, P.J., and Stahel, W.A. (1986) Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons, Inc., New York.
[11]	Zhang, S., Martin, R.D. and Christidis, A.A. (2021) Influence Functions for Risk and Performance Estimators. Journal of Mathematical Finance, 11, 1-33. https://doi.org/10.4236/jmf.2021.111002
[12]	Lucas, A. (1997) Robustness of the Student-t Based M-Estimator. Communications in Statistics-Theory and Methods, 26, 1165-1182. https://doi.org/10.1080/03610929708831974
[13]	Azzalini, A. (2023). http://azzalini.stat.unipd.it/SN/
