Operational Risk Modelling in Insurance and Banking

The author of the presented paper is trying to develop and implement the model that can mimic the state of the art models of operational risk in insurance. It implements generalized Pareto distribution and Monte Carlo simulation and tries to mimic and construct operational risk models in insurance. At the same time, it compares lognormal, Weibull and loglogistic distribution and their application in insurance industry. It is known that operational risk models in insurance are characterized by extreme tails, therefore the following analysis should be conducted: the body of distribution should be analyzed separately from the tail of the distribution. Afterwards the convolution method can be used to put together the annual loss distribution by combining the body and tail of the distribution. Monte Carlo method of convolution is utilized. Loss frequency in operational risk in insurance and overall loss distribution based on copula function, in that manner using student-t copula and Monte Carlo method are analysed. The aforementioned approach represents another aspect of observing operational risk models in insurance. This paper introduces: 1) Tools needed for operational risk models; 2) Application of R code in operational risk modeling;3) Distributions used in operational risk models, specializing in insurance; 4) Construction of operational risk models.


Introduction
Operational risk is defined according to Basel II (Nicolas & Firzli, 2011) as well as according to European Solvency II which adopted for insurance industry is defined in the following way (Nicolas & Firzli, 2011).
Operational risk is the risk of change in value caused by the fact that actual losses, incurred for inadequate or failed internal processes, people and systems, or from external events (including legal risk), and differs from the expected losses.
In order to analyse operational risk in insurance, Solvency II Directive (Mittnik, 2011) must be discussed.Solvency II Directive is an EU Directive that codifies and harmonises the EU insurance regulation.This concerns the amount of capital that EU insurance companies must hold to reduce the risk of insolvency.Solvency II (Mittnik, 2011) is called "Basel for insurers" (Nicolas & Firzli, 2011).Solvency II is somewhat similar to the banking regulations of Basel II.The proposed Solvency II framework has three main areas (pillars) (Accords, 2006): • Pillar 1-quantitative requirements • Pillar 2-requirements for the governance and risk management of insurers and their supervision • Pillar 3-disclosure and transparency requirements In order to analyse the operational risk in insurance (CEA-Groupe Consultatif, 2005), the attention towards pillar 2 and pillar 1 should be directed.Ernst &Young reports that most of the insurance companies in 2016 will manage to implement the Solvency II requirement until January 2016 (PWC Financial Services Regulatory Practice, 2014).One of the major new features of Solvency II is that insurance companies must now devote a portion of their equity to covering their exposure to operational risks.There are approaches to calculate the capital requirement: a standard and more advance approach.The advanced approach uses an internal model of risk that corresponds to the company's real situation.Quantitative impact study (QIS 5) has encouraged insurance companies to adopt the internal model by structuring the standard approach such that it uses up much more equity (Solvency-European Commission, 2012).In order to analyse the operational risk in this frame, the following assumption will be made: risk will be divided between frequency and severity risk (Doerig, 2000).They will be modeled by Loss Distribution approach (Power, 2005).Severity risk represents the risk of large but rare losses.Bayesian networks are used to model severity risk.

Bayesian Networks
Bayesian networks are defined as a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG).For example, a Bayesian network could represent the probabilistic relationships between the cause and result.If we are aware of the causes, the probabilities of results can be calculated.
In order to define the Bayesian network, the following definition will be used.X is a Bayesian network with respect to G if its joint probability density function (with respect to a product measure) can be written as a product of the individual density functions, conditional on their parent variables: where The probability of any member of a joint distribution can be calculated from the conditional probabilities using the chain rule, taking into consideration at the same topological ordering of X.
( ) ( ) Equation ( 2) can be written as: X is a Bayesian network with respect to G if it satisfies the local Markov property, each variable is conditionally independent of its non-descendants given its parent variables.
where ( ) de v is the set of descendants and ( ) The aforementioned thing can also be expressed in terms similar to the first definition, as: For each j X which is a parent of v X .
Note that the set of parents is a subset of the set of non-descendants because the graph is acyclic.

Developing Bayesian Networks
To develop a Bayesian network, we often first develop a DAG (directed acyclic graph) G such that we believe X satisfies the local Markov property with respect to G .Sometimes this is done by creating a casual DAG .We then ascertain the conditional probability distribution of each variable given its parents in G .In many cases and in particular in the case where the variables are discrete, we define the joint distribution of X to be the product of these conditional distributions, then X is a Bayesian network with respect to G .
In order to model loss severity, since it can be difficult to model operational risk losses of a risk class using only one probability distribution, we analyse severity at two levels: body and tail of the distribution delimited by a high threshold value u .
The central body can be modeled by using a parametric distribution.One of the distributions that can be used is lognormal.Body is usually estimated on internal data, since the sample size is sufficient below the threshold u .
Figure 1 demonstrates the body of the distribution, it is assumed that the body of the distribution embraces lognormal value, therefore we assume the lognormal distribution.In order to analyse the tail, it can be modeled applying extreme value theory (Embrechts, Kluppelberg, & Mikosch, 1997) the distribution above the threshold, we are talking about Generalized Pareto distribution (GPD).Tail is usually estimated on internal data integrated with external data and scenario generated data above u .Data are real scarce above threshold u .
Generalized Pareto distribution is used to capture the tail of severity distribution.Figure 2 shows the Genera-  lized Pareto distribution, its density as well as its cumulative function.
The commonly used approach to quantify operational risk is the Loss Distribution Approach (Embrechts, Kluppelberg, & Mikosch, 1997) where frequency and severity of operational risk losses are modeled separately.The yearly potential loss S is based on the sum of the yearly losses j S related to the J risk classes.The yearly potential loss j S related to risk class j , is affected by two sources of uncertainty: • The number of losses j N in one year time horizon • The impact of each single loss ij X If the following procedure is to be applied to mathematical modeling of operational risk in insurance, the only thing that should be changed is the distribution that it is to be used.Therefore, the methods of copula and convolution will be explained as well as the possible distribution that could be applied in insurance.

Convolution
In mathematics, convolution is a mathematical operation on two functions f and g , producing a third function that is typically viewed as a modified version of one of the original functions, giving the area overlap between the two functions as a function of the amount that one of the original functions is translated (Bracewell, 1986;Damelin & Miller, 2011).Convolution is presented in the following manner: It is defined as the integral of the product of the two functions after one is reversed and shifted.It is a particular kind of integral transform: Although the symbol t is used above, it need not represent the time domain.However formula can be interpreted as a weighted average of the function ( ) f τ at the moment t where the weighting is given by ( ) g τ − simply shifted by amount t .As t changes, the weighting function emphasizes different part of the input function (Bracewell, 1986;Damelin & Miller, 2011).

Copula
A copula is a multivariate probability distribution for which the marginal probability distribution of each variable is uniform.Copulas are used to describe the dependence between random variables.
One important theorem for copulas is Sklar's theorem.We will now define the copula: Copula (Nelsen, 1999) is a multivariate distribution function, C , with marginal functions distributed uniformly in [0,1] (U(0,1)) such that 1) (Sklar, 1959) provides the theoretical foundation for the application of copula.Sklar's theorem states that every multivariate cumulative distribution function Of a random vector ( ) and it can be written as: where C is copula.
The copula that will be used is Gaussian copula and Gumbel copula.They will be presented shortly.The Gaussian copula: Gaussian copula (Nelsen, 1999) is a distribution over the unit cube [ ] 0,1 d .It is constructed from a multivariate normal distribution over d R using the probability integral form.The Gaussian copula for , it can be written with parameter matrix R can be written as: where 1 − Φ is the inverse cumulative distribution function of a standard normal and R Φ is the joint cumulative distribution function of a multivariate normal distribution with mean vector zero and covariance matrix equal to the correlation matrix R .
The density matrix can be written as where I is the identity matrix.
Other famous copulas are Archimedean copulas.

Archimedean Copulas
Archimedean copulas (Nelsen, 1999) are an associative class of copulas.Archimedean copulas are popular because they allow modeling dependence in arbitrarily high dimensions with only one parameter, governing the strength of dependence.
A copula C is called Archimedean if it admits the representation , , ; ; ; ; where

[ ] [ )
: 0,1 0, ψ × Θ → ∞ is a continuous, strictly decreasing and convex function such that ( ) is a parameter within some parameter space θ .ψ is the so-called generator function and [ ] 1 ψ − is the pseu- do-inverse defined by: Moreover, the above formula for C yields a copula for The copula that we will also introduce is Gumbel copula.It has the following bivariate form: After giving the definition of two most important techniques of aggregating operational risk, we will introduce the distributions that we will be using as well as the definition of value at risk and expected shortfall.

Statistical Distributions
To analyse severity in that sense pertaining to body of the distribution, following distributions can be used, given their densities: Lognormal distribution density (Hazewinkel, 2001): Weibull density distribution: The probability density function of a Weibull random variable (Hazewinkel, 2001) is: where k > 0 is the shape parameter and λ > 0 is the scale parameter of the distribution.Its complementary cu-mulative distribution function is a stretched exponential function.The Weibull distribution is related to a number of other probability distributions; in particular, it interpolates between the exponential distribution (k = 1) and the Rayleigh distribution (k = 2 and 2 k λ = ) Loglogistic probability density function is defined in the following way: ; , 1 The parameter 0 α > is a scale parameter and is also the median of the The parameter 0 β > is a shape parameter.The distribution is unimodal when 1 β > and its dispersion decreases as β increases.The following functions are shown in the graphs below: Figure 3 shows probability density functions of Loglogistic, Weibull and lognormal distributions.At the same time, we will introduce Champernowne distribution (Hazewinkel, 2001) that we will also be using, its probability density function is given in the graph below:

O. Vukovic
When obtaining copula, Monte Carlo method will be used, so we will introduce it shortly.One is interested in the expectation of a response function : applied to some random vector . If we denoted the cumulative distribution function of this random vector with H , the quantity of interest can thus be written as: If H is given by a copula model, example: , , , , Then expectation can be written as: ) In case the copula C is absolutely continuous, C has density c, the equation can be written as ) If copula and margins are known (or if they have been estimated), then the following Monte Carlo algorithm can be used: 2) By applying the inverse marginal cdf's, produce a sample of ( ) ) by its empirical value: ( ) ( )

VAR and Expected Shortfall
After having introduced all the necessary tools for operational risk model in insurance, we will just provide the VAR and expected shortfall definition (Longin, 1997) and then we are going to present the modeling results.
Confidence level ( ) , the VaR (Longin, 1997) of the portfolio at the confidence level α is given by the smallest number l such that the probability that the loss L exceeds l is at most ( ) In order to model tails, Generalized Pareto distribution will be used.The standard cumulative distribution function (cdf) of the GPD is defined by where the support is ( ) ( )

Expected shortfall
Expected shortfall is defined as Or equivalently can be written as: where is the lower -quantile α and ( ) is the indicator function.

Experimental Results
After having introduced all the necessary tools, the experimental results will be presented.As we are considering operational risk models in insurance, we will use the following tools.The procedure is the following.As we don't have enough data, we will use random number generator, let it simulate the events, but at the same defining that extreme events don't occur frequently.Afterwards we will implement the distribution for body and tail of the severity of distribution.Frequency of events was modeled by using Poisson distribution.A standard choice to estimate annual frequency of operational risk loss is performed by using Poisson distribution.On the basis of extreme value theory, the distribution function of loss data above a high threshold u is supposed to follow a Generalized Pareto distribution.Lognormal distribution will be used for modeling body of severity distribution.Afterwards, we use the convolution method using frequency and severity distribution and we obtain annual loss distribution.Convolution is performed by using Monte-Carlo method.
Following steps are performed: 1) Extract one random number ( ) Repeat n times and we obtain annual loss distribution.In order to obtain overall annual loss distribution, we use the following procedure: • Extract one random value If the real data is used, then to estimate the parameters maximum likelihood, method of moments, probability weighted method of moments should be used.
The simulation results are given below: The body of distribution is considered to be lognormal.The first fit value is the mean and the other one is standard deviation.The gradient of the function demonstrates where the function is 0.
In Figure 4, optimisation values are given for different theta 1 and theta 2 that represent optimal values.
Goodness of fit can also be performed, if the real data is used.Figure 7 shows negative log-likelihood method for fitting log-normal distribution.In that direction, Kolmogorov-Smirnov and Anderson-Darling test statistics are used, but this is left for further research.
To estimate the tail of the severity distribution, we have to set a high threshold u : the parameters are estimated on excess over threshold using probability weighted moments, maximum likelihood method etc. Mean excess function is defined in the following way: What should be noticed is that if the mean excess function for Generalized Pareto distribution is a linear function of u, if the empirical mean excess function is a straight line above a threshold, it is an indication that the excess over u follows a generalized Pareto distribution, which is shown in the Figure 8.
It is obvious that Figure 5 demonstrates that the tail of severity distribution is following Generalized Pareto distribution.QQ plots of the Generalized Pareto distribution using maximum likelihood method and probability weighted method of moments.
Q-Q plots in Figure 6 demonstrate that the tail of severity distribution is characterised by generalized Pareto distribution.Results are the following.Using the aforementioned methods, the following results are obtained:  To analyse the loss we assume Poisson process and obtain the following results: Figure 7 shows loss frequency distribution for Poisson case.After having introduced the severity and loss frequency distribution, we use the convolution method to obtain annual loss distribution.The code will be given in the appendix, so that the calculation can be performed.
The annual loss distributions are obtained in Figure 8. Figure 8 shows annual loss distribution as a result of convolution method, it is the combination of lognormal and generalized Pareto distribution.
It is the combination of lognormal and generalized Pareto distribution, therefore because they differ and vary from sample to sample, the copula method should be used to obtain overall annual loss distribution (Hazewinkel, 2001).
Using Student t-copula, overall loss distribution is calculated and code that can be replicated is given, together with the VaR (Longin, 1997) results which are shown in Figure 9.
The code given in appendix demonstrates how to calculate VaR and copula for overall loss distribution.At the same time, some results are given for Value at Risk, expected loss and capital requirements.The given calculations and results demonstrate how to model risk in insurance by using lognormal and generalized Pareto distribution.At the same time distribution can be changed depending of the situation.This approach introduces modeling of operational risk in insurance and financial institutions and represents a review of the state of art techniques.insurance.It introduces techniques that can be used in modeling operational risk in insurance and banking, and it can be immediately applied.Hopefully, this review will provide further research in risk models and provide new ideas that can be applied to operational risk in whole.
• Extract n random numbers from loss distributions related to J risk classes • Sum yearly losses to obtain overall annual loss distribution

Figure 4 .
Figure 4. Neg-log likelihood for fitting log-normal distribution with random number generator.

Figure 6 .
Figure 6.QQ plots with maximum likelihood method and probability weighted moments.

Figure 9 .
Figure 9. Copula method for overall annual loss distribution.