Comparison of Single and Composite Distributions in Modelling Auto Mobile Insurance Losses for Risk Measure Estimation
1. Introduction
A core function of an actuary is to estimate risk accurately, since these estimates feed into pricing and reserving. Estimating risk, however, requires identifying the right probability distribution for the claims data. In the literature, single distributions such as the Lognormal, Pareto, Weibull, and Gamma distributions have been widely used to model insurance claims. However, these distributions cannot capture the extreme tail behaviour of claims data, which can lead to underestimation of tail risk [1]. Because traditional single distributions fail to capture the dual nature of claims (both small and large claims), they can misrepresent the tail behaviour that is essential for predicting insurance losses [2] [3].
Composite distributions, which combine multiple distributions, offer a flexible solution to this problem by accommodating the distinct characteristics of different segments of the data. The issue of appropriately modelling insurance claim data, particularly in the context of heavy-tailed data, has been a long-standing challenge in the field of actuarial science. In Ghana, this problem is especially pertinent due to the unique characteristics of the country’s insurance market and the heavy-tailed nature of its claim data.
Risk management is central to risk-bearing industries such as insurance and depends on careful data modelling. Understanding and efficiently modelling the claims distribution is therefore essential for robust risk management. Claims data are often heavy-tailed: extreme values occur with low frequency but high severity. These rare extreme events can have a catastrophic impact on the insurance industry if not properly accounted for. A sound model is therefore required to anticipate such events and to support adequate reserving for potential claims.
2. Methods
2.1. Composite Distribution
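Here, we describe how composite distributions are formulated. A composite distribution joins two weighted distributions at a given threshold value. In statistical terms, let $X$ be a random variable, let $f_1$ and $f_2$ be the pdfs of the first and second distributions of $X$, and let $F_1$ and $F_2$ be the corresponding cdfs. The pdf of the composite model can then be expressed as:

$$f(x) = \begin{cases} \dfrac{1}{1+\phi}\, f_1^{*}(x \mid \Theta_1, \theta), & 0 < x \le \theta, \\[2ex] \dfrac{\phi}{1+\phi}\, f_2^{*}(x \mid \Theta_2, \theta), & \theta < x < \infty. \end{cases} \tag{2.0}$$

Continuity and differentiability conditions are imposed at the threshold $\theta$, such that, in the limit,

$$\lim_{x \to \theta^{-}} f(x) = \lim_{x \to \theta^{+}} f(x), \tag{2.1}$$

$$\lim_{x \to \theta^{-}} f'(x) = \lim_{x \to \theta^{+}} f'(x), \tag{2.2}$$

where $\Theta_1$ and $\Theta_2$ are the parameters of the two pdfs on the intervals $(0, \theta]$ and $(\theta, \infty)$, respectively. The continuity and differentiability conditions ensure that $\theta$ and $\phi$ are defined as functions of the parameters $\Theta_1$ and $\Theta_2$. In addition, the mixing weights for the two density functions are given by $\frac{1}{1+\phi}$ and $\frac{\phi}{1+\phi}$, which are functions of $\phi$. From the continuity condition, $\phi$ can be written in closed form in terms of the pdfs and cdfs as:

$$\phi = \frac{f_1(\theta \mid \Theta_1)\,\left[1 - F_2(\theta \mid \Theta_2)\right]}{f_2(\theta \mid \Theta_2)\, F_1(\theta \mid \Theta_1)}. \tag{2.3}$$

Substituting the expression for $\phi$ into the differentiability condition simplifies it to

$$\frac{f_1'(\theta \mid \Theta_1)}{f_1(\theta \mid \Theta_1)} = \frac{f_2'(\theta \mid \Theta_2)}{f_2(\theta \mid \Theta_2)}. \tag{2.4}$$

The functions $f_1^{*}$ and $f_2^{*}$ are truncated pdfs. In terms of their associated pdfs and cdfs, they are given by:

$$f_1^{*}(x \mid \Theta_1, \theta) = \frac{f_1(x \mid \Theta_1)}{F_1(\theta \mid \Theta_1)}, \tag{2.5}$$

$$f_2^{*}(x \mid \Theta_2, \theta) = \frac{f_2(x \mid \Theta_2)}{1 - F_2(\theta \mid \Theta_2)}, \tag{2.6}$$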
and also
(2.7)
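As an illustration of the construction in this section, the following Python sketch builds a composite density with a Lognormal head and a Burr tail. It uses a body weight of 1/(1+φ) and a tail weight of φ/(1+φ), and chooses φ so that the density is continuous at the threshold θ. The parameter values, the threshold, and all function names are hypothetical and are not the estimates reported in this paper; the threshold is simply taken as given rather than solved from the differentiability condition.

```python
import math
from statistics import NormalDist


def lognormal_pdf(x, mu, sigma):
    """Lognormal density: normal density of log(x), divided by x."""
    return NormalDist(mu, sigma).pdf(math.log(x)) / x


def lognormal_cdf(x, mu, sigma):
    return NormalDist(mu, sigma).cdf(math.log(x))


def burr_pdf(x, alpha, gamma, scale):
    """Burr (type XII) density with shapes alpha, gamma and a scale."""
    t = (x / scale) ** gamma
    return alpha * gamma * t / (x * (1.0 + t) ** (alpha + 1.0))


def burr_cdf(x, alpha, gamma, scale):
    return 1.0 - (1.0 + (x / scale) ** gamma) ** (-alpha)


# Hypothetical parameters and threshold, chosen only for illustration.
HEAD = {"mu": 7.5, "sigma": 1.0}
TAIL = {"alpha": 2.0, "gamma": 1.5, "scale": 5000.0}
THETA = 4000.0


def weight_parameter(theta):
    """phi implied by continuity of the composite density at theta."""
    num = lognormal_pdf(theta, **HEAD) * (1.0 - burr_cdf(theta, **TAIL))
    den = burr_pdf(theta, **TAIL) * lognormal_cdf(theta, **HEAD)
    return num / den


def composite_pdf(x, theta=THETA):
    phi = weight_parameter(theta)
    if x <= theta:
        # Head: lognormal truncated to (0, theta], weight 1 / (1 + phi).
        return lognormal_pdf(x, **HEAD) / lognormal_cdf(theta, **HEAD) / (1.0 + phi)
    # Tail: Burr truncated to (theta, inf), weight phi / (1 + phi).
    return phi * burr_pdf(x, **TAIL) / (1.0 - burr_cdf(theta, **TAIL)) / (1.0 + phi)


phi = weight_parameter(THETA)
left = composite_pdf(THETA)                    # value at the threshold (head side)
right = composite_pdf(THETA * (1.0 + 1e-12))   # value just above the threshold
print(f"phi = {phi:.4f}, f(theta-) = {left:.3e}, f(theta+) = {right:.3e}")
```

The two printed density values should agree, which is the continuity condition at work. A full fit would additionally estimate the head and tail parameters by maximum likelihood and enforce the differentiability condition.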
2.2. Model Selection Criteria
The R package “actuar”, which is widely used for modelling losses, provides 16 loss distributions. We fitted 240 composite distributions from these loss distributions by taking two distributions at a time, one for the head and one for the tail, giving $\binom{16}{2} \times 2 = 240$ in total. The results for the top five composite distributions are presented based on three goodness-of-fit criteria: AIC, BIC, and log-likelihood.
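The count of 240 candidate models and the two information criteria can be checked with a short Python sketch. It enumerates ordered (head, tail) pairs of 16 distinct families and uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The function names, the log-likelihood, and the parameter count below are hypothetical; only the number of families and the sample size of 11,892 come from the paper.

```python
import math
from itertools import permutations

N_FAMILIES = 16  # loss distributions considered in the paper

# Ordered (head, tail) pairs of distinct families: 16 * 15 = C(16, 2) * 2.
ordered_pairs = list(permutations(range(N_FAMILIES), 2))
n_composites = len(ordered_pairs)


def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


# Hypothetical fit: log-likelihood of -100 with 3 parameters.
example_aic = aic(-100.0, 3)
example_bic = bic(-100.0, 3, 11892)
print(n_composites, example_aic, round(example_bic, 2))
```

Lower values of either criterion indicate a better trade-off between fit and model complexity; BIC penalises extra parameters more heavily than AIC once n exceeds about 7.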
2.2.1. Value at Risk (VaR)
According to [4], Value at Risk (VaR) is a statistical measure of the worst expected loss over a specific time horizon under normal market conditions at a given confidence level. It is commonly used in risk management to quantify the potential loss on an investment or portfolio. In simpler terms, VaR is the maximum amount of money that could be lost on a portfolio within a set period of time at a specified level of confidence. For a loss random variable $X$ with cdf $F_X$, the VaR at security level $\alpha$ (for example, $\alpha = 0.95$) is the $\alpha$-quantile of $X$:

$$\mathrm{VaR}_{\alpha}(X) = F_X^{-1}(\alpha) = \inf\{x : F_X(x) \ge \alpha\} \tag{2.8}$$
2.2.2. Tail Value at Risk
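References [5] and [6] define the theoretical TVaR of the random variable $X$ at security level $\alpha$ as the average of its quantiles beyond $\alpha$:

$$\mathrm{TVaR}_{\alpha}(X) = \frac{1}{1-\alpha} \int_{\alpha}^{1} \mathrm{VaR}_{u}(X)\, du, \tag{2.9}$$

where $\mathrm{VaR}_{u}(X)$ denotes the $u$-quantile of $X$.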
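To make the two risk measures concrete, the sketch below evaluates them for a single Lognormal loss, for which both have well-known closed forms: VaR is exp(μ + σ z) with z the standard normal quantile, and TVaR is exp(μ + σ²/2) Φ(σ − z) / (1 − α). The parameter values and function names are hypothetical and are not the fitted values from this study; a composite model would need its own quantile function.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and cdf


def lognormal_var(alpha, mu, sigma):
    """alpha-quantile of a lognormal loss."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(alpha))


def lognormal_tvar(alpha, mu, sigma):
    """Expected loss given that the loss exceeds its alpha-quantile."""
    z = STD_NORMAL.inv_cdf(alpha)
    mean = math.exp(mu + 0.5 * sigma ** 2)
    return mean * STD_NORMAL.cdf(sigma - z) / (1.0 - alpha)


# Hypothetical lognormal parameters on the log scale.
MU, SIGMA = 7.6, 1.1

for level in (0.95, 0.99):
    var = lognormal_var(level, MU, SIGMA)
    tvar = lognormal_tvar(level, MU, SIGMA)
    print(f"alpha={level:.2f}  VaR={var:,.0f}  TVaR={tvar:,.0f}")
```

For any continuous loss distribution, TVaR is at least as large as VaR at the same level, and both increase with the security level.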
3. Results
3.1. Data and Its Source
Secondary data obtained from an insurance company were used for this work. The data cover one year of claims and comprise 11,892 data points.
3.2. Preliminary Analysis
The descriptive statistics of the data are presented in Table 1 below:
Table 1. Descriptive statistics of the comprehensive insurance data.
| Statistic | Value (GH) |
| Mean | 3884.2 |
| 1st Quartile | 901.5 |
| Median | 2082.51268 |
| 3rd Quartile | 3961.2 |
| Standard deviation | 7542.157 |
| Skewness | 6.3000868 |
| Minimum | 20 |
| Maximum | 123,758.9 |
3.3. Fitting Single Distributions to Comprehensive Insurance Claims Data
Several single continuous distributions were fitted, but only the results for the top five are presented in Table 2 below. The table summarises the parameter estimates and the AIC values. The Lognormal distribution had the lowest AIC (5178.87), indicating the best fit among the five distributions considered. It was followed by the Gamma distribution, with an AIC of 5325.02, and the Exponential distribution, with an AIC of 5396.66. This supports the Lognormal distribution as the most appropriate single model for the comprehensive data set.
Table 2. Summary of the top 5 single distributions and their parameters.
| Distribution | Parameter Estimate | Goodness of Fit Criteria |
| Pareto | | |
| Weibull | ɣ | |
| Exponential | | |
| Gamma | | |
| Lognormal | | |
3.4. Fitting Composite Distribution to Comprehensive Insurance Data
Table 3. The top five best composite distributions fitted to claims data.
| Composite Distribution | Parameters (Head Distribution) | Parameters (Tail Distribution) | Goodness of fit |
| Gamma-Weibull | ɣ | ɣ | |
| Paralogistic-Weibull | | Ԑ Ʋ | |
| Inverse Paralogistic-Inverse Gaussian | | | |
| Lognormal-Burr | | | |
| Gamma-Inverse Burr | | | |
Table 3 above shows the results of the top five composite models out of the 240 composite models fitted to the data.
Based on the three criteria used to choose the best candidate model, the Lognormal-Burr distribution was the best candidate for the data, with the lowest AIC (1091.2). This means that, for the claims data, the body is best fitted by a Lognormal distribution and the tail by a Burr distribution.
Table 4 below shows the threshold and mixing weight values for the top five composite distributions. The mixing weights, given by $\frac{1}{1+\phi}$ and $\frac{\phi}{1+\phi}$ with $\phi > 0$, determine the proportions of the data set fitted by the body and tail distributions, respectively. For the best composite model, Lognormal-Burr, 90.74% of the data points were fitted with the body distribution (Lognormal), whereas only 9.26% were fitted with the tail distribution (Burr). This clearly shows that the greater part of the losses was modelled by the Lognormal distribution and the remainder by the Burr distribution.
Table 4. Threshold and mixing weight values of the top five composite distributions.
| Composite Distribution | Threshold ($\theta$) | Weight Parameter ($\phi$) |
| Gamma-Weibull | 47.38427 | 0.17421 |
| Paralogistic-Weibull | 39.3100 | 0.0008 |
| Inverse Paralogistic-Inverse Gaussian | 25.9001 | 0.06086 |
| Lognormal-Burr | 36.5539 | 0.4017971 |
| Gamma-Inverse Burr | 30.3393 | 354.3002 |
More generally, if $x_1, \dots, x_n$ are the losses and $\theta$ is the threshold value, then losses $x_i \le \theta$ are fitted by the body (head) distribution whereas losses $x_i > \theta$ are fitted by the tail distribution. For the best composite distribution, Lognormal-Burr, losses up to GH 36,553.9 are modelled by the head distribution (Lognormal) and losses greater than GH 36,553.9 are modelled by the tail distribution (Burr).
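The roles of the threshold and the weight parameter can be sketched in a few lines of Python. The threshold below is the Lognormal-Burr value from Table 4, converted from thousands to GH; the loss values, the value of φ, and the function names are hypothetical and serve only to show the mechanics.

```python
import math


def mixing_weights(phi):
    """Body and tail weights, 1/(1+phi) and phi/(1+phi), for phi > 0."""
    if phi <= 0:
        raise ValueError("phi must be positive")
    return 1.0 / (1.0 + phi), phi / (1.0 + phi)


def split_at_threshold(losses, theta):
    """Losses <= theta go to the body (head); losses > theta go to the tail."""
    body = [x for x in losses if x <= theta]
    tail = [x for x in losses if x > theta]
    return body, tail


# Table 4 reports the Lognormal-Burr threshold in thousands of GH.
theta = 36.5539 * 1000  # GH 36,553.9

# Hypothetical losses in GH, for illustration only.
losses = [250.0, 1800.0, 4200.0, 36553.9, 41000.0, 98000.0]
body, tail = split_at_threshold(losses, theta)

# Hypothetical weight parameter.
w_body, w_tail = mixing_weights(0.25)
print(len(body), len(tail), w_body, w_tail)
```

A loss exactly equal to the threshold is assigned to the body, matching the closed upper end of the head interval.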
3.5. Estimation of the Risk Measures
Using the top five composite models, we now estimate the associated VaR and TVaR at the 95% and 99% security levels. VaR is the worst loss an insurance company is likely to pay on any given trading day at the chosen security level. Once estimated, it can serve as a basis for setting reserves. Estimating reserves correctly is critical: it ensures that only the required amount is set aside to pay claims, so that any surplus can be invested to generate investment income.
From Table 5, the Lognormal-Burr distribution estimated that, at a security level of 95%, a typical insurance company can make a loss of GH 30,116.00 (VaR) or GH 33,210.00 (TVaR) on any given day. These amounts are substantial, and knowing them helps the insurer reserve adequately to meet all of its claim obligations.
Table 5. Risk measures of the top five composite models (in thousands of GH).
| Composite Distribution | VaR 95% | VaR 99% | TVaR 95% | TVaR 99% |
| Gamma-Weibull | 33.502 | 24.346 | 43.119 | 32.108 |
| Paralogistic-Weibull | 31.108 | 25.034 | 39.211 | 31.022 |
| Inverse Paralogistic-Inverse Gaussian | 27.119 | 21.709 | 32.220 | 25.101 |
| Lognormal-Burr | 30.116 | 29.098 | 33.210 | 28.205 |
| Gamma-Inverse Burr | 28.208 | 34.007 | 33.391 | 23.441 |
Given the differences between the VaR and TVaR values, it is useful to average the two risk measures to obtain a single risk estimate which, to the best of our knowledge, may be more reliable. From the table, averaging the Lognormal-Burr VaR and TVaR shows that a typical insurance company can expect to pay claims of GH 31,663.00 on any given day. It is, however, important to note that this value is itself random.
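The averaging step is simple arithmetic on the two Lognormal-Burr figures quoted above, 30.116 and 33.210 (both in thousands of GH). The short Python check below reproduces it; the variable names are illustrative.

```python
import math

# Lognormal-Burr risk measures from Table 5, in thousands of GH.
var_lognormal_burr = 30.116
tvar_lognormal_burr = 33.210

# Simple average of the two risk measures.
average_thousands = (var_lognormal_burr + tvar_lognormal_burr) / 2
average_gh = average_thousands * 1000

print(f"Average risk estimate: GH {average_gh:,.2f}")
```

The result, (30.116 + 33.210) / 2 = 31.663 thousand, is the GH 31,663.00 figure reported in the text.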
3.6. Comparing Best Single Distribution to Best Composite Distribution
The comparison of the best single and best composite distributions is twofold: we compare their AIC values and their Q-Q plots. From Table 2 and Table 3, the AIC values of the Lognormal and Lognormal-Burr distributions are 5178.87 and 1091.2, respectively. The much lower AIC shows that the composite Lognormal-Burr distribution fits better than a single Lognormal distribution fitted to the entire data set.
Again, a comparison of the Q-Q plot of both the single distribution (Lognormal) in Figure 1 and the composite distribution (Lognormal-Burr) in Figure 2 shows that the composite distribution fitted the data better than the single distribution. This further supports the argument that composite distributions give a much better fit to claims data than a single distribution [5] [7] [8].
Figure 1. Q-Q plot of estimated Lognormal distribution.
Figure 2. Q-Q plot of estimated Lognormal-Burr distribution.
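For readers who wish to reproduce this kind of diagnostic, the following Python sketch computes the points of a Q-Q plot for a Lognormal fit: each sorted loss is paired with the fitted quantile at plotting position (i − 0.5)/n. The loss values and the function name are hypothetical, and the choice of plotting position is an assumption, since the paper does not state how its plots were produced.

```python
import math
from statistics import NormalDist, fmean, pstdev


def lognormal_qq_points(losses):
    """Return (theoretical, empirical) quantile pairs for a lognormal fit."""
    logs = [math.log(x) for x in losses]
    # Maximum-likelihood estimates of the lognormal parameters.
    fitted = NormalDist(fmean(logs), pstdev(logs))
    n = len(losses)
    empirical = sorted(losses)
    theoretical = [
        math.exp(fitted.inv_cdf((i - 0.5) / n)) for i in range(1, n + 1)
    ]
    return list(zip(theoretical, empirical))


# Hypothetical losses in GH, for illustration only.
sample = [120.0, 480.0, 950.0, 1600.0, 2100.0, 3300.0, 5200.0, 9100.0, 27000.0, 88000.0]
points = lognormal_qq_points(sample)
for theo, emp in points:
    print(f"{theo:12.1f} {emp:12.1f}")
```

Points lying close to the 45-degree line indicate a good fit; systematic departures in the upper-right corner signal that the fitted model understates or overstates the tail.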
4. Conclusions
In conclusion, in fitting probabilistic distributions for risk estimation to insurance claims data, composite models have demonstrated superiority over single distributions. The results of the study showed that Lognormal distribution was identified as the most appropriate single distribution for the Ghanaian comprehensive insurance claim data. Also, the best composite distribution for the data was the Lognormal-Burr distribution. A comparison of the best single and best composite distribution for the same data showed that the best composite distribution far outperforms the single distributions when comparing their goodness of fit criteria and Q-Q plots.
Because the model captures extreme tail losses well, Value at Risk (VaR) and Tail Value at Risk (TVaR) were also estimated at the 95% and 99% security levels. The study highlights the importance of composite models for accurate risk estimation. This can assist the insurance industry, particularly in markets such as the Ghanaian market where claims have heavy tails, by informing the planning of reserves and the investment of premiums collected by insurers. The best-fit composite distribution estimated a loss (VaR) of 30.116 (in thousands) at a 95% security level and 29.098 (in thousands) at a 99% security level. This loss amount is large and therefore calls for careful planning of reserves to meet such obligations.