
Claims experience in non-life insurance is contingent on random eventualities of claim frequency and claim severity. By design, a single policy may incur more than one claim, so that the total number of claims as well as the total size of claims due on any given portfolio is unpredictable. For insurers to be able to settle claims that may arise from existing portfolios of policies at some future time, it is imperative that they adequately model historical and current claims data; such models can be used to project the expected future claims experience and to set sufficient reserves. Non-life insurance companies face two challenges when modeling claims data: selecting appropriate statistical distributions for the claims data and establishing how well the selected distributions fit them. Accurate evaluation of claim frequency and claim severity plays a critical role in determining an adequate premium loading factor, required reserve levels, product profitability and the impact of policy modifications. While the assessment of insurers’ actuarial risks with respect to their solvency status is a complex process, the first step toward the solution is the modeling of individual claim frequency and severity. This paper presents a methodical framework for choosing suitable probability models that best describe automobile claim frequency and loss severity, as well as their application in risk management. Selected statistical distributions are fitted to historical automobile claims data and their parameters estimated by maximum likelihood. The Chi-square test is used to check the goodness-of-fit of claim frequency distributions, whereas the Kolmogorov-Smirnov and Anderson-Darling tests are applied to claim severity distributions. The Akaike information criterion (AIC) is used to choose between competing distributions.
Empirical results indicate that claim severity data is better modeled using heavy-tailed and skewed distributions. The lognormal distribution is selected as the best distribution to model the claim size while negative binomial and geometric distributions are selected as the best distributions for fitting the claim frequency data in comparison to other standard distributions.

The development of the insurance business is driven by the general demand of society for protection against various types of risks of undesirable random events with a significant economic impact. Insurance is a process that entails the provision of an equitable method of offsetting the risk of a likely future loss with the payment of a premium. The underlying concept is to create a fund to which the insured members contribute predetermined amounts of premium for given levels of loss. When the random events that policyholders are protected against occur, the resulting claims are settled from the fund. The characteristic feature of such an arrangement is that the insured members are faced with a homogeneous set of risks. The positive aspect of forming such communities is the pooling of risks, which enables members to benefit from the weak law of large numbers.

In the non-life insurance industry, there is increased interest in automobile insurance because it requires the management of a large number of risk events. These include cases of theft and damage to vehicles due to accidents or other causes, as well as the extent of damage and the parties involved [

One of the main challenges that non-life insurance companies face is to accurately forecast the expected future claims experience and consequently to determine an appropriate reserve level and premium loading. The erratic results often reported by non-life insurance companies can lead to the premature conclusion that it is virtually impossible to conduct the business on a sound basis. Most non-life insurance companies base their estimates of claim frequency and severity on their own historical claims data. This is sometimes complemented with data from external sources and is used as a basis for managerial decisions [

A general approach to claims data modeling is to consider the claim count experience separately from the claim size experience. Both claim frequency and severity are random variables; hence there is a risk that future claims experience will deviate from past eventualities. It is therefore imperative that appropriate statistical distributions are applied in modeling claims data variables. Although empirical distribution functions are useful tools for analyzing claims processes, it is usually convenient to model claims data by fitting a probability distribution with mathematically tractable features.

Statistical modeling of claims data has gained popularity in recent years, particularly in the actuarial literature, in an attempt to address the problems of premium setting and reserve calculation. Hogg and Klugman [

When assessing portfolio claim frequency, it is often found that certain policies did not incur any claims since the insured loss event did not occur to the insured. Such cases result in many zero claim counts such that the claim frequency random variable takes the value zero with high probability. Antonio et al. [

Modeling claim frequency and claim severity of an insurance company is an essential part of insurance pricing and forecasting future claims. Being able to forecast future claim experience enables the insurance company to make appropriate prior arrangements to reduce the chances of making a loss. Such arrangements include setting a suitable premium for the policies and setting aside money required to settle future claims (reserves). A suitable premium level for an individual policy, with proper investment, should be enough to at least cover the cost of paying a claim on the said policy. Adequate reserves set aside should enable the insurance company to remain solvent, such that it can adequately settle claims when they are due, and in a timely manner. Bahnemann [

The objective of this paper is to present a systematic procedure for selecting probability models that approximately describe automobile claim frequency and loss severity. The method employs the fitting of standard statistical distributions that are relatively straightforward to implement. However, the commonly used distributions for modeling claim frequency and severity may not appropriately describe the actual claims data distributions and may therefore require modifications of standard distributions. Most insurance companies rely on existing models from developed markets that are then customized to model their claim frequency and severity distributions. In practice, specific models from different markets cannot be generalized to appropriately model the claims data of all insurance companies, as different markets have unique characteristics.

This paper gives a general methodology for modeling claim frequency and severity using standard statistical distributions that may serve as a starting point in modeling claims data. Actuarial modeling techniques are utilized in an attempt to fit appropriate statistical probability distributions to general insurance claims data and to select the best fitting probability distribution. In this paper, a sample of automobile portfolio data sets obtained from the insuranceData package in R is used: AutoCollision, dataCar, and dataOhlsson. These data are chosen since they are complete and freely and easily available in the R statistical software. However, any other appropriate data set, especially company-specific data, may also be used. The parameter estimates for the fitted models are obtained using the maximum likelihood estimation procedure. The Chi-square test is used to check the goodness-of-fit for claim frequency models whereas the Kolmogorov-Smirnov and Anderson-Darling tests are used for the claim severity models. The AIC and BIC criteria are used to choose between competing distributions.

The remainder of the paper is organized as follows: Section 2 discusses the statistical modeling procedure for claims data. Section 3 presents the empirical results of the study. Finally, Section 4 concludes the study.

The statistical modeling of claims data involves the fitting of standard probability distributions to the observed claims data. Kaishev and Krachunov [

1) Selection of the claims distribution family.

2) Estimation of parameters of the chosen fitted distributions.

3) Specification of the criteria to select the appropriate distribution from the family of distributions.

4) Testing the goodness of fit of the approximate distributions.

The initial selection of the models is based on prior knowledge of the nature and form of claims data. Claim frequency is usually modeled using non-negative discrete probability distributions since the number of claims is a non-negative integer. Claim severity is best modeled using positive continuous distributions that are skewed to the right and have heavy tails. Kaas, et al. [

This study proposes a number of standard probability distributions that could be used to approximate the distributions for claim amount and claim count random variables. The binomial, geometric, negative-binomial and Poisson distributions are considered for modeling claim frequency as they are discrete. On the other hand, five standard continuous probability distributions are proposed for modeling claim severity. These are the exponential, gamma, Weibull, Pareto and the lognormal distributions. The probability distribution functions along with their parameters estimation and their respective properties are discussed subsequently.

In this paper, the parameters of the selected claim distributions are estimated using the maximum likelihood estimation (MLE) technique. The MLE is a commonly applied method of estimation in a variety of problems. It often yields better estimates than other methods such as least-squares estimation (LSE), the method of quantiles and the method of moments, especially when the sample size is large. Boucher et al. [

Suppose X 1 , X 2 , ⋯ , X n is a random sample of independent and identically distributed (iid) observations drawn from an unknown population. Let X = x denote a realization of a random variable or vector X with probability mass or density function f ( x ; θ ) ; where θ is a scalar or a vector of unknown parameters to be estimated. The objective of statistical inference is to infer θ from the observed data. The MLE involves obtaining the likelihood function of a random variable. The likelihood function L ( θ ) is the probability mass or density function of the observed data x, expressed as a function of the unknown parameter θ . Given that X 1 , X 2 , ⋯ , X n have a joint density function f ( X 1 , X 2 , ⋯ , X n | θ ) for every observed sample of independent observations { x i } , i = 1 , 2 , ⋯ , n , the likelihood function of θ is defined by

L ( θ ) = L ( θ | x 1 , x 2 , ⋯ , x n ) = f ( x 1 , x 2 , ⋯ , x n | θ ) = ∏ i = 1 n f ( x i | θ ) (1)

The principle of maximum likelihood provides a means of choosing an asymptotically efficient estimator for a parameter or a set of parameters θ ^ as the value for the unknown parameter that makes the observed data “most probable”. The maximum likelihood estimate (MLE) θ ^ of a parameter θ is obtained through maximizing the likelihood function L ( θ ) .

θ ^ ( x ) = arg max θ L ( θ ) (2)

Since the logarithm is a strictly increasing function of its argument, maximizing L ( θ ) is equivalent to maximizing log L ( θ ) . Therefore, the log of the likelihood function denoted as l ( θ ) is defined as

l ( θ ) = ln L ( θ ) = ln ∏ i = 1 n f ( x i | θ ) = ∑ i = 1 n ln f ( x i | θ ) (3)
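When the score equation derived from Equation (3) has no closed-form solution, the log-likelihood can be maximized numerically. A minimal sketch in Python (the paper itself works in R; the sample values are hypothetical), using an exponential model whose closed-form MLE λ̂ = 1/x̄ serves as a check on the numerical answer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical claim sizes; any positive sample would do.
x = np.array([1.2, 0.7, 3.1, 0.4, 2.2])

# Negative log-likelihood of an exponential model, f(x|lam) = lam*exp(-lam*x),
# so l(lam) = n*ln(lam) - lam*sum(x) as in Equation (3).
def neg_loglik(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

# Maximize l(lam) by minimizing -l(lam) over a bounded interval.
res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded")
lam_mle = res.x
```

The numerical maximizer agrees with the closed-form estimate 1/x̄; this is the pattern used throughout this section: derive the score equation, and fall back to numerical optimization when it cannot be solved analytically.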

The probability distribution functions of the selected claim distributions along with their maximum likelihood estimates for the parameters are given as follows:

The binomial distribution is a popular discrete distribution for modeling count data. Given a portfolio of n independent insurance policies, let X denote a binomially distributed random variable that represents the number of policies in the portfolio that result in a claim. The claim count variable X can be said to follow a binomial distribution with parameters n and p, where n is a known positive integer representing the number of policies on the portfolio and p is the probability that there is at least one claim on an individual policy. The probability distribution function of X is defined as:

f X ( x ) = ( n x ) p x ( 1 − p ) n − x , for x = 0 , 1 , ⋯ , n and 0 ≤ p ≤ 1 (4)

The expected value of the binomial distribution is np and variance n p ( 1 − p ) . Hence, the variance of the binomial distribution is smaller than the mean.

The corresponding binomial likelihood function is

L ( p ) = ( n x ) p x ( 1 − p ) n − x (5)

Therefore the log-likelihood function is

l ( p ) = k + x ln p + ( n − x ) ln ( 1 − p )

where k is a constant that does not involve the parameter p. To determine the parameter estimate p ^ , we take the derivative of the log-likelihood function with respect to the parameter and equate it to zero.

∂ l ( p ) ∂ p = x p − n − x 1 − p = 0 (6)

Solving the Equation (6) gives the MLE. Thus, the MLE is p ^ = x / n .

The Poisson distribution is a discrete distribution for modeling the count of randomly occurring events in a given time interval. Let X be the number of claim events in a given interval of time and λ the parameter of the Poisson distribution representing the mean number of claim events per interval. The probability of recording x claim events in a given interval is given by

f X ( x ; λ ) = e − λ λ x x ! , for x = 0 , 1 , 2 , ⋯ (7)

A Poisson random variable can take on any non-negative integer value, unlike the binomial distribution, which always has a finite upper limit. The expected value and variance of the Poisson distribution are both equal to λ.

The Poisson likelihood function is

L ( λ ) = ∏ i = 1 n λ x i e − λ x i ! = λ ∑ i = 1 n x i e − λ x 1 ! x 2 ! ⋯ x n ! (8)

Therefore, the log-likelihood function is

l ( λ ) = ∑ i = 1 n x i ln λ − n λ

Differentiating the log-likelihood function with respect to λ , ignoring the constant term that does not depend on λ , and equating to zero gives

∂ l ( λ ) ∂ λ = 1 λ ∑ i = 1 n x i − n = 0 (9)

Solving the Equation (9) gives the MLE. Thus, the MLE is λ ^ = ∑ i = 1 n x i / n .
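A quick numerical check of λ̂ = Σ x_i / n, sketched in Python for illustration (the claim counts are hypothetical simulated values; the paper works in R):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
claims = rng.poisson(lam=1.1, size=10_000)  # simulated per-policy claim counts

lam_hat = claims.mean()  # the MLE from Equation (9)

# The fitted model should reproduce the observed share of zero-claim policies.
p0_fitted = poisson.pmf(0, lam_hat)
p0_observed = (claims == 0).mean()
```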

The geometric distribution is a discrete distribution that generates the probability that the first occurrence of an event of interest requires x independent trials, where each trial results in either success or failure, and the probability of success in any individual trial is constant. The probability distribution function of the geometric distribution is

f X ( x ; p ) = p ( 1 − p ) x ; x = 0 , 1 , 2 , ⋯ (10)

where p is the probability of success, and x is the number of failures before the first success. The geometric distribution is the only memoryless discrete distribution, analogous to the exponential distribution among continuous distributions.

The likelihood function for n independent and identically distributed trials is

L ( p ) = ∏ i = 1 n p ( 1 − p ) x i = p n ∏ i = 1 n ( 1 − p ) x i (11)

The log-likelihood function is:

l ( p ) = n ln p + ln ( 1 − p ) ∑ i = 1 n x i

Thus, to determine the parameter estimate, equate the derivative of the log-likelihood function to zero.

∂ l ( p ) ∂ p = n p − 1 1 − p ∑ i = 1 n x i = 0 (12)

Therefore, the MLE is p ^ = 1 1 + x ¯ .
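The estimator p̂ = 1/(1 + x̄) can be verified on simulated data; a small Python sketch with a hypothetical success probability (note that numpy's geometric variate counts trials, so one is subtracted to get failures before the first success, matching Equation (10)):

```python
import numpy as np

rng = np.random.default_rng(1)
p_true = 0.4
# numpy counts trials until the first success (>= 1); subtract 1 to get the
# number of failures before the first success, as in Equation (10).
failures = rng.geometric(p_true, size=10_000) - 1

p_hat = 1.0 / (1.0 + failures.mean())  # the MLE from Equation (12)
```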

The negative binomial distribution is a generalization of the geometric distribution. Consider a sequence of independent Bernoulli trials, with common success probability p, continued until r successes are obtained. If X denotes the number of failures which occur before the r-th success, then X has a negative binomial distribution given by

f X ( x ; r , p ) = ( x + r − 1 r − 1 ) p r ( 1 − p ) x ; x = 0 , 1 , 2 , ⋯ (13)

The negative binomial distribution has an advantage over the Poisson distribution in modeling because it has two parameters, r > 0 and 0 < p < 1 . The most important feature of this distribution is that the variance is bigger than the expected value, which allows it to capture over-dispersed count data. A further significant feature in comparing these three discrete distributions is that the binomial distribution has a finite range while the negative binomial and Poisson distributions both have infinite ranges.

The likelihood function is

L ( p ; x , r ) = ∏ i = 1 n ( x i + r − 1 r − 1 ) p r ( 1 − p ) x i = ∏ i = 1 n ( x i + r − 1 r − 1 ) p n r ( 1 − p ) ∑ x i (14)

The log-likelihood function is obtained by taking logarithms

l ( p ; x , r ) = ∑ i = 1 n ln ( x i + r − 1 r − 1 ) + n r ln p + ∑ i = 1 n x i ln ( 1 − p )

Taking the derivative with respect to p and equating to zero.

∂ l ∂ p = n r p − ∑ x i 1 − p = 0 (15)

The resulting MLE is p ^ = n r / ( n r + ∑ i = 1 n x i ) = r / ( r + x ¯ ) . Note that the negative binomial random variable can be expressed as a sum of r independent, identically distributed geometric random variables.
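Both facts — the MLE for p with r known, and the sum-of-geometrics representation — can be checked numerically; a Python sketch with hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
r, p = 3, 0.5
x = rng.negative_binomial(r, p, size=20_000)  # failures before the r-th success

# MLE for p with r known, from Equation (15): p_hat = r / (r + mean(x)).
p_hat = r / (r + x.mean())

# A negative binomial draw is the sum of r iid geometric failure counts.
geo_sum = (rng.geometric(p, size=(20_000, r)) - 1).sum(axis=1)
```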

The exponential distribution is a continuous distribution that is usually used to model the time until the occurrence of an event of interest in the process. A continuous random variable X is said to have an exponential distribution if its probability density function (pdf) is given by:

f X ( x ) = λ exp ( − λ x ) , x > 0 (16)

where λ > 0 is the rate parameter of the distribution. The exponential distribution is a special case of both the gamma and Weibull distributions.

The likelihood function is

L ( λ ) = ∏ i = 1 n λ exp ( − λ x i ) = λ n exp ( − λ ∑ i = 1 n x i ) (17)

The log-likelihood functions is therefore

l ( λ ) = n ln λ − λ ∑ i = 1 n x i

The parameter estimate λ ^ is obtained by setting the derivative of the log-likelihood function to zero and solving for λ

d l ( λ ) d λ = n λ − ∑ i = 1 n x i = 0 (18)

The resulting MLE is given by λ ^ = 1 / x ¯ where x ¯ = 1 n ∑ i = 1 n x i is the sample mean.
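A Python cross-check of λ̂ = 1/x̄ against a library fit, on hypothetical simulated severities (scipy parameterizes the exponential by scale = 1/λ; the location is fixed at zero so the fit matches Equation (16)):

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(3)
x = rng.exponential(scale=500.0, size=5_000)  # hypothetical claim sizes, mean 500

lam_hat = 1.0 / x.mean()  # the MLE from Equation (18)

# scipy reports (loc, scale); with loc fixed at 0, scale is the MLE of 1/lambda.
loc, scale = expon.fit(x, floc=0)
```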

The gamma and Weibull distributions are extensions of the exponential distribution. Both are expressed in terms of the gamma function, which is defined by

Γ ( x ) = ∫ 0 ∞ t x − 1 e − t d t , x > 0

The two-parameter gamma distribution function is given by

f X ( x ; α , β ) = β α Γ ( α ) x α − 1 exp ( − β x ) , x > 0 (19)

where α > 0 is the shape parameter and β > 0 is the scale parameter.

The likelihood function for the gamma distribution function is given by

L ( α , β ) = ∏ i = 1 n β α Γ ( α ) x i α − 1 e − β x i (20)

The log-likelihood functions is

l ( α , β ) = n ( α ln β − ln Γ ( α ) ) + ( α − 1 ) ∑ i = 1 n ln x i − β ∑ i = 1 n x i

Thus, to determine the parameter estimates, we equate the derivatives of the log-likelihood function to zero and solve the following equations

∂ l ( α , β ) ∂ α = n ( ln β ^ − d d α ln Γ ( α ^ ) ) + ∑ i = 1 n ln x i = 0 (21)

Hence

∂ l ( α , β ) ∂ β = n α ^ β ^ − ∑ i = 1 n x i = 0 or x ¯ = α ^ β ^ .

Substituting β ^ = α ^ / x ¯ in Equation (21) results in the following relationship for α ^ ,

n ( ln α ^ − ln x ¯ − d d α ln Γ ( α ^ ) + 1 n ∑ i = 1 n ln x i ) = 0 (22)

This results in a non-linear equation in α ^ that cannot be solved in closed form; it can be solved numerically using root-finding methods such as the Newton-Raphson method.
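Equation (22) is a standard root-finding problem in α̂; a Python sketch (on hypothetical simulated severities) solving it with a bracketing root-finder, using the digamma function for d/dα ln Γ(α):

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(4)
x = rng.gamma(shape=2.5, scale=1.25, size=20_000)  # alpha=2.5, beta=1/1.25=0.8

# Equation (22) rearranged: ln(alpha) - digamma(alpha) = ln(xbar) - mean(ln x).
c = np.log(x.mean()) - np.log(x).mean()
alpha_hat = brentq(lambda a: np.log(a) - digamma(a) - c, 1e-3, 1e3)
beta_hat = alpha_hat / x.mean()  # from the substitution beta_hat = alpha_hat / xbar
```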

The Weibull distribution is a continuous distribution that is commonly used to model the lifetimes of components. The Weibull probability density function has two parameters, both positive constants that determine its location and shape. The probability density function of the Weibull distribution is

f X ( x ; γ , β ) = γ β γ x γ − 1 exp ( − ( x β ) γ ) ; x > 0 , γ > 0 , β > 0 (23)

where γ is the shape parameter and β the scale parameter. When γ = 1 , the Weibull distribution reduces to the exponential distribution with rate parameter λ = 1 / β .

The likelihood function for the Weibull distribution is given by:

L ( γ , β ) = ∏ i = 1 n ( γ β ) ( x i β ) γ − 1 exp ( − ( x i β ) γ ) (24)

The log-likelihood function is therefore given by

l ( γ , β ) = n ln γ − n γ ln β + ( γ − 1 ) ∑ i = 1 n ln x i − ∑ i = 1 n ( x i β ) γ

Thus, to determine the parameter estimates, we equate the derivatives of the log-likelihood function to zero and solve the following equations

∂ l ( γ , β ) ∂ γ = n γ − n ln β + ∑ i = 1 n ln x i − ∑ i = 1 n ( x i β ) γ ln ( x i β ) = 0 (25)

∂ l ( γ , β ) ∂ β = − n γ β + γ β γ + 1 ∑ i = 1 n x i γ = 0 (26)

By eliminating β and simplifying, we obtained the following non-linear equation

∑ i = 1 n x i γ ln x i ∑ i = 1 n x i γ − 1 γ − 1 n ∑ i = 1 n ln x i = 0 (27)

This can be solved numerically to obtain the estimate of γ using the Newton-Raphson method. The MLE for β is then given by β ^ = ( 1 n ∑ i = 1 n x i γ ^ ) 1 / γ ^ .
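A Python sketch (with hypothetical simulated data) solving Equation (27) with a bracketing root-finder in place of Newton-Raphson, then recovering β̂ from Equation (26):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(5)
x = rng.weibull(1.8, size=20_000) * 300.0  # shape gamma=1.8, scale beta=300
lx = np.log(x)

# Equation (27): sum(x^g ln x)/sum(x^g) - 1/g - mean(ln x) = 0.
def eq27(g):
    xg = x ** g
    return (xg * lx).sum() / xg.sum() - 1.0 / g - lx.mean()

gamma_hat = brentq(eq27, 0.1, 10.0)
# Equation (26) gives beta^gamma = mean(x^gamma), hence the 1/gamma-th root.
beta_hat = np.mean(x ** gamma_hat) ** (1.0 / gamma_hat)
```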

A positive random variable X is log-normally distributed if the logarithm of the random variable is normally distributed. Hence X follows a lognormal distribution if its probability density function is given by

f X ( x ; μ , σ ) = 1 2 π x σ exp ( − ( ln x − μ ) 2 2 σ 2 ) , x > 0 (28)

with parameters: location μ , scale σ > 0 .

The likelihood function for the lognormal distribution is

L ( μ , σ ) = ∏ i = 1 n 1 2 π x i σ exp ( − ( ln x i − μ ) 2 2 σ 2 ) (29)

Therefore the log-likelihood function is given by

ln L ( μ , σ ) = − n ln σ − n 2 ln 2 π − ∑ i = 1 n [ ln x i + 1 2 σ 2 ( ln x i − μ ) 2 ] .

The parameter estimators μ ^ and σ ^ for the parameters μ and σ can be determined by equating the derivatives of the log-likelihood function to zero and solve the following two equations

∂ ln L ( μ , σ ) ∂ μ = 0 and ∂ ln L ( μ , σ ) ∂ σ = 0 (30)

The resulting estimates are μ ^ = 1 n ∑ i = 1 n ln x i and σ ^ = ( 1 n ∑ i = 1 n ( ln x i − μ ^ ) 2 ) 1 / 2 respectively.
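The closed-form lognormal estimates can be cross-checked against a library fit; a Python sketch on hypothetical simulated sizes (scipy names the parameters s = σ and scale = e^μ, with the location fixed at zero):

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(6)
x = rng.lognormal(mean=7.0, sigma=1.2, size=10_000)  # skewed, heavy-tailed sizes

mu_hat = np.log(x).mean()          # MLE for mu
sigma_hat = np.log(x).std(ddof=0)  # MLE for sigma (divide by n, not n-1)

# scipy parameterization: s = sigma, scale = exp(mu), loc fixed at 0.
s, loc, scale = lognorm.fit(x, floc=0)
```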

The Pareto distribution with parameters α > 0 and γ > 0 is given by

f X ( x ; α , γ ) = α γ α x α + 1 , x ≥ γ (31)

where α > 0 represents the shape parameter and γ > 0 the scale parameter.

The likelihood function is

L ( α , γ ) = ∏ i = 1 n α γ α x i α + 1 , 0 < γ ≤ min { x i } , α > 0

The log-likelihood function

l ( γ , α ) = n ln ( α ) + α n ln ( γ ) − ( α + 1 ) ∑ i = 1 n ln ( x i ) (32)

Note that the log-likelihood is increasing in γ , and γ cannot exceed the smallest observation; hence it is maximized by setting γ ^ = min { x i } . The parameter estimate α ^ is then obtained by equating the derivative of the log-likelihood function with respect to α to zero:

∂ ln L ( γ , α ) ∂ α = n α + n ln ( γ ) − ∑ i = 1 n ln ( x i ) = 0 (33)

Thus, we obtain α ^ = n / ∑ i = 1 n ln ( x i / γ ^ ) .
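A Python sketch verifying γ̂ = min{x_i} and Equation (33) on hypothetical simulated losses (classical Pareto draws generated by inverse-transform sampling):

```python
import numpy as np

rng = np.random.default_rng(7)
alpha_true, gamma_true = 2.5, 100.0
# Inverse transform: X = gamma * U^(-1/alpha) is classical Pareto for U~Unif(0,1).
x = gamma_true * rng.uniform(size=10_000) ** (-1.0 / alpha_true)

gamma_hat = x.min()                               # scale MLE: the sample minimum
alpha_hat = len(x) / np.log(x / gamma_hat).sum()  # shape MLE, Equation (33)
```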

After estimating the parameters of the selected distributions, the next step is to determine the best fitting distribution, i.e., the distribution that provides the best fit to the data set at hand. Establishing the underlying distribution of a data set is fundamental for the correct implementation of claims modeling procedures. This is an important step that must be completed before employing the selected distributions for forecasting insurance claims or pricing insurance contracts. The consequences of misspecifying the underlying distribution may prove very costly. One way of dealing with this problem is to assess the goodness of fit of the selected distributions.

A goodness-of-fit (GoF) test is a statistical procedure that describes how well a distribution fits a set of observations by measuring the compatibility between the estimated theoretical distribution and the empirical distribution of the sample data. GoF tests are based on one of two distribution functions, the probability density function (PDF) or the cumulative distribution function (CDF), in order to test the null hypothesis that the unknown distribution function is in fact a known, specified function. The tests considered for assessing the suitability of the fitted distributions to claims data are the Chi-square goodness-of-fit test, the Kolmogorov-Smirnov (K-S) test, and the Anderson-Darling (A-D) test. These three GoF tests were selected for two reasons. First, they are among the most common statistical tests for small samples (and can be used for large samples as well). Secondly, the tests are implemented in most statistical packages and are therefore widely used in practice. The Chi-square test is based on the PDF, while the K-S and A-D tests use the CDF approach, for which reason the latter two are generally considered more powerful than the Chi-square goodness-of-fit test.

For all the GoF tests, the hypotheses of interest are:

H_{0}: The claims data sample follows a particular distribution,

H_{1}: The claims data sample does not follow the particular distribution.

The Chi-Square goodness of fit test is used to test the hypothesis that the distribution of a set of observed data follows a particular distribution. The Chi-square statistic measures how well the expected frequency of the fitted distribution compares with the observed frequency of a histogram of the observed data. The Chi-square test statistic is:

χ 2 = ∑ j = 1 k ( O j − E j ) 2 E j (34)

where O j is the observed number of cases in interval j, E j is the expected number of cases in interval j of the specified distribution and k is the number of intervals the sample data is divided into. The test statistic approximately follows a Chi-Squared distribution with k − p − 1 degrees of freedom where p is the number of parameters estimated from the (sample) data used to generate the hypothesized distribution.
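A Python sketch of the test in Equation (34), fitting a Poisson model to hypothetical simulated counts and pooling the sparse tail into a single ">= 3" cell so that the expected counts stay reasonably large:

```python
import numpy as np
from scipy.stats import poisson, chi2

rng = np.random.default_rng(8)
claims = rng.poisson(0.9, size=5_000)
lam_hat = claims.mean()  # MLE fitted from the same sample

# Observed and expected counts for the cells {0, 1, 2, >=3}.
obs = np.array([(claims == 0).sum(), (claims == 1).sum(),
                (claims == 2).sum(), (claims >= 3).sum()])
probs = np.append(poisson.pmf([0, 1, 2], lam_hat), poisson.sf(2, lam_hat))
exp_ = len(claims) * probs

chi_sq = ((obs - exp_) ** 2 / exp_).sum()  # Equation (34)
dof = len(obs) - 1 - 1                     # k - p - 1, with p = 1 (lambda)
p_value = chi2.sf(chi_sq, dof)
```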

The Kolmogorov-Smirnov (K-S) test compares a hypothetical or fitted cumulative distribution function (CDF) F ^ ( x ) with an empirical F n ( x ) CDF in order to assess the goodness-of-fit of a given data set to a theoretical distribution. The CDF uniquely characterizes a probability distribution. The empirical F n ( x ) CDF is expressed as the proportion of the observed values that are less than or equal to x and is defined as

F n ( x ) = I ( x ) n (35)

where n is the size of the random sample and I ( x ) is the number of x i 's less than or equal to x. To test the null hypothesis, the maximum absolute distance between the empirical CDF F n ( x ) and the fitted CDF F ^ ( x ) is computed. The K-S test statistic D n is the largest vertical distance between F n ( x ) and F ^ ( x ) over all values of x; i.e.

D n = sup x | F n ( x ) − F ^ ( x ) | (36)

where F ^ ( x ) is the theoretical cumulative distribution function of the distribution being tested, which must be continuous and fully specified.
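A Python sketch of Equation (36), comparing two fitted candidates on the same hypothetical sample (with the caveat that p-values computed after estimating parameters from the same data are optimistic):

```python
import numpy as np
from scipy.stats import kstest, lognorm, expon

rng = np.random.default_rng(9)
sizes = rng.lognormal(mean=6.0, sigma=1.5, size=2_000)  # skewed severities

# Fit both candidates by maximum likelihood (location fixed at zero).
s, _, scale = lognorm.fit(sizes, floc=0)
_, e_scale = expon.fit(sizes, floc=0)

# D_n = sup_x |F_n(x) - F_hat(x)| for each fitted CDF.
d_ln, p_ln = kstest(sizes, lognorm(s, scale=scale).cdf)
d_ex, p_ex = kstest(sizes, expon(scale=e_scale).cdf)
```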

The Anderson-Darling (A-D) test is an alternative to other statistical tests for testing whether a given sample of data is drawn from a specified probability distribution. The test statistic is non-directional, and is calculated using the following formula:

AD = − n − 1 n ∑ i = 1 n ( 2 i − 1 ) [ ln F ^ ( x ( i ) ) + ln ( 1 − F ^ ( x ( n + 1 − i ) ) ) ] (37)

where { x ( 1 ) < ⋯ < x ( n ) } is the ordered sample of size n and F ^ ( x ) is the fitted theoretical distribution to which the sample is compared. The Chi-square goodness-of-fit test is applied to select the appropriate discrete claim frequency distribution, while the K-S and A-D tests are used to select the continuous claim severity distributions. For each of the GoF tests considered, the null hypothesis that the sample data are drawn from a population with the hypothesized distribution is rejected if the p-value is smaller than the chosen significance level α , such as 1% or 5%.
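Equation (37) can be implemented directly; a Python sketch computing the A-D statistic for a lognormal fit on hypothetical data (the fitted CDF is evaluated at the ordered sample, so the values z_i = F̂(x_(i)) are sorted probabilities):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)
x = rng.lognormal(mean=5.0, sigma=1.0, size=500)

# Lognormal MLEs, then z_i = F_hat(x_(i)) via the normal CDF of log x.
mu_hat, sig_hat = np.log(x).mean(), np.log(x).std(ddof=0)
z = norm.cdf((np.log(np.sort(x)) - mu_hat) / sig_hat)

n = len(z)
i = np.arange(1, n + 1)
# Equation (37): z[::-1] supplies F_hat(x_(n+1-i)) for the i-th term.
ad = -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))
```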

For all the selected claims frequency and claims severity distributions that pass the GoF tests, information criterion, namely the Akaike’s information criterion (AIC) developed by Akaike [

AIC = − 2 l ( θ ^ | x ) + 2 k (38)

where l ( θ ^ | x ) is the log-likelihood function, θ ^ the estimated parameter vector of the model, x the empirical data and k the length of the parameter vector. The first part − 2 l ( θ ^ | x ) is a measure of the goodness-of-fit of the selected model and the second part is a penalty term, penalizing the complexity of the model. In contrast to the AIC, the Bayesian information criterion (BIC), or Schwarz criterion (SBC), includes the number of observations in the penalty term. Thus in the BIC, the penalty for additional parameters is stronger than in the AIC. Apart from that, the BIC is similar to the AIC and is defined as

BIC = − 2 l ( θ ^ | x ) + k log ( n ) (39)

where l ( θ ^ | x ) is the log-likelihood function, θ ^ the estimated parameter vector of the model with length k and x the empirical data vector of length n. As with the AIC, the first term is a measure of the goodness-of-fit and the second part is a penalty term, comprising the number of parameters as well as the number of observations. Note that for both the AIC and BIC the model specification with the lowest value implies the best fit.
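A Python sketch of the AIC/BIC comparison in Equations (38)-(39), scoring a lognormal fit (k = 2) against an exponential fit (k = 1) on the same hypothetical sample:

```python
import numpy as np
from scipy.stats import lognorm, expon

rng = np.random.default_rng(11)
x = rng.lognormal(mean=6.5, sigma=1.3, size=3_000)
n = len(x)

def aic_bic(loglik, k):
    # Equations (38) and (39): the BIC penalty grows with log(n).
    return -2 * loglik + 2 * k, -2 * loglik + k * np.log(n)

s, _, sc = lognorm.fit(x, floc=0)
aic_ln, bic_ln = aic_bic(lognorm(s, scale=sc).logpdf(x).sum(), k=2)

_, e_sc = expon.fit(x, floc=0)
aic_ex, bic_ex = aic_bic(expon(scale=e_sc).logpdf(x).sum(), k=1)

# The model with the lowest AIC/BIC value is preferred.
```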

The data sets used in modeling the claim frequency and claim severity distributions are three: AutoCollision, dataCar and dataOhlsson, obtained from the R package insuranceData. The AutoCollision data set is a sample of 32 observations due to [

| Statistic | AutoCollision Frequency | AutoCollision Severity | dataCar Frequency | dataCar Severity | dataOhlsson Frequency | dataOhlsson Severity |
|---|---|---|---|---|---|---|
| No. Obs | 32 | 32 | 67,856 | 67,856 | 64,548 | 64,548 |
| Minimum | 5.00 | 153.62 | 0.00 | 0.00 | 0.00 | 0.00 |
| Maximum | 970 | 797.80 | 4 | 55,922 | 2 | 365,347 |
| Mean | 279.44 | 276.35 | 0.073 | 137.27 | 0.01 | 264.02 |
| Median | 208.00 | 250.53 | 0.00 | 0.00 | 0.00 | 0.00 |
| Std.dev | 241.61 | 110.45 | 0.2828 | 1058.30 | 0.10 | 4690.42 |
| Skewness | 1.19 | 3.22 | 4.07 | 17.50 | 10.5 | 30.6 |
| Kurtosis | 3.84 | 15.67 | 21.50 | 482.9 | 121.30 | 1336.00 |

| Occurrence | dataCar Frequency | dataCar Percentage | dataOhlsson Frequency | dataOhlsson Percentage |
|---|---|---|---|---|
| 0 | 63,232 | 93.18% | 63,878 | 98.96% |
| 1 | 4333 | 6.39% | 643 | 1.00% |
| 2 | 271 | 0.40% | 27 | 0.04% |
| 3 | 18 | 0.03% | | |
| 4 | 2 | 0.00% | | |
| Total | 67,856 | 100% | 64,548 | 100% |

This over-dispersion suggests using the negative binomial instead of the Poisson distribution to model the data.

The parameters of the fitted claim frequency and claim severity distributions are estimated using the maximum likelihood method, implemented via packages in the R statistical software.

In order to compare the fitted distributions, we use the LLF, AIC and BIC values of the fitted distributions. For the AutoCollision data, the geometric distribution has the lowest AIC, followed by the negative binomial distribution. For both dataCar and dataOhlsson, the Poisson distribution has the lowest AIC value, followed by the negative binomial distribution. The over-dispersion observed in the data suggests that the negative binomial or geometric distributions might be good candidates for modeling the claim frequency data. This is verified later by the Chi-square goodness-of-fit test.

| Statistic | dataCar Frequency | dataCar Severity | dataOhlsson Frequency | dataOhlsson Severity |
|---|---|---|---|---|
| Claims Count | 67,856 | 4624 | 64,548 | 670 |
| Minimum | 0.00 | 200 | 0.00 | 16 |
| Maximum | 4 | 55,922 | 2 | 365,000 |
| Mean | 0.073 | 2014 | 0.01 | 25,000 |
| Median | 0.00 | 762 | 0.00 | 9015 |
| Std.dev | 0.2828 | 3549.65 | 0.10 | 38729.83 |
| Skewness | 4.07 | 5.04 | 10.46 | 3.10 |
| Kurtosis | 21.50 | 43.20 | 121.20 | 17.52 |

The likelihood function (LLF), AIC and BIC values for the fitted claim severity distributions are reported as well. The parameter estimates for all the fitted distributions are obtained for the three data sets, except for those of the Pareto distribution in the AutoCollision data. The LLF, AIC and BIC criteria are employed to select the appropriate distribution among the fitted candidates. The distribution function with the maximum LLF and the lowest AIC or BIC values is

| Distribution | Quantity | AutoCollision | dataCar | dataOhlsson |
|---|---|---|---|---|
| Binomial | n | 8942 | 67856 | 64548 |
| | p (s.e.) | 0.03125122 (0.00032496) | 1.072227e−06 (0.000000) | 1.672889e−07 (0.000000) |
| | LLF | −3249.72 | − | − |
| | AIC | 6501.441 | − | − |
| | BIC | 6502.907 | − | − |
| Geometric | p (s.e.) | 0.003565857 (0.00057441) | 0.4836314 (0.00511074) | 0.4901244 (0.0135207) |
| | LLF | −212.3061 | −6622.056 | −947.2655 |
| | AIC | 426.6122 | 13246.11 | 1896.531 |
| | BIC | 428.0779 | 13252.55 | 1901.038 |
| Negative Binomial | size (s.e.) | 1.216671 (0.2757261) | 8.392017e+06 (1.43127302) | 1.088848e+07 (0.000000) |
| | mu (s.e.) | 279.447627 (44.8845448) | 1.067867 (0.01519796) | 1.040076 (0.03939565) |
| | LLF | −211.9508 | −4840.089 | −688.1782 |
| | AIC | 427.9016 | 9684.177 | 1380.356 |
| | BIC | 430.8331 | 9697.055 | 1389.371 |
| Poisson | lambda (s.e.) | 279.4375 (2.955068) | 1.06769 (0.01519544) | 1.040299 (0.03940408) |
| | LLF | −3144.531 | −4840.088 | −688.1781 |
| | AIC | 6291.062 | 9682.177 | 1378.356 |
| | BIC | 6292.528 | 9688.616 | 1382.863 |

considered to be the best fit. The lognormal distribution has the highest log-likelihood function and also the smallest AIC and BIC values, and is hence the best fit among all the distributions fitted, for all three datasets. The log-likelihood of the gamma distribution is reasonably close to that of the lognormal, making it the second-best fitting distribution. The other distributions have values that are far out of range. However, the LLF, AIC and BIC at best show which distributions are better than the competing ones; they do not necessarily qualify the chosen distributions as the most appropriate. To remedy this, the Kolmogorov-Smirnov and Anderson-Darling goodness-of-fit tests are used to check the fit of these claim severity distributions. These goodness-of-fit tests verify whether the proposed theoretical distributions provide a reasonable fit to the empirical data.
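The AIC-based screening of candidate severity distributions can be sketched as follows. The data here are simulated and hypothetical; on lognormal-like data the lognormal wins, mirroring the paper's finding on the real datasets.

```python
import numpy as np
from scipy import stats

# Hypothetical heavy-tailed claim sizes.
rng = np.random.default_rng(1)
x = rng.lognormal(mean=6.8, sigma=1.2, size=2000)

candidates = {
    "exponential": stats.expon,
    "gamma": stats.gamma,
    "weibull": stats.weibull_min,
    "lognormal": stats.lognorm,
}

aic = {}
for name, dist in candidates.items():
    params = dist.fit(x, floc=0)        # location fixed at 0 for claim sizes
    llf = np.sum(dist.logpdf(x, *params))
    k = len(params) - 1                 # loc was fixed, not estimated
    aic[name] = 2 * k - 2 * llf

best = min(aic, key=aic.get)
```

As the paper notes, such a ranking only identifies the best of the candidates; formal goodness-of-fit testing is still needed to confirm the winner is adequate.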

The purpose of goodness-of-fit tests is typically to measure the distance between the fitted parametric distribution and the empirical distribution: e.g., the

| Distribution | Statistic | AutoCollision | dataCar | dataOhlsson |
|---|---|---|---|---|
| Exponential | rate (s.e.) | 0.0036186 (0.0005856) | 0.007284904 (2.742692e−05) | 0.003787624 (1.376899e−05) |
| | LLF | −211.8936 | −401,839.9 | −424,468.7 |
| | AIC | 425.7873 | 803,681.8 | 848,939.4 |
| | BIC | 427.253 | 803,690.9 | 848,948.5 |
| Gamma | shape (s.e.) | 10.14141 (2.470826) | 0.7500861 (0.000000) | 0.5951737 (0.000000) |
| | scale (s.e.) | 0.036695 (0.009158) | 2686.2118 (0.000000) | 42,728.44 (0.000000) |
| | LLF | −187.1523 | −39,662.92 | −7392.141 |
| | AIC | 378.3046 | 79,329.85 | 14,788.28 |
| | BIC | 381.2361 | 79,342.72 | 14,797.3 |
| Pareto | shape (s.e.) | N/A | 2.046569 (0.08776018) | 1.482972 (0.211924) |
| | scale (s.e.) | N/A | 2206.086511 (132.315102) | 16,839.7191 (3878.686483) |
| | LLF | N/A | −39,169.85 | −7377.696 |
| | AIC | N/A | 78,343.7 | 14,759.39 |
| | BIC | N/A | 78,356.58 | 14,768.41 |
| Lognormal | meanlog (s.e.) | 5.5715751 (0.05141875) | 6.810081 (0.01748793) | 9.104990 (0.06241052) |
| | sdlog (s.e.) | 0.2908684 (0.03635662) | 1.189179 (0.01236580) | 1.615456 (0.04413083) |
| | LLF | −184.1801 | −38,852.15 | −7372.376 |
| | AIC | 372.3603 | 77,708.31 | 14,748.75 |
| | BIC | 375.2917 | 77,721.19 | 14,757.77 |
| Weibull | shape (s.e.) | 2.459737 (0.2704631) | 0.7857048 (0.0081805) | 0.7026427 (0.02048285) |
| | scale (s.e.) | 309.842046 (23.7470479) | 1690.905575 (33.6783355) | 20,437.75 (1089.2853) |
| | LLF | −194.4251 | −39,491.6 | −7377.065 |
| | AIC | 392.8502 | 78,987.19 | 14,758.13 |
| | BIC | 395.7817 | 79,000.07 | 14,767.14 |

distance between the fitted cumulative distribution function F̂(x) and the empirical distribution function F_n(x). The Chi-square goodness-of-fit test is used to select the most appropriate claims frequency distribution among the fitted discrete probability distributions.
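A sketch of the Chi-square procedure on simulated, overdispersed counts (hypothetical data; with extra-Poisson variation present, the Poisson null is rejected, as in the table below):

```python
import numpy as np
from scipy import stats

# Hypothetical per-policy claim counts, overdispersed like the real data.
rng = np.random.default_rng(7)
counts = rng.negative_binomial(n=1.1, p=0.5, size=10000)

# Group the tail into one cell so expected counts stay comfortably above 5,
# then test H0: counts ~ Poisson(lambda_hat).
lam = counts.mean()
max_cell = 4
observed = np.array([np.sum(counts == k) for k in range(max_cell)]
                    + [np.sum(counts >= max_cell)])
probs = stats.poisson.pmf(np.arange(max_cell), lam)
probs = np.append(probs, 1.0 - probs.sum())
expected = probs * counts.size

# One parameter (lambda) was estimated, so one extra degree of freedom is lost.
chi2 = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1 - 1
p_value = stats.chi2.sf(chi2, dof)
```

A small p-value leads to rejecting the hypothesized frequency distribution, which is exactly how the binomial and Poisson fits are ruled out below.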

The p-values are used to determine whether or not to reject the null

| Distribution | AutoCollision: Test Statistic (p-value) | dataCar: Test Statistic (p-value) | dataOhlsson: Test Statistic (p-value) |
|---|---|---|---|
| Binomial | 1.9155e24 (0.0000) | 98.734 (0.0000) | 148.01 (0.0000) |
| Geometric | 1.015653 (0.6018) | 1.868321 (0.17166) | 54.58665 (0.0000) |
| Negative Binomial | 0.514027 (0.4734) | 0.255171 (0.61346) | 1.493632 (0.22165) |
| Poisson | 1.6768e17 (0.0000) | 98.7294 (0.0000) | 184.002 (0.0000) |

hypothesis. The negative binomial and geometric distributions have p-values greater than α = 0.01 in all cases except for the geometric distribution on dataOhlsson. The p-values for the binomial and Poisson distributions are zero for all three data sets; hence the null hypothesis that the claims data follow the particular distribution is rejected for the binomial and Poisson distributions at the 99% confidence level. Thus, the Chi-square goodness-of-fit test confirms that both the negative binomial and geometric distributions are appropriate for modeling claims frequency, while the binomial and Poisson distributions provide an inappropriate fit.

To select the most appropriate continuous distributions for the claims severity, two goodness-of-fit statistics are typically considered: the K-S and A-D statistics.

| Dataset | Distribution | K-S test statistic | A-D test statistic |
|---|---|---|---|
| AutoCollision | Exponential | 0.4695586 | 8.1269415 |
| | Gamma | 0.1605827 | 1.2760080 |
| | Weibull | 0.2339492 | 2.8469195 |
| | Pareto | − | − |
| | Lognormal | 0.1410449 | 0.8257456 |
| dataCar | Exponential | 0.1870179 | 341.7652236 |
| | Gamma | 0.1502237 | 191.3327388 |
| | Weibull | 0.1704634 | 139.5430172 |
| | Pareto | 0.1627262 | 87.9184259 |
| | Lognormal | 0.1021038 | 72.4949308 |
| dataOhlsson | Exponential | 0.21022191 | 59.0055091 |
| | Gamma | 0.09457255 | 7.97122388 |
| | Weibull | 0.07629782 | 4.48316734 |
| | Pareto | 0.05641147 | 3.01466283 |
| | Lognormal | 0.04244645 | 1.82690162 |
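As a sketch of how such K-S and A-D statistics are obtained (simulated data with hypothetical parameters; note that K-S p-values are only approximate when the parameters are estimated from the same sample):

```python
import numpy as np
from scipy import stats

# Hypothetical claim sizes, loosely in the spirit of a lognormal severity fit.
rng = np.random.default_rng(3)
x = rng.lognormal(mean=9.1, sigma=1.6, size=500)

# Fit by MLE, then take the K-S distance between empirical and fitted CDFs.
shape, loc, scale = stats.lognorm.fit(x, floc=0)
ks_stat, ks_p = stats.kstest(x, "lognorm", args=(shape, loc, scale))

# Anderson-Darling statistic against the fitted CDF, computed directly:
# A2 = -n - (1/n) * sum_i (2i-1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
z = np.sort(stats.lognorm.cdf(x, shape, loc, scale))
z = np.clip(z, 1e-12, 1 - 1e-12)   # guard the logarithms
n = len(z)
i = np.arange(1, n + 1)
a2 = -n - np.mean((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1])))
```

Because the A-D statistic weights the tails more heavily than the K-S statistic, it is the more sensitive of the two for heavy-tailed severity data.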

Non-life insurance companies require an accurate insurance pricing system that makes adequate provision for contingencies, expenses, losses, and profits. Claims incurred by an insurance company form a large part of the cash outgo of the company. An insurance company is required to model its claims frequency and severity in order to forecast future claims experience and prepare adequately for claims when they fall due. In this paper, selected discrete and continuous probability distributions are used as approximate distributions for modeling both the frequency and severity of claims made on automobile insurance policies. The probability distributions are fitted to three datasets of insurance claims in an attempt to determine the most appropriate distributions for describing non-life insurance claims data.

The findings from the empirical analysis indicate that the claims severity distribution is more accurately modeled by a skewed and heavy-tailed distribution. Among the continuous claims severity distributions fitted, the lognormal distribution is selected as a reasonably good distribution for modeling claims severity. On the other hand, the negative binomial and geometric distributions are selected as the most appropriate distributions for the claims frequency compared to other standard discrete distributions. These results may serve as informative baseline models for automobile insurance companies when choosing models for their claims experience and making liability forecasts. The forecasts obtained under such distributions are, however, suitable only for projecting short-term claims behavior. For long-term use, it is recommended that the company use its own claims experience to make the necessary adjustments to the distributions. This would allow for anticipated changes in the portfolios and for company-specific financial objectives. The proposed claims distributions would also be useful to insurance regulators in their own assessment of required reserve levels for various companies and in checking for solvency.

As a suggestion for further research, interested parties may look into extensions of the standard probability distributions covered here, such as zero-inflated models. Due to the large number of zeros in claims frequency data, there is a need to consider other distributions, such as the zero-truncated Poisson, the zero-truncated negative binomial, and zero-modified distributions from the (a, b, 1) class, to model this unique phenomenon.
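For illustration, the zero-truncated Poisson mentioned above simply renormalizes the Poisson pmf over k ≥ 1. This is a sketch; the value λ = 1.04 merely echoes the fitted dataOhlsson Poisson mean and is otherwise arbitrary.

```python
import numpy as np
from scipy import stats

def ztpois_pmf(k, lam):
    """Zero-truncated Poisson pmf: P(K = k | K >= 1), for k = 1, 2, ..."""
    k = np.asarray(k)
    pmf = stats.poisson.pmf(k, lam) / (1.0 - np.exp(-lam))
    return np.where(k >= 1, pmf, 0.0)

# The truncated probabilities over k >= 1 sum to one.
ks = np.arange(1, 200)
total = float(ztpois_pmf(ks, 1.04).sum())
```

Zero-inflated and zero-modified variants instead mix an extra point mass at zero with a standard count distribution, letting the zero cell be fitted separately from the positive counts.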

Omari, C.O., Nyambura, S.G. and Mwangi, J.M.W. (2018) Modeling the Frequency and Severity of Auto Insurance Claims Using Statistical Distributions. Journal of Mathematical Finance, 8, 137-160. https://doi.org/10.4236/jmf.2018.81012