Estimation of Aggregate Losses of Secondary Cancer Using PH-OPPL and PH-TPPL Distributions
1. Introduction
Aggregate losses are estimated by incorporating both claim frequency and claim severity distributions. Pavel (2010) [1] reviewed methods used to calculate aggregate loss distributions. Robertson (1992) [2] applied the Discrete Fourier Transform to estimate aggregate losses from frequency and severity distributions. Rono et al. (2020) [3] developed a compound distribution to model extreme natural disasters in Kenya. Mohamed et al. (2010) [4] introduced a simulation approach to estimating aggregate losses that can be used when the frequency and severity distributions cannot be combined into a compound distribution. Aggregate loss distributions are based on the collective risk model, expressed as:
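$$S = X_1 + X_2 + \cdots + X_N = \sum_{i=1}^{N} X_i \tag{1}$$
where $X_1, X_2, \ldots$ are independent and identically distributed claim severities, independent of $N$, and $N$ is the number of claims. In this paper, $N$ is assumed to follow mixed PH Poisson distributions.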
Phase type distributions are constructed when mixture distributions are convolved, resulting in an interrelated Poisson process occurring in phases. Phase type distributions were introduced by Erlang (1909) [5] and were advanced by Marcel F. Neuts (1981) [6] and Asmussen (2003) [7], among others. Mogens Bladt (2005) [8] introduced phase type distributions in risk theory, while O'Cinneide (2017) [9] discussed phase type distributions and their invariant polytopes. Wu et al. (2010) [10] developed phase type distributions when frequency distributions followed Panjer class
while Kok et al. (2010) [11] used phase type distributions of Panjer class
to model claim frequency.
Markov chains were introduced by Andrei Markov (1856-1922). Nurul et al. (2019) [12] proposed a simple model for forecasting future air quality that incorporates Markov chains as an operator for evaluating the long-run pollution distribution. Yajuan et al. (2018) [13] used Markov chains to model demand at stations in bike-sharing systems. In this study, Markov chains are used to determine the matrices of the phase type distributions used to model claim frequency.
Frequency data are used to model occurrences in areas such as engineering, insurance and biology. The Poisson distribution is often used to model count data; however, it assumes that the variance-to-mean ratio is unity (equi-dispersion), an assumption that rarely holds for real data, which makes it an inflexible model. Most real-life data exhibit either over-dispersion, where the variance exceeds the mean, or under-dispersion, where the mean exceeds the variance; over-dispersed data can be modeled using Poisson mixtures [14]. Poisson Lindley distributions are examples of Poisson mixtures in which the Poisson mean follows a Lindley distribution. The one parameter Poisson Lindley distribution, which can model over-dispersed data, was introduced by Sankaran (1970) [15], while Shanker and Mishra (2014) [16] developed a two parameter Poisson Lindley distribution that subsequent research has shown can also model over-dispersed data.
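To illustrate the over-dispersion property, the sketch below evaluates Sankaran's one parameter Poisson Lindley probability mass function, $P(X=x) = \theta^2(x+\theta+2)/(\theta+1)^{x+3}$ for $x = 0, 1, 2, \ldots$, together with its closed-form mean and variance, and checks them against a truncated numerical sum. The value of $\theta$ is hypothetical and chosen only for illustration.

```python
import math


def poisson_lindley_pmf(x, theta):
    """One parameter Poisson Lindley pmf (Sankaran, 1970)."""
    return theta**2 * (x + theta + 2) / (theta + 1) ** (x + 3)


def poisson_lindley_mean_var(theta):
    """Closed-form mean and variance of the one parameter Poisson Lindley."""
    mean = (theta + 2) / (theta * (theta + 1))
    var = (theta**3 + 4 * theta**2 + 6 * theta + 2) / (theta**2 * (theta + 1) ** 2)
    return mean, var


theta = 1.5  # hypothetical parameter value
xs = range(400)  # truncation point; the tail beyond it is negligible here
probs = [poisson_lindley_pmf(x, theta) for x in xs]

num_mean = sum(x * p for x, p in zip(xs, probs))
num_var = sum(x * x * p for x, p in zip(xs, probs)) - num_mean**2
mean, var = poisson_lindley_mean_var(theta)

print(f"total={math.fsum(probs):.6f} mean={mean:.4f} var={var:.4f} ratio={var / mean:.4f}")
```

For every $\theta > 0$ the variance exceeds the mean, as expected for a Poisson mixture, since $\operatorname{Var}(X) = E[\Lambda] + \operatorname{Var}(\Lambda)$.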
In the insurance sector, the claim frequency distributions used to calculate aggregate losses for chronic diseases with several stages, such as cancer, do not incorporate the different stages of these diseases. Incorporating phase type distributions addresses this shortcoming of ordinary distributions. Using mixed phase type distributions further improves the modeling of claim frequency data because it accounts for the heterogeneity of claim data. In this paper, we develop the PH one parameter Poisson Lindley (PH-OPPL) and PH two parameter Poisson Lindley (PH-TPPL) distributions, in which the mixing distribution follows a PH Lindley distribution. The resulting PH distributions are used to model the number of claims for secondary cancers in Kenya. Section 1 gives a brief introduction to Poisson and Poisson Lindley distributions.
The rest of this paper is structured as follows. Section 2 constructs phase type distributions using PH Lindley distributions, which are later applied in modeling the aggregate losses. Section 3 develops compound distributions from the frequency and severity distributions. Section 4 estimates the aggregate losses for the data using Discrete Fourier Transforms and discusses the results, and Section 5 presents the conclusions.
2. Proposed Phase Type Poisson Lindley Distributions
In this section we develop phase type distributions for the one parameter and two parameter Poisson Lindley distributions. Phase type Poisson Lindley distributions are derived when the mixing distribution follows a phase type Lindley distribution.
2.1. Phase Type One Parameter Poisson Lindley Distribution
Definition 1. A random variable X is said to follow a phase type one parameter Poisson Lindley (PH-OPPL) distribution if:
for
and
is
matrix.
Theorem 1. If
distribution, then the probability mass function of X is:
(2)
where
is
and I is an identity matrix.
Proof:
If
and
, then the probability mass function of X is:
where
is
.
(3)
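The exact form of Equation (2) is not reproduced here. As a hedged illustration, assume that the mixing distribution is the Lindley distribution written in phase type form $(\boldsymbol{\alpha}, T)$: an Erlang(2, $\theta$) branch entered with probability $1/(\theta+1)$ and an Exp($\theta$) branch entered with probability $\theta/(\theta+1)$. For a Poisson mixture over any phase type distribution, integrating the Poisson probabilities against the PH density gives $P(X=x) = \boldsymbol{\alpha}(I - T)^{-(x+1)}\mathbf{t}$ with exit vector $\mathbf{t} = -T\mathbf{1}$. The sketch checks this against Sankaran's closed form; the representation, parameter value and function names are assumptions for illustration, not necessarily the authors' parameterization.

```python
import numpy as np


def lindley_ph(theta):
    """Assumed PH representation (alpha, T) of the Lindley(theta) density
    theta^2/(theta+1) (1+x) exp(-theta x): start in phase 1 (Erlang-2 path)
    with probability 1/(theta+1), or in phase 2 (single exponential)."""
    alpha = np.array([1.0 / (theta + 1), theta / (theta + 1)])
    T = np.array([[-theta, theta],
                  [0.0, -theta]])
    return alpha, T


def poisson_ph_pmf(x, alpha, T):
    """P(X = x) = alpha (I - T)^{-(x+1)} t for X | Lambda ~ Poisson(Lambda),
    Lambda ~ PH(alpha, T), with exit vector t = -T 1."""
    k = T.shape[0]
    t = -T @ np.ones(k)
    M = np.linalg.inv(np.eye(k) - T)
    return float(alpha @ np.linalg.matrix_power(M, x + 1) @ t)


theta = 1.5  # hypothetical parameter value
alpha, T = lindley_ph(theta)
ph_probs = np.array([poisson_ph_pmf(x, alpha, T) for x in range(10)])
closed_form = np.array([theta**2 * (x + theta + 2) / (theta + 1) ** (x + 3) for x in range(10)])
print(np.max(np.abs(ph_probs - closed_form)))
```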
Properties of Phase Type One Parameter Poisson Lindley Distribution
The rth moment of the PH-OPPL distribution is given by:
(4)
The expectation and variance of the PH-OPPL distribution follow directly from Equation (4):
1) Expectation
(5)
2) Variance
(6)
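For a Poisson mixture, the rth factorial moment of X equals the rth moment of the mixing variable $\Lambda$, and for $\Lambda \sim \mathrm{PH}(\boldsymbol{\alpha}, T)$ this moment is $r!\,\boldsymbol{\alpha}(-T)^{-r}\mathbf{1}$. The sketch below uses this identity, under the same assumed Lindley phase type representation (a hypothetical choice, not a reproduction of Equations (4) to (6)), to compute the expectation and variance and compares them with the known one parameter Poisson Lindley moments.

```python
import math

import numpy as np


def lindley_ph(theta):
    """Assumed PH representation (alpha, T) of the Lindley(theta) distribution."""
    alpha = np.array([1.0 / (theta + 1), theta / (theta + 1)])
    T = np.array([[-theta, theta], [0.0, -theta]])
    return alpha, T


def ph_moment(r, alpha, T):
    """E[Lambda^r] = r! alpha (-T)^{-r} 1 for Lambda ~ PH(alpha, T)."""
    U = np.linalg.inv(-T)
    ones = np.ones(T.shape[0])
    return math.factorial(r) * float(alpha @ np.linalg.matrix_power(U, r) @ ones)


def poisson_ph_mean_var(alpha, T):
    """For X | Lambda ~ Poisson(Lambda): E[X] = E[Lambda] and
    Var(X) = E[Lambda] + Var(Lambda)."""
    m1 = ph_moment(1, alpha, T)
    m2 = ph_moment(2, alpha, T)
    return m1, m1 + (m2 - m1**2)


theta = 1.5  # hypothetical parameter value
alpha, T = lindley_ph(theta)
mean, var = poisson_ph_mean_var(alpha, T)

closed_mean = (theta + 2) / (theta * (theta + 1))
closed_var = (theta**3 + 4 * theta**2 + 6 * theta + 2) / (theta**2 * (theta + 1) ** 2)
print(mean, var)
```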
The probability generating function of the PH-OPPL distribution is given by:
(7)
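Similarly, the pgf of a Poisson mixture is the Laplace transform of the mixing distribution evaluated at $1 - z$, which for a phase type mixing distribution gives $G_X(z) = \boldsymbol{\alpha}\big((1-z)I - T\big)^{-1}\mathbf{t}$. The sketch checks this, under the same assumed Lindley phase type representation, against the known one parameter Poisson Lindley pgf $\theta^2(\theta+2-z)/\big((\theta+1)(\theta+1-z)^2\big)$. It is illustrative only and does not reproduce Equation (7).

```python
import numpy as np


def lindley_ph(theta):
    """Assumed PH representation (alpha, T) of the Lindley(theta) distribution."""
    alpha = np.array([1.0 / (theta + 1), theta / (theta + 1)])
    T = np.array([[-theta, theta], [0.0, -theta]])
    return alpha, T


def poisson_ph_pgf(z, alpha, T):
    """G_X(z) = alpha ((1 - z) I - T)^{-1} t, valid for |z| <= 1."""
    k = T.shape[0]
    t = -T @ np.ones(k)
    return complex(alpha @ np.linalg.solve((1 - z) * np.eye(k) - T, t))


theta = 1.5  # hypothetical parameter value
alpha, T = lindley_ph(theta)
for z in (-1.0, 0.0, 0.5, 1.0):
    closed = theta**2 * (theta + 2 - z) / ((theta + 1) * (theta + 1 - z) ** 2)
    print(z, poisson_ph_pgf(z, alpha, T).real, closed)
```

At $z = 1$ the pgf equals 1, and at $z = 0$ it returns $P(X = 0)$, which gives two quick sanity checks.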
The parameter
of the PH-OPPL distribution is estimated using the continuous Chapman-Kolmogorov equations.
2.2. Phase Type Two Parameter Poisson Lindley Distribution
Definition 2. A random variable X is said to follow a phase type two parameter Poisson Lindley (PH-TPPL) distribution if:
for
and
is
matrix.
Theorem 2. If
distribution, then the probability mass function of X is:
(8)
where
,
is
and I is an identity matrix.
Proof:
If
and
, then the probability mass function of X is:
where
is
.
(9)
Properties of Phase Type Two Parameter Poisson Lindley Distribution
The rth moment of the PH-TPPL distribution is given by:
(10)
The expectation and variance of the PH-TPPL distribution follow directly from Equation (10):
1) Expectation
(11)
2) Variance
(12)
The probability generating function of the PH-TPPL distribution is given by:
(13)
The value of
is known, so the value of
can be obtained from Equation (11) once the value of
is known.
2.3. Shape of Probability Function of PH-OPPL and PH-TPPL Distributions
Matrix
was determined from the Kenyan cancer data using the continuous Chapman-Kolmogorov equations, and the values of
are the stationary probabilities obtained using the formula
. The values of
for the three state Markov model represent cancer patients who transit through the Healthy-Leukemia-Dead states; the four state model represents patients who transit through the Healthy-Liver-Colon-Dead states; the five state model represents the Healthy-Stomach-Pharynx-Colon-Dead states; and the six state model represents patients transiting through the Healthy-Oesophagus-Stomach-Lung-Kidney-Dead states. The values of
for different states are:
The shape of the probability function of the phase type one parameter Poisson Lindley distribution is shown in Figure 1, which indicates that the distribution is long tailed.
The shape of the probability function of the phase type two parameter Poisson Lindley distribution is shown in Figure 2, which indicates that the distribution is long tailed.
3. Compound Phase Type Distribution
In actuarial science, a compound distribution describes the total losses of a group of insurance policies. In this section we develop compound phase type distributions (CPHD), which can be used to model secondary cancer cases.
Definition 3. Let $N$ be a random variable with probability generating function $P_N(z)$, and let $X_1, X_2, \ldots$ be independent and identically distributed random variables with common probability generating function $P_X(z)$, independent of $N$. Then the probability generating function of the compound distribution of $S = X_1 + \cdots + X_N$ is
$$P_S(z) = P_N\big(P_X(z)\big). \tag{14}$$
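The compound pgf of Equation (14), $P_S(z) = P_N(P_X(z))$, can be turned into aggregate probabilities numerically by evaluating it at the roots of unity: the DFT of a discretized severity gives $P_X$ at those points, $P_N$ is applied pointwise, and an inverse DFT returns the aggregate probabilities. This is the standard FFT approach to compound distributions, shown here only as a sketch. It assumes the one parameter Poisson Lindley closed-form pgf as $P_N$ and a made-up severity vector; the grid length `n` must be large enough that the aggregate mass beyond it is negligible, otherwise it wraps around.

```python
import numpy as np


def oppl_pgf(z, theta):
    """Closed-form pgf of the one parameter Poisson Lindley distribution."""
    return theta**2 * (theta + 2 - z) / ((theta + 1) * (theta + 1 - z) ** 2)


def aggregate_pmf(severity, theta, n=1024):
    """Evaluate P_S(z) = P_N(P_X(z)) at z = omega^k, omega = exp(-2*pi*i/n),
    and invert to obtain P(S = 0), ..., P(S = n - 1)."""
    f = np.zeros(n)
    f[: len(severity)] = severity
    phi_x = np.fft.fft(f)           # P_X(omega^k) for k = 0..n-1
    phi_s = oppl_pgf(phi_x, theta)  # P_S(omega^k) = P_N(P_X(omega^k))
    return np.fft.ifft(phi_s).real  # imaginary parts vanish up to rounding


theta = 1.5                      # hypothetical frequency parameter
severity = [0.0, 0.5, 0.3, 0.2]  # hypothetical discretized severity on 0, 1, 2, 3
g = aggregate_pmf(severity, theta)
print(np.round(g[:6], 4), g.sum())
```

Because this severity puts no mass at zero, $P(S = 0)$ equals $P(N = 0) = \theta^2(\theta+2)/(\theta+1)^3$, which is a convenient check.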
Unlike ordinary compound distributions, which do not consider the transition phases of diseases, the CPHD incorporates the transition states. As Equation (14) shows, the probability generating function of a compound distribution is obtained by composing the pgf of the claim count with the pgf of the claim severity.
Theorem 3 (Compound one parameter Poisson Lindley distribution). If the pgf of
then the pgf of the compound distribution is:
(15)
where
is the Laplace transform of the severity distribution, which is used because the pgf of most continuous distributions is not available.
Proof:
(16)
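Theorem 3 replaces the severity pgf with its Laplace transform: for $s \ge 0$, $E[e^{-sS}] = P_N\big(\mathcal{L}_X(s)\big)$. As a hedged check of this structure, the sketch simulates a compound one parameter Poisson Lindley sum with exponential severities, chosen only because their Laplace transform $\beta/(\beta+s)$ is elementary (the severities used in this paper are Weibull, Pareto and Generalized Pareto), and compares the empirical Laplace transform with $P_N(\beta/(\beta+s))$. All parameter values and function names are hypothetical.

```python
import numpy as np


def oppl_pgf(z, theta):
    """Closed-form pgf of the one parameter Poisson Lindley distribution."""
    return theta**2 * (theta + 2 - z) / ((theta + 1) * (theta + 1 - z) ** 2)


def simulate_compound(theta, beta, size, rng):
    """Draw S = X_1 + ... + X_N with N | Lambda ~ Poisson(Lambda),
    Lambda ~ Lindley(theta) and X_i ~ Exponential(rate beta)."""
    # Lindley(theta) is Gamma(1, scale 1/theta) w.p. theta/(theta+1), else Gamma(2, scale 1/theta)
    shape = np.where(rng.random(size) < theta / (theta + 1), 1.0, 2.0)
    lam = rng.gamma(shape, 1.0 / theta)
    n = rng.poisson(lam)
    # Given N = n > 0, the sum of n Exponential(beta) claims is Gamma(n, scale 1/beta)
    total = rng.gamma(np.maximum(n, 1), 1.0 / beta)
    return np.where(n > 0, total, 0.0)


theta, beta, s = 1.5, 0.8, 0.3  # hypothetical values
rng = np.random.default_rng(2024)
S = simulate_compound(theta, beta, 200_000, rng)

empirical = np.exp(-s * S).mean()
theoretical = oppl_pgf(beta / (beta + s), theta)
print(round(empirical, 4), round(theoretical, 4))
```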
Theorem 4 (Compound two parameter Poisson Lindley distribution). If the pgf of
then the pgf of the compound distribution is:
(17)
where
is as defined in Theorem 3.
Proof:
(18)
The continuous severity distributions considered in this research are the Weibull, Pareto and Generalized Pareto distributions; their Laplace transforms are derived and substituted into Equations (16) and (18) to obtain the pgfs of the corresponding compound distributions based on the PH-OPPL and PH-TPPL distributions, respectively. The Laplace transforms of the Weibull, Pareto and Generalized Pareto distributions are:
1) Weibull distribution
(19)
2) Pareto distribution
(20)
3) Generalized Pareto distribution
(21)
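The Weibull Laplace transform has no elementary closed form, so in practice Equations (19) to (21) may need to be evaluated numerically. The sketch below computes $\mathcal{L}_X(s) = E[e^{-sX}]$ by quadrature for SciPy's `weibull_min`, `lomax` (Pareto type II) and `genpareto` distributions. The parameterizations and values are assumptions, since the exact forms used in Equations (19) to (21) are not reproduced here; the exponential special cases (Weibull with shape 1, Generalized Pareto with shape 0) provide exact checks.

```python
import numpy as np
from scipy import integrate, stats


def laplace_transform(dist, s):
    """E[exp(-s X)] for a frozen scipy.stats distribution on [0, inf), s >= 0,
    computed by numerical quadrature."""
    value, _abserr = integrate.quad(lambda x: np.exp(-s * x) * dist.pdf(x), 0, np.inf)
    return value


# Hypothetical parameter values, for illustration only.
severities = {
    "Weibull": stats.weibull_min(c=1.5, scale=2.0),
    "Pareto (Lomax)": stats.lomax(c=3.0, scale=2.0),
    "Generalized Pareto": stats.genpareto(c=0.2, scale=2.0),
}

for name, dist in severities.items():
    print(name, [round(laplace_transform(dist, s), 4) for s in (0.1, 0.5, 1.0)])
```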
Substituting Equations (19), (20) and (21) into Equation (16) gives the pgfs of the compound distributions of the PH one parameter Poisson Lindley distribution with the Weibull, Pareto and Generalized Pareto severities, respectively:
1) Compound PH-OPPL-Weibull distribution
(22)
2) Compound PH-OPPL-Pareto distribution
(23)
3) Compound PH-OPPL-Generalized Pareto distribution
(24)
Substituting Equations (19), (20) and (21) into Equation (18) gives the pgfs of the compound distributions of the PH two parameter Poisson Lindley distribution with the Weibull, Pareto and Generalized Pareto severities, respectively:
1) Compound PH-TPPL-Weibull distribution
(25)
2) Compound PH-TPPL-Pareto distribution
(26)
3) Compound PH-TPPL-Generalized Pareto distribution
(27)
4. Data Analysis, Results and Discussions
4.1. Severity and Frequency Probabilities
The cancer data considered in this research were obtained from a medical facility in Kenya. The transition states considered are Healthy-Leukemia-Dead for the three state model, Healthy-Liver-Colon-Dead for the four state model, Healthy-Stomach-Pharynx-Colon-Dead for the five state model and Healthy-Oesophagus-Stomach-Lung-Kidney-Dead for the six state model. The values of
for the data are obtained using the continuous Chapman-Kolmogorov equations, expressed as:
(28)
where:
The values of
for the three, four, five and six state models obtained from the data are given in Section 2.3.
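For a continuous-time Markov chain with generator (intensity) matrix $Q$, the transition probability matrix $P(t)$ solves the forward equation $P'(t) = P(t)Q$ with $P(0) = I$, so $P(t) = e^{tQ}$ and the Chapman-Kolmogorov property $P(s+t) = P(s)P(t)$ holds. The sketch below illustrates this for a three state Healthy-Leukemia-Dead chain; the intensities are hypothetical and are not estimates from the Kenyan data, and Equation (28) itself is not reproduced.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical transition intensities (per year); each row of a generator sums to zero.
Q = np.array([
    [-0.10, 0.08, 0.02],  # Healthy -> Leukemia, Healthy -> Dead
    [0.00, -0.50, 0.50],  # Leukemia -> Dead
    [0.00, 0.00, 0.00],   # Dead is absorbing
])


def transition_matrix(Q, t):
    """P(t) = exp(tQ), the solution of P'(t) = P(t) Q with P(0) = I."""
    return expm(t * Q)


P1 = transition_matrix(Q, 1.0)
P2 = transition_matrix(Q, 2.0)
print(np.round(P1, 4))
```

The product `P1 @ P1` reproduces `P2`, which is the Chapman-Kolmogorov equation for $s = t = 1$.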
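The severity distributions considered in this research are the Weibull, Pareto and Generalized Pareto distributions. DFT requires discrete severity probabilities, so the continuous severity distributions are discretized using the method of mass rounding, which assigns to each grid point $jh$ of span $h$ the probability mass lying within half a span of it:
$$f_0 = F_X\!\left(\tfrac{h}{2}\right), \qquad f_j = F_X\!\left(jh + \tfrac{h}{2}\right) - F_X\!\left(jh - \tfrac{h}{2}\right), \quad j = 1, 2, \ldots$$
where $F_X$ is the cumulative distribution function of the severity.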
The pdfs of the Weibull, Pareto and Generalized Pareto distributions are, respectively:
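The method of mass rounding can be sketched as follows, assuming a Weibull severity from SciPy with hypothetical parameters, span `h` and an 8-point grid (matching the vector length used later for the DFT). Probability beyond the last grid point is simply not allocated here, so in practice the span should be chosen so that this remaining tail mass is small.

```python
import numpy as np
from scipy import stats


def discretize_rounding(dist, h, m):
    """Method of rounding on the grid 0, h, ..., (m-1)h:
    f_0 = F(h/2) and f_j = F(jh + h/2) - F(jh - h/2) for j >= 1."""
    j = np.arange(m)
    upper = dist.cdf(j * h + h / 2)
    lower = np.concatenate(([0.0], dist.cdf(j[1:] * h - h / 2)))
    return upper - lower


weibull = stats.weibull_min(c=1.5, scale=2.0)  # hypothetical severity parameters
f = discretize_rounding(weibull, h=0.5, m=8)
print(np.round(f, 4), f.sum())
```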
The frequency and severity probabilities for secondary cancer cases are given in Table 1.
Table 1. Claim frequency and severity probabilities.
4.2. Discrete Fourier Transform
There are several numerical methods for estimating aggregate losses, such as Monte Carlo simulation, the Panjer recursion, Fourier transforms and direct numerical integration. The Panjer recursion is applicable when the claim frequency distribution belongs to the Panjer class $(a, b, 0)$ or $(a, b, 1)$. In this study we use the Discrete Fourier Transform (DFT) to estimate the aggregate losses. Robertson (1992) applied Fourier transforms to the computation of aggregate losses [2]. Pavel (2010) [1] reviewed these numerical methods and concluded that each has its strengths and weaknesses, so the method should be chosen to suit the study. The DFT is often preferred because it is arguably the most elegant and powerful technique for evaluating aggregate loss probabilities, whether the claim amount is discrete or continuous [17].
The algorithm of DFT of aggregate losses requires computation of DFT of frequency and DFT of severity separately.
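Definition 4 (Discrete Fourier Transform). Let $\mathbf{f} = (f_0, f_1, \ldots, f_{n-1})$ be the severity or frequency probabilities of the claim data. For any discrete function $\mathbf{f}$, the Discrete Fourier Transform is the mapping
$$\hat{f}_k = \sum_{j=0}^{n-1} f_j\, e^{-2\pi i jk/n}, \qquad k = 0, 1, \ldots, n-1. \tag{29}$$
Expression (29) is easier to work with after applying Euler's formula, $e^{-ix} = \cos x - i\sin x$, which separates it into real and imaginary parts:
$$\hat{f}_k = \sum_{j=0}^{n-1} f_j \left[\cos\!\left(\frac{2\pi jk}{n}\right) - i\,\sin\!\left(\frac{2\pi jk}{n}\right)\right], \tag{30}$$
which is the DFT of the severity or frequency probabilities. The severity and frequency probabilities are of length 8, so the DFT matrix $W$ is built from powers of a primitive 8th root of unity, $\omega = e^{-2\pi i/8}$, and Equation (30) can be rewritten in matrix form as:
$$\hat{\mathbf{f}} = W\mathbf{f}, \qquad W = \left[\omega^{jk}\right]_{j,k=0}^{7}. \tag{31}$$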
The frequency and severity probability vectors are padded with as many zeros as they have elements, so that the convolution does not wrap around. The DFT algorithm is as follows:
1) Multiply the matrix W by the frequency or severity probabilities to obtain the DFT of the frequency or severity probabilities.
2) Multiply the DFT of the frequency probabilities elementwise by the DFT of the severity probabilities, then multiply the resulting vector by the matrix W.
3) Keep the real parts of the values (the imaginary parts vanish up to rounding), divide each value by the number of elements in the zero-padded frequency or severity vector, and arrange the resulting probabilities in reverse order, except for the first probability.
4) The values corresponding to the original frequency and severity values are the aggregate loss probabilities.
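Steps 1 to 4 can be sketched as follows with hypothetical input vectors. Applying $W$ twice maps $f_j$ to $n f_{(-j) \bmod n}$, which is why dividing by $n$ and reversing all entries except the first in step 3 has the same effect as an inverse DFT. The output is therefore the zero-padded linear convolution of the two input vectors, and the check compares it with `numpy.convolve` and `numpy.fft`. The function names are illustrative only.

```python
import numpy as np


def dft_matrix(n):
    """W[j, k] = omega**(j*k) with omega = exp(-2*pi*i/n), a primitive n-th root of unity."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)


def aggregate_by_dft_steps(freq, sev):
    """Follow steps 1-4: zero-pad, transform with W, multiply, transform with W
    again, keep real parts, divide by n and reverse all entries but the first."""
    n = 2 * max(len(freq), len(sev))
    a = np.zeros(n)
    a[: len(freq)] = freq
    b = np.zeros(n)
    b[: len(sev)] = sev
    W = dft_matrix(n)
    # Step 1: DFT of each padded probability vector.
    dft_a, dft_b = W @ a, W @ b
    # Step 2: elementwise product of the DFTs, then apply W again.
    twice = W @ (dft_a * dft_b)
    # Step 3: real parts, divided by n, reversed except for the first entry.
    scaled = twice.real / n
    return np.concatenate(([scaled[0]], scaled[:0:-1]))


freq = [0.5, 0.3, 0.2]  # hypothetical frequency probabilities
sev = [0.1, 0.6, 0.3]   # hypothetical severity probabilities
result = aggregate_by_dft_steps(freq, sev)
print(np.round(result, 4))
```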
The aggregate loss probabilities obtained using DFT are given in Table 2.
Table 2. Aggregate loss probabilities.
The values in Table 2 are plotted in Figure 3.
Figure 3. Aggregate loss probabilities: (a) PH-OPPL models; (b) PH-TPPL models; (c) PH-OPPL-Weibull compared with PH-TPPL-Generalized Pareto.
Figure 3(a) shows the aggregate loss probabilities for the PH-OPPL distribution combined with each severity distribution: PH-OPPL with Weibull and PH-OPPL with Pareto closely match the actual aggregate loss probabilities, while PH-OPPL with the Generalized Pareto distribution overestimates the aggregate losses for the six state model. Figure 3(b) shows that, for the PH-TPPL distribution, the Pareto and Generalized Pareto severities fit the secondary cancer data better, while PH-TPPL with Weibull overestimates the aggregate losses. PH-OPPL with Weibull and PH-TPPL with Generalized Pareto fit better than PH-OPPL-Pareto and PH-TPPL-Pareto, respectively, so these two models are compared in Figure 3(c), which indicates that PH-OPPL with Weibull provides the best fit for the aggregate loss data of secondary cancers in Kenya. The PH-OPPL-Weibull model can therefore be used to obtain better estimates of aggregate losses for secondary cancer data in Kenya.
5. Conclusion
Mixed phase type distributions are developed to model secondary cancer cases in Kenya. Unlike ordinary distributions, which do not incorporate transitions between disease states, the proposed distributions take these transition states into account when modeling claim frequency data. The distributions are based on the Poisson and Lindley distributions; PH-OPPL-Weibull provided the best fit among the PH-OPPL models, while PH-TPPL-Generalized Pareto provided the best fit among the PH-TPPL models. The model improves the estimation of aggregate losses because it incorporates the transition probabilities between the stages of cancer as well as the heterogeneity of claim data. This improves estimation for insurance policies covering diseases that progress through several states, such as cancer, and hence improves the financial position of insurance firms through better estimation of their reserves. The model, however, is only applicable in risk theory for diseases that have multiple transition states. Further research could extend this study to account for patients who were censored, and a similar study could be carried out for diseases with transition states such as HIV/AIDS.
Data Availability
The data used to support the findings of this study are available upon request.