Estimation of Aggregate Losses of Secondary Cancer Using PH-OPPL and PH-TPPL Distributions

Kenyan insurance firms have introduced insurance policies for chronic illnesses such as cancer; however, pricing these policies is a major challenge because cancer can transition between stages, and the cost of treatment varies accordingly. The estimation of aggregate losses for diseases with multiple transition stages, such as cancer, is therefore of interest to many insurance firms. Mixture phase type distributions can address this problem, as they incorporate the transitions into the estimation of claim frequency while also capturing the heterogeneity of claim data. In this paper, we estimate the aggregate losses of secondary cancer cases in Kenya using mixture phase type Poisson Lindley distributions. Phase type (PH) distributions for the one and two parameter Poisson Lindley distributions are developed, as well as their compound distributions. The matrix parameters of the PH distributions are estimated using the continuous Chapman-Kolmogorov equations, as the disease process of cancer is continuous, while severity is modeled using Pareto, Generalized Pareto and Weibull distributions. This study shows that aggregate losses for the Kenyan data are best estimated by the PH-OPPL-Weibull model among the PH-OPPL models and by the PH-TPPL-Generalized Pareto model among the PH-TPPL models. Comparing these two models, the PH-OPPL-Weibull model provided the best fit for secondary cancer cases in Kenya. This model is also recommended for different diseases which are dynamic in


Introduction
Aggregate losses are estimated by combining claim frequency and claim severity distributions. Pavel (2010) [1] reviewed methods used to calculate distributions of aggregate losses. Robertson (1992) [2] applied the Discrete Fourier Transform to the estimation of aggregate losses from frequency and severity distributions. Rono et al. (2020) [3] developed a compound distribution to model extreme natural disasters in Kenya. Mohamed et al. (2010) [4] introduced a simulation approach to the estimation of aggregate losses, which can be employed when the frequency and severity distributions cannot be combined to derive a compound distribution. Aggregate loss distributions are based on the collective risk model, expressed as:
$$S = X_1 + X_2 + \cdots + X_N = \sum_{i=1}^{N} X_i$$
where $X_i$ is the severity (claim amount) of the $i$-th claim and $N$ is the claim count. The distribution of $N$ in this paper is considered to follow mixed PH Poisson distributions.
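As an illustration of the collective risk model and of the simulation approach mentioned above, the following is a minimal sketch rather than the method used in this paper. It assumes a plain Poisson claim count and Weibull severities as stand-ins for the mixed PH Poisson frequency; the parameter values `lam`, `shape` and `scale` and the function name `simulate_aggregate_losses` are hypothetical.

```python
import math
import numpy as np

def simulate_aggregate_losses(lam, shape, scale, n_sims, seed=0):
    """Simulate S = X_1 + ... + X_N with N ~ Poisson(lam), X_i ~ Weibull(shape, scale)."""
    rng = np.random.default_rng(seed)
    # Claim count N for each simulated period.
    counts = rng.poisson(lam, size=n_sims)
    # Draw all claim amounts at once (standard Weibull scaled by `scale`).
    severities = scale * rng.weibull(shape, size=counts.sum())
    # Label every claim with the period it belongs to, then sum per period.
    period = np.repeat(np.arange(n_sims), counts)
    return np.bincount(period, weights=severities, minlength=n_sims)

# Hypothetical parameters, chosen only for illustration.
lam, shape, scale = 3.0, 1.5, 100.0
losses = simulate_aggregate_losses(lam, shape, scale, n_sims=20_000)

# Under independence, E[S] = E[N] * E[X] = lam * scale * Gamma(1 + 1/shape).
expected_mean = lam * scale * math.gamma(1.0 + 1.0 / shape)
print(losses.mean(), expected_mean)
```

The simulated mean should be close to the theoretical value E[N]E[X]; the empirical distribution of `losses` then approximates the aggregate loss distribution when no closed-form compound distribution is available.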
Phase type distributions arise when mixture distributions are convolved, resulting in an interrelated Poisson process occurring in phases. Phase type distributions were introduced by Erlang (1909) [5] and have since been advanced by Neuts (1981) [6] and Asmussen (2003) [7], among others. Bladt (2005) [8] introduced phase type distributions in risk theory, while O'Cinneide (2017) [9] discussed phase type distributions and their invariant polytopes. Wu et al. (2010) [10] developed phase type distributions when frequency distributions followed the Panjer class, and Kok et al. (2010) [11] used phase type distributions of the Panjer class (a, b, 1) to model claim frequency.
(2019) [12] proposed a simple forecasting model for predicting future air quality using Markov chains, in which the Markov chain acts as an operator for evaluating the pollution distribution in the long run. Yajuan et al. (2018) [13] used Markov chains to model demand for stations in bike sharing systems. In this study, the concept of Markov chains is used to determine the matrices of the phase type distributions used in modeling claim frequency. Frequency data is used to model occurrences in areas such as engineering, insurance and biology. The Poisson distribution is often used to model count data; however, it assumes that the variance to mean ratio is unity (equi-dispersion), which rarely holds for real data, and it is therefore considered an inflexible model. Most real-life data exhibit either over-dispersion, where the variance exceeds the mean, or under-dispersion, where the mean exceeds the variance; over-dispersed data can be modeled using Poisson mixtures [14]. Poisson Lindley distributions are examples of Poisson mixtures in which the Poisson rate parameter follows a Lindley distribution. The one parameter Poisson Lindley distribution, which can model over-dispersed data, was introduced by Sankaran (1970) [15], while Shanker and Mishra (2014) [16] developed the two parameter Poisson Lindley distribution, which further research has shown can also model over-dispersed data.
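The over-dispersion of the one parameter Poisson Lindley distribution can be checked numerically. The sketch below uses the standard (non phase type) probability mass function attributed to Sankaran (1970), P(X = x) = θ²(x + θ + 2)/(θ + 1)^(x+3) for x = 0, 1, 2, ...; the value of `theta` and the truncation point of the support are assumptions made for illustration only.

```python
import numpy as np

def oppl_pmf(x, theta):
    """Pmf of the ordinary one parameter Poisson Lindley distribution."""
    x = np.asarray(x, dtype=float)
    return theta**2 * (x + theta + 2) / (theta + 1) ** (x + 3)

theta = 0.8                      # hypothetical parameter value
support = np.arange(0, 400)      # truncated support; the tail beyond is negligible here
pmf = oppl_pmf(support, theta)

mean = (support * pmf).sum()
var = ((support - mean) ** 2 * pmf).sum()

# Closed-form moments of the mixed Poisson with a Lindley mixing distribution.
mean_formula = (theta + 2) / (theta * (theta + 1))
var_formula = (theta**3 + 4 * theta**2 + 6 * theta + 2) / (theta**2 * (theta + 1) ** 2)

print(mean, mean_formula)
print(var, var_formula)
print("over-dispersed:", var > mean)
```

Because the variance is the Poisson mean plus the variance of the mixing distribution, it exceeds the mean for every θ > 0, which is why this family suits over-dispersed claim counts.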
In the insurance sector, when aggregate losses are calculated for chronic diseases with several stages, such as cancer, the claim frequency distributions usually considered do not incorporate the different stages of the disease. Phase type distributions address this shortcoming of ordinary distributions. Mixed phase type distributions further improve the modeling of claim frequency data, as they account for the heterogeneity of claim data. In this paper, we develop the PH one parameter Poisson Lindley distribution and the PH two parameter Poisson Lindley distribution, where the mixing distribution follows a PH Lindley distribution. The resulting PH distributions are used to model claim numbers of secondary cancers in Kenya. Section 1 has a brief introduction to Poisson distributions and Poisson Lindley distributions.
The structure of this paper is as follows: Section 2 discusses the construction of phase type distributions using PH Lindley distributions, which are later applied in modeling the aggregate losses. Compound distributions from the frequency and severity distributions are developed in Section 3. Aggregate losses for the data are estimated using Discrete Fourier Transforms and the results are discussed in Section 4. Section 5 outlines the conclusions.

Proposed Phase Type Poisson Lindley Distributions
In this section we develop phase type distributions for the one parameter Poisson Lindley and two parameter Poisson Lindley distributions. Phase type Poisson Lindley distributions are obtained when the mixing distribution follows a phase type Lindley distribution.

Properties of Phase Type One Parameter Poisson Lindley Distribution
The r-th moment of the PH-OPPL distribution is given by:
The expectation and variance of the PH-OPPL distribution can be obtained from Equation (4) as:
The probability generating function of the PH-OPPL distribution is given by:
Theorem 2. If X follows the PH-TPPL distribution, then the probability mass function of X is expressed as:

Properties of Phase Type Two Parameter Poisson Lindley Distribution
The r-th moment of the PH-TPPL distribution is given by:
The expectation and variance of the PH-TPPL distribution can be obtained from Equation (10) as:
The probability generating function of the PH-TPPL distribution is given by:
The value of Λ is known; hence the value of α can be obtained from Equation (11) if the value of E(X) is known.

Shape of Probability Function of PH-OPPL and PH-TPPL Distributions
Matrix Λ was determined using the continuous Chapman-Kolmogorov equations for cancer data in Kenya, and the values of γ are the stationary probabilities obtained using the formula:
The shape of the probability function of the phase type one parameter Poisson Lindley distribution is shown in Figure 1, which indicates that it is a long-tailed distribution.
The shape of the probability function of the phase type two parameter Poisson Lindley distribution is shown in Figure 2, which indicates that it is also a long-tailed distribution.
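The role of the Chapman-Kolmogorov equations and of the stationary probabilities can be sketched as follows. This is a generic illustration, not the authors' estimation procedure: the 3-state generator `Q` is hypothetical, the chain is assumed irreducible so that a unique stationary vector exists, and the function names are made up for this example.

```python
import numpy as np

def stationary_distribution(Q):
    """Solve gamma @ Q = 0 with sum(gamma) = 1 for a CTMC generator Q."""
    n = Q.shape[0]
    # Stack the balance equations Q^T gamma = 0 with the normalisation row.
    A = np.vstack([Q.T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    gamma, *_ = np.linalg.lstsq(A, b, rcond=None)
    return gamma

def transition_matrix(Q, t, squarings=10, terms=20):
    """P(t) = exp(Q t), the solution of the Kolmogorov equations P'(t) = P(t) Q.

    Computed with a truncated Taylor series plus scaling and squaring.
    """
    A = Q * t / (2 ** squarings)
    P = np.eye(Q.shape[0])
    term = np.eye(Q.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k
        P = P + term
    for _ in range(squarings):
        P = P @ P
    return P

# Hypothetical generator: off-diagonal entries are transition rates, rows sum to zero.
Q = np.array([[-0.5, 0.3, 0.2],
              [0.1, -0.4, 0.3],
              [0.2, 0.2, -0.4]])

gamma = stationary_distribution(Q)
P1, P2, P3 = (transition_matrix(Q, t) for t in (1.0, 2.0, 3.0))

print("stationary probabilities:", gamma)
print("Chapman-Kolmogorov holds:", np.allclose(P1 @ P2, P3))
```

The check `P1 @ P2 == P3` is the Chapman-Kolmogorov property P(s + t) = P(s)P(t), and `gamma` is left unchanged by every P(t).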

Compound Phase Type Distribution
In actuarial science, a compound distribution describes the total losses in a group of insurance policies. In this section we develop compound phase type distributions (CPHD) which can be used to model secondary cancer cases.
Here the Laplace transform of the severity distribution is used, because the pgf is not available for most continuous distributions.
Proof: The continuous distributions considered in this research are the Weibull, Pareto and Generalized Pareto distributions; hence their Laplace transforms are derived and substituted into Equations (16) and (18).
Substituting Equations (19), (20) and (21) into Equation (16) yields the corresponding compound distributions.
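The substitution of a severity Laplace transform into a frequency pgf rests on the standard identity for the collective risk model with independent claims, L_S(s) = G_N(L_X(s)). The sketch below illustrates it under stated assumptions: it uses the ordinary (non phase type) one parameter Poisson Lindley pgf, G_N(z) = θ²(θ + 2 − z)/((θ + 1)(θ + 1 − z)²), as a stand-in for the PH-OPPL pgf, evaluates the Weibull Laplace transform numerically instead of in closed form, and uses hypothetical parameter values and function names.

```python
import math
import numpy as np

def weibull_laplace(s, shape, scale, n=200_000):
    """E[exp(-s X)] for Weibull X, by the midpoint rule over the quantile function."""
    u = (np.arange(n) + 0.5) / n
    x = scale * (-np.log1p(-u)) ** (1.0 / shape)   # Weibull quantiles at the midpoints
    return np.exp(-s * x).mean()

def oppl_pgf(z, theta):
    """Pgf of the ordinary one parameter Poisson Lindley distribution."""
    return theta**2 * (theta + 2 - z) / ((theta + 1) * (theta + 1 - z) ** 2)

def aggregate_laplace(s, theta, shape, scale):
    """Laplace transform of S: frequency pgf evaluated at the severity transform."""
    return oppl_pgf(weibull_laplace(s, shape, scale), theta)

# Hypothetical parameters, chosen only for illustration.
theta, shape, scale = 0.8, 1.5, 100.0

# E[S] is minus the derivative of L_S at zero; approximate it by a finite difference.
h = 1e-7
mean_from_transform = (1.0 - aggregate_laplace(h, theta, shape, scale)) / h
mean_exact = (theta + 2) / (theta * (theta + 1)) * scale * math.gamma(1 + 1 / shape)
print(mean_from_transform, mean_exact)
```

Recovering E[S] = E[N]E[X] from the composed transform is a quick sanity check that the frequency and severity components have been combined correctly.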

Severity and Frequency Probabilities
The cancer data considered in this research were obtained from a medical facility in Kenya. The pdfs of the Weibull, Pareto and Generalized Pareto distributions, respectively, are expressed as:
The frequency and severity probabilities for secondary cancer cases are given in Table 1.
In this study we consider the Discrete Fourier Transform (DFT) for the estimation of the aggregate losses. Robertson (1992) applied Fourier transforms to the computation of aggregate losses [2]. Pavel (2010) [1] reviewed these numerical methods and concluded that each method has its strengths and weaknesses; hence the method should be chosen according to the study. The DFT is often preferred, as it is arguably the most elegant and powerful technique for evaluating aggregate loss probabilities whether the claim amount $X_i$ is discrete or continuous [17].
The DFT algorithm for aggregate losses requires computing the DFT of the frequency and the DFT of the severity separately.
Definition 4 (Discrete Fourier Transform). Let $X_n$ be the severity or frequency distribution of the claim data. For any discrete function $X_k$, the Discrete Fourier Transform is the mapping:
Expression (29) is complex to work with; applying Euler's formula reduces it to:
which is the DFT of the severity or frequency probabilities. The severity and frequency probabilities are of length 8, so the matrix W is built from a primitive 8th root of unity, and Equation (30) can be rewritten as:
The frequency or severity probabilities are padded with as many zeros as they have elements in order to perform a no-wrap convolution. The DFT algorithm is as follows:
The values of Table 2 can be represented graphically as:
Figure 3(a) shows aggregate loss probabilities using the PH-OPPL distribution with each severity distribution. It indicates that PH-OPPL with Weibull and with Pareto were similar to the actual aggregate loss probabilities, while PH-OPPL with Generalized Pareto overestimated the aggregate losses for the six state model. Figure 3(b) shows aggregate loss probabilities using the PH-TPPL distribution: Pareto and Generalized Pareto provided a better fit for the secondary cancer data, while PH-TPPL with Weibull overestimated the aggregate losses. Within each family, PH-OPPL with Weibull and PH-TPPL with Generalized Pareto fitted better than PH-OPPL-Pareto and PH-TPPL-Pareto respectively; these two models are therefore compared in Figure 3(c), which indicates that PH-OPPL with Weibull provided the best fit for aggregate loss data of secondary cancers in Kenya. The PH-OPPL-Weibull model can therefore be used to provide better estimates of aggregate losses for secondary cancer data in Kenya.
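The DFT computation described in this section can be sketched as follows. This is the textbook FFT method for compound distributions and may differ in detail from the algorithm listed in the paper: the severity probabilities are zero-padded, transformed, passed through the pgf of the claim count, and transformed back. It uses NumPy's FFT convention, in which the k-th coefficient of a length-m vector f is the sum of f_n exp(−2πi kn/m), which may differ in sign or scaling from Expression (29). The probability vectors and the function name `aggregate_pmf` are hypothetical.

```python
import numpy as np

def aggregate_pmf(freq_pmf, sev_pmf, n):
    """Aggregate-loss pmf on 0..n-1 from a claim-count pmf and a discretised severity pmf."""
    fx = np.zeros(n)
    fx[:len(sev_pmf)] = sev_pmf            # zero-pad the severity probabilities
    phi_x = np.fft.fft(fx)                 # DFT of the severity
    # Evaluate the frequency pgf, sum_k p_k * z**k, at z = phi_x (Horner's rule).
    phi_s = np.zeros(n, dtype=complex)
    for p in reversed(freq_pmf):
        phi_s = phi_s * phi_x + p
    return np.fft.ifft(phi_s).real         # inverse DFT gives the aggregate pmf

# Hypothetical inputs: at most 2 claims, each of size 1 or 2 units.
freq_pmf = [0.5, 0.3, 0.2]
sev_pmf = [0.0, 0.6, 0.4]
agg = aggregate_pmf(freq_pmf, sev_pmf, n=8)
print(np.round(agg, 4))
```

In this small example P(S = 0) = 0.5 and P(S = 1) = 0.3 × 0.6 = 0.18, which can be verified by hand. The padded length `n` must exceed the largest attainable aggregate value (here 2 claims × 2 units = 4); otherwise probability mass wraps around to the low indices, which is the wrap-around that the zero-padding is meant to prevent.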

Conclusion
Mixed phase type distributions are developed to model secondary cancer cases in Kenya. Unlike ordinary distributions, which do not incorporate transitions between states, the distributions proposed here take transition states into consideration while modeling claim frequency data. The distributions are based on the Poisson and Lindley distributions; PH-OPPL-Weibull provided the best fit among the PH-OPPL models, while PH-TPPL-Generalized Pareto provided the best fit among the PH-TPPL models. This approach improves the estimation of aggregate losses, as it incorporates the transition probabilities between the different states of cancer as well as the heterogeneity of claim data. It improves the pricing of insurance policies for diseases that transition between states, such as cancer, and hence the financial position of insurance firms, since their reserves can be estimated more accurately. The model, however, is only applicable in risk theory to diseases which have multiple transition states. Further research could factor in the patients who were censored in this study, and the same study could be carried out for other diseases with transition states, such as HIV/AIDS.

Data Availability
The data used to support the findings of this study are available upon request.