Modeling Bursts and Heavy Tails in Inter-Arrival Claims in Non-Life Insurance

Current insurance models assume that claims arrive at random times and are therefore well approximated by Poisson processes. Here we provide evidence that the timing of claims follows non-Poisson patterns, marked by bursts of rapidly occurring claims separated by long periods of inactivity. Claim inter-arrival times are heavy tailed: most claims follow one another quickly, while a few are separated by very long waiting times. We model insurance claims in terms of the claim inter-arrival time, the time interval between two successive claims; such modeling has so far been limited by a lack of ecologically relevant data on claim inter-arrivals. We propose a structured model of claim arrival behavior based on data from an Egyptian fire insurance company. Our analysis shows that claim activity can be represented by a non-Poisson process and that the resulting distribution of inter-arrival times follows the Pareto distribution. These results will help researchers understand daily behavioral trends and build more sophisticated predictive models of claims.


Introduction
It is becoming increasingly important to understand the nature of claim activity. Indeed, the quantitative discovery of the laws governing the probability of ruin is of major scientific importance and requires us to address the factors that determine the timing of claims. Interest in the timing of claims in ruin probability is not new: it has a long history in the mathematical literature and contributed to the emergence of some of the core principles of probability theory; see Feller (1971). But most existing ruin probability models presume that claims arrive according to a Poisson process, whose exponential distribution of inter-arrival times forces successive events to follow each other at relatively regular time intervals and prevents very long waiting periods. In contrast, slowly decaying heavy-tailed processes allow for very long periods of inactivity that separate bursts of intense activity; see Oliveira & Barabási (2005).
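The contrast between regular Poisson spacing and bursty heavy-tailed spacing can be illustrated with a short simulation. The sketch below is an illustration under assumed parameters, not the paper's model: it draws exponential gaps and Pareto gaps (tail index α = 1.5, chosen arbitrarily, with minimum x_m = 1) that have the same mean, and reports what share of the total elapsed time is taken up by the longest 1% of gaps. The helper name `top_share` is hypothetical.

```python
import random

def top_share(gaps, fraction=0.01):
    """Share of total elapsed time covered by the longest `fraction` of gaps."""
    ordered = sorted(gaps, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

rng = random.Random(42)
n = 100_000
alpha = 1.5                     # assumed tail index, for illustration only
mean_gap = alpha / (alpha - 1)  # mean of rng.paretovariate(alpha), whose minimum x_m is 1

# Poisson arrivals: exponential gaps with the same mean as the Pareto gaps.
poisson_gaps = [rng.expovariate(1 / mean_gap) for _ in range(n)]
# Bursty arrivals: Pareto Type I gaps with x_m = 1.
pareto_gaps = [rng.paretovariate(alpha) for _ in range(n)]

print(f"exponential gaps: longest 1% cover {top_share(poisson_gaps):.1%} of the time")
print(f"Pareto gaps:      longest 1% cover {top_share(pareto_gaps):.1%} of the time")
```

With exponential gaps the longest 1% cover only about 5.6% of the elapsed time (the theoretical value is 0.01(1 + ln 100)), whereas with Pareto gaps the share is several times larger; these are the long periods of inactivity described above.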
In this paper, we propose formal models of the claim arrival process. Specifically, we study and model the sequence and pacing of claim inter-arrival times at an Egyptian insurance company. These models provide a basis for analyzing fire claims. Because of constraints on real-world data collection, previous ruin probability models did not describe the complex properties of claim inter-arrivals in adequate detail. They usually assumed that claims arrive according to a Poisson process, so that the inter-arrival time, the time interval between two successive claims, follows an exponential distribution. This assumption implies that claims occur at a constant rate, and it therefore cannot capture variations in the arrival rate. Recently, researchers have proposed heavy-tailed distributions to explain many such dynamics. Our approach in this paper is to build a general model of the arrival process from real data collected in a daily operating environment, namely fire claims at Egyptian insurance companies. To investigate this behavior, we use a case study with 10 years of data from one Egyptian insurance company. The company's data show that claim inter-arrival times can be modeled by a non-Poisson process: the inter-arrival times follow a heavy-tailed distribution, specifically the Pareto distribution. Our analysis supports this hypothesis about claim inter-arrivals and suggests that the Pareto model and its properties, such as the 80/20 law, may be useful for analyzing claim inter-arrivals. The results of this study may also simplify the treatment and design of behavioral interventions related to claims.

Objectives of Study
The main objective is to model the inter-arrival time of claims in an insurance company.

Justification of the Study
These results will help researchers understand daily behavioral trends and create more sophisticated predictive models of claims.
By estimating commercial fire insurance loss risk at the business-line and event-type levels, we can present the estimates in a more balanced fashion, and the results may help non-life insurance companies manage their risk.

Research Structure
• Test whether claim inter-arrival times are heavy tailed and follow a Pareto distribution.
• I used the EasyFit software for: 1) exploratory data analysis; 2) goodness-of-fit tests, including the Kolmogorov-Smirnov (KS), Anderson-Darling (AD), and chi-squared tests.
• I find that the Pareto distribution is the best of 56 continuous distributions according to the KS test and the chi-squared test.
The remainder of the paper is organized as follows. Section 2 presents the Poisson process for claim arrivals, which predicts an exponential distribution of interevent times. Section 3 reviews related work. Section 4 introduces the Pareto distribution. Section 5 presents evidence that a power-law tail characterizes the probability density function of the interevent times of claims. Section 6 concludes.

Poisson Process
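Inter-arrival and waiting time distributions
Let
$$T_n = \inf\{t \ge 0 : N(t) = n\}, \qquad n = 1, 2, \dots,$$
denote the time of arrival of the n-th claim (or the waiting time until the n-th claim arrival), and put $T_0 = 0$ and
$$A_n = T_n - T_{n-1}, \qquad n = 1, 2, \dots,$$
so that $A_n$ is the time between the (n - 1)-th and n-th claim arrivals. Recall that the arrival times and the counting process are linked by $\{T_n \le t\} = \{N(t) \ge n\}$; see Rolski et al. (1999). Next, consider the joint distribution of $(T_1, T_2)$. For $0 < t_1 < t_2$,
$$P(T_1 \le t_1, T_2 \le t_2) = P(T_1 \le t_1) - P\big(N(t_1) = 1,\ N(t_2) - N(t_1) = 0\big) = H(t_1) - \lambda t_1 e^{-\lambda t_2},$$
where $H(t_1) = 1 - e^{-\lambda t_1}$ is a function depending only on $t_1$. Consequently, the joint probability density function of $(T_1, T_2)$ is given by
$$f_{T_1,T_2}(t_1, t_2) = \frac{\partial^2}{\partial t_1\,\partial t_2} P(T_1 \le t_1, T_2 \le t_2) = \lambda^2 e^{-\lambda t_2}, \qquad 0 < t_1 < t_2,$$
and 0 otherwise.
To find the joint distribution of $(A_1, A_2)$ from the above, note that
$$\begin{pmatrix} A_1 \\ A_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} T_1 \\ T_2 \end{pmatrix}.$$
The linear transformation given by this (2 × 2) matrix has determinant 1, so
$$f_{A_1,A_2}(a_1, a_2) = f_{T_1,T_2}(a_1, a_1 + a_2) = \lambda e^{-\lambda a_1} \cdot \lambda e^{-\lambda a_2}, \qquad a_1, a_2 > 0.$$
Thus $A_1, A_2$ are independent random variables, each having an exponential distribution with parameter λ; see Billingsley (1968).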
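With more effort, one can prove the general result: for every n ≥ 1, the inter-arrival times $A_1, A_2, \dots, A_n$ are independent, each having an exponential distribution with parameter λ; see Feller (1969).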
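In view of the argument above for the case n = 2, the general idea of the proof is clear. One proves first that, for $0 < t_1 < \dots < t_n$, the joint distribution function of $(T_1, T_2, \dots, T_n)$ consists of a term whose mixed partial derivative yields the density, plus a remainder H that is a sum of a finite number of terms; each term of H is a product of powers of the $t_i$ and factors $e^{-\lambda t_j}$ with at least one $t_k$, $k \ge 2$, missing, so H vanishes under $\partial^n / \partial t_1 \cdots \partial t_n$. Establishing this is the tedious part of the proof. Once this is done, the joint probability density function of $(T_1, T_2, \dots, T_n)$ is
$$f(t_1, \dots, t_n) = \lambda^n e^{-\lambda t_n}, \qquad 0 < t_1 < \dots < t_n,$$
and 0 otherwise. Note that the analogue of the (2 × 2) matrix is the (n × n) lower bidiagonal matrix mapping $(T_1, \dots, T_n)$ to $(A_1, \dots, A_n)$, $A_i = T_i - T_{i-1}$, which again has determinant 1; see Delampady et al. (2001). One can now proceed exactly as in the earlier case to obtain the theorem. The reader is invited to work out the details at least when n = 3, 4.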
Note: As $A_1$ has an Exp(λ) distribution, its expectation is $E(A_1) = 1/\lambda$, so $1/\lambda$ is the mean inter-arrival time. Thus an arrival rate of λ claims per unit time is consistent with this conclusion; see Bingham et al. (1987).
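Note: It is an easy corollary of the theorem that $T_n = A_1 + \dots + A_n$ has a Gamma(n, λ) distribution. Remark 4: One can also go in the other direction. That is, let $A_1, A_2, \dots$ be i.i.d. random variables with an Exp(λ) distribution, put $T_n = A_1 + \dots + A_n$, and define $N(t) = \max\{n \ge 0 : T_n \le t\}$; see Ethier & Kurtz (1986). Then the stochastic process $\{N(t) : t \ge 0\}$ can be shown to be a time-homogeneous Poisson process with rate λ. In the jargon of the theory of stochastic processes, the Poisson process is the renewal process with i.i.d. exponential inter-arrival times.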
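The construction in Remark 4 can be checked numerically. The following Python sketch builds N(t) from i.i.d. Exp(λ) inter-arrival times and compares the sample mean and variance of N(t) with λt, since N(t) should be Poisson(λt). The rate λ = 2 and horizon t = 5 are arbitrary illustrative values, and `count_arrivals` is a hypothetical helper.

```python
import random
import statistics

def count_arrivals(rate, horizon, rng):
    """Return N(horizon) = max{n : T_n <= horizon}, with T_n a sum of i.i.d. Exp(rate) gaps."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(rate)  # next inter-arrival time A_{n+1}
        if t > horizon:
            return n
        n += 1

rng = random.Random(1)
rate, horizon, runs = 2.0, 5.0, 20_000  # illustrative values, not estimated from claims data
counts = [count_arrivals(rate, horizon, rng) for _ in range(runs)]

# For a Poisson(rate * horizon) count, mean and variance both equal rate * horizon = 10.
print(statistics.mean(counts), statistics.variance(counts))
```

Equal mean and variance is a quick diagnostic for Poisson counts; a variance well above the mean in real claim counts would be one sign of the burstiness discussed in this paper.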

Related Work
Maturing pervasive computing technologies have sparked a new wave of human behavior analysis and resulted in new theories about human behavior patterns. Barabási's study of the timing of consecutive electronic and physical mail messages led to a model of human dynamics based on heavy-tailed distributions; see Oliveira & Barabási (2005) and Bees et al. (2005). Barabási's study introduced a queuing model and a heavy-tailed distribution to explain the long time gaps between sent messages that follow a burst of responses.
Journal of Financial Risk Management
After Barabási's discovery, scientists have used heavy-tailed distributions to explain human behavior in diverse domains, ranging from social science to health care; see Andriani & McKelvey (2009). In the social network field, heavy-tailed distributions are used to characterize the dynamics of popularity on diverse digital platforms, such as Wikipedia, blog posts, Android applications, Web pages, and Twitter; see Leskovec et al. (2007) and Yu et al. (2017). For example, Li et al. (2015) show that the behavior-based popularity of Android applications follows the Pareto principle. Tsompanidis et al. (2014) also find that web traffic flow sizes can be explained by the Pareto distribution. Similarly, researchers have compiled lists of social and organizational power laws, one kind of heavy-tailed distribution, to describe human behavior; see Scholz (2015) and Andriani & McKelvey (2009). Specifically, a power-law distribution describes the number of inter-firm relationships observed from linkages between firms (suppliers, customers, and owners); see Dewes et al. (2003) and Saito et al. (2007).
Further, scientists use heavy-tailed distributions to model and predict human mobility see Mainardi et al. (2000) and Gallotti et al. (2016). For example, GPS-based human movement patterns can be captured by heavy-tailed flights for different transportation modes, including walking/running and car/taxi see Hong (2010) Regardless of transportation modes, the distribution of user's moving distances, from visited locations to the target location, can be modelled by the Pareto distribution see Zhu et al. (2015).
Evidence that non-Poisson activity patterns characterize human activity first emerged in computer communications, where the timing of many human-driven events is automatically recorded; see Gonzalez et al. (2008). For example, measurements of the distribution of time differences between consecutive instant messages sent by individuals during online chats have found evidence of heavy-tailed statistics; see Dewes et al. (2003). Professional tasks, such as the timing of job submissions on a supercomputer, directory listings, and file transfers (FTP requests) initiated by individual users, were also reported to display non-Poisson features; see Mainardi et al. (2000). Similar patterns emerge in economic transactions (see Reberto et al.), in the number of hourly trades in a given security (see Plerou et al., 2000), and in the distribution of time intervals between individual trades in currency futures (see Masoliver et al., 2003). Finally, heavy-tailed distributions characterize entertainment-related events, such as the time intervals between consecutive online games played by users; see Henderson & Henderson (2001). For these measurements, however, the interevent time characterizes not a single user but a population of users. Given the extensive evidence that the activity distribution of individuals in a population is heavy tailed, such measurements have difficulty capturing the origin of the observed heavy-tailed patterns. For example, while most people send only a few emails per day, a few send a very large number every day; see Eckmann et al. (2004) and Ebel et al. (2002).

Pareto Distribution
The Pareto distribution is the classic heavy-tailed distribution. Compared with the exponential distribution, it has a much higher probability of generating extreme values. In a queueing context, this means that jobs with very long service times account for a significant fraction of the queue's total work. The Pareto distribution is often associated with the famous 80-20 rule, which holds that in applications with heavy-tailed behavior 80% of outputs are attributable to only 20% of inputs. For example, it has been observed that 20% of a population tends to hold about 80% of total wealth, or that 80% of a business's sales revenue tends to come from only 20% of its customers. An extension of this rule holds that the top 1% of inputs account for 50% of outputs: if a system's job sizes follow such a Pareto distribution, about half of the total running time is spent serving only 1% of the jobs. It is important to remember that the numbers 80 and 20 are not magical; the actual values vary between applications. They do not even need to sum to 100, since they measure two different quantities. The significant part of the "law of the vital few", as it is sometimes called, is the relative importance of a surprisingly small portion of the population; see Amoroso (1938) and Pareto (1898).
The Pareto distribution is a power-law probability distribution used to describe social, scientific, geophysical, actuarial, and many other types of observable phenomena. It was originally applied to the distribution of wealth in a society, fitting the observation that a large portion of wealth is held by a small fraction of the population (see Amoroso, 1938), and it has colloquially become associated with the Pareto principle, or "80-20 rule", sometimes also called the "Matthew principle". This rule states, for example, that 80% of the wealth of a society is held by 20% of its population.
However, one should not conflate the Pareto distribution with the Pareto principle, as the former produces the 80-20 result only for a particular value of the power, $\alpha = \log_4 5 \approx 1.16$. While α varies, empirical observation has found the 80-20 distribution to fit a wide range of cases, including natural phenomena (see Van Montfort, 1986) and human activities; see Oancea (2017).
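The value α = log_4 5 can be checked with the Pareto Lorenz curve, $L(F) = 1 - (1 - F)^{1 - 1/\alpha}$ for α > 1, which implies that the top fraction p of the population holds a share $p^{1 - 1/\alpha}$ of the total. The sketch below evaluates this share; the function name `pareto_top_share` is hypothetical.

```python
import math

def pareto_top_share(p, alpha):
    """Share of the total held by the top fraction p of a Pareto(alpha) population.

    Uses the Pareto Lorenz curve L(F) = 1 - (1 - F) ** (1 - 1 / alpha), valid for alpha > 1,
    so the top fraction p holds 1 - L(1 - p) = p ** (1 - 1 / alpha).
    """
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    return p ** (1 - 1 / alpha)

alpha_8020 = math.log(5, 4)                          # log_4 5
print(round(alpha_8020, 3))                          # 1.161
print(round(pareto_top_share(0.20, alpha_8020), 3))  # 0.8
print(round(pareto_top_share(0.01, alpha_8020), 3))  # share held by the top 1%
```

For α = log_4 5 the top 1% hold about 53% of the total, which is roughly the "top 1% account for 50%" extension of the rule; for other values of α both shares change.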
If X is a random variable with a Pareto (Type I) distribution (see Arnold, 1983), then the probability that X is greater than some number x, i.e. the survival function (also called the tail function), is given by
$$\bar F(x) = P(X > x) = \begin{cases} \left(\dfrac{x_m}{x}\right)^{\alpha}, & x \ge x_m, \\ 1, & x < x_m, \end{cases}$$
where $x_m$ is the (necessarily positive) minimum possible value of X, and α is a positive parameter. The Pareto Type I distribution is characterized by a scale parameter $x_m$ and a shape parameter α, which is known as the tail index. When this distribution is used to model the distribution of wealth, the parameter α is called the Pareto index.

Cumulative distribution function
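From the definition, the cumulative distribution function of a Pareto random variable with parameters α and $x_m$ is
$$F_X(x) = \begin{cases} 1 - \left(\dfrac{x_m}{x}\right)^{\alpha}, & x \ge x_m, \\ 0, & x < x_m. \end{cases}$$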

Probability density function
It follows (by differentiation) that the probability density function is
$$f_X(x) = \begin{cases} \dfrac{\alpha\, x_m^{\alpha}}{x^{\alpha + 1}}, & x \ge x_m, \\ 0, & x < x_m. \end{cases}$$
When plotted on linear axes, the distribution assumes the familiar J-shaped curve, which approaches each of the orthogonal axes asymptotically. All segments of the curve are self-similar (subject to appropriate scaling factors).
When plotted in a log-log plot, the distribution is represented by a straight line.
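The survival function, CDF, and PDF above translate directly into code. The sketch below uses hypothetical helper names and illustrative parameters (x_m = 1, α = 1.5); it also checks the log-log statement numerically, since $\log f(x) = \log(\alpha x_m^{\alpha}) - (\alpha + 1)\log x$ is a straight line with slope -(α + 1) for x ≥ x_m.

```python
import math

def pareto_sf(x, x_m, alpha):
    """Survival function P(X > x) of a Pareto Type I distribution."""
    return 1.0 if x < x_m else (x_m / x) ** alpha

def pareto_cdf(x, x_m, alpha):
    """Cumulative distribution function P(X <= x)."""
    return 1.0 - pareto_sf(x, x_m, alpha)

def pareto_pdf(x, x_m, alpha):
    """Density alpha * x_m**alpha / x**(alpha + 1) on [x_m, infinity)."""
    return 0.0 if x < x_m else alpha * x_m ** alpha / x ** (alpha + 1)

x_m, alpha = 1.0, 1.5  # illustrative parameters
x1, x2 = 2.0, 50.0
# Slope of log f(x) against log x between two points in the support.
slope = (math.log(pareto_pdf(x2, x_m, alpha)) - math.log(pareto_pdf(x1, x_m, alpha))) / (
    math.log(x2) - math.log(x1)
)
print(round(slope, 6))  # -2.5
```

The survival function is likewise a straight line on log-log axes, with slope -α, which is why log-log tail plots are a common first check for Pareto behavior.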

Moments and characteristic function
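The expected value of a random variable following a Pareto distribution is
$$E(X) = \begin{cases} \infty, & \alpha \le 1, \\ \dfrac{\alpha\, x_m}{\alpha - 1}, & \alpha > 1. \end{cases}$$
The variance of a random variable following a Pareto distribution is
$$\operatorname{Var}(X) = \begin{cases} \infty, & \alpha \in (1, 2], \\ \dfrac{x_m^2\, \alpha}{(\alpha - 1)^2 (\alpha - 2)}, & \alpha > 2. \end{cases}$$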
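The moment formulas can be checked by simulation. The sketch below draws Pareto samples by inverse-transform sampling, inverting $F(x) = 1 - (x_m/x)^{\alpha}$ to $x = x_m (1 - u)^{-1/\alpha}$ for u uniform on [0, 1), and compares sample moments with the closed forms. The parameters x_m = 2 and α = 6 are illustrative; α is chosen well above 4 so that the sample variance itself has finite variance and settles quickly. For these values the closed forms give E(X) = 2.4 and Var(X) = 0.24. Function names are hypothetical.

```python
import math
import random
import statistics

def pareto_mean(x_m, alpha):
    return math.inf if alpha <= 1 else alpha * x_m / (alpha - 1)

def pareto_variance(x_m, alpha):
    if alpha <= 2:
        return math.inf  # infinite for alpha in (1, 2]; the mean is also infinite for alpha <= 1
    return x_m ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))

def sample_pareto(x_m, alpha, rng):
    """Inverse-transform sampling: solve F(x) = u for x."""
    u = rng.random()                          # u in [0, 1), so 1 - u is in (0, 1]
    return x_m * (1.0 - u) ** (-1.0 / alpha)

rng = random.Random(7)
x_m, alpha = 2.0, 6.0  # illustrative parameters
draws = [sample_pareto(x_m, alpha, rng) for _ in range(200_000)]
print(statistics.mean(draws), pareto_mean(x_m, alpha))
print(statistics.variance(draws), pareto_variance(x_m, alpha))
```

With α close to or below 2, as heavy-tailed claim data may suggest, the sample variance does not converge to a finite value, so summary statistics such as the standard deviation should be interpreted with care.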

Methodology
This section presents the procedure used in the study. It explains in detail the steps followed in the modeling process, including data processing and analysis. The data set contains 939 observations. All commercial fire insurance loss data sets used in this study were obtained from a non-life insurance company in Egypt.

Scope of the Data
Secondary data from E.G. insurance company on industrial fire claims for the period 2000-2011 were used in this study.

Actuarial Modeling Process
This section describes the steps followed in fitting a statistical distribution to the claim inter-arrival times. These steps are:
1) Exploratory data analysis.
2) Goodness-of-fit testing.

Exploratory Data Analysis
It was necessary to carry out a descriptive analysis of the data to obtain its salient features: the mean, median, mode, standard deviation, skewness, and kurtosis. This was done using the EasyFit software and also by manual calculation.
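A minimal sketch of this step, assuming the raw data are claim dates: inter-arrival times are the day differences between successive claims, and the summary statistics listed above are then computed. The dates below are made up for illustration, the function names are hypothetical, and the moment-based skewness and excess kurtosis may differ slightly from the estimators EasyFit reports.

```python
import datetime as dt
import statistics

def inter_arrival_days(claim_dates):
    """Days between successive claims, after sorting the claim dates."""
    ordered = sorted(claim_dates)
    return [(b - a).days for a, b in zip(ordered, ordered[1:])]

def describe(values):
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation, used for the moment ratios
    if sd == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    excess_kurt = sum((v - mean) ** 4 for v in values) / (n * sd ** 4) - 3
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "skewness": skew,
        "excess_kurtosis": excess_kurt,
    }

# Hypothetical claim dates, for illustration only (not the EG Insurance data).
dates = [dt.date(2005, 1, 3), dt.date(2005, 1, 4), dt.date(2005, 1, 9),
         dt.date(2005, 1, 10), dt.date(2005, 3, 30), dt.date(2005, 4, 1)]
gaps = inter_arrival_days(dates)
print(gaps)  # [1, 5, 1, 79, 2]
print(describe(gaps))
```

Even this toy series shows the typical heavy-tailed signature: a mean far above the median and a large positive skewness driven by one long gap.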

Specific Objectives
Identify the appropriate statistical distribution for claim inter-arrival times.
Test the goodness of fit of the chosen distribution.

Variable
The random variable used in the study is the inter-arrival time of fire claims reported and claimed at EG Insurance. The data fitting process uses statistical techniques that estimate the parameters of each candidate distribution from the data sample. One benefit of using software to fit the data is that it can automatically fit the data to a number of known distributions simultaneously. EasyFit is a data analysis and simulation program that fits probability distributions to data samples, simulates them, selects the best-fitting model, and applies the analytical results to support better decisions.
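The text does not state which estimator EasyFit uses. As one common choice, the maximum-likelihood estimates for a Pareto Type I sample are the sample minimum for $x_m$ and $\hat\alpha = n / \sum_i \ln(x_i / \hat x_m)$. The sketch below (hypothetical function name) assumes strictly positive inter-arrival times; same-day claims, which give zero gaps in daily data, would need separate treatment.

```python
import math

def fit_pareto_mle(sample):
    """Maximum-likelihood estimates (x_m, alpha) for a Pareto Type I sample.

    x_m is estimated by the sample minimum and alpha = n / sum(log(x_i / x_m)).
    Requires strictly positive data with at least two distinct values.
    """
    x_m = min(sample)
    if x_m <= 0:
        raise ValueError("Pareto data must be strictly positive")
    log_sum = sum(math.log(x / x_m) for x in sample)
    if log_sum == 0:
        raise ValueError("all observations are equal; alpha is not identifiable")
    return x_m, len(sample) / log_sum

print(fit_pareto_mle([1.0, 2.0, 4.0]))  # x_m = 1.0, alpha = 1 / ln 2, about 1.4427
```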

Descriptive Data Analysis
The goodness-of-fit test is a technique used to determine the appropriate distribution for the given data. The theoretical background of the test is explained first, and the test is then applied to real data collected from an Egyptian insurance company. Classical goodness-of-fit testing in statistics asks whether the sample could have been produced by the hypothesized PDF. It is also worth emphasizing the test's ability to reject the hypothesis when the hypothesized PDF differs from the actual PDF. The results indicate that the Pareto distribution is one of the best distributions for claim inter-arrival times.

Goodness-of-Fit Tests
Goodness-of-fit tests, as their name implies, are used to assess whether a particular distribution fits the data properly. Goodness-of-fit statistics also make it possible to rank the distributions fitted to the raw data by how well they fit. This function of the software is very useful when comparing fitted models. The most widely used goodness-of-fit tests include the Kolmogorov-Smirnov, Anderson-Darling, and chi-squared tests.

Easy Fit Software
EasyFit is data analysis and simulation software that enables us to fit and simulate statistical distributions with sample data, choose the best model, and use the results of the analysis to make better decisions. It can run as a stand-alone Windows application or as an add-in for Excel spreadsheets.
Prominent features of this program are:
• Supports more than 50 discrete and continuous distributions.
• Automatic and manual settings.
EasyFit provides the Kolmogorov-Smirnov, Anderson-Darling, and chi-squared goodness-of-fit tests. When the distributions are fitted, EasyFit generates a goodness-of-fit report that includes the calculated test statistics and the critical values for various significance levels. We compare the results of fitting several kinds of distribution.
Since goodness-of-fit statistics measure the distance between the data and the fitted distribution, the distribution with the smallest statistic value fits the data best. Based on this, EasyFit assigns a rank to each distribution (1 for the best model, 2 for the next best, and so on), which makes it easy to select the most reliable model.
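The ranking rule described above can be sketched as follows. This is not EasyFit's implementation: it fits only two candidates (Pareto and exponential, both by maximum likelihood), computes the Kolmogorov-Smirnov distance for each, and sorts by it, and it is applied to synthetic Pareto data rather than the study's claims. All function names are hypothetical.

```python
import math
import random

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)  # gaps just after and just before each step
    return d

def fitted_candidates(sample):
    n = len(sample)
    x_m = min(sample)
    alpha = n / sum(math.log(x / x_m) for x in sample)  # Pareto MLE
    lam = n / sum(sample)                               # exponential MLE
    return {
        "pareto": lambda x: 0.0 if x < x_m else 1.0 - (x_m / x) ** alpha,
        "exponential": lambda x: 0.0 if x < 0 else 1.0 - math.exp(-lam * x),
    }

def rank_by_ks(sample):
    scores = {name: ks_statistic(sample, cdf) for name, cdf in fitted_candidates(sample).items()}
    return sorted(scores.items(), key=lambda item: item[1])  # rank 1 = smallest distance

rng = random.Random(3)
synthetic = [rng.paretovariate(1.5) for _ in range(1000)]  # synthetic data, not the study's
ranking = rank_by_ks(synthetic)
for rank, (name, d) in enumerate(ranking, start=1):
    print(rank, name, round(d, 4))
```

Ranking by the raw statistic is meaningful only within one test; KS, AD, and chi-squared values are on different scales and should not be compared with each other.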

Methods

Problem Identification
After a detailed study of research papers, articles, and books on reliability and other statistical analyses, we found that most current ruin probability models assume that claims arrive at random times, so that claim arrivals are well approximated by a Poisson process.
Here we provide evidence that the timing of claims follows non-Poisson patterns: our analysis shows that claim activity can be represented by a non-Poisson process and that the resulting distribution of inter-arrival times follows the Pareto distribution. These results will help researchers understand daily behavioral trends and create more sophisticated predictive models of claims and their timing. I will provide four classical goodness-of-fit plots for the Pareto distribution, presented on:

KS Test
The null and alternative hypotheses are:
• H0: the data follow the Pareto distribution.
• HA: the data do not follow the Pareto distribution.
Table 3 shows that the Pareto distribution is not rejected by the KS test (p-value 0.91054) at any of the significance levels considered.
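For reference, a KS p-value can be approximated from the asymptotic Kolmogorov distribution, $P(K > x) = 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$ with $x = \sqrt{n}\,D$. The sketch below (hypothetical function name) is not necessarily how EasyFit computes its p-values, which the text does not specify. One caveat: when the distribution's parameters are estimated from the same data, these standard p-values are conservative, so a large p-value should be read as "not rejected" rather than as confirmation of the fit.

```python
import math

def kolmogorov_pvalue(d, n, terms=100):
    """Asymptotic p-value P(K > sqrt(n) * d) from the Kolmogorov distribution.

    Exact only in the large-sample limit and for a fully specified null distribution;
    with parameters estimated from the data, the result is conservative (too large).
    """
    x = math.sqrt(n) * d
    if x <= 0:
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * x * x) for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2 * s))

# sqrt(100) * 0.136 = 1.36, the familiar 5% critical value of the Kolmogorov distribution.
print(round(kolmogorov_pvalue(d=0.136, n=100), 4))  # 0.0495
```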

AD Test
The null and alternative hypotheses are:
• H0: the data follow the Pareto distribution.
• HA: the data do not follow the Pareto distribution.
Table 4 shows that the Pareto distribution is not rejected by the AD test at the 1%, 2%, and 5% significance levels, but is rejected at the 10% and 20% levels.
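The Anderson-Darling statistic weights discrepancies in the tails more heavily than the KS distance, which matters for heavy-tailed data. A sketch of the standard computing formula, $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right]$, follows. The function name and the synthetic data are hypothetical, and critical values (which depend on the hypothesized distribution and on which parameters were estimated) are not reproduced here.

```python
import math
import random

def anderson_darling(sample, cdf):
    """Anderson-Darling statistic A^2 of `sample` against a hypothesized continuous `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    eps = 1e-12  # keeps log() finite if the cdf returns exactly 0 or 1
    u = [min(max(cdf(x), eps), 1 - eps) for x in xs]
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))
            for i in range(1, n + 1))
    return -n - s / n

# Synthetic heavy-tailed data (Pareto, x_m = 1, alpha = 1.5), not the study's claims.
rng = random.Random(11)
data = [rng.paretovariate(1.5) for _ in range(500)]
lam = len(data) / sum(data)  # exponential MLE rate

a2_pareto = anderson_darling(data, lambda x: 1 - x ** -1.5 if x >= 1 else 0.0)
a2_expon = anderson_darling(data, lambda x: 1 - math.exp(-lam * x))
print(round(a2_pareto, 3), round(a2_expon, 3))  # the exponential fit scores far worse
```

Because the AD critical values shrink as the significance level grows, a statistic can fall below the 5% critical value yet exceed the 10% one, which is the pattern reported in Table 4.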

Chi Test
The null and alternative hypotheses are:
• H0: the data follow the Pareto distribution.
• HA: the data do not follow the Pareto distribution.
Table 5 shows that the Pareto distribution is not rejected by the chi-squared test (p-value 0.95541) at any of the significance levels considered.
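A chi-squared statistic can be computed with cells chosen to be equiprobable under the hypothesized distribution, by binning the probability-integral-transformed values F(x). The sketch below assumes 10 cells and two estimated parameters for the Pareto fit, giving 10 - 1 - 2 = 7 degrees of freedom; the text does not say how EasyFit forms its bins, and the function name and data are hypothetical. A p-value would then come from the chi-squared survival function, for example `scipy.stats.chi2.sf(stat, df)`.

```python
import math
import random

def chi_square_statistic(sample, cdf, bins=10, n_params=2):
    """Pearson chi-squared statistic over `bins` cells that are equiprobable under `cdf`.

    Returns (statistic, degrees_of_freedom) with df = bins - 1 - n_params.
    """
    n = len(sample)
    observed = [0] * bins
    for x in sample:
        u = cdf(x)  # probability integral transform: u is uniform under H0
        observed[min(int(u * bins), bins - 1)] += 1
    expected = n / bins
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, bins - 1 - n_params

# Synthetic Pareto data, not the study's claims.
rng = random.Random(5)
data = [rng.paretovariate(1.5) for _ in range(1000)]
x_m = min(data)
alpha = len(data) / sum(math.log(x / x_m) for x in data)  # Pareto MLE
lam = len(data) / sum(data)                               # exponential MLE

chi_pareto, df = chi_square_statistic(data, lambda x: 1 - (x_m / x) ** alpha)
chi_expon, _ = chi_square_statistic(data, lambda x: 1 - math.exp(-lam * x), n_params=1)
print(round(chi_pareto, 2), round(chi_expon, 2), df)
```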

Conclusion
In many applications of claim inter-arrival time distributions, a key concern is fitting the tail of the claim inter-arrival time distribution. As mentioned above, good estimates of the tails of fire claim inter-arrival time distributions are essential for pricing and risk management of commercial fire insurance loss. We performed an exploratory analysis of claim inter-arrival times using goodness-of-fit tests. The tests showed that some distributions fit poorly, while the Pareto distribution fits the claim inter-arrival time data much better.
The Q-Q plots indicate that most points of the Pareto distribution lie along the reference line, making it one of the best distributions for claim inter-arrival time. The histogram of the claims, together with the goodness-of-fit graphs (the Probability Density Function (PDF), Cumulative Distribution Function (CDF), P-P, and Probability Difference (PD) graphs), also indicates that the Pareto distribution was one of the best-fitting of the 56 distributions considered.
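A Q-Q plot pairs each sorted observation with the corresponding theoretical quantile. For the Pareto distribution the quantile function is $Q(p) = x_m (1 - p)^{-1/\alpha}$; the sketch below uses the plotting positions $p_i = (i - 0.5)/n$, which is one common convention and not necessarily the one used for the plots in this study, together with a hypothetical function name and made-up data. Points close to the line y = x indicate a good fit; because of the heavy tail, both axes are usually drawn on a log scale.

```python
def pareto_qq_points(sample, x_m, alpha):
    """(theoretical, observed) quantile pairs for a Pareto Q-Q plot.

    Uses plotting positions p_i = (i - 0.5) / n and the Pareto quantile
    function Q(p) = x_m * (1 - p) ** (-1 / alpha).
    """
    xs = sorted(sample)
    n = len(xs)
    return [(x_m * (1 - (i - 0.5) / n) ** (-1 / alpha), x) for i, x in enumerate(xs, start=1)]

# Made-up inter-arrival times, with parameters assumed rather than estimated.
points = pareto_qq_points([1.2, 1.0, 3.5, 1.8], x_m=1.0, alpha=1.5)
for theoretical, observed in points:
    print(round(theoretical, 3), observed)
```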

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.