Application of Hierarchical Model in Non-Life Insurance Actuarial Science

Loss data in non-life insurance are becoming increasingly complex and increasingly exhibit correlation and heterogeneity. Hierarchical models overcome a limitation of traditional ratemaking methods, which analyze only the loss data of a single insurance policy, and they improve prediction accuracy for data with complex structure. Using hierarchical generalized linear models, this paper studies non-life ratemaking with multi-year loss data and uses auto insurance data for an empirical analysis. The results show that, once random effects are taken into account, the GLMM fits considerably better than the GLM. It reflects the differences between individual risks more effectively and also reveals the heterogeneity and correlation of an individual risk's losses across multiple insurance periods.


Introduction
In the 1990s, hierarchical (multilevel) models came into wide use around the world as a new statistical analysis technique. A hierarchical model gives the model parameters their own probability submodel, thereby extending the standard linear model (LM), the generalized linear model (GLM) and the nonlinear model. When those models are used for statistical analysis, the data must generally be observations of independent random variables. However, some actuarial and statistical problems [1] involve longitudinal data, spatially clustered data and general clustered data. Such data do not satisfy the independence assumption but have a hierarchical structure, and they can be classified according to that structure.
Two of the core topics in non-life actuarial science are pricing and reserving. For a property insurance company, competitiveness and profitability depend closely on sound pricing. In the 1990s, British actuaries introduced the GLM into non-life insurance pricing. Since then, the GLM has been widely and successfully used in non-life pricing practice in many countries.
However, the GLM still has shortcomings. For example, when some levels of a categorical explanatory variable contain few observations, the standard errors of the parameter estimates for those levels increase. Moreover, applying the GLM directly can require estimating too many parameters. To address these problems, actuaries incorporated credibility theory into the GLM framework, which led to statistical models and methods [2] such as hierarchical generalized linear models (HGLM) and generalized linear mixed models (GLMM).
The claim reserve is the largest liability on a property insurance company's balance sheet. Assessing it accurately supports a correct judgment of the company's operating performance and solvency, so a reasonable assessment of this liability is of great significance to the company's development.

Theoretical Framework of the GLMM-Based Ratemaking Model
Assume there are m risk individuals, and let the random variable $Y_{ij}$ denote the number (or amount) of claims incurred by the i-th risk individual in the j-th policy year. The GLMM framework specifies the following three parts:
1) Random component: Conditional on the random effect $b_i$, the observations $Y_{ij}$ are mutually independent and follow a distribution from the exponential dispersion family (EDF). The probability density function can be written as
\[ f(y_{ij} \mid b_i) = \exp\left\{ \frac{y_{ij}\theta_{ij} - \psi(\theta_{ij})}{\phi} + c(y_{ij}, \phi) \right\}, \]
where $\theta_{ij}$ is the natural parameter, $\psi(\cdot)$ and $c(\cdot)$ are known functions, and $\phi$ is the scale parameter.
2) Systematic component: The relationship between the mean of the response variable and the explanatory variables is represented by a linear predictor. Let the predictor be $\eta = X\beta + Zb$, where $X$ is the design matrix of the fixed effects $\beta$ and $Z$ is the design matrix of the random effects $b$. The conditional mean $\mu = E(Y \mid b)$ is related to the predictor through a link function $g$, so that $g(\mu) = X\beta + Zb$.
3) Random effects: The random effects $b_i$ are assumed to follow a normal distribution.
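To make the role of the random effect concrete, the following minimal sketch simulates claim counts from a Poisson GLMM and compares them with counts simulated without a random effect. The log link, the intercept-only predictor, the parameter values and all function names are illustrative assumptions, not taken from the paper.

```python
import math
import random


def draw_poisson(mu, rng):
    """Draw one Poisson(mu) variate by CDF inversion (adequate for small mu)."""
    u = rng.random()
    k = 0
    p = math.exp(-mu)  # P(Y = 0)
    cdf = p
    while u > cdf and k < 1000:  # cap guards against floating-point stalls
        k += 1
        p *= mu / k  # P(Y = k) obtained from P(Y = k - 1)
        cdf += p
    return k


def simulate_glmm_counts(m, years, beta0, sigma, seed=0):
    """Simulate m risk individuals over several policy years.

    Assumed model: Y_ij | b_i ~ Poisson(exp(beta0 + b_i)), b_i ~ N(0, sigma^2).
    Setting sigma = 0 removes the random effect and gives a plain Poisson GLM.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(m):
        b_i = rng.gauss(0.0, sigma)  # individual-specific random intercept
        mu = math.exp(beta0 + b_i)   # conditional mean under the log link
        data.append([draw_poisson(mu, rng) for _ in range(years)])
    return data


def dispersion_index(data):
    """Sample variance divided by sample mean of all counts."""
    flat = [y for row in data for y in row]
    mean = sum(flat) / len(flat)
    var = sum((y - mean) ** 2 for y in flat) / (len(flat) - 1)
    return var / mean


glm_like = simulate_glmm_counts(m=2000, years=3, beta0=-1.0, sigma=0.0)
glmm_like = simulate_glmm_counts(m=2000, years=3, beta0=-1.0, sigma=1.0)
print("dispersion without random effect:", dispersion_index(glm_like))
print("dispersion with random effect:   ", dispersion_index(glmm_like))
```

Under these assumptions the counts without a random effect have a variance close to their mean, while the shared random intercept inflates the variance and links the counts of one individual across years. This is the heterogeneity and correlation that the GLMM is meant to capture.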

Theoretical Framework of the HGLM-Based Ratemaking Model
The structure rests on four basic assumptions:
1) Independence: Conditional on the risk parameter $u_i$, the claim numbers (or amounts) of different risk individuals are mutually independent.
2) Distribution: $Y_{ij} \mid u_i$ follows a distribution from the exponential dispersion family (EDF), with probability density function
\[ f(y_{ij} \mid u_i) = \exp\left\{ \frac{y_{ij}\vartheta_{ij} - b(\vartheta_{ij})}{\phi} + c(y_{ij}, \phi) \right\}, \]
where $\vartheta_{ij}$ and $\phi$ are the natural and dispersion parameters and $b(\cdot)$ and $c(\cdot)$ are given functions. Known weights (constants), where present, scale the dispersion parameter.
3) Structure: The mean $\mu$ is related to the linear predictor $X\beta + Zv$ through a link function $g$, that is, $g(\mu) = X\beta + Zv$. The random effect $v$ is obtained from $u$ through a strictly monotonic function, $v = v(u)$.
4) Distribution of risk parameters: In the HGLM, $u_i$ is a random risk parameter that describes the heterogeneous risk characteristics of risk individual i. The transformed effect $v_i$ is assumed to follow an EDF distribution with hyperparameter $\psi$; the dispersion parameters of the model are $\phi$ and $\lambda$.
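As an illustration of how a random risk parameter $u_i$ captures individual heterogeneity, the sketch below works through the Poisson-gamma case, which is the distributional choice the paper later makes for Model 4. It assumes a multiplicative effect, $Y_{ij} \mid u_i \sim \mathrm{Poisson}(\mu_{ij} u_i)$, and a gamma distribution with mean 1 and variance lam for $u_i$. This parameterization and the function names are assumptions made for the example. Under them, the posterior of $u_i$ given the observed claims is again gamma, and its mean takes a credibility form.

```python
def posterior_mean_u(claims, mus, lam):
    """Posterior mean of u_i under the assumed Poisson-gamma model.

    Prior: u_i ~ Gamma(shape=1/lam, rate=1/lam), so E[u_i] = 1, Var[u_i] = lam.
    Posterior: Gamma(shape=1/lam + sum(claims), rate=1/lam + sum(mus)).
    """
    alpha = 1.0 / lam
    return (alpha + sum(claims)) / (alpha + sum(mus))


def credibility_weight(mus, lam):
    """Weight Z given to the individual's own experience."""
    exposure = sum(mus)
    return exposure / (exposure + 1.0 / lam)


# Hypothetical individual: three policy years, expected 0.2 claims per year.
claims = [0, 1, 2]
mus = [0.2, 0.2, 0.2]
lam = 0.5

z = credibility_weight(mus, lam)
own_experience = sum(claims) / sum(mus)   # observed-to-expected ratio
blended = z * own_experience + (1.0 - z)  # prior mean of u_i is 1
print(posterior_mean_u(claims, mus, lam), blended)
```

The two printed values agree, which shows that the posterior mean is a weighted average of the individual's own claim experience and the portfolio-level prior mean, with more weight on the individual as exposure or lam grows.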

Comparison of GLM, GLMM and HGLM
From the above, the connections and differences between the models can be summarized as follows:
1) Model structure: The GLMM is an extension of the GLM, and the HGLM is a more general framework. Introducing a normally distributed random effect [3] [4] into the linear predictor of the GLM gives the GLMM; removing the random effect reduces the GLMM to the GLM. The HGLM, in contrast, does not restrict the random effects to the normal distribution. They may also follow, for example, the inverse Gaussian or beta distribution, which allows non-life loss data, and longitudinal loss data in particular, to be analyzed more accurately.
2) Theoretical calculation: Of the three models, the GLM is the simplest to compute. In general, its parameters are estimated by maximizing the likelihood function to obtain the maximum likelihood estimates (MLE), typically with iterative algorithms such as Fisher scoring or Newton-Raphson.
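The iterative estimation mentioned in point 2) can be sketched for the simplest case, a Poisson GLM with a log link, an intercept and one covariate. The toy data and function name are hypothetical. For the canonical log link the expected and observed information coincide, so Fisher scoring and Newton-Raphson give the same update.

```python
import math


def fisher_scoring_poisson(x, y, max_iter=50, tol=1e-12):
    """Fit log(mu_i) = b0 + b1 * x_i to Poisson counts by Fisher scoring."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivative of the log-likelihood w.r.t. (b0, b1).
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Update step: solve information * delta = score.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: x = 1 marks a higher-risk rating class.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 1, 0, 1, 2, 1, 3, 2]
b0_hat, b1_hat = fisher_scoring_poisson(x, y)
print(b0_hat, b1_hat)
```

With a single binary covariate the MLE has a closed form (the fitted means equal the group averages), which gives a convenient check on the iteration.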

Empirical Analysis: Number of Claims in Auto Insurance Business
The data used in this paper come from Sun Weiwei's paper [5] and have been reorganized for ease of comparison. The data contain 40,000 policies (insureds).
3) Model 3: Building on Model 1, the heterogeneity and correlation ignored there are taken into account. Under the GLMM framework, the random effects are assumed to be mutually independent and to follow a normal distribution [6] [7] with variance parameter $\sigma^2$. The model is then obtained by adding this random effect to the linear predictor of Model 1.
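One way to write Model 3 explicitly is given below. It assumes a log link and a single random intercept $b_i$ per policy, and it uses $x_{ij}$ as assumed notation for the vector of agecat and valuecat indicators; none of these details is fixed by the text above.

```latex
\begin{align*}
Y_{ij} \mid b_i &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= x_{ij}^{\top}\beta + b_i, \\
b_i &\sim N(0, \sigma^2) \quad \text{independently for } i = 1, \dots, m.
\end{align*}
```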

Parameter Estimation
This study uses the R packages gamlss, lme4, glmmML and hglm. Model 1 is estimated by maximum likelihood using Fisher scoring iterations, with the dispersion parameter fixed at 1. Model 2 is estimated with the RS algorithm of the GAMLSS framework, which runs for 12 iterations. Model 3 uses Gauss-Hermite quadrature [8] [9] to estimate the fixed-effect parameters. Model 4 is estimated by maximizing the h-likelihood. The estimation results are shown in Table 1.
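The quadrature step used for Model 3 can be illustrated with a small sketch. It approximates the marginal likelihood of one policy's claim counts by integrating a normal random intercept out of a Poisson likelihood. The log link, the fixed 5-point rule and the function names are assumptions for this example; the R packages named above choose their own number of quadrature points and may use adaptive variants.

```python
import math

# 5-point Gauss-Hermite rule for integrals of the form exp(-x^2) * f(x).
GH_NODES = [-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086]
GH_WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046]


def poisson_pmf(y, mu):
    """Poisson probability, computed on the log scale for stability."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))


def marginal_likelihood(counts, etas, sigma):
    """Approximate the integral of prod_j Poisson(y_j; exp(eta_j + b))
    against the N(0, sigma^2) density of the random intercept b.

    The substitution b = sqrt(2) * sigma * x turns the normal density
    into the exp(-x^2) weight of the Gauss-Hermite rule.
    """
    total = 0.0
    for node, weight in zip(GH_NODES, GH_WEIGHTS):
        b = math.sqrt(2.0) * sigma * node
        conditional = 1.0
        for y, eta in zip(counts, etas):
            conditional *= poisson_pmf(y, math.exp(eta + b))
        total += weight * conditional
    return total / math.sqrt(math.pi)


# Hypothetical policy: three years of counts, fixed-effect predictor -1.5.
counts = [0, 1, 0]
etas = [-1.5, -1.5, -1.5]
print(marginal_likelihood(counts, etas, sigma=0.8))
```

Summing the logarithm of this quantity over all policies gives the approximate marginal log-likelihood, which an optimizer then maximizes over the fixed effects and sigma.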

Result Analysis
The empirical analysis above shows that the parameter estimates differ considerably across the models. Fundamentally, if the random effects across the three consecutive policy years are ignored, a GLM is the model to build. The AIC statistics of the models are at different levels. The hierarchical model is then introduced.
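For reference, the AIC used in this kind of comparison is twice the number of estimated parameters minus twice the maximized log-likelihood, and a lower value indicates a better trade-off between fit and complexity. The sketch below computes it for two Poisson fits on a small hypothetical data set; the counts, groupings and function names are illustrative only and are unrelated to the results in Table 1.

```python
import math


def poisson_loglik(counts, means):
    """Poisson log-likelihood of the counts given their fitted means."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1)
               for y, m in zip(counts, means))


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik


# Hypothetical counts for two rating classes of three policies each.
class_a = [0, 0, 1]
class_b = [2, 0, 1]
counts = class_a + class_b

# Pooled model: one common mean (1 parameter).
pooled_mean = sum(counts) / len(counts)
ll_pooled = poisson_loglik(counts, [pooled_mean] * len(counts))

# Class model: one mean per class (2 parameters).
mean_a = sum(class_a) / len(class_a)
mean_b = sum(class_b) / len(class_b)
ll_class = poisson_loglik(counts, [mean_a] * 3 + [mean_b] * 3)

print(aic(ll_pooled, 1), aic(ll_class, 2))
```

The richer model always attains at least as high a log-likelihood, so the penalty term decides whether the extra parameter is worthwhile.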
In the HGLM density function, the natural and dispersion parameters are $\vartheta_{ij}$ and $\phi$, respectively, and $b(\cdot)$ and $c(\cdot)$ are given functions. 3) Structure: $\mu$ is related to the linear predictor $X\beta + Zv$.

The numbers of claims observed over three consecutive policy years are used as longitudinal data, giving 120,000 records. The variables in the original sample data set are: numclaims, the number of claims; policyID, the policy code; agecat, the driver's age category, with levels 1, 2, 4, 5, 6, 10 (in increasing order of age); valuecat, the vehicle value category, with levels 2, 3, 4, 5, 6, 9; and period, the observed policy year, with levels 1, 2, 3.
Model construction: The fixed effects are agecat and valuecat, and the models are built according to the distributional characteristics of the number of claims.
1) Model 1: The number of claims is assumed to follow a Poisson distribution with parameter $\mu_{ij}$. The heterogeneity in claim numbers captured by random effects and the correlation between the claims of the three years are ignored.
2) Model 2: The model has the same form as Model 1, but the excess of zero claim counts is also taken into account. The number of claims is assumed to follow a zero-inflated Poisson (ZIP) distribution.
4) Model 4: In the HGLM framework, the random effect $u_i$ reflects the differences in claims between individuals and is assumed to follow a gamma distribution.
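The Poisson, ZIP and HGLM specifications described above can be written out as follows. This is a sketch under stated assumptions: a log link, a zero-inflation probability denoted $\pi$, a multiplicative gamma effect with mean 1 and variance $\lambda$, and $x_{ij}$ as notation for the agecat and valuecat indicators. None of these choices is specified in the text above.

```latex
\begin{align*}
\text{Model 1:}\quad & Y_{ij} \sim \mathrm{Poisson}(\mu_{ij}),
  \qquad \log \mu_{ij} = x_{ij}^{\top}\beta. \\[4pt]
\text{Model 2:}\quad & P(Y_{ij} = 0) = \pi + (1 - \pi)\, e^{-\mu_{ij}}, \\
  & P(Y_{ij} = k) = (1 - \pi)\, \frac{e^{-\mu_{ij}} \mu_{ij}^{k}}{k!},
  \qquad k = 1, 2, \dots \\[4pt]
\text{Model 4:}\quad & Y_{ij} \mid u_i \sim \mathrm{Poisson}(\mu_{ij} u_i),
  \qquad u_i \sim \mathrm{Gamma}\!\left(\tfrac{1}{\lambda}, \tfrac{1}{\lambda}\right),
  \quad E(u_i) = 1,\ \mathrm{Var}(u_i) = \lambda.
\end{align*}
```

Setting $\pi = 0$ in Model 2 recovers Model 1, which is why the two can be compared directly.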

Table 1. Parameter estimation results of Models 1-3.