Stackelberg Differential Game for Target Benefit Pension Plans

Abstract

In this paper, a Stackelberg game model is constructed between a Target Benefit Pension (TBP) fund and a reinsurance company. The model features the following characteristics: the contribution rate of plan members is predetermined, while the pension payment level depends on the plan's financial status and involves a risk-sharing mechanism across generations. Both participants invest in risk-free and risky assets. By applying stochastic optimal control methods, closed-form solutions are derived for the Stackelberg differential game model.

Share and Cite:

Chen, J. and Qie, J. (2025) Stackelberg Differential Game for Target Benefit Pension Plans. Modern Economy, 16, 1785-1801. doi: 10.4236/me.2025.1611082.

1. Introduction

Reinsurance operates as an effective means of risk transfer between insurers and reinsurers. By paying reinsurance premiums, insurers can transfer a portion of their risk to reinsurers. The academic study of reinsurance strategies has developed over decades, with special emphasis placed on optimal reinsurance models. Notable contributions include works by Promislow and Young (2005), Bäuerle (2005), Bai and Guo (2008), and Liang and Bayraktar (2014), among others.

The paper examines reinsurance through the lens of game theory, taking into account the objectives of both the insurer and the reinsurer. Game-theoretic formulations fall into two broad classes: cooperative games, which seek Pareto-efficient outcomes, and non-cooperative games, typified by the Stackelberg setting. In a Stackelberg framework, actions unfold sequentially: the reinsurer first sets the premium rule and commits to a menu of admissible indemnity contracts; the insurer then chooses a best-response strategy given these terms. This timing structure captures the core strategic tension between the parties.

Within this line of work on Stackelberg games in reinsurance, two models dominate. The first is a static setup involving a single interaction. To the best of current knowledge, Chan and Gerber (1985) were the first to solve such a problem in reinsurance, maximizing the parties' expected utilities of terminal surplus. Building on this, Cheung et al. (2019) analyze distortion risk measures for both players, while Chi et al. (2020) address a related formulation with constraints on the ceded portion. Boonen et al. (2021) consider an insurer of one of two types, each represented by a distortion risk measure, with asymmetric information arising because the reinsurer cannot observe the insurer's type. Finally, Li and Young (2021) develop a mean-variance Stackelberg model, which serves as the one-period analogue of the dynamic game studied in this paper.

The second strand is the continuous-time setting, often framed as a Stackelberg differential game. A growing body of work studies reinsurance in this framework, including Chen and Shen (2018, 2019), Gu et al. (2020), Wang and Siu (2020), Bai et al. (2020, 2021), and Yang et al. (2021). Across these papers, the insurer and reinsurer typically optimize either a mean-variance objective (e.g., Chen & Shen, 2019; Li & Young, 2021) or an expected-utility objective (Chen & Shen, 2019; Gu et al., 2020; Wang & Siu, 2020; Bai et al., 2020, 2021; Yang et al., 2021), over a fixed horizon.

Canadian target benefit plans (TBPs) are collective pension arrangements with fixed contributions or contributions confined to a narrow, predetermined range and a target benefit determined by a salary-linked formula. Actual benefits may end up above or below the target. A combined benefit, funding, and investment policy sets out how benefit levels, contribution rates (if adjustable), and the asset mix are revised in light of unfolding experience (CIA, 2015).

A defining aspect of TBPs is that members shoulder all risks, yet those risks are pooled across generations rather than borne individually. Without an external sponsor guarantee, the plan enables transfers between cohorts so that temporary cross-subsidies, including to and from future entrants, can help smooth benefit levels over time. Effective intergenerational risk sharing has been shown to improve welfare relative to traditional defined benefit plans and individual defined contribution plans (Gollier, 2008; Cui et al., 2011; Wang et al., 2018). That said, the sharing mechanism must be designed with fairness in mind: distributing too large a share of emerging surpluses can create discontinuity risk (Westerhout, 2011), while distributing too little may invite attempts to "raid the bank" (Van Bommel, 2007).

In our framework, the fund invests in a risk-free asset and a risky asset, and benefit payments depend on plan wealth and a stated benefit target. The trustees' objective is to minimize the cumulative squared and linear deviations of benefit outgo from the target over the distribution period, and to reduce discontinuity risk measured at the end of the horizon. Similar combinations of objectives, mixing linear and quadratic penalties with interim and terminal targets, appear elsewhere in the literature. Yong and Zhou (1999) provide a comprehensive treatment of linear-quadratic control. Vigna and Haberman (2001) analyze both investment and annuitization risks for an individual, using a sequence of interim targets and a retirement target linked to the desired net replacement ratio in a discrete-time model. Chang et al. (2003) introduce linear and quadratic penalties to weigh negative fund deviations more heavily than positive ones. Gerrard et al. (2004) study risk management in the decumulation phase of a DC plan, where the retiree may defer annuitization, consume from the fund, and invest the remainder; they derive a natural interim target for fund value that guides optimization. Incorporating interim benefit targets also aligns with practice in TBPs, where sponsors adjust the investment mix and/or benefit levels in response to realized experience by selecting the most suitable course of action.

To conclude, we study a dynamic Stackelberg reinsurance game and advance the literature on such games in two main respects. First, whereas most prior work models the insurer's surplus via the Cramér-Lundberg framework, we instead build on the premium and benefit principles used in target benefit plans (TBPs). Second, the continuous-time studies cited above typically assign similar objectives to the insurer and reinsurer. In reality, reinsurers, due to their larger scale and diversified portfolios, typically prioritize expected returns, whereas pension funds must balance returns with risk to ensure benefit security for members. Accordingly, in our setting the insurer's objective follows the TBP criterion, and the reinsurer seeks to minimize its expected loss.

The paper proceeds as follows. Section 2 sets out the model, covering the financial market, the TBP structure, and the objective functions. Section 3 characterizes the Stackelberg equilibrium when the insurer uses the TBP criterion and the reinsurer minimizes loss. Section 4 offers concluding remarks.

2. Model Formulation

Let ( Ω, F, ℙ ) be a complete probability space, where ℙ is a probability measure defined on Ω. The filtration F = { F_t, t ≥ 0 } is complete and right-continuous, and F is generated by a two-dimensional standard Brownian motion W = ( W_1, W_2 ).

2.1. Financial Market

We assume that there are two underlying assets available to the insurer and reinsurer: one risk-free asset (a bank account) and one risky asset (a stock). The evolution of the value of the risk-free asset, S_0(t), over time is given by

dS_0(t) = r_0 S_0(t) dt, t ≥ 0, (1)

where r 0 represents the risk-free interest rate.

Let the price of the underlying stock (risky asset) at time t be S_1(t), and suppose that its value is described by the stochastic differential equation (SDE)

dS_1(t) = S_1(t)( μ dt + σ dW_1(t) ), t ≥ 0, (2)

where μ is the appreciation rate of the stock and σ is the volatility rate, both μ and σ being positive constants, and W 1 ( t ) is a standard Brownian motion. To exclude arbitrage opportunities, we assume that μ> r 0 .
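The asset dynamics (1) and (2) can be illustrated with a short simulation. The sketch below uses the exact solution of the geometric Brownian motion SDE (2); all parameter values are illustrative assumptions, not values taken from this paper.

```python
import numpy as np

def simulate_assets(r0=0.02, mu=0.06, sigma=0.2, s0=1.0, s1=1.0,
                    T=1.0, n_steps=252, seed=0):
    """Simulate the risk-free account (1) and the stock price (2).

    The stock uses the exact GBM solution
    S_1(t) = S_1(0) exp((mu - sigma^2/2) t + sigma W_1(t)).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    # Risk-free asset: deterministic growth at rate r0.
    riskfree = s0 * np.exp(r0 * t)
    # Brownian path W_1: cumulative sum of N(0, dt) increments.
    dW = rng.normal(0.0, np.sqrt(dt), n_steps)
    W = np.concatenate(([0.0], np.cumsum(dW)))
    stock = s1 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)
    return t, riskfree, stock
```

With σ = 0 the stock path collapses to the deterministic curve e^{μt}, a quick sanity check of the scheme alongside the no-arbitrage assumption μ > r_0.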

2.2. Membership and Plan Provisions

Consider a plan that includes both active and retired members. Active members contribute to the pension fund, while retired members draw benefits from it. All participants enter the TBP at age a and retire at age r, and their survival is described by a survival function s(x), defined for a ≤ x ≤ ω, with s(a) = 1. For ages a ≤ x < r, members may exit the plan due to death or other causes; once x ≥ r, death is the sole decrement.

Definition 2.1. The pension plan model under investigation is formally defined by the following 6-tuple:

{ n(t), A(t), R(t), L̃(x,t), B(t), C(t) }

where

1) Let n( t ) denote the number of new entrants aged a who join the plan at time t .

2) The total active membership at time t, A(t), representing the population of active members currently contributing to the pension fund, is

A(t) = ∫_a^r n( t − (x − a) ) s(x) dx;

3) The total count of retired members at time t is represented by R(t),

R(t) = ∫_r^ω n( t − (x − a) ) s(x) dx;

4) Let L̃(x,t) denote the retirement salary of an individual who is age x at time t. It is specified by

L̃(x,t) = L(t) e^{−α(x − r)}, t ≥ 0, x ≥ r, (3)

where L(t) represents the annual salary rate for a member retiring at time t. The process L(t) follows dL(t) = L(t)( α dt + η dW_2(t) ), t ≥ 0. Here, α ∈ ℝ_+ is the expected instantaneous growth rate of salary, η is the instantaneous volatility, and W_2 is a standard Brownian motion. The Brownian motion W_2 is correlated with W_1 under ℙ, with correlation coefficient ρ.

5) Let B( t ) denote the total benefit payment rate to all retirees at time t . It is given by the expression:

B(t) = ∫_r^ω n( t − (x − a) ) s(x) f(t) L(t) e^{−(α − ζ)(x − r)} dx (4)

where ζ is the constant annual rate at which the plan grants cost-of-living adjustments to pensions, and f( t ) is a control variable chosen dynamically by the plan trustees.

6) C( t ) denotes the aggregate instantaneous contribution rate for all active members at time t , defined as follows:

C(t) = ∫_a^r n( t − (x − a) ) s(x) c_0 e^{αt} dx, t ≥ 0, (5)

Remark 2.1.

1) For further technical details regarding the model specification and implementation, see Wang et al. (2018).

2) For notational convenience, let C(t) = C_1(t) e^{αt}, where C_1(t) = c_0 ∫_a^r n( t − (x − a) ) s(x) dx, and let B(t) = I(t) f(t) L(t), where I(t) is a positive function of t, defined as

I(t) = ∫_r^ω n( t − (x − a) ) s(x) e^{−(α − ζ)(x − r)} dx (6)
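Since A(t), R(t) and I(t) are ordinary integrals over ages, they are straightforward to evaluate numerically once n(t) and s(x) are specified. The sketch below assumes, purely for illustration, a constant entry stream and exponential survival, and reads the weighting in I(t) as e^{−(α−ζ)(x−r)}; none of these numerical choices come from the paper.

```python
import numpy as np

# Illustrative demographic assumptions (not from the paper):
# new entrants n(t) = n0 * exp(g t), survival s(x) = exp(-m (x - a)).
a, r, omega = 25.0, 65.0, 100.0      # entry, retirement, limiting age
n0, g, m = 100.0, 0.0, 0.0           # constant entries, no decrements
alpha_, zeta = 0.03, 0.01            # salary growth and COLA rates

def n(t):
    return n0 * np.exp(g * t)

def s(x):
    return np.exp(-m * (x - a))

def _integrate(lo, hi, f, k=4000):
    """Composite trapezoidal rule on [lo, hi] with k panels."""
    x = np.linspace(lo, hi, k + 1)
    y = f(x)
    return (hi - lo) / k * (y.sum() - 0.5 * (y[0] + y[-1]))

def A(t):  # active members, ages in [a, r)
    return _integrate(a, r, lambda x: n(t - (x - a)) * s(x))

def R(t):  # retired members, ages in [r, omega]
    return _integrate(r, omega, lambda x: n(t - (x - a)) * s(x))

def I(t):  # benefit-weighting function (6)
    return _integrate(r, omega, lambda x: n(t - (x - a)) * s(x)
                      * np.exp(-(alpha_ - zeta) * (x - r)))
```

With constant entries and no decrements, A(t) = n0(r − a) and R(t) = n0(ω − r), which the quadrature reproduces.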

2.3. Wealth Progress and Objective Function

In this subsection we derive the wealth dynamics of the pension fund and the reinsurer for the model outlined above, taking into account investment and reinsurance strategies, contributions from active members, and benefit payments to retirees.

Assume that the pension fund can invest in both the risk-free and risky assets described by (1) and (2), respectively, and use the fund to pay retirement benefits. Let x_0 denote the initial wealth of this fund, let π_1(t) denote the amount that the plan manager invests in the risky asset at time t, let q(t) denote the fraction of benefits paid by the reinsurance company, and let c̃ C(t) denote the fraction of contributions paid by the pension fund to the reinsurance company. Let X(t) be the pension fund's wealth at time t after implementing the investment strategy π_1(t) and the reinsurance strategy q(t). The fund's value then evolves according to the following dynamics:

dX(t) = π_1(t) dS_1(t)/S_1(t) + ( X(t) − π_1(t) ) dS_0(t)/S_0(t) + [ (1 − c̃)C(t) − (1 − q(t))B(t) ] dt,
X(0) = x_0, (7)

Using Equations (1), (2), (4) and (5), Equation (7) can be rewritten as follows:

dX(t) = [ r_0 X(t) + (μ − r_0)π_1(t) + (1 − c̃)C_1(t)e^{αt} − (1 − q(t))I(t)f(t)L(t) ] dt + π_1(t)σ dW_1(t),
X(0) = x_0, (8)
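A discretized sketch of the wealth dynamics (8) may help fix ideas. The Euler-Maruyama scheme below holds the investment amount π_1 constant and collapses the contribution and benefit terms into a single constant net flow; both simplifications, and all parameter values, are illustrative assumptions rather than features of the model.

```python
import numpy as np

def simulate_fund(x0=100.0, r0=0.02, mu=0.06, sigma=0.2, pi1=20.0,
                  net_flow=5.0, T=10.0, n_steps=5000, seed=1):
    """Euler-Maruyama discretization of the fund wealth SDE (8).

    `pi1` is a constant amount held in the risky asset; `net_flow`
    stands in for contributions minus benefits (illustrative only).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        drift = r0 * x + (mu - r0) * pi1 + net_flow
        x += drift * dt + pi1 * sigma * rng.normal(0.0, np.sqrt(dt))
    return x
```

Setting σ = 0 and π_1 = 0 removes all randomness, and X(T) approaches the closed form x_0 e^{r_0 T} + (net_flow/r_0)(e^{r_0 T} − 1), a useful check on the discretization.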

Let ψ 1 = { ( π 1 ( s ),f( s ) ) } s[ t,T ] denote the strategy the pension fund follows on the interval [ t,T ] . Each pair consists of the investment decision π 1 ( s ) , the amount allocated to the risky asset at time s , and the benefit adjustment factor f( s ) applied at that time. Below we give the formal definition of an admissible strategy for the stochastic differential Equation (8).

Definition 2.2. For a fixed t[ 0,T ] , a strategy

ψ 1 = { ( π 1 ( s ),f( s ) ) } s[ t,T ]

is said to be admissible if

i) ψ 1 is F t -adapted;

ii) ∀ s ∈ [ t,T ], f(s) ≥ 0 and E[ ∫_t^T [ π_1(s) ]² ds ] < +∞;

iii) ( X ψ 1 , ψ 1 ) is the unique solution to SDE (8).

Let Ψ_1 be the set of all admissible strategies ψ_1.

The objective of this continuous-time asset allocation and benefit distribution problem is to minimize expected discounted losses over the remaining period until time T, where the losses correspond to the benefit risk and the discontinuity risk described in the Introduction. Since participants are concerned with both shortfall risk (benefit payments falling below the target) and benefit instability (deviations in either direction from the target), the loss function includes both linear and quadratic terms for benefit risk. Let J_1(t,x,l) denote the objective function at time t, where the fund value and the salary level at time t are x and l, respectively. It is defined as follows:

J_1(t,x,l; π_1(·), f(·), q(·)) = E^{π_1,f,q}{ ∫_t^T [ ( (1 − q(s))B(s) − B* e^{βs} )² − λ_1( (1 − q(s))B(s) − B* e^{βs} ) ] e^{−r_0 s} ds + λ_2( X(T) − x_0 e^{r_0 T} )² e^{−r_0 T} },
J_1(T,x,l; π_1(·), f(·), q(·)) = λ_2( X(T) − x_0 e^{r_0 T} )² e^{−r_0 T}, (9)

This mathematical formulation corresponds to the practical goals of TBP trustees mentioned in the Introduction. Both λ_1 and λ_2 are nonnegative constants that serve as penalty weights: λ_1 penalizes negative deviations of B(s) from B* e^{βs}, while λ_2 penalizes failure to reach the terminal fund target. The expectations in (9) are conditional on the state variables x and l at time t. The values of λ_1 and λ_2 capture the trade-offs chosen by the TBP's stakeholders (for example, the employer, different member groups, and regulators) among benefit stability, benefit adequacy (weighted by λ_1) and intergenerational equity (weighted by λ_2). These penalty weights are a design choice of the plan and cannot be altered by the pension fund.

Assume the reinsurance fund can allocate resources to both a risk-free asset and a risky asset, and that it contributes toward paying some portion of retirement benefits. Let y 0 be the fund’s initial capital. At time t , let π 2 ( t ) be the amount invested in the risky asset and let Y( t ) denote the fund’s wealth under the investment strategy π 2 ( t ) and the reinsurance strategy q( t ) . The evolution of the funds wealth is then governed by the following stochastic dynamics:

dY(t) = π_2(t) dS_1(t)/S_1(t) + ( Y(t) − π_2(t) ) dS_0(t)/S_0(t) + [ c̃C(t) − q(t)B(t) ] dt,
Y(0) = y_0, (10)

Using (1), (2), (4) and (5), we can easily rewrite (10) as

dY(t) = [ r_0 Y(t) + (μ − r_0)π_2(t) + c̃C_1(t)e^{αt} − q(t)I(t)f(t)L(t) ] dt + π_2(t)σ dW_1(t),
Y(0) = y_0, (11)

Let ψ_2 = { ( π_2(s), q(s) ) }_{s∈[t,T]} denote the strategy employed by the reinsurer over the interval [ t,T ]. Each pair ( π_2(s), q(s) ) consists of the investment amount π_2(s) allocated to the risky asset at time s and the reinsurance parameter q(s) at time s. Below we give the definition of an admissible strategy in relation to the SDE (11).

Definition 2.3. For any fixed t[ 0,T ] , a strategy

ψ 2 = { ( π 2 ( s ),q( s ) ) } s[ t,T ]

is said to be admissible if

i) ψ 2 is F t -adapted;

ii) ∀ s ∈ [ t,T ], q(s) ≥ 0 and E[ ∫_t^T [ π_2(s) ]² ds ] < +∞;

iii) ( Y^{ψ_2}, ψ_2 ) is the unique solution to SDE (11).

Let Ψ_2 be the set of all admissible strategies ψ_2.

Similarly, the goal of the reinsurer is to minimize expected discounted losses over the remaining period until time T, where the losses correspond to the discontinuity risk. Let J_2(t,y,l) denote the objective function evaluated at time t, when the reinsurer's wealth is y and the salary level is l. It is defined by the expression:

J_2(t,y,l; π_2(·), f(·), q(·)) = E^{π_2,f,q}{ ∫_t^T [ ( q(s)/(1 − q(s)) − θ(s) )² + λ_3( q(s)/(1 − q(s)) − θ(s) ) ] ds + λ_4( Y(T) − y_0 e^{r_0 T} )² e^{−r_0 T} },
J_2(T,y,l; π_2(·), f(·), q(·)) = λ_4( Y(T) − y_0 e^{r_0 T} )² e^{−r_0 T}, (12)

where ( q(s)/(1 − q(s)) − θ(s) )² is a penalty for the effective reinsurance premium rate deviating from a target safety loading. Let θ(s) > 0 denote the reinsurance safety loading at time s. The constants λ_3 and λ_4 are nonnegative penalty weights: λ_3 penalizes the nonnegative deviation between q(s)/(1 − q(s)) and θ(s), while λ_4 penalizes failure to reach the terminal fund target. For notational convenience, we write J_1(t,x,l; π_1(·), f(·), q(·)) and J_2(t,y,l; π_2(·), f(·), q(·)) as J_1(t,x,l; ψ_1, ψ_2) and J_2(t,y,l; ψ_1, ψ_2), respectively.
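The running penalty in (12) compares the effective premium rate q(s)/(1 − q(s)) with the loading θ(s); the quadratic part vanishes exactly when q = θ/(1 + θ). A minimal numeric check, with an illustrative loading value:

```python
def premium_penalty(q, theta, lam3=0.0):
    """Running penalty in (12): squared deviation of the effective
    premium rate q/(1-q) from theta, plus an optional linear term."""
    d = q / (1.0 - q) - theta
    return d * d + lam3 * d

def neutral_quota(theta):
    """Quota share q solving q/(1-q) = theta."""
    return theta / (1.0 + theta)
```

For θ = 0.25 the neutral quota is q = 0.2, where the quadratic penalty is exactly zero; any other q in (0, 1) makes the quadratic term positive.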

3. Stochastic Differential Game for Target Benefit Pension

In this section we study a Stackelberg reinsurance problem involving two players: a pension fund and a reinsurer. The game is played over the strategy spaces specified in Definitions 2.2 and 2.3, and the players' payoffs are given by the objective functionals in (9) and (12). The sequence of moves in the Stackelberg framework is as follows:

1) Leader First Announces Its Strategy to Followers: The game begins when the reinsurer announces an admissible strategy ψ 2 to the pension fund;

2) Follower Selects the Best-Response Strategy: When the leader's strategy ψ_2 is revealed, the follower chooses a best response from its strategy set Ψ_1. Denote this response by ψ_1(ψ_2) = ( π_1^{ψ_2}(·), f^{ψ_2}(·) ); it is obtained by solving the following optimization problem:

ψ_1(ψ_2) := arg min_{ψ_1 ∈ Ψ_1} J_1(t,x,l; ψ_1, ψ_2), t, x, l > 0, (13)

3) Leader Selects Its Optimal Strategy Based on the Identified Best-Response Strategy of the Follower: Based on the follower's best-response mapping ψ_1(ψ_2), the leader chooses an optimal strategy from Ψ_2. Denote this optimal leader strategy by ψ_2^*; it is found by solving the following optimization problem:

ψ_2^* := arg min_{ψ_2 ∈ Ψ_2} J_2(t,y,l; ψ_1(ψ_2), ψ_2), t, y, l > 0, (14)

The solution concept for this game is the Stackelberg equilibrium (SE). At the SE, the leader chooses a strategy that minimizes its objective while taking the follower's best-response mapping into account; this choice is the leader's equilibrium strategy. Knowing the leader's action, the follower then selects the best-response strategy that minimizes its own objective; this choice serves as the follower's equilibrium strategy. The precise definition of the SE for the described Stackelberg game is given below.

Definition 3.1. Denote by ψ_1(ψ_2) the best-response strategy of the follower identified by solving problem (13), and for notational convenience let ψ_1^* be equivalent to ψ_1(ψ_2). Given the above notation, a strategy ( ψ_1^*, ψ_2^* ) is an SE for the Stackelberg game if it solves the following optimization problems

( ψ_1^*, ψ_2^* ) = arg min_{( ψ_1^*, ψ_2 ) ∈ Ψ} J_2(t,y,l; ψ_1^*, ψ_2), t, y, l > 0, (15)

s.t. ψ_1^* = arg min_{ψ_1 ∈ Ψ_1} J_1(t,x,l; ψ_1, ψ_2), t, x, l > 0, (16)

where Ψ= Ψ 1 × Ψ 2 is a set of the admissible strategies.

3.1. Best-Response Strategy

In this subsection we apply standard techniques to solve the optimal control problem (16) and obtain the follower's best-response strategy ψ_1^* = ( π_1^{ψ_2}(·), f^{ψ_2}(·) ) among all admissible policies in Ψ_1.

The value function of the pension fund is defined as

ϕ_1(t,x,l) := min_{ψ_1 ∈ Ψ_1} J_1(t,x,l; ψ_1, ψ_2), t, x, l > 0, (17)

where J 1 ( t,x,l ) is given by (9).

First, we derive the Hamilton-Jacobi-Bellman (HJB) equation corresponding to the stochastic control problem (17). For further background see, for example, Fleming and Soner (2006). Applying variational arguments together with Itô's formula yields the HJB equation satisfied by the value function ϕ_1(t,x,l):

min_{ψ_1 ∈ Ψ_1}{ ϕ_t^1 + [ r_0 x + (μ − r_0)π_1 + (1 − c̃)C_1(t)e^{αt} − (1 − q(t)) f l I(t) ] ϕ_x^1 + αl ϕ_l^1 + (1/2)π_1² σ² ϕ_xx^1 + (1/2)η² l² ϕ_ll^1 + ρσηl π_1 ϕ_xl^1 + [ ( (1 − q(t)) f l I(t) − B* e^{βt} )² − λ_1( (1 − q(t)) f l I(t) − B* e^{βt} ) ] e^{−r_0 t} } = 0 (18)

with the boundary condition

ϕ_1(T,x,l) = λ_2( x − x_0 e^{r_0 T} )² e^{−r_0 T}, (19)

where ϕ t 1 , ϕ x 1 , ϕ xx 1 , ϕ l 1 , ϕ ll 1 and ϕ xl 1 are partial derivatives of ϕ 1 ( t,x,l ) .

We summarize the solution to the optimal control problem (17) in the following theorem. For notational convenience, denote the Sharpe ratio of the risky asset by δ = (μ − r_0)/σ.

Theorem 3.1. For the optimal control problem (17), the optimal asset allocation π_1^{ψ_2} and the benefit-adjustment policy f^{ψ_2} are given by the following expressions, respectively:

π_1^{ψ_2}(t,x,l) = −(δ/σ)[ x + Q_1(t)/2 ], (20)

f^{ψ_2}(t,x,l) = ( B* e^{βt} + λ_1/2 + λ_2 P_1(t)( x + Q_1(t)/2 ) ) / ( l(1 − q(t))I(t) ), (21)

and the corresponding value function is given by

ϕ_1(t,x,l) = λ_2 e^{−r_0 t} P_1(t)[ x² + x Q_1(t) ] + K_1(t), (22)

where

K_1(t) = λ_2 x_0² e^{r_0 T} + ∫_t^T e^{−r_0 s}{ λ_2 P_1(s)Q_1(s)[ (1 − c̃)C_1(s)e^{αs} − B* e^{βs} − λ_1/2 − (1/4)( δ² + λ_2 P_1(s) )Q_1(s) ] − λ_1²/4 } ds, (23)

and the expressions for P_1(t) and Q_1(t) are given below, depending on the values of the parameters β, r_0 and δ²:

P_1(t) = 1/( λ_2(T − t) + 1 ), r_0 = δ²,
P_1(t) = ( r_0 − δ² ) / ( λ_2 + ( r_0 − δ² − λ_2 ) e^{−( r_0 − δ² )( T − t )} ), r_0 ≠ δ², (24)

Q_1(t) = 2 e^{r_0 t}[ ∫_t^T (1 − c̃)C_1(s) e^{(α − r_0)s} ds − B*(T − t) − x_0 ] + (λ_1/r_0)( e^{−r_0(T − t)} − 1 ), r_0 = β,
Q_1(t) = 2 e^{r_0 t}[ ∫_t^T (1 − c̃)C_1(s) e^{(α − r_0)s} ds − x_0 − ( B*/(β − r_0) )( e^{(β − r_0)T} − e^{(β − r_0)t} ) ] + (λ_1/r_0)( e^{−r_0(T − t)} − 1 ), r_0 ≠ β, (25)

Proof. First, observe that the minimization in (18) with respect to π_1 and f can be separated into two subproblems: one minimizing with respect to π_1 and the other with respect to f. These two minimization problems are written as follows.

min_{π_1}{ ϕ_t^1 + [ r_0 x + (μ − r_0)π_1 + (1 − c̃)C_1(t)e^{αt} ] ϕ_x^1 + αl ϕ_l^1 + (1/2)π_1² σ² ϕ_xx^1 + ρσηl π_1 ϕ_xl^1 } = 0, (26)

min_{f ≥ 0}{ −(1 − q(t)) f l I(t) ϕ_x^1 + (1/2)η² l² ϕ_ll^1 + [ ( (1 − q(t)) f l I(t) − B* e^{βt} )² − λ_1( (1 − q(t)) f l I(t) − B* e^{βt} ) ] e^{−r_0 t} } = 0, (27)

Differentiating the bracketed expression in (26) with respect to π 1 , and the bracketed expression in (27) with respect to f , then setting the derivatives to zero and solving, yields immediately:

π_1^{ψ_2}(t,x,l) = −( δ ϕ_x^1 + ρηl ϕ_xl^1 ) / ( σ ϕ_xx^1 ), (28)

f^{ψ_2}(t,x,l) = ( (ϕ_x^1 e^{r_0 t})/2 + λ_1/2 + B* e^{βt} ) / ( l(1 − q(t))I(t) ), (29)

Obviously, a sufficient condition for π_1^{ψ_2} to be a minimizer is:

ϕ xx 1 >0. (30)

We will confirm this condition after deriving the expression for ϕ 1 .
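Since the bracketed expression in (26) is a convex quadratic in π_1 whenever ϕ_xx^1 > 0, the first-order condition can be sanity-checked numerically: a grid search over π_1 should land on the closed form π = −(δϕ_x^1 + ρηl ϕ_xl^1)/(σϕ_xx^1). The partial-derivative values below are arbitrary illustrative numbers, not quantities derived from the model.

```python
import numpy as np

# Arbitrary illustrative values; phi_xx > 0 makes the quadratic convex.
mu, r0, sigma, rho, eta, l = 0.06, 0.02, 0.2, 0.3, 0.1, 1.5
phi_x, phi_xx, phi_xl = -4.0, 2.0, 0.5
delta = (mu - r0) / sigma            # Sharpe ratio of the risky asset

def pi_terms(pi):
    """pi-dependent part of the bracketed expression in (26)."""
    return ((mu - r0) * pi * phi_x
            + 0.5 * pi**2 * sigma**2 * phi_xx
            + rho * sigma * eta * l * pi * phi_xl)

# Closed-form minimizer from the first-order condition.
pi_star = -(delta * phi_x + rho * eta * l * phi_xl) / (sigma * phi_xx)

# Grid search over a window around pi_star confirms the minimizer.
grid = np.linspace(pi_star - 5.0, pi_star + 5.0, 100001)
pi_grid = grid[np.argmin(pi_terms(grid))]
```

The grid minimizer and the analytic formula agree to the grid resolution, which is the numerical content of the first-order condition.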

Next we derive an explicit formula for ϕ 1 ( t,x,l ) . Guided by the terminal condition (19), we assume ϕ 1 ( t,x,l ) takes the form:

ϕ_1(t,x,l) = λ_2 e^{−r_0 t} P_1(t)[ x² + Q_1(t)x ] + R_1(t)xl + U_1(t)l² + V_1(t)l + K_1(t), (31)

where P 1 ( t ) , Q 1 ( t ) , R 1 ( t ) , U 1 ( t ) , V 1 ( t ) , K 1 ( t ) are functions of t to be determined. The boundary condition (19) implies that

P_1(T) = 1, Q_1(T) = −2x_0 e^{r_0 T}, K_1(T) = λ_2 x_0² e^{r_0 T}, R_1(T) = U_1(T) = V_1(T) = 0, (32)

From (31) we obtain

ϕ_t^1 = λ_2 e^{−r_0 t}{ −r_0 P_1(t)[ x² + Q_1(t)x ] + P_1(t)Q_1'(t)x + P_1'(t)[ x² + Q_1(t)x ] } + R_1'(t)xl + U_1'(t)l² + V_1'(t)l + K_1'(t),
ϕ_x^1 = λ_2 e^{−r_0 t} P_1(t)[ 2x + Q_1(t) ] + R_1(t)l,
ϕ_l^1 = R_1(t)x + 2U_1(t)l + V_1(t),
ϕ_xx^1 = 2λ_2 e^{−r_0 t} P_1(t), ϕ_ll^1 = 2U_1(t), ϕ_xl^1 = R_1(t). (33)

Substituting (33) into (28) and (29), the optimal controls can be written in terms of the time-dependent coefficients P_1(t), Q_1(t) and R_1(t). Inserting (33), together with (28) and (29), into the HJB equation (18) and simplifying, the left-hand side of (18) becomes a linear combination of x², l², xl, x, l and a constant term. Since the equation must hold for all x and l, each coefficient must vanish, which yields the following system of differential equations:

P_1'(t) + ( r_0 − δ² − λ_2 P_1(t) ) P_1(t) = 0 (34)

U_1'(t) + ( 2α + η² )U_1(t) − ( P_1(t) + (δ + ρη)² λ_2 ) e^{r_0 t} (R_1(t))² / ( 4P_1(t) ) = 0 (35)

R_1'(t) + ( r_0 + α − δ² − δρη − λ_2 P_1(t) ) R_1(t) = 0 (36)

( P_1'(t)/P_1(t) ) Q_1(t) + Q_1'(t) − ( δ² + λ_2 P_1(t) ) Q_1(t) + 2( (1 − c̃)C_1(t)e^{αt} − B* e^{βt} ) − λ_1 = 0 (37)

V_1'(t) + α V_1(t) + ( (1 − c̃)C_1(t)e^{αt} − B* e^{βt} − λ_1/2 − (1/2)( δ² + δρη + λ_2 P_1(t) )Q_1(t) ) R_1(t) = 0 (38)

K_1'(t) + λ_2 e^{−r_0 t} P_1(t)Q_1(t)[ (1 − c̃)C_1(t)e^{αt} − B* e^{βt} − λ_1/2 − (1/4)( δ² + λ_2 P_1(t) )Q_1(t) ] − (λ_1² e^{−r_0 t})/4 = 0 (39)

with boundary conditions (32).

Differential Equations (34)-(39) with the boundary conditions in (32) can be solved explicitly. First, because R_1(T) = 0, solving (36) gives R_1(t) = 0 for 0 ≤ t ≤ T. Consequently, we obtain from Equations (35) and (38) that U_1(t) = V_1(t) = 0 for 0 ≤ t ≤ T. Second, solving (34) gives

P_1(t) = 1/( λ_2(T − t) + 1 ) if r_0 = δ², and
P_1(t) = ( r_0 − δ² ) / ( λ_2 + ( r_0 − δ² − λ_2 ) e^{−( r_0 − δ² )( T − t )} ) if r_0 ≠ δ².

Using (34) in (37), we obtain that Q 1 ( t ) satisfies

Q_1(t) = 2 e^{r_0 t}[ ∫_t^T (1 − c̃)C_1(s) e^{(α − r_0)s} ds − B*(T − t) − x_0 ] + (λ_1/r_0)( e^{−r_0(T − t)} − 1 ) if r_0 = β, and
Q_1(t) = 2 e^{r_0 t}[ ∫_t^T (1 − c̃)C_1(s) e^{(α − r_0)s} ds − x_0 − ( B*/(β − r_0) )( e^{(β − r_0)T} − e^{(β − r_0)t} ) ] + (λ_1/r_0)( e^{−r_0(T − t)} − 1 ) if r_0 ≠ β.

Finally, solving Equation (39) yields an explicit expression for K 1 ( t ) :

K_1(t) = λ_2 x_0² e^{r_0 T} + ∫_t^T e^{−r_0 s}{ λ_2 P_1(s)Q_1(s)[ (1 − c̃)C_1(s)e^{αs} − B* e^{βs} − λ_1/2 − (1/4)( δ² + λ_2 P_1(s) )Q_1(s) ] − λ_1²/4 } ds.

We now check the constraint given in (30), which is:

ϕ_xx^1 = 2 λ_2 e^{−r_0 t} P_1(t) > 0.

Here P_1(t) is defined by Equation (24), and λ_2 > 0. Clearly P_1(t) > 0 when r_0 = δ². If r_0 ≠ δ², one can likewise verify P_1(t) > 0 by treating the two cases r_0 > δ² and r_0 < δ² separately. □
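The closed form for P_1 can be checked numerically against the Riccati equation (34), read here with the sign convention P_1′ + (r_0 − δ² − λ_2 P_1)P_1 = 0; the parameter values below are illustrative. A central finite difference of P_1 should make the residual vanish, and the r_0 = δ² branch is the limit of the general branch.

```python
import numpy as np

def P1(t, T=10.0, lam2=0.5, r0=0.03, delta=0.2):
    """Closed form (24): limiting branch when r0 = delta^2,
    general branch otherwise."""
    k = r0 - delta**2
    if abs(k) < 1e-12:
        return 1.0 / (lam2 * (T - t) + 1.0)
    return k / (lam2 + (k - lam2) * np.exp(-k * (T - t)))

# Residual of the Riccati equation P' + (r0 - delta^2 - lam2 P) P = 0
# via central finite differences at several interior times.
T, lam2, r0, delta = 10.0, 0.5, 0.03, 0.2
h = 1e-6
residuals = []
for t in (0.0, 2.5, 5.0, 9.0):
    p = P1(t, T, lam2, r0, delta)
    dp = (P1(t + h, T, lam2, r0, delta)
          - P1(t - h, T, lam2, r0, delta)) / (2.0 * h)
    residuals.append(abs(dp + (r0 - delta**2 - lam2 * p) * p))
max_residual = max(residuals)
```

Along the path P_1 stays positive and P_1(T) = 1, consistent with the terminal condition and with the positivity needed in (30).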

Remark 3.1 It is notable that the optimal asset-allocation strategy π 1 ψ 2 ( t,x,l ) in (20) does not depend on the parameter λ 2 , which measures aversion to intergenerational risk sharing.

Remark 3.2 It is also notable that the optimal asset-allocation strategy π_1^{ψ_2}(t,x,l) given in (20) does not depend on the salary level l at time t. From the optimal benefit-adjustment rule f^{ψ_2}(t,x,l) in (21), one obtains the optimal aggregate benefit payment rate for all retirees at time t, denoted B^*(t). This adjusted aggregate payment takes the following form (cf. (4)):

B^*(t) = I(t) f^{ψ_2}(t,x,l) l = ( B* e^{βt} + λ_1/2 + λ_2 P_1(t)( x + Q_1(t)/2 ) ) / ( 1 − q(t) ). (40)

The expression above does not depend on the salary l at time t . Likewise, the value function in (22) is independent of l , which implies that fluctuations in salary are fully offset by the optimal benefit-adjustment factor f ψ 2 ( t,x,l ) .

Remark 3.3 It is clear that the strategies π 1 ψ 2 ( t,x,l ) , f ψ 2 ( t,x,l ) and the value function ϕ 1 ( t,x,l ) depend on the policy ψ 2 =( π 2 ( ),q( ) ) via the expressions in (21) and (23).

3.2. Optimal Strategy of the Leader

In this subsection, we solve the optimal control problem (15) and derive the leader's optimal strategy ψ_2^* within the set of admissible policies Ψ_2.

The value function of the reinsurer is defined as

ϕ(t,y,l) := min_{( ψ_1^*, ψ_2 ) ∈ Ψ} J_2(t,y,l; ψ_1^*, ψ_2), t, y, l > 0, (41)

where J 2 ( t,y,l ) is given by (12) and ψ 1 * =( π 1 ψ 2 ( ), f ψ 2 ( ) ) is given by (20) and (21).

As in Section 3.1, we derive the HJB equation associated with the stochastic control problem (41); the value function ϕ(t,y,l) satisfies:

min_{ψ_2 ∈ Ψ_2}{ ϕ_t + [ r_0 y + (μ − r_0)π_2 + c̃C_1(t)e^{αt} − q l f^{ψ_2}(t,y,l) I(t) ] ϕ_y + αl ϕ_l + (1/2)π_2² σ² ϕ_yy + (1/2)η² l² ϕ_ll + ρσηl π_2 ϕ_yl + ( q(t)/(1 − q(t)) − θ(t) )² + λ_3( q(t)/(1 − q(t)) − θ(t) ) } = 0 (42)

with the boundary condition

ϕ(T,y,l) = λ_4( y − y_0 e^{r_0 T} )² e^{−r_0 T}, (43)

where ϕ_t, ϕ_y, ϕ_yy, ϕ_l, ϕ_ll and ϕ_yl are partial derivatives of ϕ(t,y,l).

We state our findings on the optimal strategy for optimal control problem (41) in the following theorem.

Theorem 3.2. For the optimal control problem (41), the optimal asset allocation policy π_2^* and the optimal reinsurance policy q^* are given, respectively, by

π_2^*(t,y,l) = −(δ/σ)[ y + Q(t)/2 ], (44)

q * ( t,y,l )= q ˜ ( t,y,l ) 1+ q ˜ ( t,y,l ) , (45)

where q̃(t,y,l) = θ(t) − λ_3/2 + ( M(t)/2 ) λ_4 P(t)[ 2y + Q(t) ], and the corresponding value function is given by

ϕ(t,y,l) = λ_4 e^{−r_0 t} P(t)[ y² + yQ(t) ] + K(t), (46)

where

K(t) = λ_4 y_0² e^{r_0 T} + ∫_t^T e^{−r_0 s}{ λ_4 M²(s) e^{r_0 s} P(s)Q(s)[ c̃C_1(s)e^{αs} − θ(s) + λ_3/2 − (1/4)( δ² + λ_4 M²(s) e^{r_0 s} P(s) )Q(s) ] − λ_3²/4 } ds, (47)

and the expressions for P(t) and Q(t) are given below:

P(t) = 1/( ∫_t^T λ_4 M²(s) e^{r_0 s} ds + 1 ), r_0 = δ²,
P(t) = e^{−( r_0 − δ² )( T − t )} / ( 1 − e^{−( r_0 − δ² )( T − t )} ∫_t^T λ_4 M²(s) e^{2 r_0 s} ds ), r_0 ≠ δ², (48)

Q(t) = −2 y_0 e^{r_0 T} − δ²( T − t ). (49)

Proof. The proof of this theorem is similar to that of Theorem 3.1 and is omitted here. □

Through Theorem 3.1 and Theorem 3.2, we obtain an SE of the proposed leader-follower Stackelberg game, ψ^* = ( ( π_1^{ψ_2^*}, f^{ψ_2^*} ), ( π_2^*, q^* ) ).

Remark 3.4. It is worth noting that the optimal asset allocation strategy π_2^*(t,y,l) given by (44) and the optimal reinsurance strategy q^*(t,y,l) are independent of the salary level l at time t. Furthermore, the value function in (46) at time t does not depend on the salary level l. This shows that salary fluctuations are effectively hedged by the optimal reinsurance strategy q^*(t,y,l).

4. Conclusion

In this paper, we considered the continuous-time version of the Stackelberg game for TBPs. We used the premium and benefit principles of TBPs to describe the surplus process of the insurer, and assumed that the insurer and the reinsurer have different objectives: the objective of the insurer is the TBP criterion, while that of the reinsurer is to minimize its expected loss.

In Theorem 3.1, we derived the expression for the best-response strategy, denoted by ψ_1^* = ( π_1^{ψ_2}, f^{ψ_2} ), and the value function ϕ_1(t,x,l). In Theorem 3.2, we obtained the expression for the reinsurer's strategy, denoted by ψ_2^* = ( π_2^*, q^* ), and the value function ϕ(t,y,l). Through Theorems 3.1 and 3.2, we showed that this Stackelberg game has a Stackelberg equilibrium ψ^*.

The model relies on several simplifying assumptions that may limit its practical applicability. In particular, market parameters such as drift and volatility are treated as constant, and the objectives for the pension fund and reinsurer are specified by particular quadratic functions. These choices exclude time-varying risk premia, stochastic volatility and more general preference representations, all of which can materially alter optimal strategies. The framework also omits important long-term pension factors, including stochastic interest rates, longevity risk and model uncertainty. To address these shortcomings, future work could take several concrete directions. First, introduce stochastic short-rate dynamics (for example, Vasicek or CIR) to assess interest-rate risk and examine duration-matching approaches. Second, incorporate stochastic volatility or regime-switching processes to capture time-varying risk premia. Third, replace quadratic objectives with utility-based or risk-measure-driven criteria (e.g., CRRA utility or mean-CVaR) to reflect nonlinear risk preferences. Fourth, apply ambiguity-averse or robust-control techniques to mitigate parameter misspecification and model uncertainty. Finally, embedding mortality tables and multi-period liability dynamics would enhance the model's relevance for pension practice.

Funding

This work was supported by the National Key Research and Development Program of China (2022YFA1004600).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Bai, L., & Guo, J. (2008). Optimal Proportional Reinsurance and Investment with Multiple Risky Assets and No-Shorting Constraint. Insurance: Mathematics and Economics, 42, 968-975.
[2] Bai, Y. F., Zhou, Z. B., Xiao, H. L., Gao, R., & Zhong, F. M. (2021). A Hybrid Stochastic Differential Reinsurance and Investment Game with Bounded Memory. European Journal of Operational Research, 296, 717-737.
https://www.sciencedirect.com/science/article/abs/pii/S0377221721003817
[3] Bai, Y., Zhou, Z., Xiao, H., & Gao, R. (2020). A Stackelberg Reinsurance-Investment Game with Asymmetric Information and Delay. Optimization, 70, 2131-2168.
[4] Bäuerle, N. (2005). Benchmark and Mean-Variance Problems for Insurers. Mathematical Methods of Operations Research, 62, 159-165.
[5] Boonen, T. J., Cheung, K. C., & Zhang, Y. Y. (2021). Bowley Reinsurance with Asymmetric Information on the Insurer's Risk Preferences. Scandinavian Actuarial Journal, 2021, 623-644.
[6] Chan, F., & Gerber, H. U. (1985). The Reinsurer's Monopoly and the Bowley Solution. ASTIN Bulletin, 15, 141-148.
[7] Chang, S. C., Tzeng, L. Y., & Miao, J. C. Y. (2003). Pension Funding Incorporating Downside Risks. Insurance: Mathematics and Economics, 32, 217-228.
[8] Chen, L., & Shen, Y. (2018). On a New Paradigm of Optimal Reinsurance: A Stochastic Stackelberg Differential Game between an Insurer and a Reinsurer. ASTIN Bulletin, 48, 905-960.
[9] Chen, L., & Shen, Y. (2019). Stochastic Stackelberg Differential Reinsurance Games under Time-Inconsistent Mean-Variance Framework. Insurance: Mathematics and Economics, 88, 120-137.
[10] Cheung, K. C., Yam, S. C. P., & Zhang, Y. (2019). Risk-Adjusted Bowley Reinsurance under Distorted Probabilities. Insurance: Mathematics and Economics, 86, 64-72.
[11] Chi, Y., Tan, K. S., & Zhuang, S. C. (2020). A Bowley Solution with Limited Ceded Risk for a Monopolistic Reinsurer. Insurance: Mathematics and Economics, 91, 188-201.
[12] CIA (2015). Report of the Task Force on Target Benefit Plans.
https://www.cia-ica.ca/publications/215043e/
[13] Cui, J., Jong, F. D., & Ponds, E. (2011). Intergenerational Risk Sharing within Funded Pension Schemes. Journal of Pension Economics and Finance, 10, 1-29.
[14] Fleming, W. H., & Soner, H. M. (2006). Controlled Markov Processes and Viscosity Solutions. Springer.
[15] Gerrard, R., Haberman, S., & Vigna, E. (2004). Optimal Investment Choices Post-Retirement in a Defined Contribution Pension Scheme. Insurance: Mathematics and Economics, 35, 321-342.
[16] Gollier, C. (2008). Intergenerational Risk-Sharing and Risk-Taking of a Pension Fund. Journal of Public Economics, 92, 1463-1485.
[17] Gu, A., Viens, F. G., & Shen, Y. (2020). Optimal Excess-of-Loss Reinsurance Contract with Ambiguity Aversion in the Principal-Agent Model. Scandinavian Actuarial Journal, 2020, 342-375.
[18] Li, D., & Young, V. R. (2021). Bowley Solution of a Mean-Variance Game in Insurance. Insurance: Mathematics and Economics, 98, 35-43.
[19] Liang, Z., & Bayraktar, E. (2014). Optimal Reinsurance and Investment with Unobservable Claim Size and Intensity. Insurance: Mathematics and Economics, 55, 156-166.
[20] Promislow, S. D., & Young, V. R. (2005). Minimizing the Probability of Ruin When Claims Follow Brownian Motion with Drift. North American Actuarial Journal, 9, 110-128.
[21] van Bommel, J. (2007). Intergenerational Risk Sharing and Bank Raids. Working Paper, University of Oxford.
[22] Vigna, E., & Haberman, S. (2001). Optimal Investment Strategy for Defined Contribution Pension Schemes. Insurance: Mathematics and Economics, 28, 233-262.
[23] Wang, N., & Siu, T. K. (2020). Robust Reinsurance Contracts with Risk Constraint. Scandinavian Actuarial Journal, 2020, 419-453.
[24] Wang, S., Lu, Y., & Sanders, B. (2018). Optimal Investment Strategies and Intergenerational Risk Sharing for Target Benefit Pension Plans. Insurance: Mathematics and Economics, 80, 1-14.
[25] Westerhout, E. (2011). Intergenerational Risk Sharing in Time-Consistent Funded Pension Schemes. SSRN Electronic Journal.
[26] Yang, L. Y., Zhang, C. K., & Zhu, H. N. (2021). Robust Stochastic Stackelberg Differential Reinsurance and Investment Games for an Insurer and a Reinsurer with Delay. Methodology and Computing in Applied Probability, 24, 361-384.
[27] Yong, J., & Zhou, X. Y. (1999). Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer.

Copyright © 2025 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.