Stackelberg Differential Game for Target Benefit Pension Plans
1. Introduction
Reinsurance operates as an effective means of risk transfer between insurers and reinsurers. By paying reinsurance premiums, an insurer transfers a portion of its risk to a reinsurer. The academic study of reinsurance strategies has developed over decades, with special emphasis placed on optimal reinsurance models. Notable contributions include Promislow and Young (2005), Bäuerle (2005), Bai and Guo (2008), and Liang and Bayraktar (2014), among others.
The paper examines reinsurance through the lens of game theory, taking into account the objectives of both the insurer and the reinsurer. Game-theoretic formulations fall into two broad classes: cooperative games, which seek Pareto-efficient outcomes, and non-cooperative games, typified by the Stackelberg setting. In a Stackelberg framework, actions unfold sequentially: the reinsurer first sets the premium rule and commits to a menu of admissible indemnity contracts; the insurer then chooses a best-response strategy given these terms. This timing structure captures the core strategic tension between the parties.
Within the literature on Stackelberg games in reinsurance, two models dominate. The first is a static setup involving a single interaction. To the best of current knowledge, Chan and Gerber (1985) were the first to solve such a problem in reinsurance, maximizing the parties' expected utilities of terminal surplus. Building on this, Cheung et al. (2019) analyze distortion risk measures for both players, while Chi et al. (2020) address a related formulation with constraints on the ceded portion. Boonen et al. (2021) consider an insurer of one of two types, each represented by a distortion risk measure, with asymmetric information arising because the reinsurer cannot observe the insurer's type. Finally, Li and Young (2021) develop a mean-variance Stackelberg model, which serves as the one-period analogue of the dynamic game studied in this paper.
The second strand is the continuous-time setting, often framed as a Stackelberg differential game. A growing body of work studies reinsurance in this framework, including Chen and Shen (2018, 2019), Gu et al. (2020), Wang and Siu (2020), Bai et al. (2020, 2021), and Yang et al. (2021). Across these papers, the insurer and reinsurer typically optimize, over a fixed horizon, either a mean-variance objective (e.g., Chen & Shen, 2019; Li & Young, 2021) or an expected-utility objective (Chen & Shen, 2019; Gu et al., 2020; Wang & Siu, 2020; Bai et al., 2020, 2021; Yang et al., 2021).
Canadian target benefit plans (TBPs) are collective pension arrangements with fixed contributions, or contributions confined to a narrow, predetermined range, and a target benefit determined by a salary-linked formula. Actual benefits may end up above or below the target. A combined benefit, funding, and investment policy sets out how benefit levels, contribution rates (if adjustable), and the asset mix are revised in light of unfolding experience (CIA, 2015).
A defining aspect of TBPs is that members shoulder all risks, yet those risks are pooled across generations rather than borne individually. Without an external sponsor guarantee, the plan enables transfers between cohorts so that temporary cross-subsidies, including to and from future entrants, can help smooth benefit levels over time. Effective intergenerational risk sharing has been shown to improve welfare relative to traditional defined benefit plans and individual defined contribution plans (Gollier, 2008; Cui et al., 2011; Wang et al., 2018). That said, the sharing mechanism must be designed with fairness in mind: distributing too large a share of emerging surpluses can create discontinuity risk (Westerhout, 2011), while distributing too little may invite attempts to "raid the bank" (Van Bommel, 2007).
In our framework, the fund invests in a risk-free asset and a risky asset, and benefit payments depend on plan wealth and a stated benefit target. The trustees' objective is to minimize the cumulative squared and linear deviations of benefit outgo from the target over the distribution period, and to reduce discontinuity risk measured at the end of the horizon. Similar combinations of objectives, mixing linear and quadratic penalties with interim and terminal targets, appear elsewhere in the literature. Yong and Zhou (1999) provide a comprehensive treatment of linear-quadratic control. Vigna and Haberman (2001) analyze both investment and annuitization risks for an individual, using a sequence of interim targets and a retirement target linked to the desired net replacement ratio in a discrete-time model. Chang et al. (2003) introduce linear and quadratic penalties to weigh negative fund deviations more heavily than positive ones. Gerrard et al. (2004) study risk management in the decumulation phase of a DC plan, where the retiree may defer annuitization, consume from the fund, and invest the remainder; they derive a natural interim target for fund value that guides optimization. Incorporating interim benefit targets also aligns with practice in TBPs, where sponsors adjust the investment mix and/or benefit levels in response to realized experience by selecting the most suitable course of action.
In summary, we study a dynamic Stackelberg reinsurance game and advance the literature on such games in two main respects. First, whereas most prior work models the insurer's surplus via the Cramér-Lundberg framework, we instead build on the premium and benefit principles used in target benefit plans (TBPs). Second, the continuous-time studies cited above typically assign similar objectives to the insurer and reinsurer. In reality, reinsurers, due to their larger scale and diversified portfolios, typically prioritize expected returns, whereas pension funds must balance returns with risk to ensure benefit security for members. Accordingly, in our setting the insurer's objective follows the TBP criterion, and the reinsurer seeks to minimize its expected loss.
The paper proceeds as follows. Section 2 sets out the model, covering the financial market, the TBP structure, and the objective functions. Section 3 characterizes the Stackelberg equilibrium when the insurer uses the TBP criterion and the reinsurer minimizes loss. Section 4 offers concluding remarks.
2. Model Formulation
Let
be a complete probability space. The
is a complete and right-continuous filtration, and
is generated by a two-dimensional standard Brownian Motion
, and
is a probability measure defined on Ω.
2.1. Financial Market
We assume that there are two underlying assets available to the insurer and reinsurer: one risk-free asset (a bank account) and one risky asset (a stock). The evolution of the value of the risk-free asset,
, over time is given by
(1)
where
represents the risk-free interest rate.
Let the price of the underlying stock (risky asset) at time
be
and suppose that the value of the risky asset at time
is described by the stochastic differential equation (SDE)
(2)
where
is the appreciation rate of the stock and
is the volatility rate, both
and
being positive constants, and
is a standard Brownian motion. To exclude arbitrage opportunities, we assume that
.
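The two-asset market can be illustrated with a short simulation. The sketch below assumes, as is standard for a bank account and a stock with constant appreciation and volatility rates, exponential growth for the risk-free asset and geometric Brownian motion for the risky asset; the function name, the argument names `r0`, `mu`, `sigma`, and the unit initial prices are illustrative assumptions rather than the paper's notation.

```python
import math
import random


def simulate_market(r0, mu, sigma, horizon, n_steps, seed=0):
    """Simulate one path of a bank account and a stock on [0, horizon].

    Assumed dynamics (illustrative, both prices start at 1):
      bank:  dS0 = r0 * S0 dt
      stock: dS1 = S1 * (mu dt + sigma dW)   (geometric Brownian motion)
    The stock uses the exact log-normal update, so there is no
    discretization bias at the grid points.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    times, bank, stock = [0.0], [1.0], [1.0]
    for k in range(1, n_steps + 1):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        bank.append(bank[-1] * math.exp(r0 * dt))
        stock.append(
            stock[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * dw)
        )
        times.append(k * dt)
    return times, bank, stock


if __name__ == "__main__":
    # Hypothetical parameter values with mu > r0.
    t, s0, s1 = simulate_market(r0=0.02, mu=0.06, sigma=0.2,
                                horizon=1.0, n_steps=250)
    print(s0[-1], s1[-1])
```

Setting `sigma=0` removes the noise, in which case the stock grows deterministically at rate `mu`; this is a convenient sanity check on the update rule.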
2.2. Membership and Plan Provisions
Consider a plan that includes both active and retired members. Active members contribute to the pension fund, while retired members draw benefits from it. All participants enter the TBP at age
and retire at age
, and their survival is described by a function
with
for
. For ages
, members may exit the plan due to death or other causes; once
, death is the sole decrement.
Definition 2.1. The legacy insurance model under investigation is formally defined by the following 6-tuple:
where
1) Let
denote the number of new entrants aged a who join the plan at time
.
2) The total active membership at time
,
represents the population of active members currently contributing to the pension fund and is given by
3) The total count of retired members at time
is represented by
,
4) Let
denote the retirement salary of an individual who is age
at time
. It is specified by
(3)
where
represents the annual salary rate for a member retiring at time
. The process
follows
,
. Here,
is the expected instantaneous growth rate of salary,
is the instantaneous volatility, and
is a standard Brownian motion. The Brownian motion
is correlated with
under
with correlation coefficient.
5) Let
denote the total benefit payment rate to all retirees at time
. It is given by the expression:
(4)
where
is the constant annual rate at which the plan grants cost-of-living adjustments to pensions, and
is a control variable chosen dynamically by the plan trustees.
6)
denotes the aggregate instantaneous contribution rate for all active members at time
, defined as follows:
(5)
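Item 4) states that the Brownian motion driving salaries is correlated with the one driving the stock. A minimal sketch of the usual construction, which mixes two independent standard normal draws, is given here; the function names, the parameter `rho`, and the step sizes are illustrative assumptions and do not come from the paper.

```python
import math
import random


def correlated_increments(rho, dt, n_steps, seed=0):
    """Return increments (dW, dW_bar) of two Brownian motions with correlation rho.

    Construction: dW_bar = rho * dW + sqrt(1 - rho^2) * dZ,
    where dW and dZ are independent increments with variance dt.
    """
    rng = random.Random(seed)
    scale = math.sqrt(dt)
    comp = math.sqrt(1.0 - rho ** 2)
    dw, dw_bar = [], []
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dw.append(scale * z1)
        dw_bar.append(scale * (rho * z1 + comp * z2))
    return dw, dw_bar


def sample_correlation(xs, ys):
    """Plain sample correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    dw, dw_bar = correlated_increments(rho=0.5, dt=0.01, n_steps=100000)
    print(sample_correlation(dw, dw_bar))  # close to 0.5 for a long sample
```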
Remark 2.1.
1) For further details on the model specification, see Wang et al. (2018).
2) For notational convenience, let
, where
. Also, let
, where
is a positive function of
, defined as
(6)
2.3. Wealth Process and Objective Function
In this subsection we derive the wealth dynamics of the pension fund and the reinsurer for the model outlined above, taking into account investment and reinsurance strategies, contributions from active members, and benefit payments to retirees.
Assume that the pension fund can invest in both the risk-free and risky assets described by (1) and (2), respectively, and use the fund to pay retirement benefits. Let
denote the initial wealth of this fund,
denote the amount that the plan manager invests in the risky asset at time
,
denote the fraction of benefits paid by the reinsurance company,
denote the fraction of contributions paid by the pension fund to the reinsurance company, and let
be the pension fund's wealth at time
after implementing the investment strategy
and the reinsurance strategy
. The fund's value then evolves according to the following dynamics:
(7)
Using Equations (1), (2), (4) and (5), Equation (7) can be rewritten as follows:
(8)
Let
denote the strategy the pension fund follows on the interval
. Each pair consists of the investment decision
, the amount allocated to the risky asset at time
, and the benefit adjustment factor
applied at that time. Below we give the formal definition of an admissible strategy for the stochastic differential equation (8).
Definition 2.2. For a fixed
, a strategy
is said to be admissible if
i)
is
-adapted;
ii)
and
;
iii)
is the unique solution to SDE (8).
Let
be the set of all admissible strategies
.
The objective of this continuous-time asset allocation and benefit distribution problem is to minimize the expected discounted losses over the remaining period until time
, where the losses correspond to the benefit risk and the discontinuity risk defined above. Since participants are concerned with both shortfall risk (benefit payments falling below the target) and benefit instability (deviations in either direction from the target), the loss function includes both linear and quadratic terms for benefit risk. Let
denote the objective function at time
, where the fund value and the salary level at time
are
and
, respectively. It is defined as follows:
(9)
This mathematical formulation corresponds to the practical goals of TBP trustees mentioned in the Introduction. Both
and
are nonnegative constants that serve as penalty weights:
penalizes negative deviations of
from
, while
penalizes failure to reach the terminal fund target. The expectations in (9) are conditional on the state variables
and
at time
. The values of
and
capture the trade-offs chosen by the TBP's stakeholders (for example, the employer, different member groups, and regulators) among benefit stability (weighted by
), benefit adequacy (weighted by
) and intergenerational equity (weighted by
). These penalty weights are a design choice of the plan and cannot be altered by the pension fund.
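To see why a loss with both a quadratic and a linear term treats shortfalls and surpluses differently, consider the following small sketch. It assumes a running loss of the form (deviation squared) minus (weight times deviation), which is one common way to combine the two penalties; the exact integrand of (9), the function name, and the argument `lam1` are assumptions made here for illustration only.

```python
def benefit_loss(benefit, target, lam1):
    """Illustrative running loss: quadratic term plus a linear shortfall penalty.

    Assumed form (not taken from the paper):
        (benefit - target)^2 - lam1 * (benefit - target)
    With lam1 > 0, a shortfall costs more than a surplus of the same size.
    """
    deviation = benefit - target
    return deviation ** 2 - lam1 * deviation


if __name__ == "__main__":
    # Hypothetical numbers: target 10, benefits one unit below and above it.
    print(benefit_loss(9.0, 10.0, 0.5))   # shortfall of 1
    print(benefit_loss(11.0, 10.0, 0.5))  # surplus of 1
```

Under this assumed form, the quadratic term alone penalizes instability symmetrically, while the linear term tilts the loss against shortfalls; with `lam1 = 0` the loss is symmetric around the target.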
Assume the reinsurance fund can allocate resources to both a risk-free asset and a risky asset, and that it contributes toward paying some portion of retirement benefits. Let
be the fund’s initial capital. At time
, let
be the amount invested in the risky asset and let
denote the fund’s wealth under the investment strategy
and the reinsurance strategy
. The evolution of the fund's wealth is then governed by the following stochastic dynamics:
(10)
Using (1), (2), (4) and (5), we can easily rewrite (10) as
(11)
Let
denote the strategy employed by the reinsurer over the interval
. Each pair
consists of the investment amount
allocated to the risky asset at time
and the reinsurance parameter
at time
. Below we give the definition of an admissible strategy in relation to the SDE (11).
Definition 2.3. For any fixed
, a strategy
is said to be admissible if
i)
is
-adapted;
ii)
and
;
iii)
is the unique solution to SDE (11).
Let
be the set of all admissible strategies
.
Similarly, the goal for the reinsurer is to minimize expected discounted losses over the remaining time period until time
, where the losses correspond to the discontinuity risk. Let
denote the objective function evaluated at time
, when the pension fund’s wealth is
and the salary level is
. It is defined by the expression:
(12)
where
is a penalty for the effective reinsurance premium rate deviating from a target safety loading. Let
denote the reinsurance safety loading at time
. The constants
and
are nonnegative penalty weights:
penalizes the nonnegative deviation between
and
, while
penalizes failure to reach the terminal fund target. For notational convenience, let
and
be equivalent to
and
.
3. Stochastic Differential Game for Target Benefit Pension
In this section we study a Stackelberg reinsurance problem involving two players: a pension fund and a reinsurer. The game is played over the strategy spaces specified in Definitions 2.2 and 2.3, and the players' payoffs are given by the objective functionals in (9) and (12). The sequence of moves in the Stackelberg framework is as follows:
1) Leader First Announces Its Strategy to the Follower: The game begins when the reinsurer announces an admissible strategy
to the pension fund;
2) Follower Selects Its Best-Response Strategy: When the leader's strategy
is revealed, the follower chooses a best-response from its strategy set
. Denote this response by
; it is obtained by solving the following optimization problem:
(13)
3) Leader Selects Its Optimal Strategy Based on the Follower's Best-Response Strategy: Given the follower's best-response mapping
, the leader chooses an optimal strategy from
. Denote this optimal leader strategy by
; it is found by solving the following optimization problem:
(14)
The solution concept for this game is the Stackelberg equilibrium (SE). At the SE, the leader chooses a strategy that minimizes its objective while taking the follower's best-response mapping into account; this choice is the leader's equilibrium strategy. Knowing the leader's action, the follower then selects the best-response strategy that minimizes its own objective; this choice is the follower's equilibrium strategy. The precise definition of the SE for the described Stackelberg game is given below.
Definition 3.1. Denote
to be the best-response strategy of the follower identified by solving problem (13). Let
be equivalent to
for notational convenience. Given the above notation, a strategy
is an SE for the Stackelberg game if it corresponds to the solution of the following optimization problems
(15)
(16)
where
is the set of admissible strategies.
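The leader-follower logic of Definition 3.1 can be illustrated on a deliberately simple static toy game, unrelated to the pension model itself. In the sketch below both cost functions, the search interval, and all function names are hypothetical choices: the follower minimizes (b - a)^2 + b^2 for a given leader action a, so its best response is b = a/2, and the leader minimizes (a - 1)^2 + b^2 with that response substituted, which gives a = 0.8 and b = 0.4 analytically. The code recovers these values numerically by nesting two one-dimensional minimizations.

```python
import math


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal scalar function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def follower_cost(a, b):
    """Hypothetical follower cost for leader action a and follower action b."""
    return (b - a) ** 2 + b ** 2


def leader_cost(a, b):
    """Hypothetical leader cost for leader action a and follower action b."""
    return (a - 1.0) ** 2 + b ** 2


def best_response(a):
    """Step 2: the follower minimizes its own cost given the leader's action."""
    return golden_section_min(lambda b: follower_cost(a, b), -5.0, 5.0)


def stackelberg_equilibrium():
    """Step 3: the leader minimizes its cost along the best-response mapping."""
    a_star = golden_section_min(
        lambda a: leader_cost(a, best_response(a)), -5.0, 5.0
    )
    return a_star, best_response(a_star)


if __name__ == "__main__":
    print(stackelberg_equilibrium())  # approximately (0.8, 0.4)
```

The differential game in this paper follows the same two-stage pattern, except that each minimization is a stochastic control problem solved through an HJB equation rather than a scalar search.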
3.1. Best-Response Strategy
In this subsection we apply standard techniques to solve the optimal control problem in (16) and obtain the follower's best-response strategy
among all admissible policies
.
The value function of the pension fund is defined as
(17)
where
is given by (9).
First, we derive the Hamilton-Jacobi-Bellman (HJB) equation corresponding to the stochastic control problem (17). For further background see, for example, Fleming and Soner (2006). Applying variational arguments together with Itô's formula yields the HJB equation satisfied by the value function
:
(18)
with the boundary condition
(19)
where
and
are partial derivatives of
.
We summarize the solution to the optimal control problem (17) in the following theorem. For notational convenience, denote the Sharpe ratio of the risky asset by
.
Theorem 3.1. For the optimal control problem (17), the optimal asset allocation
and the benefit-adjustment policy
are given by the following expressions, respectively:
(20)
(21)
and the corresponding value function is given by
(22)
where
(23)
and expressions of
and
are given below, depending on the values of the parameters
,
and
,
(24)
(25)
Proof. First, observe that the minimization in (18) with respect to
and
can be separated into two coupled subproblems: one minimizing with respect to
and the other with respect to
. These two dependent minimization problems are written as follows.
(26)
(27)
Differentiating the bracketed expression in (26) with respect to
, and the bracketed expression in (27) with respect to
, then setting the derivatives to zero and solving, yields immediately:
(28)
(29)
Obviously, a sufficient condition for
to be minimal is:
(30)
We will confirm this condition after deriving the expression for
.
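As a generic illustration of this first-order step (not the paper's specific expressions, which depend on the terms of (18)), suppose the part of the bracketed expression that involves a scalar control $u$ is quadratic in $u$. The coefficients $A$ and $B$ below are hypothetical placeholders for quantities that do not depend on $u$:

```latex
\begin{aligned}
g(u)      &= A\,u + \tfrac{1}{2} B\,u^{2}, \\
g'(u)     &= A + B\,u = 0 \;\Longrightarrow\; u^{*} = -\frac{A}{B}, \\
g''(u)    &= B > 0 \quad \text{(so that } u^{*} \text{ is a minimizer)}, \\
g(u^{*})  &= -\frac{A^{2}}{2B}.
\end{aligned}
```

In HJB equations of this type, the coefficient playing the role of $B$ usually involves a second derivative of the value function, which is why a sign condition such as (30) has to be checked once the value function is known.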
Next we derive an explicit formula for
. Guided by the terminal condition (19), we assume
takes the form:
(31)
where
,
,
,
,
,
are functions of t to be determined. The boundary condition (19) implies that
(32)
From (31) we obtain
(33)
Substituting (33) into (28) and (29), the optimal controls can be written in terms of the time-dependent coefficients
and
. Inserting (33), together with (28) and (29), into the HJB equation (18) and collecting coefficients of the powers and cross-terms
,
,
,
,
and the constant term, we obtain:
The coefficients of
,
,
,
,
and the constant term must vanish, which yields the following system of differential equations:
(34)
(35)
(36)
(37)
(38)
(39)
with boundary conditions (32).
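Systems of this kind are terminal-value problems: the coefficient functions are pinned down at the end of the horizon and must be integrated backward in time. When a closed form is not available or needs checking, a numerical backward integration is a simple alternative. The sketch below uses a classical fourth-order Runge-Kutta step with a negative time increment; the function name and the test equation p'(t) = p(t)^2 with p(T) = 1 are hypothetical and are not one of the equations (34)-(39).

```python
def solve_terminal_ode(f, t_end, p_end, n_steps, t_start=0.0):
    """Integrate p'(t) = f(t, p) backward from p(t_end) = p_end to t_start.

    Uses the classical RK4 scheme with step -h, and returns p(t_start).
    """
    h = (t_end - t_start) / n_steps
    t, p = t_end, p_end
    for _ in range(n_steps):
        k1 = f(t, p)
        k2 = f(t - 0.5 * h, p - 0.5 * h * k1)
        k3 = f(t - 0.5 * h, p - 0.5 * h * k2)
        k4 = f(t - h, p - h * k3)
        p -= h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t -= h
    return p


if __name__ == "__main__":
    # Hypothetical Riccati-type test case: p' = p^2, p(1) = 1,
    # whose exact solution is p(t) = 1 / (2 - t), so p(0) = 0.5.
    print(solve_terminal_ode(lambda t, p: p * p, 1.0, 1.0, 1000))
```

Comparing such a numerical solution with the closed-form coefficients is a quick way to catch sign or algebra slips in the derivation.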
Differential Equations (34)-(36) with boundary conditions in (32) can be easily solved. First, because
, solving (36) gives
for
. Consequently, we obtain from Equations (35) and (38) that
for
. Second, solving (34) gives
Using (34) in (37), we obtain that
satisfies
Finally, solving Equation (39) yields an explicit expression for
:
We now check the constraint given in (30), which is:
Here
is defined by Equation (24), and
. Clearly
when
. If
, one can likewise verify
by treating the two cases
and
separately. □
Remark 3.1. It is notable that the optimal asset-allocation strategy
in (20) does not depend on the parameter
, which measures aversion to intergenerational risk sharing.
Remark 3.2. It is also notable that the optimal asset-allocation strategy
given in (20) does not depend on the salary level
at time
. From the optimal benefit-adjustment rule
in (21), one obtains the optimal aggregate benefit payment rate for all retirees at time
, denoted
. This adjusted aggregate payment takes the following form (cf. (4)):
(40)
The expression above does not depend on the salary
at time
. Likewise, the value function in (22) is independent of
, which implies that fluctuations in salary are fully offset by the optimal benefit-adjustment factor
.
Remark 3.3. It is clear that the strategies
,
and the value function
depend on the policy
via the expressions in (21) and (23).
3.2. Optimal Strategy of the Leader
In this subsection, we solve the optimal control problem (15) and derive expressions for the leader's optimal strategy
among all admissible policies
.
The value function of the reinsurer is defined as
(41)
where
is given by (12) and
is given by (20) and (21).
As in Section 3.1, we derive the HJB equation associated with the stochastic control problem (41); it is satisfied by the value function
:
(42)
with the boundary condition
(43)
where
and
are partial derivatives of
.
We state our findings on the optimal strategy for the optimal control problem (41) in the following theorem.
Theorem 3.2. For the optimal control problem (41), the optimal asset allocation policy
and reinsurance policy
are given, respectively, by
(44)
(45)
where
and the corresponding value function is given by
(46)
where
(47)
and expressions of
and
are given below
(48)
(49)
Proof. The proof is similar to that of Theorem 3.1 and is omitted here.
Combining Theorem 3.1 and Theorem 3.2, we obtain an SE of the proposed leader-follower Stackelberg game,
.
Remark 3.4. It is worth noting that the optimal asset allocation strategy
given by (44) and optimal reinsurance strategy
are independent of the salary level
at time
. Furthermore, the value function in (46) at time
does not depend on the salary level
at time
. This shows that salary fluctuations are effectively hedged by the optimal reinsurance strategy
.
4. Conclusion
In this paper, we considered a continuous-time Stackelberg game for TBPs. We used the premium and benefit principles of TBPs to describe the surplus process of the reinsurer, and assumed that the insurer and the reinsurer have different objectives: the insurer's objective is the TBP criterion, whereas the reinsurer seeks to minimize its expected loss.
In Theorem 3.1, we derived the expression of the best-response strategy, denoted by
and the value function
. In Theorem 3.2, we obtained the expression of the reinsurer's strategy, denoted by
and the value function
. Combining Theorem 3.1 and Theorem 3.2, we showed that this Stackelberg game has a Stackelberg equilibrium
.
The model relies on several simplifying assumptions that may limit its practical applicability. In particular, market parameters such as drift and volatility are treated as constant, and the objectives for the pension fund and reinsurer are specified by particular quadratic functions. These choices exclude time-varying risk premia, stochastic volatility and more general preference representations, all of which can materially alter optimal strategies. The framework also omits important long-term pension factors, including stochastic interest rates, longevity risk and model uncertainty. To address these shortcomings, future work could take several concrete directions. First, introduce stochastic short-rate dynamics (for example, Vasicek or CIR) to assess interest-rate risk and examine duration-matching approaches. Second, incorporate stochastic volatility or regime-switching processes to capture time-varying risk premia. Third, replace quadratic objectives with utility-based or risk-measure-driven criteria (e.g., CRRA utility or mean-CVaR) to reflect nonlinear risk preferences. Fourth, apply ambiguity-averse or robust-control techniques to mitigate parameter misspecification and model uncertainty. Finally, embedding mortality tables and multi-period liability dynamics would enhance the model's relevance for pension practice.
Funding
Supported by the National Key Research and Development Program of China (2022YFA1004600).