
In this paper, we introduce a clustering method to approximate the solution to a general Backward Stochastic Differential Equation with Jumps (BSDEJ). We show the convergence of the sequence of approximate solutions to the true solution. The method is implemented for an application in finance, and numerical results show that it is efficient.

Solving BSDEs is of great importance in mathematical finance. Derivatives pricing problems, dynamic portfolio choice problems and even dynamic stochastic general equilibrium problems can all be related to various types of BSDEs. However, due to its recursive nature, a BSDE with jumps cannot be solved directly by the Monte Carlo method.

In this paper, we apply the methodology for evaluating conditional expectations that was originally introduced in [

Recent literature on machine learning methods for solving PDEs/BSDEs can be found in, for example, [

The organization of this paper is as follows. Section 2 describes the method and the convergence proof. Section 3 discusses an application in finance and Section 4 concludes. All proofs can be found in the Appendix.

Throughout the paper, we consider a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{0 \le t \le T}, \mathbb{P})$ with $T \in \mathbb{R}_+$. The space supports a $d$-dimensional Brownian motion $W_t = (W_t^1, \cdots, W_t^d)$ and a Poisson random measure $N$ on $\mathcal{B}([0,T]) \otimes \mathcal{E}$, where $\mathcal{B}([0,T])$ is the Borel $\sigma$-algebra on $[0,T]$ and $(E, \mathcal{E})$ is a measurable space; we take $E := \mathbb{R}^q$ and $\mathcal{E}$ the Borel $\sigma$-algebra on $E$. The filtration $(\mathcal{F}_t)_{0 \le t \le T}$ is right-continuous and completed with all $\mathbb{P}$-null sets, and $\mathcal{F}_t = \mathcal{F}_t^{W,N}$ is generated by $(W_t, N(\cdot, [0,t], \cdot))$ for $t \in [0,T]$. Assume that $\mathcal{F} = \mathcal{F}_T^{W,N}$ and that $W$ and $N$ are mutually independent under $\mathbb{P}$. Suppose that the compensating measure of $N$ is $\nu(dt, de) := \nu(de)\,dt$, where $\nu$ is a $\sigma$-finite measure on $(E, \mathcal{E})$ satisfying $\int_E (1 \wedge |e|^2)\,\nu(de) < \infty$. The corresponding compensated Poisson random measure is defined by $\tilde{N}(\omega, dt, de) := N(\omega, dt, de) - \nu(de)\,dt$.

The BSDEJ under consideration, whose solution is denoted by $(X, Y, Z, U)$, is

$$
\begin{cases}
dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t + \int_E \gamma(t^-, X_{t^-}, e)\,\tilde{N}(dt, de), & X_0 = x_0, \\
dY_t = -f(t, X_t, Y_t, Z_t, V_t)\,dt + Z_t\,dW_t + \int_E U_{t^-}(e)\,\tilde{N}(dt, de), & Y_T = \phi(X_T),
\end{cases}
\tag{1}
$$

where $V_t = \int_E U_t(e)\rho(e)\,\nu(de)$ for a given smooth and bounded function $\rho$. The vector of state variables $X_t \in \mathcal{F}_t$ is $r$-dimensional and satisfies the forward SDE (FSDE) above. The standard Brownian motion $W$ is $d$-dimensional, $Y_t \in \mathcal{F}_t$ is 1-dimensional, $Z_t \in \mathcal{F}_t$ is $d$-dimensional, $U_t(e) \in \mathcal{F}_t$ is $q$-dimensional, $V_t \in \mathcal{F}_t$ is 1-dimensional and $\tilde{N}$ is $q$-dimensional. $(X, Y, Z, U)$ is called the solution to the BSDEJ system (1). We assume throughout the paper, without loss of generality, that $r \le d$.
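To make the forward component concrete, the following sketch simulates paths of $X$ by an Euler scheme with compound-Poisson jumps. The coefficient functions, the intensity `lam` and the Gaussian mark distribution are illustrative assumptions, not the paper's choices; with mean-zero marks and $\gamma(t, x, e) = e$, the $\nu$-compensator of the jump integral vanishes, so the compensated jump term equals the raw jump sum.

```python
import numpy as np

def simulate_forward(mu, sigma, x0, T, n, M, lam, jump_std, seed=0):
    """Euler scheme for the FSDE in (1), assuming gamma(t, x, e) = e with
    marks e ~ N(0, jump_std^2) and Poisson intensity lam. Mean-zero marks
    make the nu-compensator of the jump term zero."""
    rng = np.random.default_rng(seed)
    h = T / n
    X = np.empty((n + 1, M))
    X[0] = x0
    for i in range(n):
        t = i * h
        dW = rng.normal(0.0, np.sqrt(h), size=M)
        nj = rng.poisson(lam * h, size=M)                 # jump counts on [t, t+h]
        # sum of nj iid N(0, jump_std^2) marks ~ N(0, nj * jump_std^2)
        jumps = rng.normal(0.0, jump_std * np.sqrt(nj))
        X[i + 1] = X[i] + mu(t, X[i]) * h + sigma(t, X[i]) * dW + jumps
    return X

# illustrative jump-diffusion with GBM-like drift/volatility
X = simulate_forward(mu=lambda t, x: 0.05 * x, sigma=lambda t, x: 0.2 * x,
                     x0=100.0, T=1.0, n=50, M=10_000, lam=1.0, jump_std=1.0)
```

The returned array holds one row per time node, so `X[i]` gives the $M$ simulated values at $t_i$, which is the shape the clustering step later operates on.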

This section describes the construction of the approximate solution to the BSDEJ (1). The intermediate solutions can be obtained from the following equations, according to [

$$
\begin{aligned}
Y_t &= u(t, X_t) = E\Big[\phi(X_T) + \int_t^T f(v, X_v, Y_v, Z_v, V_v)\,dv \,\Big|\, \mathcal{F}_t\Big], \\
Z_t &= \partial_x u(t, X_t)\,\sigma(t, X_t), \\
U_t(e) &= u(t, X_t + \gamma(t, X_t, e)) - u(t, X_t),
\end{aligned}
\tag{2}
$$

where $(Y_t, Z_t, U_t)$ are functions of $(t, X_t)$. It is therefore crucial to be able to evaluate the above conditional expectations for each $t$.

Suppose that the domain of the state variables $X$ is $D \subset \mathbb{R}^r$. Partition $D$ into a class of disjoint sets $\{U_k\}_{k=1}^K$. For a set $U$, denote its diameter by $d_U = \sup_{x, y \in U} |x - y|$, where $|\cdot|$ is the canonical Euclidean distance. It is known that the evaluation of conditional expectations is essentially a regression problem, in the following sense:

$$
E_t[\psi(X_T)] \equiv \operatorname*{arg\,min}_{\phi \in \Phi(D)} E\big[|\psi(X_T) - \phi(t, X_t)|^2\big],
\tag{3}
$$

where $\Phi(D)$ is an appropriate function space with domain $D$. First, we need the following result.
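The regression characterization (3) can be checked numerically. The sketch below, under the illustrative assumption that $X$ is a geometric Brownian motion and $\psi(x) = x^2$, projects $\psi(X_T)$ onto polynomials of $X_t$ by OLS and compares against the closed form $E_t[X_T^2] = X_t^2 e^{(2\mu + \sigma^2)(T-t)}$; the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
M, t, T = 200_000, 0.5, 1.0
mu, sig = 0.05, 0.2

# simulate the pair (X_t, X_T) for a geometric Brownian motion started at 1
Z1, Z2 = rng.standard_normal(M), rng.standard_normal(M)
Xt = np.exp((mu - sig**2 / 2) * t + sig * np.sqrt(t) * Z1)
XT = Xt * np.exp((mu - sig**2 / 2) * (T - t) + sig * np.sqrt(T - t) * Z2)

# least-squares projection of psi(X_T) onto polynomials in X_t, as in Eq. (3)
psi = XT**2
coef = np.polyfit(Xt, psi, 2)
approx = np.polyval(coef, Xt)

exact = Xt**2 * np.exp((2 * mu + sig**2) * (T - t))  # closed form for E_t[X_T^2]
rel = np.abs(approx - exact) / exact
print(rel.mean())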

Lemma 1. Let $\{D_i\}$ be an increasing sequence of bounded, closed subsets of $D$ with $\cup_i D_i = D$. We have

$$
E_t[\psi(X_T)] \equiv \lim_{i \to \infty} \operatorname*{arg\,min}_{\phi \in \Phi(D_i)} E\big[|\psi(X_T) - \phi(t, X_t)|^2 \mathbf{1}_{X_t \in D_i}\big]
\tag{4}
$$

in the $L^2$ sense.

With the above result in mind, we will always write $D_i$ as $D$ and assume that $D$ is bounded and closed. Many methods have emerged to find global solutions to Equation (3). In this paper, however, we follow a divide-and-conquer approach and fit the regression locally in each of the subsets $U_k$, for $k = 1, 2, \cdots, K$. That is, we evaluate

$$
\phi_k(t, X_t) := \operatorname*{arg\,min}_{\phi \in \Phi(U_k)} E\big[|\psi(X_T)\mathbf{1}_{X_t \in U_k} - \phi(t, X_t)\mathbf{1}_{X_t \in U_k}|^2\big],
\tag{5}
$$

where $\Phi(U_k)$ is an appropriate function space with domain $U_k$, under the constraint $\sum_{k=1}^K \phi_k(t, x)\mathbf{1}_{x \in U_k} \in \Phi$. ( [

] ) shows that the constraint $\sum_{k=1}^K \phi_k(t, x)\mathbf{1}_{x \in U_k} \in \Phi$ can be relaxed under certain conditions. In this paper, we will use the two aforementioned theorems and show that polynomial regression and Taylor expansion for $\phi_k$ and $\psi$, combined with Monte Carlo simulation to generate samples and machine-learning clustering to partition the simulated paths, result in convergent numerical methods. In order to use Taylor expansion, we need $\psi$ to be $C^l$-smooth, meaning that it has bounded derivatives up to order $l$ in $D$; this requires us to approximate it by $C^l$-smooth functions. The following assumption applies throughout the paper.

Assumption 1 (On $\psi$). Assume that $\psi$ can be approximated, in a suitable sense, by a sequence of $C^l$-smooth functions $\{\psi_i\}$, with $l$ finite or $\infty$. Moreover, we have

$$
\lim_{i \to \infty} E_t\big[|\psi(X_t) - \psi_i(X_t)|^2\big] = 0
\tag{6}
$$

for any $t \in [0, T]$.

Remark 2. For example, $\psi_i$ may converge to $\psi$ pointwise or in the $L^2$ sense. In light of Assumption 1, we will henceforth assume that $\psi$ itself is $C^l$-smooth.

Then, we have the following theorems.

Theorem 3 (On Polynomial Regression). Let

$$
\hat{\phi}_{k,J}(t, X_t) := \operatorname*{arg\,min}_{P_J \in \mathcal{P}_J(U_k)} E\big[|\psi(X_T)\mathbf{1}_{X_t \in U_k} - P_J(X_t)\mathbf{1}_{X_t \in U_k}|^2\big],
\tag{7}
$$

where $\mathcal{P}_J(U_k)$ is the space of all polynomials of degree at most $J$ with domain $U_k$. Then, we have

$$
\phi(t, X_t) = \lim_{\max_{1 \le k \le K} d_{U_k} \to 0^+} \sum_{k=1}^K \hat{\phi}_{k,J}(t, X_t)\mathbf{1}_{X_t \in U_k}
\tag{8}
$$

in the $L^2$ sense, with $J \ge 1$ fixed.

Theorem 4 (On Taylor Expansion). Assume that $\psi(x) \cong \hat{\psi}_J(x, x_t^k) = \sum_{|\alpha| \le J} \frac{\partial^\alpha \psi(x_t^k)}{\alpha!}(x - x_t^k)^\alpha$ and

$$
\hat{\phi}_{k,J}(t, X_t^k) = E_t\big[\hat{\psi}_J(X_T, X_t^k)\mathbf{1}_{X_t^k \in U_k}\big],
\tag{9}
$$

where $X_t^k \in U_k$. Then, we have

$$
\phi(t, X_t) = \lim_{\max_{1 \le k \le K} d_{U_k} \to 0^+} \sum_{k=1}^K \hat{\phi}_{k,J}(t, X_t)\mathbf{1}_{X_t \in U_k}
\tag{10}
$$

in the $L^2$ sense, with $J \ge 1$ fixed.
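The mechanism behind Theorem 4 can be illustrated in one dimension: the Taylor polynomial $\hat{\psi}_J(\cdot, x_t^k)$ around a cluster center is accurate on $U_k$, with an error that shrinks like $d_{U_k}^{J+1}$. The sketch below, with the illustrative choice $\psi = \exp$ (whose derivatives are all $\exp$) and a hypothetical helper `taylor_hat`, measures that error over shrinking cluster diameters.

```python
import numpy as np
from math import factorial

def taylor_hat(derivs, xk, J):
    """psi_hat_J(x, xk) = sum_{a <= J} psi^(a)(xk)/a! * (x - xk)^a (1-d case)."""
    return lambda x: sum(derivs[a](xk) / factorial(a) * (x - xk) ** a
                         for a in range(J + 1))

derivs = [np.exp] * 3            # psi = exp; all derivatives are exp
xk = 1.0                         # cluster "center" x_t^k
errs = []
for d in (0.4, 0.2, 0.1):        # shrinking cluster diameter d_{U_k}
    xs = np.linspace(xk - d / 2, xk + d / 2, 101)
    errs.append(np.max(np.abs(np.exp(xs) - taylor_hat(derivs, xk, J=2)(xs))))
print(errs)                      # error shrinks roughly like d**3 for J = 2
```

Halving the diameter should cut the worst-case error by roughly a factor of $2^{J+1} = 8$, which is the rate the limit in (10) exploits.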

The following theorem establishes the convergence under Monte Carlo simulation.

Theorem 5 (On Monte Carlo Simulation). Assume that there is a time discretization $\{t_j\}_{j=1}^n$ and that we obtain $M$ samples $\{X_{t_j}^m\}_{j=1, m=1}^{n, M}$. Then, we have

$$
\phi(t, X_t) = \lim_{M \to \infty} \lim_{\max_{1 \le k \le K} d_{U_k} \to 0^+} \frac{1}{M}\sum_{m=1}^M \sum_{k=1}^K \hat{\phi}_{k,J}(t, X_t^m)\mathbf{1}_{X_t^m \in U_k}
\tag{11}
$$

in the $L^2$ sense, with $J \ge 1$ fixed, where $\hat{\phi}_{k,J}$ is obtained via OLS in the cases of polynomial regression and Taylor expansion.

Theorem 5 is ultimately the convergence result for the methodology we propose. We first generate $M$ copies of the sample paths of $X$ from 0 to $T$. Then, we use the MiniBatchKMeans function in Python (from the scikit-learn library) to partition the $M$ simulated values of $X$, at each time $t \in [0, T]$, into $K$ clusters. Within each cluster $U_k$, for $k = 1, 2, \cdots, K$, we perform the analysis shown above and concatenate the different pieces to form the values of the conditional expectations at time $t$. The methodology works backwards in time. Of course, there are many different choices of $\hat{\phi}_{k,J}$, depending on how we want to approximate the conditional expectations within $U_k$.
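The cluster-then-regress step can be sketched as follows: partition the simulated values with MiniBatchKMeans, fit a local polynomial by OLS within each cluster, and concatenate the pieces. The helper name `local_regression_ce` and the toy target (a noisy sine, so the true conditional expectation is known) are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def local_regression_ce(Xt, target, K=10, deg=1, seed=0):
    """Approximate E[target | X_t]: cluster X_t with MiniBatchKMeans into K
    subsets U_k, fit a degree-`deg` polynomial by OLS within each cluster,
    and concatenate the local fits."""
    km = MiniBatchKMeans(n_clusters=K, random_state=seed, n_init=3)
    labels = km.fit_predict(Xt.reshape(-1, 1))
    fitted = np.empty_like(target, dtype=float)
    for k in range(K):
        idx = labels == k
        coef = np.polyfit(Xt[idx], target[idx], deg)   # local OLS on U_k
        fitted[idx] = np.polyval(coef, Xt[idx])
    return fitted

# toy check: target = sin(X_t) + noise, so E[target | X_t] = sin(X_t)
rng = np.random.default_rng(1)
Xt = rng.uniform(0.0, 2.0, 50_000)
target = np.sin(Xt) + 0.3 * rng.standard_normal(Xt.size)
fitted = local_regression_ce(Xt, target, K=20, deg=1)
print(np.mean(np.abs(fitted - np.sin(Xt))))
```

Even with locally linear fits ($J = 1$), many small clusters recover the nonlinear conditional expectation, which is the divide-and-conquer idea behind Equation (11).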

With the above discussion of the evaluation of conditional expectations, we can now describe the main numerical method for a BSDEJ. First, define a set of time nodes $\pi_n = \{t_i\}_{i=0}^n$, with $t_0 = 0$, $t_n = T$ and $t_i - t_{i-1} \equiv h = T/n$. Second, simulate $M$ copies of the forward process $X$ at the discretization nodes $\{t_i\}_{i=0}^n$. Third, iterate backwards in time using the methodologies for evaluating conditional expectations outlined above. On the time interval $[t_{n-1}, t_n)$, use the MiniBatchKMeans method to cluster $\{X_{t_{n-1}}^{\pi_n, m}\}_{m=1}^M$ into $K$ disjoint subsets $\{U_{t_{n-1}}^k\}_{k=1}^K$. Within each cluster $U_{t_{n-1}}^k$, apply the polynomial regression or Taylor expansion method to evaluate $\hat{\phi}_{\pi_n, M, k, J}(t_{n-1}, X_{t_{n-1}}^{\pi_n, m})$, where the subscript $\pi_n$ reminds the reader that this intermediate function now depends on the time discretization, and $M$ denotes the dependence on the Monte Carlo simulation. After we obtain $\{\hat{\phi}_{\pi_n, M, k, J}\}_{k=1}^K$, we can treat it as the intermediate solution to the BSDEJ at time $t_{n-1}$. In order to evaluate $Z$ and $V$ (taking $Z$ as an example),

we use the relation $Z_t \cong \frac{1}{h}E_t[\Delta Y_t\,\Delta W_t]$. The equations for $Z$ and $V$ are stated in

( [
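One backward step of the scheme can be sketched as follows. This is a minimal illustration under simplifying assumptions: the jump component $V$ is omitted, a single global polynomial regression stands in for the per-cluster fit, and the helper name `backward_step` is hypothetical.

```python
import numpy as np

def backward_step(X_prev, Y_next, dW, h, f, deg=2):
    """One backward step with conditional expectations computed by OLS:
    Z_{t-h} ~ (1/h) E[ Y_t * dW | X_{t-h} ]   (the relation in the text),
    Y_{t-h} ~ E[ Y_t | X_{t-h} ] + h * f(X, Y, Z)."""
    cz = np.polyfit(X_prev, Y_next * dW, deg)
    Z_prev = np.polyval(cz, X_prev) / h
    cy = np.polyfit(X_prev, Y_next, deg)
    EY = np.polyval(cy, X_prev)
    Y_prev = EY + h * f(X_prev, EY, Z_prev)
    return Y_prev, Z_prev

# sanity check: dX = dW, terminal Y_T = X_T, driver f = 0,
# so the exact solution is Y_t = E_t[X_T] = X_t and Z_t = 1
rng = np.random.default_rng(2)
M, h = 100_000, 0.01
X_prev = rng.standard_normal(M)              # samples of X_{T-h}
dW = rng.normal(0.0, np.sqrt(h), M)
Y_next = X_prev + dW                         # samples of X_T
Y_prev, Z_prev = backward_step(X_prev, Y_next, dW, h, f=lambda x, y, z: 0.0)
```

Iterating this step from $t_n$ down to $t_0$, with the regression replaced by the per-cluster fits described above, yields the full backward scheme.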

First, we introduce the necessary spaces to facilitate the discussion of the convergence results. For an $\mathbb{R}^r$-valued function $x: [0,T] \to \mathbb{R}^r$, let the sup-norm be

$$
\|x\|_{[a,b]} := \sup\{|x_t|, t \in [a,b]\}.
\tag{12}
$$

We shall use the following spaces for stochastic processes, with $p \ge 2$:

• $\mathcal{S}_r^p[s,t]$ is the set of $\mathbb{R}^r$-valued adapted càdlàg processes $X$ such that

$$
\|X\|_{\mathcal{S}_r^p[s,t]} := E\big[e^{\theta(t-s)}\|X(\omega)\|_{[s,t]}^p\big]^{\frac{1}{p}} < \infty.
\tag{13}
$$

Here $\theta \ge 0$ is a nonnegative constant. We sometimes write $\mathcal{S}_r^p[s,t]$ as $\mathcal{S}^p$ if doing so causes no ambiguity; the same applies to the spaces defined below.

• $\mathbb{H}^p[s,t]$ is the set of progressively measurable $\mathbb{R}^d$-valued processes $Z$ such that

$$
\|Z\|_{\mathbb{H}^p[s,t]} := E\Big[\Big(\int_s^t e^{\theta(v-s)}|Z_v|^2\,dv\Big)^{\frac{p}{2}}\Big]^{\frac{1}{p}} < \infty.
\tag{14}
$$

• $\mathbb{J}^p[s,t]$ is the set of $q$-dimensional processes $\psi = \{\psi_i, 1 \le i \le q\}$, with each $\psi_i: \Omega \times [0,T] \times E \to \mathbb{R}$ being $\mathcal{F} \times \mathcal{B}([0,T]) \times \mathcal{B}(E)$-measurable, that satisfy

$$
\|\psi\|_{\mathbb{J}^p[s,t]} := E\Big[\Big(\sum_{j=1}^q \int_s^t \int_E e^{\theta(v-s)}|\psi_j(\omega, v, e)|^2\,\nu_j(de)\,dv\Big)^{\frac{p}{2}}\Big]^{\frac{1}{p}} < \infty.
\tag{15}
$$

• $\mathbb{K}^p[s,t]$ is the set of processes $(Y, Z, \psi)$ in the space $\mathcal{S}^p[s,t] \times \mathbb{H}^p[s,t] \times \mathbb{J}^p[s,t]$, with the norm

$$
\|(Y, Z, \psi)\|_{\mathbb{K}^p[s,t]} := \big(\|Y\|_{\mathcal{S}^p[s,t]}^p + \|Z\|_{\mathbb{H}^p[s,t]}^p + \|\psi\|_{\mathbb{J}^p[s,t]}^p\big)^{\frac{1}{p}}.
\tag{16}
$$

Let $C_b^g(D)$ be the space of bounded functions that have continuous and bounded derivatives up to order $g$ on the domain $D \subset \mathbb{R}^r$, and $C^g(D)$ the space of functions that have continuous derivatives up to order $g$. In what follows, we only consider the norms with $\theta > 0$ and $p = 2$. In order to state the final convergence result, we need the following assumption.

Assumption 6 (On the BSDEJ). Assume that the BSDEJ (1) admits a unique solution $S_t = (Y_t, Z_t, V_t)$ in $\mathbb{K}^p[0,T]$ and that the time-discretized solution $S_t^{\pi_n} := (Y_t^{\pi_n}, Z_t^{\pi_n}, V_t^{\pi_n})$ converges to $S_t = (Y_t, Z_t, V_t)$ in the $\mathbb{K}^p[0,T]$ sense.

Then, the following convergence result justifies the methodology introduced above.

Theorem 7 (Main Convergence Result). Under Assumptions 1 and 6, we have

$$
\lim_{h \to 0^+} \lim_{M \to \infty} \lim_{\max_{1 \le k \le K} d_{U_k} \to 0^+} \|S^{\pi_n, M, K, J} - S\|_{\mathbb{K}^p[0,T]} = 0,
\tag{17}
$$

where $S_t^{\pi_n, M, K, J} = \sum_{k=1}^K S_t^{\pi_n, M, k, J}\mathbf{1}_{X_t \in U_t^k}$ and $S_t^{\pi_n, M, k, J} = (Y_t^{\pi_n, M, k, J}, Z_t^{\pi_n, M, k, J}, V_t^{\pi_n, M, k, J})$.

The BSDE associated with American option prices can be found in [

In this paper, we propose a new machine learning method, based on clustering, to run [

| M = 50000, N = 50 | S_0 | Benchmark | Clustering | Ave. Time (s) | Rel. Error |
|---|---|---|---|---|---|
| σ = 0.2 | 80 | 0.219 | 0.222 | 12.56 | 1.37% |
|  | 90 | 1.386 | 1.412 | 12.56 | 1.88% |
|  | 100 | 4.783 | 4.790 | 12.56 | 0.15% |
|  | 110 | 11.098 | 11.131 | 12.56 | 0.30% |
|  | 120 | 20.000 | 19.984 | 12.56 | −0.08% |
| σ = 0.4 | 80 | 2.689 | 2.736 | 12.56 | 1.75% |
|  | 90 | 5.722 | 5.778 | 12.56 | 0.98% |
|  | 100 | 10.239 | 10.274 | 12.56 | 0.34% |
|  | 110 | 16.181 | 16.204 | 12.56 | 0.14% |
|  | 120 | 23.360 | 23.413 | 12.56 | 0.23% |
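The Rel. Error column is the relative difference between the clustering method and the benchmark, (Clustering − Benchmark)/Benchmark. A quick check reproducing a few σ = 0.2 rows of the first table:

```python
# Rel. Error = (Clustering - Benchmark) / Benchmark, a few sigma = 0.2 rows
rows = [(0.219, 0.222), (1.386, 1.412), (4.783, 4.790), (20.000, 19.984)]
for bench, clust in rows:
    print(f"{(clust - bench) / bench:+.2%}")   # +1.37%, +1.88%, +0.15%, -0.08%
```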

| M = 75000, N = 100 | S_0 | Benchmark | Clustering | Ave. Time (s) | Rel. Error |
|---|---|---|---|---|---|
| σ = 0.2 | 80 | 0.219 | 0.223 | 42.716 | 1.83% |
|  | 90 | 1.386 | 1.397 | 42.716 | 0.79% |
|  | 100 | 4.783 | 4.814 | 42.716 | 0.65% |
|  | 110 | 11.098 | 11.114 | 42.716 | 0.14% |
|  | 120 | 20.000 | 19.979 | 42.716 | −0.11% |
| σ = 0.4 | 80 | 2.689 | 2.699 | 42.716 | 0.37% |
|  | 90 | 5.722 | 5.749 | 42.716 | 0.47% |
|  | 100 | 10.239 | 10.299 | 42.716 | 0.59% |
|  | 110 | 16.181 | 16.247 | 42.716 | 0.41% |
|  | 120 | 23.360 | 23.403 | 42.716 | 0.18% |

The author declares no conflicts of interest regarding the publication of this paper.

Zhang, L.L. (2020) A Clustering Method to Solve Backward Stochastic Differential Equations with Jumps. Journal of Mathematical Finance, 10, 1-9. https://doi.org/10.4236/jmf.2020.101001

Proof of Lemma 1. Denote $\lambda_i = \min_{\phi \in \Phi(D_i)} E\big[|\psi(X_T) - \phi(t, X_t)|^2\mathbf{1}_{X_t \in D_i}\big]$. Because $D_i \subset D_{i+1}$, every function in $\Phi(D_{i+1})$ restricts to one in $\Phi(D_i)$, while the objective for $\lambda_i$ involves the smaller indicator set. Therefore, we have $\lambda_i \le \lambda_{i+1} \le \lambda^*$, where

$$
\lambda^* = \min_{\phi \in \Phi(D)} E\big[|\psi(X_T) - \phi(t, X_t)|^2\big].
\tag{18}
$$

We immediately have $\lim_{i \to \infty} \lambda_i = \hat{\lambda} \le \lambda^*$. In addition, we have

$$
\lambda^* = \min_{\phi \in \Phi(D)} E\big[|\psi(X_T) - \phi(t, X_t)|^2\big] \le \min_{\phi_1 \in \Phi(D_i)} E\big[|\psi(X_T) - \phi_1(t, X_t)|^2\mathbf{1}_{X_t \in D_i}\big] + E\big[|\psi(X_T) - \phi^*(t, X_t)|^2\mathbf{1}_{X_t \in D \setminus D_i}\big].
\tag{19}
$$

Here $\phi^* = \arg\min_{\phi \in \Phi(D)} E\big[|\psi(X_T) - \phi(t, X_t)|^2\big]$. The second term satisfies

$$
E\big[|\psi(X_T) - \phi^*(t, X_t)|^2\mathbf{1}_{X_t \in D \setminus D_i}\big] \le C\,\mathbb{P}(X_t \in D \setminus D_i),
\tag{22}
$$

where $C$ is independent of $i$ and $\lim_{i \to \infty} \mathbb{P}(X_t \in D \setminus D_i) = 0$. Combining this result with $\lambda_i \le \lambda_{i+1} \le \lambda^*$, we obtain the claim of the lemma. □

Lemma 2 (On Lead-Lag Regression). Suppose that $H \subset \Phi$. Then we have

$$
\operatorname*{arg\,min}_{\phi \in H} E\big[|\psi(X_T) - \phi(t, X_t)|^2\big] = \operatorname*{arg\,min}_{\phi \in H} E\big[|\phi^*(t, X_t) - \phi(t, X_t)|^2\big],
\tag{24}
$$

where $\phi^*(t, X_t) = \arg\min_{\phi \in \Phi} E\big[|\psi(X_T) - \phi(t, X_t)|^2\big]$.

Proof of Lemma 2. The proof of this lemma follows directly from the Repeated Projection Theorem (see, e.g., [

Proof of Theorem 3. Apply Lemma 2 and ( [

Proof of Theorem 4. The proof of the theorem follows from the error term of the Taylor expansion, Lemma 2 and ( [