Approximation of an Integral Markov Process Arising in the Approximation of Stochastic Differential Equations
1. Introduction
Nonlinear ordinary and partial differential equations arise in various fields of science, particularly fluid mechanics, solid state physics, plasma physics, nonlinear optics, and mathematical biology. Powerful numerical methods and their implementations can be found in [2] - [7]. However, nonlinear differential equations (DE) with parametric noise play a significant role in a range of application areas, including engineering, physics, mechanics, epidemiology, and neuroscience. It is important to mention that noisy systems can be modeled in several ways: for example, Langevin’s equation describes a linear physical system to which white noise is added, and the linear theory for it has been extended to nonlinear stochastic differential equations with additive white noise [8]. Another approach is to derive models, such as Markov chains, that treat the system’s state variables as random, and then use the methods of probability for analysis; see [9]. A third approach is derived by averaging nonlinear oscillatory systems. In this approach, parameters in the system are allowed to be random processes, and a method based on averaging and ergodic theory provides useful predictions from the model [10]. Indeed, solutions of differential or dynamical systems are functions, say of time. If they are in addition random, we must describe both randomness and time dependence simultaneously. Thus we refer to the system as either a random process or a stochastic process. A complete understanding of DE theory with perturbed noise requires familiarity with advanced probability and stochastic processes (see [8] [11]).
In this paper, our approach to modeling randomness in a differential or dynamical system is through allowing parameters b in the system to be a random process. For example, in the case of a differential equation we let the right-hand side depend on a random process, indicating that the parameters can change with that process; the resulting solutions will then also be a random process. This approach is based on assuming that the noise process evolves on a faster time scale than the system. This work is dedicated to this question in particular when y is a discrete-space Markov process. Most recent contributions have aimed, in general, at relaxing the assumptions on y [1] [12]. This result is known as the Functional Central Limit Theorem [13] [14] [15].
In Section 2 we review and discuss the limit of the variance for a discrete-space Markov chain [1]. In Section 3, a new equivalent expression for the limit of the variance is given for a circulant n-state Markov chain. In Section 4, stochastic approximation and numerical simulation are discussed. It is observed that computer simulations of this type of stochastic ordinary differential equation with standard methods have some issues that need to be explored, so that the reader will benefit when solving these types of problems numerically.
2. Discrete-Space Markov Chains
We consider the case where y takes its values in a finite set of n states. The transition probabilities can be represented by an n × n matrix P whose entry p_ij denotes the probability that the chain moves to state j after one step, given that it is in state i (P depends on the time step, but this dependence is omitted from the notation for simplicity). The matrix P satisfies the following properties.
· P is nonnegative (its entries are probabilities).
· P is stochastic, i.e., each row of P sums to 1 (one of the n states must be the outcome of a transition from state i).
Then λ = 1 is an eigenvalue of P, and the fact that the rows of P sum to 1 shows that all other eigenvalues have modulus at most 1. We shall assume that
· P is irreducible, i.e., any state can eventually be reached from any other state in a finite number of steps with a nonzero probability (this is the case if all entries of P are positive). This implies that the eigenvalue λ = 1 has multiplicity one (e.g., see [16]).
· P is aperiodic (or acyclic, e.g., not a permutation). This implies that every other eigenvalue satisfies |λ| < 1 (e.g., see [9]; the chain is then called regular).
The above conditions guarantee the existence of a unique vector π such that
(2.1)
holds, with the entries of π positive and summing to 1. The vector π is the unique positive left eigenvector associated to the eigenvalue λ = 1, satisfying (2.1). It is natural to consider the (spectral) decomposition
(2.2)
where
(2.3)
and
. From the Chapman-Kolmogorov equation and the homogeneity property (see [1]) we have
and by induction
as
, independently of i (i.e.,
). Thus
(2.4)
defines the limit distribution. An explicit expression of the coefficients of
in terms of the coefficients of P can be found in [17, p. 21].
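As an illustrative sketch (not part of the original derivation), the stationary vector and the convergence of the matrix powers described above can be checked numerically; the 3-state matrix below is a hypothetical example, assuming NumPy is available:

```python
import numpy as np

# A 3-state stochastic matrix: rows sum to 1, all entries positive,
# hence irreducible and aperiodic (a regular chain).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution: the positive left eigenvector of P for the
# eigenvalue 1, normalized so its entries sum to 1.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

# Chapman-Kolmogorov: P^m converges to the rank-one matrix whose every
# row equals pi, independently of the initial state.
Pm = np.linalg.matrix_power(P, 50)
print(np.allclose(Pm, np.ones((3, 1)) * pi))  # rows of P^m ≈ pi
```

The left eigenvector is obtained here from the right eigenvectors of the transpose, which is the usual trick when only a standard eigensolver is available.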
The definition of the expected value yields
(2.5)
i.e.,
Then the zero average condition on
becomes
(2.6)
The relation (2.6) implies
. Therefore
(2.7)
(2.8)
(2.9)
where
can be interpreted as the expected value of the random variable
obtained after one application of the transition probability to the random variable y (for more detail, see [1]). Note that (2.7) implies
Because of (2.3) we also have
(2.10)
With
we obtain
(2.11)
as
. The expression (2.11) represents the limit of the variance and is thus expected to be nonnegative.
Two-State Markov Chain
Let
and
(2.12)
with
,
. Then (2.2) holds with
As a result
and
(2.13)
is (symmetric) positive definite for any choice
and
. A simplified expression for (2.11) is obtained using (2.6), i.e.,
We obtain
(2.14)
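The two-state case lends itself to a quick simulation check. The sketch below assumes a generic parametrization (jump probabilities a and b; the paper's matrix (2.12) may use different symbols) and compares the closed-form stationary distribution with long-run occupation frequencies:

```python
import numpy as np

# Hypothetical two-state chain: jump 0 -> 1 with probability a,
# jump 1 -> 0 with probability b (labels a, b are illustrative).
a, b = 0.3, 0.6
P = np.array([[1 - a, a],
              [b, 1 - b]])

# Closed-form stationary distribution for this parametrization.
pi = np.array([b, a]) / (a + b)

# Ergodicity: long-run occupation frequencies approach pi.
rng = np.random.default_rng(0)
state, n_steps = 0, 50_000
counts = np.zeros(2)
for _ in range(n_steps):
    counts[state] += 1
    state = rng.choice(2, p=P[state])
print(counts / n_steps)  # close to pi
```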
3. Circulant n-State Markov Chain
If P is doubly stochastic (i.e., both P and its transpose are stochastic) and irreducible and aperiodic, then
i.e., the limit distribution is uniform and
(3.16)
Circulant stochastic matrices constitute an example of doubly stochastic matrices. Let
with
and
. A sufficient condition for P to be irreducible and aperiodic is for two consecutive
to be nonzero (i.e., positive; see e.g. [18, p. 5]). The symbol of P is the polynomial
Since
for
, P admits the spectral decomposition
with
(3.17)
(3.18)
where
. Multiplication of a vector by the matrix
performs a (normalized) discrete Fourier transform (DFT), while multiplication by V performs the inverse (normalized) DFT; both products can be computed efficiently with the Fast Fourier Transform (FFT).
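As a quick numerical sanity check (an illustrative sketch, not from the paper), the eigenvalues of a circulant stochastic matrix coincide, as a set, with the DFT of its generating row, i.e., with the symbol of P evaluated at the n-th roots of unity:

```python
import numpy as np

# A circulant stochastic matrix generated by its first row c
# (nonnegative entries summing to 1; the values are illustrative).
c = np.array([0.5, 0.3, 0.0, 0.2])
n = len(c)
P = np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])

# Eigenvalues of a circulant matrix = DFT of the generating row
# (since c is real, the two DFT sign conventions give the same set).
lam_fft = np.fft.fft(c)
lam_eig = np.linalg.eigvals(P)
print(np.allclose(np.sort_complex(lam_fft),
                  np.sort_complex(lam_eig), atol=1e-8))

# P is doubly stochastic, so the uniform vector is stationary.
print(np.allclose(np.ones(n) / n @ P, np.ones(n) / n))
```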
We write
, with
. Since
for
the matrix
is nonsingular. The matrix
represents the (orthogonal) projection onto
. Hence
implies
. Then
(3.19)
Now
for
(
is real), with
(3.20)
From (3.19) and (3.16) we obtain
(3.21)
(3.22)
(3.23)
since
is real and
, for
. The condition
for
clearly shows that (3.23) is nonnegative.
3.1. Example: Uniform Distribution
If
,
, (i.e.,
) we obtain
for
. Parseval’s identity yields
Then (3.23) reduces to
3.2. Example: Big World Transition Probability
Assume now that the probability
,
, is independent of j but distinct from the probability
, i.e.,
with
. Then
for
. In particular
for
. Therefore, by Parseval again,
(3.24)
3.3. Example: Small World Transition Probability
A case of particular interest in the study of the transmission of a signal around a cyclic biological chain corresponds to
with
. The quantity
represents the transition probability from a state
to the next state
. Then
, and
for
. Therefore
(3.25)
(3.26)
For
, (3.25) reduces to
. Note that (3.26) is half of (3.24), as could be expected from a less dispersive signal.
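A minimal sketch of this structure (parameter values are illustrative): each state advances to its cyclic successor with probability p and stays put otherwise, the resulting circulant matrix is doubly stochastic, and its powers converge to the uniform distribution:

```python
import numpy as np

# Small-world-type chain on a cycle of n states: from state j move to
# state j+1 (mod n) with probability p, stay with probability 1-p.
n, p = 8, 0.3
P = (1 - p) * np.eye(n) + p * np.roll(np.eye(n), 1, axis=1)

# Circulant, hence doubly stochastic: rows and columns sum to 1.
print(np.allclose(P.sum(axis=0), 1), np.allclose(P.sum(axis=1), 1))

# Two consecutive nonzero entries make P irreducible and aperiodic,
# so P^m converges to the rank-one matrix with all entries 1/n.
Pm = np.linalg.matrix_power(P, 2000)
print(np.allclose(Pm, 1.0 / n))
```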
In the following section our approach to modeling randomness in dynamical systems is through allowing parameters in a system to be a random process. This approach will determine when solutions of this type of stochastic problem do or do not persist when the system is perturbed.
4. Mathematical Derivation for Numerical Approximation of Stochastic Differential Equations (SDE)
For simplicity, consider
(4.27)
The average system is defined by the differential equation
(4.28)
We want to compute the deviation of the perturbed system from the averaged one. We consider
, then
Note that
,
It follows from the limit theorem of stochastic processes (see [2]) that
Detailed derivation of
is shown in Section 2. The stochastic processes
converge in the sense of expectation to the solution
that is the solution to the integral equation
The distribution of
is close to the distribution of the stochastic processes
in the sense that
Thus
, as
. Thus the nature of this convergence, and the sense in which the error in the expansion of x is small, are understood in the sense of distributions over some time interval.
4.1. Forward Euler Scheme for SDE
The Euler method approximates the analytic solution of the IVP
(4.29)
We can derive an entire family of discrete numerical methods (including the Euler method) by truncating the Taylor series and utilizing Taylor’s Theorem. First, we rewrite the Taylor series expansion in differential form,
The Euler method is found by truncating the Taylor series at the first derivative, giving
(4.30)
where
is the error term, or Taylor's remainder term, which is of order
. So if we define
, and
, we obtain
which has local error term (error at each step) of
, giving a global error
.
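The first-order behavior can be observed on a simple test problem. The following sketch (not the paper's code; the equation x' = -x is chosen only because its exact solution exp(-t) is known) shows the global error halving as the step is halved:

```python
import math

def euler(f, x0, t_end, h):
    """Forward Euler: x_{k+1} = x_k + h * f(t_k, x_k)."""
    t, x = 0.0, x0
    while t < t_end - 1e-12:
        x += h * f(t, x)
        t += h
    return x

# Test problem x' = -x, x(0) = 1, exact solution exp(-t).
f = lambda t, x: -x
errors = [abs(euler(f, 1.0, 1.0, h) - math.exp(-1.0))
          for h in (0.1, 0.05, 0.025)]
# Halving h roughly halves the error: global error is O(h).
print([e1 / e2 for e1, e2 in zip(errors, errors[1:])])  # ratios ≈ 2
```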
4.2. Discrete Approximation for SDE
Numerical schemes for solving SDE can be classified as either explicit or implicit methods. Explicit methods compute approximations that depend on previous approximations only, whereas implicit methods compute approximations that depend on both previous and current approximations. The Euler method presented earlier is also known as the explicit Euler method. The explicit Euler method is defined as
(4.31)
where
and
. Here h is the step size. This method has a global error
. The implicit method is defined as
(4.32)
where
, and
. This method has a global error
.
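The explicit/implicit distinction matters for stability. The sketch below (illustrative parameter values; a deterministic linear test problem rather than an SDE) iterates one explicit and one implicit Euler step on x' = λx with a step size outside the explicit method's stability region:

```python
# Explicit vs. implicit Euler on the stiff linear test problem
# x' = lam * x, x(0) = 1, with lam = -50 and step h = 0.1, so that
# |1 + h*lam| = 4 > 1 lies outside the explicit stability region.
lam, h, n = -50.0, 0.1, 20

x_exp = x_imp = 1.0
for _ in range(n):
    x_exp = x_exp + h * lam * x_exp   # explicit step: uses x_k only
    x_imp = x_imp / (1 - h * lam)     # implicit step: solves x_{k+1} = x_k + h*lam*x_{k+1}
print(abs(x_exp), abs(x_imp))  # explicit iterates blow up, implicit iterates decay
```

For this linear problem the implicit update can be solved in closed form; in general each implicit step requires solving a (possibly nonlinear) equation.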
4.3. Example 1
We write the algorithm for the following equation, which will be used as a test equation since it has an analytic solution.
(4.33)
over
and then
Now,
1) choose a step
. Set
,
.
2) Generate an approximation to the solution from the following recursion:
for
.
As
,
where
in our numerical simulation, and W is realized by normally distributed random numbers with expectation 0 and variance 1, respectively. A computer simulation of the distribution of the solution of
using the Euler method without a max step size, ode15s without a max step size, and ode15s with a max step size is given in Figure 1.
Figure 1. Horizontal bar plot: x vs. bins. This figure shows the distribution of
represented by the vertical axis of 500 sample paths versus bins in the horizontal axis at the final time t = 1 of the system (4.33). In this case,
, =0.67,
, and
,
. The left figure shows the distribution of
using Euler without max step size. The middle figure shows the distribution of
using the ode15s solver without a max step size. The right figure shows the distribution of
using ode15s with a max step size.
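Since the test equation (4.33) itself is not reproduced above, the sketch below uses the standard linear SDE dX = λX dt + μX dW, which also has an analytic solution, to illustrate the Euler recursion of step 2; all parameter values and the seed are illustrative:

```python
import numpy as np

# Euler-Maruyama on the linear test SDE dX = lam*X dt + mu*X dW,
# X(0) = 1, whose exact solution is X(t) = exp((lam - mu^2/2) t + mu W(t)).
# (Stand-in for the paper's test equation; parameters are illustrative.)
rng = np.random.default_rng(1)
lam, mu, T, N = 2.0, 1.0, 1.0, 2**8
h = T / N

dW = np.sqrt(h) * rng.standard_normal(N)   # Brownian increments per step
X = 1.0
for k in range(N):
    X = X + h * lam * X + mu * X * dW[k]   # one Euler-Maruyama step

# Exact solution evaluated on the same Brownian path, W(T) = sum of dW.
X_exact = np.exp((lam - 0.5 * mu**2) * T + mu * dW.sum())
print(abs(X - X_exact) / X_exact)  # small relative pathwise error
```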
4.4. Example 2
We consider a stochastic process
defined by the differential equation
(4.34)
with
and limiting distribution
We want to solve (4.34) numerically over time interval
using
1) Forward Euler with smaller time scale.
2) Evaluate
, where
(4.35)
and
(4.36)
3) Compare 2) with 1), where
and
are the solutions of (4.35) (using the large time scale) and (4.36), respectively. And
is computed using (2.15) along with
, and
. A comparison of the averaged system to the perturbed system, as well as the error convergence, is illustrated in Figure 2 and Figure 3.
4.5. Example 3
We consider a stochastic process
defined by the differential equation
(4.37)
with
Figure 2. Left figure:
and
vs. t. Comparison of the averaged system to the perturbed system. Red curve shows the solution of the averaged system; other curves are solutions of the perturbed system with different
. Right figure: horizontal bar plot, where the vertical axis represents the distribution of solutions for 500 sample paths of the system (4.34) at the final time t = 1. In this case,
,
,
is generated with a two-state Markov process with
= 0.67.
Figure 3. Error plots:
represents vertical axis vs
, which represents horizontal axis. Dashed line is the line with the reference slope 1/2. Graphs are drawn on log-log scale.
and limiting distribution
We want to solve (4.37) numerically over time interval
using
1) Forward Euler with smaller time scale.
2) Evaluate
(using smaller time scale, h), where
(4.38)
and
(4.39)
3) Compare 2) with 1), where
and
are the solutions of (4.38) (using the large time scale) and (4.39), respectively. And
is computed using (2.15) along with
and
. A comparison of the averaged system to the perturbed system, as well as the error convergence, is illustrated in Figure 4 and Figure 5.
4.6. Example 4
We consider a stochastic process
defined by the differential equation
Figure 4. Left figure:
and
vs. t. Comparison of the averaged system to the perturbed system. Red curve shows the solution of the averaged system; other curves are solutions of the perturbed system with different
. Right figure: horizontal bar plot, where the vertical axis represents the distribution of solutions for 500 sample paths of the system (4.37) at the final time t = 1. Here
,
,
is generated with a two-state Markov process with
= 0.67.
Figure 5. Error plots:
represents vertical axis vs
, which represents horizontal axis. Dashed line is the line with slope 1/2. Graphs are drawn on log-log scale.
(4.40)
with
and limiting distribution
We want to solve (4.40) numerically over time interval
using
1) Forward Euler with smaller time scale.
2) Evaluate
(using smaller time scale, h), where
(4.41)
and
(4.42)
3) Compare 2) with 1), where
and
are the solutions of (4.41) (using the large time scale) and (4.42), respectively. And
is computed using (2.15) along with
, and
. A comparison of the averaged system to the perturbed system, as well as the error convergence, is illustrated in Figure 6 and Figure 7.
4.7. Strong Convergence
In the examples above, directly simulated solution
with the smaller time scale more closely matches the solution
, (where
solved using larger time scale and
is the solution of
, using Euler with large time scale H) as
is decreased; convergence appears to take place. Using
, where E denotes the expected value, leads to the concept of strong convergence. A method is said to have strong order of convergence equal to m if there exists a constant K such that
(4.43)
and
is sufficiently small. It can be shown that the perturbation method has strong order of convergence
. In our numerical tests, we will focus on the error at the end point
, so let
(4.44)
If the bound in (4.43) holds with
at any fixed point in
, then it certainly holds at the end point, so we have
Figure 6. Left figure:
and
vs. t. Comparison of the averaged system to the perturbed system. Red curve shows the solution of the averaged system; other curves are solutions of the perturbed system with different
. Right figure: horizontal bar plot, where the vertical axis represents the distribution of solutions for 500 sample paths of the system (4.40) at the final time t = 1. In this case,
,
,
is generated with a two-state Markov process with
= 0.67.
Figure 7. Error plots: The vertical axis represents
vs
, which represents the horizontal axis. Dashed line is the line with the reference slope 1/2. Graphs are drawn on log-log scale.
(4.45)
for sufficiently small
. It is shown that the perturbation method has strong order of convergence
. While measuring the error
, we implicitly assumed that a number of other sources of error are negligible, including the error arising from approximating an expected value by a sample mean, the inherent error in the random number generator, and floating point roundoff errors. For a typical computation the sampling error is likely to be the most significant of these three. In preparing the programs for these simulations we found that some experimentation is required to make the number of samples sufficiently large and the time step sufficiently small for the predicted order of convergence to be observable. The sampling error decays like
, where n is the number of sample paths used. A study in [19] indicates that as the step size decreases, the lack of independence in the samples from a random generator typically degrades the computation before rounding errors become significant.
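The error figures above can be reproduced in spirit with the following sketch (again using the standard linear test SDE as a stand-in, with illustrative parameters): the endpoint error E|X_h(T) - X(T)|, estimated over sample paths, is fitted against h on a log-log scale and compared with the reference slope 1/2:

```python
import numpy as np

# Strong-convergence test for Euler-Maruyama on dX = lam*X dt + mu*X dW:
# measure the mean endpoint error over M paths at several step sizes
# obtained by coarsening one fine Brownian path, then fit the slope.
rng = np.random.default_rng(2)
lam, mu, T = 2.0, 1.0, 1.0
M, Nfine = 500, 2**10                      # sample paths, finest grid
hfine = T / Nfine

dW = np.sqrt(hfine) * rng.standard_normal((M, Nfine))
X_exact = np.exp((lam - 0.5 * mu**2) * T + mu * dW.sum(axis=1))

errs, hs = [], []
for R in (2**3, 2**4, 2**5, 2**6):         # coarser steps h = R * hfine
    h = R * hfine
    X = np.ones(M)
    for k in range(Nfine // R):
        dWk = dW[:, k * R:(k + 1) * R].sum(axis=1)  # coarse increment
        X = X + h * lam * X + mu * X * dWk
    errs.append(np.mean(np.abs(X - X_exact)))
    hs.append(h)

slope = np.polyfit(np.log(hs), np.log(errs), 1)[0]
print(slope)  # close to the reference slope 1/2
```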
Although the definition of strong convergence involves an expected value, it has implications for individual simulations. The Markov inequality says that if a random variable
has a finite expected value, then for any
the probability that
is bounded above by
, that is,
Hence taking
, we see that the perturbation method's strong convergence of order
is
or, equivalently,
This shows that the error at a fixed point in
is small with probability close to 1.
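The Markov bound is easy to check empirically; the sketch below uses exponentially distributed samples (a nonnegative choice, so |X| = X) purely as an illustration:

```python
import numpy as np

# Empirical check of the Markov inequality P(|X| >= a) <= E|X| / a,
# here for exponentially distributed (hence nonnegative) samples.
rng = np.random.default_rng(3)
X = rng.exponential(scale=1.0, size=100_000)
for a in (1.0, 2.0, 4.0):
    prob = np.mean(X >= a)     # empirical tail probability
    bound = np.mean(X) / a     # Markov bound from the sample mean
    print(prob <= bound)       # the bound holds for each a
```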
4.8. Conclusion
Stochastic approximations for the process
with parametric noise can be used to analyze aspects of noise. The result of the analysis is an approximation of the form
for
of the system. This can be used to evaluate the impact of parametric noise in the neural network. First, the system can be averaged. The system for
may or may not be analytically solvable, but we develop a numerical method for its solution. Second, the next-order term solves a linear system forced by a Gaussian process, whose statistics depend on the nature of the noise in the model.
Our systems are characterized by components that combine very fast and very slow behavior. Such systems require an adaptable step size, since only in certain phases do they require a very small step size. It is important to use an integration method that allows efficient step-size control. A system is called stiff when, integrated with an explicit algorithm and a local error tolerance
, the step size of the algorithm is forced down below a value indicated by the local error estimate, due to constraints imposed on it by the limited size of the numerical stability region. ode15s is a variable-order solver based on the numerical differentiation formulas (NDFs); it optionally uses the backward differentiation formulas (BDFs, also known as Gear's method). Like ode113, ode15s is a multistep solver. If one suspects that the problem is stiff, or if ode45 fails or is very inefficient, try ode15s. But for this slow-vs.-fast time scale problem, if we do not set a max step size for ode15s, the method does not give the correct result, as reflected in Figure 1.
Future work will address systems involving noisy neural networks and the impact of noise on a node's information-processing capability, which is determined by its signal-to-noise ratio and can be estimated by spectral methods.
Acknowledgements
The author would like to thank Prof. Samir K. Bhowmik for his opinions and discussions. The author also thanks the anonymous referees for their careful reading of the manuscript and useful suggestions that improved the paper significantly.