Reducing Stochastic Discrete Models of Biochemical Networks

Biochemical systems have numerous practical applications, in particular to the study of critical intracellular processes. Frequently, biochemical kinetic models depict cellular processes as systems of chemical reactions. Many biological processes in a cell are inherently stochastic, due to the existence of some low molecular amounts. These stochastic fluctuations may have a great effect on the biochemical system’s behaviour. In such cases, stochastic models are necessary to accurately describe the system’s dynamics. Biochemical systems at the cellular level may entail many species or reactions and their mathematical models may be non-linear and with multiple scales in time. In this work, we provide a numerical technique for simplifying stochastic discrete models of well-stirred biochemical systems, which ensures that the main properties of the original system are preserved. The proposed technique em-ploys sensitivity analysis and requires solving an optimization problem. The numerical tests on several models of practical interest show that our model reduction strategy performs very well.


Introduction
Modelling and simulation of cellular processes are subjects of significant interest in fields such as Computational and Systems Biology. Biological processes at the level of a single cell are commonly portrayed as systems of biochemical reactions.
The dynamics of various biochemically reacting systems may exhibit random fluctuations, due to some molecular species which exist in low amounts. These random fluctuations could have critical implications in biology [1] [2]. Then, biochemical networks have high dimension, and thus their numerical simulation is quite demanding. Moreover, numerous biochemical systems evolve on multiple time scales, leading to stiffness. Stiffness is a challenge for numerical simulations. A large number of the species and/or reactions of some biochemical systems encountered in applications imply that their mathematical models have a significant number of parameters. Usually, these parameters cannot be measured accurately. For many biochemical networks, it is difficult to determine the parts of the system which are critical in determining their behaviour. Therefore, it is critical to design accurate and robust strategies to simplify these complex biochemical systems, which maintain the key properties of the original network.
The reduced models will be easier to analyze and simulate numerically. In addition, they can be utilized to predict and control the behaviour of the system. Furthermore, the simplified models have a lower number of parameters, which thus become easier to determine. Reduction techniques for deterministic models of biochemical systems include species and reaction lumping methods [6] [7], sensitivity analysis based schemes [8] and time-scale analysis based strategies [9] [10]. Lumping schemes cause a loss of the physical interpretation of the elementary reactions and of information on some species or reactions. These schemes are useful when poor data exist about certain reactions. Time-scale analysis applies to systems with reactions that can be partitioned into fast and slow events, for which the fast components are in a quasi-steady state. For complex networks of biochemical reactions, finding the constraints for the fast dynamics may be a challenging task. Sensitivity analysis examines the dependence of the system's behaviour on its parameters. In particular, local sensitivity analysis of models of biochemical systems measures the variation of the system's state with small perturbations in model's parameters [11]. Sensitivity analysis makes it possible to identify and discard the reactions which are not essential, those with a negligible parametric sensitivity with respect to their rate parameters [8]. Few reduction techniques for stochastic models of biochemical systems exist in the literature.
These include [12] [13] [14] for discrete models and [15] [16] for continuous ones (see also references therein). Local sensitivity analysis quantifies the variations in the system's behaviour caused by small changes in its parameters. Our method relies on estimating (local) parametric sensitivities of the Chemical Master Equation model. These parametric sensitivities are approximated using the Coupled Finite Difference (CFD) scheme [17] and the Common Random Number (CRN) method [18]. Sensitivity analysis is performed to detect which parameters, and thus which reactions, are good candidates for elimination. It should be noted that local sensitivity analysis alone may not be a reliable tool for model reduction. It may indicate that a fast reaction can be eliminated, although that reaction is important. Elimination of a fast reaction means that the value of its rate parameter changes significantly. Nevertheless, local sensitivity analysis provides no information on the perturbations in the system's dynamics induced by large variations in its parameters. To correct this, local sensitivity analysis is coupled with a global approach to model reduction, which requires solving an optimization problem. For achieving this, we shall apply a strategy akin to that proposed in [8] for reducing continuous deterministic models of chemical reactions. This strategy ensures that the simplified biochemical system retains the stability and nonlinear properties of the original network.
The paper is organized as follows. In Section 2, we present the background on stochastic discrete models of homogeneous biochemical systems, and methods to simulate and estimate parametric sensitivities for these models. In Section 3, we propose a new model reduction strategy for the Chemical Master Equation.
The advantages of the proposed strategy are illustrated on three critical models arising in applications, namely the infection, the epidermal growth factor receptor signalling pathway and the gemcitabine biochemical systems, in Section 4.

Chemical Master Equation
be the conditional probability that the state vector at time t is ( ) x . Then, satisfies the Chemical Master Equation [3] ( ) where j ν is the state-change vector for reaction j R . The Chemical Master Equation is a refined stochastic discrete model of homogeneous biochemical systems.
It is worth mentioning that, generally, the CME is a high dimensional model.

Sensitivity Analysis
Sensitivity analysis is a key tool for investigating the robustness properties of a model with respect to perturbations in its parameters, for estimating these parameters or for guiding model reduction. Models of biochemically reacting systems depend on several parameters, such as the reaction rate parameters and the initial conditions. In many instances, the values of several such parameters are not available or cannot be evaluated accurately.
Here h is a small perturbation of the parameter c. In what follows, c represents a kinetic rate parameter.
be trajectories generated by Monte Carlo simulations, for the parameters c h + and c, respectively. Here n is the trajectory index (1 n L ≤ ≤ ) and L is the total number of trajectories. In this case, a finite-difference numerical approximation of the sensitivity, utilizing Monte Carlo simulations, is It is widely known that variance reduction strategies allow more precise esti-  [18] and the Coupled Finite-Difference (CFD) scheme proposed by Anderson [17]. For further information on the CFD method, using the random time change representation [19] of the Markov process ( ) t X , we refer the reader to [17]. The CRN method is presented below. Notice that the CRN and CFD techniques are based on exact Monte Carlo simulation schemes for the Chemical Master Equation.
Common Random Number (CRN) The common random number technique is utilized for estimating parametric sensitivities of stochastic discrete biochemical kinetic models. The CRN estimator has a low variance [18]. To estimate the sensitivity of the mean , with respect to a parameter c, two systems will be considered: the unperturbed system ( ) The CRN algorithm consists of the following steps: For each n L ≤ , with L being the number of sample paths, 2) Generate an SSA path for the unperturbed system (for parameter c), with a sequence of independent uniform (0, 1) random numbers, ( ) ( ) 3) Generate an SSA path for the perturbed system (for parameter c h + ), with the same sequence of independent uniform (0, 1) random number, ( ) ( )

5)
End loop over n. 6) Compute the mean of ( ) ( ) Remark that the perturbation h should be small enough to decrease the truncation error associated with finite-difference approximations of derivatives, but large enough to increase the convergence rate of the sensitivity estimation [17].
The CRN algorithm is a popular technique for sensitivity estimation, being efficient and easy to implement. The parametric sensitivity estimators discussed above is utilized to simplify complex stochastic models of homogeneous biochemical networks. Alternatively, a finite-difference scheme may be employed for estimating sensitivities by approximate Monte Carlo simulation techniques [20] [21], such as adaptive tau-leaping methods; these strategies are efficient on moderately stiff to very stiff models.

Model Reduction
In this section, we introduce a new procedure to simplify stochastic discrete biochemical networks, which are modelled with the Chemical Master Equation.
The goal of this procedure is to determine a reduced biochemical reaction system which retains the essential features of the full system, such as its stability features, nonlinear behaviour and physical interpretation of the elementary reac- tions. The reduced biochemical model will be faster to solve numerically and easier to investigate and understand. Local sensitivity analysis is applied to uncover the unimportant reactions of the full biochemical network. Also, it seeks to find the essential reactions, which should be retained in a simplified model.
Further, our strategy extends the global approach for continuous deterministic chemical system reduction proposed in [8]. Remark that the reduced network generated with the procedure presented below is expected to perform well over a variety of parameter values, in a neighbourhood of the values for which it was obtained.
Initially, a local sensitivity analysis is performed for the stochastic discrete model of the original biochemical system. Of interest are the kinetic parameters, These reaction channels should be kept in the system. Moreover, the sensitivity analysis also helps determine the reactions for which small perturbations in their rate parameters cause insignificant changes in the overall behaviour of the system. More precisely, the kinetic parameter i c of such a reaction satisfies for all the quantities of interest k X . These reactions could be unimportant, and may be considered for deletion. Stability is a desired property of a simplified reaction network and a reduction technique based on local sensitivity analysis of the original system alone cannot guarantee it. As a consequence, we shall use a global approach to ensure that our strategy leads to a stable reduced system. For this, we consider the following continuous optimization problem for deterministic models (see also [8] Note that x and z are the state vectors of the original and the reduced biochemical systems, respectively, modelled with the reaction rate equations. Also, ν is the stoichiometric matrix, ( ) Remark that the optimization problem is formulated in a continuous deterministic framework. Empirically, we found that posing an optimization problem for the reaction rate equations, rather than for the much more challenging Chemical Master Equation model, yields an accurate reduction mechanism for biochemical systems with a low to moderate level of noise. Moreover, this procedure can handle efficiently large biochemical networks as well.
For solving the optimization problem (4), we utilize GEKKO [22], an optimizer toolbox. The modes of operation of GEKKO include parameter regression, data reconciliation, real-time optimization, dynamic simulation, and nonlinear predictive control. For our reduction procedure, we apply dynamic estimation with the APOPT solver. After the optimization problem is solved, we isolate the coefficients i d which are very small and set them to 0. This means that the associated reactions will be deleted from the biochemical system. The remaining

Numerical Examples
In this section, we illustrate the advantages of the proposed model reduction

Infectious Disease Model
The infectious disease model is a simple example of a two species reaction network [23]. The reaction channels of the infectious disease model and their rate parameters are given in Table 1 The parametric sensitivities with respect to c 2 are estimated using the CRN and the CFD methods on 10,000 sample paths. Figure 1 depicts the evolution in time of the fully normalized sensitivities of the species S 1 and S 2 , with respect to the kinetic parameter c 2 . The approximations obtained with the two finite-difference sensitivity estimators, the CFD and the CRN algorithms, are in good agreement.
We note that, for i = 1 and 2, the normalized sensitivities with respect to c 2 are very small. Thus, reaction R 2 is a good candidate for elimination, and this is validated by solving the optimization problem.
The full and reduced (without reaction R 2 ) models are simulated with the SSA on 10,000 trajectories. The means and standard deviations of the molecular counts of S 1 and S 2 as functions of time, for the full and reduced models, are plotted in Figure 2 and Figure 3, respectively. The match between the numerical results for the reduced, in which reaction R 2 is deleted, and the full models is excellent in each case.

Epidermal Growth Factor Receptor (EGFR) Model
The epidermal growth factor receptor signalling pathway participates in cell differentiation and proliferation. The role of the epidermal growth factor in cancer cell proliferation has recently been the subject of intense research. The biochemical reaction network which models the EGFR signalling pathway consists of 23 molecular species undergoing 47 reaction channels [24].
The EGFR model reactions and their rate parameter values are given in Table   2. The molecular species and their initial molecular counts are listed in Table 3.

Gemcitabine Model
The final numerical experiment is performed on the gemcitabine (2,2-difluorodeoxycytidine, dFdC) biochemical network, which is a critical, real-world model [26] [27] designed to study the mechanisms of resistance to gemcitabine effectiveness. Gemcitabine is a chemotherapy drug, commonly administered for the   To approximate its parametric sensitivities, we apply the common reaction number (CRN) algorithm with 1000 trajectories. At this step, we select some reactions which local sensitivity analysis predicts to be possible choices for deletion. These reactions are then considered an initial collection, when solving the optimization problem for this system. The model has many parameters and finding the global minimum in its large-dimensional parameter space is a challenge. Therefore, we use sensitivity analysis to determine a set of reactions deemed critical in driving the behaviour of the system. For this model, we selected 14 important reactions (and set their 1 i d = ) out of a total of 29. Also, we choose the following parameter values for the optimizer: 2 = and 23 u = and 1 r ≈ . Solving the optimization problem suggests that reactions 2 3 12 14 16 , , , , R R R R R and 18 R may be eliminated without a significant impact on the accuracy and stability properties of the model.
To validate the solution of the optimizer, we apply the SSA with 1000 runs to simulate both the reduced (resulted after turning off reactions 2 3 12 14 16 , , , , R R R R R and 18 R ) and the original biochemical network. In Figure 7 and Figure 8, we compare the plots of the time-evolution of the the means and standard deviations for the molecular counts predicted by the full and the reduced model. In the reduced network, the following reactions were eliminated: 2 3 12 14 16 , , , , R R R R R

Conclusions
In the present work, we proposed a numerical procedure for model reduction of homogeneous, discrete stochastic biochemical networks. The dynamics of these networks are governed by the widely used model of the Chemical Master Equation. This model has a diverse range of critical practical applications. Numerous biochemical networks arising in practice are complex, consisting of a large number of species subject to many reaction channels. In addition, the network of interactions of some components of the biochemical system may be quite complicated. Furthermore, biochemical processes present at the cellular level typically entail slow and fast reactions, meaning that their mathematical models are stiff.
Stiffness and/or high dimensionality constitute significant challenges for numerical simulation and analysis of these models.
The model reduction procedure developed in this paper utilizes sensitivity analysis of stochastic discrete models of biochemical systems and requires solving a nonlinear optimization problem. The procedure is expected to have an excellent performance on a broad class of biochemical systems, with moderate levels of noise, and various degrees of stiffness. We tested the reduction methodology described above on three realistic biochemical networks, two of these having applications to cancer research. The numerical experiments carried out on these models show that the reduced biochemical network retains the properties of the original version while preserving its overall behaviour.