Constrain-based analysis of gene deletion on the metabolic flux redistribution of Saccharomyces Cerevisiae

Based on the gene-protein-reaction (GPR) model of S. cerevisiae_iND750 and the method of constraint-based analysis, we first calculated the metabolic flux distribution of S. cerevisiae_iND750. Then we calculated the deletion impact of 438 calculable genes, one by one, on the metabolic flux redistribution of S. cerevisiae_iND750. Next we analyzed the correlation between v (describing deletion impact of one gene) and d (connection degree of one gene) and the correlation between v and Vgene (flux sum controlled by one gene), and found that both of them were not of linear relation. Furthermore, we sought out 38 important genes that most greatly affected the metabolic flux distribution, and determined their functional subsystems. We also found that many of these key genes were related to many but not several subsystems. Because the in silico model of S. cerevisiae_iND750 has been tested by many experiments, thus is credible, we can conclude that the result we obtained has biological significance.


INTRODUCTION
Since various 'omics' datasets are becoming available, biology has transited from a data-poor to a data-rich environment.This has underscored the need for systems analysis in biology and systems biology has become a rapidly growing field as well [1].
A change in mathematical modeling philosophy, i.e., building and validating in silico models is also necessitated.Modern biological models need to meet new sets of criteria: organism-specific, data-driven, easily scalable, and so on.Many modeling approaches, such as kinetic, stochastic and cybernetic approaches, are currently being used to model cellular processes.Owing to the computational complexity and the large number of parameters needed, it is currently difficult to use these methods to model genome-scale networks.To date, genome-scale models of metabolism have only been analyzed with the constraint-based modeling philosophy [2,3].Genome-scale network models of diverse cellular processes such as signal transduction, transcriptional regulation and metabolism have been generated.Gene-protein-reaction (GPR) associated models can keep track of associations between genes, proteins, and reactions [4], and there have been several genome-scale GPR models, such as E. col i [4,5] , S. aureus [6], H. pylori [7], M. barkeri [8], S. cerevisiae [9] and B. subtilis [10].A reconstruction is herein defined as the list of biochemical reactions occurring in a particular cellular system and the associations between these reactions and relevant proteins, transcripts and genes [2].A reconstruction can include the assumptions necessary for computational simulation, such as maximum reaction rates and nutrient uptake rates.
Computer simulations of complex biological systems began essentially as soon as the computational capability became available.As reconstructed networks have been made publicly available, researchers around the world have undertaken new computational studies using these networks.Many researches apply a core set of basic in silico methods and often also describe novel methods to investigate different models.An extensive set of methods for analyzing these genome-scale models have been developed and have been applied to study a growing number of biological problems.As we have mentioned above, genome-scale models of metabolism have only been analyzed with the constraint-based philosophy [2,3].
The in silico models can be applied to generate novel, testable and often quantitative predictions of cellular behavior.The impact of a gene deletion experiment on cellular behavior can be simulated in a manner similar to linear optimization of growth.The results can be used to guide the design of informative confirmation experiments and will be helpful to metabolic engineering.Some gene deletion studies on genome-scale in silico organism models have been made [4][5][6][7][8][9][10], most of them are from the standpoints of distinguishing lethal and
In this paper, there are four parts.In part two, as a base for later research, we firstly calculate flux distribution of S. cerevisiae_iND750.Then we will calculate the deletion impact of 438 calculable genes, one by one, on themetabolic flux redistribution of S. cerevisiae_ iND750.Next we will analyze the correlation between v (describing deletion impact of one gene) and d (connection degree of one gene) and the correlation between v and gene v (flux sum controlled by one gene).Furthermore, we will seek out those important genes that most greatly affected the metabolic flux distribution, determine their functional subsystems.Because the in silico model of S. cerevisiae_iND750 has been tested by many experiments [5] , we can conclude that the result we got has biological significances; In part three, we introduce the GPR model, some properties of the in silico model of S. cere-visiae_iND750 (SBML format) and the method of constraint-based analysis; Part four is a simple discussion.

Metabolic flux distribution and of S. cere-visiae_iND750
As a base for the later compared research, we here calculated the flux distribution of S. cerevisiae_iND750 [9].What we use is Sc_iND750_GlcMM.xml, the SBML file that is presented with the reconstruction of S. cere-visiae_iND750 [11].The computational method we used is flux balance analysis (FBA) [11], one of the most fundamental genome-scale phenotypic calculations, which can simulate cellular growth.FBA is based on linear optimization of an objective function, which typically is bio-mass formation.Given an uptake rate for key nutrients and the biomass composition of the cell (usually in mmol component gDW -1 and defined in the biomass objective function), the maximum possible growth rate of the cells can be predicted in silico.We use the COBRA toolbox [11] to carry out this computation of FBA.The flux distribution of S. cerevisiae_iND750 is illustrated in ).Rxns is the reaction set in the model.

Impact of gene deletion on the metabolic flux redistribution
There are 750 genes in the model of S. cere-visiae_iND750, but we can not calculate the impact of every gene deletion.If a single gene is associated with multiple reactions, the deletion of that gene will result in the removal of all associated reactions.On the other hand, a reaction that can be catalyzed by multiple non-interacting gene products will not be removed in a single gene deletion.Among 750 genes of S. cerevisiae_ iND750, there are 438 genes which have no "OR" relationship with other genes in every reaction of S. cere-visiae_iND750, and by the aid of the COBRA toolbox [11], we can calculate the impact of their deletion.We define the impact of one gene deletion on the whole metabolic flux redistribution as v Where i v and i v′ are respectively the flux value of i-th reaction of the model of S. cerevisiae_iND750 before and after a single gene deleting, and R is the whole reaction set.
Figure 2 shows the deletion impact of these 438 genes.From the figure, we can't see what are important genes which greatly affect, if deleted, the metabolic redistribution of S. cerevisiae_ iND750.This remains as a problem, and we will settle it in section 4).In the following, we will analyze the relationship between the impact of every gene deletion (v) and the connection degree of every gene (d) and the connection degree of every gene ( gene v ).

Correlation between v and d (connection degree of every gene)
We compute out the related reaction number d of every gene in those 438 genes of the S. cerevisiae_iND750 model, as illustrated in Figure 3. From the figure, we can find that some but not many genes have high d value, but we don't know whether they affect metabolic flux distribution greatly.

Key genes that affect metabolism most greatly
In this section, we seek out what are important or key genes which greatly affect the metabolic redistribution of S. cerevisiae_iND750, and furthermore in next section, we will give their belonged functional subsystems.Table 1 provides the corresponding relationship between gene number (GN) and v scope, and as an example, GN=100 & v= (0-0.1)×10 7means that there are 100 genes while the v scope which these genes control is (0~0.1)×10 7.
We define those genes with v>3.0×10 7 as key genes, and there are 38 genes.

Functional subsystems to which these key genes belong
If a gene catalyze a reaction and while the reaction belong to a certain subsystem, we will say that the gene belong to the subsystem.Functional subsystems about important genes in the metabolic system of micro organism are seldom reported.We list the functional subsystems to which every key gene belong, and several genes appear in more than one subsystem, shown in Table 2.
We will find that many of 38 key genes are related to 26 but not several subsystems.

Gene-protein-reaction (GPR) associated model
The association between genes and reactions is not a one-to-one relationship.Many genes may encode suunits genes that encode so-called promiscuous enzymes that can catalyze several different reactions.So it is necessary to keep track of associations between genes, proteins, and reactions and to distinguish "&" and "OR" associations in GPR models.Examples of different types of GPR associations are illustrated in the figures of Ref. [4,14].

GPR model structure of S. cerevisiae_ iND750
The in silico model that we used is S. cerevisiae_iND750 [9], a metabolic reconstruction consisting of the chemical reactions that transport and interconvert metabolites within yeast.This network reconstruction was based on a previous reconstruction, termed S. cerevisiae_iFF708 [15].The general features of S. cerevisiae_iND750 are shown in the Table 1 of Ref. [9].A SBML format file to the model S. cere-visiae_iND750 can be downloaded from the supplementary information of Ref. [11].SBML file properties are given in the supplementary of Ref. [9].The dimensions of rxns, mets, and genes are respectively 1266, 1061, 750.We can use Cytoscape [12] to draw the GPR-network (Figure 7) of S. cerevisiae_ iND750, which contains 3077 nodes and 6666 edges.
The minimal media of in silico model is an important aspect.The computational minimal media of S. cere-visiae_iND750 is also included in Ref. [9].In the method of constraint-based analysis, the biomass objective function (BOF) should be defined.The BOF was generated by defining all of the major and essential constituents that make up the cellular biomass content of S. cerevisiae [9].Gene-protein-reaction associations embodied in rxnGeneMat matrix, which is a matrix with as many rows as there are reactions in the model and as many columns as there are genes in the model.The i-th row and the j-th column contains a one if the j-th gene in genes is associated with the i-th reaction in rxns and zero otherwise.

Constraint-based analysis
In silico modeling and simulation of genome-scale biological systems are different from that practiced in the physicochemical sciences.A network can fundamentally have many different states or many different solutions.Which states (or solutions) are picked is up to the cell and based on the selection pressure experienced, and such choices can change over time.Therefore, constraint-based approaches [2,3] to the analysis of complex biological systems have proven to be very useful.This difference between the physicochemical sciences and the physical sciences or engineering is illustrated in Ref. [14].All theory-based considerations (i.e., engineering and physics) lead one to attempt to seek an "exact" solution, and typically computed based on the laws of physics and chemistry.However, constraint-based considerations (as in biology) are useful.Not only can a network have many different behaviors that are picked based on the evolutionary history of the organism, but also these networks can carry out the same function in many different and equivalent ways.

Representation of reconstructed metabolic network
Before calculation and simulation, the reconstructed metabolic network must be represented mathematically.The stoichiometric matrix, S, is the centerpiece of a mathematical representation of genome-scale metabolic networks.It represents each reaction as a column and each metabolite as a row, where each numerical element is the corresponding stoichiometric coefficient.A graphical form of the first few reactions of glycolysis and the corresponding stoichiometric matrix are shown in the Figure 2 of Ref. [11].
An upper and lower bound for the allowable flux through each reaction also requires defining.This represents the lowest and highest reaction rate possible for each reaction.The set of upper and lower bounds is represented as two separate vectors, each containing as many components as there are columns in S, and in the same order.An example is shown in Figure 2 of Ref. [11].In many cases, reversible reactions are defined to have an arbitrary large upper bound and an arbitrarily large negative lower bound.Irreversible reactions have a lower bound that is nonnegative, usually zero.
In order to predict meaningful fluxes, setting upper and lower bounds is especially important for exchange reactions which serve to uptake compounds to the cell or secrete compounds from the cell.The lower bound of the exchange reaction column must be a finite negative number using this orientation (e.g., glucose).The upper bound of the exchange reaction column must be greater than zero.At least one of the reactions in the model must have a constrained lower/upper bound, and typically, the substrate (e.g., glucose or oxygen) uptake rates are set to experimentally measured values.The upper and lower bounds for exchange reactions are quantitative in silico representations of the growth media environment.

Biomass objective function (BOF) and minimal media
The constraint-based approach is based on the assumption that cells strive to maximize their growth rate.This assumption which provides an acceptable starting point for many types of computations is satisfied by simulating maximal production of the molecules required to make new cells (biomass precursor molecules).In spite of their limitations, the predictive power of genome-scale models of metabolic networks has been demonstrated in diverse situations through careful experimentation [11].
The biomass objective function (the function v growth , see below) is a special reaction taking as substrates of all biomass metabolites, ATP and water, and producing ADP, protons, and phosphate (as a result of the non-growth associated ATP maintenance requirement) [6].
The minimal media is determined computationally with the systematic testing of distinct inputs.Different combinations of molecules were allowed to enter the reaction network until the minimal group that allowed biomass production, or non-zero Z (see below), was found [6].It is only concerned that some amount of biomass production is calculated but do not discriminate between extremely slow, inefficient growth and rapid growth.

Computation of phenotypic states
In genome-scale metabolic networks, the fluxes within a cell usually cannot be uniquely calculated because a range of feasible values exist when fluxes are subjected to known constraints.Flux-balance analysis (FBA) was used to find optimal growth phenotypes.Briefly, a large-scale linear programming was used to find a complete set of metabolic fluxes (v) that are consistent with steady-state condition (eq. 3) and reaction rate bounds (eq.4), and at the same time maximize the biomass objective function in the defined ratio.This corresponds to the following linear programming problem [6]: ) where S is the stoichiometric matrix, and where α i and β i define the bounds through each reaction v i .The flux range was set arbitrarily high for all internal reactions so that no internal reaction restricted the network, with the exception of irreversible reactions, which have a minimum flux of zero.The inputs to the system were restricted to a minimal media.
The value of Z computed with the above procedure can either be zero (predicting no growth) or greater than zero (corresponding to cellular growth) depending on the inputs and outputs that are allowed, according to the nutrients provided in the media.

Gene deletion study
The effect of a gene deletion experiment on cellular

SciRes Copyright © 2008 JBiSE
growth can be simulated in a manner similar to linear optimization of growth [5,11].Gene-reaction associations model the logical relationship between genes and their corresponding reactions.If a single gene is associated with multiple reactions, the deletion of that gene will result in the removal of all associated reactions, i.e. to simultaneously restrict the fluxes (upper and lower flux bounds) of these reactions to zero prior to computing maximal biomass objective function.On the other hand, a reaction that can be catalyzed by multiple non-interacting gene products will not be removed in a single gene deletion.The possible results from a simulation of a single gene deletion are unchanged maximal growth (non-lethal), reduced maximal growth or no growth (lethal).Those genes were considered essential if no biomass could be produced without their usage.

DISCUSSION
Based on the gene-protein-reaction model of S. cere-visiae_iND750 and the methodology of constraint-based analysis and by the aid of the COBRA toolbox and MATLAB software, we have calculated the deletion impact of 438 calculable genes, one by one, on the metabolic flux redistribution of S. cerevisiae_iND750.We found that both of the v-d correlation and the vgene v correlation were not of linear relation.Although some properties about the metabolic network of microorganisms have been reported in literatures [15][16][17][18][19], our research will provide further evidences to the properties about the metabolic network, because the measure we defined is different.
Furthermore, we sought out 38 important genes that most greatly affected the metabolic flux distribution, determined their functional subsystems and found that many of 38 key genes were related to many but not several subsystems.From these results, we speculate that many but not several subsystems are important subsystems in the metabolism of S. cerevisiae and that this may increase the robustness of the metabolic network.
As a next step, we will do similar research on other organisms and compare them with the case of E. coli.Although it is theoretically possible to attempt double deletion of every possible gene pair experimentally, the sheer number of possible two-gene deletions makes this virtually impossible.However, computational predicttions of double gene deletion phenotypes can be made in a matter of hours [11].This will also become our work in the future.

Figure 1 .
Figure 1.Flux distribution of S. cerevisiae _iND750.X-axis indicating every reaction in rxns (the order is as the same as in rxns, total 1266) and y-axis indicating the value of its corresponding flux (unit is mmol gDW -1 h -1

Figure 2 .Figure 3 .Figure 4 .
Figure 2.The deletion impact of calculable 438 genes of the S. cerevisiae_iND750 model.X-axis indicating every gene in 438 genes (the order is as the same as in genes, total 438) and y-axis indicating its impact v.

Figure 4
Figure4is the scatters diagram (d, v), total 438 data pairs.From the diagram, we can easily find that the relationship between d and v is not of linear relation.So high-d genes and low-d genes are equally important to the metabolism of S. cerevisiae_iND750.2.2.3.Correlation between v and gene v

Figure 6 Figure 5 .Figure 6 .
Figure5.The controlled reaction number of every gene in 438 genes of the S. cerevisiae_iND750 model.X-axis indicating every gene in 438 genes (the order is as the same as in genes, total 438) and y-axis indicating the number of its controlled reactions.

Table 1 .
Gene number (GN) and v scope v (

Table 2 .
The functional subsystems and their related genes of S. cerevisiae_iND750