Efficient Numerical Optimization Algorithm Based on New Real-Coded Genetic Algorithm, AREX + JGG, and Application to the Inverse Problem in Systems Biology

In Systems Biology, system identification, which infers regulatory network in genetic system and metabolic pathways using experimentally observed time-course data, is one of the hottest issues. The efficient numerical optimization algorithm to estimate more than 100 real-coded parameters should be developed for this purpose. New real-coded genetic algorithm (RCGA), the combination of AREX (adaptive real-coded ensemble crossover) with JGG (just generation gap), have applied to the inference of genetic interactions involving more than 100 parameters related to the interactions with using experimentally observed time-course data. Compared with conventional RCGA, the combination of UNDX (unimodal normal distribution crossover) with MGG (minimal generation gap), new algorithm has shown the superiority with improving early convergence in the first stage of search and suppressing evolutionary stagnation in the last stage of search.


Introduction
A system is not just an assembly of more than two components which have unique function and show timevariant behavior.Since there are principles that govern at the system-level, mere collection of database of components does not lead to the understanding of system's functional property.The aim of systems biology is to understand how functional properties and behavior of living organisms are brought about by the interactions of their constituents such as genes, proteins, metabolites and so on.The research strategies of systems biology can be divided into the following four fields: 1) System identification: inference of interaction between system components, 2) System analysis: dynamics of time-variant components, 3) System control: control toward the desirable condition of the system, 4) System design: design the system which realizes a certain dynamic or timevariant behavior, which expands to the research field, "Synthetic Biology".These research developments are strongly supported by mathematical and computational methodologies, such as inference algorithm, statistical analysis, numerical calculation, nonlinear optimization, computer simulation and so on.Especially at the research field of systems identification, in order to infer the interaction among systems components, we have to develop the powerful inferring engine, in which large numbers of real-coded parameter values related to the interactions can be estimated efficiently using experimentally observed time-course data; algorithm of powerful numerical optimization for inverse problem involving more than 100 numbers of numerical parameters.
Organizationally complex systems such as gene regulatory networks and metabolic pathways are composed of many interacting components.In the many cases, the details of the molecular mechanism that govern interactions among system components are not well known, however, how do we represent mathematical model of such complex processes?most of theses processes are nonlinear.Description of these processes requires a representation that is general enough to capture the essence of the experimentally observed response.One of the best approaches that satisfy this requirement is the "S-system [1]" which is a type of power-law formalism because it is based on a particular type of ordinary differential equation in which the component processes are characterized by power-law functions (see Section 2).The S-system is suitable for conceptual modeling and describing organizationally complex systems including loop or cyclicstructure between system components.However, S-system formalism has a major disadvantage; this formalism includes large number of parameters must be estimated.This number is 2n (n + 1), where n is the number of system components.
Tominaga et al. have developed the numerical optimization method based on the simple genetic algorithm (SGA) in S-system formalism [2].However, the SGA has two problems that early convergence in the fast stage of search and evolutionary stagnation in the last stage of search.Then, real-coded genetic algorithms (RCGAs) [3] have attracted attention as alternative numerical optimization methods to the SGA.One of the crossover operators for RCGA, known as the unimodal normal distribution crossover (UNDX) [4,5], has shown good performance in optimizing various functions including multimodal ones and benchmark functions with epistasis among the parameters.Sato et al. have proposed a new generation-alternation model called the minimal generation gap (MGG) [6,7], to improve early convergence in the first stage of search and to suppress evolutionary stagnation in the last stage of search.So far, we have developed efficient numerical optimization method using S-system modeling and RCGAs, with a combination of the UNDX and the MGG.This method can be applied to not only the estimation of gene networks but also the estimation of metabolic regulatory network.Shikata et al. have applied this method to the estimation of the amino acid network [8].However, this method has not been able to estimate 20 amino acids' network (n = 20), because this method converges slowly in the early stage of search, followed by the stagnation of optimization.
Therefore, in order to achieve the estimation of the larger network involving more than 100 real-coded parameters, we have developed a new estimation method using RCGA.The research group of Kobayashi has proposed just generation gap (JGG) [9,10], modified MGG.Akimoto et al. have reported that a combination of the adaptive real-coded ensemble crossover (AREX) with JGG showed the superiority in most of the benchmark functions [11].In this study, we have applied new RCGA, AREX + JGG, to the inverse problem in Systems Biology, that is inference of regulatory network in genetic system and metabolic pathways for realizing experimentally observed time-course data of system components, and have compared with conventional RCGA, UNDX + MGG.

S-System Formalism
The S-system [1] is a type of power-law formalism because it is based on a particular type of ordinary differential equation in which the component processes are characterized by power-law functions: where n is the total number of state variables or reactants (x i ), i, j (1 ≤ i, j ≤ n) are suffixes of state variables.The terms g ij and h ij are interactive effectivity of x j to x i .The first term represents all influences that increase x i , whereas these cond term represents all influences that decrease x i .In a genetic network context, the non-negative parameters α i and β i are called relative inflow and outflow of gene x i , respectively, and real-valued exponents g ij and h ij are referred to the interrelated coefficients between genes x j and x i .Given the experimentally observed time-course data of x i , the task is to estimate the values of interrelated coefficients g ij , h ij and non-negative constant α i and β i , which can be realized experimentally observed time-course of x i .The S-system formalism has a major disadvantage in that this formalism includes large number of parameters that must be estimated (α i , β i , g ij , h ij ); the number of estimated parameters in S-system formalism is 2n (n + 1), where n is the number of state variables (x i ); the number of estimated parameters increases with the second power of the number of system component (n).

Optimization Procedure
Because the S-system is a formalism of ordinary nonlinear differential equations, the system can easily be solved numerically by using a numerical calculation program to be customized specifically for this structure.However, when an adequate time-course of relevant state variables is given, the set of parameter values α i , β i , g ij and h ij , in many cases, will not be uniquely determined, as it is highly possible that other sets of parameter values will also show a similar time-course.Therefore, even if one set of parameter values that explains the observed time-course is obtained, this set is still one of the best candidates to explain the observed time-courses.Our strategy is to explore and exploit these candidates within the immense searching space of parameter values.
In this problem, each set of parameter values to be estimated is evaluated using the following procedure: Suppose that cal where D is the total number of data sets observed under different experimental conditions, such as gene disruption and over expression, N is the number of experimentally observable state variables, and T is the number of sampling points over time in one experimental condition.The computational task is to determine the set of parameter values that minimizes the objective function E.

Real-Coded Genetic Algorithms (RCGAs)
The genetic algorithm (GA) is stochastic search and optimization technique based on the mechanism of biological evolution, such as natural selection and natural genetics.The GA can seek out the "best" solution which will be found in regions of the search space containing a relatively high proportion of "good" solutions, that is, the GA can escape from trapping in local minima.The GAs employing real-valued vectors for representation of the chromosomes is called RCGAs [3].The RCGAs are one of promising evolutional algorithms and are applied to several real world problems and have shown good results [12,13].We have developed an efficient computational technique based on the RCGA called UNDX + MGG [14][15][16] and have applied to the inference of genetic interaction.In this study, we propose a new method using the RCGA, the combination of AREX (adaptive real-coded ensemble crossover) with JGG (just generation gap).

UNDX
The UNDX [4,5], unimodal normal distribution crossover, generates offspring using the normal distribution defined by three parents, as shown in Figure 1.Offsprings are generated around the line segment, which is connecting two parents, P 1 and P 2 .The third parent, P 3 , is used to determine the standard deviation of the normal distribution.The standard deviation, which corresponds to the coordinate axis along the line segment, is proportional to the distance between P 1 and P 2 .The others are proportional to the distance of P 3 from the line segment and multiplied by 1 l , where l is the number of parameters must be estimated; 2n (n + 1) for S-system.The effect of 1 l is to keep the distribution of offspring around line segment against increasing the number of parameters must be estimated.The mathematical representation of UNDX is as follows: where c 1 and c 2 are offspring vector, P 1 and P 2 are parent vectors, m is the middle point of P 1 and P 2 , d 1 is the distance between P 1 and P 2 , d 2 is the distance of the third parent, P 3 , from the line connecting P 1 to P 2 .α and β are constants given by the user, and α = 0.5 and β = 0.35 are recommended.The e 1 is the basis vector in a direction parallel to the coordinate axis along the line segment, and vectors e i (i = 2, 3, •••, l), which are linearly independent basis vectors perpendicular to the vector e 1 , are defined by Gram-Schmidt procedure for finding orthogonal vectors.

MGG
The MGG [6,7] is the generation-alternation model.Figure 2 shows the conceptual figure of the MGG the procedure of which can be summarized as follows:

1) Generation of initial population
Make an initial population that is composed of random real number vectors.
2) Selection for reproduction Select a pair of individuals by random sampling without there placement from the population.The selected pair of individuals becomes the parents of offspring.
3) Generation of offspring Generate offspring by applying crossover to the parents and evaluate the offspring.
4) Selection for survival Select two individuals from the family containing the parents and their offspring; one is the best individual and the other is chosen using a roulette-wheel selection.Replace the parents chosen in step 2 in the population with the two selected individuals.
5) Repeat the above procedures from step 2 to step 4 until a certain stop condition is satisfied.

AREX
The AREX [11], adaptive real-coded crossover, is a crossover has been developed to improve the premature convergence in early stage.Akimoto et al. has proposed the AREX based on the REX [10,17], real-coded ensemble crossover, which is the generalization of UNDX-m [18].The AREX generates λ candidate points c i (1 ≤ i ≤ μ; μ is the number of parents) around the center   1 1 j j  of μ parents P j (1 ≤ i ≤ μ), as shown in Figure 3.And the mathematical representation of the REX is as follows: where α is the expansion rate, all , i j are independently and identically distributed (i.i.d.) random numbers with the average 0 and the variance  the normal distribution with mean vector 0 and covariance matrix Equation (5).
The REX shows the good performance in ill-conditioned and variable dependent problems because of the invariance against scale transformations and rotation transformations.However, Akimoto et al. have reported that there are the situations where the REX is likely to cause early convergence [11].Furthermore, being combined the adaption of expansion rate (AER) technique and the crossover mean descent (CMD) technique with the REX, they have proposed the AREX to solve the early convergence.
The AER is the expansion rate which is controlled in the middle of the ongoing search process in AREX [11,17].The CMD is a technique to shifts the center of the crossover to promising offspring point.The offspring are generated around the weighted average based on the ranking of evaluation values in parents.The CMD, m, is as follows: where is the weight coefficient under the condition of . 1 The AREX is the crossover applying the CMD and the AER to the REX.The mathematical representation of the AREX is as follows:

JGG
The JGG was proposed for multi-parental crossovers [9,10], and it is a modification model of MGG in two major points.First, the JGG does not employ elitist selection in evolution strategies.This modification makes a good effect on multimodal problems.Second, the JGG replaces parents with children completely in every generation.Figure 4 shows the conceptual figure of the JGG, and the procedure of the JGG can be summarized as follows: 1) Generation of initial population Make an initial population that is composed frandom real number vectors.
2) Selection for reproduction Select randomly μ parents from the population.The selected μ of individuals becomes the parents of offspring and μ = l + 1 is recommended [9,10], where l is the number of parameters must be estimated.
3) Generation of offspring Generate offspring by applying crossover to the parents and evaluate the offspring.
4) Selection for survival Select the top ranked μ individuals with respect to the fitness value from the offspring.And replace the parents chosen in step 2 in the population with the μ selected individuals.
5) Repeat the above procedures from step 2 to step4 until a certain stop condition is satisfied.

Results
We prepared an S-system network models composed of six genes, as shown in Figure 5, and of seven genes, as shown in Figure 6.In these figures, the arrow and T bar represent active and inhibitory interactions, respectively, and the numerals show the value of g ij in S-system formalism.The S-system representation of Figures 5 and  6 are shown in Table 1.We applied the conventional RCGA method (UNDX + MGG) and the proposed method (AREX + JGG) for inferring gene expression network candidates to these artificial gene regulatory network models.We prepared 7 types of time-course data (one a "wild-type" and six "one gene-disrupted strain") for six genes network model (Figure 5) and 8 types of time-course data (one a "wild-type" and seven "one gene-disrupted strain") for seven genes network model (Figure 6), respectively.The number of sampling points in each time-course data set was 70.We calculated time-course data sets for the "one gene-disrupted strain" with the rate constant for the synthetic process of disrupted gene i(α i ) set to 0.
Using these networks, we prepared 4 case studies to compare the performance of UNDX + MGG and AREX + JGG.The case studies 1 to 3 are for the estimation of the parameters of six genes network model, and case study 4 is for the estimation of the parameters of seven genes network model.In each case, the parameters must be estimate dare shown in Table 1.

Case Study 1
First, in order to compare the convergence speed of two RCGA methods (UNDX + MGG, AREX + JGG), we performed the estimation of parameters in case study 1.In this experiment, we used the recommended default parameter [11,17] of RCGAs as follows: in UNDX + MGG, population size n pop was set to 300, the number of offspring n c was set to 100, the number of parents n p was set to 2, and in AREX + JGG, n pop was set to 200, n c was set to 80, n p was set to 21.We regarded the optimization terminated successfully when the total relation error E in Equation ( 2) is less than 0.1.We performed 50 trials, in each RCGA, and evaluated the performances with the average of 50 trials.Figure 7 shows the average of 50 trials in E of each generation.As shown in Figure 7, it is obvious that the convergence speed of AREX + JGG is faster than UNDX + MGG.In all trials, both methods can be estimated the correct values of parameters shown in Figure 5. Comparing the CPU time (CPU: XeonE5540 2.53 GHz, Memory: 8.0 GB), AREX + JGG was approximately 20-times faster than UNDX + MGG; the CPU time for AREX + JGG was 38.6 sec and that for UNDX + MGG was 765.0 sec.This experiment shows that AREX + JGG is remarkably better than UNDX + MGG in the convergence speed.when I ≠ j) g ij = 0.0 (when i = j), h ij = 1.0 (when i = j), h ij = 0.0 (when i ≠ j), α i = 1.0, β i = 1.0 Case study 2 54 h ij = 0.0 (when i ≠ j) Six-gene network model Case study 3 84

None
Seven-gene network model Case study 4 112 None maximum limit of generation (100,000).On the other hand, AREX + JGG succeeded the estimation of all parameters except when n pop was set to 20l.Especially, AREX + JGG showed the best performance when n pop was set to 30l.

Discussion
In this study, we proposed new inferred engine for estimating the gene regulatory network using time-courses of gene expression data using real-coded genetic algorithm (RCGA) with combination of the adaptive realcoded ensemble crossover (AREX) and the just generation gap (JGG).The combination of unimodal normal distribution crossover (UNDX) with the minimal generation gap MGG has disadvantage in convergence and can estimate only a few parameters.The AREX works well in ill-conditioned and variable dependent problems, because the AREX has the invariance against scale transformations and rotation transformations [11].The JGG is suited for multi-parental crossovers.The JGG is a modification of the MGG in the points that replaces parents with offspring completely every generation and does not employ elitist selection.Akimoto et al. reported the AREX + JGG outperforms the existing RCGAs in a number of function evaluations on various functions including functions whose landscape forms ridge structure or multi-peak structure.In this study, AREX + JGG showed remarkably better performance (convergence speed) than UNDX + MGG in all experiments.Table 2 shows that AREX + JGG could solve the problems in which UNDX + MGG could not do that, and suggests that the adequate default parameter of the population size for AREX + JGG is 30l, where l is the total number of parameters.By employing AREX + JGG, we succeeded in improving the performances in the point of the convergence speed and the scalability in the number of estimated parameters; compared with UNDX + MGG, the

Case Study 2, 3, 4
Next, in order to evaluate the scalability of optimization, the number of parameters to be estimated increased in case studies 2 to 4. The default parameter of n c and n p were set to the recommended values [11] for AREX + JGG.In case studies 2 to 3, in order to find an adequate the default parameter of n pop for AREX + JGG, n pop was scanned between 20l and 50l (l is the number of estimated parameters).In all runs, the maximum number of generation was set to 100,000.We regarded the optimization terminated successfully when the total relation error E in Equation ( 2) is less than 0.5.convergence speed and the scalability made rapid progress to approximately 5-foldand 20-fold, respectively.As described already, in S-system formalism, the number of parameters to be estimated is proportion to the second power of the number of system components (n).In this study, AREX + JGG could estimate maximum 112 parameters, which corresponds to n = 7. Suppose we apply this method to the inference of regulatory network of all amino acids, we have to estimated 840 parameters because the total number of amino acids is 20 (n = 20).For the larger network estimation, we should further improve the proposed method.Substantially the GA can find the solution within a comparatively large searching range, but it is stepwise convergence, because the GA does not have an efficient local searching function.One of the methods having a local searching function is the modified Powell method.The modified Powell method is well known to have an ultimate fast convergence among the various direct-search methods without calculating the derivative of the objective function, especially where the objective function is well approximated by a quadratic form of parameters to be estimated.The modified Powell method can find the solution very quickly only when the initial guess is very near the solution; this method has no way to escape from a local minimum.The hybrid method, the RCGA and the modified Powell method, is expected to offer all advantages of both optimization techniques.We are now on going the development of the hybrid method by incorporation the modified Powell method into AREX + JGG, which is expected to become a more powerful optimization method for the larger network estimation.

 2 
. There is arbitrariness in choice of the probability density function (p.d.f.) of .For example, if obeys normal distribution generated by Equation (4) obeys

Table 2
shows the result of convergence speed in UNDX + MGG and AREX + JGG.As shown inTable 2, UNDX + MGG could not estimate 54 and 84 parameters within the