Convergence Properties of Piecewise Power Approximations

We address the convergence of approximations obtained from two versions of the piecewise power-law representations arising in Systems Biology. The most important cases, mean-square and uniform convergence, are studied in detail. Advantages and drawbacks of the representations, as well as properties of both kinds of convergence, are discussed. Numerical approximation algorithms related to piecewise power-law representations are described in the Appendix.


Introduction
For a given function $v(x_1, \dots, x_n)$, defined in a domain $\Omega \subset \mathbb{R}^n$, let us calculate its partial derivatives in the logarithmic space:

$$f_j(P) = \frac{\partial \ln v}{\partial \ln x_j}(P), \quad j = 1, \dots, n, \qquad (1)$$

where $P \in \Omega$. In this paper we study piecewise constant approximations of the quantities (1) or, in other words, nonlinear approximations of the function $v$ by piecewise power functions.
This study is first of all motivated by applications in Systems Biology, where many networks can be described via compartment models

$$\dot{x}_i = V_i^+(x_1, \dots, x_n) - V_i^-(x_1, \dots, x_n), \quad i = 1, \dots, n, \qquad (2)$$

with the influx and efflux functions $V_i^+ \ge 0$ and $V_i^- \ge 0$, respectively.
For instance, in a typical metabolic network used in Biochemical Systems Theory the index $i$ ($i = 1, \dots, n$) refers to the $n$ internal metabolites $x_i \ge 0$. The influx function $V_i^+(x_1, \dots, x_n) \ge 0$, resp. the efflux function $V_i^-(x_1, \dots, x_n) \ge 0$, accounts for the rate (velocity) of production (synthesis), resp. degradation, of the metabolite $x_i$. Another important example is gene regulatory networks, which in many cases can be described as a system of nonlinear ordinary differential equations of the form

$$\dot{x}_i = F_i(z_1, \dots, z_n) - G_i(z_1, \dots, z_n)\, x_i, \qquad (3)$$

where $x_i(t)$ is the gene concentration ($i = 1, \dots, n$) at time $t$, while the regulatory functions $F_i$ and $G_i$ depend on the response functions $z_k = z_k(x_k)$, which control the activity of gene $k$ and which are assumed to be sigmoid-type functions [1].
The derivatives $f_j(P)$ in the logarithmic space are very important local characteristics of biological networks. In Biochemical Systems Theory these derivatives are known as the kinetic orders of the function $v$, while in Metabolic Control Analysis (see e.g. [2]) they are called elasticities. From the mathematical point of view, these quantities measure the local response of the function $v$ to changes in the dependent variable (for instance, the local response of an enzyme or other chemical reaction to changes in its environment). Thus, they describe the local sensitivity of the function $v$, a terminology which is widespread in e.g. the engineering sciences.
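The definition of a kinetic order can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the helper `kinetic_order`, the two test functions and all parameter values are hypothetical and are not taken from the paper. It estimates $\partial \ln v / \partial \ln x_j$ by a central difference in the logarithmic space, and shows that a power function has constant kinetic orders, whereas a Michaelis-Menten rate $v = V_{\max} x / (K + x)$ has the state-dependent kinetic order $K / (K + x)$.

```python
import math

def kinetic_order(v, x, j, h=1e-6):
    """Central-difference estimate of d ln v / d ln x_j at the point x."""
    up, down = list(x), list(x)
    up[j] = x[j] * math.exp(h)      # shift ln x_j by +h
    down[j] = x[j] * math.exp(-h)   # shift ln x_j by -h
    return (math.log(v(up)) - math.log(v(down))) / (2.0 * h)

def power_law(x):
    # Hypothetical power function: kinetic orders are the exponents 0.5 and -2.
    return 3.0 * x[0] ** 0.5 * x[1] ** -2.0

def michaelis_menten(x, vmax=2.0, K=1.5):
    # Hypothetical saturable rate: kinetic order K / (K + x) depends on x.
    return vmax * x[0] / (K + x[0])

point = [0.7, 1.3]
orders_pl = [kinetic_order(power_law, point, j) for j in range(2)]
order_mm = kinetic_order(michaelis_menten, [0.7], 0)
exact_mm = 1.5 / (1.5 + 0.7)
```

For the power function the estimate does not depend on the chosen point, which is the defining property of an S-system term; for the saturable rate it decreases from 1 towards 0 as $x$ grows.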
If all influx and efflux functions in (2) have constant kinetic orders, one obtains the so-called "synergistic system", or briefly "S-system":

$$\dot{x}_i = \alpha_i \prod_{j=1}^{n} x_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} x_j^{h_{ij}}, \quad i = 1, \dots, n, \qquad (4)$$

where $\alpha_i, \beta_i \ge 0$ are rate constants and the exponents $g_{ij}$, $h_{ij}$ represent all the (constant) kinetic orders associated with (4). The right-hand side of an S-system thus contains power functions, and analysis based on S-systems is therefore called the "Power-Law (PL) Formalism", see e.g. [3]-[7].
The Power-Law Formalism has been successfully applied to a wide range of problems, for example, to metabolic systems [8], gene circuits [9] and signalling networks [10]. Such systems are very advantageous in biological applications, as their format considerably simplifies mathematical and numerical analysis such as steady state analysis, sensitivity analysis, stability analysis, etc. For instance, calculation of steady states for S-systems is a linear problem (see [7]). For these and other biological and mathematical reasons, it was suggested in [11] to classify such systems as "a canonical nonlinear form" in systems biology.
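The linearity of the steady-state problem can be sketched as follows, assuming the standard S-system form $\dot{x}_i = \alpha_i \prod_j x_j^{g_{ij}} - \beta_i \prod_j x_j^{h_{ij}}$ with positive rate constants. Setting the right-hand side to zero and taking logarithms gives $\sum_j (g_{ij} - h_{ij}) y_j = \ln(\beta_i / \alpha_i)$ with $y_j = \ln x_j$. The function names and the numerical values below are hypothetical and serve only as an illustration.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: no unique steady state")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * y[c] for c in range(i + 1, n))
        y[i] = (M[i][n] - s) / M[i][i]
    return y

def s_system_steady_state(alpha, beta, G, H):
    """Solve (G - H) y = ln(beta / alpha) and return x = exp(y)."""
    n = len(alpha)
    A = [[G[i][j] - H[i][j] for j in range(n)] for i in range(n)]
    b = [math.log(beta[i] / alpha[i]) for i in range(n)]
    return [math.exp(yi) for yi in solve_linear(A, b)]

def flux(coef, orders, x):
    """Power-law flux coef * prod_j x_j ** orders[j]."""
    return coef * math.prod(xj ** g for xj, g in zip(x, orders))

# Hypothetical two-variable S-system.
alpha, beta = [2.0, 1.0], [1.0, 3.0]
G = [[0.0, -0.5], [0.5, 0.0]]
H = [[0.5, 0.0], [0.0, 0.5]]
x_star = s_system_steady_state(alpha, beta, G, H)
residuals = [flux(alpha[i], G[i], x_star) - flux(beta[i], H[i], x_star)
             for i in range(2)]
```

At the computed point the influx and efflux of every variable balance, so the nonlinear steady-state equations are satisfied although only a linear system was solved.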
In many models, however, the kinetic orders may vary considerably. A typical example is a model coming from Generalized Mass Action:

$$\dot{x}_i = \sum_{r} \mu_{ir}\, \gamma_r \prod_{j=1}^{n} x_j^{f_{rj}}, \quad i = 1, \dots, n, \qquad (5)$$

where the power functions describe the rates of the process no. $r$, while $\mu_{ir}$ is a stoichiometric factor that stands for the number of molecules of $x_i$ produced, i.e. $\mu_{ir} = 0, \pm 1, \pm 2, \dots$ Aggregating the processes in (5) in a net process of synthesis $V_i^+$ (positive terms) and a net process of degradation $V_i^-$ (negative terms) results in an aggregated system (2), which is not an S-system.
Another example of generic systems with non-constant kinetic orders stems from the Saturable and Cooperativity Formalism [12], which reflects two essential features of biological systems that gave the formalism its name (see [13] for more details). In this case, the system (5) becomes [...], where $i = 1, \dots, n$ and $n_j$, $m_j$, $K_j$ and $L_j$ are real numbers.
Another version of the Saturable and Cooperativity Formalism, which is mentioned in [12], is defined as follows: [...], where $i = 1, \dots, n$ and $n_j$, $m_j$, $b_i$, $c_i$, $\alpha$ and $\beta$ are real numbers.
In the case of gene regulatory networks (3) the sensitivities (1) are non-constant as well, even if one considers the functions $F_i$ and $G_i$ to be multilinear in $z_k$. In addition, the usage of non-multilinear functions is also known in this theory [14].
Taking into account the importance of kinetic orders/elasticities/sensitivities (1) in Systems Biology, on the one hand, and the convenience of the well-developed analysis of S-systems (stability theory [7], parameter estimation routines [15], software packages) on the other, a new kind of generic representation of compartment systems (2) was suggested in [16] (see also [17] for further applications of this representation). According to this idea, the entire operating domain is divided into partition subsets where all kinetic orders can be viewed as constants. In other words, the system (2) is approximated by a set of S-systems, each being only active in its own partition subset. This way of representing (2) is called the "Piecewise Power-Law Formalism" [18].
From the biological point of view, piecewise power-law representations are useful in many respects when compared to other ways of approximation, as they take into account biologically relevant characteristics (kinetic orders) rather than the standard partial derivatives. Therefore, piecewise S-systems preserve important biological structures and, at the same time, do not destroy the relatively simple mathematical structure of plain S-systems. For this reason, approximations of a general target function by piecewise power approximations may be of great importance for biological and other modelling. A rigorous mathematical justification of the idea of piecewise power-law approximations is the main purpose of the present paper. More precisely, we consider mean-square and uniform convergence of approximations by piecewise power functions to the target function, provided that the associated partitions of the operating domain $\Omega$ satisfy some additional assumptions. One of the challenges is that partitions of the operating domain $\Omega$ may not be chosen freely in applications. For instance, the partitions may directly stem from biological properties of the model [17]. Other ways of constructing partitions can be dictated by optimality-oriented algorithms. In the Appendix (see also [18]) we describe such a method, which goes back to the paper [19] and which is based on an automatic procedure that yields simultaneously the best possible polyhedral partition and the respective best possible piecewise linear approximation in the logarithmic space.
The main results of the paper are presented in Section 3 (mean-square convergence of piecewise power approximations) and in Section 4 (uniform convergence of piecewise power approximations). Several auxiliary results are proved in Appendices A.1-A.3, while Appendix A.4 presents an approximation algorithm which provides an automated partition and the respective best possible approximation in the logarithmic space for a given number of subdomains. Finally, in Appendix A.5 we explain by example why a direct piecewise power-law fitting is ill-posed.

Preliminaries
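Throughout the paper we use the following notation (see Table 1). The operating domain $\Omega$ lies in the space of the original variables $x = (x_1, \dots, x_n)$, which we call Cartesian. We assume $\Omega$ to be a closed and bounded (i.e. compact) subset of $\mathbb{R}^n$. Let $\Delta$ be its image in the logarithmic space $\mathbb{R}^n$ and $\{\Delta_i^N\}_{i=1}^N$ be a partition of $\Delta$ for any natural $N$. In some results and algorithms $\Delta$ will be a polyhedral domain in the logarithmic space, and $\{\Delta_i^N\}_{i=1}^N$ will be a polyhedral partition. By $\{\Omega_i^N\}_{i=1}^N$ we denote the image of the partition $\{\Delta_i^N\}_{i=1}^N$ under the inverse logarithmic transformation.

Table 1. Overview of the basic terminology and notation used in the paper (LS: least-squares). Columns: Cartesian space, Logarithmic space.

We also put $y = \log x$ (componentwise) and $\psi(y) = \log v(\exp y)$, $y \in \Delta$. Let $v_i^N(x)$ be a least-squares (LS) power-law fitting to the function $v$ on $\Omega_i^N$. For $x \in \Omega$ we consider the piecewise power function $v_N(x) = v_i^N(x)$, $x \in \Omega_i^N$. Similarly, let $\Psi_i^N(y) = \log c_i^N + \sum_{j=1}^{n} g_{ij}^N y_j$ be a LS linear approximation to the function $\psi$ on $\Delta_i^N$, let $\Psi_N(y) = \Psi_i^N(y)$ for $y \in \Delta_i^N$, and put $V_N(x) = \exp \Psi_N(y)$. We recall that the parameters $c_i^N$ and $g_{ij}^N$ of the linear functions $\Psi_i^N$ are uniquely obtained from the following minimization criterion in the logarithmic space:

$$\int_{\Delta_i^N} \bigl(\psi(y) - \Psi_i^N(y)\bigr)^2\, dy \to \min. \qquad (8)$$

Alternatively, one can define approximations of the target function $v$ by power functions minimizing the distance in the space $L^2(\Omega_i^N)$:

$$\int_{\Omega_i^N} \bigl(v(x) - v_i^N(x)\bigr)^2\, dx \to \min. \qquad (9)$$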
Our last minimization criterion looks similar to (6), but is, in fact, very different, as the minimum here is taken over all polyhedral partitions $\{\Delta_i^N\}_{i=1}^N$ of the polyhedral domain $\Delta$ and all corresponding linear functions $\Psi_i^N$:

$$\sum_{i=1}^{N} \int_{\Delta_i^N} \bigl(\psi(y) - \Psi_i^N(y)\bigr)^2\, dy \to \min. \qquad (10)$$

The main advantage of the criteria (8) and (10) is their linearity, which provides the uniqueness of the solution and also makes the process of finding the solution computationally cheap, as it is based on explicit matrix formulas. On the other hand, the use of the logarithmic transformation requires caution. The influences of the data values will change, as will the error structure of the model. Yet, the criterion (8) only requires a standard linear regression, while the criterion (10) requires a special regression algorithm, still linear, but much more involved (see Appendix A.4 for details).
The criterion (9) gives the best possible approximation in terms of the LS error in the Cartesian space. However, a nonlinear regression algorithm must be used in this case, which is less advantageous, especially when the number of estimated parameters is large. In addition, the nonlinear regression may have other drawbacks, one of which is ill-posedness (see Appendix A.5).
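A linear regression in the logarithmic space, followed by the inverse transformation, can be sketched in the scalar case as follows. This is a minimal illustration under several assumptions that are not part of the paper: the integral criterion is replaced by a discrete least-squares fit on sample points, the partition of the logarithmic interval is uniform, and the target function $v(x) = x / (1 + x)$ on $[0.1, 10]$ as well as all function names are hypothetical.

```python
import math

def linear_ls(ys, zs):
    """Ordinary least squares fit z ~ p + g * y; returns (p, g)."""
    m = len(ys)
    ybar, zbar = sum(ys) / m, sum(zs) / m
    syy = sum((y - ybar) ** 2 for y in ys)
    syz = sum((y - ybar) * (z - zbar) for y, z in zip(ys, zs))
    g = syz / syy
    return zbar - g * ybar, g

def fit_piecewise_power(v, a, b, n_pieces, samples=200):
    """Fit c * x**g on each of n_pieces equal subintervals of [ln a, ln b]."""
    lo, hi = math.log(a), math.log(b)
    width = (hi - lo) / n_pieces
    pieces = []
    for i in range(n_pieces):
        left = lo + i * width
        ys = [left + width * (k + 0.5) / samples for k in range(samples)]
        zs = [math.log(v(math.exp(y))) for y in ys]
        p, g = linear_ls(ys, zs)          # linear fit in the logarithmic space
        pieces.append((left, left + width, math.exp(p), g))
    return pieces

def evaluate(pieces, x):
    """Value of the piecewise power function at x (back in Cartesian space)."""
    y = math.log(x)
    for _, right, c, g in pieces:
        if y <= right:
            return c * x ** g
    _, _, c, g = pieces[-1]               # guard against rounding at the end point
    return c * x ** g

def rms_error(v, pieces, a, b, samples=2000):
    """Root-mean-square error on [a, b], midpoint rule in the Cartesian variable."""
    total = 0.0
    for k in range(samples):
        x = a + (b - a) * (k + 0.5) / samples
        total += (v(x) - evaluate(pieces, x)) ** 2
    return math.sqrt(total / samples)

def target(x):
    return x / (1.0 + x)                  # hypothetical saturable target function

errors = [rms_error(target, fit_piecewise_power(target, 0.1, 10.0, n), 0.1, 10.0)
          for n in (1, 2, 4, 8)]
```

In this example the Cartesian root-mean-square error decreases as the number of partition subsets is doubled, although each fit minimizes only the error in the logarithmic space.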

Mean-Square Convergence of Piecewise Power Approximations
The results of this section provide the mean-square convergence ($L^2$-convergence) of piecewise approximations by power functions. The involved parameters may be obtained, e.g., according to one of the minimization criteria (8) or (9).
The main technical challenges stemming from the nature of these minimization algorithms can be summarized as follows: 1) the $L^2$-convergence of the approximations in the logarithmic space may not imply the $L^2$-convergence of their images in the Cartesian space (and vice versa); 2) it is not evident that automatic dissections of the operating domain, as e.g. in the algorithms based on the minimization criterion (10), make the diameters of the partition subsets go to zero even if the number of partition subsets tends to $\infty$.
Three propositions below deal with $L^2$-convergence in the logarithmic domain.

Proposition 1. Let the target function $v > 0$ be measurable and bounded on $\Omega$ and $\psi = \log v$. Suppose that the measurable partitions $\{\Delta_i^N\}_{i=1}^N$ [...]

To prove this proposition we need the following lemma, the proof of which can be found in Appendix A.1:

Lemma 1. Let $v$ be measurable and [...] and the measurable partitions $\{\Delta_i^N\}_{i=1}^N$ [...]

Proof of Proposition 1. We use the sequences from Lemma 1, which both converge in the $L^2$-sense in the respective domains. Since $\Psi_N$ is the LS piecewise linear approximation in $\Delta$, we have [...] as $N \to \infty$.

In the next proposition we do not assume that [...]

Proposition 2. Let $\Delta$ be a polyhedral domain in $\mathbb{R}^n$, the function $\psi$ be square integrable in $\Delta$ and $\{\Delta_i^N\}_{i=1}^N$ be the optimal polyhedral partition of $\Delta$ obtained by the algorithm described in Appendix A.4. Then for the corresponding LS approximations [...]

Proof. Evidently, for the $L^2$-function $\psi$ there exists a sequence of polyhedral partitions $\{\Delta_i^N\}_{i=1}^N$ [...] For the optimal polyhedral approximation [...] as $N \to \infty$.

In particular, the assumption on $\psi$ is fulfilled if the target function $v$ is measurable and bounded on $\Omega$. The case of the $L^2$-convergence of the approximations $V_N$, given as [...]

We introduce the following notation. Given a partition subset [...] where the point [...], be the symmetric $n \times n$-matrix with the entries defined as [...]. Below we fix a matrix norm $\|\cdot\|$. All matrix norms are equivalent. One of the norms is Euclidean, which is defined via the maximal eigenvalues: $\|A\| = \sqrt{\lambda_{\max}(A^T A)}$. In the case of symmetric, positive definite matrices (like $A_i^N$ above) we can write that $\|A\| = \lambda_{\max}(A)$. We say that the sequence of partitions $\{\Delta_i^N\}_{i=1}^N$ satisfies the property ($\Delta$) if [...]. If the chosen norm is Euclidean, then the latter estimate can be rewritten as [...], where $\lambda_i^N$ is the least (positive) eigenvalue of the matrix $A_i^N$.

Informally speaking, this property means that the partition subsets cannot be too different from each other in shape. Assume, for instance, that the partition sets are enclosed in rectangular boxes. The result below says that if the ratio of the longest and the shortest edges of the boxes is bounded above, i.e. the boxes are not "too thin", then the sequence of such boxes satisfies the property ($\Delta$). [...], where $a_N$ (resp. $b_N$) is the length of the smallest (resp. biggest) edge of the box $P_N$.
We fix $N$ and the $N$th rectangular box [...].

1) If the measurable partitions [...] and associated LS piecewise linear approximations $\Psi_N$ satisfy the criterion (10) for each [...]

Proof. To prove the first part of the theorem, we apply Lemma 1 and obtain [...] is the LS piecewise power approximation in $\Omega$. In the second part of the theorem, we use either Proposition 1 or Proposition 2, which yields the $L^2$-convergence of the LS approximations $\Psi_N$ to the function $\psi = \log v$. Applying Lemma 2 we obtain the uniform boundedness of the approximations: [...] for some $M$ and any $N = 1, 2, \dots$ Then we have [...]

Uniform Convergence of Approximations
In the previous section we studied convergence of LS approximations in the $L^2$-norm. In many applications, however, it is desirable to consider their uniform convergence. This may be of interest, for instance, if we include the obtained approximations in models based on differential equations, as it is well known that convergence of (approximations of) solutions is only guaranteed by the uniform convergence of (approximations of) the right-hand sides.
The main result of this section is formulated in terms of kinetic orders [...] and its piecewise power approximations $w_N$.

Theorem 5. Let the target function $v > 0$ be a $C^1$-function (i.e. differentiable with continuous partial derivatives). Let the sequence of partitions $\{\Omega_i^N\}_{i=1}^N$ of $\Omega$ have the following two properties: 1) The closure of each $\Omega_i^N$ coincides with the closure of its interior. [...]

Assume, in addition, that for any [...] Then $w_N \to v$ uniformly on $\Omega$ as $N \to \infty$.

Proof. We fix $N$ and consider the corresponding partition [...]. By assumption, for [...] On the other hand, the mean value theorem yields [...] The uniform continuity of the continuous vector function $\nabla \psi(y)$ on $\Delta$ and the property that [...] imply that, given an $\varepsilon > 0$, [...] is fulfilled for sufficiently large $N$. Since (15) holds for any [...], we also obtain that for sufficiently large $N$ [...] As the uniform convergence of the sequence $\{\Phi_N\}$ implies its uniform boundedness, there is $M$ such that [...]. This gives the uniform convergence of $w_N$ to $v$ as $N \to \infty$.

Our last result shows that the LS approximations converge uniformly in the scalar case. This is due to the fact that in the scalar case the equalities (14) are always fulfilled.

Corollary 1. Let the target function $v$ be continuous on [...] Then for the corresponding LS power approximations $V_N$ and $v_N$ we have [...]

The proof of the corollary follows directly from the previous theorem and the following lemma, the proof of which is given in Appendix A.

Discussion and Conclusions
Piecewise power-law representations may be very useful as practical approximations to target functions which are defined analytically or numerically. However, a strict mathematical justification of these approximations does not always receive attention. Unfortunately, such an analysis is not always straightforward, especially if one puts additional a priori assumptions on the approximations, which is quite common in many applications.
We showed in the present paper that under additional assumptions power approximations do converge to the target function. We studied mean-square and uniform convergence, both of which are widely used (explicitly or implicitly) in applications.
Our analysis dealt with two types of regression: linear regression in the logarithmic space and power-law regression in the Cartesian space. The first procedure has all the advantages of linear regression, but the transformation back to the Cartesian space distorts the error structure of the problem; the least-squares error of the resulting piecewise power-law fitting is in general larger than the corresponding error of a power-law regression of the original data. As a partial remedy, it may be advantageous to apply power-law regression to the original data over each of the partition subsets back in the Cartesian space. Yet, being a nonlinear regression, this procedure is essentially ill-posed. Thus, both kinds of regression have their strong and weak sides, so that the choice between them must be guided by modelling considerations.
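The difference between the two kinds of regression can be illustrated on a single partition subset in the scalar case. The sketch below rests on assumptions that are not part of the paper: the data are hypothetical samples of $v(x) = x / (1 + x)$ on $[0.5, 4]$, and the nonlinear regression is carried out by eliminating the coefficient $c$ in closed form for a fixed exponent $g$ and then running a golden-section search over $g$, which presumes that the resulting one-dimensional cost is unimodal on the search bracket. This is one possible way to perform the Cartesian fit, not the algorithm of the Appendix.

```python
import math

def log_space_fit(xs, vs):
    """Linear regression of ln v on ln x; returns (c, g) of the power law c * x**g."""
    ys = [math.log(x) for x in xs]
    zs = [math.log(v) for v in vs]
    m = len(xs)
    ybar, zbar = sum(ys) / m, sum(zs) / m
    g = (sum((y - ybar) * (z - zbar) for y, z in zip(ys, zs))
         / sum((y - ybar) ** 2 for y in ys))
    return math.exp(zbar - g * ybar), g

def best_c(xs, vs, g):
    """For a fixed exponent g the optimal coefficient c is a linear LS problem."""
    return sum(v * x ** g for x, v in zip(xs, vs)) / sum(x ** (2 * g) for x in xs)

def sse(xs, vs, c, g):
    """Sum of squared errors in the Cartesian space."""
    return sum((v - c * x ** g) ** 2 for x, v in zip(xs, vs))

def cartesian_fit(xs, vs, g_lo, g_hi, tol=1e-10):
    """Golden-section search over g (assumes a unimodal cost on [g_lo, g_hi])."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    cost = lambda g: sse(xs, vs, best_c(xs, vs, g), g)
    a, b = g_lo, g_hi
    while b - a > tol:
        g1 = b - phi * (b - a)
        g2 = a + phi * (b - a)
        if cost(g1) < cost(g2):
            b = g2
        else:
            a = g1
    g = 0.5 * (a + b)
    return best_c(xs, vs, g), g

# Hypothetical data on one partition subset.
xs = [0.5 + 3.5 * (k + 0.5) / 200 for k in range(200)]
vs = [x / (1.0 + x) for x in xs]

c_log, g_log = log_space_fit(xs, vs)
c_dir, g_dir = cartesian_fit(xs, vs, g_log - 1.0, g_log + 1.0)
sse_log = sse(xs, vs, c_log, g_log)
sse_dir = sse(xs, vs, c_dir, g_dir)
```

The back-transformed logarithmic fit is obtained from explicit formulas, while the Cartesian fit requires an iterative search; in return, its Cartesian least-squares error is not larger than that of the logarithmic fit.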
In many cases, it may also be advantageous to use the classical linear regression in combination with optimal partitions of the operating domain. In the logarithmic space this procedure is again linear and can be automated, but this may also cause several technical problems when proving the convergence of the corresponding approximations.
In the present paper, we offered a partial mathematical justification of the analysis based on piecewise power approximations, stemming from both kinds of regression, by verifying their convergence in the mean-square ($L^2$) and uniform sense. Uniform convergence is important, e.g., if target functions are included in differential equations, as it is the uniform, and not the $L^2$-convergence, which is inherited by the solutions of the equations. However, a comprehensive analysis of convergence of solutions of differential equations, approximated by piecewise S-systems, is beyond the scope of this paper and will be discussed in a separate publication.

[...] consisting of all linear functions and equipped with the scalar product [...]. One basis is given by the set (11). However, this set is not necessarily orthogonal. First of all, we choose $e_0 = 1$ and observe that its norm is equal to 1. Using the description (11) of the basis functions [...]

In the proof below we often omit one of the variables in $e_i^N(l, y)$, that is either $l$ or $y$, depending on a particular interpretation of this basis. Writing $e_i^N(y)$ means that we regard it as a vector for each particular $y$, i.e. [...]. Omitting $y$ (i.e. writing $e_i^N(l)$) means that we treat $e_i^N(l, y)$ as a function of $y$ for a given $l$, i.e. as an element of the space [...]. [...], we require the following constraints on the coefficients: [...]

The final step in the proof of the lemma uses the explicit representation of the LS approximation [...]. This also implies the uniform boundedness of the approximations $V_i^N(x)$ on $\Omega$. The proof of the lemma is complete.

A.3. Proof of Lemma 3, Section
Let us first prove the existence of $y_0$. Assume the converse, i.e. that [...]

[...] where $\Delta_i$ are all polyhedral sets defined by (17) and defining a partition of the logarithmic domain $\Delta$. Scalar weights [...] and the numbers [...]

[...] such that [...] in the domain $\Omega$ for all $j$, then $v$ clearly is a power function in $\Omega$ of the form [...]

[...] is more involved. The reason for that is that the $L^2$-convergence of the sequence [...]

Proposition 3. A sequence of rectangular boxes $\{P_N\}$ satisfies the property ($\Delta$) if and only if [...]

[...] The latter estimate is due to the uniform Lipschitz continuity of the function $\exp(u)$ on the interval [...]. This estimate proves the $L^2$-convergence of the LS approximations $V_N$ to the target function $v$.

Lemma 3. Let a linear function $l : [a, b] \to \mathbb{R}$ be the LS approximation of a [...]

[...] defined via the center of mass [...] we directly deduce from (12) that $e_0 = 1$ is orthogonal to any linear combination of the other basis functions. The challenge is therefore to estimate the norms of linear combinations [...] in further considerations).

[...] ($\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^n$ and $a \cdot b$ is the scalar product of two vectors) with the constraint [...]. Diagonalization of the symmetric, positive definite matrix $A_i^N$ with the help of an orthogonal matrix $Q$ gives the matrix containing the eigenvalues [...] is evidently an upper estimate for the functions (11) on the partition subset $\Delta_i^N$. The maximum value of the expression [...] is the minimal eigenvalue of the matrix $A_i^N$. Due to the condition ($\Delta$) we get that the constant $c_0$ does not depend on $i$ and $N$.

[...] contradicts the definition of the least-squares approximations. The case [...] prove that in this case the graph of the scalar linear function $l(y)$ intersects the graph of $\theta(y)$ in at least two points from the interval $[a, b]$. From the first part of the proof we know that at least one intersection point does exist. Assume that there is exactly one point [...]

[...] (these sets may be empty). Consider a new linear approximation given by [...] chosen in such a way that the graphs of the functions [...] (namely, $d$ by construction). It is easy to see that such a $\delta$ does exist. Indeed, in a vicinity $U$ of the point $d$ we have that [...] $y$ in $U$. Outside $U$, i.e. inside the compact set $[a, b] \setminus U$, the continuous function $\Theta$ is non-zero, so that the graphs of the functions $\theta$ and $l_1$ meet only in $d$. We now complete our analysis of the scalar case by observing that for such $\delta$ [...]

[...] rise to a partition of the original domain $\Omega$. Applying the inverse logarithmic transformation, we obtain [...] function $\psi$ and the corresponding partition $\{\Delta_i\}$. Below the weights are collected in a vector [...] The aim of the piecewise linear regression: given a function [...] of partition subsets $N$ and a natural number $c$, find a piecewise linear function $\Psi$ and the polyhedral partition $\{\Delta_i\}_{i=1}^N$ [...]

[...] Let [...] also dominates the asymptotics of the diameter. Therefore the condition ($\Delta$) is fulfilled for the given sequence of rectangular boxes if and only if the sequence [...], which [...]