The Principal Component Transform of Parametrized Functions

Many advanced mathematical models of biochemical, biophysical and other processes in systems biology can be described by parametrized systems of nonlinear differential equations. Owing to the complexity of these models, their simplification has become a problem of great importance. In particular, the rather challenging methods of parameter estimation in these models may require such simplifications. The paper offers a practical way of constructing approximations of nonlinearly parametrized functions by linearly parametrized ones. As the idea of such approximations goes back to Principal Component Analysis, we call the corresponding transformation the Principal Component Transform. We show that this transform possesses the best individual fit property, in the sense that the corresponding approximations preserve most information (in some sense) about the original function. We also demonstrate how to estimate the error between the given function and its approximations. In addition, we apply the theory of tensor products of compact operators in Hilbert spaces to justify our method for products of parametrized functions. Finally, we provide several examples that are relevant for systems biology.


Introduction
This study is closely related to applications in the so-called "metamodeling" of differential equations, where a "proper" model of, e.g., a complex biological process is replaced by an approximation which contains "most information" about the model, but which is simpler. In particular, the true parameters of the model are replaced by "the latent parameters", which makes the model linear with respect to the latter and hence enables the use of (if necessary, partial) least-squares regression. This explains why this idea has proved to be efficient in parameter estimation (see e.g. [1]). This also justifies the high numerical efficiency of metamodeling, which has been widely used in statistics [2], chemometrics [3], biochemistry [1], genetics [4] [5] [6] and infrared spectroscopy [7] to simplify theoretical and computational analysis of the "true" models.
Let $k \in \mathbb{N}$ be a given number. The $k$th Principal Component Transform (PCT) of a parametrized function $x$ is a specially constructed parametrized function $x_k$ that yields the minimum distance (in some sense) between $x$ and all possible approximations of $x$ of the form $\sum_{i=1}^{k} t_i(\omega)p_i(u)$. The distance is chosen to ensure an efficient way to estimate the deviation of $x_k$ from $x$.
Geometrically, the parametrized function $x$ may be regarded as a curve, parametrized by $\omega$, in a separable Hilbert space. Then $\mathrm{PCT}(x,k)$ can be interpreted as a projection of this curve onto a $k$-dimensional subspace, which is chosen in such a way that the image $x_k$ gives the best possible individual fit to $x$ among all $k$-dimensional subspaces. As we will see in Subsection 3.1, this necessarily leads to nonlinearity of the mapping PCT.
As we will see in Subsection 3.3, discretizing the function $x(u,\omega)$ and its PCT yields matrices and the projections onto their first $k$ principal components, respectively. This explains our terminology: PCT can be regarded as a functional analog of the principal component analysis (PCA) of matrices. This terminology was suggested by Prof. E. Voit in a private talk with the second author during his seminar lecture in Oslo in 2014.
All the papers cited above concentrate on the efficiency of the metamodeling approach and disregard the mathematical properties of PCT and their justification. These are, for instance, quite important for understanding the limitations of the method and for describing the exact conditions under which the method is applicable. In particular, the convergence properties of the sequence of metamodels to the original model have not been studied in the available literature. In our paper we try to fill this gap by suggesting a rigorous mathematical approach to PCT and an analysis of its basic properties. More precisely, we demonstrate how the theory of compact operators in separable Hilbert spaces can be used to provide such an analysis.
The paper is organized as follows. In Section 2 we introduce the distance in the space of parametrized functions, formulate the theorem on the best individual fit in terms of PCT of functions (Subsection 2.1) and provide some examples relevant for systems biology (Subsection 2.2). In Section 3 we study mathematical properties of PCT: nonlinearity (Subsection 3.1) and continuity (Subsection 3.2), and show relations between PCT and PCA via discretization of functions (Subsections 3.3 and 3.4). In Section 4 we study PCT of products of parametrized functions, which are interpreted as elements of the tensor product of two or several Hilbert spaces (Subsection 4.1). We also show that PCT preserves the tensor products and therefore the product of parametrized functions (Subsection 4.2) and give some examples (Subsection 4.3). In Appendix 5 we offer short proofs of some auxiliary results used in the paper: Allahverdiev's theorem (Subsection 5.1) and some propositions related to tensor products of linear compact operators in Hilbert spaces (Subsection 5.2).

The Best Individual Fit Theorem
In this section we define the distance in the space of parametrized functions and describe how the best individual fits $\mathrm{PCT}(x,k)$, $k \in \mathbb{N}$, to a given function $x$ can be obtained using the theory of compact operators in Hilbert spaces. We also prove nonlinearity and continuity of PCT and give some specific examples.

The Distance in the Space of Parametrized Functions
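Let $U$ be a compact subset of $\mathbb{R}^N$ and $\Omega$ be a compact subset of $\mathbb{R}^M$. Suppose we are given a measurable, square integrable function $x: U \times \Omega \to \mathbb{R}$:
$$\int_{\Omega}\int_{U} x^2(u,\omega)\,du\,d\omega < \infty. \tag{1}$$
The aim is to find a best possible approximation of $x$ in the class of all functions of the form $\sum_{i=1}^{k} t_i(\omega)p_i(u)$, where $t_i \in L^2(\Omega)$ and $p_i \in L^2(U)$.
To explain better the nature of the topology we use in this case, let us have a look at finite dimensional Hilbert, i.e. Euclidean, spaces. Let $X$ be an $m \times n$-matrix, for instance, a discretized function $x(u,\omega)$. In this case, the best approximation $X_k$ to $X$ in the class of $m \times n$-matrices of rank not greater than $k$ is given by the first $k$ terms in the singular value decomposition of $X$:
$$X_k = \sum_{i=1}^{k} t_i p_i^*,$$
where $t_i = Xp_i$, $p_i$ are the normalized eigenvectors of the matrix $X^*X$ ordered by decreasing eigenvalue, and $A^*$ is the conjugate (transpose) of a matrix $A$. In other words,
$$\|X - X_k\| = \min \|X - Y\|, \quad \text{where } \operatorname{rank} Y \le k.$$
The matrix norm is defined as $\|A\| = \sup_{\|\alpha\| = 1}\|A\alpha\|$, where $\|\alpha\|$ is the Euclidean norm in $\mathbb{R}^n$.
Now we will look at arbitrary real separable Hilbert spaces, which are denoted by $H$ and $K$ and which are equipped with the scalar products $(\cdot,\cdot)_H$ and $(\cdot,\cdot)_K$, respectively. We want to find an operator $X_k$ of rank not greater than $k$ that best approximates a given linear compact operator $X: H \to K$. The construction of $X_k$ is very close to the singular value decomposition of matrices.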

Assume that $X: H \to K$ is a linear compact operator and $X^*: K \to H$ is its adjoint. Then the operators $X^*X: H \to H$ and $XX^*: K \to K$ are self-adjoint and positive-definite.

Let $\sigma_1 \ge \sigma_2 \ge \dots$ $(\sigma_i > 0)$ be all positive eigenvalues of the operator $X^*X$, the associated normalized eigenvectors being $p_1, p_2, \dots$, respectively: $X^*Xp_i = \sigma_i p_i$. It is well-known that $p_i$ can always be chosen to be orthogonal: $(p_i, p_j)_H = \delta_{ij}$. Every $h \in H$ can then be written as $h = h_0 + \sum_i (h,p_i)_H\,p_i$ with $Xh_0 = 0$ and, moreover, $\|h\|^2 = \|h_0\|^2 + \sum_i (h,p_i)_H^2$. Now, the operator $X$ can be represented as
$$Xh = \sum_{i} (h,p_i)_H\,t_i,$$
where $t_i = Xp_i$ and the convergence is understood in the sense of the norm in the space $K$. The truncated versions
$$X_kh = \sum_{i=1}^{k} (h,p_i)_H\,t_i$$
are linear bounded operators of rank not greater than $k$. The following result, a short proof of which is offered in Appendix 5.1, is known as Allahverdiev's theorem, see e.g. [8, Chapter II, p. 28]:
Theorem 1. For any linear compact operator $X: H \to K$,
$$\|X - X_k\| = \min\{\|X - Y\| : Y: H \to K \text{ linear and bounded},\ \operatorname{rank} Y \le k\}.$$
In numerical calculations, functions are usually replaced by their discretizations, which in the case of parametrized functions gives matrices. That is why the distance in the space of the parametrized functions $x(u,\omega)$ should be consistent with the distance in the space of matrices, so that we can get all the advantages of the finite dimensional singular value decomposition as well as Allahverdiev's theorem. To define the distance in the space of matrices we have to interpret matrices as linear operators between two Euclidean spaces. Analogously, we have to interpret parametrized functions as operators between suitable Hilbert spaces, and define the distance accordingly.
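The finite-dimensional statement behind Theorem 1 can be checked numerically. The following NumPy sketch is only an illustration: the random test matrix, the seed and the helper name `best_rank_k` are our assumptions and do not come from the paper. It builds the truncated decomposition from $t_i = Xp_i$ and compares its spectral-norm error with the first omitted singular value of the matrix (the square root of the corresponding eigenvalue of $X^*X$).

```python
import numpy as np

def best_rank_k(X, k):
    """Sum of the first k terms t_i p_i^T of the SVD of X, with t_i = X p_i."""
    _, s, Pt = np.linalg.svd(X, full_matrices=False)
    P = Pt.T                       # columns: normalized eigenvectors p_i of X^T X
    T = X @ P                      # columns: t_i = X p_i
    return T[:, :k] @ P[:, :k].T, s

rng = np.random.default_rng(0)     # hypothetical test data
X = rng.standard_normal((6, 4))
k = 2

X_k, s = best_rank_k(X, k)
err_best = np.linalg.norm(X - X_k, 2)       # spectral (operator) norm

# An arbitrary competitor of rank <= k should not do better.
Y = rng.standard_normal((6, k)) @ rng.standard_normal((k, 4))
err_other = np.linalg.norm(X - Y, 2)

print(f"error of X_k: {err_best:.6f}, first omitted singular value: {s[k]:.6f}")
print(f"error of a random rank-{k} matrix: {err_other:.6f}")
```

In this matrix setting the error of the truncation coincides with the first omitted singular value, which is what makes the approximation error easy to estimate in practice.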
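Let us therefore go back to the spaces $L^2(U)$ and $L^2(\Omega)$, where $U$, as before, is a compact subset of $\mathbb{R}^N$ and $\Omega$ is a compact subset of $\mathbb{R}^M$. We denote the norm in both spaces as $\|\cdot\|_{L^2}$. Consider the integral operator
$$(Xy)(\omega) = \int_{U} x(u,\omega)\,y(u)\,du. \tag{9}$$
Under the assumption of square integrability of the kernel $x(u,\omega)$, the operator $X$ becomes compact and linear from the space $L^2(U)$ to the space $L^2(\Omega)$.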
The distance between two square integrable parametrized functions $x$ and $x'$ can now be defined in the following way:
$$\operatorname{dist}(x,x') = \|X - X'\|,$$
where $X$ is defined in (9) and $(X'y)(\omega) = \int_{U} x'(u,\omega)\,y(u)\,du$. The norm of linear operators acting from $L^2(U)$ to $L^2(\Omega)$ is defined in the standard way.
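Remark 1. Evidently, $\operatorname{dist}(x,x') \le C\|x - x'\|_{L^2}$ for some constant $C$. Therefore, $L^2$-convergence of a sequence $x_n$ to $x$ implies the convergence in the sense of the distance dist.
Let $X^*$ be the adjoint of $X$, so that
$$(X^*z)(u) = \int_{\Omega} x(u,\omega)\,z(\omega)\,d\omega.$$
Now, the self-adjoint and positive-definite integral operators $X^*X: L^2(U) \to L^2(U)$ and $XX^*: L^2(\Omega) \to L^2(\Omega)$ can be written as follows:
$$(X^*Xy)(u) = \int_{U} \gamma(u,v)\,y(v)\,dv, \qquad \gamma(u,v) = \int_{\Omega} x(u,\omega)\,x(v,\omega)\,d\omega, \tag{14}$$
and
$$(XX^*z)(\omega) = \int_{\Omega} \delta(\omega,\nu)\,z(\nu)\,d\nu, \qquad \delta(\omega,\nu) = \int_{U} x(u,\omega)\,x(u,\nu)\,du, \tag{15}$$
respectively. Let, as before, $\sigma_1 \ge \sigma_2 \ge \dots$ be all positive eigenvalues of the integral operator (14) associated with its normalized and mutually orthogonal eigenfunctions $p_i(u)$. From Theorem 1 we immediately obtain the Best Individual Fit Theorem.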
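Theorem 2. For a given function $x: U \times \Omega \to \mathbb{R}$ satisfying (1), the best approximation of $x$ in the class of all functions of the form $\sum_{i=1}^{k} t_i(\omega)p_i(u)$ is given by
$$x_k(u,\omega) = \sum_{i=1}^{k} t_i(\omega)\,p_i(u),$$
where $p_i$ are the normalized, mutually orthogonal eigenfunctions of the operator (14) and
$$t_i(\omega) = (Xp_i)(\omega) = \int_{U} x(u,\omega)\,p_i(u)\,du.$$
In other words, $\operatorname{dist}(x,x_k) = \min \operatorname{dist}(x,y)$, where the minimum is taken over all functions $y$ of the above form.
Remark 2. The functions $t_i$ have the following property (which we do not use in this paper): they are mutually orthogonal eigenfunctions of the operator (15).
Definition 1.
• The $k$th Principal Component Transform (PCT) of the function $x$ is $\mathrm{PCT}(x,k) = x_k = \sum_{i=1}^{k} t_i p_i$.
• The Full Principal Component Transform of the function $x$ is $\mathrm{PCT}(x) = \sum_{i=1}^{\infty} t_i p_i$.
We will also write $\mathrm{PCT}(X,k)$ and $\mathrm{PCT}(X)$ for the corresponding operators. We remark that none of these transforms is uniquely defined: even if all $\sigma_i$ are different, we always have a choice between two normalized eigenfunctions $\pm p_i$. However, the distance between $x$ and any $x_k$ is independent of the projection we use. On the other hand, this means that the properties of PCT should be formulated with care.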

Examples of PCT
In this subsection we consider three examples which are of importance in systems biology.
Example 1. Let $x(u,\omega) = u^{\omega}$ be the parametrized power function. Then, using Formulas (14) and (15), we obtain the following representations of the kernels $\gamma$ and $\delta$:
$$\gamma(u,v) = \int_{\Omega} (uv)^{\omega}\,d\omega, \qquad \delta(\omega,\nu) = \int_{U} u^{\omega+\nu}\,du.$$
Therefore the normalized eigenfunctions $p_i(u)$ can be obtained from the equation
$$\int_{U} \gamma(u,v)\,p_i(v)\,dv = \sigma_i\,p_i(u).$$
The functions $t_i(\omega)$ can be alternatively found from the equations
$$\int_{\Omega} \delta(\omega,\nu)\,t_i(\nu)\,d\nu = \sigma_i\,t_i(\omega).$$
The parametrized power function $x = u^{\omega}$ is of crucial importance in the biochemical system theory, where $u$ represents the concentration of a metabolite, while $\omega$ stands for the kinetic order. In the case of several metabolites, one gets products of such power functions, which, in turn, are included in the right-hand side of the so-called "synergetic system", see e.g. [10, Chapter 2, p. 51] and the references therein. The products of parametrized power functions are considered in Section 4.
Example 2. Consider the function ( ) Then, using Formulas ( 14) and ( 15), we obtain the following representations of the kernels γ and We denote for simplicity Therefore the normalized eigenfunctions The functions ( ) can be also obtained from the equations The function is often used in the neural field models, where it serves as the simplest example of the so-called "connectivity functions" describing the interactions between neurons, see e.g.[11] and the references therein.
Example 3. Consider the Hill function ( ) and ( ) The Hill function plays a central role in the theory of gene regulatory networks, where it stands for the gene activation function, $x$ being the gene concentration and $\theta$ being the activation threshold, see e.g. [12] and the references therein.

Some Properties of PCT
The Principal Component Transform $\mathrm{PCT}(x,k)$ is not uniquely defined. For this reason, we use a special convention when comparing PCT of different functions: we write $\mathrm{PCT}(x,k) = \mathrm{PCT}(y,k)$ if there exist coinciding versions of PCT of $x$ and $y$.
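The transform is homogeneous, i.e. $\mathrm{PCT}(cx,k) = c\,\mathrm{PCT}(x,k)$ for any constant $c$, but it is not additive. These two statements are justified as follows.
1. Let $\mathrm{PCT}(x,k) = \sum_{i=1}^{k} t_i p_i$, see (21). By definition, $p_i$ are normalized, mutually orthogonal eigenfunctions of the operator $X^*X$, and $(cX)^*(cX) = c^2X^*X$, so that $p_i$ are the same for $cX$ and $X$. On the other hand, $(cX)p_i = c\,Xp_i = c\,t_i$.
2. Before constructing an example illustrating nonlinearity of PCT we remark that this statement, in its more precise formulation, says that there are no versions of $\mathrm{PCT}(x,k)$, $\mathrm{PCT}(y,k)$ and $\mathrm{PCT}(x+y,k)$ for which $\mathrm{PCT}(x+y,k) = \mathrm{PCT}(x,k) + \mathrm{PCT}(y,k)$. To calculate PCT in the example we observe that both operators have a 2-dimensional image in $L^2(\Omega)$, whereas their first Principal Component Transforms are operators with a 1-dimensional image. However, the sum of these transforms has a 2-dimensional image, as its representation in the basis $\{r_1, r_2\}$ is given by the non-singular matrix
$$\begin{pmatrix} 3.5 & 1.5 \\ 1.5 & 1.5 \end{pmatrix},$$
and it therefore cannot coincide with any version of $\mathrm{PCT}(X,1)$, where $X$ is the sum of the two operators.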
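Both properties can be observed already for $2 \times 2$ matrices. The sketch below is a minimal illustration under our own assumptions: the matrices `A` and `B` and the helper `pct_matrix` are hypothetical and are not the operators used in the argument above. Here `A` and `B` have rank 1, so each coincides with its own first transform, while their sum has rank 2.

```python
import numpy as np

def pct_matrix(X, k):
    """Matrix analog of PCT(X, k): the first k terms of the SVD of X."""
    U, s, Vt = np.linalg.svd(X)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Hypothetical rank-1 symmetric matrices (our choice, for illustration only).
A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[1.0, 1.0], [1.0, 1.0]])

sum_of_pct = pct_matrix(A, 1) + pct_matrix(B, 1)   # equals A + B
pct_of_sum = pct_matrix(A + B, 1)                  # best rank-1 fit to A + B

rank_sum_of_pct = np.linalg.matrix_rank(sum_of_pct, tol=1e-10)
rank_pct_of_sum = np.linalg.matrix_rank(pct_of_sum, tol=1e-10)
print("rank of PCT(A,1) + PCT(B,1):", rank_sum_of_pct)
print("rank of PCT(A+B,1):", rank_pct_of_sum)

# Homogeneity: scaling the matrix scales its transform.
c = 2.5
homogeneous = np.allclose(pct_matrix(c * (A + B), 1), c * pct_of_sum)
print("PCT(cX,1) == c PCT(X,1):", homogeneous)
```

Since the two ranks differ, the sum of the transforms cannot be a version of the transform of the sum, whereas scaling by a constant commutes with the transform.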

PCT Is Continuous
Let us consider a sequence of parametrized, square integrable functions ( ) : In this case The above theorem can be reformulated in terms of robustness of PCT.
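Corollary 1. Let $k \in \mathbb{N}$ and let $x: U \times \Omega \to \mathbb{R}$ be a parametrized, square integrable function. Then, given an $\varepsilon > 0$, there is a $\delta > 0$ such that for every parametrized, square integrable function $y$ with $\operatorname{dist}(x,y) < \delta$ we have $\operatorname{dist}(\mathrm{PCT}(x,k), \mathrm{PCT}(y,k)) < \varepsilon$ for some suitable versions of PCT.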
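The robustness statement can be illustrated for matrices. The following sketch rests on our own assumptions: the test matrix is constructed with prescribed singular values 5, 3, 1, 0.5 (so that there is a clear gap after the second one), the perturbation direction `E` is random, and `pct_matrix` is a hypothetical helper name. It measures how far the rank-2 truncation moves when the matrix is perturbed by a matrix of spectral norm `eps`.

```python
import numpy as np

def pct_matrix(X, k):
    """Matrix analog of PCT(X, k): the first k terms of the SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)                      # hypothetical test data
Q1, _ = np.linalg.qr(rng.standard_normal((8, 4)))   # orthonormal columns
Q2, _ = np.linalg.qr(rng.standard_normal((6, 4)))
X = Q1 @ np.diag([5.0, 3.0, 1.0, 0.5]) @ Q2.T       # singular values 5, 3, 1, 0.5
k = 2

E = rng.standard_normal(X.shape)
E /= np.linalg.norm(E, 2)                           # unit spectral norm

results = []
for eps in (1e-2, 1e-4, 1e-6):
    Y = X + eps * E
    diff = np.linalg.norm(pct_matrix(X, k) - pct_matrix(Y, k), 2)
    results.append((eps, diff))
    print(f"||X - Y|| = {eps:.0e}  ->  ||PCT(X,k) - PCT(Y,k)|| = {diff:.3e}")
```

In this example the change in the truncation shrinks proportionally to the size of the perturbation, which is the behaviour the corollary describes.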

Discretization of Functions
In the papers [5] [6], which are aimed at applying the metamodeling approach to gene regulatory networks, the approximations of the parametrized sigmoidal functions are performed numerically by using discretization and SVD of the resulting matrices. The continuity of PCT, proved in the previous subsection, can now be used to justify this analysis and, in particular, the results on the number of principal components $k$ ensuring the prescribed precision.
In this subsection we suppose that all functions are continuous, which is sufficient for most applications.The general case is, however, unproblematic as well if we slightly adjust the approximation procedure.
Let x be a continuous function on a compact set , , We define the sequence of the functions ( ) x s as follows: where ( ) Proof.The function x is continuous on the compact set D , therefore ( ) x s is uniformly continuous on D .Then for all 0 ε > there is On the other hand, there is a number N for which ( ) < as long as n N > .Let s be an arbitrary point from D .Then for any n there is . Taking now an arbitrary n N > we obtain x k = there are versions Finally, we observe that if x s     .In the next subsection we provide an example of such approximation stemming from the biochemical systems theory.

Examples of Discrete Approximations
In this subsection we study the parametrized power function $x(u,\omega) = u^{\omega}$. To approximate this function we construct a matrix $\tilde{X}$ as follows: we divide the intervals of the variable $u$ and of the parameter $\omega$ into subintervals, and every entry of the matrix $\tilde{X}$ is given by the value of $x$ at the corresponding grid point. The corresponding discretization of $\mathrm{PCT}(x,k)$ will then be given by the matrix
$$\tilde{X}_k = \sum_{i=1}^{k} \tilde{t}_i\,\tilde{p}_i^{\,*}.$$
The vectors $\tilde{p}_i$ and $\tilde{t}_i$ can be obtained from the singular value decomposition $\tilde{X} = USP^*$, where the rows of the scores matrix $T = US$ consist of the numbers $\tilde{t}_i$ and the columns of the loadings matrix $P$ are the vectors $\tilde{p}_i$. As an example, let us consider the case 4 Then Assume now that $\omega = 0.5$. This value corresponds to row $s$ in the matrix $T$. We find a number $s$ as follows: ( ) ( ) This yields Figure 2 depicts the cumulative normal distribution function
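The discretization procedure can be sketched in a few lines of NumPy. Everything numerical below is an assumption made for illustration: the intervals $[0.5, 2]$ for $u$ and $[0, 2]$ for $\omega$, the grid sizes and the relative tolerance `tol` are hypothetical and are not the values used in the example above. Rows of the matrix correspond to values of $\omega$ and columns to values of $u$.

```python
import numpy as np

# Hypothetical grids (not the ones used in the text).
u = np.linspace(0.5, 2.0, 60)          # grid on U
omega = np.linspace(0.0, 2.0, 41)      # grid on Omega, contains 0.5
X = u[None, :] ** omega[:, None]       # X[i, j] = u_j ** omega_i

# Singular value decomposition X = U S P^T, scores T = U S, loadings P.
U_svd, sv, Pt = np.linalg.svd(X, full_matrices=False)
T = U_svd * sv
P = Pt.T

# Smallest k whose first omitted singular value is below the tolerance.
tol = 1e-3
below = np.nonzero(sv <= tol * sv[0])[0]
k = int(below[0]) if below.size else len(sv)

X_k = T[:, :k] @ P[:, :k].T            # discretized PCT(x, k)
err = np.linalg.norm(X - X_k, 2)       # spectral-norm error of the truncation

# Row of T that corresponds to omega = 0.5.
s = int(np.argmin(np.abs(omega - 0.5)))
row_approx = T[s, :k] @ P[:, :k].T     # approximation of u ** 0.5 on the grid
row_err = np.max(np.abs(row_approx - X[s]))

print(f"k = {k}, spectral error = {err:.3e}, row error at omega=0.5: {row_err:.3e}")
```

To compare the matrix error with the operator norm in $L^2$, the spectral norm would additionally have to be scaled by the square root of the product of the grid steps; this scaling is omitted here because only the relative decay of the singular values is used to select $k$.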

PCT of Products of Functions
To calculate PCT of products of parametrized functions we need to apply the theory of tensor products of Hilbert spaces and compact operators. Appendix 5.2 includes all the details we need in this section.

Products of Parametrized Functions
Theorem 5. In the above notation: We prove now that the set $E$ is an orthonormal basis in the space $H$. Its orthonormality follows directly from its definition. It remains therefore to check that the set of all linear combinations of the elements from $E$ is dense in $H$. Indeed, the set of continuous functions, and hence the set $P$ of polynomials $P(u)$, on $U$ is dense in $H$. On the other hand, the set of polynomials that are products of polynomials in the separate groups of variables spans the set $P$ and, finally, the set $E$ spans the latter set. Thus, $E$ spans $H$ and we have proved that any $h \in H$ can be represented as an $L^2$-convergent series, so that $H = H_1 \otimes H_2$.
Let us now prove the last formula of the theorem. First of all, we remark that the Definition (63) implies the corresponding equality for the operators. By the assumptions on the kernels, the operators in this equality are linear and bounded. Therefore, it is sufficient to check the equality on the elements of the basis. This proves that $p_{ij}$ are normalized, mutually orthogonal eigenvectors of the operator $X^*X$ corresponding to the eigenvalues that are products of the eigenvalues of the factors, which proves the theorem.
Remark 3. Theorem 6 is only valid for the full PCT. For the truncated versions of PCT the statement is not necessarily valid, as the order of the singular values of the product depends on the magnitude of the eigenvalues of the factors.
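For matrices, the tensor product of operators corresponds to the Kronecker product, and both the product rule for the eigenvalues and the caveat of Remark 3 can be seen in a small example. The diagonal matrices `X1`, `X2` and the helper `truncate` below are our own hypothetical choices, made so that the numbers can be checked by hand.

```python
import numpy as np

def truncate(X, k):
    """First k terms of the singular value decomposition of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Hypothetical factors with singular values (3, 1) and (2, 0.1).
X1 = np.diag([3.0, 1.0])
X2 = np.diag([2.0, 0.1])
X = np.kron(X1, X2)                    # matrix analog of the tensor product

s1 = np.linalg.svd(X1, compute_uv=False)
s2 = np.linalg.svd(X2, compute_uv=False)
s = np.linalg.svd(X, compute_uv=False)

# Full decomposition: singular values of the product are all pairwise products.
products = np.sort(np.outer(s1, s2).ravel())[::-1]
print("singular values of the product:", s)
print("sorted pairwise products:      ", products)

# Truncated decompositions: two rank-2 candidates built from the factors.
cand_a = np.kron(truncate(X1, 1), truncate(X2, 2))   # keeps 3*2 and 3*0.1
cand_b = np.kron(truncate(X1, 2), truncate(X2, 1))   # keeps 3*2 and 1*2
err_a = np.linalg.norm(X - cand_a, 2)
err_b = np.linalg.norm(X - cand_b, 2)
print(f"error of candidate a: {err_a:.3f}, candidate b: {err_b:.3f}, "
      f"best rank-2 error: {s[2]:.3f}")
```

Only the second candidate attains the best rank-2 error. Which terms of the factors must be kept therefore depends on the magnitudes of the products, not on the truncation level of each factor separately.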

Examples of Products of Parametrized Functions
In this subsection we describe the kernels of the integral operators related to products of the parametrized functions from the subsection "Examples of PCT". These examples are of importance in systems biology.
Example 1.Consider the following function Then, using Formulas ( 14) and (15), we obtain the following representations of the kernels γ and δ ( x u u e e u u U Then, using Formulas ( 14) and (15), we obtain the following representations of the kernels γ and δ ( ) ) ( ) ( ) ( ) , , , ( ) , , , q q q q q q u u x u u u u ( ) , , 1, 2.
Then, using Formulas (14) and (15), we obtain the following representations of the kernels $\gamma$ and $\delta$ ( ) , , , d d , ( ) , , , d d q q q q q q q q q q q q U u u u u u u u u u u
Remark 4. The eigenfunctions of the integral operators whose kernels are products of parametrized functions are, according to Subsection 5.2, also products of the respective eigenfunctions of the factors.

Conclusions
The main results of the paper can be summarized as follows. We defined the distance in the space of parametrized functions. We defined the $k$th Principal Component Transform (PCT) and the Full Principal Component Transform of functions $x \in L^2(U \times \Omega)$. The $k$th PCT is the best approximation of the given function, i.e. it minimizes $\operatorname{dist}(\cdot,\cdot)$. We proved that if the sequence of functions $x^{(n)}(s)$ converges to the continuous function $x(s)$, then the sequence of the PCT of $x^{(n)}(s)$ converges to the PCT of $x(s)$. Some further properties of PCT were considered. These results can also serve as a theoretical background for the design of some metamodels. Using the theory of the tensor product of Hilbert spaces and compact operators we calculated the PCT of products of functions. We provided several examples of discrete approximations and products of parametrized functions.
We emphasize that our study is related to systems biology. In future works we aim to investigate the problem of "sloppiness" in nonlinear models [1] and to create an effective parameter estimation method for the "S-systems" ([10]).
It is well-known that $p_i$ can always be chosen to be orthogonal: , .
, , where $t_i = Xp_i$, and the convergence is understood in the sense of the norm in the space $K$. We define the linear bounded operators ( ) , The following result is known as Allahverdiev's theorem, see e.g. [8]:
Proposition 7. For any linear compact operator
Proof. First of all, we prove that From (79) and (78) we get ( ) We calculate the norm of $X - X_k$ using (81), (82): and As Hence, Secondly, we prove that Let $y_1, \dots, y_k$ be a basis in $\operatorname{Im} Y$. Then there exist some We want to prove that span , , span , , 0 , , such that the system , , 0, 1 has nontrivial solutions. , We define the tensor product ( ) ( ) Therefore $X$ is bounded, and in particular, To prove compactness we choose an arbitrary $\varepsilon > 0$ and linear bounded fi- Therefore, the operator Proof. The set of linear combinations of parameters and $k \in \mathbb{N}$ be a given number. The $k$th Principal Component Transform (PCT) is a specially constructed parametrized function
be an $m \times n$-matrix, for instance, a discretized function $x(u,\omega)$, where ( )

$L^2(U)$ and $L^2(\Omega)$, where $U$, as before, is a compact subset of $\mathbb{R}^N$ and $\Omega$ is a compact subset of $\mathbb{R}^M$. We denote the norm in both spaces as $\|\cdot\|_{L^2}$. Consider the integral operator

$(a, b)$ and $(a, b)^*$ are row and column vectors, respectively. Matrices $A$ and $B$ are symmetric. Then * 2 , respectively. Therefore the best rank 1 approximations of $A$ and $B$ are

2-dimensional image, as its representation in the basis $\{r_1, r_2\}$

Figure 1. (a) The power function and its PCT; (b) The Hill function and its PCT.

Figure 2. (a) The cumulative normal distribution function and its PCT; (b) The normal distribution function and its PCT.

Example 3. For the Hill function we obtain

Tensor product of operators in Hilbert spaces
Let $H_1, H_2$ and $K_1, K_2$ be real separable Hilbert spaces, where we use the definition of the tensor product from Appendix 5.2. ], $\sigma_i >$ be all positive eigenvalues of the operator $X^*X$, the associated normalized eigenvectors being