Discrete Differential Geometry and the Structural Study of Protein Complexes

This paper proposes a novel four-dimensional approach to the structural study of protein complexes. In the approach, the surface of a protein molecule is to be described using the intersection of a pair of four-dimensional triangular cones (with multiple top vertexes). As a mathematical toy model of protein complexes, we consider complexes of closed trajectories of n-simplices ($n = 2, 3, 4, \ldots$), where the design problem of protein complexes corresponds to an extended version of the Hamiltonian cycle problem. The problem is to find “a set of” closed trajectories of n-simplices which fills the n-dimensional region defined by a given pair of $(n+1)$-dimensional triangular cones. Here we give a solution to the extended Hamiltonian cycle problem in the case of $n = 2$ using the discrete differential geometry of triangles (i.e., 2-simplices).


Introduction
Proteins are called the workhorse molecules of life, playing a crucial role in essentially every activity of living organisms. A protein molecule is made from one or more long chains of amino acids, which normally fold into a well-defined three-dimensional structure. It is the precise shape of the folded structure that determines the function of a protein in a cell.
Most cellular processes are not carried out by random collisions between freely diffusing proteins. Proteins usually interact with other proteins and assemble into complexes to carry out their function [1] [2] [3]. It is therefore crucial to understand and control the formation of protein complexes in order to understand biological activity in the cell. In particular, structural characterization of the components of complexes, such as shape complementarity at protein-protein interfaces, is the key to understanding the function of proteins.
In the last two decades, a huge number of protein structures have been experimentally determined via high-throughput structural genomics pipelines. However, experimental determination of their functions has lagged far behind because the process is labor-intensive and time-consuming. Improved computational approaches to function prediction for proteins with known structure are urgently needed [4].
It is, however, extremely difficult to describe the shape of proteins without visual inspection on a three-dimensional display. The fundamental question is how to describe the geometry of a shape as complicated as that of a protein.
In most previous studies, the surface of proteins is described using concepts developed in computational geometry and topology, such as the Voronoi diagram, Delaunay simplices, and the alpha shape representation [5] [6] [7]. As for protein complexes, the topological arrangement of their subunits is usually represented as a graph [8] [9].
In this paper, we propose a novel mathematical toy model intended for the structural study of protein complexes. While physics and mathematics have inspired each other throughout their long relationship, a comparable relationship between biology and mathematics is still to come. In our case, the relevant relation is the one between real protein complexes and the new mathematical toy model. That is, it is critical to justify why such new toy models are indeed relevant and practically useful.
To justify the usefulness of mathematical tools in biology, we mention the case of the Estrada index, introduced by Ernesto Estrada [10] in 2000. The Estrada index was originally proposed as a molecular structure descriptor, and protein structure has been investigated extensively in mathematics over the past decade using the Estrada index and the normalized Laplacian Estrada index [11]. The Estrada indices have also found a range of applications in chemistry and complex networks. More recently, a dynamic version of the Estrada indices has been proposed [12] to study large-scale time-evolving networks, which arise naturally in a variety of areas from peer-to-peer telecommunication to online human social behavior to neuroscience.
As for other mathematical approaches to protein structure analysis, most of them are applications of known mathematical techniques to the structural study of proteins, such as distance geometry [13], knot theory [13], and persistent homology [14]. Differential geometric techniques have also been applied to the analysis of the backbone structure of proteins [15].
In our model, instead of open chains of amino acids, we consider closed trajectories of n-simplices using the discrete differential geometry of n-simplices [17]. Then, interaction of open chains of amino acids (i.e., proteins) is mimicked with "recombination", such as fusion and fission, of closed chains of n-simplices. The advantage of our model lies in the correspondence between the shape of a complex of closed trajectories of n-simplices and (a projection image of) the intersection of a pair of $(n+1)$-dimensional cones.
Using the mathematical toy model, we will consider the problem of designing protein complexes from scratch (de novo design of protein complexes [18] [19] [20]). That is, we will consider the problem of finding a set of closed trajectories of n-simplices that forms a specified n-dimensional shape: de novo design of complexes of closed trajectories of n-simplices. For simplicity, we consider the case of $n = 2$ only.

Problem
The problem we consider here is an extended version of the Hamiltonian cycle problem on a regular triangular mesh. A Hamiltonian cycle of triangles (i.e., 2-simplices) is a closed trajectory through a given triangular mesh which visits each triangle exactly once, where the trajectory passes from one triangle to the next through a common edge. As shown in Figure 1(a), meshes are given as a region in a two-dimensional regular triangular lattice. In this case, a Hamiltonian cycle is obtained as shown in Figure 1(b).
To study the formation of a complex of closed trajectories of triangles, we consider not only a single closed trajectory but also multiple closed trajectories of triangles which together cover the given region. In the case of Figure 2, two closed trajectories are required.
In what follows, we will propose a novel method for finding all the sets of closed trajectories which cover a given region of triangles.
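The extended problem can be made concrete with a small brute-force sketch. It is not the method proposed in this paper: it simply enumerates every way of choosing shared edges so that each triangle has exactly two chosen neighbours, which is equivalent to a set of closed trajectories covering the region. The triangle encoding `(i, j, k)` and all function names are assumptions introduced for illustration.

```python
from itertools import combinations


def vertices(tri):
    """Corner points of triangle (i, j, k) on a skewed integer grid.

    Hypothetical encoding: k == 0 is an upward triangle, k == 1 a downward one.
    """
    i, j, k = tri
    if k == 0:
        return frozenset({(i, j), (i + 1, j), (i, j + 1)})
    return frozenset({(i + 1, j), (i + 1, j + 1), (i, j + 1)})


def shared_edges(region):
    """Pairs of triangles in the region that meet along a common edge."""
    return [(a, b) for a, b in combinations(sorted(region), 2)
            if len(vertices(a) & vertices(b)) == 2]


def split_into_cycles(region, edges):
    """Split an edge set in which every triangle has degree 2 into cycles."""
    nbrs = {t: [] for t in region}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    seen, cycles = set(), []
    for start in sorted(region):
        if start in seen:
            continue
        cycle, prev, cur = [], None, start
        while cur not in seen:
            seen.add(cur)
            cycle.append(cur)
            first, second = nbrs[cur]
            prev, cur = cur, (first if first != prev else second)
        cycles.append(cycle)
    return cycles


def closed_trajectory_covers(region):
    """All sets of closed trajectories that visit every triangle exactly once."""
    region = set(region)
    edges = shared_edges(region)
    degree = {t: 0 for t in region}
    chosen, covers = [], []

    def backtrack(idx):
        if idx == len(edges):
            if all(d == 2 for d in degree.values()):
                covers.append(split_into_cycles(region, chosen))
            return
        a, b = edges[idx]
        if degree[a] < 2 and degree[b] < 2:  # branch 1: use this shared edge
            degree[a] += 1
            degree[b] += 1
            chosen.append((a, b))
            backtrack(idx + 1)
            chosen.pop()
            degree[a] -= 1
            degree[b] -= 1
        backtrack(idx + 1)  # branch 2: skip this shared edge

    backtrack(0)
    return covers


# The six triangles around the lattice point (1, 1).
hexagon = {(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1), (1, 0, 1)}
covers = closed_trajectory_covers(hexagon)
```

For the hexagon above, the sketch finds a single cover consisting of one closed trajectory of length 6. A cover with exactly one cycle is a Hamiltonian cycle; covers with several cycles are the additional solutions admitted by the extended problem. The search is exponential in the number of shared edges, so it is only meant for very small regions.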

Differential Structure on the Mesh
To define a differential structure on a regular triangular mesh, we stack unit cubes diagonally in the three-dimensional Euclidean space $E^3$ (Figure 3(a)).
By piling up unit cubes orderly along the diagonal direction and drawing a line diagonally on each of the three upper faces of every unit cube, we obtain a "drawing" on the slope of the mountain range-like object (Figure 3(a) and Figure 3(d)). It is this drawing which specifies a flow of "slant" triangles (along the thick polygonal lines) on the slope.
Then, we define a flow of "flat" triangles on a plane which is perpendicular to the direction of $(1, 1, 1)$ in $E^3$ by projecting the flow of "slant" triangles onto the plane (the lower part of Figure 3(c)). In the case of Figure 3(c), we obtain a closed trajectory of flat triangles of length 30 and others. In this section we give the precise definition of the differential structure on a regular triangular mesh.
For space saving purposes, we use monomials in the indeterminates $x_0$, $x_1$ and $x_2$ to represent the coordinates of points in the three-dimensional Euclidean space. For example, the point $(l, m, n) \in \mathbb{Z}^3$ is represented by the monomial $x_0^l x_1^m x_2^n$, where $\mathbb{Z}$ denotes the set of all integers. Then, points ( ) for all pairs of i and j.)

Triangle Tiles
Shown in the upper part of Figure 3(a) is the unit cube at the origin $O$ of a three-dimensional Cartesian coordinate system defined by three axes $\phi_0$, $\phi_1$ and $\phi_2$. Let Then, the upper face $P_a P_b P_c P_d$ on the $p_0 p_1$-plane is divided into two "slant triangles", $P_a P_b P_c$ and $P_a P_d P_c$, by the line segment $P_a P_c$. The other upper faces are also divided into two "slant triangles" similarly.
Shown in the lower part of Figure 3(a) is the projected image of the unit cube on a plane which is perpendicular to the direction of $(1, 1, 1)$. The unit cube at $O$ is projected onto a hexagon, which is divided into six "flat triangles" by the image of the three thick line segments on the cube.
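The hexagonal image of the unit cube can be checked directly. The following is a minimal sketch, assuming the projection is the orthogonal projection onto the plane through the origin perpendicular to $(1, 1, 1)$; the function name `project` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product


def project(p):
    """Orthogonal projection of p onto the plane perpendicular to (1, 1, 1)."""
    shift = Fraction(sum(p), 3)  # component of p along (1, 1, 1)
    return tuple(Fraction(c) - shift for c in p)


cube = list(product((0, 1), repeat=3))       # the 8 vertices of the unit cube
images = {v: project(v) for v in cube}
center = project((0, 0, 0))
outline = {img for img in images.values() if img != center}
```

The two vertices on the diagonal, $(0, 0, 0)$ and $(1, 1, 1)$, fall on the same point at the center, while the remaining six vertices land on six distinct points at equal distance from it. These are the corners of the hexagon, and joining them to the center gives the six flat triangles.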
The schematic drawing of Figure 3(b) shows the projection of slant triangles onto a flat triangle. Using the projection, we will define a discrete differential structure on the set of flat triangles, i.e., a regular triangular mesh. Let $\mathrm{Sym}_3$ be the symmetric group on a finite set of three symbols. For  denote the convex hull of three points a , ( ) where $\mathbb{R}$ denotes the set of all real numbers. For example, the "slant triangle" $P_a P_d P_c$ defined above is denoted by [ ] ( ) ( )
Definition 3.1 We define the set $S_2$ of all slant (triangle) tiles by The set $B_2$ of all flat (triangle) tiles is defined as the quotient of $S_2$ by "shift operator" $\sigma$, i.e., ) ( ) ( ) ( ) We identify $B_2$ with the projection image of "slant triangles" on a plane perpendicular to vector $(1, 1, 1)$.
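The passage from slant tiles to flat tiles can be illustrated with a short sketch. It rests on two assumptions that are not confirmed by the text as it survives here: a point is stored as the exponent triple of its monomial, and the shift operator acts as multiplication by $x_0 x_1 x_2$, i.e., translation by $(1, 1, 1)$. The names `mul`, `shift` and `flat_class`, and the example tiles, are hypothetical.

```python
def mul(p, q):
    """Product of two monomials stored as exponent triples (a, b, c)."""
    return tuple(a + b for a, b in zip(p, q))


SIGMA = (1, 1, 1)  # assumed shift: multiplication by x0 * x1 * x2


def shift(tile, k=1):
    """Apply the assumed shift operator k times to every vertex of a tile."""
    step = tuple(k * e for e in SIGMA)
    return frozenset(mul(v, step) for v in tile)


def flat_class(tile):
    """Canonical representative of a tile modulo the shift.

    Every vertex moves by the same multiple of (1, 1, 1), so subtracting the
    smallest exponent found in the tile gives a shift-invariant normal form.
    """
    lowest = min(min(v) for v in tile)
    return shift(tile, -lowest)


# Hypothetical tiles: halves of two different faces of the unit cube.
tile = frozenset({(1, 0, 0), (1, 1, 0), (0, 1, 0)})
other = frozenset({(1, 0, 0), (1, 0, 1), (0, 0, 1)})
```

Under these assumptions, `flat_class(shift(tile, 3)) == flat_class(tile)`, so a slant tile and its shifted copies represent the same flat tile, whereas `tile` and `other` fall into different classes.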

Tangent Space at a Flat Triangle Tile
A tangent bundle-like local structure , the gradient Ds of s is defined by Then, we can identify  s S ∈ corresponds to the direction of the thick line on the "slant triangle" s which is described in subsection 3.1 above (Figure 3).

Vector Field on B2
Having defined a tangent bundle-like structure $(TB_2, B_2, \pi)$ on a set of triangles, we now consider the inverse of the projection map $\pi$.
Definition 3.5 A section γ of ( ) , , For a section $\gamma$ of $(TB_2, B_2, \pi)$, the value of $\gamma$ on  [ ]  Then, the set of three slant tiles, { } , , D U s s s , makes up a "continuous mountain path" along the thick polygonal line (i.e., along the gradient $Ds$) at $s$ in $S_2$ (Figure 4(a)). By projecting these slant tiles on $B_2$, we obtain a trajectory of flat tiles of length three at $s \bmod \sigma$. To consider the "smoothness" of the section $\gamma$, we first define a local trajectory passing through . For example, ( ) can assume one of the three values of However some of the slant tiles are not connected smoothly to ( ) is not connected smoothly to ( ) To obtain a "smooth" trajectory, we will impose a condition on sections of $(TB_2, B_2, \pi)$.
Definition 3.7 (Smoothness condition) Let $\gamma$ be a section of $(TB_2, B_2, \pi)$ and . The smoothness condition on $\gamma$ at $t$ is defined by In what follows, we will only consider the sections of $(TB_2, B_2, \pi)$ which satisfy the smoothness condition at every flat tile of $B_2$.

Remark
( ) x ρ corresponds to (the direction of) the contact edge between s and U s .
( ) x ρ corresponds to (the direction of) the contact edge between s and D s .
Definition 3.8 (Vector field) A vector field on $B_2$ is a section of $(TB_2, B_2, \pi)$ which satisfies the smoothness condition at every flat tile of $B_2$.
Shown in Figure 4(c) are all the six types of "local" smooth sections of $(TB_2, B_2, \pi)$ on a hexagonal region composed of six flat tiles of $B_2$. By patching these "local" sections together, we obtain a "global" section of $(TB_2, B_2, \pi)$. Note that some of the "global" sections do not satisfy the smoothness condition, as shown in Figure 4(d). A singular flat tile of a section $\gamma$ of $(TB_2, B_2, \pi)$ is a flat tile where $\gamma$ does not satisfy the smoothness condition.
A singular flat tile is assigned either no gradient (i.e., without thick edge), two gradients (i.e., two thick edges), or three gradients. Let be a trajectory of a vector field $V$, where $I$ is a subset of the set $\mathbb{N}$ of natural numbers. Then, we can define the second derivative of the trajectory as follows.
Definition 3.9 The second derivative , otherwise where $-D := U$ and $-U := D$.
In [16], the conformation of a protein backbone structure is encoded into a 16-valued sequence using the second derivative of trajectories of tetrahedrons (i.e., 3-simplices).

Vector Fields Induced by Tangent Cones
In the beginning of this section, we constructed a "mountain range-like shape object" by piling up unit cubes diagonally. (Using the terminology defined above, it is a section of $(TB_2, B_2, \pi)$.) Unit cubes are piled up to form a union of triangular cones, which can be identified by its top vertexes. For example, the object shown in the upper part of Figure 3(c) is identified by five peaks $P_0$, $P_1$, $P_2$, $P_3$, and $P_4$. Then, $dc$ is given as follows.
Lemma 3.11 For c ConeA The result follows immediately.
The surfaces of a tangent cone $c$ induce a vector field of $(TB_2, B_2, \pi)$ as follows.
V t is uniquely determined at every flat tile of $B_2$. For example, in Figure 3(c), the thick polygonal lines on the surfaces of the tangent cone show the vector field induced by the tangent cone.
Note that all the smooth sections shown in Figure 4(c) are induced by a tangent cone as indicated in the figure.
Proposition 3.13 For any vector field $V$, there exists a tangent cone $c$ such that c V = .
Proof. Let $V$ be a vector field on $B_2$ and let denote the restriction of $V$ on the hexagon $U_i$.
Because of the smoothness condition, $V$ is locally induced by a tangent cone as shown in Figure 3(c). That is, there exists a tangent cone $c_i$ for each Moreover, by considering all combinations, we can assume for any pair of adjacent hexagons Suppose that In other words, tangent cone $c_a$ is (partially) covered by tangent cone $c_b$ Then, there exists a circular loop $\Theta$ of hexagons of E around $U_b$ such that and for , where $\Omega$ is the set of all the hexagons of E contained in the circular region surrounded by $\Theta$. Because of the shape of the tangent cones, U is adjacent to $U_e$ and $c_e$ is (partially) covered by $c_f$ on $U_e$, i.e., ( ) ( ) ( ) which is in contradiction to Equation (1).

The Boundary of a Closed Trajectory
Now we return to the problem described in Section 2. Using the terminology given in Section 3, the problem is stated as follows.
Problem 4.1. (De novo design of complexes of closed trajectories of triangles) Given a region in $B_2$, find all the vector fields on $B_2$ which give a decomposition of the region into closed trajectories.
If there exists such a vector field, we can describe the boundary of the region using a pair of three-dimensional cones as explained in this section.
The cones are defined in another lattice which is associated with 3   . Recall that the three-dimensional lattice 3  is generated by $x_0$, $x_1$ and $x_2$. The associated lattice is defined as follows.
Definition 4.2. The conjugate lattice 3  is the lattice which is generated by $x_0 x_1$, $x_1 x_2$ and $x_0 x_2$.
Note that the gradient of a slant tile corresponds to one of the three coordinate axes of the conjugate lattice 3  . In particular, a trajectory of slant tiles corresponds to a zig-zag walk (with gaps) on the grid of 3  .
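The conjugate lattice can be explored numerically with a small sketch. It assumes that the generators are the three pairwise products $x_0 x_1$, $x_1 x_2$ and $x_0 x_2$, i.e., the exponent vectors $(1, 1, 0)$, $(0, 1, 1)$ and $(1, 0, 1)$; the generator list in this copy of Definition 4.2 is partly lost, so this reading is an assumption, as are the function names.

```python
# Assumed generators of the conjugate lattice: x0*x1, x1*x2, x0*x2.
GENERATORS = ((1, 1, 0), (0, 1, 1), (1, 0, 1))


def conjugate_coordinates(p):
    """Integer coefficients (u, v, w) with p = u*g0 + v*g1 + w*g2, or None.

    Solving a = u + w, b = u + v, c = v + w over the integers is possible
    exactly when a + b + c is even.
    """
    a, b, c = p
    if (a + b + c) % 2:
        return None
    return ((a + b - c) // 2, (b + c - a) // 2, (a + c - b) // 2)


def combine(coeffs):
    """Point of the integer lattice with the given conjugate coordinates."""
    return tuple(sum(k * g[i] for k, g in zip(coeffs, GENERATORS))
                 for i in range(3))
```

Under this assumption the conjugate lattice consists of the integer points whose coordinates have an even sum, so it has index 2 in the integer lattice. This is consistent with the remark that a trajectory of slant tiles is a zig-zag walk "with gaps": a point such as $(1, 0, 0)$ has no conjugate coordinates.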
Two types of cones are defined in 3  : The inverted cotangent roof of the region, where ICone A ⊂ is defined by IRoof A ⊂  is defined by Then, the boundary of a closed trajectory of a vector field on $B_2$ can be described using a pair of a cotangent roof and an inverted cotangent roof as shown below.
Let $w$ be a cotangent cone. We denote by ( ) V be a vector field of $(TB_2, B_2, \pi)$ induced by a tangent cone $c$ whose top vertexes ( ) V . Let $R_\mu$ be the region swept by the trajectory $\mu$. Then, there exist a cotangent cone $w$ and an inverted cotangent cone $iv$ such that the boundary of $R_\mu$ is uniquely specified by the intersection of ( ) The pair $(w, iv)$ is called a boundary pair (of the region $R_\mu$) and the specified region is denoted by ( ) Because of the smoothness condition, we may assume slant tiles of $\Lambda$ are connected "smoothly" in $S_2$ without any gap. Let $A$ be the set of all vertexes of the slant tiles of $\Lambda$. Define cones $w$ and $iv$ by * * , .
Then, the boundary of $R_\mu$ is obtained by connecting the points of ( ) ( ) ( ) , where $\pi$ denotes the projection of the lattice points of 3   on the corresponding vertexes of flat tiles of $B_2$. , , , , as shown in Figure 5.
Remark Let  be the set of all tangent cones. Let  be the set of all cotangent Let  be the set of all the regions in $B_2$ which are defined by boundary pairs. Then, an  -valued function is defined on ×   by   One of the solutions to the problem is obtained immediately, i.e.,

( ) ( )
Cone w iv ∂ ∩ ∂ (Figure 6(b)). In this section, we consider how to find all solutions to the problem.

Closed-Trajectory Decomposition
Definition 5.2. For In other words, RoofA is obtained by putting as many unit cubes as possible on ConeA.
Definition 5.3. For CeilA ⊂  is defined by where $p(c)$ denotes the set of all the "top vertexes" of a cone $c$.
Proposition 5.5. When a closed trajectory is merged with a closed trajectory of length 6 (which occupies a hexagonal region), they do not fuse together to form a single closed trajectory.
In other words, closed trajectories always split when they interact with a hexagon.
See Table 1 for the distribution of the length of closed trajectories of n-simplices ($n = 2, 3, 4$).

Conclusions
We have considered an extended version of a two-dimensional Hamiltonian cycle problem in a three-dimensional setting, where the boundary of a two-dimensional region is uniquely specified by a pair of three-dimensional cones, i.e., a boundary pair. Using the discrete differential geometry of triangles, all decompositions of the region into closed trajectories of triangles are obtained immediately from the intersection of the boundary pair.
In the structural study of protein complexes, it is essential to characterize surface features such as bumps (convexity) and dents (concavity) of protein molecules. However, mathematical surface characterization, which usually studies the surface of protein molecules in a three-dimensional setting, has not produced satisfactory results so far.
This paper proposes a novel mathematical approach to the structural study of protein complexes, i.e., an approach from a four-dimensional setting, where the surface of protein molecules is to be described by a pair of four-dimensional cones (with multiple top vertexes) as in the case of complexes of closed trajectories of triangles.
In our approach, protein molecules are to be represented as closed trajectories of tetrahedrons, where shape complementarity is expressed inherently.In particular, we could define fusion and fission of molecules (i.e., closed trajectories) naturally.
As a future research subject, we are considering whether there exist any (algebraic) equations that a given boundary pair satisfies. If there exists a set of such equations that specifies the given boundary pair, it is possible to represent the shape

Table 1. The distribution of the length of closed trajectories of n-simplices ($n = 2, 3, 4$). Two closed trajectories are identified if and only if their sequences of the second derivative coincide with each other by rotational shift, inversion, or reversion.
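The identification rule used for counting closed trajectories can be sketched as a canonical form for cyclic sequences. This is an illustrative reading, not a procedure taken from the paper: the sequence of the second derivative is assumed to be a string over the two symbols `D` and `U`, "inversion" is read as exchanging `D` and `U` (following $-D := U$, $-U := D$), and "reversion" is read as traversing the sequence backwards. The function names are hypothetical.

```python
SWAP = str.maketrans("DU", "UD")  # assumed meaning of "inversion"


def canonical(seq):
    """Smallest string among all rotations, reversals and inversions of seq."""
    if not seq:
        return seq
    variants = []
    for s in (seq, seq.translate(SWAP)):   # with and without inversion
        for t in (s, s[::-1]):             # with and without reversion
            variants.extend(t[i:] + t[:i] for i in range(len(t)))
    return min(variants)


def identified(a, b):
    """True if two second-derivative sequences describe the same trajectory."""
    return len(a) == len(b) and canonical(a) == canonical(b)
```

For example, `identified("DDUDU", "UDUDD")` holds by rotational shift and `identified("DDUDU", "UUDUD")` holds by inversion, whereas `"DDUU"` and `"DUDU"` are not identified. Counting distinct values of `canonical` over a collection of closed trajectories would give one entry per equivalence class.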

Figure 1. The Hamiltonian cycle problem on a regular triangular mesh. (a) A region in a regular triangular lattice; (b) A Hamiltonian cycle through the region.

Figure 2. An extended version of the Hamiltonian cycle problem on a regular triangular mesh: de novo design of complexes of closed trajectories of triangles. Shown are all the three sets of closed trajectories of triangles which cover the specified region. In this case, the region has no Hamiltonian cycle.

Figure 3. Differential structure on a regular triangular mesh. (a) A unit cube and its projection on a plane perpendicular to the direction of $(1, 1, 1)$ in $E^3$; (b) The projection of "slant triangles" onto a "flat triangle"; (c) A "mountain range-like shape object" obtained by piling up unit cubes orderly along the diagonal direction, whose peaks are $P_0 = (1, 0, 0)$ , ,1,1 mentioned above. Then, the schematic drawing of Figure 3(b) shows the equivalence class of a slant tile

2 TB is defined on 2
∈ .Then, we obtain Note that the monomial Ds of

Figure 4. Local trajectory. (a) The local trajectory specified by ) The smoothness condition on a section $\gamma$. Colored gray is above are the gradient of the white tile. The gradient of the gray tile is $x_0 x_1$; (c) Smooth sections of $(TB_2, B_2, \pi)$ on a hexagonal region composed of six flat tiles; (d) Sections of $(TB_2, B_2, \pi)$ which do not satisfy the smoothness condition. The corresponding singular flat tiles are colored gray in the lower part.
local trajectory specified by $s$ is the set of flat tiles passing through $s \bmod \sigma$. Let $\gamma$ be a section on $B_2$. Then, ( )

Definition 3.10 For 3 A ⊂  , a tangent cone 3 ConeA
We denote the set of all the "top vertexes" of ConeA by $p(\mathrm{Cone}A)$. Then, the mountain range-like shape object of Figure 3(c) is given by For a tangent cone $c$, let $dc$ be the set of all the slant tiles on the surfaces of $c$, i.e., of z with respect to "origin" p. In particular,

Problem 4.1. (De novo design of complexes of closed trajectories of triangles) Given a region in 2

Definition 4.3. For 3 A ⊂  , a cotangent cone * 3 Cone*

Figure 5. Cotangent roofs associated with a closed trajectory on $B_2$. (a) The boundary of the closed trajectory of Figure 3(c); (b) The cotangent roof of the region, where 1 0 2

∂ the set of all the lattice points of 3 
which reside on the surface of the cone $w$. $\partial(w)$ is called the boundary lattice points of $w$. The boundary lattice points of an inverted cotangent cone are defined in the same manner.
Proposition 4.5. Let c

*
Roof p c is not defined if ( ) 3 p c ∉  . For example, in the case of the closed trajectory given in Figure 3(d), the boundary of $R_\mu$ is uniquely specified by

Corollary 4.6. Let $R$ be a region in $B_2$. Then, $R$ has a closed-trajectory decomposition if and only if there exists a pair of a cotangent cone $w$ and an inverted cotangent cone $iv$ such that

*,
ConeA Cone B := "the region in $B_2$ which is specified by the intersection of ConeA and * Cone B ". In particular,

Problem
By Corollary 4.6, we can paraphrase Problem 4.1 as follows.
Problem 5.1. (De novo design of complexes of closed trajectories of triangles) Given a boundary pair $(w, iv)$, find all the tangent cones which induce such a vector field that gives a decomposition of the region ( )

Figure 6. The extended Hamiltonian cycle problem on $B_2$ (see also Figure 1). (a) A pair of a cotangent cone and an inverted cotangent cone which specifies the boundary of a region: