On the Defining Equations of Protein’s Shape from a Category Theoretical Point of View

This paper proposes a novel category theoretic approach to describe protein’s shape, i.e., a description of their shape by a set of algebraic equations. The focus of the approach is on the relations between proteins, rather than on the proteins themselves. Knowledge of category theory is not required as mathematical notions are defined concretely. In this paper, proteins are represented as closed trajectories (i.e., loops) of flows of triangles. The relations between proteins are defined using the fusion and fission of loops of triangles, where allostery occurs naturally. The shape of a protein is then described with quantities that are measurable with unity elements called “unit loops”. That is, protein’s shape is described with the loops that are obtained by the fusion of unit loops. Measurable loops are called “integral”. In the approach, the unit loops play a role similar to the role “1” plays in the set Z of integers. In particular, the author considers two categories of loops, the “integral” loops and the “rational” loops. Rational loops are then defined using algebraic equations with “integral loop” coefficients. Because of the approach, our theory has some similarities to quantum mechanics, where only observable quantities are admitted in physical theory. The author believes that this paper not only provides a new perspective on protein engineering, but also promotes further collaboration between biology and other disciplines.


Introduction
The interaction of proteins is mainly determined by their shape because they form protein-protein complexes to perform their tasks. This paper considers the defining equations of the shape of proteins, using the mathematical toy model proposed in [1]. Our focus is on "relations between proteins" rather than "proteins".
In the mathematical toy model, proteins are represented as closed trajectories (i.e., loops) of flows of triangles. The formation of protein-protein complexes is then described as a "fusion" of loops (denoted by "+"). One of the features of the model is a simple mechanism of the enzyme/substrate/activator-type regulation, i.e., the type of regulation called "allosteric regulation" in which a substrate cannot bind to an enzyme without an activator [2] [3].
The purpose of this paper is to describe the shape of a loop with "measurable" quantities, which are the shapes obtained by the "fusion" of unity elements called "unit loops". Unit loops are of length 6 and have a hexagonal shape. The unit loops play a role similar to the role "1" plays in the set Z of integers. We call the loops obtained by the fusion of unit loops "integral". Loops obtained by factoring an integral loop are called "rational".
Because of the approach, our theory has some similarities to quantum mechanics, where only observable quantities are admitted in physical theory. For example, we consider two sets of flows in the following. One is a set L of the "rational" flows, and the other is a set L A of the "integral" flows. (Loops of flows of L and L A are called "rational" and "integral", respectively.) In analogy to quantum mechanics, L corresponds to the quantum world of objects, and L A corresponds to the world of observers on which the measurement is performed.
The motivation of the study is the search for a new (discrete) geometry with high affinity to biology. Despite significant progress in mathematics in recent decades, it is still too early to say that we can describe the shape of bio-molecules, such as proteins, mathematically.
When designing a new protein, there are two types of approaches. One is the design of proteins with a desired backbone structure. The other is the design of proteins with desired functions (i.e., desired active sites or desired interacting surfaces). In both cases, target descriptions are usually given as a two-dimensional schematic diagram [4] [5]. In the diagram, three-dimensional backbone structures are represented as a sequence of local structural patterns (such as alpha-helices and beta-strands) with sets of pairwise spatial relationships between them.
In the previous studies, the author was thinking of specifying a given shape by the way it splits, i.e., a fission-based approach [6] [7]. However, it is not yet clear how to describe the relation between various fissions of the given shape.
(In the conclusion section, this issue is mentioned as one of the directions for future research. That is, for a given L-spectrum, find a flow U that has the same L-spectrum.) To obtain a novel mathematical representation of protein shape, the author now adopts a fusion-based approach rather than the fission-based approach using the notion of "integral loops".
In this paper the author shows that it is possible to describe protein-like shapes using algebraic equations if one takes a category theoretical approach. Along the way, also provided is the basis for describing the relation between various fissions of a shape, i.e., L-spectrum of a flow U ∈ L, albeit briefly.
However, these are results for a simple model and the author considers this to be a starting point rather than an end point of the study. The ultimate goal is not only to understand the functional mechanisms of proteins, but also to design protein-like molecules with new functions. The author hopes that the mathematical description proposed will provide some insight into the realization of this goal.
Basic ideas of the approach adopted in this paper are:
• Loops are identified with the flow consisting of the loop and any open trajectories.
• A binary relation "<" between flows is defined using the fusion "+" of loops.
• The fusion of flows of L A is regarded as an "addition".
• The fission of a flow of L A into flows of L is regarded as a "factorization".
• Flows of L are specified as a "direct factor" of a flow of L A .
• The "shadow" of a flow of L is a projection on L A , and is measurable with unit loops (i.e., hexagons).
2) Next, "a minimal flow C 1 ∈ L A that contains B 1 " is computed. C 1 consists of an integral loop c 1 and open trajectories. C 1 is called the "shadow" of B 1 .
In what follows, the author tries to present the approach outlined above in a systematic manner from scratch using simple examples. Knowledge of category theory is not required, as we define mathematical notions concretely. The rest of the paper is organized as follows. In Section 2, we give a brief review of previous results. After defining basic notions of loops of triangles in Section 3, we give the definitions of the set L of the rational flows in Section 4. The set L A of the integral flows is defined in Section 5. Then, we consider the defining equations of flows of L in a step-by-step manner in Section 6. After a discussion in Section 7, the author concludes with some suggestions for future research directions in Section 8.
The author believes that this paper will open up a new perspective on protein engineering and bring about further advances in collaboration between biology and other disciplines.
Finally, Genocript (http://www.genocript.com) is the one-man bio-venture started by the author in 2000 which is developing software tools for protein structure analysis. In particular, the author is not affiliated with any research institution.

Previous Works
Category theory has been offered as a common language for over 50 years across disciplines other than pure mathematics, such as theoretical computer science, mathematical physics, network theory, and others. In biology, the application of category theory began in 1958 when Robert Rosen proposed a category theoretic model of metabolic networks called the "M-R system" [8] [9] [10]. However, the major concern there is the mathematical modeling of biological systems, and category theory has not been applied to the analysis of the shape of bio-molecules. (See also [11] for application of group theory to molecular systems biology.)
As for analysis of the shape of bio-molecules, the alpha shape theory provides an accurate and robust method for computing topological, combinatorial, and metric properties of the union of finitely many spherical d-balls in the d-dimensional Euclidean space R^d [12] [13]. The principle of graph theory is also adopted in the description of protein structure [14] [15]. However, the fusion and fission of bio-molecules are not considered there.
Finally, the mechanism of allosteric regulation has been a focus of much effort for over 50 years and many models have been successfully formulated [2]. Broadly, studies of allostery fall into three mainstream categories. The first is based on the principle of thermodynamic equilibrium. The second is based on the conceptual thermodynamic view such as conformational selection and induced fit. The third is based on the inferred structural coupling between the two sites (i.e., the active and binding sites). In contrast, our interpretation provides a purely geometrical explanation of allostery.

Basic Notions of Loops of Triangles
We begin by defining the two basic notions, "loops of triangles" and "fusion and fission" of loops. We then show how to quickly compute a loop with a given shape. The same method can be used for the computation of fusion and fission of given loops.

Loops of Triangles
To define a loop of triangles, we divide the two-dimensional Euclidean plane R^2 into equilateral triangles as illustrated in Figure 2. In the figures, we indicate the value of U(t) (t ∈ TT) by thickening the edge shared by t and U(t). Trajectories of U are then obtained by connecting triangles to the two adjacent triangles that are not assigned by U (Figure 2). We denote flows by upper case letters, and loops by lower case letters.

Fusion and Fission of Loops
"Fusion and fission" of loops are defined by identifying sets of loops with the same "shape". ( 1, 2 i = ). Then, Definition 5 (Fusion "+" of loops). Let Then, we write , .

Computation of the Loops with a Given Shape
By stacking unit cubes diagonally in the three-dimensional Euclidean space R^3, we can quickly compute a loop that fills a given two-dimensional "shape" on TT [1].
Example 2. The loop b 1 of Figure 1(a) is obtained by stacking unit cubes as shown in Figure 3. By stacking unit cubes so that the thick line segments on them form the contour of the shape, a loop with the contour is automatically obtained (Figure 3(b)). Each triangle is connected to two adjacent triangles via a shared edge (thin line segment) (Figure 3(c)). We can also compute a fusion and fission of loops instantly by placing/removing unit cubes on/from the stacked unit cubes.
Example 3. Figure 4(a) shows the fusion of two loops. Two loops of flow U 1 are fused into the loop of flow U 3 by placing a unit cube on U 1 . Two loops of flow U 2 are also fused into the same loop of U 3 by removing a unit cube from U 2 . U 3 has no loop decomposition other than U 1 and U 2 .
As mentioned in the introduction, one of the features of our model is a simple mechanism of the allosteric regulation.
Definition 6 (Allosteric loop). A loop e is called allosteric if there exist loops s and c such that e + s + c is defined, but e + s is not defined.
Example 4. In Figure 5, e is allosteric. (That is, b 1 of Figure 1 is allosteric.) Loops s and e don't fuse into a loop due to an "overhang" of stacked unit cubes.

Affine Flows L (Aka Rational Flows)
Now let's define the set L of flows mentioned in the introduction. The other set L A is defined in the next section. We define not only the objects in a set, but also a binary relation between objects within the set. This is why we call our approach a "category theoretical" point of view. Using the binary relation, we define the set L^∧ of {0,1}-valued functions on L. For an introduction to category theory, a number of standard textbooks are available, including [16] and [17].
Example 5 (Non-affine loop). Figure 4(b) gives an example of non-affine loops. We need two sets of stacked unit cubes to cover the whole shape: one for the upper half and the other for the lower half.

The Set L of Affine Flows
Definition 8 (Unit loop). Affine loops of length 6 are called unit loops. They always form a hexagonal shape (Figure 2(b)). Unit loops play a role similar to the role "1 (the unity)" plays in the set Z of integers.
Notation 1. In the following, we usually denote unit loops by a i and x i . Loops in general are denoted by b , b i , c , c i , y i , z i and others.
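The hexagonal shape of a unit loop can be checked in a small sketch. The coordinates below are an assumption made only for illustration and are not the paper's formal definition: a triangle is labelled (i, j, kind) on a lattice with basis vectors (1, 0) and (1/2, √3/2), and a "loop" is simplified to a closed chain of distinct, edge-adjacent triangles. The function names are hypothetical.

```python
# Toy model (an assumption, not the paper's formal definition of a flow):
# "up" triangle (i, j) has vertices (i, j), (i+1, j), (i, j+1);
# "down" triangle (i, j) has vertices (i+1, j), (i+1, j+1), (i, j+1).

def neighbors(tri):
    """Return the three triangles that share an edge with `tri`."""
    i, j, kind = tri
    if kind == "up":
        return {(i, j, "down"), (i - 1, j, "down"), (i, j - 1, "down")}
    return {(i, j, "up"), (i + 1, j, "up"), (i, j + 1, "up")}

def hexagon(i, j):
    """Six triangles around lattice vertex (i, j), listed counter-clockwise."""
    return [
        (i, j, "up"), (i - 1, j, "down"), (i - 1, j, "up"),
        (i - 1, j - 1, "down"), (i, j - 1, "up"), (i, j - 1, "down"),
    ]

def is_closed_chain(tris):
    """True if the triangles are distinct and each one is edge-adjacent
    to the next, with the last adjacent to the first."""
    n = len(tris)
    return len(set(tris)) == n and all(
        tris[(k + 1) % n] in neighbors(tris[k]) for k in range(n)
    )

unit = hexagon(0, 0)
print(len(unit), is_closed_chain(unit))  # prints: 6 True
```

In this simplified picture, the six triangles around any lattice vertex form a closed chain of length 6, which is the shape attributed to unit loops above.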

The Category Cat(L) of Affine Flows
Definition 9 (Binary relation "<" on L). Let U, V ∈ L. Then, we write U < V if each loop of V is obtained by fusing a set of loops of U. In other words, U is a subdivision of V with respect to the fusion "+" operation.
Note that U < U.
That is, "<" is not a partial order.
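A minimal sketch of the relation "<" may help fix ideas. It relies on a strong simplification that is not part of the paper: a loop is modelled as a frozenset of triangle labels, a flow as a set of pairwise disjoint loops (open trajectories are ignored), and fusion "+" as plain set union, so shape and folding pattern play no role. The reading of "<" follows the paper's later remark that "A < U" means each loop of U is obtained by fusing a set of loops of A. The name `precedes` is hypothetical.

```python
def precedes(U, V):
    """Toy check of U < V: every loop of V is a union of loops of U.

    Assumes the loops of U are pairwise disjoint sets of triangle labels.
    """
    for v in V:
        pieces = [u for u in U if u <= v]  # loops of U lying inside v
        if not pieces or frozenset().union(*pieces) != v:
            return False
    return True

# Three small loops and the flow obtained by fusing the first two.
a1, a2, a3 = frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})
A = {a1, a2, a3}
U = {a1 | a2, a3}
Z = set()  # a flow with no loops

print(precedes(A, U))  # True: A subdivides U
print(precedes(U, A))  # False: a1 is not a union of loops of U
print(precedes(U, U))  # True: the relation is reflexive
print(precedes(U, Z))  # True: holds vacuously when V has no loops
```

In this toy model every flow precedes the loop-free flow Z, which is at least consistent with Z being described as a terminal flow.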
Definition 12 (Equivalence relation " ≡ " on L). Let U, V ∈ L. Then, we define an equivalence relation " ≡ " on L by
Remark 3. Definition 11 and Definition 12 are consistent.
That is, U and V are divided into the same set of shapes (i.e., b i = c j for some i and j ), but the shapes may have different "folding patterns" of trajectories uniquely up to equivalence ≡ . In this paper, we consider a fixed terminal flow Z.

Basic Operations of Cat(L)
The least upper bound and the greatest lower bound for S ⊂ L with respect to "<" are defined as follows [16] [17].

UB(S) of upper bounds of S is defined by
if and only if . .
In theoretical computer science, an "exponential object" U^B corresponds to a computer program that takes input B and produces output U. In the following, we will specify affine loops (i.e., proteins) as an exponential.

Hom Function of Cat(L)
Definition 20 (The category of
Definition 22 (Hom functions on L). Let V ∈ L. The contravariant Hom function
ι is injective (up to equivalence ≡ ). In particular, we obtain a "spectral decomposition"

Loop Representation of Affine Flows
Now let's describe the loops of a flow of L using the binary relation "<" instead be a set of one-loop flows. Then, implies .
Example 7. Shown in Figure 1 , , , m V V V  of maximal-cover flows of U such that: Note that the set of co-generators consists of a finite number of flows and is uniquely determined by U.
In the case of Figure 4(a), we have , .

The Fusion-Structure Presheaf F on the Sup Topology K
In quantum mechanics, each physical system is associated with a collection of data obtained when measuring some physical property of the system, i.e., the spectrum of an observable quantity associated with the physical property. In our model of proteins, each protein, i.e., "loop", is associated with the "spectrum" of the measurable quantity, i.e., "shape". Remark 10. Discrete differential geometry of triangles [1] provides a mechanism by which the observed result comes about.
Definition 29 (Sup topology K). The sup topology K on L is defined by
Elements of K(U) are called states of the "biological entity" U.
Definition 30 (Fusion-structure presheaf F). Fusion-structure presheaf F on L is defined by
Recall that F(W) is the spectral decomposition of W (See Lemma 6). That is, F(W) is the spectral decomposition of the state W of the "biological entity" U.
Remark 11. As for the sup topology and presheaf, a number of standard textbooks are available, including [17] and [19].
Example 11. In the case of Figure 4(a), we have

Integral Flows L A
Here we define the other set L A of flows mentioned in the introduction. Using the binary relation defined on L A , we define the set ( )

The Category Cat(L A ) of Integral Flows
Definition 32 (The set L A of integral flows). Let A ∈ L be the flow consisting of only unit loops (Figure 6(a)). A flow U ∈ L is called integral if A < U. The set of all integral flows is denoted by L A , i.e., L A = { U ∈ L | A < U }.
A is called the base flow of L A . Loops of integral flows are called integral loops.
Remark 12. "A < U" means that each loop of U is obtained by fusing a set of loops of A. That is, A spawns all flows of L A . In other words, A is a kind of "quantum vacuum". On the other hand, the terminal object Z does not generate anything because Z contains no loop. That is, Z is a kind of "classical vacuum". Binary relations "<" and " ≡ " as well as mappings " ∨ " and " ∧ " are already
Proof. It follows immediately from the definition of "<". □
Remark 13. Unit loops have the same shape, but behave differently with respect to "+". That is, they form different shapes depending on their relative placement. Because of the property, the algebra of integral flows is also called "hetero numbers", i.e., numbers with "heterogeneous" unity elements.
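The distinction between integral and non-integral flows can be illustrated with a toy sketch. As a loudly stated assumption, loops are modelled as frozensets of abstract triangle labels and fusion as set union, which ignores geometry, folding patterns, and open trajectories; the sizes 8 and 4 below are arbitrary labels, not a claim about realizable shapes. The names `precedes` and `is_integral` are hypothetical.

```python
def precedes(U, V):
    """Toy check of U < V: every loop of V is a union of loops of U."""
    for v in V:
        pieces = [u for u in U if u <= v]
        if not pieces or frozenset().union(*pieces) != v:
            return False
    return True

def is_integral(U, A):
    """Toy reading of 'U is integral': A < U for the base flow A."""
    return precedes(A, U)

# Base flow A: two unit loops, each made of six triangle labels.
a1 = frozenset(range(0, 6))
a2 = frozenset(range(6, 12))
A = {a1, a2}

# C: the single loop obtained by fusing both unit loops.
C = {a1 | a2}

# B: a fission of that loop which does not follow the unit-loop boundaries.
B = {frozenset(range(0, 8)), frozenset(range(8, 12))}

print(is_integral(C, A))  # True: C is built from unit loops
print(is_integral(B, A))  # False: the loops of B are not unions of unit loops
print(precedes(B, C))     # True: B is a factor of the integral flow C
```

Here B is not integral but is obtained by splitting the integral flow C, loosely mirroring how the introduction describes rational loops as factors of integral loops, with C playing the role of an integral flow that contains B.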
The following proposition gives a characterization of allosteric loops (See Definition 6).
Proposition 2. Integral loops are not allosteric.

Proof. (Sketch of proof) Let
Then, A ι is injective (up to equivalence " ≡ "). That is, we can regard L A as a subset of ( ) : , : .
ι is injective (up to equivalence " ≡ "). In particular, we obtain a spectral decomposition Example 12. In the case of Figure

Closure of Affine Loops
First, we define an operation that replaces the "background" open trajectories of an integral flow with a set of unit loops. Definition 38 (Localization/unlocalization). Let  Moreover, there exist one-loop flows 1 2 , . By definition of "<", there exist loops 1 2 , , ,

Defining Equations of Affine Loops
Now let's consider the problems mentioned in the introduction step by step. For intuitive understanding, we use loops instead of flows in this section.

In the Case of Self-Computable Loops
We begin by considering the second half of Problem 2.

In the Case of Integral Loops
Now let's consider the first half of Problem 2. Subsets of a self-computable loop are obtained as a solution to the following problem.
z c z c z c x a x a x a gives a solution to Problem 2 of the introduction.

In the Case of Non-Integral Loops
Finally, let's consider Problem 1 of the introduction.  . m C C C C C C
By analogy with quantum mechanics, C 1 is called the "output spectral flow", and C 2 , C 3 , ..., C m are called the "input spectral flows".
Example 19. Let U 1 ∈ L. Then, , .

Discussion
The author has shown that it is possible to describe protein-like shapes using algebraic equations if one takes a category theoretical approach. Along the way, also provided is the basis for describing the relation between various fissions of a shape, i.e., L-spectrum of a flow U ∈ L, albeit briefly.
In the previous attempts [6] [7], the author tried to describe the shape of a loop by considering the relation between all the fissions of the loop, i.e., the L-spectrum of the corresponding flow. Although transitions between states correspond to the addition or removal of unit cubes in the "stacked-unit-cube description", it is not yet clear how to describe the relation between various fissions of the given shape. In this study, the author obtains a set of defining equations of a given shape by assuming the existence of integral flows. That is, a fusion-based approach rather than the previous fission-based approach. By referring to the construction of the rational numbers, he describes the shape of loops algebraically in a way that is compatible with the description style of quantum mechanics.
One of the features of the study is that it deals with both protein folding and protein shape simultaneously. This provides us with a simple model of allostery. Another feature is that it expresses both geometrical and algebraic features of protein interactions simultaneously. This allows us to describe the shape of proteins algebraically. The author is not aware of any similar multifaceted studies by other researchers on protein shape. However, they are results for a simple model and he considers this to be a starting point rather than an end point of the study.
In relation to experimental studies, the author hopes that this study will provide options on the underlying general principles that govern the engineering of self-assembling molecules such as proteins. For example, our model indicates that allostery is a remnant of a giant protein molecule, i.e., interaction between a surface region and the core of the molecule.
It also proposes an approach to the question "How to obtain a well-defined shape with desired properties by folding a chain of subunits" [20]. That is, first construct the shape with unity elements, then divide it into a set of folded chains.
(Of course, in order to do that, we need to find something that can be used as a unity element.) However, so far there are no experimental results to support these proposals.

Conclusions
Protein design starts with a specification. Since the function of proteins is largely determined by their shape, the specification should include a description of the shape of the target protein. This paper proposes a novel category theoretic approach to describe protein's shape, i.e., a description of their shape by a set of algebraic equations. However, this paper considers the approach in a very simplified model. That is, in the two-dimensional case of the mathematical toy model proposed in [1]. Moreover, some notions were only defined and not specifically considered.
The author hopes that this paper will serve as one of the starting points for a variety of related research. Shown below are some of the future research directions the author considers:
1) Base change.
The base flow A is obtained by partitioning R^2 into hexagons. On the other hand, flows of triangles on a closed surface are considered in [20]. By changing the base flow A with another base flow (i.e., another flow consisting of only unit loops), we can consider flows on various surfaces. For example, the surface flow on a rhombic dodecahedron corresponds to a base flow consisting of four unit loops.
In the case of flows of tetrahedra (i.e., 3-simplices), unity elements are rhombic dodecahedral loops of length 24. Unlike the case of triangles (i.e., 2-simplices), they split into four loops of length 6 [1]. In general, unity elements of flows of n-simplices have fissions if n > 2. It is a challenge to describe fissions of unity elements in higher dimensional cases.

3) Characterization of allosteric proteins.
Allostery is at play in all processes in the living cell, and increasingly in drug discovery [2]. Our model indicates that allosteric proteins are produced as a result of the fission of larger protein molecules. That is, in our model, the design of allosteric proteins corresponds to the specification of fission of larger proteins.
However, the author does not know how to specify the fission of proteins (i.e., the fission of loops of n-simplices).
4) Symmetry of the shape of proteins.
Here the "symmetry" means the relation between the flows of L-spectrum defined in Subsection 4.6. The question is to what extent the symmetry controls the shape. In particular, for a given L-spectrum, find a flow U that has the same L-spectrum.
5) "Biologic" logic.
The relations between proteins (i.e., loops) are defined using the fusion and fission of proteins, where allostery (i.e., global interaction) occurs naturally [3].
What kind of logic can be made by building theories based on the fusion and fission of proteins?
Our model may be too simple to describe the ecology of actual proteins. But it is said that the simpler the model, the broader the range of applications. The author believes that this paper not only provides a new perspective on protein engineering, but also promotes further collaboration between biology and other disciplines.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.