On the Defining Equations of Protein’s Shape from a Category Theoretical Point of View
Naoto Morikawa
Genocript, Zama, Japan.
DOI: 10.4236/am.2020.119058

Abstract

This paper proposes a novel category theoretic approach to describing the shape of proteins, i.e., a description of their shape by a set of algebraic equations. The focus of the approach is on the relations between proteins, rather than on the proteins themselves. Knowledge of category theory is not required, as mathematical notions are defined concretely. In this paper, proteins are represented as closed trajectories (i.e., loops) of flows of triangles. The relations between proteins are defined using the fusion and fission of loops of triangles, where allostery occurs naturally. The shape of a protein is then described with quantities that are measurable with unity elements called “unit loops”. That is, the shape of a protein is described with the loops that are obtained by the fusion of unit loops. Measurable loops are called “integral”. In the approach, the unit loops play a role similar to the role “1” plays in the set Z of integers. In particular, the author considers two categories of loops, the “integral” loops and the “rational” loops. Rational loops are then defined using algebraic equations with “integral loop” coefficients. Because of this approach, the theory has some similarities to quantum mechanics, where only observable quantities are admitted in physical theory. The author believes that this paper not only provides a new perspective on protein engineering, but also promotes further collaboration between biology and other disciplines.

Morikawa, N. (2020) On the Defining Equations of Protein’s Shape from a Category Theoretical Point of View. Applied Mathematics, 11, 890-916. doi: 10.4236/am.2020.119058.

1. Introduction

The interaction of proteins is mainly determined by their shape because they form protein-protein complexes to perform their tasks. This paper considers the defining equations of the shape of proteins, using the mathematical toy model proposed in [1]. Our focus is on “relations between proteins” rather than “proteins”.

In the mathematical toy model, proteins are represented as closed trajectories (i.e., loops) of flows of triangles. The formation of protein-protein complexes is then described as a “fusion” of loops (denoted by “+”). One of the features of the model is a simple mechanism of the enzyme/substrate/activator-type regulation, i.e., the type of regulation called “allosteric regulation” in which a substrate cannot bind to an enzyme without an activator [2] [3].

The purpose of this paper is to describe the shape of a loop with “measurable” quantities, which are the shapes obtained by the “fusion” of unity elements called “unit loops”. Unit loops are of length 6 and have a hexagonal shape. The unit loops play a role similar to the role “1” plays in the set of integers. We call the loops obtained by the fusion of unit loops “integral”. Loops obtained by factoring an integral loop are called “rational”.

Because of the approach, our theory has some similarities to quantum mechanics, where only observable quantities are admitted in physical theory. For example, we consider two sets of flows in the following. One is a set L of the “rational” flows, and the other is a set LA of the “integral” flows. (Loops of flows of L and LA are called “rational” and “integral”, respectively.) In analogy to quantum mechanics, L corresponds to the quantum world of objects, and LA corresponds to the world of observers on which the measurement is performed.

The motivation of the study is the search for a new (discrete) geometry with high affinity to biology. Despite the significant progress in mathematics in recent decades, it is still too early to say that we can describe the shape of bio-molecules, such as proteins, mathematically.

When designing a new protein, there are two types of approaches. One is the design of proteins with a desired backbone structure. The other is the design of proteins with desired functions (i.e., desired active sites or desired interacting surfaces). In both cases, target descriptions are usually given as a two-dimensional schematic diagram [4] [5]. In the diagram, three-dimensional backbone structures are represented as a sequence of local structural patterns (such as alpha-helices and beta-strands) with sets of pairwise spatial relationships between them.

In the previous studies, the author was thinking of specifying a given shape by the way it splits, i.e., a fission-based approach [6] [7]. However, it is not yet clear how to describe the relation between various fissions of the given shape. (In the conclusion section, this issue is mentioned as one of the directions for future research. That is, for a given L-spectrum, find a flow U that has the same L-spectrum.) To obtain a novel mathematical representation of protein shape, the author now adopts a fusion-based approach rather than the fission-based approach using the notion of “integral loops”.

In this paper, the author shows that it is possible to describe protein-like shapes using algebraic equations if one takes a category theoretical approach. Along the way, the paper also provides, albeit briefly, the basis for describing the relation between various fissions of a shape, i.e., the L-spectrum of a flow. However, these are results for a simple model, and the author considers them a starting point rather than an end point of the study. The ultimate goal is not only to understand the functional mechanisms of proteins, but also to design protein-like molecules with new functions. The author hopes that the mathematical description proposed will provide some insight into the realization of this goal.

Basic ideas of the approach adopted in this paper are:

· Loops are identified with the flow consisting of the loop and any open trajectories.

· A binary relation “<” between flows is defined using the fusion “+” of loops.

· The fusion of flows of LA is regarded as an “addition”.

· The fission of a flow of LA into flows of L is regarded as a “factorization”.

· Flows of L are specified as a “direct factor” of a flow of LA.

· The “shadow” of a flow of L is a projection on LA, and is measurable with unit loops (i.e., hexagons).

Figure 1(a) shows the process for obtaining the defining equations of the shape of a loop b1.

Figure 1. Approach for obtaining the defining equations of a loop b1. (a) The shadow C1 of a flow B1. (left) A flow consisting of a rational loop b1 and open trajectories. (middle) The flow consisting of an integral loop c1 and open trajectories. (right) A fission of c1 into rational loops and. The background unit loops (i.e., hexagons) are shown for clarity. (b) Fission of c1 into, where b1 is colored dark grey. On the right are the shadows Ci of Bi, where Bi consists of a loop bi (colored dark grey) and open trajectories (). Ci consists of an integral loop ci and open trajectories. The background unit loops are shown for clarity. (c) Fission of c1 into, where b7 is colored dark grey. On the right are the shadows Ci of Bi, where Bi consists of a loop bi (colored dark grey) and open trajectories (). Ci consists of an integral loop ci and open trajectories.

1) First, b1 is identified with the flow B1 consisting of b1 and open trajectories.

2) Next, a “minimal flow that contains B1”, denoted by C1, is computed. C1 consists of an integral loop c1 and open trajectories. C1 is called the “shadow” of B1.

3) C1 is then factored into b1 and other loops.

4) The shadows Ci of the flows Bi of the other loops are also computed, where Bi consists of a loop bi and open trajectories, and Ci consists of an integral loop ci and open trajectories.

5) B1 is specified by a set of equations using the six shadows. (See Problem 1 below for the equation.)

Note that we can represent integral loops as a fusion of unit loops (i.e., hexagons).

To check uniqueness, let’s consider another loop b7 of Figure 1(c). Let B7 be the flow consisting of b7 and open trajectories. Then, B7 has the same shadow C1 on LA, where b7 fuses with loops to form c1. In this case, is different from, i.e.,.

In summary, B1 is a solution to the following problem.

Problem 1. For a given set of flows, find a flow such that there exists a set of flows that satisfies

where is the shadow of ().

We denote the set of all the solutions to the above type of problem by “”. By analogy with quantum mechanics, C1 is called the “output spectral flow”, and are called the “input spectral flows”. In the case of Figure 1, we obtain

In general, consists of more than one flow. For example

Also shown in this paper is that the set of integral loops is the unique (up to rotational and mirror symmetries) solution to the following problem. For intuitive understanding, we use loops instead of flows.

Problem 2. Find integral loops such that

where are unit loops and satisfy

for some integral loops.

That is, combining Problem 1 and Problem 2, we obtain the defining equations of the shape of the given loop b1.

In what follows, the author tries to present the approach outlined above in a systematic manner from scratch using simple examples. Knowledge of category theory is not required, as we define mathematical notions concretely. The rest of the paper is organized as follows. In Section 2, we give a brief review of previous results. After defining basic notions of loops of triangles in Section 3, we give the definitions of the set L of the rational flows in Section 4. The set LA of the integral flows is defined in Section 5. Then, we consider the defining equations of flows of L in a step-by-step manner in Section 6. After a discussion in Section 7, the author concludes with some suggestions for future research directions in Section 8.

The author believes that this paper will open up a new perspective on protein engineering and bring about further advances in collaboration between biology and other disciplines.

Finally, Genocript (http://www.genocript.com) is the one-man bio-venture started by the author in 2000 which is developing software tools for protein structure analysis. In particular, the author is not affiliated with any research institution.

2. Previous Works

Category theory has been offered as a common language for over 50 years across disciplines other than pure mathematics, such as theoretical computer science, mathematical physics, network theory, and others. In biology, the application of category theory began in 1958 when Robert Rosen proposed a category theoretic model of metabolic networks called the “M-R system” [8] [9] [10]. However, the major concern there is the mathematical modeling of biological systems, and category theory has not been applied to the analysis of the shape of bio-molecules. (See also [11] for an application of group theory to molecular systems biology.)

As for analysis of the shape of bio-molecules, the alpha shape theory provides an accurate and robust method for computing topological, combinatorial, and metric properties of the union of finitely many spherical d-balls in the d-dimensional Euclidean space [12] [13]. The principle of graph theory is also adopted in the description of protein structure [14] [15]. However, the fusion and fission of bio-molecules are not considered there.

Finally, as for the mechanism of allosteric regulation, it has been a focus of much effort over 50 years and many models have been successfully formulated [2]. Broadly, studies of allostery fall into three mainstream categories. The first is based on the principle of thermodynamic equilibrium. The second is based on the conceptual thermodynamic view such as conformational selection and induced fit. The third is based on the inferred structural coupling between the two sites (i.e., the active and binding sites). On the other hand, our interpretation provides a purely geometrical explanation of allostery.

3. Basic Notions of Loops of Triangles

We begin by defining the two basic notions, “loops of triangles” and “fusion and fission” of loops. We then show how to quickly compute a loop with a given shape. The same method can be used for the computation of fusion and fission of given loops.

3.1. Loops of Triangles

To define a loop of triangles, we divide the two-dimensional Euclidean plane $\mathbb{R}^2$ into equilateral triangles as illustrated in Figure 2(a).

Definition 1 (Flow of triangles). Let TT be the triangle tiling obtained by dividing $\mathbb{R}^2$ into equilateral triangles (Figure 2(a)). A flow U of triangles is a “reflective” assignment of an adjacent triangle to triangles of TT, i.e.,

$\forall t \in TT$, $U(t)$ is adjacent to $t$ and $U(U(t)) = t$.

In the figures, we indicate the value of $U(t)$ ($t \in TT$) by thickening the edge shared by t and $U(t)$. Trajectories of U are then obtained by connecting triangles to the two adjacent triangles that are not assigned by U (Figure 2(b)). That is, $U(t)$ is the “normal vector” to the trajectory at t. The set of all flows on TT is denoted by.

For example, a loop of length 6 is obtained by connecting triangles one by one as shown in Figure 2(b). The loop of Figure 1(a) is also obtained in the same way (Figure 2(c)).
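The construction of the length-6 loop can be checked with a short script. The following is a minimal Python sketch, not the author's implementation. It reads “reflective” as U(U(t)) = t, and the integer coordinates (i, j, k) for triangles as well as the helper names `neighbors`, `is_flow`, and `trace` are assumptions introduced here for illustration. The flow is stored only on a finite patch of the tiling.

```python
# Assumed lattice coordinates (i, j, k) for the triangles of the tiling:
# k = 0 is the upward triangle with vertices (i,j), (i+1,j), (i,j+1);
# k = 1 is the downward triangle with vertices (i+1,j), (i,j+1), (i+1,j+1).

def neighbors(t):
    """Return the three triangles that share an edge with t."""
    i, j, k = t
    if k == 0:
        return [(i, j, 1), (i - 1, j, 1), (i, j - 1, 1)]
    return [(i, j, 0), (i + 1, j, 0), (i, j + 1, 0)]


def is_flow(U):
    """Check adjacency and U(U(t)) = t on the patch where U is defined."""
    return all(U[t] in neighbors(t) and U.get(U[t]) == t for t in U)


def trace(U, start):
    """Follow the trajectory through start; return it if it closes up."""
    path, prev, cur = [start], None, start
    while True:
        # A trajectory continues through the two neighbors not assigned by U.
        options = [n for n in neighbors(cur) if n != U[cur] and n != prev]
        nxt = options[0]
        if nxt == start:
            return path      # closed trajectory, i.e., a loop
        if nxt not in U:
            return None      # the trajectory leaves the finite patch
        path.append(nxt)
        prev, cur = cur, nxt


# The six triangles around the lattice vertex (0, 0).
hexagon = [(0, 0, 0), (-1, 0, 1), (-1, 0, 0),
           (-1, -1, 1), (0, -1, 0), (0, -1, 1)]

# Assign to each of them its unique neighbor outside the hexagon.
U = {}
for t in hexagon:
    (outer,) = [n for n in neighbors(t) if n not in hexagon]
    U[t], U[outer] = outer, t

loop = trace(U, (0, 0, 0))
print(is_flow(U), len(loop))  # prints: True 6
```

Each hexagon triangle is assigned to the neighbor across its outer edge, so the trajectory runs through the two remaining (inner) edges and closes after six steps.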

Definition 2 (Loop of triangles). A (triangular) loop is a closed trajectory of triangles of finite length. Let U be a flow on TT. The set of all the loops occurring in U is denoted by lp(U). If lp(U) = {b}, then we identify the flow U with the loop b. We denote flows by upper case letters, and loops by lower case letters.

Figure 2. Loops of triangles. (a) Triangle tiling of $\mathbb{R}^2$. (b) Formation of a loop of length 6. Triangles are connected to two of the three adjacent triangles via the shared edge. Thick line segments indicate the “normal vector” edge. (c) The loop b1 of length 30 given in Figure 1.

3.2. Fusion and Fission of Loops

“Fusion and fission” of loops are defined by identifying sets of loops with the same “shape”.

Definition 3 (The shape of a loop b). Let and. The shape of b is the union of the triangles swept by b.

Remark 1. Some areas are not obtainable as the shape of a loop.

Definition 4 (Equivalence relation “” on loops). Let and (). Then,

if and only if.

Definition 5 (Fusion “+” of loops). Let and . The fusion of () is defined if and only if there exists

and such that.

Then, we write

We often denote by (if there is no risk of confusion). Note that (1) is not always defined, and (2) is unique up to equivalence “” even if is defined. () are said to interact with each other if is defined. The inverse operation of fusion is called fission. A loop may have more than one fission.

Example 1. Two fissions of c1 are shown in Figure 1(b) and Figure 1(c), i.e.,

3.3. Computation of the Loops with a Given Shape

By stacking unit cubes diagonally in the three-dimensional Euclidean space, we can quickly compute a loop that fills a given two-dimensional “shape” on TT [1].

Example 2. The loop b1 of Figure 1(a) is obtained by stacking unit cubes as shown in Figure 3. By stacking unit cubes so that the thick line segments on them form the contour of the shape, a loop with that contour is automatically obtained (Figure 3(b)). Each triangle is connected to two adjacent triangles via a shared edge (thin line segment) (Figure 3(c)).

We can also compute a fusion and fission of loops instantly by placing/removing unit cubes on/from the stacked unit cubes.

Example 3. Figure 4(a) shows the fusion of two loops. Two loops of flow U1 are fused into the loop of flow U3 by placing a unit cube on U1. Two loops of flow U2 are also fused into the same loop of U3 by removing a unit cube from U2. U3 has no loop decomposition other than U1 and U2.

As mentioned in the introduction, one of the features of our model is a simple mechanism of the allosteric regulation.

Figure 3. Stacking of unit cubes in $\mathbb{R}^3$. (a) A unit cube to be stacked. Each of the three top faces is divided into two triangles by a thick line segment. The thick edge of the triangles corresponds to the “normal vector” edge of the flow. (b) Stacked unit cubes. (c) The top view of the stacked cubes of (b). Triangles on the surfaces of the stacked cubes are connected via thin edges, forming the loop b1 of Figure 1.

Figure 4. Computation of loops with a given shape. (a) A shape (left) and the corresponding flows (right). On the right are the top views of stacked unit cubes. Flows U1 and U2 consist of two loops and open trajectories. Flow U3 consists of a loop and open trajectories. (The contained loop is considered to be part of the surrounding loop.) (b) Non-affine loop. Note that there is an “overhang” of stacked unit cubes in the lower right.

Definition 6 (Allosteric loop). Let for some. e is called allosteric if there exist and, such that

is defined, but is not defined.

Example 4. In Figure 5, e is allosteric. (That is, b1 of Figure 1 is allosteric.) Loops s and e don’t fuse into a loop due to an “overhang” of stacked unit cubes.

4. Affine Flows L (Aka Rational Flows)

Now let’s define the set L of flows mentioned in the introduction. The other set LA is defined in the next section. We define not only the objects in a set, but also a binary relation between objects within the set. This is why we call our approach a “category theoretical” point of view. Using the binary relation, we define the set of {0,1}-valued functions on L. As for an introduction to category theory, a number of standard textbooks are available, including [16] and [17].

4.1. The Set L of Affine Flows

Definition 7 (The set L of affine flows). An affine flow (or rational flow) is a flow that can be computed with a single set of stacked unit cubes. The set of all affine flows is denoted by L. In particular,. The loops of affine flows are called affine loops (or rational loops). Since we are concerned with the interaction between loops, we consider the contained loop to be part of the surrounding loop.

Figure 5. Allosteric regulation of loops. (a) Three loops. e, s, and c correspond to “Enzyme”, “Substrate”, and “aCtivator”, respectively. Shown below is the top view of the corresponding stacked unit cubes. (b) Fusion of loop e and loop c. (c) Loops e and s. They don’t fuse into a loop without c because of the “overhang” enclosed by the white dashed line. (d) Fusion of e, s, and c.

Remark 2. In order to show the similarity to the relation between the integers and the rational numbers, flows of L are called rational in the introduction.

Example 5 (Non-affine loop). Figure 4(b) gives an example of non-affine loops. We need two sets of stacked unit cubes to cover the whole shape: one for the upper half and the other for the lower half.

Definition 8 (Unit loop). Affine loops of length 6 are called unit loops. They always form a hexagonal shape (Figure 2(b)). Unit loops play a role similar to the role “1 (the unity)” plays in the set of integers.

Notation 1. In the following, we usually denote unit loops by and. Loops in general are denoted by, , , , , and others.

4.2. The Category Cat(L) of Affine Flows

Definition 9 (Binary relation “<” on L). Let. Then, if and only if there exist loops for any such that

In other words, U is a subdivision of V with respect to the fusion “+” operation. Note that.

Definition 10 (Cat(L)). The category of affine flows is a pair of

1) the set L of affine flows, and

2) the binary relation < on L defined above.

Definition 11 (Fusion “+” of flows). Let and. Suppose that and. Then, we write

if.

Lemma 1.

1) Let. Then, if.

2) Let. Let and. Suppose that. Then

and if.

That is, “<” is not a partial order.

Definition 12 (Equivalence relation “” on L). Let. Then, we define an equivalence relation “” on L by

if and only if and.

Remark 3. Definition 11 and Definition 12 are consistent.

Remark 4. Let. Let and . Then, implies

and

That is, U and V are divided into the same set of shapes (i.e., for some and), but the shapes may have different “folding patterns” of trajectories (i.e.,).

Definition 13 (A terminal flow Z). Let such that. Then, for any. Z is called a terminal flow of. Z is determined uniquely up to equivalence. In this paper, we consider a fixed terminal flow Z.

4.3. Basic Operations of Cat(L)

The least upper bound and the greatest lower bound for with respect to “<” are defined as follows [16] [17].

Definition 14 (LB(S) and UB(S)). Let be a finite subset. The set of lower bounds of S is defined by

The set of upper bounds of S is defined by

Definition 15 (GLB(S)). Let be a finite subset. A greatest lower bound of S is a maximal element of. We denote the set of all the greatest lower bounds of S by, i.e.,

Note that may contain more than one flow.
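Why a set of greatest lower bounds may contain more than one element can be seen on a small example. The following Python sketch uses a hypothetical four-element ordered set, not flows of L; the names `lower_bounds`, `maximal`, and `glb` are assumptions introduced here. It follows the wording of Definitions 14 and 15: first collect the lower bounds of S, then keep the maximal ones.

```python
def lower_bounds(elements, leq, S):
    """All x that lie below every element of S."""
    return {x for x in elements if all(leq(x, s) for s in S)}


def maximal(subset, leq):
    """Elements of subset with nothing strictly above them in subset."""
    return {x for x in subset
            if not any(leq(x, y) and not leq(y, x) for y in subset)}


def glb(elements, leq, S):
    """The set of greatest lower bounds: maximal elements of LB(S)."""
    return maximal(lower_bounds(elements, leq, S), leq)


# Toy relation (an assumption): a and b both lie below c and d,
# while a and b are incomparable, as are c and d.
elements = {"a", "b", "c", "d"}
pairs = {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}


def leq(x, y):
    return x == y or (x, y) in pairs


print(sorted(glb(elements, leq, {"c", "d"})))  # prints: ['a', 'b']
```

Because a and b are incomparable, neither dominates the other, so both are maximal lower bounds of {c, d}. This is the situation that motivates fixing one greatest lower bound mapping in advance.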

The author does not know the answer to the following question.

Question 1. Let be a finite subset. Does imply?

Definition 16 (Greatest lower bound mapping “” on L). Let be a finite subset. A greatest lower bound mapping” is an assignment of an element of to S, i.e.,

where is the set of all finite subsets of L. In this paper, we consider a fixed greatest lower bound mapping.

Remark 6. Let be a finite subset. Then, for all does not imply because may contain multiple flows.

Example 6. Shown in Figure 1(b) left is

where such that (). Then,

Definition 17 (Least upper bound mapping “” on L). A least upper bound mapping” is defined on L similarly, i.e.,

where

Note that may contain more than one flow. In this paper, we consider a fixed least upper bound mapping on L.

Notation 2. Let. For simplicity, we often write and instead of and, respectively.

Definition 18 (Consistency and disjointness). Let. U and V are called consistent if

U and V are called disjoint if

Notation 3. By abuse of notation, we often write instead of.

Finally, let’s define a basic operation of the propositional calculus [18] [19].

Definition 19 (Exponential UB). Let. An exponential of U for B is defined by

In particular, if. Note that if.

Remark 7. if and only if

See the next subsection for the definition of.

In theoretical computer science, an “exponential object” corresponds to a computer program that takes input B and produces output U. In the following, we will specify affine loops (i.e., proteins) as an exponential.

4.4. Hom Function of Cat(L)

Definition 20 (The category of {0,1}). Let be the set of two integers. A binary relation “<” is defined on by

Multiplication “∙” is defined on by

Definition 21 (). A -valued function F on L is called contravariant if implies for. We denote the set of all contravariant -valued functions on L by.

Let F and G be -valued functions on L. Then, multiplication of F and G is defined by

We sometimes write FG instead of. The multiplication of is often denoted by (if there is no risk of confusion).

Definition 22 (Hom functions on L). Let. The contravariant Hom function is defined by

Lemma 3. Let. Then, U and V are consistent if and only if.

Lemma 4. Let be the mapping from L into defined by

Then, is injective (up to equivalence). That is, we can regard L as a subset of.
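The injectivity statement of Lemma 4 has a familiar counterpart for any reflexive and transitive relation, which can be checked on a toy example. The Python sketch below is an illustration under stated assumptions, not the author's construction: it assumes that the Hom function of V is the indicator of “U < V”, that “<” on {0,1} is the usual order, and that multiplication of functions is pointwise. The carrier set (a few integers ordered by divisibility of absolute values) and the names `hom`, `multiply`, and `is_contravariant` are hypothetical.

```python
# Toy stand-in for L: integers ordered by "|u| divides |v|".
# 2 and -2 are equivalent but not equal, so "<" is not a partial order.
elements = [1, 2, -2, 3, 6]


def leq(u, v):
    return abs(v) % abs(u) == 0


def equivalent(u, v):
    return leq(u, v) and leq(v, u)


def hom(v):
    """Assumed Hom function of v: the indicator of 'u < v', as a tuple."""
    return tuple(1 if leq(u, v) else 0 for u in elements)


def multiply(F, G):
    """Assumed pointwise multiplication of {0,1}-valued functions."""
    return tuple(f * g for f, g in zip(F, G))


def is_contravariant(F):
    """u < v must imply F(v) <= F(u)."""
    idx = range(len(elements))
    return all(F[j] <= F[i]
               for i in idx for j in idx
               if leq(elements[i], elements[j]))


# Every Hom function is contravariant.
print(all(is_contravariant(hom(v)) for v in elements))

# Injective up to equivalence: equal Hom functions iff equivalent elements.
print(all((hom(u) == hom(v)) == equivalent(u, v)
          for u in elements for v in elements))

# The product of two Hom functions is the Hom function of a lower bound.
print(multiply(hom(2), hom(3)) == hom(1))
```

All three checks print True for this toy relation. The last line reflects the fact that, in this example, the only common lower bound of 2 and 3 is 1.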

Notation 4. Let. By identifying V with, we set

for.

Lemma 5.

1).

2).

3) Suppose that is defined. Then,

Remark 8. Let. Then if and only if.

Definition 23 (). We denote the set of all contravariant -valued functions on by.

Lemma 6 (Spectral decomposition). Let be the mapping from L into defined by

Then, is injective (up to equivalence). In particular, we obtain a “spectral decomposition”

of.

4.5. Loop Representation of Affine Flows

Now let’s describe the loops of a flow of L using the binary relation “<” instead of the lp function.

Definition 24 (One-loop flow). Let. U is called a one-loop flow if

Definition 25 (Generators of a flow). Let. The set of generators of U are a set of one-loop flows such that:

1), and

2) ().

Note that the set of generators consists of a finite number of flows and is uniquely determined by U.

Notation 5 (Direct sum “”). Let. Let be a set of one-loop flows. We write

if and only if

Lemma 7. Let be a set of one-loop flows. Then,

Example 7. Shown in Figure 1(b) left is

where such that ().

Lemma 8 (Direct sum representation of a flow). Let. Let be the generators of U. Let () be a partition of S, i.e.,

Then,

where. In particular,

Lemma 9 (Exponential UB of U for B). Let. Let . Suppose that for some. Then,

Definition 26 (Maximal-cover flow). Let and. V is called a maximal-cover flow of U if

Definition 27 (Co-generators of a flow). Let. The set of co-generators of U are a set of maximal-cover flows of U such that:

1), and

2) if.

Note that the set of co-generators consists of a finite number of flows and is uniquely determined by U.

Definition 28 (Consistent component of co-generators). Let. Let be the set of co-generators of U. Let. The subset defined by

is called the component consistent with. Then, there exist such that

’s are called the consistent components of U.

Remark 9. If is a consistent component of U, then

Notation 6 (Direct product “”). Let. Let be a set of maximal-cover flows of U. We write

if and only if

Lemma 10. Let be a set of one-loop flows. Suppose that for all. Then,

implies.

Lemma 11 (Direct product representation of a flow). Let. Let

be a maximal-cover representation of U. Let

be the consistent components of U. Then,

Example 8. In the case of Figure 1, we have

Example 9. In the case of Figure 4(a), we have

where, , and (). That is,

4.6. The Fusion-Structure Presheaf F on the Sup Topology K

In quantum mechanics, each physical system is associated with a collection of data obtained when measuring some physical property of the system, i.e., the spectrum of an observable quantity associated with the physical property. In our model of proteins, each protein, i.e., “loop”, is associated with the “spectrum” of the measurable quantity, i.e., “shape”.

Remark 10. Discrete differential geometry of triangles [1] provides a mechanism by which the observed result comes about.

Definition 29 (Sup topology K). The sup topology K on L is defined by

Note that

Elements of are called states of the “biological entity” U.

Definition 30 (Fusion-structure presheaf F). Fusion-structure presheaf F on L is defined by

Note that

Recall that is the spectral decomposition of W (see Lemma 6). That is, is the spectral decomposition of the state W of the “biological entity” U.

Remark 11. As for the sup topology and presheaf, a number of standard textbooks are available, including [17] and [19].

Example 10. In the case of Figure 4(a), we have

where () are defined in Example 9.

Definition 31 (L-spectrum). Let. The L-spectrum of U is defined by

Note that captures all the observable properties of the “quantum system” U of L.

Example 11. In the case of Figure 4(a), we have

5. Integral Flows LA

Here we define the other set LA of flows mentioned in the introduction. Using the binary relation defined on LA, we define the set of -valued functions on LA.

5.1. The Category Cat(LA) of Integral Flows

Definition 32 (The set LA of integral flows). Let be the flow consisting of only unit loops (Figure 6(a)). A flow is called integral if. The set of all integral flows is denoted by LA, i.e.,

A is called the base flow of LA. Loops of integral flows are called integral loops.

Remark 12. “” means that each loop of U is obtained by fusing a set of loops of A. That is, A spawns all flows of LA. In other words, A is a kind of “quantum vacuum”. On the other hand, the terminal object Z does not generate anything because Z contains no loop. That is, Z is a kind of “classical vacuum”.

Binary relations “<” and “” as well as mappings “” and “” are already defined on LA because. For example,

Figure 6. Integral flows. (a) The base flow A, consisting of only unit loops. (b) The binary relation U < V on LA.

in LA if and only if in L

where (Figure 6(b)).

Definition 33 (Cat(LA)). The category of integral flows is a pair of

1) the set LA of integral flows, and

2) the binary relation < on LA.

Lemma 12. Let be a finite subset. Then,.

Definition 34 (lpA function). Let. The loop function is defined on LA by

Loops of LA are “measurable with unit loops” (i.e., “computable”) because of the proposition below.

Proposition 1. Let. Let. Then, there exists a unique finite subset of unit loops such that

That is, every integral loop is a fusion of unit loops.

Proof. It follows immediately from the definition of “<”. □

Remark 13. Unit loops have the same shape, but behave differently with respect to “+”. That is, they form different shapes depending on their relative placement. Because of the property, the algebra of integral flows is also called “hetero numbers”, i.e., numbers with “heterogeneous” unity elements.

The following proposition gives a characterization of allosteric loops (See Definition 6).

Proposition 2. Integral loops are not allosteric.

Proof. (Sketch of proof) Let. Let. Then, there is a set of unit loops surrounding b. Suppose that there exists and such that is defined, but is not. As shown in Figure 3(c), allosteric regulation is due to an “overhang” of over, i.e., the difference of the “height” of and. On the other hand,’s are at the same “height” because they are surrounding an integral loop. In particular, (the absence of) does not prevent from interacting with b. □

5.2. Hom Function of Cat(LA)

Definition 35 (). We denote the set of all contravariant -valued functions on LA by. Multiplication “∙” is defined on in the same way as.

Definition 36 (Hom functions on LA). Let. The contravariant Hom function is defined by

Notation 7. Let. By identifying V with, we set

for.

Definition 37 (). We denote the set of all contravariant -valued functions on by.

Lemma 13 (Spectral decomposition).

1) Let be the mapping from LA into defined by

Then, is injective (up to equivalence “”). That is, we can regard LA as a subset of.

2) Let be the mapping from LA into defined by

Then, is injective (up to equivalence “”). In particular, we obtain a spectral decomposition

of.

Lemma 14.

1).

2).

3) Suppose that is defined. Then,

Lemma 15 (Factorization of in). Let. Let such that. Then,

where.

Example 12. In the case of Figure 1, two factorizations of are considered i.e.,

where.

5.3. Closure of Affine Loops

First, we define an operation that replaces the “background” open trajectories of an integral flow with a set of unit loops.

Definition 38 (Localization/unlocalization). Let. A localization of U is the integral flow defined by

An unlocalization of U is the integral flow defined by

where we set if.

Example 13. and.

Using the unlocalization operation, we define a projection of flows of L onto flows of LA, i.e., the “shadows” of affine flows on LA.

Definition 39 (Closure cl(B) of). Let. Let be the set of all the integral flows that are consistent with B, i.e.,

Let be the set of all the minimal elements of, i.e.,

A closure mapping is defined on L by

Note that may contain more than one flow. In this paper, we consider a fixed closure mapping on L. is called the closure (or shadow) of B on LA.

Remark 14. In the introduction, it is implicitly assumed that consists of one loop for any (because of the definition of fusion).

The author does not know the answer to the following questions.

Question 2.

1) Let. Does consist of one flow?

2) Can one define the closure mapping in such a way that consists of one loop for all?

Example 14. Figure 7(d) shows the closure of a flow B1. In this case, and.

Figure 7. Closure of affine loops. (a) A flow consisting of one loop and open trajectories. Shown below is the top view of the corresponding stacked unit cubes. (b) The only flow of. (c) A fission of the loop of. Note that , where’s are integral flows such that. (d) The closure.

Example 15. In the case of Figure 1,

Note that some pairs of flows have the same closure. For example,

and.

Proposition 3. Let. Suppose that consists of one loop. Then, there exists such that

Moreover, there exist one-loop flows such that

where

Proof. (Sketch of proof) By definition of, there exists such that

and.

Suppose that for simplicity. Let. By definition of “<”, there exist loops and such that

Then,

Remark 15. Suppose that consists of multiple loops, i.e.,

Let such that

and.

Then, there exist loops and () such that

where is a partition of. It follows that there exists such that and

where.

Remark 16. Recall that categories and consist of a set of “objects” and a binary relation “<” on them. However, the closure mapping does not preserve the binary relation, i.e.,

does not imply.
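What it means for a mapping to fail to preserve a binary relation can be checked mechanically on a finite example. The sketch below is only an illustration under stated assumptions: the elements are frozensets ordered by proper inclusion, and `toy_shadow` is a hypothetical map chosen to break the order. Neither is the paper's category nor its closure mapping.

```python
from itertools import permutations
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def order_violations(elements: Iterable[T],
                     less_than: Callable[[T, T], bool],
                     mapping: Callable[[T], T]) -> List[Tuple[T, T]]:
    """List the pairs x < y whose images do not satisfy mapping(x) < mapping(y)."""
    items = list(elements)
    return [
        (x, y)
        for x, y in permutations(items, 2)
        if less_than(x, y) and not less_than(mapping(x), mapping(y))
    ]


def toy_shadow(s: frozenset) -> frozenset:
    """Hypothetical map: collapse every set with two or more elements."""
    return s if len(s) < 2 else frozenset({1, 2})


toy_elements = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]
violations = order_violations(toy_elements, lambda a, b: a < b, toy_shadow)
print(violations)
```

Here the two larger sets are strictly ordered but are sent to the same image, so the strict relation is lost for that pair.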

Lemma 17. Let. Suppose that consists of only one flow, i.e.,

If, then

implies.

Remark 17. Let. If consists of only one flow, then

if and only if

for.

6. Defining Equations of Affine Loops

Now let’s consider the problems mentioned in the introduction step by step. For intuitive understanding, we use loops instead of flows in this section.

6.1. In the Case of Self-Computable Loops

We begin by considering the second half of Problem 2.

Problem 3. Find integral loops such that

where are unit loops, and are fusions of a subset of.

Remark 18. For simplicity, the author uses “=” instead of “” in the introduction.

Definition 40 (Self-computable loops). Let be integral loops. Let be unit loops. Suppose that

is a solution to Problem 3. Then, we call a self-computable loop. is determined uniquely up to the rotational and mirror symmetries by m fusion interactions between.

Example 16 (Loop triangles of size two). Let’s consider an integral loop such that

where are unit loops. That is, there is a fusion interaction between the three unit loops. (Note that there is no interaction between two unit loops.) Figure 8(b) gives a solution to the problem, i.e.,

is called a loop triangle of size two.

Example 17 (Loop triangles of size three). Let’s consider integral loops, , , such that

where are unit loops. That is, there are four interactions between six unit loops. Figure 8(b) and Figure 8(c) give a solution to the problem, i.e.,

is called a loop triangle of size three.

For any positive integer n, loop triangles of size n are similarly defined.

Lemma 18. Loop triangles of size n are self-computable for any positive integer n.

6.2. In the Case of Integral Loops

Now let’s consider the first half of Problem 2. Subsets of a self-computable loop are obtained as a solution to the following problem.

Problem 4. Find integral loops such that

where are unit loops and are fusions of a subset of.

By embedding a given integral loop in a self-computable loop, we obtain a set of defining equations of the integral loop. For example, by choosing a large positive integer n, one can embed the loop in a loop triangle of size n.

Lemma 19. Integral loops are obtained as a solution to Problem 4.

Example 18. By embedding the loops of Figure 1 in a loop triangle of size 7 (Figure 9) at once, we obtain


Figure 8. Loop triangles. (a) Six unit loops of the base flow A. (b) Loop triangles of size two., , and. (c) A loop triangle of size three. (Recall that the contained loop is considered to be part of the surrounding loop.)


Figure 9. Subsets of loop triangles. (a) A loop triangle (consisting of) of size 7 and the integral loop c1 of Figure 1, where the position of b1 is colored dark grey. (b) Integral loops of Figure 1, where the position of the corresponding is colored dark grey.

That is,

gives a solution to Problem 2 of the introduction.

6.3. In the Case of Non-Integral Loops

Finally, let’s consider Problem 1 of the introduction.

Problem 5. For a given set of flows, find a flow such that there exists a set of flows that satisfies

Remark 19. For simplicity, the author uses “=” instead of “” in the introduction. The dual version of Problem 5 provides a simpler description of the problem (See below).

Remark 20. In Problem 5, it is implicitly assumed that consists of one loop (because of the definition of fusion).

We denote the set of all the solutions to the problem by

By analogy with quantum mechanics, C1 is called the “output spectral flow”, and are called the “input spectral flows”.

Example 19. Let. Then,

if and consists of one loop.

Example 20. In the case of Figure 1,

Moreover, since, we have

The dual version of Problem 5 mentioned above is given by

Problem 6. For a given set of flows, find a flow such that there exists a set of flows that satisfies

Remark 21. Let. If, then

implies.

7. Discussion

The author has shown that protein-like shapes can be described by algebraic equations if one takes a category theoretical approach. Along the way, the paper also provides, albeit briefly, a basis for describing the relation between the various fissions of a shape, i.e., the L-spectrum of a flow.

In the previous attempts [6] [7], the author tried to describe the shape of a loop by considering the relation between all the fissions of the loop, i.e., the L-spectrum of the corresponding flow. Although transitions between states correspond to the addition or removal of unit cubes in the “stacked-unit-cube description”, it is not yet clear how to describe the relation between various fissions of the given shape.

In this study, the author obtains a set of defining equations of a given shape by assuming the existence of integral flows. That is, the study takes a fusion-based approach rather than the previous fission-based approach. By referring to the construction of the rational numbers, the author describes the shape of loops algebraically in a way that is compatible with the description style of quantum mechanics.
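As a reminder of the construction referred to here, the rational numbers can be characterized as the solutions of linear equations with integer coefficients. The symbols a, b and x below are standard notation and are not taken from the paper.

```latex
\[
  \mathbb{Q} \;=\; \{\, x \in \mathbb{R} \;\mid\; b\,x = a
  \ \text{ for some } a, b \in \mathbb{Z},\ b \neq 0 \,\}
\]
```

In the analogy, integral loops (those obtained by the fusion of unit loops) play the role of the integer coefficients, and rational loops play the role of the solutions x.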

One of the features of the study is that it deals with both protein folding and protein shape simultaneously. This provides us with a simple model of allostery. Another feature is that it expresses both geometrical and algebraic features of protein interactions simultaneously. This allows us to describe the shape of proteins algebraically. The author is not aware of any similar multifaceted studies by other researchers on protein shape. However, these results hold only for a simple model, and the author considers them a starting point rather than an end point of the study.

In relation to experimental studies, the author hopes that this study will suggest candidates for the underlying general principles that govern the engineering of self-assembling molecules such as proteins. For example, our model indicates that

allostery is a remnant of a giant protein molecule, i.e., interaction between a surface region and the core of the molecule.

It also proposes an approach to the question “How to obtain a well-defined shape with desired properties by folding a chain of subunits” [20]. That is,

first construct the shape with unity elements, then divide it into a set of folded chains.

(Of course, in order to do that, we need to find something that can be used as a unity element.)

However, so far there are no experimental results to support these proposals.

8. Conclusions

Protein design starts with a specification. Since the function of proteins is largely determined by their shape, the specification should include a description of the shape of the target protein. This paper proposes a novel category theoretic approach to describe protein’s shape, i.e., a description of their shape by a set of algebraic equations.

However, this paper considers the approach only in a very simplified model, namely the two-dimensional case of the mathematical toy model proposed in [1]. Moreover, some notions were only defined and not examined in detail.

The author hopes that this paper will serve as one of the starting points for a variety of related research. Shown below are some of the future research directions the author considers:

1) Base change.

The base flow A is obtained by partitioning into hexagons. On the other hand, flows of triangles on a closed surface are considered in [20]. By replacing the base flow A with another base flow (i.e., another flow consisting of only unit loops), we can consider flows on various surfaces. For example, the surface flow on a rhombic dodecahedron corresponds to a base flow consisting of four unit loops.

2) Fission of unity elements (Higher dimensional cases).

In the case of flows of tetrahedra (i.e., 3-simplices), unity elements are rhombic dodecahedral loops of length 24. Unlike the case of triangles (i.e., 2-simplices), they split into four loops of length 6 [1]. In general, unity elements of flows of n-simplices have fissions if. It is a challenge to describe fissions of unity elements in higher dimensional cases.

3) Characterization of allosteric proteins.

Allostery is at play in all processes in the living cell, and increasingly in drug discovery [2]. Our model indicates that allosteric proteins are produced as a result of the fission of larger protein molecules. That is, in our model, the design of allosteric proteins corresponds to the specification of fission of larger proteins. However, the author does not know how to specify the fission of proteins (i.e., the fission of loops of n-simplices).

4) Symmetry of the shape of proteins.

Here, “symmetry” means the relation between the flows of the L-spectrum defined in Subsection 4.6. The question is to what extent the symmetry controls the shape. In particular, for a given L-spectrum, find a flow U that has the same L-spectrum.

5) “Biologic” logic.

The relations between proteins (i.e., loops) are defined using the fusion and fission of proteins, where allostery (i.e., global interaction) occurs naturally [3]. What kind of logic can be made by building theories based on the fusion and fission of proteins?

Our model may be too simple to describe the ecology of actual proteins. But it is said that the simpler the model, the broader the range of applications. The author believes that this paper not only provides a new perspective on protein engineering, but also promotes further collaboration between biology and other disciplines.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Morikawa, N. (2017) Discrete Differential Geometry and the Structural Study of Protein Complexes. Open Journal of Discrete Mathematics, 7, 148-164.
https://doi.org/10.4236/ojdm.2017.73014
[2] Tsai, C.-J. and Nussinov, R. (2014) A Unified View of “How Allostery Works”. PLoS Computational Biology, 10, e1003394.
https://doi.org/10.1371/journal.pcbi.1003394
[3] Morikawa, N. (2018) Global Geometrical Constraints on the Shape of Proteins and Their Influence on Allosteric Regulation. Applied Mathematics, 9, 1116-1155.
https://doi.org/10.4236/am.2018.910076
[4] Michalopoulos, I., Torrance, G.M., Gilbert, D.R. and Westhead, D.R. (2004) TOPS: An Enhanced Database of Protein Structural Topology. Nucleic Acids Research, 32, D251-D254.
https://doi.org/10.1093/nar/gkh060
[5] Taylor, W.R., Chelliah, V., Hollup, S.M., MacDonald, J.T. and Jonassen, I. (2009) Probing the “Dark Matter” of Protein Fold Space. Structure, 17, 1244-1252.
https://doi.org/10.1016/j.str.2009.07.012
[6] Morikawa, N. (2003) Research Project: Toward Galois Theory of Protein-Like Objects.
http://www.genocript.com/papers/project_Galois.pdf
[7] Morikawa, N. (2004) Research Project: Protein as Numbers.
http://www.genocript.com/papers/project_numbers.pdf
[8] Rosen, R. (1958) The Representation of Biological Systems from the Standpoint of the Theory of Categories. Bulletin of Mathematical Biophysics, 20, 317-341.
https://doi.org/10.1007/BF02477890
[9] Letelier, J.-C., Soto-Andrade, J., Abarzua, G.F., Cornish-Bowden, A. and Cárdenas, M. (2005) Organizational Invariance and Metabolic Closure: Analysis in Terms of (M, R) Systems. Journal of Theoretical Biology, 238, 949-961.
https://doi.org/10.1016/j.jtbi.2005.07.007
[10] Varenne, H. (2013) The Mathematical Theory of Categories in Biology and the Concept of Natural Equivalence in Robert Rosen. Revue d’Histoire des Sciences, 66, 167-197.
https://doi.org/10.3917/rhs.661.0167
[11] Rietman, E.A., Karp, R.L. and Tuszynski, J.A. (2011) Review and Application of Group Theory to Molecular Systems Biology. Theoretical Biology and Medical Modelling, 8, Article No. 21.
https://doi.org/10.1186/1742-4682-8-21
[12] Edelsbrunner, H. (1995) The Union of Balls and Its Dual Shape. Discrete & Computational Geometry, 13, 415-440.
https://doi.org/10.1007/BF02574053
[13] Li, J., Mach, P. and Koehl, P. (2013) Measuring the Shapes of Macromolecules—and Why It Matters. Computational and Structural Biotechnology Journal, 8, e201309001.
[14] Vishveshwara, S., Brinda, K.V. and Kannan, N. (2002) Protein Structure: Insights from Graph Theory. Journal of Theoretical and Computational Chemistry, 1, 187-211.
https://doi.org/10.1142/S0219633602000117
[15] Penner, R.C., Knudsen, M., Wiuf, C. and Andersen, J.E. (2011) An Algebro-Topological Description of Protein Domain Structure. PLoS ONE, 6, e19670.
https://doi.org/10.1371/journal.pone.0019670
[16] MacLane, S. (1998) Categories for the Working Mathematician. 2nd Edition, Springer-Verlag, New York.
[17] Kashiwara, M. and Schapira, P. (2006) Categories and Sheaves. Springer-Verlag, Berlin.
https://doi.org/10.1007/3-540-27950-4
[18] McLarty, C. (1992) Elementary Categories, Elementary Toposes (Oxford Logic Guides 21). Oxford University Press Inc., New York.
[19] MacLane, S. and Moerdijk, I. (1992) Sheaves in Geometry and Logic: A First Introduction to Topos Theory. Springer-Verlag, New York.
[20] Morikawa, N. (2019) Design of Self-Assembling Molecules and Boundary Value Problem for Flows on a Space of n-Simplices. Applied Mathematics, 10, 907-946.
https://doi.org/10.4236/am.2019.1011065

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.