Bioinformatic Game Theory and Its Application to Biological Affinity Networks

The exact evolutionary history of any set of biological taxa is unknown, and all phylogenetic reconstructions are approximations. The problem becomes harder when one must consider a mix of vertical and lateral phylogenetic signals. In this paper we propose a game theoretic approach to constructing biological networks. The key hypothesis is that evolution is driven by distinct mechanisms that seek to maximize two competing objectives, taxonomic conservation and diversity. One branch of the mathematical theory of games is brought to bear. It translates this evolutionary game hypothesis into a mathematical model in two-player zero-sum games, with the zero-sum assumption conforming to one of the fundamental constraints in nature in mass and energy conservation. We demonstrate why and how a mechanistic and localized adaptation to seek out greater information for conservation and diversity may always lead to a global Nash equilibrium in phylogenetic affinity. Our game theoretic method, referred to as bioinformatic game theory, is used to construct network clusters. As an example, we applied this method to clustering of a multidomain protein family. The protein clusters identified were consistent with known protein subfamilies, indicating that this game-theoretic approach provides a new framework in biological sequence analysis, especially in studying gene-genome and domain-protein relationships.


Introduction
Phylogenetic methods are used to reconstruct the evolutionary history of amino acid and nucleotide sequences. The number and diversity of tools for phylogenetic analysis are continually increasing. Classic phylogenetic methods assume that evolution is a tree-like (bifurcating) branching process, where genetic information arises through the divergence and vertical transmission of existing genes, from parent to offspring. However, when there are reticulate evolutionary events, such as lateral gene transfer (LGT) or hybridization of species, the evolutionary process is no longer tree-like. Such evolutionary histories are more accurately represented by networks [1-3]. The purpose of this paper is to illustrate a game theoretic formulation of evolution which allows us to simultaneously construct an affinity network and a profile for each of the taxa in the network.
The development of analytical tools to generate network topologies that accurately describe evolutionary history is an open field of research. Early network construction methods often employed some appropriate notion of distance between taxa. Posada and Crandall [4] explain why networks are appropriate representations for several different types of reticulate evolution and describe and compare available methods and software for network estimation. One of the earliest methods for phylogenetic network construction was the statistical geometry method [5]. The authors in [6] use a least-squares fitting technique to infer a reticulated network. Other network construction methods can be found in [2,7-9], each of which is useful in modeling a particular kind of data.
Differentiating between vertical and lateral phylogenetic signals is a challenging task in developing accurate models for reticulate evolution. In order to establish a definition for vertical versus lateral transfer, some component of the evolutionary signal recovered from a set of genes must be awarded privileged status [10]. In the genomic context, vertical signals are assumed to reside within a core set of genes, shared between genomes. The best examples of such sets are the 16S and 18S ribosomal DNA sequences, often used to infer organismal phylogeny [11].
When conflicting phylogenetic signals are combined, relationships amongst taxa that appear to be vertical may in fact be lateral and vice versa, resulting in a set of invalid evolutionary connections [10]. This phenomenon is observed, for example, in the thermophilic bacterium Aquifex aeolicus, which has been described as an early branching bacterium with similarities to thermophiles [12] or a Proteobacterium with strong LGT connections to thermophiles [13].
A comprehensive map of genetic similarities which encapsulates the results of all phylogenetic signals is a desirable goal. Lima-Mendez et al. [14] developed a methodological framework for representing the relationships across a bacteriophage population as a weighted edge network graph, where the edges represent the phage-phage similarities in terms of their gene content. The genes within the phage were assigned to modules, groups of proteins that share a common function. The authors used graph theory techniques to cluster the phage in the network, and then analyzed the 'module profile' for each of the clusters in order to identify modules that were common to phage within the clusters.
Holloway and Beiko [10] introduced the framework for an evolutionary network known as an intergenomic affinity graph (IAG). An IAG is a directed, weighted edge graph, where each node represents an individual genome and an edge between two nodes denotes the relative affinity of the genetic material in the source genome to the target genome. The assignment of edge weights in the IAG is based on the solutions to a set of linear programming (LP) problems. A noteworthy feature of the IAG is that the LP derivation of the edge weights does not force the relationship between two genomes to be symmetric.
Here we introduce a novel game-theoretic formulation for evolutionary analysis. In this context we use the term evolution as a broad descriptor for the entire set of mechanisms driving the inherited characteristics of a population. The key assumption in our development is that evolution (or some subset of the mechanisms therein) tries to accommodate the competing forces of selection: the conservation force (e.g., functional constraints) seeks to pass on successful structures and functions from one generation to the next, while the diversity force seeks to maintain variations that provide sources of novel structures and functions. In other words, we assume that evolution seeks to maximize these two competing objectives. This hypothesis is naturally modeled through the use of game theory, which is suitable for optimizing competing goals in various applied fields. We further restrict our game model to a zero-sum game because the zero-sum hypothesis closely mirrors one of the fundamental principles in nature, the conservation of mass and energy. That is, an atom, nucleotide, or amino acid used up for conservation will not be available for diversification. From a population genetics view, this can be understood as the fact that new mutations will ultimately be either lost or fixed. The zero-sum assumption can also be justified as the first order approximation of all competing objectives in games. As a result, our formulation leads to the simultaneous construction of an affinity network and a profile for each of the taxa in the network.
The paper is organized as follows: Section 2 contains a discussion of definitions and notation for a biological affinity network, as well as the development of our game theory model. Also in Section 2 we describe the construction of the affinity network graph as well as the profiles for each of the network taxa, and apply our technique to a multidomain protein family. We summarize our bioinformatic game theory in Section 3. Lastly, all important and pertinent results and proofs about two-player zero-sum games are reviewed and compiled in the Appendix.

Results and Discussion
We begin this section by defining a biological affinity graph for a given set of taxa. Next, we establish the game-theoretic approach to evolution and demonstrate how this can be used to formulate an LP problem for a given reference taxon in the set, whose solution yields the set of evolutionary neighbors for the reference taxon with respect to all the taxa in the network.

Biological Affinity Graphs
A taxon space is defined by a set of biological entities, each of which is in turn defined by a set of components. For example, if the entities in the taxon space are defined to be genomes, the components will be genes, while if proteins are the taxa, the components will be functional or structural domains. Given the taxon space and the corresponding component space we will construct an affinity network (Figure 1(a)).
We use an affinity graph definition similar to that of [10]. A biological affinity graph for a given set of taxa is a directed graph such that each vertex in the set $V = \{P_1, P_2, \ldots, P_n\}$ uniquely corresponds to one taxon, and all edges have nonzero weights with the incoming edges to any given vertex summing to 1. (The distinction between whether the edges are incoming or outgoing is determined by the construction of the similarity matrix, to be discussed in more detail later.) An example of such a network can be seen in Figure 1(b).
An edge from vertex $P_j$ to $P_i$ is present if and only if the edge weight $w_{ij}$ is strictly positive. The edge weight $w_{ij}$ is a measure for the affinity of member $P_j$ to $P_i$ relative to all other members of $V$. No edge is drawn from any node to itself (i.e., $w_{ii} = 0$) by convention.
The network graph is constructed in a taxon-by-taxon approach. For each member of the taxon space we construct a similarity matrix. In these matrices, we compare the amino acid (or nucleotide) sequences of each component as a first order approximation. Suppose there are a total of $m$ components, $d_1, d_2, \ldots, d_m$, found in the $n$ members of $V$. Then, as exhibited in Figure 2, the similarity matrix, $A_i = [a_i(s,j)]$, of a given reference member $P_i$ is an $m \times n$ matrix, where $a_i(s,j)$ is the similarity score of component $s$ in member $P_i$ to member $P_j$. This entry may be considered as a proxy for the mutual information between $P_i$ and $P_j$ with respect to component $s$, in that the higher the value, the more similar the pair are in component $s$. The values in the reference column (the $i$th column) will not be used in calculation and therefore we arbitrarily set $a_i(s,i) = 0$ for all $s$.
As mentioned above, the edge directionality of our network graph depends on the construction of the similarity matrix. If the scores are established using the reference taxon, $P_i$, as the intended parent sequence, the edges with nonzero weights will be the outgoing edges of the corresponding node, representing a likely ancestor-descendant directionality. Similarly, if the scores are constructed to permit the inference that the reference taxon is a descendant sequence, the edges with nonzero weights will be the incoming edges to the reference node. If there is no obvious parent-offspring directionality, as is the case in the similarity matrix construction for our analysis, either convention may be used, but only to keep track of the model solutions.
Once the edge weights are found for all $P_i$, they in turn, as mentioned above, define the network matrix $W = [w_{ij}]$ as shown in Figure 2. The directed network graph is then constructed according to the weight matrix $W$ and vice versa. Furthermore, if the matrix is block diagonalizable as explained in Figure 2, then each (irreducible) block defines a distinct subnetwork graph, referred to as a cluster. Therefore, the construction of any directed affinity network graph is reduced to finding the weight vector $w_i$ for each $1 \le i \le n$.
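The block structure described above can be illustrated with a short sketch. It assumes, as a simplification not stated in the text, that a cluster can be read off as a weakly connected component of the directed graph defined by the nonzero entries of the weight matrix; the function name `find_clusters` and the small matrix `W_example` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def find_clusters(W, tol=0.0):
    """Group the nodes of a directed weight matrix W into clusters.

    Assumption: a cluster is a weakly connected component, i.e. edge
    direction is ignored when deciding which nodes belong together.
    """
    n = len(W)
    neighbors = defaultdict(set)
    for i in range(n):
        for j in range(n):
            if i != j and W[i][j] > tol:
                # Record the edge in both directions (weak connectivity).
                neighbors[i].add(j)
                neighbors[j].add(i)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collects everything reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters

# Hypothetical 5-taxon weight matrix; row i holds the weights w_ij and sums to 1.
W_example = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.6, 0.0, 0.4, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
clusters = find_clusters(W_example)
print(clusters)
```

In this toy matrix, nodes 0 to 2 and nodes 3 to 4 share no edges, so reordering is not even needed to see the two diagonal blocks; for real data the same search recovers the blocks without permuting the matrix explicitly.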

Evolution as a Game
Our idea for the construction of the edge weights $w_{ij}$ is based on the assumption that evolution seeks to maximize both conservation and diversity. First, we view the interwoven relationships of taxa as the result of all evolutionary processes (to name a few: mutation, recombination, and gene transfers, both vertical and lateral), all taking place amongst thousands of individual organisms contemporaneously in space and repeated for thousands of generations, and all driven by some particular selective forces. Second, we view the net effect of these processes as a non-cooperative, two-player game in which one player, or one force, is to maximize the genetic conservation, or self-preservation, so that deleterious changes are eliminated and successful (or non-deleterious) structures and functions are passed on from one generation to the next; and the other player, or the other force, is to maintain the genetic diversity and to maximize evolutionary resources where novel structures and functions can be tried out. We will assume, for this paper and as a primary approximation, that the two goals are polar opposites, because conservation as characterized by structural and functional similarity is negatively correlated with diversity, which is characterized by the reverse. This first order approximation can also be justified by the principle of mass and energy conservation in nature. That is, genetic materials and natural resources that are devoted to conservation will not be available for divergence (or in population genetics, extinction vs. fixation). In short, we view evolution as a repeatedly played game with the aim to maximize both preservation and divergence simultaneously. We suppose that the net effect of playing this game of evolution is the closeness, or the distance, of one member to other members in the taxon space, and this effect is to be captured by the frequencies, explained below, with which a reference taxon has interacted with the other taxa.
The goal of our game-theoretic model for evolution is to find a Nash equilibrium [15-17] for each reference taxon $P_i$: a taxon frequency vector $y$ and a component frequency vector $x$, both probability vectors, together with the resulting expected similarity score $E_i$. The conservation-diversification dichotomy interpretation for the $y$ and $x$ probability vectors can be explained as follows. In the case of a pure "diversity" or "component" strategy being played, say $x_s = 1$, $x_t = 0$ for $t \neq s$ and $s \in S_i$, that is, when component $s$ of the reference member $P_i$ is used to measure divergence, it is the taxon (or taxa) $P_j$ having the largest similarity score $a_i(s,j)$ that should be picked as the countering "conservation" or "taxon" strategy to maximize the similarity score $E_i$. Here $S_i$ denotes the subset of the $m$ components that are present in the reference taxon $P_i$. Thus $y_j > 0$ for these $j$, and the $y_j$ sum to 1. This gives the conservation interpretation of the $y$ solutions.
On the other hand, in the case of a pure "conservation" or "taxon" strategy being played, say $y_j = 1$, $y_k = 0$ for $k \neq j$, with $j \neq i$, that is, when $P_j$ is used to measure affinity to the reference $P_i$, it is the component(s) having the smallest similarity score $a_i(s,j)$ that stand out and should be picked as the countering "diversification" or "component" strategy to minimize the similarity score $E_i$. That is, these $x_s$ are positive and sum to 1, and permit the interpretation for divergence.
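The two pure-strategy counters described above can be sketched in a few lines. The helper names and the toy similarity matrix `A_ref0` are hypothetical; the matrix follows the convention of the text (rows are components, columns are taxa, and the reference column is set to zero), and `present` plays the role of the index set of components present in the reference taxon.

```python
def best_taxa_for_component(A, s, ref):
    """Taxa (columns) with the largest similarity score for component s,
    excluding the reference column `ref`."""
    candidates = [j for j in range(len(A[s])) if j != ref]
    top = max(A[s][j] for j in candidates)
    return [j for j in candidates if A[s][j] == top]

def best_components_for_taxon(A, j, present):
    """Components (rows listed in `present`) with the smallest similarity
    score against taxon j."""
    low = min(A[s][j] for s in present)
    return [s for s in present if A[s][j] == low]

# Hypothetical similarity matrix for reference taxon 0 (3 components, 4 taxa).
A_ref0 = [
    [0.0, 5.0, 2.0, 5.0],
    [0.0, 1.0, 4.0, 3.0],
    [0.0, 0.0, 0.0, 0.0],  # component absent from the reference taxon
]
present = [0, 1]  # components present in the reference taxon

print(best_taxa_for_component(A_ref0, 0, ref=0))      # conservation counter to component 0
print(best_components_for_taxon(A_ref0, 2, present))  # diversity counter to taxon 2
```

Ties are returned together, which mirrors the "taxon (or taxa)" and "component(s)" wording: any probability split over the tied choices is an equally good counter.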
As we mentioned before, since all evolutionary processes (all kinds of genetic transfers or otherwise) take place amongst all organisms all the time, the evolutionary state we observe today would be the result of the frequencies with which all pure conservation strategies and all pure diversity strategies are played one event at a time, and by our proposed game-theoretical model these frequencies are approximated by the solution $y$ and $x$ to the following min-max problem (see Equation (1)).
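For orientation, a matrix game of this kind is commonly written as the max-min problem below. This is a hedged sketch in the notation of the surrounding text, assuming that the expected similarity score is the bilinear form of the similarity matrix with the two frequency vectors; it is the standard form and may differ in detail from the authors' Equation (1).

```latex
E_i \;=\; \max_{y}\,\min_{x}\; \sum_{s \in S_i}\sum_{j \neq i} x_s\, a_i(s,j)\, y_j,
\qquad
x_s \ge 0,\ \ \sum_{s \in S_i} x_s = 1,
\qquad
y_j \ge 0,\ \ y_i = 0,\ \ \sum_{j \neq i} y_j = 1 .
```

Under this reading the taxon player chooses $y$ to maximize the score while the component player chooses $x$ to minimize it, which matches the conservation and diversity roles described above.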
The solution to this problem exists and is exactly a Nash equilibrium point (see Appendix). The optimal expected similarity score, $E_i$, is the so-called game value. There are two different ways to find a Nash equilibrium (NE) for two-player zero-sum games. One is through a dynamical play of the game to find an NE asymptotically, which is modeled by the Brown-von-Neumann-Nash (BNN) system of differential equations. The other is by the simplex method in linear programming. The Appendix gives a comprehensive compilation of the fundamental results for both methods.
Here we present a mechanistic derivation of Nash's map. Nash used this map to prove the existence of NE for all non-cooperative games (Appendix). Our derivation is directly relevant to our game theory formulation for bioinformatics. It gives a plausible answer to the question of how an NE is realized by nature. It shows that evolution or individual organisms need only be driven by their immediate, short term gain in game play payoff to reach a globally attractive Nash equilibrium. Here is an outline of the scenario, which works not only for two but for any number of player types of a game or multiple competing objectives of a process.
Suppose a game is played by large populations of all types repeatedly for a long time, so that the time between consecutive plays can be blurred, the game can be viewed as played continuously, and the play strategy frequency for player type-$i$, $x_i(t)$, changes continuously. Here $x_i = (x_{1i}, x_{2i}, \ldots, x_{n_i i})$ is the mixed strategy probability or frequency vector, with $\sum_j x_{ji} = 1$ and $x_{ji} \ge 0$; the entry $x_{ji}$ corresponds to the $j$th strategy of the type-$i$ players and can be interpreted as the fraction of the type-$i$ population that uses its pure strategy $j$. Consider it at time $t$ and a $\Delta t$ time later, $x_i(t)$ and $x_i(t + \Delta t)$. We would like to understand how $x_i(t)$ changes to $x_i(t + \Delta t)$. We will do so probabilistically.
Let $\rho_i$ be the scalar inertia probability with which an individual of the type-$i$ population keeps playing the same strategy. Then $1 - \rho_i$ is the non-inertia or kinetic probability with which an individual of the type-$i$ population chooses or adapts to a particular strategy, including the choice of playing the same strategy at time $t + \Delta t$ as at time $t$ because it is advantageous to do so, or because the individual organism is driven to do so. Let $\pi_i = (\pi_{1i}, \ldots, \pi_{n_i i})$ be the conditional play probability vector given that the play is kinetic, with $\pi_{ji} \ge 0$ and $\sum_j \pi_{ji} = 1$. Then by elementary probability rules
$$x_i(t + \Delta t) = \rho_i\, x_i(t) + (1 - \rho_i)\, \pi_i. \qquad (2)$$
The scalar marginal probability $\rho_i$ and the conditional probability vector $\pi_i$ are derived as follows.
First we assume the advantage for the type-$i$ player's kinetic strategy switch or adaptation depends on its total (scalar) excess payoff $\Phi_i(x) = \sum_j \phi_{ji}(x)$ from the current play frequency (Appendix, also [16,17]), where $x = (x_1, \ldots, x_n)$ denotes the current play frequency for all player types, with probability vector $x_k$ for player type-$k$. That is, if $\Phi_i(x) = 0$ then all plays are of the inertia kind, $\rho_i = 1$, and $x_i(t + \Delta t) = x_i(t)$. If $\Phi_i(x) > 0$, we take
$$\pi_{ji} = \frac{\phi_{ji}(x)}{\Phi_i(x)}, \qquad \phi_{ji}(x) = \max\{0,\; E_i(e_{ji}, x_{-i}) - E_i(x)\},$$
where $\phi_{ji}(x)$ (Appendix) is the $j$th strategy's excess payoff from the current play for the player type-$i$. That is, the strategy switch to strategy $j$ for the type-$i$ players is strictly proportional to its excess payoff against the total $\Phi_i(x)$.
As for the scalar marginal inertia probability $\rho_i$, we assume it is a function of the total excess payoff as well as the time increment $\Delta t$. Specifically, consider the probability equivalently in its reciprocal $1/\rho_i$, which represents all fractional possible choices for each inertia choice. The fractional possible choices automatically include the inertia choice itself, so that $1/\rho_i \ge 1$ always holds. Then at $\Delta t = 0$ we must have the trivial boundary condition $1/\rho_i = 1$, the default inertia choice only, for lack of time to adapt. Assuming the fractional possible choices increase linearly for small time increment $\Delta t$, we have $1/\rho_i = 1 + r \Delta t$, where 1 represents the inertia choice itself and $r$ represents the rate of increase in the kinetic choices, which may include the choice of maintaining the same strategy play because its excess payoff is positive, and all other kinetic strategy adaptations. We assume the rate of the kinetic choice change is proportional to the total excess payoff, $r = h\,\Phi_i(x)$, i.e., the greater the excess payoff, the more play switches in the population for a greater payoff gain. As a result, with $\phi_i = (\phi_{1i}, \ldots, \phi_{n_i i})$,
$$x_i(t + \Delta t) = \frac{x_i(t) + h\,\Delta t\,\phi_i(x(t))}{1 + h\,\Delta t\,\Phi_i(x(t))}, \qquad (3)$$
which is exactly the Nash map ([17]) if $h = 1$ and $\Delta t = 1$. From Equation (3) we also have
$$\frac{x_i(t + \Delta t) - x_i(t)}{\Delta t} = \frac{h\,[\phi_i(x(t)) - \Phi_i(x(t))\, x_i(t)]}{1 + h\,\Delta t\,\Phi_i(x(t))},$$
and hence, letting $\Delta t \to 0$, the following equivalent system of differential equations after a time scaling by $h$:
$$\dot{x}_i = \phi_i(x) - \Phi_i(x)\, x_i. \qquad (4)$$
This type of equation was first introduced in [18] by Brown and von Neumann to compute an NE for symmetrical zero-sum games, and the derivation of Equation (4) from the Nash map was first noted in [19].
However, our derivation of the Nash map from the time evolution relationship (2) between inertia and kinetic strategy plays is new to the best of our knowledge. The derivation immediately suggests an evolutionary mechanism as to how a Nash equilibrium point may be realized or reached, because the process or the game play is driven by the excess payoff at every step of the way, which can be interpreted as a mechanism for adaptation and a force of selection. In fact, let $V$ denote the total excess payoff potential; then for any two-player zero-sum game Theorem 2 of the Appendix shows that $V$ cannot increase along any solution $x(t)$ of the BNN Equation (4). An NE is reached when there is no more excess payoff left, $V = 0$. It shows that, as a global dissipative system, any mixed play frequency trajectory will always find a Nash equilibrium by following the down gradient of the excess potential function $V$. That is, in their search for greater excess payoffs, the total excess payoff for the players can never increase along any time evolution of their game plays. A computational implication of this theorem is that a Nash equilibrium of any two-player zero-sum game can always be found by the iterates of the Nash map or the solution to the BNN equations for any initial strategy frequency. This result solves the important problem as to how dynamical plays of a zero-sum game driven by individual players seeking out only localized advantages can eventually and collectively find a globally stable Nash equilibrium. Figure 3 shows, for a prototypical two-player zero-sum game, that the trajectories of the Nash map for a small time increment $\Delta t$ and of the BNN equation converge to an NE, which is a saddle point on the payoff surface of one player and a global minimum on the excess potential surface, which can be viewed as an energy function for the dynamics of the BNN equation.
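The convergence behavior described above can be explored numerically with a small sketch. It iterates a Nash-map-style update with a small step `tau` (standing in for the product of the rate constant and the time increment) on a hypothetical 2x2 zero-sum game. The potential used here, half the sum of squared excess payoffs, is an assumed form chosen for illustration and is not taken from the text; the function names are likewise hypothetical.

```python
def excess_payoffs(A, x, y):
    """Excess payoff of each pure strategy over the current mixed play.

    A[s][j] is the payoff to the maximizing (y) player; the minimizing (x)
    player receives the negative, so the game is zero-sum.
    """
    m, n = len(A), len(A[0])
    Ay = [sum(A[s][j] * y[j] for j in range(n)) for s in range(m)]
    xA = [sum(x[s] * A[s][j] for s in range(m)) for j in range(n)]
    value = sum(x[s] * Ay[s] for s in range(m))
    phi_x = [max(0.0, value - Ay[s]) for s in range(m)]  # x wants a lower score
    phi_y = [max(0.0, xA[j] - value) for j in range(n)]  # y wants a higher score
    return phi_x, phi_y

def potential(A, x, y):
    """Assumed potential: half the sum of squared excess payoffs."""
    phi_x, phi_y = excess_payoffs(A, x, y)
    return 0.5 * sum(p * p for p in phi_x + phi_y)

def nash_map_step(A, x, y, tau):
    """One simultaneous update x -> (x + tau*phi) / (1 + tau*sum(phi)) per player."""
    phi_x, phi_y = excess_payoffs(A, x, y)
    total_x, total_y = sum(phi_x), sum(phi_y)
    new_x = [(xs + tau * p) / (1.0 + tau * total_x) for xs, p in zip(x, phi_x)]
    new_y = [(yj + tau * p) / (1.0 + tau * total_y) for yj, p in zip(y, phi_y)]
    return new_x, new_y

# Hypothetical game (matching pennies); its unique equilibrium is uniform play.
A = [[1.0, -1.0], [-1.0, 1.0]]
x, y = [0.9, 0.1], [0.2, 0.8]
v_start = potential(A, x, y)
for _ in range(50000):
    x, y = nash_map_step(A, x, y, tau=0.01)
v_end = potential(A, x, y)
print(x, y, v_start, v_end)
```

With a small step the iterates spiral in toward uniform play and the potential ends far below its starting value. The approach is slow and only approximate for a fixed step size, which is one practical reason to prefer linear programming when an exact solution is needed.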
With the existence problem and the search problem for NE solved, we can employ alternative and practical methods to find them. One standard procedure to solve the min-max problem (1) is to solve the following linear programming problem as reviewed in the Appendix (see Equation (5)).
There are both commercially available and free packages to solve such LP problems. It is well known that the $y$ solution and the optimal value $E_i$ of the objective function $E_i = v$ to the LP problem (5) are exactly the $y$ solution and, respectively, the game value of the min-max problem (1), and that the shadow prices, or the set of Lagrange multipliers, for the LP problem are exactly the $x$ solution to the min-max problem.
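As a point of reference, the textbook LP form of a matrix game in the notation of this section is sketched below. It is an assumption that this matches the authors' Equation (5) in every detail; it is offered only to make the roles of $y$, $v$, and the constraints concrete.

```latex
\begin{aligned}
\text{maximize}\quad & v \\
\text{subject to}\quad & \sum_{j \neq i} a_i(s,j)\, y_j \;\ge\; v \quad \text{for all } s \in S_i, \\
& \sum_{j \neq i} y_j = 1, \qquad y_j \ge 0, \qquad y_i = 0 .
\end{aligned}
```

In this form there is one inequality per component present in the reference taxon, so the multipliers attached to those inequalities are indexed by components, which is consistent with reading them as the $x$ solution.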

Affinity Network and Component Profile Construction
To complete the outline of our method, the edge weight $w_{ij}$ from $P_j$ to $P_i$ is assigned to be $y_j$ from a Nash equilibrium of the min-max problem for node $P_i$. That is, the $y$ solution for each node $P_i$ (which depends on $i$, although the dependence is suppressed for simplicity) gives the $i$th row of the network matrix $W$ of Figure 2. The $x$ solution vector for each $i$ is used to define the component profile for the node $P_i$. Thus, by our game theoretic approach the edge weights of the affinity network and its corresponding component profiles are the result of both conservation and diversity being maximized. More specifically, a high edge weight in the affinity graph indicates a strong affinity between taxon pairs relative to the others, and a high component weight $x_s$ for node $i$ indicates that the reference individual, $P_i$, is somewhat unique or dissimilar with respect to the component $s$ compared to the other members of the taxon space.
The game values also yield important information about the affinity network. For example, for two topologically identical clusters, it is their average game values that set them apart: the cluster with the higher average game value is a "tighter", more similar subnetwork than the other.
The importance of a Nash equilibrium lies in the property that if we change the affinity frequency vector $y$ from its optimal, then we may find a different diversity frequency vector $x$ so that the corresponding expected similarity score is lower than the game value. Similarly, if we change the diversity frequency vector $x$ from its optimal, we may then find a different affinity vector $y$ so that the corresponding expected similarity score is larger. That is, deviating from the conservation optimal distribution may give rise to a greater diversification, and deviating from the diversity optimal distribution may give rise to a greater conservation. The dynamical state of the evolution, according to our model, is literally sitting at a saddle point of the expected similarity function, and the game value is a balanced tradeoff between reproducibility and diversity, a minimally guaranteed affinity.
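The saddle-point property described above can be checked directly for a candidate pair of frequency vectors. The sketch below uses a hypothetical 2x2 similarity matrix whose equilibrium was worked out by hand (component frequencies 0.5 and 0.5, taxon frequencies 0.25 and 0.75, game value 1.5); the function names are illustrative only.

```python
def expected_score(A, x, y):
    """Expected similarity score for component frequencies x and taxon frequencies y."""
    return sum(x[s] * A[s][j] * y[j]
               for s in range(len(A)) for j in range(len(A[0])))

def is_saddle_point(A, x, y, tol=1e-9):
    """True if neither player can improve by a unilateral deviation.

    The component player (x) minimizes the score, so no row of A*y may fall
    below the current value; the taxon player (y) maximizes it, so no column
    of x*A may exceed the current value.
    """
    m, n = len(A), len(A[0])
    value = expected_score(A, x, y)
    lowest_row = min(sum(A[s][j] * y[j] for j in range(n)) for s in range(m))
    highest_col = max(sum(x[s] * A[s][j] for s in range(m)) for j in range(n))
    return lowest_row >= value - tol and highest_col <= value + tol

# Hypothetical similarity matrix: rows are components, columns are taxa.
A = [[3.0, 1.0], [0.0, 2.0]]
x_star, y_star = [0.5, 0.5], [0.25, 0.75]

print(expected_score(A, x_star, y_star))
print(is_saddle_point(A, x_star, y_star))
print(is_saddle_point(A, x_star, [0.5, 0.5]))
```

Moving the taxon frequencies away from the optimum leaves a component (here the second row) whose score against the new mix is below the game value, so the component player can lower the expected similarity, exactly as the paragraph above describes.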

An Application to a Multidomain Protein Family
A protein domain is a part of a protein sequence, a structural unit, that can function and evolve almost independently of the rest of the protein. Proteins often include multiple domains. Domain shuffling [20] or domain accretion [21] is an important mechanism in protein evolution underlying the evolution of complex functions and life forms. Figure 4 is a simple example of the evolution of multidomain proteins, illustrating how multidomain proteins can evolve from simple single-domain proteins. Multiple evolutionary events including duplication, loss, recombination, and divergence generate complex proteins [22,23]. As shown in Figure 4, the evolutionary process of multidomain protein families also contains network relationships. As a consequence of their complex evolutionary history, a large variation exists in the numbers, types, combinations, and orders of domains among member proteins from the same family.
In order to understand relationships of proteins and their functions, it is important to incorporate domain information when we study multidomain proteins. To show how we can apply our game-theory based method to reconstruct protein networks, we studied an example of the Regulator of G-protein Signaling (RGS) protein family.
We extracted a set of 66 RGS family protein sequences from the mouse genome. RGS sequences were found by performing a profile hidden Markov model search in HMMER [24,25] using the Pfam [26] families PF00615 (RGS) and PF09128 (RGS-like) as query sequences and with E-value threshold 10.
This RGS sequence set was subsequently used in a HMMER search against the Pfam database to find other domains present in the sequences. This step tries to identify all other domains that coexist with the RGS and RGS-like domains in our RGS proteins. From the sixty-six RGS proteins, fifty-eight Pfam domains (including RGS and RGS-like) were identified. Next, each of the individual domain sequences from each of the RGS proteins was extracted and used as the query sequence in a blastp sequence similarity search [27] against each and every sequence in the RGS protein sequence set.
The BLAST E-value was used as the distance measure between each domain and each of the RGS protein sequences, so that an E-value of 0 is expected when using the domain from a sequence in a BLAST search against the sequence itself, and large E-values are expected when highly diverged domain sequences are identified. If a domain is not found on some sequence, we use 2870 as the maximum distance, since on average this is the maximum possible E-value using BLAST on our data search space. In the end, we obtained all the distances between every domain sequence of every RGS protein and every RGS protein sequence. For the entries in each similarity matrix we used a log-transformed score of the E-value, following [10], where $\varepsilon_{sj}$ is the E-value obtained for the domain query $d_s$ against the subject protein $P_j$. With one similarity matrix as the input, the LP problem shown in (5) can be solved. The solutions to the set of LP problems provide the edge weights for the RGS protein affinity network.
In the resulting network, eight clusters were identified within this protein space. Clusters are labeled according to their average game values, in descending order. The first four clusters are exhibited in Figure 5. As mentioned before, the game values yield important information about the clusters present in the affinity network. For example, Cluster 1 and Cluster 4 each include three proteins (nodes) and are topologically identical. However, their average game values are 146.5601 and 68.3844, respectively, so in this sense Cluster 1 is a tighter, more similar subnetwork than Cluster 4.
The domain profile across the proteins in the first four clusters in the RGS affinity network is exhibited in Figure 6. The proteins are grouped by the clusters in the network graph. Clear profile pattern differences exist between the model clusters. For many of the clusters, the proteins contain the same domain and the weights placed on these domains in the LP solution (the $x$ vector) for each of the proteins are similar. The profile also highlights domains that are unique to specific clusters. For example, in the set of clusters shown, the Pfam domain PF00631.17 is a hallmark of Cluster 4 because it is present in all members of Cluster 4 and not elsewhere.
For a regular phylogenetic analysis of multidomain proteins, usually only sequence information of domains shared across all member proteins (e.g., RGS domain) can be used. A phylogenetic analysis using RGS domain sequences showed phylogenetic clusters largely consistent with the network clusters our method identified (data not shown). However, a regular phylogenetic analysis cannot represent information from the many other domains that are not shared, nor the network relationships that our affinity networks reveal.
As an additional validation of the network clusters, we provide the information for the proteins within each cluster in Figure 7. It clearly shows that different domain architectures are represented in different clusters. Sequence divergence within the same domain type (e.g., RGS_RZ-like domain for RGS 19/20 vs. RGS 17 proteins) is also recognized in separating Clusters 1 and 2. Furthermore, we note that isoforms (proteins coded in alternatively spliced transcripts derived from the same gene) of the same gene fall into the same cluster even if some domains are missing in different isoforms, as shown by the beta-adrenergic receptor kinase 2 isoforms 1 and 2 in Cluster 3. Therefore, using our game-theoretic framework, we incorporated both sequence diversity and domain information and produced a valid RGS protein network.

Concluding Remarks
Using game theory to study biological problems was introduced by Maynard Smith [29,30]. Our formulation of evolution as a game is different from his evolutionarily stable strategy theory (ESS) for animal behavior and conflict. In ESS, there are the literal players in individual animals and the literal strategies that the players use in competition for reproduction and ecological resources. In our formulation, however, evolution as a process is modeled as a game in which the player (or the numerous players) is the selective force which operates everywhere, at any time, in every biological process. Whenever an evolutionary event consists of an exact vertical transmission of a piece of genetic information, evolution plays the role of conservation, and otherwise it plays the role of diversification. That is, conservation and diversification are the two inseparable sides of evolution. The strategies of evolution as a process are its products. At the genome level, any enhancement of affinity between two genomes is the play of conservation, whereas any widening of difference in a gene or gene composition between the genomes is the play of diversification. In fact, the genome network constructed in [10] can now be exactly replicated by our game theoretical approach. At the protein level, the overall similarity between two proteins is of the conservation play and any diverging domain difference is of the diversity play. At each level, the payoff of the game or process is not literal but is the evolutionary similarity or dissimilarity in their bioinformatics broadly construed, which can be measured in terms of various informational distances.
Our bioinformatic game theory gives a plausible mechanistic explanation as to why and how evolution should sit at an informational Nash equilibrium. In fact, our derivation of the Nash map gives the same explanation to all games, including games of the ESS theory. The evolutionary selection force is local in time, space, and genetic sequences: organisms or biological processes only need to seek out excess similarity for conservation and excess dissimilarity for diversity one step, one place, and one nucleotide at a time before collectively, globally, and eventually an informational Nash equilibrium is reached for the competing objectives. This evolutionary scenario is based on the dissipative dynamics of our localized Nash map or the Brown-von-Neumann-Nash equations. In searching for greater information for both conservation and diversity, the total excess information potential cannot increase and eventually converges to a state from which any deviation will not enhance the information for one of the two purposes. Since Nash equilibria are usually saddle points of the expected payoff functions, in this sense we can say that evolution should sit at a saddle point forged by the opposite pulls of conservation and diversity that evolution plays.
There is another fundamental difference from ESS as well. Our zero-sum assumption obeys a basic natural constraint, the law of mass and energy conservation, and as a result the informational Nash equilibrium states are always globally stable. In contrast, the lack of such a constraint permits the existence of unstable NE, and hence there are evolutionarily unstable strategies in Maynard Smith's evolutionary game theory (EGT). In our formulation and analogy, the conservation and diversity strategies embodied by all the evolutionary processes are evolutionarily stable strategies, to borrow one essential term from EGT. In fact, our bioinformatic game theory seems to give a plausible answer to one of the outstanding questions in biology: why, by and large, the system of life on Earth is so stable.

A.1. Existence of Nash Equilibrium for Non-Cooperative n-Players Games.
Consider a game of $n$ players or $n$ types of players. The set-up is as follows. For player $i$ let $S_i = \{s_{1i}, s_{2i}, \dots, s_{m_i i}\}$ be the set of its pure strategies, and let $S = \prod_{i=1}^{n} S_i$ be the product set of all pure strategies. For a particular play, a type-$i$ player uses one of his strategies $s_{j_i i}$, and we denote by $s = (s_{j_1 1}, s_{j_2 2}, \dots, s_{j_n n}) \in S$ one play by one player of each type. For a repeatedly played game, or one play by a large population of every type, let $x_{ji}$ be the frequency of the type-$i$ players who play the strategy $s_{ji}$. Then $x_{ji} \ge 0$ for all $1 \le j \le m_i$ and $\sum_{j=1}^{m_i} x_{ji} = 1$. We use $x_i = (x_{1i}, x_{2i}, \dots, x_{m_i i})$ to denote the frequency or probability vector, $X_i$ to denote the simplex of all such vectors, and $X = \prod_{i=1}^{n} X_i$ to denote the product simplex space for all player types. Each $X_i$ and their product $X$ are convex, compact, and finite dimensional. For the type-$i$ players let $a_i(s)$ denote the payoff for any play $s \in S$. We will use a dynamic notation $x = (x_i, x_{-i})$ to separate the type-$i$ player's play frequency $x_i \in X_i$ from its opponents' play frequencies

$$x_{-i} = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n).$$

Similarly, we will use $s = (s_{j_i i}, s_{-i})$ to denote any strategy play $s$ with the type-$i$ player using strategy $s_{j_i i}$ and his opponents using strategies $s_{-i}$. With these notations, the expected payoff for the type-$i$ players is

$$E_i(x) = E_i(x_i, x_{-i}) = \sum_{s \in S} a_i(s) \prod_{k=1}^{n} x_{j_k k}.$$

Let $e_{ji} \in X_i$ be the type-$i$ player's $j$th pure strategy play, with $e_{ji}$ having all zero frequencies except for the $j$th strategy $s_{ji}$. Substituting $e_{ji}$ for $x_i$ in the formula above gives the expected payoff $E_i(e_{ji}, x_{-i})$ for the type-$i$ players when all of them switch to the $j$th pure strategy while their opponents maintain the same play frequencies. A point $x \in X$ is a Nash equilibrium (NE) if

$$E_i(e_{ji}, x_{-i}) \le E_i(x_i, x_{-i}) \quad \text{for all } 1 \le j \le m_i \text{ and all } 1 \le i \le n.$$

This means that a type-$i$ player will not improve its payoff by switching to any pure strategy from its mixed play frequency $x_i$ when the other players maintain their Nash equilibrium mixed play frequencies.
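The equilibrium condition can be checked numerically for small games. The following is a minimal sketch, assuming only two player types and payoff tables stored as nested lists indexed by (strategy of type 1, strategy of type 2); the function names and the matching-pennies example are illustrative and not taken from the paper.

```python
# Illustrative sketch for n = 2 player types; all names here are hypothetical.

def expected_payoff(payoff, x1, x2):
    """Expected payoff: sum over pure plays of a(s) times the play frequencies."""
    return sum(payoff[j][k] * x1[j] * x2[k]
               for j in range(len(x1)) for k in range(len(x2)))

def pure(j, m):
    """The j-th pure strategy written as a frequency vector of length m."""
    return [1.0 if k == j else 0.0 for k in range(m)]

def is_nash(a1, a2, x1, x2, tol=1e-12):
    """True if no switch to a pure strategy raises either type's expected payoff."""
    e1 = expected_payoff(a1, x1, x2)
    e2 = expected_payoff(a2, x1, x2)
    ok1 = all(expected_payoff(a1, pure(j, len(x1)), x2) <= e1 + tol
              for j in range(len(x1)))
    ok2 = all(expected_payoff(a2, x1, pure(k, len(x2))) <= e2 + tol
              for k in range(len(x2)))
    return ok1 and ok2

# Matching pennies (zero-sum): type 1 wins on a match, type 2 on a mismatch.
A1 = [[1.0, -1.0], [-1.0, 1.0]]
A2 = [[-1.0, 1.0], [1.0, -1.0]]
```

With these tables, the uniform mixed play passes the check, while the pure play in which both types use their first strategy fails it, because type 2 gains by switching.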
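Theorem 1 ([17]) Every $n$-player game has an NE.

Proof. Nash's proof from [17] is based on a map $T : X \to X$ with the property that $T$ has a fixed point by Brouwer's Fixed Point Theorem and the property that a point is a fixed point of $T$ if and only if it is a Nash equilibrium point. The definition of $T$ is as follows. By definition, the excess payoff from a mixed play strategy $x$ for the type-$i$ player to play his $j$th pure strategy is

$$\phi_{ji}(x) = \max\{0,\; E_i(e_{ji}, x_{-i}) - E_i(x)\},$$

and the total excess payoff (from the mixed play strategy $x$) for the type-$i$ player is

$$\phi_i(x) = \sum_{j=1}^{m_i} \phi_{ji}(x).$$

Notice that $x$ is an NE if and only if $\phi_{ji}(x) = 0$ for all $1 \le j \le m_i$ and all $1 \le i \le n$, if and only if $\phi_i(x) = 0$ for all $1 \le i \le n$. The map $T$ is defined as follows in component for each player:

$$T_{ji}(x) = \frac{x_{ji} + \phi_{ji}(x)}{1 + \phi_i(x)}, \quad 1 \le j \le m_i, \; 1 \le i \le n.$$

Let $x$ be any fixed point of $T$, which is guaranteed by Brouwer's Fixed Point Theorem. Notice that the type-$i$ player's payoff $E_i(x)$ is linear in $x_i$ and is a weighted probability average of $E_i(e_{1i}, x_{-i}), E_i(e_{2i}, x_{-i}), \dots, E_i(e_{m_i i}, x_{-i})$. In fact, we have explicitly

$$E_i(x) = \sum_{j=1}^{m_i} x_{ji} E_i(e_{ji}, x_{-i}).$$

As a result, there is some $j$ with $x_{ji} > 0$ and $E_i(e_{ji}, x_{-i}) \le E_i(x)$, for which the corresponding excess payoff is zero, $\phi_{ji}(x) = 0$. Because $x$ is fixed by $T$, we have $x_{ji} = x_{ji} / (1 + \phi_i(x))$, which forces $\phi_i(x) = 0$. Since this holds for all $i$, it shows $x$ is an NE. The converse is straightforward since the excess payoff from every NE $x$ is zero, so that $T(x) = x$.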
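To make the construction concrete, the sketch below applies Nash's map once to a game with two player types. It assumes payoff tables stored as nested lists indexed by (strategy of type 1, strategy of type 2); the name `nash_map` is hypothetical. It only illustrates that equilibria are fixed points and does not claim that iterating the map converges.

```python
# Illustrative sketch for n = 2 player types; nash_map is a hypothetical name.

def nash_map(a1, a2, x1, x2):
    """Apply Nash's map T once and return the new play frequencies (t1, t2)."""
    def payoff(a, p, q):
        return sum(a[j][k] * p[j] * q[k]
                   for j in range(len(p)) for k in range(len(q)))

    def unit(j, m):
        return [1.0 if k == j else 0.0 for k in range(m)]

    e1, e2 = payoff(a1, x1, x2), payoff(a2, x1, x2)
    # Excess payoff: gain from switching to each pure strategy, truncated at zero.
    phi1 = [max(0.0, payoff(a1, unit(j, len(x1)), x2) - e1) for j in range(len(x1))]
    phi2 = [max(0.0, payoff(a2, x1, unit(k, len(x2))) - e2) for k in range(len(x2))]
    # Shift weight toward profitable pure strategies, then renormalize.
    t1 = [(x1[j] + phi1[j]) / (1.0 + sum(phi1)) for j in range(len(x1))]
    t2 = [(x2[k] + phi2[k]) / (1.0 + sum(phi2)) for k in range(len(x2))]
    return t1, t2
```

For matching pennies, the uniform mixed play is returned unchanged, whereas a pure play is moved: the type whose switch pays off has weight shifted toward the profitable strategy.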

A.2. Mechanistic Derivation of the Nash Map
See the main text.

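Let $A$ be the $m \times n$ payoff matrix of a two-player zero-sum game, with mixed play frequencies $x \in X \subset \mathbb{R}^m$ and $y \in Y \subset \mathbb{R}^n$ on the two simplexes, and with $E(x, y) = x^T A y$ the expected payoff to player $y$ and $-E(x, y)$ the expected payoff to player $x$. Then

$$A^T x = (x^T A e_1, x^T A e_2, \dots, x^T A e_n)^T$$

is player $y$'s payoff vector for all pure strategy plays, where $e_j$ denotes the $j$th pure strategy. Similarly, player $x$'s $i$th pure strategy play payoff is $-e_i^T A y$ as before.

Then the excess payoff for player $y$'s $j$th pure strategy is $\psi_j = \max\{0,\; x^T A e_j - x^T A y\}$, or

$$\psi = \max\{0,\; A^T x - (x^T A y)\mathbf{1}\}$$

for all pure strategies in the vector form, with the maximum taken componentwise. Similarly, $\phi = \max\{0,\; (x^T A y)\mathbf{1} - A y\}$ for player $x$, and the total excess payoffs are $\Phi = \mathbf{1}^T \phi$ and $\Psi = \mathbf{1}^T \psi$. Let $X \times Y$ be the product simplex. Then the corresponding BNN system of equations is

$$\dot{x} = \phi - \Phi x, \qquad \dot{y} = \psi - \Psi y.$$

The following result is due to [31]: along every solution of the BNN equation in $X \times Y$, the total excess payoff potential function

$$V = \tfrac{1}{2}\left(\|\phi\|^2 + \|\psi\|^2\right)$$

satisfies $\dot{V} = -\left(\Phi \|\phi\|^2 + \Psi \|\psi\|^2\right) \le 0$, with equality if and only if $(x, y)$ is an NE. That is, the convex set of NE is globally asymptotically stable for the BNN equation.

Theorems and proofs of this type originated with Brown and von Neumann [18] and were generalized in [31]. Although the proof below is a specific case of a generalized theorem (Theorem 5.1) of [31], it is worthwhile to present it here to complete a comprehensive review of the theory of two-player zero-sum games. In addition, the theorem above and its proof below are not readily seen to be a special case of Theorem 5.1 of [31], and a stand-alone exposition should prove convenient for future researchers.

Proof. Write $E = x^T A y$, so that $\dot{E} = \dot{x}^T A y + x^T A \dot{y}$. We first consider only the time derivatives of those $\psi_j$ with $\psi_j > 0$, for which $\dot{\psi}_j = \dot{x}^T A e_j - \dot{E}$; the terms with $\psi_j = 0$ do not contribute to the derivative of $\tfrac{1}{2}\psi_j^2$. Multiplying by $\psi_j$ and summing over $j$ gives

$$\sum_j \psi_j \dot{\psi}_j = \dot{x}^T A \psi - \Psi \dot{E}.$$

Applying the same argument to $\phi_i$, for which $\dot{\phi}_i = \dot{E} - e_i^T A \dot{y}$ whenever $\phi_i > 0$, we have

$$\sum_i \phi_i \dot{\phi}_i = \Phi \dot{E} - \phi^T A \dot{y}.$$

Substituting the BNN equations into $\dot{x}^T A \psi$ and $\phi^T A \dot{y}$ and grouping terms, of which the mixed term $\phi^T A \psi$ vanishes, we obtain

$$\dot{V} = -\Phi\, x^T A \psi + \Psi\, \phi^T A y + (\Phi - \Psi)\dot{E}.$$

The remaining terms are evaluated by the following calculations. Since $\psi_j (x^T A e_j - E) = \psi_j^2$ whenever $\psi_j > 0$, and both sides vanish otherwise,

$$x^T A \psi - \Psi E = \|\psi\|^2,$$

and similarly we have $\Phi E - \phi^T A y = \|\phi\|^2$. Consequently $\dot{E} = (\phi^T A y - \Phi E) + (x^T A \psi - \Psi E) = \|\psi\|^2 - \|\phi\|^2$, and we have the following time derivative for $V$:

$$\dot{V} = -\Phi\left(\|\psi\|^2 + \Psi E\right) + \Psi\left(\Phi E - \|\phi\|^2\right) + (\Phi - \Psi)\left(\|\psi\|^2 - \|\phi\|^2\right) = -\left(\Phi \|\phi\|^2 + \Psi \|\psi\|^2\right) \le 0.$$

Equality holds only when $\phi = 0$ and $\psi = 0$, i.e. only at an NE, and hence the proof of the theorem.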
As pointed out previously, the derivation of Nash's map and the resulting BNN equations suggests an evolutionary mechanism by which a two-player zero-sum game reaches its Nash equilibria. The global stability result above proves that this is indeed the case. The next subsection shows that, on the other hand, a Nash equilibrium can also be found by means of linear optimization, i.e., by the simplex method from linear programming.
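The dissipative behavior can be illustrated numerically. The sketch below is a minimal explicit Euler integration of BNN-type equations, assuming the expected payoff $x^T A y$ goes to player $y$ (the maximizer) and its negative to player $x$, with the potential taken as half the sum of squared excess payoffs. The function names, step size, and number of steps are illustrative choices, not the authors' implementation.

```python
# Illustrative sketch; bnn_step and simulate are hypothetical names.

def bnn_step(A, x, y, dt):
    """One explicit Euler step of the BNN equations for the payoff x^T A y.

    Returns the updated (x, y) and the potential V evaluated before the step.
    """
    m, n = len(x), len(y)
    Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]  # e_i^T A y
    xA = [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]  # x^T A e_j
    E = sum(x[i] * Ay[i] for i in range(m))
    phi = [max(0.0, E - Ay[i]) for i in range(m)]  # excess payoffs, player x
    psi = [max(0.0, xA[j] - E) for j in range(n)]  # excess payoffs, player y
    V = 0.5 * (sum(p * p for p in phi) + sum(q * q for q in psi))
    x_new = [x[i] + dt * (phi[i] - x[i] * sum(phi)) for i in range(m)]
    y_new = [y[j] + dt * (psi[j] - y[j] * sum(psi)) for j in range(n)]
    return x_new, y_new, V

def simulate(A, x, y, dt=0.01, steps=20000):
    """Integrate from (x, y) and record the potential along the trajectory."""
    history = []
    for _ in range(steps):
        x, y, V = bnn_step(A, x, y, dt)
        history.append(V)
    return x, y, history
```

For matching pennies started away from equilibrium, the recorded potential drops by orders of magnitude and both play frequencies approach the uniform mixed play. The explicit Euler scheme introduces a small numerical drift, so the discrete potential need not decrease at every single step.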

A.4. Linear Programming Method for Nash Equilibria of Two-Player Zero-Sum Games
The theory of two-player zero-sum games was first developed by von Neumann in 1928 [32,33]. All results surveyed below are known [15,32-34], but our exposition here seems to be more concise than the others we know. Our starting point is to assume knowledge of the simplex method for linear programming.
Here and below, $\mathbf{1} = (1, 1, \dots, 1)^T$ denotes the vector with all entries equal to 1, of an appropriate dimension depending on the context. The linear programming (LP) aspect of the two-player zero-sum game theory is based on the following theorem, which encapsulates the simplex method and can be found in most linear optimization textbooks, cf. [35]: whenever a primal LP problem and its dual LP problem have optimal solutions, the optimal values for both LP problems must be the same.
We note that the optimal solution $x^*$ of the dual LP problem is referred to as the shadow price, or the Lagrange multipliers, of the primal LP problem, and vice versa. Also, the simplex algorithm for the primal problem simultaneously finds both the optimal solution $y^*$ and its shadow price $x^*$. The same holds for the dual problem. For convenience, we also need the following result.
Lemma 1 Let $S$ be the simplex defined by $x \ge 0$ and $\mathbf{1}^T x = 1$. Then for any vector $c$ of the same dimension, $\min_{x \in S} c^T x = \min_i c_i$ and $\max_{x \in S} c^T x = \max_i c_i$.

As before for a two-player zero-sum game, $E(x, y) = x^T A y$, for which Lemma 1 will be repeatedly applied below.

Because both extreme values are reached by an NE point $y$ and $x$, the equalities hold. The second equivalence is obvious from the first. □

An NE as a solution $(x, y)$ to this optimization problem is also referred to as an optimal game solution, or a game solution for short.

Conversely, because $x$, $y$ are the optimal solutions for the dual pair with the optimal value $v$, [...] for all $x$. In particular, [...] is a basic feasible point to the LP problem. Since $v$ is the optimal value of the LP solution, we must have $u \ge v$, contradicting the assumption $u < v$. Exactly the same argument applies to the primal LP problem. □
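The LP route to a game solution can be sketched for small games. The code below assumes the standard maximin form, in which player $y$ maximizes $u$ subject to $A y \ge u\mathbf{1}$, $\mathbf{1}^T y = 1$, $y \ge 0$, and player $x$ solves the mirrored problem. Instead of the simplex method it enumerates all basic feasible points with exact rational arithmetic, a brute-force stand-in that is practical only for small payoff matrices. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def solve_linear(M, b):
    """Solve M z = b exactly by Gauss-Jordan elimination; None if M is singular."""
    k = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][k] for r in range(k)]

def maximin_lp(A):
    """Maximize u subject to A y >= u 1, 1^T y = 1, y >= 0 over basic feasible points."""
    m, n = len(A), len(A[0])
    # Constraints that may be active, written over the unknowns (y_1..y_n, u).
    payoff_rows = [([Fraction(a) for a in A[i]] + [Fraction(-1)], Fraction(0))
                   for i in range(m)]
    bound_rows = [([Fraction(int(k == j)) for k in range(n)] + [Fraction(0)], Fraction(0))
                  for j in range(n)]
    simplex_row = ([Fraction(1)] * n + [Fraction(0)], Fraction(1))
    best = None
    for active in combinations(payoff_rows + bound_rows, n):
        rows = [simplex_row] + list(active)
        z = solve_linear([r for r, _ in rows], [rhs for _, rhs in rows])
        if z is None:
            continue
        y, u = z[:n], z[n]
        feasible = all(v >= 0 for v in y) and all(
            sum(A[i][j] * y[j] for j in range(n)) >= u for i in range(m))
        if feasible and (best is None or u > best[1]):
            best = (y, u)
    return best

def solve_zero_sum(A):
    """Return (x, y, u, v): both optimal mixed plays and the two LP optimal values."""
    y, u = maximin_lp(A)
    neg_At = [[-A[i][j] for i in range(len(A))] for j in range(len(A[0]))]
    x, neg_v = maximin_lp(neg_At)  # player x minimizes v, i.e. maximizes -v
    return x, y, u, -neg_v
```

In this sketch the two optimal values `u` and `v` coincide, as LP duality requires, and for rock-paper-scissors both players obtain the uniform mixed play with game value zero.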

Figure 1. (a) Protein/domain space and protein networks. Each protein is composed of a set of domains (shown in the domain space), and groups of proteins from the protein space form protein families (i.e., protein clusters). Within each protein family there exists a network graph. In this diagram the individual proteins are represented by linear strings of domains, and their network edges are shown as dashed lines; (b) Method overview, showing the pipeline for constructing a protein affinity network from a set of domain architectures.

Figure 2. The similarity matrix

Figure 5. RGS protein affinity network. This protein network is a subset of the larger RGS sequence set (see Figure 1 for information). RGS proteins contain various domains in different combinations, but all share the RGS domain (Pfam family: PF00615 or PF09128). We identified a total of 58 Pfam domains in the 66 RGS proteins. In this graph, the nodes represent distinct proteins, and the edges are directed so that the incoming edge weights of each node sum to 1. The edge color indicates the edge weight, with darker (black) edges indicating high weights and lighter (red) edges indicating low weights. Clusters in the network, represented by different node colors, are labeled in descending order of their average game value, i.e., Cluster 1 denotes the cluster with the largest average game value. See Figure 6 for the domain profiles of the proteins.

Figure 6. Domain profile. In this diagram, each row represents one of the proteins in the protein space (66 RGS proteins), while each column represents a domain (58 Pfam domains in this example). The color bar on the right shows the color scheme for the diversity weights x. A black shaded entry denotes a weight of 1 (or nearly 1), while a cyan shaded entry indicates that the domain was present in the protein but was assigned a weight of 0 for the optimal solution. A blank entry means the absence of the corresponding domain. The clusters in the graph are separated by horizontal magenta lines and labeled along the y-axis. The proteins in each cluster are arranged according to their game value, with the largest appearing as the lowest row in the cluster.

Figure 7. The RGS proteins included in each of the top four clusters. Conserved domains in each protein were identified using the Conserved Domain Database Search (CD-Search) at the National Center for Biotechnology Information (NCBI) website [28]. Domains belonging to the RGS domain superfamily (cl02565) are shown in italics, and the parental RGS subfamily (if one exists) is shown in square brackets. A domain identified as non-specific (below the domain-specific E-value threshold) is indicated with *.
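For the BNN equation of a two-player zero-sum game with $E(x, y) = x^T A y$, the excess payoff $\psi_j = \max\{0,\; x^T A e_j - x^T A y\}$ is what player $y$ gains over its mixed play by switching to its $j$th pure strategy, i.e. it is zero whenever the switch does not pay, and the same for player $x$'s, $\phi_i = \max\{0,\; x^T A y - e_i^T A y\}$. And the total excess payoffs are $\Phi = \sum_i \phi_i$ and $\Psi = \sum_j \psi_j$. For every solution of the BNN equation, define the following total excess payoff potential function: $V = \tfrac{1}{2}\left(\sum_i \phi_i^2 + \sum_j \psi_j^2\right)$.

A primal LP problem has an optimal solution if and only if its dual LP problem does. Moreover, if they do, the solutions $y^*$ and $x^*$ must satisfy $c^T y^* = b^T x^*$.

$E(x, y)$ is referred to as the game value for player $y$.

Proposition The two-player zero-sum game value is unique.

Proof. Let $(x, y)$ and $(x^*, y^*)$ be two optimal solutions with game values $u = E(x, y)$ and $v = E(x^*, y^*)$, respectively. Then by the result above,

$$u = E(x, y) \le E(x^*, y) \le E(x^*, y^*) = v,$$

because $(x, y)$ is an NE for the first inequality and $(x^*, y^*)$ is an NE for the second inequality. Since $u$, $v$ are the game values of two arbitrary NEs, we have by the same argument $v \le u$, and hence $u = v$. □

Proposition The dual LP problem for the primal LP problem of $\max z = u$ subject to $A y \ge u\mathbf{1}$, $\mathbf{1}^T y = 1$, $y \ge 0$ is $\min z = v$ subject to $A^T x \le v\mathbf{1}$, $\mathbf{1}^T x = 1$, $x \ge 0$. Therefore, the optimal value is the same and the solution of one problem is part of the shadow price of the other.

Proof. [...] we have the dual LP problem reduced to $\min z = v$ [...].

The following two theorems complete our compilation of the basic theory for two-player zero-sum games. The first is the LP algorithm for finding NEs and the second is the maximin theorem.

That is, $(x, v)$ is a basic feasible point for the LP problem $\min z = u$ [...]. $(x, v)$ must be an optimal solution to the LP problem. If not, there is an $x'$ and $u$ such that [...] that $(x, y)$ is an NE. Similar arguments apply to the primal LP problem. This proves the necessary condition.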