The Game of Life, Decision and Communication

The game of life represents a spatial environment of cells that live and die according to fixed rules of nature. In the basic variant of the game a cell's behavior can be described as reactive and deterministic, since each cell's transition from its current state to the subsequent state is straightforwardly defined by the rules. Furthermore, it can be shown that the share of the environment occupied by alive cells decreases quickly and levels out at a very small value (around 3%), virtually independent of the initial number of alive cells. In this study we show that this occupation share can be strongly increased if alive cells become more active by making non-deterministic sacrificial decisions according to their individual positions. Furthermore, we apply signaling games in combination with reinforcement learning to show that results can be improved even further if cells learn to signal in order to steer the behavior of neighbor cells. This result supports the assumption that individual behavior and local communication support the optimization of resource usage and constitute important steps in the evolution of creature and man.


Introduction
Conway's game of life [1] is a cellular automaton implemented on an m × n grid of cells. A cell can be in one of two possible states: dead or alive. Each cell interacts with its eight horizontally, vertically, or diagonally adjacent neighbor cells. At each time step transitions occur, defined by four simple rules. In this article these rules are considered as unalterably written in stone, and are therefore called rules of nature, defined as follows: 1) under-population: any alive cell with fewer than two alive neighbor cells dies; 2) surviving: any alive cell with two or three alive neighbor cells lives on to the next generation; 3) overcrowding: any alive cell with more than three alive neighbor cells dies; 4) reproduction: any dead cell with exactly three alive neighbors becomes an alive cell. For a finite grid size, the proportion of alive cells is called the occupation share, since alive cells are considered as occupied and dead cells as empty. When the game of life is started with a randomly chosen set of alive cells, it can be shown for sufficiently large grid sizes that after a while the occupation share stabilizes at a specific value. This value is generally around 3% of all cells of the grid. Figure 1 shows the course of the number of alive cells over 3000 steps for 15 different simulation runs. Each run is an instance of the game of life applied on a 70 × 70 grid (4900 cells), where each cell has a 25% chance of being initially alive. Thus the expected initial occupation share is 25% (1225 alive cells). In each simulation run the number of alive cells strongly decreased and finally leveled out at occupation share values between 1.9% (92 alive cells) and 4.1% (199 alive cells), on average 3.2% (157.6 alive cells).
Furthermore, the final occupation share of around 3% seems to be independent of the initial occupation share. This is supported by further experiments with different initial occupation shares of 12%, 25% and 50%, with 15 simulation runs for each setting. In each experiment the number of alive cells decreased during runtime and finally leveled out at an occupation share of around 3% on average.
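The four rules of nature and the occupation share can be sketched in a few lines of Python. This is a minimal illustration and not the simulation code used for the experiments: the function names are hypothetical, the grid is stored as a set of alive (row, column) positions, and cells outside the finite grid are assumed to be permanently dead, since the text does not state how grid borders are handled.

```python
import random

def random_grid(rows, cols, p_alive, rng):
    """Return the set of alive cells; each cell is alive with probability p_alive."""
    return {(r, c) for r in range(rows) for c in range(cols)
            if rng.random() < p_alive}

def alive_neighbors(cell, alive):
    """Count alive cells among the eight adjacent positions."""
    r, c = cell
    return sum((r + dr, c + dc) in alive
               for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0))

def rules_of_nature(alive, rows, cols):
    """Apply the four rules once and return the next set of alive cells."""
    nxt = set()
    for r in range(rows):
        for c in range(cols):
            k = alive_neighbors((r, c), alive)
            if (r, c) in alive and k in (2, 3):    # surviving
                nxt.add((r, c))
            elif (r, c) not in alive and k == 3:   # reproduction
                nxt.add((r, c))
            # every other alive cell dies (under-population or overcrowding)
    return nxt

def occupation_share(alive, rows, cols):
    """Proportion of grid cells that are alive."""
    return len(alive) / (rows * cols)

if __name__ == "__main__":
    rng = random.Random(0)                 # the seed is an arbitrary choice
    grid = random_grid(70, 70, 0.25, rng)
    for _ in range(100):
        grid = rules_of_nature(grid, 70, 70)
    print(occupation_share(grid, 70, 70))
```

The border assumption matters mostly for small grids; a wrap-around (toroidal) grid would be an equally plausible reading of the setup.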
Thus there is an interesting fact to observe: no matter how high the initial number of alive cells is, the rules of nature of the game of life cause a strong decrease of alive cells down to a very low level of around 3%, at which it stabilizes. If all cells are considered as resources and alive cells represent usage of resources, then the occupation share depicts a utilization value. Consequently, with an occupation share of 3% of all possible resources, utilization is fairly poor.
In this article we deal with the following question: is it possible to keep the occupation share, and therefore the utilization value, at a higher level? If the alive cells have the opportunity to make a pre-"rules of nature" decision to sacrifice themselves (decide to die), is it then possible to keep the occupation share at a higher level? At first sight it sounds like a paradox that sacrifice might increase the occupation share, but it possibly restricts the overcrowding effect. In the next section we will show that the final occupation share can be increased by simple fixed pre-"rules of nature" decisions of sacrifice.

Sacrifice Decisions in Pre-Games
As mentioned in Section 1, an appropriate action that improves the occupation share of the population should not alter the basic game of life rules: they are fixed by nature. Furthermore, the creation of new cells is not allowed and can only happen by the reproduction rule of nature. An action that can be added is the deletion of cells before the rules of nature are applied, which frees occupied resources. Since we want cells to make such decisions on their own, this can be seen as sacrifice.

The Non-Deterministic n-Die Game
For that purpose we integrate a pre-"rules of nature" game that an alive cell can play. In such a pre-game the cell can decide whether to sacrifice itself and therefore die, or to stay alive. In the following we introduce a simple variant of such a pre-game, called the non-deterministic n-die game.

Definition 2.1: The non-deterministic n-die game
Given is the game of life as introduced. C_L denotes the set of alive cells in the current round of play, where for all c_i ∈ C_L the neighborhood N_i is defined as follows: N_i = {c_j ∈ C_L | c_j is a neighbor of c_i}. For a number n ∈ ℕ, 1 ≤ n ≤ 8, the non-deterministic n-die game is defined by the following "three phases" algorithm:
1) Initialization: Create a list A_L and include all alive cells c_i ∈ C_L in a random order
2) Sacrifice Decision: For all c_i ∈ A_L, in list order:
   a) if |N_i| = n: delete c_i from C_L and from the neighborhood N_j of all other cells c_j
3) Rules of Nature: Apply the rules of nature of the game of life
Note that steps 1 and 2 constitute the pre-game: all cells with n neighbors sacrifice themselves. Further, this happens in a non-deterministic way: the cells are ordered in a random sequence and each sacrificing cell is also deleted from the neighborhood of all other cells (step 2). Thus, for example, a cell that initially had more than n neighbors may have exactly n neighbors when checked in step 2(a). Consequently, the fact that the list A_L is ordered randomly makes the algorithm non-deterministic. All in all, the non-deterministic n-die game realizes a game of life with a pre-game, where cells act one after another and decide to die if they have exactly n neighbors at their turn of decision.
Note that the algorithm implements a fixed decision rule for cells: sacrifice yourself if you have exactly n neighbors. Thus, to make a decision, cells only have access to information about the direct neighborhood. This is intended, since even the rules of nature are exclusively based on neighborhood arrangements. In our opinion, restricting access to direct neighborhood information is an important requirement for all following pre-games, since this property reflects the spatial character of the rules of nature of the game of life. We denote this requirement as the local information rule.
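The pre-game described above (phases 1 and 2) can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, the alive cells are stored as a set of (row, column) positions, and the cells are sorted before shuffling only so that a seeded run is reproducible.

```python
import random

OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
           if (dr, dc) != (0, 0)]

def alive_neighbors(cell, alive):
    """Count alive cells among the eight adjacent positions."""
    r, c = cell
    return sum((r + dr, c + dc) in alive for dr, dc in OFFSETS)

def n_die_pregame(alive, n, rng):
    """Phases 1 and 2 of the n-die game; returns the cells that stay alive."""
    order = sorted(alive)        # phase 1: list of all alive cells ...
    rng.shuffle(order)           # ... put into a random order
    survivors = set(alive)
    for cell in order:           # phase 2: one decision per cell, in list order
        if alive_neighbors(cell, survivors) == n:
            # removing the cell from the shared set also removes it
            # from the neighborhood of every other cell
            survivors.remove(cell)
    return survivors

if __name__ == "__main__":
    block = {(1, 1), (1, 2), (2, 1), (2, 2)}   # every cell has 3 neighbors
    print(len(n_die_pregame(block, 3, random.Random(0))))
```

In the 2 × 2 block every cell initially has exactly 3 neighbors, but only the first cell in the random order sacrifices itself in a 3-die game: afterwards the remaining three cells have 2 neighbors each. Which cell dies depends on the order, which is the source of the non-determinism. Phase 3 then applies the rules of nature to the returned set.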

Simulation Experiments and Results
To find out if the non-deterministic n-die game supports a better occupation share, we started experiments for different values of n. First, we ascertained from basic tests that n-die games with n < 3 have, if any, a detrimental effect on the game of life, since the affected cells i) would die anyway by under-population and ii) are very likely located in a sparsely populated area and are therefore probably important in keeping neighbor cells from dying by under-population. Furthermore, n-die games with n > 6 have (almost) no effect on the game of life, since cells with that many neighbors i) almost never emerge during a simulation run and ii) would directly die by overcrowding. Consequently, we started simulation experiments for n-die games with 3 ≤ n ≤ 6: the 3-die game, the 4-die game, the 5-die game and the 6-die game. For each n-die game we performed 15 simulation runs over 3000 simulation steps. The resulting courses of the number of alive cells over time are depicted in Figure 2.
As shown in Figure 2, the different n-die games cause quite different cell behavior. While the 3-die game performs worse than the basic game of life in terms of occupation share, the 4-die game performs slightly better and the 5-die game performs remarkably better. The 6-die game performs roughly as well as the basic game of life (as depicted in Figure 1).

Explanatory Approach of Results
The simulation results of the four different n-die games reveal remarkable differences in performance. To highlight these differences and to compare them with the basic game, Figure 3 depicts the box plots of the final occupation shares for the basic game and all 4 experiments. The key to explaining these differences is to analyze the way pre-game decisions interact with the rules of nature of the game of life. The 3-die game sacrifices cells that have 3 neighbors. These cells would survive under the rules of nature, thus the sacrifice rule causes faster dying in comparison to the basic game of life. Not only is the decrease faster, but the final occupation share, at 1.8%, is also only about half of that of the basic game.
The 4-die game produces three effects. First, it sacrifices cells that would die anyway by overcrowding under the rules of nature, thus there is no direct acceleration of dying as in the 3-die game. Second, in weakly crowded areas it causes neighbor cells to die by under-population that would survive in the basic game. Thus it indirectly accelerates dying. Third, it clears crowded areas and thereby rescues other cells that would otherwise die by overcrowding according to the rules of nature. The third effect seems to be much stronger than the second one, since the final average occupation share, at 6.9%, is more than twice as high as the final value of the basic game. The extremely high variation of these values reveals a strong competition between both effects.
The 5-die game also sacrifices cells that would die anyway by overcrowding under the rules of nature. Furthermore, the other effects already described for the 4-die game are expected as well. But since a cell with 5 neighbors is likely to be in a very crowded area, the third effect described for the 4-die game is even more helpful in the 5-die game: many cells in the neighborhood are rescued that would normally die by overcrowding. This effect is remarkably strong, since the average occupation share after 3000 steps, at 14.7%, is almost 5 times as high as that of the basic game.
The 6-die game sacrifices cells that would probably die because of overcrowding anyway. But it has to be taken into account that cells with 6 neighbors are very rare during a simulation run, especially considering that each cell initially starts with two neighbors on average. Thus the effect on the population is minute and we observe, at 3%, a resulting occupation share similar to that of the basic game of life.
In summary, we showed that a simple pre-game in which cells sacrifice themselves can strongly increase the occupation share. In particular, the 5-die game shows a remarkable performance, with an occupation share of 14.7% that is almost 5 times as high as that of the basic game. Nevertheless, we will show that further features like learning and neighborhood communication can improve the performance even more.

Neighborhood Situations and Learning
Within the n-die game, cells make their decisions depending on their individual circumstances, determined by the number of neighbors. But what would happen if they were able to gain more information? What if they could obtain neighbor cells' circumstances and take this additional information into account to make their decision? To answer these questions we applied a more elaborate pre-game, called the n × m-die learning game, that enables cells to request information about the circumstances of neighbor cells and to make their decision based on their own and a neighbor's situation. Furthermore, the cells do not have fixed rules as in the previous section, but learn rules by learning dynamics. Thus, before we give the definition of the algorithm of the n × m-die learning game, we first introduce the learning dynamics applied in our models, called reinforcement learning.

Reinforcement Learning
Reinforcement learning can be captured by a simple model based on urns, also known as Pólya urns [2]. An urn models a probabilistic choice in the sense that the probability of making a particular decision is proportional to the number of balls (in the urn) that correspond to that action choice. By adding or removing balls from an urn after each encounter, an agent's (here: cell's) behavior is gradually adjusted. In this work we apply reinforcement learning in such a way that the cells of the game of life learn how to behave in the pre-game.
Let us formalize this. First of all, we distinguish between different states of cells by means of the number of alive neighbor cells a particular alive cell has. Thus T = {t_1, t_2, t_3, t_4, t_5, t_6, t_7, t_8} is the set of states a cell can be in, where to be in state t_i means to have i alive neighbor cells. Furthermore, there are two actions between which a cell can choose: to sacrifice itself and therefore die, or to stay alive. Thus A = {a_die, a_stay} is the set of possible actions. Finally, the cell's basis for a decision also involves a neighbor's state; thus we define a situation as a state tuple, and the set of situations as Γ = {γ = (t_i, t_j) | t_i ∈ T is the state of an alive cell c, t_j ∈ T the state of an alive neighbor cell of c}.
A simple reinforcement learning account RL = {σ, Ω} is defined by a response rule σ and an update rule Ω. The response rule depicts a probabilistic action choice for a given situation, thus σ ∈ (Γ → Δ(A)). As already mentioned, such a response rule can be modeled by an urn model in the following way: for each situation γ there is an urn ʊ_γ filled with balls of any type a ∈ A. To make a probabilistic action choice in a given situation means to draw a ball from the appropriate urn. Thus the response rule σ(a|γ) is defined as in Definition 3.1: with a(ʊ_γ) the number of balls of type a in urn ʊ_γ and |ʊ_γ| the overall number of balls in this urn, σ(a|γ) = a(ʊ_γ) / |ʊ_γ|.
Next, the update rule of a reinforcement learning account should reinforce successful behavior. But how can we measure whether an action is successful for a given situation? In our model we call an action a successful if it contributes in a positive way to the occupation share of the whole population. To be more concrete: let OS_a be the occupation share of the next round if action a is performed, and OS_¬a the occupation share of the next round if not. An action a is considered successful if and only if OS_a > OS_¬a, in other words, if the occupation share of the next round is higher with performing action a than without performing it. Consequently, the update rule is informally defined as follows: if action a is successful in situation γ, then increase the number of balls of type a in urn ʊ_γ by one ball. In this way successful behavior reinforces itself, since it makes the appropriate choice more probable in subsequent rounds.
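The urn model can be made concrete with a small Python sketch. The class and method names are hypothetical. The initial filling of 50 balls per action follows the initialization used later for the n × m-die learning game, and the success flag is passed in from outside because its computation depends on the game.

```python
import random

class Urn:
    """Urn with one ball count per action type."""

    def __init__(self, actions, initial=50):
        self.balls = {a: initial for a in actions}

    def probability(self, action):
        """Response rule: balls of this type divided by all balls in the urn."""
        return self.balls[action] / sum(self.balls.values())

    def draw(self, rng):
        """Draw one ball at random, i.e. choose an action."""
        actions = list(self.balls)
        weights = [self.balls[a] for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]

    def reinforce(self, action, successful):
        """Update rule: add one ball of the chosen type if the action succeeded."""
        if successful:
            self.balls[action] += 1

if __name__ == "__main__":
    urn = Urn(["die", "stay"])
    action = urn.draw(random.Random(0))
    urn.reinforce(action, successful=True)
    print(action, urn.probability(action))
```

After one successful action the probability of choosing it again rises from 50/100 to 51/101, which is the self-reinforcing effect described above.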

The n × m-Die Learning Game
The idea of the n × m-die learning game is similar to the n-die game in the sense that cells can make a pre-"rules of nature" decision to sacrifice. But there are two crucial differences. First, in the n-die game the basis of decision making is an alive cell's state t, whereas in the n × m-die learning game it is its situation (own state and an alive neighbor's state). Second, in the n × m-die learning game the decision-finding process is modeled by reinforcement learning, while in the n-die game the rules are fixed.
Thus, by taking the n-die game as a template and incorporating reinforcement learning for decisions grounded on situations, the n × m-die learning game can be defined as follows:

Definition 3.2: The n × m-Die Learning Game
Given is the game of life as introduced. C_L denotes the set of alive cells in the current round of play, N_i denotes the set of alive neighbor cells of cell c_i. RL = {σ, Ω} is a reinforcement learning account with urns ʊ_γ for all γ ∈ Γ. The n × m-die learning game is defined by the following "three phases" algorithm:
1) Initialization: For all γ ∈ Γ: fill urn ʊ_γ with 50 balls of type a_die and 50 balls of type a_stay
2) Sacrifice Decision: For all c_i ∈ C_L:
   a) pick randomly a neighbor c_j ∈ N_i and request its state t_m
   b) play action a via response rule σ(a|γ), where γ = (t_n, t_m), and t_n is the state of c_i
   c) if a = a_die: label c_i as dead cell
   d) make an urn update of urn ʊ_γ via update rule Ω
   Delete all cells labeled as dead
3) Rules of Nature: Apply the rules of nature of the game of life

In each round an alive cell requests the state of a random alive neighbor cell (step 2(a)), and for the state tuple of its own and the neighbor's state γ = (t_n, t_m) it chooses an action a according to the reinforcement learning response rule σ (step 2(b)). If action a_die is chosen, the cell gets the label dead and will be deleted after the loop. To complete the reinforcement learning process, the appropriate urn ʊ_γ is updated in step 2(d) according to update rule Ω of the reinforcement learning account.
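The sacrifice phase of the algorithm above can be sketched in Python as follows. Several points are assumptions rather than part of the definition: all names are hypothetical; cells without any alive neighbor skip the pre-game, because they have no state in T and no neighbor to ask; and success (OS_a > OS_¬a) is estimated for each cell in isolation, by applying the rules of nature to the current configuration with and without that cell and counting alive cells in the surrounding 3 × 3 block, the only region the cell's removal can affect. The text does not specify how the two occupation shares are obtained, so this is only one possible reading.

```python
import random

OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
           if (dr, dc) != (0, 0)]

def neighbors(cell, alive):
    """List of alive cells adjacent to the given position."""
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in OFFSETS if (r + dr, c + dc) in alive]

def next_alive_near(cell, alive, rows, cols):
    """Cells alive after the rules of nature within the 3x3 block around cell."""
    r0, c0 = cell
    count = 0
    for r in range(max(0, r0 - 1), min(rows, r0 + 2)):
        for c in range(max(0, c0 - 1), min(cols, c0 + 2)):
            k = len(neighbors((r, c), alive))
            if k == 3 or (k == 2 and (r, c) in alive):
                count += 1
    return count

def make_urns():
    """Phase 1: one urn per situation (own state, neighbor state), 50/50 filled."""
    return {(i, j): {"die": 50, "stay": 50}
            for i in range(1, 9) for j in range(1, 9)}

def sacrifice_phase(alive, urns, rows, cols, rng):
    """Phase 2: every alive cell decides; returns the cells that stay alive."""
    labeled_dead = set()
    for cell in sorted(alive):
        nbrs = neighbors(cell, alive)
        if not nbrs:
            continue                      # assumption: no neighbor to ask
        other = rng.choice(nbrs)          # step 2(a)
        situation = (len(nbrs), len(neighbors(other, alive)))
        urn = urns[situation]
        action = rng.choices(["die", "stay"],
                             weights=[urn["die"], urn["stay"]])[0]  # step 2(b)
        # assumption: counterfactual comparison on the current configuration
        with_die = next_alive_near(cell, alive - {cell}, rows, cols)
        with_stay = next_alive_near(cell, alive, rows, cols)
        if action == "die":
            labeled_dead.add(cell)        # step 2(c)
            successful = with_die > with_stay
        else:
            successful = with_stay > with_die
        if successful:
            urn[action] += 1              # step 2(d)
    return alive - labeled_dead           # delete all cells labeled as dead
```

Phase 3 then applies the rules of nature to the returned set. Because labeled cells are only deleted after the loop, all cells decide on the same configuration, unlike in the n-die game.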

Simulation Experiments and Results
To find out if the n × m-die learning game supports a better occupation share than the basic game of life and possibly outperforms the quite successful non-deterministic 5-die game of Section 2, we started 20 simulation runs of the n × m-die learning game, each over 3000 simulation steps. The resulting courses of the number of alive cells over time are depicted in Figure 4.
While some of the simulation runs performed poorly and the number of alive cells decreased to around 100, most of them performed remarkably well, with values oscillating around 1500 alive cells after 3000 steps. This discrepancy is caused by the fact that cells arrive at a successful learning strategy in most but not all simulation runs. In other words, the experiments yielded a partition into (a) runs that succeeded in developing a successful strategy (successful runs) and (b) runs that failed (failed runs). As depicted in the right illustration of Figure 4, the difference is sharp: failed runs, with an average occupation share of 1.4%, perform worse than the worst n-die game of the experiments in Section 2.1, the 3-die game. On the other hand, successful runs, with an average occupation share of 28.4%, perform almost twice as well as the 5-die game, by far the best n-die game of the experiments of Section 2.1. There is good reason to believe that the cells improve performance simply because they have additional information, not only about their own state, but also about the neighbor's state. And if they succeed in learning a successful strategy, they outperform the best fixed strategy that only considers a cell's own state: the 5-die game. This result raises the question of what such a successful strategy looks like: what kind of strategy must the cells learn to maintain a high occupation share?
As a basic result, all successful strategies that evolved in simulation runs have particular properties in common: they all involve situations for which cells learn a definite decision. This can be illustrated by two sets: a set of situations for which cells definitely sacrifice, Γ_die = {γ ∈ Γ | σ(a_die|γ) = 1.0}, and a set of situations for which cells definitely do not sacrifice, Γ_stay = {γ ∈ Γ | σ(a_die|γ) = 0.0}. In almost all successful simulation runs the cells' strategies contain the same definite decisions, depicted by the following two sets, given here in the form implied by the neighbor treatment rules of Definition 3.3: Γ_die = {(t_i, t_4) | 1 ≤ i ≤ 8} and Γ_stay = {(t_i, t_j) | 1 ≤ i ≤ 8, 1 ≤ j ≤ 3}. Note that both sets reveal that definite decisions are made only depending on the state of the cell's neighbor and completely independently of the cell's own state, since the own state t_i ranges over all possible numbers of alive neighbors 1 ≤ i ≤ 8 in both sets Γ_die and Γ_stay. Furthermore, two rules for a successful strategy can be derived from these results. These two rules are called neighbor treatment rules, given as follows:

Definition 3.3: Neighbor treatment rules
For the n × m-die learning game a successful strategy can be characterized by the following two rules:
1) Sacrifice if your neighbor has exactly 4 neighbors
2) Never sacrifice if your neighbor has fewer than 4 neighbors

Furthermore, there were no other states or combinations that describe salient features that all successful strategies had in common. Thus, these results suggest two interesting conclusions. First, it is much more important for a cell's decision process to include its neighbor's state than its own: cells learn successful strategies by specific definite decisions independent of their own state, but completely dependent on the neighbor's state. Second, successful strategies that follow the two rules of Definition 3.3 achieve an average occupation share of 28.4% and therefore outperform the strategy of the 5-die game by almost a factor of 2.
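Written as a decision function, the two neighbor treatment rules look as follows. The function name and the return convention are hypothetical. For neighbor states above 4 the rules fix no decision, so the sketch returns None there instead of guessing.

```python
def neighbor_treatment(neighbor_state):
    """Map a neighbor's number of alive neighbors (1..8) to a definite decision.

    Returns "die", "stay", or None where the two rules leave the choice open.
    The cell's own state is deliberately not a parameter.
    """
    if neighbor_state == 4:
        return "die"     # rule 1: sacrifice if the neighbor has exactly 4 neighbors
    if neighbor_state < 4:
        return "stay"    # rule 2: never sacrifice if it has fewer than 4
    return None          # no definite decision was common to all successful runs

if __name__ == "__main__":
    print([neighbor_treatment(j) for j in range(1, 9)])
```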

Neighborhood Communication
In the n × m-die learning game of Section 3 the cells are able to obtain the neighbor's state, that is, the number of neighbors a neighbor has. But this precondition violates the local information rule we postulated at the end of Section 2.1, since cells gain information by observing facts beyond the direct neighborhood.
One way to comply with the local information rule and still give cells access to the states of direct neighbors is the following: instead of cell c_i observing the state of a neighbor cell c_j, cell c_j communicates its state to c_i if c_i requests the information. Thus, to gain information beyond their own direct neighborhood, cells have to communicate. This can be modeled with a game-theoretic account called the signaling game.

The Signaling Game
A signaling game, first introduced in [3], is a dynamic game SG = [(S, R), T, M, A, U] played between a sender S and a receiver R. S has private information: a state t ∈ T. To communicate the state, S sends a message m ∈ M to R, and R responds with a choice of action a ∈ A. For each round of play, players receive utilities depending on the communicative performance. Further, we consider a variant of this game where the number of messages n = |M| is variable. We denote a signaling game with n messages as SG_n. In addition, the set of states T and the set of actions A are given as already defined in Section 3.1. Finally, the utility function U: T × A → ℕ is defined by the way the appropriate action improves the occupation share: U(t, a) = 1 if OS_a > OS_¬a, else 0.
Additionally, the cells have to learn how to assign messages to states and actions to messages, thus we combine the signaling game with a reinforcement learning account (see e.g. [4] for a more detailed description): there are urns ʊ_t for states and urns ʊ_m for messages, and two different response rules: the sender response rule σ(m|t) = m(ʊ_t) / |ʊ_t|, where m(ʊ_t) is the number of balls of type m in urn ʊ_t, and the receiver response rule ρ(a|m) = a(ʊ_m) / |ʊ_m|, where a(ʊ_m) is the number of balls of type a in urn ʊ_m.

The n-Messages Signaling Game
As initially remarked, the innovation of the new game is the fact that a cell c_i cannot observe the state t_k of a neighbor cell c_j. Instead, cell c_i can request this information, and c_j has to communicate that it is in state t_k by sending a message m ∈ M that cell c_i has to construe. Furthermore, the decisions i) which message to send for a given state, and ii) how to construe a received message, are not initially given, but have to be learned by reinforcement learning. Thus the n-messages signaling game conforms to the n × m-die learning game and extends it by communication via signaling games. It is defined as follows:

Definition 4.1: The n-Messages Signaling Game
Given is the game of life as introduced. C_L denotes the set of alive cells in the current round of play, N_i denotes the set of alive neighbor cells of a cell c_i. SG_n = [(S, R), T, M, A, U] is a signaling game as already introduced, RL = {σ, ρ, Ω} is a reinforcement learning account with urns ʊ_t for all t ∈ T and urns ʊ_m for all m ∈ M. The n-messages signaling game is defined by the following "three phases" algorithm:
1) Initialization:
   a) For all t ∈ T, for all m ∈ M: fill urn ʊ_t with 100/|M| balls of type m
   b) For all m ∈ M: fill urn ʊ_m with 50 balls of type a_die and 50 balls of type a_stay
2) Sacrifice Decision: For all c_i ∈ C_L:
   a) pick randomly a neighbor c_j ∈ N_i and make a state request for its state t
   b) c_j sends a message m ∈ M via sender response rule σ(m|t)
   c) c_i plays action a via receiver response rule ρ(a|m)
   d) if a = a_die: label c_i as dead cell
   e) make an urn update of urns ʊ_t and ʊ_m via update rule Ω
   Delete all cells labeled as dead
3) Rules of Nature: Apply the rules of nature of the game of life

Notice that the game conforms to the n × m-die learning game in almost all points, but instead of the neighbor's state being observed, it is communicated via a signaling game (steps 2(a) to 2(c)), and communicative behavior is updated by reinforcement learning (step 2(e)).
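The communication part of the sacrifice phase can be sketched as below. All names are hypothetical, messages are represented by the integers 0 to n − 1, and the utility (1 if the action improved the occupation share, else 0) is passed in from outside, since its computation belongs to the surrounding game. Fractional ball counts are used as weights so that 100/|M| need not be an integer; this is a simplification of the urn picture.

```python
import random

STATES = range(1, 9)          # t_1 .. t_8
ACTIONS = ("die", "stay")

def make_urns(n_messages):
    """Phase 1: sender urns per state, receiver urns per message."""
    messages = range(n_messages)
    sender = {t: {m: 100 / n_messages for m in messages} for t in STATES}
    receiver = {m: {a: 50 for a in ACTIONS} for m in messages}
    return sender, receiver

def draw(urn, rng):
    """Draw a ball type with probability proportional to its count."""
    kinds = list(urn)
    return rng.choices(kinds, weights=[urn[k] for k in kinds])[0]

def communicate(state, sender, receiver, rng):
    """The neighbor in `state` sends a message; the requesting cell picks an action."""
    message = draw(sender[state], rng)       # sender response rule
    action = draw(receiver[message], rng)    # receiver response rule
    return message, action

def update(state, message, action, utility, sender, receiver):
    """Update rule: reinforce both choices by the obtained utility (0 or 1)."""
    sender[state][message] += utility
    receiver[message][action] += utility

if __name__ == "__main__":
    rng = random.Random(0)
    sender, receiver = make_urns(4)
    m, a = communicate(4, sender, receiver, rng)
    update(4, m, a, 1, sender, receiver)     # pretend the action was successful
    print(m, a, sender[4][m], receiver[m][a])
```

Both urns are reinforced with the same utility, so a message only acquires a stable meaning when the sender's use of it and the receiver's reaction to it are jointly successful.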

Simulation Experiments and Results
To find out if the n-messages signaling game can at least maintain the performance of the n × m-die learning game, we started experiments for different values of n: the 2-messages, the 4-messages, the 6-messages and the 8-messages signaling game, each with 20 simulation runs over 3000 simulation steps. As a basic result, all n-messages signaling games performed roughly as well as the n × m-die learning game: if cells learned a successful communication strategy, the occupation share was around 25% on average after 3000 steps; if not, the occupation share was below 3% on average. Additionally, these results show that the number of messages n did not influence the performance of successful runs.
But the number of messages n did influence the percentage of simulation runs in which a successful strategy evolved: the higher n, and therefore the more messages are at the cells' disposal for communication, the more simulation runs were successful. This result indicates that the probability that successful communication emerges increases with the number of messages n. The percentage of successful runs depending on n is depicted in Figure 5 (left).
In a next step we analyzed what kind of communication strategy led to successful communication. As shown in the last section, the two neighbor treatment rules were 1) sacrifice if your neighbor has exactly 4 neighbors and 2) don't sacrifice if your neighbor has fewer than 4 neighbors. These rules are incorporated in almost all successful communication strategies. An exemplary successful strategy that evolved (at least in a quite similar fashion) in all successful runs is depicted in Figure 5 (right). The first rule is realized by the fact that cells learned exactly one message m_x, called the death message, to communicate that the sender cell is in state t_4, and the receiver cell learned to construe it as a_die, thus to sacrifice. Furthermore, this message was also quite often used to communicate t_6 (the reason for this usage remains to be analyzed and goes beyond the scope of this article).
All the other messages were either a) used to communicate that the sender cell is in one of the other states, with the receiver cell having learned to construe them as a_stay, or b) not used at all. In the former case all these messages m_i are members of the live-on set M_L, a subset of M \ {m_x}, which realizes the second neighbor treatment rule. If (M \ {m_x}) \ M_L ≠ Ø, then there were also remaining messages not used for any state, called unused messages m_u ∈ (M \ {m_x}) \ M_L.

Conclusions
As a basic result, we were able to show that the game of life's performance, in terms of occupation share of alive cells, can be strongly improved by integrating pre-games that give alive cells the opportunity to sacrifice themselves before the rules of nature of the game of life are applied. In Sections 2 and 3 we were able to show that such pre-"rules of nature" decisions were especially successful (in comparison with the basic game of life's occupation share of around 3%) when complying with the following strategies:
• 5-die rule: Sacrifice if you have exactly 5 neighbors (occupation share: 14.7%);
• Neighbor treatment rules: Sacrifice if your neighbor has exactly 4 neighbors, and don't sacrifice if your neighbor has fewer than 4 neighbors (occupation share: 28.4%).

Further, we argued that the state of the neighbor cannot be directly observed without violating the local information rule of direct neighbor access. Thus in Section 4 we integrated signaling games to obtain the neighbor's state by communication. We were able to show that 1) two messages were sufficient to learn a successful communication strategy, but 2) the more messages are provided, the higher the probability that cells learn a successful communication strategy. All in all, we were able to show that if cells have the possibility to make decisions according to their individual state, and especially according to a neighbor's state, they can strongly increase the occupation share. Furthermore, by integrating communication via signaling games, we were able to detect specific strategy patterns that ensure successful communication. Thus, it would be quite interesting to analyze how such characteristics of strategic communication depend on the game of life's rules of nature. Experiments with altered rules of nature can possibly reveal such dependencies and may therefore give interesting hints about the impact of environmental features on the evolution of communication.

Figure 1. The number of alive cells over 3000 simulation steps of the game of life for 15 different runs. The number of cells decreases from an initial occupation share of around 25% (1225 alive cells) to a final 3.2% (157.6 alive cells) on average over all runs.

Figure 2. The number of alive cells over 3000 simulation steps for different n-die games. The number of cells decreases from an initial occupation share of 25% (1225 cells) to average values of 1.8% (88 cells) for the 3-die game, 6.9% (337 cells) for the 4-die game, 14.7% (721 cells) for the 5-die game and 3% (147 cells) for the 6-die game.

Figure 3. Box plots of the final occupation share for the basic game and the 4 n-die games.

Definition 3.1: Reinforcement Learning Response Rule
By defining a(ʊ_γ) as the number of balls of type a in urn ʊ_γ and |ʊ_γ| as the overall number of balls in this urn, the probability σ(a|γ) to make action choice a in situation γ is given as follows: σ(a|γ) = a(ʊ_γ) / |ʊ_γ|.

Figure 4. The number of alive cells over 3000 simulation steps for the n × m-die learning game: the average occupation share over all runs is 17.6%. As evident in the left figure, there is a clear separation between runs with successful and failed learning of an efficient strategy. For the former group the average occupation share is 28.4% (right figure, middle box plot), for the latter group 1.4% (right figure, right box plot).

Figure 5. The number of messages n of the game positively influences the probability of learning a successful strategy (left figure). The strategy that succeeds in the n-messages signaling games needs at least two messages to distinguish between the two actions. The death message m_x is generally sent in states t_4 and t_6, as exemplified by a successful strategy profile (right figure).