Equilibria Immune to Deviations by Coalitions in Infinite Horizon Non-Cooperative Games

Infinite horizon 
discrete time non-cooperative games with observable actions of players and discounting 
of future single period payoffs are a suitable tool for analyzing emergence and 
sustainability of cooperation between all players because they have no last 
period. A subgame perfect equilibrium is the standard solution concept 
for them. It requires only immunity to unilateral deviations in any subgame. It 
does not address immunity to deviations by coalitions. In particular, it does 
not rule out cooperation based on punishments of unilateral deviations that the 
grand coalition would like to forgive. We first briefly review concepts of 
renegotiation-proofness that rule out such forgiveness. Then we discuss the 
concept of strong perfect equilibrium that requires immunity to all deviations 
by all coalitions in all subgames. In games with only one level of players 
(e.g. members of the population engaged in the same type of competitive 
activity), it fails to exist when the Pareto efficient frontier of the set of 
single period payoff vectors has no sufficiently large flat portion. In such a 
case, it is not possible to punish unilateral deviations in a weakly Pareto efficient 
way. In games with two levels of players (e.g. members of two populations in a 
symbiotic relationship, while activities within each population are 
competitive), it is possible to overcome this problem. The sum of benefits of 
all players during a punishment can be the same as when nobody is punished but 
its distribution between the two populations can be altered in favor of the 
punishers.


Introduction
The emergence and sustainability of cooperation between human beings or animals (especially in populations whose members are originally quite egoistic) is one of the most interesting and important topics that can be studied using game theoretic models (see [1] for basic results in this field). Infinite horizon discrete time non-cooperative games with discounting of future single period payoffs and observable actions of players are a suitable tool for the analysis of this issue. An infinite horizon removes the problem of the (often artificially introduced) last period, in which a player can deviate from cooperation without fear of punishment. Discounting of future single period payoffs, besides reflecting the limited patience of players, can also reflect the probability that the game continues in the following period. This is an additional justification for using an infinite horizon model. We express players' payoffs as their average discounted single period payoffs.
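As a minimal sketch of the payoff criterion just described, the Python function below computes an average discounted payoff under the standard normalization, (1 - delta) times the discounted sum of single period payoffs. The function name, the zero-based period index, and the restriction to payoff streams that become constant after a finite prefix are illustrative assumptions, not notation from the paper.

```python
from typing import Sequence


def average_discounted_payoff(prefix: Sequence[float], tail: float, delta: float) -> float:
    """Return (1 - delta) * sum_{t >= 0} delta**t * u_t.

    The stream u_t equals `prefix` for the first len(prefix) periods and the
    constant `tail` in every later period (an assumption of this sketch).
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    horizon = len(prefix)
    head = sum((delta ** t) * u for t, u in enumerate(prefix))
    # Geometric tail: (1 - delta) * sum_{t >= horizon} delta**t * tail
    # simplifies to delta**horizon * tail.
    return (1.0 - delta) * head + (delta ** horizon) * tail
```

With this normalization a constant stream of payoff u has average discounted payoff u, so payoffs over the infinite horizon are directly comparable with single period payoffs.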
In non-cooperative games the set of feasible actions of a coalition is the Cartesian product of the sets of feasible actions of its members. Thus, cooperation between players has the form of coordination of their activities (taking into account their effects on payoffs of other players), without creating new opportunities. In coalitional games, forming a coalition can lead to new actions that were not available to its members when they acted separately or as members of another coalition. Therefore, emergence and sustainability of cooperation in an environment modeled by a non-cooperative game is a more robust result than its emergence and sustainability in an environment (with the same players) modeled by a coalitional game. (To be precise, this holds for super-additive coalitional games. See, for example, [2], p. 258, for their definition.) Moreover, an infinite horizon non-cooperative game (as well as any other extensive form non-cooperative game) allows us to describe precisely the order of actions taken by players. A coalitional game describes only payoff consequences of actions.
A countably infinite repetition of a strategic form non-cooperative game is the simplest case of an infinite horizon discrete time non-cooperative game. In such a game, players' actions in the current period do not affect the sets of feasible actions in the future periods and their payoff consequences. Thus, the strategic environment after different non-terminal histories is the same. In other words, the same infinite horizon discrete time non-cooperative game is played after each non-terminal history. Despite this, players' current period actions can depend on histories of past actions. This makes it possible to punish deviations from the agreed-upon strategy profile.
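History-dependent punishment can be illustrated with a hypothetical two-player repeated prisoner's dilemma, which is not taken from the paper. Under a grim trigger profile both players cooperate until someone defects and then defect forever. The sketch below checks the one-shot deviation condition on the cooperative path. The stage payoffs R (mutual cooperation), T (unilateral defection) and P (mutual defection), with T > R > P, are assumed for illustration, and payoffs are normalized as average discounted payoffs.

```python
def grim_trigger_sustains_cooperation(R: float, T: float, P: float, delta: float) -> bool:
    """Check whether a unilateral deviation from cooperation is unprofitable.

    R, T, P are hypothetical stage payoffs of a symmetric prisoner's dilemma:
    mutual cooperation, unilateral defection, mutual defection.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    # Average discounted payoff of cooperating forever (constant stream R).
    cooperate_forever = R
    # Deviate once (payoff T), then receive the punishment payoff P forever.
    deviate_once = (1.0 - delta) * T + delta * P
    return cooperate_forever >= deviate_once


def critical_discount_factor(R: float, T: float, P: float) -> float:
    """Smallest delta satisfying the condition above, assuming T > R > P."""
    return (T - R) / (T - P)
```

With the assumed values R = 3, T = 5 and P = 1 the condition holds exactly when delta is at least 0.5. Because mutual defection is a Nash equilibrium of the stage game, the punishment phase is itself immune to unilateral deviations.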
A difference game is a more complex case of an infinite horizon discrete time non-cooperative game. In these games, players' current period actions affect the sets of feasible actions in the future periods and their payoff consequences. Thus, the strategic environment after different non-terminal histories is different. In other words, different infinite horizon discrete time non-cooperative games are played after different non-terminal histories. Even the sets of players who take actions after different histories can be different.
A stochastic game is the most complex case of an infinite horizon discrete time non-cooperative game. In these games, the sets of players' feasible actions and their payoff consequences in the future periods depend not only on players' actions in the current period but also on random factors. Thus, again, different infinite horizon discrete time non-cooperative games are played after different non-terminal histories. The sets of players who take actions after different histories can be different and they can depend on random factors.
Although players' actions that are not observable by other players pose additional challenges for analysis of cooperation, in the present paper we restrict attention to games with observable actions of players.
Players can take actions in continuous time. Nevertheless, unless they are automata, they react to actions of other players only at discrete time instants. This justifies the use of discrete time games for the analysis of cooperation.
A subgame perfect equilibrium, developed by Selten [3] [4], is the standard solution concept used for infinite horizon discrete time non-cooperative games with discounting of future single period payoffs and observable actions. It requires that no player can increase his average discounted single period payoff by a unilateral deviation in any subgame. (In a game with observable actions of players each subgame is a proper subgame.) It does not take into account deviations by non-singleton coalitions. In particular, it does not address the question of whether carrying out the punishment of a unilateral deviation is in the interest of all punishing players. Of course, a unilateral deviation of a punisher during a punishment is punished in a way that is credible from the point of view of any single player. Nevertheless, a deviation by the grand coalition (consisting of all players) during the punishment of a unilateral deviation is not addressed. It can happen that it is in the interest of all players (both the punishers and the punished player) to forgive a deviation and return to cooperation (as if the deviation had not occurred). Anticipation of such behavior can stimulate unilateral deviations. This problem led to the development of various versions of a renegotiation-proof equilibrium that require immunity of an equilibrium strategy profile to renegotiation by the grand coalition. We discuss them in the following section.

Renegotiation-Proof Equilibria
Most of the concepts of a renegotiation-proof equilibrium are defined for two-player games and restrict the set of strategy profiles to which the grand coalition can renegotiate. An equilibrium strategy profile is immune to renegotiation to another strategy profile if and only if the latter does not increase the continuation payoff of each player. A weakly renegotiation-proof equilibrium developed in [5] (which coincides with an internally consistent equilibrium in [6]) is a subgame perfect equilibrium with the additional property that there are no two continuation equilibrium payoff vectors that are strictly Pareto-ranked. Thus, it requires only immunity to renegotiation to some other continuation equilibrium of the analyzed equilibrium. This restriction reduces the requirements on players' computational and negotiation abilities. In order to play any subgame perfect equilibrium the players have to specify a continuation equilibrium for all non-terminal histories, including those that do not occur along the equilibrium path. Thus, switching from the continuation equilibrium prescribed as a punishment of a unilateral deviation to a continuation equilibrium prescribed when no player is punished does not require any additional calculation. The additional negotiation effort reduces to reaching unanimous consent to an already specified strategy profile. In the case of forgiving a punishment it reduces to reaching an agreement to play as if the deviation had not taken place.
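The defining property of a weakly renegotiation-proof equilibrium can be checked mechanically once the continuation equilibrium payoff vectors are listed. The sketch below assumes a finite list of such vectors supplied by the user. It does not verify that they come from a subgame perfect equilibrium, and the function names are illustrative.

```python
from itertools import permutations
from typing import Iterable, Sequence


def strictly_dominates(v: Sequence[float], w: Sequence[float]) -> bool:
    """True if v gives every player strictly more than w."""
    return all(a > b for a, b in zip(v, w))


def has_strictly_ranked_pair(payoff_vectors: Iterable[Sequence[float]]) -> bool:
    """True if some continuation payoff vector strictly Pareto dominates another.

    A subgame perfect equilibrium whose continuation payoff vectors contain
    such a pair is not weakly renegotiation-proof.
    """
    vectors = list(payoff_vectors)
    # permutations yields ordered pairs, so both directions are checked.
    return any(strictly_dominates(v, w) for v, w in permutations(vectors, 2))
```

For instance, continuation payoffs (3, 3) without punishment and (1, 1) during punishment are strictly Pareto-ranked, so a profile generating them fails the test. The hypothetical vectors (3, 3), (3.5, 1.5) and (1.5, 3.5), in which a punisher gains while the punished player loses, pass it.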
The grand coalition's possibilities for renegotiation (as well as requirements on players' computational and negotiation abilities) are taken one step further in a strong renegotiation-proof equilibrium developed in [5] (which coincides with a strongly consistent equilibrium in [6]). It is a weakly renegotiation-proof equilibrium with the additional property that none of its continuation equilibrium payoff vectors is strictly Pareto dominated by some continuation equilibrium payoff vector of another weakly renegotiation-proof equilibrium. Thus, a strong renegotiation-proof equilibrium is always a weakly renegotiation-proof equilibrium but not vice versa.
A renegotiation-proof equilibrium developed in [7] is a Markov perfect equilibrium with the additional property that none of its continuation equilibrium payoff vectors is strictly Pareto dominated by the equilibrium payoff vector of an alternative Markov perfect equilibrium. The alternative Markov perfect equilibrium need not be renegotiation-proof itself. A Markov perfect equilibrium is a subgame perfect equilibrium in Markov strategies. A player's Markov strategy assigns the same action to all non-terminal histories, after which a player takes an action, that belong to the same payoff relevant state, i.e. the sets of feasible actions after them and their payoff consequences are the same.
Renegotiation-proof equilibria in infinite horizon games with more than two players are studied in [8] [9] and [10]. In [8] and [9] only immunity to renegotiation to some other continuation equilibrium of the analyzed equilibrium is required. Nevertheless, the necessary and sufficient condition for immunity of an equilibrium strategy profile to renegotiation is different than in the papers discussed above. In [8] all players who did not deviate should block renegotiation from a continuation equilibrium triggered by a deviation back to the original continuation equilibrium. Thus, an equilibrium strategy profile is immune to renegotiation from a continuation equilibrium triggered by a deviation back to the original continuation equilibrium if and only if the latter does not increase continuation payoff of any player who did not deviate. Since the stage game studied in [8] is symmetric and punishments are symmetric, the latter definition applies also to immunity to renegotiation from one continuation equilibrium triggered by a deviation to another continuation equilibrium triggered by a deviation. Nevertheless, an equilibrium strategy profile is immune to renegotiation from the original continuation equilibrium to a continuation equilibrium triggered by a deviation if and only if the latter does not increase a continuation payoff of each player. Thus, there is a slight inconsistency in the necessary and sufficient condition for immunity of an equilibrium strategy profile to renegotiation.
The concept of a strictly weakly renegotiation-proof equilibrium in [9] replaces the requirement that no two continuation equilibrium payoff vectors are strictly Pareto ranked by the stronger requirement that no two continuation equilibrium payoff vectors are weakly Pareto ranked. An equilibrium strategy profile is immune to renegotiation to another continuation equilibrium if and only if the latter does not increase the continuation payoff of at least one player without decreasing the continuation payoff of any other player. Thus, the necessary and sufficient condition for immunity of an equilibrium strategy profile to renegotiation is consistent and it is formulated in the same way for renegotiation between any two continuation equilibria.
A strict renegotiation-proof equilibrium in [10] does not impose any restriction on the set of strategy profiles to which the grand coalition can renegotiate. It requires immunity of an equilibrium strategy profile to renegotiation to any strategy profile in each subgame. An equilibrium strategy profile is immune to renegotiation to another strategy profile if and only if the latter does not increase the continuation payoff of at least one player without decreasing the continuation payoff of any other player.
When no restriction is placed on the set of strategy profiles to which the grand coalition can renegotiate, the main potential obstacle for the existence of a renegotiation-proof equilibrium is the conflict between (weak or strict) Pareto efficiency of payoff vectors in continuation equilibria triggered by a deviation and effectiveness of the punishment of a deviation. The latter need not be achieved by a strategy profile satisfying the former requirement. Since this conflict arises also in the case of a (strict) strong perfect equilibrium, we postpone its discussion to Subsection 3.1.

Strong Perfect Equilibrium
The most general solution concepts for infinite horizon games with observable actions of players, which are immune to deviations by coalitions, are a strong perfect equilibrium developed in [11] and a strict strong perfect equilibrium developed in [10]. The former requires that no coalition can in some subgame increase the continuation payoff of each of its members. The latter requires that no coalition can in some subgame increase the continuation payoff of at least one of its members without decreasing the continuation payoff of any other member. Hence, each continuation equilibrium payoff vector of a (strict) strong perfect equilibrium is weakly (strictly) Pareto efficient.
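The distinction between weak and strict Pareto efficiency used here can be made concrete on a finite set of payoff vectors. In the sketch below a vector is weakly Pareto efficient if no feasible vector gives every player strictly more, and strictly Pareto efficient if no feasible vector gives some player more and no player less. The finite feasible set and the function names are assumptions made for illustration.

```python
from typing import Iterable, Sequence

Vector = Sequence[float]


def better_for_all(v: Vector, w: Vector) -> bool:
    """True if v gives every player strictly more than w."""
    return all(a > b for a, b in zip(v, w))


def better_for_some_worse_for_none(v: Vector, w: Vector) -> bool:
    """True if v gives no player less than w and some player strictly more."""
    return all(a >= b for a, b in zip(v, w)) and any(a > b for a, b in zip(v, w))


def is_weakly_pareto_efficient(v: Vector, feasible: Iterable[Vector]) -> bool:
    """No feasible vector improves the payoff of every player."""
    return not any(better_for_all(w, v) for w in feasible)


def is_strictly_pareto_efficient(v: Vector, feasible: Iterable[Vector]) -> bool:
    """No feasible vector improves some payoff without lowering another."""
    return not any(better_for_some_worse_for_none(w, v) for w in feasible)
```

In the hypothetical feasible set {(2, 2), (2, 3), (1, 4)} the vector (2, 2) is weakly but not strictly Pareto efficient, because (2, 3) raises the second payoff without lowering the first.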
A strong perfect equilibrium is sometimes criticized for being too strong. It is argued (as in [12], which studies games with a finite time horizon) that deviations by coalitions that are themselves vulnerable to a deviation by some strict subset of the deviating coalition need not be punished. Nevertheless, [9], Section 6, shows that in infinitely repeated games with discounting of future single period payoffs all deviations by all coalitions other than the grand one have to be punished. This argument can be generalized to all infinite horizon non-cooperative games with discounting of future single period payoffs and observable actions of players.

Games with Only One Level of Players
In most infinite horizon non-cooperative games with discounting of future single period payoffs studied in the literature all players are at the same level. There is only one population of players. Relations between them are horizontal. Their activities are usually competing, in some cases complementary. Examples include firms producing identical goods, substitutes, or complements, and animals of the same or related species competing for food. For generic payoff functions the weak Pareto efficient frontier of the set of single period payoff vectors has no sufficiently large flat portion (i.e., a sufficiently large portion that is a subset of a hyperplane of the vector space with the dimension equal to the number of players). The reason is that a change in a player's single period action changes not only his single period payoff but also the single period payoffs of all other active players. If this is the case in an infinitely repeated game (e.g., if the weak Pareto efficient frontier of the set of stage game payoff vectors in an infinitely repeated two-player game contains no line segment of positive length), then the vector of average discounted payoffs in a continuation equilibrium triggered by a deviation cannot be weakly Pareto efficient. (When the use of correlated strategies is possible in the stage game, the set of stage game payoff vectors is convex. Hence its Pareto efficient frontier is either flat or concave when viewed from the origin.) Therefore, such a repeated game does not have a strong perfect equilibrium. In an infinite horizon game with different states and sets of players' feasible actions dependent on them (as in stochastic games) the situation may be even worse. Flat Pareto efficient frontiers of sets of single period payoff vectors at different states need not be sufficient for weak Pareto efficiency of the vector of average discounted payoffs in a continuation equilibrium triggered by a deviation. The latter vector is a convex combination of single period payoff vectors at different states. If the sets of single period payoff vectors are large enough, it can be dominated by a vector of payoffs generated by infinite repetition of some state and a single period payoff vector at it.
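One ingredient of this argument, that a convex combination of distinct frontier points stays on a flat frontier but falls strictly inside a curved one, can be illustrated numerically. The two feasible sets below (a quarter disc of radius 1 and the triangle under the line u1 + u2 = 1, both for two players with non-negative payoffs) are hypothetical examples chosen for this sketch, not sets studied in the paper.

```python
import math
from typing import Sequence, Tuple


def convex_combination(v: Sequence[float], w: Sequence[float], weight: float) -> Tuple[float, ...]:
    """Return weight * v + (1 - weight) * w, coordinate by coordinate."""
    return tuple(weight * a + (1.0 - weight) * b for a, b in zip(v, w))


def slack_curved(v: Sequence[float]) -> float:
    """Gap between v and the curved frontier u1**2 + u2**2 = 1.

    For non-negative v a positive gap means some feasible vector gives both
    players strictly more, so v is not weakly Pareto efficient.
    """
    return 1.0 - math.hypot(v[0], v[1])


def slack_flat(v: Sequence[float]) -> float:
    """Gap between v and the flat frontier u1 + u2 = 1."""
    return 1.0 - (v[0] + v[1])
```

The points (1, 0) and (0, 1) lie on both frontiers. Their equal-weight combination (0.5, 0.5) has zero gap to the flat frontier but a positive gap to the curved one, where it is strictly dominated by the frontier point with both coordinates equal to the square root of 0.5.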

Games with Two Levels of Players
Strategic interaction between living organisms or organizations at different vertical levels is quite common. These levels can stem from technological relationships (e.g. firms producing inputs and firms using them) or from another type of division of labor (e.g. from cooperation between different species inhabiting and defending the same territory).
Games with two levels of players and perfect information are so far rare in the literature. Reference [10] is an exception. It analyzes (in Chapter 3) an infinite horizon game between groups of firms on both sides of the market: the group of producers of inputs and the group of firms buying and using these inputs (in their production process or for sale).
In games with two levels of players there are (at least) two types of actions. Players' actions of the first type have an impact both on other players and on the environment (in which players' activities take place). Their values are decision variables in the maximization of the sum of players' payoffs, both in a single period and over the infinite horizon. Actions of the second type have an impact only on the distribution of the sum of players' payoffs between them. (In [10] outputs of firms are actions of the first type and prices of inputs are actions of the second type.) It is important that the distribution of the sum of payoffs depends on the actions of the second type of all players. There is no player who could impose it by his unilateral action. (In [10] this property of the game follows from the fact that a contract for trade between two players, specifying both the delivered quantity of the good and the price, is concluded if and only if the seller's and the buyer's proposals coincide.) Thus, games in which one player acts as the boss of all others and sets their remuneration (as in principal-agent games, see [13] for their description) do not belong to the class of games with two levels of players considered in this paper.
Two types of players' actions imply that the strict Pareto efficient frontier of the set of single period payoff vectors has an infinite flat portion (i.e., a portion that is an infinite subset of a hyperplane of the vector space with the dimension equal to the number of players). Thus, the main obstacle to the existence of a strong perfect equilibrium (strict strong perfect equilibrium) is removed. This holds also for an infinite horizon game with different states and sets of players' feasible actions dependent on them. Then, for a discount factor close to one, the sum of players' payoffs over the infinite horizon is maximized by infinite repetition (after a finite number of periods) of a single state or a finite cycle of states. In the latter case, the strict Pareto efficient frontier of the set of payoff vectors over the cycle has an infinite flat portion.
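The role of the flat portion can be sketched with a stylized two-player example. It assumes that actions of the first type have already maximized the sum of single period payoffs and that the outcome of the actions of the second type can be summarized by a vector of shares of that sum. Both the share representation and the numbers are hypothetical. In [10] the distribution results from input prices on which sellers and buyers must agree.

```python
import math
from typing import Sequence, Tuple


def distribute(total: float, shares: Sequence[float]) -> Tuple[float, ...]:
    """Split a fixed sum of payoffs according to shares that add up to one."""
    if not math.isclose(sum(shares), 1.0):
        raise ValueError("shares must sum to 1")
    return tuple(total * s for s in shares)


def pareto_dominates(v: Sequence[float], w: Sequence[float]) -> bool:
    """True if v gives no player less than w and some player strictly more."""
    return all(a >= b for a, b in zip(v, w)) and any(a > b for a, b in zip(v, w))


# Hypothetical maximized sum of single period payoffs.
TOTAL = 10.0
normal = distribute(TOTAL, (0.5, 0.5))      # nobody is punished
punishment = distribute(TOTAL, (0.8, 0.2))  # player 2 is punished by player 1
```

Both vectors lie on the hyperplane where payoffs sum to the maximized total, so neither Pareto dominates the other. The punished player receives less, while the grand coalition has no common interest in abandoning the punishment.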
A (strict) strong perfect equilibrium has to be immune also to deviations by coalitions other than the grand one. The conditions under which a strategy profile with this property exists can be quite complicated. (For an example, see [10], Proposition 3.4 on p. 36, conditions (3.33)-(3.35).) Nevertheless, they can be simplified when the set of players is the minimal one needed to achieve the maximal sum of single period payoffs (or the maximal sum of payoffs over a finite cycle), i.e. if removal of any player would decrease the sum of single period payoffs (or payoffs over a finite cycle). Such situations arise from the evolution of herds or families of animals. Then a coalition may be unable to weakly Pareto improve the vector of payoffs of its members in any subgame by refusing to cooperate with players in the complementary coalition. If it is able to do so, it may be possible to punish its deviation by assigning (for a finite number of periods) most of the maximized sum of players' payoffs to only one level of players.

Conclusion
Although it may be hard for all coalitions to coordinate actions of their members, at least some of them usually can do so. Therefore, it is important to study and apply solution concepts for infinite horizon non-cooperative games that are immune to deviations by coalitions. Such analysis is important also from an evolutionary viewpoint. Evolution can often lead to the formation of small groups of cooperating individuals at various vertical levels (e.g. animals in a symbiotic relationship), in which no one is redundant. That is, there are no "unemployed" individuals eagerly waiting for somebody else's failure in order to replace him. If a group becomes too large (so that it loses the above property), it usually splits. In such groups it is much easier to achieve coordination that is immune to deviations by coalitions than in large groups with "unemployed" members (as in a market with a large number of firms on both sides and free entry). Thus, the application of game theoretic solution concepts immune to deviations by coalitions can shed important light on the evolution of human society and animal species.