Equilibrium Allele Distribution in Trading Populations

This paper derives the conditions under which fitness-reducing alleles can survive in a long-run stationary equilibrium for a trading population, extending the results in Saint-Paul (2002) to arbitrary systems of sexual reproduction.


Introduction
In Saint-Paul (2002), I consider the evolution of the gene pool in a population under alternative economic institutions, and show that alleles that cannot survive natural selection under autarky can survive under trade, because individuals can specialize in activities so as to avoid the fitness disadvantages associated with these alleles. The results are based on a very simplified representation of sexual reproduction, with only one chromosome (instead of pairs of chromosomes), and only two loci that determine the individual's productivity at two activities that affect fitness. This paper extends these results to a more general system of sexual reproduction, with an arbitrary number of chromosomes and loci. Its contribution is twofold. First, it provides a set of assumptions under which one can meaningfully state that some alleles dominate their alternatives and eventually eliminate them in the long run. Second, it extends the results in Saint-Paul (2002) by characterizing the distribution of alleles for a trading population in a long-run equilibrium (LRE), defined as a stationary distribution of alleles which is also an equilibrium in an economic sense.
The central result is that fitness-reducing alleles can survive in a trading population, provided their frequency is not too large. However, the greater the number of loci that matter for fitness, the more stringent the conditions under which these alleles can survive. That means that in the long run, we expect low-type alleles to survive only at a relatively small number of loci. Knowing more about the long-run distribution of alleles when their initial distribution does not satisfy the conditions for an LRE would involve analyzing the dynamics, which I do not do here but which is an interesting topic for further research.

Notations and genetic properties of stationary populations
A genotype consists of an $n$-tuple $g = (p_1, \ldots, p_n)$, where $i = 1, \ldots, n$ denotes a particular locus, and $p_i \in \{0, 1, \ldots, K\}$ is interpreted as the number of alleles of the "high type" at locus $i$ (in the actual world where chromosomes come in pairs, one has $K = 2$). Therefore, there are $K - p_i$ alleles of the "low type" at locus $i$. The set of possible genotypes is denoted by $S$. We will also denote by $g[i]$ the $i$th element of $g$.

The survival function
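The survival rate of an individual only depends on its genotype, and is denoted by $\phi(g)$. Note that the function $\phi$ is not independent of culture: the opportunity to trade and specialize will dramatically change the mapping $\phi$. It is useful to introduce the genetic improvement operators $T_z^{(i)}$, which, for any genotype $g$ such that $g[i] \le K - z$, map it into another genotype $T_z^{(i)} g$ that carries $z$ more H-alleles at locus $i$ and is identical to $g$ at all other loci:
$$(T_z^{(i)} g)[i] = g[i] + z, \qquad (T_z^{(i)} g)[j] = g[j] \ \text{ for all } j \neq i.$$
We say that $\phi$ is monotonic at locus $i$ if
$$\phi(T_1^{(i)} g) \ge \phi(g) \quad \text{for all } g \text{ such that } g[i] < K.$$
Thus, having more of a high allele at locus $i$ cannot increase mortality, everything else equal. Note that this assumes that the role played by an allele in mortality has the same sign regardless of what other alleles are present.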
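As a concrete reading of the improvement operators, the following Python sketch represents a genotype as a tuple of H-allele counts. The helper names `improve` and `is_monotonic_at`, the 0-based locus index, the choice $K = 2$ and the example survival function are all assumptions made for illustration; monotonicity is read here as "adding one H-allele at locus $i$ never lowers survival".

```python
from itertools import product
from typing import Callable, Tuple

Genotype = Tuple[int, ...]
K = 2  # assumed: two alleles per locus, as with paired chromosomes


def improve(g: Genotype, i: int, z: int = 1) -> Genotype:
    """Hypothetical stand-in for T_z^(i): add z H-alleles at locus i (0-based)."""
    if g[i] + z > K:
        raise ValueError("a locus cannot carry more than K H-alleles")
    return g[:i] + (g[i] + z,) + g[i + 1:]


def is_monotonic_at(phi: Callable[[Genotype], float], n_loci: int, i: int) -> bool:
    """Check that one extra H-allele at locus i never lowers survival."""
    genotypes = product(range(K + 1), repeat=n_loci)
    return all(phi(improve(g, i)) >= phi(g) for g in genotypes if g[i] < K)


def example_phi(g: Genotype) -> float:
    """Illustrative survival rate, increasing in the total number of H-alleles."""
    return 0.4 + 0.1 * sum(g)


print(improve((0, 2), 0))                  # (1, 2)
print(is_monotonic_at(example_phi, 2, 0))  # True
```

The check enumerates all $(K+1)^n$ genotypes, so it is only meant for small examples.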

We will say that a locus $i$ is selective if $\phi(T_1^{(i)} g) > \phi(g)$ for all $g$ such that $g[i] < K$.

The distribution of offspring
We assume a quite general process for transmitting genes to offspring, which in particular is compatible with real-world genetics. When genotypes $g'$ and $g''$ mate, the fraction of their offspring with genotype $g$ is given by a probability distribution function $F_{g'g''}(g)$. We shall assume that it satisfies the following properties. The first is gene conservation:
$$u_i(g', g'') \equiv \sum_{g} F_{g'g''}(g)\, g[i] = \frac{g'[i] + g''[i]}{2}.$$
This says that on average, the number of high alleles at locus $i$ among offspring, denoted by $u_i(g', g'')$, is equal to its average between the two parents. For a given pair of parents, the average among actual offspring will generally differ from the parental average. However, with a continuum of individuals, the law of large numbers applies, and $u_i(g', g'')$ will be equal to the population average of the number of H-alleles at $i$ among all offspring of all couples with genotypes $g'$ and $g''$.

Allele independence
This assumption tells us that, among offspring with the same parental genotypes, the distribution of other genes among those who have the same number of high alleles at locus $i$ does not depend on that particular number. If that property did not hold, having many good alleles at one locus could in principle be systematically correlated with having many bad alleles at another locus, and this complementarity could sustain a positive amount of mortality-increasing alleles in the long run, or, conversely, eliminate mortality-reducing ones.

Mixing
For any $g'$, $g''$, $i$, and for all $z$ such that
$$\max(K/2,\, g'[i]) + \max(K/2,\, g''[i]) - K \;\le\; z \;\le\; \min(K/2,\, g'[i]) + \min(K/2,\, g''[i]), \qquad (5)$$
there exists $g$ such that $g[i] = z$, $F_{g'g''}(g) > 0$, and, for all $j \neq i$, $g[j]$ lies between $g'[j]$ and $g''[j]$.
The RHS of (5) is the maximum number of H-alleles at locus $i$ if one inherits $K/2$ alleles from each parent; the LHS is the minimum number of H-alleles.
This assumption says that for any number between these two bounds, there is a positive probability for a couple $g'$, $g''$ to have an offspring with exactly that number. Furthermore, we can pick that offspring such that at all other loci, its number of high alleles is between those of its two parents. Loosely speaking, that means that the distribution of offspring spans all possible cases.
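To see that these properties are compatible with real-world genetics, the sketch below builds one possible $F$: diploid Mendelian inheritance ($K = 2$) with loci transmitted independently. This is an illustrative special case, not the paper's general process. The function names are hypothetical, and `mixing_bounds` encodes one reading of the mixing interval, generalizing the self-mating bounds $2\max(K/2, g_0[i]) - K$ and $2\min(K/2, g_0[i])$ used in the proof of Proposition 1.

```python
from fractions import Fraction
from itertools import product

K = 2  # assumed diploid case


def locus_dist(a, b):
    """Offspring H-count at one locus when parents carry a and b H-alleles."""
    pa, pb = Fraction(a, K), Fraction(b, K)  # chance each parent passes on H
    return {0: (1 - pa) * (1 - pb),
            1: pa * (1 - pb) + (1 - pa) * pb,
            2: pa * pb}


def offspring_dist(g1, g2):
    """Illustrative F_{g'g''}: loci are inherited independently of each other."""
    per_locus = [locus_dist(a, b) for a, b in zip(g1, g2)]
    dist = {}
    for g in product(range(K + 1), repeat=len(g1)):
        prob = Fraction(1)
        for d, z in zip(per_locus, g):
            prob *= d[z]
        if prob > 0:
            dist[g] = prob
    return dist


def mean_h(dist, i):
    """Average number of H-alleles at locus i among offspring."""
    return sum(p * g[i] for g, p in dist.items())


def mixing_bounds(a, b):
    """Min and max H-count when K/2 alleles are inherited from each parent."""
    half = K // 2
    return max(half, a) + max(half, b) - K, min(half, a) + min(half, b)


F = offspring_dist((1, 2), (1, 0))
print(mean_h(F, 0), mean_h(F, 1))  # both equal the parental averages
print(mixing_bounds(1, 1), sorted({g[0] for g in F}))
```

Because loci are drawn independently here, allele independence holds by construction; in this example every $z$ within the bounds has positive probability.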

Monotonicity
For any $i$, any $g''$, any $g'$ such that $g'[i] < K$, and any $g$ such that g
This assumption says that if, instead of $g'$, a genetically improved genotype at locus $i$ mates with $g''$, then, holding the alleles at other loci constant, the proportion of H-alleles at locus $i$ improves in a first-order stochastic dominance sense: offspring are more likely to have a higher number of H-alleles at $i$. Formally, applying the T

Demographics
These assumptions allow us to write down the demographic evolution equations of each genotype. We denote by $N_t$ total population at date $t$ and by $x_g(t)$ the fraction of people with genotype $g$. People mate randomly. There are $x_{g'}(t)\, x_{g''}(t)\, N_t$ matches of types $g'$ and $g''$ at date $t$. They produce $\nu$ offspring, and a fraction $\phi(g)$ of offspring with genotype $g$ reach maturity. Consequently, $x_g(t)$ evolves according to
$$N_{t+1}\, x_g(t+1) = \nu N_t\, \phi(g) \sum_{g', g''} x_{g'}(t)\, x_{g''}(t)\, F_{g'g''}(g).$$
Adding all these equations across all possible genotypes we get that
It is also useful to define the population frequency of high alleles at locus $i$:
$$h_i(t) = \sum_{g} x_g(t)\, g[i].$$
Note that if the gene conservation law holds, then one also has

Elimination of less fit alleles
In this section, I provide the basic results regarding the elimination of less fit alleles. A first lemma, which derives from the random mating and mixing properties, states that if a genotype exists and if a high allele exists in the population at locus $i$, then we can find another genotype that differs from it only in that it is "improved" at locus $i$, unless, of course, the initial genotype has the maximum number of H-alleles at $i$.
LEMMA 1 - Assume the mixing property holds. Assume there exists a steady state, a locus $i$ and a genotype $g$ such that in that steady state, $x_g > 0$, $h_i > 0$ and $g[i] < K$. Then $x_{T_1^{(i)} g} > 0$.
PROOF - First note that because of random mating there exists a positive measure of matches between two arbitrary genotypes, provided these genotypes are in positive measure in the parent population.
If $g[i] > 0$, the mixing property applied at locus $i$ implies that the offspring of $g$ with itself include $T_1^{(i)} g$ with positive probability. Assume $g[i] = 0$. Since $h_i > 0$, there exists $g'$ such that $g'[i] > 0$ and $x_{g'} > 0$. We can then iterate the mixing property, by looking at stage $k$ at the mates between $g$ and $g^{(k)}$, , by applying the mixing property at locus $j$ we know that among the offspring of $g$ and $g^{(k)}$, there exists one $g^{(k+1)}$ such that | . In other words, the "genetic distance" between $g^{(k)}$ and $g$ strictly goes down with $k$. Once we have reached the stage for all $j \neq i$, we apply the same procedure to locus $i$, until we have produced an offspring such that

The following key result tells us that genes which increase mortality eventually disappear:

PROPOSITION 1 - Assume that one of these two conditions holds: (i) locus $i$ is selective, OR (ii) $\phi$ is monotonic at $i$ and there exists one genotype $\hat g$ such that $x_{\hat g} > 0$ in steady state, $\hat g[i] < K$, and $\phi(T_1^{(i)} \hat g) > \phi(\hat g)$. Assume (A3) and (A4) hold. Then in any steady state with $h_i > 0$, one has $h_i = K$.

PROOF - The frequency of the high allele at $i$ evolves according to
In steady state, we have that $N_{t+1}/N_t = n$, and
The term $Q(g', g'') = \sum_g \phi(g)\, F_{g'g''}(g)\, g[i]$ can be rewritten as follows:
That can be rewritten as:
This formula rests on the fact that all the genotypes such that $g[i] = z$ can be obtained by applying the transform T
Furthermore, the allele independence property implies that for $g$ such that
where $q_z(g', g'', i) = \sum_{g,\, g[i]=z} F_{g'g''}(g)$ is the total fraction of genotypes with $g[i] = z$ among the offspring of $g'$ and $g''$.[1] Note that one must have
Hence:
This inequality rests on the fact that $\sum_{z=0}^{K} q_z(g', g'', i) = 1$. It holds with a strict inequality unless all the $q_z(g', g'', i)$ but one are equal to zero.
We now show that unless $h_i = 0$ or $h_i = K$, there exists a pair of genotypes $(g', g'')$ such that $x_{g'} > 0$, $x_{g''} > 0$, and (14) strictly holds. First note that if $0 < h_i < K$, there exists a genotype $g_0$ such that $x_{g_0} > 0$ and $0 < g_0[i] < K$.[2] The mixing property implies that for two parents of the same genotype $g_0$, there is a positive probability of having an offspring $g$ such that $g[i] = z$, for any $z$ between $2\max(K/2, g_0[i]) - K$ and $2\min(K/2, g_0[i])$. As long as $K \ge 2$ and $0 < g_0[i] < K$, there are more than two values of $z$ that satisfy that property.
Consequently, there are at least two strictly positive values of $q_z(g_0, g_0, i)$, and one can take $g' = g'' = g_0$.
Thus, if $0 < h_i < K$, it must be that there exists a pair $(g', g'')$ such that $x_{g'} > 0$, $x_{g''} > 0$, and (14) strictly holds.
[1] If $q_0(g', g'', i) = 0$, we can write down the same steps using the smallest value of $z$ such that $q_z(g', g'', i) > 0$ as a benchmark.
[2] The only other possibility is to only have genotypes $g_1$ such that $g_1[i] = 0$ and genotypes $g_2$ such that $g_2[i] = K$, but random mating and mixing imply that they will produce offspring such that $0 < g[i] < K$.
Alternatively, consider the case where ϕ is monotonic. Then (14) also holds.
From (14), we get that
Once again, there exists a pair $(g', g'')$ such that $x_{g'} > 0$, $x_{g''} > 0$, and (15) strictly holds. The RHS can be rewritten
where the first step derives from (13) and the second one from (12).
Inequality (15) means that the fitness of the high alleles in the gene pool of the offspring of $g'$ and $g''$ is higher than the average fitness of the offspring as individuals, because those with more H-alleles at $i$ live longer. In order to get that, the allele independence property is needed. Otherwise, it could be that the offspring of $g'$, $g''$ that have a high $g[i]$ have a lower fitness than the others because they are systematically poorly endowed at other loci.
Going back to (11), we see that
where the strict inequality comes from the fact that $Q(g', g'') > S(g', g'')$ for at least one pair $(g', g'')$ such that $x_{g'} x_{g''} > 0$.
We now have
where we have applied gene conservation and $R(g', g'', z)$ is defined as
$$R(g', g'', z) = \sum_{g} F_{T_z^{(i)} g',\, g''}(g)\, \phi(g).$$
Observe that $R$ can be rewritten as
Iterating the monotonicity property, we find that $\Phi(v, z)$ is nondecreasing in $z$, while $\Phi(K, z)$ does not depend on $z$. We then have that
Since $\phi$ is monotonic, the term in brackets is nonpositive. Thus, the sum is nondecreasing in $z$, while the last term is constant in $z$. Therefore, the LHS is nondecreasing in $z$, for any $g$ such that $g[i] = 0$. Summing this property across these $g$'s, we also find that $R(g', g'', z)$ is nondecreasing in $z$. Roughly, that property means that the average mortality of offspring improves when one parent is genetically enhanced at locus $i$. The monotonicity property is needed to get that. Otherwise, it could be that parents with more H-alleles at $i$, everything else equal, have an $F_{g'g''}$ systematically biased toward high-mortality genotypes.
Let us now go back to (17), which we can rewrite
For a given $g''$, we have that $\sum_{g',\, g'[i]=0} \sum_{z=0}^{K} x_{T_z^{(i)} g'} = 1$, that $\frac{1}{2}(z + g''[i])$ increases with $z$ and that $R(g', g'', z)$ weakly increases with $z$. Thus, once again, we have the following inequality:
Consequently,
where the steady-state condition (10) has been used to derive the first term.
By virtue of (16), (3) and (6), the last term in that formula must be equal to $\tilde h_i / 2$, so that $\tilde h_i \ge n h_i / \nu$. (16) then implies that $h_i > h_i$, which is a contradiction. Hence, it must be that either $h_i = 0$ or $h_i = K$. Q.E.D.
The last set of inequalities tells us that since parents who have a greater $g'[i]$ have children with a higher fitness, these parents' children tend to increase the survival rate of the high allele at $i$ relative to average. Since, in addition, the survival rate of the high allele at $i$ among their children is greater than their children's average survival rate, these two effects together imply that the fitness of the high allele at $i$ is strictly higher than average. But that cannot be in steady state, unless $h_i = 0$ or $K$.
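The mechanism behind Proposition 1 can be illustrated numerically, though this is of course not a proof. The sketch below iterates the demographic recursion for a single selective locus, using one-locus Mendelian inheritance as an example $F$. The survival rates, the value of $\nu$, the initial distribution and the function names are all assumed for illustration, and matches are counted over ordered pairs of genotypes.

```python
K = 2     # assumed diploid case
NU = 4.0  # assumed number of offspring per match (nu in the text)


def mendel(g1, g2):
    """One-locus Mendelian offspring distribution (an illustrative F)."""
    pa, pb = g1[0] / K, g2[0] / K  # chance each parent passes on H
    return {(0,): (1 - pa) * (1 - pb),
            (1,): pa * (1 - pb) + (1 - pa) * pb,
            (2,): pa * pb}


def step(x, phi, F, nu):
    """One generation: returns the new genotype shares and N_{t+1}/N_t."""
    mass = {g: 0.0 for g in x}
    for g1, x1 in x.items():
        for g2, x2 in x.items():
            for g, f in F(g1, g2).items():
                # surviving offspring of genotype g from matches (g1, g2)
                mass[g] = mass.get(g, 0.0) + nu * phi[g] * x1 * x2 * f
    growth = sum(mass.values())
    return {g: m / growth for g, m in mass.items()}, growth


def h(x, i=0):
    """Population frequency of H-alleles at locus i (between 0 and K)."""
    return sum(share * g[i] for g, share in x.items())


phi = {(0,): 0.4, (1,): 0.5, (2,): 0.6}   # survival rises with H-alleles
x = {(0,): 0.25, (1,): 0.5, (2,): 0.25}   # assumed initial distribution
h_start = h(x)
for _ in range(200):
    x, growth = step(x, phi, mendel, NU)
print(h_start, h(x), growth)
```

In this example the H-allele frequency starts at 1 and approaches $K = 2$, so the low allele is driven out, in line with the proposition.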

We now describe how an individual's genotype $g$ affects his/her productivity at various activities, depending on the economic setting.
The alleles present at a given locus $i$ determine the individual's productivity at a corresponding activity denoted by the same index $i$. This productivity is a strictly increasing function $f_i(g[i])$ of $g[i]$, the number of H-alleles at locus $i$. Any individual has a total time endowment equal to 1. The time allocation constraint of genotype $g$ is therefore given by
$$\sum_{i=1}^{n} \frac{v_i}{f_i(g[i])} \le 1, \qquad (18)$$
where $v_i$ is the individual's output in activity $i$.
Finally the individual's fitness is
$$\phi = u(y_1, \ldots, y_n),$$
where $y_i$ is the individual's consumption of activity $i$, and $u$ is the "utility function", which is concave in each argument, and satisfies the "Inada conditions":
Proof - Type $T_1^{(i)} g$ has a more favorable time budget constraint than type $g$. Therefore, it achieves a higher fitness. The rest follows from the previous subsection. Q.E.D.
Note that the case h i = 0 is not of interest: it means that the high allele does not exist at that locus.

Trade
Let us now look at the trade case. Each good $i$ is traded at price $p_i$. We assume the following normalization for the price vector $(p_i)$:
People allocate their time between the various activities so as to maximize their income $R(g) = \sum_{i=1}^{n} p_i v_i$, subject to the time allocation constraint (18). Their demand vector is the one which maximizes $u$ subject to their budget constraint:
Types with lower incomes must achieve lower fitness and therefore disappear in the long run.
Furthermore, $\phi$ must be monotonic at all loci. The reason is that the vector $(v_1, \ldots, v_n)$ supplied by a genotype $g$ can also be supplied by genotype $T_1^{(i)} g$. On the other hand, all loci need not be selective, as genotypes with fewer H-alleles at locus $i$ may achieve the same income as fitter genotypes, by just specializing.
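The specialization argument can be made concrete with a small Python sketch. It assumes that the time constraint is linear, so that producing $v_i$ takes $v_i / f_i(g[i])$ units of time; the productivity schedules, the un-normalized prices and the function name `income` are hypothetical choices for illustration, and the paper's price normalization is not reproduced here.

```python
from fractions import Fraction

K = 2  # assumed diploid case

# Hypothetical productivity schedules f_i(k), strictly increasing in k.
f = [lambda k: 1 + k, lambda k: 2 + k]


def income(g, prices):
    """Maximal income of genotype g and the activities that attain it.

    With a linear time constraint sum_i v_i / f_i(g[i]) <= 1, income
    sum_i p_i v_i is maximized by spending all time on the activities
    where p_i * f_i(g[i]) is largest.
    """
    returns = [p * fi(k) for p, fi, k in zip(prices, f, g)]
    best = max(returns)
    return best, [i for i, r in enumerate(returns) if r == best]


# Un-normalized prices that leave the best genotype (K, K) indifferent
# between activities: p_i * f_i(K) is the same for every i.
prices = [Fraction(1, fi(K)) for fi in f]

print(income((2, 2), prices))  # income 1, indifferent between both activities
print(income((2, 0), prices))  # income 1, by specializing in activity 0
print(income((1, 1), prices))  # income 3/4, strictly below the best genotype
```

In this example, genotype (2, 0) carries only low alleles at the second locus yet earns the same income as the best genotype, whereas (1, 1) earns strictly less.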
Define a long-run equilibrium (LRE) as a stationary state such that the economy is in equilibrium, i.e. each genotype sets its supply and demand as just described, and markets clear for each good. The following proposition generalizes the results derived for the two-locus case in Saint-Paul (2002).
PROPOSITION 3 -
(i) In any LRE such that $h_i > 0$, $\forall i$, a given type only supplies goods corresponding to loci in their genotype where they have the highest
(ii) In any LRE such that $h_i > 0$, the price vector is $p^* = (p_1^*, \ldots, p_q^*)$ such that
(iii) In any LRE, there exists a locus $j$ such that $h_j = K$, i.e. allele H is fixed at locus $j$.
Proof of (i) - Iterating the mixing property with appropriately chosen parents, one can easily show that if $h_i > 0$, $\forall i$, in steady state there exists a strictly positive supply of genotypes with an arbitrary, strictly positive number of H-alleles $g[i]$ at each locus $i$. In particular, there exists a strictly positive mass of the best genotype $g_{\max}$: $x_{g_{\max}} > 0$. Next, note that if $R(g') > R(g)$, then genotype $g'$ achieves higher fitness than $g$, and hence $\phi(g') > \phi(g)$.
Assume there exists a genotype $\hat g$ such that $v_l(\hat g) > 0$ for $l$ such that , ..., $v_n(\hat g)$) achieves a strictly higher income level and is feasible (i.e. satisfies (18)). But, given that $\phi$ is monotonic, Proposition 1, under assumption (ii), would then imply that $h_i = K$, which makes it impossible for $\hat g$ to exist. Consequently, any type $g$ only supplies goods where it has an H-allele.
Proof of (ii) - The price vector defined by (20) is the one which makes type $g_{\max}$ indifferent between all activities. Assume there exists an LRE with a different price vector. Then there exists a pair of goods $(j, k)$ such that
and $v_j(g_{\max}) = 0$, since more income is yielded for type $g_{\max}$ by offering good $k$ than good $j$.
Since $u$ satisfies the Inada conditions, the demand for good $j$ is strictly positive; since $g_{\max}$ does not supply good $j$, there exists $g \neq g_{\max}$ such that $x_g > 0$ and $v_j(g) > 0$. By virtue of (i), $g[j] = K$. Furthermore, $g[k] < K$, otherwise $g$ would prefer to supply $k$ instead of $j$ as well.
The income of type $g$ is R
productive than $g$ at all activities. The supply vector
where the last inequality comes from (21). But, this cannot hold since it again
1 $\hat g$) > $\phi(\hat g)$. Furthermore, as $x_g > 0$, iterating Lemma 1 implies that $x_{\hat g} > 0$. Monotonicity of $\phi$ then implies that (ii) in Proposition 1 is satisfied.
Consequently, h k = K. But that contradicts the requirement that g[k] < K.

Q.E.D.
Proof of (iii) -Suppose not; then by iterating the mixing property with appropriately chosen parents, one can prove that x gmin > 0. But that contradicts (i). Q.E.D.
The preceding proposition tells us what properties an LRE must necessarily have, but does not tell us whether an LRE exists and whether, as in the preceding analysis, one can construct equilibria with a positive level of some L-alleles.
We now establish a result which tells us that an LRE exists with a strictly positive proportion of L-alleles, provided these alleles are not too frequent.
Then there exists an LRE with a distribution $\{x_g\}$ of genotypes if and only if this distribution satisfies the following property:
Proof - We first prove that this condition is necessary. The RHS of (22) is the total time supplied by genotypes in S (relative to the population's total time). Proposition 3, (i) implies that it must be allocated among goods $i$ such that $g[i] = K$, i.e. among goods in $S$. It also implies that in any candidate equilibrium, income per capita (equal to the income of any genotype) must be equal to . Thus, $D_i$ is the per capita amount of good $i$ consumed and produced in any candidate equilibrium. The LHS of (22) is therefore the total time input needed to produce all the goods in $S$. It must be greater than or equal to its RHS, since genotypes in S cannot produce any other good. Otherwise, supply would exceed demand. Note that (22) applied to $S = \emptyset$ implies that one H-allele is fixed (the LHS is then the total supply of all genotypes $g$ such that $g[i] < K$, $\forall i$). Also, (22) applied to $S = \{1, \ldots, n\}$ boils down to Walras' law, since it is equivalent to $\sum_{i=1}^{q} p_i^* w_i(p^*) \ge 1$, and by Walras' law $\sum_{i=1}^{q} p_i^* w_i(p^*) = 1$.
Let us now prove sufficiency. In order to do so, we construct a set of functions $m_i(g)$, representing the share of time of genotype $g$ devoted to activity $i$, such that:
If we are able to construct such functions, then this is indeed an equilibrium, since supply equals demand for all goods, and since the price vector in (20) implies that a genotype is indifferent between supplying all the goods at which it has $K$ H-alleles.
To construct the $m_i(g)$, we use the following algorithm. We start from any arbitrary allocation $m_i^{(0)}(g)$ satisfying (23) and (25). This defines the initial stage.[3] Then we move from stage $(k)$ to stage $(k + 1)$ as follows. At the beginning of stage $(k)$, the set $\{1, \ldots, n\}$ can be partitioned into three subsets:
That is, those goods for which supply equals demand, those for which there is excess demand, and those for which there is excess supply. Note that since
That is, people who do produce goods in $H_A$ cannot produce goods in $H_B$.
For any good $i$, let
Clearly, one has $g[i] = K$, for all $g \in G(i)$. We then have .
This strict inequality comes from the fact that $H_-$ is non-empty. Furthermore,
This is because if $g \in H_A$ and m
, which will be true provided $\Delta = \sum_{g \in G(i_1)} m^{(k)}_{i_q}(g)\, x_g$. In such cases the new allocation restores equilibrium in market $i_1$ (resp. $i_q$).
(ii) Or, the chain $(i_1, \ldots, i_q)$ and its associated chain of genotypes $(g_1, \ldots, g_{q-1})$ no longer satisfy Q; that is the case if $\Delta = m^{(k)}_{i_l}(g_l)$ for some $l$, in which case $m^{(k+1)}_{i_l}(g_l) = 0$. In such a case, we have constructed a new allocation such that $\sum_g |\{i,\ m_i(g) = 0\}|$ has increased by at least one unit, and which satisfies (23) and (25).
Thus, at each stage, the quantity $\sum_g |\{i,\ m_i(g) = 0\}|$ + | H
Clearly, conditions (22) are pretty stringent, so that it is not straightforward to construct an equilibrium. However, for $x_{g_{\max}}$ close enough to 1, i.e. $x_g$ small enough when $g \neq g_{\max}$, they are clearly satisfied, since $x_{g_{\max}}$ appears on the RHS only for $S = \{1, \ldots, q\}$, in which case (22) is always satisfied with equality, due to Walras' law: $\sum_{i=1}^{q} D_i z_{H_i} = \sum_{i=1}^{q} p_i^* f_i(p^*) = 1 = \sum_g x_g$. Therefore there always exist equilibria with a strictly positive fraction of genotypes with L-alleles, provided this fraction is small enough.
Note that the greater the number of loci, the greater the number of conditions that must hold. Intuitively, that suggests that the equilibrium fraction of L-alleles must become smaller. If the initial distribution of alleles in the population is such that (22) is violated, we expect a number of H-alleles to eventually become fixed, which is equivalent to a reduction in $n$. The process would continue until $n$ is small enough for the number of relevant activities not to be too large, so that (22) holds.