Individual variation evades the Prisoner's Dilemma

Background The Prisoner's Dilemma (PD) is a widely used paradigm to study cooperation in evolutionary biology, as well as in fields as diverse as moral philosophy, sociology, economics and politics. Players are typically assumed to have fixed payoffs for adopting certain strategies, which depend only on the strategy played by the opponent. However, fixed payoffs are not realistic in nature. Utility functions and the associated payoffs from pursuing certain strategies vary among members of a population with numerous factors. In biology such factors include size, age, social status and expected life span; in economics they include socio-economic status, personal preference and past experience; and in politics they include ideology, political interests and public support. Thus, no outcome is identical for any two different players. Results We show that relaxing the assumption of fixed payoffs leads to frequent violations of the payoff structure required for a Prisoner's Dilemma. With variance twice the payoff interval in a linear PD matrix, for example, only 16% of matrices are valid. Conclusions A single player lacking a valid PD matrix destroys the conditions for a Prisoner's Dilemma, so between any two players, PD games themselves are fewer still (3% in this case). This may explain why the Prisoner's Dilemma has hardly been found in nature, despite the fact that it has served as a ubiquitous (and still instructive) model in studies of the evolution of cooperation.


Background
The Prisoner's Dilemma (PD) has generated many hundreds of publications [1,2] spanning the biology, moral philosophy, sociology, economics and political science literature [1,[3][4][5][6]. It received special attention because it sets paradoxical conditions to examine how and when cooperation can evolve even when a rational player is bound to defect. As a result, a large body of literature has grown up around the problem of finding optimal strategies for playing the game but "the preoccupation with new and improved strategies has sometimes distracted from the main point: explaining animal cooperation" [7]. Recent publications have concentrated on how special conditions such as kin selection [3], iterated interactions [8], spatial structuring [9] or indirect reciprocity through reputation [10] can escape the paradox and lead to evolution of cooperation. Other studies pushed the iterated PD concept further to identify optimal strategies when players' decisions are not made simultaneously [11][12][13] or when players have payoffs that are not symmetrical [14].
Our aim is not to criticise these theoretical developments at all. Rather, we suggest a reason for why PD may be relatively rare in the first place, and therefore point out that such special conditions (to permit cooperation despite the PD) need not be invoked if the game is not common in nature anyway. As Clements & Stephens [7] wrote: "Understanding the ambiguities surrounding the Iterated Prisoner's Dilemma has stimulated 14 years of ingenious biological theorizing. Yet despite this display of theoretical competence, there is no empirical evidence of non-kin cooperation in a situation, natural or contrived, where the payoffs are known to conform to a Prisoner's Dilemma". These authors (Clements & Stephens) were interested in suggesting mutualism as an alternative model of cooperative behaviour to the iterated Prisoner's Dilemma (IPD), after finding no empirical evidence for the latter in experimental settings.
Treatments of Prisoners Dilemma, and many other types of games, assume identical payoff matrices (or expected utility), for all players. However, in nature this is highly unlikely. In humans, for example, various studies find outcomes of Prisoner's Dilemma to be dependent on subject-to-subject variation [2,15], including in personality traits [16]. Among animals, we expect similar variation to be dependent on expected reproductive consequences of certain outcomes, which are of more or less value to individuals depending on their size, age, sex, health, condition, social status, coalition membership, expected life span, available mates, food availability and so on. We therefore suggest that the Prisoner's Dilemma may not be a valid model where individual variation in payoffs disrupts the essential structure of the game. Our simulations in this paper test what degree of variation cause PD games to become rare.
Of course, capturing the essence of complicated behaviour using simple models is an important step in theory development, and it is easy to challenge such models by pointing out elements or complications that are missing. However, the Prisoner's Dilemma paradigm has become so prevalent in the cooperation literature that it has perhaps not been challenged enough. The implicit assumption in the PD game that different individuals do not vary in their payoffs for certain interactions is so fundamentally important that we suggest it may pose a critical problem, rather than just "another" complication, for the Prisoner's Dilemma model.
Once varying payoffs in interactions are considered, all such games can be seen to be various special cases of "biological markets" [17], a set of theories which describe the trade of commodities between individuals and, importantly, do not constrain utility functions across individuals. Goods are commonly valued differently by each actor, leading to the focus on "zones of potential agreement" in the economic literature, where transactions occur within a region where individuals approach, but do not reach, their ideal payoff. Indeed, payoffs vary in time as well as between individuals -cooperation may be the best option under certain circumstances but not under others. Such variation is highly dependent on the state of the individual, or in other words, the identity of both you and your opponent. As people, for example, we all have different utility functions depending on numerous biological and extraneous variables. For instance, a loan is typically more valuable to the borrower than the lender, and in negotiating it the participants represent these highly asymmetrical payoff matrices.
The classical pay-off matrix for the Prisoner's Dilemma defines values which satisfy the required inequality T > R > P > S, and R > (T + S) / 2 [18][19][20] (The latter inequality is to prevent the possibility that players collude and split the payoffs). This denotes the temptation to defect (T), which is the best option against the reward for mutual cooperation (R) which is in turn better than mutual defection (P), with the worst payoff coming from the Sucker's payoff (S) for cooperating when the opponent defects. Hence, rational players should always defect regardless of what the opponent does (in both a one shot game and for predicted responses over repeated games due to backward induction from an expected final defection), leading to the question of why cooperation emerges. The game has attracted considerable attention because it seems useful to deduce why humans and animals cooperate in such games when the temptation to defect is the rational solution. However, before investigating this apparent paradox, why should these inequalities (of T, R, P and S) and, thus, this particular game be particularly common? If it is not, then cooperation may not be surprising or paradoxical after all. For example, as soon as the sucker's payoff exceeds the punishment for mutual defection (S > P) and everything else stays the same (i.e. T > R > S > P), one enters a new game, "chicken" (also called the "snowdrift" or "hawk-dove" game), in which one no longer necessarily expects mutual defection [4]. Indeed, if R > T and S > P, then the game becomes "mutualism," in which players are expected to cooperate all the time, regardless of what the opponent does [7].
In support of this view, and despite the voluminous literature, examples of Prisoner's Dilemma in nature are virtually non-existent. One of the heralded potential examples, predator "inspection" in shoaling fish [21] has proved something of a controversy [20]. Various authors have questioned whether the observed behaviour is cooperation maintained by reciprocity in a Prisoner's Dilemma at all, rather than simple by-product mutualism [22] or some other mechanism [23,24]. In general, other candidate models to explain altruism among non-relatives have tended to be ignored [19,25,26], and the usefulness of Prisoner's Dilemma as a paradigm for the study of cooperation is beginning to be brought into question [27].
Firstly then, the particular inequalities of certain games may not be likely anyway. But even where they are, more importantly, we show that variance in payoffs among individuals (i.e. individual-specific utility functions) can, in itself, violate the pre-requisite inequality of the payoffs in the matrix. How high does this variation have to be, before such prescribed games become unimportant?

Model of asymmetry in payoffs
If a payoff has an expected (mean) value of p, but residual error variance Î dependent on the player's utility function, then the actual payoff p' will be: Here we assume a normal distribution of errors, Î ~ N (0, s 2 ), which means actual payoffs will also be normally distributed p' ~ N (p, s 2 ). The two-by-two payoff matrix for examples of Prisoner's Dilemma and mutualism games are depicted below, with a generic matrix involving T, R, P and S, which will be used henceforth ( Table 1).
The variance s 2 , for each of four payoff outcomes may vary independently as s 2 T , s 2 R , s 2 P , s 2 S (where s 2 T is the variance of T, for example). Any one of these can destroy the special conditions for PD. Thus, PD occurs if and only if T > R > P > S, for all two-way interactions when Î T = Î R = Î P = Î S = 0, as in previous literature or, at least, when those variances are small enough not violate the > 1 ratios of T/R, R/P and P/S. Now, allowing variations in payoff, the actual payoff is the expected payoff plus the variation in utility according to the individual. So replacing T, R, P and S for unique payoff symbols that incorporate both the initial starting payoff and the payoff variance, the PD with varying payoffs can be formalised, using the variances as described above, thus: The variance in any one term can destroy the special conditions for PD (as it could any other prescribed fixed payoff game). It is clear then that, depending on the relative values of the four payoffs, some region of a 3-D phase space (with dimensions T/R, R/P and P/S) will describe valid PD games, while other regions will describe other games such as mutualism. However, both are just certain regions within a larger universe of 24 possibilities in which various other types of symmetric games will be encountered [4]. These other undefined regions all represent different potential games so the whole is best described as a market, since different individuals meeting each other have different payoff matrices -or values -for tradable commodities (obtained from cooperation or defection) [17,28]. Prices change according to the relative value to the "buyer" and "investor" and, since these vary, the interactions between members of any one population can move dynamically around the phase space, sometimes in and out of special areas such as those defined as the Prisoner's Dilemma. If such instances are rare though, strategies based on those games are useless.

Analytical solutions to variations in payoffs
A simple varying payoff structure can be analysed intuitively. Consider 1000 payoff matrices with fixed payoffs

Generic
Prisoner's Dilemma C denotes cooperation and D denotes a defection by both row and column player. Payoffs in the generic game denote the temptation to defect (T), the reward for mutual cooperation (R), punishment for mutual defection (P), and the sucker's payoff (S). In Prisoner's Dilemma, T is the best option against R, which is in turn better than P, with the worst payoff coming from S -cooperating when the opponent defects. Mutual defection is the Nash equilibrium since it is better to defect regardless of which strategy the opponent plays. In Mutualism, R is the highest payoff and S approaches T, so cooperation is always the best option regardless of your opponent's move. Mutual cooperation is a strong Nash equilibrium as all other options are less profitable.
plus normally distributed errors (~ N (0, 1)), with variance equal to the spacing between the payoffs in an arbitrary PD matrix (e.g. 3,4,5,6). Neighbouring payoffs will overlap, and therefore destroy rankings for PD, a predictable number of times according to the following logic. A PD payoff matrix exists in which payoff X 1 must be ranked above payoff X 2 (e.g. T and R, or, P and S). If the distribution of the higher payoff X 1 is ~ N (1, 1), and the lower payoff X 2 ~ N (0, 1) then the probability of these two payoffs swapping rank is P(X 2 -X 1 > 0). Subtracting the two distributions X 1 -X 2 gives another normal distribution X 1,2 (~ N (-1, 2)) (means are subtracted, variances summed according to standard theory [29]), with which we can test the exact probability that payoffs X 1 and X 2 swap ranks. This probability is P(X 2 -X 1 > 0), or equivalently, P(X 2 > X 1 ). Since in a normal distribution the probability P(X £ x) = P (Z £ (x -m)/s), where m is the distribution mean, s the standard deviation and Z the standard normal variable [29], the probability the lower payoff is higher than the higher payoff, when both vary as N (0, 1), is: 1 -P (Z < (0 + 1) /Ö 2), Thus any neighbouring pair of payoffs will overlap 24% of the time. However, since there are a total of four payoffs that can overlap in this way, there are three separate inequalities where a rank change can occur. Thus, the value we want to know is the probability of at least one rank change occurring in a 2 ´ 2 matrix. The solution to this turns out to be unattainable analytically, since the probability of each rank change is not independent and thus it requires solving integrals for four distributions conditional on two unknowns. An equation based on Bayes' Theorem cannot therefore be solved. However, the solution can be found instead by simulation.

Simulations of variance in payoffs
If we simulate matrices with imposed variations in payoff values we can calculate the frequency of cases which do not meet the conditions for Prisoners Dilemma. There are two parameters that will affect this frequency: (1) the variance and, (2) the linearity of intervals between the ranked payoffs in the matrix.

(a) Basic simulation
To begin, we consider a (p, p + 1, p + 2, p + 3) matrix to equalise payoff intervals. Each simulation drew 1000 random numbers from a normal distribution with mean = 0 and variance = 1 and added these as normally distributed random errors to the fixed payoffs in 1000 matrices. Next we calculated all T/R, R/P and P/S values and determined the number of matrices which retained the correct rank structure for PD, i.e. those in which all ratios > 1. Means and standard deviations of valid PD matrices were calculated from 5 simulations of 1000 matrices per model.
The payoff ratios can be investigated graphically ( Figure  1), but only in 2-dimensions, so two graphs are shown. For PD, a matrix must satisfy the requirement that all ratios are > 1. There were only 376 valid PD matrices out of 1000 in this case (3,4,5,6). Thus, only 37.6% of matrices were valid Prisoner's Dilemmas (Figure 2), i.e. meeting the condition of all three inequalities. We also simulated results for a typical payoff matrix that is often cited in the literature (0, 1, 3, 5) [1,30]. Only 633.2 of the 1000 matrices (63.3%) were valid Prisoner's Dilemmas.

(b) Simulations with different amounts of variation
Different amounts of variation can be equated with changes in payoff intervals (increasing the former is the same as decreasing the latter). The more different payoffs are from each other, the more random variation would be required to cause rank changes. This is demonstrated in the simulations in which relative variation was increased (on a base matrix of 3, 4, 5, 6), through decreasing payoff intervals, by 1.0, 0.5 to 0.25 increments (Figure 2). The percentage of valid PD cases decreased from 37.6%, to 15.8%, to 9.1%. In other words, a PD matrix occurs only 15.8% of the time when the variance is twice the payoff interval.

(c) Simulations with different linearity of payoffs intervals
The effect is insensitive to changes in absolute values, but not to the linearity of the payoff intervals. In this simulation, we varied one of the parameters, the spacing between payoffs (hereafter "payoff interval"). As payoff intervals increase, the variance of the normally distributed errors becomes relatively less important and cause less rank changes, since overlaps are less likely when payoffs are more separated from each other. Where this is manifested only at one end of the payoff ranking sequence (i.e. where there is non-linearity in payoff intervals, such as 0, 1, 3, 5), the largest intervals are more immune to rank changes, relative to the more bunched up payoffs. Therefore, fewest rank changes will occur in linear sequences with large payoff intervals. This is demonstrated in the simulations in which the linearity of payoffs was distorted ( Figure 2). This was simulated by increasing payoffs from intervals of all 1.0 (matrix 3, 4, 5, 6), to intervals of 1.

Conclusions
Although people commonly look for, or frame, various observed scenarios within a Prisoner's Dilemma, the game will not always occur unless variation in payoffs is low. This assumption is rarely, if ever, tested. We have demonstrated that, starting from a common Prisoner's Dilemma payoff structure, relatively small amounts of variation drastically reduce the proportion of matrices that still conform to Prisoner's Dilemma. Moreover, while variance in payoff can violate the required inequalities of payoffs for a valid PD matrix, a PD game between two individuals is invalid if just a single individual has an invalid PD matrix. Thus, while we cited PD matrix violations, valid PD games are even less likely still (Figure 2), because they are the product of the independent probabilities of each individual having a PD matrix. For example, using the popular (0, 1, 3, 5) matrix where the payoff variance is equal to the smallest payoff interval, a proportion of only 0.633 2 (0.400) games will be valid Prisoner's Dilemmas. This effect is greatly enhanced in the (3, 4, 5, 6) matrix, where the proportion of valid games will be 0.376 2 (0.141). Doubling the variance leaves only 0.158 2 (0.025) of valid PD games. Therefore, relatively small variations may be enough to discard PD as a useful model to apply. Certainly, researchers may need to revise critical significance values in applying statistical methods to determine if PD occurs in a study population.
As an example, animals which cooperate to remove ectoparasites from each other by allogrooming have formerly been thought to be in a Prisoner's Dilemma [31,32]. However, because ectoparasite burdens vary greatly among individuals (in European badgers, Meles meles, some individuals are recorded as having no ectoparasites while others can have over 100), the relative payoffs for different strategies and, therefore, the game individuals find themselves in, will depend on the relative differences between actors [33]. In the case of the badgers, ectoparasites carry lethal diseases, so coming into contact to engage in cooperative allogrooming carries the risk of

Figure 1
Inequalities in simulated Prisoner's Dilemma (PD) matrices. Values greater than 1 indicate that the ratio conforms to that part of a valid PD matrix. All of three ratios must be > 1 for the entire matrix to conform to PD. These are representative graphs for a single simulation on 1000 initial payoff matrices of (3, 4, 5, 6) after adding normally distributed errors of zero mean and a variance of 1.0. Note that ratios can be quite large in cases where small payoffs (< 3.0) became even smaller after adding negative errors (P may be divided by an S approaching 0, in which case the value P/S becomes large). To avoid very high ratio values we increased all initial mean payoff values in our simulations so that the minimum was three. The frequency of valid PD cases is not affected by increasing absolute values of the payoffs. further contamination, as well as the prospect of having one's own ectoparasites removed. This suggests that one's relative ectoparasite burden (in comparison to that of a potential partner in the game) will determine the game and consequently whether one should cooperate or not [33].
Other examples support the idea that payoff variation may be key to resolving the emergence of cooperation in supposed Prisoner's Dilemmas. A recent study of cooperative territorial defence in lions, Panthera leo, described the correct ranking structure for a Prisoner's Dilemma [34,35]. However, cooperators did not retaliate against defectors. Instead, they carried on cooperating with them in future interactions, ruling out the tit-for-tat or Pavlov strategies [8] and implying that they did not experience the Prisoner's Dilemma despite the apparent PD matrix [35]. Interestingly, individual strategies already varied be-tween individuals in the population and, furthermore, experimental manipulation caused individuals to change their strategies in unpredictable ways. That is, they did not conform to the predictions of how they should have reacted according to the Prisoner's Dilemma. Two other popular examples fit our implicit prediction that PD is more likely to occur if payoffs are fixed. First, the single uncontested example of PD in nature arises in the interactions of bacteriophages [36,37], simple RNA structures for which payoff variation among individuals is unlikely because of their uniform architecture (uniform, at least, in comparison to higher organisms). Second, a famous example of PD comes from the trench-warfare of World War I [1], in which opposing lines of troops preferred to cooperate in maintaining a local peace rather than risk death in mutual defection (and reversion to reciprocated attacks). Since payoffs for each individual were derived in terms of risk of death, variation among soldiers' preference orderings was

Figure 2
Simulations of Prisoner's Dilemma (PD) matrices with added normally distributed random error of variance 1.0 in all cases. Bars represent the number of matrices in which all inequalities for PD remained in tact, given as the mean +/-2 standard deviations (from 5 simulations) of valid PD matrices out of 1000. To determine whether inequalities conformed to PD, each of the payoff ratios T/R, P/S and R/P were calculated, from which deviations from the requisite < 1 were used to detect violations of each part of the PD ranking sequence. Diamonds represent the consequent number of valid PD games +/-2 standard deviations, between any two randomly drawn players, expected under the same parameters. Payoffs of 0 or 1 (as in the simulation of the "typical" payoff matrix 0, 1, 3, 5) sometimes became negative after the addition of random error, in which case we checked for inequalities with absolute values rather than ratios. unlikely. But even if not, this was essentially a game between groups, not individuals, so the payoff matrix is also for the group and should tend towards constant mean values due to the Central Limit Theorem.
It is interesting to briefly consider public goods games, given that they are essentially a generalization of PD to several players in a group. In such situations, variation in payoffs among individual players is again likely to affect the outcome and the probability of cooperation. On the negative side, some individuals may not incur such a high cost for initiating widespread defection in subsequent games. On the positive side, other individuals may incur relatively lower costs in contributing to the public good, resulting in a better overall outcome for the group even if some defectors persist.
Prisoner's Dilemmas could be argued to reemerge even where payoffs vary widely, because of the fact that adjacent payoffs will sometimes turn out to be very far apart, rather than being very close together (and overlapping). However, this will not salvage the situation as a Prisoner's Dilemma game, because the crucial criterion remains, exclusively, whether there are changes in the payoffs' rank order. Absolute values of payoffs will not alter the game itself. Of course, if certain payoffs were systematically very large and distant from other payoffs (e.g. if there were a very large temptation to defect, T, relative to R), then this may produce an enhanced selection pressure for strategies that perform as though they are within a Prisoner's Dilemma, even if the game is not valid 100% of the time. But this logic is unlikely to hold, because if payoffs do indeed vary, then the variation itself will prevent such events from being systematic. In other words, overlaps between payoffs will occur as often as large intervals between them, but only the former alter the actual game experienced. Furthermore, simulations have shown that differing values of T do not influence dominant strategies [12].
Further studies are needed to investigate what happens when variable matrices meet each other. For example, if rank changes are common, do players playing one type of game in a population of mixed games tend overall to encourage or discourage cooperation in their neighbours? Or, since optimal choices would be highly dependent on the identity of players, can single dominant strategies still be devised to deal effectively with opponents with variable payoffs (and therefore with unpredictable strategies). Indeed, celebrated strategies such as "tit-for-tat" [1], or "win-stay, lose-shift" [8] may not be robust enough to evolve wherever PD occurs, in the best case, in less than in 100% of the games played within a population.
We end by reiterating the sentiments of Clements & Stephens [7]: "Mutualism and the Prisoner's Dilemma represent end points of a range of conceivable cooperative situations. Perhaps it is time to explore the rich set of possibilities between mutualism and the Prisoner's Dilemma". Certainly, with all the intense research and enthusiastic application of PD to real world situations, we may expect that we should have observed more convincing empirical support by now if it ever were to hold as a paradigm befitting to it's immense theoretical appeal. We suggest that the concept of biological markets provides a good alternative framework for modelling variable games [1,17,28,38]. At the least, future studies should justify the assumption of zero payoff variation, given the importance of its implications. Finally, while we have pointed out that the PD game can be upset by payoff variation, it remains for future research to probe the question of whether, and how, noisy payoff matrices will actually promote or hinder the evolution of cooperative behaviour itself.

Author's contributions
Author 1 (DDPJ) originated this idea and carried out the simulations. Author 2 (PS) aided the theoretical development. Author 3 (JB) carried out the mathematical sections. All authors read and approved the final manuscript.