Thursday, December 7, 2006

Prisoner's dilemma

The classical prisoner's dilemma (PD) is described below:

Two suspects, A and B, are arrested by the police. The police have insufficient evidence for a conviction, so they visit each prisoner to offer the same deal: if one testifies for the prosecution against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the full sentence of ten years. If both stay silent, the police can sentence both prisoners to only six months in jail on a minor charge. If each betrays the other, each receives a two-year sentence. Each prisoner must choose whether to betray the other or remain silent, and neither knows what choice the other will make. Hence the dilemma.

The dilemma, in summary:
1. Both Stay Silent - Both serve six months
2. Prisoner A Stays Silent and Prisoner B Betrays - Prisoner A serves ten years, Prisoner B goes free
3. Prisoner A Betrays and Prisoner B Stays Silent - Prisoner A goes free, Prisoner B serves ten years
4. Both Betray - Both serve two years

Each prisoner has two options: to cooperate with his accomplice and stay quiet, or to defect and betray his accomplice in return for a lighter sentence. The outcome of each choice depends on the choice of the accomplice, but each player must choose without knowing what his accomplice has chosen to do.

If my partner stays quiet, my best move is to betray, as I then walk free instead of serving the minor six-month sentence. If my partner betrays, my best move is still to betray, as I then serve two years instead of ten. The other prisoner, reasoning the same way, arrives at the same conclusion and also betrays. So, each pursuing his own best individual outcome, we both betray.

However, from the perspective of the optimal outcome for the group (of two), the correct choice would be for both to cooperate with each other, as this reduces the total jail time served by the group to one year. Any other decision is worse for the two prisoners considered together. When the prisoners betray each other, each achieves a worse outcome than if both had cooperated.
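This reasoning can be checked mechanically. A minimal sketch in Python (the sentence table is taken from the summary above; lower numbers are better):

```python
# Jail time in years for (my_choice, partner_choice); lower is better.
# "C" = stay silent (cooperate), "D" = betray (defect).
sentence = {
    ("C", "C"): 0.5, ("C", "D"): 10,
    ("D", "C"): 0.0, ("D", "D"): 2,
}

# Betraying dominates: it is better no matter what the partner does.
for partner in ("C", "D"):
    assert sentence[("D", partner)] < sentence[("C", partner)]

# Yet mutual silence minimizes the group's total jail time.
totals = {(a, b): sentence[(a, b)] + sentence[(b, a)]
          for a in "CD" for b in "CD"}
best = min(totals, key=totals.get)
print(best, totals[best])  # ('C', 'C') 1.0
```

The assertions confirm the individual logic of betrayal, while the minimum of the group totals confirms that mutual silence is best for the pair.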

The above game interested game theorists greatly, and from the basic construct a more general form of the game emerged.
There are two players (1 and 2). Each player has two choices in an interaction: "Cooperate" or "Defect". Both players make their choices simultaneously, i.e., they learn each other's moves only after the fact. At the end of a turn, each player receives points depending on what each of them did (cooperate or defect):
1. If player 1 defects and player 2 cooperates, player 1 gets the Temptation to Defect for a payoff of 5 points while player 2 receives the Sucker's payoff of 0 points.
2. If the reverse happens, the scores are reversed: player 2 gets 5 points and player 1 gets 0.
3. If both cooperate, they get the Reward for Mutual Cooperation payoff of 3 points each.
4. If both defect, they get the Punishment for Mutual Defection payoff of 1 point each.
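The four payoffs above can be captured in a small lookup table. A sketch, using the conventional letters T, R, P, S for the four values:

```python
# Payoff matrix for one round. The classical ordering is
# T (temptation) > R (reward) > P (punishment) > S (sucker's payoff).
T, R, P, S = 5, 3, 1, 0

PAYOFF = {
    ("D", "C"): (T, S),  # player 1 defects, player 2 cooperates
    ("C", "D"): (S, T),  # the reverse
    ("C", "C"): (R, R),  # mutual cooperation
    ("D", "D"): (P, P),  # mutual defection
}

def payoff(move1, move2):
    """Return (points for player 1, points for player 2)."""
    return PAYOFF[(move1, move2)]

print(payoff("D", "C"))  # (5, 0)
```

Note that every entry is zero or positive, matching the observation below that no player ever appears to lose.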

This is essentially the same as the original game, just with different terminology. Here every outcome scores zero or positive points, which gives the players the illusion that they are never losing. This simulates real life better.

From this, the next step is the iterated prisoner's dilemma. Here two players play each other repeatedly. Each player thus remembers what the other did last time, and that memory influences his strategy against that particular player.

Finally, the big step was a tournament of several players, each playing the above game with everyone else simultaneously and iteratively. For example, if there are 100 players, each plays the game individually against each of the other 99 repeatedly, say 20-25 rounds or more.
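Such a tournament can be sketched as a round-robin of iterated games. The three strategies below are illustrative stand-ins, not the actual tournament entries:

```python
import itertools

# Each strategy sees its own history and the opponent's history of moves.
def always_cooperate(my_history, their_history):
    return "C"

def always_defect(my_history, their_history):
    return "D"

def tit_for_tat(my_history, their_history):
    return their_history[-1] if their_history else "C"

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def match(strat1, strat2, rounds=25):
    """Play an iterated game; return (score1, score2)."""
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = strat1(h1, h2), strat2(h2, h1)
        p1, p2 = PAYOFF[(m1, m2)]
        s1, s2 = s1 + p1, s2 + p2
        h1.append(m1)
        h2.append(m2)
    return s1, s2

strategies = {"ALL-C": always_cooperate,
              "ALL-D": always_defect,
              "TFT": tit_for_tat}
scores = dict.fromkeys(strategies, 0)
for (n1, f1), (n2, f2) in itertools.combinations(strategies.items(), 2):
    s1, s2 = match(f1, f2)
    scores[n1] += s1
    scores[n2] += s2
print(scores)  # {'ALL-C': 75, 'ALL-D': 154, 'TFT': 99}
```

Interestingly, in this tiny three-strategy field Always Defect scores highest, because there is only one cooperator to exploit and nobody else to cooperate with. Tit for Tat's tournament victories depended on a larger, more varied population in which cooperative pairs could rack up points against each other.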

What strategy is best under these circumstances – cooperate or defect? Moreover, in a society where every interaction carries the potential for cooperation or defection, what strategy would finally emerge in life? Would people defect from each other to maximize individual gains, or would they end up cooperating? These questions had immense value.

Dr. Robert Axelrod, a game theorist, got interested in the above, and around 1980 he decided to conduct a tournament. He invited many players to submit their strategies, which were fed into a computer. Each strategy had to play every other strategy for 200 rounds. The strategies had to be programmed. He received several computer programs, some of which were very complex. Some were very simple, such as always cooperate or always defect. Some made random moves, and so on. On the day of the tournament, the strategies were fed in and the computer flagged off. There were fourteen entries, each with its own strategy.

The end result was startling. The winner was a simple strategy called Tit for Tat, submitted by Anatol Rapoport. The strategy is to cooperate on the first move and then, on every subsequent move, do exactly what the opponent did on the previous move: if he cooperated, cooperate; if he defected, defect.
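The rule is short enough to state in a few lines of code. A sketch (the opponent's move sequence is made up for illustration):

```python
def tit_for_tat(opponent_history):
    """Cooperate first; thereafter copy the opponent's last move."""
    return opponent_history[-1] if opponent_history else "C"

# Respond to an opponent who defects twice, then returns to cooperation.
opponent_moves = ["C", "D", "D", "C", "C"]
my_moves = []
history = []
for move in opponent_moves:
    my_moves.append(tit_for_tat(history))
    history.append(move)
print(my_moves)  # ['C', 'C', 'D', 'D', 'C']
```

The trace shows all the qualities discussed below in miniature: it opens nicely, punishes each defection exactly once, and forgives as soon as the opponent cooperates again.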


This was a surprise. This simple strategy beat all the complex strategies that tried to detect patterns in each opponent.
Axelrod concluded that successful strategies needed to be:

1) Nice - A nice strategy will not defect before its opponent does. Almost all of the top-scoring strategies were nice; strategies that defected first faced heavy retaliation from all but the few strategies that never punish.
2) Punishing - The successful strategy, though nice, needed to punish defection. A non-retaliating strategy such as Always Cooperate is a very bad choice, as "nasty" strategies will ruthlessly exploit such softies like parasites.
3) Forgiving - Another quality of successful strategies is that they are forgiving. Tit for Tat, for example, punishes a defection, but when the opponent returns to cooperation, it cooperates again. This stops long runs of revenge and counter-revenge, which cost both sides points.
4) Consistent - Strategies with random elements tended to elicit retaliatory behavior, while consistent, predictable strategies did well at drawing cooperation from other strategies.

Axelrod then started thinking about other strategies, such as a Tit for Tat that starts with defection instead of cooperation, Tit for Two Tats, and others. He published his analysis and invited entries for a second tournament. Once again he got a great response, except that now the participants had the full report of the first tournament. Once again the tournament was flagged off, with Axelrod himself looking for new successful strategies.

However, once again Tit for Tat came up trumps over all other strategies. This was overwhelming evidence in favor of Tit for Tat.

The whole exercise has direct relevance to life. In life and in business, we deal with hundreds of people, cooperating or defecting all the time and building a track record and history with each interaction; everyone else is likewise reacting to our moves. The success of Tit for Tat and the lessons drawn by Axelrod provide a guideline for us. To succeed in life we need to elicit cooperative behavior from others, which Tit for Tat does beautifully.

Kindly note that one needs to be non-envious, that is, not strive to score more than the opponent. Tit for Tat never scores more than its opponent in any single match; at best the two finish equal. In effect, Tit for Tat treats the other player as a partner, not an opponent. That is the secret of its success.

I would encourage the reader to google "prisoner's dilemma" and derive more insights into it.

1 comment:

Kapil said...

Great note. The only thought I have is: wouldn't a tit for tat strategy lead to an average outcome where all are happy but no one excels? Shouldn't some players be disruptive enough to change business dynamics?

Kapil