Researchers have struggled for years to understand the conditions under which cooperation can emerge. Applications of this research can be found in industry, in society and in nature. As an applied mathematician, I study this question using a game called the prisoner’s dilemma. Its origins go back to the 1950s and it remains an active research topic today.
The prisoner’s dilemma is a two-player game in which each player chooses between two strategies, cooperation and defection. Since the breakthrough of the 1980s, many researchers have searched for dominant strategies in the iterated version of the game, the iterated prisoner’s dilemma. Many explore not only the strategies themselves but also what makes them dominant and robust.
Since 2016, as part of my PhD, I have had the chance to work with a team of talented people on training a set of complex strategies and assessing their dominance and robustness. Some of the first outputs from this collaboration include:
- Reinforcement Learning Produces Dominant Strategies for the Iterated Prisoner’s Dilemma,
- Evolution Reinforces Cooperation with the Emergence of Self-Recognition Mechanisms: an empirical study of the Moran process for the iterated Prisoner’s dilemma.
The first paper describes how several optimisation methods, such as genetic and particle swarm algorithms, can be used to train dominant strategies for the iterated prisoner’s dilemma. The performance of the trained strategies was verified through tournament simulations in which most of the opponent strategies come from the literature. The second paper follows a similar approach, but this time the robustness of the trained strategies is explored through an evolutionary process called a Moran process.
The first paper shows that iterated prisoner’s dilemma strategies can be represented in several different ways. Lookup tables, finite state machines and neural networks are all valid representations. One of the most useful is the finite state machine, which lets you determine a player’s next move by following a map of transitions between states.
The image above shows one of the trained strategies, an 8-state strategy described in the second paper, represented as a finite state machine. Transition arrows are labelled O/P, where O is the opponent’s last action and P is the player’s response. Note that the strategy starts in state 1 and that its first move is always cooperation.
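To make the O/P labels concrete, here is a minimal sketch of how such a machine picks its moves. The two-state machine below is made up for illustration (it is not one of the trained strategies), and the names `TRANSITIONS` and `play` are my own rather than anything from the papers.

```python
# Illustrative two-state machine, not one of the trained strategies.
# Keys are (current state, opponent's last action);
# values are (next state, player's response).
C, D = "C", "D"

TRANSITIONS = {
    (1, C): (1, C),
    (1, D): (2, D),
    (2, C): (1, C),
    (2, D): (2, D),
}


def play(transitions, opponent_moves, initial_state=1, initial_action=C):
    """Return the machine's moves against a fixed sequence of opponent moves."""
    state = initial_state
    moves = [initial_action]  # the first move is made before seeing the opponent
    # Each later move responds to the opponent's previous move,
    # so the opponent's final move is never used.
    for opponent_action in opponent_moves[:-1]:
        state, response = transitions[(state, opponent_action)]
        moves.append(response)
    return moves


moves = play(TRANSITIONS, [C, D, D, C])
```

Each arrow in the drawing corresponds to one entry of the dictionary: the key holds the state the arrow leaves and the O part of the label, the value holds the state it enters and the P part.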
Drawing the finite state machine representations of the trained strategies found in both articles has been one of my contributions to these papers. The strategies were given to me as Python code in the following format:
Note that writing the strategies this way is in line with how finite state machine strategies are encoded in the Python package Axelrod, which was used in this work (see its documentation).
As a software developer I am very comfortable with the programming language Python, so on my first attempt to draw these strategies I used a Python tool called networkx. Networkx allows me to generate a simple graph and find the best layout using the library’s predefined layouts.
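As a rough illustration of this kind of encoding, the sketch below assumes each transition is a tuple of the form (state, opponent's last action, next state, player's response). The two-state strategy and the helper names `to_lookup` and `missing_transitions` are hypothetical and only serve to show the shape of the data. The sketch uses plain Python and does not call the Axelrod package itself.

```python
C, D = "C", "D"

# Hypothetical example: (state, opponent's last action, next state, response).
transitions = [
    (1, C, 1, C),
    (1, D, 2, D),
    (2, C, 1, C),
    (2, D, 2, D),
]


def to_lookup(transitions):
    """Turn the 4-tuples into a {(state, opponent action): (next state, response)} dict."""
    lookup = {}
    for state, opponent_action, next_state, response in transitions:
        key = (state, opponent_action)
        if key in lookup:
            raise ValueError(f"duplicate transition for {key}")
        lookup[key] = (next_state, response)
    return lookup


def missing_transitions(transitions):
    """List the (state, opponent action) pairs that have no transition defined."""
    lookup = to_lookup(transitions)
    states = {t[0] for t in transitions} | {t[2] for t in transitions}
    return sorted((s, a) for s in states for a in (C, D) if (s, a) not in lookup)


lookup = to_lookup(transitions)
gaps = missing_transitions(transitions)
```

Checking for gaps before drawing is useful because every state needs one outgoing arrow for each possible opponent action.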
All of this leads to the TikZ code below:
Note that the labels for the states of the finite state machine begin from 1 and not zero.
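For readers who want to try something similar, here is a minimal sketch of generating TikZ from a list of transitions. It assumes the (state, opponent's action, next state, response) tuple form and simply places the states on a circle, which stands in for a layout chosen by hand or with a graph library. The function `to_tikz`, the node names `s1`, `s2` and the styling are my own choices for illustration, not the code used for the papers.

```python
import math

C, D = "C", "D"

# Hypothetical two-state strategy in (state, opponent action, next state, response) form.
transitions = [
    (1, C, 1, C),
    (1, D, 2, D),
    (2, C, 1, C),
    (2, D, 2, D),
]


def to_tikz(transitions, radius=3.0):
    """Return TikZ source with states on a circle and edges labelled O/P."""
    states = sorted({t[0] for t in transitions} | {t[2] for t in transitions})
    lines = [r"\begin{tikzpicture}[->, auto]"]
    for index, state in enumerate(states):
        angle = 2 * math.pi * index / len(states)
        x, y = radius * math.cos(angle), radius * math.sin(angle)
        # State labels are kept as given, so they start from 1.
        lines.append(
            rf"  \node[circle, draw] (s{state}) at ({x:.2f}, {y:.2f}) {{{state}}};"
        )
    for state, opponent_action, next_state, response in transitions:
        style = "loop above" if state == next_state else "bend left"
        lines.append(
            rf"  \path (s{state}) edge[{style}] "
            rf"node {{{opponent_action}/{response}}} (s{next_state});"
        )
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


tikz = to_tikz(transitions)
print(tikz)
```

A circular layout gets crowded for larger machines, so for an 8-state strategy the node coordinates would likely still need adjusting by hand.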
Furthermore, a small test was written to ensure that the transitions and actions in the strategies’ source code corresponded with the TikZ code.
All the work described in this blog post was made possible by an open source library, the Axelrod Python library: http://axelrod.readthedocs.io/. I would like to thank the co-authors of the papers mentioned in this blog post for making me part of this research.
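A test of this kind could look roughly like the sketch below. It assumes the edges in the TikZ source follow one fixed pattern, `\path (sA) edge[...] node {O/P} (sB);`, and that the strategy is a list of (state, opponent's action, next state, response) tuples. The strategy, the TikZ snippet and the helper `edges_from_tikz` are all hypothetical stand-ins rather than the actual test.

```python
import re

C, D = "C", "D"

# Hypothetical strategy and matching TikZ snippet, for illustration only.
transitions = [
    (1, C, 1, C),
    (1, D, 2, D),
    (2, C, 1, C),
    (2, D, 2, D),
]

tikz = r"""
\path (s1) edge[loop above] node {C/C} (s1);
\path (s1) edge[bend left] node {D/D} (s2);
\path (s2) edge[bend left] node {C/C} (s1);
\path (s2) edge[loop above] node {D/D} (s2);
"""

# Captures: start state, opponent's action, player's response, end state.
EDGE = re.compile(
    r"\\path \(s(\d+)\) edge\[[^\]]*\] node \{([CD])/([CD])\} \(s(\d+)\);"
)


def edges_from_tikz(source):
    """Extract (state, opponent action, next state, response) from each edge."""
    return {
        (int(start), opponent_action, int(end), response)
        for start, opponent_action, response, end in EDGE.findall(source)
    }


def test_tikz_matches_transitions():
    assert edges_from_tikz(tikz) == set(transitions)


test_tikz_matches_transitions()
```

Comparing sets means the order of the edges in the TikZ file does not matter, only that every transition is drawn exactly as it is coded.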