
Let's play Rock Paper Scissors!

I’ve been thinking about n-player games and recently have been pondering how to generalize the Rock Paper Scissors game to more than two players, and more than three actions. Turns out, it’s not that difficult, and it becomes quite elegant.

The original game is a zero-sum game with two players, and the payoff matrix is as follows:

| 1\2 | :rock: | :page_facing_up: | :scissors: |
|-----|--------|------------------|------------|
| :rock: | 0, 0 | -1, 1 | 1, -1 |
| :page_facing_up: | 1, -1 | 0, 0 | -1, 1 |
| :scissors: | -1, 1 | 1, -1 | 0, 0 |

There are several ways to generalize this game to more than two players. I saw two attempts [1], [2] along two different axes; however, neither of them was complete, and neither attempted to generalize along both axes. Here I attempt to generalize to N players and M actions.

Generalizing to more than two players

TL;DR:

  • You can count for each player how many players they defeated and how many players defeated them, and the difference is their payoff.

For this generalization, I took a step further from the approach in [1]. The complete payoff matrix for three players would look like this:

| 1\(2, 3) | :rock: :rock: | :rock: :page_facing_up: | :rock: :scissors: | :page_facing_up: :rock: | :page_facing_up: :page_facing_up: | :page_facing_up: :scissors: | :scissors: :rock: | :scissors: :page_facing_up: | :scissors: :scissors: |
|----------|---------------|-------------------------|-------------------|-------------------------|-----------------------------------|-----------------------------|-------------------|-----------------------------|-----------------------|
| :rock: | 0, 0, 0 | -1, -1, 2 | 1, 1, -2 | -1, 2, -1 | -2, 1, 1 | 0, 0, 0 | 1, -2, 1 | 0, 0, 0 | 2, -1, -1 |
| :page_facing_up: | 2, -1, -1 | 1, -2, 1 | 0, 0, 0 | 1, 1, -2 | 0, 0, 0 | -1, -1, 2 | 0, 0, 0 | -1, 2, -1 | -2, 1, 1 |
| :scissors: | -2, 1, 1 | 0, 0, 0 | -1, 2, -1 | 0, 0, 0 | 2, -1, -1 | 1, -2, 1 | -1, -1, 2 | 1, 1, -2 | 0, 0, 0 |

The rows contain the first player’s actions, while the columns contain the second and third players’ joint actions; this avoids having to write out the matrix once for each of the third player’s actions. To be fair, the payoff “matrix” of an N-player game is actually an N-dimensional tensor, but this notation is more compact and readable. I actually modified the payoff values from the first reference, since those 0.5 payoffs didn’t really make sense to me. They might lead to another valid generalization, but it was unclear to me how they would scale to even more players.

From what I gather, that model tried to capture the idea that multiple winning actions share the gains, but multiple losing actions didn’t share the losses. I broke this asymmetry by making a simple rule for the payoffs: after gathering everyone’s actions, the final payoff for player $p$ is the difference between the number of players $p$ defeated and the number of players that defeated $p$. This rule reproduces the base case of the two-player zero-sum Rock Paper Scissors game and generalizes to more than two players. Moreover, and possibly most importantly, it still preserves uniform randomization as the Nash equilibrium of the game.
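The payoff rule above can be sketched in a few lines of Python (the `BEATS` map and `payoffs` helper are my own names, not from any reference):

```python
# A minimal sketch of the payoff rule: defeated minus defeating.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def payoffs(actions):
    """Each player's payoff: players they defeat minus players defeating them."""
    return [sum(1 for b in actions if BEATS[a] == b)    # players a defeats
            - sum(1 for b in actions if BEATS[b] == a)  # players defeating a
            for a in actions]

# Matches the three-player table above:
assert payoffs(["rock", "paper", "paper"]) == [-2, 1, 1]    # row rock, column (paper, paper)
assert payoffs(["rock", "paper", "scissors"]) == [0, 0, 0]  # full cycle, everyone draws
```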

Let’s see what the payoff matrix would look like for four players:

| (1,2)\(3,4) | :scissors: :scissors: | :scissors: :page_facing_up: | :scissors: :rock: | :page_facing_up: :scissors: | :page_facing_up: :page_facing_up: | :page_facing_up: :rock: | :rock: :scissors: | :rock: :page_facing_up: | :rock: :rock: |
|-------------|-----------------------|-----------------------------|-------------------|-----------------------------|-----------------------------------|-------------------------|-------------------|-------------------------|---------------|
| :scissors: :scissors: | 0, 0, 0, 0 | 1, 1, 1, -3 | -1, -1, -1, 3 | 1, 1, -3, 1 | 2, 2, -2, -2 | 0, 0, -1, 1 | -1, -1, 3, -1 | 0, 0, 1, -1 | -2, -2, 2, 2 |
| :scissors: :page_facing_up: | 1, -3, 1, 1 | 2, -2, 2, -2 | 0, -1, 0, 1 | 2, -2, -2, 2 | 3, -1, -1, -1 | 1, 0, 0, -1 | 0, -1, 1, 0 | 1, 0, -1, 0 | -1, 1, 0, 0 |
| :scissors: :rock: | -1, 3, -1, -1 | 0, 1, 0, -1 | -2, 2, -2, 2 | 0, 1, -1, 0 | 1, -1, 0, 0 | -1, 0, 1, 0 | -2, 2, 2, -2 | -1, 0, 0, 1 | -3, 1, 1, 1 |
| :page_facing_up: :scissors: | -3, 1, 1, 1 | -2, 2, 2, -2 | -1, 0, 0, 1 | -2, 2, -2, 2 | -1, 3, -1, -1 | 0, 1, 0, -1 | -1, 0, 1, 0 | 0, 1, -1, 0 | 1, -1, 0, 0 |
| :page_facing_up: :page_facing_up: | -2, -2, 2, 2 | -1, -1, 3, -1 | 0, 0, 1, -1 | -1, -1, -1, 3 | 0, 0, 0, 0 | 1, 1, 1, -3 | 0, 0, -1, 1 | 1, 1, -3, 1 | 2, 2, -2, -2 |
| :page_facing_up: :rock: | -1, 1, 0, 0 | 0, -1, 1, 0 | 1, 0, -1, 0 | 0, -1, 0, 1 | 1, -3, 1, 1 | 2, -2, 2, -2 | 1, 0, 0, -1 | 2, -2, -2, 2 | 3, -1, -1, -1 |
| :rock: :scissors: | 3, -1, -1, -1 | 1, 0, 0, -1 | 2, -2, -2, 2 | 1, 0, -1, 0 | -1, 1, 0, 0 | 0, -1, 1, 0 | 2, -2, 2, -2 | 0, -1, 0, 1 | 1, -3, 1, 1 |
| :rock: :page_facing_up: | 1, -1, 0, 0 | -1, 0, 1, 0 | 0, 1, -1, 0 | -1, 0, 0, 1 | -3, 1, 1, 1 | -2, 2, 2, -2 | 0, 1, 0, -1 | -2, 2, -2, 2 | -1, 3, -1, -1 |
| :rock: :rock: | 2, 2, -2, -2 | 0, 0, -1, 1 | 1, 1, -3, 1 | 0, 0, 1, -1 | -2, -2, 2, 2 | -1, -1, 3, -1 | 1, 1, 1, -3 | -1, -1, -1, 3 | 0, 0, 0, 0 |

As one can observe, this quickly gets out of hand if we write it out in normal form; the payoff matrices for larger numbers of players are really not suited to being written out explicitly.
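Rather than writing such tensors out by hand, they can be generated programmatically. A short sketch for the four-player case (helper names are my own):

```python
from itertools import product

BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def payoffs(actions):
    # payoff = players defeated minus players that defeat you
    return tuple(sum(1 for b in actions if BEATS[a] == b)
                 - sum(1 for b in actions if BEATS[b] == a)
                 for a in actions)

# The full four-player payoff tensor, indexed by action profiles.
tensor = {profile: payoffs(profile) for profile in product(BEATS, repeat=4)}

# Every entry sums to zero: the generalized game stays zero-sum.
assert all(sum(p) == 0 for p in tensor.values())
```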

Generalizing over the number of actions

TL;DR:

  • The number of actions must be odd for the game to be fair.
  • The graph of the game is a regular tournament.
  • Each action must beat as many other actions as it is beaten by.

Rock beats scissors, scissors beats paper, paper beats rock. How can we add another action to preserve this property?

To answer this, I will mostly summarize [2] and then extend it to work together with the generalization across the number of players.

One naive (and terribly wrong) approach would be to introduce one more action into this cycle of beats/beaten relationships. However, every action has to either beat or be beaten by every other action, so a simple cycle does not suffice. If balance is not preserved, some actions are beaten less often than they beat, their choice becomes unambiguous, and the entire game is rendered useless.

In other words, we work with a directed graph in which actions are nodes and edges represent the beats/beaten relationship. This can be imagined as a complete undirected graph, on whose edges we add a direction in such a way that the graph remains balanced. That is, every action should beat as many other actions as it is beaten by. Otherwise, the game would not be fair.

Since we started from a complete undirected graph with, say, $M$ nodes, we have $\frac{M\cdot(M-1)}{2}$ edges, each node having $M - 1$ edges. For the fairness condition described above, after adding directions to the graph, the in- and out-degree of each node must be the same. This implies, first, that $M - 1$ must be even, which further implies that $M$ must be odd. As pointed out in [2], the graph we obtain is called a regular tournament.

We can generalize this constructively. That is, we start from a degenerate game having only one action, which always yields a draw no matter what. Then we add two more nodes and add directed edges in a way that preserves balance. For three actions, there is obviously only one way to do this; that this remains the only way for larger numbers of nodes has not been proven in [2] either. Discriminating among neither the actions nor the players, we only count the non-isomorphic, vertex-transitive ways in which the game can be generalized. Otherwise, we could invert all the edges or permute nodes and get the same games, which of course would be an uninteresting way of distinguishing between games. Non-isomorphism and vertex transitivity rule out such redundant solutions.

Having an odd number $M$ of actions, and noting each action as an integer from $0$ to $M-1$, deciding the beating relationship between two arbitrary actions can be given by a simple formula:

\[Winner(a,b)= \left\{ \begin{array}{ll} \max(a,b), & (a + b)\bmod 2 = 0 \\ \min(a,b), & (a + b)\bmod 2 = 1 \end{array} \right.\]

Due to symmetry, the $\max$ and $\min$ operators can be swapped.
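A small sketch of this formula (with my own labelling: for $M=3$, the mapping 0 = rock, 1 = scissors, 2 = paper reproduces the classic cycle) also lets us check the regular-tournament property:

```python
def winner(a, b):
    """Winning action among distinct a, b in {0, ..., M-1}; None on a draw."""
    if a == b:
        return None
    return max(a, b) if (a + b) % 2 == 0 else min(a, b)

# With 0=rock, 1=scissors, 2=paper this gives the classic rules:
# rock beats scissors, scissors beats paper, paper beats rock.
assert winner(0, 1) == 0 and winner(1, 2) == 1 and winner(0, 2) == 2

# For any odd M, each action beats exactly (M - 1) // 2 others,
# i.e. the beats-graph is a regular tournament.
for M in (3, 5, 7, 9):
    for a in range(M):
        beats = sum(1 for b in range(M) if b != a and winner(a, b) == a)
        assert beats == (M - 1) // 2
```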

Nash Equilibrium of the N-player M-action game

TL;DR:

  • The Nash-equilibrium of the game is uniform randomization, even with larger numbers of actions and players.

It can be trivially seen that always choosing one single action as a strategy is easily exploitable. A mixed-strategy Nash-equilibrium must make players indifferent to each other’s strategies. This is complicated to reason about if we ask how one player could make all others indifferent; the trick is to think the other way around: every other player must make the first player indifferent to their joint strategies. If each player has a mixed strategy, then in a mixed strategy profile $s$ we can write the probability of player $i$ choosing action $j$ as $p_{ij}$. Further, let’s denote an action profile as $a = (a_0,a_1,\dots,a_{N-1})$, where $a_i$ is the action chosen by player $i$, and the payoff of player $i$ given an action profile $a$ as $P_i(a)$. Then each player’s utility for playing a certain fixed action $j$ must be the same as the utility for all other actions, and this has to hold for all players. The utility of player $i$ playing action $j$, given the mixed strategy profile $s_{-i}$ of the opponents, is

\[(1) \ \ U_i(j,s_{-i}) = \sum_{a_{-i}}P_i(j,a_{-i})\cdot {\rm I\!P}(a_{-i}),\ \ \ \ \ \ \forall i \in \{0,1,...,N-1\}, \forall j \in \{0,1,...,M-1\}\]

Here $a_{-i}=(a_0,\dots,a_{i-1},a_{i+1},\dots,a_{N-1})$, and given the mixed strategy profile $s_{-i}$, we can define the probability of $a_{-i}$ as:

\[{\rm I\!P}(a_{-i}) = \prod_{k=0,k\neq i}^{N-1}p_{ka_{k}}\]

For each player, these utilities have to be equal across all actions, which leads to $\frac{M\cdot(M-1)}{2}$ pairwise equations per player (of which only $M-1$ are independent), for a total of $N\cdot \frac{M\cdot(M-1)}{2}$ equations requiring that the game is balanced, and $N$ more equations for the mixed strategies to be valid probability distributions, that is:

\[\sum_{j=0}^{M-1}p_{ij} = 1, \ \ \ \forall i \in \{0,1,...,N-1\}\]

It would be an instance of absolute algebraic beauty to find a closed-form solution for this system of equations. It is also obvious that the equations yielded by the balance conditions are not linear in the $p_{ij}$-s. If we focus on a single player for a moment, the probabilities of the remaining players appear in all possible combinations, precisely $M^{N-1}$ of them. We also observe that the subsystems of equations yielded by each player are symmetric across players.

Since for each action the number of defeated and defeating actions is the same, the unweighted sum of payoffs for each player, $\sum_{a_{-i}}P_i(j,a_{-i})$, is zero. This implies that any choice of the $p_{ij}$-s for which all the ${\rm I\!P}(a_{-i})$ values are equal is a solution of the system. One can observe that $p_{ij}=\frac{1}{M}$ is such a solution for all actions $j$ and all players $i$, yielding $0$ reward in expectation.

Thus we have demonstrated that in the extended game, uniform randomization is still a Nash-equilibrium.
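As a numerical sanity check under the assumptions above (the `winner` and `payoff` helpers are my own names), we can enumerate all opponent profiles and confirm that every fixed action earns zero total payoff against uniform opponents, e.g. for $N=3$, $M=5$:

```python
from itertools import product

def winner(a, b):
    # The M-action winner formula from the previous section.
    return max(a, b) if (a + b) % 2 == 0 else min(a, b)

def payoff(me, others):
    """Players defeated minus players defeating `me`, for one action profile."""
    return sum(0 if b == me else (1 if winner(me, b) == me else -1)
               for b in others)

# Under uniform opponents every profile a_{-i} has probability 1 / M**(N-1),
# so U_i(j, s_{-i}) is proportional to the plain sum over all profiles.
N, M = 3, 5
for j in range(M):
    assert sum(payoff(j, o) for o in product(range(M), repeat=N - 1)) == 0
```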

Sources

  1. https://math.stackexchange.com/questions/277120/mixed-strategy-nash-equilibrium-of-rock-paper-scissors-with-3-players?rq=1
  2. https://math.stackexchange.com/questions/3229686/generalizing-rock-paper-scissors-game
  3. Gergely, M. I., & Czibula, G. Policy-Based Reinforcement Learning in the Generalized Rock-Paper-Scissors Game.
This post is licensed under CC BY 4.0 by the author.