Escolar Documentos
Profissional Documentos
Cultura Documentos
INSTITUTO DE INFORMÁTICA
CURSO DE CIÊNCIA DA COMPUTAÇÃO
Porto Alegre
January 2018
UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
Reitor: Prof. Rui Vicente Oppermann
Vice-Reitora: Profa . Jane Fraga Tutikian
Pró-Reitor de Graduação: Prof. Vladimir Pinheiro do Nascimento
Diretora do Instituto de Informática: Profa . Carla Maria Dal Sasso Freitas
Coordenador do Curso de Ciência de Computação: Prof. Raul Fernando Weber
Bibliotecária-chefe do Instituto de Informática: Beatriz Regina Bastos Haro
ABSTRACT
We present a solution method for deception-based games, which are zero-sum games
without perfect information, and apply it to the popular party game The Resistance. The
methods presented include adaptations of the techniques from (KOLLER; MEGIDDO;
von Stengel, 1994), together with novel improvements such as symmetry reductions.
These methods allow us to solve the game with up to 8 players, which would require
the analysis of 8.5 ∗ 1011 game tree nodes, if without symmetry reductions, and even more
without knowledge trees.
RESUMO
1 INTRODUCTION.........................................................................................................9
2 DEFINITIONS FROM GAME THEORY ...............................................................10
2.1 Games and their Players ........................................................................................10
2.2 Strategies..................................................................................................................11
2.3 Nash Equilibria .......................................................................................................11
2.4 Representations of Games ......................................................................................12
3 TECHNIQUES FOR FINDING GAME SOLUTIONS ..........................................15
3.1 Solutions to Games..................................................................................................15
3.2 Solving Perfect Information Games ......................................................................17
3.2.1 Approximation Techniques for Larger Games.......................................................18
3.2.2 Issues to Adapt to Imperfect Information Games ..................................................19
3.3 Approximation Techniques for Imperfect Information Games..........................19
3.4 Solving Imperfect Information Games with Linear Programming ...................20
3.5 The Lemke-Howson Algorithm .............................................................................21
3.6 Adapting to a Linear Programming Model..........................................................25
4 A GROUP DECEPTION GAME ..............................................................................26
4.1 Rules .........................................................................................................................26
4.2 Modeling a Multi-Player Game as a Two Player Game ......................................27
4.3 Size of the Game’s Trees.........................................................................................28
5 STRATEGIES AND THEIR APPLICATION TO SOLVING THE RESIS-
TANCE..................................................................................................................30
5.1 Adapting Games to Two Players............................................................................30
5.2 Adapting Koller’s Dual Method ............................................................................30
5.3 Symmetry Reduction ..............................................................................................34
6 RESULTS.....................................................................................................................37
7 CONCLUSION ...........................................................................................................39
REFERENCES...............................................................................................................40
9
1 INTRODUCTION
The life of any person is riddled with decision-making opportunities, which can
have great impact to all involved. Decision-making is also of great importance to any busi-
ness, requiring plenty of employees, most often called managers, who are tasked solely
with performing it. Many have profited from the rapid technology advances in the latest
years by using these new technologies to help improving their decisions. These advances
have also brought a rise in Artificial Intelligience, which brings with it many unanswered
questions about decision-making, such as on how to evaluate it mathematically.
In order to provide those decision improvements, and to solve those questions, we
can turn to Game Theory. Games can be viewed as mathematical models for decision-
making interactions. They range from very simple, such as the children’s game "Rock-
Paper-Scissors", to real-life examples, such as auctions, and to complex games, such as
Go, Chess and Poker.
Artificial Intelligence has advanced to the point of being much better than any
human player on games as Chess and Go, and has recently beaten some top Poker play-
ers as well. However, it still has many questions left unanswered, such as on how to
adapt solutions like AlphaGo, which masters a wide variety of games, such as Chess and
Go, to games where deception is involved, like Poker, where the best AIs are based on
specifically restricted models of the game.
This work is an analysis and implementation of the state-of-the-art of methods
for solving games played by two agents, where there is no cooperation possible, and
where deception is involved. These restrictions, other than the possibility of deception, are
particularly useful for the definition of an “optimal strategy”, as common shortcomings
of the classic notion of Nash Equilibrium don’t manifest in these games.
We apply known combinatorial optimization techniques together with novel im-
provements in order to solve the game The Resistance, a popular party game based on
deception and asymmetry of information.
10
In this chapter we will present our definitions of games and strategies, which are
the same as in Game Theory literature, such as in (KOLLER; MEGIDDO, 1992).
Strategies
Nash Equilibria
In order to provide optimal strategies for a game, we can study a very important
concept from game theory, the Nash Equilibrium (NASH, 1951).
Definition 1. Let Pi (S1 , S2 , . . .) be the payoff for player i if players use strategies from
the set S1 , S2 , . . ., that is, player j uses strategy Sj . This set of strategies is a Nash
Equilibrium if ∀i, ∀Si0 , Pi (S1 , S2 , . . . , Si−1 , Si0 , Si+1 , . . .) ≤ Pi (S1 , S2 , . . .)
This means that a Nash Equilibrium is a set of strategies, one for each player, such
that no player can improve his own payoff by changing just his own strategy. This is
an indicator of optimality, in a way, since it means that each player is playing their best,
assuming the other players’ strategies are known. This can also be seen as a saddle point
in the payoff function for the first player, since the payoff cannot increase when moving
along the possible strategies for the first player, and it cannot decrease along the second
player’s strategies. Nash demonstrated that, for any game with a finite amount of pure
strategies, there is at least one Nash Equilibrium, though it may be composed of mixed
strategies.
One basic example is the game of Rock-Paper-Scissors. In this game, both players
choose simultaneously between three actions, called Rock, Paper, and Scissors. If both
players choose the same action, the game is a tie. Otherwise, if one player chose Rock
and the second chose Paper, the player that chose Paper wins; similarly, a player choosing
Scissors wins against a player choosing Paper; and Rock wins against Scissors. We then
model payoffs, again, as being 0 for ties, 1 for a winning player, and -1 for a losing player.
There is no optimal pure strategy for this game, since, for any of the three that
12
exist, there is another that beats it, and, since the game is symmetric, the optimal payoff
should be 0, not -1. This means that any Nash equilibrium must consist in a pair of
1
mixed strategies. In fact, we have only one: where both players play each action with 3
probability each. You can note that this strategy means that, no matter what the opponent
1 1 1
does, there is 3
chance of a tie, 3
chance of a win and 3
chance of a loss, for an average
payoff of 0.
Note that optimal strategies are not always this obvious. For instance, if the game
1
was adapted so that a win of paper against rock had payoffs of 2
and − 12 rather than 1
and -1, the Nash Equilibrium would shift to playing paper with 40% chance, rock with
40% chance, and scissors with 20%. Note that this strategy is a Nash Equilibrium since
paper wins with 40% chance, ties with 40%, and loses with 20%, for an average payoff
of 40% ∗ 12 + 40% ∗ 0 + 20% ∗ −1 = 0, while rock has an average payoff of 40% ∗ − 21 +
40% ∗ 0 + 20% ∗ 1 = 0, and scissors 40% ∗ 1 + 40% ∗ −1 + 20% ∗ 0 = 0.
Mixed strategies also have a payoff of 0, due to the linearity of expectation: if and
opponent plays rock, paper and scissors with probabilities r, p and s, his expected payoff
is r ∗ P (Rock, S) + p ∗ P (P aper, S) + s ∗ P (Scissors, S) = r ∗ 0 + p ∗ 0 + s ∗ 0 = 0.
In fact, this analysis applies to any game, meaning that we can find the best response to a
strategy by searching only among pure strategies. This will be formalized in Theorem 2.
Representations of Games
Figure 2.2: Example of extensive form more compact than its normal form
Extensive form games may be converted to normal form, with each pure strategy
being a set of choices that say, given each possible game state, which action the player
should pick. Note that this may be exponential on the size of the tree, as exemplified on
Figure 2.2: for that game, the second player has 8 pure strategies, even though the tree
only has 6 leaves. A pure strategy there consists in choosing which of 2 actions to take
for each of 3 possible states. This simple example can be trivially extended to contain 2n
pure strategies with only 2n leaves.
15
In this chapter we present what are optimal strategies for two-player zero-sum
games, through the use of theorems commonly known in Game Theory, and we present
state-of-the-art techniques to find these strategies.
Solutions to Games
In this work we limit our study to two-player zero-sum games, since in these games
there is an unambiguous definition of “optimal strategy”:
Definition 2. P (S1, S2) is the payoff for the first player if he plays with strategy S1 and
the second player plays with strategy S2. In this case, the payoff for the second player
will be −P (S1, S2), since we are considering only zero-sum games.
Proof. If we have two equilibria (S1, S2) with payoff p and (S10 , S20 ) with payoff p0 ,
from the definition of Nash Equilibrium we know that
and that
−P (S1, S20 ) ≥ −P (S10 , S20 ) = −p0
thus p ≤ p0 . Similarly,
P (S10 , S2) ≤ P (S1, S2) = p
p ≥ p0
Thus p = p0
With this we may call any strategy that belongs to a Nash Equilibrium as “opti-
mal”, since it gives the player an optimal payoff if the opponent has an optimal strategy
too, and this payoff may only increase if the opponent uses another strategy. It is possible
for the player to obtain a better payoff, but only if it is known that the opponent is not us-
ing an optimal strategy. In this work, we assume that is not the case, since, while there are
16
biases in human play, even for simple games such as Rock-Paper-Scissors, as explored by
(WANG; XU; ZHOU, 2014), they can be somewhat corrected by aware players, and fully
corrected by the use of external tools, such as randomness generators like dice or mobile
apps, and can be assumed to not exist in non-human players, like artificial intelligences.
Useful properties of optimal strategies include that they work against unpredictable
opponents, and that they work against opponents that can predict your strategy.
Theorem 2. A strategy is optimal if it achieves at least the optimal payoff against each
pure strategy.
Proof. Since the payoff function against a fixed strategy is a linear function on the prob-
abilities of each pure strategy, the payoff for a mixed strategy is the weighted average of
the payoffs of the pure strategies it uses. Thus, it cannot surpass the payoff of the best
pure strategy it uses.
Thus we may, when validating the optimality of a solution, only check its outcome
against pure strategies, without having to consider mixed ones.
Definition 3. Given a strategy S1 , we call Best Response any strategy S2 = arg max P2 (S1 , S2 ) =
arg min P1 (S1 , S2 ).
In particular, from the previous theorem we know that there is always a pure best
response to any strategy. Also, from the definition, all strategies in a Nash Equilibrium
are best responses to each other.
Theorem 3. The Nash Equilibria define a continuous, convex surface in the strategy
space.
Proof. If we have two optimal strategies, S1 and S10 , by playing the first with some
probability p and the second with probability 1−p we arrive at another Nash Equilibrium,
since the payoff against each pure strategy is at least the optimal payoff.
in the concept of Nash Equilibrium. In this case, the choice is between selfishness and
altruism, which is not a mathematical choice, but rather a philosophical one, so we keep
these games out of the scope of this work.
There are many methods to find optimal strategies, each better suited to different
kinds of games.
For games with perfect information, that is, games in extensive form, and where
every knowledge partition contains a single node, there is always at least one optimal and
pure strategy, which can be found through backward induction (FUDENBERG; TIROLE,
1991).
Definition 4. The Subgame Payoff for a game state is the payoff function for the game
that starts at that state.
For a perfect information game, the optimal subgame payoff for a state is the
expected payoff if both players play optimally starting from that state. This value can be
found recursively, with backward induction:
With this, we can see that the optimal strategy for perfect information games is to always
follow the action that maximizes the subgame payoff, or that minimizes it if you are the
second player, since by this recursive definition above this set of strategies is a Nash
Equilibrium.
18
Figure 3.1: Part of the game tree for Tic-Tac-Toe, with subgame payoffs noted
In Figure 3.1 we have a small portion of the result of applying this algorithm to the
popular game of Tic-Tac-Toe. The game consists in two players, one placing X’s on the
board, and the other placing O’s, alternately, and without the ability to overwrite previous
placements. The winner is who manages to place three of their symbol in a straight line
first, including a diagonal.
On games where the outcome is either a win, a tie or a loss, and with no random-
ness involved, such as chess, go, or tic-tac-toe, this subgame payoff is always 1, 0 or -1,
representing respectively that, from that state, the first player can force a win, that optimal
play will lead to a tie, and that the second player can force a win.
That method can find the optimal strategy on any perfect-information game, as
long as the game tree is not too large. For instance, checkers (SCHAEFFER et al., 2007).
However, many games such as Chess or Go have trees that are too large to be computed.
In those cases, it is not possible to explicitly compute a complete, perfect strategy. How-
ever, there have been increasing advances in obtaining approximate strategies for these
games. More specifically, usually a heuristic is used to approximate the subgame payoff,
and then this approximation is improved through techniques such as Alpha-Beta Prun-
ing (KNUTH; MOORE, 1975) and Monte Carlo Tree Search (TESAURO; GALPERIN,
1997). These techniques perform the propagation of these approximate subgame payoffs,
but only in part of the game tree, such as only for the immediate children of the current
19
node, since the whole tree would be too big to be explored normally.
While chess has very optimized engines such as StockFish (ROMSTAD et al.,
2017), for the game of Go, Google has developed AlphaGo (SILVER et al., 2016), a
neural-network based AI trained via reinforcement learning, with the goal of being more
general purpose, that is, being able to solve perfect information games in general. Al-
phaGo has recently been adjusted into AlphaZero (SILVER et al., 2017), which can play
other games, such as Chess, and has quickly surpassed by far previous champion Chess-
specific engines.
Even though that may be the case, there is an approach based on machine learning
that computes approximate solutions to games without perfect information: the Counter-
factual Regret Minimization (CRM) method (ZINKEVICH et al., 2008). It computes a
function that, given the current game state and player knowledge, returns the probability
of choosing each action. The core of the algorithm is how it improves a strategy with rein-
forcement learning (SUTTON; BARTO, 1998). After having it play against an opponent,
who is also being trained with this strategy, a regret value is calculated for each action
that was not taken. To calculate the value, the game is replayed with the only difference
being that the action is taken instead of not taken. The regret is, then, the difference of
the payoffs of these two games, or 0 if taking the action lowered the payoff. The strategy
then is shifted towards picking actions with higher regret value, with a shift proportional
to the regret value.
Fixpoints for this strategy shifting algorithm are Nash Equilibria, since they cor-
20
respond to strategies where all regrets are 0, meaning no action could be taken that would
improve the player’s payoff. However, since regrets are computed when playing one pure
strategy at a time, not all possible paths of a mixed strategy, in most games these fix-
points don’t exist, since they would mean we have an optimal pure strategy. Rather, the
algorithm usually converges to a strategy where the shifts cancel out, on average. This
can give an intuition that CRM is not very efficient, since the computed function never
properly converges. Indeed, only recently has this method been optimized enough to be
competitive against human players in a limited variant of poker (BOWLING et al., 2015).
The main reason why these approximative methods don’t work as well for imper-
fect information games is the fact that the exact solution methods for imperfect informa-
tion games are very different from the methods for solving perfect information games.
While the latter use essentially a dynamic programming method, the former have to take
into account that most subgames may interfere with each other’s solutions, and use linear
programming models instead.
The basic method for finding an exact solution is taking the game model in normal
form, and then translating Theorem 2 into linear programming. Essentially, the formula-
tion tries to maximize the final payoff, and restricts this payoff to be the minimum payoff
against any pure strategy.
Let P si be the set of pure strategies for player i. Then the formulation becomes
max z
subject to:
X
∀p2 ∈ P s2 , S[p1 ]P (p1 , p2 ) ≥ z
p1 ∈P s1
X
S[p1 ] = 1
p1
S[p1 ] ∈ R+ , ∀p1 ∈ P s1
This formulations finds z, the optimal payoff, and S, the strategy for the first player.
While this provides a very good solution for normal form games, it is not as ef-
ficient for extensive form games, since, as already observed, the normal form formu-
21
lation usually has size exponential on the size of the game tree. However, (KOLLER;
MEGIDDO, 1996) demonstrates the existence of some solution with linear support, and
in fact there are methods for finding more compact solutions.
The basic algorithm for studying two player games that may not be zero-sum is
the Lemke-Howson algorithm, which can be adapted to extensive form zero-sum games,
as demonstrated in (KOLLER; MEGIDDO; von Stengel, 1996). It works similarly to
the Simplex algorithm, pivoting pure strategies that are best reponses to the opponent’s
current strategy into the current solution, while pivoting out strategies that are not best
reponses anymore.
The model used in that work involves modeling a mixed strategy not by assigning
probabilities not to pure strategies, but rather by assigning probabilities p to the leaves of
the game tree. Then, the probability p of an internal node is defined by the probabilities
of its children, as will be described later in this section. The strategy represented by these
probabilities means that, when reaching some node n, each possible action corresponds
to a child node c, and the probability that the strategy picks this action is p(c)/p(n).
In order to apply this formulation, the game tree is converted into two trees, one
for each player, corresponding to their knowledge. Then the strategy of the first player
affects only the probabilities in his tree, the strategy for the second player affects only the
second tree probabilities, and the probabilities of random events only affect the payoffs
on the leaves.
Definition 5. The knowledge tree for a player is a tree where each node belongs to either
the set of action nodes, A, the set of information nodes, I, or the set of leaf nodes, L. This
tree is defined by having a bijection between its nodes and the original game’s knowledge
states for the player. This means that A represents the states where the player may take an
action, with a bijection between children and possible actions. I represents states where
it is not the player’s turn, with a bijection between children and possible information that
the player may receive at that state. And L is the set of states when the game is over. We
represent the children of a node n as the set C(n).
Each node in this tree corresponds, thus, to a knowledge partition in the original
game tree. This formulation uses the fact that players have knowledge of their actions and
22
of the history of their perception. The study of games where players don’t have perfect
recall, meaning that they may end up at the same knowledge state from more than one
knowledge path, that is, meaning that the knowledge does not follow a tree, is out of the
scope of this work, specially since it is proven NP-complete by (KOLLER; MEGIDDO,
1992).
The policy for having random event probabilities only affect leaves, as mentioned
earlier, simplifies the modeling, since then there is no need to consider how these events
interact between the two knowledge trees. It makes it so that the probability of a node is
the sum of the probabilities of its children, if the node is in A, or the probability is equal
to each of its children’s probabilities, if the node is in I. Without the policy for random
events, these values could be calculated in much more complex ways, due to concerns
with conditional probabilities. By only affecting the leaves, we can model the final payoff
for each pair of leaves, one leaf from each tree, and make it not just the original payoff for
the corresponding end state, but that payoff multiplied by the product of the probabilities
that random events led to that state. Note that this payoff is thus not being affected by
either player’s strategies, and is 0 if those leaves cannot be reached simultaneously, as that
would mean some random event probability would be 0. We can summarize these rules
in the following formulation:
X
max min S1 [l1 ]P (l1 , l2 )S2 [l2 ]
S1 S2
l1 ∈L1 ,l2 ∈L2
subject to:
X
∀a1 ∈ A1 , S1 [a1 ] = S1 [c1 ]
c1 ∈C(a1 )
max min − 0.5S1 [0P ] + 0.5S1 [1P ] + 0.25(S1 [0BF ] + S1 [1BF ])(S2 [0F ] + S2 [1F ]
S1 S2
In this formulation, we labeled every node with the initials of the edge labels in
the path from the root to the node. While the optimal strategy is quite trivial, being to
always bet or call if you have a good card and to never bet nor call if you have a bad card,
24
Figure 3.2: The tree for the poker-like game described, split into the points of view for
both players
the analysis required to demonstrate that fact is not so straightforward, but is automatic
when using the LCP formulation above.
Lemke-Howson, as the simplex method, is not suitable for very degenerate sys-
tems, and is also quite complex to be optimized when dealing with larger games. (KOLLER;
MEGIDDO; von Stengel, 1994) has solved these issues by adapting the linear comple-
mentarity model into a linear programming model, thus enabling optimizations achieved
in linear programming that are not yet available for linear complementarity, since the
latter is not nearly as studied as the former.
The adaptation found involves the use of the dual program for the restrictions of
the second player. By fixing the first player’s strategy, we have that the second player’s
strategy should satisfy a normal minimization linear program. By taking the dual of that
program, it becomes a maximization program, which stays linear even after making the
first player’s strategy into variables again. That is, this modified system contains variables
for the first player’s strategy and for the dual of the second player’s, and is a purely
maximization linear program, meaning it can be solved with simplex or other methods
for normal linear programs. We provide the full derivation and a more straightforward
formulation of this method in Section 5.2.
26
In this work, we solved the party game The Resistance, since it is a somewhat
popular game, is based on knowledge economics, and is not too complex.
Rules
We can see that the strategy for the spies is to sabotage missions as long as that
doesn’t give the resistance too much information, and the resistance’s strategy is to agree
on missions that avoid players who are probably spies. Note that mission assignments are
dominated by the resistance, since they form the majority of players.
Even though this game has more than two players, we can model it as a two-
player zero-sum game, thus allowing us to apply the methods discussed in Chapter 3 to
find optimal strategies to it.
We consider that we have one player who represents the resistance and the other
represent the spies. Then, the mission voting can be controlled by the resistance player,
while the spies player decides on whether to sabotage each mission or not, but cannot
decide to sabotage a mission where not enough spies were assigned.
This is a simplification of the game, as it assumes that all players are coordinated
in their strategy, and that the spies can also make synchronized decisions without alerting
the resistance. To be more precise, we can simplify the game in this manner by having all
players agree on a strategy before the game starts. In a real scenario, this is often achieved
after a series of games, where strategies can converge after enough discussions among the
group of players.
Then the resistance can make their decisions based on agreed protocols, such as
die-rolling, coin-tossing, or cryptographic exchanges. This makes the voting aspect of
the game irrelevant, since all resistance players would vote on the decision they agreed
upon, and any spies that vote on another option would have no effect on the final result,
but would improve the knowledge of the resistance. As for the spies’ strategy, they can
just sabotage missions the two-player optimal strategy says them to.
There is an important problem, however, when more than one spy is chosen to
a mission. In that case, optimal 2-player strategy would dictate that, if they wanted to
sabotage the mission, exactly the amount necessary would do it, and if not, no spy would
sabotage the mission. So they face two issues: when not all chosen spies would need to
sabotage the mission, they have to know who would do it. This can easily be prearranged,
before the game starts. The second issue is when more than one spy would need to sab-
otage the mission, but their strategy dictates that they should do it but not with 100%
probability, meaning they would need to take a synchronized decision. This can still be
28
solved, albeit with more complex pre-game preparations, such as deciding on commu-
nication protocols that would not be detected by the other players. We can also solve
the model ignoring this problem, then adapt the strategy found accordingly, though this
adaptation falls out of the scope of this work.
In Figure 4.1 we can see how the game scales with the amount of players. We can
note, particularly, that the size of the game grows exponentially as rounds pass, with the
exception that many game stories finish after the 4th round. We can also see that the game
size increases super-exponentially with the amount of players. Indeed, we could not gen-
erate any results for instances with more than 8 players, as even the reductions presented
in Section 5.3 were not enough to make those instances fit in our server’s memory.
29
Our strategy for solving games is to first model them as a two-player zero-sum
game. With a multi-player game, such as Resistance, we can, as discussed, group to-
gether players who aim to cooperate, and who are competing against the other team. We
have observed that, in general, this is possible, albeit with varying degrees of faithfulness
between the model and the actual game, depending on the game. For instance, we could
model a political scenario, if we wanted to have an idea of a strategy, by limiting it to
two competing parties, at the cost of not taking into consideration bipartisan agreements,
betrayals, ethics, among many other factors.
We then adapt the method from (KOLLER; MEGIDDO; von Stengel, 1994), in
order to have a more straightforward formulation, thus enabling a more effective imple-
mentation and future improvements. We do so by figuring out what kind of constraints
may arise from performing the dual.
The first step from the cited work is to isolate the second player’s variables. Thus
we have the partial formulation:
31
X X
min ( S1 [l1 ]P (l1 , l2 ))S2 [l2 ]
l2 ∈L2 l1 ∈L1
subject to:
X
∀a2 ∈ A2 , S2 [a2 ] = S2 [c2 ]
c2 ∈C(a2 )
First we note that the variables in the primal model correspond to nodes, and
constraints may be of two kinds: representing an edge where the parent is an information
node, and representing a choice node and its children. While every choice node only
appears in two constraints, its parent’s and its own, information nodes appear in one
constraint for every neighbor edge.
When taking the dual, these roles reverse. Each primal constraint from a choice
node, since it contained many primal variables, becomes a dual variable that appears in
many dual constraints. Meanwhile, an information node primal variable, since it appeared
in many primal constraints, becomes a dual constraint that contains many dual variables.
We can observe that in the dual:
max M [Root2 ]
subject to:
X
∀l2 ∈ L2 , M [l2 ] ≤ S1 [l1 ]P (l1 , l2 )
l1 ∈L1
M [p2 ] ∈ R, ∀p2 ∈ A2 ∪ I2 ∪ L2
When analyzing these new constraints and variables, we can see that an informa-
tion node dual constraint is a sum, meaning the dual value for the corresponding node
is the sum of its children. Also, the dual constraints where a choice node dual variable
32
appears form a minimum constraint, meaning a choice node dual value is the minimum
value of its children.
As for the root and leaves, in the primal the root appeared in the single constraint
involving a constant (root = 1), and the leaves are the ones that appeared in the final value
definition. In the dual, these two roles are also reversed: the root value becomes also the
final value of the system, and the leaves have values assigned to them.
The complete interpretation of this dual is that, while in the primal we were inter-
ested in the optimal probability a strategy should choose for each node, S[node], in the
dual we only calculate the best payoff possible when we reach each node, M [node]. If a
node represents a choice, the player may choose the child node that has the best payoff,
thus the value of a choice node is the minimum of its children. And if a node represents
waiting for information, the child node that is followed is taken at random, thus the value
of the parent node is a weighted average of the values of its children. However, we already
weigh the nodes accordingly, since random probabilities and the other player’s strategy
already affect the leaves’ values, thus this weighted average is translated into simply a
sum.
We can return the first player’s restrictions to the system now, since we know that
the dual we just modeled finds the same final values as its primal (WOLFE, 1961):
max M [Root2 ]
subject to:
X
∀a1 ∈ A1 , S1 [a1 ] = S1 [c1 ]
c1 ∈C(a1 )
S1 [p1 ] ∈ R+ , ∀p1 ∈ A1 ∪ I1 ∪ L1
M [p2 ] ∈ R, ∀p2 ∈ A2 ∪ I2 ∪ L2
33
So the final system optimizes “probability” variables for one player, S, and “pay-
off” variables for the second, M . The constraints for the first player are:
Then we have the second player’s constraints on his leaves: they must have payoff equal
to the expected payoff if the second player reached that leaf, but again the random event
probability policy of affecting only leaves, as described in Section 3.5, also affects this
payoff. After that, these leaf payoffs are propagated towards the root of the second
player’s tree, through the other second player constraints:
In the linear system, these constraints are translated as M [n2 ] ≤ f (C(n2 ) for f repre-
senting sum or minimum, respectively, which is not the equality M [n2 ] = f (C(n2 ) one
could expect. These two formulations, however, are equivalent, since, when maximizing
the final payoff, these variables end up with the maximum value they can get, meaning
they will be equal to the limit imposed on them.
While this formulation and similar can be found in other works, such as (BOSAN-
SKY et al., 2014), it is important to link it to the dual method from (KOLLER; MEGIDDO;
von Stengel, 1994), since this brings new points of view to this formulation, and may help
with improving it and its usages. For example, in this work we are interested in obtain-
ing the optimal strategies for both players, but this simplified formulation only obtains
the strategy S of the first player, giving only the much less useful payoff values M for
the second. This would mean that both this system and its dual would need to be solved
separately in order to obtain the strategies for both players, S1 and S2 . However, by
understanding the duality in the formulation, we may take the dual constraints for the sec-
ond player, which use variables M , and calculate their associated primal variables, thus
obtaining the strategy S2 for the second player, without the need for a complete reformu-
lation and re-solving of the system. This improves the efficiency of the solution by more
than double, since solving one formulation of the system may be much faster than solving
its dual, if the game is quite asymmetric, as is the case for Resistance.
34
Symmetry Reduction
One important decision when solving a game is on how to treat symmetry. Al-
phaGo, for example, takes all symmetries of its training sets and uses them as part of the
training set too, growing it by a factor of 8. In this work, however, our bottleneck is not
that there is too little data being produced, but that there is too big of a game tree being
generated. With that in mind, we take the opposite stance to AlphaGo, pruning symmetry,
in order to reduce the size of our game tree. We do so with two techniques: by generating
the knowledge trees directly, without generating the complete game tree; and by merging
equivalent nodes and subtrees.
The strategy of generating knowledge trees directly is quite straightforward: in-
stead of generating the complete game tree, and only then using that tree to generate the
two knowledge trees, it is possible to skip that first step, at the cost of a more complex
implementation. The main benefit, and also the main challenge, of generating knowledge
trees directly, is that events not controlled by the player only manifest into the creation of
new nodes when they give new information to the player.
This means that, for example, in a card game where both players start with the
private knowledge of their N cards, the full game tree starts with one node for each com-
bination of 2N cards, while knowledge trees only consider the combinations of N cards
dealt to the player that owns the tree. Since the amount of combinations grows exponen-
tially, cutting the amount of cards by half represents a great improvement in the amount
of nodes generated. This technique is not immediately useful for the game of Resistance,
since it only reduces the size of knowledge trees of players who don’t have too much in-
formation. This means that it reduces the size of the resistance player’s tree, but the spies
still end up with a knowledge tree of size very similar to the full game’s tree, since they
have complete knowledge of the game’s state.
We can, however, reduce the spies player’s tree by noticing that he has many equiv-
alent subtrees in his knowledge tree, and so does the resistance player. We define two sub-
trees as equivalent if they have an ismorphism, with possible relabeling of actions. Note
that this isomorphism also requires the payoffs for leaf pairs to also match up. For Resis-
tance, these isomorphisms are relabelings of the players. For example, if the resistance
player sends players 1 and 2 into the first mission, that brings the same results as if he had
sent players 1 and 3, if we swapped the names of players 2 and 3. We can then assume
that there is an optimal strategy where every equivalent action is taken with the same
35
Figure 5.1: Comparison of knowledge tree sizes with symmetry reduction on Resistance
with 8 players
Figure 5.2: Comparison of total knowledge tree sizes with symmetry reduction on Resis-
tance
37
6 RESULTS
In order to solve the linear programming models derived form the game’s descrip-
tion, we used the GLPK solver, and the CPLEX solver. For GLPK, we generated the
model in its MathProg language, while for CPLEX we used its C++ API. GLPK was not
able to handle very large instances of the game, being very slow to solve the models, so
its numbers were not interesting, thus we only present the results with CPLEX.
The tests were run in a server equipped with the 8-core FX-8150 AMD processor,
which runs at 3.6GHz. It has 32GB of memory, Ubuntu 16.04 (xenial), on Linux 4.4.0.
CPLEX was much more efficient than GLPK, by using multiple threads, and using
a barrier method. The barrier method is specially useful because the models generated by
our techniques are quite degenerate. Indeed, only one independent variable is not 0, in the
constraint S[Root] = 1.
With these results we can see the advantage of obtaining the strategy of the second
player via dual variables. Indeed, the time taken to solve the model with roles reversed
was up to 9 times faster, due to the asymmetry between the two players, in information
and strategy, that is inherent to this game.
Table 6.2: Resources used to solve Resistance, with first player controlling the resistance
Players 5 6 7 8
CPU 220% 217% 249% 200%
Time 2.820s 13.620s 7m 21.200s 9h 47m 32s
Memory 311MB 1,057MB 7,491MB 31,720MB
Iterations 31,114 118,370 1,019,223 7,216,077
Table 6.3: Resources used to solve the dual of Resistance, that is, with first player con-
trolling the spies
Players 5 6 7 8
CPU 229% 226% 365% 462%
Time 1.910s 10.080s 2m 58.650s 1h 7m 54s
Memory 214MB 691MB 5,352MB 31,748MB
Iterations 3,156 8,156 265,379 1,864,616
We can note that, in these results, the spies have an advantage. In games with
human players, however, this difference is usually mitigated by the fact that it is quite hard
38
Figure 6.1: CPLEX iterations to solve Resistance, in both normal and dual form
for a human spy player to never act in a suspicious manner, as noted by (DEMYANOV et
al., 2015), and also probably by the lack of knowledge of optimal strategies and by the lack
of synchronization among spies, as discussed in Chapter 4, however these considerations
fall out of the scope of this work.
39
7 CONCLUSION
We have solved the game The Resistance for up to 8 players, which is an imperfect
information game with 8.5 ∗ 1011 knowledge nodes. This was achieved through a com-
bination of known techniques, specially the ones described in (KOLLER; MEGIDDO;
von Stengel, 1994), with novel methods, particularly the symmetry reduction method we
introduce.
Optimal solutions for games have quite a few uses, including the modification of
protocols in order to improve their expected results, as studied in (LIU; MARSCHNER,
2017), and the development of artificial intelligences to play or aid the play of the situa-
tions from which the solved models were extracted from.
In our work, we found a partial solution for the problem of cross-interfering game
subtrees in imperfect games, albeit requiring very restrictive assumptions, which is the
symmetry reduction. Indeed, we found that if multiple choices lead to equal sub-game
trees, other than permutations and relabeling, there is always some optimal solution where
they receive the same probability. This result, if adapted into a generic solution, may
prove important for the adaptation of algorithms usch as AlphaGo to imperfect infor-
mation games, as this is a major issue faced in the transition from perfect to imperfect
information.
We have also attempted to implement a method similar to the one described in
(BOSANSKY et al., 2014): by restricting the game model temporarily, it is possible to
obtain an optimal strategy for a simplified version of the game. Then this simplification
can be expanded iteratively until the strategy found corresponds to a good approximation
of an optimal strategy. Our implementation was inconsistent, meaning we did not obtain
results from this approach, but it still seems very promising, as it heavily reduces the
game tree sizes being worked with. We have modified the cited approach by improving
the algorithm for the tree expansion. In particular, we used the strategy from (KOLLER;
MEGIDDO, 1992), which efficiently discovers best responses, that could then be used as
new expansion branches.
40
REFERENCES
WANG, Z.; XU, B.; ZHOU, H.-J. Social cycling and conditional responses in the
rock-paper-scissors game. Scientific reports, Nature Publishing Group, v. 4, p. 5830,
2014.