
Mining game logs to create a playbook for unit AIs

Daniel Wehr
Department of Computer Science
University of Calgary
Calgary, AB, Canada, T2N 1N4
Email: dkwehr@ucalgary.ca

Jörg Denzinger
Department of Computer Science
University of Calgary
Calgary, AB, Canada, T2N 1N4
Email: denzinge@cpsc.ucalgary.ca

Abstract—We present a method for mining game logs for plays, sequences of actions for a group of units achieving an objective with a high likelihood and in many logs. The mining moves through a log backwards, identifying states that achieve the objective and taking this state and certain surrounding ones as a play candidate. After filtering out irrelevant information and too costly candidates, we cluster similar candidates and abstract the candidates in large enough clusters into a play. We applied these general ideas to the game Battle for Wesnoth and our evaluation showed that we are able to consistently mine successful plays, some of which are also often applied in logs that were not used for the mining.

I. INTRODUCTION

Analyzing logs of systems that either record the interactions of the components of the system or the interactions of the system with users and other systems is not just useful for debugging of such systems. Analyzing the internal interactions can identify system performance optimizations by, for example, identifying bottlenecks or unintended interaction sequences. Analyzing the interactions with the outside can identify attempts to compromise the system, but also the usage intentions and preferences of human users. For nearly all the intended analyses, large or multiple logs are needed to be sure enough about identified patterns, which makes the use of automated learning or mining necessary.

In the area of games we have also already seen approaches to mine game logs, for example to find clusters of player behavior (see [3]) or to identify aim robots (see, for example, [2]). While these approaches concentrate on the outside interactions of a game, in this paper we present an approach for mining game logs that aims at the internal interactions of game units with the goal of improving their cooperation.

More precisely, we aim at finding what we call plays: sequences of actions for a group of units (or NPCs) that accomplish a given directive with a high probability of success and a high probability of being triggered in a single game (preferably several times). Examples of plays are action sequences by a group of game units that kill a particular type of enemy in a particular terrain, sequences that result in taking a city under given circumstances, or a particular play from a particular situation in a team sports game that results in a score. [1] showed that adding plays to the choices for an AI can lead to substantial improvements when trying to learn a way to beat a particular other AI.

Our proposed method for learning plays from game logs works in several phases. In the first phase, we scan a log backwards for occurrences of the intended directive and store the information of the log at that time, the information in the following state (in the game) and the information of a given number of game states before the found state. This constitutes a play candidate. In the second phase, we eliminate from a play candidate all irrelevant information, while the third phase filters out candidates that achieve the directive at too high a cost. The following phases look at the whole set of candidates that made it through the first three phases. Phase four clusters the candidates according to a game-specific similarity function and, finally, phase five generalizes the candidate plays in each cluster into a play for the game.

Since nearly every phase requires specific information about the particular game the logs are coming from, we present in this paper the method applied to the game Battle for Wesnoth (see [9]). Our experiments, using data mining evaluation techniques, show that we are consistently able to find plays that are occurring in many logs and that are often successful, even for a harsh definition of "many" and "often".

This paper is organized as follows: After this introduction, in Section II we present a short description of the game Battle for Wesnoth and a unit/agent oriented view of the game inspired by [6]. Section III first provides a definition for what we consider a play and then presents the different phases of mining plays from game logs. In Section IV we evaluate our method by mining plays from a set of training logs and then checking a set of test logs for instantiations of the mined plays. Section V discusses related work and Section VI provides concluding remarks and ideas for future work.

II. BASICS

In this section, we will first present a basic description of Battle for Wesnoth and then present a detailed view on the game from the perspective of an individual unit (agent), inspired by [6], which will be the basis for the plays we want to mine.

A. Battle for Wesnoth

Battle for Wesnoth is a turn-based strategy game where the player controls an army of units. Additionally, the game includes an economic component that allows for the creation of new units during a game, but we are not using this aspect of the game in this work. We are focusing on the two-player local scenario mode where normally a human player battles an AI player, but we have AI players battle other AI players for the creation of our game logs (see Section IV).

Battle for Wesnoth is played on a map consisting of hexagonal tiles. Each tile is a composite that can consist of terrain types such as rivers, forests, and mountains or structural areas such as bridges, castles, and villages. Any unit entering a tile "pays" a movement cost which is based both on the type of terrain and the type of unit. The tile a unit is on also provides offensive and defensive bonuses, based on the type of unit. For example, Elves have a defensive bonus for being in forest tiles. There is also a day/night cycle that provides advantages and disadvantages to different units. Finally, tiles with a village on them provide healing to a unit that rests on them.

Each player in a Battle for Wesnoth game controls an army consisting of one leader unit and many follower units. A unit can either move, attack an enemy unit, or do nothing. Each unit has an allotted number of movement points which it uses to move across the tiles. Because different tiles have different movement costs, the distance that a unit can travel often varies. If a unit attacks another unit, it forfeits all of its movement points and cannot take any further actions that turn; therefore, if a unit needs to be moved to a different tile it must perform the move action prior to performing an attack action.

A move action from one tile to another is not guaranteed to succeed because the unit can get ambushed by stealthy enemy units, in which case the unit is stopped at the ambush tile. An attack action is not guaranteed to succeed because the success rate is determined by the type of weapon used, the tile the enemy unit is on, the tile the attacking unit is on, and the day/night cycle, all of which combine to form an attack success percentage ranging from 40% to 80%. Additionally, under certain circumstances an attacked unit can retaliate, wounding or killing the attacker. All of this makes Battle for Wesnoth non-deterministic in that actions taken by a unit have a chance to fail and may not change the game state at all.

A game of Battle for Wesnoth consists of a sequence of turns. Each turn, a player may act for their side while the opponent waits. These roles are reversed when the acting player indicates the turn is over. A player wins the game by defeating the opponent's leader unit. In the next subsection, we will provide a more formal view on Battle for Wesnoth and the units.

B. An agent-oriented view of Battle for Wesnoth

In order to be useful, plays need to capture the key properties of the game states that are responsible for achieving the intended objective and abstract away other properties. In addition, we would like plays to be usable both for an AI that controls all units and for AIs that control only a single unit (as proposed in [6]). Since, given plays for all single units, it is easy for a central AI to control all these units, we will focus in this paper on representing plays from the perspective of a unit. For both the key properties and the unit (agent) perspective, the concept of observations from [6] is a good starting point.

Seeing a game unit as an agent means that a unit Ag is defined as Ag = (Sit, Act, Dat, f_Ag), where Act is the set of actions it can perform¹, Dat is the set of possible value combinations of its internal data areas and f_Ag : Sit × Dat → Act is its decision function. Of particular interest with respect to plays is the set Sit of possible situations Ag can be in, which is essentially Ag's view of a game state (although, naturally, there is a data area in Dat that would store plays and f_Ag needs to make use of stored plays). [6] proposed to describe an element sit ∈ Sit as a subset of so-called observations Obs, sit ⊆ Obs, that are simple predicates describing a game state from the perspective of the agent, more precisely from the perspective of the tile the agent occupies and other tiles of importance to the agent. An element of Obs is part of the current situation of an agent if this predicate is true in this situation.

¹ Since Battle for Wesnoth can have a unit perform several moves and/or an attack in a turn, we assume that Act also contains all such combinations as individual actions.

We subdivided Obs into three disjoint subsets Obs = Obs_own ∪ Obs_team ∪ Obs_enemy that contain the predicates dealing with the agent itself, the other agents in the team of the agent and the enemy agents/units. As mentioned above, Obs and naturally also the three subsets represent abstractions, which is rather obvious due to the fact that there are no predicates dealing with properties of the map of tiles. In fact, Obs_own deals purely with the health of the agent, containing four predicates: HasFullHP, HasLt75HP, HasMoreThanHalfHP and HasLessThanQuarterHP, with the obvious meanings. Obs_team contains predicates indicating that a tile is adjacent to a friendly unit (AdjacentToFriendly), to the leader (AdjToFriendlyLeader), to a damaged friendly unit (AdjacentFriendlyIsHurt) or to a healer (AdjToHealer). The set Obs_enemy provides information like a tile having the enemy leader on it (HasEnLeader), or a unit that cannot retaliate (EnCan'tRetaliate). The rather complex bonuses and penalties due to terrain are abstracted to "being disadvantaged" (EnIsDisadvantaged). There are additionally predicates about the health of an enemy unit, similar to the ones for friendly units, and also predicates similar to Obs_team about adjacent tiles.

Given a game state representation s containing the positions of all agents and their health together with the game map, a situation description using Obs for each agent can be easily computed using the tile the agent is occupying in s and all tiles visible/reachable by it within the turn. A log for a game is then a sequence s_1, Acv_1^1, s_2, Acv_1^2, s_3, ..., s_{2i-1}, Acv_i^1, s_{2i}, Acv_i^2, ... of game states s_i and action vectors Acv_i^l = (a_{1,i}^l, ..., a_{m_l,i}^l) with a_{j,i}^l ∈ Act_j^l for agent Ag_j^l, with l indicating the team Ag_j^l belongs to (in our case l ∈ {1, 2}) and m_l indicating the number of units in each team.
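To make this observation-based view concrete, the following Python sketch shows one possible way to derive a situation description sit ⊆ Obs for a single agent from a simple game-state record. The predicate names follow the paper; the Unit and GameState structures, the hex-distance helper and the health thresholds are illustrative assumptions made for this sketch, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Unit:
    uid: int
    team: int            # 1 or 2
    hp: float            # current health as a fraction of maximum (0.0 .. 1.0)
    pos: tuple           # assumed axial hex coordinates (q, r)
    is_leader: bool = False
    is_healer: bool = False

@dataclass
class GameState:
    units: List[Unit]

def hex_distance(a: tuple, b: tuple) -> int:
    # Distance on an axial hex grid; used here as a stand-in for adjacency tests.
    aq, ar = a
    bq, br = b
    return (abs(aq - bq) + abs(ar - br) + abs((aq + ar) - (bq + br))) // 2

def situation(agent: Unit, state: GameState) -> Set[str]:
    """Compute sit ⊆ Obs for one agent: Obs_own health predicates plus
    Obs_team adjacency predicates for the tile the agent occupies."""
    sit = set()
    # Obs_own: health of the agent itself.
    if agent.hp == 1.0:
        sit.add("HasFullHP")
    if agent.hp < 0.75:
        sit.add("HasLt75HP")
    if agent.hp > 0.5:
        sit.add("HasMoreThanHalfHP")
    if agent.hp < 0.25:
        sit.add("HasLessThanQuarterHP")
    # Obs_team: properties of tiles adjacent to the agent.
    for other in state.units:
        if other.uid == agent.uid or other.team != agent.team:
            continue
        if hex_distance(agent.pos, other.pos) == 1:
            sit.add("AdjacentToFriendly")
            if other.is_leader:
                sit.add("AdjToFriendlyLeader")
            if other.hp < 1.0:
                sit.add("AdjacentFriendlyIsHurt")
            if other.is_healer:
                sit.add("AdjToHealer")
    return sit
```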

III. MINING LOGS FROM BATTLE FOR WESNOTH

In this section, we will first formally define what a play is and then go through the steps we used to mine such plays from logs from Battle for Wesnoth.

A. Plays formally

The goal of this research is the mining of game logs of Battle for Wesnoth for plays, which are essentially sequences of actions for a group of agents that should achieve a given directive, in our case killing an enemy. Obviously, a play needs to be triggered and, given the non-deterministic elements of the game, a play can fail at any point while it is performed. This leads us to the following definition of a play Play:

Play = (Roles, ActionTr, Trigger).

Here, Roles = (r_1, ..., r_k) is a vector of placeholders for agents that needs to be instantiated by agents from a concrete game state s. ActionTr is a tree structure, where a node has the form

(((r_1, cond_1, a_1), ..., (r_k, cond_k, a_k)), {subtr_1, ..., subtr_o}).

The a_i's are actions (that an agent fulfilling the role r_i has to be able to perform, see later) and the cond_i's are conditions that have to be fulfilled in the current situation of the agent fulfilling r_i. This means that a cond_i is a subset of Obs, respectively of an extension of Obs that allows us to also describe health differences between agents (RelHDiff) in addition to the "concrete" health predicates mentioned before. A subtr_j is, as the name suggests, a sub-tree of ActionTr describing the actions of the agents for the next turns. By having several sub-trees, a play can describe several variants in reaction to what the enemy units are doing. We consider the "length" of a play to be the number of turns an agent would require to perform their role actions, for a path through ActionTr.

Trigger is a collection of conditions that need to be fulfilled for a play to be enacted. For this, we first need a function f_ras that provides for each role r_i ∈ Roles an actual agent Ag_i (note that these assignments will include one agent from the other team, the target of the play). For i ≠ j we require f_ras(r_i) ≠ f_ras(r_j). Determining f_ras is part of the trigger evaluation, and after finding f_ras we have to test all of the conditions cond_i in the root node of ActionTr, whether they are true under f_ras. Additionally, Trigger has to contain additional conditions on f_ras out of a set Obs_role that describe required properties of the agent assigned to a role, like the type of agent (for example, being an archer, or being one of a set of different types).

With this, determining that a play can be activated comes down to finding an f_ras that fulfills Trigger. And enacting the play is done by following a path through ActionTr where the conditions for the agents assigned by f_ras to the roles are fulfilled in the game state reached when a node is reached, and each agent performs the prescribed action from the node.

B. Creating a play candidate

Given a game log s_1, Acv_1^1, s_2, Acv_1^2, s_3, ..., s_{2i-1}, Acv_i^1, s_{2i}, Acv_i^2, ..., play candidates are identified by going through the log backwards and constructing a candidate out of every game state in which an agent of either team was killed and the states (and action vectors) around this state. More precisely, if s_i is such a game state, the initial play candidate will consist of the lmax game states before s_i and the game state s_{i+1} and all action vectors between these states, i.e.

s_{i-lmax}, Acv_{i-lmax}^{l1}, s_{i-lmax+1}, Acv_{i-lmax+1}^{l2}, ..., s_i, Acv_i^{l2}, s_{i+1}.

Note that the possible play created out of this candidate will use the actions of agents from the team that is l1, since this team was responsible for killing the agent (from l2). Due to that, lmax should be an even number, since the first state in a candidate as defined above would not be of interest otherwise (and we accounted already for that by indicating that Acv_{i-lmax}^{l1} is taken by the agents in l1).

C. Eliminating irrelevant information

At this stage, the states and action vectors of a play candidate contain information about all agents of the teams. But usually only a few of the agents of the team that managed the kill really contribute to this outcome, and naturally only the actions and features of those agents should be represented in a play. Therefore, in this step we first have to identify the agents that are relevant.

We start by putting all agents that were killed in s_i (i.e. all agents that were alive in s_i but not in s_{i+1}) into the set of relevant agents. In all plays we mined there was only one such agent. All agents that performed an attack action on the killed agent naturally are also relevant. We will refer to the team of those agents as the attacker team or the friendly team and to the other team as the enemy. All agents that in any of the states of the candidate (except for s_{i+1}) are adjacent to any of the relevant agents so far are also relevant. We also consider all agents relevant that in any of the states of the candidate (again, except for s_{i+1}) are within a given distance reldist of any of the killed agents. This distance is measured by considering the shortest path length (in tiles) between the agents. In our experiments, we used reldist = 3. Finally, we also consider any friendly agent as relevant that in one state of the candidate play moved out of a tile that an already relevant unit moved onto in the same turn.

Given all relevant agents as defined above, we now eliminate from each s_i all agents that are not relevant (creating states s_i*). We also create out of all Acv_i^l new action vectors for the relevant agents only (Acv_i^l*).

D. Eliminating too costly candidates

So far, a play candidate only has established that the actions by the attacker team have resulted in killing a unit of the enemy. While this is part of our goal, a play achieving that while losing a lot of team members from the attacker team, or just while having one or several of the team members near death, is not really a good play and not something we want to use. To filter out the costly play candidates we define the utility util of a play candidate plc = s_1*, Acv_1^{l1}*, ..., s_{lmax}*, Acv_{lmax}^{l2}*, s_{lmax+1}* as follows²:

util(plc) = Σ_{j=1}^{lmax/2} ( enDg(s_{2j}*, s_{2j-1}*) × w_enDg − frDg(s_{2j+1}*, s_{2j}*) × w_frDg
            + enLDg(s_{2j}*, s_{2j-1}*) × w_enLDg − frLDg(s_{2j+1}*, s_{2j}*) × w_frLDg
            − enPH(s_{2j}*) × w_enPH + frPH(s_{2j-1}*) × w_frPH
            + enD(s_{2j}*) × w_enD − frD(s_{2j-1}*) × w_frD
            + enLD(s_{2j}*) × w_enLD − frLD(s_{2j-1}*) × w_frLD )

² We renumbered states to start with index 1 for easier readability.

In the formula above, the function enDg(s, s′) computes the damage taken by the relevant attacked agents (the enemy) between state s and s′ (i.e. during the turn of the attacking team). frDg(s, s′) is doing the same for the relevant attacking agents. enLDg(s, s′) and frLDg(s, s′) compute the damages to the enemy and friendly leader (if they are relevant). enPH(s) and frPH(s) compute the overall potential healing enemy, resp. friendly, units get after state s. Finally, enD(s), frD(s), enLD(s) and frLD(s) compute the number of dead enemy agents, dead friendly agents, dead enemy leaders and dead friendly leaders (from the set of relevant agents and naturally with a maximum of 1 for the leaders) in state s. Please note that by having s_{lmax+1}* included, we are discouraging suicide attacks by attackers to kill a target enemy.

All of the w components in the formula are weight factors allowing us to manipulate the influence their corresponding function component has. For our experiments we used w_enDg = w_frDg = 1, w_enLDg = w_frLDg = 3, w_enPH = w_frPH = 0.8, w_enD = w_frD = 10 and w_enLD = w_frLD = 1000. Due to this weighting scheme, a play candidate with a positive utility passed the filter. After this filtering step, information about the states resulting from an enemy turn and enemy actions is not necessary anymore and it is discarded from the play candidates for the next phases.
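Before moving on to the clustering phase, the following Python sketch summarizes one possible in-memory representation of the concepts introduced so far: the play definition from Section III-A and the filtered play candidates from Sections III-B to III-D. The class and field names are illustrative assumptions made for this sketch, not the authors' data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# A condition is a set of observation predicates, possibly including the
# relative-health extension (e.g. "RelHDiff(+1)") mentioned in Section III-A.
Condition = Set[str]

@dataclass
class ActionTrNode:
    # One (role, cond_i, a_i) triple per role plus alternative sub-trees for the
    # following turns, mirroring the node form given in Section III-A.
    entries: List[Tuple[str, Condition, str]]
    subtrees: List["ActionTrNode"] = field(default_factory=list)

@dataclass
class Play:
    roles: List[str]                    # (r_1, ..., r_k)
    action_tree: ActionTrNode           # ActionTr
    trigger: Dict[str, Condition]       # per-role trigger conditions, incl. Obs_role

@dataclass
class PlayCandidate:
    # Reduced states s_1*, ..., s_{lmax+1}* and action vectors restricted to the
    # relevant agents (Sections III-B and III-C). The state type is whatever
    # game-state record the logs provide (e.g. the GameState sketch above).
    states: list
    action_vectors: List[Dict[int, str]]   # per half-turn: agent id -> action
    attacker_team: int                      # the team l1 that made the kill
    target_id: int                          # the killed enemy agent
```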
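Building on that representation, the utility filter of Section III-D can be sketched as below. The per-state damage, healing and death-count helpers are assumed to be available through a `terms` object (they would be derived from the reduced states); the weights are the values reported above. This is a sketch under those assumptions, not the authors' code.

```python
# Weights as reported in the paper for the experiments.
WEIGHTS = {"enDg": 1, "frDg": 1, "enLDg": 3, "frLDg": 3,
           "enPH": 0.8, "frPH": 0.8, "enD": 10, "frD": 10,
           "enLD": 1000, "frLD": 1000}

def utility(plc, terms, w=WEIGHTS):
    """Compute util(plc) for a play candidate.

    `terms` is assumed to provide the functions enDg, frDg, enLDg, frLDg,
    enPH, frPH, enD, frD, enLD and frLD used in the formula of Section III-D;
    states are indexed from 1 as in the paper."""
    s = plc.states                    # s[0] is s_1*, s[-1] is s_{lmax+1}*
    lmax = len(s) - 1
    total = 0.0
    for j in range(1, lmax // 2 + 1):
        s_prev, s_mid, s_next = s[2*j - 2], s[2*j - 1], s[2*j]   # s_{2j-1}*, s_{2j}*, s_{2j+1}*
        total += (terms.enDg(s_mid, s_prev) * w["enDg"] - terms.frDg(s_next, s_mid) * w["frDg"]
                  + terms.enLDg(s_mid, s_prev) * w["enLDg"] - terms.frLDg(s_next, s_mid) * w["frLDg"]
                  - terms.enPH(s_mid) * w["enPH"] + terms.frPH(s_prev) * w["frPH"]
                  + terms.enD(s_mid) * w["enD"] - terms.frD(s_prev) * w["frD"]
                  + terms.enLD(s_mid) * w["enLD"] - terms.frLD(s_prev) * w["frLD"])
    return total

def passes_cost_filter(plc, terms) -> bool:
    # A candidate is kept when its utility is positive (Section III-D).
    return utility(plc, terms) > 0
```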

E. Clustering play candidates

Our mining approach has now reached the stage where we have a set of candidate plays {plc_1, ..., plc_p} of sufficient utility. But it is unclear if these candidate plays can be used in enough games, or often enough in some single games, to be useful. The solution to this problem cannot be achieved by looking at individual candidates, but it is within the set of candidates, since it contains all candidates that can be extracted from the given logs. Following the idea from [8] to identify recurring tasks, we use clustering of the candidate plays to detect groups of similar candidates big enough to be considered useful.

Before we perform the main clustering, we split the set of candidate plays into several sets in a pre-clustering step, using criteria where different values are known to be incompatible, to ensure that candidates that differ in these values always end up in different clusters. These criteria are the enemy leader being a target of the play or not and the number of attacker agents. We then perform the following clustering in each of these sets (and create plays out of all the clusters large enough in each set).

A key component of any clustering algorithm is the similarity (or distance) function between the elements to be clustered. Here, candidate plays present a little bit of a challenge due to what they contain. If we look at two states (which has to be at the core of such a similarity function), then there is an obvious need to determine an assignment of the agents of each state to the agents of the other state, and after such an assignment is done, the similarity does not only have to look at the formation of the agents (resp. the differences there), but also at properties of the agents, like type and health. With regard to a property like health, we have to deal with the fact that, for the purpose of fighting with other units, there is some anti-symmetry here. If in a state s a unit Ag_i has more than 50 percent of its health remaining and in a state s′ the corresponding unit Ag_i′ has 25 percent of its health, then, for the purpose of a battle, the unit Ag_i should be considered somewhat similar to Ag_i′, but not vice versa. And on the formation side, if corresponding units in the two states have different move distances from each other and the target unit, then it is very unlikely that the same actions applied in each state will result in the same outcome in the two follow-up states. So, states with such a difference in formation should not be considered similar at all.

Our solution to all these problems is to first compute a similarity matrix for the start states of all the play candidates in the following manner. If s_i is the start state of play candidate plc_i and s_j for plc_j, then for each of the different assignments assign of agents in s_i to agents in s_j (within the different teams, naturally), we first compute the move distances between all agents in a state. If there are any differences in agent distances between the two states, then we consider them not comparable. Otherwise, we compute the similarity sim_a as

sim_a(s_i, s_j, assign) = typeMis(s_i, s_j, assign) × w_type + healthMis(s_i, s_j, assign) × w_health

where typeMis(s_i, s_j, assign) is the number of agents in s_i for which assign did not assign an agent in s_j of the same type, and healthMis(s_i, s_j, assign) is computed by putting each agent into a health category (more than 75 percent, more than 50 percent, more than 25 percent, less than 25 percent, but only one of these categories can be used) and summing up over all agents the following penalty values: if the unit in s_i has a higher category than the assigned unit in s_j, then the penalty is 0, otherwise it is the number of categories the unit in s_i is below the category of the unit in s_j (so, for example, if the unit in s_i is in the more than 25 percent category and the unit in s_j is in the more than 75 percent category, then the penalty is 2). As usual, the w's are weights and in our experiments we set them as w_type = 2 and w_health = 1. The similarity sim(s_i, s_j) of s_i to s_j is then the minimal value over all possible assignments. We also record the particular assignment that was used for computing this value, since we will use this assignment in all future computations involving the two play candidates.
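A compact Python sketch of this state similarity is given below. It enumerates the possible agent assignments, rejects assignments whose pairwise move distances differ between the two states, and otherwise scores type and health mismatches with the weights reported above. The unit objects are assumed to expose team, hp and unit_type attributes, the distance oracles dist_i/dist_j are assumed helpers, and the two states are assumed to contain the same number of relevant agents; the paper itself only specifies the scoring rules.

```python
from itertools import permutations

W_TYPE, W_HEALTH = 2, 1            # weights used in the experiments

def health_category(hp):
    # 3: >75%, 2: >50%, 1: >25%, 0: otherwise (each unit is in exactly one category).
    return 3 if hp > 0.75 else 2 if hp > 0.5 else 1 if hp > 0.25 else 0

def sim_a(units_i, units_j, assign, dist_i, dist_j):
    """Score one assignment (position k in units_i -> index assign[k] in units_j).

    Returns None if any pair of corresponding agents has a different move
    distance in the two states (the formations are then not comparable)."""
    n = len(units_i)
    for a in range(n):
        for b in range(a + 1, n):
            if dist_i(units_i[a], units_i[b]) != dist_j(units_j[assign[a]], units_j[assign[b]]):
                return None
    type_mis = sum(1 for a in range(n)
                   if units_i[a].unit_type != units_j[assign[a]].unit_type)
    health_mis = sum(max(0, health_category(units_j[assign[a]].hp)
                            - health_category(units_i[a].hp))
                     for a in range(n))
    return type_mis * W_TYPE + health_mis * W_HEALTH

def sim(units_i, units_j, dist_i, dist_j):
    """Minimal sim_a over all team-respecting assignments (smaller = more similar).

    Also returns the assignment that achieved the minimum, since it is reused
    in all later computations involving the two candidates."""
    best_value, best_assign = None, None
    for assign in permutations(range(len(units_j))):
        if any(units_i[a].team != units_j[assign[a]].team for a in range(len(units_i))):
            continue
        value = sim_a(units_i, units_j, assign, dist_i, dist_j)
        if value is not None and (best_value is None or value < best_value):
            best_value, best_assign = value, assign
    return best_value, best_assign
```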

The similarity matrix already allows us to identify candidate plays that are not similar at all to any other candidates, and these plays will be filtered out. The remaining candidates are then clustered using the PAM algorithm (see [5]) and the similarity matrix. PAM is very similar to the well-known k-means algorithm and has similar limitations. PAM is given a number k of clusters and then randomly selects k candidate plays as the so-called medoids of the clusters. Then each remaining candidate play is assigned to the cluster whose medoid it is most similar to, using the similarity matrix and the first state in the play. For each of the clusters, we sum up the similarity values of each element to the medoid as this cluster's "cost". We then try to improve (i.e. reduce) this cost by trying to swap one medoid with another play candidate in the cluster, recalculate clusters and cost, and go through all possible swaps in all clusters to find the best swap, i.e. the one that reduces the cost the most. If we find such a reduction, we then make this swap permanent and use this new group of clusters as a starting point for the improvement, again. We repeat this for a given number of iterations (in our experiments 1000) or until no improvement is found. This is one run of the PAM algorithm and, due to the initial randomness, we restart the algorithm several times (50 in our experiments). Since we also do not know what the best value of k is, we do the above for the different k under consideration (between 1 and 15 in our experiments).

The above procedure leaves us with quite a number of different partitionings of the candidate plays into clusters. Since we do not want to have a candidate play in several clusters, we need to determine which of these partitionings (and consequently clusters) we want to use in the next phase. There are several ways to address this in the literature and we have chosen to use the partitioning with the best Silhouette value (see [7]) and to select all clusters in this best partitioning of a size of at least minClSize for going forward to the final step of our method. In our experiments, we used minClSize = 5. Please note that while we have so far only used the first state in each play, the action vectors and other states will be used in the final step.

F. Generalizing clusters into plays

As stated above, the set of clusters {Cl_1, ..., Cl_t} resulting from the above phase of our method is used to create a play out of each Cl_i. The root of the ActionTr component of the play Play_i for Cl_i is constructed by using the first state of the medoid and abstracting this state into a set of conditions cond for each of the agents of the friendly team by including all observations from Obs that are true in this state, resp. the part of it that is observable by the agent. The same observations also go into the trigger Trigger (as mentioned before), but we add to it the appropriate observations from the role observations (Obs_role) and we report the observations around the health of the various agents in the following way.

If the health observation for the target is the same in every candidate of the cluster, we use this observation for the target and add for all other agents predicates describing the health of this agent relative to the target's health. For example, if the target has more than 50 percent health and the other agent has more than 75 percent, then the predicate for this agent will be RelHDiff(+1). If the other agent has less than 25 percent, then it will have RelHDiff(-2) added as health observation. If the health observation for the target varies in the candidates, we first compute the relative health differences between agents within a candidate as described above and then we put into the play the highest health observation for the target and, for each other agent, the smallest relative distance to the target in any of the candidates. The action for each role in the root of ActionTr is determined as the action taken by most of the corresponding agents in the cluster.

To complete ActionTr, we recursively go over the following steps until we have reached the last state in the play candidates. Given a cluster of play candidates that has created a node in ActionTr that has no subtrees yet, we create a set of clusters of these candidates by applying the PAM algorithm we described in the last subsection. Corresponding agents in the states of two candidates are determined by the assignment computed for these two candidates when creating the similarity matrix for the first states of these candidate plays. There is no filtering out of states that are incompatible with all other candidates; these states simply form their own cluster (and consequently node in ActionTr). With regard to k-values, we set it in PAM from 1 to the size of the cluster we are sub-partitioning. Each of the clusters of the best partitioning out of the application of PAM is used to create the root node of a new subtree (of the node we are working on) and the other components of this root of a subtree are created as described for the root node of the whole play.

IV. EVALUATION

In this section, we first present the set-up of our experiments with the method presented in the last section and then present the results of these experiments. We will first take a look at the overall numbers of plays identified in various mining runs and the successes of these plays in logs that were not used to create them, and then we will look at some of the found plays in more detail.

A. Set-up of experiments

While Battle for Wesnoth already has facilities to create game logs and there are even some logs available on the Internet, what these logs store is not enough for our purpose. Therefore we needed to create game logs with all necessary information on our own, which would have been a rather tedious process if we had not had the system from [6] available, which allowed us to learn a wide variety of unit AIs that we could combine to create players playing against each other. We used different combinations of the learned unit AIs to play games against each other and the built-in game AI on The Freelands map, resulting in 2000 logs of games.

To evaluate our mining for plays, an experiment consisted of randomly selecting half of the game logs to apply our method to (training examples) and then we checked the other half of the logs (test examples) for game states fulfilling the triggers of the found plays. For every such state we then checked what happened in the log against the triggered play and counted the play as successfully applied if the targeted enemy was killed while the attacker units in the play were following a path through ActionTr (or immediately after going through the whole path). Otherwise, we consider this an unsuccessful trigger of the play. To get an idea of how reliable our mining approach is, we performed 10 of these experiments, each for two different values of lmax/2, namely 2 and 3.
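The experiment loop just described can be summarized by the following Python sketch. The helpers trigger_assignment and follows_action_tree stand for the trigger matching of Section III-A and the path check described above; they, like the log format and the mine_plays entry point, are assumptions of this sketch rather than the authors' code.

```python
import random

def run_experiment(logs, lmax_half, mine_plays, trigger_assignment, follows_action_tree):
    """One evaluation run: mine plays on half of the logs, then count, per play,
    how often its trigger matches in the other half and how often that leads to
    the targeted enemy being killed along a path through ActionTr."""
    random.shuffle(logs)
    half = len(logs) // 2
    training, test = logs[:half], logs[half:]

    plays = mine_plays(training, lmax=2 * lmax_half)
    stats = [{"activations": 0, "successes": 0} for _ in plays]

    for log in test:
        for state_index, state in enumerate(log.states):
            for p, play in enumerate(plays):
                f_ras = trigger_assignment(play, state)   # None if the trigger is not fulfilled
                if f_ras is None:
                    continue
                stats[p]["activations"] += 1
                if follows_action_tree(play, f_ras, log, state_index):
                    stats[p]["successes"] += 1
    return plays, stats
```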

B. Mining plays: the numbers

While it naturally would be best if we could provide detailed information on each of the plays mined in each of the experiments, we do not have the space for doing this. As Table I shows, in the experiments mining plays of length 2 we have on average over 32 plays, and while increasing the length to 3 reduces the number of plays, it is still over 23 on average. We will therefore present statistical results of our evaluations on the level of experiments, providing average, maximum and minimum numbers of plays.

TABLE I. NUMBER OF PLAYS MINED (ON AVERAGE, MIN AND MAX; PLAYS TARGETING LEADER IN PARENTHESES)

  lmax/2    2           3
  avg.      32.8 (13)   23.1 (8.3)
  max.      36 (15)     27 (10)
  min.      27 (9)      18 (5)

Our goal was to mine plays that are often applicable in games and that are often successful in killing an enemy. Table II presents the number of plays that could have been activated in the set of test logs on average once or more than once per log. While the reported numbers are substantially smaller than the number of plays (as was expected given the fact that we used quite a number of different AI players in creating the logs), we have more than a third of all plays often applicable. This becomes even better when we take into account that plays that target the enemy leader are unlikely to be activated more than once per log. In fact, in Table I the numbers in parentheses indicate the number of mined plays that target the enemy leader, and with this we have more than half of the non-leader plays being often applicable. The highest number of average applications of a play per log in all experiments was over 23, with all but one experiment having at least one play that was applicable in the test logs more than 10 times per log.

TABLE II. NUMBER OF PLAYS MINED ACTIVATED MORE THAN ONCE PER LOG IN TEST SET (ON AVERAGE, MIN AND MAX)

  lmax/2    2      3
  avg.      13.5   10.1
  max.      17     12
  min.      10     8

As for the success of the mined plays, Table III shows the number of plays that, in the test logs, can be "played" through and are successful in killing the target in more than half of the times they would be triggered. Naturally, the longer a play is, the more likely it is that the players producing the logs would use different actions for their units, which then results in different outcomes. The drop in the numbers in Table III from length 2 to length 3 reflects that. But even for a length of 3 actions we have at least 2 plays in each experiment that are rather successful.

TABLE III. NUMBER OF PLAYS MINED WITH MORE THAN 50 PERCENT SUCCESS IN EVALUATION (ON AVERAGE, MIN AND MAX)

  lmax/2    2      3
  avg.      10.5   3.4
  max.      13     5
  min.      8      2

Finally, Table IV reports the number of plays that both were "activated" once or more on average in the test logs and that were successful more than half of the time. As this table shows, each of the mining experiments produced at least one such play, with at least 3 for the experiments mining shorter plays. If we lower the requirements to 0.1 activations on average and 25 percent success, we get the numbers in Table V, with a minimum of 7 such plays of length 2 and 2 of length 3. The averages over all experiments show that one third of the mined plays of length 2 could be considered often applicable and successful, and over one fifth of the plays of length 3, using these weaker definitions.

TABLE IV. NUMBER OF PLAYS MINED ACTIVATED MORE THAN ONCE PER LOG AND WITH MORE THAN 50 PERCENT SUCCESS IN EVALUATION (ON AVERAGE, MIN AND MAX)

  lmax/2    2     3
  avg.      5     2.2
  max.      7     3
  min.      3     1

TABLE V. NUMBER OF PLAYS MINED ACTIVATED IN 10 PERCENT OF LOGS WITH MORE THAN 25 PERCENT SUCCESS (ON AVERAGE, MIN AND MAX)

  lmax/2    2      3
  avg.      11.3   4.9
  max.      14     7
  min.      7      2

Overall, by just mining 1000 game logs, our method was able to mine often applicable and often successful plays, even under rather harsh definitions of these terms (given the non-determinism of the game). While, especially due to the tree structure of our plays, the plays found in the different experiments most of the time were not identical, in our subjective judgment there are at least some similarities. This is another reason why we presented our data on the experiment level.
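Counting a play as "activated" in a test log requires finding a role assignment f_ras that fulfills the play's trigger, as defined in Section III-A. A brute-force sketch of such a search is shown below; it is one possible implementation of the trigger_assignment helper assumed in the earlier evaluation sketch, with the per-role condition check and the situation computation passed in explicitly as assumed helpers.

```python
from itertools import permutations

def trigger_assignment(play, state, situation, satisfies):
    """Search for an f_ras mapping each role of `play` to a distinct unit in
    `state` such that all trigger conditions hold; return the mapping or None.

    `situation(unit, state)` computes the observation set of a unit and
    `satisfies(condition, observations, unit)` checks one role's trigger
    conditions (including Obs_role requirements such as the unit type)."""
    units = state.units
    k = len(play.roles)
    if len(units) < k:
        return None
    for chosen in permutations(units, k):
        assignment = dict(zip(play.roles, chosen))
        if all(satisfies(play.trigger[role], situation(agent, state), agent)
               for role, agent in assignment.items()):
            return assignment
    return None
```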

C. Mining plays: selected plays

In this section, we will take a closer look at two of the mined plays. Our first play example is a play that attempts to kill the enemy leader. In our evaluations, its trigger was matched in a little over 30 percent of the test logs. And then it was successful in killing the leader in 73.4 percent of the cases, which naturally means that the whole game was won. In short, the play is best described as killing a lone leader, which, as the pictures in Figures 1 and 2 show, can occur if the leader stays in the castle.

The initial conditions of the play are that the first friendly unit is the leader, adjacent to the two other friendlies, with a relative health to the target that is -1 (or better). The second unit has to be an archer or a shaman that is one move away from the third unit and the target. It also has to have a relative health of at least -3. The third unit has to be a shaman or fighter fulfilling the location requirements indicated in the conditions for the other two units and having a relative health of at least -3. Our instantiation in Figure 1 has f_ras assign as the second unit an archer and as the third unit a warrior (which is a type of fighter). The target has to be a leader with a health of 100 percent at the most.

The actions of the friendly units have them move to the target and attack it, with the second and third unit having to end up being adjacent to the first unit. These moves are indicated by the red arrows in the figures for this concrete instantiation. In our instantiation, the enemy's counterattack kills the archer. Its own actions are to move where the archer died and to attack the friendly leader (but not doing a lot of damage).

Fig. 1. Killing a lone leader: approach actions

Our play has a subtree (a node really) that has as conditions that the first unit is adjacent to the target and the third unit and that its relative health is +1. The third unit must be adjacent to the first and one move away from the target, with a relative health of 0, and it must be a fighter. The target must have a health of under 50 percent.

The first unit's actions are to move adjacent to the target and then to attack it, while the third unit has to move adjacent to both the first unit and the target and then attack the target. Figure 2 shows this for our instantiation from one of the test logs. Note that in this instantiation the first unit is already adjacent to the target.

Fig. 2. Killing a lone leader: going for the kill

This example shows that our mining allows for creating plays that make "useful" sacrifices. Trading a non-leader unit for killing the enemy leader is definitely useful. We also see the possibility of strengthening conditions, especially regarding the relative health of the units, that can be used to realize that the play might not work as planned, which is naturally very important in this play, given that one of the friendly units is the leader.

Figures 3 to 5 show one instantiation of our second example play of length 3 with 2 roles to fulfill. This particular play was on average 7.6 times per log applicable in our evaluations, with a success percentage of 35. In short, this play is best described as hunting lone enemies.

The play has two friendly roles and the trigger conditions require that those two friendly units are not the leader and that they are at least as healthy as the target. The first friendly unit must be two moves away from the target, the second unit 3 moves, and both of them must be one move away from each other. Figure 3 shows an instantiation of this where f_ras assigns the first unit to a shaman (upper left) and the second unit to a fighter (lower right). The target unit is not visible, but it is located northwest of these units. The play requires the second unit to move adjacent to the initial position of the first unit, at a spot that is closer to the target than the first unit is. The first unit has to move between the new position of the second unit and the target.

Fig. 3. Hunting lone enemies: approach actions

The subtree of the play that was matched by the depicted instantiation strengthened the conditions for the involved units, now requiring that the first unit is a shaman and the second one a fighter. It also requires that the target is a fighter and that the friendly units are at least as healthy as this target. Additionally, the first unit has to be adjacent to the target and the second unit one move away from it. Note that this required that the target did move towards the friendly units in its previous turn, which was the case in the instantiation depicted in Figure 4. As this figure shows, the actions of the play have unit 2 move adjacent to the target and the other friendly unit and then attack the target. The first unit has to move away from the target but stay adjacent to the second unit (which will result in the first unit being healed in the next turn).

Fig. 4. Hunting lone enemies: first attack and ensuring healing

The next node in the play requires the two friendly units to be adjacent to each other and one move away from the target (as seen in Figure 5). It also requires that the first unit's relative health to the target is +2 and the second one's is +1. The target itself must be below 50 percent of maximum health. The actions are essentially a repeat of the previous turn, with the second unit moving towards the target and attacking it. The first unit moves again so that it is one move away from the target but adjacent to the other friendly unit. This should result in killing the target (as it did in the instantiation).

Fig. 5. Hunting lone enemies: going for the kill

This play and its instantiation highlight, again, several features of our mining method. The level of abstraction that we used for triggering a play really focuses on the requirements for the actions that form the play and allows for different outcomes of these actions (and the actions of the enemy unit). While the players in the test logs naturally are not aware of plays and therefore cannot "abandon" a play if the conditions of none of the subtree roots are met, the fact that these conditions can be more strict than the initial trigger should allow, again, for exactly such a cancelling of a play in applications that are built with the concept of plays in mind.

V. RELATED WORK

From a mining perspective, our method aims at finding sequences of events in data that are both occurring sufficiently often and have a high utility. There are only very few works in that area and [10] is currently the leading method known. But [10] does not deal with events created by groups of agents (or complex scenarios with states). It also allows for having unrelated events in between events of a sequence (which we do not do, which made our task a little bit easier). On the other side, we include in our plays the possibility of several different follow-up actions, which is also not part of [10]. If we look at data mining in games in general, and especially of game logs, then we see a lot of works in the last few years and we even have a first overview article with [4]. But, as this overview article shows, the focus of mining game logs is on creating information about players and not so much on information about the game (and our plays represent information about good move combinations and sequences of units).

An example of this difference can be seen in [3] which, like us, uses clustering methods, but with the purpose of clustering players of games and detecting player types. This requires not just an abstraction of game states, but an abstraction of whole games.

VI. CONCLUSION AND FUTURE WORK

In this paper, we presented a mining method for plays, sequences of actions for a group of units intended to achieve a given directive, out of game logs for the game Battle for Wesnoth. Our representation of plays allows for executing them by a central AI, but also by unit AIs. Our method identifies game state sequences that end with the directive fulfilled, filters out irrelevant information from these states and discards sequences that fulfill the directive at too high a cost. It then clusters similar sequences to identify clusters that are big enough to represent a play. Our evaluation of the mining method, using training logs and test logs and experiments with different sets of these logs, showed that we are able to consistently find several plays that have a high chance of success and are often applicable, even out of a set of logs that was created using quite a number of (learned AI) players.

While we have a method for mining plays and [1] indicates that plays (resp. macros that are simpler than our plays) are very useful, future work will focus on exploring the usefulness of plays more. [1] focused on testing of AI players, requiring to find only one way to beat the AI. In contrast, [6] presented a concept for learning general game behavior, and plays should be even more useful in this regard. We also intend to look into mining plays for other games.

REFERENCES

[1] M. Atalla, J. Denzinger, Improving Testing of Multi-Unit Computer Players for Unwanted Behavior using Coordination Macros, Proc. CIG-09, Milan, 2009, pp. 355–362.
[2] K.-T. Chen, H.-K. Kenneth Pao, H.-C. Chang, Game bot identification based on manifold learning, Proc. 7th ACM SIGCOMM WS on Network and System Support for Games, NetGames, New York, 2008, pp. 21–26.
[3] A. Drachen, R. Sifa, C. Bauckhage, C. Thurau, Guns, swords and data: Clustering of player behavior in computer games in the wild, Proc. CIG-12, Granada, 2012, pp. 163–170.
[4] A. Drachen, C. Thurau, J. Togelius, G.N. Yannakakis, C. Bauckhage, Game Data Mining, in M. Seif El-Nasr, A. Drachen, A. Canossa (eds.): Game Analytics - Maximizing the Value of Player Data, Springer, 2013, pp. 205–253.
[5] L. Kaufman, P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Chapter 2, John Wiley & Sons, 1990, pp. 68–125.
[6] S. Paskaradevan, J. Denzinger, D. Wehr, Learning cooperative behavior for the shout-ahead architecture, WIAS Vol. 12(3), IOS Press, 2014, pp. 309–324.
[7] P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, J. of Comp. and App. Math. Vol. 20, 1987, pp. 53–65.
[8] J.P. Steghöfer, J. Denzinger, H. Kasinger, B. Bauer, Improving the Efficiency of Self-Organizing Emergent Systems by an Advisor, Proc. EASe 2010, Oxford, 2010, pp. 63–72.
[9] D. White, Battle for Wesnoth, http://www.wesnoth.org/ (as seen on 23.2.2015).
[10] C.-W. Wu, Y.-F. Lin, P.S. Yu, V.S. Tseng, Mining high utility episodes in complex event sequences, Proc. 19th ACM SIGKDD, Chicago, 2013, pp. 536–544.