
Emergent contrast in agent-based modeling of language

Robert Staubs and Joe Pater


May 25, 2015
In this short piece, we study a small abstract instance of a formal system to
better understand its properties, a tactic fruitfully applied in much of Alan Prince's
research. Here, we seek to better understand the emergence of contrast in an
agent-based model in Pater (2012). Simplifying from the case in that paper, here
we have just two meanings (BIRD and TREE), and two possible pronunciations
([ba] and [ku]). Either meaning can map to either pronunciation. Contrast exists
to the extent that they map to different ones.
The mappings are controlled by bidirectional mapping constraints relating each of
the meanings to each of the pronunciations:
(1) Map(x, y)
    Meaning x and pronunciation y map to one another

With our two meanings and two pronunciations, we have 4 of these constraints.
Each one is violated when a given meaning maps to another pronunciation (in
production), or when a given pronunciation maps to another meaning (in
interpretation). We assume Maximum Entropy grammar (MaxEnt; Goldwater &
Johnson, 2003), a probabilistic weighted constraint extension of Optimality
Theory (Prince & Smolensky, 1993), in which the probability of a candidate
mapping is proportional to the exponential of the weighted sum of constraint
violations. The evaluation of the two potential mappings from BIRD by a MaxEnt
grammar is shown in (2). The weights are shown beneath the constraint names,
and violations are indicated with negative integers. The weighted sums are
shown in the column headed H, for Harmony (Smolensky & Legendre, 2006).
The constraint preferring the mapping to [ba] has greater weight, and (BIRD,
[ba]) thus has higher probability.
(2) Illustrative MaxEnt production tableau

              Map(BIRD, [ba])   Map(BIRD, [ku])     H      p
    BIRD             2                 1
      [ba]                            -1            -1    0.73
      [ku]          -1                               -2    0.27
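To make the calculation in (2) concrete, the sketch below (our own illustration in Python, not code from the paper) computes the probabilities from the harmonies -1 and -2 and reproduces the 0.73/0.27 split:

    import math

    def maxent_probs(harmonies):
        """Map each candidate's harmony (weighted violation sum) to a
        probability proportional to exp(H), i.e. a softmax over candidates."""
        exps = {cand: math.exp(h) for cand, h in harmonies.items()}
        z = sum(exps.values())
        return {cand: e / z for cand, e in exps.items()}

    # Production of BIRD with w(Map(BIRD, [ba])) = 2 and w(Map(BIRD, [ku])) = 1:
    # (BIRD, [ba]) violates Map(BIRD, [ku]) once: H = 1 * -1 = -1
    # (BIRD, [ku]) violates Map(BIRD, [ba]) once: H = 2 * -1 = -2
    print(maxent_probs({"[ba]": -1.0, "[ku]": -2.0}))
    # {'[ba]': 0.731..., '[ku]': 0.268...}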
In our agent-based model, we have just two agents who learn from one another.
For each step of their interaction, we randomly select one as the teacher, and the
other as the learner. We then randomly select one of the meanings for the
teacher to produce. The pronunciation is determined by sampling from the
probability distribution over the candidate pronunciations defined by the MaxEnt
grammar.
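A single production step can be sketched as follows (again our own illustration; the helper names are ours, not the authors'). The teacher's sampled meaning determines the harmonies, and a pronunciation is drawn from the resulting MaxEnt distribution:

    import math
    import random

    MEANINGS = ("BIRD", "TREE")
    FORMS = ("[ba]", "[ku]")
    OTHER_FORM = {"[ba]": "[ku]", "[ku]": "[ba]"}

    def produce(meaning, w):
        """Sample a pronunciation for `meaning` from the MaxEnt distribution.
        Mapping to a form violates the Map constraint linking the meaning to
        the other form, so H(form) = -w(meaning, other form)."""
        harmonies = {f: -w[(meaning, OTHER_FORM[f])] for f in FORMS}
        weights = [math.exp(harmonies[f]) for f in FORMS]
        return random.choices(FORMS, weights=weights)[0]

    w = {(m, f): 0.0 for m in MEANINGS for f in FORMS}   # initial zero weights
    w[("BIRD", "[ba]")], w[("BIRD", "[ku]")] = 2.0, 1.0  # weights from (2)
    print(produce("BIRD", w))   # "[ba]" roughly 73% of the time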

The learner is given only the pronunciation, and infers the meaning using the
same mapping constraints with the same weights as in production, as in Robust
Interpretive Parsing (Jarosz, 2013; Tesar & Smolensky, 2000), and Bidirectional
OT (Boersma & Hamann, 2008). In production, meaning is given and
pronunciations compete, and in interpretation a pronunciation is given and
meanings compete. In (3), we see the evaluation of two mappings from [ba], and
the activity of the two relevant constraints.
(3) Illustrative MaxEnt interpretation tableau

              Map(BIRD, [ba])   Map(TREE, [ba])     H      p
    [ba]             2                 1
      BIRD                            -1            -1    0.73
      TREE          -1                               -2    0.27
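Interpretation is the mirror image of production and can be sketched with the same machinery (our illustration, with hypothetical helper names): given a form, the competing candidates are the meanings, and the violated constraints are the ones linking that form to the other meaning:

    import math
    import random

    MEANINGS = ("BIRD", "TREE")
    OTHER_MEANING = {"BIRD": "TREE", "TREE": "BIRD"}

    def interpret(form, w):
        """Sample a meaning for `form`: interpreting the form as a meaning
        violates the Map constraint linking the form to the other meaning."""
        harmonies = {m: -w[(OTHER_MEANING[m], form)] for m in MEANINGS}
        weights = [math.exp(harmonies[m]) for m in MEANINGS]
        return random.choices(MEANINGS, weights=weights)[0]

    # Weights from (3): Map(BIRD, [ba]) = 2, Map(TREE, [ba]) = 1.
    w = {("BIRD", "[ba]"): 2.0, ("TREE", "[ba]"): 1.0,
         ("BIRD", "[ku]"): 0.0, ("TREE", "[ku]"): 0.0}
    print(interpret("[ba]", w))   # "BIRD" roughly 73% of the time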
After sampling a meaning, the learner then uses the grammar to produce its own
pronunciation (that is, to generate its own expectation given the inferred meaning).
If that pronunciation does not match the teacher's, the weights are updated by
subtracting the violations for the learner's pronunciation from those for the
teacher's (assuming the learner's inferred meaning), scaling the resultant vector
by the learning rate, and adding it to the learner's pre-update weights (the delta
rule / perceptron update / HG-GLA). This update shifts probability from the
learner's mapping onto the mapping composed of the teacher's pronunciation
and the learner's interpretation.
(4) Illustrative learning step

    i.   Randomly sampled meaning    BIRD
    ii.  Teacher's production        [ba]
    iii. Learner's interpretation    TREE
    iv.  Learner's production        [ku]
    v.   Update                      Raise Map(TREE, [ba]),
                                     Lower Map(TREE, [ku])
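The update in (4) can be sketched as follows (our own reconstruction of the delta-rule step under the assumptions above; the function names are ours). Starting from weights of 1 and a learning rate of 0.1, it reproduces the changed weights in (5) below (the Map(BIRD, ...) weights stay at 1):

    FORMS = ("[ba]", "[ku]")
    OTHER_FORM = {"[ba]": "[ku]", "[ku]": "[ba]"}

    def violations(meaning, form):
        """Violation vector of the candidate (meaning, form): one violation
        (-1) of the Map constraint linking the meaning to the other form."""
        return {(meaning, OTHER_FORM[form]): -1.0}

    def update(w, inferred_meaning, teacher_form, learner_form, rate=0.1):
        """Delta rule: add rate * (teacher violations - learner violations),
        both computed with the learner's inferred meaning; clip at zero."""
        target = violations(inferred_meaning, teacher_form)
        own = violations(inferred_meaning, learner_form)
        for key in set(target) | set(own):
            delta = target.get(key, 0.0) - own.get(key, 0.0)
            w[key] = max(0.0, w[key] + rate * delta)

    w = {(m, f): 1.0 for m in ("BIRD", "TREE") for f in FORMS}
    update(w, "TREE", teacher_form="[ba]", learner_form="[ku]")
    print(w[("TREE", "[ba]")], w[("TREE", "[ku]")])   # 1.1 0.9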

We start with zero weights. Given zero weights, the two pronunciations are
equally probable for each of the meanings, and the two meanings are equally
probable for each of the pronunciations. Contrast can be measured as the
difference in the probability of one of the pronunciations across the two meanings.
A probability difference of 1 corresponds to the two meanings having maximally
distinct pronunciations, and a difference of 0 corresponds to there being no
difference. At the outset of learning, the probability difference is 0.
The following graph shows the mean and standard deviation of the probability
difference over 100 runs, each with 10,000 iterations of agent interaction. The
learning rate was 0.1, and weights were truncated at zero. Over time, contrast
tends to increase.
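A full run can be reconstructed along the following lines. This is our sketch, not the authors' released code: the two-agent setup, zero initial weights, the 0.1 learning rate, the 10,000 iterations, and the truncation of weights at zero are taken from the description above; everything else (names, helper structure) is illustrative.

    import math
    import random

    MEANINGS, FORMS = ("BIRD", "TREE"), ("[ba]", "[ku]")
    OTHER = {"[ba]": "[ku]", "[ku]": "[ba]", "BIRD": "TREE", "TREE": "BIRD"}

    def sample(harmonies):
        """Sample a candidate with probability proportional to exp(H)."""
        cands = list(harmonies)
        weights = [math.exp(harmonies[c]) for c in cands]
        return random.choices(cands, weights=weights)[0]

    def p_form(meaning, form, w):
        """MaxEnt probability of producing `form` for `meaning`."""
        h = {f: -w[(meaning, OTHER[f])] for f in FORMS}
        z = sum(math.exp(x) for x in h.values())
        return math.exp(h[form]) / z

    def contrast(w):
        """Probability difference for [ba] across the two meanings."""
        return abs(p_form("BIRD", "[ba]", w) - p_form("TREE", "[ba]", w))

    def run(iterations=10_000, rate=0.1):
        agents = [{(m, f): 0.0 for m in MEANINGS for f in FORMS} for _ in range(2)]
        for _ in range(iterations):
            teacher, learner = random.sample(agents, 2)
            meaning = random.choice(MEANINGS)              # randomly sampled meaning
            form = sample({f: -teacher[(meaning, OTHER[f])] for f in FORMS})
            inferred = sample({m: -learner[(OTHER[m], form)] for m in MEANINGS})
            own = sample({f: -learner[(inferred, OTHER[f])] for f in FORMS})
            if own != form:                                # delta-rule update
                learner[(inferred, OTHER[own])] += rate    # raise constraint favouring teacher's form
                learner[(inferred, OTHER[form])] = max(    # lower constraint favouring learner's form
                    0.0, learner[(inferred, OTHER[form])] - rate)
        return [round(contrast(a), 3) for a in agents]

    print(run())   # per-agent contrast after one run; it tends to grow over a run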

To see why the agents would tend to move towards contrast instead of
homophony, consider what happens to the interpretation probabilities when an
agent performs an update. In our example above, where the learner interpreted the
teacher's [ba] as TREE and then produced [ku] itself, the learner
increased the weight of Map(TREE, [ba]) and decreased the weight of
Map(TREE, [ku]). Not only does this increase the probability of interpreting [ba]
as TREE, but it also increases the probability of interpreting [ku] as BIRD. The
probability of BIRD given [ku] is increased because the probability of TREE given
[ku] is decreased. To make this more concrete, imagine that the pre-update
weights were all at 1, giving all of the candidates equal probability, in both
production and interpretation. Given a learning rate of 0.1, the post-update
weights will be as shown in (5) and the interpretation probabilities will be as
shown in (6).

(5) Post-update weights

    Map(BIRD, [ba])   1
    Map(BIRD, [ku])   1
    Map(TREE, [ba])   1.1
    Map(TREE, [ku])   0.9

(6) Post-update interpretation probabilities

    Given [ba]:   p(BIRD) = 0.48   p(TREE) = 0.52
    Given [ku]:   p(BIRD) = 0.52   p(TREE) = 0.48
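The numbers in (6) can be checked directly from the weights in (5) with the MaxEnt formula; the following small sketch (ours, not from the paper) performs the check:

    import math

    w = {("BIRD", "[ba]"): 1.0, ("BIRD", "[ku]"): 1.0,
         ("TREE", "[ba]"): 1.1, ("TREE", "[ku]"): 0.9}

    for form in ("[ba]", "[ku]"):
        # Interpreting `form` as BIRD violates Map(TREE, form), and vice versa,
        # so H(BIRD) = -w(TREE, form) and H(TREE) = -w(BIRD, form).
        h = {"BIRD": -w[("TREE", form)], "TREE": -w[("BIRD", form)]}
        z = sum(math.exp(x) for x in h.values())
        print(form, {m: round(math.exp(x) / z, 2) for m, x in h.items()})
    # [ba] {'BIRD': 0.48, 'TREE': 0.52}
    # [ku] {'BIRD': 0.52, 'TREE': 0.48}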

The interpretation step thus provides a link between the probability of a
pronunciation for one meaning and another: when its probability is increased for
one meaning, it is automatically decreased for the other. Note that this follows
from the structure of the model; there is no need to stipulate a preference for
contrast.
References
Boersma, P., & Hamann, S. (2008). The evolution of auditory dispersion in
bidirectional constraint grammars. Phonology, 25(2), 217–270.
Goldwater, S. J., & Johnson, M. (2003). Learning OT constraint rankings using a
maximum entropy model. In J. Spenader, A. Eriksson, & O. Dahl (Eds.),
Proceedings of the Stockholm Workshop on Variation within Optimality
Theory (pp. 111–120).
Jarosz, G. (2013). Learning with hidden structure in Optimality Theory and
Harmonic Grammar: Beyond Robust Interpretive Parsing. Phonology,
30(1), 27–71.
Pater, J. (2012). Emergent systemic simplicity (and complexity). Proceedings
from Phonology in the 21st Century: In Honour of Glyne Piggott. McGill
Working Papers in Linguistics, 22(1). Retrieved from
http://www.mcgill.ca/mcgwpl/archives/volume-221-2012
Prince, A., & Smolensky, P. (1993). Optimality Theory: Constraint interaction in
generative grammar. Malden, MA and Oxford, UK: Blackwell.
Smolensky, P., & Legendre, G. (2006). The harmonic mind. Cambridge, MA:
MIT Press.
Tesar, B., & Smolensky, P. (2000). Learnability in Optimality Theory. Cambridge,
MA: MIT Press.
