(1) Map(x, y)
    Meaning x and pronunciation y map to one another
With our two meanings and two pronunciations, we have 4 of these constraints.
Each one is violated when a given meaning maps to another pronunciation (in
production), or when a given pronunciation maps to another meaning (in
interpretation). We assume Maximum Entropy grammar (MaxEnt; Goldwater &
Johnson, 2003), a probabilistic weighted constraint extension of Optimality
Theory (Prince & Smolensky, 1993), in which the probability of a candidate
mapping is proportional to the exponential of the weighted sum of constraint
violations. The evaluation of the two potential mappings from BIRD by a MaxEnt
grammar is shown in (2). The weights are shown beneath the constraint names,
and violations are indicated with negative integers. The weighted sums are
shown in the column headed H, for Harmony (Smolensky & Legendre, 2006).
The constraint preferring the mapping to [ba] has greater weight, and (BIRD,
[ba]) thus has higher probability.
(2) Illustrative MaxEnt production tableau

                 Map(BIRD, [ba])   Map(BIRD, [ku])     H      p
                        2                 1
    BIRD
      [ba]                               -1           -1    0.73
      [ku]            -1                              -2    0.27
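The arithmetic behind the probabilities in (2) can be checked directly. The sketch below (variable names are ours, not the authors') uses only the weights and violations from the tableau:

```python
import math

# Weights from tableau (2): Map(BIRD,[ba]) = 2, Map(BIRD,[ku]) = 1.
# In production from BIRD, each candidate violates the constraint
# mapping BIRD to the *other* pronunciation, so its harmony is minus
# that constraint's weight.
harmonies = {
    "[ba]": -1,  # one violation of Map(BIRD, [ku]), weight 1
    "[ku]": -2,  # one violation of Map(BIRD, [ba]), weight 2
}

# MaxEnt: probability is proportional to the exponential of the harmony.
exps = {c: math.exp(h) for c, h in harmonies.items()}
z = sum(exps.values())
probs = {c: e / z for c, e in exps.items()}
print({c: round(p, 2) for c, p in probs.items()})
# {'[ba]': 0.73, '[ku]': 0.27}
```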
In our agent-based model, we have just two agents who learn from one another.
For each step of their interaction, we randomly select one as the teacher, and the
other as the learner. We then randomly select one of the meanings for the
teacher to produce. The pronunciation is determined by sampling from the
probability distribution over the candidate pronunciations defined by the MaxEnt
grammar.
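This production step can be sketched in a few lines, assuming the grammar is stored as a dictionary from (meaning, pronunciation) pairs to weights (the `produce` helper is illustrative, not the authors' implementation):

```python
import math
import random

def produce(weights, meaning, pronunciations):
    """Sample a pronunciation for `meaning` from the MaxEnt distribution.
    `weights` maps (meaning, pronunciation) pairs to constraint weights."""
    # Harmony of candidate (meaning, p): one violation (-1) of each
    # constraint mapping `meaning` to a different pronunciation.
    harmonies = [-sum(weights[(meaning, q)] for q in pronunciations if q != p)
                 for p in pronunciations]
    # random.choices normalizes the weights, so exp(H) suffices.
    exps = [math.exp(h) for h in harmonies]
    return random.choices(pronunciations, weights=exps)[0]

# Weights from tableau (2)
w = {("BIRD", "[ba]"): 2.0, ("BIRD", "[ku]"): 1.0}
print(produce(w, "BIRD", ["[ba]", "[ku]"]))  # [ba] with probability 0.73
```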
The learner is given only the pronunciation, and infers the meaning using the
same mapping constraints with the same weights as in production, as in Robust
Interpretive Parsing (Jarosz, 2013; Tesar & Smolensky, 2000), and Bidirectional
OT (Boersma & Hamann, 2008). In production, a meaning is given and
pronunciations compete; in interpretation, a pronunciation is given and
meanings compete. In (3), we see the evaluation of two mappings from [ba], and
the activity of the two relevant constraints.
(3) Illustrative MaxEnt interpretation tableau

                 Map(BIRD, [ba])   Map(TREE, [ba])     H      p
                        2                 1
    [ba]
      BIRD                               -1           -1    0.73
      TREE            -1                              -2    0.27
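Interpretation is the mirror image of production over the same weights. A minimal sketch, again assuming a dictionary grammar (the `interpret` helper is our own name):

```python
import math
import random

def interpret(weights, pronunciation, meanings):
    """Sample a meaning for `pronunciation` from the same MaxEnt grammar.
    `weights` maps (meaning, pronunciation) pairs to constraint weights."""
    # Harmony of candidate (m, pronunciation): one violation (-1) of each
    # constraint mapping `pronunciation` to a different meaning.
    harmonies = [-sum(weights[(n, pronunciation)] for n in meanings if n != m)
                 for m in meanings]
    exps = [math.exp(h) for h in harmonies]
    return random.choices(meanings, weights=exps)[0]

# Weights from tableau (3)
w = {("BIRD", "[ba]"): 2.0, ("TREE", "[ba]"): 1.0}
print(interpret(w, "[ba]", ["BIRD", "TREE"]))  # BIRD with probability 0.73
```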
After sampling a meaning, the learner then uses the grammar to produce its own
pronunciation (that is, generate its own expectation given the inferred meaning).
If that pronunciation does not match the teacher's, the weights are updated by
subtracting the violations for the learner's pronunciation from those for the
teacher's (assuming the learner's inferred meaning), scaling the resulting vector
by the learning rate, and adding it to the learner's pre-update weights (the delta
rule / perceptron update / HG-GLA). This update shifts probability from the
learner's mapping onto the mapping composed of the teacher's pronunciation
and the learner's interpretation.
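With two pronunciations, this update reduces to raising one weight and lowering another by the learning rate. The `update` helper below is a sketch (names are ours), with the zero-truncation used in the simulations applied:

```python
def update(weights, meaning, teacher_pron, learner_pron, rate=0.1):
    """Delta-rule / perceptron update on the Map constraint weights.
    Subtracts the learner's violation vector from the teacher's (both
    evaluated under the learner's inferred meaning), scales by the
    learning rate, and adds the result to the current weights."""
    if teacher_pron == learner_pron:
        return  # no error, no update
    # Teacher's form violates Map(meaning, learner_pron); learner's form
    # violates Map(meaning, teacher_pron). Their difference is +1 on the
    # constraint for the teacher's form and -1 on the learner's.
    weights[(meaning, teacher_pron)] += rate
    # Weights are truncated at zero.
    new = weights[(meaning, learner_pron)] - rate
    weights[(meaning, learner_pron)] = max(0.0, new)

# The step in (4): teacher says [ba]; learner hears TREE, produces [ku].
w = {("BIRD", "[ba]"): 1.0, ("BIRD", "[ku]"): 1.0,
     ("TREE", "[ba]"): 1.0, ("TREE", "[ku]"): 1.0}
update(w, "TREE", "[ba]", "[ku]")
print(w[("TREE", "[ba]")], w[("TREE", "[ku]")])  # 1.1 0.9
```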
(4) Illustrative learning step

    i.   Randomly sampled meaning:   BIRD
    ii.  Teacher's production:       [ba]
    iii. Learner's interpretation:   TREE
    iv.  Learner's production:       [ku]
    v.   Update:                     Raise Map(TREE, [ba]), Lower Map(TREE, [ku])
We start with zero weights. Given zero weights, the two pronunciations are
equally probable for each of the meanings, and the two meanings are equally
probable for each of the pronunciations. Contrast can be measured as the
probability difference of one of the pronunciations across the two meanings.
A probability difference of 1 corresponds to the two meanings having maximally
distinct pronunciations, and a probability difference of 0 corresponds to there being
no difference. At the outset of learning, the probability difference is 0.
The following graph shows the mean and standard deviations of probability
difference over 100 runs with 10,000 iterations of agent interaction. The learning
rate was 0.1, and weights were truncated at zero. Over time, contrast tends to
increase.
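The model is small enough to sketch end to end. The code below (function names and the fixed seed are our own choices, not the authors' code) runs one simulation with the stated learning rate, zero-truncation, and 10,000 interactions:

```python
import math
import random

MEANINGS = ["BIRD", "TREE"]
PRONS = ["[ba]", "[ku]"]
RATE = 0.1  # learning rate from the text

def maxent_sample(weights, fixed, candidates, production):
    """Sample from the MaxEnt grammar. In production, `fixed` is a meaning
    and candidates are pronunciations; in interpretation, `fixed` is a
    pronunciation and candidates are meanings."""
    def harmony(c):
        if production:
            return -sum(weights[(fixed, q)] for q in PRONS if q != c)
        return -sum(weights[(m, fixed)] for m in MEANINGS if m != c)
    exps = [math.exp(harmony(c)) for c in candidates]
    return random.choices(candidates, weights=exps)[0]

def interact(teacher, learner):
    """One step: teacher produces, learner interprets, self-produces, updates."""
    meaning = random.choice(MEANINGS)
    t_pron = maxent_sample(teacher, meaning, PRONS, production=True)
    l_meaning = maxent_sample(learner, t_pron, MEANINGS, production=False)
    l_pron = maxent_sample(learner, l_meaning, PRONS, production=True)
    if l_pron != t_pron:  # delta-rule update, weights truncated at zero
        learner[(l_meaning, t_pron)] += RATE
        learner[(l_meaning, l_pron)] = max(0.0, learner[(l_meaning, l_pron)] - RATE)

def contrast(weights):
    """Probability difference of [ba] across the two meanings."""
    def p_ba(m):
        h_ba, h_ku = -weights[(m, "[ku]")], -weights[(m, "[ba]")]
        return math.exp(h_ba) / (math.exp(h_ba) + math.exp(h_ku))
    return abs(p_ba("BIRD") - p_ba("TREE"))

random.seed(42)
agents = [{(m, p): 0.0 for m in MEANINGS for p in PRONS} for _ in range(2)]
for _ in range(10_000):
    t, l = random.sample((0, 1), 2)  # random teacher/learner roles
    interact(agents[t], agents[l])
print(round(contrast(agents[0]), 2))  # contrast after 10,000 interactions
```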
To see why the agents would tend to move towards contrast instead of
homophony, consider what happens to the interpretation probabilities when an
agent performs an update. In our example above, with a teacher's [ba] that the
learner interprets as TREE and then produces itself as [ku], the learner
increased the weight of Map(TREE, [ba]), and decreased the weight of
Map(TREE, [ku]). Not only does this increase the probability of interpreting [ba]
as TREE, but it also increases the probability of interpreting [ku] as BIRD. The
probability of BIRD given [ku] is increased because the probability of TREE given
[ku] is decreased. To make this more concrete, imagine that the pre-update
weights were all at 1, giving all of the candidates equal probability, in both
production and interpretation. Given a learning rate of 0.1, the post-update
weights will be as shown in (5) and the interpretation probabilities will be as
shown in (6).
(5) Post-update weights

    Map(BIRD, [ba])    1
    Map(BIRD, [ku])    1
    Map(TREE, [ba])    1.1
    Map(TREE, [ku])    0.9

(6) Post-update interpretation probabilities

               [ba]    [ku]
    p(BIRD)    0.48    0.52
    p(TREE)    0.52    0.48
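The probabilities in (6) follow directly from the weights in (5); a quick check (the `p_interp` helper is illustrative):

```python
import math

# Post-update weights from (5)
w = {("BIRD", "[ba]"): 1.0, ("BIRD", "[ku]"): 1.0,
     ("TREE", "[ba]"): 1.1, ("TREE", "[ku]"): 0.9}

def p_interp(meaning, pron):
    """p(meaning | pron): each candidate meaning violates the constraint
    mapping `pron` to the other meaning, so H = minus that weight."""
    other = {"BIRD": "TREE", "TREE": "BIRD"}
    hs = {m: -w[(other[m], pron)] for m in ("BIRD", "TREE")}
    z = sum(math.exp(h) for h in hs.values())
    return math.exp(hs[meaning]) / z

print(round(p_interp("BIRD", "[ba]"), 2))  # 0.48
print(round(p_interp("BIRD", "[ku]"), 2))  # 0.52
```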