
Average Conditional Entropy of the Tlingit Verbal Inflection Paradigm: A Brief Report

Seth Cable
In the year of our Lord 2000 (or was it 2001?), Alan taught the very first field methods
course that I ever took. The language was Hungarian. We constructed target sentences
poking fun at Trent Lott, referencing Bartók and goulash. Alan spoke excitedly of our
having discovered noun incorporation ON THE HOOF! My final paper was absurdly
long, but the comments I received back only a few days later were nearly as
long themselves. They made me feel like I could really do this kind of work, which I found
I had fallen in love with over the course of the term.
In light of this, I thought the following brief report might be a suitable
contribution to this venue. So that there is no misunderstanding, I do believe that there is
far more to morphological learning than is suggested here. However, having witnessed
the intricacies of Tlingit verbal inflection reduce to slag several leading theories of
morphology, I am excited by any perspective that can shed light on how a system of this
sort could ever be learned by earthly beings.
1. Overview

With assistance from PhD student Presley Pizzo, I employed a program written and made
available to me by Rob Malouf to explore the consequences that the inflectional system of
Tlingit (Na-Dene; Alaska, British Columbia, Yukon) may have for the proposals of Ackerman &
Malouf (2013), in particular their Low Conditional Entropy Conjecture (LCEC).¹
Three different paradigmatic representations of the Tlingit verbal inflectional system
were examined. For two of the three, the average conditional entropy was within the range
predicted by the LCEC. I explain below how this accords well with specialists' understanding of
the exceptional status of imperfective aspect within the Tlingit inflectional system.
Interestingly and importantly, for all three paradigmatic representations, the average conditional
entropy was much lower than that of the comparable artificial systems randomly generated by
the bootstrap simulation (Ackerman & Malouf 2013). This supports the view that, even though
the average conditional entropy of the Tlingit verbal inflectional system may be relatively high, the
system is organized in a way that minimizes such entropy.
2. Basic Background on the Tlingit Verbal Inflectional System

I assume Leer's (1991) analysis of the Tlingit verbal inflectional system, though my own
presentation below differs slightly in superficial respects, largely following the notational and
terminological changes advanced by Crippen (2013). According to Leer's analysis, in order to
inflect a Tlingit verb (or verbal theme), a speaker must know the values of the following three
parameters.

¹ Given my unfamiliarity with Python, Pizzo did the work of actually running the program on the input paradigms I
constructed.

(1) Three Parameters Defining the Inflectional Classes of Tlingit

a. The Primary Imperfective Type (if any) of the verb [22 possible values]
b. The Conjugation Class of the verb [5 possible values]
c. The Root Type of the verbal root [9 possible values]

With this information, a speaker can project the surface forms of the verb for each of the
following 18 Tense-Aspect-Mood (TAM) categories.
(2) The Inflectional Categories of the Tlingit Verb

a. Imperfective
b. Irrealis Imperfective
c. Perfective
d. Irrealis Perfective
e. Future
f. Irrealis Future
g. Potential
h. Irrealis Potential
i. Habitual
j. Irrealis Habitual
k. Imperative
l. Hortative
m. Admonitive
n. Consecutive
o. Conditional
p. Contingent
q. Progressive
r. Repetitive

For more information on the morphosyntax and semantics of each of these TAM forms, I refer
the reader to Leer (1991).
Returning to the key parameters in (1), the Primary Imperfective Type (1a) of the verb
determines the realization of the verb in the imperfective and irrealis imperfective. According to
Leer (1991), there are at least 22 Primary Imperfective Types; it should be noted, however, that
other specialists count as many as 27 such types (Crippen 2013). For the purposes of this study, I
assume the following 22 Primary Imperfective Types.
(3) The Primary Imperfective Types of Tlingit (Leer 1991, Crippen 2013)

a. y-Stative Imperfective
b. :-Stative Imperfective
c. h-Stative Imperfective
d. n-Positional Imperfective
e. :-Positional Imperfective
f. -Positional Imperfective
g. :-Processive Imperfective
h. -Processive Imperfective
i. h-Processive Imperfective
j. n-Processive Imperfective
k. xh-Processive Imperfective
l. g-Processive Imperfective
m. (I)g-Processive Imperfective
n. yoo(I)g-Processive Imperfective
o. ch-Processive Imperfective
p. t-Processive Imperfective
q. s-Processive Imperfective
r. l-Processive Imperfective
s. x-Processive Imperfective
t. t-Processive Imperfective
u. h-Extensional Imperfective
v. y-Extensional Imperfective

Next, the Conjugation Class (1b) determines the affixal realizations of the other 16
TAM values in (2), as well as the stem variant appearing in those forms. To illustrate, a verb in
the so-called ga-conjugation will have in its future form the affixes and stem variant in (4a),
while a verb in the 0-conjugation will have the affixes and stem variant in (4b).
(4) Morphological Realization of Future in Two Conjugation Classes

a. Ga-Conjugation    kei # ga-w-gha-[ :-Stem Variant ]
b. 0-Conjugation     ga-w-gha-[ :-Stem Variant ]

Similarly, a ga-conjugation verb will have in its perfective form the affixes and stem variant in
(5a), while a 0-conjugation verb will have the affixes and stem variant in (5b).
(5) Morphological Realization of Perfective in Two Conjugation Classes

a. Ga-Conjugation    wu-[I]-[ h-Stem Variant ]
b. 0-Conjugation     wu-[I]-[ y-Stem Variant ]

Under one method of counting, there are five conjugation classes in Tlingit, which receive some
version of the following mnemonic labels.
(6) Conjugation Classes of Tlingit (Crippen 2013)

a. 0-Conjugation
b. 0/y-Conjugation²
c. na-Conjugation
d. ga-Conjugation
e. gha-Conjugation

² Most authors view the 0/y-Conjugation as a subtype of 0-Conjugation verbs, rather than its own conjugation
class. Such a distinction is not important for present purposes. Since 0/y-Conjugation verbs sometimes take different
stem variants from (plain) 0-Conjugation verbs, the present study views them as a separate conjugation class.

Finally, the Root Type (1c) determines the surface appearance of the stem variant in a
given verbal form. For example, if the Root Type of the verbal root is CVC-Varying, then the
h-Stem Variant will surface as CVVC, with a long low-toned vowel nucleus. However, if the
Root Type is CVC-Glottal, then the h-Stem Variant will surface as CVVC, with a long high-toned
vowel nucleus. By one method of counting, there are nine different Root Types in Tlingit.
(7) The Root Types of Tlingit (Crippen 2013)

a. CV Varying
b. CV Fading
c. CVC Varying
d. CVC Glottal
e. CVC Invariant
f. CVVC Invariant
g. CVVC Invariant
h. CVV Invariant
i. CVV Invariant

Note that five of the Root Types are invariant; this means that every stem built from the root
has the same surface form. For example, a CVVC-Invariant root always surfaces with a long
high-toned vowel in every form, no matter what the underlying stem variant specification is.
Given that all three parameters in (1) must be known in order to inflect a Tlingit verb, one
could view the possible valuations of these parameters as defining the inflectional classes (ICs)
of Tlingit. Since there are no substantive grammatical constraints on the possible
combinations of those parameters, the ICs of Tlingit would seem to be all 990 (22 × 5 × 9)
logically possible combinations. However, due to the morphophonology of the language, the
contrasts between certain Primary Imperfective Types collapse with particular Root Types.
Bearing this in mind, the total number of possible ICs of Tlingit reduces to 790. Note, however,
that given the relative rarity of certain Primary Imperfective Types, many of these logically
possible ICs do not seem to be attested. Nevertheless, the logical structure of the Tlingit system
entails their grammatical possibility, and it is reasonable to assume that a competent speaker of
Tlingit would know how to inflect a verb from every logically possible class. Thus, in this study,
I will ignore such accidental gaps within the overall system of Tlingit inflectional classes, and so
will assume the existence of all 790 of the aforementioned ICs.
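The figure of 990 is simply the number of ways of choosing one value for each parameter in (1). The following minimal Python sketch makes that enumeration explicit. The value labels are hypothetical placeholders (only the counts come from (1)), and the sketch does not attempt the morphophonological merging that reduces the count to 790, since the relevant collapses are not spelled out here.

```python
from itertools import product

# Hypothetical placeholder labels; only the number of values comes from (1).
primary_imperfective_types = [f"PI{i}" for i in range(1, 23)]  # (1a): 22 values
conjugation_classes = [f"CC{i}" for i in range(1, 6)]           # (1b): 5 values
root_types = [f"RT{i}" for i in range(1, 10)]                   # (1c): 9 values

# Each valuation of the three parameters is one candidate inflectional class.
candidate_ics = list(product(primary_imperfective_types, conjugation_classes, root_types))
print(len(candidate_ics))  # 990
```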
3. Basic Background on the Low Conditional Entropy Conjecture

This report will assume familiarity with the work of Ackerman & Malouf (2013) on conditional
entropy in inflectional paradigms. I will, however, for rhetorical purposes, briefly and
superficially summarize the key ideas. The reader is referred to the original paper for precise
technical definitions and equations.
Briefly put, the conditional entropy of Y given X, written H(Y|X), is a measure of the
uncertainty in the value of Y given knowledge of the value of X. The higher the value of H(Y|X),
the less information X provides regarding Y. If H(Y|X) = 0, then the value of X completely
determines the value of Y. If H(Y|X) = 1, then, roughly speaking, knowing X gives one a 50/50
chance of picking the correct value for Y; if H(Y|X) = 2, then knowing X gives a 1/4 chance of
picking the correct value for Y; if H(Y|X) = 3, then it is a 1/8 chance, and so on. Thus, as H(Y|X)
increases, the information that the value of X provides regarding the value of Y diminishes.
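For readers who prefer the definition in computational form, here is a minimal Python sketch of H(Y|X) in bits, estimated from a list of observed (x, y) pairs. It assumes every pair (for instance, every inflectional class) is weighted equally; this is a simplifying assumption, not a description of Malouf's program. The names `conditional_entropy` and `crude_guess_rate` are hypothetical. The helper `crude_guess_rate` expresses the rough reading used above, 2 ** -H, which is exact only when the remaining candidates are equally likely.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(Y|X) in bits, estimated from a list of equally weighted (x, y) observations."""
    joint = Counter(pairs)                       # counts of each (x, y) combination
    marginal_x = Counter(x for x, _ in pairs)    # counts of each x value
    n = len(pairs)
    h = 0.0
    for (x, _y), count in joint.items():
        p_xy = count / n                         # P(X = x, Y = y)
        p_y_given_x = count / marginal_x[x]      # P(Y = y | X = x)
        h -= p_xy * log2(p_y_given_x)
    return h


def crude_guess_rate(h):
    """Chance of guessing Y correctly if 2 ** h candidates were equally likely."""
    return 2 ** -h


# X fully determines Y
print(conditional_entropy([("a", "p"), ("b", "q")]))      # 0.0
# X leaves two equally likely values of Y
print(conditional_entropy([("a", "p"), ("a", "q")]))      # 1.0
# X leaves four equally likely values of Y
print(conditional_entropy([("a", y) for y in "pqrs"]))    # 2.0
print(crude_guess_rate(2.0))                               # 0.25
```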
This notion of conditional entropy can provide a tool for measuring the learnability of a
given inflectional system. To see this, let us imagine the task of the language learner as being to
produce novel inflectional forms of a given verb V based upon the inflected forms of V that
they've already heard. Intuitively, if hearing a single inflected form of V allowed the learner to
correctly predict all other inflected forms of V, then that inflectional system would be rather easy
to learn. However, if hearing a single inflected form of V only gave the learner on average a 1/4
chance of correctly picking the other inflected forms of V, then that system would intuitively be
harder to learn than the former system. With this in mind, let us assume that the variable X
ranges over the possible surface realizations of a particular verb V in a particular inflectional
form F. Now, let us suppose that Y ranges over possible surface realizations of the same verb V
in a different inflectional form F′. Thus, the conditional entropy H(Y|X) would represent how
much uncertainty about the realization of form F′ remains once form F is known. If averaged
across all possible pairs of inflectional forms ⟨F, F′⟩ in the inflectional system, we would thereby
have a concrete measure of how confidently learners can, on average, predict novel inflectional
forms on the basis of forms they've already encountered.
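The averaging step can be sketched in the same way. In this hypothetical helper, a paradigm table is a list of rows (one per inflectional class), each row a tuple of surface realizations (one per inflectional form), and the average conditional entropy is taken to be the unweighted mean of H(F′|F) over all ordered pairs of distinct forms. This follows the description in the text, not necessarily every detail of Ackerman & Malouf's implementation.

```python
from collections import Counter
from itertools import permutations
from math import log2


def conditional_entropy(pairs):
    """H(Y|X) in bits from equally weighted (x, y) observations."""
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (x, _y), count in joint.items():
        h -= (count / n) * log2(count / marginal_x[x])
    return h


def average_conditional_entropy(table):
    """Mean of H(form j | form i) over all ordered pairs i != j of columns."""
    n_forms = len(table[0])
    values = [
        conditional_entropy([(row[i], row[j]) for row in table])
        for i, j in permutations(range(n_forms), 2)
    ]
    return sum(values) / len(values)


# Hypothetical toy table: three inflectional classes, three inflectional forms.
toy = [("a", "x", "p"), ("b", "x", "q"), ("c", "y", "q")]
print(average_conditional_entropy(toy))  # about 0.444 (= 4/9)
```

On the toy table, knowing the first form determines the other two (entropy 0), while each of the other four ordered pairs leaves 2/3 of a bit of uncertainty, giving an average of (4 × 2/3) / 6 = 4/9.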
Building upon these ideas, Ackerman & Malouf (2013) calculate the average conditional
entropy for the inflectional paradigms of several languages. To illustrate, the nominal inflectional
system of Modern Greek is represented as in (8), where each row is a different nominal
inflectional class (or declension) while each column is the suffixal realization of a different
combination of case and number features.
(8) Nominal Inflectional System of Modern Greek (Ackerman & Malouf 2013)

From representations such as these, Ackerman & Malouf are able to calculate, for every pair of
Case-Number features ⟨F, F′⟩, the conditional entropy of the realization of F′ given the
realization of F. The table in (9) presents their calculations for the Modern Greek system in (8).

(9) Conditional Entropies for Modern Greek (Ackerman & Malouf 2013)

For example, we find in the chart above that the conditional entropy of the Genitive Singular
given the Nominative Singular is 1, meaning that knowing the Nominative Singular of a
particular noun N gives one a 50/50 shot at correctly choosing the Genitive Singular of N.
However, the conditional entropy of the Nominative Plural given the Accusative Plural is 0,
meaning that the form of the Accusative Plural completely determines the form of the
Nominative Plural, as can be seen from a quick glance at the table in (8).
Finally, averaging together all the conditional entropies in the chart above, one finds that
the overall average conditional entropy for the Modern Greek nominal inflectional system is
0.644. Thus, for a learner of Modern Greek, knowing any particular inflected form F provides on
average a better than 50/50 chance of correctly choosing another inflected form F′. Bearing in
mind that there can be as many as 5 different possible realizations of a given case/number
combination, this suggests that the inflectional system in (8) is organized in a way that facilitates
efficient learning, i.e., the prediction of novel inflected forms on the basis of encountered forms.
Ackerman & Malouf (2013) perform similar calculations for the inflectional systems of a
number of typologically distinct languages. Interestingly, as shown in the chart below, in nearly
all cases, the overall average conditional entropy for the inflectional system is at or below 0.7.
(10) Average Conditional Entropies for Ten Languages (Ackerman & Malouf 2013)

Note that even for a language such as Mazatec, with 109 different inflectional classes, and where
there can be as many as 94 different possible realizations of a particular inflectional feature, the
overall average conditional entropy for the system is still just 0.709. Moreover, looking down the
rightmost column of (10), we find that for 9/10 of these languages, knowing just one inflected
form of a given lexeme gives the learner on average a better than 50/50 chance at choosing any
other inflected form.
To highlight the surprising nature of this finding, Ackerman & Malouf (2013) compare
the observed entropy averages in (10) to those of randomly generated languages of comparable
superficial complexity. To illustrate, they generate alternative versions of (e.g.) Mazatec by
randomly constructing 109 different inflectional classes, where each class is created by randomly
selecting for each inflectional form one of the possible surface realizations of that form.
Importantly, the conditional entropy for such randomized variants of Mazatec is on average 1.1,
significantly higher than the observed average of 0.709. This lends further support to the view
that the true Mazatec inflectional system is organized to facilitate the prediction of novel
inflected forms from observed forms.
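The randomized comparison can likewise be approximated. The sketch below builds each random variant by keeping the number of inflectional classes fixed and, for every class and every form, drawing uniformly among the realizations attested for that form in the real table; it then averages the ACE over many such variants. This is only an approximation of the procedure described above, written under that assumption. It is not Malouf's `entropy.py`, and details such as how duplicate rows are handled may differ; `random_variant` and `bootstrap_ace` are hypothetical names.

```python
import random
from collections import Counter
from itertools import permutations
from math import log2


def conditional_entropy(pairs):
    """H(Y|X) in bits from equally weighted (x, y) observations."""
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (x, _y), count in joint.items():
        h -= (count / n) * log2(count / marginal_x[x])
    return h


def average_conditional_entropy(table):
    """Mean of H(form j | form i) over all ordered pairs i != j of columns."""
    n_forms = len(table[0])
    values = [
        conditional_entropy([(row[i], row[j]) for row in table])
        for i, j in permutations(range(n_forms), 2)
    ]
    return sum(values) / len(values)


def random_variant(table, rng):
    """A randomized table with as many classes as `table`; each cell is drawn
    uniformly from the realizations attested for that form (column)."""
    attested = [sorted(set(column)) for column in zip(*table)]
    return [tuple(rng.choice(options) for options in attested) for _ in table]


def bootstrap_ace(table, n_samples=1000, seed=0):
    """Mean ACE over n_samples randomized variants (seeded for reproducibility)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += average_conditional_entropy(random_variant(table, rng))
    return total / n_samples


# Hypothetical toy table: three inflectional classes, three inflectional forms.
toy = [("a", "x", "p"), ("b", "x", "q"), ("c", "y", "q")]
print(average_conditional_entropy(toy))   # observed ACE of the toy table
print(bootstrap_ace(toy, n_samples=200))  # average ACE of randomized variants
```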
Taking this all together, Ackerman and Malouf put forth the Low Conditional Entropy
Conjecture (LCEC), which I paraphrase as in (11).
(11) The Low Conditional Entropy Conjecture (LCEC)

The average conditional entropy (ACE) of a natural language inflectional system will
tend to be low (i.e., at or below 0.7; Robert Malouf, p.c.), or will be lower than the
average ACE of randomized variants of that system.

Readers who are interested in the relationship between the LCEC above and the traditional
notion of a system of principal parts are referred to Ackerman & Malouf's (2013) paper.
4. The Conditional Entropy of the Tlingit Verbal Inflectional System

With its 790 logically possible inflectional classes, the Tlingit verbal system provides an
interesting test case for the LCEC in (11). Moreover, if the ACE of the Tlingit verbal inflectional
system is relatively low, that may provide some insight into how such a remarkably complex
system has remained so diachronically stable. For these reasons, I sought to calculate the ACE of
the full Tlingit verbal inflectional system, as described in Section 2.
As mentioned in Section 1, in order to make these calculations, PhD student Presley
Pizzo ran a Python program written by Rob Malouf on paradigm tables that I constructed.³
The first inflectional paradigm table we examined contained all 790 of the aforementioned verbal
inflectional classes and all 18 of the TAM forms in (2). When run on this table, the program
output the table of conditional entropies given in (12) below.

³ The Python code is available at https://github.com/rmalouf/morphology/blob/master/paradigms/entropy.py.

(12) Conditional Entropies of the 18 TAM Forms in (2)

(13) Conditional Entropies of the Reorganized and Simplified Tlingit Verbal Inflectional System

The first thing to note regarding the table in (12) is that the ACE for the verbal
inflectional system as described in Section 2 is 1.229, higher than any of the ACEs recorded in
table (10). Put crudely, this figure indicates that for a learner of Tlingit, knowing a particular
inflected form gives them, on average, between a 1/4 and a 1/2 chance of predicting another
inflected form. Although such predictive power is not as strong as that found for the languages in
(10), it is important to compare this to the average ACE of randomized variants of the Tlingit
system. To this end, the Python program written by Malouf also generates such randomized
variants and averages their ACEs. The result of these calculations appears under "** Bootstrap"
in (12). There, we find that the average ACE of randomly generated variants of the Tlingit
system with similar superficial complexity is 3.513, significantly higher than the observed value
of 1.229. Thus, although an ACE of 1.229 may seem at first to challenge the LCEC, the contrast
between the observed ACE and the average ACE of the randomized systems lends credence to
the view that the Tlingit verbal inflectional system is structured to facilitate the prediction of
novel inflectional forms.
Probing further, a closer examination of the table in (12) yields some insight
into why the ACE of the Tlingit system in Section 2 is relatively high. Note the rather high
values throughout the first two rows and the first two columns. This, of course, reflects the fact
that in Tlingit, knowing the imperfective (or irrealis imperfective) form of a given verb does not
help one to predict the other TAM forms of the verb, and vice versa. This is because the
imperfective forms of a verb can, at most, only determine the Primary Imperfective Type (1a)
and Root Type (1c) of the verb; the imperfective form provides no information regarding the
Conjugation Class of the verb, which determines 16 of the 18 TAM forms in (2). Similarly, the
other 16 TAM forms in (2) provide no information regarding the Primary Imperfective Type of
the verb, and so provide no information regarding the realization of the imperfective (and irrealis
imperfective) forms. This issue is well known to Tlingit specialists, and creates a special problem
for language documentarians and lexicographers (Edwards 2009, Eggleston 2013).
In this sense, the imperfective forms stand outside of the larger TAM system of Tlingit.
After all, as the summary in Section 2 makes clear, the imperfective form of a Tlingit verb is
essentially stipulated as part of its lexical entry (Leer 1991, Edwards 2009, Eggleston 2013). For
this reason, one would be justified in treating the features imperfective and irrealis
imperfective as separate dimensions of the verbal inflectional paradigm in Tlingit. Under this
alternate view, the primary inflectional categories of the Tlingit TAM system are simply (2c)
through (2r), and the primary inflectional classes are solely defined by the parameters in (1b)
and (1c). Consequently, under this view, there are merely 45 (5 × 9) possible (primary)
inflectional classes.
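In paradigm-table terms, this reorganization amounts to deleting the imperfective columns and merging any inflectional classes that thereby become identical; in the Tlingit table, classes differing only in Primary Imperfective Type merge, leaving at most 5 × 9 = 45 classes. A minimal sketch of that projection, using a hypothetical three-form toy table rather than the actual Tlingit data (`drop_forms` is a hypothetical helper name):

```python
def drop_forms(table, form_names, to_drop):
    """Remove the columns named in `to_drop` and merge rows that become identical.

    `table` is a list of rows (one per inflectional class); each row is a tuple of
    surface realizations in the order given by `form_names`.
    """
    keep = [i for i, name in enumerate(form_names) if name not in to_drop]
    merged = {tuple(row[i] for i in keep) for row in table}
    return sorted(merged), [form_names[i] for i in keep]


# Hypothetical toy data: the first two classes differ only in their imperfective.
forms = ["imperfective", "perfective", "future"]
toy = [
    ("IMPF-1", "PERF-A", "FUT-A"),
    ("IMPF-2", "PERF-A", "FUT-A"),
    ("IMPF-1", "PERF-B", "FUT-B"),
]
classes, remaining = drop_forms(toy, forms, {"imperfective"})
print(len(classes), remaining)  # 2 ['perfective', 'future']
```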
Given the plausibility of this alternate view of the Tlingit verbal inflectional system, I
sought to calculate the ACE of the Tlingit verbal system under this reorganization.
Consequently, a second paradigm table was input to Malouf's Python program, one containing
only the aforementioned 45 inflectional classes and the 16 remaining TAM forms in (2). The
resulting calculations were output as table (13) above. Note that in (13), we find that removing
the imperfective forms from the Tlingit verbal inflectional system dramatically reduces the
system's ACE; the ACE of the simplified system is 0.739, close to the 0.7 benchmark and within
the range predicted by the LCEC. Furthermore, the average ACE of the randomized variants of
the simplified inflectional system is 1.663, significantly higher than the observed value.
Thus, isolating the imperfective forms from the rest of the verbal inflectional system
increases the learnability of the resulting system. Interestingly, we can push this
initial result even further. Note that the first two rows in (13) are also relatively high. This
reflects the fact that the perfective form of a Tlingit verb provides relatively little information
regarding the realizations of the other TAM categories. This is due to the fact that in the
perfective, there is no contrast between three of the five Conjugation Classes.⁴ For this same
reason, however, the non-perfective (and non-imperfective) forms of a verb do provide good
information regarding the realization of the perfective form, a fact reflected in the relatively low
conditional entropy values in the first two columns of (13).
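The asymmetry can be illustrated with a toy table containing one row per Conjugation Class, each weighted equally. The labels are hypothetical placeholders. The only structure taken from the text is that the na-, ga- and gha-conjugations share a single perfective realization; as simplifying assumptions, the 0- and 0/y-conjugations are given one shared perfective realization of their own, and the future is taken to distinguish all five classes.

```python
from collections import Counter
from math import log2


def conditional_entropy(pairs):
    """H(Y|X) in bits from equally weighted (x, y) observations."""
    joint = Counter(pairs)
    marginal_x = Counter(x for x, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (x, _y), count in joint.items():
        h -= (count / n) * log2(count / marginal_x[x])
    return h


# Hypothetical toy realizations, one entry per Conjugation Class.
# Perfective: na-, ga-, gha- share one form; 0- and 0/y- share another (assumption).
perfective = {"0": "PERF-0", "0/y": "PERF-0", "na": "PERF-1", "ga": "PERF-1", "gha": "PERF-1"}
# Future: assumed, for illustration only, to be distinct in every class.
future = {"0": "FUT-0", "0/y": "FUT-0y", "na": "FUT-na", "ga": "FUT-ga", "gha": "FUT-gha"}

classes = list(perfective)
h_fut_given_perf = conditional_entropy([(perfective[c], future[c]) for c in classes])
h_perf_given_fut = conditional_entropy([(future[c], perfective[c]) for c in classes])
print(round(h_fut_given_perf, 3))  # 1.351
print(h_perf_given_fut)            # 0.0
```

Here H(future | perfective) = 0.4 + 0.6 log2 3 ≈ 1.35 bits, while H(perfective | future) = 0: the collapsed form is hard to predict from, yet easy to predict.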
In sum, the perfective form of a Tlingit verb provides little information regarding the
realizations of the other TAM forms, an issue that is again already well-known to Tlingit
specialists (Edwards 2009, Eggleston 2013). This raises the question of how ACE may be
affected by separating out both perfective and imperfective from the verbal inflectional system.
For this reason, I sought to calculate the ACE of a verbal inflectional system identical to the one
in (13), but lacking the categories perfective and irrealis perfective; such a paradigm table
was thus input to Malouf's Python program. The resulting calculations were output as table (14)
below. Note that (14) shows that the ACE of the resulting system is just 0.575, a relatively low
number, comparable to that of Fur and Russian in table (10). Further, note again that the average
ACE of the randomized variants of this system is still quite high by comparison, at 1.573.

⁴ That is, verbs of the na-, ga- and gha-conjugations all have the same form in the perfective, while verbs of the 0- and 0/y-conjugations receive a distinct form.


(14) Conditional Entropies of the Simplified Tlingit Verbal Inflectional System, Without Perfective


We find, then, that by removing just four of the eighteen TAM categories in (2), the ACE
of the verbal inflectional system drops considerably. That is, as is well-known to Tlingit
grammarians, the non-(im)perfective verbal forms constitute a morphological subsystem within
which relatively reliable predictions of form can be made. This, however, raises the question of
how reliably a learner will encounter such non-(im)perfective forms. Here, we paradoxically find
that these most informative verbal inflections are actually rather infrequent in natural speech,
especially as compared to the perfective and imperfective forms. This is a widely shared
impression amongst Tlingit scholars, and can be corroborated via formal textual counts. For
example, we find in (15) below that in a 40-minute conversation between two fluent elders,
Shgonde Walter Soboleff and Keiheenk'w John Martin, approximately 75% of the verbs were
either perfective or imperfective.
(15) Count of Inflected Verbal Forms in Recorded Tlingit Conversation⁵

a. Imperfective              156
b. Irrealis Imperfective       5
c. Perfective                224
d. Irrealis Perfective        13
e. Future                     35
f. Irrealis Future             2
g. Potential                   1
h. Irrealis Potential          1
i. Habitual                   51
j. Irrealis Habitual           5
k. Imperative                  5
l. Hortative                   2
m. Admonitive                  0
n. Consecutive                 1
o. Conditional                 1
p. Contingent                  9
q. Progressive                12
r. Repetitive                  7

Perfective / Irrealis Perfective:       237/530   45%
Imperfective / Irrealis Imperfective:   161/530   30%
Non-(Im)perfective:                     132/530   25%

It should be noted, however, that although each of the individual verbal forms in (15e)–(15r) is
rather infrequent, when taken as a whole, the non-(im)perfective forms are by no means rare,
accounting for a full quarter of the spoken verbal forms. Furthermore, the rarity of each individual
form accords well with the relatively low ACE of the forms in (15e)–(15r): given that each of these
forms is rather infrequent, it would be of much benefit to learners if a single such form allowed
reliable prediction of the others.
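The group totals and percentages in (15) can be checked directly from the individual counts; the snippet below simply re-adds the figures given in the table.

```python
# Counts of inflected verbal forms from table (15).
counts = {
    "Imperfective": 156, "Irrealis Imperfective": 5,
    "Perfective": 224, "Irrealis Perfective": 13,
    "Future": 35, "Irrealis Future": 2,
    "Potential": 1, "Irrealis Potential": 1,
    "Habitual": 51, "Irrealis Habitual": 5,
    "Imperative": 5, "Hortative": 2, "Admonitive": 0,
    "Consecutive": 1, "Conditional": 1, "Contingent": 9,
    "Progressive": 12, "Repetitive": 7,
}

total = sum(counts.values())
perfective = counts["Perfective"] + counts["Irrealis Perfective"]
imperfective = counts["Imperfective"] + counts["Irrealis Imperfective"]
other = total - perfective - imperfective  # the non-(im)perfective forms (15e)-(15r)

for label, n in [
    ("Perfective / Irrealis Perfective", perfective),
    ("Imperfective / Irrealis Imperfective", imperfective),
    ("Non-(Im)perfective", other),
    ("(Im)perfective combined", perfective + imperfective),
]:
    print(f"{label}: {n}/{total} ({100 * n / total:.0f}%)")
```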
⁵ This conversation was recorded as part of Alice Taff's Tlingit Conversation Documentation Project,
http://www.uas.alaska.edu/arts_sciences/humanities/alaska-languages/cuped/video-conv/index.html. It is listed as
"Conversation 16: Shgonde Walter Soboleff and Keiheenk'w John Martin."


As one final remark, it would be interesting if a learning simulation could be designed to
explore the diachronic stability of a system with the properties above, where the lowest ACE
within the system holds among relatively infrequent surface forms.
References
Ackerman, Farrell and Robert Malouf. 2013. Morphological Organization: The Low
Conditional Entropy Conjecture. Language 89(3): 429-464.
Crippen, James A. 2013. Tlingitology Seminar Notes: Background and Morphology.
Manuscript. University of British Columbia.
Edwards, Keri M. 2009. Dictionary of Tlingit. Juneau, AK: Sealaska Heritage Institute.
Eggleston, Keri M. 2013. 575 Tlingit Verbs: A Study of Tlingit Verb Paradigms. PhD
Dissertation. University of Alaska Fairbanks.
Leer, Jeff. 1991. The Schetic Categories of the Tlingit Verb. PhD Dissertation. University of
Chicago.

