Abstract

This paper presents empirical studies and closely corresponding theoretical models of the performance of a chart parser exhaustively parsing the Penn Treebank with the Treebank's own CFG grammar. We show how performance is dramatically affected by rule representation and tree transformations, but little by top-down vs. bottom-up strategies. We discuss grammatical saturation, including analysis of the strongly connected components of the phrasal nonterminals in the Treebank, and model how, as sentence length increases, the effective grammar rule size increases as regions of the grammar are unlocked, yielding super-cubic observed time behavior in some configurations.

1 Introduction

This paper originated from examining the empirical performance of an exhaustive active chart parser using an untransformed treebank grammar over the Penn Treebank. Our initial experiments yielded the surprising result that for many configurations empirical parsing speed was super-cubic in the sentence length. This led us to look more closely at the structure of the treebank grammar. The resulting analysis builds on the presentation of Charniak (1996), but extends it by elucidating the structure of non-terminal interrelationships in the Penn Treebank grammar. On the basis of these studies, we build simple theoretical models which closely predict observed parser performance, and, in particular, explain the originally observed super-cubic behavior.

We used treebank grammars induced directly from the local trees of the entire WSJ section of the Penn Treebank (Marcus et al., 1993) (release 3). For each length and parameter setting, 25 sentences evenly distributed through the treebank were parsed. Since we were parsing sentences from among those from which our grammar was derived, coverage was never an issue. Every sentence parsed had at least one parse, the parse with which it was originally observed.[1]

[1] Effectively "testing on the training set" would be invalid if we wished to present performance results such as precision and recall, but it is not a problem for the present experiments, which focus solely on the parser load and grammar structure.

The sentences were parsed using an implementation of the probabilistic chart-parsing algorithm presented in (Klein and Manning, 2001). In that paper, we present a theoretical analysis showing an O(n^3) worst-case time bound for exhaustively parsing arbitrary context-free grammars. In what follows, we do not make use of the probabilistic aspects of the grammar or parser.

2 Parameters

The parameters we varied were:

Tree Transforms: NOTRANSFORM, NOEMPTIES, NOUNARIESHIGH, and NOUNARIESLOW
Grammar Rule Encodings: LIST, TRIE, or MIN
Rule Introduction: TOPDOWN or BOTTOMUP

The default settings are shown above in bold face. We do not discuss all possible combinations of these settings. Rather, we take the bottom-up parser using an untransformed grammar with trie rule encodings to be the basic form of the parser. Except where noted, we will discuss how each factor affects this baseline, as most of the effects are orthogonal. When we name a setting, any omitted parameters are assumed to be the defaults.

2.1 Tree Transforms

In all cases, the grammar was directly induced from (transformed) Penn treebank trees. The transforms used are shown in figure 1. For all settings, functional tags and cross-referencing annotations were stripped. For NOTRANSFORM, no other modification was made. In particular, empty nodes (represented as -NONE- in the treebank) were turned into rules that generated the empty string (ε), and there was no collapsing of categories (such as PRT and ADVP) as is often done in parsing work (Collins, 1997, etc.).
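The grammar induction step described above (one CFG rule per local tree) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the nested-tuple tree format and the helper names `local_rules` and `induce_grammar` are assumptions made here for the example.

```python
# Sketch: inducing a treebank grammar from local trees.
# Trees are nested tuples (label, child1, child2, ...); leaves are word strings.
from collections import Counter

def local_rules(tree):
    """Yield one CFG rule (parent, children) per local tree (non-leaf node)."""
    if isinstance(tree, str):          # a terminal word contributes no rule
        return
    label, *children = tree
    kids = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, kids)
    for c in children:
        yield from local_rules(c)

def induce_grammar(treebank):
    """Count every local-tree rule observed over a corpus of trees."""
    return Counter(r for t in treebank for r in local_rules(t))

tree = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "slept")))
grammar = induce_grammar([tree])
```

Each distinct local tree becomes a rule, so the induced grammar covers every tree it was read from, which is why coverage is never an issue in the experiments above.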
[Figure 1: The tree transforms illustrated on an example tree (TOP, S-HLN, S, and various NP/POS-tag local trees; graphics not recoverable).]

[Figure 3: Average parse time by sentence length. Legend: List-NoTransform exp 3.54 r 0.999; Trie-NoTransform exp 3.16 r 0.995.]

[Figure 4: Average traversals and edges by sentence length for (a) the tree transforms (NoEmpties, NoUnariesHigh, NoUnariesLow), (b) the rule encodings (Trie, Min), and (c) the ratio of top-down to bottom-up (TD/BU); legend residue includes exponents from 2.60 to 3.83, all with r ≥ 0.999.]

3 Observed Performance

In this section, we outline the observed performance of the parser for various settings. We frequently speak in terms of the following:

span: a range of words in the chart, e.g., [1,3]
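The span notation can be made concrete with a small sketch (illustrative code, not part of the paper's parser): over an n-word sentence there are n − s + 1 spans of each size s, a fact the edge-count models later in the paper rely on.

```python
# Sketch: chart spans over an n-word sentence, written [i, j] with
# 0 <= i <= j <= n. A span of size s = j - i covers words i .. j-1.

def spans(n):
    """All (i, j) spans over n words, including zero-size spans [i, i]."""
    return [(i, j) for i in range(n + 1) for j in range(i, n + 1)]

n = 5
by_size = {}
for i, j in spans(n):
    by_size[j - i] = by_size.get(j - i, 0) + 1
# e.g. (1, 3) is a size-2 span covering words 1 and 2;
# there are n - s + 1 spans of each size s.
```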
parsing longer sentences as temporary objects cause increasingly frequent reclamation. To see past this effect, which inflates the empirical exponents, we turn to the actual traversal counts, which better illuminate the issues at hand. Figures 4 (a) and (b) show the traversal curves corresponding to the times in figure 3.

The interesting cause of the varying exponents comes from the "constant" terms in the theoretical bound. The second half of this paper shows how modeling growth in these terms can accurately predict parsing performance (see figures 9 to 13).

3.2 Memory

The memory bound for the parser is O(n^2|G|). Since the parser is running in a garbage-collected environment, it is hard to distinguish required memory from utilized memory. However, unlike time and traversals, which in practice can diverge, memory requirements match the number of edges in the chart almost exactly, since the large data structures are all proportional in size to the number of edges.[6] Almost all edges stored are active edges (an overwhelming share for sentences longer than 30 words), of which there can be O(n^2|G|): one for every grammar state and span. Passive edges, of which there can be O(n^2|C|), one for every category and span, are a shrinking minority. This is because, while |C| is bounded above by 27 in the treebank[7] (for spans ≥ 2), |G| numbers in the thousands (see figure 12). Thus, required memory will be implicitly modeled when we model active edges in section 4.3.

[6] A standard chart parser might conceivably require storing more than O(n^2) traversals on its agenda, but ours provably never does.
[7] This count is the number of phrasal categories with the introduction of a TOP label for the unlabeled top treebank nodes.

3.3 Tree Transforms

Figure 4 (a) shows the effect of the tree transforms on traversal counts. The NOUNARIES settings are much more efficient than the others; however, this efficiency comes at a price in terms of the utility of the final parse. For example, regardless of which NOUNARIES transform is chosen, there will be NP nodes missing from the parses, making the parses less useful for any task requiring NP identification. For the remainder of the paper, we will focus on the settings NOTRANSFORM and NOEMPTIES.

3.4 Grammar Encodings

Figure 4 (b) shows the effect of each rule encoding on traversal counts. The more compacted the grammar representation, the more time-efficient the parser is.

3.5 Top-Down vs. Bottom-Up

Figure 4 (c) shows the effect on total edges and traversals of using top-down and bottom-up strategies. There are some extremely minimal savings in traversals due to top-down filtering effects, but there is a corresponding penalty in edges as rules whose left-corner cannot be built are introduced. Given the highly unrestrictive nature of the treebank grammar, it is not very surprising that top-down filtering provides so little benefit. However, this is a useful observation about real-world parsing performance. The advantages of top-down chart parsing in providing grammar-driven prediction are often advanced (e.g., Allen 1995:66), but in practice we find almost no value in this for broad-coverage CFGs. While some part of this is perhaps due to errors in the treebank, a large part just reflects the true nature of broad-coverage grammars: e.g., once you allow adverbial phrases almost anywhere and allow PPs, (participial) VPs, and (temporal) NPs to be adverbial phrases, along with phrases headed by adverbs, then there is very little useful top-down control left. With such a permissive grammar, the only real constraints are in the POS tags which anchor the local trees (see section 4.3). Therefore, for the remainder of the paper, we consider only bottom-up settings.

4 Models

In the remainder of the paper we provide simple models that nevertheless accurately capture the varying magnitudes and exponents seen for different grammar encodings and tree transformations. Since the n^3 term of the bound
comes directly from the number of start, split, and end points for traversals, it is certainly not responsible for the varying growth rates. An initially plausible possibility is that the quantity bounded by the |C| term is non-constant in n in practice, because longer spans are more ambiguous in terms of the number of categories they can form. This turns out to be generally false, as discussed in section 4.2. Alter[...]

[...]tence the span is from and where in the sentence the span begins. Thus, for a given span size, we report the average over all spans of that size occurring anywhere.

4.1 Treebank Grammar Structure

The reason that effective growth is not found in the |C| component is that passive saturation stays almost constant as span size increases. However, the more interesting result is not that saturation is relatively constant (for spans beyond a small, grammar-dependent size), but that the saturation values are extremely large compared to |C| (see section 4.2). For the NOTRANSFORM and NOEMPTIES grammars, most categories are reachable from most other categories using rules which can be applied over a single span. Once you get one of these categories over a span, you will get the rest as well. We now formalize this.

Definition 2  A category A is empty-reachable in a grammar G if A can be built using only empty terminals.

The empty-reachable set for the NOTRANSFORM grammar is shown in figure 5.[8] These 23 categories plus the tag -NONE- create a passive saturation of 24 for zero-spans for NOTRANSFORM (see figure 9).

[Figure 5: The empty-reachable set for the NOTRANSFORM grammar: ADJP ADVP FRAG INTJ NAC NP NX PP PRN QP RRC S SBAR SBARQ SINV SQ TOP UCP VP WHADVP WHNP WHPP X.]

[8] The set of phrasal categories used in the Penn Treebank is documented in Manning and Schütze (1999, 413); Marcus et al. (1993, 281) has an early version.

Definition 3  A category B is same-span-reachable from a category A in a grammar G if B can be built from A using a parse tree in which, aside from at most one instance of A, every node not dominating that instance is an instance of an empty-reachable category.

The same-span-reachability relation induces a graph over the 27 non-terminal categories. The strongly-connected component (SCC) reduction of that graph is shown in figures 6 and 7.[9] Unsurprisingly, the largest SCC, which contains most "common" categories (S, NP, VP, PP, etc.), is slightly larger for the NOTRANSFORM grammar, since the empty-reachable set is non-empty. However, note that even for NOTRANSFORM, the largest SCC is smaller than the empty-reachable set, since empties provide direct entry into some of the lower SCCs, in particular because of WH-gaps.

[Figure 7: The same-span-reachability graph for the NOEMPTIES grammar; nodes include TOP, ADJP, ADVP, FRAG, INTJ, NP, CONJP, PP, PRN, QP, S, SBAR, UCP, VP, NAC, WHNP, SBARQ, WHADJP, WHPP, and WHADVP.]

[9] Implied arcs have been removed for clarity. The relation is in fact the transitive closure of this graph.

Interestingly, this same high-reachability effect occurs even for the NOUNARIES grammars, as shown in the next section.

4.2 Passive Edges

The total growth and saturation of passive edges is relatively easy to describe.
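Definition 2's empty-reachable set and the SCC reduction of section 4.1 can both be computed mechanically. The sketch below is illustrative only: the rule format is a toy, and the one-step graph fed to the SCC routine is a simplified stand-in for the full same-span-reachability relation, not the treebank grammar itself.

```python
# Sketch: (1) empty-reachable categories as a fixpoint over grammar rules;
# (2) strongly connected components of a reachability graph (Kosaraju's
# algorithm). Toy data throughout, not the actual treebank grammar.

def empty_reachable(rules):
    """Categories buildable from empty terminals only.
    rules: list of (lhs, rhs_tuple); '-NONE-' marks the empty terminal."""
    reach = {"-NONE-"}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in reach and all(sym in reach for sym in rhs):
                reach.add(lhs)
                changed = True
    return reach - {"-NONE-"}

def sccs(graph):
    """Strongly connected components of {node: [successors]} (Kosaraju)."""
    order, seen = [], set()
    def dfs(v):
        seen.add(v)
        for w in graph.get(v, []):
            if w not in seen:
                dfs(w)
        order.append(v)
    for v in graph:
        if v not in seen:
            dfs(v)
    rev = {}
    for v, ws in graph.items():
        for w in ws:
            rev.setdefault(w, []).append(v)
    comps, assigned = [], set()
    for v in reversed(order):
        if v not in assigned:
            stack, comp = [v], set()
            while stack:
                u = stack.pop()
                if u in assigned:
                    continue
                assigned.add(u)
                comp.add(u)
                stack.extend(rev.get(u, []))
            comps.append(comp)
    return comps

rules = [("NP", ("-NONE-",)), ("VP", ("-NONE-",)),
         ("S", ("NP", "VP")), ("SBAR", ("S",))]
er = empty_reachable(rules)                     # {'NP', 'VP', 'S', 'SBAR'}

# A toy one-step same-span graph: S and SBAR reach each other over a span.
components = sccs({"S": ["SBAR"], "SBAR": ["S"], "NP": ["S"]})
```

Once any member of an SCC is built over a span, the fixpoint/closure behavior shown here is exactly why the whole component (and everything reachable from it) follows, which is the observation behind the near-constant passive saturation.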
Figure 8: The average number of passive edges processed in practice (left), and predicted by our models (right). [Axes: Sentence Length 0–50; edge counts up to 30.0K; NoTransform shown.]
Figure 9: The average passive saturation (number of passive edges) for a span of a given size as processed in practice (left), and as predicted by our models (right). [Axes: Span Size 0–10; saturation 0–30; curves for NoTransform, NoEmpties, NoUnariesHigh, NoUnariesLow.]
Figure 8 shows the total number of passive edges by sentence length, and figure 9 shows the saturation as a function of span size.[10] The grammar representation does not affect which passive edges will occur for a given span.

[10] The maximum possible passive saturation for any span greater than one is equal to the number of phrasal categories in the treebank grammar: 27. However, empty and size-one spans can additionally be covered by POS tag edges.

The large SCCs cause the relative independence of passive saturation from span size for the NOTRANSFORM and NOEMPTIES settings. Once any category in the SCC is found, all will be found, as well as all categories reachable from that SCC. For these settings, the passive saturation can be summarized by three saturation numbers: zero-spans (empties) psat(0), one-spans (words) psat(1), and all larger spans (categories) psat(2+). Taking averages directly from the data, we have our first model, shown on the right in figure 9.

For the NOUNARIES settings, there will be no same-span reachability and hence no SCCs. To reach a new category always requires the use of at least one overt word. However, for spans of size 6 or so, enough words exist that the same high saturation effect will still be observed. This can be modeled quite simply by assuming each terminal unlocks a fixed fraction of the nonterminals, as seen in the right graph of figure 9, but we omit the details here.

Using these passive saturation models, we can directly estimate the total passive edge counts by summation:

    p_total(n) = Σ_{s=0}^{n} (n − s + 1) · psat(s)

The predictions are shown in figure 8. For the NOTRANSFORM or NOEMPTIES settings, this reduces to:

    p_total(n) = (n(n−1)/2) · psat(2+) + n · psat(1) + (n+1) · psat(0)

We correctly predict that the passive edge total exponents will be slightly less than 2.0 when unaries are present, and greater than 2.0 when they are not. With unaries, the linear terms in the reduced equation are significant over these sentence lengths and drag down the exponent. The linear terms are larger for NOTRANSFORM and therefore drag the exponent down more.[11] Without unaries, the more gradual saturation growth increases the total exponent, more so for NOUNARIESLOW than NOUNARIESHIGH. However, note that for spans around 8 and onward, the saturation curves are essentially constant for all settings.

[11] Note that, over these values of n, even a basic quadratic function like the simple sum n(n+1)/2 has a best-fit simple power curve exponent of less than 2 for the same reason. Moreover, note that n^2/2 has a higher best-fit exponent, yet will never actually outgrow it.

4.3 Active Edges

Active edges are the vast majority of edges and essentially determine (non-transient) memory requirements. While passive counts depend only on the grammar transform, active counts depend primarily on the encoding for general magnitude but also on the transform for the details (and exponent effects). Figure 10 shows the total active edges by sentence size for three settings chosen to illustrate the main effects. Total active growth is sub-quadratic for LIST, but has an exponent of up to about 2.4 for the TRIE settings.
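The passive-edge summation of section 4.2 and its reduced closed form can be sketched directly; the saturation constants below are illustrative placeholders, not the measured treebank values.

```python
# Sketch of the section 4.2 passive-edge model: total passive edges as a sum
# of per-size saturations, plus the closed form that applies when saturation
# is constant for all spans of size >= 2.

def p_total(n, psat):
    """Total passive edges: sum over span sizes s of (n - s + 1) * psat(s)."""
    return sum((n - s + 1) * psat(s) for s in range(n + 1))

def p_total_reduced(n, sat0, sat1, sat2plus):
    """Closed form for step-shaped saturation:
    (n+1) zero-spans, n one-spans, n(n-1)/2 larger spans."""
    return (n + 1) * sat0 + n * sat1 + n * (n - 1) // 2 * sat2plus

sat0, sat1, sat2plus = 24, 10, 20        # illustrative saturation levels

def step(s):
    return sat0 if s == 0 else sat1 if s == 1 else sat2plus

total = p_total(40, step)
assert total == p_total_reduced(40, sat0, sat1, sat2plus)
```

The quadratic n(n−1)/2 term dominates for long sentences, while the linear terms matter at the lengths measured, which is the mechanism behind the sub-2.0 best-fit exponents discussed above.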
Figure 10: The average number of active edges for sentences of a given length as observed in practice (left), and as predicted by our models (right). [Axes: Sentence Length 0–50; counts up to 2.0M.]
Figure 11: The average active saturation (number of active edges) for a span of a given size as processed in practice (left), and as predicted by our models (right). [Axes: Span Length 0–20; saturation up to 14.0K. Legend pairs: List-NoTransform exp 0.111 r 0.999 and exp 0.092 r 0.957; Trie-NoTransform exp 0.297 r 0.998 and exp 0.323 r 0.999; Trie-NoEmpties exp 0.298 r 0.991 and exp 0.389 r 0.997.]
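Active edges correspond to states of the rule encoding, and the TRIE encoding keeps those states few by sharing common right-hand-side prefixes. A minimal sketch, with toy rules and a hypothetical representation (the paper's actual data structures are not specified here):

```python
# Sketch: a TRIE encoding of rule right-hand sides. Rules sharing an RHS
# prefix share trie states, so one active edge per (state, span) stands in
# for many separate LIST-encoded dotted rules.

def build_trie(rules):
    """rules: iterable of (lhs, rhs_tuple).
    Returns (transitions, completions):
      transitions[state][symbol] -> next state (state 0 is the root);
      completions[state] -> set of LHS categories whose full RHS ends there."""
    transitions = {0: {}}
    completions = {}
    next_state = 1
    for lhs, rhs in rules:
        state = 0
        for sym in rhs:
            if sym not in transitions[state]:
                transitions[state][sym] = next_state
                transitions[next_state] = {}
                next_state += 1
            state = transitions[state][sym]
        completions.setdefault(state, set()).add(lhs)
    return transitions, completions

rules = [("NP", ("DT", "NN")),
         ("NP", ("DT", "NN", "NN")),
         ("NP", ("DT", "JJ", "NN"))]
trans, done = build_trie(rules)
# All three rules share the initial DT arc: a single state after reading DT.
```

A LIST encoding would keep one dotted-rule state per rule position (8 here, counting each rule separately), while the trie needs only 5 non-root states; a MIN encoding would compress suffixes as well.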
Figure 13: The average number of traversals for sentences of a given length as observed in practice (left), and as predicted by the models presented in the latter part of the paper (right). [Axes: Sentence Length 0–50; counts up to 15.0M. Observed/predicted exponents: List-NoTransform 2.60/2.60 (r 0.999); Trie-NoTransform 2.86/2.92 (r 1.000); Trie-NoEmpties 3.28/3.47 (r 1.000).]
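The "exp" values reported throughout the figures are best-fit power-law exponents for curves of the form f(n) ≈ c·n^exp. The paper does not spell out its fitting procedure; one common choice, sketched here, is least squares on log-log values, which also reproduces footnote 11's point that a quadratic-plus-linear sum fits with an exponent below 2.

```python
# Sketch: best-fit power-law exponent via least squares in log-log space.
import math

def power_fit_exponent(xs, ys):
    """Slope of the least-squares line of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

ns = list(range(1, 51))
tri = [n * (n + 1) / 2 for n in ns]      # the quadratic "simple sum"
exp_tri = power_fit_exponent(ns, tri)
# The linear term drags the best-fit exponent below 2 over these lengths,
# even though the function is asymptotically quadratic.
```

This is also why a curve with a lower best-fit exponent can still dominate in magnitude: the fit summarizes growth over the measured range, not asymptotic behavior.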
ing portion of the trie average about 3.7 (min FSAs 4.2).[20] Making another uniformity assumption, we assume that this combination probability is the continuation degree divided by the total number of passive labels, categorical or tag (73).

[20] This is a simplification as well, since the shorter prefixes that tend to have higher continuation degrees are on average also a larger fraction of the active edges.

In figure 13, we give graphs and exponents of the traversal counts, both observed and predicted, for various settings. Our model correctly predicts the approximate values and qualitative facts, including:

- For LIST, the observed exponent is lower than for TRIEs, though the total number of traversals is dramatically higher. This is because the active saturation is growing much faster for TRIEs; note that in cases like this the lower-exponent curve will never actually outgrow the higher-exponent curve.

- Of the settings shown, only TRIE-NOEMPTIES exhibits super-cubic traversal totals. Despite their similar active and passive exponents, TRIE-NOEMPTIES and TRIE-NOTRANSFORM vary in traversal growth due to the "early burst" of active edges which gives TRIE-NOTRANSFORM significantly more edges over short spans than its power law would predict. This excess leads to a sizeable quadratic addend in the number of transitions, causing the average best-fit exponent to drop without greatly affecting the overall magnitudes.

Overall, growth of saturation values in span size increases best-fit traversal exponents, while early spikes in saturation reduce them. The traversal exponents therefore range from LIST-NOTRANSFORM at 2.6 to TRIE-NOUNARIESLOW at over 3.8. However, the final performance is more dependent on the magnitudes, which range from LIST-NOTRANSFORM as the worst, despite its exponent, to MIN-NOUNARIESHIGH as the best. The single biggest factor in the time and traversal performance turned out to be the encoding, which is fortunate because the choice of grammar transform will depend greatly on the application.

5 Conclusion

We built simple but accurate models on the basis of two observations. First, passive saturation is relatively constant in span size, but large due to high reachability among phrasal categories in the grammar. Second, active saturation grows with span size because, as spans increase, the tags in a given active edge are more likely to find a matching arrangement over a span. Combining these models, we demonstrated that a wide range of empirical qualitative and quantitative behaviors of an exhaustive parser could be derived, including the potential super-cubic traversal growth over sentence lengths of interest.

References

James Allen. 1995. Natural Language Understanding. Benjamin Cummings, Redwood City, CA.

Eugene Charniak. 1996. Tree-bank grammars. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1031–1036.

Michael John Collins. 1997. Three generative, lexicalised models for statistical parsing. In ACL 35/EACL 8, pages 16–23.

Jay Earley. 1970. An efficient context-free parsing algorithm. Communications of the ACM, 6:451–455.

Dan Klein and Christopher D. Manning. 2001. An O(n^3) agenda-based chart parser for arbitrary probabilistic context-free grammars. Technical Report dbpubs/2001-16, Stanford University.

R. Leermakers. 1992. A recursive ascent Earley parser. Information Processing Letters, 41:87–91.

Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Boston, MA.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn treebank. Computational Linguistics, 19:313–330.

Robert C. Moore. 2000. Improved left-corner chart parsing for large context-free grammars. In Proceedings of the Sixth International Workshop on Parsing Technologies.