Guy Perrier
LORIA - université Nancy 2
BP 239
54506 Vandœuvre-lès-Nancy cedex - France
perrier@loria.fr
[Figure 1: PTD associated with the relative pronoun qui]

• a virtual feature f ∼ v expresses a linguistic property that needs to be realised by combining with an actual feature (an actual feature is a positive, negative or neutral feature).

In figure 1, the empty node representing the trace of the prepositional phrase extracted from the relative clause carries a positive feature cat → pp and a negative feature funct ← ⟨1⟩?, which means that this node provides a prepositional phrase that needs to receive a syntactic function. The tree root carries a virtual feature cat ∼ np which means that the node represents a virtual noun phrase which has to combine with an actual noun phrase.

The descriptions labelled with polarised feature structures are called polarised tree descriptions (PTDs) in the rest of the article.
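The way polarised features combine when nodes are merged can be illustrated with a small sketch. It is not the article's definition: the combination rules below are assumptions chosen to be consistent with the examples in the figures (a positive and a negative feature neutralise each other, a virtual feature is absorbed by an actual one), and the names `combine` and `is_saturated` are hypothetical.

```python
from typing import Iterable, Optional

# Polarity symbols as written in the figures:
# "->" positive, "<-" negative, "~" virtual, "=" neutral.
POS, NEG, VIRT, NEUT = "->", "<-", "~", "="


def combine(p: str, q: str) -> Optional[str]:
    """Polarity of a feature obtained by merging two features.

    Assumed rules (a simplification, not quoted from the article):
    - positive + negative neutralise each other;
    - a virtual feature is absorbed by whatever it combines with;
    - two neutral features stay neutral;
    - every other combination is refused (None).
    """
    if {p, q} == {POS, NEG}:
        return NEUT
    if p == VIRT:
        return q
    if q == VIRT:
        return p
    if p == q == NEUT:
        return NEUT
    return None


def is_saturated(polarities: Iterable[str]) -> bool:
    """A description is saturated when only neutral features remain."""
    return all(p == NEUT for p in polarities)


# cat -> pp (provided by the trace) meets cat <- pp (expected by the verb).
merged_cat = combine(POS, NEG)
# cat ~ np (virtual root) meets an actual neutral noun phrase.
merged_root = combine(VIRT, NEUT)
```

Under these assumptions, a model is acceptable only if `is_saturated` holds for all its features, which mirrors the requirement that virtual features be realised by actual ones.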
Fig. 2: PTD associated with the sentence Jean la voit and its minimal saturated model
Figure 2 presents an example of parsing for the sentence Jean la voit (Jean sees her).³ The left side shows the set of initial PTDs associated with the sentence by the grammar. The grammar being lexicalised, each PTD is anchored by a word of the sentence and has been extracted from a lexicon. These PTDs have been gathered in a unique PTD and precedence relations between anchors have been added to express word order in the sentence. These relations do not appear in figure 2.

The computation of the model shown on the right side of figure 2 from the initial description shown on the left side is performed by a sequence of 3 node mergings.⁴ The interaction of tree constraints with these mergings entails two other mergings and a partial tree superposition.

³ We have simplified the figure by ignoring agreement features.
⁴ The head of each node includes the numbers of the nodes from the initial PTD which have been merged.

3 The expressivity of Interaction Grammars

Within the limits of this article, we have chosen to illustrate three aspects which are especially significant.

3.1 Unbounded dependencies and underspecified dominance relations

Underspecified dominance relations are used to represent unbounded dependencies and the feature structures that can be associated with these relations allow the expression of constraints on these dependencies: barriers to extraction for instance.

Relative pronouns, such as qui or lequel, give rise to pied piping as the following sentence shows: Jean [dans l’entreprise de qui] Marie sait que l’ingénieur travaille , est malade (Jean [in whose firm] Marie knows that the engineer works , is ill):

• There is a first unbounded dependency between the verb travaille and its extracted complement dans l’entreprise de qui. The trace of the extracted complement is denoted by the symbol. The dependency is modelled in the PTD associated with the qui relative pronoun represented in figure 1 by means of an underspecified dominance relation. The constraint linked to this dominance relation expresses that the dependency of the prepositional phrase on the verb of which it is the complement can only cross an unspecified sequence of embedded object clauses.

• Inside the prepositional phrase, there is a second unbounded dependency between the head of the constituent and the qui relative pronoun, which can be embedded arbitrarily deeply. This dependency is also represented in figure 1 with an underspecified dominance relation and the linked constraint expresses that all embedded constituents from the prepositional phrase to the qui relative pronoun are common nouns, noun phrases or prepositional phrases.

3.2 Polarities used for modelling negation

In French, negation can be expressed with the help of the particle ne paired with a specific determiner, pronoun or adverb. The position of the particle ne is fixed before an inflected verb but the second component of the pair, if it is a determiner like aucun or a pronoun like personne, can have a relatively free position in the sentence, as illustrated by the following examples:

(a) Jean ne parle à aucun collègue (Jean speaks to no colleague).
(b) Jean ne parle à la femme d’aucun collègue (Jean speaks to the wife of no colleague).
(c) Aucun collègue de Jean ne parle à sa femme (No colleague of Jean’s speaks to his wife).

As figure 3 shows, the pairing of ne with aucun is expressed with a neg polarised feature attached to the node representing the maximal projection of the verbal kernel: aucun is waiting for such a feature, which will be provided by ne. The relatively free position of aucun is expressed by an underspecified dominance relation of the node representing the clause on the noun phrase that it introduces. The constraint linked to this dominance relation expresses the fact that aucun can only introduce arguments of the verbal head of the sentence or complements of these arguments.
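The constrained dominance relations used in sections 3.1 and 3.2 can be pictured as a check on the path between two nodes of a candidate model: the upper node must dominate the lower one, and the nodes along the path must carry an allowed category. The following is a minimal sketch under that reading; the tree encoding (a parent map plus a category map), the node names and the function `path_satisfies` are hypothetical, and including both end nodes in the check is an assumption.

```python
from typing import Dict, Optional, Set


def path_satisfies(parent: Dict[str, Optional[str]],
                   cat: Dict[str, str],
                   ancestor: str,
                   descendant: str,
                   allowed: Set[str]) -> bool:
    """True if `ancestor` dominates `descendant` and every node on the
    path between them (both ends included) has an allowed category."""
    node: Optional[str] = descendant
    while node is not None:
        if cat[node] not in allowed:
            return False          # a forbidden category blocks the dependency
        if node == ancestor:
            return True           # reached the upper node: constraint holds
        node = parent[node]       # climb one level
    return False                  # ran past the root: no dominance at all


# Hypothetical model of "dans l'entreprise de qui":
# pp_dans > np_entreprise > pp_de > np_qui
parent = {"pp_dans": None, "np_entreprise": "pp_dans",
          "pp_de": "np_entreprise", "np_qui": "pp_de"}
cat = {"pp_dans": "pp", "np_entreprise": "np", "pp_de": "pp", "np_qui": "np"}

ok = path_satisfies(parent, cat, "pp_dans", "np_qui", {"n", "np", "pp"})

# The same path with an intervening clause node is rejected.
parent_s = {"pp_dans": None, "s_rel": "pp_dans", "np_qui": "s_rel"}
cat_s = {"pp_dans": "pp", "s_rel": "s", "np_qui": "np"}
blocked = path_satisfies(parent_s, cat_s, "pp_dans", "np_qui", {"n", "np", "pp"})
```

The same function covers the constraint on aucun by passing `{"np", "pp"}` as the allowed set, which is how the sketch interprets the `cat = np | pp` label visible in figure 3.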
Fig. 3: PTDs respectively associated with the particle ne and the determiner aucun
3.3 The adjunction of modifiers by means of virtual polarities

In French, the position of adjuncts in the sentence is relatively free, as illustrated by the following example. In the sentence Jean va rendre visite à Marie (Jean is going to visit Marie), the sentence modifier le soir (tonight) can appear at any position marked with a symbol, according to different communicative goals.

The virtual polarity f ∼ v did not exist in the previous version of IG [6]. Modifier adjunction was performed by addition of a new level in the syntactic tree of the constituent being modified. Sometimes, introducing an additional level is justified linguistically, but in most cases it introduces artificial complexity and ambiguity. Taking up an idea from [4] and its system of black and white polarities, we have introduced virtual polarities. This allows a modifier to be added as a new daughter of the node that it modifies without changing the rest of the syntactic tree in which the modified node is situated. This operation is called sister adjunction and it is used in some formalisms: dependency grammars, description substitution grammars [8]. This way of modelling modifiers is more flexible and it allows the previous examples to be treated without difficulty, including parenthetical clauses.

4 The architecture of the grammar

4.1 The modular organisation of the grammar

The grammar has been built with the XMG tool [2], which allows grammars to be written with a high level of abstraction in a modular setting and to be compiled into low level grammars, usable by NLP systems.

A grammar is organised as a class hierarchy by means of two composition operations: conjunction and disjunction. It is also structured according to several dimensions, which are present in all classes. Our grammar uses only two dimensions: the first one is the syntactic dimension, where objects are PTDs, and the second one is the dimension of the interface with the lexicon, where objects are feature structures.

To define the conjunction of two classes one needs to specify the way of combining the components of each dimension: for the syntactic dimension, PTD union is performed; for the lexicon interface dimension, it is realised as unification between feature structures.

The current grammar is composed of 448 classes, including 121 terminal classes, which are compiled into 2059 PTDs. These classes are ranked by family. Some classes from a family can be used in the definition of classes belonging to another family. This is the case for instance for the Complement family, which includes classes related to complements of predicative structures. It is used by three other families: Adjective, Noun and VerbDiathese, which respectively refer to adjectives, nouns and various verbal diatheses.

4.2 The link with a lexicon independent of the formalism

The grammar, in its current setting, is totally lexicalised: each elementary PTD of the grammar has a unique anchor node intended to be linked with a word of the language. Each PTD is associated with a feature structure, which describes a syntactic frame corresponding to words able to anchor the PTD, the description being independent of the formalism. This feature structure constitutes the PTD interface with the lexicon.

The set of features used in the interfaces differs from that used in PTDs because they do not play the same role: they do not aim at describing syntactic structures but they are used for describing the morpho-syntactic properties of the words of the language in a way independent of the formalism.

The left side of figure 4 shows a non anchored PTD describing the syntactic behaviour of a transitive verb in the active voice. The PTD is accompanied by its interface, which is a two level feature structure.

The lexicon associates words of the language with syntactic frames in a form identical to the PTD interfaces. For instance, the central part of figure 4 shows a lexical entry for the verb voit in its transitive use.

The PTD anchoring is then performed by unification of the PTD interfaces with the compatible entries of the lexicon. Figure 4 on its right side shows a PTD anchored by the transitive verb voit. This PTD comes from the unification between the lexical entry for voit presented in the center of the figure and the interface of the non anchored PTD on the left side of the figure.
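Anchoring by unification, as described in section 4.2, can be sketched with two-level feature structures whose atomic values are sets of admissible atoms. This is an illustration only: the representation, the names `unify_fs` and `unify_interface`, and the treatment of missing features as unconstrained are assumptions, and the feature values are loosely transcribed from figure 4 (shared variables such as ⟨1⟩? are simply left out).

```python
from typing import Dict, Optional, Set

FS = Dict[str, Set[str]]        # flat structure: feature -> admissible values
Interface = Dict[str, FS]       # two levels: "head" / "subj" / "obj" -> FS


def unify_fs(a: FS, b: FS) -> Optional[FS]:
    """Unify two flat feature structures; None signals a clash."""
    result: FS = {}
    for feat in a.keys() | b.keys():
        if feat in a and feat in b:
            common = a[feat] & b[feat]      # disjunctions unify by intersection
            if not common:
                return None
            result[feat] = common
        else:
            # A feature present on one side only is unconstrained on the other.
            result[feat] = set(a[feat] if feat in a else b[feat])
    return result


def unify_interface(a: Interface, b: Interface) -> Optional[Interface]:
    """Unify two two-level interfaces component by component."""
    result: Interface = {}
    for key in a.keys() | b.keys():
        merged = unify_fs(a.get(key, {}), b.get(key, {}))
        if merged is None:
            return None
        result[key] = merged
    return result


# Interface of the non anchored transitive-verb PTD (illustrative values).
PTD_INTERFACE: Interface = {
    "head": {"cat": {"v"}, "mood": {"cond", "ind", "subj"},
             "refl": {"maybe", "never"}},
    "subj": {"cat": {"np"}, "funct": {"subj"}},
    "obj": {"cat": {"np"}, "funct": {"obj"}},
}

# Lexical entry for "voit" in its transitive use (illustrative values).
VOIT_ENTRY: Interface = {
    "head": {"cat": {"v"}, "mood": {"ind"}, "num": {"sg"}, "pers": {"3"},
             "tense": {"pres"}, "aux": {"avoir"}, "passiv": {"total"},
             "refl": {"maybe"}},
    "subj": {"cat": {"np"}, "funct": {"subj"}},
    "obj": {"cat": {"np"}, "funct": {"obj"}},
}

anchored = unify_interface(PTD_INTERFACE, VOIT_ENTRY)

# A noun entry is incompatible with a verbal interface.
NOUN_ENTRY: Interface = {"head": {"cat": {"n"}}}
rejected = unify_interface(PTD_INTERFACE, NOUN_ENTRY)
```

In this sketch a successful unification narrows the disjunctive values of the interface (for instance the mood of the head) to those allowed by the lexical entry, while an incompatible entry yields no anchored PTD at all.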
Fig. 4: From left to right, a non anchored PTD describing the syntactic behaviour of a transitive verb in the active voice, a lexical entry for the transitive verb voit and the PTD after anchoring with the verb voit
5 Evaluation on a sentence test suite

Our goal is to evaluate the coverage of our grammar in the most detailed manner. The least costly way of doing this is to use the grammar for parsing a sentence test suite illustrating most rules of French grammar. It is important that the suite includes not only positive examples but also negative examples to test the overgeneration of the grammar.

There are not many corpora of this type for French. We have chosen the TSNLP [3], which includes 1690 positive sentences and 1935 negative sentences. It is far from covering all of French grammar; in particular, it includes very few complex sentences but it stresses some phenomena such as coordination or the position of adverbial complements in the sentence. On the other hand, our grammar covers phenomena that are ignored by the TSNLP: the passive and middle voice of verbs, the subcategorisation of predicative nouns and adjectives, the control of the subject of infinitive complements, relative and interrogative clauses...

For the parsing, we used LEOPAR⁵, which is a parser devoted to IG. With the current grammar, the parser accepts 88% of the 1690 positive TSNLP sentences and rejects 85% of the 1935 negative sentences. The 15% of accepted negative sentences are due to the fact that the grammar ignores phonological rules and semantics. The 12% of unanalysed positive sentences are due to various reasons: speech sentences, frozen or semi-frozen expressions, phenomena that are not yet taken into account (causatives, superlatives...).

[...] with a large lexicon for such a task. It is necessary to enrich the grammar because some common linguistic phenomena are not yet taken into account. We also need to improve the efficiency of the parser to contain the possible explosion resulting from the increase of the grammar size in combination with the increased sentence length.

References

[1] J. Bresnan. Lexical-Functional Syntax. Blackwell Publishers, Oxford, 2001.
[2] D. Duchier, J. Le Roux, and Y. Parmentier. XMG : Un compilateur de méta-grammaires extensible. In TALN 2005, Dourdan, France, 2005.
[3] S. Lehmann, S. Oepen, S. Regnier-Prost, K. Netter, V. Lux, J. Klein, K. Falkedal, F. Fouvry, D. Estival, E. Dauphin, H. Compagnion, J. Baur, L. Balkan, and D. Arnold. TSNLP - Test Suites for Natural Language Processing. In Proceedings of COLING 1996, Copenhagen, 1996.
[4] A. Nasr. A formalism and a parser for lexicalised dependency grammars. In 4th International Workshop on Parsing Technologies (IWPT), 1995.
[5] G. Perrier. Interaction grammars. In CoLing ’2000, Saarbrücken, pages 600–606, 2000.
[6] G. Perrier. La sémantique dans les grammaires d’interaction. Traitement Automatique des Langues, 45(3):123–144, 2004.
[7] G. K. Pullum and B. C. Scholz. On the Distinction between Model-Theoretic and Generative-Enumerative Syntactic Frameworks. In LACL 2001, Le Croisic, France, volume 2099 of Lecture Notes in Computer Science, pages 17–43, 2001.
[8] O. Rambow, K. Vijay-Shanker, and D. Weir. D-tree substitution grammars. Computational Linguistics, 27(1):87–121, 2001.
[9] C. Retoré. The Logic of Categorial Grammars, 2000. ESSLLI’2000, Birmingham.