
ISSN (Print): 0974-6846
ISSN (Online): 0974-5645

Indian Journal of Science and Technology, Vol 10(13), DOI: 10.17485/ijst/2017/v10i13/110448, April 2017

Verb Morphological Generator for Telugu


Sasi Raja Sekhar Dokkara1*, Suresh Varma Penumathsa1 and Somayajulu G. Sripada2
1 Department of Computer Science, Adikavi Nannayya University, Rajah Rajah Narendhra Nagar, East Godavari, Rajahmundry - 533296, Andhra Pradesh, India; dsairajasekhar@gmail.com, vermaps@yahoo.com
2 Department of Computing Science, University of Aberdeen, UK; yaji.sripada@abdn.ac.uk

* Author for correspondence

Abstract
Objectives: The current work is a morphological generation engine that generates the required inflected Telugu verb form from an input specification consisting of lexicalized grammatical constituents and associated features. Methods/Statistical Analysis: The method employed in this paper uses finite state techniques to build a computational model for the morphological generation of verbs in Telugu. The current work is a module of a surface realization engine for Telugu, a Java application developed for the generation of well-formed Telugu sentences. Test samples were taken from Telugu grammar textbooks and tested thoroughly with various alternatives of the subject with respect to person, number and gender. Findings: The evaluation was performed on a small data set because larger authentic data sets were not available online. Hence the findings cannot be generalized, but the results show that the verbs are not evenly distributed across the classes. The results also show that no verbs were found for some of the classes, which suggests that verbs of those classes are not in regular use. The findings cannot be compared with other published results because very little previous work has been done in this area for Telugu. The evaluation suggests that, rather than aiming for complete coverage of verbs, it is better to extend coverage based on utility in NLG systems. Application/Improvements: The current work has applications in general purpose surface realization engines and machine translation systems. We intend to create a generalized morphology engine which generates the required word form for Telugu words.

Keywords: Finite Automata, Morphological Generator, Morphophonemic Group, Natural Language Generation, Personal Suffix, Tense Mode Suffix, Verb Class, Verb Forms

1. Introduction
Telugu is a Dravidian language spoken by people from the south Indian states of Andhra Pradesh and Telangana. It is a morphologically rich free word order language with nearly 90 million first language speakers. In this paper we describe a morphology engine which automatically generates the different forms of verbs in Telugu.

Morphological Analyser (MA) and Morphological Generator (MG) are two very important parts of Natural Language Processing (NLP) applications like machine translation systems1 and surface realization engines2. A Morphological Analyser analyses a given word into its root along with its grammatical information, whereas a Morphological Generator, given a root along with its grammatical information, generates the corresponding word. Morphological Generators (MG) play a very important role in Natural Language Generation (NLG) for free word order languages like Telugu. In practice it is always advantageous to have the Morphological Generator as a component separate from the rest of the NLG system3. The current work is a separate module of a surface realization engine for Telugu2, a Java application which performs the final subtask of a Natural Language Generation (NLG) pipeline4. The sentence realization engine for Telugu is designed following the approach of SimpleNLG5, a very popular surface realization engine for English.

The morphological engine described in this paper is modelled on the morphological engine for English3. Because Telugu is a morphologically rich language, the morphology of Telugu verbs and nouns is comparatively more complex. For example, the morphology of Telugu verbs involves defining morphology for six different



classes of verbs. Similarly, the morphology of nouns involves defining morphology for seven different classes of nouns with respect to the grammatical feature number. We therefore have two separate morphological engines, one for verbs and one for nouns and pronouns. In the implementation, instead of using tools like Flex or JFlex, we programmed our morphological engine in Java using the regular expression package. As mentioned earlier, in this paper we describe the morphological engine for verbs.

The process of verb morphology depends on the way in which the verbs are classified. Telugu verbs are linguistically classified into a small number of conjugation types based on the morphophonemic changes the verb stems undergo when inflected with tense-mode suffixes. The model of analysis decides the number of types into which the verbs can be classified.

In the current work the verb morphological generator does not have an explicit lexicon or word list. Instead, it has a computational model based on finite state techniques that classifies all the verbs into a few regular classes, plus a very small list of words for the irregular class. The suffixes to be added to the verbs are maintained in separate XML files and concatenated to the variants of the verb roots to form the final inflected form.

Morphology has been well studied by both theoretical6,7 and computational linguists8,9. From a theoretical perspective, the structure of words is explained by the following three models:
• Item and Arrangement model: a morpheme based approach in which word forms are analysed as arrangements of morphemes. In this model a morpheme is treated as the minimal meaningful unit of a language and words are treated as concatenations of morphemes.
• Item and Process model: a lexeme based approach in which a word form is assumed to be the result of applying rules that alter a stem to produce a new one. In this model inflectional, derivational and compounding rules are applied to a stem to obtain the required word form.
• Word and Paradigm model: a word based approach which states generalizations that hold between the different forms of inflectional paradigms.

From a computational perspective, though, the three theoretical models described above have been shown to offer no significant computational advantage over the finite state approaches that have been widely applied to building MAs and MGs for a diverse range of languages3,8-11. Therefore, in the current work we apply finite state techniques to Telugu morphology. Among Indian languages, the survey in12 reported the highest number of morphology tools for Tamil. According to that survey, a wide range of approaches exists, from corpus based through suffix stripping to finite state. A database approach that stores all the word forms in a relational database is described in13. For Telugu, a word and paradigm based morphological analyser and generator is described in1, an item and arrangement based morphological generator is described in14, and a rule based (item and process based) morphological generator is described in15.

2. Input Specification
The verb morphology engine in the current work is part of a surface realization engine which is responsible for the automatic generation of grammatically well-formed Telugu sentences. The input for the surface realization engine is an XML file which holds all the grammatical information required at both the sentence level and the word level. Figure 1 shows an example XML specification corresponding to the Telugu sentence (1).

sIwa rAmudini piliciMxi. (Sita called Rama.) (1)

The notation used in Figure 1 to write Telugu words in Roman script is called WX notation16. It is a very popular transliteration scheme for representing Indian languages in the ASCII character set and is widely used in Natural Language Processing in India. In WX notation, lower-case letters are used for unaspirated consonants and short vowels, while upper-case letters are used for aspirated consonants and long vowels. The retroflex voiced and voiceless consonants are mapped to 't, T, d and D', and the dentals are mapped to 'w, W, x and X'. Hence the name of the scheme, "WX", which refers to this idiosyncratic mapping.

3. Morphological Generation Process
As mentioned in Section 1, the current work combines a computational model based on finite state techniques with XML files. The computational model is a Java application which uses the "java.util.regex"

2 Vol 10 (13) | April 2017 | www.indjst.org Indian Journal of Science and Technology
Sasi Raja Sekhar Dokkara, Suresh Varma Penumathsa and Somayajulu G. Sripada

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<document>
  <sentence type=" " predicatetype="verbal" respect="no">
    <nounphrase role="subject">
      <head pos="noun" gender="nonmasculine" number="singular" person="third" casemarker=" " stem="basic">sIwa</head>
    </nounphrase>
    <nounphrase role="complement">
      <head pos="noun" gender="masculine" number="singular" person="third" casemarker="ni" stem="basic">rAmudu</head>
    </nounphrase>
    <verbphrase type=" ">
      <head pos="verb" tensemode="pasttense">piluc</head>
    </verbphrase>
  </sentence>
</document>

Figure 1. XML Input Specification.

package. The input to the computational model is the verb lemma, the tense mode of the verb, the PNG (person, number and gender) of the subject and the case marker of the subject. The "Pattern" class of the "regex" package has a static method "matches", which compiles a given regular expression (in effect, a finite state automaton) and is used to identify the class to which a given verb lemma belongs. The computational model also computes the final constituent of the stem in the inflected verb and finally concatenates it with the required suffixes extracted from the XML files.

4. Verb Forms
The input for the verb in the example XML specification of Figure 1 is as follows:
<head pos="verb" tensemode="pasttense">piluc</head>
The first attribute is "pos", which stands for part of speech, and the second attribute is "tensemode". Verbs in Indian languages inflect for tense, aspect, modality (mode), and PNG (person, number and gender) endings. In most languages the verb co-occurs with separate markers of tense, aspect and modality, whereas in Telugu aspect and modality are packed into a single verbal inflection, referred to as "tensemode" in the current work. There are a total of 18 verb forms, finite and non-finite, that are of importance in Telugu. Our morphological engine can generate all of these verb forms, but only the finite forms are used by the surface realization engine, and therefore we discuss the finite forms in detail.

4.1 The Imperative
Imperative verbs are used to express a command or a request. The imperative takes the form of a command in the singular and a request in the plural.
The imperative forms of the verb are only used when the first person singular addresses the second person, either singular or plural. Therefore, the imperative forms carry two suffixes. In the negative imperative, the second person suffix is added to the verb root + "ak" (the negative imperative suffix). The imperative suffixes are shown in Table 1.

Table 1. Imperative suffixes
Form | II Person Singular | II Person Plural
Affirmative | u (in some cases "i") | aMdi
Negative | ak-u | ak-aMdi

Principles for the formation of the imperative verbs
• The basic verb stems undergo the same changes as in the negative tense when the imperative suffixes are added (see Section 4.7).
• The rules of stem final vowel loss and harmony (i.e. change of medial "u" to "a" when followed by "a") apply to imperative verbs.


Example: pAdu (sing) + aMdi → pAdaMdi (request to sing)
• Stems ending in "s" preceded by a long vowel change "s" to "y" in the imperative mood. These stems optionally add the suffix "i" instead of "u" in the singular. When the "i" suffix is added, the stem vowel is optionally shortened and "y" becomes "yy".
Example:
ces (do) + u → ceVyyi (command to do)
ces (do) + aMdi → ceVyyaMdi (request to do)
Exception: When the stem vowel is "A", it is not shortened.
Example:
rAs (write) + u → rAyi (command to write)
• In basic stems having two syllables and ending in "c" or "s", the final consonant is replaced by "v" before the imperative suffix.
Example:
piluc (call) + u → piluvu (command to call)
kalus (meet) + aMdi → kalavaMdi (request to meet)
• When the stem variant ends in a long vowel, the beginning of the imperative suffix is dropped.
Example:
rA (come) + u → rA (command to come)
• One irregular verb in the imperative is "pax-a" (go). The final "a" here is treated as the imperative suffix.

4.2 The Abusive
Many verbs cannot occur in this mood due to semantic restrictions. A few verbs like "kAlu" (to burn), "kUlu" (to fall), "cAvu" (to die) and "pagulu" (to break) occur in this mood. Some example sentences using abusive verb forms are as follows:
nI illu kUla (May your house fall)
nI kadupu kAla (May your womb (children) burn)
nI mokaM pagala (May your face break)

4.3 The Obligative
The obligative is formed by adding the finite or perfective form of the defective verb "vAl" to the infinitive of a main verb. The finite form of this verb in the future habitual tense is "vAli" (must). Some example sentences using obligative verb forms are as follows:
nenu iMtiki veVlYlYAli (I need to go home)
mIru mA Uru rAvAli (You should come to our town)
The perfective participle of "vAl" is "vAlsi", inflected only in the non-masculine singular.
Example: mIru gudiki rAvAlsiMxi (You must have come to the temple)
The obligative verb does not agree with the subject in person, gender or number. It always occurs in the third person non-masculine singular or without any personal suffix.

4.4 The Future Habitual
The future habitual tense in Telugu can express an action or a state that will take place in the future, or an action or state that is habitual. The sentence "nenu annaM wiMtAnu" can mean either 'I will eat food' or 'I eat food'.
Principles for the formation of the future habitual tense
• The basic tense suffix for the future habitual tense is "wA/wun".
• Verb stems like "ammu" (sell) and "adugu" (ask) occur unchanged before the tense suffix.
Example: ammu (sell) + wA → ammuwA (will sell)
• In basic stems ending in "s" or a long vowel, the tense suffix is added directly.
Example: kalus (meet) + wA → kaluswA (will meet)
• In basic stems ending in "n", the tense suffix changes to "tA/tun".
Example: win + tA → wiMtA
• Monosyllabic stems ending in "tt" (koVttu, beat) and "pp" (ceVppu, tell) change these to "da" and "bu" respectively before the tense suffix "wA" (kodawA, will beat; cebuwA, will tell), and to "du" and "bu" respectively before the tense suffix "wuM" (koduwuMtA, beats; cebuwuMtA, tells).
• Stems ending in "c", "cc" or "Mc" change those elements to "s" before the tense suffix.
Example: piluc (call) + wA → piluswA (will call)

4.5 The Past
In Telugu the past tense corresponds to two past tenses in English; for example, "vaccAnu" represents both 'I came' and 'I have come'.
Principles for the formation of the past tense
• The tense suffix "e/iM" and the personal suffix are added to the verb stem to form the past tense.
Example: piluc + iM → piliciM
• The stem final "u" before the tense suffix "e/iM" is dropped as a result of sandhi.
Example: wodugu + e → wodigA
• A non-initial "u" in the stem becomes "i" when the past tense suffix is added.


Example: piluc + e → pilicA
• Verb stems that end in a short vowel + "n" generally have "nA" as the past tense suffix, but the third person singular feminine has "na" as the past tense suffix.
Example: win + nA → winnA
• The past tense suffix for the verb stem "pad" (fall) is "dA", but for the third person feminine singular it is "da".
Example: pad + dA → paddA
• A verb stem ending in "s" becomes "S" in some cases when the past tense suffix follows.
Example: kalus + e → kaliSA

4.6 The Hortative
An imperative verb that includes the speaker is called a hortative verb. In Telugu the hortative verb is formed by adding to the verb stem the hortative suffix "xA" followed by the first person plural suffix "mu/M". The hortative form also conveys a future meaning involving both the addresser and the addressed.
Principles for forming the hortative verb form are as follows:
• The hortative form is obtained by adding the hortative suffix, followed by the first person plural suffix, to the verb stem as it appears in the habitual future.
Example: ammu (sell) + xA-M → ammuxAM ((we) will sell)
• Future habitual stems ending in "c" and "s" change these to "x" (the dental "d" in WX notation) in the hortative.
Example: piluc (call) + xA-M → pil-ux + xA-M → piluxxAM ((we) will call) (Table 3, class IIa1)

4.7 The Negative Finite
In Telugu the negative tense is expressed through a verb paradigm rather than through a separate word of negation as in most languages. The negative verbs are in the future habitual tense and negate the affirmative verb occurring in that tense. Some example sentences using the negative finite verb forms are as follows:
nenu annaM winanu (I will not eat food)
vAdu iMtiki rAdu (He will not come home)
The negative suffix in Telugu is "a". It occurs after the verb root and before the personal suffix. The personal suffixes in the negative tense are the same as in the other tenses, except that for the third person singular non-masculine and the third person plural non-human, "xi" and "yi" become "xu" and "vu" respectively.
Principles for the formation of the negative tense
• The negative tense is formed by adding the negative tense suffix "a" to the basic stem, followed by the personal suffix.
Example: win (eat) + a → win-a (will not eat)
• The medial "u" of basic stems having two or more syllables of the form (C)VCuC(u) changes to "a" when followed by the negative suffix.
Example: wodugu (wear) + a → wodag-a (will not wear)
• A number of basic stems ending in "c" or "s" replace these consonants with "v" and "y" respectively in the negative tense, as in the imperative.
Example: ces (do) + a → ceVyya (will not do)
piluc (call) + a → piluva (will not call)

4.8 The Durative
The durative verb is not a regular finite verb like the other finite forms discussed earlier. It is a compound verb, as at least two verb roots are involved in its construction (the main verb and "un").
Telugu does not distinguish present, past and perfect continuous tenses as English does; the distinction is shown only by an adverb of time or by the context of discourse. In the absence of time-specifying clues, the durative verb carries the present continuous meaning.
The durative verb is formed by adding to the basic verb stem the durative suffix "w/t" followed by "un" in its finite form.
The principles for the formation of the durative finite verb are as follows:
• For verb stems ending in a short vowel followed by "n", the durative suffix is "t". The durative verb is "basic stem + t + finite form of un".
Example: vin + t + un → vin-t-un (hearing)
• In all other cases the durative suffix is "w". The durative verb is "basic stem + w + finite form of un".
Example: cus + w + un → cus-w-un (seeing)

5. Morphology Engine
In the current work the verb root "piluc" of the example in Figure 1 becomes "piliciMxi" after going through a few steps. The steps the verb root undergoes to get the required inflection are as follows:
• Identification of the morphophonemic group based on the tense mode.


• Identification of the verb inflectional class of the given verb.
• Extraction of the phonetic alternations based on the morphophonemic group and the verb inflection class.
• Extraction of the tense mode suffix.
• Extraction of the personal suffix based on the person, gender and number of the subject.
• Formation of the final inflected verb by concatenating the extracted constituents to the verb root.

5.1 Identification of Morphophonemic Group
There are three morphophonemic groups in Telugu, namely A, B and C. In the current work group A is divided into three groups, namely A123, A4 and A5, because the phonetic alternations of certain verb classes differ between these subgroups. Group C is likewise divided into two groups, namely C1-8 and C9, for the same reason. Each tense mode in Telugu belongs to one morphophonemic group. Table 2 lists the tense modes and the morphophonemic group each belongs to.

Table 2. Tense modes and their morphophonemic groups
Tense mode | Morphophonemic Group
Present Participle | A123
Durative | A123
Habitual Future | A123
Conditional | A4
Hortative | A5
Past Participle | B
Past Tense | B
Past Verbal Adjective | B
Concessive | B
Future Habitual Verbal Adjective | B
Conditional | B
Infinitive | C1-8
Abusive | C1-8
Negative Tense | C1-8
Negative Participle | C1-8
Negative Verbal Adjective | C1-8
Obligative | C1-8
Negative Imperative | C1-8
Imperative Plural | C1-8
Imperative Singular | C9

In the example of Figure 1 the tense mode for the verb is specified as "pasttense". The morphophonemic group for the past tense is identified as group B by looking at Table 2. In the current work Table 2 is an XML file named "tensemodeidentification.xml" and is used for identifying the morphophonemic group.

5.2 Identification of Verb Inflection Class
Telugu verbs are divided into six classes17, of which classes I, II, III, IV and V are conjugations of weak (regular) verbs and Class VI consists of strong (irregular) verbs.

Class I consists of four subclasses, which are as follows:
• Verb bases with three syllables of the form (C1)V1C2V2C3V3 (C stands for consonant, V stands for vowel, and a consonant inside ( ) is optional) in which "u" occurs as V2 and V3, and C2 is not "c" or "s".
Example: wodugu (to wear), kuduru (to be settled)
• Disyllabic bases of the form (C1)V1C2V2 or (C1)V1C2C3V2.
Example: padu (to fall), ekku (to climb)
• Monosyllabic bases of the form (C1)V1C2 where "n" or "l" occurs as the final consonant.
Example: nAn (to become wet), cAl (to be sufficient)
• Disyllabic bases of the (C1)V1C2V2C3 type where the final consonant is "l" and the second vowel is "u".
Example: kadul (to move)

Class II consists of two subclasses, which are as follows:
• Disyllabic bases of the (C1)V1C2V2C3 type in which the final consonant is "c" or "s" and the second vowel is "u".
Example: piluc- (to call), wadus- (to get wet)
• Monosyllabic bases of the (C1)V1C2 type in which the final consonant is "c" or "s".
Example: wis- (to take out), rac- (to smear)

In the implementation of the morphology engine the Class II verbs are further divided into subclasses. Subclass 'a' is divided into ClassIIa1 and ClassIIa2, where ClassIIa1 has "c" as the final consonant and ClassIIa2 has "s" as the final consonant. Subclass 'b' is likewise divided into two subclasses, namely ClassIIb1 and ClassIIb2.

Class III consists of three subclasses, which are defined as follows:
• A few monosyllabic bases of the form (C1)V1C2 with final "c" belong to this subclass.


Example: cAc- (to stretch out)
• A few stems in final "uc" or "c" belong to this subclass.
Example: kAluc- (to set fire)
• A few stems with final "iMc" belong to this subclass.
Example: wittiMc- (to cause to scold)

Class IV consists of two subclasses, which are defined as follows:
• Monosyllabic bases of the type (C1)V1C2C3- which end in final "tt" or in final "pp" belong to this subclass.
Example: kott- (to beat), ceVpp- (to tell or speak)
• Two monosyllabic bases of the same type, one in final "nn" and another in final "lYlY", belong to this subclass.
Example: wann- (to kick), veVlYlY (to go)

In the current work subclass ClassIVa is further subdivided into ClassIVa1 and ClassIVa2. Each monosyllabic base in ClassIVb is treated as a separate class, so ClassIVb becomes ClassIVb1 and ClassIVb2.

Class V consists of seven monosyllabic bases of the type (C1)V1C2 in final "n". The seven bases are an- (to say), kan-1 (to see), kan-2 (to bring forth), kon- (to buy), win- (to eat), vin- (to hear).

Class VI consists of irregular bases. The irregular bases that belong to this class are icc- (to give), cacc- (to die), weVcc- (to bring), vacc- (to come), av- (to become), pO (to go), cUc- (to see), lec- (to rise), le (to be), pax- (to go, depart).

The verb "piluc" in the example of Figure 1 is of the form (C1)V1C2V2C3, a disyllabic base where the final consonant is "c" and the second vowel is "u", so it belongs to Class IIa. Figure 2 is a diagrammatic representation of the finite automaton created by the computational model for Class IIa. State 0 is the start state of the automaton and state 4 is the final state. The first consonant C1 is optional, which the automaton represents as a transition from state 0 back to state 0. In the example of Figure 1 the first consonant is "p", which the automaton takes as input while staying in state 0. The automaton then takes V1, which is "i", as input and goes to state 1; at state 1 it takes C2, which is "l", and goes to state 2; at state 2 it takes V2, which is "u", and goes to state 3; and finally at state 3 it takes C3, which is "c", and goes to the final state 4.

Figure 2. Finite Automata for Class IIa.

5.3 Extraction of Phonetic Alternations
The extraction of phonetic alternations is done based on the verb class and the morphophonemic group of the specified tense mode. Table 3 shows the phonetic alternations each verb class goes through in the process of generating the final inflected form of the verb. For the verb "piluc" in the example of Figure 1, Table 3 gives the value "pil-ic" for class IIa1 under group B (the group to which the tense mode "pasttense" belongs).
In the current work Table 3 is implemented in two steps:
• The required deletions and replacements are performed on the verb root by the programming logic.
• The required alternations to be added are extracted from the XML file.
The first step is the Java programming logic which, along with identifying the verb class, performs the required deletion to form the variant of the verb that is the final constituent of the stem in the inflected verb. The fragment of the Java code which does this is presented in Figure 3.

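import java.util.regex.Pattern;

// Class IIa1: optional C1, then V1, C2 and a final "uc" (e.g. "piluc"); vowels in WX notation
if (Pattern.matches("[^aAiIuUeEoOM]?[aAiIuUeEoOM][^aAiIuUeEoOM]uc", verb)) {
    vclass = "classIIa1";
    verb = verb.substring(0, verb.length() - 2); // delete the final "uc": piluc -> pil
}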

Figure 3. Fragment of Code.
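The automaton described for Figure 2 can also be written out by hand, which makes each state transition explicit. The Java sketch below is illustrative only and is not the authors' code: the class name `ClassIIa1Automaton`, its method `accepts` and the state names are hypothetical. It assumes, as the character class in Figure 3 does, that the WX symbols a, A, i, I, u, U, e, E, o, O and M count as vowels and every other symbol as a consonant. Unlike Figure 2, which draws the optional C1 as a transition from state 0 back to state 0, the sketch gives C1 a state of its own, so that it accepts exactly the strings matched by the Figure 3 regular expression (at most one initial consonant).

```java
import java.util.regex.Pattern;

/** Hypothetical hand-coded automaton for Class IIa1 lemmas of the form (C1)V1C2uc. */
final class ClassIIa1Automaton {
    // Vowel symbols as listed in the Figure 3 character class (WX notation).
    private static final String VOWELS = "aAiIuUeEoOM";

    // The Figure 3 expression, kept for comparison with the hand-coded automaton.
    static final Pattern CLASS_IIA1 =
            Pattern.compile("[^aAiIuUeEoOM]?[aAiIuUeEoOM][^aAiIuUeEoOM]uc");

    // States; numbered differently from Figure 2 because C1 has its own state here.
    private static final int START = 0, AFTER_C1 = 1, AFTER_V1 = 2,
            AFTER_C2 = 3, AFTER_V2 = 4, FINAL = 5, DEAD = -1;

    private static boolean isVowel(char ch) {
        return VOWELS.indexOf(ch) >= 0;
    }

    /** Returns true if the whole lemma drives the automaton into the final state. */
    static boolean accepts(String lemma) {
        int state = START;
        for (int i = 0; i < lemma.length() && state != DEAD; i++) {
            char ch = lemma.charAt(i);
            switch (state) {
                case START:    state = isVowel(ch) ? AFTER_V1 : AFTER_C1; break; // C1 is optional
                case AFTER_C1: state = isVowel(ch) ? AFTER_V1 : DEAD;     break; // V1 after C1
                case AFTER_V1: state = isVowel(ch) ? DEAD : AFTER_C2;     break; // C2
                case AFTER_C2: state = (ch == 'u') ? AFTER_V2 : DEAD;     break; // V2 must be "u"
                case AFTER_V2: state = (ch == 'c') ? FINAL : DEAD;        break; // C3 must be "c"
                default:       state = DEAD;                                     // no input allowed after FINAL
            }
        }
        return state == FINAL;
    }
}
```

For example, `ClassIIa1Automaton.accepts("piluc")` returns true and `ClassIIa1Automaton.accepts("wadus")` returns false (its final consonant is "s", not "c"), in agreement with `Pattern.matches` on the Figure 3 expression. A Class IIIb stem such as "kAluc" is accepted as well, since it has the same shape, so the automaton on its own cannot separate those two classes.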


Table 3. Phonetic Alternations for the Verb Classes before the tense mode suffixes
(For each class, the first row gives the canonical form of the basic alternant and the alternation per group; the second row gives an example word.)
Class | Canonical form / Example word | Group A123 | Group A4 | Group A5 | Group B | Group C1-8 | Group C9
Ia | (C)VCuCu | uCu | uCu | uCu | iC | aC | uC
 | woVdugu | woVd-ugu | woVd-ugu | woVd-ugu | woVd-ig | woVd-ag | woVd-ug
Ib | (C)VC(C)u | u | u | u | - | - | -
 | pAdu | pAd-u | pAd-u | pAd-u | pAd | pAd | pAd
Ic | (C)Vn/l | - | - | - | - | - | -
 | nAn- | nAn | nAn | nAn | nAn | nAn | nAn
Id | (C)VCul | ul | il | ul | il | al | ul
 | kaxul- | kax-ul | kax-il | kax-ul | kax-il | kax-al | kax-ul
IIa1 | (C)VCuc | us | is | ux | ic | av | -
 | piluc | pil-us | pil-is | pil-ux | pil-ic | pil-av | pil-uc
IIa2 | (C)VCus | us | is | ux | is | av | -
 | wadus | wad-us | wad-is | wad-ux | wad-is | wad-av | wad-us
IIb1 | (C)Vs | s | s | x | s | (V)yy | (V)yy
 | wIs | wI-s | wI-s | wI-x | wI-s | wi-yy | wi-yy
IIb2 | (C)Vc | s | s | x | c | y | y
 | vAc | vA-s | vA-s | vA-x | vA-c | vA-y | vA-y
IIIa | (C)Vc | s | s | x | c | c | c
 | kAc | kA-s | kA-s | kA-x | kA-c | kA-c | kA-c
IIIb | (C)VCuc | us | is | ux | c | c | c
 | kAluc | kAl-us | kAl-is | kAl-ux | kAlu-c | kAlu-c | kAlu-c
IIIc | .*iMc | is | is | ix | iMc | iMc | iMc
 | wittiMc | witt-is | witt-is | witt-ix | witt-iMc | witt-iMc | witt-iMc
IVa1 | (C)Vtt | du | di | da | tt | tt | tt
 | koVtt | koV-du | koV-di | koV-da | koV-tt | koV-tt | koV-tt
IVa2 | (C)Vpp | bu | bi | ba | pp | pp | pp
 | ceVpp | ceV-bu | ceV-bi | ceV-ba | ceV-pp | ceV-pp | ceV-pp
IVb1 | (C)Vnn | M | M | M | nn | nn | nn
 | wann | wa-M | wa-M | wa-M | wa-nn | wa-nn | wa-nn
IVb2 | (C)VlYlY | lY | lY | lY | lYlY | lYlY | lYlY
 | veVlYlY | veV-lY | veV-lY | veV-lY | veV-lYlY | veV-lYlY | veV-lYlY
V | (C)Vn | n | n | n | n | n | n
 | vin | vi-n | vi-n | vi-n | vi-n | vi-n | vi-n
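Table 3, together with the personal suffixes of Table 10 and the concatenation rule of Section 5.6, lends itself to a table-driven implementation. The sketch below is a minimal illustration under stated assumptions, not the authors' engine (which keeps the alternations in "palterations.xml"): it covers only rows IIa1 and IIa2, reads their alternants off the example words "piluc" and "wadus", and takes the morphophonemic group (Table 2) and the tense-mode suffix (Tables 4 to 9) from the caller, because the choice among alternants such as "e", "iM", "nA" and "dA" depends on rules the sketch does not model. The names `TeluguVerbSketch`, `Row`, `classify`, `inflect` and the PNG keys such as "3sg.nonmasc" are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the Section 5 steps for two rows of Table 3 (IIa1 and IIa2).
 * Not the authors' code: all class, field and key names here are illustrative.
 */
final class TeluguVerbSketch {

    /** One row of Table 3: a recogniser, letters to delete, and the alternant per group. */
    static final class Row {
        final String name;
        final Pattern pattern;
        final int delete;
        final Map<String, String> alternants;

        Row(String name, String regex, int delete, Map<String, String> alternants) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.delete = delete;
            this.alternants = alternants;
        }
    }

    // Alternants read off the example rows of Table 3 ("piluc" and "wadus").
    static final List<Row> TABLE3 = List.of(
            new Row("classIIa1", "[^aAiIuUeEoOM]?[aAiIuUeEoOM][^aAiIuUeEoOM]uc", 2,
                    Map.of("A123", "us", "A4", "is", "A5", "ux", "B", "ic", "C1-8", "av", "C9", "uc")),
            new Row("classIIa2", "[^aAiIuUeEoOM]?[aAiIuUeEoOM][^aAiIuUeEoOM]us", 2,
                    Map.of("A123", "us", "A4", "is", "A5", "ux", "B", "is", "C1-8", "av", "C9", "us")));

    // Personal suffixes from Table 10, keyed by hypothetical person/number/gender labels.
    static final Map<String, String> PERSONAL = Map.of(
            "1sg", "nu", "1pl", "mu", "2sg", "vu", "2pl", "ru",
            "3sg.masc", "du", "3sg.nonmasc", "xi", "3pl.human", "ru", "3pl.nonhuman", "yi");

    /** Step 2: returns the first Table 3 row whose pattern matches the whole lemma. */
    static Row classify(String lemma) {
        for (Row row : TABLE3) {
            if (row.pattern.matcher(lemma).matches()) {
                return row;
            }
        }
        throw new IllegalArgumentException("no class in this sketch matches " + lemma);
    }

    /** Steps 3, 5 and 6: variant + alternation + tense mode suffix (given) + personal suffix. */
    static String inflect(String lemma, String group, String tenseModeSuffix, String png) {
        Row row = classify(lemma);
        String variant = lemma.substring(0, lemma.length() - row.delete); // "piluc" -> "pil"
        String alternant = row.alternants.get(group);                      // group B -> "ic"
        String personal = PERSONAL.get(png);                               // "3sg.nonmasc" -> "xi"
        if (alternant == null || personal == null) {
            throw new IllegalArgumentException("unknown group or PNG: " + group + ", " + png);
        }
        return variant + alternant + tenseModeSuffix + personal;
    }
}
```

With the values from the running example, `TeluguVerbSketch.inflect("piluc", "B", "iM", "3sg.nonmasc")` returns "piliciMxi", i.e. pil + ic + iM + xi. Keeping each Table 3 row as data (a pattern, a deletion count and a map of alternants) mirrors the paper's split between programming logic and XML files; a fuller version would also need to test the few Class IIIb stems such as "kAluc" before IIa1, since both have the shape (C)VCuc.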

The fragment of code presented in Figure 3 deletes the last two letters of the example word "piluc" as follows:
piluc → pil
In step 2 the alternation "ic" is extracted from the XML file "palterations.xml".

5.4 Extraction of Tense Mode Suffix
Tense mode suffixes are the suffixes which are agglutinated to the verb based on the tense mode.

Morphophonemic Criteria
The tense-mode suffixes in Telugu are based on the morphophonemic groups A, B and C discussed in Section 5.1.

Group A
Group A suffixes begin with consonants (w, t, x, d).

Group A123
The suffixes belonging to Group A123 are presented in Table 4.

Table 4. Suffix Alternant Table for group A123
Grammatical Name | Suffix Alternant
Durative Participle | wu/tu
Durative | w/t
Habitual Future | wA/tA/wun/tun

Group A4
The suffixes belonging to Group A4 are presented in Table 5.


Table 5. Suffix Alternant Table for group A4
Grammatical Name | Suffix Alternant
Conditional | we/te

Group A5
The suffixes belonging to Group A5 are presented in Table 6.

Table 6. Suffix Alternant Table for group A5
Grammatical Name | Suffix Alternant
Hortative | xA

Group B
Group B suffixes begin with a front vowel (i, eV, e). The suffixes belonging to Group B are presented in Table 7.

Table 7. Suffix Alternant Table for group B
Grammatical Name | Suffix Alternant
Past Participle | i
Past Tense | e, iM, nA, dA
Past Verbal Adjective | ina/na
Concessive | inA/nA
Future Habitual Verbal Adjective | E
Conditional | iwe

Group C
Group C suffixes begin with a back vowel (a, A, u).

Group C1-8
The suffixes belonging to Group C1-8 are presented in Table 8.

Table 8. Suffix Alternant Table for group C1-8
Grammatical Name | Suffix Alternant
Infinitive | a/an/-
Abusive | a/nu
Negative Tense | a/-
Negative Participle | aka/ka, akunda/kunda
Negative Verbal Adjective | ani/ni
Obligative | ali
Negative Imperative | aku/ku
Imperative Plural | aMdi/ndi

Group C9
The suffixes belonging to Group C9 are presented in Table 9.

Table 9. Suffix Alternant Table for group C9
Grammatical Name | Suffix Alternant
Imperative Singular | u/i/-

For the verb "piluc" in the example of Figure 1, the tense mode "pasttense" belongs to Group B and the subject is "nonmasculine", so the tense mode suffix is "iM".

5.5 Extraction of Personal Suffix
Telugu verbs inflect to encode the gender, number and person of the subject. In the current work the morphology engine gets the attributes of the subject and uses them to agglutinate the gender, number and person suffix and the tense mode suffix to the verb. The eight personal suffixes of the finite verb for the different persons and numbers are listed in Table 10.

Table 10. Personal Suffixes
Person | Singular | Plural
1st person | -nu | -mu
2nd person | -vu | -ru
3rd person | -du (masculine) | -ru (human)
3rd person | -xi (non-masculine) | -yi (non-human)

For the verb "piluc" in the example of Figure 1, the subject "sIwa" is non-masculine, singular and 3rd person, so the personal suffix is "xi" according to Table 10.

5.6 Formation of the Final Inflected Verb
The final inflected verb is formed by concatenating all the strings obtained in Sections 5.3 to 5.5:
Final verb → verb + phonetic alternation + tense mode suffix + personal suffix
For the verb "piluc" in the example of Figure 1:
Final verb → pil + ic + iM + xi, which is piliciMxi.

6. Evaluation
We report an evaluation of the accuracy of our Telugu verb morphology engine with respect to the Telugu verb list downloaded from the Telugu verbs category of Wiktionary at https://en.wiktionary.org/wiki/Category:Telugu_verbs. We ran the verbs in this list through the surface realizer rather than through the morphology


engine separately, because we wanted to test them with various alternatives of the subject with respect to person, number, and gender.

A total of 503 verbs were downloaded and evaluated. The verbs were tested for the habitual future, durative, and past tenses. The most important part of the evaluation was categorizing the verbs into the verb classes defined in our reference grammar book18. The results of the evaluation are given in Table 11.

Table 11. Evaluation Results
Class | No. of verbs identified in each class
Ia | 33
Ib | 118
Ic | 2
Id | 1
II a1 | 10
II a2 | 0
II b1 | 24
II b2 | 0
III a | 4
III b | 6
III c | 141
IV a1 | 26
IV a2 | 3
IV b1 | 4
IV b2 | 0
V | 38
VI | 8

The results show that 418 verbs (the total of Table 11) were identified as belonging to one of the verb classes, and the engine generated their verb forms without any errors. The remaining 85 of the 503 verbs were not recognized as belonging to any verb class covered by the current work. Among these 85 are words such as “pilupu”, which are not considered verbs according to our grammar reference. Some of the words end in “agu”, which means “to become”, but our grammar reference considers only “avu” as the verb meaning “to become”. We did not treat the two as equivalent; had we done so, the number of failed verbs would have been reduced by 20. We intend to use these evaluation results to drive the development of the morphology engine and extend its coverage.

7. Conclusion
In this paper we described a morphology engine for Telugu verbs based on finite state techniques. We intend to extend the coverage of the morphology engine based on the utility of the verbs in NLG applications. We also intend to explore the possibility of creating a generalized morphology engine that can generate the intended word forms for all word classes in the Telugu language.

8. References
1. Rao GUM, Kulkarni PA. Computer Applications in Indian Languages. Hyderabad: The Centre for Distance Education, University of Hyderabad; 2006.
2. Dokkara SRS, Penumathsa SV, Sripada SG. A Simple Surface Realization Engine for Telugu. Proceedings of the 15th European Workshop on Natural Language Generation (ENLG); 2015 Sep; Brighton. p. 1-8.
3. Minnen G, Carroll J, Pearce D. Robust, applied morphological generation. Proceedings of the 1st International Natural Language Generation Conference; 2000; Mitzpe Ramon, Israel. p. 201-8.
4. Reiter E, Dale R. Building Natural Language Generation Systems. New York: Cambridge University Press; 2000.
5. Gatt A, Reiter E. SimpleNLG: A realization engine for practical applications. Proceedings of ENLG; 2009. p. 90-3.
6. Hockett CF. Two models of grammatical description. Word. 1954; 10:210-34.
7. Stump GT. Inflectional Morphology: A Theory of Paradigm Structure. Cambridge: Cambridge University Press; 2001.
8. Beesley KR, Karttunen L. Finite State Morphology. Palo Alto, CA: CSLI Publications; 2003.
9. Roark B, Sproat R. Computational Approaches to Morphology and Syntax. Oxford: Oxford University Press; 2007.
10. Karttunen L. Computing with Realizational Morphology. 2003.
11. Karttunen L, Beesley KR. Twenty-Five Years of Finite-State Morphology. Inquiries into Words, Constraints and Contexts; 2005.
12. Antony PJ, Soman KP. Computational morphology and natural language parsing for Indian languages: a literature survey. International Journal of Computer Science and Engineering Technology. 2012.
13. Goyal V, Lehal GS. Hindi to Punjabi Machine Translation System. Proceedings of the ACL-HLT System Demonstrations; 2011; Portland, Oregon, USA. p. 1-6.
14. Sri Badri Narayanan R, Saravanan S, Soman KP. Data Driven Suffix List and Concatenation Algorithm for Telugu Morphological Generator. International Journal of Engineering Science and Technology (IJEST). 2009.


15. Ganapathiraju M, Levin L. TelMore: Morphological Generator for Telugu Nouns and Verbs. Proceedings of the Second International Conference on Universal Digital Library; 2006; Alexandria, Egypt. p. 17-19.
16. Bharati A, Chaitanya V, Sangal R. Natural Language Processing: A Paninian Perspective. New Delhi: Prentice-Hall of India; 1995.
17. Krishnamurti Bh. Telugu Verbal Bases: A Comparative and Descriptive Study. Berkeley and Los Angeles: University of California Press; 1961.
18. Krishnamurti Bh, Gwynn JPL. A Grammar of Modern Telugu. Oxford University Press; 1985.
