Indian Journal of Science and Technology, Vol 10(13), DOI: 10.17485/ijst/2017/v10i13/110448, April 2017 ISSN (Online) : 0974-5645
Abstract
Objectives: The current work is a morphological generation engine that generates the required inflected Telugu verb form from an input specification consisting of lexicalized grammatical constituents and associated features.
Methods/Statistical Analysis: The method employed in this paper uses finite state techniques to develop a computational model for the morphological generation of verbs in Telugu. The current work is a module of a surface realization engine for Telugu, a Java application developed to generate well-formed Telugu sentences. Test samples were taken from grammar textbooks for the Telugu language and tested thoroughly with various alternatives of the subject with respect to person, number and gender.
Findings: The evaluation was performed on a small data set because larger authentic data sets were not available online. Hence the findings cannot be generalized, but the results show that the verbs are not evenly distributed across the classes. The results also show that no verbs were found in some of the classes, which suggests that verbs belonging to those classes are not regularly used. The findings cannot be compared with other published results because very little previous work has been done in this area for Telugu. The evaluation suggests that, rather than aiming for complete coverage of verbs, it is better to extend coverage based on utility in NLG systems.
Application/Improvements: The current work has applications in general purpose surface realization engines and machine translation systems. We intend to create a generalized morphology engine that generates the required word form for Telugu words.
classes of verbs. Similarly, the morphology of nouns involves defining morphology for seven different classes of nouns with respect to the grammatical feature number. We therefore have two separate morphological engines, one for verbs and one for nouns and pronouns. In the implementation, instead of using tools like Flex or JFlex, we programmed our morphological engine in Java using the regular expression package. As mentioned earlier, in this paper we describe the morphological engine for verbs.
The process of verb morphology depends on the way in which the verbs are classified. The linguistic classification of Telugu verbs into a small number of conjugation types is based on the morphophonemic changes the verb stems undergo when inflected with tense-mode suffixes. The model of analysis determines the number of types into which the verbs can be classified.
In the current work the verb morphological generator does not have an explicit lexicon or word list. Instead, it has a computational model based on finite state techniques that classifies all verbs into a few regular classes, plus a very small list of words for the irregular class. The suffixes to be added to the verbs are maintained in separate XML files and are concatenated to the variants of the verb roots to form the final inflected form.
Morphology has been well studied by both theoretical6,7 and computational linguists8,9. From a theoretical perspective, the structure of words is explained by the following three models:
Item and Arrangement model: a morpheme-based approach in which word forms are analysed as arrangements of morphemes. In this model a morpheme is treated as the minimal meaningful unit of a language, and words are treated as concatenations of morphemes.
Item and Process model: a lexeme-based approach in which a word form is assumed to be the result of applying rules that alter a stem to produce a new one. In this model, inflectional, derivational and compounding rules are applied to a stem to obtain the required word form.
Word and Paradigm model: a word-based approach that states generalizations holding between the different forms of inflectional paradigms.
From a computational perspective, though, the three theoretical models described above have been shown to offer no significant computational advantage over the finite state approaches that have been widely applied to building morphological analysers (MAs) and morphological generators (MGs) for a diverse range of languages3,8-11. Therefore, in the current work we apply finite state techniques to Telugu morphology. Among Indian languages, the highest number of morphology tools has been reported for Tamil12. According to that survey, a wide range of approaches exists, from corpus-based methods through suffix stripping to finite state techniques. A database approach that stores all the word forms in a relational database is described in13. For Telugu, a word and paradigm based morphological analyser and generator is described in1, an item and arrangement based morphological generator in14, and a rule-based (item and process based) morphological generator in15.

2. Input Specification
The verb morphology engine in the current work is part of a surface realization engine that is responsible for the automatic generation of grammatically well-formed Telugu sentences. The input to the surface realization engine is an XML file that contains all the grammatical information required at both the sentence level and the word level. Figure 1 shows an example XML specification corresponding to the Telugu sentence (1).
sIwa rAmudini piliciMxi. (Sita called Rama.) (1)
The notation used in Figure 1 to write Telugu words in English letters is called WX notation16. It is a popular transliteration scheme for representing Indian languages in the ASCII character set and is widely used in Natural Language Processing in India. In WX notation, lowercase letters are used for unaspirated consonants and short vowels, while uppercase letters are used for aspirated consonants and long vowels. The retroflex voiced and voiceless consonants are mapped to 't, T, d and D', and the dentals are mapped to 'w, W, x and X'. Hence the name of the scheme, "WX", which refers to this idiosyncratic mapping.

3. Morphological Generation Process
The current work, as mentioned in Section 1, is a combination of a computational model based on finite state techniques and XML files. The computational model is a Java application which uses the "java.util.regex"
2 Vol 10 (13) | April 2017 | www.indjst.org Indian Journal of Science and Technology
Sasi Raja Sekhar Dokkara, Suresh Varma Penumathsa and Somayajulu G. Sripada
package. The input to the computational model is the verb lemma, the tense mode of the verb, the PNG (person, number and gender) of the subject and the case marker of the subject. The "Pattern" class of the "regex" package has a static method "matches", which tests whether a verb lemma matches a given regular expression; each such regular expression describes a finite state automaton, and the model uses these matches to identify the class to which a given verb lemma belongs. The computational model also computes the final constituent of the stem in the inflected verb and finally concatenates it with the required suffixes extracted from the XML files.

4. Verb Forms
The input for the verb in the example XML specification of Figure 1 is as follows:
<head pos="verb" tensemode="pasttense">piluc</head>
The first attribute, "pos", stands for part of speech, and the second attribute is "tensemode".
Verbs in Indian languages inflect for tense, aspect, modality (mode), and PNG (person, number and gender). In most languages verbs carry tense, aspect and modality as co-occurring markers, whereas in Telugu aspect and modality are packed into a single verbal inflection, referred to as "tensemode" in the current work. There are a total of 18 verb forms of importance in Telugu, including both finite and non-finite forms. Our morphological engine can generate all of these verb forms, but only the finite forms are used by the surface realization engine, and therefore we discuss the finite forms in detail.

4.1 The Imperative
Imperative verbs are used to express a command or a request. The imperative verb expresses a command in the singular and a request in the plural.
The imperative forms of the verb are used only when the first person singular addresses the second person, either singular or plural. Therefore, the imperative has two suffixes, one for each number of the second person. In the negative imperative, the second person suffix is added to the verb root + "ak" (the negative imperative suffix). The imperative suffixes are shown in Table 1.

Table 1. Imperative suffixes
Form | II Person Singular | II Person Plural
Affirmative | u (in some cases "i") | aMdi
Negative | ak-u | ak-aMdi

Principles for the formation of the imperative verbs
• The basic verb stems undergo the same changes as in the negative tense when the imperative suffixes are added (see section 4.7).
• The rules of stem-final vowel loss and harmony (i.e. the change of medial "u" to "a" when followed by "a") apply to imperative verbs.
Verb Morphological Generator for Telugu
Example: pAdu (sing) + aMdi → pAdaMdi (request to sing)
• Stems ending in "s" preceded by a long vowel change "s" to "y" in the imperative mood. These stems optionally add the suffix "i" instead of "u" in the singular. When the "i" suffix is added, the stem vowel is optionally shortened and "y" becomes "yy".
Example:
ces (do) + u → ceVyyi (command to do)
ces (do) + aMdi → ceVyyaMdi (request to do)
Exception: when the stem vowel is "A", it is not shortened.
Example:
rAs (write) + u → rAyi (command to write)
• In basic stems of two syllables ending in "c" or "s", the final consonant is replaced by "v" before the imperative suffix.
Example:
piluc (call) + u → piluvu (command to call)
kalus (meet) + aMdi → kalavaMdi (request to meet)
• When the stem variant ends in a long vowel, the initial vowel of the imperative suffix is dropped.
Example:
rA (come) + u → rA (command to come)
• One irregular verb in the imperative is "pax-a" (go). The final "a" here is treated as the imperative suffix.

4.2 The Abusive
Many verbs cannot occur in this mood because of semantic restrictions. A few verbs like "kAlu" (to burn), "kUlu" (to fall), "cAvu" (to die), "pagulu" (to break), etc., occur in this mood. Some example sentences using abusive verb forms are as follows:
nI illu kUla (May your house fall)
nI kadupu kAla (May your womb (children) burn)
nI mokaM pagala (May your face break)

4.3 The Obligative
The obligative is formed by adding the finite or perfective form of the defective verb "vAl" to the infinitive of a main verb. The finite form of this verb in the future habitual tense is "vAli" (must). Some example sentences using obligative verb forms are as follows:
nenu iMtiki veVlYlYAli (I need to go home)
mIru mA Uru rAvAli (You should come to our town)
The perfective participle of "vAl" is "vAlsi", which is inflected only in the non-masculine singular.
Example: mIru gudiki rAvAlsiMxi (You must have come to the temple)
The obligative verb does not agree with the subject in person, gender or number. It always occurs in the third person non-masculine singular or without any personal suffix.

4.4 The Future Habitual
The future habitual tense in Telugu can express an action or state that will take place in the future, or an action or state that is habitual. The sentence "nenu annaM wiMtAnu" can mean either 'I will eat food' or 'I eat food'.
Principles for the formation of the future habitual tense
• The basic tense suffix for the future habitual tense is "wA/wun".
• Verb stems like "ammu" (sell) and "adugu" (ask) occur unchanged before the tense suffix.
Example: ammu (sell) + wA → ammuwA (will sell)
• In basic stems ending in "s" or a long vowel, the tense suffix is added directly.
Example: kalus (meet) + wA → kaluswA (will meet)
• In basic stems ending in "n", the tense suffix changes to "tA/tun".
Example: win + tA → wiMtA
• Single-syllable stems ending in "tt" (koVttu, beat) and "pp" (ceVppu, tell) change to "da" (kodawA, will beat) and "bu" (cebuwA, will tell) respectively before the tense suffix "wA", and to "du" (koduwuMtA, beats) and "bu" (cebuwuMtA, tells) respectively before the tense suffix "wuM".
• Stems ending in "c", "cc" or "Mc" change these elements to "s" before the tense suffix.
Example: piluc (call) + wA → piluswA (will call)

4.5 The Past
In Telugu the past tense corresponds to two past tenses in English; for example, "vaccAnu" represents both 'I came' and 'I have come'.
Principles for the formation of the past tense
• The tense suffix "e/iM" and the personal suffix are added to the verb stem to form the past tense.
Example: piluc + iM → piliciM
• The stem-final "u" before the tense suffix "e/iM" is dropped as a result of sandhi.
Example: wodugu + e → wodigA
• A non-initial "u" in the stem becomes "i" when the past tense suffix is added.
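The two stem rules of the past tense (loss of the stem-final "u" and the change of a non-initial "u" to "i") are mechanical enough to sketch in a few lines of Java. This is a minimal sketch under stated assumptions, not the engine's implementation: the class and method names are hypothetical, "non-initial" is read as "after the first vowel of the stem" (a reading that agrees with the group B column of Table 3 for classes Ia, Ib, Id, IIa1 and IIa2), and the choice between the "e" and "iM" alternants of the tense suffix is left to the caller because the text does not state when each applies. The rules do not hold for every class; Table 3 shows, for example, that Class IIIb "kAluc" keeps its "u" as "kAlu-c".

```java
// Hypothetical sketch of the two past-tense stem principles of Section 4.5 (WX notation).
// Not the authors' code; "non-initial" is interpreted as "after the first vowel".
final class PastTenseSketch {

    // WX vowel letters; other symbols (e.g. the "V" in "woVdugu") are copied unchanged.
    private static final String VOWELS = "aAiIuUeEoO";

    /** Drops a stem-final "u", then turns every "u" after the first vowel into "i". */
    static String pastStem(String stem) {
        String s = stem;
        // Principle: the stem-final "u" is dropped before "e/iM" (sandhi).
        if (s.length() > 1 && s.endsWith("u")) {
            s = s.substring(0, s.length() - 1);
        }
        // Principle: a non-initial "u" becomes "i".
        StringBuilder out = new StringBuilder(s.length());
        boolean seenVowel = false;
        for (char ch : s.toCharArray()) {
            if (VOWELS.indexOf(ch) >= 0) {
                if (seenVowel && ch == 'u') {
                    out.append('i');
                    continue;
                }
                seenVowel = true;
            }
            out.append(ch);
        }
        return out.toString();
    }

    /** Past form = altered stem + tense suffix ("e" or "iM", chosen by the caller) + personal suffix. */
    static String pastForm(String stem, String tenseSuffix, String personalSuffix) {
        return pastStem(stem) + tenseSuffix + personalSuffix;
    }

    public static void main(String[] args) {
        System.out.println(pastStem("woVdugu"));           // woVdig
        System.out.println(pastForm("piluc", "iM", "xi")); // piliciMxi
    }
}
```

With these assumptions, pastStem("woVdugu") gives "woVdig" (Table 3, class Ia, group B), and pastForm("piluc", "iM", "xi") reproduces "piliciMxi", the form derived in Section 5.6.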
• Identification of the verb inflection class of the given verb.
• Extraction of the phonetic alternations based on the morphophonemic group and the verb inflection class.
• Extraction of the tense mode suffix.
• Extraction of the personal suffix based on the person, gender and number of the subject.
• Formation of the final inflected verb by concatenating the extracted constituents to the verb root.

5.1 Identification of Morphophonemic Group
There are three morphophonemic groups in Telugu, namely A, B and C. In the current work group A is divided into three groups, namely A123, A4 and A5, because the phonetic alternations of certain verb classes differ across these subgroups of group A. Group C is likewise divided into two groups, C1-8 and C9, for the same reason. Each tense mode in Telugu belongs to one morphophonemic group. Table 2 lists the tense modes and the morphophonemic group each belongs to.

Table 2. Tense modes and their morphophonemic groups
Tense mode | Morphophonemic group
Present Participle | A123
Durative | A123
Habitual Future | A123
Conditional | A4
Hortative | A5
Past Participle | B
Past Tense | B
Past Verbal Adjective | B
Concessive | B
Future Habitual Verbal Adjective | B
Conditional | B
Infinitive | C1-8
Abusive | C1-8
Negative Tense | C1-8
Negative Participle | C1-8
Negative Verbal Adjective | C1-8
Obligative | C1-8
Negative Imperative | C1-8
Imperative Plural | C1-8
Imperative Singular | C9

In the example of Figure 1 the tense mode for the verb is specified as "pasttense". The morphophonemic group for the past tense is identified as group B by looking up Table 2. In the current work Table 2 is an XML file named "tensemodeidentification.xml", which is used to identify the morphophonemic group.

5.2 Identification of Verb Inflection Class
Telugu verbs are divided into six classes17, of which classes I, II, III, IV and V are conjugations of weak (regular) verbs and Class VI consists of strong (irregular) verbs.
Class I consists of four subclasses, which are as follows:
• Verb bases with three syllables of the form (C1)V1C2V2C3V3 (C stands for consonant, V stands for vowel, and a consonant inside ( ) is optional) in which "u" occurs as V2 and V3, and C2 is not "c" or "s".
Example: wodugu (to wear), kuduru (to be settled)
• Disyllabic bases of the form (C1)V1C2V2 or (C1)V1C2C3V2.
Example: padu (to fall), ekku (to climb)
• Monosyllabic bases of the form (C1)V1C2 where "n" or "l" occurs as the final consonant.
Example: nAn (to become wet), cAl (to be sufficient)
• Disyllabic bases of the (C1)V1C2V2C3 type where the final consonant is "l" and the second vowel is "u".
Example: kadul (to move)
Class II consists of two subclasses, which are as follows:
• Disyllabic bases of the (C1)V1C2V2C3 type in which the final consonant is "c" or "s" and the second vowel is "u".
Example: piluc- (to call), wadus- (to get wet)
• Monosyllabic bases of the (C1)V1C2 type in which the final consonant is "c" or "s".
Example: wis- (to take out), rac- (to smear)
In the implementation of the morphology engine the Class II verbs are further divided into subclasses. Subclass 'a' is divided into ClassIIa1 and ClassIIa2, where ClassIIa1 has the final consonant "c" and ClassIIa2 has the final consonant "s". Subclass 'b' is likewise divided into two subclasses, ClassIIb1 and ClassIIb2 (final "s" and final "c" respectively, as in Table 3).
Class III consists of three subclasses, which are defined as follows:
• A few monosyllabic bases of the form (C1)V1C2 with final "c" belong to this subclass.
Example: cAc- (to stretch out)
• A few stems in final "uc" or "c" belong to this subclass.
Example: kAluc- (to set fire)
• A few stems with final "iMc" belong to this subclass.
Example: wittiMc- (to cause to scold)
Class IV consists of two subclasses, which are defined as follows:
• Monosyllabic bases of the type (C1)V1C2C3- which end in final "tt" or in final "pp" belong to this subclass.
Example: kott- (to beat), ceVpp- (to tell or speak)
• Two monosyllabic bases of the same type, one in final "nn" and the other in final "lYlY", belong to this subclass.
Example: wann- (to kick), veVlYlY- (to go)
In the current work the ClassIVa subclass is further subdivided into ClassIVa1 and ClassIVa2 (final "tt" and final "pp" respectively). Each monosyllabic base in ClassIVb is treated as a separate class, so ClassIVb becomes ClassIVb1 and ClassIVb2.
Class V consists of seven monosyllabic bases of type (C1)V1C2 in final "n". The seven bases are an- (to say), kan-1 (to see), kan-2 (to bring forth), kon- (to buy), win- (to eat), vin- (to hear).
Class VI consists of irregular bases. The irregular bases that belong to this class are icc- (to give), cacc- (to die), weVcc- (to bring), vacc- (to come), av- (to become), pO- (to go), cUc- (to see), lec- (to rise), le- (to be) and pax- (to go, depart).
The verb "piluc" in the example of Figure 1 is of the form (C1)V1C2V2C3, a disyllabic base where the final consonant is "c" and the second vowel is "u"; it therefore belongs to Class IIa. Figure 2 is a diagrammatic representation of the finite automaton created by the computational model for Class IIa. State 0 is the start state and state 4 is the final state. The first consonant C1 is optional, so reading it leaves the automaton in the same state. In the example of Figure 1 the first consonant is "p", which the automaton takes as input while staying in state 0. It then takes V1, which is "i", and goes to state 1; at state 1 it takes C2, which is "l", and goes to state 2; at state 2 it takes V2, which is "u", and goes to state 3; and finally at state 3 it takes C3, which is "c", and goes to the final state 4.
Figure 2. Finite automaton for Class IIa.

5.3 Extraction of Phonetic Alternations
The extraction of phonetic alternations is based on the verb class and the morphophonemic group of the specified tense mode. Table 3 shows the phonetic alternations each verb class goes through in the process of generating the final inflected form of the verb. For the verb "piluc" in the example of Figure 1, Table 3 gives the value "pil-ic" for class IIa1 under group B, the group to which the tense mode "pasttense" belongs.
In the current work Table 3 is implemented in two steps:
• The required deletions and replacements are performed on the verb root through the programming logic.
• The required alternations to be added are extracted from the XML file.
The first part is Java programming logic which, along with identifying the verb class, performs the deletion needed to form the variant of the verb that is the final constituent of the stem in the inflected verb. The fragment of the Java code that does this is presented in Figure 3.
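// Figure 3 (fragment): identify a Class IIa1 verb such as "piluc" and strip its final "uc".
// Requires: import java.util.regex.Pattern;  (vclass and verb are String variables)
if (Pattern.matches("[^aAiIuUeEoOM]?[aAiIuUeEoOM][^aAiIuUeEoOM]uc", verb)) {
    vclass = "classIIa1";
    verb = verb.substring(0, verb.length() - 2); // "piluc" -> "pil"
}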
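The automaton described for Figure 2 can also be written out explicitly instead of being left to java.util.regex. The sketch below is a hand-coded deterministic automaton for Class IIa; the class name, method names and the extra "C1 read" state are assumptions of this sketch, not part of the engine. The text describes the optional C1 as a loop back to state 0, but such a loop would accept any number of leading consonants, whereas the regular expression in Figure 3 allows at most one; the extra state keeps the automaton equivalent to that regular expression, which the sketch also evaluates with Pattern.matches for comparison. The vowel set is copied from Figure 3 (it includes "M"), and accepting either "c" or "s" as C3 separates ClassIIa1 from ClassIIa2.

```java
import java.util.regex.Pattern;

// Hypothetical explicit automaton for Class IIa: (C1)V1C2 "u" C3, with C3 = "c" or "s".
final class ClassIIaAutomaton {

    // Vowel set copied from the character classes in Figure 3 (it includes "M").
    private static final String VOWELS = "aAiIuUeEoOM";

    // Figure 3 regex with the final letter widened to "c" or "s" (Class IIa1 and IIa2).
    private static final String CLASS_IIA_REGEX =
            "[^" + VOWELS + "]?[" + VOWELS + "][^" + VOWELS + "]u[cs]";

    // States 0-4 follow Figure 2; C1_READ is the extra state added by this sketch.
    private static final int START = 0, V1_READ = 1, C2_READ = 2, V2_READ = 3, FINAL = 4;
    private static final int C1_READ = 5, DEAD = -1;

    private static boolean isVowel(char ch) {
        return VOWELS.indexOf(ch) >= 0;
    }

    /** Transition function of the automaton. */
    private static int step(int state, char ch) {
        switch (state) {
            case START:   return isVowel(ch) ? V1_READ : C1_READ;          // optional C1, else V1
            case C1_READ: return isVowel(ch) ? V1_READ : DEAD;             // at most one leading consonant
            case V1_READ: return isVowel(ch) ? DEAD : C2_READ;             // C2
            case C2_READ: return ch == 'u' ? V2_READ : DEAD;               // V2 must be "u"
            case V2_READ: return (ch == 'c' || ch == 's') ? FINAL : DEAD;  // C3
            default:      return DEAD;                                     // nothing may follow state 4
        }
    }

    /** Returns "classIIa1" (final "c"), "classIIa2" (final "s"), or null if rejected. */
    static String classify(String verb) {
        int state = START;
        for (int i = 0; i < verb.length() && state != DEAD; i++) {
            state = step(state, verb.charAt(i));
        }
        if (state != FINAL) {
            return null;
        }
        return verb.endsWith("c") ? "classIIa1" : "classIIa2";
    }

    /** Same language via java.util.regex, for comparison with the hand-coded automaton. */
    static boolean regexAccepts(String verb) {
        return Pattern.matches(CLASS_IIA_REGEX, verb);
    }

    public static void main(String[] args) {
        for (String verb : new String[] {"piluc", "wadus", "wodugu"}) {
            System.out.println(verb + " -> " + classify(verb) + ", regex: " + regexAccepts(verb));
        }
    }
}
```

Note that "kAluc", listed under Class IIIb, has the same shape and is accepted here too, so a complete classifier has to test the Class III stems before the Class II pattern; the text does not state in which order the engine tests the classes.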
Table 3. Phonetic Alternations for the Verb Classes before the tense mode suffixes
Class | Canonical form of the basic alternant / Example word | Group A123 | Group A4 | Group A5 | Group B | Group C1-8 | Group C9
Ia | (C)VCuCu | uCu | uCu | uCu | iC | aC | uC
 | woVdugu | woVd-ugu | woVd-ugu | woVd-ugu | woVd-ig | woVd-ag | woVd-ug
Ib | (C)VC(C)u | u | u | u | - | - | -
 | pAdu | pAd-u | pAd-u | pAd-u | pAd | pAd | pAd
Ic | (C)Vn/l | - | - | - | - | - | -
 | nAn- | nAn | nAn | nAn | nAn | nAn | nAn
Id | (C)VCul | ul | il | ul | il | al | ul
 | kaxul- | kax-ul | kax-il | kax-ul | kax-il | kax-al | kax-ul
IIa1 | (C)VCuc | us | is | ux | ic | av | -
 | piluc | pil-us | pil-is | pil-ux | pil-ic | pil-av | pil-uc
IIa2 | (C)VCus | us | is | ux | is | av | -
 | wadus | wad-us | wad-is | wad-ux | wad-is | wad-av | wad-us
IIb1 | (C)Vs | s | s | x | s | (V)yy | (V)yy
 | wIs | wI-s | wI-s | wI-x | wI-s | wi-yy | wi-yy
IIb2 | (C)Vc | s | s | x | c | y | y
 | vAc | vA-s | vA-s | vA-x | vA-c | vA-y | vA-y
IIIa | (C)Vc | s | s | x | c | c | c
 | kAc | kA-s | kA-s | kA-x | kA-c | kA-c | kA-c
IIIb | (C)VCuc | us | is | ux | c | c | c
 | kAluc | kAl-us | kAl-is | kAl-ux | kAlu-c | kAlu-c | kAlu-c
IIIc | .*iMc | is | is | ix | iMc | iMc | iMc
 | wittiMc | witt-is | witt-is | witt-ix | witt-iMc | witt-iMc | witt-iMc
IVa1 | (C)Vtt | du | di | da | tt | tt | tt
 | koVtt | koV-du | koV-di | koV-da | koV-tt | koV-tt | koV-tt
IVa2 | (C)Vpp | bu | bi | ba | pp | pp | pp
 | ceVpp | ceV-bu | ceV-bi | ceV-ba | ceV-pp | ceV-pp | ceV-pp
IVb1 | (C)Vnn | M | M | M | nn | nn | nn
 | wann | wa-M | wa-M | wa-M | wa-nn | wa-nn | wa-nn
IVb2 | (C)VlYlY | lY | lY | lY | lYlY | lYlY | lYlY
 | veVlYlY | veV-lY | veV-lY | veV-lY | veV-lYlY | veV-lYlY | veV-lYlY
V | (C)Vn | n | n | n | n | n | n
 | vin | vi-n | vi-n | vi-n | vi-n | vi-n | vi-n
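The lookup-and-concatenate steps listed before Section 5.1 amount to two table lookups followed by string concatenation. The sketch below holds a subset of Table 2 and the ClassIIa1/ClassIIa2 rows of Table 3 in Java maps (Java 9 or later, for Map.of). In the engine these tables come from XML files such as "tensemodeidentification.xml" whose schema is not given here, so the map layout, the class and method names, and every tense-mode key other than "pasttense" are hypothetical. Stripping the final two letters follows the Figure 3 fragment, and the tense suffix "iM" and personal suffix "xi" are the ones used for "piluc" in Section 5.6; they are passed in rather than looked up.

```java
import java.util.Map;

// Hypothetical sketch: in-memory stand-ins for Table 2 and the Class IIa rows of Table 3.
final class InflectionSketch {

    // Table 2 (subset): tense mode -> morphophonemic group.
    // Only "pasttense" appears in the text; the other keys are hypothetical spellings.
    private static final Map<String, String> TENSE_MODE_GROUP = Map.of(
            "pasttense", "B",
            "infinitive", "C1-8",
            "imperativesingular", "C9");

    // Table 3 (Class IIa rows): verb class -> (group -> alternant appended to the truncated root).
    // Group C9 is "-" (no change) in Table 3, so here it restores the stripped "uc"/"us".
    private static final Map<String, Map<String, String>> ALTERNANTS = Map.of(
            "classIIa1", Map.of("A123", "us", "A4", "is", "A5", "ux",
                                "B", "ic", "C1-8", "av", "C9", "uc"),
            "classIIa2", Map.of("A123", "us", "A4", "is", "A5", "ux",
                                "B", "is", "C1-8", "av", "C9", "us"));

    /** Truncated root (final two letters removed, as in Figure 3) plus the Table 3 alternant. */
    static String stemVariant(String verb, String verbClass, String tenseMode) {
        String group = TENSE_MODE_GROUP.get(tenseMode);
        Map<String, String> byGroup = ALTERNANTS.get(verbClass);
        if (group == null || byGroup == null || verb.length() < 2) {
            throw new IllegalArgumentException(
                    "not covered by this sketch: " + verb + "/" + verbClass + "/" + tenseMode);
        }
        String root = verb.substring(0, verb.length() - 2);
        return root + byGroup.get(group);
    }

    /** Section 5.6: root + phonetic alternation + tense mode suffix + personal suffix. */
    static String inflect(String verb, String verbClass, String tenseMode,
                          String tenseSuffix, String personalSuffix) {
        return stemVariant(verb, verbClass, tenseMode) + tenseSuffix + personalSuffix;
    }

    public static void main(String[] args) {
        // Suffixes taken from the "piluc" example in Section 5.6.
        System.out.println(inflect("piluc", "classIIa1", "pasttense", "iM", "xi")); // piliciMxi
    }
}
```

A uniform two-letter truncation only works for the Class IIa rows; other classes delete different amounts (Class Ib "pAdu" loses only its final "u", and Class IIIb "kAluc" only its final "c"), which is consistent with the text assigning the deletions to class-specific Java logic and only the added alternations to XML.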
Table 5. Suffix Alternant Table for group A4
Grammatical Name | Suffix Alternant
Conditional | we/te

Group C1-8
The suffixes belonging to Group C1-8 are presented in Table 8.

Table 8. Suffix Alternant Table for group C1-8
Grammatical Name | Suffix Alternant
Infinitive | a/an/-
Abusive | a/nu
Negative Tense | a/-
Negative Participle | aka/ka, akunda/kunda
Negative Verbal Adjective | ani/ni
Obligative | ali
Negative Imperative | aku/ku
Imperative Plural | aMdi/ndi

Group C9
The suffixes belonging to Group C9 are presented in Table 9.

Table 9. Suffix Alternant Table for group C9
Grammatical Name | Suffix Alternant
Imperative Singular | u/i/-

5.6 Formation of the Final Inflected Verb
The final inflected verb is formed by concatenating all the strings formed in Sections 5.3 to 5.5:
Final verb = verb + phonetic alternation + tense mode suffix + personal suffix
For the verb "piluc" in the example of Figure 1:
Final verb = pil + ic + iM + xi = piliciMxi

6. Evaluation
We report an evaluation of the accuracy of our Telugu Verb Morphology engine with respect to the Telugu verb database downloaded from the Telugu Wiktionary at https://en.wiktionary.org/wiki/Category:Telugu_verbs. We have run the verbs in the database by giving them as input to the surface realizer rather than the morphology
15. Ganapathiraju M, Levin L. TelMore: Morphological Generator for Telugu Nouns and Verbs. Proceedings of Second International Conference on Universal Digital Library. Alexandria, Egypt; 2006. p. 17-19.
16. Bharati A, Chaitanya V, Sangal R. Natural Language Processing: A Paninian Perspective. New Delhi: Prentice-Hall of India; 1995.
17. Krishnamurti BH. Telugu Verbal Bases: A Comparative and Descriptive Study. Berkeley and Los Angeles: University of California Press; 1961.
18. Krishnamurti BH, Gwynn JPL. A Grammar of Modern Telugu. Oxford University Press; 1985.