Amharic-English Speech Translation in Tourism Domain

Conference Paper · September 2017


DOI: 10.18653/v1/W17-4608

Michael Melese Woldeyohannis
Addis Ababa University, Addis Ababa, Ethiopia
michael.melese@aau.edu.et

Laurent Besacier
LIG Laboratory, UJF, BP53, 38041 Grenoble Cedex 9, France
laurent.besacier@imag.fr

Million Meshesha
Addis Ababa University, Addis Ababa, Ethiopia
michael.melese@aau.edu.et

Proceedings of the First Workshop on Speech-Centric Natural Language Processing, pages 59–66, Copenhagen, Denmark, September 7–11, 2017. © 2017 Association for Computational Linguistics

Abstract

This paper describes speech translation from Amharic to English, in particular Automatic Speech Recognition (ASR) with a post-editing feature and Amharic-English Statistical Machine Translation (SMT). The ASR experiment is conducted using a morpheme-based language model (LM) and a phoneme-based acoustic model (AM). Likewise, SMT is conducted using the word and the morpheme as translation units.

Morpheme-based translation shows a BLEU score of 6.29 at 76.4% recognition accuracy, while word-based translation shows a BLEU score of 12.83 at 77.4% word recognition accuracy. Further, after post-editing the Amharic ASR output using a corpus-based n-gram approach, the word recognition accuracy increased by a relative 1.42% (from 77.4% to 78.5%). Since the post-edit approach reduces error propagation, the word-based translation improved by 0.25 BLEU (1.95% relative).

We are now working towards further reducing propagated errors through different algorithms at each component of the speech translation cascade.

1 Introduction

Speech is one of the most natural forms of communication for humankind (Honda, 2003). Computers with the ability to understand natural language have promoted the development of man-machine interfaces. This can be extended through different digital platforms such as radio, mobile, TV, CD and others. Through these, speech translation facilitates communication between people who speak different languages.

Speech translation is the process by which spoken source phrases are translated to a target language using a computer (Gao et al., 2006). Speech translation research for major and technologically supported languages like English, European languages (like French and Spanish) and Asian languages (like Japanese and Chinese) has been conducted since 1983, starting with NEC Corporation (Kurematsu, 1996). The advancement of speech translation enables communication between people who do not share the same language.

The state of the art in speech translation can be seen as the integration of three major cascading components (Gao et al., 2006; Jurafsky and Martin, 2008): Automatic Speech Recognition (ASR), Machine Translation (MT) and Text-To-Speech (TTS) synthesis.

ASR is the process by which a machine infers the spoken words from a recorded audio signal, so that a person can talk to a computer and have it correctly understand what was said. MT is the process by which a machine is used to translate a text from a source language to a target language. Finally, TTS creates a spoken version of the text of an electronic document such as a text file or a web document.

As one major component of speech translation, Amharic ASR research started in 2001 (Melese et al., 2016). A number of attempts have been made at Amharic ASR using different methods and techniques towards designing speaker-independent, large-vocabulary, continuous and spontaneous speech recognition.

In addition to ASR, a preliminary English-Amharic machine translation experiment was conducted using phonemic transcription of the Amharic corpus (Teshome et al., 2015). The result obtained from the experiment shows that it is possible to design English-Amharic machine translation using a statistical method.

As the last component of speech translation, a number of TTS research efforts have been attempted
using different techniques and methods, as discussed by Anberbir and Takara (2009). Among these, concatenative, cepstral, formant and syllable-based speech synthesizers were the main methods and techniques applied.

All the above research works were conducted using different methods and techniques, in addition to differences in data and in integration as a cascading component. Moreover, the datasets and tools used in the above research are not accessible, which makes it difficult to evaluate the advancement of research in speech technology for local languages.

However, there has been no attempt to integrate ASR, SMT and TTS to come up with a speech translation system for the Amharic language. Thus, the main aim of this study is to investigate the possibility of designing an Amharic-English speech translation system that controls recognition errors propagating through the cascading components.

2 Amharic Language

Amharic is a Semitic language derived from Ge'ez and is the second most widely spoken Semitic language in the world after Arabic (Simons and Fennig, 2017). The name Amharic (አማርኛ) comes from the district of Amhara (አማራ) in northern Ethiopia, which is thought to be the historic, classical and ecclesiastical language of Ethiopia. Moreover, Amharic has five dialectal variations, named Addis Ababa, Gojam, Gonder, Wollo and Menz.

Amharic is the official working language of the government of Ethiopia among the 89 languages registered in the country, with up to 200 different spoken dialects (Simons and Fennig, 2017; Thompson, 2016). Besides this, Amharic is used in governmental administration, public media and national commerce in some regional states of the country, including Addis Ababa, Amhara, Diredawa and Southern Nations, Nationalities and People (SNNP).

Amharic is spoken by more than 25 million people, with up to 22 million native speakers. The majority of Amharic speakers are found in Ethiopia, although there are also speakers in a number of other countries, particularly Italy, Canada, the USA and Sweden.

Unlike other Semitic languages, such as Arabic and Hebrew, modern Amharic script has inherited its writing system from Ge'ez (ግዕዝ) (Yimam, 2000). Amharic uses a grapheme-based writing system called fidel (ፊደል), written and read from left to right. Amharic graphemes are represented as a sequence of consonant-vowel (CV) pairs, with the basic shape determined by the consonant and modified for the vowel.

The Amharic writing system is composed of four distinct categories consisting of 276 different symbols: 33 core characters with 7 orders (አ, ኡ, ኢ, ኣ, ኤ, እ and ኦ), 4 labiovelars with 5 orders (ቅ, ኅ, ክ and ግ), 18 labialized consonants with 1 order (ውኣ) and 1 labiodental character with 7 orders (አ, ኡ, ኢ, ኣ, ኤ, እ and ኦ).

In the Amharic writing system, all 276 symbols are indispensable because each has a distinct orthographic representation. However, as part of speech translation, speech recognition mainly deals with distinct sounds, and some of the graphemes generate the same sound. For example, ህ, ሕ, ኅ and ኽ are all pronounced as ህ /h/.

Machine translation, on the other hand, relies on the orthographic representation, in which different graphemes can yield the same meaning. As a result, normalization is required to minimize grapheme variation, which leads to better translation while also reducing the size of the ASR model. Table 1 presents the Amharic character set before and after normalization.

Category           Unnormalized   Normalized   Difference
Core character     33             27           6
Labiovelar         4              4            0
Labialized         18             18           0
Labiodental        1              1            0
Total (symbols)    276            234          42

Table 1: Distribution of the Amharic character set, adopted and modified from Melese et al. (2016).

As a result, graphemes that generate the same sound are normalized to a single core character across its seven orders. The normalization is based on the most frequently used character in Amharic text documents. This includes normalization from (ህ, ሕ, ኅ and ኽ) to ህ, (እ, ዕ) to እ, (ሥ, ስ) to ስ and (ጽ, ፅ) to ጽ, along with their orders.

3 Tourism in Ethiopia

Tourism is the activity of traveling to and staying in places outside one's usual environment for not more than one year, creating direct contact between people and cultures (UNWTO, 2016). Ethiopia has much to offer for international
tourists¹, ranging from the peaks of the rugged Semien mountains to one of the lowest points on earth, the Danakil Depression, which is more than 400 feet below sea level.

In addition, tourism has become an attractive form of sustainable economic development that serves as an alternative source of foreign exchange for countries like Ethiopia.

Moreover, the 2015 United Nations World Tourism report (UNWTO, 2016) and the World Bank² report indicate that in 2015 a total of 864,000 non-resident tourists came to Ethiopia to visit different tourist attractions. These include ancient and medieval cities and world heritage sites registered by UNESCO. From 2010 to 2015, the number of tourists increased by an average of 13.05% per year.

According to Walta Information Center³, citing the Ethiopian Ministry of Culture and Tourism, Ethiopia secured 872 million dollars in the first quarter of its 2016/17 fiscal year from 223,032 international tourists. The revenue came mostly from conference tourism, research business and other activities. The majority of the tourists were from the USA, England, Germany, France and Italy and speak foreign languages. Although tourists express their ideas in different languages, the majority of them can speak and communicate in English to exchange information about tourist attractions.

Due to this, language barriers are a major problem for today's global communication (Nakamura, 2009). As a result, tourists look for an alternative option that lets them communicate with their surroundings.

Thus, a speech translation system is one of the best technologies for filling the communication gap between people who speak different languages (Nakamura, 2009). This is especially true for overcoming the language barriers of today's global communication, besides supporting under-resourced languages.

However, under-resourced languages such as Amharic suffer from a lack of digital text and speech corpora to support speech translation. So, after collecting text and speech corpora, moving one step further helps in solving the language barrier problem.

Therefore, this study attempts to come up with an Amharic-English speech translation system, taking tourism as a domain.

¹ http://www.investethiopia.gov.et/images/pdf/Investment_Brochure_to_Ethiopia.pdf
² http://data.worldbank.org/indicator/ST.INT.ARVL?end=2015
³ https://www.waltainfo.com/FeaturedArticles/detail?cid=28751

4 Data Preparation

Currently, the Amharic language suffers from a lack of speech and text corpora for ASR and SMT. Besides this, collecting standardized and annotated corpora is one of the most challenging and expensive tasks when working with under-resourced languages (Besacier et al., 2006; Gauthier et al., 2016).

For Amharic speech recognition training and development, the 20 hours of read speech corpus prepared by Abate et al. (2005) were used. However, due to the unavailability of standardized corpora for speech translation in the tourism domain, a text corpus was acquired from resourced and technologically supported languages, particularly English. Accordingly, parallel English-Arabic text data was acquired from the Basic Traveller Expression Corpus (BTEC) 2009, which is made available through the International Workshop on Spoken Language Translation (IWSLT) (Kessler, 2010). A parallel Amharic-English corpus was prepared by translating the English BTEC data using a bilingual speaker. This data is used for the development of the speech translation cascading components, namely ASR and SMT.

The corpus has a total of 28,084 Amharic-English parallel sentences. To keep the dataset consistent, the text corpus has been further preprocessed: typing errors have been corrected, abbreviations have been expanded, numbers have been textually transcribed and concatenated words have been separated.

Amharic speech recognition is conducted using words and morphemes as language model units with a phoneme-based acoustic model. Similarly, the word and the morpheme have been used as translation units for Amharic in Amharic-English machine translation. Morpheme-based segmentation of the training, development and test sets is obtained by segmenting words into sub-word units using corpus-based, language-independent and unsupervised segmentation with Morfessor 2.0 (Smit et al., 2014).

Once the Amharic-English BTEC corpus is prepared, it is divided into training, tuning and test sets with a proportion of 69.33% (19,472 sentences),
1.78% (500 sentences) and 28.88% (8,112 sentences), respectively.

Then, the 8,112 (28.88%) test set sentences were recorded in a normal office environment by eight native Amharic speakers (4 male and 4 female) using LIG-Aikuma, a smartphone-based application tool (Blachon et al., 2016).

Accordingly, a total of 7.43 hours of read speech, with utterances ranging from 1,020 ms to 14,633 ms and an average duration of 3,297 ms, has been collected from the tourism domain.

Moreover, as suggested by Melese et al. (2016), a morphologically rich and under-resourced language like Amharic achieves better recognition accuracy using a morpheme-based language model with a phoneme-based acoustic model.

Similarly, language model data for Amharic speech recognition has been collected from different sources. A text corpus collected for a Google project (Tachbelie and Abate, 2015) has been used in addition to the BTEC SMT training data, excluding the test data. Table 2 presents the training, test and language model data used for Amharic speech recognition.

                                 Language model
            Train      Test      Word         Morpheme
Sentence    10,875     8,112     261,620      261,620
Token       145,404    50,906    4,223,835    5,773,282
Type        24,653     4,035     328,615      141,851

Table 2: Distribution of Amharic data for ASR.

Like speech recognition, a total of 42,134 sentences (374,153 tokens of 8,678 types) of English language model data have been used for Amharic-English machine translation. The data is collected from the same BTEC corpus, excluding the test data.

Consequently, corpus-based and language-independent segmentation has been applied to the training, development and test sets of the Amharic SMT data. Morfessor is used to segment words into sub-word units. Table 3 presents a summary of the corpus used for Amharic-English machine translation using the word and the morpheme as units.

Language   Unit       Item       Train      Dev      Test
Amharic    Word       Sentence   19,472     500      8,112
                      Token      107,049    2,795    37,288
                      Type       18,650     1,470    4,168
           Morpheme   Sentence   19,472     500      8,112
                      Token      145,419    3,828    50,906
                      Type       15,679     1,621    4,035
English    Word       Sentence   19,472     500      8,112
                      Token      157,550    4,024    55,062
                      Type       10,544     1,227    3,775

Table 3: Distribution of Amharic-English SMT data.

Likewise, the post-edit is conducted using a corpus-based n-gram approach. Accordingly, a corpus containing 681,910 sentences (11,514,557 tokens of 582,150 types) was crawled from the web, including news and magazines.

Then, the data is further cleaned, preprocessed and normalized. From this data, a total of 5,057,112 bigram, 8,341,966 trigram, 9,276,600 4-gram and 9,242,670 5-gram word sequences have been extracted after expanding numbers and abbreviations.

5 System Architecture

As discussed in Section 1, state-of-the-art speech translation is achieved by integrating cascading components to translate speech from a source language (Amharic) to a target language (English).

As part of the cascading components, the output of a speech recognizer contains a variety of errors. These errors further propagate to the succeeding components of speech translation, which results in low performance.

Hence, in this study we propose an Amharic ASR post-editing module that can detect an error, identify possible suggestions and finally correct the error based on the proposal. The correction is made using an n-gram data store together with minimum edit distance and perplexity, before the error reaches statistical machine translation.

Figure 1 presents the Amharic-English speech-to-speech translation (S2ST) architecture with and without ASR post-edit.

The post-edit process mainly consists of three phases: error detection, correction proposal and correction suggestion, as depicted in Figure 2.

The first phase of post-editing is to detect errors in the ASR output. To detect an error, recognized morpheme units are concatenated to form words, and the existence of each word is checked in a unigram Amharic dictionary.

Thus, the morpheme-based speech recognition output “Î+ -s¶³ …¡ -°È¶Û °sã €Ôr+ -݆∫”⁴ is concatenated to form the phrase “Îs¶³ …¡ -°È¶Û °sã €Ôr݆∫”.

⁴ “+” marks a morpheme that is followed by another morpheme, while “-” marks a morpheme that has a leading morpheme before it.
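The detection phase described above can be sketched in Python, the language the authors report using for the post-edit module. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `concatenate_morphemes` and `mark_errors`, the artificial placeholder tokens and the toy dictionary are all hypothetical, and the joining rule (a token ending in “+” merges with a following token starting with “-”) is only one reading of the marker convention given in the footnote.

```python
from typing import Iterable, List, Set


def concatenate_morphemes(tokens: Iterable[str]) -> List[str]:
    """Merge a 'stem+' token with a following '-suffix' token into one word.

    Tokens whose markers have no matching neighbour are kept unchanged,
    so they will fail the dictionary lookup in mark_errors().
    """
    words: List[str] = []
    for token in tokens:
        if words and words[-1].endswith("+") and token.startswith("-"):
            # Drop the trailing '+' and the leading '-' and glue the pieces.
            words[-1] = words[-1][:-1] + token[1:]
        else:
            words.append(token)
    return words


def mark_errors(words: Iterable[str], dictionary: Set[str],
                mark: str = "*") -> List[str]:
    """Replace every word missing from the unigram dictionary with the mark."""
    return [word if word in dictionary else mark for word in words]


# Artificial placeholder tokens standing in for recognized Amharic morphemes.
recognized = ["w1a+", "-w1b", "w2", "-w3", "w4", "w5a+", "-w5b"]
dictionary = {"w1aw1b", "w2", "w4", "w5aw5b"}  # toy unigram dictionary

words = concatenate_morphemes(recognized)
marked = mark_errors(words, dictionary)
print(words)
print(marked)
```

In this toy run the dangling token `-w3` has no preceding `+` morpheme, so it survives concatenation unchanged, is not found in the dictionary and is replaced by the error mark, mirroring the unmatched morpheme in the example sentence.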

Figure 1: Amharic-English speech-to-speech translation architecture (a) without post-edit (b) with post-edit

Figure 2: Amharic ASR post-edit algorithm

If a word is not in the unigram Amharic dictionary, it is considered an error and marked with the error mark (“*”), and it is then concatenated with the remaining words. Accordingly, each token is checked in the unigram dictionary, and the word “-°È¶Û” is not in the dictionary, so it is marked as an error.

If an error is detected during the first phase, the correction proposal phase takes the sentence with the error mark and creates (w-n+1) n-grams after adding the start “<s>” and end “</s>” symbols, where w is the number of tokens in the sentence (including the two added symbols) and n specifies the n-gram order. Otherwise, the sentence is considered correct.

Consequently, three 5-gram word sequences are generated from the recognized sentence “<s> Îs¶³ …¡ -°È¶Û °sã €Ôr݆∫ </s>”. These are:

1. <s> Îs¶³ …¡ * °sã
2. Îs¶³ …¡ * °sã €Ôr݆∫
3. …¡ * °sã €Ôr݆∫ </s>

Subsequently, we select the n-grams with error marks and, after removing the error mark, search the n-gram data store to select possible candidates for correction. If there is no candidate at order n, the search backs off to order (n-1), down to bigrams.

Once the candidates are identified, the suggestion is made by taking the minimum edit distance between the detected error and the selected suggestion. In this phase, the maximum total edit distance has been set experimentally to 16. This value was selected to provide at least one suggestion per sentence and to minimize the computation of perplexity. Table 4 depicts a sample of possible correction proposals for the sentence “Îs¶³ …¡ -°È¶Û °sã €Ôr݆∫”.

Finally, the suggestion is made primarily using the minimum edit distance and then by calculating the perplexity. The minimum edit distance is computed between the word “-°È¶Û” and the candidate word in each n-gram-based suggestion listed in Table 4.
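The correction proposal phase above can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the in-memory `store` (a dictionary from n-gram order to a set of word tuples) and the matching rule are assumptions. In particular, treating a stored n-gram as a candidate when deleting one of its words leaves exactly the context of the marked window is only one possible reading of "search after removing the error mark".

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Gram = Tuple[str, ...]


def ngram_windows(words: Sequence[str], n: int = 5,
                  bos: str = "<s>", eos: str = "</s>") -> List[Gram]:
    """Pad the sentence and return its (w - n + 1) sliding n-gram windows."""
    padded = [bos, *words, eos]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def candidates(window: Gram, grams: Set[Gram],
               mark: str = "*") -> List[Tuple[Gram, str]]:
    """Return (stored n-gram, proposed word) pairs for one marked window."""
    context = tuple(word for word in window if word != mark)
    found = []
    for gram in sorted(grams):
        if len(gram) != len(context) + 1:
            continue
        for i, word in enumerate(gram):
            # Deleting position i must leave exactly the window's context.
            if gram[:i] + gram[i + 1:] == context:
                found.append((gram, word))
                break
    return found


def propose(words: Sequence[str], store: Dict[int, Set[Gram]], n: int = 5,
            mark: str = "*") -> Tuple[Optional[int], List[Tuple[Gram, str]]]:
    """Search order n first, then back off to n-1, ..., down to bigrams."""
    for order in range(n, 1, -1):
        found = []
        for window in ngram_windows(words, order):
            if mark in window:
                found.extend(candidates(window, store.get(order, set()), mark))
        if found:
            return order, found
    return None, []


# Placeholder sentence: five tokens, the third one marked as an error.
marked = ["w1", "w2", "*", "w4", "w5"]
windows = ngram_windows(marked)
print(len(windows))  # 7 padded tokens and n = 5 give 7 - 5 + 1 windows

# Toy store without any matching 5-gram or 4-gram, so the search backs off.
store = {3: {("w2", "x", "w4"), ("a", "b", "c")}}
result = propose(marked, store)
print(result)
```

A linear scan over the store keeps the sketch short; with millions of stored n-grams, as in the corpus described in Section 4, an index keyed on the context words would be needed in practice.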

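Candidate selection by minimum edit distance, with perplexity as the tie-breaker, could look like the sketch below. The names `edit_distance` and `choose_suggestion` are hypothetical, the perplexity function is supplied by the caller (for instance from an n-gram language model) rather than implemented here, and treating 16 as a hard cap on the distance is an assumption about how the experimentally set threshold is applied.

```python
from typing import Callable, Iterable, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insert, delete and substitute."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def choose_suggestion(error_word: str,
                      suggestions: Iterable[Tuple[str, str]],
                      perplexity: Callable[[str], float],
                      max_distance: int = 16) -> Optional[str]:
    """Pick the suggested sentence whose candidate word is closest to the error.

    Each suggestion is a (sentence, candidate_word) pair. Ties on edit
    distance are broken by the lower perplexity of the whole sentence.
    """
    scored = []
    for sentence, candidate in suggestions:
        distance = edit_distance(error_word, candidate)
        if distance <= max_distance:
            scored.append((distance, perplexity(sentence), sentence))
    return min(scored)[2] if scored else None


# Toy data: two candidates tie on edit distance, so perplexity decides.
toy_perplexity = {
    "w1 w2 xa w4": 120.0,
    "w1 w2 w4 yb": 95.5,
    "w1 w2 longer w4": 80.0,
}
suggestions = [
    ("w1 w2 xa w4", "xa"),
    ("w1 w2 w4 yb", "yb"),
    ("w1 w2 longer w4", "longer"),
]
best = choose_suggestion("-zz", suggestions, toy_perplexity.__getitem__)
print(best)
```

Sorting on the pair (distance, perplexity) means perplexity is only consulted when distances are equal, which matches the order of the two criteria in the text.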
Possible suggestion list                      Distance
Îs¶³ …¡ °sã €Ôr݆∫ b†Ål                       5
Îs¶³ …¡ bÎ °sã €Ôr݆∫                         5
Îs¶³ …¡ €nÔ≈y °sã €Ôr݆∫                      5
Îs¶³ …¡ °sã €Ôr݆∫ y‰‡                        5
Îs¶³ …¡ °sã €Ôr݆∫ †àÚg³                      5
Îs¶³ …¡ bÒ °sã €Ôr݆∫                         5
Îs¶³ …¡ °sã €Ôr݆∫ býl                        5
Îs¶³ …¡ °sã €Ôr݆∫ y‰‡m                       5
≈n{wm Îs¶³ …¡ °sã €Ôr݆∫                      5
Îs¶³ …¡ Î√Xµ¤t °sã €Ôr݆∫                     6
Îs¶³ …¡ €nÔ√…n °sã €Ôr݆∫                     6
Îs¶³ …¡ €nÔ√°¿ °sã €Ôr݆∫                     6
Îs¶³ …¡ €nÔԒ˜ °sã €Ôr݆∫                     6
Îs¶³ …¡ €nÔ√³ °sã €Ôr݆∫                      6
Îs¶³ …¡ €nÔ√Œ³ °sã €Ôr݆∫                     6
Îs¶³ …¡ €nÔ√°³ °sã €Ôr݆∫                     6

Table 4: Sample n-gram based suggestions for the sentence “Îs¶³ …¡ -°È¶Û °sã €Ôr݆∫”.

If two suggestions have the same edit distance, the decision is made by selecting the one that results in lower perplexity.

Accordingly, the phrase “Îs¶³ …¡ °sã €Ôr݆∫ b†Ål” is selected due to its better language model perplexity.

Similarly, Table 5 presents sample Amharic speech recognition output along with the corrected sentences using our post-edit technique.

No   Type     Sentence recognized and corrected
1    Raw      €•sn ½m— Ûb}t …ÚË+ ÅÝ y»’l
     Edited   €•sn ½m— Ûb}t ¤Ë ÅÝ y»’l
2    Raw      €§kÇn °]¿√+ µ“t
     Edited   €§kÇn °]¿√wn µ“t
3    Raw      €§Án+ Š‰ å³ ≈gxt …m‰†∫
     Edited   €§kÇn Š‰ å³ ≈gxt …m‰†∫
4    Raw      €§kÇn [n³Çn ykà±+
     Edited   €§kÇn [n³Çn ykà±t
5    Raw      Îs¶³ …¡ +gËt …јb½ ¶w
     Edited   Îs¶³ …¡ †ŒgËt …јb½ ¶w
6    Raw      yh ÎÛÍ ‰y hŒm -Ñݵm ym‰l
     Edited   yh ÎÛÍ ‰y hŒm ˆÑݵm ym‰l
7    Raw      -h §¥r Ùªr ¤snt ˜ƒt yÔr›l
     Edited   yh §¥r Ùªr ¤snt ˜ƒt yÔr›l
8    Raw      [n³Çn ykà± -m
     Edited   [n³Çn ykà±
9    Raw      €§kÇn °]¿√wn +µ“t
     Edited   €§kÇn °]¿√wn yµ“t

Table 5: Sample corrected sentences of the Amharic speech recognizer.

6 Experimental results

Speech translation experiments are conducted through the cascading components of speech translation as discussed in Section 1. In the speech recognition experiments, Kaldi (Povey et al., 2011), SRILM (Stolcke et al., 2002) and Morfessor 2.0 (Smit et al., 2014) have been used for Amharic speech recognition, language modeling and unsupervised segmentation, respectively.

Morfessor-based segmentation has been applied to segment the training, testing and language model data for Amharic. In addition, Moses and MGIZA++ are used for implementing phrase-based statistical machine translation, and Python is used for implementing the post-edit algorithm and for integrating ASR and SMT on the Linux platform.

The entire ASR experiment is conducted using a morpheme-based language model with a phoneme-based acoustic model. The experimental result is computed using the NIST Scoring Toolkit (SCTK)⁵ and presented in terms of word recognition accuracy (WRA⁶) and morph recognition accuracy (MRA).

The Amharic speech recognition experiment shows a morph recognition accuracy of 76.4%. Then, after the concatenation of morphemes into words, a word recognition accuracy of 77.4% has been achieved.

Consequently, Amharic-English SMT experiments have been conducted with and without considering the Amharic ASR result.

The first two experiments were conducted without considering ASR. Accordingly, a word-word system resulted in a BLEU score of 14.72, while the morpheme-word system yields 11.24 BLEU. Combining Amharic ASR with Amharic-English SMT as cascading components resulted in a BLEU score of 6.29 at 76.4% recognition accuracy for Amharic-morpheme to English-word translation.

Similarly, Amharic-word to English-word translation shows a BLEU score of 12.83 at 77.4% recognition accuracy without ASR post-edit. The result achieved by ASR can be further improved by applying post-edit to the Amharic speech recognition output.

⁵ Evaluation toolkit available at http://my.fit.edu/~vkepuska/ece5527/sctk-2.3-rc1/doc/sctk.htm
⁶ WRA is obtained by concatenating the morpheme-level recognition output (scored as MRA) into words.
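For readers unfamiliar with the metric, word recognition accuracy can be sketched as below. This is not the SCTK implementation. It assumes the common definition in which accuracy is 100 × (N − S − D − I) / N, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment. The function name `word_accuracy` and the example strings are hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return the accuracy in percent of a hypothesis against a reference.

    The total of substitutions, deletions and insertions is taken as the
    word-level Levenshtein distance between the two token sequences.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        previous = current
    errors = previous[-1]
    return 100.0 * (len(ref) - errors) / len(ref)


# Placeholder tokens: one substitution in four reference words.
print(word_accuracy("w1 w2 w3 w4", "w1 wx w3 w4"))
```

Applying the same function to morpheme-segmented text would give a morph-level figure, and applying it after concatenating the morphemes gives the word-level figure. Because insertions are counted as errors, the value can be negative for very long hypotheses.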
Table 6 depicts the Amharic-English speech translation results before and after Amharic ASR post-edit.

                 Before                 After
                 Morpheme    Word       Word
ASR (%)          76.4        77.4       78.5
SMT (in BLEU)    6.29        12.83      13.08

Table 6: Amharic-English speech translation results.

Accordingly, morpheme-based recognition followed by post-edit resulted in a BLEU score of 13.08 at 78.5% word recognition accuracy.

The result obtained from the n-gram post-edit experiment shows a relative improvement of 1.42% (1.1 percentage points absolute) over the word recognition accuracy of 77.4% obtained by concatenating the 76.4% morpheme-based recognition output. Similarly, the BLEU score improved by 1.95% in relative terms (from 12.83 to 13.08).

7 Conclusion and Future work

Speech translation has been studied for more than a decade for resourced and technologically supported languages such as English and other European and Asian languages. In contrast, such attempts have not yet started for under-resourced languages, particularly Amharic. This paper presents the first Amharic speech to English text translation using the cascading components of speech translation.

For ASR, 20 hours of training speech and 7.43 hours of test speech were used, with a morpheme-based language model and a phoneme-based acoustic model. For SMT, 19,472 sentences were used for training and 8,112 sentences for testing. Similarly, to apply ASR post-edit using the n-gram approach, a corpus consisting of 681,910 sentences was used.

Accordingly, speech translation with ASR post-editing resulted in a 0.25 BLEU (1.95% relative) improvement over the word-based SMT. The improvement appears to result from improving ASR by 1.42% (relative) using corpus-based n-gram post-edit.

The current study shows the possibility of enhancing the performance of speech translation by controlling speech recognition error propagation using a post-editing algorithm.

Further work needs to be done to apply post-editing at both the recognition and the translation stages of speech translation.

References

Solomon Teferra Abate, Wolfgang Menzel, Bairu Tafila, et al. 2005. An Amharic speech corpus for large vocabulary continuous speech recognition. In INTERSPEECH, pages 1601–1604.

Tadesse Anberbir and Tomio Takara. 2009. Development of an Amharic text-to-speech system using cepstral method. In Proceedings of the First Workshop on Language Technologies for African Languages, pages 46–52. Association for Computational Linguistics.

Laurent Besacier, V-B Le, Christian Boitet, and Vincent Berment. 2006. ASR and translation for under-resourced languages. In Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on, volume 5, pages V–V. IEEE.

David Blachon, Elodie Gauthier, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker, and Annie Rialland. 2016. Parallel speech collection for under-resourced language studies using the LIG-Aikuma mobile device app. Procedia Computer Science, 81:61–66.

Yuqing Gao, Liang Gu, Bowen Zhou, Ruhi Sarikaya, Mohamed Afify, Hong-Kwang Kuo, Wei-zhong Zhu, Yonggang Deng, Charles Prosser, Wei Zhang, et al. 2006. IBM MASTOR system: Multilingual automatic speech-to-speech translator. In Proceedings of the Workshop on Medical Speech Translation, pages 53–56. Association for Computational Linguistics.

Elodie Gauthier, Laurent Besacier, Sylvie Voisin, Michael Melese, and Uriel Pascal Elingui. 2016. Collecting resources in Sub-Saharan African languages for automatic speech recognition: a case study of Wolof. In Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, May 23-28, 2016.

Masaaki Honda. 2003. Human speech production mechanisms. NTT Technical Review, 1(2):24–29.

Daniel Jurafsky and James H Martin. 2008. Speech and language processing (Prentice Hall series in artificial intelligence). Prentice Hall.

Fondazione Bruno Kessler. 2010. A generic weaver for supporting product lines. In International Workshop on Spoken Language Translation, pages 11–18. ACM.

Akira Kurematsu. 1996. Automatic Speech Translation, volume 28. CRC Press.

Michael Melese, Laurent Besacier, and Million Meshesha. 2016. Amharic speech recognition for speech translation. In Atelier Traitement Automatique des Langues Africaines (TALAF). JEP-TALN 2016.
Satoshi Nakamura. 2009. Overcoming the language barrier with speech translation technology. Science & Technology Trends-Quarterly Review, (31).

Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, et al. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding, EPFL-CONF-192584. IEEE Signal Processing Society.

Gary F. Simons and Charles D. Fennig. 2017. Ethnologue: Languages of the World. SIL, Dallas, Texas.

Peter Smit, Sami Virpioja, Stig-Arne Gronroos, and Mikko Kurimo. 2014. Morfessor 2.0: Toolkit for statistical morphological segmentation. In 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 21–24. European Chapter of the Association for Computational Linguistics, EACL.

Andreas Stolcke et al. 2002. SRILM: an extensible language modeling toolkit. Interspeech.

Martha Yifiru Tachbelie and Solomon Teferra Abate. 2015. Effect of language resources on automatic speech recognition for Amharic. In AFRICON, 2015, pages 1–5. IEEE.

Mulu Gebreegziabher Teshome, Laurent Besacier, Girma Taye, and Dereje Teferi. 2015. Phoneme-based English-Amharic statistical machine translation. In AFRICON, 2015, pages 1–5. IEEE.

Irene Thompson. 2016. About world language. Accessed: 2017-05-26.

UNWTO. 2016. World tourism organization annual report 2015. Technical report, United Nations, Madrid, Spain.

Baye Yimam. 2000. Yeamarigna sewasew (Amharic version). EMPDA, Addis Ababa, Ethiopia.
