Dee's Educ Tech Lit Review

The Use of Accent Reduction Software to Improve Prosody in Non-native Speakers of English

F. Dee Matchett

TESL 533 Educational Technology
Carson Newman University
Dr. E. Cody-Mitchell

November 19, 2013

Abstract
Latin served as the international language for hundreds of years and still has a major
influence on medical vocabulary and legalese. However, English has replaced Latin as the global
language and now serves as the primary language of economic trade. English exerts its
influence on science and medicine as well; hence, professional journals are primarily
published in English. It is also the language of the Internet and, therefore, exerts great
influence throughout the World Wide Web.
The smaller our world becomes in terms of communication and the blending of cultures, the
more non-native speakers find they need to speak English in a readily comprehensible manner.
This literature review examines the efficacy of using accent reduction software to
improve the prosody of non-native speakers of English.
The mental processes involved in the production of segmentals and suprasegmentals will be
explained.
The relationship between the prosody of the speaker and the perception of the listener
will be examined, and the ways in which intelligibility affects both will be identified.
Employment discrimination based on foreign accent and intelligibility will be
discussed in relation to the possible benefits of accent reduction.
The various technologies employed in accent reduction and research regarding their
efficacy will be examined.
Finally, a comparison of currently available accent reduction software will be offered.

Following are acronyms and keyword definitions as used in this review:
CALL - Computer Assisted Language Learning
CAPT - Computer Assisted Pronunciation Training
ELL - English Language Learner
ESL - English as a Second Language; generally refers to the teaching of English when English
is the primary language of instruction (location is usually an English-speaking country)
EFL - English as a Foreign Language; refers to the teaching of English when English is not the
primary language of instruction (location is usually a non-English-speaking country)
L1 - native language
L2 - second language, in this case English
NSP - non-standard pronunciation
prosody - the rhythm, stress, and intonation of speech
segmentals - phonological units of language, i.e., vowels and consonants
suprasegmentals - elements of speech that extend over its segments, such as changes in pitch,
loudness, duration, tone, and prominence
spectrogram - visual representation of the spectrum of frequencies in a sound that can indicate
the phoneme patterns of speech
waveform - visual representation of speech that can indicate prosody, rate, intensity, and
loudness by tracking changes in air pressure over time as a sound is produced

Figure 1: Broca's Area and Wernicke's Area
Retrieved from http://thebrain.mcgill.ca, October 16, 2013
Research regarding the mental processes involved in the production of language has been
ongoing since 1861, when the surgeon Paul Broca identified the area of speech formation by
examining a deceased man's brain. The man, although unable to speak, could hear and
understand the speech of others. Broca rightly assumed that a lesion found in the brain had been
responsible for the man's inability to articulate.
The neurologist Carl Wernicke added to the knowledge of language production when he
discovered another region of the brain that processed speech. People with lesions in this area
could articulate, but their speech was largely incomprehensible. Both areas are located in the left
hemisphere of the brain (Bruno, 2002) (see Figure 1).
It was later discovered that numerous nerve fibers form a path of transmission between
Broca's area and Wernicke's area. This connection allows Wernicke's area to analyze written
words or auditory language, form contextual understandings, and transfer that information to
Broca's area. Broca's area plans the pronunciation of words and sends that information to the
motor cortex, which commands the muscle movements required for pronunciation. This is an
oversimplification, since all these processes function simultaneously with a variety of other inputs,
such as semantic memory. Semantic memory stores definitions and the articulation patterns
necessary to pronounce words, including tongue placement and mouth position (Dubuch, 2002).
This explanation does, however, provide us with a basic understanding of how the segmental
aspects of speech are produced.
Figure 2: Lateralization of language processing
Retrieved from http://www.frontiersin.org/Journal/10.3389/fnene.2010.00013/full, October 21, 2013
The production of suprasegmentals, which give a language its stress patterns, rhythm, and
intonation, requires assistance from a location in the right brain. Its interactive relationship with
Broca's area and Wernicke's area has been difficult to describe adequately, but a dual pathway
model proposed by researchers Angela Friederici and Kai Alter has gained acclaim (Friederici &
Alter, 2004; Friederici, 2011).
The model has been further substantiated by the findings of Yuri Saito (Saito, Fukuhara,
Aoyama, & Toshima, 2009). The complexity of this interaction is beyond the scope of this paper,
but, broadly, the dual pathway model describes how the segmental and suprasegmental aspects
of speech are processed together (see Figure 2).
This additional right-brain processing enables us to use speech for emotional expression.
Raising voice pitch when surprised or lowering pitch when angry are both functions of
right-brain language prosody. We use the left brain, where the segmentals of speech are centered,
for analysis, and the right side of the brain, where suprasegmentals are centered, for the artistic
expression of music, art, and dance. This correlates with English being a stress-timed language,
quite musical in nature, with intricate patterns of rhythm, stress, and intonation that can be
difficult for ELLs to master (Romer-Trillo, 2012). Engaging the right brain should help learners
assimilate the lyrical nature of the language. In this examination of the literature regarding the
intelligibility of non-native speakers of English, we will find indications that when these lyrical
features are lacking, comprehensibility is negatively affected. Studies indicate that music and
speech are decoded by shared brain processes and that the brain responds to the same
psychoacoustic cues in both, namely loudness, tempo and speech rate, melodic and prosodic
contour, spectral centroid, and sharpness (Meng, 2009; Courinho, 2013). All of these cues can be
addressed through accent reduction software, which may therefore affect the right-brain
processing of suprasegmentals.
According to The New York View, a publication of Columbia University's Graduate
School of Journalism, there is a marked increase in the number of non-native speakers seeking
the services of accent modification coaches (Cheng, 2012). Statistics from training services
confirm this report: Sankin Speech Improvement, LLC, recently reported a 35% increase in
clients seeking accent reduction services (Sankin, 2013).
A recent report from the US Census Bureau indicates that 21% of the American
population speaks a language other than English at home (Census, 2011). This figure is only one
percentage point higher than in 2009, but it continues an upward trend that has held for several
decades. The American Community Survey report summarizes the findings as follows:
This report provides illustrative evidence of the continuing and growing role of non-
English languages as part of the national fabric. Fueled by both long-term historic
immigration patterns and more recent ones, the language diversity of the country has
increased over the past few decades. As the nation continues to be a destination for
people from other lands, this pattern of language diversity will also likely continue.
(Ryan, 2013)
Figure 3: US Census Bureau, American Community Survey. Retrieved from http://www.census.gov/prod/2013pubs/acs-22.pdf, October 31, 2013

From the 1995-1996 school year to the 2005-2006 school year, the state of Tennessee saw a
296% increase in ELL students, ranking Tennessee 6th in the nation for increase in ELL
student enrollment (Ariza, Morales-Jones, Yahyan, & Zainuddin, 2010). It would be a mistake to
underserve the language needs of our growing population of non-native speakers.
As this population filters into the educational system and the workforce, there is a need
to address speech proficiency. The increased demand for accent reduction services noted
previously shows that second language speakers are aware of the need to acquire that
proficiency. However, they may not be able to determine which specific factors in their speech
are creating difficulty (Derwing, 2003). The two primary factors needed for speech to be
comprehensible are pronunciation and prosody (Baker, 2007). These correspond to the
production of segmentals and suprasegmentals described earlier.
In a study of pronunciation and intelligibility (Levis & Levelle, 2011), speech recordings
of native Spanish and Korean speakers of English were evaluated by pronunciation experts to
determine what most affected intelligibility. Although insufficient pronunciation skills did
result in a loss of comprehension, panelists felt that correcting pronunciation would not improve
intelligibility until the larger problems of rhythm, tempo, and word stress (misplaced or lacking)
were addressed. These are all prosodic features of language.
Munro and Derwing (2000) studied the effect of accents on native English listeners (untrained
listeners rather than pronunciation experts) to determine how strongly they perceived accented
speech to differ from their own speech, how difficult they found accented speech to understand,
and how much accented speech they actually understood.
The study showed that pronunciation was the least relevant factor in comprehensibility because
listeners quickly adjusted to differences in English pronunciation: "This finding demonstrates
empirically that the presence of a strong foreign accent does not necessarily result in reduced
intelligibility or comprehensibility" (p. 19). A lack of prosody, on the other hand, is troublesome
to the listener since "two foreign-accented utterances may both be fully understood (and
therefore be perfectly intelligible), but one may require more processing time than another"
(p. 19). This slower processing time can frustrate the listener and, as a result, be
interpreted as a lack of fluency.
In the study, listeners rated this type of speech as having lower comprehensibility, even
though they could transcribe the speaker perfectly. Generally, this perception is due to the
missing suprasegmental elements that give the English language its characteristic qualities of
rhythm, stress, and intonation. Without these qualities, speech becomes laborious to decipher.
This is further supported by the research of Shiri Lev-Ari and Boaz Keysar at the University of
Chicago, who introduce another effect of accentedness: credibility (Lev-Ari & Keysar, 2010).
They demonstrated that the difficulty listeners have in processing accented speech leads them to
perceive accented speakers' statements as less truthful. Statements seemed less credible to
listeners when they were difficult to understand. This could place accented speakers in awkward
positions relative to their native-speaking counterparts and result in discrimination.
A study of the relationship between prosodic training and intelligibility among German
speakers of English supports the view that, to the listener's ear, prosody greatly affects
meaning, and recommends "emphasizing that the goal of pronunciation training is to increase
intelligibility and improve the effectiveness of communication, [thereby] one opens the door to
including training beyond word-level pronunciation to include the prosodic level" (Jackson &
O'Brien, 2011).
Another study examined how prosody affects word meaning. Participants listened to phrases
that contained novel words pronounced with matched or mismatched prosodic elements; in other
words, pitch and stress either fit or conflicted with the intended meaning. Word meaning was
correctly inferred much more frequently when prosody was appropriately matched.
These findings suggest that speech contains reliable prosodic markers to word meaning and that
listeners use these prosodic cues to differentiate meanings (Nygaard, Herold, & Namy, 2009).
Although this study was not focused on second language learning, the results suggest that
an increase in the ability of NSP (non-standard pronunciation) speakers to use suprasegmentals
in their speech would lead to an increase in comprehensibility.
Figure 4: Population Distributions in the Workforce (Anglo-American 64%, Hispanic 16%, African-American 12%, Asian 5%, unspecified 3%). Source: Bureau of Labor Statistics
Figure 5: Distribution of Minorities in Corporate America (Anglo-American 96%, Hispanic 1.2%, African-American 0.8%, Asian 1.8%)
In a brief published by the Center for Applied Linguistics that cites the findings of
Munro and Derwing, O'Brien and Jackson, and Levis (among others), the recommendation is
made that adult ELLs be made aware of the contribution of stress, intonation, and rhythm to
comprehensibility and that teachers be encouraged to improve student pronunciation without
focusing on perfect pronunciation alone. The brief suggested that Computer Assisted
Pronunciation Training (CAPT) could be used to address issues of prosody (Schaetzel, 2009).
These recommendations are steps in the right direction because "[u]nfortunately, suprasegmental
features such as stress and intonation are often treated by ESL teachers as peripheral frills and
not as central to the conveying of meaning" (Avery & Ehrlich, 2006).
Accentedness not only causes difficulty for the hearer; it can also greatly affect the speaker,
especially in professional development. It is interesting to note that the Hispanic (16%) and
Asian (5%) shares of the workforce together total 21%. This is the same figure cited earlier for
the percentage of the US population that speaks a language other than English at home, which
suggests roughly proportional representation in the workforce. However, representation in
corporate America is far lower: Asian (1.8 percent) and Latino (1.2 percent) (Burns, Barton,
& Kerby, 2012).
In an article in the Journal of Cultural Diversity examining the effect of foreign
accentedness on upward mobility in the workplace, L2 speakers of English are described as
"the invisible minority," underrepresented and marginalized (Akomolafe, 2013). A number of
research studies substantiate these descriptions. In one study, 65 job recruiters were asked to
rate their perception of the speech of applicants. Those with non-standard grammar and
pronunciation were judged more negatively in terms of employability than those who spoke
Standard English (Atkins, 2000). In a study using a matched-guise technique, which was
developed to reveal people's attitudes, accented speakers were favored for less desirable jobs,
and speakers of Standard English were favored for more prestigious positions. The accented
speakers were viewed as less efficient, and their communication skills as less suitable for those
jobs (Cargyle, 2000). These perceptions may relegate non-native speakers to lower levels of
economic success, even though their skills and abilities qualify them for higher-level positions.
Spanish-accented applicants in another matched-guise study were viewed as having a lower
chance of promotion to management and were perceived as less competent, although their
qualifications were on par with those of native speakers of English (Nguyen, 2010; Shah &
Seifeldin, 2010).
Beyond discrimination, communication problems create other issues in the workplace
and in education. Even when non-native speakers are hired, they face difficulties on the
job. The unintelligibility of their speech can negatively affect job performance and interaction
with others. These problems are especially pertinent in the medical profession, where the number
of foreign-born doctors and nurses has increased. International medical graduates (IMGs) now
make up 26% of US physicians (Shah & Seifeldin, 2010). Many work in rural, low-income areas
avoided by native-born graduates. They face communication issues because of accented speech
that can lead to misunderstandings that put patients at risk (Khurana, 2013). Efforts to mitigate
communication issues through accent modification courses are proving effective: an analysis of
pre- and post-course performance data indicated the efficacy of the training (Khurana, 2013).
This resulted in the following recommendation:
Thus, communication training should be offered in tiers at several different levels in
colleges, universities and healthcare institutions. It should be offered at subsidized rates
to the students and faculty, with the bulk of cost absorbed by the employer that will
benefit from increased employee productivity and patient or student satisfaction.
Organizations that do not have on-site training capability may provide it through online
training programs. (Khurana, 2013)
The same holds true for the nursing profession, which is experiencing difficulties
with foreign-born nursing students whose attrition rates are high, primarily due to accent-related
communication problems. In a study of 13 students with NSP, the faculty at the Long Island
University School of Nursing experienced difficulty comprehending the students' speech patterns.
There was concern that in a clinical setting these students would also encounter communication
issues with patients, family members, and staff. A speech pathologist was hired to provide accent
modification training, which proved advantageous to the nursing students in improving
their intelligibility. Twelve of the 13 students graduated. The school concluded that it had
effectively addressed what could have been potentially harmful patient safety issues and had
greatly improved its retention of NSP nursing students (Carr, 2012).
Another study of 15 nursing students in the invisible minority reported that the students
felt lonely and isolated and were disappointed by the absence of acknowledgment of their
individuality from teachers, by peers' lack of understanding and knowledge about cultural
differences, by a lack of support from teachers, and by the discrimination they faced
(Gardner, 2005).
These factors have a negative effect on self-image that can hinder social integration. So,
in addition to employment and educational barriers, NSP speakers face socio-cultural barriers
that can be difficult to surmount. The Institute for Research on Public Policy reported on the
social integration of immigrants to Canada (Derwing & Waugh, 2012). Deficiencies in pragmatic
language skills were found to keep them from integrating more fully into Canadian society.
Since immersion in the society of the second language facilitates fluency, this lack of
socialization can hinder the acquisition of prosody in English by limiting the time that NSP
speakers spend hearing and imitating native speakers. Left unbroken, this can lead to a cycle of
poor language skills, social exclusion, isolation, poor self-image, and socio-economic
suppression. Computer Assisted Pronunciation Training (CAPT) is one option for breaking that cycle.
Computer Assisted Language Learning (CALL) refers to the use of computer technology
and software to teach all aspects of language: reading, writing, listening, and speaking. From
CALL has grown a subset of technologies to improve speech: Computer Assisted Pronunciation
Training (CAPT). Instead of having learners imitate taped voice recordings on cassette or CD in a
language lab, instruction has trended towards CAPT software programs. Much of the technology
for CAPT has been borrowed from speech therapy research and adapted to serve language
learners. Automatic Speech Recognition (ASR) technology has been adapted to map speech
patterns for pronunciation comparison (Qooco, 2009). The learner's pronunciation of a given text
is analyzed against an accepted speech model and rated for accuracy. Some CAPT programs
combine ASR with speech waveforms. Prosody, speech rate, and loudness can be read from a
waveform (McGregor, 2002). Actual phonemes cannot be read within a waveform unless its
frequency components are analyzed and displayed as a spectrogram (Carmell, 2013). Reading a
spectrogram accurately requires training; for this reason, some researchers have argued against
displaying spectrograms, contending that they are included for their flashy look, to impress users
(Neri, Cucchiarini, Strik, & Boves, 2009). However, a spectrogram does give the learner a general
visual comparison, and the waveform of the model voice can be examined for conformity to the
learner's voice. Because visual displays have been shown to increase error recognition,
waveforms and spectrograms can be good learning aids, as noted in a study of the Kay-Pentax
Computerized Speech Laboratory: learners were able to use visual feedback from spectrograms
to recognize gaps in their language production that they had not noticed with imitation exercises
alone (Pearson, 2011). A combination of aural and visual modalities increases effectiveness in
speech production (Massaro, Cohen, & Gesi, 1993). This correlates with the previously mentioned
stimulation of the right brain during the production of suprasegmentals and with the
well-grounded pedagogical strategy of presenting instruction using a variety of modalities. In
addition to aural and visual stimulation, by its very nature CAPT also engages the learner
kinesthetically. CAPT software offering both aural and visual feedback is referred to as a
Dual-Mode program.
Figure 6: Waveform and spectrogram of the same word, "compute." Retrieved from
http://www.cslu.ogi.edu/tutordemos/SpectrogramReading/waveform.html
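To make the waveform/spectrogram distinction concrete, the following is a minimal Python sketch, not taken from any of the programs reviewed here, of how a spectrogram can be computed from waveform samples: the signal is cut into short overlapping frames, each frame is tapered with a Hann window, and the magnitude of its discrete Fourier transform is taken. The sampling rate (8,000 Hz), frame length (256 samples), hop size (128 samples), and the synthetic two-tone test signal are illustrative assumptions; a naive DFT is used for readability, whereas real CAPT software would use a fast Fourier transform.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Hann window of length n, used to taper each frame's edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame: list[float]) -> list[float]:
    """Magnitudes of DFT bins 0..N//2 (naive O(N^2) loop, for clarity only)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    return mags


def spectrogram(samples: list[float], frame_len: int = 256, hop: int = 128) -> list[list[float]]:
    """Short-time spectrum: one list of bin magnitudes per overlapping frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [x * w for x, w in zip(samples[start:start + frame_len], window)]
        frames.append(dft_magnitudes(chunk))
    return frames


if __name__ == "__main__":
    rate = 8000  # assumed sampling rate in Hz
    # Synthetic "waveform": 0.1 s of a 500 Hz tone followed by 0.1 s of a 1500 Hz tone.
    samples = [math.sin(2 * math.pi * (500 if i < 800 else 1500) * i / rate)
               for i in range(1600)]
    spec = spectrogram(samples)
    bin_hz = rate / 256  # frequency spacing between DFT bins
    for idx in (0, len(spec) - 1):
        peak_bin = max(range(len(spec[idx])), key=spec[idx].__getitem__)
        print(f"frame {idx}: peak near {peak_bin * bin_hz:.0f} Hz")
```

Run on the synthetic signal, the first frame should peak near 500 Hz and the last near 1500 Hz. A plot of the waveform alone would show two sine waves of equal amplitude; the spectrogram is what makes the change in frequency content visible.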
The limitation of ASR programs is that matching voice patterns alone provides
insufficient feedback to learners regarding how to improve their accuracy (Hismanoglu,
2011). Researchers are still working to remedy this, but much has already improved in
the software available today. The trend is for learners to move from a passive role of imitation to
producing authentic speech, which is closer to the natural way that language is acquired
(Eskenazi & Hansma, 1998). In 1995, Auralog introduced Talk To Me software, the first program
able to process complete sentences (LanguageOnLine, 2001). It mimicked authentic speech by
using narrow parameters of response: the program elicited learner responses that fell within the
parameters the system recognized. In this way, learners felt as if they were creating original
utterances (Eskenazi & Hansma, 1998). Some linguists criticized this level of feedback, saying,
"Using artificially generated sentences does not necessarily put learners on the path to
communicative ability with natural speech" (Godwin-Jones, 2009). This problem is being
addressed with speech-interactive microworld technology, in which the learner enters a
virtual world and authors new scenarios (Holland, Kaplan, & Sabol, 1999).
One application of virtual world language learning is seen in a program entitled Virtual
Pre-K (VPK) that interacts with parents and students (Cummins, 2007). It offers a variety of
bilingual activities (English/Spanish) on a website parents can easily navigate, along with
hands-on materials such as flashcards and CDs used in conjunction with the website. The
program has been a great success: "They are coming back to school and sharing their
enthusiasm, and they're doing it in both languages. The parents are so eager for knowledge they
can use to help their children" (p. 262).
The next step resulted from the FLUENCY project at Carnegie Mellon University,
developed by Maxine Eskenazi using a SPHINX II ASR system. It allowed "automatic
alignment of the predicted text with the incoming speech signal" (Eskenazi & Hansma, 1998) for
pronunciation error analysis and correction. The system showed improved recognition of
prosodic errors and an improved user interface. It also allowed users to select a "golden voice"
to imitate. The idea was that learners could choose a voice closer to their own as a model: males
could select a voice with a lower F0 (fundamental frequency, heard as pitch) and females a voice
with a higher F0.
In 2002, Katherine Probst, Yan Ke, and Maxine Eskenazi built upon the FLUENCY
project to further develop the concept of a golden voice. They formed two hypotheses: first,
that imitating a native speaker whose voice features were similar to the learner's would improve
the learner's speech development; and second, that letting learners choose the voice they found
most intelligible to imitate would also lead to better speech development. In addition to gender
selection, this program let users select voices with comparable pitch (F0) and rate of articulation,
which is the speed at which the articulators (the lips, tongue, and other muscles) move (Probst,
Ke, & Eskenazi, 2002).
The second hypothesis proved to be erroneous. Learners made better progress when the
system matched them to the most similar voice in the database than when they selected their own
preference. This shows that the voice that appeals to a learner may not be the best choice for
modeling (Probst et al., 2002). Surprisingly, however, choosing a voice of the opposite gender
showed a higher percentage of improvement than selecting the same gender, which was thought
to be related to voice intelligibility. The first hypothesis proved true. Learners modeling a voice
matched by the computer for similarity improved the most, showing a 43.3% increase in
pronunciation and prosody. A suitable F0 match proved to be the most important factor in
improvement, at 47.2%. It should be noted that F0 is a suprasegmental characteristic involving
right-brain speech production.
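The idea of matching a learner to the most similar golden voice can be sketched in a few lines. The Python example below is a hypothetical illustration, not the procedure used by Probst, Ke, and Eskenazi: it estimates F0 with a simple autocorrelation method (a standard textbook technique swapped in for illustration) and then picks the database voice closest to the learner in F0 and articulation rate. The Voice record, the log-scale pitch distance, the 0.5 weighting on articulation rate, and the sample voices are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Voice:
    """Hypothetical voice profile; field names are illustrative assumptions."""
    name: str
    f0_hz: float       # mean fundamental frequency (heard as pitch)
    artic_rate: float  # articulation rate, e.g. syllables per second


def estimate_f0(samples: list[float], rate: int,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Rough F0 estimate: the lag with the highest autocorrelation in [fmin, fmax]."""
    min_lag = int(rate / fmax)
    max_lag = int(rate / fmin)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return rate / best_lag


def closest_voice(learner: Voice, database: list[Voice]) -> Voice:
    """Pick the model voice nearest the learner in (log F0, articulation rate) space."""
    def distance(v: Voice) -> float:
        # Pitch difference in octaves, so equal ratios count equally for low and high voices.
        d_pitch = math.log2(v.f0_hz / learner.f0_hz)
        # The 0.5 weight on articulation rate is an arbitrary, assumed choice.
        d_rate = 0.5 * (v.artic_rate - learner.artic_rate)
        return math.hypot(d_pitch, d_rate)
    return min(database, key=distance)


if __name__ == "__main__":
    rate = 8000  # assumed sampling rate in Hz
    tone = [math.sin(2 * math.pi * 200 * i / rate) for i in range(800)]
    print(f"estimated F0: {estimate_f0(tone, rate):.1f} Hz")
    # Hypothetical database of native model voices.
    database = [Voice("A", 120.0, 4.2), Voice("B", 200.0, 5.5), Voice("C", 220.0, 4.1)]
    learner = Voice("learner", 210.0, 4.0)
    print("closest model voice:", closest_voice(learner, database).name)
```

With these sample values, the pure 200 Hz tone is estimated at 200.0 Hz, and voice C is selected because it is closest to the learner in both pitch and rate. Weighting F0 heavily reflects the finding above that F0 match was the most important factor, but the actual weights in any real system would have to be determined empirically.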
The search for the best golden voice has continued. In 2006, Kwansun Cho and John G. Harris
modified the voices of 10 Korean speakers of English, using time warping and waveform overlap
techniques to morph the Korean voices with the voices of 10 American English speakers. The
resulting voices were then judged for accentedness and found to be 8.59 percent less accented
than the original voices (Cho & Harris, 2006).
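The time warping mentioned above addresses a basic problem: a learner and a native speaker rarely say the same sentence at the same speed, so their recordings must be aligned before one can be compared with, or morphed toward, the other. Cho and Harris's exact procedure is not described here, so the following is a generic sketch of classic dynamic time warping (DTW) in Python, applied to hypothetical per-frame pitch values in Hz; real systems align multidimensional spectral features rather than a single number per frame.

```python
def dtw(a: list[float], b: list[float]) -> tuple[float, list[tuple[int, int]]]:
    """Classic dynamic time warping between two 1-D feature sequences.

    Returns (total_cost, path), where path lists (i, j) pairs aligning a[i] with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])           # local distance between two frames
            cost[i][j] = d + min(cost[i - 1][j],     # a advances, b frame repeated
                                 cost[i][j - 1],     # b advances, a frame repeated
                                 cost[i - 1][j - 1]) # both advance together
    # Backtrack from the end to recover the alignment path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # Hypothetical pitch contours (Hz), one value per analysis frame.
    native = [100.0, 120.0, 140.0, 120.0, 100.0]
    learner = [100.0, 100.0, 120.0, 140.0, 140.0, 120.0, 100.0]  # same shape, spoken more slowly
    total, path = dtw(learner, native)
    print("alignment cost:", total)
    print("learner frame -> native frame:", path)
```

Because the learner's contour in this example is simply a slowed-down version of the native contour, the alignment cost is 0.0 and the path pairs each repeated learner frame with the same native frame. Once frames are aligned this way, the corresponding portions of the two voices can be compared or blended.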
In 2009, the Euronounce Project combined Pitch Line software with the AzAR tutoring
system to integrate improved segmental and suprasegmental characteristics into a database of
German speakers of Polish (Demenko, Wagner, Cylwik, & Jokisch, 2009). The pitch contour of
the learner's speech was displayed alongside that of a model teacher's voice, and users made
graphical comparisons with the AzAR articulation diagram.
The model voice resulting from the addition of Pitch Line was judged to be "very
natural" (p. 4); however, the study concluded that "although they are promising, further
experiments are indispensable to improve the obtained acoustic models especially for accented
syllables" (p. 7). Of the 15 test subjects, 13 considered the new AzAR suitable for individual
study, and 2 were willing to use the program with teacher assistance.
An exciting new concept in the search for the golden voice was introduced in 2007. Since the
technology now existed to modify a speaker's voice, why not modify the learner's own voice and
let it become the closest acoustical model possible?
Figure 7: AzAR template for pronunciation assessment
Here we propose a voice transformation technique that can be used to generate the
(arguably) ideal voice to imitate: the own voice of the learner with a native accent. Our
work extends previous research, which suggests that providing learners with prosodically
corrected versions of their utterances can be a suitable form of feedback in computer
assisted pronunciation training. Our technique provides a conversion of both prosodic and
segmental characteristics by means of a pitch-synchronous decomposition of speech into
glottal excitation and spectral envelope. We apply the technique to a corpus containing
parallel recordings of foreign-accented and native-accented utterances, and validate the
resulting accent conversions through a series of perceptual experiments. Our results
indicate that the technique can reduce foreign accentedness without significantly altering
the voice quality properties of the foreign speaker. (Felps, Bortfeld, & Gutierrez-Osuna,
2009)
A uniquely individual approach had appeared, and the results indicated that after this
morphing technique was applied, the perception of accentedness in the learner's voice was
greatly reduced. Still, there were problems in applying the technique to pronunciation
learning: to some degree, the learner's inevitable segmental errors were transferred to the
modified voice. A later method by Ruili Wang and Jingli Lu overcame this problem. They
developed a system that morphed the features of the learner's voice with the teacher's voice in a
way that eliminated learner pronunciation errors while retaining the voice qualities of the
non-native speaker: "Because our voice modification is based on a teacher's voice, the
resynthesized utterances can be free from segmental error" (Wang & Lu, 2011). The dreamed-of
golden voice had become a reality.
The patented process is now part of SpeedLingua software, which also incorporates the
Tomatis Method (TOMATIS-Developpement, 2009). The learner engages in a receptive listening
activity for 15 minutes before beginning language learning activities. During this pre-exercise
time, the learner hears music that is gradually filtered from the sound frequencies of their native
language to those of the target language. This is intended to tune the ear to the dominant rhythm
and musical intonation of the language being practiced. SpeedLingua appears to be the only
software currently available that preconditions the right side of the brain for language learning
and then morphs the user's own voice so that learners can hear themselves speak the language
as if they were native speakers while performing the learning exercises.
While the technological development of CAPT systems is intriguing, how does that
development figure into the pedagogical aspects of language instruction? Can CAPT be truly
effective and benefit both students and teachers? First, let us look at research regarding the
efficacy of several CAPT educational software programs.
In 2003, a small study was conducted with a control group and an experimental
group consisting of 9 middle-aged engineers with multiple language backgrounds:
Arabic, Farsi (2), Hungarian, Polish (2), Romanian, Russian, and Somali. The project used
Auralog's Talk to Me software. Intelligibility improved only in learners who entered the
study with strong accentedness; those with higher-level pronunciation and prosody skills
showed no significant change. Given that the subjects practiced with the software for a total of
only 12.5 hours, this result is still impressive, especially since these were adult learners:
according to Lenneberg's Critical Period Hypothesis, age can hinder the acquisition of
pronunciation skills in a new language (Lenneberg, 1967). It could be that those with
higher-level skills had already reached their maximum potential for improvement before using
the software or that they needed more practice hours to show any significant improvement.
Pronunciation Power software was used in a study of university students in Ankara, Turkey
(Seferoglu, 2005):
This study aimed to find out whether integrating accent
reduction software in advanced English language classes at the university level would
result in improvements in students' pronunciation at the segmental and suprasegmental
levels. . . . The difference between the experimental group's pre- and posttest scores was
also found to be statistically significant. (pp. 303, 313)
A study of schoolchildren used PARLING, software developed for Italian
students learning English. Through stories and games, the children learn to pronounce new
words. The aim of the study was to determine whether learners improved their pronunciation
as effectively with the software as with traditional classroom instruction. The control group and
the experimental group showed comparable improvement, indicating that practice time with the
software was at least as effective as teacher-led instruction (Neri, Gerosa, Giuliani, & Mich, 2008).
Improvement in the application of technology to CAPT software can be seen in a recent
study of three groups of EFL students aged 22-28: a control group, a group using Pronunciation
Power (reviewed above), and a third group using NeoSpeech. NeoSpeech uses text-to-speech
technology that allows the user to input any text and hear it spoken by a
synthesized voice. The group using this technology achieved higher posttest scores than either
the control group or the group using Pronunciation Power (Kilickaya, 2008). The use of
authentic texts, which learners could enter freely, appears to have enhanced the learning of
pronunciation skills.
Other improvements for CAPT are still under development and look promising, such as
high variability phonetic training (HVPT), which exposes learners to multiple voices producing
target sounds instead of a single voice in order to improve learners' listening skills. Listening is
another area of second language acquisition that is difficult to master and requires extensive
exposure to the new language. CAPT software with HVPT could strengthen listening skills, which,
it is hoped, would in turn improve pronunciation. Theoretically, having a more native-like
perceptual system should promote gains in pronunciation accuracy (Thomson, 2011).
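The difference between single-voice and high-variability training can be illustrated with a small scheduling sketch. The Python below is a hypothetical illustration, not Thomson's (2011) program: the function name `hvpt_session`, the stimulus words, and the talker labels are assumptions. It shows only the defining property of HVPT described above, namely that each target sound is presented by several different talkers in an unpredictable order; passing a single talker would reproduce a conventional single-voice drill.

```python
import random
from itertools import product


def hvpt_session(targets, talkers, reps_per_pair=2, seed=None):
    """Build one shuffled training block of (target, talker) trials.

    Hypothetical helper: every target sound is presented by every talker,
    so the learner hears each category produced by many voices, not one."""
    rng = random.Random(seed)
    trials = [pair for pair in product(targets, talkers) for _ in range(reps_per_pair)]
    rng.shuffle(trials)  # randomize order so neither voice nor target is predictable
    return trials


if __name__ == "__main__":
    # Hypothetical stimuli: four English vowels, each recorded by four talkers.
    vowels = ["beat", "bit", "bait", "bet"]
    talkers = ["talker_1", "talker_2", "talker_3", "talker_4"]
    session = hvpt_session(vowels, talkers, reps_per_pair=2, seed=42)
    print(len(session))  # 32 trials: 4 vowels x 4 talkers x 2 repetitions
    for target, talker in session[:5]:
        print(f"play {talker} saying '{target}', then ask the learner to identify the vowel")
```

Randomizing the order is one plausible way to keep learners attending to the cues shared across talkers rather than to the quirks of a single familiar voice.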
In the HVPT pilot study (Thomson, 2011), users were able to take the listening skills they
acquired and discern meaning from the voices of speakers to whom they had not been previously
exposed. This transfer of skills would be of great benefit to ELLs in real life conversations.
The CAPT program developed for this study resulted in improved intelligibility
scores not only in response to English vowel productions elicited using a voice that
had previously been heard in training, but also in response to productions elicited
using a novel voice. These results suggest that the program helped learners isolate
relevant phonetic cues to vowel identity that were then generalizable to new
speakers. (p. 758)
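One way to make the trained-voice versus novel-voice comparison concrete is to score listener judgments separately for each eliciting voice. The sketch below is a simplified, hypothetical scoring routine; the `Production` record, its field names, and the example data are invented for illustration and are not Thomson's (2011) procedure or results. Intelligibility here is simply the proportion of a learner's vowel productions that listeners identified as the intended vowel.

```python
from dataclasses import dataclass


@dataclass
class Production:
    prompt_voice: str  # "trained" or "novel": voice used to elicit the learner's word
    intended: str      # vowel the learner was asked to produce
    identified: str    # vowel a listener reported hearing


def intelligibility_by_voice(productions):
    """Return the proportion of productions identified as intended, per prompt voice."""
    totals, correct = {}, {}
    for p in productions:
        totals[p.prompt_voice] = totals.get(p.prompt_voice, 0) + 1
        correct[p.prompt_voice] = correct.get(p.prompt_voice, 0) + int(p.identified == p.intended)
    return {voice: correct[voice] / totals[voice] for voice in totals}


if __name__ == "__main__":
    # Invented example data, for illustration only (not results from any study).
    data = [
        Production("trained", "beat", "beat"),
        Production("trained", "bit", "bit"),
        Production("trained", "bait", "bet"),
        Production("trained", "bet", "bet"),
        Production("novel", "beat", "beat"),
        Production("novel", "bit", "beat"),
        Production("novel", "bait", "bait"),
        Production("novel", "bet", "bit"),
    ]
    print(intelligibility_by_voice(data))  # {'trained': 0.75, 'novel': 0.5}
```

If scores for novel-voice prompts approach those for trained-voice prompts, that would be consistent with the kind of generalization Thomson reports; an actual study would also need checks on listener agreement and appropriate statistical testing.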

Looking at the conclusions of all the studies referred to previously, several themes
concerning the benefits and limitations of CAPT appear. CAPT is viewed as an adjunct to the
teacher, not a replacement. CAPT's ability to provide drill practice is seen as an asset that frees
teacher time to focus on specific pronunciation problems of individual students that CAPT does
not address. As students develop less dependence on the teacher for pronunciation practice, more
classroom time can be spent on interactive communication that develops conversational skills.
One teacher commented on what she liked most about CAPT:
It takes the personal element out of the feedback. Instead of telling
students "no" and making them repeat over and over again, we were
instead able to give them a positive goal to work towards. (Pearson,
2011)
Students benefited from the computer's consistent feedback and endless patience. Other
frequently mentioned assets were the learners' ability to work at their own pace, select from a
variety of function options, and choose activities that tailor a program to their individual needs.
CAPT's flexibility allowed learners to make adjustments that suited their own
learning speed:
In our experiments, we also noticed that some of the subjects, who preferred a slow
version of speech material, tended to speed up the speech material a little or switch it
back to the normal speed, when they had caught the pronunciation features in these
utterances. This tendency reflects the fact that their objectives of second language
learning are to perceive and produce natural speech with a regular speed. (Wang & Lu,
2011)

A stress-free environment was another frequently noted benefit. Since CAPT provides
private instruction, correction in front of peers can be avoided. This reduces stress for students
by lessening the fear of making errors in public, which can inhibit them from taking the language
risks necessary for learning. Teachers who feel ill-equipped to teach pronunciation also appreciate
assistance from CAPT, so their stress level is reduced as well. CAPT can be especially helpful in
EFL instruction to make up for the lack of exposure to an English-speaking environment.
CAPT does have limitations. These are being overcome so steadily that only the
comments from the three most recent studies examined in this review will be considered. All
three raised concerns about user-friendly feedback. Except in a general way, it is difficult for
students to take the raw data provided by waveforms and spectrograms and turn it into
meaningful information that guides the needed changes in pronunciation or prosody: "This type
of feedback is not in line with the requirement that feedback should first of all be easy to
comprehend" (Neri et al., 2009). Hismanoglu echoes this concern and suggests that CAPT could be
used more effectively by following his recommendation: "Teachers should be able to comprehend
spectrograms, waveforms, and fundamental frequency contours to analyze students' articulations
of target language words" (Hismanoglu, 2011). In addition, Demenko et al. (2009) list a
number of technical difficulties: weak speech signals, no extrapolation for voiceless sounds, and
not entirely correct or reliable fundamental frequency (F0) extraction.
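The F0 difficulties listed above can be understood by looking at how a basic pitch tracker works. The sketch below is a generic autocorrelation-based F0 estimator, not the algorithm used in the system of Demenko et al. (2009); the function name `estimate_f0`, the 75-400 Hz search range, and the 0.5 voicing threshold are illustrative assumptions. It finds the lag at which a short frame of audio best matches a shifted copy of itself; when no lag matches well, as with a voiceless sound such as /s/, the frame is treated as unvoiced and no F0 value is reported, which is one reason an intonation contour can contain gaps.

```python
import math
import random


def estimate_f0(frame, sample_rate, fmin=75.0, fmax=400.0, voicing_threshold=0.5):
    """Estimate F0 (Hz) of one audio frame by autocorrelation (illustrative sketch).

    Returns None when the frame looks unvoiced (no strong periodicity)."""
    if not frame:
        return None
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]          # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None                        # silence: nothing to analyze
    n = len(x)
    min_lag = int(sample_rate / fmax)      # shortest period considered
    max_lag = min(int(sample_rate / fmin), n - 1)  # longest period considered
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Normalized autocorrelation: values near 1.0 mean the frame repeats at this lag.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None                        # weak periodicity: treat as voiceless
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 16000
    # A 40 ms synthetic "vowel": a pure 200 Hz tone.
    voiced = [math.sin(2 * math.pi * 200 * i / sr) for i in range(640)]
    # A 40 ms burst of random noise, standing in for a voiceless fricative.
    rng = random.Random(0)
    voiceless = [rng.uniform(-1.0, 1.0) for _ in range(640)]
    print(estimate_f0(voiced, sr))     # 200.0
    print(estimate_f0(voiceless, sr))  # None
```

The fixed voicing threshold also hints at why extraction can be unreliable with real speech: set too high, weak or breathy voicing is dropped; set too low, noise can produce spurious pitch values.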
Hopefully, these technological issues will be remedied as advancements continue to be
made. While the prospect of making CAPT even more effective than it has already proven to be
is attractive, training teachers in acoustic analysis is probably neither cost- nor time-efficient.
Budget constraints would make the expense of such training difficult to meet. The chore of
analyzing the database of each student's practice record seems overly burdensome given the
already demanding schedules that teachers face. It might be better to subcontract this task to
linguistic experts who are well versed in reading waveforms and spectrograms and to rely on their
recommendations to the teacher, but that creates another financial hurdle. Considering
the fast pace of advancements in CALL technology over the last decade, improvement in
feedback may be just around the corner and worth the wait.
The future of CAPT software appears promising as researchers look at developing CAPT
applications that will reflect
... a current understanding of how L2 pronunciation develops. In particular, it [this study]
attempted to address constraints stemming from interactions between L1 and L2
categories, while also increasing the quantity and quality of phonetic experience beyond
what is typically available to adult learners. (Thomson, 2011)
Thomson also looked at the accessibility that web-based programs would offer and the
learning potential that would be possible through wireless mobile devices. These would also open
new avenues for research through remote monitoring. The potential for CAPT to incorporate
innovative research-based techniques is enormous and still in its infancy (p. 759). Researchers
and teachers could collaborate on designing software programs, implementing them on
platforms that many ESL teachers and computer lab instructors already use. For
example, the functionality needed for listening to sound files and selecting from among
response alternatives is available in many popular Learning Management Systems, such
as Moodle or Sakai. (p. 760)
Most of the software researchers are using to develop CAPT programs is available as
freeware. With some effort, teachers could create their own software programs tailored to the
needs of their students. Some CAPT software even offers a teacher authoring feature that lets
teachers add their own lesson materials.
The significance of stimulating right-brain thinking for the processing of suprasegmentals
and the ability of CAPT to facilitate that process through aural, visual, and kinesthetic modalities
have been made evident in this review. The importance of prosody in making language intelligible
to the listener, and research regarding the efficacy of CAPT for improving both the pronunciation
of the segmental aspects of speech and the production of the suprasegmental elements of speech,
have been delineated. The service that accent reduction training can render in opening better job
opportunities and career advancement for non-native speakers has been identified. Hopefully,
accent reduction training will help the business world embrace the diversity that people from a
variety of cultures can offer. The use of CAPT as a tool to bridge the gap between students' needs
and the limitations of a traditional classroom setting has been clarified. To assist administrators
and teachers in selecting appropriate CAPT software, an evaluation rubric of currently available
CAPT software is provided in Appendix 1.


Bibliography

Akomolafe, S. (2013). Foreign-accented speakers and. Journal of Cultural Diversity, 20(1), 49.
Ariza, E. N., Morales-Jones, C. A., Yahya, N., & Zainuddin, H. (2000). Why TESOL? Theories & issues in teaching English to speakers of other languages in K-12 classrooms. Dubuque, IA: Kendall Hunt.
Atkins, C. P. (2000). Do employment recruiters discriminate on the basis of nonstandard dialect? Journal of Employment Counseling, 30 (September 1993), 108-119.
Avery, P., & Ehrlich, S. (2006). Teaching American English pronunciation. Oxford: Oxford University Press.
Bruno, D. (2002). The brain from top to bottom: Language processing areas in the brain. Retrieved from http://thebrain.mcgill.ca/flash/d/d_10/d_10_cr/d_10_cr_lan/d_10_cr_lan.html
Burns, C., Barton, K., & Kerby, S. (2012). The state of diversity in today's workforce (pp. 1-7). Washington, D.C. Retrieved from http://www.americanprogress.org/wp-content/uploads/issues/2012/07/pdf/diversity_brief.pdf
Cargyle, A. (2000). Evaluations of employment suitability. Journal of Employment Counseling, 37, 165-177.
Carr, S. (2012). Improving communication through accent modification. Journal of Cultural Diversity, 19(3).
Census. (2011). Language other than English spoken at home. American FactFinder. Retrieved from http://factfinder2.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t
Cheng, H. (2012). Accent reduction classes now in demand. The New York View. Retrieved October 26, 2013, from http://newyorkview.net/2012/08/accent-reduction-classes-now-in-demand/
Cho, K., & Harris, J. G. (2006). Towards an automatic foreign accent reduction tool. In Speech Prosody (pp. 2-5).
Courinho, E. (2013). Psychoacoustic cues to emotion in speech prosody and music. Cognition and Emotion, 27(4), 658-684.
Cummins, J. (2007). Literacy, technology, and diversity. Boston: Pearson Education, Inc.
Demenko, G., Wagner, A., Cylwik, N., & Jokisch, O. (2009). An audiovisual feedback system for acquiring L2 pronunciation and L2 prosody. In 2nd ISCA Workshop on Speech and Language Technology in Education (SLaTE) (Vol. 2, pp. 2-5).
Derwing, T. M. (2003). What do ESL students say about their accents? The Canadian Modern Language Review, 59(4), 547-566.
Derwing, T. M., & Waugh, E. (2012). IRPP study (p. 36). Quebec.
Eskenazi, M., & Hansma, S. (1998). The fluency pronunciation trainer. In Proceedings of the STiLL Workshop (p. 6). Pittsburgh: Language Technologies Institute, Carnegie Mellon University. Retrieved from http://www.cs.cmu.edu/~max/mainpage_files/Esk-Hans-98.pdf
Felps, D., Bortfeld, H., & Gutierrez-Osuna, R. (2009). Foreign accent conversion in computer assisted pronunciation training. Speech Communication, 51(10), 920-932. doi:10.1016/j.specom.2008.11.004
Friederici, A. D. (2011). The brain basis of language processing: From structure to function. Physiological Reviews, 91(4), 1357-1392. doi:10.1152/physrev.00006.2011
Friederici, A. D., & Alter, K. (2004). Lateralization of auditory language functions: A dynamic dual pathway model. Brain and Language, 89(2), 267-276. doi:10.1016/S0093-934X(03)00351-1
Gardner, J. (2005). Barriers influencing the success of racial and ethnic minority students in nursing programs. Journal of Transcultural Nursing, 16(2), 155-162. doi:10.1177/1043659604273546
Godwin-Jones, R. (2009). Emerging technologies: Speech tools and technologies. Language Learning & Technology, 13(3), 4-11.
Hismanoglu, M. (2011). Computer assisted pronunciation teaching: From past to present. In 4th International Online Language Conference (pp. 193-203).
Holland, V. M., Kaplan, J. D., & Sabol, M. A. (1999). Preliminary tests of language learning in a speech-interactive graphics microworld. In CALICO (Vol. 16, pp. 339-360). Miami: CALICO Journal.
Jackson, C. N., & O'Brien, M. G. (2011). The interaction between prosody and meaning in second language speech production. Die Unterrichtspraxis/Teaching German, 44(1), 1-11. doi:10.1111/j.1756-1221.2011.00087.x
Khurana, P. (2013). Efficacy of accent modification training for international medical professionals. Journal of University Teaching & Learning Practice, 10(2), 13.
Kilickaya, F. (2008). Improving pronunciation via accent reduction and text-to-speech software. In World CALL International Conference (pp. 3-5).
LanguageOnLine. (2001). The history of TeLL Me More. Innovation for Language Learning. Retrieved from http://www.languageonline.in.th/history_en.htm
Lenneberg, E. H. (1967). Biological foundations of language. New York: John Wiley and Sons.
Lev-Ari, S., & Keysar, B. (2010). Why don't we believe non-native speakers? The influence of accent on credibility. Journal of Experimental Social Psychology, 46(6), 1093-1096. doi:10.1016/j.jesp.2010.05.025
Levis, J., & Levelle, K. (2011). Pronunciation and intelligibility: Issues in research and practice. Proceedings of the 2nd Annual Pronunciation in Second Language Learning and Teaching Conference (pp. 56-69). Iowa State University.
Massaro, D., Cohen, M., Gesi, A., & R. H. (1993). Bimodal speech perception: An examination across languages. Journal of Phonetics, 21, 445-478.
McGregor, A. (2002). Pronunciation software review.
Meng, H. (2009). Developing speech recognition and synthesis technologies to support computer-aided pronunciation training for Chinese learners of English. In 23rd Pacific Asia Conference on Language (pp. 40-42).
Munro, M. J., & Derwing, T. M. (2000). Foreign accent, comprehensibility, and intelligibility in the speech of second language learners.
Neri, A., Cucchiarini, C., Strik, H., & Boves, L. (2009). The pedagogy-technology interface in computer assisted pronunciation training. In Computer assisted language learning: Critical concepts in linguistics (Vol. I-IV, pp. 140-164). doi:10.1076/call.15.5.441.13473
Neri, A., Gerosa, M., Giuliani, D., & Mich, O. (2008). The effectiveness of computer assisted pronunciation training for foreign language learning by children. Computer Assisted Language Learning. doi:10.1080/09588220802447651
Nguyen, L. T. (2010). Employment decisions as a function of an applicant. San Jose State University.
Nygaard, L. C., Herold, D. S., & Namy, L. L. (2009). The semantics of prosody: Acoustic and perceptual evidence of prosodic correlates to word meaning. Cognitive Science, 33(1), 127-146. doi:10.1111/j.1551-6709.2008.01007.x
Pearson, P. (2011). In Pronunciation and intelligibility: Issues in research and practice. Proceedings of the 2nd Annual Pronunciation in Second Language Learning and Teaching Conference (p. 169). Iowa State University.
Probst, K., Ke, Y., & Eskenazi, M. (2002). Enhancing foreign language tutors: In search of the golden speaker. Speech Communication, 37(3-4), 161-173. doi:10.1016/S0167-6393(01)00009-7
Qooco. (2009). About ASR. Qooco Chinese Learning. Retrieved February 11, 2013, from http://www.qoocochinese.com/web/help_4.htm
Romer-Trillo, J. (2012). Pragmatics and prosody in English language teaching. Educational Linguistics, 15, 2314.
Ryan, C. (2013). Language use in the United States: 2011 (p. 16). Washington, D.C. Retrieved from http://www.mla.org/map_main
Saito, Y., Fukuhara, R., Aoyama, S., & Toshima, T. (2009). Frontal brain activation in premature infants' response to auditory stimuli in neonatal intensive care unit. Early Human Development, 85(7), 471-474. doi:10.1016/j.earlhumdev.2009.04.004
Sankin, S. (2013). Accent reduction training demand is increasing. PRWeb. Retrieved October 26, 2013, from http://www.prweb.com/releases/accent-reduction-nyc/regional-accents/prweb10689963.htm
Schaetzel, K. (2009). Teaching pronunciation to adult English language learners. CAELA Network Brief. Retrieved from www.cal.org/caelanetwork
Seferoglu, G. (2005). Improving students' pronunciation through accent reduction software. British Journal of Educational Technology, 36(2), 303-316.
Shah, J., Seifeldin, R., & H. A. (2010). International medical graduates in American medicine: Contemporary challenges and opportunities. American Medical Association. Retrieved from http://www.ama-assn.org/ama1/pub/upload/mm/18/img-workforce-paper.pdf
Thomson, R. (2011). Computer assisted pronunciation training: Targeting second language vowel perception improves pronunciation. CALICO Journal, 28(3), 744-766.
TOMATIS-Developpement. (2009). The TOMATIS Method, a teaching process for listening (p. 14). Luxembourg: Tomatis Developpement S.A.
Wang, R., & Lu, J. (2011). Investigation of golden speakers for second language learners from imitation preference perspective by voice modification. Speech Communication, 53(2), 175-184. doi:10.1016/j.specom.2010.08.015
Wendy, B. (2007). Learning prosody and fluency characteristics of second language speech: The effect of experience on child learners' acquisition of five suprasegmentals. Applied Psycholinguistics. Retrieved from http://0-search.proquest.com.library.acaweb.org/docview/200859527/fulltextPDF/1414C9067752B26F3C2/3?accountid=9900


Logo Illustrations

Talk to Me. Copyright 2013, Informer Technologies, Inc. Retrieved November 3, 2013, from http://softadvice.informer.com/Talk_To_Me_Auralog.html
Pronunciation Power CD-ROM for Mac and Windows. Copyright 1995-2013, ESL.net. Retrieved September 15, 2013, from http://www.esl.net/pronunciation_power.html
NeoSpeech. Copyright 2013, www.neospeech.com. All rights reserved. Retrieved November 3, 2013, from http://www.neospeech.com/
SpeedLingua. Copyright 2010. Retrieved November 13, 2013, from http://www.learnissimo.com
