Angus Fung
Department of Linguistics
University of Calgary
I would like to express my gratitude to all those who helped me to complete this thesis. I am deeply
indebted to my supervisor, Prof. Dr. J. Archibald, whose help, suggestions and encouragement supported
me throughout the writing of this thesis. I extend my thanks to him for his countless hours of
discussion and commentary.
1. Introduction
Second language acquisition is the phrase used to describe the process that
people go through when confronted by a need to use a language other than their native
one for communication. People acquire their first and second languages differently.
Some of the issues and processes involved in language acquisition include the idea of
innateness (Is language ability determined genetically?), the relevance of the language
input the language learner receives, and the nature of early (developmental) grammars
(O'Grady et al., 1989). In this paper, I am going to address a number of issues that
have to do with the acquisition of voicing and aspiration contrasts in a second language
(L2). My major focus will be on what Cantonese-speaking learners do when they are
learning English stops. I will also look at a few other languages and their acquisition of
the two languages. I will also examine the phonological differences between the two
level, substitution by a related sound in the native language, deletion and epenthesis
one of the major dialects of Chinese and the language belongs to the Sino-Tibetan
language family. It is spoken in Guangdong (including Hong Kong), Macau, and in the
southern part of Guangxi (Figure 1). On the other hand, English is a Germanic language
Figure 1: Map of Guangdong Province (Wertz, 2003)
2. Phonetics of VOT
In this section, I will take a look at the production and perception of stops in
terms of voice onset time, one of the cues that signal voicing and aspiration
contrasts in stops.
2.1 Articulation
Voice Onset Time (VOT) is the duration of the period of time between the
release of a plosive and the beginning of vocal fold vibration. This period is usually
measured in milliseconds (ms). It is useful to distinguish at least three types of
Figure 2 shows the waveforms of the two plosives “t” and “d”. The arrows
indicate the release burst of the stop consonant and the onset of glottal vibrations for
the vowel. Clearly, the VOT is longer for the voiceless than for the voiced stop. This is
due to glottal abduction, which is the opening of the vocal folds for the voiceless
stop, and its temporal relationship to the oral closing and opening movements.
Figure 2. Waveform of English [t, d], each followed by the vowel
[a]. The y-axis represents amplitude; the x-axis is time (1.5 s overall). Morton, K.
(1995)
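The measurement described above can be sketched in a few lines. This is only an illustration: the function name and the landmark times are hypothetical and are not read off Figure 2. It assumes the release burst and the onset of voicing have already been located on the waveform, in seconds.

```python
def voice_onset_time_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds.

    Positive values mean voicing starts after the release (voicing lag);
    negative values mean voicing starts before the release (voicing lead).
    """
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical landmark times (seconds) for a [ta]-like and a [da]-like token.
t_like = voice_onset_time_ms(release_s=0.300, voicing_onset_s=0.370)
d_like = voice_onset_time_ms(release_s=0.900, voicing_onset_s=0.915)

print(round(t_like), round(d_like))
```

With these made-up landmarks the voiceless token has the longer VOT, which is the pattern the figure is meant to illustrate.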
When a plosive has a fairly long positive VOT (longer than about
50 ms), the air from the lungs travels quite quickly through the vocal tract. It is
slowed down neither by the vocal folds, which are open, nor by a constriction in
the vocal tract, because the plosive has been released. The rapid airflow creates a weak
friction noise. When a voiceless unaspirated plosive is followed by a vowel, the time
when the vocal folds begin vibrating for the vowel will coincide almost exactly with
the time when the plosive is released (give or take up to 20 milliseconds). After a
voiceless aspirated stop, however, the vocal folds will not begin vibrating until well
The production of stops is not always uniform in terms of VOT, but when
a language has two or more contrasting stops, for example /t/ and /d/ in
English, each stop is produced within a particular range of VOT. The
following graph (Figure 3) shows the production of a speaker of American English
for words beginning with /d/ or /t/. The production of /d/ ranges from 0 ms to 25 ms and
Figure 3. VOT production of a single normal adult speaker of American English for
words beginning with /d/ and /t/ (Blumstein et al., 1980).
These are just two different possible ways of coordinating the timing between
vocal fold vibration and a closure in the mouth. Various languages make use of many
points along this VOT continuum. In the following diagrams (Figure 4), the top half
represents the closing and opening of a plosive in the mouth and the bottom half
represents the state of the vocal folds -- a straight line means voicelessness and a
[Figure 4: lip closure and vocal fold state for 2 partially voiced /ba/, 3 voiceless unaspirated /pa/, 4 aspirated /pha/]
Languages that make voicing contrasts usually choose two or three points
along this continuum (Abramson & Lisker, 1970). English has chosen to use position
2 for its voiced sounds and either 3 or 4 (depending on position in the word or
syllable) for voiceless sounds. French has chosen to use 1 (fully voiced) and 3
The perception of the voicing and aspiration contrasts (e.g. /p/ vs. /b/, /ph/ vs.
/p/) in stops depends on acoustic cues such as VOT. We usually do not perceive
stimuli categorically (Kess 1992). For example, we do not see a colour spectrum from
blue to red as either pure blue or pure red and nothing in between. A colour can be
kind of blue and kind of green, whereas a stop cannot be kind of [d] and kind of [t]; it
One of the things that people seem to perceive categorically is speech. This is
you get a percept that perfectly matches an example of a particular category. So even
when the physical stimuli change continuously, people would still perceive it
categorically. For example, both /b/ and /p/ are stop consonants and to produce these,
you close your lips, then open them, release some air, and the vocal cords begin
vibrating. The difference between /ba/ and /pa/ is the different VOT of the two stops.
For /b/, VOT is very short; voicing begins at almost the same time as the air is
To show the categorical perception of stops, a study by Pisoni & Tash (1974)
used a series of synthetic stimuli that span the VOT continuum between /ba/ and /pa/.
When people were asked to identify these stimuli, they generally had no difficulty:
the lower half of the continuum was consistently identified as /ba/ and the upper half as
/pa/, as shown in Figure 5. People did not report hearing something that is a bit like [b]
and a bit like [p]. Rather, they reported hearing either [b] or [p]. Thus, the actual VOT
of the individual stimulus appears to be discarded, and all that remains in the percept
is category membership.
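The step-like identification pattern can be illustrated with a steep logistic function. This is a minimal sketch and not the model used by Pisoni & Tash (1974): the boundary of 25 ms, the slope, and the 10 ms step size of the continuum are all assumptions chosen for illustration.

```python
import math

BOUNDARY_MS = 25.0  # assumed category boundary, for illustration only
SLOPE = 0.5         # assumed steepness of the identification function (per ms)


def prob_pa(vot_ms: float) -> float:
    """Probability of a /pa/ response under a logistic identification function."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))


def identify(vot_ms: float) -> str:
    """Report only category membership; the VOT value itself is discarded."""
    return "pa" if prob_pa(vot_ms) >= 0.5 else "ba"


# A synthetic continuum from 0 ms to 60 ms in 10 ms steps (hypothetical step size).
continuum = list(range(0, 61, 10))
labels = [identify(v) for v in continuum]
print(list(zip(continuum, labels)))
```

Although the input changes in equal steps, the output takes only two values, which is the sense in which the percept is categorical.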
people to distinguish all speech sounds. Generally, they can only distinguish the
speech sounds that result in meaningful differences in their native language. To find
out whether infants can discriminate different speech sounds, Eimas et al. (1971)
tested two groups of infants who were 1 month and 4 months of age.
The results showed that infants at both ages distinguished sounds that were members of
separate phonemes (i.e. categories) from one another, but they failed to distinguish
sounds within a given category. The study also shows that infants can distinguish
speech sounds before they can produce them. Figure 6 shows the result of this
experiment. Infants perceived stops with VOTs of –20 ms and 0 ms as the
same stop; the same is true of stops with VOTs of 60 ms and 80 ms. But for the stops with
Figure 6. Experimental design of infant discrimination study (Eimas et al., 1971).
Note: S = perceived as the same stop; D = perceived as two different stops.
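The S/D coding in Figure 6 follows from one simple rule: two stimuli are heard as different only if they fall on opposite sides of a category boundary. The sketch below applies that rule. The boundary value of 30 ms and the 20/40 ms pair are assumptions for illustration; only the –20/0 ms and 60/80 ms pairs are taken from the text.

```python
def predicted_response(vot_a_ms: float, vot_b_ms: float, boundary_ms: float = 30.0) -> str:
    """Return 'S' if both stimuli fall in the same category, otherwise 'D'.

    boundary_ms is a hypothetical category boundary, not a value from the study.
    """
    same_side = (vot_a_ms < boundary_ms) == (vot_b_ms < boundary_ms)
    return "S" if same_side else "D"


print(predicted_response(-20, 0))   # both below the assumed boundary
print(predicted_response(60, 80))   # both above the assumed boundary
print(predicted_response(20, 40))   # hypothetical pair straddling the boundary
```

Each pair differs by the same 20 ms, so any difference in the predicted response comes from category membership alone.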
In the next section, we are going to look at the differences between English and
Cantonese phonological systems, as this would help us to account for problems and
pronunciation.
3. Phonology of Syllables
English and Cantonese both have six stops at the bilabial /p, b/, alveolar /t, d/, and
velar /k, g/ places of articulation. In English, /p, t, k/ are voiceless whereas /b, d, g/ are voiced. In
Cantonese, however, there are no voiced plosives; all plosives are voiceless. The
feature that distinguishes between the stops is aspiration (/p, t, k/ vs. /ph, th, kh/).
Table 1. An overview of English and Cantonese consonants (Chan & Li, 2000).
(C) = Cantonese; (E) = English. Consonants are listed by place of articulation in the order
bilabial, labio-dental, dental, alveolar, palatal-alveolar, palatal, velar, labio-velar, glottal.
Method of articulation    Consonants
Plosives       (C) p, ph; t, th; k, kh; kw, kwh
               (E) p, b; t, d; k, g
Fricatives     (C) f; s; h
               (E) f, v; s, z; h
Affricates     (C) ts, tsh
               (E) t, t
Nasals         (C) m; n
               (E) m; n
Lateral        (C) l
               (E) l
Approximants   (C) w; j
               (E) w; r; j
three consonants before a vowel and a maximum of four consonants after a vowel. One
such example is ‘strengths’ /strɛŋkθs/. The syllable structure of Cantonese, in
contrast, is rather simple; the possible combinations of sounds are restricted. Unlike
limited to V, CV, VC, and CVC. Examples are given in Table 2 below:
Table 2
Syllable structure Examples
V // _ ‘exclamation showing surprise’
CV /fu/ _ ‘husband’
VC / an/ _ ‘late’
CVC /sik/ _ ‘colour’
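The restriction to V, CV, VC and CVC can be checked mechanically by reducing a form to its consonant-vowel skeleton. The following is a rough sketch under simplifying assumptions: the vowel set is a toy set of plain letters, segments are given as a list (so a digraph such as "ng" or "th" counts as one consonant), and the function names are my own.

```python
CANTONESE_TEMPLATES = {"V", "CV", "VC", "CVC"}
VOWELS = {"a", "e", "i", "o", "u"}  # simplified vowel set, for illustration only


def skeleton(segments: list[str]) -> str:
    """Map each segment to C or V, e.g. ['s', 'i', 'k'] -> 'CVC'."""
    return "".join("V" if seg in VOWELS else "C" for seg in segments)


def fits_cantonese(segments: list[str]) -> bool:
    """True if the skeleton is one of the four syllable types in Table 2."""
    return skeleton(segments) in CANTONESE_TEMPLATES


print(skeleton(["f", "u"]), fits_cantonese(["f", "u"]))
print(skeleton(["s", "i", "k"]), fits_cantonese(["s", "i", "k"]))

# English 'strengths': three consonants before the vowel and four after it.
strengths = ["s", "t", "r", "e", "ng", "k", "th", "s"]
print(skeleton(strengths), fits_cantonese(strengths))
```

A form such as 'strengths' fails the check, which is why a Cantonese learner has to repair it in some way.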
word-final position are always unreleased, for example in the words ‘duck’ /ap/,
‘prosper’ /fat/, and ‘house’ /uk/. In English, by contrast,
unreleased stops occur only in connected speech, when a word-final stop is followed
by a word beginning with a stop. For example, the word-final [p] of the word “cup” is
4. Explaining L2 Behavior
languages, I would like to review the behavior of L2 learners and how we can predict what
they will do when the target forms are not found in their native language.
why certain target forms are more difficult to acquire than others. One of the earliest
was the Contrastive Analysis Hypothesis (Lado, 1957). This hypothesis stated that
when two languages are similar, positive transfer will occur and hence those forms will
be easy to learn; where they are different, negative transfer or interference will result
and those forms will be difficult to acquire. However, it turns out that defining
similarity and difference is not always easy. Some researchers (Eckman & Iverson
1993, 1994) suggested that typological markedness be the basis of prediction.
Structures that are simple and/or especially common in human language are said to be
unmarked, while structures that are more complex or less common are said to be
B does not necessarily imply the presence of A." In other words, when a language has
voiced stops e.g. [d], we would expect that the language would have a voiceless
counterpart, [t] but not vice versa. From that, we could say that voiced stops are more
Sometimes something that is not in your L1 can be easy to acquire, e.g. English
does not make a contrast between [] and [] in word-initial position, but English
speakers seem able to make the contrast in French onsets without trouble. The
which there are differences between a target and a native language, the degree of
difficulty will be greater when the area of difficulty is more marked in the native
language and smaller when the degree of markedness is smaller. The degree of difficulty
among those target language (TL) structures that are different from those in the native
language (NL) will correspond directly to the degree of markedness The two
In (3), the presence of nasal vowels implies the presence of oral vowels but not
vice versa. There are languages which have [a] and [ã], and languages which have [a] alone,
but there are no languages which have [ã] but not [a]. From that, we know that nasal
vowels are more marked than oral vowels and so we would predict that the degree of
difficulty is higher when there are nasal vowels in the target language but not the
native language.
(3) [ã] implies [a]
∴Nasal vowels are more marked than oral vowels. Hence, the prediction of the
MDH would be that nasal vowels are more difficult to acquire.
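The reasoning in (3) can be written out as a small check. This is only a sketch of the logic as I have stated it, not a formalization taken from Eckman: the feature labels, the `IMPLIES` table, and the function names are my own, and the rule for difficulty is reduced to "absent from the NL and the marked member of an implicational pair".

```python
# marked structure -> the unmarked structure whose presence it implies
IMPLIES = {
    "nasal_vowel": "oral_vowel",
    "voiced_stop": "voiceless_stop",
}


def is_possible_inventory(inventory: set[str]) -> bool:
    """An inventory is ruled out if it has a marked structure without its unmarked partner."""
    return all(unmarked in inventory
               for marked, unmarked in IMPLIES.items()
               if marked in inventory)


def predicted_difficult(tl_structure: str, nl_inventory: set[str]) -> bool:
    """Simplified MDH: difficult if the TL structure is absent from the NL and is marked."""
    return tl_structure not in nl_inventory and tl_structure in IMPLIES


print(is_possible_inventory({"oral_vowel", "nasal_vowel"}))  # oral and nasal vowels
print(is_possible_inventory({"oral_vowel"}))                 # oral vowels alone
print(is_possible_inventory({"nasal_vowel"}))                # nasal vowels alone
print(predicted_difficult("nasal_vowel", {"oral_vowel"}))
```

The third inventory is the one that (3) excludes, and the last line reproduces the prediction that nasal vowels are difficult for a learner whose native language has only oral vowels.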
On the other hand, those TL differences that are not more marked will not be
difficult. MDH can explain several major patterns of difficulties found in second
language acquisition. Now that we know what kinds of target forms are difficult for L2
learners, we will discuss what L2 learners do when the target forms are difficult to
acquire.
When they are learning English, they would produce words with syllable final
obstruent devoicing (producing [hæt] for [hæd] “had”) because they have no voicing
can be found in all languages in the world. In order for Cantonese speakers to
pronounce the target English items, Cantonese speaker would adopt a number of
Epenthesis is one of the strategies Cantonese speakers use. A vowel, usually a schwa
/ə/, is inserted within a consonant cluster or after a final consonant of the syllable.
Another repair strategy is deletion. In this case, coda consonants or one of the
consonants in a cluster are deleted in order to obtain the more optimal syllable (CV). The
replacement or substitution. This strategy doesn’t alter the syllable structure and it
appears quite frequently in final voiced stops (Edge, 1991). For Cantonese L2 learners
of English, most of the errors found in these items involve the voice feature.
Devoicing is the most common error in final voiced stops. The following examples illustrate
maintain the relatively marked, closed CVC structure. Even though both deletion and
epenthesis convert the relatively complex CVC syllable into relatively simple CV
syllables, their outcomes differ as to what degree of ambiguity they impose on a word.
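The three repair strategies can be compared side by side on a single target. The sketch below applies each of them to the final voiced stop of “had”; the segment lists and function names are my own simplification, and the schwa is used as the epenthetic vowel as described above.

```python
DEVOICE = {"b": "p", "d": "t", "g": "k"}  # voiced stop -> voiceless counterpart


def epenthesis(segments: list[str]) -> list[str]:
    """Insert a schwa after the final consonant (CVC -> CV.CV)."""
    return segments + ["ə"]


def deletion(segments: list[str]) -> list[str]:
    """Delete the coda consonant (CVC -> CV)."""
    return segments[:-1]


def devoicing(segments: list[str]) -> list[str]:
    """Replace a final voiced stop by its voiceless counterpart (CVC stays CVC)."""
    return segments[:-1] + [DEVOICE.get(segments[-1], segments[-1])]


had = ["h", "æ", "d"]
for strategy in (epenthesis, deletion, devoicing):
    print(strategy.__name__, "".join(strategy(had)))
```

Epenthesis keeps every target segment and deletion loses one, which is the sense in which the two strategies differ in the ambiguity they impose on the word; devoicing leaves the CVC shape intact.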
a theory of universal grammar” languages, native speakers, and language learners avoid
or minimize ambiguity. Young children frequently delete segments in both onset and
coda position but very rarely make use of epenthesis. This is because their phonetic
ability is low and their functional knowledge (in terms of the recoverability principle)
is not yet developed. Adults learning L2s seem to exhibit far more instances of
epenthesis than children acquiring their L1. The reason why epenthesis is a more
(1994), that even though adults’ phonetic skills in the target language lag behind that
3- to 5-week intervals from August 1990 to May 1991. This experiment was to test
strategies by L2 learners regarding grammatical and functional aspects. He predicted
that error frequencies would be relatively low in the initial stages, higher
at a later stage, and relatively low again at even later stages of acquisition.
Also, epenthetic forms would be relatively infrequent in early phases of development but
more frequent in later phases. Figure 8 shows the results of the overall error frequencies in
the experiment. The results agreed with his prediction that learners’ acquisition of
codas can be characterized by the following four phases: (a) an initial phase with
relatively high error rates, followed by a rapid decrease in error frequency; (b) a linear
increase in error frequency; (c) a stable plateau phase of relatively high error
Figure 9 gives a summarized description of what the pattern looks like when
the mean epenthesis proportions for the autumn semester 1990 are compared with the
mean proportions for the spring semester 1991. Subject C1 already used epenthesis
more than twice as much as deletion during the autumn semester (epenthesis-deletion
proportion: 2.13) and almost three times as often during the spring semester
(proportion: 2.87). C2’s use of epenthesis is barely half as frequent as his use of
deletion during the first semester (proportion: 0.44), but there is a significant increase
in his use of epenthesis, which is almost as frequent as deletion during the second
semester (proportion: 0.88). Finally, C4 increased her use of epenthesis, which was
nearly as frequent as deletion during the autumn of 1990 (proportion: 0.75), to a level
almost three times as frequent during the spring of 1991 (proportion: 2.77). This was
a significant change.
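As the wording above suggests (a proportion above 1 means epenthesis is more frequent than deletion, below 1 less frequent), the epenthesis-deletion proportion can be read as a simple ratio of counts. The sketch below makes that reading explicit. The counts are hypothetical and are not the subjects' data.

```python
def epenthesis_deletion_proportion(n_epenthesis: int, n_deletion: int) -> float:
    """Ratio of epenthesis errors to deletion errors for one subject and period."""
    if n_deletion == 0:
        raise ValueError("proportion is undefined when there are no deletions")
    return n_epenthesis / n_deletion


# Hypothetical error counts for two recording periods.
autumn = epenthesis_deletion_proportion(n_epenthesis=10, n_deletion=20)
spring = epenthesis_deletion_proportion(n_epenthesis=20, n_deletion=10)
print(autumn, spring)
```

On this reading, a move from 0.5 to 2.0 would mean a shift from epenthesis being half as frequent as deletion to being twice as frequent.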
The functional or grammatical role of the coda also determines the use of
are relatively more important for the retention of semantically relevant information
proportions, or both, than will codas containing information that is more recoverable
(or predictable) from other segments or features in the context. In Swedish, a coda /r/
can serve as a plural marker or a tense marker, and it also occurs in noninflected
noninflected word has been deleted, it is generally not expressed by other explicit
markers or features in the context, and it can be argued that deletion of the stem-final
/r/ results in much greater lexical-semantic ambiguity than the partial deletion of an
inflectional morpheme. It may therefore also be argued that the retention of final /r/ is
more beneficial in noninflected words. To test the hypothesis, inflected words that
ended in either the present-tense morpheme -r/-er or the plural morpheme -r/-ar/-er/-or
were compared with noninflected words with stems that ended in a single /r/. Figure
appear to be very small, all subjects produced significantly more epentheses for
Two pairs of word classes were compared on the subject of epenthesis and
deletion. One of them is the comparison between present tense and plural. As can be
seen in Figure 11, there is no consistency between the three subjects: C1 used
epenthesis significantly more often for present-tense (proportion: 0.1) than for plural
codas (0.02); subject C2 did not differentiate his use of epenthesis between the two
inflectional categories in any significant way. The other comparison deals with
of open-class words is less recoverable from the context, they will thus be pronounced
more accurately with a lower overall error frequency and a higher proportion of
epenthesis than the more recoverable or predictable /r/ of closed-class words. The
However, Lin (2001) found that in the case of consonant clusters, it is the learners’
choice of repair strategy but not the error rates that varies with the style of speech.
Twenty Chinese adults were included in her study of production of English onset
consonant clusters in four different types of tasks. The experiment included a wide
variety of task types, ranging from the most formal “reading of minimal pairs”, “word
list reading”, “sentence reading” to the least formal “conversation” as shown in the
The error rates support her hypotheses and do not conform to
the general prediction that L2 learners produce target items more
accurately as the style becomes more formal. No significant
difference was found in the students’ error rates across the four speech tasks, as shown in
Figure 14.
Figure 14. Overall error rates in the four tasks. (Lin 2000)
Her study also showed that the use of epenthesis increased as the style of the
task became more formal, and the percentage of deletion and replacements became
higher in less formal tasks. It is also true that the proportion of epenthesis vs. deletion
should be greater in tasks without linguistic context than in tasks with linguistic
context. For tasks that were more formal or that require more attention to form or
pronunciation rather than to content, the use of epenthesis would increase. On the
other hand, when the tasks became less formal or as more attention was paid to
content rather than form, more instances of deletion and replacement would be
preferred. The results of her experiment indicate that what is shifted with style is the
learners’ choice of the repair strategies rather than the accuracy rates.
Figure 15. Percentages of the three strategies in the four tasks. (Lin 2000)
Note: MP = minimal pair; WL = word list; S = sentence; C = conversation.
5. Phonetics of L2 Learners
So, can L2 learners acquire new VOT values? In this section, I will review the existing
literature on the acquisition of L2 stops that differ from those of the learners’
L1.
Curtin et al. (1998) studied the acquisition of Thai voice and aspiration by
English and French speakers. Thai has a 3-way voicing contrast phonemically in stops
which includes voiced, voiceless unaspirated and voiceless aspirated stops. English
also has the three phonetically different stops, but only two phonemically different
stops. Aspiration is not a contrastive feature in English, and so there
phonetic difference between the [p] in “spin” and the aspirated [ph] in “pin”.
Underspecification means that underlying representations are not fully specified and
expresses this by assuming that underlyingly both p's are not specified for aspiration.
In this study, Curtin et al. (1998) wanted to find out whether allophonic aspiration in
English [p] vs. [ph] aids in the acquisition of contrastive aspiration in Thai /p/ and / ph
/. They also wanted to compare the developmental progression of the English learners
to that of native speakers of French. Like English, French has a 2-way voicing contrast
aspiration contrast. You could find voiced and voiceless stops in French, but you
Lisker, 1970; Strange, 1972; Pisoni et al., 1982) which has shown that English
/ph/ vs. /p/) than voiced-voiceless segments (e.g. /p/ vs. /b/) in the synthetic VOT
study. But in Curtin et al.’s (1998) study, the results showed the opposite in one of the
that the contradictory orders (aspiration contrasts are perceptually easier for English
speakers to distinguish, but English subjects did better on the voicing contrast in this study)
representation and responses on that task must be made on the basis of lexically
stored representation. The details and the result of the experiment will be discussed
stops are stored as unaspirated in the lexicon and emerge in the fully specified
underlyingly both /p/s are not specified for aspiration in [phin] and in [spin]. The
a syllable; aspiration does not apply in other contexts. English has no lexical
distinction between aspirated and non-aspirated stops but still there is a phonetic
difference.
(5) Lexical representation:   /pæt/    /spæt/   /bæt/
    Aspiration rule:          [phæt]   —        —
    Surface representation:   [phæt]   [spæt]   [bæt]
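The derivation in (5) can be reproduced with a one-rule sketch. It makes simplifying assumptions: the words are monosyllables given as segment lists, so "the beginning of a syllable" reduces to the first segment, and aspiration is written with "h" as in the examples above. The function name is my own.

```python
VOICELESS_STOPS = {"p", "t", "k"}


def apply_aspiration(lexical: list[str]) -> list[str]:
    """Aspirate a voiceless stop in syllable-initial position; otherwise change nothing."""
    if lexical and lexical[0] in VOICELESS_STOPS:
        return [lexical[0] + "h"] + lexical[1:]
    return list(lexical)


for word in (["p", "æ", "t"], ["s", "p", "æ", "t"], ["b", "æ", "t"]):
    print("/" + "".join(word) + "/", "->", "[" + "".join(apply_aspiration(word)) + "]")
```

The lexical forms carry no aspiration at all; the aspirated stop appears only in the surface form, which is the point of the underspecification analysis.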
Triads of words that minimally differ in both voice and aspiration are found in
Thai; neither of these features is predictable, and so both voice and aspiration features
The first task of the study is a Minimal Pair Task. Nine Canadian English
speakers, 8 Canadian French speakers and 10 native speakers of Thai (controls) were
asked to choose between pictures of words that are in minimal pair relationship, when
presented with one word aurally. The pictures of the minimal pair are accompanied by
a picture of a foil that differs phonetically in more than one segment from the other
words. An aural presentation was heard and subjects were asked to respond by
pressing a key that corresponds to the position of the appropriate picture on the
screen. This task was used to study the development of lexical representation and to
find out if the subjects could lexically contrast both voice and aspiration, to see if they
The second task is called an ABX Task. In this task, a minimal pair ‘AB’ is
presented aurally followed by a third word ‘X’ that matches either A or B. The
tokens used for A, B and X were each produced by a different speaker. There were 72
24 Place controls. Subjects were asked to match either A or B when they heard a
The results of the Minimal Pair task show that aspirated–unaspirated Minimal
Pairs were discriminated by both English and French groups at a level only slightly
better than chance; performance on the voicing contrast was better (Figure 16). This
experiment lasted for 11 days and results were collected on day 2, day 4 and day 11.
From the results on the last day (day 11), we can see the developmental difference
between some of the English and French subjects. This suggests that the presence of
contrast in the L2 acquisition of Thai. Because of this, Curtin et al. (1998) suggested
Figure 16. Minimal Pair Task: proportion correct (Curtin et al., 1998)
French only has a voicing contrast in both lexical and surface representations, so,
as expected, French speakers performed better on the voicing contrast than
on aspiration in the ABX task (Figure 17), similar to what they did in the Minimal Pair task. English
speakers performed similarly on the voicing and aspiration contrasts in the ABX task, as
shown in Figure 17. These ABX results were quite different from what English
speakers did in the Minimal Pair task, in which their performance on aspiration was
Curtin et al. (1998) claimed that the Minimal Pair task accesses lexical
from the results of an ABX discrimination task that English subjects did better than
those features that are present lexically in the L1, even though they may be able to
discriminate other L2 contrasts on the basis of surface features, and may eventually
Pairs were better discriminated by the English speakers than the French speakers. The
discrimination, while the task that accesses surface representations, English speakers
discrimination task that English subjects did better than the French subjects.
Flege and Eefting (1988) examined the imitation of a VOT continuum ranging
from /da/ to /ta/ (-60 to +90 ms) by subjects differing in age and/or linguistic
experience. Subjects were native speakers of English, native speakers of Spanish and
bilingual speakers of both. Spanish and English use different phonetic categories to
implement the contrast between /t/ and /d/. In Spanish, [d] is used to implement /d/
and [t] implements /t/. Spanish categories of [d] and [t] yield stops with VOT values
/d/ is implemented by [d] and [t], and /t/ is implemented by [th]. English [d]
and [t] yield VOT values of about –80 ms and 20 ms, and the rule used to implement
[th] yields VOT values of approximately 80 ms (Flege and Eefting, 1986). Figure 18
illustrates how English and Spanish speakers divide up a VOT continuum based on
In the experiment, subjects were asked to identify the stimuli before imitating
them. The stimuli, which consisted of a 16-member continuum ranging from /da/ to
/ta/, were presented twice on each trial. Results showed that regardless of the
properties of the acoustic input, children and adults who spoke only Spanish
produced only lead and short-lag VOT responses, which correspond to the phonetic
categories of their L1; they perceived each stimulus on the VOT continuum as a member of one of
their L1 categories (Figure 19). English speakers likewise reproduced the
categories of their L1: they produced stops with only short-lag and long-lag VOT
values (Figure 20). On the other hand, native speakers of Spanish who spoke English
produced stops with VOT values falling into three modal VOT ranges (Figure 21).
They had acquired a new phonetic category that is not in their L1.
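The notion of "modal VOT ranges" can be made concrete by sorting productions into lead, short-lag and long-lag bins and counting how many bins a speaker group uses. This is a sketch only: the bin edges (0 ms and 50 ms) are assumptions, the function names are my own, and the VOT values below are invented for illustration rather than taken from Flege and Eefting's figures.

```python
def vot_mode(vot_ms: float, short_lag_min: float = 0.0, long_lag_min: float = 50.0) -> str:
    """Assign a VOT value to one of three ranges; both bin edges are assumed."""
    if vot_ms < short_lag_min:
        return "lead"
    if vot_ms < long_lag_min:
        return "short-lag"
    return "long-lag"


def modes_used(productions_ms: list[float]) -> set[str]:
    """The set of VOT ranges that appear in a list of imitation responses."""
    return {vot_mode(v) for v in productions_ms}


# Hypothetical imitation responses (ms) for three kinds of speaker.
spanish_only = [-80, -60, 10, 20]
english_only = [15, 20, 75, 85]
bilingual = [-70, 15, 80]

for name, data in [("Spanish", spanish_only), ("English", english_only), ("bilingual", bilingual)]:
    print(name, sorted(modes_used(data)))
```

A monolingual group that uses two ranges and a bilingual group that uses three is the pattern described for Figures 19 to 21.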
Figure 19. The frequency of VOT values produced by the native Spanish subjects.
SA = Spanish adult; SC = Spanish children.
Figure 20. The frequency of VOT values produced by the native English subjects.
EA = English adult; EC = English children.
Figure 21. The frequency of VOT values produced by the native Spanish speakers of
English. LCB= late childhood bilinguals. ECB= early childhood bilinguals
BC= bilingual children
6. Phonology of L2 Learners
literatures that examined segmental level, which has to do with phonological segments
phonology.
Even when the L1 has no clusters, some clusters are easier to acquire than
others. For example, [pl] is easier for L2 learners to acquire than [fl]. To explain this
phenomenon, Broselow & Finer (1991) proposed that a Minimal Sonority Distance
clusters in syllable onsets. The basis for the markedness of the clusters in Broselow &
Finer’s (1991) study is the Sonority Index shown in (7) and the proposed MSD
parameter.
(7) Sonority Index
Class Scale
Stops 1
Fricatives 2
Nasals 3
Liquids 4
Glides 5
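Using the scale in (7), the sonority distance of a two-consonant onset is the difference between the index values of its members. The sketch below computes that distance and tests it against an MSD setting. The segment-to-class table covers only the consonants discussed here, and the MSD value of 3 in the example is hypothetical, chosen only to show how a setting separates [pl] from [fl].

```python
SONORITY = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4, "glide": 5}

# Only the consonants used in the examples; the classification is standard.
SEGMENT_CLASS = {
    "p": "stop", "b": "stop",
    "f": "fricative", "s": "fricative",
    "l": "liquid", "r": "liquid",
}


def sonority_distance(c1: str, c2: str) -> int:
    """Sonority of the second onset consonant minus sonority of the first."""
    return SONORITY[SEGMENT_CLASS[c2]] - SONORITY[SEGMENT_CLASS[c1]]


def onset_allowed(c1: str, c2: str, msd: int) -> bool:
    """A cluster is allowed if its sonority distance is at least the MSD setting."""
    return sonority_distance(c1, c2) >= msd


for cluster in ("pl", "fl", "pr", "ps"):
    print(cluster, sonority_distance(*cluster), onset_allowed(*cluster, msd=3))
```

The larger the required distance, the fewer cluster types pass the test, so stop-liquid onsets survive settings that exclude fricative-liquid and stop-fricative onsets.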
difference allowed in syllable onsets on the Sonority Index. Other things being equal,
languages that require a greater difference in sonority between adjacent segments will
have fewer kinds of consonant clusters in the onset. E.g. a stop-liquid cluster [pr]
would be less marked than a stop-fricative cluster [ps]. But Eckman & Iverson (1993)
argued it is typological markedness rather than sonority distance which better explains
L2 learners’ knowledge of English clusters in syllable onsets. They suggested the
sequential markedness principle as the better explanation: “For any two segments A
and B and any given context X_Y, if A is less marked than B, then XAY is less
marked than XBY.” On this assumption, since [p] is less marked than [f], hence [pr]
clusters are less marked than [fr] clusters and are predicted to cause less IL difficulty
Korean, and 3 Cantonese speakers. They studied the production of English onset
consonant clusters (CCV). An onset is said to be present in the IL of a subject
if the subject produces that onset cluster at least 80% of the
time on at least 4 attempts. The data were collected 8 times in casual conversations
lasting between 5 and 10 minutes. No attempt was made to control the vocabulary used by the
subject. They claimed that a less marked cluster would be present just in case one or
more of the more marked clusters is also present. Fifty-five potential tests of their claim (five
sets of onsets per subject × 11 subjects) were collected. Out of the 55 potential tests,
the data allow 50 to be tested (91%). Five of the potential tests yield no result
because the subject did not produce at least four tokens of the relevant clusters. Four
predict. In 92% of the cases, the subject’s performance obeyed the markedness
predictions.
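The acquisition criterion and the percentages reported above can be checked with a few lines of arithmetic. The function name and the convention of returning `None` when there are too few attempts are my own; the thresholds (80%, 4 attempts) and the counts (5 sets, 11 subjects, 5 untestable, 4 violations) are those given in the text.

```python
from typing import Optional


def has_onset(correct: int, attempts: int,
              min_attempts: int = 4, min_rate: float = 0.80) -> Optional[bool]:
    """Apply the 80%-on-at-least-4-attempts criterion.

    Returns None when there are too few attempts for the test to yield a result.
    """
    if attempts < min_attempts:
        return None
    return correct / attempts >= min_rate


potential_tests = 5 * 11            # five sets of onsets per subject, 11 subjects
testable = potential_tests - 5      # five tests had too few tokens
violations = 4

print(has_onset(4, 5), has_onset(3, 5), has_onset(3, 3))
print(round(100 * testable / potential_tests))          # share of tests that could be run
print(round(100 * (testable - violations) / testable))  # share obeying the predictions
```

The two printed percentages correspond to the 91% and 92% figures in the paragraph above.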
Of the four cases which ostensibly violated what typological markedness would
predict, two were from Cantonese-speaking subjects, who produced the two
clusters [br] and [fr] but not [pr]. Since [p] is less marked than [b] and [f], we would
expect that [pr] would also be less marked than [br] and [fr]. Analysis of the actual
errors from these two subjects showed that both of them substituted [ ph] onsets for
the intended [pr] onsets. In order to explain this, Eckman & Iverson (1993) assumed
that on the basis of similarities in VOT, the two subjects are associating their NL /p/
NL      TL
/p/     /b/     Short-lag VOTs
/ph/    /p/     Long-lag VOTs
With this assumption, the subjects’ production would agree with markedness
unaspirated stops. Hence, the [ph]-liquid onset is more marked than [p]-liquid onset
and [f]-liquid onset. Eckman & Iverson’s explanation brings up the question
Edge (1991)
and Cantonese. In Edge’s (1991) study, the data of native speakers of English was
included to account for the native devoicing and epenthesis. This was done to avoid
epenthesis, and the deletion of final voiced obstruents all characterize spoken English.
7 Japanese, 7 Cantonese and 4 native speakers of English were subjects of this study.
The tasks in this study included (1) a picture-elicited storytelling task which
contained words with voiced obstruents, (2) an oral reading of a short story and (3) an
oral reading of 41 randomly ordered words. The voiced obstruents were classified in
the data as either target, deletion, glottal stop substitution, devoicing, epenthesis,
processes, such as terminal devoicing, are universal. Edge’s data from the Cantonese
Eckman’s hypothesis. For the Cantonese subjects, 67% of the non-target variants
were devoiced and deletion appeared to be more frequent in connected speech. When
compared to deletion in the Native English subjects’ data, the deletion of Cantonese
subjects is quite different in its distribution. While deletion of /v/ in function words
(fond of playing) rarely occurred, deletion of final /g/, as in dog and of /d/ after a
this experiment indicate that under the three tasks, devoicing is the strategy that was
most frequently used by Cantonese speakers. It is also important to take into account
native speech in formulating rules for a language learner’s IL production. After we’ve
loo
speakers of Cantonese in onset and coda positions. The two consonant inventories
differ in several ways. French allows more consonants in both onset and coda
position. The number of consonants differs greatly between the two languages in coda
position, since Cantonese allows only unreleased stops /p, t, k/ and nasals in the
coda. Cantonese does not have the voiced/voiceless contrast found in French stops but
voiceless aspirated.
There were six subjects in this study, and their proficiency in French
was at the upper-beginner and lower-intermediate levels. The subjects were asked to
read a passage in the first task. For the second task, subjects were given English and
Cantonese translations of the items and were asked to give the French equivalents. The
37 words were expected to be well known. Only five words were unknown to some of
the subjects, and in only three cases were the target words read and repeated after the
fieldworker.
merely sub-phonemic inaccuracy even though it contained a wrong nucleus, e.g. [pʰ]
was treated as acceptable for initial /p/. They also judged a production acceptable when it ended
in a non-nuclear element compatible with the target phoneme even though it contained a
wrong nucleus, e.g. [sz] for /s/ and [sz] for /z/. Finally, they also judged a production acceptable
when it contained an allophone of the target but ended in a wrong phone type,
e.g. [p] for /p/.
As we can see from the table below (Figure 17), focusing on the results for
stops, Cantonese speakers had greater difficulty producing the French initial voiceless
stops /p, t, k/ (accuracy around 50%), even though their native language has the
equivalent phone types. They made errors by producing the stops with prevoicing
and sometimes with a schwa-like vowel inserted after the consonant. In their attempts to
produce onset /p, t, k/, about 40% of their productions were voiced [b, d, g], 35% were
voiceless aspirated [pʰ, tʰ, kʰ], and only about 20% were voiceless unaspirated [p, t, k].
This contradicts the MDH because these French stops have Cantonese counterparts,
and one might expect them to be easily learned. In coda position, the result of this
than in voiceless stops. The voiced stops are nearly always devoiced in final position.
As Figure 18 shows, of all the errors made in the production of stops, 95% included
To account for the difficulties with French onset stops in Cantonese speakers’
production, Cichocki et al. (1999) suggested that we could look at the patterns of
difficulty found in first language acquisition, which show that voiceless initial stops
are more difficult than voiced initial stops (Ingram, 1978). Cichocki et al. also
noted that one of the problems in this study is that all the subjects were learning
French as a second foreign language. This is because English is taught in all Hong Kong
schools and is the medium of instruction in many. The possibility of interference from
English cannot be neglected when we look at the data obtained in this study. My
prediction is that English speakers would not have this trouble because English
speakers have the voicing contrast in their L1. Cantonese speakers may have difficulties
Figure 18. Cichocki, et al. (1999)
7. Discussion
English and Cantonese phonological systems in this article, we have examined some
difficulties that Cantonese speakers may have when learning English pronunciation. It
is argued that most of the Cantonese ESL learners’ difficulties with English
the phoneme inventories of the two languages, the characteristics and distribution of
the phonemes and the permissible syllable structures of the two languages in question.
In this section, we are going to look at differences between the acquisition of stops in
onset and coda position, and at the different repair strategies used under different
circumstances.
Cantonese speakers exhibit a voicing contrast in word-initial, -medial and -final position.
However, devoicing occurred in some voiced stops in coda position but not in onset position, and
Cantonese speakers seem to have no difficulty with onset voiced stops. Since coda is a
more marked position than onset, we would expect learners to have more
difficulty in coda position. As in Flege & Eefting’s (1988) study of English
and Spanish speakers, Cantonese speakers judge tokens of [p, t, k] in their L1 and the
position even though they can detect auditorily the acoustic differences between
onset voiced stops in the acquisition of French. the result of the study by Cichocki, et
al. (1999). This happened only in onset position, not in coda position. Since voiced
stops are more marked than voiceless stops, this is not what the MDH
predicts. Comparable to the results in Eckman (1981), subjects in this study
also had more difficulty with coda voiced stops. Apart from the fact
that voiceless initial stops are more difficult than voiced initial stops in L1
acquisition studies, the reason why French voiceless onsets are difficult for
Cantonese speakers to acquire may also be the perception of the voicing contrast.
Cantonese subjects may have a wrong realization in time of the phonological units
Cantonese speaker as Flege (1987) stated that, all other things being equal, we actually
learn L2 sounds which are dissimilar to the sounds in our L1 more easily than their
Repair strategies
In terms of the kinds of repair strategies that Cantonese speakers choose in
the acquisition of English voiced stops, we need to look at proficiency, formality and
study, the data show that coda deletion is low in the initial phase of development; it
increases during the early phase and decreases during later phases. The
proportion of epenthesis to deletion increases over time, which means that the use
of epenthesis is relatively low at the early stage and increases later in L2
development. The error rate increases because fluency also increases
considerably with higher L2 proficiency. Fluent speech is characterized by more focus
on content and less focus on form, and so the increase in deletion and epenthesis
would be found in the early phase of L2 development. Another factor that varies
positively with increased formality of the speech task such that epenthesis is
frequently in less formal tasks (e.g., sentence, text, and story reading or natural
conversation), where deletion is the dominant simplification strategy. Other than that,
one aspect of recoverability from the context is whether the coda is a crucial part of a
argued that the reduction of lexical forms generally increases lexical ambiguity, and this
might particularly be the case for content words. In contrast, the information
formal markers or otherwise predictable from the context, and it might be argued that
inflectional information is more easily recoverable from the context than the
underlying form of a reduced lexical stem. It is more likely that word-final codas that
are part of a lexical stem will be pronounced more accurately than word-final codas
References:
Blumstein, S., Cooper, W., Goodglass, H., Statlender, S., & Gottlieb, J. (1980).
Production deficits in aphasia: A voice-onset time analysis. Brain and Language
9, 153–170.
Chan, A.Y.M. & Li, D.C.S. (2000). “English and Cantonese phonology in contrast:
explaining Cantonese ESL learners’ English pronunciation problems”. Language,
Culture and Curriculum, 13, 67-85.
Cichocki, W., House, A.B., Kinloch, A.M. & Lister, A.C. (1999). “Cantonese
speakers and the acquisition of French consonants”. Language Learning, 49, 95-
121.
Curtin, S., Goad, H. & Pater, J. (1998). “Phonological transfer and levels of
representation: the perceptual acquisition of Thai voice and aspiration by
English and French speakers.” Second Language Research 14, 4. 389-405.
Eckman, F. (1981): “On predicting phonological difficulty in second language
acquisition.” Studies in Second Language Acquisition 4: 18-30.
Eckman, F. & Iverson, G. (1993). “Sonority and markedness among onset clusters in the
interlanguage of ESL learners.” Second Language Research 9, 3. 234-252.
Eimas, P.D., Siqueland E.R., Jusczyk, P.W., & Vigorito, J. (1971). Speech perception
in infants. Science 171: 303-306.
Flege, J. (1987). 'The production of "new" and "similar" phones in a foreign language:
Evidence for the effect of equivalence classification.' Journal of Phonetics 15: 47-
65.
Flege, J.E. & Eefting, W. (1988) "Imitation of a VOT continuum by native speakers
of English and Spanish: Evidence for phonetic category formation", Journal of
the Acoustical Society of America 83: 729-740.
Lado, R. 1957: Linguistics across cultures. Ann Arbor: University of Michigan Press.
Pisoni, D., and Tash, J. (1974) Reaction times to comparisons within and across
phonetic categories. Perception and Psychophysics 15(2), 285-290.
Pisoni, D., Aslin, R., Perey, A. and Hennessy, B. (1982): Some effects of laboratory
training on identification and discrimination of voicing contrasts in stop
consonants. Journal of Experimental Psychology: Human Perception and
Performance 8, 297–314.
Tsui, I. Y. H., & Ciocca, V. (2000). “The perception of aspiration and place of
articulation of Cantonese initial stops by normal and sensorineural hearing-
impaired listeners”. The International Journal of Language and Communication
Disorders, 35, 507-525.