
In press in Kathrin Linke, Marc van Oostendorp and Francesc Torres-Tamarit (eds.),
Approaches to metaphony in the languages of Italy. Berlin: de Gruyter.

On integrating different methodologies in phonological research:


Acoustic, articulatory, behavioral and neurophysiological evidence in the study of a
metaphony system
Mirko Grimaldi, Sandra Miglietta, Francesco Sigona & Andrea Calabrese

1. Introduction
1.1 Metaphony and Southern Salento
The term metaphony traditionally refers to a vowel assimilation process affecting
stressed mid-vowels before high vowels (Savoia and Maiden 1997; Calabrese 1998;
Calabrese 2011; Loporcaro 2009).
Southern Salento varieties (in the province of Lecce) were traditionally considered
as being devoid of metaphony (cf. Parlangeli 1953; Rohlfs 1966; Stehl 1988; Mancarella
1998). However, Grimaldi (2003; 2009) demonstrated that a vowel assimilation process
raising the stressed mid vowels [ɛ] and [ɔ] to the mid-high counterparts [e], [o] when
followed by the unstressed high vowels -i or -u is indeed present in this area¹. Grimaldi's
study was an acoustic analysis and statistical treatment of data collected through fieldwork
in 36 localities of southern Salento by means of a questionnaire. The phenomenon was
found in 19 localities at the extreme tip of this area². While all the Salento varieties
spoken in this area share a five-vowel phonemic system, i.e., /i, ɛ, a, ɔ, u/, a group of 19
localities modulates the stressed mid vowels [ɛ] and [ɔ], when followed by the unstressed
high vowels -i and -u, producing the allophonic variants [e], [o]. In Figure 1, we can
observe the stressed vowels produced by the 19 male speakers characterized by
metaphonic alternations³. A more detailed statistical analysis of the metaphonic data for
each variety showed microvariation in the way in which -i and -u affect the mid vowels.
All the 19 varieties share the assimilatory process whereby the mid front vowel is raised by
-i (i.e., [ɛ] → [e] before -i), while varying applications of the process are found in all
other conditions: i.e., in the case of [ɛ] before -u, and of [ɔ] before -u and -i (cf. Grimaldi
2003: 60-72 for a detailed description of the phenomenon)⁴.
In this work we study one of the 19 localities affected by metaphonic processes,
the Tricase variety, integrating two levels of investigation (production and perception)
and two methodologies (acoustic-articulatory techniques and behavioral-neurophysiological
approaches) very rarely exploited together in disentangling the nature of the
phonetic-phonological interface (cf. Grimaldi et al. 2010; Miglietta, Grimaldi, and
Calabrese 2013). The first level is explored by using an Ultrasound system (US), which
makes it possible to reveal the fine-grained action of the anatomical parts involved in the
assimilatory process and to compare the tongue contours obtained by US imaging. The
second level is explored by means of perceptual tests on the one hand and
electroencephalographic recordings of the auditory evoked potentials (AEPs) on the
other. We aim to clarify the acoustic-articulatory and perceptual nature of this
assimilatory process from an integrated cognitive perspective.

¹ See also Costagliola (2013) and Romano (2013) for two cases of metaphonic assimilation in the Monteroni and Galatone varieties (central and southern Salento respectively).
² For each variety one male speaker aged between 50 and 80 was recorded. The data were elicited by a semi-spontaneous approach on the basis of a questionnaire of about 600 stimuli, containing representative samples of the stressed vowels in both open and closed syllables in all classes of words (nouns, verbs, adjectives, etc.).
³ Figure 1 was created by using the R software (cf. McCloy 2015).
⁴ We will address this issue in a future work (see Calabrese and Grimaldi, in preparation).
At the acoustic-articulatory level, our main concern is to investigate the articulatory
dynamics of the assimilatory process affecting stressed vowels in the Tricase dialect, and
specifically to understand what features are involved in it. In contrast, our main concern
at the perceptual level is to investigate the cognitive status of this process. In particular,
we test if the allophones generated by this process are encoded in memory
representations producing categorical perception. We assume that if they are, the process
under analysis cannot be a simple coarticulatory adjustment, but it must be under
cognitive control, and, therefore, it has to be considered a component of the phonological
grammar of this variety.
Before discussing the data, some general explanation of the techniques used is
needed.

Figure 1: F1-F2 scatterplot on logarithmic scale of the five vowels produced by the 19 southern
Salento male speakers (one speaker for each locality) displaying the metaphonic adjustments of
the mid vowels /e/ and /o/. The main assimilation process involves e/_i. Assimilations of e/_u, o/_u
and o/_i are also noticeable. In all cases a different degree of raising of the mid vowels is shown.
Ellipses on data, confidence level 68.8%.

1.2. The US technique


An ultrasound machine (i.e., a classical echograph) emits ultra-high frequency
sound through a transducer or probe containing piezoelectric crystals. When this probe
is held against the skin of the neck, the ultrasound travels through the tongue and is
reflected back to the transducer, resulting in echo patterns from which 2-dimensional
images of the tongue surface are reproduced. These images can be viewed continuously
on the machine itself for visual feedback, or recorded to video for later analysis. Because
ultrasound is not able to image through bone or air, it can only allow visualization of the
surface of the tongue and not, for example, the palate, jaw or rear pharyngeal wall.
As ultrasound is both non-invasive and non-obtrusive, and therefore does not
affect speech production, the sagittal placement of the US probe under the chin of a
speaker provides images of the mid-line of the tongue (sagittally, or along any
2-dimensional axis) from the tongue blade to the tongue root at high temporal resolution
(typically ranging between 25 and 70 frames/sec), as a bright white line tracing the
boundary between the tongue surface and the air above it (cf. Figure 2). This technique
allows exporting US pictures as a continuous video stream and synchronously recording
the audio signal. However, in order to prevent uncontrolled head motion from moving
the probe to different regions of the tongue during speech (thus impeding the
comparison of tongue shapes across tokens), the head of the subject has to be
immobilized (cf. Stone 2005; Gick, Wilson, and Derrick 2013). The US pictures obtained
may be processed with dedicated software in order to get a point-wise trace of the tongue
contour (as in Figure 2) for statistical analysis.
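
As an illustration of this last step, the following R sketch averages several point-wise contour traces of the kind shown in Figure 2 (right panel). The synthetic contours stand in for the traces exported by a contour-tracking tool; everything here, including the shapes themselves, is an assumption of the example.

```r
# Hedged sketch: average ten tongue-contour traces. The synthetic
# data.frames below stand in for exported traces with columns "x"
# (front-back position, mm) and "y" (tongue height, mm).
set.seed(1)
contours <- lapply(1:10, function(i) {
  x <- seq(0, 60, length.out = 80)
  data.frame(x = x, y = 20 + 8 * sin(x / 20) + rnorm(80, 0, 0.3))
})

# Interpolate every contour onto a common front-to-back x grid so that
# point-wise averaging across repetitions is meaningful.
x_min <- max(sapply(contours, function(k) min(k$x)))
x_max <- min(sapply(contours, function(k) max(k$x)))
x_grid <- seq(x_min, x_max, length.out = 100)
y_mat <- sapply(contours, function(k) approx(k$x, k$y, xout = x_grid)$y)

# Mean contour across the ten repetitions (cf. Figure 2, right panel).
plot(x_grid, rowMeans(y_mat), type = "l",
     xlab = "front-back position (mm)", ylab = "tongue height (mm)")
```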

Figure 2: On the left: tongue surface contour (red dots) of [e] before -i in the word [meti]
'you reap', with the tongue root (a), tongue body (b) and tongue blade (c) labeled on the image.
On the right: average tongue contour of ten repetitions of [e] before -i. Tongue
root, tongue body and tongue blade are highlighted.

1.3. The AEP techniques and the Mismatch Negativity component


The AEP technique not only provides a millisecond-precise measurement of
information processing in the auditory cortex but also, depending upon the task, can
allow one to disentangle automatic detection from attentional processes. AEP studies
have generally used the so-called oddball paradigm. It consists in alternating repetitive
(standard) and infrequent (deviant) sounds (80% and 20% of occurrences respectively) while
subjects are distracted from listening by a primary task (e.g., watching a silent movie), in order
to measure the so-called mismatch negativity (MMN) response to different classes of
sounds. The MMN is an AEP component, elicited by stimulus change at 100-250 ms and
mainly generated in the auditory cortex, reflecting the neural detection of a change in a
constant property of the auditory environment (Picton et al. 2000; Näätänen et al. 2007;
Näätänen, Kujala and Winkler 2011; Winkler and Czigler 2012). In other words, the
MMN represents the neural detection of a mismatch between the deviant and the
memory trace formed by the standard in an oddball paradigm. Furthermore, the amplitude
and latency peaks of the MMN are directly correlated with the magnitude of the
perceived change; hence, the MMN indicates at a pre-attentive level whether the auditory
system has distinguished between two stimuli. For example, native speech contrasts elicit
larger MMNs than non-native sounds (see Amenedo and Escera 2000; Näätänen 2001).
The idea is that elicitation of the MMN by infrequently presented deviant sounds indicates
that the auditory features differentiating the deviant sounds from the standard sound are
detected.

Figure 3: Mismatch negativity (MMN) in healthy awake adults. One can note that MMN
amplitude for standard and deviant sounds is significant with respect to the baseline. Adapted
from Sams et al. (1985).

Thus, the basis for MMN elicitation is the contrast between the cortical
representation extracted from the auditory regularities occurring in the standard stimulus
and the cortical representation extracted from the auditory properties of the deviant
stimulus (cf. Figure 3). The key factors influencing deviance detection in the auditory
scene are: (1) cortical extraction of the standard regularities from the ongoing
acoustic-phonetic input, and (2) cortical representation of these regularities in memory.
Following Lahiri and Reetz (2001, 2010) and Eulitz and Lahiri (2004), we assume that MMN
elicitation consists of the following steps. First, the standard stimulus, a vowel in our
study, creates a central sound representation, roughly corresponding to the vowel neural
trace stored in the auditory cortex and conveying information about the vowel's
phonological representation. Second, the deviant vowel creates a percept corresponding
to the vowel's phonological representation. Third, the MMN is automatically elicited when
the phonological representation of the deviant vowel is compared to the phonological
representation of the standard vowel and different specifications in their phonological
representations are observed at the cortical level. Thus, the MMN can be used to assess the
extraction of auditory regularities (e.g., the spectral and temporal features characterizing
both the deviant and the standard) from the acoustic-phonetic input sequence, and can be
considered a measure of the successful extraction and representation of the auditory
regularities. In synthesis, an MMN response (i.e., when the MMN amplitude is larger and
the latency shorter for the deviant with respect to the standard) signals the successful extraction and
representation of the phonological properties of the standard and deviant sounds at the
cortical level (cf. Sussman et al. 2013 for discussion).
2. Methods
2.1. Acoustic-articulatory experiment
In a soundproof room, a 54-year-old male speaker of the Tricase variety produced
90 word stimuli containing the stressed target vowel embedded in the frame sentence 'Ieu
ticu ___ moi' ('I say ___ now'), randomly presented on a PC monitor. Ten stimuli for each of
the five target vowels were produced: they were designed to elicit the full vowel
inventory in both open and closed syllables. For each vowel, we selected surrounding
consonantal contexts that did not affect the vowel production: i.e., where possible the
stressed vowel was surrounded by labial and/or coronal obstruents. The stressed [ɛ] and
[ɔ] vowels were differentiated according to the unstressed vowel context. Thus, the
speaker produced 10 words containing [ɛ] followed by -i, 10 with [ɛ] followed by -u, and 10
with [ɛ] followed by -e or -a, and so on for [ɔ].
An Aplio XV machine, by Toshiba Medical Systems Corp., was used to acquire
images of tongue contours at 25 Hz. The video stream was acquired synchronously
with the audio signal, by means of an external a/v analog-to-digital acquisition
card, and then recorded in real time on a dedicated PC (cf. Grimaldi et al. 2008; Sigona
et al. in press). The probe was rigidly locked into a fixed position on an appropriately
adapted microphone stand. To avoid transducer pressure on the soft tissue of the jaw, an
acoustic standoff eliminating upward pressure on the tissue was used. The two sides of
the head of the subject were immobilized with an adjustable wooden system built at CRIL
that allows the speaker to rest his/her forehead steadily, so that the whole tongue contour
(i.e., from the tip to the shadow of the hyoid bone) could be captured and analyzed. The
elicited speech was recorded by using CSL 4500 (at a sampling rate of 22.05 kHz) and a
Shure SM58-LCE microphone placed at a distance of 20 cm from the speaker.
All the speaker's productions were segmented and normalized in peak amplitude
by means of Praat 5.2 (Boersma and Weenink 2011). For each vowel, total duration as
well as F0, F1, F2 and F3 were measured. The formant values were measured
in the vowel steady-state tract (0.025 s) centered at the midpoint (here we focus on the F1
and F2 values only). The US a/v stream was segmented offline by an automatic
software procedure developed at CRIL (cf. Grimaldi et al. 2008), on the basis of audio
pulses superimposed on the speech signal before and after each sentence during recording.
For each segmented sentence, looking at the acoustic waveform, the operator manually
placed labels around the time intervals where the relevant vowels occurred, so that the
corresponding US pictures could be identified. These pictures were processed with
EdgeTrak in order to get a point-wise trace of the tongue contour for the statistical
analysis (Li, Kambhamettu and Stone 2005): see Figure 2.
2.1.1. Statistical analysis

For the acoustic analysis, independent t-tests were carried out to examine the
assimilatory effect of the unstressed vowels -i, -u and -e,a⁵ on the mid vowels [ɛ] and [ɔ]
(alpha level p < 0.05). Due to the protocol design, samples of the first category are
considered independent of any sample of the second category. So, we compared the
following pairs (a minimal sketch of one such comparison is given after the list):

- [ɛ] before -i with [ɛ] before -e/a;
- [ɛ] before -u with [ɛ] before -e/a;
- [ɛ] before -u with [ɛ] before -i;
- [ɔ] before -i with [ɔ] before -e/a;
- [ɔ] before -u with [ɔ] before -e/a;
- [ɔ] before -u with [ɔ] before -i.
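
The following R sketch illustrates one of the six comparisons on F1. The formant vectors are hypothetical placeholders (only their means echo Table 1), since the per-token measurements are not reported here.

```r
# Hedged sketch of one comparison: independent two-sample t-test on F1
# of [ɛ] before -i vs. [ɛ] before -e/a. The values below are
# hypothetical placeholders, not the measured tokens.
f1_E_before_i  <- c(406, 412, 399, 421, 395, 408, 401, 415, 397, 406)
f1_E_before_ea <- c(506, 511, 498, 515, 503, 509, 496, 512, 502, 508)

# var.equal = TRUE yields the pooled-variance test with n1 + n2 - 2
# degrees of freedom, matching the t[18] reported in Table 2.
t.test(f1_E_before_i, f1_E_before_ea, var.equal = TRUE)
```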

To address the issue of the articulatory features involved in the Tricase metaphony
process, we used Smoothing Spline ANOVA (SS ANOVA) to compare tongue curves
(Gu 2002; Davidson 2006; Lilienthal 2009). SS ANOVA is an inferential statistical
method used to study similarities and differences in the shape of two
groups of curves. It has already been applied to studies in medical science, environmental
science, and epidemiology, and may be applied to compare two groups of tongue curves
produced by the same subject: e.g., 10 tongue curves of [ɛ] before -e/a vs. 10 tongue
curves of [e] before -i, etc. This statistical method has the advantage of determining
whether or not there are significant differences between the tongue curves belonging to
two groups, as well as which sections of the curves (i.e., the root, body, or blade of the
tongue) are different (Davidson 2006). Given two groups of tongue curves, the corresponding
Smoothing Splines (SS) are computed: they are particular curves that provide the best fit
to all the data points belonging to each group. The SS for each data set is termed the main
group effect, and around each SS the 95% Bayesian confidence interval is constructed.
The comparison of the two groups of tongue curves is performed with the interaction
diagram, which represents a plot of the difference of the SS for each data set from the SS
that is the best fit to all of the data. So, the 95% Bayesian confidence interval is
constructed once more, and the difference between the splines is significant when the
confidence interval does not encompass the zero on the y axis (cf. Figure 6(a-d), where
the SS are on the left and the tongue curves (TC) on the right). The SS ANOVA analysis was
carried out by means of a Matlab tool developed at CRIL and the R package gss (Gu
2009).
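
For readers who want to replicate this kind of comparison, the sketch below shows how the analysis might be set up with the R package gss, along the lines of Davidson (2006). The synthetic data frame and its column names are assumptions of the example, not the format of our own data.

```r
# Hedged sketch of an SS ANOVA comparison of two groups of tongue
# curves with gss. Synthetic stand-in data: 10 curves per group, with
# x (front-back position), y (tongue height) and a group label.
library(gss)
set.seed(42)
x <- rep(seq(0, 60, length.out = 50), times = 20)
group <- factor(rep(rep(c("E_before_ea", "e_before_i"), each = 50), times = 10))
y <- 20 + 8 * sin(x / 20) + ifelse(group == "e_before_i", 1.5, 0) +
  rnorm(length(x), 0, 0.4)
curves <- data.frame(x = x, y = y, group = group)

fit <- ssanova(y ~ x + group + x:group, data = curves)

# Evaluate the group-dependent terms along x with their Bayesian
# standard errors; the difference between the splines is significant
# wherever the 95% interval excludes zero (cf. Figure 6).
grid <- data.frame(x = seq(min(x), max(x), length.out = 100),
                   group = "e_before_i")
pr <- predict(fit, newdata = grid, se.fit = TRUE,
              include = c("group", "x:group"))
signif <- (pr$fit - 1.96 * pr$se.fit) * (pr$fit + 1.96 * pr$se.fit) > 0
```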
2.2. Behavioral and AEP experiments
Twelve right-handed students from the University of Salento (7 females; mean
age 21.2, range: 19.1-26.0, s.d. 2) participated in both behavioral and AEP sessions. No
participants reported a history of neurological illness. All of the participants were native
speakers of the Tricase dialect and provided written informed consent. The
discrimination task and AEP recordings were run within one
session with the discrimination task preceding the AEP recordings to prevent the stimuli
from being attentively processed before the AEP measurements. The experimental
procedure received the approval of the local ethics committee.

⁵ Unstressed -e and -a have been considered together as they do not trigger assimilation.

We concentrated our attention on only one assimilation process: [ɛ] → [e] before
-i (e/_i in the captions of Figures 1 and 4), i.e., the process shared by all the speakers of all the 19
localities where metaphonic adjustments were found (Grimaldi 2003; 2009). We used
the vowels [ɛ, e, i], present in the stressed vowel system of the Tricase dialect, as
experimental stimuli: [ɛ] and [i] have phonemic status, whereas [e] has an allophonic
status. We used three natural speech tokens for each stimulus type to introduce acoustic
variability and ensure that the acoustically different tokens were grouped together in a
more abstract representation of the speech sound category.
A male speaker of Tricase produced a total of 30 pseudowords (10 for each vowel type).
The vowels were inserted in the context b[V]b[V] and embedded in the carrier sentence
'Ieu ticu ___ moi' ('I say ___ now'). The speech signal was recorded in a soundproof room with
CSL 4500 and a Shure SM58-LCE microphone with a sampling rate of 44.1 kHz and an
amplitude resolution of 16 bits. The acoustic analysis was performed using Praat 5.2
(Boersma and Weenink 2011). The fundamental frequency (F0), first formant (F1) and
second formant (F2) values were measured in the vowel steady-state tract (0.025 s)
centered at the vowel midpoint. For every vowel category, we selected three acoustically
varying tokens with comparable pitch ([ɛ] = 174 Hz (± 3); [e] = 174 Hz (± 7); [i] = 182 Hz (± 7)).
The F1/F2 average formant values in Hz of the three exemplars were the following: [ɛ]:
F1 = 519 (± 11), F2 = 1906 (± 23); [e]: F1 = 389 (± 7), F2 = 1967 (± 20); [i]: F1 = 327 (±
6), F2 = 2108 (± 51). The mean Euclidean distances in mel in the F1-F2 plane (at the
vowel midpoint) between all combinations of the three vowel types were [ɛ-e] 130 mel,
[e-i] 88 mel and [ɛ-i] 212 mel. Lastly, the portions containing only the steady-state vowel
signal were extracted from the selected words. All nine stimulus audio files were
ramped with 10 ms Gaussian on- and offsets and normalized for duration (200 ms) and
peak amplitude (70 dB SPL).
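
These distances can be re-derived from the mean formant values given above. The short R sketch below does so, assuming the common Hz-to-mel conversion mel = 2595·log10(1 + Hz/700); the small deviation on one pair reflects rounding of the mean formant values.

```r
# Hedged re-computation of the pairwise Euclidean distances between
# the stimulus vowels in the mel-scaled F1-F2 plane.
hz2mel <- function(hz) 2595 * log10(1 + hz / 700)

f <- rbind(E = c(F1 = 519, F2 = 1906),   # [ɛ]
           e = c(F1 = 389, F2 = 1967),
           i = c(F1 = 327, F2 = 2108))

round(dist(hz2mel(f)))
# ~130 mel for [ɛ-e], ~211 mel for [ɛ-i], ~88 mel for [e-i]: close to
# the reported 130, 212 and 88 mel.
```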
2.2.1. Behavioral test
In an AX (same-different) discrimination task, we assessed the attentive
discrimination of the allophonic variation [ɛ-e]. Each of the three variants of the two
vowel categories composing the allophonic variation was combined with one another
and with the three tokens of the other vowel category composing the pair. Thus, three pair
types were tested in all: [ɛ-ɛ], [ɛ-e] and [e-e]. The inter-stimulus interval was 800 ms, and
the trials' initial silence was 500 ms. Each of the 54 stimulus pairs occurred twice. The
complete set of 108 stimulus pairs was presented in random order. The listeners indicated
whether the sounds of a pair were identical or different. The experimental arrangement
provided by Praat 5.2 was used in these tests. The subjects were tested in a soundproof
room with a laptop and headphones.
2.2.2. AEP recordings
In an oddball paradigm, the MMN responses to the phonemic contrast [e-i] and the
allophonic pair [ɛ-e] were recorded. We compared responses to acoustic distinctions
generated by the assimilation process with responses to acoustic distinctions associated
with phonemic contrasts. In selecting the relevant distinctions, we carefully considered
the acoustic distances between the possible stimulus pairs. We used the variant [e] of the
mid vowel for both the allophonic and the phonemic condition to reduce the difference in
acoustic distance between the two conditions. More specifically, we decided to pair the
phoneme [i] with the variant [e] to generate a mean acoustic distance (88 mel) more
similar to that of the allophonic pair [ɛ-e] (130 mel) than the alternative pair [i-ɛ] (212 mel). This set-up
permitted us to closely match the acoustic deviance between stimuli, a
parameter well known to influence MMN amplitude and latency (Näätänen et al. 1997;
Näätänen et al. 2007). In other words, by keeping the acoustic distance between the
tested contrasts as constant as possible, we increased the chances of measuring abstract
processes rather than purely physical differences related to the stimuli.
Stimulus sequences of 1000 trials were randomly presented in each
block. In each sequence, one vowel type served as the standard (85% of the trials) and the
remaining vowel of the sound pair was the deviant. For both vowel pairs, the roles of
standard and deviant were reversed in separate blocks. Therefore, a total of four blocks
were recorded. Stimulus sequences were presented with a variable inter-stimulus interval
of 500 ms to 550 ms. During the recording, the subject sat in an acoustically shielded room
watching a silent movie. The subject was instructed to disregard the sounds presented via
loudspeakers. The stimuli were presented using E-Prime 2.0, and the order of the blocks
was counterbalanced between participants.
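
As an illustration, the R sketch below generates one such block. The constraint that no two deviants be adjacent is our assumption (a common practice in oddball designs), since the exact randomization scheme is not reported.

```r
# Hedged sketch of one oddball block: 1000 trials, 85% standards and
# 15% deviants; each deviant is placed in a distinct gap between
# standards, so no two deviants are adjacent (assumption of this sketch).
set.seed(1)
n_trials <- 1000
n_dev <- round(0.15 * n_trials)
n_std <- n_trials - n_dev

trial <- rep("standard", n_std)
gaps <- sort(sample(n_std + 1, n_dev), decreasing = TRUE)
for (g in gaps) trial <- append(trial, "deviant", after = g - 1)

# Each trial plays one of the three natural tokens of its vowel type;
# the inter-stimulus interval varies between 500 and 550 ms.
token <- sample(3, n_trials, replace = TRUE)
isi_ms <- runif(n_trials, 500, 550)
```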
AEPs were recorded (0.1-100 Hz, -2 dB points, sampling rate 250 Hz) with a
64-channel ActiCap system (Brain Products). Vertical eye movements were monitored using
electrodes attached above and below the right eye, and horizontal movements with
electrodes attached to the outer canthi of each eye. The online reference electrode was
FCz. Impedance was maintained under 5 kΩ. Off-line signal processing was performed
with the Brain Vision Analyzer (Brain Products) software package. The EEG was filtered
with a bandpass of 1-25 Hz (12 dB/oct), and the raw data were re-referenced against the
average of the left and right mastoids. Epochs extended from 100 ms before to 600 ms after
stimulus onset. Standard and deviant epochs were averaged and included a pre-stimulus
baseline of 100 ms. The ERP responses to the initial three standard stimuli of each block
and to the standard stimuli that immediately followed the deviants were not included in the
analyses. The averaged data were baseline-corrected over a pre-stimulus interval of
100 ms. Epochs with an amplitude change exceeding 75 µV at any of the electrodes were
rejected. All remaining standard and deviant epochs were included in the identity MMN
analysis.
For the analysis of the amplitude and peak latencies of the MMN component, we
selected a time window based on a visual inspection of the grand average data across all
of the subjects. For assessing the MMN component, we adopted the identity MMN
(iMMN) approach. The MMN is reflected in a difference waveform calculated by
subtracting the AEP response to the standard stimuli from the response to the deviant
stimuli within the same block. In contrast, the iMMN is calculated using the recordings of two
corresponding blocks. For instance, the standard [e] (of the block with [e] standard and [i]
deviant) was subtracted from the [e] deviant of the reverse block ([i] standard and [e] deviant). The
iMMN approach eliminates variation in ERP morphology that may result from purely
acoustic differences and therefore permits the observation of memory representation
contributions (e.g., Pulvermüller and Shtyrov 2006). The MMN latency corresponded to
the time at which the highest negative amplitude peak in the MMN time window
occurred (120-200 ms). This time window was selected based on the grand average across
all subjects and was motivated by the expectation of observing the MMN response
100-200 ms after the onset of the deviant sound. The MMN amplitude was obtained by
measuring the mean amplitude (µV) contained within a 50 ms time window centered at
the MMN latency peak. The analysis was based on the electrodes Fz, Cz and FCz.
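
The peak-picking procedure just described can be condensed into a few lines of R. The synthetic waveforms stand in for the averaged responses of the two corresponding blocks; the list avg and its element names are assumptions of the sketch.

```r
# Hedged sketch of the iMMN computation for one subject at one
# electrode, with waveforms in µV sampled at 250 Hz and a 100 ms
# pre-stimulus baseline. Synthetic placeholders: avg$e_deviant from the
# block ([i] standard, [e] deviant), avg$e_standard from the reverse block.
set.seed(2)
n <- 175   # 700 ms at 250 Hz
avg <- list(e_standard = rnorm(n, 0, 0.5), e_deviant = rnorm(n, -0.5, 0.5))

fs <- 250
t_ms <- seq(-100, by = 1000 / fs, length.out = n)

imm <- avg$e_deviant - avg$e_standard   # deviant minus standard, same vowel

# Peak latency: most negative point in the 120-200 ms window.
win <- which(t_ms >= 120 & t_ms <= 200)
latency_ms <- t_ms[win[which.min(imm[win])]]

# Amplitude: mean µV within a 50 ms window centered on the peak.
amplitude_uv <- mean(imm[abs(t_ms - latency_ms) <= 25])
```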
2.2.3. Statistical analysis
Discrimination performance was measured as the percentage of 'same' and 'different'
responses and in terms of d′ (d-prime). This type of analysis was chosen because a
percentage analysis of the correct responses on the different pairs alone is not a very
meaningful measure of discrimination. The d-prime analysis is based on the Detection
Theory proposed by Macmillan and Creelman (2005), which interprets the listeners'
responses in terms of the listeners' tendency to respond 'same' or 'different'. According
to this analysis, four types of response rates are calculated: (1) hit rate; (2) miss rate; (3)
false alarm rate; (4) correct rejection rate. Finally, based on the hit and false-alarm
probabilities, a d′ value is calculated. Briefly, the best subject maximizes hits and
minimizes false alarms; thus, the larger the difference between hits and false alarms,
the better the subject's performance. The statistic d′ is a measure of this difference.
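
In the yes/no formulation of Macmillan and Creelman (2005), d′ reduces to the difference of the z-transformed hit and false-alarm rates. The counts in the R sketch below are hypothetical placeholders, not our data.

```r
# Hedged sketch of the d' computation: a "different" response to a
# different pair counts as a hit, a "different" response to a same
# pair as a false alarm. Trial and response counts are hypothetical.
n_different_pairs <- 36
n_same_pairs <- 72
hits <- 34
false_alarms <- 11

d_prime <- qnorm(hits / n_different_pairs) -
  qnorm(false_alarms / n_same_pairs)
d_prime
```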
To analyze the AEP data, separate one-way ANOVAs were performed with mean
amplitude (µV) as the dependent measure and probability (standard vs. deviant) as the
independent measure, in order to assess that reliable MMN components were elicited. Separate
two-way ANOVAs were performed with latency (ms) and mean amplitude (µV) as the
dependent measures to analyze the MMN component. The independent variables were
vowel pair status (allophonic vs. phonemic) and electrode (Fz, Cz and FCz).
Additionally, the interaction vowel pair status × electrode was included in the analyses.
Lastly, two separate one-way ANOVAs with a grid of six electrodes (C3, Cz,
C4, F3, Fz and F4) were performed to analyze whether hemispheric asymmetries could
be observed. The dependent measure was mean amplitude and the independent variable was
laterality (3-line (C3, F3), z-line (Fz, Cz), and 4-line (C4, F4)).
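
A minimal sketch of the central two-way ANOVA follows; the long-format data frame, with one row per subject × condition × electrode, and its column names are assumptions of the example.

```r
# Hedged sketch of the two-way ANOVA on MMN mean amplitude, with vowel
# pair status (allophonic vs. phonemic) and electrode (Fz, Cz, FCz) as
# independent variables, including their interaction.
set.seed(3)
mmn <- expand.grid(subject = 1:12,
                   status = c("allophonic", "phonemic"),
                   electrode = c("Fz", "Cz", "FCz"))
mmn$amplitude <- rnorm(nrow(mmn), -2, 0.8)  # placeholder µV values

fit <- aov(amplitude ~ status * electrode, data = mmn)
summary(fit)
```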
3. Results
3.1. Acoustic-articulatory results
The mean formant values in Hz of the Tricase stressed vowels and the allophonic
variants are given in Table 1. In Figure 4 all the vowel tokens are plotted in a
two-dimensional F1-F2 space, using a logarithmic scale and ellipses on the data
(confidence level 68.8%). Figure 4 was realized by using the R software (McCloy 2015). In
both cases, data concerning the unstressed high vowels -i and -u triggering the metaphonic
adjustments are also reported.
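
A plot of this kind can be obtained with the phonR package cited above (McCloy 2015). The sketch below is indicative only: the argument names should be checked against the installed phonR version, and the data frame, its columns, and the formant means (echoing Table 1) are assumptions of the example.

```r
# Hedged sketch of an F1-F2 vowel plot with ellipses, in the spirit of
# Figures 1 and 4; one row per synthetic token.
library(phonR)
set.seed(1)
vdata <- data.frame(
  category = rep(c("i", "e/_i", "e/_e,a", "a", "o/_u", "u"), each = 10),
  f1 = rnorm(60, rep(c(255, 406, 506, 665, 465, 322), each = 10), 20),
  f2 = rnorm(60, rep(c(2151, 1930, 1708, 1349, 862, 753), each = 10), 50))

with(vdata, plotVowels(f1, f2, vowel = category,
                       plot.tokens = TRUE, plot.means = TRUE,
                       ellipse.line = TRUE,
                       ellipse.conf = 0.688,  # assumed to take the confidence level
                       pretty = TRUE))
```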
Vowel   | F1 mean | F2 mean | F1 DS | F2 DS | F1 min | F2 min | F1 max | F2 max
-i      | 327     | 2030    | 18    | 57    | 273    | 1926   | 370    | 2147
/i/     | 255     | 2151    | 18    | 57    | 232    | 2065   | 286    | 2253
e/_i    | 406     | 1930    | 21    | 76    | 393    | 1821   | 470    | 2052
e/_u    | 472     | 1717    | 22    | 42    | 440    | 1640   | 515    | 1774
e/_e,a  | 506     | 1708    | 18    | 51    | 478    | 1632   | 538    | 1786
/a/     | 665     | 1349    | 37    | 73    | 616    | 1266   | 713    | 1508
o/_e,a  | 513     | 890     | 29    | 53    | 479    | 790    | 577    | 964
o/_u    | 465     | 862     | 25    | 27    | 433    | 824    | 491    | 918
o/_i    | 500     | 907     | 33    | 62    | 454    | 833    | 569    | 1021
/u/     | 322     | 753     | 40    | 79    | 249    | 669    | 378    | 960
-u      | 347     | 705     | 34    | 96    | 314    | 600    | 436    | 866

Table 1: Mean formant values F1-F2 in Hz of the Tricase vowels. DS = Standard Deviation; Min
= Minimum; Max = Maximum.


Figure 4: F1-F2 scatterplot on logarithmic scale of the Tricase stressed vowels. Significant
metaphonic adjustments of the mid vowels /e/, /o/ are shown: i.e., e/_i, e/_u and o/_u. Ellipses on
data, confidence level 68.8%.

The results of the independent t-tests are given in Table 2, where the significant
differences of the action of the unstressed high vowels on the stressed mid vowels are
indicated by asterisks. As concerns the F1 values, e/_i and e/_u are different from each
other and from e/_e,a. In contrast, while o/_i and o/_u are also different from each other,
only o/_u is different from o/_e,a. This reveals the presence of metaphonic adjustments in
which [ɛ] is raised before -i and -u but [ɔ] is raised only before -u, confirming the previously
obtained data (Grimaldi 2003; Grimaldi 2009). The F2 values also showed a significant
effect, for [ɛ] only: i.e., e/_i is different both from e/_e,a and from e/_u. These significant
effects are better captured in the graphical representation of the data (Figure 4), where we can
observe that [ɛ] is shifted upwards in the vowel space, producing two allophonic variants
depending on the unstressed high vowel triggering metaphony (i.e., -i or -u). Conversely,
[ɔ] is shifted upwards producing only one allophonic variant, due to the
action of -u. Finally, the significant effect for F2 is observable in the advancement of e/_i
with respect to e/_e,a and e/_u.

Vowel category | F1                        | F2
e/_i ~ e/_e,a  | t[18] = -10.825, p<0.05*  | t[18] = 7.184, p<0.05*
e/_u ~ e/_e,a  | t[17] = -3.563, p<0.05*   | t[17] = 0.401, p=0.694
e/_u ~ e/_i    | t[19] = 6.671, p<0.05*    | t[19] = -7.405, p<0.05*
o/_i ~ o/_e,a  | t[18] = -0.948, p=0.356   | t[18] = 0.669, p=0.512
o/_u ~ o/_e,a  | t[18] = -3.877, p<0.05*   | t[18] = -1.446, p=0.165
o/_u ~ o/_i    | t[18] = -2.598, p<0.05*   | t[18] = -2.093, p=0.058

Table 2: Independent t-tests (α = 0.05) for the F1 and F2 values of each vowel category
considered. Significant effects are highlighted by asterisks.

Figure 5: F1 (Hz) statistical distribution of e/_i, e/_u, e/_e,a in the acoustic space with respect to
unstressed -i, -u (on the left) and of o/_i, o/_u, o/_e,a with respect to unstressed -u (on the right). On
the left it is shown that e/_i and e/_u do not overlap with -i, -u; on the right, that o/_u does
not overlap with -u.

Note that the metaphonic raising of [e] before -i, -u and of [o] before -u does not
generate complete assimilation of the mid vowels to the unstressed high vowels (cf. Figure
4). This is demonstrated in Figure 5, where the statistical distribution of the raised and
non-raised mid vowels in the acoustic space is compared with that of the unstressed vowels
triggering the metaphonic raising. The F1 parameters distinguishing [e] before -i, -u and [o]
before -u do not overlap with the F1 parameters of the unstressed vowels -i and -u that
trigger the metaphonic adjustments. That is, both for [e] and for [o] there is no complete
assimilation to the unstressed vowels -i, -u. So, the mid vowels are raised but do not
become high vowels. Figure 5 also shows clearly that when the unstressed vowels do
not trigger metaphonic adjustments, as in the case of [ɔ] before -i and [ɔ] before -e,a,
there is an overlap of F1 parameters. Interestingly, the statistical distribution of the F1
parameters in Figure 5 indicates that the distribution of the e/_i and e/_u F1 values tends to
mirror the distribution of the unstressed -i and -u F1 values respectively. The same happens
for o/_u, which tends to mirror the distribution of the unstressed -u F1 values. This pattern may
be clarified by our articulatory data.
Regarding the articulatory data, the SS analysis and the TCs of the stressed mid
vowels, together with the unstressed high vowels -i, -u, are shown in Figure 6(a-d):
the tongue blade is up on the right, the tongue body is up at the center, and the tongue root is
below on the left of the TCs. Observe that the different curves in these figures involve statistical
tendencies extracted from actual tongue contours. Thus, what may appear to be
millimetric or even submillimetric changes in curve shape are in fact the result of
significantly different anatomical tongue articulations that generate clearly different
acoustic effects. In particular, Figure 6(a) shows that the tongue root and the tongue body
of [e] before -i are advanced and raised, respectively, with respect to those of [ɛ] before
-e,a. Figure 6(b) indicates that the tongue root and tongue body of [e] before -u are only
slightly advanced and raised, respectively, with respect to those of [ɛ] before -e,a.
However, this slight difference generated significant differences, as shown on the one
hand by the SS analysis on the left, and on the other hand by the significant effect noted at
the acoustic level for e/_u (cf. Figure 4 and Table 2). The difference in tongue root
involvement in the case of [e] before -i with respect to [e] before -u is better
noticeable in Figure 6(c). Here one can observe that in the case of [e] before -i (on the
left) there is both tongue root advancement and tongue body raising, whereas in the case
of [e] before -u (on the right) there is mainly tongue body raising, but little tongue root
action. Noticeably, the tongue body of [e] before -i is not as raised as that of unstressed
-i: this suggests that tongue body raising results, in this case, as an inertial consequence
of tongue root advancement rather than being the critical gesture involved in the
metaphonic assimilation. The involvement of the tongue root for [e] before -i is better
discernible if we look at the mid back vowel in Figure 6(d), where it is shown on the right that
the metaphonic adjustment of [o] before -u involves only tongue body raising with no
tongue root advancement, when compared with what happens in the case of [o] before
-e,a.


Figure 6(a-d): (a) SS on the left and TC on the right of [ɛ] before -e,a vs. [e] before -i; (b) SS on
the left and TC on the right of [ɛ] before -e,a vs. [e] before -u; (c) TC of [ɛ] before -e/a and [e]
before -i compared with unstressed -i (i/e__ in the caption) on the left; TC of [e] before -i and [e]
before -u compared with unstressed -u (u/e__ in the caption) on the right; (d) SS of [ɔ] before -e/a
and [o] before -u on the left and TC of [ɔ] before -e/a and [o] before -u compared with unstressed
-u on the right.

3.2. Behavioral and AEP results


The percentage analysis of the participants' 'same' and 'different' responses
indicated that the vowels composing the allophonic pair [ɛ-e] were judged as
different at a high rate (94%), whereas both the vowel pairs composed of an identical
vowel type showed a high percentage of 'same' responses: i.e., [ɛ-ɛ] = 90%; [e-e] = 80%
(see Figure 7). Due to the experimental design, especially given the high number of trials
(n = 180) for each subject, a set of z-tests was used to assess the statistical
significance of the 'same' and 'different' responses. These tests confirmed that the
percentages of the same-different responses were significantly different within each
vowel pair, as well as between the three tested vowel pairs (p < 0.001).
As concerns the d-prime analysis, used to investigate the listeners' tendency
to respond 'same' or 'different' (cf. Macmillan and Creelman 2005), the mean d′ score
was 2.55, which indicated an accurate discrimination between the allophones.

Figure 7: Percentages of same/different responses of the AX discrimination task.

Concerning the AEP data, significant MMNs were elicited in the allophonic (F(1,66)
= 14.592, p < 0.001) and phonemic conditions (F(1,66) = 6.047, p < 0.05). The
amplitude ANOVA revealed no significant main effect (vowel pair status: F(1,66) =
1.052, p = 0.31; electrode: F(2,66) = 0.191, p = 0.83) or interaction (vowel pair status ×
electrode: F(2,66) = 0.204, p = 0.82). In contrast, the ANOVA on the latency of the
MMN revealed a significant main effect of vowel pair status (F(1,66) = 6.017, p <
0.05), which indicated that the latency of the phonemic condition was significantly
earlier. No main effect was observed for the factor electrode (F(2,66) = 0.283, p = 0.76)
or for the interaction vowel pair status × electrode (F(2,66) = 0.193, p = 0.83). The laterality
ANOVAs revealed no significant amplitude difference between the electrodes positioned
over the right and left hemispheres for either the allophonic (F(2,69) = 0.236, p = 0.79) or
the phonemic condition (F(2,69) = 0.518, p = 0.60). In Table 3 the grand-average MMN
mean peak amplitudes and latencies at the Fz, Cz, FCz electrodes are shown (values are
averaged across the midline electrodes), whereas in Figure 8 the averaged ERP
responses are represented at the Fz electrode.
                     | Amplitude (µV)   | Latency (ms)
Allophonic condition | -1.898 (± 0.679) | 182 (± 37)
Phonemic condition   | -2.004 (± 0.895) | 154 (± 41)

Table 3: The grand-average MMN mean (Fz, Cz, FCz) peak amplitudes and latencies.

Figure 8: Top panel: the group-averaged ERP responses to standards (grey lines) and deviants
(black lines) at Fz for the allophonic (left) and phonemic (right) conditions in the 120-200 ms
window. Middle panel: topographic maps showing the iMMN peak latency activation viewed
from above: (a) allophonic condition's iMMN [ɛ] (left) and iMMN [e] (right); (b) phonemic
condition's iMMN [i] (left) and iMMN [e] (right). Bottom panel: the deviant-minus-standard
difference wave for the allophonic (grey line) and phonemic (black line) conditions at Fz in the
120-200 ms window.

4. Discussion
The aim of this paper was twofold: firstly, to investigate the articulatory dynamics of
the metaphonic process of the Tricase variety, which generates allophonic variants;
secondly, to ascertain how the allophonic variants are perceptually computed with respect
to phonemic contrasts.
The acoustic findings confirmed the presence of metaphonic adjustments in the
Tricase variety, showing that: (i) the F1 values of [e] before -i and of [e] before -u are lowered
and significantly different from those of [ɛ] before -e,a; (ii) the F1 values of [o] before -u are
lowered and significantly different from those of [ɔ] before -i and of [ɔ] before -e,a; (iii) the F2
values of [e] before -i are raised and significantly different from those of [ɛ] before
-e,a and of [e] before -u (cf. Figure 4; Tables 1 and 2). These facts are further clarified by the
articulatory data. Comparing the SS and TC in Figure 6(a-d), we observed that in the
metaphonic adjustments of [e] before -i, [e] before -u and [o] before -u there is always
tongue body raising, but that only in the assimilation process of [e] before -i is the
advancement of the tongue root a distinct tongue gesture involved in creating an
allophonic variant. Hence, as expected, F1 values are always lowered by tongue body
raising. When tongue root advancement also occurs, as in the case of [e] before -i, F2
values are also increased, generating an advancement of the vowel in the acoustic space
(cf. Lindau 1978; Tiede 1996; Stevens 1998; Archangeli and Pulleyblank 1994; Gick
et al. 2006).
Overall, our findings suggest that in the Tricase variety the stressed vowels' F1/F2
values cluster into three allophonic variants:

i. [e] before -i;
ii. [e] before -u;
iii. [o] before -u.

These allophonic variants are differentiated by the specific feature specifications
that are spread by the unstressed high vowels: [+ATR] for [e] before -i, and [+high] for
[e] before -u and [o] before -u (see Grimaldi et al. 2010 and especially Calabrese and
Grimaldi 2013 for a phonological interpretation of these facts). This generates [+ATR]
[e̘] when the unstressed high vowel is -i, and [+high, -ATR] [i̙] and [+high, -ATR] [u̙]
when the unstressed high vowel is -u, where the [+high, -ATR] vowels are acoustically
lower than their [+ATR] counterparts, as in some African varieties (cf. Ladefoged
and Maddieson 1996: 305). The resulting superficial stressed vowel system of the
Tricase variety is represented in Figure 9:

[+high, +ATR]   i         u
[-high, +ATR]   e̘
[+high, -ATR]   i̙         u̙
[-high, -ATR]   ɛ    a    ɔ

Figure 9: The superficial stressed vowel system of the Tricase variety on the basis of the
metaphonic adjustments active. In accordance with current IPA usage, the diacritics ◌̘ and ◌̙
are used to indicate Advanced and non-Advanced Tongue Root, respectively.
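
The mapping just described can be summarized as a toy function. The R sketch below is only a compact restatement of the feature-spreading analysis above, not part of the original study; the output labels follow Figure 9.

```r
# Hedged toy encoding of the metaphonic feature spreading: -i spreads
# [+ATR] onto stressed [ɛ], -u spreads [+high] onto stressed [ɛ]/[ɔ];
# other unstressed vowels leave the target unchanged.
metaphony <- function(stressed, unstressed) {
  if (!stressed %in% c("\u025B", "\u0254")) return(stressed)          # only ɛ, ɔ are targets
  if (unstressed == "i" && stressed == "\u025B") return("e\u0318")    # [-high, +ATR] e̘
  if (unstressed == "u" && stressed == "\u025B") return("i\u0319")    # [+high, -ATR] i̙
  if (unstressed == "u" && stressed == "\u0254") return("u\u0319")    # [+high, -ATR] u̙
  stressed  # e.g. [ɔ] before -i is not raised in the Tricase variety
}

metaphony("\u025B", "i")  # "e̘"
metaphony("\u0254", "u")  # "u̙"
metaphony("\u0254", "i")  # "ɔ", unchanged
```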

The AX discrimination test showed that at the attentive level the allophonic pair
[ɛ-e] was judged as different at a high rate (94%), as compared with the vowel pairs
[e-e] and [ɛ-ɛ], composed of an identical vowel type, which were judged as the same. The
electrophysiological study, involving the pre-attentive level and a millisecond-precise
measurement of information processing in the brain, provided a more detailed picture of
the computations occurring in the perception of these pairs. The MMN amplitude
analysis revealed no significant difference between the allophonic (i.e., [ɛ-e]) and
phonemic (i.e., [e-i]) conditions, suggesting that the contrastive and non-contrastive
vowel pairs are equally computed in early speech processing and encoded in memory
representations. If the contrary were true, the amplitude of the allophonic contrast should
have been significantly reduced with respect to the phonemic contrast. At the same time,
we observed a shorter latency for the phonemic contrast. Generally, MMN peak
latencies are attributed to the acoustic distances between the stimuli: i.e., the MMN
latency steadily decreases with increasing acoustic deviation (Näätänen et al. 1997).
However, as noted above (in 2.2 and 2.2.2), the Euclidean distance of our phonemic
contrast [i-e] was 88 mel, whereas the Euclidean distance of the allophonic contrast [ɛ-e]
was 130 mel. Thus, if the difference in the peak latencies elicited by our stimuli were
purely due to acoustic reasons, we would have expected the allophonic contrast
to elicit a shorter latency than the phonemic one. This fact indicates that
phonological knowledge was actually accessed, and that the listeners' MMN responses to the
phonemic/allophonic distinctions were not simply an effect of the acoustic distance
between the vowel pairs. We suggest that the difference in latencies between the phonemic and
allophonic distinctions is due to a difference in perceptual parsing: whereas only
contrastive features need to be identified in phonemic distinctions, both contrastive
and non-contrastive features need to be identified in allophonic distinctions. Our
hypothesis is that the restriction of the search to contrastive sound properties only may
result in faster, less effortful cognitive processing. On the other hand, when both
contrastive and non-contrastive sound properties must be accessed and parsed, the
processing requires additional computational operations and related supplementary neural
activations, and is therefore slower. This analysis can be maintained only if mental
representations contain not only the contrastive features of vocalic phonemes but also the
non-contrastive features of the allophones that are the output of the metaphonic process
(cf. Miglietta, Grimaldi, and Calabrese 2013).
In synthesis, our MMN results demonstrated that the allophonic variants resulting
from the assimilatory process, which predictably shifts the target realization of the
phoneme /ɛ/ when followed by a high front vowel, are encoded in mental phonological
representations. It follows that this process, too, must have a cognitive reality. One can
plausibly assume that the same occurs with the other allophonic variants of metaphony
discussed above. We may conclude that metaphony in the Tricase variety is under
cognitive control, and therefore it is part of the phonological grammar of the Tricase
variety⁶.
From a general perspective, our results suggest that listeners recognize and detect
sound alternations conditioned by the linguistic environment. In the case of the
Tricase variety, learners converge on five phoneme categories and on rules that
predictably shift the target realization of the mid vowels when followed by high vowels.
This phonological knowledge provides the language users with the information necessary
to produce the appropriate vowel tokens in the assimilatory context.
5. Conclusions
In this work we tried to integrate different levels of analysis and different methods
in order to advance the understanding of the phonetics/phonology interface within a
cognitive perspective. Thanks to the acoustic-articulatory analysis, using US images of
the tongue, we showed that the metaphonic process of the Tricase variety is driven by the
spreading of two distinct features from the unstressed high vowels to the stressed mid
vowels: (i) [+ATR] when the trigger is -i; (ii) [+high] when the trigger is -u. Thus, further
evidence is given in favor of the use of US in analyzing the articulatory grounding of
phonological phenomena.

⁶ See Calabrese (2012) and Miglietta, Grimaldi, and Calabrese (2013) for discussion of further theoretical consequences of our perceptual study of Tricase metaphony.
The perceptual level of analysis was explored using behavioral and
electrophysiological methods. Focusing on the allophonic pair [ɛ-e] produced by the
metaphonic process and on the phonemic contrast [e-i], we showed that the MMN amplitudes
and latencies are compatible with the idea that allophones and phonemes are equally
computed in early speech processing and encoded in memory representations. This
finding suggests that predictable vowel allophonic alternations pattern with phonemic
contrasts in auditory perception.
Overall, our data support a model of phonology in which the acquisition of
phonemic categories occurs with the learning of phonetic distribution patterns and their
relationships within a grammar.
Acknowledgments
The authors would like to thank two anonymous reviewers for their useful suggestions.
Bibliography
Amenedo, Elena & Carles Escera. 2000. The accuracy of sound duration representation in the human brain determines the accuracy of behavioral perception. European Journal of Neuroscience. 12. 2570-2574.
Archangeli, Diana & Douglas Pulleyblank. 1994. Grounded Phonology. Cambridge, MA: MIT Press.
Boersma, Paul & David Weenink. 2011. Praat: doing phonetics by computer (Computer program), Version 5.2. http://www.praat.org/.
Calabrese, Andrea. 1998. Metaphony revisited. Rivista di Linguistica. 10. 7-68.
Calabrese, Andrea. 2011. Metaphony in Romance. In Marc van Oostendorp, Colin Ewen, Elizabeth Hume & Keren Rice (eds.), The Blackwell Companion to Phonology, 2631-2661. Wiley-Blackwell.
Calabrese, Andrea. 2012. Auditory representations and phonological illusions: A linguist's perspective on the neuropsychological bases of speech perception. Journal of Neurolinguistics. 25. 355-381.
Calabrese, Andrea & Mirko Grimaldi. 2013. L'interfaccia fonetica-fonologia nella metafonia del Salento meridionale. In Antonio Romano & Mario Spedicato (eds.), Sub voce Sallentinitas. Studi in onore di p. Giovan Battista Mancarella, 277-288. Lecce: Edizioni del Grifo.
Calabrese, Andrea & Mirko Grimaldi. In preparation. Microvariation in a metaphonic system: the case of the Tricase dialect. Acoustic and articulatory evidence.
Costagliola, Angelica. 2013. Dialectologie et phonétique expérimentale: Analyse acoustique et articulatoire de certaines variétés du Salento central (Pouilles, Italie du Sud). Unpublished PhD thesis. Laboratoire de Phonétique et Phonologie, CNRS/Sorbonne Nouvelle - Paris III.
Davidson, Lisa. 2006. Comparing tongue shapes from ultrasound imaging using smoothing spline analysis of variance. Journal of the Acoustical Society of America. 120(1). 407-415.
Eulitz, Carsten & Aditi Lahiri. 2004. Neurobiological evidence for abstract phonological representations in the mental lexicon during speech recognition. Journal of Cognitive Neuroscience. 16(4). 577-583.
Gick, Bryan, Douglas Pulleyblank, Fiona Campbell & Ngessimo Mutaka. 2006. Low vowels and transparency in Kinande vowel harmony. Phonology. 23. 1-20.
Gick, Bryan, Ian Wilson & Donald Derrick. 2013. Articulatory Phonetics. Hoboken, NJ: Wiley-Blackwell.
Grimaldi, Mirko. 2003. Nuove ricerche sul vocalismo tonico del Salento meridionale. Analisi acustica e trattamento fonologico dei dati. Alessandria: Edizioni dell'Orso.
Grimaldi, Mirko. 2009. Acoustic correlates of phonological microvariations. The case of unsuspected highly diversified metaphonetic processes in a small area of Southern Salento (Apulia). In Danièle Torck & W. Leo Wetzels (eds.), Romance Languages and Linguistic Theory 2006. Selected papers from 'Going Romance', Amsterdam 7-9 December 2006, 89-109. Amsterdam/Philadelphia: John Benjamins.
Grimaldi, Mirko, Barbara Gili Fivela, Francesco Sigona, Michele Tavella, Paul Fitzpatrick et al. 2008. New technologies for simultaneous acquisition of speech articulatory data: 3D articulograph, ultrasound and electroglottograph. Paper presented at LangTech 2008, Rome, Italy. 1-5.
Grimaldi, Mirko, Andrea Calabrese, Francesco Sigona, Luigina Garrapa & Bianca Sisinni. 2010. Articulatory grounding of Southern Salentino harmony processes. Paper presented at the 11th Annual Conference of the International Speech Communication Association (ISCA), Interspeech, Spoken Language Processing for All, Makuhari, Japan, 26-30 September 2010. 1561-1564.
Gu, Chong. 2002. Smoothing Spline ANOVA Models. New York: Springer.
Ladefoged, Peter & Ian Maddieson. 1996. The Sounds of the World's Languages. Oxford: Blackwell.
Lahiri, Aditi & Henning Reetz. 2001. Underspecified recognition. In Carlo Gussenhoven & Natasha Warner (eds.), Labphon VII, 637-675. Berlin: Mouton.
Lahiri, Aditi & Henning Reetz. 2010. Distinctive features: Phonological underspecification in processing. Journal of Phonetics. 38. 44-59.
Li, Min, Chandra Kambhamettu & Maureen Stone. 2005. Automatic contour tracking in ultrasound images. Clinical Linguistics and Phonetics. 19. 545-554.
Lilienthal, Janine. 2009. The articulatory and acoustic impact of Scottish English /r/ on the preceding vowel-onset. In 10th Annual Conference of the International Speech Communication Association (ISCA), Interspeech, Brighton, September 6-10 2009. 2819-2822.
Lindau, Mona. 1978. Vowel features. Language. 54. 541-563.
Loporcaro, Michele. 2009. Profilo linguistico dei dialetti italiani. Roma-Bari: Laterza.
Macmillan, Neil A. & C. Douglas Creelman. 2005. Detection theory: A user's guide. Mahwah, NJ: Erlbaum.
Mancarella, Giovan Battista. 1998. Salento. Monografia regionale della Carta dei Dialetti Italiani. Lecce: Edizioni del Grifo.
McCloy, Daniel. 2015. phonR: tools for phoneticians and phonologists. R package version 1.0-1.
Miglietta, Sandra, Mirko Grimaldi & Andrea Calabrese. 2013. Conditioned allophony in speech perception: An ERP study. Brain & Language. 126(3). 285-290.
Näätänen, Risto, Anne Lehtokoski, Mietta Lennes, Marie Cheour, Minna Huotilainen, Antti Iivonen et al. 1997. Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature. 385. 432-434.
Näätänen, Risto. 2001. The perception of speech sounds by the human brain as reflected by the mismatch negativity (MMN) and its magnetic equivalent (MMNm). Psychophysiology. 38. 1-21.
Näätänen, Risto, Petri Paavilainen, Teemu Rinne & Kimmo Alho. 2007. The mismatch negativity (MMN) in basic research of central auditory processing: A review. Clinical Neurophysiology. 118. 2544-2590.
Näätänen, Risto, Teija Kujala & István Winkler. 2011. Auditory processing that leads to conscious perception: A unique window to central auditory processing opened by the mismatch negativity and related responses. Psychophysiology. 48. 4-22.
Parlangeli, Oronzo. 1953. Sui dialetti romanzi e romaici del Salento. Memorie dell'Istituto Lombardo di Scienze e Lettere, classe di Lettere, Scienze Morali e Storiche, 15-16(III). 93-198.
Peperkamp, Sharon, Michèle Pettinato & Emmanuel Dupoux. 2003. Allophonic variation and the acquisition of phoneme categories. In Barbara Beachley, Amanda Brown & Frances Conlin (eds.), Proceedings of the 27th Annual Boston University Conference on Language Development, 650-661. Somerville, MA: Cascadilla Press.
Picton, Terrence W., Claude Alain, Leun Otten, Walter Ritter & André Achim. 2000. Mismatch negativity: different water in the same river. Audiology & Neuro-Otology. 5. 111-139.
Pulvermüller, Friedemann & Yury Shtyrov. 2006. Language outside the focus of attention: The mismatch negativity as a tool for studying higher cognitive processes. Progress in Neurobiology. 79. 49-71.
Rohlfs, Gerhard. 1966-1969. Grammatica storica della lingua italiana e dei suoi dialetti, 3 voll. Torino: Einaudi. Or. ed. 1949-1954, Historische Grammatik der Italienischen Sprache und ihrer Mundarten. Bern: A. Francke AG.
Romano, Antonio. 2013. Il vocalismo del dialetto salentino di Galatone: differenze di apertura metafonetiche, tracce isolate di romanzo comune o interferenze diasistematiche? In Antonio Romano & Mario Spedicato (eds.), Sub voce Sallentinitas. Studi in onore di p. Giovan Battista Mancarella, 247-276. Lecce: Edizioni del Grifo.
Sams, Mikko, Petri Paavilainen, Kimmo Alho & Risto Näätänen. 1985. Auditory frequency discrimination and event-related potentials. Clinical Neurophysiology. 62. 437-448.
Savoia, Leonardo & Martin Maiden. 1997. Metaphony. In Martin Maiden & Mair Parry (eds.), The Dialects of Italy, 15-25. London: Routledge.
Sigona, Francesco, Antonio Stella, Mirko Grimaldi & Barbara Gili Fivela. In press. MAYDAY: A software for multimodal articulatory data analysis. In Antonio Romano (ed.), Aspetti prosodici del raccontare: dalla letteratura orale al parlato dei media. X Convegno Nazionale dell'Associazione Italiana di Scienze della Voce, Università di Torino, 22-24 gennaio 2014. Roma: Bulzoni.
Stehl, Thomas. 1988. Apulien und Salento. In Günter Holtus, Michael Metzeltin & Christian Schmitt (eds.), Lexikon der Romanistischen Linguistik, Vol. IV, 695-716. Tübingen: Niemeyer.
Stevens, Kenneth N. 1998. Acoustic Phonetics. Cambridge, MA: The MIT Press.
Stone, Maureen. 2005. A guide to analyzing tongue motion from ultrasound images. Clinical Linguistics and Phonetics. 19. 455-502.
Sussman, Elyse S., Sufen Chen, Jonathan Sussman-Fort & Elizabeth Dinces. 2013. The five myths of MMN: Redefining how to use MMN in basic and clinical research. Brain Topography. 27(4). 553-564.
Tiede, Mark E. 1996. An MRI-based study of pharyngeal volume contrasts in Akan and English. Journal of Phonetics. 24. 399-421.
Winkler, István & István Czigler. 2012. Evidence from auditory and visual event-related potential (ERP) studies of deviance detection (MMN and vMMN) linking predictive coding theories and perceptual object representations. International Journal of Psychophysiology. 83(2). 132-143.
