Partiels by Gérard Grisey and Gondwana by Tristan Murail are two landmark pieces
of what can be identified as the spectral movement.1 Both pieces share that they are acoustic
realizations of synthesis processes typically achieved by electronic and/or digital means: additive
and frequency modulation (FM) synthesis, respectively.2 Both pieces also share that they use
such synthesis processes as models rather than as accurate, literal translations of these
techniques into the acoustic instrumental realm. In
Partiels, Grisey used the analysis of a trombone pitch to understand the harmonic material that
constituted the trombone timbre at that given pitch, and used the harmonic series and the relative
intensity, duration, and appearance time of each partial as models for the piece. In Gondwana,
Murail used the data obtained from the calculation of the pitches produced by the sidebands that
result from an FM equation. In this case, a formula determines which and how many sidebands
will be produced when a carrier frequency of a given value is modulated by a modulator
frequency of another given value. Such realizations are not literal mainly because
the synthesis techniques mentioned, when using electronic means, use sine waves, which are the
simplest type of periodic wave. In contrast, Grisey and Murail had to deal with the complex
sounds of acoustic instruments, meaning that the resultant timbres include series of additional
harmonic partials in relation to their electronic counterparts.3 In this project I sought to apply
a similar approach to the realization of speech synthesis, which is almost exclusively done with
digital means.

1 Joshua Fineberg, "Spectral Music," Contemporary Music Review 19, no. 2 (2000): 2.
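The sideband calculation underlying the FM model can be sketched as follows. The carrier and modulator values below are illustrative assumptions, not the ones Murail used in Gondwana:

```python
# Sidebands produced when a carrier frequency fc is modulated by a
# modulator frequency fm appear at fc + n*fm and fc - n*fm.
# Negative results fold back into the audible range as |fc - n*fm|.

def fm_sidebands(carrier, modulator, order=5):
    """Return the sorted, de-duplicated sideband frequencies up to `order`."""
    bands = set()
    for n in range(order + 1):
        bands.add(carrier + n * modulator)
        bands.add(abs(carrier - n * modulator))  # fold-back of negative values
    return sorted(bands)

# Illustrative values, not taken from the score:
print(fm_sidebands(440.0, 100.0, order=3))
# [140.0, 240.0, 340.0, 440.0, 540.0, 640.0, 740.0]
```

Each increment of the order adds one sideband pair, which is why a single carrier/modulator pair can generate a whole harmonic or inharmonic field.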
Speech synthesis by analog acoustic means has been explored in the past, having a strong
impact on the development of the study of phonetics. In his thesis on speech synthesis, Sami
Lemmetty described several early examples of mechanical speech synthesis. These examples
include Christian Kratzenstein's resonators, which produced the vowels /a/ /e/ /i/ /o/ /u/;
Wolfgang von Kempelen's models of his Acoustic Mechanical Speech Machine, which consisted
of a couple of reeds, a box with a variety of resonators, and a flexible end that could be
manipulated to filter the sound in the way the mouth filters the sounds produced by the vocal
folds (the machine could produce some vowels and some nasal and fricative consonants);
Charles Wheatstone's and Alexander Graham Bell's reconstructions of von Kempelen's
machine; and Robert Willis's exploration of the production of vowels by
means of organ pipe-like tubes.4 These examples, however, are mainly based on producing
sound in ways traditional-ensemble instruments cannot. In any case, the production of movable
filters is nowadays emulated through digital algorithms. Two common approaches to speech
synthesis are the unit selection synthesis
approach and the statistical parametric synthesis approach. Both approaches require access to a
large database of recorded speech sounds. The main difference between the two is that
unit selection mixes and reproduces small bits of sound taken straight from the database, while
statistical parametric synthesis uses algorithms to analyze the samples, calculate possible
4 Sami Lemmetty, Review of Speech Synthesis Technology (master's thesis, Helsinki
University of Technology, 1999), 46.
Leal 3
outcomes, and synthesize the result of such operations.5 Understanding the details of any of these
techniques would require a high degree of specialization, which is beyond the scope of this
project. In any case, the use of analytical techniques to interpret the components of given speech
sounds (phonemes), and the later synthesis of an approximated sound using acoustic instruments,
follows an analogous logic.
A less data-intensive approach for synthesizing certain phonemes by electronic means is
subtractive synthesis. Knowledge of the formants of specific vowels and consonants can be used
to control specific band-pass filters applied to a complex sound (such as white noise). This
approach is similar to that of the examples of mechanical synthesis described above; such
movable filters, however, are difficult to emulate with acoustic instruments. The visual analysis
of such formants and the harmonic components of speech can be used as models for an additive
synthesis approach, which seems more reasonable in terms of data handling.
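As a sketch of this subtractive idea (assuming round-number formant regions for a vowel roughly like /a/, which are illustrative placeholders rather than measured values), band-pass filters can be applied to white noise as follows:

```python
import numpy as np
from scipy.signal import butter, lfilter

SR = 44100  # sample rate in Hz

def bandpass(signal, low, high, sr=SR, order=2):
    """Butterworth band-pass filter between low and high (Hz)."""
    b, a = butter(order, [low / (sr / 2), high / (sr / 2)], btype="band")
    return lfilter(b, a, signal)

# Assumed formant regions for a vowel roughly like /a/; real formant
# values vary from speaker to speaker.
formant_bands = [(700, 900), (1100, 1300), (2500, 2700)]

noise = np.random.default_rng(0).normal(size=SR)      # 1 s of white noise
vowel = sum(bandpass(noise, lo, hi) for lo, hi in formant_bands)
vowel = vowel / np.abs(vowel).max()                   # normalize to [-1, 1]
```

Moving the band edges over time would imitate the movable filters of the mechanical machines described above.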
The analysis of a large pool of phonemes, even using the simplest of the techniques
described above, implies an extensive amount of data, even if such data is reduced to visual
representations. Because the scope of this project was limited by the time available during the
second half of a semester's worth of course work (for one class), an achievable goal seemed to
be the synthesis of a single short sentence. I am more familiarized with the sound of Spanish
phonemes, and my limited understanding of English phonemes might have made it impossible
to correctly assess the resultant sounds and their similarity to English phonemes. Latin, however,
seems to be a language that could fulfill the needs of this project. Latin is a language historically
related to Western art music, which is often what we study in the
5 Heiga Zen, Keiichi Tokuda, and Alan W. Black, "Statistical Parametric Speech
Synthesis," Speech Communication 51, no. 11 (2009): 1039–1044.
academy. At the same time, Latin vowels and consonants are closer to Spanish phonemes than
English phonemes are. I created the following sentence: Ad sum lucem arte et laborum, which
roughly translates into "I am the light of arts and labor." I am not a religious person, but I
interpret the sentence as a manifestation of Marx and Engels' "spectre of communism," with
which they open their manifesto.6
Because vowels and most consonants can be understood as spectrally different (mainly
harmonic vs mainly inharmonic components), I decided it was better to approach them in two
different ways. Vowels can be synthesized more easily by means of additive synthesis because
they are mainly constituted of harmonic components. I then proceeded to record and analyze my
voice saying the Spanish vowels (which also correspond to most Latin vowels) /a/ /e/ /i/ /o/ /u/.
The recordings were analyzed with Spear,7 and an additive synthesizer was then built in Max.8
6 Friedrich Engels and Karl Marx, The Communist Manifesto (Kindle edition).
7 See http://www.klingbeil.com/spear/
8 See https://cycling74.com/products/max/#.WEiJqvkrI4k
To try an additive synthesis approach to reproduce the vowels, I set up fifteen sinewave
oscillators. The first oscillator to the left produced the fundamental pitch, estimated at
123 Hz, and each oscillator to the right produced a frequency resulting from the multiplication of
the fundamental by the oscillator's position in the series. Each oscillator was connected to a
number box, which displayed the resultant frequency in Hz, and to a gain slider, which allowed
for independent control of each partial's intensity.
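The patch described above can be approximated in code. The gain list here is a placeholder standing in for the values found by ear for a given vowel, not the actual settings of the patch:

```python
import numpy as np

SR = 44100            # sample rate in Hz
FUNDAMENTAL = 123.0   # estimated fundamental of the analyzed voice
N_PARTIALS = 15

# Placeholder gains, one per oscillator; the real values were tuned by ear
# for each vowel and are not reproduced here.
gains = [1.0, 0.8, 0.5, 0.9, 0.3, 0.2, 0.4, 0.1,
         0.05, 0.1, 0.05, 0.02, 0.02, 0.01, 0.01]
assert len(gains) == N_PARTIALS

t = np.arange(SR) / SR        # one second of time samples
# Harmonic k+1 runs at (k+1) times the fundamental, as in the patch.
vowel = sum(g * np.sin(2 * np.pi * FUNDAMENTAL * (k + 1) * t)
            for k, g in enumerate(gains))
vowel = vowel / np.abs(vowel).max()   # normalize before playback
```

Switching vowels then amounts to swapping one fifteen-element gain list for another, which is exactly what the message lists described below do in the patch.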
I observed the harmonic structure of each vowel and the relative intensity of each
harmonic, and used that information as an estimate to produce vowels using the additive
Leal 7
synthesizer. The model was based only on approximations because the production mechanisms
are different. Perception played the most important role in tweaking the gain for each oscillator. I
was satisfied when I identified the sound as a vowel more than as separate oscillators.
Comparing the results for each vowel was also important, because the formants of one vowel
were easier to perceive in relation to those of the other vowels. The results for each
vowel were translated as a list of messages for all the oscillators, allowing me to quickly go from
one vowel to another. I recorded the result for a sequence of vowels and analyzed it using Spear
(Figures 5 and 6). While the harmonic analysis of the synthesized vowels is evidently less
complex than that of real speech, the approximation is enough to recognize the timbral identity
of each vowel. To translate the synthesis to acoustic instruments, it was necessary to consider
the timbral qualities of those instruments. I selected the clarinet as the main
instrument since, when played at a very low dynamic, its timbre resembles that of a sinewave.9
The translation, however, is not free of technical issues: a bass clarinet is needed to produce the
pitch closest to 123 Hz, which is a B2 (123.47 Hz), and for the sound to resemble the timbre of a
sinewave it is crucial that the note is played at a pianissimo dynamic. Otherwise, the result
includes a series of partials that have undesired effects on the addition during the synthesis, and
therefore on the resultant timbre. Similarly, because of the relative intensity of the partials that
constitute each vowel, very high pitches are required, which are almost technically impossible
to produce in the required dynamic, especially considering that the lower partials are also played
pianissimo.
The issues mentioned above, among others, will be addressed at the end of this report.
For now, an approximation using samples will be handled with as much freedom as possible. To run
9 Joe Wolfe, "Clarinet Acoustics: An Introduction," University of New South Wales
School of Physics, 2016, http://newt.phys.unsw.edu.au/jw/clarinetacoustics.html#pff.
a first trial, I downloaded samples from the University of Iowa Electronic Music Studios
website.10 Pitches were determined by taking the frequencies produced by the oscillators in the
additive synthesizer and selecting the closest semitone. Using Audacity, a free Digital Audio
Workstation (DAW), each sample was loaded in a separate track, which allowed me to control
the gain of each partial independently. While all samples corresponded to a pianissimo
performance of the determined pitch, a great deal of gain control had to be used to produce a
timbre that could be identified as a vowel. This time, the model was taken from the additive
synthesizer, and again the results were slightly altered to produce the desired effect. A harmonic
analysis shows the results for the pseudo-analog synthesis of vowels in Figures 7 and 8.
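The mapping from oscillator frequency to the closest equal-tempered semitone can be sketched as follows (a standard calculation, assuming A4 = 440 Hz):

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def nearest_semitone(freq, a4=440.0):
    """MIDI number and name of the equal-tempered pitch closest to freq (Hz)."""
    midi = round(69 + 12 * math.log2(freq / a4))
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return midi, name

# The 123 Hz fundamental falls closest to B2 (123.47 Hz), as noted above.
print(nearest_semitone(123.0))        # (47, 'B2')
print(nearest_semitone(2 * 123.0))    # (59, 'B3'), the second partial
```

Because the true harmonics of 123 Hz drift away from the equal-tempered grid as the partial number rises, the rounding error grows in the upper partials, which is one source of the timbral deviations discussed here.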
10 Lawrence Fritts, "Musical Instrument Samples," University of Iowa Electronic Music
Studios, http://theremin.music.uiowa.edu/MIS.html.
It should be noted, however, that there are similarities in the analyses of the vowels across the
different media. For instance, in Figures 1 and 2 the difference between vowels /e/ and /i/ can
be observed mainly in the less nuanced harmonics for the vowel /i/ in the area between 500 and
750 Hz, and in the total lack of harmonics between 750 and 2000 Hz. Also, vowel /e/ presents
more nuanced harmonics than vowel /a/ in the 200 to 500 Hz area, but the upward progression
of harmonics for /e/ is less gradual, and there is a visible weakening of harmonics in the area
between 750 and 1500 Hz, while for /a/ the intensity decreases in a gradual fashion up to the
1600 Hz area, where there is a visible lack of harmonics. Vowel /o/ has a similar structure to /a/,
but with more nuanced harmonics in the 200 to 500 Hz area and a visible lack of harmonics in
the 1200 Hz area. /u/ appears similar to /o/, but with less nuanced harmonics from 500 to
1200 Hz, where /u/ keeps losing harmonics gradually, in contrast to the more sudden lack of
them in /o/. The difference between /o/ and /u/, however, seems to be much clearer in the higher
register, which is not displayed in the figures. Similar tendencies can be observed in Figures 5
and 6, and in Figures 7 and 8.
inharmonic sounds in the cases of the original speech and the pseudo-acoustic representation in
Two alternative approaches to the synthesis of vowels were explored: one including flutes
as well as clarinets, and one using only clarinets but taking into account the harmonic structures
of the individual pitches produced at different dynamics (using mainly the lower pitches to
produce the aggregate). These experiments were not explored in enough depth to produce
satisfying results. In the first case, my lack of understanding of the flute spectrum produced
undesired inharmonic elements. In the second case, there was not enough independence between
the partials.
To study the consonants spectra, I recorded my voice saying diverse syllables, such as
apa, aba, opo, ola, and so on. Then, once I had compared some of my analyses with the
descriptions and analyses included in articles and electronic resources about phonemes, I
proceeded to record
my voice saying the proposed sentence in Latin.11 Figure 9 shows the spectral analysis of the
whole sentence.
Figure 10 is another analysis of the sentence (same recording) using the Emu online software.12 I
only included part of the sentence here, since it was the section I was able to realize using the
pseudo-analog method.
Ad sum lucem
The phonemes /l/ and /m/ have a relatively harmonic spectrum, and their synthesis was
approached in the same way as in the case of vowels. /d/, /s/, and /ch/, however, include much
more noise and lack a clear
harmonic structure. /d/ is a stop phoneme, so the effect was achieved by suddenly cutting the
sound and allowing a softer, considerably less rich (in terms of partials) aggregate to appear right
after the stop, as the resonant segment of the /d/ sound. This is not a very accurate representation
of the phoneme, but it is close enough for the ear to interpret when put in a larger context. The /s/
sound, as can be observed, has an incredibly rich spectrum, consisting mainly of noise and
without a clear harmonic structure, and can be classified as a colored noise.13 As the average
listener cannot really differentiate between very similar colored noises, the task was to find a
percussive sound whose spectrum was also a colored noise with a similar appearance. The sound
12 See http://ips-lmu.github.io/EMU-webApp/
13 Joshua Fineberg, "Appendix 1: Guide to the Basic Concepts and Techniques of
Spectral Music," Contemporary Music Review 19, no. 2 (2000): 91.
of a hi-hat when opened by the foot mechanism was the most similar case among the available
sounds.14 An artificial envelope was applied to avoid the initial percussive hit of the hi-hat. A
similar approach was applied to the phoneme /ch/, but some high frequencies were added to
resemble its whistle-like characteristic. The result of the proposed approach can be listened to in
the accompanying audio. Below is the spectrogram of the pseudo-acoustic synthesis realized
with the Emu web app.
Ad sum lucem
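The artificial envelope used to tame the hi-hat's attack can be sketched as a fade-in applied to the start of the sample. The 60 ms fade length is an assumed value; in practice it was adjusted by ear:

```python
import numpy as np

SR = 44100  # sample rate in Hz

def suppress_attack(sample, fade_ms=60, sr=SR):
    """Linearly fade in the first fade_ms milliseconds so the initial
    percussive hit is removed while the sustained noise remains."""
    out = np.asarray(sample, dtype=float).copy()
    n = int(sr * fade_ms / 1000)
    out[:n] *= np.linspace(0.0, 1.0, n)  # ramp from silence to full gain
    return out

# Stand-in for the recorded hi-hat sample:
hit = np.random.default_rng(1).normal(size=SR)
shaped = suppress_attack(hit)
```

The same shaping, with added high-frequency content, approximates the /ch/ treatment described above.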
Further exploration should include the study of specific transitions between consonants and
vowels and between vowels and vowels, as there are specific harmonic structures and spectral
elements that would help to better interpret the phonemes. In addition, real acoustic synthesis
should be attempted with live performers.
Bibliography