
Timbre 2018: Timbre Is a Many-Splendored Thing 5-7 July 2018, Montreal, Quebec, Canada

Towards timbre solfege from sound features manipulation


Thiago Rossi Roque (1)†, Rafael Santos Mendes (2)
(1) IA, UNICAMP, Campinas, São Paulo, Brazil
(2) DCA/FEEC, UNICAMP, Campinas, São Paulo, Brazil

Corresponding author email: t072515@dac.unicamp.br

Aims/goals
This work presents initial results from research on a new sound synthesis technique that aims to
decompose the timbre of musical instruments into a set of unidimensional, predictable, and recognizable
parameters by using sound features. At this stage of the research, an analytical analysis/synthesis system
for the extraction and manipulation of perceptual spectral sound features has been developed, using the
Fractal Additive Synthesis technique as its basic framework.
Traditional sound synthesis techniques lack a good mapping from their control parameters to the
resulting sound (Hunt & Kirk, 2000). We believe this problem could be overcome with feature
manipulation techniques, by defining a set of manipulable, orthogonal, and unidimensional sound features.
Pierre Schaeffer, in his seminal work “Traité des objets musicaux”, proposed a view of timbre, in the
context of reduced listening, that breaks this phenomenon down using a set of morphological criteria
(Chion, 2009). This work proposes an analogy between Schaeffer’s morphological criteria and the concept
of sound features, so that the timbre phenomenon could be decomposed into abstractable “pertinent
traits”, similar to the traditional notions of pitch, duration, and intensity.
Background information
Sound features are an important tool in the field of Music Information Retrieval (MIR). They consist of
extractable sets of information that describe and quantify several sound characteristics in an objective
way. Instead of using sound features only to extract information about each sound sample, this project
implements a feature modulation system that enables timbre manipulation by establishing a link between
each sound feature and a perceptual element. In recent decades, several studies have pursued the use of
sound features in synthesis techniques, each with a different approach. In this context, it is worth
mentioning the work of Park et al. (2007), Hoffman & Cook (2006), Mintz (2007), and Sinclair (2017).
Fractal Additive Synthesis (FAS) is a spectral modeling analysis/synthesis technique developed for
low-rate coding of tonal signals, similar to Xavier Serra’s Spectral Modeling Synthesis (SMS) (Serra,
1989). Unlike SMS, however, FAS does not encode each harmonic partial as an ideal, perfectly periodic
signal. Instead, it takes into account the natural pseudo-periodicity of tonal signals by performing a
fractal, 1/f-profile analysis of the sidebands of each harmonic partial (Polotti, 2003). With this
approach, FAS can synthesize more organic-sounding results, placing more emphasis on encoding the
stochastic aspect of sound, while using the same amount of stored data.
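As a rough illustration of what a "1/f profile" of a sideband means, the sketch below fits an exponent gamma to sideband power that is assumed to decay as 1/d^gamma, where d is the frequency offset from the partial. This is a plain log-log least-squares fit, not the analysis procedure used by FAS; the function name `fit_one_over_f_exponent` and the sample data are hypothetical.

```python
import math

def fit_one_over_f_exponent(offsets_hz, power):
    """Return gamma for power ~ 1 / offset**gamma via a log-log least-squares fit."""
    xs = [math.log(d) for d in offsets_hz]
    ys = [math.log(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -gamma, so flip the sign.
    return -sxy / sxx

# Hypothetical sideband data: offsets from a partial (Hz) and an ideal 1/d^1.2 decay.
offsets = [1.0, 2.0, 4.0, 8.0, 16.0]
sideband_power = [3.0 / d ** 1.2 for d in offsets]
gamma = fit_one_over_f_exponent(offsets, sideband_power)
```

On real recordings the sideband power would be noisy, so the fitted exponent would only approximate the underlying decay.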
Because it needs only a small set of parameters to encode periodic and pseudo-periodic signals, FAS has
also proven to be a good candidate for the basic framework for extracting sound features from pitched
sounds, which justifies its choice for this project.
Methodology
In this initial stage, only a small set of spectral sound features was chosen for this research: Spectral
Centroid (SC), Spectral Spread (SS), Even to Odd Harmonic Ratio (EOR), Tristimulus (TR), Mean
Harmonic Band Hurst Exponent (MHE), and Harmonic Band Correlation Coefficient (HBCC)¹.
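The four deterministic features in this list can be sketched from the amplitudes of the harmonic partials. The code below is a minimal illustration that assumes commonly used textbook definitions (amplitude-weighted centroid and spread, energy-based even-to-odd ratio, three-band tristimulus); the exact definitions used in this research are given in Roque (2017) and may differ. The function name `harmonic_features` and the example amplitudes are hypothetical. MHE and HBCC are specific to FAS and are not covered here.

```python
import math

def harmonic_features(amps, f0):
    """Compute SC, SS, EOR and TR from harmonic amplitudes (harmonic 1 = fundamental)."""
    freqs = [(k + 1) * f0 for k in range(len(amps))]
    total = sum(amps)
    # Spectral Centroid: amplitude-weighted mean frequency (assumed definition).
    sc = sum(f * a for f, a in zip(freqs, amps)) / total
    # Spectral Spread: amplitude-weighted standard deviation around the centroid.
    ss = math.sqrt(sum((f - sc) ** 2 * a for f, a in zip(freqs, amps)) / total)
    # Even to Odd Harmonic Ratio: energy of harmonics 2, 4, ... over harmonics 1, 3, ...
    energy = [a * a for a in amps]
    eor = sum(energy[1::2]) / sum(energy[0::2])
    # Tristimulus: relative weight of harmonic 1, harmonics 2-4, and harmonics 5 and above.
    tr = (amps[0] / total, sum(amps[1:4]) / total, sum(amps[4:]) / total)
    return {"SC": sc, "SS": ss, "EOR": eor, "TR": tr}

# Hypothetical four-partial spectrum with a 100 Hz fundamental.
features = harmonic_features([1.0, 0.5, 0.25, 0.125], f0=100.0)
```

A spectrum with only odd harmonics yields an EOR of zero under this definition, and the three tristimulus values always sum to one.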
To study whether sound features can be understood as pertinent traits, we analyze how recognizable and
predictable each feature is. This study was carried out in three stages.

¹ The Mean Harmonic Band Hurst Exponent and the Harmonic Band Correlation Coefficient are two stochastic
features developed specifically for FAS; for a detailed definition of these features, see Roque (2017).

In the first stage, each feature was extracted from samples of four different instruments (cello, trumpet,
oboe, and clarinet) playing nearby pitches. The second stage analyzed the variation of each feature for
the same instrument across different pitches. These first two stages aim to study how the feature values
vary across samples with different timbre characteristics and across different pitches. The third stage
consisted of modulating the features, followed by careful listening to the synthesized modulated sound.
For this last stage, only SC, EOR, MHE, and HBCC were used. The main goal of this stage was to look for
a monotonic relationship between the amount of modulation and the related perceptual timbral aspect.
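The numerical side of such a monotonicity check can be sketched as follows. The modulation used here, a spectral tilt that scales harmonic k by k raised to the modulation amount, is only an assumed stand-in for the feature modulation of this research (described in Roque, 2017), and all function names are hypothetical. The check covers only the feature value; the perceptual side still requires listening.

```python
def spectral_centroid(amps, f0):
    """Amplitude-weighted mean frequency of the harmonic partials."""
    return sum((k + 1) * f0 * a for k, a in enumerate(amps)) / sum(amps)

def tilt(amps, amount):
    """Assumed modulation: scale harmonic k by k**amount (positive amounts boost upper partials)."""
    return [a * (k + 1) ** amount for k, a in enumerate(amps)]

def is_strictly_increasing(values):
    """True if every value is larger than the one before it."""
    return all(b > a for a, b in zip(values, values[1:]))

# Hypothetical spectrum and a sweep of modulation amounts.
amps = [1.0, 0.5, 0.25, 0.125]
amounts = [-1.0, -0.5, 0.0, 0.5, 1.0]
centroids = [spectral_centroid(tilt(amps, m), 100.0) for m in amounts]
monotonic = is_strictly_increasing(centroids)
```

With this tilt, the centroid rises as the modulation amount increases, although the mapping between amount and centroid is not linear.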
Results
From the results of the first two stages of this study, many perceptual characteristics could be
observed in the extracted feature values. As expected, the SC showed the most easily noticed relation to
a perceptual characteristic, being strongly related to the notion of brightness. The MHE also showed an
easily perceived perceptual characteristic but, contrary to expectations, it was more closely related to
background noise than to the notion of pseudo-periodicity. The TR proved to be an interesting feature
when associated with the SC for varying pitches.
The third stage of the analysis showed interesting results. All features presented a monotonic relation
between the amount of modulation and the related feature value. Although highly nonlinear, this
monotonic relation is a good indication of possible predictability. On the other hand, except for the
SC, it was possible to obtain sounds from different instruments that showed the same feature value but
diverging perceptual characteristics associated with that feature.
A detailed presentation of these results can be found in Roque (2017).
Conclusions
This first set of results on decomposing the timbre phenomenon into a set of unidimensional, predictable,
and recognizable parameters shows that this use of sound features is promising. Among the features
studied, SC proved to be the best candidate for an abstractable pertinent trait. There is still work to
be done, mainly in expanding this analysis/synthesis system to other perceptual features, including
temporal ones.
Following recent research on sound features combined with deep learning and autoencoders (Sinclair, 2017),
this machine learning approach seems promising for the search for new features that fit this analogy
with morphological criteria.
Nevertheless, there is still an important semantic gap to be bridged regarding the actual relation between
each sound feature and its associated perceptual aspect. Only large-scale listening tests with many
listeners might lead to final conclusions.
References
Chion, M. (2009). Guide to sound objects.
Hoffman, M. D., & Cook, P. R. (2006). Feature-Based Synthesis: Mapping Acoustic and Perceptual
Features onto Synthesis Parameters. In ICMC.
Hunt, A., & Kirk, R. (2000). Mapping strategies for musical performance. Trends in gestural control of
music, 21, 231-258.
Mintz, D. (2007). Toward timbral synthesis: a new method for synthesizing sound based on timbre
description schemes (Doctoral thesis, University of California, Santa Barbara).
Park, T. H., Biguenet, J., Li, Z., Richardson, C., & Scharr, T. (2007). Feature modulation synthesis
(FMS). In ICMC.
Polotti, P. (2003). Fractal additive synthesis: spectral modeling of sound for low rate coding of quality
audio (Doctoral thesis, École Polytechnique Fédérale de Lausanne, Lausanne).
Roque, T. R. (2017). Extração e Modulação de Descritores Sonoros a Partir da Síntese Aditiva Fractal
(Master's dissertation, State University of Campinas, Campinas/SP).
Sinclair, S. (2017). Sounderfeit: Cloning a Physical Model with Conditional Adversarial Autoencoders. In
Proceedings of the 16th Brazilian Symposium on Computer Music.
