I. INTRODUCTION
In the study of oral communication, two different channels can be
distinguished. One carries the message explicitly (speech), while the
other contributes implicitly, providing information about the speaker.
In this second channel, less discussed than the first, the voice can
be regarded as a biological signal, since it carries extralinguistic
information about the physiological and emotional state of the
speaker [1].
Any research aimed at understanding the human mind facilitates the
development of applications involving human-machine interfaces [2].
For this reason, understanding how humans express their emotions is
relevant to the study of human behavior [3]. This paper presents a
methodology for extracting acoustic and spectral features from speech
signals, from which it is possible to determine the emotional state
of the speaker. This methodology can be applied to the diagnosis and
treatment of psychological disorders such as Post-Traumatic Stress
Disorder (PTSD), in which voice analysis reveals to the specialist
hidden information about the real emotional state of the patient [4].
Speech signals are non-stationary random processes; they exhibit
innate rhythms and periodicities that are more readily expressed and
appreciated in terms of frequency than in time units. For this reason,
time-frequency transforms are needed in order to extract relevant
features in both the time and frequency domains [5].
This paper is organized as follows: first, a description of the
time-frequency transforms used to represent the signal is provided.
Then, the automatic emotion identification process is described,
from feature extraction to classification.
M. Morales, J. Echeverry and A. Orozco are with the Faculty of
Electrical, Electronic, Physics and Systems Engineering, Universidad
Tecnológica de Pereira, La Julita, Pereira, Colombia
{mmperez,jdec,aorozco}@ohm.utp.edu.co
G. Castellanos is with the Faculty of Engineering, Universidad
Nacional de Colombia, Manizales, Colombia cgcastellanosd@unal.edu.co
The short-time Fourier (Gabor) transform of the signal $s(t)$,
computed with an analysis window $g(t)$, is
\[
S(t,\omega) = \int_{-\infty}^{\infty} s(\tau)\, g(\tau - t)\, e^{-j\omega\tau}\, d\tau \tag{1}
\]
and the Wigner-Ville distribution (WVD) is
\[
W_s(t,\omega) = \int_{-\infty}^{\infty} s\!\left(t+\frac{\tau}{2}\right) s^{*}\!\left(t-\frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau \tag{2}
\]
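As a rough illustration of how these two representations can be
computed for a speech frame, the following minimal Python sketch uses
scipy for the Gabor/STFT of Eq. (1) and a direct discretization of the
WVD of Eq. (2); the sampling rate, window length, and the toy test
frame are illustrative assumptions, not values taken from the paper.

import numpy as np
from scipy.signal import stft

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a 1-D signal.

    Returns an (N x N) real array: rows are time samples, columns are
    frequency bins of the lag-axis FFT (bin k ~ k*fs/(2N) Hz).
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    acf = np.zeros((N, N), dtype=complex)   # instantaneous autocorrelation
    for n in range(N):
        tmax = min(n, N - 1 - n)            # largest admissible lag at time n
        tau = np.arange(-tmax, tmax + 1)
        acf[n, tau % N] = x[n + tau] * np.conj(x[n - tau])
    return np.fft.fft(acf, axis=1).real     # FFT over the lag axis

fs = 16000                                  # assumed sampling rate (Hz)
t = np.arange(0, 0.05, 1 / fs)
frame = np.sin(2 * np.pi * 200 * t)         # toy 200 Hz "voiced" frame

f, tt, Z = stft(frame, fs=fs, nperseg=256)  # Gabor/STFT, Eq. (1)
tfd = wigner_ville(frame)                   # WVD, Eq. (2)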
Fig. 1. Source-filter model of speech production: a pulse generator
driven by the fundamental period and a random noise generator excite
the glottal model G(z) and the vocal tract model H(z), with gain G
set by the vocal tract parameters.

The all-pole vocal tract model obtained by linear predictive coding
(LPC) is
\[
H(z) = \frac{G}{1 - \sum_{i=1}^{P} a_i z^{-i}} \tag{3}
\]
where the $a_i$ are the prediction coefficients, $P$ is the model
order, and $G$ is the gain. The variability of the pitch contour
$p(i)$ over $N$ frames is measured as
\[
V_p = \frac{\frac{1}{N-1} \sum_{i=1}^{N-1} \left| p(i+1) - p(i) \right|}{\frac{1}{N} \sum_{i=1}^{N} \left| p(i) \right|}
\]
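To make these feature definitions concrete, here is a minimal numpy
sketch of the autocorrelation-method LPC coefficients of Eq. (3),
computed with the Levinson-Durbin recursion, and of the pitch
variability measure V_p; the function names and the comments are
illustrative, not taken from the paper.

import numpy as np

def lpc(x, order):
    """LPC coefficients a_1..a_P of Eq. (3) via Levinson-Durbin."""
    x = np.asarray(x, dtype=float)
    # autocorrelation r[0..order]
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= (1.0 - k * k)                  # prediction-error update
    # sign flipped so that H(z) = G / (1 - sum_i a_i z^-i) as in Eq. (3)
    return -a[1:]

def pitch_variability(p):
    """V_p: mean absolute pitch jump normalized by mean pitch level."""
    p = np.asarray(p, dtype=float)
    return np.abs(np.diff(p)).mean() / np.abs(p).mean()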
TABLE I
CONFUSION MATRIX (%) FOR EMOTION IDENTIFICATION PERFORMED BY HUMAN
LISTENERS

Intended emotion   Happiness   Neutral
Happiness          61.9        9.5
Anger              3.2         7.9
Surprise           7.9         1.6
Sadness            -           -
Neutral            -           81.0
Precision          84.8        81.0
Fig. 2. Pitch contours of an utterance under the five emotional styles
(happiness, anger, neutral, surprise, sadness); frequency (0-300 Hz)
versus time (0-600 ms).

Fig. 3. Characterization scheme: the raw data are segmented with a
window and each window is parameterized through the LPC, ACF, DWT,
Gabor, and WVD branches to produce the feature set (characteristics).
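A minimal sketch of the characterization scheme of Fig. 3, reusing the
lpc() function from the sketch above; the frame length, hop size, and
the choice of sketching only the LPC and ACF branches are assumptions
made for illustration, since the paper does not give these values here.

import numpy as np

def frames(x, length=400, hop=200):
    """Split a signal into overlapping windows (Fig. 3 'Window' stage)."""
    starts = range(0, len(x) - length + 1, hop)
    w = np.hamming(length)
    return [x[s:s + length] * w for s in starts]

def characterize(x):
    """Per-frame feature vectors; stand-ins for the branches of Fig. 3."""
    feats = []
    for frame in frames(x):
        # ACF branch: autocorrelation at lags 0..4
        acf = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
        # LPC branch: order-12 prediction coefficients
        feats.append(np.concatenate([lpc(frame, 12), acf[:5]]))
    return np.vstack(feats)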
TABLE II
CONFUSION MATRIX (%) FOR EMOTION IDENTIFICATION WITH LPC FEATURES

Real emotion   Happiness   Sadness
Happiness      50.00       6.66
Anger          3.33        10.00
Neutral        3.33        6.66
Surprise       3.33        -
Sadness        16.66       40.00
Global accuracy: 64.66

TABLE III
CONFUSION MATRIX (%) FOR EMOTION IDENTIFICATION WITH WVD FEATURES

Real emotion   Happiness   Sadness
Happiness      80.00       -
Anger          3.33        -
Neutral        16.66       -
Surprise       13.33       3.33
Sadness        -           46.66
Global accuracy: 67.33

TABLE IV
CONFUSION MATRIX (%) FOR EMOTION IDENTIFICATION WITH DWT FEATURES

Real emotion   Happiness   Sadness
Happiness      46.66       -
Anger          13.33       -
Neutral        13.33       3.33
Surprise       10.00       16.66
Sadness        13.33       66.66
Global accuracy: 72.66

TABLE V
CONFUSION MATRIX (%) FOR EMOTION IDENTIFICATION WITH GABOR FEATURES

Real emotion   Happiness   Sadness
Happiness      73.33       -
Anger          20.00       -
Neutral        16.66       3.33
Surprise       3.33        6.66
Sadness        -           80.00
Global accuracy: 77.33

TABLE VI
CONFUSION MATRIX (%) FOR EMOTION IDENTIFICATION WITH RAW DATA FEATURES

Real emotion   Happiness   Sadness
Happiness      83.33       6.66
Anger          6.66        10.00
Neutral        10.00       6.66
Surprise       3.33        -
Sadness        10.00       80.00
Global accuracy: 80.66

TABLE VII
CONFUSION MATRIX (%) FOR EMOTION IDENTIFICATION WITH MIXED FEATURES

Real emotion   Happiness   Sadness
Happiness      93.33       -
Anger          3.33        -
Neutral        6.66        -
Surprise       -           -
Sadness        -           90.00
Global accuracy: 94.66
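Confusion matrices of this kind can be reproduced in outline with a
Gaussian Bayes classifier. The paper only states that a Bayesian
classifier was used, so the naive-Bayes model, the cross-validation
scheme, and the placeholder feature matrix below are assumptions made
for the sketch.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

labels = ["happiness", "anger", "neutral", "surprise", "sadness"]

# X: one feature vector per utterance; y: its acted emotion label
X = np.random.randn(150, 17)                   # placeholder feature matrix
y = np.repeat(labels, 30)                      # 30 utterances per emotion

pred = cross_val_predict(GaussianNB(), X, y, cv=10)
cm = confusion_matrix(y, pred, labels=labels)  # rows: real, cols: predicted
cm_pct = 100.0 * cm / cm.sum(axis=1, keepdims=True)
print(np.round(cm_pct, 2))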
VI. CONCLUSIONS
A methodology for emotion detection in speech signals was developed,
based on spectral (representation) features and acoustic features.
The results show high recognition rates for the emotional states
considered. The technique was validated using a Bayesian classifier.
Comparing Tables I and VII shows the effectiveness of the methodology:
it improves by 13.68% on the identification carried out by a group of
human listeners over short phrases.
It is possible to develop a real-time system that serves as a support
tool for the specialist in the diagnosis of psychological disorders.

VII. ACKNOWLEDGMENTS
This work has been partially funded by Colciencias and Universidad
Tecnológica de Pereira under contract 1110-370-19600.
The authors would like to thank the Speech Technology Group, Dept. of
Electronic Engineering, Technical University of Madrid, and especially
Juan Manuel Montero, for providing the SES database used in this study.
REFERENCES
[1] X. Huang. Spoken Language Processing. Prentice Hall, 2001.
[2] R. Cowie et al. Emotion recognition in human-computer interaction.
IEEE Signal Processing Magazine, vol. 18, no. 1, pp. 32-80, 2001.
[3] E. Väyrynen. Automatic emotion recognition from speech. Master's
Thesis, Department of Electrical and Information Engineering,
University of Oulu, Finland, 2005.
[4] G. Castellanos, E. Delgado, G. Daza, L.G. Sanchez, J.F. Suarez.
Feature selection in pathology detection using hybrid multidimensional
analysis. In Proc. 28th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society (EMBS '06), pp. 5503-5506,
2006.
[5] F. Hlawatsch and G. Matz. Time-frequency signal processing: a
statistical perspective. Invited paper in Proc. CSSP-98, Mierlo (NL),
pp. 207-219, 1998.
[6] R. Rangayyan. Biomedical Signal Analysis. Wiley & Sons, 2001.
[7] J.M. Montero, J. Gutiérrez-Arriola, S. Palazuelos, E. Enríquez,
S. Aguilera, J.M. Pardo. Emotional speech synthesis: from speech
database to TTS. In Proc. 5th International Conference on Spoken
Language Processing (ICSLP '98), Australia, 1998.
[8] R. Barra, J.M. Montero, J. Macías. Prosodic and segmental rubrics
in emotion identification. In Proc. ICASSP 2006, IEEE International
Conference, 2006.
[9] J.M. Montero, J. Gutiérrez-Arriola, R. Córdoba, E. Enríquez,
J.M. Pardo. The role of pitch and tempo in Spanish emotional speech:
towards concatenative synthesis. In Improvements in Speech Synthesis.
Wiley & Sons, 2002.