Voice Disorders Classification Using Multilayer Neural Network

2008 International Conference on Signals, Circuits and Systems
Lotfi Salhi Talbi Mourad Adnéne Cherif
Signal Processing Laboratory - Sciences Faculty of Tunis - Tunis ElManar – Tunisia
lotfi.salhi@laposte.net mouradtalbi1969@yahoo.com Adnane.cher@fst.rnu.tn
Abstract—In this paper we present a new method for voice disorder classification based on a multilayer neural network (MNN). The processing algorithm is a hybrid technique that uses wavelet energy coefficients as the input of the MNN. The training step uses a speech database of pathological and normal voices collected at the national hospital “Rabta - Tunis”. Training is conducted in a supervised mode, first to discriminate normal from pathological voices and, in a second step, to classify between neural and vocal pathologies (Parkinson, Alzheimer, laryngeal, dyslexia…). Several simulation results are presented as a function of the disease and compared with the clinical diagnosis in order to obtain an objective evaluation of the developed tool.

I. INTRODUCTION
Pathological voice recognition has received great attention from researchers in the last decade. Speech processing has proved to be an excellent tool for voice disorder detection. Among the most interesting recent works are those concerned with Parkinson’s Disease (PD), multiple sclerosis (MS) and other diseases belonging to a class of neurodegenerative diseases that affect patients' speech, motor, and cognitive capabilities [1,2].
Speech production is a complex motor act that involves a large number of muscles and physiological variables, and a neurological control involving different cortical and subcortical regions.
We distinguish three systems contributing to speech production: the respiratory system, the laryngeal system and the supra-laryngeal system (the articulators) [9,10].
The nervous system also controls prosody, which schematically covers variations of pitch (intonation, melody), variations of intensity (accentuation) and temporal organization (pauses, speech rate, rhythm).
The analysis of voice disorders remains essentially clinical [3]. Instrumental measures are little used in clinical practice; the most common are acoustic and aerodynamic measures [4]. Speech analysis is complex and was neglected for a long time. As a result, it is difficult to assess from the literature the effect of different treatments (medical or surgical). Indeed, many studies do not report a specific analysis of speech. Moreover, there is often confusion between changes in orofacial motor function and speech quality, which remains the clinical objective [7].
Assessing the features of a voice disorder and its impact on the patient's capacity to communicate is a crucial step in designing a management program. A successful assessment allows the speech pathologist to diagnose the voice disorder, determine the relative efficiency of several treatment approaches and formulate a prognosis [8].
Physicians often use invasive techniques like endoscopy to diagnose the symptoms of vocal fold disorders. However, it is possible to identify disorders using certain features of speech signals [3,4].
Several classic techniques are used to extract vocal parameters and thus classify pathological voices, such as pitch and formant detection: PDA (cepstrum, FFT, spectrogram …).

II. DISORDER IDENTIFICATION BY PITCH AND FORMANTS ANALYSIS
A. Speech Processing Algorithm
[Figure 1 flowchart: speech acquisition; pre-emphasis; segmentation by Hamming window; pitch detection by cepstrum (F0) in parallel with formant extraction by LPC (F1, F2, F3); comparison with normal values; decision: pathological (neural or organic) or healthy]
Figure 1: Speech processing algorithm
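The pitch branch of the processing chain in Figure 1 (pre-emphasis, Hamming windowing, cepstral peak picking) can be illustrated with a minimal sketch. It is not the authors' implementation: the pre-emphasis coefficient `alpha = 0.97`, the 60 to 400 Hz search range, the 8 kHz sampling rate and the synthetic impulse-train frame are all assumptions chosen for illustration.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its real cepstrum."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))            # segmentation window
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)                     # real cepstrum
    q_lo, q_hi = int(fs / fmax), int(fs / fmin)          # quefrency search range
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    return fs / peak

# Synthetic voiced frame (assumption): impulse train with a 40-sample period,
# i.e. 200 Hz at fs = 8 kHz. On recorded speech, pre_emphasis would be applied
# to the signal before framing.
fs = 8000
frame = np.zeros(1024)
frame[::40] = 1.0
f0 = cepstral_pitch(frame, fs)
```

The estimated `f0` would then be compared with normal values, as in the decision step of Figure 1; the formant branch (LPC) is not covered by this sketch.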
978-1-4244-2628-7/08/$25.00 ©2008 IEEE

B. Pitch And Formants Analysis Results
Figure 2 illustrates the pitch variation obtained by applying the cepstrum method to normal and pathological female voices (32 years).
Figure 2-a: Pitch evolution for a normal female voice
Figure 2-b: Pitch evolution for a pathological female voice
The linear predictive coding (LPC) method applied to the same voices allows the evolution of the formants to be extracted. Compared with the normal values (figure 3), the high variations of the formants F1, F2 and F3 of the pathological male sound confirm the conclusions [12,13].
Figure 3-a: Formants evolution for a normal male voice
Figure 3-b: Formants evolution for a pathological male voice
Although these methods can help us to distinguish a pathological voice, they remain subjective methods that do not give any quantitative values on which to base the decision. We therefore try to improve on this idea with new methods that use the wavelet transform and neural networks.

III. NEW APPROACH FOR VOICE PATHOLOGY CLASSIFICATION
This work develops the basic idea presented in [5]. This paper proposes a technique that uses wavelet analysis to extract a feature vector from speech samples, which is used as input to a Multilayer Neural Network classifier. Wavelet analysis provides a two-dimensional pattern of wavelet coefficients. The energy content of the wavelet coefficients at various levels of scaling is used to form a feature vector for each speech sample. An attempt is made to use this feature vector as a diagnostic tool to identify pathological disorders in the voice. A three-layer feed-forward network with sigmoid activation is used for classification. The generalized Back Propagation Algorithm (BPA) is used to train the network.
A. Algorithm Of The Hybrid Method
[Figure 4 flowchart: speech acquisition; pre-emphasis; segmentation by Hamming window; wavelet transform; energy coefficients; multilayer neural network (MNN); decision: pathological or healthy]
Figure 4: Hybrid method algorithm
B. Wavelet transforms analysis
The wavelet transform can be viewed as transforming the signal from the time domain to the wavelet domain. This new domain contains more complicated basis functions called wavelets, mother wavelets or analysing wavelets.
A wavelet prototype function at a scale s and a spatial displacement u is defined as [6]:
$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right), \qquad u \in \mathbb{R},\ s \in \mathbb{R}^{*}_{+} \tag{1}$$
C. Continuous Wavelet Transforms
The CWT is an excellent tool for mapping the changing properties of non-stationary signals. The CWT is also an ideal tool for determining whether
or not a signal is stationary in a global sense. When a signal is judged non-stationary, the CWT can be used to identify stationary sections of the data stream.
Specifically, the wavelet transform of a function f(t) ∈ L²(R) (the space of square-integrable functions) can be represented as:
$$W(f)(u,s) = \int_{-\infty}^{+\infty} f(t)\,\psi^{*}_{u,s}(t)\,dt = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{s}}\,\psi^{*}\!\left(\frac{t-u}{s}\right)dt \tag{2}$$
The transform thus combines a translation in time, introduced by the term u, with a dilation in both time and amplitude, introduced by the terms s and √s.
D. Discrete Wavelet Transform
The Discrete Wavelet Transform (DWT), which is based on sub-band coding, yields a fast computation of the wavelet transform. It is easy to implement and reduces the computation time and resources required.
The DWT involves choosing scales and positions based on powers of two, the so-called dyadic scales and positions. The mother wavelet is rescaled or dilated by powers of two and translated by integers.
The Daubechies wavelet db40 is selected as a good choice because of its high performance in an informal listening test [15]. For the fast wavelet transform, the functions are defined by a set of coefficients referred to as the "wavelet filter coefficients" [6].
Figure 5 shows the analysis function of the Daubechies wavelet (db40) and its corresponding spectrum.
[Figure 5 plots: left panel "Psi function of Daubechies Wavelet" versus time (ms); right panel "Spectrum of Daubechies Wavelet" versus frequency (Hz)]
Figure 5: Daubechies wavelet with 40 vanishing moments and its spectrum
E. DWT and Filter Banks
Starting with a discrete input signal vector x[n], the first stage of the fast wavelet transform (FWT) algorithm decomposes the signal into two sets of coefficients. These are the approximation coefficients cA1 (low frequency information) and the detail coefficients cD1 (high frequency information). The DWT is computed by successive lowpass and highpass filtering of the discrete time-domain signal as shown in figure 6.
Figure 6: Three-level wavelet decomposition tree
In the figure, the signal is denoted by the sequence x[n], where n is an integer. The low pass filter is denoted by G0 while the high pass filter is denoted by H0. At each level, the high pass filter produces detail information, d[n], while the low pass filter associated with the scaling function produces coarse approximations, a[n].
F. Neural Network
Neural networks were chosen as a method of pattern matching for several reasons. First, the MatLab software provides implementations of several different types of neural networks in its Neural Network Toolbox. The big advantage of neural networks resides in their automatic training capacity, which makes it possible to solve some problems without writing complex rules, while remaining tolerant to errors [11].
A neuron (Figure 7) is an information processing unit, which is an essential part of a neural network. A neuron consists of three main elements: synapses (links), a linear combiner, and an activation function. Each synapse (link) contains a weight factor. Input p(i), which is connected to neuron k, is multiplied by synaptic weight w(k, i). Hence, the output of a neuron depends on its inputs and its activation function. There are different types of activation functions that can be used in Matlab. The most commonly used activation functions are hard limit, linear, or sigmoid functions. Naturally, one can also construct one's own activation function.
Figure 7: Model of an artificial neuron
Each layer has a weight matrix W, a bias vector b, and an output vector a (Figure 8). The number of neurons usually varies between layers. In Figure 8, the number of inputs is R, and the number of neurons in the first layer is S1, while in the second layer it is S2, and likewise for the other layers. The layers situated between the inputs and the output layer are called hidden layers. Thus, Figure 8 shows two hidden layers [11].
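The layer description above (weight matrix W, bias vector b, output vector a) corresponds to the computation a = f(Wp + b). The following is a minimal sketch of that forward pass, not the Matlab toolbox code; the sizes R = 3, S1 = 4, S2 = 1, the random weights and the input vector are illustrative assumptions.

```python
import numpy as np

def logsig(n):
    """Sigmoid activation, squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def layer_output(W, b, p, f=logsig):
    """Output a = f(W p + b) of one layer: linear combiner followed by activation."""
    return f(W @ p + b)

# Illustrative sizes (assumptions): R inputs, S1 neurons in layer 1, S2 in layer 2
R, S1, S2 = 3, 4, 1
rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])                              # hypothetical input vector
a1 = layer_output(rng.normal(size=(S1, R)), np.zeros(S1), p)
a2 = layer_output(rng.normal(size=(S2, S1)), np.zeros(S2), a1)
```

Each row of W holds the synaptic weights w(k, i) of one neuron k, so stacking layers amounts to feeding the output vector of one layer as the input of the next.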
Figure 8: Multi-layer neural network structure
A neural network is trained by giving a target output for a certain group of inputs, in which case the term supervised learning is used. Alternatively, a network can be trained through self-guidance, which means that the network parameters adapt according to the input. In both cases, the free parameters of the network, the weights and biases, adapt according to the measured data. The training can be gradual (incremental training), which means that the weights and biases are adapted every time a new training example is fed to the network, or it can be done in batches (batch training), in which case the parameters are not adapted until all the examples have been fed.
The backpropagation algorithm is often used in the training of multi-layer neural networks.
If sigmoid activation functions are used in the output layer, the outputs of the network are limited to a small range. If a linear activation function is used in the output layer, the network outputs can take any real value.

IV. SIMULATION RESULTS
The Matlab 7.0 platform is used to implement the neural network, which is formed of three layers: an input layer, an output layer and a hidden layer (Figure 9). The input layer has as many neurons as there are components in the input vector. The input is the feature vector obtained from the wavelet decomposition.
The hidden layer contains fifteen neurons and the output layer contains only one neuron, which gives the decision pathological or normal (Figure 9).
Figure 9: Voices classification model
A. Wavelet Coefficients
We apply the discrete and continuous wavelet transforms to the same word pronounced by two speakers of the same sex and age. One of these speakers is healthy whereas the other suffers from Alzheimer's illness. The resulting absolute coefficients are given in figures 10 and 11.
We notice a clear difference between the wavelet coefficient evolutions of the two signals. This analysis method can also provide a visual pattern, which can be of considerable help in diagnostics.
[Figure 10 plots: analyzed signal of normal voice; discrete transform, absolute coefficients (level versus time); continuous transform, absolute coefficients (scale 1 to 31 versus time (or space) b)]
Figure 10: Wavelet Analysis of Normal Voice
[Figure 11 plots: analyzed signal of pathological voice; discrete transform, absolute coefficients (level versus time); continuous transform, absolute coefficients (scale 1 to 31 versus time (or space) b)]
Figure 11: Wavelet Analysis of Pathological Voice
B. Neural Network Design
In our study we use a multilayer neural network (MNN) with only one hidden layer between the input layer and the output layer. Every neuron of the hidden layer is connected to the neurons of the input layer and to those of the output layer, and there is no connection between the cells of a same layer. The activation functions used in this type of network are threshold or sigmoid functions.
This network is trained in a supervised fashion according to the error-correction rule. To every given input corresponds an expected answer at the output, so the network adjusts itself until it produces the correct output.
C. Feature Extraction
A filter bank is used to extract the wavelet coefficients.
The energy of every level is normalized against the total energy content of the signal:
$$E(i) = \frac{E_i}{E_T} \tag{3}$$
where i = 1, 2, …, N, E_i is the energy at each level and E_T is the total energy across all the levels:
$$E_T = \sum_{i} E_i \tag{4}$$
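The normalization in equations (3) and (4) can be sketched in a few lines. This is a hedged illustration rather than the paper's code: the Haar filter pair is used here only as a simple, dependency-free stand-in for the Daubechies wavelet chosen by the authors, and the function names `haar_step` and `energy_features` are hypothetical. Two decomposition levels give three sub-bands (cD1, cD2, cA2) and therefore three energy coefficients.

```python
import numpy as np

def haar_step(x):
    """One filter-bank stage: low-pass (approximation) and high-pass (detail), downsampled by 2."""
    x = x[: len(x) // 2 * 2]                      # drop a trailing odd sample
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

def energy_features(x, levels=2):
    """Normalized sub-band energies E(i) = E_i / E_T, as in equations (3) and (4)."""
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        approx, detail = haar_step(approx)        # keep splitting the approximation branch
        energies.append(np.sum(detail ** 2))      # E_i of the detail band at this level
    energies.append(np.sum(approx ** 2))          # energy of the final approximation band
    energies = np.array(energies)
    return energies / energies.sum()              # divide by E_T

# Hypothetical frame: a noisy sinusoid standing in for a speech segment
rng = np.random.default_rng(0)
n = np.arange(256)
features = energy_features(np.sin(2 * np.pi * n / 32) + 0.1 * rng.normal(size=256))
```

Because each feature is a fraction of the total energy, the vector sums to one and is insensitive to the overall signal level, which is convenient for a network input.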

V. NEURAL NETWORKS RESULTS
We use 60 words in total, pronounced by different speakers, of which 30 are normal and the others present pathologies of vocal or neurological origin. For training, we use 40 words (20 normal and 20 pathological).
After training, the network is tested with 20 words different from those used for training (10 normal and 10 pathological).
In order to obtain the best result, we vary the number of energy coefficients at the input of the neural network. This procedure requires changing the wavelet filter bank accordingly.

A. Three Energy Coefficients
We use a wavelet filter bank to extract three wavelet coefficients, then we calculate their corresponding energies.
[Figure 12 diagram: decomposition tree with nodes (0,0); (1,0), (1,1); (2,0), (2,1); yielding the energies E1, E2, E3]
Figure 12: Three energy coefficients extraction
The obtained training curve is given in figure 13.
[Figure 13 plot: training performance (blue) and goal (black) on a logarithmic scale versus epochs; performance is 9.96138e-006, goal is 1e-005, 202 epochs]
Figure 13: Training curve with three coefficients
The results of neural network training and testing are grouped in Table I.

Table I: Neural Network Results With Three Coefficients
Pronounced word           Normal    Pathological
Training number           20        20
Test number               10        10
Correct classification    9         8
Rate of classification    90 %      80 %

B. Five Energy Coefficients
We use a wavelet filter bank to extract five wavelet coefficients, then we calculate their corresponding energies.
The results of neural network training and testing are grouped in Table II.

Table II: Neural Network Results With Five Coefficients
Pronounced word           Normal    Pathological
Training number           20        20
Test number               10        10
Correct classification    10        9
Rate of classification    100 %     90 %

C. Seven Energy Coefficients
We use a wavelet filter bank to extract seven wavelet coefficients, then we calculate their corresponding energies.
The results of neural network training and testing are grouped in Table III.

Table III: Neural Network Results With Seven Coefficients
Pronounced word           Normal    Pathological
Training number           20        20
Test number               10        10
Correct classification    10        10
Rate of classification    100 %     100 %

These results are summarized in the following diagrams (figure 14).
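The protocol of this section (40 training words, 20 test words, one hidden layer of fifteen sigmoid neurons, one output neuron) can be mimicked with a small batch backpropagation sketch. It is only an illustration under loud assumptions: the feature vectors are synthetic Gaussian clusters invented for the example, not the Rabta hospital recordings, and the learning rate, epoch count, weight initialization and function names are hypothetical. The rates it produces say nothing about the paper's results.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=15, lr=1.0, epochs=2000, seed=0):
    """Batch backpropagation on a network with one hidden layer and one output neuron."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1));          b2 = np.zeros(1)
    t = y.reshape(-1, 1).astype(float)
    losses = []
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                  # hidden layer activations
        out = sigmoid(h @ W2 + b2)                # output in (0, 1)
        err = out - t
        losses.append(float(np.mean(err ** 2)))   # mean squared error
        d_out = 2.0 * err * out * (1.0 - out) / len(X)
        d_hid = (d_out @ W2.T) * h * (1.0 - h)    # error propagated to the hidden layer
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * X.T @ d_hid; b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Return 1 (pathological) when the output neuron exceeds 0.5, else 0 (normal)."""
    W1, b1, W2, b2 = params
    return (sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int).ravel()

def make_synthetic(n_per_class, rng):
    """Synthetic three-coefficient energy vectors (assumed cluster centres, not real data)."""
    normal = rng.normal([0.7, 0.2, 0.1], 0.03, (n_per_class, 3))
    patho = rng.normal([0.4, 0.3, 0.3], 0.03, (n_per_class, 3))
    return np.vstack([normal, patho]), np.array([0] * n_per_class + [1] * n_per_class)

rng = np.random.default_rng(1)
X_train, y_train = make_synthetic(20, rng)        # 20 normal + 20 pathological
X_test, y_test = make_synthetic(10, rng)          # 10 normal + 10 pathological
params, losses = train_mlp(X_train, y_train)
rate = 100.0 * np.mean(predict(params, X_test) == y_test)   # classification rate in %
```

The classification rate is computed the same way as in Tables I to III: the number of correctly classified test words divided by the number of test words.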
[Figure 14 charts: classification rate (%) versus energy coefficient number (1 to 7); one panel "Pathological Voices Classification" and one panel "Normal Voices Classification"]
Figure 14: Voice classification by MNN results

VI. CONCLUSION
The goal of this work is to design a tool to assist clinicians in Tunisian hospitals. This tool allows the follow-up of patients who suffer from illnesses of vocal and neurological origin.
We presented in this paper a hardware and software interface for digital processing of the patient’s vocal signal based on neural networks.
The multilayer neural network (MNN) classifier gives the correct classification in most cases. The classification rate is between 90% and 100% when five or seven energy coefficients are used (Tables II and III). We have demonstrated in this study that a feature vector based on wavelet coefficients is useful for the classification of normal and pathological speech data. At a preliminary level, the speech data is classified into two classes, normal or pathological.
The multilayer neural network (MNN) with the back propagation algorithm (BPA) used as a classifier proved to be more efficient and more precise than the time-frequency analysis method.
The MNN classifier represents a low cost, accurate, and automatic tool for pathological voice classification using the normalized energy of wavelet coefficients. It is presented in this paper as a diagnostic tool to aid the physician and clinician in the analysis of speech disease.
Future work will focus on the specific recognition of the type of illness that causes the speech pathology.
Finally, this work has to be validated on a larger speech pathology database to increase the reliability of the results.

VII. REFERENCES
[1] V. Parsa and D. G. Jamieson, “Interactions between speech coders and disordered speech,” Speech Communication, vol. 40, no. 7, pp. 365–385, 2003.
[2] S. B. Davis, “Acoustic characteristics of normal and pathological voices,” Speech and Language: Advances in Basic Research and Practice, vol. 1, pp. 271–335, 1979.
[3] F. Plant, H. Kessler, B. Cheetham, J. Earis, “Speech Monitoring of Infective Laryngitis”, Proceedings of ICSLP96, Philadelphia, pp. 749–752, 1996.
[4] M.N. Viera, F.R. McInnes, M.A. Jack, “Robust F0 and Jitter estimation in the Pathological voices”, Proceedings of ICSLP96, Philadelphia, pp. 745–748, 1996.
[5] J. Nayak, P.S. Bhat, “Classification and analysis of speech abnormalities”, ITBM-RBM 26 (2005) 319-327.
[6] S. Mallat, “A Theory for multiresolution signal decomposition: Wavelet representation”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
[7] B. Boyanov, S. Hadjitodorov, “Acoustic analysis of pathological voices: a voice analysis system for screening of laryngeal diseases”, Proc. IEEE Engineering in Medical and Biology, (1997), vol. 16, no. 4, 74-82.
[8] J.J. Jiang, Yu Zhang, “Nonlinear dynamic analysis of speech from pathological subjects”, Proc IEEE Electronics Letters, March (2002), vol. 38, no. 6.
[9] P. Yu, M. Ouaknine, J. Revis, and A. Giovanni, “Objective Voice Analysis for Dysphonic Patients: A Multiparametric Protocol Including Acoustic and Aerodynamic Measurements”, Journal of Voice, vol. 15, no. 4, pp. 529–542, 2001.
[10] J. Wang, Jo. Cheolwoo, “Performance of Gaussian Mixture Model as a classifier for Pathological Voice”, proceeding of the ASST in Auckland 2006, pp. 165-169.
[11] J. Kortelainen, K. Noponen, “Neural networks”, Intelligent Systems 2005.
[12] S. Lotfi, C. Adnène, “A Speech Processing Interface for Analysis of Pathological Voices”, in proceeding of ICTTA conference, Damascus 2006.
[13] S. Lotfi, B. Haythem, C. Adnène, “Interface d’analyse vocale a l’identification de certaines pathologies d’origine neurologique et vocale”, in proceeding of JTM conference, Tunis 2007.
[14] A. Chérif, “Pitch detection and formant extraction of Arabic speech processing”, Journal of applied acoustics, January 2001.
[15] A.M. Gaouda, M. Salama, A. Chikhani, and M. Sultan, “Application of wavelet analysis for monitoring dynamic performance in industrial plants,” North American Power Symposium, Oct. 1997, Laramie, Wyoming.
