$\sigma^{2}$), with the argument in the function standing for the probability of $x$ given mean $\mu$ and variance $\sigma^{2}$. Translating Eq. (1) from a single variable to a D-dimensional vector $\mathbf{x}$, the Gaussian distribution is written as [2]:
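$$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{t}\,\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\} \qquad (2)$$
where $\mathbf{x}$ is a D-component column vector, $\boldsymbol{\mu}$ is the D-component mean vector, $\boldsymbol{\Sigma}$ is the D by D covariance matrix, and $|\boldsymbol{\Sigma}|$ and $\boldsymbol{\Sigma}^{-1}$ are its determinant and inverse respectively. $(\mathbf{x}-\boldsymbol{\mu})^{t}$ denotes the transpose of $(\mathbf{x}-\boldsymbol{\mu})$.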
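As an illustration of Eq. (2), the density of a D-dimensional Gaussian can be evaluated numerically. The sketch below is a minimal pure-Python version; the function names `cholesky`, `gaussian_log_density` and `gaussian_density` are hypothetical, and the use of a Cholesky factorization to obtain the determinant and the quadratic form is an implementation choice of this example, not something prescribed by the text. It assumes the covariance matrix is symmetric and positive definite.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L L^t = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def gaussian_log_density(x, mean, cov):
    """Logarithm of Eq. (2) for a D-component vector x."""
    D = len(x)
    L = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves L z = diff, so z.z = diff^t cov^{-1} diff.
    z = []
    for i in range(D):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((diff[i] - s) / L[i][i])
    quad = sum(v * v for v in z)
    # |cov| is the squared product of the diagonal of L.
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(D))
    return -0.5 * (D * math.log(2.0 * math.pi) + log_det + quad)

def gaussian_density(x, mean, cov):
    """Eq. (2): N(x | mean, cov)."""
    return math.exp(gaussian_log_density(x, mean, cov))
```

Working in the log domain first keeps the computation stable when the quadratic form is large, which is why the density is obtained by exponentiating at the end.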
The distributions of many random variables can be modelled with a single Gaussian. However, a single Gaussian has severe limitations when it comes to modelling real datasets [3]. To overcome these limitations, we consider a linear combination of M Gaussian densities of the form:
$$p(\mathbf{x}) = \sum_{m=1}^{M} w_m\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m) \qquad (3)$$
where $w_m$ are the mixing coefficients and M is the total number of Gaussian components. In Eq. (3), which is termed a mixture of Gaussians, each Gaussian density $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m)$ has its own mixing coefficient $w_m$, mean $\boldsymbol{\mu}_m$ and covariance $\boldsymbol{\Sigma}_m$. Note that the individual Gaussian components are normalized, so that the mixing coefficients satisfy [2]:
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
ISSN: 2347-8578 www.ijcstjournal.org Page 25
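$$\sum_{m=1}^{M} w_m = 1 \qquad (4)$$
where $0 \le w_m \le 1$. We therefore see that the mixing coefficients satisfy the requirements to be probabilities. From the sum and product rules, the marginal density is given by [2]:
$$p(\mathbf{x}) = \sum_{m=1}^{M} p(m)\, p(\mathbf{x} \mid m) \qquad (5)$$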
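which is equivalent to Eq. (3), where $p(m) = w_m$ is the prior probability of the m-th mixture component and the density $p(\mathbf{x} \mid m) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m)$ is the probability of $\mathbf{x}$ conditioned on m. If we want to find the Gaussian mixture component from which a vector $\mathbf{x}$ came, we can reverse the conditional probability by using Bayes' theorem [2]:
$$p(m \mid \mathbf{x}) = \frac{p(m)\, p(\mathbf{x} \mid m)}{\sum_{j=1}^{M} p(j)\, p(\mathbf{x} \mid j)} = \frac{w_m\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m)}{\sum_{j=1}^{M} w_j\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)} \qquad (6)$$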
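To make Eqs. (3), (5) and (6) concrete, the following sketch evaluates a mixture density and the posterior probability of each component for a single observation. For brevity it assumes one-dimensional components (so each covariance reduces to a scalar variance); the function names are hypothetical and chosen only for this example.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, weights, means, variances):
    """Eq. (3): weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

def responsibilities(x, weights, means, variances):
    """Eq. (6): posterior probability p(m | x) of each component m."""
    joint = [w * normal_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)  # this is p(x), the marginal density of Eq. (5)
    return [j / total for j in joint]

# Two equally weighted components centred at -1 and +1 (illustrative values).
posterior = responsibilities(2.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```

In this example the observation at 2.0 lies closer to the second component, so the second entry of `posterior` is the larger one, and the entries always sum to one.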
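The log-likelihood function is used to estimate the parameters $w \equiv \{w_1, \ldots, w_M\}$, $\boldsymbol{\mu} \equiv \{\boldsymbol{\mu}_1, \ldots, \boldsymbol{\mu}_M\}$ and $\boldsymbol{\Sigma} \equiv \{\boldsymbol{\Sigma}_1, \ldots, \boldsymbol{\Sigma}_M\}$ in Eq. (3) and Eq. (6) [2]:
$$\ln p(X \mid w, \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \left\{ \sum_{m=1}^{M} w_m\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m) \right\} \qquad (7)$$
where $X = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$.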
The optimal parameters of each Gaussian component are found by maximizing the likelihood function with the expectation maximization (EM) technique, an iterative optimization method. Each EM iteration involves two stages [2]. In the expectation (E) step, the current parameter estimates are used to evaluate the posterior probabilities of the latent variables, and hence the expected log-likelihood. In the maximization (M) step, the parameters are re-estimated by maximizing the expected log-likelihood found in the E-step. The two steps are repeated until a convergence criterion is satisfied [2].
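The two-stage procedure can be sketched in a few lines of code. The example below is a minimal illustration for one-dimensional data, not the implementation used in [2]: the function names, the variance floor `var_floor`, and the stopping rule (improvement in log-likelihood below `tol`) are assumptions made for this example. It also assumes every component keeps a non-zero share of the data.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Eq. (7): sum over samples of the log of the mixture density."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances, var_floor=1e-6):
    """One E-step followed by one M-step; returns the updated parameters."""
    n_components = len(weights)
    # E-step: posterior probability of each component for each sample, Eq. (6).
    resp = []
    for x in data:
        joint = [weights[m] * normal_pdf(x, means[m], variances[m])
                 for m in range(n_components)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: re-estimate the parameters from the posterior-weighted data.
    new_w, new_mu, new_var = [], [], []
    for m in range(n_components):
        n_m = sum(r[m] for r in resp)  # effective number of samples in component m
        mu = sum(r[m] * x for r, x in zip(resp, data)) / n_m
        var = sum(r[m] * (x - mu) ** 2 for r, x in zip(resp, data)) / n_m
        new_w.append(n_m / len(data))
        new_mu.append(mu)
        new_var.append(max(var, var_floor))  # assumed guard against a collapsing component
    return new_w, new_mu, new_var

def fit_gmm(data, weights, means, variances, tol=1e-8, max_iter=200):
    """Repeat E and M steps until the log-likelihood stops improving."""
    current = log_likelihood(data, weights, means, variances)
    for _ in range(max_iter):
        weights, means, variances = em_step(data, weights, means, variances)
        updated = log_likelihood(data, weights, means, variances)
        converged = updated - current < tol
        current = updated
        if converged:
            break
    return weights, means, variances, current
```

A call such as `fit_gmm(data, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])` starts from hand-chosen initial parameters; in practice the result depends on this initialization, since EM only finds a local maximum of the likelihood.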
The expectation maximization process is illustrated in the plots depicted in Fig. 1. Plot (a) shows an example of data points in two-dimensional Euclidean space, coloured in green. Plot (b) shows the first E-step, where the mean $\boldsymbol{\mu}_{init}$, covariance $\boldsymbol{\Sigma}_{init}$ and mixing coefficient $w_{init}$ parameters of each of the two Gaussian components (highlighted in blue and red) are initialized. Next, the posterior probabilities of the data points under each Gaussian component are evaluated. The data points highlighted in blue have posterior probabilities that favour the blue Gaussian component, while the data points coloured in red are closer to the red Gaussian component. The data points whose posterior probabilities do not clearly favour either Gaussian component are depicted by pink triangles. Plot (c) shows the first M-step, which re-estimates the means, covariances and mixing coefficients of the Gaussian components from the data points newly assigned to either the red or the blue component based on their current posterior probabilities. The log-likelihood is then maximized by repeating the cycle of E and M steps until the convergence criterion is met, as illustrated in plots (d)-(i) [2].
Figure 1: Illustration of the expectation maximization (EM) iterative optimization technique for a mixture of two Gaussian components (adapted from Bishop [3]). (a) Green points denote an example of a dataset in two-dimensional Euclidean space. (b) First stage, expectation (E) step: initialization of the parameters mean $\boldsymbol{\mu}_{init}$, covariance $\boldsymbol{\Sigma}_{init}$ and mixing coefficient $w_{init}$, and evaluation of the posterior probabilities of the data points under each Gaussian component. (c) Second stage, maximization (M) step: re-estimation of the parameters using the current posterior probabilities and calculation of the log-likelihood. (d)-(i) show subsequent E and M steps through to the final convergence of the log-likelihood.
III. SUPPORT VECTOR MACHINE
The Support Vector Machine (SVM) [4] is a linear classifier. Compared with other classifiers, the SVM performs well on binary classification problems [3]. We first consider the SVM for a two-class problem using a simple form of linear classification [2].
Figure 2: A maximal margin hyper-plane with its support vectors highlighted in circles.
Consider N data points with d-dimensional features (variables) in $\mathbb{R}^{d}$, where $\mathbf{x} \in \mathbb{R}^{d}$ is the d-dimensional input vector and $q \in \{-1, 1\}$ is the class for binary classification. The decision function is of the form [2]:
$$q = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b) \qquad (8)$$
where b is the bias, $\mathbf{w}$ is the weight vector and q is the associated label. $b \in \mathbb{R}$ and $\mathbf{w} \in \mathbb{R}^{d}$ are the parameters that control the decision boundary given by Eq. (8). The data set is assumed to be linearly separable in input space.
The margin is defined as the smallest distance between the decision boundary and any of the samples, as illustrated in Fig. 2 (left). The SVM uses the margin to form its linear classifier: the main objective is to maximize the distance between the two parallel hyper-planes that separate the two groups (classes). This is illustrated in Fig. 2 (right), where the linear classifier defined by the hyper-plane $\langle \mathbf{w}, \mathbf{x} \rangle + b = 0$ lies midway between the separating hyper-planes. The support vectors, which lie on these boundary hyper-planes, are highlighted by the circles in Fig. 2 (right). The distance between the two hyper-planes can be computed as $2 / \|\mathbf{w}\|$ [2].
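The decision function of Eq. (8) and the margin width $2 / \|\mathbf{w}\|$ can be checked numerically. The sketch below uses hypothetical helper names and a hand-picked weight vector; mapping a score of exactly zero to the class +1 is a convention assumed here, since the text does not specify it.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def decide(w, b, x):
    """Eq. (8): class label from the sign of w.x + b (zero is mapped to +1 by assumption)."""
    return 1 if dot(w, x) + b >= 0 else -1

def margin_width(w):
    """Distance between the two parallel hyper-planes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))

def distance_to_boundary(w, b, x):
    """Perpendicular distance from x to the hyper-plane w.x + b = 0."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))

# Illustrative parameters: ||w|| = 5, so the margin width is 2 / 5.
w, b = [3.0, 4.0], -5.0
width = margin_width(w)
```

Because the width depends only on $\|\mathbf{w}\|$, maximizing the margin is equivalent to minimizing the norm of the weight vector, which is the form used in the optimization problem later in this section.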
If the training data set is not linearly separable, we can map the data from the input space to a higher dimensional feature space in which the data become linearly separable and linear models can be used. The decision function can then be written as [2]:
$$q = \operatorname{sign}(\mathbf{w} \cdot \phi(\mathbf{x}) + b) \qquad (9)$$
where $\phi: \mathbf{x} \rightarrow \phi(\mathbf{x})$ is a non-linear map from the input space to some feature space. This means that we can build non-linear machines in two steps: first a fixed non-linear mapping transforms the data into a feature space $\phi(\mathbf{x})$, and then a linear machine is used to classify the data in the feature space.
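The decision rule can be evaluated using just inner products between the test point and the training points by expressing the weight vector in Eq. (9) as a linear combination of the training points [2]:
$$q = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i\, q_i\, \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}) + b \right) \qquad (10)$$
where $\alpha_i$ are the coefficients of the linear combination, $q_i$ is the label of training point $\mathbf{x}_i$, and l is the number of training points.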
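The practical consequence of Eq. (10) is that the feature map never has to be computed explicitly if the inner product in feature space can be evaluated directly. The sketch below illustrates this with a quadratic map on two-dimensional inputs; the map `phi`, the function `kernel`, the XOR-like training set and the coefficient values are all illustrative assumptions. In particular, the coefficients are hand-picked for this example and are not the output of a training procedure.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(x):
    """Explicit quadratic feature map for 2-D inputs (illustrative choice)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def kernel(x, z):
    """k(x, z) = (x . z)^2, which equals phi(x) . phi(z) for the map above."""
    return dot(x, z) ** 2

def kernel_decide(x, train_x, train_q, alphas, b):
    """Decision rule of Eq. (10), with the inner product replaced by the kernel."""
    score = sum(a * q * kernel(xi, x)
                for a, q, xi in zip(alphas, train_q, train_x)) + b
    return 1 if score >= 0 else -1

# XOR-like data: not linearly separable in the input space.
train_x = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
train_q = [1, 1, -1, -1]
alphas = [0.125, 0.125, 0.125, 0.125]  # hand-picked for illustration, not trained
b = 0.0

predictions = [kernel_decide(x, train_x, train_q, alphas, b) for x in train_x]
```

With these values every training point is assigned its own label, even though no straight line in the input space separates the two classes.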
The resulting support vector machine gives exact separation of the training data in the original input space $\mathbf{x}$, although the corresponding decision boundary is nonlinear. In real-world datasets, exact separation of the training data can lead to poor generalization because the class-conditional distributions overlap. We therefore need to modify the support vector machine to allow some of the training points to be misclassified, and search for the maximum margin classifier by solving the following optimization problem [2]:
$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad q_i\left(\mathbf{w} \cdot \phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l \qquad (11)$$
where $b \in \mathbb{R}$ is the bias and $\mathbf{w}$ is the weight vector, $\mathbf{x}_i$ is the data sample with a d-dimensional feature vector, $q_i \in \{-1, 1\}$ is the class label, and l is the number of training points. The function $\phi$ maps the training vectors into a higher dimensional space. The first constraint dictates that points with equivalent labels are on the same side of the line. The slack variable $\xi_i$ allows data to be misclassified while being penalized at rate C in the objective function in Eq. (11). This allows the SVM to handle non-separable data in real-world situations.
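The role of the slack variables and the penalty C can be seen by evaluating the objective of the soft-margin problem for a fixed classifier. The sketch below assumes the identity feature map (the mapping function is taken to return its input unchanged) and uses hypothetical function names; it computes, for each point, the smallest non-negative slack that satisfies the margin constraint, and does not solve the optimization problem itself.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def slack(w, b, x, q):
    """Smallest slack >= 0 such that q * (w.x + b) >= 1 - slack."""
    return max(0.0, 1.0 - q * (dot(w, x) + b))

def soft_margin_objective(w, b, xs, qs, C):
    """Margin term plus C times the total slack, for fixed w and b."""
    total_slack = sum(slack(w, b, x, q) for x, q in zip(xs, qs))
    return 0.5 * dot(w, w) + C * total_slack

# Illustrative classifier and three labelled points.
w, b = [1.0, 0.0], 0.0
xs = [[2.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
qs = [1, 1, -1]
slacks = [slack(w, b, x, q) for x, q in zip(xs, qs)]
```

Here the first point lies outside the margin (slack 0), the second lies inside the margin but on the correct side (slack between 0 and 1), and the third is misclassified (slack greater than 1). A larger C makes such violations more expensive relative to a wide margin.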
IV. CONCLUSION
In this paper, we briefly reviewed classification techniques used for depression analysis and explained how they work. We conclude that the Gaussian Mixture Model (GMM) provides less accurate results than the Support Vector Machine (SVM). Both linear and non-linear features of the speech signal can be classified using the SVM classifier.
ACKNOWLEDGMENT
We wish to express our sincere gratitude to Prof. Dr. V. M. Wadhai, Principal, and Prof. Dr. Prassana Joeg, H.O.D. of Computer Engineering Technology, MITCOE Pune, for guiding us in this survey. We also thank our friends and other MITCOE staff members for their valuable guidance and encouragement in carrying out this work.
REFERENCES
[1] L.-S. A. Low, N. C. Maddage, M. Lech, L. B. Sheeber, and N. B. Allen, "Detection of Clinical Depression in Adolescents' Speech During Family Interactions," IEEE Transactions on Biomedical Engineering, vol. 58, no. 3, March 2011.
[2] L.-S. A. Low, "Detection of Clinical Depression in Adolescents Using Acoustics Speech Analysis," Thesis, May 2011.
[3] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.
[4] B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," Proceedings of the Fifth Annual Workshop on Computational Learning Theory, New York, NY, USA: ACM, pp. 144-152, 1992.
[5] J. O. Cavenar, H. Keith, H. Brodie, and R. D. Weiner, Signs and Symptoms in Psychiatry. Philadelphia: Lippincott Williams & Wilkins, 1983.