Kernel Methods
Sargur Srihari
Machine Learning
Memory-Based Methods
Training data points are used in the prediction phase
Examples of such methods:
Parzen probability density model: a linear combination of kernel functions centered on each training data point
Kernel Functions
Many linear parametric models can be re-cast into equivalent dual representations in which predictions are based on a kernel function evaluated at the training points
The kernel function is given by

k(x, x') = φ(x)^T φ(x')

where φ(x) is a fixed nonlinear feature-space mapping (basis function)
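As a quick numerical sketch (using a hypothetical polynomial feature map φ_i(x) = x^i, not one from the slides), the kernel can be evaluated directly from the basis functions:

```python
import numpy as np

# Hypothetical feature map: polynomial basis functions phi_i(x) = x**i, i = 0..M-1
def phi(x, M=3):
    return np.array([x**i for i in range(M)], dtype=float)

def k(x, xp, M=3):
    # k(x, x') = phi(x)^T phi(x')
    return float(phi(x, M) @ phi(xp, M))

# phi(2) = [1, 2, 4], phi(3) = [1, 3, 9], so k(2, 3) = 1 + 6 + 36 = 43
print(k(2.0, 3.0))
```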
Kernels are used widely:
in support vector machines
in developing the non-linear variant of PCA
in the kernel Fisher discriminant
Dual Representation
Linear models for regression and classification can be reformulated in terms of a dual representation, in which the kernel function arises naturally
Consider the regularized sum-of-squares error

J(w) = (1/2) Σ_{n=1}^N {w^T φ(x_n) − t_n}^2 + (λ/2) w^T w

where φ(x) is the set of M basis functions, or feature vector
Setting the gradient of J(w) with respect to w to zero gives

w = −(1/λ) Σ_{n=1}^N {w^T φ(x_n) − t_n} φ(x_n) = Σ_{n=1}^N a_n φ(x_n) = Φ^T a

The solution for w is a linear combination of the vectors φ(x_n), whose coefficients are functions of w, where

a_n = −(1/λ) {w^T φ(x_n) − t_n}

and Φ is the design matrix whose nth row is given by φ(x_n)^T:

Φ = [ φ_0(x_1)  …  φ_{M−1}(x_1)
         …      …       …
      φ_0(x_n)  …  φ_{M−1}(x_n)
         …      …       …
      φ_0(x_N)  …  φ_{M−1}(x_N) ]   is an N × M matrix
Transformation from w to a
Thus we have w = Φ^T a
Instead of working with the parameter vector w, we can reformulate the least squares algorithm in terms of the parameter vector a, giving rise to a dual representation, with

a_n = −(1/λ) {w^T φ(x_n) − t_n}
Define the Gram matrix K = ΦΦ^T, with elements K_nm = φ(x_n)^T φ(x_m) = k(x_n, x_m):

K = [ k(x_1, x_1)  …  k(x_1, x_N)
           …       …       …
      k(x_N, x_1)  …  k(x_N, x_N) ]

Notes:
Φ is N × M and K is N × N
K is a matrix of similarities of pairs of samples (thus it is symmetric)
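These shapes and the symmetry can be checked with a small sketch (the feature map below is an arbitrary illustration, not one from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))           # N = 5 samples, 2-D inputs

def phi(x):
    # illustrative feature map with M = 4 components: [1, x1, x2, x1*x2]
    return np.array([1.0, x[0], x[1], x[0] * x[1]])

Phi = np.array([phi(x) for x in X])   # design matrix, N x M
K = Phi @ Phi.T                       # Gram matrix, N x N

print(Phi.shape, K.shape)             # (5, 4) (5, 5)
print(np.allclose(K, K.T))            # True: K_nm = k(x_n, x_m) is symmetric
```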
Substituting w = Φ^T a into the error function

J(w) = (1/2) Σ_{n=1}^N {w^T φ(x_n) − t_n}^2 + (λ/2) w^T w

gives

J(a) = (1/2) a^T Φ Φ^T Φ Φ^T a − a^T Φ Φ^T t + (1/2) t^T t + (λ/2) a^T Φ Φ^T a

where t = (t_1, .., t_N)^T

The sum-of-squares error function is written in terms of the Gram matrix as

J(a) = (1/2) a^T K K a − a^T K t + (1/2) t^T t + (λ/2) a^T K a

Solving for a by combining w = Φ^T a and a_n = −(1/λ) {w^T φ(x_n) − t_n}:

a = (K + λ I_N)^{−1} t
Prediction Function
Substituting w = Φ^T a back into the linear regression model, the prediction for a new input x is

y(x) = w^T φ(x) = a^T Φ φ(x) = k(x)^T (K + λ I_N)^{−1} t

where k(x) has elements k_n(x) = k(x_n, x)
Constructing Kernels
To exploit kernel substitution we need valid kernel functions
First method: choose a feature space mapping φ(x) and use it to find the corresponding kernel
For a one-dimensional input space:

k(x, x') = φ(x)^T φ(x') = Σ_{i=1}^M φ_i(x) φ_i(x')
[Figure: basis functions φ(x) (Gaussian and logistic sigmoid) and the corresponding kernel functions k(x, x') = φ(x)^T φ(x'); the red cross marks x']
Consider the kernel k(x, z) = (x^T z)^2 in a two-dimensional input space. Expanding the square,

k(x, z) = (x_1 z_1 + x_2 z_2)^2 = x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2 = φ(x)^T φ(z)

so the feature mapping takes the form φ(x) = (x_1^2, √2 x_1 x_2, x_2^2)^T
It comprises all second-order terms, with a specific weighting
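A quick check of this identity at arbitrary points:

```python
import numpy as np

def phi(x):
    # feature map for the quadratic kernel (x^T z)^2 in 2-D
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
# (x^T z)^2 = 11^2 = 121 matches phi(x)^T phi(z)
print(np.isclose((x @ z)**2, phi(x) @ phi(z)))  # True
```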
Mercer's theorem: any continuous, symmetric, positive semidefinite kernel function k(x, y) can be expressed as a dot product in a high-dimensional space
Gaussian Kernel
A commonly used kernel is

k(x, x') = exp(−||x − x'||^2 / 2σ^2)

It is seen to be a valid kernel by expanding the square

||x − x'||^2 = x^T x + (x')^T x' − 2 x^T x'

to give

k(x, x') = exp(−x^T x / 2σ^2) exp(x^T x' / σ^2) exp(−(x')^T x' / 2σ^2)

This follows from kernel construction rules (2) and (4), together with the validity of the linear kernel k(x, x') = x^T x'
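The factorisation can be confirmed numerically (σ and the two points below are arbitrary):

```python
import numpy as np

def gauss_direct(x, xp, sigma=1.0):
    d = x - xp
    return np.exp(-(d @ d) / (2 * sigma**2))

def gauss_factored(x, xp, sigma=1.0):
    # uses ||x - x'||^2 = x^T x + x'^T x' - 2 x^T x'
    s2 = sigma**2
    return (np.exp(-(x @ x) / (2 * s2))
            * np.exp((x @ xp) / s2)
            * np.exp(-(xp @ xp) / (2 * s2)))

x = np.array([0.5, -1.0])
xp = np.array([1.5, 0.25])
print(np.isclose(gauss_direct(x, xp), gauss_factored(x, xp)))  # True
```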
Consider the kernel over sets

k(A1, A2) = 2^{|A1 ∩ A2|}

Example: A = {1, 2, 3, 4, 5}
A1 = {2, 3, 4, 5}, A2 = {1, 2, 4, 5}
A1 ∩ A2 = {2, 4, 5}
Hence k(A1, A2) = 2^3 = 8
What are feature vectors φ(A1) and φ(A2) such that φ(A1)^T φ(A2) = 8?
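One standard answer, sketched here: give φ(A_i) one binary component per subset U of A, equal to 1 when U ⊆ A_i. The inner product then counts the subsets contained in both sets, i.e. the subsets of A1 ∩ A2, of which there are 2^{|A1 ∩ A2|}:

```python
import numpy as np
from itertools import combinations

A = [1, 2, 3, 4, 5]
# one feature per subset U of A: phi_U(Ai) = 1 if U is a subset of Ai
subsets = [frozenset(c) for r in range(len(A) + 1)
           for c in combinations(A, r)]

def phi(Ai):
    s = set(Ai)
    return np.array([1.0 if U <= s else 0.0 for U in subsets])

A1, A2 = {2, 3, 4, 5}, {1, 2, 4, 5}
print(2 ** len(A1 & A2))   # k(A1, A2) = 2^3 = 8
print(phi(A1) @ phi(A2))   # 8.0
```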
Fisher Kernel
An alternative technique for using generative models
Used in document retrieval, protein sequences, document recognition
The Fisher score is the gradient of the log-likelihood with respect to the parameters θ of a generative model: g(θ, x) = ∇_θ ln p(x|θ)
The Fisher information matrix F = E_x[ g(θ, x) g(θ, x)^T ] is the covariance matrix of the Fisher scores
So the Fisher kernel is

k(x, x') = g(θ, x)^T F^{−1} g(θ, x')
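As a minimal sketch, take a hypothetical one-dimensional Gaussian p(x|μ) with known σ, where both the score and the Fisher information have closed forms:

```python
# Fisher kernel sketch for p(x|mu) = N(x | mu, sigma^2) with known sigma:
# score  g(mu, x) = d/dmu ln p(x|mu) = (x - mu) / sigma^2
# Fisher information F = E[g^2] = 1 / sigma^2
mu, sigma = 0.0, 2.0

def score(x):
    return (x - mu) / sigma**2

F = 1.0 / sigma**2

def fisher_kernel(x, xp):
    # k(x, x') = g(mu, x) F^{-1} g(mu, x')
    return score(x) * (1.0 / F) * score(xp)

# reduces to (x - mu)(x' - mu) / sigma^2
print(fisher_kernel(1.0, 3.0))  # 0.75
```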
Sigmoidal Kernel
Provides a link between SVMs and neural networks

k(x, x') = tanh(a x^T x' + b)

Its Gram matrix is not positive semidefinite
But it is used in practice because it gives SVMs a superficial resemblance to neural networks
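The failure of positive semidefiniteness is easy to exhibit; in this sketch (arbitrary choice a = 1, b = −1 and two 1-D points) the 2 × 2 Gram matrix has a negative eigenvalue:

```python
import numpy as np

def sigmoid_kernel(x, xp, a=1.0, b=-1.0):
    return np.tanh(a * x * xp + b)

X = [0.0, 2.0]
K = np.array([[sigmoid_kernel(xi, xj) for xj in X] for xi in X])
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() < 0)  # True: not positive semidefinite
```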