The VC Dimension
Roadmap
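Theory of Generalization
$$m_{\mathcal{H}}(N) \le \underbrace{\sum_{i=0}^{k-1} \binom{N}{i}}_{\text{highest term } N^{k-1}} \quad \text{when } \mathcal{H} \text{ has break point } k$$
(as a bound over all such $\mathcal{H}$, "$\le$" can be "$=$" actually; go play and prove it if you are a math lover! :-)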
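positive rays:
$m_{\mathcal{H}}(N) = N + 1 \le N + 1$
$m_{\mathcal{H}}(2) = 3 < 2^2$: break point at 2
positive intervals:
$m_{\mathcal{H}}(N) = \frac{1}{2}N^2 + \frac{1}{2}N + 1 \le \frac{1}{2}N^2 + \frac{1}{2}N + 1$
$m_{\mathcal{H}}(3) = 7 < 2^3$: break point at 3
2D perceptrons:
$m_{\mathcal{H}}(N) = ? \le \frac{1}{6}N^3 + \frac{5}{6}N + 1$
$m_{\mathcal{H}}(4) = 14 < 2^4$: break point at 4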
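The three comparisons above can be checked numerically. The following is a minimal sketch, assuming the bound is the binomial sum $\sum_{i=0}^{k-1}\binom{N}{i}$ for break point $k$; the function names are illustrative and not part of the lecture.

```python
from math import comb


def growth_bound(n: int, k: int) -> int:
    """Binomial-sum bound sum_{i=0}^{k-1} C(n, i) for a break point k."""
    return sum(comb(n, i) for i in range(k))


def m_positive_rays(n: int) -> int:
    """Growth function of positive rays: N + 1."""
    return n + 1


def m_positive_intervals(n: int) -> int:
    """Growth function of positive intervals: N^2/2 + N/2 + 1."""
    return n * (n + 1) // 2 + 1


for n in range(1, 8):
    # Positive rays (break point 2) and positive intervals (break point 3)
    # meet the bound with equality for every N tried here.
    assert m_positive_rays(n) == growth_bound(n, 2)
    assert m_positive_intervals(n) == growth_bound(n, 3)

# 2D perceptrons (break point 4): the bound at N = 4 is N^3/6 + 5N/6 + 1.
print(growth_bound(4, 4))
```

For 2D perceptrons the printed bound at $N = 4$ can be compared with $m_{\mathcal{H}}(4) = 14$ from the slide, which shows that the bound need not be tight for a particular hypothesis set.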
want:
$$P\left[\exists h \in \mathcal{H} \text{ s.t. } |E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\right] \le 2\, m_{\mathcal{H}}(N) \exp\left(-2\epsilon^2 N\right)$$
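Definition of VC Dimension
$$P_{\mathcal{D}}\left[|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\right] \le 4(2N)^{k-1} \exp\left(-\tfrac{1}{8}\epsilon^2 N\right)$$
if (1) $m_{\mathcal{H}}(N)$ breaks at $k$ (good $\mathcal{H}$)
and (2) $N$ large enough (good $\mathcal{D}$)
$\Longrightarrow$ probably generalized, $E_{\text{out}} \approx E_{\text{in}}$, and
if (3) $\mathcal{A}$ picks a $g$ with small $E_{\text{in}}$ (good $\mathcal{A}$)
$\Longrightarrow$ probably learned!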
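VC Dimension
the formal name of maximum non-break point
Definition
VC dimension of $\mathcal{H}$, denoted $d_{\text{VC}}(\mathcal{H})$, is the
largest $N$ for which $m_{\mathcal{H}}(N) = 2^N$
the most inputs that $\mathcal{H}$ can shatter
$d_{\text{VC}}$ = (minimum break point $k$) $- 1$
$N \le d_{\text{VC}} \Longrightarrow \mathcal{H}$ can shatter some $N$ inputs
$k > d_{\text{VC}} \Longrightarrow k$ is a break point for $\mathcal{H}$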
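positive rays: $m_{\mathcal{H}}(N) = N + 1$, $d_{\text{VC}} = 1$
positive intervals: $m_{\mathcal{H}}(N) = \frac{1}{2}N^2 + \frac{1}{2}N + 1$, $d_{\text{VC}} = 2$
convex sets: $m_{\mathcal{H}}(N) = 2^N$, $d_{\text{VC}} = \infty$
2D perceptrons: $m_{\mathcal{H}}(N) \le N^3$ for $N \ge 2$, $d_{\text{VC}} = 3$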
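For the two one-dimensional hypothesis sets, the VC dimension can be recovered by brute force: enumerate every dichotomy the set produces on $N$ inputs and find the largest $N$ with $2^N$ of them. The sketch below assumes distinct inputs on a line (only their order matters, so the integers $0, \ldots, N-1$ are used); all helper names are hypothetical.

```python
def candidate_cuts(xs):
    """One threshold below all points, one between each neighbouring pair, one above."""
    return [xs[0] - 1] + [(u + v) / 2 for u, v in zip(xs, xs[1:])] + [xs[-1] + 1]


def dichotomies_positive_rays(xs):
    """Labelings produced by h(x) = +1 iff x > a."""
    xs = sorted(xs)
    return {tuple(1 if x > a else -1 for x in xs) for a in candidate_cuts(xs)}


def dichotomies_positive_intervals(xs):
    """Labelings produced by h(x) = +1 iff l < x < r."""
    xs = sorted(xs)
    cuts = candidate_cuts(xs)
    return {
        tuple(1 if l < x < r else -1 for x in xs)
        for l in cuts
        for r in cuts
        if l <= r  # l == r gives the empty interval (all -1)
    }


def vc_dimension(dichotomies, max_n=6):
    """Largest N <= max_n whose N inputs are shattered (2^N dichotomies)."""
    d = 0
    for n in range(1, max_n + 1):
        if len(dichotomies(list(range(n)))) == 2 ** n:
            d = n
    return d


print(vc_dimension(dichotomies_positive_rays))
print(vc_dimension(dichotomies_positive_intervals))
```

The search is capped at `max_n`, so it can only confirm a finite VC dimension that lies below the cap; it says nothing about sets such as convex sets whose VC dimension is infinite.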
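[Figure: the learning flow. An unknown distribution $P$ on $\mathcal{X}$ generates the inputs $x$; the training examples $\mathcal{D}: (x_1, y_1), \ldots, (x_N, y_N)$ (historical records in bank) and the hypothesis set $\mathcal{H}$ feed the learning algorithm $\mathcal{A}$, which outputs the final hypothesis $g \approx f$ (learned formula to be used).]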
Fun Time
Suppose there is a set of $N$ inputs that cannot be shattered by $\mathcal{H}$. Based
only on this information, what can we conclude about $d_{\text{VC}}(\mathcal{H})$?
1
dVC (H) = N
Reference Answer: 4
It is possible that there is another set of $N$
inputs that can be shattered, which means
$d_{\text{VC}} \ge N$. It is also possible that no set of $N$
inputs can be shattered, which means $d_{\text{VC}} < N$.
Neither case can be ruled out by one
non-shattering set.
VC Dimension of Perceptrons
2D PLA Revisited
linearly separable $\mathcal{D}$ and $T$ large: $E_{\text{in}}(g) = 0$
$N$ large as well: $E_{\text{out}}(g) \approx 0$ :-)
VC Dimension of Perceptrons
1D perceptron (pos/neg rays): $d_{\text{VC}} = 2$
2D perceptrons: $d_{\text{VC}} = 3$, from $d_{\text{VC}} \ge 3$ and $d_{\text{VC}} \le 3$
two steps:
$d_{\text{VC}} \ge d + 1$
$d_{\text{VC}} \le d + 1$
Reference Answer: 1
$d_{\text{VC}}$ is the maximum $N$ for which $m_{\mathcal{H}}(N) = 2^N$, and
$m_{\mathcal{H}}(N)$ is the maximum number of dichotomies of $N$
inputs. So if we can find $2^{d+1}$ dichotomies on
some $d + 1$ inputs, then $m_{\mathcal{H}}(d + 1) = 2^{d+1}$ and
hence $d_{\text{VC}} \ge d + 1$.
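$d_{\text{VC}} \ge d + 1$
$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ x_3^T \\ \vdots \\ x_{d+1}^T \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & & 0 \\ \vdots & \vdots & & \ddots & 0 \\ 1 & 0 & \cdots & 0 & 1 \end{bmatrix}$$
visually in 2D: [figure omitted]
note: $X$ invertible!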
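Can We Shatter X?
$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_{d+1}^T \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \ddots & 0 \\ 1 & 0 & \cdots & 0 & 1 \end{bmatrix} \quad \text{invertible}$$
to shatter: for any $y = (y_1, \ldots, y_{d+1})^T$, find $w$ such that
$$\text{sign}(Xw) = y \Longleftarrow (Xw) = y \Longleftarrow w = X^{-1} y \quad (X \text{ invertible!})$$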
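The argument can be checked directly for small $d$: build the special $X$, solve $Xw = y$ for each of the $2^{d+1}$ label vectors, and confirm that $\text{sign}(Xw)$ reproduces $y$. This is a minimal sketch with hypothetical helper names; it uses forward substitution in place of an explicit $X^{-1}$, which is valid here because this particular $X$ is lower triangular with ones on the diagonal.

```python
from itertools import product


def trivial_inputs(d):
    """(d+1) x (d+1) matrix whose row i has a 1 in column 0 and in column i."""
    rows = []
    for i in range(d + 1):
        row = [0] * (d + 1)
        row[0] = 1
        row[i] = 1
        rows.append(row)
    return rows


def solve_lower_triangular(X, y):
    """Forward substitution for X w = y (X lower triangular, nonzero diagonal)."""
    w = []
    for i, row in enumerate(X):
        partial = sum(row[j] * w[j] for j in range(i))
        w.append((y[i] - partial) / row[i])
    return w


def sign(v):
    return 1 if v > 0 else -1


def shatters(X):
    """True if every label vector y in {-1, +1}^(d+1) is realised by some w."""
    for y in product([-1, 1], repeat=len(X)):
        w = solve_lower_triangular(X, y)
        predicted = tuple(sign(sum(a * b for a, b in zip(row, w))) for row in X)
        if predicted != y:
            return False
    return True


for d in (1, 2, 3, 4):
    print(d, shatters(trivial_inputs(d)))
```

Because $Xw$ equals $y$ exactly, the sign step never sees a value near zero, which is why solving the linear system is enough to realise every dichotomy.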
Degrees of Freedom
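positive rays ($d_{\text{VC}} = 1$):
[Figure: inputs $x_1, x_2, x_3, \ldots, x_N$ on a line, with $h(x) = +1$ to the right of the threshold]
free parameters: $a$
positive intervals ($d_{\text{VC}} = 2$):
[Figure: inputs $x_1, x_2, x_3, \ldots, x_N$ on a line, with $h(x) = +1$ inside the interval and $h(x) = -1$ outside]
free parameters: $\ell, r$
practical rule of thumb:
$d_{\text{VC}} \approx$ #free parameters (but not always)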
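M and dVC
copied from Lecture 5 :-)
(1) can we make sure that $E_{\text{out}}(g)$ is close enough to $E_{\text{in}}(g)$?
(2) can we make $E_{\text{in}}(g)$ small enough?
small $M$
(1) Yes!, $P[\text{BAD}] \le 2 \cdot M \cdot \exp(\ldots)$
(2) No!, too few choices
large $M$
(1) No!, $P[\text{BAD}] \le 2 \cdot M \cdot \exp(\ldots)$
(2) Yes!, many choices
small $d_{\text{VC}}$
(1) Yes!, $P[\text{BAD}] \le 4 \cdot (2N)^{d_{\text{VC}}} \cdot \exp(\ldots)$
(2) No!, too limited power
large $d_{\text{VC}}$
(1) No!, $P[\text{BAD}] \le 4 \cdot (2N)^{d_{\text{VC}}} \cdot \exp(\ldots)$
(2) Yes!, lots of power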
Fun Time
Origin-crossing hyperplanes are essentially perceptrons with $w_0$
fixed at 0. Make a guess about the $d_{\text{VC}}$ of origin-crossing
hyperplanes in $\mathbb{R}^d$.
1
d +1
Reference Answer: 2
The proof is almost the same as proving the
$d_{\text{VC}}$ for usual perceptrons, but it is the intuition
($d_{\text{VC}} \approx$ #free parameters) that you should use to
answer this quiz.
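Interpreting VC Dimension
$$\underbrace{P_{\mathcal{D}}\left[|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\right]}_{\text{BAD}} \le 4(2N)^{d_{\text{VC}}} \exp\left(-\tfrac{1}{8}\epsilon^2 N\right)$$
Rephrase
. . ., with probability $\ge 1 - \delta$, GOOD: $|E_{\text{in}}(g) - E_{\text{out}}(g)| \le \epsilon$
set
$$\delta = 4(2N)^{d_{\text{VC}}} \exp\left(-\tfrac{1}{8}\epsilon^2 N\right)$$
$$\frac{\delta}{4(2N)^{d_{\text{VC}}}} = \exp\left(-\tfrac{1}{8}\epsilon^2 N\right)$$
$$\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta} = \tfrac{1}{8}\epsilon^2 N$$
$$\sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}} = \epsilon$$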
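$$\underbrace{P_{\mathcal{D}}\left[|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\right]}_{\text{BAD}} \le 4(2N)^{d_{\text{VC}}} \exp\left(-\tfrac{1}{8}\epsilon^2 N\right)$$
Rephrase
. . ., with probability $\ge 1 - \delta$, GOOD!
gen. error
$$|E_{\text{in}}(g) - E_{\text{out}}(g)| \le \sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}}$$
$$E_{\text{in}}(g) - \sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}} \le E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}}$$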
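The square-root term is easy to evaluate for concrete numbers. The sketch below computes it and the resulting two-sided interval for $E_{\text{out}}(g)$; the function names and the sample values ($E_{\text{in}} = 0.05$, $N = 29{,}300$, $d_{\text{VC}} = 3$, $\delta = 0.1$) are illustrative assumptions, not figures from this slide.

```python
from math import log, sqrt


def vc_penalty(n: int, d_vc: int, delta: float) -> float:
    """sqrt( (8 / N) * ln( 4 (2N)^d_vc / delta ) )."""
    return sqrt(8.0 / n * log(4.0 * (2.0 * n) ** d_vc / delta))


def e_out_interval(e_in: float, n: int, d_vc: int, delta: float):
    """Interval that contains E_out(g) with probability at least 1 - delta."""
    penalty = vc_penalty(n, d_vc, delta)
    return e_in - penalty, e_in + penalty


low, high = e_out_interval(e_in=0.05, n=29_300, d_vc=3, delta=0.1)
print(f"penalty = {vc_penalty(29_300, 3, 0.1):.4f}")
print(f"E_out in [{low:.4f}, {high:.4f}]")
```

The lower end of the interval can be negative, in which case it carries no information since an error rate is never below zero; the upper end is the part that matters in practice.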
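THE VC Message
with a high probability,
$$E_{\text{out}}(g) \le E_{\text{in}}(g) + \underbrace{\sqrt{\frac{8}{N}\ln\frac{4(2N)^{d_{\text{VC}}}}{\delta}}}_{\Omega(N, \mathcal{H}, \delta)}$$
[Figure: Error versus VC dimension $d_{\text{VC}}$, with curves for out-of-sample error, model complexity, and in-sample error; $d_{\text{VC}}^*$ marks the best VC dimension.]
best $d_{\text{VC}}^*$ in the middle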
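$$\underbrace{P_{\mathcal{D}}\left[|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\right]}_{\text{BAD}} \le 4(2N)^{d_{\text{VC}}} \exp\left(-\tfrac{1}{8}\epsilon^2 N\right)$$
given specs $\epsilon = 0.1$, $\delta = 0.1$, $d_{\text{VC}} = 3$, want $4(2N)^{d_{\text{VC}}} \exp\left(-\tfrac{1}{8}\epsilon^2 N\right) \le \delta$

N          bound
100        $2.82 \times 10^{7}$
1,000      $9.17 \times 10^{9}$
10,000     $1.19 \times 10^{8}$
100,000    $1.65 \times 10^{-38}$
29,300     $9.99 \times 10^{-2}$

sample complexity:
need $N \approx 10{,}000\, d_{\text{VC}}$ in theory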
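The table entries follow from plugging each $N$ into the right-hand side of the bound. A minimal sketch, with an illustrative function name, that recomputes them for $\epsilon = 0.1$ and $d_{\text{VC}} = 3$:

```python
from math import exp


def vc_bound(n: int, d_vc: int, epsilon: float) -> float:
    """Right-hand side of the VC bound: 4 (2N)^d_vc exp(-epsilon^2 N / 8)."""
    return 4.0 * (2.0 * n) ** d_vc * exp(-(epsilon ** 2) * n / 8.0)


delta = 0.1
for n in (100, 1_000, 10_000, 100_000, 29_300):
    bound = vc_bound(n, d_vc=3, epsilon=0.1)
    verdict = "meets delta" if bound <= delta else "too loose"
    print(f"N = {n:>7,d}   bound = {bound:.2e}   {verdict}")
```

The bound first grows with $N$, because the polynomial factor dominates, and only later collapses once the exponential takes over. That is why a value near $N = 29{,}300$ is needed before the bound drops below $\delta = 0.1$.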
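Looseness of VC Bound
$$P_{\mathcal{D}}\left[|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\right] \le 4(2N)^{d_{\text{VC}}} \exp\left(-\tfrac{1}{8}\epsilon^2 N\right)$$
Why?
Hoeffding for unknown $E_{\text{out}}$
$N^{d_{\text{VC}}}$ instead of $m_{\mathcal{H}}(N)$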
Fun Time
Consider the VC Bound below. How can we decrease the
probability of getting BAD data?
$$P_{\mathcal{D}}\left[|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\right] \le 4(2N)^{d_{\text{VC}}} \exp\left(-\tfrac{1}{8}\epsilon^2 N\right)$$
Reference Answer:
Congratulations on being a master of the VC bound! :-)
Summary