Heinrich Jiang
arXiv:1707.06261v1 [stat.ML] 19 Jul 2017
The next result shows that given an estimate of $f$ that is uniformly bounded by $\varepsilon$, the level sets of $f$ can be recovered at a rate of $\varepsilon^{1/\beta}$.

Theorem 2 ((Super)-Level Set Recovery). Let $f$ be continuous and satisfy $\beta$-regularity at level $\lambda$. Suppose $f_k$ satisfies

$$\sup_{x \in \mathcal{X}} |f(x) - f_k(x)| \le \varepsilon,$$

where $0 < \varepsilon < \frac{1}{2}\,\check{C}\,\min\{r_M, r_0\}^{\beta}$. If

$$n \ge \frac{16\,(2\hat{C})^{D/\beta} \log(4/\delta)\, D \log n}{v_D\, p_{X,0}\, \varepsilon^{D/\beta}},$$

then with probability at least $1-\delta$, $d_H(L_f(\lambda), \hat{L}_f(\lambda)) \le 2\,(2\varepsilon/\check{C})^{1/\beta}$.

5 Global Maxima Estimation

In this section, we give guarantees on estimating the global maximum of $f$.

Definition 9. $x_0$ is a maximum of $f$ if $f(x) < f(x_0)$ for all $x \in B(x_0, r) \backslash \{x_0\}$ for some $r > 0$.

We then make the following assumption, which states that $f$ has a unique maximum, where it has a negative-definite Hessian.

Assumption 4. $f$ has a unique maximum $x_0 := \operatorname{argmax}_{x \in \mathcal{X}} f(x)$, and $f$ has a negative-definite Hessian at $x_0$.
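To make the plug-in estimators discussed in this and the previous section concrete — the k-NN regression estimate $f_k$, the level-set estimate of Theorem 2, and the sample maximizer used for global maximum estimation — the following is a minimal numerical sketch on synthetic one-dimensional data. All function names, constants, and data here are illustrative assumptions, not part of the paper.

```python
# Illustrative sketch (not from the paper): the k-NN regression estimate f_k,
# the plug-in super-level set {x : f_k(x) >= lambda}, and the sample maximizer.
# The data-generating setup is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n, k, lam = 2000, 50, 0.5

X = rng.uniform(-2.0, 2.0, size=(n, 1))      # covariates x_1, ..., x_n
f = lambda x: np.exp(-x[:, 0] ** 2)          # true regression function, max at x_0 = 0
y = f(X) + 0.1 * rng.standard_normal(n)      # y_i = f(x_i) + xi_i

def f_k(q, X, y, k):
    """k-NN regression: average of y over the k nearest sample points to q."""
    d = np.linalg.norm(X - q, axis=1)
    return y[np.argsort(d)[:k]].mean()

fk_vals = np.array([f_k(x, X, y, k) for x in X])
level_set = X[fk_vals >= lam]                # plug-in estimate of {x : f(x) >= lam}
x_hat = X[np.argmax(fk_vals)]                # maximizer of f_k over the sample

print(abs(x_hat[0]))                         # typically lands near x_0 = 0
print(level_set.min(), level_set.max())      # boundary near +/- sqrt(log 2)
```

For this choice of $f$, the true super-level set at $\lambda = 0.5$ is $[-\sqrt{\log 2}, \sqrt{\log 2}] \approx [-0.83, 0.83]$, so the recovered interval and maximizer can be compared directly against the rates in Theorems 2 and 4.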
These assumptions lead to the following, which states that $f$ has quadratic smoothness and decay around $x_0$.

Lemma 1 (Dasgupta and Kpotufe [5]). Let $f$ satisfy Assumption 4. Then there exist $\check{C}, \hat{C}, r_M, \lambda > 0$ such that the following holds:

$$\check{C}\,|x_0 - x|^2 \le f(x_0) - f(x) \le \hat{C}\,|x_0 - x|^2$$

for all $x \in A_0$, where $A_0$ is a connected component of $\{x : f(x) \ge \lambda\}$ and $A_0$ contains $B(x_0, r_M)$.

We utilize the following estimator, which is the maximizer of $f_k$ amongst the sample points $X = \{x_1, \ldots, x_n\}$:

$$\hat{x} := \operatorname{argmax}_{x \in X} f_k(x).$$

We next give a result on the accuracy of $\hat{x}$ in estimating $x_0$.

Theorem 4. Suppose that $f$ is continuous and that Assumptions 1, 2, 3, and 4 hold. Let $k$ satisfy

$$\frac{2^{10}\, D \log^2(4/\delta) \log n}{\min\{1,\ \check{C}^2 r_M^4\}} \le k \le \frac{1}{2}\, v_D\, p_{X,0} \min\left\{r_0^2,\ \frac{\check{C}\, r_M^2}{32\,\hat{C}}\right\}^{D/2} n.$$

Then the following holds with probability at least $1 - \delta$:

$$|\hat{x} - x_0| \le \max\left\{ \left(\frac{32\,\sigma}{\check{C}}\right)^{1/2} \left(\frac{D \log n + \log(2/\delta)}{k}\right)^{1/4},\ \left(\frac{32\,\hat{C}}{\check{C}}\right)^{1/2} \left(\frac{2k}{p_{X,0}\, v_D\, n}\right)^{1/D} \right\}.$$

Remark 5. Taking $k \approx n^{4/(4+D)}$ optimizes the above expression so that $|\hat{x} - x_0| \lesssim \widetilde{O}(n^{-1/(4+D)})$. This can be compared to the minimax rate for mode estimation $O(n^{-1/(4+D)})$ established by Tsybakov [20].

Remark 6. An analogue for the global minimum also holds.

6 Regression On Manifolds

In this section, we show that if the data has a lower intrinsic dimension, then k-NN will automatically attain rates as if it were in the lower-dimensional space, independent of the ambient dimension.

We make the following regularity assumptions, which are standard among works in manifold learning, e.g. [8, 1].

Assumption 5. $P$ is supported on $M$, where:

- $M$ is a $d$-dimensional smooth compact Riemannian manifold without boundary embedded in a compact subset $\mathcal{X} \subseteq \mathbb{R}^D$.
- The volume of $M$ is bounded above by a constant.
- $M$ has condition number $1/\tau$, which controls the curvature and prevents self-intersection.

Let $p_X$ be the density of $P$ with respect to the uniform measure on $M$.

We now give the manifold analogues of Theorem 1 and Corollary 1.

Theorem 5 (k-NN Regression Rate). Suppose that Assumptions 2, 3, and 5 hold and that

$$2^8\, D \log^2(4/\delta) \log n \le k \le \min\left\{\frac{1}{4},\ \frac{\tau^d}{4^d}\right\} p_{X,0}\, v_d\, n.$$

Then with probability at least $1 - \delta$, the following holds uniformly in $x \in \mathcal{X}$:

$$|f(x) - f_k(x)| \le u_f\!\left(x, \left(\frac{4k}{v_d\, n\, p_{X,0}}\right)^{1/d}\right) + 2\sigma\sqrt{\frac{D \log n + \log(2/\delta)}{k}}.$$

Similar to the full-dimensional case, we can then apply this to the class of Hölder continuous functions.

Corollary 2 (Rate for Hölder continuous functions). Suppose that Assumptions 2, 3, and 5 hold and that

$$2^8\, D \log^2(4/\delta) \log n \le k \le \min\left\{\frac{1}{4},\ \frac{\tau^d}{4^d}\right\} p_{X,0}\, v_d\, n.$$

If $f$ is Hölder continuous (i.e. $|f(x) - f(x')| \le C_\alpha\, |x - x'|^{\alpha}$), then the following holds:

$$\mathbb{P}\left(\sup_{x \in \mathcal{X}} |f(x) - f_k(x)| \le C_\alpha \left(\frac{4k}{v_d\, n\, p_{X,0}}\right)^{\alpha/d} + 2\sigma\sqrt{\frac{D \log n + \log(2/\delta)}{k}}\right) \ge 1 - \delta.$$

Remark 7. Taking $k = O(n^{2\alpha/(2\alpha+d)})$ gives us a rate of $\widetilde{O}(n^{-\alpha/(2\alpha+d)})$, which is more attractive than the full-dimensional version $\widetilde{O}(n^{-\alpha/(2\alpha+D)})$ when the intrinsic dimension $d$ is lower than the ambient dimension $D$.

7 Proofs

7.1 Supporting Results

Suppose that $P$ is the distribution corresponding to $p_X$. Let $P_n$ be the empirical distribution w.r.t. $x_1, \ldots, x_n$.
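The content of the ball-mass guarantee quoted next — that empirical masses $P_n(B)$ track true masses $P(B)$ uniformly over balls — can be illustrated with a quick Monte Carlo check. The setup below (uniform distribution on the unit square, specific radii) is a hypothetical illustration, not from the paper.

```python
# Illustrative check (not from the paper): empirical ball masses P_n(B)
# concentrate around the true masses P(B). Distribution and radii are
# hypothetical choices for demonstration.
import numpy as np

rng = np.random.default_rng(1)
n = 20000
X = rng.uniform(0.0, 1.0, size=(n, 2))       # P = uniform on [0,1]^2

def P(r):
    """True mass of a ball B(center, r) under the uniform distribution on
    [0,1]^2, valid when the ball lies entirely inside the unit square."""
    return np.pi * r ** 2

def P_n(center, r, X):
    """Empirical mass: fraction of sample points falling in B(center, r)."""
    return np.mean(np.linalg.norm(X - center, axis=1) <= r)

center = np.array([0.5, 0.5])
gaps = [abs(P_n(center, r, X) - P(r)) for r in (0.05, 0.1, 0.2)]
print(gaps)                                   # each gap is small relative to P(B)
```

The deviations behave like $\sqrt{P(B)/n}$, which is the scale captured by the $C_{\delta,n}\sqrt{k}/n$ terms in the lemma below.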
We need the following result giving guarantees on the masses of empirical balls with respect to the masses of true balls.

Lemma 2 (Chaudhuri and Dasgupta [3]). Pick $0 < \delta < 1$. Assume that $k \ge D \log n$. Then with probability at least $1 - \delta$, for every ball $B \subseteq \mathbb{R}^D$ we have

$$P(B) \ge C_{\delta,n}\,\frac{\sqrt{D \log n}}{n} \ \Rightarrow\ P_n(B) > 0,$$

$$P(B) \ge \frac{k}{n} + C_{\delta,n}\,\frac{\sqrt{k}}{n} \ \Rightarrow\ P_n(B) \ge \frac{k}{n},$$

$$P(B) \le \frac{k}{n} - C_{\delta,n}\,\frac{\sqrt{k}}{n} \ \Rightarrow\ P_n(B) < \frac{k}{n},$$

where $C_{\delta,n} := 16 \log(2/\delta)\,\sqrt{D \log n}$.

7.2 Proofs for k-NN Regression

The next result bounds $r_k(x)$ uniformly in $x \in \mathcal{X}$.

Lemma 3. The following holds with probability at least $1 - \delta/2$. If

$$2^8\, D \log^2(4/\delta) \log n \le k \le \frac{1}{2}\, v_D\, p_{X,0}\, r_0^D\, n,$$

then

$$\sup_{x \in \mathcal{X}} r_k(x) \le \left(\frac{2k}{v_D\, n\, p_{X,0}}\right)^{1/D}.$$

Proof. Let $r = \left(\frac{2k}{v_D\, n\, p_{X,0}}\right)^{1/D}$. We have

$$P(B(x, r)) \ge \inf_{x' \in B(x,r) \cap \mathcal{X}} p_X(x') \cdot v_D\, r^D \ge p_{X,0}\, v_D\, r^D = \frac{2k}{n}.$$

By Lemma 2 and the condition on $k$, it follows that with probability $1 - \delta/2$, uniformly in $x \in \mathcal{X}$, $P_n(B(x, r)) \ge \frac{k}{n}$. Hence $r_k(x) \le r$ and the result follows immediately.

The next result bounds the number of distinct k-NN sets over $\mathcal{X}$.

Lemma 4. Let $M$ be the number of distinct k-NN sets over $\mathcal{X}$, that is, $M := |\{N_k(x) : x \in \mathcal{X}\}|$. Then $M \le D\, n^D$.

Proof. First, let $\mathcal{A}$ be the partitioning of $\mathcal{X}$ induced by the $\binom{n}{2}$ hyperplanes defined as the perpendicular bisectors of each pair of points $x_i, x_j$ for $i \ne j$. Let us denote this set of hyperplanes as $\mathcal{H}$. We have that if $x, x'$ are in the same partition of $\mathcal{A}$, then $N_k(x) = N_k(x')$: if not, then any path from $x$ to $x'$ must cross the perpendicular bisector of some pair of points in $N_k(x) \,\triangle\, N_k(x')$, which would be a contradiction. Thus, $M \le |\mathcal{A}|$.

Now we will bound $|\mathcal{A}|$. Since $\mathcal{H}$ is finite, choose vectors $e_1, \ldots, e_D$ such that they form an orthogonal basis of $\mathbb{R}^D$ and none of these vectors is perpendicular to any $H \in \mathcal{H}$. Let $e_1, \ldots, e_D$ induce hyperplanes $H_1, \ldots, H_D$, respectively (i.e. $H_i$ being the orthogonal complement of $e_i$). Without loss of generality, orient the space such that $e_1$ is the vertical direction (i.e. so that we can use descriptions such as above and below). For each region in $\mathcal{A}$ that is bounded below, associate such a region to its lowest point. Then it follows that there are at most $\binom{n}{D}$ of these regions, since the lowest points are intersections of $D$ hyperplanes.

We next count the regions unbounded below. Place $H_1$ below the lowest point corresponding to the regions in $\mathcal{A}$ that were bounded below. Then we have that the regions unbounded below are $\{A \in \mathcal{A} : A \cap H_1 \ne \emptyset\}$. It thus remains to count $\mathcal{A}_1 := \{A \cap H_1 : A \in \mathcal{A},\ A \cap H_1 \ne \emptyset\}$.

We now orient the space so that $e_2$ corresponds to the vertical direction. Then we can repeat the same procedure and associate each region in $\mathcal{A}_1$ that is bounded below with its lowest point. There are at most $\binom{n}{D-1}$ of these, since they are an intersection of $D-1$ hyperplanes in $\mathcal{H}$ along with $H_1$; and then, placing $H_2$ sufficiently low, the remaining regions correspond to $\mathcal{A}_2 := \{A \cap H_1 \cap H_2 : A \in \mathcal{A},\ A \cap H_1 \cap H_2 \ne \emptyset\}$.

Continuing this process, it follows that when we orient $e_i$ to be the vertical direction, in order to count $\mathcal{A}_i := \{A \cap H_1 \cap \cdots \cap H_i : A \in \mathcal{A},\ A \cap H_1 \cap \cdots \cap H_i \ne \emptyset\}$, the number of regions in $\mathcal{A}_i$ bounded below is at most $\binom{n}{D-i}$, and the remaining ones correspond to $\mathcal{A}_{i+1}$. It thus follows that $|\mathcal{A}| \le \sum_{j=0}^{D} \binom{n}{j} \le D\, n^D$, as desired.

Proof of Theorem 1. We have

$$|f_k(x) - f(x)| = \left|\frac{1}{|N_k(x)|}\sum_{i=1}^{n} \left(\xi_i + f(x_i) - f(x)\right) \mathbb{1}[x_i \in N_k(x)]\right|$$

$$\le \left|\frac{1}{|N_k(x)|}\sum_{i=1}^{n} \left(f(x_i) - f(x)\right) \mathbb{1}[x_i \in N_k(x)]\right| + \left|\frac{1}{|N_k(x)|}\sum_{i=1}^{n} \xi_i\, \mathbb{1}[x_i \in N_k(x)]\right|$$

$$\le u_f(x, r_k(x)) + \left|\frac{1}{|N_k(x)|}\sum_{i=1}^{n} \xi_i\, \mathbb{1}[x_i \in N_k(x)]\right|.$$

The first term can be viewed as the bias term and the second can be viewed as the variance term.
By Lemma 3, we can bound the first term as follows, with probability at least $1 - \delta/2$ uniformly in $x \in \mathcal{X}$:

$$u_f(x, r_k(x)) \le u_f\!\left(x, \left(\frac{2k}{p_{X,0}\, v_D\, n}\right)^{1/D}\right).$$

For the variance term, we have by Hoeffding's inequality that if

$$A_x := \frac{1}{k}\sum_{i=1}^{n} \xi_i\, \mathbb{1}[x_i \in N_k(x)],$$

then

$$\mathbb{P}\left(A_x > \frac{\sqrt{2}\,\sigma\, t}{\sqrt{k}}\right) \le \exp(-t^2).$$

Taking $t = \sqrt{D \log n + \log(2D/\delta)}$, then we have

$$\mathbb{P}\left(A_x > \frac{\sqrt{2}\,\sigma\, t}{\sqrt{k}}\right) \le \frac{\delta}{2D\, n^D}. \quad (1)$$

By Lemma 4 and a union bound, it follows that

$$\mathbb{P}\left(\sup_{x \in \mathcal{X}} A_x > \frac{\sqrt{2}\,\sigma\, t}{\sqrt{k}}\right) \le \frac{\delta}{2}.$$

Hence, we have with probability at least $1 - \delta$,

$$|f(x) - f_k(x)| \le u_f\!\left(x, \left(\frac{2k}{p_{X,0}\, v_D\, n}\right)^{1/D}\right) + 2\sigma\sqrt{\frac{D \log n + \log(2/\delta)}{k}}$$

uniformly in $x \in \mathcal{X}$.

7.3 Proofs for Level Set Estimation

Proof of Theorem 2. Let $r := (2\varepsilon/\check{C})^{1/\beta}$. We have

$$\sup_{x \in \mathcal{X} \backslash (L_f(\lambda) \oplus r)} f_k(x) \le \lambda - \check{C}\, r^{\beta} + \varepsilon < \lambda.$$

This shows that $\hat{L}_f(\lambda) \subseteq L_f(\lambda) \oplus r \subseteq L_f(\lambda) \oplus 2r$. It remains to show the other direction, that

$$L_f(\lambda) \subseteq \hat{L}_f(\lambda) \oplus 2r.$$

Define $\tilde{r} := (\varepsilon/(2\hat{C}))^{1/\beta}$. Since $d_H(L_f(\lambda), L_f(\lambda + 2\varepsilon)) \le r$ and $\tilde{r} < r$, it suffices to show that

$$L_f(\lambda + 2\varepsilon) \subseteq \hat{L}_f(\lambda) \oplus \tilde{r}.$$

To do this, it is enough to show that for all $x \in L_f(\lambda + 2\varepsilon)$, the following holds: (1) $P_n(B(x, \tilde{r})) > 0$, and (2) any $x' \in B(x, \tilde{r}) \cap X$ satisfies $f_k(x') \ge \lambda$. We have

$$P(B(x, \tilde{r})) \ge v_D\, \tilde{r}^D\, p_{X,0} \ge \frac{16 \log(4/\delta)\, D \log n}{n},$$

where the last inequality holds by the condition on $n$. Thus by Lemma 2, $P_n(B(x, \tilde{r})) > 0$. Finally, we have

$$\inf_{x' \in B(x, \tilde{r})} f_k(x') \ge \lambda + 2\varepsilon - \varepsilon - \hat{C}\, \tilde{r}^{\beta} > \lambda,$$

$$\sup_{x' \in B(x, \tilde{r})} f_k(x') \le \lambda + \varepsilon + \hat{C}\, \tilde{r}^{\beta} < \lambda + 2\varepsilon,$$

as desired.

7.4 Proofs for Global Maxima Estimation

Proof of Theorem 4. Define the following:

$$\Delta_{\mathrm{var}} := 2\sigma\sqrt{\frac{D \log n + \log(2/\delta)}{k}},$$

$$\varepsilon_k := \left(\frac{2k}{p_{X,0}\, v_D\, n}\right)^{1/D},$$

$$r^2 := \max\left\{16\,\Delta_{\mathrm{var}}/\check{C},\ (2\varepsilon_k/c)^2\right\},$$

where $c^2 = \check{C}/(8\hat{C})$. The goal is now to show $|\hat{x} - x_0| \le r$. The proof now mirrors that of Theorem 1 of Dasgupta and Kpotufe [5]. It suffices to show that

$$\sup_{x \in X \backslash B(x_0, r)} f_k(x) < \inf_{x \in B(x_0, r_n)} f_k(x),$$

where $\mathrm{vol}_d$ denotes the volume w.r.t. the uniform measure on $M$.

The next is the manifold analogue of Lemma 3.

Lemma 6. Suppose that Assumptions 2, 3, and 5 hold. The following holds with probability at least $1 - \delta/2$. If

$$2^8\, D \log^2(4/\delta) \log n \le k \le \min\left\{\frac{1}{4},\ \frac{\tau^d}{4^d}\right\} p_{X,0}\, v_d\, n,$$

then for all $x \in M$,

$$r_k(x) \le \left(\frac{4k}{v_d\, n\, p_{X,0}}\right)^{1/d}.$$

Proof. Let $r = \left(\frac{4k}{v_d\, n\, p_{X,0}}\right)^{1/d}$. We have