Retrieval
Dimension Reduction Methods
Contents
Introduction
Dimension Reduction Method
Nearest Neighbor Queries
Using Only One Dimension
Representative Point Methods
Transformation into a Different and Smaller Feature Set
Introduction
Motivation: perform searches more efficiently using an indexing structure.
One dimension:
Data can be ordered.
An index can be built with conventional methods (e.g., B-trees), which makes search much easier.
High-dimensional data:
Use multidimensional indexing.
Problem: multidimensional indexing becomes difficult as the number of dimensions grows.
Solution: dimension reduction, to make multidimensional indexing feasible.
Price: lower quality of query results when searching.
Reduce to a dimension larger than one to avoid too much information loss.
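The one-dimensional case above can be illustrated with a minimal sketch. It assumes a sorted Python list with binary search as a stand-in for a B-tree (both rely on the keys being ordered); the function name `range_search` and the sample keys are made up for illustration.

```python
from bisect import bisect_left, bisect_right

def range_search(sorted_keys, low, high):
    """Return all keys k with low <= k <= high from a sorted list.

    Two binary searches locate the ends of the answer, so the cost is
    O(log n) plus the size of the result, instead of a full scan.
    """
    start = bisect_left(sorted_keys, low)    # first position with key >= low
    end = bisect_right(sorted_keys, high)    # first position with key > high
    return sorted_keys[start:end]

# Hypothetical one-dimensional data, ordered once up front.
keys = sorted([42, 7, 19, 3, 25, 31, 11])
result = range_search(keys, 10, 30)
print(result)
```

No comparable total order exists for points in several dimensions, which is why the high-dimensional case needs either multidimensional indexing or a reduction in the number of dimensions.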
Dimension Reduction Method
Definition: a mapping f that transforms a vector v in the original space into a vector v' = f(v) in the transformed, lower-dimensional space.
Let d and d' be the distance metrics in the original and transformed spaces, respectively.
Let n and k be the number of dimensions in the original and transformed spaces, respectively, with n > k.
Distances in the transformed space approximate distances in the original space, i.e., d(u,v) ≈ d'(f(u),f(v)).
Recall: a measure of query quality, i.e., the fraction of the true answers that the query actually retrieves. The lower the query radius (higher precision), the higher the recall.
Pruning property: d'(f(a),f(b)) <= d(a,b) for all a and b. This ensures 100% recall, because any object within the query radius in the original space is also within it in the transformed space.
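The pruning property can be made concrete with a small filter-and-refine sketch. It assumes Euclidean distance for both d and d', and uses a hypothetical mapping f that simply keeps the first k coordinates. Dropping coordinates can only shrink a Euclidean distance, so this particular f satisfies the pruning property. The function names and sample points are illustrative only.

```python
import math

def dist(a, b):
    """Euclidean distance, used here for both d and d'."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project(v, k):
    """Hypothetical mapping f: keep only the first k coordinates of v."""
    return v[:k]

def range_query(points, q, radius, k):
    """Find all points within `radius` of q using filter-and-refine."""
    fq = project(q, k)
    # Filter: cheap test in the k-dimensional space. Because
    # d'(f(p), f(q)) <= d(p, q), no true answer is discarded here.
    candidates = [p for p in points if dist(project(p, k), fq) <= radius]
    # Refine: remove false hits using the original distance.
    answers = [p for p in candidates if dist(p, q) <= radius]
    return answers, candidates

points = [(0, 0, 0), (1, 1, 5), (1, 0, 1), (4, 4, 0)]
q = (0, 0, 0)
answers, candidates = range_query(points, q, radius=2.0, k=2)

# A brute-force scan in the original space, for comparison.
brute_force = [p for p in points if dist(p, q) <= 2.0]
print(answers == brute_force)
```

The filter step may let through false hits (here the point (1, 1, 5), which is close to q only in the first two coordinates), but it never loses a true answer, which is what 100% recall means.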
Nearest Neighbor Queries
Used to:
Find the nearest objects to a given point, e.g., given a star, find the 5 closest stars.
Find the objects within a given distance range, e.g., find all stars between 5 and 20 light years from a given star.
Perform spatial joins, e.g., find the three closest restaurants for each of two different movie theaters.
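The three query types above can be sketched with brute-force scans. This is a minimal illustration with no index, assuming Euclidean distance; the coordinates and the names `k_nearest`, `in_distance_range`, and `knn_join` are invented for the example.

```python
import heapq
import math

def k_nearest(points, q, k):
    """Return the k points closest to q (brute-force scan)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))

def in_distance_range(points, q, lo, hi):
    """Return all points whose distance from q lies in [lo, hi]."""
    return [p for p in points if lo <= math.dist(p, q) <= hi]

def knn_join(sites, points, k):
    """For each site, return its k closest points."""
    return {s: k_nearest(points, s, k) for s in sites}

# Hypothetical star positions in light years, relative to a given star.
origin = (0, 0, 0)
stars = [(1, 0, 0), (0, 6, 0), (0, 0, 12), (30, 0, 0), (3, 4, 0)]
closest_two = k_nearest(stars, origin, 2)
between_5_and_20 = in_distance_range(stars, origin, 5, 20)

# Hypothetical theaters and restaurants on a street grid.
theaters = [(0, 0), (10, 0)]
restaurants = [(1, 0), (2, 0), (3, 0), (7, 0), (9, 0), (12, 0)]
joined = knn_join(theaters, restaurants, 3)
print(closest_two, between_5_and_20, joined)
```

Each call scans every point, so the cost grows linearly with the data set. Avoiding that scan is the job of the index structures the slides discuss.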
Alternative:
Transform the features into another set of more relevant, i.e., more discriminating, features.
Transform the data so that most of the information is concentrated in a small number of features.
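Concentrating information in a few features can be illustrated with a small sketch. It assumes made-up two-dimensional data lying close to the line y = x, and a hand-picked 45-degree rotation as the transformation. A method such as principal component analysis would derive the rotation from the data, but that step is left out here. Variance is used as a simple stand-in for "information".

```python
import math

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def rotate_45(point):
    """Hand-picked transformation: new axes along and across the line y = x."""
    x, y = point
    along = (x + y) / math.sqrt(2)    # coordinate along the diagonal
    across = (y - x) / math.sqrt(2)   # coordinate perpendicular to it
    return (along, across)

# Hypothetical data: both original features carry about half the variance.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
transformed = [rotate_45(p) for p in data]

var_before = [variance([p[i] for p in data]) for i in range(2)]
var_after = [variance([p[i] for p in transformed]) for i in range(2)]

# Share of the total variance carried by the first feature alone.
share_before = var_before[0] / sum(var_before)
share_after = var_after[0] / sum(var_after)
print(round(share_before, 3), round(share_after, 3))
```

After the rotation, almost all of the variance sits in the first feature, so the second one can be dropped with little loss. Because a rotation preserves Euclidean distances, dropping a coordinate afterwards can only shrink them.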