
Multimedia Information Retrieval
Dimension Reduction Methods
Contents
Introduction
Dimension Reduction Method
Nearest Neighbor Queries
Using Only One Dimension
Representative Point Methods
Transformation into a Different and Smaller Feature Set
Introduction
Motivation: Perform search more efficiently using some indexing
structure
1 Dimension:
Data can be ordered
Build an index using conventional methods (e.g., B-trees) -> search is much easier
High-Dimensional Data
Use Multidimensional indexing
Problem: Difficulties associated with multidimensional indexing
Dimension reduction to make multidimensional indexing feasible
Price: Lower quality of query results when performing search
Reduce to a dimension larger than one -> avoid too much information loss
Dimension Reduction Method
Definition: A mapping f that transforms a vector v in the
original space to a vector v' = f(v) in the transformed, lower-
dimensional space.
Let d, d' be the distance metrics in the original and
transformed space respectively.
Let n, k be the number of dimensions in the original and transformed
space respectively, with n > k.
Distances in the transformed space approximate distances in
the original space, i.e., d(u,v) ≈ d'(f(u),f(v))
Recall and precision define the query quality: a smaller query radius
gives higher precision but may lower the recall.
Pruning Property: d'(f(a),f(b)) <= d(a,b) for all objects a, b; ensures 100% recall (for range queries)
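A minimal Python sketch of how the pruning property is typically exploited in a filter-and-refine range query; the mapping f, the metrics d and d_prime, and the data layout are hypothetical placeholders, not taken from these slides.

```python
import numpy as np

def f(v):
    # Hypothetical reduction: keep only the first k coordinates of v.
    k = 2
    return v[:k]

def d(u, v):
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))   # original-space metric

def d_prime(u, v):
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))   # transformed-space metric

def range_query(data, q, radius):
    # Filter step in the transformed space: because
    # d_prime(f(a), f(b)) <= d(a, b), no true answer is discarded here.
    fq = f(q)
    candidates = [x for x in data if d_prime(f(x), fq) <= radius]
    # Refine step with the exact original-space distance removes false positives.
    return [x for x in candidates if d(x, q) <= radius]
```

With Euclidean metrics, dropping coordinates can only shrink distances, so this particular f satisfies the pruning property and the filter step loses no true answers.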
Nearest Neighbor Queries
Used to:
Find the nearest object to a given point e.g. Given a star, find the 5 closest stars
Find the closest object given a range e.g. Find all stars between 5 and 20 light
years of a given star
Spatial joins e.g. Find the three closest restaurants for each of two different movie theaters

For nearest neighbor queries, the pruning property alone does not guarantee 100% recall


Proximity-preserving property: preserves the ordering of the observed
objects in the transformed space
Given d, d' the distance metrics in the original and transformed space respectively,
and the transformation f:
d(a,b) <= d(a,c) ⇒ d'(f(a),f(b)) <= d'(f(a),f(c)) for any objects a, b, and c
Using Only One Dimension
The dimension of the original data is known, e.g., the data is represented as feature vectors in a high-
dimensional space.
Simplest technique:
Ignore some of the features
Retain the most discriminating of the features
Method #1 Most drastic and easiest to implement:
Use just one of the given features without applying any transformation
Drawback: Many objects may be represented by the same point.
K-Nearest Neighbor Algorithm (Friedman, Baskett, and Shustek)
Assume a feature f has been chosen and all objects are sorted with respect to this feature.
f-distance: the distance from q measured along feature f alone.
Given a query object q, the k nearest neighbors are found by processing the objects in increasing order of
their f-distance from q.
Stop processing when encountering an object o whose f-distance from q is greater than the actual distance from q to
the kth-nearest neighbor found so far.
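A small Python sketch of this stopping rule, assuming a Euclidean distance in the original space; the function name and data layout are illustrative assumptions, not part of the original algorithm description.

```python
import numpy as np

def knn_one_feature(data, q, k, f_idx):
    """k nearest neighbors of q using a single chosen feature f (sketch).

    data  : (N, n) array of feature vectors
    q     : query vector of length n
    k     : number of neighbors to return
    f_idx : index of the chosen feature f
    """
    # Visit objects in increasing order of their f-distance |x_f - q_f|.
    order = np.argsort(np.abs(data[:, f_idx] - q[f_idx]))
    best = []  # (true distance, object index), kept sorted, at most k entries
    for i in order:
        f_dist = abs(data[i, f_idx] - q[f_idx])
        # Termination: the f-distance is a lower bound on the true distance,
        # so once it exceeds the current kth-best distance we can stop.
        if len(best) == k and f_dist > best[-1][0]:
            break
        best.append((float(np.linalg.norm(data[i] - q)), i))
        best.sort()
        del best[k:]
    return best
```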
Using Only One Dimension
Efficiency of the k-nearest neighbor algorithm of Friedman et
al.[649] depends, in part, on which feature f is used.
Feature f can be obtained globally or locally.
Global Perspective: f is the feature with the largest range
(spread) of values
We have to examine all the objects before starting the search.
Local Perspective: f is the feature with the largest expected
range of values about the value qf of feature f for the query object q
Objects have to be sorted with respect to all features.
The local density around q is estimated using the expected number N' of
objects that will be examined during the search.
Friedman et al. obtain N' from the radius of the expected search
region, assuming a uniform distribution of the data.
Using Only One Dimension
The local density of feature i is determined by calculating the size
of the range containing the N'/2 values less than or equal to qi
and the N'/2 values greater than or equal to qi, and choosing f
as the feature with the largest such range.
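A hypothetical Python sketch of this local feature selection; the function name, the way q's position is located, and the handling of the array boundaries are illustrative assumptions.

```python
import numpy as np

def choose_feature_locally(data, q, n_expected):
    """Pick the feature with the largest (least dense) local range around q.

    data       : (N, n) array of feature vectors
    q          : query vector of length n
    n_expected : N', the expected number of objects examined during the search
    """
    half = max(1, n_expected // 2)
    best_feature, best_range = 0, -1.0
    for i in range(data.shape[1]):
        values = np.sort(data[:, i])
        pos = int(np.searchsorted(values, q[i]))              # position of q_i
        lo = values[max(0, pos - half)]                       # ~N'/2 values <= q_i
        hi = values[min(len(values) - 1, pos + half - 1)]     # ~N'/2 values >= q_i
        if hi - lo > best_range:                              # largest range wins
            best_range, best_feature = hi - lo, i
    return best_feature
```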

Friedman et al. vs. the brute-force algorithm:
Friedman et al. is considerably more efficient when the
dimension of the underlying data is relatively small.
Once the dimension of the underlying data is >= 9, brute force
does much better.
Using Only One Dimension
Method #2 Representative feature:
Combine the different features into one by using some information
from each of the features (see the sketch after Method #3 below).
Example: Each object is represented by n different features,
each with a 64-bit value.
Represent the object by a single number formed by concatenating the most
significant bit (MSB) of each of the n different features.
Drawback: Many objects may be represented by the same point.
Method #3 Space-ordering approach:
Use one of the space-ordering methods such as the Morton or
Peano-Hilbert orders
Drawback: Pruning property does not hold
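A minimal, hypothetical Python sketch of both ideas: concatenating the most significant bit of each feature (Method #2) and interleaving all feature bits to obtain a Morton (Z-order) key (Method #3). The 64-bit word size and function names are illustrative assumptions.

```python
def msb_concatenation(features, bits=64):
    """Method #2 sketch: one key built from the most significant bit of each feature."""
    key = 0
    for v in features:                     # features: iterable of unsigned integers
        key = (key << 1) | ((v >> (bits - 1)) & 1)
    return key

def morton_key(features, bits=64):
    """Method #3 sketch: Morton (Z-order) key formed by interleaving the bits of all features."""
    key = 0
    for b in range(bits - 1, -1, -1):      # from most to least significant bit
        for v in features:
            key = (key << 1) | ((v >> b) & 1)
    return key
```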
Representative Point Methods
Transform a spatial object into a representative point in a space of the
same or higher dimension (with respect to the space from which the object is
drawn).
Small-Sized Feature vectors.
Representative features of the object that will serve as the basis of the
feature vector
Example: Represent a t-dimensional object by
Its centroid -> t features
Axis-aligned minimum bounding rectangle -> 2t features corresponding to the
coordinate values of two diagonally opposite corners.
Minimum bounding sphere -> t+1 features corresponding to the coordinate values of
the centroid plus the magnitude of the radius.
Dimension-reduction method:
The number of features used to represent the object is reduced in comparison
with the feature-per-pixel method used to indicate the space occupied by the object
Drawback: Pruning property will not hold
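A Python sketch of these representative-point encodings for an object given as a set of points; the helper name is hypothetical, and the "sphere" computed here is an enclosing ball around the box center rather than the true minimum bounding sphere.

```python
import numpy as np

def representative_points(points):
    """Map a t-dimensional object (a set of points) to representative-point features.

    centroid -> t features
    mbr      -> 2t features (two diagonally opposite corners of the bounding box)
    sphere   -> t+1 features (a center plus the radius of an enclosing ball)
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    lo, hi = pts.min(axis=0), pts.max(axis=0)
    mbr = np.concatenate([lo, hi])
    center = (lo + hi) / 2.0
    # Enclosing ball around the box center; not necessarily the minimum bounding sphere.
    radius = float(np.max(np.linalg.norm(pts - center, axis=1)))
    sphere = np.concatenate([center, [radius]])
    return centroid, mbr, sphere
```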
Transformation into a Different and Smaller Feature Set
Choose the subset of features that discriminates best among the data ->
base the spatial index only on them.
Methods depend on the data domain

Alternative:
Transform the features into another set of more relevant, i.e., more
discriminating, features
Transform the data so that most of the information is
concentrated in a small number of features.
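One widely used example of such a transformation, though not named on these slides, is principal component analysis computed via the SVD; the sketch below assumes the data is given as a NumPy array of feature vectors.

```python
import numpy as np

def reduce_with_pca(data, k):
    """Project n-dimensional data onto its k leading principal components.

    data : (N, n) array of feature vectors
    k    : target number of dimensions (k < n)
    """
    centered = data - data.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions,
    # ordered so that most of the variance falls in the first components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T            # (N, k) transformed feature vectors
```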
