
Multimedia Information Retrieval
Dimension Reduction Methods

Contents

Introduction
Dimension Reduction Method
Nearest Neighbor Queries
Using Only One Dimension
Representative Point Methods
Transformation into a Different and Smaller Feature Set

Introduction

Motivation: Perform search more efficiently using some indexing structure
One dimension:

Data can be ordered

Build an index using conventional methods (e.g., B-trees) -> search is much easier

High-Dimensional Data

Use multidimensional indexing

Problem: Difficulties associated with multidimensional indexing

Dimension reduction to make multidimensional indexing feasible

Price: Lower quality of query results when performing search

Reduce to a dimension larger than one -> avoids too much information loss

Dimension Reduction Method

Definition: A mapping f that transforms a vector v in the original space into a vector v' = f(v) in the transformed lower-dimensional space.

Let d, d' be the distance metrics in the original and transformed space respectively,
and n, k the number of dimensions in the original and transformed space respectively, with n > k.

Distances in the transformed space approximate distances in the original space, i.e., d(u,v) ≈ d'(f(u),f(v)).
Recall: Defines the query quality. Enlarging the query radius in the transformed space raises recall but lowers precision.

Pruning property: d'(f(a),f(b)) <= d(a,b) for all objects a, b // ensures 100% recall for range queries
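As a concrete illustration (not from the original slides), truncating each vector to its first k coordinates under the Euclidean metric satisfies the pruning property, since dropping coordinates can only shrink a Euclidean distance. A minimal Python sketch:

    import math

    def f(v, k=2):
        # f: keep only the first k of the n coordinates
        return v[:k]

    a, b = (1.0, 2.0, 3.0), (4.0, 0.0, 8.0)
    # dropping coordinates can only shrink the Euclidean distance,
    # so d'(f(a), f(b)) <= d(a, b): the pruning property holds
    assert math.dist(f(a), f(b)) <= math.dist(a, b)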

Nearest Neighbor Queries

Used to:

Find the nearest object(s) to a given point, e.g., given a star, find the 5 closest stars

Find all objects within a given range, e.g., find all stars between 5 and 20 light years
of a given star

Spatial joins, e.g., find the three closest restaurants for each of two different movie theaters

For nearest neighbor queries, the pruning property does not guarantee 100% recall


Proximity-preserving property: the transformation preserves the ordering of distances among objects in the transformed space.

Given d, d' the distance metrics in the original and transformed space respectively, and a transformation f:

d(a,b) <= d(a,c) => d'(f(a),f(b)) <= d'(f(a),f(c)) for any objects a, b, and c
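To see why the pruning property alone is not enough for nearest neighbor queries, consider the coordinate-truncation mapping again (an illustrative example, not from the slides): it satisfies the pruning property, yet it can reverse the ordering of distances.

    import math

    # b is a's nearest neighbor in the original 2-D space, but after
    # truncating to the first coordinate, c appears nearer than b.
    a, b, c = (0.0, 0.0), (3.0, 0.0), (1.0, 5.0)
    assert math.dist(a, b) < math.dist(a, c)    # original order: b before c
    assert abs(c[0] - a[0]) < abs(b[0] - a[0])  # transformed order: c before b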

Using Only One Dimension

The original data dimension is known, e.g., the data is represented as feature vectors in a high-dimensional space.
Simplest technique:

Ignore some of the features

Retain the most discriminating of the features

Method #1 Most drastic and easiest to implement:

Use just one of the given features without applying any transformation

Drawback: Many objects may be represented by the same point.

K-Nearest Neighbor Algorithm (Friedman, Baskett and Shustek)


A feature f has been chosen, and all objects are sorted with respect to this feature (the f-distance is the distance along feature f alone).

Given a query object q, the k nearest neighbors are found by processing the objects in increasing order of their f-distance from q.

Processing stops upon encountering an object o whose f-distance from q is greater than the actual distance from q to the kth-nearest neighbor found so far (see the sketch below).
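A Python sketch of this search (the data layout and helper names are my assumptions, not from Friedman et al.): objects are pre-sorted on feature f, the search walks outward from q's position in that order, and it stops once the f-distance alone exceeds the true distance to the current kth-nearest neighbor.

    import bisect, heapq, math

    def knn_one_feature(objects, q, k, f=0):
        """k nearest neighbors of q, scanning objects in f-distance order."""
        order = sorted(objects, key=lambda o: o[f])  # normally precomputed once
        keys = [o[f] for o in order]
        i = bisect.bisect_left(keys, q[f]) - 1       # cursor moving left
        j = i + 1                                    # cursor moving right
        best = []                                    # heap of (-true distance, object)
        while i >= 0 or j < len(order):
            # take the unprocessed object with the smaller f-distance to q
            if j >= len(order) or (i >= 0 and q[f] - keys[i] <= keys[j] - q[f]):
                o, fdist, i = order[i], q[f] - keys[i], i - 1
            else:
                o, fdist, j = order[j], keys[j] - q[f], j + 1
            # stop: every remaining object is at least fdist away along f,
            # and the f-distance lower-bounds the true distance
            if len(best) == k and fdist > -best[0][0]:
                break
            heapq.heappush(best, (-math.dist(o, q), o))
            if len(best) > k:
                heapq.heappop(best)
        return [o for _, o in sorted(best, reverse=True)]

    # e.g. the two nearest neighbors of (1, 1):
    # knn_one_feature([(0, 0), (2, 1), (5, 5)], (1, 1), 2) -> [(2, 1), (0, 0)]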


The efficiency of the k-nearest neighbor algorithm of Friedman et al. [649] depends, in part, on which feature f is used.

Feature f can be chosen globally or locally.

Global perspective: f is the feature with the largest range (spread) of values.

Local perspective: f is the feature with the largest expected range of values about the value of feature f for the query object q (i.e., about q_f).

For the local choice, we have to examine all the objects before starting the search.

Objects have to be sorted with respect to all features.


The local density around q depends on the expected number N̄ of objects that will be examined during the search.

Friedman et al. obtain N̄ from the radius of the expected search region, assuming a uniform distribution.


The local density for feature i is determined by calculating the size of the range containing the N̄/2 values less than or equal to q_i and the N̄/2 values greater than or equal to q_i; f is chosen as the feature with the largest such range (a sketch follows).
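A sketch of this local choice (the function and argument names are mine): for each feature i, take the window of roughly N̄ sorted values that brackets q_i and pick the feature whose window spans the largest range, i.e., where the data around q is least dense.

    import bisect

    def pick_local_feature(sorted_cols, q, n_expected):
        """sorted_cols[i]: the values of feature i over all objects, sorted.
        Returns the feature whose ~n_expected values around q[i] span the
        largest range (the locally most discriminating feature)."""
        half = max(1, n_expected // 2)
        best_feature, best_span = 0, -1.0
        for i, col in enumerate(sorted_cols):
            pos = bisect.bisect_left(col, q[i])
            lo = max(0, pos - half)             # ~N/2 values <= q[i]
            hi = min(len(col) - 1, pos + half)  # ~N/2 values >= q[i]
            span = col[hi] - col[lo]
            if span > best_span:
                best_feature, best_span = i, span
        return best_feature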

Friedman et al. vs. the brute-force algorithm:

The algorithm of Friedman et al. is considerably more efficient when the dimension of the underlying data is relatively small.

When the dimension of the underlying data is >= 9, brute force does much better.


Method #2 Representative feature:

Combine different features into one by using some information from each of the features.

Example: Each object is represented by n different features, each a 64-bit value.

Represent the object by a single number formed by concatenating the most significant bit (MSB) of each of the n features.

Drawback: Many objects will be represented by the same point.
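A sketch of this concatenation under the slide's assumptions (n features, each a 64-bit value; the helper name is mine):

    def msb_concat(features, bits=64):
        """Concatenate the most significant bit of each feature value
        into a single n-bit representative number."""
        key = 0
        for v in features:
            key = (key << 1) | ((v >> (bits - 1)) & 1)
        return key

    # e.g. two features whose MSBs are 1 and 0 -> key 0b10
    assert msb_concat([1 << 63, 42]) == 0b10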


Method #3 Space-ordering approach:

Use one of the space-ordering methods, such as the Morton and Peano-Hilbert orders.

Drawback: Pruning property does not hold.
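A minimal Morton (Z-order) sketch, assuming non-negative integer coordinates of bounded width: interleaving the bits of the coordinates yields a single sort key, so nearby points tend to receive nearby keys.

    def morton_key(coords, bits=16):
        """Interleave the coordinate bits (Morton / Z-order)."""
        key = 0
        for bit in range(bits - 1, -1, -1):   # most significant bit first
            for c in coords:
                key = (key << 1) | ((c >> bit) & 1)
        return key

    # sorting 2-D points by their Morton key gives the Z-order traversal
    points = [(2, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
    points.sort(key=morton_key)  # -> [(0,0), (0,1), (1,0), (1,1), (2,0)]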

Representative Point Methods

Transform a spatial object into a representative point in a space of the same or higher dimension (with respect to the space from which the objects are drawn).

Yields small-sized feature vectors.

Representative features of the object serve as the basis of the feature vector.

Example: Represent a t-dimensional object by:

Its centroid -> t features

Its axis-aligned minimum bounding rectangle -> 2t features, corresponding to the coordinate values of two diagonally opposite corners

Its minimum bounding sphere -> t+1 features, corresponding to the coordinate values of the centroid plus the magnitude of the radius (see the sketch after this list)

This is a dimension-reduction method:

The number of features used to represent the object is reduced in comparison with the feature-per-pixel method used to indicate the space occupied by the object.

Drawback: Pruning property will not hold.
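A sketch of these three representative points for an object given as a set of t-dimensional points (names are mine; the sphere below is centered at the centroid, so it is a bounding sphere but not necessarily the minimum one):

    import math

    def representative_points(pts):
        """pts: the points making up a t-dimensional object."""
        t = len(pts[0])
        # centroid -> t features
        centroid = tuple(sum(p[i] for p in pts) / len(pts) for i in range(t))
        # axis-aligned minimum bounding rectangle -> 2t features
        mbr = (tuple(min(p[i] for p in pts) for i in range(t)),
               tuple(max(p[i] for p in pts) for i in range(t)))
        # bounding sphere centered at the centroid -> t + 1 features
        radius = max(math.dist(centroid, p) for p in pts)
        return centroid, mbr, centroid + (radius,)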

Transformation into a Different and Smaller Feature Set

Find the subset of features that discriminates best between the data, and base the spatial index only on those features.

These methods depend on the data domain.

Alternative:

Transform the features into another set of more relevant, i.e., more discriminating, features.

Transform the data so that most of the information is concentrated in a small number of features.
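One standard instance of such a transformation (the slides do not name it) is principal component analysis: rotate the data so that most of its variance is concentrated in the first few features, then keep only those. A sketch using NumPy:

    import numpy as np

    def pca_reduce(X, k):
        """Project the rows of X (objects x features) onto the k
        directions of greatest variance."""
        Xc = X - X.mean(axis=0)  # center the data
        # right singular vectors are the principal directions
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T     # n-dim vectors -> k-dim vectors

    X = np.random.rand(100, 5)   # 100 objects, 5 features
    X2 = pca_reduce(X, 2)        # reduced to 2 features each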
