
Multimedia Information Retrieval
Dimension Reduction Methods

Contents

Introduction
Dimension Reduction Method
Nearest Neighbor Queries
Using Only One Dimension
Representative Point Methods
Transformation into a Different and Smaller Feature Set

Introduction

Motivation: Perform search more efficiently using some indexing structure
One dimension:

Data can be ordered

Build an index using conventional methods (e.g., B-trees) -> search is much easier

High-Dimensional Data

Use multidimensional indexing

Problem: Difficulties associated with multidimensional indexing

Dimension reduction to make multidimensional indexing feasible

Price: Lower quality of query results when performing search

Reduce to a dimension larger than one -> avoids too much information loss

Dimension Reduction Method

Definition: A mapping f that transforms a vector v in the original space into a vector v' = f(v) in the transformed lower-dimensional space.

Let d, d' be the distance metrics in the original and transformed space respectively,
and n, k the number of dimensions in the original and transformed space respectively, with n > k.

Distances in the transformed space approximate distances in the original space, i.e., d(u,v) ≈ d'(f(u),f(v)).
Recall: Defines the query quality. Enlarging the query radius in the transformed space raises recall but lowers precision.

Pruning property: d'(f(a),f(b)) <= d(a,b) for all objects a, b // ensures 100% recall for range queries
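As a concrete illustration (not from the original slides), truncating each vector to its first k coordinates under the Euclidean metric satisfies the pruning property, since dropping coordinates can only shrink a Euclidean distance. A minimal Python sketch:

    import math

    def f(v, k=2):
        # f: keep only the first k of the n coordinates
        return v[:k]

    a, b = (1.0, 2.0, 3.0), (4.0, 0.0, 8.0)
    # dropping coordinates can only shrink the Euclidean distance,
    # so d'(f(a), f(b)) <= d(a, b): the pruning property holds
    assert math.dist(f(a), f(b)) <= math.dist(a, b)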

Nearest Neighbor Queries

Used to:

Find the nearest object(s) to a given point, e.g., given a star, find the 5 closest stars

Find all objects within a given range, e.g., find all stars between 5 and 20 light years
of a given star

Spatial joins, e.g., find the three closest restaurants for each of two different movie theaters

For nearest neighbor queries, the pruning property does not guarantee 100% recall


Proximity-preserving property: the transformation preserves the ordering of distances among objects in the transformed space.

Given d, d' the distance metrics in the original and transformed space respectively, and a transformation f:

d(a,b) <= d(a,c) => d'(f(a),f(b)) <= d'(f(a),f(c)) for any objects a, b, and c
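To see why the pruning property alone is not enough for nearest neighbor queries, consider the coordinate-truncation mapping again (an illustrative example, not from the slides): it satisfies the pruning property, yet it can reverse the ordering of distances.

    import math

    # b is a's nearest neighbor in the original 2-D space, but after
    # truncating to the first coordinate, c appears nearer than b.
    a, b, c = (0.0, 0.0), (3.0, 0.0), (1.0, 5.0)
    assert math.dist(a, b) < math.dist(a, c)    # original order: b before c
    assert abs(c[0] - a[0]) < abs(b[0] - a[0])  # transformed order: c before b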

Using Only One Dimension

The original data dimension is known, e.g., the data is represented as feature vectors in a high-dimensional space.
Simplest technique:

Ignore some of the features

Retain the most discriminating of the features

Method #1 Most drastic and easiest to implement:

Use just one of the given features without applying any transformation

Drawback: Many objects may be represented by the same point.

K-Nearest Neighbor Algorithm (Friedman, Baskett and Shustek)


A feature f has been chosen, and all objects are sorted with respect to this feature (the f-distance is the distance along feature f alone).

Given a query object q, the k nearest neighbors are found by processing the objects in increasing order of their f-distance from q.

Processing stops upon encountering an object o whose f-distance from q is greater than the actual distance from q to the kth-nearest neighbor found so far (see the sketch below).
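A Python sketch of this search (the data layout and helper names are my assumptions, not from Friedman et al.): objects are pre-sorted on feature f, the search walks outward from q's position in that order, and it stops once the f-distance alone exceeds the true distance to the current kth-nearest neighbor.

    import bisect, heapq, math

    def knn_one_feature(objects, q, k, f=0):
        """k nearest neighbors of q, scanning objects in f-distance order."""
        order = sorted(objects, key=lambda o: o[f])  # normally precomputed once
        keys = [o[f] for o in order]
        i = bisect.bisect_left(keys, q[f]) - 1       # cursor moving left
        j = i + 1                                    # cursor moving right
        best = []                                    # heap of (-true distance, object)
        while i >= 0 or j < len(order):
            # take the unprocessed object with the smaller f-distance to q
            if j >= len(order) or (i >= 0 and q[f] - keys[i] <= keys[j] - q[f]):
                o, fdist, i = order[i], q[f] - keys[i], i - 1
            else:
                o, fdist, j = order[j], keys[j] - q[f], j + 1
            # stop: every remaining object is at least fdist away along f,
            # and the f-distance lower-bounds the true distance
            if len(best) == k and fdist > -best[0][0]:
                break
            heapq.heappush(best, (-math.dist(o, q), o))
            if len(best) > k:
                heapq.heappop(best)
        return [o for _, o in sorted(best, reverse=True)]

    # e.g. the two nearest neighbors of (1, 1):
    # knn_one_feature([(0, 0), (2, 1), (5, 5)], (1, 1), 2) -> [(2, 1), (0, 0)]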


The efficiency of the k-nearest neighbor algorithm of Friedman et al. [649] depends, in part, on which feature f is used.

Feature f can be chosen globally or locally.

Global perspective: f is the feature with the largest range (spread) of values.

Local perspective: f is the feature with the largest expected range of values about the value of feature f for the query object q (i.e., about q_f).

For the local choice, we have to examine all the objects before starting the search.

Objects have to be sorted with respect to all features.


The local density around q depends on the expected number N̄ of objects that will be examined during the search.

Friedman et al. obtain N̄ from the radius of the expected search region, assuming a uniform distribution.


The local density for feature i is determined by calculating the size of the range containing the N̄/2 values less than or equal to q_i and the N̄/2 values greater than or equal to q_i; f is chosen as the feature with the largest such range (a sketch follows).
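A sketch of this local choice (the function and argument names are mine): for each feature i, take the window of roughly N̄ sorted values that brackets q_i and pick the feature whose window spans the largest range, i.e., where the data around q is least dense.

    import bisect

    def pick_local_feature(sorted_cols, q, n_expected):
        """sorted_cols[i]: the values of feature i over all objects, sorted.
        Returns the feature whose ~n_expected values around q[i] span the
        largest range (the locally most discriminating feature)."""
        half = max(1, n_expected // 2)
        best_feature, best_span = 0, -1.0
        for i, col in enumerate(sorted_cols):
            pos = bisect.bisect_left(col, q[i])
            lo = max(0, pos - half)             # ~N/2 values <= q[i]
            hi = min(len(col) - 1, pos + half)  # ~N/2 values >= q[i]
            span = col[hi] - col[lo]
            if span > best_span:
                best_feature, best_span = i, span
        return best_feature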

Friedman et al. vs. the brute-force algorithm:

The algorithm of Friedman et al. is considerably more efficient when the dimension of the underlying data is relatively small.

When the dimension of the underlying data is >= 9, brute force does much better.


Method #2 Representative feature:

Combine different features into one by using some information from each of the features.

Example: Each object is represented by n different features, each a 64-bit value.

Represent the object by a single number formed by concatenating the most significant bit (MSB) of each of the n features.

Drawback: Many objects will be represented by the same point.
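A sketch of this concatenation under the slide's assumptions (n features, each a 64-bit value; the helper name is mine):

    def msb_concat(features, bits=64):
        """Concatenate the most significant bit of each feature value
        into a single n-bit representative number."""
        key = 0
        for v in features:
            key = (key << 1) | ((v >> (bits - 1)) & 1)
        return key

    # e.g. two features whose MSBs are 1 and 0 -> key 0b10
    assert msb_concat([1 << 63, 42]) == 0b10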


Method #3 Space-ordering approach:

Use one of the space-ordering methods, such as the Morton and Peano-Hilbert orders.

Drawback: Pruning property does not hold.
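A minimal Morton (Z-order) sketch, assuming non-negative integer coordinates of bounded width: interleaving the bits of the coordinates yields a single sort key, so nearby points tend to receive nearby keys.

    def morton_key(coords, bits=16):
        """Interleave the coordinate bits (Morton / Z-order)."""
        key = 0
        for bit in range(bits - 1, -1, -1):   # most significant bit first
            for c in coords:
                key = (key << 1) | ((c >> bit) & 1)
        return key

    # sorting 2-D points by their Morton key gives the Z-order traversal
    points = [(2, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
    points.sort(key=morton_key)  # -> [(0,0), (0,1), (1,0), (1,1), (2,0)]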

Representative Point Methods

Transform a spatial object into a representative point in a space of the same or higher dimension (with respect to the space from which the objects are drawn).

Yields small-sized feature vectors.

Representative features of the object serve as the basis of the feature vector.

Example: Represent a t-dimensional object by:

Its centroid -> t features

Its axis-aligned minimum bounding rectangle -> 2t features, corresponding to the coordinate values of two diagonally opposite corners

Its minimum bounding sphere -> t+1 features, corresponding to the coordinate values of the centroid plus the magnitude of the radius (see the sketch after this list)

This is a dimension-reduction method:

The number of features used to represent the object is reduced in comparison with the feature-per-pixel method used to indicate the space occupied by the object.

Drawback: Pruning property will not hold.
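A sketch of these three representative points for an object given as a set of t-dimensional points (names are mine; the sphere below is centered at the centroid, so it is a bounding sphere but not necessarily the minimum one):

    import math

    def representative_points(pts):
        """pts: the points making up a t-dimensional object."""
        t = len(pts[0])
        # centroid -> t features
        centroid = tuple(sum(p[i] for p in pts) / len(pts) for i in range(t))
        # axis-aligned minimum bounding rectangle -> 2t features
        mbr = (tuple(min(p[i] for p in pts) for i in range(t)),
               tuple(max(p[i] for p in pts) for i in range(t)))
        # bounding sphere centered at the centroid -> t + 1 features
        radius = max(math.dist(centroid, p) for p in pts)
        return centroid, mbr, centroid + (radius,)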

Transformation into a Different and Smaller Feature Set

Find the subset of features that discriminates best between the data, and base the spatial index only on those features.

These methods depend on the data domain.

Alternative:

Transform the features into another set of more relevant, i.e., more discriminating, features.

Transform the data so that most of the information is concentrated in a small number of features.
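One standard instance of such a transformation (the slides do not name it) is principal component analysis: rotate the data so that most of its variance is concentrated in the first few features, then keep only those. A sketch using NumPy:

    import numpy as np

    def pca_reduce(X, k):
        """Project the rows of X (objects x features) onto the k
        directions of greatest variance."""
        Xc = X - X.mean(axis=0)  # center the data
        # right singular vectors are the principal directions
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T     # n-dim vectors -> k-dim vectors

    X = np.random.rand(100, 5)   # 100 objects, 5 features
    X2 = pca_reduce(X, 2)        # reduced to 2 features each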
