
iirs DIGITAL IMAGE CLASSIFICATION

POONAM S. TIWARI
Photogrammetry and Remote Sensing
Division
Digital Image Classification Poonam S.Tiwari
Main lecture topics
• What is it and why use it?
• Image space versus feature space
• Distances in feature space
• Decision boundaries in feature space
• Unsupervised versus supervised training
• Classification algorithms
• Validation (how good is the result?)
• Problems

What is Digital Image Classification?
Multispectral classification is the process of sorting pixels
into a finite number of individual classes, or categories of
data, based on their data file values. If a pixel satisfies a
certain set of criteria, the pixel is assigned to the class
that corresponds to those criteria.
Multispectral classification may be performed using a
variety of algorithms:
• Hard classification using supervised or unsupervised
approaches.
• Classification using fuzzy logic.
• Hybrid approaches, often involving the use of ancillary
information.

What is Digital Image Classification?
• Grouping of similar pixels
• Separation of dissimilar ones
• Assigning a class label to pixels
• Resulting in a manageable number of classes

CLASSIFICATION METHODS
MANUAL
• visual interpretation
• combination of spectral and spatial information

COMPUTER ASSISTED
• mainly spectral information

STRATIFIED
• using GIS functionality to incorporate knowledge from other sources of information

Why use it?
• To translate continuous variability of
image data into map patterns that
provide meaning to the user.
• To obtain insight in the data with
respect to ground cover and surface
characteristics.
• To find anomalous patterns in the
image data set.

Why use it? (advantages)

• Cost-efficient in the analysis of large data sets
• Results can be reproduced
• More objective than visual interpretation
• Effective analysis of complex multi-band (spectral) interrelationships

Dimensionality of Data

• Spectral dimensionality is determined by the number of sets of values being used in a process.
• In image processing, each band of data is a set of values. An image with four bands of data is said to be four-dimensional (Jensen, 1996).

Measurement Vector

• The measurement vector of a pixel is the set of data file values for one pixel in all n bands.
• Although image data files are stored band-by-band, it is often necessary to extract the measurement vectors for individual pixels.
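As an illustration of extracting a measurement vector from band-by-band storage, here is a minimal Python sketch. The function name `measurement_vector` and the two small bands are hypothetical and are not taken from the lecture.

```python
def measurement_vector(bands, row, col):
    """Return the data file values of one pixel across all n bands."""
    return [band[row][col] for band in bands]

# Two hypothetical 2x2 bands, stored band-by-band (illustrative values).
band_1 = [[34, 34], [11, 50]]
band_2 = [[25, 24], [77, 60]]
bands = [band_1, band_2]

vector = measurement_vector(bands, row=1, col=0)
print(vector)  # [11, 77]
```

The same indexing works for any number of bands, because the result always has one element per band.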

Mean Vector
• When the measurement vectors of several pixels are analyzed, a mean vector is often calculated.
• This is the vector of the means of the data file values in each band. It has n elements.

Mean vector: $\mu = [\mu_1, \mu_2, \ldots, \mu_n]^T$, where $\mu_i$ is the mean of the data file values in band i.
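A small sketch of the mean vector calculation, assuming the measurement vectors are given as plain Python lists. The function name `mean_vector` is hypothetical; the three sample vectors reuse the pixel values A, B and C from the feature space example in this lecture.

```python
def mean_vector(vectors):
    """Average each band over all measurement vectors."""
    count = len(vectors)
    n_bands = len(vectors[0])
    return [sum(v[i] for v in vectors) / count for i in range(n_bands)]

# Measurement vectors of pixels A, B and C in two bands.
pixels = [[34, 25], [34, 24], [11, 77]]
mean = mean_vector(pixels)
print(mean)  # one mean per band; the second band averages to 42.0
```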

Image space

[Figure: single-band image and multi-band image]

• Image space (col, row)
• Array of elements corresponding to reflected or emitted energy from the IFOV
• Spatial arrangement of the measurements of the reflected or emitted energy

Feature Space:
• A feature space image is simply a graph of the data file values of one band of data against the values of another band.

[Figure: Analyzing patterns in multispectral data. Pixels A, B and C plotted in feature space]
PIXEL A: 34,25
PIXEL B: 34,24
PIXEL C: 11,77

One-dimensional feature space

[Figure: an input layer and one-dimensional feature spaces, one with no distinction between classes and one with distinction between classes]

Multi-dimensional feature space

[Figure: feature vectors]

Feature space (scattergram)

[Figure: scattergram with regions of low and high frequency]

• Two- or three-dimensional graph or scatter diagram.
• Formation of clusters of points representing DN values in two or three spectral bands.
• Each cluster of points corresponds to a certain cover type on the ground.
Distances and clusters in feature space

[Figure: a cluster of points in feature space with axes band x and band y (units of 5 DN), the cluster extents Min x, Max x, Min y and Max y, and the Euclidean distance between points]

Spectral Distance
Euclidean spectral distance is distance in n-dimensional
spectral space. It is a number that allows two measurement
vectors to be compared for similarity. The spectral distance
between two pixels can be calculated as follows:

$$D = \sqrt{\sum_{i=1}^{n} (d_i - e_i)^2}$$

Where:
D = spectral distance
n = number of bands (dimensions)
i = a particular band
$d_i$ = data file value of pixel d in band i
$e_i$ = data file value of pixel e in band i
This is the equation for Euclidean distance. In two dimensions (when n = 2), it
can be simplified to the Pythagorean Theorem ($c^2 = a^2 + b^2$), or in this case:
$D^2 = (d_i - e_i)^2 + (d_j - e_j)^2$
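The distance formula can be checked with a few lines of Python. This is a minimal sketch: the function name `spectral_distance` is hypothetical, and the pixel values are those of pixels A, B and C from the feature space example.

```python
import math

def spectral_distance(d, e):
    """Euclidean distance between two measurement vectors of equal length."""
    return math.sqrt(sum((di - ei) ** 2 for di, ei in zip(d, e)))

pixel_a = [34, 25]
pixel_b = [34, 24]
pixel_c = [11, 77]

print(spectral_distance(pixel_a, pixel_b))  # 1.0
print(spectral_distance(pixel_a, pixel_c))  # much larger: A and C are dissimilar
```

Pixels A and B lie close together in feature space and would normally fall in the same cluster, while pixel C lies far from both.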

Image classification process
1. Selection of the image data
4. Definition of the clusters in the feature space
5. Validation of the result
[Figure: flow diagram of the five-step classification process]

• It is also important for the analyst to realize that there is a fundamental difference between information classes and spectral classes.
• Information classes are those that human beings define.
• Spectral classes are those that are inherent in the remote sensor data and must be identified and then labeled by the analyst.

SUPERVISED CLASSIFICATION:

• The identity and location of some of the land cover types, such as urban, agriculture and wetland, are known a priori through a combination of field work and experience.
• The analyst attempts to locate specific sites in the remotely sensed data that represent homogeneous examples of these known land cover types. These sites are known as training sites.
• Multivariate statistical parameters are calculated for these training sites.
• Every pixel, both inside and outside the training sites, is evaluated and assigned to the class of which it has the highest likelihood of being a member.
Supervised image classification

Steps in supervised classification
• Identification of sample areas (training areas)
• Partitioning of the feature space

A class sample
• Is a number of training pixels
• Forms a cluster in feature space

A cluster
• Is the representative for a class
• Includes a minimum number of observations (30*n)
• Is distinct
UNSUPERVISED CLASSIFICATION
• The identities of land cover types to be specified
as classes within a scene are generally not
known a priori because ground reference
information is lacking or surface features within
the scene are not well defined.
• The computer is required to group pixels with
similar spectral characteristics into unique
clusters according to some statistically
determined criteria.
• The analyst then combines and relabels the spectral
clusters into information classes.

Supervised vs. Unsupervised Training
• In supervised training, it is important to have a set of desired
classes in mind, and then create the appropriate signatures
from the data.

• Supervised classification is usually appropriate when you want to identify relatively few classes, when you have selected training sites that can be verified with ground truth data, or when you can identify distinct, homogeneous regions that represent each class.

• On the other hand, if you want the classes to be determined by spectral distinctions that are inherent in the data so that you can define the classes later, then the application is better suited to unsupervised training. Unsupervised training enables you to define many classes easily, and identify classes that are not in contiguous, easily recognized regions.

SUPERVISED CLASSIFICATION

• In supervised training, you rely on your own pattern recognition skills and a priori knowledge of the data to help the system determine the statistical criteria (signatures) for data classification.
• To select reliable samples, you should know some information, either spatial or spectral, about the pixels that you want to classify.

Partition of a feature space

• decide on decision boundaries
• assign a class to each pixel

[Figure: feature space partitioned into class a, class b, class c and class d]

Training Samples and Feature Space Objects

• Training samples (also called samples) are sets of pixels that represent what is recognized as a discernible pattern, or potential class. The system calculates statistics from the sample pixels to create a parametric signature for the class.

Selecting Training Samples
• Training data for a class should be collected from a homogeneous environment.
• Each site is usually composed of many pixels. The general rule is that if training data are being collected from n bands, then more than 10n pixels of training data should be collected for each class. This is sufficient to compute the variance-covariance matrices required by some classification algorithms.

There are a number of ways to collect training site data:

• using a vector layer
• defining a polygon in the image
• using a class from a thematic raster layer from an image file of the same area (i.e., the result of an unsupervised classification)

Evaluating Signatures

• There are tests to perform that can help determine whether the signature data are a true representation of the pixels to be classified for each class. You can evaluate signatures that were created either from supervised or unsupervised training.

Evaluation of Signatures
• Ellipse: view ellipse diagrams and scatterplots of data file values for every pair of bands.

Evaluation of Signatures (continued)

Signature separability is a statistical measure of distance between two signatures. Separability can be calculated for any combination of bands that is used in the classification, enabling you to rule out any bands that are not useful in the results of the classification.

1. Euclidean Distance:

$$D = \sqrt{\sum_{i=1}^{n} (d_i - e_i)^2}$$

Where:
D = spectral distance
n = number of bands (dimensions)
i = a particular band
$d_i$ = data file value of pixel d in band i
$e_i$ = data file value of pixel e in band i

Signature Separability (continued)

2. Divergence

Signature Separability (continued)
3. Transformed Divergence

The scale of the divergence values can range from 0 to 2,000. As a general rule, if the result is greater than 1,900, then the classes can be separated. Between 1,700 and 1,900, the separation is fairly good. Below 1,700, the separation is poor (Jensen, 1996).

Signature Separability (continued)
4. Jeffries-Matusita Distance

The range of JM is between 0 and 1414. The JM distance has a saturating behavior with increasing class separation, like transformed divergence. However, it is not as computationally efficient as transformed divergence (Jensen, 1996).
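For reference, the three statistical separability measures named above are commonly written as follows. This is a sketch under stated assumptions rather than a reproduction of the original slides: the symbols $\mu_i$, $\mu_j$ (mean vectors of signatures i and j) and $C_i$, $C_j$ (their covariance matrices) are introduced here for illustration, and the factor of 1000 in the JM distance is assumed in order to match the stated range of 0 to 1414.

```latex
% Divergence between signatures i and j
D_{ij} = \tfrac{1}{2}\,\mathrm{tr}\!\left[(C_i - C_j)\,(C_j^{-1} - C_i^{-1})\right]
       + \tfrac{1}{2}\,\mathrm{tr}\!\left[(C_i^{-1} + C_j^{-1})\,(\mu_i - \mu_j)(\mu_i - \mu_j)^T\right]

% Transformed divergence (saturates at 2000)
TD_{ij} = 2000\left[1 - \exp\!\left(-\frac{D_{ij}}{8}\right)\right]

% Jeffries-Matusita distance (saturates at 1000\sqrt{2} \approx 1414)
\alpha = \tfrac{1}{8}\,(\mu_i - \mu_j)^T\left(\frac{C_i + C_j}{2}\right)^{-1}(\mu_i - \mu_j)
       + \tfrac{1}{2}\,\ln\!\left(\frac{\left|\tfrac{1}{2}(C_i + C_j)\right|}{\sqrt{|C_i|\,|C_j|}}\right)

JM_{ij} = 1000\,\sqrt{2\left(1 - e^{-\alpha}\right)}
```

Both the transformed divergence and the JM distance approach their upper limit as the classes move apart, which is the saturating behavior described above.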

SELECTING APPROPRIATE CLASSIFICATION ALGORITHM
• Various supervised classification algorithms may be used to assign an unknown pixel to one of the classes.
• The choice of a particular classifier depends on the nature of the input data and the output required.
• Parametric classification algorithms assume that the observed measurement vectors $X_c$, obtained for each class in each spectral band during the training phase, are Gaussian in nature.
• Nonparametric classification algorithms make no such assumptions.
• There are many classification algorithms, e.g. parallelepiped, minimum distance and maximum likelihood.

PARALLELEPIPED CLASSIFICATION ALGORITHM
• In the parallelepiped decision rule, the data file values of the candidate pixel are compared to upper and lower limits. These limits can be either:
1. the minimum and maximum data file values of each band in the signature,
2. the mean of each band, plus and minus a number of standard deviations, or
3. any limits that you specify, based on your knowledge of the data and signatures.
• There are high and low limits for every signature in every band. When a pixel's data file values are between the limits for every band in a signature, then the pixel is assigned to that signature's class.

• Therefore, if the low and high decision boundaries are defined as
$L_{ck} = \mu_{ck} - S_{ck}$
and
$H_{ck} = \mu_{ck} + S_{ck}$
• the parallelepiped algorithm becomes
$L_{ck} \le BV_{ijk} \le H_{ck}$
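The decision rule above can be sketched in Python as follows. This is an illustrative sketch, not the lecture's implementation: the function name, the `n_std` parameter and the two signatures are hypothetical, and overlap is resolved here simply by signature order.

```python
def parallelepiped_classify(pixel, signatures, n_std=1.0):
    """Return the first class whose box contains the pixel, else None."""
    for name, (means, stds) in signatures.items():
        inside = all(
            m - n_std * s <= value <= m + n_std * s
            for value, m, s in zip(pixel, means, stds)
        )
        if inside:
            return name
    return None  # pixel falls outside every box: left unclassified

# Hypothetical two-band signatures: (band means, band standard deviations).
signatures = {
    "water": ([10.0, 8.0], [2.0, 2.0]),
    "forest": ([40.0, 35.0], [5.0, 5.0]),
}

print(parallelepiped_classify([42, 38], signatures))  # forest
print(parallelepiped_classify([90, 90], signatures))  # None
```

Because only comparisons are involved, the rule is fast, but a pixel outside every box stays unclassified.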

[Figure: Feature space partitioning with the box classifier. Left panel: means and standard deviations. Right panel: partitioned feature space, including the class "unknown". Axes: Band 1 and Band 2, each from 0 to 255]

Points a and b are pixels in the image to be classified. Pixel a has a brightness value of 40 in band 4 and 40 in band 5. Pixel b has a brightness value of 10 in band 4 and 40 in band 5. The boxes represent the parallelepiped decision rule associated with a ±1σ classification. The vectors (arrows) represent the distance from a and b to the mean of all classes in a minimum distance to means classification algorithm.

Overlap Region

In cases where a pixel may fall into the overlap region of two or more parallelepipeds, you must define how the pixel can be classified.

• The pixel can be classified by the order of the signatures.
• The pixel can be classified by the defined parametric decision rule.
• The pixel can be left unclassified.

ADVANTAGES:
• Fast and simple.
• Gives a broad classification, and thus narrows down the number of possible classes to which each pixel can be assigned before more time-consuming calculations are made.
• Not dependent on normal distributions.

DISADVANTAGES:
• Since a parallelepiped has corners, pixels that are actually quite far, spectrally, from the mean of the signature may be classified.

[Figure: Parallelepiped corners compared to the signature ellipse]

MINIMUM DISTANCE TO MEANS CLASSIFICATION ALGORITHM
• This decision rule is computationally simple and commonly used.
• Requires the mean vectors for each class in each band, $\mu_{ck}$, from the training data.
• The Euclidean distance is calculated from every pixel to every signature mean:
$$D = \sqrt{(BV_{ijk} - \mu_{ck})^2 + (BV_{ijl} - \mu_{cl})^2}$$
where $\mu_{ck}$ and $\mu_{cl}$ represent the mean vectors for class c measured in bands k and l.
• Every unknown pixel is assigned to one of the classes, so there are no unclassified pixels.
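A minimal Python sketch of the minimum distance to means rule is given below. The function name and the class means are hypothetical; the test pixel uses the brightness values of pixel a (40 in band 4 and 40 in band 5) from the figure caption in this lecture.

```python
import math

def minimum_distance_classify(pixel, class_means):
    """Return (class name, distance) for the nearest class mean."""
    best_class, best_distance = None, math.inf
    for name, mean in class_means.items():
        distance = math.dist(pixel, mean)  # Euclidean distance
        if distance < best_distance:
            best_class, best_distance = name, distance
    return best_class, best_distance

# Hypothetical class mean vectors in bands 4 and 5.
class_means = {
    "water": [10, 12],
    "forest": [35, 40],
    "urban": [60, 55],
}

print(minimum_distance_classify([40, 40], class_means))  # ('forest', 5.0)
```

Every pixel receives a label, however far it lies from the nearest mean, which is why a threshold distance is sometimes added in practice.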

MINIMUM DISTANCE TO MEANS

[Figure: histogram of training set; horizontal axis 0 to 255, vertical axis 0 to 300]

Decision rule:
Priority to the shortest distance to the class mean

Feature Space Partitioning - Minimum Distance to Mean Classifier

[Figure: feature space panels with axes Band 1 and Band 2 (0 to 255). Labels: mean vectors, "Unknown", threshold distance]

ADVANTAGES:
• Since every pixel is spectrally closer to one sample mean than to the others, there are no unclassified pixels.
• Fastest decision rule after parallelepiped.
DISADVANTAGES:
• Pixels which should be unclassified will become classified.
• Does not consider class variability.

Mahalanobis Decision Rule
• Mahalanobis distance is similar to minimum distance, except that the covariance matrix is used in the equation. Variance and covariance are figured in so that clusters that are highly varied lead to similarly varied classes.
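To make the role of the covariance matrix concrete, here is a small two-band Python sketch of the squared Mahalanobis distance, $(X - M_c)^T Cov_c^{-1} (X - M_c)$. The function name and the sample covariance matrices are hypothetical, and the 2x2 inverse is written out by hand so that the example needs no external library.

```python
def mahalanobis_squared(pixel, mean, cov):
    """(x - m)^T C^-1 (x - m) for a two-band pixel and a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det

pixel = [2.0, 1.0]
mean = [0.0, 0.0]
tight_class = [[1.0, 0.0], [0.0, 1.0]]   # low variance in both bands
varied_class = [[4.0, 0.0], [0.0, 1.0]]  # high variance in the first band

print(mahalanobis_squared(pixel, mean, tight_class))   # 5.0
print(mahalanobis_squared(pixel, mean, varied_class))  # 2.0
```

The same pixel is "closer" to the more varied class, which is how highly varied clusters come to claim more pixels.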

Advantages
• Takes the variability of classes into account, unlike
minimum distance or parallelepiped.
• May be more useful than minimum distance in cases
where statistical criteria (as expressed in the covariance
matrix) must be taken into account
Disadvantages
• Tends to overclassify signatures with relatively large
values in the covariance matrix.
• Slower to compute than parallelepiped or minimum
distance.
• Mahalanobis distance is parametric, meaning that it
relies heavily on a normal distribution of the data in each
input band.

Maximum Likelihood/Bayesian
Decision Rule
• The maximum likelihood decision rule is based on the
probability that a pixel belongs to a particular class. The
basic equation assumes that these probabilities are
equal for all classes, and that the input bands have
normal distributions.
• If you have a priori knowledge that the probabilities are
not equal for all classes, you can specify weight factors
for particular classes. This variation of the maximum
likelihood decision rule is known as the Bayesian
decision rule (Hord, 1982).

The equation for the maximum likelihood/Bayesian classifier is as follows:

The pixel is assigned to the class, c, for which D is the lowest.
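One common way to write this discriminant so that the lowest value wins, as the rule above requires, is sketched below. The symbols are assumptions introduced for illustration: $a_c$ is the a priori probability (weight factor) of class c, $X$ is the measurement vector of the candidate pixel, $M_c$ is the mean vector of class c and $Cov_c$ is its covariance matrix. This form is not necessarily the exact one shown on the original slide.

```latex
D = -\ln(a_c)
  + \tfrac{1}{2}\,\ln\!\left(\lvert Cov_c \rvert\right)
  + \tfrac{1}{2}\,(X - M_c)^T\, Cov_c^{-1}\,(X - M_c)
```

When all classes have equal weights, the first term is the same for every class and can be dropped. The last term is half the squared Mahalanobis distance, so the two rules differ mainly through the determinant term.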

Advantages
• The most accurate of the classifiers (if the input samples/clusters
have a normal distribution), because it takes the most variables into
consideration.
• Takes the variability of classes into account by using the covariance
matrix, as does Mahalanobis distance.
Disadvantages
• An extensive equation that takes a long time to compute. The
computation time increases with the number of input bands.
• Maximum likelihood is parametric, meaning that it relies heavily on a
normal distribution of the data in each input band.
• Tends to overclassify signatures with relatively large
values in the covariance matrix.

UNSUPERVISED CLASSIFICATION
• It requires only a minimum amount of initial input from the
analyst.
• Numerical operations are performed that search for natural
groupings of the spectral properties of pixels.
• User allows computer to select the class means and
covariance matrices to be used in the classification.
• Once the data are classified, the analyst attempts a posteriori
to assign these natural or spectral classes to the information
classes of interest.
• Some clusters may be meaningless because they represent
mixed classes.
• Clustering algorithms used for unsupervised classification
generally vary according to the efficiency with which the
clustering takes place.
• Two commonly used methods are:
– Chain method
– ISODATA clustering

CHAIN METHOD
• Operates in two-pass mode (it passes through the
registered multispectral dataset two times).
• In the first pass the program reads through the dataset
and sequentially builds clusters.
• A mean vector is associated with each cluster.
• In the second pass a minimum distance to means
classification algorithm is applied to whole dataset on a
pixel by pixel basis whereby each pixel is assigned to
one of the mean vectors created in pass 1.
• The first pass automatically creates the cluster
signatures to be used by a supervised classifier.

PASS 1: CLUSTER BUILDING

• During the first pass the analyst is required to supply four types of information:
• R, the radius distance in spectral space used to determine when a new cluster should be formed.
• C, a spectral space distance parameter used when merging clusters when N is reached.
• N, the number of pixels to be evaluated between each major merging of clusters.
• Cmax, the maximum number of clusters to be identified.

PASS 2: Assignment of pixels to one of the Cmax clusters using minimum distance classification logic

[Figure: Original brightness values of pixels 1, 2, and 3 as measured in Bands 4 and 5 of the hypothetical remotely sensed data.]

The distance (D) in 2-dimensional spectral space between pixel 1 (cluster 1) and pixel 2 (potential cluster 2) in the first iteration is computed and tested against the value of R = 15, the minimum acceptable radius. In this case, D does not exceed R. Therefore, we merge clusters 1 and 2 as shown in the next illustration.

Pixels 1 and 2 now represent cluster #1. Note that the location of cluster 1 has migrated from 10,10 to 15,15 after the first iteration. Now, the distance to pixel 3 (D = 15.81) is computed to see if it is greater than the minimum threshold, R = 15. It is, so pixel location 3 becomes cluster #2. This process continues until all 20 clusters are identified. Then the 20 clusters are evaluated using a distance measure, C (not shown), to merge the clusters that are closest to one another.
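The cluster-building pass described above can be sketched in Python as follows. This is a simplified illustration, not the full chain method: it omits the periodic merging controlled by C and N and the Cmax limit. The function name is hypothetical, and the coordinates of pixels 2 and 3 are assumed values, (20, 20) and (30, 20), chosen to be consistent with the migration of cluster 1 to 15,15 and with D = 15.81.

```python
import math

def chain_pass_one(pixels, radius):
    """Sequentially build clusters; each cluster is [mean_vector, pixel_count]."""
    clusters = []
    for pixel in pixels:
        nearest = min(clusters, key=lambda c: math.dist(pixel, c[0]), default=None)
        if nearest is not None and math.dist(pixel, nearest[0]) <= radius:
            mean, count = nearest
            # Update the running mean, weighted by the pixels already in the cluster.
            nearest[0] = [(m * count + p) / (count + 1) for m, p in zip(mean, pixel)]
            nearest[1] = count + 1
        else:
            clusters.append([[float(p) for p in pixel], 1])
    return clusters

pixels = [(10, 10), (20, 20), (30, 20)]  # pixel 1, then assumed pixels 2 and 3
clusters = chain_pass_one(pixels, radius=15)
print(clusters)  # [[[15.0, 15.0], 2], [[30.0, 20.0], 1]]
```

Pixel 2 lies about 14.14 from pixel 1, so the two merge and the mean moves to (15, 15). Pixel 3 lies about 15.81 from that mean, so it starts a second cluster.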

[Figure: How clusters migrate during the several iterations of a clustering algorithm. The final ending point represents the mean vector that would be used in phase 2 of the clustering process when the minimum distance classification is performed.]

• Note: As more points are added to a cluster, the mean shifts less dramatically, since the newly computed mean is weighted by the number of pixels currently in the cluster. The ending point is the spectral location of the final mean vector that is used as a signature in the minimum distance classifier applied in pass 2.

• Some clustering algorithms allow the analyst to initially seed the mean vector for several of the important classes. The seed data are usually obtained in a supervised fashion, as discussed previously. Others allow the analyst to use a priori information to direct the clustering process.

Pass 2: Assignment of Pixels to One of the Cmax Clusters Using Minimum Distance Classification Logic
The final cluster mean data vectors are used in a minimum distance to means classification algorithm to classify all the pixels in the image into one of the Cmax clusters.

ISODATA Clustering
The Iterative Self-Organizing Data Analysis Technique (ISODATA) represents a comprehensive set of heuristic (rule of thumb) procedures that have been incorporated into an iterative classification algorithm.

The ISODATA algorithm is a modification of the k-means clustering algorithm, which includes a) merging clusters if their separation distance in multispectral feature space is below a user-specified threshold and b) rules for splitting a single cluster into two clusters.

ISODATA is iterative because it makes a large number of passes through the remote sensing dataset until specified results are obtained, instead of just two passes.

ISODATA does not allocate its initial mean vectors based on the analysis of pixels. Rather, an initial arbitrary assignment of all Cmax clusters takes place along an n-dimensional vector that runs between very specific points in feature space.
The ISODATA algorithm normally requires the analyst to specify:
• Cmax: the maximum number of clusters to be identified.
• T: the maximum percentage of pixels whose class values are allowed to be unchanged between iterations.
• M: the maximum number of times ISODATA is to classify pixels and recalculate cluster mean vectors.
• Minimum members in a cluster.
• Maximum standard deviation for a cluster.
• Split separation value (if the value is changed from 0.0, it takes the place of the standard deviation).
• Minimum distance between cluster means.

Phase 1: ISODATA cluster building using many passes through the dataset.
a) ISODATA initial distribution of five hypothetical mean vectors using ±1σ standard deviations in both bands as beginning and ending points.
b) In the first iteration, each candidate pixel is compared to each cluster mean and assigned to the cluster whose mean is closest in Euclidean distance.
c) During the second iteration, a new mean is calculated for each cluster based on the actual spectral locations of the pixels assigned to each cluster, instead of the initial arbitrary calculation. This involves analysis of several parameters to merge or split clusters. After the new cluster mean vectors are selected, every pixel in the scene is assigned to one of the new clusters.
d) This split-merge-assign process continues until there is little change in class assignment between iterations (the T threshold is reached) or the maximum number of iterations is reached (M).
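The assign-and-recompute core of this loop can be sketched in Python as below. This is a deliberately reduced illustration: it keeps the two stopping criteria (T and M) but leaves out the split and merge rules, so it behaves like plain k-means rather than full ISODATA. The function names, the sample pixels and the initial means are hypothetical.

```python
import math

def assign(pixels, means):
    """Index of the closest mean (Euclidean distance) for every pixel."""
    return [min(range(len(means)), key=lambda k: math.dist(p, means[k]))
            for p in pixels]

def iterate_means(pixels, means, max_iterations=10, unchanged_threshold=0.95):
    labels = assign(pixels, means)
    for _ in range(max_iterations):  # M in the text
        new_means = []
        for k, old in enumerate(means):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            # Recompute the mean from its members; keep the old mean if empty.
            new_means.append([sum(v) / len(members) for v in zip(*members)]
                             if members else old)
        means = new_means
        new_labels = assign(pixels, means)
        unchanged = sum(a == b for a, b in zip(labels, new_labels)) / len(pixels)
        labels = new_labels
        if unchanged >= unchanged_threshold:  # T in the text
            break
    return means, labels

pixels = [(10, 10), (12, 11), (11, 13), (50, 52), (52, 49), (49, 51)]
means, labels = iterate_means(pixels, means=[[0.0, 0.0], [60.0, 60.0]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Starting from arbitrary means, the cluster means migrate to the centers of the two groups of pixels after a single recomputation, at which point no labels change and the loop stops.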

[Figure: a) Distribution of 20 ISODATA mean vectors after just one iteration. b) Distribution of 20 ISODATA mean vectors after 20 iterations. The bulk of the important feature space (the gray background) is partitioned rather well after just 20 iterations.]

Sources of Uncertainty in Image Classification

1. Non-representative training areas
2. High variability in the spectral signatures for a land cover class
3. Mixed land cover within the pixel area

Evaluating Classification

• After a classification is performed, these methods are available for testing the accuracy of the classification:
• Thresholding: use a probability image file to screen out misclassified pixels.
• Accuracy assessment: compare the classification to ground truth or other data.

Accuracy Assessment

Accuracy assessment is a general term for comparing the classification to geographical data that are assumed to be true, in order to determine the accuracy of the classification process. Usually, the assumed-true data are derived from ground truth data.

Accuracy Assessment (continued)

• Assessing the accuracy of a remote sensing output is one of the most important steps in any classification exercise.
• Without an accuracy assessment, the output or results are of little value.

There are a number of issues relevant to the generation and assessment of errors in a classification.

These include:

• the nature of the classification;
• sample design; and
• assessment sample size.

Nature of Classification:

– 1) Class definition problems occur when trying to extract information from an image, such as tree height, that cannot realistically be derived from it. If this happens, the error rate will increase.

– 2) A common problem in classifying remotely sensed data is the use of inappropriate class labels, such as cliff, lake or river, all of which are landforms and not cover types. Similarly, a common error is that of using class labels which define land uses. These features are commonly made up of several cover classes.

– 3) The final point here, in terms of the potential for generation of error, is the mislabeling of classes. The most obvious example of this is to label a training site as water when in fact it is something else. This will result in, at best, a skewing of your class statistics if your training site samples are sufficiently large or, at worst, a shift of the training statistics entirely if your sites are relatively small.

Sample Design:
• In addition to being independent of the original training sample, the sample used
must be of a design that will ensure consistency and objectivity.

• A number of sampling techniques can be used. Some of these include random, systematic, and stratified random.

• Of the three, the systematic sample is the least useful. This approach to
sampling may result in a sample distribution which favours a particular class
depending on the distribution of the classes within the map.

• Only random sample designs can guarantee an unbiased sample.

• The truly random strategy however may not yield a sample design that covers
the entire map area, and so may be less than ideal.

• In many instances the stratified random sampling strategy is the most useful
tool to use. In this case the map area is stratified based on either a
systematic breakdown followed by a random sample design in each of the
systematic subareas, or alternatively through the application of a random
sample within each of the map classes. The use of this approach will ensure
that one has an adequate cover for the entire map as well as generating a
sufficient number of samples for each of the classes on the map.

Sample Size:

• The size of the sample used must be sufficiently large to be statistically representative of the map area. The number of points considered necessary varies, depending on the method used to estimate it.

• What this means is that when using a systematic or random sample, the number of points is kept to a manageable number. Because the number of points contained within a stratified area is usually high, that is, greater than 10,000, the number of samples used to test the accuracy of the classes through a stratified random sample will be high as well. The cost of using a highly accurate sampling strategy is therefore a large number of samples.

ERROR MATRIX
• Once a classification has been sampled, a contingency table (also referred to as an error matrix or confusion matrix) is developed.
• This table is used to properly analyze the validity of each class as well as the classification as a whole.
• In this way we can evaluate in more detail the efficacy of the classification.

• One way to assess accuracy is to go out in the field and observe the actual land class at a sample of locations, and compare it to the land class that was assigned on the thematic map.
• There are a number of ways to quantitatively express the amount of agreement between the ground truth classes and the remote sensing classes.
• One way is to construct a confusion matrix, alternatively called an error matrix.
• This is a row-by-column table, with as many rows as columns.
• Each row of the table is reserved for one of the information, or remote sensing, classes used by the classification algorithm.
• Each column displays the corresponding ground truth classes in an identical order.

OVERALL ACCURACY
• The diagonal elements tally the number of pixels classified correctly in each class.
• Overall accuracy is the sum of the diagonal elements divided by the total number of pixels in the error matrix.
• But just because 83% of the classifications were accurate overall does not mean that each category was successfully classified at that rate.

USER'S ACCURACY
• A user of the imagery who is particularly interested in class A, say, might wish to know what proportion of pixels assigned to class A were correctly assigned.
• In this example, 35 of the 39 pixels assigned to class A were correctly assigned, so the user's accuracy in this category is 35/39 = 90%.

In general terms, the user's accuracy for a particular category is computed as:

$$\text{user's accuracy} = \frac{\text{number of pixels correctly classified in the category}}{\text{total number of pixels classified in that category}}$$

• which, for an error matrix set up with the row and column assignments as stated, is the diagonal element divided by the row total.

• Evidently, a user's accuracy can be computed for each row.

PRODUCER'S ACCURACY

• Contrasted with user's accuracy is producer's accuracy, which has a slightly different interpretation.
• Producer's accuracy is a measure of how much of the land in each category was classified correctly.
• It is found, for each class or category, as the diagonal element divided by the column total.

The producer's accuracy for class A is 35/50 = 70%.

So from this assessment we have three measures of accuracy which address subtly different issues:

– overall accuracy: takes no account of the source of error (errors of omission or commission)
– user's accuracy: measures the proportion of each TM class which is correct.
– producer's accuracy: measures the proportion of the land base which is correctly classified.

KAPPA COEFFICIENT
• Another measure of map accuracy is the kappa coefficient, which is a measure of the proportional (or percentage) improvement by the classifier over a purely random assignment to classes.

For an error matrix with r rows, and hence the same number of columns, let
– A = the sum of the r diagonal elements, which is the numerator in the computation of overall accuracy
– B = the sum of the r products (row total x column total).

• Then

$$\hat{K} = \frac{N \cdot A - B}{N^2 - B}$$

• where N is the number of pixels in the error matrix (the sum of all individual cell values).

For the above error matrix,

– A = 35 + 37 + 41 = 113
– B = (39 * 50) + (50 * 40) + (47 * 46) = 6112
– N = 136

Thus

$$\hat{K} = \frac{136 \cdot 113 - 6112}{136^2 - 6112} = \frac{9256}{12384} \approx 0.747$$

This can be tested statistically.
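The accuracy measures discussed in this section can be reproduced from the diagonal elements and the row and column totals quoted above. The following Python sketch assumes, as stated earlier, that rows hold the remote sensing classes and columns hold the ground truth classes. The function name and the dictionary keys are hypothetical.

```python
def accuracy_measures(diagonal, row_totals, column_totals):
    """Overall, user's and producer's accuracy plus kappa from matrix margins."""
    n = sum(row_totals)                 # N: total number of pixels
    a = sum(diagonal)                   # A: correctly classified pixels
    b = sum(r * c for r, c in zip(row_totals, column_totals))  # B
    return {
        "overall": a / n,
        "user": [d / r for d, r in zip(diagonal, row_totals)],
        "producer": [d / c for d, c in zip(diagonal, column_totals)],
        "kappa": (n * a - b) / (n * n - b),
    }

result = accuracy_measures(
    diagonal=[35, 37, 41],
    row_totals=[39, 50, 47],
    column_totals=[50, 40, 46],
)
print(round(result["overall"], 3))      # 0.831
print(round(result["user"][0], 3))      # 0.897
print(round(result["producer"][0], 3))  # 0.7
print(round(result["kappa"], 3))        # 0.747
```

The overall accuracy of about 83% hides the difference between classes: class A has a user's accuracy near 90% but a producer's accuracy of only 70%.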
