
Remote Sensing and Image Analysis

1.1 Remote Sensing


Remote sensing is a technology used for obtaining information about a target through the
analysis of data acquired from the target at a distance. It is composed of three parts: the
targets (objects or phenomena in an area); the data acquisition (through certain
instruments); and the data analysis (again by certain instruments). This definition is so broad that
the vision system of human eyes, sonar sounding of the sea floor, ultrasound and X-rays
used in medical sciences, and laser probing of atmospheric particles are all included. The
target can be as big as the earth, the moon and other planets, or as small as biological cells
that can only be seen through microscopes. A diagrammatic illustration of the remote
sensing process is shown in Figure 1.1.
An essential component in geomatics, natural resource and environmental studies is the
measurement and mapping of the earth surface - land and water bodies. We are interested
in knowing the types of objects, the quality and quantity of various objects, their
distribution in space and time, their spatial and temporal relationships, etc. In this book, we
introduce some of the major remote sensing systems used for mapping the earth. We
concentrate on examining how satellite and airborne images about the earth are collected,
processed and analyzed. We illustrate various remote sensing techniques for information
extraction about the identity, quantity, spatial and temporal distribution of various targets
of interest.
Remote sensing data acquisition can be conducted on such platforms as aircraft, satellites,
balloons, rockets, space shuttles, etc. Inside or on-board these platforms, we use sensors to
collect data. Sensors include aerial photographic cameras and non-photographic
instruments, such as radiometers, electro-optical scanners, radar systems, etc. The platform
and sensors will be discussed in detail later.
Electromagnetic energy is reflected, transmitted or emitted by the target and recorded by
the sensor. Because this energy travels through the medium of the earth's atmosphere, it is
modified, so that the signal leaving the target differs from the signal arriving at the sensor.
The effects of the atmosphere on remote sensing will be examined later, and methods will
be introduced to reduce such atmospheric effects.
Once image data are acquired, we need methods for interpreting and analyzing images. By
knowing "what" information we expect to derive from remote sensing, we will examine
methods that can be used to obtain the desirable information. We are interested in "how"
various methods of remote sensing data analysis can be used.

In summary, we want to know how electromagnetic energy is recorded as remotely sensed
data, and how such data are transformed into valuable information about the earth surface.

Figure 1.1. The Flows of Energy and Information in Remote Sensing

1.2 Milestones in the History of Remote Sensing


The following is a brief list of the times when innovative developments in remote sensing
were documented. More details may be found in Lillesand and Kiefer (1987) and
Campbell (1987).
1839      Photography was invented.
1858      Parisian photographer Gaspard Felix Tournachon used a balloon to ascend to a height of 80 m and obtain a photograph over Bievre, France.
1882      Kites were used for photography.
1909      Airplanes were used as a platform for photography.
1910-20   World War I. Aerial reconnaissance: beginning of photo interpretation.
1920-50   Aerial photogrammetry was developed.
1934      American Society of Photogrammetry was established. Radar development for military use started.
1940's    Color photography was invented.
1940's    Non-visible portions of the electromagnetic spectrum, mainly near-infrared, came into use; training of photo-interpretation.
1950-1970 Further development of non-visible photography, multi-camera photography, color-infrared photography, and non-photographic sensors. Satellite sensor development - Very High Resolution Radiometer (VHRR); launch of weather satellites such as Nimbus and TIROS.
1962      The term "Remote Sensing" first appeared.
1972      The launch of Landsat-1, originally ERTS-1. Remote sensing has been extensively investigated and applied since then.
1982      Second generation of Landsat sensor: Thematic Mapper.
1986      French SPOT-1 High Resolution Visible sensors. MSS, TM and HRV have been the major sensors for data collection over large areas all over the world. Such data have been widely used in natural resources inventory and mapping. Major areas include agriculture, forest, wetland, mineral exploration, mining, etc.
1980-90   Earth-resources satellites from other countries such as India, Japan, and the USSR. Japan's Marine Observing Satellite (MOS-1).
1986-     A new type of sensor, called an imaging spectrometer, has been developed. Developers: JPL, Moniteq, ITRES and CCRS. Products: AIS, AVIRIS, FLI, CASI, SFSI, etc. A more detailed description of this subject can be found in Staenz (1992).
1990-     Proposed EOS aiming at providing data for global change monitoring; various sensors have been proposed. Japan's JERS-1 SAR, the European ERS remote sensing satellite SAR, and Canada's Radarsat. Radar and imaging spectrometer data will be the major theme of this decade and probably the next decade as well.

1.3 Resolution and Sampling in Remotely Sensed Data

We begin by asking: what are the factors that make remotely sensed images taken of the
same target different? Remotely sensed data record the dynamics of the earth surface. The
three-dimensional earth surface changes over time, so two images taken at the same
place under the same imaging conditions will not be the same if they are obtained at different
times. Among many other factors that will be introduced in later chapters, sensor and
platform design affect the quality of remotely sensed data.


Remote sensing data can be considered as models of the earth surface at a very low level of
generalization. Among the various factors that affect the quality and information content of
remotely sensed data, two concepts are extremely important to understand because they
determine the level of detail of the modeling process: resolution and sampling frequency.

Resolution: the maximum separating or discriminating power of a measurement. It can be
divided into four types: spectral, radiometric, spatial and temporal.

Sampling frequency: determines how frequently data are collected. There are three types
of sampling important to remote sensing: spectral, spatial and temporal.

Combinations of resolutions and sampling frequencies have made it possible for us to
have different types of remote sensing data.

For example, assume that the level of solar energy coming from the sun and passing
through the atmosphere in the spectral region between 0.4 µm and 1.1 µm is distributed as in
Fig. 1.2. This is a continuous curve.

Fig. 1.2 Solar Energy Reaching the Earth Surface

After the solar energy interacts with a target such as a forest on the earth, the energy is
partly absorbed, transmitted, or scattered and reflected. Assume that the level of the
scattered and reflected energy collected by a sensor behaves in a manner as illustrated in
Fig. 1.3.

Fig. 1.3 Reflected Solar Energy by Trees


The process that makes the shape of the energy curve change from Fig. 1.2 to Fig. 1.3
will be discussed later. Let us use Fig. 1.3 to discuss the concepts of spectral resolution
and spectral sampling.

Fig. 1.4 Example of Differences in Spectral Resolution and Spectral Sampling


In Figure 1.4, the three shaded bars A, B, and C represent three spectral bands. The width
of each bar covers a spectral range within which no signal variation can be resolved. The
width of each spectral band represents its spectral resolution. The resolution of A is coarser
than the resolution of B. This is because spectral details within band A that cannot be
discriminated may be partly discriminated with a spectral resolution as narrow as band B.
The resolution relationships among the three bands are:
Resolution of A < Resolution of C < Resolution of B
Sampling determines the various ways we use to record a spectral curve. If data storage is
not an issue, we may choose to sample the entire spectral curve with many narrow spectral

bands. Sometimes, we choose to make a discrete sampling over a spectral curve (Figure
1.4). The questions are: which way of sampling is more appropriate and what resolution is
better? It is obvious that if we use a low resolution, we are going to blur the curve. The
finer the resolution, the more precisely we can restore a curve, provided that a sufficient
spectral sampling frequency is used.
The difference between imaging spectrometers and earlier generation sensors lies in the
spectral sampling frequency. Sensors of earlier generations use selective spectral sampling,
while imaging spectrometers have a complete, systematic sampling scheme over the entire
spectral range. An imaging spectrometer such as CASI has 288 spectral bands in the
0.43 - 0.92 µm spectral region, while earlier generation sensors have only 3 - 7 spectral bands.
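To make the distinction concrete, the following Python sketch (purely illustrative values, not data from any particular sensor) averages a synthetic continuous spectrum over wide and narrow bands, showing how coarse spectral resolution blurs spectral detail while dense narrow-band sampling preserves it.

```python
import numpy as np

# Synthetic "continuous" reflectance spectrum between 0.4 and 1.1 micrometres
# (illustrative shape, not real data).
wavelengths = np.linspace(0.4, 1.1, 701)          # micrometres, ~1 nm spacing
spectrum = 0.3 + 0.2 * np.sin(20 * wavelengths) * np.exp(-(wavelengths - 0.7) ** 2 / 0.05)

def band_average(band_centre, band_width):
    """Average the spectrum over one band; the band width plays the role of
    spectral resolution - nothing inside the band can be resolved."""
    mask = np.abs(wavelengths - band_centre) <= band_width / 2
    return spectrum[mask].mean()

# Coarse, selective sampling: a few wide bands (earlier-generation sensors).
coarse = [band_average(c, 0.10) for c in (0.55, 0.65, 0.85)]

# Fine, contiguous sampling: many narrow bands (imaging-spectrometer style).
fine_centres = np.arange(0.45, 1.05, 0.01)
fine = [band_average(c, 0.01) for c in fine_centres]

print("3 wide-band samples:", np.round(coarse, 3))
print("number of narrow-band samples:", len(fine))
```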

Spatial resolution and sampling


Similar to the spectral case, the surface has to be sampled with certain spatial resolution.
The difference is that spatial sampling is mostly systematic, i.e., a complete sampling over
an area of interest. The difference in spatial resolution can be seen in Figure 1.5.

Figure 1.5. Sampling the same target with different spatial resolutions.

A scene including a house with a garage and driveway is imaged at two different spatial
resolutions. In Figure 1.5a, no object occupies an entire cell, so each cell contains energy
from different cover types. Such cells are called mixed pixels, also known as mixels. In
Chapter 7, we will introduce some methods that can be used to decompose mixed pixels.
Mixed pixels are very difficult to discriminate from each other. Obviously a house cannot
be easily recognized at the level of resolution in Figure 1.5a, but it may be possible in
Figure 1.5b. As spatial resolution becomes finer, more details about objects in a scene
become available. In general, with finer spatial resolutions objects can be better
discriminated by the human eye. With computers, however, it may be harder to recognize
objects imaged with finer spatial resolutions. This is because finer spatial resolutions
increase the image size a computer must handle and, more importantly, for many
computer analysis algorithms they cause the effect of "seeing the tree but not the forest."
Computer techniques are far poorer than the human brain at generalizing from fine details.
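As a rough illustration (a toy two-cover-type "scene" with made-up reflectance values, not real imagery), the sketch below aggregates a fine-resolution grid into coarser cells by block averaging; coarse cells that straddle the boundary between cover types take on intermediate values, i.e. they become mixed pixels.

```python
import numpy as np

# Toy scene: 8 x 8 fine-resolution cells, reflectance 0.05 for "water"
# and 0.40 for "vegetation", split part-way across (illustrative values).
fine = np.full((8, 8), 0.05)
fine[:, 3:] = 0.40

def coarsen(image, factor):
    """Block-average an image to a coarser spatial resolution."""
    h, w = image.shape
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

coarse = coarsen(fine, 4)   # 2 x 2 coarse pixels
print(coarse)
# The right-hand coarse column is pure vegetation (0.40); the left-hand
# column mixes water and vegetation cells and ends up with an intermediate
# value - a "mixed pixel" that matches neither cover type.
```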
Temporal sampling can be regarded as similar to spectral sampling. Temporal sampling
refers to how frequently we image an area of interest. Are we going to use contiguous
systematic sampling, as in movie making, or selective sampling, as in most photographic
actions? To decide the temporal sampling scheme, the dynamic characteristics of the target
under study have to be considered. For instance, if the study objective is to discriminate
crop species, the phenological calendar of each crop type should be considered when
deciding when to collect remotely sensed data in order to best characterize each crop
species. The data could be selected from the entire growing season between late April and
early October for mid and high latitudes in the northern hemisphere. If the subject is flood
monitoring, the temporal sampling frequency should be high during the flood period
because floods usually last only a few hours to a few days.
Radiometric resolution can be understood in a manner similar to spatial resolution. This
concept is well illustrated in a number of digital image processing books (e.g., Gonzalez
and Wintz, 1987; Pratt, 1991). It is associated with the level of quantization of an image,
which is in turn related to how to use the minimum amount of data storage to represent the
maximum amount of information. This is often a concern in data compression. Although
we will explain the concept of radiometric resolution in Chapter 5, we will only touch on
the topic of data compression in Chapter 7, from an information extraction point of view.
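The sketch below (arbitrary radiance values in the range 0 to 1) quantizes the same signal with different numbers of bits, which is essentially what radiometric resolution controls; the 6-bit and 8-bit cases loosely echo the MSS versus TM/HRV comparison made later in Chapter 3.

```python
import numpy as np

def quantize(signal, n_bits, max_value=1.0):
    """Quantize a continuous signal into 2**n_bits discrete levels."""
    levels = 2 ** n_bits
    step = max_value / levels
    return np.clip(np.floor(signal / step), 0, levels - 1).astype(int)

radiance = np.array([0.02, 0.10, 0.11, 0.50, 0.51, 0.98])  # arbitrary values

print("6-bit codes:", quantize(radiance, 6))
print("8-bit codes:", quantize(radiance, 8))
# At 6 bits, the radiances 0.50 and 0.51 receive the same code and can no
# longer be distinguished; at 8 bits they remain separable.
```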
1.4 Use of Remote Sensing

A fundamental use of remote sensing is to extend our visual capability. In addition,
remote sensing can enhance our memory because our brains tend not to remember every
fine detail of what we see. With remote sensing images, we can do a lot more
than refresh our memories, which is a primary goal of conventional photography. We
want to measure and map spatial dimensions of objects from remote sensing images.
Furthermore, we use remotely sensed data to monitor the dynamics of the phenomena on
the earth surface. These include monitoring the vigor and stress of vegetation and
environmental quality, measuring the temperature of various objects, detecting and
identifying catastrophic sites caused by fire, flood, volcano, earthquakes etc., estimating
the mass of various components, such as biogeochemical constituents of a forest, volume

of fish schools in water, crop production of agricultural systems, water storage and runoff
of watersheds, population in rural and urbanized areas, and quantity and living conditions
of wildlife species.

The remaining chapters of this book are organized to help you take greater advantage
of remote sensing in the applications mentioned above. In Chapter two, we will first
introduce the very basic physics required to understand the imaging mechanism in remote
sensing. In Chapter three, we introduce the development of sensing systems following a
historical order. In Chapter four, we introduce imaging geometry and illustrate geometrical
calibration methods that are required to achieve precise measurement of spatial dimensions
of objects. In Chapter five, we explain various methods for recovering image radiometry
affected by sensor malfunctioning, atmospheric interference and terrain relief. In Chapter
six, we illustrate some of the most commonly used image processing methods for image
enhancement. In Chapter seven, we focus on the introduction of various strategies for
information extraction from remotely sensed data. In Chapter eight, following a brief
introduction on map making, we introduce some methods that are used to combine maps
and other spatial data with remotely sensed data for analysis and extraction of information
on various targets.
Chapter 1

References
Campbell, J.B., 1987. Introduction to Remote Sensing. The Guilford Press.
Gonzalez, R.C. and Wintz, P., 1987. Digital Image Processing, 2nd Ed. Addison-Wesley: Reading, MA.
Lillesand, T.M. and Kiefer, R.W., 1987. Remote Sensing and Image Interpretation, 2nd Ed. John Wiley and Sons, Inc.: Toronto.
Emphasis on aerial photography, photogrammetry, photo interpretation, non-photographic sensing systems
and their image interpretation, and introduction to digital image processing.
Staenz, K., 1992. A decade of imaging spectrometry in Canada. Canadian Journal of Remote Sensing, 18(4):187-197.
Lists most of the imaging spectrometers developed worldwide. Sensor calibration and various applications.
Pratt, W., 1991. Digital Image Processing. John Wiley and Sons, Inc.: Toronto.

Further Readings

Asrar, G., ed., 1989. Theory and Applications of Optical Remote Sensing. John Wiley and Sons: Toronto.
A selection of the most important fields of optical remote sensing, ranging from the physical basis of energy-matter
interaction, vegetation canopy modelling, and atmospheric effects reduction, to applications in forestry, agriculture,
coastal wetlands, geology, snow and ice, climatology and meteorology, and ecosystems. Its emphasis is on the
application of remote sensing to understanding land-surface processes globally.
Jensen, J.R., 1986. Digital Image Processing: An Introductory Perspective. Prentice-Hall: Englewood Cliffs, NJ.
A good introductory book on digital image analysis concepts and procedures. A "show-how" type of book, easy
for beginners. Typical topics covered include image statistics, image enhancement in the spatial domain, geometric
correction, classification, and change detection. Completely set in a remote sensing context.
Richards, J.A., 1986. Digital Image Processing. Springer-Verlag: New York.
A good introductory book, more mathematical than Jensen's. Additional material in comparison to
Jensen's book includes an entire chapter on the Fourier transform and the relationships among some basic image
enhancement and image classification algorithms.

2.1 Electromagnetic Energy


Energy is a group of particles travelling through a certain medium. Electromagnetic energy is
a group of particles with different frequencies travelling at the same velocity. These
particles have a dual-mode nature: they are particles, but they travel in a wave form.
Electromagnetic waves obey the following rule:

c = λν

where c is the speed of light, λ is the wavelength and ν is the frequency. This equation
shows that the shorter the wavelength, the higher the frequency.
Electromagnetic energy is a mixture of waves with different frequencies. Each wave
represents a group of particles with the same frequency; together, the waves have
different frequencies and magnitudes.

With each wave, there is an electric (E) component and a magnetic (M) component.
The amplitude (A) reflects the level of the electromagnetic energy; it may also be
considered as intensity or spectral irradiance. If we plot A against wavelength, we
get an electromagnetic curve, or spectrum (Figure 2.1).

Figure 2.1. An electromagnetic spectrum

Any matter with a temperature greater than 0 K emits electromagnetic energy and
therefore has a spectrum. Furthermore, different chemical elements have different
spectra: they absorb and reflect spectral energy differently. Elements combine to form
compounds, and each compound has a unique spectrum due to its unique molecular
structure. This is the basis for the application of spectroscopy to identify chemical
materials. It is also the basis on which remote sensing discriminates one material from
another. The spectrum of a material is like a human fingerprint.

2.2 Major Divisions of Spectral Wavelength Regions

The wavelength of electromagnetic energy has such a wide range that no instrument can
measure it completely. Different devices, however, can measure most of the major spectral
regions.
The division of the spectral wavelength is based on the devices which can be used to
observe particular types of energy, such as thermal, shortwave infrared and microwave
energy. In reality, there are no abrupt changes in the magnitude of the spectral energy.
The spectrum is conventionally divided into various parts as shown below:

The optical region covers 0.3 - 15 µm, where energy can be collected through lenses. The
reflective region, 0.4 - 3.0 µm, is a subdivision of the optical region. In this spectral
region, we collect solar energy reflected by the earth surface. Another subdivision of the
optical spectral region is the thermal spectral range, between 3 µm and 15 µm,
where energy comes primarily from surface emittance. Table 2.1 lists major uses of some
spectral wavelength regions.
Table 2.1. Major uses of some spectral wavelength regions

Wavelength          Use
Gamma ray           Mineral
X ray               Medical
Ultraviolet (UV)    Detecting oil spills
0.4 - 0.45 µm       Water depth, turbidity
0.7 - 1.1 µm        Vegetation vigor
1.55 - 1.75 µm      Water content in plant or soil
2.04 - 2.34 µm      Mineral, rock types
10.5 - 12.5 µm      Surface temperature
3 cm - 15 cm        Surface relief, soil moisture
20 cm - 1 m         Canopy penetration, woody biomass

2.3 Radiation Laws


In the reflective spectral region, we are mainly concerned with the reflective properties of
an object, but in the thermal spectral region we have to rely on the emittance of an object.
This is because most matter at ordinary environmental temperatures emits energy that can
be measured. Therefore, we introduce some basics of radiation theory.

The first theory treats electromagnetic radiation as many discrete particles called photons
or quanta. The energy of a quantum is given by

E = hν

where
E is the energy of a quantum (joules)
h = 6.626 x 10^-34 J s (Planck's constant)
ν is the frequency
Since c = λν, it follows that

E = hc / λ
The energy (or radiation) of a quantum is inversely proportional to its wavelength: the longer
the wavelength of a quantum, the smaller its energy (and the shorter the wavelength, the
stronger its energy). Thus, energy at very short wavelengths (UV and shorter) is
dangerous to human health. If we want to sense emittance from objects at longer
wavelengths, we will have to either use very sensitive devices or use a less sensitive device
that views a larger area to collect a sufficient amount of energy.
This has implications for remote sensing sensor design. To use the sensing
technology at hand, we have to balance between wavelength and spatial resolution. If
we wish our sensor to have higher spatial resolution, we may have to use shorter
wavelength regions.
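A small numeric check of E = hc/λ (standard physical constants; the wavelengths below are just example values) shows how much weaker an individual thermal-infrared photon is than a visible one, which underlies the trade-off described above.

```python
# Energy of a single photon, E = h * c / wavelength.
PLANCK = 6.626e-34       # J s
LIGHT_SPEED = 2.998e8    # m / s

def photon_energy(wavelength_m):
    """Return the energy (J) of one photon at the given wavelength (m)."""
    return PLANCK * LIGHT_SPEED / wavelength_m

for name, wl in [("blue, 0.45 um", 0.45e-6),
                 ("near-IR, 0.85 um", 0.85e-6),
                 ("thermal IR, 10 um", 10e-6)]:
    print(f"{name}: {photon_energy(wl):.2e} J")
# A 10 um thermal photon carries roughly 1/20 of the energy of a blue photon,
# so a thermal sensor needs a larger instantaneous field of view (or a more
# sensitive detector) to collect a usable signal.
```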
The second radiation theory is the Stefan-Boltzmann law:

M = σ T^4

where
M is the total radiant exitance from the surface of a material (W/m^2)
σ is the Stefan-Boltzmann constant, 5.6697 x 10^-8 W m^-2 K^-4
T is the absolute temperature (K)
This means that any material with a temperature greater than 0 K will emit energy. The
total energy emitted from a surface is proportional to T^4.

This law is expressed for an energy source that behaves as a blackbody - a hypothetical,
ideal radiator that absorbs and re-emits all energy incident upon it. Actual materials are not
perfect blackbodies. For any material, we can measure its emitted energy (M) and compare it
with the energy emitted by a blackbody at the same temperature (Mb):

ε = M / Mb

where ε is the emissivity of the material. A perfect reflector has nothing to emit, so its ε is 0.
A true blackbody has an ε of 1. Most other materials fall between these two extremes.
The third theory is Wien's displacement law, which specifies the relationship between the
wavelength of peak emittance and the temperature of a material:

λmax = 2897.8 / T

where λmax is in micrometres and T is the absolute temperature in kelvin. As the temperature
of a blackbody gets higher, the wavelength at which it emits its maximum energy becomes shorter.

Figure 2.2. Blackbody radiation.

Figure 2.2 shows blackbody radiation curves for temperatures corresponding to the Sun, an
incandescent lamp and the Earth. During the daytime, the energy from the sun is
overwhelming. During the night, however, we can use the spectral region between 3 µm
and 16 µm to observe the emittance properties of the earth surface.
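A minimal sketch applying the Stefan-Boltzmann law and Wien's displacement law to blackbodies at roughly the Sun's and the Earth's temperatures (the temperatures are approximate assumptions, not values taken from the figure):

```python
# Blackbody quantities for approximate Sun and Earth temperatures.
SIGMA = 5.6697e-8     # Stefan-Boltzmann constant, W m^-2 K^-4
WIEN_A = 2897.8       # Wien's constant, um K

def total_exitance(temp_k):
    """Total radiant exitance M = sigma * T**4 (W m^-2)."""
    return SIGMA * temp_k ** 4

def peak_wavelength(temp_k):
    """Wavelength of maximum emission, lambda_max = 2897.8 / T (um)."""
    return WIEN_A / temp_k

for body, temp in [("Sun (~6000 K)", 6000.0), ("Earth (~300 K)", 300.0)]:
    print(f"{body}: M = {total_exitance(temp):.3e} W/m^2, "
          f"peak at {peak_wavelength(temp):.2f} um")
# The Sun peaks near 0.48 um (visible); the Earth peaks near 9.7 um, which is
# why the 3 - 16 um region is used to observe surface emittance at night.
```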

At wavelengths longer than the thermal infrared region, i.e. in the microwave region, the
energy (radiation) level is very low. Therefore, we often use a human-made energy source to
illuminate the target (as with radar) and collect the backscatter from the target. A
remote sensing system relying on a human-made energy source is called an "active" remote
sensing system. Remote sensing relying on energy sources that are not human-made is
called "passive" remote sensing.
2.4 Energy Interactions in the Atmosphere
The atmosphere has different effects on EM energy transfer at different wavelengths. In this
section, we mainly introduce the fact that the atmosphere can have a profound effect
on the intensity and spectral composition of the radiation that reaches a remote sensing system.
These effects are caused primarily by atmospheric scattering and absorption.
Scattering: the redirection of EM energy by particles suspended in the air.

Different particle sizes have different effects on EM energy propagation. Denoting the particle
diameter by dp and the wavelength by λ:

dp << λ : Rayleigh scattering (Sr)
dp ≈ λ  : Mie scattering (Sm)
dp >> λ : non-selective scattering (Sn)

The atmosphere can be divided into a number of well-marked horizontal layers on the basis
of temperature.
Troposphere: the zone where weather phenomena and atmospheric turbulence are most marked.
It contains 75% of the total molecular and gaseous mass of the atmosphere and virtually all
the water vapour and aerosols. Its height ranges from about 8 km at the poles to 16 km at the equator.
Stratosphere: up to about 50 km; contains the ozone layer.
Mesosphere: up to about 80 km.
Thermosphere: up to about 250 km.
Exosphere: about 500 - 750 km.
The atmosphere is a mixture of gases with nearly constant proportions up to 80 km or more
above the ground. The exceptions are ozone, which is concentrated in the lower stratosphere,
and water vapour in the lower troposphere. Carbon dioxide is a principal atmospheric gas whose
concentration varies with time; it has been increasing since the beginning of this century due
to the burning of fossil fuels. Air is highly compressible: half of its mass occurs in the
lowest 5 km, and pressure decreases logarithmically with height from an average sea-level
value of 1013 mb.

Figure 2.3 Horizontal layers that divide the atmosphere (Barry and Chorley, 1982)

Scattering degrades image quality for earth observation. At higher altitudes,
images acquired at shorter wavelengths (ultraviolet, blue) contain a large amount of
scattered noise, which reduces the contrast of an image.

Absorption: the atmosphere selectively absorbs energy at different wavelengths with
different intensities.
The atmosphere is composed of N2 (78%), O2 (21%), CO2, H2O, CO, SO2, etc. Since
each chemical constituent has different spectral properties, each absorbs energy in different
spectral regions with different intensities. As a result, the atmosphere has the combined
absorption features of the various atmospheric gases. Figure 2.4 shows the major absorption
wavelengths of CO2, H2O, O2 and O3 in the atmosphere.

Figure 2.4 Major absorption wavelengths by CO2, H2O, O2, O3 in the atmosphere

(Source: Lillesand and Kiefer, 1994)

Transmission: the energy remaining after absorption and scattering by the atmosphere is
transmitted.

H2O is the most variable constituent of the atmosphere, and CO2 varies seasonally.
Therefore, the absorption of EM energy by H2O and CO2 is the most difficult part to
characterize.

Atmospheric Window: a relatively transparent wavelength region of the atmosphere.

Atmospheric absorption reduces the number of spectral regions in which we can observe
the Earth. It affects our decisions in selecting and designing sensors. We have to consider:
1) the spectral sensitivity of the sensors available;
2) the presence and absence of atmospheric windows;
3) the source, magnitude, and spectral composition of the energy available in these ranges.
For the third point, we have to base our choice of sensors and spectral regions
on the manner in which the energy interacts with the target under investigation.
On the other hand, although certain spectral regions may not be as transparent as others,
they may be important spectral ranges for remote sensing of the atmosphere itself.
2.5 Energy Interactions with the Earth Surface

What happens when EM energy reaches the Earth surface? The answer is that the
total energy is split into three parts: reflected, absorbed, and/or transmitted energy.
Denoting the reflected, absorbed and transmitted fractions by r, a and t, respectively:

r + a + t = 1

r, a and t change from one material to another.


t has been mentioned earlier: solar energy has to be transmitted through the atmosphere in
order to reach the Earth surface. Transmitted energy can also be measured from under water.
In the thermal spectral region, energy is primarily absorbed, and the reflected energy is
significantly smaller in magnitude than the emission of a target. Since what is absorbed will be
emitted, the absorptance a, or the emissivity ε, is the parameter of concern in the thermal
region.
r is the easiest to measure using remote sensing devices. Therefore, it is the most important
parameter for remote sensing observations in the 0.3 - 2.5 µm region. r is called spectral
reflectance, or simply reflectance, or the spectral signature.
Our second question is: how is energy reflected by a target? Reflection can be classified into
three cases: specular reflection, irregular reflection, and perfect diffusion.

Specular reflection is caused by the surface geometry of a material. It is of little use in remote
sensing because the incoming energy is completely reflected in another direction. Still
water, ice and many minerals with crystal surfaces behave in this way.
A perfect diffuse reflector is a material that reflects energy uniformly in all directions.
This type of reflector is desirable because it is possible to observe the material from any
direction and obtain the same reflectance.
Unfortunately, most targets behave somewhere between the ideal specular reflector and the
perfect diffuse reflector. This makes quantitative remote sensing and target identification purely
from reflectance data difficult; otherwise, it would be easy to discriminate objects using
spectral reflectances from a spectral library. Because of this variability of spectral signatures,
one current research direction is to investigate the bidirectional reflectance properties of
various targets.
Plotting reflectance against wavelength gives a spectral reflectance curve. Examples
of spectral curves of typical materials such as vegetation, soil and water are shown in
Figure 2.5. Clear water has a low spectral reflectance (< 10%) in the visible region, and at
wavelengths longer than 0.75 µm water absorbs almost all the incoming energy.
Vegetation generally has three reflectance valleys. The one in the red spectral wavelength
region (0.65 µm) is caused by strong absorption of energy by chlorophyll a and b in the
leaves. The other two, at 1.45 - 1.55 µm and 1.90 - 1.95 µm, are caused by strong absorption of
energy by water in the leaves. Dry soil has a relatively flat reflectance curve; when it is
wet, its spectral reflectance drops due to water absorption.

Figure 2.5 Typical Spectral Reflectance Curves for Soil, Vegetation and Water
(Lillesand and Kiefer, 1994)
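The toy sketch below uses rough, eyeballed reflectance values in the spirit of Figure 2.5 (illustrative numbers only, not measured data) to show why a near-infrared band separates water, vegetation and dry soil far better than a visible band alone.

```python
# Approximate reflectances (fractions) for three cover types in three bands;
# these numbers are illustrative only.
reflectance = {
    #            green(0.55um)  red(0.65um)  near-IR(0.85um)
    "water":      (0.06,         0.03,        0.01),
    "vegetation": (0.12,         0.05,        0.45),
    "dry soil":   (0.15,         0.20,        0.30),
}

def separation(band_index):
    """Spread of the three cover types in one band (max - min reflectance)."""
    values = [r[band_index] for r in reflectance.values()]
    return max(values) - min(values)

for i, band in enumerate(["green", "red", "near-IR"]):
    print(f"{band}: spread = {separation(i):.2f}")
# The near-IR spread is several times larger than either visible-band spread,
# which is why vegetation mapping leans heavily on near-infrared data.
```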

Questions
1. Using the scattering properties of the atmosphere, explain why the sky is blue under
clear-sky conditions. Why does the sun look red at sunset or sunrise?
2. Why are X-rays used for medical examination? Using radiation law No. 3, explain why,
as a piece of iron is heated, its colour begins dark red, then changes to red, to yellow,
and to white.
3. Describe how one may use the absorptance and transmittance of a material in remote sensing.
4. Using Figures 2.4 and 2.5 as references, answer the following questions:
5. Can we use 6 - 7 µm to observe the atmosphere?
6. Can we use 0.8 - 1.0 µm to observe underwater materials such as plankton?
7. Which spectral regions should be used to observe water content in the atmosphere?
What about water content in vegetation?
Chapter 2

References
Barry and Chorley, 1982. Climate, Weather and Atmosphere. Longman: London.
Elachi, C., 1987. Introduction to the Physics and Techniques of Remote Sensing. John Wiley and Sons, Inc.: Toronto.
Lillesand, T.M. and Kiefer, R.W., 1994. Remote Sensing and Image Interpretation, 3rd Ed. John Wiley and Sons,
Inc.: Toronto.

3.1 Camera Systems

A camera system is composed of the camera body, lens, diaphragm, shutter and a film
(Figure 3.1):

Figure 3.1. Components of a camera system


Lens: collects the energy; it has a focal length.

Diaphragm: controls the amount of energy reaching the film through an adjustable diameter.

Shutter: opens and closes; the time period between opening and closing controls the amount
of energy entering the camera.

The principle of imaging is described by the lens equation:

1/f = 1/do + 1/di

where
f = focal length
do = distance from the lens to the object
di = distance from the lens to the image (Figure 3.2)
For aerial photography, do >> di, so 1/do is negligible and di ≈ f. Therefore, the distance
between the lens and the film is fixed.

Figure 3.2. The imaging optics

The diameter of the diaphragm controls the depth of field. The smaller the diameter of the
opened diaphragm, the wider the distance range within which the scene forms a clearly
focused image. The diaphragm diameter can be adjusted to a particular aperture. What we
normally see as a camera's aperture settings is F 2.8, 4, 5.6, 8, 11, 16, 22.
These F-numbers are obtained as the focal length divided by the aperture diameter
(F = f/diameter). When the diameter becomes smaller, the F-number becomes larger
and more energy is stopped. The actual amount of energy reaching the film is determined by:

E = i t / F^2

where
i is the energy intensity (J/m^2/s)
t is the exposure time (s)
F is the F-number mentioned above
E is the energy reaching the film (J/m^2)
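A short numeric sketch of the exposure relation as written above, E = i·t/F² (the intensity and exposure time below are illustrative values):

```python
# Film exposure E = i * t / F**2, using the variables defined above.
def exposure(intensity, time_s, f_number):
    """Energy reaching the film (J/m^2) for a given intensity (J/m^2/s),
    exposure time (s) and aperture F-number."""
    return intensity * time_s / f_number ** 2

i, t = 100.0, 1.0 / 250.0          # illustrative values
for f in (2.8, 4, 5.6, 8, 11, 16, 22):
    print(f"F{f}: E = {exposure(i, t, f):.4f} J/m^2")
# Each step in the standard F-stop series multiplies the F-number by about
# sqrt(2), so the exposure is roughly halved from one stop to the next.
```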

Films
A film is primarily composed of an emulsion layer(s) and base (Figure 3.3)

Black and white film

Color film

Figure 3.3. Layers in black and white films and colour films

The most important part of the film is the emulsion layer, which contains light-sensitive
chemicals. When it is exposed to light, a chemical reaction occurs and a latent image is
formed. After the film is developed, the emulsion layer shows the image.
Films can be divided into negative and positive, or divided in terms of their ranges of
spectral sensitivity: black and white (B/W), B/W infrared, color, and color infrared.
On a developed B/W negative film, the brightest parts of the scene appear darkest,
while the darker parts of the scene appear brighter.
Color negative films are those on which a color from the scene is recorded as its
complementary color.
There are two important aspects of a film: its spectral sensitivity and its characteristic
curve.
Spectral sensitivity specifies the spectral region to which a film is sensitive (Figure 3.4).

Figure 3.4. The sensitivity of films and the transmittance of a filter

Since infrared film is also sensitive to visible light, the visible light must be intercepted by
some material. This is done by optical filtering (Figure 3.4); in this case, a dark red filter
can be used to intercept visible light.
Similarly, other filters can be used to stop light in certain spectral ranges from reaching the
film.
Characteristic curve indicates the radiative response of a film to the energy level.

[a]

[b]

Figure 3.5. Film characteristic curves.

If the density of a film develops quickly when the film is exposed to light, we say that the
film is fast (Fig. 3.5a); otherwise, the film is slow (Fig. 3.5b). Film speed is labelled
as ASA 100, ASA 200, ..., ASA 1000. The greater the ASA number, the faster
the film. High-speed films give good contrast on the image, but low-speed films
provide better detail.

Color Films
There are two sets of primary colors: additive primaries and subtractive primaries.
The three additive primaries are red, green, and blue.
The three subtractive primaries are cyan, magenta, and yellow.
Other colors can be produced by combining primary colors.

Figure 3.6. Additive and subtractive colors

Additive colors apply to the mixing of light (Fig. 3.6a), while subtractive colors are used
for the mixing of paints, as in printing (Fig. 3.6b). In order to represent colors on a
medium such as film or color photographic paper, subtractive colors are needed.

Color Negative Films


Figure 3.7 shows the structure of the three emulsion layers for a color negative film. Figure
3.8 shows the spectral sensitivities of each emulsion layer of the film. It can be seen from
Figure 3.8 that the green and red emulsion layers are also sensitive to blue. Therefore, film
producers add a yellow filter to stop blue light from reaching the green and red
emulsion layers (Figure 3.7).

Figure 3.7. Layers in a standard colour film

Figure 3.8. The approximate spectral sensitivities of the three layers.

The development procedure for the colour negative film is shown in Figure 3.9.

Figure 3.9. The development procedure of a colour negative film

Color Infrared Films

The sensitivity curve of a colour infrared film is shown in Figure 3.10. Figure 3.11 shows
the structure of this type of film.

Figure 3.10. The sensitivity curves of a colour infrared film

Figure 3.11. Three layers of a colour-infrared film: a cyan dye forming layer (sensitive to
NIR + B), a yellow dye forming layer (sensitive to G + B), and a magenta dye forming
layer (sensitive to R + B)

The CIR film development process is illustrated in Figure 3.12.

Figure 3.12. The procedures used to develop a colour-infrared film

3.2 Aerial Photography

In texts on aerial photogrammetry or photo-interpretation, various types of aerial
photographic cameras are discussed in detail (e.g., Lillesand and Kiefer, 1994; Paine,
1981). The implications of flight height, photographic orientation, and view angle for
aerial photographic products are briefly discussed here.
Flight Height
For a given focal length of an aerial camera, the higher the camera is, the larger the area
each aerial photo can cover. Obviously, the scale of aerial photographs taken at higher
altitudes will be smaller than those taken at lower altitudes.
However, photographs taken at higher altitudes will be severely affected by the
atmosphere. This is particularly true when films sensitive to shorter wavelengths are used.
Thus, ultraviolet and blue should be avoided at higher altitudes. Instead, CIR or BWIR is
more suitable.
Camera Orientation
Two types of camera orientation may be used: vertical and oblique (slant) (Figure 3.13).
Oblique photography allows one to image a larger area, while vertical photography gives
less distortion in photo scale.

Figure 3.13. Vertical and slant aerial photography

View Angle:
View angle is normally determined by the focal length and the frame size of a film. For a
camera, the frame is fixed, therefore the ground coverage is determined by the altitude and
the camera viewing angle (Figure 3.14)

f1 > f2 > f3
a1 < a2 < a3
Figure 3.14. Viewing angle determined by the focal length
Typical focal lengths in normal cameras and aerial cameras are:

Lens type       Normal camera   Aerial camera
Normal lens     50 mm           300 mm
Wide angle      28 mm           150 mm
Fisheye lens    7 mm            88 mm
Obviously, wide angles allow a larger area to be photographed.


Photographic Resolution

Spatial resolution of aerial photographs is largely dependent on the following factors:


Lens resolution - optical quality
Film resolution - depends mainly on granularity
Film flatness - normally not a problem
Atmospheric conditions - changing all the time
Aircraft vibration and motion - random
The standard definition of photographic resolution is the maximum number of line pairs
per mm that can be distinguished on a film imaging a resolution target (Figure 3.15).
If the scale of an aerial photograph is known, we can convert the photographic resolution
(rs) to a ground resolution.

Figure 3.15. Resolving power test chart (from Lillesand and Kiefer, 1994).
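The conversion itself is not spelled out above, so the sketch below assumes the usual relation: the ground resolution, in metres per line pair, is the photo scale denominator divided by the system resolution in line pairs per millimetre, converted from millimetres to metres (the scale and resolution values used are hypothetical).

```python
def ground_resolution_m(photo_resolution_lp_mm, scale_denominator):
    """Approximate ground resolution (m per line pair) for a photo at scale
    1:scale_denominator whose film/system resolution is
    photo_resolution_lp_mm line pairs per millimetre."""
    ground_mm_per_line_pair = scale_denominator / photo_resolution_lp_mm
    return ground_mm_per_line_pair / 1000.0

# Example: 1:20,000 photography resolved at 40 line pairs per mm
# (hypothetical numbers).
print(ground_resolution_m(40, 20000))   # -> 0.5 m per line pair on the ground
```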

Ground Coverage
A photograph has a small coverage if it is taken either at a low flight height or with a
narrow viewing angle.
The advantages of photographs with small coverage are that they provide more detail
and less distortion and displacement. It is easier to analyze a photograph with a small
coverage because similar targets have less distortion from the centre to the edge of the
photograph, and from one photograph to another.
The disadvantages of photographs with small coverage are that more flight time is needed
to cover an area, so the cost is higher, and mosaicking may introduce more distortion.
A large coverage can be obtained by taking the photograph from a higher altitude or using
a wider angle. Photographs with a large coverage are likely to have poorer photographic
resolution due to the larger viewing angle and a likely stronger atmospheric effect.
The advantages are that a large area is imaged simultaneously, less geometric mosaicking
is required, and the cost is lower.
The disadvantages are that it is difficult to analyze targets in detail and that targets are
more severely distorted.
Essentially, the size of photo coverage is related to the scale of the raw aerial photographs.
Choosing photographs with a large coverage or a small one should be based on the
following:
budget at hand
task
equipment available
The following are some of the advantages/disadvantages of aerial photography in
comparison with other types of data acquisition systems:
Advantages:
High resolution (ground)
Flexibility
High geometric reliability
Relatively inexpensive
Disadvantages:
Daylight exposure (10:00 am - 2:00 pm) required
Poorer contrast at shorter wavelengths
Film not reusable
Inconvenient
Inefficient for digital analysis

3.3 Satellite-Borne Multispectral Systems

What are the differences between a camera system and a scanning system? The major
differences are:
A rotating mirror is added in front of the lens of the camera.
In a scanning system, film is replaced by photo-sensitive detectors and magnetic tapes,
which are used to store the collected spectral energy (Figure 3.16).

Figure 3.16. A multispectral scanning system

Landsat Multispectral Scanner System


The first of the Landsat series was launched in 1972. The satellite was called the Earth
Resources Technology Satellite (ERTS-1) and was later renamed Landsat-1. On board
Landsat-1 were two sensing systems: the multispectral scanning system (MSS) and the return
beam vidicon (RBV). The RBV was discontinued after Landsat-3. The MSS is briefly introduced
here because it is still being used. The MSS sensor has 6 detectors per band (Figure 3.17),
and the scanned radiance is measured in four image bands (Figure 3.18).

Figure 3.17. Each scan will collect six image lines.

Figure 3.18. Four image bands with six detectors in each band.

MSS sensors have been used on Landsats 1, 2, 3, 4 and 5. They are reliable systems. The
spectral region of each band is listed below:

Landsat 1, 2    Spectral region    Landsat 4, 5
B4              0.5 - 0.6 µm       B1
B5              0.6 - 0.7 µm       B2
B6              0.7 - 0.8 µm       B3
B7              0.8 - 1.1 µm       B4
Landsat 3 had a short life, and its MSS system was modified compared with those of Landsats
1 and 2. Landsat-6 was launched unsuccessfully in 1993.
Each MSS scene covers an area of 185 km x 185 km and has a spatial resolution of 79 m x
57 m. An advantage of MSS data is that they are less expensive. Sometimes one detector
fails or its signal is much different from the others, creating banding or striping. We will
discuss methods for correcting these problems in Chapter 5.

Landsat Thematic Mapper System


Since the launch of Landsat 4 in 1982, a new type of scanner, called the Thematic Mapper
(TM), has been introduced. Compared with the MSS, the TM:

Increased the number of spectral bands.
Improved spatial and spectral resolution.
Increased the angle of view from 11.56° to 14.92°.
Band    Spectral region    Spatial resolution
TM1     0.45 - 0.52 µm     30 m
TM2     0.52 - 0.60 µm     30 m
TM3     0.63 - 0.69 µm     30 m
TM4     0.76 - 0.90 µm     30 m
TM5     1.55 - 1.75 µm     30 m
TM7     2.08 - 2.35 µm     30 m
TM6     10.4 - 12.5 µm     120 m

MSS data are collected in only one scanning direction, while TM data are collected in both
scanning directions (Figure 3.19).

Figure 3.19. Major changes of the TM system as compared to the MSS system.

High Resolution Visible (HRV) Sensors


A French satellite called 'Le Système Pour l'Observation de la Terre' (SPOT, Earth
Observation System) was launched in 1986. On board this satellite, a different type of
sensor, the High Resolution Visible (HRV), is used. The HRV sensors have two
modes: the panchromatic (PAN) mode and the multispectral (XS) mode.
The HRV panchromatic sensor has a relatively wide spectral range, 0.51 - 0.73 µm, with a
higher spatial resolution of 10 m x 10 m.
HRV Multispectral (XS) mode
B1    0.50 - 0.59 µm
B2    0.61 - 0.68 µm
B3    0.79 - 0.89 µm
The spatial resolution for the multispectral (XS) mode is 20 m x 20 m.


Besides the differences in spectral and spatial resolution design from the Landsat sensor
systems, the major differences between MSS/TM and HRV are the use of linear array (also
called pushbroom) detectors and the off-nadir observation capability of the HRV
sensors (Figure 3.20). Instead of the rotating mirror of the MSS or TM sensors, which
collect data using only a few detectors, the SPOT HRV sensors use thousands of detectors
arranged in arrays called "charge-coupled devices" (CCDs). This has significantly reduced
the weight and power requirements of the sensing system.

Figure 3.20. The SPOT HRV systems

A mirror with a view angle of 4.13° can be steered to allow observations up to 27° off nadir.
An advantage of the off-nadir viewing capability is that it allows more frequent observation
of a targeted area on the earth and the acquisition of stereo-pair images. A disadvantage
of the HRV sensors is the difficulty involved in calibrating thousands of detectors. The
radiometric resolution of the MSS is 6 to 7 bits, while both TM and HRV have an 8-bit
radiometric resolution.
The orbital cycle is 18 days for Landsats 1 - 3, 16 days for Landsats 4 and 5, and 26 days
for SPOT-1 (the SPOT HRV sensors can revisit the same target in 3 to 5 days owing to
their off-nadir observing capability).

AVHRR - Advanced Very High Resolution Radiometer


Among the many meteorological satellites, the Advanced Very High Resolution Radiometers
(AVHRR) on board the NOAA series (NOAA-6 through 12) have been widely used.
The NOAA series is named after the National Oceanic and Atmospheric Administration of
the United States.
The AVHRR sensor has 5 spectral channels:
B1    0.58 - 0.68 µm
B2    0.72 - 1.10 µm
B3    3.55 - 3.95 µm
B4    10.3 - 11.3 µm
B5    11.5 - 12.5 µm
The swath width is 2400 km.

The orbit provides coverage twice daily, an important feature for frequent monitoring.
NOAA AVHRR data have been used for large-scale vegetation and sea-ice studies at
continental and global scales.

Earth Observing System (EOS)


To document and understand global change, NASA initiated the Mission to Planet Earth, a
program involving international efforts to measure the Earth from space and from the ground.
The Earth Observing System (EOS) is a primary component of the Mission to Planet Earth. EOS
includes the launch of a series of satellites with advanced sensor systems by the end of this
century. Those sensors will be used to measure most of the measurable aspects of the land,
ocean and atmosphere, such as cloud, snow, ice, temperature, land productivity, ocean
productivity, ocean circulation, atmospheric chemistry, etc.
Among the various sensors on board the first six satellites to be launched is the Moderate
Resolution Imaging Spectrometer (MODIS). It has 36 narrow spectral bands with bandwidths
between 10 and 360 nm. The spatial resolution changes with the spectral band: two bands have
250 m resolution, 5 have 500 m, and the rest have 1000 m. The sensor is planned to provide
data covering the entire Earth daily.

Other Satellite Sensors


GOES - Geostationary Operational Environmental Satellite (visible to NIR, thermal).
DMSP - Defense Meteorological Satellite Program, 600 m resolution (visible to NIR,
thermal); used, for example, for urban heat island studies.
Nimbus CZCS - coastal zone color scanner, 825 m spatial resolution, 6 channels:
Channels 1 - 4: 0.02 µm spectral resolution, for chlorophyll absorption studies
Channels 5 - 6: NIR - thermal
Two private companies, Lockheed, Inc. and Worldview, Inc., are planning to launch their
own commercial satellites within 2-3 years, with spatial resolutions ranging from 1 m to 3
m. In Japan, NASDA (the National Space Development Agency) has developed the Marine
Observation System (MOS). On board this system is a sensor called the Multispectral
Electronic Self-scanning Radiometer (MESSR), with spectral bands similar to those of the
Landsat MSS systems. However, the spatial resolution of the MESSR system is 50 x 50 m2.
Other countries such as India and the former USSR have also launched Earth resources
satellites with different optical sensors.
3.4 Airborne Multispectral Systems
Multispectral scanners

The mechanism of airborne multispectral sensors is similar to that of the Landsat MSS and TM.
Airborne sensor systems usually have more spectral bands, ranging from the ultraviolet
through the visible and near-infrared to the thermal region. For example, the Daedalus MSS is a
widely used system that has 11 channels, with the first 10 channels ranging from 0.38 to
1.06 µm and the 11th a thermal channel (9.75 - 12.25 µm).

Another airborne multispectral scanner used for experimental purposes is TIMS, the
Thermal Infrared Multispectral Scanner. It has 6 channels: 8.2 - 8.6, 8.6 - 9.0, 9.0 - 9.4,
9.4 - 10.2, 10.2 - 11.2 and 11.2 - 12.2 µm.

MEIS-II
The Canada Centre for Remote Sensing developed the Multispectral Electro-optical Imaging
Scanner (MEIS-II). It uses 1728-element linear CCD arrays that acquire data in eight
spectral bands ranging from 0.39 to 1.1 µm. The spatial resolution of MEIS-II can reach
0.3 m.
Advantages of multispectral scanner (MSS) systems over photographic systems are:

Spectral range: photographic systems operate between 0.3 and 1.2 µm, while
multispectral systems operate between 0.3 and 14 µm.
Multiband photography (a photographic system) uses different optical systems to
acquire photos, which leads to problems of data incomparability among different
cameras. An MSS, on the other hand, uses the same optical system for all bands,
eliminating the data incomparability problem.
The electronic process used in an MSS is easier to calibrate than the photochemical process
used in photographic systems.
Data transmission is easier for an MSS than for photographic systems, which require
an onboard supply of film.
Visual interpretation is used for photographic systems versus digital analysis for MSS
systems; visual analysis is difficult when the data have more than three dimensions (bands).

False Color Composite


Only three colours (red, green and blue) can be used at a time to display data on a colour
monitor. The colours used to display an image may not be the actual colours of the spectral
bands used to acquire the image. Images displayed with such colour combinations are
called false colour composites. We can make many 3-band combinations out of a
multispectral image:

Nc = nb! / (3! (nb - 3)!)

where Nc is the total number of 3-band combinations and nb is the number of spectral
bands in the multispectral image. For each of these 3-band combinations, we can use red,
green, and blue to represent each band and obtain a false-colour image.
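As a quick numeric check of the combination count above (band counts taken from the sensors described in this chapter):

```python
from math import comb

# Number of distinct 3-band false-colour composites from an nb-band image.
for sensor, nb in [("MSS", 4), ("TM", 7), ("CASI spectral mode", 288)]:
    print(f"{sensor}: {comb(nb, 3)} possible 3-band combinations")
# MSS: 4, TM: 35, CASI: 3,939,936 - and each combination can additionally be
# assigned to red, green and blue in six different orders.
```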

Digital Photography with CCD Arrays


Videographic imaging includes the use of video cameras and digital CCD cameras. Video
images can be frame-grabbed, or quantized and stored as digital images; however, the
image resolution is relatively low (up to 550 lines per image). Digital CCD cameras use
two-dimensional silicon-based charge-coupled devices that produce a digital image in standard
raster format. CCD detectors arranged in imaging chips of approximately 1024 x 1024 or
more photosites produce an 8-bit image (King, 1992).
Digital CCD photography compares favorably with other technologies such as traditional
photography, videography, and line scanning. Compared with photography, digital CCD
cameras have a linear response, greater radiometric sensitivity, wider spectral response,
greater geometric stability, and no need for a film supply (Lenz and Fritsch, 1990; King,
1992). Combined with the fast development of softcopy photogrammetry, they have the
potential to replace aerial photography and photogrammetry for surveying and mapping.

Imaging Spectrometry
Imaging spectrometry refers to the acquisition of images in many, very narrow, continuous
spectral bands.
The spectral region can range from visible, near-IR to mid-IR.

The first imaging spectrometer was developed in 1983 by JPL. The system, called the
Airborne Imaging Spectrometer (AIS), collects data in 128 channels from 1.2 to
2.4 µm. Each image acquired has only 32 pixels per line.
The Airborne Visible-Infrared Imaging Spectrometer (AVIRIS) is an
immediate follow-up of the AIS (1987). It collects 224 bands from 0.40 to 2.45 µm
with 512 pixels in each line.
In Canada, the first system was the FLI (Fluorescence Line Imager), manufactured
by Moniteq, a company that used to be located in Toronto, Ontario.

In Calgary, ITRES Research produces another imaging spectrometer called the
Compact Airborne Spectrographic Imager (CASI) (Figure 3.21).

Figure 3.21. The two dimensional linear array of the CASI.

For each line of ground targets, nb x ns data values are collected at 2 bytes (16 bits)
radiometric resolution, where nb is the number of spectral bands and ns is the number of
pixels in a line.
Due to the constraint of the data transmission rate, these nb x ns values cannot all be
transferred. This leads to two operating modes of the CASI: a spectral mode and a
spatial mode.
In the spectral mode, all 288 spectral bands are used, but only up to 39 spatial pixels (look
directions) can be transferred.
In the spatial mode, all 512 spatial pixels are used, but only up to 16 spectral bands can be
selected.
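A small arithmetic sketch of the per-line data volume nb × ns × 2 bytes for the two CASI operating modes described above (the 16-band spatial-mode figure uses the maximum stated above):

```python
def bytes_per_line(n_bands, n_pixels, bytes_per_sample=2):
    """Raw data volume for one image line at 16-bit radiometric resolution."""
    return n_bands * n_pixels * bytes_per_sample

full = bytes_per_line(288, 512)          # everything: too much to transfer
spectral_mode = bytes_per_line(288, 39)  # all bands, few look directions
spatial_mode = bytes_per_line(16, 512)   # all pixels, few selected bands

print(f"full frame per line:  {full} bytes")
print(f"spectral mode:        {spectral_mode} bytes")
print(f"spatial mode:         {spatial_mode} bytes")
# Both operating modes keep the per-line volume far below the full 288 x 512
# frame, which is what the data transmission constraint requires.
```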

Where to obtain remote sensing data?


See the Appendix in Lillesand and Kiefer (1994).
3.5 Microwave Remote Sensing

Radar stands for "radio detection and ranging". As mentioned before, radar is an active
sensing system: it uses its own energy source, microwave energy. A radar system
transmits pulses in the direction of interest and records the strength and origin of the "echoes"
or reflections received from objects within the system's field of view.
Radar systems may or may not produce images.

Non-imaging (ground based):
Doppler radar - used to measure vehicle speeds.
Plan Position Indicator (PPI) - used to observe weather systems or air traffic.

Imaging (airborne):
Side-looking airborne radar (SLAR).

SLAR systems produce continuous strips of imagery depicting very large ground areas
located adjacent to the aircraft flight line. Since clouds are transparent in the microwave
region, SLAR has been used to map tropical areas such as the Amazon River basin.
Started in 1971 and completed in 1976, project RADAM (Radar of the Amazon) was the
largest radar mapping project ever undertaken; in this project, the Amazon area was
mapped for the first time. In such remote and cloud-covered areas of the world, radar
is a prime source of information for mineral exploration, forest and range
inventory, water supply and transportation management, and site suitability assessment.
Radar imagery is currently neither as available nor as well understood as other image
products. An increasing amount of research is being conducted on the interaction mechanisms
between microwave energy and surface targets, such as forest canopies, and on the combination
of radar imagery with other image products.
SLAR system organization and operation are shown in Figure 3.22.

Figure 3.22. Components and organization of a radar system

Spatial Resolution of SLAR systems


The ground resolution of a SLAR system is determined by two independent sensing
parameters: pulse length and antenna beam width (Figure 3.23).
The time period between transmitting a pulse and receiving its echo scattered back from
the target can be measured. From it we can determine the distance, or 'slant range',
between the antenna and the target:

Sr = c t / 2

where
Sr is the slant range,
c is the speed of light, and
t is the time delay of the returned pulse (the factor of 2 accounts for the two-way travel).
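A minimal sketch of the slant-range relation Sr = c·t/2 given above (the echo delay used is an example value):

```python
LIGHT_SPEED = 2.998e8   # m/s

def slant_range_m(round_trip_time_s):
    """Slant range from the antenna to the target, Sr = c * t / 2."""
    return LIGHT_SPEED * round_trip_time_s / 2.0

# Example: an echo returning 100 microseconds after the pulse was transmitted
# (hypothetical value) corresponds to a target about 15 km away.
print(slant_range_m(100e-6))   # -> ~14990 m
```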

Figure 3.23

From Figure 3.23, it can be seen that SLAR depends on the time it takes for a transmitted
pulse to be scattered back to the antenna in order to determine the position of a target.
In the across-track direction, the spatial resolution is determined by the duration of the
pulse and the depression angle (Figure 3.24). This resolution is called the ground range
resolution (rg).

Figure 3.24. The across track spatial resolution

The along-track distinguishing ability of a SLAR system is called the azimuth resolution,
ra; it is given approximately by the slant range multiplied by the antenna beam width.

Figure 3.25. The sidelobes of a RADAR signal

It is obvious that in order to minimize rg, one needs to reduce the pulse duration. For ra,
the optimum is determined by the beam width β, which is a function of the wavelength λ and
the antenna length L:

β = λ / L

L can be the actual physical length of an antenna or a synthetic one.

Systems whose beam width is controlled by the physical antenna length are called
brute force or real aperture radars.

For a real aperture radar, the physical antenna must be considerably longer than the
wavelength in order to achieve high azimuth resolution. Obviously, there is a limit beyond
which the dimension of the antenna is not realistic to put on board an aircraft or a satellite.
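The sketch below applies the real-aperture relations discussed above (β ≈ λ/L and r_a ≈ S·β) to an example geometry; the wavelength, antenna length and slant range are assumed illustrative values, not parameters of any specific system.

```python
def azimuth_resolution_m(wavelength_m, antenna_length_m, slant_range_m):
    """Real-aperture azimuth resolution r_a = S * beta, with beam width
    beta = wavelength / antenna_length (small-angle approximation)."""
    beam_width_rad = wavelength_m / antenna_length_m
    return slant_range_m * beam_width_rad

# Example values (illustrative): 5.6 cm wavelength, 10 m antenna, 850 km range.
print(azimuth_resolution_m(0.056, 10.0, 850e3))   # -> 4760 m
# To reach, say, 25 m azimuth resolution at this range, a real antenna would
# have to be roughly 1.9 km long - hence the synthetic aperture approach.
```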
This limitation is overcome in synthetic aperture radar (SAR) systems. Such systems use
a short physical antenna, but through modified data recording and processing techniques
they synthesize the effect of a very long antenna. This is achieved by making use of the
Doppler effect (Figure 3.26).

Figure 3.26. The use of Doppler Effect

A synthetic aperture radar records the frequency differences of the backscattered signal at
different aircraft positions during the period when the target is illuminated by the
transmitted energy.
SAR records both the amplitude and the frequency of the signals backscattered from objects
throughout the period in which they are within the beam of the moving antenna. These signals
are recorded on tape or on film, which leads to two types of data processing.
One of the problems associated with processing radar signals from tape is that the signal is
contaminated by random noise. When displayed on a video monitor, the radar image tends
to have a noisy or speckled appearance. Later, in the digital analysis section, we will
discuss speckle reduction strategies.

Radar Equation Derivation

What we actually measure is the backscattered power Pr, in watts.
The antenna transmits Pt watts.
At a distance R, the power density is Pt G / (4πR²), where G is the antenna gain.
The power received at the antenna is this density intercepted and re-radiated by a target of radar cross-section σ, spread again over 4πR² and collected by the antenna's effective aperture A:
Pr = [Pt G / (4πR²)] σ [1 / (4πR²)] A
If the same antenna is used for both transmitting and receiving (A = G λ² / 4π), then
Pr = Pt G² λ² σ / ((4π)³ R⁴)
All parameters in this formula except σ are determined by the system; only σ is related to the ground target. Unfortunately, σ is a poorly understood parameter, which largely limits its use in remote sensing.
We know σ is related not only to system variables, including wavelength, polarization, azimuth, landscape orientation and depression angle, but also to landscape parameters, including surface roughness, soil moisture, vegetation cover and micro-topography. Moisture influences the dielectric constant of the target, which in turn can significantly change the backscattering pattern of the signal. Moisture also reduces the penetrating capability of the microwaves.
Roughness is characterized by the standard deviation S(h) of the heights of individual facets.

In the field, we use an array of sticks arranged parallel to each other at a constant distance interval to measure the surface roughness.
A common definition of a rough surface is one whose S(h) exceeds one eighth of the wavelength divided by the cosine of the incidence angle:
S(h) > λ / (8 cos θ)

As we illustrated in the spectral reflectance section, a smooth surface tends to reflect all the incoming energy at an angle equal to the incidence angle, while a rough surface tends to scatter the incoming energy more or less in all directions.
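A minimal sketch of the roughness criterion just described, assuming the λ/(8 cos θ) threshold; the height standard deviation, wavelength and incidence angle are made-up illustrative values.

import math

def is_rough(s_h_m, wavelength_m, incidence_angle_deg):
    """Roughness criterion from the text: a surface is 'rough' when
    S(h) exceeds wavelength / (8 * cos(incidence angle))."""
    threshold = wavelength_m / (8.0 * math.cos(math.radians(incidence_angle_deg)))
    return s_h_m > threshold, threshold

# Illustrative: 1 cm height standard deviation, C-band (5.6 cm), 30 deg incidence.
rough, threshold = is_rough(0.01, 0.056, 30.0)
print(rough, round(threshold, 4))   # True, threshold about 0.0081 m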

Polarization

Microwave energy can be transmitted and received by the antenna at a selected orientation of the electromagnetic field. The orientation, or polarization, of the EM field is labelled as the horizontal (H) or vertical (V) direction. The antenna can transmit using either polarization. This makes it possible for a radar system to operate in any of four modes: transmit H and receive H, transmit H and receive V, transmit V and receive H, and transmit V and receive V. By operating in different modes, the polarization characteristics of ground targets can be obtained.

Corner reflector

A corner reflector tends to collect the signal reflected within its foreground and return it toward the antenna.
Microwave Bands

Band   Wavelength       Frequency
Ka     0.75 - 1.1 cm    40 - 26.5 GHz
K      1.1 - 1.67 cm    26.5 - 18 GHz
Ku     1.67 - 2.4 cm    18 - 12.5 GHz
X      2.4 - 3.75 cm    12.5 - 8 GHz
C      3.75 - 7.5 cm    8 - 4 GHz
S      7.5 - 15 cm      4 - 2 GHz
L      15 - 30 cm       2 - 1 GHz
P      30 - 100 cm      1 GHz - 300 MHz

Geometric Aspects
Radar uses two types of image recording systems, a slant-range image recording system
and a ground-range image recording system.
In a slant-range recording system, the spacing of targets is proportional to the time interval between the returning signals from adjacent targets.

In a ground-range image recording system, the spacing is corrected to be approximately proportional to the horizontal ground distance between the targets.

If the terrain is flat, we can convert the slant-range spacing SR to the ground-range spacing GR using GR = (SR² − H²)^1/2, where H is the antenna height above the terrain.

Relief distortion

Relief displacement on SLAR images is different from that on photographs: on radar images, elevated features are displaced toward the sensor, rather than away from the nadir point as on aerial photographs.

Space-borne radars

Seasat, launched in 1978, operated for 98 days.

Frequency: L band
Swath width: 100 km, centred at 20° from nadir
Polarization: HH
Ground resolution: 25 m x 25 m

Shuttle Imaging Radar, SIR-A, SIR-B, SIR-C

The European Space Agency launched the ERS-1 satellite in 1991, with a C-band SAR sensor.
In 1992, the Japanese JERS-1 satellite was launched with an L-band radar on board. The L-band radar has a higher penetration capability than the C-band SAR.

Radarsat

Scheduled to be launched in mid-1995, Radarsat will carry a SAR system which is very flexible in terms of configurations of incidence angle, resolution, number of looks and swath width.

Radarsat

Frequency: C band, 5.3 GHz
Altitude: 792 km
Repeat cycle: 16 days
Subcycle: 3 days
Period: 100.7 min (14 orbits per day)
Equatorial crossing: 6:00 a.m.

Platform and Satellite Orbits: see Campbell's book, pp. 118-129.

Chapter 3

References
Ahmed, S. and H.R. Warren, 1989. The Radarsat System. IGARSS'89/12th Canadian Symposium on Remote
Sensing. Vol. 1. pp.213-217.
Anger, C.D., S. K. Babey, and R. J. Adamson, 1990, A New Approach to Imaging Spectroscopy, SPIE
Proceedings, Imaging Spectroscopy of the Terrestrial Environment, 1298: 72 - 86. - specifically, CASI
Curlander, J.C., and McDonough R. N., 1991. Synthetic Aperture Radar, Systems & Signal Processing. John Wiley
and Sons: New York.
Elachi, C., 1987. Introduction to the Physics and Techniques of Remote Sensing. John Wiley and Sons, New York.
King, D., 1992. Development and application of an airborne multispectral digital frame camera sensor. XVIIth
Congress of ISPRS, International Archives of Photogrammetry and Remote Sensing. B1:190-192.
Lenz, R. and D. Fritsch, 1990. Accuracy of videometry with CCD sensors. ISPRS Journal of Photogrammetry and
Remote Sensing, 90-110.
Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd. Ed., John Wiley and Sons,
Inc.: Toronto.
Luscombe, A.P., 1989. The Radarsat Synthetic Aperture Radar System. IGARSS'89/12th Canadian Symposium
on Remote Sensing. Vol. 1. pp.218-221.
Staenz, K., 1992. A decade of imaging spectrometry in Canada. Canadian Journal of Remote Sensing. 18(4):187-197.

4.1 Digital Imagery


Different from the Cartesian coordinate system, the origin and axes of an image coordinate system take the following form for printing and processing purposes:

Figure 4.1. An image coordinate system


Each picture element in an image, called a pixel, has coordinates (x, y) in a discrete space that represents a sampling of the continuous earth surface. Image pixel values represent the sampling of the surface radiance. The pixel value is also called image intensity, image brightness or grey level. In a multispectral image, a pixel has more than one grey level; each grey level corresponds to a spectral band. These grey levels can be treated as grey-level vectors.
Going from the continuous physical space to the discrete image space requires a quantization process. The details of quantization are determined by how we do the sampling and what kind of resolution we use. General concepts of sampling and resolution have been introduced in
Chapter 1.
Two concepts are of particular importance; image space and feature space. Image space
refers to the spatial coordinates of an image(s) which are denoted as I with m x n elements,
where m and n are respectively the number of rows and the number of columns in the
image(s). The elements in image space, I(i,j) (i = 1, 2,..., m; j = 1, 2,..., n) are image pixels.
They represent spatial sampling units from which electromagnetic energy or other
phenomena are recorded. All possible image pixel values constitute the feature space V.
One band of image constitutes a one-dimensional feature space. k bands in an image
denoted as Ik construct a k-dimensional feature space Vk. Each element in Vk is a unit
hypercube whose coordinate is a k-dimensional vector v = (v1, v2, ..., vk)T. When k = 1, 2,
and 3 the hypercube becomes a unit line, a unit area, and a real unit cube. Each pixel in
image space has one and only one vector in feature space. Different pixels may have the
same vector in feature space.

Multispectral images construct a special feature space, a multispectral space Sk. In S, each
unit becomes a grey-level vector g = (g1, g2, ..., gk)T. In multispectral images, each pixel
has a grey-level vector. There are other types of images which add additional dimensions
to the feature space. In the feature space, various operations can be performed. One of
these operations is to classify the feature space into groups with similar grey-level vectors, and give each group the same label with a specific meaning. The classification decision made
for each image pixel is in feature space and the classification result is represented in image
space. Such an image is a thematic image which could also be used as an additional
dimension in feature space for further analysis.
4.1.1 Pixel Window
A pixel window is defined in image space as a group of neighbouring pixels. For computational simplicity, a square pixel neighbourhood wl(i,j) centred at pixel I(i,j) with a window lateral length of l is preferred. Without further explanation, we refer to a pixel window as wl(i,j). In order to ensure that I(i,j) is located at the centre of the pixel window, l must be an odd number. It is obvious that the size of a pixel window wl(i,j) is l x l. The following conditions hold for a pixel window:
1 ≤ l ≤ min(m, n), with l odd.
This means that the minimum pixel window is the centre pixel itself, and the maximum
pixel window could be the entire image space, provided that the image space is a square
with an odd number of rows and columns. When the image space has more than one image,
a pixel window can be used to refer to a window located in any one image or any
combinations of those images.
4.1.2 Image Histogram
A histogram has two meanings: a table of the occurrence frequencies of all vectors in feature space, or a graph plotting these frequencies against all the grey-level vectors. The occurrence frequency in the histogram is the number of pixels in the image segment having the same vector. When the entire image space is used as the image segment, the histogram is referred to as h(I). When a histogram is generated from a specific pixel window, it is identified as hl(i,j), where l, i, and j are the same as above. In practice, a one-dimensional feature space is mainly used. In this case, a histogram is a graphical representation of a table with each grey level as an entry. Corresponding to each grey level is its occurrence frequency f(vi), i = 0, 1, 2, ..., Nv−1, where Nv is the number of grey levels of the image (e.g., Nv = 8 in Figure 4.2).

Figure 4.2. An example histogram

From a histogram h(I) we can derive the cumulative histogram hc(I)={fc(vi) , i = 0, 1, 2, ...,
Nv-1}. This is obtained for each grey level by summing up all frequencies whose grey
levels are not higher than the particular grey level under consideration (Figure 4.3).

Figure 4.3. A cumulative histogram

In numerical form: fc(vi) = f(v0) + f(v1) + ... + f(vi), i = 0, 1, ..., Nv−1.
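The histogram h(I) and cumulative histogram hc(I) defined above can be computed directly; the following sketch assumes a single-band integer image and NumPy, and the tiny test image is invented for illustration.

import numpy as np

def histogram_and_cumulative(image, n_levels=256):
    """Occurrence frequencies f(v_i) and cumulative frequencies f_c(v_i)
    for a single-band image with grey levels 0 .. n_levels-1."""
    f = np.bincount(image.ravel(), minlength=n_levels)   # f(v_i)
    fc = np.cumsum(f)                                     # f_c(v_i) = sum of f(v_j), j <= i
    return f, fc

# Illustrative 8-level image (values 0..7).
img = np.array([[0, 1, 1, 2],
                [3, 3, 3, 7],
                [1, 2, 2, 2]], dtype=np.int64)
f, fc = histogram_and_cumulative(img, n_levels=8)
print(f)    # [1 3 4 3 0 0 0 1]
print(fc)   # [1 4 8 11 11 11 11 12]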

4.1.3 Quality of a Digital Image


Two parameters of a sensor system at a specific height determine the quality of a digital
remote sensing image for a given spectral range: the spatial resolution rs and the radiometric resolution rr. As discussed in Chapter 1, the spatial resolution determines how finely an image can record the spatial detail of the real world (i.e., how small the spatial
sampling unit is) and therefore the number of pixels in the image space. The radiometric
resolution determines how finely a spectral signal can be quantized and therefore the

number of grey levels that is produced. The finer these resolutions are, the closer is the
information recorded in the image to the real world, and the larger are the sizes of the
image space and the grey-level vector space. The size (or alternatively the number of
pixels) of image space, S(I), has an exponential relation with the spatial resolution, and so
does the size (or the number of vectors) of the feature space, S(V), with the radiometric
resolution. Their relations take the following forms: S(I) ∝ (1/rs)² and S(V) = (2^rr)^k,
where k, as defined above, is the number of images in the image space. While S(I) has a
fixed exponential order of 2 with rs, S(V) depends not only on rr, but also on k. The
number of vectors in Vk becomes extremely large when k grows while rr is unchanged. For
example, each band of a Landsat TM or SPOT HRV image is quantized into 8 bits (i.e., an
image has 256 possible grey levels). Thus, when k = 1, S(V) = 256 and when k = 3, S(V) =
16,777,216. If a histogram is built in such a three-dimensional multispectral space, it would
require at least 64 Megabytes of random access memory (RAM) or disk storage to process
it. Therefore, the feature space has to be somehow reduced for certain analyses.
4.1.4 Image Formats
A single image can be represented as a 2-dimensional array. A multispectral image can be
represented in a 3-dimensional array (Figure 4.4)

Figure 4.4. A multispectral image

In a computer, image data can be stored in a number of ways:

The most popular ones include band sequential (BSQ), band interleaved by pixel (BIP, also called pixel interleaved), band interleaved by line (BIL), and separate files. These formats can be illustrated using the following example of a three-band multispectral image.
AAA BBB CCC
AAA BBB CCC
Band 1 Band 2 Band 3
BIL is typically used by the Landsat Ground Station Operators' Working Group
(LGSOWG)
AAA BBB CCC, AAA BBB CCC
Band Sequential BSQ takes the following form:
AAA AAA BBB BBB CCC CCC
Pixel Interleaved format is used by PCI. It takes the form of:
ABC ABC ABC ABC ABC ABC
These are the general formats that are being used. BIL is suitable for data transfer from the
sensor to the ground. It does not need a huge buffer for data storage on the satellite if the
ground station is within the transmission coverage of the satellite.
Pixel interleaved is suitable for pixel-based operation or multispectral analysis.
Band sequential and separate file formats are the proper forms to use when we are more
interested in single-band image processing, such as image matching, correlation, geometric
correction, and when we are more concerned with spatial information processing and
extraction. For example, we use these files when linear features or image texture are of our
concern.
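To make the three layouts concrete, the following sketch (assuming NumPy and a tiny invented 3-band, 2-line, 3-pixel image) rearranges the same data into BSQ, BIL and BIP order.

import numpy as np

# A tiny 3-band, 2-line, 3-pixel image, mirroring the A/B/C example above.
# bsq[band, line, pixel]
bsq = np.arange(3 * 2 * 3).reshape(3, 2, 3)

# BIL (band interleaved by line): for each line, all of band 1, then band 2, ...
bil = np.transpose(bsq, (1, 0, 2))          # [line, band, pixel]

# BIP (pixel interleaved): for each pixel, its full grey-level vector.
bip = np.transpose(bsq, (1, 2, 0))          # [line, pixel, band]

print(bsq.ravel())   # AAA AAA BBB BBB CCC CCC
print(bil.ravel())   # AAA BBB CCC AAA BBB CCC
print(bip.ravel())   # ABC ABC ABC ABC ABC ABC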
4.2 Factors Affecting Image Geometry
In remote sensing there are three major forms of imaging geometry as shown in Figure 4.5:

Figure 4.5 The major types of imaging geometry

The first one is central perspective. It is the simplest because the entire image frame is defined by the same set of geometric parameters. In the second imaging geometry, each pixel has its own central perspective. This is the most complicated case because, if geometric distortion exists, each pixel has to be corrected separately. The third one shows that
each line of an image has a central perspective.
The platform status, which can be represented by six parameters (X, Y, Z, ω, φ, κ), affects the image geometry.

In addition, the following factors affect the image geometry:

- instability of an airborne platform
- earth rotation (affects satellite images)
- continental drift

Most remote sensing satellites for earth resources studies, such as the Landsat series and
the SPOT, use Sun synchronous polar orbit around the earth (Figure 4.6) so that they
overpass the same area on the earth at approximately the same local time. Most of the
earth's surface can be covered by these satellites.

Figure 4.6. Sun synchronous polar orbit for Earth resources satellites

The effects of roll, pitch and yaw along the direction of satellite orbit or the airplane flight
track can be illustrated by using Figure 4.7.

Figure 4.7. The effects of roll, pitch and yaw on image geometry

4.3. Flattening the Earth Surface through Map Projection


Although the Earth's surface is spherical, we use flat maps to represent the phenomena on
the surface. We transform the coordinates on the spherical surface to a flat sheet of paper
using map projection. The most widely used map projection is Universal Transverse
Mercator (UTM) projection.
4.4 Georeferencing (Geometric Correction)
The purpose of georeferencing is to transform the image coordinate system (u,v), which
may be distorted due to the factors discussed above, to a specific map projection (x,y) as
shown in Figure 4.8. The imaging process involves the transformation of a real 3-D scene
geometry to a 2-D image

Figure 4.8. Georeferencing is a transformation between the image space to the geographical coordinate
space

Terms such as geometric rectification or image rectification, image-to-image registration,


image-to-map registration have the following meanings:
1) Geometric rectification and image rectification recover the imaging geometry.
2) Image-to-image registration refers to transforming one image coordinate system into another image coordinate system.
3) Image-to-map registration refers to the transformation of an image coordinate system to a map coordinate system resulting from a particular map projection.
Georeferencing generally covers 1) and 3). It requires a transformation T:
Forward Transformation is composed of the following transformations:

In order to achieve:

Every step involved in the imaging process has to be known, i.e., we need to know the
inverse process of geometric transformation.

This is a complex and time-consuming process. However, there is a simpler and widely used alternative: polynomial approximation, in which u and v are each expressed as a polynomial in x and y.

The coefficients (the a's and b's) are determined by using Ground Control Points (GCPs).
For example, we can use very low order polynomials such as the affine transformation
u = ax + by + c
v = dx + ey + f
A minimum of 3 GCPs will enable us to determine the coefficients in the above equations.
In this way, we don't need to use the transformation matrix T. However, in order to make
our coefficients representative of the whole image that is transformed, we have to make
sure that our GCPs are well distributed all over the image.
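A minimal sketch of fitting the affine coefficients from GCPs by least squares, assuming NumPy; the GCP coordinates shown are hypothetical.

import numpy as np

def fit_affine(gcp_xy, gcp_uv):
    """Fit u = a*x + b*y + c and v = d*x + e*y + f from n >= 3 GCPs
    by least squares. gcp_xy and gcp_uv are (n, 2) arrays."""
    xy = np.asarray(gcp_xy, dtype=float)
    uv = np.asarray(gcp_uv, dtype=float)
    A = np.column_stack([xy, np.ones(len(xy))])   # rows (x, y, 1)
    coeffs_u, *_ = np.linalg.lstsq(A, uv[:, 0], rcond=None)
    coeffs_v, *_ = np.linalg.lstsq(A, uv[:, 1], rcond=None)
    return coeffs_u, coeffs_v    # (a, b, c) and (d, e, f)

# Hypothetical GCPs: map coordinates (x, y) and matching image coordinates (u, v).
xy = [(1000, 2000), (1500, 2000), (1000, 2600), (1480, 2580)]
uv = [(10, 20), (60, 18), (12, 80), (59, 77)]
cu, cv = fit_affine(xy, uv)
print(np.round(cu, 4), np.round(cv, 4))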
A third choice is to combine the T^-1 method with the polynomial technique in order to reduce the transformation errors involved in the direct use of T^-1
(Figure 4.9).

Figure 4.9. A larger magnitude of error may be introduced if the direct transformation is used.

This may be achieved through the following four steps:


(1) To refine imaging geometry parameters.
In T^-1, due to inaccuracies in the satellite or aircraft position, polynomials are used to correct the platform position parameters, for example by modelling each position component as a low-order function of time.

We can use GCPs to refine the coefficients. Global Positioning System (GPS) and/or Inertial Navigation System (INS) techniques can also be used. The integration of GPS and INS with remote sensing sensors is being investigated (Schwarz et al., 1993).
(2) Divide output grid into blocks (Figure 4.10):

Figure 4.10. In the x-y space

(3) Map the grid points using


(4) Use a low order polynomial inside each block for detailed mapping (Figure 4.11)

Figure 4.11. Further transformation from u-v space to Dx-Dy space using lower order
polynomials
The choices are:

(i) Affine: u = a0 + a1x + a2y, v = b0 + b1x + b2y
(ii) Bilinear: u = a0 + a1x + a2y + a3xy (1), v = b0 + b1x + b2y + b3xy (2)
Why is (ii) called bilinear? Because each coordinate can be written as the product of two linear functions of x and y:
u = (a + bx) (c + dy)
Since there are four known points and the affine model (i) has three coefficients per coordinate while the bilinear model (ii) has four, we can solve (i) using least squares and (ii) by the ordinary solution of an equation system. We will only show how to obtain a0, a1, a2, a3 in (ii).

For point
Similarly, we can obtain bo, b1, b2, b3.
Why do we use bilinear instead of affine? It is because the bilinear transformation
guarantees the continuity from block to block in the detailed mapping. The geometric
interpolation of bilinear transformation is illustrated in Figure 4.12.

Figure 4.12. Linear and bilinear transformation

Method for Determining the coefficients of a polynomial geometric transformation

We can use the least squares solution for bilinear polynomials.

This is done with more than 4, say n, GCPs,

(u1, v1) <-> (x1, y1)
(u2, v2) <-> (x2, y2)
...
(un, vn) <-> (xn, yn)

By substituting the n GCP coordinates into (1) and (2) we obtain two groups of over-determined equations.

The least squares solution in matrix form: writing the n equations for one coordinate as Z = X A, where A = (a0, a1, a2, a3)^T and X is the n x 4 design matrix whose ith row is (1, xi, yi, xiyi), multiplying both sides by X^T gives X^T Z = X^T X A, and therefore

A = (X^T X)^-1 X^T Z

is the solution. Similarly we can solve for b0, b1, b2, b3.


This can be applied to affine transformation and higher order polynomial transformation.
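The same least squares solution can be written compactly for the bilinear case; the sketch below assumes NumPy (lstsq solves the normal equations implicitly) and uses invented GCPs.

import numpy as np

def fit_bilinear(gcp_xy, gcp_uv):
    """Least squares fit of u = a0 + a1*x + a2*y + a3*x*y (and similarly for v)
    from n > 4 GCPs."""
    xy = np.asarray(gcp_xy, dtype=float)
    uv = np.asarray(gcp_uv, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    X = np.column_stack([np.ones_like(x), x, y, x * y])   # design matrix
    a, *_ = np.linalg.lstsq(X, uv[:, 0], rcond=None)      # a0..a3
    b, *_ = np.linalg.lstsq(X, uv[:, 1], rcond=None)      # b0..b3
    return a, b

def apply_bilinear(a, b, x, y):
    """Map a ground coordinate (x, y) to an image coordinate (u, v)."""
    u = a[0] + a[1] * x + a[2] * y + a[3] * x * y
    v = b[0] + b[1] * x + b[2] * y + b[3] * x * y
    return u, v

# Hypothetical GCPs.
xy = [(0, 0), (0, 100), (100, 0), (100, 100), (50, 50)]
uv = [(5, 8), (4, 108), (106, 7), (104, 107), (55, 57)]
a, b = fit_bilinear(xy, uv)
print(apply_bilinear(a, b, 50, 50))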
4.5 Image Resampling
Once the geometric transformation (T) is obtained as we have discussed above,
theoretically we can use the following relation to transform each pixel (i,j) from image
space (u-v) to a desirable space (x-y).

The results will appear as in Figure 4.13. Pixel position (1, 1) may be transformed to
(4850.672, 625.341).

Figure 4.13. Forward transformation results in fractional coordinates in x-y space


In order to have a grid coordinate in x-y space, data have to be resampled in u-v image
space for given coordinates in x-y space. This is shown below:

For a pixel location in x-y space, the corresponding coordinates * in u-v space are found
through T-1. To determine the grey level at the * location in u-v space, interpolation
strategies are used. These include:
- Nearest neighbour interpolation
- Bilinear interpolation (linear in each dimension)
- Cubic convolution (a special case of spline interpolation)
There are some other interpolation methods, such as use of sinc function, spline function,
etc. The most commonly used methods in remote sensing are, however, the three listed
above.
Nearest neighbour interpolation simply assigns the value to a pixel that is closest to * as
shown below:

In one dimensional case, it can be illustrated as:

This can be achieved using the following convolution operation:

I'(u) = Σi Id(i) w(u − i)

where w(u) is the weight function.

One dimensional convolution:


The convolution of two functions f1 and f2 is denoted by f1 ∗ f2.

Convolution is equivalent to flipping the filter backwards and then performing correlation.


In image enhancement, convolution is the same as correlation since filters are symmetrical.
In image resampling, Id(i) is the discrete image and w(i) is the weight function for interpolation:
I'(u) = Id(u) ∗ w(u)
where ∗ denotes the convolution operator.

Since most weight functions are limited to a local neighbourhood, only a limited number of
i's need to be used.

For instance, in nearest neighbour (NN) interpolation, i takes the integer value closest to u. In linear interpolation, two points are used: the nearest integer less than or equal to u and the nearest integer greater than u. In cubic convolution, four points are used: the two nearest integers less than or equal to u and the two nearest integers greater than u.
For the sinc function, w(x) = sin(πx)/(πx); x can extend to infinity, but in practice only a limited number of terms (up to about 20) is used.
According to the above definition of convolution, for the nearest neighbour case the weight function is
w(x) = 1 for |x| ≤ 0.5, and 0 otherwise.
For the linear case, it is
w(x) = 1 − |x| for |x| < 1, and 0 otherwise.
For cubic convolution (with the commonly used parameter a = −1):
w(x) = 1 − 2|x|² + |x|³ for |x| < 1;  w(x) = 4 − 8|x| + 5|x|² − |x|³ for 1 ≤ |x| < 2;  and 0 otherwise.
For the two-dimensional case, a sequential process is used.


Z(u) values are first calculated along each row as shown below

Then Z(u, v) is obtained by applying convolution along the dashed line. The convolution
process for all the three interpolation cases can be shown by

For the NN, l = m, the closest point to u.

For linear:
l = nearest integer equal or smaller than u
m = nearest integer larger than u
For Cubic:
l = nearest two integers equal or smaller than u
m = nearest two integers larger than u.
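The three weight functions and the one-dimensional convolution they enter can be sketched as follows; this assumes NumPy, uses the cubic convolution kernel with parameter a = −1, and applies a simple edge clamp that is an implementation choice rather than something prescribed in the text.

import numpy as np

def w_nearest(x):
    """Nearest-neighbour weight function."""
    return np.where(np.abs(x) <= 0.5, 1.0, 0.0)

def w_linear(x):
    """Linear (triangle) weight function."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 1.0 - ax, 0.0)

def w_cubic(x):
    """Cubic convolution weight function with parameter a = -1."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 1.0 - 2.0 * ax**2 + ax**3,
           np.where(ax < 2.0, 4.0 - 8.0 * ax + 5.0 * ax**2 - ax**3, 0.0))

def resample_1d(Id, u, weight, support):
    """I'(u) = sum_i Id(i) * w(u - i) over the neighbours within 'support'."""
    i = np.arange(int(np.floor(u)) - support + 1, int(np.floor(u)) + support + 1)
    i = np.clip(i, 0, len(Id) - 1)        # simple edge handling
    return float(np.sum(Id[i] * weight(u - i)))

Id = np.array([10.0, 12.0, 20.0, 21.0, 19.0])
print(resample_1d(Id, 2.4, w_nearest, 1))  # 20.0
print(resample_1d(Id, 2.4, w_linear, 1))   # 20.4
print(resample_1d(Id, 2.4, w_cubic, 2))    # cubic estimate (may overshoot near sharp steps)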
Chapter 4

References
Jensen, J.R., 1986. Digital Image Processing, a Remote Sensing Perspective.
Schwarz, K-P., Chapman, M.A., Canon, E.C. and Gong, P., 1993. An integrated INS/GPS approach to the
georeferencing of remotely sensed data. Photogrammetric Engineering and Remote Sensing, 59(11): 1667-1673.
Shlien, S., 1979. Geometric correction, registration, and resampling of Landsat Imagery. Canadian Journal of
Remote Sensing. 5(1):74-87.

5. Radiometric Correction
In addition to distortions in image geometry, image radiometry is affected by factors, such
as system noise, sensor malfunction and atmospheric interference. The purpose of
radiometric calibration is to remove or reduce the sensor (detector) inconsistencies, sensor
malfunction, viewing geometry and atmospheric effects. We will first introduce the
calibration of detector responses.

5.1 Detector Response Calibration


As we have discussed before, Landsat MSS has 6 detectors at each band, TM has 16 and
SPOT HRVs have 3000 or 6000 detectors. The differences between the SPOT sensors and
Landsat sensors are that each SPOT detector collects one column of an image while each
detector of Landsat sensors corresponds to many lines of an image (Figure 5.1).

Figure 5.1. Images acquired using detectors in linear array sensors and in scanners

The problem is that no two detectors function in exactly the same way. If the problem becomes serious, we will observe banding or striping on the image.
There are two types of approaches to overcoming the detector response problem: absolute calibration and relative calibration.
5.1.1 Absolute calibration
In this mode, we attempt to establish a relationship between the image grey level and the
actual incoming reflectance or the radiation. A reference source is needed for this mode
and this source ranges from laboratory light, to on-board light, to the actual ground
reflectance or radiation.

For CASI, each detector is calibrated by the manufacturer in the laboratory. For the
Landsat MSS, a calibration wedge with 6 different grey levels is used. For the Landsat TM,
three lamps, which have 8 brightness combinations, are used.
In any case, a linear response is assumed for each detector:
vo = a vi + b
where vo is the observed reading and vi is the known source reading; e.g., for an 8-bit image, 0 ≤ vo ≤ 255.
The least squares method is used to derive a and b (Figure 5.2).

Figure 5.2. Responses of the six Landsat MSS detectors. A least squares linear
fitting is applied to these detector responses.
Once each detector is calibrated, the calibrated image data (digital numbers) can be converted into radiances or spectral reflectances. For the case of converting the digital numbers of an 8-bit image into radiances, we have L = Lmin + (Lmax − Lmin) DN / 255.
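A small sketch of this linear DN-to-radiance conversion, assuming NumPy; Lmin and Lmax here are hypothetical calibration constants, not actual sensor values.

import numpy as np

def dn_to_radiance(dn, l_min, l_max, dn_range=255.0):
    """Convert calibrated digital numbers of an 8-bit band to radiance,
    assuming the linear model L = Lmin + (Lmax - Lmin) * DN / DNrange."""
    dn = np.asarray(dn, dtype=float)
    return l_min + (l_max - l_min) * dn / dn_range

# Hypothetical calibration constants.
print(dn_to_radiance([0, 128, 255], l_min=0.1, l_max=2.5))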

5.1.2 Relative calibration

Even though data may have been absolutely calibrated, an image may still have problems
caused by sensor malfunctioning. For example, in some of the early Landsat-1, 2, 3
images, there may be lines which have been dropped out. No response for that particular
detector can be found. In other cases, there are still striping problems. This happens to both
MSS and TM images. The striping problem is most obvious when an image is acquired over a water body, where the actual spectral reflectances from one part to another are similar
(Figure 5.3).

Figure 5.3. When six detectors of the Landsat MSS are seeing the same
water target, their responses should be the same.

There are two additional methods to balance the detector response:


(1) Balance the mean and standard deviation
(2) Balance the histogram

(1) Balance the mean and standard deviation (m and σ)

The aim of this method is to make m and σ the same for each detector. For each detector we need a transfer function that transfers the measured mean and standard deviation to a standard set.
For each detector n, assume:
measured mean = mn
measured standard deviation = σn
desired mean = M
desired standard deviation = S

The transfer function is


I'n = anIn + bn
where I'n is the calibrated intensity and In is the original intensity
an and bn are the gain and bias to be determined.
The solution is an = S / σn and bn = M − an mn.
For an 8-bit image, you may try to use M = 128 and S = 50 or may use the mean and
standard deviation calculated from the entire sample.
This may not always work. The assumption behind this strategy is that detector responses
are linear.
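A minimal sketch of the mean/standard-deviation balancing for one detector, assuming NumPy and a synthetic image in which every sixth line belongs to the detector being corrected.

import numpy as np

def balance_detector(band, detector_rows, M, S):
    """Adjust the rows scanned by one detector so that their mean and standard
    deviation become M and S: I' = a*I + b with a = S/sigma_n, b = M - a*m_n."""
    values = band[detector_rows, :]
    m_n, s_n = values.mean(), values.std()
    a = S / s_n
    b = M - a * m_n
    band = band.astype(float)
    band[detector_rows, :] = a * values + b
    return band

# Illustrative: a 6-detector scanner writes every 6th line; balance detector 0
# to a target mean of 128 and standard deviation of 50.
rng = np.random.default_rng(0)
img = rng.normal(120, 40, size=(60, 100))
img = balance_detector(img, np.arange(0, 60, 6), M=128.0, S=50.0)
print(round(img[::6, :].mean(), 1), round(img[::6, :].std(), 1))   # ~128.0, ~50.0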

(2) Balance histogram

The assumption for balancing histogram is that each detector has the same probability of
seeing the scene and, therefore, the grey-level distribution function should be the same.
Thus if two detectors have different histograms (a discrete version of grey-level
distribution function), they should be corrected to have the same histogram.
This is usually done by comparing their cumulative histograms as shown in Figure 5.4.

Figure 5.4. Balancing the histogram F2 to the reference histogram F1.

The process works as follows: for each grey level g2 in the histogram to be adjusted, find its cumulative frequency fc2(g2) in F2; then find in F1 the grey level g1 whose cumulative frequency satisfies fc1(g1) = fc2(g2); finally, assign g1 to g2 in the image being adjusted.
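A sketch of this cumulative-histogram matching, assuming NumPy; np.interp is used for the fc1(g1) = fc2(g2) lookup, and the two detector images are synthetic.

import numpy as np

def match_histogram(values, reference, n_levels=256):
    """Map the grey levels of 'values' so that its cumulative histogram follows
    that of 'reference', using the g2 -> g1 lookup described above."""
    levels = np.arange(n_levels)
    cdf_v = np.cumsum(np.bincount(values.ravel(), minlength=n_levels)) / values.size
    cdf_r = np.cumsum(np.bincount(reference.ravel(), minlength=n_levels)) / reference.size
    # For each grey level g2 of the image, find g1 in the reference with the
    # same cumulative frequency (np.interp performs the inverse lookup).
    lut = np.interp(cdf_v, cdf_r, levels)
    return np.round(lut[values]).astype(values.dtype)

# Illustrative use: force detector 2's lines to follow detector 1's histogram.
rng = np.random.default_rng(1)
det1 = rng.integers(90, 140, size=(50, 100))
det2 = rng.integers(70, 120, size=(50, 100))
det2_matched = match_histogram(det2, det1)
print(det1.mean().round(1), det2.mean().round(1), det2_matched.mean().round(1))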
5.2 Atmospheric Correction of Remotely Sensed Data
Atmospheric correction is a major issue in visible or near-infrared remote sensing because
the presence of the atmosphere always influences the radiation from the ground to the
sensor.
The radiance that reaches a sensor can be determined from the recorded digital number by Ls = Lmin + (Lmax − Lmin) DN / DNrange. Normally Lmax, Lmin and DNrange are known from the sensor manufacturer or operator.
However, Ls is composed of contributions from the target, background and the atmosphere
(Figure 5.5):

Figure 5.5 Target, background and scattered radiation received by the sensor.

As introduced before, the atmosphere has severe effects on the visible and near-infrared
radiance. First, it modifies the spectral and spatial distribution of the radiation incident on
the surface. Second, radiance being reflected is attenuated. Third, atmospheric scattered
radiance, called path radiance, is added to the transmitted radiance.
Assuming that Ls is the radiance received by a sensor, it can be divided into LT and LP
LS = LT + LP (1)
LT is the transmitted radiance.
LP is atmospheric path radiance.
Obviously, our interest is to determine LT.
For a given spectral interval, the solar irradiance reaching the earth's surface is

EG = ES Ti cos(θi) + Ed

where ES is the solar irradiance outside the atmosphere, Ti is the atmospheric transmittance along the incident direction, θi is the incidence angle, and Ed is the diffuse sky irradiance.
A surface can be either specular or diffuse. Most surfaces can be considered approximately diffuse reflectors at high solar elevations, i.e. when θi is small.
If the surface is assumed to be a perfect diffuse reflector, i.e. the Lambertian case, the ratio of the radiation reflected in the viewing direction to the total radiation into the whole upper hemisphere is 1/π.
Based on the Lambertian assumption,

LT = ρ Te EG / π

where ρ is the target reflectance and Te is the transmittance along the viewing direction.
Therefore, in order to quantitatively analyze remotely sensed data, i.e. to find ρ, the atmospheric transmittance T and the path radiance Lp have to be known.
5.2.1 Single scattering atmospheric correction

In practice, (2) and (3) can be written as

Path radiance Lp
Lp is determined by at least two parameters: the single scattering albedo and the single scattering phase function.
The single scattering albedo ω0 = 1 when no attenuation occurs. The single scattering phase function denotes the fraction of radiation which is scattered from its initial forward direction into some other direction.
For Rayleigh atmosphere

For Mie's atmosphere

From these phase functions it can be seen that forward scattering is dominated by aerosols, while backscattering is mainly due to Rayleigh scattering.
A number of path radiance determination algorithms exist for the nadir view that Landsat MSS, TM and SPOT HRV usually use. Lp for these algorithms can be determined by:

P is a combination of the Mie and Rayleigh phase functions.

For aerosol scattering, the phase function Pp(θ) does not change much as wavelength changes, so the function for λ = 0.7 μm can be used for all wavelengths. This function is usually found in diagram or table form; see the function given in Forster (1984).

The average background B is usually determined by collecting ground-truth information for a region. A 3 km x 3 km square centred on the pixel to be corrected can be used.
Sky Irradiance and Ground Irradiance

In this section we have only tried to introduce some basic concepts of this complex topic. This is only a single-scattering correction algorithm for the nadir viewing condition. More sophisticated algorithms which account for multiple scattering do exist. Some examples are LOWTRAN 7, 5S (Simulation of the Satellite Signal in the Solar Spectrum) and 6S (Second Simulation of the Satellite Signal in the Solar Spectrum, which also handles aircraft altitudes and target elevation). FORTRAN codes are available for these algorithms. The 5S and 6S codes were proposed by Tanre and his colleagues (e.g. Tanre et al., 1990, IGARSS'90, p. 187).
One has to be careful when conducting atmospheric correction, since there are many factors to be accounted for and estimated. If these estimations are not properly made, the atmospheric correction might add more bias than the atmosphere itself does.
5.2.2 Dark-target atmospheric correction
This method is most suitable for clear-sky conditions when the Rayleigh atmosphere dominates, since Rayleigh scattering affects short wavelengths, particularly the visible, and clear deep water has a very low spectral reflectance in the short-wavelength region. If a relatively large water body, say 1-2 km in diameter, can be found on an image, we can use the radiance of water derived from the image, Lw, and the real water radiance, L, to estimate Lp.
Lw = K DN water + Lmin
Lp = Lw - L

Lp can then be subtracted from other radiances in an image for the visible channels.
For the infrared channels, the Rayleigh atmosphere has little effect and Lp is assumed to be 0.
It can be seen that this method only applies to Rayleigh atmosphere.
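A sketch of the dark-target subtraction described above, assuming NumPy, a hypothetical gain K and offset Lmin, and a true water radiance of approximately zero.

import numpy as np

def dark_target_correction(band_dn, water_mask, k, l_min, l_water_true=0.0):
    """Estimate path radiance from a clear deep-water target and subtract it:
    Lw = K * DN_water + Lmin,  Lp = Lw - L,  L_corrected = L_band - Lp."""
    dn_water = band_dn[water_mask].mean()
    l_water_observed = k * dn_water + l_min
    l_path = l_water_observed - l_water_true      # assumed true water radiance ~ 0
    l_band = k * band_dn.astype(float) + l_min    # DN -> radiance for every pixel
    return l_band - l_path

# Illustrative 8-bit visible band with a water body in the lower-left corner.
rng = np.random.default_rng(2)
band = rng.integers(30, 180, size=(100, 100))
mask = np.zeros_like(band, dtype=bool)
mask[60:, :40] = True
corrected = dark_target_correction(band, mask, k=0.01, l_min=0.2)
print(round(corrected[mask].mean(), 3))   # close to the assumed true water radiance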
5.2.3 Direct digital number to reflectance transformation
This can be done by
R = a DN + b
By tying the ground reflectance measured during the flight overpass to the corresponding
pixel values on the image, we can solve the equation to obtain a and b. This is an empirical
method. In fact, both the dark-target and direct digital number conversion methods have
been most widely used in remote sensing.
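A sketch of fitting R = a DN + b from field-measured tie points, assuming NumPy; the DN/reflectance pairs are hypothetical.

import numpy as np

def empirical_line(dn_samples, reflectance_samples):
    """Fit R = a*DN + b from pixels whose ground reflectance was measured
    during the overpass; returns (a, b)."""
    a, b = np.polyfit(np.asarray(dn_samples, float),
                      np.asarray(reflectance_samples, float), deg=1)
    return a, b

# Hypothetical tie points: image DNs and field-measured reflectances.
a, b = empirical_line([25, 60, 120, 200], [0.03, 0.08, 0.18, 0.31])
print(round(a, 5), round(b, 4))    # gain and offset
print(round(a * 90 + b, 3))        # reflectance predicted for DN = 90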
5.3 Topographic Correction
In previous sections we attempted to correct the atmospheric effects, i.e. to convert image digital numbers (DNs) to image radiances (Ls). After atmospheric correction, we expect to have the spectral reflectivity ρ.
Assuming that atmospheric effects can be completely removed from the image, the spectral reflectivity ρ obtained contains the real target reflectance r and a topographic modification factor G introduced during image acquisition:
ρ = r G
The G contains information about the viewing and energy incidence geometric
relationship.

The Moon can be considered approximately as a surface that reflects an equal amount of light in all directions.
5.3.1 The role of relief
What effect does relief have on the image radiometry? To answer this question, a different coordinate system will be used; Figure 5.6 shows this image coordinate system. In this coordinate system, z is the viewing direction and the x-y plane is the image plane.

The actual relief of a small area is defined by its surface normal, expressed through the slopes p and q, and the light source is defined by the corresponding source slopes (ps, qs).
In the discrete case, p and q are the differences between the elevations of the neighbouring cells and the grid cell under consideration.
5.3.2 Gradient Space
For a perfectly white surface, r = 1.
For a grey surface, the reflectance is proportional to that of a perfectly white surface.

We can consider the reflectance ρ as a function of the slope factors (p, q) of the surface, ρ(p, q).
If r is the same over the whole study area, we can use two sets of ρ(p, q) values to recover (p, q). Similarly, we can use three sets,
ρ1(p, q), ρ2(p, q), ρ3(p, q),
to recover both r and (p, q).
Using (p, q), we can generate a shaded map based on a DEM of an area.
Instead of calculating ρ(p, q) for each grid cell of a DEM, we can calculate a two-dimensional lookup table of ρ indexed by p and q (sampled, for example, at −0.2, −0.1, 0, 0.1, 0.2 in each direction). The entire DEM {p, q} can then be mapped using this table.
The surface normal in vector form is (−p, −q, 1); the illumination direction in vector form is (−ps, −qs, 1); the look direction is (0, 0, 1).

For sensors that look in the nadir direction, the image coordinate system is only a shift from the local Cartesian coordinates. Thus, the above formulation can be used to correct satellite (Landsat) imagery.

These relationships can be seen from the following diagram.
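As a numerical illustration of these relationships, the sketch below computes a Lambertian shading value, the cosine of the incidence angle between the normal (−p, −q, 1) and source (−ps, −qs, 1) vectors, assuming NumPy, a synthetic DEM, and finite-difference slopes; the source slopes are chosen arbitrarily.

import numpy as np

def lambertian_shading(dem, cell_size, p_s, q_s):
    """cos(incidence) between surface normal (-p, -q, 1) and source (-ps, -qs, 1),
    with the slopes estimated by finite differences on the DEM."""
    q, p = np.gradient(dem, cell_size)        # gradients along rows and columns
    num = 1.0 + p * p_s + q * q_s
    den = np.sqrt(1.0 + p**2 + q**2) * np.sqrt(1.0 + p_s**2 + q_s**2)
    return np.clip(num / den, 0.0, 1.0)       # negative values are self-shadowed

# Synthetic DEM: a smooth dome on a 30 m grid; source slopes picked arbitrarily.
x, y = np.meshgrid(np.linspace(-1, 1, 50), np.linspace(-1, 1, 50))
dem = 200.0 * np.exp(-(x**2 + y**2))
shade = lambertian_shading(dem, 30.0, p_s=0.3, q_s=0.2)
print(round(float(shade.min()), 3), round(float(shade.max()), 3))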

Chapter 5

References
Forster, B.C., 1984. Derivation of atmospheric correction procedures for Landsat MSS with particular reference
to urban data. Int. J. of Remote Sensing . 5(5):799-817.
Horn, B.K.P., 1986. Robot Vision. The MIT Press:Toronto.
Horn, B.K.P., and Woodham, R.J., 1979. Destriping Landsat MSS images by histogram modification. Computer
Graphics and Image Processing. 10:69-83.
Richards, J.A., 1986. Digital Image Processing. Springer-Verlag: Berlin.

Tanre, D., Deuze, J.L., Herman, M., Santer, R., Vermote, E., 1990. Second simulation of the satellite signal in
the solar spectrum - 6S code. IGARSS'90, Washington D.C., p. 187.

Further Readings:
Woodham, R.J., and Gray, M.H., 1987. An analytic method for radiometric correction of satellite multispectral
scanner data. IEEE Transactions on Geosciences and Remote Sensing. 25(3):258-271.

6. Image Enhancement

6.1 Histogram-Based Operation

A histogram of an image can tell us about the data distribution with respect to image grey
levels. The purpose of a histogram-based operation is that when a grey-level
transformation is made, pixels in the image having a specific range of grey levels can be
enhanced or suppressed. This is also called contrast adjustment. It can be done using:
1. histogram stretching
2. histogram compression (Figure 6.1)

Figure 6.1. Histogram stretching and compression

Both histogram stretching and histogram compression can be done either linearly or
nonlinearly.
a) Linear adjustment (Figure 6.2)

DN' = a DN

Figure 6.2.

b) Piece-wise linear adjustment (Figure 6.3)

Figure 6.3.

From Figure 6.3, we can have


The idea of contrast adjustment is the mapping of the range of digital numbers in the original image to a new range. For example, the image displayed using the histogram in Figure 6.4a appears dark on the screen because the majority of pixel grey-level values are lower than 150. We can linearly stretch the histogram to transform the grey-level range (0-150) in Figure 6.4a to a new grey-level range (0-255) in Figure 6.4b.
The following transformation can be used: DN' = (255/150) DN.
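A sketch of this linear stretch, assuming NumPy and the 0-150 to 0-255 mapping of the Figure 6.4 example.

import numpy as np

def linear_stretch(dn, in_min=0, in_max=150, out_min=0, out_max=255):
    """Map the input range (0-150 in the Figure 6.4 example) onto 0-255."""
    dn = np.asarray(dn, dtype=float)
    out = (dn - in_min) * (out_max - out_min) / (in_max - in_min) + out_min
    return np.clip(np.round(out), out_min, out_max).astype(np.uint8)

print(linear_stretch([0, 75, 150, 170]))   # [  0 128 255 255]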

c) Non-linear adjustment (Figure 6.5)

Figure 6.5. An exponential adjustment

We can try to use a = 16 and b = 1/2, i.e. DN' = 16 DN^(1/2).


Other non-linear functions include logarithmic, and even sinusoidal.
d) Non-linear adjustment - histogram equalization
The task of histogram equalization is to transform a histogram of any shape to a histogram
which has the same frequency along the whole range of digital number (Figure 6.6).

Figure 6.6. In the continuous case reshape the histogram

This is realized by equally partitioning the cumulative histogram fc of the original image into 255 pieces. Each piece will correspond to one digital number in the equalized image (Figure 6.7). On the cumulative curve, find the nth dividing point; the original grey level x at that point is assigned the new value DN' = n.

Figure 6.7. For the discrete case, modify the grey level value according to the principle of equal
frequency.

The equalization process can also be considered as a histogram matching method used in
image destriping as discussed in Section 5.1. Here we attempt to match the original
cumulative histogram Fc1 to the new cumulative histogram Fc2 (Figure 6.8).

Figure 6.8.

The following example shows how an equalization can be made in discrete digital form. It starts with the generation of the image histogram (first two columns in Table 6.1). Then the probability Pi is calculated from the frequency f(vi) (third column). A cumulative histogram Fc can be calculated from the frequencies. Similarly, the cumulative distribution function (CDF) can be derived from the probabilities. Based on the cumulative distribution function, we can convert the original grey levels into grey levels of the equalized image (Table 6.2).
Table 6.1 Histogram, cumulative histogram and cumulative distribution function (CDF)

Grey Level (DN)   Frequency f(vi)   Probability Pi   Cumulative histogram Fc   CDF
0                 4                 0.04             4                         0.04
1                 17                0.17             21                        0.21
2                 15                0.15             36                        0.36
3                 18                0.18             54                        0.54
4                 24                0.24             78                        0.78
5                 12                0.12             90                        0.90
6                 0                 0.00             90                        0.90
7                 10                0.10             100                       1.00
Total             100               1.00

Table 6.2 Conversion from the grey levels of the original image to the output image

Input Level   (2^3 − 1) x CDF   Output
0             0.28              0
1             1.47              1
2             2.52              3
3             3.78              4
4             5.46              5
5             6.30              6
6             6.30              6
7             7.00              7
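The mapping in Table 6.2 can be reproduced with a short sketch, assuming NumPy and a 100-pixel image built from the Table 6.1 frequencies.

import numpy as np

def equalize(image, n_levels=8):
    """Discrete histogram equalization as in Tables 6.1-6.2:
    output level = round((n_levels - 1) * CDF(input level))."""
    f = np.bincount(image.ravel(), minlength=n_levels)
    cdf = np.cumsum(f) / image.size
    lut = np.round((n_levels - 1) * cdf).astype(image.dtype)
    return lut[image], lut

# Reproduce the table example: 100 pixels with the frequencies of Table 6.1.
freqs = [4, 17, 15, 18, 24, 12, 0, 10]
img = np.repeat(np.arange(8), freqs).reshape(10, 10)
_, lut = equalize(img, n_levels=8)
print(lut)   # mapping of input levels 0..7 to the output levels of Table 6.2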

6.2 Density Slicing/Color Density Slicing and Pseudo Coloring

Density slicing represents a group of contiguous digital numbers with a single value. Although some detail of the image will be lost, the effect of noise can also be reduced by density slicing. As a result of density slicing, an image may be segmented, or sometimes contoured, into sections of similar grey level. Each of these segments is represented by a user-specified brightness.
Similarly, we can represent sections of grey levels using different colors; this is pseudo-coloring. It has been used for coloring classification maps in most image analysis software systems. For example, five classes can be represented by red, green, blue, yellow, and grey. This can be realized by assigning the red, green, and blue color guns the following values:
Class No   Color    Red Gun   Green Gun   Blue Gun
1          red      255       0           0
2          green    0         255         0
3          blue     0         0           255
4          yellow   255       255         0
5          grey     100       100         100

6.3 Image Operation Based on Spatial Neighbourhoods


6.3.1 Window-based image smoothing - low-pass filters
1. Averaging with equal weights, e.g., a 3 x 3 filter in which every weight is 1/9.


We can also use 5 x 5, 7 x 7, etc. This filter is also called a box-car filter.
2. Averaging with different weights.

The last of these filters can be used to remove drop-out lines in Landsat images. This is done by applying the filter only along the drop-out lines in those images.
3. Median filter
This filter is more useful in removing outliers, random noise, and speckles on RADAR
imagery, than a simple average filter. It has a desirable effect of keeping edges to some
extent. This filter can also be applied to drop-out line removal in some Landsat images.

If we denote an image window centred at (i, j) with elements I(i+m, j+n), m, n = −1, 0, 1, the average filter in 1 can be written as
I'(i, j) = (1/9) Σm Σn I(i+m, j+n),   m, n = −1, 0, 1.
By moving (i, j) all over an image, the original image, I, can be filtered and the new image, I', can be created.
For 2, the same form is used, except that each neighbour is multiplied by its own weight and the sum is divided by the total of the weights.
6.3.2 Window-based edge enhancement - High-pass filters


In order to enhance edges, differences between neighbouring digital numbers are taken.
We will start with a one-dimensional example:
I:  1 1 1 1 2 2 2 2      (an edge lies between the fourth and fifth elements)
By taking I'(i) = I(i+1) − I(i), we get
I': 0 0 0 1 0 0 0
We have suppressed all the unchanging parts and kept only the edge, and thus an enhancement is achieved. We can apply the differencing technique again, to I', to get I'':
I'': 0 0 1 −1 0 0
I''(i) = I'(i+1) − I'(i) = I(i+2) − I(i+1) − I(i+1) + I(i) = I(i+2) − 2I(i+1) + I(i)
so 1 −2 1 are the weights.
The advantage of using second-order differencing is that we can locate the exact position of the edge at the zero-crossing point.

We call the first differencing taking a gradient, and the second differencing taking a Laplacian; the weights 1 −2 1 form a one-dimensional Laplacian filter, an edge-enhancement filter.
In two-dimensional form, a Laplacian filter is:
 0   1   0
 1  −4   1
 0   1   0
Another form can be:
 1   1   1
 1  −8   1
 1   1   1
Sobel filter - a spatial derivative (gradient) operator with the two kernels:
−1   0   1        −1  −2  −1
−2   0   2         0   0   0
−1   0   1         1   2   1
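A sketch applying the Laplacian and Sobel kernels to a synthetic vertical edge, assuming NumPy and SciPy's ndimage.convolve.

import numpy as np
from scipy import ndimage

laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# A vertical edge: grey level 1 on the left half, 2 on the right half.
img = np.ones((5, 8))
img[:, 4:] = 2.0

edges_lap = ndimage.convolve(img, laplacian, mode='nearest')
edges_sob = ndimage.convolve(img, sobel_x, mode='nearest')
# ndimage.sobel(img, axis=1) implements the same operator (up to sign convention).

print(edges_lap[2])   # non-zero only at the two columns beside the edge
print(edges_sob[2])   # strongest response across the edge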

6.3.3 Contrast stretching through high-frequency enhancement


This is also called edge enhancement by subtractive smoothing.

Why don't we use the alternative? Because the contrast will not be as good as that of DN − kDN″.

The question is: can we write DN − kDN″ in filter form? The answer is yes.

6.3.4 Linear Line Detection Templates

With 5 x 5 filters we can have more directions, for example:

6.4 Morphological Filtering


Morphological filtering is one type of processing in which the spatial form or structure of
objects within an image is modified. Dilation, erosion and skeletonization are three
fundamental morphological operations.
In this section, we first introduce binary image morphological filtering. Two types of connectivity are used: four-connectivity (the four horizontal and vertical neighbours of a pixel) and eight-connectivity (those four plus the four diagonal neighbours).

6.4.1 Binary image hit or miss transformations


Basic morphological operations, dilation, erosion and many variants can be defined and
implemented by "hit or miss" transformations. A small odd-sized mask is scanned over a
binary image. If the binary-pattern of a mask matches the state of the pixels under the
mask, an output pixel in spatial correspondence to the center pixel of the mask is set to
some desired binary state. Otherwise, the output is set to the opposite binary state.
For example, to perform simple binary noise cleaning, if the isolated 3 x 3 pixel window (a centre 1 surrounded entirely by 0s)
is encountered, the "1" in the center will be replaced by a "0". Otherwise, the center pixel
value is not changed.

It is often possible to use simple neighbourhood logical relationships to define the


conditions for a hit. For the simple noise removal case, the hit condition is that the centre pixel is 1 and the union of its eight neighbours is 0, where ∧ (∩) denotes the logical AND, i.e. intersection, operation and ∨ (∪) denotes the union.

For simplicity, we use local coordinates to represent a pixel window:

Additive operators
The center pixel of a 3 x 3 pixel window is converted by these operators from the zero state to the one state when a hit is obtained. The basic operators include:
Interior Fill - create one if all four-connected neighbour pixels are one.

Diagonal Fill - create one if this process will eliminate eight-connectivity of the background.

where

Bridge - Create one if this will result in connectivity of previously unconnected


neighbouring ones.

where

and

There are 119 patterns which satisfy the above condition. For example,

Eight-Neighbour Dilate create one if at least one eight-connected neighbour pixel is one.

This is a special case of dilation.


Subtractive operators convert center pixel from one to zero.
Isolated one removal

Spur removal - Erase one with a single eight-connected neighbour

where

H - break - Erase one if it is H-connected

Interior Pixel Removal - Erase one if all 4-connected neighbours are ones

Eight-Neighbour Erode - Erase one if at least one eight-connected neighbour pixel is


zero.

6.4.2 Binary image generalized dilation and erosion


Examples of image set algebraic operations

Generalized Dilation
It is expressed as

where I(i,j), 1 ≤ i, j ≤ N, is a binary-valued image and H(m,n), 1 ≤ m, n ≤ L, with L an odd integer, is called a structuring element. Minkowski addition is defined as the union of the translations of I by every element of H.

In order to compare I(i,j) with I'(i,j), I(i,j) should be translated to TQ(I(i,j)), where Q = ((L−1)/2, (L−1)/2).
Generalized erosion is defined as

where H(m,n) is an odd size LxL structuring element. One formula is

Another formula using the reflection of H as a structuring element:

According to the rules defined above, you can observe what it looks like.
Some properties of dilation and erosion (writing I(i,j) simply as I):
Dilation is commutative: I ⊕ J = J ⊕ I.
But, in general, erosion is not commutative: I ⊖ J ≠ J ⊖ I.
Dilation and erosion are opposite in effect; dilation of the background of an object behaves like erosion of the object.
The following chain rules hold for dilation and erosion:
A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C
A ⊖ (B ⊕ C) = (A ⊖ B) ⊖ C

6.4.3. Binary image close and open operations


Dilation and erosion are often applied to an image in concatenation. A dilation followed by an erosion is called a close operation:
I'(i,j) = I(i,j) • H(m,n) = [I(i,j) ⊕ H(m,n)] ⊖ H(m,n)
The close operation, also called closing, fills gaps and preserves isolated pixels that have a binary value of 1.
An erosion followed by a dilation is called an open operation:
I'(i,j) = I(i,j) ∘ H(m,n) = [I(i,j) ⊖ H(m,n)] ⊕ H(m,n)
The open operation, also called opening, breaks thin connections and clears isolated pixels with binary values of 1.
6.4.4 Grey scale image morphological filtering
Applying mathematical morphology to grey scale images is equivalent to finding the
maximum or the minimum of a neighborhood defined by the structuring element. If a 3X3
neighborhood is taken as a structuring element, then dilation is defined as
I'(i,j) = max (I,I0,I1,I2,I3,I4,I5,I6,I7)
and erosion is defined as
I'(i,j) = min (I,I0,I1,I2,I3,I4,I5,I6,I7).
Similarly, closing refers to a dilation followed by an erosion, while opening means an erosion followed by a dilation. The effect of closing on grey scale images is that small objects brighter than the background are preserved, and bright objects with small gaps in between may become connected. Opening, on the other hand, removes bright objects that are small in size and breaks narrow connections between two bright objects.
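A sketch of these binary and grey-scale operations, assuming SciPy's ndimage module, a 3 x 3 structuring element, and a small synthetic binary image with a thin bridge and an isolated pixel.

import numpy as np
from scipy import ndimage

# Binary example: two blobs joined by a one-pixel-wide bridge, plus an isolated pixel.
img = np.zeros((9, 12), dtype=bool)
img[2:6, 1:4] = True            # blob 1
img[2:6, 8:11] = True           # blob 2
img[3, 4:8] = True              # thin connection
img[7, 6] = True                # isolated pixel

se = np.ones((3, 3), dtype=bool)                      # 3x3 structuring element
dilated = ndimage.binary_dilation(img, structure=se)
eroded = ndimage.binary_erosion(img, structure=se)
opened = ndimage.binary_opening(img, structure=se)    # erosion then dilation
closed = ndimage.binary_closing(img, structure=se)    # dilation then erosion

print(img.sum(), dilated.sum(), eroded.sum(), opened.sum(), closed.sum())
print(opened[3, 5], opened[7, 6])   # the thin bridge and the isolated pixel are gone

# Grey-scale morphology: dilation/erosion become a moving maximum/minimum.
grey = np.random.default_rng(4).integers(0, 255, size=(20, 20))
gd = ndimage.grey_dilation(grey, size=(3, 3))         # local maximum
ge = ndimage.grey_erosion(grey, size=(3, 3))          # local minimum
print((gd >= grey).all(), (ge <= grey).all())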
6.5 Image Enhancement in Multispectral Space - Multispectral Transformation
The multispectral or vector nature of most remote sensing data makes it possible for
spectral transformations to generate new sets of image components or bands. The
transformed image may make evident features not discernable in the original data or
alternatively, it might possibly preserve the essential information content of the image with
a reduced number of the transformed dimensions. The last point has significance for the
display of a data in three dimensions on a colour monitor or in colour hardcopy, and for
transmission and storage of data.
6.5.1 Image arithmetic, band ratios and vegetation indices
Addition, subtraction, multiplication, and division of the pixel brightnesses from two bands of image data form a new image. Multiplication is not as useful as the others.

We can plot the pixel values in a two-dimensional space (Figure 6.10). This two-dimensional diagram is called a scatter plot.

Figure 6.10. A scatterplot


A multispectral space is a coordinate system in which each axis represents the grey-level values of a specific image band.
Ratio
A general band ratio has the form R(i,j) = (Σk ak DNk(i,j)) / (Σk bk DNk(i,j)), where R(i,j) is the ratio for pixel (i,j), the ak and bk are constants (at least one a and one b are non-zero), and nb is the number of bands (k = 1, ..., nb).
Commonly used ratios are simple two-band ratios such as
Rv = DN_NIR / DN_R  -  a ratio that tends to enhance vegetation. It is also called a vegetation index.

Ratioing also allows for shade effect suppression.

Figure 6.11. Reflectances of two types of vegetation


Figure 6.11 shows that for healthy vegetation, the spectral reflectance difference between the NIR band and the R band is quite high. As the vegetation suffers from stress, the difference becomes smaller.
To compare the two conditions we form the ratio of SRNIR to SRR for each: RVN for normal vegetation and RVS for vegetation under stress, where SR represents spectral reflectance.
Since RVN > RVS, the ratios reveal the difference in condition between the two types of vegetation.

Vegetation Indices

Normalized Difference Vegetation Index (NDVI)
NDVI = (DN_NIR − DN_R) / (DN_NIR + DN_R)
This is calculated from the raw remote sensing data. We can also calculate the NDVI using processed remote sensing data (after converting digital numbers to spectral reflectances).
To suppress the effect of different soil backgrounds on the NDVI, Huete (1989) recommended a soil-adjusted vegetation index:
SAVI = (1 + L)(R_NIR − R_R) / (R_NIR + R_R + L), where L is a soil adjustment factor (typically 0.5).
The NDVI is mathematically equivalent to the simple ratio: NDVI = (Rv − 1)/(Rv + 1), where Rv = DN_NIR / DN_R.
Transformed Vegetation Index
TVI = {(DN_NIR − DN_R)/(DN_NIR + DN_R)}^1/2
Perpendicular Vegetation Index
The PVI measures the perpendicular distance of a pixel from the soil line in the R-NIR spectral space.
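A sketch computing the NDVI and the soil-adjusted index, assuming NumPy, reflectance inputs, and the commonly quoted adjustment factor L = 0.5; the three sample reflectance vectors are invented.

import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R), computed band-wise."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red + 1e-12)    # small term avoids division by zero

def savi(nir, red, L=0.5):
    """Soil-adjusted vegetation index, (1 + L)(NIR - R)/(NIR + R + L);
    L = 0.5 is the commonly quoted adjustment factor (assumed here)."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (1.0 + L) * (nir - red) / (nir + red + L)

# Illustrative reflectances: healthy vegetation, stressed vegetation, bare soil.
nir = np.array([0.50, 0.35, 0.30])
red = np.array([0.05, 0.12, 0.25])
print(np.round(ndvi(nir, red), 3))   # [0.818 0.489 0.091]
print(np.round(savi(nir, red), 3))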

6.5.2 Principal component transformation


The dimension of the multispectral space constructed by a remotely sensed image is the
number of spectral bands. For example, a Landsat MSS image constructs a four-dimensional multispectral space. For a Landsat TM image, the multispectral space will have seven dimensions.

For simplicity purpose, two-dimensional data will be used as examples to illustrate the
procedure of principal component transformation. Without loss of generality, the procedure
can be applied to data in multispectral space of any dimension.
The Covariance Matrix and Correlation Matrix
Two examples will be used to illustrate the usefulness of covariance matrix.
Example 1

Pixel and deviation from the mean vector, Xi − M:
X1   (−2, −0.33)
X2   (−1, −1.33)
X3   ( 1, −1.33)
X4   ( 2, −0.33)
X5   ( 1,  1.67)
X6   (−1,  1.67)
Band 2 mean: 2.33

Scatter plot for Example 1


Example 2

Pixel and deviation from the mean vector, Xi − M:
X1   (−1.5, −1.5)
X2   ( 0.5, −0.5)
X3   ( 1.5,  0.5)
X4   ( 1.5,  1.5)
X5   (−0.5,  0.5)
X6   (−1.5, −0.5)
Band means: 3.5, 3.5

Scatter plot of example 2


To calculate the mean vector: M = (1/N) Σ Xi, where N is the number of pixels.
The variance-covariance matrix is V = [1/(N − 1)] Σ (Xi − M)(Xi − M)^T.
Since N is normally very large, we can approximate this by V ≈ (1/N) Σ (Xi − M)(Xi − M)^T.
V is an nb x nb symmetric matrix.
The mean vectors and the deviations (Xi − M) are as listed in the two example tables.
The covariance matrices for the two examples (computed with N − 1 = 5) are
V1 = [2.4  0.0;  0.0  1.87]    and    V2 = [1.9  1.1;  1.1  1.1]
What are the differences between V1 and V2? We can answer this question by examining their corresponding correlation matrices,
R1 = [1  0;  0  1]    and    R2 = [1  0.761;  0.761  1]
From R1, we can see that the correlation between Band 1 and 2 is 0. This means that Band
1 and Band 2 contain independent information about our target. We cannot use B1 to
replace B2.
For R2, the correlation between Band 1 and Band 2 is 0.761, which is quite high. Using
either channel, we can obtain, to a large extent, information about the other channel.

The Principal Component Transformation (PCT)


The PCT, also called the Hotelling transformation, finds a new coordinate system in which the data can be represented without correlation, as in the case of V1. In other words, can we find a coordinate system such that V2 is transformed into a diagonal matrix? The answer can be found in matrix algebra.
The transformation X → Y is carried out with a rotation matrix G:
Y = GX.
G can be found by deriving the eigenvalues and eigenvectors of the covariance matrix Vx. To find the eigenvalues we solve
| Vx − λI | = 0     (1)
where I is an identity matrix and the eigenvalues are λ1, λ2, ..., λnb.
For each non-zero eigenvalue λi, we can find its corresponding eigenvector gi = (gi1, gi2, ..., ginb)^T from
[ Vx − λiI ] gi = 0     (2)
The rotation matrix G is then formed by using the normalized eigenvectors as its rows.

As an example, we will find the eigenvalues and eigenvectors for V2 = [1.9  1.1;  1.1  1.1].
To find the eigenvalues, we use (1): (1.9 − λ)(1.1 − λ) − 1.1² = 0, which gives λ1 = 2.67 and λ2 = 0.33.
Once the transformation is done, the covariance matrix in the new coordinate system is
Vy = [2.67  0;  0  0.33]
Now the results can be interpreted using the data in example 2 (Figure 6.12).

Figure 6.12. The new axes derived from the PCT in the original coordinate system.
B'1 and B'2 are the new axes. In this coordinate system, the data variance along B'1 is 2.67 while the variance along B'2 is only 0.33. This means that in the rotated space the data variance along each axis equals its corresponding eigenvalue.
From 2.67 + 0.33 = 1.90 + 1.10 = 3.00, we can see that the rotation does not affect the total
variance of the original data. Using 1.90/3.00 and 1.10/3.00 we can determine the
percentage of total variances that B1 and B2 represent.
B1 represents 1.90/3.00 = 63.3% of the total variance of the original data
B2 represents 1.10/3.00 = 36.7% of the total variance of the original data
The percentages are called loading of each band.
For B'1, it represents 2.67/3.00 = 89% of the total variance while B'2 contains only 11% of
the total variance.
From the loadings of B'1 and B'2, we can see that after the rotation we can add more
loading in one band while reducing the amount of loading in another band. For
multispectral space with nb dimensions, after the principal component transformation, we
will have a few higher loadings for the first few bands and a very low loading for the rest.
We call those bands containing relatively high loadings the principal components. We can,
therefore, make the use of these principal components in our data analysis while ignoring
those relatively minor components. By so doing, we will not lose much of the original data
variability. This serves the purpose of reducing data dimensionality. Its application in classification (keeping the maximum variance) and in change detection (keeping the minimal variance) normally holds promise.
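A sketch of the principal component transformation applied to the Example 2 deviation vectors listed above, assuming NumPy; eigh supplies the eigenvalues and eigenvectors, and the printed values should match the 2.67/0.33 eigenvalues and 89%/11% loadings discussed in the text.

import numpy as np

# The six grey-level vectors of Example 2, expressed as deviations from the mean.
X = np.array([[-1.5, -1.5], [0.5, -0.5], [1.5, 0.5],
              [1.5, 1.5], [-0.5, 0.5], [-1.5, -0.5]])

V = np.cov(X, rowvar=False)             # covariance matrix (divides by N - 1)
eigvals, eigvecs = np.linalg.eigh(V)    # eigenvalues ascending, eigenvectors in columns
order = np.argsort(eigvals)[::-1]       # sort so the first component has maximum variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

G = eigvecs.T                           # rotation matrix built from the eigenvectors
Y = X @ G.T                             # principal component scores

print(np.round(V, 2))                         # [[1.9 1.1] [1.1 1.1]]
print(np.round(eigvals, 2))                   # [2.67 0.33]
print(np.round(eigvals / eigvals.sum(), 2))   # loadings, about [0.89 0.11]
print(np.round(np.cov(Y, rowvar=False), 2))   # diagonal in the rotated space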
The PCT is a linear transformation technique which helps to enhance remotely sensed imagery. Although principal components are most often used, minor components may also be useful in highlighting low-variability information in the remote sensing data. For example, a few researchers have applied the PCT to multi-temporal change detection and found that the change information of a scene is preserved in the minor components.
6.5.3 Tasselled Cap Transform (K-T transform)
Different from the PCT which is based on the data covariance matrix, Kauth and Thomas
(1976) have developed a linear transformation which is physically-based on crop growth.

Figure 6.13. A 3-D data scatterplot of the multispectral space constructed by the
green, red and near-infrared bands (Which looks like a tasselled cap.)
The growing cycle of a crop starts from bare soil, proceeds to green vegetation, and ends with crop maturation as the crop turns yellow. These stages of vegetation growth make the data distribution in the three-dimensional multispectral space (Figure 6.13) appear in the shape of a tasselled cap.
Kauth and Thomas defined a linear transformation to enhance the data according to this data structure. They defined four components, called redness (soil), greenness (vegetation), yellowness and noise, using the following transformation matrix for Landsat MSS data

Later, Crist, Cicone and Kauth developed a new transformation technique for Landsat TM
data (Crist and Cicone, 1984; Crist and Kauth, 1986).
Their new brightness (redness) and greenness are defined as:
Brightness = 0.3037 TM1 + 0.2793 TM2 + 0.4743 TM3
+ 0.5586 TM4 + 0.5082 TM5 + 0.1863 TM7
Greenness = -0.2848 TM1 - 0.2435 TM2 - 0.5436 TM3
+ 0.7243 TM4 + 0.0840 TM5 - 0.1800 TM7 .
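The two TM components above are simply weighted sums of the six reflective bands. The sketch below assumes the bands are available as a numpy array stacked in the order TM1, TM2, TM3, TM4, TM5, TM7; the array and its values are illustrative.

```python
import numpy as np

# Tasselled cap brightness and greenness for Landsat TM, using the
# coefficient vectors listed above (bands TM1, TM2, TM3, TM4, TM5, TM7).
BRIGHTNESS = np.array([0.3037, 0.2793, 0.4743, 0.5586, 0.5082, 0.1863])
GREENNESS  = np.array([-0.2848, -0.2435, -0.5436, 0.7243, 0.0840, -0.1800])

def tasselled_cap(tm_bands):
    """tm_bands: array of shape (6, rows, cols) holding TM1-5 and TM7."""
    flat = tm_bands.reshape(6, -1)                         # one column per pixel
    brightness = (BRIGHTNESS[:, None] * flat).sum(axis=0)
    greenness = (GREENNESS[:, None] * flat).sum(axis=0)
    shape = tm_bands.shape[1:]
    return brightness.reshape(shape), greenness.reshape(shape)

# Hypothetical 2 x 2 pixel image of digital numbers.
tm = np.random.randint(0, 255, size=(6, 2, 2)).astype(float)
b, g = tasselled_cap(tm)
```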
Chapter 6

References
Crist, E.P., and Cicone, R.C., 1984. A physically-based transformation of Thematic Mapper data - the
Tasseled Cap. IEEE Transactions on Geoscience and Remote Sensing, GE-23:256-263.
Crist, E.P., and Kauth, R.J., 1986. The Tasseled Cap de-mystified. Photogrammetric Engineering and Remote
Sensing, 52(1):81-86.
Huete, A.R., 1989. Soil influences in remotely sensed vegetation canopy spectra. In Theory and Applications
of Optical Remote Sensing, ed. by G. Asrar, John Wiley and Sons: New York.
Kauth, R.J., and Thomas, G.S., 1976. The tasseled cap - a graphic description of the spectral-temporal development
of agricultural crops as seen by Landsat. Proceedings of the Symposium on Machine Processing of Remotely Sensed
Data, Purdue University, West Lafayette, Indiana, pp. 4B41-51.
Pratt, W., 1991. Digital Image Processing. John Wiley and Sons: Toronto.
Richards, J.A., 1987. Digital Image Processing. Springer-Verlag, Berlin.

7. Information Extraction
7.1 Image Interpretation
To derive useful spatial information from images is the task of image interpretation. It
includes
detection: such as searching for hot spots in mechanical and electrical facilities and white
spots in x-ray images. This procedure is often used as the first step of image interpretation.
identification: recognition of certain targets. A simple example is to identify vegetation
types, soil types, rock types and water bodies. The higher the spatial/spectral resolution of
an image, the more detail we can derive from the image.
delineation: to outline the recognized target for mapping purposes. Identification and
delineation combined together are used to map certain subjects. If the whole image is to be
processed by these two procedures, we call it image classification.
enumeration: to count certain phenomena from the image. This is done based on
detection and identification. For example, in order to estimate household income of the
population, we can count the number of various residential units.
mensuration: to measure the area, the volume, the amount, and the length of a certain target
from an image. This often involves all the procedures mentioned above. Simple examples
include measuring the length of a river and the acreage of a specific land-cover class. More
complicated examples include an estimation of timber volume, river discharge, crop
productivity, river basin radiation and evapotranspiration.
In order to do a good job in image interpretation, and in later digital image analysis,
one has to be familiar with the subject under investigation, the study area and the remote
sensing system available to him. Usually, a combined team consisting of the subject
specialists and the remote sensing image analysis specialists is required for a relatively
large image interpretation task.
Depending on the facilities that an image interpreter has, he might interpret images in raw
form, corrected form or enhanced form. Correction and enhancement are usually done
digitally.
Elements on which image interpretation is based
Image tone, grey level, or multispectral grey-level vector

Human eyes can differentiate over 1000 colours but only about 16 grey levels. Therefore,
colour images are preferred in image interpretation. One difficulty involved is the use of
multispectral images with a dimensionality of more than 3. In order to make use of all the
information available in each band of the image, one has to somehow reduce the image
dimensionality.
Image texture
Spatial variation of image tones. Texture is used as an important clue in image
interpretation. It is very easy for human interpreters to include it in their mental process.
Most texture patterns appear irregular on an image.
Pattern
Regular arrangement of ground objects. Examples are a residential area on an aerial
photograph and regularly arranged mountain ridges on a satellite image.
Association
A specific object co-occurring with another object. Some examples of association are an
outdoor swimming pool associated with a recreation center and a playground associated
with a school.
Shadow
Object shadow is very useful when the phenomena under study have vertical variation.
Examples include trees, high buildings, mountains, etc.
Shape
Agricultural fields and human-built structures have regular shapes. These can be used to
identify various targets.
Size
Relative size of buildings can tell us about the type of land uses while relative sizes of tree
crowns can tell us about the approximate age of trees.
Site
Broadleaf trees are distributed in lower and warmer valleys while coniferous trees tend to
be distributed at higher elevations, up towards the tundra. Location is therefore used in image
interpretation.

Image interpretation strategies


- Direct recognition: identification of targets, e.g. land-cover classification.
(Land cover is the physical cover of the earth's surface.)
- Indirect interpretation
To map something that is not directly observable in the image. This is used to classify land-use
types (Gong and Howarth, 1992b). Land use is the human activity on a piece of land.
It is closely related to land-cover types. For example, a residential land-use type is
composed of roof cover, lawn, trees and paved surfaces.
- From known to unknown
Interpret the areas that the interpreter is familiar with first, then interpret the areas that the
interpreter is not familiar with (Chen et al., 1989). This can be assisted by field observation.
- From direct to indirect
In order to obtain forest volume, one might first have to determine what is observable from the
image, such as tree canopies, shadows, etc. The volume can then be derived. Similarly, we can
estimate the depth of permafrost using surface-cover information (Peddle, 1991).
- Use of collateral information
Census data, topographic maps and other thematic maps may all be useful during
image interpretation.
More details on the image interpretation can be found in Lillesand and Kiefer (1994) or
Campbell (1987).
7.2 Image Segmentation
Dividing an image into relatively homogeneous regions or blocks.
1. Thresholding - global operation

Multilevel thresholding assigns to each pixel the code n of the grey-level interval into which
its value I(i,j) falls, where n is the code of the segment and N is the maximum grey-level value.

e.g. with a single threshold T = 4, each pixel of the example image I(i,j) is assigned 1 where
I(i,j) >= T and 0 otherwise, producing the thresholded image.

Normally T is determined from the histogram of an image, for example by placing it at a
valley between two histogram peaks.
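A thresholding rule of this kind is only a per-pixel comparison. The sketch below assumes numpy, a hypothetical grey-level image and an illustrative threshold.

```python
import numpy as np

# Global thresholding: label pixels by the interval their grey level falls into.
I = np.array([[2, 5, 7, 1],
              [3, 4, 6, 2],
              [8, 9, 1, 0],
              [4, 5, 2, 3]])            # hypothetical grey-level image

T = 4
binary = (I >= T).astype(int)           # 1 where I(i,j) >= T, else 0

# Multilevel version: thresholds split the grey-level range into intervals,
# and np.digitize returns the segment code n for each pixel.
thresholds = [3, 6]
multilevel = np.digitize(I, thresholds)
```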

2. Region-growing - local operation


The eight neighbours of the seed pixel I in a 3 x 3 window are labelled I0, I1, ..., I7,
arranged clockwise around the centre:

I0 I1 I2
I7 I  I3
I6 I5 I4

(1) Suppose the seed pixel I (the starting point) has label K; then a neighbour Ii will also
belong to K if |Ii - I| < e,
where e is a small tolerance number, and i = 0, 1, 2, ..., 7.
Create a mean m1 from I and the second point that is assigned to K.
(2) If no second point is found in the local neighbourhood, then remove the label K
from the seed point I.
(3) If a second point is found, then apply (1) again from the second point, using m1 in place
of I. If a third point Ij is found, a new mean m2 is generated from m1 and Ij.
(4) Gradually grow a local area by using the criterion in (1). Each time an nth point is found,
the running mean is adjusted to the group mean of all n points, i.e.
m(new) = ((n - 1) m(old) + In) / n .

(5) Repeat (1) to (4) with different seeds and tolerances e.
Thresholding is faster; however, it is not adaptive to local properties. For example, thresholding
a neighbourhood with a threshold of 4 labels each pixel purely by its own grey level, whereas
with the region-growing technique, if the seed is I = 2 and e = 1 and no neighbouring pixel lies
within the tolerance, the seed will not be assigned a segment label because no neighbourhood
pixel meets the criterion in (1).
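A minimal region-growing sketch along the lines of steps (1)-(5) is given below. It assumes a single-band numpy image, one seed and a fixed tolerance, and grows a single region with a running group mean; all names and values are illustrative.

```python
import numpy as np
from collections import deque

def grow_region(image, seed, tol):
    """Grow one region from `seed` (row, col): a neighbour joins the region
    when its grey level differs from the running region mean by less than `tol`."""
    rows, cols = image.shape
    label = np.zeros_like(image, dtype=bool)
    label[seed] = True
    mean, count = float(image[seed]), 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols \
                        and not label[rr, cc] and abs(image[rr, cc] - mean) < tol:
                    label[rr, cc] = True
                    count += 1
                    mean += (image[rr, cc] - mean) / count   # running group mean
                    queue.append((rr, cc))
    return label

# Hypothetical use: grow a region from pixel (2, 2) with tolerance 1.5.
img = np.random.randint(0, 10, size=(6, 6)).astype(float)
region = grow_region(img, (2, 2), 1.5)
```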
Image segmentation can also be done using clustering algorithms. Segmentation is usually
used as the first step in image analysis. Once an image is properly segmented, the
following operations can be performed: classification, morphological operations, and image
understanding through knowledge-based or more advanced computation.
7.3 Conventional Multispectral Classification Methods
7.3.1 General procedures in image classification
Classification is the most widely used information extraction technique in digital
remote sensing. In image space I, a classification unit is defined as the image segment on
which a classification decision is based. A classification unit could be a pixel, a group of
neighbouring pixels or the whole image. Conventional multispectral classification
techniques perform class assignments based only on the spectral signatures of a
classification unit. Contextual classification refers to the use of spatial, temporal, and
other related information, in addition to the spectral information of a classification unit in
the classification of an image. Usually, it is the pixel that is used as the classification unit.
General image classification procedures include (Gong and Howarth 1990b):

(1) Design the image classification scheme: the classes are usually information classes such as urban,
agriculture, forest areas, etc. Conduct field studies and collect ground information and other
ancillary data for the study area.
(2) Preprocessing of the image, including radiometric, atmospheric, geometric and
topographic corrections, image enhancement, and initial image clustering.
(3) Select representative areas on the image and analyze the initial clustering results or
generate training signatures.
(4) Image classification
Supervised mode: using training signatures
Unsupervised mode: image clustering and cluster grouping
(5) Post-processing: complete geometric correction, filtering and decoration of the
classification map.
(6) Accuracy assessment: compare classification results with field studies.
The following diagram shows the major steps in two types of image classification:
Supervised:

Unsupervised

In order to illustrate the differences between supervised and unsupervised classification, we
will introduce two concepts: the information class and the spectral class.
Information class: a class specified by an image analyst. It refers to the information to be
extracted.
Spectral class: a class which includes similar grey-level vectors in the multispectral space.

In an ideal information extraction task, we can directly associate a spectral class in the
multispectral space with an information class. For example, we have in a two dimensional
space three classes: water, vegetation, and concrete surface.

By defining boundaries among the three groups of grey-level vectors in the two-dimensional space, we can separate the three classes.
One of the differences between a supervised classification and an unsupervised one is the
ways of associating each spectral class to an information class. For supervised
classification, we first start with specifying an information class on the image. An
algorithm is then used to summarize multispectral information from the specified areas on
the image to form class signatures. This process is called supervised training. For the
unsupervised case, however, an algorithm is first applied to the image and some spectral
classes (also called clusters) are formed. The image analyst then tries to assign each spectral
class to the desired information class.

7.3.2 Supervised classification


Conventional Pixel-Labelling Algorithms in Supervised Classification
A pixel-labelling algorithm is used to assign a pixel to an information class. We can use the
previous diagram to discuss ways of doing this.

From the above diagram, there are two obvious ways of classifying this pixel.

(1) Multidimensional thresholding


As in the above diagram, we define two threshold values along each axis for each class. A
grey-level vector is classified into a class only if it falls between the thresholds of that class
along each axis.
The advantage of this algorithm is its simplicity. The drawback is the difficulty of
including all possible grey-level vectors into the specified class thresholds. It is also
difficult to properly adjust the class thresholds.
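A sketch of this multidimensional (parallelepiped-style) thresholding rule is given below; the per-class lower and upper bounds for each band would come from the analyst, and the numbers used here are hypothetical.

```python
import numpy as np

# A pixel is assigned to a class only when its grey level lies between that
# class's lower and upper thresholds in every band (two bands, three classes).
lower = np.array([[10, 20], [60, 15], [30, 70]])   # per class: (band1, band2) minima
upper = np.array([[40, 55], [90, 45], [55, 110]])  # per class: (band1, band2) maxima

def parallelepiped(pixel):
    pixel = np.asarray(pixel)
    inside = np.all((pixel >= lower) & (pixel <= upper), axis=1)
    hits = np.flatnonzero(inside)
    return int(hits[0]) if hits.size == 1 else -1   # -1: unclassified or ambiguous

print(parallelepiped([35, 50]))   # falls inside class 0 only
```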
(2) Minimum-Distance Classification

Fig. 1 shows spectral curves of two types of ground target: vegetation and soil. If we
sample the spectral reflectance values for the two types of targets (bold-curves) at three
spectral bands: green, red and near-infrared as shown in Fig. 1, we can plot the sampled
values in the three dimensional multispectral space (Fig. 2). The sampled spectral values
become two points in the multispectral space. Similar curves in Fig. 1 will be represented
by closer points in Fig. 2 (the two dashed curves in Fig. 1 are shown as empty dots in Fig. 2).
From Fig. 2, we can easily see that distance can be used as a similarity measure for classification.
The closer the two points, the more likely they are in the same class.
We can use various types of distance as similarity measures to develop a classifier, i.e.
minimum-distance classifier.
In a minimum-distance classifier, suppose we have nc known class centers
C = {C1, C2, ..., Cnc}, Ci, i = 1, 2, ..., nc is the grey-level vector for class i.

As an example, we show a special case in Fig. 3 where we have 3 classes (nc = 3) and two
spectral bands (nb = 2)

If we have a pixel with a grey-level vector located in the B1-B2 space shown as A (an
empty dot), we are asked to determine to which class it should belong. We can calculate
the distances between A and each of the centers. A is assigned to the class whose center
has the shortest distance to A.
In a general form, an arbitrary pixel with a grey-level vector g = (g1, g2, ..., gnb)T
is classified as Ci if
d(Ci, g) = min { d(C1, g), d(C2, g), ..., d(Cnc, g) } .
Now, what form should the distance d take? The most popularly used form is the
Euclidean distance
de(Ci, g) = sqrt( Σj (gj - cij)² ) .
The second popularly used distance is the Mahalanobis distance
dm(Ci, g) = sqrt( (g - Ci)T V⁻¹ (g - Ci) )
where V⁻¹ is the inverse of the covariance matrix of the data.

If the Mahalanobis distance is used, we call the classifier a Mahalanobis classifier.

The simplest distance measure is the city-block distance
dc(Ci, g) = Σj | gj - cij | .
For dm and de, because taking their squares will not change the relative magnitude among
distances, in minimum-distance classifiers we usually use dm² and de² as the distance
measures so as to save some computation.
Class centers C and the data covariance matrix V are usually determined from training
samples if a supervised classification procedure is used. They can also be obtained from
clustering.
For example, if there are ns pixels selected as training samples for class Ci, each component
of its center is the band mean
cij = (1/ns) Σk xjk ,
where j = 1, 2, ..., nb and k = 1, 2, ..., ns.
If there are a total of nt pixels selected as training samples for all the classes, the average
vector M = (m1, m2, ..., mnb) is obtained from
mi = (1/nt) Σk xik , i = 1, 2, ..., nb, k = 1, 2, ..., nt.
The covariance matrix is then obtained through the following vector form
V = (1/(nt - 1)) Σk (xk - M)(xk - M)T .
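A compact sketch of a minimum-distance classifier is given below, assuming class centers already estimated from training samples and using the squared Euclidean distance; the arrays are hypothetical.

```python
import numpy as np

# Minimum-distance-to-mean classification with squared Euclidean distance.
centres = np.array([[20.0, 30.0],    # class 0 centre (two bands)
                    [80.0, 40.0],    # class 1 centre
                    [50.0, 90.0]])   # class 2 centre

def min_distance_classify(pixels, centres):
    diff = pixels[:, None, :] - centres[None, :, :]      # (npix, nclass, nbands)
    d2 = (diff ** 2).sum(axis=2)                         # squared distances
    return d2.argmin(axis=1)                             # index of nearest centre

pixels = np.array([[25.0, 35.0], [55.0, 85.0]])
labels = min_distance_classify(pixels, centres)          # -> array([0, 2])
```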

(3) Maximum Likelihood Classification (MLC)


MLC is the most commonly used classification method for remotely sensed data. MLC is
based on Bayes' rule.
Let C = (C1, C2, ..., Cnc) denote a set of classes, where nc is the total number of classes.
For a given pixel with a grey-level vector x, the probability that x belongs to class Ci is
P(Ci|x), i = 1, 2, ..., nc. If P(Ci|x) is known for every class, we can determine into which
class x should be classified. This can be done by comparing the P(Ci|x)'s, i = 1, 2, ..., nc:
x => Ci, if P(Ci|x) > P(Cj|x) for all j ≠ i. (1)
However, P(Ci|x) is not known directly. Thus, we use Bayes' theorem:
P(Ci|x) = p(x|Ci) P(Ci) / P(x)
where
P(Ci) is the probability that Ci occurs in the image, called the a priori probability;
P(x) is the overall probability of x occurring in the image.

However, P(x) is not needed for the classification because when we compare P(C1|x)
with P(C2|x), P(x) cancels from each side. Therefore, the conditional probabilities p(x|Ci),
i = 1, 2, ..., nc, are what have to be determined. One solution is through statistical
modelling. This is done by assuming that the conditional probability density function
(PDF) is normal (also called a Gaussian distribution). If we can find the PDF for each class
and the a priori probability, the classification problem is solved. For p(x|Ci) we use
training samples.

For the one-dimensional case, we can see from the above figure that by generating training
statistics of two classes, we have their probability distributions. If we used these statistics
directly, it would be difficult because a large amount of computer memory is required. The
Gaussian normal distribution model can be used to save the memory. The one-dimensional
Gaussian distribution is:
p(x|Ci) = (1 / (sqrt(2π) σi)) exp( -(x - μi)² / (2σi²) )
where we only need two parameters for each class i:
μi, the mean for Ci, and
σi, the standard deviation of Ci.
μi and σi can be easily generated from the training samples.

For higher dimensions,
p(x|Ci) = (2π)^(-nb/2) |Vi|^(-1/2) exp( -(1/2)(x - μi)T Vi⁻¹ (x - μi) )
where nb is the dimension (number of bands),
μi is the mean vector of Ci, i = 1, 2, ..., nc, and
Vi is the covariance matrix of Ci.

P(Ci) can also be determined with knowledge about an area. If the priors are not known, we can
assume that each class has an equal chance of occurrence,
i.e. P(C1) = P(C2) = ... = P(Cnc).
With the knowledge of p(x|Ci) and P(Ci), we can conduct maximum likelihood
classification: p(x|Ci) P(Ci), i = 1, 2, ..., nc, can be compared instead of P(Ci|x) in (1).

The interpretation of the maximum likelihood classifier is illustrated in the above figure.
An x is classified according to the maximum p(x|Ci) P(Ci). x1 is classified into C1, x2 is
classified into C2. The class boundary is determined by the point of equal probability.

In two-dimensional space, the class boundary cannot be easily determined. Therefore we do
not use explicit boundaries in maximum likelihood classification; instead, we compare
probabilities.
Actual implementation of MLC
In order to simplify the computation, we usually take the logarithm of p(x|Ci) P(Ci):
ln[ p(x|Ci) P(Ci) ] = -(nb/2) ln 2π - (1/2) ln|Vi| - (1/2)(x - μi)T Vi⁻¹ (x - μi) + ln P(Ci).
Since -(nb/2) ln 2π is a constant, the right-hand side can be simplified to
gi(x) = ln P(Ci) - (1/2) ln|Vi| - (1/2)(x - μi)T Vi⁻¹ (x - μi). (2)
Often, we assume P(Ci) is the same for each class. Therefore (2) can be further simplified to
gi(x) = - ln|Vi| - (x - μi)T Vi⁻¹ (x - μi). (3)
g(x) is referred to as the discriminant function.
By comparing the gi(x)'s, we can assign x to the proper class.
With the maximum likelihood classifier, it is guaranteed that the error of misclassification
is minimal if p(x|Ci) is normally distributed.
Unfortunately, the normal distribution cannot always be achieved. In order to make the
best use of the MLC method, one has to make sure that his training sample will generate
distributions as close to the normal distribution as possible.
How large should one's training sample be? Usually, one needs 10 x nb, preferably 100 x
nb, pixels in each class (Swain and Davis, 1978).
MLC is relatively robust but it has limitations when handling data at nominal or ordinal
scales. Its computational cost also increases considerably as the image dimensionality
increases.
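The discriminant function in (3) takes only a few lines once the class means and covariance matrices have been estimated. The sketch below assumes equal priors and made-up two-band training statistics.

```python
import numpy as np

# Gaussian maximum likelihood classification with the discriminant
# g_i(x) = -ln|V_i| - (x - mu_i)^T V_i^{-1} (x - mu_i), assuming equal priors.
means = [np.array([20.0, 30.0]), np.array([70.0, 45.0])]      # hypothetical class means
covs  = [np.array([[25.0, 5.0], [5.0, 16.0]]),
         np.array([[36.0, -8.0], [-8.0, 49.0]])]              # hypothetical covariances

inv_covs = [np.linalg.inv(V) for V in covs]
log_dets = [np.log(np.linalg.det(V)) for V in covs]

def mlc_label(x):
    g = []
    for mu, Vinv, logdet in zip(means, inv_covs, log_dets):
        d = x - mu
        # with unequal priors, the ln P(Ci) term from (2) would be added as well
        g.append(-logdet - d @ Vinv @ d)
    return int(np.argmax(g))

print(mlc_label(np.array([25.0, 28.0])))     # -> 0 for these made-up statistics
```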
7.3.3 Clustering algorithms
For images for which the user has little knowledge of the number and the spectral properties of
the spectral classes, clustering is a useful tool to determine the inherent data structure. Clustering
in remote sensing is the process of automatically grouping pixels with similar spectral
characteristics.
Clustering measures - measure how similar two pixels are. The similarity is based on:
(1) the Euclidean distance dE(x1, x2), or
(2) the city-block distance dC(x1, x2).
Clustering criteria - determine how good the clustering results are. A common criterion is the
Sum of Squared Error (SSE),
SSE = Σi Σ(x in cluster i) || x - mi ||² ,
where mi is the mean (center) of cluster i.
Clustering algorithm 1: Moving cluster means


K-means clustering (also called c-means clustering)
1. Select K points in the multispectral space as candidate cluster centres. Let these points be
m1(1), m2(1), ..., mK(1).
Although the initial centres can be arbitrarily selected, it is suggested that they be spread
evenly in the multispectral space; for example, they can be selected along the diagonal axis
going through the origin of the multispectral space.
2. Assign each pixel x in the image to the closest cluster centre mi(n).
3. Generate a new set of cluster centres m1(n+1), ..., mK(n+1) as the means of the pixels
assigned to each cluster in step 2; n is the number of iterations of step 2.
4. If mi(n+1) = mi(n) for every cluster (or the centres move by less than a small tolerance),
the procedure is terminated. Otherwise return to step 2 and continue.
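A bare-bones version of these four steps is sketched below. It assumes the pixels are given as an (npixels, nbands) array and initializes the K centres along the diagonal of the data range; those choices are illustrative, not the only ones possible.

```python
import numpy as np

def kmeans(pixels, K, max_iter=100, tol=1e-4):
    """Steps 1-4 above; pixels is an (npixels, nbands) array of grey-level vectors."""
    lo, hi = pixels.min(axis=0), pixels.max(axis=0)
    # Step 1: initial centres spread along the diagonal of the data range.
    centres = lo + (np.arange(1, K + 1) / (K + 1))[:, None] * (hi - lo)
    for _ in range(max_iter):
        # Step 2: assign each pixel to the closest centre.
        d2 = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: recompute centres as the mean of their assigned pixels.
        new_centres = np.array([pixels[labels == k].mean(axis=0)
                                if np.any(labels == k) else centres[k]
                                for k in range(K)])
        # Step 4: stop when the centres no longer move appreciably.
        if np.all(np.abs(new_centres - centres) < tol):
            break
        centres = new_centres
    return labels, centres

data = np.random.rand(500, 3) * 255          # hypothetical 3-band pixel vectors
labels, centres = kmeans(data, K=4)
```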

Clustering Algorithm 2

ISODATA - Iterative Self Organizing Data Analysis Technique A


Based on the K-means algorithm, ISODATA adds two additional steps to optimize the
clustering process.
1. Merging and deletion of clusters
At a suitable stage, e.g. after a number of iterations of steps 2 - 4 in the K-means algorithm,
all the clusters mi, i = 1, 2, ..., nc, are examined.
If the number of pixels in a particular cluster is too small, then that particular cluster is
deleted.
If two clusters are too close, then they are merged into one cluster.
2. Splitting a cluster
If the variance of a cluster is too large, that cluster can be divided into two clusters.
These two steps increase the adaptivity of the algorithm but also increase the complexity of
computation. Compared to K-means, ISODATA requires more specification of parameters
for deletion and merging and a variance limit for splitting. Variance has to be calculated for
each cluster.
In the K-means algorithm, clustering may not be realized, i.e., the clustering is not
converging. Therefore, we might have to specify the number of iterations to terminate a
clustering process.
Clustering Algorithm 3: Hierarchical clustering.
This algorithm does not require an image analyst to specify the number of classes
beforehand. It assumes that all pixels are individual clusters and systematically merges
clusters by checking distances between means. This process is continued until all pixels are
in one cluster. The history of merging (fusion) is recorded and they are displayed on a
dendrogram, which is a diagram that shows at what distances the centers of particular
clusters are merged. The following figure shows an example of this algorithm.

This procedure is rarely used in remote sensing because starting with a relatively large number
of pixels as individual cluster centers requires a huge amount of disk storage in order to keep
track of the cluster distances at the various levels. However, this algorithm can be used when a
smaller number of clusters has been obtained previously by some other method.
Clustering Algorithm 4: Histogram-based clustering.
This algorithm works with the histogram in the high-dimensional multispectral space: H(V) is
the occurrence frequency of grey-level vector V. The algorithm finds peaks in the
multi-dimensional histogram:
(1) Construct a multi-dimensional histogram.
(2) Search for peaks in the multispectral space using an eight-neighbour comparison
strategy to see if the center frequency is the highest in a 3 x 3 grey-level vector
neighbourhood. For a three-dimensional space, search for the peak in a 3 x 3 x 3 neighbourhood.

(3) If a local highest frequency grey-level vector is found, it is recorded as a cluster center.

(4) After all centers are found, they are examined according to the distance between each
pair of clusters. Certain clusters can be merged if they are close together. If a cluster center
has a low frequency it can be deleted.
The disadvantage of this algorithm is that it requires a large amount of memory space
(RAM). For an 8-bit image, we require 256 x 4 bytes to store the frequencies (each frequency
being a 4-byte integer) if the image has only one band. As the dimensionality becomes higher,
we need 256^nb x 4 bytes of memory; when nb = 3, this is 256³ x 4 bytes, or 64 MB.
Nevertheless, this limit can partly be overcome by a grey-level vector reduction algorithm
(Gong and Howarth, 1992a).
7.3.4 Accuracy assessment
Accuracy assessment of remote sensing product
The process from remote sensing data to a cartographic product can be summarized as
follows:

The reference against which remote sensing products are compared is created based on
human generalization. Depending on the scale of the reference map product, linear features
and object boundaries are allowed a buffer zone: as long as the boundaries fall within their
respective buffer zones, they are considered correct.
However, this has not been the case in assessing remote sensing products. In the evaluation
of remote sensing products, we have traditionally adopted a hit-or-miss approach, i.e.,
overlaying the reference map on top of the map product obtained from remote sensing,
instead of giving the remote sensing products tolerance buffers.

Some of the classification accuracy assessment algorithms can be found in Rosenfield and
Fitzpatrick-Lins (1986) and Story and Congalton (1986).
In the evaluation of classification errors, a classification error matrix is typically formed.
This matrix is sometimes called confusion matrix or contingency table. In this table,
classification is given as rows and verification (ground truth) is given as columns for each
sample point.

The above table is an example confusion matrix. The diagonal elements in this matrix
indicate the numbers of samples for which the classification results agree with the reference
data.
The matrix contains the complete information on the categorical accuracy. The off-diagonal
elements in each row are the numbers of samples that have been misclassified by the
classifier, i.e., the classifier is committing a label to samples which actually belong to
other labels. This misclassification error is called commission error.
The off-diagonal elements in each column are the samples omitted by the classifier for that
category. This misclassification error is therefore called omission error.

In order to summarize the classification results, the most commonly used accuracy measure
is the overall accuracy:
overall accuracy = (sum of the diagonal elements) / (total number of samples).
From the example confusion matrix, we can obtain
overall accuracy = (28 + 15 + 20)/100 = 63%.

More specific measures are needed because the overall accuracy does not indicate how the
accuracy is distributed across the individual categories. The categories can, and frequently
do, exhibit drastically different accuracies, yet the overall accuracy measure treats them as
if they all had equivalent accuracies.
By examining the confusion matrix, it can be seen that at least two methods can be used to
determine individual category accuracies.
(1) The ratio between the number of correctly classified samples and the row total;
(2) The ratio between the number of correctly classified samples and the column total.
(1) is called the user's accuracy because users are concerned about what percentage of the
classified pixels have been correctly classified.
(2) is called the producer's accuracy. The producer is more interested in (2) because it tells
how correctly the reference samples have been classified.
However, there is a more appropriate way of presenting the individual classification
accuracies. This is through the use of commission error and omission error.
Commission error = 1 - user's accuracy
Omission error = 1 - producer's accuracy
Kappa coefficient
The Kappa coefficient (κ) measures the agreement beyond chance in the error matrix. This
measure uses all the elements in the matrix and not just the diagonal ones. The estimate of
Kappa is the proportion of agreement after chance agreement is removed from consideration:
κ = (po - pc) / (1 - pc)
po = proportion of units which agree = Σ pii = overall accuracy
pc = proportion of units expected to agree by chance = Σ pi+ p+i
pij = eij / NT, where eij are the elements of the error matrix and NT is the total number of samples
pi+ = row subtotal of pij for row i
p+i = column subtotal of pij for column i

po = 0.63

One of the advantages of using this method is that we can statistically compare two
classification products. For example, two classification maps can be made using different
algorithms, and we can use the same reference data to verify them. Two Kappa estimates,
κ1 and κ2, can be derived, and for each κ the variance var(κ) can also be calculated.
It has been suggested that a z-score be calculated by
z = (κ1 - κ2) / sqrt( var(κ1) + var(κ2) ).
A normal distribution table can then be used to determine whether the two κ's are
significantly different; e.g. if |z| > 1.96, the difference is said to be significant at the 0.95
probability level.
var(κ) can be estimated from the elements of the error matrix.
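These accuracy measures follow directly from the error matrix, as in the sketch below. The 3 x 3 matrix is hypothetical except for its diagonal (28, 15, 20) and total of 100, which match the 63% example above; the off-diagonal counts are made up.

```python
import numpy as np

# Error (confusion) matrix: rows = classification, columns = reference data.
E = np.array([[28, 5, 7],
              [6, 15, 4],
              [8, 7, 20]], dtype=float)

NT = E.sum()
p = E / NT                                       # pij
po = np.trace(p)                                 # overall accuracy (0.63 here)
pc = (p.sum(axis=1) * p.sum(axis=0)).sum()       # chance agreement: sum of pi+ * p+i
kappa = (po - pc) / (1 - pc)

users_accuracy = np.diag(E) / E.sum(axis=1)      # correct / row total
producers_accuracy = np.diag(E) / E.sum(axis=0)  # correct / column total
commission_error = 1 - users_accuracy
omission_error = 1 - producers_accuracy

print(po, kappa)
```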

Given the above procedures, we need to know how many samples need to be collected
and where they should be placed.
Sample size
(1) The larger the sample size, the more representative the estimate that can be obtained, and
therefore the more confidence that can be achieved.
(2) In order to give each class a proper evaluation, a minimum sample size should be
applied to every class.
(3) Researchers have proposed a number of pixel sampling schemes (e.g., Jensen, 1983).
These are:
Random
Stratified Random
Systematic
Stratified Systematic Unaligned Sampling
7.4 Non-Conventional Classification Algorithms
By conventional classification, we refer to algorithms which make use of only
multispectral information in the classification process.
The problem with purely multispectral classification is that no spatial information in the image
is utilized. In fact, that is a key difference between human interpretation and
computer-assisted image classification. Human interpretation always involves the use of
spatial information such as texture, shape, shade, size, site, association, etc. While the
strength of computer techniques lies in the handling of the grey-level values in the image,
in terms of making use of spatial information computer techniques lag far behind.
Making use of spatial patterns in an image is therefore an active field in image
understanding (a subfield of pattern recognition and artificial intelligence).
We can summarize three general types of non-conventional classification:
Preprocessing approach,
Post processing approach, and
Use of contextual classifier.
Diagram 1 shows the procedures involved in a preprocessing method. The indispensable
part of a preprocessing classification method is the involvement of spatial-feature
extraction procedures.
Thanks to the development in the image understanding field, we are able to use part of
the spatial information in image classification. Overall, there are two types of approaches
to make use of spatial information.
- Region-based classification (object-based)
- Pixel window-based classification

Object-based classification
In order to classify objects, one has to somehow partition the original imagery. This can be
done with image segmentation techniques that have been introduced previously, such as
thresholding, region-growing and clustering.
The resultant segmented image can then be passed on to the region extraction procedure,
where segments are treated as a whole object for the successive processing.
For instance, we can generate a table for each object as an entity table. From the entity
table, we can proceed with various algorithms to complete classification, or prior to
classification, we may do some preprocessing, such as filtering out some small objects.

We may have to base our classification decision on some neighbourhood information.


Gong and Howarth (1990a) have developed a knowledge-based system to conduct a region-based (object-based) classification.
Pixel-window based classification
In a pixel-window based classification, a labelling decision is made for one pixel according
to the multi-spectral data. This data contains information on not only the pixel but also its
neighbourhood.
A pixel window can be of any size, as long as it does not exceed the size of an image. For
computational simplicity, however, odd-sized squares are used.

The grey-level variability within a pixel window can be measured and used in a
classification algorithm. The grey-level variability is referred to as texture (Haralick,
1979). The following are some commonly used texture measures:
(1) Simple statistics transformation
For each pixel window, we can calculate the parameters listed in Table 7.4 (Hsu, 1978; Gong
and Howarth, 1993); a short computational sketch follows the table.
TABLE 7.4. STATISTICAL MEASURES USED FOR SPATIAL FEATURE EXTRACTION

Feature Code   Full Name
AVE            Average
STD            Standard Deviation
SKW            Skewness
KRT            Kurtosis
ADA            Absolute Deviation from the Average
CCN            Contrast Between the Center Pixel and its Neighbors
ACN            Average Difference Between the Center Pixel and its Neighbors
CAN            Contrast Between Adjacent Neighbors
CAS            Sum of the Squared CAN
CSN            Contrast Between the Second Neighbors
CSS            Sum of the Squared CSN
RXN            Range
MED            Median
______________________________________________________________________
The mathematical descriptions of these measures are written in terms of: the pixel value at
each location in the window, the value of the center pixel, the values for a pair of adjacent
pixels, the values for a pair of every second neighbors, the number of pixels in the window,
the number of pairs of adjacent neighbors, and the number of pairs of every second neighbors.
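A few of these window statistics can be computed directly from a single pixel window, as in the sketch below. The 3 x 3 window is hypothetical, only a subset of the table's measures is shown, and since the original formulas are not reproduced in the table, ACN is computed here as a plausible mean absolute difference.

```python
import numpy as np

# A handful of the Table 7.4 measures for one pixel window (hypothetical values).
window = np.array([[12, 15, 14],
                   [11, 13, 16],
                   [10, 12, 15]], dtype=float)

centre = window[1, 1]
ave = window.mean()                                            # AVE
std = window.std()                                             # STD
ada = np.abs(window - ave).mean()                              # ADA
rxn = window.max() - window.min()                              # RXN (range)
med = np.median(window)                                        # MED
acn = np.abs(np.delete(window.flatten(), 4) - centre).mean()   # ACN

print(ave, std, ada, rxn, med, acn)
```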

(2) Grey-level co-occurrence matrix method (used to characterize textures)

The matrix is determined by enumerating all possible combinations of two grey levels for
pairs of pixels in a pixel window. These pixel pairs are defined by their distance (D) and
angle (a).
From the grey-level co-occurrence matrix, one can generate a number of parameters
(Haralick et al., 1973). These include
Homogeneity
Contrast
Entropy, etc.
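A small sketch of how such a co-occurrence matrix and a few of its parameters might be computed for one window is given below (pure numpy; distance D = 1 and angle a = 0, i.e. horizontal pairs; grey levels quantized to small integers). The exact parameter definitions from Haralick et al. (1973) are not reproduced in the text, so common forms are used here.

```python
import numpy as np

def glcm(window, levels, d=1):
    """Grey-level co-occurrence matrix for horizontal pixel pairs (angle 0, distance d)."""
    M = np.zeros((levels, levels))
    for row in window:
        for a, b in zip(row[:-d], row[d:]):
            M[a, b] += 1
            M[b, a] += 1          # count pairs symmetrically
    return M / M.sum()            # normalize to joint probabilities

window = np.array([[0, 1, 1, 2],
                   [0, 0, 1, 2],
                   [2, 1, 0, 0]])                 # hypothetical quantized grey levels

P = glcm(window, levels=3)
i, j = np.indices(P.shape)
contrast = ((i - j) ** 2 * P).sum()               # contrast
entropy = -(P[P > 0] * np.log(P[P > 0])).sum()    # co-occurrence entropy
homogeneity = (P / (1.0 + np.abs(i - j))).sum()   # inverse-difference (homogeneity)
```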
Although these methods have been used in many remote sensing applications, they require
a large amount of computation and disk space. There are so many parameters that need to
be determined, such as size of pixel-window, distance, angle, statistics, etc.
Most of these spatial features can be categorized into two groups. The first group of spatial
features is similar to an average-filtered image. The second group is similar to an edge-enhanced image.
The simplest example of post-processing contextual classification is filtering of the classified
image, such as majority filtering (a short sketch is given below).
(3) Majority filter
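A minimal majority-filter sketch for a labelled (classified) image: each pixel is replaced by the most frequent label in its 3 x 3 neighbourhood. Plain numpy is assumed and the label image is hypothetical.

```python
import numpy as np

def majority_filter(labels):
    """Replace each pixel by the most frequent class label in its 3 x 3 window."""
    padded = np.pad(labels, 1, mode='edge')
    out = labels.copy()
    rows, cols = labels.shape
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + 3, c:c + 3].ravel()
            values, counts = np.unique(window, return_counts=True)
            out[r, c] = values[counts.argmax()]
    return out

classified = np.random.randint(0, 3, size=(8, 8))   # hypothetical class labels
smoothed = majority_filter(classified)
```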

(4) Grey-level vector reduction and frequency-based classification.


After testing a number of pixel-window based contextual classification algorithms, Gong
and Howarth (1992a) found that most of these algorithms either required too much
computation, or did not significantly improve classification accuracies when they were
applied to the classification of SPOT HRV XS data acquired over an urban area. They
developed a procedure called a grey-level vector reduction and a frequency-based
classification which was tested using the same SPOT data set and some other data sets,
such as TM data and CASI (7.5 m x 7.5 m spatial resolution) data. The results proved that
the frequency-based classification method could save a significant amount of computation
while achieving high classification accuracies.
Chapter 7

References
Chen, Q., and others, 1989. Remote Sensing and Image Interpretation. Higher Education Press, Beijing, China, (In
Chinese).
Gong P. and P.J. Howarth, 1990a. Land cover to land use conversion: a knowledge-based approach, Technical
Papers, Annual Conference of American Society of Photogrammetry and Remote Sensing, Denver, Colorado, Vol. 4,
pp.447-456.
_____, 1990b. An assessment of some factors influencing multispectral land-cover classification,
Photogrammetric Engineering and Remote Sensing, 56(5):597-603.
_____, 1990c. Impreciseness in land-cover classification: its determination, representation and application. The
International Geoscience and Remote Sensing Symposium, IGARSS '90, pp. 929-932.
_____, 1992a. Frequency-based contextual classification and grey-level vector reduction for land-use
identification. Photogrammetric Engineering and Remote Sensing, 58(4):421-437.
_____, 1992b. Land-use classification of SPOT HRV data using a cover-frequency method. International Journal
of Remote Sensing, .
_____, 1993. An assessment of some small window-based spatial features for use in land-cover classification,
IGARSS'93, Tokyo, August 18-22, 1993.
Gonzalez, R. C., and P. Wintz, 1987. Digital Image Processing, 2nd. Ed., Addison-Wesley Publishing Company,
Reading, Mass.
Haralick, R. M., 1979. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786-804.
Haralick, R. M., Shanmugan, K. and Dinstein, I., 1973. Texture features for image classification. IEEE
Transactions on System, Man and Cybernetics, SMC-3(6):610-621.

Hsu, S., 1978. Texture-tone analysis for automated landuse mapping. Photogrammetric Engineering and Remote
Sensing, 44(11):1393-1404.
Jensen, J.R., 1983. Urban/Suburban Land Use Analysis. In R.N. Colwell (editor-in-chief), Manual of Remote
Sensing, Second Edition, American Society of Photogrammetry, Falls Church, USA, pp. 1571-1666.
Lillesand, T. M., and R. W. Kiefer, 1994. Remote Sensing and Image Interpretation. 3rd Edition, John Wiley and
Sons, New York.
Peddle, D., 1991. Unpublished Masters Thesis, Department of Geography, The University of Calgary.
Richards, J. A., 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag, Berlin.
Rosenfield, G. H., and K. Fitzpatrick-Lins, 1986. A coefficient of agreement as a measure of thematic
classification accuracy. Photogrammetric Engineering and Remote Sensing, 52(2):223-227.
Story, M. and R. G. Congalton, 1986. Accuracy assessment, a user's perspective. Photogrammetric Engineering
and Remote Sensing, 52(3):397-399.
Swain, P. H., and S. M. Davis (editors.), 1978. Remote Sensing: The Quantitative Approach. McGraw-Hill, New
York.
Yen, J., 1989. Gertis: a Dempster-Shafer approach to diagnosing hierarchical hypotheses. Communications of the
ACM. 32(5):573-585.

Further Readings
Ball, G. H., and J. D. Hall, 1967. A clustering technique for summarizing multivariate data. Behavioral Science,
12:153-155.
Bezdek, J.C., R. Ehrlich & W. Fall, 1984, FCM: the fuzzy c-means clustering algorithm, Computers and
Geoscience, 10:191-203.
Bishop, Y. M. M., S. E. Feinberg, and P. W. Holland, 1975. Discrete Multivariate Analysis - Theory and Practice.
The MIT Press, Cambridge, Mass.
Chittineni, C. B., 1981. Utilization of spectral-spatial information in the classification of imagery data.
Computer Graphics and Image Processing, 16:305-340.
Cibula, W. G., M. O. Nyquist, 1987, Use of topographic and climatological models in geographical data base
to improve Landsat MSS classification for Olympic national park. Photogrammetric Engineering and Remote
Sensing, 53(1):67-76.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, Vol.
20, No. 1, pp. 37-46.
Congalton, R. G., and R. A. Mead, 1983. A quantitative method to test for consistency and correctness in
photointerpretation. Photogrammetric Engineering and Remote Sensing, 49(1):69-74.

Conners, R. W., and C. A. Harlow, 1980. A theoretical comparison of texture algorithms. IEEE Transactions on
Pattern Analysis and Machine Intelligence, PAMI-2(3): 204-222.
Fleiss, J. L., J. Cohen, and B. S. Everitt, 1969. Large sample standard errors of Kappa and weighted Kappa.
Psychological Bulletin, Vol. 72, No. 5, pp. 323-327.
Fu, K. S and Yu, T. S., 1980. Spatial Pattern Classification Using Contextual Information, Research Studies Press,
Chichester, England.
Fung, T., and E. F. LeDrew, 1987. Land cover change detection with Thematic Mapper spectral/textural data
at the rural-urban fringe. Proceedings of 21st Symposium on Remote Sensing of Environment, Ann Arbor, Mi., Vol. 2,
pp.783-789.
_____, 1988. The determination of optimal threshold levels for change detection using various accuracy
indices. Photogrammetric Engineering and Remote Sensing, 54(10):1449-1454.
Gong, P., D. Marceau, and P. J. Howarth, 1992. A comparison of spatial feature extraction algorithms for
land-use mapping with SPOT HRV data. Remote Sensing of Environment. 40:137-151.
Gong, P., J. R. Miller, J. Freemantle, and B. Chen, 1991. Spectral decomposition of Landsat TM data for urban
land-cover mapping, 14th Canadian Symposium on Remote Sensing, pp.458-461.
Ketting, R. J., and Landgrebe, D. A., 1976. Classification of multispectral image data by extraction and
classification of homogeneous objects. IEEE Transactions on Geoscience and Electronics, GE-14(1):19-26.
Landgrebe, D. A. and E. Malaret, 1986. Noise in remote sensing systems: the effects on classification error.
IEEE Transactions on Geoscience and Remote Sensing, GE-24(2):

8. Integrated Analysis of Multisource Data


8.1 Introduction to Multi-Source Spatial Data
1. Spatial Data
Any data with an associated locational aspect are spatial data. In real life, we often ask the question of
where. Where is the bus stop? Where is the post office? Knowing where is a major part of human life.
In our computerized information society, most questions of where can be answered in a
computer system. However, we are not satisfied with knowing only where something is; we may need
to know how things at a specific location are related. We want to use what is known to infer the
unknown aspects and the unknown locations.
From highly urbanized areas to sparsely populated areas, spatial data play an important role in our
modern society. In this chapter, we will focus on our natural environment, where different types of
natural resources, land covers and uses, and accessibility often concern us. To find out what exists at a
particular location, one would read a map. As surveyors or cartographers, it is our job to make such
maps. Traditionally, one has to go to the field (a particular place) to measure the location and record
what exists there. This is the traditional survey and mapping approach. A second approach is to use
aerial photography and remote sensing techniques; these techniques have been developed since
World War I. As the technology advances, we observe a revolutionary leap in instruments and the
associated data processing techniques. Satellite-based technology now occupies an important
position in the geomatics field. We begin by asking the following questions:
What are spatial data?
What are the general approaches to spatial data collection?
What is the current status of spatial data acquisition technology?
What will be achieved in the near future in spatial data acquisition?
_________
An image is a medium for communication; high-resolution TV is a tool for communication.
Computers provide us the processing power.
Telecommunications are the tools for transmission.
Think about FAX machines and modems.

2. Spatial Data Collection


How are spatial data collected?

First hand
Second hand - digitizing from maps.
Knowing how spatial data are collected helps us to appreciate the possible level of errors or
uncertainties involved in the data collection process.
In what forms are spatial data collected? How is spatial sampling done?
Random collection
Systematic sampling or complete coverage
Other hybrids of the first two
One needs to determine the density of sampling; obviously, the more densely one collects data, the
more closely one is likely to represent reality.
The density of sampling is a function of a number of factors,
(1) the complexity of the phenomena,
(2) the capability of the measuring tools
(3) the available accuracy requirement
(4) economic considerations
Most of the time, we tend to use second-hand spatial data, i.e., currently available data, which are
often in map form.
How are maps made?
For thematic maps,
(1) Manually
Base map preparation
Thematic data transfer (from surveys, aerial photographs or remote sensing images) onto the base map
Interpolation or extrapolation may be needed
Classification, generalization, symbolization and decoration
Layer separation and printing

(2) Computer-assisted
Base map preparation


Geometric transformation (include interpolation)
Data conversion, classification, generalization
Legend design and decoration
Printing
For base maps,
Select proper map projection
- preserving area
- preserving length
- preserving direction
Data transfer
Interpolation or extrapolation
Generalization, symbolization and decoration
Printing

Reference: Robinson, Elements of Cartography


What is a thematic map?
What is a base map?
What is a reference map?
What is a topographic map?

3. Types of Spatial Data

According to geometrical properties


Positional

Linear
Areal
Volumetric
According to thematic entities
Natural resources, forest, geological, lithological, agricultural, climatic
Man-made
Municipal
Cadastral, etc.

4. Scale of Measurement In Spatial Data


Nominal
Ordinal
Interval
Ratio

5. Multi-Source Data Analysis


Map overlay: for a particular location, collect all the necessary data so as to derive useful
information.
Similar to decision making in ordinary life, where one needs to accumulate evidence in order to
arrive at a decision, in multi-source data analysis each piece of evidence recorded in the data is
evaluated to validate a certain hypothesis.
It is the objective of this chapter to examine a number of schemes for integrated analysis of spatial
data. Algorithms developed in pattern recognition and artificial intelligence can be used.

8.3 Integrated Analysis of Multi-Source Data

In daily life, we use our sensing organs and brain to recognize things and then make decisions and
take actions. Our sensing organs include the eyes, ears, nose, tongue and skin; the first three are our
remote sensors. Our sensors pass scenes, sounds, smells, tastes and feelings to our brain. The brain
processes the evidence collected by the different sensors, analyzes it, and then compares it with things
in our memory that have been recognized before, to see whether, based on the data collected, we can
recognize (label) the newly detected thing as one of the things recognized before. If the
recognized thing is a tree in our way, our brain may decide to go around it. In an increasingly
competitive society, in order to make optimized decisions, we have to make the best use of all the
evidence that is available to arrive at an accurate recognition. In our daily life we experience
thousands of processes like this: evidence collection - evidence analysis - decision making - action
taking. Our senses also have limits; for example, our eyes cannot resolve details that are either too far
away or too small, and seeing such details has been made possible with the help of the telescope and
the microscope.

We cannot see in spectral ranges outside the visible wavelength region, but various detectors
sensitive to different non-visible regions can record images for us to see as if our eyes were sensitive
to those spectral regions. In spatial data handling, our brains cannot memorize exactly the location and
spatial extent that a certain phenomenon occupies, but electro-magnetic media can be used to do so.
The volume of evidence is so large that our brain can only process a very small amount of it; therefore,
we need computers to assist us. In this chapter, we examine some of the techniques that can be used in
computer-assisted handling of various kinds of spatial evidence, especially the integrated analysis of
spatial evidence from multiple sources, such as field surveys, remote sensing and/or existing map
sources.
Data integration: integrating spatial data from different sources for a single application. What types of
applications are we referring to?
One problem in data integration is:
incompatibility between spatial data sets, in the following aspects:
data structures
data types
spatial resolutions
levels of generalization
- Data structures: Raster vs. Vector

Aspect                               Raster               Vector
Concept of spatial representation    cell                 object
Location                             (i, j)               {(xi, yi)}
Entity/attribute being represented   incomplete/broken    complete
Ease of representing                 discrete phenomena   continuous phenomena; more flexible
Level of generalization              low                  high
Communication                        hard                 easy
Storage                              large amount         less
_________
Is overlay of digital files a data integration method?
Yes, a very preliminary one. Given two data sets A = {(x, y) : z} and B = {(x, y) : u}, the overlay is
A ∪ B = {(x, y) : z, u}.
It is more or less a data accumulation.
Five types of models:
PM Point Model : (x, y, z, ...)
GM Grid Model : (i, j, z, ...), where i, j are row and column indices
LM Line Model : ({x, y}, z, ...)
AM Area Model : ({x, y}, z, ...)
CM Cartographic Model : traditionally {PM, LM, AM}; now {PM, GM, LM, AM}
_________
* An important extension: the 3rd spatial dimension and the temporal dimension.
* Discussion: do PM, LM and AM involve scale as one of their individual components?
No; data acquisition error and processing error are involved. Only CM involves scale.
* Scale, generalization, error and uncertainty are so interrelated that they deserve some conceptual
clarification.
Models for converting between different data models
Aggregation (from a low generalization level to higher levels)
(1) Point -> Surface : interpolation
(2) Grid -> Larger grid : majority rule; composite rules based on statistics
(3) Line -> Simplified line
(4) Area -> Simplified area, or point
_________
Comment on why we need (3).

Disaggregation
Boundaries -> Probability surfaces

_________
(1) Mark and Csillag's model (1989)
Homogeneity is broken only at the boundaries.

(2) Goodchild et al. (1992): spatial autoregressive model

X' = r W X + e

where
X = {xi} is the input class image, with each xi taking a value in {0, 1} or {A, B, ...},
X' = {x'i} is the resulting real-valued surface, x'i ∈ R,
e = {ei}, with each ei a random number obeying a (0, S²) distribution,
r is a spatial dependence factor, and
W is an N x N weight matrix of interactions between pixels.
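A toy simulation of this model is sketched below, assuming a rook-neighbour weight matrix on a small grid and illustrative values for r and S (none of which are taken from Goodchild et al.).

```python
import numpy as np

# Toy spatial autoregressive surface X' = r W X + e on an n x n grid.
n, r, s = 8, 0.2, 0.05                                # grid size, dependence, noise s.d.
N = n * n
X = np.random.randint(0, 2, size=N).astype(float)     # binary class indicator image {0, 1}

# Rook-neighbour weight matrix W (1 where two cells share an edge).
W = np.zeros((N, N))
for i in range(n):
    for j in range(n):
        k = i * n + j
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ii, jj = i + di, j + dj
            if 0 <= ii < n and 0 <= jj < n:
                W[k, ii * n + jj] = 1.0

e = np.random.normal(0.0, s, size=N)                  # ei ~ (0, S^2)
X_prime = r * W @ X + e                               # real-valued disaggregated surface
surface = X_prime.reshape(n, n)
```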
_________
The problem is: do we need to disaggregate our data? What are the uncertainties involved in the
disaggregation process?

8.4 A Review of Probability Theory

Let W denote a finite collection of mutually exclusive statements about the world. By e = 2^W we
denote the set of all events. The empty set ∅, a subset of every set by definition, is called the impossible
event, since the outcome of a random selection can never be an element of ∅. On the other hand, the set
W itself always contains the actual outcome; therefore it is called the certain event. If A and B are
events, then so are the union of A and B, A ∪ B, and the complements of A (written Ā) and
of B (written B̄), respectively. For example, the event A ∪ B occurs if and only if A occurs or B occurs.
We call the pair (W, e) the sample space. A function P: e -> [0, 1] is a probability if it satisfies the
following conditions, which are well known as the Kolmogorov axioms:
(1) P(A) >= 0 for all A ⊆ W
(2) P(W) = 1
(3) For A, B ⊆ W, if A ∩ B = ∅ then
P(A ∪ B) = P(A) + P(B)


P(A) or P(B) is known as the prior probability of A or B occurring. The prior probability of an event is
not conditioned on the occurrence of any other event. Suppose it is noted in an experiment that, for
1 <= i <= n, the event Ai occurred ki times. Then under the conventional evaluation, called the maximum
likelihood evaluation:
Pm(Ai) = ki / (k1 + k2 + ... + kn)
but under an alternative evaluation, called the Bayesian evaluation:
Pb(Ai) = (ki + 1) / (k1 + k2 + ... + kn + n) .
Under this evaluation, we implicitly assume that each event has already occurred once even before the
experiment commenced. When the total count Σki is large,
Pm(Ai) ≈ Pb(Ai).
Nevertheless,
0 < Pb(Ai) < 1 .
Let P(A|B) denote the probability of event A occurring conditioned on event B having already
occurred. P(A|B) is known as the posterior probability of A subject to B, or the conditional probability
of A given B.
For a single event A, A ⊆ W, the following hold:
P(A) <= 1
P(Ā) = 1 - P(A)
For A ⊆ B and A, B ⊆ W:
P(A) <= P(B) (monotonicity)
P(A ∪ B) <= P(A) + P(B) (subadditivity)
P(B ∩ Ā) = P(B) - P(A) (subtractivity)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Finally, for a number of events {Ai | i = 1, ..., n},
P(A1 ∪ A2 ∪ ... ∪ An) = S1 - S2 + S3 - S4 + ... + (-1)^(n-1) Sn
where S1 = Σi P(Ai)
S2 = Σ(i<j) P(Ai ∩ Aj)
S3 = Σ(i<j<k) P(Ai ∩ Aj ∩ Ak)
...
Sn = P(A1 ∩ A2 ∩ ... ∩ An)
For the conditional probability P(A|B), with A, B ⊆ W and P(B) > 0, define
P(A|B) = P(A ∩ B) / P(B) .
We then have the chain rule
P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) ... P(An|A1 ∩ ... ∩ An-1),
where Ai ⊆ W.
If Ai, i = 1, ..., n, are events in the sample space (W, e), and
Ai ∩ Aj = ∅ for i ≠ j, the union of all the Ai equals W, and P(Ai) > 0,
then for any given event B
P(B) = Σi P(B|Ai) P(Ai) .
_________
This is the complete (total) probability of event B. Therefore the conditional probability can be written as
P(Ai|B) = P(B|Ai) P(Ai) / Σj P(B|Aj) P(Aj) .
This is the Bayes formula. A number of different versions of this formula will be discussed.
Since P(Ā) = 1 - P(A) and P(A|B) = P(A ∩ B)/P(B),
it can be derived that
P(Ā|B) = 1 - P(A|B) .
Definition: the prior odds on event A are
O(A) = P(A) / P(Ā) .
Since P(Ā) = 1 - P(A), O(A) = P(A) / (1 - P(A)).
Therefore P(A) can be recovered from its prior odds:
P(A) = O(A) / (1 + O(A)) .
Definition: the posterior odds on event A conditioned on event B are
O(A|B) = P(A|B) / P(Ā|B) .
Similarly,
O(A|B) = P(A|B) / (1 - P(A|B)) and thus
P(A|B) = O(A|B) / (1 + O(A|B)) .
Assume event A is a hypothesis h and event B is a piece of evidence e (we write ~e for "e is absent"
and ~h for "h is false"). With the definition of conditional probability, the following hold:
P(h|~e) = P(~e|h) P(h) / P(~e) ,
P(~h|~e) = P(~e|~h) P(~h) / P(~e) ,
P(h|e) = P(e|h) P(h) / P(e) , and
P(~h|e) = P(e|~h) P(~h) / P(e) .
The odds on h conditioned on e being absent are obtained by:
O(h|~e) = P(h|~e) / P(~h|~e) = [ P(~e|h) / P(~e|~h) ] O(h) .
This is called an odds-likelihood formulation of the Bayes theorem. Depending on the context, the
following expressions can be used synonymously: e does not occur, e is absent, e does not exist and e
is false.
Similarly,
O(h|e) = [ P(e|h) / P(e|~h) ] O(h) .
This is also an odds-likelihood formulation of the Bayes theorem. The following expressions can be
used synonymously: e occurs, e is present, e exists and e is true.
For a hypothesis supported by multiple pieces of evidence, by generalizing the above we have
O(h|~e1 ~e2 ... ~ek ek+1 ... em) = [ P(~e1 ~e2 ... ~ek ek+1 ... em|h) / P(~e1 ~e2 ... ~ek ek+1 ... em|~h) ] O(h),
and when all pieces of evidence are mutually independent given h and given ~h,
O(h|~e1 ~e2 ... ~ek ek+1 ... em) = O(h) Π(i=1..k) [ P(~ei|h) / P(~ei|~h) ] Π(i=k+1..m) [ P(ei|h) / P(ei|~h) ] .
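A tiny numerical sketch of this odds-likelihood updating is given below, assuming a made-up prior probability and made-up likelihoods for three conditionally independent pieces of evidence (two observed present, one absent).

```python
def odds(p):
    return p / (1.0 - p)

def prob(o):
    return o / (1.0 + o)

# Hypothetical prior for hypothesis h and likelihoods P(e|h), P(e|~h)
# for three conditionally independent pieces of evidence.
prior_h = 0.30
likelihoods = [(0.80, 0.20),   # e1 observed present
               (0.60, 0.30),   # e2 observed present
               (0.10, 0.50)]   # e3 observed absent -> use P(~e|h)/P(~e|~h)
present = [True, True, False]

o = odds(prior_h)
for (p_e_h, p_e_nh), seen in zip(likelihoods, present):
    if seen:
        o *= p_e_h / p_e_nh                      # sufficiency-type update
    else:
        o *= (1 - p_e_h) / (1 - p_e_nh)          # necessity-type update
posterior_h = prob(o)
print(posterior_h)
```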

8.5 Application Of The Probability Theory

In logical terms, when e implies h, that is e -> h, this can be alternatively read as 'e is sufficient for h'
or as 'h is necessary for e'. There is no ambiguity between e and h, i.e., the reliability is 100%.
However, in reality, the reliability of e in support of h is lower than that of a logical implication.

8.5.1. Necessity and Sufficiency Measures

A piece of evidence e can usually be in one of two states: absent or present. When P(e) = 0 or
P(e) = 1, it is of no practical interest: either way there is nothing to observe. The same holds for h.
Therefore, we shall assume 0 < P(e) < 1 and 0 < P(h) < 1.
To study the necessity and sufficiency measures of e for h, we need to explore the influence that a
state of e has on h. If the state of e makes h more plausible, we say that the state of e encourages h. If it
makes h less plausible, we say that the state of e discourages h. If it neither encourages nor
discourages h, then the state of e has no influence on h, i.e. e and h are independent of each other.
For the necessity measure, we first explore how the absence of e influences h. From O(h|~e) = N O(h)
we define
N = P(~e|h) / P(~e|~h) , with 0 <= N < infinity.
Similarly, from O(h|e) = S O(h) we define the sufficiency measure S, and the following cases can be
distinguished:
S -> infinity : P(h|e) = 1 , e -> h, i.e. e is sufficient for h
1 < S : P(h|e) > P(h) , e encourages h
S = 1 : no influence
0 < S < 1 : P(h|e) < P(h) , e discourages h
S = 0 : P(h|e) = 0 , e -> ~h, i.e. e is sufficient for ~h.
From the above analysis, it is clear that N and S are measures of necessity and sufficiency,
respectively. N, S and O(h), needed to evaluate O(h|~e) and O(h|e), are provided by domain experts.
Quite often, instead of directly supplying N and S, domain experts may supply values of P(e|h) and
P(e|~h), i.e. the probabilities of observing the evidence under hypothesis h or ~h. Then
N = P(~e|h) / P(~e|~h) = (1 - P(e|h)) / (1 - P(e|~h)) and
S = P(e|h) / P(e|~h) .
8.5.2. Posterior Probability Estimation


In the above section, it has been explained that in order to determine the necessity and sufficiency
measures N and S, the probabilities such as P(e|h) and P(e|~h) are provided by domain
experts. Sometimes, the system engineer may have to participate in the process of determining P(e|h)
and P(e|~h), as will be explained in a later part of this lecture (e.g., classification of land-use/cover types
from remotely sensed images).
In spatial data handling, domain experts may provide us with the spatial data required, or we may be
requested to collect further data from sources such as remote sensing images. Domain experts may also
provide us with their knowledge on where a specific hypothesis has been validated. It may be our
responsibility to transform this type of knowledge into a computer system. The processes of collecting
and encoding expert knowledge are called knowledge acquisition and knowledge representation,
respectively. While various complex computer structures for knowledge representation may be used,
relatively simple procedures such as the use of parametric statistical models or non-parametric look-up
tables are often adopted. For the parametric method, a further reading is Richards (1986); for the
non-parametric approach, refer to Duda and Hart (1973). Remote sensing image classification can be
considered as a process of hypothesis testing in which remotely sensed data are treated as evidence and
a number of classes represent a list of hypotheses. In remote sensing image classification, the
equivalent of the processes of knowledge acquisition and representation is supervised training (Gong
and Howarth, 1990; Gong and Dunlop, 1991).

8.5.3. Maximum Likelihood Decision Rule Based on Penalty Functions


In a classification problem, we are given a data set X = {xi | i = 1, 2, ..., N}; each xi, being a vector, is
considered as a piece of evidence. It may support a number of classes (hypotheses)
H = {hj | j = 1, 2, ..., M}. To develop the general method for maximum likelihood classification, the
penalty function (or loss function) is introduced:
l(j|k) , j, k = 1, ..., M .
This is a measure of the loss or penalty incurred when a piece of evidence is taken to support class hj
when in fact it should support class hk. It is reasonable to assume that l(j|j) = 0 for all j; this implies that
there is no loss when a piece of evidence supports the correct class. For a particular piece of evidence
xi, the penalty incurred when xi erroneously supports hj instead of hk is
l(j|k) p(hk|xi)
where p(hk|xi) is, as before, the posterior probability that hk is the correct class for evidence xi.
Averaging the penalty over all possible hypotheses, we obtain the average penalty, called the
conditional average loss, associated with evidence xi supporting class hj:
L(hj) = Σk l(j|k) p(hk|xi) .
L is a measure of the accumulated penalty incurred given that the evidence could have supported any
of the available classes, weighted by the penalty functions relating all these classes to class hj.
Thus a useful decision rule for evaluating a piece of evidence for support of a class is to choose the class for which the average loss is smallest, i.e.,

xi encourages hj , if L(hj) < L(hk) for all k ≠ j .

This is the algorithm that implements Bayes' rule. Because p(hk|xi) is usually not available directly, it is evaluated from p(xi|hk), p(hk) and p(xi) through Bayes' formula,

p(hk|xi) = p(xi|hk) p(hk) / p(xi) .

Thus

L(hj) = (1 / p(xi)) Σ_{k=1}^{M} l(j|k) p(xi|hk) p(hk) .
The penalty values l(j|k) can be defined by domain experts.
A special case for the l(j|k) is given as follows. Suppose l(j|k) = 1 - F_jk with F_jj = 1 and the remaining F_jk to be defined. Then from the above formula we have

L(hj) = Σ_{k=1}^{M} (1 - F_jk) p(hk|xi) = 1 - Σ_{k=1}^{M} F_jk p(hk|xi) ,

since Σ_k p(hk|xi) = 1. The minimum penalty decision rule thus becomes searching for the maximum of

g(hj) = Σ_{k=1}^{M} F_jk p(hk|xi) = (1 / p(xi)) Σ_{k=1}^{M} F_jk p(xi|hk) p(hk) .

Thus the decision rule is

xi encourages hj , if g(hj) > g(hk) for all k ≠ j .

If F_jk = δjk , the delta function, i.e.,

δjk = 1 if j = k , and δjk = 0 otherwise ,

g(hj) is further simplified to

g(hj) = p(xi|hj) p(hj) / p(xi) ,

and, since p(xi) is the same for every class, the decision rule becomes

xi encourages hj , if p(xi|hj) p(hj) > p(xi|hk) p(hk) for all k ≠ j .

This is the commonly-used maximum likelihood decision rule.
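A minimal sketch of this rule (Python with NumPy; the class-conditional values, priors and penalty matrix are invented for illustration): it evaluates the conditional average loss for every class and shows that, with the zero-one penalty l(j|k) = 1 - δjk, the minimum-loss choice coincides with the maximum of p(xi|hj) p(hj).

```python
import numpy as np

def classify_min_loss(p_x_given_h, priors, penalty):
    """Choose the class j minimizing L(h_j) = sum_k l(j|k) p(h_k|x_i).

    p_x_given_h : p(x_i|h_k) for each class k
    priors      : p(h_k) for each class k
    penalty     : M x M array with penalty[j, k] = l(j|k)
    """
    posterior = p_x_given_h * priors
    posterior = posterior / posterior.sum()      # Bayes' formula for p(h_k|x_i)
    avg_loss = penalty @ posterior               # conditional average loss L(h_j)
    return int(np.argmin(avg_loss))

# Hypothetical three-class example.
p_x_given_h = np.array([0.02, 0.10, 0.05])       # class-conditional densities at x_i
priors      = np.array([0.5, 0.3, 0.2])
zero_one    = 1.0 - np.eye(3)                    # l(j|k) = 1 - delta_jk

print(classify_min_loss(p_x_given_h, priors, zero_one))   # minimum-loss class
print(int(np.argmax(p_x_given_h * priors)))                # same class by the ML rule
```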

8.6 Introduction To Fuzzy Set Theory

A fuzzy set is a "class" with a continuum of grades of membership (Zadeh, 1965). More often than not, the classes of objects encountered in the real physical world do not have precisely defined criteria of membership. For example, the "class of all real numbers which are much greater than 1" or the "class of beautiful cats" do not constitute classes or sets in the usual mathematical sense of these terms. Nevertheless, such imprecisely defined "classes" play an important role in human thinking, particularly in the domains of pattern recognition and abstraction.

8.6.1. Ordinary Set

Let W, a non-empty set, be the formal basis of our further discussion. W is often called the universe of discourse or frame of discernment. Our focus is primarily on finite sets; in such cases, the number of elements in W, its cardinality, is denoted by |W|. Any element of W is denoted by w.
For a specific w ∈ W and a set A defined on W, either w ∈ A or w ∉ A. This is the basic requirement of ordinary set theory.
Set A is denoted by A = {w1, w2, ..., wn}, where wi is the ith element of A. When the elements of A cannot be explicitly listed, A is denoted by { w | ... }, where the part after the bar is a description of the elements that are included. In general,
A = { w | A(w) is true } ,
where A(w) is a predicate on w.
Given A and B defined on W, if for any w ∈ W we have w ∈ A ⇒ w ∈ B, then A ⊆ B.
If A ⊆ B and B ⊆ A, then A = B.
Any A defined on W is called a subset of W: A ⊆ W.
An empty set is one that does not contain any element of W; it is denoted by ∅. For any A on W, ∅ ⊆ A ⊆ W.
The sets discussed so far have single elements of W as their members. When subsets A ⊆ W themselves become the elements of another set U, U is also a set; it is sometimes called a set class. All the subsets of W form the set class 2^W. For instance, if W = {black, white}, then 2^W = {{black, white}, {black}, {white}, ∅}. In fact, the sets defined on W form a set class, and a set A defined on W is therefore sometimes denoted by A ∈ 2^W.

8.6.2. Logical Operations of Ordinary Sets

Definition 1. Given A, B ∈ 2^W,
A ∪ B = { w | w ∈ A or w ∈ B } ,
A ∩ B = { w | w ∈ A and w ∈ B } ,
Aᶜ = { w | w ∉ A } ,
are called the union of A and B, the intersection of A and B, and the complement of A, respectively. When the "∪", "∩" and complement operators are used in combination, the complement has higher priority than "∪" and "∩".
It can be proven that for any W and A, B ∈ 2^W the following relationships hold:
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ ,
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ .
These are called De Morgan's laws.
The following are some properties of set arithmetic:
A ∪ A = A , A ∩ A = A
A ∪ B = B ∪ A , A ∩ B = B ∩ A
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
(A ∪ B) ∩ B = B , (A ∩ B) ∪ B = B
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) , A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ W = W , A ∩ W = A
A ∪ ∅ = A , A ∩ ∅ = ∅
(Aᶜ)ᶜ = A
A ∪ Aᶜ = W , A ∩ Aᶜ = ∅ .
Definition 2. The two notations
∪_{i∈I} Ai = { w | w ∈ W, ∃ i ∈ I such that w ∈ Ai } ,
∩_{i∈I} Ai = { w | w ∈ W, w ∈ Ai for all i ∈ I } ,
are called the union and the intersection of the set class { Ai | i ∈ I }.
I = { 1, 2, ..., n, ... } is called the index set.
When I = { 1, 2 }, Definition 2 is equivalent to Definition 1.
Definition 3.
A - B = { w | w ∈ A and w ∉ B } is called the difference of A and B. It satisfies
A - B = A ∩ Bᶜ and Aᶜ = W - A .
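These operations map directly onto a built-in set type; the short sketch below (Python; the example universe is my own) checks De Morgan's laws and the difference identity A - B = A ∩ Bᶜ on a small universe of discourse.

```python
# Universe of discourse and two subsets (hypothetical example).
W = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {3, 4}

def complement(S):
    """S^c = W - S."""
    return W - S

# De Morgan's laws
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# Difference identity: A - B = A intersect B^c
assert A - B == A & complement(B)

print(A | B, A & B, complement(A), A - B)
```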
A projection from W to F is defined by
f : W → F .
Projection is an extension of the concept of a function. For any w ∈ W there exists an element φ = f(w); w is the original (pre-image) and f(w) is called the image of w.
W is the definition range (domain) of f, and
f(W) = { φ | ∃ w ∈ W such that φ = f(w) }
is called the value range of f.
If f(W) = F, then f is a full projection from W onto F.
If for any given w1, w2 ∈ W with w1 ≠ w2 we have
f(w1) ≠ f(w2) ,
then f is a one-to-one projection.


Definition 4. Given A ∈ 2^W, determine a projection from W to { 0, 1 },
X_A : W → { 0, 1 } , such that
X_A(w) = 1 if w ∈ A , and X_A(w) = 0 if w ∉ A .
X_A is the characteristic function of set A.
The value of the characteristic function of A at w, X_A(w), is called the degree of membership of w in A.
Obviously, when w ∈ A, the degree of membership of w in A is 1, indicating that w is absolutely an element of A. When w ∉ A, the degree of membership becomes 0, indicating that w does not belong to A at all.

8.6.3. Fuzzy Set, Its Definition and Arithmetic Operations

Definition 5. Given a universe of discourse W, a fuzzy set Ã is defined as follows: for any w ∈ W, there is a number m_Ã(w) ∈ [0, 1] which is the degree of membership of w in Ã. The projection m_Ã : W → [0, 1] is called the membership function of Ã.
Example. Given W = {a, b, c, d}, if m_Ã(a) = 1, m_Ã(b) = 0.8, m_Ã(c) = 0.4 and m_Ã(d) = 0, then Ã is a fuzzy set. If Ã is used to represent the concept "circular shape", then m_Ã indicates the degree of circularity of each element of W.
When W is composed of a finite number of elements, W is called a finite universe of discourse.
A fuzzy set defined on a finite W can be represented by a vector. For instance, "circular shape" defined on W constitutes a fuzzy set which can be written as
Ã = (1, 0.8, 0.4, 0) .
When there may be confusion between different elements, a fuzzy set may be represented as
Ã = 1/a + 0.8/b + 0.4/c + 0/d ,
where the denominators correspond to the elements of W and the numerators represent their degrees of membership; "+" is only a separation mark. When the degree of membership is 0, that element can be omitted, e.g.,
Ã = 1/a + 0.8/b + 0.4/c .
We may also see the following form:
Ã = {(1, a), (0.8, b), (0.4, c)} .
Example. If age is the universe of discourse, say W = {0, 1, 2, ..., 200}, fuzzy sets for "old" and "young" may be defined by membership functions m_old(w) and m_young(w) that approach 1 for large ages and for small ages, respectively. Although W is a finite set, we can treat it as a continuous range between 0 and 200 to generate the curves of the fuzzy sets "old" and "young".
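The exact membership functions are not reproduced here; the sketch below uses one commonly quoted pair of curves (in the style of Zadeh's "old" example) purely to illustrate how such functions could be written down. The functional forms and the parameters (the breakpoints at 50 and 25 years, the divisor 5) are assumptions, not taken from the text.

```python
import numpy as np

def m_old(age):
    """Illustrative membership function for 'old' (assumed form)."""
    age = np.asarray(age, dtype=float)
    m = np.zeros_like(age)
    over = age > 50
    m[over] = 1.0 / (1.0 + ((age[over] - 50.0) / 5.0) ** -2)
    return m

def m_young(age):
    """Illustrative membership function for 'young' (assumed form)."""
    age = np.asarray(age, dtype=float)
    m = np.ones_like(age)
    over = age > 25
    m[over] = 1.0 / (1.0 + ((age[over] - 25.0) / 5.0) ** 2)
    return m

ages = np.array([10, 25, 40, 55, 70])
print(m_old(ages))     # rises towards 1 for large ages
print(m_young(ages))   # falls from 1 as age exceeds 25
```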
Definition 6. Given Ã, B̃ ∈ F(W), where F(W) is the set of all fuzzy sets defined on W, the membership functions for Ã ∪ B̃, Ã ∩ B̃ and the complement Ãᶜ are
m_{Ã∪B̃}(w) = max( m_Ã(w), m_B̃(w) ) ,
m_{Ã∩B̃}(w) = min( m_Ã(w), m_B̃(w) ) , and
m_{Ãᶜ}(w) = 1 - m_Ã(w) , respectively.
If for W = {a, b, c, d} two fuzzy sets are defined as
Ã = (1, 0.8, 0.4, 0) for circular shape,
B̃ = (0.3, 0.4, 0.2, 0) for square shape,
then for "circular or square" we have
Ã ∪ B̃ = (1, 0.8, 0.4, 0) ;
for "circular and square" we have
Ã ∩ B̃ = (0.3, 0.4, 0.2, 0) ;
and for "not circular" we have
Ãᶜ = (0, 0.2, 0.6, 1) .
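A direct way to see the max/min/complement rules at work is to apply them to the membership vectors above; the short sketch below (NumPy; variable names are mine) reproduces the "circular or square", "circular and square" and "not circular" vectors.

```python
import numpy as np

circular = np.array([1.0, 0.8, 0.4, 0.0])    # fuzzy set for "circular shape"
square   = np.array([0.3, 0.4, 0.2, 0.0])    # fuzzy set for "square shape"

fuzzy_union        = np.maximum(circular, square)   # circular OR square
fuzzy_intersection = np.minimum(circular, square)   # circular AND square
fuzzy_complement   = 1.0 - circular                  # NOT circular

print(fuzzy_union)           # [1.  0.8 0.4 0. ]
print(fuzzy_intersection)    # [0.3 0.4 0.2 0. ]
print(fuzzy_complement)      # [0.  0.2 0.6 1. ]
```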

8.6.4. Transformations Between Fuzzy Sets and Ordinary Sets

Definition 7. Given Ã ∈ F(W), for any λ ∈ [0, 1],
A_λ = { w | m_Ã(w) ≥ λ }
is called the λ-level cut of Ã.
If any w whose membership value is at least λ is considered a member of Ã, then the fuzzy set Ã becomes the ordinary set A_λ.
For instance, from Ã = (1, 0.8, 0.4, 0) defined on W = {a, b, c, d} we have
A_1 = {a} , A_0.5 = A_0.8 = {a, b} .
For any Ã ∈ F(W) it can be proven that
m_Ã(w) = sup_{λ∈[0,1]} [ λ · X_{A_λ}(w) ] ,
where X_{A_λ} is the characteristic function of A_λ. This theorem and the level-cut concept are the linkages for conversions between fuzzy sets and ordinary sets.
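A level cut is easy to compute from a membership vector; the sketch below (hypothetical helper names) extracts several level cuts of the "circular shape" fuzzy set and numerically checks that the membership values can be rebuilt as the supremum of λ·X_{A_λ}.

```python
import numpy as np

elements   = ['a', 'b', 'c', 'd']
membership = np.array([1.0, 0.8, 0.4, 0.0])       # fuzzy set "circular shape"

def level_cut(lam):
    """Return the ordinary set A_lambda = { w : m(w) >= lambda }."""
    return {e for e, m in zip(elements, membership) if m >= lam}

print(level_cut(0.5))    # {'a', 'b'}
print(level_cut(0.8))    # {'a', 'b'}
print(level_cut(1.0))    # {'a'}

# Reconstruction: m(w) = sup over lambda of lambda * X_{A_lambda}(w)
lambdas = np.linspace(0.0, 1.0, 1001)
rebuilt = [max(lam for lam in lambdas if m >= lam) for m in membership]
print(rebuilt)           # approximately [1.0, 0.8, 0.4, 0.0]
```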

8.6.5. Fuzzy Statistics

Fuzzy set theory and probability theory are used to handle two different types of uncertainty. We use probability to study random phenomena: each event itself has a distinct meaning and is not uncertain, but, due to the lack of sufficient conditions, the outcome, i.e., whether a certain event occurs during a process, cannot be determined in advance.
In fuzzy set theory, the concept or event itself does not have a clear definition. For example, for "tall men" it is not defined exactly how tall they must be; whether a certain phenomenon belongs to this concept is therefore difficult to determine. We call fuzziness the uncertainty involved in a classification due to imprecise concept definitions. The root of fuzziness is that there exist gradual transitions between two phenomena; such transitions make it possible to label the same phenomenon as either this class or that class. Fuzzy set theory is the basis for studying membership relationships arising from the fuzziness of phenomena.
Fuzzy statistics is used to estimate the degree of membership or the membership function. In order to do so we need to design a fuzzy statistical experiment. In such an experiment, similar to an experiment in probability statistics, there are four elements:
1. Universe of discourse W ;
2. An element w in W ;
3. An ordinary set A ⊆ W which varies from one trial to another. A is related to a fuzzy set Ã which corresponds to a fuzzy concept; each time A is fixed, it represents a deterministic definition of the fuzzy concept, i.e., an approximation of it.
4. A condition S which contains all the objective and subjective factors related to the definition of the fuzzy concept and which therefore constrains the variation of A.
The purpose of fuzzy statistics is to use a deterministic approach to study uncertainty. The requirement of a fuzzy statistical experiment is that in each trial a deterministic decision is made on whether w belongs to A; therefore, in each trial, A is a definite ordinary set. In fuzzy statistical experiments, w is fixed while A is changing.
In n trials, we calculate the membership frequency of w belonging to the fuzzy set Ã, denoted by f:
f = (number of trials in which w ∈ A) / n .
As n increases, f tends to stabilize, and the stabilized membership frequency is taken as the degree of membership of w in Ã. Fuzzy statistics involving more than one fuzzy concept is called multi-phase fuzzy statistics.
Definition 8. Given P_m = { Ã1, ..., Ãm }, Ãi ∈ F(W), i = 1, ..., m, an experiment of this type is an m-phase fuzzy statistical experiment, provided that in each trial we can determine a projection
e : W → P_m .
Each fuzzy set in P_m is one phase of P_m.
The results of multi-phase fuzzy statistics enable us to obtain a fuzzy membership function for each phase on W. These functions have the property
m_Ã1(w) + m_Ã2(w) + ... + m_Ãm(w) = 1 for every w ∈ W .
If W = {w1, w2, ..., wn} is a finite universe of discourse, summing over all elements gives
Σ_{j=1}^{n} [ m_Ã1(wj) + ... + m_Ãm(wj) ] = n .
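The membership-frequency idea can be simulated directly: repeat a trial in which a crisp approximation of the fuzzy concept is drawn, and count how often the fixed element falls inside it. The sketch below uses a hypothetical "tall" concept whose crisp threshold varies between 170 cm and 190 cm from trial to trial (that range is an assumption) and estimates the degree of membership of a 180 cm person.

```python
import random

random.seed(0)

def membership_frequency(w, n_trials=10000):
    """Estimate m_tall(w): in each trial the crisp set A = {height >= t}
    is fixed by drawing a threshold t from an assumed range of opinions."""
    hits = 0
    for _ in range(n_trials):
        t = random.uniform(170.0, 190.0)    # this trial's crisp definition of "tall"
        if w >= t:                           # deterministic decision: is w in A?
            hits += 1
    return hits / n_trials                   # membership frequency f

print(membership_frequency(180.0))   # stabilizes near 0.5 as n grows
print(membership_frequency(188.0))   # close to 0.9
```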

8.6.6. Fuzzy Relation

An important concept needed in fuzzy set theory is that of a fuzzy relation, which generalizes the conventional set-theoretic notion of a relation. Let W1 and W2 be two universes. A fuzzy relation R̃ has the membership function m_R̃ : W1 × W2 → [0, 1]. The projection of R̃ on W1 is the marginal fuzzy set with membership function
m(w1) = sup { m_R̃(w1, w2) | w2 ∈ W2 }
for all w1 ∈ W1. If Ã1 is a fuzzy set on W1, then m_Ã1 can be extended to W1 × W2 by
m(w1, w2) = m_Ã1(w1)
for all (w1, w2) ∈ W1 × W2.


Based on the above introduction, it can be seen that a fuzzy relation in R, the real number space, is a fuzzy set in the product space R × R. For example, the relation denoted by x >> y ("x is much greater than y"), x, y ∈ R, may be regarded as a fuzzy set in R² whose membership function f takes values such as
f = 0 for pairs in which x is not appreciably greater than y ;
f = 0.7 for pairs in which x is moderately greater than y ;
f = 1 for pairs in which x is very much greater than y ; etc.
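As an illustration of a fuzzy relation on R × R, the sketch below defines one possible membership function for "x is much greater than y" (the functional form and the scale constant 10 are my own assumptions) and evaluates it for a few pairs.

```python
def much_greater(x, y, scale=10.0):
    """One possible membership function for the fuzzy relation x >> y.
    The ratio (x - y)/scale controls how quickly membership approaches 1."""
    if x <= y:
        return 0.0
    d = (x - y) / scale
    return d * d / (1.0 + d * d)     # rises smoothly from 0 towards 1

print(much_greater(5, 5))       # 0.0   (x is not greater than y)
print(much_greater(15, 5))      # 0.5   (moderately greater)
print(much_greater(105, 5))     # ~0.99 (very much greater)
```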

8.6.7. Possibility Distribution

Let w0 be an unknown value ranging over a set W, and let a piece of imprecise information be given as a set E, i.e., w0 ∈ E is known for sure, with E ∈ 2^W. If we ask whether another set A contains w0, there can be two possible answers:
if A ∩ E = ∅ , then it is impossible that w0 ∈ A ;
if A ∩ E ≠ ∅ , then it is possible that w0 ∈ A .
Formally, we obtain a mapping
Poss_E : 2^W → [0, 1] , Poss_E(A) = 1 if A ∩ E ≠ ∅ , and 0 if A ∩ E = ∅ ,
where 1 indicates "possible" and 0 "impossible".
When E becomes a fuzzy set Ẽ, we define
Poss_Ẽ : 2^W → [0, 1] ,
Poss_Ẽ(A) = sup { α | A ∩ E_α ≠ ∅ , α ∈ [0, 1] }
= sup { m_Ẽ(w) | w ∈ A } ,
where E_α is the α-level cut of Ẽ. For example, given the fuzzy set "small positive integer",
Ẽ = (1, 1, 0.8, 0.6, 0.4, 0.2) defined on W = {1, 2, 3, 4, 5, 6} ,
for A = {3} the possibility is Poss_Ẽ(A) = 0.8, and for A = { x | x ≤ 3 } we obtain Poss_Ẽ(A) = 1.
The possibility of "not A" tells us about the necessity of the occurrence of A:
Nec(A) = 1 - Poss(Aᶜ) .
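These definitions translate into a few lines of code; the sketch below (function names are mine) computes the possibility and necessity of crisp sets given the "small positive integer" fuzzy set above.

```python
membership = {1: 1.0, 2: 1.0, 3: 0.8, 4: 0.6, 5: 0.4, 6: 0.2}   # fuzzy set E on W = {1,...,6}
W = set(membership)

def possibility(A):
    """Poss(A) = sup of m_E(w) over w in A."""
    return max((membership[w] for w in A), default=0.0)

def necessity(A):
    """Nec(A) = 1 - Poss(complement of A)."""
    return 1.0 - possibility(W - set(A))

print(possibility({3}))                          # 0.8
print(possibility({w for w in W if w <= 3}))     # 1.0
print(necessity({w for w in W if w <= 3}))       # 1 - Poss({4, 5, 6}) = 0.4
```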

8.6.8. Algebraic Operations on Fuzzy Sets

In addition to the operations of union and intersection, one can define a number of other ways of forming combinations of fuzzy sets and relating them to one another.
Algebraic product. Given Ã and B̃, the algebraic product of Ã and B̃, denoted by ÃB̃, is defined in terms of the membership functions of Ã and B̃ by
m_ÃB̃(w) = m_Ã(w) m_B̃(w) .
This indicates that ÃB̃ ⊆ Ã ∩ B̃.
Algebraic sum. The algebraic sum of Ã and B̃, denoted by Ã + B̃, is defined by
m_{Ã+B̃}(w) = m_Ã(w) + m_B̃(w) ,
provided that 0 ≤ m_Ã(w) + m_B̃(w) ≤ 1.
Convex combination. The convex combination of Ã and B̃ with an arbitrary fuzzy set Λ̃, denoted by (Ã, B̃; Λ̃), is defined by
(Ã, B̃; Λ̃) = Λ̃Ã + Λ̃ᶜB̃ ,
or, written out in terms of membership functions,
m_{(Ã,B̃;Λ̃)}(w) = m_Λ̃(w) m_Ã(w) + (1 - m_Λ̃(w)) m_B̃(w) .
A basic property of the convex combination of Ã, B̃ and Λ̃ is expressed by
Ã ∩ B̃ ⊆ (Ã, B̃; Λ̃) ⊆ Ã ∪ B̃ for all Λ̃ .
Conversely, given any fuzzy set C̃ satisfying Ã ∩ B̃ ⊆ C̃ ⊆ Ã ∪ B̃, one can always find a fuzzy set Λ̃ such that C̃ = (Ã, B̃; Λ̃). In fact,
m_Λ̃(w) = ( m_C̃(w) - m_B̃(w) ) / ( m_Ã(w) - m_B̃(w) ) for w ∈ W .
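The sketch below (NumPy; the membership vectors reuse the circular/square example, and the Λ̃ values are my own) evaluates the algebraic product and the convex combination, and checks that the convex combination lies between the fuzzy intersection and the fuzzy union.

```python
import numpy as np

A   = np.array([1.0, 0.8, 0.4, 0.0])     # fuzzy set A (circular shape)
B   = np.array([0.3, 0.4, 0.2, 0.0])     # fuzzy set B (square shape)
lam = np.array([0.5, 0.2, 0.9, 0.6])     # arbitrary fuzzy set Lambda (assumed values)

product = A * B                            # algebraic product AB
convex  = lam * A + (1.0 - lam) * B        # convex combination (A, B; Lambda)
valid   = A + B <= 1.0                     # algebraic sum A + B is defined only where this holds

assert np.all(product <= np.minimum(A, B))   # AB is contained in (A intersect B)
assert np.all(np.minimum(A, B) <= convex)    # intersection <= convex combination
assert np.all(convex <= np.maximum(A, B))    # convex combination <= union

print(product)                              # [0.3  0.32 0.08 0.  ]
print(convex)                               # [0.65 0.48 0.38 0.  ]
print(np.where(valid, A + B, np.nan))       # algebraic sum where defined, NaN elsewhere
```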

8.6.9. A Proposed Procedure for Use of Fuzzy Set Theory in Integrated Analysis of Spatial Data

The problem:
Given spatial data E = {e1, e2, ..., em} from m different sources S1, S2, ..., Sm, one wishes to decide which hypothesis among the n hypotheses H = {H1, H2, ..., Hn} is most likely to hold. Or, in a classification problem, one wishes to decide which class among the n classes {C1, C2, ..., Cn} is the most appropriate one into which E should be classified. Formally stated, one wishes to find a projection F such that
F : S1 × S2 × ... × Sm → H ,
which satisfies
(1) 0 ≤ F_Hj(E) ≤ 1 for j = 1, 2, ..., n ;
(2) Σ_{j=1}^{n} F_Hj(E) = 1 .
It requires relatively deep mathematical knowledge to determine a projection from the Cartesian product space S1 × S2 × ... × Sm to H; interested readers may find Kruse et al. (1991) a starting point. The problem may be relaxed by finding a projection from each individual source Si to H.
Therefore, one may follow the steps listed below to solve the problem posed; a short sketch of the procedure is given at the end of this subsection.
Step 1. Consider each element of H as a fuzzy set H̃j, j = 1, 2, ..., n. Determine the fuzzy membership function on each source Si, i = 1, 2, ..., m, for each Hj, j = 1, 2, ..., n. Thus a total of m × n membership functions need to be found. Usually, expert knowledge or fuzzy statistics is used to establish them.
Step 2. Combine the evidence from the different sources to validate hypotheses or to conduct the classification. Fuzzy set operations, including union, intersection, complement and the algebraic operations, can be used for this purpose.
Step 3. Compare the combined degrees of membership for the hypotheses (classes) and confirm the hypothesis with the highest degree of membership.
Gong (1993) and the fuzzy classifier used in a forest ecological classification study (Crain et al., 1993) both follow this procedure. It needs to be further validated; the assumption here is obviously that each hypothesis is independent of the others.
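A minimal sketch of Steps 1-3, assuming the m × n membership functions have already been evaluated for the observed evidence (the numbers, source names and class names are hypothetical): memberships from each source are combined here with the fuzzy intersection (min), and the class with the highest combined membership is confirmed.

```python
import numpy as np

classes = ['water', 'forest', 'urban']

# Membership of the observed evidence in each class, per data source
# (rows: sources S1..S3, columns: classes). Values are illustrative.
membership = np.array([
    [0.9, 0.2, 0.1],    # source S1 (e.g., a spectral band)
    [0.7, 0.4, 0.2],    # source S2 (e.g., a terrain-derived layer)
    [0.8, 0.3, 0.3],    # source S3 (e.g., a texture measure)
])

combined = membership.min(axis=0)     # Step 2: fuzzy intersection across sources
best = int(np.argmax(combined))       # Step 3: highest combined membership wins

print(combined)                       # [0.7 0.2 0.1]
print(classes[best])                  # water
```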
8.7 Introduction To Neural Networks

Similar to the earlier part of this course, our interest is still focused on the problem that, given a piece of evidence e ∈ E, we test the hypothesis that e validates h ∈ H. Translated into classification or pattern recognition terms, we would like to have an algorithm or a system that is capable of classifying or recognizing a given set of observations and labelling it with a class or a pattern. We would like the system or algorithm to learn from observations of patterns that are labelled by class and then to be able to recognize unknown patterns and properly label them with output class membership values.
One of the most exciting developments during the early days of pattern recognition was the perceptron: the idea that a network of elemental processors arrayed in a manner reminiscent of biological neural nets might be able to learn how to recognize or classify patterns in an autonomous manner. However, it was realized that simple linear networks were inadequate for that purpose and that non-linear networks based on threshold-logic units lacked effective learning algorithms. This problem was solved by Rumelhart, Hinton and Williams (1986) with the generalized delta rule (GDR) for learning. In the following section, a neural network model based on the generalized delta rule is introduced.

8.7.1. The Generalized Delta Rule for the Semilinear Feed-Forward Net with Back-Propagation of Error

The architecture of a layered net with feed-forward capability is shown in the diagram below.

In this architecture, the basic elements are nodes and links. Nodes are arranged in layers. Each input node accepts a single value, and each node generates an output value. Depending on the layer in which a node is located, its output may be used as an input for all nodes in the next layer.
The links between nodes in successive layers carry weight coefficients; for example, wji is the weight of the link from node i in one layer to node j in the next layer. Each node is an arithmetic unit. Nodes in the same layer are independent of each other, and therefore they can be implemented in parallel. Except for the nodes of the input layer, each node takes as inputs the outputs of all the nodes of the preceding layer and uses a linear combination of those values as its net input; for a node in layer j, the net input is

uj = Σ_i wji Oi .

The output of the node in layer j is

Oj = f(uj) ,

where f is the activation function. It often takes the form of a sigmoidal function,

Oj = 1 / ( 1 + exp( -(uj + θj) / θ0 ) ) ,

where θj serves as a threshold or bias. The effect of a positive θj is to shift the activation function to the left along the horizontal axis, while the effect of θ0 is to modify the shape of the sigmoid. These effects are illustrated in the following diagram.

This function allows each node to react to the same input differently: some nodes may be easily activated, or "fired", to generate a high output value when θ0 is low and θj is small; in contrast, when θ0 is high and θj is large, a node will respond more slowly to the input uj. This is considered to occur in the human neural system, where neurons are activated by different levels of stimuli.
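The sketch below (NumPy; the weights and parameter values are invented for illustration) computes the net input and sigmoid output of a single node, showing how θj shifts and θ0 reshapes the activation.

```python
import numpy as np

def node_output(inputs, weights, theta_j=0.0, theta_0=1.0):
    """Output of one node: O_j = 1 / (1 + exp(-(u_j + theta_j) / theta_0)),
    where u_j is the weighted sum of the outputs of the previous layer."""
    u_j = np.dot(weights, inputs)
    return 1.0 / (1.0 + np.exp(-(u_j + theta_j) / theta_0))

x = np.array([0.2, 0.7, 0.1])        # outputs of the previous layer
w = np.array([0.5, -0.3, 0.8])       # connecting weights w_ji (assumed)

print(node_output(x, w))                        # plain sigmoid of the net input
print(node_output(x, w, theta_j=2.0))           # shifted: the node activates more easily
print(node_output(x, w, theta_0=5.0))           # flatter, more gradual response
```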
Such a feed-forward network requires a single set of weights and biases that will satisfy all the (input, output) pairs presented to it. The process of obtaining the weights and biases is called network learning or training. In the training task, an input pattern I_p = {I_pi}, i = 1, 2, ..., n_i, is presented, where n_i is the number of nodes in the input layer and p is the pattern index.
For the given input I_p, we require the network to adjust the set of weights in all the connecting links, and also all the thresholds in the nodes, such that the desired outputs T_p = {t_pk}, k = 1, 2, ..., n_k (n_k being the number of output nodes), are obtained at the output nodes. Once this adjustment has been accomplished by the network, another pair of input and output is presented and the network is asked to learn that association as well.
In general, the output O_p = {O_pk} from the network will not be the same as the target or desired values T_p. For each pattern, the squared error is

Ep = (1/2) Σ_k (t_pk - O_pk)²

and the average system error is

E = (1/P) Σ_p Ep = (1/(2P)) Σ_p Σ_k (t_pk - O_pk)² ,

where P is the number of training patterns and the factor of one half is used purely for mathematical convenience at a later stage.
The generalized delta rule (GDR) is used to determine the weights and biases. The correct set of weights is obtained by varying the weights in a manner calculated to reduce the error Ep as rapidly as possible. In general, different results will be obtained depending on whether one carries out the gradient search in weight space based on Ep or on E.
In the GDR, the determination of weights and thresholds is carried out by minimizing Ep.
The convergence of Ep towards improved values of the weights and thresholds is achieved by taking incremental changes Δwkj proportional to -∂Ep/∂wkj. The subscript p will be omitted subsequently, thus

Δwkj = -η ∂E/∂wkj (1)

where E is expressed in terms of the outputs Ok, each of which is the non-linear output of node k,

Ok = f(uk) ,

where uk is the net input to the kth node,

uk = Σ_j wkj Oj . (2)

Therefore, by the chain rule, ∂E/∂wkj is evaluated as

∂E/∂wkj = (∂E/∂uk)(∂uk/∂wkj) . (3)

From (2) we obtain

∂uk/∂wkj = Oj . (4)

Define

δk = -∂E/∂uk (5)

and thus

Δwkj = η δk Oj . (6)

Furthermore,

δk = -∂E/∂uk = -(∂E/∂Ok)(∂Ok/∂uk) , (7)

where

∂E/∂Ok = -(tk - Ok) (8)

and

∂Ok/∂uk = f'k(uk) . (9)

Thus

δk = (tk - Ok) f'k(uk) (10)

and therefore

Δwkj = η (tk - Ok) f'k(uk) Oj . (11)

For weights that do not directly affect the output nodes,

Δwji = -η ∂E/∂wji
= -η (∂E/∂uj)(∂uj/∂wji)
= -η (∂E/∂uj) Oi
= η (-∂E/∂Oj)(∂Oj/∂uj) Oi
= η f'j(uj) (-∂E/∂Oj) Oi
= η δj Oi . (12)

However, ∂E/∂Oj is not directly available; it has to be evaluated indirectly in terms of quantities that are known and other quantities that can be evaluated:

-∂E/∂Oj = -Σ_k (∂E/∂uk)(∂uk/∂Oj)
= Σ_k δk ∂(Σ_m wkm Om)/∂Oj
= Σ_k δk wkj . (13)

Therefore

δj = f'j(uj) Σ_k δk wkj . (14)
That is, the deltas at internal nodes can be evaluated in terms of the deltas at a later layer. Thus, starting at the last layer, the output layer, we can evaluate δk using equation (10) and then propagate the "error" backward to earlier layers. This is the process of error back-propagation.
In summary, with the subscript p denoting the pattern number, we have

Δp wji = η δpj Opi . (15)

If node j is in the output layer,

δpj = (tpj - Opj) f'j(upj) . (16)

If node j is in an internal (hidden) layer, then

δpj = f'j(upj) Σ_k δpk wkj . (17)

In particular, if

Oj = 1 / ( 1 + exp( -(uj + θj) ) ) , (18)

then

∂Oj/∂uj = Oj (1 - Oj) (19)

and the deltas are

δpk = (tpk - Opk) Opk (1 - Opk) ,
δpj = Opj (1 - Opj) Σ_k δpk wkj , (20)

for the output layer and the hidden layer nodes, respectively.
The thresholds θj are learned in the same manner as the other weights.
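A compact sketch of one training step for a network with a single hidden layer, using the deltas of equation (20) and the weight update of equation (15). The layer sizes, learning rate and data are invented for illustration, and the thresholds θj are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Tiny network: 3 inputs -> 4 hidden nodes -> 2 output nodes (sizes are arbitrary).
W_hidden = rng.normal(scale=0.5, size=(4, 3))
W_output = rng.normal(scale=0.5, size=(2, 4))
eta = 0.5                                  # learning rate

def train_step(x, t):
    """One pattern presentation with error back-propagation."""
    global W_hidden, W_output
    # Forward pass
    O_hidden = sigmoid(W_hidden @ x)                     # hidden layer outputs
    O_out = sigmoid(W_output @ O_hidden)                 # network outputs
    # Deltas, eq. (20)
    delta_out = (t - O_out) * O_out * (1.0 - O_out)                      # output layer
    delta_hid = O_hidden * (1.0 - O_hidden) * (W_output.T @ delta_out)   # hidden layer
    # Weight updates, eq. (15)
    W_output += eta * np.outer(delta_out, O_hidden)
    W_hidden += eta * np.outer(delta_hid, x)
    return 0.5 * np.sum((t - O_out) ** 2)                # squared error E_p

x = np.array([0.2, 0.9, 0.4])     # one input pattern (hypothetical)
t = np.array([1.0, 0.0])          # its target output
for _ in range(200):
    err = train_step(x, t)
print(err)                        # the error decreases towards 0
```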
Note that the number of hidden layers can be greater than one. Although a three-layer network can form arbitrarily complex decision regions, difficult learning tasks can sometimes be simplified by increasing the number of internal layers. A preliminary assessment of this algorithm was made for ecological land systems classification at a selected study site in Manitoba (Gong et al., 1994).
References

Crain, I.K., Gong, P., and Chapman, M.A., 1993. Implementation considerations for uncertainty management in an ecologically oriented GIS. Proceedings of GIS'93, Vancouver, B.C., pp. 167-172.
Duda, R.O., and Hart, P.E., 1973. Pattern Classification and Scene Analysis. Wiley and Sons, New York, 482 p.
Freeman, J.A., and Skapura, D.M., 1991. Neural Networks: Algorithms, Applications, and Programming Techniques. Addison-Wesley, New York.
Gong, P., 1993. Change detection using principal component analysis and fuzzy set theory. Canadian Journal of Remote Sensing, 19(1): 22-29.
Gong, P., and Dunlop, D.J., 1991. Comments on Skidmore and Turner's supervised non-parametric classifier. PE&RS, 57(1): 1311-1313.
Gong, P., and Howarth, P.J., 1990. Land cover to land use conversion: a knowledge-based approach. Technical Papers, Annual Conference of the American Society of Photogrammetry and Remote Sensing, Denver, Colorado, Vol. 4, pp. 447-456.
Gong, P., Zhang, A., Chen, J., Hall, R., and Corns, I., 1994. Ecological land systems classification using multisource data and neural networks. Accepted by GIS'94, Vancouver, B.C., February 1994.
Goodchild, M.F., Sun, G., and Yang, S., 1992. Development and test of an error model for categorical data. International Journal of Geographical Information Systems, 6(2): 87-104.
Kosko, B., 1992. Neural Networks and Fuzzy Systems. Prentice-Hall, Englewood Cliffs, New Jersey.
Kruse, R., Schwecke, E., and Heinsohn, J., 1991. Uncertainty and Vagueness in Knowledge Based Systems: Numerical Methods. Springer-Verlag, New York.
Mark, D., and Csillag, F., 1989. The nature of boundaries on area-class maps. Cartographica, pp. 65-77.
Pao, Y., 1989. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley, Reading, MA.
Richards, J.A., 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag, Berlin.
Shinghal, R., 1992. Formal Concepts in Artificial Intelligence: Fundamentals. Chapman & Hall, New York.
