Figure 1.1. The flows of energy and information in remote sensing.
We begin by asking: what factors make remotely sensed images taken of the same target different? Remotely sensed data record the dynamics of the earth surface, and the three-dimensional earth surface changes over time. Two images taken at the same place under the same imaging conditions will not be the same if they are obtained at different times. Sensor characteristics are among the many factors that will be introduced in later chapters.

Resolution and Sampling

Sampling frequency determines how frequently data are collected. There are three types of sampling important to remote sensing: spectral, spatial and temporal.
For example, assume that the level of solar energy coming from the sun and passing through the atmosphere in the spectral region between 0.4 µm and 1.1 µm is distributed as in Fig. 1.2. This is a continuous curve.
After the solar energy interacts with a target such as a forest on the earth, the energy is partly absorbed, partly transmitted, and partly scattered and reflected. Assume that the level of the scattered and reflected energy collected by a sensor behaves as illustrated in Fig. 1.3.
bands. Sometimes, we choose to make a discrete sampling over a spectral curve (Figure 1.4). The questions are: which way of sampling is more appropriate, and what resolution is better? Obviously, if we use a low resolution, we will blur the curve. The finer the resolution, the more precisely we can restore the curve, provided that a sufficient spectral sampling frequency is used.
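The blurring effect of coarse spectral sampling can be sketched numerically. The spectrum below is hypothetical (a single narrow absorption dip on a flat continuum), and `band_average` is an illustrative helper, not a method from this book; it simply averages a finely sampled curve into contiguous bands.

```python
import numpy as np

def band_average(wavelengths, spectrum, band_width):
    """Resample a finely sampled spectrum into contiguous bands
    of the given width (same units as wavelengths) by averaging."""
    edges = np.arange(wavelengths[0], wavelengths[-1] + band_width, band_width)
    centers, means = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (wavelengths >= lo) & (wavelengths < hi)
        if mask.any():
            centers.append((lo + hi) / 2)
            means.append(spectrum[mask].mean())
    return np.array(centers), np.array(means)

# Hypothetical spectrum: a narrow absorption dip at 0.68 um on a flat continuum.
wl = np.linspace(0.4, 1.1, 701)                        # ~1 nm sampling
spec = 1.0 - 0.5 * np.exp(-((wl - 0.68) / 0.01) ** 2)  # dip depth 0.5

_, fine = band_average(wl, spec, 0.01)    # 10 nm bands
_, coarse = band_average(wl, spec, 0.10)  # 100 nm bands

# Coarse sampling blurs the dip: its apparent depth shrinks markedly.
print(1.0 - fine.min(), 1.0 - coarse.min())
```

With 10 nm bands the dip survives at reduced depth; with 100 nm bands it is almost averaged away, which is exactly the blurring described above.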
The difference between imaging spectrometers and earlier-generation sensors lies in their spectral sampling frequency. Earlier-generation sensors use selective spectral sampling, whereas imaging spectrometers sample systematically over the entire spectral range. An imaging spectrometer such as CASI has 288 spectral bands in the 0.43 - 0.92 µm region, while earlier-generation sensors have only 3 - 7 spectral bands.
Figure 1.5. Sampling the same target with different spatial resolutions.
A scene including a house with garage and driveway is imaged with two different spatial
resolutions. For each cell in Figure 1.5a, no single object occupies the entire cell; each cell therefore contains energy from several cover types. Such cells are called mixed pixels, also known as mixels. In Chapter 7 we will introduce some methods that can be used to decompose mixed pixels. Mixed pixels are very difficult to discriminate from each other. Obviously, a house cannot easily be recognized at the resolution of Figure 1.5a, but it may be recognizable in Figure 1.5b. As spatial resolution becomes finer, more detail about objects in a scene becomes available. In general, objects can be better discriminated by the human eye at finer spatial resolutions. With computers, however, it may be harder to recognize objects imaged at finer spatial resolutions, partly because finer resolutions increase the image size a computer must handle. More importantly, for many computer analysis algorithms they cause the effect of "seeing the trees but not the forest": computer techniques are far poorer than the human brain at generalizing from fine details.
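Chapter 7 treats decomposition methods properly; as a preview, a common starting point is the linear mixture model, in which a mixed pixel's spectrum is a fraction-weighted sum of pure cover-type ("endmember") spectra. The endmember reflectances below are hypothetical, and the unconstrained least-squares solver is only a sketch, not the book's method.

```python
import numpy as np

# Hypothetical endmember spectra (rows: 3 bands; columns: roof, driveway, grass).
E = np.array([[0.30, 0.20, 0.05],
              [0.28, 0.22, 0.08],
              [0.25, 0.24, 0.45]])

# Simulate a mixed pixel: 50% roof, 30% driveway, 20% grass.
true_fractions = np.array([0.5, 0.3, 0.2])
pixel = E @ true_fractions

# Recover the fractions by least squares. Real unmixing methods add
# sum-to-one and non-negativity constraints; this sketch omits them.
fractions, *_ = np.linalg.lstsq(E, pixel, rcond=None)
print(np.round(fractions, 3))
```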
Temporal sampling can be regarded as analogous to spectral sampling: it describes how frequently we image an area of interest. Do we use contiguous systematic sampling, as in movie making, or selective sampling, as in most still photography? To decide on a temporal sampling scheme, the dynamic characteristics of the target under study have to be considered. For instance, if the goal is to discriminate crop species, the phenological calendar of each crop type should guide when remotely sensed data are collected so as to best characterize each species; the data could be selected from the entire growing season, roughly late April to early October at mid and high latitudes in the northern hemisphere. If the subject is flood monitoring, the temporal sampling frequency should be high during the flood period, because floods usually last only a few hours to a few days.
Radiometric resolution can be understood in a manner similar to spatial resolution. The concept is well illustrated in a number of digital image processing books (e.g., Gonzalez and Wintz, 1987; Pratt, 1991). It is associated with the level of quantization of an image, which in turn relates to using the minimum amount of data storage to represent the maximum amount of information, a central concern in data compression. Although we will explain the concept of radiometric resolution in Chapter 5, we touch on data compression only in Chapter 7, from an information extraction point of view.
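The quantization idea can be sketched directly: an n-bit radiometric resolution stores 2**n grey levels. The signal below is synthetic, and the two bit depths echo the 6 to 7 bit and 8 bit sensors discussed in Chapter 3.

```python
import numpy as np

def quantize(signal, bits):
    """Quantize values in [0, 1) to 2**bits integer grey levels."""
    levels = 2 ** bits
    return np.floor(signal * levels).clip(0, levels - 1).astype(int)

x = np.linspace(0.0, 0.999, 1000)       # a synthetic continuous signal
print(len(np.unique(quantize(x, 6))))   # 64 grey levels (6 bit)
print(len(np.unique(quantize(x, 8))))   # 256 grey levels (8 bit)
```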
1.4 Use of Remote Sensing
of fish schools in water, crop production of agricultural systems, water storage and runoff
of watersheds, population in rural and urbanized areas, and quantity and living conditions
of wildlife species.
The remaining chapters of this book are organized to help you take greater advantage of remote sensing in the applications mentioned above. In Chapter two, we will first
introduce the very basic physics required to understand the imaging mechanism in remote
sensing. In Chapter three, we introduce the development of sensing systems following a
historical order. In Chapter four, we introduce imaging geometry and illustrate geometrical
calibration methods that are required to achieve precise measurement of spatial dimensions
of objects. In Chapter five, we explain various methods for recovering image radiometry
affected by sensor malfunctioning, atmospheric interference and terrain relief. In Chapter
six, we illustrate some of the most commonly used image processing methods for image
enhancement. In Chapter seven, we focus on the introduction of various strategies for
information extraction from remotely sensed data. In Chapter eight, following a brief
introduction on map making, we introduce some methods that are used to combine maps
and other spatial data with remotely sensed data for analysis and extraction of information
on various targets.
Chapter 1
References
Campbell, J.B., 1987. Introduction to Remote Sensing. The Guilford Press.
Gonzalez, R.C., and P. Wintz, 1987. Digital Image Processing, 2nd Ed. Addison-Wesley: Reading, MA.
Lillesand, T.M., and R.W. Kiefer, 1987. Remote Sensing and Image Interpretation, 2nd Ed. John Wiley and Sons: Toronto.
Emphasis on aerial photography, photogrammetry, photo interpretation, non-photographic sensing systems and their image interpretation, and an introduction to digital image processing.
Pratt, W., 1991. Digital Image Processing. John Wiley and Sons: Toronto.
Staenz, K., 1992. A decade of imaging spectrometry in Canada. Canadian Journal of Remote Sensing, 18(4):187-197.
Lists most of the imaging spectrometers developed worldwide. Sensor calibration and various applications.
Further Readings
Asrar, G., ed., 1989. Theory and Applications of Optical Remote Sensing. John Wiley and Sons: Toronto.
A selection of the most important fields of optical remote sensing, ranging from the physical basis of energy-matter interaction, vegetation canopy modelling and atmospheric effects reduction to applications in forestry, agriculture, coastal wetlands, geology, snow and ice, climatology and meteorology, and ecosystems. Its emphasis is on the application of remote sensing to understanding land-surface processes globally.
Jensen, J.R., 1986. Introductory Digital Image Processing: A Remote Sensing Perspective. Prentice-Hall: Englewood Cliffs, NJ.
A good introductory book on digital image analysis concepts and procedures; a "show-how" type of book that is easy for beginners. Typical topics include image statistics, image enhancement in the spatial domain, geometric correction, classification, and change detection, all treated in a remote sensing context.
Richards, J.A., 1986. Remote Sensing Digital Image Analysis. Springer-Verlag: New York.
A good introductory book, more mathematical than Jensen's. Additional material compared to Jensen's book includes an entire chapter on the Fourier transform and the relationships among some basic image enhancement and image classification algorithms.
c = λν

This equation shows that the shorter the wavelength, the higher the frequency. Electromagnetic energy is a mixture of waves with different frequencies. It may be viewed as follows: each wave represents a group of particles with the same frequency, and together the waves have different frequencies and magnitudes. With each wave there is an electric (E) component and a magnetic (M) component. The amplitude (A) reflects the level of the electromagnetic energy; it may also be considered as intensity or spectral irradiance. If we plot A against wavelength, we obtain an electromagnetic curve, or spectrum (Figure 2.1).
Any matter with a temperature above absolute zero (0 K) emits electromagnetic energy and therefore has a spectrum. Furthermore, different chemical elements have different spectra; they absorb and reflect spectral energy differently. Elements combine to form compounds, and each compound has a unique spectrum due to its unique molecular structure. This is the basis for using spectroscopy to identify chemical materials, and it is also the basis on which remote sensing discriminates one material from another. The spectrum of a material is like a human fingerprint.
The wavelength of electromagnetic energy covers such a wide range that no single instrument can measure it completely; different devices, however, can measure most of the major spectral regions. The division of the spectrum into wavelength regions is based on the devices that can be used to observe particular types of energy, such as thermal, shortwave infrared and microwave energy. In reality there are no abrupt changes in the magnitude of spectral energy; the spectrum is conventionally divided into the parts shown below:
The optical region covers 0.3 - 15 µm, where energy can be collected through lenses. The reflective region, 0.4 - 3.0 µm, is a subdivision of the optical region; in this region we collect solar energy reflected by the earth surface. Another subdivision of the optical region is the thermal range, between 3 µm and 15 µm, where energy comes primarily from surface emittance. Table 2.1 lists major uses of some spectral wavelength regions.
Table 2.1. Major uses of some spectral wavelength regions

Wavelength          Use
γ ray               Mineral exploration
X ray               Medical imaging
Ultraviolet (UV)    -
0.4 - 0.45 µm       -
0.7 - 1.1 µm        Vegetation vigor
1.55 - 1.75 µm      -
2.04 - 2.34 µm      -
10.5 - 12.5 µm      Surface temperature
3 cm - 15 cm        -
20 cm - 1 m         -
The first theory treats electromagnetic radiation as many discrete particles called photons or quanta. The energy of a quantum is given by

E = hν

where
E = energy of a quantum (joules)
h = Planck's constant, 6.626 × 10⁻³⁴ J s
ν = frequency (Hz)

Since ν = c/λ, we thus have

E = hc/λ
The energy (or radiation) of a quantum is inversely proportional to its wavelength: the longer the wavelength, the smaller the energy, and the shorter the wavelength, the stronger the energy. Thus radiation at very short wavelengths (UV and shorter) is dangerous to human health. If we want to sense emittance from objects at longer wavelengths, we have to either use very sensitive devices or use less sensitive devices that view a larger area in order to collect a sufficient amount of energy.

This has implications for remote sensing sensor design. To use the available sensing technology at hand, we must balance wavelength against spatial resolution: if we wish our sensor to have a higher spatial resolution, we may have to use shorter wavelength regions.
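The wavelength-energy trade-off is easy to verify numerically with E = hc/λ; the two wavelengths below are arbitrary illustrative choices.

```python
# Photon energy E = h*c/lambda: shorter wavelengths carry more energy.
h = 6.626e-34   # Planck's constant (J s)
c = 2.998e8     # speed of light (m/s)

def photon_energy(wavelength_m):
    """Energy of a single quantum at the given wavelength (joules)."""
    return h * c / wavelength_m

uv = photon_energy(0.3e-6)      # ultraviolet, 0.3 um
thermal = photon_energy(10e-6)  # thermal infrared, 10 um
print(uv / thermal)             # a UV photon is ~33x more energetic
```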
The second radiation theory is the Stefan-Boltzmann law:

M = σT⁴

where M is the total radiant exitance (W m⁻²), σ = 5.6697 × 10⁻⁸ W m⁻² K⁻⁴ is the Stefan-Boltzmann constant, and T is the absolute temperature (K). This law is expressed for an energy source that behaves as a blackbody: a hypothetical, ideal radiator that absorbs and re-emits all energy incident upon it. Actual matter is not a perfect blackbody. For any matter, we can measure its emitted energy (M) and compare it with the energy emitted from a blackbody at the same temperature (Mb) by

ε = M / Mb

where ε is the emissivity of the matter. A perfect reflector emits nothing, so its ε is 0; a true blackbody has an ε of 1. Most other matter falls between these two extremes.
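The fourth-power dependence makes the contrast between hot and cool radiators dramatic. A sketch using the standard value of the Stefan-Boltzmann constant and rough, assumed temperatures for the Sun and the Earth:

```python
SIGMA = 5.6697e-8  # Stefan-Boltzmann constant (W m^-2 K^-4)

def exitance(T, emissivity=1.0):
    """Radiant exitance M = e * sigma * T**4 (W/m^2)."""
    return emissivity * SIGMA * T ** 4

sun = exitance(5800.0)    # approximate solar surface temperature (K)
earth = exitance(300.0)   # approximate terrestrial temperature (K)
print(sun / earth)        # roughly 1.4e5
```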
The third theory is Wien's displacement law, which specifies the relationship between the wavelength of peak emittance and the temperature of a matter:

λmax = 2897.8 / T

where λmax is in µm and T is in kelvin. As the temperature of a blackbody rises, the wavelength at which it emits its maximum energy becomes shorter.
Figure 2.2 shows blackbody radiation curves for the temperatures of the Sun, an incandescent lamp and the Earth. During the daytime, the energy from the sun is overwhelming. During the night, however, we can use the spectral region between 3 µm and 16 µm to observe the emittance properties of the earth surface.
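Wien's law explains both observations: at roughly 5800 K the Sun peaks in the visible, while the roughly 300 K Earth peaks in the thermal infrared (the temperatures are assumed round values).

```python
def peak_wavelength_um(T_kelvin):
    """Wien's displacement law: wavelength of maximum emittance (um)."""
    return 2897.8 / T_kelvin

print(peak_wavelength_um(5800))  # Sun: ~0.5 um (visible)
print(peak_wavelength_um(300))   # Earth: ~9.7 um (thermal infrared)
```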
At wavelengths longer than the thermal infrared region, i.e. in the microwave region, the natural energy (radiation) level is very low. Therefore, we often use a human-made energy source to illuminate the target (as with radar) and collect the backscatter from the target. A remote sensing system relying on a human-made energy source is called an "active" remote sensing system; remote sensing relying on energy sources that are not human-made is called "passive" remote sensing.
2.4 Energy Interactions in the Atmosphere
The atmosphere affects EM energy transfer differently at different wavelengths. In this section, we mainly introduce the fact that the atmosphere can have a profound effect on the intensity and spectral composition of the radiation that reaches a remote sensing system. These effects are caused primarily by atmospheric scattering and absorption.

Scattering: the redirection of EM energy by particles suspended in the air. Different particle sizes have different effects on EM energy propagation:
dp << λ : Rayleigh scattering (Sr)
dp ≈ λ : Mie scattering (Sm)
dp >> λ : non-selective scattering (Sn)

where dp is the particle diameter and λ the wavelength.
The atmosphere can be divided into a number of well marked horizontal layers on the basis
of temperature.
Troposphere: the zone where weather phenomena and atmospheric turbulence are most marked. It contains 75% of the total molecular and gaseous mass of the atmosphere and virtually all the water vapour and aerosols. Its height ranges from about 8 km (poles) to 16 km (equator).
Stratosphere: up to about 50 km; contains the ozone layer.
Mesosphere: up to about 80 km.
Thermosphere: up to about 250 km.
Exosphere: about 500 - 750 km.
The atmosphere is a mixture of gases occurring in constant proportions up to an altitude of 80 km or more. The exceptions are ozone, which is concentrated in the lower stratosphere, water vapour in the lower troposphere, and carbon dioxide, whose concentration varies with time; it has been increasing since the beginning of this century due to the burning of fossil fuels. Air is highly compressible: half of its mass occurs in the lowest 5 km, and pressure decreases logarithmically with height from an average sea-level value of 1013 mb.
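The logarithmic pressure decrease can be sketched with the barometric formula P = P0 * exp(-h/H). The scale height H of about 7.4 km is an assumed round value, not a figure from this text:

```python
import math

P0 = 1013.0  # average sea-level pressure (mb), as given above
H = 7.4      # assumed atmospheric scale height (km); a rough value

def pressure_mb(h_km):
    """Barometric sketch: pressure falls exponentially with height."""
    return P0 * math.exp(-h_km / H)

# About half of the atmosphere's mass lies below 5 km:
print(pressure_mb(5.0) / P0)
```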
Figure 2.3 Horizontal layers that divide the atmosphere (Barry and Chorley, 1982)
Scattering degrades image quality for earth observation. At higher altitudes, images acquired at shorter wavelengths (ultraviolet, blue) contain a large amount of scattered noise, which reduces image contrast.
Figure 2.4 Major absorption wavelengths by CO2, H2O, O2, O3 in the atmosphere
Transmission: the energy remaining after absorption and scattering by the atmosphere is transmitted.

What happens when the EM energy reaches the earth surface? The incident energy is divided into three parts: reflected, absorbed, and transmitted:

EI(λ) = ER(λ) + EA(λ) + ET(λ)

The ratio of reflected to incident energy, ρ(λ) = ER(λ)/EI(λ), is called the spectral reflectance, or spectral signature.
Our second question is: how is energy reflected by a target? Reflectors can be classified into three cases: specular reflectors, irregular reflectors, and perfect diffusers.

Specular reflection is caused by the surface geometry of a matter. It is of little use in remote sensing because the incoming energy is completely reflected in another direction. Still water, ice and many minerals with crystal surfaces share this property.

A perfect diffuse reflector reflects energy uniformly in all directions. This type of reflector is desirable because the matter can be observed from any direction with the same reflectance.
Unfortunately, most targets behave somewhere between the ideal specular reflector and the perfect diffuse reflector. This makes quantitative remote sensing and target identification purely from reflectance data difficult; otherwise it would be easy to discriminate objects by matching spectral reflectances against a spectral library. Due to this variability of spectral signatures, one current research direction is to investigate the bidirectional reflectance properties of various targets.
Plotting reflectance against wavelength gives a spectral reflectance curve. Examples of spectral curves of typical materials such as vegetation, soil and water are shown in Figure 2.5. Clear water has a low spectral reflectance (< 10%) in the visible region; at wavelengths longer than 0.75 µm, water absorbs almost all incoming energy. Vegetation generally has three reflectance valleys: the one in the red region (around 0.65 µm) is caused by strong absorption by chlorophylls a and b in the leaves, and the other two, at 1.45-1.55 µm and 1.90-1.95 µm, are caused by strong absorption by water in the leaves. Dry soil has a relatively flat reflectance curve; when it is wet, its reflectance drops due to water absorption.
Figure 2.5 Typical Spectral Reflectance Curves for Soil, Vegetation and Water
(Lillesand and Kiefer, 1994)
Questions
1. Using the scattering properties of the atmosphere, explain why the sky is blue under clear conditions. Why does the sun look red at sunset or sunrise?
2. Why are X rays used for medical examination? Using the third radiation law, explain why a piece of iron being heated glows first dark red, then red, then yellow, then white.
3. Describe how the absorptance and transmittance of a matter may be used in remote sensing.
4. Using Figures 2.4 and 2.5 as references, answer the following questions:
5. Can we use 6-7 µm to observe the atmosphere?
6. Can we use 0.8-1.0 µm to observe underwater materials such as plankton?
7. Which spectral regions should be used to observe water content in the atmosphere? What about water content in vegetation?
Chapter 2
References
Barry, R.G., and R.J. Chorley, 1982. Atmosphere, Weather and Climate. Longman: London.
Elachi, C., 1987. Introduction to the Physics and Techniques of Remote Sensing, John Wiley and Sons, Inc.: Toronto
Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd Ed., John Wiley and Sons,
Inc.: Toronto.
A camera system is composed of the camera body, lens, diaphragm, shutter and film (Figure 3.1).

Diaphragm
Shutter: works by opening and closing. The time period between opening and closing controls the amount of energy entering the camera.
The lens equation relates the focal length to the object and image distances:

1/f = 1/do + 1/di

where
f = focal length
do = distance from the lens to the object
di = distance from the lens to the image (Figure 3.2)

For aerial photography, do >> di, so di ≈ f.
The diameter of the diaphragm controls the depth of field: the smaller the diameter of the opened diaphragm, the wider the distance range over which the scene is brought into clear focus. The diaphragm diameter can be adjusted to a particular aperture. What we normally see on a camera's aperture setting is

F 2.8 4 5.6 8 11 16 22

These F-numbers are obtained as f/diameter; as the diameter becomes smaller, the F-number becomes larger and more energy is stopped. The actual amount of energy reaching the film is determined by

E ∝ i t / F²

where
i is the energy intensity (J m⁻² s⁻¹)
t is the exposure time in seconds
F is the F-number mentioned above
E is the energy (J m⁻²)
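Since the energy reaching the film varies as i*t/F**2, each step along the standard aperture sequence multiplies F by about the square root of 2 and therefore halves the energy passed. A sketch (the proportionality constant is omitted):

```python
stops = [2.8, 4, 5.6, 8, 11, 16, 22]  # standard F-number sequence

def relative_exposure(F, i=1.0, t=1.0):
    """Relative energy reaching the film: proportional to i*t/F**2."""
    return i * t / F ** 2

ratios = [relative_exposure(a) / relative_exposure(b)
          for a, b in zip(stops[1:], stops)]
print([round(r, 2) for r in ratios])  # each step passes roughly half
```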
Films
A film is primarily composed of one or more emulsion layers and a base (Figure 3.3).
Figure 3.3. Layers in black and white films and colour films
The most important part of the film is the emulsion layer, which contains light-sensitive chemicals. When it is exposed to light, a chemical reaction occurs and a latent image is formed; after the film is developed, the emulsion layer shows the image.

Films can be divided into negative and positive, or divided by their ranges of spectral sensitivity: black and white (B/W), B/W infrared, colour, and colour infrared. On a developed B/W negative film, the brightest parts of the scene appear darkest and the darker parts appear brighter. On colour negative films, each colour in the scene is recorded by its complementary colour.
There are two important aspects of a film: its spectral sensitivity and its characteristic
curve.
Spectral sensitivity specifies the spectral region to which a film is sensitive (Figure 3.4). Since infrared film is also sensitive to visible light, the visible light must be blocked, which is done by optical filtering (Figure 3.4); in this case a dark red filter can be used to intercept visible light. Similarly, other filters can be used to stop light in certain spectral ranges from reaching the film.
The characteristic curve indicates the response of a film to the level of energy received.
If the density of a film develops quickly when the film is exposed to light, we say the film is fast (Fig. 3.5a); otherwise the film is slow (Fig. 3.5b). Film speed is labelled ASA 100, ASA 200, ..., ASA 1000: the greater the ASA number, the faster the film. High-speed films give good contrast on the image, but low-speed films provide better detail.
Color Films
There are two sets of primary colours: additive primaries and subtractive primaries.
The three additive primaries are red, green, and blue.
The three subtractive primaries are cyan, magenta, and yellow.
All colours can be produced by mixing the three primaries in varying proportions.
Additive primaries apply to the mixing of light (Fig. 3.6a), while subtractive primaries are used for mixing pigments, as in printing (Fig. 3.6b). To represent colours on a medium such as film or colour photographic paper, subtractive colours are needed.
The development procedure for the colour negative film is shown in Figure 3.9.
View Angle:
The view angle is normally determined by the focal length and the frame size of the film. For a given camera the frame size is fixed; therefore, the ground coverage is determined by the altitude and the camera viewing angle (Figure 3.14). The longer the focal length, the narrower the view angle: f1 > f2 > f3 implies a1 < a2 < a3.
Figure 3.14. Viewing angle determined by the focal length
Typical focal lengths:

Lens type      Normal cameras   Aerial cameras
Normal lens    50 mm            300 mm
Wide angle     28 mm            150 mm
Fisheye lens   7 mm             88 mm
Figure 3.15. Resolving power test chart (from Lillesand and Kiefer, 1994).
Ground Coverage
A photograph has a small coverage if it is taken either at a low flight height or with a narrow viewing angle. The advantages of small-coverage photographs are that they provide more detail and less distortion and displacement. Such photographs are easier to analyze because similar targets show less distortion from the centre to the edge of the photograph, and from one photograph to the next. The disadvantage is that more flight time is needed to cover an area, so the cost is higher; moreover, mosaicking may introduce more distortion.
A large coverage can be obtained by photographing from a higher altitude or using a wider angle. Photographs with a large coverage are likely to have poorer photographic resolution, due to the larger viewing angle and a likely stronger atmospheric effect. The advantages are that a large area is covered simultaneously, less mosaicking is required, and the cost is lower. The disadvantages are that targets are difficult to analyze in detail and are more severely distorted.
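The coverage trade-off follows from photo scale: for a vertical photograph, scale = f/H, so the ground side length is the frame size times H/f. The 230 mm frame and 3000 m altitude below are illustrative values (230 mm is a common aerial film format), and the focal lengths follow the aerial lens table above.

```python
def ground_coverage_km(frame_mm, focal_mm, altitude_m):
    """Side length (km) of the ground square covered by a vertical photo.
    Photo scale is f/H, so coverage = frame_size * H / f."""
    return frame_mm / 1000.0 * (altitude_m / (focal_mm / 1000.0)) / 1000.0

# 230 mm x 230 mm frame at 3000 m altitude:
print(ground_coverage_km(230, 300, 3000))  # normal aerial lens: 2.3 km
print(ground_coverage_km(230, 150, 3000))  # wide-angle lens: 4.6 km
```

Halving the focal length doubles the coverage, at the cost of the distortions just described.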
Essentially, the size of photo coverage is related to the scale of the raw aerial photographs.
Choosing photographs with a large coverage or a small one should be based on the
following:
budget at hand
task
equipment available
The following are some of the advantages/disadvantages of aerial photography in
comparison with other types of data acquisition systems:
Advantages:
High resolution (ground)
Flexibility
High geometric reliability
Relatively inexpensive
Disadvantages:
Daylight exposure (10:00 am - 2:00 pm) required
Poorer contrast at shorter wavelengths
Film non-reusable
Inconvenient
Inefficient for digital analysis
What are the differences between a camera system and a scanning system? The following are some of the major differences:
A rotating mirror is added in front of the lens of the camera.
In a scanning system, films are replaced by photo-sensitive detectors and magnetic tapes, which are used to store the collected spectral energy (Figure 3.16).
Figure 3.18. Four image bands with six detectors in each band.
MSSs have been flown on Landsats 1 - 5 and are reliable systems. The spectral region of each band is listed below:

Landsat 1, 2   Landsat 4, 5   Spectral region
B4             B1             0.5 - 0.6 µm
B5             B2             0.6 - 0.7 µm
B6             B3             0.7 - 0.8 µm
B7             B4             0.8 - 1.1 µm
Landsat 3 had a short life. The MSS systems on Landsat 3 were modified as compared to
Landsat 1 and 2. Landsat-6 was launched unsuccessfully in 1993.
Each MSS scene covers an area of 185 km × 185 km, with a spatial resolution of 79 m × 57 m. An advantage of MSS data is that they are less expensive. Sometimes one detector leaves a blank or produces a signal much different from the other detectors, creating banding or striping; we will discuss methods for correcting these problems in Chapter 5.
The TM bands and their spatial resolutions are:

TM1   0.45 - 0.52 µm   30 m
TM2   0.52 - 0.60 µm   30 m
TM3   0.63 - 0.69 µm   30 m
TM4   0.76 - 0.90 µm   30 m
TM5   1.55 - 1.75 µm   30 m
TM7   2.08 - 2.35 µm   30 m
TM6   10.4 - 12.5 µm   120 m
MSS data are collected in only one scanning direction; TM data are collected in both scanning directions (Figure 3.19).
Figure 3.19. Major changes of the TM system as compared to the MSS system.
The HRV multispectral bands are:

B1   0.50 - 0.59 µm
B2   0.61 - 0.68 µm
B3   0.79 - 0.89 µm
A mirror with a view angle of 4.13° is used to allow up to 27° off-nadir observation. An advantage of the off-nadir viewing capability is that it allows more frequent observation of targeted areas on the earth and the acquisition of stereo-pair images. A disadvantage of the HRV sensors is the difficulty involved in calibrating thousands of detectors. The radiometric resolution of MSS is 6 to 7 bits, while both TM and the HRVs have an 8-bit radiometric resolution.

The orbital cycle is 18 days for Landsats 1 - 3, 16 days for Landsats 4 and 5, and 26 days for SPOT-1 (the SPOT HRV sensors can revisit the same target in 3 to 5 days due to their off-nadir observing capability).
The NOAA series was named after the National Oceanic and Atmospheric Administration of the United States.
The AVHRR sensor has 5 spectral channels:

B1   0.58 - 0.68 µm
B2   0.72 - 1.10 µm
B3   3.55 - 3.95 µm
B4   10.30 - 11.30 µm
B5   11.50 - 12.50 µm
The orbital repeat cycle is twice daily, an important feature for frequent monitoring. NOAA AVHRR data have been used for large-scale vegetation and sea-ice studies at continental and global scales.
Two bands have a 250 m resolution, five have 500 m, and the rest have 1000 m. The sensor is planned to provide data covering the entire earth daily.
Two private companies, Lockheed, Inc. and Worldview, Inc., are planning to launch their own commercial satellites within 2-3 years, with spatial resolutions ranging from 1 m to 3 m. In Japan, NASDA (the National Space Development Agency) has developed the Marine Observation System (MOS). On board is a sensor called the Multispectral Electronic Self-Scanning Radiometer (MESSR), with spectral bands similar to those of the Landsat MSS systems; however, the spatial resolution of the MESSR system is 50 m × 50 m.
Other countries such as India and the former USSR have also launched Earth resources
satellites with different optical sensors.
3.4 Airborne Multispectral Systems
Multispectral scanners
The mechanism of airborne multispectral sensors is similar to that of the Landsat MSS and TM. Airborne systems usually have more spectral bands, ranging from the ultraviolet through the visible and near infrared to the thermal region. For example, the widely used Daedalus MSS system has 11 channels, the first 10 ranging from 0.38 to 1.06 µm and the 11th being a thermal channel (9.75 - 12.25 µm).

Another airborne multispectral scanner used for experimental purposes is TIMS, the Thermal Infrared Multispectral Scanner. It has six channels: 8.2 - 8.6, 8.6 - 9.0, 9.0 - 9.4, 9.4 - 10.2, 10.2 - 11.2, and 11.2 - 12.2 µm.
MEIS-II
The Canada Centre for Remote Sensing developed the Multispectral Electro-optical Imaging Scanner (MEIS-II). It uses 1728-element linear CCD arrays to acquire data in eight spectral bands ranging from 0.39 to 1.1 µm. The spatial resolution of MEIS-II can reach up to 0.3 m.
One advantage of multispectral systems over photographic systems is the number of band combinations available for colour display:

Nc = nb(nb - 1)(nb - 2) / 6

where Nc is the total number of 3-band combinations and nb is the number of spectral bands in a multispectral image. For each of these 3-band combinations, we can use red, green, and blue to represent each band and obtain a false-colour image.
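The count of 3-band display combinations grows quickly with the number of bands; it is simply the binomial coefficient C(nb, 3). The band counts below come from sensors described in this chapter:

```python
from math import comb

def n_three_band_combinations(nb):
    """Number of distinct 3-band subsets of an nb-band image."""
    return comb(nb, 3)

print(n_three_band_combinations(4))    # Landsat MSS, 4 bands -> 4
print(n_three_band_combinations(7))    # Landsat TM, 7 bands -> 35
print(n_three_band_combinations(288))  # CASI spectral mode -> 3,939,936
```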
Imaging Spectrometry
Imaging spectrometry refers to the acquisition of images in many very narrow, contiguous spectral bands; the spectral region can range from the visible and near-IR to the mid-IR.

The first imaging spectrometer, the Airborne Imaging Spectrometer (AIS), was developed in 1983 by JPL. It collects data in 128 channels from 1.2 µm to 2.4 µm; each image acquired has only 32 pixels in a line.

The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) is an immediate follow-up of the AIS (1987). It collects 224 bands from 0.40 - 2.45 µm, with 512 pixels in each line.

In Canada, the first system was the FLI (Fluorescence Line Imager), manufactured by Moniteq, a company formerly located in Toronto, Ontario.

In Calgary, ITRES Research produces another imaging spectrometer, the Compact Airborne Spectrographic Imager (CASI) (Figure 3.21).
For each line of ground targets, nb × ns data values are collected at 2-byte (16-bit) radiometric resolution, where nb is the number of spectral bands and ns is the number of pixels in a line. Due to constraints on the data transmission rate, these nb × ns values cannot all be transferred. This leads to two operating modes for CASI, a spectral mode and a spatial mode. In the spectral mode, all 288 spectral bands are used, but only up to 39 spatial pixels (look directions) can be transferred. In the spatial mode, all 512 spatial pixels are used, but only up to 16 spectral bands can be selected.
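The two modes can be checked against the transmission budget with a little arithmetic; the byte counts follow directly from the figures above (288 bands, 512 pixels, 2 bytes per sample):

```python
BYTES_PER_SAMPLE = 2  # 16-bit radiometric resolution

def bytes_per_line(n_bands, n_pixels):
    """Raw data volume CASI would generate for one image line."""
    return n_bands * n_pixels * BYTES_PER_SAMPLE

full = bytes_per_line(288, 512)      # complete cube, too much to transfer
spectral = bytes_per_line(288, 39)   # spectral mode
spatial = bytes_per_line(16, 512)    # spatial mode
print(full, spectral, spatial)
```

Both modes cut the full 294,912 bytes per line by more than an order of magnitude.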
Radar systems may be grouped as:
Non-imaging (typically ground based)
Imaging radar (typically air based)
SLAR systems produce continuous strips of imagery depicting very large ground areas located adjacent to the aircraft flight line. Since clouds are transparent in the microwave region, SLAR has been used to map tropical areas such as the Amazon River Basin. Started in 1971 and completed in 1976, project RADAM (Radar of the Amazon) was the largest radar mapping project ever undertaken; in this project the Amazon area was mapped for the first time. In such remote and cloud-covered areas of the world, radar is a prime source of information for mineral exploration, forest and range inventory, water supply, transportation management and site suitability assessment.

Radar imagery is currently neither as available nor as well understood as other image products. An increasing amount of research is being conducted on the interaction mechanisms between microwave energy and surface targets, such as forest canopies, and on the combination of radar imagery with other image products.
SLAR system organization and operation are shown in Figure 3.22.
From Figure 3.23, it can be seen that SLAR depends on the time it takes for a transmitted
pulse to be scattered back to the antenna to determine the position of a target.
In the across-track direction, there is a spatial resolution determined by the duration of a
pulse, τ, and the depression angle, γ (Figure 3.24). This resolution is called the ground
range resolution, rg:

rg = c·τ / (2·cos γ)

The along-track distinguishing ability of a SLAR system is called the azimuth resolution, ra:

ra = S·β = S·λ / L

where S is the slant range, β is the antenna beamwidth, λ is the wavelength and L is the
antenna length.
It is obvious that in order to minimize rg, one needs to reduce the pulse duration τ. For the
case of ra, the optimal situation is determined by β, which is a function of the wavelength
and the antenna length.
For a real aperture radar, the physical antenna length must be considerably longer than the
wavelength in order to achieve higher azimuth resolution. Obviously, there is a limit at which
the dimension of the antenna becomes unrealistic to put onboard an aircraft or a satellite.
This limitation is overcome in synthetic aperture radar (SAR) systems. Such systems use
a short physical antenna, but through modified data recording and processing techniques,
they synthesize the effect of a very long antenna. This is achieved by making use of the
Doppler effect (Figure 3.26).
If the same antenna is used for both transmitting and receiving, then the received power is
given by the radar equation:

Pr = Pt·G²·λ²·σ / ((4π)³·R⁴)

where Pt is the transmitted power, G is the antenna gain, R is the range to the target, and σ
is the backscattering cross-section. All parameters in this formula except σ are determined
by the system. Only σ is related to the ground target. Unfortunately, σ is a poorly
understood parameter, which largely limits its use in remote sensing.
We know that σ is related not only to system variables, including wavelength, polarization,
azimuth, landscape orientation, and depression angle, but also to landscape parameters,
including surface roughness, soil moisture, vegetation cover, and micro-topography.
Soil moisture influences the dielectric constant of the target, which in turn can significantly
change the backscattering pattern of the signal. Moisture also reduces the penetrating
capability of microwaves.
Roughness is characterized by the standard deviation S(h) of the heights of individual facets.
In the field, we use an array of sticks arranged parallel to each other at a constant
distance interval to measure the surface roughness.
A common definition of a rough surface is one whose S(h) exceeds one eighth of the
wavelength divided by the cosine of the incidence angle θ:

S(h) > λ / (8·cos θ)
As we illustrated in the spectral reflectance section, a smooth surface tends to reflect all
the incoming energy at an angle equal to the incidence angle, while a rough surface tends to
scatter the incoming energy more or less in all directions.
Polarization
Microwave energy can be transmitted and received by the antenna at a selected orientation
of the electromagnetic field. The orientation, or polarization, of the EM field is labelled as
the horizontal (H) or vertical (V) direction. The antenna can transmit using either
polarization. This makes it possible for a radar system to operate in any of four modes:
transmit H and receive H (HH), transmit H and receive V (HV), transmit V and receive H
(VH), and transmit V and receive V (VV). By operating in different modes, the polarizing
characteristics of ground targets can be obtained.
Corner reflector
A corner reflector tends to collect the signal incident on its foreground and return it
directly to the antenna.
Microwave Bands

Band    Wavelength        Frequency
Ka      0.75 - 1.1 cm     40 - 26.5 GHz
K       1.1 - 1.67 cm     26.5 - 18 GHz
Ku      1.67 - 2.4 cm     18 - 12.5 GHz
X       2.4 - 3.75 cm     12.5 - 8 GHz
C       3.75 - 7.5 cm     8 - 4 GHz
S       7.5 - 15 cm       4 - 2 GHz
L       15 - 30 cm        2 - 1 GHz
P       30 - 100 cm       1 GHz - 300 MHz
Geometric Aspects
Radar uses two types of image recording systems: a slant-range image recording system
and a ground-range image recording system.
In a slant-range recording system, the spacing of targets is proportional to the time interval
between returning signals from adjacent targets.
If the terrain is flat, we can convert the slant range SR to the ground range GR using the
flying height H:

GR = sqrt(SR² - H²)
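Under the flat-terrain assumption, the slant-to-ground-range conversion GR = sqrt(SR² - H²), with H the platform altitude, can be sketched as:

```python
import math

def ground_range(slant_range, altitude):
    """Convert slant range SR to ground range GR over flat terrain:
    GR = sqrt(SR^2 - H^2), where H is the platform altitude (same units)."""
    if slant_range < altitude:
        raise ValueError("slant range cannot be shorter than the altitude")
    return math.sqrt(slant_range**2 - altitude**2)

# e.g. a target at 13 km slant range, seen from 5 km altitude
print(ground_range(13.0, 5.0))  # 12.0
```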
Relief distortion
Space-borne radars
Frequency: L band
Swath width: 100 km, centred at 20° from nadir
Polarization: HH
Ground resolution: 25 m x 25 m
The European Space Agency launched a satellite in 1991: ERS-1, with a C-band
SAR sensor.
In 1992, the Japanese JERS-1 satellite was launched with an L-band radar on board.
The L-band radar has a higher penetration capability than the C-band SAR.
Radarsat
Scheduled to be launched in mid 1995, Radarsat will contain a SAR system which is
very flexible in terms of configurations of incidence angle, resolution, number of
looks and swath width.
Radarsat
Frequency:
Altitude: 792 km
Repeat cycle: 16 days
Radarsat subcycle: 3 days
Period:
Equatorial crossing: 6:00 A.M.
Platform
Satellite Orbits
Chapter 3
References
Ahmed, S. and H.R. Warren, 1989. The Radarsat System. IGARSS'89/12th Canadian Symposium on Remote
Sensing. Vol. 1. pp.213-217.
Anger, C.D., S. K. Babey, and R. J. Adamson, 1990, A New Approach to Imaging Spectroscopy, SPIE
Proceedings, Imaging Spectroscopy of the Terrestrial Environment, 1298: 72 - 86. - specifically, CASI
Curlander, J.C., and McDonough R. N., 1991. Synthetic Aperture Radar, Systems & Signal Processing. John Wiley
and Sons: New York.
Elachi, C., 1987. Introduction to the Physics and Techniques of Remote Sensing. John Wiley and Sons, New York.
King, D., 1992. Development and application of an airborne multispectral digital frame camera sensor. XVIIth
Congress of ISPRS, International Archives of Photogrammetry and Remote Sensing. B1:190-192.
Lenz, R. and D. Fritsch, 1990. Accuracy of videometry with CCD sensors. ISPRS Journal of Photogrammetry and
Remote Sensing, 90-110.
Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd. Ed., John Wiley and Sons,
Inc.: Toronto.
Luscombe, A.P., 1989. The Radarsat Synthetic Aperture Radar System. IGARSS'89/12th Canadian Symposium
on Remote Sensing. Vol. 1. pp.218-221.
Staenz, K., 1992. A decade of imaging spectrometry in Canada. Canadian Journal of Remote Sensing. 18(4):187-197.
Multispectral images construct a special feature space, a multispectral space Sk. In Sk, each
unit becomes a grey-level vector g = (g1, g2, ..., gk)T. In multispectral images, each pixel
has a grey-level vector. There are other types of images which add additional dimensions
to the feature space. In the feature space, various operations can be performed. One of
these operations is to classify the feature space into groups with similar grey-level vectors
and give each group the same label with a specific meaning. The classification decision made
for each image pixel is in feature space and the classification result is represented in image
space. Such an image is a thematic image, which can also be used as an additional
dimension in feature space for further analysis.
4.1.1 Pixel Window
A pixel window is defined in image space as a group of neighbouring pixels. For
computational simplicity, a square pixel neighbourhood wl(i,j) centred at pixel I(i,j) with a
window lateral length of l is preferred. Without further explanation, we refer to a pixel
window as wl(i,j). In order to ensure that I(i,j) is located at the centre of the pixel window,
l must be an odd number. It is obvious that the size of a pixel window wl(i,j)
is l x l. The following conditions hold for a pixel window:

1 <= l <= min(number of rows, number of columns), with l odd.
This means that the minimum pixel window is the centre pixel itself, and the maximum
pixel window could be the entire image space, provided that the image space is a square
with an odd number of rows and columns. When the image space has more than one image,
a pixel window can be used to refer to a window located in any one image or any
combinations of those images.
4.1.2 Image Histogram
A histogram has two meanings: a table of occurrence frequencies of all vectors in
feature space, or a graph plotting these frequencies against all the grey-level vectors. The
occurrence frequency in the histogram is the number of pixels in the image segment having
the same vector. When the entire image space is used as the image segment, the histogram
is referred to as h(I). When a histogram is generated from a specific pixel window, it is
identified as hl(i,j), where l, i, and j are the same as above. In practice, a one-dimensional
feature space is mainly used. In this case, a histogram is a graphical representation of a
table with each grey level as an entry. Corresponding to each grey level is its
occurrence frequency f(vi), i = 0, 1, 2, ..., Nv-1, where Nv is the number of grey levels of an
image (e.g., Nv = 8 in Figure 4.2).
From a histogram h(I) we can derive the cumulative histogram hc(I)={fc(vi) , i = 0, 1, 2, ...,
Nv-1}. This is obtained for each grey level by summing up all frequencies whose grey
levels are not higher than the particular grey level under consideration (Figure 4.3).
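The histogram h(I) and the cumulative histogram hc(I) for a one-dimensional feature space can be sketched as follows, using a toy 2 x 3 image with Nv = 4 grey levels:

```python
import numpy as np

def histogram_and_cumulative(image, n_levels):
    """h(I): occurrence frequency of each grey level;
    hc(I): running sum of frequencies for levels not higher than each level."""
    h = np.bincount(image.ravel(), minlength=n_levels)
    hc = np.cumsum(h)
    return h, hc

img = np.array([[0, 1, 1],
                [2, 1, 0]])
h, hc = histogram_and_cumulative(img, 4)
print(h.tolist(), hc.tolist())  # [2, 3, 1, 0] [2, 5, 6, 6]
```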
number of grey levels that is produced. The finer these resolutions are, the closer the
information recorded in the image is to the real world, and the larger the sizes of the
image space and the grey-level vector space are. The size (or alternatively the number of
pixels) of the image space, S(I), has an exponential relation with the spatial resolution rs,
and so does the size (or the number of vectors) of the feature space, S(V), with the
radiometric resolution rr. Their relations take the following forms:

S(I) ∝ (1/rs)²
S(V) = (2^rr)^k
where k, as defined above, is the number of images in the image space. While S(I) has a
fixed exponential order of 2 with rs, S(V) depends not only on rr, but also on k. The
number of vectors in Vk becomes extremely large when k grows while rr is unchanged. For
example, each band of a Landsat TM or SPOT HRV image is quantized into 8 bits (i.e., an
image has 256 possible grey levels). Thus, when k = 1, S(V) = 256 and when k = 3, S(V) =
16,777,216. If a histogram is built in such a three-dimensional multispectral space, it would
require at least 64 Megabytes of random access memory (RAM) or disk storage to process
it. Therefore, the feature space has to be somehow reduced for certain analyses.
4.1.4 Image Formats
A single image can be represented as a 2-dimensional array. A multispectral image can be
represented in a 3-dimensional array (Figure 4.4)
The most popular ones include Band Sequential (BSQ), Pixel Interleaved, Line Interleaved
(BIL), and separate files. These formats can be illustrated using the following example of a
three-band multispectral image, two lines by three pixels:
AAA BBB CCC
AAA BBB CCC
Band 1 Band 2 Band 3
BIL is typically used by the Landsat Ground Station Operators' Working Group
(LGSOWG)
AAA BBB CCC, AAA BBB CCC
Band Sequential BSQ takes the following form:
AAA AAA BBB BBB CCC CCC
Pixel Interleaved format is used by PCI. It takes the form of:
ABC ABC ABC ABC ABC ABC
These are the general formats that are being used. BIL is suitable for data transfer from the
sensor to the ground. It does not need a huge buffer for data storage on the satellite if the
ground station is within the transmission coverage of the satellite.
Pixel interleaved is suitable for pixel-based operation or multispectral analysis.
Band sequential and separate file formats are the proper forms to use when we are more
interested in single-band image processing, such as image matching, correlation, geometric
correction, and when we are more concerned with spatial information processing and
extraction. For example, we use these files when linear features or image texture are of our
concern.
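The three interleavings amount to different axis orders of one and the same array. A sketch using numpy, with a hypothetical 2-line, 3-pixel, 3-band image whose flattened order mirrors the AAA BBB CCC illustrations above:

```python
import numpy as np

# Hypothetical image stored band-sequential (BSQ): axis order (band, line, pixel).
bsq = np.arange(18).reshape(3, 2, 3)

bil = bsq.transpose(1, 0, 2)  # (line, band, pixel): per line, band 1 run, band 2 run, ...
bip = bsq.transpose(1, 2, 0)  # (line, pixel, band): bands interleaved pixel by pixel

# First flattened line of the BIL view: band-1 pixels, band-2 pixels, band-3 pixels.
print(bil.ravel()[:9].tolist())  # [0, 1, 2, 6, 7, 8, 12, 13, 14]
# First pixel of the BIP view: one value from each band.
print(bip.ravel()[:3].tolist())  # [0, 6, 12]
```

Since the transposes are views, converting between formats in memory costs nothing until the data are written out in flattened order.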
4.2 Factors Affecting Image Geometry
In remote sensing there are three major forms of imaging geometry, as shown in Figure 4.5.
The first one is central perspective. It is the simplest because the entire image frame is
defined by the same set of geometrical parameters. In the second imaging geometry, each
pixel has its own central perspective. This is the most complicated because each pixel has
to be corrected separately if geometrical distortion exists. In the third, each line of an
image has a central perspective.
The platform status, which can be represented by six parameters, affects the image
geometry:

(X, Y, Z, ω, φ, κ)

Other factors include attitude variations of an airborne platform, earth rotation (which
affects satellite imagery) and continental drift.
Most remote sensing satellites for earth resources studies, such as the Landsat series and
SPOT, use a Sun-synchronous polar orbit around the earth (Figure 4.6) so that they
pass over the same area on the earth at approximately the same local time. Most of the
earth's surface can be covered by these satellites.
Figure 4.6. Sun synchronous polar orbit for Earth resources satellites
The effects of roll, pitch and yaw along the direction of satellite orbit or the airplane flight
track can be illustrated by using Figure 4.7.
Figure 4.7. The effects of roll, pitch and yaw on image geometry
Figure 4.8. Georeferencing is a transformation from the image space to the geographical coordinate
space
In order to achieve this transformation rigorously, every step involved in the imaging
process has to be known, i.e., we need to know the inverse process of geometric
transformation.
This is a complex and time-consuming process. However, there is a simpler and widely-used
alternative: polynomial approximation.
Coefficients a's and b's are determined by using Ground Control Points (GCPs).
For example, we can use very low order polynomials such as the affine transformation
u = ax + by + c
v = dx + ey + f
A minimum of 3 GCPs will enable us to determine the coefficients in the above equations.
In this way, we don't need to use the transformation matrix T. However, in order to make
our coefficients representative of the whole image that is transformed, we have to make
sure that our GCPs are well distributed all over the image.
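Fitting the affine coefficients from GCPs can be sketched as a least-squares solve. The GCP values below are hypothetical, chosen to satisfy u = 2x + 1 and v = y + 3 exactly:

```python
import numpy as np

def fit_affine(gcps_xy, gcps_uv):
    """Least-squares fit of u = a*x + b*y + c, v = d*x + e*y + f
    from n >= 3 ground control points."""
    xy = np.asarray(gcps_xy, dtype=float)
    uv = np.asarray(gcps_uv, dtype=float)
    A = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
    coef_u, *_ = np.linalg.lstsq(A, uv[:, 0], rcond=None)
    coef_v, *_ = np.linalg.lstsq(A, uv[:, 1], rcond=None)
    return coef_u, coef_v  # (a, b, c), (d, e, f)

# Hypothetical GCPs consistent with u = 2x + 1, v = y + 3
xy = [(0, 0), (1, 0), (0, 1), (1, 1)]
uv = [(1, 3), (3, 3), (1, 4), (3, 4)]
(a, b, c), (d, e, f) = fit_affine(xy, uv)
print(round(a, 6), round(c, 6))  # 2.0 1.0
```

With more than three well-distributed GCPs the fit is over-determined, and the residuals give a direct check on how representative the coefficients are.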
The third choice is to combine the T⁻¹ method with the polynomial technique in
order to reduce the transformation errors involved in the direct transformation of T⁻¹
(Figure 4.9).
We can use GCPs to refine the coefficients. Global Positioning System (GPS) and/or
Inertial Navigation System (INS) techniques can also be used. The integration of GPS and
INS with remote sensing sensors is being investigated (Schwarz et al., 1993).
(2) Divide output grid into blocks (Figure 4.10):
Figure 4.11. Further transformation from u-v space to Dx-Dy space using lower order
polynomials
The choices are:
(i) Affine
(ii) Bilinear
Why is (ii) called bilinear? Because each coordinate can be written as the product of two
linear functions, one of x and one of y:

u = (a + bx)(c + dy)

Since there are four known points and four unknowns, we can solve (i) using least
squares and (ii) using the regular solution of an equation system. We will only show how
to obtain a0, a1, a2, a3 in (ii).
Substituting the coordinates of the four known points into the bilinear form yields four
equations in a0, a1, a2, a3, which can be solved directly. Similarly, we can obtain b0, b1,
b2, b3.
Why do we use bilinear instead of affine? It is because the bilinear transformation
guarantees the continuity from block to block in the detailed mapping. The geometric
interpolation of bilinear transformation is illustrated in Figure 4.12.
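Solving for a0, a1, a2, a3 from the four corner points amounts to a 4 x 4 linear system. A sketch with hypothetical corner values:

```python
import numpy as np

def bilinear_coeffs(corners_xy, values_u):
    """Solve u = a0 + a1*x + a2*y + a3*x*y exactly from four corner points."""
    A = np.array([[1, x, y, x * y] for x, y in corners_xy], dtype=float)
    return np.linalg.solve(A, np.asarray(values_u, dtype=float))

corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
u_vals = [10.0, 20.0, 30.0, 50.0]   # hypothetical u coordinates at the corners
a = bilinear_coeffs(corners, u_vals)
print(a.tolist())  # [10.0, 10.0, 20.0, 10.0]
```

Along a shared block edge the bilinear value depends only on the two corners of that edge, which is why adjacent blocks join without discontinuity.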
By substituting the n GCP coordinates into (1) and (2), we obtain two groups of
over-determined equations, which can be solved by least squares.
The results will appear as in Figure 4.13. Pixel position (1, 1) may be transformed to
(4850.672, 625.341).
For a pixel location in x-y space, the corresponding coordinates * in u-v space are found
through T⁻¹. To determine the grey level at the * location in u-v space, interpolation
strategies are used. These include:
Nearest neighbour interpolation
Bilinear interpolation (linear in each dimension)
Cubic interpolation (a special case of spline)
There are other interpolation methods, such as the sinc function, spline functions,
etc. The most commonly used methods in remote sensing are, however, the three listed
above.
Nearest neighbour interpolation simply assigns to * the value of the pixel that is closest
to it, as shown below:
In general, interpolation can be written as a convolution, Z(u) = Σ f(i)·w(u - i), where w is
the weight function. Since most weight functions are limited to a local neighbourhood, only
a limited number of i's need to be used.
For instance, in nearest neighbour (NN) interpolation, i takes the value closest to u. In
linear interpolation, l takes the nearest integer less than or equal to u and h takes the
nearest integer greater than u. In cubic interpolation, l takes the two nearest integers less
than or equal to u, while h takes the two nearest integers greater than u.
For the sinc function,

sinc(x) = sin(πx) / (πx)

x can extend to infinity, but we usually use a limited number of terms, up to about 20.
According to the above introduction of convolution, for the nearest neighbour case the
weight function is w(x) = 1 for |x| <= 0.5 and w(x) = 0 otherwise.
Then Z(u, v) is obtained by applying the convolution along the dashed line. The convolution
process for all three interpolation cases can be shown by
For linear:
l = nearest integer equal or smaller than u
m = nearest integer larger than u
For Cubic:
l = nearest two integers equal or smaller than u
m = nearest two integers larger than u.
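The nearest neighbour and bilinear cases can be sketched directly (row index v, column index u; the cubic case follows the same pattern with four neighbours per axis):

```python
import numpy as np

def nearest(img, u, v):
    """Nearest neighbour: take the grey level of the closest pixel."""
    return img[int(round(v)), int(round(u))]

def bilinear(img, u, v):
    """Bilinear: interpolate linearly in u, then in v, from the 4 surrounding pixels."""
    l, m = int(np.floor(u)), int(np.floor(v))
    du, dv = u - l, v - m
    top = (1 - du) * img[m, l] + du * img[m, l + 1]
    bot = (1 - du) * img[m + 1, l] + du * img[m + 1, l + 1]
    return (1 - dv) * top + dv * bot

img = np.array([[0.0, 10.0],
                [20.0, 30.0]])
print(nearest(img, 0.4, 0.4), bilinear(img, 0.5, 0.5))  # 0.0 15.0
```

Nearest neighbour preserves the original grey levels (no new values are created), while bilinear produces smoother results at the cost of some blurring.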
Chapter 4
References
Jensen, J.R., 1986. Digital Image Processing, a Remote Sensing Perspective.
Schwarz, K-P., Chapman, M.A., Cannon, M.E. and Gong, P., 1993. An integrated INS/GPS approach to the
georeferencing of remotely sensed data. Photogrammetric Engineering and Remote Sensing, 59(11): 1667-1673.
Shlien, S., 1979. Geometric correction, registration, and resampling of Landsat Imagery. Canadian Journal of
Remote Sensing. 5(1):74-87.
5. Radiometric Correction
In addition to distortions in image geometry, image radiometry is affected by factors such
as system noise, sensor malfunction and atmospheric interference. The purpose of
radiometric calibration is to remove or reduce detector inconsistencies, sensor
malfunction, viewing geometry effects and atmospheric effects. We will first introduce the
calibration of detector responses.
Figure 5.1. Images acquired using detectors in linear array sensors and in scanners
The problem is that no two detectors function in exactly the same way. If the problem is
serious, we will observe banding or striping on the image.
There are two types of approach to overcome the detector response problems: absolute
calibration and relative calibration.
5.1.1 Absolute calibration
In this mode, we attempt to establish a relationship between the image grey level and the
actual incoming reflectance or radiation. A reference source is needed for this mode,
ranging from a laboratory light, to an on-board light, to the actual ground reflectance or
radiation.
For CASI, each detector is calibrated by the manufacturer in the laboratory. For the
Landsat MSS, a calibration wedge with 6 different grey levels is used. For the Landsat TM,
three lamps, which have 8 brightness combinations, are used.
In any case, a linear response is assumed for each detector:

vo = a·vi + b

where vo is the observed reading and vi is the known source reading
(e.g., for an 8-bit image, 0 <= vo <= 255).
The least squares method is used to derive a and b (Figure 5.2).
Figure 5.2. Responses of the six Landsat MSS detectors. A least squares linear
fitting is applied to these detector responses.
Once each detector is calibrated, the calibrated image data (digital numbers) can be
converted into radiances or spectral reflectances. For the case of converting the digital
numbers DN of an 8-bit image into radiances, we have

L = Lmin + (Lmax - Lmin)·DN / 255
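A sketch of the linear DN-to-radiance conversion, L = Lmin + (Lmax - Lmin)·DN / 255; the calibration constants below are hypothetical, standing in for a band's published Lmin and Lmax:

```python
def dn_to_radiance(dn, l_min, l_max, dn_range=255):
    """Linear conversion of a calibrated digital number to radiance:
    L = Lmin + (Lmax - Lmin) * DN / DNrange."""
    return l_min + (l_max - l_min) * dn / dn_range

# Hypothetical band calibration constants, in the sensor's radiance units
print(dn_to_radiance(255, 0.5, 24.0))  # 24.0 (full-scale DN maps to Lmax)
print(dn_to_radiance(0, 0.5, 24.0))    # 0.5  (zero DN maps to Lmin)
```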
Even though data may have been absolutely calibrated, an image may still have problems
caused by sensor malfunction. For example, in some early Landsat-1, 2, 3 images, there may
be lines which have dropped out: no response from the corresponding detector can be found.
In other cases, there are still striping problems. This happens to both MSS and TM images.
The striping problem is most obvious when an image is acquired over a water body, where
the actual spectral reflectances from one part to another are similar (Figure 5.3).
Figure 5.3. When the six detectors of the Landsat MSS see the same
water target, their responses should be the same.
A simple relative correction is to force each detector towards a desirable mean M and a
desirable standard deviation S: if a detector's samples have mean m and standard
deviation s, each of its grey levels g is adjusted to

g' = (S/s)·(g - m) + M

For an 8-bit image, you may try M = 128 and S = 50, or you may use the mean and
standard deviation calculated from the entire sample.
This may not always work. The assumption behind this strategy is that detector responses
are linear.
The assumption behind histogram balancing is that each detector has the same probability
of seeing the scene and, therefore, the grey-level distribution functions should be the same.
Thus, if two detectors have different histograms (a discrete version of the grey-level
distribution function), they should be corrected to have the same histogram.
This is usually done by comparing their cumulative histograms, as shown in Figure 5.4.
For each given grey level g2, find its cumulative frequency fc2(g2) in F2. Then in F1, find
the grey level g1 such that its cumulative frequency fc1(g1) = fc2(g2). Then assign g1 to
g2 in the histogram being adjusted.
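The matching step can be sketched as a grey-level lookup: for each level of the detector being adjusted, find the reference level whose cumulative frequency first reaches the same value. The cumulative histograms below are hypothetical, with equal pixel totals:

```python
import numpy as np

def match_to_reference(g2_levels, cum_hist_1, cum_hist_2):
    """For each grey level g2 of the detector being adjusted, find the grey
    level g1 of the reference detector whose cumulative frequency first
    reaches fc2(g2), and map g2 -> g1."""
    return np.searchsorted(cum_hist_1, cum_hist_2[g2_levels])

# Hypothetical cumulative histograms over 4 grey levels (same total of 100)
fc1 = np.array([10, 40, 80, 100])   # reference detector, F1
fc2 = np.array([30, 60, 90, 100])   # detector to adjust, F2
print(match_to_reference(np.arange(4), fc1, fc2).tolist())  # [1, 2, 3, 3]
```

The resulting lookup table is then applied to every pixel recorded by the detector being adjusted.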
5.2 Atmospheric Correction of Remotely Sensed Data
Atmospheric correction is a major issue in visible or near-infrared remote sensing because
the presence of the atmosphere always influences the radiation from the ground to the
sensor.
The radiance Ls that reaches a sensor can be determined from the digital number DN by

Ls = Lmin + (Lmax - Lmin)·DN / DNrange

Normally Lmax, Lmin and DNrange are known from the sensor manufacturer or operator.
However, Ls is composed of contributions from the target, the background and the
atmosphere (Figure 5.5):
Figure 5.5 Target, background and scattered radiation received by the sensor.
As introduced before, the atmosphere has severe effects on visible and near-infrared
radiance. First, it modifies the spectral and spatial distribution of the radiation incident on
the surface. Second, the reflected radiance is attenuated. Third, atmospherically scattered
radiance, called path radiance, is added to the transmitted radiance.
Assuming that Ls is the radiance received by a sensor, it can be divided into LT and LP
LS = LT + LP (1)
LT is the transmitted radiance.
LP is atmospheric path radiance.
Obviously, our interest is to determine LT.
For a given spectral interval, the solar irradiance reaching the earth's surface, EG, consists
of the directly transmitted solar irradiance plus the diffuse sky irradiance. For a Lambertian
target of reflectance ρ, the transmitted radiance is

LT = ρ·EG·Te / π

where Te is the transmittance along the viewing direction. Therefore, in order to
quantitatively analyze remotely sensed data, i.e. to find ρ, the atmospheric transmittance
and the path radiance Lp have to be known.
5.2.1 Single scattering atmospheric correction
Path radiance Lp
Lp is determined by at least two parameters: the single scattering albedo ω and the single
scattering phase function.
The single scattering albedo ω = 1 when no attenuation occurs. The single scattering phase
function denotes the fraction of radiation which is scattered from its initial forward
direction into some other direction.
For a Rayleigh atmosphere, the phase function is symmetric between the forward and
backward directions.
From the above diagram, it can be seen that forward scattering is dominated by aerosols
while backscattering is mainly due to Rayleigh scattering.
A number of path radiance determination algorithms exist for the nadir view usually used
by Landsat MSS, TM and SPOT HRV. For aerosol scattering, the phase function does not
change much as the wavelength changes, so the function for λ = 0.7 µm can be used for all
wavelengths. This function is usually found in diagram or table form; see, for example, the
function given in Forster (1984).
In this section, we have only tried to introduce some basic concepts of this complex topic.
The above is only a single-scattering correction algorithm for the nadir viewing condition.
More sophisticated algorithms which account for multiple scattering do exist. Some
examples are LOWTRAN 7, 5S (Simulation of the Satellite Signal in the Solar Spectrum)
and 6S (Second Simulation of the Satellite Signal in the Solar Spectrum, which adds
aircraft observations and target altitude). FORTRAN codes are available for these
algorithms. The 5S and 6S codes were proposed by Tanre and his colleagues (e.g. Tanre et
al., 1990, IGARSS'90, p. 187).
One has to be careful when conducting atmospheric correction, since there are many factors
to be accounted for and estimated. If these estimations are not properly made, the
atmospheric correction might add more bias than the atmosphere itself does.
5.2.2 Dark-target atmospheric correction
This method is most suitable for clear-sky conditions, when the Rayleigh atmosphere
dominates: Rayleigh scattering affects short wavelengths, particularly the visible, and clear
deep water has a very low spectral reflectance in the short wavelength region. If a relatively
large water body, say 1-2 km in diameter, can be found on an image, we can use the
radiance of water derived from the image, Lw, and the real water radiance, L, to estimate Lp:

Lw = K·DNwater + Lmin
Lp = Lw - L

Lp can then be subtracted from the other radiances in the image for the visible channels.
For the infrared channels, the Rayleigh atmosphere has little effect and Lp is assumed to
be 0.
It can be seen that this method only applies to a Rayleigh atmosphere.
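A sketch of the dark-target estimate, with hypothetical gain K, offset Lmin and mean water DN; the true water radiance L is taken as 0, as is commonly assumed for clear deep water in the visible:

```python
def dark_target_path_radiance(dn_water, k, l_min, l_water_true=0.0):
    """Estimate path radiance from a clear deep-water target:
    Lw = K * DN_water + Lmin;  Lp = Lw - L (true water radiance)."""
    lw = k * dn_water + l_min
    return lw - l_water_true

# Hypothetical values: mean water DN of 12, gain K = 0.1, Lmin = 0.5
lp = dark_target_path_radiance(12, 0.1, 0.5)
print(round(lp, 2))  # 1.7
```

The estimated Lp is then subtracted from every pixel radiance in that visible band.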
5.2.3 Direct digital number to reflectance transformation
This can be done by
R = a DN + b
By tying the ground reflectance measured during the flight overpass to the corresponding
pixel values on the image, we can solve the equation to obtain a and b. This is an empirical
method. In fact, both the dark-target and direct digital number conversion methods have
been most widely used in remote sensing.
5.3 Topographic Correction
In previous sections, we attempted to correct the atmospheric effects, i.e. convert image
digital numbers (DNs) to image radiances (Ls). After atmospheric correction, we expect to
have the spectral reflectivity ρ.
Assuming that atmospheric effects can be completely removed from the image, the spectral
reflectivity obtained contains the real target reflectance r and the topographic modification
during image acquisition, G:

ρ = r·G

G contains information about the geometric relationship between the viewing and energy
incidence directions.
The Moon can be considered, approximately, as a surface that reflects an equal amount of
light in all directions.
5.3.1 The role of relief
What effects does the relief have on the image radiometry? To answer this question, a
different coordinate system will be used and Figure 5.6 shows this image coordinate
system. In this coordinate system, z is the viewing direction and x-y plane is the image
plane.
The actual relief for a small area is defined by its surface normal, expressed through the
gradients p and q, and the light source direction is defined by ps and qs.
In the discrete case, p and q are the differences between the elevations of neighbouring
cells and the grid cell under consideration.
5.3.2 Gradient Space
For a perfectly white surface, r = 1.
If r is the same over the whole study area, we can use two reflectance maps, R1(p, q) and
R2(p, q), obtained under two different light source directions, to recover (p, q). Similarly,
we can use three reflectance maps, R1(p, q), R2(p, q) and R3(p, q), to recover both r and
(p, q).
Using (p, q), we can generate a shaded map based on a DEM of an area.
Instead of calculating the reflectance of (p, q) for each grid cell on a DEM, we can
precompute a two-dimensional lookup table indexed by p and q (e.g. p, q = -0.2, -0.1, 0,
0.1, 0.2). The entire DEM {p, q} can then be mapped using this table.
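A minimal shaded-map sketch under a Lambertian assumption: the brightness of each cell is the cosine of the angle between the surface normal (-p, -q, 1) and the source direction (-ps, -qs, 1), computed here directly for every cell rather than through a lookup table:

```python
import numpy as np

def shaded_relief(dem, ps, qs):
    """Lambertian shading of a DEM: reflectance proportional to the cosine of
    the angle between surface normal (-p, -q, 1) and source (-ps, -qs, 1)."""
    q, p = np.gradient(dem)          # q: gradient along rows (y), p: along columns (x)
    num = p * ps + q * qs + 1.0      # dot product of the two vectors
    den = np.sqrt(p**2 + q**2 + 1.0) * np.sqrt(ps**2 + qs**2 + 1.0)
    return np.clip(num / den, 0.0, 1.0)

dem = np.array([[0.0, 1.0],
                [0.0, 1.0]])         # a tiny plane tilted in the x direction
shade = shaded_relief(dem, 0.2, 0.0)
print(shade.shape)  # (2, 2)
```

A lookup table version would simply quantize (p, q) and index a precomputed array of these cosines, trading accuracy for speed on large DEMs.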
The surface normal in vector form is (-p, -q, 1).
The light source direction in vector form is (-ps, -qs, 1).
The look direction is (0, 0, 1).
For sensors that look in the nadir direction, the image coordinate system is only a shift
from the local Cartesian coordinates. Thus, the above formula can be used to correct
satellite (Landsat) imagery.
Chapter 5
References
Forster, B.C., 1984. Derivation of atmospheric correction procedures for Landsat MSS with particular reference
to urban data. Int. J. of Remote Sensing. 5(5):799-817.
Horn, B.K.P., 1986. Robot Vision. The MIT Press:Toronto.
Horn, B.K.P., and Woodham, R.J., 1979. Destriping Landsat MSS images by histogram modification. Computer
Graphics and Image Processing. 10:69-83.
Richards, J.A., 1986. Digital Image Processing. Springer-Verlag: Berlin.
Tanre, D., Deuze, J.L., Herman, M., Santer, R., Vermote, E., 1990. Second simulation of the satellite signal in
the solar spectrum - 6S code. IGARSS'90, Washington D.C., p. 187.
Further Readings:
Woodham, R.J., and Gray, M.H., 1987. An analytic method for radiometric correction of satellite multispectral
scanner data. IEEE Transactions on Geosciences and Remote Sensing. 25(3):258-271.
6. Image Enhancement
A histogram of an image tells us about the data distribution with respect to image grey
levels. The purpose of a histogram-based operation is that, when a grey-level
transformation is made, pixels in the image having a specific range of grey levels can be
enhanced or suppressed. This is also called contrast adjustment. It can be done using:
1. histogram stretching
2. histogram compression (Figure 6.1)
Both histogram stretching and histogram compression can be done either linearly or
nonlinearly.
a) Linear adjustment (Figures 6.2 and 6.3)
DN' = a·DN
Figure 6.4. (a) Original histogram of an image. (b) The histogram after adjustment.
This is realized by equally partitioning the cumulative histogram fc of the original image
into 255 pieces. Each piece corresponds to one digital number in the equalized image
(Figure 6.7). On the cumulative curve, find the nth dividing point; the original grey level
at that point is mapped to output level n.
Figure 6.7. For the discrete case, modify the grey level value according to the principle of equal
frequency.
The equalization process can also be considered as a histogram matching method used in
image destriping as discussed in Section 5.1. Here we attempt to match the original
cumulative histogram Fc1 to the new cumulative histogram Fc2 (Figure 6.8).
Figure 6.8.
The following example shows how an equalization can be made in discrete digital form. It
starts with the generation of the image histogram (first two columns in Table 6.1). Then the
probability Pi is calculated from the frequency f(vi) (third column). A cumulative histogram
Fc can be calculated from the frequencies. Similarly, the cumulative distribution function
(CDF) can be derived from the probabilities. Based on the cumulative distribution function
we can convert the original grey levels into the grey levels of the equalized image (Table 6.2).
Table 6.1 Histogram, cumulative histogram and cumulative distribution function (CDF)

Grey Level (DN)   Frequency f(vi)   Probability Pi   Cumulative histogram Fc   CDF
0                 4                 0.04             4                         0.04
1                 17                0.17             21                        0.21
2                 15                0.15             36                        0.36
3                 18                0.18             54                        0.54
4                 24                0.24             78                        0.78
5                 12                0.12             90                        0.90
6                 0                 0.00             90                        0.90
7                 10                0.10             100                       1.00
Total             100               1.00
Table 6.2 Grey-level conversion for equalization

Input Level   (2^3 - 1) x CDF   Output
0             0.28              0
1             1.47              1
2             2.52              3
3             3.78              4
4             5.46              5
5             6.3               6
6             6.3               6
7             7                 7
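The conversion above can be reproduced in a few lines; note that 7 x 0.78 = 5.46, so input level 4 maps to output level 5:

```python
# Histogram equalization mapping: output = round((Nv - 1) * CDF(input)).
cdf = [0.04, 0.21, 0.36, 0.54, 0.78, 0.90, 0.90, 1.00]  # from Table 6.1
n_levels = 8

output = [round((n_levels - 1) * c) for c in cdf]
print(output)  # [0, 1, 3, 4, 5, 6, 6, 7]
```

Applying this mapping to every pixel spreads the occupied grey levels across the available range, which is the equalization effect shown in Figure 6.7.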
Density slicing represents a group of contiguous digital numbers by a single value.
Although some details of the image are lost, the effect of noise can be reduced by
density slicing. As a result of density slicing, an image may be segmented, or
sometimes contoured, into sections of similar grey level. Each of these segments is
represented by a user-specified brightness.
Similarly, we can represent sections of grey levels using different colors, i.e.
pseudocoloring. This has been used for coloring classification maps in most image analysis
software systems. For example, five classes can be represented by red, green, blue, yellow,
and grey. This can be realized by assigning the red, green, and blue color guns the
following values:

Class No   Red Gun   Green Gun   Blue Gun   Color
1          255       0           0          red
2          0         255         0          green
3          0         0           255        blue
4          255       255         0          yellow
5          100       100         100        grey
We can also use 5x5, 7x7, etc. This filter is also called a box-car filter.
2. Averaging with different weights
The last filter can be used to remove drop-out lines in Landsat images. This is done by
applying a filter only along the drop-out lines in those images.
3. Median filter
This filter is more useful than a simple averaging filter in removing outliers, random noise,
and speckle on RADAR imagery. It has the desirable effect of preserving edges to some
extent. This filter can also be applied to drop-out line removal in some Landsat images.
By moving (i, j) all over an image, the original image, I, can be filtered and the new image,
I', can be created.
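A minimal median filter sketch (interior pixels only, for brevity); the single outlier below stands in for a speckle or drop-out value:

```python
import numpy as np

def median_filter(img, size=3):
    """Median filter: replace each interior pixel by the median of its
    size x size pixel window; edges are left unchanged in this simple sketch."""
    out = img.copy()
    r = size // 2
    for i in range(r, img.shape[0] - r):
        for j in range(r, img.shape[1] - r):
            out[i, j] = np.median(img[i - r:i + r + 1, j - r:j + r + 1])
    return out

img = np.array([[1, 1, 1],
                [1, 99, 1],     # one speckle outlier
                [1, 1, 1]])
print(median_filter(img)[1, 1])  # 1
```

Unlike averaging, the median replaces the outlier with a value already present in the neighbourhood, so edges and legitimate grey levels are better preserved.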
For 2, when the corresponding pattern is encountered, the "1" in the centre is replaced by
a "0". Otherwise, the centre pixel value is not changed.
where
Additive operators
The centre pixel of a 3 x 3 pixel window is converted by these operators from the zero
state to the one state when a hit is obtained. The basic operators include:
Interior Fill - create a one if all four-connected neighbour pixels are one.
Diagonal Fill - create a one if this process will eliminate eight-connectivity of the
background.
where
where
and
There are 119 patterns which satisfy the above condition. For example,
Eight-Neighbour Dilate create one if at least one eight-connected neighbour pixel is one.
where
Interior Pixel Removal - Erase one if all 4-connected neighbours are ones
Generalized Dilation
It is expressed as
where I(i,j), 1 <= i, j <= N, is a binary-valued image and H(m,n), 1 <= m, n <= a, with a an
odd integer, is called a structuring element. Minkowski addition is defined as
According to the rules defined above, you can observe what it looks like.
Some properties of Dilation and Erosion
Dilation is commutative:
I ⊕ J = J ⊕ I
But, in general, erosion is not commutative:
I ⊖ J ≠ J ⊖ I
Dilation and erosion are opposite in effect; dilation of the background of an object behaves
like erosion of the object.
The following chain rules hold for dilation and erosion:
A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C
A ⊖ (B ⊕ C) = (A ⊖ B) ⊖ C
The open operation, also called opening, breaks thin connections and clears isolated
pixels with binary value 1.
6.4.4 Grey scale image morphological filtering
Applying mathematical morphology to grey scale images is equivalent to finding the
maximum or the minimum of a neighborhood defined by the structuring element. If a 3 x 3
neighborhood is taken as the structuring element, then dilation is defined as
I'(i,j) = max (I,I0,I1,I2,I3,I4,I5,I6,I7)
and erosion is defined as
I'(i,j) = min (I,I0,I1,I2,I3,I4,I5,I6,I7).
Similarly, closing refers to a dilation followed by an erosion, while opening means an erosion
followed by a dilation. The effect of closing on grey scale images is that small objects
brighter than the background are preserved, and bright objects with small gaps in between may
become connected. Opening, on the other hand, removes bright objects that are small in
size and breaks narrow connections between two bright objects.
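The definitions above can be sketched directly; grey_morph below applies the neighbourhood maximum or minimum over a 3 x 3 structuring element (borders are copied unchanged for simplicity):

```python
# Grey-scale morphology with a 3x3 structuring element: dilation takes the
# neighbourhood maximum, erosion the minimum, as defined in the text.
def grey_morph(img, op):
    """Apply op (max for dilation, min for erosion) over each interior 3x3 window."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = [img[m][n] for m in (i - 1, i, i + 1) for n in (j - 1, j, j + 1)]
            out[i][j] = op(window)
    return out

def dilate(img):   # I'(i,j) = max(I, I0, ..., I7)
    return grey_morph(img, max)

def erode(img):    # I'(i,j) = min(I, I0, ..., I7)
    return grey_morph(img, min)

def opening(img):  # erosion followed by dilation: removes small bright objects
    return dilate(erode(img))

def closing(img):  # dilation followed by erosion: fills small gaps between bright objects
    return erode(dilate(img))

# A single bright pixel (a small object) disappears under opening:
img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
opened = opening(img)
```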
6.5 Image Enhancement in Multispectral Space - Multispectral Transformation
The multispectral or vector nature of most remote sensing data makes it possible for
spectral transformations to generate new sets of image components or bands. The
transformed image may make evident features not discernible in the original data or,
alternatively, it may preserve the essential information content of the image with
a reduced number of transformed dimensions. The last point has significance for the
display of the data in three dimensions on a colour monitor or in colour hardcopy, and for
transmission and storage of data.
6.5.1 Image arithmetic, band ratios and vegetation indices
Addition, subtraction, multiplication, and division of the pixel brightnesses from two bands
of image data form a new image. Multiplication is not as useful as the other operations.
We can plot the pixel values in a two-dimensional space (Figure 6.10). This two-dimensional diagram is called a scatter plot.
Vegetation Indices
The NDVI, defined as (NIR - R)/(NIR + R), is calculated from the raw remote sensing data.
We can also calculate the NDVI using processed remote sensing data (after converting
digital numbers to spectral reflectances).
To suppress the effect of different soil backgrounds on the NDVI, Huete (1989)
recommended using a soil-adjusted vegetation index:
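The two indices can be sketched as follows. The reflectance values are hypothetical, and L = 0.5 is the commonly used soil-adjustment factor; the text does not give the SAVI constants, so treat them as an assumption:

```python
# NDVI and the soil-adjusted vegetation index (SAVI). L is the soil
# adjustment factor; with L = 0, SAVI reduces to the NDVI.
def ndvi(nir, red):
    return (nir - red) / (nir + red)

def savi(nir, red, L=0.5):
    return (1 + L) * (nir - red) / (nir + red + L)

# Hypothetical spectral reflectances for a vegetated pixel:
r_red, r_nir = 0.05, 0.40
v1 = ndvi(r_nir, r_red)   # high value, consistent with healthy vegetation
v2 = savi(r_nir, r_red)
```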
For simplicity, two-dimensional data will be used as examples to illustrate the
procedure of principal component transformation. Without loss of generality, the procedure
can be applied to data in multispectral space of any dimension.
The Covariance Matrix and Correlation Matrix
Two examples will be used to illustrate the usefulness of the covariance matrix.
Example 1

Pixel    Xi - M
X1       (-2, -0.33)
X2       (-1, -1.33)
X3       (1, -1.33)
X4       (2, -0.33)
X5       (1, 1.67)
X6       (-1, 1.67)

Example 2

Pixel    Xi - M
X1       (-1.5, 1.5)
X2       (0.5, -0.5)
X3       (1.5, 0.5)
X4       (1.5, 1.5)
X5       (-0.5, 0.5)
X6       (-1.5, 0.5)
V is an nb x nb symmetric matrix.
The mean vectors and the deviations (Xi - M) are as listed in the two example tables.
The covariance matrix for example one is
V1 = | 2.40  0.00 |
     | 0.00  1.87 |
What are the differences between V1 and V2? We can answer this question by further
examining their corresponding correlation matrices R1 and R2.
From R1, we can see that the correlation between Band 1 and 2 is 0. This means that Band
1 and Band 2 contain independent information about our target. We cannot use B1 to
replace B2.
For R2, the correlation between Band 1 and Band 2 is 0.761, which is quite high. Using
either channel, we can obtain, to a large extent, information about the other channel.
Once the transformation is done, the covariance matrix in the new coordinate system
becomes diagonal, with the eigenvalues of the original covariance matrix on the diagonal.
Now the results can be interpreted using the data in example 2 (Figure 6.12).
Figure 6.12. The new axes derived from the PCT in the original coordinate system.
B'1 and B'2 are the new axes. In this coordinate system, the data variance along B'1 is 2.67
while the variance along B'2 is only 0.33. This means that in the rotated space, the data
variance along each axis is equal to its corresponding eigenvalue.
From 2.67 + 0.33 = 1.90 + 1.10 = 3.00, we can see that the rotation does not affect the total
variance of the original data. Using 1.90/3.00 and 1.10/3.00 we can determine the
percentages of the total variance that B1 and B2 represent:
B1 represents 1.90/3.00 = 63.3% of the total variance of the original data
B2 represents 1.10/3.00 = 36.7% of the total variance of the original data
These percentages are called the loadings of each band.
For B'1, it represents 2.67/3.00 = 89% of the total variance while B'2 contains only 11% of
the total variance.
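The example-2 numbers can be reproduced from the 2 x 2 covariance matrix. The variances 1.90 and 1.10 come from the text; the off-diagonal covariance of about 1.1 is inferred here from the quoted correlation of 0.761, so treat that value as an assumption:

```python
# Eigenvalues of a symmetric 2x2 covariance matrix, via the characteristic
# polynomial: lambda = (trace +/- sqrt(trace^2 - 4*det)) / 2.
import math

def eig2x2(a, b, d):
    """Eigenvalues of [[a, b], [b, d]], largest first."""
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# Variances 1.90 and 1.10 from the text; covariance ~1.1 inferred (assumption).
lam1, lam2 = eig2x2(1.90, 1.1, 1.10)
total = lam1 + lam2          # rotation preserves total variance: 3.00
loading1 = lam1 / total      # loading of B'1, ~0.89
```

The eigenvalues come out near 2.67 and 0.33, matching the variances along the rotated axes B'1 and B'2, and the B'1 loading is about 89%, as stated above.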
From the loadings of B'1 and B'2, we can see that the rotation concentrates more of the
loading in one band while reducing it in another. For a multispectral space with nb
dimensions, after the principal component transformation we will have high loadings for
the first few bands and very low loadings for the rest. We call the bands containing
relatively high loadings the principal components. We can, therefore, use these principal
components in our data analysis while ignoring the relatively minor components. By so
doing, we will not lose much of the original data variability. This serves the purpose of
reducing data dimensionality. Its applications in classification (keeping the maximum
variance) and in change detection (keeping the minimal variance) normally hold promise.
The PCT is a linear transformation technique which helps to enhance remotely sensed
imagery. Although, principal components are often used, minor components may also be
useful in highlighting information on low data variability that the remote sensing data have.
For example, a few researchers have applied the PCT to multi-temporal change detection.
They found that the change information of a scene is preserved in the minor components.
6.5.3 Tasselled Cap Transform (K-T transform)
Different from the PCT which is based on the data covariance matrix, Kauth and Thomas
(1976) have developed a linear transformation which is physically-based on crop growth.
Figure 6.13. A 3-D data scatterplot of the multispectral space constructed by the
green, red and near-infrared bands (Which looks like a tasselled cap.)
The growing cycle of a crop starts from bare soil, moves to green vegetation, and then to
maturation as the crop turns yellow. These different stages of vegetation growth make
the data distribution in the three-dimensional multispectral space (Figure 6.13)
appear in the shape of a tasselled cap.
Kauth and Thomas defined a linear transformation to enhance the data according to this
data structure. They defined four components, called redness (soil), greenness
(vegetation), yellowness, and noise, using the following transformation matrix for Landsat
MSS data
Later, Crist, Cicone and Kauth developed a new transformation technique for Landsat TM
data. (Crist and Kauth, 1986; Crist and Cicone, 1984)
Their new redness or brightness and greenness are defined as:
Redness = 0.3037 TM1 + 0.2793 TM2 + 0.4743 TM3
+ 0.5586 TM4 + 0.5082 TM5 + 0.1863 TM7
Greenness = -0.2848 TM1 - 0.2435 TM2 - 0.5436 TM3
+ 0.7243 TM4 + 0.0840 TM5 - 0.1800 TM7 .
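The brightness ("redness") and greenness of a pixel follow directly from the coefficients above; the six band values used below are hypothetical:

```python
# Tasselled cap brightness and greenness for Landsat TM, using the
# coefficients given in the text. Bands are ordered TM1, TM2, TM3, TM4, TM5, TM7.
BRIGHTNESS = [0.3037, 0.2793, 0.4743, 0.5586, 0.5082, 0.1863]
GREENNESS  = [-0.2848, -0.2435, -0.5436, 0.7243, 0.0840, -0.1800]

def tasselled_cap(tm_bands):
    """Return (brightness, greenness) as dot products with the coefficient vectors."""
    b = sum(c * v for c, v in zip(BRIGHTNESS, tm_bands))
    g = sum(c * v for c, v in zip(GREENNESS, tm_bands))
    return b, g

# Hypothetical pixel values for the six reflective bands:
b, g = tasselled_cap([60, 30, 25, 90, 70, 30])
```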
Chapter 6
References
Crist, E.P., and Cicone, R.C., 1984. A physically-based transformation of Thematic Mapper data - the
Tasseled Cap. IEEE Transactions on Geoscience and Remote Sensing, GE-22:256-263.
Crist, E.P., and Kauth, R.J., 1986. The Tasseled Cap de-mystified. Photogrammetric Engineering and Remote
Sensing, 52(1):81-86.
Huete, A.R., 1989. Soil influences in remotely sensed vegetation canopy spectra. In Theory and Applications
of Optical Remote Sensing, Ed. by G. Asrar, John Wiley and Sons: New York.
Kauth, R.J., and Thomas, G.S., 1976. The tasselled cap - a graphic description of the spectral-temporal development
of agricultural crops as seen by Landsat. Proceedings of the Symposium on Machine Processing of Remotely Sensed
Data, Purdue University, West Lafayette, Indiana, pp. 4B41-51.
Pratt, W., 1991. Digital Image Processing. John Wiley and Sons: Toronto.
Richards, J.A., 1987. Digital Image Processing. Springer-Verlag, Berlin.
7. Information Extraction
7.1 Image Interpretation
To derive useful spatial information from images is the task of image interpretation. It
includes
detection: such as searching for hot spots in mechanical and electrical facilities and white
spots in X-ray images. This procedure is often used as the first step of image interpretation.
identification: recognition of certain targets. A simple example is to identify vegetation
types, soil types, rock types and water bodies. The higher the spatial/spectral resolution of
an image, the more detail we can derive from the image.
delineation: to outline the recognized target for mapping purposes. Identification and
delineation combined together are used to map certain subjects. If the whole image is to be
processed by these two procedures, we call it image classification.
enumeration: to count certain phenomena from the image. This is done based on
detection and identification. For example, in order to estimate household income of the
population, we can count the number of various residential units.
mensuration: to measure the area, the volume, the amount, and the length of a certain target
from an image. This often involves all the procedures mentioned above. Simple examples
include measuring the length of a river and the acreage of a specific land-cover class. More
complicated examples include an estimation of timber volume, river discharge, crop
productivity, river basin radiation and evapotranspiration.
In order to do a good job in image interpretation, and in later digital image analysis,
one has to be familiar with the subject under investigation, the study area, and the remote
sensing system available. Usually, a combined team consisting of subject
specialists and remote sensing image analysis specialists is required for a relatively
large image interpretation task.
Depending on the facilities that an image interpreter has, he might interpret images in raw
form, corrected form or enhanced form. Correction and enhancement are usually done
digitally.
Elements on which image interpretation are based
Image tone, grey level, or multispectral grey-level vector
Human eyes can differentiate over 1000 colors but only about 16 grey levels. Therefore,
colour images are preferred in image interpretation. One difficulty is the use of
multispectral images with a dimensionality over 3. In order to make use of all the
information available in each band, one has to somehow reduce the image
dimensionality.
Image texture
Spatial variation of image tones. Texture is used as an important clue in image
interpretation. It is very easy for human interpreters to include it in their mental process.
Most texture patterns appear irregular on an image.
Pattern
Regular arrangement of ground objects. Examples are a residential area on an aerial
photograph and mountains in regular arrangement on satellite imagery.
Association
A specific object co-occurring with another object. Some examples of association are an
outdoor swimming pool associated with a recreation center and a playground associated
with a school.
Shadow
Object shadow is very useful when the phenomena under study have vertical variation.
Examples include trees, high buildings, mountains, etc.
Shape
Agricultural fields and human-built structures have regular shapes. These can be used to
identify various targets.
Size
Relative size of buildings can tell us about the type of land uses while relative sizes of tree
crowns can tell us about the approximate age of trees.
Site
Broadleaf trees are distributed in lower and warmer valleys, while coniferous trees tend to
be distributed at higher elevations, up toward the tundra. Location is thus used in image
interpretation.
Multilevel thresholding
Given a set of thresholds T1 < T2 < ... < Tn, a pixel I(i,j) is assigned label k if
Tk <= I(i,j) < Tk+1.
I8 I1 I2
I7 I  I3
I6 I5 I4
(1) Suppose the seed (starting point) I has label K; then a neighbouring pixel Ii will also
belong to K if |Ii - I| < t, where t is a similarity threshold.
(2) If no second point is found in the local neighbourhood, then remove the label K
from the seed point I.
(3) If a second point is found, operate (1) from the second point using the mean m1 of
the two points. If a third point is found, a new mean m2 is generated from m1 and the
third point.
(4) Gradually grow a local area using the criterion in (1), comparing each candidate
against the running mean. When an nth point is found, mn-1 is adjusted to the group mean.
(5) Repeat (1) to (4) with different seeds and thresholds.
Thresholding is faster; however, it is not adaptive to local properties. With the
region-growing technique, if the seed I = 2 and t = 1, the value 2 will not be assigned
to a segment label if no neighbouring pixel meets the criterion in (1).
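A minimal region-growing sketch in the spirit of the steps above, using 4-connected neighbours and a running region mean; the image and threshold are illustrative:

```python
# Region growing from a seed: 4-connected neighbours join the region while
# their value stays within threshold t of the running region mean.
from collections import deque

def grow_region(img, seed, t):
    rows, cols = len(img), len(img[0])
    region = {seed}
    total = img[seed[0]][seed[1]]
    queue = deque([seed])
    while queue:
        i, j = queue.popleft()
        mean = total / len(region)           # running group mean
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < rows and 0 <= nj < cols and (ni, nj) not in region:
                if abs(img[ni][nj] - mean) < t:
                    region.add((ni, nj))
                    total += img[ni][nj]
                    queue.append((ni, nj))
    return region

img = [[5, 5, 9], [5, 6, 9], [9, 9, 9]]
region = grow_region(img, (0, 0), t=2)   # grows over the 5s and the 6, stops at the 9s
```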
Image segmentation can also be done using clustering algorithms. Segmentation is usually
used as the first step in image analysis. Once an image is properly segmented, the
following operations can be performed: classification, morphological operations, and image
understanding through knowledge-based or more advanced computation.
7.3 Conventional Multispectral Classification Methods
7.3.1 General procedures in image classification
Classification is the most popularly used information extraction technique in digital
remote sensing. In image space I, a classification unit is defined as the image segment on
which a classification decision is based. A classification unit could be a pixel, a group of
neighbouring pixels or the whole image. Conventional multispectral classification
techniques perform class assignments based only on the spectral signatures of a
classification unit. Contextual classification refers to the use of spatial, temporal, and
other related information, in addition to the spectral information of a classification unit in
the classification of an image. Usually, it is the pixel that is used as the classification unit.
General image classification procedures include (Gong and Howarth 1990b):
(1) Design the image classification scheme: the classes are usually information classes such as
urban, agriculture, and forest areas. Conduct field studies and collect ground information and
other ancillary data of the study area.
(2) Preprocessing of the image, including radiometric, atmospheric, geometric and
topographic corrections, image enhancement, and initial image clustering.
(3) Select representative areas on the image and analyze the initial clustering results or
generate training signatures.
(4) Image classification
Supervised mode: using training signature
unsupervised mode: image clustering and cluster grouping
(5) Post-processing: complete geometric correction, filtering, and classification
decoration.
(6) Accuracy assessment: compare classification results with field studies.
The following diagram shows the major steps in two types of image classification:
Supervised:
Unsupervised
In an ideal information extraction task, we can directly associate a spectral class in the
multispectral space with an information class. For example, we have in a two dimensional
space three classes: water, vegetation, and concrete surface.
By defining boundaries among the three groups of grey-level vectors in the two-dimensional space, we can separate the three classes.
One of the differences between a supervised classification and an unsupervised one is the
way each spectral class is associated with an information class. For supervised
classification, we first start by specifying an information class on the image. An
algorithm is then used to summarize multispectral information from the specified areas on
the image to form class signatures. This process is called supervised training. For the
unsupervised case, however, an algorithm is first applied to the image and some spectral
classes (also called clusters) are formed. The image analyst then tries to assign each
spectral class to the desired information class.
From the above diagram, there are two obvious ways of classifying this pixel.
Fig. 1 shows spectral curves of two types of ground target: vegetation and soil. If we
sample the spectral reflectance values for the two types of targets (bold-curves) at three
spectral bands: green, red and near-infrared as shown in Fig. 1, we can plot the sampled
values in the three dimensional multispectral space (Fig. 2). The sampled spectral values
become two points in the multispectral space. Similar curves in Fig. 1 will be represented
by closer points in Fig. 2 (the two dashed curves in Fig. 1 are shown as empty dots in Fig. 2).
From Fig. 2, we can easily see that distance can be used as a similarity measure for classification.
The closer the two points, the more likely they are in the same class.
We can use various types of distance as similarity measures to develop a classifier, e.g., a
minimum-distance classifier.
In a minimum-distance classifier, suppose we have nc known class centers
C = {C1, C2, ..., Cnc}, where Ci, i = 1, 2, ..., nc, is the grey-level vector for class i.
As an example, we show a special case in Fig. 3 where we have 3 classes (nc = 3) and two
spectral bands (nb = 2)
If we have a pixel with a grey-level vector located in the B1-B2 space shown as A (an
empty dot), we are asked to determine to which class it should belong. We can calculate
the distances between A and each of the centers. A is assigned to the class whose center
has the shortest distance to A.
In general form, an arbitrary pixel with a grey-level vector g = (g1, g2, ..., gnb)T
is classified as Ci if
d(Ci, g) = min (d(C1, g), d(C2, g), ..., d(Cnc, g))
Now, what form should the distance d take? The most popularly used form is the
Euclidean distance
For dm and de, because taking their squares will not change the relative order of the
distances, in minimum-distance classifiers we usually use the squared distances as the
distance measures so as to save some computation.
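The rule can be sketched compactly; squared Euclidean distance is used, and the three class centres are hypothetical stand-ins for the ones in Fig. 3:

```python
# Minimum-distance classifier: assign pixel g to the class whose centre is
# closest. Squared distance preserves the ordering, so no square root is taken.
def classify_min_distance(g, centres):
    def sq_dist(c):
        return sum((gi - ci) ** 2 for gi, ci in zip(g, c))
    return min(range(len(centres)), key=lambda i: sq_dist(centres[i]))

# Three hypothetical class centres in a two-band (B1-B2) space:
centres = [(20, 30), (60, 40), (40, 80)]
label = classify_min_distance((55, 45), centres)   # nearest centre is (60, 40)
```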
Class centers C and the data covariance matrix V are usually determined from training
samples if a supervised classification procedure is used. They can also be obtained from
clustering.
For example, suppose there are ns pixels selected as training samples for class Ci.
where j = 1, 2, ..., nb
k = 1, 2, ..., ns
If there are a total of nt pixels selected as training samples for all the classes
i = 1, 2, ..., nb.
k = 1, 2, ..., nt.
The covariance matrix is then obtained through the following vector form
However, P(x) is not needed for the classification purpose because if we compare P(C1|x)
with P(C2|x), we can cancel P(x) from each side. Therefore, p(x|Ci) i = 1, 2, ..., nc are the
conditional probabilities which have to be determined. One solution is through statistical
modelling. This is done by assuming that the conditional probability distribution function
(PDF) is normal (also called, Gaussian distribution). If we can find the PDF for each class
and the a priori probability, the classification problem will be solved. For p(x|Ci) we use
training samples.
For the one-dimensional case, we can see from the above figure that by generating training
statistics for two classes, we have their probability distributions. Using these statistics
directly would be difficult because it requires a large amount of computer memory. The
Gaussian normal distribution model can be used to save memory. The one-dimensional
Gaussian distribution is:
, i = 1, 2, ..., nc
The interpretation of the maximum likelihood classifier is illustrated in the above figure.
An x is classified according to the maximum p(x|Ci) P(Ci). x1 is classified into C1, x2 is
classified into C2. The class boundary is determined by the point of equal probability.
(2)
Often, we assume P(Ci) is the same for each class. Therefore (2) can be further simplified
to
(3)
g(x) is referred to as the discriminant function.
By comparing g(x)'s, we can assign x to the proper class.
With the maximum likelihood classifier, it is guaranteed that the error of misclassification
is minimal if p(x|Ci) is normally distributed.
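A one-dimensional sketch of the maximum likelihood rule with equal priors; the class means and variances are hypothetical training statistics:

```python
# One-dimensional Gaussian maximum-likelihood classification: x is assigned
# to the class with the largest p(x|Ci)P(Ci); with equal priors, only the
# Gaussian densities need to be compared.
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify_mlc(x, stats):
    """stats: list of (mean, variance) per class; returns the winning class index."""
    return max(range(len(stats)), key=lambda i: gaussian_pdf(x, *stats[i]))

# Two classes with hypothetical training statistics (mean, variance):
stats = [(30.0, 25.0), (60.0, 100.0)]
c = classify_mlc(40.0, stats)
```

At x = 40 both classes are two standard deviations from their means, but class 1 wins because its smaller variance gives a taller density; this is exactly the equal-probability boundary effect described above.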
Unfortunately, a normal distribution cannot always be achieved. In order to make the
best use of the MLC method, one has to make sure that the training samples generate
distributions as close to the normal distribution as possible.
How large should one's training sample be? Usually, one needs 10 x nb, preferably 100 x
nb, pixels in each class (Swain and Davis, 1978).
MLC is relatively robust, but it has limitations when handling data at nominal or ordinal
scales. Its computational cost also increases considerably as the image dimensionality
increases.
7.3.3 Clustering algorithms
For images about which the user has little knowledge of the number and the spectral properties
of the spectral classes, clustering is a useful tool to determine inherent data structures.
Clustering in remote sensing is the process of automatically grouping pixels with similar
spectral characteristics.
Clustering measures - measure how similar two pixels are. The similarity is based on:
Although the initial cluster centers m can be arbitrarily selected, it is suggested that they
be selected evenly in the multispectral space. For example, they can be selected along the
diagonal axis going through the origin of the multispectral space.
2. Assign each pixel x in the image to the closest cluster centre m
3. Generate a new set of cluster centers based on the processed result in 2.
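Steps 1 to 3 above can be sketched as an iterative procedure (essentially k-means); the pixels and initial centres below are illustrative:

```python
# Iterative clustering (k-means style): assign each pixel to the nearest
# cluster centre, recompute each centre as the mean of its assigned pixels,
# and repeat.
def kmeans(pixels, centres, iters=10):
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in pixels:                      # step 2: nearest-centre assignment
            i = min(range(len(centres)),
                    key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centres[k])))
            groups[i].append(p)
        centres = [tuple(sum(v) / len(g) for v in zip(*g)) if g else c
                   for g, c in zip(groups, centres)]   # step 3: new centres
    return centres

# Two well-separated groups of two-band pixels; initial centres on the diagonal:
pixels = [(10, 12), (12, 10), (11, 11), (80, 82), (82, 80), (81, 81)]
centres = kmeans(pixels, [(0, 0), (100, 100)])
```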
Clustering Algorithm 2
This procedure is rarely used in remote sensing because, with a relatively large number of
pixels as initial cluster centers, it requires a huge amount of disk storage to keep track of
the cluster distances at various levels. However, this algorithm can be used when a smaller
number of clusters has been obtained previously by some other method.
Clustering Algorithm 4: Histogram-based clustering.
Histogram in high-dimensional space: H(V) is the occurrence frequency of grey-level
vector V. The algorithm is to find peaks in the multi-dimensional histogram:
(1) Construct a multi-dimensional histogram
(2) Search for peaks in the multispectral space using an eight-neighbour comparison
strategy to see if the center frequency is the highest in a 3 x 3 grey-level vector
neighbourhood. For a three-dimensional space, search for the peak in a 3 x 3 x 3 neighbourhood.
(3) If a local highest frequency grey-level vector is found, it is recorded as a cluster center.
(4) After all centers are found, they are examined according to the distance between each
pair of clusters. Certain clusters can be merged if they are close together. If a cluster center
has a low frequency it can be deleted.
The disadvantage of this algorithm is that it requires a large amount of memory
(RAM). For an 8-bit image, we require 256 x 4 bytes to store the frequencies (each frequency
is a 4-byte integer) if the image has only one band. As the dimensionality becomes higher,
we need 256^nb x 4 bytes of memory. When nb = 3, this requires 64 MB. Nevertheless, this
limit can partly be overcome by a grey-level vector reduction algorithm (Gong and
Howarth, 1992a).
7.3.4 Accuracy assessment
Accuracy assessment of remote sensing product
The process from remote sensing data to cartographic product can be summarized as
follows:
The reference that the remote sensing products are to be compared with is created based on
human generalization. Depending on the scale of the reference map product, linear features
and object boundaries are allowed to have a buffer zone. As long as the boundaries fall in
their respective buffer zones, they are considered correct.
However, this has not been the case in assessing remote sensing products. In the evaluation
of remote sensing products, we have traditionally adopted a hit-or-miss approach, i.e.,
overlaying the reference map on top of the map product obtained from remote sensing,
instead of giving the RS products tolerance buffers.
Some classification accuracy assessment algorithms can be found in Rosenfield and
Fitzpatrick-Lins (1986) and Story and Congalton (1986).
In the evaluation of classification errors, a classification error matrix is typically formed.
This matrix is sometimes called a confusion matrix or contingency table. In this table,
classification results are given as rows and verification (ground truth) as columns for each
sample point.
The above table is an example confusion matrix. The diagonal elements in this matrix
indicate the numbers of samples for which the classification results agree with the reference
data.
The matrix contains the complete information on the categorical accuracy. The off-diagonal
elements in each row are the numbers of samples misclassified by the
classifier, i.e., the classifier is committing a label to those samples which actually belong to
other labels. This misclassification error is called commission error.
The off-diagonal elements in each column are those samples omitted by the
classifier. This misclassification error is therefore called omission error.
In order to summarize the classification results, the most commonly used accuracy measure
is the overall accuracy:
More specific measures are needed because the overall accuracy does not indicate how the
accuracy is distributed across the individual categories. The categories could, and
frequently do, exhibit drastically differing accuracies but overall accuracy method
considers these categories as having equivalent or similar accuracies.
By examining the confusion matrix, it can be seen that at least two methods can be used to
determine individual category accuracies.
(1) The ratio between the number of correctly classified and the row total
(2) The ratio between the number of correctly classified and the column total
(1) is called the user's accuracy because users are concerned with what percentage of the
classified samples has been correctly classified.
(2) is called the producer's accuracy. The producer is more interested in (2) because it tells
how correctly the reference samples are classified.
However, there is a more appropriate way of presenting the individual classification
accuracies. This is through the use of commission error and omission error.
Commission error = 1 - user's accuracy
Omission error = 1 - producer's accuracy
Kappa coefficient
The Kappa coefficient (K) measures the agreement beyond chance relative to the expected
disagreement. This measure uses all elements in the matrix, not just the diagonal ones.
The estimate of Kappa is the proportion of agreement remaining after chance agreement
is removed from consideration:
K = (po - pc) / (1 - pc)
where
po = proportion of units which agree = Σ pii = overall accuracy
pc = proportion of units expected to agree by chance = Σ pi+ p+i
pij = eij / NT
pi+ = row subtotal of pij for row i
p+i = column subtotal of pij for column i
po = 0.63
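The accuracy measures and Kappa can be computed from a confusion matrix as defined above; the 2 x 2 matrix below is hypothetical, not the table from the text:

```python
# Overall accuracy, user's/producer's accuracies, and the Kappa coefficient
# from a confusion matrix (rows = classification, columns = reference).
def accuracy_measures(m):
    nt = sum(sum(row) for row in m)
    diag = sum(m[i][i] for i in range(len(m)))
    po = diag / nt                                        # overall accuracy
    users = [m[i][i] / sum(m[i]) for i in range(len(m))]  # correct / row total
    producers = [m[i][i] / sum(row[i] for row in m)       # correct / column total
                 for i in range(len(m))]
    pc = sum(sum(m[i]) * sum(row[i] for row in m)         # chance agreement
             for i in range(len(m))) / nt ** 2
    kappa = (po - pc) / (1 - pc)
    return po, users, producers, kappa

m = [[35, 5], [10, 50]]                # hypothetical two-class error matrix
po, users, producers, kappa = accuracy_measures(m)
```

Commission and omission errors follow as 1 - user's accuracy and 1 - producer's accuracy, respectively.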
One of the advantages of using this method is that we can statistically compare two
classification products. For example, two classification maps can be made using different
algorithms, and we can use the same reference data to verify them. Two Kappa estimates,
K1 and K2, can then be derived, and for each the variance can also be calculated.
It has been suggested that a z-score be calculated to test whether the two Kappas are
significantly different; e.g., if Z > 1.96, the difference is said to be significant at the
0.95 probability level.
Given the above procedures, we need to know how many samples need to be collected
and where they should be placed.
Sample size
(1) The larger the sample size, the more representative an estimate can be obtained and,
therefore, the more confidence can be achieved.
(2) In order to give each class a proper evaluation, a minimum sample size should be
applied to every class.
(3) Researchers have proposed a number of pixel sampling schemes (e.g., Jensen, 1983).
These are:
Random
Stratified Random
Systematic
Stratified Systematic Unaligned Sampling
7.4 Non-Conventional Classification Algorithms
1. By conventional classification, we refer to algorithms which make use of only
multi-spectral information in the classification process.
2. The problem with multi-spectral classification is that no spatial information in the image
is utilized. In fact, that is the difference between human interpretation and
computer-assisted image classification. Human interpretation always involves the use of
spatial information such as texture, shape, shadow, size, site, and association. While the
strength of computer techniques lies in handling the grey-level values in the image,
in terms of making use of spatial information computer techniques lag far behind.
Therefore, making use of spatial patterns in an image is an active field in image
understanding (a subfield of pattern recognition, or artificial intelligence).
We can summarize three general types of non-conventional classification:
Preprocessing approach,
Post processing approach, and
Use of contextual classifier.
Diagram 1 shows the procedures involved in a preprocessing method. The indispensable
part of a preprocessing classification method is the involvement of spatial-feature
extraction procedures.
Thanks to developments in the image understanding field, we are able to use part of
the spatial information in image classification. Overall, there are two types of approaches
to making use of spatial information.
- Region-based classification (object-based)
- Pixel window-based classification
Object-based classification
In order to classify objects, one has to somehow partition the original imagery. This can be
done with image segmentation techniques that have been introduced previously, such as
thresholding, region-growing and clustering.
The resultant segmented image can then be passed on to the region extraction procedure,
where segments are treated as a whole object for the successive processing.
For instance, we can generate a table for each object as an entity table. From the entity
table, we can proceed with various algorithms to complete classification, or prior to
classification, we may do some preprocessing, such as filtering out some small objects.
The grey-level variability within a pixel window can be measured and used in a
classification algorithm. The grey-level variability is referred to as texture (Haralick,
1979). The following are some commonly used texture measures:
(1) Simple statistics transformation
For each pixel-window, we can calculate parameters as in Table 1 (Hsu, 1978; Gong and
Howarth, 1993).
TABLE 7.4. STATISTICAL MEASURES USED FOR SPATIAL FEATURE
EXTRACTION

Feature Code    Full Name
AVE             Average
STD             Standard Deviation
SKW             Skewness
KRT             Kurtosis
ADA
CCN
ACN
CAN
CAS
CSN
CSS
RXN             Range
MED             Median
______________________________________________________________________
Pixel value at location
Value for the center pixel
Values for a pair of adjacent pixels
Values for a pair of every second neighbors
Number of pixels in the window
Number of pairs of adjacent neighbors
Number of pairs of every second neighbor
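A few of the Table 7.4 measures can be sketched for a single window; only those whose definitions are standard (average, population standard deviation, skewness, range, median) are included here, since the table's mathematical descriptions did not survive:

```python
# Simple statistical texture measures over one pixel window, in the spirit
# of Table 7.4: AVE, STD (population), SKW, RXN (range), and MED.
import statistics

def window_features(window):
    flat = [v for row in window for v in row]
    ave = statistics.mean(flat)
    std = statistics.pstdev(flat)
    skw = (sum((v - ave) ** 3 for v in flat) / len(flat)) / std ** 3 if std else 0.0
    return {"AVE": ave, "STD": std, "SKW": skw,
            "RXN": max(flat) - min(flat), "MED": statistics.median(flat)}

feats = window_features([[10, 10, 10], [10, 20, 10], [10, 10, 30]])
```

In a preprocessing classification method, such features would be computed for every pixel window and appended to the spectral bands before classification.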
References
Chen, Q., and others, 1989. Remote Sensing and Image Interpretation. Higher Education Press, Beijing, China, (In
Chinese).
Gong P. and P.J. Howarth, 1990a. Land cover to land use conversion: a knowledge-based approach, Technical
Papers, Annual Conference of American Society of Photogrammetry and Remote Sensing, Denver, Colorado, Vol. 4,
pp.447-456.
_____, 1990b. An assessment of some factors influencing multispectral land-cover classification,
Photogrammetric Engineering and Remote Sensing, 56(5):597-603.
_____, 1990c. Impreciseness in land-cover classification: its determination, representation and application. The
International Geoscience and Remote Sensing Symposium, IGARSS '90, pp. 929-932.
_____, 1992a. Frequency-based contextual classification and grey-level vector reduction for land-use
identification. Photogrammetric Engineering and Remote Sensing, 58(4):421-437.
_____, 1992b. Land-use classification of SPOT HRV data using a cover-frequency method. International Journal
of Remote Sensing, .
_____, 1993. An assessment of some small window-based spatial features for use in land-cover classification,
IGARSS'93, Tokyo, August 18-22, 1993.
Gonzalez, R. C., and P. Wintz, 1987. Digital Image Processing, 2nd. Ed., Addison-Wesley Publishing Company,
Reading, Mass.
Haralick, R. M., 1979. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786-804.
Haralick, R. M., Shanmugan, K. and Dinstein, I., 1973. Texture features for image classification. IEEE
Transactions on System, Man and Cybernetics, SMC-3(6):610-621.
Hsu, S., 1978. Texture-tone analysis for automated landuse mapping. Photogrammetric Engineering and Remote
Sensing, 44(11):1393-1404.
Jensen, J.R., 1983. Urban/Suburban Land Use Analysis. In R.N. Colwell (editor-in-chief), Manual of Remote
Sensing, Second Edition, American Society of Photogrammetry, Falls Church, USA, pp. 1571-1666.
Lillesand, T. M., and R. W. Kiefer, 1994. Remote Sensing and Image Interpretation. 3rd Edition, John Wiley and
Sons, New York.
Peddle, D., 1991. Unpublished Masters Thesis, Department of Geography, The University of Calgary.
Richards, J. A., 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag, Berlin.
Rosenfield, G. H., and K. Fitzpatrick-Lins, 1986. A coefficient of agreement as a measure of thematic
classification accuracy. Photogrammetric Engineering and Remote Sensing, 52(2):223-227.
Story, M. and R. G. Congalton, 1986. Accuracy assessment, a user's perspective. Photogrammetric Engineering
and Remote Sensing, 52(3):397-399.
Swain, P. H., and S. M. Davis (editors.), 1978. Remote Sensing: The Quantitative Approach. McGraw-Hill, New
York.
Yen, J., 1989. Gertis: a Dempster-Shafer approach to diagnosing hierarchical hypotheses. Communications of the
ACM, 32(5):573-585.
Further Readings
Ball, G. H., and J. D. Hall, 1967. A clustering technique for summarizing multivariate data. Behavioral Science,
12:153-155.
Bezdek, J.C., R. Ehrlich & W. Fall, 1984, FCM: the fuzzy c-means clustering algorithm, Computers and
Geoscience, 10:191-203.
Bishop, Y. M. M., S. E. Feinberg, and P. W. Holland, 1975. Discrete Multivariate Analysis - Theory and Practice.
The MIT Press, Cambridge, Mass.
Chittineni, C. B., 1981. Utilization of spectral-spatial information in the classification of imagery data.
Computer Graphics and Image Processing, 16:305-340.
Cibula, W. G., M. O. Nyquist, 1987, Use of topographic and climatological models in geographical data base
to improve Landsat MSS classification for Olympic national park. Photogrammetric Engineering and Remote
Sensing, 53(1):67-76.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, Vol.
20, No. 1, pp. 37-46.
Congalton, R. G., and R. A. Mead, 1983. A quantitative method to test for consistency and correctness in
photointerpretation. Photogrammetric Engineering and Remote Sensing, 49(1):69-74.
Conners, R. W., and C. A. Harlow, 1980. A theoretical comparison of texture algorithms. IEEE Transactions on
Pattern Analysis and Machine Intelligence, PAMI-2(3): 204-222.
Fleiss, J. L., J. Cohen, and B. S. Everitt, 1969. Large sample standard errors of Kappa and weighted Kappa.
Psychological Bulletin, Vol. 72, No. 5, pp. 323-327.
Fu, K. S and Yu, T. S., 1980. Spatial Pattern Classification Using Contextual Information, Research Studies Press,
Chichester, England.
Fung, T., and E. F. LeDrew, 1987. Land cover change detection with Thematic Mapper spectral/textural data
at the rural-urban fringe. Proceedings of 21st Symposium on Remote Sensing of Environment, Ann Arbor, Mi., Vol. 2,
pp.783-789.
_____, 1988. The determination of optimal threshold levels for change detection using various accuracy
indices. Photogrammetric Engineering and Remote Sensing, 54(10):1449-1454.
Gong, P., D. Marceau, and P. J. Howarth, 1992. A comparison of spatial feature extraction algorithms for
land-use mapping with SPOT HRV data. Remote Sensing of Environment. 40:137-151.
Gong, P., J. R. Miller, J. Freemantle, and B. Chen, 1991. Spectral decomposition of Landsat TM data for urban
land-cover mapping, 14th Canadian Symposium on Remote Sensing, pp.458-461.
Kettig, R. L., and Landgrebe, D. A., 1976. Classification of multispectral image data by extraction and
classification of homogeneous objects. IEEE Transactions on Geoscience Electronics, GE-14(1):19-26.
Landgrebe, D. A. and E. Malaret, 1986. Noise in remote sensing systems: the effects on classification error.
IEEE Transactions on Geoscience and Remote Sensing, GE-24(2):
- First hand
- Second hand - digitizing from maps.
Knowing how spatial data are collected helps us to appreciate the possible level of errors or
uncertainties involved in the data collection process.
In what forms are spatial data collected? How is spatial sampling done?
- Random collection
- Systematic sampling or complete coverage
- Other hybrids of the first two
One needs to determine the density of sampling; obviously, the denser the sampling, the more
likely the data are to represent reality.
The density of sampling is a function of a number of factors:
(1) the complexity of the phenomena;
(2) the capability of the measuring tools;
(3) the accuracy requirement;
(4) economic considerations.
Most of the time, we tend to use second-hand spatial data, i.e., currently available data, which are
often in map form.
How are maps made?
For thematic maps, the manual process involves:
(1) base map preparation;
(2) thematic data transfer (from surveys, aerial photographs, remote sensing images) onto the base map, where interpolation or extrapolation may be needed;
(3) classification, generalization, symbolization and decoration;
(4) layer separation and printing.
Thematic maps may be categorized by the spatial dimension of their entities:
- Linear
- Areal
- Volumetric
or according to thematic entities:
- Natural resources: forest, geological, lithological, agricultural, climatic
- Man-made: municipal, cadastral, etc.
In daily life, we use our sensing organs and brain to recognize things and then make decisions and
take actions. Our sensing organs include the eyes, ears, nose, tongue and skin. The first three are our
remote sensors. Our sensors pass scenes, sounds, smells, tastes and feelings to our brains; our brains
process the evidence collected by the different sensors, analyze it, and compare it with things in
our memory that have been recognized before, to see whether, based on the data collected, we can recognize
(label) the newly detected thing as one of the things which has been recognized before. If the
recognized thing is a tree in our way, our brain may decide to go around it. In an increasingly
competitive society, in order to make optimal decisions, we have to make the best use of all the
evidence available to arrive at an accurate recognition. In our daily life, we experience
thousands of processes like this: evidence collection - evidence analysis - decision making - action
taking. Our senses have limits, however. For example, our eyes cannot resolve details that are either too
far away or too small in size; the telescope and the microscope make this possible.
We cannot see in spectral ranges outside the visible wavelength region, but various detectors
sensitive to different non-visible regions can record images for us to see as if our eyes were sensitive
to those spectral regions. In spatial data handling, our brains cannot memorize exactly the location and
spatial extent that a certain phenomenon occupies, but electro-magnetic media can be used to do so. The
evidence volume is so large that our brains can only process a very small amount of it; therefore, we
need computers to assist us. In this chapter, we examine some of the techniques that
can be used in computer-assisted handling of various spatial evidence, especially integrated analysis of
spatial evidence from multiple sources, such as field surveys, remote sensing and/or existing map
sources.
Data integration: integrating spatial data from different sources for a single application. What types of
application are we referring to?
One problem in data integration is incompatibility between spatial data sets, in the following aspects:
- data structures
- data types
- spatial resolutions
- levels of generalization
- Data structures: Raster vs. Vector
Discrepancies in concepts of spatial representation:
______________________________________________________________________
                                Raster (cell)       Vector (object)
Location                        (i, j)              {(xi, yi)}
Entity/attribute represented    Incomplete/broken   Complete
Ease of representing            Discrete phenomena  Continuous phenomena;
                                                    more flexible
Level of generalization         Low                 High
Communication                   Hard                Easy
Storage                         Large amount        Less
______________________________________________________________________
Is overlay of digital files a data integration method?
Yes, a very preliminary one. Given two data sets A = { (x, y) : z } and B = { (x, y) : u }, their overlay is
A ∪ B = { (x, y) : z, u } .
It is more or less a data accumulation.
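The overlay-as-accumulation idea can be sketched with dictionaries keyed by location. The layer names and attribute values below are hypothetical.

```python
def overlay(a, b):
    """Naive overlay of two layers stored as {(x, y): value} dicts:
    A = {(x, y): z}, B = {(x, y): u}  ->  {(x, y): (z, u)}.
    Attributes are merely accumulated at matching locations."""
    return {loc: (a[loc], b[loc]) for loc in a.keys() & b.keys()}

# Hypothetical land-cover and soil layers on the same grid:
landcover = {(0, 0): "forest", (0, 1): "water"}
soil      = {(0, 0): "loam",   (0, 1): "clay"}
merged = overlay(landcover, soil)
```

Note that nothing is reconciled or analyzed here; the attributes are simply stacked, which is why overlay alone is only a preliminary form of integration.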
Five types of models:
- PM Point Model : (x, y, z, ...)
- GM Grid Model : (i, j, z, ...), with i, j the grid indices
- LM Line Model : ({x, y}, z, ...)
- AM Area Model : ({x, y}, z, ...)
- CM Cartographic Model : traditionally {PM, LM, AM}; now {PM, GM, LM, AM}
_________
* An important extension: the 3rd spatial dimension and the temporal dimension.
* Discussion:
_________
(1) Mark and Csillag's model (1989)
Homogeneity is broken only at the boundaries: an area-class map is a mapping from the region R
to a label set such as {0, 1} or {A, B, ...}.
Let W denote a finite collection of mutually exclusive statements about the world. By e = 2^W we
denote the set of all events. The empty set ∅, a subset of every set by definition, is called the impossible
event, since the outcome of a random selection can never be an element of ∅. On the other hand, the set
W itself always contains the actual outcome; therefore it is called the certain event. If A and B are
events, then so are the union of A and B, A ∪ B, and the complements of A (Ā) and B (B̄),
respectively. For example, the event A ∪ B occurs if and only if A occurs or B occurs. We call the
pair (W, e) the sample space. Define a function P : e → [0, 1] to be a probability if it could be induced
in the way described, i.e. if it satisfies the following conditions, which are well known as the
Kolmogorov axioms:
(1) P(A) ≥ 0 for all A ∈ e ;
(2) P(W) = 1 ;
(3) for A, B ∈ e, from A ∩ B = ∅ it follows that P(A ∪ B) = P(A) + P(B).
Suppose that in a series of n experiments, the event Ai occurred ki times. Then under the conventional
evaluation, called the maximum likelihood evaluation:
Pm(Ai) = ki / n ,
but under an alternative evaluation, called the Bayesian evaluation:
Pb(Ai) = (ki + 1) / (n + m) ,
where m is the number of possible events. Under this evaluation, we implicitly assume that each event
has already occurred once even before the experiment commenced. When Σki → ∞,
Pm(Ai) = Pb(Ai) .
Nevertheless, for any finite n,
0 < Pb(Ai) < 1 .
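A minimal sketch contrasting the two evaluations; the choice of m = 4 events and the observed counts are illustrative assumptions.

```python
def p_ml(k_i, n):
    """Maximum likelihood evaluation: P_m(A_i) = k_i / n."""
    return k_i / n

def p_bayes(k_i, n, m):
    """Bayesian evaluation: P_b(A_i) = (k_i + 1) / (n + m), where m is the
    number of possible events; each event is implicitly assumed to have
    occurred once before the experiment commenced."""
    return (k_i + 1) / (n + m)

# Event A_1 never observed in n = 10 trials, with m = 4 possible events:
ml = p_ml(0, 10)            # reaches the boundary value 0
bayes = p_bayes(0, 10, 4)   # 1/14, strictly between 0 and 1
```

The Bayesian evaluation never assigns probability exactly 0 or 1 from finite data, which matters later when probabilities are multiplied or converted to odds.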
Let P(A|B) denote the probability of event A occurring conditioned on event B having already
occurred. P(A|B) is known as the posterior probability of A subject to B, or the conditional probability
of A given B.
For a single event A ∈ e,
0 ≤ P(A) ≤ 1 ,
P(Ā) = 1 - P(A) .
For events A1, A2, ..., An ∈ e,
P(A1 ∪ A2 ∪ ... ∪ An) = S1 - S2 + S3 - S4 + ... + (-1)^(n-1) Sn
where S1 = Σi P(Ai) ,
S2 = Σ_{i<j} P(AiAj) ,
S3 = Σ_{i<j<k} P(AiAjAk) ,
Sn = P(A1A2...An) .
For the conditional probability P(A|B), with A, B ∈ e,
P(A|B) = P(AB) / P(B) .
We then have Bayes' formula:
P(A|B) = P(B|A) P(A) / P(B) .
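The inclusion-exclusion expansion P(A1 ∪ ... ∪ An) = S1 - S2 + S3 - ... can be checked numerically for equally likely outcomes; the dice example below is purely illustrative.

```python
from itertools import combinations

def union_probability(sets, space_size):
    """P(A_1 ∪ ... ∪ A_n) = S_1 - S_2 + S_3 - ..., where S_r sums the
    probabilities of all r-wise intersections, assuming equally likely
    outcomes in a space of the given size."""
    n = len(sets)
    total = 0.0
    for r in range(1, n + 1):
        s_r = sum(len(set.intersection(*combo)) / space_size
                  for combo in combinations(sets, r))
        total += (-1) ** (r - 1) * s_r
    return total

# Die outcomes {1..6}: A = "even", B = "greater than 3".
A, B = {2, 4, 6}, {4, 5, 6}
p = union_probability([A, B], 6)   # P(A) + P(B) - P(AB) = 3/6 + 3/6 - 2/6
```

For two events this reduces to the familiar P(A ∪ B) = P(A) + P(B) - P(AB).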
In logical expression, when e implies h, that is, e → h, this can be alternatively read as 'e is sufficient for h'
or as 'h is necessary for e'. There is no ambiguity between e and h, i.e., the reliability is 100%.
However, in reality, the reliability of e in support of h is lower than that of a logical implication.
An evidence e can usually be in two states: absent or present. When P(e) = 0 or P(e) = 1, it is of no
practical interest; either way there is nothing to observe. For h, it is the same. Therefore, we shall
assume 0 < P(e) < 1 and 0 < P(h) < 1.
To study the necessity and sufficiency measures of e for h, we need to explore the influence that a
state of e has on h. If the state of e makes h more plausible, we say that the state of e encourages h. If it
makes h less plausible, we say that the state of e discourages h. If it neither encourages nor
discourages h, then the state of e has no influence on h, or e and h are independent of each other.
For the necessity measure, we first explore how the absence of e influences h. From
O(h|ē) = N O(h)
we define
N = P(ē|h) / P(ē|h̄) , with 0 ≤ N < ∞ .
Similarly, for the sufficiency measure, from O(h|e) = S O(h) we have
S = P(e|h) / P(e|h̄) .
For S → ∞ : P(h|e) = 1 , e → h , so e is sufficient for h;
1 < S < ∞ : P(h|e) > P(h) , e encourages h;
S = 1 : no influence;
0 < S < 1 : P(h|e) < P(h) , e discourages h;
S = 0 : P(h|e) = 0 , e → h̄ , so e is sufficient for h̄.
From the above analysis, it is clear that N and S are the measures for necessity and sufficiency,
respectively. N, S and O(h), needed to evaluate O(h|ē) and O(h|e), are provided by domain experts.
Quite often, instead of directly supplying N and S, domain experts may supply values of P(e|h) and P(e|h̄).
This amounts to observing the evidential probabilities under the hypothesis h or h̄. Then
N = (1 - P(e|h)) / (1 - P(e|h̄)) = P(ē|h) / P(ē|h̄) ,
S = P(e|h) / P(e|h̄) .
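A sketch of this odds-based updating, assuming hypothetical values of P(e|h), P(e|h̄) and the prior P(h):

```python
def odds(p):
    """Convert a probability to odds: O = P / (1 - P)."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds back to a probability: P = O / (1 + O)."""
    return o / (1.0 + o)

def sufficiency(p_e_h, p_e_nh):
    """S = P(e|h) / P(e|not h)."""
    return p_e_h / p_e_nh

def necessity(p_e_h, p_e_nh):
    """N = (1 - P(e|h)) / (1 - P(e|not h)) = P(not e|h) / P(not e|not h)."""
    return (1.0 - p_e_h) / (1.0 - p_e_nh)

# Hypothetical expert-supplied probabilities:
p_e_h, p_e_nh, p_h = 0.8, 0.2, 0.3
S = sufficiency(p_e_h, p_e_nh)     # > 1: observing e encourages h
N = necessity(p_e_h, p_e_nh)       # < 1: absence of e discourages h
p_h_given_e = prob(S * odds(p_h))  # posterior via O(h|e) = S * O(h)
```

With these numbers, S = 4 and N = 0.25, so the presence of e raises the belief in h above its prior.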
L(hj) = Σk l(j,k) p(hk|xi) .
L is a measure of the accumulated penalty incurred, given that the evidence could have supported any of the
available classes, with the penalty functions l(j,k) relating all these classes to class hj.
Thus a useful decision rule for evaluating a piece of evidence for support of a class is to choose that
class for which the average loss is the smallest, i.e.,
xi encourages hj, if L(hj) < L(hk) for all k ≠ j.
This is the algorithm that implements Bayes' rule. Because p(hk|xi) is usually not available, it is
evaluated from p(xi|hk), p(hk) and p(xi):
p(hk|xi) = p(xi|hk) p(hk) / p(xi) .
Thus
L(hj) = Σk l(j,k) p(xi|hk) p(hk) / p(xi) .
The l(j,k)'s can be defined by domain experts.
A special case for the l(j,k)'s is given as follows.
Suppose l(j,k) = 1 - Fjk, with Fjj = 1 and the other Fjk to be defined. Then from the above formula we have
L(hj) = 1 - g(hj) ,
so the minimum-penalty decision rule becomes searching for the maximum of g(hj), which is
g(hj) = Σk Fjk p(xi|hk) p(hk) / p(xi) .
Thus the decision rule is
xi encourages hj, if g(hj) > g(hk) for all k ≠ j.
If Fjk = δjk, the delta function, i.e.,
δjk = 1 if j = k, and 0 otherwise,
g(hj) is further simplified to
g(hj) = p(xi|hj) p(hj) / p(xi) ,
and thus the decision rule becomes
xi encourages hj if p(xi|hj) p(hj) > p(xi|hk) p(hk) for all k ≠ j.
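The minimum-average-loss rule can be sketched as follows. The likelihoods, priors and 0-1 loss matrix are illustrative, and the common factor p(xi) is dropped since it does not affect the comparison.

```python
def average_loss(j, x_lik, priors, loss):
    """L(h_j) = sum_k l(j,k) p(x|h_k) p(h_k) / p(x); p(x) is common to
    every j, so it is omitted when comparing losses."""
    return sum(loss[j][k] * x_lik[k] * priors[k] for k in range(len(priors)))

def classify(x_lik, priors, loss):
    """Choose the class h_j with the smallest average loss."""
    n = len(priors)
    return min(range(n), key=lambda j: average_loss(j, x_lik, priors, loss))

# 0-1 loss, l(j,k) = 1 - delta_jk, reduces the rule to maximizing
# p(x|h_j) p(h_j):
loss01 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
x_lik  = [0.1, 0.6, 0.3]   # p(x|h_k), hypothetical likelihoods
priors = [0.5, 0.2, 0.3]   # p(h_k), hypothetical priors
best = classify(x_lik, priors, loss01)
```

With an asymmetric loss matrix the chosen class can differ from the maximum-posterior class, which is exactly the flexibility the l(j,k)'s give to domain experts.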
A fuzzy set is a "class" with a continuum of grades of membership (Zadeh, 1965). More often than not,
the classes of objects encountered in the real physical world do not have precisely defined criteria of
membership. For example, the "class of all real numbers which are much greater than 1", or the "class
of beautiful cats", do not constitute classes or sets in the usual mathematical sense of these terms.
However, the fact remains that such imprecisely defined "classes" play an important role in human
thinking, particularly in the domains of pattern recognition and abstraction.
Let W, a non-empty set, be the formal basis of our further exertions. The set W is often called the
universe of discourse or frame of discernment. Our focus is primarily on finite sets. In such cases, the
number of elements in W, its cardinality, is abbreviated by |W|. Any element in W is denoted by w.
For a specific w ∈ W and any set A, either w ∈ A or w ∉ A. This is the basic requirement in
ordinary set theory.
A set A is denoted by A = { w1, w2, ..., wn }, where wi is the ith element of set A. When the elements in A
cannot be explicitly listed, A is denoted by { w | ... }, where the latter part in the brackets is a description
of the elements included. In general,
A = { w | A(w) is true } ,
where A(w) is a predicate on w.
Given A, B defined on W, if for any w ∈ W we have w ∈ A ⟹ w ∈ B, then A ⊆ B.
If A ⊆ B and B ⊆ A, then A = B.
Every set A discussed here satisfies A ⊆ W.
An empty set is one that does not contain any element of W; it is denoted by ∅.
For any A on W, ∅ ⊆ A ⊆ W.
The sets A discussed so far are sets of single elements. When any A ⊆ W becomes an element of another set
U, U is also a set; it is sometimes called a set class. All the set classes for W form 2^W. For
instance, if W = {black, white} then 2^W = { {black, white}, {black}, {white}, ∅ }. In fact, sets defined
on W can form a set class. Therefore, a set A defined on W is sometimes denoted as A ∈ 2^W.
Definition 1. Given A, B ∈ 2^W ,
A ∪ B = { w | w ∈ A or w ∈ B } ,
A ∩ B = { w | w ∈ A and w ∈ B } ,
Ā = { w | w ∉ A } ,
are called the union of sets A and B, the intersection of A and B, and the complement of A, respectively.
When the "∪", "∩" and complement ("¯") operators are used in combination, "¯" has higher priority than "∪" and "∩".
It can be proven that for any W and A, B ∈ 2^W, the following relationships hold:
(A ∪ B)¯ = Ā ∩ B̄ ,
(A ∩ B)¯ = Ā ∪ B̄ .
These are called De Morgan's laws.
The following are some properties of set arithmetic:
A ∪ A = A , A ∩ A = A
A ∪ B = B ∪ A , A ∩ B = B ∩ A
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
(A ∪ B) ∩ B = B ,
(A ∩ B) ∪ B = B
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) ,
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∪ W = W , A ∩ W = A
A ∪ ∅ = A , A ∩ ∅ = ∅
(Ā)¯ = A
A ∪ Ā = W
A ∩ Ā = ∅ .
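These identities can be checked directly with Python's built-in sets over a small universe of discourse; the universe and the two subsets are chosen arbitrarily for illustration.

```python
# Universe of discourse and two subsets, chosen for illustration:
W = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {3, 4}

def comp(s):
    """Complement relative to the universe W."""
    return W - s

# De Morgan's laws:
assert comp(A | B) == comp(A) & comp(B)
assert comp(A & B) == comp(A) | comp(B)

# Absorption, identity and complement laws:
assert (A | B) & B == B and (A & B) | B == B
assert A | W == W and A & W == A
assert A | comp(A) == W and A & comp(A) == set()
```

Python's `|`, `&` and `-` operators correspond directly to ∪, ∩ and set difference, so each law above is verified exactly as written.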
Definition 2. The two denotations
∪_{i∈I} Ai = { w | w ∈ W, ∃ i ∈ I such that w ∈ Ai } ,
∩_{i∈I} Ai = { w | w ∈ W, w ∈ Ai for all i ∈ I } ,
are called the union and intersection of the set class { Ai | i ∈ I } .
I = { 1, 2, ..., n, ... } is called the index set.
When I = { 1, 2 }, Definition 2 is equivalent to Definition 1.
Definition 3.
A - B = { w | w ∈ A and w ∉ B } is called the difference set of B for A. It follows that
A - B = A ∩ B̄ ,
Ā = W - A .
A projection from W to F is defined by:
f : W → F .
Projection is an extension of the concept of a function. For any w ∈ W, there exists an element φ =
f(w); w is the original image and f(w) is called the image of w.
W is the definition range for f, and
f(W) = { φ | ∃ w ∈ W such that φ = f(w) }
is called the value range.
If f(W) = F, then f is a full projection from W to F.
If for any given w1, w2 ∈ W with w1 ≠ w2 we have
f(w1) ≠ f(w2) ,
then f is a one-to-one projection.
Definition 7. Given
Fuzzy set theory and probability theory are used to handle two different types of uncertainty. We use
probability to study random phenomena. Each event itself has a distinct meaning and is not uncertain;
however, due to the lack of sufficient conditions, the outcome, whether a certain event occurs during a
process, cannot be determined.
In fuzzy set theory, the concept or event itself does not have a clear definition. For example, with "tall
men", how tall one must be is not defined. Here, whether a certain phenomenon belongs to the concept is
difficult to determine. We call fuzziness the uncertainty involved in a classification due to imprecise
concept definition. The root of fuzziness is that there exist transitions between two phenomena. Such
transitions make it difficult for us to label phenomena into either this or that class. Fuzzy set theory is
the basis for studying membership relationships arising from the fuzziness of phenomena.
Fuzzy statistics is used to estimate the degree of membership or the membership function. In
order to do so we need to design a fuzzy statistical experiment. In such an experiment, similar to a
probability experiment, there are four elements:
1. The universe of discourse W ;
2. An element w in W ;
3. An ordinary set A which varies over W. A is related to a fuzzy set Ã which corresponds to
a fuzzy concept. Each time A is fixed, it represents a deterministic definition of the fuzzy concept as
its approximation.
4. A condition S which contains all the objective and subjective factors related to the definition
of the fuzzy concept, and which is therefore a constraint on the variation of A.
The purpose of fuzzy statistics is to use a deterministic approach to study uncertainty. The
requirement for a fuzzy statistical experiment is that in each experiment a deterministic decision is made on
whether w belongs to A. Therefore, in each experiment, A is a definite ordinary set. In fuzzy statistical
experiments, w is fixed while A is changing.
In n experiments, calculate the membership frequency of w belonging to fuzzy set Ã, denoted by f:
f = (number of experiments in which w ∈ A) / n .
As n increases, f may stabilize. The stabilized membership frequency is the degree of membership of
w in Ã. We call fuzzy statistics involving more than one fuzzy concept multi-phase fuzzy
statistics.
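A sketch of the membership-frequency computation; the "tall" judgements below are made up purely for illustration.

```python
def membership_frequency(trials):
    """f = (number of experiments in which w ∈ A) / n, for a fixed element
    w and a varying crisp set A; each trial records True if that
    experiment's definition of A included w."""
    return sum(trials) / len(trials)

# 100 hypothetical judgements of whether a given person counts as "tall";
# each judge fixes a crisp height threshold (a crisp A) before deciding:
trials = [True] * 62 + [False] * 38
f = membership_frequency(trials)
```

If f stabilizes as more judgements are collected, the stabilized value is taken as the degree of membership of w in the fuzzy set Ã.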
Definition 8. Given Pm = { Ã1, ..., Ãm }, Ãi ∈ F(W), i = 1, ..., m, this type of experiment is an m-phase
fuzzy statistical experiment, provided that in each experiment we can determine a projection
e : W → Pm .
Each fuzzy set in Pm is one phase of Pm.
The results of multi-phase fuzzy statistics enable us to obtain a fuzzy membership function for each
phase on W. They have the following properties:
μÃ1(w) + μÃ2(w) + ... + μÃm(w) = 1 for each w ∈ W.
If W = { w1, w2, ..., wn } is a finite universe of discourse, summing over all its elements gives
Σj [ μÃ1(wj) + ... + μÃm(wj) ] = n .
An important concept needed in fuzzy set theory is that of a fuzzy relation, which generalizes the
conventional set-theoretic notion of a relation. Let W1 and W2 be two universes. A fuzzy relation R̃ has
the membership function μR̃ : W1 × W2 → [0, 1]. The projection of R̃ on W1 is the marginal fuzzy set with
membership
μ(w1) = sup { μR̃(w1, w2) | w2 ∈ W2 }
for all w1 ∈ W1. If Ã1 is a fuzzy set on W1, then μÃ1 can be extended to W1 × W2 by
μ(w1, w2) = μÃ1(w1) .
Let wo be an unknown value ranging over a set W, and let a piece of imprecise information be
given as a set E, i.e., wo ∈ E is known for sure, with E ∈ 2^W. If we ask whether another set A contains
wo, there can be two possible answers:
if A ∩ E = ∅, then it is impossible that wo ∈ A;
if A ∩ E ≠ ∅, then it is possible.
Formally, we obtain a mapping
PossE : 2^W → {0, 1} , PossE(A) = 1 if A ∩ E ≠ ∅, and 0 otherwise,
where 1 indicates "possible" and 0 "impossible".
When E becomes a fuzzy set Ẽ, we define
PossẼ : 2^W → [0, 1] ,
PossẼ(A) = sup { α | A ∩ Ẽα ≠ ∅, α ∈ [0, 1] }
         = sup { μẼ(w) | w ∈ A } ,
where Ẽα is the α-cut of Ẽ. Hence, given the fuzzy set "small positive integer"
Ẽ = (1, 1, 0.8, 0.6, 0.4, 0.2)
on the integers 1 to 6, and A = {3}, the possibility is 0.8; for A = { x | x ≤ 3 }, Poss = 1.
The possibility of Ā tells us about the possibility of "not A", and hence the necessity of the occurrence of A:
Nec(A) = 1 - Poss(Ā) .
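The possibility and necessity measures over a finite universe can be sketched directly from these definitions, using the "small positive integer" example above.

```python
def poss(mu, A):
    """Poss(A) = sup { mu(w) : w in A } over a finite universe."""
    return max(mu.get(w, 0.0) for w in A)

def nec(mu, A, universe):
    """Nec(A) = 1 - Poss(complement of A)."""
    not_A = [w for w in universe if w not in A]
    return 1.0 - poss(mu, not_A)

# Membership of "small positive integer" on the integers 1..6:
mu = {1: 1.0, 2: 1.0, 3: 0.8, 4: 0.6, 5: 0.4, 6: 0.2}
universe = list(mu)

p_three = poss(mu, [3])          # possibility of A = {3}
p_small = poss(mu, [1, 2, 3])    # possibility of A = {x : x <= 3}
n_small = nec(mu, [1, 2, 3], universe)
```

Note that possibility is only an upper bound on plausibility: A = {x : x ≤ 3} is fully possible (Poss = 1) while its necessity stays below 1, since the values 4, 5, 6 retain some membership in Ẽ.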
In addition to the operations of union and intersection, one can define a number of other ways of
forming combinations of fuzzy sets and relating them to one another.
Algebraic product: given Ã and B̃, the algebraic product of Ã and B̃, denoted ÃB̃, is defined in terms of the
membership functions of Ã and B̃ by
μÃB̃(w) = μÃ(w) μB̃(w) for w ∈ W .
The problem:
given spatial data E = { e1, e2, ..., em } from m different sources S1, S2, ..., Sm, one wishes to decide
which hypothesis among the n in H = { H1, H2, ..., Hn } is most likely. Or, in a
classification problem, one wishes to decide which class among the n classes { C1, C2, ..., Cn } is the most
appropriate one into which E should be classified. Formally stated, one wishes to find a projection F
such that
F : S1 × S2 × ... × Sm → H
which satisfies
(1) 0 ≤ FHj(E) ≤ 1 for j = 1, 2, ..., n ;
(2) Σj FHj(E) = 1 .
It requires relatively deep mathematical knowledge to determine a projection from the Cartesian
product space S1 × S2 × ... × Sm to H; interested readers may find Kruse et al. (1991) a starting point.
The requirement may be relaxed by finding a projection from each source Si to H.
Therefore, one may follow the steps listed below to solve the problem posed.
Step 1. Consider each element Hj in H a fuzzy set H̃j, j = 1, 2, ..., n. Determine the fuzzy membership
function on each source Si, i = 1, 2, ..., m, for each Hj, j = 1, 2, ..., n. Thus a total of m × n
membership functions need to be found. Usually, expert knowledge or fuzzy statistics can be used to determine them.
Step 2. Combine evidence from different sources to validate hypotheses or to conduct classification.
Fuzzy set operations, including union, intersection, complement and algebraic operations, can be used
for such purposes.
Step 3. Compare the combined degrees of membership for each hypothesis (class), and confirm the hypothesis
with the highest degree of membership.
Gong (1993) and a fuzzy classifier in a forest ecological classification research project (Crain et al., 1993)
both follow this procedure. It needs to be further validated. The assumption here, obviously, is that each
hypothesis is independent of the others.
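Steps 2 and 3 can be sketched as follows, using the min operator as one possible intersection; the two sources and all membership values are hypothetical.

```python
def combine_min(memberships):
    """Step 2 (one option): intersect evidence from the m sources with the
    min operator; memberships[i][j] is the membership assigned by source i
    to hypothesis j."""
    n = len(memberships[0])
    return [min(src[j] for src in memberships) for j in range(n)]

def decide(combined):
    """Step 3: confirm the hypothesis with the highest combined membership."""
    return max(range(len(combined)), key=lambda j: combined[j])

# Two hypothetical sources, three hypotheses:
m = [[0.2, 0.9, 0.4],    # e.g. memberships from spectral evidence
     [0.3, 0.7, 0.8]]    # e.g. memberships from terrain evidence
combined = combine_min(m)
best = decide(combined)
```

Other combination operators (max for union, the algebraic product, etc.) slot into Step 2 the same way, and the choice of operator is itself a design decision left to the analyst.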
8.7 Introduction To Neural Networks
Similar to the earlier part of this course, our interest is still focused on the problem that, given a
piece of evidence e ∈ E, we test the hypothesis that e validates h ∈ H. Transformed into classification or
pattern recognition terms, we would like an algorithm or a system that is capable of classifying or
recognizing a given set of observations and labelling it with a class or a pattern. We would like the
system or algorithm to learn from observations of patterns that are labelled by class, and then to be
able to recognize unknown patterns and properly label them, with output of class membership values.
One of the most exciting developments during the early days of pattern recognition was the
perceptron: the idea that a network of elemental processors arrayed in a manner reminiscent of
biological neural nets might be able to learn how to recognize or classify patterns in an autonomous
manner. However, it was realized that simple linear networks were inadequate for that purpose and
that non-linear networks based on threshold-logic units lacked effective learning algorithms. This
problem was solved by Rumelhart, Hinton and Williams (1986) with a generalized delta rule
(GDR) for learning. In the following section, a neural network model based on the generalized delta
rule is introduced.
8.7.1. The Generalized Delta Rule For the Semilinear Feed Forward Net With Back Propagation
of Error
The architecture of a layered net with feed forward capability is shown below:
In this system architecture, the basic elements are nodes and links. Nodes are arranged in
layers. Each input node accepts a single value, and each node generates an output value. Depending on the
layer in which a node is located, its output may be used as the input for all nodes in the next layer.
The links between nodes in successive layers carry the weight coefficients. For example, wji is the weight
on the link between two nodes from layer i to layer j. Each node is an arithmetic unit. Nodes in the same
layer are independent of each other; therefore they can be implemented in parallel. Except for the
nodes of the input layer i, each node takes the outputs of all the nodes of the previous layer and uses their
linear combination as its net input; for a node in layer j, the net input is
uj = Σi wji Oi .
The output of the node in layer j is
Oj = f(uj) ,
where f is the activation function. It often takes the form of a sigmoidal function,
Oj = 1 / (1 + exp( -(uj + θj) / θo )) .
θj serves as a threshold or bias. The effect of a positive θj is to shift the activation function to the left
along the horizontal axis. The effect of θo is to modify the shape of the sigmoid. These effects are
illustrated in the following diagram.
This function allows each node to react to certain inputs differently; some nodes may be easily
activated, or fired, to generate a high output value when θo is low and θj is small. On the contrary, when θo
is high and θj is large, a node will have a slower response to the input uj. This is considered to occur
in the human neural system, where neurons are activated by different levels of stimuli.
Such a feed-forward network requires a single set of weights and biases that will satisfy all the (input,
output) pairs presented to it. The process of obtaining the weights and biases is called network learning
or training. In the training task, a pattern p = { Ipi }, i = 1, 2, ..., ni, is presented, where ni is the number
of nodes in the input layer and Ip is the input pattern index.
For the given input pattern p, we require that the network adjust the set of weights in all the connecting
links, and also all the thresholds in the nodes, such that the desired outputs tp = { tpk }, k = 1, 2, ..., nk
(nk is the number of output nodes), are obtained at the output nodes. Once this adjustment has been
accomplished by the network, another pair of input and output patterns is presented, and the network
is asked to learn that association also.
In general, the output Op = { Opk } from the network will not be the same as the target or desired
values tp. For each pattern, the square of the error is
Ep = (1/2) Σk (tpk - Opk)² .
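The node output and the per-pattern error can be sketched directly from the formulas above; the weights, inputs and thresholds are illustrative values, not a trained network.

```python
import math

def node_output(inputs, weights, theta_j, theta_o=1.0):
    """O_j = 1 / (1 + exp(-(u_j + theta_j) / theta_o)),
    with net input u_j = sum_i w_ji O_i."""
    u = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-(u + theta_j) / theta_o))

def pattern_error(targets, outputs):
    """E_p = 1/2 * sum_k (t_pk - O_pk)^2 for one training pattern."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

# One node with two hypothetical inputs and weights:
o = node_output([1.0, 0.0], [0.5, -0.5], theta_j=0.0)  # u_j = 0.5
e = pattern_error([1.0], [o])
```

Training with the generalized delta rule amounts to repeatedly nudging each wji and θj in the direction that reduces Ep, with the error propagated backwards from the output layer.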
Crain, I.K., Gong, P., Chapman, M.A., 1993. Implementation considerations for uncertainty
management in an ecologically oriented GIS. GIS'93, Vancouver, B.C., pp.167-172.
Duda, R. O. and P. E. Hart, 1973. Pattern Classification and Scene Analysis. Wiley and Sons, New
York, 482p.
Freeman J.A., D. M. Skapura, 1991. Neural Networks, Algorithms, Applications, and Programming
Techniques, Addison-Wesley:New York.
Gong, P., 1993. Change detection using principal component analysis and fuzzy set theory. Canadian
Journal of Remote Sensing. 19(1): 22-9.
Gong, P., and D.J. Dunlop, 1991. Comments on Skidmore and Turner's supervised non-parametric
classifier. PE&RS. 57(1):1311-1313.
Gong, P. and P. J. Howarth, 1990. Land cover to land use conversion: a knowledge-based approach,
Technical Papers, Annual Conference of American Society of Photogrammetry and Remote Sensing,
Denver, Colorado, Vol. 4, pp.447-456.
Gong, P., A. Zhang, J. Chen, R. Hall, I. Corns, Ecological land systems classification using
multisource data and neural networks, Accepted by GIS'94, Vancouver, B.C., February, 1994.
Goodchild, M.F., G. Sun, S. Yang, 1992. Development and test of an error model for categorical data.
International Journal of Geographical Information Systems. 6(2): 87-104.
Kosko, B., 1992. Neural Networks and Fuzzy Systems. Prentice-Hall; Englewood Cliffs, New Jersey.
Kruse, R., E. Schwecke, and J. Heinsohn, 1991. Uncertainty and Vagueness in Knowledge Based
Systems: Numerical Methods. Springer-Verlag: New York.
Mark, D., and F. Csillag, 1989. The nature of boundaries on area-class maps. Cartographica, pp. 65-77.
Pao Y., 1989. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley: Reading, MA.
Richards, J. A., 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag,
Berlin.
Shinghal R., 1992. Formal Concepts in Artificial Intelligence, Fundamentals. Chapman & Hall: New
York.