
A NOVEL VISION-BASED TREE CROWN AND SHADOW
DETECTION ALGORITHM USING IMAGERY FROM AN
UNMANNED AIRBORNE VEHICLE
Calvin Hung, Mitch Bryson and Salah Sukkarieh
Australian Centre for Field Robotics
The Rose Street Building J04
The University of Sydney
NSW 2006 Australia
+61 2 9351 7154
{c.hung, m.bryson, s.sukkarieh}@acfr.usyd.edu.au

Abstract
The majority of weed detection research concentrates on problems where
weeds grow within an artificial plantation, with the weeds lying between lines
of crops. These problems are typically solved by learning the plantation
geometry using various pattern recognition techniques; any vegetation that
does not fit the geometric model is then classified as weed. Where weeds are
distributed over a natural landscape, the distribution of weed and non-weed
vegetation is more complex and cannot be discriminated using a simple
geometric model describing the relative distribution of vegetation. Further
analysis of weed colour and texture is commonly used to improve the
classification results; however, the shape of the weed itself is not usually
explored in this field as a means of weed detection.
In this paper, we examine the use of a template-matching tree crown detection
technique (used successfully in Scandinavian forestry studies) for identifying
woody weeds in a natural landscape based on their shadow. The proposed
algorithm is divided into two stages: the first stage segments the image using
colour and texture features; the second stage performs template matching
using shape information related to the projected shadow of the woody weed,
relying on the time of day, sun angle and UAV pose information collected by
the onboard navigation system. We present experimental results of the
approach using colour vision data collected from a UAV during operations at
Julia Creek in July 2009 and August 2010.

1. Introduction
It is easy for humans to identify different types of objects in a visual
environment; however, robust and efficient vision-based object recognition
remains a challenging problem in the fields of robotics and machine vision.
Progress has been made over the past decades and different algorithms have
been applied in various applications. Examples include manufacturing control
such as bin picking [Perkins, 1978], [Agrawal et al., 2010] and sorting
[Grimson, 1991],

object detection in remote sensing [Binford, 1989], [Mundy, 1990],
pedestrian detection [Gavrila, 2000] and vehicle detection [Betke et al., 2000].
This paper concentrates on object recognition for remote sensing in the context
of aerial surveying. Various methods have been proposed for the detection and
recognition of objects from aerial images [Thompson and Mundy, 1987],
[Brooks et al., 1981]; however, these techniques mostly deal with well defined
man-made objects whose visual signatures can be accurately modelled. Unlike
man-made objects, natural objects, such as trees, are less uniform and difficult
to generalise with geometric models. A related problem exists in the field of
forest research, where various tree counting algorithms have been proposed to
detect densely populated tree species with well defined outlines [Pollock, 1996],
[Larsen and Rudemo, 1997], [Olofsson et al., 2006].
An image frame is a two-dimensional projection of the real world from the
camera point of view; this loss of one dimension is one of the limiting factors
in computer vision with monocular cameras. Computer vision techniques have
been developed to retrieve the lost dimension using shading, as in photometric
stereo [Brooks, 1989], and using shadow, by exploiting the fact that the
shadow of an object is a projection from the point of view of the light source.
In this paper we study the effect of utilising this extra shadow projection to
add an extra dimension to the otherwise two-dimensional image. Similar
approaches have been used in object recognition and bin picking by casting
shadows [Agrawal et al., 2010] and in using ray-traced templates to find
individual trees in aerial photographs [Larsen and Rudemo, 1997].
The aim of this paper is to track or map the distribution of objects in an outdoor
unstructured environment. For outdoor robotic applications object recognition
has proven challenging due to the structural complexity of natural objects. To
deal with the complexity this paper introduces a new method for remote sensing
object detection and recognition for aerial surveying of vegetation. The
technique models tree observations using a geometric model as well as colour,
texture and shadow information to detect and identify individual trees.
The proposed algorithm is divided into two stages. The first stage is a basic
image segmentation using colour and texture information: colour and texture
features are selected from the original image and Support Vector Machines
(SVM) [Burges, 1998], [Vapnik, 2000] are trained to distinguish between
background, trees and shadows in the image. In the second stage, a target
template is generated using prior information. To quantify shape information of
the target template, it is necessary to construct an object geometric model.
This model is then used to produce an appearance template model. Based on
the navigation solution, the position and attitude of the platform and the
camera are known. Combining this information with a solar position model, the ideal
appearance of the object with shading and shadow can be predicted. The
relative position of target object and its corresponding shadow is treated as the
context information and can be used as supporting evidence of detections. The
proposed algorithm utilises different levels of vision features including low level
features such as colour and luminance, intermediate level features such as
shape and textured regions and high level features such as context information.

Low level vision assigns labels to every pixel, whereas high level vision is
responsible for labelling discrete objects. Different levels of vision features are
used in the proposed algorithm to extract as much information as possible from
the information-rich vision data.

2. Related Work

2.1 Object Recognition


The aim of object recognition is to identify predefined targets within an image.
Object recognition using geometric models has gone through significant
developments in the past four decades. Approaches range from the traditional
object geometric models to more recent development in statistical learning
methods [Mundy, 2006].
Three-dimensional object recognition using alignment is one of the first model-
based object recognition algorithms; the algorithm projects a pre-generated
model onto the images and then checks for the expected features
[Huttenlocher and Ullman, 1987], [Huttenlocher, 1988]. This approach is
chosen and extended in this paper: instead of modelling the object from
multiple possible viewpoints, an exact object template is generated from a
synthetic geometric model using knowledge of the platform navigation solution
and a sun path model. This extension makes the algorithm viewpoint invariant,
greatly reduces the size of the search space and allows the recognition and
detection algorithm to run more efficiently.

2.2 Tree Crown Detection


A number of tree detection algorithms have been proposed in the forest
research field over the past few years to detect natural vegetation in outdoor,
unstructured aerial images. The proposed object detection algorithm is
designed to tackle a problem with similar settings; similar tree crown detection
algorithms are therefore studied in this literature review.
Three methods are most commonly used in tree crown detection algorithms.
The first is a valley-finding algorithm [Gougeon, 1995], in which local maxima
of the image are treated as tree crowns and local minima are treated as crown
edges. The second is a region growing method [Erikson, 2003]: random seed
points are generated in the image, the colour and texture attributes of the
surrounding pixels are calculated, and similar pixels are grouped into the same
region; the regions grow until the boundaries of adjacent regions collide. The
third is a template matching method [Pollock, 1996], in which a tree crown
model is generated and matched against the grey scale image. A summary and
performance comparison can be found in [Erikson and Olofsson, 2005].
The template matching method is chosen in this study because, unlike the
other two approaches, the template-based approach uses both the low level
vision feature of luminance and the intermediate level feature of shape. Tree
recognition in aerial images of forests based on a synthetic tree crown image
model was first proposed by [Pollock, 1996]; this algorithm was then expanded
to include shadow [Larsen and Rudemo, 1997], and further improvements and
implementations to discriminate tree species were shown in [Olofsson et al.,
2006].

3. Algorithm
Object recognition is the identification of certain target objects inside an image
frame and is part of a more general pattern recognition problem. Pattern
recognition can be performed using either a priori knowledge or using statistical
information learned from the data. In this object recognition application both
approaches are used. Patterns such as colour and texture which are not
obvious to the human eye and difficult to model directly are learned statistically
from the data set, whereas patterns such as shape, scale and orientation of the
target object can be modelled directly using a priori knowledge.
The object recognition algorithm is divided into two modules: an image
segmentation module that takes the colour and texture information into
account, and a model-object matching module that takes the shape, scale and
context information into account. The two-stage approach breaks the otherwise
difficult vision problem down into manageable sub-problems. It also allows
each module to be evaluated separately, so that future improvements can be
made independently, and lets us generalise the algorithm to different
applications with a similar structure by re-learning the statistical models and
substituting other prior knowledge.

3.1 Image Segmentation


This section discusses the algorithm used to perform the first-stage image
segmentation. The image segmentation stage divides the original image into
three different classes based on colour and texture features. The aim is to
change the representation of the images from arbitrary colours into meaningful
labels which can be analysed by later stages. The three classes are the target
object class, the shadow class and the background class.
The images were collected using a 3-CCD camera. To extract texture
information, the MPEG-7 texture descriptors are used [Jain and Farrokhnia,
1991], [Manjunath and Ma, 1996]. The texture descriptors consist of 30
individual channels, a combination of six different orientations and five
octaves in the radial direction. Each channel is a 2D Gabor function, and mean
and variance values can be obtained for each channel. The images are
captured in the Red Green Blue (RGB) colour space; to extract the colour
features, the images are transformed to the Hue, Saturation and Value (HSV)
colour space to reduce the sensitivity to changes in light intensity.

After feature extraction, the colour and texture features are grouped into a
single feature vector consisting of three colour channels and thirty texture
channels. Each feature vector is then assigned a label representing its class.
The aim is to segment the original image into three classes: object, shadow
and background. An SVM is used as the classifier to predict the labels of the
feature vectors.
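As a concrete illustration, the sketch below builds a 33-dimensional per-patch feature vector in the spirit of this pipeline: three HSV colour channels plus the responses of a 30-channel Gabor bank (six orientations by five octaves). The kernel size, wavelengths and octave spacing are illustrative assumptions rather than the paper's exact descriptor; the resulting vectors could then be used to train an SVM classifier (e.g. `sklearn.svm.SVC`, an assumed implementation choice).

```python
import numpy as np

def rgb_to_hsv(rgb):
    """Vectorised RGB -> HSV for an (H, W, 3) float image in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    v = rgb.max(axis=-1)
    c = v - rgb.min(axis=-1)                        # chroma
    s = np.where(v > 0, c / np.maximum(v, 1e-12), 0.0)
    safe_c = np.where(c > 0, c, 1.0)                # avoid divide-by-zero
    hr = ((g - b) / safe_c) % 6.0
    hg = (b - r) / safe_c + 2.0
    hb = (r - g) / safe_c + 4.0
    h = np.where(v == r, hr, np.where(v == g, hg, hb)) / 6.0
    return np.stack([np.where(c > 0, h, 0.0), s, v], axis=-1)

def gabor_bank(size=15, orientations=6, octaves=5):
    """30 Gabor kernels (real part): six orientations x five radial octaves."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernels = []
    for octave in range(octaves):
        wavelength = 4.0 * 2.0 ** octave            # assumed octave spacing
        sigma = 0.5 * wavelength
        envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
        for t in range(orientations):
            theta = t * np.pi / orientations
            xr = x * np.cos(theta) + y * np.sin(theta)
            kernels.append(envelope * np.cos(2.0 * np.pi * xr / wavelength))
    return np.stack(kernels)                        # shape (30, size, size)

def patch_features(rgb_patch, bank):
    """33-d feature vector for one patch: mean HSV colour (3 channels) plus
    the response of each Gabor kernel (30 texture channels)."""
    colour = rgb_to_hsv(rgb_patch).reshape(-1, 3).mean(axis=0)
    grey = rgb_patch.mean(axis=-1)                  # luminance proxy
    texture = (bank * grey).sum(axis=(1, 2))        # one response per kernel
    return np.concatenate([colour, texture])
```

Per-channel mean and variance statistics, as described above, would be computed over larger windows of such responses before training the classifier.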

Figure 1: Image segmentation: the classifier converts the original colour images into
images with meaningful class labels.

3.2 Object Recognition Using Shape


Object detection algorithms based purely on statistical information have their
performance limited by the quality of the training data. Additionally, prior
knowledge such as the shape and position of the shadow given the incident
light direction is too subtle to be incorporated into the statistical learning
process. This necessitates the separation of the image segmentation, based on
statistical learning, from the object appearance model, based on prior knowledge.
This section discusses the algorithm used to generate the target object
appearance model. The platform navigation data and the target object outline
are used as prior knowledge in this algorithm: a geometric model is constructed
from the target object outline; the platform position and time are used to
estimate the direction of the incident sunlight, which in turn is used to predict
the direction and outline of the shadow; finally, the platform pose is used to
estimate the appearance of the object and shadow in each observed image
frame. The object appearance model encapsulates both the object shape and
the context information.
In this algorithm, simple geometric models are used to approximate the outline
of the target objects. In contrast to industrial applications, where more
complicated CAD models are used, it is not possible to generalise and model
the exact shape of a natural object; a simple geometric shape, or a
combination of geometric shapes, is therefore a good approximation. In this
paper the algorithm is applied to trees with an ellipsoidal appearance, so an
ellipsoid is used to approximate the shape.
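A minimal sketch of how such templates might be rasterised is given below: a top-down elliptical crown mask, and a ground-shadow mask whose semi-axis along the shadow direction grows as the sun gets lower. The pixel radii, the spherical-crown approximation for the shadow, the axis-aligned layout and the omission of the rotation to the predicted shadow azimuth are all simplifying assumptions.

```python
import numpy as np

def crown_template(radius_px, aspect=1.0):
    """Binary mask of an elliptical crown outline (top-down view of an
    ellipsoidal canopy); `aspect` scales the vertical image axis."""
    r = int(np.ceil(radius_px * max(1.0, aspect)))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return (x / radius_px) ** 2 + (y / (radius_px * aspect)) ** 2 <= 1.0

def shadow_template(radius_px, sun_elevation_rad):
    """Ground-shadow mask of a roughly spherical crown: an ellipse whose
    along-shadow semi-axis is stretched by the sun elevation. The rotation
    to the predicted shadow azimuth is omitted for brevity."""
    a = radius_px / np.sin(sun_elevation_rad)  # along-shadow semi-axis
    b = float(radius_px)                       # across-shadow semi-axis
    ra, rb = int(np.ceil(a)), int(np.ceil(b))
    y, x = np.mgrid[-rb:rb + 1, -ra:ra + 1]
    return (x / a) ** 2 + (y / b) ** 2 <= 1.0
```

With the sun directly overhead the shadow mask reduces to the circular crown mask; lowering the sun elongates it, which is the shape cue exploited later in the template matching.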
In addition to the object model which provides the shape prior information, the
shading and shadow cast by the object can provide extra shape information.
More importantly, the shading and shadow orientation actually provide context
information. The orientation of shading and shadow can be predicted using a
solar model and knowledge of the vehicle pose. Any potential object detections
with the wrong shading and shadow orientation can thus be rejected.
A sun path model is used to estimate the position of the sun in the sky at
defined times and locations where images are collected. The model returns a
light vector which represents the orientation of the incident light from the sun.
The image time stamps are used in the sun path model to predict the exact
position of the sun in the sky; combining this with the platform position allows
prediction of the direction of the shadow. The platform pose is used to estimate
the camera pose. Combined with the solar model we are able to predict the
shadow position with respect to the target object in each image frame.
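The paper does not detail its sun path model; a rough stand-in using the common declination/hour-angle approximation is sketched below. The simplified declination formula and the local-solar-time input are assumptions; a fielded system would use a full ephemeris with timezone and equation-of-time corrections.

```python
import numpy as np

def sun_position(day_of_year, solar_hour, latitude_deg):
    """Approximate solar elevation and azimuth (degrees) from a simple
    declination/hour-angle model; azimuth is measured clockwise from north."""
    lat = np.radians(latitude_deg)
    # Approximate solar declination for the given day of year.
    decl = np.radians(-23.44 * np.cos(np.radians(360.0 / 365.0
                                                 * (day_of_year + 10))))
    hour_angle = np.radians(15.0 * (solar_hour - 12.0))
    sin_elev = (np.sin(lat) * np.sin(decl)
                + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    elev = np.arcsin(sin_elev)
    cos_az = ((np.sin(decl) - sin_elev * np.sin(lat))
              / (np.cos(elev) * np.cos(lat)))
    az = np.degrees(np.arccos(np.clip(cos_az, -1.0, 1.0)))
    if hour_angle > 0:                     # afternoon: sun west of meridian
        az = 360.0 - az
    return np.degrees(elev), az

def shadow_vector(elev_deg, az_deg, height_m):
    """Length and bearing of the shadow cast by a point at `height_m`."""
    length = height_m / np.tan(np.radians(elev_deg))
    bearing = (az_deg + 180.0) % 360.0     # shadow points away from the sun
    return length, bearing
```

Combined with the camera pose, the returned light vector fixes where the shadow template should sit relative to each candidate object in the image.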

3.3 Correlation Map


In the previous sections, image segmentation was performed using the learnt
statistical model and the object and shadow model was generated using the
prior knowledge. In this section, the outputs from both approaches will be
combined to perform object detection.
The output of the statistical model is a segmented image based on the class
labels; the image is divided into three separate regions: background, object
and shadow. The output of the object appearance model is an object and
shadow template, divided into the same three regions. The detection is
performed using correlation template matching; this method exhaustively
searches through the image space to calculate the correlation.
Two correlation maps are generated from each frame. The object template is
matched with the object segment and the shadow template is matched with the
shadow segment; these two correlation maps use the shape information and
act as weak detectors. The two individual correlation maps are then combined
to generate the final detection map. This step encapsulates the context
information, that is, the relative positions of the object and shadow. The final
correlation map is a strong detector. The entire algorithm is summarised in
Figure 2.
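A toy version of this combination step might look as follows: exhaustive (valid-mode) correlation of each binary segment with its template, then a product of the object map with the shadow map shifted by the predicted object-to-shadow offset. The binary templates and the wrap-around `np.roll` shift are simplifications of proper normalised correlation and border handling.

```python
import numpy as np

def correlate(segment, template):
    """Exhaustive valid-mode correlation of a binary segment image with a
    binary template, normalised by the template area."""
    H, W = segment.shape
    h, w = template.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(segment[i:i + h, j:j + w] * template)
    return out / template.sum()

def combine_maps(object_map, shadow_map, offset):
    """Strong detector: the object correlation multiplied by the shadow
    correlation evaluated at the predicted shadow offset (dy, dx)."""
    dy, dx = offset
    return object_map * np.roll(shadow_map, (-dy, -dx), axis=(0, 1))
```

The product suppresses object-shaped responses whose shadow is missing or lies in the wrong direction, which is exactly the context check described above.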

Figure 2: Algorithm summary: during each observation both vision and pose
information are obtained. The vision information is processed using statistical
learning and simplified into three classes. The pose information and the solar model
are used to generate the object appearance model, which is then compressed into a
prior template. The outputs of both approaches are combined using correlation
template matching; a correlation map is produced at the end of the algorithm,
indicating the likelihood of a target object at each image coordinate.

4. Results and Discussions


The results generated using the proposed algorithm are presented in this
section. We also discuss the performance of the algorithm.
Aerial image data collected from Queensland, Australia was used to evaluate
the proposed algorithm. The flight area was a rural area at Julia Creek. The
survey area was flat, with vegetation distributed sparsely and consisting
mainly of isolated trees. The robotic platform used to collect the data was a
one-third scale J3 Cub aircraft with a sensor payload consisting of a 3-CCD
camera with a 200 m by 140 m ground field of view. The navigation data was
collected using GPS and inertial sensors.

Figure 3: Scaled-down J3 Cub: this platform was used to collect aerial image data
over Julia Creek, Queensland, Australia. The mission area is flat with sparsely
distributed trees, making the data set ideal for testing the object detection algorithm.

The detection algorithm utilises the colour and texture information from the
segmentation stage, the shape information from the weak object and shadow
detectors and the context information from the strong detector.
In this data set the target objects were trees, which varied slightly in size
depending on environmental conditions. Three different template sizes were
defined according to prior knowledge of the typical size of the trees; these
templates were created to capture most targets within the size range. The
multi-scale correlation map is shown in Figure 4, where the responses to the
large, medium and small templates are colour coded red, green and blue
respectively.
To evaluate the overall performance of the algorithm, the correlation map was
thresholded to produce regions of interest and the centroid of each region was
calculated. A correction vector was also applied to compensate for the offset
between the centre of the template and the corresponding point on the
geometric model. This process is shown in Figure 4.
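The thresholding and centroid-extraction step can be sketched as below, using a simple 4-connected flood fill; the threshold value and the correction vector are application-specific and appear here only as parameters.

```python
import numpy as np

def region_centroids(corr_map, threshold, correction=(0.0, 0.0)):
    """Threshold a correlation map, label 4-connected regions of interest
    and return the corrected (row, col) centroid of each region."""
    mask = corr_map >= threshold
    labels = np.zeros(mask.shape, dtype=int)
    centroids = []
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue                      # pixel already assigned to a region
        labels[seed] = len(centroids) + 1
        stack, pixels = [seed], []
        while stack:                      # iterative flood fill
            r, c = stack.pop()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                        and mask[rr, cc] and not labels[rr, cc]):
                    labels[rr, cc] = labels[seed]
                    stack.append((rr, cc))
        mean_r, mean_c = np.mean(pixels, axis=0)
        centroids.append((mean_r + correction[0], mean_c + correction[1]))
    return centroids
```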

Figure 4: Detection result: to evaluate the detection rate it is necessary to identify
the location of each detected object. Top left is the original image, top right is the
correlation map, bottom left is the thresholded correlation map with regions of
interest, and bottom right is the original image overlaid with the centroid of each
corresponding region of interest.

The detection results from individual image frames were then transformed into
global coordinates using an onboard mapping system [Bryson et al., 2010] to
generate the global distribution of the vegetation. The result is shown in Figure
5. The overall sensitivity and specificity of the algorithm were both approximately 80%.
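The full pipeline uses the mapping system of [Bryson et al., 2010]; as a much-simplified illustration of the idea, the flat-ground sketch below maps an image detection into local easting/northing for a nadir-pointing camera with the stated 200 m by 140 m footprint. The heading convention and the fixed-footprint assumption are ours, not the paper's.

```python
import numpy as np

def pixel_to_world(px, py, img_w, img_h, cam_e, cam_n,
                   heading_rad=0.0, footprint=(200.0, 140.0)):
    """Map a pixel detection to local easting/northing (metres), assuming a
    nadir-pointing camera, flat ground and a known ground footprint.
    Heading is measured clockwise from north; image rows grow downwards."""
    fw, fh = footprint
    dx = (px / img_w - 0.5) * fw          # metres right of the camera axis
    dy = (0.5 - py / img_h) * fh          # metres forward of the camera axis
    # Rotate the body-frame offset into the east/north frame.
    east = cam_e + dx * np.cos(heading_rad) + dy * np.sin(heading_rad)
    north = cam_n - dx * np.sin(heading_rad) + dy * np.cos(heading_rad)
    return east, north
```

A real system would instead project each detection through the full camera model with the estimated platform attitude and terrain height.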

Figure 5: Mapping over part of the mission area. White dots represent the detected
crowns.

5. Conclusion and Future Work


This paper presented a novel object detection algorithm using aerial photos
taken from a monocular camera. The algorithm utilised two aspects of pattern
recognition: a statistical model learnt from the data and prior knowledge from
an understanding of the problem. The algorithm learnt colour and texture
features of the target object and matched the labelled image to the object
appearance model created using prior knowledge. Two weak detection maps
were generated separately for the target object and the corresponding shadow;
these weak detection maps used the shape information. A strong detection
map was generated by combining both the object and shadow detection maps,
encapsulating the context information. The two-stage approach broke down
the otherwise difficult vision problem into manageable sub-problems, and each
stage was evaluated and optimised separately, achieving a final sensitivity of
around 80% and a specificity also around 80%. In this particular application,
further vegetation classification can also be performed with the global
vegetation distribution map; this can be done by extracting features such as
colour, texture, shape and size around each region of interest and training a
separate classifier to differentiate species of vegetation.

There are a few future improvements that can be made. Firstly, the algorithm
gets confused when target objects are very close together: because the outline
is an approximation rather than an exact model, the algorithm cannot
distinguish whether a detection is one large object or multiple smaller objects.

This ambiguity problem is intrinsic to the detection algorithm because the
performance is restricted by the resolution of the data. It can potentially be
resolved by using an active sensing strategy in which the most ambiguous
regions are selected for further surveying at a different resolution. This
monocular image detection algorithm could also be combined with a stereo
vision algorithm to provide extra depth information, although the depth
estimates could be of poor quality due to the small baseline; depth could then
be used as an extra input to the statistical learning algorithm. The algorithm
could also be extended to solve other object recognition problems in other
aerial imaging scenarios, such as power-line surveying, traffic monitoring and
wildlife population monitoring.

References
A. Agrawal, Y. Sun, J. Barnwell, and R. Raskar. Vision-guided Robot System
for Picking Objects by Casting Shadows. The International Journal of Robotics
Research, 2010.
M. Betke, E. Haritaoglu, and L.S. Davis. Real-time multiple vehicle detection
and tracking from a moving vehicle. Machine Vision and Applications, 12(2):69–
83, 2000.
T.O. Binford. Spatial understanding: the successor system. In Proceedings of a
workshop on Image understanding workshop, page 20. Morgan Kaufmann
Publishers Inc., 1989.
M.J. Brooks and B.K.P. Horn. Shape and source from shading. Shape from
shading, 1989.
R.A. Brooks. Symbolic reasoning among 3D objects and 2D models. Artificial
Intelligence, 17:285–348, 1981.
M. Bryson, A. Reid, F. Ramos and S. Sukkarieh. Airborne Vision-Based
Mapping and Classification of Large Farmland Environments. Journal of Field
Robotics, 27(5): 632-655, 2010
C.J.C. Burges. A tutorial on support vector machines for pattern recognition.
Data mining and knowledge discovery, 2(2):121–167, 1998.
M. Erikson. Segmentation of individual tree crowns in colour aerial photographs
using region growing supported by fuzzy rules. Canadian Journal of Forest
Research, 33(8):1557–1563, 2003.
M. Erikson and K. Olofsson. Comparison of three individual tree crown
detection methods. Machine Vision and Applications, 16(4):258–265, 2005.
D. Gavrila. Pedestrian detection from a moving vehicle. In Computer Vision -
ECCV 2000, pages 37–49, 2000.
F.A. Gougeon. A crown-following approach to the automatic delineation of
individual tree crowns in high spatial resolution aerial images. Canadian Journal
of Remote Sensing, 21(3):274–284, 1995.

W.E.L. Grimson. Object recognition by computer: the role of geometric
constraints. 1991.
D.P. Huttenlocher. Three-dimensional recognition of solid objects from a two-
dimensional image. AITR-1045, 1988.
D.P. Huttenlocher and S. Ullman. Object recognition using alignment. In
Proceedings of the 1st International Conference on Computer Vision, pages
102–111, 1987.
A.K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor
filters. Pattern recognition, 24(12):1167–1186, 1991.
M. Larsen and M. Rudemo. Using ray-traced templates to find individual trees
in aerial photographs. In Proceedings of the Scandinavian Conference on
Image Analysis, volume 2, pages 1007–1014. Citeseer, 1997.
B.S. Manjunath and W.Y. Ma. Texture features for browsing and retrieval of
image data. Pattern Analysis and Machine Intelligence, IEEE Transactions on,
18(8):837 –842, August 1996.
J. Mundy. Object recognition in the geometric era: A retrospective. Toward
Category-Level Object Recognition, pages 3–28, 2006.
J.L. Mundy and A.J. Heller. The evolution and testing of a model-based object
recognition system. In Computer Vision, 1990. Proceedings, Third International
Conference on, pages 268–282, 1990.
K. Olofsson, J. Wallerman, J. Holmgren, and H. Olsson. Tree species
discrimination using Z/I DMC imagery and template matching of single trees.
Scandinavian Journal of Forest Research, 21:106–110, 2006.
W.A. Perkins. A model-based vision system for industrial parts. IEEE
Transactions on Computers, pages 126–143, 1978.
R.J. Pollock. The automatic recognition of individual trees in aerial images of
forests based on a synthetic tree crown image model. PhD thesis, The
University of British Columbia (Canada), 1996.
D. Thompson and J. Mundy. Three-dimensional model matching from an
unconstrained viewpoint. In 1987 IEEE International Conference on Robotics
and Automation. Proceedings, volume 4, 1987.
V.N. Vapnik. The nature of statistical learning theory. Springer Verlag, 2000.

