
A Comparative Study of Land Cover Multispectral and Statistical Texture Data Classification Using Data Mining Techniques

Salman Qadri, Mutiullah, Muhammad Shahid, Muzammil-ul-Rehman and Ejaz Ahmad Rehmani
Department of Computer Science & IT, Faculty of Management Sciences, The Islamia University of Bahawalpur, Punjab 63100, Pakistan.
*Author for correspondence; e-mail address: salman.qadri@iub.edu.pk

ABSTRACT
The main objective of this study was to assess the importance of data mining techniques for the classification of five land cover (LC) types: fertile cultivated land, green pasture, desert rangeland, bare land and Sutlej river land. A novel optimized land classification framework (OLCF) was designed to classify these land cover types accurately. The five LC types are strongly correlated with each other. Visual inspection showed that three of them (desert rangeland, Sutlej river land and bare land) closely resemble each other in physical features, while the remaining two, fertile cultivated land (cropland) and green pasture (grass), have almost similar physical features. For these reasons, it is very difficult to discriminate these vast land cover areas accurately. To resolve this problem, remote sensing data for the five LC types were acquired with a handheld Crop Scan device (MSR5) in the form of five spectral bands (blue, green, red, infrared and near-infrared), while texture data were obtained with a digital camera by transforming the acquired images into texture features. All datasets were collected in the region of Bahawalpur, Pakistan. After data acquisition and preprocessing, feature selection and reduction techniques were applied, yielding an optimized feature set. These features were passed to the WEKA software for classification. A comparative analysis was performed on the results of Multilayer Perceptron (MLP), Random Forest (RF), J48 and Naive Bayes (NB). MLP outperformed the other classifiers, with an overall accuracy of 97.333% for the texture dataset and 96.66% for the spectral dataset.
Keywords: Textural features, Remote Sensing, MLP, multi-spectral, Land C

1. INTRODUCTION

Data mining techniques play an important role, together with image processing and remote sensing, in the betterment of the agriculture domain (Parihar and Oza, 2006). This study combines data mining with remote sensing to extract useful knowledge from the given datasets; data mining has recently come into wide use in agriculture. It is used to classify vast LC areas into different classes and to build crop yield assessment models (Blaschke et al., 2000). This is helpful not only for current needs but also for future prediction. In this century, the world has to face different challenges to human survival such as lack of food, poverty, drought and various catastrophic events (Rundquist, 2000). These issues can be addressed by providing better food, water, environment and security, and by increasing crop production through better utilization of cultivated land. LC information is essential for better planning and management. Scientists are trying to get the benefits of technology by applying it in different domains such as engineering, agriculture, economics and environmental sciences (Walter-Shea et al., 1992). They are trying to enhance the cultivated area with better crop varieties. Conventionally, cultivated lands are monitored through intensive field-based surveys (Foody, 2002). For these surveys to succeed, heavy investment and a large workforce are required. For developing countries such as Pakistan, it is not easy to spend large amounts on such projects, although almost half of the total population of these countries is associated with agriculture (Pakistan, 2000). All of these issues emphasize the significance of proper land classification, management and better utilization. By geographical distribution, land is categorized into different types such as bare, fertile, rocky, saline and sandy. In developing countries including Pakistan, the conventional field-based survey system could not be managed because of both financial and technical constraints. For the same reason, data mining with remote sensing and image processing technology could not be implemented for natural resource management as suggested by different researchers (Kureshy, 1995).

Similarly, the Chinese Academy of Sciences (CAS), in collaboration with different research teams, developed a land use dataset of temporal data (Gao et al., 1999, Liu et al., 2002, Liu et al., 2003). Shifa et al. (2011) discriminated cotton and sugarcane plants by using multispectral data and observed 98% overall classification accuracy. Rehmani et al. (2015) acquired two types of remote sensing data (radiometric and photographic) for five different wheat varieties and compared the classification accuracy: 96% for radiometric data and 93.14% for photographic data. A two-layer Conditional Random Field (CRF) model for land cover and land use classification was proposed by Albert et al. (2014). Similarly, a multilayer conditional random field (MCRF) land classification model was suggested for multitemporal, multiscale remote sensing data (Hoberg et al., 2012). Texture features from images with variable window sizes were used for four land cover types in aerial data, and different statistical features were used to classify the land cover data (Helmholz et al., 2014). A supervised pixel-based classification method was developed by implementing a Markov Random Field (MRF) method to differentiate agricultural land cover data (cropland and grassland) (Caridade et al., 2008). In data mining, classification is an ultimate objective: a classifier is built from a training dataset to predict the class of future objects whose class label is not known (Bayardo Jr, 1997, Di et al., 2000). Image segmentation gives a lot of object information, not only spectral but also about spatial or shape features (Blaschke, 2010, Hussain et al., 2013). Hu and Wang compared object-based approaches with traditional pixel-based approaches and showed improved classification accuracy (Hu and Wang, 2013). Photographic urban land-use data were classified into four classes (office, industrial, public and transportation) by applying a decision tree, achieving an accuracy of 61.88%. (waikato.ac.nz, 14,2016) described data mining algorithms to get information from GIS databases and remote sensing data by using inductive learning methods to improve land use classification of images. This study compares the performance of two types of data (multispectral and texture) for classification, using the MSR5 and photographic datasets; before this study, no datasets of this type had been deployed to data mining algorithms for land cover classification. The objective was to build a simple, comprehensive and exceptional framework to classify the five LC types discussed above in the natural environment by using an optimized set of spectral and statistical parameters; to accomplish this, we used statistical texture features for the photographic data and spectral features for the MSR5 (multispectral) data.

2. MATERIAL AND METHODS

This study focused on LC classification from remotely sensed data by using data mining classification algorithms. The research was conducted at The Islamia University of Bahawalpur, Punjab province (Pakistan), located at 29°23′44″N and 71°41′1″E. The spectral data were acquired with a device named Multispectral Radiometer Crop Scan (MSR5), which provides data equivalent to the satellite Landsat 5 TM (Thematic Mapper). Its output consists of five spectral bands, visible (blue, green, red), infrared and near infrared, ranging from 450 nm to 1750 nm, whereas the photographic data were acquired with a digital camera.

2.1 PROPOSED METHODOLOGY

We propose an optimized land classification framework (OLCF) for the studied LC types. To complete this study, the routine steps of image pre-processing, feature extraction, selection, reduction and classification were adopted; they are discussed in the following sections. The proposed methodology was implemented by using WEKA software version 4.3 [http://www.cs.waikato.ac.nz] (Szczypiński et al., 2009) with MaZda software version 4.6 (Svotwa et al., 2014) on an Intel(R) Core i3 2.4 GHz processor with a 64-bit operating system.
Figure 1. Proposed optimized land classification framework (OLCF)

2.2 DIGITAL PHOTOGRAPHIC DATA ACQUISITION

Digital photographs of the studied LC types were taken with a Nikon Coolpix digital camera with a resolution of 14.1 megapixels. Twelve colour photographs of each LC type were obtained, with dimensions of 4288×3216 pixels, 24-bit depth and JPG format. To enlarge the dataset, 5 non-overlapping regions of interest (ROIs) with a window size of 512×512 were taken on each image, giving a total of 300 (60×5) sub-images for the analysis.

The photographic data were taken at a height of 4 feet above the ground surface. The whole data collection process was completed between April and December 2015 at noon time (12.00 pm to 2.00 pm) under natural sunshine. For better overall experimental accuracy, the sunlight intensity was measured with a digital lux meter (MS 6610, MASTECH).

Figure 2. Photographic Land Cover data

2.3 MSR5 DATA ACQUISITION

Radiometric data were acquired with a Multispectral Radiometer (MSR5) made by CROPSCAN Inc. (USA). The MSR5 provides data compatible with the satellite Landsat 5 TM. It covers five segments of the spectrum: blue (450 to 520 nm), green (520 to 600 nm), red (630 to 690 nm), infrared (760 to 900 nm) and near infrared (1550 to 1750 nm). MSR5 Crop Scan data had previously been used for crop classification (Garatuza-Payan et al., 2003, Shifa et al., 2011) and for vegetation cover estimation and disease identification (Taghvaeian et al., 2012, Taghvaeian et al., 2013). For this study, 60 MSR scans of each plot were obtained at a height of 4 feet above the land surface. These scans were taken at the same sites where the digital photographic data of these LC types were acquired. Each MSR scan is composed of five spectral bands, three visible (blue, green, red) and two invisible (infrared and near infrared). The five LC types together contain a total of 300 spectral data instances (CROPSCAN, 2001).

2.5 OPTIMIZED CLASSIFICATION FRAMEWORK (OCF)

After acquiring both the multispectral and the photographic data, the proposed optimized classification framework (OCF) was applied for further processing and analysis. Each photographic image contains some extraneous portion, so the applicable image portion was extracted before further processing. Using image converter software, the extracted images were transformed to gray level (8 bit) and stored in BMP format. The MaZda software version 4.6 was used to calculate texture features (Szczypiński et al., 2009). In total, 234 texture features were calculated for each region of interest (ROI): 9 first-order parameters, 11 second-order (Haralick) parameters derived from the gray level co-occurrence matrix (GLCM) in all four directions (0°, 45°, 90° and 135°) up to a distance of 5 pixels, giving 220 (11×4×5) (Haralick et al., 1973), and 5 autoregressive parameters. Each ROI was therefore described by 234 textural features, and the data formed a 70200 (300×234) dimensional feature vector space. Not all of the obtained features were equally significant for classifying the studied land cover types, so it was necessary to reduce the feature dimensionality.
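The feature count given for each ROI follows from simple arithmetic, and the co-occurrence matrix behind the second-order parameters is easy to illustrate. The sketch below is a minimal pure-Python illustration, not MaZda's implementation: the helper names (`glcm`, `contrast`, `angular_second_moment`), the symmetric pair counting and the 4×4 toy image are assumptions made for the example.

```python
from itertools import product

# Feature-count arithmetic as stated in the text.
FIRST_ORDER = 9        # first-order (histogram) parameters
HARALICK = 11          # second-order (co-occurrence) parameters
DIRECTIONS = 4         # 0, 45, 90 and 135 degrees
DISTANCES = 5          # pixel distances 1..5
AUTOREGRESSIVE = 5     # autoregressive model parameters
TOTAL_ROIS = 300       # 5 LC types x 12 photographs x 5 ROIs

glcm_features = HARALICK * DIRECTIONS * DISTANCES
features_per_roi = FIRST_ORDER + glcm_features + AUTOREGRESSIVE
feature_space_size = TOTAL_ROIS * features_per_roi

# (row step, column step) of one pixel for each direction.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, distance, levels):
    """Return a symmetric, normalised co-occurrence matrix (nested lists)."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * distance, dc * distance
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r, c in product(range(rows), range(cols)):
        r2, c2 = r + dr, c + dc
        if 0 <= r2 < rows and 0 <= c2 < cols:
            a, b = image[r][c], image[r2][c2]
            counts[a][b] += 1
            counts[b][a] += 1  # assumption: count each pair in both orders
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum over i, j of p(i, j) * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def angular_second_moment(p):
    """Haralick angular second moment (energy): sum of p(i, j)^2."""
    return sum(v * v for row in p for v in row)


# Hypothetical 4x4 toy image with 4 gray levels (not data from the study).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(toy, angle=0, distance=1, levels=4)
print(features_per_roi, feature_space_size, contrast(p))
```

For the toy image, `contrast(p)` evaluates to 14/24. In the study, each of the 11 second-order parameters would be evaluated in the same way for 4 directions and 5 distances per ROI, which is where the 220 co-occurrence features come from.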
The aim of the reduction was to obtain the most discriminating features, which can separate and categorize the LC classes accurately.

2.8 FEATURES SELECTION

We used three supervised feature selection methods: Fisher coefficient (F), probability of error plus average correlation coefficient (POE+ACC) and mutual information coefficient (MI). These techniques are available in MaZda software version 4.6. Each technique gives the 10 most discriminating features in descending order of significance, so 30 features in total (10 per technique) were selected. As discussed by (Shahid M., 2014), combined techniques give better classification results; hence all three techniques were merged (F+PA+MI) to obtain the most discriminating features, and the resulting set of 30 features was used for further analysis.

Table 1. Feature Table (F+PA+MI) for ROI (512x512)

2.9 FEATURES REDUCTION

Before classification, the data were processed with feature reduction techniques to minimize the effect of unnecessary disparity within the data due to outliers and other objects. The combined feature selection technique (F+PA+MI) only picks the most important features; it does not directly state their level of discrimination power. To examine the data clustering, the 30 selected features were passed to the non-linear discriminant analysis (NDA) available in the B11 software, which is integrated with MaZda. NDA gave the better clustering result on the texture data, while for the MSR5 datasets linear discriminant analysis (LDA) gave the best results for data clustering and analysis. The objective of LDA is to obtain a linear transform matrix (Zapotoczny, 2011).

2.10 CLASSIFICATION

Classification is a key data mining technique used in a wide spectrum of applications. It is the process of assigning a given piece of information to one of the known classes. In data mining, it is a procedure for extracting information from a huge volume of data (Han and Kamber, 2006). In this study, different data mining classification methods are applied to two different types of LC dataset. We applied the following classification algorithms by using WEKA software version 4.3: Multilayer Perceptron (MLP), Naive Bayes (NB), Random Forest (RF) and J48. These classifiers were applied to both types of dataset, i.e. texture and spectral. All the classifiers were run after feature selection and reduction in order to obtain better overall accuracy. For processing in WEKA, both types of dataset were arranged in the Attribute-Relation File Format (ARFF).

Multilayer Perceptron (MLP): It is a type of artificial neural network (ANN): a feed-forward neural network with one or more layers between the input and output layers. It has three kinds of layer: input layer, hidden layer and output layer. The hidden layer is the middle layer, and there may be more than one. Every neuron (node) in a layer is connected to every node of the neighbouring layers. The training and testing parameters depend on the input layer, and further processing depends on the hidden and output layers.

Random Forest (RF): It is an ensemble learning technique for classification and is mostly used for large datasets. It can handle a huge number of features without deleting any from the dataset. RF can also be used for unsupervised data clustering to obtain better classification results.

J48: It is the WEKA implementation of the C4.5 classifier. Its result is a decision tree, i.e. a tree structure made up of different kinds of nodes.
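As an illustration of the ARFF arrangement mentioned for WEKA in this section, the following sketch writes a few spectral instances as ARFF text. The function name `to_arff`, the relation and attribute names, and the reflectance values are hypothetical; only the ARFF keywords (`@relation`, `@attribute`, `@data`) come from the WEKA file format.

```python
def to_arff(relation, feature_names, class_labels, rows):
    """Build ARFF text: numeric attributes followed by one nominal class."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in feature_names]
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for *values, label in rows:
        if len(values) != len(feature_names):
            raise ValueError("row length does not match the feature list")
        if label not in class_labels:
            raise ValueError(f"unknown class label: {label}")
        lines.append(",".join(str(v) for v in values) + "," + label)
    return "\n".join(lines) + "\n"


# The five MSR5 bands and five LC types named in the text.
BANDS = ["blue", "green", "red", "infrared", "near_infrared"]
LC_TYPES = [
    "fertile_cultivated_land",
    "green_pasture",
    "desert_rangeland",
    "bare_land",
    "sutlej_river_land",
]

# Made-up reflectance values, for illustration only.
sample_rows = [
    (4.1, 6.3, 5.2, 38.7, 21.4, "green_pasture"),
    (9.8, 13.5, 17.9, 24.6, 30.2, "bare_land"),
]

arff_text = to_arff("land_cover_spectral", BANDS, LC_TYPES, sample_rows)
print(arff_text)
```

The texture dataset would be arranged the same way, with the 30 selected texture features as numeric attributes in place of the five bands.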
A J48 tree contains a root node, intermediate nodes and leaf nodes. Every node in the tree contains a decision, and together the nodes describe the decision tree (Di et al., 2000).

Naive Bayes (NB): The Naive Bayes classifier is a set of supervised learning algorithms based on Bayes' theorem. It is also called a conditional probability model, and it is very fast compared with other, more complicated classifiers. Naive Bayes classifiers have worked very well on many real-world datasets, famously in document classification and spam filtering. They require only a small amount of training data to estimate the necessary parameters.

3. RESULTS AND DISCUSSIONS

In this study, the four data mining classification algorithms discussed above were run in WEKA, and their results were built and compared on both types of dataset. These data mining techniques are able to analyse large datasets. For both types of dataset (texture and spectral), we split the data into 66% for training and 34% for testing, with the 10-fold cross-validation method. We also measured other performance parameters: true positive (TP), false positive (FP), receiver operating characteristic (ROC), mean absolute error (MAE), root mean squared error (RMSE), confusion matrix, time complexity (T) and overall accuracy (OA). We first took the texture dataset for land cover classification. The different data mining classifiers showed different accuracy results. Texture data classification results were obtained with the 10-fold cross-validation method by using the MLP, RF, NB and J48 classifiers with an optimized set of 30 texture features. The MLP classifier gave the highest overall accuracy, 97.6667%, compared with the other deployed classifiers. Its higher overall accuracy (OA) is reported together with the other performance parameters, including kappa coefficient, TP, FP, ROC, MAE, RMSE and time complexity. All the texture-based land cover classification results, with the performance parameters, are shown in Table 2.

Table 2: Texture data classification table

Table 3 represents the confusion matrix of the texture data; it contains the actual and predicted classes for the MLP classifier. MLP shows the best overall accuracy among the classifiers employed.

Table 3: Multilayer Perceptron (MLP) texture data confusion table

The texture data classification graph of MLP is shown in Figure 4. Each LC type had 60 data instances (ROIs), and the graph shows how these were assigned to their respective classes.

Figure 4: Multilayer Perceptron (MLP) texture data classification graph

For the multispectral dataset, the same data mining classifiers were deployed as for the texture dataset discussed above. The 10-fold cross-validation approach was used with 5 spectral features. Here too the MLP classifier showed the highest overall accuracy compared with the other deployed classifiers, and the deployed spectral features provided a high overall accuracy.
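The overall accuracy, kappa coefficient and TP/FP rates used throughout this section can all be derived from a confusion matrix. The sketch below shows the standard formulas in plain Python; the function names and the 5×5 matrix are invented for illustration (60 instances per class, as in the study) and are not the values reported in the paper's tables.

```python
def overall_accuracy(cm):
    """Fraction of instances on the diagonal (rows = actual, cols = predicted)."""
    total = sum(map(sum, cm))
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(map(sum, cm))
    observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(col) for col in zip(*cm)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)


def true_positive_rate(cm, k):
    """Share of class-k instances that were predicted as class k."""
    return cm[k][k] / sum(cm[k])


def false_positive_rate(cm, k):
    """Share of non-class-k instances that were predicted as class k."""
    others = [i for i in range(len(cm)) if i != k]
    false_pos = sum(cm[i][k] for i in others)
    negatives = sum(sum(cm[i]) for i in others)
    return false_pos / negatives


# Hypothetical confusion matrix, 60 instances per class (not the paper's data).
cm = [
    [58, 2, 0, 0, 0],
    [1, 59, 0, 0, 0],
    [0, 0, 57, 2, 1],
    [0, 0, 2, 58, 0],
    [0, 0, 1, 0, 59],
]
oa = overall_accuracy(cm)
kappa = cohen_kappa(cm)
print(f"OA = {oa:.4f}, kappa = {kappa:.4f}")
```

Kappa is lower than the overall accuracy because it discounts the agreement expected by chance, which is 0.2 for five equally sized classes.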
This accuracy is reported together with the other performance parameters, including kappa coefficient, TP, FP, ROC, MAE, RMSE and time complexity. Table 4 shows the results of the different classifiers for the multispectral dataset.

Table 4: Spectral data classification table

The confusion table contains the actual and predicted classes for the MLP classifier. MLP showed the best overall accuracy among the classifiers employed on the multispectral dataset.

Table 5: Multilayer Perceptron (MLP) spectral data confusion table

The multispectral data classification graph for the MLP classifier is shown in Figure 5. Each LC type again had 60 data instances, and the graph shows how these were assigned to their respective classes.

Figure 5: Multilayer Perceptron (MLP) spectral data classification graph

The above discussion shows that better data acquisition, preprocessing, an optimized feature selection and the choice of data mining classifier all affect the classification results. By implementing this optimized land classification framework (OLCF) rather than relying on traditional qualitative parameters, different land cover types can be classified accurately into their appropriate classes (Armstrong et al., 2007).

4. CONCLUSION

In this study, five different types of land cover were classified into their appropriate classes. A comparative study of four data mining classifiers (MLP, RF, NB and J48) was performed on the texture and spectral datasets. Classification of both types of land cover dataset was assessed in terms of overall accuracy together with the other performance parameters discussed above. All the classifiers gave satisfactory results, but the multilayer perceptron outperformed the others. With the multilayer perceptron, an average accuracy of 96.333% for the spectral data and 97.666% for the texture data was observed. This was the best overall accuracy among all the deployed classifiers for the five land cover types: fertile land, green pasture, desert rangeland, bare land and Sutlej river land. It is important to note that, for the digital photographic dataset, it would be very difficult to achieve such an overall accuracy if the texture feature space were not optimized by the combined feature selection techniques (F+PA+MI) and by feature reduction with non-linear discriminant analysis (NDA). Although this is a lengthy, time-consuming and complex procedure, it leads to accuracy that is almost equal to, or in some cases better than, that of the multispectral data. In future, this study may be extended with data fusion, combining the textural and multispectral datasets for better classification results.

REFERENCES

Albert L, Rottensteiner F, Heipke C. A two-layer Conditional Random Field model for simultaneous classification of land cover and land use. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences. 2014;40:17.
Armstrong LJ, Diepeveen D, Maddern R. The application of data mining techniques to
characterize agricultural soil profiles. Proceedings of the sixth Australasian conference on Data mining and analytics-Volume 70: Australian Computer Society, Inc.; 2007. p. 85-100.
Bayardo Jr RJ. Brute-Force Mining of High-Confidence Classification Rules. KDD; 1997. p. 123-6.
Blaschke T. Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing. 2010;65:2-16.
Blaschke T, Lang S, Lorup E, Strobl J, Zeil P. Object-oriented image processing in an integrated GIS/remote sensing environment and perspectives for environmental applications. Environmental information for planning, politics and the public. 2000;2:555-70.
Caridade C, Marçal AR, Mendonça T. The use of texture for image classification of black & white air photographs. International Journal of Remote Sensing. 2008;29:593-607.
CROPSCAN I. MSR User's Manual. Rochester, MN, USA: 13-15. 2001.
Di K, Li D, Li D. Land use classification of remote sensing image with GIS data based on spatial data mining techniques. International Archives of Photogrammetry and Remote Sensing. 2000;33:238-45.
Foody GM. Status of land cover classification accuracy assessment. Remote Sensing of Environment. 2002;80:185-201.
Gao Z, Liu J, Zhuang D. The research of Chinese land-use/land-cover present situations. Journal of Remote Sensing-Beijing. 1999;3:134-8.
Garatuza-Payan J, Tamayo A, Watts C, Rodríguez JC. Estimating large area wheat evapotranspiration from remote sensing data. Geoscience and Remote Sensing Symposium, 2003 IGARSS'03 Proceedings 2003 IEEE International: IEEE; 2003. p. 380-2.
Han J, Kamber M. Book on Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers; 2006.
Haralick RM, Shanmugam K, Dinstein IH. Textural features for image classification. Systems, Man and Cybernetics, IEEE Transactions on. 1973:610-21.
Helmholz P, Rottensteiner F, Heipke C. Semi-automatic verification of cropland and grassland using very high resolution mono-temporal satellite images. ISPRS Journal of Photogrammetry and Remote Sensing. 2014;97:204-18.
Hoberg T, Rottensteiner F, Heipke C. Context models for CRF-based classification of multitemporal remote sensing data. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences. 2012;7:128-34.
Hu S, Wang L. Automated urban land-use classification with remote sensing. International Journal of Remote Sensing. 2013;34:790-803.
Hussain M, Chen D, Cheng A, Wei H, Stanley D. Change detection from remotely sensed images: From pixel-based to object-based approaches. ISPRS Journal of Photogrammetry and Remote Sensing. 2013;80:91-106.
Kureshy K. Geography of Pakistan, National Book Service, Lahore, Pakistan. 1999. Optimization of Rainfall. 1995.
Liu J, Liu M, Zhuang D, Zhang Z, Deng X. The spatial pattern analysis of land use change of China. Science in China D. 2002;32:1031-40.
Liu J, Liu M, Zhuang D, Zhang Z, Deng X. Study on spatial pattern of land-use change in China during 1995-2000. Science in China Series D: Earth Sciences. 2003;46:373-84.
Pakistan. Government of Pakistan Demographic Survey. In: Federal Bureau of Statistics, editor. Province Census Report of Sindh: Statistics Division, Islamabad; 2000.
Parihar JS, Oza MP. FASAL: an integrated approach for crop assessment and production forecasting. Asia-Pacific Remote Sensing Symposium: International Society for Optics and Photonics; 2006. p. 641101-13.
Rehmani E, Naweed M, Shahid M, Qadri S, Ullah M, Gilani Z. A Comparative Study of Crop Classification By Using Radiometric and Photographic Data. Sindh University Research Journal-SURJ (Science Series). 2015;47.
Rundquist BC. Fine-scale spatial and temporal variation in the relationship between spectral reflectance and a prairie vegetation canopy. 2000.
Shahid M. REA, Naweed M.S., Qadri S., Mutiullah. Varietal discrimination of wheat seeds by machine vision approach. Life Science Journal. 2014;Vol.11:245-56.
Shifa MS, Naweed MS, Omar M, Jhandir MZ, Ahmed T. Classification of cotton and sugarcane plants on the basis of their spectral behavior. Pak J Bot. 2011;43:2119-25.
Svotwa E, Chitambo T, Chiota WM, Shamudzarira M. Optimizing grass mulch application rate in flue cured tobacco float seedlings for the control of salt injury and improvement of seedling quality. Scientia. 2014;4:43-9.
Szczypiński PM, Strzelecki M, Materka A, Klepaczko A. MaZda - A software package for image texture analysis. Computer Methods and Programs in Biomedicine. 2009;94:66-76.
Taghvaeian S, Chávez J, Hansen N. Ground-Based Remote Sensing of Corn Evapotranspiration under Limited Irrigation Practices. Proceedings of the 32nd Annual American Geophysical Union Hydrology Days. 2012:119-31.
Taghvaeian S, Chávez JL, Hattendorf MJ, Crookston MA. Optical and thermal remote sensing of turfgrass quality, water stress, and water use under different soil and irrigation treatments. Remote Sensing. 2013;5:2327-47.
waikato.ac.nz. THE UNIVERSITY OF WAIKATO, COMPUTER SCIENCE DEPARTMENT. 14,2016.
Walter-Shea E, Blad B, Hays C, Mesarch M, Deering D, Middleton E. Biophysical properties affecting vegetative canopy reflectance and absorbed photosynthetically active radiation at the FIFE site. Journal of Geophysical Research: Atmospheres. 1992;97:18925-34.
Zapotoczny P. Discrimination of wheat grain varieties using image analysis and neural networks. Part I. Single kernel texture. Journal of Cereal Science. 2011;54:60-8.
