
Classification of Pakistani Flora Using Machine Vision Approach

Muhammad Abid Saleem


PhD Scholar
Islamia University Bahawalpur
Abstract:
Pakistan is a country rich in flora, with a wide variety of plants ranging from herbs to
trees. This study presents a system for the automated classification of plants on the basis
of leaf images. We target 10 fruit plants and 2 non-fruit plants that are very common in Pakistan.
These plants are grown not only commercially but also as household plants. The leaf images
were scanned with a high-resolution Canon 1300 scanner. Four features of each leaf are extracted,
and the system is then trained with the supervised K-Nearest Neighbor method using 150 samples.
After training, the system was tested on 1440 items and achieved an accuracy of 87.73 percent.

Introduction:
Following the establishment of the new nation of Pakistan in 1947, a high priority was
given to establishment of universities and a scientific infrastructure. Early in the scientific
planning, the production of a Flora was seen as a priority in the area of botany. In 1960, R. R.
Stewart retired from active work at Gordon College, and turned his herbarium of some 50,000
specimens over to his collaborator Prof. E. Nasir [1]. The “Stewart Herbarium,” later presented
as a gift to the nation to create the nucleus of the National Herbarium of Pakistan [2], was a
critical element since most specimens collected earlier in Pakistan were kept in either European
or Indian herbaria [3]. This collection, and that established later at the University of Karachi by
S.I. Ali, provided the necessary foundation to begin writing the flora. With initial funding from
the U.S. Department of Agriculture, the Flora of Pakistan project was initiated in 1968, with
Nasir and Ali as Joint Editors, and in 1970, the first fascicle of the Flora (Flacourtiaceae)
appeared. Stewart [1] published a preliminary checklist of the plants of the region and a guide
to the developing Flora project, listing 5,783 species. Subsequent Flora treatments have not
changed that overall estimate of species numbers appreciably [4], although new treatments for
individual genera/families differ, often substantially, from those by Stewart.

By 1995 the Flora project had produced 197 treatments (one per family), ranging in size
from a few pages to nearly 500 pages (Poaceae). Nasir (replaced after his death by M. Qaiser)
and Ali or their colleagues and students wrote many of these treatments, while others have
been completed by specialists worldwide working with them. Even though Pakistani herbaria
have developed rapidly, authors have had to consult extensively with British and other foreign
herbaria since they contain large historical collections and the type specimens of most species
from Pakistan.

Following the expiration of USDA funding in 1995 and a period of reduced activity due to
lack of funding, the Flora of Pakistan was revived following negotiations between S. I. Ali and
Peter H. Raven of the Missouri Botanical Garden. In February 2000, the University of Karachi
and the Missouri Botanical Garden signed an agreement to co-publish the remaining volumes of
the Flora of Pakistan [5]. According to the agreement, the University would provide edited
manuscripts and print the volumes, and the Garden would provide or raise funds in support of
the project, and promote and distribute the Flora outside of Pakistan. The Garden supported
this proposal for several reasons:

1) It would finish a near-complete Flora of an important and insufficiently known region.

2) It connects geographically and floristically with the Flora of China project
headquartered at the Garden (many taxa in common, often requiring a coordinated
approach).

3) It provides the best opportunity to develop a database for plants of South Asia, able
to interface with databases for China and elsewhere and serve Pakistan as an important
biodiversity management tool.

4) The project serves as a focus for botanical research in Pakistan, providing training,
research, and employment opportunities for indigenous botanists.

5) It supports new botanical exploration and collecting in Pakistan, and provides a
potential source of new specimens from that region, which is poorly represented in
American herbaria.

Electronic Database of Pakistani Flora:


The development of the electronic database of all plants in Pakistan was a logical and
necessary extension of the project. No other floras in the South Asian region are available
electronically, nor are any likely to be available in the near future. A comprehensive and
accessible on-line flora of Pakistan is an essential step toward understanding the plants of
South Asia generally. Information from Pakistan, especially in an accessible database form, is
particularly useful in relation to other floristic projects such as the Flora of China, the Flora
Malesiana, and ongoing work in India and in the Central Asian region. Local published floras
exist for many parts of Pakistan (Stewart 1982), but the Flora of Pakistan supersedes them. The
Flora of Afghanistan [6] is actually a synoptical checklist of limited scope, covering only the
results of expeditions to the Karakoram and Hindu Kush by Japanese botanists in 1955. A later
report from the same expedition [7] enumerated plants from the part of the region in Pakistan.
This region where the Western Himalayas meet the Karakorams and the Hindu Kush in northern
Pakistan and the northern Baluchistan region are both rich in endemic plants, and many genera
of agricultural and horticultural importance occur in Pakistan, yet our knowledge of them and
access to information about them is limited at the present time.

Related Work In The World:


More than 270,000 plant species have been named around the world [8], so it is not
possible for people to recognize them all. Many researchers are therefore trying to build
systems that recognize plants automatically from their leaves, flowers and seeds. Some of this
research is summarized below.

1. Plant Flower Recognition Systems:

Many researchers use flower features such as color, size, shape and boundary to recognize
flowers. T. Kaneko [9] used both the leaf and the flower to recognize wild flowers.
M. H. Gandelin [10] applied a modified Fourier descriptor and shape analysis to recognize rose
flowers. Y. S. Cho and P. T. Lim [11] demonstrated virus infection clustering for flower image
identification.

2. Plant Seed Recognition Systems:


P. M. Granitto [12] carried out a study for Argentina’s commercial production industry using a
naïve Bayes classifier. A color- and shape-based seed recognition system was developed by
Arman Arefi [13]. K. Kiratiratanapruk [14] identified plant seeds using a support vector machine.
Jinwei Li [15] used an ANN-BP model to identify rapeseed. H. Hashim [16] identified rubber seeds
using an ANN trained with the Levenberg-Marquardt algorithm. K. S. Jamuna [17] categorized
cotton seeds using a naïve Bayes classifier.

3. Plant Leaf Recognition System:


We focus on the leaf in this system because leaves are easy to collect and most of their
features lie on a 2D surface: a leaf only has to be scanned or photographed to be ready for
feature extraction. Most of these features depend on shape rather than color, and leaves show
more consistent symmetry than flowers. Plant diseases are also mostly identified from the leaves.
Many techniques are used for leaf recognition; some very successful ones are described below.
3.1. Neural Network Technique:
Hong and Chi [18] applied neural network methods for vein pattern extraction to
recognize leaf images. Jiazhi and He [19] proposed neural network methods for recognizing
digital images of plant leaves. Stephen [20] presented a leaf recognition algorithm for plant
classification using a probabilistic neural network. Huang and Peng [21] studied leaf shape and
texture features combined with a probabilistic neural network to recognize 30 kinds of
broadleaved trees. Yun and Zhu [22] proposed leaf vein extraction combined with a cellular
neural network for plant recognition. Panagiotis [23] implemented a feed-forward neural
network for the classification of plant leaves. Xiao [24] used k-nearest neighbor classification
and a probabilistic neural network to recognize plant leaves.

3.2. Fuzzy Logic Technique:


Yan [25] proposed fuzzy curves and surfaces to identify and diagnose cotton
diseases using cotton leaf images.
3.3. Support Vector Machine:
Jordi [26] implemented a support vector machine to recognize plant leaves. Wu
and Chengwei [27] used the support vector machine method to measure the degree of damage
caused by leaf miners. Liqun [28] used a support vector machine to study the growth of
flue-cured tobacco leaves.

3.4. Leaf shape matching:


Wang [29] presented leaf image retrieval by using simple leaf shape features and
the centroid-contour distance. Cholhong and Nishida [30] used Acer spp. leaf shapes and
polygon approximation to recognize Acer plant species. Ji [31] proposed a leaf shape
matching method for plant species.
3.5. Moving center hypersphere classification:
Guo [32] used the moving center hypersphere technique to classify plant leaves.
Xiao [33] proposed a moving center hypersphere to recognize leaf images. Building on this
previous work, the present research extracts leaf features to increase recognition precision
and applies a simple matching algorithm to identify the leaves of common Pakistani plants.

System Development:
This is a software project developed using C# .NET and SQL Server. The system consists of
three parts:
1) Image processing.
2) Training the system to generate the basic knowledge base.
3) Testing the system.

1. Image Processing:
This is the core part of the system, responsible for identifying the image and labeling its class. It consists of:
1) Image Capture
2) Feature Extraction

1.1. Image Capture:


The image capture setup is quite simple: we use a high-resolution scanner (3500 x 2300 pixels)
to digitize the leaf images.

1.2. Image Analysis:


After capture, each image has to be prepared for analysis. For this purpose we perform two
further steps:
1) Modifying the image as required.
2) Feature extraction.
Each step is described in more detail below.

1.2.1. Modification of Image:


The identification process cannot be applied directly to the raw leaf image; the image must
first be modified. The following modifications are applied before further processing.
Resizing Image: Images of different plants have different sizes, and even the leaves of a single
plant vary in size. It is therefore necessary to bring every leaf image to a similar size and
aspect ratio.
Color Modification: The image produced by the scanner is in color. To apply feature extraction we
modify the image twice: first it is converted to a grayscale image, and then to a binary (pure black
and white) image.
Filling the Holes: The black-and-white image may contain holes and noise, or the leaf may be
broken. The system automatically fills the holes and removes the noise, leaving a pure black-and-
white image in which the background is entirely black and the leaf is entirely white.
Boundary Tracking: The system tracks the boundary in the black-and-white image. Boundary tracking
separates the background from the leaf, which makes it very helpful for extracting features.
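The four modification steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C# implementation: images are plain nested lists, resizing uses nearest-neighbour sampling, the grey conversion uses the ITU-R BT.601 luminance weights, and the threshold of 128 assumes the leaf is darker than the scanner background. All function names are hypothetical. Hole filling flood-fills the background from the image border and turns any unreached black pixel white; noise removal is not shown. The boundary step only collects leaf pixels that touch the background; it does not follow the contour in order.

```python
from collections import deque

# 4-connected neighbour offsets used by the flood fill and the boundary test
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list so every leaf image has the same size."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples to 0-255 grey levels (ITU-R BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

def binarize(gray, threshold=128):
    """Leaf -> 1 (white), background -> 0 (black).
    Assumes the leaf is darker than the scanner background."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def fill_holes(binary):
    """Turn white every black pixel that cannot be reached from the image border."""
    h, w = len(binary), len(binary[0])
    outside = [[False] * w for _ in range(h)]
    seeds = [(r, c) for r in range(h) for c in range(w)
             if (r in (0, h - 1) or c in (0, w - 1)) and binary[r][c] == 0]
    for r, c in seeds:
        outside[r][c] = True
    queue = deque(seeds)
    while queue:  # breadth-first flood fill over connected background pixels
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not outside[nr][nc] and binary[nr][nc] == 0:
                outside[nr][nc] = True
                queue.append((nr, nc))
    return [[0 if outside[r][c] else 1 for c in range(w)] for r in range(h)]

def boundary_pixels(binary):
    """Leaf pixels with at least one background (or out-of-image) 4-neighbour."""
    h, w = len(binary), len(binary[0])

    def is_background(r, c):
        return not (0 <= r < h and 0 <= c < w) or binary[r][c] == 0

    return [(r, c) for r in range(h) for c in range(w)
            if binary[r][c] == 1
            and any(is_background(r + dr, c + dc) for dr, dc in NEIGHBOURS)]
```

For example, `fill_holes` turns a 3 x 3 ring of white pixels with a black centre into a solid 3 x 3 block, whose boundary then consists of the 8 outer pixels.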

1.2.2. Feature Extraction:


Features are the characteristics of a leaf that can be used in the leaf recognition process.
Aspect Ratio: the ratio of the height to the width of the image. The system calculates the aspect
ratio by equation (1):
$$\mathrm{AspectRatio} = \frac{h_p}{w_p} \quad (1)$$
where $h_p$ is the height of the image in pixels and $w_p$ is the width of the image in pixels.
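Equation (1) can be written as a short Python function. Because every image is resized to a similar size and aspect ratio, this sketch assumes that $h_p$ and $w_p$ are measured on the bounding box of the white leaf pixels rather than on the whole image; that interpretation and the function name are assumptions.

```python
def aspect_ratio(binary):
    """AspectRatio = hp / wp (equation 1), measured on the bounding box of the
    white leaf pixels; using the bounding box is an assumption of this sketch."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        raise ValueError("image contains no leaf pixels")
    hp = rows[-1] - rows[0] + 1  # leaf height in pixels
    wp = cols[-1] - cols[0] + 1  # leaf width in pixels
    return hp / wp
```

A leaf that is 4 pixels tall and 2 pixels wide therefore gets an aspect ratio of 2.0.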

Roundness: an approximate measure of how closely the leaf resembles a round object. The system
calculates roundness by equation (2):
$$RN = \frac{4\pi A}{B^2} \quad (2)$$
where $RN$ is the roundness, $A$ is the area of the leaf, found by counting the number of white
pixels in the leaf only, and $B$ is the approximate length of the leaf boundary. A roundness
value of $RN = 1$ indicates a perfectly round object.
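A sketch of equation (2) in Python. The area $A$ is the white-pixel count, as described above. The text does not say how the boundary length $B$ is approximated from the tracked boundary, so this hypothetical helper takes it as an argument; the choice of approximation changes the scale of the result. The checks use ideal shapes: a circle gives exactly 1 and a square gives $\pi/4$.

```python
import math

def leaf_area(binary):
    """A: number of white (leaf) pixels in a binary image."""
    return sum(sum(row) for row in binary)

def roundness(area, boundary_length):
    """RN = 4 * pi * A / B**2 (equation 2); equals 1 for a perfect circle."""
    if boundary_length <= 0:
        raise ValueError("boundary length must be positive")
    return 4 * math.pi * area / boundary_length ** 2
```

A unit circle (area $\pi$, perimeter $2\pi$) yields $RN = 1$, while a square of side 2 (area 4, perimeter 8) yields $\pi/4 \approx 0.785$.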

Upper Leaf Area Ratio: the upper leaf area divided by the upper image area. The upper leaf area is
calculated by counting the white pixels in the upper part of the leaf image.

Lower Leaf Area Ratio: the lower leaf area divided by the lower image area. The lower leaf area is
calculated by counting the white pixels in the lower part of the leaf image.
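A sketch of the two area ratios in Python. The text does not say where the image is divided into upper and lower parts, so splitting at the middle row (`h // 2`, with the middle row of an odd-height image going to the lower part) is an assumption, as is the function name.

```python
def area_ratios(binary):
    """Return (upper, lower) leaf area ratios.

    Each ratio is the white-pixel count of that part divided by the pixel count
    of the same part of the image. Splitting at row h // 2 is an assumption."""
    h, w = len(binary), len(binary[0])
    if h < 2:
        raise ValueError("image needs at least two rows")
    upper, lower = binary[: h // 2], binary[h // 2 :]
    upper_ratio = sum(map(sum, upper)) / (len(upper) * w)
    lower_ratio = sum(map(sum, lower)) / (len(lower) * w)
    return upper_ratio, lower_ratio
```

For a leaf that fills half of the upper part and all of the lower part, the function returns `(0.5, 1.0)`.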
[Figure: System block diagram. Input images → Image Modification → Feature Extraction → Classification → Classified images.]

[Figure: System flow chart. Start → Image capture by high-resolution scanner → Image Analysis → Classified data / Classified image database → Classified image → End.]

[Figure: Image Analysis flow chart. Start → Image Modification (Resize Image, Color Modification, Filling Holes, Boundary Tracking) → Feature Extraction (Aspect Ratio, Roundness, Upper Leaf Area Ratio, Lower Leaf Area Ratio) → End.]

2. Training the System for the Basic Knowledge Base:

An automated system that recognizes leaf images has to be trained first. There are many
training methods; K-NN (K-Nearest Neighbor) is very successful and easy to implement.
In K-NN, each sample is described by the values of its features, and a new leaf is assigned to
the class of the K stored samples that lie nearest to it in this feature space. We decided to
create a basic knowledge base for this system containing 150 samples of 12 species.
The distance used by K-NN is computed as:
$$D_i = \sqrt{(a - a_i)^2 + (b - b_i)^2 + (c - c_i)^2 + \dots} \quad (3)$$
where $a, b, c, \dots$ are the features of the leaf being classified and $a_i, b_i, c_i, \dots$ are the
features of the $i$-th member of a class. The features are extracted from every training image,
and these values are saved in the database.
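A minimal K-NN sketch built on the distance of equation (3). It is an illustration, not the authors' implementation: the value of K is not stated in the text, so `k` is a parameter, and the `max_distance` rejection rule, which returns `None` for a leaf that matches no class closely enough, is a hypothetical addition. Feature vectors are tuples such as (aspect ratio, roundness, upper leaf area ratio, lower leaf area ratio); the training values in the usage example are made-up toy numbers, not measurements from this study.

```python
import math
from collections import Counter

def euclidean(x, y):
    """Distance of equation (3): square root of the summed squared feature differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(query, training, k=3, max_distance=None):
    """training: list of (feature_vector, species) pairs, e.g. rows loaded from the database.

    Returns the majority species among the k nearest samples, or None when even the
    nearest sample is farther than max_distance (hypothetical reject rule)."""
    if not training:
        raise ValueError("training set is empty")
    ranked = sorted(training, key=lambda item: euclidean(query, item[0]))
    if max_distance is not None and euclidean(query, ranked[0][0]) > max_distance:
        return None
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps first-seen order on ties, so a tie goes to the closer sample
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    # Made-up toy feature vectors, not data from the paper
    train = [((2.0, 0.50, 0.40, 0.45), "species A"), ((2.1, 0.52, 0.41, 0.44), "species A"),
             ((1.0, 0.85, 0.60, 0.62), "species B"), ((1.1, 0.80, 0.58, 0.61), "species B")]
    print(knn_classify((2.05, 0.51, 0.40, 0.45), train, k=3))  # species A
```

Sorting the whole training set is adequate for a knowledge base of a few hundred samples; a larger database would call for an index structure instead.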
3. Results & Discussion
After training was complete, we tested the system; the system continued to learn during testing.
For each test leaf we extract its features, query the stored features of each class from the
database, put them into equation (3) and compare the resulting values to classify the leaf.
Initially the system was trained on 20 leaves of each species.
The system was trained and tested three times, and its overall accuracy increased each time. Two
operations are performed on the system, training and testing, and a test has three possible outputs.
Identified means that the system correctly identified the leaf. Incorrect means that the system
assigned the leaf to some other species instead of the correct one. Not Identified means that the
system was unable to match the leaf to any existing class. Incorrect and Not Identified are both
wrong answers and reduce the accuracy of the system. The three trainings are described here.
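The accuracy figures can be reproduced with a few lines of Python. This sketch assumes accuracy is the number of Identified leaves divided by the number tested, so Incorrect and Not Identified both lower it; applied to the Tested and Identified counts of the first training run, it gives the reported overall accuracy of 50.91%. The function names are hypothetical.

```python
def accuracy_percent(identified, tested):
    """Share of tested leaves that were correctly identified; Incorrect and
    Not Identified outputs both count against accuracy."""
    return 100.0 * identified / tested

def overall_accuracy(results):
    """results: {species: (tested, identified)}; pools all test leaves."""
    tested = sum(t for t, _ in results.values())
    identified = sum(i for _, i in results.values())
    return accuracy_percent(identified, tested)

# (Tested, Identified) per species after the 1st training, from the results table
first_run = {
    "Mangifera indica": (20, 12), "Psidium guajava": (20, 12),
    "Dalbergia sissoo": (20, 4), "Punica granatum": (20, 8),
    "Cardamom": (20, 8), "Lawsonia inermis": (20, 16),
    "Ziziphus jujuba": (20, 16), "Eucalyptus citriodora": (20, 8),
    "Helichrysum melitense": (20, 4), "Morus nigra": (20, 16),
    "Eugenia jambolana": (20, 8),
}

if __name__ == "__main__":
    print(f"{overall_accuracy(first_run):.2f}")  # 50.91
```

Because every species has the same number of test leaves (20), the pooled accuracy equals the mean of the per-species percentages.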

After 1st Training:

Sr No  Name                    Trained  Tested  Identified  Incorrect  Not Identified  Accuracy (%)
1      Mangifera indica        20       20      12          8          0               60
2      Psidium guajava         20       20      12          8          0               60
3      Dalbergia sissoo        20       20      4           16         0               20
4      Punica granatum         20       20      8           12         0               40
5      Cardamom                20       20      8           12         0               40
6      Lawsonia inermis        20       20      16          4          0               80
7      Ziziphus jujuba         20       20      16          4          0               80
8      Eucalyptus citriodora   20       20      8           8          4               40
9      Helichrysum melitense   20       20      4           16         0               20
10     Morus nigra             20       20      16          4          0               80
11     Eugenia jambolana       20       20      8           12         0               40
The best results were obtained for Lawsonia inermis, Ziziphus jujuba and Morus nigra, each with 80%
accuracy, and the lowest for Dalbergia sissoo and Helichrysum melitense, each with 20% accuracy.
[Figure: Results of the 1st training per species (Trained, Tested, Identified, Incorrect, Not Identified, % of Accuracy).]

Overall accuracy of the system: 50.91%

After 2nd Training:

Sr No  Name                    Trained  Tested  Identified  Incorrect  Not Identified  Accuracy (%)
1      Mangifera indica        20       20      16          2          0               80
2      Psidium guajava         20       20      16          2          0               80
3      Dalbergia sissoo        20       20      16          2          0               80
4      Punica granatum         20       20      20          0          0               100
5      Cardamom                20       20      18          1          1               90
6      Lawsonia inermis        20       20      8           6          0               40
7      Ziziphus jujuba         20       20      8           6          0               40
8      Eucalyptus citriodora   20       20      20          0          0               100
9      Helichrysum melitense   20       20      16          4          0               80
10     Morus nigra             20       20      16          4          0               80
11     Eugenia jambolana       20       20      12          4          0               60
[Figure: Results of the 2nd training per species (Trained, Tested, Identified, Incorrect, Not Identified, % of Accuracy).]

Overall accuracy of the system: 67.27%

After 3rd Training:

Sr No  Name                    Trained  Tested  Identified  Incorrect  Not Identified  Accuracy (%)
1      Mangifera indica        20       20      20          0          0               100
2      Psidium guajava         20       20      20          0          0               100
3      Dalbergia sissoo        20       20      20          0          0               100
4      Punica granatum         20       20      20          0          0               100
5      Cardamom                20       20      16          4          0               80
6      Lawsonia inermis        20       20      19          0          1               95
7      Ziziphus jujuba         20       20      16          4          0               80
8      Eucalyptus citriodora   20       20      20          0          0               100
9      Helichrysum melitense   20       20      15          3          2               75
10     Morus nigra             20       20      15          5          0               75
11     Eugenia jambolana       20       20      12          3          0               60

[Figure: Results of the 3rd training per species (Trained, Tested, Identified, Incorrect, Not Identified, % of Accuracy).]

Overall accuracy of the system: 87.73%

[Figure: Evolution of training. Accuracy (%) per species across the three trainings.]

Conclusion:
The system meets the primary objectives of the research. A total of 12 plant species were
considered, and the system was trained and tested 3 times. At each stage every plant had 20 images in
the training set and 20 in the test set, so 1440 images were used in total. The graphs given above
clearly show that the performance of the system improved with each training. In the first test the
system was trained on 20 samples of each species and tested on 20 samples, and its overall accuracy was
50.91%. The best-performing species were Lawsonia inermis, Ziziphus jujuba and Morus nigra, each at
80%, while Helichrysum melitense reached only 20%. In the 2nd run, Punica granatum and Eucalyptus
citriodora reached the maximum of 100%, while Lawsonia inermis and Ziziphus jujuba fell to the lowest
position at 40%. In the third and final test, 5 species reached 100%, and Eugenia jambolana ranked
lowest at 60%.

Future Work:
Reaching 87.73% accuracy after the last training is encouraging, but several issues remain. The images
were scanned with a very high-performance scanner in a controlled environment, and such scanners are not
available in real-life settings, so the system must be developed further to work with poor-quality
pictures. We extract and match only 4 features, while the shape of a leaf offers many others, so more
shape features need to be extracted. We also did not use color features; adding them in future work
could increase the reliability of the system.