
Low Resolution Image Fish Classification

Using Convolutional Neural Network


Muhammad Naufal Rachmatullah
School of Electrical Engineering and Informatics
Bandung Institute of Technology
Bandung, Indonesia
23516069@std.stei.itb.ac.id

Iping Supriana
School of Electrical Engineering and Informatics
Bandung Institute of Technology
Bandung, Indonesia
iping@informatika.org

Abstract—Fish classification using low resolution images is a challenging task. Some of the prominent problems in this task are environmental changes, varied fish sizes, feature variance, segmentation failure, and poor image quality. We propose an unsupervised feature extraction (feature learning) approach to overcome these problems. A convolutional neural network is used to extract features from and classify the low-resolution fish images. Data augmentation is also used to deal with imbalanced data. Using 2 convolutional layers combined with dropout and data augmentation, our model achieves 99.7% accuracy on the testing data.

Keywords - Convolutional Neural Network, Data Augmentation, Fish Classification, Low Resolution Image.

I. INTRODUCTION

Fish recognition and classification is a challenging task in the marine and agriculture domain and a promising field for further research. Even though real-time data collection has made great progress, more improvements are needed to recognize and classify fish from underwater images [1]. In order to solve the fish recognition and classification problem, the following issues should be addressed: 1. varied fish size and orientation, 2. feature variance, 3. environmental changes, 4. poor image quality, and 5. segmentation failures [2].

One approach to solving these problems is to use a feature learning algorithm. Feature learning learns informative features directly from the data, so that the features have a good level of generalization. In addition, feature learning algorithms are able to perform feature extraction at mid-level and low-level data representations. The ability to obtain multiple levels of good feature representation in a hierarchical structure helps build sophisticated recognition systems [3].

Deep learning is one method that implements a feature learning process. This method consists of several levels of non-linear operations and is an effective way to represent high-level abstractions of data. The automatic feature learning process across several levels of features enables a deep learning system to learn complex mapping functions between input and output directly from the data [4].

Deep learning has been successfully applied to various research domains and provides satisfactory results [5]. One of the deep learning architectures suitable for image analysis and computer vision is the convolutional neural network (CNN). One of the main advantages of the CNN is that it performs feature learning directly from the input image. The feature learning result of a CNN is a feature map (fmap), which holds important and unique information from the data [6]. Another advantage of this architecture is its minimal preprocessing stage.

Some research on fish classification using the convolutional neural network has been done. In [7], a CNN is used to perform the fish classification process, reaching an accuracy of 98.64% on test data. However, the proposed architecture is complex and requires a large number of training iterations (65,000). Another study on fish classification using a CNN was done by [8]. In that study, the proposed CNN architecture consists of 3 convolution layers, 2 pooling layers, and one fully connected layer, and only 20 training iterations are used. The result shows 96.51% accuracy on test data for 4 fish species. Even though the architecture is quite simple and uses a small number of iterations, it only classifies 4 fish species. Ideally, a CNN architecture should reach a good level of generalization with a simple architecture and a small number of training iterations; such a setting reduces not only the complexity of the CNN but also the training and prediction time.

The aim of this study is to overcome the problems of fish species classification. The proposed approach applies an unsupervised method at the feature extraction stage using deep learning, in particular the convolutional neural network (CNN). The proposed CNN architecture not only produces features with a high level of discrimination but also has a simple structure and a small number of training iterations. It is expected that this setting improves the classification accuracy and reduces the processing time. This paper is structured as follows: Section 2 presents the related work of this study, Section 3 describes the image dataset, Section 4 describes the implementation of the CNN architecture, Section 5 presents the classification results and experiment analysis, and finally Section 6 provides a conclusion.

II. RELATED WORK

Fish classification systems using a computer vision approach can be divided into three types based on their input data. The first is fish classification based on dead fish. In this type of data, images are obtained by placing the fish on a clear background, which allows easier processing and classification. The fish position and its distance from the camera are also controlled, and the fish is placed sideways so that it appears prominently in the image.



Figure 1. Fish Species Example

The image is also taken from above for a better viewing angle. These image-taking approaches allow easier fish classification, because only the necessary features are shown in the image. The research conducted by [7] uses this type of data. The fish images are taken with a smartphone camera during the daytime, at a distance of about 30 cm from the object. Six species are used in that study, with each species amounting to 90 fish images. The features used are color and texture, and the classification method is an SVM, which reaches an accuracy of 97.96%.

Another task is fish classification in a constrained environment. One study using this type of data was done by [8]. It used a live fish dataset collected with the Cam-Trawl technique. The dataset contains 1,325 images from 7 fish species. Every image is preprocessed into a gray-level image of 300 x 300 pixels. That paper also proposed a technique called part-aware to extract fish features.

The last type of fish classification task is in the open ocean environment. Fish classification in the open ocean must consider not only fish movement but also changes in the background image. In addition, a system developed for open-water fish classification must be able to overcome problems related to affine transformations and distortions such as rotation, lighting changes, object blurring, and object size differences. The research in [9] proposed the efficient match kernels (EMK) and kernel descriptor (KDES) methods to classify fish taken from the ocean. The test was performed on the MAED 2014 dataset for the fish classification task. The dataset comes from 10 species of fish, totaling 43,555 training images and 4,987 test images. The test results give an accuracy of 88.95%.

III. METHODOLOGY

A. Dataset

In this paper we use the dataset from FishCLEF 2015 [10]. This dataset consists of two types of data, images and video, but we only use the image dataset to measure the classification performance. The image dataset consists of images belonging to 15 fish species, with dimensions ranging from 22 x 35 to 408 x 171 pixels. The distribution of the data is shown in Table I, and an example of each fish species is shown in Figure 1.

TABLE I. FISH DATASET DISTRIBUTION

ID  Species                     Number of Images
1   Abudefduf vaigiensis        305
2   Acanthurus nigrofuscus      2,511
3   Amphiprion clarkii          2,985
4   Chaetodon lunulatus         2,494
5   Chaetodon speculum          24
6   Chaetodon trifascialis      375
7   Chromis chrysura            3,593
8   Dascyllus aruanus           904
9   Dascyllus reticulatus       3,196
10  Hemigymnus melapterus       147
11  Myripristis kuntee          3,004
12  Neoglyphidodon nigroris     129
13  Pempheris vanicolensis      49
14  Plectrogly-Phidodon dickii  2,456
15  Zebrasoma scopas            271

B. Convolutional Neural Network Architecture

The convolutional neural network architecture consists of several convolution layers and several fully connected layers. The convolution layers serve as feature extractors for the input image, while the fully connected layers act as the classifier. A convolution layer consists of three main operations: convolution, non-linearity, and pooling. The fully connected layers, on the other hand, are pair-wise input-output connections like a conventional multilayer perceptron, consisting of an input layer, a hidden layer, and an output layer.

The main function of the convolution layer is to generate features from the input image. In the convolution process, features such as edges, corners, and intersections are extracted from the image using a variety of kernel types, and the results are combined to produce more global features. In general, the convolution process first specifies the Q input maps (x_1, x_2, ..., x_Q) and the R output maps (y_1, y_2, ..., y_R). Then w_rq is defined as the convolution kernel connecting input map q to output map r, giving R x Q kernels, each of size K x L. The convolution of each kernel w_rq with the input map x_q is given by the following equation:

y_r = Σ_{q=1}^{Q} x_q ∗ w_rq

where

(x_q ∗ w_rq)(i, j) = Σ_{k=1}^{K} Σ_{l=1}^{L} w_rq(k, l) · x_q(i + k, j + l)
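As a quick check of these definitions against the settings used elsewhere in the paper (the 50 x 50 input resolution of Section IV and the 3 x 3 kernels with stride 1 and no padding of Table II), the spatial size of each output map follows the standard relation below. The non-overlapping 2 x 2 max pooling with stride 2 is an assumption, but it is consistent with the 50% reduction described later in this section.

```latex
% Output width of a convolution/pooling stage: input width W, kernel K, padding p, stride s
W_{\text{out}} = \frac{W - K + 2p}{s} + 1
% Conv layer 1:  (50 - 3 + 0)/1 + 1 = 48   -> 32 maps of 48 x 48
% Conv layer 2:  (48 - 3 + 0)/1 + 1 = 46   -> 32 maps of 46 x 46
% 2 x 2 max pool (stride 2):  46 / 2 = 23  -> 32 maps of 23 x 23
```

These sizes match the 48 x 48 and 46 x 46 feature maps reported for the best model in Section IV, and flattening the final 23 x 23 x 32 maps into the 128-node hidden layer accounts for roughly 2.2 million weights, which is consistent with the "more than 2 million trainable parameters" figure quoted there.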
In this paper we developed a CNN architecture for low resolution images. Therefore, the architecture cannot be as deep as common CNN architectures. Using very deep stacks of convolutional layers on low-resolution images causes the gradient signal to become very small, so that the learning process becomes very slow or even stops altogether; this is known as the vanishing gradient problem.

The architecture we developed consists of 2 to 4 convolution layers and one fully connected block. The number of convolution layers is adjusted in the testing phase. Each convolution layer consists of one convolution operation, one non-linear operation, and one pooling operation. However, the pooling operation does not appear in every convolution layer: every time a pooling operation with a kernel of at least 2 x 2 is applied, the feature map is reduced by 50% compared to the previous one, and excessive use of pooling layers on low-resolution images results in the loss of informative features from the image.

The first convolution layer accepts input in the form of images that have been resized. This layer has 32 convolution kernels of size 3 x 3 x 3 (a 3 x 3 kernel with 3 color channels), and the stride (kernel shift) used in this layer is one. The second convolution layer has the same settings as the first layer. The third and fourth convolution layers consist of 64 convolution kernels, with kernel widths of 3 x 3. The filter size of each layer is kept small so that the feature maps can represent the fish object as well as possible. Furthermore, the fully connected block consists of one hidden layer and one output layer, consisting of 128 nodes and 15 nodes respectively.

The non-linear operation used in each convolution layer is the Rectified Linear Unit (ReLU), as shown in the following equation:

f(x) = max(x, 0)

where x is a feature map from the convolution result. The pooling operation used is max pooling with a 2 x 2 kernel. This pooling operation is only applied in the last convolution layer for the 2- and 3-convolution-layer settings, and in layers 3 and 4 for the architecture with 4 convolution layers. In the fully connected block, the last layer (classification layer) uses the softmax activation function. The loss function used in this architecture is the categorical cross entropy, as shown in the following equation:

L = - Σ_d Σ_{c=1}^{S} y_{d,c} log(p_{d,c})

where S is the number of classes (fish species), y_{d,c} is a binary indicator (1 if label c is the true label for data point d, 0 otherwise), and p_{d,c} is the predicted probability that data point d belongs to class c.

The fully connected block also uses a process called dropout. Dropout is used to avoid overfitting of the trained model [11]. In general, dropout deactivates input nodes and hidden nodes, with the dropped nodes selected at random. The dropout process is illustrated in Figure 2.

Figure 2. Dropout on a fully connected layer: (a) standard fully connected network, (b) after applying dropout.

C. Training Strategy

Prior to the training process, the fish species dataset is first divided into two parts, namely training data and testing data, using a 70% / 30% split. During the training process we use mini-batch training with 3 settings: batch sizes of 8, 16, and 32. Training is also done with different numbers of convolution layers: 2, 3, and 4. In addition, we train models both with and without dropout. The architecture settings are summarized in Table II.

TABLE II. ARCHITECTURE

Layer  Type              Kernel Size  Stride, Pad
Architecture with 2 conv. layers
1      Conv + ReLU       3 x 3 x 32   1, 0
2      Conv + ReLU       3 x 3 x 32   1, 0
       Max pool          2 x 2        1, 0
Architecture with 3 conv. layers
1      Conv + ReLU       3 x 3 x 32   1, 0
2      Conv + ReLU       3 x 3 x 32   1, 0
3      Conv + ReLU       3 x 3 x 64   1, 0
       Max pool          2 x 2        1, 0
Architecture with 4 conv. layers
1      Conv + ReLU       3 x 3 x 32   1, 0
2      Conv + ReLU       3 x 3 x 32   1, 0
3      Conv + ReLU       3 x 3 x 64   1, 0
       Max pool          2 x 2        1, 0
4      Conv + ReLU       3 x 3 x 64   1, 0
       Max pool          2 x 2        1, 0
Fully connected layers (applied in all architectures)
5      Fully connected   128 nodes
6      Fully connected   15 nodes (classes)

D. Data Augmentation

The data used in this study are highly imbalanced: the rarest species has 24 images while the largest species has more than 3,500 images (more than 100 times as many). In order to deal with this imbalance, we apply data augmentation. Data augmentation increases the amount of data with some basic image processing techniques in order to obtain a model with better generalization [12].

The data augmentation process is done after the training and testing data have been separated, and it is applied only to the training data. The process is performed on species with fewer than 500 images: Abudefduf vaigiensis, Chaetodon speculum, Chaetodon trifascialis, Hemigymnus melapterus, Neoglyphidodon nigroris, Pempheris vanicolensis, and Zebrasoma scopas. The operation used for data augmentation is rotation: each image is rotated by -10° or 10° at random. An example of the data augmentation result is shown in Figure 3.

Figure 3. Data Augmentation Result
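To make the setup described in this section concrete, the following is a minimal Keras sketch of the proposed pipeline: the 2-convolution-layer setting from Table II, dropout in the fully connected block, a 15-way softmax output trained with categorical cross entropy, and the ±10° rotation augmentation from Section III-D. The layer sizes, 50 x 50 input resolution, batch size, and number of epochs follow the paper; the dropout rate, optimizer, hidden-layer activation, and the `augment_rare_species` helper are assumptions for illustration.

```python
# Sketch of the 2-conv-layer CNN described in Section III (Table II), using Keras.
# Hyperparameters not stated in the paper (dropout rate, optimizer) are assumptions.
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

NUM_CLASSES = 15           # fish species in FishCLEF 2015
INPUT_SHAPE = (50, 50, 3)  # images are resized to 50 x 50 RGB (Section IV)

def build_model():
    model = models.Sequential([
        # Two convolution layers with 32 kernels of 3 x 3, stride 1, no padding
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=INPUT_SHAPE),
        layers.Conv2D(32, (3, 3), activation='relu'),
        # 2 x 2 max pooling only after the last convolution layer
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        # Fully connected block: 128-node hidden layer with dropout, 15-way softmax
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),              # dropout rate is an assumption
        layers.Dense(NUM_CLASSES, activation='softmax'),
    ])
    # Categorical cross entropy as in Section III-B; the optimizer is an assumption.
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

def augment_rare_species(images):
    """Offline augmentation of one under-represented species (Section III-D):
    rotate each training image by -10 or +10 degrees at random.
    images: float array of shape (n, 50, 50, 3)."""
    gen = ImageDataGenerator()
    angles = np.random.choice([-10, 10], size=len(images))
    return np.stack([gen.apply_transform(img, {'theta': float(a)})
                     for img, a in zip(images, angles)])

if __name__ == '__main__':
    model = build_model()
    model.summary()
    # Training then follows Section IV; x_train / y_train are assumed to hold the
    # 70% split with one-hot labels:
    # model.fit(x_train, y_train, batch_size=32, epochs=50, validation_split=0.1)
```

The 3- and 4-layer settings of Table II follow the same pattern, with the additional 64-kernel convolution layers inserted before the pooling stage.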
IV. EXPERIMENTAL RESULT

Using the dataset described in Section III, we first perform an experiment on the influence of dropout. After that we compare different settings of the CNN architecture to find the best model. Lastly, we also compare the impact of data augmentation on the testing accuracy. All test cases are trained for 50 iterations, and the input images are resized to 50 x 50 pixels. We use a Tesla P100 GPU and the Keras framework to train the CNN architectures.

A. Dropout vs No Dropout

The first stage of our test cases is to measure the performance of models trained with and without dropout. The models were constructed using the settings for 2, 3, and 4 convolution layers with a batch size of 8. The training and testing results show that models built using dropout have higher testing accuracy than models without dropout. The best testing accuracy is produced by the architecture with 4 convolution layers and dropout in the fully connected block. Table III shows the comparison between the two settings.

TABLE III. COMPARISON OF CNN WITH AND WITHOUT DROPOUT

Setting                   Training Accuracy (%)  Testing Accuracy (%)
2 Convolutional Layers
  With Dropout            99.96817               97.72761
  Without Dropout         99.29981               96.80677
3 Convolutional Layers
  With Dropout            99.72629               97.68305
  Without Dropout         99.84087               97.51968
4 Convolutional Layers
  With Dropout            99.89815               98.24744
  Without Dropout         99.52896               97.54938

B. CNN Architecture for Fish Classification

In this section we train the various CNN architectures defined in Table II. Each architecture is trained with batch sizes of 8, 16, and 32. In addition to comparing which number of layers produces the highest testing accuracy, this test also aims to determine the effect of data augmentation. The resulting model accuracies on the testing data are shown in Table IV.

TABLE IV. MODEL ACCURACY ON TESTING DATA

Setting                             Testing Accuracy (%)
CNN 2 Layers, batch 8               97.7276103
CNN 2 Layers, batch 8 + Data Aug    99.4207634
CNN 2 Layers, batch 16              98.0840636
CNN 2 Layers, batch 16 + Data Aug   99.6583989
CNN 2 Layers, batch 32              97.980098
CNN 2 Layers, batch 32 + Data Aug   99.73266
CNN 3 Layers, batch 8               97.6830536
CNN 3 Layers, batch 8 + Data Aug    99.3762067
CNN 3 Layers, batch 16              98.1731769
CNN 3 Layers, batch 16 + Data Aug   99.4059112
CNN 3 Layers, batch 32              98.0395069
CNN 3 Layers, batch 32 + Data Aug   99.6286945
CNN 4 Layers, batch 8               98.247438
CNN 4 Layers, batch 8 + Data Aug    99.0791623
CNN 4 Layers, batch 16              98.7078568
CNN 4 Layers, batch 16 + Data Aug   99.4801723
CNN 4 Layers, batch 32              98.2771424
CNN 4 Layers, batch 32 + Data Aug   99.6138423

From the test results it can be seen that the highest accuracy is produced by the model consisting of 2 convolution layers with data augmentation and a batch size of 32. The batch size also affects the testing accuracy: as seen in the table, the optimum testing accuracy is obtained with a batch size of 32.

The CNN with 2 layers has more than 2 million trainable parameters, whereas the CNNs with 3 and 4 layers have 3.1 million and 800 thousand parameters respectively. Even though the 2-layer CNN has fewer trainable parameters than the 3-layer CNN, it gives better accuracy. The reason is that the CNN with two layers generates more discriminative features than the CNN with three layers for classifying the 15 fish species.

In addition to the classification performance, the training time required by each model is also measured. The results of the processing-time measurement are shown in Figure 4. The fastest training time is obtained by the CNN with two layers and a batch size of 32.

Figure 4. Training Time (training time in seconds for 2, 3, and 4 convolution layers with batch sizes of 8, 16, and 32)

We also compare the classification results with several state-of-the-art methods. As seen in Table V, our proposed method achieves a higher accuracy than the other methods.

TABLE V. COMPARISON OF CLASSIFICATION ACCURACY

Method                                                         Result
Graph Embedding Discriminant Analysis [13]                     91.66%
Non-rigid Part Model and Hierarchical Partial Classifier [14]  98.4%
Hierarchical classification with reject option [15]            96.51%
Ours                                                           99.7%
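Table VI below reports the per-class results of this best model (2 convolution layers, batch size 32, with data augmentation) as a confusion matrix over the 6,733 testing images. The paper does not describe the evaluation code; a minimal sketch of the usual step, assuming scikit-learn is available and that `model`, `x_test`, and `y_test` (one-hot labels for the 30% testing split) come from the earlier sketch:

```python
# Build the confusion matrix of the trained model on the testing split.
import numpy as np
from sklearn.metrics import confusion_matrix

y_prob = model.predict(x_test)       # softmax probabilities, shape (n_test, 15)
y_pred = np.argmax(y_prob, axis=1)   # predicted species index per image
y_true = np.argmax(y_test, axis=1)   # true species index per image

cm = confusion_matrix(y_true, y_pred, labels=list(range(15)))
# scikit-learn places the true classes on the rows; Table VI appears to use the
# transposed orientation (rows = predicted species), so cm.T would match the table.
print(cm)
```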
TABLE VI. TESTING DATA CLASSIFICATION RESULT

ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 99 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 714 0 0 0 0 0 0 2 0 0 0 0 0 0
3 0 0 890 0 0 1 0 0 0 0 0 0 0 0 0
4 0 0 0 755 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 117 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 1103 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 271 0 0 0 0 0 0 0
9 0 1 0 0 0 0 0 0 952 0 2 0 0 0 0
10 0 0 0 0 0 0 0 0 0 46 0 1 0 0 0
11 1 0 0 0 0 0 0 0 1 1 918 0 0 1 0
12 0 0 0 0 0 0 0 0 0 0 0 48 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0
14 0 0 0 0 0 0 0 0 0 1 4 0 0 689 1
15 0 0 0 0 0 0 1 0 0 0 0 0 0 0 92

C. CNN Model Analysis

The CNN model with the best testing accuracy is obtained with the architecture consisting of two convolution layers trained with a batch size of 32. The classification error distribution of this model is shown in the confusion matrix in Table VI. From the confusion matrix, the precision, recall, and F-measure values are calculated (Table VII). Four species (Chaetodon lunulatus, Chaetodon speculum, Dascyllus aruanus, and Pempheris vanicolensis) have 100% precision, recall, and F-measure.

Meanwhile, Plectrogly-Phidodon dickii has the highest classification error. Most of the misclassified instances of this species are assigned to species ID 11 (Myripristis kuntee), with 4 instances. The reason is that these two species have nearly the same color. Moreover, Myripristis kuntee has more training data than Plectrogly-Phidodon dickii, 2,082 and 1,761 images respectively. In total, this model produces 18 misclassified instances out of the 6,733 testing images, as shown in Figure 5.

Figure 5. Error Instances

In the CNN model with 2 layers, the first layer produces an output of 32 feature maps with a 48 x 48 resolution, while the second layer outputs 32 feature maps of size 46 x 46. The output of each convolution layer is shown in Figures 6.a and 6.b. In the first layer we can see that the feature map activations are concentrated in some parts of the images, whereas the feature maps of the second layer are more sparse. These feature maps are generated automatically, without any predefined domain knowledge.
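The feature maps in Figure 6 can be read out directly from the intermediate activations of the trained network. A minimal Keras sketch, assuming `model` is the trained 2-layer network from the earlier sketch and `img` is a single preprocessed 50 x 50 x 3 test image:

```python
# Extract the feature maps of the two convolution layers for one input image.
import numpy as np
from tensorflow.keras import models

# Sub-model that returns the outputs of the first two (convolution) layers.
conv_outputs = [layer.output for layer in model.layers[:2]]
activation_model = models.Model(inputs=model.input, outputs=conv_outputs)

fmaps_layer1, fmaps_layer2 = activation_model.predict(img[np.newaxis, ...])
print(fmaps_layer1.shape)  # (1, 48, 48, 32): 32 maps of 48 x 48 from the first layer
print(fmaps_layer2.shape)  # (1, 46, 46, 32): 32 maps of 46 x 46 from the second layer
```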
TABLE VII. PRECISION, RECALL, F-MEASURE
ID Species Precision Recall F-Measure
1 Abudefduf vaigiensis 100.00 99.00 99.50
2 Acanthurus nigrofuscus 99.72 99.86 99.79
3 Amphiprion clarkii 99.89 100.00 99.94
4 Chaetodon lunulatus 100.00 100.00 100.00
5 Chaetodon speculum 100.00 100.00 100.00
6 Chaetodon trifascialis 100.00 99.15 99.57
7 Chromis chrysura 100.00 99.91 99.95
8 Dascyllus aruanus 100.00 100.00 100.00
9 Dascyllus reticulatus 99.69 99.69 99.69
10 Hemigymnus melapterus 97.87 95.83 96.84
11 Myripristis kuntee 99.57 99.35 99.46
12 Neoglyphidodon nigroris 100.00 97.96 98.97
13 Pempheris Vanicolensis 100.00 100.00 100.00
14 Plectrogly-Phidodon dickii 99.14 99.86 99.49
15 Zebrasoma scopas 98.92 98.92 98.92
Total 99.65 99.30 99.48
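The per-species values in Table VII follow directly from the confusion matrix in Table VI. Table VI does not label its axes, but the orientation that reproduces Table VII has the true species on the columns: for Abudefduf vaigiensis (ID 1), for example, 99 of its 100 test images are classified correctly and species 1 is never predicted for another species, giving a recall of 99/100 = 99%, a precision of 99/99 = 100%, and an F-measure of 99.50%, matching the first row of Table VII. A short sketch of the computation, assuming `cm` holds the matrix in the scikit-learn orientation (rows = true species):

```python
# Per-class precision, recall, and F-measure from a confusion matrix
# (rows = true class, columns = predicted class).
import numpy as np

def precision_recall_f1(cm):
    cm = cm.astype(float)
    tp = np.diag(cm)                 # correctly classified instances per class
    precision = tp / cm.sum(axis=0)  # TP / (TP + FP): column sums = predicted totals
    recall = tp / cm.sum(axis=1)     # TP / (TP + FN): row sums = true totals
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```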
Figure 6. Feature Maps: (a) from the first convolution layer, (b) from the second convolution layer

V. CONCLUSION

The use of the CNN method for fish classification on low resolution images provides 99.7% classification accuracy on the FishCLEF dataset, which consists of 15 fish species. This was achieved by using 2 convolution layers combined with data augmentation and dropout. The CNN with two layers successfully generates more discriminative features than the other settings for classifying the 15 fish species. The framework we developed does not have many parameters, so it can be extended to other tasks related to image classification. In the future, we would like to use our model to classify fish species using the video input of the FishCLEF dataset.

REFERENCES

[1] M. Chuang, J. Hwang, and K. Williams, "A Feature Learning and Object Recognition Framework for Underwater Fish Images," IEEE Trans. Image Process., vol. 25, no. 4, pp. 1862–1872, 2016.
[2] M. Alsmadi, K. B. Omar, S. A. Noah, and I. Almarashde, "A Hybrid Memetic Algorithm with Back-propagation Classifier for Fish Classification Based on Robust Features Extraction from PLGF and Shape Measurements," Inf. Technol. J., vol. 10, no. 5, pp. 944–954, May 2011.
[3] M. Martineau, D. Conte, R. Raveaux, I. Arnault, D. Munier, and G. Venturini, "A survey on image-based insect classification," Pattern Recognit., vol. 65, pp. 273–284, 2017.
[4] Y. Bengio, "Learning Deep Architectures for AI," Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1–127, 2009.
[5] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, "A survey of deep neural network architectures and their applications," Neurocomputing, vol. 234, pp. 11–26, 2017.
[6] V. Sze, Y.-H. Chen, T.-J. Yang, and J. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," pp. 1–32, 2017.
[7] J. Hu, D. Li, Q. Duan, Y. Han, G. Chen, and X. Si, "Fish species classification by color, texture and multi-class support vector machine using computer vision," Comput. Electron. Agric., vol. 88, pp. 133–140, 2012.
[8] M. C. Chuang, J. N. Hwang, F. F. Kuo, M. K. Shan, and K. Williams, "Recognizing live fish species by hierarchical partial classification based on the exponential benefit," 2014 IEEE Int. Conf. Image Process. (ICIP 2014), pp. 5232–5236, 2014.
[9] S. Palazzo and F. Murabito, "Fish Species Identification in Real-Life Underwater Images," Proc. ACM Int. Workshop Multimed. Anal. Ecol. Data, pp. 13–18, 2014.
[10] A. Joly et al., "LifeCLEF 2015: Multimedia Life Species Identification Challenges," 2015.
[11] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Adv. Neural Inf. Process. Syst. 25, pp. 1–9, 2012.
[13] S. Hasija, M. J. Buragohain, and S. Indu, "Fish Species Classification Using Graph Embedding Discriminant Analysis," 2017 Int. Conf. Mach. Vis. Inf. Technol., pp. 81–86, 2017.
[14] M. C. Chuang, J. N. Hwang, and K. Williams, "A feature learning and object recognition framework for underwater fish images," IEEE Trans. Image Process., vol. 25, no. 4, pp. 1862–1872, 2016.
[15] P. X. Huang, B. J. Boom, and R. B. Fisher, "GMM improves the reject option in hierarchical classification for fish recognition," 2014 IEEE Winter Conf. Appl. Comput. Vision (WACV 2014), pp. 371–376, 2014.
