Abstract - Fish classification using low-resolution images is a challenging task. Prominent problems in this task include environmental changes, varied fish size, feature variance, segmentation failure, and poor image quality. We propose unsupervised feature extraction (feature learning) to overcome these problems. A convolutional neural network is used to extract features from and classify the low-resolution fish images. Data augmentation is also used to deal with imbalanced data. Using 2 convolutional layers combined with dropout and data augmentation, our model achieves 99.7% accuracy on testing data.

Keywords - Convolutional Neural Network, Data Augmentation, Fish Classification, Low Resolution Image.

I. INTRODUCTION

Fish recognition and classification is a challenging task in the marine and agriculture domain and is a promising field for further research. Even though real-time data collection has made great progress, more improvements are needed to recognize and classify fish from underwater images [1]. To solve the fish recognition and classification problem, the following problems should be addressed: 1. varied fish size and orientation, 2. feature variance, 3. environmental changes, 4. poor image quality, 5. segmentation failures [2].

One approach to solving these problems is to use a feature learning algorithm. A feature learning approach learns informative features directly from the data so that the features generalize well. In addition, a feature learning algorithm is able to perform feature extraction at mid-level and low-level data representations. The ability to achieve multiple levels of good feature representation in hierarchical structures helps build sophisticated recognition systems [3].

Deep learning is one method that implements a feature learning process. This method consists of several levels of non-linear operations and is an effective way to represent high-level abstractions of data. Automatic feature learning across several levels of features enables a deep learning system to learn complex functions that map input to output directly from the data [4].

Deep learning has been successfully applied to various research domains and provides satisfactory results [5]. One of the deep learning architectures suitable for image analysis and computer vision is the convolutional neural network (CNN). One of the main advantages of a CNN is its ability to perform feature learning directly from the input image. The feature learning result from a CNN is a feature map (fmap), which holds important and unique information from the data [6]. In addition, another advantage of this architecture is the minimal preprocessing stage.

Some research on fish classification using convolutional neural networks has been done. In [7], a CNN is used to perform fish classification. The accuracy is 98.64% on test data. However, the proposed architecture is complex and requires a large number of training iterations (65,000). Another study on fish classification using a CNN is reported in [8]. In that study, the proposed CNN architecture consists of 3 convolution layers, 2 pooling layers, and one fully connected layer, and is trained for 20 iterations. The result shows 96.51% accuracy on test data for 4 fish species. Even though the architecture is quite simple and uses a small number of iterations, it only classifies 4 fish species. We expect our proposed CNN architecture to generalize well with a simple architecture and a small number of training iterations. With these settings, it can reduce not only the complexity of the CNN but also the training and prediction time.

The aim of this study is to overcome the problems of fish species classification. The proposed approach is to apply unsupervised learning at the feature extraction stage using deep learning, particularly the convolutional neural network (CNN). The proposed CNN architecture not only produces highly discriminative features but also has a simple architecture with a small number of iterations. With this setting, we expect to improve the classification accuracy and reduce the processing time. This paper is structured as follows: Section 2 presents the related work of this study. Section 3 describes the image dataset. Section 4 describes the implementation of the CNN architecture. Section 5 presents the classification results and experiment analysis, and finally, Section 6 provides a conclusion.

II. RELATED WORK

Fish classification systems using a computer vision approach are divided into three types based on their input data. The first is fish classification based on dead fish. In this type of data, images are obtained by placing the fish on a clear background, which allows easier processing and classification. The fish position and its distance from the camera are also considered. The fish are placed sideways so that they appear prominently in the image.
$$S(i, j) = \sum_{m} \sum_{n} K(m, n)\, I(i + m, j + n)$$
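The convolution operation applied in each convolution layer can be sketched in plain Python. This is only an illustrative sketch, not the Keras implementation used in the experiments: it assumes a single-channel input, stride 1 and no padding, and computes the sliding-window sum in the cross-correlation form that deep learning frameworks typically use. The names `conv2d_valid`, `image` and `kernel` and the toy values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    fmap = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Sum of element-wise products between the kernel and the image patch at (i, j).
            for m in range(kh):
                for n in range(kw):
                    total += kernel[m][n] * image[i + m][j + n]
            row.append(total)
        fmap.append(row)
    return fmap


# Toy 4 x 4 input and a 3 x 3 kernel of ones (illustrative values only).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

A 3 x 3 kernel on a 4 x 4 input yields a 2 x 2 feature map, which shows how every unpadded convolution shrinks the spatial size of a small image.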
In this paper we developed a CNN architecture for low-resolution
images. Therefore, the developed architecture cannot be as deep
as common CNN architectures. Using very deep convolutional
layers on low-resolution images causes the update signal
(gradient signal) to become very small, so the learning process
becomes very slow or even stops altogether. This is known as
the vanishing gradient problem.
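As a rough illustration of how little spatial resolution is available to a deeper network, the sketch below traces the feature map width through a stack of layers. It assumes the 50 x 50 input size reported in the experimental section, 3 x 3 kernels with stride 1 and no padding, and 2 x 2 pooling that halves each dimension (a stride of 2 is an assumption here). The helper names `conv_out`, `pool_out` and `trace` are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, pad=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window=2):
    """Output size of non-overlapping pooling (assumes stride == window)."""
    return size // window


def trace(input_size, layers):
    """Return the spatial size after each operation in `layers` ('conv' or 'pool')."""
    sizes = [input_size]
    for op in layers:
        size = sizes[-1]
        sizes.append(conv_out(size) if op == "conv" else pool_out(size))
    return sizes


# Two convolution layers with pooling after the last one.
two_layer = trace(50, ["conv", "conv", "pool"])
# Four convolution layers with pooling after layers 3 and 4.
four_layer = trace(50, ["conv", "conv", "conv", "pool", "conv", "pool"])
print(two_layer)
print(four_layer)
```

Under these assumptions the four-layer variant is already down to a 10 x 10 feature map, so each additional convolution or pooling step leaves very little spatial detail to work with.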
The architecture we developed consists of 2 to 4 convolution layers and one fully connected layer. The number of convolution layers is adjusted in the testing phase. Each convolution layer consists of one convolution operation, one nonlinear operation, and one pooling operation. However, the pooling operation is not present in every convolution layer. This is because every time a pooling operation with a kernel of at least 2 x 2 is applied, the feature map is reduced by 50% relative to the previous one. Excessive use of pooling layers on low-resolution images results in the loss of informative features of the image.

The first convolution layer accepts the resized images as input. This layer has 32 convolution kernels of size 3 x 3 x 3 (3 x 3 kernel width with 3 color channels). The stride (kernel shift) used in this layer is 1. The second convolution layer has the same settings as the first layer.

The third and fourth convolution layers consist of 64 convolution kernels with kernel widths of 3 x 3. The filter size of each layer is kept small so that the feature map can represent the fish object as well as possible. Furthermore, the fully connected part consists of one hidden layer and one output layer, with 128 nodes and 15 nodes respectively.

The nonlinear operation used in each convolution layer is the Rectified Linear Unit (ReLU), as shown in the following equation:

$$f(x) = \max(x, 0)$$

where x is a feature map from the convolution result. Furthermore, the pooling operation used is max pooling with a kernel width of 2 x 2. This pooling operation is only used in the last convolution layer for architectures with 2 and 3 convolution layers, and in layers 3 and 4 for architectures with 4 convolution layers. In the fully connected part, the last layer (classification layer) uses the softmax activation function. The loss function used in this architecture is categorical cross entropy, as shown in the following equation:

$$L = -\sum_{c=1}^{S} y_{d,c} \log(p_{d,c})$$

where S is the number of classes (fish species), y is a binary indicator (1 if label c is the true label for data d, 0 otherwise), and p is the predicted probability that data d is of class c.

Dropout is applied to the fully connected layer. Dropout is used to avoid overfitting of the trained model [11]. In general, dropout deactivates input nodes and hidden nodes. The dropped nodes are selected randomly. The dropout process is illustrated in Figure 2.

(a) Standard fully connected (b) After applying dropout
Figure 2. Dropout on fully connected layer.

C. Training strategy

Prior to the training process, the fish species dataset is divided into training data and testing data, with proportions of 70% and 30% respectively. During the training process, we used batch training with 3 batch size settings: 8, 16, and 32. The training was also done using different numbers of convolution layers: 2, 3, and 4. In addition, we trained models with and without dropout. The architecture settings are shown in Table 2.

TABLE II. ARCHITECTURE

Layer  Type             Kernel Size   Stride, Pad
Architecture for 2 Conv layers
1      Conv             3 x 3 x 32    1, 0
       ReLU
2      Conv             3 x 3 x 32    1, 0
       ReLU
       Pool             2 x 2         1, 0
Architecture for 3 Conv layers
1      Conv             3 x 3 x 32    1, 0
       ReLU
2      Conv             3 x 3 x 32    1, 0
       ReLU
3      Conv             3 x 3 x 64    1, 0
       ReLU
       Pool             2 x 2         1, 0
Architecture for 4 Conv layers
1      Conv             3 x 3 x 32    1, 0
       ReLU
2      Conv             3 x 3 x 32    1, 0
       ReLU
3      Conv             3 x 3 x 64    1, 0
       ReLU
       Pool             2 x 2         1, 0
4      Conv             3 x 3 x 64    1, 0
       ReLU
       Pool             2 x 2         1, 0
Fully connected layers (apply to all conv architectures)
5      Fully connected  128 nodes
6      Fully connected  15 nodes (classes)

D. Data Augmentation

The data used in this study are highly imbalanced: the smallest species has 24 images and the largest has more than 3500 images (more than 100 times as many). To deal with the imbalanced data, we use data augmentation. Data augmentation increases the amount of data using basic image processing techniques to obtain a model with better generalization [12].

Data augmentation is performed after the training and testing data are separated and is applied only to the training data. It is applied to species with fewer than 500 images: Abudefduf vaigiensis, Chaetodon speculum, Chaetodon trifascialis, Hemigymnus melapterus, Neoglyphidodon nigroris, Pempheris vanicolensis, and Zebrasoma scopas. The operation used to perform data augmentation is rotation. Each
image is rotated by -10° and 10° at random. An example of the data augmentation result is shown in Figure 3.

Figure 3. Data Augmentation Result

IV. EXPERIMENTAL RESULT

Using the dataset described in Section III, we first perform an experiment on the influence of dropout. After that, we compare different settings of the CNN architecture to find the best model. Lastly, we compare the impact of data augmentation on testing accuracy. All test cases are trained for 50 iterations. The input images are 50 x 50 pixels. We use a Tesla P100 GPU and the Keras framework to train the CNN architecture.

A. Dropout vs No Dropout

The first stage in our test cases is to measure the performance of models with and without dropout. Models were constructed using settings for 2, 3, and 4 convolution layers with a training batch size of 8. Training and testing results show that models built using dropout have higher testing accuracy than models without dropout. The best test accuracy is produced by the architecture with 4 convolution layers and dropout in the fully connected layers. Table III shows the comparison between the two settings.

TABLE III. COMPARISON CNN WITH DROPOUT AND WITHOUT DROPOUT

TABLE IV. MODEL ACCURACY ON TESTING DATA

Setting                              Testing Accuracy (%)
CNN 2 Layers 8 batch                 97.7276103
CNN 2 Layers 8 batch + Data Aug      99.4207634
CNN 2 Layers 16 batch                98.0840636
CNN 2 Layers 16 batch + Data Aug     99.6583989
CNN 2 Layers 32 batch                97.980098
CNN 2 Layers 32 batch + Data Aug     99.73266
CNN 3 Layers 8 batch                 97.6830536
CNN 3 Layers 8 batch + Data Aug      99.3762067
CNN 3 Layers 16 batch                98.1731769
CNN 3 Layers 16 batch + Data Aug     99.4059112
CNN 3 Layers 32 batch                98.0395069
CNN 3 Layers 32 batch + Data Aug     99.6286945
CNN 4 Layers 8 batch                 98.247438
CNN 4 Layers 8 batch + Data Aug      99.0791623
CNN 4 Layers 16 batch                98.7078568
CNN 4 Layers 16 batch + Data Aug     99.4801723
CNN 4 Layers 32 batch                98.2771424
CNN 4 Layers 32 batch + Data Aug     99.6138423

The CNN with 2 layers has more than 2 million trainable parameters, whereas the CNNs with 3 layers and 4 layers have 3.1 million and 800K respectively. Even though the 2-layer CNN has fewer trainable parameters than the 3-layer CNN, it gives better accuracy. The reason is that the 2-layer CNN generates more discriminative features than the 3-layer CNN for classifying the 15 fish species.

In addition to classification performance, the training time required for each model was measured. The results are shown in Figure 4. The fastest training time is obtained by the CNN with two layers and a training batch size of 32.

[Figure: Processing Time; axis: Time (second)]
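The effect of data augmentation in Table IV can be summarized with a few lines of arithmetic. The sketch below copies the reported testing accuracies and computes, for each configuration, the difference between training with and without augmentation. The variable names (`table_iv`, `gains`, `best_cfg`) are hypothetical and the code only restates the published numbers.

```python
# Testing accuracy (%) from Table IV:
# (conv layers, batch size) -> (without augmentation, with augmentation).
table_iv = {
    (2, 8): (97.7276103, 99.4207634),
    (2, 16): (98.0840636, 99.6583989),
    (2, 32): (97.980098, 99.73266),
    (3, 8): (97.6830536, 99.3762067),
    (3, 16): (98.1731769, 99.4059112),
    (3, 32): (98.0395069, 99.6286945),
    (4, 8): (98.247438, 99.0791623),
    (4, 16): (98.7078568, 99.4801723),
    (4, 32): (98.2771424, 99.6138423),
}

# Gain in percentage points from adding data augmentation.
gains = {cfg: aug - base for cfg, (base, aug) in table_iv.items()}

# Configuration with the highest accuracy when augmentation is used.
best_cfg = max(table_iv, key=lambda cfg: table_iv[cfg][1])
mean_gain = sum(gains.values()) / len(gains)

for cfg in sorted(gains):
    print(f"{cfg[0]} layers, batch {cfg[1]}: +{gains[cfg]:.4f}")
print("best with augmentation:", best_cfg)
print(f"mean gain: {mean_gain:.4f}")
```

Every configuration improves with augmentation, by about 1.4 percentage points on average, and the 2-layer model with a batch size of 32 gives the highest accuracy.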
ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 99 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 714 0 0 0 0 0 0 2 0 0 0 0 0 0
3 0 0 890 0 0 1 0 0 0 0 0 0 0 0 0
4 0 0 0 755 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 8 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 117 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 1103 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 271 0 0 0 0 0 0 0
9 0 1 0 0 0 0 0 0 952 0 2 0 0 0 0
10 0 0 0 0 0 0 0 0 0 46 0 1 0 0 0
11 1 0 0 0 0 0 0 0 1 1 918 0 0 1 0
12 0 0 0 0 0 0 0 0 0 0 0 48 0 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0
14 0 0 0 0 0 0 0 0 0 1 4 0 0 689 1
15 0 0 0 0 0 0 1 0 0 0 0 0 0 0 92
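Overall accuracy can be recovered from the 15 x 15 confusion matrix above by dividing the diagonal sum by the total number of test images. The sketch below does this in plain Python. The per-row figures additionally assume that rows correspond to true classes, which the extracted table does not state; the names `matrix`, `accuracy` and `per_row` are hypothetical.

```python
# Confusion matrix copied from the table above (class IDs 1 to 15).
matrix = [
    [99, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 714, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 890, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 755, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 117, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1103, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 271, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 952, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 46, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 918, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 13, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 689, 1],
    [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 92],
]

correct = sum(matrix[i][i] for i in range(len(matrix)))
total = sum(sum(row) for row in matrix)
accuracy = 100.0 * correct / total

# Fraction of each row on the diagonal (per-class recall if rows are true classes).
per_row = [row[i] / sum(row) for i, row in enumerate(matrix)]

print(f"correct: {correct} of {total}")
print(f"accuracy: {accuracy:.4f}%")
print(f"lowest per-row rate: class {per_row.index(min(per_row)) + 1}")
```

The overall accuracy does not depend on whether rows are true or predicted classes. It comes out at about 99.73%, which agrees with the best entry in Table IV (2 layers, batch size 32, with data augmentation).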