
The comparison of significant changes in Convolutional Neural Networks with exceptional structure


Erdenebileg B.
e.byambadorj@studenti.unitn.it

1 Introduction

The assertion that natural images have the property of stationarity allows Convolutional Neural Networks to extract features of any image at a lower cost than the typical expense of a sole fully-connected network. Furthermore, beyond its sufficient accuracy on handwritten-digit recognition problems, an architecture with multiple convolutional layers demonstrates the assumption that the more convolution steps there are, the more complicated the features a model is able to recognize.

The objective of this experiment consists of dissecting the architecture mentioned above under exceptional circumstances, removing one of its major layers at a time and interpreting the comparison of the resulting models as a conclusion. In doing so, the assumption specified above can also be examined.

The following subsections cover the basic principles of the essential components on which the deep architecture is based.
1.1 Convolution

The main purpose of convolution is to extract as many features as a given convolutional layer allows. As the stably weighted filters slide over an image, the feature maps are produced by the convolution. Because a CNN learns the values of its filters on its own during training, it only needs to be provided with a depth size.

For the handwritten-digit recognition problem, the deep architecture contains two convolutional layers: the first extracts 32 features, and the second extracts 64 features from the output of the previous layer.
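As an illustration of the two layers just described, the following is a minimal sketch of how they could be declared. TensorFlow/Keras, the 5×5 kernel size quoted in Section 2.1, and the "same" padding are assumptions about the implementation; the pooling that sits between the two layers in the actual architecture is covered in the next subsection.

```python
import tensorflow as tf
from tensorflow.keras import layers

# One grey-scale, MNIST-sized image (a batch of one) just to trace shapes.
x = tf.zeros((1, 28, 28, 1))

conv1 = layers.Conv2D(32, (5, 5), padding="same", activation="relu")  # learns 32 filters
conv2 = layers.Conv2D(64, (5, 5), padding="same", activation="relu")  # learns 64 filters on conv1's output

f1 = conv1(x)
f2 = conv2(f1)
print(f1.shape, f2.shape)  # (1, 28, 28, 32) (1, 28, 28, 64): one feature map per learned filter
```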
1.2 Pooling

The consequence of convolution is an enormous feature space. Using that entire space leads not only to an unmanageable situation but also to overfitting. Reducing it along its spatial dimensions makes it possible to control overfitting and to keep a smaller feature space.

Thereafter, the pooled feature maps are assumed to be prepared for the next convolutional layer.

There are two 2×2-patched max-pooling filters, one for each convolutional layer. By the end of the second pooling, the result is 7×7 patches of the 64 features.
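The spatial reduction itself is a simple operation. The following NumPy sketch (a toy example, not the original code) implements 2×2, stride-2 max pooling and shows how each application halves the height and width, which is how a 28×28 MNIST feature map becomes 14×14 after the first pooling and 7×7 after the second:

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2, stride-2 max pooling over one square feature map (H and W must be even)."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)           # a toy 4x4 feature map
print(max_pool_2x2(fmap).shape)                # (2, 2): every 2x2 patch is reduced to its maximum
print(max_pool_2x2(np.zeros((28, 28))).shape)  # (14, 14); pooling once more would give (7, 7)
```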

1.3 ReLU

The Rectified Linear Unit is the activation function that fires a neuron according to the following:

\sigma(x) = \max(0, x) \qquad (1)

By using ReLU instead of the tanh or sigmoid activations, the network can prevent the vanishing gradient problem from happening.

\frac{\partial E(W)}{\partial a_l} = \sum_{k \in c[l]} \frac{\partial E(W)}{\partial a_k} \, \frac{\partial a_k}{\partial a_l} = \sum_{k \in c[l]} \delta_k \, w_{kl} \, \frac{\partial \sigma(a_l)}{\partial a_l} \qquad (2)

As shown in Equation 2, the backpropagation processing requires a derivative of the activation with respect to the previous activation. That implies that the derivative of the activation can influence the gradient update directly.

\frac{\partial \sigma(a_l)}{\partial a_l} = \sigma(a_l)\,(1 - \sigma(a_l)) \qquad (3)

\frac{\partial \sigma(a_l)}{\partial a_l} = \begin{cases} 1, & \text{if } a_l > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (4)

For any parameter, the derivative of the sigmoid activation (Eq. 3) always yields small, unstable values and gradually leads to the vanishing gradient problem, whereas the derivative of ReLU (Eq. 4) takes one of two straightforward values, so the learning procedure is able to avoid the vanishing gradient problem. Consequently, the deep architecture manages to learn its weight parameters comparably faster.
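The contrast between Equations 3 and 4 can be checked numerically. The following minimal NumPy sketch (not part of the original experiment) compares the two derivatives on the same pre-activations:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sigmoid_grad(a):
    s = sigmoid(a)
    return s * (1.0 - s)            # Eq. (3): always in (0, 0.25]

def relu_grad(a):
    return (a > 0).astype(float)    # Eq. (4): exactly 0 or 1

a = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(sigmoid_grad(a))   # small values that shrink further when multiplied layer by layer
print(relu_grad(a))      # 0 or 1, so active units pass the gradient through unchanged
```

Because the sigmoid derivative never exceeds 0.25, repeated multiplication across layers shrinks the gradient, while the ReLU derivative passes it through unchanged wherever the unit is active.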
1.4 Dropout

Dropout is a regularization technique for neural networks that excludes randomly selected neurons during the training step, so as to avoid overfitting the training set. Concretely, because they are dropped out in the forward propagation, some neurons are not allowed to participate in the backpropagation processing. That makes the network less sensitive to the particular weights of certain neurons.

1.5 Softmax

In this network, softmax is used to determine the final decision by outputting a probability for each of the digit classes 0-9 for an arbitrary example, according to the following:

P(y_i \mid W_i, x) = \frac{\exp(W_i^{T} x)}{\sum_j \exp(W_j^{T} x)} \qquad (5)

The class with the highest probability is chosen as the predicted label, which then goes into the evaluation.

2 Adjustment

2.1 The first layer exception

Figure 1: The first ConvLayer exclusion

As shown in Figure 1, abandoning the first convolutional layer means the raw image vector is connected to the second layer directly, resulting in the 64 features after the 5×5 filters convolve it.
Then, the pooling step reduces the spatial size of the feature space to 14×14. Afterwards, the network starts to learn its parameters as a complete model.
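For concreteness, below is a minimal sketch of this first-layer-excluded variant. The framework (TensorFlow/Keras), the "same" padding, the dropout rate, and the Adam optimizer are assumptions, since the original implementation is not shown; the layer sizes and the 10^-4 learning rate follow the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# First-ConvLayer-excluded variant: the raw 28x28 image feeds the 64-feature
# 5x5 convolution directly, followed by a single 2x2 max-pooling (giving 14x14x64).
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),   # fully-connected layer kept as in the baseline
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),  # the 10^-4 learning rate from Section 3
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```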

2.2 The second layer exception

Figure 2: The second ConvLayer exclusion

Depriving the network of the second convolutional layer, the same kind of configuration as in the previous subsection, which changes the shape of the connection between the layers, is performed (see Figure 2). After the pooling step, the image vector that has been shaped by the first ConvLayer and its pooling becomes a flattened row vector in R^{14×14×32}. Therefore, fully-connected layer 1 adapts that input vector to a vector in R^{1024}.
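Mirroring the previous sketch under the same assumptions, this variant keeps only the first convolution block, so the flattened vector entering fully-connected layer 1 has 14·14·32 = 6272 components:

```python
from tensorflow.keras import layers, models

# Second-ConvLayer-excluded variant: only the first convolution block remains.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (5, 5), padding="same", activation="relu"),  # 28x28x32
    layers.MaxPooling2D((2, 2)),                                   # 14x14x32
    layers.Flatten(),                                              # the R^(14*14*32) row vector
    layers.Dense(1024, activation="relu"),                         # adapts it to R^1024
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```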

2.3 The third layer exception

The third layer is the fully-connected layer that contains the ReLU and dropout steps. By removing this layer, the softmax layer receives its connections (its dendrites) directly from the second ConvLayer. Explicitly, the softmax layer, which has 10 neurons, connects to its previous layer containing 7×7×64 neurons. The absence of the ReLU and dropout can be expected to induce slow convergence and overfitting.
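Under the same assumptions again (the text does not say whether the convolutions keep their own ReLU; this sketch keeps it), the third-layer-excluded variant connects the 7·7·64 = 3136 flattened features straight to the ten softmax neurons:

```python
from tensorflow.keras import layers, models

# Third-layer-excluded variant: both convolution blocks stay, but the 1024-unit
# fully-connected layer, its ReLU, and the dropout are removed, so the 10 softmax
# neurons read the 7*7*64 = 3136 flattened features directly.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (5, 5), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),                                   # 14x14x32
    layers.Conv2D(64, (5, 5), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),                                   # 7x7x64
    layers.Flatten(),                                              # 3136 values
    layers.Dense(10, activation="softmax"),
])
model.summary()
```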

3 Result

As these models were implemented (with a learning rate of 10^-4) at the learning stage, the accuracy of each model was observed at every 100th epoch (Figure 3).

Figure 3: The comparison of the models

For model 1 and model 2, the accuracies increased exponentially within their first 100 epochs. The comparison implies that model 3, which has neither the ReLU nor the dropout, took a while to converge. In general, model 1 has slightly better accuracies than model 2, comparing their amplitudes; that is because features of different sizes had been extracted.

             model 1    model 2    model 3
Accuracy     0.9585     0.9627     0.9561

Table 1: The test accuracies
After being tested, the test accuracies of the models (see Table 1) were observed to have similar values. Even though these models outclassed the capabilities of a sole fully-connected architecture, compared to the Deep Convolutional Architecture they are obviously incomplete.
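A comparison like Table 1 can be reproduced by evaluating each trained variant on the held-out MNIST test split. The helper below is a minimal sketch that assumes models built and compiled as in the earlier sketches; calling it on each trained variant yields one test-accuracy figure per model.

```python
import tensorflow as tf

def test_accuracy(model):
    """Evaluate a trained Keras digit classifier on the MNIST test split (cf. Table 1)."""
    (_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_test = x_test[..., None].astype("float32") / 255.0   # 10000 x 28 x 28 x 1, scaled to [0, 1]
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    return acc
```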
As a conclusion, the assumption mentioned in the first section has evidently been demonstrated. At the very least in this experiment, the test results have revealed that these models are insufficient to make predictions for this problem and need to be developed further, which tells us that using multiple convolutional layers can make up for those deficiencies and that those layers can learn more advanced features from an example.
