by
MD.SAMIN YEASER FAHIM
October, 2018
ACKNOWLEDGEMENT
All praise to the Almighty Allah for blessing me with the means, courage and
supervisor and revered mentor, for his constant direction and inspiration in pursuing
the whole investigation of this research project and for correcting the mistakes
whenever necessary. I thank him from the bottom of my heart for his sincere
cooperation, precious advice and consistent encouragement and motivation. Had he
not been there, the research could not have been completed.
for their help, co-operation and valuable advice during this research project that has
wisdom and for generously examining my research works despite his busy schedule.
Finally, I would like to thank my family and friends for their unflagging love and
ABSTRACT
the component exceed its strength. Most building components are designed as
cracked sections, except for some special structures such as water-retaining structures,
ETPs, etc., so cracking is inevitable in RCC structures. However, cracks may
pose a threat and reduce the durability and sustainability of a structure if they occur due
detecting civil infrastructure defects to partially replace human-conducted on-site
inspections. These IPTs are primarily used to manipulate images to extract defect
features, such as cracks in concrete and steel surfaces. However, extensively
varying real-world situations (e.g., lighting and shadow changes) pose
challenges to the wide adoption of IPTs. To overcome these challenges, use of deep
In this study, two slightly different CNN image classifiers are built for the purpose
of detecting and classifying cracks in concrete beams. Two different datasets were
created for training and validating the classifiers, containing 40051 labelled images
of 32x32 pixels and 123 labelled images of 64x64 pixels, respectively.
for training and validation processes. The results show that the proposed methods
perform well and can indeed classify concrete cracks in realistic situations.
Table of Contents
ACKNOWLEDGEMENT
ABSTRACT
Table of Contents
List of Tables
Chapter 1 INTRODUCTION
1.1 Background
1.2 Objectives
1.3 Scope of study
Chapter 2 LITERATURE REVIEW
2.1 Concrete
2.2 Concrete cracks
2.3 Reasons for occurring cracks
2.3.1 Plastic shrinkage
2.3.2 Expansion and contraction
2.3.3 Heaving and settling
2.3.4 Overload
2.3.5 Improper drying
2.3.6 Natural disaster
2.4 Types of cracks in concrete
2.4.1 Shrinkage crack
2.4.2 Hairline crack
2.4.3 Settlement crack
2.4.4 Temperature and shrinkage crack
2.4.5 Shear crack
2.4.6 Flexural crack
2.5 Neural network
2.6 Types of neural network
List of Tables
List of Figures
List of Abbreviations
NN Neural Network
AL Active Learning
DB Database
Chapter 1
INTRODUCTION
1.1 Background
Structural cracks are of major concern for structural safety. Many old structures
around the world are decaying and losing structural strength.
Cracks in beams, columns, piers and similar members are like a forecast of structural
failure. That is why structural cracks are treated with great importance and detecting
In developing and least developed countries, many cases are found which show that
residential, commercial and institutional buildings are not designed and constructed
as per code. As a result, the constructed building does not gain its targeted strength and
action to retrofit it, thus preventing further damage to the structure and loss of lives.
People living in low-income countries are bound to live in risk-prone buildings, and
there is a lack of building inspectors. Giving them a tool or technology which is
low cost and easy to use will therefore help them determine which cracks to ignore and which
cracks to take seriously, and then contact the appropriate authorities to take action and
Several SHM systems have been developed to detect cracks in concrete
structures using X-rays, vibrations, sensors, etc. Due to the low price and availability of
1.2 Objectives
The main objective of this study is to develop a model using Convolutional Neural
i. Constructing two image classifiers using CNN. One will be called CNN_1
and the other will be called CNN_2.
ii. Creating two datasets of images for the two slightly different image classifiers.
One will be called dataset_1 and the other will be called dataset_2.
iii. Developing an algorithm for dataset_1 for the CNN_1 classifier, which
utilizes the input of crack location and outputs crack type.
iv. Comparing results and test accuracy between these two classifiers.
1.3 Scope of study
In this study, a detailed literature review was conducted concerning concrete cracks
and various types of CNN. Two suitable CNN models were then designed, and
datasets were created to train and validate the classifiers. After that,
comparisons were made between the classifiers with regard to training time,
performance, accuracy and real-world compatibility. Finally, suggestions
were made for enriching the datasets and increasing the performance of the
classifiers in future work.
Chapter 2
LITERATURE REVIEW
2.1 Concrete
mass. To form the cementing medium, cement is mixed with water.
Coarse and fine aggregates make up the inert mass. In properly mixed
concrete, these materials are completely surrounded and coated by cement paste,
which fills all the void space between the particles [1]. With time, the setting process of
2.2 Concrete cracks
Some concrete professionals believe that reinforced concrete structures should not
crack. With that belief, when cracking does occur, they often claim that the concrete
contractor caused the cracks and should pay for repair. Cracks in reinforced
concrete, however, are not a defect but are specifically included as part of the
2.3 Reasons for occurring cracks
Whether it is a road, driveway, sidewalk, or parking slab, eventually almost every
kind of concrete cracks. It is an inevitable element of the wear and tear suffered by a
property, the same as paint peeling, garden plants dying, or walls getting dirty.
Some of the main reasons why concrete cracks are given below [3].
2.3.1 Plastic shrinkage
Concrete in its non-solid state is basically just water and cement. As it hardens and
the water evaporates, it shrinks. Too much water means too much shrinkage – and
too much shrinkage leads to cracks in the finished project. This is usually a
2.3.2 Expansion and contraction
Like every solid, concrete tends to change subtly in shape and size based on
weather conditions. In hot weather concrete expands, pushing against anything in its
path as it does so. If it meets an object that doesn’t flex or give during that
expansion, it will likely crack.
2.3.3 Heaving and settling
Ground movement is one of the most frequent causes of cracking, especially with
sidewalks, driveways, and roads. The growth of a tree’s roots or an excessive freeze
and thaw cycle can cause the ground to push upwards on the concrete, causing it to
crack and break (a process known as heaving). Conversely, shifting of the soil due to
water or the rotting away of a tree’s roots can cause the foundation of the concrete to
collapse (this is known as settling). As with expansion and contraction, heaving and
settling are likely to occur over time.
2.3.4 Overload
If too much weight is placed on concrete, it can cause cracks or in extreme cases
2.3.5 Improper drying
Crazing and crusting occur when laid concrete is improperly dried. The former
results from concrete not having enough moisture or losing moisture too quickly,
while the latter results from a pattern being pressed into concrete before it is
properly dried.
2.3.6 Natural disaster
Natural disasters such as earthquakes or heavy wind loads may cause cracks if the
structure is not properly designed in accordance with building codes. This may lead to
reduced load-bearing capacity of the concrete component and, in extreme cases, structural failure.
2.4 Types of cracks in concrete
Tremendous forces can build up inside the wall due to any of these causes. When
the forces exceed the strength of the concrete, cracks will develop. Each of these
causes typically leaves a "signature" in the type of crack it creates. Some of the
major and most common cracks are discussed here [4, 5].
2.4.1 Shrinkage crack
are usually uniform in width. Sometimes these cracks have a V-shape (less
frequent), with the top of the crack looking larger and the crack getting smaller as it
travels towards the floor and diminishing or stopping before reaching the bottom of
the foundation wall. If the crack reaches the bottom, it might damage the
building's footings and have a significant impact on the foundation
structure. Figure 2.1 shows a sample of a shrinkage crack on a concrete surface.
Shrinkage cracks should be neither ongoing nor of structural significance, though they
may invite water entry through the wall. In poured concrete foundations, shrinkage
cracks are usually due to conditions at original construction: poor concrete mix,
rapid curing or possibly other conditions. In any case, concrete shrinkage causes the
concrete to develop internal stresses. Unless control joints were included in the wall
or floor slab design, these stresses are likely to cause the wall or floor to crack in
2.4.2 Hairline crack
Hairline cracks may develop in concrete foundations as the concrete cures. Hairline
cracks do not cause problems with the stability of the foundation but do cause
leakage problems. If the cracks appear shortly after pouring the concrete
foundation, the concrete may have been mixed poorly or poured too quickly. In poured
concrete foundations, hairline cracks frequently appear in the center of the walls
because the wall corners have greater stability.
2.4.3 Settlement crack
Settlement cracks may appear when the underlying ground has not been compacted
or appropriately prepared, or if the subsoil was not of the proper consistency. A
settlement crack may also appear as a random crack above areas where the soil of
the subgrade was uneven after the concrete was poured. Settlement cracks are
usually wider at the top of the crack than at the bottom, as the foundation
"bends" over a single point, allowing differential settlement. This type of crack is
Horizontal cracks found in the center of the wall are most likely caused by an
applied load such as backfill around foundation compacted improperly or too soon,
water table and poor drainage against the foundation wall, or heavy equipment
operated too soon or too close to the foundation wall. Horizontal cracks found high
2.4.5 Shear crack
Shear cracks form near the supports of members and are inclined at between about
30° and 45° to the axis of the beam, from the tension face of the member back
towards the support. As with flexural cracks, they will be widest at the tension face,
reducing in width with distance from the face. When the member has been correctly
designed, shear cracks will be controlled by the provision of shear reinforcement in
2.4.6 Flexural crack
Flexural cracks on the sides of a beam start at the tension face and will extend, at
most, up to the neutral axis. Crack widths will be greatest at the tension face and
will reduce with distance from that face. In general, the cracks will be uniformly
spaced along the most heavily loaded portion of the beam, i.e. near the mid-span in
sagging or over the supports in hogging. Flexural cracks on the soffit of a slab will
run at right angles to the span, again roughly uniformly spaced in the region of
maximum moment. In beams and slabs in buildings that have been correctly
designed, average crack widths should not exceed 0.3 mm. In bridges, the cracks
2.5 Neural network
Artificial intelligence technique that mimics the operation of the human brain
2.6 Types of neural network
Artificial neural networks are computational models that work similarly to the
functioning of a human nervous system. There are several kinds of artificial neural
operations and a set of parameters required to determine the output. Some of the
types that are most used in today’s machine learning (ML) applications are given
below [7].
This neural network is one of the simplest forms of ANN, where the data or the input
travels in one direction. The data passes through the input nodes and exits at the
output nodes. This neural network may or may not have hidden layers. In simple
classifying activation function usually. Figure 2.7 shows a single-layer feed-forward
network. Here, the sum of the products of inputs and weights is calculated and fed
threshold (usually 0), the neuron fires with an activated output (usually 1), and if
it does not fire, the deactivated value is emitted (usually -1). Applications of
feed-forward neural networks are found in computer vision and speech recognition, where
classifying the target classes is complicated. These kinds of neural networks are
Radial basis functions consider the distance of a point with respect to the center.
RBF networks have two layers: in the first, the features are combined with the
radial basis function in the inner layer, and then the output of these features is
taken into consideration while computing the same output in the next time-step,
which is basically a memory. Figure 2.8 is a diagram representing the distance
calculated from the center to a point in the plane, similar to the radius of a circle.
Here, the distance measure used is Euclidean; other distance measures can also be
used. The model depends on the maximum reach or the radius of the circle in
classifying the points into different categories. If the point is in or around the
radius, the likelihood of the new point being classified into that class is high. There
can be a transition while changing from one region to another, and this can be
controlled by the beta function. This neural network has been applied in power
restoration systems. Power systems have increased in size and complexity. Both
factors increase the risk of major power outages. After a blackout, power needs to
When training the map, the location of the neuron remains constant but the weights
differ depending on the value. This self-organization process has different phases. In
the first phase, every neuron value is initialized with a small weight and the input
vector. In the second phase, the neuron closest to the point is the ‘winning neuron’,
and the neurons connected to the winning neuron will also move towards the point,
as shown in Fig 2.9. The distance between the point and the neurons
is calculated as the Euclidean distance; the neuron with the least distance wins.
Through the iterations, all the points are clustered and each neuron represents one
kind of cluster. This is the gist behind the organization of the Kohonen neural network.
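The two phases described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation of any particular library: the function names (`winning_neuron`, `som_step`), the learning rate of 0.5, and the rule that the winner's immediate index neighbours move half as far as the winner are all hypothetical choices made for the example.

```python
import math

def winning_neuron(weights, point):
    """Return the index of the neuron whose weight vector is closest to point."""
    distances = [math.dist(w, point) for w in weights]  # Euclidean distance
    return distances.index(min(distances))

def som_step(weights, point, learning_rate=0.5):
    """Move the winning neuron and its index neighbours towards point."""
    winner = winning_neuron(weights, point)
    for k in (winner - 1, winner, winner + 1):
        if 0 <= k < len(weights):
            # Assumed rule: neighbours move half as far as the winner.
            rate = learning_rate if k == winner else learning_rate / 2
            weights[k] = [w + rate * (p - w) for w, p in zip(weights[k], point)]
    return winner

# Three neurons arranged on a line, initialised with small weights.
weights = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
winner = som_step(weights, point=[1.0, 1.0])
```

Repeating `som_step` over many input points is what gradually pulls each neuron towards one cluster of the data.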
The Kohonen neural network is used to recognize patterns in data. Its application
can be found in medical analysis to cluster data into different categories. A Kohonen
map was able to classify patients having glomerular or tubular with high accuracy.
The recurrent neural network works on the principle of saving the output of a
layer and feeding this back to the input to help in predicting the outcome of the
layer. Here, the first layer (Figure 2.10) is formed similarly to the feed-forward neural
network, with the sum of the products of the weights and the features. The recurrent
neural network process starts once this is computed, which means that from one time
step to the next each neuron will remember some information it had in the previous
time-step. This makes each neuron act like a memory cell in performing
computations. In this process, the neural network needs to work through
forward propagation and remember what information it needs for later use. Here, if the
prediction is wrong, the learning rate or error correction is used to make small
changes so that it will gradually work towards making the right prediction during
conversion models.
Convolutional neural networks are similar to feed-forward neural networks, where
the neurons have learnable weights and biases. They have been applied in signal
and image processing, where they have overtaken OpenCV in the field of computer vision. Below
is a representation of a ConvNet. In this neural network, the input features are taken
in batches, like a filter. This helps the network to remember the images in
parts and compute the operations. These computations involve conversion of
the image from RGB or HSI scale to grayscale. Once this is done, the changes in
the pixel values help in detecting the edges, and images can be classified into
different categories. ConvNets are applied in techniques such as signal processing and
image classification. Computer vision techniques are dominated by
The technique of image analysis and recognition, in which agriculture and weather
features are extracted from open-source satellites like LSAT to predict the future
growth and yield of a particular piece of land, is being implemented.
independently and contributing towards the output. Each neural network has a set of
inputs which are unique compared to the other networks constructing and performing
sub-tasks. These networks do not interact or signal each other in accomplishing the
interaction of these networks with each other, which in turn will increase the
computation speed. However, the processing time will depend on the number of
neurons and their involvement in computing the results. Modular Neural Networks
(MNNs) are a rapidly growing field in artificial neural network research. It is
problems.
A CNN model has several layers, each of which has a specific job. The most commonly used
layers are the convolution layer, pooling layer, auxiliary layers and softmax layer.
(i.e., dot product) between a subarray of an input array and a receptive field. The
receptive field is also often called the filter, or kernel. The initial weight values of a
receptive field are typically randomly generated. Those of the bias can be set in many
ways in accordance with the network's configuration [8]. Both values are tuned in
training using a stochastic gradient descent (SGD) algorithm. The size of a
subarray is always equal to that of a receptive field, but a receptive field is always smaller
than the input array. Second, the multiplied values are summed, and the bias is added to
the summed values. Figure 2.13 shows the convolutions of the subarrays (solid and
dashed windows) with an input array and a receptive field. One of the advantages of
the convolution is that it reduces input data size, which reduces computational cost.
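The convolution operation described above can be illustrated with a short pure-Python sketch. It is not taken from the study: the function name `convolve2d`, the 4 × 4 input, the 2 × 2 receptive field and the bias of 1.0 are hypothetical values chosen only to make the arithmetic easy to follow, and the receptive field is moved one pixel at a time with no padding.

```python
def convolve2d(image, kernel, bias=0.0):
    """Slide kernel over image (one pixel at a time, no padding)."""
    f = len(kernel)                      # receptive field is f x f
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(rows - f + 1):
        row = []
        for c in range(cols - f + 1):
            # Element-by-element multiplication of the subarray and the
            # receptive field, summed, with the bias added afterwards.
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(f) for j in range(f))
            row.append(total + bias)
        output.append(row)
    return output

image = [[1, 2, 0, 1],
         [0, 1, 3, 2],
         [2, 1, 0, 1],
         [1, 0, 2, 3]]
kernel = [[1, 0],
          [0, -1]]
feature_map = convolve2d(image, kernel, bias=1.0)
# feature_map == [[1.0, 0.0, -1.0], [0.0, 2.0, 3.0], [3.0, 0.0, -2.0]]
```

The 4 × 4 input shrinks to a 3 × 3 feature map, which is the size reduction referred to in the text.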
An additional hyperparameter of the layer is the stride. The stride defines how
many of the receptive field’s columns and rows (pixels) slide at a time across the
input array’s width and height. A larger stride size leads to fewer receptive field
applications and a smaller output size, which also reduces computational cost,
though it may also lose features of the input data. The output size of a convolution
Another key aspect of CNNs is the pooling layer, which reduces the spatial size of
an input array. This process is often referred to as downsampling. There are two
different pooling options. Max pooling takes the max values from an input array’s
subarrays, whereas mean pooling takes the mean values. Figure 2.14 shows the
pooling method with a stride of two, where the pooling layer output size is
calculated by the equation in the figure. Owing to the stride size being larger than in
the convolution example in Figure 2.13, the output size is further reduced to 3 × 3.
Max pooling performance on image data sets is better than that of mean pooling [9].
That article verified that architectures with max pooling layers outperform those
with mean pooling layers. Thus, all the pooling layers in this study are max
pooling layers.
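The difference between the two pooling options can be seen in a small sketch. The function `pool2d` and the 4 × 4 input below are hypothetical and are not the arrays shown in the thesis figures; the window size and stride of two are assumed for the example.

```python
def pool2d(array, size=2, stride=2, mode="max"):
    """Downsample a 2D array with max or mean pooling."""
    reduce = max if mode == "max" else (lambda v: sum(v) / len(v))
    output = []
    for r in range(0, len(array) - size + 1, stride):
        row = []
        for c in range(0, len(array[0]) - size + 1, stride):
            # Collect the values of one size x size subarray.
            window = [array[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce(window))
        output.append(row)
    return output

array = [[1, 3, 2, 4],
         [5, 7, 6, 8],
         [4, 2, 0, 1],
         [3, 1, 9, 5]]
max_pooled = pool2d(array, mode="max")    # [[7, 8], [4, 9]]
mean_pooled = pool2d(array, mode="mean")  # [[4.0, 5.0], [2.5, 3.75]]
```

Max pooling keeps the strongest response in each window, whereas mean pooling averages it with its weaker neighbours.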
The most typical way to give nonlinearity in the standard ANN is using sigmoidal
[10]. Recently, the ReLU was introduced [10] as a nonlinear activation function.
Figure 2.15 depicts several examples of nonlinear functions. Briefly, while other
nonlinear functions have bounded output values (e.g., positive and negative ones
and zeros), the ReLU has no bounded outputs except for its negative input values.
Intuitively, the gradients of the ReLU are always zeros and ones. These features
facilitate much faster computations than those using sigmoidal functions and
achieve better accuracies.
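The contrast between the ReLU and a sigmoidal function can be checked numerically. The sketch below is illustrative only; the helper names and the sample inputs are assumptions, and the logistic sigmoid is used as one representative sigmoidal function.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_gradient(x):
    # 1 for positive inputs, 0 otherwise (the value at exactly 0 is a convention).
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_gradient(x):
    s = sigmoid(x)
    return s * (1.0 - s)

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
relu_outputs = [relu(x) for x in inputs]              # [0.0, 0.0, 0.0, 0.5, 2.0]
relu_gradients = [relu_gradient(x) for x in inputs]   # [0.0, 0.0, 0.0, 1.0, 1.0]
sigmoid_gradients = [sigmoid_gradient(x) for x in inputs]
```

The ReLU gradients are exactly zero or one and need no exponential, while the sigmoid gradient never exceeds 0.25 and shrinks towards zero for large positive or negative inputs.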
Overfitting has been a long-standing issue in the field of machine learning. This is a
phenomenon where a network classifies a training data set effectively but fails to
provide satisfactory validation and testing results. To address this issue, dropout
layers are used [11]. Training a network with a large number of neurons often
results in overfitting due to complex coadaptations. The main idea of dropout is to
certain dropout rate. Accordingly, a network can generalize training examples much
network training time [12]. However, the distribution of layer’s input shifts by
passing through layers, which is defined as internal covariate shift, and this has
been pointed out as being the major culprit of slow training speed. A BN was
proposed to adapt the similar effect of whitening on layers [13]. As a result, this
convergence.
To classify input data, it is necessary to have a layer for predicting classes, which is
usually located at the last layer of the CNN architecture. The most prominent
method to date is using the softmax function given by the equation below,
which is expressed as the probabilistic expression $p(y^{(i)} = n \mid x^{(i)}; W)$ for the ith
training example out of m training examples, the jth class out of n
classes, and weights W, where $W_n^T x^{(i)}$ are the inputs of the softmax layer.
The sum of the right-hand side for the ith input always returns as 1, as the function
As the initial values of W are randomly assigned during training, the predicted and
actual classes do not usually coincide. To calculate the amount of deviation
between the predicted and actual classes, the softmax loss function is defined by an
equation.
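As a numerical illustration of the softmax function and its loss, the following sketch uses the standard textbook forms, which are assumed (not confirmed) to match the equations of this section. The function names and the example scores are hypothetical, and the regularisation term involving λ is deliberately omitted.

```python
import math

def softmax(scores):
    """Turn class scores (the W_j^T x values) into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]   # avoids overflow in exp
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(batch_scores, labels):
    """Mean negative log-probability of the true class (no regularisation)."""
    losses = [-math.log(softmax(scores)[label])
              for scores, label in zip(batch_scores, labels)]
    return sum(losses) / len(losses)

probabilities = softmax([2.0, 1.0, 0.1])
loss = softmax_loss([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]], labels=[0, 2])
```

Selecting only the probability of the true class inside `softmax_loss` plays the role of the indicator term: every other class contributes zero to the sum.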
$\sum_{j=1}^{n} 1\{\cdot\}$. The term $1\{y^{(i)} = j\}$ is the logical expression that always returns either
zero or one. In other words, if the predicted class of the ith input is true for class j,
the term returns one, and zero otherwise. The last hyper-parameter λ in the
necessary for obtaining the expected results (i.e., predicting true classes). This
process is considered for CNN training. There are several known methods, but SGD
minimize the deviations [12]. The standard gradient descent algorithm performs
updating W on an entire training data set, but the SGD algorithm performs it on
single or several training samples. To accelerate the training speed, the momentum
algorithm [14] is also often used in SGD. The overall updating process is as
are updated. A network can be tuned by repeating the explained process several
times until $W_j \leftarrow W_j + \upsilon$ converges. The superscript (i) indicates the ith training
sample, where the range of i depends on the mini-batch size, which defines how
many training samples out of the whole data set are used. For example, if 100
images are given as the training data set and 10 images are assigned as the
mini-batch size, this network updates weights 10 times; each complete update out of the
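The mini-batch example above (100 samples, mini-batch size 10, hence 10 weight updates per pass over the data) can be reproduced with a toy problem. The sketch below is an assumption-laden illustration rather than the training code of this study: it fits a single weight in the hypothetical model y = w·x with a squared-error loss, and the learning rate (0.01) and momentum (0.9) are arbitrary example values.

```python
def sgd_momentum_epoch(w, velocity, samples, batch_size,
                       learning_rate=0.01, momentum=0.9):
    """Run one pass of mini-batch SGD with momentum on the model y = w * x."""
    updates = 0
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        # Gradient of the mean squared error over this mini-batch only.
        gradient = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        velocity = momentum * velocity - learning_rate * gradient
        w = w + velocity          # the update W <- W + v
        updates += 1
    return w, velocity, updates

# 100 training samples drawn from y = 3x, with a mini-batch size of 10.
samples = [(i / 100, 3 * i / 100) for i in range(1, 101)]
w, velocity = 0.0, 0.0
for epoch in range(200):
    w, velocity, updates_per_epoch = sgd_momentum_epoch(
        w, velocity, samples, batch_size=10)
```

Each pass performs 10 updates, and after repeated passes the weight settles close to the true value of 3.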
Many types of artificial neural networks (ANNs), including the probabilistic neural
network (NN) [16], have been developed and adapted to research and industrial
fields, but convolutional neural networks (CNNs), which are inspired by the visual
cortex of animals [17], have been highlighted in image
recognition. CNNs can
effectively capture the grid-like topology of images, unlike standard NNs, and
they require fewer computations due to the sparsely connected neurons and the
classes [18]. These aspects make CNNs an efficient image recognition method [19,
20]. A previous limitation of CNNs was the need for a vast amount of labeled data,
which came with a high computational cost, but this issue was overcome through
data set, no date; MNIST Database, no date) and parallel computations using
graphics processing units [21]. Owing to this excellent performance, a study for
detecting railway defects using a CNN was later proposed [22]. Rail surfaces are
homogeneous, and the images are collected under controlled conditions. This cannot
abundant data set, taken under extensively varying conditions, is essential for
In 2017, Cha et al. [23] proposed a vision-based method using a deep architecture of
Convolutional Neural Networks (CNNs) for detecting concrete cracks without
calculating the defect features, which allowed the architecture to detect cracks in
concrete surfaces in extensively varying real-world situations (e.g., lighting and
shadow changes). The CNN has three convolutional layers and three pooling layers, uses
the Rectified Linear Unit (ReLU) at the end of the hidden layers and uses max pooling.
To classify input data into a binary output, the CNN architecture uses the softmax
function. The overall architecture detects cracks with 98% accuracy.
In 2017, Feng et al. [24] proposed a deep Active Learning (AL) framework
addressing the efficient training and deployment of an automatic defect detection
system. A deep residual network (ResNet) is designed as the classifier for detection
and classification of defects in an input image patch. In their paper, they prepared a
suitable cost function for optimization and tuned hyperparameters for training to
achieve the best performance. The model is able to detect cracks, deposit and water
be achieved by the Gabor filter. The Gabor filter is a technique with high
potential for multidirectional crack detection. The image analysis of the Gabor
filter function is directly related to manual visual perception. Once filtering
is completed, the cracks aligned in different directions are detected. They have a
In 2005, Iyer et al. [26] designed a three-step method for crack detection
from high-contrast images. The proposed method detects the crack-like pattern
Linear filtering was performed after cross-curvature evaluation to distinguish them
In 2014, Nguyen et al. [27] proposed a method based on the edge detection of
concrete cracks from noisy 2D images of concrete surfaces. They observed the
cracks as a tree-like topology. Then, based on the PSCEF, non-crack objects were
Then the centre line was fitted by cubic splines. They linked the edge points to
form the desired continuous crack edge. From the crack edge, the surface of the
crack was obtained.
In 2015, Yang et al. [28] proposed an image analysis method to capture thin
cracks and minimize the requirement for pen marking in reinforced concrete
structural tests. They used studies such as crack depth prediction and change in
width measurement. The stereo triangulation method was the adopted technique, based
on cylinder formula approximation and image rectification. Once the
rectified output is obtained, the surface of the observed regions can be unfolded and presented
in a plane image for subsequent displacement and deformation analysis. From which
In 2010, Wang et al. [29] proposed a system for image-based crack
detection and for characterizing cracks based upon their effectiveness. They
grouped the existing image-based crack detection methods into four categories: the
integrated algorithm, the morphological approach, the percolation approach and the
practical technique. Shading correction was done using the integrated algorithm.
Unclear cracks were detected using the percolation method. Micro-crack
detection was done using the morphological approach,
with the practical method providing high-performance feature extraction with more
Chapter 3
COMPUTATIONAL PROCEDURE
3.1 Introduction
The main purpose of this study is to build CNN architectures and datasets which in
combination will be able to determine and classify beam cracks into shear cracks,
flexural cracks, shrinkage cracks and spalling of concrete with precision and high
accuracy. For this reason, firstly, a CNN model will be introduced and then a dataset
alternative dataset containing only crack, non-crack and spalling images will be
GPU: none
Software: Spyder
Two slightly different test methodologies are followed for the two different types of
crack image classifiers. Dataset_1 is the dataset containing standardized crack and
non-crack images and non-standardized spalling images to be trained and validated
by CNN_1.
[Figure: Input Image → Preprocessing → Convolutional Neural Network → Output Result: Shear cracks, Flexural cracks or Spalling]
non-standardized data from the internet were collected. For creating the dataset
containing crack and non-crack images, standardized crack and non-crack images
were collected [30]. All images used in the study are in .jpg format.
For classifying beam surfaces into the binary classes crack or non-crack, a standardized dataset
was used. The dataset contains images of concrete with and without cracks. The data were collected
from various METU (Middle East Technical University) campus buildings. The
dataset is divided into negative and positive crack images for image
classification. Each class has 20000 images, for a total of 40000 images of 227 x
227 pixels with RGB channels (Fig 3.3). The dataset was generated from 458
high-resolution images (4032x3024 pixels) with the method proposed by Zhang et al. [31].
a) Crack images
A total of 123 images were collected randomly from the internet without following
any specific procedure (samples shown in Figure 3.4). They were then cropped
around the specific type of crack each image contained. Of these, 40 are images of
shear cracks on concrete beam surfaces, 32 are of flexural cracks on concrete
beam surfaces and 51 are of spalling of concrete on concrete beam surfaces. These
images have RGB channels and varying pixel sizes. No data augmentation in
Due to memory, time and computational capacity constraints, the images from the
datasets (both standardized and non-standardized) had to be resized. And due
to the small number of images in both datasets (a CNN requires a large number of images)
Images from standardized dataset are of 227x227 pixels. Images from non-
cropping. Images of such varying sizes cannot be used for testing; a single
size is needed for all the images. In this study, all the input images are resized
to 32x32 pixels for CNN_1 and 64x64 pixels for CNN_2, considering time and
resource constraints.
In this study, the image data were augmented to increase the number of images in order to
increase accuracy. All images from the datasets (except that in dataset_2 rotation was
avoided to prevent confusion between shear and flexural cracks) were randomly
The Keras library is used for training and validation purposes. Keras is an open
special way by which the Keras library is able to recognize which type of crack it is
For the standardized dataset, the images are put in two folders named ‘training_set’
and ‘val_set’ for training and validation purposes. In each folder there are two
subfolders: the ‘Crack’ folder contains crack images and the ‘noncrack’ folder contains
non-crack images. All the images in a folder are labelled with the type and a number to
give them a serial, which helps the Keras library to read the images. A third subfolder
named ‘Spalling’ was added in both the ‘training_set’ and ‘val_set’ folders for detecting
the images were labelled and numbered as previously stated (Figure 3.5).
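The folder arrangement described for the standardized dataset can be reproduced with a short script. This is a hedged sketch rather than the procedure actually used: the helper names are hypothetical, the folders are created in a temporary directory, and the file-name pattern `<type>.<serial>.jpg` is only an assumed example of labelling an image with its type and a number.

```python
import tempfile
from pathlib import Path

def create_dataset_folders(root, class_names, splits=("training_set", "val_set")):
    """Create one subfolder per class inside each split folder."""
    created = []
    for split in splits:
        for class_name in class_names:
            folder = Path(root) / split / class_name
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder.relative_to(root).as_posix())
    return created

def labelled_name(class_name, serial):
    """Build a file name from the class label and a serial number (assumed pattern)."""
    return f"{class_name}.{serial}.jpg"

with tempfile.TemporaryDirectory() as root:
    folders = create_dataset_folders(root, ["Crack", "noncrack", "Spalling"])
example_name = labelled_name("Crack", 1)   # "Crack.1.jpg"
```

Passing a different list of class names to `create_dataset_folders` would give the same two-level layout for another set of classes.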
For the non-standardized dataset, a similar procedure was followed to divide the images
into ‘training_set’ and ‘val_set’. ‘Flexural’, ‘Shear’ and ‘Spalling’ subfolders were
created to accommodate flexural, shear and spalling cracks and they were labelled
The proposed CNN contains two convolutional layers, two pooling layers, one
flattening layer and finally one full connection layer. The Keras library was used
to construct the CNN model, and the packages used are Sequential,
Conv2D, MaxPooling2D, Flatten and Dense. Building the CNN involved using
Figure 3.7.
In this study, three Python libraries were used: Theano, Keras
and TensorFlow. They therefore had to be installed on the computer. The Keras library
runs on top of the Theano and TensorFlow libraries.
library were used for this study. The CNN is a sequence of layers, and so sequential
Conv2D is used to build the convolutional layers. Since the study deals with 2D images,
2D convolutional layers have been used. Max pooling was used in this study, which is
why the MaxPooling2D class was used. Flatten was used for flattening, the step in
which all the max-pooled feature maps are converted into one large feature vector
that then serves as the input to the fully connected layer. The Dense class was used
to create the fully connected layer, as in a classic ANN.
max pooling layer. The CNN is named ‘classifier’ as the purpose of the NN to
For the first convolution layer, 32 feature detectors of size 3x3 were used, as it is
common practice to start the first convolution layer with 32 feature detectors. As
mentioned earlier, the input image size was 32x32 pixels for CNN_1 and 64x64 pixels
for CNN_2, and the activation function used was ‘relu’ (Figure 3.8), which discards
negative values in the feature map. Discarding negative pixels will introduce
The pooling layer reduces the size of the feature maps. In the CNN model, a 2x2
max pooling layer was used with a stride of 2. No padding was used. The reason
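Height of the resulting feature map: $n_H = \frac{m - f}{s} + 1$
Width of the resulting feature map: $n_W = \frac{n - f}{s} + 1$
Here,
m = height of the input
n = width of the input
f = size of the feature detector or pooling window (f x f)
s = stride
After each layer, the size of the output becomes nH x nW x number of feature
detectors.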
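To make the pooling step concrete, the following is a minimal NumPy sketch of 2x2 max pooling with a stride of 2 and no padding, applied to a single made-up 4x4 feature map. The function name `max_pool_2x2` and the sample values are illustrative assumptions; in the study itself this operation is carried out by Keras' MaxPooling2D layer on every feature map.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling, stride 2, no padding; an odd trailing row/column is dropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    trimmed = feature_map[: h2 * 2, : w2 * 2]
    # Group pixels into non-overlapping 2x2 blocks and keep the maximum of each.
    blocks = trimmed.reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))


# Made-up feature map values, for illustration only.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
print(pooled.shape)
```

Each output value is the largest value in one 2x2 block, so the height and width are halved, rounding down when the input side is odd.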
For the second layer, the same convolution and pooling layers were used, except that
The flattening layer takes all the pooled feature maps and puts them into one single
column vector. A sample procedure is shown in Figure 3.9. This column vector
is the input layer for the fully connected layer. The Dense function was used to add
practice to choose a number which is a power of 2. A larger number of nodes was not selected
For both the standardized and non-standardized datasets, three final outputs are
expected. That is why three output nodes are selected in the final output layer in
both cases. The softmax function was used for this layer because more than two
outcomes are expected.
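As an illustration of what the softmax output layer does with three output nodes, here is a small NumPy sketch. The scores are made-up numbers, and the function is a plain re-implementation for explanation, not the Keras internals.

```python
import numpy as np


def softmax(logits):
    """Convert raw output-node scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


# Made-up scores for the three output nodes of one image.
scores = np.array([2.0, 0.5, -1.0])
probs = softmax(scores)
print(probs, probs.sum())
print(int(np.argmax(probs)))  # index of the most likely class
```

The class with the highest score always receives the highest probability, and the three probabilities can be read as the network's confidence in each category.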
The whole CNN model is compiled using a stochastic gradient descent algorithm, a
The stochastic gradient descent algorithm is specified through the optimizer parameter,
and the algorithm used was ‘adam’. As the loss function, ‘categorical cross entropy’
was chosen, as the CNN deals with a categorical problem. As the performance metric,
the most common one, ‘accuracy’, was chosen. A generalized diagram is shown in Figure
3.10 to illustrate the connection between layers after the flattening layer.
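The loss and metric named above can be written out in a few lines of NumPy. This is a hedged sketch of the standard definitions (mean of the negative log-probability of the true class, and the share of correct top predictions); the clipping constant `eps` and the sample arrays are assumptions made for the example, not values from the study.

```python
import numpy as np


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -sum(y_true * log(y_pred)) over the samples in a batch."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0); eps is an assumed value
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return per_sample.mean()


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    return np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1))


# Two made-up samples with one-hot labels for three classes.
y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])
loss = categorical_cross_entropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

Note that the second sample is classified correctly but with low confidence (0.4), so it contributes more to the loss than the first one; accuracy alone would not show this difference.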
For prediction, two different approaches are selected for dataset_1 (standardized
crack and non-crack images and non-standardized spalling images)
For dataset_2, the direct output of the final output layer of the CNN is shown.
The output categories are ‘Flexural’, ‘Shear’ and ‘Spalling’. A
For dataset_1, a more elaborate approach is selected. First, the CNN reports
whether an input test image contains a crack, no crack or spalling. If the output
shows ‘cracks’, then a ‘length of beam’, ‘depth of crack’ and ‘edge of crack from
‘Shrinkage’ crack.
If the ratio of the crack location to the length of the beam is less than 30%, the input
image is reported as ‘Shear Crack’. If the crack lies within 40% to 50% of the beam
length and extends over the full depth, the output is ‘Both Flexural and Shrinkage
Crack’; if the crack does not extend over the full depth of the beam, ‘Flexural Crack’
is shown. For a ‘Shrinkage Crack’ output, the crack has to be within 30% to 40% of the
beam length and of full depth [34]. For detecting spalling of concrete, however, the
test image alone is sufficient.
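The location and depth rules above can be sketched as a small Python function. This is an interpretation, not the original prediction script: the function name `classify_crack`, the handling of the exact 30% and 40% boundaries, and the ‘Undetermined’ result for cases the text does not cover (for example a partial-depth crack between 30% and 40%) are all assumptions.

```python
def classify_crack(beam_length, distance_from_support, full_depth):
    """Apply the location/depth rules to an image the CNN has already labelled 'crack'.

    beam_length and distance_from_support must use the same unit.
    """
    if beam_length <= 0:
        raise ValueError("beam_length must be positive")
    ratio = distance_from_support / beam_length

    if ratio < 0.30:
        return "Shear Crack"
    if ratio < 0.40:
        # 30% to 40%: the text only defines the full-depth case.
        return "Shrinkage Crack" if full_depth else "Undetermined"
    if ratio <= 0.50:
        # 40% to 50%: full depth means both types, otherwise flexural.
        return "Both Flexural and Shrinkage Crack" if full_depth else "Flexural Crack"
    return "Undetermined"  # beyond 50% is not covered by the stated rules


print(classify_crack(10, 4, True))
print(classify_crack(10, 2, False))
```

Since the crack location is measured from the nearest support, the ratio should not normally exceed 50%; the final branch is only a guard against inconsistent input.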
Finally, the accuracy of the two CNNs was measured. Accuracy measures how often
the model is correct overall. It should be as high as possible.
$$\text{Accuracy} = \frac{TP + TN}{\text{Total number of test images}}$$
Here,
TP = true positives, cases where the predicted positive class matches the actual
positive class.
TN = true negatives, cases where the predicted negative class matches the actual
negative class.
Chapter 4
RESULTS AND DISCUSSIONS
4.1 Introduction
The performance of the CNNs is compared in this chapter with regard to training time,
accuracy and loss. Test images that are not in the training or validation datasets
are used to test and compare the suitability of the CNNs in real-world situations.
The layer diagram for CNN_1 is shown in Figure 4.1. Here it can be seen that the input
size 3x3, with 32 filters and stride of 1, the height and width become
$$n_H = n_W = \frac{n - f}{s} + 1 = \frac{32 - 3}{1} + 1 = 30$$
The output volume of each layer can be calculated by this procedure. And finally it
Similarly, the output volume of each layer of CNN_2 can be calculated. The layer
diagram for CNN_2 is shown in Figure 4.2.
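The same calculation can be repeated for every layer with a short script. The sketch below assumes the layer sequence described in the methodology (3x3 convolution with stride 1, 2x2 pooling with stride 2, twice, no padding) and 64 filters in the second convolution layer, as listed in the model summaries in the appendix; the function names are illustrative.

```python
def conv_or_pool_size(n, f, s):
    """Output height/width for input size n, window size f, stride s, no padding."""
    return (n - f) // s + 1  # floor division drops positions that do not fit


def trace_shapes(input_size):
    """Follow one side length through conv(3x3) -> pool(2x2) -> conv(3x3) -> pool(2x2)."""
    sizes = []
    n = input_size
    for f, s in [(3, 1), (2, 2), (3, 1), (2, 2)]:
        n = conv_or_pool_size(n, f, s)
        sizes.append(n)
    return sizes


for name, size in [("CNN_1", 32), ("CNN_2", 64)]:
    sizes = trace_shapes(size)
    flattened = sizes[-1] * sizes[-1] * 64  # 64 filters in the second convolution
    print(name, sizes, flattened)
```

The floor division matters for the second pooling layer, where an odd side length (13 for CNN_1, 29 for CNN_2) cannot be split evenly into 2x2 windows.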
In this study, 10 epochs were performed for each dataset. In deep learning, an epoch
is when an entire dataset is passed both forward and backward through the neural
network exactly once [35]. The code needed for training and validation is slightly
different for the two datasets. The batch size was chosen to be 32 for dataset_1
(because of capacity constraints) and 3 for dataset_2 (because dataset_2 contains
only a small number of images). So time needed to perform an epoch gives us an idea of the
From the graphs (Fig. 4.3, Fig. 4.4) it is visible that both curves are almost linear,
so the relationship between training time and epoch is a linear one. The training
time needed for dataset_1 is much higher than for dataset_2 because dataset_1
contains a much higher number of images. Training time might also have been
affected by overclocking of the CPU and other side tasks.
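Training time per epoch scales with the number of batches processed, so a rough feel for the difference between the two datasets can be had from a one-line calculation. The helper below is an illustration and not part of the original scripts. The counts 32041 and 99 are the `steps_per_epoch` values in the appendix listings, read here as the numbers of training images; that reading is an assumption.

```python
import math


def batches_per_pass(num_images, batch_size):
    """Number of batches needed to show every image to the network once."""
    return math.ceil(num_images / batch_size)


# Counts taken from the generator settings in the appendix listings (assumed image counts).
print(batches_per_pass(32041, 32))  # dataset_1, batch size 32
print(batches_per_pass(99, 3))      # dataset_2, batch size 3
```

In Keras generator-based training, `steps_per_epoch` counts batches rather than images, so passing the image count itself makes each epoch draw more batches than a single pass over the data requires.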
In the case of dataset_1, training accuracy gradually increases with epoch (Fig.
4.5), reaches a peak and then levels off. Test accuracy reaches a peak and then
fluctuates within a range as the number of epochs increases.
In the case of dataset_2, the test accuracy curve (Fig. 4.6) has the same behavior as the
The loss function used for this study is ‘categorical cross entropy’. The equation for
epochs, training and validation loss for both datasets are shown below.
In the case of dataset_1, training loss gradually decreases with epoch and
eventually levels off (Fig. 4.7). Test loss reaches a lowest point and
In the case of dataset_2, the training loss curve has the same behavior as the training
loss curve of dataset_1. The test loss, however, is still trending; it will probably
take more epochs to reach a saturation point and fluctuate between two points (Fig.
4.8).
For both CNNs and datasets, a peak accuracy is reached, after which the accuracy
fluctuates within a range. At the 10th epoch, the accuracy was found to be 99.7%
for dataset_1 and 75.0% for dataset_2.
To examine the performance of the CNNs, some random images collected from the
internet and from on-site inspection are put through the CNN models, and the
prediction algorithm is used to predict their classes.
Here 10 test images are used (Figure 4.8), covering various lighting, distance,
position, location and camera angle conditions. Testing the constructed CNNs on
these images therefore gives a good idea of their performance.
all four categories of cracks in a concrete beam. But CNN_2 cannot distinguish
shrinkage cracks from flexural cracks. So, for CNN_2, flexural and shrinkage
Information about the beam and crack did not come with the images, as most of
them were collected randomly from the internet. So the length of the beam, the
location of the crack from the nearest support and the depth of the beam affected
by the crack are assumed arbitrarily but logically from what can be seen in the
respective pictures.
Final predictions after running all the test results through CNN_1 and prediction
Table 4.1: Test results for CNN_1.

Sl. | Test Image Crack | Crack Probability | No Crack Probability | Preliminary Prediction | Length of Beam (m) | Location of Crack from the Nearest Support (m) | Depth of Beam Covering Crack | Final Result
----|----------------------|---|---|----------|----|---|------------|-----------------------------
4   | Spalling of concrete | 0 | 0 | Spalling | -  | - | -          | Spalling
5   | Spalling of concrete | 1 | 0 | Crack    | -  | - | -          | Crack
6   | Spalling of concrete | 1 | 0 | Crack    | -  | - | -          | Crack
7   | Shrinkage Crack      | 1 | 0 | Crack    | 10 | 4 | Full Depth | Flexural and Shrinkage Crack
Final predictions after running all the test results through CNN_2 and prediction
Table 4.2: Test results for CNN_2.

Serial No. | Test Image Crack | Shear Crack Probability | Flexural Crack Probability | Spalling Probability | Final Prediction
-----------|----------------------|---|---|---|----------------
4          | Spalling of concrete | 0 | 0 | 1 | Spalling
5          | Spalling of concrete | 0 | 0 | 1 | Spalling
6          | Spalling of concrete | 1 | 0 | 0 | Shear Crack
7          | Shrinkage Crack      | 0 | 1 | 0 | Flexural Crack
8          | Shrinkage Crack      | 0 | 1 | 0 | Flexural Crack
4.7 Summary
Table 4.3: Summary of the test images.

Sl. | Test Image Crack | CNN_1 Prediction | CNN_1 Remark | CNN_2 Prediction | CNN_2 Remark
----|----------------------|------------------------------|--------|----------------|-------
1   | Shear Crack          | Shear Crack                  | OK     | Shear Crack    | OK
2   | Shear Crack          | Shear Crack                  | OK     | Shear Crack    | OK
3   | Shear Crack          | Shear Crack                  | OK     | Spalling       | NOT OK
4   | Spalling of concrete | Spalling                     | OK     | Spalling       | OK
5   | Spalling of concrete | Crack                        | NOT OK | Spalling       | OK
6   | Spalling of concrete | Crack                        | NOT OK | Shear Crack    | NOT OK
7   | Shrinkage Crack      | Flexural and Shrinkage Crack | OK     | Flexural Crack | OK
8   | Shrinkage Crack      | Shrinkage Crack              | OK     | Flexural Crack | OK
9   | Flexural Crack       | Flexural Crack               | OK     | Spalling       | NOT OK
10  | Flexural Crack       | Flexural Crack               | OK     | Spalling       | NOT OK
It can be seen from Table 4.3 that both CNNs perform poorly in identifying
spalling of concrete. Otherwise, both CNNs have shown reliability for use in
real-world scenarios.
$$\text{Accuracy for CNN\_1} = \frac{TP + TN}{\text{Total number of test images}} = \frac{8}{10} = 0.80 = 80\%$$
So, the accuracy of CNN_1 is 80%, which is far below its validation accuracy of
99.7%.
$$\text{Accuracy for CNN\_2} = \frac{TP + TN}{\text{Total number of test images}} = \frac{6}{10} = 0.60 = 60\%$$
The accuracy of CNN_2 is 60%, which is also far below its validation accuracy of
75%.
So, for real-world scenarios CNN_1 is more suitable and reliable than CNN_2.
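The two test accuracies can be reproduced directly from the OK / NOT OK remarks in Table 4.3. The short sketch below transcribes those remarks for the ten test images; the list and function names are illustrative.

```python
# OK / NOT OK remarks for test images 1-10, transcribed from Table 4.3.
cnn1_ok = [True, True, True, True, False, False, True, True, True, True]
cnn2_ok = [True, True, False, True, True, False, True, True, False, False]


def share_correct(remarks):
    """Share of test images whose prediction was marked OK."""
    return sum(remarks) / len(remarks)


print(f"CNN_1: {share_correct(cnn1_ok):.0%}")
print(f"CNN_2: {share_correct(cnn2_ok):.0%}")
```

With only ten test images, each misclassified image changes the figure by ten percentage points, so these test accuracies are coarse estimates.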
Chapter 5
CONCLUSIONS AND SUGGESTIONS
5.1 Conclusions
The two CNN models developed for this study are able to detect and classify
i. Dataset_1 performs better than dataset_2 in real-world scenarios, with much
better accuracy and reliability.
ii. Increasing or decreasing the number of convolution and pooling layers has
little effect on validation accuracy.
iii. Increasing or decreasing the batch size has little effect on validation
accuracy.
v. For classifying cracks, one has to carefully observe the crack location and
the depth of the crack, although a slight miscalculation does not affect the
results.
vi. Training on a GPU with a larger amount of RAM would have reduced the
training and validation time, although it would have had no effect on training
and validation accuracy.
viii. After a certain number of epochs, increasing the epoch number does not
increase accuracy.
5.2 Suggestions
i. Studying the relationship between input image size and training time and
accuracy.
ii. Studying the relationship between number of output nodes in hidden layer
and training time and accuracy.
iii. A study to compare test results using different stochastic gradient descent
algorithms.
iv. A study to compare training time and accuracy using a larger number of
image augmentations.
REFERENCES
[5] http://www.concrete.org.uk/fingertips-nuggets.asp?cmd=display&id=186
[Last access on 13 Oct. 2018].
[7] https://www.analyticsindiamag.com/6-types-of-artificial-neural-networks-
currently-being-used-in-todays-technology [Last access on 13 Oct. 2018].
[10] Nair, V. & Hinton, G. E. (2010), Rectified linear units improve restricted
Boltzmann machines, in Proceedings of the 27th International Conference
on Machine Learning (ICML-10), Haifa, Israel, June 21–24, 807–14.
[11] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R.
(2014), Dropout: a simple way to prevent neural networks from overfitting,
Journal of Machine Learning Research, 15(1), 1929–58.
[12] LeCun, Y. A., Bottou, L., Orr, G. B. & Müller, K.-R. (2012),
Efficient backprop, in G. Montavon, G. B. Orr, and K.-R. Müller (eds.), Neural
Networks: Tricks of the Trade, 2nd edn., Springer, Berlin Heidelberg, pp. 9–48.
[13] Ioffe, S. & Szegedy, C. (2015), Batch normalization: accelerating deep network
training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167.
[14] Bengio, Y., Goodfellow, I. J. & Courville, A. (2016), Deep Learning, An MIT
Press book. Online version is available at: http://www.deeplearningbook.org,
accessed July 2016.
[16] Ahmadlou, M. & Adeli, H. (2010), Enhanced probabilistic neural network with
local decision circles: a robust classifier, Integrated Computer-Aided
Engineering, 17(3), 197–210.
[17] Ciresan, D. C., Meier, U., Masci, J., Maria Gambardella, L. & Schmidhuber, J.
(2011), Flexible, high performance convolutional neural networks for image
classification, in Proceedings of International Joint Conference on Artificial
Intelligence, Barcelona, Spain, pp. 1234–42.
[19] Simard, P. Y., Steinkraus, D. & Platt, J. C. (2003), Best practices for
convolutional neural networks applied to visual document analysis, in
Proceedings of the 7th International Conference on Document Analysis and
Recognition, Edinburgh, UK, August 3–6, 958–62.
[20] LeCun, Y., Bengio, Y. & Hinton, G. (2015), Deep learning, Nature, 521(7553),
436–44.
[21] Steinkraus, D., Simard, P. Y. & Buck, I. (2005), Using GPUs for machine
learning algorithms, in Proceedings of 8th International Conference on
Document Analysis and Recognition, Seoul, Korea, August 29–September 1,
1115–19.
[22] Soukup, D. & Huber-Mörk, R. (2014), Convolutional neural networks for steel
surface defect detection from photometric stereo images, in Proceedings of 10th
International Symposium on Visual Computing, Las Vegas, NV, 668–77.
[23] Cha, Young-Jin and Choi, Wooram and Buyukozturk, Oral. (2017), Deep
Learning-Based Crack Damage Detection Using Convolutional Neural
Networks, Computer-Aided Civil and Infrastructure Engineering. 32. 361-378.
10.1111/mice.12263.
[24] Feng, Chen and Liu, Ming-Yu and Kao, Chieh-Chi and Lee, Teng-Yok. (2017),
Deep Active Learning for Civil Infrastructure Defect Detection and
Classification. 298-306. 10.1061/9780784480823.036.
[26] Shivprakash Iyer, Sunil K. Sinha, A robust approach for automatic detection and
segmentation of cracks in underground pipeline images, Image Vis. Comput. 23
(10) (2005) 931–933.
[27] Hoang-Nam Nguyen, Tai-Yan Kam, Pi-Ying Cheng, An automatic approach for
accurate edge detection of concrete crack utilizing 2D geometric features of
crack, J. Signal Process. Syst. 77 (3) (2014) 221–240.
[28] Yuan-Sen Yang, Chung-Ming Yang, Chang-Wei Huang, Thin crack observation
in a reinforced concrete bridge pier test using image processing and analysis,
Adv. Eng. Softw. 83 (2015) 99–108.
[31] Lei Zhang, Fan Yang, Yimin Daniel Zhang, and Ying Julie Zhu, Road crack
detection using deep convolutional neural network, in: Proceedings of 2016
IEEE International Conference on Image Processing (ICIP 2016),
DOI: 10.1109/ICIP.2016.7533052.
[34] Debora B. (Editor) 2016, “Design of Concrete Structures”, Fourteenth Edition, The
McGraw-Hill Companies, Inc.
# Training script for dataset_1 (32x32 input), from Step 4 onward

# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 3, activation = 'softmax'))

# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
...
                                   )

# Read the training images from the class subfolders
training_set = train_datagen.flow_from_directory('/Users/saminyeaser/Desktop/Concrete Crack Images for Classification/training_set',
                                                 target_size = (32, 32),
                                                 batch_size = 32,
                                                 class_mode = 'categorical')

# Read the validation images from the class subfolders
val_set = val_datagen.flow_from_directory('/Users/saminyeaser/Desktop/Concrete Crack Images for Classification/val_set',
                                          target_size = (32, 32),
                                          batch_size = 32,
                                          class_mode = 'categorical')

# Train for 10 epochs and keep the per-epoch metrics
history = classifier.fit_generator(training_set,
                                   steps_per_epoch = 32041,
                                   epochs = 10,
                                   validation_data = val_set,
                                   validation_steps = 8010)

# list all data in history
print(history.history.keys())

# plotting library needed for the curves below
import matplotlib.pyplot as plt

# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# Training script for dataset_2 (64x64 input), from Step 3 onward

# Step 3 - Flattening
classifier.add(Flatten())

# Step 4 - Full connection
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 3, activation = 'softmax'))

# Compiling the CNN
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Part 2 - Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator

# Augmentation for the training images (no rotation)
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True,
                                   vertical_flip = True)

# The same augmentation is applied to the validation images
test_datagen = ImageDataGenerator(rescale = 1./255,
                                  shear_range = 0.2,
                                  zoom_range = 0.2,
                                  horizontal_flip = True,
                                  vertical_flip = True)

training_set = train_datagen.flow_from_directory('/Users/saminyeaser/Desktop/dataset/training_set',
                                                 target_size = (64, 64),
                                                 batch_size = 3,
                                                 class_mode = 'categorical')

val_set = test_datagen.flow_from_directory('/Users/saminyeaser/Desktop/dataset/val_set',
                                           target_size = (64, 64),
                                           batch_size = 3,
                                           class_mode = 'categorical')

# Train for 10 epochs and keep the per-epoch metrics
history = classifier.fit_generator(training_set,
                                   steps_per_epoch = 99,
                                   epochs = 10,
                                   validation_data = val_set,
                                   validation_steps = 24)

# list all data in history
print(history.history.keys())
# Output: ['acc', 'loss', 'val_acc', 'val_loss']

# plotting library needed for the curves below
import matplotlib.pyplot as plt

# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# Save the trained model, then reload it and print its layer summary
classifier.save('dataset_02.h5')

from keras.models import load_model
new_model = load_model('dataset_02.h5')
new_model.summary()
For Dataset_1
For Dataset_2
# Prediction
import numpy as np
from keras.preprocessing import image

# Load one test image and convert it to a batch of size 1
test_image = image.load_img('/Users/saminyeaser/Desktop/dataset/single_prediction/yes.1.jpg', target_size = (32, 32))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis=0)

# Class probabilities for the image, one column per class
result = classifier.predict(test_image)

# Mapping from class name to output column
training_set.class_indices

# Report the first class whose probability reaches 0.4
if result[0][0] >= .4:
    prediction = 'Flexural'
elif result[0][1] >= .4:
    prediction = 'Shear'
elif result[0][2] >= .4:
    prediction = 'Spalling'
CNN_1
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 30, 30, 32)        896
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 15, 15, 32)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 13, 13, 64)        18496
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 6, 6, 64)          0
_________________________________________________________________
flatten_2 (Flatten)          (None, 2304)              0
_________________________________________________________________
dense_3 (Dense)              (None, 128)               295040
_________________________________________________________________
dense_4 (Dense)              (None, 3)                 387
=================================================================
Total params: 314,819
Trainable params: 314,819
Non-trainable params: 0
_________________________________________________________________
CNN_2
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_5 (Conv2D)            (None, 62, 62, 32)        896
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 31, 31, 32)        0
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 29, 29, 64)        18496
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 14, 14, 64)        0
_________________________________________________________________
flatten_3 (Flatten)          (None, 12544)             0
_________________________________________________________________
dense_5 (Dense)              (None, 128)               1605760
_________________________________________________________________
dense_6 (Dense)              (None, 3)                 387
=================================================================
Total params: 1,625,539
Trainable params: 1,625,539
Non-trainable params: 0
_________________________________________________________________
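The parameter counts in the two summaries can be cross-checked by hand. The sketch below uses the usual counting rules (weights plus one bias per filter or unit) and assumes 3x3 kernels and a three-channel (RGB) input, which is consistent with the 896 parameters of the first convolution layer; the function names are illustrative.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return inputs * units + units


def total_params(flattened_size):
    """Total for conv(32) -> conv(64) -> dense(128) -> dense(3) on RGB input."""
    return (conv_params(3, 3, 32) + conv_params(3, 32, 64)
            + dense_params(flattened_size, 128) + dense_params(128, 3))


print(total_params(6 * 6 * 64))    # CNN_1, flattened vector of 2304
print(total_params(14 * 14 * 64))  # CNN_2, flattened vector of 12544
```

Almost all of the parameters sit in the first fully connected layer, which is why doubling the input side from 32 to 64 pixels raises the total roughly fivefold.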
For Dataset 1
For Dataset 2
For Dataset 1
For Dataset 2