
Neural Networks for Beginners

An Easy-to-Use Manual for Understanding Artificial Neural Network Programming

By Bob Story
Table of Contents
INTRODUCTION
WHAT IS A NEURAL NETWORK?
THE BENEFITS
INSIDE THE HUMAN BRAIN
THE DIFFERENCE BETWEEN THE BIOLOGICAL AND ARTIFICIAL
THE APPLICATION OF ANN
LEARNING IN ARTIFICIAL NEURAL NETWORKS
THE PARADIGMS
THE ANN ARCHITECTURE
THE BASIC MATH BEHIND THE ARTIFICIAL NEURON
SIMPLE NETWORKS FOR PATTERN CLASSIFICATION
LINEAR SEPARABILITY
THE HEBB RULE
THE PERCEPTRON
ADALINE
BUILDING A SIMPLE NEURAL NETWORK CODE
BACKPROPAGATION ALGORITHM AND HOW TO PROGRAM IT
CONCLUSION
© Copyright 2017 by Logan Styles - All rights reserved.

The following eBook is reproduced below with the goal of providing
information that is as accurate and reliable as possible. Regardless, purchasing
this eBook can be seen as consent to the fact that both the publisher and the
author of this book are in no way experts on the topics discussed within and
that any recommendations or suggestions that are made herein are for
entertainment purposes only. Professionals should be consulted as needed
prior to undertaking any of the actions endorsed herein.
This declaration is deemed fair and valid by both the American Bar
Association and the Committee of Publishers Association and is legally
binding throughout the United States.

Furthermore, the transmission, duplication or reproduction of any of the
following work including specific information will be considered an illegal
act irrespective of whether it is done electronically or in print. This extends to
creating a secondary or tertiary copy of the work or a recorded copy and is
only allowed with express written consent from the Publisher. All additional
rights reserved.

The information in the following pages is broadly considered to be a truthful
and accurate account of facts, and as such any inattention, use or misuse of the
information in question by the reader will render any resulting actions solely
under their purview. There are no scenarios in which the publisher or the
original author of this work can be in any fashion deemed liable for any
hardship or damages that may befall the reader after undertaking information
described herein.
Additionally, the information in the following pages is intended only for
informational purposes and should thus be thought of as universal. As befitting
its nature, it is presented without assurance regarding its prolonged validity or
interim quality. Trademarks that are mentioned are done without written
consent and can in no way be considered an endorsement from the trademark
holder.
Introduction
What seemed like a lame and unbelievable sci-fi movie a few decades ago is
now a reality. Machines can finally think. Maybe not quite as complex as the
human brain, but more than enough to make everyone’s life a lot easier.

Artificial neural networks, based on the neurons found in the human brain,
give machines a ‘brain’. Patterned just like biological neurons, this software
or hardware is a variety of deep learning technology. With the help of
artificial neural networks, you can make your computer learn by feeding it
data, which will then be transformed into the output you desire. You can thank
artificial neural networks for the nanoseconds in which computers operate.

It may be science, but it is not rocket science. Everyone can learn how to take
advantage of the progressive technology of today, get inside the ‘brain’ of
computers, and train them to perform the desired operations. They have been
used in many different industries, and you can rest assured that you will find
the perfect purpose for your own neural network.

The best part about this book is that it does not require a college degree. Your
high school math skills are quite enough for you to get a good grasp of the
basics and learn how to build an artificial neural network. From non-
mathematical explanations to teaching you the basic math behind the ANNs and
training you how to actually program one, this book is the most helpful guide
you will ever find. Carefully designed for you, the beginner, this guide will
help you become a proud owner of a neural network in no time.

Does this sound like a dream come true? Join me on this artificial ride and
learn the basics of ANNs.
What is a Neural Network?
By definition, a neural network is a computing system made of
interconnected processing elements, such as nodes or units, that process
external inputs into outputs, and whose functionality is based on that of the
human neuron. So basically, the simplest definition would be that it is
a computer system that is made to function like the human brain. Moreover,
while it all may sound Frankenstein-like, it is indeed possible for a machine to
function similarly. Will it ever surpass the brain and become smarter? We will
leave that for the next generations to witness.

Neural networks resemble the human brain in two ways:

1. The network also gains knowledge through its environment by a learning
process.
2. That knowledge is also stored by using connection strengths called
synaptic weights.

The idea of creating an artificial neural network (ANN) was mostly inspired
by the way in which the human brain used to (and still does!) differ from
traditional machines. By using wires and silicon, neural networks can imitate
the neurons and dendrites of the human brain. This makes machines more
intelligent.

A conventional computer system is designed upon a binary system. A binary
system makes the computer work in lines of numbers in order to perform a
single operation. A neural network also works in lines, but the difference is
that it could also change the course of that operation. If you set a car to drive
straight by installing a system of codes and numbers, it would. However, that
binary system will not help the car realize that there is an obstacle; for
example, a giant rock lying in the middle of the road. With the help of a neural
network, the car gains the intelligence to detect the problems that may arise
during the operation, and in this case, swerve around the rock to avoid a crash.
Neural networks have the ability to see when there is an issue with a certain
code so they can either repair the problem on their own or move to a different
line without interrupting the entire operation.

Loosely modeled on the structure of the human brain, artificial neural networks
take information from their environment, process it, and then finally respond to
it. The basic concept behind the whole artificial intelligence idea was that if
you feed the machine as much data as possible, it would become able to think.
Of course, machines are not exactly able to think, just like Frankenstein was
not exactly a human. However, the point is that all of the information that you
exhaustively store into a computer makes it more powerful and increases its
functionality. For example, IBM programmers programmed every single
chess move and strategy into a computer. They created a machine that could
calculate every possible outcome and even predict the opponent’s move in
order to outplay them. This is what is called a memory bank.

However, a memory bank is very different than what neural networks do. You
see, the previous example with the IBM programmers is a classic
encyclopedic storage, and it is in fact what conventional computer systems
could do. They could ‘think’ as some like to call it, but they were unable to do
the one thing that the human brain could – learn.

The technology of analyzing the input information based on the human brain
may seem new, but it has been around since the 1940s. However, despite their
best efforts, every success that scientists had made at that time was quickly
overshadowed by the progress in technology. Also, neural networks were, in a
way, exaggerated at that time, which cast disbelief over the entire field.

Over the past decade or so, scientists renewed the old concept thanks to
today’s super powerful processors, as well as the huge increase of data, such
as voice searches, images, videos, and so much more. Today, neural networks
can finally live up to their potential.

The ANNs contain thousands of artificial neurons that are stacked in rows and
layered up to form millions of different connections. This highly functional
structure allows neural networks to learn and generalize, which means that they
can solve some very complex problems that seem intractable. However, an
individual neural network often cannot offer a solution on its own, but
networks can work in groups. They work by first decomposing a complex
problem, and then each network takes on one of the simpler subtasks
in order to solve the problem together. However, it is important to mention
that scientists have yet a long way to go before they can upgrade these
networks to mimic the human brain completely.
The Benefits
Why and when are neural networks so important? In the world we live today,
I’d say neural networks are important all of the time and pretty much
everywhere. Neural networks offer many different properties and capabilities
that are highly useful.

Non-Linearity. Artificial neurons can be either linear or nonlinear. The
nonlinearity is carried throughout the network, and it is a very special property.
This property is especially important in those cases where the physical
mechanism for generating the output is also nonlinear.

Input-Output Mapping. Neural networks also learn through input-output
mapping, meaning that they are fed data that they later respond to. Unlike most
statistical methods, this makes neural networks non-parametric models that
require no prior assumptions or deep statistical background.

Adaptivity. Neural networks are highly adaptive. Adaptivity is one of their
greatest built-in properties, allowing them to adapt their synaptic weights as
their environment changes. These networks are trained to operate under
certain conditions, and they can easily be retrained to generate the desired
outcome under new conditions.

Evidential Response. Another great advantage of neural networks is that
during pattern classification they will not only indicate which pattern is best
to be selected, but they can also provide information about the confidence of
the decision. This information is vital for improving the classification
performance of the network because it can automatically reject any vague
patterns, should the need for that arise.
Fault Tolerance. When implemented in a hardware form, neural networks
have the potential to be fault tolerant, in the sense that when the operating
conditions become adverse, their performance degrades gradually. This is
important because that way the damage is significantly reduced. Instead of
experiencing a catastrophic failure when neurons or information get damaged,
the network starts gracefully degrading its performance.

VLSI Implementability. The parallel nature of neural networks makes
particular computation tasks very fast. That is what makes these networks
suitable for implementation with very-large-scale integration (VLSI)
technology.
Inside the Human Brain
The human brain is a complex machine. In fact, it is so complex that people
still haven’t cracked the puzzle of completely figuring it out. It is still unknown
how the human brain manages to train itself to learn and process information;
what we do know, however, is that it uses neurons to do it.

There are over 100 billion nerve cells called neurons found in the human
brain. The brain uses its interconnected neuron network to process information
and model the outer world. Each neuron sums and fires signals from and to
other neurons. However, there are two other main factors besides the neurons
that are in charge of information processing. The three main pieces of the brain
puzzle are receptors, neurons, and effectors.

For you to gain a good understanding of how artificial neural networks work,
you must first understand how the biology works, and what is going on inside
the human brain.

Receptors. Receptors are the nerves found all over your skin. Receptors can
be found at the end of your toes, your fingertips, etc. These nerves are the ones
that accept information. For instance, when you touch something with your
fingers, the receptors can ‘feel’ it and start their communication with the
neurons by transferring the received information to the neural network.

Neurons. This is the part that interests us the most. The neurons are the center
and the main part of the information processing task because they are in charge
of the responses of the body. There are three components that are particularly
significant to understand the artificial neurons. These components are
dendrites, soma, and axons.
- Dendrites are the receiving ends for the signals. A neuron uses its
dendrites to collect the signals that other neurons send. The signals that
neurons exchange are electric impulses that chemical processes transmit
across a synaptic gap.
- The axon is a long and thin strand that carries the neuron's outgoing
signal and splits into many (thousands of) branches. At the end of each
branch, there is a synapse, a structure in charge of converting the activity
from the axon into effects that excite or inhibit the activity of the
connected neurons.
- The soma is the body of the neuron, and it is what sums the incoming
signals. The input or received information must be greater than a threshold
in order for the soma to fire. If the summed information is sufficient, the
soma transmits the signal through the axon, to the dendrites of another,
connected neuron.
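The soma's sum-and-threshold behavior is the direct template for the artificial neuron discussed later in this book. A minimal Python sketch of the idea (the input signals, weights, and threshold value below are arbitrary, chosen only for illustration):

```python
def fires(inputs, weights, threshold):
    """Sum the weighted incoming signals, as the soma does, and fire
    (output 1) only if the sum exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# One active input alone is too weak; two together cross the threshold.
print(fires([1, 0], [0.4, 0.9], 0.5))  # 0 -> sum 0.4, the neuron stays quiet
print(fires([1, 1], [0.4, 0.9], 0.5))  # 1 -> sum 1.3, the neuron fires
```

Everything an artificial neuron does is a variation on this sum-and-fire step.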

Effectors. The effectors are much simpler. Their main job is to act out
whatever decision the neurons decide to make. If a neuron detects an issue
with stimuli, it may ask the effectors to fire. Their only job is to perform the
action and complete the required task, whether the neurons have asked them to lift a
leg or scratch the hand.

Although it is still hard to fully understand the human brain, it has been a lot
easier since 1911, when Ramon y Cajal, a Spanish pathologist and
neuroscientist, introduced the idea of brain structure and neurons.
The Difference Between Biological and Artificial
As you already know, artificial neural networks are inspired and based on the
complex anatomy of the human brain, or its neurons, to be precise. That means
that the artificial and biological neurons are similar, right? Well, that depends
on what you mean by similar. I recently read a very interesting comparison.
Comparing an artificial neural network with the neural network found in the
human brain is almost like comparing a car and a horse. Both can get you from
point A to point B and they both need fuel to function properly, so they are
similar, right? Well, not quite. The most important difference is obviously in
their speed and efficiency. A car can transport you from one place to another in
a very timely manner while traveling on a horse can take hours, if not days. A
horse will also get tired faster, which means that it is not as efficient as a car.
Just like a car and a horse, artificial and human neurons can also be similar,
but it depends on the actual parts.

- The processing elements in both artificial and biological neurons
receive signals.
- The signals are modified by a synaptic weight, both in ANNs and in
biological neural networks.
- When there is sufficient input, both neurons can transmit a single
output.
- The output from one neuron travels to another.
- The strength of the synapses can be modified with experience.
No matter how many similarities these two may have, we cannot deny the fact
that their dissimilarities surely outweigh the resemblance. I will finish the
thought with a single example – we still haven’t figured out how the brain fully
functions which means that we do not know how far behind we actually are.
Just like cars and horses, the human brain and artificial networks may have
similar functions, but the dissimilarities between them are so significant that
we should question if we should be making these comparisons to begin with.

Did you know that neurons are 5-6 orders of magnitude slower than silicon
logic gates? That means that the events inside a silicon chip happen in
nanoseconds, while the events in neurons happen in milliseconds. Yet, despite
this slow operation rate, the human brain can still outperform computers on
many tasks. That is because its roughly 100 billion neurons make over 60
trillion connections. This results in an enormous efficiency that technology is
still unable to recreate with artificial
intelligence. However, even though the artificial neurons we make are
primitive compared to those found in the human brain, they have been a huge
step forward for the world of technology, and hopefully, one day we will be
able to create something much more similar to the human brain. Who knows?
We have already made significant progress in the last two decades alone.
The Application of ANN
ANNs, or artificial neural networks, are broad when it comes to their
applicability. Today, in the technology era we live in, neural networks are
used across a spectrum of industries.

Pattern Recognition
Pattern recognition is probably the category where most of the interesting
issues fall. Neural networks have proven to be successful in cracking pattern
recognition problems. One of the most popular issues is the automatic
handwriting recognition. It is pretty hard to recognize someone’s writing using
a traditional technique because there are a lot of different sizes, styles, and
positions. The backpropagation algorithm (which we will talk about later in
this book) has been known to be of great use for pattern recognition. The best
part is that even if an application is based on one training algorithm, its
architecture can easily be changed to boost performance.

Control
Imagine a new driver backing up a trailer truck. If you have ever tried and
failed, or even tried and succeeded in this maneuver, then you already know
that it is quite tricky and difficult. On the other hand, an experienced driver
who has gone through this process one too many times can now do it with ease.

Now, as an example of the control application of ANNs, let's train a neural
network to provide steering directions to a trailer truck so that it can back up
to a loading dock. We have the information that describes the position of the truck’s cab,
the position of the loading docks, the rear’s position, as well as the angles that
both the trailer and truck make with the loading dock. Now, this neural network
is able to learn the best way in which the truck can be steered in order to reach
the dock. The neural network can solve the problem in two ways:
1. The first module, the emulator, learns how to compute a new position for
the truck by using its current position and steering angle. At each step,
the truck moves a particular distance. With this method, the module can
‘feel’ how the truck reacts to different steering signals. This emulator
contains hidden layers, and it is trained with the help of the
backpropagation algorithm.
2. The second module or the controller begins its work only after the
emulator is trained. The controller is used to give the right steering
signals to get the trailer truck to the dock with its back parallel to it.
After each steering signal the controller gives, the emulator determines a
new position. This is done for as long as it takes for the truck to get to
the dock. Then, errors can be determined and the weights can be easily
adjusted.

Speech Production
Neural networks are also used for producing spoken text. And while it may
look like an easy task, learning to speak English aloud is not so simple. There
are quite a few phonetic pronunciation rules that need to be followed, and
given how each letter mainly depends on the context in which it appears, it can
get tricky.

Teaching a machine to speak is not that different from teaching a child. At first,
all you can hear are a few vowels and consonants and a rather funny, babbling
sound. All a baby can say is dada, mama, and some other simple syllables. As
the teaching process continues, the baby develops a richer vocabulary and can
say up to 50 words, and so on, until the baby becomes ready to participate in a
real conversation.
The same process is used with machines. Let’s take NETtalk for example.
NETtalk is a neural network created in 1986 that pronounces written English
by being shown words as input. The only other requirement that NETtalk
needed in order to talk was knowing the correct phonetic transcriptions to use
for comparison. NETtalk was trained on the 1,000 most common English
words, and it could read new words with few errors. However, NETtalk also
learned in stages. First, it had to learn to distinguish consonants and vowels,
then it had to learn to recognize the boundary between the words and so on,
just like a child.

Speech Recognition

Speech recognition is a part of our daily lives. Think Apple’s Siri or Amazon’s
Alexa. All you have to do is press a button, and voila, you can send a text
message without actually typing the text into your phone. However, have you
ever given a thought to how this is possible?

In order for the machine to recognize speech, obviously, it has to be trained to
do so, which means that it has to have models to use for comparison. The
easiest way to do so is to feed the machines audio speech recordings, but
unfortunately, we are not quite there yet (hopefully this will change in the
near future). The point is that speech varies in speed. What does that mean? It
means that if you say ‘sugar’ and someone else says ‘suuuuuugaaaaaaar’, you
will produce different sound files. The second file requires more data,
meaning that automatically aligning audio recordings of different lengths is
hard.
In order for the machine to be able to recognize the speech, it goes through
many training processes. It must be fed sound waves in the form of pictures and
numbers; it must go through sampling, etc. Neural networks have been a real
lifesaver for the progress of speech recognition, and today, this is one of their
most popular applications.

Business
Neural networks are also applied in a number of different business settings, for
instance, mortgage assessment. Moreover, while you may think that mortgage
assessment is pretty straightforward and simple, the truth is, it is kind of hard
to completely specify the process that experts use to make marginal case
related decisions. The whole idea of using a neural network for a mortgage
risk assessment is so it can provide much more reliable and consistent
evaluation by using past examples.

Trained by professional mortgage evaluators, neural networks can easily
screen applications and determine which applicants should be given a loan.
The input here is information about the applicant’s employment, dependents,
monthly income, etc., and the outcome is a simple ‘reject’ or ‘accept’ response.
You can say that this is a simple threshold-type of decision. If the applicants
meet the requirements, then they will get a positive response. If not, they will
be rejected. It is as simple as that.

Medicine
This application of neural networks is extremely helpful and important. The
best example of this application was created in the mid-1980s. It was called
“Instant Physician,” and the idea behind it
was to be able to train a network to store medical records so that it can offer
the right treatment. By being taught different conditions, symptoms, and
diagnosis, the network was able to recognize diseases and give a diagnosis.

We know that this opportunity is nothing unfamiliar to us now, but it is thanks
to the effect of neural networks that we can simplify our lives in so many
different ways. This simplification is something that people in the early 1980s
could only dream of.
Learning in Artificial Neural Networks
The most important part, and the part that makes machines intelligent, is the
ability of neural networks to learn. By being fed information from their
environment, artificial neural networks can learn from it in order to improve
their performance. Through the iteration of the learning process, the networks
learn more about their environment, and the adjustments applied to their
synaptic weights and bias levels are what make these networks more
knowledgeable.

The learning process implies these events:

1. The ANN is stimulated by the environment.
2. This stimulation results in changes in the network’s free parameters.
3. Because of the change, the ANN becomes able to respond to the
environment in a new way.

There are many different ways in which a machine can learn. Below you will
find each of them explained and simplified for you to understand.

Error-Correction Learning. This learning technique, as the name itself
suggests, uses errors to direct the process of training. The output of the
system is compared to the desired output, and if errors occur, they are used to
learn. This technique is similar to a ‘learn from your mistakes’ type of
learning. By using some algorithm (usually the backpropagation algorithm),
the weights are adjusted directly by the error values.

This learn-from-errors process gradually reduces the errors made in each
successive training iteration.
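To make the idea concrete, here is a minimal sketch of one error-correction update for a single linear unit (often called the delta rule). This is only an illustration, not the backpropagation algorithm covered later; the inputs, target, and learning rate are arbitrary values chosen for the example:

```python
def error_correction_step(weights, inputs, target, rate=0.1):
    """One training iteration: compare the output to the desired output
    and adjust each weight in proportion to the error."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output                     # learn from the mistake
    return [w + rate * error * x for w, x in zip(weights, inputs)]

weights = [0.0, 0.0]
for _ in range(50):                             # repeated training iterations
    weights = error_correction_step(weights, [1.0, 1.0], 1.0)
print([round(w, 2) for w in weights])           # [0.5, 0.5] -> error is now tiny
```

Notice that the weights are adjusted directly by the error value at every step, which is exactly the 'learn from your mistakes' behavior described above.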
Memory-Based Learning. Memory-based learning is a process of storing and
retrieving information. All or most of the past experiences are stored in a large
memory that consists of classified input-output examples.
The memory-based learning uses algorithms that involve:
A) A criterion that is used to define the local neighborhood of the test
vector
B) A learning rule that is applied to the local neighborhood’s training
examples.
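One common memory-based algorithm that fits this description is the nearest-neighbor rule. A small sketch with made-up example data, where (A) the neighborhood criterion is squared Euclidean distance and (B) the rule applied to the neighborhood is a majority vote:

```python
def memory_based_classify(memory, test_vector, k=1):
    """Classify a test vector by retrieving the k nearest stored
    input-output examples from memory."""
    def dist(a, b):
        # (A) criterion defining the local neighborhood: squared distance
        return sum((x - y) ** 2 for x, y in zip(a, b))
    neighbors = sorted(memory, key=lambda pair: dist(pair[0], test_vector))[:k]
    # (B) learning rule applied to the neighborhood: majority vote
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

memory = [([0, 0], "off"), ([0, 1], "off"), ([5, 5], "on"), ([5, 6], "on")]
print(memory_based_classify(memory, [4, 5]))  # 'on' -> nearest example is (5, 5)
```

All of the 'learning' lives in the stored examples; classification is pure retrieval.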

Hebbian Learning. Hebbian learning, or Hebb’s rule, is the oldest and
most popular learning rule. You can learn more about the Hebb rule in
Chapter 7.

Competitive Learning. In this learning process, the output neurons of the
network are in constant competition over which one will end up being fired.
While the Hebb rule allows several neurons to fire at the same time, here
only one neuron can be active at a given time. This learning rule has three
basic elements:

1. A group of neurons that are all the same except for some randomly
distributed weights, and which therefore respond differently to a given
set of input patterns.
2. A limit that is imposed on the strength or weight of each neuron.
3. A competing mechanism that allows the neuron to compete for the right
to be fired and respond to the input.

The simplest form of this learning process is a network with a single layer of
output neurons that are connected to the input nodes. The network often
includes feedback connections that perform lateral inhibition, meaning that
each neuron tends to inhibit the neurons it is connected to.
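A minimal sketch of one winner-take-all step, assuming the common formulation in which the neuron whose weight vector lies closest to the input wins and moves its weights toward that input (the weights, pattern, and learning rate are illustrative values):

```python
def competitive_step(weights, pattern, rate=0.5):
    """Winner-take-all: the output neuron closest to the input pattern
    fires, and only that neuron's weights are updated."""
    def dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, pattern))
    winner = min(range(len(weights)), key=lambda i: dist(weights[i]))
    # Only the winner moves its weight vector toward the input pattern.
    weights[winner] = [wi + rate * (xi - wi)
                       for wi, xi in zip(weights[winner], pattern)]
    return winner

weights = [[0.2, 0.8], [0.9, 0.1]]        # two competing output neurons
winner = competitive_step(weights, [1.0, 0.0])
print(winner)  # 1 -> neuron 1 is closest to the pattern, so it wins and learns
```

Repeating this step over many input patterns makes each neuron specialize in one cluster of inputs.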

Boltzmann Learning
This learning rule, named after Ludwig Boltzmann, is a stochastic algorithm
that has its roots in statistical mechanics. Those networks that are based upon
this learning rule are called Boltzmann machines.

The neurons in a Boltzmann machine form a recurrent structure and operate in
a binary manner: each neuron is either in an ‘on’ state, denoted by +1, or an
‘off’ state, denoted by -1.
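The stochastic on/off behavior of a single Boltzmann unit can be sketched as follows. This assumes the standard Boltzmann acceptance probability with a temperature parameter; the energy-gap and temperature values are arbitrary illustration choices, and this is a single-unit sketch rather than a full Boltzmann machine:

```python
import math
import random

def boltzmann_state(energy_gap, temperature):
    """Stochastically set a unit 'on' (+1) or 'off' (-1). The probability
    of turning on follows the Boltzmann distribution: high temperature
    means more randomness, low temperature is near-deterministic."""
    p_on = 1.0 / (1.0 + math.exp(-energy_gap / temperature))
    return 1 if random.random() < p_on else -1

random.seed(0)
states = [boltzmann_state(2.0, 0.5) for _ in range(1000)]
print(states.count(1))  # the vast majority are +1: this gap strongly favors 'on'
```

The randomness is the point: it is what gives the algorithm its roots in statistical mechanics.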
The Paradigms
Although there are many different learning rules, there are only three major
learning paradigms.

Supervised Learning

Supervised learning is learning with a teacher. In conceptual terms, you may
think of the teacher as the one that has knowledge of the environment, while
the environment is unknown to the network. Through the process of learning,
the teacher may use different input methods to teach the ANN the desired
output, and to eventually receive it. It is just like teaching a child. You use
your built-in knowledge, and through different learning methods, you expect
that the child will respond with the desired output and show you that they
have actually learned something.

The error-correction learning process is a great example of supervised
learning, or learning with a teacher. Through many input-output examples, the
teacher is able to calculate the error, make adjustments, and even change the
teaching process in order to get the desired outcome.

Unsupervised Learning

Contrary to supervised learning, unsupervised learning means learning on
your own, or without a teacher. Here, the network is on its own to find the
desired output. There is no one to oversee the learning process, meaning that
there are only inputs, and it is the network’s job to find the pattern within
these inputs and generate the right outcome.
This type of learning is used in data mining and by recommendation
algorithms, which can foresee a user's preferences based on the preferences
of similar users that have been grouped together.

A great way to perform unsupervised learning is by using the competitive
learning rule. In fact, the best way to do it is to use an ANN that has two
layers. One of the layers is the input layer, in charge of receiving the
available data, while the other layer is made of neurons that compete for the
chance to ‘fire’ first and respond to the features of the input data.

Reinforcement Learning. Reinforcement learning, although it is mostly
considered an unsupervised type of learning, is actually somewhere in
between learning with and without a teacher. Here, some feedback is given,
but the desired output is still not provided. Reinforcement learning means
learning with a reward: based on how well the system responded, a reward is
given. The main goal is to increase the reward through a process of
trial and error.
Reinforcement learning is a great way of learning because it is how nature
works. Why do you think they give puppies a treat every time they find the
right spot to eliminate waste or respond to a ‘sit’ command? The reason for
this is because it is much easier to remember those actions for which you were
rewarded.
The ANN Architecture
Usually, artificial neural networks are visualized in layers, mostly because it is
much more convenient to analyze how they work that way. So, for the sake of
convenience, let’s imagine that neural networks are arranged in layers. The
neurons that are found in the same layer also behave in the same manner. The
behavior of the neurons is mostly determined through their activation function,
as well as the weighted connections through which neurons receive and send
signals.

In fact, the neurons within a single layer are either fully interconnected or not
connected at all. Also, if one neuron in a layer is connected to a neuron in
another layer, then every neuron in the first layer is connected to every neuron
in the second; for example, every hidden unit is connected to every output
neuron.

This arrangement in layers and the patterns of connections between the layers
is called the network architecture.

Single-Layer Net

Single-layer nets have only one layer of connection weights. Their units can
be characterized as input units, which receive signals from the environment,
and output units, which respond to the environment by generating output. In
the typical single-layer net, the input units are fully connected to the output
units, but the input units are not connected to one another, and neither are the
output units connected to other output units. That means that single-layer
networks are simple feedforward types.
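A single-layer net can be sketched in a few lines: one weight matrix maps the input units straight to the output units. The weight values and the step activation below are arbitrary illustration choices:

```python
def single_layer_net(inputs, weights, f=lambda s: 1 if s > 0 else 0):
    """One layer of connection weights: each output unit sums its
    weighted inputs and applies an activation function f."""
    return [f(sum(w * x for w, x in zip(row, inputs))) for row in weights]

weights = [[0.5, 0.5],    # weights feeding output unit 1
           [-1.0, 1.0]]   # weights feeding output unit 2
print(single_layer_net([1, 0], weights))  # [1, 0]
print(single_layer_net([0, 1], weights))  # [1, 1]
```

The signal flows strictly forward, from input units to output units, which is why these are called feedforward nets.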

Multiple-Layer Net
Unlike single-layer nets, which are basic and simple, multiple-layer nets are a
more complicated type of network constructed not of one, but of multiple
layers. While single-layer nets have only input and output units, multiple-layer
networks have a layer or two in between: between the input units and the
output units there are also hidden units. Moreover, while this may sound too
scientific, there actually isn’t as much to the hidden units as you may think.
They are just that: hidden units that are neither input nor output units.
Obviously, these multi-layer networks are used to solve more complicated
problems than those that can be resolved by single-layer nets. However, that
also makes multi-layer nets harder to train. Just keep in mind that, despite this,
training a multilayer network is sometimes the only option, because it can
solve problems that single-layer nets cannot even be trained to perform the
right way.

Recurrent Networks
A recurrent network is easily distinguished from the feedforward nets because
it always has at least one feedback loop. For instance, a recurrent network may
consist of a single layer of neurons in which each neuron feeds its output
signal back to the inputs of the other neurons. Note that in this example the
recurrent network has no hidden units.
The Basic Math Behind Artificial Neurons
Artificial neural networks are made of interconnected units that serve as
model neurons; a neuron is, in essence, an information-processing unit. The
neurons found in the human brain (the ones I explained earlier) serve as the
models upon which artificial neurons are created. In an attempt to make
machines intelligent and give them a ‘brain,' scientists program computers to
simulate the neurons’ functions.

There are three basic elements of the model of a neuron:

1. A set of connecting links or synapses – Each of them is characterized by a
strength or weight. For instance, a signal xi at the input of synapse i
connected to neuron k is multiplied by the synaptic weight wki.
Make sure to remember the manner in which the synaptic weight is
written: the first subscript k refers to the neuron and the second
subscript i refers to the synapse’s input end. Also, note that
unlike the human synapse, the weight of an artificial neuron may take
both positive and negative values.

2. An adder – This is for summing the signals that are weighted by the
neural synapses. This operation constitutes a linear combiner.

3. An activation function – This is for limiting the amplitude of the
neuron’s output.

Mathematically, a neuron k can be described with the following pair of
equations:

uk = wk1·x1 + wk2·x2 + … + wkm·xm

and

yk = φ(uk + bk)

Here, x1, x2, …, xm are the input signals; wk1, wk2, …, wkm are the synaptic
weights of neuron k; uk is the linear combiner output; bk is the bias, which
has the effect of either lowering or increasing the activation function’s
input; φ is the activation function; and yk is the neuron’s output signal.
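These two equations can be sketched directly in Python. This is a minimal illustration rather than code from the book: the logistic sigmoid is assumed for the activation function φ, and the input, weight, and bias values are made up for the example.

```python
from math import exp

# Neuron k: linear combiner u_k followed by the activation of v_k = u_k + b_k.
# The logistic sigmoid is assumed here for the activation function phi.
def neuron_output(x, w, b):
    u = sum(wi * xi for wi, xi in zip(w, x))  # u_k = sum of w_ki * x_i
    v = u + b                                 # induced local field v_k = u_k + b_k
    return 1.0 / (1.0 + exp(-v))              # y_k = phi(v_k)

print(round(neuron_output([0.5, 0.7], [0.8, 0.3], -0.5), 3))  # 0.527
```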

The bias bk applies an affine transformation to the linear combiner output uk:

vk = uk + bk

Depending on whether the bias bk is positive, zero, or negative, the
relationship between the induced local field (also called the activation
potential) vk and the linear combiner output uk is shifted accordingly:
plotted against uk, the line vk = uk + bk moves up for bk > 0, passes through
the origin for bk = 0, and moves down for bk < 0.

The Activation Function


The activation function, denoted by φ(v), defines the neuron’s output in terms
of v – the induced local field. All of the inputs are weighted individually,
summed together, and finally passed through this function.
We have three types of activation functions.

The Threshold Function. The threshold function is used to check whether the
signal is greater than a certain threshold. If the net input meets the
requirement, the output is 1. If the threshold is not met, the output is 0.
In the standard form, where the bias is already folded into the induced local
field v, the function is:

φ(v) = 1 when v ≥ 0

φ(v) = 0 when v < 0

For instance, let’s imagine that the threshold is 1. Let’s say that the first
input X1 is 0.5 and the second one X2 is 0.7. Let’s also assume that the first
weight W1 is 0.8 and the second weight W2 is 0.3. Let’s calculate and see
whether the threshold is met and the neuron can fire the signal:

X1W1 + X2W2 = (0.5 x 0.8) + (0.7 x 0.3) = 0.4 + 0.21 = 0.61

Since 0.61 < 1, the threshold is not met and the neuron will not fire the
signal.
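The same check can be written in a few lines of Python. This is a sketch using the example’s numbers; the function name is an invention for illustration:

```python
# Threshold unit: fire (output 1) only when the weighted sum reaches the threshold.
def threshold_neuron(inputs, weights, threshold):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# The values from the example above: X1 = 0.5, X2 = 0.7, W1 = 0.8, W2 = 0.3
print(threshold_neuron([0.5, 0.7], [0.8, 0.3], 1.0))  # weighted sum below 1, prints 0
```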
Piecewise-Linear Function. Here, the amplification factor inside the linear
region of operation is assumed to be unity. This activation function can be
viewed as an approximation to a nonlinear amplifier. Two special forms of this
function are worth noting:
- If the linear region of operation is maintained without running into
saturation, the function reduces to a linear combiner.
- If the amplification factor is made infinitely large, the piecewise-linear
function reduces to a threshold function.

φ(v) = 1 if v ≥ +1/2

φ(v) = v if +1/2 > v > -1/2
φ(v) = 0 if v ≤ -1/2

Sigmoid Function. This s-shaped function is without a doubt the most commonly
used type of function in the creation of artificial neural networks. It is
considered a great balance between nonlinear and linear behavior. The best
example of this function is the logistic function, which is defined by this
equation:

φ(v) = 1 / (1 + e^(-av))

Here, a is the slope parameter of the function, and by varying it, we can
obtain sigmoids of different slopes. As a approaches infinity, the sigmoid
function becomes a threshold function. However, it is important to know that
the threshold function assumes only the values 0 and 1, meaning that it sees
both 0.01 and 0.99 as the same – neither meets the threshold. The sigmoid
function, on the other hand, assumes a continuous range of values from 0 to 1.
Also, this type of function is differentiable, while the threshold function is
not.
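The three functions can be sketched in a few lines of Python. The slope value and the test input below are illustrative choices, and the piecewise form uses breakpoints at ±1/2:

```python
from math import exp

# Sketches of the three activation functions, each taking an induced local field v.
def threshold(v):
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):
    if v >= 0.5:       # saturates at 1 above +1/2
        return 1.0
    if v <= -0.5:      # saturates at 0 below -1/2
        return 0.0
    return v           # linear region with unity amplification

def sigmoid(v, a=1.0): # a is the slope parameter
    return 1.0 / (1.0 + exp(-a * v))

print(threshold(0.2), piecewise_linear(0.2), round(sigmoid(0.2), 3))  # 1.0 0.2 0.55
# A very large slope makes the sigmoid behave like the threshold function:
print(round(sigmoid(0.2, a=1000), 3))  # 1.0
```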
Simple Networks for Pattern Classification
Pattern classification is probably the simplest way to use a neural network.
In these kinds of problems, the input vector (the pattern) either belongs or
does not belong to a certain category or class. Pattern classification
problems arise in many different areas. For instance, Donald Specht, a student
of Widrow, used neural networks in 1963 to detect heart abnormalities, with
EKG data serving as the inputs. Using 46 different input measurements, the
network reported whether a record was ‘normal’ or ‘abnormal.'

Before you can start training and using multi-layer neural networks, you must
first understand how to train much simpler, single-layer networks. This
chapter will help you gain that knowledge.
Linear Separability
It is our intention to train each of the networks in this chapter so they can
respond with the right classification when being presented with the input
pattern. Before we delve deeper into explaining these particular ways of
training the single-layer networks, it is important for us to discuss some issues
that are common for all of these simple neural networks.

For a given output unit, if the pattern belongs to its class, the desired
response is ‘yes,' and if it does not, the response is ‘no.' Since we rely on
only these two responses, with nothing in between, the natural activation
function here is the threshold function.

Imagine that a set of patterns, each described by two inputs X1 and X2, has to
be classified into two classes. Plotted in the (X1, X2) plane, each point,
symbolized as either x or 0, defines a pattern with a certain set of values,
and each pattern belongs to one of the two classes. If a single straight
line L can separate the two groups of points, they are called linearly
separable patterns. Linear separability means that classes of patterns with an
n-dimensional input vector x = (X1, X2, …, Xn) can be separated by a single
decision surface. A unit of a single-layer network can categorize a set of
patterns into 2 classes, since it is the threshold function that defines the
decision boundary. That means that in order for the network to function
properly, the two classes must be linearly separable.
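To make the idea concrete, here is a small sketch (the weights and bias are chosen by hand for illustration) showing that the logical AND of two inputs is linearly separable – a single line splits its two classes – whereas no single line can do the same for XOR:

```python
# A single threshold unit draws one decision line: w1*x1 + w2*x2 + b = 0.
def classify(x1, x2, w1, w2, b):
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

# AND is linearly separable: the line x1 + x2 - 1.5 = 0 puts (1, 1) on one
# side and the other three points on the other side.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, '->', classify(x1, x2, 1, 1, -1.5))

# No single (w1, w2, b) can do this for XOR: the points (0, 1) and (1, 0)
# would have to fall on the opposite side of the line from (0, 0) and (1, 1).
```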
The Hebb Rule
The earliest learning rule is the Hebbian or Hebb rule. Hebb was the one who
first suggested that learning happens as a result of the modification of
synapse weights: when two interconnected neurons are ‘on’ at the same time,
the weight of the synapse between them is increased. Although Hebb’s original
statement covers only the case of two connected neurons ‘firing’ signals at
the same time, we now know that an even stronger learning process can occur if
the weights are also increased when both interconnected neurons are ‘off’ at
the same time.

The desired weight update can be represented as:

wi(new) = wi(old) + xiy


THE ALGORITHM:

1. Initialize the weights: wi = 0 (i = 1 to n)

2. Set the activations of the input units: xi = si (i = 1 to n)

3. Set the activation of the output unit: y = t

4. Adjust the weights: wi(new) = wi(old) + xiy (i = 1 to n)

5. Adjust the bias: b(new) = b(old) + y

The weight update can also be expressed in vector form: w(new) = w(old) + xy
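As a quick sketch of these steps, here is the Hebb rule trained on the logical AND function with bipolar (+1/-1) inputs and targets – a standard demonstration, with the encoding chosen purely for this illustration:

```python
# Hebb rule trained on the AND function with bipolar (+1/-1) inputs and targets.
training = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]

w = [0.0, 0.0]  # step 1: initialize the weights
b = 0.0         # ... and the bias
for x, t in training:      # steps 2-3: present each input/target pair
    for i in range(len(w)):
        w[i] += x[i] * t   # step 4: wi(new) = wi(old) + xi*y
    b += t                 # step 5: b(new) = b(old) + y

print(w, b)  # [2.0, 2.0] -2.0
```

The trained net, 2·x1 + 2·x2 - 2, is positive only for the input (1, 1), so it reproduces the AND function.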
The Perceptron
The perceptron learning rule is much more powerful than the Hebb rule. Under
the right assumptions, the perceptron’s iterative learning procedure can be
proved to converge to weights that cause the network to produce the right
output value for every training input pattern.

Some perceptrons can be self-organizing, but most are trained. Usually, they
have three layers of neurons: sensory units, associator units, and response
units. For example, a simple perceptron can use binary activations for the
sensory and associator units, and a +1, 0, or -1 activation for its response
unit. The sensory units and associator units are connected by weights that are
fixed and take random values of -1, 0, and +1.

The activation function of the associator units is the binary threshold
function, which means that the signal these units send to the output unit is
binary, or to be more precise, 1 or 0. The output is y = f(y_in), where the
activation function, with a fixed threshold θ, is:

f(y_in) = 1 if y_in > θ

f(y_in) = 0 if -θ ≤ y_in ≤ θ

f(y_in) = -1 if y_in < -θ

The weights from the associator units to the output unit are adjusted by the
perceptron learning rule. That means that for every training input pattern,
the network calculates its response. Then, by comparing the output with the
target value, the network determines whether an error has occurred.

If an error occurred and an adjustment is needed, the weights are updated with
this formula: wi(new) = wi(old) + a·t·xi. Here a is the learning rate, while
the target t is either +1 or -1.

The training continues until the error stops occurring and no changes in the
weights are necessary.

THE ALGORITHM:

1. Initialize the weights and the bias (you can set them to zero to simplify
things).

2. Set the learning rate a, 0 < a ≤ 1 (you can set it to 1 to simplify things).

3. Set the activations of the input units: xi = si.

4. Compute the response of the output unit:

y_in = b + Σ xi·wi

y = 1 if y_in > θ
y = 0 if -θ ≤ y_in ≤ θ
y = -1 if y_in < -θ

5. If an error occurred, update the weights and the bias:

wi(new) = wi(old) + a·t·xi
b(new) = b(old) + a·t

else

wi(new) = wi(old)
b(new) = b(old)

6. Test the stopping condition. If no weights changed, stop. Else, continue.
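Here is a minimal sketch of the algorithm, with the threshold θ set to 0 and the learning rate a set to 1 (the simplifications suggested above), trained on the AND function with bipolar inputs and targets chosen for this illustration:

```python
# Perceptron trained on the AND function with bipolar (+1/-1) inputs and targets.
def perceptron_output(x, w, b, theta=0.0):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # step 4: net input
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

training = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b, a = [0.0, 0.0], 0.0, 1.0  # steps 1-2: zero weights, learning rate 1

changed = True
while changed:  # step 6: stop once a whole pass makes no weight changes
    changed = False
    for x, t in training:
        if perceptron_output(x, w, b) != t:  # step 5: update only on error
            for i in range(len(w)):
                w[i] += a * t * x[i]
            b += a * t
            changed = True

print(w, b)  # [1.0, 1.0] -1.0
```

After two passes through the training set, the weights stop changing and the perceptron classifies all four patterns correctly.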


Adaline
The Adaptive Linear Neuron, or Adaline, usually uses bipolar (+1 or -1)
activations for its input signals and targets. The weights are adjustable, and
so is the bias, which acts as an adjustable connection weight from a unit
whose activation is always 1.

The Adaline is trained with the delta rule, also known as the LMS (least mean
squares) rule. Here we consider the special case of an Adaline with a single
output. During training, the activation of the unit is simply the net input,
meaning that the activation function is the identity function. This learning
rule minimizes the mean squared error, which means the network continues
learning on every training pattern, regardless of whether the right output
value has maybe already been generated (meaning that a threshold has been
met).

Once the training is finished, the network can be used for pattern
classification where the desired output is +1 or -1. A threshold function is
applied: if the net input is greater than or equal to 0, the activation is set
to 1; otherwise, it is set to -1.
THE ALGORITHM

1. Initialize the weights. Usually, small random values are used.

2. Set the learning rate a.

3. Set the activations of the input units, i = 1, 2, …, n:

xi = si

4. Compute the net input to the output unit:

y_in = b + Σ xi·wi

5. Update the weights and the bias, i = 1, 2, …, n:

b(new) = b(old) + a(t - y_in)

wi(new) = wi(old) + a(t - y_in)xi

6. Test for the stopping condition. If the largest of the weight changes is
smaller than the tolerance specified, stop. Else, continue.
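The steps above can be sketched as follows. The zero initial weights, the learning rate of 0.1, the fixed epoch count, and the bipolar AND training set are all assumptions made for this illustration (the text suggests small random initial values and a tolerance-based stopping test):

```python
# Adaline trained with the delta (LMS) rule on the bipolar AND function.
training = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b, a = [0.0, 0.0], 0.0, 0.1

for epoch in range(50):
    for x, t in training:
        y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # step 4: net input
        for i in range(len(w)):                          # step 5: delta rule
            w[i] += a * (t - y_in) * x[i]
        b += a * (t - y_in)
# (Step 6 in the text stops when the largest weight change falls below a
# tolerance; a fixed number of epochs is used here to keep the sketch short.)

# After training, classify with the threshold: +1 if y_in >= 0, else -1.
for x, t in training:
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    print(x, t, 1 if y_in >= 0 else -1)
```

The weights settle near (0.5, 0.5) with a bias near -0.5, the least-squares solution, and the thresholded outputs match all four targets.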
Building a Simple Neural Network Code
Now that you know the most basic math behind artificial neural networks, as
well as all of the most important things that a beginner should know about
ANNs, it is time to test what you have learned so far and see how even a
beginner can easily create a code for a simple neural network.

First of all, because you have a long way to go before actually being able to
use your complex neural networks for something more than simply testing your
skills, we will stick to the basics. That means that in this chapter we are not
going to create a mind-blowing and life-altering code, but we will keep it
simple and use a single neuron that has three inputs and one output.

For that purpose, I decided to stick with the Python programming language,
which I believe works best.

However, before you start typing the code, let’s first learn how to build it.

In order to create a neural network, we need to train our network to solve a
problem. Let’s try this one:

                Input      Output
1st Example:    0 0 1      0
2nd Example:    1 1 1      1
3rd Example:    1 0 1      0
4th Example:    0 1 1      0

Our Problem:    1 0 0      ?
As you can see from these examples, the output is always the value of the
leftmost input column, which means that our desired output for the new pattern
would be 1. Now, we know the input and the output, but how do we actually
teach the neuron to generate the desired output? We do this by giving each of
the input values a weight, which can be either a negative or a positive
number. An input with a large positive or large negative weight has a strong
impact on the output of the neuron. To start with, each weight is set to a
random number.

The training process should look like this:

1. Take the inputs from an example in the training set and calculate the
output by using the formula below.
2. Calculate the error. The error is the difference between the output that is
generated by the neuron and the desired output that we want to receive.
3. Depending on the error, adjust the weights.
4. Repeat this about a bajillion times.

Let’s remind ourselves of the formula for calculating the output:

Σ XiWi = X1·W1 + X2·W2 + X3·W3

To simplify things and keep the result between 0 and 1, we pass the weighted
sum through the very convenient Sigmoid function:

1 / (1 + e^(-x))

Now, if we substitute our first formula into the second one, the neuron’s
output becomes:

1 / (1 + e^(-Σ XiWi))

Now that we can calculate the output, we need to know how to adjust the
weights after receiving a wrong answer. This is the formula we will use:

Adjust weights by = error x input x gradient of the Sigmoid curve at the output

So, what does this tell us? We calculated the output with the Sigmoid curve,
and we know that a weight that is too large or too small strongly impacts the
neuron. We must also keep in mind that for large numbers (in either direction)
the Sigmoid curve’s gradient is shallow (it is an s-shaped graph), so
multiplying by the gradient makes confident neurons resist adjustment while
uncertain ones adjust more.

The Sigmoid curve’s gradient is given by output x (1 - output), which leads us
to our final formula:

Adjust weights by = error x input x output x (1 - output)

Now that we know how to train our simple neural network, let’s try the code:

from numpy import exp, array, random, dot

class NeuralNetwork():
    def __init__(self):
        # Seed the random number generator so it generates the same
        # numbers every time the program runs.
        random.seed(1)

        # Our model has 3 input connections and only 1 output connection.
        # We give the 3 x 1 matrix random weights from -1 to 1, with mean 0.
        self.synaptic_weights = 2 * random.random((3, 1)) - 1

    # The Sigmoid function, or the s-shaped curve.
    # The weighted sum is passed through this function in order to
    # keep the result between 0 and 1.
    def __sigmoid(self, x):
        return 1 / (1 + exp(-x))

    # The Sigmoid function’s derivative: the gradient of the Sigmoid curve.
    # It indicates our confidence regarding the existing weight.
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # Our neural network is trained through the process of trial and error,
    # adjusting the synaptic weights each time.
    def train(self, training_set_inputs, training_set_outputs,
              number_of_training_iterations):
        for iteration in range(number_of_training_iterations):
            # The training set is passed through our neural network
            # (a single neuron).
            output = self.think(training_set_inputs)

            # Calculate the error (the difference between the output we
            # desire and the output predicted).
            error = training_set_outputs - output

            # Multiply the error by the input and then by the Sigmoid curve’s
            # gradient. This means the less confident weights are adjusted
            # more, and inputs that are zero do not change the weights.
            adjustment = dot(training_set_inputs.T,
                             error * self.__sigmoid_derivative(output))

            # Adjust the weights.
            self.synaptic_weights += adjustment

    # The neural network thinks.
    def think(self, inputs):
        # Pass the inputs through the neural network (our single neuron).
        return self.__sigmoid(dot(inputs, self.synaptic_weights))

if __name__ == "__main__":

    # Initialise a single-neuron neural network.
    neural_network = NeuralNetwork()

    print("Random starting synaptic weights: ")
    print(neural_network.synaptic_weights)

    # Our training set of 4 examples. Every one of them has 3 input values
    # and 1 output value.
    training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    training_set_outputs = array([[0, 1, 1, 0]]).T

    # Train the neural network with the help of the training set.
    # Do it 10,000 times, making small adjustments each time.
    neural_network.train(training_set_inputs, training_set_outputs, 10000)

    print("New synaptic weights after training: ")
    print(neural_network.synaptic_weights)

    # Test your neural network with a brand new situation.
    print("Considering new situation [1, 0, 0] -> ?: ")
    print(neural_network.think(array([1, 0, 0])))
Backpropagation Algorithm and how to Program It
The backpropagation algorithm is probably the best method for training your
multi-layer artificial networks. This supervised algorithm for feedforward
neural networks works in two phases – propagation and weight update. Once the
network is presented with an input value, the input is propagated through the
neural network in a forward direction, layer by layer, until it finally
reaches the output layer. You know the drill afterward: once the output is
generated, it is compared against the desired output with the help of a loss
function, and an error value is calculated. The calculated error values are
then propagated in the opposite direction, backward, from the output through
every layer until they finally reach the input layer. The algorithm uses these
errors to calculate the loss function’s gradient.

In the second phase of the backpropagation algorithm, the gradient is given to
the optimization method (a procedure for selecting the best elements from an
available set of alternatives), which updates the weights and decreases the
loss function.

The backpropagation algorithm is in charge of training the weights found in
multilayer feedforward networks. That being said, the algorithm needs a
well-defined structure of one or more layers, where each layer is connected to
the next one. The standard form of structure is:

Input Layer ----- hidden layer ----- output layer

This algorithm can be used for both regression and classification problems, but
we will focus on the classification for the time being.
Initializing the Network

Let’s create a network that is ready for training. To keep it simple, we will
treat the network as an array of layers, and we will initialize the weights to
small random numbers, in this case numbers from 0 to 1.

The function called initialize_network() creates a new network that is simple
and ready for training. It creates n_hidden neurons in the hidden layer, each
with n_inputs + 1 weights (one per input plus one for the bias), and n_outputs
neurons in the output layer, each with n_hidden + 1 weights.

This will create the whole network:

from random import seed
from random import random

# Initializing the network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]}
                    for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]}
                    for i in range(n_outputs)]
    network.append(output_layer)
    return network

seed(1)
network = initialize_network(2, 1, 2)
for layer in network:
    print(layer)

Forward Propagate

By propagating the input value through each layer, we can find the output value.
In order to do so we must:

- Calculate the activation. The activation of the neuron can be calculated
with this formula: activation = sum(Xi x Wi) + bias

- After the activation of the neuron, we have to see what the output is. To do
so, we will use the Sigmoid function: output = 1 / (1 + e^(-activation))

- Finally, we need to implement the forward propagation itself. You will see
below that there is a function called forward_propagate() through which we can
achieve that.

This is the full code for forward propagation. At this point the numbers it
prints are still pretty much useless, since we still have a lot of work to do
before the weights mean anything.

from math import exp

# Calculate the activation of a neuron for a given input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer the activation of the neuron
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate the input to the network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Test the forward propagation
network = [[{'weights': [0.13436424411240122, 0.8474337369372327,
                         0.763774618976614]}],
           [{'weights': [0.2550690257394217, 0.49543508709194095]},
            {'weights': [0.4494910647887381, 0.651592972722763]}]]
row = [1, 0, None]
output = forward_propagate(network, row)
print(output)

Back Propagate Error

At this point, we must calculate the error by comparing the generated output
with the desired output. We do that with the help of the Sigmoid function –
its derivative, to be exact. For an output-layer neuron:
error = (expected - output) * transfer_derivative(output).

For a hidden-layer neuron, the error is the weighted sum of the errors of the
neurons in the next layer: error = (weight_k * error_j) *
transfer_derivative(output)

Here is the code:

# Calculate the derivative of an output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate the error and store it in the neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Test the backpropagation of error
network = [[{'output': 0.7105668883115941,
             'weights': [0.13436424411240122, 0.8474337369372327,
                         0.763774618976614]}],
           [{'output': 0.6213859615555266,
             'weights': [0.2550690257394217, 0.49543508709194095]},
            {'output': 0.6573693455986976,
             'weights': [0.4494910647887381, 0.651592972722763]}]]
expected = [0, 1]
backward_propagate_error(network, expected)
for layer in network:
    print(layer)

Train the Network

The network is trained by exposing it to the training dataset. Each forward
propagation of an input is followed by backpropagating the error and updating
the weights.

The weights are updated by using this formula: weight = weight +
learning_rate x error x input.

Below is the full process. Because this is a binary classification problem, we
use 2 neurons in the output layer. We will train our network for 20 epochs,
with a relatively high learning rate of 0.5, because we will be training for
only a few iterations.

from math import exp
from random import seed
from random import random

# Initializing the network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]}
                    for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]}
                    for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Calculating the activation of the neuron for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transferring the activation of the neuron
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Calculating the derivative of an output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate the error and store it in the neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update the network weights with the error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

# Train the network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i]) ** 2
                              for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

# Test training the backpropagation algorithm
seed(1)
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)
Predict
Now it is time to make predictions. A function named predict() will help us do
that.

from math import exp

# Calculate the activation of a neuron for a given input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer the activation of the neuron
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate the input to an output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Make a prediction with the network
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Test making predictions with your network
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
network = [[{'weights': [-1.482313569067226, 1.8308790073202204,
                         1.078381922048799]},
            {'weights': [0.23244990332399884, 0.3621998343835864,
                         0.40289821191094327]}],
           [{'weights': [2.5001872433501404, 0.7887233511355132,
                         -1.1026649757805829]},
            {'weights': [-2.429350576245497, 0.8357651039198697,
                         1.0699217181280656]}]]
for row in dataset:
    prediction = predict(network, row)
    print('Expected=%d, Got=%d' % (row[-1], prediction))
Wheat Seeds Dataset

This section applies the algorithm to the wheat seeds dataset. After loading
the dataset, the first thing to do is convert the numbers so they can be used
in the network: load_csv() loads the dataset, str_column_to_float() converts
the numeric columns to floats, and str_column_to_int() converts the class
column to integer values.

Finally, our new function back_propagation() will manage the algorithm by
initializing a network, training it, and then making predictions.
# Backpropagate on the Seeds Dataset
from random import seed
from random import randrange
from random import random
from csv import reader
from math import exp

# Load the CSV file


def load_csv(filename):
dataset = list()
with open(filename, 'r') as file:
csv_reader = reader(file)
for row in csv_reader:
if not row:
continue
dataset.append(row)
return dataset
# Convert the string column to float
def str_column_to_float(dataset, column):
for row in dataset:
row[column] = float(row[column].strip())

# Convert string column to integer


def str_column_to_int(dataset, column):
class_values = [row[column] for row in dataset]
unique = set(class_values)
lookup = dict()
for i, value in enumerate(unique):
lookup[value] = i
for row in dataset:
row[column] = lookup[row[column]]
return lookup

# Find the minimum and maximum values for each of the columns
def dataset_minmax(dataset):
minmax = list()
stats = [[min(column), max(column)] for column in zip(*dataset)]
return stats

# Rescale the dataset columns from 0 to 1


def normalize_dataset(dataset, minmax):
for row in dataset:
for i in range(len(row)-1):
row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
# Split the dataset into k folds
def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

# Calculate the accuracy percentage
def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores

# Calculate the activation of a neuron for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer the neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate the input to an output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Calculate the derivative of a neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate the error and store it in the neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network)-1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update the weights with the error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)

# Initialize the network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]}
                    for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]}
                    for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Make a prediction with the network
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Backpropagation Algorithm with Stochastic Gradient Descent
def back_propagation(train, test, l_rate, n_epoch, n_hidden):
    n_inputs = len(train[0]) - 1
    n_outputs = len(set([row[-1] for row in train]))
    network = initialize_network(n_inputs, n_hidden, n_outputs)
    train_network(network, train, l_rate, n_epoch, n_outputs)
    predictions = list()
    for row in test:
        prediction = predict(network, row)
        predictions.append(prediction)
    return predictions

# Test backpropagation on the seeds dataset
seed(1)
# load and prepare the data
filename = 'seeds_dataset.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)
# convert the class column to integers
str_column_to_int(dataset, len(dataset[0])-1)
# normalize the input variables
minmax = dataset_minmax(dataset)
normalize_dataset(dataset, minmax)
# evaluate the algorithm
n_folds = 5
l_rate = 0.3
n_epoch = 500
n_hidden = 5
scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate,
                            n_epoch, n_hidden)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

Voilà! You have just created a more complex network, one with 3 neurons in
the output layer and 5 in the hidden layer.
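To see where those layer sizes come from, here is a small self-contained sketch that mirrors the initialize_network() logic from the listing: the seeds dataset has 7 input features and 3 classes, and we asked for 5 hidden neurons (n_hidden = 5).

```python
from random import random, seed

seed(1)

# Same structure as initialize_network() in the listing above:
# one hidden layer and one output layer, each a list of neuron dicts.
def initialize_network(n_inputs, n_hidden, n_outputs):
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]}
                    for i in range(n_hidden)]
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]}
                    for i in range(n_outputs)]
    return [hidden_layer, output_layer]

# The seeds dataset has 7 input features and 3 classes.
network = initialize_network(7, 5, 3)
print(len(network[0]))                # 5 neurons in the hidden layer
print(len(network[1]))                # 3 neurons in the output layer
print(len(network[0][0]['weights']))  # 7 weights + 1 bias per hidden neuron
```

Each hidden neuron carries one weight per input plus a bias, and each output neuron carries one weight per hidden neuron plus a bias, which is why the `+ 1` appears in both list comprehensions.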
Conclusion
Congratulations! You can now build your very own neural network. With the
knowledge gained from this book, you can build, train, and adjust both simple
and multi-layer networks, which gives you a solid base if you are considering
taking on more complex tasks.

Who said that beginners cannot program ANNs? Simple equations and basic
programming skills are all it takes to program a neural network.
