
Index

 Machine Learning
 Approaches towards problem-solving
 Supervised learning
 Semi-Supervised Learning
 Unsupervised Learning
 Reinforcement Learning
 Self-learning
 Feature Learning
 Anomaly Detection
 Association rules
 Models
 Logistic Regression
 Artificial neural networks
 Decision trees
 Support vector machines
 Bayesian networks
 Genetic algorithms
 Training models
 Federated learning
 Defining An Intelligent System
 Types of Problems
 Big Problems
 Open-Ended Problems
 Time Changing
 Intrinsically Hard Problems
 Training and Testing the Model
 Application of Machine Learning
 References
Machine Learning
Machine learning can broadly be described as the scientific study of the algorithms and statistical models that a machine uses to perform a task without being given explicit instructions. Machine learning is a subset of artificial intelligence, although the two can be treated as studies of two different fields of application.
A machine learning system predicts and decides which operation to perform based on the sample input and output data provided by the user. The machine builds a mathematical model from the data provided to it and the expected output; the defined body of data used to fit this model is known as the "training set". All the operations performed by the "intelligent machine" are based on the training set.
Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop a conventional algorithm for effectively performing the task.
Machine learning is closely related to computational statistics, which
focuses on making predictions using computers. The study of
mathematical optimization delivers methods, theory and application
domains to the field of machine learning. In its application across
business problems, machine learning is also referred to as predictive
analytics.

From a theoretical standpoint, machine learning can be characterized by the formal definition given by Tom M. Mitchell:
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
(The term "machine learning" itself was coined by Arthur Samuel.)
A core objective of a learner here is to generalize from its
experience.[1] Generalization in this context is the ability of a
learning machine to perform accurately on new, unseen examples or
tasks after having experienced a learning data set. The training
examples come from some generally unknown probability distribution
(considered representative of the space of occurrences) and the learner
has to build a general model about this space that enables it to
produce sufficiently accurate predictions in new cases.

The computational analysis of machine learning algorithms and their
performance is a branch of theoretical computer science known as
computational learning theory. Because training sets are finite and the
future is uncertain, learning theory usually does not yield guarantees
of the performance of algorithms. Instead, probabilistic bounds on the
performance are quite common. The bias-variance decomposition is
one way to quantify generalization error.

For the best performance in the context of generalization, the
complexity of the hypothesis should match the complexity of the
function underlying the data. If the hypothesis is less complex than the
function, then the model has underfit the data. If the complexity of the
model is increased in response, then the training error decreases. But
if the hypothesis is too complex, then the model is subject to
overfitting and generalization will be poorer.[2]
In addition to performance bounds, learning theorists study the time
complexity and feasibility of learning. In computational learning
theory, computation is considered feasible if it can be done in
polynomial time. There are two kinds of time complexity results.
Positive results show that a certain class of functions can be learned in
polynomial time. Negative results show that certain classes cannot be
learned in polynomial time.
Approaches towards problem-solving
Given a machine and a data set to be processed, the data set may be completely arbitrary, so the machine first has to go through a learning process. The types of machine learning algorithms differ in their approach, the type of data they input and output, and the type of task or problem that they are intended to solve.
Supervised learning
Supervised learning algorithms build a mathematical model of a set
of data that contains both the inputs and the desired outputs.[3]
The data is known as training data and consists of a set of
training examples. Each training example has one or more inputs and
the desired output, also known as a supervisory signal. In the
mathematical model, each training example is represented by an
array or vector, sometimes called a feature vector, and the training
data is represented by a matrix. Through iterative optimization of an
objective function, supervised learning algorithms learn a function
that can be used to predict the output associated with new inputs.[4]
An optimal function will allow the algorithm to correctly
determine the output for inputs that were not a part of the training
data. An algorithm that improves the accuracy of its outputs or
predictions over time is said to have learned to perform that task.[5]
Supervised learning algorithms include classification and
regression[6]. Classification algorithms are used when the outputs are
restricted to a limited set of values, and regression algorithms are used
when the outputs may have any numerical value within a range.
Similarity learning is an area of supervised machine learning closely
related to regression and classification, but the goal is to learn from
examples using a similarity function that measures how similar or
related two objects are. It has applications in ranking, recommendation systems, visual identity tracking, face verification, and speaker verification.
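As an illustration, here is a minimal sketch of supervised classification using scikit-learn; the library, dataset and parameter choices are illustrative assumptions, not prescribed by the text.

# A minimal supervised classification sketch (illustrative; assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small labeled dataset: inputs X (feature vectors) and desired outputs y (labels).
X, y = load_iris(return_X_y=True)

# Hold out part of the data to check generalization on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit a classifier on the training examples and evaluate it on the held-out inputs.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
print("accuracy on unseen examples:", model.score(X_test, y_test))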
Semi-Supervised Learning
Sometimes the user may not be able to provide the machine with a completely labeled set of inputs. In semi-supervised learning, some of the training examples are missing training labels, but they can nevertheless be used to improve the quality of a model.
In weakly supervised learning, the training labels are noisy, limited,
or imprecise; however, these labels are often cheaper to obtain,
resulting in larger effective training sets.[7]
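A toy sketch of this idea, using scikit-learn's label-spreading implementation (an assumed choice; the text does not prescribe a specific algorithm), marks unlabeled examples with -1 and lets the model propagate labels to them.

# Semi-supervised sketch: only a few points are labeled, the rest are marked -1 (unlabeled).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)
rng = np.random.RandomState(0)

# Pretend most labels are missing: keep roughly 10% of them.
y_partial = np.copy(y)
mask_unlabeled = rng.rand(len(y)) > 0.1
y_partial[mask_unlabeled] = -1

# The model uses both the labeled and the unlabeled points to build its decision.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)
print("accuracy on the originally unlabeled points:",
      (model.transduction_[mask_unlabeled] == y[mask_unlabeled]).mean())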
Unsupervised Learning
Unsupervised learning algorithms take a set of data that contains only inputs and find structure in the data, such as grouping or clustering of data points. The algorithms therefore learn from test data that has not
been labeled, classified or categorized. Instead of responding to
feedback, unsupervised learning algorithms identify commonalities in
the data and react based on the presence or absence of such
commonalities in each new piece of data. A central application of
unsupervised learning is in the field of density estimation in
statistics,[8] though unsupervised learning encompasses other
domains involving summarizing and explaining data features.

Cluster analysis is the assignment of a set of observations into subsets
so that observations within the same cluster are similar according to
one or more pre-designated criteria, while observations drawn from
different clusters are dissimilar. Different clustering techniques make
different assumptions on the structure of the data, often defined by
some similarity metric and evaluated, for example, by internal
compactness, or the similarity between members of the same cluster,
and separation, the difference between clusters. Other methods are
based on estimated density and graph connectivity.
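A minimal clustering sketch with k-means (an assumed choice of clustering technique; scikit-learn assumed) looks like this:

# Unsupervised clustering sketch: no labels, the algorithm only groups similar points.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate unlabeled points that happen to form three groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means assigns each observation to one of k clusters by minimizing within-cluster distance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print("cluster assignments of the first ten points:", cluster_ids[:10])
print("cluster centers:\n", kmeans.cluster_centers_)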
Reinforcement Learning
Reinforcement learning is an area of machine learning concerned
with how software agents ought to take actions in an environment to
maximize some notion of cumulative reward. Due to its generality,
the field is studied in many other disciplines, such as game theory,
control theory, operations research, information theory, simulation-
based optimization, multi-agent systems, swarm intelligence,
statistics, and genetic algorithms. In machine learning, the
environment is typically represented as a Markov Decision Process
(MDP). Many reinforcement learning algorithms use dynamic
programming techniques.[8] Reinforcement learning algorithms do
not assume knowledge of an exact mathematical model of the MDP
and are used when exact models are infeasible. Reinforcement
learning algorithms are used in autonomous vehicles or in learning to
play a game against a human opponent.
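As a sketch of the idea (a toy example, not taken from the text), tabular Q-learning on a tiny chain-shaped MDP looks like this:

# Tabular Q-learning sketch on a toy 5-state chain: moving right eventually reaches a reward.
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions)) # value estimates for every (state, action) pair
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.RandomState(0)

def step(state, action):
    """Environment: reward 1 only when the agent reaches the rightmost state."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # epsilon-greedy action selection balances exploration and exploitation.
        action = rng.randint(n_actions) if rng.rand() < epsilon else int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print("learned greedy action per state (1 = move right):", Q.argmax(axis=1))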
Self-learning
In the paradigms described so far, the machine learns from a reward-punishment scheme: it is told the consequences it will face for a successful trial as well as for an unsuccessful one. Self-learning as a machine learning paradigm was introduced in 1982 along with a neural network capable of self-learning, named the Crossbar Adaptive Array (CAA).[9]
It is learning with no external rewards and no external teacher
advice. The CAA self-learning algorithm computes, in a crossbar
fashion, both decisions about actions and emotions about consequence
situations. The system is driven by the interaction between cognition
and emotion.[10]
The self-learning algorithm updates a memory matrix W = ||w(a,s)|| such that in each iteration it executes the following machine learning routine:
In situation s perform action a;
Receive consequence situation s';
Compute emotion of being in consequence situation v(s');
Update crossbar memory w'(a,s) = w(a,s) + v(s').
It is a system with only one input, situation s, and only one output,
action a. There is neither a separate reinforcement input nor an advice
input from the environment. The backpropagated value is the emotion
toward the consequence situation. The CAA exists in two
environments, one is a behavioral environment where it behaves, and
the other is a genetic environment, wherefrom it initially and only
once receives initial emotions about situations to be encountered in
the behavioral environment. After receiving the genome vector from
the genetic environment, the CAA learns a goal-seeking behavior, in
an environment that contains both desirable and undesirable
situations.[11]
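The routine above can be illustrated with a toy sketch; the environment, genome values and dimensions below are illustrative assumptions, not taken from the CAA papers.

# Toy sketch of the crossbar self-learning routine described above.
import numpy as np

n_actions, n_situations = 2, 3
W = np.zeros((n_actions, n_situations))   # crossbar memory W = ||w(a,s)||
genome = np.array([0.0, -1.0, 1.0])       # initial emotions v(s), received once from the genetic environment

def behave(situation, action):
    """Toy behavioral environment: returns the consequence situation s'."""
    return (situation + action + 1) % n_situations

situation = 0
for _ in range(20):
    action = int(np.argmax(W[:, situation]))      # in situation s perform action a
    consequence = behave(situation, action)       # receive consequence situation s'
    emotion = genome[consequence]                 # compute emotion v(s') of being in s'
    W[action, situation] += emotion               # update crossbar memory w'(a,s) = w(a,s) + v(s')
    situation = consequence

print("learned crossbar memory:\n", W)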
Feature Learning
Instead of being fed arbitrary raw inputs, the machine pre-processes the provided data before classifying it or predicting the final action to be performed. This technique allows
reconstruction of the inputs coming from the unknown data-
generating distribution, while not being necessarily faithful to
configurations that are implausible under that distribution. This
replaces manual feature engineering and allows a machine to both
learn the features and use them to perform a specific task. Feature
learning can be either supervised or unsupervised. In supervised
feature learning, features are learned using labeled input data.
Examples include artificial neural networks and supervised dictionary
learning.
Feature learning is motivated by the fact that machine learning tasks
such as classification often require input that is mathematically and
computationally convenient to process. However, real-world data
such as images, video, and sensory data has not yielded to attempts to
algorithmically define specific features. An alternative is to discover such features or representations through examination, without relying on explicit algorithms.
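As a simple illustration, principal component analysis is one classic linear way to learn a compact representation from unlabeled data; the sketch below assumes scikit-learn and is not prescribed by the text.

# Unsupervised feature learning sketch: compress 64-pixel digit images into 10 learned features.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)              # raw pixel inputs, no manual feature engineering
pca = PCA(n_components=10)
features = pca.fit_transform(X)                  # learned low-dimensional representation

print("original input shape:", X.shape)          # (1797, 64)
print("learned feature shape:", features.shape)  # (1797, 10)
print("variance explained:", pca.explained_variance_ratio_.sum())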
Anomaly Detection
Anomaly detection is the identification of rare items, events or observations that raise suspicion by differing significantly from the majority of the data. Typically, the anomalous items represent an issue such as bank fraud, a structural defect, medical problems or errors in a text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions.
In particular, in the context of abuse and network intrusion detection,
the interesting objects are often not rare, but unexpected bursts in
activity. This pattern does not adhere to the common statistical
definition of an outlier as a rare object, and many outlier detection
methods (in particular, unsupervised algorithms) will fail on such
data unless it has been aggregated appropriately. Instead, a cluster
analysis algorithm may be able to detect the micro-clusters formed by
these patterns.[12]
Three broad categories of anomaly detection techniques exist.[13]
Unsupervised anomaly detection techniques detect anomalies in an
unlabelled test data set under the assumption that the majority of the
instances in the data set are normal, by looking for instances that seem
to fit least to the remainder of the data set. Supervised anomaly
detection techniques require a data set that has been labeled as
"normal" and "abnormal" and involves training a classifier (the key
difference to many other statistical classification problems is the
inherently unbalanced nature of outlier detection). Semi-supervised
anomaly detection techniques construct a model representing normal
behavior from a given normal training data set and then test the
likelihood of a test instance to be generated by the model.
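A minimal unsupervised anomaly detection sketch, using an isolation forest (an assumed choice; scikit-learn assumed), is given below:

# Unsupervised anomaly detection sketch: most points are normal, a few are outliers.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # the bulk of the data
outliers = rng.uniform(low=-6, high=6, size=(5, 2))      # a handful of anomalous points
X = np.vstack([normal, outliers])

# The model assumes most instances are normal and flags the ones that fit least well.
detector = IsolationForest(contamination=0.03, random_state=0)
labels = detector.fit_predict(X)            # +1 = normal, -1 = anomaly
print("number of points flagged as anomalies:", (labels == -1).sum())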
Association rules
Association rule learning is a rule-based machine learning method
for discovering relationships between variables in large databases. It
is intended to identify strong rules discovered in databases using some measure of "interestingness".[14]
Rule-based machine learning is a general term for any machine
learning method that identifies, learns, or evolves "rules" to store,
manipulate or apply knowledge. The defining characteristic of a rule-
based machine learning algorithm is the identification and utilization
of a set of relational rules that collectively represent the knowledge
captured by the system. This is in contrast to other machine learning
algorithms that commonly identify a singular model that can be
universally applied to any instance to make a prediction.[15] Rule-
based machine learning approaches include learning classifier
systems, association rule learning, and artificial immune systems.
The rules discovered can be used as the basis for decisions about marketing
activities such as promotional pricing or product placements. In
addition to market basket analysis, association rules are employed
today in application areas including Web usage mining, intrusion
detection, continuous production, and bioinformatics. In contrast with
sequence mining, association rule learning typically does not consider
the order of items either within a transaction or across transactions.
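A minimal sketch of the idea, computing the support and confidence of one candidate rule over toy transactions (pure Python, with made-up items), is shown below:

# Association rule sketch: measure how often {bread, butter} -> {milk} holds in toy transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

antecedent, consequent = {"bread", "butter"}, {"milk"}

# Support: fraction of transactions containing both sides of the rule.
both = sum(1 for t in transactions if antecedent | consequent <= t)
support = both / len(transactions)

# Confidence: of the transactions containing the antecedent, how many also contain the consequent.
antecedent_count = sum(1 for t in transactions if antecedent <= t)
confidence = both / antecedent_count

print(f"support = {support:.2f}, confidence = {confidence:.2f}")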
Models
Performing machine learning involves creating a model, which is
trained on some training data and then can process additional data to
make predictions. Various types of models have been used and
researched for machine learning systems.
Logistic Regression
Logistic regression maps input data to values between 0 and 1; this mapping is produced by the sigmoid (logistic) function. The parameters of the logistic model are typically fitted using gradient descent.
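A bare-bones sketch of this idea in NumPy (toy data and an illustrative learning rate, not prescribed by the text) follows:

# Logistic regression sketch: sigmoid mapping plus gradient descent on toy 1-D data.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))           # maps any real value into (0, 1)

# Toy data: points above 0 belong to class 1, points below 0 to class 0.
X = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b, lr = 0.0, 0.0, 0.1
for _ in range(1000):
    p = sigmoid(w * X + b)                    # predicted probabilities
    # Gradient of the cross-entropy loss with respect to w and b.
    grad_w = np.mean((p - y) * X)
    grad_b = np.mean(p - y)
    w -= lr * grad_w                          # gradient descent step
    b -= lr * grad_b

print("probability of class 1 at x = 2.5:", sigmoid(w * 2.5 + b))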
Artificial neural networks
An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. In a typical diagram of such a network, each circular node represents an artificial neuron and each arrow represents a connection from the output of one artificial neuron to the input of another.
Artificial neural networks (ANNs), or connectionist systems, are
computing systems vaguely inspired by the biological neural
networks that constitute animal brains. Such systems "learn" to
perform tasks by considering examples, generally without being
programmed with any task-specific rules.
An ANN is a model based on a collection of connected units or nodes
called "artificial neurons", which loosely model the neurons in a
biological brain. Each connection, like the synapses in a biological
brain, can transmit information, a "signal", from one artificial neuron
to another. An artificial neuron that receives a signal can process it
and then signal additional artificial neurons connected to it. In
common ANN implementations, the signal at a connection between
artificial neurons is a real number, and the output of each artificial
neuron is computed by some non-linear function of the sum of its
inputs. The connections between artificial neurons are called "edges".
Artificial neurons and edges typically have a weight that adjusts as
learning proceeds. The weight increases or decreases the strength of
the signal at a connection. Artificial neurons may have a threshold
such that the signal is only sent if the aggregate signal crosses that
threshold. Typically, artificial neurons are aggregated into layers.
Different layers may perform different kinds of transformations on
their inputs. Signals travel from the first layer (the input layer) to the
last layer (the output layer), possibly after traversing the layers
multiple times.
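A tiny sketch of that forward flow of signals through layers (random weights, NumPy only; this is an untrained toy network, not a method from the text) is:

# Forward pass sketch of a tiny two-layer neural network with random weights.
import numpy as np

rng = np.random.RandomState(0)

def relu(z):
    return np.maximum(0, z)                   # non-linear activation applied to the weighted sum

x = rng.rand(4)                               # input layer: a 4-dimensional feature vector
W1, b1 = rng.randn(5, 4), np.zeros(5)         # edge weights into a hidden layer of 5 neurons
W2, b2 = rng.randn(3, 5), np.zeros(3)         # edge weights into an output layer of 3 neurons

hidden = relu(W1 @ x + b1)                    # each neuron: non-linear function of weighted inputs
output = W2 @ hidden + b2                     # signals travel from the input layer to the output layer
print("network output:", output)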
The original goal of the ANN approach was to solve problems in the
same way that a human brain would. However, over time, attention
moved to performing specific tasks, leading to deviations from
biology. Artificial neural networks have been used on a variety of
tasks, including computer vision, speech recognition, machine
translation, social network filtering, playing board, and video games
and medical diagnosis.
Deep learning consists of multiple hidden layers in an artificial
neural network. This approach tries to model the way the human
brain processes light and sound into vision and hearing. Some
successful applications of deep learning are computer vision and
speech recognition.[16]
Decision trees
Decision tree learning uses a decision tree as a predictive model to go
from observations about an item (represented in the branches) to
conclusions about the item's target value (represented in the leaves). It
is one of the predictive modeling approaches used in statistics, data
mining and machine learning. Tree models where the target variable
can take a discrete set of values are called classification trees; in these
tree structures, leaves represent class labels and branches represent
conjunctions of features that lead to those class labels. Decision trees
where the target variable can take continuous values (typically real
numbers) are called regression trees. In decision analysis, a decision
tree can be used to visually and explicitly represent decisions and
decision making. In data mining, a decision tree describes data, but
the resulting classification tree can be an input for decision making.
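A short classification-tree sketch (scikit-learn assumed; dataset and depth are illustrative choices) is given below:

# Classification tree sketch: leaves hold class labels, branches hold feature tests.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))
print(export_text(tree))                      # human-readable view of the learned branch tests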

Support vector machines
Support vector machines (SVMs), also known as support vector
networks, are a set of related supervised learning methods used for
classification and regression. Given a set of training examples, each
marked as belonging to one of two categories, an SVM training
algorithm builds a model that predicts whether a new example falls
into one category or the other.[17] The resulting SVM model is a non-probabilistic, binary, linear classifier, although methods such as Platt scaling exist to use SVMs in a probabilistic classification setting.
In addition to performing linear classification, SVMs can efficiently
perform a non-linear classification using what is called the kernel
trick, implicitly mapping their inputs into high-dimensional feature
spaces.
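A minimal sketch of a kernelized SVM classifier, using scikit-learn's SVC (an assumed choice) on data that is not linearly separable, follows:

# SVM sketch: a non-linear binary classifier obtained through the RBF kernel trick.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons are not linearly separable in the original input space.
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", C=1.0, gamma="scale")   # kernel trick: implicit high-dimensional mapping
svm.fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))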
Bayesian networks
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). A classic example: rain influences whether the sprinkler is activated, and both rain and the sprinkler influence whether the grass is wet. Similarly, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the
probabilities of the presence of various diseases. Efficient algorithms
exist that perform inference and learning. Bayesian networks that
model sequences of variables, like speech signals or protein
sequences, are called dynamic Bayesian networks. Generalizations of
Bayesian networks that can represent and solve decision problems
under uncertainty are called influence diagrams.
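The rain/sprinkler/wet-grass example can be sketched by brute-force enumeration over the DAG's conditional probability tables; the probability values below are illustrative assumptions, not taken from the text.

# Bayesian network sketch for the rain -> sprinkler -> wet-grass example (illustrative probabilities).
P_rain = {True: 0.2, False: 0.8}
P_sprinkler_given_rain = {True: {True: 0.01, False: 0.99}, False: {True: 0.4, False: 0.6}}
P_wet_given = {(True, True): 0.99, (True, False): 0.9, (False, True): 0.8, (False, False): 0.0}

# Joint probability factorizes along the DAG: P(R, S, W) = P(R) * P(S | R) * P(W | S, R).
def joint(rain, sprinkler, wet):
    p_wet = P_wet_given[(sprinkler, rain)]
    return P_rain[rain] * P_sprinkler_given_rain[rain][sprinkler] * (p_wet if wet else 1 - p_wet)

# Inference by enumeration: probability that it rained, given that the grass is wet.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print("P(rain | grass is wet) =", num / den)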
Genetic algorithms
A genetic algorithm (GA) is a search algorithm and heuristic
technique that mimics the process of natural selection, using methods
such as mutation and crossover to generate new genotypes in the hope
of finding good solutions to a given problem. In machine learning,
genetic algorithms were used in the 1980s and 1990s.[18]
Conversely, machine learning techniques have been used to improve
the performance of genetic and evolutionary algorithms.
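A toy GA sketch, using bit-string genotypes with mutation and single-point crossover (the fitness function here is an illustrative assumption), might look like this:

# Genetic algorithm sketch: evolve bit strings toward the all-ones genotype (maximum fitness).
import random

random.seed(0)
GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 50

def fitness(genome):
    return sum(genome)                        # more 1-bits means a fitter individual

def crossover(a, b):
    point = random.randrange(1, GENOME_LEN)   # single-point crossover
    return a[:point] + b[point:]

def mutate(genome, rate=0.02):
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    # Selection: keep the fitter half, then refill the population with mutated offspring.
    population.sort(key=fitness, reverse=True)
    parents = population[: POP_SIZE // 2]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print("best fitness found:", fitness(max(population, key=fitness)), "out of", GENOME_LEN)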
Training models
Machine learning models usually require a lot of data in order to perform well. When training a machine learning model, one typically needs to collect a large, representative sample of data from a training set. Data from the training set can be as varied as a corpus of text, a collection of images, or data collected from individual users of a service. Overfitting is something to watch out for when training a machine learning model.
Federated learning
Federated learning is a new approach to training machine learning
models that decentralizes the training process, allowing for users'
privacy to be maintained by not needing to send their data to a
centralized server. This also increases efficiency by decentralizing the
training process to many devices. For example, Gboard uses
federated machine learning to train search query prediction models
on users' mobile phones without having to send individual searches
back to Google.[19]
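A highly simplified sketch of the federated-averaging idea (local updates stay on each simulated device, only weights are averaged centrally; the data, model and hyperparameters here are made up for illustration) is:

# Federated averaging sketch: each client trains locally, the server only averages the weights.
import numpy as np

rng = np.random.RandomState(0)
true_w = np.array([2.0, -1.0])

# Each simulated device holds its own private data that never leaves the device.
clients = []
for _ in range(5):
    X = rng.randn(50, 2)
    y = X @ true_w + 0.1 * rng.randn(50)
    clients.append((X, y))

global_w = np.zeros(2)
for round_ in range(20):
    local_weights = []
    for X, y in clients:
        w = global_w.copy()
        for _ in range(10):                       # a few local gradient steps per round
            grad = X.T @ (X @ w - y) / len(y)
            w -= 0.1 * grad
        local_weights.append(w)                   # only the updated weights are shared
    global_w = np.mean(local_weights, axis=0)     # the server aggregates by averaging

print("federated estimate of the weights:", global_w)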
Defining An Intelligent System
By now we already know how machines learn or how machines
think. Let us now define what an intelligent system is and what it is
supposed to do.
Defining an intelligent system requires us to define further rules and aspects. To explain what an intelligent system is, let us take the general example of a coffee heater.
A coffee heater is generally used to pre-heat coffee, usually during breakfast. Say our coffee heater has a temperature adjustment knob that is used to set a defined temperature at which to heat the coffee. Initially, we ship some default factory settings for optimized preheating conditions that we tested ourselves. We cannot really say that our coffee heater is the perfect heater we could provide to the customer, as a customer may like his coffee slightly warmer than the default setting; he will then reheat the coffee for a longer interval of time or at a different temperature setting.
Now let us connect the coffee heater to the internet and continuously collect data from all the coffee machines we supplied. We now have data that will help us observe trends, and based on that we can configure the settings of the coffee heater. Suppose we configure the settings based on the observed trend that a certain number of people need their coffee to be warmer than the basic settings provided. But what if the day is warmer? Now the user will not be satisfied with extremely warm coffee. Adding a positioning sensor will help, as will a volumetric sensor that tells us the amount of coffee in the jar; with all this data we can finally set a defined rule and update the settings daily to satisfy the customer's needs.
But what if the customer now adds a different kind of coffee that turns bitter if it is heated excessively, or a type of coffee grain that does not taste good unless heated to a certain temperature? Human constraints can be very arbitrary, and all this data may require a large workforce to monitor and act upon. So what if we instead create an intelligent system that figures out the location of the user, the amount of coffee he is going to heat, and the type of coffee grain he added to the heater, and based on that we let it figure out the optimum settings? Machines that are able to decide can help us in many ways, as they are quicker, more accurate and more durable than a normal human.
Building an intelligent system has its rules; the major ones can be listed as:
The Objective
The Experience
The Implementation
The Intelligence
The Orchestration
Each of these terms can be explained further. But after ensuring that the major rules are in place, we need to talk about the kind of problem we need to tackle and the need for an intelligent system.
Types of Problems

Creating an intelligent system for something as simple as a bank transaction can seem frivolous, as there is not much need for a machine to learn from its success or failure while performing a bank transaction: if we ask a machine to transfer X amount of money from an account A to another account B, there are only two possibilities, success or failure. Creating an intelligent system for a task like this may not be necessary, as it consumes more time and resources than the original operation.
Knowing the type of problem is as important as setting an objective or managing the implementation, intelligence or orchestration. We can classify problems as follows:
Big Problems
Open-Ended Problems
Time Changing
Intrinsically Hard Problems
Big Problems
Problems that involve a large number of variables and functions can be classified as big problems. Assessing the livelihood of every citizen in a country, for example, would be a very demanding job if done through physical effort alone; instead we can use an intelligent system to help us with the task.
Open-Ended Problems
A problem being big is one thing, but what if a problem is never-ending? What if your data set has a continuous flow of information? Variables and operations are defined for just one single set of data, but here there are many other data sets. Solving this may not be possible, and even if it is, it may take a lot of workforce and time. Here we can use an intelligent system to figure out the type of data being manipulated and to apply the related operations to the data set.

Time Changing Problems
Say we design an intelligent system to detect faces, and suddenly face tattoos become fashionable; our system will no longer be able to detect human faces and may start tagging people as something abnormal. Problems may change as time passes; sometimes one solution can become the exact opposite of the solution required for a given problem over time. We need to make sure our system recognizes change quickly and adapts to it.
Intrinsically Hard Problems
A problem that a typical human being may not be able to solve can be categorized as an intrinsically hard problem, such as trying to understand a different language or playing a game against someone highly experienced in it. We can construct machines that learn to perform tasks like these.
Training and Testing the Model
Now that we have defined an intelligent system and discovered the
types of problems that lie in our final effort to find an intelligent
solution to reach our goal, we have also defined what our model will
consist of and all the algorithms to categorize data for further
prediction of results.
Now we train and test the machine, or the intelligent interface. Training involves providing the algorithm with unbiased data to familiarise it with the input-output cases. Training tells us how efficiently the algorithm is working, and finally we test the model on sample data and verify the status of the algorithm.
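In code, the train-then-test step described above usually looks something like this (scikit-learn assumed; the dataset and model are illustrative choices):

# Train/test sketch: fit on one portion of the data, verify on held-out samples.
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                     # training phase
print("accuracy on the held-out test data:", model.score(X_test, y_test))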
Given below is a small project template to demonstrate the building of an intelligent system using Python.
# Python Project Template

# 1. Prepare Problem
# a) Load libraries
# b) Load dataset

# 2. Summarize Data
# a) Descriptive statistics
# b) Data visualizations

# 3. Prepare Data
# a) Data Cleaning
# b) Feature Selection
# c) Data Transforms

# 4. Evaluate Algorithms
# a) Split-out validation dataset
# b) Test options and evaluation metric
# c) Spot Check Algorithms
# d) Compare Algorithms

# 5. Improve Accuracy
# a) Algorithm Tuning
# b) Ensembles

# 6. Finalize Model
# a) Predictions on validation dataset
# b) Create standalone model on entire training dataset
# c) Save model for later use

The project template displays the process of building an efficient application using Python and related libraries.
Application of Machine Learning
When we talk about an automated machine that performs like an intelligent human being, we generally have lots of applications as well as a lot of implications.

Using machine learning in day-to-day applications improves lives by a huge factor. We could simply add a routine and our coffee would be ready by the time we get ready, the car would adjust its temperature by the time we reach it, and the home security system would start monitoring as we leave the house.
The applications are countless, but where there is such huge potential there are risks too: machine learning needs a large amount of data, and collecting data demands a lot of work, effort, and investment. Questions about ethical data collection also come into the picture.
A few applications are listed below:
1. Virtual Personal Assistants
2. Predictions while Commuting
3. Video Surveillance
4. Social Media Services
5. Email Spam and Malware Filtering
6. Online Customer Support
7. Search Engine Result Refining
8. Product Recommendations
9. Online Fraud Detection

References

[1] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
[2] Alpaydin, Ethem (2010). Introduction to Machine Learning. London: The MIT Press.
[3] Russell, Stuart J.; Norvig, Peter (2010). Artificial Intelligence: A Modern Approach (Third ed.). Prentice Hall.
[4] Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. The MIT Press.
[5] Mitchell, T. (1997). Machine Learning. McGraw Hill.
[6] Alpaydin, Ethem (2010). Introduction to Machine Learning. MIT Press.
[7] Jordan, Michael I.; Bishop, Christopher M. (2004). "Neural Networks". In Allen B. Tucker (ed.). Computer Science Handbook, Second Edition (Section VII: Intelligent Systems). Boca Raton, Florida: Chapman & Hall/CRC Press LLC.
[8] van Otterlo, M.; Wiering, M. (2012). Reinforcement learning and Markov decision processes. Reinforcement Learning. Adaptation, Learning, and Optimization.
[9] Bozinovski, S. (1982). "A self learning system using secondary reinforcement". In Trappl, Robert (ed.). Cybernetics and Systems Research: Proceedings of the Sixth European Meeting on Cybernetics and Systems Research. North Holland.
[10] Bozinovski, Stevo (2014). "Modeling mechanisms of cognition-emotion interaction in artificial neural networks, since 1981". Procedia Computer Science.
[11] Bozinovski, S. (2001). "Self-learning agents: A connectionist theory of emotion based on crossbar value judgment". Cybernetics and Systems.
[12] Dokas, Paul; Ertoz, Levent; Kumar, Vipin; Lazarevic, Aleksandar; Srivastava, Jaideep; Tan, Pang-Ning (2002). "Data mining for network intrusion detection". Proceedings NSF Workshop on Next Generation Data Mining.
[13] Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey".
[14] Piatetsky-Shapiro, Gregory (1991). "Discovery, analysis, and presentation of strong rules". In Piatetsky-Shapiro, Gregory; Frawley, William J. (eds.). Knowledge Discovery in Databases. AAAI/MIT Press, Cambridge, MA.
[15] Bassel, George W.; Glaab, Enrico; Marquez, Julietta; Holdsworth, Michael J.; Bacardit, Jaume.
[16] Honglak Lee; Roger Grosse; Rajesh Ranganath; Andrew Y. Ng (2009). "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations". Proceedings of the 26th Annual International Conference on Machine Learning.
[17] Cortes, Corinna; Vapnik, Vladimir N. (1995). "Support-vector networks". Machine Learning.
[18] Goldberg, David E.; Holland, John H. (1988). "Genetic algorithms and machine learning". Machine Learning.
[19] "Federated Learning: Collaborative Machine Learning without Centralized Training Data".

Intelligent Systems material referred from "Building Intelligent Systems" by Geoff Hulten.
Project template from "Machine Learning Mastery with Python" by Jason Brownlee.
