
_____________________________________________________________________________

Machine Learning, Part I: Types of Learning Problems

•Machine learning is about just that: designing algorithms that allow a computer to learn.

•Learning, of course, doesn't necessarily imply consciousness. Rather, learning is a matter of finding statistical regularities or other patterns in the data.

•Classification of Learning Algorithms

One way to classify learning algorithms is by the type of result expected from the algorithm.

1. Classification problems

For instance, some problems are classification problems. You may be familiar with these from popular literature; a common example of a classification problem is for the computer to learn how to recognize handwritten digits. In fact, handwritten digit recognition is now extremely accurate, with some specially tuned solutions achieving over 99% accuracy (pretty good, considering some of the messy handwriting that is out there). Much of the work on digit recognition has been done in the neural network community, but more recently support vector machines have proven to be even better classifiers.
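To make this concrete, here is a minimal sketch (not one of the specially tuned systems mentioned above) that trains a support vector machine on scikit-learn's small bundled digits dataset; the dataset, kernel, and parameter values are illustrative choices, not the ones used in the research described.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    digits = load_digits()   # 1,797 8x8 images of handwritten digits 0-9
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data, digits.target, test_size=0.25, random_state=0)

    classifier = SVC(gamma=0.001)        # RBF kernel; gamma chosen for this small dataset
    classifier.fit(X_train, y_train)     # learn from the labeled training examples
    print("test accuracy:", classifier.score(X_test, y_test))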
Supervised learning can also be used in medical diagnoses--for instance,
given a set of attributes about potential cancer patients, and whether those
patients actually had cancer, the computer could learn how to distinguish
between likely cancer patients and possible false alarms. This sort of learning
could take place with neural networks or support vector machines, but another
approach is to use decision trees.
Decision trees are a relatively simple classification technique that relies on the
idea of following a tree of questions and answers: if the answer to a question is
yes, then the algorithm proceeds down one branch of the tree; if the answer to a
question is no, the algorithm takes the other branch. Eventually, the algorithm
reaches a leaf node with the final classification.
Learning decision trees is fairly straightforward and requires significantly less parameter tuning than neural nets, and clever algorithms such as AdaBoost can be used to quickly improve their performance. We'll look at decision trees (and neural networks) in later essays. For now, be aware that even apparently simple learning algorithms can accomplish a great deal. You could use a decision tree learning algorithm in almost any setting where you can collect a reasonable number of attributes and a classification for each example, and expect reasonable (though probably not outstanding) results.
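Here is a minimal decision-tree sketch using scikit-learn; the attributes and labels are invented purely for illustration and are not drawn from any real diagnostic study.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Each row: [age, tumor_size_mm, family_history (0 or 1)] -- toy attributes
    X = [[35, 2, 0], [62, 15, 1], [48, 8, 1], [29, 1, 0], [71, 20, 1], [55, 3, 0]]
    y = [0, 1, 1, 0, 1, 0]   # 0 = likely false alarm, 1 = likely cancer patient

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["age", "tumor_size_mm", "family_history"]))
    print(tree.predict([[50, 12, 1]]))   # follow the learned questions to a leaf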
A final example of classification learning to whet your appetite is speech
recognition--usually, a computer will be given a set of training instances
consisting of sounds and what word the sound corresponds to. This type of
learning could probably be carried out with neural networks, though it is hard to
imagine that the problem is simple enough for decision trees. There is another
machine learning approach, called the hidden Markov model, that is designed specifically for working with time-series data like this, and it has shown good results in speech processing.

2. Decision problems

The other common type of learning is designed not to create classifications of inputs, but to make decisions; these are called decision problems. Typically, decision problems require making some assumptions about the state of the world in order to be tractable at all. Decision problems may be one-shot, in which case only a single decision is to be made, or repeated, in which case the computer may need to make multiple decisions. When there may be multiple decisions in the future, the problem becomes decidedly trickier because it requires taking into account both the tangible consequences of an action and the possibility of gaining information by taking a certain course of action.

A common framework for understanding either type of decision problem is the concept of a utility function, borrowed from economics. The idea is that the computer (or the "agent") can get a certain amount of value from performing some actions. The utility function isn't necessarily known in advance--it may be necessary for the agent to learn which actions result in utility payoffs and which result in punishment, or negative utility.

Utility functions are usually combined with probabilities about things like
the state of the world and the likelihood that the agent's actions will actually work
in the expected way. For instance, if you were programming a robot exploring a
jungle, you might not always know exactly where the robot was located, and
sometimes when the robot took a step forward, it might end up running into a tree
and turning left. In the first case, the robot doesn't know exactly what the state of
the world is; in the second case, the robot isn't sure that its actions will work as
expected.

The common framework combining these ideas is to say that there is a probability distribution describing the state of the world, another describing the chance of different results from each action an agent can take, and a utility function that determines whether taking an action in a state is good or bad. The agent must then learn each aspect of this model that isn't given. For instance, it might know its utility function, but not what the world looks like, which might be the case for an explorer robot; or it might know the state of the world, but not the value of its actions, as might be the case if the agent were learning to play a game like backgammon.
Once these different functions (such as the probability distributions) are learned, choosing the correct action is simply a matter of deciding which action maximizes the "expected utility" of the agent. Expected utility is calculated by multiplying the probability of each payoff by the payoff and summing over all the possibilities. In some sense, this works out to the "average" value of taking an action. To plan ahead for multiple moves, a framework known as a Markov decision process is commonly used when there is only a reasonably small number of possible world states. (This doesn't work in games like chess or Go, where there are a huge number of possible states, but if you're moving around a 3x3 room, then it's not so bad.)
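As a rough sketch of that calculation, the snippet below picks the action with the highest expected utility for the jungle robot; the states, probabilities, and utility values are all made up for illustration.

    def expected_utility(action, belief, utility):
        # Sum over possible world states of P(state) * U(state, action)
        return sum(p * utility[(state, action)] for state, p in belief.items())

    belief = {"clear_path": 0.7, "tree_ahead": 0.3}    # P(state of the world)
    utility = {("clear_path", "forward"): 10, ("tree_ahead", "forward"): -20,
               ("clear_path", "turn_left"): 2, ("tree_ahead", "turn_left"): 2}

    actions = ["forward", "turn_left"]
    best = max(actions, key=lambda a: expected_utility(a, belief, utility))
    print(best)   # "turn_left": 0.7*2 + 0.3*2 = 2.0 beats 0.7*10 + 0.3*(-20) = 1.0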

In some cases, the actual calculation of the utilities can be eschewed in favor of learning algorithms that directly incorporate the information learned to make the correct move without having to plan ahead. Reinforcement learning is a common technique for this scenario, as well as for the more traditional scenario of actually learning the utility function.
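A minimal sketch of this idea, assuming a tabular Q-learning agent in the 3x3 room mentioned above, might look like the following; the reward layout and parameters are invented, and this is only one of many reinforcement learning variants.

    import random

    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]        # right, left, down, up
    GOAL, ALPHA, GAMMA, EPSILON = (2, 2), 0.5, 0.9, 0.1
    Q = {((r, c), a): 0.0 for r in range(3) for c in range(3) for a in ACTIONS}

    def step(state, action):
        # Move within the 3x3 room; reaching the goal pays 1, every other step costs a little
        r = min(max(state[0] + action[0], 0), 2)
        c = min(max(state[1] + action[1], 0), 2)
        return (r, c), (1.0 if (r, c) == GOAL else -0.01)

    for episode in range(500):
        state = (0, 0)
        while state != GOAL:
            if random.random() < EPSILON:                # occasionally explore at random
                action = random.choice(ACTIONS)
            else:                                        # otherwise act greedily
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward = step(state, action)
            # Q-learning update: nudge Q(s, a) toward reward + discounted best future value
            best_next = max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = nxt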

Note that some decision problems can be reformulated as classification problems in the following way: each decision actually classifies a state of the world by the proper action to take in that state! The trick is to come up with the right framework for your problem so that you know which techniques are most likely to be appropriate. It might be very silly to use decision trees for learning how to explore a jungle, but very reasonable to use them for picking food at a restaurant.

My Summary
Machine learning is all about designing algorithms that allow a computer to learn. There are many ways to classify these algorithms. A common classification is:
i. Classifying the input - the algorithm is expected to classify something, like handwriting or speech recognition.
ii. Decision making - the algorithm is expected to make a decision, for example a robot walking in a jungle.
_____________________________________________________________________________

Machine Learning, Part II: Supervised and Unsupervised Learning

1. Supervised Learning

Supervised learning is fairly common in classification problems because the goal is often to get the computer to learn a classification system that we have created. Digit recognition, once again, is a common example of classification learning. More generally, classification learning is appropriate for any problem where deducing a classification is useful and the classification is easy to determine. In some cases, it might not even be necessary to give pre-determined classifications to every instance of a problem if the agent can work out the classifications for itself; this would be an example of unsupervised learning in a classification context.

Supervised learning is the most common technique for training neural networks
and decision trees. Both of these techniques are highly dependent on the
information given by the pre-determined classifications. In the case of neural
networks, the classification is used to determine the error of the network and then
adjust the network to minimize it, and in decision trees, the classifications are
used to determine what attributes provide the most information that can be used
to solve the classification puzzle. We'll look at both of these in more detail, but for
now, it should be sufficient to know that both of these examples thrive on having
some "supervision" in the form of pre-determined classifications.

Speech recognition using hidden Markov models and Bayesian networks relies
on some elements of supervision as well in order to adjust parameters to, as
usual, minimize the error on the given inputs.

Notice something important here: in the classification problem, the goal of the
learning algorithm is to minimize the error with respect to the given inputs. These
inputs, often called the "training set", are the examples from which the agent tries
to learn. But learning the training set well is not necessarily the best thing to do.
For instance, if I tried to teach you exclusive-or, but only showed you combinations consisting of one true and one false, never both false or both true, you might learn the rule that the answer is always true. Similarly, with
machine learning algorithms, a common problem is over-fitting the data and
essentially memorizing the training set rather than learning a more general
classification technique.

As you might imagine, not all training sets have the inputs classified correctly.
This can lead to problems if the algorithm used is powerful enough to memorize
even the apparently "special cases" that don't fit the more general principles.
This, too, can lead to overfitting, and it is a challenge to find algorithms that are
both powerful enough to learn complex functions and robust enough to produce
generalizable results.
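As a rough illustration of over-fitting, the following sketch compares an unconstrained decision tree with a depth-limited one on a small, noisy synthetic dataset; the dataset and model choices are arbitrary.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # A small, slightly noisy synthetic dataset (flip_y mislabels 10% of examples)
    X, y = make_classification(n_samples=200, n_features=20, flip_y=0.1, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    deep = DecisionTreeClassifier().fit(X_train, y_train)                # free to memorize
    shallow = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)  # more constrained

    for name, model in [("deep", deep), ("shallow", shallow)]:
        print(name, "train:", model.score(X_train, y_train),
              "test:", model.score(X_test, y_test))
    # The unconstrained tree typically scores ~1.0 on the training set but worse on the
    # held-out set than the shallow tree: it has memorized noise rather than generalized.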

2. Unsupervised learning

Unsupervised learning seems much harder: the goal is to have the computer
learn how to do something that we don't tell it how to do! There are actually two
approaches to unsupervised learning. The first approach is to teach the
agent not by giving explicit categorizations, but by using some sort of
reward system to indicate success. Note that this type of training will generally
fit into the decision problem framework because the goal is not to produce a
classification but to make decisions that maximize rewards. This approach nicely
generalizes to the real world, where agents might be rewarded for doing certain
actions and punished for doing others.

Often, a form of reinforcement learning can be used for unsupervised learning, where the agent bases its actions on the previous rewards and punishments without necessarily even learning any information about the exact ways that its actions affect the world. In a way, all of this information is unnecessary because, by learning a reward function, the agent simply knows what to do without any further processing: it knows the exact reward it expects to achieve for each action it could take. This can be extremely beneficial in cases where calculating every possibility would be very time consuming (even if all of the transition probabilities between world states were known). On the other hand, it can be very time consuming to learn by, essentially, trial and error.

But this kind of learning can be powerful because it assumes no pre-discovered classification of examples. In some cases, for example, our classifications may not be the best possible. One striking example is that the conventional wisdom about the game of backgammon was turned on its head when a series of computer programs (Neurogammon and TD-Gammon) that learned through unsupervised learning became stronger than the best human backgammon players merely by playing themselves over and over. These programs discovered principles that surprised the backgammon experts, and they performed better than backgammon programs trained on pre-classified examples.

A second type of unsupervised learning is called clustering. In this type of learning, the goal is not to maximize a utility function, but simply to find similarities in the training data. The assumption is often that the clusters discovered will match reasonably well with an intuitive classification. For instance, clustering individuals based on demographics might result in a clustering of the wealthy in one group and the poor in another.

Although the algorithm won't have names to assign to these clusters, it can
produce them and then use those clusters to assign new examples into one or
the other of the clusters. This is a data-driven approach that can work well when
there is sufficient data; for instance, social information filtering algorithms, such
as those that Amazon.com uses to recommend books, are based on the principle
of finding similar groups of people and then assigning new users to groups. In
some cases, such as with social information filtering, the information about other
members of a cluster (such as what books they read) can be sufficient for the
algorithm to produce meaningful results. In other cases, the clusters may merely be a useful tool for a human analyst. Unfortunately, even
unsupervised learning suffers from the problem of overfitting the training data.
There's no silver bullet to avoiding the problem because any algorithm that can
learn from its inputs needs to be quite powerful.
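As a small illustration, the following sketch uses k-means (one common clustering algorithm) on invented demographic-style data; the numbers and the choice of two clusters are purely illustrative.

    from sklearn.cluster import KMeans

    # Each row: [annual income, age] -- invented demographic-style data
    people = [[25000, 23], [31000, 29], [28000, 35],
              [140000, 51], [120000, 47], [155000, 58]]

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(people)
    print(kmeans.labels_)                  # cluster index assigned to each training example
    print(kmeans.predict([[33000, 27]]))   # assign a new person to the nearest cluster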

•Summary
Unsupervised learning has produced many successes, such as world-champion
calibre backgammon programs and even machines capable of driving cars! It can
be a powerful technique when there is an easy way to assign values to actions.
Clustering can be useful when there is enough data to form clusters (though this
turns out to be difficult at times) and especially when additional data about
members of a cluster can be used to produce further results due to
dependencies in the data.

Classification learning is powerful when the classifications are known to be correct (for instance, when dealing with diseases, it's generally straightforward to determine the diagnosis after the fact by an autopsy), or when the classifications are simply arbitrary things that we would like the computer to be able to recognize for us. Classification learning is often necessary when the decisions made by the algorithm will be required as input somewhere else; otherwise, it wouldn't be easy for whoever requires that input to figure out what it means.

Both techniques can be valuable and which one you choose should depend on
the circumstances--what kind of problem is being solved, how much time is
allotted to solving it (supervised learning or clustering is often faster than
reinforcement learning techniques), and whether supervised learning is even
possible.

My Summary
Supervised learning - a set of examples is given to study and learn from. More suitable for classification problems. The example set has to be provided with extreme care.
Unsupervised learning - we do not give any example cases or anything; the algorithm itself must learn things. There are two approaches to this:
i. Reward system.
ii. Clustering.

_____________________________________________________________________________

Machine Learning, Part III: Testing Algorithms, and The "No Free Lunch
Theorem"

1. Testing Machine Learning Algorithms

Now that you have a sense of the classifications of machine learning algorithms,
before diving into the specifics of individual algorithms, the only other background
required is a sense of how to test machine learning algorithms.

In most scenarios, there will be three types of data: training set data and testing
data will be used to train and evaluate the algorithm, but the ultimate test will be
how it performs on real data. We'll focus primarily on results from training and
testing because we'll generally assume that the test set is a reasonable
approximation of the real world. (As an aside, some machine learning techniques
use a fourth set of data, called a validation set, which is used during training to
avoid overfitting. We'll discuss validation sets when we look at decision trees
because they are a common optimization for decision tree learning.)

2. Dealing with insufficient data

We've already talked a bit about the fact that algorithms may over-fit the training
set. A corollary of this principle is that a learning algorithm should never be
evaluated by its results on the training set, because this shows no evidence of an
ability to generalize to unseen instances. It's important that the training and test
sets be kept distinct (or at least that the two be independently generated, even if
they do share some inputs to the algorithm).

Unfortunately, this can lead to issues when the amount of training and test data
is relatively limited. If, for instance, you only have 20 samples, there's not much
data to use for a training set and still leave a significant test set. A solution to this
problem is to run your algorithm twenty times (once for each input), using 19 of
the samples as training data and the last sample as a test set so that you end up
using each sample to test your results once. This gives you a much larger
training set for each trial, meaning that your algorithm will have enough data to
learn from, but it also gives a fairly large number of tests (20 instead of 5 or 10).
The drawback to this approach is that the results of each individual test aren't
going to be independent of the results of the other tests. Nevertheless, if you're in
a bind for data, this can yield passable results with lower variance than simply
using one test set and one training set.
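This procedure is usually called leave-one-out cross-validation. A minimal sketch, assuming scikit-learn and a made-up 20-sample dataset:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Pretend we only have 20 labeled samples
    X, y = make_classification(n_samples=20, n_features=5, random_state=0)

    # Train on 19 samples, test on the single held-out sample, 20 times in total
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=LeaveOneOut())
    print(len(scores), scores.mean())   # 20 one-sample tests, averaged into one estimate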

3. The No Free Lunch Theorem and the importance of bias

So far, a major theme in these machine learning articles has been having
algorithms generalize from the training data rather than simply memorizing it. But
there is a subtle issue that plagues all machine learning algorithms, summarized
as the "no free lunch theorem". The gist of this theorem is that you can't get
learning "for free" just by looking at training instances. Why not? Well, the fact is,
the only things you know about the data are what you have seen.

For example, if I give you the training inputs (0, 0) and (1, 1) and the
classifications of the input as both being "false", there are two obvious
hypotheses that fit the data: first, every possible input could result in a
classification of false. On the other hand, every input except the two
training inputs might be true--these training inputs could be the only examples of
inputs that are classified as false. In short, given a training set, there are always
at least two equally plausible, but totally opposite, generalizations that could be
made. This means that any learning algorithm requires some kind of "bias" to
distinguish between these plausible outcomes.

Some algorithms may have such strong biases that they can only learn certain
kinds of functions. For instance, a certain kind of basic neural network, the
perceptron, is biased to learning only linear functions (functions with inputs that
can be separated into classifications by drawing a line). This can be a weakness
in cases where the input isn't actually linearly separable, but if the input is linearly
separable, this bias can force the right generalization where more flexible algorithms might have more trouble.

For instance, if you have the inputs ((0, 0), false), ((1, 0), true), and ((0, 1), true),
then the only linearly separable function possible would be the boolean OR
function, which the perceptron would learn. Now, if the true function were
boolean OR, then the perceptron would have correctly generalized from three
training instances to the full set of instances. On the other hand, a more flexible algorithm, lacking that bias toward a certain type of function, might not have made the same generalization.
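A minimal perceptron sketch, trained on exactly those three instances (with false as 0 and true as 1), shows this generalization; the epoch count and update rule constants are arbitrary choices.

    def train_perceptron(examples, epochs=20):
        # Classic perceptron update rule; the weights define a line separating the classes
        w, b = [0.0, 0.0], 0.0
        for _ in range(epochs):
            for (x1, x2), target in examples:
                prediction = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
                error = target - prediction
                w[0] += error * x1
                w[1] += error * x2
                b += error
        return w, b

    # The three training instances from the text: ((0,0), false), ((1,0), true), ((0,1), true)
    w, b = train_perceptron([((0, 0), 0), ((1, 0), 1), ((0, 1), 1)])
    print(1 if w[0] * 1 + w[1] * 1 + b > 0 else 0)   # predicts 1 (true) for the unseen (1, 1)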

We'll often see general algorithms specialized with constraints about the problem domain; these constraints introduce subtle biases that enable better generalization and, consequently, better learning. One example, when working with digit recognition, would be to create a specialized neural network designed to mimic the way the eye perceives data. This can actually give (in fact, it has given) better results than using a more powerful but more general network.
Continue to Decision Trees, Part I: Understanding Decision Trees, which covers how decision trees work and how they exploit a particular type of bias to improve learning.

My Summary

Testing - there are basically three types of data used with machine learning:
i. Training data set
ii. Testing data
iii. Real-world data

_____________________________________________________________________________
