Lecture 11 & 12
INTRODUCTION
What is Machine Learning?
- The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience (T. Mitchell)
- Principles, methods, and algorithms for learning and prediction on the basis of past experience
In the broadest sense, any method that incorporates information from training samples in the design of a classifier employs learning
What is Machine Learning? We tend to view learning only in the manner in which humans learn, i.e. incrementally over time. This need not be the case where ML algorithms are concerned.
What is Machine Learning? An overly complex decision model may lead to worse classification than a simple model.
What is Machine Learning? Maybe this model is an optimal trade-off between model complexity and performance on the training set.
What is Machine Learning? A classification problem: predicting the grades of students taking this course
Key Steps:
1. Data (what past experience can we rely on?)
2. Assumptions (what can we assume about the students or the course?)
3. Representation (how do we summarize a student?)
4. Estimation (how do we construct a map from students to grades?)
5. Evaluation (how well are we predicting?)
6. Model Selection (perhaps we can do even better?)
What is Machine Learning? 2. Assumptions: There are many assumptions we can make to facilitate predictions
1. The course has remained roughly the same over the years
2. Each student performs independently of the others
3. Representation:
Academic records are rather diverse, so we might limit the summaries to a select few courses. For example, we can summarize the ith student (say Pete) with a vector Xi = [A C B], where the grades may correspond to numerical values.
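As a sketch of this representation (the 4/3/2/1 grade scale and the choice of three courses are illustrative assumptions, not from the lecture):

```python
# Encode letter grades as numbers so each student becomes a numeric vector.
# The 4/3/2/1 scale and the three selected courses are illustrative assumptions.
GRADE_VALUE = {"A": 4, "B": 3, "C": 2, "D": 1}

def summarize(grades):
    """Summarize a student's selected course grades as a feature vector Xi."""
    return [GRADE_VALUE[g] for g in grades]

# Pete, the ith student, with grades [A C B] in the selected courses:
x_pete = summarize(["A", "C", "B"])
print(x_pete)  # [4, 2, 3]
```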
3. Representation:
The available data in this representation is:
4. Estimation
Given the training data, i.e. the student summaries together with their ML grades (such as B or A), we need to find a mapping from input vectors x to labels y encoding the grades for the ML course.
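One simple way to construct such a mapping is the nearest-neighbor classifier that the slides mention later under model selection; a minimal sketch, with made-up training pairs:

```python
def nearest_neighbor(x, training_data):
    """Predict the label of x as the label of the closest training vector,
    using squared Euclidean distance."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(training_data, key=lambda pair: dist2(x, pair[0]))
    return label

# Illustrative training pairs (x_i, y_i): numeric student summaries and ML grades.
train = [([4, 2, 3], "B"), ([4, 4, 4], "A"), ([2, 2, 2], "C")]
print(nearest_neighbor([4, 3, 4], train))  # closest to [4, 4, 4] -> A
```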
5. Evaluation
How can we tell how good our predictions are?
- we can wait till the end of this course...
- we can try to assess the accuracy based on the data we already have (training data)
Possible solution:
- divide the training set further into training and test sets
- evaluate the classifier constructed on the basis of the training set on the test set
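This train/test protocol can be sketched as follows (the data and the decision rule are illustrative assumptions):

```python
def accuracy(classify, test_set):
    """Fraction of held-out test examples that the classifier labels correctly."""
    return sum(classify(x) == y for x, y in test_set) / len(test_set)

# Illustrative labeled data: numeric student summaries with their ML grades.
data = [([4, 2, 3], "B"), ([4, 4, 4], "A"), ([2, 2, 2], "C"), ([4, 4, 3], "A")]
train, test = data[:3], data[3:]  # hold out the last example for testing

# A deliberately simple rule fit to the training part: a high average grade
# predicts an A, otherwise a B (an illustrative rule, not from the lecture).
classify = lambda x: "A" if sum(x) / len(x) >= 3.5 else "B"

print(accuracy(classify, test))  # 1.0
```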
6. Model Selection
We can refine:
- the estimation algorithm (e.g., using a classifier other than the nearest-neighbor classifier)
- the representation (e.g., base the summaries on a different set of courses)
- the assumptions (e.g., perhaps students work in groups)
etc.
We have to rely on the method of evaluating the accuracy of our predictions to select among the possible refinements.
Types of Machine Learning Data can be:
- Symbolic or Categorical (e.g. High Temperature)
- Numerical (e.g. 45 °C)
We will be primarily dealing with symbolic data. Numerical data is primarily dealt with by artificial neural networks, which have evolved into a separate field.
Types of Machine Learning From the available data we can
- model the system which has generated the data
- find interesting patterns in the data
We will be primarily concerned with rule-based modelling of the system from which the data was generated.
The search for interesting patterns is considered to be the domain of Data Mining
Types of Machine Learning A complete pattern recognition (or classification) system consists of several steps. We will be primarily concerned with the development of classifier systems.
Types of Machine Learning
- Supervised learning, where we get a set of training inputs and outputs. The correct output for the training samples is available.
- Unsupervised learning, where we are interested in capturing the inherent organization of the data. No specific output values are supplied with the learning patterns.
- Reinforcement learning, where no exact outputs are supplied, but there is a reward (reinforcement) for desirable behaviour.
Why Use Machine Learning? First, there are problems for which no human experts exist.
Example: in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules.
Second, there are problems where human experts exist, but where they are unable to explain their expertise.
This is the case in many perceptual tasks, such as speech recognition, handwriting recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps they follow while performing them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs.
Why Use Machine Learning? Third, there are problems where phenomena are changing rapidly. Example: people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. The rules and parameters governing these behaviors change frequently, so that the computer program for prediction would need to be rewritten frequently.
Why Use Machine Learning? Fourth, there are applications that need to be customized for each computer user separately. Example: a program to filter unwanted electronic mail messages. Different users will need different filters.
VERSION SPACE
Example: the concept of a ball
* red, round, small
* green, round, small
* red, round, medium
Complicated concepts: situations in which I should study more to pass the exam
Each concept can be thought of as a Boolean-valued function whose value is true for some inputs and false for all the rest (e.g. a function defined over all the animals, whose value is true for birds and false for all the other animals)
The problem of automatically inferring the general definition of some concept, given examples labeled as members or nonmembers of the concept, is called concept learning, or approximating (inferring) a Boolean-valued function from examples.
Concept Learning by Induction Target concept to be learnt: days on which Aldo enjoys his favorite water sport. The training examples are:
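The table itself did not survive extraction; for reference, the standard EnjoySport training set from Mitchell's Machine Learning textbook (written with lowercase attribute values, matching the hypotheses shown later in these slides) is:

```python
# Mitchell's EnjoySport training examples:
# (sky, temp, humidity, wind, water, forecast) -> EnjoySport
TRAINING_EXAMPLES = [
    (("sunny", "warm", "normal", "strong", "warm", "same"),   "yes"),
    (("sunny", "warm", "high",   "strong", "warm", "same"),   "yes"),
    (("rainy", "cold", "high",   "strong", "warm", "change"), "no"),
    (("sunny", "warm", "high",   "strong", "cool", "change"), "yes"),
]
```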
Concept Learning by Induction: Hypothesis Representation The possible concepts are called Hypotheses and we need an appropriate representation for the hypotheses Let the hypothesis be a conjunction of constraints on the attribute-values
Concept Learning by Induction: Hypothesis Representation
If sky = sunny and temp = warm and humidity = ? and wind = strong and water = ? and forecast = same
then Enjoy Sport = Yes
else Enjoy Sport = No
Alternatively, this can be written as: {sunny, warm, ?, strong, ?, same}
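Checking whether an instance satisfies such a hypothesis is mechanical; a minimal sketch (the function name is ours, and None stands in for the ∅ constraint):

```python
def satisfies(instance, hypothesis):
    """True iff the instance meets every attribute constraint of the hypothesis.
    '?' matches any value; None (the empty-set constraint) matches nothing."""
    return all(h == "?" or (h is not None and h == x)
               for x, h in zip(instance, hypothesis))

h = ("sunny", "warm", "?", "strong", "?", "same")
print(satisfies(("sunny", "warm", "normal", "strong", "warm", "same"), h))  # True
print(satisfies(("rainy", "cold", "high", "strong", "warm", "change"), h))  # False
```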
Concept Learning by Induction: Hypothesis Representation For each attribute, the hypothesis will have either:
- ? : any value is acceptable
- a single specified value : only that value is acceptable
- ∅ : no value is acceptable
Concept Learning by Induction: Hypothesis Representation If some instance (example/observation) satisfies all the constraints of a hypothesis, then it is classified as positive (belonging to the concept). The most general hypothesis is {?, ?, ?, ?, ?, ?}; it would classify every example as positive. The most specific hypothesis is {∅, ∅, ∅, ∅, ∅, ∅}; it would classify every example as negative.
Concept Learning by Induction: Hypothesis Representation An alternate hypothesis representation could have been a disjunction of several conjunctions of constraints on the attribute-values. Example: {sunny, warm, normal, strong, warm, same} ∨ {sunny, warm, high, strong, warm, same} ∨ {sunny, warm, high, strong, cool, change}
Concept Learning by Induction: Hypothesis Representation Another alternate hypothesis representation could have been a conjunction of constraints on the attribute-values where each constraint may be a disjunction of values. Example: {sunny, warm, normal ∨ high, strong, warm ∨ cool, same ∨ change}
Concept Learning by Induction: Hypothesis Representation Yet another alternate hypothesis representation could have incorporated negations
Concept Learning by Induction: Hypothesis Representation By selecting a hypothesis representation, the space of all hypotheses (that the program can ever represent and therefore can ever learn) is implicitly defined. In our example, the instance space X contains 3 · 2 · 2 · 2 · 2 · 2 = 96 distinct instances.
There are 5 · 4 · 4 · 4 · 4 · 4 = 5120 syntactically distinct hypotheses. Since every hypothesis containing even one ∅ classifies every instance as negative, the number of semantically distinct hypotheses is 1 + 4 · 3 · 3 · 3 · 3 · 3 = 973.
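These counts can be verified directly:

```python
# Instance space: sky takes 3 values, each of the other five attributes takes 2.
instances = 3 * 2**5
print(instances)  # 96

# Syntactically distinct hypotheses: every attribute additionally allows '?'
# and the empty-set constraint, giving 5 choices for sky and 4 for the rest.
syntactic = 5 * 4**5
print(syntactic)  # 5120

# Semantically distinct hypotheses: all hypotheses containing an empty-set
# constraint classify every instance as negative, so they collapse into one.
semantic = 1 + 4 * 3**5
print(semantic)  # 973
```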
Concept Learning by Induction: Hypothesis Representation Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces
Concept Learning by Induction: Search in Hypotheses Space Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation The goal of this search is to find the hypothesis that best fits the training examples
Concept Learning by Induction: Basic Assumption Once a hypothesis that best fits the training examples is found, we can use it to predict the class label of new examples The basic assumption while using this hypothesis is: Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples
Concept Learning by Induction: General to Specific Ordering If we view learning as a search problem, then it is natural that our study of learning algorithms will examine different strategies for searching the hypothesis space Many algorithms for concept learning organize the search through the hypothesis space by relying on a general to specific ordering of hypotheses
Concept Learning by Induction: General to Specific Ordering Example: Consider h1 = {sunny, ?, ?, strong, ?, ?} and h2 = {sunny, ?, ?, ?, ?, ?}. Any instance classified positive by h1 will also be classified positive by h2 (because h2 imposes fewer constraints on the instance). Hence h2 is more general than h1, and h1 is more specific than h2.
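The more-general-than-or-equal-to relation can be tested attribute by attribute; a sketch that ignores ∅ constraints for brevity:

```python
def more_general_or_equal(h2, h1):
    """True iff every instance classified positive by h1 is also classified
    positive by h2: each constraint of h2 is '?' or agrees with h1's."""
    return all(c2 == "?" or c2 == c1 for c1, c2 in zip(h1, h2))

h1 = ("sunny", "?", "?", "strong", "?", "?")
h2 = ("sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False
```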
Concept Learning by Induction: General to Specific Ordering Consider the three hypotheses h1, h2 and h3
Note that the more-general-than relationship is independent of the target concept. It depends only on which instances satisfy the two hypotheses and not on the classification of those instances according to the target concept
Find-S Algorithm How to find a hypothesis consistent with the observed training examples?
- A hypothesis is consistent with the training examples if it correctly classifies these examples
One way is to begin with the most specific possible hypothesis, then generalize it each time it fails to cover a positive training example (i.e. classifies it as negative). The algorithm based on this method is called Find-S.
Find-S Algorithm We say that a hypothesis covers a positive training example if it correctly classifies the example as positive
Find-S Algorithm
1. Initialize h to the most specific hypothesis in H: h ← {∅, ∅, ∅, ∅, ∅, ∅}
2. For each positive training instance x: for each attribute constraint a_i in h, if x satisfies a_i, do nothing; otherwise replace a_i by the next more general constraint that is satisfied by x
3. Output hypothesis h
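The Find-S procedure can be implemented in a few lines; a sketch using the lowercase attribute values from the earlier slides, with None standing for the ∅ constraint:

```python
def find_s(examples):
    """Find-S: start from the most specific hypothesis and minimally
    generalize it to cover each positive training example."""
    n = len(examples[0][0])
    h = [None] * n                    # most specific: all empty-set constraints
    for x, label in examples:
        if label != "yes":            # Find-S ignores negative examples
            continue
        for i, value in enumerate(x):
            if h[i] is None:          # first positive example: copy its value
                h[i] = value
            elif h[i] != value:       # constraint violated: generalize to '?'
                h[i] = "?"
    return tuple(h)

examples = [
    (("sunny", "warm", "normal", "strong", "warm", "same"),   "yes"),
    (("sunny", "warm", "high",   "strong", "warm", "same"),   "yes"),
    (("rainy", "cold", "high",   "strong", "warm", "change"), "no"),
    (("sunny", "warm", "high",   "strong", "cool", "change"), "yes"),
]
print(find_s(examples))  # ('sunny', 'warm', '?', 'strong', '?', '?')
```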
Find-S Algorithm The nodes shown in the diagram are the possible hypotheses allowed by our hypothesis representation scheme
Note that our search is guided by the positive examples and we consider only those hypotheses which are consistent with the positive training examples The search moves from hypothesis to hypothesis, searching from the most specific to progressively more general hypotheses
Find-S Algorithm At each step, the hypothesis is generalized only as far as necessary to cover the new positive example
Therefore, at each stage the hypothesis is the most specific hypothesis consistent with the training examples observed up to this point Hence, it is called Find-S