Machine learning
Model and cost function
Today’s Lecture
• Review: Supervised Learning: Regression,
Classification
• Linear Regression with One Variable
• Hypothesis Function
• Cost Function
• Gradient Descent
• Linear Algebra Review
• PC Lab: LaTeX
Review:
Supervised Learning - Regression

[Figure: housing data scatter plot; x-axis: Size in m², y-axis: Price ($) in 1000's]

Supervised Learning, Regression: predict a continuous-valued output (price), given the "right answers" for each example.
Review:
Supervised Learning - Classification
Review:
Supervised learning
Review:
Unsupervised Learning
Data Set
[Figure: training set of housing prices; x-axis: Size (m²), y-axis: Price (in 1000's of dollars)]

Supervised Learning, Regression Problem:
• Given the "right answer" for each example in the data, predict a real-valued output.
Notation:
m = Number of training examples
x = “input” variable / features
y = “output” variable / “target” variable
Model representation

How do we represent h?

[Diagram: Training Set → Learning Algorithm → h; Size of house (x) → h → Estimated price (ŷ)]

$\hat{y} = h_\theta(x) = \theta_0 + \theta_1 x$
Example
• Suppose we have the following set of training data:

    Input x    Output y
       0          4
       1          7
       2          7
       3          8

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
$\theta_i$'s: the parameters of the model
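A minimal sketch of evaluating the hypothesis on this training set. The parameter values $\theta_0 = 4$, $\theta_1 = 1.5$ are illustrative guesses, not values given in the lecture; they are chosen only to show how $h_\theta$ maps inputs to predictions.

```python
# Sketch: the hypothesis h_theta(x) = theta0 + theta1 * x evaluated on the
# training set above. theta0 = 4.0 and theta1 = 1.5 are illustrative
# guesses, not values derived in the lecture.
X = [0, 1, 2, 3]   # "input" variable x
y = [4, 7, 7, 8]   # "output" / "target" variable y
m = len(X)         # number of training examples

def h(x, theta0, theta1):
    """Hypothesis: a straight line parameterised by theta0 and theta1."""
    return theta0 + theta1 * x

theta0, theta1 = 4.0, 1.5
for x_i, y_i in zip(X, y):
    print(f"x = {x_i}: prediction {h(x_i, theta0, theta1):.1f}, target {y_i}")
```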
Cost function

Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\underset{\theta_0, \theta_1}{\text{minimize}}\; J(\theta_0, \theta_1)$

Simplified ($\theta_0 = 0$):
Hypothesis: $h_\theta(x) = \theta_1 x$
Parameters: $\theta_1$
Cost function: $J(\theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\underset{\theta_1}{\text{minimize}}\; J(\theta_1)$
$h_\theta(x)$ and $J(\theta_1)$ for $\theta_1 = 1$:
[Figure: left panel, the line $h_\theta(x) = x$ through the training data; right panel, the corresponding point on the $J(\theta_1)$ curve]
$J(\theta_1) = 0$, if $\theta_1 = 1$
$h_\theta(x)$ and $J(\theta_1)$ for $\theta_1 = 0.5$:
[Figure: left panel, the line $h_\theta(x) = 0.5x$; right panel, the corresponding point on the $J(\theta_1)$ curve]
$J(\theta_1) = 0.58$, if $\theta_1 = 0.5$
$h_\theta(x)$ and $J(\theta_1)$ for $\theta_1 = 0$:
[Figure: left panel, the flat line $h_\theta(x) = 0$; right panel, the corresponding point on the $J(\theta_1)$ curve]
$J(\theta_1) = 2.3$, if $\theta_1 = 0$
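The cost values quoted above ($J(1) = 0$, $J(0.5) \approx 0.58$, $J(0) \approx 2.3$) are consistent with a three-point training set (1, 1), (2, 2), (3, 3); that data set is an assumption here, used only to reproduce the plotted numbers in a short sketch.

```python
# Sketch: evaluate the simplified cost J(theta1) (theta0 fixed at 0) for the
# theta1 values discussed above. The training set (1,1), (2,2), (3,3) is an
# assumption chosen because it reproduces the costs quoted on the slides.
X = [1, 2, 3]
y = [1, 2, 3]
m = len(X)

def J(theta1):
    """J(theta1) = 1/(2m) * sum_i (theta1 * x_i - y_i)^2."""
    return sum((theta1 * x_i - y_i) ** 2 for x_i, y_i in zip(X, y)) / (2 * m)

for t1 in (1.0, 0.5, 0.0):
    print(f"J({t1}) = {J(t1):.2f}")   # -> 0.00, 0.58, 2.33 (~2.3 on the slide)
```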
Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Parameters: $\theta_0, \theta_1$
Cost function: $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Goal: $\underset{\theta_0, \theta_1}{\text{minimize}}\; J(\theta_0, \theta_1)$
$h_\theta(x)$ (for fixed $\theta_0, \theta_1$, this is a function of x) vs. $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$)
[Figure: left panel, training data with the line $h_\theta(x) = 50 + 0.06x$; x-axis: Size in m², y-axis: Price ($) in 1000's. Right panel: contour plot of $J(\theta_0, \theta_1)$]
[Further figures: other candidate lines $h_\theta(x)$ plotted against the training data (Size in m² vs. Price ($) in 1000's), each paired with the corresponding point on the contour plot of $J(\theta_0, \theta_1)$ as a function of the parameters]
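To make the distinction concrete (each fixed $(\theta_0, \theta_1)$ is one line in x, and one point in parameter space), the sketch below scans a small grid of parameter values and evaluates $J$ at each; this is the kind of computation behind the contour plots. The toy data set and grid ranges are assumptions, not values from the lecture.

```python
# Sketch: every (theta0, theta1) pair defines one line h_theta(x) and one
# value of J(theta0, theta1). Scanning a grid of parameters gives the
# surface / contour picture shown on the slides. Data and grid ranges are
# illustrative assumptions.
X = [1, 2, 3]
y = [1, 2, 3]
m = len(X)

def J(theta0, theta1):
    return sum((theta0 + theta1 * x_i - y_i) ** 2
               for x_i, y_i in zip(X, y)) / (2 * m)

# grid over theta0, theta1 in [-2, 2] with step 0.1
best = min((J(t0 / 10, t1 / 10), t0 / 10, t1 / 10)
           for t0 in range(-20, 21) for t1 in range(-20, 21))
print(f"lowest cost on the grid: J({best[1]}, {best[2]}) = {best[0]:.3f}")
```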
Gradient descent algorithm
Linear regression with one variable
• Outline:
• Start with some $\theta_0, \theta_1$
• Keep changing $\theta_0, \theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum
Gradient descent update rule:

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$   (for $j = 0$ and $j = 1$; simultaneously update both parameters)

$\alpha$: learning rate
$\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$: partial derivative of the cost with respect to $\theta_j$
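A sketch of one "simultaneous update" step: both partial derivatives are computed at the current $(\theta_0, \theta_1)$ and stored in temporaries before either parameter is overwritten. The derivative functions use the linear-regression formulas derived later in the lecture; the toy data set and $\alpha = 0.1$ are illustrative assumptions.

```python
# Sketch: simultaneous update of theta0 and theta1. The derivatives below
# are the linear-regression partials worked out later in the lecture; the
# data set and alpha = 0.1 are illustrative assumptions.
X = [1, 2, 3]
y = [1, 2, 3]
m = len(X)

def dJ_dtheta0(t0, t1):
    return sum((t0 + t1 * x_i - y_i) for x_i, y_i in zip(X, y)) / m

def dJ_dtheta1(t0, t1):
    return sum((t0 + t1 * x_i - y_i) * x_i for x_i, y_i in zip(X, y)) / m

alpha = 0.1
theta0, theta1 = 0.0, 0.0

# one gradient-descent step: evaluate both derivatives at the OLD values,
# then assign both parameters at once
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1
print(theta0, theta1)
```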
[Figure: $J(\theta_1)$ plotted against $\theta_1$, with successive gradient-descent steps moving toward the minimum]

$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_1)$

If $\alpha$ is too small, gradient descent can be slow.
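A small numerical check of the "too small $\alpha$ is slow" remark, again on the assumed (1, 1), (2, 2), (3, 3) data with $\theta_0$ fixed at 0. The two learning rates, the starting point, and the stopping tolerance are all illustrative choices.

```python
# Sketch: count the iterations gradient descent on J(theta1) needs to get
# close to the minimiser theta1 = 1 for two learning rates. Data, learning
# rates, start value and tolerance are illustrative assumptions.
X = [1, 2, 3]
y = [1, 2, 3]
m = len(X)

def dJ_dtheta1(t1):
    return sum((t1 * x_i - y_i) * x_i for x_i, y_i in zip(X, y)) / m

def iterations_to_converge(alpha, tol=1e-6):
    t1, steps = 0.0, 0
    while abs(t1 - 1.0) > tol and steps < 100_000:
        t1 -= alpha * dJ_dtheta1(t1)
        steps += 1
    return steps

for alpha in (0.2, 0.001):
    print(f"alpha = {alpha}: {iterations_to_converge(alpha)} iterations")
```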
$\theta_1 := \theta_1 - \alpha \frac{\partial}{\partial \theta_1} J(\theta_1)$
[Figure: $J(\theta_1)$ plotted against $\theta_1$]
Gradient descent for linear regression:

Linear regression model:
$h_\theta(x) = \theta_0 + \theta_1 x$
$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Gradient descent update (for $j = 0$ and $j = 1$):
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$

Partial derivatives of the cost:
$j = 0: \quad \frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$
$j = 1: \quad \frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$
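Putting the pieces together: a compact batch gradient-descent loop for one-variable linear regression using the two derivatives above. The training set is the small example from earlier in the lecture; $\alpha = 0.1$ and the iteration count are illustrative assumptions. On this data the loop should approach the least-squares line $\theta_0 = 4.7$, $\theta_1 = 1.2$.

```python
# Sketch: batch gradient descent for h_theta(x) = theta0 + theta1 * x using
# the two partial derivatives above. The data is the earlier example table;
# alpha = 0.1 and 2000 iterations are illustrative choices.
X = [0, 1, 2, 3]
y = [4, 7, 7, 8]
m = len(X)
alpha = 0.1

theta0, theta1 = 0.0, 0.0
for _ in range(2000):
    errors = [theta0 + theta1 * x_i - y_i for x_i, y_i in zip(X, y)]
    grad0 = sum(errors) / m                                # dJ/dtheta0
    grad1 = sum(e * x_i for e, x_i in zip(errors, X)) / m  # dJ/dtheta1
    # simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

cost = sum((theta0 + theta1 * x_i - y_i) ** 2 for x_i, y_i in zip(X, y)) / (2 * m)
print(f"theta0 = {theta0:.2f}, theta1 = {theta1:.2f}, J = {cost:.3f}")
# expected to approach the least-squares fit: theta0 = 4.7, theta1 = 1.2
```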
Quiz:
Which of the following are true statements? Select all that apply.