
Andi Sudiarso

Mechanical & Industrial Engineering

Faculty of Engineering
Gadjah Mada University

Type of relationship                         : dependence
Number of predicted variables                : one
Type of relationship                         : single
Measurement scale of the dependent variable  : metric

Purpose:
To predict the changes in the dependent variable
as a result of changes in the independent
variables.

Previous road resurfacing project


y : cost (in thousand USD; 1 USD = Rp 10,000.00)
x : road length (in miles; 1 mile = 1.6093 km)

x     y
1     6
3     8
4    10
5    14
7    20

Now a new project is available: resurfacing a 6-mile road.
No 6-mile project has been done before.
The question is: how much will it cost?

What is regression?
The problem of fitting a line to the data, i.e. pairs of
numbers (x,y).
The problem of predicting one variable (y) from
values of another variable (x).
To use data on a quantitative independent variable
to predict or explain variation in a quantitative
dependent variable (Ott, 2001). Prediction refers to
future values, explanation refers to current or past
values; both require a unit of association.

Type of regression analysis


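Linear regression
$y = \beta_0 + \beta_1 x + \varepsilon$
Polynomial regression
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ (quadratic)
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$ (cubic)
etc.
Non-linear regression, for example:
$y = \beta_0 + \beta_1 \sin(\beta_2 x) + \varepsilon$
$y = \beta_0 + \beta_1 e^{\beta_2 x} + \varepsilon$
Multiple regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \varepsilon$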

Simple linear regression


$y = \beta_0 + \beta_1 x$

where
$\beta_0$ : the intercept, the value of y when x = 0
$\beta_1$ : the slope, the change in y for a one-unit
change in x

Linear regression (complete form)


$y = \beta_0 + \beta_1 x + \varepsilon$

where
$\varepsilon$ : random error, the deviation of actual y values from their
predicted values (unpredictable and ignored factors)

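To estimate the value of parameters

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

The mean squared error (MSE)
$\mathrm{MSE} = [(n-1) s_y^2 - \hat{\beta}_1^2 (n-1) s_x^2] / (n-2)$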
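The estimation formulas above can be tried on the road resurfacing data. The following is a minimal sketch, assuming the road lengths 1, 3, 4, 5, 7 pair in order with the costs 6, 8, 10, 14, 20 (the pairing is inferred from the slide, not stated explicitly); all variable names are illustrative.

```python
# Road resurfacing data; the pairing of x and y values is an assumption.
x = [1, 3, 4, 5, 7]      # road length in miles
y = [6, 8, 10, 14, 20]   # cost in thousand USD

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: sum((x_i - x_bar) * y_i) / sum((x_i - x_bar)^2)
ss_x = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / ss_x

# Intercept: y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

# MSE = [(n-1) s_y^2 - b1^2 (n-1) s_x^2] / (n-2), where (n-1) s^2 is the sum of squares
ss_y = sum((yi - y_bar) ** 2 for yi in y)
mse = (ss_y - b1 ** 2 * ss_x) / (n - 2)

# Predicted cost of the new 6-mile project
cost_6_miles = b0 + b1 * 6
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, MSE = {mse:.2f}, cost(6 miles) = {cost_6_miles:.1f}")
```

Under that pairing, the fitted line has a slope of 2.4 and an intercept of 2.0, so the predicted cost of the 6-mile project is about 16.4 thousand USD.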

The variance is the measure of dispersion about the mean

$s_x^2 = MS_x = SS_x / v = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$
$s_y^2 = MS_y = SS_y / v = \sum_{i=1}^{n} (y_i - \bar{y})^2 / (n-1)$
$s_{xy} = MS_{xy} = SS_{xy} / v = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) / (n-1)$
(sample covariance between x and y)

where
MS : the mean square
SS : the sum of squares
v  : the number of degrees of freedom = n-1 (sampling)
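As a small illustration of these definitions, the sketch below computes the sample variances and the sample covariance with n-1 degrees of freedom. The helper names `sample_variance` and `sample_covariance` are hypothetical, and the example values are the road resurfacing numbers under the assumption that lengths 1, 3, 4, 5, 7 pair in order with costs 6, 8, 10, 14, 20.

```python
def sample_variance(values):
    """Return SS / v, with v = n - 1 degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def sample_covariance(xs, ys):
    """Return SS_xy / v, with v = n - 1 degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Example values (assumed pairing of the road resurfacing data)
x = [1, 3, 4, 5, 7]
y = [6, 8, 10, 14, 20]

s_x2 = sample_variance(x)
s_y2 = sample_variance(y)
s_xy = sample_covariance(x, y)
print(s_x2, s_y2, s_xy)
```

Dividing the sample covariance by the variance of x gives the same slope as the estimation formula, since the (n-1) factors cancel.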

The Coleman Report (USA, 1977)


y : the mean verbal test score for 6th graders
x : a composite measure of socio-economic status

School      y        x
  1       37.01     7.20
  2       26.51   -11.71
  3       36.51    12.32
  4       40.70    14.28
  5       37.10     6.31
  6       33.90     6.16
  7       41.80    12.70
  8       33.40    -0.17
  9       41.01     9.85
 10       37.20    -0.05
 11       23.30   -12.86
 12       35.20     0.92
 13       34.90     4.77
 14       33.10    -0.96
 15       22.70   -16.04
 16       39.70    10.62
 17       31.80     2.66
 18       31.70   -10.99
 19       43.10    15.03
 20       41.01    12.77

The Coleman Report (USA, 1977)


[Figure: plot of y versus x (scatter plot); x axis from -20 to 20, y axis from 20 to 45]

The Coleman Report (USA, 1977)


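Estimate the parameters
n = 20, $\bar{x} = 3.14$, $s_x^2 = 92.65$
$\bar{y} = 35.08$, $s_y^2 = 33.84$
$\sum_{i=1}^{n} x_i y_i = 3189.88$

$\hat{\beta}_1 = [3189.88 - 20(3.14)(35.08)] / [(20-1)(92.65)] = 0.56$
$\hat{\beta}_0 = 35.08 - 0.56(3.14) = 33.32$
$\mathrm{MSE} = [(20-1)(33.84) - (0.56)^2 (20-1)(92.65)] / (20-2) = 5.01$ (using the unrounded slope)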
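The summary statistics and estimates above can be reproduced from the raw table. This is a sketch only, assuming the first numeric column of the table is y and the second is x (an assignment that matches the stated means of 35.08 and 3.14); variable names are illustrative.

```python
# Coleman Report data for the 20 schools (column assignment is an assumption)
y = [37.01, 26.51, 36.51, 40.70, 37.10, 33.90, 41.80, 33.40, 41.01, 37.20,
     23.30, 35.20, 34.90, 33.10, 22.70, 39.70, 31.80, 31.70, 43.10, 41.01]
x = [7.20, -11.71, 12.32, 14.28, 6.31, 6.16, 12.70, -0.17, 9.85, -0.05,
     -12.86, 0.92, 4.77, -0.96, -16.04, 10.62, 2.66, -10.99, 15.03, 12.77]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Slope via the shortcut sum(x*y) - n * x_bar * y_bar, as on the slide
b1 = (sum_xy - n * x_bar * y_bar) / ((n - 1) * s_x2)
b0 = y_bar - b1 * x_bar
mse = ((n - 1) * s_y2 - b1 ** 2 * (n - 1) * s_x2) / (n - 2)

print(f"x_bar = {x_bar:.2f}, y_bar = {y_bar:.2f}")
print(f"b1 = {b1:.2f}, b0 = {b0:.2f}, MSE = {mse:.2f}")
```

Because the script keeps full precision in the slope, its MSE can differ slightly from a hand calculation that plugs in the rounded value 0.56.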

The Coleman Report (USA, 1977)


[Figure: scatter plot of y versus x with the fitted linear regression line; x axis from -20 to 20, y axis from 20 to 45]

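Original equation:

$T = c V^b$
Calculate the constants c and b!

Taking the natural logarithm of both sides:

$\ln T = \ln c + b \ln V$
Let us define:
$y = \ln T$
$x = \ln V$
$\beta_0 = \ln c$
$\beta_1 = b$
so that $y = \beta_0 + \beta_1 x$.
To calculate c and b, we can calculate $\beta_0$ and $\beta_1$ first; then $c = e^{\beta_0}$ and $b = \beta_1$.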
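The log transformation can be sketched in code as follows. The slides give no data for this example, so the values below are synthetic, generated from hypothetical constants c = 200 and b = -2.5 purely to check that the procedure recovers them; `fit_power_law` is an illustrative name.

```python
import math


def fit_power_law(v_values, t_values):
    """Fit T = c * V**b by linear regression of ln T on ln V."""
    xs = [math.log(v) for v in v_values]
    ys = [math.log(t) for t in t_values]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * y for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    # Back-transform: c = exp(intercept), b = slope
    return math.exp(intercept), slope


# Synthetic data (not from the slides), generated from T = 200 * V**(-2.5)
v_data = [10.0, 20.0, 40.0, 80.0]
t_data = [200.0 * v ** -2.5 for v in v_data]

c, b = fit_power_law(v_data, t_data)
print(f"c = {c:.3f}, b = {b:.3f}")
```

With noisy measurements the recovered constants would only approximate the true ones, and the least squares criterion then applies to ln T rather than to T itself.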

What is correlation?
A measure of the linear relationship between two
variables
Measurement of the strength of linear relation
between x and y.
The sample correlation coefficient (r), $-1 \le r \le 1$,
is related to the estimated slope:
$r = s_{xy} / (s_x s_y) = s_{xy} / \sqrt{s_x^2 s_y^2} = \hat{\beta}_1 s_x / s_y$
The correlation coefficient r is positive if
y tends to increase as x increases; r is negative if y
tends to decrease as x increases; r is zero if there is
no linear relation between changes in x and changes in y,
which happens either when there is no relation at all or
when the relation is purely nonlinear.

Consider the following data


No.    y     x
1     25    10
2     41    20
3     47    20
4     59    30
5     54    30
6     56    30
7     49    40
8     43    40
9     30    50

Calculate the correlation coefficient r!

$\bar{x} = 30.00$
$\bar{y} = 44.89$

$s_x^2 = [(10-30.00)^2 + \dots] / 8 = 1,200/8$
$s_y^2 = [(25-44.89)^2 + \dots] / 8 = 1,062.89/8$
$s_{xy} = [(10-30.00)(25-44.89) + \dots] / 8 = 140/8$
$r = 140 / [(1,200)^{0.5} (1,062.89)^{0.5}] = 0.1240$
The correlation coefficient r is a small positive number.
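The calculation above can be checked with a few lines of code. This is a sketch, assuming the first data column is y and the second is x (which matches the stated means 44.89 and 30.00); the (n-1) divisors cancel in r, so the sums of squares are used directly.

```python
import math

# Nine observations (column assignment is an assumption)
y = [25, 41, 47, 59, 54, 56, 49, 43, 30]
x = [10, 20, 20, 30, 30, 30, 40, 40, 50]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_y = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# r = s_xy / (s_x * s_y); the common factor (n - 1) cancels
r = ss_xy / math.sqrt(ss_x * ss_y)
print(f"SSx = {ss_x:.2f}, SSy = {ss_y:.2f}, SSxy = {ss_xy:.2f}, r = {r:.4f}")
```

In this data set y rises and then falls as x grows, so the small r reflects a weak linear relation rather than the absence of any relation.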

What is multiple regression?


The problem of fitting more than one independent
variable to a dependent variable.
The problem of predicting the dependent variable (y)
from values of the independent variables (x1, x2, ...).
The fitted surfaces are used not only to make
predictions but often also for optimization, i.e. to
determine the values of the independent variables at
which the dependent variable is maximum or minimum.
An example is the effect of factory production,
consumption level, and stocks in the storage on the
price of a product.

Type of regression analysis


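Linear regression
$y = \beta_0 + \beta_1 x + \varepsilon$
Polynomial regression
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ (quadratic)
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$ (cubic)
etc.
Non-linear regression, for example:
$y = \beta_0 + \beta_1 \sin(\beta_2 x) + \varepsilon$
$y = \beta_0 + \beta_1 e^{\beta_2 x} + \varepsilon$
Multiple (linear) regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \varepsilon$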

For any given set of values $x_1, x_2, x_3, \dots, x_r$ and the
corresponding values of y, a linear relationship between
variables is given by
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_r x_r$

For two independent variables, this is a problem of
fitting a plane to a set of n points with coordinates
$(x_{1i}, x_{2i}, y_i)$, for i = 1 to n. The equation is
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Applying the least squares method to obtain estimates
of the coefficients $\beta_0$, $\beta_1$, and $\beta_2$ by minimizing the sum
of the squares of the distances from the points to the
plane, we minimize

$\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})]^2$

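Solving the minimization gives the following normal
equations (prove it!)

$\sum y = n \beta_0 + \beta_1 \sum x_1 + \beta_2 \sum x_2$
$\sum x_1 y = \beta_0 \sum x_1 + \beta_1 \sum x_1^2 + \beta_2 \sum x_1 x_2$
$\sum x_2 y = \beta_0 \sum x_2 + \beta_1 \sum x_1 x_2 + \beta_2 \sum x_2^2$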
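One possible sketch of the requested proof: write the sum of squares as a function S of the three coefficients and set each partial derivative to zero. The symbol S is introduced here for convenience and does not appear on the slides; all sums run over i = 1 to n.

```latex
S(\beta_0, \beta_1, \beta_2)
  = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) \right]^2

\frac{\partial S}{\partial \beta_0}
  = -2 \sum_{i=1}^{n} \left[ y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} \right] = 0
  \;\Rightarrow\;
  \sum y = n \beta_0 + \beta_1 \sum x_1 + \beta_2 \sum x_2

\frac{\partial S}{\partial \beta_1}
  = -2 \sum_{i=1}^{n} x_{1i} \left[ y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} \right] = 0
  \;\Rightarrow\;
  \sum x_1 y = \beta_0 \sum x_1 + \beta_1 \sum x_1^2 + \beta_2 \sum x_1 x_2

\frac{\partial S}{\partial \beta_2}
  = -2 \sum_{i=1}^{n} x_{2i} \left[ y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i} \right] = 0
  \;\Rightarrow\;
  \sum x_2 y = \beta_0 \sum x_2 + \beta_1 \sum x_1 x_2 + \beta_2 \sum x_2^2
```

Since S is a sum of squares of expressions that are linear in the coefficients, it is convex, so a stationary point is a minimum.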

Twisting a forged alloy bar


y : the number of twists required to break the alloy
x1: the percentage of element A
x2: the percentage of element B

y     x1    x2
38    1     5
40    2     5
85    3     5
59    4     5
40    1     10
60    2     10
68    3     10
53    4     10
31    1     15
35    2     15
42    3     15
59    4     15
18    1     20
34    2     20
29    3     20
42    4     20

Twisting a forged alloy bar


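Calculating the following (n = 16)

$\sum x_1 = 40$
$\sum x_2 = 200$
$\sum x_1^2 = 120$
$\sum x_2^2 = 3000$
$\sum x_1 x_2 = 500$
$\sum x_1 y = 1989$
$\sum x_2 y = 8285$
$\sum y = 733$

Substituting into the normal equations gives

$733 = 16 \beta_0 + 40 \beta_1 + 200 \beta_2$
$1989 = 40 \beta_0 + 120 \beta_1 + 500 \beta_2$
$8285 = 200 \beta_0 + 500 \beta_1 + 3000 \beta_2$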

Twisting a forged alloy bar


Solving the simultaneous linear equations gives

$\beta_0 = 48.2$
$\beta_1 = 7.83$
$\beta_2 = -1.76$
Hence, the multiple regression equation is
$y = 48.2 + 7.83 x_1 - 1.76 x_2$

Using the equation, we can predict the number of
twists required to break the forged alloy bar for any
given pair of values of $x_1$ and $x_2$.
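The whole twisting example can be traced in code: build the sums, form the three normal equations, and solve them. The sketch below uses Cramer's rule for the 3x3 system, which is one convenient choice and not necessarily the method used on the slides; the helper names `det3`, `solve3`, and `predict_twists` are hypothetical, and the row alignment of y, x1, and x2 is assumed from the table order.

```python
# Twisting data, 16 observations (row alignment is an assumption)
y = [38, 40, 85, 59, 40, 60, 68, 53, 31, 35, 42, 59, 18, 34, 29, 42]
x1 = [1, 2, 3, 4] * 4
x2 = [5] * 4 + [10] * 4 + [15] * 4 + [20] * 4

n = len(y)
s_x1, s_x2, s_y = sum(x1), sum(x2), sum(y)
s_x1x1 = sum(a * a for a in x1)
s_x2x2 = sum(b * b for b in x2)
s_x1x2 = sum(a * b for a, b in zip(x1, x2))
s_x1y = sum(a * t for a, t in zip(x1, y))
s_x2y = sum(b * t for b, t in zip(x2, y))

# Coefficient matrix and right-hand side of the normal equations
A = [[n, s_x1, s_x2],
     [s_x1, s_x1x1, s_x1x2],
     [s_x2, s_x1x2, s_x2x2]]
rhs = [s_y, s_x1y, s_x2y]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(a)
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


b0, b1, b2 = solve3(A, rhs)


def predict_twists(pct_a, pct_b):
    """Predicted number of twists for given percentages of elements A and B."""
    return b0 + b1 * pct_a + b2 * pct_b


print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```

A call such as `predict_twists(2.5, 12)` then evaluates the fitted plane at a point that was not in the experiment; predictions far outside the observed ranges of x1 and x2 should be treated with caution.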

Models that include interaction may also be
analyzed by the multiple linear regression method. An
interaction between two variables can be represented
by a cross-product term such as
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$
If we let $x_3 = x_1 x_2$ and $\beta_3 = \beta_{12}$, then we get
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
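The substitution can be illustrated by treating the product x1*x2 as an extra column and solving the resulting normal equations. This is a sketch under stated assumptions: the data are synthetic, generated from hypothetical coefficients (1, 2, 3, 0.5) so that the fit can be checked, and `solve_linear_system` and `fit_with_interaction` are illustrative names.

```python
def solve_linear_system(a, b):
    """Solve a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - partial) / m[r][r]
    return z


def fit_with_interaction(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 + b3*x3 with x3 = x1*x2."""
    x3 = [a * b for a, b in zip(x1, x2)]  # the interaction column
    columns = [[1.0] * len(y), x1, x2, x3]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve_linear_system(xtx, xty)


# Synthetic data (not from the slides): y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2
x1 = [0, 1, 2, 0, 1, 2, 0, 1, 2]
x2 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [1 + 2 * a + 3 * b + 0.5 * a * b for a, b in zip(x1, x2)]

beta = fit_with_interaction(x1, x2, y)
print([round(value, 6) for value in beta])
```

The model stays linear in the coefficients even though it is not linear in x1 and x2, which is why the same normal-equation machinery applies.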
