[Equations garbled in extraction: example regression models using ln and log transformations.]
Time permitting, we will look at some of these
possibilities later in the course. These may present
interesting opportunities for student term projects.
Applied Regression -- Prof. Juran 19
Regression Estimators

We are given the data set:

i:   1     2     ...   i     ...   n
Y:   y_1   y_2   ...   y_i   ...   y_n
X:   x_1   x_2   ...   x_i   ...   x_n

We seek good estimators $\hat{\beta}_0$ of $\beta_0$ and $\hat{\beta}_1$ of $\beta_1$ that minimize the sum of the squared residuals (errors). The $i$th residual is

$$e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i), \qquad i = 1, 2, \ldots, n$$
Computer Repair Example

Obs   Minutes   Units
 1       23       1
 2       29       2
 3       49       3
 4       64       4
 5       74       4
 6       87       5
 7       96       6
 8       97       6
 9      109       7
10      119       8
11      149       9
12      145       9
13      154      10
14      166      10
Statistical Basics

Basic statistical computations and graphical displays are very helpful in
doing and interpreting a regression. We should always compute:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad\text{and}\quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} \quad\text{and}\quad s_y = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n-1}}$$

$$r_{X,Y} = \frac{\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\sqrt{\sum_i (y_i - \bar{y})^2 \sum_i (x_i - \bar{x})^2}}$$
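These quantities can be checked in a few lines of Python (standard library only); the data are the fourteen computer repair observations from the example.

```python
import math
from statistics import mean, stdev

# Computer repair data: x = units repaired, y = minutes on the call
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

x_bar, y_bar = mean(units), mean(minutes)
s_x, s_y = stdev(units), stdev(minutes)   # sample (n - 1) standard deviations

# Pearson correlation, computed from the definition above
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
sxx = sum((x - x_bar) ** 2 for x in units)
syy = sum((y - y_bar) ** 2 for y in minutes)
r = sxy / math.sqrt(sxx * syy)

print(round(y_bar, 2), round(s_y, 2), round(s_x, 2), round(r, 4))
# matches the worksheet: mean 97.21, stdev 46.22 and 2.96, correl 0.9937
```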
Obs   Minutes   Units   Error (min)   Error (units)
 1       23       1       -74.21          -5
 2       29       2       -68.21          -4
 3       49       3       -48.21          -3
 4       64       4       -33.21          -2
 5       74       4       -23.21          -2
 6       87       5       -10.21          -1
 7       96       6        -1.21           0
 8       97       6        -0.21           0
 9      109       7        11.79           1
10      119       8        21.79           2
11      149       9        51.79           3
12      145       9        47.79           3
13      154      10        56.79           4
14      166      10        68.79           4

mean      97.21    6          =AVERAGE(C$2:C$15)
stdev     46.22    2.96       =STDEV(C$2:C$15)
count     14       14         =COUNT(C$2:C$15)
correl    0.9937              =CORREL(B$2:B$15,C$2:C$15)
covar   126.29                =COVAR(B$2:B$15,C$2:C$15)
correl    0.9937   Book method    =SUMPRODUCT(E2:E15,F2:F15)/SQRT(SUMPRODUCT(E2:E15,E2:E15)*SUMPRODUCT(F2:F15,F2:F15))
covar   136        B6014 method   =B20*(B18*C18)
covar   136        Book method    =SUMPRODUCT(E2:E15,F2:F15)/(B19-1)
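The two covariance figures on the worksheet differ only in the divisor: Excel's legacy COVAR function divides by n, while the book's sample covariance divides by n - 1. The worksheet's "B6014 method" (=B20*(B18*C18)) apparently rescales the correlation by the two standard deviations. A quick check in Python:

```python
from statistics import mean, stdev

units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]
n = len(units)

x_bar, y_bar = mean(units), mean(minutes)
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))

cov_pop    = cross / n        # Excel's COVAR: divides by n       -> 126.29
cov_sample = cross / (n - 1)  # book / B6014 method: n - 1        -> 136

# The "B6014 method" recovers the same number via Cov = Corr * s_x * s_y
s_x, s_y = stdev(units), stdev(minutes)
r = cross / ((n - 1) * s_x * s_y)   # sample correlation
cov_via_corr = r * s_x * s_y        # equals cov_sample
```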
Graphical Analysis
We should always plot:
histograms of the y and x values,
a time-order plot of x and y (if appropriate), and
a scatter plot of y on x.
[Figure: histogram of Minutes; frequency (0-4) on the vertical axis, minutes (0-200) on the horizontal.]
[Figure: histogram of Units; frequency (0-3) on the vertical axis, units (0-11) on the horizontal.]
[Figure: "Minutes vs. Units" scatter plot; Minutes (0-180) against Units (0-12).]
Estimating Parameters
Using Excel
Using Solver
Using analytical formulas
Using Excel (Scatter Diagram)
[Figure: "Minutes vs. Units" scatter plot with fitted trendline y = 15.509x + 4.1617, R² = 0.9874.]
Using Excel (Data Analysis)
Data tab → Data Analysis
Using Excel (Data Analysis)
SUMMARY OUTPUT

Regression Statistics
  Multiple R           0.9937
  R Square             0.9874
  Adjusted R Square    0.9864
  Standard Error       5.3917
  Observations         14

ANOVA
              df    SS           MS           F          Significance F
Regression     1    27419.5088   27419.5088   943.2009   0.0000
Residual      12      348.8484       29.0707
Total         13    27768.3571

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     4.1617         3.3551          1.2404    0.2385    -3.1485     11.4718
Units        15.5088         0.5050         30.7116    0.0000    14.4085     16.6090
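The entries in this output can be reproduced from the least-squares identities SSR = β̂₁ · Σ(xᵢ−x̄)(yᵢ−ȳ), SST = Σ(yᵢ−ȳ)², and SSE = SST − SSR. A pure-Python check on the example data:

```python
import math
from statistics import mean

units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]
n = len(units)

x_bar, y_bar = mean(units), mean(minutes)
sxx = sum((x - x_bar) ** 2 for x in units)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))

slope = sxy / sxx                              # 15.5088
sst = sum((y - y_bar) ** 2 for y in minutes)   # Total SS:      27768.3571
ssr = slope * sxy                              # Regression SS: 27419.5088
sse = sst - ssr                                # Residual SS:     348.8484

mse = sse / (n - 2)            # 29.0707
std_err = math.sqrt(mse)       # 5.3917 ("Standard Error")
f_stat = ssr / mse             # 943.2009
r_square = ssr / sst           # 0.9874
```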
Using Solver

Obs   Minutes   Units   Prediction    Error      Error^2
 1       23       1       19.6704      3.3296     11.0861
 2       29       2       35.1792     -6.1792     38.1824
 3       49       3       50.6880     -1.6880      2.8492
 4       64       4       66.1967     -2.1967      4.8256
 5       74       4       66.1967      7.8033     60.8909
 6       87       5       81.7055      5.2945     28.0317
 7       96       6       97.2143     -1.2143      1.4745
 8       97       6       97.2143     -0.2143      0.0459
 9      109       7      112.7230     -3.7230     13.8611
10      119       8      128.2318     -9.2318     85.2265
11      149       9      143.7406      5.2594     27.6614
12      145       9      143.7406      1.2594      1.5861
13      154      10      159.2494     -5.2494     27.5558
14      166      10      159.2494      6.7506     45.5712
                                      Sum        348.8484

Intercept    4.1617
Slope       15.5088

Prediction:             =$B$17+$B$18*C3
Error:                  =B5-E5
Error^2:                =F7^2
Sum of squared errors:  =SUM(G2:G15)
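Excel's Solver minimizes the sum of squared errors numerically (via its GRG nonlinear engine). As a rough illustration of the same idea, here is a plain gradient-descent sketch in Python; the starting values, step size, and iteration count are choices made for this dataset, not Solver's actual settings.

```python
units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

b0, b1 = 0.0, 0.0   # starting values, as in the Solver setup
lr = 0.001          # step size, chosen small enough to converge on this data

for _ in range(20000):
    # gradient of SSE = sum (y - b0 - b1*x)^2 with respect to b0 and b1
    g0 = sum(-2 * (y - b0 - b1 * x) for x, y in zip(units, minutes))
    g1 = sum(-2 * (y - b0 - b1 * x) * x for x, y in zip(units, minutes))
    b0 -= lr * g0
    b1 -= lr * g1

sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(units, minutes))
# converges to roughly Intercept 4.1617, Slope 15.5088, SSE 348.8484
```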
Using Formulas

$$\hat{\beta}_1 = \frac{\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2} \qquad \text{(RABE 2.13)}$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad \text{(RABE 2.14)}$$
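Applying these two formulas in Python to the example data reproduces the coefficients on the worksheet:

```python
from statistics import mean

units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

x_bar, y_bar = mean(units), mean(minutes)

# RABE Eq. 2.13: slope
slope = (sum((y - y_bar) * (x - x_bar) for x, y in zip(units, minutes))
         / sum((x - x_bar) ** 2 for x in units))

# RABE Eq. 2.14: intercept
intercept = y_bar - slope * x_bar

print(round(slope, 5), round(intercept, 6))   # 15.50877 4.161654
```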
Obs   Minutes   Units   Error (min)   Error (units)
 1       23       1      -74.2143         -5
 2       29       2      -68.2143         -4
 3       49       3      -48.2143         -3
 4       64       4      -33.2143         -2
 5       74       4      -23.2143         -2
 6       87       5      -10.2143         -1
 7       96       6       -1.2143          0
 8       97       6       -0.2143          0
 9      109       7       11.7857          1
10      119       8       21.7857          2
11      149       9       51.7857          3
12      145       9       47.7857          3
13      154      10       56.7857          4
14      166      10       68.7857          4

mean   97.21429    6

Slope      15.50877   (Eq. 2.13)   =SUMPRODUCT(E2:E15,F2:F15)/(SUMPRODUCT(F2:F15,F2:F15))
Intercept   4.161654  (Eq. 2.14)   =B18-F18*C18
Correlation and Regression

There is a close relationship between regression
and correlation. The correlation coefficient, ρ,
measures the degree to which the random variables
X and Y move together or not.

ρ = +1 implies a perfect positive linear
relationship, while ρ = -1 implies a perfect
negative linear relationship. ρ = 0 implies no
linear relationship (though not necessarily independence).
Statistical Basics: Covariance

The covariance can be calculated using:

$$\mathrm{Cov}(X,Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right]$$

or equivalently

$$\mathrm{Cov}(X,Y) = E(XY) - \mu_X \mu_Y$$

Usually, we find it more useful to consider the coefficient of
correlation. That is,

$$\mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$$

Sometimes the inverse relation is useful:

$$\mathrm{Cov}(X,Y) = \sigma_X \, \sigma_Y \, \mathrm{Corr}(X,Y)$$
Correlation and Regression

The sample (Pearson) correlation coefficient is

$$r_{X,Y} = \frac{\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\sqrt{\sum_i (y_i - \bar{y})^2 \sum_i (x_i - \bar{x})^2}}$$

and it is bounded:

$$-1 \le \mathrm{Corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y} \le 1$$

Regressions automatically produce an estimate of the squared
correlation called R² or R-square. Values of R-square close to 1
indicate a strong relationship, while values close to 0 indicate a
weak or non-existent relationship.
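The link between the correlation and the regression slope can be stated concretely: β̂₁ = r · s_y / s_x, and the R-square reported by the regression is just r². A quick verification on the example data:

```python
import math
from statistics import mean, stdev

units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

x_bar, y_bar = mean(units), mean(minutes)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(units, minutes))
sxx = sum((x - x_bar) ** 2 for x in units)
syy = sum((y - y_bar) ** 2 for y in minutes)

r = sxy / math.sqrt(sxx * syy)   # sample correlation, 0.9937
slope = sxy / sxx                # least-squares slope, 15.5088

# the slope is the correlation rescaled by the ratio of standard deviations
assert abs(slope - r * stdev(minutes) / stdev(units)) < 1e-9

print(round(r ** 2, 4))          # R-square from the regression output: 0.9874
```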
Some Validity Issues
We need to evaluate the strength of the relationship,
whether we have the proper functional form, and the
validity of the several statistical assumptions from a
practical and theoretical viewpoint using a multiplicity
of tools.
Fitted regression functions are interpolations of the data
in hand, and extrapolation is always dangerous.
Moreover, the functional form that fits the data in our
range of experience may not fit beyond it.
Regressions are based on past data. Why should the
same functional form and parameters hold in the
future?

In some uses of regression the future value of x may not
be known; this adds greatly to our uncertainty.

In collecting data to do a regression, choose x values
wisely when you have a choice. They should:
be in the range where you intend to work,
be spread out along the range, with some observations near the
practical extremes, and
have replicated values at the same x, or at very nearby x values,
for good estimation of σ.

Whenever possible, test the stability of your model with
a holdout sample not used in the original model
fitting.
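As an illustration of the holdout idea (the split point here is arbitrary, chosen only for the sketch), one can fit on part of the repair data and score the rest:

```python
from statistics import mean

units   = [1, 2, 3, 4, 4, 5, 6, 6, 7, 8, 9, 9, 10, 10]
minutes = [23, 29, 49, 64, 74, 87, 96, 97, 109, 119, 149, 145, 154, 166]

# Arbitrary split for illustration: fit on the first 10 calls,
# hold out the last 4 to check the model's stability.
x_fit, y_fit = units[:10], minutes[:10]
x_hold, y_hold = units[10:], minutes[10:]

x_bar, y_bar = mean(x_fit), mean(y_fit)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(x_fit, y_fit))
         / sum((x - x_bar) ** 2 for x in x_fit))
intercept = y_bar - slope * x_bar

# Mean squared prediction error on the holdout sample: if it is far
# larger than the fitting-sample error variance, the model is suspect.
mspe = mean((y - (intercept + slope * x)) ** 2
            for y, x in zip(y_hold, x_hold))
```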
Summary
Course Objectives & Description
Review of Basic Statistical Ideas
Intercept, Slope, Correlation, Causality
Simple Linear Regression
Statistical Model and Concepts
Regression in Excel
Computer Repair Example