Diagnostics
7.08.2014
Office hours
Lecturers (email addresses are @auckland.ac.nz):
- Steffen Klaere, office 303.219, s.klaere; Thu, 10:00-12:00
- Alan Lee, office 303S.265, aj.lee; Tue, 10:30-12:00 and Thu, 10:30-12:00
Aims of the next four lectures
- pairs
- reg3d
- coplot
Residual analysis
[Diagram: Choose model → Fit model → Residual analysis → Use model]
What constitutes a bad fit?
Outliers: ...
Independence: ...
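> library(R330)   # course package (assumed) that provides cherry.df
> data(cherry.df)
> cherry.lm <- lm(volume ~ diameter + height, data = cherry.df)
> plot(cherry.lm, which = 1)   # residuals vs fitted values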
[Figure: Residuals vs Fitted plot for lm(volume ~ diameter + height); fitted values run from about 10 to 70, and observations 2, 18 and 31 are labelled.]
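Building the residuals-vs-fitted plot by hand shows what `plot(cherry.lm, which = 1)` draws. This sketch assumes that cherry.df holds the same 31 trees as R's built-in `trees` data, in which `Girth` is actually the diameter. The column renaming is our assumption and does not come from the course package.

```r
# Assumption: cherry.df matches R's built-in 'trees' data with renamed columns
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

cherry.lm <- lm(volume ~ diameter + height, data = cherry.df)

fits <- fitted(cherry.lm)     # fitted values
res  <- residuals(cherry.lm)  # raw residuals: observed minus fitted

plot(fits, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 3)                  # residuals should scatter evenly about zero
lines(lowess(fits, res), col = "red")   # a curved trend suggests a missing term

# The three largest absolute residuals, which plot.lm() labels by default
top3 <- head(order(abs(res), decreasing = TRUE), 3)
print(top3)
```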
Additive models
> library(mgcv)
> cherry.gam <- gam(volume ~ s(diameter) + s(height),
+                   data = cherry.df)
> plot(cherry.gam, residuals = TRUE, pages = 1)
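Example: Cherry trees
[Figure: GAM plots with partial residuals: s(diameter,2.69) against diameter (8 to 20) and s(height,1) against height (65 to 85).]
The number in each label is the estimated degrees of freedom of the smooth. The value of 1 for height means a straight line, while 2.69 for diameter indicates a curved relationship.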
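The estimated degrees of freedom shown in the plot labels can also be read from the fitted object, through the `edf` component of `summary()` for an mgcv `gam`. This is a minimal sketch that again assumes cherry.df is R's `trees` data with renamed columns.

```r
library(mgcv)  # gam() and s()

# Assumption: cherry.df matches R's built-in 'trees' data with renamed columns
cherry.df <- data.frame(diameter = trees$Girth,
                        height   = trees$Height,
                        volume   = trees$Volume)

cherry.gam <- gam(volume ~ s(diameter) + s(height), data = cherry.df)

# One estimated degrees-of-freedom value per smooth term:
# close to 1 means a straight line, larger values mean curvature
edf <- summary(cherry.gam)$edf
print(round(edf, 2))

plot(cherry.gam, residuals = TRUE, pages = 1)
```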
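Fitting polynomials
Use poly() in the model formula to fit a polynomial in x:
  y ~ poly(x, 2)   (quadratic)
  y ~ poly(x, 3)   (cubic)
  and so on for higher degrees.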
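The choice of basis for the polynomial does not change the fitted values. `poly(x, 2)` and the raw powers `x` and `x^2` span the same space, so only the coefficients differ. The sketch below checks this on the diameter variable and assumes, as elsewhere, that R's `trees` data stand in for cherry.df.

```r
# Assumption: diameter and volume taken from R's built-in 'trees' data
x <- trees$Girth
y <- trees$Volume

fit.orth <- lm(y ~ poly(x, 2))      # orthogonal polynomial basis
fit.raw  <- lm(y ~ x + I(x^2))      # raw powers of x
fit.cub  <- lm(y ~ poly(x, 3))      # one degree higher

# Same fitted values, different coefficients
print(all.equal(fitted(fit.orth), fitted(fit.raw)))  # TRUE
print(coef(fit.orth))
print(coef(fit.raw))
```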
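Orthogonal polynomials
$$Y = \beta_0 + \beta_1 p_1(x) + \beta_2 p_2(x)$$
where $p_1$ and $p_2$ are polynomials in $x$ of degree 1 and 2, constructed to be orthogonal to each other (and to the constant term) over the observed $x$ values.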
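Orthogonality here is with respect to the observed x values. The columns returned by `poly()` sum to zero, which makes them orthogonal to the intercept, and they are orthonormal to each other. The short check below uses the `trees` diameters as an assumed stand-in for `cherry.df$diameter`.

```r
x <- trees$Girth               # assumption: same values as cherry.df$diameter
P <- unclass(poly(x, 2))       # column 1 is p1(x), column 2 is p2(x)

print(colSums(P))              # both (numerically) zero
print(round(crossprod(P), 10)) # t(P) %*% P: the 2 x 2 identity matrix
```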
Call:
lm(formula = volume ~ poly(diameter, 2) + height,
data = cherry.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.56553 6.72218 0.233 0.817603
poly(diameter, 2)1 80.25223 3.07346 26.111 < 2e-16 ***
poly(diameter, 2)2 15.39923 2.63157 5.852 3.13e-06 ***
height 0.37639 0.08823 4.266 0.000218 ***
---
Residual standard error: 2.625 on 27 degrees of freedom
Multiple R-squared: 0.9771, Adjusted R-squared: 0.9745
F-statistic: 383.2 on 3 and 27 DF, p-value: < 2.2e-16
Quadratic equation
Call:
lm(formula = volume ~ diameter + I(diameter^2) + height,
data = cherry.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.92041 10.07911 -0.984 0.333729
diameter -2.88508 1.30985 -2.203 0.036343 *
I(diameter^2) 0.26862 0.04590 5.852 3.13e-06 ***
height 0.37639 0.08823 4.266 0.000218 ***
---
Residual standard error: 2.625 on 27 degrees of freedom
Multiple R-squared: 0.9771, Adjusted R-squared: 0.9745
F-statistic: 383.2 on 3 and 27 DF, p-value: < 2.2e-16
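The two outputs describe the same fitted model. The residual standard error, R-squared, F-statistic and height coefficient all agree. The t value for the quadratic term (5.852) is also the same in both, because in each case it tests whether that term can be dropped from the same model. The sketch below checks this and assumes cherry.df is R's `trees` data with renamed columns.

```r
cherry.df <- data.frame(diameter = trees$Girth,   # assumption: trees = cherry data
                        height   = trees$Height,
                        volume   = trees$Volume)

fit.orth <- lm(volume ~ poly(diameter, 2) + height, data = cherry.df)
fit.raw  <- lm(volume ~ diameter + I(diameter^2) + height, data = cherry.df)

# t statistics for the highest-degree term in each parametrisation
t.orth <- coef(summary(fit.orth))["poly(diameter, 2)2", "t value"]
t.raw  <- coef(summary(fit.raw))["I(diameter^2)", "t value"]
print(c(orthogonal = t.orth, raw = t.raw))

# The height coefficient does not depend on how the diameter terms are written
print(c(coef(fit.orth)["height"], coef(fit.raw)["height"]))
```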
[Figure: 3-D plot of volume against diameter and height for the quadratic model.]
Splines
[Figure: spline functions, y (0 to 1) plotted against x.]
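A regression spline is a linear combination of basis functions, each a piecewise polynomial joined smoothly at the knots. The sketch below draws a cubic B-spline basis with `bs()` from R's splines package. The knots at 0.25, 0.5 and 0.75 are hypothetical and serve only as an illustration. The sketch also shows the column count. With three interior knots, `bs()` as used inside a model formula gives 3 + 3 = 6 columns, which matches the six `bs(...)` coefficients in the cherry output.

```r
library(splines)  # bs(): B-spline basis (ships with R)

x <- seq(0, 1, length.out = 101)
knots <- c(0.25, 0.5, 0.75)    # hypothetical interior knots

# Full cubic basis including the intercept column:
# length(knots) + degree + 1 = 7 functions, which sum to 1 at every x
B <- bs(x, knots = knots, degree = 3, intercept = TRUE)
print(dim(B))
print(range(rowSums(B)))

matplot(x, B, type = "l", lty = 1, ylab = "y",
        main = "Cubic B-spline basis functions")

# Default bs() (no intercept column), as used inside lm() formulas
print(ncol(bs(x, knots = knots)))  # 6
```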
Cherry splines
Call:
lm(formula = volume ~ bs(diameter, knots = knot.points) + height,
data = cherry.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -16.3679 7.4856 -2.187 0.03921 *
bs(diameter, knots = knot.points)1 0.1941 7.9374 0.024 0.98070
bs(diameter, knots = knot.points)2 5.5744 3.1704 1.758 0.09201 .
bs(diameter, knots = knot.points)3 10.7976 3.9798 2.713 0.01240 *
bs(diameter, knots = knot.points)4 31.4053 5.5545 5.654 9.35e-06 ***
bs(diameter, knots = knot.points)5 42.2665 6.1297 6.895 4.97e-07 ***
bs(diameter, knots = knot.points)6 58.6454 4.2781 13.708 1.49e-12 ***
height 0.3970 0.1050 3.780 0.00097 ***
---
Residual standard error: 2.8 on 23 degrees of freedom
Multiple R-squared: 0.9778, Adjusted R-squared: 0.971
F-statistic: 144.4 on 7 and 23 DF, p-value: < 2.2e-16
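The call above uses knot.points, which these slides do not define. The sketch below fits the same kind of model with a hypothetical choice of three interior knots at the quartiles of diameter. Its coefficients will therefore generally differ from the output above, but the structure matches: 1 + 6 + 1 = 8 coefficients and 31 - 8 = 23 residual degrees of freedom. As before, cherry.df is assumed to be R's `trees` data with renamed columns.

```r
library(splines)

cherry.df <- data.frame(diameter = trees$Girth,   # assumption: trees = cherry data
                        height   = trees$Height,
                        volume   = trees$Volume)

# Hypothetical knots: the quartiles of diameter
knot.points <- quantile(cherry.df$diameter, c(0.25, 0.5, 0.75))

cherry.bs <- lm(volume ~ bs(diameter, knots = knot.points) + height,
                data = cherry.df)

print(length(coef(cherry.bs)))  # intercept + 6 spline terms + height
print(df.residual(cherry.bs))   # 31 observations - 8 coefficients
```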
[Figure: Volume (10 to 80) against Diameter (8 to 20), with the fitted polynomial and spline curves overlaid.]
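A plot like this can be sketched by predicting from both models over a grid of diameters. The slide does not say how height was handled, so we assume it is held at its mean. knot.points is the same hypothetical quartile choice used earlier, and cherry.df is again taken to be R's `trees` data.

```r
library(splines)

cherry.df <- data.frame(diameter = trees$Girth,   # assumption: trees = cherry data
                        height   = trees$Height,
                        volume   = trees$Volume)
knot.points <- quantile(cherry.df$diameter, c(0.25, 0.5, 0.75))  # hypothetical

poly.fit   <- lm(volume ~ poly(diameter, 2) + height, data = cherry.df)
spline.fit <- lm(volume ~ bs(diameter, knots = knot.points) + height,
                 data = cherry.df)

# Grid over the observed diameters, height fixed at its mean (assumption)
grid <- data.frame(diameter = seq(min(cherry.df$diameter),
                                  max(cherry.df$diameter), length.out = 100),
                   height = mean(cherry.df$height))
p.poly   <- predict(poly.fit, newdata = grid)
p.spline <- predict(spline.fit, newdata = grid)

plot(volume ~ diameter, data = cherry.df, xlab = "Diameter", ylab = "Volume")
lines(grid$diameter, p.poly, lty = 1)
lines(grid$diameter, p.spline, lty = 2)
legend("topleft", legend = c("polynomial", "splines"), lty = 1:2)
```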
Example: Tyre abrasion data
- Variables are abloss (abrasion loss), hardness and tensile (tensile strength).
Call:
lm(formula = abloss ~ hardness + tensile, data = rubber.df)
---
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 885.1611 61.7516 14.334 3.84e-14 ***
hardness -6.5708 0.5832 -11.267 1.03e-11 ***
tensile -1.3743 0.1943 -7.073 1.32e-07 ***
---
Residual standard error: 36.49 on 27 degrees of freedom
Multiple R-squared: 0.8402, Adjusted R-squared: 0.8284
F-statistic: 71 on 2 and 27 DF, p-value: 1.767e-11
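The tyre abrasion data appear to be the `Rubber` data from the MASS package, which has 30 rows and the columns `loss`, `hard` and `tens`. That correspondence is an assumption here. After renaming the columns, the model above can be refitted:

```r
library(MASS)  # provides the Rubber data frame

# Assumption: rubber.df is MASS::Rubber with the course's column names
rubber.df <- data.frame(abloss   = Rubber$loss,
                        hardness = Rubber$hard,
                        tensile  = Rubber$tens)

rubber.lm <- lm(abloss ~ hardness + tensile, data = rubber.df)
print(summary(rubber.lm))
```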
Tyre abrasion data
- Pairs plot
- Spinning plot
- Coplot
- Residuals vs. fitted values plot
- Fitting GAMs
Pairs plot: not very informative
[Figure: pairs plot of hardness, tensile and abloss, with pairwise correlations printed in the panels.]
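Base R alone can produce a pairs plot and the matching correlation matrix. Each panel shows one pair of variables and ignores the third. The sketch again assumes rubber.df is MASS's `Rubber` data with renamed columns.

```r
library(MASS)

# Assumption: rubber.df is MASS::Rubber with the course's column names
rubber.df <- data.frame(abloss   = Rubber$loss,
                        hardness = Rubber$hard,
                        tensile  = Rubber$tens)

pairs(rubber.df)            # scatterplot matrix of all three variables
r <- cor(rubber.df)         # pairwise correlation coefficients
print(round(r, 2))
```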
Spinning plot: hint of a kink
Coplot: suggestion of non-linearity
[Figure: coplot of abloss against tensile (120 to 240), given hardness (50 to 80).]
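A coplot like this one can be drawn with base R's `coplot()`. Here `co.intervals()` picks the overlapping hardness slices explicitly. The choice of four slices is hypothetical, since `coplot()` uses six by default. rubber.df is assumed to be MASS's `Rubber` data with renamed columns.

```r
library(MASS)

# Assumption: rubber.df is MASS::Rubber with the course's column names
rubber.df <- data.frame(abloss   = Rubber$loss,
                        hardness = Rubber$hard,
                        tensile  = Rubber$tens)

# Four overlapping hardness slices (hypothetical choice)
slices <- co.intervals(rubber.df$hardness, number = 4, overlap = 0.5)
print(slices)

# abloss against tensile within each hardness slice; panel.smooth adds a
# lowess curve to every panel so that bends are easier to see
coplot(abloss ~ tensile | hardness, data = rubber.df,
       given.values = slices, panel = panel.smooth)
```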
Residuals vs. fitted values: weak suggestion of non-planarity
[Figure: Residuals vs Fitted plot for lm(abloss ~ hardness + tensile); observations 10, 22 and 29 are labelled.]
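The sketch below reproduces this plot and lists the observations that `plot.lm()` labels by default, namely the three with the largest absolute residuals. As before, rubber.df is assumed to be MASS's `Rubber` data with renamed columns.

```r
library(MASS)

rubber.df <- data.frame(abloss   = Rubber$loss,    # assumption: Rubber = tyre data
                        hardness = Rubber$hard,
                        tensile  = Rubber$tens)
rubber.lm <- lm(abloss ~ hardness + tensile, data = rubber.df)

plot(rubber.lm, which = 1)   # residuals vs fitted values, with a smooth

# Observations with the three largest absolute residuals
top3 <- head(order(abs(residuals(rubber.lm)), decreasing = TRUE), 3)
print(top3)
```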
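GAMs: quite strong indication of non-planarity
[Figure: GAM plots: s(tensile,5.42) against tensile (120 to 240) and s(hardness,1) against hardness (50 to 90).]
Tensile enters non-linearly (5.42 estimated degrees of freedom), while hardness enters linearly (1 degree of freedom).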
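The GAM behind these plots can be sketched with mgcv, again assuming rubber.df is MASS's `Rubber` data with renamed columns. `summary()` returns the estimated degrees of freedom that the slide labels show as 5.42 and 1. The exact values may depend on the mgcv version and the smoothing-parameter method.

```r
library(mgcv)
library(MASS)

rubber.df <- data.frame(abloss   = Rubber$loss,    # assumption: Rubber = tyre data
                        hardness = Rubber$hard,
                        tensile  = Rubber$tens)

rubber.gam <- gam(abloss ~ s(tensile) + s(hardness), data = rubber.df)
plot(rubber.gam, residuals = TRUE, pages = 1)

# Estimated degrees of freedom: about 1 means linear, more means curvature
e <- summary(rubber.gam)$edf
print(round(e, 2))
```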
Fitting a fourth-degree polynomial
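According to the GAM, the curvature lies in tensile while hardness enters linearly. One way to sketch this step is therefore a fourth-degree orthogonal polynomial in tensile plus a linear term in hardness. Both the choice of variable and the identification of rubber.df with MASS's `Rubber` data are our assumptions. `anova()` then compares this model with the planar fit.

```r
library(MASS)

rubber.df <- data.frame(abloss   = Rubber$loss,    # assumption: Rubber = tyre data
                        hardness = Rubber$hard,
                        tensile  = Rubber$tens)

rubber.lm   <- lm(abloss ~ hardness + tensile, data = rubber.df)
# Assumption: the polynomial is in tensile, hardness stays linear
rubber.poly <- lm(abloss ~ poly(tensile, 4) + hardness, data = rubber.df)
print(summary(rubber.poly))

# Partial F-test: does the quartic in tensile improve on the planar fit?
print(anova(rubber.lm, rubber.poly))
```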