28.08.2014
Page 1/23
Today's agenda
- General principles
- Lack of data
- Model selection?
Page 2/23
General Principles
- If we are modeling to gain insight into the relationship of (some of) the explanatory variables to the response, we must include the important variables, and any likely confounders.
- We should allow the important categorical variables to interact with each other and the continuous variables, if we have enough data.
- It is vital to model the important variables well.
- If we have enough data, we could start with the full model.
Page 3/23
The full model
Page 4/23
Illustration
Page 5/23
Illustration (cont)
Page 6/23
Illustration (cont)
- There are a × b X-coefficients; we can also arrange them in an a × b table, with a rows and b columns.
- We can split the table up into main effects and interactions, as we did with means in two-way ANOVA.
- These are listed in the output as X, A:X, B:X and A:B:X.
- If all the A:X, B:X and A:B:X interactions are zero, the coefficient of X is the same for all a × b regressions.
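The count of X-coefficients can be checked by simple arithmetic. The sketch below is only an illustration: the level counts a = 3 and b = 2 are hypothetical, and it assumes treatment-style coding, in which a factor with k levels contributes k - 1 columns.

```python
def slope_term_counts(a, b):
    """Number of coefficients contributed by each X-related term.

    Assumes treatment-style coding: a factor with k levels uses k - 1 columns.
    """
    return {
        "X": 1,                      # slope in the baseline cell
        "A:X": a - 1,                # slope shifts for the other levels of A
        "B:X": b - 1,                # slope shifts for the other levels of B
        "A:B:X": (a - 1) * (b - 1),  # cell-specific slope shifts
    }


a, b = 3, 2  # hypothetical level counts, for illustration only
counts = slope_term_counts(a, b)
total = sum(counts.values())
print(counts, total)
```

Under these assumptions the four terms together always account for a × b coefficients, one slope per cell of the table.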
Page 7/23
Illustration (cont)
Page 8/23
Caution!!
Page 9/23
Model selection: general advice
Page 10/23
Example: the low birthweight data
- These data were collected at Baystate Medical Center, Springfield, Mass., during 1986, as part of a study to identify risk factors for low-birthweight babies.
- There are 189 births in the study.
- The response variable was birthweight, and data were collected on a variety of continuous and categorical explanatory variables relating to the mother's health.
- The data set is in the R330 package (births.df). To keep things simple we use only 4 of the factors in the data set.
Page 11/23
The Variables
age: mother's age in years, continuous
lwt: mother's weight in pounds, continuous
race: mother's race, a factor with 1 = white, 2 = black, 3 = other
smoke: smoking during pregnancy, a factor with 1 = smoked, 0 = didn't smoke
ht: history of hypertension, a factor with 0 = No, 1 = Yes
ui: presence of uterine irritability, a factor with 0 = No, 1 = Yes
bwt: birth weight in grams, continuous, the response
Page 12/23
Preliminary plots
[Figure: preliminary plots; axis labelled "baby's weight, gms", range 1000 to 5000]
Page 13/23
[Figure: baby's weight, gms (1000 to 5000), plotted by mother's race (White, Black, Other), mother's hypertension (No, Yes) and mother's smoking]
Page 14/23
Preliminary conclusions
Page 15/23
Model fitting
- There are 24 treatment combinations of race, smoke, ui and ht, and 189 observations.
- The data are not evenly spread over the 24 combinations. Eight combinations have no data, and only 13 of the 24 have 3 or more points. Hence we can't hope to estimate a complicated model. Even the coefficients we can estimate will be very inaccurate, and there is no hope of finding many significant effects.
- We will consider fitting four simpler models, as well as the all-interactions model.
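The figure of 24 treatment combinations follows from the factor levels listed under "The Variables" (3 levels of race, 2 each of smoke, ui and ht). A minimal Python sketch of that count; it enumerates the possible cells only and does not use the actual data, so it cannot reproduce the counts of empty or sparse cells:

```python
from itertools import product

# Factor levels as coded in the variable list
levels = {
    "race": [1, 2, 3],  # 1 = white, 2 = black, 3 = other
    "smoke": [0, 1],
    "ui": [0, 1],
    "ht": [0, 1],
}

# Every possible combination of one level from each factor
combinations = list(product(*levels.values()))
n_combinations = len(combinations)

n_obs = 189  # births in the study
avg_per_cell = n_obs / n_combinations

print(n_combinations, avg_per_cell)
```

Even if the births were spread evenly, there would be fewer than 8 observations per combination on average, which is why a richly parameterised model is hard to estimate here.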
Page 16/23
The models
Page 17/23
Summary of results
          R^2    Adj R^2  Number of parameters  AIC
Model 1   0.428  0.263    43                    2471.116
Model 2   0.314  0.246    18                    2455.462
Model 3   0.276  0.239    10                    2449.704
Model 4   0.241  0.212     8                    2454.459
Model 5   0.269  0.233    10                    2451.339
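The table can be read in two ways: the smallest AIC identifies the preferred model, and the adjusted R^2 penalises R^2 for the number of parameters. The sketch below assumes the usual formula, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p), with n = 189 and p taken as the parameter count in the table. This is an assumption about how the values were computed; it agrees with the tabulated figures to within rounding.

```python
n = 189  # number of births

# (R^2, reported adjusted R^2, number of parameters, AIC) from the table
models = {
    "Model 1": (0.428, 0.263, 43, 2471.116),
    "Model 2": (0.314, 0.246, 18, 2455.462),
    "Model 3": (0.276, 0.239, 10, 2449.704),
    "Model 4": (0.241, 0.212, 8, 2454.459),
    "Model 5": (0.269, 0.233, 10, 2451.339),
}


def adjusted_r2(r2, n_obs, n_params):
    """Adjusted R^2, assuming n_params includes the intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_params)


# Recompute adjusted R^2 from the rounded R^2 values
recomputed = {name: adjusted_r2(r2, n, p) for name, (r2, _, p, _) in models.items()}

# The model with the smallest AIC
best_by_aic = min(models, key=lambda name: models[name][3])

for name, value in recomputed.items():
    print(name, round(value, 3), "reported:", models[name][1])
print("Smallest AIC:", best_by_aic)
```

Model 1 has the highest R^2 but pays heavily for its 43 parameters, while Model 3 has the smallest AIC.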
The terms smoke, race2, race3, ui and ht affect the intercept of the age-lwt
regressions; smoke and ht also affect the age slope. The variables smoke
and ht raise the intercept but lower the slope. The other variables lower
the birthweight as we move away from the baselines.
Page 19/23
Model 3
Let's fix all variables except age, smoke and ht and
look at the effect of smoke and ht on the
relationship between age and birthweight.
[Figure: centered birthweight (0 to 3000) against age (15 to 45), with legend entries smoke=0,ht=0; smoke=1,ht=0; smoke=0,ht=1; smoke=1,ht=1]
Page 20/23
Model 4
This model forces a common slope for age.
            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept) 2935.606     310.712    9.448   < 2e-16 ***
age           -4.758       9.335   -0.510  0.610875
lwt            4.396       1.707    2.576  0.010786 *
race2       -491.675     149.159   -3.296  0.001179 **
race3       -358.616     113.834   -3.150  0.001908 **
ui          -527.505     135.061   -3.906  0.000133 ***
smoke       -359.367     104.007   -3.455  0.000685 ***
ht          -590.037     200.251   -2.946  0.003637 **
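Because Model 4 is additive with a common age slope, a fitted birthweight is just the intercept plus each coefficient times its covariate. A minimal Python sketch using the estimates above; the mother's age (25) and weight (130 lb) are hypothetical values chosen for illustration, and `predict_bwt` is a helper name introduced here, not part of the R330 package:

```python
# Model 4 coefficient estimates, copied from the table above
coef = {
    "(Intercept)": 2935.606,
    "age": -4.758,
    "lwt": 4.396,
    "race2": -491.675,
    "race3": -358.616,
    "ui": -527.505,
    "smoke": -359.367,
    "ht": -590.037,
}


def predict_bwt(age, lwt, race=1, ui=0, smoke=0, ht=0):
    """Fitted birthweight in grams under Model 4 (race 1 is the baseline)."""
    value = coef["(Intercept)"] + coef["age"] * age + coef["lwt"] * lwt
    if race == 2:
        value += coef["race2"]
    elif race == 3:
        value += coef["race3"]
    value += coef["ui"] * ui + coef["smoke"] * smoke + coef["ht"] * ht
    return value


# Hypothetical mother: age 25, 130 lb, baseline level of every factor
baseline = predict_bwt(age=25, lwt=130)
smoker = predict_bwt(age=25, lwt=130, smoke=1)

print(round(baseline, 3), round(smoker, 3), round(smoker - baseline, 3))
```

With no interactions in the model, the difference between the smoker and the non-smoker equals the smoke coefficient whatever values the other covariates take.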
Page 22/23
Interpreting the joint effect of smoking and race
Let's hold all other variables fixed. What do the
various race/smoking combinations add to the
birthweight?

              Race=White  Race=Black  Race=Other
Smoking=No             0        -538        -495
Smoking=Yes         -516        -978        -518
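The race/smoking table can be split into main effects and interactions, in the same way as the coefficient table discussed earlier. A minimal Python sketch, assuming the zero cell (white non-smoker) is the baseline; the decomposition is plain arithmetic on the tabulated values, not a refit of any model:

```python
# Additions to birthweight (grams) for each smoking/race combination
cells = {
    ("No", "White"): 0,
    ("No", "Black"): -538,
    ("No", "Other"): -495,
    ("Yes", "White"): -516,
    ("Yes", "Black"): -978,
    ("Yes", "Other"): -518,
}

races = ["White", "Black", "Other"]

# Main effects relative to the baseline cell (white non-smoker)
smoke_effect = cells[("Yes", "White")] - cells[("No", "White")]
race_effect = {race: cells[("No", race)] - cells[("No", "White")] for race in races}

# Interaction: what is left of a smoker's cell after removing both main effects
interaction = {
    race: cells[("Yes", race)] - smoke_effect - race_effect[race] for race in races
}

print(smoke_effect, race_effect, interaction)
```

A purely additive model would give zero interactions throughout. Here the Other group departs most from additivity: the smoking row adds almost nothing beyond the race effect for that group.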
Page 23/23
Overall conclusions
Page 24/23