SAS
Q1. Give a reasonably detailed explanation of what Maximum Likelihood is (3 points).
Maximum likelihood is a technique for estimating the parameters of statistical models such
as logistic regression (and any other generalized linear model (GLM)). The idea is to choose
the parameter estimates that maximize the probability of observing the data that were in
fact observed (that is, that maximize the likelihood). This is done the way any mathematical
function is maximized: 1) write down a probability expression for the likelihood, 2) take
the derivative of the likelihood function (or of the log likelihood, which is easier to work
with), set it equal to 0, and solve for the maximum, as in calculus. In practice there is
usually no closed-form solution, so the maximum is found using iterative methods. Maximum
likelihood estimates have good large-sample properties: they are asymptotically unbiased,
asymptotically efficient (smallest variance), and asymptotically Normally distributed.
Because of these desirable properties, ML has become a popular technique for fitting
these regression models.
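As a minimal sketch of the iterative approach described above, the following Python fits an intercept-only logistic model by Newton-Raphson (which coincides with Fisher scoring for the logit link). The outcome vector `y` and the function name `fit_intercept_only` are hypothetical illustrations, not the study data or the SAS routine. In this simple case the ML estimate has a closed form, the logit of the sample proportion, which gives a check on the iteration.

```python
import math

def fit_intercept_only(y, tol=1e-10, max_iter=50):
    """Newton-Raphson for an intercept-only logistic model (illustrative)."""
    n = len(y)
    events = sum(y)
    beta = 0.0  # starting value
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-beta))  # fitted probability
        score = events - n * p             # first derivative of the log likelihood
        info = n * p * (1.0 - p)           # negative second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical 0/1 outcomes (not the study data): 3 events out of 10
y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
beta_hat = fit_intercept_only(y)
closed_form = math.log(0.3 / 0.7)  # logit of the sample proportion
print(round(beta_hat, 4), round(closed_form, 4))
```

The two printed values should agree, since setting the score to zero gives a fitted probability equal to the observed proportion.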
Q2. Fit a simple logistic regression model to these data using treatment group as a
predictor for myocardial infarction at 5 years (MI).
a. What is the form of the estimated logistic regression equation (2 points)?
We can use SAS Enterprise to fit a simple logistic regression model for MI with treatment
group by going to Analyze->Regression->Logistic.
Doing so gives the following relevant output:
Model Information
Data Set WORK.SORTTEMPTABLESORTED
Response Variable mi
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Probability modeled is mi=1.
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -0.4382           0.2868            2.3348       0.1265
itrt         1    -0.5268           0.4106            1.6464       0.1995
Thus, the form of the estimated logistic regression model is:
$$\ln\left(\frac{\Pr(MI=1)}{1-\Pr(MI=1)}\right) = -0.4382 - 0.5268 \cdot itrt$$
b. What is the estimated odds ratio and 95% CI for the OR for the treatment group
effect (3 points)? Interpret the odds ratio (2 points).
From the same output this OR is given by:
Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
Itrt              0.590            0.264       1.320
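Thus, the OR for treatment vs. control is 0.590, with a 95% confidence interval of (0.264,
1.320). The estimated odds of an MI were therefore 41% lower for patients in the treatment
group than in the control group, although the confidence interval includes 1, so this
difference is not statistically significant at the 5% level.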
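The odds ratio and its Wald limits can be reproduced from the coefficient and standard error in the parameter estimates table. The sketch below assumes the usual Wald construction (exponentiating the estimate plus or minus 1.96 standard errors); the helper name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(estimate, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds coefficient and its SE."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))

# itrt coefficient and standard error from the SAS output
or_itrt, lower, upper = odds_ratio_ci(-0.5268, 0.4106)
print(f"OR = {or_itrt:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The result should match the SAS odds ratio table up to rounding of the reported coefficient.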
c. Add gender to the logistic regression model. Is there possible evidence of
gender confounding on the treatment effects for MI (3 points)? How did adding
gender affect the predictive ability of the model (3 points)?
Since we are using ORs to quantify effects in a logistic regression model, we need to
compare the OR of itrt in models with and without ifemale in order to determine whether
there is confounding due to gender.
After adding ifemale to the model, the odds ratios are given by:
Odds Ratio Estimates
Effect    Point Estimate   95% Wald Confidence Limits
Itrt               0.664            0.290       1.523
Ifemale            1.825            0.741       4.495
Thus, the OR for itrt went from 0.590 to 0.664, which is a 12.5% change. Since this
exceeds the commonly used 10% change-in-estimate rule of thumb, we conclude there is
evidence of confounding of the treatment effect by gender (and thus we would include
gender in the model).
The area under the ROC curve (c statistic) was 0.565 without ifemale in the model and
0.620 with ifemale. So although the predictive ability of the model improved somewhat
by adding gender, this was not a dramatic improvement.
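The change-in-estimate calculation can be written out as below. The 10% threshold is an assumption here (a common rule of thumb for flagging confounding); the text itself only reports the 12.5% change.

```python
or_crude = 0.590     # itrt OR without ifemale in the model
or_adjusted = 0.664  # itrt OR with ifemale in the model

# Relative change in the treatment OR after adjusting for gender
pct_change = 100 * (or_adjusted - or_crude) / or_crude
threshold = 10.0     # assumed rule-of-thumb cutoff, not stated in the source

print(f"{pct_change:.1f}% change; exceeds threshold: {pct_change > threshold}")
```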
Q3. Fit a multiple logistic regression model for response with treatment arm, systolic
blood pressure (SBP) at baseline and gender. Use the centered version of SBP (which is
centered at its mean value) and include this centered variable in your model.
a. What is the form of the estimated logistic regression equation (4 points)?
Using SAS Enterprise to fit this model gives the following relevant output:
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -0.5959           0.3481            2.9308       0.0869
itrt         1    -0.5354           0.4388            1.4889       0.2224
ifemale      1     0.5220           0.4650            1.2600       0.2617
Csbp         1    -0.0249           0.0203            1.4997       0.2207
Thus, the form of the estimated logistic regression model is:
$$\ln\left(\frac{\Pr(MI=1)}{1-\Pr(MI=1)}\right) = -0.5959 - 0.5354 \cdot itrt + 0.5220 \cdot ifemale - 0.0249 \cdot (SBP - 138.0734)$$
b. What is the estimated odds ratio and 95% CI for the OR for the gender effect (3
points)? Interpret the odds ratio (2 points).
The ORs are given in the following output:
Odds Ratio Estimates
Effect    Point Estimate   95% Wald Confidence Limits
Itrt               0.585            0.248       1.383
Ifemale            1.685            0.677       4.193
Csbp               0.975            0.937       1.015
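The adjusted OR for ifemale is 1.685, with a 95% CI = (0.677, 4.193). Thus, the estimated
odds of an MI are 68.5% higher for females than for males, adjusting for treatment group
and systolic blood pressure. Because the CI includes 1, this difference is not
statistically significant at the 5% level.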
c. Give and interpret the odds ratio for baseline SBP.
The odds ratio for SBP centered at its mean is 0.975. Thus, for every one-unit increase in
SBP (1 mmHg) the odds of MI decrease by 2.5%, controlling for treatment group and
gender.
However, the 95% confidence interval for this adjusted OR includes 1 (it stretches
both below and above 1). So although the point estimate suggests an inverse association,
this association is not statistically significant (more about this later).
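Because the coefficient is on the log-odds scale, the OR for any other increment of SBP follows by multiplying the coefficient by the increment before exponentiating. The sketch below uses the Csbp estimate from the output; the 10 mmHg increment is a hypothetical choice for illustration and is not part of the original answer.

```python
import math

beta_csbp = -0.0249  # Csbp coefficient from the SAS output

def or_per_increment(beta, units):
    """Odds ratio associated with a `units` increase in the covariate."""
    return math.exp(beta * units)

or_1 = or_per_increment(beta_csbp, 1)    # per 1 mmHg, as reported by SAS
or_10 = or_per_increment(beta_csbp, 10)  # per 10 mmHg (illustrative increment)
print(f"per 1 mmHg: {or_1:.3f}, per 10 mmHg: {or_10:.3f}")
```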
d. Conduct a Likelihood ratio test and a Wald test to see if there are significant
treatment arm effects.
The Wald test for treatment arm effects is given directly off the SAS output:
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -0.5959           0.3481            2.9308       0.0869
itrt         1    -0.5354           0.4388            1.4889       0.2224
ifemale      1     0.5220           0.4650            1.2600       0.2617
Csbp         1    -0.0249           0.0203            1.4997       0.2207
Here, the Wald Chi-square = 1.4889 with df = 1 and p-value = 0.2224. Thus, using the
Wald test, we fail to reject the null hypothesis of no treatment group effect at the
5% significance level. We conclude there is no significant evidence of a treatment
effect on the odds of having an MI.
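The Wald statistic is the squared ratio of the estimate to its standard error, referred to a chi-square distribution with 1 df. A minimal sketch, using the itrt row of the output and the standard-library `math.erfc` to get the upper-tail probability (the helper name `wald_test` is hypothetical):

```python
import math

def wald_test(estimate, se):
    """Return (chi-square statistic, p-value) for a 1-df Wald test."""
    z = estimate / se
    chi_sq = z * z
    # For 1 df, P(chi-square > z^2) = P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return chi_sq, p_value

chi_sq, p_value = wald_test(-0.5354, 0.4388)  # itrt estimate and SE
print(f"Wald chi-square = {chi_sq:.4f}, p = {p_value:.4f}")
```

Small differences from the SAS values in the last decimal place are expected, because the estimate and SE are rounded in the printed output.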
To conduct a Likelihood ratio test we must compare the -2lnL value from the current (full)
model with treatment group:
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC                140.293                    141.390
SC                 142.985                    152.155
-2 Log L           138.293                    133.390
To a reduced model without treatment group (but with ifemale and Csbp):
Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC                140.293                    140.889
SC                 142.985                    148.963
-2 Log L           138.293                    134.889
Thus the Likelihood ratio test statistic = 134.889 - 133.390 = 1.499, with df = 1. Since
this does not exceed the critical value of 3.84, we fail to reject the null hypothesis that
the full model fits no better than the reduced model. Thus there is no significant evidence
of a treatment group effect on the odds of having an MI.
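The likelihood ratio statistic is the difference between the two -2 Log L values reported in the Model Fit Statistics tables, compared with a chi-square distribution with 1 df (one parameter dropped). A short sketch of the arithmetic, with the p-value computed through `math.erfc` as an added illustration:

```python
import math

neg2logl_reduced = 134.889  # model with ifemale and Csbp only
neg2logl_full = 133.390     # model with itrt, ifemale and Csbp

lr_stat = neg2logl_reduced - neg2logl_full
critical_value = 3.84       # chi-square, 1 df, 5% level

# Upper-tail chi-square probability for 1 df
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

print(f"LR = {lr_stat:.3f}, p = {p_value:.4f}, reject: {lr_stat > critical_value}")
```

The statistic is close to the Wald chi-square for itrt, as is typical when both tests address the same single parameter.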
e. How did adding SBP affect the predictive ability of the model (3 points)? Give
a plot of the corresponding ROC curve which helps summarize this graphically
(1 point).
The area under the ROC curve was 0.620 without SBP and 0.637 with SBP. So
again the model's predictive ability was only slightly improved.
SAS Enterprise gives the following ROC curve plot:
[Figure: ROC curve for the model with itrt, ifemale and Csbp]
f. Does the treatment effect depend on gender (in a model with itrt, ifemale,
Csbp)? Conduct an investigation using the data in order to address this
question and justify your answer (6 points).
In order to answer this question, we need to fit a model with itrt, ifemale, Csbp and an
interaction effect between itrt and ifemale (i.e., include INTitrtifemale, which is
INTitrtifemale = itrt * ifemale). If the treatment effect depends upon gender this means
there is an interaction between itrt and ifemale.
Doing this gives the following relevant SAS Enterprise output:
Analysis of Maximum Likelihood Estimates
Parameter        DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept         1    -0.3333           0.3647            0.8353       0.3607
itrt              1    -1.0270           0.5148            3.9799       0.0460
ifemale           1    -0.1819           0.5996            0.0920       0.7616
Csbp              1    -0.0203           0.0206            0.9677       0.3252
INTitrtifemale    1     1.7870           0.9572            3.4852       0.0619
Here, the Wald test for interaction gives a p-value = 0.0619. Thus, there is no significant
evidence of interaction at the 5% significance level, although the p-value is close to 0.05.
We conclude there is not significant evidence that the treatment effect depends upon gender.
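To see what the interaction term implies on the odds ratio scale, the gender-specific treatment ORs can be computed from the point estimates: for males the log OR is the itrt coefficient alone, and for females it is the itrt coefficient plus the interaction coefficient. This is an illustrative sketch of the point estimates only; it does not compute confidence intervals, which would need the covariance between the two estimates (not shown in the output).

```python
import math

beta_itrt = -1.0270        # itrt coefficient (treatment effect for males)
beta_interaction = 1.7870  # INTitrtifemale coefficient

or_males = math.exp(beta_itrt)
or_females = math.exp(beta_itrt + beta_interaction)

print(f"treatment OR, males:   {or_males:.3f}")
print(f"treatment OR, females: {or_females:.3f}")
```

The point estimates fall on opposite sides of 1, which is consistent with the borderline interaction p-value, but the interaction is not significant at the 5% level.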
Q4. Answer the following questions by hand (not using SAS). The fitted multiple logistic
regression model for baseline SBP and gender for a subset of the study participants looks
like:
$$\ln\left(\frac{\Pr(mi)}{1-\Pr(mi)}\right) = -0.87 + 0.68 \cdot ifemale - 0.03 \cdot (sbp - 137.2)$$
and
$$SE(\hat{\beta}_{ifemale}) = 0.46.$$
a. Estimate the odds ratio and 95% CI for gender (4 points). Is there evidence of a
gender effect? (2 points).
The odds ratio can be estimated by exp(0.68)=1.97 . Since SE=0.46 then a 95% CI is given
by:
$$\begin{aligned}
95\%\ \text{CI} &= \left(\exp(0.68 - 1.96 \cdot 0.46),\ \exp(0.68 + 1.96 \cdot 0.46)\right) \\
&= \left(\exp(-0.2216),\ \exp(1.5816)\right) \\
&= (0.80,\ 4.86)
\end{aligned}$$
The 95% CI is pretty wide, indicating there was lower precision in estimating gender effects.
Since the 95% CI for the OR includes 1, we can deduce that there is not a significant gender
effect at the 5% level.
However we could formally carry out a Wald test by calculating Z = 0.68/0.46 = 1.478. Since this
does not exceed the critical value of 1.96 at the 5% significance level, we fail to reject the null
hypothesis of no gender effect.
b. Estimate the probability that a Female with baseline SBP=150 has a MI by 5 years
(4 points).
We can estimate this probability by simply plugging in the covariate information into the
probability form of the estimated logistic regression model:
$$\begin{aligned}
\Pr(MI=1) &= \frac{\exp(-0.87 + 0.68 \cdot 1 - 0.03 \cdot (150 - 137.2))}{1 + \exp(-0.87 + 0.68 \cdot 1 - 0.03 \cdot (150 - 137.2))} \\
&= \frac{\exp(-0.574)}{1 + \exp(-0.574)} \\
&= 0.3603
\end{aligned}$$
Thus a female with SBP=150 at baseline has about a 36% chance of having a MI.
c. Estimate the probability that a Male at the average SBP of the sample (137.2) will
have a MI by 5 years (4 points).
Using the same formula we can calculate this probability:
$$\begin{aligned}
\Pr(MI=1) &= \frac{\exp(-0.87 + 0.68 \cdot 0 - 0.03 \cdot (137.2 - 137.2))}{1 + \exp(-0.87 + 0.68 \cdot 0 - 0.03 \cdot (137.2 - 137.2))} \\
&= \frac{\exp(-0.87)}{1 + \exp(-0.87)} \\
&= 0.2953
\end{aligned}$$
Thus a male with average SBP has about a 29.5% chance of having a MI.
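Both hand calculations can be checked with a small function that plugs covariate values into the probability form of the model. The coefficients are those of the Q4 model (intercept -0.87, ifemale 0.68, centered SBP -0.03, centering value 137.2), with signs consistent with the reported results 0.3603 and 0.2953. The function name `predicted_probability` is hypothetical.

```python
import math

def predicted_probability(ifemale, sbp):
    """Estimated Pr(MI = 1) from the Q4 fitted logistic model."""
    logit = -0.87 + 0.68 * ifemale - 0.03 * (sbp - 137.2)
    return 1.0 / (1.0 + math.exp(-logit))  # same as exp(logit) / (1 + exp(logit))

p_female_150 = predicted_probability(ifemale=1, sbp=150)
p_male_mean = predicted_probability(ifemale=0, sbp=137.2)
print(f"female, SBP 150:  {p_female_150:.4f}")
print(f"male, SBP 137.2:  {p_male_mean:.4f}")
```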
B. Explain what quasi-complete separation is and how you would remediate it if you had it
in a multiple logistic regression model with 3 categorical predictor variables. Include the
detailed steps you would take (Bonus up to 2 points).
Quasi-complete separation occurs when some linear combination of the predictors perfectly
predicts the dichotomous outcome for a subset of the observations (for example, when every
subject in one category of a predictor has the same outcome), while the outcomes still
overlap for the remaining observations. When this happens the maximum likelihood estimate
for the affected parameter does not exist, estimation breaks down, and SEs and CIs become
extremely inflated. SAS prints a warning in the SAS Log file when this occurs, and it can
also be diagnosed from extreme, nonsensical SEs and very wide CIs. To remediate it for a
categorical predictor, first cross-tabulate each predictor against the outcome to find the
category with an empty cell, then collapse the problematic categories to reduce the number
of groups. If convergence problems remain, other options include dropping the variable or
excluding the cases where perfect outcome prediction is observed.
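A simple way to locate the problem with categorical predictors is to cross-tabulate each predictor against the outcome and look for empty cells. The sketch below does this in plain Python; the data and the helper name `zero_cells` are hypothetical and only illustrate the check, not any SAS procedure.

```python
from collections import Counter

def zero_cells(predictor, outcome):
    """Return (level, outcome) pairs that have no observations."""
    counts = Counter(zip(predictor, outcome))
    levels = sorted(set(predictor))
    outcomes = sorted(set(outcome))
    return [(lv, o) for lv in levels for o in outcomes if counts[(lv, o)] == 0]

# Hypothetical data (not from the study): every subject in level "C" has the event
group = ["A", "A", "A", "B", "B", "B", "C", "C"]
mi = [0, 1, 0, 1, 0, 0, 1, 1]

print(zero_cells(group, mi))  # levels listed here are candidates for collapsing
```

Any level returned by the check has only one outcome value, so it is a candidate for merging with a neighboring category before refitting the model.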
