Q1

(a)

Figure 1: Empirical Survival Function (probability against time in days, 0 to 170)

Figure 2: Empirical Hazard Function (hazard rate per day against time in days, 0 to 170)


As Figure 2 shows, the empirical hazard function increases until around day 60, then
decreases until around day 120. After day 120 it is approximately monotonically
increasing, and it rises sharply from day 150 onwards.

(b)
The survival model fitted to the data is the Gompertz model S(t) = g^(c^t - 1). The parameter
estimates are g = 0.929072 and c = 1.120873. Plots of the empirical and fitted functions are given below.

Figure 3: Empirical Survival Vs. Fitted Gompertz Survival Function (probability against time in days, 0 to 200)

Figure 4: Empirical Hazard Vs. Fitted Gompertz Hazard Function (hazard rate per day, on an axis running from 0 to 1,400,000, against age in days, 0 to 200)


The fitted survival function fits the data reasonably well, although there are noticeable
differences around t = 10 and t = 40. The fitted hazard function does not provide a good fit:
with c > 1 it grows exponentially in t, so for t > 120 it climbs far above the empirical
hazard, which never exceeds 1 per day.
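
The growth of the fitted hazard can be checked directly from the parameter estimates, since the hazard implied by S(t) = g^(c^t - 1) is h(t) = -d/dt ln S(t) = -ln(g) ln(c) c^t. The following is a minimal Python sketch for illustration only; it is not the R code used for the fit, and the function names are assumptions introduced here. Only the two parameter estimates come from the fit in (b).

```python
import math

# Parameter estimates reported in (b) for the fitted Gompertz model.
G = 0.929072
C = 1.120873


def gompertz_survival(t, g=G, c=C):
    """Survival function S(t) = g^(c^t - 1)."""
    return g ** (c ** t - 1.0)


def gompertz_hazard(t, g=G, c=C):
    """Hazard h(t) = -d/dt ln S(t) = -ln(g) * ln(c) * c^t."""
    return -math.log(g) * math.log(c) * c ** t


# Print the fitted survival probability and hazard at a few illustrative times.
for t in (0, 20, 40, 80, 120, 160):
    print(t, gompertz_survival(t), gompertz_hazard(t))
```

Because c > 1, the printed hazard is multiplied by the same factor c for every extra day, which is why it eventually dwarfs the empirical hazard.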

(c)
Other models that could be fitted to the data include the Weibull, exponential and
polynomial models, or any distribution whose shape is similar to that of the empirical function.
I fitted a Weibull distribution to the data. The survival model fitted to the data is
S(t) = e^(-(t/theta)^k). The parameter estimates are theta = 22.994 and k = 2.562. Plots of the
empirical (in black) and fitted (in red) models are given below. This model gives a much better
fit (residual standard error 0.0099, against 0.0225 for the Gompertz fit) because the Weibull
hazard function does not increase anywhere near as fast as the Gompertz hazard function.

Figure 5: Empirical Survival Vs. Fitted Weibull Survival Function

Figure 6: Empirical Hazard Vs. Weibull Hazard Function
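
To make the comparison of growth rates concrete, the two fitted hazards can be evaluated side by side. The sketch below is illustrative Python rather than the R code behind the appendix output, and the function names are assumptions introduced here. It uses the standard Weibull hazard h(t) = (k/theta)(t/theta)^(k-1), which follows from S(t) = e^(-(t/theta)^k), together with the Gompertz hazard -ln(g) ln(c) c^t implied by the model in (b).

```python
import math

# Fitted Weibull parameters from (c) and fitted Gompertz parameters from (b).
THETA = 22.994
K = 2.562
G = 0.929072
C = 1.120873


def weibull_survival(t, theta=THETA, k=K):
    """Survival function S(t) = exp(-(t/theta)^k)."""
    return math.exp(-((t / theta) ** k))


def weibull_hazard(t, theta=THETA, k=K):
    """Hazard h(t) = (k/theta) * (t/theta)^(k-1)."""
    return (k / theta) * (t / theta) ** (k - 1.0)


def gompertz_hazard(t, g=G, c=C):
    """Hazard h(t) = -ln(g) * ln(c) * c^t."""
    return -math.log(g) * math.log(c) * c ** t


# Compare the two fitted hazards at a few illustrative times (days).
for t in (20, 60, 120, 160):
    print(t, weibull_hazard(t), gompertz_hazard(t))
```

The Weibull hazard grows like a power of t, whereas the Gompertz hazard grows exponentially, so the gap between the two widens rapidly at later times.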

Q2.
(a)

Figure 7: Empirical Hazard Function (hazard rate per week, 0 to 0.016, against age in weeks, 0 to 60)

Figure 8: Kaplan Meier Survival Function Estimate (probability against time in weeks, 0 to 60)
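
The estimates behind Figures 7 and 8 can be reproduced from the arrest counts d_j and the numbers at risk n_j tabulated in the appendix: the hazard estimate at week j is d_j / n_j, and the Kaplan Meier estimate is the running product of (1 - d_j / n_j). The sketch below is illustrative Python, not the code used to produce the figures; the function name is an assumption introduced here, and only weeks 1 to 9 of the appendix table are used.

```python
def kaplan_meier(deaths, at_risk):
    """Return hazard estimates d_j/n_j and product-limit survival estimates."""
    hazards, survival = [], []
    s = 1.0
    for d, n in zip(deaths, at_risk):
        h = d / n
        s *= 1.0 - h  # multiply in the conditional survival probability
        hazards.append(h)
        survival.append(s)
    return hazards, survival


# Weeks 1 to 9 of the Q2 (a) appendix table.
weeks = [1, 2, 3, 4, 5, 6, 7, 8, 9]
d_j = [1, 1, 1, 1, 1, 1, 1, 5, 2]
n_j = [432, 431, 430, 429, 428, 427, 426, 425, 420]

hazards, survival = kaplan_meier(d_j, n_j)
for week, h, s in zip(weeks, hazards, survival):
    print(week, round(h, 6), round(s, 6))
```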

(b)
The results of the log-rank and Wilcoxon tests are given in the appendix. Both tests give
p-values well above 5% (0.448 and 0.38 respectively), so we have no evidence against the
null hypothesis that the survival functions for the two races are the same. That is, we
conclude that there is no significant difference in time to arrest between races.
Note: It is shown in (c) that the proportional hazards assumption is approximately correct.
The log-rank test, which is most powerful under proportional hazards, is therefore the more
appropriate test for detecting differences between the survival functions.
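
As a check on the reported log-rank p-value, the test statistic can be referred to a chi-square distribution with 1 degree of freedom, whose upper tail probability equals erfc(sqrt(x/2)). The sketch below is illustrative Python, not part of the original R analysis; the function name is an assumption introduced here, and the statistic 0.576 is the (O-E)^2/V value from the appendix output.

```python
import math


def chisq1_pvalue(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# (O-E)^2/V from the log-rank output in the appendix.
log_rank_p = chisq1_pvalue(0.576)
print(round(log_rank_p, 3))
```

The same function applies to any other statistic with 1 degree of freedom.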

(c)
The full model and final model are shown in the appendix. The final model was reached by
backward elimination from the full model. In the full model fit, AGE appeared to be the only
significant predictor. The least significant predictor (the one with the highest p-value),
PRIO, was deleted and the model was refitted. AGE still appeared to be the only significant
predictor. The predictors RACE, MAR and EDUC were then removed in turn, since the p-values
of their coefficients were all well above 5% after each refit. Based on this analysis, the
only factor that explains time to arrest is age.
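
The size of the age effect can be read off the final model in the appendix by exponentiating the coefficient. The sketch below is illustrative Python, not part of the original R analysis. It takes the reported coefficient and standard error, and assumes a normal approximation with z = 1.96 for an approximate 95% interval; the variable names are introduced here.

```python
import math

# Coefficient and standard error for age from the final model in the appendix.
coef = -0.0726
se = 0.0208
z = 1.96  # assumed normal quantile for an approximate 95% interval

hazard_ratio = math.exp(coef)
ci_low = math.exp(coef - z * se)
ci_high = math.exp(coef + z * se)

print(round(hazard_ratio, 3), round(ci_low, 3), round(ci_high, 3))
```

A hazard ratio below 1 means that each one-unit increase in AGE is associated with a lower hazard of arrest, and an interval that excludes 1 is in line with the small p-value reported for AGE.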
A likelihood ratio (log-likelihood) test is performed in the appendix. It tests whether
removing all of the predictors except age was justified. The p-value for the test is about
0.30, greater than 5%, so we have no evidence to reject the null hypothesis that the
coefficients of PRIO, RACE, MAR and EDUC are jointly zero (that is, that these predictors
contribute no useful information).
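
The test compares the likelihood ratio statistics of the two fits: 25.84 on 10 degrees of freedom for the full model and 15.2 on 1 degree of freedom for the final model, so the difference is referred to a chi-square distribution with 9 degrees of freedom. The sketch below is illustrative Python, not the R call shown in the appendix. The function name is an assumption introduced here, and the tail probability is built from the standard recurrence Q(a+1, y) = Q(a, y) + y^a e^(-y) / Gamma(a+1) for the regularized upper incomplete gamma function.

```python
import math


def chisq_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    if df % 2:
        # Odd df: start from Q(1/2, y) = erfc(sqrt(y)).
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        # Even df: start from Q(1, y) = exp(-y).
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Rounded likelihood ratio statistics reported in the appendix.
lr_statistic = 25.84 - 15.2
df = 10 - 1
p_value = chisq_sf(lr_statistic, df)
print(round(p_value, 2))
```

With these rounded inputs the p-value comes out at approximately 0.30, in line with the appendix.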
A plot of the cumulative hazard function against the Cox-Snell residuals is given below:

Figure 9: Testing Proportionality Assumption


The plot follows the 45 degree line fairly closely. This is consistent with the Cox-Snell
residuals having an exponential(1) distribution, as they should if the model is correctly
specified, and it indicates that the model fits the data well. It is also consistent with
the proportional hazards assumption, although a Cox-Snell plot assesses overall fit rather
than proportionality specifically.
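
The reasoning behind this check can be written out as follows. The notation is introduced here for illustration and does not appear in the appendix output: \hat{H}_0 denotes the estimated baseline cumulative hazard and \hat{\beta} the fitted AGE coefficient.

```latex
r_i = \hat{H}_0(t_i)\, e^{\hat{\beta}\,\mathrm{AGE}_i},
\qquad
r_i \sim \text{Exp}(1) \ \text{(a censored sample, if the model holds)}
\;\Longrightarrow\;
H_R(r) = -\ln S_R(r) = -\ln e^{-r} = r .
```

Plotting the estimated cumulative hazard of the residuals against the residuals themselves should therefore give a straight line through the origin with slope 1.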

Appendix
Q1
(a)
Day   Empirical Hazard Function   Empirical Survival Function
0     0.000000                    1.000000
2     0.001440                    0.998560
3     0.004006                    0.994560
4     0.005077                    0.989510
5     0.006382                    0.983195
6     0.007535                    0.975787
7     0.009771                    0.966253
8     0.012324                    0.954345
9     0.016416                    0.938678
10    0.021837                    0.918180
11    0.029819                    0.890801
12    0.037855                    0.857079
13    0.045210                    0.818330
14    0.058853                    0.770169
15    0.063439                    0.721310
16    0.072233                    0.669208
17    0.075692                    0.618554
18    0.079254                    0.569531
19    0.082636                    0.522468
20    0.084988                    0.478064
21    0.092282                    0.433947
22    0.096805                    0.391939
23    0.100236                    0.352653
24    0.105855                    0.315323
25    0.110221                    0.280568
26    0.115806                    0.248076
27    0.129891                    0.215853
28    0.133597                    0.187016
29    0.136103                    0.161562
30    0.128019                    0.140879
31    0.121290                    0.123792
32    0.121414                    0.108762
33    0.116820                    0.096056
34    0.124090                    0.084137
35    0.125001                    0.073620
36    0.126642                    0.064296
37    0.122354                    0.056429
38    0.133847                    0.048876
39    0.115366                    0.043238
40    0.124878                    0.037838
41    0.121245                    0.033251
42    0.127930                    0.028997
43    0.130136                    0.025223
44    0.136561                    0.021779
45    0.143931                    0.018644
46    0.135957                    0.016109
47    0.130634                    0.014005
48    0.136145                    0.012098
49    0.145241                    0.010341
50    0.133767                    0.008958
51    0.151456                    0.007601
52    0.135971                    0.006568
53    0.145604                    0.005611
54    0.150133                    0.004769
55    0.132578                    0.004137
56    0.160273                    0.003474
57    0.160488                    0.002916
58    0.158120                    0.002455
59    0.167174                    0.002045
60    0.160098                    0.001717
61    0.131592                    0.001491
62    0.130919                    0.001296
63    0.125000                    0.001134
64    0.134066                    0.000982
65    0.141286                    0.000843
66    0.168473                    0.000701
67    0.122038                    0.000616
68    0.136302                    0.000532
69    0.104688                    0.000476
70    0.125654                    0.000416
71    0.079840                    0.000383
72    0.119306                    0.000337
73    0.120690                    0.000297
74    0.109244                    0.000264
75    0.103774                    0.000237
76    0.091228                    0.000215
77    0.127413                    0.000188
78    0.066372                    0.000175
79    0.080569                    0.000161
80    0.067010                    0.000150
81    0.066298                    0.000140
82    0.076923                    0.000130
83    0.089744                    0.000118
84    0.084507                    0.000108
85    0.061538                    0.000101
86    0.057377                    0.000096
87    0.052174                    0.000091
88    0.100917                    0.000081
89    0.071429                    0.000076
91    0.054945                    0.000071
92    0.011628                    0.000071
93    0.070588                    0.000066
94    0.075949                    0.000061
95    0.027397                    0.000059
96    0.056338                    0.000056
97    0.014925                    0.000055
98    0.015152                    0.000054
99    0.046154                    0.000052
103   0.064516                    0.000048
104   0.017241                    0.000047
105   0.035088                    0.000046
106   0.018182                    0.000045
107   0.018519                    0.000044
109   0.018868                    0.000043
110   0.019231                    0.000042
111   0.039216                    0.000041
112   0.040816                    0.000039
113   0.042553                    0.000037
114   0.044444                    0.000036
115   0.023256                    0.000035
116   0.047619                    0.000033
120   0.025000                    0.000032
121   0.051282                    0.000031
122   0.027027                    0.000030
123   0.083333                    0.000027
125   0.060606                    0.000026
126   0.096774                    0.000023
127   0.107143                    0.000021
128   0.040000                    0.000020
129   0.041667                    0.000019
130   0.086957                    0.000017
133   0.047619                    0.000017
137   0.050000                    0.000016
142   0.105263                    0.000014
143   0.058824                    0.000013
144   0.125000                    0.000012
146   0.142857                    0.000010
147   0.083333                    0.000009
148   0.090909                    0.000008
149   0.100000                    0.000007
151   0.111111                    0.000007
154   0.125000                    0.000006
155   0.142857                    0.000005
156   0.166667                    0.000004
159   0.200000                    0.000003
165   0.500000                    0.000002
172   1.000000                    0.000000

(b)
Formula: survival ~ g^(c^t - 1)

Parameters:
  Estimate Std. Error t value Pr(>|t|)
c 1.120873   0.003795   295.4   <2e-16 ***
g 0.929072   0.005720   162.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02248 on 135 degrees of freedom

Number of iterations to convergence: 10
Achieved convergence tolerance: 3.088e-06

(c)
Formula: Survival ~ exp(-(t/theta)^k)

Parameters:
      Estimate Std. Error t value Pr(>|t|)
theta 22.99430    0.06138   374.6   <2e-16 ***
k      2.56199    0.02430   105.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.009922 on 135 degrees of freedom

Number of iterations to convergence: 8
Achieved convergence tolerance: 3.372e-06

Q2.
(a)
Week of   Sum of arrest   Number of lives   Empirical     Empirical Survival
Arrest    (dj)            at risk (nj)      Hazard Rate   Function
0         0               432               0.000000      1.000000
1         1               432               0.002315      0.997685
2         1               431               0.002320      0.995370
3         1               430               0.002326      0.993056
4         1               429               0.002331      0.990741
5         1               428               0.002336      0.988426
6         1               427               0.002342      0.986111
7         1               426               0.002347      0.983796
8         5               425               0.011765      0.972222
9         2               420               0.004762      0.967593
10        1               418               0.002392      0.965278
11        2               417               0.004796      0.960648
12        2               415               0.004819      0.956019
13        1               413               0.002421      0.953704
14        3               412               0.007282      0.946759
15        2               409               0.004890      0.942130
16        2               407               0.004914      0.937500
17        3               405               0.007407      0.930556
18        3               402               0.007463      0.923611
19        2               399               0.005013      0.918981
20        5               397               0.012594      0.907407
21        2               392               0.005102      0.902778
22        1               390               0.002564      0.900463
23        1               389               0.002571      0.898148
24        4               388               0.010309      0.888889
25        3               384               0.007813      0.881944
26        3               381               0.007874      0.875000
27        2               378               0.005291      0.870370
28        2               376               0.005319      0.865741
30        2               374               0.005348      0.861111
31        1               372               0.002688      0.858796
32        2               371               0.005391      0.854167
33        2               369               0.005420      0.849537
34        2               367               0.005450      0.844907
35        4               365               0.010959      0.835648
36        3               361               0.008310      0.828704
37        4               358               0.011173      0.819444
38        1               354               0.002825      0.817130
39        2               353               0.005666      0.812500
40        4               351               0.011396      0.803241
42        2               347               0.005764      0.798611
43        4               345               0.011594      0.789352
44        2               341               0.005865      0.784722
45        2               339               0.005900      0.780093
46        4               337               0.011869      0.770833
47        1               333               0.003003      0.768519
48        2               332               0.006024      0.763889
49        5               330               0.015152      0.752315
50        3               325               0.009231      0.745370
52        4               322               0.012422      0.736111

(b)
Log Rank Test:
Call:
survdiff(formula = Surv(week, arrest) ~ race, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
race=0  53       12     14.7    0.4990     0.576
race=1 379      102     99.3    0.0739     0.576

Chisq= 0.6 on 1 degrees of freedom, p= 0.448

Wilcoxon Statistic Test:
Call:
survdiff(formula = Surv(week, arrest) ~ race, rho = 1)

         N Observed Expected (O-E)^2/E (O-E)^2/V
race=0  53     10.1     12.8     0.568     0.748
race=1 379     89.2     86.5     0.084     0.748

Chisq= 0.7 on 1 degrees of freedom, p= 0.38

(c)
Full Model:
Call:
coxph(formula = xx ~ age + as.factor(educ) + mar + prio + race,
method = "breslow")
n= 432, number of events= 114
                      coef exp(coef)  se(coef)      z Pr(>|z|)
age              -0.061591  0.940268  0.021208 -2.904  0.00368 **
as.factor(educ)1  0.108916  1.115069  0.873223  0.125  0.90074
as.factor(educ)2 -0.682694  0.505254  0.683156 -0.999  0.31764
as.factor(educ)3 -0.457698  0.632739  0.525318 -0.871  0.38360
as.factor(educ)4 -0.880341  0.414641  0.544926 -1.616  0.10620
as.factor(educ)5 -1.277873  0.278629  0.676321 -1.889  0.05883 .
as.factor(educ)6 -0.563035  0.569478  0.772968 -0.728  0.46637
mar              -0.466274  0.627335  0.375422 -1.242  0.21424
prio             -0.001888  0.998114  0.049705 -0.038  0.96970
race              0.336789  1.400443  0.310854  1.083  0.27862
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                 exp(coef) exp(-coef) lower .95 upper .95
age                 0.9403     1.0635   0.90199    0.9802
as.factor(educ)1    1.1151     0.8968   0.20138    6.1743
as.factor(educ)2    0.5053     1.9792   0.13244    1.9276
as.factor(educ)3    0.6327     1.5804   0.22598    1.7716
as.factor(educ)4    0.4146     2.4117   0.14251    1.2065
as.factor(educ)5    0.2786     3.5890   0.07402    1.0488
as.factor(educ)6    0.5695     1.7560   0.12518    2.5908
mar                 0.6273     1.5940   0.30057    1.3094
prio                0.9981     1.0019   0.90546    1.1002
race                1.4004     0.7141   0.76149    2.5755

Concordance= 0.636 (se = 0.027 )
Rsquare= 0.058 (max possible= 0.956 )
Likelihood ratio test= 25.84 on 10 df, p=0.003955
Wald test            = 22.23 on 10 df, p=0.01399
Score (logrank) test = 23.19 on 10 df, p=0.01007

Final Model:
Call:
coxph(formula = xx ~ age, method = "breslow")

       coef exp(coef) se(coef)    z       p
age -0.0726      0.93   0.0208 -3.5 0.00047

Likelihood ratio test=15.2 on 1 df, p=9.71e-05 n= 432, number of events= 114

Log-likelihood Test on Final Model:
> pvalue<-1-pchisq(25.84-15.2,9)
> pvalue
0.3004644
