ACTL3001 Assignment

Q1
(a)
Figure 1: Empirical Survival Function (vertical axis: probability, 0 to 1; horizontal axis: time in days, 0 to 170)
Figure 2: Empirical Hazard Function (vertical axis: hazard rate per day, 0 to 1; horizontal axis: time in days, 0 to 170)

As can be seen in Figure 2, the hazard function increases until around day 60, after which it
decreases until around day 120. After day 120 the hazard function is approximately
monotonically increasing, and it rises sharply from day 150 onwards, reaching 1 at day 172,
the last day in the data.
(b)
The Gompertz survival model fitted to the data is S(t) = g^(c^t - 1). The parameter estimates are
g=0.929072 and c=1.120873 (the fit output is in the appendix). Plots of the empirical and
fitted models are given below.
Figure 3: Empirical Survival Vs. Fitted Gompertz Survival Function (series: Empirical Survival Function, Fitted Gompertz Survival Function; vertical axis: probability, 0 to 1; horizontal axis: time in days, 0 to 200)
Figure 4: Empirical Hazard Vs. Fitted Gompertz Hazard Function (series: Empirical Hazard Rate, Fitted Gompertz Hazard Function; vertical axis: hazard rate per day, 0 to 1400000; horizontal axis: age in days, 0 to 200)

The fitted survival function fits the data reasonably well, but there are noticeable
differences around t=10 and t=40. The fitted hazard function does not provide a good fit: it
increases without bound for t>120 (the vertical axis of Figure 4 runs to 1400000 per day),
whereas the empirical hazard rate never exceeds 1.
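To see why the fitted hazard behaves this way, the Gompertz survival function S(t) = g^(c^t - 1) and the hazard it implies can be evaluated directly. The following is a minimal sketch in Python (not the R used for the fits in the appendix). The function names are illustrative, and the hazard formula h(t) = -ln(g) ln(c) c^t is an assumption of this sketch, obtained by differentiating -ln S(t), rather than something stated in the assignment.

```python
import math

G = 0.929072  # fitted g, from the appendix output
C = 1.120873  # fitted c, from the appendix output

def gompertz_survival(t, g=G, c=C):
    """S(t) = g^(c^t - 1)."""
    return g ** (c ** t - 1.0)

def gompertz_hazard(t, g=G, c=C):
    """h(t) = -d/dt ln S(t) = -ln(g) * ln(c) * c^t (derived for this sketch)."""
    return -math.log(g) * math.log(c) * c ** t

# Print survival and hazard at a few illustrative days.
for day in (0, 10, 20, 40, 120):
    print(day, gompertz_survival(day), gompertz_hazard(day))
```

Under this formula the hazard is multiplied by c for every extra day, so it grows exponentially, which is consistent with the fitted curve in Figure 4 dwarfing the empirical hazard at later ages.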
(c)
Other distributions that could be fitted to the data include the Weibull and exponential
distributions, or a polynomial function. In general, any distribution whose shape is similar
to that of the empirical function could be used.
I fitted a Weibull distribution to the data. The survival model fitted to the data is
S(t) = e^(-(t/theta)^k). The parameter estimates are theta=22.994 and k=2.562. Plots of the
empirical (in black) and fitted (in red) models are given below. This model gives a much
better fit (residual standard error 0.009922, against 0.02248 for the Gompertz fit; see the
appendix) because the Weibull hazard function does not increase anywhere near as fast as the
Gompertz hazard function.
Figure 5: Empirical Survival Vs. Fitted Weibull Survival Function
Figure 6: Empirical Hazard Vs. Weibull Hazard Function
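The fitted Weibull model can be evaluated in the same way. This is a minimal Python sketch using the parameter estimates from the appendix. The function names are illustrative, and the hazard h(t) = (k/theta)(t/theta)^(k-1) is the standard Weibull hazard for this parameterisation rather than a formula quoted in the assignment.

```python
import math

THETA = 22.99430  # fitted scale parameter, from the appendix output
K = 2.56199       # fitted shape parameter, from the appendix output

def weibull_survival(t, theta=THETA, k=K):
    """S(t) = exp(-(t/theta)^k)."""
    return math.exp(-((t / theta) ** k))

def weibull_hazard(t, theta=THETA, k=K):
    """h(t) = (k/theta) * (t/theta)^(k-1)."""
    return (k / theta) * (t / theta) ** (k - 1.0)

# Print survival and hazard at a few illustrative days.
for day in (10, 20, 40, 120, 172):
    print(day, weibull_survival(day), weibull_hazard(day))
```

Because the Weibull hazard grows only as a power of t, it stays of order one even at day 172, unlike the exponentially growing Gompertz hazard.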
Q2.
(a)
Figure 7: Empirical Hazard Function (vertical axis: hazard rate per week, 0 to 0.016; horizontal axis: age in weeks, 0 to 60)
Figure 8: Kaplan Meier Survival Function Estimate (vertical axis: probability, 0 to 1; horizontal axis: time in weeks, 0 to 60)
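The hazard and survival columns tabulated for Q2 (a) in the appendix are consistent with the usual product-limit calculation: the hazard in week j is d_j / n_j and the survival estimate is the running product of (1 - d_j / n_j). A minimal Python sketch, using the first ten rows of that table (the function name is illustrative):

```python
def kaplan_meier(deaths, at_risk):
    """Return (hazards, survival): hazard_j = d_j / n_j and
    S_j = product over i <= j of (1 - d_i / n_i)."""
    hazards, survival = [], []
    s = 1.0
    for d, n in zip(deaths, at_risk):
        h = d / n
        s *= 1.0 - h
        hazards.append(h)
        survival.append(s)
    return hazards, survival

# Weeks 0 to 9 of the Q2 (a) appendix table.
d_j = [0, 1, 1, 1, 1, 1, 1, 1, 5, 2]
n_j = [432, 432, 431, 430, 429, 428, 427, 426, 425, 420]
hazards, survival = kaplan_meier(d_j, n_j)

for week, (h, s) in enumerate(zip(hazards, survival)):
    print(week, round(h, 6), round(s, 6))
```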
(b)
The results of the Log Rank and Wilcoxon tests are given in the appendix. Both tests give
p values well above 5% (0.448 and 0.38 respectively), so we do not have evidence against
the null hypothesis that the survival functions for the two races are the same. That is, we
conclude that there is no significant difference in arrest times between races.
Note: It is shown in (c) that the proportional hazards assumption is approximately correct.
Therefore, the Log Rank test is the more appropriate test for picking up differences in
survival functions.
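The p value reported for the Log Rank test can be checked from its chi-square statistic on 1 degree of freedom. A minimal Python sketch follows. It relies on the identity P(X > x) = erfc(sqrt(x/2)) for a chi-square variable with 1 degree of freedom, and the function name is illustrative.

```python
import math

def chisq1_pvalue(stat):
    """Upper tail of a chi-square(1) variable: P(X > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

# (O-E)^2/V from the Log Rank output in the appendix.
log_rank_p = chisq1_pvalue(0.576)
print(round(log_rank_p, 3))
```

The result agrees with the p value of 0.448 printed by survdiff to three decimal places.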
(c)
The full model and final model are shown in the appendix. The final model was reached by
removing predictors from the full model one at a time. In the full model fit, AGE appeared
to be the only significant predictor. The least significant predictor (the one with the
highest p value), PRIO, was deleted and the model was refitted. AGE still appeared to be the
only significant predictor. The predictors RACE, MAR and EDUC were then successively
removed, as the p values for their coefficients were all well above 5% after each model
refit. Based on this analysis, the only factor that explains arrest time is age.
A likelihood ratio test is performed in the appendix. This tests whether it was justified to
remove all of the predictors except for age. The test statistic is the difference between
the likelihood ratio statistics of the full and final models, 25.84 - 15.2 = 10.64, on
10 - 1 = 9 degrees of freedom. The p value for the test is greater than 5%, so we do not have
evidence to reject the null hypothesis that the coefficients of PRIO, RACE, MAR and EDUC
are jointly zero (that is, that these predictors contribute no useful information).
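The p value for this test is the upper tail of a chi-square distribution with 9 degrees of freedom. A minimal Python sketch follows, as a stand-in for the pchisq call in the appendix. The helper chisq_sf is hypothetical and uses the power series for the regularised lower incomplete gamma function.

```python
import math

def chisq_sf(x, df):
    """P(X > x) for X ~ chi-square(df), computed as 1 - P(a, z) with
    a = df/2, z = x/2, where P is the regularised lower incomplete gamma
    function evaluated by its power series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total and n < 1000:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

lr_stat = 25.84 - 15.2   # full-model LR statistic minus final-model LR statistic
df = 10 - 1              # difference in the number of fitted coefficients
p_value = chisq_sf(lr_stat, df)
print(round(p_value, 2))
```

The result is about 0.30, in line with the value shown in the appendix, so the conclusion of the test is unchanged.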
A plot of the cumulative hazard function against the Cox-Snell residuals is given below:
Figure 9: Testing Proportionality Assumption

The plot follows the 45 degree line fairly closely. This suggests that the Cox-Snell
residuals have approximately an exponential(1) distribution, which is what we expect when the
fitted model is appropriate. This supports the model assumption of proportional hazards and
indicates that the model fits the data well.
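A plot of this kind is usually built by treating the Cox-Snell residuals as (possibly censored) survival times and estimating their cumulative hazard. If the residuals are exponential(1), the estimated cumulative hazard at r is close to r. A minimal Python sketch using the Nelson-Aalen estimator follows. The residual values and event indicators are hypothetical and serve only to show the calculation, and the function name is illustrative.

```python
def nelson_aalen(residuals, events):
    """Return [(r, H(r))] at each distinct residual with at least one event,
    where H is the Nelson-Aalen cumulative hazard estimate."""
    data = sorted(zip(residuals, events))
    n_at_risk = len(data)
    cum_hazard = 0.0
    points = []
    i = 0
    while i < len(data):
        r = data[i][0]
        j = i
        d = 0
        # Group tied residuals and count the events among them.
        while j < len(data) and data[j][0] == r:
            d += data[j][1]
            j += 1
        if d > 0:
            cum_hazard += d / n_at_risk
            points.append((r, cum_hazard))
        n_at_risk -= j - i
        i = j
    return points

# Hypothetical residuals (1 = arrest observed, 0 = censored).
residuals = [0.05, 0.2, 0.4, 0.7, 1.1, 1.8]
events = [1, 1, 0, 1, 1, 1]
points = nelson_aalen(residuals, events)

for r, h in points:
    print(r, round(h, 4))
```

Plotting the second element of each pair against the first gives the diagnostic plot. Points lying near the line H = r are consistent with an adequate fit.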
Appendix
Q1
(a)
Day   Empirical Hazard Function   Empirical Survival Function
0     0.000000                    1.000000
2     0.001440                    0.998560
3     0.004006                    0.994560
4     0.005077                    0.989510
5     0.006382                    0.983195
6     0.007535                    0.975787
7     0.009771                    0.966253
8     0.012324                    0.954345
9     0.016416                    0.938678
10    0.021837                    0.918180
11    0.029819                    0.890801
12    0.037855                    0.857079
13    0.045210                    0.818330
14    0.058853                    0.770169
15    0.063439                    0.721310
16    0.072233                    0.669208
17    0.075692                    0.618554
18    0.079254                    0.569531
19    0.082636                    0.522468
20    0.084988                    0.478064
21    0.092282                    0.433947
22    0.096805                    0.391939
23    0.100236                    0.352653
24    0.105855                    0.315323
25    0.110221                    0.280568
26    0.115806                    0.248076
27    0.129891                    0.215853
28    0.133597                    0.187016
29    0.136103                    0.161562
30    0.128019                    0.140879
31    0.121290                    0.123792
32    0.121414                    0.108762
33    0.116820                    0.096056
34    0.124090                    0.084137
35    0.125001                    0.073620
36    0.126642                    0.064296
37    0.122354                    0.056429
38    0.133847                    0.048876
39    0.115366                    0.043238
40    0.124878                    0.037838
41    0.121245                    0.033251
42    0.127930                    0.028997
43    0.130136                    0.025223
44    0.136561                    0.021779
45    0.143931                    0.018644
46    0.135957                    0.016109
47    0.130634                    0.014005
48    0.136145                    0.012098
49    0.145241                    0.010341
50    0.133767                    0.008958
51    0.151456                    0.007601
52    0.135971                    0.006568
53    0.145604                    0.005611
54    0.150133                    0.004769
55    0.132578                    0.004137
56    0.160273                    0.003474
57    0.160488                    0.002916
58    0.158120                    0.002455
59    0.167174                    0.002045
60    0.160098                    0.001717
61    0.131592                    0.001491
62    0.130919                    0.001296
63    0.125000                    0.001134
64    0.134066                    0.000982
65    0.141286                    0.000843
66    0.168473                    0.000701
67    0.122038                    0.000616
68    0.136302                    0.000532
69    0.104688                    0.000476
70    0.125654                    0.000416
71    0.079840                    0.000383
72    0.119306                    0.000337
73    0.120690                    0.000297
74    0.109244                    0.000264
75    0.103774                    0.000237
76    0.091228                    0.000215
77    0.127413                    0.000188
78    0.066372                    0.000175
79    0.080569                    0.000161
80    0.067010                    0.000150
81    0.066298                    0.000140
82    0.076923                    0.000130
83    0.089744                    0.000118
84    0.084507                    0.000108
85    0.061538                    0.000101
86    0.057377                    0.000096
87    0.052174                    0.000091
88    0.100917                    0.000081
89    0.071429                    0.000076
91    0.054945                    0.000071
92    0.011628                    0.000071
93    0.070588                    0.000066
94    0.075949                    0.000061
95    0.027397                    0.000059
96    0.056338                    0.000056
97    0.014925                    0.000055
98    0.015152                    0.000054
99    0.046154                    0.000052
103   0.064516                    0.000048
104   0.017241                    0.000047
105   0.035088                    0.000046
106   0.018182                    0.000045
107   0.018519                    0.000044
109   0.018868                    0.000043
110   0.019231                    0.000042
111   0.039216                    0.000041
112   0.040816                    0.000039
113   0.042553                    0.000037
114   0.044444                    0.000036
115   0.023256                    0.000035
116   0.047619                    0.000033
120   0.025000                    0.000032
121   0.051282                    0.000031
122   0.027027                    0.000030
123   0.083333                    0.000027
125   0.060606                    0.000026
126   0.096774                    0.000023
127   0.107143                    0.000021
128   0.040000                    0.000020
129   0.041667                    0.000019
130   0.086957                    0.000017
133   0.047619                    0.000017
137   0.050000                    0.000016
142   0.105263                    0.000014
143   0.058824                    0.000013
144   0.125000                    0.000012
146   0.142857                    0.000010
147   0.083333                    0.000009
148   0.090909                    0.000008
149   0.100000                    0.000007
151   0.111111                    0.000007
154   0.125000                    0.000006
155   0.142857                    0.000005
156   0.166667                    0.000004
159   0.200000                    0.000003
165   0.500000                    0.000002
172   1.000000                    0.000000
(b)
Formula: survival ~ g^(c^t - 1)

Parameters:
  Estimate Std. Error t value Pr(>|t|)
c 1.120873   0.003795   295.4   <2e-16 ***
g 0.929072   0.005720   162.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02248 on 135 degrees of freedom

Number of iterations to convergence: 10
Achieved convergence tolerance: 3.088e-06
(c)
Formula: Survival ~ exp(-(t/theta)^k)

Parameters:
      Estimate Std. Error t value Pr(>|t|)
theta 22.99430    0.06138   374.6   <2e-16 ***
k      2.56199    0.02430   105.4   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.009922 on 135 degrees of freedom

Number of iterations to convergence: 8
Achieved convergence tolerance: 3.372e-06
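The two fits can also be compared numerically against the tabulated empirical survival function. This is a minimal Python sketch, assuming the parameter estimates printed in the Q1 (b) and (c) outputs and three empirical values copied from the Q1 (a) table. The function names are illustrative.

```python
import math

def gompertz_survival(t, g=0.929072, c=1.120873):
    """Fitted Gompertz model: S(t) = g^(c^t - 1)."""
    return g ** (c ** t - 1.0)

def weibull_survival(t, theta=22.99430, k=2.56199):
    """Fitted Weibull model: S(t) = exp(-(t/theta)^k)."""
    return math.exp(-((t / theta) ** k))

# Empirical survival values copied from the Q1 (a) table (day: S).
empirical = {10: 0.918180, 20: 0.478064, 40: 0.037838}

# Print fitted minus empirical survival for each model.
for day, s_emp in empirical.items():
    print(day, gompertz_survival(day) - s_emp, weibull_survival(day) - s_emp)
```

At these three days the Weibull differences are smaller in absolute value than the Gompertz ones, in line with its lower residual standard error.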
Q2.
(a)
Week of Arrest   Sum of arrest (dj)   Number of lives at risk (nj)   Empirical Hazard Rate   Empirical Survival Function
0                0                    432                            0.000000                1.000000
1                1                    432                            0.002315                0.997685
2                1                    431                            0.002320                0.995370
3                1                    430                            0.002326                0.993056
4                1                    429                            0.002331                0.990741
5                1                    428                            0.002336                0.988426
6                1                    427                            0.002342                0.986111
7                1                    426                            0.002347                0.983796
8                5                    425                            0.011765                0.972222
9                2                    420                            0.004762                0.967593
10               1                    418                            0.002392                0.965278
11               2                    417                            0.004796                0.960648
12               2                    415                            0.004819                0.956019
13               1                    413                            0.002421                0.953704
14               3                    412                            0.007282                0.946759
15               2                    409                            0.004890                0.942130
16               2                    407                            0.004914                0.937500
17               3                    405                            0.007407                0.930556
18               3                    402                            0.007463                0.923611
19               2                    399                            0.005013                0.918981
20               5                    397                            0.012594                0.907407
21               2                    392                            0.005102                0.902778
22               1                    390                            0.002564                0.900463
23               1                    389                            0.002571                0.898148
24               4                    388                            0.010309                0.888889
25               3                    384                            0.007813                0.881944
26               3                    381                            0.007874                0.875000
27               2                    378                            0.005291                0.870370
28               2                    376                            0.005319                0.865741
30               2                    374                            0.005348                0.861111
31               1                    372                            0.002688                0.858796
32               2                    371                            0.005391                0.854167
33               2                    369                            0.005420                0.849537
34               2                    367                            0.005450                0.844907
35               4                    365                            0.010959                0.835648
36               3                    361                            0.008310                0.828704
37               4                    358                            0.011173                0.819444
38               1                    354                            0.002825                0.817130
39               2                    353                            0.005666                0.812500
40               4                    351                            0.011396                0.803241
42               2                    347                            0.005764                0.798611
43               4                    345                            0.011594                0.789352
44               2                    341                            0.005865                0.784722
45               2                    339                            0.005900                0.780093
46               4                    337                            0.011869                0.770833
47               1                    333                            0.003003                0.768519
48               2                    332                            0.006024                0.763889
49               5                    330                            0.015152                0.752315
50               3                    325                            0.009231                0.745370
52               4                    322                            0.012422                0.736111
(b)
Log Rank Test:
Call:
survdiff(formula = Surv(week, arrest) ~ race, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
race=0  53       12     14.7    0.4990     0.576
race=1 379      102     99.3    0.0739     0.576

 Chisq= 0.6  on 1 degrees of freedom, p= 0.448

Wilcoxon Statistic Test:
Call:
survdiff(formula = Surv(week, arrest) ~ race, rho = 1)

         N Observed Expected (O-E)^2/E (O-E)^2/V
race=0  53     10.1     12.8     0.568     0.748
race=1 379     89.2     86.5     0.084     0.748

 Chisq= 0.7  on 1 degrees of freedom, p= 0.38
(c)
Full Model:
Call:
coxph(formula = xx ~ age + as.factor(educ) + mar + prio + race,
method = "breslow")
n= 432, number of events= 114
                      coef exp(coef)  se(coef)      z Pr(>|z|)
age              -0.061591  0.940268  0.021208 -2.904  0.00368 **
as.factor(educ)1  0.108916  1.115069  0.873223  0.125  0.90074
as.factor(educ)2 -0.682694  0.505254  0.683156 -0.999  0.31764
as.factor(educ)3 -0.457698  0.632739  0.525318 -0.871  0.38360
as.factor(educ)4 -0.880341  0.414641  0.544926 -1.616  0.10620
as.factor(educ)5 -1.277873  0.278629  0.676321 -1.889  0.05883 .
as.factor(educ)6 -0.563035  0.569478  0.772968 -0.728  0.46637
mar              -0.466274  0.627335  0.375422 -1.242  0.21424
prio             -0.001888  0.998114  0.049705 -0.038  0.96970
race              0.336789  1.400443  0.310854  1.083  0.27862
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                 exp(coef) exp(-coef) lower .95 upper .95
age                 0.9403     1.0635   0.90199    0.9802
as.factor(educ)1    1.1151     0.8968   0.20138    6.1743
as.factor(educ)2    0.5053     1.9792   0.13244    1.9276
as.factor(educ)3    0.6327     1.5804   0.22598    1.7716
as.factor(educ)4    0.4146     2.4117   0.14251    1.2065
as.factor(educ)5    0.2786     3.5890   0.07402    1.0488
as.factor(educ)6    0.5695     1.7560   0.12518    2.5908
mar                 0.6273     1.5940   0.30057    1.3094
prio                0.9981     1.0019   0.90546    1.1002
race                1.4004     0.7141   0.76149    2.5755

Concordance= 0.636  (se = 0.027 )
Rsquare= 0.058   (max possible= 0.956 )
Likelihood ratio test= 25.84  on 10 df,   p=0.003955
Wald test            = 22.23  on 10 df,   p=0.01399
Score (logrank) test = 23.19  on 10 df,   p=0.01007
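The z value, hazard ratio and 95% interval printed for each coefficient follow from the estimate and its standard error. A minimal Python sketch for the age coefficient of the full model follows. The helper name is hypothetical, and the interval is assumed to be the usual normal-approximation interval on the log scale.

```python
import math

def hazard_ratio_summary(coef, se, z_crit=1.959964):
    """Return (z, exp(coef), lower, upper) for one Cox coefficient,
    with a normal-approximation 95% interval computed on the log scale."""
    z = coef / se
    return (z,
            math.exp(coef),
            math.exp(coef - z_crit * se),
            math.exp(coef + z_crit * se))

# Age coefficient and standard error from the full model output.
z, hr, lower, upper = hazard_ratio_summary(-0.061591, 0.021208)
print(round(z, 3), round(hr, 4), round(lower, 4), round(upper, 4))
```

A hazard ratio of about 0.94 means that each additional unit of age multiplies the estimated hazard of arrest by roughly 0.94, with the other predictors held fixed.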
Final Model:
Call:
coxph(formula = xx ~ age, method = "breslow")
       coef exp(coef) se(coef)    z       p
age -0.0726      0.93   0.0208 -3.5 0.00047

Likelihood ratio test=15.2  on 1 df, p=9.71e-05  n= 432, number of events= 114
Log-likelihood Test on Final Model:
> pvalue<-1-pchisq(25.84-15.2,9)
> pvalue
0.3004644
