Q1.
(a)
[Figure: empirical plot for part (a); x-axis: Time (days).]
(b)
The survival model fitted to the data is the Gompertz form S(t) = g^(c^t - 1). The parameter
estimates are g = 0.929072 and c = 1.120873. Plots of the empirical and fitted models are given below.
[Figure: empirical and fitted models plotted against time.]
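As a rough check on the model in (b), the fitted survival function S(t) = g^(c^t - 1) can be evaluated directly at the reported estimates and compared with the empirical survival column in the appendix. The sketch below is illustrative Python, not the code used for the fit; the function names are assumptions, and the hazard formula h(t) = -ln(g) ln(c) c^t is derived here from S(t) rather than quoted from the report.

```python
import math

G = 0.929072  # estimate of g reported in (b)
C = 1.120873  # estimate of c reported in (b)

def gompertz_survival(t, g=G, c=C):
    """Fitted survival function S(t) = g^(c^t - 1)."""
    return g ** (c ** t - 1.0)

def gompertz_hazard(t, g=G, c=C):
    """Implied hazard h(t) = -d/dt ln S(t) = -ln(g) * ln(c) * c^t."""
    return -math.log(g) * math.log(c) * c ** t

# Evaluate the fitted curve at a few days for comparison with the appendix table.
for day in (10, 20, 30):
    print(day, round(gompertz_survival(day), 4), round(gompertz_hazard(day), 4))
```

Because the hazard is proportional to c^t, it grows exponentially in t, which is the behaviour part (c) contrasts with the Weibull fit.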
(c)
Other distributions that could be fitted to the data include the Weibull, the exponential and
a polynomial form; more generally, any distribution whose survival function is similar in
shape to the empirical function is a candidate.
I fitted a Weibull distribution to the data. The fitted survival model is S(t) = e^(-(t/theta)^k), with parameter estimates theta = 22.994 and k = 2.562. Plots of the empirical
(in black) and fitted (in red) models are given below. This model gives a much better fit
(residual standard error 0.009922, against 0.02248 for the model in (b)) because the
Weibull hazard function does not increase anywhere near as fast as the Gompertz hazard function.
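The claim about hazard growth can be illustrated numerically. The sketch below evaluates both fitted hazards at the reported estimates; it is illustrative Python with assumed function names, and the hazard formulas are the standard ones implied by each survival function rather than output from the original fit.

```python
import math

THETA, K = 22.994, 2.562     # Weibull estimates reported in (c)
G, C = 0.929072, 1.120873    # Gompertz estimates reported in (b)

def weibull_survival(t, theta=THETA, k=K):
    """S(t) = exp(-(t/theta)^k)."""
    return math.exp(-((t / theta) ** k))

def weibull_hazard(t, theta=THETA, k=K):
    """h(t) = (k/theta) * (t/theta)^(k-1): polynomial growth in t."""
    return (k / theta) * (t / theta) ** (k - 1.0)

def gompertz_hazard(t, g=G, c=C):
    """h(t) = -ln(g) * ln(c) * c^t: exponential growth in t."""
    return -math.log(g) * math.log(c) * c ** t

# Side-by-side hazards at a few days.
for day in (20, 40, 60):
    print(day, round(weibull_hazard(day), 4), round(gompertz_hazard(day), 4))
```

At later days the Gompertz hazard runs far ahead of the Weibull hazard, while the empirical hazard in the appendix stays well below both exponential-growth values.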
Q2.
(a)
[Figures: two plots for part (a), with x-axes labelled Age (weeks) and Time (weeks).]
(b)
The results from both the Log Rank and Wilcoxon tests are given in the appendix. Both tests
give p values well above 5% (0.448 and 0.38 respectively), so we do not have any evidence against
the null hypothesis that the survival functions for both races are the same. That is, we
conclude that there is no significant difference in arrest times between races.
Note: It is shown in (c) that the proportional hazards assumption is approximately correct.
Therefore, the Log Rank test is the more appropriate of the two for picking up differences in survival functions.
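For readers who want to see what the Log Rank statistic measures, the following is a minimal sketch of the two-group calculation (observed minus expected events in one group, summed over event times, squared and divided by the hypergeometric variance). It is illustrative Python, not the R `survdiff` call used in the appendix; the function names and the toy data are assumptions made up for the example. The last line only converts the chi-square value reported in the appendix (0.576 on 1 degree of freedom) into a p value.

```python
import math

def logrank_chisq(times, events, groups):
    """Two-group log-rank chi-square statistic (1 df); groups are coded 0/1."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u in times if u >= t)                      # at risk, both groups
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n                             # observed - expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance

def chisq1_pvalue(x):
    """Upper-tail p value for a chi-square statistic on 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Toy data (hypothetical): event/censoring times, event indicators, group labels.
toy_times = [3, 5, 7, 9, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 1, 0, 1, 1]
toy_groups = [0, 0, 0, 0, 1, 1, 1, 1]
print(logrank_chisq(toy_times, toy_events, toy_groups))

# p value implied by the chi-square reported in the appendix.
print(round(chisq1_pvalue(0.576), 3))
```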
(c)
The full model and final model are shown in the appendix. The final model was reached by
removing predictors from the full model one at a time. From the output of the full model fit,
AGE appeared to be the only significant predictor. The least significant predictor (the one with the
highest p value), PRIO, was deleted and the model was refitted. AGE still appeared to be
the only significant predictor. The predictors RACE, MAR and EDUC were then
successively removed, as the p values for their coefficients were all well above
5% after each model refit. Based on this analysis, the only factor found to explain arrest time is
age.
A likelihood ratio test is performed in the appendix. This tests whether it was justified to
remove all of the predictors except for age. The p value for the test is greater than 5%, so
we do not have any evidence to reject the null hypothesis that the coefficients of PRIO, RACE,
MAR and EDUC are jointly zero (that is, these predictors contribute no useful information).
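The mechanics of the likelihood ratio test can be sketched as follows: the statistic is twice the difference in log partial likelihood between the full and reduced models, referred to a chi-square distribution whose degrees of freedom equal the number of coefficients dropped (which appears to be 9 here, since EDUC enters the full model as six dummy terms). This is illustrative Python with assumed function names; the log-likelihoods of the two fitted models are not reproduced in this report, so the example only re-derives the p values printed for the full model's overall tests.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic 2 * (l_full - l_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)

# Overall tests printed with the full model in the appendix (10 df each).
print(chi2_sf(25.84, 10))  # likelihood ratio test
print(chi2_sf(22.23, 10))  # Wald test
```

A nested comparison would call `chi2_sf(lrt_statistic(l_reduced, l_full), 9)` with the two log-likelihoods taken from the fitted models.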
A plot of the cumulative hazard function against the Cox-Snell residuals is given below:
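The idea behind this diagnostic can be sketched in a few lines. The Cox-Snell residual for subject i is the estimated cumulative hazard at that subject's observed time, r_i = H0(t_i) * exp(x_i * beta); if the model fits, the residuals behave like a censored sample from a unit exponential, so their Nelson-Aalen cumulative hazard plotted against the residuals should lie close to a 45-degree line. The code below is illustrative Python with assumed function names and hypothetical inputs; it is not the code that produced the plot.

```python
import math

def cox_snell_residual(baseline_cumhaz_at_t, linear_predictor):
    """r_i = H0(t_i) * exp(x_i * beta)."""
    return baseline_cumhaz_at_t * math.exp(linear_predictor)

def nelson_aalen(values, events):
    """Nelson-Aalen cumulative hazard, treating `values` as (possibly censored) times.

    Returns a list of (value, cumulative_hazard) pairs at each distinct event value.
    """
    pairs = sorted(zip(values, events))
    n = len(pairs)
    cumulative = 0.0
    curve = []
    i = 0
    while i < n:
        value = pairs[i][0]
        j, deaths = i, 0
        while j < n and pairs[j][0] == value:   # group tied values
            deaths += pairs[j][1]
            j += 1
        if deaths:
            cumulative += deaths / (n - i)      # n - i subjects still at risk
            curve.append((value, cumulative))
        i = j
    return curve

# Hypothetical residuals and event indicators, for illustration only.
residuals = [0.2, 0.5, 1.0]
indicators = [1, 1, 1]
for r, h in nelson_aalen(residuals, indicators):
    print(r, round(h, 4))
```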
Appendix
Q1
(a)
Day  Empirical  Empirical
     Hazard     Survival
     Function   Function
  0  0.000000   1.000000
  2  0.001440   0.998560
  3  0.004006   0.994560
  4  0.005077   0.989510
  5  0.006382   0.983195
  6  0.007535   0.975787
  7  0.009771   0.966253
  8  0.012324   0.954345
  9  0.016416   0.938678
 10  0.021837   0.918180
 11  0.029819   0.890801
 12  0.037855   0.857079
 13  0.045210   0.818330
 14  0.058853   0.770169
 15  0.063439   0.721310
 16  0.072233   0.669208
 17  0.075692   0.618554
 18  0.079254   0.569531
 19  0.082636   0.522468
 20  0.084988   0.478064
 21  0.092282   0.433947
 22  0.096805   0.391939
 23  0.100236   0.352653
 24  0.105855   0.315323
 25  0.110221   0.280568
 26  0.115806   0.248076
 27  0.129891   0.215853
 28  0.133597   0.187016
 29  0.136103   0.161562
 30  0.128019   0.140879
 31  0.121290   0.123792
 32  0.121414   0.108762
 33  0.116820   0.096056
 34  0.124090   0.084137
 35  0.125001   0.073620
 36  0.126642   0.064296
 37  0.122354   0.056429
 38  0.133847   0.048876
 39  0.115366   0.043238
 40  0.124878   0.037838
 41  0.121245   0.033251
 42  0.127930   0.028997
 43  0.130136   0.025223
 44  0.136561   0.021779
 45  0.143931   0.018644
 46  0.135957   0.016109
 47  0.130634   0.014005
 48  0.136145   0.012098
 49  0.145241   0.010341
 50  0.133767   0.008958
 51  0.151456   0.007601
 52  0.135971   0.006568
 53  0.145604   0.005611
 54  0.150133   0.004769
 55  0.132578   0.004137
 56  0.160273   0.003474
 57  0.160488   0.002916
 58  0.158120   0.002455
 59  0.167174   0.002045
 60  0.160098   0.001717
 61  0.131592   0.001491
 62  0.130919   0.001296
 63  0.125000   0.001134
 64  0.134066   0.000982
 65  0.141286   0.000843
 66  0.168473   0.000701
 67  0.122038   0.000616
 68  0.136302   0.000532
 69  0.104688   0.000476
 70  0.125654   0.000416
 71  0.079840   0.000383
 72  0.119306   0.000337
 73  0.120690   0.000297
 74  0.109244   0.000264
 75  0.103774   0.000237
 76  0.091228   0.000215
 77  0.127413   0.000188
 78  0.066372   0.000175
 79  0.080569   0.000161
 80  0.067010   0.000150
 81  0.066298   0.000140
 82  0.076923   0.000130
 83  0.089744   0.000118
 84  0.084507   0.000108
 85  0.061538   0.000101
 86  0.057377   0.000096
 87  0.052174   0.000091
 88  0.100917   0.000081
 89  0.071429   0.000076
 91  0.054945   0.000071
 92  0.011628   0.000071
 93  0.070588   0.000066
 94  0.075949   0.000061
 95  0.027397   0.000059
 96  0.056338   0.000056
 97  0.014925   0.000055
 98  0.015152   0.000054
 99  0.046154   0.000052
103  0.064516   0.000048
104  0.017241   0.000047
105  0.035088   0.000046
106  0.018182   0.000045
107  0.018519   0.000044
109  0.018868   0.000043
110  0.019231   0.000042
111  0.039216   0.000041
112  0.040816   0.000039
113  0.042553   0.000037
114  0.044444   0.000036
115  0.023256   0.000035
116  0.047619   0.000033
120  0.025000   0.000032
121  0.051282   0.000031
122  0.027027   0.000030
123  0.083333   0.000027
125  0.060606   0.000026
126  0.096774   0.000023
127  0.107143   0.000021
128  0.040000   0.000020
129  0.041667   0.000019
130  0.086957   0.000017
133  0.047619   0.000017
137  0.050000   0.000016
142  0.105263   0.000014
143  0.058824   0.000013
144  0.125000   0.000012
146  0.142857   0.000010
147  0.083333   0.000009
148  0.090909   0.000008
149  0.100000   0.000007
151  0.111111   0.000007
154  0.125000   0.000006
155  0.142857   0.000005
156  0.166667   0.000004
159  0.200000   0.000003
165  0.500000   0.000002
172  1.000000   0.000000
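The two columns of the Q1(a) table are linked by the product-limit relation: each survival value is the previous one multiplied by one minus the current hazard. The sketch below reproduces the first few survival values from the tabulated hazards; it is illustrative Python with an assumed function name, using only the first rows of the table.

```python
def survival_from_hazards(hazards):
    """Product-limit survival: S_j = S_(j-1) * (1 - h_j), starting from S = 1."""
    survival = []
    current = 1.0
    for h in hazards:
        current *= 1.0 - h
        survival.append(current)
    return survival

# Empirical hazards for days 0, 2, 3, 4, 5 from the table above.
first_hazards = [0.000000, 0.001440, 0.004006, 0.005077, 0.006382]
for s in survival_from_hazards(first_hazards):
    print(round(s, 6))
```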
(b)
Formula: survival ~ g^(c^t - 1)
Parameters:
  Estimate Std. Error t value Pr(>|t|)
c 1.120873   0.003795   295.4   <2e-16 ***
g 0.929072   0.005720   162.4   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.02248 on 135 degrees of freedom
Number of iterations to convergence: 10
Achieved convergence tolerance: 3.088e-06
(c)
Formula: Survival ~ exp(-(t/theta)^k)
Parameters:
      Estimate Std. Error t value Pr(>|t|)
theta 22.99430    0.06138   374.6   <2e-16 ***
k      2.56199    0.02430   105.4   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.009922 on 135 degrees of freedom
Number of iterations to convergence: 8
Achieved convergence tolerance: 3.372e-06
Q2.
(a)
Week of  Sum of arrest  Number of lives  Empirical    Empirical Survival
Arrest   (dj)           at risk (nj)     Hazard Rate  Function
0        0              432              0.000000     1.000000
1        1              432              0.002315     0.997685
2        1              431              0.002320     0.995370
3        1              430              0.002326     0.993056
4        1              429              0.002331     0.990741
5        1              428              0.002336     0.988426
6        1              427              0.002342     0.986111
7        1              426              0.002347     0.983796
8        5              425              0.011765     0.972222
9        2              420              0.004762     0.967593
10       1              418              0.002392     0.965278
11       2              417              0.004796     0.960648
12       2              415              0.004819     0.956019
13       1              413              0.002421     0.953704
14       3              412              0.007282     0.946759
15       2              409              0.004890     0.942130
16       2              407              0.004914     0.937500
17       3              405              0.007407     0.930556
18       3              402              0.007463     0.923611
19       2              399              0.005013     0.918981
20       5              397              0.012594     0.907407
21       2              392              0.005102     0.902778
22       1              390              0.002564     0.900463
23       1              389              0.002571     0.898148
24       4              388              0.010309     0.888889
25       3              384              0.007813     0.881944
26       3              381              0.007874     0.875000
27       2              378              0.005291     0.870370
28       2              376              0.005319     0.865741
30       2              374              0.005348     0.861111
31       1              372              0.002688     0.858796
32       2              371              0.005391     0.854167
33       2              369              0.005420     0.849537
34       2              367              0.005450     0.844907
35       4              365              0.010959     0.835648
36       3              361              0.008310     0.828704
37       4              358              0.011173     0.819444
38       1              354              0.002825     0.817130
39       2              353              0.005666     0.812500
40       4              351              0.011396     0.803241
42       2              347              0.005764     0.798611
43       4              345              0.011594     0.789352
44       2              341              0.005865     0.784722
45       2              339              0.005900     0.780093
46       4              337              0.011869     0.770833
47       1              333              0.003003     0.768519
48       2              332              0.006024     0.763889
49       5              330              0.015152     0.752315
50       3              325              0.009231     0.745370
52       4              322              0.012422     0.736111
(b)
Log Rank Test:
Call:
survdiff(formula = Surv(week, arrest) ~ race, rho = 0)
         N Observed Expected (O-E)^2/E (O-E)^2/V
race=0  53       12     14.7    0.4990     0.576
race=1 379      102     99.3    0.0739     0.576
Chisq= 0.6 on 1 degrees of freedom, p= 0.448
Wilcoxon Statistic Test:
Call:
survdiff(formula = Surv(week, arrest) ~ race, rho = 1)
         N Observed Expected (O-E)^2/E (O-E)^2/V
race=0  53     10.1     12.8     0.568     0.748
race=1 379     89.2     86.5     0.084     0.748
Chisq= 0.7 on 1 degrees of freedom, p= 0.38
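The Q2(a) table can be rebuilt from the arrest counts alone. In the tabulated rows each risk set equals the previous one minus the previous arrests, so the sketch below assumes no subject leaves the risk set for any other reason before week 52; that assumption is read off the table, not stated in the report. The empirical hazard is dj/nj and the survival function is the running product of (1 - dj/nj). This is illustrative Python with an assumed function name.

```python
def life_table(arrests, initial_at_risk):
    """Return (at_risk, hazard, survival) lists from event counts per row."""
    at_risk, hazard, survival = [], [], []
    n = initial_at_risk
    s = 1.0
    for d in arrests:
        at_risk.append(n)
        h = d / n
        s *= 1.0 - h
        hazard.append(h)
        survival.append(s)
        n -= d  # assumes the risk set shrinks only through arrests
    return at_risk, hazard, survival

# dj column of the Q2(a) table, in row order (weeks 0 to 52, skipping weeks with no row).
DJ = [0, 1, 1, 1, 1, 1, 1, 1, 5, 2, 1, 2, 2, 1, 3, 2, 2, 3, 3, 2, 5, 2, 1, 1, 4,
      3, 3, 2, 2, 2, 1, 2, 2, 2, 4, 3, 4, 1, 2, 4, 2, 4, 2, 2, 4, 1, 2, 5, 3, 4]

nj, hj, sj = life_table(DJ, 432)
print(sum(DJ), nj[-1], round(hj[8], 6), round(sj[-1], 6))
```

The total of the dj column agrees with the number of events reported for the Cox model fit (114).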
(c)
Full Model:
Call:
coxph(formula = xx ~ age + as.factor(educ) + mar + prio + race,
method = "breslow")
n= 432, number of events= 114
                      coef exp(coef)  se(coef)      z Pr(>|z|)
age              -0.061591  0.940268  0.021208 -2.904  0.00368 **
as.factor(educ)1  0.108916  1.115069  0.873223  0.125  0.90074
as.factor(educ)2 -0.682694  0.505254  0.683156 -0.999  0.31764
as.factor(educ)3 -0.457698  0.632739  0.525318 -0.871  0.38360
as.factor(educ)4 -0.880341  0.414641  0.544926 -1.616  0.10620
as.factor(educ)5 -1.277873  0.278629  0.676321 -1.889  0.05883 .
as.factor(educ)6 -0.563035  0.569478  0.772968 -0.728  0.46637
mar              -0.466274  0.627335  0.375422 -1.242  0.21424
prio             -0.001888  0.998114  0.049705 -0.038  0.96970
race              0.336789  1.400443  0.310854  1.083  0.27862
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
                 exp(coef) exp(-coef) lower .95 upper .95
age                 0.9403     1.0635   0.90199    0.9802
as.factor(educ)1    1.1151     0.8968   0.20138    6.1743
as.factor(educ)2    0.5053     1.9792   0.13244    1.9276
as.factor(educ)3    0.6327     1.5804   0.22598    1.7716
as.factor(educ)4    0.4146     2.4117   0.14251    1.2065
as.factor(educ)5    0.2786     3.5890   0.07402    1.0488
as.factor(educ)6    0.5695     1.7560   0.12518    2.5908
mar                 0.6273     1.5940   0.30057    1.3094
prio                0.9981     1.0019   0.90546    1.1002
race                1.4004     0.7141   0.76149    2.5755
Concordance= 0.636 (se = 0.027 )
Rsquare= 0.058 (max possible= 0.956 )
Likelihood ratio test= 25.84 on 10 df, p=0.003955
Wald test            = 22.23 on 10 df, p=0.01399
Score (logrank) test = 23.19 on 10 df, p=0.01007
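The columns of the full-model output are related in a simple way, which the sketch below illustrates for the age row: the hazard ratio is exp(coef), the 95% interval is exp(coef ± 1.96 × se), z is coef/se, and the p value is the two-sided normal tail. This is illustrative Python using only the printed coefficient and standard error; the variable names are assumptions.

```python
import math

COEF = -0.061591   # age coefficient from the full model output
SE = 0.021208      # its standard error
Z_975 = 1.959964   # 97.5% quantile of the standard normal

hazard_ratio = math.exp(COEF)
lower = math.exp(COEF - Z_975 * SE)
upper = math.exp(COEF + Z_975 * SE)
z = COEF / SE
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value

print(round(hazard_ratio, 6), round(lower, 5), round(upper, 4),
      round(z, 3), round(p_value, 5))
```

A hazard ratio just below 1 with an interval that excludes 1 is what marks age as the one significant predictor: each one-unit increase in age multiplies the estimated hazard of arrest by about 0.94.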
Final Model:
Call:
coxph(formula = xx ~ age, method = "breslow")