1 Introduction
3 Notation
5 General assumptions
8 Nelson-Aalen estimator
11 Examples
Introduction
Given observations (data), the aim is to estimate the distribution of the lifetime $T$. Recall that $F(t)$, $S(t)$, $f(t)$ and the hazard rate $\mu_t$ all carry the same information about the distribution, so estimating any one of them will do.
A simple method to estimate $S(t)$ would be to observe a very large number of newborns and take, as the survival function, the proportion still alive at each age.
However, this presents a number of problems:
The experiment would take an extremely long time to complete.
Lives under observation may be lost to the investigation for one reason or another, and excluding them from the analysis might bias the result (censoring).
The result would be useful only if all cohorts had the same mortality, which is not the case.
Parametric vs Non-parametric
Non-parametric approach: no prior assumptions are made about the shape or form of the distribution.
Parametric approach: assume that the distribution belongs to a certain family (e.g. normal or exponential) and use the data to estimate its parameters.
In this module we focus on non-parametric models.
Censoring
Types of censoring:
Type I Right Censoring
The event (e.g. a failure such as death) is observed only if it occurs before some prescribed time $C_R$ (the right censoring time).
The lifetime $T$ is known only if $T \le C_R$; if $T > C_R$, the observation is recorded as $C_R$.
Examples of right censoring:
the investigation ends before all the lives being observed have died
life insurance policyholders surrender their policies
Type II Right Censoring
Observation continues until a predetermined number (say $r$) of events (failures) have occurred.
The data then consist of the $r$ smallest lifetimes in a sample of $n$ (order statistics).
Left Censoring
The event of interest (such as death) has already occurred before observation starts, so we only know that the lifetime $T$ is less than a left censoring time $C_L$.
Interval Censoring
The lifetime T is only known to occur within an interval (e.g. actuarial investigations where we only know the calendar
year of death)
Truncation
Truncation occurs when only those individuals whose event times lie within a certain observation window $(Y_L, Y_R)$ are observed. For all other individuals no information is available at all.
It is often confused with censoring: in the presence of censoring at least partial information about the event time is available, whereas a truncated individual never enters the data.
Examples:
An insurance claim that is never reported because the deductible was not reached is left truncated.
Right truncation arises in estimating the distribution of the distances of stars from the Earth: stars that are too far away are not visible and are right truncated.
Random bounds
Random censoring/truncation: the censoring/truncation point is itself subject to randomness.
Examples:
another competing risk can remove the individual from the study (e.g. lapsing of a policy in a mortality study of insured lives)
the lifetime is censored by another random event (not failure)
Notation
Population of $N$ lives
Observe $m$ deaths; the remaining $N - m$ lives are right censored
Ordered times of death $t_1 < t_2 < \dots < t_k$, with $k \le m$
$d_j$: the number of deaths that occur at time $t_j$ ($1 \le j \le k$); more than one death can occur at any time, so $d_1 + d_2 + \dots + d_k = m$
$c_j$: the number of lives that are right censored at a time in $[t_j, t_{j+1})$ ($0 \le j \le k$), where $t_0 = 0$ and $t_{k+1} = \infty$, so $c_0 + c_1 + \dots + c_k = N - m$
The times at which observations are censored within $[t_j, t_{j+1})$ are $t_{j1}, t_{j2}, \dots, t_{jc_j}$ (they need not be distinct)
$n_j$: the number of lives alive and at risk at time $t_j^-$ (just before time $t_j$); then $n_j = d_j + c_j + n_{j+1}$
The largest observed study time is $t_{\max} = \max\{t_k, t_{kc_k}\}$
Example 2.1
10 rats injected with a new drug are observed over 20 days:

Day   Event
3     Rat 4 dies from effects of drug
4     Rat 3 dies from effects of drug
6     Rat 7 gnaws through bars of cage and escapes
11    Rats 6 and 9 die from effects of drug
17    Rat 1 killed by other rats
20    Investigation closes. Remaining rats hold street party.

Number of lives under investigation: $N = 10$
Number of drug-related deaths: $m = 4$
Times at which drug-related deaths are observed: $t_1 = 3$, $t_2 = 4$, $t_3 = 11$
Number of drug-related deaths at each time: $d_1 = 1$, $d_2 = 1$, $d_3 = 2$
Number of lives censored: $c_0 = 0$, $c_1 = 0$, $c_2 = 1$, $c_3 = 5$
Censoring times: $t_{2,1} = 6$, $t_{3,1} = 17$, $t_{3,2} = \dots = t_{3,5} = 20$
Number of lives at risk: $n_1 = 10$, $n_2 = 9$, $n_3 = 7$
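The bookkeeping in Example 2.1 can be sketched in a few lines of Python. This is only an illustration: the function name `summarise` and the encoding of each rat as a `(day, died)` pair are assumptions made here, not part of the notes.

```python
# Raw data from Example 2.1 as (day, died) pairs. died=True is a drug-related
# death; died=False is a right-censored life (escape, killed by other rats,
# or still alive when the investigation closes). This encoding is an assumption.
observations = [
    (3, True), (4, True), (6, False), (11, True), (11, True),
    (17, False), (20, False), (20, False), (20, False), (20, False),
]

def summarise(observations):
    """Return (t, d, c, n) following the notation of this section."""
    N = len(observations)
    t = sorted({time for time, died in observations if died})   # t_1 < ... < t_k
    d = [sum(1 for time, died in observations if died and time == tj) for tj in t]
    bounds = [0] + t + [float("inf")]                            # t_0 = 0, t_{k+1} = infinity
    c = [
        sum(1 for time, died in observations
            if not died and bounds[j] <= time < bounds[j + 1])
        for j in range(len(t) + 1)
    ]                                                            # c_0, ..., c_k
    n = []
    at_risk = N - c[0]
    for j in range(len(t)):
        n.append(at_risk)                                        # n_j, just before t_j
        at_risk -= d[j] + c[j + 1]                               # n_{j+1} = n_j - d_j - c_j
    return t, d, c, n

t, d, c, n = summarise(observations)
print(t, d, c, n)
```

The recursion in the last loop is just $n_j = d_j + c_j + n_{j+1}$ rearranged.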
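For a lifetime distribution whose probability mass sits at the times $t_j$, define the discrete hazard
$$\lambda_j = \Pr[T = t_j \mid T \ge t_j] \qquad (1 \le j \le k).$$
Then
$$S(t) = 1 - F(t) = \prod_{j: t_j \le t} (1 - \lambda_j).$$
Proof
Note that since $t_0 = 0$ we have
$$\Pr[T > t_0] = \Pr[T > 0] = 1.$$
Then
$$\Pr[T > t_1] = \frac{\Pr[T > t_1]}{\Pr[T > t_0]} = 1 - \frac{\Pr[T = t_1]}{\Pr[T > t_0]} = 1 - \lambda_1.$$
More generally
$$1 - \lambda_j = 1 - \Pr[T = t_j \mid T \ge t_j] = \frac{\Pr[T > t_j]}{\Pr[T \ge t_j]} = \frac{\Pr[T > t_j]}{\Pr[T > t_{j-1}]},$$
so the product telescopes:
$$S(t) = \prod_{j: t_j \le t} \frac{\Pr[T > t_j]}{\Pr[T > t_{j-1}]} = \prod_{j: t_j \le t} (1 - \lambda_j).$$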
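The product formula can be checked numerically. The sketch below uses a small hypothetical discrete distribution (the probabilities are invented purely for illustration) and exact fractions, and compares $S(t)$ computed directly with the product of the $(1 - \lambda_j)$.

```python
from fractions import Fraction

# Hypothetical discrete lifetime distribution (an assumption for illustration):
# Pr[T = t_j] at the times t_j = 1, 2, 5, 8.
pmf = {1: Fraction(1, 10), 2: Fraction(2, 10), 5: Fraction(3, 10), 8: Fraction(4, 10)}

def survival_direct(t):
    """S(t) = Pr[T > t], summed straight from the probability mass function."""
    return sum(p for tj, p in pmf.items() if tj > t)

def hazard(tj):
    """Discrete hazard lambda_j = Pr[T = t_j | T >= t_j]."""
    return pmf[tj] / sum(p for s, p in pmf.items() if s >= tj)

def survival_product(t):
    """S(t) as the product of (1 - lambda_j) over all t_j <= t."""
    result = Fraction(1)
    for tj in sorted(pmf):
        if tj <= t:
            result *= 1 - hazard(tj)
    return result

# The two computations agree at every time point tried.
for t in (0, 1, 3, 5, 8):
    assert survival_direct(t) == survival_product(t)
```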
General assumptions
Assumptions
The estimators introduced in this module (Kaplan-Meier, Nelson-Aalen) are based on the following assumptions:
Non-informative censoring: the time to censoring is independent of the time to death.
Independent lives: the time to censoring and the time to death are determined independently for each life.
If one maximises the likelihood, what will the shape of the resulting distribution be?
Shape of the distribution
For each $j$, the deaths at $t_j$ and the lives censored in $[t_j, t_{j+1})$ contribute the factor
$$\left[F(t_j) - F(t_j^-)\right]^{d_j} \prod_{l=1}^{c_j} \left[1 - F(t_{jl})\right]$$
to the likelihood of the data.
This is because
we have $d_j$ deaths at time $t_j$ for $j = 1, 2, \dots, k$, each with likelihood $F(t_j) - F(t_j^-)$
we have censored lives surviving to $t_{jl}$ for $j = 0, 1, \dots, k$ and $l = 1, \dots, c_j$, each with probability $1 - F(t_{jl})$
Note that we can take the product thanks to the assumption of non-informative censoring.
To maximise the likelihood, note the following:
$F(t_j) > F(t_j^-)$ at each failure time, otherwise the likelihood will be zero
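Introduction
The likelihood of the data is of the form
$$L = \prod_{j=1}^{k} \lambda_j^{d_j} (1 - \lambda_j)^{n_j - d_j},$$
which is maximised by
$$\hat\lambda_j = \frac{d_j}{n_j} \qquad (1 \le j \le k).$$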
Kaplan-Meier estimator
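Given what we have above, the Kaplan-Meier (KM) estimator follows directly:
$$\hat F(t) = 1 - \prod_{j: t_j \le t} (1 - \hat\lambda_j)$$
or alternatively
$$\hat S(t) = \begin{cases} 1 & \text{if } t < t_1 \\ \prod_{j: t_j \le t} \left(1 - \dfrac{d_j}{n_j}\right) & \text{if } t_1 \le t \le t_{\max}. \end{cases}$$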
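The estimate is a step function:
$$\hat S(t) = \begin{cases} 1 & \text{if } t < t_1 \\ \prod_{i=1}^{j} \left(1 - \dfrac{d_i}{n_i}\right) & \text{if } t_j \le t < t_{j+1},\ j = 1, \dots, k-1 \\ \prod_{i=1}^{k} \left(1 - \dfrac{d_i}{n_i}\right) & \text{if } t_k \le t \le t_{\max}. \end{cases}$$

For the data of Example 2.1:

$t_j$   $d_j$   $n_j$   $\hat\lambda_j = d_j/n_j$   $1 - \hat\lambda_j$   $1 - \prod_{i=1}^{j} (1 - \hat\lambda_i)$
3       1       10      0.1                         0.9                   0.1
4       1       9       0.11111                     0.88889               0.2
11      2       7       0.28571                     0.71429               0.42857

so that
$$\hat F(t) = \begin{cases} 0 & \text{for } 0 \le t < 3 \\ 0.1 & \text{for } 3 \le t < 4 \\ 0.2 & \text{for } 4 \le t < 11 \\ 0.42857 & \text{for } 11 \le t \le 20. \end{cases}$$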
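As a rough check of the Example 2.1 figures, the Kaplan-Meier product can be computed directly from the summary counts. The helper names `kaplan_meier` and `evaluate` are illustrative choices made here, not names used in the notes.

```python
def kaplan_meier(t, d, n):
    """Return [(t_j, S_hat(t_j))], where S_hat is the running product of (1 - d_j/n_j)."""
    steps, surv = [], 1.0
    for tj, dj, nj in zip(t, d, n):
        surv *= 1 - dj / nj
        steps.append((tj, surv))
    return steps

def evaluate(steps, t):
    """Evaluate the right-continuous step function S_hat at time t."""
    value = 1.0
    for tj, s in steps:
        if tj <= t:
            value = s
    return value

# Summary of Example 2.1: death times, deaths and numbers at risk.
steps = kaplan_meier([3, 4, 11], [1, 1, 2], [10, 9, 7])
for tj, s in steps:
    print(tj, round(s, 5), round(1 - s, 5))   # t_j, S_hat(t_j), F_hat(t_j)
```

`evaluate(steps, t)` returns 1 before the first death and then stays constant between death times, mirroring the piecewise definition.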
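The variance of the Kaplan-Meier estimator can be approximated by Greenwood's formula
$$\mathrm{Var}\left[\hat S(t)\right] \approx \hat S(t)^2 \sum_{j: t_j \le t} \frac{d_j}{n_j (n_j - d_j)}.$$
Maximum likelihood estimators are asymptotically normally distributed, so we can construct approximate confidence intervals, e.g.
$$\hat S(t) \pm Z_{1-\alpha/2} \sqrt{\mathrm{Var}\left[\hat S(t)\right]},$$
where $Z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ quantile of the standard normal distribution.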
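Example
Consider the following recorded lifetimes for a group of 9 rats:
2, 4+, 5, 8, 12, 12+, 17, 18+, 18+,
where + denotes a censored observation.
Suppose that you have found the K-M estimate $\hat S(5) = 0.76190$. Then
$$\mathrm{Var}\left[\hat S(5)\right] \approx \hat S(5)^2 \left( \frac{1}{9(9-1)} + \frac{1}{7(7-1)} \right) = 0.02188,$$
and an approximate 95% confidence interval for $S(5)$ is
$$\hat S(5) \pm Z_{1-0.025} \sqrt{\mathrm{Var}\left[\hat S(5)\right]} = 0.76190 \pm 1.96 \sqrt{0.02188} = (0.47196, 1.05182),$$
which finally becomes $(0.47196, 1)$, as $S \le 1$.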
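The calculation in this example can be reproduced with a short sketch. It assumes each rat is encoded as a `(time, died)` pair and that lives censored at a death time are still counted as at risk at that time, in line with the notation section; the function name `km_with_greenwood` is hypothetical.

```python
import math

# Lifetimes of the 9 rats; the flag is True for an observed death, False for "+".
data = [(2, True), (4, False), (5, True), (8, True), (12, True),
        (12, False), (17, True), (18, False), (18, False)]

def km_with_greenwood(data, t):
    """Return (S_hat(t), Greenwood variance approximation) for right-censored data.

    The variance term is undefined if all lives at risk die at some t_j (n_j = d_j).
    """
    surv, greenwood_sum = 1.0, 0.0
    for tj in sorted({time for time, died in data if died and time <= t}):
        nj = sum(1 for time, _ in data if time >= tj)            # at risk just before t_j
        dj = sum(1 for time, died in data if died and time == tj)
        surv *= 1 - dj / nj
        greenwood_sum += dj / (nj * (nj - dj))
    return surv, surv ** 2 * greenwood_sum

s5, var5 = km_with_greenwood(data, 5)
half_width = 1.96 * math.sqrt(var5)
# Truncate the interval to [0, 1] because S is a probability.
lower, upper = max(0.0, s5 - half_width), min(1.0, s5 + half_width)
print(round(s5, 5), round(var5, 5), (round(lower, 5), upper))
```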
Nelson-Aalen estimator
Another non-parametric approach to estimating $F(t)$ is the Nelson-Aalen estimator (from Nelson, 1971 and Aalen, 1978), which is based on the cumulative hazard (or integrated hazard)
$$\Lambda_t = \int_0^t \mu_s \, ds + \sum_{j: t_j \le t} m_j,$$
where $\mu_s$ is the continuous part of the hazard and the $m_j$ are the discrete hazards at the jump times $t_j$.
Since in our setting $F(t)$ has no continuous increases (only jumps, see the previous sub-section), we focus on the second term only and use the ML estimator of $\lambda_j$ to approximate the $m_j$, so that
$$\hat\Lambda_t = \sum_{j: t_j \le t} \frac{d_j}{n_j}.$$
Finally,
$$\hat S(t) = e^{-\hat\Lambda_t}.$$
Variance of Nelson-Aalen estimator
Let $\hat\Lambda(t)$ denote the estimator. Its variance is approximated as (p. 33 of yellow book):
$$\mathrm{Var}\left[\hat\Lambda(t)\right] \approx \sum_{j: t_j \le t} \frac{d_j (n_j - d_j)}{n_j^3}$$
Example
Consider the following recorded lifetimes for a group of rats:
2, 4+, 5, 8, 12, 12+, 17, 18+, 18+, 18+,
where + denotes a censored observation. Derive 95% confidence intervals for the N-A estimators at $t = 5$.
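One possible way to work through this example is sketched below. It applies the Nelson-Aalen sum and the variance approximation quoted above; truncating the lower limit at 0 and mapping the limits through $e^{-x}$ to get an interval for $S(5)$ are choices made for this illustration, not steps prescribed by the notes.

```python
import math

# Lifetimes of the 10 rats; False marks a censored ("+") observation.
data = [(2, True), (4, False), (5, True), (8, True), (12, True), (12, False),
        (17, True), (18, False), (18, False), (18, False)]

def nelson_aalen(data, t):
    """Return (Lambda_hat(t), approximate variance) for right-censored data."""
    cum_hazard, variance = 0.0, 0.0
    for tj in sorted({time for time, died in data if died and time <= t}):
        nj = sum(1 for time, _ in data if time >= tj)            # at risk just before t_j
        dj = sum(1 for time, died in data if died and time == tj)
        cum_hazard += dj / nj
        variance += dj * (nj - dj) / nj ** 3
    return cum_hazard, variance

lam5, var5 = nelson_aalen(data, 5)
half_width = 1.96 * math.sqrt(var5)
# Assumption: the lower limit is truncated at 0 because a cumulative hazard is non-negative.
ci_lambda = (max(0.0, lam5 - half_width), lam5 + half_width)
# Mapping the end points through exp(-x) gives a corresponding interval for S(5).
ci_survival = (math.exp(-ci_lambda[1]), math.exp(-ci_lambda[0]))
print(lam5, var5, ci_lambda, ci_survival)
```

Here the deaths at times 2 and 5 occur with 10 and 8 lives at risk, so $\hat\Lambda(5) = 1/10 + 1/8$.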
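Since $1 - x \le e^{-x}$, the two estimators are related by
$$\hat S_{KM}(t) = \prod_{j: t_j \le t} \left(1 - \frac{d_j}{n_j}\right) \le \exp\left(-\sum_{j: t_j \le t} \frac{d_j}{n_j}\right) = \exp\left(-\hat\Lambda_t\right) = \hat S_{NA}(t)$$
for $t_1 \le t \le t_{\max}$.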
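The relationship between the two survival estimates can be illustrated with the summary counts of Example 2.1. This is only a numerical illustration under those counts, not a proof.

```python
import math

# Example 2.1 summary: deaths d_j and numbers at risk n_j at t_j = 3, 4, 11.
d = [1, 1, 2]
n = [10, 9, 7]

km, cum_hazard = 1.0, 0.0
for dj, nj in zip(d, n):
    km *= 1 - dj / nj            # Kaplan-Meier product
    cum_hazard += dj / nj        # Nelson-Aalen cumulative hazard
    na = math.exp(-cum_hazard)   # survival estimate implied by Nelson-Aalen
    assert km <= na              # consistent with 1 - x <= exp(-x)
    print(round(km, 5), round(na, 5))
```

The gap between the two widens as $d_j / n_j$ grows, which is why the estimates differ most at late times when few lives remain at risk.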
Introductory Example
Are these two survival functions different?
Based on data representing weeks to death (or censoring) in 51 adults with recurrent gliomas (Example 2.2), where A = astrocytoma and G = glioblastoma, the following survival functions have been constructed.
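To compare the two groups, consider the test statistic
$$Z_1 = \sum_{j=1}^{k} w(t_j) \left[ d_{1j} - n_{1j} \frac{d_j}{n_j} \right],$$
where
$w(t_j)$ is a positive weight function
$t_1 < t_2 < \dots < t_k$ are the distinct death times in the pooled sample
$d_{1j}$ is the number of deaths that occur in Group 1 at time $t_j$
$n_{1j}$ is the number at risk in Group 1 just prior to time $t_j$
$n_j$ is the total number at risk just prior to time $t_j$
$d_j$ is the total number of deaths at time $t_j$
Writing $e_{1j} = n_{1j} d_j / n_j$,
$$Z_1 = \sum_{j=1}^{k} w(t_j) \left[ d_{1j} - n_{1j} \frac{d_j}{n_j} \right] = \sum_{j=1}^{k} w(t_j) \left[ d_{1j} - e_{1j} \right].$$
In this particular case, one can obtain
$$\mathrm{var}(Z_1) = \sum_{j=1}^{k} \left(w(t_j)\right)^2 \frac{n_{1j}}{n_j} \left(1 - \frac{n_{1j}}{n_j}\right) \left(\frac{n_j - d_j}{n_j - 1}\right) d_j.$$
An $\alpha$-level test
Under the null hypothesis, the statistic
$$\chi^2 = \frac{Z_1^2}{\mathrm{var}(Z_1)}$$
approximately follows a chi-squared distribution with 1 degree of freedom. An $\alpha$-level test rejects the null hypothesis if $Z_1^2 / \mathrm{var}(Z_1)$ is larger than the $\alpha$th upper percentage point of the chi-squared distribution with 1 degree of freedom.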
Special cases
Special cases are based on the choice of weight for the test statistic
$$Z_1 = \sum_{j=1}^{k} w(t_j) \left[ d_{1j} - n_{1j} \frac{d_j}{n_j} \right].$$
We have
$w(t_j) = 1$: log-rank test
$w(t_j) = \hat S_{KM}(t_j)$: Peto-Peto/Prentice test
$w(t_j) = n_j$: Wilcoxon test
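The effect of the weight choice can be sketched with one function that takes the weight as an argument. The counts below are the rows of the Example 1 table later in these notes; the function name `weighted_test` and its `weight(n_j, s_km)` signature are assumptions made for this illustration, and the pooled Kaplan-Meier weight follows the formula as written here, which may differ in detail from what a given software package implements.

```python
# Rows (n1j, d1j, nj, dj) at the pooled death times, taken from the Example 1
# table later in these notes.
rows = [(10, 1, 18, 1), (8, 1, 16, 1), (7, 1, 15, 1), (6, 0, 13, 1), (6, 2, 12, 2),
        (4, 0, 9, 1), (4, 1, 8, 1), (2, 0, 6, 1), (1, 1, 3, 1), (0, 0, 1, 1)]

def weighted_test(rows, weight):
    """Return (Z1, var(Z1), chi-square) for a weight function weight(nj, s_km)."""
    z, var, s_km = 0.0, 0.0, 1.0
    for n1, d1, n, d in rows:
        s_km *= 1 - d / n                      # pooled Kaplan-Meier estimate at t_j
        w = weight(n, s_km)
        z += w * (d1 - n1 * d / n)             # w(t_j) * (d_1j - e_1j)
        if n > 1:                              # the variance term vanishes when n_j = 1
            var += w ** 2 * (n1 / n) * (1 - n1 / n) * ((n - d) / (n - 1)) * d
    return z, var, z ** 2 / var

log_rank = weighted_test(rows, lambda n, s: 1.0)       # w(t_j) = 1
wilcoxon = weighted_test(rows, lambda n, s: float(n))  # w(t_j) = n_j
peto = weighted_test(rows, lambda n, s: s)             # w(t_j) = pooled KM estimate
print(log_rank, wilcoxon, peto)
```

Passing the weight as a function keeps the statistic and its variance in one place, so the three tests differ only in a single line.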
Discussion
Other weight functions may be appropriate. The choice of weight function depends on the investigator's desire to give different weights to different types of error.
For instance, when comparing $w = 1$ with $w = n_j$ (log-rank vs Wilcoxon), the latter gives more weight to early times (because $n_1 > n_2 > n_3 > \dots$).
A practical note: the log-rank statistic is more powerful for detecting differences in the hazard rates when the hazard rates are proportional, $h_1(t) = r h_2(t)$, i.e.
$$S_1(t) = [S_2(t)]^r$$
for some constant $r$.
The tests can be generalised to involve more than 2 groups.
Examples
Example 1
Consider an experiment in which we are interested in the effects of a particular drug (2 types).
Survival times for Group 1 (in months): 2, 4*, 5, 6, 9, 9, 12, 12*, 15*, 17
Survival times for Group 2 (in months): 6*, 7, 9*, 10, 13, 15*, 17*, 18
where * denotes a censored observation.
What can you say about the hazard rates for the 2 groups?
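There are $k = 10$ distinct death times in the pooled sample:

t_j   n_1j   d_1j   n_2j   d_2j   n_j   d_j
2     10     1      8      0      18    1
5     8      1      8      0      16    1
6     7      1      8      0      15    1
7     6      0      7      1      13    1
9     6      2      6      0      12    2
10    4      0      5      1      9     1
12    4      1      4      0      8     1
13    2      0      4      1      6     1
17    1      1      2      0      3     1
18    0      0      1      1      1     1
sum          7             4            11

We have
$$\chi^2 = \frac{(Z_1^{LR})^2}{\mathrm{var}(Z_1^{LR})} = \frac{\left[\sum_{j=1}^{k} (d_{1j} - e_{1j})\right]^2}{\sum_{j=1}^{k} \frac{n_{1j}}{n_j}\left(1 - \frac{n_{1j}}{n_j}\right)\left(\frac{n_j - d_j}{n_j - 1}\right) d_j} = 2.42$$
so that
$$p\text{-value} = \Pr\left[\chi^2_1 > 2.42\right] = 0.12 > 0.05$$
and hence we cannot reject the null hypothesis that the hazard rates for the 2 groups are the same.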
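The figures of Example 1 can be rebuilt from the raw survival times with a short sketch. The `(time, died)` encoding and the helper name `counts` are assumptions for illustration; the p-value uses the identity $\Pr[\chi^2_1 > x] = \mathrm{erfc}(\sqrt{x/2})$, which holds for 1 degree of freedom.

```python
import math

# Survival times in months; False marks a censored ("*") observation.
group1 = [(2, True), (4, False), (5, True), (6, True), (9, True), (9, True),
          (12, True), (12, False), (15, False), (17, True)]
group2 = [(6, False), (7, True), (9, False), (10, True), (13, True),
          (15, False), (17, False), (18, True)]

def counts(group, tj):
    """Return (number at risk just before tj, deaths at tj) for one group."""
    at_risk = sum(1 for time, _ in group if time >= tj)
    deaths = sum(1 for time, died in group if died and time == tj)
    return at_risk, deaths

death_times = sorted({time for time, died in group1 + group2 if died})
z, var = 0.0, 0.0
for tj in death_times:
    n1, d1 = counts(group1, tj)
    n2, d2 = counts(group2, tj)
    n, d = n1 + n2, d1 + d2
    z += d1 - n1 * d / n                                   # d_1j - e_1j (log-rank weight 1)
    if n > 1:                                              # variance term vanishes when n_j = 1
        var += (n1 / n) * (1 - n1 / n) * ((n - d) / (n - 1)) * d

chi2 = z ** 2 / var
# For 1 degree of freedom, Pr[chi2_1 > x] = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(len(death_times), round(chi2, 2), round(p_value, 2))
```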
KM survival functions with R
[use of survival package]
Call: survfit(formula = Surv(time, status) ~ treatment)

treatment=0
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    7      7       1    0.857   0.132        0.633            1
   10      5       1    0.686   0.186        0.403            1
   13      4       1    0.514   0.204        0.236            1
   18      1       1    0.000     NaN           NA           NA

treatment=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    2     10       1    0.900  0.0949        0.732        1.000
    5      8       1    0.787  0.1340        0.564        1.000
    6      7       1    0.675  0.1551        0.430        1.000
    9      6       2    0.450  0.1660        0.218        0.927
   12      4       1    0.337  0.1581        0.135        0.845
   17      1       1    0.000     NaN           NA           NA
NA survival functions with R
Call: survfit(formula = Surv(time, status) ~ treatment,
type = "fleming-harrington")

treatment=0
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    7      7       1    0.867   0.134        0.641            1
   10      5       1    0.710   0.193        0.417            1
   13      4       1    0.553   0.219        0.254            1
   18      1       1    0.203     Inf        0.000            1

treatment=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    2     10       1    0.905  0.0954        0.736        1.000
    5      8       1    0.799  0.1359        0.572        1.000
    6      7       1    0.692  0.1590        0.441        1.000
    9      6       2    0.496  0.1830        0.241        1.000
   12      4       1    0.386  0.1810        0.154        0.967
   17      1       1    0.142     Inf        0.000        1.000
[Figure: two panels of estimated survival functions for the two treatment groups, titled "KM survival functions" and "NA survival functions"]
Statistical tests
Peto-Peto:
Call: survdiff(formula = Surv(time, status) ~ treatment, rho = 1)

              N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0   8     2.23     4.11      0.86      2.58
treatment=1  10     5.33     3.45      1.02      2.58

Chisq= 2.6

Log-rank:
Call: survdiff(formula = Surv(time, status) ~ treatment, rho = 0)

              N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0   8        4     6.41     0.903      2.42
treatment=1  10        7     4.59     1.259      2.42

Chisq= 2.4
t_j   d_1j   n_1j   d_2j   n_2j   d_j   n_j   e_1j = n_1j d_j / n_j   n_j (d_1j - e_1j)
2     1      7      0      7      1     14    0.5                     7
4     1      5      0      7      1     12    0.416666667             7
6     2      4      0      7      2     11    0.727272727             14
7     0      2      1      6      1     8     0.25                    -2
10    0      2      1      4      1     6     0.333333333             -2
12    1      2      0      3      1     5     0.4                     3
13    0      0      1      3      1     3     0                       0
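The derived columns of this table can be recomputed from the counts, which also gives the log-rank statistic and, as an extra illustration, a statistic with weights $w(t_j) = n_j$. The variable names are chosen here for illustration, and the $n_j$-weighted chi-square is computed by this sketch only; it is not a figure quoted in the notes.

```python
# Rows (n1j, d1j, nj, dj) at t_j = 2, 4, 6, 7, 10, 12, 13, read from the table.
rows = [(7, 1, 14, 1), (5, 1, 12, 1), (4, 2, 11, 2), (2, 0, 8, 1),
        (2, 0, 6, 1), (2, 1, 5, 1), (0, 0, 3, 1)]

# e_1j = n_1j * d_j / n_j and the column n_j * (d_1j - e_1j).
e1 = [n1 * d / n for n1, d1, n, d in rows]
weighted_terms = [n * (d1 - e) for (n1, d1, n, d), e in zip(rows, e1)]

# Unweighted variance terms; every n_j here exceeds 1, so n_j - 1 is never zero.
variance_terms = [(n1 / n) * (1 - n1 / n) * ((n - d) / (n - 1)) * d
                  for n1, d1, n, d in rows]

# Log-rank statistic: weights w(t_j) = 1.
z_log_rank = sum(d1 - e for (n1, d1, n, d), e in zip(rows, e1))
chi2_log_rank = z_log_rank ** 2 / sum(variance_terms)

# Weights w(t_j) = n_j: the numerator is the sum of the last table column.
z_weighted = sum(weighted_terms)
var_weighted = sum(n ** 2 * v for (n1, d1, n, d), v in zip(rows, variance_terms))
chi2_weighted = z_weighted ** 2 / var_weighted
print([round(x) for x in weighted_terms], round(chi2_log_rank, 2), round(chi2_weighted, 2))
```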
treatment=0
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  ...
   10      4       1    0.625   0.213       0.3200            1
   13      3       1    0.417   0.222       0.1470            1

treatment=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    2      7       1    0.857   0.132       0.6334            1
    4      5       1    0.686   0.186       0.4026            1
    6      4       2    0.343   0.195       0.1124            1
   12      2       1    0.171   0.156       0.0289            1
NA survival functions with R
Call: survfit(formula = Surv(time, status) ~ treatment,
type = "fleming-harrington")

treatment=0
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    7      6       1    0.846   0.155        0.592            1
   10      4       1    0.659   0.225        0.338            1
   13      3       1    0.472   0.251        0.166            1

treatment=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    2      7       1    0.867   0.134       0.6406            1
    4      5       1    0.710   0.193       0.4167            1
    6      4       2    0.430   0.245       0.1411            1
   12      2       1    0.261   0.237       0.0441            1
[Figure: two panels of estimated survival functions for the two treatment groups, titled "KM survival functions" and "NA survival functions"]
Statistical tests
Peto-Peto:
Call: survdiff(formula = Surv(time, status) ~ treatment, rho = 1)

              N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0   7     1.71     3.76      1.12      4.23
treatment=1   7     4.14     2.09      2.02      4.23

Chisq= 4.2

Log-rank:
Call: survdiff(formula = Surv(time, status) ~ treatment, rho = 0)

              N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0   7        3     5.37      1.05      3.61
treatment=1   7        5     2.63      2.14      3.61

Chisq= 3.6