As M2 A4

Contents
1 Introduction
2 Censoring and truncation
3 Notation
4 Discrete hazard function
5 General assumptions
6 Kaplan-Meier (product limit) estimator
7 Variance of the Kaplan-Meier estimator
8 Nelson-Aalen estimator
9 Relationship between the Kaplan-Meier and Nelson-Aalen estimators
10 Comparing survival functions - General approach
11 Examples
Introduction
Given observations (data), the aim is to estimate the distribution of $T$
(remember that $F(t)$, $S(t)$, $f(t)$ and $\mu_t$ all provide the same information about the distribution, so any of them will do).
A simple method to estimate $S(t)$ would be to observe a (very) large number of newborns and take the survival function at each age to be the proportion still alive.
However, this presents a number of problems:
The experiment would take an extremely long time to complete.
Lives under observation may be lost to the investigation, for one reason or another, and excluding them from the analysis might bias the result (censoring).
The result would be useful only if all cohorts had the same mortality (which is not the case).
Parametric vs Non-parametric
Non-parametric approach: no prior assumptions are made about the shape or form of the distribution.
Parametric approach: assume that the distribution belongs to a certain family (e.g. normal or exponential) and use the data to estimate the appropriate parameters.
In this module we focus on non-parametric models.
Censoring and truncation
Censoring
Types of censoring:
Type I Right Censoring
The event (e.g. a failure such as death) is observed only if it occurs before some prescribed time $C_R$ (the right censoring time).
The lifetime $T$ is only known if $T \le C_R$; the observation will be $C_R$ if $T > C_R$.
Examples of right censoring:
the investigation ends before all the lives being observed have died;
life insurance policyholders surrender their policies.
Type II Right Censoring
Observation continues until a predetermined number ($r$) of events (failures) have occurred.
The data then consist of the $r$ smallest lifetimes in a sample of $n$ (order statistics).
Left Censoring
The event of interest (such as death) has already occurred before observation starts, so we only know that the lifetime $T$ is less than the left censoring time $C_L$.
Interval Censoring
The lifetime $T$ is only known to lie within an interval (e.g. actuarial investigations where only the calendar year of death is known).
Truncation
Truncation occurs when only those individuals whose event times lie within a certain observation period $(Y_L, Y_R)$ are observed; otherwise no information is available at all.
Truncation is often confused with censoring: in the presence of censoring, at least partial information is available (we know the event has happened, but have only partial information about it).
Examples:
Any insurance claim that is not reported because the deductible was not reached is left truncated.
Right truncation arises in estimating the distribution of the distances of stars from the Earth: stars too far away are not visible and are right truncated.
Random bounds
Random censoring/truncation: the censoring/truncation point is also subject to randomness.
Examples:
other competing risks can remove the individual from the study (e.g. lapsing of policies in a mortality study of insured lives);
the lifetime is censored by another random event (not failure).
$C_i$, the time at which the $i$th observation is censored, is then a random variable.

Informative and Non-Informative Censoring
If $C_i$ is random:
Censoring is non-informative if it gives no information about the lifetimes.
For random censoring, independence of all the $T$s and $C$s is sufficient for it to be non-informative.
Examples:
Informative censoring: withdrawal of life insurance policies, since those in better health are more likely to withdraw.
Non-informative censoring: the end of the investigation period.
Likelihood for censored and truncated data

Assume that lifetimes and censoring times are independent.
The likelihood for the observation $t$ is:
if it is an exact lifetime, $f(t)$;
if it is a right-censored observation, $S(C_R)$;
if it is a left-censored observation, $1 - S(C_L)$;
if it is an interval-censored observation, $S(C_L) - S(C_R)$;
in the presence of left truncation, $\dfrac{[\text{any of the above}]}{S(Y_L)}$;
in the presence of right truncation, $\dfrac{[\text{any of the above}]}{1 - S(Y_R)}$;
in the presence of interval truncation, $\dfrac{[\text{any of the above}]}{S(Y_L) - S(Y_R)}$.
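To make the list above concrete, here is a minimal Python sketch of the likelihood contributions. It assumes, purely for illustration, exponential lifetimes with rate `lam` (so $S(t) = e^{-\lambda t}$). The helper names `surv`, `dens` and `contribution` are hypothetical and do not come from the notes.

```python
import math

# Illustrative assumption (not from the notes): T ~ Exponential(lam),
# so S(t) = exp(-lam * t) and f(t) = lam * exp(-lam * t).
def surv(t, lam):
    return math.exp(-lam * t)

def dens(t, lam):
    return lam * math.exp(-lam * t)

def contribution(kind, lam, t=None, c_l=None, c_r=None, y_l=None, y_r=None):
    """Likelihood contribution of a single observation.

    kind: "exact" (uses t), "right" (uses c_r), "left" (uses c_l)
          or "interval" (uses c_l and c_r).
    y_l, y_r: optional truncation bounds (None = no truncation on that side).
    """
    if kind == "exact":
        base = dens(t, lam)
    elif kind == "right":
        base = surv(c_r, lam)
    elif kind == "left":
        base = 1.0 - surv(c_l, lam)
    elif kind == "interval":
        base = surv(c_l, lam) - surv(c_r, lam)
    else:
        raise ValueError(f"unknown kind: {kind}")
    # Truncation: divide by Pr[Y_L < T < Y_R] = S(Y_L) - S(Y_R).
    # With only left truncation this is S(Y_L); with only right truncation, 1 - S(Y_R).
    s_lo = surv(y_l, lam) if y_l is not None else 1.0
    s_hi = surv(y_r, lam) if y_r is not None else 0.0
    return base / (s_lo - s_hi)

# A life with rate 0.5 right censored at C_R = 2 contributes S(2) = exp(-1).
print(contribution("right", 0.5, c_r=2.0))
```

A single divisor $S(Y_L) - S(Y_R)$ covers all three truncation cases, because it reduces to $S(Y_L)$ when there is no right bound and to $1 - S(Y_R)$ when there is no left bound.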
Notation
Population of $N$ lives.
Observe $m$ deaths; $N - m$ lives are right censored.
Ordered times of death $t_1 < t_2 < \dots < t_k$, $k \le m$.
$d_j$: the number of deaths occurring at time $t_j$ ($1 \le j \le k$); more than one death can occur at any time, and $d_1 + d_2 + \dots + d_k = m$.
$c_j$: the number of lives that are right censored at a time belonging to $[t_j, t_{j+1})$ ($0 \le j \le k$), with $t_0 = 0$ and $t_{k+1} = \infty$; $c_0 + c_1 + \dots + c_k = N - m$.
The times at which observations are censored within the interval $[t_j, t_{j+1})$ are $t_{j1}, t_{j2}, \dots, t_{jc_j}$ (not necessarily distinct).
Define $n_j$ as the number of lives alive and at risk at time $t_j^-$ (just before time $t_j$). Then $n_j = d_j + c_j + n_{j+1}$.
The largest observed study time is $t_{\max} = \max\{t_k, t_{kc_k}\}$.
Example 2.1
10 rats injected with a new drug are observed over 20 days:
Day   Event
3     Rat 4 dies from effects of drug
4     Rat 3 dies from effects of drug
6     Rat 7 gnaws through bars of cage and escapes
11    Rats 6 and 9 die from effects of drug
17    Rat 1 killed by other rats
20    Investigation closes. Remaining rats hold street party.
Number of lives under investigation: $N = 10$. Number of drug-related deaths: $m = 4$.
Times at which drug-related deaths are observed: $t_1 = 3$, $t_2 = 4$, $t_3 = 11$.
Numbers of drug-related deaths: $d_1 = 1$, $d_2 = 1$, $d_3 = 2$.
Numbers of lives censored: $c_0 = 0$, $c_1 = 0$, $c_2 = 1$, $c_3 = 5$.
Lives censored at times $t_{2,1} = 6$, $t_{3,1} = 17$, $t_{3,2} = \dots = t_{3,5} = 20$.
Numbers of lives at risk: $n_1 = 10$, $n_2 = 9$, $n_3 = 7$.
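The quantities $d_j$, $c_j$ and $n_j$ can be read off mechanically. The following Python sketch reproduces the numbers above. It uses a hypothetical helper `risk_table`, treats the escape, the death from another cause and the survivors at day 20 as right censored, and counts a life censored exactly at $t_j$ as still at risk at $t_j$.

```python
# Rat data from Example 2.1: (time, died_from_drug). Escapes, deaths from
# other causes and survivors at the end of the study are right censored.
rats = [(3, True), (4, True), (6, False), (11, True), (11, True),
        (17, False), (20, False), (20, False), (20, False), (20, False)]

def risk_table(data):
    """Return ([(t_j, d_j, n_j), ...], [c_0, ..., c_k])."""
    death_times = sorted({t for t, died in data if died})
    rows = []
    for tj in death_times:
        d = sum(1 for t, died in data if died and t == tj)
        n = sum(1 for t, _ in data if t >= tj)   # alive and at risk just before t_j
        rows.append((tj, d, n))
    # c_j counts censoring times in [t_j, t_{j+1}), with t_0 = 0 and t_{k+1} = infinity
    bounds = [0] + death_times + [float("inf")]
    censored = [sum(1 for t, died in data
                    if not died and bounds[j] <= t < bounds[j + 1])
                for j in range(len(death_times) + 1)]
    return rows, censored

rows, censored = risk_table(rats)
print(rows)       # [(3, 1, 10), (4, 1, 9), (11, 2, 7)]
print(censored)   # [0, 0, 1, 5]
```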
Discrete hazard function
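Suppose $F(t)$ has positive probability masses at, and only at, the points $t_1 < t_2 < \dots < t_k$.
Define the discrete hazard function as
$$\lambda_j = \Pr[T = t_j \mid T \ge t_j], \qquad 1 \le j \le k.$$
Then
$$S(t) = 1 - F(t) = \prod_{j:\, t_j \le t} (1 - \lambda_j).$$
Proof
Note that since $t_0 = 0$ we have
$$\Pr[T > t_0] = \Pr[T > 0] = 1.$$
Then, since there is no probability mass in $(t_0, t_1)$, we have $\Pr[T \ge t_1] = \Pr[T > t_0]$ and
$$\Pr[T > t_1] = \frac{\Pr[T > t_1]}{\Pr[T > t_0]} = 1 - \frac{\Pr[T = t_1]}{\Pr[T > t_0]} = 1 - \lambda_1.$$
More generally,
$$1 - \lambda_j = 1 - \Pr[T = t_j \mid T \ge t_j] = \frac{\Pr[T > t_j]}{\Pr[T \ge t_j]} = \frac{\Pr[T > t_j]}{\Pr[T > t_{j-1}]},$$
and therefore (by induction, the product telescoping)
$$S(t) = \Pr[T > t] = \prod_{j:\, t_j \le t} \frac{\Pr[T > t_j]}{\Pr[T > t_{j-1}]} = \prod_{j:\, t_j \le t} (1 - \lambda_j).$$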
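The following numerical check of the product formula uses exact fractions and a hypothetical three-point distribution. The masses are made up for illustration. It computes $\lambda_j$ from the masses and compares $1 - F(t)$ with $\prod_{t_j \le t}(1 - \lambda_j)$.

```python
from fractions import Fraction

# Hypothetical distribution: masses at t = 1, 2, 3 (illustrative numbers only).
points = [1, 2, 3]
pmf = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

def hazards(pmf):
    """lambda_j = Pr[T = t_j] / Pr[T >= t_j]."""
    out, at_risk = [], Fraction(1)   # at_risk holds Pr[T >= t_j]
    for p in pmf:
        out.append(p / at_risk)
        at_risk -= p
    return out

def surv_direct(t):
    """S(t) = 1 - F(t), computed from the probability masses."""
    return 1 - sum(p for tj, p in zip(points, pmf) if tj <= t)

def surv_product(t):
    """S(t) as the product over t_j <= t of (1 - lambda_j)."""
    s = Fraction(1)
    for tj, lam in zip(points, hazards(pmf)):
        if tj <= t:
            s *= 1 - lam
    return s

for t in [0, 1, 2, 3]:
    print(t, surv_direct(t), surv_product(t))   # the two columns agree
```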
General assumptions
Assumptions
The estimators introduced in this module (Kaplan-Meier, Nelson-Aalen) are based on the following assumptions:
Non-informative censoring: the time to censoring is independent of the time to death.
Lives are independent: the time to censoring or time to death is determined independently for each life.
If one maximises the likelihood, what will the shape of the resulting distribution be?
Shape of the distribution
For each $j$, the likelihood of the data (the $d_j$ deaths at $t_j$ and the $c_j$ lives censored in $[t_j, t_{j+1})$) is
$$\left[F(t_j) - F(t_j^-)\right]^{d_j} \prod_{l=1}^{c_j} \left[1 - F(t_{jl})\right].$$
This is because:
we have $d_j$ deaths at time $t_j$ for $j = 1, 2, \dots, k$, and their likelihood is $\left[F(t_j) - F(t_j^-)\right]^{d_j}$;
we have censored lives surviving to $t_{jl}$ for $j = 0, 1, \dots, k$ and $l = 1, \dots, c_j$, each with probability $1 - F(t_{jl})$.
Note that we can take the product thanks to the assumption of non-informative censoring.
To maximise the likelihood, note the following:
$F(t_j) > F(t_j^-)$ at each failure, otherwise the likelihood will be zero;
$\prod_{l} \left[1 - F(t_{jl})\right]$ will be maximised if $F(t_{jl})$ is minimised;
but $F$ is non-decreasing, hence we take $F$ to stay as low as possible between consecutive death times, that is $F(t_{jl}) = F(t_j)$ for all $l$.
Therefore, the maximum likelihood estimate of $F(t)$ is a càdlàg (right-continuous with left limits) step function with jumps at the times of the observed failures (deaths).
Kaplan-Meier (product limit) estimator
Introduction
The likelihood of the data is of the form
$$L = \prod_{j=1}^{k} \lambda_j^{d_j} \left(1 - \lambda_j\right)^{n_j - d_j},$$
which is in binomial form and has maximum likelihood estimator
$$\hat{\lambda}_j = \frac{d_j}{n_j}, \qquad 1 \le j \le k.$$
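As a sanity check on $\hat\lambda_j = d_j/n_j$, one can maximise a single binomial factor of $L$ numerically. This grid search is only an illustration. A closed form exists, so this is not how one would compute the estimator in practice. The values $d = 2$ and $n = 7$ are those at $t_3$ in the rats example below.

```python
import math

def binom_loglik(lam, d, n):
    """log of lam^d * (1 - lam)^(n - d): one factor of the likelihood L."""
    return d * math.log(lam) + (n - d) * math.log(1 - lam)

d, n = 2, 7
grid = [i / 10000 for i in range(1, 10000)]   # 0.0001, ..., 0.9999
lam_hat = max(grid, key=lambda lam: binom_loglik(lam, d, n))
print(lam_hat, d / n)   # the grid maximiser sits next to d/n
```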
Kaplan-Meier estimator
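Given the above, the KM estimator follows directly:
$$\hat{F}(t) = 1 - \prod_{j:\, t_j \le t} \left(1 - \hat{\lambda}_j\right)$$
or alternatively
$$\hat{S}(t) = \begin{cases} 1 & \text{if } t < t_1, \\ \prod_{j:\, t_j \le t} \left(1 - \dfrac{d_j}{n_j}\right) & \text{if } t_1 \le t \le t_{\max}. \end{cases}$$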
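The product-limit formula can be sketched in a few lines of Python. The function names are hypothetical. The sketch assumes each observation is a (time, event) pair and counts lives censored at $t_j$ as at risk at $t_j$. It is applied here to the rat data of Example 2.1.

```python
def kaplan_meier(times, events):
    """Return [(t_j, S_hat(t_j))] at each distinct death time.

    times: observed times; events: True for a death, False for a censored life.
    A life censored exactly at t_j is still counted in the risk set n_j.
    """
    data = list(zip(times, events))
    s, out = 1.0, []
    for tj in sorted({t for t, e in data if e}):
        d = sum(1 for t, e in data if e and t == tj)
        n = sum(1 for t, _ in data if t >= tj)
        s *= 1 - d / n
        out.append((tj, s))
    return out

def km_survival(curve, t):
    """Right-continuous step-function value of S_hat(t), for t up to t_max."""
    s = 1.0
    for tj, sj in curve:
        if tj <= t:
            s = sj
    return s

# Rat data (Example 2.1): drug-related deaths are events, all else censored.
times = [3, 4, 6, 11, 11, 17, 20, 20, 20, 20]
events = [True, True, False, True, True, False, False, False, False, False]
curve = kaplan_meier(times, events)
print([(t, round(1 - s, 5)) for t, s in curve])   # [(3, 0.1), (4, 0.2), (11, 0.42857)]
```

The printed $\hat F(t_j)$ values match the final column of the rats example.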
Note that the KM estimator is also called the product-limit estimator.
[Figure: an example of a plot of the K-M estimate of a survival function]
The Kaplan-Meier estimator is well defined for time points less than $t_{\max}$:
$$\hat{S}(t) = \begin{cases} 1 & \text{if } t < t_1, \\ \hat{S}(t_j) & \text{if } t_j \le t < t_{j+1},\ j = 1, \dots, k-1, \\ \hat{S}(t_k) & \text{if } t_k \le t < t_{\max}. \end{cases}$$
For the estimator of the survival function beyond $t_{\max}$:
If $t_{\max}$ corresponds to a death time and there is no censoring at $t_{\max}$, the estimated survival curve is zero beyond $t_{\max}$.
If $t_{\max} = t_{kc_k}$ (the largest observation is censored), the value of $\hat{S}(t)$ for $t > t_{\max}$ is undetermined.
Two extreme views:
If we assume that the survivors at time $t_{\max}$ would have died immediately after $t_{\max}$, then $\hat{S}(t) = 0$ for $t > t_{\max}$.
If we assume that the survivors at time $t_{\max}$ would die at $\infty$, then $\hat{S}(t) = \hat{S}(t_{\max}) = \hat{S}(t_k)$ for $t > t_{\max}$.
Refer to Practical Notes 2 and 3 of K&M (pp. 99-100).
Example (rats)
Calculate the Kaplan-Meier estimate of $F(t)$.
j   t_j   d_j   n_j   λ_j = d_j/n_j   1 − λ_j   1 − ∏_{i=1..j} (1 − λ_i)
1   3     1     10    0.1             0.9        0.1
2   4     1     9     0.11111         0.88889    0.2
3   11    2     7     0.28571         0.71429    0.42857
From the final column, the Kaplan-Meier estimate of $F(t)$ is
$$\hat{F}(t) = \begin{cases} 0 & \text{for } 0 \le t < 3, \\ 0.1 & \text{for } 3 \le t < 4, \\ 0.2 & \text{for } 4 \le t < 11, \\ 0.42857 & \text{for } 11 \le t \le 20. \end{cases}$$
Variance of the Kaplan-Meier estimator
Let $\hat{F}(t)$ denote the estimator. Greenwood's formula states (p. 33 of the yellow book)
$$\widehat{\mathrm{Var}}\left[\hat{F}(t)\right] = \widehat{\mathrm{Var}}\left[\hat{S}(t)\right] \approx \left[1 - \hat{F}(t)\right]^2 \sum_{j:\, t_j \le t} \frac{d_j}{n_j (n_j - d_j)}.$$
Maximum likelihood estimators are asymptotically normally distributed, so we can easily construct confidence intervals, e.g.
$$\hat{S}(t) \pm z_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}\left[\hat{S}(t)\right]},$$
where $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$ percentile of the standard normal distribution.
Example
Consider the following recorded data for the lifetimes of a group of 9 rats:
2, 4+, 5, 8, 12, 12+, 17, 18+, 18+,
where + denotes a censored observation.
Suppose that you have found the K-M estimate $\hat{S}(5) = 0.76190$.
Derive a 95% confidence interval for the K-M estimate $\hat{S}(5)$.
Solution
We have
$$\widehat{\mathrm{Var}}\left[\hat{S}(5)\right] = \hat{S}(5)^2 \left(\frac{1}{9(9-1)} + \frac{1}{7(7-1)}\right) = 0.76190^2 \times 0.037698 = 0.02188,$$
so that the confidence interval is
$$\hat{S}(5) \pm z_{1-0.025} \sqrt{\widehat{\mathrm{Var}}\left[\hat{S}(5)\right]} = 0.76190 \pm 1.96 \sqrt{0.02188} = (0.47196,\ 1.05182),$$
which finally becomes $(0.47196,\ 1)$ as $S \le 1$.
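The Greenwood calculation above can be sketched in Python as follows. The function `km_greenwood` is a hypothetical helper. Clipping the interval to $[0, 1]$ mirrors the last step of the solution.

```python
import math

def km_greenwood(times, events, t, z=1.96):
    """K-M estimate S_hat(t), its Greenwood variance and a normal-approximation CI.

    The CI is clipped to [0, 1] because S(t) is a probability.
    The Greenwood term d / (n (n - d)) is undefined if all lives at risk die (d == n).
    """
    data = list(zip(times, events))
    s, g = 1.0, 0.0
    for tj in sorted({x for x, e in data if e}):
        if tj > t:
            break
        d = sum(1 for x, e in data if e and x == tj)
        n = sum(1 for x, _ in data if x >= tj)
        s *= 1 - d / n
        g += d / (n * (n - d))
    var = s * s * g
    half = z * math.sqrt(var)
    return s, var, (max(0.0, s - half), min(1.0, s + half))

# Data from the example: 2, 4+, 5, 8, 12, 12+, 17, 18+, 18+ (+ = censored)
times = [2, 4, 5, 8, 12, 12, 17, 18, 18]
events = [True, False, True, True, True, False, True, False, False]
s5, var5, ci5 = km_greenwood(times, events, 5)
print(round(s5, 5), round(var5, 5), [round(x, 5) for x in ci5])
```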
Nelson-Aalen estimator
Another non-parametric approach for $F(t)$ is the Nelson-Aalen estimator (from Nelson, 1971 and Aalen, 1978), which is based on the cumulative (or integrated) hazard
$$\Lambda_t = \int_0^t \mu_s \, ds + \sum_{j:\, t_j \le t} m_j.$$
Since in our setting we do not have continuous increases in $F(t)$ (only jumps; see the previous sub-section), we focus on the second term only and use the ML estimator of $\lambda_j$ to approximate the $m_j$s, such that
$$\hat{\Lambda}_t = \sum_{j:\, t_j \le t} \frac{d_j}{n_j}.$$
Finally,
$$\hat{S}(t) = e^{-\hat{\Lambda}_t}.$$
Variance of Nelson-Aalen estimator
Let $\hat{\Lambda}(t)$ denote the estimator. Its variance is approximated as (p. 33 of the yellow book):
$$\widehat{\mathrm{Var}}\left[\hat{\Lambda}(t)\right] \approx \sum_{j:\, t_j \le t} \frac{d_j (n_j - d_j)}{n_j^3}.$$
Example
Consider the following recorded data for the lifetimes of a group of rats:
2, 4+, 5, 8, 12, 12+, 17, 18+, 18+, 18+,
where + denotes a censored observation. Derive a 95% confidence interval for the N-A estimate $\hat{\Lambda}(5)$.
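A possible Python sketch for this exercise follows. It assumes the variance approximation above and a normal-approximation interval. The function `nelson_aalen` is a hypothetical helper. The lower limit is clipped at 0, which is an assumption analogous to clipping the K-M interval at 1, since a cumulative hazard cannot be negative. Here $\hat\Lambda(5) = 1/10 + 1/8 = 0.225$.

```python
import math

def nelson_aalen(times, events, t, z=1.96):
    """Nelson-Aalen cumulative hazard at t, its approximate variance,
    a normal-approximation CI, and the implied survival exp(-Lambda_hat)."""
    data = list(zip(times, events))
    cum_haz, var = 0.0, 0.0
    for tj in sorted({x for x, e in data if e}):
        if tj > t:
            break
        d = sum(1 for x, e in data if e and x == tj)
        n = sum(1 for x, _ in data if x >= tj)
        cum_haz += d / n
        var += d * (n - d) / n ** 3
    half = z * math.sqrt(var)
    ci = (max(0.0, cum_haz - half), cum_haz + half)   # clipped at 0 (assumption)
    return cum_haz, var, ci, math.exp(-cum_haz)

# Data from the example: 2, 4+, 5, 8, 12, 12+, 17, 18+, 18+, 18+ (+ = censored)
times = [2, 4, 5, 8, 12, 12, 17, 18, 18, 18]
events = [True, False, True, True, True, False, True, False, False, False]
lam5, var5, ci5, s5 = nelson_aalen(times, events, 5)
print(lam5, round(var5, 6), ci5, round(s5, 5))
```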
Relationship between the Kaplan-Meier and Nelson-Aalen estimators
Denote:
the Kaplan-Meier estimate of the survival function by $\hat{S}_{KM}(t)$;
the Nelson-Aalen estimate of the survival function by $\hat{S}_{NA}(t)$.
Then
$$\hat{S}_{KM}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right) \approx \exp\left(-\sum_{j:\, t_j \le t} \frac{d_j}{n_j}\right) = \exp\left(-\hat{\Lambda}_t\right) = \hat{S}_{NA}(t)$$
(using $e^{-x} \approx 1 - x$ for small $|x|$).
Kaplan-Meier is more pessimistic
Furthermore,
$$\hat{S}_{KM}(t) < \hat{S}_{NA}(t) \quad \text{for } t_1 \le t \le t_{\max}$$
(remember $\ln(1 - x) < -x$ for $0 < x < 1$).
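To see this ordering numerically, the following Python sketch computes both estimates at each death time of the rat data from Example 2.1. It uses a hypothetical helper `km_and_na`.

```python
import math

# Rat data from Example 2.1 (drug-related deaths are events).
times = [3, 4, 6, 11, 11, 17, 20, 20, 20, 20]
events = [True, True, False, True, True, False, False, False, False, False]

def km_and_na(times, events):
    """Return [(t_j, S_KM(t_j), S_NA(t_j))] at each distinct death time."""
    data = list(zip(times, events))
    s_km, cum_haz, out = 1.0, 0.0, []
    for tj in sorted({t for t, e in data if e}):
        d = sum(1 for t, e in data if e and t == tj)
        n = sum(1 for t, _ in data if t >= tj)
        s_km *= 1 - d / n          # product-limit step
        cum_haz += d / n           # Nelson-Aalen increment
        out.append((tj, s_km, math.exp(-cum_haz)))
    return out

for tj, s_km, s_na in km_and_na(times, events):
    print(tj, round(s_km, 4), round(s_na, 4))   # S_KM is below S_NA at every t_j
```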
Comparing survival functions - General approach
Introductory example
Are these two survival functions different?
Based on data representing the weeks to death (or censoring) of 51 adults with recurrent gliomas (Example 2.2), with A = astrocytoma and G = glioblastoma, the following survival functions have been constructed.
Comparing survival functions

In many applications, one wants to compare two populations, for example:
smokers versus non-smokers;
the effects of different treatments for a disease.
As there is a one-to-one relationship between the survival function ($S$) and the hazard rate ($\mu_s$, here denoted $h(s)$), we can test for differences between hazard rates.
We will test the hypothesis
$$H_0: h_1(t) = h_2(t) \quad \text{for all } t$$
vs
$$H_1: h_1(t) \ne h_2(t) \quad \text{for some } t.$$
To test for a difference between the hazard rates $h_1(t)$ and $h_2(t)$ of two different populations over all times $t$, the general form of the statistic is
$$Z_1 = \sum_{j=1}^{k} \tilde{w}(t_j) \left[\frac{d_{1j}}{n_{1j}} - \frac{d_j}{n_j}\right],$$
where
$\tilde{w}(t_j)$ represents a positive weight function;
$t_1 < t_2 < \dots < t_k$ are the distinct death times in the pooled sample;
$d_{1j}$ is the number of deaths that occur in Group 1 at time $t_j$, and $d_j$ is the total number of deaths at time $t_j$;
$n_{1j}$ is the number at risk just before time $t_j$ in Group 1;
$n_j$ is the total number at risk just before time $t_j$.
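Typically the weights are of the form
$$\tilde{w}(t_j) = w(t_j)\, n_{1j},$$
such that the statistic becomes
$$Z_1 = \sum_{j=1}^{k} w(t_j) \left[d_{1j} - n_{1j} \frac{d_j}{n_j}\right] = \sum_{j=1}^{k} w(t_j) \left[d_{1j} - e_{1j}\right], \qquad e_{1j} = n_{1j} \frac{d_j}{n_j}.$$
In this particular case, one can obtain
$$\mathrm{var}(Z_1) = \sum_{j=1}^{k} \left(w(t_j)\right)^2 \frac{n_{1j}}{n_j} \left(1 - \frac{n_{1j}}{n_j}\right) \left(\frac{n_j - d_j}{n_j - 1}\right) d_j.$$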
An $\alpha$-level test
Under the null hypothesis, the statistic
$$\chi^2 = \frac{Z_1^2}{\mathrm{var}(Z_1)}$$
is approximately a chi-squared random variable with 1 degree of freedom for large samples.
This means we will reject the null hypothesis
$$H_0: h_1(t) = h_2(t) \quad \text{for all } t$$
if $Z_1^2 / \mathrm{var}(Z_1)$ is larger than the $\alpha$ upper percentage point of the chi-squared distribution with 1 degree of freedom.
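Since $\chi^2_1$ is the square of a standard normal variable, its upper tail can be computed with the complementary error function, with no statistics library needed. The following is a minimal sketch of the $\alpha$-level decision. The helper names are hypothetical.

```python
import math

def chi2_1_pvalue(x):
    """Pr[chi^2_1 > x]. If Z ~ N(0, 1) then Z^2 ~ chi^2_1, so
    Pr[Z^2 > x] = Pr[|Z| > sqrt(x)] = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_test(z1, var_z1, alpha=0.05):
    """Return (statistic, p-value, reject H0?) for the alpha-level test."""
    stat = z1 ** 2 / var_z1
    p = chi2_1_pvalue(stat)
    return stat, p, p < alpha

stat, p, reject = chi2_test(27, 160.4)
print(round(stat, 3), round(p, 3), reject)
```

With the Example 2 figures ($Z_1 = 27$, $\mathrm{var}(Z_1) = 160.4$) this gives a statistic of about 4.545 and a p-value of about 0.033. Comparing the p-value with $\alpha$ is equivalent to comparing the statistic with the upper $\alpha$ point (3.841 for $\alpha = 0.05$).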
Special cases
Special cases are based on the choice of weight for the test statistic
$$Z_1 = \sum_{j=1}^{k} w(t_j) \left[d_{1j} - n_{1j} \frac{d_j}{n_j}\right].$$
We have:
$w(t_j) = 1$: log-rank test;
$w(t_j) = \hat{S}_{KM}(t_j)$: Peto-Peto-Prentice test;
$w(t_j) = n_j$: Wilcoxon (Breslow-Gehan) test.
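The following compact Python sketch implements the general weighted statistic with the three weight choices listed above. `weighted_test` is a hypothetical name. The `"peto"` option uses the pooled Kaplan-Meier estimate at $t_j$ exactly as written in these notes, so it may not reproduce R's `survdiff(..., rho = 1)` output exactly. The variance term is taken as zero when $n_j = 1$, since $n_{1j}/n_j$ is then 0 or 1. Applied to Example 1 it gives the log-rank statistic 2.42, and applied to Example 2 it gives the Wilcoxon $Z_1 = 27$ with variance 160.4.

```python
def weighted_test(t1, e1, t2, e2, weight="logrank"):
    """Two-sample test of H0: h1(t) = h2(t) with Z1 = sum_j w(t_j) [d_1j - e_1j].

    weight: "logrank" (w = 1), "wilcoxon" (w = n_j) or "peto"
    (w = pooled K-M estimate at t_j, as defined in these notes).
    Returns (Z1, var(Z1), chi-squared statistic).
    """
    pooled = [(t, e, 1) for t, e in zip(t1, e1)] + [(t, e, 2) for t, e in zip(t2, e2)]
    z1 = var = 0.0
    s_km = 1.0   # pooled K-M estimate, updated at each death time
    for tj in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for t, _, _ in pooled if t >= tj)
        n1 = sum(1 for t, _, g in pooled if t >= tj and g == 1)
        d = sum(1 for t, e, _ in pooled if e and t == tj)
        d1 = sum(1 for t, e, g in pooled if e and t == tj and g == 1)
        s_km *= 1 - d / n
        w = {"logrank": 1.0, "wilcoxon": float(n), "peto": s_km}[weight]
        z1 += w * (d1 - n1 * d / n)
        if n > 1:   # for n_j = 1 the term is zero (n_1j / n_j is 0 or 1)
            var += w ** 2 * (n1 / n) * (1 - n1 / n) * ((n - d) / (n - 1)) * d
    return z1, var, z1 ** 2 / var

# Example 1 data (* = censored, i.e. event False)
g1_t = [2, 4, 5, 6, 9, 9, 12, 12, 15, 17]
g1_e = [True, False, True, True, True, True, True, False, False, True]
g2_t = [6, 7, 9, 10, 13, 15, 17, 18]
g2_e = [False, True, False, True, True, False, False, True]
z1, v, chi2 = weighted_test(g1_t, g1_e, g2_t, g2_e, "logrank")
print(round(chi2, 2))   # 2.42
```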
Discussion
Other weight functions may also be appropriate. The choice of weight function depends on the investigator's desire to give different weights to different types of error.
For instance, when comparing $1$ vs $n_j$ (log-rank vs Wilcoxon), the latter gives more weight to early times (because $n_1 > n_2 > n_3 > \dots$).
A practical note: the log-rank statistic is more powerful for detecting differences in the hazard rates when the hazard rates are proportional ($h_1(t) = r\, h_2(t)$), i.e.
$$S_1(t) = \left[S_2(t)\right]^r$$
for some constant $r$.
The tests can be generalised to involve more than 2 groups.
Examples
Example 1
Consider an experiment in which we are interested in the effects of a particular drug (2 types).
Survival times for Group 1 (in months): 2, 4*, 5, 6, 9, 9, 12, 12*, 15*, 17.
Survival times for Group 2 (in months): 6*, 7, 9*, 10, 13, 15*, 17*, 18,
where * denotes a censored observation.
What can you say about the hazard rates for the 2 groups?
For $k = 10$:
t_j   n_1j   d_1j   n_2j   d_2j   n_j   d_j
2     10     1      8      0      18    1
5     8      1      8      0      16    1
6     7      1      8      0      15    1
7     6      0      7      1      13    1
9     6      2      6      0      12    2
10    4      0      5      1      9     1
12    4      1      4      0      8     1
13    2      0      4      1      6     1
17    1      1      2      0      3     1
18    0      0      1      1      1     1
sum          7             4            11
We have
$$\chi^2 = \frac{\left(Z_1^{LR}\right)^2}{\mathrm{var}\left(Z_1^{LR}\right)} = \frac{\left[\sum_{j=1}^{k} (d_{1j} - e_{1j})\right]^2}{\sum_{j=1}^{k} \frac{n_{1j}}{n_j}\left(1 - \frac{n_{1j}}{n_j}\right)\left(\frac{n_j - d_j}{n_j - 1}\right) d_j} = 2.42$$
(the variance term for $t_j = 18$, where $n_j = 1$, is zero because $n_{1j} = 0$), so that
$$p\text{-value} = \Pr\left[\chi^2_1 > 2.42\right] = 0.12 > 0.05,$$
and hence we cannot reject the null hypothesis that the hazard rates for the 2 groups are the same.
KM survival functions with R
[use of survival package]
Call: survfit(formula = Surv(time, status) ~ treatment)

treatment=0
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   7      7       1    0.857   0.132        0.633            1
  10      5       1    0.686   0.186        0.403            1
  13      4       1    0.514   0.204        0.236            1
  18      1       1    0.000     NaN           NA           NA

treatment=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   2     10       1    0.900  0.0949        0.732        1.000
   5      8       1    0.787  0.1340        0.564        1.000
   6      7       1    0.675  0.1551        0.430        1.000
   9      6       2    0.450  0.1660        0.218        0.927
  12      4       1    0.337  0.1581        0.135        0.845
  17      1       1    0.000     NaN           NA           NA
NA survival functions with R
Call: survfit(formula = Surv(time, status) ~ treatment,
    type = "fleming-harrington")

treatment=0
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   7      7       1    0.867   0.134        0.641            1
  10      5       1    0.710   0.193        0.417            1
  13      4       1    0.553   0.219        0.254            1
  18      1       1    0.203     Inf        0.000            1

treatment=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   2     10       1    0.905  0.0954        0.736        1.000
   5      8       1    0.799  0.1359        0.572        1.000
   6      7       1    0.692  0.1590        0.441        1.000
   9      6       2    0.496  0.1830        0.241        1.000
  12      4       1    0.386  0.1810        0.154        0.967
  17      1       1    0.142     Inf        0.000        1.000
[Figure: KM survival functions and NA survival functions for the two treatment groups]
Statistical tests
Peto-Peto:
Call: survdiff(formula = Surv(time, status) ~ treatment, rho = 1)

             N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0  8     2.23     4.11      0.86      2.58
treatment=1 10     5.33     3.45      1.02      2.58

 Chisq= 2.6  on 1 degrees of freedom, p= 0.108

Log-rank:
             N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0  8        4     6.41     0.903      2.42
treatment=1 10        7     4.59     1.259      2.42

 Chisq= 2.4  on 1 degrees of freedom, p= 0.12

See R code or spreadsheet for Wilcoxon: p-value = 0.111

Example 2
A medical study was performed to investigate the difference (or otherwise) in the effectiveness of 2 alternative treatments (A and B) for a disease. The survival times (in months) are:
Survival times for Group 1 (in months): 2, 3*, 4, 6, 6, 12*, 12
Survival times for Group 2 (in months): 6*, 7, 9*, 10, 13, 15*, 17*
where * denotes a censored observation.
Suppose the variance of the Wilcoxon statistic was calculated as 160.40. Calculate the Wilcoxon statistic and perform the Wilcoxon test.
Solution
j   t_j   d_1j   n_1j   d_2j   n_2j   d_j   n_j   e_1j = n_1j d_j / n_j   n_j (d_1j − e_1j)
1   2     1      7      0      7      1     14    0.5                     7
2   4     1      5      0      7      1     12    0.416666667             7
3   6     2      4      0      7      2     11    0.727272727             14
4   7     0      2      1      6      1     8     0.25                    -2
5   10    0      2      1      4      1     6     0.333333333             -2
6   12    1      2      0      3      1     5     0.4                     3
7   13    0      0      1      3      1     3     0                       0
and hence the Wilcoxon statistic is $7 + 7 + 14 - 2 - 2 + 3 + 0 = 27$.
The null hypothesis is that there is no difference at any point of the survival functions, vs the alternative that they differ at at least one point.
The Wilcoxon chi-squared statistic is $\dfrac{27^2}{160.4} = 4.545$, which is significant at the 5% level (the upper 5% point of $\chi^2_1$ is 3.841).
Hence, on the basis of the Wilcoxon test result, we reject the null hypothesis and conclude that there is evidence of a difference between the survival functions.
KM survival functions with R
[use of survival package]
Call: survfit(formula = Surv(time, status) ~ treatment)

treatment=0
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   7      6       1    0.833   0.152        0.583            1
  10      4       1    0.625   0.213        0.320            1
  13      3       1    0.417   0.222        0.147            1

treatment=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   2      7       1    0.857   0.132       0.6334            1
   4      5       1    0.686   0.186       0.4026            1
   6      4       2    0.343   0.195       0.1124            1
  12      2       1    0.171   0.156       0.0289            1
NA survival functions with R
Call: survfit(formula = Surv(time, status) ~ treatment,
    type = "fleming-harrington")

treatment=0
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   7      6       1    0.846   0.155        0.592            1
  10      4       1    0.659   0.225        0.338            1
  13      3       1    0.472   0.251        0.166            1

treatment=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   2      7       1    0.867   0.134       0.6406            1
   4      5       1    0.710   0.193       0.4167            1
   6      4       2    0.430   0.245       0.1411            1
  12      2       1    0.261   0.237       0.0441            1
[Figure: KM survival functions and NA survival functions for the two treatment groups]
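Statistical tests
Peto-Peto:
             N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0  7     1.71     3.76      1.12      4.23
treatment=1  7     4.14     2.09      2.02      4.23

 Chisq= 4.2  on 1 degrees of freedom, p= 0.0398

Log-rank:
             N Observed Expected (O-E)^2/E (O-E)^2/V
treatment=0  7        3     5.37      1.05      3.61
treatment=1  7        5     2.63      2.14      3.61

 Chisq= 3.6  on 1 degrees of freedom, p= 0.0574

See R code or spreadsheet for Wilcoxon: p-value = 0.033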
