This article was downloaded by: [Indian Council of Medical Res], [ramesh athe]
On: 02 May 2013, At: 03:56
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

The American Statistician
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/utas20

Comparing Treatments Using Quality-Adjusted Survival: The Q-TWiST Method
Richard D. Gelber (a), Bernard F. Cole (b), Shari Gelber (c) & Aron Goldhirsch (d)
a Harvard Medical School, Harvard School of Public Health, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
b Department of Community Health and Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA
c Frontier Science and Technology Research Foundation, Brookline, MA, 02146, USA
d University of Bern, Scientific Director of the International Breast Cancer Study Group, Ospedale Civico, Servizio Oncologico, 6900, Lugano, Switzerland
Published online: 27 Feb 2012.

To cite this article: Richard D. Gelber, Bernard F. Cole, Shari Gelber & Aron Goldhirsch (1995): Comparing Treatments Using Quality-Adjusted Survival: The Q-TWiST Method, The American Statistician, 49:2, 161-169
To link to this article: http://dx.doi.org/10.1080/00031305.1995.10476135

Full terms and conditions of use: http://www.tandfonline.com/page/terms-and-conditions
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date. The accuracy of any instructions, formulae, and drug doses should be independently verified with primary sources.
The publisher shall not be liable for any loss, actions, claims, proceedings, demand, or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or arising out of the use of this material.

Comparing Treatments Using Quality-Adjusted Survival: The Q-TWiST Method

Richard D. GELBER, Bernard F. COLE, Shari GELBER, and Aron GOLDHIRSCH

The quality of life of patients is an important component of evaluation of therapies. We present an overview of a statistical method called Q-TWiST (Quality-Adjusted Time Without Symptoms and Toxicity) which incorporates quality-of-life considerations into treatment comparisons. Multivariate censored survival data are used to partition the overall survival time into periods of time spent in a set of progressive clinical health states which may differ in quality of life. Mean health state durations, restricted to the follow-up limits of the clinical trial, are derived from the data and combined with value weights to estimate quality-adjusted survival. The methodology emphasizes treatment comparisons based on threshold utility analyses that highlight trade-offs between different health state durations; it is not intended to provide a unique result combining quality and quantity of life. We also describe three recent extensions of the methodology: covariates can be included using proportional hazards and accelerated failure time regression models, restricted estimates can be projected beyond follow-up limits using parametric models, and meta-analyses can be performed incorporating quality-of-life dimensions. The basic methods are demonstrated in an analysis of data from a clinical trial comparing long versus short duration adjuvant chemotherapy regimens for the treatment of breast cancer. The clinical health states are defined by the following three outcomes: (1) end of treatment toxicity, (2) disease recurrence, and (3) death.
The results allow one to evaluate the trade-off between the increased toxic effects and the increased recurrence-free interval associated with the long duration treatment.

KEY WORDS: Clinical trials; Quality of life; Restricted means; Survival analysis; Utility.

Richard D. Gelber is Professor of Pediatrics (Biostatistics), Harvard Medical School, Harvard School of Public Health, and Dana-Farber Cancer Institute, Boston, MA 02115. Bernard F. Cole is Assistant Professor, Department of Community Health and Division of Applied Mathematics, Brown University, Providence, RI 02912. Shari Gelber is Biostatistician, Frontier Science and Technology Research Foundation, Brookline, MA 02146. Aron Goldhirsch is Professor of Oncology, University of Bern, Scientific Director of the International Breast Cancer Study Group, Ospedale Civico, Servizio Oncologico, 6900 Lugano, Switzerland. Support for the clinical trial was provided by the Swiss Cancer League, the Cancer League of Ticino, the Ludwig Institute for Cancer Research, the Swedish Cancer Society, the Frontier Science and Technology Research Foundation, and the Swiss Group for Clinical and Epidemiological Cancer Research. Support for the methodological development was provided by Grant PBR-53 from the American Cancer Society and Grant CA-06516 from the National Cancer Institute. The authors thank the patients, physicians, nurses, and data managers of the International Breast Cancer Study Group who contributed to the clinical trial described in this article. This paper was presented at the 1993 Spring Meetings of the Biometric Society (Eastern North America Region), Institute of Mathematical Statistics, and the American Statistical Association, Philadelphia, PA, March 21-24, 1993. Portions of this article from an earlier paper by the authors are reprinted with permission from Cancer Treatment Reviews (Gelber, Goldhirsch, and Cole 1993a).

1. INTRODUCTION

The evaluation of treatments in terms of quality of life is becoming increasingly important in clinical research (Schumacher, Olschewski, and Schulgen 1991; Cox et al. 1992). In particular, there is a need to develop methods for comparing the palliative effects of treatment options within randomized clinical trials. Such methods are especially useful in situations where a new treatment is not shown to significantly prolong life, but may have an advantage in improving or maintaining the quality of life of the patient. For example, an experimental treatment may significantly increase time to disease progression or recurrence as compared with a standard treatment, but have only a modest effect on overall survival. Thus the experimental treatment represents an improvement in quality of life. On the other hand, the treatment may have adverse side effects that diminish quality of life. In this case there is a trade-off between improved response and treatment toxicity. For an individual patient the treatment selection depends not only on the magnitude of these trade-offs, but also on his or her preferences concerning the trade-offs. The purpose of this article is to present an overview of a statistical method called Q-TWiST that can be used to make treatment comparisons in terms of both quality and quantity of life, while incorporating individual patient preferences.

First attempts at assessing the impact of treatments on quality of life were made by identifying and grading the side effects of treatments. Subsequent efforts have been made to measure patients' perceptions of the influence of such side effects and perceptions of symptoms of disease (Priestman and Baum 1976). This has led to the development of several instruments for assessing quality of life, which have been reviewed for their attributes and value for eliciting patient perceptions (see Maguire and Selby (1989); Donovan, Sanson-Fisher, and Redman (1989); and Moinpour et al.
(1989) for examples). Further efforts focused on the integration of both quality and quantity of life into a single end point that may be used to make treatment comparisons. This led to the development of the Q-TWiST method. Q-TWiST stands for Quality-Adjusted Time Without Symptoms of disease and Toxicity of treatment, and was originally designed to incorporate aspects of quality of life into adjuvant chemotherapy and endocrine therapy comparisons for the treatment of breast cancer. The methodology is an extension of the TWiST method of Gelber and Goldhirsch (1986), which makes treatment comparisons in terms of survival time without symptoms of disease and toxicity of treatment (i.e., the survival time that remains after subtracting periods of time with symptoms or toxicity from the overall survival time). The Q-TWiST method, which was first proposed by Goldhirsch, Gelber, Simes, Glasziou, and Coates (1989), allows for a portion of the time spent with symptoms or toxicity to be included in the comparison.

© 1995 American Statistical Association. The American Statistician, May 1995, Vol. 49, No. 2, 161

This
is accomplished by placing value-weights on these periods according to their quality of life. The methodology has been successfully applied in a number of analyses of clinical trials. Gelber, Goldhirsch, and Cavalli (1991) and Gelber, Goldhirsch, Hurny, Bernhard, and Simes (1992a) present analyses of adjuvant therapies for operable breast cancer, and Gelber et al. (1992b) and Lenderking et al. (1994) present analyses of zidovudine therapy for HIV infection.

In Section 2 we present a review of the Q-TWiST methodology. Section 3 describes recent extensions which (1) allow covariates to be included by proportional hazards and accelerated failure time regression models, (2) use parametric models to extrapolate beyond the follow-up limits of the available data, and (3) provide a means for performing meta-analyses. In Section 4 the basic procedures are illustrated in an analysis of a clinical trial comparing long versus short duration chemotherapy for the treatment of node-positive breast cancer. In this example there is a trade-off between the toxic side effects of the treatment and delayed disease recurrence. Practical issues related to performing a Q-TWiST analysis are discussed in Section 5.

2. A REVIEW OF THE Q-TWiST METHOD

The Q-TWiST method makes treatment comparisons in terms of quality and quantity of life by penalizing treatments which have negative quality-of-life effects and rewarding those which increase survival and have other positive quality-of-life effects. As in an ordinary survival analysis, the focus of the method is on time, but rather than look at a single end point such as overall survival or disease-free survival, multiple outcomes corresponding to changes in quality of life are considered. Periods of time with the negative side effects of treatment are weighted according to the severity of the side effects. A weight of zero indicates the period of time is as bad as death, and a weight of unity indicates perfect health.
Weights between zero and unity indicate degrees between these extremes. These weights are called utility coefficients. A composite measure of quality and quantity of life (i.e., quality-adjusted survival) is obtained by summing the weighted periods of time. This utility model makes two main assumptions: (1) the quality-adjusted time spent in a health state is directly proportional to the actual time spent in the health state, where the proportionality is given by the utility coefficient, and (2) the utility coefficient for a health state is independent of the time the health state is entered, as well as past and future quality of life.

A general, technical description of the Q-TWiST method is given by Glasziou, Simes, and Gelber (1990). The three steps involved in applying the method are reviewed briefly below. Where appropriate, we have illustrated the concepts by referencing the figures for the specific example described in Section 4.

2.1 Step 1: Define Quality-of-Life Outcomes

The first step is to define quality-of-life oriented survival outcomes that are relevant for the disease setting under study. These should highlight specific treatment differences in terms of time and quality of life. These survival outcomes are then used to define a series of progressive clinical health states which may differ in terms of quality of life. These states are progressive because a patient must proceed through them in order; however, any of the states may be skipped, for example, due to early death. In the case of adjuvant chemotherapy for resectable breast cancer, the survival outcomes are defined as follows: the time with toxicity (TOX), represented by the period in which the patient is exposed to subjective side effects of therapy; disease-free survival (DFS), the time until disease recurrence or death, whichever occurs first; and overall survival (OS), the time to death from any cause.
The resulting progressive clinical health states are: time spent with treatment toxicity (TOX); time without either symptoms of the disease or toxicity of treatment (TWiST = DFS - TOX); and time following the diagnosis of systemic spread of the disease or relapse (REL = OS - DFS). The definitions of TOX and REL reflect the fact that these periods of time have a negative impact on the overall quality of life of the patient. Furthermore, their definitions are designed to emphasize the contrasting properties of the different treatments under study.

The defined survival outcomes (e.g., TOX, DFS, and OS) indicate transitions between the progressive states of health (e.g., TOX, TWiST, and REL). The transition times may be subject to right censoring due to follow-up loss or patients surviving beyond the follow-up interval. As in standard survival analysis, this is acceptable if the censoring mechanism does not provide information about the failure mechanism (i.e., is noninformative). When a transition time is censored, all subsequent transition times in the progressive health state model are similarly censored.

Each clinical health state is assigned a utility coefficient, which may be unknown. In our example the utility coefficient for TWiST is assumed to be unity because it characterizes a period of relatively perfect health. On the other hand, the periods TOX and REL are associated with diminished quality of life, but the exact values for their utility coefficients are unknown. Therefore, we let u_TOX and u_REL denote the respective utility coefficients. These express the value of time in TOX and REL relative to TWiST. Figure 1 displays the different time periods in this example according to assumed utility coefficients of 1.0 for TWiST and .5 for both TOX and REL. This represents a scenario in which one month spent in TOX or REL is equivalent in value to one-half month spent with the better quality of life that characterizes TWiST.
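As a hedged illustration of the health state definitions above, the following Python sketch derives TOX, TWiST, and REL for one fully observed patient and applies the Figure 1 scenario (utility 1.0 for TWiST and .5 for TOX and REL). The `Transition` record, the function names, and the example times are hypothetical and do not come from the original article; the censoring helper is only one simple reading of the rule that a censored transition censors all later ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transition:
    """One transition time in months from randomization (illustrative record)."""
    name: str
    time: float
    observed: bool  # False when the time is right-censored


def propagate_censoring(transitions: List[Transition]) -> List[Transition]:
    """Mark every transition after the first censored one as censored too."""
    censored = False
    result = []
    for tr in transitions:  # assumed to be ordered TOX, DFS, OS
        censored = censored or not tr.observed
        result.append(Transition(tr.name, tr.time, observed=not censored))
    return result


def state_durations(tox: float, dfs: float, os_: float) -> dict:
    """Durations of the progressive states for a fully observed patient."""
    if not 0 <= tox <= dfs <= os_:
        raise ValueError("transition times must satisfy 0 <= TOX <= DFS <= OS")
    return {"TOX": tox, "TWiST": dfs - tox, "REL": os_ - dfs}


def quality_adjusted_time(durations: dict, u_tox: float, u_rel: float) -> float:
    """Weight TOX and REL by their utility coefficients; TWiST has weight 1."""
    return u_tox * durations["TOX"] + durations["TWiST"] + u_rel * durations["REL"]


# Hypothetical patient: toxicity ends at 6 months, relapse at 30, death at 42.
example = state_durations(tox=6.0, dfs=30.0, os_=42.0)
example_qat = quality_adjusted_time(example, u_tox=0.5, u_rel=0.5)
```

For this hypothetical patient the 6 months of TOX and 12 months of REL each count at half value, so 42 months of survival correspond to 33 quality-adjusted months under the assumed weights.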
[Figure 1. Components of Quality-Adjusted Time Without Symptoms and Toxicity (Q-TWiST). Illustrates the division of overall survival into TOX (subjective toxic effects), TWiST, and REL (relapse), and the weighting of these time periods using utility coefficients u_TOX and u_REL. Vertical axis: utility; horizontal axis: time, with the periods TOX, TWiST, and REL ending at death.]
The Q-TWiST outcome is calculated as the weighted sum of the clinical health state durations, with the utility coefficients as weights. For the breast cancer example,

$$\text{Q-TWiST} = u_{\text{TOX}} \times \text{TOX} + \text{TWiST} + u_{\text{REL}} \times \text{REL}. \tag{1}$$

2.2 Step 2: Partition Overall Survival

The second step is to consider each treatment separately and to partition the overall survival time into the defined clinical health states. This is done using the Kaplan-Meier product limit method (Kaplan and Meier 1958) to graph the transitional survival curves (e.g., the survival curves for TOX, DFS, and OS). The areas between the curves are estimates of the mean health state durations. For example, the area beneath the TOX curve is an estimate of the mean duration of TOX, the area between the DFS and TOX survival curves is an estimate of the mean duration of TWiST, and the area between the OS and DFS survival curves is an estimate of the mean duration of REL. These estimates have a multivariate normal limiting distribution (Glasziou et al. 1990; Breslow and Crowley 1974), and do not suffer from bias due to the induced dependency between the censoring mechanism and health state duration distributions (Gelber, Gelman, and Goldhirsch 1989).

In practice, censoring often precludes one from estimating the entire survival curve. In this case the average health state durations (i.e., the areas between the survival curves) are calculated within the follow-up interval of the study cohort. The resulting estimates are called restricted means (Kaplan and Meier 1958). Covariation among these restricted means can be estimated using a resampling procedure such as the bootstrap method (Glasziou et al. 1990).

As a useful visual display, the transitional survival curves corresponding to the multiple outcomes for one treatment can be plotted on the same graph. Separate graphs can be produced for each treatment group. These are called partitioned survival plots (see Figure 3, Section 4).
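The partition in Step 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the function names and the four-patient data set are hypothetical, the product limit estimator is written out by hand rather than taken from a library, and no standard errors (bootstrap or otherwise) are computed.

```python
from typing import Dict, List, Sequence, Tuple

Curve = List[Tuple[float, float]]
Outcome = Tuple[Sequence[float], Sequence[int]]  # (times, event indicators)


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Product limit estimate as (event time, survival just after it) pairs.

    events[i] is 1 for an observed transition and 0 for a right-censored time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def restricted_mean(curve: Curve, limit: float) -> float:
    """Area under the step survival curve from 0 to the restriction time."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= limit:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (limit - prev_t)


def partition(tox: Outcome, dfs: Outcome, os_: Outcome, limit: float) -> Dict[str, float]:
    """Restricted mean durations of TOX, TWiST, and REL as areas between curves."""
    mean_tox = restricted_mean(kaplan_meier(*tox), limit)
    mean_dfs = restricted_mean(kaplan_meier(*dfs), limit)
    mean_os = restricted_mean(kaplan_meier(*os_), limit)
    return {"TOX": mean_tox, "TWiST": mean_dfs - mean_tox, "REL": mean_os - mean_dfs}


def q_twist(means: Dict[str, float], u_tox: float, u_rel: float) -> float:
    """Equation (1) applied to the restricted mean durations."""
    return u_tox * means["TOX"] + means["TWiST"] + u_rel * means["REL"]


# Hypothetical data for four patients (months); 0 marks a censored time.
tox_data = ([1.0, 2.0, 3.0, 3.0], [1, 1, 1, 1])
dfs_data = ([4.0, 6.0, 8.0, 10.0], [1, 1, 1, 0])
os_data = ([6.0, 9.0, 10.0, 10.0], [1, 1, 0, 0])
means = partition(tox_data, dfs_data, os_data, limit=10.0)
print(means, q_twist(means, u_tox=0.5, u_rel=0.5))
```

Running the same functions on each treatment group separately and subtracting the two Q-TWiST values gives the kind of treatment effect estimate discussed in Step 3; a bootstrap over patients would be needed for standard errors.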
2.3 Step 3: Compare the Treatments

The third step is to compare the treatment regimens in terms of quality-adjusted survival (Q-TWiST). This composite measure is obtained by the linear combination of the estimated restricted mean health state durations calculated in Step 2 and the utility coefficients. For example, estimates of TOX, TWiST, and REL are substituted into equation (1). This is done separately for each treatment group, and the treatment effects are estimated by computing the differences in Q-TWiST (e.g., treatment group minus control group Q-TWiST) for specific values of the utility coefficients. Standard error estimates can be obtained using the bootstrap method. Statistical inferences on the treatment effects can be conducted using the large sample theory for restricted means estimated from the Kaplan-Meier survival curves.

The influence of patient preferences on treatment choice can be examined by a sensitivity analysis, called a threshold utility analysis, which displays the treatment comparison for varying values of the utility coefficients (Glasziou et al. 1990). When two treatments are being compared and there are two utility coefficients, the sensitivity analysis can be presented as a two-dimensional plot with a straight line, called a threshold line, indicating pairs of utility coefficients for which the two treatments have equal Q-TWiST (see Figure 5, Section 4). The threshold line is obtained by setting the treatment effect equal to zero and solving for the unknown utility coefficients, producing a linear equation. A confidence region for the threshold line can also be obtained by finding the pairs of utility coefficient values for which the confidence interval for the treatment effect captures zero. The plot shows which treatment is preferred in terms of Q-TWiST for each pair of coefficient values.

It is also possible to investigate how the Q-TWiST treatment effect unfolds over the course of follow-up.
This is accomplished by performing the analysis at an evenly spaced sequence of times (restriction times) leading up to the follow-up limit. For example, if there are ten years of follow-up, then the analysis could be restricted to yearly intervals beginning at zero and ending at ten. The results can be plotted on a time axis for particular values of the utility coefficients or as a region indicating the range of the treatment effect as the utility coefficients vary between zero and one. This is called the Q-TWiST gain function (see Figure 4, Section 4).

3. RECENT EXTENSIONS

3.1 Regression Models

Covariates and prognostic factors can be easily incorporated into a Q-TWiST analysis with standard regression methods for survival analysis, allowing the inclusion of continuous covariates as well as discrete stratifying variables. In most cases the entire sample of patients can be used to estimate one model for each survival outcome, avoiding the problem of decreased sample sizes due to stratification. This has been done with proportional hazards models (Cole, Gelber, and Goldhirsch 1993) and accelerated failure time models (Cole, Gelber, and Anderson 1994).

Proportional hazards regression can be used instead of the product limit method in Step 2 of the Q-TWiST methodology to estimate survival curves for the health state transitions according to various predetermined patient profiles. Specifically, a proportional hazards model is fit to each of the progressive survival outcomes, and the resulting estimates are used to predict survival curves for various covariate values. Threshold utility analyses, based on the predicted survival curves, are performed for each of the patient profiles, allowing one to evaluate treatment effectiveness under a variety of prognostic situations. If the proportional hazards assumption is not appropriate for a particular covariate, then a stratified analysis can be used.
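As a brief sketch of how a fitted proportional hazards model yields the profile-specific curves used in place of the product limit estimate, write S_0(t) for the estimated baseline survival function of one outcome, β for the fitted coefficient vector, x for a patient profile, and L for the restriction time. This notation is assumed here for illustration and is not taken from the original article; the relationship itself is the standard one for proportional hazards models.

```latex
S(t \mid x) = S_0(t)^{\exp(\beta^{\top} x)},
\qquad
\mu_L(x) = \int_0^L S(t \mid x)\, dt .
```

Evaluating the restricted mean for the TOX, DFS, and OS models at the same profile x and differencing, as in the unadjusted partition, would give profile-specific estimates of the mean durations of TOX, TWiST, and REL.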
Accelerated failure time regression can be used in a similar fashion, or it may be used in a more complicated approach that involves the conditional modeling of health state transitions given previous transitions and health state durations (Cole et al. 1994). This represents a more direct modeling of the health state transitions as a semi-Markov stochastic process. The intensity function for each transition is assumed to have a certain functional form (e.g., Weibull, log-normal, etc.), and is conditional
on the previous health state transitions. Covariates are included by assuming that the location parameter for each model is a linear combination of covariate values and unknown parameters. The parameters are estimated by maximum likelihood. The expected health state durations and quality-adjusted survival are approximated by simulating data from the estimated regression models, and inference is conducted using the bootstrap method or the delta method. This procedure also allows for models that do not involve progressive health states; however, a sufficient number of observations making each type of transition is required.

3.2 Extrapolation of Survival Curves

The procedures described thus far use restricted survival means to produce a composite measure of quality and quantity of life. The estimated mean treatment effect in terms of quality-adjusted survival is restricted to the follow-up limits of the data, and therefore does not address the possible long-term treatment effect. In situations where there are a sufficient number of events, long-term effects may be extrapolated from the available data to provide an indication of what may occur in the future.

The extrapolation methodology is introduced by Gelber, Goldhirsch, and Cole (1993b), and consists of using parametric models to model the tail of a survival curve and project the product limit estimate beyond the follow-up interval. Cut points are used to define where the tail begins, and where observations may contribute to the likelihood for estimating the model. Probability plots are used to determine appropriate parametric models and values for the cut points. The procedure is especially useful when it is difficult to fit a parametric model to the entire survival curve, but one is easily fit to the tail portion.
For example, early failures in clinical trials may be influenced by the healthy entrant phenomenon that suggests that patients entering a clinical trial, being initially healthy enough to undergo the treatment, are at decreased risk for disease recurrence and death soon after enrollment. Such a phenomenon may be difficult to model and is not central to the extrapolation problem. In this case a composite estimator based on the product limit method and a parametric model is convenient, useful, and appropriate.

To produce extrapolated estimates of quality-adjusted survival, the extrapolation methodology is applied to the survival curves corresponding to the progressive health state transitions. Mean health state durations may then be estimated using the projected survival curves, allowing estimates to be restricted to some limit greater than the follow-up interval. The bootstrap method may then be used for statistical inference.

For the extrapolation methodology to be successful, it is necessary to have a sufficient follow-up period and a sufficient number of events for evaluating the fit of the model; otherwise, the projected estimates could be misleading. Although it is not possible to fully evaluate the accuracy of statistical inferences based on projected estimates without continued follow-up, a reasonable range of projections based on careful modeling of a large data set can provide estimates which supplement the more traditional measures such as relative risk reduction, and represent a more complete use of the clinical trial data.

3.3 Meta-Analysis

Cole, Gelber, and Goldhirsch (1994) present an extension of the Q-TWiST method to perform meta-analysis. The method was applied to evaluate results from eight clinical trials comparing chemotherapy versus control in the treatment of breast cancer in women under 50 years of age. The median follow-up intervals for these trials range from a minimum of three years to a maximum of ten years.
The meta-analysis procedure uses multivariate multiple regression models (one for the treatment group and one for the control group) to combine individual trial analyses in a manner that accounts for varying follow-up intervals between the trials, and provides a summary in terms of quality-adjusted survival. The method consists of four steps and is a modification of the three-step standard Q-TWiST procedure outlined in Section 2. The first step in the analysis is exactly the same as Step 1 in Section 2, and the remaining steps are as follows:

Step 2: Restricted mean health state transition times are estimated separately for the treatment group and the control group for each of the clinical trials under consideration. The restriction time is the follow-up limit, which may differ for each trial. Each trial also contributes an estimated covariance matrix corresponding to the mean health state transition times.

Step 3: A multivariate multiple regression model is fit to the resulting estimates separately for the treatment group and the control group. The dependent variables are the restricted means for the progressive survival outcomes, and the independent variable is the follow-up limit. Powers of the follow-up limit may also be included as independent variables. Each trial contributes one multivariate data point to the estimation for the two models. Regression parameters are estimated by generalized least squares in order to accommodate the covariance estimates for the health state durations among the clinical trials.

Step 4: The regression models estimated in Step 3 are used to predict the health state durations for a particular follow-up limit, which is generally some number smaller than the largest follow-up interval observed for the trials under consideration. This is done for each of the treatment groups. For example, the regression models could be used to predict the mean durations of TOX, TWiST, and REL restricted to ten years.
These estimates are then used to predict mean quality-adjusted survival. The resulting estimates take into account the data from all trials under consideration. Statistical inference is carried out using the covariance matrix of the regression parameter estimates.

This procedure assumes that the estimated mean health state durations for each trial are normally distributed, which is appropriate if the large sample properties of the Kaplan-Meier product limit estimator apply.

4. EXAMPLE

To illustrate the evaluation of treatment effectiveness using Q-TWiST, we applied the standard methodology as described in Section 2 to a randomized clinical trial of adjuvant chemotherapy for resectable breast cancer. Trial V
of the International Breast Cancer Study Group (IBCSG) investigated, in patients with node-positive breast cancer, the effectiveness of short duration (one month) perioperative systemic treatment compared with long duration adjuvant therapy (six or seven months) [see Ludwig Breast Cancer Study Group (1988); Gelber et al. (1992a)]. The short duration therapy consisted of perioperative chemotherapy given on days 1 and 8 after surgery. The long duration treatment regimen consisted of chemotherapy for six months either following the perioperative course or initiated three to five weeks after surgery (without the perioperative course). A total of 1,229 patients were randomized to the two treatments. Four hundred thirteen patients were randomized to the short duration treatment, and 816 patients were randomized to the long duration treatment. The median follow-up for this analysis was seven years.

[Figure 2. Disease-Free Survival (a) and Overall Survival (b) Comparing Long Duration Chemotherapy (Solid Line) Versus Short Duration Chemotherapy (Dashed Line) for 1,229 Patients with Node-Positive Breast Cancer in International Breast Cancer Study Group (IBCSG) Trial V at Seven Years of Median Follow-Up. Horizontal axis in both panels: months from randomization.]

Figure 2 shows the DFS and OS comparisons of the long duration group versus the short duration group. Table 1 gives the seven-year percentages for DFS and OS according to treatment group.

4.1 Partitioning Overall Survival

Figure 3 shows the partitioned survival plots according to treatment group. The areas between the curves give the average amount of time spent in TOX, TWiST, and REL as indicated. The larger area of TOX and the smaller area of REL are characteristics of the long duration treatment in terms of time with reduced quality of life.
Table 1. Seven-Year Disease-Free Survival (DFS) and Overall Survival (OS) Percentages According to Treatment for 1,229 Patients With Node-Positive Breast Cancer in International Breast Cancer Study Group Trial V

Chemotherapy treatment           7-year DFS % (S.E.)   7-year OS % (S.E.)
Long duration                    51 (1.8)              63 (1.8)
Short duration                   33 (2.5)              50 (2.7)
Log-rank test 2-sided P-value    <.0001                .0002

[Figure 3. Partitioned Survival Plots. Partitioned survival for the long duration treatment (a) and for the short duration treatment (b) for IBCSG Trial V at seven years of median follow-up. In each graph the area under the overall survival curve (OS) is partitioned by the survival curves for disease-free survival (DFS) and time with treatment toxicity (TOX). The areas between the survival curves give the average months spent in TOX, TWiST, and REL as indicated. Horizontal axis in both panels: months from randomization.]

Table 2 gives the average amounts of time in TOX, TWiST, and REL up to seven years from randomization derived from the partitioned survival plots. The two right-hand columns of the table refer to the treatment differences (long duration minus short duration) for the average amount of time patients spend in the various states. The Q-TWiST calculation was made as an example attributing the utility coefficients of .5 to both TOX and REL. These values were arbitrarily selected to illustrate the method, and do not represent specific values actually derived from individual patient preferences. Within seven years, the amount of Q-TWiST gained by the long duration treatment compared with the short duration treatment was five months, an amount of time gained even after quality-of-life adjustments for toxic effects and disease relapse. The 95% confidence interval
for the gain in Q-TWiST, with utility coefficients equal to .5 for both TOX and REL, was between three and eight months, suggesting an advantage for the long duration regimen.

Table 2. Average Months of Time According to Quality of Life End Point for 1,229 Patients in International Breast Cancer Study Group Trial V

                                 Chemotherapy treatment
End point                        Long duration   Short duration   Difference   95% C.I.
TOX                              6               1                5            4.9 to 5.1
TWiST                            54              47               6            3 to 10
REL                              9               16               -7           -9 to -5
Q-TWiST (uTOX = uREL = 0.5)      61              56               5            3 to 8
OS                               69              64               5            2 to 8
DFS                              59              48               11           8 to 15

4.2 Q-TWiST Gain Function

By restricting the Q-TWiST analysis to yearly intervals leading up to the seven year analysis, we see how Q-TWiST gains for the long duration treatment are accumulated over time. This is described by the Q-TWiST gain function shown in Figure 4. The solid line within the shaded region reflects the result for utility coefficients of .5 for both TOX and REL. Early in the course of the follow-up, the toxic effects of the long duration treatment result in a loss in Q-TWiST compared with the short duration treatment. This is because the advantages of the long duration treatment (i.e., increased DFS and OS) do not appear until later on in time. As the benefits are realized with follow-up, the Q-TWiST gain function begins to increase, and will continue to increase provided the DFS curves for the two treatments remain separated. The shaded region in Figure 4 illustrates the range of results for the Q-TWiST gain function as the coefficient values for TOX and REL range between 0 and 1. The lower edge of the shaded region corresponds to utility coefficient values of uTOX = 0 and uREL = 1, while the upper edge corresponds to uTOX = 1 and uREL = 0.

[Figure 4 plot not reproduced; horizontal axis: years from randomization, 0 to 7.]

Figure 4. Q-TWiST Gain Function. The solid dark curve gives the average months of Q-TWiST (for uTOX = uREL = .5) gained for the long duration treatment compared with the short duration treatment in IBCSG Trial V as a function of years from randomization.
The shaded region surrounding the solid curve shows the ranges for the Q-TWiST gain function as the utility coefficients vary between 0 and 1.

4.3 Threshold Utility Analysis

Clearly, the results of a Q-TWiST analysis depend on the values of the utility coefficients. A threshold utility analysis illustrates the treatment comparison results for all combinations of utility coefficient values, allowing the interpretation of clinical trial results based on individual patient preference. Figure 5 shows the threshold utility analysis for the IBCSG Trial V data at seven years. The solid threshold line in the lower right corner of the graph indicates values of uTOX and uREL for which the treatments have equal Q-TWiST. The long duration treatment has greater Q-TWiST for pairs of utility coefficients that fall above the threshold line, while the short duration treatment has greater Q-TWiST for pairs of values that fall below the threshold line. The dashed line gives an upper 95% confidence band for the threshold line. The lower confidence band is outside the range of possible utility coefficients. The results show that at seven years, a significant Q-TWiST gain was achieved for a large range of choices for the utility coefficients.

Figure 5. Threshold Utility Analysis for IBCSG Trial V. Both uTOX (vertical axis) and uREL (horizontal axis) range between 0 and 1, where the value 1 indicates that the time is worth the same as TWiST, while the value 0 indicates that the time is worth nothing. The solid line is the threshold (based on values of uTOX and uREL) for which the treatments have equal Q-TWiST. The dashed line shows the 95% confidence band for the threshold. The region denoted by "Longer Duration Sig. Better" indicates the values of utility coefficients for which average Q-TWiST at seven years after randomization was statistically significantly greater for the long duration chemotherapy treatment compared with the short duration chemotherapy treatment. [Plot not reproduced.]

Figure 5 allows one to determine the treatment preference given a pair of utility coefficient values. For example,
for a patient with utility coefficient values of uTOX = uREL = .5, the long duration treatment is significantly better, and thus is the preferred treatment in terms of Q-TWiST. On the other hand, for a patient with utility coefficient values of uTOX = .1 and uREL = .9, for whom the disutility of toxic effects is great while the disutility of relapse is minimal, the gain in Q-TWiST at seven years is not significant. In this case the treatment preference in terms of Q-TWiST is not conclusive. It is important to note that the threshold utility analysis does not indicate the distribution of the utility coefficients for the population. In other words, the threshold line does not tell us how many patients prefer one treatment over the other. This question must be addressed with additional research.

4.4 Incorporating Prognostic Factors

To illustrate one of the recent extensions of the Q-TWiST methodology, we performed a proportional hazards Q-TWiST analysis of the IBCSG Trial V data using the following prognostic factors: tumor size, age, tumor grade, and the number of lymph nodes involved. Treatment group was included as an added covariate. Models were fit to each of the survival outcomes, DFS and OS, and all covariates were statistically significant (p < .05). Other factors, such as estrogen receptor status, were not included because they were not statistically significant. The product limit method was used to estimate the survival curves for TOX according to treatment group. A proportional hazards model was not used for TOX because none of the prognostic factors was significant in the model, and the proportional hazards assumption did not appear appropriate for the treatment group covariate. For DFS and OS, goodness-of-fit tests did not suggest that the proportional hazards assumption was violated.

Threshold utility analyses at seven years based on the model are presented in Figure 6 for two patient profiles. These profiles represent a good prognosis and a poor prognosis for a 45 year old patient. The range of utility coefficients favoring the more toxicity-intensive long duration chemotherapy is larger for the poor prognosis situation than for the good prognosis situation. This is the case even though relative effectiveness is similar for both patient profiles; that is, the same percentage reduction in the risk of an event is achieved for good and poor prognosis. Figure 6 illustrates how, from a patient's point of view, a poor prognosis scenario has the potential to gain more Q-TWiST in the short term (within seven years), thus increasing the rationale to use long duration chemotherapy.

Figure 6. Threshold Utility Analyses for Two Patient Profiles Based on the Proportional Hazards Model for 1,229 Patients in IBCSG Trial V. Threshold diagrams are shown for 45 year old patients in a good prognostic situation (a) and in a poor prognostic situation (b). The regions denoted by "Longer Duration Sig. Better" indicate the values of utility coefficients for which average Q-TWiST at seven years after randomization was statistically significantly greater for the long duration chemotherapy treatment compared with the short duration chemotherapy treatment. [Plots not reproduced; vertical axis: uTOX, horizontal axis: uREL, each from 0 to 1.]

5. APPLYING THE Q-TWiST METHOD

As illustrated by the IBCSG example, a Q-TWiST analysis may be performed retrospectively after the completion of a clinical trial. In this case data must have been collected in the trial to enable the partition of overall survival into the clinically relevant health states. These are often broadly defined, for example, using the entire treatment period to represent TOX.
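Once the mean state durations are available from such a retrospective partition, the remaining Q-TWiST arithmetic is simple. The sketch below is a rough illustration only: it uses the rounded treatment differences from the Difference column of Table 2 (long minus short, in months), so it reproduces the published point estimates only approximately, and it says nothing about statistical significance, which requires the confidence bands. The function names are hypothetical.

```python
# Treatment differences (long minus short, in months) as read from the
# Difference column of Table 2. These are rounded values, so every result
# below is approximate.
D_TOX, D_TWIST, D_REL = 5.0, 6.0, -7.0


def q_twist_gain(u_tox: float, u_rel: float) -> float:
    """Q-TWiST gain = uTOX * dTOX + dTWiST + uREL * dREL."""
    return u_tox * D_TOX + D_TWIST + u_rel * D_REL


def threshold_u_tox(u_rel: float) -> float:
    """uTOX at which the two treatments have equal Q-TWiST for a given uREL.

    The value may fall outside [0, 1]; in that case the threshold line does
    not cross the unit square at that uREL.
    """
    return -(D_TWIST + u_rel * D_REL) / D_TOX


def preferred(u_tox: float, u_rel: float) -> str:
    """Treatment with the larger point estimate of Q-TWiST."""
    gain = q_twist_gain(u_tox, u_rel)
    if gain > 0:
        return "long"
    if gain < 0:
        return "short"
    return "equal"
```

With uTOX = uREL = .5 the gain is 5 months, in line with Table 2. The extremes (uTOX = 1, uREL = 0) and (uTOX = 0, uREL = 1) give 11 and -1 months, corresponding to the upper and lower edges of the shaded region in Figure 4 at seven years. Under these rounded inputs the threshold line enters the unit square only for uREL above 6/7, which agrees with its position in the lower right corner of Figure 5.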
Alternatively, a Q-TWiST analysis can be planned prospectively and specified as part of the protocol document. Each clinical health state should be defined with the assistance of a clinical colleague. This will ensure that the appropriate data are collected for evaluating the clinical health states. Patient-derived utilities could also be collected during the trial (Weeks et al. 1994). Methods for deriving utility scores such as standard gamble, time trade-off, and multiattribute techniques are discussed by Torrance (1986). In addition to a Q-TWiST analysis incorporating the patient-derived utilities, we recommend performing a threshold utility analysis to allow individual patients to determine the treatment choice for their particular preference scores.

In practice, the most challenging component to define is toxicity. Typically, it is preferable to use criteria which focus on symptomatic rather than on laboratory events, as the former most directly influence patients' quality of life. It may be difficult to precisely accommodate intermittent toxicities because the clinical health states are progressive. It is possible, however, to define toxicity as the time period from initial treatment until all toxicity has disappeared. If there are long periods of time captured in this definition that are actually free of toxicity, this will be reflected by having a higher value for the average toxicity utility coefficient.

By defining the clinical health states to reflect specific trade-offs of concern to health professionals and patients,
Q-TWiST provides a framework for treatment decision-making. For example, to evaluate the role of zidovudine therapy for asymptomatic patients with HIV infection, progressive health states of TWiST, adverse events (AE: symptomatic sequelae associated with treatment or disease), and progression (Prog: clinical definition of HIV progression) were defined (Lenderking et al. 1994). In this case Q-TWiST = TWiST + uAE × AE + uProg × Prog, focusing attention on the trade-off between increased adverse events and delayed disease progression associated with zidovudine therapy.

Q-TWiST can also be used to evaluate chemotherapy for small cell lung cancer, which can prolong survival by a month or two and may also relieve symptoms of the disease, but at a cost of severe side effects of treatment. By defining appropriate health states for treatment toxicity and palliation of disease symptoms, the Q-TWiST method can highlight the benefits and costs of chemotherapy in this setting. Furthermore, different therapies for small cell lung cancer may be less successful in returning patients to states of relatively good health, and in this case TWiST may be assigned a utility value less than one.

A Q-TWiST analysis of the efficacy of treatments designed to prolong event-free survival can highlight the influence of late sequelae by defining clinical health states to capture the occurrence of late events. This approach is currently being applied to evaluate treatments for childhood acute lymphoblastic leukemia and Hodgkin's disease.

6. CONCLUSIONS

The evaluation of treatment effectiveness in terms of quality of life will become increasingly important in clinical trials. For chronic illnesses with no cure, new treatments will need to be evaluated not only for a survival effect but also for possible palliative advantages.
In this article we have presented a review of the Q-TWiST method that is directly applicable and well suited for this purpose because treatments are evaluated simultaneously in terms of quantity and quality of life. Other quality-of-life measures, which do not account for time, only indirectly reflect benefits of delayed disease recurrence. Another advantage of Q-TWiST is that the method does not aggregate quality-of-life results for an entire population; instead, it allows individual patients and physicians to determine the recommended treatment according to individual preferences. This advantage is obtained from a threshold utility analysis which gives the preferred treatment according to all combinations of the utility coefficients.

We have also described in this article various extensions of the Q-TWiST methodology. The extension to regression models allows the evaluation of treatment effects, in terms of quality of life, to be made according to different prognostic situations. The extrapolation methodology provides a means for investigating long-term treatment effects when there are sufficient data for modeling the tails of the survival curves. The final extension to meta-analysis allows clinical trials with follow-up intervals of different lengths to be combined in such a way that aggregate Q-TWiST analyses are possible.

The Q-TWiST method provides a quality-adjusted survival analysis for clinical trial data that is most useful for treatment decision-making. The results can be used for treatment recommendations for individual patients, as well as for clinical trial evaluations of therapeutic regimens.

[Received June 1993. Revised November 1994.]

REFERENCES

Breslow, N. E., and Crowley, J. (1974), A Large Sample Study of the Life Table and Product Limit Estimates under Random Censorship, Annals of Statistics, 2, 437-453.
Cole, B. F., Gelber, R. D., and Anderson, K.
M., for the International Breast Cancer Study Group (1994), Parametric Approaches to Quality-Adjusted Survival Analysis, Biometrics, 50, 621-631.
Cole, B. F., Gelber, R. D., and Goldhirsch, A., for the International Breast Cancer Study Group (1993), Cox Regression Models for Quality-Adjusted Survival Analysis, Statistics in Medicine, 12, 975-987.
--- (1995), A Quality Adjusted Survival Meta-Analysis of Adjuvant Chemotherapy for Premenopausal Breast Cancer, Statistics in Medicine, 14, 1771-1784.
Cox, D. R., Fitzpatrick, R., Fletcher, A. E., Gore, S. M., Spiegelhalter, D. J., and Jones, D. R. (1992), Quality of Life Assessment: Can We Keep It Simple?, Journal of the Royal Statistical Society, Part A, 155, 353-393.
Donovan, K., Sanson-Fisher, R. W., and Redman, S. (1989), Measuring Quality of Life in Cancer Patients, Journal of Clinical Oncology, 7, 959-968.
Gelber, R. D., Gelman, R. S., and Goldhirsch, A. (1989), A Quality-of-Life-Oriented Endpoint for Comparing Therapies, Biometrics, 45, 781-795.
Gelber, R. D., and Goldhirsch, A. (1986), A New Endpoint for the Assessment of Adjuvant Therapy in Postmenopausal Women with Operable Breast Cancer, Journal of Clinical Oncology, 4, 1772-1779.
Gelber, R. D., Goldhirsch, A., and Cavalli, F., for the International Breast Cancer Study Group (1991), Quality-of-Life-Adjusted Evaluation of a Randomized Trial Comparing Adjuvant Therapies for Operable Breast Cancer, Annals of Internal Medicine, 114, 621-628.
Gelber, R. D., Goldhirsch, A., and Cole, B. F., for the International Breast Cancer Study Group (1993a), Evaluation of Effectiveness: Q-TWiST, Cancer Treatment Reviews, 19, 73-84.
Gelber, R. D., Goldhirsch, A., and Cole, B. F. (1993b), Parametric Extrapolation of Survival Estimates to Quality-of-Life Evaluation of Treatments, Controlled Clinical Trials, 14, 485-489.
Gelber, R. D., Goldhirsch, A., Hurny, C., Bernhard, J., and Simes, R. J., for the International Breast Cancer Study Group (1992a), Quality of Life in Clinical Trials of Adjuvant Therapies, Journal of the National Cancer Institute Monographs, 11, 127-135.
Gelber, R. D., Lenderking, W. R., Cotton, D. J., Cole, B. F., Fischl, M. A., Goldhirsch, A., and Testa, M. A., for the AIDS Clinical Trials Group (1992b), Quality-of-Life Evaluation in a Clinical Trial of Zidovudine Therapy in Patients with Mildly Symptomatic HIV Infection, Annals of Internal Medicine, 116, 961-966.
Glasziou, P. P., Simes, R. J., and Gelber, R. D. (1990), Quality Adjusted Survival Analysis, Statistics in Medicine, 9, 1259-1276.
Goldhirsch, A., Gelber, R. D., Simes, R. J., Glasziou, P., and Coates, A., for the Ludwig Breast Cancer Study Group (1989), Costs and Benefits of Adjuvant Therapy in Breast Cancer: A Quality Adjusted Survival Analysis, Journal of Clinical Oncology, 7, 36-44.
Kaplan, E. L., and Meier, P. (1958), Nonparametric Estimation from Incomplete Observations, Journal of the American Statistical Association, 53, 457-481.
Lenderking, W. R., Gelber, R. D., Cotton, D. J., Cole, B. F., Goldhirsch, A., Volberding, P. A., and Testa, M. A. (1994), Evaluation of the Quality-of-Life Assessment in Asymptomatic Human Immunodeficiency Virus Infection, New England Journal of Medicine, 330, 738-743.
Maguire, P., and Selby, P., on behalf of the Medical Research Council's Cancer Therapy Committee Working Party on Quality of Life (1989), Assessing Quality of Life in Cancer Patients, British Journal of Cancer, 60, 437-440.
Moinpour, C. M., Feigl, P., Metch, B., Hayden, K. A., Meyskens, Jr., F. L., and Crowley, J. (1989), Quality of Life End Points in Cancer Clinical Trials: Review and Recommendations, Journal of the National Cancer Institute, 81, 485-495.
Priestman, T. J., and Baum, M. (1976), Evaluation of Quality of Life in Patients Receiving Treatment for Advanced Breast Cancer, Lancet, 1, 899-900.
Schumacher, M., Olschewski, M., and Schulgen, G. (1991), Assessment of Quality of Life in Clinical Trials, Statistics in Medicine, 10, 1915-1930.
The Ludwig Breast Cancer Study Group (1988), Combination Adjuvant Chemotherapy for Node-Positive Breast Cancer: Inadequacy of a Single Perioperative Cycle, New England Journal of Medicine, 319, 677-683.
Torrance, G. W. (1986), Measurement of Health State Utilities for Economic Appraisal: A Review, Journal of Health Economics, 5, 1-30.
Weeks, J. C., O'Leary, J., Fairclough, D., Paltiel, D., and Weinstein, M. (1994), The Q-tility Index: A New Tool for Assessing Health-Related Quality of Life and Utilities in Clinical Trials and Clinical Practice, in Proceedings of ASCO 1994, 13, p. 436.