
The Sickness Impact Profile

Development of an Outcome Measure of Health Care


The Sickness Impact Profile, a behaviorally based measure of sickness-related dysfunction, is being developed to provide an appropriate and sensitive measure of health status for use in assessing the effects of health care services.
The development of methods for evaluating health care services is one of the most urgent concerns in the field of health services research. Among the major stimuli for this concern are the public accountability that accompanies increased government participation in health care financing and the growing public interest in the quality of increasingly costly services. The proliferation of innovative organizational patterns for providing health services makes it necessary to obtain data demonstrating the relative benefits of available alternatives.

Evaluators use three types of measures to assess health care services: measures of structure, measures of process, and measures of outcome.1,2 Measures of structure or process assess factors that are presumably directly related to outcome. Measures of outcome are designed to assess the effects of the health care services on the population served. Often, structure or process measures are used because no adequate or efficient measure of outcome is available.

While it has been assumed that these three types of evaluation measures are highly related and that structure and process measures can serve as proxies for outcome measures, the substitution will be legitimate only when the relationship between structure or process and outcome has been established. For example, one can assess the outcome of a program such as polio immunization by examining the number of immunizations administered (a process measure), since it has been demonstrated that such immunization leads to less polio (an outcome measure). On the other hand, since it is not known whether the number of clinician visits decreases illness, measuring numbers of visits does not provide knowledge of outcome.

Dr. Betty Gilson is Associate Professor and Associate Dean, Department of Health Services, School of Public Health and Community Medicine, University of Washington, Seattle, Washington 98195. Dr. John Gilson is Director of Medical Education, Group Health Cooperative of Puget Sound, Seattle, Washington. Dr. Bergner is Assistant Professor, Department of Health Services, School of Public Health and Community Medicine, University of Washington, Seattle, Washington. Dr. Bobbitt is Research Professor, Department of Health Services, School of Public Health and Community Medicine, University of Washington, Seattle, Washington. Ms. Kressel is Senior Administrative Analyst, Health Policy Program, San Francisco, California. Dr. Pollard is a postdoctoral fellow, Department of Psychology, Northwestern University, Evanston, Illinois. Dr. Vesselago's address is: 2012 Tenth Avenue East, Seattle, Washington. This investigation was supported by the HMO Service of the Health Services and Mental Health Administration, Contract HSM 110-72-420. This paper was presented, in abbreviated form, at the American Public Health Association Annual Meeting, San Francisco, 1974. It was accepted for publication July 21, 1975.
1304 AJPH DECEMBER, 1975, Vol. 65, No. 12

TABLE 1-Categories and Selected Items of the Sickness Impact Profile

Items Describing Behaviors Involved in or Related to: Selected Items

Social Interaction
    I make many demands, for example, insist that people do things for me, tell them how to do things
    I am going out less to visit people

Ambulation or Locomotion Activity
    I am walking shorter distances
    I do not walk at all

Sleep and Rest Activity
    I lie down to rest more often during the day
    I sit around half asleep

Taking Nutrition
    I am eating no food at all, nutrition is taken through tubes or intravenous fluids
    I am eating special or different food, for example, soft food, bland diet, low salt, low fat foods

Usual Daily Work
    I often act irritable toward my work associates, for example, snap at them, give sharp answers, criticize easily
    I am not working at all

Household Management
    I have given up taking care of personal or household business affairs, for example, paying bills, banking, working on budget
    I am doing less of the regular daily work around the house that I usually do

Mobility and Confinement
    I stay within one room
    I stop often when traveling because of health problems

Movement of the Body
    I am in a restricted position all the time
    I sit down, lie down, or get up only with someone's help

Communication Activity
    I communicate only by gestures, for example, moving head, pointing, sign language
    I often lose control of my voice when I talk, for example, my voice gets louder, starts trembling, changes pitch

Leisure Pastimes and Recreation
    I am doing more physically inactive pastimes instead of my other usual activities
    I am going out for entertainment less often

Intellectual Functioning
    I have difficulty reasoning and solving problems, for example, making plans, making decisions, learning new things
    I sometimes behave as if I were confused or disoriented in place or time, for example, where I am, who is around, directions, what day it is

Interaction with Family Members
    I isolate myself as much as I can from the rest of the family
    I am not doing the things I usually do to take care of my children or family

Emotions, Feelings, and Sensations
    I act irritable and impatient with myself, for example, talk badly about myself, swear at myself, blame myself for things that happen
    I laugh and cry suddenly for no reason

Personal Hygiene
    I dress myself, but do so very
    I do not have control of my bowels

Scale Values
5.2 3.3 9.2 4.6 8.1 12.3 8.6 6.9 3.9 9.9 4.2 13.6 10.4 11.3 8.1 4.6

The importance of developing widely applicable outcome measures of health care is evident. The few well-designed studies of health care that have used outcome measures to evaluate programs have shown for the most part no difference between control and experimental groups or have shown inconclusive or contradictory results.* These results would be acceptable if one were confident that the outcome measures used were appropriate to the program goal and were sufficiently sensitive to discern differences where differences occur over a relatively short time span. New developments should be directed toward overcoming the inadequacies of outcome measures used in the past: inappropriateness and insensitivity. This is especially important if the measures are to be used to assess the effects of comprehensive health care programs having relatively global objectives with regard to patient welfare and serving populations with heterogeneous health problems.

TABLE 2-SIP Item Scaling: Correlation of Each Judge's Ratings with the Mean Ratings for 312 Items

Judge No.    Correlation
 1           0.83
 2           0.73
 3           0.80
 4           0.74
 5           0.81
 6           0.82
 7           0.71
 8           0.76
 9           0.84
10           0.77
11           0.58
12           0.61
13           0.65
14           0.76
15           0.73
16           0.77
17           0.66
18           0.79
19           0.63
20           0.83
21           0.81
22           0.76
23           0.71
24           0.74
25           0.75
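The statistic reported in Table 2 can be illustrated with a minimal sketch, assuming it is an ordinary Pearson correlation between one judge's item ratings and the across-judge mean for the same items (the paper does not name the coefficient used for this table). The function names and the small ratings matrix below are hypothetical and are not taken from the study data. Following the paper's wording, the mean includes the judge's own ratings.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def judge_agreement(ratings):
    """Correlate each judge's ratings with the mean ratings over all judges.

    ratings[j][i] is the rating given by judge j to item i.
    Assumption: the mean includes the judge being correlated.
    """
    n_judges = len(ratings)
    n_items = len(ratings[0])
    mean_ratings = [
        sum(judge[i] for judge in ratings) / n_judges for i in range(n_items)
    ]
    return [pearson(judge, mean_ratings) for judge in ratings]


# Hypothetical data: 3 judges rating 5 items on an 11-point scale.
ratings = [
    [1, 3, 5, 8, 10],
    [2, 3, 6, 7, 11],
    [1, 4, 4, 9, 9],
]
correlations = judge_agreement(ratings)
```

With 25 judges and 312 items, the same computation would yield one coefficient per judge, as in the table.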

The Sickness Impact Profile

The Sickness Impact Profile (SIP), a behaviorally based measure of sickness-related dysfunction, is being developed in an effort to provide an appropriate, valid, and sensitive measure of health status that will aid in assessing the outcome of health care services. A measure of behavioral dysfunction in usual daily activities could provide a valid and practical indicator of health status and be of potential use in health care outcome evaluation. Obviously, no single measure can provide a complete assessment of the quality of health services and the SIP would be applied in conjunction with other measures appropriate to a particular study situation. Before this measure of health status was developed, an assumption was made regarding the desired outcomes of comprehensive health care programs, thus defining what was to be measured. It was assumed that the ultimately sought product of health services is the reduction of sickness. Disease was differentiated from sickness. Disease was used to denote a professional or provider definition of illness based on clinical or clinically related observations of a patient or a population. Sickness was used to denote the nonprofessional definition based on lay observations. For example, an individual may define certain signs or symptoms as sickness-related. Observation of such signs and symptoms in himself may result in his perceiving himself as sick. Having done so, he may or may not seek medical care. If he does not and his sickness persists, he nonetheless experiences impacts of sickness. If he does seek medical care, he enters the medical care process and his sickness may then be defined as disease by a clinician. The impact the sickness has on the individual is influenced by the health care provider's definition of the illness, the medical care process, and his own sickness perception.
* Elinson,3 Wilner et al.,4 and Cumming and Cumming5 provide examples of experimentally designed studies that were unable to show differences between experimental and control groups. More recently, Brook and Appel2 cast doubt on the appropriateness of outcome measures currently considered acceptable, as well as of process measures.

Whether or not an individual seeks medical care, the impacts of sickness, as perceived by him, form the basis for his response to the SIP. Although sickness can be assessed in terms of clinical indices or on the basis of subjective or "feeling state" descriptions,6 the behavioral or performance dimension of sickness as perceived by the individual provides a singularly appropriate basis for an outcome measure of health care for several reasons. First, the behavior of an individual is a manifestation, at a given time, of the overall impact of illness, reflecting the effects of both the clinical and subjective dimensions, as well as their interactive effects on daily life activities. Second, the effect of sickness on an individual's social, mental, and physical activities is perceived and appreciated by both providers and consumers of health services. If a single evaluation measure is to be useful, the inclusion of both perspectives is essential. Such a measure must be acceptable to those providing services and yet must be responsive to consumers who demand a larger share in the quality assessment processes applied to health services. Third, a measure of behavioral dysfunction, independent of clinical examination, is particularly appropriate in evaluation of health care systems that are responsible for the health maintenance of heterogeneous population groups over extended periods of time. It is more feasible and economical when applied to a whole population than are most clinical measures, more reliable than measures of feelings and emotions, and more sensitive to the comprehensive impact of a health care system than are morbidity and mortality measures.

An instrument measuring level of dysfunction can be used to classify individuals "at a time when knowledge about cause and pathogenesis is not advanced enough to permit measurement in the latter terms."7 In addition, such an instrument can be used as a common basis for comparing persons in diagnostically homogeneous groups or across selected diagnostic groups. It can be applied whether or not medical care has been received or clinical information is available.

Some behaviorally based instruments are currently used to evaluate very specific and circumscribed patient groups but are not designed to be applicable to a segment of the general population such as might be served by a comprehensive health care program.7,8 Certain other behaviorally based instruments are widely applicable, but do not provide sufficient detail to be useful in evaluation and planning of services.9,10 A recent refinement by researchers has been the focus on performance of usual daily activities in function assessment measures.11-13

Methodology of SIP Instrument Construction

The aim of instrument construction was to incorporate both professional and lay perspectives of the impacts of sickness in the content of the Sickness Impact Profile. A procedure was devised to obtain statements describing behavioral dysfunction from patients, health care professionals, individuals caring for patients, and the apparently healthy. These descriptions were obtained by using an open ended request form to elicit specific statements that describe sickness-related changes in behavior. Over 1000 completed request forms were collected. In addition, function assessment instruments that have been designed for the evaluation of circumscribed patient groups were reviewed for statements of behavioral dysfunction. From these sources, 1250 specific statements of behavioral change were obtained.

These statements were subjected to standard grouping techniques according to a set of criteria. This process yielded 312 unique statements or items, each item describing a behavior or activity and specifying a dysfunction. A standard sorting procedure yielded 14 groups or categories of items, each of which appears to describe dysfunction in an area of living or a type of activity. These categories and selected items in each are shown in Table 1.

Before the items and categories were field tested, a standardized and structured interview instrument was developed. It included the 312 items, several questions about the personal characteristics of the subject, and a request that the subject list any additional changes in his behavior that were related to his health and were not covered by the items. Subjects were asked to respond positively only to those items that they were sure described them and were related to their health. The total pattern of positive responses, or dysfunctions, provided a detailed profile of sickness impacts or, as it became known, a protocol.

In order to: (1) interrelate individual items and to compare them, (2) provide a base for scoring profile patterns within and across categories, (3) validate the construct of dysfunction, and (4) determine the extent to which SIP scores relate to a more global assessment of dysfunction, two separate approaches to scaling were applied to the SIP. They were item scaling and protocol scaling.

The item scaling procedure was employed to interrelate individual items and compare them, and to provide a basis for scoring. The SIP items were rated by a group of 25 judges. The judges were seven graduate nursing students, eight medical students, six health services administration students, and four physicians. The scaling consisted of two steps. In the first step, using the method of equal appearing intervals, judges rated each item within each category on an 11-point scale, ranging from "minimally dysfunctional" to "severely dysfunctional." Judges were asked to rate the severity of the dysfunction described by an item without regard for what might be causing it, i.e., without regard for any specific health condition, prognosis, or personal characteristics, in the context of which the behavior might seem more or less dysfunctional. As is shown below, the mean scale values of the items were stable and there was high agreement among judges.

Since the items in all categories had been rated in terms of the same concept of dysfunction, the 25 judges were asked in the second step of item scaling to place those items that had been judged to be the most dysfunctional and least dysfunctional within each category on a single 15-point scale. Again, there was high agreement among the judges. The average scale value for each of these items was calculated. This process provided a set of commonly scaled endpoints within which the 15-point scale value for each of the remaining items in each category could be mathematically derived.

The protocol scaling procedure was employed to validate the construct of dysfunction and determine the extent to which SIP scores relate to a more global assessment of dysfunction. Four groups of 25 judges* each rated 50 protocols of subjects obtained in a field trial of the SIP. This provided 25 ratings on 200 protocols. In the first step of this procedure, the judges were asked to rate each subject's protocol of responses in each category on an 11-point scale. The points on the scale ranged from "minimally dysfunctional" to "severely dysfunctional."

TABLE 3-SIP Protocol Scaling: Median Correlation and Range of Correlation of Each Judge's Ratings with the Mean Ratings for Each Group of Judges*

Judging Group
1 2

Category Median
A B C D E F G H I J K L M N

0.69-0.96 0.76-0.99 0.70-0.98 0.42-0.97 0.79-0.99 0.70-0.97 0.73-0.97 0.62-0.99 0.41-0.95 0.49-0.96 0.82-0.96 0.66-0.97 0.45-0.95 0.72-0.99 0.75-0.97

0.18-0.92 0.72-0.97 0.67-0.93 0.79-0.95 0.41-0.99 0.57-0.97 0.78-0.97 0.78-0.96 0.48-0.95 0.68-0.95 0.62-0.97 0.24-0.96 0.64-0.94 0.79-0.97

0.65-0.96 0.88-0.99 0.71-0.94 0.58-0.97 0.89-1.00 0.78-0.98 0.74-0.99 0.71-0.98 0.67-0.97
0.57-0.95 0.53-0.97 0.61-0.95 0.56-0.95 0.85-0.99

0.59-0.98 0.75-0.97 0.55-0.95 0.65-0.98 0.83-1.00 0.71-0.99

0.86 0.91 0.90 0.87 0.96 0.90 0.92 0.95 0.85 0.82 0.89 0.94 0.77 0.93

0.85 0.88 0.87 0.89 0.95 0.92 0.93 0.88 0.80 0.87 0.85 0.77 0.85 0.94

0.89 0.96 0.86 0.91 0.98 0.93 0.94 0.93 0.90

0.82 0.93 0.83 0.76 0.95 0.92

0.91 0.90 0.91 0.91 0.97 0.93 0.92 0.93 0.90 0.83 0.93 0.88 0.93 0.92

0.73-0.96 0.83-0.97
0.79-0.97 0.55-0.96 0.62-0.98 0.59-0.96 0.71-0.98 0.77-0.98 0.80-0.95

* Each group of judges rated the protocols of 50 subjects.
As in item scaling, judges were asked to make their ratings without regard to the cause of the dysfunction. The mean scale value assigned to subject protocols was stable and there was high agreement among the judges. In the second step, the judges were asked to rate each subject's complete protocol on a 15-point scale.

* None of these judges was involved in the item scaling; however, all judges were chosen from the same population subgroups. There were fewer physicians and nurses for protocol scaling than for item scaling.

An analysis of each of these scaling procedures indicated that there was a high level of agreement among judges on both the ratings of items and the ratings of protocols. With respect to item scaling, results were analyzed in three ways. First, the correlation of each judge's ratings of 312 items with the mean of the 25 judges' ratings of these items was generally high and indicated that this approach produced reliable scale values (Table 2). Second, the agreement among the judges on each item scaled was generally high. Items were scaled with a mean standard deviation of 2.0 scale points with a standard deviation of the standard deviations of 0.45. The largest 95 per cent confidence interval for the mean scale value of any item retained was approximately two scale points. Twenty-nine items were omitted from subsequent scoring analyses because the 95 per cent confidence interval for these items was greater than two scale points. Results of the scaling of the endpoint items of each category were comparable. Third, an analysis was made of the difference between judges considered to be differentially sophisticated in health matters (clinically experienced nursing students and physicians, as opposed to medical and health services administration students). t-tests showed no significant difference between the two types of judges; this suggests that the health care backgrounds of the judges may not be so influential a factor as to invalidate the obtained scale values for use in developing the instrument.† A validation of the scaling using consumer judges is planned.

† It should be noted that no difference was found by Patrick, Bush, and Chen between nurses and students in their ratings of case descriptions (personal communication with James W. Bush, MD, January, 1974). A published account of their work incorrectly reported a significant difference between nurses' ratings and students' ratings.12

With respect to protocol scaling, the results for each of the four groups of judges were analyzed in two ways. First, the correlations of each judge's ratings with mean ratings of subject protocols were high for each of the groups of judges. The median correlation for each group of judges is shown in Table 3. Second, the agreement among judges on each protocol scaled was consistently high for each of the four groups of judges. The mean standard deviation for the four groups of judges ranged from 1.6 to 1.7 in judging subjects' protocols by category on the 11-point scale, and from 1.9 to 2.0 in judging overall protocols on the 15-point scale (Table 4).

TABLE 4-SIP Protocol Scaling: Mean Standard Deviation of Judgments within Judging Groups for Category and Overall Judgments

                 Category Scaling        Overall Scaling
Judging Group    Mean SD    SD of SD     Mean SD    SD of SD
1                1.6        0.5          2.0        0.8
2                1.7        0.4          1.9        0.7
3                1.6        0.4          1.9        0.7
4                1.5        0.4          1.9        0.7

Four scoring methods were tested7, in order to select a method that would best reflect factors that may have been taken into account by protocol judges while being sufficiently simple to allow interpretation and disaggregation of scores. Each of the methods reflects differently the pattern of dysfunction and the item scale values represented in a protocol. Each permitted calculation of a score for each category and for the overall SIP. The methods tested were:

* A mean of the scale values of items checked. A mean score represents an average of the dysfunction weights of the items checked in a protocol;
* A mean of the squared scale values of items checked. This represents an average of the dysfunction weights of items checked, but increases the relative weight of items that have high scale values;
* A percentage of total possible dysfunction, which is the sum of the scale values of items checked divided by the sum of the scale values for all items, multiplied by 100. This method of scoring provides a relative frequency that is weighted by the magnitude of the scale value as well as by the number of items checked;
* A profile indicating the number of items checked within one of four scale-point groupings. The determination of scale-point groupings is based on the distribution of items scaled across 15 points. While the profile represents a frequency distribution, its size as a number or score may also relate in some systematic way to the method of judging that takes into account protocol patterning from maximum to minimum as well as number of items.
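The four scoring methods listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names, the example scale values, and in particular the cut points separating the four scale-point groupings are hypothetical, since the paper does not report where the groupings were drawn. Returning 0.0 for a protocol with no items checked is likewise an assumption.

```python
from bisect import bisect_right


def mean_score(checked):
    """Mean of the scale values of the items checked (0.0 if none checked)."""
    return sum(checked) / len(checked) if checked else 0.0


def mean_squared_score(checked):
    """Mean of the squared scale values; weights severe items more heavily."""
    return sum(v * v for v in checked) / len(checked) if checked else 0.0


def percent_score(checked, all_values):
    """Sum of checked scale values as a percentage of the sum for all items."""
    return 100.0 * sum(checked) / sum(all_values)


def profile_score(checked, cut_points=(4.0, 8.0, 12.0)):
    """Count of checked items falling in each of four scale-point groupings.

    cut_points are hypothetical boundaries on the 15-point scale; a value
    equal to a cut point is counted in the higher grouping.
    """
    counts = [0, 0, 0, 0]
    for value in checked:
        counts[bisect_right(cut_points, value)] += 1
    return counts


# Hypothetical category with four items; the subject checked two of them.
all_values = [2.0, 5.0, 9.0, 13.0]
checked = [5.0, 9.0]

scores = {
    "mean": mean_score(checked),
    "mean_squared": mean_squared_score(checked),
    "percent": percent_score(checked, all_values),
    "profile": profile_score(checked),
}
```

The same functions apply to a single category or to the overall SIP, depending on which items are passed in; the percentage and profile forms are the two the paper reports retaining.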
All scoring methods related sufficiently well to protocol judging to give evidence of the validity of the values derived in item scaling and of the construct of dysfunction. The profile score showed the best relationship with protocol judgments. Since further analysis indicated that judges took into account both number of items checked and the item checked with the highest scale value in making their ratings, both the profile scoring method and the per cent scoring method were retained (Table 5).
TABLE 5-Correlation of SIP Percentage and Profile Scores with Mean Protocol Ratings by Category and Overall

% Score*
0.90 0.82 0.89 0.86 -0.29‡ 0.82 0.75 0.86 0.90 0.79 0.83 0.85 0.88 0.86 0.93

Profile Score†
0.95 0.92 0.96 0.93 0.58 0.96 0.97 0.97 0.94 0.86 0.94 0.95 0.88 0.98

* Pearson product-moment correlation.
† Spearman rank-difference correlation.
‡ A positive response to "I am not working at all" precluded response to any of the remaining items in Category E. The percentage score in this case does not accurately reflect the degree of dysfunction relative to other response patterns. This will be taken into account in revising and refining the SIP.

Pilot Study of the SIP

The limited field trial conducted as part of the initial development of the SIP provided useful albeit preliminary data about the feasibility, reliability, and validity of the SIP. In the field trial, 246 group practice enrollees were interviewed. This represented 71 per cent of a sample of 357 drawn from five categories of medical care: inpatients, home care patients, walk-in clinic patients, outpatients, and nonpatients. Analysis of the completed interviews showed that 98 per cent of the items were used at least once, that 225 subjects of the 246 interviewed found at least one item that described them, and that the mean number of items checked per subject was 30. In view of the sampling design, this provides evidence of the broad applicability of the instrument. In addition, no subject refused to complete the interview once begun, and of the 357 people contacted and the 246 interviewed, no complaints of any kind were registered with the group practice or the SIP project.

Of the refusals, 36 per cent were not interviewed because of health reasons. Most of these were among the elderly, the inpatients, and the home care patients. These refusals came not only from subjects but from medical care personnel whose permission was requested before inpatients were interviewed. Interviewers were instructed not to schedule or reschedule interviews for a time when the subject's health would be improved. Since this was a pilot study of feasibility, and patients' tolerance of the interview process was unknown, interviewers did not urge participation.

Reliability estimates based on two administrations of the SIP to 31 subjects showed that overall scores were highly reliable. Test-retest correlations using the various scoring methods ranged from 0.80 to 0.88. When subjects were grouped by the category of medical care they were receiving at the time of the field trial (as indicated by their patient classification as described above), SIP scores were related to these categories in the expected direction. These data, along with the positive relationship between a self-assessment of sickness obtained from each subject and his SIP score, provide preliminary evidence of validity.

(Subsequently, the SIP was revised on the basis of data obtained in the pilot study. A long form of the SIP, consisting of 235 items, and a short form, consisting of 146 items, were developed. A second field trial was conducted that provided a more definitive evaluation of the reliability and validity of the SIP. The results of this field trial are reported elsewhere.14,15 In general, the results were positive and warrant further revision and refinement of the SIP.)

Summary and Discussion

The SIP is a scaled measure of health-related dysfunction. It is being designed for use, in conjunction with other kinds of assessment, in evaluation of health care services and particularly of comprehensive health care programs. It is a behavioral measure, independent of diagnostic criteria, which relies solely on an individual's perception of the impacts of sickness on his usual daily activities. It is intended to provide a quantitatively sensitive and qualitatively detailed measure without imposing the limitations and uncertainties of diagnostic classification.

It should be noted that no considerations relating to prognosis have been incorporated into the SIP. The imperfect and constantly changing state of medical science makes prognostication so uncertain that however attractive it may seem conceptually to include future risk in a health indicator, this is methodologically feasible in very few situations. Further, since the SIP is designed to measure "sickness" in a population at a given point in time, change in SIP from one administration to another should in itself be a valid and useful indicator of change in the health of the population under consideration. The examination of health in terms of separate dimensions, such as function, diagnosis, and prognosis, facilitates the study of each and its relationship to the others while allowing for the meaningful combination of these into a unified health status index.

Since one of the objectives in developing the SIP was to construct a measure capable of detailed and comprehensive description of dysfunction, a comment is appropriate on the completeness of the SIP dysfunction catalog. There is evidence that the method used for eliciting descriptions of behavior related to sickness, and the adaptation of numerous descriptions found in existing instruments, have produced an adequately extensive catalog of dysfunction descriptions: (1) the yield of new, useful items from the open ended questionnaires used to collect descriptions of health-related dysfunction decreased sharply toward the end of the data collection period; (2) continuing review of the literature revealed no new descriptions for inclusion in the SIP; (3) field trial subjects, who were asked at the close of the interview to list any changes in their behavior that were not covered by the items in the SIP, added no new dysfunction descriptions to the SIP compendium. Thus, it is evident that the preliminary SIP contained a relatively complete inventory of items from which to revise and refine the instrument.

Although the primary concern with regard to SIP content has been the inclusion of dysfunction descriptions in a wide variety of activity areas spanning a broad range of severity, the important issue of the level of detail desirable in such an instrument has not yet been dealt with in a systematic way. For practical reasons, the shortest possible instrument is desirable, yet condensation of content must not reduce qualitative and quantitative discriminative capability or reliability of results. Since the discriminative capacity of the SIP will depend in part on the descriptive detail retained, this issue will be a major focus of the refinement process. Statistical results must be interpreted in terms of the use of the instrument and in terms of the descriptive information desired. Since there are no tested and documented guidelines, the finalization of the SIP will require the expertise of health care practitioners, as well as of consumers and other evaluation researchers.

REFERENCES

1. Donabedian, A. Evaluating the Quality of Medical Care. Milbank Mem. Fund Q. 44:166-206, 1966.
2. Brook, R. H., and Appel, F. A. Quality of Care Assessment: Choosing a Method for Peer Review. N. Engl. J. Med. 288:1323-1329, 1973.
3. Elinson, J. Effectiveness of Social Action Programs in Health and Welfare. In Assessing the Effectiveness of Child Health Services, edited by Bergman, A. B., pp. 77-81. Ross Laboratories, Columbus, 1967.
4. Wilner, D. M., et al. The Housing Environment and Family Life: A Longitudinal Study of the Effects of Housing on Morbidity and Mental Health. Johns Hopkins Press, Baltimore, 1962.
5. Cumming, E., and Cumming, J. Closed Ranks. Harvard University Press, Cambridge, 1957.
6. Baumann, B. Diversities in Conceptions of Health and Physical Fitness. J. Health Hum. Behav. 2:39-46, 1960.
7. Katz, S., Downs, T. D., Cash, H. R., and Grotz, R. C. Progress in Development of the Index of ADL. Gerontologist 10:20-30, 1970.
8. Kelman, H. R., and Willner, A. Problems in Measurement and Evaluation of Rehabilitation. Arch. Phys. Med. Rehabil. 43:172-181, 1962.
9. Belloc, N., Breslow, L., and Hochstein, J. R. Measurement of Physical Health in a General Population Survey. Am. J. Epidemiol. 93:328-336, 1970.
10. Sullivan, D. F. Conceptual Problems in Developing an Index of Health. Vital and Health Statistics: Data Evaluation and Methods Research. National Center for Health Statistics, Series 2, No. 17. U.S. Government Printing Office, Washington, DC, 1966.
11. Fanshel, S., and Bush, J. W. A Health Status Index and Its Application to Health Services Outcomes. Operations Res. 18:1021-1066, 1970.
12. Patrick, D. L., Bush, J., and Chen, M. Toward an Operational Definition of Health. J. Health Soc. Behav. 14:6-23, 1973.
13. Spitzer, W. O., Sackett, D. L., Sibley, J. C., Roberts, R. S., Gent, M., Kergin, D. J., Hackett, B. C., and Olynich, A. The Burlington Randomized Trial of the Nurse Practitioner. N. Engl. J. Med. 290:251-256, 1973.
14. Pollard, W. E., Bobbitt, R. A., Bergner, M., Martin, D. P., and Gilson, B. S. The Sickness Impact Profile: Reliability of a Health Status Measure. Med. Care, in press, February, 1976.
15. Bergner, M., Bobbitt, R. A., Pollard, W. E., Martin, D. P., and Gilson, B. S. The Sickness Impact Profile: Validation of a Health Status Measure. Med. Care, in press, January, 1976.
