Epidemiology Practicum (Ari)

Source: http://www.epidemiolog.net/epid168/exams/
University of North Carolina at Chapel Hill

School of Public Health
Department of Epidemiology
Fundamentals of Epidemiology (EPID 168)
Midterm Examination, Fall 1998
1. a. Briefly summarize two criteria on which disease classifications
are based. Discuss a reason why these two criteria do not always
correspond with one another. (3 pts)
b. List two examples of each of the two types of criteria you
mentioned in 1A. (2 pts)
2. Cohort studies can form the framework for efficient substudies,
using nested case-control and case-cohort study designs. Which of
the following best compares and contrasts nested case-control
studies and case-cohort studies? (3 pts)
A. Both nested case-control and case-cohort studies select
controls that are matched on time of case development but
only case-cohort studies allow for multiple comparisons with
different case groups.
B. Both nested case-control and case-cohort studies select
controls from the entire baseline cohort, but in case-cohort
studies the selection is done at random.
C. In case-cohort studies a single group of controls can be used
for comparison with several case groups.
D. In nested case-control studies, cases are selected entirely
from the non-exposed cohort group.
E. both C and D
3. Name the three component parts of any kind of incidence measure.
(3 pts)
4. Over a 10-year period the number of bicycle injury events in a
population increases even as the age-adjusted bicycle injury rate
decreases in the population. Describe two conditions that could
cause this outcome (assume the definition of a bicycle injury and
the quality of the data remain constant over the 10-year period). (3
pts)
5. Which of the following best describes the condition(s) that are
required for the odds ratio (OR) to estimate the risk ratio (RR) in a
case-control study? (choose one best answer) (3 pts)
A. Incident cases are identified for a defined population at risk.
B. The controls represent the base population that gave rise to
the cases.
C. The disease outcome is rare in the base population at risk.
D. All of the above.
6. The association between induced abortion and breast cancer has
been the subject of previous epidemiological studies. Cohort
studies have found no association, while at least one case-control
study has found a positive association. Possible explanations for the
different results in case-control and cohort studies of this topic
include (choose single best answer). (3pts)
A. Case-control studies are prone to selection bias, whereas
cohort studies are not vulnerable to selection bias.
B. Recall bias might explain the association observed in a
case-control study, but this would not be a problem in prospective
cohort studies.
C. The method of disease classification is different in case-control
and cohort studies.
D. All of the above
7. Swaen et al. (1998) conducted a study of 6,803 males who worked
for at least six months before 1/1/80 at one of nine chemical plants
in the Netherlands. The workers were followed for mortality from
1/1/56 until 1/1/96. Before 1/1/80, 2,842 of the workers were
occupationally exposed to acrylonitrile and the other 3,961 workers
were not exposed to acrylonitrile. After 1/1/80, there was no
exposure to acrylonitrile. To measure the association between
occupational exposure to acrylonitrile and several outcomes, the
investigators calculated standardized mortality ratios (SMRs) for
both the exposed and the unexposed workers. Age-interval-specific
person-years were generated for specific exposure groups and
were multiplied by the mortality rates for the total male population
of the Netherlands to generate expected numbers of cause-specific
deaths.
a. What study design did the investigators use? (2 pts)
b. What was the (crude) cumulative incidence ratio (CIR) for
mortality comparing the exposed to the unexposed men?
What are two reasons why this measure is problematic with
these data?
c. For brain cancer, the SMR for the exposed workers
(SMR=173.9) was more than twice the SMR for the
unexposed workers (SMR=85.7). Why are these two SMRs
not strictly comparable? (3 pts)
d. There were 290 deaths due to all causes among the exposed
group and 983 deaths due to all causes among the unexposed
group. What measure of effect could be calculated to strictly
compare all-cause mortality between the exposed and the
unexposed group? (2 pts)
8. The issue of classification of disease is fundamental to
epidemiological investigations. The degree to which we correctly
separate cases of disease from non-cases can be quantified in terms
of sensitivity and specificity. The issue of correct classification is
important in research involving cerebrovascular disease (stroke).
Generally speaking, there are two kinds of strokes: ischemic (blood
flow to brain tissue is restricted because of a blocked artery in or
leading to the brain) and hemorrhagic (a vessel in the brain
ruptures, causing bleeding in the brain). These two pathologic
processes are quite different.
Background information:
A panel of experts reviewed the medical records of 525 patients
discharged from the hospital with diagnosis codes indicative of a
stroke (ICD 430-438). The panel classified strokes as either
ischemic or not ischemic. Assume the diagnosis reached by the
panel is the most accurate classification possible. Of the 525 cases,
325 had a discharge diagnosis code for ischemic stroke (ICD code
434). Of these 325 patients, 85 were determined by the panel not
to be ischemic strokes. All but 20 of the patients with discharge
diagnosis codes other than 434 were determined by the panel to
have non-ischemic strokes.
Given the background information, compute the sensitivity,
specificity, and positive predictive value of a hospital discharge
code for ischemic stroke (ICD code 434) in classifying a patient as
truly having an ischemic stroke.
a. sensitivity of a 434 code: (2 pts)
b. specificity of a 434 code: (2 pts)
c. positive predictive value of a 434 code: (2 pts)
d. Constructing a receiver/response operating characteristic
(ROC) curve may be useful in understanding the implications
of using different case definitions. Briefly explain what an ROC
curve is and what information it provides. (2 pts)
e. If you were to use a 434 discharge code to identify a group of
cases with ischemic stroke and the sensitivity was 99% but the
specificity was 40%, which of the following would best
describe your resulting case group? (choose one best answer)
(2 pts)
A. The case group would be highly homogeneous with
respect to pathophysiology of stroke.
B. The case group would be highly heterogeneous
with respect to pathophysiology of stroke.
C. The case group would have many false negative
ischemic strokes.
D. The case group would represent the source
population of cases.
f. What two factors influence the positive predictive value
of a screening test in most situations? (2 pts)
9. Suppose that a study was conducted to compare the rates of
automobile collisions in two cities. The researchers were
impressed with studies that suggest that the use of cell
phones and pagers contributes to auto collisions. They wanted
to adjust (standardize) the rates of auto collisions in the two
cities for cell phone and pager use. Data on cell phone use
and auto collisions in the two cities were collected and are
presented in the table below.
Cell phone         Corona del Mar, California       Boulder, Colorado
and pager use      # persons  # accidents  Rate*    # persons  # accidents  Rate*
Heavy                  4,479          293                 100            2
Moderate                 974           27                 300            6
Never                  1,106           15               8,293          145
Total                  6,559          335               8,693          153
* per 1000 persons

a. Calculate the crude total and cell phone/pager use
specific rates for Corona del Mar and Boulder. How do
these two cities compare in crude prevalence of auto
accidents? (2 pts)
b. Using the combined number of persons in both areas as
a standard, calculate a standardized rate (standardized
for cell phone/pager use) for each of the cities. Use the
direct standardization method. Briefly describe how
these standardized rates compare with each other and
with the crude rates. Briefly describe any meaningful
differences. (4 pts)
c. In general, which of the following best describes a
major weakness of both crude and adjusted rates? (2
pts)
A. Both measures hide or obscure the heterogeneity in
the population.
B. Both measures are only estimates of the true
population rate.
C. Neither measure can be used to determine the
magnitude of disease burden in the population.
D. None of the above.
10. In a community intervention study, like the Minnesota
Heart Health Program, the effectiveness of an educational
intervention program was evaluated. Which of the following
best describes the unit of assignment, the unit of observation,
and the unit of analysis in these types of studies (in this
order)? (2 pts)
A. group, person, group
B. person, group, group
C. group, group, group
D. none of the above
11. Indicate next to each statement below whether you
consider it to be TRUE, FALSE, or if you are NOT SURE. A
correct answer receives 2 points, an incorrect one zero.
a. An advantage of cohort designs compared to the pure
case control designs is that cohort studies can directly
estimate risks.
b. The temporal sequence of exposure and disease can be
directly addressed in a cohort design as well as in a
case control study.
c. A disadvantage of the cohort design compared to a case
control study design is that in a cohort study one
cannot address multiple outcomes.
d. As described in class, a randomized clinical trial is an
example of a prospective dynamic cohort study.
e. A disadvantage of the cohort design compared to a case
control study is that in a cohort study one needs to
follow a large number of participants if the disease is
rare.
f. Ecological studies cannot directly assess causal
inference because they measure disease and exposure
in a person at the same point in time.
g. Correlation studies can be quick, inexpensive, and
allow for multinational comparisons.
h. A case report is a type of descriptive study that is
commonly conducted, partially because an appropriate
control group is easily defined.
i. Cross-sectional studies are limited by their lack of
generalizability, but are powerful in that they directly
measure risk.
j. The study of person, place, and time helps to
understand the natural history of a disease.
k. A risk difference is determined by the absolute
difference in two incidence rates, whereas the relative
difference is considered an attributable risk.
l. A correlation coefficient measures the degree of linear
or monotonic relationship between two variables and is
therefore suitable for determining the epidemiologic
strength of association between them.
m. As an estimate of a relative risk, an odds ratio is a
measure of association that can be used to determine
the magnitude of an association between exposure and
an outcome.
n. An attributable risk proportion is a measure of the
impact assessing how much risk results from exposure
levels. An attributable risk that is adjusted for the prevalence
of the causal factor in a population is called a
population attributable risk.
o. Case control studies have several crucial advantages
that relate to their efficiency for studying rare
conditions and those with prolonged induction periods, and their
efficiency in examining many exposures and outcomes.
p. Incidence density is a proportion where the units of
time are specified.
q. The decision to use an incidence density measure or a
cumulative incidence as a measure of the strength of
association may depend on the objectives of the study.
Cumulative incidence is preferred if estimating
individual risk is the main objective.
r. A standardized mortality ratio (SMR) can be
determined using indirect adjustment. Because rates
from a standard population are used, SMRs from two
study populations can be compared as long as the rates
in the standard population are stable.
s. Comparability between cases and controls is an
important step in constructing a case-control study. It
should be possible to detect exposure in controls to the
same extent as in cases. It is also critical that controls
have similar motivation and availability as cases. These
two conditions are best met when controls are selected
from the general population.
12. Attributable measures are used by researchers to
assess the public health impact of a detrimental exposure,
assuming causality. Given data from a cohort study on the
incidence of stroke (see below), estimate the attributable risk
proportion among the exposed (physically inactive). Explain
your answer in one sentence. Assume that physical activity is
causally related to stroke risk.
Physical          Did develop   Do not develop   Person-years   Incidence per
activity level    a stroke      a stroke         (PY)           1,000 PY
ACTIVE                 45            5,955           43,200
INACTIVE              135           13,865          100,800
Total                 180           19,820          144,000
a. Attributable risk proportion (INACTIVITY) (3 pts)
Explain:
b. Additional data from the National Health and Nutrition
Examination Survey (NHANES) suggest the prevalence of a
physically active lifestyle (at least 30 minutes of moderate
activity 3 days per week) is 27%. Using this information and
your answer to part (a), estimate what we can hope to
accomplish with programs to get people to be physically
active in the total population. In one sentence explain your
answer. (3 pts)
Explain:
13. Suppose that in 1998 researchers hypothesized that
communication ability and skill in young adulthood were
related to Alzheimer's disease. To test this they evaluated
handwritten essays completed by a group of 350 nuns joining
a single religious sect in 1930. By careful review of these
writing samples, the researchers categorized all 350 as either
having a high error profile (N=150) or a low error profile
(N=200). Using surveillance of death certificates and other
methods, the researchers verified the vital status of each nun
through 1998. An accounting of all deaths produced the table
below.
Cause of Death and Year by Handwriting Profile Status

High error profile
Cause of death         # of deaths   Year of death
Alzheimer's disease          2           1980
Alzheimer's disease          5           1985
Alzheimer's disease          6           1990
Alzheimer's disease          5           1995
Heart disease               10           1980
Heart disease               15           1995
Other                       25           1960
Other                       30           1970

Low error profile
Cause of death         # of deaths   Year of death
Alzheimer's disease          1           1985
Alzheimer's disease          3           1990
Alzheimer's disease          4           1995
Heart disease                8           1980
Heart disease               10           1990
Other                       20           1960
Other                       10           1970
a. Describe the type of study design used in this example. (2 pts)
b. Compute the incidence density rate of Alzheimer's disease
death for those with a high error profile and for those with a
low error profile. (3 pts) Show your work.
c. Compute the incidence density ratio for the risk of
Alzheimer's disease death associated with a high error
communication profile. Explain, in two sentences or fewer,
what this value means. (3 pts)
d. Using data from this study compute an odds ratio for the
association of a high error communication profile with death
from Alzheimer's disease. Show a clearly labeled 2x2 table. (2
pts)
e. Compare the odds ratio with the incidence density ratio
computed in part c and explain why they are similar or
different.
Answer Guide
1. a. Manifestational criteria: disease definition and classification
based on observable characteristics, such as symptoms, signs,
history, laboratory findings, response to treatment, prognosis.
Causal criteria: disease definition and classification based on the
cause of the condition.
b. Manifestational criteria: examples are cancers, arthritis,
cholecystitis, schizophrenia, depression, addiction, insomnia, . . .
Causal criteria: microbial diseases for which the pathogen has
been identified (syphilis, TB, malaria, yellow fever, influenza, etc.),
lead poisoning, birth trauma, . . .
2. (C) Other choices are incorrect because controls in case-cohort
studies are not matched to cases (A), controls are selected at
random in both designs (B), and cases must be selected without
regard to exposure (D).
3. New cases or events, population at risk or source population,
passage of time
4. The size of the population may have grown (the number of injuries
increases even though the rate does not); the age distribution of the
population may have changed (e.g., influx of families with small
children, outmigration of families with older children), so that the
age-standardized rate may not change but a greater proportion of the
population may be in the higher-risk age range (assuming that
younger children have higher injury rates).
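Both conditions can be checked numerically. The sketch below is a minimal Python illustration that assumes invented two-stratum populations (children and adults, with children given the higher injury rate); none of these numbers come from the exam. The event count is persons x rate summed over strata, while the age-adjusted rate is a direct standardization to a fixed, hypothetical standard (the year-0 age structure). `injury_count` and `age_adjusted_rate` are illustrative helper names.

```python
# All populations and rates below are hypothetical, chosen only for illustration.
# Strata are [children, adults]; rates are injuries per 1,000 persons per year.

def injury_count(pop, rates_per_1000):
    """Number of injury events: sum over strata of persons x rate."""
    return sum(n * r / 1000 for n, r in zip(pop, rates_per_1000))

def age_adjusted_rate(rates_per_1000, std_pop):
    """Directly age-standardized rate per 1,000, using a fixed standard population."""
    return sum(n * r for n, r in zip(std_pop, rates_per_1000)) / sum(std_pop)

year0_pop, year0_rates = [2_000, 8_000], [30.0, 10.0]
year10_rates = [27.0, 9.0]  # every age-specific rate falls by 10%
standard = year0_pop        # hypothetical standard: the year-0 age structure

# Condition 1: same age structure, but the population doubles.
grown_pop = [4_000, 16_000]
# Condition 2: same total size, but a shift toward the higher-risk (younger) stratum.
shifted_pop = [5_000, 5_000]

print(injury_count(year0_pop, year0_rates))       # 140.0
print(injury_count(grown_pop, year10_rates))      # 252.0
print(injury_count(shifted_pop, year10_rates))    # 180.0
print(age_adjusted_rate(year0_rates, standard))   # 14.0
print(age_adjusted_rate(year10_rates, standard))  # 12.6
```

In both scenarios the number of events rises while the age-adjusted rate falls from 14.0 to 12.6 per 1,000, because standardization removes the effect of population size and age structure.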
5. (D) All of the above. Use of prevalent cases requires that duration
is not related to exposure; controls should provide an estimate of
exposure in the study base; and the rare disease assumption is required
for the OR to estimate the RR (though not for the OR to estimate the IDR).
6. (B) In a prospective cohort study, information on exposure is
obtained before the outcome (breast cancer, in this case) has
occurred. Therefore recall bias (different recall by cases and
noncases) is not an issue. In a case-control study, cases and
noncases may recall and report exposure with different degrees of
accuracy.
7. a. A (retrospective) cohort study.
b. CIR = (290/2,842) / (983/3,961) = 0.411
A cumulative measure ignores possible differences in length of
follow-up between groups being compared. A crude measure
ignores possible differences in the age distributions between men
who have been exposed and men who have not.
c. SMRs are an indirect method of standardization, since they are
based on weighted averages for which the weights are taken from
the population whose SMR is being computed rather than from a
"standard" population. Unless the age (and in this case, age-calendar
year interval) distributions for the populations whose SMRs are
being computed are the same, the weighted averages that make up
the SMRs are based on different sets of weights and are not strictly
comparable. Since the age-interval distributions of exposed and
unexposed workers may differ, their SMRs are not strictly comparable.
d. Mortality rates computed with person-time denominators can be
compared between exposed and unexposed person-time. These will
take into account the varying amounts of follow-up for workers in
different categories. Unless the person-years at risk for exposed
and unexposed workers have the same age distribution, which we
do not know, then adjustment for age is needed. Since there are
ample numbers of deaths from any cause, mortality rates can be
directly-standardized using any reasonable set of weights. Since
directly-standardized rates are "strictly comparable", a ratio or
difference of directly standardized rates would be a suitable
measure of association.
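The arithmetic for part (b), and the reasoning behind part (c), can be sketched in Python. The CIR uses only the counts given in the question. The SMR portion is a hypothetical illustration with invented numbers: two groups share identical stratum-specific death rates, yet their SMRs differ because each SMR weights the strata by that group's own person-years. The `smr` helper is an illustrative name, not from the source.

```python
# Part (b): crude cumulative incidence ratio from the counts given in the question.
deaths_exposed, n_exposed = 290, 2_842
deaths_unexposed, n_unexposed = 983, 3_961

cir = (deaths_exposed / n_exposed) / (deaths_unexposed / n_unexposed)
print(round(cir, 3))  # 0.411

# Part (c), hypothetical illustration: every number below is invented.
# Both groups have the SAME stratum-specific death rates, but the rate ratio
# versus the standard is 2 in the younger stratum and 1 in the older stratum.
standard_rates = [0.001, 0.010]  # standard population rates: [younger, older]
study_rates = [0.002, 0.010]     # identical in the two study groups

def smr(person_years, rates, std_rates):
    """Observed deaths divided by deaths expected at the standard rates."""
    observed = sum(py * r for py, r in zip(person_years, rates))
    expected = sum(py * s for py, s in zip(person_years, std_rates))
    return observed / expected

smr_mostly_young = smr([9_000, 1_000], study_rates, standard_rates)  # 28 / 19
smr_mostly_old = smr([1_000, 9_000], study_rates, standard_rates)    # 92 / 91
print(round(smr_mostly_young, 2), round(smr_mostly_old, 2))  # 1.47 1.01
```

If the stratum-specific rate ratios were the same in every stratum, both SMRs would equal that common ratio regardless of the weights; the comparability problem arises when the ratios differ across strata.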
8. All but 85 of the 325 code 434's were correct classifications, so
there were 240 (= 325 - 85) ischemic stroke patients correctly
classified by discharge code. All but 20 of the 200 patients without
code 434 were judged not to have had an ischemic stroke, meaning that
20 were judged to have had an ischemic stroke. Thus, there were 260
(= 240 + 20) ischemic stroke patients, of whom 240 were identified by
discharge code (sensitivity = 240/260). The remaining 265 (= 525 - 260)
patients did not have an ischemic stroke, and 180 of them were in fact
not given a code 434 (specificity = 180/265). Of the 325 code 434's,
240 had had an ischemic stroke (PPV = 240/325). These data are
summarized in the following table:
Comparison of discharge code 434 and classification by expert panel

                       Expert panel
Discharge code     Ischemic   Not ischemic   Total
Code 434              240          85          325
Other                  20         180          200
Total                 260         265          525
a. Sensitivity = (325 - 85) / (325 - 85 + 20) = 240 / 260 = 92.3%
b. Specificity = (200-20) / (525-260) = 180 / 265 = 68%
c. Positive predictive value of a 434 code = (325-85) / 325 = 73.8%
d. An ROC curve plots sensitivity (the true-positive rate) against
1 - specificity (the false-positive rate) for each case definition or cutpoint. Examining the ROC curve shows
the trade-off between sensitivity and specificity that is available for
the diagnostic test or measurement method. [The area between the
identity diagonal (slope = 1.0) and the ROC curve serves as a
measure of accuracy that takes into account both sensitivity and
specificity, with the assumption that the costs of false negatives and
false positives are the same.]
e. (B) Due to the low specificity (40%), 60% of non-ischemic (e.g.,
hemorrhagic) strokes in the patient group will be classified as
ischemic strokes, making the group heterogeneous in pathophysiology.
f. Specificity and prevalence of the condition
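The calculations in parts (a) through (c), and the point in part (f) that PPV depends on prevalence, can be sketched in Python. The 2x2 counts are those of the expert-panel comparison; the prevalences 10% and 90% are hypothetical case mixes added only to show the trend. `ppv_from` is an illustrative helper applying Bayes' theorem.

```python
# Counts from the expert-panel comparison (panel classification treated as truth).
tp, fp = 240, 85   # code 434: ischemic, not ischemic
fn, tn = 20, 180   # other codes: ischemic, not ischemic

sensitivity = tp / (tp + fn)  # 240 / 260
specificity = tn / (tn + fp)  # 180 / 265
ppv = tp / (tp + fp)          # 240 / 325
print(f"{sensitivity:.1%} {specificity:.1%} {ppv:.1%}")  # 92.3% 67.9% 73.8%

def ppv_from(sens, spec, prevalence):
    """PPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same code accuracy applied to hypothetical case mixes; 260/525 is the observed one.
for prev in (0.10, 260 / 525, 0.90):
    print(f"prevalence {prev:.0%}: PPV {ppv_from(sensitivity, specificity, prev):.1%}")
```

At the observed prevalence (260/525) the formula reproduces the directly counted PPV of 240/325; with sensitivity and specificity held fixed, PPV rises as prevalence rises.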
9. a. Corona del Mar has a crude accident rate 2.9 times that of
Boulder.
Corona del Mar = 335/6,559 = 51.1/1000 and Boulder = 153/8,693 = 17.6/1000. Ratio = 2.9
Use-specific rates (per 1000): Corona del Mar heavy 65.4, moderate 27.7,
never 13.6; Boulder heavy 20.0, moderate 20.0, never 17.5.
b. Adjusted rates
Corona del Mar: [(4579 x .0654) + (1274 x .0277) + (9399 x .0136)] / 15,252
= 30.3/1000
Boulder: [(4579 x .0200) + (1274 x .0200) + (9399 x .0175)] / 15,252
= 18.5/1000
The cell phone/pager adjusted auto accident rate for Corona del
Mar was 1.6 times that of Boulder. A portion of the difference seen
in the crude rates was due to differences in the distribution of use
of cell phones and pagers between the two cities.
The standard weights are the combined population sizes of the two
cities in each cell phone/pager stratum. The weighted rates are the
rates for each city, weighted (multiplied) by the standard weights.
The total of the weighted rates, divided by the total standard
population (15,252), is the directly standardized rate. A problem in
using the directly standardized rates is that there are small numbers
of cellular phone and pager users in Boulder.
The higher crude rate in Corona del Mar reflects the much higher
use of cellular phones and pagers, which is associated with a much
higher accident rate. The difference is reduced for the standardized
rates, since these control for the different distributions of cellular
phones and pagers between the two cities. However, this is a
situation where it is essential to examine the specific rates, since
Boulder has lower accident rates among cellular phone and pager
users but a higher rate among never-users.
Since the rates in never users are quite similar, Corona del Mar is
likely to make its greatest impact on accident rates by getting
motorists to reduce cellular phone and pager use while driving or
finding some way to make such use safer (promote the use of
"designated drivers"!?).
c. (A) Both measures obscure heterogeneity (variation) in rates
across subgroups.
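The crude, use-specific and directly standardized rates can be reproduced with a short Python sketch. One assumption is flagged in the code: Boulder's heavy and moderate accident counts (2 and 6) are reconstructed from the Boulder total (153 - 145 = 8) and the answer key's stratum rates of 20 per 1,000. The code uses unrounded stratum-specific rates, so its results may differ in the last digit from hand calculations that use rounded rates.

```python
# (persons, accidents) by cell phone/pager use, from the question 9 table.
# ASSUMPTION: Boulder's heavy/moderate accident counts (2 and 6) are reconstructed
# from the Boulder total (153 - 145 = 8) and the answer key's rates of 20 per 1,000.
STRATA = ("Heavy", "Moderate", "Never")
corona = {"Heavy": (4_479, 293), "Moderate": (974, 27), "Never": (1_106, 15)}
boulder = {"Heavy": (100, 2), "Moderate": (300, 6), "Never": (8_293, 145)}

def per_1000(accidents, persons):
    return 1000 * accidents / persons

def crude_rate(city):
    persons = sum(p for p, _ in city.values())
    accidents = sum(a for _, a in city.values())
    return per_1000(accidents, persons)

# Standard population: persons from both cities combined, stratum by stratum.
standard = {s: corona[s][0] + boulder[s][0] for s in STRATA}  # 4,579 / 1,274 / 9,399

def direct_std_rate(city):
    """Sum of (standard persons x stratum rate) divided by the standard total."""
    weighted = sum(standard[s] * per_1000(city[s][1], city[s][0]) for s in STRATA)
    return weighted / sum(standard.values())

for name, city in (("Corona del Mar", corona), ("Boulder", boulder)):
    specific = {s: round(per_1000(city[s][1], city[s][0]), 1) for s in STRATA}
    print(name, specific, f"crude={crude_rate(city):.2f}", f"standardized={direct_std_rate(city):.2f}")
```

The ratio of the two standardized rates comes out at about 1.6, compared with 2.9 for the crude rates.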
10. (A) Community intervention trials of this type assign groups
to treatments and collect measurements from individuals. The unit
of analysis must be the same as the unit of assignment (GROUP), or
the analysis must account for both levels (e.g., using mixed models).
11.
a. T: a cohort study enrolls people who are free of the
outcome and monitors them for the development of the outcome, so
the cohort design can be used to estimate risk of the event;
b. Not sure: the temporal sequence of exposure and disease can
typically not be addressed in a case-control study, though in some
cases it can (e.g., a genetic characteristic or other "exposure" that
can be definitively assigned to a time prior to disease onset);
c. F: a cohort design can readily be used to study multiple
outcomes; a case-control design can readily be used to study
multiple exposures;
d. T: a randomized clinical trial often enrolls participants over a
period of time, with follow-up time measured from the time of
randomization;
e. T: a cohort study begins with disease-free subjects and monitors
them for development of the outcome; if the outcome is rare, many
subjects must be followed to obtain an adequate number of cases;
f. F: ecological studies use group-level variables (e.g., per capita
meat consumption) and relate them to disease rates; direct
assessment at the individual level is NOT made, which is the basis
for the ecological fallacy (where the group data are used to infer a
link at the individual level);
g. T: correlational studies (another term for ecological studies) are
often used to compare disease rates across geopolitical entities
using available data;
h. F: a case report does not involve a control group;
i. F: cross-sectional studies measure prevalence, not risk (of a
future event); they are the most statistically generalizable type of
study when, as is often the case, the study population is obtained
through population sampling;
j. F: the natural history of a disease is the process by which it
develops over time; descriptive information relating to person,
place, and time can at best provide only indirect information;
k. F: as used in class, the term "attributable risk" refers to the risk
difference;
l. F: strength of association as used in epidemiology refers to the
degree of change in the one variable with respect to changes in the
other variable; two variables can be very strongly correlated (vary
linearly or monotonically) yet a large change in one may be
associated with only a small change in the other (e.g., a straight
line with a modest slope has a high correlation but a small degree
of change in the ordinate variable for a given change in the
variable on the abscissa);
m. T: for a rare outcome, the odds ratio (OR) closely approximates
the cumulative incidence ratio (CIR) and incidence density ratio
(IDR), so it indicates strength of association in the epidemiologic
sense; when the outcome is not rare, the OR does not approximate
but does vary with the CIR and IDR, so the OR still gives an
indication of strength of association;
n. T: an attributable risk proportion estimates the proportion of
risk that is associated with an exposure in people who are exposed;
attributable risk (as used in this course) is the risk difference,
which indicates the amount of risk associated with an exposure in
people who are exposed; attributable risk must be adjusted for the
prevalence of the exposure in order to estimate the amount of risk
associated with exposure in the population as a whole;
o. F: since case-control studies begin with people who are already
cases, they avoid having to study a large number of people for a
long time in order to accumulate enough cases; they can also
compare cases and controls with respect to many exposures;
HOWEVER, they cannot readily study many outcomes, since to do
so requires enrolling cases for each of the outcomes to be studied
(i.e., equivalent to conducting several case-control studies that
share the same control group);
p. F: incidence density is a (relative) rate; cumulative incidence is
a proportion;
q. F: incidence density and cumulative incidence are measures of
frequency of occurrence, not of strength of association;
r. F: comparability of standardized rates and ratios across study
populations requires that the standardized measures be
constructed using the same set of weights; indirect standardization
(e.g., via an SMR) employs the weights (the number of people in
each stratum) from the study population, so measures standardized
using this method are, strictly speaking, useful only for comparing
a study population with the standard population used in the
standardization;
s. F: typically, general population controls will be less motivated
than cases, and sources of medical information for them will not be
comparable to those for cases.
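Statement m's point, that the odds ratio closely tracks the cumulative incidence ratio only for a rare outcome, can be illustrated with a small hypothetical closed cohort (all counts invented). `cir_and_or` is an illustrative helper name.

```python
# Hypothetical closed cohort (all counts invented): 1,000 exposed and 1,000 unexposed.
def cir_and_or(cases_exposed, cases_unexposed, n_exposed=1_000, n_unexposed=1_000):
    """Return (cumulative incidence ratio, odds ratio) for a two-group cohort."""
    risk1 = cases_exposed / n_exposed
    risk0 = cases_unexposed / n_unexposed
    cir = risk1 / risk0
    odds_ratio = (risk1 / (1 - risk1)) / (risk0 / (1 - risk0))
    return cir, odds_ratio

rare = cir_and_or(20, 10)      # 2% vs 1% risk
common = cir_and_or(400, 200)  # 40% vs 20% risk
print([round(x, 2) for x in rare])    # [2.0, 2.02]
print([round(x, 2) for x in common])  # [2.0, 2.67]
```

With the same CIR of 2.0, the OR is nearly identical when the outcome is rare but noticeably larger when it is common, while still moving in the same direction as the CIR.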
12.
a. ARP = (I1 - I0) / I1 = (RR - 1) / RR = (1.34 - 1.04) / 1.34 =
0.30 / 1.34 = 22% (after rounding), where I1 = 135/100,800 PY =
1.34 and I0 = 45/43,200 PY = 1.04 strokes per 1,000 PY.
The "I can't remember formulas" method:
ARP = attributable cases / all exposed cases = attributable
cases / 135
Attributable cases = attributable risk x exposed PY
= (1.34 - 1.04) per 1,000 PY x 100,800 PY = 30.24
ARP = 30/135 = 22% (after rounding)
Interpretation: Based on these data, 22% (about one in five) strokes
in people who are physically inactive can be attributed to their
physical inactivity; in other words, if physically inactive people
became active early enough in their lives, their stroke incidence
would decrease by 22%
b. A key point here is that 27% is the prevalence of physically
active people, whereas the exposure is physical inactivity, whose
prevalence is therefore 100% - 27% = 73%.
With RR = (135/100,800) / (45/43,200) = 1.286:
PARP = p1(RR - 1) / [1 + p1(RR - 1)] = 0.73(1.286 - 1) / [1 + 0.73(1.286 - 1)]
= (0.73 x 0.286) / (1 + 0.73 x 0.286) = 0.209 / 1.209 = 17%
(The formula PARP = (I - I0) / I can also be used by first estimating
the crude population incidence, I, as a weighted average of the
incidences in exposed and unexposed, weighting by the prevalence
of exposure, e.g.: I = (0.73)(1.34) + (0.27)(1.04) = 1.259, so PARP =
(1.259 - 1.04) / 1.259 = 17%.)
The "I can't remember formulas" method:
PARP = attributable cases / all cases
Since we do not know the population size, represent it by n person-years
(e.g., n people followed for one year). Based on the NHANES data, 27% of
people are physically active, so there are 0.73n physically inactive and
0.27n physically active person-years. With incidences per 1,000 PY:
Attributable cases = (1.34 - 1.04)(0.73n)/1,000 = 0.219n/1,000
All cases = exposed cases + unexposed cases
= [0.73(1.34) + 0.27(1.04)]n/1,000 = 1.259n/1,000
Therefore, PARP = 0.219/1.259 = 17%
Note that these measures can be computed more precisely by using
the original number of cases and person-years and not rounding
intermediate results, but two significant figures is adequate for the
actual result, and in this case the answer does not change.
Explanation: Seventeen percent of all strokes in the population are
attributable to physical inactivity; if everyone were physically
active, there would be 17% fewer strokes.
c. Attributable risk measures assume that the relationship is causal
(i.e., that physical inactivity does in fact cause an increase in stroke
risk). Some of the above interpretations may also require that the
process be reversible, so that changing to a physically active
lifestyle brings risk down to the level of someone who was not
inactive. Another assumption is that the rates and rate ratio
observed in the cohort study hold for the entire population. Also,
we have ignored the effects of other factors, most notably age.
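The ARP and PARP calculations can be sketched in Python from the cohort counts. Like the answer above, this assumes the 27% NHANES prevalence of physical activity applies to the population of interest and ignores age and other factors; it also checks that the two PARP formulas agree.

```python
# Cohort data from question 12 (strokes and person-years by activity level).
cases_inactive, py_inactive = 135, 100_800
cases_active, py_active = 45, 43_200

i1 = 1000 * cases_inactive / py_inactive  # incidence per 1,000 PY, exposed (inactive)
i0 = 1000 * cases_active / py_active      # incidence per 1,000 PY, unexposed (active)
rr = i1 / i0

arp = (i1 - i0) / i1                      # equivalently (RR - 1) / RR
p1 = 1 - 0.27                             # prevalence of inactivity (NHANES: 27% active)
parp = p1 * (rr - 1) / (1 + p1 * (rr - 1))

# Alternative route: crude population incidence as a prevalence-weighted average.
pop_incidence = p1 * i1 + (1 - p1) * i0
parp_alt = (pop_incidence - i0) / pop_incidence

print(f"I1={i1:.2f} I0={i0:.2f} RR={rr:.3f} ARP={arp:.0%} PARP={parp:.0%}")
# I1=1.34 I0=1.04 RR=1.286 ARP=22% PARP=17%
```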
13.
a. This is a retrospective cohort study (researchers developed
the hypothesis in 1998).
b. High error profile: (2 + 5 + 6 + 5) / 8,021 = 2.24 per 1,000
women-years (WY).
Low error profile: (1 + 3 + 4) / 12,287 = 0.651 per 1,000 WY
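Women-years (WY) for the high error profile are computed as follows
(the low error profile is computed the same way):

Status               Start   End    Years   Women      WY
Alzheimer's disease  1930    1980     50       2       100
Alzheimer's disease  1930    1985     55       5       275
Alzheimer's disease  1930    1990     60       6       360
Alzheimer's disease  1930    1995     65       5       325
Heart disease        1930    1980     50      10       500
Heart disease        1930    1995     65      15       975
Other                1930    1960     30      25       750
Other                1930    1970     40      30     1,200
Alive                1930    1998     68      52     3,536
Totals                                       150     8,021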
c. IDR = ID high / ID low = 2.24 / 0.651 = 3.4. Nuns with a high
error communication profile died from Alzheimer's disease at 3.4
times the rate (per woman-year of follow-up) of nuns with a low error
profile.
d.
                      Alzheimer's disease death
Handwriting profile     AD yes     AD no
High error                18        132
Low error                  8        192
Odds ratio = [(18)(192)] / [(8)(132)] = 3.27
e. The two are similar because the condition is fairly rare.
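The women-years, incidence densities, IDR and OR for question 13 can be reproduced in Python. As in the answer key, follow-up is assumed to start in 1930 and to end at the year of death, or in 1998 for survivors. One value is an assumption: the low error profile's 1980 heart disease count (8) is the value consistent with the answer key's 12,287 women-years. The "other" lists lump heart disease and other non-Alzheimer deaths together.

```python
START = 1930
END_OF_FOLLOW_UP = 1998

# (number of women, year follow-up ended). "other" = all non-Alzheimer deaths.
# ASSUMPTION: the low-profile 1980 heart disease count (8) is reconstructed so that
# total low-profile person-time matches the answer key's 12,287 women-years.
high = {"ad": [(2, 1980), (5, 1985), (6, 1990), (5, 1995)],
        "other": [(10, 1980), (15, 1995), (25, 1960), (30, 1970)], "n": 150}
low = {"ad": [(1, 1985), (3, 1990), (4, 1995)],
       "other": [(8, 1980), (10, 1990), (20, 1960), (10, 1970)], "n": 200}

def ad_deaths(group):
    return sum(n for n, _ in group["ad"])

def women_years(group):
    """Person-time from 1930 to death, with survivors followed to 1998."""
    rows = group["ad"] + group["other"]
    survivors = group["n"] - sum(n for n, _ in rows)
    rows = rows + [(survivors, END_OF_FOLLOW_UP)]
    return sum(n * (end - START) for n, end in rows)

wy_high, wy_low = women_years(high), women_years(low)
id_high = 1000 * ad_deaths(high) / wy_high  # per 1,000 women-years
id_low = 1000 * ad_deaths(low) / wy_low
idr = id_high / id_low

# 2x2 table: AD death yes / no by handwriting profile.
a, b = ad_deaths(high), high["n"] - ad_deaths(high)  # 18, 132
c, d = ad_deaths(low), low["n"] - ad_deaths(low)     # 8, 192
odds_ratio = (a * d) / (b * c)

print(wy_high, wy_low, round(id_high, 2), round(id_low, 3), round(idr, 2), round(odds_ratio, 2))
# 8021 12287 2.24 0.651 3.45 3.27
```

Rounded to one decimal, the IDR is the 3.4 reported in part c; the OR (3.27) is close to it because a minority of each group died of Alzheimer's disease.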
Instructions:
- Write the last 4 digits of your ID number in the space provided on each page (top
right).
- Write clearly and legibly; avoid writing on the back of these pages.
- Show all your work and include units where appropriate.
- Write all answers and computations on these pages.
1. Which of the following best describes the retrospective design in which subjects are
sampled by disease status and which is often used when the investigator is interested in rare
diseases? (4 pts)
A. intervention trial
B. case control study
C. retrospective cohort
D. ecologic study
E. none of the above
2. Which of the following best describes the study design that can be either retrospective
or prospective and is often used when the investigators are interested in rare
exposures? (4 pts)
A. intervention trials
B. cohort studies
C. prevalence studies
D. case control study
3. The strength of an association is one of the criteria for evaluating the cause and effect
relationship between an exposure and outcome. Which of the following is a measure
of the strength of association? (Choose one best answer). (4 pts)
A. incidence rate among the exposed
B. cumulative incidence among the exposed
C. the ratio of odds of exposure among cases to the odds of exposure among the
non-cases
D. odds of disease among exposed relative to the prevalence of exposure in the
source population
4. Incidence rates of a disease are often referred to as direct measures of risk. Can
incidence rates be calculated from case-control studies? Briefly explain in 1-2
sentences why they can or cannot be calculated. (4 pts)
5. For each of the following epidemiological measures, indicate whether it is a rate, a proportion, or neither a rate nor a proportion. Circle the best answer. (1 pt each)
a. Population attributable risk     RATE    PROPORTION    NEITHER
b. Incidence density (ID)           RATE    PROPORTION    NEITHER
c. Prevalence                       RATE    PROPORTION    NEITHER
d. Relative risk                    RATE    PROPORTION    NEITHER
6. Indicate true or false next to each of the following. (2 pts each)
____ ____ a. A "J" or "U" shaped relationship of a continuous risk factor and continuous measure of disease suggests a Pearson product-moment correlation coefficient of near plus one or minus one.
____ ____ b. A risk ratio measure and a correlation coefficient are both measures of association.
____ ____ c. A population attributable risk proportion depends on the prevalence of exposure and is not directly related to the strength of an association.
____ ____ d. The study base for a case-control study consists of those people who if they developed the disease could have been counted as cases.
____ ____ e. The Bradford Hill criterion "coherence" means that the association has been observed repeatedly in different places, by different observers, and at different times.
____ ____ f. If an exposure is a cause of a disease, then "temporality" is the Bradford Hill criterion for causal inference that must hold true between exposure and disease.
7. The death rates from various conditions are often compared across geographic areas.
These comparisons are usually based on directly age-standardized mortality rates.
Which of the following best describes what is meant by an age-standardized rate
created by the direct method? (Choose one best answer). (4 pts)
A. The number of events in each age stratum of a standard population is used to
create a weighted average rate.
B. The event rates in each age stratum in the standard population are used to
create a weighted average rate.
C. The event rates in the geographic area of interest are applied to the age-stratum sizes of a standard population to create a rate that is a weighted
average.
D. The event rates in the geographic area of interest are compared to the event
rates of a standard population to create a summary rate that is a weighted
average.
8. In order to estimate counts and rates of work-related fatalities, the National Traumatic
Occupational Fatality system has introduced a tick-box on the death certificate to
indicate "injury at work." Kraus et al. (Am J Epidemiol 1995; 141: 973-9) attempted
to validate this "injury at work" classification system against a gold standard
[International Classification of Diseases (ICD) death certificate codes designating
deaths that occurred during work-related activities]. After reviewing a sample of
100,000 death certificates, the authors reported the following: 1,195 true positives;
788 false positives; 97,672 true negatives; 345 false negatives. ("positive" indicates
that the tick-box was checked; "negative" indicates that it was not checked; "true"
indicates agreement between the tick-box and the ICD code).
a. Using the counts provided above, complete the 2x2 table below: (2 pts)

                                 ICD classification
Death certificate       Work-related    Not work-related    TOTAL
Work-related
Not work-related
TOTAL
b. What are the sensitivity and specificity of the "injury at work" classification
system? (4 pts)
c. What is the positive predictive value? In your own words, how would you interpret this value? (3 pts)
d. Based on these data, is the death certificate "injury at work" classification system likely to underestimate or overestimate the true number of work-related fatal injuries? (2 pts)
e. The use of data from the "tick-box" on the death certificates to track work-related mortality trends is an example of which kind of surveillance system? (choose one best answer). (4 pts)
A. Active surveillance
B. Passive surveillance
C. Retrospective cohort surveillance
D. Cross-sectional survey surveillance
f. The sensitivity and specificity computed above are
quantitative measures of which of the following aspects of
death certificate classification of work-related fatalities?
(choose one best answer). (4 pts)
A. Reliability of death certificate classification
B. Repeatability of death certificate classification
C. Validity of death certificate classification
D. Attributable risk of work-related classification on death
certificates
E. None of the above
9. Age-related maculopathy is a leading cause of blindness among
people 65 and older in the United States, and is estimated to affect
between 16 and 26% of people in this age group. In a recent study
by Klein, residents aged 43 to 86 years in the town of Beaver Dam,
Wisconsin were asked to participate in a study to determine
whether cigarette smoking was related to age-related maculopathy.
At a baseline examination, participants were asked to report their
lifetime smoking habits. After 5 years, participants had an
examination to determine whether they had developed age-related
maculopathy. The following table presents the number of cases of
age-related maculopathy measured at the follow-up examination
among the 1,232 male participants ages 43-86 who did not have age-related maculopathy (ARM) at the baseline examination:

Smoking status    Number of participants    Cases of ARM
Never smokers     368                       26
Ever smokers      864                       79
a. Which of the following best describes the research design used in this study? (choose one best answer) (3 pts)
A. Population based cross-sectional study
B. Case cohort study
C. Nested case control study
D. Prospective cohort study
b. Create a 2 x 2 table where one axis is smoking status and the
other is age-related maculopathy status. (4 pts)
c. Calculate the 5-year cumulative incidence of age-related maculopathy in ever smokers, and in never smokers. Show your work. (4 pts)
d. Calculate the cumulative incidence ratio comparing the incidence of age-related maculopathy in ever smokers with that in never smokers. Show your work. (4 pts)
e. Assuming causality, what is the proportion of cases of age-related maculopathy that could have been prevented in the
population of males ages 43-86 in Beaver Dam if the smokers
had never smoked? Show your work. (4 pts)
10.
The following data come from a national survey of the
occurrence of back pain. A case of low back pain was defined as
having at least one episode of severe back pain occurring over a
period of 6 months. The number of cases was obtained from
surveys of different occupation groups as well as a national random
sample.

         Cell phone manufacturing     Textile manufacturing      National random sample
Age      Persons   Cases   Rate       Persons   Cases   Rate     Persons   Cases   Rate
25-39    1,000     2       .002       100       2       .02      10,000    30      .003
40-55    700       25      .036       500       30      .06      15,000    900     .06
55+      50        15      .300       1,500     150     .100     15,000    1,200   .08
Total    1,750     42      .024       2,100     182     .087     40,000    2,130   .053

a. Compute a standardized event ratio (similar to a standardized mortality ratio (SMR), except that the episodes of back pain aren't mortal events) of back pain for the cell phone-manufacturing employees. Briefly state in one sentence the interpretation of this measure in this case. (3 pts)
b. Compute a standardized event ratio (similar to a standardized mortality ratio (SMR), except that the episodes of back pain aren't mortal events) of back pain for the textile-manufacturing employees. Briefly state in one sentence the interpretation of this measure in this case. (3 pts)
c. Can these two ratios in part (a) and (b) be compared? Briefly
explain why or why not. (3 pts)
11. The evidence supporting obesity as a risk factor for colon cancer
remains inconclusive, especially among women. A recent study (Am J Epidemiol 1999;150:390-398) reported the association between
obesity (measured at baseline) and colon cancer morbidity as
determined from review of medical records and death certificates
in a nationally representative cohort of men and women age 25-74
years who participated in the First National Health and Nutrition
Examination Survey from 1971 to 1975 and were subsequently
followed up through 1992. The following table is from this study for
men and women combined.
Baseline body    Number of incident        Person-years    Crude incidence
mass index*      cases of colon cancer     of follow-up    rate/100,000 PY
<22              28                        53,475
22-<24           41                        38,919
24-<26           36                        36,610
26-<28           40                        32,635
28-<30           35                        21,122
30+              42                        34,904
* kg body weight per height in meters squared

a. Which of the following best describes the research design
used in this study? (choose one best answer). (2 pts)
A. Cross-sectional survey
B. Ecological study
C. Population based case control study
D. Cohort study
b. Complete the table by calculating the crude body mass index-specific incidence rates. (3 pts)
c. Calculate the relative risk (RR) of colon cancer associated with a BMI of 28-<30. Use the lowest BMI category as referent. In one sentence interpret your answer. (2 pts)
d. Calculate the attributable risk proportion of those in the 28-<30 BMI category. In one sentence interpret your answer. (The attributable risk formulas provided in class can be used even though the data provided are rates.) (2 pts)
12.
Analyses of data from cohort studies often have to deal with
the reality that participants have unequal lengths of follow up.
Given the data below, calculate the (a) total person time (month) of
follow up, (b) the overall incidence density rate, (c) 13 month
cumulative incidence, and (d) the product limit estimate of failure.
Each horizontal line represents a cohort participant. Each vertical
line represents one month. Arrows indicate time of loss to follow
up. Black boxes indicate onset of disease (failure). (2 pts each)
a. ______________
b. ______________
c. ______________
d. ______________
Answer Guide
1. B. Case-control studies are said to use sampling by disease and are
suited for studying rare diseases.
2. B. Cohort studies can be either retrospective or prospective and
are often used to study rare exposures.
3. The ratio of odds of exposure among cases to odds of exposure
among noncases is the odds ratio, which is a measure of
association.
4. Incidence rates cannot be estimated from case-control studies without additional information. In the case-control design, selection of subjects is based on disease status, so the number of cases is under the control of the investigator. If the investigator has access to all cases and knows the size of the population from which they arise, s/he can estimate incidence, but knowledge of the population size is not available from the case-control design.
5.
a. Population attributable risk (PARP)
Both "proportion" and "neither" received credit, since this is
a subtle distinction. According to Regina Elandt-Johnson (Am
J Epidemiol 1975;102:267-271), a proportion is a type of ratio
in which the numerator is included in the denominator [p=a/
(a+b)]. Since PARP can be expressed as ("attributable"
cases / all cases), it is indeed a proportion. However, it can
also be expressed as a difference of two proportions (I-I0) or
the product of a proportion (prevalence) and the difference of
two proportions [p(I1-I0)], so it is easy to be misled about its
mathematical form (indeed, the "official" answer to this
question could not explain why it is a proportion!).
b. Incidence density (ID) is a RATE.
c. Prevalence is a PROPORTION.
d. Relative risk is NEITHER a rate nor a proportion.
6. Indicate true or false next to each of the following. (2 pts each)
a. FALSE: A Pearson product-moment correlation coefficient measures the extent to which a relationship is linear, so a value of plus one or minus one corresponds to a straight line; a "J" or "U" shaped relationship is not linear.
b. TRUE: A risk ratio measure and a correlation coefficient are both measures of association.
c. FALSE: A population attributable risk proportion depends on the prevalence of exposure and is ALSO directly related to the strength of an association.
d. TRUE: The study base for a case-control study consists of those people who if they developed the disease could have been counted as cases.
e. FALSE: The Bradford Hill criterion "coherence" means that all of the known facts about the relationship fit into place; the criterion of "consistency" means that the association has been observed repeatedly in different places, by different observers, and at different times.
f. TRUE: "Temporality" is the one Bradford Hill criterion for causal inference that must hold true between exposure and disease.
7. C. "The event rates in the geographic area of interest are applied to
the age-stratum sizes of a standard population to create a rate that
is a weighted average" describes a directly-standardized rate.
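A worked illustration of option C may help. The Python sketch below borrows the cell phone workers' age-specific back-pain counts from question 10 and, purely as an assumed standard, the age structure of that question's national random sample; the function name is hypothetical.

```python
# Stratum-specific data for the study group: cell phone workers (question 10)
study_cases = {"25-39": 2, "40-55": 25, "55+": 15}
study_persons = {"25-39": 1_000, "40-55": 700, "55+": 50}
# Assumed standard population: age structure of the national random sample
standard_persons = {"25-39": 10_000, "40-55": 15_000, "55+": 15_000}


def direct_standardized_rate(cases, persons, standard):
    """Apply the study group's stratum rates to the standard population's
    stratum sizes; the result is a weighted average of the study rates."""
    expected = sum(cases[s] / persons[s] * standard[s] for s in standard)
    return expected / sum(standard.values())


crude_rate = sum(study_cases.values()) / sum(study_persons.values())
adjusted_rate = direct_standardized_rate(study_cases, study_persons, standard_persons)
print(f"crude = {crude_rate:.3f}, directly standardized = {adjusted_rate:.3f}")
```

With these inputs the directly standardized rate is about 0.126, far above the crude 0.024, because the assumed standard has many more people aged 55+, the stratum where the cell phone workers' rate is highest. Set beside the SMR of 0.86 in answer 10a, it also shows that direct and indirect standardization can point in different directions when the age-specific rate ratios vary across strata.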
8.
a.
                                 ICD classification
Death certificate       Work-related    Not work-related    TOTAL
Work-related            1,195           788                 1,983
Not work-related        345             97,672              98,017
TOTAL                   1,540           98,460              100,000
b. Sensitivity = 1,195/1,540 = 78%; Specificity = 97,672/98,460 = 99%
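c. Positive predictive value = 1,195/1,983 = 60%, i.e., about 60% of the deaths for which the "injury at work" box was checked were truly work-related according to the ICD codes.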
d. Based on these data, the death certificate "injury at work" classification system will overestimate the true number of work-related fatal injuries, since more non-work-related injuries are classified as work-related (788) than vice versa (345): the box is checked for 1,983 deaths, but only 1,540 are work-related.
e. B. Passive surveillance: the reports are submitted by health care workers in conformance with a general obligation rather than in response to a specific request from the surveillance organization.
f. C. Sensitivity and specificity are measures of validity, since there is a standard for "truth".
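The validity measures in parts b through d follow directly from the four cell counts. A minimal Python sketch (the helper name is hypothetical; NPV is computed as an extra measure the question does not ask for):

```python
def classification_metrics(tp, fp, fn, tn):
    """Validity of a classification against a gold standard (here: ICD codes)."""
    return {
        "sensitivity": tp / (tp + fn),  # share of truly work-related deaths ticked
        "specificity": tn / (tn + fp),  # share of non-work-related deaths not ticked
        "ppv": tp / (tp + fp),          # share of ticked deaths truly work-related
        "npv": tn / (tn + fn),          # share of unticked deaths truly not work-related
    }


m = classification_metrics(tp=1195, fp=788, fn=345, tn=97672)
ticked = 1195 + 788  # deaths the tick-box classifies as work-related
truly = 1195 + 345   # deaths the ICD codes classify as work-related

for name, value in m.items():
    print(f"{name}: {value:.1%}")
print(f"tick-box count {ticked} vs. ICD count {truly}")
```

Comparing `ticked` with `truly` is the quantitative version of part d: false positives outnumber false negatives, so the tick-box count runs high.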
9.
a. D. Prospective cohort, since the investigators monitored people without the condition over time to detect its development.
b.
                    Cigarette smoking status
ARM status          Ever smokers    Never smokers    Total
Cases               79              26               105
Non-cases           785             342              1,127
Total               864             368              1,232
c. CI in ever smokers = # new cases / population at risk = 79/864 = 0.091 in 5 years
CI in never smokers = # new cases / population at risk = 26/368 = 0.071 in 5 years
d. Cumulative incidence ratio (CIR) = CI in ever smokers / CI in never smokers
= (79/864) / (26/368) = 1.29
e. PARP = (overall incidence - incidence in never smokers) / overall incidence of ARM
= (0.0852 - 0.0707) / 0.0852 = 17%, where the overall incidence is 105/1,232 = 0.0852
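A short Python sketch reproduces the calculations from the 2x2 table. As a cross-check it also evaluates the equivalent form p(CIR - 1)/[1 + p(CIR - 1)], where p is the proportion of the population who ever smoked; that alternative form is added here and is not taken from the answer guide.

```python
# Beaver Dam men free of ARM at baseline
ever_cases, ever_total = 79, 864
never_cases, never_total = 26, 368

ci_ever = ever_cases / ever_total      # 5-year cumulative incidence, ever smokers
ci_never = never_cases / never_total   # 5-year cumulative incidence, never smokers
cir = ci_ever / ci_never               # cumulative incidence ratio

# PARP as in the answer: (overall CI - CI in unexposed) / overall CI
ci_overall = (ever_cases + never_cases) / (ever_total + never_total)
parp = (ci_overall - ci_never) / ci_overall

# Equivalent form using p = proportion of the population who ever smoked
p = ever_total / (ever_total + never_total)
parp_alt = p * (cir - 1) / (1 + p * (cir - 1))

print(f"CI ever = {ci_ever:.3f}, CI never = {ci_never:.3f}, CIR = {cir:.2f}")
print(f"PARP = {parp:.1%} (alternative form: {parp_alt:.1%})")
```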
10.
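a. Standardized event ratio (for cell phones) = SMR (cell phone) = observed/expected
= 42/{(.003)(1000) + (.06)(700) + (.08)(50)} = 42/49 = 0.86
Cell phone-manufacturing employees had about 14% fewer episodes of back pain than expected if they had experienced the national age-specific rates.
b. Standardized event ratio (textiles) = SMR (textile) = observed/expected
= 182/{(.003)(100) + (.06)(500) + (.08)(1500)} = 182/150.3 = 1.2
Textile-manufacturing employees had about 20% more episodes of back pain than expected if they had experienced the national age-specific rates.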
c. These two ratios cannot be compared directly. An SMR is a weighted average in which the weights (e.g., the age structure) come from the population for which indirect standardization is being carried out, so the SMRs for two populations use different weights. The comparison is invalid unless the populations have identical age structures, the stratum-specific rates are the same for all strata, or the stratum-specific rates for one population are a constant multiple of those for the second population. With indirect standardization, it is actually the "standard population" rates that are being "standardized" to the age distribution of the study population.
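Indirect standardization multiplies each group's stratum sizes by the standard (national) stratum rates to get an expected count. The Python sketch below (helper name hypothetical) reproduces parts a and b.

```python
# Standard rates: national random sample, age-specific episodes per person
standard_rates = {"25-39": 0.003, "40-55": 0.06, "55+": 0.08}


def standardized_event_ratio(observed, persons, std_rates):
    """Indirect standardization: observed events divided by the events
    expected if the group had experienced the standard stratum rates."""
    expected = sum(persons[age] * std_rates[age] for age in std_rates)
    return observed / expected, expected


cell_ratio, cell_expected = standardized_event_ratio(
    42, {"25-39": 1_000, "40-55": 700, "55+": 50}, standard_rates)
textile_ratio, textile_expected = standardized_event_ratio(
    182, {"25-39": 100, "40-55": 500, "55+": 1_500}, standard_rates)

print(f"cell phone: 42/{cell_expected:.1f} = {cell_ratio:.2f}")
print(f"textile:    182/{textile_expected:.1f} = {textile_ratio:.2f}")
```

The cell phone workers' expected count is dominated by the 40-55 stratum (42 of 49) and the textile workers' by the 55+ stratum (120 of 150.3), so the two ratios are weighted by different age structures; this is the practical content of part c.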
11.
a. D. Cohort study
b.
Baseline body    Number of incident        Person-years    Incidence
mass index*      cases of colon cancer     of follow-up    rate/100,000 PY
<22              28                        53,475          52.4
22-<24           41                        38,919          105.3
24-<26           36                        36,610          98.3
26-<28           40                        32,635          122.6
28-<30           35                        21,122          165.7
30+              42                        34,904          120.3
* kg body weight per height in meters squared

c. RR of colon cancer for BMI 28-<30 kg/m2 vs. lowest = 165.7/52.4 = 3.16. Persons with a baseline BMI of 28-<30 kg/m2 had about 3.2 times the colon cancer incidence rate of persons with a BMI below 22 kg/m2.
d. ARP for BMI 28-<30 kg/m2 vs. lowest = (3.16 - 1) / 3.16 = 68%
Assuming the association is causal, the ARP of 68% means that 68% of the incidence in the 28-<30 kg/m2 group is attributable to elevated BMI.
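The crude rates, RR, and ARP can be reproduced with a few lines of Python; a minimal sketch using the case counts and person-years from the table:

```python
# (BMI category, incident colon cancer cases, person-years of follow-up)
bmi_rows = [
    ("<22", 28, 53_475), ("22-<24", 41, 38_919), ("24-<26", 36, 36_610),
    ("26-<28", 40, 32_635), ("28-<30", 35, 21_122), ("30+", 42, 34_904),
]

# Crude incidence rate per 100,000 person-years in each category
rates = {cat: cases / py * 100_000 for cat, cases, py in bmi_rows}
rr = rates["28-<30"] / rates["<22"]   # rate ratio vs. the lowest BMI category
arp = (rr - 1) / rr                   # attributable risk proportion among exposed

for cat, rate in rates.items():
    print(f"{cat:>7}: {rate:6.1f} per 100,000 PY")
print(f"RR = {rr:.2f}, ARP = {arp:.0%}")
```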
12.
a. 43 person-months
b. 3 cases/43 person-months = 7.0 cases per 100 person-months
c. 13-month CI = 3/7 = 0.43
d. Product-limit estimate of failure = 1 - [(6-1)/6 x (5-1)/5 x (3-1)/3] = 1 - 0.444 = 0.556
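The product-limit calculation in part d can be written as a small Python function. The figure is not reproduced here, so the risk sets (6, 5, and 3 persons at risk, one failure each) are read off the answer itself rather than the diagram; the function name is hypothetical.

```python
def product_limit_failure(risk_sets):
    """Product-limit (Kaplan-Meier) estimate of cumulative failure.

    risk_sets: (number at risk, number of failures) at each distinct failure
    time, in time order. Losses to follow-up are not failures; they only
    reduce the number at risk at later failure times.
    """
    survival = 1.0
    for at_risk, failures in risk_sets:
        survival *= (at_risk - failures) / at_risk
    return 1.0 - survival


# Risk sets read from answer d (assumed; the original figure is not shown)
failure = product_limit_failure([(6, 1), (5, 1), (3, 1)])
naive_ci = 3 / 7           # part c: ignores the losses to follow-up
id_per_100 = 3 / 43 * 100  # part b: cases per 100 person-months

print(f"product-limit failure = {failure:.3f}, naive CI = {naive_ci:.2f}, ID = {id_per_100:.1f}")
```

The product-limit estimate (0.556) is higher than the naive 3/7 because participants lost to follow-up are removed from later risk sets instead of being counted as disease-free for the full 13 months.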
Final Examination, Fall 1999

The questions on this examination are largely based on Cantor KP, Lynch
CF, Hildesheim ME, Dosemeci M, Lubin J, Alavanja M, Craun G. Drinking
water source and chlorination byproducts in Iowa. III. Risk of brain
cancer. Am J Epidemiol 1999;150:552-60. You may refer to an
unannotated copy of this article during the examination.
1. Briefly discuss two reasons why a case-control study is (or is not)
well suited to examine risk factors for brain cancer. (3 pts)
2. The authors describe the study design they used as a "population-based case-control study". Briefly explain how this is different from a non-population-based case-control study. Include in your answer issues regarding the selection of cases, selection of controls, and validity. (3 pts)
3. Cases were identified by the State Health Registry of Iowa. Which of the following categories of study design best describes this method of case finding? Choose one best answer. (3 pts)
A. Prospective follow-up
B. Passive surveillance
C. Cross-sectional survey
D. Community-based screening
E. Hospital-based surveillance
4. The authors state that cases had to be newly diagnosed with histologically confirmed glioma without previous diagnosis of a malignant neoplasm. Which of the following best describes an
advantage of using incident cases instead of prevalent cases?
Choose one best answer. (3 pts)
A. Using incident cases allows the investigators to directly
compute relative risks.
B. Using incident cases reduces the non-systematic error of
case-control studies.
C. Estimates of exposure from incident cases may be less
influenced by disease status.
D. Using incident cases allows for the investigation of effects on risk versus those affecting duration.
E. Incident cases are less likely to be lost to follow up than
prevalent cases.
5. Even if the investigators are careful in the selection of cases and
controls, selection bias can make interpretation of results difficult.
Which of the following is NOT a situation that can produce
selection bias? Choose one best answer. (3 pts)
A. The exposure has some influence on the process by which
controls are selected.
B. The exposure has some influence on the process of case
ascertainment.
C. The disease status has some influence on the recall of
exposures.
D. Exposed cases are reported to registries more often than unexposed cases.
E. All of the above will produce selection bias.
6. In this study, exposure information for many of the brain cancer cases was provided by proxy respondents. The authors did not have information from independent sources that could be used to directly verify information provided by these surrogates. However, suppose a follow-up questionnaire was administered to cases, and for 85 of the cases the investigators were able to obtain information about whether or not they used a private well directly from the cases (self report). Assuming that self report is the best available assessment of whether they used a private well or not, complete the table below so that it reflects a sensitivity, specificity, and positive predictive value of a proxy response of 77%, 75%, and 57%, respectively. Assume that 26 of the cases reported that they used private wells. Show your calculation. (6 pts)
                  Self report = YES    Self report = NO
Proxy report
YES
NO
7. Cases in this study were histologically confirmed. This is an
example of which of the following disease classification criteria?
A. Causal criteria
B. Ecologic criteria
C. Manifestational criteria
D. Etiologic criteria
8. Consider the data presented in Table 1 of this article. Which of the
following best represents the proportion of the risk of brain cancer
in the population that is attributable to working on a farm (farm
occupation). Assume that a farm occupation is causally related to
brain cancer risk. Choose one best answer. (4 pts)
A. 33%
B. 57%
C. 10%
D. 29%
E. Cannot be calculated from case-control studies
9. A case-control study like the one described in this paper is most
useful when it helps us understand what is happening in the study
base (underlying population). Which of the following best describes
the study base in this article? Choose one best answer. (3 pts)
A. The study base is those who if they developed brain cancer
could have been selected as a case.
B. The study base is those who have an equal probability to be
selected as a case or control.
C. The study base is those who are identified as cases or
controls after excluding non-responders.
D. The study base is those who if exposed would have been identified as exposed.
E. None of the above.
10.
In Table 3 the odds ratios for incident brain cancer by
duration of chlorinated surface water exposure are given. The odds
ratio (95% confidence interval) in men estimating the risk of brain
cancer with 1-19 years of exposure is 1.3 (0.8, 2.1) and 2.5 (1.2,
5.0) for 40 years or more of exposure. Which of the following best
describes the role of chance in observing these two estimates?
Choose one best answer. (3 pts).
A. The odds ratio for >=40 years of exposure is more likely due to chance because it is based on fewer cases and controls.
B. The odds ratio for 1-19 years of exposure is more likely due to chance because the point estimate is closer to the null value (1.0).
C. The odds ratio for >=40 years of exposure is more likely due to chance because the confidence interval is so wide.
D. The odds ratio for 1-19 years of exposure is less likely due to chance because the confidence interval is narrower.
E. The odds ratio for >=40 years of exposure is less likely due to chance because the confidence interval does not include 1.0.
11.
Table 3 presents odds ratios for the association of incident
brain cancer with various levels of lifetime average THM exposure.
The odds ratio (95% confidence interval) for lifetime average THM
concentration of 0.8-2.2 µg/liter for men was 0.9 (0.6, 1.6). The odds ratio (95% confidence interval) for lifetime average THM concentration of 32.6 µg/liter for women was 0.9 (0.4, 1.8).
Which of the following best describes the precision of these two
estimates of risk? Choose one best answer. (3 pts)
A. The estimate is equal because the point estimates are the
same.
B. The estimate is equal because neither confidence interval
excludes 1.0.
C. The estimate in men is slightly more precise because the
confidence interval is narrower.
D. The estimate in women is slightly more precise because the
exposure level is much higher.
E. The precision of the estimates cannot be compared because they are from different exposure groups.
12.
Using the data in Table 4, which of the following best describes the crude (unadjusted) odds ratio estimating the risk of brain cancer associated with >=40 years of exposure to chlorinated surface water in men with above-median tap water intake? Use the category of 0 years exposure to chlorinated surface water as the reference group. Choose one best answer. (4 pts)
A. 4.0
B. 1.5
C. 3.6
D. 2.6
E. Cannot be computed from data in Table 4.
13.
Table 1 shows the adjusted odds ratio estimating the risk of
brain cancer by population size. Using the 25,000 population size
as a reference calculate the crude (unadjusted) odds ratio
associated with the > 50,100 population. In 2 sentences or less
explain why the two estimates agree or disagree. (4 pts)
14.
The authors state that they "found a dose-response
relationship among men between brain cancer and duration of
consuming drinking water from chlorinated surface water". Using
3 Bradford Hill criteria, in 3-4 sentences, address causality (or the
lack of causality) of the relationship of drinking water to brain
cancer. (4 pts)
15.
An early study of drinking water and brain cancer was an
ecological study conducted by the lead author of the present
article. In this study, brain cancer mortality rates in 923 U.S.
counties were compared with average levels of THM measured in
the drinking water supplies of those counties. For counties in which
the sampled water supply served at least 85% of the residents of
that county, the correlation coefficient between county-specific
mortality rates from brain cancer and trihalomethane levels was
0.24 in White men and 0.19 in White women. After reviewing this
paper, your colleague concluded that THM in drinking water are
causally related to brain cancer. However, you are more cautious in
your interpretation, citing the "ecological fallacy." Please define the
ecologic fallacy (2 pts) and describe why it limits the causal inferences that can be made from the ecological study described above (2 pts).
16.
The authors used information provided by cases and controls
on place of residence, primary source of drinking water, and tap
water and total fluid consumption to create an index of cumulative
lifetime exposure. However, the natural history of cancer
(initiation, promotion, conversion, and progression) may encompass
many years. If drinking water is involved at the earliest stages of
brain cancer (initiation), then drinking water exposures in the
recent past may be more important than present exposures or
those in the distant past (e.g., in childhood). As defined in class,
which of the following periods would be important in defining the
minimal and maximal length of time expected between drinking
water exposure and diagnosis with histologically confirmed glioma?
A. Induction period
B. One year case fatality
C. Latent period
D. Both A and C
E. None of above
17.
The authors included all cases of histologically confirmed
malignant brain cancers, including glioblastoma, fibrillary and
gemistocytic astrocytoma, and mixed glioma. If authors suspected
that drinking water exposure was associated with only certain
subtypes of brain cancer (i.e., disease heterogeneity), which of the
following strategies could they employ at the analysis stage? (3 pts)
A. Adjustment for cancer type using mathematical modeling
(e.g., logistic regression)
B. Stratification of cases by brain cancer type
C. Direct standardization by brain cancer type
D. Indirect standardization by brain cancer type
E. Matching cases and controls by brain cancer type
18.
The authors restricted their analysis to those cases and
controls with at least 70 percent of their lifetime years with a
known source of drinking water. This approach was used to reduce
which type of bias? Choose one best answer (3 pts)
A. Confounding bias
B. Selection bias
C. Information bias
D. Random error
19.
(question was not asked)
20.
a. Using the data in Table 3, label and complete a 2x2 table for
the association between brain cancer and >=40 years
residence with a chlorinated surface water source (versus 0
years), collapsing over sex (i.e., combine the data for men
and women). (4 pts)
b. Calculate the odds ratio for your 2x2 table in part a. Show
your work. (3 pts)
c. Suppose that the sex-adjusted OR for the relationship between brain cancer and >=40 years residence with a chlorinated surface water source is 1.1. Is sex a confounder of this relationship? Justify your answer. (3 pts)
d. Is sex an effect modifier (assuming a multiplicative model for joint effects) of the relationship between brain cancer and >=40 years residence with a chlorinated surface water source? Justify your answer. (3 pts)
e. According to Table 1, having a farming occupation (ever vs. never) is a risk factor for brain cancer (OR=1.5). Assume that among the controls, farming occupation is associated with duration of residence with a chlorinated surface water source. Could farming occupation be a confounder of the associations reported under the Total column in Table 3? Explain your answer. (3 pts)
21.
Characteristics of cases and controls included in this study
are shown in Table 1. Using this information answer the following
questions.
a. Calculate the appropriate crude (unadjusted) measure of
association between farm occupation and brain cancer. Consider
those ever working on a farm as sufficient to be classified as having
a farm occupation. In 2 sentences or less interpret what this odds
ratio means. (4 pts)
                    CASE    CONTROL
Farm occupation
YES
NO
b. Assume that 10% of the cases that were labeled as never having
worked on a farm truly had worked in such an environment.
Furthermore assume that 15% of the controls that were labeled as
having ever worked on a farm, in fact never really did work on a
farm. What would the true association be between farm occupation
and brain cancer? Assume that the classification of disease status is
valid. (4 pts)
c. Which of the following best describes a comparison of the odds ratios you computed in parts (a) and (b)? Choose one best answer. (3 pts)
A. The odds ratios are different as a result of differential misclassification of exposure.
B. The odds ratios are different as a result of nondifferential misclassification of exposure.
C. The odds ratios are different as a result of differential misclassification of disease status.
D. The odds ratios are different as a result of nondifferential misclassification of disease status.
E. The odds ratios are different as a result of random variation in the exposure assessment.
22.
Which of the following is a measure of the validity of methods
used to classify exposures such as having worked on a farm?
A. intraclass correlation coefficient
B. kappa statistic
C. standard error
D. sensitivity
23.
a. Using data in Table 1, assess whether the crude OR of brain
cancer associated with farm occupation is confounded by age
and/or sex. Support your answer with relevant calculations.
Table 1 shows the adjusted odds ratios estimating the risk of
brain cancer due to having farm occupation. (2 pts)
b. What feature of the study design could have contributed to the crude ORs in Table 1 being confounded by age and/or sex? (2 pts)
Answer Guide
1. Case-control studies are well-suited for studying risk factors for
brain cancer because the disease is rare (hence difficult to study in
a cohort design). Also, the case-control design facilitates examining
many risk factors of current interest, a substantial advantage when
so few risk factors have been identified. A retrospective cohort
study can examine only exposures for which historical data are
available.
2. A "population-based case-control study" is a case-control study for
which the study base is a defined population. With a hospital-based
case-control study, it is difficult to specify the study base, since
which cases come to a given hospital is influenced by such factors
as seriousness and treatability of the disease, type of hospital, and
health care financing ability and arrangements. A representative
sample from this same defined population yields a control group
that permits valid estimation of odds ratios. In contrast, the validity
of measures of association estimated using a control group selected
from among hospitalized persons is always somewhat uncertain,
since it is generally impossible to know how well such controls represent the study base.
3. B. The method of finding cases was passive surveillance.
4. D. Using incident cases allows the odds ratio to estimate the incidence density ratio or risk ratio. In contrast, the exposure distribution among prevalent cases will reflect differential survival in relation to exposures as well as differential incidence.
5. C. Selective recall (the disease status has an influence on the recall
of exposures) is a form of information bias, not selection bias.
6. Since 26 of the cases reported using a private well, 85 - 26 = 59 cases did not. Sensitivity = 0.77 means that the proxy respondents correctly classified as "exposed" 0.77 x 26 ≈ 20 brain cancer cases. Specificity = 0.75 means that the proxy respondents correctly classified as "unexposed" 0.75 x 59 ≈ 44 brain cancer cases. The rest of the table can be completed by subtraction and addition. As a check on the arithmetic, the positive predictive value is 20/35 ≈ 0.57.
Validation of proxy reports of use of water from a private well

                              Case's self-report
Report of proxy               Yes      No      Total
Yes, private well              20      15         35
No private well                 6      44         50
Total                          26      59         85
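The arithmetic for this validation table can be reproduced in a few lines. This is a minimal sketch, assuming (as the answer does) that the case's self-report is the gold standard and that expected cell counts are rounded to whole persons; the function name `validation_table` is hypothetical.

```python
def validation_table(n_exposed, n_unexposed, sensitivity, specificity):
    """Expected 2x2 cell counts when a proxy report is compared with a
    gold-standard classification (here, the case's self-report)."""
    true_pos = round(sensitivity * n_exposed)     # exposed, proxy says "yes"
    false_neg = n_exposed - true_pos              # exposed, proxy says "no"
    true_neg = round(specificity * n_unexposed)   # unexposed, proxy says "no"
    false_pos = n_unexposed - true_neg            # unexposed, proxy says "yes"
    return true_pos, false_pos, false_neg, true_neg


# 26 cases self-reported a private well; 85 - 26 = 59 did not.
tp, fp, fn, tn = validation_table(26, 85 - 26, sensitivity=0.77, specificity=0.75)
ppv = tp / (tp + fp)  # positive predictive value of a proxy "yes"

print(tp, fp, fn, tn)   # 20 15 6 44
print(round(ppv, 2))    # 0.57
```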
7. C. Manifestional criteria: histological criteria are observable
characteristics of tumor cells on microscopic examination.
8. C. 10%. The proportion of cases who are exposed is 85/291
approx.=0.29, and the OR approx.=1.5. Substituting into the
formula for PARP in a case-control study, PARP = (proportion of
cases exposed) x (OR - 1)/OR, gives 0.29x(1.5 - 1)/1.5
approx.=0.097.
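The PARP substitution can be checked numerically. A minimal sketch of the case-control formula used above, PARP = p_c x (OR - 1)/OR, where p_c is the proportion of cases exposed and the OR stands in for the risk ratio; the function name is hypothetical.

```python
def parp_case_control(p_exposed_cases, odds_ratio):
    """Population attributable risk proportion from case-control data,
    treating the odds ratio as an estimate of the risk ratio."""
    return p_exposed_cases * (odds_ratio - 1) / odds_ratio


p_c = 85 / 291  # proportion of cases exposed (approx. 0.29)
parp = parp_case_control(p_c, 1.5)
print(f"{p_c:.2f} {parp:.3f}")  # 0.29 0.097
```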
9. A. The study base consisted of those people who if they developed
brain cancer could have been selected as a case.
10. E. The OR for the oldest group is less likely to be due to
chance because the confidence interval does not include 1.0
(although not without problems, this response was the best).
11. C. The narrower confidence interval indicates that the
estimate for men is slightly more precise.
12. D. 2.6 = (7x423)/(30x38) for men with above-median tap
water intake

                 Exposure to chlorinated water
                 40+ years      < 40 years
Cases                    7              30
Controls                38             423
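The cross-product odds ratio used throughout these answers is easy to verify. A minimal sketch, assuming an unmatched 2x2 table with cells ordered as exposed cases, unexposed cases, exposed controls, unexposed controls; the function name is hypothetical.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Cross-product odds ratio ad/bc for an unmatched 2x2 table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)


# Men with above-median tap water intake: 40+ years vs < 40 years of exposure
or_men = odds_ratio(7, 30, 38, 423)
print(round(or_men, 1))  # 2.6
```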
13.
                 Average population
                 50,010      2,500
Cases                32        112
Controls            246        780

Crude OR = (32x780)/(112x246) = 0.91 versus 0.7 adjusted. The
estimates differ because the OR in the table has been adjusted for
age and sex (according to the footnote to Table 1).
14. Applying the Bradford Hill criteria:
The associations observed were of moderate
strength (1.7 for 20-39 years of exposure to chlorinated surface
water, 2.5 for >=40 years). The authors measured lifetime
exposure (through recall), so in spite of the prolonged induction and
latent periods for brain cancer, the criterion of temporality is
satisfied to some extent. Some of the exposure history in Table 3
must have occurred after the brain cancer had begun and is
therefore not relevant. However, it seems unlikely that, if the
association were causal, it could go in the opposite direction (i.e.,
brain cancer causing exposure to chlorinated water). There is little
evidence to support the plausibility of the association or of its
being found in men but not in women. Studies of the association
have not yielded consistent results. (The remaining criteria of
coherence, experiment, and analogy are not applicable to the
information in the article.)
15. The "ecologic fallacy" is the inference from aggregate data
that a relationship exists at the level of the individual. The flaw in
this inference is that the prevalences of a characteristic (e.g.,
exposure to trihalomethanes in drinking water) and a condition
(e.g., brain cancer) can both be elevated in a population even if the
individuals who possess the characteristic are not those with the
condition. In the study described in the question, people who
developed brain cancer may not themselves have ingested large
amounts of THM despite living in counties with high THM levels in
the county water supplies. A related analytic problem is that the
absence of individual-level data precludes individual-level control
for potential confounders, such as farming occupation.
16. D (both A and C). "Induction period" refers to the time
between exposure and the onset of the disease; "latent period"
refers to the time from disease onset to diagnosis. For exposure to
be causal in early stages of tumor development, the exposure must
be present before the latent period begins. In principle, exposure
occurring earlier than the sum of the longest possible induction
period and the longest possible latent period before diagnosis
would not be relevant, either.
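To make the timing argument concrete, here is a minimal sketch that computes the window of potentially relevant exposure before diagnosis. The age at diagnosis and the induction and latent period ranges are hypothetical placeholders, not values from the article, and the sketch assumes exposure must precede onset by at least the minimum induction period.

```python
def relevant_exposure_window(age_at_diagnosis, induction, latent):
    """Return (earliest, latest) ages at which exposure could be causally
    relevant, given (min, max) ranges in years for the induction period
    (exposure to onset) and the latent period (onset to diagnosis)."""
    min_ind, max_ind = induction
    min_lat, max_lat = latent
    # Exposure earlier than the longest induction + longest latent period
    # before diagnosis cannot be relevant.
    earliest = age_at_diagnosis - (max_ind + max_lat)
    # Exposure must precede onset, and onset precedes diagnosis by the
    # latent period; the latest relevant exposure uses the shortest periods.
    latest = age_at_diagnosis - (min_ind + min_lat)
    return earliest, latest


# Hypothetical values for illustration only.
window = relevant_exposure_window(60, induction=(5, 20), latent=(1, 5))
print(window)  # (35, 54)
```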
17. B. Stratification of cases by brain cancer type would permit
examination of the relationship for the individual subtypes.
18. B. "We selected cases and controls with at least 70 percent of
their lifetime years with a known source of drinking water in order
to minimize misclassification of exposure" (end of p. 554).
19. (question was not asked)
20.
a. Risk of brain cancer by number of years resided in a
dwelling supplied with chlorinated surface water

              Cases             Controls              Total
>=40 years    13 + 7 = 20       92 + 78 = 170           190
None          81 + 60 = 141     875 + 400 = 1275       1416
Total         161               1445                   1606
b. OR = (20x1275) / (170x141) approx.= 1.1
c. The presence of confounding is usually determined on the
basis of a meaningful difference between the crude and
adjusted ORs, and there is none here. Since the ORs
for men (2.5) and women (0.7) are quite different, for an
unambiguous indication of confounding the crude OR would
have to be above 2.5 or below 0.7.
d. Yes, there is modification of the OR by gender, in that the two
ORs differ meaningfully. Although the two confidence intervals
overlap substantially, neither point estimate is contained
within the confidence interval for the other gender's
estimate, so besides giving opposite "messages", the two ORs
are likely to differ in fact (not necessarily for biological
reasons).
e. The ORs shown in Table 1 are (according to the footnote)
controlled for farming occupation, so that should not be a
source of confounding, except to the extent that the
crudeness of the measure ("yes" versus "no") prevents the
control from being fully effective.
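The crude table in part a is formed by adding the stratum-specific cells before computing the odds ratio. A minimal sketch, assuming each stratum is given as (exposed cases, unexposed cases, exposed controls, unexposed controls); the names `stratum_1` and `stratum_2` are placeholders, because the answer does not say which addend comes from which stratum (the crude total does not depend on that pairing).

```python
def collapse(strata):
    """Sum the corresponding cells of several 2x2 tables."""
    return tuple(sum(cells) for cells in zip(*strata))


def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


# (>=40 years cases, no-exposure cases, >=40 years controls, no-exposure controls)
stratum_1 = (13, 81, 92, 875)
stratum_2 = (7, 60, 78, 400)

crude = collapse([stratum_1, stratum_2])
print(crude)                          # (20, 141, 170, 1275)
print(round(odds_ratio(*crude), 1))   # 1.1
```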
21.
a. Farming occupation and brain cancer risk

              Farming occupation
              Yes      No      Total
Cases          85     206        291
Controls      628    1355       1983
Total         713    1561       2274

OR = (85x1355) / (206x628) approx.= 0.89
The OR of 0.89 indicates no (or possibly a slight inverse)
crude association between brain cancer risk and having had a
farming occupation.
b. If 10% of "unexposed" cases in fact had had a farming
occupation, then 0.10x206 approx.= 21 cases should be reclassified as
exposed; if 15% of "exposed" controls in fact had not had a
farming occupation, then 0.15x628 approx.= 94 controls should be
reclassified as unexposed. The resulting table and OR would be:

Farming occupation and brain cancer risk

              Farming occupation
              Yes               No                  Total
Cases         85 + 21 = 106     206 - 21 = 185        291
Controls      628 - 94 = 534    1355 + 94 = 1449     1983
Total         640               1634                 2274

OR = (106x1449) / (185x534) approx.= 1.6
Correcting for the misclassification produces a table with a
moderate positive association between farming occupation
and brain cancer.
c. A. The odds ratios are different due to differential misclassification.
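The reclassification in part b can be scripted to confirm both odds ratios. A minimal sketch, assuming (as in the answer) that 10% of "unexposed" cases are moved to exposed and 15% of "exposed" controls are moved to unexposed, with counts rounded to whole persons; the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def correct_misclassification(a, b, c, d, frac_cases_up, frac_controls_down):
    """Move a fraction of 'unexposed' cases to exposed and a fraction of
    'exposed' controls to unexposed (different rates in cases and controls)."""
    move_cases = round(frac_cases_up * b)          # 0.10 x 206 -> 21
    move_controls = round(frac_controls_down * c)  # 0.15 x 628 -> 94
    return (a + move_cases, b - move_cases, c - move_controls, d + move_controls)


observed = (85, 206, 628, 1355)
corrected = correct_misclassification(*observed, 0.10, 0.15)

print(round(odds_ratio(*observed), 2))   # 0.89
print(corrected)                         # (106, 185, 534, 1449)
print(round(odds_ratio(*corrected), 1))  # 1.6
```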
22. D. Sensitivity is a measure of validity (kappa is a measure of
agreement that gives equal weight to both classifications; standard
error measures the variability of an estimate).
23.
a. The crude OR = (85 x 1,355) / (206 x 628) = 0.89. This value
is substantially different from the adjusted value of 1.5,
indicating that confounding by age and sex is present.
b. Controls were matched by age and sex to cancer cases for
five cancer sites. Thus, the control group is not a simple
random sample from the study base, so analyses must
control for the matching variables.
Most of the questions on this examination relate to the article "Individual
risk factors for hip osteoarthritis: obesity, hip injury, and physical
activity" (Cyrus Cooper, Hazel Inskip, Peter Croft, Lesley Campbell,
Gillian Smith, Magnus McLaren, and David Coggon. Am J Epidemiol
1998; 147:516-22). You may refer to this article during the examination.
1. Briefly list two reasons why a case control study is (or is not) appropriate to
examine individual risk factors for hip osteoarthritis. (2 pts)
2. The authors state that their cases come from a defined population. List four
features of the population or the study design that support this statement or that
helped the authors achieve it. (4 pts)
3. Considering the study population, study design, and other information in the article,
which of the following statements is (are) TRUE and which is (are) FALSE. (2 pts
each)
a. In these two health districts, the incidence density of symptomatic hip
osteoarthritis of sufficient severity to warrant hip arthroplasty exceeds 40 per
100,000 person-years.
b. If about 12% of the population was age 65 years or older, then about 12,000
people age 65 years or older in the two districts have radiographic evidence of
hip osteoarthritis.
c. The data in Table 1 demonstrate that women are 1.9 times as likely to
develop severe symptomatic hip osteoarthritis as are men.
d. The data in Table 2 indicate that female gender is not a risk factor for hip
osteoarthritis.
e. In this study, matching the control group to the cases on age, as opposed to a
random sample of the general adult population, probably resulted in greater
statistical power and precision.
4. The case identification process was based on a register in each district made up of
persons on a waiting list for a total hip arthroplasty (surgical reformation of the hip
joint). Waiting lists for procedures are common in societies with a national or social
medicine system. In the United States, a region-wide waiting list for a hip arthroplasty
is unlikely, as access to this procedure would be more related to
insurance status or ability to afford such a procedure. Explain how using the register
system in the United Kingdom to select cases either increases or decreases the
possibility of selection bias as compared to a study conducted in the United States. (4
pts)
5. How was the diagnosis of hip osteoarthritis made in this study? Was this based on
manifestional or causal criteria? Explain your answer. (3 pts)
6. According to the authors: "For each case, a control of the same sex and age was
selected from the list of the same general practice held by the county Family Health
Service Association". State in one sentence the rationale for using a list from general
practitioners. (3 pts)
7. Eighty-four percent of the patients listed for total hip arthroplasty fulfilled the
criteria for entry into the study as cases. Which of the following best describes the
criteria: (3 pts)
a. age > 45 years, being on the waiting list for hip arthroplasty, and the
presence of Heberden's nodes.
b. age > 45 years, pain duration at least for 36 months, and presence of
Heberden's nodes.
c. history of hip fracture within the past year, being on the waiting list for hip
arthroplasty and reside in the study area.
d. presence of Heberden's nodes, history of hip fracture within the past year,
and reside in the study area.
e. being on the waiting list for hip arthroplasty, reside in the study area, and
age > 45 years
8. The authors report that 89% of the eligible cases agreed to participate and 60% of
the 1060 controls approached agreed to participate. Which of the following best states
a condition regarding the non-responders that could lead to an odds ratio reported for
the risk of osteoarthritis associated with previous hip injury that is biased away from
the null (>1). Choose one best answer. (3 pts)
a. control non-responders are more likely to have a history of hip injury
compared to case non-responders.
b. control non-responders are less likely to have a history of hip injury
compared to case non-responders.
c. being a non-respondent is not related to previous hip injury.
d. none of the above
9. What was accomplished by replacing controls who refused to participate? (Choose
one best answer) (3 pts)
If controls who refused had not been replaced:
a. selection bias would have been greater;
b. the control group would have been less representative of the study base;
c. probability of a Type I error would have been greater;
d. probability of a Type II error would have been greater;
e. nondifferential misclassification bias would have been greater.
f. it would have been necessary to control for age and sex in the analysis.
10. The authors selected controls who were individually matched to cases by age,
gender, and family practitioner. Matching in the design stage is usually considered
only for those variables that are known to be confounders. Under which of the following
circumstances could gender be a confounder of the association between a risk
factor (obesity) and the outcome (hip osteoarthritis)? Circle all that apply. (4 pts)
a. the prevalence of obesity and the prevalence of hip osteoarthritis are both
higher in men than in women
b. the prevalence of obesity is lower in men than women, but the prevalence of
hip osteoarthritis is higher in men than women.
c. the prevalence of obesity is higher in men than women, but the prevalence
of hip osteoarthritis is the same in men and women.
d. the prevalence of obesity is the same in men and women, but the prevalence
of hip osteoarthritis is higher in men than women.
11. The odds ratios in Table 2 are "mutually adjusted for the other two variables" by
logistic regression. The following questions concern the models used to estimate the
odds ratios in the table (ignore the fact that it was "conditional" logistic regression
and ignore the middle categories for body mass index and presence of Heberden's
nodes) (2 pts each):
a. How many logistic models were necessary to estimate the odds ratios for
body mass index >28.0, definite Heberden's nodes, and previous hip injury
among women?
b. The odds ratio estimate for hip injury in women was 2.8. What must the
logistic coefficient have been?
c. From this table, estimate the odds ratio for women who had both definite
Heberden's nodes and previous hip injury compared to women who had
neither.
12. In this study, information on medical history, life style, and leisure time physical
activities was obtained through a "structured interviewer-administered questionnaire".
(page 517). It is possible that persons on a waiting list for a hip arthroplasty would be
more keenly aware of hip injuries they may have had in the past than controls. If true,
this is an example of which of the following? Choose one best answer. (3 pts)
a. differential case ascertainment bias
b. differential misclassification bias
c. differential selection bias
d. differential precision bias
e. none of the above
13. Among women, the odds of previous hip injury are higher among cases than
controls (Table 2; OR=2.8). As indicated in the footnotes for Table 2, the odds ratio
for previous hip injury is adjusted or controlled for the other two variables in the
Table (body mass index and Heberden's nodes). Using the counts shown in Table 2,
calculate an unadjusted (crude) odds ratio for previous hip injury in women. (3 pts)
Unadjusted (crude) odds ratio = _________
14. Which of the following conclusions can be made from the above results? (choose
one best answer) (3 pts)
a. the unadjusted (crude) association between hip injury and hip osteoarthritis
in women is completely confounded by body mass index and Heberden's
nodes.
b. since the unadjusted and adjusted odds ratios are similar, the risk factor (hip
injury) must not be associated with the adjustment variables (body mass index
and Heberden's nodes)
c. since the unadjusted and adjusted odds ratios are similar, there is no effect-measure modification of the association between hip injury and hip
osteoarthritis.
15. The odds ratios presented in Table 5 are adjusted for previous hip injury. Why
might they still be confounded by hip injury? (3 pts)
16. In Table 6, is the crude association between previous hip injury and risk of
unilateral hip osteoarthritis biased towards the null or away from the null? (2 pts)
17. Based on the data in Table 3, what is the odds ratio for Heberden's nodes (definite
versus none) for persons in the upper tertile of body mass index? (3 pts)
18. Rothman has proposed that "public health synergism" is present when an observed
joint effect exceeds that expected under the additive model. Do the odds ratios in
Table 3 indicate the presence of "public health synergism" for the effect of Heberden's
nodes and elevated body mass index on hip osteoarthritis? If not, do the odds ratios
conform to a multiplicative model? Include in your answer a 1-2 sentence assessment
of whether these data indicate "public health synergism". (For this question, ignore
the row for "Possible" Heberden's nodes and the column for the middle tertile of body
mass index, and assume that both Heberden's nodes and elevated BMI reflect causal
risk factors for hip osteoarthritis. Note: do not necessarily rely on the authors'
description of this table.) (6 pts)
19. The authors investigated the association of specific sporting activities with risk of
hip osteoarthritis. Their data are presented in Table 5. Using their data, compute
separately the unadjusted (crude) odds ratios for osteoarthritis associated with playing golf
and with swimming in men and women combined. Consider those who do not
participate in any sport as the reference group and assume no missing data. Show the two
appropriate 2x2 tables and your calculations. (4 pts)
19a. Compare these unadjusted (crude) odds ratios with the ones presented in Table 3.
Briefly describe and explain the comparison. (3 pts)
19b. Consider the possibility that golfers who have hip osteoarthritis are reluctant to
seek medical attention for their condition for fear it will mean the end of their ability
to play golf. Therefore, cases who golf are less likely to be selected for this study
than cases who do not golf. If the true OR associated with golf is 2.0, then which of
the following best describes the selection bias and its impact on the odds ratio you
computed. (3 pts)
a. non-differential selection bias resulting in an odds ratio biased toward the
null.
b. non-differential selection bias resulting in an odds ratio biased away from
the null.
c. differential selection bias resulting in an odds ratio biased away from the
null.
d. differential selection bias resulting in an odds ratio biased toward the null.
19c. The authors state that "...the association with swimming may have arisen because
patients with hip osteoarthritis were advised to swim..." (page 521). Suppose that 25%
of the cases had been incorrectly classified as swimmers and assume that the
misclassified cases had not participated in any other sporting activity, either. Recompute the odds ratio for the association of hip osteoarthritis and swimming, after
re-classifying these individuals, using the numbers from the 2x2 table in question 19
above. Briefly discuss how your conclusion about the role of swimming does (or does
not) change. In what direction did misclassification bias the study OR? (3 pts)
20. The odds ratio (95% confidence interval) estimating the risk of osteoarthritis
associated with a previous hip injury was 24.8 (3.1-199.3) in men and 2.8 (1.4-5.8) in
women (see Table 2).
a. Which estimate indicates a stronger association? (2 pts)
b. Which estimate is more precise? (2 pts)
c. Which estimate is more compatible with a population odds ratio of 4.0? (2 pts)
21. Which one of the statements best interprets the following passage? (3 pts)
"In a previous case-control study (17) of men aged 60-76 years, we observed a
doubling of risk for hip osteoarthritis among those in the highest third of body
mass index distribution, as compared with those in the lowest third, although
the increased risk was not statistically significant." (p519 bottom of right
column)
a. Hip osteoarthritis is not as significant when it occurs in obese older patients,
because it is expected that overweight that lasts for many years will lead to
damage to the joints.
b. A doubling of risk is not significant from a statistical perspective, because it
represents only a moderate association.
c. The doubling of risk was not statistically significant because a p-value was
not computed, so it is not possible for the authors to know whether the
increased risk was due to chance.
d. If 1,000 independent random samples the same size as that study population
were drawn from a population with no increased risk of hip osteoarthritis,
fewer than 950 would have an OR between 0.5 and 2.0.
e. If 1,000 independent random samples the same size as that study population
were drawn from a population with a doubling of risk of hip osteoarthritis for
the highest third of the body mass distribution, as compared with the lowest
third, more than 5% of the samples would display no elevation in risk.
f. If 1,000 independent random samples the same size as that study population
were drawn from a population with a doubling of risk of hip osteoarthritis for
the highest third of the body mass distribution, as compared with the lowest
third, fewer than 80% would display an association of that magnitude.
22. A medical journalist, confused by the thrust of this article, comes to you and says:
"I've read this article several times, but I can't figure out what it shows about the
relationship of body mass index, Heberden's nodes, and hip osteoarthritis. The
authors explain that 'two broad mechanisms are believed to underlie the pathogenesis
of osteoarthritis at any joint site: mechanical stress and a generalized predisposition to
the disorder' as indexed by Heberden's nodes [p519 right column]. That seems
straightforward enough, and they later conclude that the analysis 'supports the notion
that this condition arises through an interaction between a generalized predisposition
to the disorder and specific mechanical insults to the hip' [p521]. Yet on page 518
[right column], the authors state that there was 'no statistically significant interaction'
between body mass index and Heberden's nodes, and on page 519 [left column] they
refer to obesity and a tendency to polyarticular involvement as 'independent risk
factors for hip osteoarthritis'. Would you please assess for me what this article shows
about the relationship among body mass index, Heberden's nodes, and hip
osteoarthritis? I have room for 40-60 words. Thanks!" (6 pts)
23. Write a brief statement for or against a causal relationship between hip injury and
risk of osteoarthritis. Comment specifically on at least two of Bradford Hill's criteria
for causal inference. Support your conclusion with data or statements from the
article. (4 pts)
Answer Guide
1. Briefly list two reasons why a case control study is (or is not)
appropriate to examine individual risk factors for hip osteoarthritis. (2
pts)
The condition is relatively rare; a case-control study is faster to
complete than a cohort study; and it can examine a wide range of
exposures of interest.
2. The authors state that their cases come from a defined population.
List four features of the population or the study design that support this
statement or helped the authors to achieve it? (4 pts)
1. The two health districts had a centralized orthopedic facility for
assessment and treatment of hip osteoarthritis;
2. Local orthopedic surgeons were willing to enter all patients into
the study;
3. All men and women 45 years and older who were placed on the
waiting list for primary total hip arthroplasty were considered for
the study;
4. The authors included patients who consulted orthopedic
surgeons privately.
5. The study excluded patients who lived outside the two districts.
The diverse socioeconomic profile was an advantage for
generalizability but does not make this a defined population.
3. Considering the study population, study design, and other
information in the article, which of the following statements are TRUE and
which are FALSE? (2 pts each)
a. In these two health districts, the incidence density of
symptomatic hip osteoarthritis of sufficient severity to warrant hip
arthroplasty exceeds 40 per 100,000 person-years.
[TRUE - 726 eligible cases / (1 million population x 1.5 years)
= 48.4 per 100,000 person-years]
b. If about 12% of the population was age 65 years or older, then
about 12,000 people age 65 years or older in the two districts have
radiographic evidence of hip osteoarthritis.
[TRUE - 10% prevalence of radiographic hip osteoarthritis at age
65 years and older x 12% of one million = 0.10 x 120,000 = 12,000]
c. The data in Table 1 demonstrate that women are 1.9 times as
likely to develop severe symptomatic hip osteoarthritis as are men.
[FALSE - the data in Table 1 cannot demonstrate this female
excess, since there is no information about the sex ratio in
the older population; this ratio may well reflect a greater
incidence of severe symptomatic hip osteoarthritis in women,
but some of the excess presumably derives from greater
mortality among men.]
d. The data in Table 2 indicate that female gender is not a risk
factor for hip osteoarthritis.
[FALSE - controls were matched to cases on gender (and
age), so the sex ratio in the controls must match that in the
cases]
e. In this study, matching the control group to the cases on age, as
opposed to a random sample of the general adult population,
probably resulted in greater statistical power and precision.
[TRUE - the mean age of the cases is 70 years old, with the
majority older than 60; thus, the use of general population
controls without regard to age would result in relatively little
overlap between the age distributions of cases and controls
on this very important variable.]
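The arithmetic behind answers 3a and 3b can be checked directly. A minimal sketch, assuming, as the answers do, that person-time is approximated by the whole population of about one million followed for 18 months, and that 10% is the prevalence of radiographic hip osteoarthritis among people aged 65 and older.

```python
# 3a: incidence density of hip osteoarthritis severe enough for arthroplasty
cases = 726               # eligible cases over the ascertainment period
population = 1_000_000    # approximate population of the two districts
years = 1.5               # 18 months of case ascertainment
rate_per_100k = cases / (population * years) * 100_000
print(round(rate_per_100k, 1))  # 48.4

# 3b: expected number aged 65+ with radiographic hip osteoarthritis
n_65_plus = 0.12 * population   # 12% of the population aged 65+
prevalence = 0.10               # prevalence among those aged 65+
print(round(n_65_plus * prevalence))  # 12000
```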
4. The case identification process was based on a register in each district
made up of persons on a waiting list for a total hip arthroplasty (surgical
reformation of the hip joint). Waiting lists for procedures are common in
societies with a national or social medicine system. In the United States,
a region-wide waiting list for a hip arthroplasty is unlikely, as
access to this procedure would be more related to
insurance status or ability to afford such a procedure. Explain how using
the register system in the United Kingdom to select cases either
increases or decreases the possibility of selection bias as compared to a
study conducted in the United States. (4 pts)
Using the registry may reduce selection bias if affluence or ability
to pay for a hip replacement is associated with exposures such as BMI,
physical activity, or Heberden's nodes. Cases selected from surgery
lists in the United States system may differ with respect to a risk
factor from cases who do not receive this procedure,
so measures of association may be more biased in a U.S. study.
5. How was the diagnosis of hip osteoarthritis made in this study? Was
this based on manifestional or causal criteria? Explain your answer. (3
pts)
(page 517, left column, 2nd paragraph): Diagnosis of hip
osteoarthritis in this study was based on pelvic radiographs. This is
based on manifestional criteria, because radiographic changes are
observable manifestations of the disease rather than its causes.
6. According to the authors: "For each case, a control of the same sex and
age was selected from the list of the same general practice held by the
county Family Health Service Association". State in one sentence the
rationale for using a list from general practitioners. (3 pts)
(page 517, left column, 3rd paragraph): In England and Wales,
almost everyone is registered with a general practitioner so that
these lists essentially provide an enumeration of the general
population.
7. Eighty-four percent of the patients listed for total hip arthroplasty
fulfilled the criteria for entry into the study as cases. Which of the
following best describes the criteria: (3 pts)
a. age > 45 years, being on the waiting list for hip arthroplasty, and
the presence of Heberden's nodes.
b. age > 45 years, pain duration at least for 36 months, and
presence of Heberden's nodes.
c. history of hip fracture within the past year, being on the waiting
list for hip arthroplasty and reside in the study area.
d. presence of Heberden's nodes, history of hip fracture within the
past year, and reside in the study area.
e. being on the waiting list for hip arthroplasty, reside in the study
area, and age > 45 years (answer)
8. The authors report that 89% of the eligible cases agreed to participate
and 60% of the 1060 controls approached agreed to participate. Which of
the following best states a condition regarding the non-responders that
could lead to an odds ratio reported for the risk of osteoarthritis
associated with previous hip injury that is biased away from the null
(>1). Choose one best answer. (3 pts)
a. control non-responders are more likely to have a history of hip
injury compared to case non-responders. (answer)
b. control non-responders are less likely to have a history of hip
injury compared to case non-responders.
c. being a non-respondent is not related to previous hip injury.
9. What was accomplished by replacing controls who refused to
participate? (Choose one best answer) (3 pts) If controls who refused
had not been replaced:
a. selection bias would have been greater;
b. the control group would have been less representative of the
study base;
c. probability of a Type I error would have been greater;
d. probability of a Type II error would have been greater; (answer)
e. nondifferential misclassification bias would have been greater.
f. it would have been necessary to control for age and sex in the
analysis.
Answer: d. Failure to replace controls who refused would have
reduced both the number of controls and of cases (due to the
matching), with a loss of statistical power and increase in the
probability of a type II error.
10. The authors selected controls who were individually matched to cases
by age, gender, and family practitioner. Matching in the design stage is
usually considered only for those variables that are known to be
confounders. Under which of the following circumstances could gender
be a confounder of the association between a risk factor (obesity) and the
outcome (hip osteoarthritis)? Circle all that apply. (4 pts)
a. the prevalence of obesity and the prevalence of hip osteoarthritis
are both higher in men than in women (true)
b. the prevalence of obesity is lower in men than women, but the
prevalence of hip osteoarthritis is higher in men than women.
(true)
c. the prevalence of obesity is higher in men than women, but the
prevalence of hip osteoarthritis is the same in men and women.
d. the prevalence of obesity is the same in men and women, but the
prevalence of hip osteoarthritis is higher in men than women.
11. The odds ratios in Table 2 are "mutually adjusted for the other two
variables" by logistic regression. The following questions concern the
models used to estimate the odds ratios in the table (ignore the fact that
it was "conditional" logistic regression and ignore the middle categories
for body mass index and presence of Heberden's nodes) (2 pts each):
a. How many logistic models were necessary to estimate the odds
ratios for body mass index >28.0, definite Heberden's nodes, and
previous hip injury among women?
"Mutually adjusted" means that each odds ratio comes from a
model that includes the other two factors, so all three factors
are included in the same model. A single model therefore yields
an adjusted odds ratio for each variable: one model was used.
b. The odds ratio estimate for hip injury in women was 2.8. What
must the logistic coefficient have been?
The OR for a dichotomous or indicator variable is
exp(beta), where beta is the logistic coefficient.
Therefore the coefficient was ln(2.8) = 1.0296.
c. From this table, estimate the odds ratio for women
who had both definite Heberdens nodes and previous
hip injury compared to women who had neither.
The logistic model is based on additivity of the
logit, or multiplicativity of the odds. Therefore the
odds ratio for the double exposure is the product
of the odds ratios for each of the risk factors:
1.5x2.8 = 4.2.
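Parts b and c can be verified numerically. A minimal sketch, taking 2.8 (previous hip injury) and 1.5 (definite Heberden's nodes) as the women's adjusted ORs used in the answers above; for contrast it also computes the joint OR expected under an additive model (OR10 + OR01 - 1, treating ORs as risk ratios), the benchmark behind Rothman's "public health synergism" in question 18.

```python
import math

or_injury = 2.8   # adjusted OR, previous hip injury (women)
or_nodes = 1.5    # adjusted OR, definite Heberden's nodes (women), as used above

beta_injury = math.log(or_injury)   # logistic coefficient = ln(OR)
beta_nodes = math.log(or_nodes)

# Coefficients add on the logit scale, so the joint OR is the product.
joint_multiplicative = math.exp(beta_injury + beta_nodes)

# Joint OR expected if the two effects were additive on the risk scale.
joint_additive = or_injury + or_nodes - 1

print(round(beta_injury, 4))           # 1.0296
print(round(joint_multiplicative, 1))  # 4.2
print(round(joint_additive, 1))        # 3.3
```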
12. In this study, information on medical history, life style,
and leisure time physical activities was obtained through a
"structured interviewer-administered questionnaire". (page
517). It is possible that persons on a waiting list for a hip
arthroplasty would be more keenly aware of hip injuries they
may have had in the past than controls. If true, this is an
example of which of the following? Choose one best answer.
(3 pts)
a. differential case ascertainment bias
b. differential misclassification bias (answer)
c. differential selection bias
d. differential precision bias
13. Among women, the odds of previous hip injury are higher
among cases than controls (Table 2; OR=2.8). As indicated in
the footnotes for Table 2, the odds ratio for previous hip
injury is adjusted or controlled for the other two variables in
the Table (body mass index and Heberden's nodes). Using the
counts shown in Table 2, calculate an unadjusted (crude)
odds ratio for previous hip injury in women. (3 pts)
Unadjusted (crude) odds ratio = 2.9
14. Which of the following conclusions can be made from the
above results? (choose one best answer) (3 pts)
a. the unadjusted (crude) association between hip
injury and hip osteoarthritis in women is completely
confounded by body mass index and Heberdens nodes.
b. since the unadjusted and adjusted odds ratios are
similar, the risk factor (hip injury) must not be
associated with the adjustment variables (body mass
index and Heberdens nodes)
c. since the unadjusted and adjusted odds ratios are
similar, there is no effect-measure modification of the
association between hip injury and hip osteoarthritis.
d. none of the above (answer)
15. The odds ratios presented in Table 5 are adjusted for
previous hip injury. Why might they still be confounded by hip
injury? (3 pts)
There may be residual confounding by type of hip
injury or by how long ago the hip injury occurred, or
imperfect recall of hip injury (non-differential
misclassification).
16. In Table 6, is the crude association between previous hip
injury and risk of unilateral hip osteoarthritis biased towards
the null or away from the null? (2 pts)
Towards the null (crude OR = 7.6 vs. adjusted OR =
10.6)
17. Based on the data in Table 3, what is the odds ratio for
Heberden's nodes (definite versus none) for persons in the
upper tertile of body mass index? (3 pts)
OR for definite Heberden's nodes vs. none = 3.2 / 1.6 = 2.0
18. Rothman has proposed that "public health synergism" is
present when an observed joint effect exceeds that expected
under the additive model. Do the odds ratios in Table 3
indicate the presence of "public health synergism" for the effect
of Heberden's nodes and elevated body mass index on hip
osteoarthritis? If not, do the odds ratios conform to a
multiplicative model? Include in your answer a 1-2 sentence
assessment of whether these data indicate "public health
synergism". (For this question, ignore the row for "Possible"
Heberden's nodes and the column for the middle tertile of
body mass index, and assume that both Heberden's nodes
and elevated BMI reflect causal risk factors for hip
osteoarthritis. Note: do not necessarily rely on the authors'
description of this table.) (6 pts)
Odds ratios for hip osteoarthritis (Table 3)
                                    Body mass index
Heberden's nodes    Lowest third     Middle third      Highest third
None                1.0              1.1 (0.7-1.8)*    1.6 (1.0-2.7)
Possible            1.5 (0.8-2.7)    1.5 (0.8-2.6)     2.0 (1.1-3.6)
Definite            1.4 (0.9-2.3)    2.2 (1.4-3.7)     3.2 (1.9-5.4)
* Numbers in parentheses, 95% confidence interval.
Ignoring the intermediate categories for Heberden's
nodes and body mass index gives the following
expression for the additive model:
Expected joint excess risk = excess risk for factor 1 + excess
risk for factor 2
= excess risk for Heberden's nodes + excess risk for body
mass index
Since hip osteoarthritis of this severity is rare, the odds ratios
approximate risk ratios, and the following approximate expressions
are appropriate:
Expected joint excess risk = (OR for Heberden's nodes - 1) + (OR
for body mass index - 1)
= (1.4 - 1) + (1.6 - 1) = 1.0
Observed joint excess risk = (3.2 - 1) = 2.2
The substantial difference between 2.2 and 1.0
indicates that the odds ratios in this table do not
conform to an additive model for expected joint effect.
The odds ratios do not conform to a multiplicative
model, either:
Expected joint OR = (OR for Heberden's nodes) * (OR for
body mass index)
= 1.4 * 1.6 = 2.24, vs. 3.2 observed
Thus, the relationship is "supramultiplicative", though
not greatly so.
Since these odds ratios indicate a joint effect greater
than that expected under an additive model, "public
health synergism" is present, to a moderate degree (we
expect a 100% increase in risk but observe a 220%
increase in risk).
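The additive and multiplicative comparisons above, and the stratum-specific odds ratio in question 17, can be reproduced from the four corner cells of Table 3. A minimal Python sketch, treating the odds ratios as approximate risk ratios as the answer does; the dictionary layout is just one convenient choice. The last quantity, observed minus expected joint excess (often called the relative excess risk due to interaction, RERI), simply restates the difference 2.2 - 1.0.

```python
# Corner cells of Table 3 (intermediate categories ignored).
# Keys are (Heberden's nodes, BMI third); ("none", "lowest") is the reference.
table3 = {
    ("none", "lowest"): 1.0,
    ("none", "highest"): 1.6,
    ("definite", "lowest"): 1.4,
    ("definite", "highest"): 3.2,
}

or_nodes = table3[("definite", "lowest")]   # Heberden's nodes alone
or_bmi = table3[("none", "highest")]        # high BMI alone
or_joint = table3[("definite", "highest")]  # both factors

# Question 17: definite vs. no nodes within the highest BMI third.
or_nodes_in_high_bmi = or_joint / or_bmi

# Additive model on the excess-relative-risk scale (OR ~ RR for a rare outcome).
expected_joint_excess = (or_nodes - 1) + (or_bmi - 1)
observed_joint_excess = or_joint - 1
reri = observed_joint_excess - expected_joint_excess

# Multiplicative model.
expected_joint_or = or_nodes * or_bmi

print(f"Q17 OR = {or_nodes_in_high_bmi:.1f}")
print(f"additive: expected excess {expected_joint_excess:.1f}, "
      f"observed {observed_joint_excess:.1f}, RERI {reri:.1f}")
print(f"multiplicative: expected OR {expected_joint_or:.2f}, observed {or_joint}")
```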
19. The authors investigated the association of specific
sporting activities with risk of hip osteoarthritis. Their data
are presented in Table 5. Using their data, compute
separately the unadjusted (crude) odds ratio for osteoarthritis
associated with playing golf and with swimming, in men and
women combined. Consider those who do not participate in
any sport as the reference group and assume no missing
data. Show two appropriate 2x2 tables and your calculations.
(4 pts)
Golfers     Cases    Controls
YES           51        34
NO           140       162
OR = (51 x 162) / (34 x 140) = 1.7
Swimming    Cases    Controls
YES          156       110
NO           140       162
OR = (156 x 162) / (110 x 140) = 1.6
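Both crude odds ratios are cross-product ratios from the 2x2 tables above. A small Python sketch (the helper name `odds_ratio` is hypothetical):

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio for a 2x2 table laid out as
    exposed:   a cases, b controls
    unexposed: c cases, d controls
    """
    return (a * d) / (b * c)

# "No sport" is the reference (unexposed) row in both tables.
NO_SPORT_CASES, NO_SPORT_CONTROLS = 140, 162

or_golf = odds_ratio(51, 34, NO_SPORT_CASES, NO_SPORT_CONTROLS)
or_swim = odds_ratio(156, 110, NO_SPORT_CASES, NO_SPORT_CONTROLS)

print(f"golf OR = {or_golf:.2f}, swimming OR = {or_swim:.2f}")
```

Printed to two decimals these are 1.74 and 1.64, which round to the 1.7 and 1.6 shown.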
19a. Compare these unadjusted (crude) odds ratios with the
ones presented in Table 3. Briefly describe and explain the
comparison. (3 pts)
The table shows 1.4 and 1.5, respectively, only modestly lower than the
crude odds ratios of 1.7 and 1.6. This suggests that BMI, Heberden's nodes,
and hip injury explain very little of the association of these two sports
with hip osteoarthritis.
19b. Consider the possibility that golfers who have hip
osteoarthritis are reluctant to seek medical attention for their
condition for fear it will mean the end of their ability to play
golf. Therefore, cases who golf are less likely to be selected
for this study than cases who do not golf. If the true OR
associated with golf is 2.0, then which of the following best
describes the selection bias and its impact on the odds ratio
you computed. (3 pts)
a. non-differential selection bias resulting in an odds
ratio biased toward the null.
b. non-differential selection bias resulting in an odds
ratio biased away from the null.
c. differential selection bias resulting in an odds ratio
biased away from the null.
d. differential selection bias resulting in an odds ratio
biased toward the null. (answer)
19c. The authors state that "...the association with swimming
may have arisen because patients with hip osteoarthritis
were advised to swim..." (page 521). Suppose that 25% of the
cases had been incorrectly classified as swimmers and
assume that the misclassified cases had not participated in
any other sporting activity, either. Re-compute the odds ratio
for the association of hip osteoarthritis and swimming, after
re-classifying these individuals, using the numbers from the
2x2 table in question 19 above. Briefly discuss how your
conclusion about the role of swimming does (or does not)
change. In what direction did misclassification bias the study
OR? (3 pts)
Swimming    Cases                  Controls
YES         156 - 25% = 117          110
NO          140 + 39 = 179           162
OR = (117 x 162) / (110 x 179) = 0.96: after reclassification there is
essentially no association with swimming. The misclassification was
differential (it affected only cases) and biased the odds ratio upward,
away from the null.
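The reclassification step can be sketched the same way, moving 25% of the case swimmers (39 people) into the no-sport row as the question instructs. The helper is redefined so the snippet stands alone.

```python
def odds_ratio(a, b, c, d):
    # exposed: a cases, b controls; unexposed: c cases, d controls
    return (a * d) / (b * c)

swim_cases, swim_controls = 156, 110
ref_cases, ref_controls = 140, 162

# Assumption from the question: 25% of the case "swimmers" were misclassified
# and actually did no sport, so they move to the reference row.
moved = round(0.25 * swim_cases)  # 39 cases
corrected = odds_ratio(swim_cases - moved, swim_controls,
                       ref_cases + moved, ref_controls)

print(f"moved {moved} cases; corrected OR = {corrected:.2f}")
```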
20. The odds ratio (95% confidence interval) estimating the
risk of osteoarthritis associated with a previous hip injury
was 24.8 (3.1-199.3) in men and 2.8 (1.4-5.8) in women (see
Table 2).
a. Which estimate indicates a stronger association? (2 pts)
24.8 (3.1-199.3)
b. Which estimate is more precise? (2 pts) 2.8 (1.4-5.8)
c. Which estimate is more compatible with a population
odds ratio of 4.0? (2 pts) 2.8 (1.4-5.8)
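One way to make parts b and c concrete is to compare the confidence limit ratios (upper / lower limit; smaller means more precise) and to ask how far each estimate lies from 4.0 in standard-error units. The sketch below assumes each 95% interval was computed symmetrically on the log scale (both intervals are roughly centered on their point estimates in that sense); the helper name is hypothetical.

```python
import math

def log_scale_summary(or_hat, lower, upper, null_or):
    """Confidence limit ratio and a z-score against a hypothesized OR,
    assuming the 95% CI was symmetric on the log scale."""
    clr = upper / lower                                    # smaller = more precise
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)  # SE of ln(OR)
    z = (math.log(or_hat) - math.log(null_or)) / se
    return clr, z

men = log_scale_summary(24.8, 3.1, 199.3, null_or=4.0)
women = log_scale_summary(2.8, 1.4, 5.8, null_or=4.0)

print(f"men:   CLR = {men[0]:.1f}, z vs. 4.0 = {men[1]:.2f}")
print(f"women: CLR = {women[0]:.1f}, z vs. 4.0 = {women[1]:.2f}")
```

Under that assumption the women's estimate lies within about one standard error of 4.0, while the men's lies about 1.7 standard errors away, consistent with the answers given.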
21. Which one of the statements best interprets the following
passage? (3 pts)
"In a previous case-control study (17) of men aged 60-76 years, we observed a doubling of risk for hip
osteoarthritis among those in the highest third of body
mass index distribution, as compared with those in the
lowest third, although the increased risk was not
statistically significant." (p519 bottom of right column)
a. Hip osteoarthritis is not as significant when it occurs
in obese older patients, because it is expected that
overweight that lasts for many years will lead to
damage to the joints.
b. A doubling of risk is not significant from a statistical
perspective, because it represents only a moderate
association.
c. The doubling of risk was not statistically significant
because a p-value was not computed, so it is not
possible for the authors to know whether the increased
risk was due to chance.
d. If 1,000 independent random samples the same size
as that study population were drawn from a population
with no increased risk of hip osteoarthritis, fewer than
950 would have an OR between 0.5 and 2.0. (answer)
e. If 1,000 independent random samples the same size
were drawn from a population with a doubling of risk of hip
osteoarthritis for the highest third of the body mass
distribution, as compared with the lowest third, more than
5% of the samples would display no elevation in risk.
f. If 1,000 independent random samples the same size
were drawn from a population with a doubling of risk of hip
osteoarthritis for the highest third of the body mass
distribution, as compared with the lowest third, fewer than
80% would display an association of that magnitude.
Answer: d. "Statistically significant", as conventionally
used, means that in the absence of any true association
a model based on chance would yield an association as
strong or stronger than the observed value less than
5% of the time.
22. A medical journalist, confused by the thrust of this article,
comes to you and says: "I've read this article several times,
but I can't figure out what it shows about the relationship of
body mass index, Heberden's nodes, and hip osteoarthritis.
The authors explain that 'two broad mechanisms are believed
to underlie the pathogenesis of osteoarthritis at any joint site:
mechanical stress and a generalized predisposition to the
disorder' as indexed by Heberden's nodes [p519 right
column]. That seems straightforward enough, and they later
conclude that the analysis 'supports the notion that this
condition arises through an interaction between a
generalized predisposition to the disorder and specific
mechanical insults to the hip' [p521]. Yet on page 518 [right
column], the authors state that there was 'no statistically
significant interaction' between body mass index and
Heberden's nodes, and on page 519 [left column] they refer
to obesity and a tendency to polyarticular involvement as
'independent risk factors for hip osteoarthritis'. Would you
please assess for me what this article shows about the
relationship among body mass index, Heberden's nodes, and
hip osteoarthritis? I have room for 40-60 words. Thanks!" (6
pts)
Points to include:
1. Both body mass index and presence of Heberden's
nodes were associated with greater risk of hip
osteoarthritis, even when the other is absent.
2. People with both elevated BMI and Heberden's
nodes have a greater risk for hip osteoarthritis than
people with only one of these risk factors and even
greater than would be expected from adding or
multiplying their individual effects (i.e., greater than
expected under either the additive or the multiplicative model).
3. The authors seem to believe, and the study does not
show otherwise, that most cases of hip osteoarthritis in
their study result from a combination of mechanical
stress (which could be something other than obesity)
and biologic predisposition (which might not yet have
manifested in other joints).
4. The paper presents no biological theory or other
information suggesting a mechanistic interaction
between obesity and osteoarthritis at other sites in
regard to hip osteoarthritis, but rather discusses a
possible etiologic role for each individually.
Grading: 6 points for 3 of these, 5 points for two of
them, 3 points for one. If none was mentioned then 1-2
points awarded depending upon the relevance and
accuracy of what was written.
23. Write a brief statement for or against a causal
relationship between hip injury and risk of osteoarthritis.
Comment specifically on at least two of Bradford Hill's
criteria for causal inference. Support your conclusion with
data or statements from the article. (4 pts)
NOTE: This exam is illustrative only. It proved somewhat on the easy
side, and a number of the questions were problematic.
1. Match the term from column A with the most appropriate topic or
concept from column B (use each term only once and each topic only
once). (1 pt each = 12 pts)
Column A - Terms                           Column B - Topics
____ cumulative incidence                   1. Case-control studies
____ incidence density                      2. Causal inference
____ prevalence                             3. Confounds cross-sectional data
____ dose response                          4. Death certificate
____ induction period                       5. Descriptive epidemiology
____ odds ratio                             6. Diagnostic tests
____ preventive fraction in the exposed     7. Estimates risk
____ underlying cause of death              8. Measures impact
____ positive predictive value              9. Natural history of disease
____ detectable, pre-clinical phase        10. Population screening
____ migrant studies                       11. Proportion
____ cohort effect                         12. Relative rate
2. Which of the following best describes the basis of the diagnosis of
myocardial infarction? (Choose one best answer) (4 pts)
____ a. manifestational criteria
____ b. Bradford criteria
____ c. causal criteria
____ d. etiologic criteria
3. In the Minnesota Heart Health Program (as described in class) and
many other community intervention studies, the effectiveness of an
educational intervention program is evaluated. Which of the following
selections best describes the unit of assignment, the unit of
observation, and the unit of analysis (in this order) in studies of
these types? (Choose one best answer) (4 pts)
____ a. community, person, community
____ b. person, community, community
____ c. community, community, community
____ d. none of the above
4. In a hypothetical clinical trial, a new drug was compared with
"standard therapy" treatment. The endpoint was myocardial infarction.
Which of the following best describes the primary reason to randomize
patients to treatments? (Choose one best answer) (4 pts)
____ a. to create two treatment groups that are similar at baseline on
both known and unknown factors associated with myocardial infarction.
____ b. to prevent bias introduced when the patients know what type of
treatment they are receiving
____ c. to prevent bias introduced when the investigators know what type
of treatment the patients are receiving
____ d. b and c
5. Indicate TRUE or FALSE next to each of the following statements.
(2 pts each)
____ a. The indirect method of age standardization applies stratum-specific
rates from an external population to the age distribution of the study
population.
____ b. A standardized mortality ratio is an example of a stratum-specific
crude rate.
____ c. Standardized mortality ratios are preferred for making comparisons
among multiple populations.
____ d. Direct age standardization can be characterized as applying the
same set of weights to the age-specific rates of populations to be
compared.
6. Two hundred women with a history of chest pain were assessed by an
exercise tolerance test (ETT). Compared with coronary angiography (the "gold
standard"), ETT had a sensitivity of 68% for detecting coronary artery
disease, with specificity 61%. The predictive value of a negative ETT was
higher in younger women (less than 52 years old) and in women with no more
than one risk factor (i.e., family history, hypertension, high cholesterol,
smoking, or diabetes). If sensitivity and specificity do not vary by age
or risk factor status, why is the higher negative predictive value expected?
(3 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
7. A randomized trial studied 242 HIV-seropositive, 2nd-trimester
pregnant women to assess the efficacy of zidovudine (AZT) in
preventing perinatal HIV transmission. Results were:
Results from a randomized trial of the efficacy of
zidovudine in preventing perinatal HIV transmission
___________________________________________________________________
                                       Births (no.)
Infection status of infant     Zidovudine    Placebo    All
Non-infected                       112          90      202
HIV-infected                         9          31       40
All                                121         121      242
Transmission rate (%)              7.4        25.6     16.5
___________________________________________________________________
7A. Which one answer best describes the transmission rate in the table?
(4 pts)
____ a. proportion
____ b. relative rate
____ c. absolute rate
____ d. odds
7B.
Using the data in the table, estimate the relative risk of HIV
infection for infants whose mothers took zidovudine relative to
infants of mothers who took placebo. Show formula and
calculations. (4 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
7C. Based on the data in the above table, estimate the proportion of
potential cases of perinatal HIV transmission that could be
prevented by providing zidovudine to HIV-positive, 2nd-trimester pregnant
women who would otherwise not receive the drug. (Assume all women take the
medication and consider only singleton births.) Show formula or diagram
and calculations. (4 pts)
7D. Zidovudine is now routinely offered in association with all
pregnancies to known HIV-seropositive mothers in the United States.
However, growth of resistant strains will reduce the drug's effectiveness
in preventing perinatal HIV transmission. Observational studies for
assessing zidovudine's effectiveness have serious methodologic problems,
but which of the following case-control designs would be the most nearly
valid? (Choose one best answer.) (4 pts)
____ a. Cases are HIV-infected infants; controls are uninfected infants.
____ b. Cases are HIV-infected infants; controls are uninfected infants
of HIV-seropositive mothers.
____ c. Cases are HIV-infected infants; controls are infants whose
mothers should have received zidovudine but did not.
____ d. Cases are HIV-infected infants whose mothers received
zidovudine; controls are uninfected infants whose mothers received
zidovudine.
8. The following is background information for questions 8A-8E.
Objective: To determine the prevalence of sexually transmitted
diseases (STD) and high-risk sexual behavior for STD among adolescent males
admitted to a juvenile detention facility.
Methods: Data were obtained from interview, exam, and lab tests.
Results:
Table 1. Behavioral variables in 966 subjects
___________________________________________________________________
Variable                        Mean (SD)      Range     Median
Age at first coitus             12.3 (2.0)     5-17        13
No. lifetime partners           13.7 (16.8)    1-100        8
No. partners past 4 months       2.9 (3.4)     0-30         2
No. weeks since last sex         5.8 (15.1)    1-260        2
___________________________________________________________________
SD = standard deviation
8A. Which of the descriptive statistics in Table 1 (mean, SD, range,
median) is most susceptible to being influenced by a single extreme
value? (Choose one best answer.) (4 pts)
a. mean
b. SD
c. range
d. median
8B. Of the four variables in Table 1, which has the most symmetrical
(normal-like) distribution? (Choose one best answer.) (4 pts)
a. age at first coitus
b. number of lifetime partners
c. number of partners in the past 4 months
d. number of weeks since last sex
Table 2. Sexually transmitted diseases in adolescent males
admitted to a juvenile detention facility.
______________________________________________________
Disease               No. positive/tested
Syphilis                  7/930
Gonorrhea                42/940
Chlamydia                66/957
Any of the above        109/908
_______________________________________________________
8C. Based on the above data and assuming that gonorrhea and chlamydia have
the same average duration, how do their incidence rates compare in
this population? (Choose the one correct answer.) (3 pts)
a. Incidence of gonorrhea is lower than that of chlamydia.
b. Incidence of gonorrhea is the same as that of chlamydia.
c. Incidence of gonorrhea is higher than that of chlamydia.
8D. Based on the above data but this time assuming that the two
diseases have the same incidence, how do their average durations compare in
this population? (Choose the one correct answer.) (2 pts)
a. Duration of gonorrhea is shorter than that of chlamydia.
b. Duration of gonorrhea is longer than that of chlamydia.
8E. Elaborate on your answer to the preceding question by deriving an
estimate of the duration of gonorrhea relative to chlamydia. Show
the basis for your answer. (3 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
9. The following is background information for questions 9A-9D.
In a large urban school district, among 8,000 middle-school youth
who were well at the beginning of the school year, 400 were absent for 10
days or longer due to acute asthma ("AA-10") during the first nine-week
quarter. Based on a survey believed accurate for the period, 15% of
middle-school youth in the county middle schools smoke cigarettes.
Interviews with the youth who were absent for 10 days or longer revealed
that 100 of them were cigarette smokers. Assume that the school enrollment
does not change during the quarter.
9A. Show these data in the form of a 2 x 2 table. Include an
appropriate title, labels that identify each row and column, and row and
column totals. (4 pts)
9B. What is the cumulative incidence (CI) of AA-10 (10+ absent days due
to acute asthma), in:
a. the cohort of 8,000 youth? (1 pt)
b. youth who smoke cigarettes? (1 pt)
c. youth who do NOT smoke cigarettes? (1 pt)
9C. What measure would you use to quantify the strength of association
between cigarette smoking and AA-10? Show the formula for this measure,
substitute the appropriate numbers for that formula, compute the result,
and state its meaning in one sentence. (4 pts)
a. Formula
b. Substitution
c. Result
d. Meaning ____________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
9D. Assuming that cigarette smoking is responsible for the observed
excess in AA-10, how many cases of AA-10 during the quarter are
attributable to cigarette smoking? Show a relevant formula or diagram,
intermediate computation, and result, and give a sentence stating the
meaning of the result. (4 pts)
a. Formula or diagram
b. Substitution
c. Result
d. Meaning ____________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
10. Suppose that 900 of the subjects in question #8 consent to regular
STD screening following release from detention. Subjects are counseled
about preventive measures and screened every three months for two years.
All cases are treated and cured.
Table 3. Numbers of cases of three sexually transmitted diseases
in adolescent males discharged from a juvenile detention facility
____________________________________________________________________
Follow-up time (months)     3     6     9    12    15    18    21    24
Syphilis                    0     1     0     3     1     2     3     4
Gonorrhea                  10     8    15    21    11    12    19    24
Chlamydia                  15    23     8    18    17    17    14    11
Dropouts (cumulative)      10    30    50    90   120   140   190   270
Number tested             890   870   850   810   780   760   710   630
____________________________________________________________________
(Subjects can become infected with the same organism more than once
and/or become co-infected with more than one organism.)
10A. What is the prevalence of chlamydia at the 12-month follow-up? (3 pts)
10B. What is the average incidence density (per 100 person-months or per
100 person-years) of chlamydia for the two years of follow-up?
Assume that: dropouts contribute no time to follow-up after the last time
they are tested; subjects remain at risk even while infected. (3 pts)
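Under one reading of the assumptions stated in 10B (every subject tested at a visit contributes the full 3-month interval before it; dropouts contribute nothing after their last test; infected subjects stay at risk), the chlamydia prevalence and incidence density can be sketched from Table 3 as follows. The person-time rule is an assumption of this sketch.

```python
# Chlamydia counts and numbers tested at each 3-month screening (Table 3).
months = [3, 6, 9, 12, 15, 18, 21, 24]
chlamydia = [15, 23, 8, 18, 17, 17, 14, 11]
tested = [890, 870, 850, 810, 780, 760, 710, 630]
INTERVAL = 3  # months between screenings

# 10A: point prevalence at the 12-month screening.
i12 = months.index(12)
prevalence_12 = chlamydia[i12] / tested[i12]

# 10B (sketch): each person tested at a visit contributes the 3-month
# interval before it; dropouts add nothing after their last test, and
# infected subjects remain in the denominator.
person_months = INTERVAL * sum(tested)
cases = sum(chlamydia)
id_per_100_pm = 100 * cases / person_months

print(f"prevalence at 12 months = {prevalence_12:.1%}")
print(f"{cases} cases / {person_months} person-months = "
      f"{id_per_100_pm:.2f} per 100 person-months "
      f"({12 * id_per_100_pm:.1f} per 100 person-years)")
```

With these assumptions the denominator is 3 x 6,300 = 18,900 person-months for 123 chlamydia infections.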
10C. Give two reasons for preferring incidence density over cumulative
incidence for assessing frequency of infection in this cohort. (6 pts)
i. ___________________________________________________________
_______________________________________________________________
ii. ___________________________________________________________
_______________________________________________________________
11. A study of alcoholism and major depressive disorder recruited 100
consecutive patients in a Veterans Administration hospital in
Urbana, Illinois. All patients had been diagnosed as being alcohol
abusers. An equal number of non-abusers were selected randomly from the
same VA hospital. Of the participants identified as being abusers, 76
fulfilled criteria for major depression, as did 20 of the non-abusers.
Evaluate the evidence provided by this study for the inference that alcohol
abuse causes depression in relation to the following aspects:
11A. What is an inherent weakness in this design that makes it
susceptible to obtaining inaccurate data? (3 pts)
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
11B. Many of the criteria for causal inference pertain to the evaluation
of evidence from multiple studies, but several can also apply to a single
study. Name two (2) such criteria and use them to evaluate (quantitatively
where possible) the evidence from the above study. (6 pts)
i. ___________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
ii. ___________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
_______________________________________________________________
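For the strength-of-association criterion in 11B, the counts in question 11 can be summarized as a prevalence ratio and a prevalence odds ratio. This Python sketch only does the arithmetic; which measure best suits this design, and how to weigh it, is left to the answer.

```python
# Counts from question 11: 100 alcohol abusers and 100 non-abusers,
# classified by whether they met criteria for major depression.
abusers_depressed, abusers_total = 76, 100
nonabusers_depressed, nonabusers_total = 20, 100

p1 = abusers_depressed / abusers_total        # prevalence of depression, abusers
p0 = nonabusers_depressed / nonabusers_total  # prevalence of depression, non-abusers

prevalence_ratio = p1 / p0
prevalence_odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))

print(f"prevalence ratio = {prevalence_ratio:.1f}, "
      f"prevalence odds ratio = {prevalence_odds_ratio:.1f}")
```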
Answer Guide
1. Matching (1 pt each):
Column A - Terms                                 Column B - Topics
 7  cumulative incidence (11 is ok)               1. Case-control studies
12  incidence density                             2. Causal inference
11  prevalence (7 is ok)                          3. Confounds cross-sectional data
 2  dose response                                 4. Death certificate
 9  induction period                              5. Descriptive epidemiology
 1  odds ratio                                    6. Diagnostic tests
 8  preventive fraction in the exposed            7. Estimates risk
 4  underlying cause of death                     8. Measures impact
 6  positive predictive value                     9. Natural history of disease
10  detectable, pre-clinical phase               10. Population screening
 5  migrant studies                              11. Proportion
 3  cohort effect                                12. Relative rate
(Credit was also given for some other pairings.)
2. Diagnosis of myocardial infarction is based on manifestational
criteria. (4 pts)
3. a. community, person, community (units of assignment, observation,
and analysis, respectively, in the Minnesota Heart Health Program). (4 pts)
4. a. to create two treatment groups that are similar at baseline on
both known and unknown factors associated with myocardial infarction. (4 pts)
5. Age standardization, True or False (2 pts each):
T a. The indirect method of age standardization applies the
stratum-specific rates from an external population to
the age distribution of the study population.
F b. A standardized mortality ratio is an example of a stratum-specific
crude rate.
F c. Standardized mortality ratios are useful when the number of
events is small and multiple comparisons among populations are to be made.
T d. Direct age standardization can be characterized as applying the
same set of weights to the age-specific rates of the populations to be
compared.
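To make the distinction in 5a and 5d concrete, here is a minimal Python sketch of direct and indirect standardization. All numbers in it are hypothetical and chosen only for illustration; the function names are also illustrative.

```python
# Hypothetical numbers for illustration only (not from the exam).
std_pop = [5000, 3000, 2000]     # standard population by age stratum
rates_a = [0.001, 0.010, 0.050]  # age-specific rates, population A
rates_b = [0.002, 0.012, 0.040]  # age-specific rates, population B

def direct_rate(rates, weights):
    """Direct standardization: a weighted average of a population's own
    age-specific rates, using the same standard weights for every population."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

def smr(observed, study_counts, std_rates):
    """Indirect standardization: apply external (standard) age-specific rates
    to the study population's age distribution to get expected deaths."""
    expected = sum(n * r for n, r in zip(study_counts, std_rates))
    return observed / expected

print(f"directly standardized rates: A = {direct_rate(rates_a, std_pop):.4f}, "
      f"B = {direct_rate(rates_b, std_pop):.4f}")

# Hypothetical study population of 6,000 with 200 observed deaths,
# using population A's rates as the standard.
print(f"SMR = {smr(200, [1000, 2000, 3000], rates_a):.2f}")
```

Because each SMR uses its own study population's age distribution as the weights, two SMRs are not standardized to a common structure, which is the point behind statement 5c.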
6. Predictive value depends on sensitivity, specificity, and prevalence.
For a given sensitivity and specificity, higher prevalence means higher
positive predictive value, and lower prevalence means higher negative
predictive value. Prevalence of coronary artery disease is lower in women
who are younger and have few risk factors, so negative predictive value is
higher in this group. (3 pts)
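The prevalence dependence can be illustrated with Bayes' theorem, using the ETT sensitivity (68%) and specificity (61%) from question 6. The two prevalence values are hypothetical, since the exam does not report them.

```python
def predictive_values(sens, spec, prev):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    tp = sens * prev               # true positives (per unit population)
    fp = (1 - spec) * (1 - prev)   # false positives
    tn = spec * (1 - prev)         # true negatives
    fn = (1 - sens) * prev         # false negatives
    return tp / (tp + fp), tn / (tn + fn)

SENS, SPEC = 0.68, 0.61  # ETT vs. angiography, from question 6

# Hypothetical prevalences of coronary artery disease in two groups of women.
for prev in (0.20, 0.50):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With sensitivity and specificity held fixed, the lower-prevalence group gets the higher NPV and the lower PPV.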
7A. a. proportion -- The "transmission rate" is the number of HIV-infected
infants divided by the total number of births in that group. The
proportion estimates the prevalence of HIV infection in these infants. The
proportion also estimates the cumulative incidence of HIV infection in babies
of 2nd-trimester, HIV-infected pregnant women. Cumulative incidence measures
for birth outcomes are a complex matter, because of the great opportunity
for selection bias due to impaired fecundity and fertility, and
unrecognized pregnancy loss. In this case, however, the exposure occurs
after the pregnancy has been recognized. (4 pts)
7B. Relative risk of HIV infection for zidovudine vs. placebo:
Relative risk (RR) = CI1 / CI0 = 7.4% / 25.6% = 0.29
The transmission rates serve as estimates of CI1 and CI0 (the
incidences can be estimated from the transmission rates even if the latter
are regarded as prevalences, since there is a restricted risk period and
duration is not a factor). (4 pts)
7C. Proportion of potential cases of perinatal HIV transmission that
could be prevented by zidovudine, i.e., the preventive fraction in the
exposed, PF1 (all women take zidovudine, so all are exposed) (4 pts):
PF1 = 1 - RR = 1 - 0.29 = 0.71 or 71%
By diagram (vertical axis: HIV transmission rate):
 25.6% |- - - - - - - - - - - -  25.6% transmission rate in women who do not
       |          ^              take zidovudine (based on the placebo group)
       |          |
       |          |              Amount of the transmission rate that is
       |          |              prevented by zidovudine
       |          v
  7.4% |_______________________  7.4% transmission rate in women who took
       |                         zidovudine
       |
    0% |_______________________
PF1 = (25.6% - 7.4%) / 25.6% = 1 - 7.4% / 25.6% = 0.71 (= 1 - RR)
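The calculations in 7B and 7C can be reproduced from the counts in the question 7 table. In this sketch the 9 infected infants in the zidovudine group are 121 - 112, the births minus the non-infected infants.

```python
# Counts from the question 7 table (infected infants / births).
infected_zdv, births_zdv = 9, 121
infected_placebo, births_placebo = 31, 121

ci1 = infected_zdv / births_zdv          # cumulative incidence, zidovudine
ci0 = infected_placebo / births_placebo  # cumulative incidence, placebo

rr = ci1 / ci0
pf_exposed = 1 - rr  # preventive fraction in the exposed (all women treated)

print(f"CI1 = {ci1:.1%}, CI0 = {ci0:.1%}, RR = {rr:.2f}, PF1 = {pf_exposed:.0%}")
```

Working from counts gives RR = 9/31 = 0.29 and PF1 = 71%, matching the percentages above.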
7D. b. Cases are HIV-infected infants; controls are uninfected infants
of HIV-seropositive mothers. Using all uninfected infants as controls
would make zidovudine appear to be a risk factor for HIV transmission:
most mothers do not have HIV, so their infants will be uninfected and
unexposed to zidovudine. Choices c. and d. choose the control and/or case
group partly on the basis of exposure, which completely undermines a
case-control design. (4 pts)
8A. c. Range -- the range is in fact completely determined by the
highest and lowest values. (4 pts)
8B. a. Age at first coitus -- its mean and median are close together
and not very far from the middle of the range. The mean and median are
also close together for the number of partners in the past 4 months,
but they are nowhere near the middle of the range. (4 pts)
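The reasoning in 8B compares each mean with its median and with the middle of the range. A rough Python sketch of that heuristic follows; the scoring rule (distance of the mean from the mid-range, as a fraction of the range) is an assumption of this sketch, not a standard skewness statistic.

```python
# Summary statistics from Table 1: (mean, median, minimum, maximum).
table1 = {
    "age at first coitus": (12.3, 13, 5, 17),
    "no. lifetime partners": (13.7, 8, 1, 100),
    "no. partners past 4 months": (2.9, 2, 0, 30),
    "no. weeks since last sex": (5.8, 2, 1, 260),
}

def midrange_offset(mean, lo, hi):
    """Distance of the mean from the middle of the range, as a fraction
    of the range (a rough symmetry heuristic)."""
    return abs(mean - (lo + hi) / 2) / (hi - lo)

for name, (mean, median, lo, hi) in table1.items():
    print(f"{name:28s} mean-median gap = {abs(mean - median):4.1f}, "
          f"mean offset from mid-range = {midrange_offset(mean, lo, hi):.2f}")

most_symmetric = min(
    table1, key=lambda k: midrange_offset(table1[k][0], table1[k][2], table1[k][3])
)
print("most symmetric by this heuristic:", most_symmetric)
```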
8C. a. Incidence of gonorrhea is lower than that of chlamydia. If
duration is the same for both diseases, the prevalence odds are
proportional to the incidence density, so gonorrhea's smaller prevalence
(42/940 vs. 66/957) implies a lower incidence. (3 pts)
8D. a. Duration of gonorrhea is shorter than that of chlamydia. If
incidence rates are the same, chlamydia must last longer in order for its
prevalence to be higher. (2 pts)
8E. (3 pts)
Prevalence odds = duration x incidence density. Therefore:
prevalence odds (gonorrhea)      duration(G) x incidence density
---------------------------  =  -------------------------------
prevalence odds (chlamydia)      duration(C) x incidence density
Since both diseases have the same incidence, the ratio of their
durations equals the ratio of their prevalence odds:
prev. odds for gonorrhea      42 / 898      0.0468
------------------------  =  ---------  =  ------  =  0.63
prev. odds for chlamydia      66 / 891      0.0741
(Credit was also given for "prevalence = incidence x duration", though this is
true only approximately.)
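The 8E ratio can be checked directly from the Table 2 counts. The sketch also computes the ratio of the proportions, which corresponds to the approximate formula "prevalence = incidence x duration" mentioned above.

```python
# Positives / tested from Table 2.
gon_pos, gon_tested = 42, 940
chl_pos, chl_tested = 66, 957

def prevalence_odds(pos, tested):
    return pos / (tested - pos)

# With equal incidence density, prevalence odds = duration x incidence density,
# so the duration ratio equals the prevalence-odds ratio.
duration_ratio = prevalence_odds(gon_pos, gon_tested) / prevalence_odds(chl_pos, chl_tested)

# The approximation "prevalence = incidence x duration" uses proportions instead.
approx_ratio = (gon_pos / gon_tested) / (chl_pos / chl_tested)

print(f"duration(G) / duration(C) = {duration_ratio:.2f} "
      f"(approximation: {approx_ratio:.2f})")
```

The prevalence-odds ratio is 0.63; the proportion-based approximation gives about 0.65.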
9A. School absence from acute asthma and cigarette smoking (4 pts):
School absence due to acute asthma in middle school by cigarette smoking
status
                 AA-10*     Absent fewer than 10 days     Total
Smokers            100                1,100               1,200**
Nonsmokers         300                6,500               6,800
                 ------              ------              ------
Total              400                7,600               8,000
* AA-10 refers to absence 10+ days due to acute asthma.
** Based on 15% smoking prevalence.
9B. Cumulative incidence of AA-10:
a. Crude CI = 400 / 8,000 = 50 per 1,000 or 5%
b. CI in smokers = 100 / 1,200 = 83 per 1,000 or 8.3%
c. CI in nonsmokers = 300 / 6,800 = 44 per 1,000 or 4.4%
9C.
Strength of association (4 pts):
$$\text{Cumulative incidence ratio} = \frac{\text{CI in smokers}}{\text{CI in nonsmokers}} = \frac{8.3\%}{4.4\%} = 1.89$$
d. The cumulative incidence ratio (CIR) of 1.9 indicates a moderate
association between cigarettes and extended school absence.
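The 9B and 9C figures follow directly from the 9A table. A minimal Python sketch, with variable names chosen here for illustration:

```python
# Cumulative incidence (CI) and cumulative incidence ratio (CIR) for 9B-9C,
# using the counts from the 9A table.
cases_smokers, n_smokers = 100, 1_200
cases_nonsmokers, n_nonsmokers = 300, 6_800

ci_crude = (cases_smokers + cases_nonsmokers) / (n_smokers + n_nonsmokers)
ci_smokers = cases_smokers / n_smokers
ci_nonsmokers = cases_nonsmokers / n_nonsmokers
cir = ci_smokers / ci_nonsmokers

print(f"crude CI      = {ci_crude:.1%}")       # 5.0%
print(f"CI smokers    = {ci_smokers:.1%}")     # 8.3%
print(f"CI nonsmokers = {ci_nonsmokers:.1%}")  # 4.4%
print(f"CIR           = {cir:.2f}")            # 1.89
```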
9D. Number of cases of excessive absence due to acute asthma (AA-10)
that (assuming causation) are attributable to smoking.
This question asks for the size of the shaded box in the diagram in
the "evolving text". That diagram, with numbers instead of variables, is:
[Diagram: incidence by exposure group. Smokers (exposed), 1,200 persons (15%): incidence 8.3%, made up of the "attributable risk" of 3.9% (shaded: 3.9% x 1,200 = 47 persons) and a background incidence of 4.4% (4.4% x 1,200 = 53 persons). Nonsmokers (unexposed), 6,800 persons: incidence 4.4% (300 cases).]
So the number of cases attributable is 47 (after rounding).

This number can be obtained in various ways:
Number of cases in smokers - "expected" cases in smokers:
$$100 - 1{,}200 \times 4.4\% \approx 47$$
Attributable risk x Number of smokers:
$$(I_1 - I_0) \times 1{,}200 = (8.3\% - 4.4\%) \times 1{,}200 \approx 47$$
Number of cases in smokers x Attributable risk proportion (ARP):
$$100 \times \frac{1.89 - 1}{1.89} \approx 47$$
Overall number of cases x Population attributable risk proportion (PARP):
$$400 \times \frac{I - I_0}{I} = 400 \times \frac{5\% - 4.4\%}{5\%} = 400 \times 12\% = 48$$
All these methods come up with approximately the same answer, the
differences being due to the rounding of intermediate results in obtaining
some of the incidences and the CIR. When the numbers from the table are
used and intermediate results are not rounded, the number of cases attributable
to smoking is 47.0588.
Assuming causation, cigarette smoking is responsible for heavy absence (10
days or more during the fall quarter) due to acute asthma in about 47
middle schoolers in the district, or 12% of all students with heavy absence
due to acute asthma.
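As a check on the statement that the four routes agree once rounding is removed, the following Python sketch recomputes the attributable number each way from the unrounded 9A counts. The variable names are illustrative.

```python
# Cases attributable to smoking (9D), computed four ways without rounding.
a, n1 = 100, 1_200       # AA-10 cases and number of smokers
c, n0 = 300, 6_800       # AA-10 cases and number of nonsmokers

i1, i0 = a / n1, c / n0              # incidence in smokers, nonsmokers
i = (a + c) / (n1 + n0)              # overall incidence
cir = i1 / i0                        # cumulative incidence ratio

excess = a - n1 * i0                 # observed minus "expected" cases in smokers
via_ar = (i1 - i0) * n1              # attributable risk x number of smokers
via_arp = a * (cir - 1) / cir        # cases in smokers x ARP
via_parp = (a + c) * (i - i0) / i    # all cases x PARP

for label, value in [("excess", excess), ("AR", via_ar), ("ARP", via_arp), ("PARP", via_parp)]:
    print(f"{label:6s} {value:.4f}")   # each line shows 47.0588
print(f"share of all AA-10 cases: {excess / (a + c):.0%}")   # 12%
```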
10A
Prevalence of chlamydia at the 12-month follow-up (3 pts):
$$\text{Prevalence} = \frac{\text{Cases}}{\text{PAR (population at risk)}} = \frac{18\ \text{cases found at 12-month follow-up}}{810\ \text{youth tested at 12-month follow-up}} = 2.2\%$$
10B Average incidence density of chlamydia (average simply means one
number that applies to the entire two-year interval, rather than one rate
for each three-month interval; if you compute the latter rates, however,
and take their average weighted by person-time, you will obtain the same
result as the overall incidence density) (3 pts):
$$\text{Incidence density} = \frac{\text{(Total) cases}}{\text{(Total) person-time}} = \frac{(15 + 23 + 8 + 18 + 17 + 17 + 14 + 11)\ \text{cases}}{(890 + 870 + 850 + 820 + 780 + 760 + 710 + 630) \times 3\ \text{months}}$$
$$= \frac{123\ \text{cases}}{18{,}930\ \text{person-months}} = 0.65/100\ \text{person-months} = 7.8/100\ \text{person-years}$$
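A short Python sketch reproduces 10A and 10B. It assumes, as the answer does, that everyone tested at the start of a 3-month interval contributes a full 3 person-months; it also shows that the person-time-weighted mean of the eight interval rates equals the overall incidence density.

```python
# 10A prevalence and 10B average incidence density (ID).
# Assumption (from the answer): each youth tested at the start of an interval
# contributes 3 person-months to that interval.
cases_per_interval = [15, 23, 8, 18, 17, 17, 14, 11]
tested_per_interval = [890, 870, 850, 820, 780, 760, 710, 630]
months_per_interval = 3

prevalence_12mo = 18 / 810
total_cases = sum(cases_per_interval)                             # 123
person_months = sum(tested_per_interval) * months_per_interval    # 18,930
id_per_100_pm = 100 * total_cases / person_months
id_per_100_py = 12 * id_per_100_pm

# Person-time-weighted mean of the interval-specific rates equals the overall ID.
weights = [n * months_per_interval for n in tested_per_interval]
rates = [c / w for c, w in zip(cases_per_interval, weights)]
weighted_mean = sum(r * w for r, w in zip(rates, weights)) / sum(weights)

print(f"prevalence at 12 months = {prevalence_12mo:.1%}")
print(f"{total_cases} cases / {person_months:,} person-months")
print(f"ID = {id_per_100_pm:.2f}/100 person-months = {id_per_100_py:.1f}/100 person-years")
```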
10C
Reasons for preferring incidence density in this case (6 pts):
These diseases have an extended risk period (i.e., one longer than
the period of observation)
People can acquire these diseases more than once
Different lengths of follow-up time per subject
11A. Inherent weaknesses in this design that make it susceptible to
obtaining inaccurate data are the potential for problems of recall,
reporting, and recording in medical records; also, there is considerable
opportunity for alcohol abuse status to influence diagnosis of depression.
(3 pts)
11B.
Criteria for causal inference (6 pts)
Strength of association -- in this regard the study provides strong
evidence of causation due to its very high odds ratio
([(76)(80)]/[(20)(24)] = 12.7 -- assuming for this discussion that the OR
is not biased by design problems)
Temporality (antecedent-consequent) -- there is no indication here that
alcohol abuse preceded major depression, and the reverse seems just as
possible.
Other criteria (e.g., dose-response, biological plausibility,
experiment, analogy, consistency, coherence) either do not apply to a
single study or cannot be evaluated with the information provided.
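The odds ratio quoted under "Strength of association" is a cross-product ratio. A minimal Python sketch; the cell placement (76 and 80 on one diagonal, 20 and 24 on the other) is taken from the answer's expression, since the exam's own 2x2 table is not reproduced here.

```python
# Cross-product odds ratio for a 2x2 table laid out as
#                 exposed  unexposed
#   cases            a         b
#   controls         c         d
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """OR = (a*d) / (b*c)."""
    return (a * d) / (b * c)

# 11B: [(76)(80)] / [(20)(24)]; only the diagonal products matter for the OR.
print(round(odds_ratio(76, 20, 24, 80), 1))  # 12.7
```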

The following exam questions relate to the article:
Freudenheim J et al. Exposure to breastmilk in infancy and
the risk of breast cancer. Epidemiology 1994;5:324-331. You
may refer to this article during the examination.
NOTE:
o Write all answers on the answer sheets provided.
o You may keep the examination questions.
o Write the last five digits of your student id number in
  the upper right-hand corner of each page of your
  answer sheets.
o This examination is closed book. However, you may use
  a calculator, English, foreign language, or medical
  dictionary.
o When you finish please sign your name on the sign-out
  sheet under the pledge:
  "I have neither given nor received help from others in completing
  this examination."
Good luck and happy holidays.
______________________________________________________________________________________
1. Which of the following best characterizes the present study
as presented in the article (2 pts):
A. Analytic study to investigate the hypothesized
relationship in an available dataset
B. Descriptive study using available data
C. Analytic study of data collected to investigate the
hypothesized relationship
D. A post-hoc analysis of data collected primarily for
another study (i.e., of secondary data)
2. Find an example from the paper for each of the following
(give the page number and quote enough of the words to
identify the point or passage; the same point or phrase
cannot be used more than once) (2 pts each)
a. A finding from a migrant study or studies;
b. A finding from descriptive epidemiology;
c. An association from an ecologic study.
3. Several previous studies of exposure to breastmilk and risk
of breast cancer in adulthood reported little association in
crude analyses (p. 324). The authors suggest that the
absence of an association could have resulted from a failure
to adjust for age. Which of the following best explains why
failure to adjust for age could have obscured an underlying
true association? (Choose one best answer.) (2 pts)
A. Age is causally related to breast cancer risk and an
infant's age is related to her exposure to breastmilk.
B. Age is causally related to breast cancer risk and infant
feeding practices have changed over time.
C. Age is causally related to breast cancer risk but not
associated with breast feeding practices.
D. Age is causally related to breast cancer risk but is
causally related to breast feeding practices.
4. The authors describe their study as a case-control study of
dietary and reproductive factors for breast cancer (p. 324).
Which of the following best describes the type of situation
for which case-control studies are most advantageous
compared to other designs? (choose one best answer). (2 pts)
A. rare exposure, common endemic disease.
B. rare exposure, rare endemic disease.
C. common exposure, common endemic disease
D. common exposure, rare endemic disease
5. The authors used the term "cohort effects" in regard to
results from previously reported studies. Which of the
following best describes what is meant by cohort effects in
this context? (choose one best answer). (2 pts)
A. Breast cancer cases are heterogeneous with respect to
known factors.
B. Secular changes in infant feeding practices result in an
association between age and exposure to breastmilk.
C. Breast cancer and control subjects come from
nonoverlapping birth cohorts.
D. Recall accuracy of breastmilk exposure may differ by
birth cohort.
6. Cases in this study were incident cases of confirmed cancer
of the breast (p. 325). Which of the following best describes
the advantage of selecting incident cases over prevalent
cases? (choose one best answer) (2 pts)
A. selecting from a pool of prevalent cases would make
separation of factors associated with risk and those
with survival more difficult.
B. selecting from a pool of prevalent cases would make
exposure assessment more difficult because of preexisting disease status.
C. selecting from a pool of incident cases creates a more
homogenous case group with regard to unknown
confounding factors.
D. selecting from a pool of incident cases reduces
misclassification bias.
7. The authors characterize this study as a case-control study of
primary and histologically confirmed cancer of the breast in
women. For each of the two key terms in this phrase, briefly
explain its meaning and significance for the study: (2 pts
each)
a. primary
b. histologically-confirmed
8. In this study, controls were selected by a random process
from residents of the two counties and were frequency age
matched to cases (p. 325). Which of the following best
describes a reason for preferring community controls over
hospital-based controls for this study? (choose one best
answer). (2 pts)
A. the random selection of controls from the community
usually produces groups of cases and controls that are
similar in known and unknown confounding variables.
B. the random selection of controls from the community
provides a better estimate of breastmilk exposure
among the source population.
C. the random selection of controls from the community
ensures that the subsequent odds ratio is not an
overestimation of the association of breast feeding and
adult breast cancer.
D. The random selection of controls from the community
reduces the likelihood of differential misclassification
of exposure in cases and controls.
9. Information on breastmilk exposure was based on subjects'
self-report (p. 325). If exposure information could also be
obtained from an independent source (such as physician
records, or reports from parents), then the agreement
between these two methods could be compared. Which of the
following measures would be most appropriate to quantify
the reliability between the two methods? (choose one best
answer). (2 pts)
A. kappa coefficient
B. correlation coefficient of reproducibility
C. intraclass correlation coefficient
D. product-moment correlation
E. A or B
F. A, B, or C
10.
In a hypothetical validation study of self-report of being
breastfed as an infant, the presence of a newly discovered
antibody that could serve as a "gold standard" indicator of
being breastfed as an infant was compared to self-report.
Testing for the presence of this new antibody is very
expensive and was done only on the 204 cases age 40-50 (see
table 1). The following data from the validation study were
compiled. Calculate the (a) sensitivity, (b) specificity, (c)
positive predictive value, and (d) predictive value of a
negative test. Construct an appropriate 2x2 table and show
your work (6 pts)
Data from validation study:
1. the breastfed antibody was found in 73.5% of cases.
2. 80 self-reports were false negative
11.
From the data presented in Table 1 answer the
following:
a. For premenopausal women with greater than a high
school education, compute and interpret the odds ratio
for having breastfed as an infant and breast cancer as
an adult. (2 pts)
b. Referring to your analysis in part (a), assume now that
20% of controls who gave a positive history of having
been breastfed had not in fact been breastfed, but that
all other data were correct. Compute and interpret the
odds ratio for having breastfed as an infant and breast
cancer as an adult under this assumption. (2 pts)
c. Which of the following best describes the type of
misclassification illustrated in part (b) above. (2 pts)
A. differential misclassification of disease and exposure
status
B. differential misclassification of exposure
C. nondifferential misclassification of exposure
D. nondifferential misclassification of disease and
exposure status
12.
For each of the following statements, indicate if it is
TRUE OR FALSE: (1 pt each)
a. By matching the controls to the cases on age, the
authors have ensured that age will not be a
confounder.
b. The procedure for identifying cases is essentially one of
active surveillance.
c. The difference between the proportion of cases
interviewed and the proportion of controls interviewed
will cause selection bias.
d. The fact that premenopausal controls who had been
breastfed were somewhat older than controls who had
not (page 325, bottom of col. 2) indicates frequency
matching by age did not "work".
e. The absence of an association between age and breast
cancer in tables 1 and 2 is likely to be a reflection of
selection bias from the low response rates for cases and
controls.
f. In postmenopausal women there appears to be a "dose
response" relationship between body mass index and
the association between having been breastfed.
g. A case-control study design is often the design of
choice in outbreak investigations.
h. For a factor under study to be considered an effect
modifier it must be an independent risk factor for the
outcome of interest.
13.
A list of control variables for use in the logistic
regression models appears on page 325, middle of column 2.
These variables have been chosen because they (choose one
best answer): (2 pts)
A. are likely to be associated with breast cancer risk in the
bottle-fed women.
B. are known or suspected risk factors for breast cancer,
or at least proxies for such factors
C. are likely to be associated with infant feeding history in
the controls
D. are likely to be associated with infant feeding history in
the cases
14.
The presentation of data in Table 2 can be used to
examine a number of relationships. Using these data give a
numerical example of each of the following (show your work
and in one sentence explain what the number means): (2 pts
each)
a. An association between breast cancer risk and having
zero pregnancies. Use > 3 pregnancies as a reference.
b. An association between having been breastfed and
being over 165 cm in height. Use <160 cm as a
reference.
c. An association between breast cancer and having been
breastfed, overall.
15.
On page 326, 2nd column, the authors state "As shown
in Table 3, the risk of breast cancer associated with having
been breastfed, was about 0.7 for both pre- and
postmenopausal women." In this context, to which of the
following epidemiologic measures does the term "risk"
refer? Choose one best answer. (2 pts)
A. Cumulative incidence
B. Incidence density
C. Attributable risk
D. Odds ratio
16.
Using the data in Table 3, estimate AND state the
meaning of the following measures (for this question you
may ignore the possibility of selection bias in cases and
controls):
a. Attributable Risk Proportion (ARP) for NOT having
been breastfed for all breast cancer (both
premenopausal and postmenopausal breast cancer,
combined). Note that an ARP is also known as the
etiologic fraction in the exposed. (3 pts)
b. Population Attributable Risk Proportion (PARP) for
NOT having been breastfed for premenopausal and for
postmenopausal breast cancer, separately (i.e., 2
PARP's). Note that the PARP is also known as the
etiologic fraction. (4 pts)
c. Why would you or would you not expect the PARP for
premenopausal breast cancer to differ from the PARP
for postmenopausal breast cancer in this
investigation (part b)? (2 pts)
17.
In the multiple logistic model referred to as Model 2 in
Table 3, what was the coefficient for the variable not-having-been-breastfed among all breast cancer cases? (2 pts)
Which of the following assumptions is involved in that
model? Indicate True or False for each assumption. (1 pt
each)
a. The odds of breast cancer vary as the product of the
odds for age and the odds for education.
b. The odds of breast cancer vary as the sum of the odds
for age and the odds for education.
c. Age, education, and not having been breastfed were
independent of (i.e., uncorrelated with) each other.
d. Breast cancer is a rare disease.
18.
Suppose that cases who refused to participate in this
study were less likely to have been breastfed as infants than
those who participated in the study. Which of the following
best describes what this fact would imply for the observed
relative risk associated with being breastfed compared with
what would have been observed had all persons participated
in the study? (choose one best answer). (2 pts)
A. the observed relative risk would be biased away from
the null.
B. the observed relative risk would be subject to selection
bias and the direction of the bias can not be estimated.
C. the observed relative risk would be biased toward the
null.
D. the observed relative risk would be subject to
misclassification bias and the direction of the bias can
not be estimated.
19.
In table 3, the confidence intervals for the OR's for all
women do not include the value 1.0, whereas all but one of
the OR's for premenopausal breast cancer and
postmenopausal breast cancer do. Mathematically, what does
this pattern reflect? (2 pts)
20. On page 324, 2nd column, the authors offer a possible
explanation of why two previous studies of breastfeeding and
breast cancer found little crude association, observing that
the result may have been "confounded by a failure to adjust
for age, because of cohort effects with regard to
breastfeeding frequency". The following stratified analysis
has been constructed to illustrate a situation where cohort
effects with regard to breastfeeding completely obscure a
true protective association seen when age is controlled.

                Age < 60            Age > 60             Total
           Breastfed Bottlefed Breastfed Bottlefed Breastfed Bottlefed
Cases             24        40       256       100       280       140
Controls          79        86       204        54       280       140
OR                  0.653               0.678               1.0

Based on these hypothetical data:
a. demonstrate that there is a cohort effect for
breastfeeding, (2 pts)
b. briefly explain (1-2 sentences referring to specific
numbers or calculations for these tables) how failure to
adjust for age interferes with finding a protective effect of
breastfeeding. (2 pts)
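The odds ratios in the question 20 table can be checked with a short Python sketch. It also prints the share of controls who were breastfed in each age group, one way to look at the age-exposure association behind a cohort effect. The "total" row uses the Total columns exactly as printed in the table; the function name is illustrative.

```python
# Question 20: stratum-specific and crude ORs (breastfed vs. bottlefed) from the
# hypothetical table, plus the proportion of controls breastfed in each group.
def odds_ratio(cases_bf, cases_bottle, controls_bf, controls_bottle):
    """OR for breastfed vs. bottlefed: (a*d) / (b*c)."""
    return (cases_bf * controls_bottle) / (cases_bottle * controls_bf)

strata = {
    "age < 60": (24, 40, 79, 86),
    "age > 60": (256, 100, 204, 54),
    "total (as printed)": (280, 140, 280, 140),
}
for name, (a, b, c, d) in strata.items():
    share_bf = c / (c + d)   # proportion of controls who were breastfed
    print(f"{name:20s} OR = {odds_ratio(a, b, c, d):.3f}  controls breastfed = {share_bf:.0%}")
```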
21.
An epidemiology graduate student finds evidence in the
literature that childhood sunlight exposure may affect adult
breast cancer risk. To explore this hypothesis, she obtains
from the authors the place of birth for all of the subjects in
the present study and constructs a sunlight exposure
variable ("high" or "low") based on geologic and meteorologic
data for the years of the subjects' childhood. Her data show
that 56.2% of the 219 premenopausal women who were not
breastfed as infants grew up with "high" sunlight exposure.
Based on this fact and the partially-completed tables below,
(a) calculate the odds ratio of breast cancer with respect to
breastmilk exposure within each of the two sunlight
exposure strata, and (b) briefly describe the relationship of
the sunlight exposure variable to the association between
breast cancer and breastmilk exposure (i.e., in relation to
confounding and effect modification). (4 pts)
High sunlight
Cases
Breastfed
Bottlefed
Total
Controls
Total
24
Low sunlight
Cases
Controls
Total
67
81
36
191
284
22.
Use the data from Table 2 (Distribution of
Characteristics of Postmenopausal Cases and Controls) to
draw separate 2 x 2 tables for women who have had: 0
pregnancies, 1-2 pregnancies, and >=3 pregnancies. (5 pts)
a. calculate odds ratios for each of these three categories.
b. Assuming no effects of confounding, interpret your
findings in part (a).
23.
A hypothetical cross-sectional ancillary study to this
report was conducted. In that study a survey of breast cancer
annual incidence rates in geographically distinct areas was
completed: Region A, in the upper Midwest, where breast cancer
mortality is high, and Region B, the Southeast, where
mortality from breast cancer is low. The following data were
obtained.

                             Region A                               Region B
Age            No. of cases   Population   Rate/1,000 No. of cases   Population   Rate/1,000
< High School Education
40-50                    10        7,000          1.4           10       15,000          0.7
51-60                    15       10,000          1.5           20        5,000          4.0
61-65                    30        3,000           10          600       55,000         10.9
Total                    55       20,000                       630       75,000
>= High School Education
40-50                              1,000          5.0                     2,000          3.0
51-60                              2,000          2.5           10       15,000          0.7
61-65                                500          8.0                     1,000          4.0
Total                    14        3,500                        20       18,000
Grand total              69       23,500                       650       93,000
Crude                                             2.9

Compute the following (for adjusted rates use the direct
method and the total population as a standard):
a. the overall region B crude event rate. (1 pt)
b. Age and educational achievement adjusted rate for
Region A: (2 pts)
c. Age and educational achievement adjusted rate for
Region B: (2 pts)
d. Compare the overall crude rates with the age and
educational achievement adjusted rates. Briefly explain
your findings. (2 pts)
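Direct standardization applies each region's stratum-specific rates to a common standard population. The Python sketch below uses Regions A and B combined as the standard, as the question specifies. One assumption to flag: case counts for the ">= high school" strata left blank in the table are back-calculated here as rate x population (5, 5, 4 for Region A; 6 and 4 for Region B), and these sum to the printed totals of 14 and 20.

```python
# Question 23: crude and directly standardized rates per 1,000.
# Standard population = Region A + Region B combined, stratum by stratum.
# ASSUMPTION: blank ">=HS" case counts are back-calculated as rate x population.

# (education, age): (cases, population)
region_a = {
    ("<HS", "40-50"): (10, 7_000), ("<HS", "51-60"): (15, 10_000), ("<HS", "61-65"): (30, 3_000),
    (">=HS", "40-50"): (5, 1_000), (">=HS", "51-60"): (5, 2_000), (">=HS", "61-65"): (4, 500),
}
region_b = {
    ("<HS", "40-50"): (10, 15_000), ("<HS", "51-60"): (20, 5_000), ("<HS", "61-65"): (600, 55_000),
    (">=HS", "40-50"): (6, 2_000), (">=HS", "51-60"): (10, 15_000), (">=HS", "61-65"): (4, 1_000),
}

def crude_rate(region):
    """All cases / all population, per 1,000."""
    cases = sum(c for c, _ in region.values())
    pop = sum(n for _, n in region.values())
    return 1_000 * cases / pop

def direct_adjusted_rate(region, standard):
    """Apply each stratum-specific rate to the standard population; divide by its total."""
    expected = sum(region[k][0] / region[k][1] * standard[k] for k in standard)
    return 1_000 * expected / sum(standard.values())

standard = {k: region_a[k][1] + region_b[k][1] for k in region_a}

for name, region in [("A", region_a), ("B", region_b)]:
    print(f"Region {name}: crude {crude_rate(region):.2f}, "
          f"adjusted {direct_adjusted_rate(region, standard):.2f} per 1,000")
```

Under these assumptions the adjusted rates for the two regions come out much closer together than the crude rates, which is the contrast part (d) asks about.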
24.
Write a brief statement for or against a causal
relationship between breastfeeding in infancy and risk of
breast cancer as an adult. Comment specifically on at least
two of Bradford Hill's criteria for causal inference. Include in
your comments data or statements from the article. (5 pts)
25.
Assuming that this relationship is causal, why might a
similar study, 50 years from now, fail to find as strong a
relationship? (2 pts)
Answer Guide
1. C. Analytic study of data collected to investigate the
hypothesized relationship
2.
a. A finding from a migrant study or studies: "Studies of
migrants provide some evidence; for example, migrants to the
United States from Japan experienced a rate of breast cancer
intermediate between the lower rate in Japan and the higher
rate in the U.S."
b. A finding from descriptive epidemiology: Many possibilities,
including either of these sentences:
"This finding implies a possible connection between the
trend toward increasing bottlefeeding in the postwar
period and current trends toward increasing incidence of
breast cancer. Furthermore, it offers a partial
explanation of the international variation in breast
cancer rates, with rates considerably lower in less
developed than in developed nations."
c. An association from an ecologic study: "Micozzi found mean
adult height and breast cancer incidence in 30 countries to be
highly correlated (r=0.8)."
3. B. Age is causally related to breast cancer risk and infant feeding
practices have changed over time.
4. D. Common exposure, rare endemic disease.
5. B. Secular changes in infant feeding practices result in an
association between age and exposure to breastmilk.
6. A. selecting from a pool of prevalent cases would make separation of
factors associated with risk and those with survival more difficult.
7.
a. Primary -- Primary breast cancer is a tumor that originates in the
breast, rather than a tumor in the breast that is the result of
metastasis from a tumor that originated in another location or
tissue. In general, tumors originating in the same organ and
tissue are more likely to have similar etiologies than are
tumors that originate in different organs.
b. Histologically-confirmed -- histological confirmation
refers to the verification of the diagnosis (of breast cancer)
through laboratory examination of tumor tissue. Microscopic
examination of tumor cells establishes the existence and type
of tumor with a greater degree of certainty than does a
clinical diagnosis alone. Counting only histologically confirmed
cases reduces the potential for false positive breast cancer
diagnoses and the misclassification bias they would cause.
8. B. The random selection of controls from the community provides a
better estimate of breastmilk exposure among the source population.
9. A. Kappa coefficient
10.
Table:
Biomarker validation of women's self-report of having been
breastfed

                      Breastfeeding biomarker found
Self-report                Yes      No   Total
Breastfed                   70      26      96
Not breastfed               80      28     108
Total                      150      54     204

Derivation: 204 cases tested (overall total), 73.5% (=150) have
the marker (so 54=204-150 do not), 80 are false negatives by
self-report (so 80 = "yes" biomarker, "no" self-report), and the
remaining cells and marginals are obtained from these numbers.
a. Sensitivity = 70 / 150 = 47% (Answers the question, "Of women
who truly were breastfed, as demonstrated by the presence of the
biomarker for having been breastfed, what % were correctly
classified by self-report?")
b. Specificity = 28 / 54 = 52% (Answers the question, "Of women
who were not breastfed, as demonstrated by the absence of the
biomarker, what % were correctly classified by self-report?")
c. Positive predictive value (PPV) = 70 / 96 = 73% (Answers the
question, "Of women classified, on the basis of their self-report,
as 'having been breastfed', what % were correctly classified?")
d. Negative predictive value (NPV) = 28 / 108 = 26% (Answers the
question, "Of women classified, on the basis of their self-report,
as 'not having been breastfed', what % were correctly
classified?")
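The four validity measures in answer 10 come straight from the cells of the validation table. A minimal Python sketch, treating the biomarker as the gold standard and self-report as the test; the variable names are illustrative.

```python
# Answer 10: validity of self-reported breastfeeding against the biomarker.
# Rows of the 2x2 table = self-report, columns = biomarker (gold standard).
tp, fp = 70, 26    # self-report "breastfed":     biomarker yes / biomarker no
fn, tn = 80, 28    # self-report "not breastfed": biomarker yes / biomarker no

sensitivity = tp / (tp + fn)   # 70 / 150
specificity = tn / (tn + fp)   # 28 / 54
ppv = tp / (tp + fp)           # 70 / 96
npv = tn / (tn + fn)           # 28 / 108

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv)]:
    print(f"{name:12s} {value:.0%}")
```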
11.
a. Table:
Adult breast cancer by having been breastfed as an infant,
among premenopausal women with education beyond high school

                     Case  Control    Total
Breastfed              61       93      154
Not breastfed          69       61      130
Total                 130      154      284

OR = (61 x 61) / (93 x 69) = 0.58.
Interpretation: having been breastfed appears to be protective
against female adult breast cancer, with a reduction in risk of
approximately 40%.
b. Table:
Adult breast cancer by having been breastfed as an infant,
among premenopausal women with education beyond high school,
assuming that 20% of controls who reported having been
breastfed had in fact not been

                    Cases Controls    Total
Breastfed              61       74      135
Not breastfed          69       80      149
Total                 130      154      284

Derivation: 20% of the 93 controls who reported having been
breastfed had not been, so 20% of 93 (=18.6->19) are switched from
"Breastfed" to "Not breastfed", being added to the 61 who reported
not having been breastfed. The remaining 80% of 93 (=74.4->74)
remain in the upper row.
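The reclassification in part (b) can be sketched in Python to show how moving controls alone shifts the odds ratio toward 1. Rounding 18.6 to 19 persons follows the derivation above; the function and variable names are illustrative.

```python
# 11a-b: effect on the OR of relabelling 20% of the controls who reported
# having been breastfed as not breastfed (misclassification in controls only).
def odds_ratio(case_exp, ctrl_exp, case_unexp, ctrl_unexp):
    """OR for breastfed vs. not breastfed: (a*d) / (b*c)."""
    return (case_exp * ctrl_unexp) / (ctrl_exp * case_unexp)

cases_bf, controls_bf = 61, 93
cases_not, controls_not = 69, 61

moved = round(0.20 * controls_bf)            # 18.6 -> 19 controls switch rows
controls_bf_adj = controls_bf - moved        # 74
controls_not_adj = controls_not + moved      # 80

print(f"reported OR     = {odds_ratio(cases_bf, controls_bf, cases_not, controls_not):.2f}")
print(f"reclassified OR = {odds_ratio(cases_bf, controls_bf_adj, cases_not, controls_not_adj):.2f}")
```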
OR = (61 x 80) / (74 x 69) = 0.96 (approximately 1.0), i.e., no association.
c. B. differential misclassification of exposure

12. TRUE or FALSE
a. False - matching controls to cases does not prevent the
matching variable (age) from being associated with the exposure
(having been breastfed), so the matching cannot prevent
confounding. (See also d. and e.)
b. True - The nurse telephoned hospitals on a frequent, regular
basis, to identify all breast cancer cases.
c. False - The difference in the proportions interviewed among
cases and among controls provides a great deal of potential for
selection bias, but if nonparticipation was not related to having
been breastfed then selection bias will not occur.
d. False - The matching caused cases and controls to have the
same age distribution, so it did "work"; matching would not be expected to
eliminate an association between age and the exposure, since exposure
status was not known when controls were being selected and in any case
would not have been used in the matching procedure.
e. False - The matching procedure prevented an association.
f. False - The association between body mass index and breast
cancer can be assessed by estimating odds ratios from Table 2. To
avoid confounding by infant feeding history, we should preferably
assess the association separately in breastfed women and in women
who have not been breastfed (setting aside the complexities of
considering body mass to be an intervening variable in the effect
of infant feeding history). To avoid being misled by a possible
"synergism" involving infant feeding and body mass, ideally we
would look in the "unexposed" group. However, although this study
focuses on breastfeeding, one can also consider "formula feeding"
as an exposure that might be "synergistic" with body mass. So we
can choose either exposure group (or both).
Here are the computations:
From Table 2:

Body mass index (kg/m sq)      16-22    23-27      >27
Cases
  Breastfed                       48      103       90
  Not breastfed                   15       26       17
Controls
  Breastfed                       89      125       91
  Not breastfed                   19       16       16

To show the details, here is a table for estimating OR's for body mass
index and breast cancer:

Body mass                  Breastfed     Not breastfed             Total
index (kg/m sq)       Cases Controls    Cases Controls    Cases Controls
16-22                    48       89       15       19       63      108
23-27                   103      125       26       16      129      141
>27                      90       91       17       16      107      107

and the resulting OR's are [e.g., (90 * 89) / (48 * 91) = 1.83]:

Body mass index (kg/m sq)     Breastfed   Not breastfed   Total
16-22 (ref. level)                  1.0             1.0     1.0
23-27                              1.53            2.06    1.57
>27                                1.83            1.35    1.71

The OR's in the total column are shown for comparison with the
stratum-specific OR's. At each body mass index level the total OR lies
between the OR's for breastfed and not breastfed women, so these figures
give no clear evidence of confounding by breastfeeding history. Among
breastfed women the OR's rise with body mass index (1.0, 1.53, 1.83), but
among women who were not breastfed they do not (1.0, 2.06, 1.35), so there
is no consistent "dose-response" relationship.
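Each OR in the body mass index table compares one level with the 16-22 kg/m sq reference within a feeding group. A Python sketch that recomputes all of them from the Table 2 counts quoted above, pooling the two feeding groups for the "total" column; the names are illustrative.

```python
# 12f: ORs for body mass index (BMI) level vs. the 16-22 kg/m sq reference,
# within breastfed women, within women not breastfed, and pooled ("total").
counts = {  # group: {BMI level: (cases, controls)}
    "breastfed":     {"16-22": (48, 89), "23-27": (103, 125), ">27": (90, 91)},
    "not breastfed": {"16-22": (15, 19), "23-27": (26, 16),   ">27": (17, 16)},
}
# Pool the two feeding groups level by level to get the crude ("total") counts.
counts["total"] = {
    level: tuple(a + b for a, b in zip(counts["breastfed"][level], counts["not breastfed"][level]))
    for level in counts["breastfed"]
}

def or_vs_reference(table, level, ref="16-22"):
    """OR = (cases_level * controls_ref) / (cases_ref * controls_level)."""
    cases, controls = table[level]
    ref_cases, ref_controls = table[ref]
    return (cases * ref_controls) / (ref_cases * controls)

for group, table in counts.items():
    ors = [or_vs_reference(table, level) for level in ("16-22", "23-27", ">27")]
    print(f"{group:14s}" + "  ".join(f"{x:.2f}" for x in ors))
```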
g. True - Generally, an outbreak investigation begins after the
outbreak has begun, and the investigation seeks to determine what
characteristics of cases might have been responsible for their disease. If
the cases happened to be part of an existing cohort for which the requisite
exposure information was already available in some form, then a
retrospective cohort study would be another possibility. If cases are
still occurring, a prospective cohort study might be initiated, but the
better an idea the investigators have about which exposures to assess, the
more they should intervene to minimize the occurrence of additional cases.
h. False - for a factor to be considered a confounder, it must be
an independent risk factor for the outcome, but this requirement
does not pertain to effect modification. For example, genital
ulcers cannot cause HIV by themselves, but in conjunction with a
sex partner who is HIV infected, genital ulcers can increase
(modify) the risk of HIV infection.
13. B. Potential confounders are factors that are known or suspected risk
factors for breast cancer or its detection, or at least proxies for
such factors.
14.
a. Breast cancer risk and no previous pregnancies

                      Cases Controls    Total
No pregnancies           50       38       88
>= 3 pregnancies        167      216      383
Total                   217      254      471

OR = (50 x 216) / (38 x 167) = 1.7 (for zero vs. >= 3 pregnancies)
Interpretation: having never been pregnant was associated with an
increased breast cancer rate, with an apparent 70% greater rate
among nulligravidae (women who have never been pregnant).
Other choices of a reference level produce similar results, e.g.,
1-2 pregnancies as the reference level:
OR = (50 x 102) / (38 x 82) = 1.6.
If both groups, 1-2 pregnancies and 3+ pregnancies, are combined
and used as the reference group, then:
OR = (50 x 318) / (38 x 249) = 1.7
b. Height above 165 centimeters and having been breastfed

                   Height
                   > 165 cm   < 160 cm      Total
Breastfed               148        183        331
Not breastfed            41         25         66
Total                   189        208        397

OR = (148 x 25) / (183 x 41) = 0.49.
Interpretation: Women who were breastfed were less likely
to be over 165 cm tall.
Other possible OR's: > 165 vs. 160-165:
OR = (148 x 43) / (213 x 41) = 0.73
> 165 vs. all others:
OR = (148 x 68) / (396 x 41) = 0.62
c. Breast cancer and having been breastfed (crude)

                    Cases Controls    Total
Breastfed             241      305      546
Not breastfed          58       51      109
Total                 299      356      655

OR = (241 x 51) / (305 x 58) = 0.69
Interpretation: having been breastfed was associated with lower
risk of breast cancer.
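All of the question 14 odds ratios are cross-products of 2x2 tables. A short Python sketch recomputes each one; the comments note which rows and columns each cell tuple follows, and the labels are illustrative.

```python
# Question 14: cross-product ORs from the 2x2 tables built out of Table 2.
def odds_ratio(a, b, c, d):
    """OR for a table [[a, b], [c, d]] = (a*d) / (b*c)."""
    return (a * d) / (b * c)

examples = {
    # rows: pregnancy group; columns: cases, controls
    "a. 0 vs >=3 pregnancies":       (50, 38, 167, 216),
    "a. 0 vs 1-2 pregnancies":       (50, 38, 82, 102),
    "a. 0 vs >=1 pregnancy":         (50, 38, 249, 318),
    # rows: height group; columns: breastfed, not breastfed
    "b. >165 vs <160 cm":            (148, 41, 183, 25),
    "b. >165 vs 160-165 cm":         (148, 41, 213, 43),
    "b. >165 vs all others":         (148, 41, 396, 68),
    # rows: breastfed, not breastfed; columns: cases, controls
    "c. breastfed vs not (crude)":   (241, 305, 58, 51),
}
for label, cells in examples.items():
    print(f"{label:30s} OR = {odds_ratio(*cells):.2f}")
```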
15. D. The statement refers to the (relative) risk of breast cancer
between women who were and were not breastfed, estimated using
the odds ratio.
16.
a. Estimate RR for Not breastfed as 1/OR for Breastfed: 1 / 0.69 = 1.45
$$\text{ARP} = \frac{RR - 1}{RR} = \frac{1.45 - 1}{1.45} = \frac{0.45}{1.45} = 0.31$$
Interpretation: Some 31% of breast cancer in women who were not
breastfed was attributable to their having not been breastfed.
b.
If know the formula (or can derive it from the diagram and the
"grand synthesis"):
PARP
P(E|D) (RR-1)
--------------RR
Premenopausal:
and since breast cancer is rare, use OR.
(117)
----------- (1.47-1)
(117+112)
----------------------1.47
(0.51) (0.47)
--------------1.47
0.16
AND
(58)
-------------- (1.45-1)
(58+241)
Postmenopausal: -------------------------
(0.19) (0.45)
--------------- = 0.06
1.45
1.45 Meaning: In women who wre not breastfed, some 16% of premenopausal
breast cancer and some 6% of postmenopausal breast cancer were attributable
to their having not been breastfed.
=
OR, reason as follows:
1. The proportion of exposed (not breastfed) cases that are attributable
to not having been breastfed is ARP = (RR - 1) / RR. Since breast
cancer is rare, we can estimate it with
(OR - 1) / OR = (1.47 - 1) / 1.47 = 0.3197 for premenopausal.
However, this proportion applies only to cases who are exposed
(because ARP is the "proportion of exposed cases . . ."). So estimate
the proportion of all cases who are exposed:
2. Pr(Exposed|Case) = 117 / (117 + 112) = 0.51 for premenopausal.
Multiplying 1. by 2.: 0.51 x 0.3197 = 16% for premenopausal.
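Both routes above use the case-based form PARP = P(E|D) (RR - 1) / RR. Here is a short Python check, with ORs standing in for RRs as in the answer. The helper name is hypothetical.

```python
def parp_case_based(exposed_cases: int, unexposed_cases: int, rr: float) -> float:
    """PARP = P(E|D) * (RR - 1) / RR, where P(E|D) is the exposed share of cases."""
    p_exposed_given_case = exposed_cases / (exposed_cases + unexposed_cases)
    return p_exposed_given_case * (rr - 1) / rr


# Exposure = NOT having been breastfed; ORs stand in for RRs because breast cancer is rare.
premenopausal = parp_case_based(117, 112, 1.47)
postmenopausal = parp_case_based(58, 241, 1.45)

print(f"premenopausal PARP = {premenopausal:.2f}")
print(f"postmenopausal PARP = {postmenopausal:.2f}")
```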
c. The PARP for premenopausal breast cancer is expected to be
greater due to the secular decrease in breastfeeding during the
decades when these women were infants. Thus, the proportion
exposed to not having been breastfed is substantially greater for
the premenopausal breast cancer cases. Hence, their PARP is
greater.
17.
Logistic model coefficients for risk factor variables are natural
logarithms of odds ratios per one-unit change in the variable.
So the coefficient was ln(0.70) = -0.3567.
Assumptions:
a. True - The odds of breast cancer vary as the product of the odds
for age and the odds for education.
b. False - Only in a few special cases will the product of two odds
equal their sum (e.g., both odds equal zero or both odds equal two). The
logistic model is additive in the logit (logarithm of odds), multiplicative
in the odds.
c. False - One of the reasons for using mathematical modeling is that
the risk factors (exposures and potential confounders) ARE associated
(i.e., not independently distributed).
d. True - Breast cancer is a rare disease.
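The coefficient in question 17 and the "additive in the logit, multiplicative in the odds" point in assumption b can be illustrated in a few lines. The second odds ratio (1.5) is a made-up illustrative value, not a figure from the study.

```python
import math

# Coefficient for an exposure whose odds ratio per one-unit change is 0.70
beta_breastfed = math.log(0.70)
print(f"coefficient = {beta_breastfed:.4f}")

# Hypothetical second variable (illustrative OR only, not from the study)
beta_other = math.log(1.5)

# Coefficients add on the logit (log-odds) scale ...
joint_log_odds_ratio = beta_breastfed + beta_other
# ... so the corresponding odds ratios multiply on the odds scale.
joint_or = math.exp(joint_log_odds_ratio)
print(f"joint OR = {joint_or:.2f} (product 0.70 x 1.5, not the sum 2.20)")
```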
18. C. The observed relative risk would be biased toward the null.
19. Smaller sample sizes produce wider confidence intervals, so if the
point estimates for the crude and stratum-specific measures are about
the same, then the confidence intervals for the latter will be wider.
20.
                 AGE < 60         AGE > 60          TOTAL
              Breast  Bottle   Breast  Bottle   Breast  Bottle
--------------------------------------------------------------
Cases             24      40      256     100      280     140
Controls          79      86      204      54      280     140
--------------------------------------------------------------
OR                 0.653            0.678            1.0
a. Control women in the older stratum are more likely to have been
breastfed than control women in the younger stratum: the odds of
having been breastfed are 0.9 (79/86) among younger controls and 3.8
(204/54) among controls aged > 60.
b. Age is a strong risk factor for breast cancer, so if breastfed
women were older than bottle-fed women, then a possible protective
effect of breastfeeding could have been offset by the greater risk
associated with older age.
21. An epidemiology graduate student finds evidence in the literature
that childhood sunlight exposure may affect adult breast cancer risk. To
explore this hypothesis, she obtains from the authors the place of birth
for all of the subjects in the present study and constructs a sunlight
exposure variable ("high" or "low") based on geologic and meteorologic data
for the years of the subject's childhood. Her data show that 56.2% of
the 219 premenopausal women who were NOT breastfed as infants grew up with
"high" sunlight exposure. Based on this fact and the partially-completed
tables below,
(a) calculate the odds ratio of breast cancer with respect to breastmilk
exposure within each of the two sunlight exposure strata, and
(b) briefly describe the relationship of the sunlight exposure variable to
the association between breast cancer and breastmilk exposure (i.e., in
relation to confounding and effect modification). (4 pts)
High Sunlight
               Breastfed Yes   Breastfed No   Total
Cases                44              81         125
Controls             24             *42          66
Total                68             123         191

Low Sunlight
               Breastfed Yes   Breastfed No   Total
Cases                67              36         103
Controls           *120             *61         181
Total               187              97         284

Crude OR from Table 1 or Table 3 = 0.68
High sunlight OR = (44 x 42) / (24 x 81) = 0.95
Low sunlight OR = (67 x 61) / (120 x 36) = 0.95
The two stratum-specific ORs are the same (0.95) but differ from the
crude OR (0.68). Sunlight is therefore a confounder of the protective
effect of breastfeeding as an infant. It is not an effect modifier.
22. Use the data from Table 2 (Distribution of Characteristics of
Postmenopausal Cases and Controls) to draw separate 2 x 2 tables for
women who have had: a. 0 pregnancies, b. 1-2 pregnancies, c. >=3
pregnancies. Be sure to include appropriate labels. (5 pts)
0 pregnancies
            Cases   Controls
Breast        34        35
Bottle        16         3
Total         50        38

1-2 pregnancies
            Cases   Controls
Breast        71        90
Bottle        11        12
Total         82       102

>=3 pregnancies
            Cases   Controls
Breast       136       180
Bottle        31        36
Total        167       216

a) Calculate odds ratios for each of these three categories.
0 pregnancies: OR = (34 x 3) / (16 x 35) = 0.18
1-2 pregnancies: OR = (71 x 12) / (11 x 90) = 0.86
>=3 pregnancies: OR = (136 x 36) / (31 x 180) = 0.88
b) Assuming no effects of confounding, interpret your findings in
part (a).
There is effect modification. The magnitude of the protective
effect of having been breastfed on development of breast cancer
depends on pregnancy history. Having been breastfed is a
stronger protective factor for women who never had a pregnancy.
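The stratified ORs for questions 21 and 22 can be computed in one sketch. Similar stratum ORs that differ from the crude OR (sunlight) point to confounding. Clearly different stratum ORs (pregnancy) point to effect modification. The helper name is hypothetical. The last line reproduces the "0 pregnancies vs. all others" OR of 1.7 from the start of this section.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """(a * d) / (b * c): a, b = exposed / unexposed cases; c, d = exposed / unexposed controls."""
    return (a * d) / (b * c)


# Question 22, postmenopausal women: (case_breast, case_bottle, control_breast, control_bottle)
pregnancy_strata = {
    "0": (34, 16, 35, 3),
    "1-2": (71, 11, 90, 12),
    ">=3": (136, 31, 180, 36),
}
# Question 21, premenopausal women: (case_breastfed, case_not, control_breastfed, control_not)
sunlight_strata = {
    "high": (44, 81, 24, 42),
    "low": (67, 36, 120, 61),
}

pregnancy_or = {k: odds_ratio(*cells) for k, cells in pregnancy_strata.items()}
sunlight_or = {k: odds_ratio(*cells) for k, cells in sunlight_strata.items()}

# Exposure = 0 pregnancies; reference = 1-2 and >=3 pregnancies combined
cases_0, controls_0 = 50, 38
cases_ref, controls_ref = 82 + 167, 102 + 216
nulliparity_or = odds_ratio(cases_0, cases_ref, controls_0, controls_ref)

print({k: round(v, 2) for k, v in pregnancy_or.items()})
print({k: round(v, 2) for k, v in sunlight_or.items()})
print(f"0 vs. >=1 pregnancies: OR = {nulliparity_or:.1f}")
```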
23. A hypothetical cross-sectional ancillary study to this report was
conducted. In that study, a survey of annual breast cancer incidence
rates was completed in two geographically distinct areas: Region A in
the upper Midwest, where breast cancer mortality is high, and Region B
in the Southeast, where mortality from breast cancer is low. The
following data were obtained.
                          Region A                       Region B
                 Cases  Population  Rate/1000   Cases  Population  Rate/1000
< High School Education
Age 40-50           10       7,000        1.4      10      15,000        0.7
Age 51-60           15      10,000        1.5      20       5,000        4.0
Age 61-65           30       3,000       10.0     600      55,000       10.9
Total               55      20,000                630      75,000
High School Education
Age 40-50            5       1,000        5.0       6       2,000        3.0
Age 51-60            5       2,000        2.5      10      15,000        0.7
Age 61-65            4         500        8.0       4       1,000        4.0
Total               14       3,500                 20      18,000
Grand Total         69      23,500                650      93,000
Crude rate                                2.9

a. Compute the overall Region B crude event rate: (1 pt)
= 7.0/1000
Using the total population as a standard, compute the following by
the direct method of adjustment:
b. Age and educational achievement adjusted rate for Region A (2 pts)
= 6.0/1000
c. Age and educational achievement adjusted rate for Region B (2 pts)
= 6.3/1000
d. Comparison of the overall crude rates with the age and
educational achievement adjusted rates. Briefly explain your
findings. (2 pts)
Much of the difference between the crude rates of the two regions
(2.9 vs. 7.0 per 1000) is due to their different distributions of age
and educational achievement; after adjustment the rates are similar
(6.0 vs. 6.3 per 1000).
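The direct adjustment in parts b and c applies each region's stratum-specific rates to a common standard population, here Regions A and B combined. The following is a minimal sketch; the dictionary layout and function names are illustrative only.

```python
# Cases and population by (education, age) stratum, from the table for question 23
region_a = {
    ("< HS", "40-50"): (10, 7_000),
    ("< HS", "51-60"): (15, 10_000),
    ("< HS", "61-65"): (30, 3_000),
    ("HS", "40-50"): (5, 1_000),
    ("HS", "51-60"): (5, 2_000),
    ("HS", "61-65"): (4, 500),
}
region_b = {
    ("< HS", "40-50"): (10, 15_000),
    ("< HS", "51-60"): (20, 5_000),
    ("< HS", "61-65"): (600, 55_000),
    ("HS", "40-50"): (6, 2_000),
    ("HS", "51-60"): (10, 15_000),
    ("HS", "61-65"): (4, 1_000),
}


def crude_rate(region: dict) -> float:
    """Total cases / total population, per 1000."""
    cases = sum(c for c, _ in region.values())
    population = sum(p for _, p in region.values())
    return 1000 * cases / population


def direct_adjusted_rate(region: dict, standard: dict) -> float:
    """Apply each stratum-specific rate to the standard population; per 1000."""
    expected_cases = sum(standard[k] * cases / pop for k, (cases, pop) in region.items())
    return 1000 * expected_cases / sum(standard.values())


# Standard population: Regions A and B combined, stratum by stratum
standard = {k: region_a[k][1] + region_b[k][1] for k in region_a}

for name, region in [("A", region_a), ("B", region_b)]:
    print(f"Region {name}: crude {crude_rate(region):.1f}, "
          f"adjusted {direct_adjusted_rate(region, standard):.1f} per 1000")
```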
24. Causal relationship - Comment specifically on at least two of
Bradford Hill's criteria for causal inference. Include in your comments
data or statements from the article. (5 pts)
25. Assuming that this relationship is causal, why might a similar study,
50 years from now, fail to find as strong a relationship? (2 pts)
Infant formula has changed (less fat), and overfeeding has been
reduced, reflecting recent trends.
Midterm Exam, Fall 1996
Most of the questions in this examination are based on the article:
Garry VM, Schreinemachers D, Harkins ME, Griffith J. Pesticide appliers,
biocides, and birth defects in rural Minnesota. Environ Health Perspect
1996;104:394-399.
A copy of this article was provided to you before this examination and can
be used in answering the following questions.
1. Briefly state the primary study question of this report. Identify the
main exposure and outcome of interest. (3 pts)
2. Briefly explain the difference between disease classification based on
manifestational criteria and disease classification based on causal
criteria. What is the logic for analyzing the data in relation to
categories of anomalies grouped by organ system? (4 pts)
___________________________________________________________
3. As discussed in class, epidemiologic studies often have both
descriptive and analytic characteristics. State one way in which this
study is descriptive and one way in which it is analytic. (4 pts)
4. The reporting of birth defects was provided in accord with state
statutes, and grouping of birth defect categories followed the
National Center for Health Statistics guidelines (page 394, second
paragraph, methods). This reporting of birth defects is an example
of which of the following types of data collection methods? Choose
one best answer. (4 pts)
A. Active surveillance
B. Ongoing cross-sectional survey
C. Passive surveillance
D. Follow-up study of dynamic population
5. This study determined exposure and outcomes using data from "a list of
all members of the agricultural community who were certified to apply
restricted-use pesticides in 1991" (p. 394, methods) and from "all
in-wedlock live births recorded in the state for the years 1989 through
1992" (p. 394, methods). Briefly assess the strength of these data
sources in establishing the temporal sequence of pesticide exposure
and birth defects and provide support for your assessment. (4 pts)
6. For each of the following epidemiologic measures, indicate whether it
is a rate, a proportion, a ratio that is neither a rate nor a
proportion, or none of these. Circle the best answer. (4 pts)
A. Population attributable risk (PAR)   rate   proportion   ratio   neither
B. Incidence density (ID)               rate   proportion   ratio   neither
C. Prevalence                           rate   proportion   ratio   neither
D. Relative risk                        rate   proportion   ratio   neither
7. The use of the term "rate" is not an infallible guide to the specific
epidemiologic measure being presented. Which one of the following
epidemiologic measures best characterizes the measure that the authors
refer to as the "rate of anomalies per 1000 live births" (Table 2
footnote)? Choose one best answer. (4 pts)
A. cumulative incidence (CI)
B. incidence density (ID)
C. prevalence
D. attributable risk proportion
8. The authors indicate that Table 1 supports their statement
"pesticide appliers had significantly more children with an anomaly
than did nonappliers" (p. 395, results, first paragraph). This
statement is readily understood but not literally correct. Which one
of the following states the finding more precisely? Choose one best
answer. (4 pts)
A. Pesticide appliers had 1.37 times more births with anomalies than
did the general population.
B. Pesticide appliers had more children with birth anomalies than did
the general population.
C. Pesticide appliers had a greater proportion of births with
anomalies as compared to the general population.
D. Pesticide appliers accounted for more births with anomalies than
did the general population.
9. Table 1 presents both crude and age-adjusted odds ratios. In the
table, the age-adjusted odds ratio for gastrointestinal anomalies is
slightly larger than the crude estimate, as is the case for most of
the odds ratios presented. If the difference between the crude and
age-adjusted odds ratios had been large, explain in general terms what
this would mean regarding the respective ages of the pesticide
appliers and the general population. Assume the maternal age
structure of the combined population was used as the standard. (3 pts)
___________________________________________________________________
10. Using data in Table 1:
a. Compute an estimate of the potential impact of pesticides on birth
anomalies (in wedlock, all types together) to fathers who are
certified pesticide appliers. State the assumption required to
interpret this estimate. (4 pts)
b. Compute an estimate of the potential impact of pesticides on birth
anomalies (in wedlock, all types together) in the Minnesota
population as a whole. (3 pts)
11. Using the data presented in Table 1, recalculate the crude odds
ratio for all births with anomalies, assuming that all musculoskeletal
anomalies occurring among those with maternal age greater than 30 and
the "other" anomalies among maternal age > 35 were later found to have
actually occurred among persons incorrectly classified as appliers.
Explain what implications this new calculation would have on the
conclusions of the study. (3 pts)
___________________________________________________________________
12. It is possible that the pesticides examined in this study might have
reduced fecundity or increased the proportion of conceptions not
resulting in live births. Assume that both of these effects (lower
fecundity, more spontaneous abortions, and more stillbirths) have in
fact occurred in the pesticide applier population studied here, so
that the number of live births to pesticide applier fathers is smaller
than it would have been in the absence of pesticide exposure. Which of
the following statements is (are) TRUE and which is (are) FALSE? (2
pts each)
TRUE FALSE
____ ____
A. Since all births would be affected equally, effects on
fecundity and spontaneous abortion WOULD NOT have influenced
the size of the odds ratio presented in this study. [This
question is problematic.]
____ ____
B. If pesticides were equally likely to cause fetal loss and birth
anomalies, then the odds ratios would strongly understate the
harmful effects of pesticides.
13. Table 4 shows the frequency per 1000 births of major anomalies for
the general population by region. Which of the following best
describes the study design from which these data were obtained? (4 pts)
A. ecologic study
B. prospective cohort study
C. retrospective cohort study
D. region-specific case control study
14. The authors begin their discussion section by stating that this
report "is an initial step in the evaluation of the possible relationships
between the frequency of birth anomalies and pesticide use". They
conclude, however, by saying that these data "signify a clear-cut need
for comprehensive examination of the health issues involved". This
latter statement seems to indicate that the authors suspect a causal
relationship. Identify and describe three criteria for causal
inference for which at least some information is present in the
article. Give specific examples from the article to support your
selection. (9 pts)
___________________________________________________________________
15. Suppose that after this publication came out, another study was
conducted in Illinois to investigate the hypothesis that birth defects
occurred more often in Illinois as compared to Minnesota. However, in
this new study the authors thought that the type of water consumed
could be related to birth defects. They wanted to adjust
(standardize) the rates of defects in the two states for water type.
Data from the two studies are compared below.

Births by state and water type
                     Minnesota Pesticide Appliers    Illinois Pesticide Appliers
                     Normal  With anomalies  Rate*   Normal  With anomalies  Rate*
Water type             (#)        (#)                  (#)        (#)
Well water only       3379         93         26.8      100          2        ____
City water only        874         27         30.0      200          6        ____
Bottled water only     206          5         23.7     7293        145        ____
Total                 4456        125         28.0     7593        153        ____
* per 1000 live births

a. Calculate the crude rate and the water-type specific rates for
Illinois. Briefly describe how these two states compare in crude
rates of birth anomalies. (4 pts)
b. Using the combined number of live births as a standard, calculate
a standardized rate (standardized for water type) for each of the
states. Briefly describe how these standardized rates compare with
each other and reasons why they may or may not agree with the
crude rates. (6 pts)

16. Would an inference of causality based on the data in Table 4 be
subject to criticism based on the ecologic fallacy concept? Briefly
explain your answer. (2 pts)
17. Which of the following statements about the present study are (is)
TRUE and which are (is) FALSE? Indicate TRUE or FALSE for each
statement. (2 pts each)
TRUE FALSE
____ ____
A. Subjects used in the analyses for Table 1 of this study were
selected on the basis of their exposure status.
____ ____
B. Table 4 in this study supplied dose-response evidence to
support an inference of a causal relationship between
pesticides and birth defects.
____ ____
C. The age-adjusted odds ratio for all birth anomalies of 1.41 is
considered a modest association.
____ ____
D. Since birth defects of these types are rare in the general
population, a cohort study could be designed to efficiently
examine further the relationship of pesticides and birth
anomalies.
____ ____
E. Exposure status in this study was randomized, resulting in an
equal distribution of known and unknown confounding variables
between pesticide appliers and the general population.
____ ____
F. A correlation coefficient is a measure of association but is
not useful in assessing the dichotomous outcomes measured in
this study.
____ ____
G. Table 1 used stratified analyses to adjust for a confounding
effect of maternal age on the association between
musculoskeletal/integumental anomalies and pesticide
exposure.
[question #18 has been removed, 10/7/97]
19. Succinctly evaluate whether or not, on the basis of the information in
the article (including information that the authors cite to other
work), further measures are warranted now to prevent birth defects
caused by chlorophenoxy herbicides. (5 pts)
Answer Key - REVISED

Note: this answer guide is especially detailed in order to provide
thorough explanations of the many concepts that the exam touched on
(including a few it touched on unintentionally!).
1. The primary study question for this investigation concerns the
relationship, suggested by previous studies, between exposure to
pesticides and risk of birth anomalies in offspring. The main exposure
is pesticides (assessed by the surrogate measure of being licensed to
apply certain pesticides). The main outcome is birth anomalies in
offspring, as recorded in birth records.
2. Classification of disease using manifestational criteria means
grouping disorders on the basis of their having similar observable
characteristics, e.g., symptoms, signs, behavior, laboratory findings,
onset, course, prognosis, response to treatment. Classification using
causal criteria means grouping disorders on the basis of their having
the same primary etiologic agent, which, of course, must have been
previously identified. The logic for analyzing the data in terms of
organ systems (a manifestational criterion) is that anomalies occurring
in the same organ system may be more likely to have the same (or
closely related) etiology and therefore should exhibit stronger
associations with the relevant exposure than would the more general
category of all birth anomalies.
3. The presentation of data concerning the occurrence of birth defects
with regard to place (crop region) and time (seasons) is basic
descriptive epidemiology. The fact that the study was designed with a
view to examining specific relationships of interest, which were then
assessed with measures of association and statistical tests, derives
from an analytic perspective.
4. C. Passive surveillance
5. This study cannot really establish the temporal sequence of pesticide
exposure and birth defects because a) half of the births occurred before
the date used for the pesticide certification (1991); and b) the time of
actual exposure cannot be determined, since exposure is measured so
indirectly and without the ability to establish when it occurred.
6. A. Any answer can be defended - the population attributable risk (PAR)
is equal to the attributable risk multiplied by exposure prevalence or,
equivalently, the crude incidence minus the incidence in unexposed
persons. When incidence is measured as a rate (i.e., ID), then the PAR
is the difference of two rates. When incidence is measured as a
proportion (i.e., CI), then PAR is the difference of two proportions and
therefore cannot exceed 1.0. The resulting value is typically expressed
as a rate or a proportion. So this question is ambiguous - apologies!
B. Rate - by the definition of ID.
C. Proportion - by the definition of prevalence.
D. Ratio - relative risk is a ratio of independently-derived risks (or
rates, if "relative risk" is interpreted as applying to the concept,
rather than specifically to the risk ratio).
7. C. prevalence - Although a birth with an anomaly is an "event", there
is no way to establish the population at risk (denominator) for these
events. For example, would the denominator population be couples,
fecund couples, fecund couples trying to conceive, embryos, or
recognized pregnancies? Birth anomalies do not arise out of "live
births", since the anomalies already exist in the fetus. Therefore the
"rate of anomalies per 1000 live births" is simply the proportion of
live births in which a birth defect is present.
8. C. Pesticide appliers had a greater proportion of births with
anomalies as compared to the general population.
9. Assuming that prevalence of birth anomalies increases with increasing
maternal age, an increase in the odds ratio due to age-adjustment
indicates that the maternal age distribution in the general population
is shifted toward older ages relative to that distribution in pesticide
applier spouses. The basis for this conclusion is the following. Birth
defect prevalence was greater for pesticide applier couples. If some of
that excess were due to greater age among pesticide applier mothers,
then age-adjustment would diminish the excess, thereby decreasing the
odds ratios. Since age-adjustment instead increased the odds ratios,
the older ages of general population mothers must have offset some of
the excess risk associated with pesticide exposure.
10A. Since the question does not specify absolute or relative impact,
either attributable risk (AR) or attributable risk proportion (ARP) is
correct (actually, attributable prevalence, but the term attributable
risk is typically applied to rates and prevalences as well as risks).
AR = P1 - P0 = [125 / (125 + 4456)] - [3666 / (3666 + 179,265)]
   = 0.02728 - 0.02004 = 0.0072466 = 0.0072, or 7.2 per 1000 total
live births
Meaning: 7.2 births with anomalies per 1000 live births fathered by
pesticide appliers are attributable to pesticide exposure.
Attributable risk proportion (ARP) = (RR - 1) / RR
ARP = (OR - 1) / OR (using OR for RR) = (1.37 - 1) / 1.37 = 0.270 = 27%
or
ARP = AR / P1 = (0.027283 - 0.02004) / 0.027283 = 0.26548 = 27%
Meaning: 27% of the prevalence of births with anomalies among all live
births fathered by pesticide appliers is attributable to pesticide
exposure.
To attribute cases to exposure requires the assumption of a causal
relationship between pesticide exposure and birth defects.
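A short Python check of the 10A arithmetic. It uses the Table 1 counts as they appear in the answer and, like the answer, treats the general population as the unexposed group.

```python
# In-wedlock live births from Table 1, as used in the answer above
applier_anomalies, applier_normal = 125, 4456
general_anomalies, general_normal = 3666, 179_265

p1 = applier_anomalies / (applier_anomalies + applier_normal)   # prevalence among exposed
p0 = general_anomalies / (general_anomalies + general_normal)   # prevalence among unexposed
ar = p1 - p0                      # attributable risk (here, attributable prevalence)
arp_from_prevalences = ar / p1    # ARP = AR / P1
arp_from_or = (1.37 - 1) / 1.37   # ARP = (OR - 1) / OR, OR standing in for RR

print(f"AR = {1000 * ar:.1f} per 1000 live births")
print(f"ARP = {arp_from_prevalences:.2f} (prevalences), {arp_from_or:.2f} (OR)")
```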
10B. Again, either population attributable risk (PAR) or population
attributable risk proportion (PARP) provides an answer.
Prevalence of paternal exposure among all live births is:
Pe = 4456 / (4456 + 179,265) = 0.02425 = 2.4% of live births
So PAR = AR x Pe = 0.0072466 x 0.02425 = 0.000176 = 1.8 per 10,000
live births,
or PAR = PCrude - P0 = 0.020217 - 0.02004 = 0.000177 = 1.8 / 10,000
Meaning: 1.8 births with anomalies per 10,000 live births to the general
(married) population are attributable to pesticide exposure in
pesticide appliers.
PARP = [Pe (RR - 1)] / [1 + Pe (RR - 1)] (using OR for RR)
     = [(0.02425)(1.37 - 1)] / [1 + 0.02425(1.37 - 1)] = 0.0089
     = 1% (approximately)
Or, using the case-control formulation,
Pe|d = 125 / (125 + 3666) = .032973
PARP = Pe|d (OR - 1) / OR = (.032973)(1.37 - 1) / 1.37 = 0.008905
     = 1% (approximately)
Or, PARP = Pe|d x ARP = 0.032973 x 0.26548 = 0.00875, using the ARP
from part a.
Meaning: Approximately 1% of all Minnesota live births with anomalies
are attributable to pesticide exposure in pesticide appliers.
(Note: small differences among the results from the various methods
are primarily due to the fact that the OR of 1.37 has been rounded to
fewer significant digits than are the prevalences computed above.)
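The two PAR routes and the three PARP routes in 10B can be compared side by side. This sketch reuses the Table 1 counts from the answer, with the OR of 1.37 standing in for the RR. All variable names are illustrative.

```python
applier_anomalies, applier_normal = 125, 4456
general_anomalies, general_normal = 3666, 179_265
or_estimate = 1.37  # crude OR, used in place of RR

p1 = applier_anomalies / (applier_anomalies + applier_normal)
p0 = general_anomalies / (general_anomalies + general_normal)
p_crude = (applier_anomalies + general_anomalies) / (
    applier_anomalies + applier_normal + general_anomalies + general_normal
)
p_e = applier_normal / (applier_normal + general_normal)                   # exposure prevalence, as in the key
p_e_given_d = applier_anomalies / (applier_anomalies + general_anomalies)  # exposed share of cases

par_from_ar = (p1 - p0) * p_e   # PAR = AR x Pe
par_from_crude = p_crude - p0   # PAR = PCrude - P0

parp_levin = p_e * (or_estimate - 1) / (1 + p_e * (or_estimate - 1))
parp_case_based = p_e_given_d * (or_estimate - 1) / or_estimate
parp_from_prevalences = (p_crude - p0) / p_crude

print(f"PAR = {10_000 * par_from_ar:.1f} and {10_000 * par_from_crude:.1f} per 10,000")
print(f"PARP = {parp_levin:.4f}, {parp_case_based:.4f}, {parp_from_prevalences:.4f}")
```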
11.
OR = 1.04 (Derivation:
"Corrected" cases in exposed = 127 - (19 + 12) = 96
Proportion in exposed = 96 / (4456 + 96) = 0.0211
"Corrected" cases in control = 3666 + 31 = 3697
Proportion in control = 3697 / (3697 + 179,265) = 0.0202
0.0211 / 0.0202 = 1.04 = new odds ratio)
Thus, incorrectly classifying those anomalies into the exposed group
overestimates the strength of association.
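The derivation above takes a ratio of proportions. A quick sketch using the same "corrected" counts shows that the corresponding odds ratio also rounds to 1.04.

```python
# "Corrected" counts from the derivation above
exposed_cases = 127 - (19 + 12)   # 96
control_cases = 3666 + 31         # 3697
exposed_normal, control_normal = 4456, 179_265

p_exposed = exposed_cases / (exposed_normal + exposed_cases)
p_control = control_cases / (control_cases + control_normal)
prevalence_ratio = p_exposed / p_control   # the ratio computed in the answer

# The corresponding odds ratio gives essentially the same answer here
corrected_or = (exposed_cases / exposed_normal) / (control_cases / control_normal)

print(f"prevalence ratio = {prevalence_ratio:.2f}, odds ratio = {corrected_or:.2f}")
```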
12.
A. False - there is no basis for assuming that all births would be
affected equally.
B. True - The total proportion of harm, including fetal loss, is:
(lost fetuses + birth anomalies) / (lost fetuses + birth anomalies + normal live births)
This proportion exceeds the prevalence of birth anomalies among live
births, potentially by a substantial amount.
13. A. ecologic study - exposure is assessed at the community (region)
level, and exposure of persons is inferred based on residence in a
geographic region where pesticides are heavily used.
14.
1) Strength of association, estimated using odds ratios, is modest, and
therefore does not provide strong evidence on which to infer causal
relationships.
2) Biological plausibility - various laboratory studies and a clinical
epidemiologic study show that active ingredients and contaminants in
pesticides can be teratogenic and/or spermatotoxic. Also, several
compounds in the pesticides are endocrine disrupters.
3) Consistency (the authors cite epidemiologic studies [in Iowa,
Nebraska, Colorado] that have found similar relationships).
15.
This question underwent a revision to simplify it, but unfortunately some
parts of the previous version remained. The columns labelled
"# live births" should have included the qualifier "Normal", and the
rates for Minnesota needed to be re-computed accordingly. Due to this
problem, two alternate solutions are completely acceptable: one in which
the denominators are the numbers in the "# live births" column, and one
in which the denominators equal the sum of these numbers plus the
numbers of births with anomalies. In addition, full credit is given if
the rates for Minnesota were recomputed.
Here is the version in which the stated rates were used and the # of
live births column was treated as if it meant "Total live births":
Birth anomaly prevalences for Illinois, by water type:
Well water: 2/100 = 20.0 per 1000 live births
City water: 6/200 = 30.0 per 1000 live births
Bottled water: 145/7293 = 19.9 per 1000 live births
Overall (crude): 153/7593 = 20.2 per 1000 live births
Thus, the crude prevalence is higher in Minnesota (28.0 per 1000) than
in Illinois (20.2 per 1000).
Number of live births (both states combined)
--------------------------------------------
Well water          3479
City water          1074
Bottled water       7499
Total             12,052
Standardized prevalence for MN:
(3479 x 26.8 + 1074 x 30.0 + 7499 x 23.7) / 12,052 = 25.2 per 1,000
Standardized prevalence for IL:
(3479 x 20.0 + 1074 x 30.0 + 7499 x 19.9) / 12,052 = 20.8 per 1,000
The standardized prevalence for Minnesota also exceeds that for
Illinois, though by a smaller amount than the difference in the crude
prevalences (4.4 vs. 7.8 per 1000). The difference has been reduced
because the standardized prevalence for Minnesota gives somewhat greater
weight to the prevalence for bottled water (23.7/1000) and less to the
prevalence for well water (26.8/1000) than did the crude prevalence.
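The same direct-standardization step, applied to question 15, is sketched below. It uses the stated Minnesota rates and the combined-births standard from the answer. The Illinois bottled-water rate is left unrounded (19.88), which does not change the result to one decimal place.

```python
standard = {"well": 3479, "city": 1074, "bottled": 7499}                 # combined live births
mn_rates = {"well": 26.8, "city": 30.0, "bottled": 23.7}                 # per 1000, as stated
il_counts = {"well": (2, 100), "city": (6, 200), "bottled": (145, 7293)}  # (anomalies, births)

il_rates = {w: 1000 * anomalies / births for w, (anomalies, births) in il_counts.items()}
il_crude = 1000 * 153 / 7593


def standardized_rate(rates: dict, standard: dict) -> float:
    """Direct standardization: weight each stratum-specific rate by the standard population."""
    return sum(standard[w] * rates[w] for w in standard) / sum(standard.values())


mn_std = standardized_rate(mn_rates, standard)
il_std = standardized_rate(il_rates, standard)
print(f"IL crude = {il_crude:.1f}; standardized MN = {mn_std:.1f}, IL = {il_std:.1f} per 1000")
```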
16. Yes - it is not clear from these data whether birth anomalies
occurred in people with or without exposure, because exposure
information was based on group data.
17.
A. False - subjects were selected from birth records for live births.
B. False
C. True
D. False
E. False
F. True - (however, a correlation coefficient indicates the extent of
association in the sense of two variables moving in tandem; it does
not indicate the strength of association in the epidemiologic sense
of how great a change occurs in the response variable for a change
of a given size in the exposure variable)
G. True
18.
[Question removed, 10/7/97]
19.
Points in favor of action at this time are the evidence that the
relationship is causal (biological plausibility, consistency between
results of ecologic [by crop-region] and individual-based [pesticide
applier] analyses, pattern of findings [season of conception], and
consistency across several epidemiologic studies) and the high
attributable risk percent (27%) among babies with birth anomalies born
to pesticide applier couples. In addition, the substantially increased
prevalences of birth anomalies among all live births in county clusters
with high use of chlorophenoxy herbicides/fungicides (Table 4),
consistent across the four regions, suggest that anomalies due to
pesticides (assuming that the relationship is causal) occur throughout
areas where these pesticides are used. Even though the population
attributable risk proportion is very small (about 1%) for exposure due
to being a pesticide applier, the proportion of all Minnesota birth
anomalies potentially attributable to residence in a county cluster
with high pesticide use is 27% [overall prevalence of birth anomalies
for all Minnesota in-wedlock births was 3791 / 183,721 = 20.63 per 1000
live births (Table 1); prevalence of birth anomalies in low-pesticide
county clusters ("unexposed") was 15 per 1000 (Table 4); so
PARP = (PCrude - P0) / PCrude = (20.63 - 15) / 20.63 = .27].
The effects seem to be strongest for chlorophenoxy pesticides,
suggesting that at least this category should be restricted.
Moreover, there are powerful arguments for reducing pesticide use for
environmental reasons as well.
Against taking action other than continuing research are that the
evidence is still not very strong (biological mechanisms not yet
elucidated, relationship is not highly specific, epidemiologic studies
limited and not entirely consistent, experimental evidence not
available), the potential impact on agriculture and therefore food
prices is considerable, and the costs to industry and commerce from
restrictions on a major product are substantial. Moreover, the
relative weakness of the odds ratios (below 2.0) indicates a
significant possibility that other factors could be responsible for
the increase in birth anomaly prevalence seen in association with
pesticide exposure, a possibility whose investigation requires better
data on exposure and other factors that may lead to birth anomalies.
Grading of this question is based on the clarity and support for your
evaluation and recommendation.
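The 27% figure in the answer to question 19 uses the PARP form (PCrude - P0) / PCrude. It is a minimal check, taking the unexposed prevalence of 15 per 1000 from Table 4 as quoted in the answer.

```python
p_crude = 1000 * 3791 / 183_721   # overall prevalence per 1000 (Table 1), as computed in the answer
p_unexposed = 15.0                # low-pesticide county clusters, per 1000 (Table 4)
parp = (p_crude - p_unexposed) / p_crude
print(f"P_crude = {p_crude:.2f} per 1000, PARP = {parp:.2f}")
```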
Fall 1996 Final Exam (Tuesday 10 Dec 1996)

This examination is based on Per-Gunnar Persson, Anders Ahlbom, Goran
Hellers. Diet and inflammatory bowel disease: a case-control study.
Epidemiology 1992;3:47-52.
NOTE: For simplicity, ignore the requirement that this study was
restricted to those persons with a telephone number.
1. Which of the following best describes the primary objective of this
study? (Choose one best answer) (3 pts)
A. To test the hypothesis that persons with inflammatory bowel disease
are more likely to have been exposed to certain dietary factors than
those without inflammatory bowel disease.
B. To test the hypothesis that the risk of having inflammatory bowel
disease given that you have certain dietary exposures is greater than
the risk of not having inflammatory bowel disease.
C. To test the hypothesis that the increase in inflammatory bowel
disease in the population is attributed to certain dietary
exposures.
D. To test the hypothesis that the average consumption of certain
dietary factors increases as the proportion of a group of people with
inflammatory bowel disease increases.

2. Designation as a case of ulcerative colitis was based on which of the
following classification models?
A. Manifestational criteria
B. Causal criteria
C. Both manifestational and causal criteria
D. Neither
3. Medical records were used to validate the hospital diagnoses of Crohn's
disease and ulcerative colitis. By using this validation process
instead of relying on hospital discharge coding alone, the authors are
reducing which of the following sources of error? (Choose one best
answer) (3 pts)
A. Selection bias
B. Prevalence-incidence bias
C. Information bias
D. Surveillance bias
4. Controls were selected as a random sample using the population register
of Stockholm County Council. Which of the following best describes the
primary purpose of using a random sample in this study? (Choose one
best answer) (3 pts)
A. Maximize generalizability by obtaining a statistically representative
sample.
B. Select a control group that was as similar as possible to the case
group except for dietary exposures.
C. Provide an estimate of the dietary exposure in the source population
from which the cases arose.
D. Select a control group with dietary habits similar to those in the
population of cases.
5. Dietary exposures were assessed using a questionnaire with retrospective questions aimed at a period of time 5 years in the past (page 48). Which of the following situations of misclassification would make sucrose appear more harmful than it really was? (Choose one best answer) (3 pts)
A. Controls underreported sucrose intake but cases did not.
B. Cases underreported sucrose intake but controls did not.
C. Both cases and controls underreported sucrose intake.
D. Both cases and controls overreported sucrose intake.
6. Suppose that cases excluded due to administrative delay problems were more likely to have daily soft drink exposure than less-than-daily exposure. Which of the following best describes the impact this would have on the odds ratio presented in Table 3?
A. Without the exclusion the odds ratio would be closer to the null.
B. Without the exclusion the odds ratio would be larger.
C. The exclusion did not affect the odds ratio.
D. Cannot determine on the basis of this information.
7. Diagnoses of disease were verified in this study. Define validity and compare and contrast this concept with reliability. (4 pts)
8. This study uses a case-control design with a population-based control group. Which of the following, in general, is a strength of this design? (Choose one best answer) (3 pts)
A. Allows examination of rare diseases.
B. Allows examination of rare exposures.
C. Good for establishing temporality.
D. Good for equalizing on known and unknown confounders.
9. Items on the food frequency questionnaire were mostly in a format with six response options that ranged from twice per day or more often to less frequently than once every 2 weeks (page 48). In deriving values for daily energy intake, the authors treated the food frequency responses as which level of measurement? (Choose one best answer) (3 pts)
A. Nominal
B. Ordinal
C. Interval
D. Ratio
10. Control for age in the analyses presented in Table 2 was accomplished through which of the following methods? (Choose one best answer) (3 pts)
A. Stratified analysis plus matching.
B. Matching plus mathematical modeling.
C. Restriction without stratification.
D. Mathematical modeling and stratification.
11. Based on the data presented in Table 2, is ulcerative colitis associated with fat intake among men? Give a brief statement to support your answer. (4 pts)
12. The authors state on page 49 that after controlling for smoking, the relative risk for Crohn's disease among men was 1.9 for a high consumption of sucrose and 0.7 for a high consumption of fiber. Briefly explain why, based on these data, the authors state that smoking did not confound these associations. (3 pts)
13. The data presented in Table 3 indicate that Crohn's disease is associated with the consumption of fast foods. Suppose that when stratified by educational attainment, the resulting data were as follows:

                          Educational attainment
                      High                     Low
Fast foods      Controls    Cases       Controls    Cases
1+ times/wk        12         10            8         14
None              150        100          135         28

a. Calculate the crude and stratum-specific odds ratios. (3 pts)
b. Is this association between fast food and Crohn's disease confounded by education level? Quantify and briefly explain your answer. (3 pts)
c. Briefly explain in 2 sentences or a diagram how education might fit into a conceptual model consisting of fast food, education, and risk of Crohn's disease. (3 pts)
14. In the discussion (page 50), the authors state that if the change in diet is the same in cases as in controls, then the relative risk estimates would be biased toward unity. This is an example of which of the following?
A. Nondifferential misclassification bias
B. Nondifferential selection bias
C. Differential information bias
D. Differential misclassification bias
15. This article does not present p-values, yet it reports 95% confidence intervals for all odds ratios. Which of the following best describes what information a confidence interval conveys that a p-value does not?
A. A confidence interval puts the observed point estimate in the context of randomness.
B. A confidence interval provides information on the precision of the point estimate.
C. A confidence interval includes an estimate of the statistical power of the study.
D. A confidence interval reflects the clinical significance of the point estimate.
16. The study describes the association of consumption of Muesli-type breakfast cereal and Crohn's disease (Table 3). Briefly state and evaluate the strength of the numerical evidence for the association between Muesli-type breakfast cereals and Crohn's disease. (3 pts)
17. Briefly present the evidence for or against the role of fiber as a confounder of the association of sucrose intake and Crohn's disease. (3 pts)
18. Suppose a follow-up to this study was done to estimate the rate (per 10,000 person-years) of ulcerative colitis among a large sample of the Swedish population. The table below summarizes the results.

                          Fast food intake
Soft drink intake       ≥2/week       None
Daily                     18.0         9.1
Less frequently            6.8         3.7

a. Which model for the joint effect of these two food items, the additive model or the multiplicative model, better fits the data? Your answer should give the formula for each model and show how to evaluate it with the above data. (5 pts)
b. Do these data, assuming that they accurately reflect causal effects, indicate a synergistic effect from a public health perspective? Justify your answer and state an appropriate public health implication, if any. (2 pts)
19. This study did not differentiate between caffeinated and decaffeinated coffee. Using the data presented in Table 4 and applying the assumptions below, calculate the odds ratio (heavy versus no use) associated with caffeinated coffee consumption and determine if it is protective against ulcerative colitis. Describe in 2 sentences or fewer the interpretation of this new odds ratio, ignoring issues of random error. (4 pts)
Assumptions:
1. 20% of the heavy coffee drinkers (≥3 cups per day) among cases drink only decaffeinated coffee.
2. 90% of heavy coffee drinkers among controls drink only decaffeinated coffee.
20. Which of the following variables was NOT in the multiple logistic model that was used to estimate the relative risk for sucrose intake in relation to ulcerative colitis in women? (Choose best answer) (3 pts)
A. Age
B. Gender
C. Total energy intake
D. Ulcerative colitis
21. In the multiple logistic model that yielded the relative risk estimate of 0.7 for ulcerative colitis in relation to daily vegetable consumption (Table 4), what was the value of the coefficient for the vegetable consumption variable, assuming that it was coded as 1=daily, 0=less frequently? Write the conversion equation of coefficient to relative risk estimate. (3 pts)
22. Assume that the population of Stockholm County in the age range covered by this study was 1,000,000 in 1980 and remained constant throughout the decade. What was the average annual incidence of hospital-diagnosed Crohn's disease during that period, regardless of when their medical record became available? (3 pts)
23. Using the data in Table 2, for which of the following two associations is there more of an indication of confounding by age and total energy intake in WOMEN? Support your answer with relevant data and/or computations. (3 pts)
a. Crohn's disease and sucrose intake (highest versus lowest level)
b. Crohn's disease and disaccharide intake (highest versus lowest level)
24. Briefly state one major strength and one major limitation of this study. (2 pts)
25. List two Bradford Hill criteria for evaluating whether dietary sucrose intake is causally related to inflammatory bowel disease. Evaluate each using specific facts from the article. (4 pts)
26. Which of the following statements about the data in Tables 1 and 2 are TRUE and which are FALSE? (Answer TRUE or FALSE for each statement.) (2 pts each)
a. In women, the rate of (hospitalized) ulcerative colitis was higher than that of (hospitalized) Crohn's disease.
b. The similarity in age distribution between the case groups and controls indicates that the rates of these diseases are fairly uniform between the ages of 15 and 79 years.
c. Reporting of dietary intake by the Crohn's disease cases involved recall over longer periods of time, on the average, than was the case for the ulcerative colitis cases.
d. The proportion of controls with high dietary fat intake was higher for men than for women.
27. A Swedish friend of yours who lives in Stockholm has an identical twin sister who is anything but identical in terms of her diet. Your friend, like other health-conscious Swedes, avoids fast foods and soft drinks, and eats whole grain bread and muesli-type cereals daily. Her twin sister, like many Swedes, often consumes fast foods and soft drinks, but never touches whole grain bread or muesli.
Your friend comes to visit with you over the holidays, and while you are sleeping late one morning she comes across your class notes from EPID 168. At breakfast, where she has been busily scribbling on her napkin, she asks you this question:
"Suppose that fast foods, soft drinks, whole grain bread, and muesli-type cereal affect Crohn's disease risk independently, and that I can ignore other risk factors. Suppose also that the excess risks are additive. Is my twin sister's risk of Crohn's disease 10 times my own?"
She shows you how she used the information in Table 3 to obtain that estimate:
(3.4 - 1) + (2.8 - 1) + ((1/0.4) - 1) + ((1/0.2) - 1) + 1 = 10.7
She goes on to explain, "(3.4 - 1) is the excess risk from fast foods, and ((1/0.4) - 1) is the excess risk from eating bread that is not whole grain."
Even though you're not quite fully awake, you feel justifiable pride in your command of epidemiologic concepts and explain to her the one big mistake she has made.
You say, ". . . ". Write a brief statement of what you would say. (4 pts)
Answer Guide
1. A. To test the hypothesis that persons with inflammatory bowel disease are more likely to have been exposed to certain dietary factors than those without inflammatory bowel disease.
2. A. Manifestational criteria
3. C. Information bias
4. C. Provide an estimate of the dietary exposure in the source population from which the cases arose.
5. A. Controls underreported sucrose intake but cases did not.
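To see why option A makes sucrose look more harmful, here is a minimal sketch using made-up counts (hypothetical, not from the article): it moves a share of truly high-sucrose subjects into the low-sucrose row and recomputes the odds ratio. Underreporting by controls only (differential) pushes the odds ratio away from 1, underreporting by cases only pulls it down, and equal underreporting in both groups (nondifferential, as in question 14) biases it toward 1.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def underreport(exposed, unexposed, fraction):
    """Move `fraction` of truly exposed subjects into the unexposed row."""
    moved = exposed * fraction
    return exposed - moved, unexposed + moved


# Hypothetical true counts (NOT from the article): high vs low sucrose intake.
cases_exp, cases_unexp = 60, 40
ctrl_exp, ctrl_unexp = 40, 60
true_or = odds_ratio(cases_exp, ctrl_exp, cases_unexp, ctrl_unexp)

# Option A: controls underreport, cases do not (differential).
c_exp, c_unexp = underreport(ctrl_exp, ctrl_unexp, 0.25)
or_controls_under = odds_ratio(cases_exp, c_exp, cases_unexp, c_unexp)

# Option B: cases underreport, controls do not (differential).
k_exp, k_unexp = underreport(cases_exp, cases_unexp, 0.25)
or_cases_under = odds_ratio(k_exp, ctrl_exp, k_unexp, ctrl_unexp)

# Option C: both groups underreport equally (nondifferential).
or_both_under = odds_ratio(k_exp, c_exp, k_unexp, c_unexp)

print(f"true OR                  = {true_or:.2f}")
print(f"controls underreport (A) = {or_controls_under:.2f}")
print(f"cases underreport (B)    = {or_cases_under:.2f}")
print(f"both underreport (C)     = {or_both_under:.2f}")
```

With these invented counts the true odds ratio of 2.25 rises to 3.5 under A, falls to about 1.23 under B, and moves toward the null, to about 1.91, under C.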
6. B. Without the exclusion the odds ratio would be larger; this differential selection bias causes the odds ratio to be underestimated.
7. Validity refers to accuracy, or how well an instrument or method measures what it purports to measure. Reliability refers to repeatability: whether an instrument or method gets the same result or answer consistently, regardless of whether the reading is correct.
8. A. Allows examination of rare diseases.
9. D. Ratio (The response scale for each item was ordinal, but in order to create the total energy variable the authors had to convert each response into calories.)
10. B. Matching plus mathematical modeling.
11. The odds ratio for 80 to 104 grams per day was 1.4, and for intakes greater than 105 grams per day the odds ratio was 1.3. This suggests a tendency for cases to have a greater proportion of high fat eaters than controls. However, the confidence intervals are broad, extending as low as 0.4 and 0.6. Furthermore, there is no suggestion of a dose response. This is at most weak evidence of a relationship between fat intake and ulcerative colitis.
12. The crude (with respect to smoking) and adjusted odds ratios are the same. If smoking had been a confounder in the relationship between sucrose and Crohn's disease or between fiber and Crohn's disease, the adjusted odds ratio would have been meaningfully different from the values in Table 2.
13.
a. Odds ratios:
Crude = (24 x 285) / (20 x 128) = 6840 / 2560 = 2.7
Among high education = (10 x 150) / (12 x 100) = 1.3
Among low education = (14 x 135) / (28 x 8) = 8.4
b. The stratum-specific odds ratios are quite different from each other, suggesting some degree of effect modification. The crude odds ratio is within the range of the two stratum-specific odds ratios, which suggests that education is not so much a confounder as an effect modifier.
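The arithmetic in 13a and 13b can be checked with a short sketch: collapse the two education strata into a crude table by summing each cell, then compare the crude odds ratio with the stratum-specific ones. The cell labels a to d are a naming convention chosen for this sketch.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


# Stratified counts from question 13 (fast foods 1+ times/wk vs none).
strata = {
    "high education": {"a": 10, "b": 12, "c": 100, "d": 150},
    "low education": {"a": 14, "b": 8, "c": 28, "d": 135},
}

# Crude table: ignore education by summing each cell over the strata.
crude = {cell: sum(s[cell] for s in strata.values()) for cell in "abcd"}

crude_or = odds_ratio(**crude)
stratum_or = {name: odds_ratio(**cells) for name, cells in strata.items()}

print(f"crude OR = {crude_or:.2f}")
for name, value in stratum_or.items():
    print(f"{name}: OR = {value:.2f}")

# The answer guide reads a crude OR inside this range as pointing to
# effect modification rather than confounding.
lo, hi = min(stratum_or.values()), max(stratum_or.values())
print("crude OR within stratum-specific range:", lo <= crude_or <= hi)
```

The unrounded values are about 2.67 (crude), 1.25 (high education), and 8.44 (low education), which round to the 2.7, 1.3, and 8.4 given above.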
c. Three conceptual models of the relationship among fast food, education, and Crohn's disease could be:
higher education → lower fast food → lower Crohn's disease (i.e., higher educational status could lead to lower fast food consumption, which could then lead to lower Crohn's disease risk)
higher education → (lower fast food + education) → lower Crohn's disease (i.e., education also has an interactive effect with fast food consumption to lead to an association with Crohn's disease)
[higher education → lower Crohn's disease] AND [lower fast foods → lower Crohn's disease risk] (i.e., lower fast food intake and education act as independent main effects to influence Crohn's disease risk).
14. A. Nondifferential misclassification bias
15. B. A confidence interval provides information on the precision of the point estimate.
16. There appears to be a strong protective effect of daily consumption of Muesli-type breakfast cereals against Crohn's disease (odds ratio = 0.2 [0.1-0.7]). The association is considerably weaker for weekly consumption of these cereals (odds ratio = 0.8). There is evidence of a dose-response relationship, even though the OR for weekly consumption was not statistically significant. One should also consider that the absolute number of cases with daily consumption of Muesli-type cereals is small (n=4).
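Answers 15 and 16 rely on how wide a 95% confidence interval is. As a rough illustration of interval width reflecting precision, here is a sketch of Woolf's (logit) interval for an unadjusted odds ratio, applied to the crude table from question 13. This is an assumed method chosen for illustration; the article's adjusted intervals came from its own regression models and are not reproduced here.

```python
import math


def woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (logit) confidence interval.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


# Crude fast food / Crohn's disease table from question 13.
base = woolf_ci(24, 20, 128, 285)
# Same odds ratio, but twice as many subjects in every cell.
doubled = woolf_ci(48, 40, 256, 570)

for label, (or_hat, lo, hi) in [("as given", base), ("doubled", doubled)]:
    print(f"{label}: OR = {or_hat:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Doubling every cell leaves the odds ratio unchanged but narrows the interval (from roughly 1.4 to 5.0 down to roughly 1.7 to 4.2). Small cells, like the 4 daily muesli eaters among cases, therefore produce wide intervals.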
17. The authors state that sucrose and fiber intake could be associated with one another as well as with Crohn's disease, and thus each factor might be a confounder of the association between Crohn's disease and the other ("mutual confounding"). The odds ratio was 2.6 for a high sucrose intake (bottom of page 48). When adjusted for fiber, the sucrose odds ratio changed only slightly, to 2.5. Therefore, fiber was at most a slight confounder of the sucrose and Crohn's disease relationship.
18.
a. Under the additive model, we expect the joint excess rate of the two factors to be equal to the sum of the excess rates from each factor separately.
The additive model can also be written in terms of rates: expected rate of ulcerative colitis with both daily soft drinks and ≥2 fast foods per week = rate (daily soft drinks, without fast food) + rate (less frequent soft drinks, ≥2 fast foods per week) - rate (neither).
If $R_{i,j}$ is the rate for soft drink exposure $i$ and fast food exposure $j$, each equal to 1 (present) or 0 (absent), then the additive model is: $R_{1,1} = R_{1,0} + R_{0,1} - R_{0,0}$.
This equation expressed with numbers from the table is: Expected joint rate = 9.1 + 6.8 - 3.7 = 12.2.
The observed rate with both factors was 18.0. Therefore, the additive model does not explain the full amount of the observed joint risk.
Under the multiplicative model, we expect the joint rate ratio of the two factors to be equal to the product of the rate ratios for each factor separately. In the above notation, the model can be expressed as: $R_{1,1} = (R_{1,0} \times R_{0,1}) / R_{0,0}$. This equation expressed with numbers from the table is: (9.1 x 6.8) / 3.7 = 16.7. The observed rate is 18.0.
The close agreement between the observed joint rate and that expected under the multiplicative model suggests that the relationship among daily soft drink consumption, frequent fast food exposure, and ulcerative colitis is closer to multiplicative than to additive.
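The comparison in 18a, and the "greater than additive" judgment used in 18b, can be reproduced in a few lines. This sketch only re-does the arithmetic above. "Interaction contrast" (observed joint rate minus the additive expectation) is a standard name for the departure from additivity, not a term used in the exam.

```python
# Rates of ulcerative colitis per 10,000 person-years (question 18 table).
# Key (i, j): i = soft drinks daily (1) or less frequently (0);
#             j = fast food >=2/week (1) or none (0).
rates = {(1, 1): 18.0, (1, 0): 9.1, (0, 1): 6.8, (0, 0): 3.7}

# Additive model: joint excess rate = sum of the separate excess rates.
additive = rates[(1, 0)] + rates[(0, 1)] - rates[(0, 0)]

# Multiplicative model: joint rate ratio = product of the separate rate ratios.
multiplicative = rates[(1, 0)] * rates[(0, 1)] / rates[(0, 0)]

observed = rates[(1, 1)]

# Interaction contrast: > 0 means the joint effect exceeds the additive
# expectation, i.e. synergy from a public health perspective.
interaction_contrast = observed - additive

print(f"expected (additive)       = {additive:.1f}")
print(f"expected (multiplicative) = {multiplicative:.1f}")
print(f"observed                  = {observed:.1f}")
print(f"excess over additivity    = {interaction_contrast:.1f}")
```

The observed 18.0 lies much closer to the multiplicative expectation (16.7) than to the additive one (12.2). The positive excess of about 5.8 per 10,000 person-years is what answer 18b calls synergy.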
b. Generally, synergism from a public health perspective is equated with a joint effect that is greater than expected under an additive model. Therefore, the relationship between fast foods and soft drinks is synergistic, implying that the exposure group to target for maximum reduction in ulcerative colitis rates per person-year is people who consume both fast foods and soft drinks. One could propose posting warning signs in fast food establishments, soft drink vending machines, and beverage containers, etc.
19. Odds ratio for ≥3 cups caffeinated coffee = (56 x 36) / (18 x 36) = 3.1
Heavy caffeinated coffee drinking now appears to be a risk factor for ulcerative colitis, whereas before, coffee drinking appeared to be protective. An alternative approach would be to include the decaffeinated coffee drinkers in the "No" (caffeinated) coffee group. Under this model the odds ratio for ≥3 cups caffeinated coffee, relative to none or only decaffeinated = (56 x 201) / (50 x 18) = 12.5
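The two odds ratios in answer 19 can be recomputed as below. The cell counts are taken from the answer guide's arithmetic, not read from Table 4 directly, since Table 4 is not reproduced here.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


# Approach 1: heavy caffeinated drinkers vs non-drinkers.
or_vs_none = odds_ratio(a=56, b=18, c=36, d=36)

# Approach 2: heavy caffeinated drinkers vs non-drinkers pooled with
# heavy drinkers of decaffeinated coffee only.
or_vs_none_or_decaf = odds_ratio(a=56, b=18, c=50, d=201)

print(f"heavy caffeinated vs none:            OR = {or_vs_none:.1f}")
print(f"heavy caffeinated vs none/decaf only: OR = {or_vs_none_or_decaf:.1f}")
```

Both odds ratios are well above 1, which is why heavy caffeinated coffee now looks like a risk factor rather than a protective factor.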
20. B. Gender (all subjects in this analysis are women).
21. Regression coefficient = ln(OR) = ln(0.7) = -0.36; conversely, OR = exp(coefficient).
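For a 0/1-coded exposure in a logistic model, the coefficient is the natural log of the (adjusted) odds ratio, and exponentiating the coefficient recovers the odds ratio. A minimal check:

```python
import math

odds_ratio = 0.7                    # daily vs less frequent vegetables (Table 4)
beta = math.log(odds_ratio)         # natural log: coefficient = ln(OR)
back_transformed = math.exp(beta)   # OR = exp(coefficient)

print(f"beta = ln({odds_ratio}) = {beta:.2f}")
print(f"exp(beta) = {back_transformed:.2f}")
```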
22. 236 cases / 5,000,000 person-years = 4.72 cases/100,000 person-years.
Full credit was given for 236 cases / 4,000,000 person-years = 5.9 cases/100,000 person-years. Note that the incidence is obtained from all cases (or at least all confirmed cases), rather than from only consenting cases.
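With a constant population of 1,000,000, the two denominators accepted in answer 22 correspond to 5 and 4 years of case accrual. A short sketch of the rate calculation:

```python
def incidence_per_100k(cases, person_years):
    """Average incidence rate expressed per 100,000 person-years."""
    return cases / person_years * 100_000


population = 1_000_000  # assumed constant, as stated in question 22

for years in (5, 4):
    person_years = population * years
    rate = incidence_per_100k(236, person_years)
    print(f"{years} years: 236 / {person_years:,} person-years = "
          f"{rate:.2f} per 100,000 person-years")
```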
23. There is more confounding for sucrose:
For sucrose: Crude OR = (34 x 67) / (27 x 38) = 2.22, versus adjusted OR of 3.6
For disaccharides: Crude OR = (30 x 66) / (35 x 45) = 1.26, versus adjusted OR of 1.2
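The crude-versus-adjusted comparison behind answer 23 (and behind answers 12 and 17) can be expressed as a relative change in the estimate. The cell counts and adjusted odds ratios below are the ones quoted in the answer guide. The roughly 10% cut-off mentioned in the comment is a common but arbitrary rule of thumb, not something stated in this exam.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def percent_change(crude, adjusted):
    """Relative change from the crude to the adjusted estimate, in percent."""
    return 100 * (adjusted - crude) / crude


# (a, b, c, d) cells and adjusted OR for women, highest vs lowest intake.
comparisons = {
    "sucrose": ((34, 27, 38, 67), 3.6),
    "disaccharides": ((30, 35, 45, 66), 1.2),
}

for exposure, (cells, adjusted) in comparisons.items():
    crude = odds_ratio(*cells)
    change = percent_change(crude, adjusted)
    # A change larger than roughly 10% is often read as meaningful confounding.
    print(f"{exposure}: crude OR = {crude:.2f}, adjusted OR = {adjusted}, "
          f"change = {change:+.0f}%")
```

Adjustment moves the sucrose estimate by about 60%, but moves the disaccharide estimate by only about 5%.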
24. Strengths could include attempts to evaluate dose response, population-based case and control selection, validation of case status, and a large study population. Weaknesses include potential for recall bias and information bias in diet assessment.
25. Strength of association (this study assessed the strength of association by calculating odds ratios, and these measures of strength were put in context by providing confidence intervals; some stratum-specific odds ratios were strong while others were very weak), dose response, and consistency across studies (limited).
26.
a. F
b. F
c. T
d. F
27. Models of joint effects combine the effects of "pure" exposures, i.e., exposures in the absence of other exposures. But the excess risk for each food item in Table 3 is estimated without controlling for the effects of the others. For example, since people who eat fast foods are also likely to drink soft drinks and not to eat whole grain bread, the relative risk estimate for fast food 2+ times/week probably already reflects frequent soft drink consumption and low whole grain bread consumption. In order to add up the excess risk for each food item, we need to know the excess risk for exposure to that item in the absence of the others.
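The friend's arithmetic is the usual additive model for excess relative risks, RR_joint = 1 + sum of (RR_k - 1). The sketch below reproduces her 10.7. The formula is fine; as answer 27 explains, the problem lies in its inputs. Each RR fed in should be the effect of that food with the other three absent, and the one-at-a-time estimates from Table 3 are not that.

```python
# Relative risks from Table 3 as quoted in question 27. Protective ORs
# (below 1) are inverted so that *not* eating whole grain bread or muesli
# is treated as the exposure.
rr_fast_food = 3.4
rr_soft_drinks = 2.8
rr_no_whole_grain = 1 / 0.4
rr_no_muesli = 1 / 0.2


def additive_joint_rr(*rrs):
    """Joint relative risk if excess relative risks add:
    RR_joint = 1 + sum(RR_k - 1)."""
    return 1 + sum(rr - 1 for rr in rrs)


joint = additive_joint_rr(rr_fast_food, rr_soft_drinks,
                          rr_no_whole_grain, rr_no_muesli)
print(f"additive joint RR = {joint:.1f}")
```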
