Unit 08 Contingency Tables

Biostatistics 200
Unit 8: Inference for Contingency Tables
P&G Sections 15.1, 15.3, Section 6.
19 November 2012

The chi-squared test in contingency tables, P&G Chapter 15, pp 342 - 349
Odds and Odds Ratios, P&G Chapter 6, pp 144 - 149, section 15.3
Confidence intervals for an OR, P&G, pp 352 - 357
Summary

OUTLINE FOR THIS SECTION
Reformulating the two-sample binomial problem as a contingency table.
Contingency tables and testing for associations between categorical variables ($2 \times 2$ and $r \times c$ tables).
The main method will be the Pearson chi-squared ($\chi^2$) test, but we will also discuss Fisher's exact test.

ANALYZING BINARY DATA FROM TWO POPULATIONS IN A TABLE
Let $x_1$ and $x_2$ be the observed numbers of successes for two binomial variables, with numbers of trials $n_1$ and $n_2$.
The two sample proportions are
\[ \hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}. \]
Inference for the difference $p_1 - p_2$ can be based on $\hat{p}_1 - \hat{p}_2$.
The text covers that approach in Section 14.6. Instead, we will use two-way tables.

TWO-WAY (CONTINGENCY) TABLES
Binomial outcomes in two samples can be organized in a table.

Table of Outcome by Group
                        Group
Outcome    Sample 1       Sample 2       Total
Success    a = x1         b = x2         a + b
Failure    c              d              c + d
Total      a + c = n1     b + d = n2     n1 + n2 = n

With this notation,
\[ \hat{p}_1 = \frac{a}{a + c}, \qquad \hat{p}_2 = \frac{b}{b + d}. \]

THE ORGANIZATION OF TABLES
Typically, the group variable is used for the columns, and the outcome variable (success or failure) for the rows.
The group variable is sometimes called the explanatory variable.
The outcome variable is sometimes called the response variable.

THE ORGANIZATION OF TABLES . . .
Epidemiologists often use case-control designs, where cases are sampled according to outcome (typically, disease or no disease) and exposure is measured based on retrospective records.
In a case-control analysis, Stata uses the exposure status as the column label because it is the explanatory variable.
Epi 202 uses exposure as the column variable.
Epi 500 uses the exposure status as the row variable.
I use the Epi 202/Stata convention.

HARVARD UNDERGRADUATE ATTITUDES TOWARD ABORTION
In 2007, I asked my undergraduate class: "Should a woman be able to obtain a legal abortion for any reason if she chooses not to have the child?"

Answer by Sex of Student
                  Group
Outcome    Female    Male    Total
Yes        73        37      110
No         22        18      40
Total      95        55      150

Sex is the group variable; Yes/No is the outcome.

THE SURVEY ON ATTITUDES TOWARD ABORTION . . .
Equivalent ways of asking the research question in my class survey:
Are female and male students equally likely to answer "yes" to the abortion question?
Is response to the abortion question independent of sex of the student?
Is there any association between response and sex of the student?

ATTITUDES TOWARD ABORTION
In this setting, the null hypothesis is that response and sex of the student (i.e., row and column variables) are independent. The alternative is that they are not independent.
Equivalently, let
$p_F$ = probability that a randomly chosen female student answers "yes"
$p_M$ = probability that a randomly chosen male student answers "yes"
Then the hypotheses are $H_0: p_F = p_M$ vs. $H_A: p_F \neq p_M$.

TESTING THE NULL HYPOTHESIS OF INDEPENDENCE
The (Pearson) Chi-Square Test: basic idea
1. If the row and column variables are independent (null hypothesis is true), what do we expect to see?
2. How do expected values in the table compare to what has been observed?

COMPUTING EXPECTED CELL COUNTS UNDER THE NULL HYPOTHESIS OF INDEPENDENCE
Consider the 95 female respondents in the $2 \times 2$ table.
If sex and response were independent, what percentage of the female students would be expected to respond "yes"?
  110/150 = 0.73333, or about 73%. Why?
How many students do we expect to fall into the cell defined by "female, yes"?
  (95)(0.73333) = 69.7. Why?
What was the observed cell count? 73
observed - expected = 73 - 69.7 = 3.3
Now compute the expected values for all cells.

A SIMPLE FORMULA FOR EXPECTED COUNTS
It is possible to show that for a cell in row i, column j, the expected cell count under the hypothesis of independence is
\[ \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{total count in table}} \]
This formula also applies to tables with more than two rows or columns (example coming later).
Let's test the hypothesis of no association (independence) between the sex of the respondent and response.

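The expected-count formula can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the function name `expected_counts` and the list-of-rows layout are assumptions made here, and the counts are the class survey table (rows Yes/No, columns Female/Male).

```python
def expected_counts(observed):
    """Expected cell counts under independence:
    (row i total) * (column j total) / (total count in table)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Class survey: rows are Yes / No, columns are Female / Male
survey = [[73, 37], [22, 18]]
expected = expected_counts(survey)
for row in expected:
    print([round(x, 1) for x in row])
```

The same function works unchanged for tables with more than two rows or columns, since it only uses the row and column totals.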
NULL AND ALTERNATIVE HYPOTHESES IN THIS SETTING
Equivalent ways to state the relevant hypotheses in this $2 \times 2$ table:
$H_0$: Sex of respondent and response are independent vs. $H_A$: Sex of respondent and response are not independent.
$H_0: p_F = p_M$ vs. $H_A: p_F \neq p_M$, where $p_F$ and $p_M$ are the probabilities of a female and a male "yes" response, respectively.

(PEARSON) CHI-SQUARE ($\chi^2$) TEST
The test statistic for the hypotheses on the previous slide is
\[ \chi^2 = \sum_{\text{all cells}} \frac{(\text{obs} - \text{exp})^2}{\text{exp}} \]
The statistic has a sampling distribution that is approximately $\chi^2$ with degrees of freedom df = (r - 1)(c - 1), where r = # rows and c = # columns.
In a $2 \times 2$ table, df = 1.
Important values from the distribution are given in Table A.8 on page A-26.

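As a rough check on the formula above, the statistic can be computed directly from the observed counts. The sketch below is an illustration under the assumption that the table is stored as a list of rows; `pearson_chi2` is a name chosen here, not a library function.

```python
def pearson_chi2(observed):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp           # this cell's contribution
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Class survey: rows are Yes / No, columns are Female / Male
chi2_stat, df = pearson_chi2([[73, 37], [22, 18]])
print(round(chi2_stat, 4), df)
```

For the survey data this should agree with the value Stata reports for the same table, about 1.63 on 1 degree of freedom.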
A SIMILAR $\chi^2$ TABLE, BUT WITH MORE DETAIL
TABLE F: $\chi^2$ distribution critical values (Moore/McCabe, page T-20)
Table entry for p is the critical value $(\chi^2)^*$ with probability p lying to its right.

                                   Tail probability p
df     .25     .20     .15     .10     .05    .025     .02     .01    .005   .0025    .001   .0005
 1    1.32    1.64    2.07    2.71    3.84    5.02    5.41    6.63    7.88    9.14   10.83   12.12
 2    2.77    3.22    3.79    4.61    5.99    7.38    7.82    9.21   10.60   11.98   13.82   15.20
 3    4.11    4.64    5.32    6.25    7.81    9.35    9.84   11.34   12.84   14.32   16.27   17.73
 4    5.39    5.99    6.74    7.78    9.49   11.14   11.67   13.28   14.86   16.42   18.47   20.00
 5    6.63    7.29    8.12    9.24   11.07   12.83   13.39   15.09   16.75   18.39   20.51   22.11
 6    7.84    8.56    9.45   10.64   12.59   14.45   15.03   16.81   18.55   20.25   22.46   24.10
 7    9.04    9.80   10.75   12.02   14.07   16.01   16.62   18.48   20.28   22.04   24.32   26.02
 8   10.22   11.03   12.03   13.36   15.51   17.53   18.17   20.09   21.95   23.77   26.12   27.87
 9   11.39   12.24   13.29   14.68   16.92   19.02   19.68   21.67   23.59   25.46   27.88   29.67
10   12.55   13.44   14.53   15.99   18.31   20.48   21.16   23.21   25.19   27.11   29.59   31.42
11   13.70   14.63   15.77   17.28   19.68   21.92   22.62   24.72   26.76   28.73   31.26   33.14
12   14.85   15.81   16.99   18.55   21.03   23.34   24.05   26.22   28.30   30.32   32.91   34.82
13   15.98   16.98   18.20   19.81   22.36   24.74   25.47   27.69   29.82   31.88   34.53   36.48

EVER GROWING CATALOGUE OF DISTRIBUTIONS
Binomial
Normal
t
F
$\chi^2$

CHI-SQUARE ($\chi^2$) TEST
If the rows and columns are independent, the observeds and expecteds shouldn't be very different, and the $\chi^2$ value will be small.
If there is an association between rows and columns, the observeds may be far from the expecteds, leading to a large $\chi^2$ value.
So, reject the null hypothesis when the $\chi^2$ value is sufficiently large (look only at the upper tail of the chi-square distribution).
Like the F-test in ANOVA, the alternative is inherently two-sided, but the right tail area is not doubled.

ABORTION ATTITUDES, IN STATA

. tabi 73 37 \ 22 18, cchi2 chi2 expected

+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
| chi2 contribution  |
+--------------------+

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |        73         37 |       110
           |      69.7       40.3 |     110.0
           |       0.2        0.3 |       0.4
-----------+----------------------+----------
         2 |        22         18 |        40
           |      25.3       14.7 |      40.0
           |       0.4        0.8 |       1.2
-----------+----------------------+----------
     Total |        95         55 |       150
           |      95.0       55.0 |     150.0
           |       0.6        1.0 |       1.6

          Pearson chi2(1) =   1.6311   Pr = 0.202

USING A TABLE OF THE $\chi^2$ DISTRIBUTION
As with the t-distribution, tables in P&G yield only approximations to p-values. Here are the first three rows of the P&G table:

                 Area in the Upper Tail
df    0.10     0.05     0.025    0.01      ...    0.001
1     2.706    3.841    5.024    6.635     ...    10.828
2     4.605    5.991    7.378    9.210     ...    13.816
3     6.251    7.815    9.348    11.345    ...    16.266

The observed statistic, 1.63 with df = 1, is smaller than 2.706, so from the table we see that p > 0.10.

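When software is available, the tail area does not have to be bracketed from a table. For df = 1 specifically, a chi-square variable is the square of a standard normal, so the upper-tail area can be written with the complementary error function. The sketch below uses only Python's standard library; the function name is an assumption made for this illustration, and the shortcut applies to df = 1 only.

```python
import math

def chi2_upper_tail_df1(x):
    """Upper-tail area of the chi-square distribution with 1 df.

    Uses P(Z^2 > x) = 2 * P(Z > sqrt(x)) = erfc(sqrt(x / 2)),
    which holds only for df = 1.
    """
    return math.erfc(math.sqrt(x / 2.0))

p_value = chi2_upper_tail_df1(1.6311)    # statistic from the survey table
p_at_3841 = chi2_upper_tail_df1(3.841)   # tabled 0.05 critical value for df = 1
print(round(p_value, 3), round(p_at_3841, 3))
```

The first value should be close to the Pr = 0.202 that Stata reports, which is consistent with the table-based conclusion p > 0.10.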
EXACT METHODS FOR $2 \times 2$ TABLES
Fisher's exact test is often used with small and moderate sample sizes.
It is similar in spirit to using exact Binomial calculations in a one-sample problem.
A rule of thumb is coming later for sample sizes where Fisher's test should be used.
Easy to get in Stata:

. tabi 73 37 \ 22 18, exact

           |          col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |        73         37 |       110
         2 |        22         18 |        40
-----------+----------------------+----------
     Total |        95         55 |       150

           Fisher's exact =                 0.251
   1-sided Fisher's exact =                 0.139

IDEA BEHIND FISHER'S EXACT TEST
Condition on the observed row and column totals.
Given the row and column totals, the 4 cell counts are determined by any one of the cell counts. The upper left cell is typically used.
For the sampling distribution: use the conditional distribution of values for the upper left cell, given the row and column totals and under the hypothesis of independence.
A one-sided p-value is the probability of observing a value as extreme as, or more extreme than, the cell count in the upper left corner.
Two-sided p-values are even more complicated. We will only consider two-sided tests in contingency tables.

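The conditioning idea can be sketched with the hypergeometric distribution of the upper left cell. This is a minimal illustration, not a description of Stata's internals: the function name is hypothetical, the one-sided value is taken in the upper direction, and the two-sided value uses one common convention (summing the probabilities of all tables no more likely than the observed one), which is assumed here.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Return (upper one-sided, two-sided) p-values for the table [[a, b], [c, d]],
    conditioning on the observed row and column totals."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, row1 + col1 - n)   # smallest possible upper-left count
    hi = min(row1, col1)           # largest possible upper-left count
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the upper-left cell equals k
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    one_sided = sum(prob(k) for k in range(a, hi + 1))
    # Two-sided (assumed convention): tables no more likely than the observed one;
    # the small factor guards against floating-point ties.
    two_sided = sum(prob(k) for k in range(lo, hi + 1)
                    if prob(k) <= p_obs * (1 + 1e-9))
    return one_sided, two_sided

p_one, p_two = fisher_exact_2x2(73, 37, 22, 18)
print(round(p_one, 3), round(p_two, 3))
```

For the survey table the two values should be comparable to the 0.139 and 0.251 in the Stata output, assuming Stata follows the same conventions.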
CONTINGENCY TABLES WITH MORE THAN 2 ROWS OR 2 COLUMNS
$\chi^2$ tests for tables with r > 2 rows and/or c > 2 columns present no difficulties.
Use the earlier formula for the expected count for the cell in row i, column j under the hypothesis of independence of rows and columns:
\[ \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{total count in table}} \]
The chi-square statistic still has the form
\[ \chi^2 = \sum_{\text{all cells}} \frac{(\text{obs} - \text{exp})^2}{\text{exp}} \]
but has degrees of freedom df = (r - 1)(c - 1).

TABLES WITH MORE THAN 2 ROWS OR COLUMNS
Example: Accuracy of death certificates (see p 347 in P&G for more detail on the data).

                     Certificate Status
             Confirmed    Inaccurate    Incorrect
Hospital     Accurate     No Change     Recode       Total
Commun.      157          18            54           229
Teaching     268          44            34           346
Total        425          62            88           575

Note that this table has the outcome variable in the columns, since that is the way it is displayed in the text. That does not affect the value of the $\chi^2$ statistic.

Values in parentheses are expected counts (shown in green on the original slide).

                     Certificate Status
             Confirmed       Inaccurate    Incorrect
Hospital     Accurate        No Change     Recoded       Total
Commun.      157 (169.3)     18 (24.7)     54 (35.0)     229
Teaching     268 (255.7)     44 (37.3)     34 (53.0)     346
Total        425             62            88            575

\[ \chi^2 = \sum_{\text{all cells}} \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = 21.52 \]
df = (r - 1)(c - 1) = (1)(2) = 2
In Table A.8, $\chi^2_{df=2,\,0.001} = 13.82$, so the p-value is < 0.001 and we reject the null hypothesis that accuracy in death certificates is independent of hospital type.

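The death certificate calculation can be reproduced with a short script. This is a sketch under stated assumptions: the helper name is made up for the illustration, and the p-value line relies on the fact that for df = 2 the chi-square upper-tail area is exactly exp(-x/2), so it does not generalize to other degrees of freedom.

```python
import math

def pearson_chi2(observed):
    """Return (statistic, df, expected counts) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, expected

# Rows: Community, Teaching; columns: Confirmed, Inaccurate, Incorrect
certificates = [[157, 18, 54], [268, 44, 34]]
stat, df, expected = pearson_chi2(certificates)
p_value = math.exp(-stat / 2.0)   # exact upper-tail area when df = 2
print(round(stat, 2), df, p_value < 0.001)
```

The expected counts returned by the function can be compared cell by cell with the values in the table above.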
STATA . . .

. tabi 157 18 54 \ 268 44 34, expected chi exact
....some output not shown

       |               col
   row |         1          2          3 |     Total
-------+---------------------------------+----------
     1 |       157         18         54 |       229
       |     169.3       24.7       35.0 |     229.0
-------+---------------------------------+----------
     2 |       268         44         34 |       346
       |     255.7       37.3       53.0 |     346.0
-------+---------------------------------+----------
 Total |       425         62         88 |       575
       |     425.0       62.0       88.0 |     575.0

          Pearson chi2(2) =  21.5235   Pr = 0.000
           Fisher's exact =                 0.000

SOME COMMENTS
The $\chi^2$ test for $r \times c$ tables does not take into account any natural ordering of rows or columns that might be present in the data.
The text mentions the Yates continuity correction (p 346), sometimes used in calculating the $\chi^2$ statistic in small samples. It is used far less often now; it is better to use Fisher's exact test.
Fisher's exact test can be used to assess associations in general $r \times c$ tables.
It is very common now for papers to report Fisher's exact test, even in moderately large samples.

COMMENTS . . .
The following rule of thumb is used for the validity of the Pearson $\chi^2$ test:
In $2 \times 2$ tables, each expected cell count (calculated under the hypothesis of independence) should be at least 5.
In tables with more than 4 cells (excluding the cells with the row and column totals), the average expected count should be at least 5, and no expected count should be smaller than 1.
When these conditions do not hold, use Fisher's exact test.

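The rule of thumb above translates into a small check on the table of expected counts. The function name below is hypothetical, and the third table is made-up data included only to show a case where the rule fails.

```python
def pearson_approximation_ok(expected):
    """Apply the slide's rule of thumb to a table of expected counts."""
    cells = [x for row in expected for x in row]
    if len(cells) == 4:
        # 2 x 2 table: every expected count should be at least 5
        return min(cells) >= 5
    # Larger tables: average expected count at least 5, none below 1
    return sum(cells) / len(cells) >= 5 and min(cells) >= 1

survey_expected = [[69.7, 40.3], [25.3, 14.7]]                    # class survey
certificate_expected = [[169.3, 24.7, 35.0], [255.7, 37.3, 53.0]]  # death certificates
sparse_expected = [[1.0, 4.0], [3.0, 12.0]]                        # hypothetical small table

print(pearson_approximation_ok(survey_expected),
      pearson_approximation_ok(certificate_expected),
      pearson_approximation_ok(sparse_expected))
```

When the check fails, the slide's recommendation is to fall back on Fisher's exact test.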
COMMENTS . . .
Alternate form of the $\chi^2$ test for $2 \times 2$ tables. If the table has entries

                      Group
Outcome    Sample 1    Sample 2    Total
Success    a           b           a + b
Failure    c           d           c + d
Total      n1          n2          n

then the $\chi^2$ statistic can be written
\[ \chi^2 = \frac{n(ad - bc)^2}{(a + c)(b + d)(a + b)(c + d)}, \qquad df = 1 \]

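A quick numerical check of the shortcut formula, using the class survey counts (a = 73, b = 37, c = 22, d = 18). The function name is an assumption for this sketch.

```python
def chi2_shortcut_2x2(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table via n(ad - bc)^2 / product of margins."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))

shortcut = chi2_shortcut_2x2(73, 37, 22, 18)
print(round(shortcut, 4))
```

The result should match the statistic obtained by summing (obs - exp)^2 / exp over the four cells of the same table.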

INTRODUCTION
The $\chi^2$ test and Fisher's exact test provide methods for testing the null hypothesis of independence between row and column variables.
But neither test provides an estimate of the nature of the association when the hypothesis of independence is rejected.
We will use odds ratios for estimating association between row and column variables.
To study odds ratios, we first need to study odds.

USING ODDS IN BINOMIAL MODELS

BETTING IN A FAIR GAME
An American roulette wheel has 38 slots: 1, 2, 3, ..., 36, 0, 00.
If you place a $1 bet on "00" for a single spin of the wheel:
You have a 1/38 chance of winning in a single spin
1 way to win, 37 ways to lose, or
The casino has 37 ways to win, 1 way to lose
The odds of winning for the house are 37 to 1 and are 1 to 37 for you

BETTING IN ROULETTE
For the game to be fair:
Casino keeps your $1 if "00" does not come up
Casino pays $37 if "00" comes up, and you keep your $1 bet
If X represents your winnings from a $1 bet and E(X) the average winnings in many such bets,
\[ E(X) = -1(37/38) + 37(1/38) = 0 \]
Casinos stay in business: by paying out 35 to 1, the casinos ensure that roulette is not a fair game.
In this case
\[ E(X) = -1(37/38) + 35(1/38) = -(2/38) = -0.053 \]

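The two expected-value calculations can be verified with exact fractions. The function and parameter names below are chosen for this sketch only.

```python
from fractions import Fraction

def expected_winnings(payout, n_slots=38):
    """Expected winnings from a $1 bet on a single slot:
    win `payout` with probability 1/n_slots, lose $1 otherwise."""
    p_win = Fraction(1, n_slots)
    return payout * p_win - 1 * (1 - p_win)

fair_game = expected_winnings(37)    # payout that would make the game fair
casino_game = expected_winnings(35)  # payout casinos actually offer
print(fair_game, casino_game, round(float(casino_game), 3))
```

Using `Fraction` keeps the arithmetic exact, so the fair-game expectation comes out as exactly zero rather than a small floating-point residue.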
MATHEMATICAL DEFINITION OF ODDS
In roulette, the 37:1 odds for the house is the same as
\[ \frac{37/38}{1/38} \]
More generally, if the probability of an event A is p, the odds of the event is p/(1 - p).
Sometimes written as p : (1 - p) (read as p "to" 1 - p).
(1/3) : (2/3) odds is the same as 1:2 odds.
If p is small (say p < 0.10), then $(1 - p) \approx 1$ and so odds $\approx p$. The approximation improves as p approaches 0.

ODDS VS. PROBABILITIES

Probability       Odds = p/(1 - p)       Odds
0                 0/1 = 0                0
1/100 = 0.01      1/99 = 0.0101          1:99
1/10 = 0.10       1/9 = 0.11             1:9
1/4               1/3                    1:3
1/3               1/2                    1:2
1/2               (1/2)/(1/2) = 1        1:1
2/3               (2/3)/(1/3) = 2        2:1
3/4               3                      3:1
1                 1/0

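The conversion from probability to odds, and the small-p approximation, can be illustrated as follows. The function name `odds` is an assumption for this sketch; p = 1 is left out because the odds are then 1/0.

```python
from fractions import Fraction

def odds(p):
    """Odds of an event with probability p (requires p < 1)."""
    return p / (1 - p)

# Exact values for some rows of the table
quarter = odds(Fraction(1, 4))      # 1/3, i.e. 1:3
two_thirds = odds(Fraction(2, 3))   # 2, i.e. 2:1

# For small p the odds are close to p itself
small = odds(0.01)
print(quarter, two_thirds, round(small, 4))
```

At p = 0.01 the odds differ from p by about 1%, while at p = 0.10 the difference is about 11%, which is why the approximation is usually reserved for small p.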
ODDS RATIO OR RELATIVE ODDS
Suppose we have a disease (e.g., lung cancer, denoted by D) and two groups (e.g., smokers, denoted by E for exposure, and non-smokers, denoted by $E^c$).
The odds ratio (OR), or relative odds, of disease comparing smokers to non-smokers is
\[ OR = \frac{\Pr(D \mid E) \,/\, (1 - \Pr(D \mid E))}{\Pr(D \mid E^c) \,/\, (1 - \Pr(D \mid E^c))} \]
This is the odds of disease for smokers divided by the odds of disease for non-smokers.
OR > 1 implies smokers have higher probability of disease.
OR < 1 implies smokers have lower probability.

FUNDAMENTAL RESULT FOR EPIDEMIOLOGISTS
The odds ratio of disease, comparing exposed to unexposed:
\[ OR = \frac{\Pr(D \mid E) \,/\, (1 - \Pr(D \mid E))}{\Pr(D \mid E^c) \,/\, (1 - \Pr(D \mid E^c))} = \cdots = \frac{\Pr(E \mid D) \,/\, (1 - \Pr(E \mid D))}{\Pr(E \mid D^c) \,/\, (1 - \Pr(E \mid D^c))} \]
is equal to the odds ratio of exposure, comparing diseased vs. non-diseased subjects.
We will derive this later using a simple formula for OR in a $2 \times 2$ table.

TWO IMPORTANT POINTS
Why is this surprising?
Because when cases and controls are sampled and exposure is determined retrospectively, it is only possible to estimate $\Pr(E \mid D)$ and $\Pr(E \mid D^c)$, not $\Pr(D \mid E)$ and $\Pr(D \mid E^c)$.
Why is this important?
Because even when exposure is estimated by sampling from cases and controls, it is possible to estimate the correct OR.

EXPLOITING THE SYMMETRY OF THE ODDS RATIO
Thus, OR can be estimated in two ways:
Prospective studies of exposed and unexposed, to see who develops disease, as in a cohort study design
Retrospective studies of diseased vs. healthy subjects, to see who is exposed, as in a case-control study design
Both types of studies can estimate the OR of disease, comparing exposed to unexposed.

EXPLOITING THE RARE DISEASE ASSUMPTION
When a disease D is rare in both exposed and unexposed groups, $1 - \Pr(D \mid E)$ and $1 - \Pr(D \mid E^c)$ are both close to 1.
In this case
\[ OR \approx \frac{\Pr(D \mid E)}{\Pr(D \mid E^c)}, \]
which is called relative risk.

A SIMPLE FORMULA FOR AN ODDS RATIO IN A CASE CONTROL STUDY

              Exposed    Unexposed    Total
Disease       a          b            a + b
No Disease    c          d            c + d
Total         a + c      b + d        n

\[ \widehat{OR} = \frac{\hat{P}(E \mid D) \,/\, (1 - \hat{P}(E \mid D))}{\hat{P}(E \mid D^c) \,/\, (1 - \hat{P}(E \mid D^c))} = \frac{(a/(a+b)) \,/\, (b/(a+b))}{(c/(c+d)) \,/\, (d/(c+d))} = \frac{ad}{bc} \]

A SIMPLE FORMULA FOR AN ODDS RATIO IN A PROSPECTIVE STUDY

              Exposed    Unexposed    Total
Disease       a          b            a + b
No Disease    c          d            c + d
Total         a + c      b + d        n

\[ \widehat{OR} = \frac{\hat{P}(D \mid E) \,/\, (1 - \hat{P}(D \mid E))}{\hat{P}(D \mid E^c) \,/\, (1 - \hat{P}(D \mid E^c))} = \frac{(a/(a+c)) \,/\, (c/(a+c))}{(b/(b+d)) \,/\, (d/(b+d))} = \frac{ad}{bc} \]

ELECTRONIC FETAL MONITORING (EFM), P&G PP 354 - 357
Does EFM have an impact on Caesarean-section (C-section) delivery decisions?
Assume that in a sample of 5,824 births:

Caesarean         EFM Exposure
Delivery      Yes       No        Total
Yes           358       229       587
No            2,492     2,745     5,237
Total         2,850     2,974     5,824

This is a case-control study, sampled according to type of delivery. EFM is the exposure variable.

THE EPIDEMIOLOGIST'S APPROACH TO THIS PROBLEM
Even though this is a case-control study that sampled according to type of delivery, it is possible to estimate the odds ratio
\[ OR = \frac{\Pr(C \mid E) \,/\, (1 - \Pr(C \mid E))}{\Pr(C \mid E^c) \,/\, (1 - \Pr(C \mid E^c))}, \]
where C = (woman delivers by C-section) and E = (EFM used during pre-natal care).
Invoke the rare disease assumption to estimate
\[ \frac{\Pr(\text{C-section} \mid \text{EFM})}{\Pr(\text{C-section} \mid \text{no EFM})} \]

C-SECTION AND EFM

Caesarean         EFM Exposure
Delivery      Yes       No        Total
Yes           358       229       587
No            2,492     2,745     5,237
Total         2,850     2,974     5,824

\[ \text{Relative Risk} = \frac{\widehat{\Pr}(\text{C-section} \mid \text{EFM})}{\widehat{\Pr}(\text{C-section} \mid \text{no EFM})} \approx \text{Odds Ratio} = \frac{(358)(2745)}{(229)(2492)} = 1.72 \]
Can we check the rare disease assumption with these data?

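The symmetry of the odds ratio can be confirmed numerically with the EFM counts: the exposure odds ratio (case-control view), the disease odds ratio (prospective view), and the cross-product ad/bc all coincide. The variable names below are chosen for this sketch; exact fractions are used so the equality is not blurred by rounding.

```python
from fractions import Fraction

# a, b: C-section with / without EFM; c, d: no C-section with / without EFM
a, b, c, d = 358, 229, 2492, 2745

# Case-control view: odds of exposure among cases vs. among controls
p_e_cases = Fraction(a, a + b)
p_e_controls = Fraction(c, c + d)
or_exposure = (p_e_cases / (1 - p_e_cases)) / (p_e_controls / (1 - p_e_controls))

# Prospective view: odds of outcome among exposed vs. among unexposed
p_d_exposed = Fraction(a, a + c)
p_d_unexposed = Fraction(b, b + d)
or_disease = (p_d_exposed / (1 - p_d_exposed)) / (p_d_unexposed / (1 - p_d_unexposed))

cross_product = Fraction(a * d, b * c)
print(or_exposure == or_disease == cross_product, round(float(cross_product), 2))
```

Only the algebra is being checked here; whether the prospective probabilities are actually estimable depends on how the study was sampled, as the slides explain.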
THE CHI-SQUARED TEST IN STATA
The null hypothesis of no association between rows and columns (independence) is equivalent to $H_0: OR = 1$. The two-sided alternative is $H_A: OR \neq 1$.

. tabi 358 229 \ 2492 2745, chi2 expected

       |          col
   row |         1          2 |     Total
-------+----------------------+----------
     1 |       358        229 |       587
       |     287.3      299.7 |     587.0
-------+----------------------+----------
     2 |     2,492      2,745 |     5,237
       |   2,562.7    2,674.3 |   5,237.0
-------+----------------------+----------
 Total |     2,850      2,974 |     5,824
       |   2,850.0    2,974.0 |   5,824.0

          Pearson chi2(1) =  37.9488   Pr = 0.000

INTEGRATING THE OR AND THE TEST
The small p-value from the table leads to rejection of $H_0$.
Data from the study suggest that the odds of a C-section in women with pre-natal EFM are 72% higher than in women without pre-natal EFM.
If the rare disease assumption is justified, RR $\approx$ OR = 1.72, and the study suggests women with pre-natal EFM are 72% more likely to have a C-section.
Can also test $H_0$ by examining a confidence interval for OR.
The next section gives the formula for this confidence interval.


ODDS RATIO OR RELATIVE ODDS
Recall. . . Suppose we have a disease (yes or no) and two groups (exposed and unexposed).
The odds ratio (OR), or relative odds, is given by
\[ OR = \frac{P(D \mid E) \,/\, (1 - P(D \mid E))}{P(D \mid E^c) \,/\, (1 - P(D \mid E^c))} \]
where D = disease, E = exposed, $E^c$ = unexposed.
We will calculate a confidence interval for log(OR), then convert that to a confidence interval for OR.

WOOLF'S APPROXIMATE CONFIDENCE INTERVAL FOR LOG(OR)

              Exposed    Unexposed    Total
Disease       a          b            a + b
No Disease    c          d            c + d
Total         a + c      b + d        n

\[ \widehat{OR} = \frac{ad}{bc}, \qquad se(\log(\widehat{OR})) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} \]
where the log function is the natural log or log base e, sometimes denoted by ln.
Confidence intervals for log(OR) have the form
\[ \log(\widehat{OR}) \pm z_{\alpha/2} \; s.e.(\log(\widehat{OR})) \]

EFM MONITORING

Caesarean           EFM Exposure
Delivery      Yes            No             Total
Yes           a = 358        b = 229        587
No            c = 2,492      d = 2,745      5,237
Total         2,850          2,974          n = 5,824

\[ \widehat{OR} = \frac{(358 \times 2745)}{(229 \times 2492)} = 1.72; \qquad \log(\widehat{OR}) = 0.542 \]
\[ se(\log(\widehat{OR})) = \sqrt{\frac{1}{358} + \frac{1}{229} + \frac{1}{2492} + \frac{1}{2745}} = 0.089 \]

CONFIDENCE INTERVAL (CI) FOR LOG(OR) AND FOR OR . . .
Based on the data from the EFM study, a 95% CI for log(OR) will be
(0.542 - (1.96)(0.089), 0.542 + (1.96)(0.089)) = (0.368, 0.716),
and a 95% CI for OR will be
(exp(0.368), exp(0.716)) = (1.44, 2.05).
Note that the value 1 is outside this confidence interval; that is consistent with rejecting $H_0$ with the $\chi^2$ test.
What is the interpretation of the confidence interval?

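The Woolf interval can be recomputed from the four cell counts. This sketch carries full precision throughout instead of the rounded intermediate values on the slides, so log(OR) comes out slightly above 0.542; the interval agrees with the slide's to two decimals. The variable names and the use of z = 1.96 for a 95% interval follow the slides; everything else is an assumption of the illustration.

```python
import math

a, b, c, d = 358, 229, 2492, 2745   # EFM table cell counts

odds_ratio = (a * d) / (b * c)
log_or = math.log(odds_ratio)                        # natural log
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96                                             # z_{alpha/2} for a 95% interval
ci_log = (log_or - z * se_log_or, log_or + z * se_log_or)
ci_or = (math.exp(ci_log[0]), math.exp(ci_log[1]))   # back-transform to the OR scale

print(round(odds_ratio, 3), round(se_log_or, 3),
      tuple(round(x, 2) for x in ci_or))
```

Because the interval is built on the log scale and then exponentiated, it is not symmetric around the point estimate on the OR scale.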
STATA AGAIN . . .

. cci 358 229 2492 2745, woolf
                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       358         229  |        587       0.6099
        Controls |      2492        2745  |       5237       0.4758
-----------------+------------------------+------------------------
           Total |      2850        2974  |       5824       0.4894
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         1.722035       |    1.446314    2.050318 (W
 Attr. frac. ex. |         .4192916       |     .308587    .5122708 (W
 Attr. frac. pop |         .2557178       |
                 +-------------------------------------------------
                               chi2(1) =    37.95  Pr>chi2 = 0.0000

The notation (Woolf) has been clipped from the output, next to the confidence intervals.

USING STATA FOR CONFIDENCE INTERVAL CALCULATIONS
The Woolf method must be requested in the options for cci.
Exact CIs are numerically more difficult to estimate, but easy in software.
Exact CIs are now the default in Stata.
Use the exact method whenever Stata will compute it in a reasonable amount of time.

STATA WITH EXACT CI. . .

. cci 358 229 2492 2745
                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       358         229  |        587       0.6099
        Controls |      2492        2745  |       5237       0.4758
-----------------+------------------------+------------------------
           Total |      2850        2974  |       5824       0.4894
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         1.722035       |     1.44092    2.058247 (e
 Attr. frac. ex. |         .4192916       |    .3059991    .5141496 (e
 Attr. frac. pop |         .2557178       |
                 +-------------------------------------------------
                               chi2(1) =    37.95  Pr>chi2 = 0.0000

The notation (exact) has been clipped from the output, next to the confidence intervals.

MORE DETAILS ON STATA OUTPUT
In Stata:
Attr. frac. ex. is an abbreviation for the fraction of cases in the exposed group attributed to the exposure.
Attr. frac. pop is an abbreviation for the fraction of cases in the whole population attributed to the exposure.
Before we give the formulas, it is important to note that these two concepts only have meaning if there is a clear causal relationship between exposure and outcome.
It is rarely possible to draw causal inference from a case-control study. Nevertheless . . .

FORMULAS FOR ATTRIBUTABLE RISK
In a case-control study,
\[ \text{Attr. frac. ex.} = \frac{OR - 1}{OR} = \frac{1.722 - 1}{1.722} = 0.419 \]
\[ \text{Attr. frac. pop} = (\text{Attr. frac. ex.}) \times (\text{proportion of cases exposed}) = (0.419)(0.6099) = 0.2557 \]
Formulas for cohort studies are different.

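The two attributable fractions can be recomputed from the cell counts as a check on the formulas above. The variable names are chosen for this sketch, and the formulas are the case-control versions given on the slide.

```python
a, b, c, d = 358, 229, 2492, 2745    # EFM table: cases in the first row

odds_ratio = (a * d) / (b * c)
prop_cases_exposed = a / (a + b)     # "Proportion Exposed" among cases

attr_frac_exposed = (odds_ratio - 1) / odds_ratio
attr_frac_population = attr_frac_exposed * prop_cases_exposed

print(round(attr_frac_exposed, 4), round(attr_frac_population, 4))
```

These should be close to the Attr. frac. ex. and Attr. frac. pop point estimates in the Stata output; as noted on the slides, they are only interpretable if the exposure is causally related to the outcome.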

MAIN IDEAS
Inference for two-sample binomial data is framed as a $2 \times 2$ table, with the $\chi^2$ test used to test independence between rows and columns.
Can extend this to $r \times c$ tables.
Odds ratios (OR) in a $2 \times 2$ table and confidence intervals for OR are used to quantify the association.
OR for exposure, given disease, is the same as OR for disease, given exposure.
OR approximates relative risk when disease is rare.
For a rare disease, OR from a case-control study can be used to estimate relative risk.
In the next unit, we will extend the use of odds ratios to stratified $2 \times 2$ tables to adjust for possible confounders.
