STA303H5S - Winter 2014: Data Analysis II


LECTURE 2: One-Way ANOVA and Linear Regression
Ramya Thinniyam

January 9th, 2014

The Spock Conspiracy Trial


Q: Is there evidence of gender bias in the jury selection of Spock's trial?

A1: Last class, we used a two-sample t-test to answer the question of interest.

$H_0: \mu_{spock} = \mu_{other}$ vs. $H_a: \mu_{spock} \neq \mu_{other}$

t-test Method                               Test Statistic   p-value
Pooled t-test (assuming equal variances)    5.67             < 0.0001
Satterthwaite Approximation                 7.16             < 0.0001

We concluded that there is very strong evidence of a difference between the mean percentage of women on Spock's judge's venires and that on the other judges' venires.

Spock Conspiracy Trial


A2: Use a Linear Model approach / ANOVA

Recall: A Multiple Linear Regression model with $p$ predictors

$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \dots + \beta_p X_{p,i} + e_i$, for $i = 1, 2, \dots, n$

$Y_i$: response for the $i$th case (quantitative variable)
$X_{1,i}, X_{2,i}, \dots, X_{p,i}$: predictors for the $i$th case (quantitative or categorical)
$e_i$: error term for the $i$th case, where $e_i \overset{iid}{\sim} N(0, \sigma^2)$
$\beta_0, \beta_1, \dots, \beta_p$: regression coefficients/parameters; $\beta_0$: intercept
$n$: number of cases / sample size

If we are interested in using a factor/categorical variable with $\ell$ levels, then we model it with $\ell - 1$ indicator/dummy variables. Choose one level as the default: it has no indicator variable, and each of the other levels does.

Q: Why do we use $\ell - 1$ indicator variables instead of $\ell$?
A: If a case does not belong to level $1, 2, 3, \dots,$ or $\ell - 1$, then by default it belongs in level $\ell$.
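To see why a separate indicator for the default level would be redundant when the model has an intercept, note that every case belongs to exactly one level. A short sketch, writing $I_{k,i}$ for the variable that is 1 when case $i$ is in level $k$ and 0 otherwise (the indicator $I_{\ell,i}$ for the default level is introduced here only for illustration):

```latex
\[
\sum_{k=1}^{\ell} I_{k,i} = 1 \quad \text{for every case } i
\qquad\Longrightarrow\qquad
I_{\ell,i} = 1 - \sum_{k=1}^{\ell-1} I_{k,i}.
\]
```

The $\ell$th indicator is therefore an exact linear combination of the intercept column and the other $\ell - 1$ indicators, so including it would leave the regression coefficients without a unique least squares solution.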

Using Indicator Variables


Suppose a factor has $\ell$ levels. We can define indicator variables as follows. For $k = 1, 2, \dots, \ell - 1$:

$I_{k,i} = \begin{cases} 1, & \text{if the $i$th case belongs in factor level } k \\ 0, & \text{otherwise} \end{cases}$

Then, in the Spock example:

$I_{spock,i} = \begin{cases} 1, & \text{if the $i$th venire has Spock's judge} \\ 0, & \text{otherwise} \end{cases}$

Fit the model: $Y_i = \beta_0 + \beta_1 I_{spock,i} + e_i$ for $i = 1, 2, \dots, 46$, where $Y_i$ = % women on the $i$th venire.

This is a Simple Linear Regression model.

Least Squares Estimates of Regression Parameters


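$\hat{\beta}_0 = b_0 = \bar{y} - b_1 \bar{x}$

$\hat{\beta}_1 = b_1 = SSXY / SSXX = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \dfrac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}$

Q: In the Spock example, what are the following quantities?
$x_i =$
$\sum_{i=1}^{n} x_i =$
$\bar{x} =$
$\sum_{i=1}^{n} x_i^2 =$
$\sum_{i=1}^{n} x_i y_i =$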
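One way to fill in these quantities, assuming the predictor is the Spock indicator, $x_i = I_{spock,i}$, and writing $n_{spock}$ for the number of venires with Spock's judge and $\bar{y}_{spock}$ for their mean response (this notation is introduced here, not taken from the slides):

```latex
\begin{align*}
x_i &= I_{spock,i} \in \{0, 1\}, \\
\sum_{i=1}^{n} x_i &= n_{spock}, \\
\bar{x} &= \frac{n_{spock}}{n}, \\
\sum_{i=1}^{n} x_i^2 &= \sum_{i=1}^{n} x_i = n_{spock}
  \quad (\text{since } x_i^2 = x_i \text{ for a 0/1 variable}), \\
\sum_{i=1}^{n} x_i y_i &= \sum_{i:\, x_i = 1} y_i = n_{spock}\, \bar{y}_{spock}.
\end{align*}
```

With $n = 46$ venires, and nine entries of `I_spock` equal to 1 in the R printout later in the lecture, this gives $n_{spock} = 9$ and $\bar{x} = 9/46$.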


Parameter Interpretation
For the model $Y_i = \beta_0 + \beta_1 I_{spock,i} + e_i$:

$E(Y_i) = \begin{cases} \beta_0 + \beta_1, & \text{if the $i$th venire has Spock's judge} \\ \beta_0, & \text{if the $i$th venire has another judge} \end{cases}$

So,
$\beta_0$ is the mean % of women in the other judges' venires
$\beta_1$ is the difference in the mean % of women (response) between Spock's judge's venires and the other judges' venires
$\beta_1 = 0$: no difference between the mean % women for Spock's judge and for the other judges
$\beta_1 > 0$: mean % women is higher for Spock's judge than for the other judges
$\beta_1 < 0$: mean % women is lower for Spock's judge than for the other judges

Caution: If the factor has more levels, the interpretation is slightly different: expectations are relative to the default factor level. Write out the model using indicators and take expectations to correctly interpret the parameters.
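As an illustration of the caution above, consider a hypothetical factor with three levels A, B, and C, with C chosen as the default (these level names are made up for the example and are not from the lecture). Writing out the model and taking expectations gives:

```latex
\begin{align*}
Y_i &= \beta_0 + \beta_1 I_{A,i} + \beta_2 I_{B,i} + e_i, \\[4pt]
E(Y_i) &=
\begin{cases}
\beta_0 + \beta_1, & \text{if the $i$th case is in level A} \\
\beta_0 + \beta_2, & \text{if the $i$th case is in level B} \\
\beta_0,           & \text{if the $i$th case is in level C (default)}
\end{cases}
\end{align*}
```

Here $\beta_1$ and $\beta_2$ are each differences from the mean of the default level C, and the difference between the means of levels A and B is $\beta_1 - \beta_2$.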

Regression Parameter Estimates


The parameter estimates in Spock's example simplify to:

$b_1 = \bar{y}_{spock} - \bar{y}_{other}$

Proof:

Homework Exercise: Show $b_0 = \bar{y}_{other}$.
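The proof of the $b_1$ result is left blank on the slide. One possible sketch, assuming $x_i = I_{spock,i}$ and using the shorthand $n_s$ and $n_o = n - n_s$ for the numbers of Spock and other venires, with $\bar{y}_s$ and $\bar{y}_o$ for the corresponding group means (shorthand introduced here):

```latex
\begin{align*}
\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}
  &= n_s \bar{y}_s - n \cdot \frac{n_s}{n} \cdot \bar{y}
   = n_s (\bar{y}_s - \bar{y}), \\
\sum_{i=1}^{n} x_i^2 - n \bar{x}^2
  &= n_s - \frac{n_s^2}{n}
   = \frac{n_s n_o}{n}, \\
\bar{y}_s - \bar{y}
  &= \bar{y}_s - \frac{n_s \bar{y}_s + n_o \bar{y}_o}{n}
   = \frac{n_o (\bar{y}_s - \bar{y}_o)}{n}, \\
b_1
  &= \frac{n_s \cdot n_o (\bar{y}_s - \bar{y}_o) / n}{n_s n_o / n}
   = \bar{y}_s - \bar{y}_o .
\end{align*}
```

The first two lines use the facts that $\sum x_i = \sum x_i^2 = n_s$ and $\sum x_i y_i = n_s \bar{y}_s$ for a 0/1 predictor.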

Testing using a Linear Regression Model


$H_0: \beta_1 = \beta_{10}$ vs. $H_a: \beta_1 \neq \beta_{10}$

$t = \dfrac{b_1 - \beta_{10}}{se(b_1)} \sim t_{n-2}$ under $H_0$

Assuming the following hold:
Correct form of the model
Gauss-Markov Conditions:
1. $E(e_i) = 0$
2. $Var(e_i) = \sigma^2$ (constant)
3. $E(e_i e_j) = 0$ for $i \neq j$ (uncorrelated errors)
$e_i$ are Normal

Testing if the means differ is equivalent to testing if the $\beta_1$ parameter is significant in the regression.

Connection to ANOVA
When $\beta_{10} = 0$ (as in the Spock example), using the linear model is the same as One-Way Analysis of Variance (ANOVA): one factor, testing if the means of the groups are different. In general, it can be extended to multiple factors and to factors with more than two levels: testing if all the factor level means are equal or if any of them differ. We will discuss ANOVA next class and use it to answer the questions of interest in the Spock Conspiracy case study:

Question of Interest 1: Is there evidence of a difference in the mean percent of women on Spock's judge's venires when compared to the other judges?
One-Way ANOVA with 2 factor levels (Spock and other)

Question of Interest 2: Is there evidence that there are differences in women's representation in the venires of the other 6 judges?
One-Way ANOVA with 6 factor levels (A, B, C, D, E, F)

Spock Linear Model in R


> I_spock=rep(0,46)
> I_spock
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[23] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> # set the indicator to 1 for venires drawn by Spock's judge
> for(i in 1:length(judge1)) {
    if (judge1[i]=="SPOCK"){
      I_spock[i]=1
    }
  }
> I_spock
 [1] 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[23] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

> spock_linearreg = lm(percentwomen ~ I_spock)
> summary(spock_linearreg)

Call:
lm(formula = percentwomen ~ I_spock)

Residuals:
     Min       1Q   Median       3Q      Max
-12.9919  -4.6669   0.2581   3.7854  19.4081

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   29.492      1.160   25.42  < 2e-16 ***
I_spock      -14.870      2.623   -5.67 1.03e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.056 on 44 degrees of freedom
Multiple R-squared:  0.4222,  Adjusted R-squared:  0.409
F-statistic: 32.15 on 1 and 44 DF,  p-value: 1.03e-06
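The regression t value above has the same magnitude as the pooled t statistic from last class. The same agreement can be checked on a small made-up data set. The sketch below does not use the Spock data: the vectors `y` and `g` are hypothetical values chosen only for illustration.

```r
# Hypothetical data (NOT the Spock data): made-up responses for two groups.
y <- c(12.1, 15.3, 14.8, 16.0, 28.4, 30.2, 27.5, 31.9, 29.0)
g <- c(1, 1, 1, 1, 0, 0, 0, 0, 0)  # indicator: 1 = group of interest, 0 = default group

# Simple linear regression of the response on the indicator.
fit <- lm(y ~ g)
b <- coef(fit)

# Difference in group means, to compare with the fitted slope.
diff_means <- mean(y[g == 1]) - mean(y[g == 0])

# Pooled two-sample t-test (equal variances), to compare with the slope's t value.
tt <- t.test(y[g == 1], y[g == 0], var.equal = TRUE)
t_lm <- summary(fit)$coefficients["g", "t value"]

print(c(slope = unname(b["g"]), diff_means = diff_means))
print(c(t_lm = t_lm, t_pooled = unname(tt$statistic)))
```

In each printed pair, the two numbers should agree up to floating-point error, and the intercept `b["(Intercept)"]` should equal the mean of the default group.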

Example: Spock Conspiracy


Q: Answer the first question of interest using a linear model approach. Include all the necessary elements, assumptions, and make a conclusion.

A:
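The numerical ingredients for the answer can be read off the R output from the previous slide. A sketch of the arithmetic, using only the reported estimates (the hypotheses, the model assumptions, and the conclusion still need to be stated in words):

```latex
\begin{align*}
H_0 &: \beta_1 = 0 \quad \text{vs.} \quad H_a : \beta_1 \neq 0, \\
t &= \frac{b_1 - 0}{se(b_1)} = \frac{-14.870}{2.623} \approx -5.67,
  \qquad \text{df} = n - 2 = 46 - 2 = 44, \\
\text{p-value} &= 1.03 \times 10^{-6}, \\
\hat{E}(Y \mid \text{other}) &= b_0 = 29.492, \qquad
\hat{E}(Y \mid \text{Spock}) = b_0 + b_1 = 29.492 - 14.870 = 14.622 .
\end{align*}
```

Squaring the t value gives approximately 32.1, which matches the reported F-statistic of 32.15 up to rounding, as expected when a single coefficient is tested.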
