ANOVA Assumptions
One-way ANOVA
National Airlines
Raw Data (in Excel)
Raw Data (in SPSS)
Transform Data into Analysis-Ready Form
Analyze > Compare Means > One-Way ANOVA
Post Hoc Contrasts
Results
Descriptives
Unfilled
                                                                95% Confidence Interval for Mean
              N    Mean    Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
National      10    9.80   2.044            .646          8.34         11.26         7         13
Competitor 1  10   11.30   2.003            .633          9.87         12.73         7         13
Competitor 2  10   12.60   2.011            .636         11.16         14.04         9         15
Total         30   11.23   2.269            .414         10.39         12.08         7         15
ANOVA
Unfilled
                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups    39.267           2   19.633        4.815   .016
Within Groups    110.100          27    4.078
Total            149.367          29
Multiple Comparisons
                                                                       95% Confidence Interval
(I) Airline    (J) Airline    Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
National       Competitor 1   -1.500                  .903         .325   -3.81           .81
               Competitor 2   -2.800*                 .903         .013   -5.11          -.49
Competitor 1   National        1.500                  .903         .325    -.81          3.81
               Competitor 2   -1.300                  .903         .484   -3.61          1.01
Competitor 2   National        2.800*                 .903         .013     .49          5.11
               Competitor 1    1.300                  .903         .484   -1.01          3.61
*. The mean difference is significant at the .05 level.
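The raw airline data are not reproduced here, so the sketch below rebuilds the ANOVA table by hand from the Descriptives summary (n, mean, SD per group). It is a minimal illustration, not SPSS's procedure; because the SDs are rounded to three decimals, SS Within and F match the table only up to rounding. It also shows where the .903 standard error in the Multiple Comparisons table comes from: the pooled MS Within applied to a difference of two means with n = 10 each.

```python
import math

# Summary statistics copied from the Descriptives table: (n, mean, standard deviation).
groups = {
    "National": (10, 9.80, 2.044),
    "Competitor 1": (10, 11.30, 2.003),
    "Competitor 2": (10, 12.60, 2.011),
}

def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) from group summaries."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total
    # Between-groups SS: how far each group mean sits from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Within-groups SS: spread of scores around their own group mean, pooled.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)
ms_within = ss_w / df_w
# Standard error of the difference between two group means (equal n = 10).
se_diff = math.sqrt(ms_within * (1 / 10 + 1 / 10))

print(f"SS between = {ss_b:.3f}, SS within = {ss_w:.3f}")
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
print(f"SE of a pairwise difference = {se_diff:.3f}")
```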
The χ² Test
One-way χ²
$\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \cdots + \frac{(O_m - E_m)^2}{E_m}$
where $O_i$ and $E_i$ are the observed and expected numbers of occurrences for each of the m (exhaustive and mutually exclusive) outcomes.
100 Analysts Rate an IPO
Strong Buy   Buy   Hold   Sell   Strong Sell
24           33    22     16     5
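$H_0: P_{SB} = P_B = P_H = P_S = P_{SS}$
$H_a$: not all five proportions are equal
Under $H_0$, each expected count is $E = 100/5 = 20$, so
$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{4^2}{20} + \frac{13^2}{20} + \frac{2^2}{20} + \frac{4^2}{20} + \frac{15^2}{20} = 21.50$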
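A small sketch of the same goodness-of-fit arithmetic. The p-value helper is an addition not on the slides: it uses the exact closed form of the chi-square upper tail, which only holds for an even number of degrees of freedom (here df = m − 1 = 4), so it avoids needing a statistics library.

```python
import math

def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_even_df(x, df):
    """Upper-tail p-value of a chi-square distribution; this closed form is valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

observed = [24, 33, 22, 16, 5]   # Strong Buy, Buy, Hold, Sell, Strong Sell
# H0: equal proportions, so each category expects 100 / 5 = 20.
expected = [sum(observed) / len(observed)] * len(observed)
chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1           # m - 1
p = chi2_sf_even_df(chi2, df)
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p:.5f}")
```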
100 Analysts Rate an IPO:
Testing Unequal Categories
Test the null hypothesis that twice as many will
offer some form of buy recommendation (either
Strong Buy or Buy) than will offer either a hold or
some form of sell recommendation (Sell or Strong
Sell).
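$H_0: P_B = 2P_H = 2P_S$ (here $P_B$ is the proportion giving Strong Buy or Buy, and $P_S$ the proportion giving Sell or Strong Sell)
$H_a$: the proportions do not follow this 2 : 1 : 1 pattern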
100 Analysts Rate an IPO:
Testing Unequal Categories in SPSS
Analyze > Nonparametric > Chi-Square
Set up the expected values for each category
Results
Analysts' Recommendations
Test Statistics
                 Analysts' Recommendations
Chi-Square(a)    1.980
df               2
Asymp. Sig.      .372
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 25.0.
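The SPSS result can be checked by hand: collapsing the ratings gives 57 buy, 22 hold and 21 sell, and $H_0$ implies proportions of 0.50, 0.25 and 0.25, i.e. expected counts of 50, 25 and 25. The sketch below is illustrative only; it relies on the exact fact that for df = 2 the chi-square upper-tail probability is $e^{-x/2}$.

```python
import math

# Collapse the five ratings into the three groups named in H0.
buy = 24 + 33    # Strong Buy + Buy
hold = 22
sell = 16 + 5    # Sell + Strong Sell
observed = [buy, hold, sell]

# H0: P_B = 2 P_H = 2 P_S  ->  proportions 0.50, 0.25, 0.25.
n = sum(observed)
expected = [0.50 * n, 0.25 * n, 0.25 * n]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p = math.exp(-chi2 / 2)
print(f"observed = {observed}, expected = {expected}")
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.3f}")
```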
Two-way χ²
SPSS: Two-Way χ² Tests
Analyze > Descriptives > Crosstabs
Click OK. This is the χ² output:
Crosstabs
Case Processing Summary
                                      Cases
                                      Valid           Missing        Total
                                      N    Percent    N   Percent    N    Percent
Industry Ties? * Bring Vioxx Back?    32   100.0%     0   .0%        32   100.0%

Count
                        Bring Vioxx Back?
                        no    yes   Total
Industry Ties?   no     14     8     22
                 yes     1     9     10
Total                   15    17     32
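For a two-way table, each expected count under independence is (row total × column total) / N. The sketch below applies that to the Vioxx crosstab and computes the uncorrected Pearson χ²; it is an illustration, not SPSS's code. The p-value uses the exact identity that for df = 1 the upper tail equals `math.erfc(sqrt(x / 2))`. Note the expected count of about 4.69 in the (yes, no) cell.

```python
import math

# Rows: Industry Ties? (no, yes); columns: Bring Vioxx Back? (no, yes).
table = [[14, 8],
         [1, 9]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))
df = (len(table) - 1) * (len(table[0]) - 1)
# For df = 1 the chi-square upper-tail probability equals erfc(sqrt(x / 2)).
p = math.erfc(math.sqrt(chi2 / 2))

print("expected:", [[round(e, 2) for e in row] for row in expected])
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.3f}")
```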
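Notice that the χ² test below is significant (p = .005), but not every cell has an expected count of at least 5: the Industry Ties = yes, Bring Vioxx Back = no cell expects 10 × 15 / 32 ≈ 4.69.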
When one or more of your cells has an expected count less than
5, report Fisher's Exact Test (in the SPSS output). Fisher’s
Exact Test has no test statistic, no critical value, and no
confidence interval. Report it as follows: “p = .007, Fisher’s
Exact Test, 2-tailed.”
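Fisher's Exact Test treats all margins as fixed, so the top-left count follows a hypergeometric distribution. A minimal stdlib sketch (the function name `fisher_exact_2x2` is hypothetical) sums the probabilities of every table at least as unlikely as the observed one, which is one common two-tailed convention and reproduces the reported p = .007 for this table.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-tailed Fisher's exact p for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1, n = a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table no more likely than the observed one; the tiny
    # tolerance keeps exact ties from being lost to float error.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

p = fisher_exact_2x2(14, 8, 1, 9)
print(f"p = {p:.3f}, Fisher's Exact Test, 2-tailed")
```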
Correlation
How do the scores on one variable change with the scores on
another variable?
Correlation Coefficients
Measures the extent to which the individual Xi and Yi scores that make up a pair occupy the same or opposite positions within their distributions.
- Positive relation: pairs tend to occupy similar relative positions in their distributions
Range from -1 to 1
1 = perfect positive relation
-1 = perfect negative relation
0 = no relation
r Computation (by hand)
1. Transform each Y score into a Z score (Zy)
2. Transform each X score into a Z score (Zx)
3. Determine correspondence between each of the paired Zs
- r indicates the average correspondence between the paired
Zs.
Population: $r = \frac{\sum Z_X Z_Y}{N}$
Sample: $r = \frac{\sum Z_X Z_Y}{N - 1}$
Strength of Relationship
Strong Correlation
(population computation)
Student     # High School A’s (X)   # College A’s (Y)   Zx      Zy
Alejandro   13                      14                   1.50    0.50
Bernardo     9                      18                   0.50    1.50
Carlos       7                      12                   0.00    0.00
Dominique    5                      10                  -0.50   -0.50
Enrique      1                       6                  -1.50   -1.50
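The three by-hand steps can be sketched directly: standardize X and Y with the population SD (both happen to have mean-centred SD 4 here), then average the products of paired z-scores. This is an illustration of the population formula above, not SPSS's routine, and it gives r = 0.80 for the five students.

```python
import statistics

# High school A's (X) and college A's (Y) for the five students in the table.
x = [13, 9, 7, 5, 1]
y = [14, 18, 12, 10, 6]

def z_scores(values):
    """Standardize with the population SD, matching the population computation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

zx, zy = z_scores(x), z_scores(y)
# Population formula: average correspondence of paired z-scores (divide by N).
r = sum(a * b for a, b in zip(zx, zy)) / len(x)
print("Zx:", zx)
print("Zy:", zy)
print(f"r = {r:.2f}")
```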
Strong Correlation: Using SPSS
Analyze > Correlate > Bivariate
Correlation Output
Correlations
Two Points of Caution with Correlations
1. Restriction of range (i.e., truncated range) problem
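To illustrate restriction of range with numbers already in this section: the subset below is hypothetical (it is not on the slides). Keeping only students with at least 5 high school A's narrows X from 1 to 13 down to 5 to 13, and r for the same kind of relationship drops from 0.80 to about 0.54.

```python
import math

def pearson_r(x, y):
    """Pearson r: cross-products of deviations over the root of the product of sums of squares."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Data from the "Strong Correlation" table (X = high school A's, Y = college A's).
x = [13, 9, 7, 5, 1]
y = [14, 18, 12, 10, 6]

# Hypothetical truncation: keep only students with X >= 5.
kept = [(a, b) for a, b in zip(x, y) if a >= 5]
x_r, y_r = zip(*kept)

print(f"full range (X = 1 to 13): r = {pearson_r(x, y):.2f}")
print(f"restricted (X = 5 to 13): r = {pearson_r(x_r, y_r):.2f}")
```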
Phi Coefficient Φ
Correlation for Categorical Data (2 X 2 Tables):
a   b
c   d
$\varphi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$
Example:
     Yes   No
     50    20
     10     4
$\varphi = \frac{(50)(4) - (20)(10)}{\sqrt{(70)(14)(60)(24)}} = 0$
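The phi formula translates directly into a few lines; the sketch below (function name is illustrative) confirms φ = 0 for the Yes/No example and computes φ = 55/195 ≈ 0.282 for the 10, 5 / 5, 8 table used in the SPSS example, which is the magnitude SPSS should report as Phi for that table.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]]: (ad - bc) over the root of the four margins' product."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom

print(phi_coefficient(50, 20, 10, 4))            # Yes/No example: ad - bc = 0
print(round(phi_coefficient(10, 5, 5, 8), 3))    # table from the SPSS example
```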
Phi Coefficient Φ (using SPSS)
10 5
5 8
Click “Statistics” and check “Phi and Cramer’s V”
Symmetric Measures
Regression
Regression: The primary purpose of regression is prediction
Types of Regression
Lines: $Y = b_0 + b_1X$
Scatterplot: Delays vs. Complaints
[Figure: scatterplot of complaints (Y) vs. delays (X)]
X   Y = 1 + 2X      Point
0   1 + 2(0) = 1    (0, 1)
1   1 + 2(1) = 3    (1, 3)
2   1 + 2(2) = 5    (2, 5)
But what if the scatterplot looked like this?
[Figure: scatterplot of complaints (Y) vs. delays (X) with scattered points]
The regression line (also called the “least squares regression line”) minimizes the sum of the squared differences between the observed and predicted values of the response variable (as given by the regression line).
[Figure: scatterplot of complaints vs. delays with the fitted least squares line; R Sq Linear = 0.475]
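The least squares slope and intercept have a closed form: $b_1 = \sum(x - \bar{x})(y - \bar{y}) / \sum(x - \bar{x})^2$ and $b_0 = \bar{y} - b_1\bar{x}$. The data behind the R Sq = 0.475 plot are not given, so this sketch only uses the three points from the Y = 1 + 2X example; because they lie exactly on a line, least squares recovers that line with zero residuals.

```python
def least_squares_line(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for Y = b0 + b1 * X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx        # slope
    b0 = my - b1 * mx     # intercept: the line passes through (mean X, mean Y)
    return b0, b1

# Points from the delays/complaints example, which fall exactly on Y = 1 + 2X.
b0, b1 = least_squares_line([0, 1, 2], [1, 3, 5])
print(f"Y = {b0:g} + {b1:g}X")
```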
Example #2: Suppose UT wants to examine the relation between alumni donations to the school and the number of football victories.
X = Number of football victories
Y = Amount of alumni donations the following year
Y = 10,000,000 + 200,000X
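Reading the equation: the intercept is the predicted donation total after a winless season, and each additional victory adds $200,000 to the prediction. The win counts in this sketch (0, 8, 12) are hypothetical inputs chosen only to show the arithmetic.

```python
def predicted_donations(victories):
    """Y = 10,000,000 + 200,000 X, with X = number of football victories."""
    return 10_000_000 + 200_000 * victories

# Hypothetical seasons, for illustration only.
for wins in (0, 8, 12):
    print(wins, f"${predicted_donations(wins):,}")
```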
Linear Regression Assumptions
1. Linearity
- Linear relationship between X and Y
2. Independence of Observations
- Residuals across Xs are not correlated
3. Normality
- The distribution of Y at each Xi is normal
- The errors have a normal distribution
Example: Simple Linear Regression
Houston Astros Payroll
Identify a regression equation that predicts the median salary for a
Houston Astros baseball player based on knowledge of the total team
payroll
You can access this data file on the website as well (“Houston Astros
salary data”)
1. Create X–Y Scatterplot
Median Salary – Total Payroll Scatterplot
[Figure: scatterplot of Median Salary (Y) vs. Total Payroll (X)]
2. Visual check for outliers (remove if necessary)
Single click on a data point (it will enlarge and change color)
3. Fit Line at Total (Linear)
4. Conduct Regression Analysis
Put Independent and Dependent variables in the right boxes
Click Statistics
Click Plots
Click Save
By checking these boxes, you will create extra columns on your data
file. You will get a Predicted Values (“PRE_1”) column and a Residual
Values (“RES_1”) column.
5. Examine Regression Output
Model Summary(b)
                                                                                Change Statistics
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .754(a)   .569       .540                220.53978                    .569              19.790     1     15    .000            2.346
a. Predictors: (Constant), Total Payroll
b. Dependent Variable: Median Salary
ANOVA(b)
Model              Sum of Squares   df   Mean Square   F        Sig.
1   Regression     962530.2          1   962530.159    19.790   .000(a)
    Residual       729566.9         15   48637.793
    Total          1692097          16
a. Predictors: (Constant), Total Payroll
b. Dependent Variable: Median Salary
Coefficients(a)
                      Unstandardized Coefficients   Standardized Coefficients
Model                 B          Std. Error         Beta                        t       Sig.
1   (Constant)        110.736    111.951                                        .989    .338
    Total Payroll     .012       .003               .754                        4.449   .000
a. Dependent Variable: Median Salary
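Most of the Model Summary can be recomputed from the ANOVA table alone: R² = SS Regression / SS Total, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), F = MS Regression / MS Residual, and the Std. Error of the Estimate is √MS Residual. The sketch below is a consistency check under those standard definitions, with n = 17 taken from the total df of 16.

```python
import math

# Values from the regression ANOVA table.
ss_regression = 962530.159
ss_residual = 729566.9
n = 17      # total df (16) + 1
k = 1       # one predictor: Total Payroll

df_regression, df_residual = k, n - k - 1
ss_total = ss_regression + ss_residual
ms_residual = ss_residual / df_residual

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
f_stat = (ss_regression / df_regression) / ms_residual
see = math.sqrt(ms_residual)    # Std. Error of the Estimate

print(f"R = {math.sqrt(r_squared):.3f}, R Square = {r_squared:.3f}, "
      f"Adjusted R Square = {adj_r_squared:.3f}")
print(f"F({df_regression}, {df_residual}) = {f_stat:.3f}, "
      f"Std. Error of the Estimate = {see:.3f}")
```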
6. Is the model statistically significant?
Yes: the regression ANOVA gives F(1, 15) = 19.790 with Sig. = .000 (p < .001).
7. Identify the equation for the simple linear model (i.e., the regression line)
From the Coefficients table: Predicted Median Salary = 110.736 + 0.012 × Total Payroll
8. Check the other 3 linear regression assumptions
8a. Independence
- Durbin-Watson = 2.346 (OK, because it is between 1.5 and 2.5)
8b. Normality
- Histogram of residuals (is it normal?)
- Normal probability plot (are the points near the diagonal?)
[Figure: Histogram of Regression Standardized Residual (y-axis: Frequency); Mean = -6.94E-17, Std. Dev. = 0.968, N = 17]
[Figure: Normal P-P Plot of Regression Standardized Residual; Expected Cum Prob vs. Observed Cum Prob]
8c. Constant variance
- Is there an absence of a funnel shape in the scatterplot of X vs. residuals?
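The Durbin-Watson statistic in step 8a is the sum of squared differences between successive residuals divided by the sum of squared residuals, so it depends on the residuals being in observation order. The sketch uses a hypothetical residual series (not the Astros residuals) and encodes the slides' 1.5 to 2.5 rule of thumb; the helper names are illustrative.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def independence_ok(dw, low=1.5, high=2.5):
    """Rule of thumb from step 8a: a D-W value between 1.5 and 2.5 is acceptable."""
    return low <= dw <= high

# Hypothetical residuals in observation order, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]   # negative autocorrelation pushes D-W toward 4
print(durbin_watson(alternating))      # 3.0
print(independence_ok(2.346))          # the Astros model's reported D-W
```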
Here’s a look at your data file ordered from lowest to highest payroll,
where some of the columns are rearranged to make it more readable:
Test the Constant Variance assumption by looking at the X vs. Residuals scatterplot. Check for a funnel pattern.
[Figure: scatterplot of Unstandardized Residual vs. Total Payroll]
9. Search the output for “Casewise Diagnostics,” which describe outliers
Case Number   Std. Residual   Median Salary   Predicted Value   Residual
3              1.060           500.00         266.1707           233.82928
8             -1.320           185.00         476.0631          -291.06310
14             2.229          1300.00         808.3513           491.64868
15            -1.558           500.00         843.7011          -343.70113
16             1.218          1200.00         931.4056           268.59437
17            -1.051           750.00         981.7387          -231.73868
a. Dependent Variable: Median Salary
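Each "Std. Residual" above is the raw residual divided by the Std. Error of the Estimate (220.53978 in the Model Summary). The short check below reproduces the column and picks out case 14 as the largest standardized residual; it is a verification sketch, not how SPSS selects which cases to list.

```python
# Residuals from the case diagnostics table, keyed by case number.
residuals = {3: 233.82928, 8: -291.06310, 14: 491.64868,
             15: -343.70113, 16: 268.59437, 17: -231.73868}
see = 220.53978   # Std. Error of the Estimate from the Model Summary

# Standardized residual = raw residual / standard error of the estimate.
std_residuals = {case: e / see for case, e in residuals.items()}
for case, z in std_residuals.items():
    print(case, f"{z:.3f}")

largest = max(std_residuals, key=lambda c: abs(std_residuals[c]))
print("largest |std. residual|: case", largest)
```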
Don’t Trust Your Model TOO Much…
Question:
The Houston Astros payroll in 2005 = $76,779,000. What does the regression line predict the median salary will be?
Answer:
Predicted Median Salary =
$110,736 + (.012)(76,779,000) = $1,032,084
Actual: $500,000
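The same prediction in code, plus the size of the miss. One assumption is labeled here: both SPSS variables are taken to be in thousands of dollars, which is how the slide's $110,736 intercept and the 500.00 / 1300.00 salaries in the diagnostics table read; under that assumption the arithmetic matches the slide.

```python
# Coefficients from the Coefficients table.
# Assumption: salary and payroll are both measured in thousands of dollars.
b0, b1 = 110.736, 0.012

def predict_median_salary(total_payroll_thousands):
    """Regression line: predicted median salary (in $1000s) for a total payroll (in $1000s)."""
    return b0 + b1 * total_payroll_thousands

predicted_2005 = predict_median_salary(76_779)   # 2005 payroll: $76,779,000
residual_2005 = 500 - predicted_2005             # actual 2005 median: $500,000
print(f"predicted = ${predicted_2005 * 1000:,.0f}")
print(f"residual = {residual_2005 * 1000:,.0f} dollars")
```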
Question:
Why was the model so far off?
1988 Houston Astros (Total payroll = $13,455,000; Median = $500,000)
2005 Houston Astros (Total payroll = $76,779,000; Median = $500,000)