Section 11
GrowingKnowing.com © 2014 2
Correlation and Regression
Correlation is the relationship between two variables:
If a salesperson doubles the number of sales calls, will that
affect their sales commissions?
If a student attends half the classes during a particular course,
will that affect their final grade?
Correlation and Regression
Simple Regression has two variables:
Dependent variable: (y)
Independent variable: (x)
The Variables:
To identify the dependent versus the independent variable,
ask these questions:
Does variable 1 affect or influence variable 2?
Or does variable 2 affect or influence variable 1?
The variable being influenced is the dependent variable (y).
Example 1:
Total beer sales at a Raptors game and the attendance at
the ACC for that game.
Which are the dependent and the independent variables?
Dependent: beer sales
Independent: attendance
The Variables:
Example 2:
The sales volumes for a particular product and the
amount spent on advertising for that product.
Dependent variable: sales volume
Independent variable: advertising budget
HINT: in business,
the dependent variable is almost always money:
sales, expenses, profit, price, etc.
Business cares more about money than anything or
anyone else!
Coefficient of Correlation: r
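The coefficient of correlation (r) gives you the strength and direction of
the relationship between the variables. It ranges from -1 to +1:
+1 is a perfect positive relationship,
-1 is a perfect negative relationship,
0 is no linear relationship.
Positive indicates the regression line slopes upward;
negative indicates it slopes downward.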
Weak or Strong?
The more scattered the data, the closer the correlation
coefficient is to zero.
The closer the correlation value is to zero,
the weaker the relationship.
The more concentrated the data around the regression
line (the graph line), the closer the correlation
coefficient is to either +1 or -1.
The closer the correlation value is to +1 or -1,
the stronger the relationship.
Strength of the Relationship:
1 (or -1): Perfect
.99 to .8: Strong
.79 to .50: Moderate
.49 to .10: Weak
0: No relationship
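The bands above can be sketched as a small lookup. This is only an illustration: the function name `describe_strength` is hypothetical, the sign of r is ignored (the bands describe strength, not direction), and treating any value below .10 as "no relationship" is an assumption, since the slide leaves the range between .10 and 0 unlabelled.

```python
def describe_strength(r: float) -> str:
    """Label the size of r using the bands on the slide (sign is ignored)."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must lie between -1 and +1")
    if size == 1:
        return "perfect"
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.1:
        return "weak"
    # Assumption: the slide does not label values between .10 and 0.
    return "no relationship"


# The correlation values that appear in this section's exercises.
for r in (0.94, 0.614, 0.70, 0.867, 0.543):
    print(r, describe_strength(r))
```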
Coefficient of Determination: r²
Coefficient of Determination
= (coefficient of correlation)²
Coefficient of Correlation (r):
Defines the strength of the relationship between (x) and (y).
Coefficient of Determination (r²):
Indicates how much of the variation in (y) is explained by (x).
Class Exercise, Part 1:
What is the relationship between the effort in studying
and course grades?
Computer Check
To ensure your computer is set up properly:
Open a spreadsheet,
Click on “Data” in the menu,
Does “Data Analysis” appear on the far right of the
menu bar?
ToolPak update (if “Data Analysis” does not appear)
Click on “File” in the top left of the menu bar,
Click on “Options” at the bottom left,
Click on “Add-ins”,
Click on “Analysis ToolPak” (not the VBA version),
Click on “Go”,
Check the box next to “Analysis ToolPak”,
Click on “OK”,
Return to the Excel menu and click on “Data”.
Does “Data Analysis” now appear?
Class Exercise, Part 1:
What is the relationship between the effort in studying
and course grades?
Effort Level: 1,2,3,4,5 (x variable)
Grades: 9,11,15,14,20 (y variable)
Input the data in two columns, with a heading on each,
Click on “Data” at the top of the screen,
Click on “Data Analysis” at the top right,
Scroll down the menu and click on “Regression”, then “OK”,
Click on the box next to “Input Y Range” and type in the cell range for
the Grades column, including the heading (or select the cells on the
spreadsheet),
Repeat in the “Input X Range” box for the Effort Level column,
Check “Labels”, because the ranges include the headings,
Click on “OK”.
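As a cross-check on the Excel steps above, the same fit can be sketched in plain Python. This is a minimal sketch using the usual least-squares formulas, not Excel's own routine; the function name `fit_line` and the variable names are illustrative.

```python
from math import sqrt

effort = [1, 2, 3, 4, 5]       # x variable: effort level
grades = [9, 11, 15, 14, 20]   # y variable: grades


def fit_line(x, y):
    """Return (intercept a, slope b, correlation r) for y-hat = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / sqrt(sxx * syy)
    return a, b, r


a, b, r = fit_line(effort, grades)
print(f"intercept={a:.4f} slope={b:.4f} r={r:.4f} r^2={r * r:.4f}")
# intercept=6.3000 slope=2.5000 r=0.9396 r^2=0.8828
```

The r and r² values should agree with the "Multiple R" and "R Square" lines of the Excel output for this exercise.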
Excel Output:
** Note: focus on items highlighted in red
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.939557535
R Square             0.882768362
Adjusted R Square    0.843691149
Standard Error       1.663329993
Observations         5
ANOVA
              df    SS      MS             F              Significance F
Regression    1     62.5    62.5           22.59036145    0.01767543
Residual      3     8.3     2.766666667
Total         4     70.8
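The ANOVA rows can be reproduced by hand, which also shows where R Square comes from (regression SS divided by total SS). The sketch below assumes the standard sums-of-squares decomposition for a one-variable regression; variable names are illustrative, and Significance F is left out because it needs an F-distribution table or a statistics library.

```python
effort = [1, 2, 3, 4, 5]
grades = [9, 11, 15, 14, 20]

n = len(effort)
mean_x = sum(effort) / n
mean_y = sum(grades) / n

# Least-squares slope and intercept.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(effort, grades))
sxx = sum((x - mean_x) ** 2 for x in effort)
b = sxy / sxx
a = mean_y - b * mean_x
predicted = [a + b * x for x in effort]

# Split the total variation in grades into explained and unexplained parts.
ss_total = sum((y - mean_y) ** 2 for y in grades)
ss_residual = sum((y - p) ** 2 for y, p in zip(grades, predicted))
ss_regression = ss_total - ss_residual

df_regression, df_residual = 1, n - 2
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
r_square = ss_regression / ss_total

print(f"SS regression={ss_regression:.1f} residual={ss_residual:.1f} total={ss_total:.1f}")
print(f"F={f_stat:.4f} R Square={r_square:.4f}")
# SS regression=62.5 residual=8.3 total=70.8
# F=22.5904 R Square=0.8828
```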
Excel Output
Multiple R 0.93955754
R Square 0.88276836
Conclusions:
The coefficient of correlation is .9396, or .94.
This confirms that there is a relationship between study
effort and grades.
The value is very close to 1, so it is a very strong relationship.
Class Exercise, Part 2:
The Seneca Student Federation is concerned about the
cost of student textbooks. They believe that there is a
relationship between the number of pages in the text
and the selling price of the book. A sample of 8
textbooks currently on sale at the bookstore was
selected.
Textbook Sample:
Book                  # pages    $ price
Intro to History      500        84
Basic Algebra         700        75
Intro to Psych        800        99
Intro to Sociology    600        72
Bus. Mgmt.            400        69
Intro to Biology      500        81
Fun with Stats        600        63
Intro to Nursing      800        93
Prepare an Excel Regression Output.
Excel Output:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.613878889
R Square 0.376847291
Adjusted R Square 0.272988506
Standard Error 10.41290408
Observations 8
Conclusions:
At .614, the coefficient of correlation indicates that
there is a moderate relationship between the number
of pages and the price of a textbook.
Typical test questions.
Quiz or exam questions will show information that is
similar to an Excel Regression Output.
You will then be asked to explain or to draw
conclusions from the given information.
Class Exercise
Log in to GrowingKnowing.com,
Go to Section 11, Correlation and Regression,
Complete the practice questions up to Level 2.
Class Exercise:
The H.R. department of a large corporation wants to
determine if there is a relationship between the annual
bonus employees received and their years of
experience. Identify the variables:
Experience (yrs), independent (x): 1 2 3 4 5 6
Bonus ($ thousands), dependent (y): 6 1 9 5 17 12
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.700696
R Square 0.490974
Adjusted R Square 0.363718
Standard Error 4.502909
Observations 6
Coefficients
Intercept 0.933333
X Variable 1 2.114286
Analysis:
Coefficient of Correlation:
At .70, a moderate relationship exists between the bonus
amount and years of experience.
Coefficient of Determination:
At .49 (or 49%), years of experience has a weak effect on
the bonus.
About 49% of the variation in bonus amounts is explained by
years of experience. Other variables
(individual performance, the corporation's financial
status, the state of the economy, etc.) account for the
remaining 51%.
Correlation vs. Regression
Correlation Analysis:
Determines whether or not a relationship exists
between two variables.
Regression Analysis:
Determines the value of one variable, given the
value of the other.
Regression Analysis:
An equation is used to calculate what could happen:
ŷ = a + bx
where ŷ is the predicted value of (y), a is the intercept, and b is the slope.
The Equation and the Slope:
If the slope (b) is positive, the regression equation is:
ŷ = a + bx
If the slope (b) is negative, the regression equation is written as:
ŷ = a – bx
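The slides read a and b directly from the Excel output. For reference, the standard least-squares formulas behind those two numbers are sketched below; the notation (x̄ and ȳ for the sample means, n for the number of observations) is added here and does not appear on the slides. In these formulas b carries its own sign, so a negative b gives the downward-sloping form of the equation.

```latex
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x},
\qquad
\hat{y} = a + b x
\]
```

For the bonus exercise (experience 1 to 6 years; bonuses 6, 1, 9, 5, 17, 12), this gives b = 37 / 17.5 ≈ 2.114 and a = 8.333 – 2.114(3.5) ≈ 0.933, matching the Coefficients in the Excel output.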
Intercept (a):
Where the regression line
intersects the (y) axis.
Slope (b):
Indicates the steepness and direction of the
regression line: the change in ŷ for each one-unit increase in (x).
Regression Statistics
Multiple R 0.700696
R Square 0.490974
Adjusted R Square 0.363718
Standard Error 4.502909
Observations 6
Coefficients
Intercept 0.933333 ← Intercept
X Variable 1 2.114286 ← Slope
The Regression Equation: Analysis
Coefficients
Intercept 0.9333
x Variable 1 2.114
Slope:
Is 2.114, a positive value.
Result: the coefficient of correlation is positive.
Regression equation: ŷ = 0.9333 + 2.114x
Class Exercise:
It is believed that the monthly maintenance cost for a
particular model of automobile is related to its age.
Age (yrs)    Monthly Cost ($)
2            72
3            99
1            65
7            138
6            170
8            140
4            114
1            83
2            101
5            110
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.867405237
R Square 0.752391845
Standard Error 17.32128175
Observations 10
Coefficients
Intercept 65.0456942
X variable 1 11.32161687
Interpretations:
What is the regression equation?
ŷ = 65.05 + 11.32(x)
If a vehicle is 11 years old, what would be the anticipated
monthly maintenance cost?
ŷ = 65.05 + 11.32(11) = $189.57
How would you interpret the slope?
Positive relationship between maintenance costs and the age of a
vehicle. Monthly maintenance costs are expected to increase by
$11.32 for each additional year of vehicle age.
What does the coefficient of correlation tell you?
At .867, there is a strong correlation between the age of a vehicle and its
maintenance costs.
What does the coefficient of determination tell you?
At 75.2%, the age of a vehicle has a moderately strong impact on
its maintenance costs: about 75% of the variation in monthly
maintenance costs is explained by the age of the vehicle.
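The prediction step can be sketched as a one-line function. The coefficients are the rounded values from the Excel output; the function and constant names are illustrative. Note that the sample only covers vehicles aged 1 to 8 years, so a prediction for an 11-year-old vehicle extends the line beyond the observed data and should be read with that in mind.

```python
INTERCEPT = 65.05  # a, rounded from the Excel output
SLOPE = 11.32      # b, rounded from the Excel output


def predicted_monthly_cost(age_years: float) -> float:
    """Apply the regression equation y-hat = a + b*x to a vehicle age."""
    return INTERCEPT + SLOPE * age_years


print(f"${predicted_monthly_cost(11):.2f}")  # $189.57
```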
Class Exercise:
Return to Section 11 in growingknowing.com,
Attempt the practice questions for the section!
Class Exercise:
A college surveyed its graduates regarding the number
of statistics classes they missed and their starting
salaries. What would be the dependent and
independent variables?
Dependent (y): salaries
Independent (x): classes missed
Data:
Classes missed: 1, 2, 3, 4, 5, 6
Salary (in thousands): 30, 28, 32, 25, 18, 24
Prepare a regression output in Excel.
Classes missed and salary
Multiple R 0.716738113
R Square 0.513713523
Standard Error 3.895663034
Coefficients
Intercept 32.86666667
X variable 1 -1.914285714
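Excel's "Multiple R" is never negative, so the direction of the relationship has to be read from the sign of the slope. A short sketch, computing the signed coefficient of correlation directly from the data, makes this visible; the function name `pearson_r` is illustrative.

```python
from math import sqrt

classes_missed = [1, 2, 3, 4, 5, 6]
salary = [30, 28, 32, 25, 18, 24]  # in thousands


def pearson_r(x, y):
    """Signed coefficient of correlation between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(classes_missed, salary)
print(f"r = {r:.4f}, Multiple R = {abs(r):.4f}")
# r = -0.7167, Multiple R = 0.7167
```

The negative r agrees with the negative slope (-1.914): each additional class missed goes with a lower starting salary in this sample.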
Exercise 4:
Determine the relationship between average mortgage
rates and the number of new homes constructed
(housing starts). What would be the dependent and
independent variables?
Dependent: Housing Starts
Independent: Mortgage Rates
Housing Data:
Rate (%)    Housing Starts (000)
8.5         115
7.8         111
7.6         185
7.5         201
8           206
8.4         167
8.8         155
8.9         117
8.5         133
8           150
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.542950618
R Square 0.294795374
Adjusted R Square 0.206644795
Standard Error 31.47717427
Observations 10
ANOVA
              df    SS        MS          F           Significance F
Regression    1     3313.5    3313.5      3.344225    0.104842
Residual      8     7926.5    990.8125
Total         9     11240
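The output lists Adjusted R Square and Standard Error without saying where they come from. The sketch below shows how both follow from the ANOVA values, assuming the standard formulas for a regression with one x variable; the variable names are illustrative.

```python
from math import sqrt

n = 10                   # Observations
ss_regression = 3313.5   # from the ANOVA table
ss_residual = 7926.5     # from the ANOVA table
ss_total = ss_regression + ss_residual

r_square = ss_regression / ss_total
multiple_r = sqrt(r_square)
# Adjusted R Square penalizes R Square for the small sample (one x variable).
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)
# Standard Error is the square root of the residual mean square.
standard_error = sqrt(ss_residual / (n - 2))

print(f"Multiple R        {multiple_r:.6f}")
print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adjusted_r_square:.6f}")
print(f"Standard Error    {standard_error:.4f}")
```

The four printed values should agree with the Regression Statistics block of the Excel output for the housing data.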
Analysis:
What is the regression equation?
ŷ = 475.17 – 39.17(x)
If the average mortgage rate was 7.15%, what would be the number of
housing starts?
ŷ = 475.17 – 39.17(7.15) = 195.1, or 195,100
How would you interpret the slope?
Negative relationship. With every one-percentage-point increase in the
mortgage rate, housing starts are projected to decrease by 39,170.
How do you interpret the coefficient of correlation?
At .543, the relationship between mortgage rates and housing starts is
moderate (and negative, since the slope is negative).
How do you interpret the coefficient of determination?
At 29.5%, changes in mortgage rates have a weak impact on
housing starts: 29.5% of the variation in housing starts
is explained by mortgage rates.
Homework
Chapter 11: Correlation and Regression
Read the text,
View the video
Complete the practice questions (levels 1-4),
Attempt the additional exercises in the ppt.
Quiz 3
Will be held on Monday, April 10th,
Results could be worth 30% of your final grade,
Primary focus:
Proportion (Sections 8, 9, 10),
Correlation & Regression (Section 11)
Structure and protocol for Quiz 2 will be repeated.
Final Exam:
Friday, April 21st at 8:30 am (sharp)
Rm. C3030
2 hour time limit,
Worth 30% of your final grade,
Covers material from the entire course.