session 4

© All Rights Reserved


Correlation involves calculating

an index to measure the nature

of the relationship between

variables.

With regression, an equation is

developed to predict the values

of a dependent variable.

Pearson Product Moment Coefficient r

The coefficient r varies over a range of +1 through 0 to -1.

It symbolizes the coefficient's estimate of linear association based on the sampling data. The coefficient $\rho$ represents the population correlation.

Correlation

coefficients reveal the

magnitude and

direction of

relationships.

Illustration of Direction:

Positive Correlation

Family income vs. household food

expenditures

Negative Correlation

Prices of products and services in

relation to their scarcity or

availability.

SCATTERPLOTS

They are essential for

understanding the relationship

between variables.

They provide a means for visual

inspection of data that a list of

values for two variables cannot.

Correlation Analysis

Used to measure and interpret the

strength of association (linear

relationship) between two numerical

variables

Only concerned with strength of the

relationship

No causal effect is implied


Scatter Diagram

A plot of paired data, used to determine or show a relationship between two variables

Paired Data

When there

appears to be a

linear relationship

between x and y:

attempt to fit a line to the

scatter diagram.

Linear Correlation

The general trend of the points seems to follow a straight line segment.

[Figures: scatter diagrams illustrating Linear Correlation, Non-Linear Correlation, No Linear Correlation, High Linear Correlation, Moderate Linear Correlation, and Perfect Linear Correlation]

Correlation Coefficient, r

A measure of the strength of linear association between two variables

Also called the Pearson product-moment correlation coefficient

Positive Linear Correlation

High values of x are paired with high values of y, and low values of x are paired with low values of y.

Negative Linear Correlation

High values of x are paired with low values of y, and low values of x are paired with high values of y.

Little or No Linear Correlation

Both high and low values of x are sometimes paired with high values of y and sometimes with low values of y.

[Figures: scatter diagrams of y against x showing Positive Correlation, Negative Correlation, and Little or No Linear Correlation]

What type of

correlation is

expected?

Height and weight

IQ and height

Linear correlation coefficient

$-1 \le r \le +1$

Table of Interpretation

Pearson r      Qualitative Interpretation
1.00           Perfect Correlation
0.91 - 0.99    Very High Correlation
0.71 - 0.90    High Correlation
0.41 - 0.70    Marked Correlation
0.21 - 0.40    Slight/Low Correlation
0 - 0.20       Negligible Correlation
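The interpretation table can be applied mechanically. The sketch below is one possible lookup, not part of the original slides. It assumes the bands apply to the absolute value of r (so that negative correlations are labelled by their strength) and that values falling between two printed bands, such as 0.905, belong to the higher band. The function name `interpret_pearson_r` is hypothetical.

```python
def interpret_pearson_r(r):
    """Return the qualitative label from the interpretation table for a given r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    # Assumption: the table is applied to the magnitude of r.
    magnitude = abs(r)
    if magnitude == 1.0:
        return "Perfect Correlation"
    if magnitude > 0.90:
        return "Very High Correlation"
    if magnitude > 0.70:
        return "High Correlation"
    if magnitude > 0.40:
        return "Marked Correlation"
    if magnitude > 0.20:
        return "Slight/Low Correlation"
    return "Negligible Correlation"


print(interpret_pearson_r(0.98))   # Very High Correlation
print(interpret_pearson_r(-0.50))  # Marked Correlation
```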

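If r = 0, the scatter diagram might look like:

[Figure: scatter diagram of y against x for r = 0]

If r = +1, all points lie on the least squares line.

If r = -1, all points lie on the least squares line.

[Figures: scatter diagrams of y against x for r = +1, r = -1, $-1 < r < 0$, and $0 < r < 1$]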

Find the Correlation Coefficient

x (Miles)   y (Min.)   x²     y²      xy
2           6          4      36      12
5           9          25     81      45
12          23         144    529     276
7           18         49     324     126
7           15         49     225     105
15          28         225    784     420
10          19         100    361     190

$\sum x = 58$, $\sum y = 118$, $\sum x^2 = 596$, $\sum y^2 = 2340$, $\sum xy = 1174$

The Correlation Coefficient: $r = 0.9753643 \approx 0.98$
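The slides give the column sums and the final value of r but not the intermediate step. A minimal sketch of the computation follows, assuming the usual computational formula $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$, which is not printed in the slides. The function name `pearson_r` is illustrative.

```python
import math

# Paired data from the worked example: x in miles, y in minutes.
x = [2, 5, 12, 7, 7, 15, 10]
y = [6, 9, 23, 18, 15, 28, 19]


def pearson_r(xs, ys):
    """Pearson correlation coefficient from the five column sums."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(v * v for v in xs)
    sum_y2 = sum(v * v for v in ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 2))  # 0.98
```

With the sums from the table, the numerator is 7(1174) - 58(118) = 1374 and the two bracketed terms are 808 and 2456.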

Warning

The correlation coefficient measures the strength of the relationship between two variables.

Just because two variables are related does not imply that there is a cause-and-effect relationship between them.

Testing the Correlation Coefficient

Determine whether the sample correlation coefficient, r, is far enough from zero to indicate correlation in the population.

The Population Correlation Coefficient

The population correlation coefficient is denoted by $\rho$ (rho).

Hypotheses to Test Rho

Assume that both variables x and y are normally distributed.

To test if the (x, y) values are correlated in the population, set up the null hypothesis that they are not correlated:

$H_0: \rho = 0$
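The slides do not show the test statistic for this hypothesis. One common choice, assumed here rather than taken from the source, is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with n - 2 degrees of freedom. The sketch below applies it to the miles and minutes example (r = 0.9753643, n = 7) and compares it with 2.5706, the two-tailed critical value of t at the .05 level for 5 degrees of freedom.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.9753643, 7          # sample correlation and sample size from the example
t = correlation_t_statistic(r, n)
critical = 2.5706            # two-tailed critical t, alpha = .05, df = 5

print(round(t, 2))
print("reject H0" if abs(t) > critical else "do not reject H0")
```

Because the statistic is far above the critical value, this sketch would reject the hypothesis of no correlation for that sample.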

Spearman Rank Correlation

A measure of Rank Correlation

The Spearman Correlation

Spearman's correlation is designed to measure the relationship between variables measured on an ordinal scale of measurement.

It uses ranks as opposed to actual values.

Assumptions

The data is a bivariate random variable.

Advantages

1. Less sensitive to the effect of outliers

- Can be used to reduce the weight of outliers (large distances get treated as a one-rank difference)

2. When the actual values are problematic, it is advisable to study the rankings rather than the actual values.

Disadvantages

1. Calculations may become tedious. Additionally, ties are important and must be factored into computation.

Steps in Calculating Spearman's Rho

1. Convert the observed values to ranks

(accounting for ties)

2. Find the difference between the ranks, square

them and sum the squared differences.

3. Set up hypothesis, carry out test and conclude

based on findings.

4. If the null is rejected then calculate the

Spearman correlation coefficient to measure

the strength of the relationship between the

variables.

Hypothesis: I

A. (Two-Tailed)

$H_0$: There is no correlation between the Xs and the Ys.
(there is mutual independence between the Xs and the Ys)

$H_1$: There is a correlation between the Xs and the Ys.
(there is mutual dependence between the Xs and the Ys)

Spearman's Rho

Assumes values between -1 and +1

-1: perfect negative correlation; 0: no correlation; +1: perfect positive correlation

Example 1

The ICC rankings for One Day International (ODI) and

Test matches for nine teams are shown below.

Team Test Rank ODI Rank

Australia 1 1

India 2 3

South Africa 3 2

Sri Lanka 4 7

England 5 6

Pakistan 6 4

New Zealand 7 5

West Indies 8 8

Bangladesh 9 9

Example 1

Team Test Rank ODI Rank d d2

Australia 1 1 0 0

India 2 3 1 1

South Africa 3 2 1 1

Sri Lanka 4 7 3 9

England 5 6 1 1

Pakistan 6 4 2 4

New Zealand 7 5 2 4

West Indies 8 8 0 0

Bangladesh 9 9 0 0

Total 20

Answer:

$T = \sum d_i^2 = 20$

$\rho = 1 - \frac{6T}{n(n^2 - 1)} = 1 - \frac{6(20)}{9(81 - 1)} = 0.8333$.
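The arithmetic for Example 1 can be checked with a few lines of code. This is a minimal sketch that assumes the no-ties formula $\rho = 1 - \frac{6\sum d_i^2}{n(n^2-1)}$, which is valid here because every rank occurs once. The function name is illustrative.

```python
# ICC ranks from Example 1, in the order the teams are listed.
test_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9]
odi_rank = [1, 3, 2, 7, 6, 4, 5, 8, 9]


def spearman_rho_no_ties(rank_x, rank_y):
    """Spearman's rho from rank differences; only valid when there are no ties."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rho = spearman_rho_no_ties(test_rank, odi_rank)
print(round(rho, 4))  # 0.8333
```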

Example 2

A composite rating is given by executives to

each college graduate joining a plastic

manufacturing firm. The executive ratings

represent the future potential of the college

graduate. The graduates then enter an in-plant

training programme and are given another

composite rating. The executive ratings and the

in-plant ratings are as follows:

Graduate   Executive rating   In-plant rating
A          8                  4
B          10                 4
C          9                  4
D          4                  3
E          12                 6
F          11                 9
G          11                 9
H          7                  6
I          8                  6
J          13                 9
K          10                 5
L          12                 9

A) Test whether there is a positive correlation between the variables

B) Find the rank correlation coefficient if the null is rejected
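Example 2 contains tied ratings, so the simple rank-difference formula is only approximate. One way to handle this, offered as a sketch and not as the method the slides intend, is to give tied values the average of the ranks they occupy and then compute the Pearson correlation of the two rank lists. The helper names `average_ranks` and `pearson_r` are hypothetical.

```python
import math

# (executive rating, in-plant rating) for graduates A to L, from Example 2.
ratings = {
    "A": (8, 4), "B": (10, 4), "C": (9, 4), "D": (4, 3),
    "E": (12, 6), "F": (11, 9), "G": (11, 9), "H": (7, 6),
    "I": (8, 6), "J": (13, 9), "K": (10, 5), "L": (12, 9),
}


def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


executive = [pair[0] for pair in ratings.values()]
in_plant = [pair[1] for pair in ratings.values()]
rho = pearson_r(average_ranks(executive), average_ranks(in_plant))
print(round(rho, 3))
```

The result is positive, which agrees in direction with the alternative hypothesis in part A. Whether it is significant still has to be decided with the test in part A.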

Regression

Analysis

Purpose of Regression

Analysis

Regression analysis is used primarily to establish a linear relationship between variables and to provide predictions

Predicts the value of a dependent (response) variable based on the value of at least one independent (explanatory) variable

Explains the effect of the independent variables on the dependent variable


Types of Regression

Models

[Figure panels: Positive Linear Relationship; Relationship NOT Linear]

Simple Linear

Regression

Relationship between variables

is described by a linear function

This function relates how much

change in the dependent variable

is associated with a unit increase

(or decrease) in the independent

variable.


Population Linear Regression:

Simple Linear Regression Model

Population regression line is a straight line that describes the dependence of the average value of one variable on the other

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

where $Y_i$ is the dependent (response) variable, $\beta_0$ is the population Y intercept, $\beta_1$ is the population slope coefficient, $X_i$ is the independent (explanatory) variable, and $\varepsilon_i$ is the random error.

Population regression line: $\mu_{Y|X} = \beta_0 + \beta_1 X_i$

$\varepsilon_i$ is the random error term for the ith observation, where the $\varepsilon_i$'s are independently normally distributed with mean 0 and variance $\sigma^2$ for i = 1, ..., n; n is the number of observations

Random Error Term

It represents the effect of other factors, apart from X, which are omitted from the model but do affect the response variable to some extent

It also accounts for errors of observation or measurement in recording the response variable

Random Error Term

1. The error terms are independent of one another;

2. The error terms are normally distributed;

3. The error terms all have a mean of 0; and

4. The error terms have constant variance, $\sigma^2$.

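Population Linear Regression: Simple Linear Regression Model

[Figure: the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (conditional mean) plotted against X, with intercept $\beta_0$ and slope $\beta_1$; an observed value $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ lies off the line by the random error $\varepsilon_i$]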

Interpretation of the

Slope and the Intercept

$\beta_0 = E(Y \mid X = 0)$ is the average value of Y when the value of X is zero.

$\beta_1 = \frac{\Delta E(Y \mid X)}{\Delta X}$ measures the change in the average value of Y as a result of a one-unit change in X.

Steps in Doing a Simple Linear

Regression Analysis

1. Obtain the equation that best fits the data;

2. Evaluate the equation to determine the strength

of the relationship for estimation and prediction;

3. Determine if the assumptions on the error terms

are satisfied and if model fits the data adequately;

4. Use the equation for prediction and description.


Sample Linear

Regression

Sample regression line provides an estimate of the population regression line as well as a predicted value of Y

$Y_i = b_0 + b_1 X_i + e_i$

where $b_0$ is the sample Y intercept, $b_1$ is the sample slope coefficient, and $e_i$ is the residual.

$\hat{Y} = b_0 + b_1 X$ is the sample regression line (fitted regression line, predicted value).

Estimation using Method of Least

Squares

$\beta_0$ and $\beta_1$ are obtained by minimizing the sum of the squared errors

$\sum_{i=1}^{n} (Y_i - \mu_{Y|X_i})^2 = \sum_{i=1}^{n} \varepsilon_i^2$

$b_0$ provides an estimate of $\beta_0$

$b_1$ provides an estimate of $\beta_1$

The values of $b_0$ and $b_1$ also minimize the sum of the squared residuals.

$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$
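The slides state what is minimized but not the resulting formulas. For reference, setting the partial derivatives of the sum of squared residuals with respect to $b_0$ and $b_1$ to zero gives the standard closed-form solution below. It is a textbook result, not something printed in the slides, and $\bar{X}$ and $\bar{Y}$ denote the sample means.

```latex
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
```

The second expression shows that the fitted line always passes through the point $(\bar{X}, \bar{Y})$.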


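Sample Linear Regression

[Figure: observed values of Y plotted against X, showing the sample regression line $\hat{Y}_i = b_0 + b_1 X_i$ (intercept $b_0$, slope $b_1$, residual $e_i$) alongside the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (intercept $\beta_0$, slope $\beta_1$, error $\varepsilon_i$); $Y_i = b_0 + b_1 X_i + e_i$ and $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$]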

Interpretation of the

Slope and the Intercept

$b_0 = \hat{E}(Y \mid X = 0)$ is the estimated average value of Y when the value of X is zero.

$b_1 = \frac{\Delta \hat{E}(Y \mid X)}{\Delta X}$ is the estimated change in the average value of Y as a result of a one-unit change in X.

EXAMPLE

Examine the linear relationship of the annual sales of produce stores on their size in square footage. Find the equation of the straight line that fits the data best.

Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

EXAMPLE

$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$

From Excel Printout:

               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657
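The Excel coefficients can be reproduced directly from the seven data points. The sketch below assumes the usual closed-form least-squares estimates (slope equal to the sum of cross-deviations divided by the sum of squared X deviations, intercept equal to the mean of Y minus slope times the mean of X). The function name `least_squares_fit` is illustrative.

```python
# Produce store data from the example: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def least_squares_fit(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = least_squares_fit(square_feet, annual_sales)
print(round(b0, 3), round(b1, 3))  # 1636.415 1.487
```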

EXAMPLE

[Figure: scatter plot of Annual Sales ($000) against Square Feet with the fitted line $\hat{Y}_i = 1636.415 + 1.487 X_i$]

EXAMPLE

$\hat{Y}_i = 1636.415 + 1.487 X_i$

The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.

The model estimates that for each increase of one square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.

RESIDUAL ANALYSIS

Purposes

Examine linearity

Evaluate assumptions to see if any

is violated

Graphical Analysis of Residuals

Plot residuals vs. $X_i$, $\hat{Y}_i$ (and time if necessary)

Linearity

[Figure: plots of Y against X and of residuals e against X for two cases, Not Linear and Linear]

Residual Analysis for Homoscedasticity

[Figure: plots of Y against X and of SR against X for two cases, Heteroscedasticity and Homoscedasticity]

Residual Analysis: Excel Output

Observation   Predicted Y    Residuals
1             4202.344417    -521.3444173
2             3928.803824    -533.8038245
3             5822.775103    830.2248971
4             9894.664688    -351.6646882
5             3557.14541     -239.1454103
6             4918.90184     644.0981603
7             3588.364717    171.6352829

[Figure: Residual Plot]
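The predicted values and residuals in the Excel output follow from the fitted equation. A short sketch of that calculation is given here, using the coefficients exactly as printed by Excel earlier in the example. The residual is taken as observed sales minus predicted sales.

```python
# Produce store data from the example: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

# Coefficients as reported in the Excel printout.
b0 = 1636.414726
b1 = 1.486633657

# Predicted Y for each store, then residual = observed - predicted.
predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - y_hat for y, y_hat in zip(annual_sales, predicted)]

for i, (y_hat, e) in enumerate(zip(predicted, residuals), start=1):
    print(i, round(y_hat, 2), round(e, 2))
```

Plotting `residuals` against `square_feet` or against `predicted` gives the kind of residual plot used to check linearity and constant variance.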

Inference about the Slope: t

Test

t test for a population slope

Is there a linear relationship of Y on X?

Null and alternative hypotheses

$H_0: \beta_1 = 0$ (no linear relationship)

$H_1: \beta_1 \neq 0$ (linear relationship)

Test statistic

$t = \frac{b_1}{S_{b_1}}$, where $S_{b_1} = \sqrt{\frac{MSE}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}}$

where $MSE = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}$

Example: Produce

Store

Data for Seven Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Estimated Regression Equation: $\hat{Y}_i = 1636.415 + 1.487 X_i$

The slope of this model is 1.487.

Is square footage of the store affecting its annual sales?

Inferences about the Slope:

t-test

$H_0: \beta_1 = 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical Values: $\pm 2.5706$ (rejection regions of .025 in each tail)

Test Statistic:

            Coefficients   Standard Error   t Stat   P-value
Intercept   1636.4147      451.4953         3.6244   0.01515
Footage     1.4866         0.1650           9.0099   0.00028

Decision: Reject $H_0$

Conclusion: There is evidence that square footage affects annual sales.
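The standard error and t statistic in the Excel table can be recomputed from the raw data. The sketch below follows the formulas on the test-statistic slide, using the fact that $\sum (X_i - \bar{X})^2$ equals $\sum X_i^2 - (\sum X_i)^2 / n$. Variable names are illustrative, and the critical value 2.5706 is the one quoted on the slide for df = 5.

```python
import math

# Produce store data from the example: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
mean_x = sum(square_feet) / n
mean_y = sum(annual_sales) / n

# Least-squares estimates of slope and intercept.
sxx = sum((x - mean_x) ** 2 for x in square_feet)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Mean squared error: SSE divided by n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
mse = sse / (n - 2)

# Standard error of the slope and the t statistic for H0: beta_1 = 0.
se_b1 = math.sqrt(mse / sxx)
t_stat = b1 / se_b1

critical = 2.5706  # two-tailed, alpha = .05, df = 5
print(round(se_b1, 4), round(t_stat, 2))
print("reject H0" if abs(t_stat) > critical else "do not reject H0")
```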


Pitfalls of Regression

Analysis

Lacking an awareness of the assumptions underlying least-squares regression

Not knowing how to evaluate the assumptions

Not knowing the alternatives to classical regression if some assumption is violated

Using a regression model without knowledge of the subject matter

Strategies for Avoiding the

Pitfalls of Regression

Start with a scatter plot to observe a possible relationship

Perform residual analysis to check the assumptions

Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality

Strategies for Avoiding the Pitfalls of Regression (continued)

If any assumption is violated, use alternative methods to least-squares regression or alternative least-squares models (e.g., curvilinear or multiple regression)

If there is no evidence of assumption violation, then test for the significance of the regression coefficients

Problem Set

cost per day given the length of services in days.
