
Correlation and Regression

Prof. Dr. Ricardo Primi


Correlation and Regression
• Measures the relationship between variables
Correlation = co-relation = standardized co-variance = r
• Usually examines dimensional (continuous) variables
• But mean differences between groups can also be expressed
as a co-relation
• In general, the measures are associated through linear
relationships
• But there are techniques for nonlinear correlations and regressions
• Correlation ≠ Causation
• Values of r range from -1.0 to +1.0
• The sign shows the direction of the relationship
• The absolute value shows the magnitude of the relationship
• 0.0 = no (linear) relationship
• -1.0 or +1.0 = perfect relationship
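
The point that a difference between group means can be written as a correlation can be checked with a minimal sketch. The two groups and their scores below are hypothetical, made up only for illustration. Group membership is coded 0/1 and correlated with the scores, and the result is compared with the value obtained from Cohen's d through r = d / sqrt(d² + 4). That identity is exact only under the assumptions stated in the comments (equal group sizes, pooled SD computed with N in the denominator).

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson r computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for two equal-sized groups (illustrative only).
group_a = [4.0, 5.0, 6.0, 7.0, 8.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]

# Code group membership as 0/1 and correlate it with the scores.
membership = [0] * len(group_a) + [1] * len(group_b)
scores = group_a + group_b
r_pb = pearson_r(membership, scores)

# Cohen's d, with the pooled SD computed with N (not N - 1) in the denominator.
ss_within = (sum((v - mean(group_a)) ** 2 for v in group_a)
             + sum((v - mean(group_b)) ** 2 for v in group_b))
pooled_sd = math.sqrt(ss_within / len(scores))
d = (mean(group_b) - mean(group_a)) / pooled_sd
r_from_d = d / math.sqrt(d ** 2 + 4)

print(round(r_pb, 4), round(r_from_d, 4))
```

Both numbers agree, so the mean difference and the correlation carry the same information here. The sign of r depends only on which group is coded 1.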
[Figure: Correlation plot. Correlation matrix of the 24 items T5_01Ac1 to T5_24Em1 (T5_01Ac1 to T5_08Ac1, T5_09Sc1 to T5_16Sc1, T5_17Em1 to T5_24Em1); color scale from −1 to 1.]
[Figure: Correlation plot. Correlation matrix of Agree1, Consc1, Extra1, Neuro1, Open1, Agree2, Consc2, Extra2, Neuro2, Open2, CndProb, EmoSym, HypAc, PeerProb, ProSoc, Locus, SlfAcd, SlfEmo, SlfSoc, Grit, SE; color scale from −1 to 1.]
[Figure: two scatterplots. Left: RMG (0 to 10) against GF_GC (200 to 800). Right: RG (0 to 20) against EPN_EG (40 to 160).]
Correlations

                                   RA_measure  RV_measure  RN_measure  RP_measure  Idade   Escolaridade
RA_measure    Pearson Correlation  1           .575**      .581**      .474**      .367**  .410**
              Sig. (2-tailed)                  .000        .000        .000        .000    .000
              N                    289         289         288         287         289     289
RV_measure    Pearson Correlation  .575**      1           .473**      .475**      .269**  .323**
              Sig. (2-tailed)      .000                    .000        .000        .000    .000
              N                    289         289         288         287         289     289
RN_measure    Pearson Correlation  .581**      .473**      1           .507**      .376**  .427**
              Sig. (2-tailed)      .000        .000                    .000        .000    .000
              N                    288         288         288         286         288     288
RP_measure    Pearson Correlation  .474**      .475**      .507**      1           .094    .120*
              Sig. (2-tailed)      .000        .000        .000                    .113    .042
              N                    287         287         286         287         287     287
Idade         Pearson Correlation  .367**      .269**      .376**      .094        1       .936**
              Sig. (2-tailed)      .000        .000        .000        .113                .000
              N                    289         289         288         287         289     289
Escolaridade  Pearson Correlation  .410**      .323**      .427**      .120*       .936**  1
              Sig. (2-tailed)      .000        .000        .000        .042        .000
              N                    289         289         288         287         289     289
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
(Idade = age; Escolaridade = schooling.)
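
To see why, in the table above, the correlation of RP_measure with Escolaridade (.120) is flagged as significant while the one with Idade (.094) is not, the usual t test for a correlation can be sketched as follows. The function name is hypothetical, and the critical value of about 1.97 (two-tailed, alpha = .05, df = 285) is an approximation quoted from standard t tables, not computed here.

```python
import math

def t_from_r(r, n):
    """t statistic for testing r against zero, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df / (1 - r ** 2)), df

# Values read from the table: RP_measure with Escolaridade and with Idade, N = 287.
t_schooling, df = t_from_r(0.120, 287)
t_age, _ = t_from_r(0.094, 287)

T_CRITICAL = 1.97  # approximate two-tailed .05 critical value for df = 285

print(f"df = {df}")
print(f"r = .120: t = {t_schooling:.2f}, significant: {abs(t_schooling) > T_CRITICAL}")
print(f"r = .094: t = {t_age:.2f}, significant: {abs(t_age) > T_CRITICAL}")
```

With the same N, the decision depends only on the size of r, which is why two correlations that look similar can fall on opposite sides of the cut-off.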
Correlations

                             c_br     epn_eg   epn_re   av_des
c_br    Pearson Correlation  1        .145     .214**   .045
        Sig. (2-tailed)               .076     .008     .621
        N                    157      151      152      124
epn_eg  Pearson Correlation  .145     1        .799**   .315**
        Sig. (2-tailed)      .076              .000     .000
        N                    151      151      151      122
epn_re  Pearson Correlation  .214**   .799**   1        .325**
        Sig. (2-tailed)      .008     .000              .000
        N                    152      151      152      123
av_des  Pearson Correlation  .045     .315**   .325**   1
        Sig. (2-tailed)      .621     .000     .000
        N                    124      122      123      124
**. Correlation is significant at the 0.01 level (2-tailed).
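
The N in this table changes from cell to cell (157, 151, 152, 124, ...), which is consistent with pairwise deletion: each correlation uses every case that has both scores. A minimal sketch of that idea follows. The variable names and values are hypothetical, and `None` is used here as the missing-value marker.

```python
import math

def pairwise_r(x, y):
    """Pearson r over the cases where both values are present; returns (r, n)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical data; None marks a missing score.
data = {
    "v1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "v2": [2.0, 1.0, None, 5.0, 4.0, 6.0],
    "v3": [None, 3.0, 2.0, None, 6.0, 5.0],
}

names = list(data)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r, n = pairwise_r(data[first], data[second])
        print(f"{first} x {second}: r = {r:.3f}, N = {n}")
```

Because each coefficient rests on a different subset of cases, correlations in such a table are not strictly comparable with one another.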
If $r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$ (with $x_i$ and $y_i$ expressed as deviations from their means), then depending upon the type of data being analyzed, a variety of correlations are found.

Coefficient      Symbol    X            Y            Assumptions
Pearson          r         continuous   continuous
Spearman         rho (ρ)   ranks        ranks
Point bi-serial  r_pb      dichotomous  continuous
Phi              φ         dichotomous  dichotomous
Bi-serial        r_bis     dichotomous  continuous   normality of latent X
Tetrachoric      r_tet     dichotomous  dichotomous  bivariate normality of latent X, Y
Polychoric       r_pc      categorical  categorical  bivariate normality of latent X, Y
Polyserial       r_ps      categorical  continuous   bivariate normality of latent X, Y
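
As one illustration of the table, the sketch below applies the same Pearson formula to two dichotomous (0/1) variables and compares it with the textbook shortcut for phi computed from the cell counts of a 2 x 2 table. The counts are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical 2 x 2 table of counts:
#            Y = 0   Y = 1
#   X = 0      a       b
#   X = 1      c       d
a, b, c, d = 20, 10, 5, 25

# Expand the table into one 0/1 pair per case.
x = [0] * (a + b) + [1] * (c + d)
y = [0] * a + [1] * b + [0] * c + [1] * d

r = pearson_r(x, y)
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(round(r, 4), round(phi, 4))
```

The two values coincide, which is the sense in which phi is a Pearson r computed on dichotomous data.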
120    4 Covariance, Regression, and Correlation

Table 4.22 Alternative Estimates of effect size. Using the correlation as a scale free estimate of effect size allows for combining experimental and correlational data in a metric that is directly interpretable as the effect of a standardized unit change in x leads to r change in standardized y.

Statistic            Estimate                                      r equivalent                                            as a function of r
Pearson correlation  $r_{xy} = \frac{C_{xy}}{\sigma_x \sigma_y}$   $r_{xy}$
Regression           $b_{y.x} = \frac{C_{xy}}{\sigma_x^2}$         $r = b_{y.x} \frac{\sigma_x}{\sigma_y}$                 $b_{y.x} = r \frac{\sigma_y}{\sigma_x}$
Cohen's d            $d = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_x}$  $r = \frac{d}{\sqrt{d^2 + 4}}$                          $d = \frac{2r}{\sqrt{1 - r^2}}$
Hedges' g            $g = \frac{\bar{X}_1 - \bar{X}_2}{s_x}$       $r = \frac{g}{\sqrt{g^2 + 4(df/N)}}$                    $g = \frac{2r\sqrt{df/N}}{\sqrt{1 - r^2}}$
t-test               $t = \frac{d\sqrt{df}}{2}$                    $r = \sqrt{t^2/(t^2 + df)}$                             $t = \sqrt{\frac{r^2 df}{1 - r^2}}$
F-test               $F = \frac{d^2 df}{4}$                        $r = \sqrt{F/(F + df)}$                                 $F = \frac{r^2 df}{1 - r^2}$
Chi Square                                                         $r = \sqrt{\chi^2/n}$                                   $\chi^2 = r^2 n$
Odds ratio           $d = \frac{\ln(OR)}{1.81}$                    $r = \frac{\ln(OR)}{1.81\sqrt{(\ln(OR)/1.81)^2 + 4}}$   $\ln(OR) = \frac{3.62 r}{\sqrt{1 - r^2}}$
$r_{equivalent}$     r with probability p                          $r = r_{equivalent}$

4.5.1.1 Spearman ρ: a Pearson correlation of ranks

In the first of two major papers published in the American Journal of Psychology in 1904, Spearman (1904b) reviewed for psychologists the efforts made to define the correlation coefficient by Galton (1888) and Pearson (1895). Not only did he consider the application of the Pearson correlation to ranked data, but he also developed corrections for attenuation and the partial correlation, two subjects that will be addressed later. The advantage of using ranked data rather than the raw data is that it is more robust to variations in the extreme scores. For whether a person has an 8,000 or a 6,000 on an exam, that he or she is the highest score makes no difference to the ranks. Consider Y as ten numbers sampled from 1 to 20 and then find the Pearson correlation with $Y^2$ and $e^Y$. Do the same things for the ranks of these numbers. That is, find the Spearman correlations. As is clear from Figure 4.5, the [...]

[...] important than the squared correlation and it is more appropriate to consider the slope of [...]
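
The exercise described in the passage (ten numbers from 1 to 20, correlated with their squares and exponentials, first as raw scores and then as ranks) can be sketched as below. The sketch assumes sampling without replacement so that there are no tied ranks. The seed and the helper names are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks from 1 to n; assumes all values are distinct (no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

random.seed(1)                        # arbitrary seed, for reproducibility only
y = random.sample(range(1, 21), 10)   # ten distinct numbers from 1 to 20
y_sq = [v ** 2 for v in y]
y_exp = [math.exp(v) for v in y]

# Pearson on the raw scores: below 1, because the relations are not linear.
pearson_sq = pearson_r(y, y_sq)
pearson_exp = pearson_r(y, y_exp)

# Spearman = Pearson on the ranks: the transformations keep the order intact.
spearman_sq = pearson_r(ranks(y), ranks(y_sq))
spearman_exp = pearson_r(ranks(y), ranks(y_exp))

print(f"Pearson:  Y, Y^2 = {pearson_sq:.3f};  Y, e^Y = {pearson_exp:.3f}")
print(f"Spearman: Y, Y^2 = {spearman_sq:.3f};  Y, e^Y = {spearman_exp:.3f}")
```

Both transformations are monotonic, so the ranks are unchanged and the Spearman correlations equal 1, while the Pearson correlations on the raw scores fall short of 1.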
Correlation and regression

• A simple distinction:


• We use correlational analysis when we want to investigate whether
variables are related, and regression analysis when we want to predict
one variable from another or from a combination (sum) of others
Exercise 1

• Install JASP: https://jasp-stats.org


• Explore the file ex1_ie_bpr_16pf_avdes.sav
• Use visualization
• Understand the descriptive statistics
Models
http://r4ds.had.co.nz/program-intro.html

Visualizing patterns -> models


Modeling
• http://r4ds.had.co.nz/model-basics.html#introduction-15
• “Patterns provide one of the most useful tools for data scientists
because they reveal covariation. If you think of variation as a
phenomenon that creates uncertainty, covariation is a phenomenon
that reduces it. If two variables covary, you can use the values of one
variable to make better predictions about the values of the second. If
the covariation is due to a causal relationship (a special case), then
you can use the value of one variable to control the value of the
second... Models are a tool for extracting patterns out of data. ” (p.
106)
http://rpsychologist.com/d3/NHST/

http://rpsychologist.com/d3/CI/

http://rpsychologist.com/d3/cohend/
Correlation and Regression
http://rpsychologist.com/d3/correlation/
Correlation formula

$$ r = \frac{\sum_{i=1}^{N} z_{x_i} z_{y_i}}{N - 1} $$

$$ r = \frac{\sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)}{N - 1} $$

https://rpsychologist.com/d3/correlation/
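
The correlation formula can be traced step by step in a short sketch: standardize each variable with the sample standard deviation (N − 1 in the denominator), multiply the paired z scores, sum, and divide by N − 1. The five data pairs are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math

def z_scores(values):
    """Standardize with the sample SD (N - 1 in the denominator)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / s for v in values]

# Hypothetical paired scores (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

zx, zy = z_scores(x), z_scores(y)

# r = sum of the products of paired z scores, divided by N - 1.
r = sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

print(round(r, 3))  # 0.8
```

Pairs that deviate from their means in the same direction contribute positive products, and pairs that deviate in opposite directions contribute negative ones.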
Product-moment!

• The mean of the product of two moments, indicating co-relation
• Product: multiplication of two variables (X, Y)
• Moment: a function applied to the mean of deviations
• Moments: 1st = mean, 2nd = variance, 3rd = skewness, 4th = kurtosis (the 2nd is a central moment; the 3rd and 4th are standardized central moments)
• z scores are moments
• Deviations from the mean in standard-deviation units: $z = \frac{X - \bar{X}}{s}$
• Co-relation: occurring together
• z for X paired with z for Y: $r = \frac{\sum z_X z_Y}{N}$ (with $N - 1$ in the denominator when $s$ is the sample standard deviation)


• So Pearson's product-moment correlation (r)
is the average magnitude with which pairs of scores (X, Y)
co-relate by deviating simultaneously from
their respective means
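
The moments listed on this slide can be computed directly as means of powered deviations. The sketch below uses hypothetical data and population-style moments (dividing by N), with skewness and kurtosis taken as the third and fourth central moments divided by the matching power of the standard deviation.

```python
import math

def central_moment(values, k):
    """Mean of the k-th power of the deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

# Hypothetical scores (illustrative only).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(data) / len(data)              # 1st moment
variance = central_moment(data, 2)        # 2nd central moment
sd = math.sqrt(variance)
skewness = central_moment(data, 3) / sd ** 3   # standardized 3rd moment
kurtosis = central_moment(data, 4) / sd ** 4   # standardized 4th moment

print(mean, variance, skewness, kurtosis)
```

The first central moment is always zero, which is why the mean itself, rather than a mean deviation, is listed as the first moment.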
Regression line

• Best prediction of Y from the values of X


• Prediction equation:
Ŷ = b0 + b1X
Where:
X = value of the predictor (predictor variable or IV)
Ŷ = predicted value of Y (response variable, DV, or criterion),
i.e., the value of Y on the line, given X
b1 = slope of the line: the change in Ŷ for a
1-unit change in X
b1 = rXY(SY/SX)
b0 = constant (intercept):
Ŷ when X = 0.0
b0 = MY – b1MX
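
The two formulas for the coefficients, b1 = rXY(SY/SX) and b0 = MY – b1MX, can be applied directly, as in this minimal sketch. The data are hypothetical, and the variable names are chosen to mirror the slide's symbols.

```python
import math

# Hypothetical paired scores (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)

m_x, m_y = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((v - m_x) ** 2 for v in x) / (n - 1))
s_y = math.sqrt(sum((v - m_y) ** 2 for v in y) / (n - 1))
r_xy = sum((a - m_x) * (b - m_y) for a, b in zip(x, y)) / ((n - 1) * s_x * s_y)

b1 = r_xy * (s_y / s_x)   # slope: change in Y-hat per 1-unit change in X
b0 = m_y - b1 * m_x       # intercept: Y-hat when X = 0

def predict(value):
    return b0 + b1 * value

print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
print(predict(m_x))       # the line passes through (M_X, M_Y)
```

When X and Y have the same standard deviation, as in this example, the slope equals r.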
Regression sums of squares

SSTotal = ∑(Y – MY)²
SSModel = ∑(Ŷ – MY)²
SSResidual = ∑(Y – Ŷ)²

[Figure: three scatterplots of Y against X (both axes 0 to 8), one for each sum of squares: SSTotal, SSModel, SSResidual.]

http://setosa.io/ev/ordinary-least-squares-regression/
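
The three sums of squares can be computed for a small example as sketched below. The data are hypothetical, and the proportion of variance explained and the F ratio are formed in the usual way for a simple regression (1 and N − 2 degrees of freedom).

```python
# Hypothetical paired scores (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
n = len(x)

m_x, m_y = sum(x) / n, sum(y) / n

# Least-squares line.
b1 = (sum((a - m_x) * (b - m_y) for a, b in zip(x, y))
      / sum((a - m_x) ** 2 for a in x))
b0 = m_y - b1 * m_x
y_hat = [b0 + b1 * a for a in x]

ss_total = sum((b - m_y) ** 2 for b in y)                 # sum (Y - MY)^2
ss_model = sum((p - m_y) ** 2 for p in y_hat)             # sum (Y-hat - MY)^2
ss_residual = sum((b - p) ** 2 for b, p in zip(y, y_hat)) # sum (Y - Y-hat)^2

r_squared = ss_model / ss_total          # proportion of variance explained
df_model, df_residual = 1, n - 2
f_ratio = (ss_model / df_model) / (ss_residual / df_residual)

print(ss_total, ss_model, ss_residual)
print(round(r_squared, 3), round(f_ratio, 3))
```

The model and residual sums of squares add up to the total, so the proportion of variance explained is simply the model share of that total.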
Concepts so far ...

• z score
• Variance/covariance
• Correlation
• Equation of the line: intercept and slope
• DV and IV
• Total variance, residual variance, regression variance
• Proportion of variance explained
• Degrees of freedom
• F
JASP manual: https://jasp-stats.org/jasp-materials/

https://www.jamovi.org

https://rpsychologist.com/d3/CI/
https://rpsychologist.com/d3/cohend/

https://rpsychologist.com/d3/NHST/

https://gallery.shinyapps.io/simple_regression/

https://gallery.shinyapps.io/anova_shiny_rstudio/

https://gallery.shinyapps.io/multi_regression/
Exercise 1

• Install JASP: https://jasp-stats.org


• Explore the file ex1_ie_bpr_16pf_avdes.sav
• Use visualization
• Try to interpret the descriptive statistics
Exercise 2

• Open the file ex1_ie_bpr_16pf_avdes.sav


• Choose two variables and run a simple regression
Exercise 3

• ex1_ie_bpr_16pf_avdes.sav

• ANOVA
• Repeated-measures ANOVA (RMANOVA)
• Two-way independent ANOVA
• Mixed-factor ANOVA
136 THE SAGE HANDBOOK OF PERSONALITY THEORY AND ASSESSMENT

Table 7.1 16PF Scale Names and Descriptors


Descriptors of Low Range                    Primary Scales            Descriptors of High Range
Reserved, Impersonal, Distant               Warmth (A)                Warm-hearted, Caring, Attentive To Others
Concrete, Lower Mental Capacity             Reasoning (B)             Abstract, Bright, Fast-Learner
Reactive, Affected By Feelings              Emotional Stability (C)   Emotionally Stable, Adaptive, Mature
Deferential, Cooperative, Avoids Conflict   Dominance (E)             Dominant, Forceful, Assertive
Serious, Restrained, Careful                Liveliness (F)            Enthusiastic, Animated, Spontaneous
Expedient, Nonconforming                    Rule-Consciousness (G)    Rule-Conscious, Dutiful
Shy, Timid, Threat-Sensitive                Social Boldness (H)       Socially Bold, Venturesome, Thick-Skinned
Tough, Objective, Unsentimental             Sensitivity (I)           Sensitive, Aesthetic, Tender-Minded
Trusting, Unsuspecting, Accepting           Vigilance (L)             Vigilant, Suspicious, Skeptical, Wary
Practical, Grounded, Down-To-Earth          Abstractedness (M)        Abstracted, Imaginative, Idea-Oriented
Forthright, Genuine, Artless                Privateness (N)           Private, Discreet, Non-Disclosing
Self-Assured, Unworried, Complacent         Apprehension (O)          Apprehensive, Self-Doubting, Worried
Traditional, Attached To Familiar           Openness to Change (Q1)   Open To Change, Experimenting
Group-Orientated, Affiliative               Self-Reliance (Q2)        Self-Reliant, Solitary, Individualistic
Tolerates Disorder, Unexacting, Flexible    Perfectionism (Q3)        Perfectionistic, Organized, Self-Disciplined
Relaxed, Placid, Patient                    Tension (Q4)              Tense, High Energy, Driven
                                            Global Scales
Introverted, Socially Inhibited             Extraversion              Extraverted, Socially Participating
Low Anxiety, Unperturbable                  Anxiety Neuroticism       High Anxiety, Perturbable
Receptive, Open-Minded, Intuitive           Tough-Mindedness          Tough-Minded, Resolute, Unempathic
Accommodating, Agreeable, Selfless          Independence              Independent, Persuasive, Willful
Unrestrained, Follows Urges                 Self-Control              Self-Controlled, Inhibits Urges
Adapted with permission from S.R. Conn and M.L. Rieke (1994). 16PF Fifth Edition Technical Manual. Champaign, IL: Institute
for Personality and Ability Testing, Inc.

[...] personality measurement. Instead of being developed to measure preconceived dimensions of interest to a particular author, the [...]

[...] hydrogen and oxygen). For psychology to advance as a science, he felt it also needed basic measurement techniques for personality.

Table 7.2 16PF global factors and the primary trait make-up

Global Factors: Primary Factors
Extraversion/Introversion: (A) Warm–Reserved, (F) Lively–Serious, (H) Bold–Shy, (N) Private–Forthright, (Q2) Self-Reliant–Group-oriented
High Anxiety/Low Anxiety: (C) Emotionally Stable–Reactive, (L) Vigilant–Trusting, (O) Apprehensive–Self-assured, (Q4) Tense–Relaxed
Tough-Mindedness/Receptivity: (A) Warm–Reserved, (I) Sensitive–Unsentimental, (M) Abstracted–Practical, (Q1) Open-to-Change/Traditional
Independence/Accommodation: (E) Dominant–Deferential, (H) Bold–Shy, (L) Vigilant–Trusting, (Q1) Open-to-Change/Traditional
Self-Control/Lack of Restraint: (F) Lively–Serious, (G) Rule-conscious/Expedient, (M) Abstracted–Practical, (Q3) Perfectionistic–Tolerates disorder

[...] Big Five factor models (e.g. Costa and McCrae, 1976, 1985; Norman, 1963; McKenzie et al., 1997; Tupes and Christal, 1961). For example, the first NEO manual (Costa and McCrae, 1985: 26) describes the development of the questionnaire as beginning with cluster analyses of 16PF scales, which these researchers had been using for over 20 years in their own research. However, this origin, or even acknowledgement of the existence of the five 16PF global factors, does not appear in any current accounts of the development of the Big Five (Costa and McCrae, 1992a; Digman, 1990; Goldberg, 1990).

Furthermore, when the 16PF correlation matrix, which was used in the original development of the Big Five, is re-analyzed using more modern, rigorous factor-analytic [...]

[...] those between the NEO five factors and the Big Five markers which the NEO was developed to measure (H.E.P. Cattell, 1996; Goldberg, 1992). The alignments among the Big Five models are summarized in Table 7.4.

However, there are important differences between the two models. Although proponents of the other five-factor models have done much in the last decade to try to bring about a consensus in psychology about the existence of five global factors, their particular set of traits have been found to be problematic. In the development process, the NEO Big Five factors were forced to be statistically uncorrelated or orthogonal for reasons of theoretical and statistical simplicity. However, few have found this as a satisfactory approach for defining the basic dimensions [...]

Table 7.4 Alignments among the three main five-factor models

16PF (Cattell)                   NEO-PI-R (Costa and McCrae)   Big Five (Goldberg)
Extraversion/Introversion        Extraversion                  Surgency
Low Anxiety/High Anxiety         Neuroticism                   Emotional stability
Tough-Mindedness/Receptivity     Openness                      Intellect or culture
Independence/Accommodation       Agreeableness                 Agreeableness
Self-Control/Lack of Restraint   Conscientiousness             Conscientiousness or dependability

Cautions about correlations: Anscombe data set

Anscombe's 4 Regression data sets

[Figure: four scatterplots, y1 against x1, y2 against x2, y3 against x3, and y4 against x4 (x from 5 to 15, y from 4 to 12).]
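
The caution behind the Anscombe figure can be reproduced numerically. The sketch below uses the published values of Anscombe's quartet (they are not listed on the slide and are typed in here from the standard data set) and fits a simple regression to each set. The helper name `fit` is an arbitrary choice.

```python
import math

def fit(x, y):
    """Return (r, slope, intercept) for a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), slope, my - slope * mx

# Anscombe's quartet; sets 1 to 3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: fit(x, y) for name, (x, y) in quartet.items()}
for name, (r, b1, b0) in results.items():
    print(f"{name}: r = {r:.3f}, Y-hat = {b0:.2f} + {b1:.2f} X")
```

All four sets give nearly the same correlation and regression line, even though the scatterplots look completely different, which is why the data should be plotted before r is interpreted.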
