Correlation and Regression

Hi there, in the previous video, we spent
some time looking at the possible causal
connections between trust and other
social, political and economic variables.
In this video, we will see how
social scientists try to establish and
verify such links.
This is a strict scientific process,
employing statistical techniques known
as correlation and regression analysis.
Now, if you're already familiar with
these, you could always skip this lecture.
On the other hand,
if you want to refresh your memory,
you're welcome to stay on board.
For those of you who don't know anything
about these techniques, don't worry.
You don't have to actually
do anything yourself,
I'm just going to give you a passive
knowledge of what's going on.
Not quite as painful as it sounds.
Well, let's start then with a list of
countries for which we have measurements.
And we'll call these measurements x and y.
These could be anything.
Doesn't matter at all.
The first question one wants to answer is,
is there any connection between the two?
Okay, how do we go about it?
Well, the first thing we want to
do is to plot them on a graph.
One on the horizontal axis and
the other on the vertical axis.
We'll do it twice with different examples.
The top graph shows no relation
whatsoever between the two.
But in the one below,
you can already see a pattern emerging.
But how close is that relationship?
Do we have a simple way of
expressing the degree of closeness?
And the answer is that we do, and
that measure is called correlation.
Now what correlation does
is to place a statistical measure,
which is called r, small r, between the two.
And r can vary from nought, or zero, where
there is no relationship at all,
to r equals 1 for
complete unity between the two measures.
Our top graph will be close to 0.
The bottom one close to 0.7.
Now if both variables increase or
decrease together,
we talk about a positive correlation.
We give it a plus sign.
On the other hand, if an increase in one
is accompanied by a decline in the other,
we talk about negative correlation,
and we give it a minus sign.

In our bottom graph then,
we have a positive correlation.
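If you'd like to see what the computer is doing here, this is a minimal sketch of how r can be calculated, using the standard Pearson formula. The numbers are made up for five hypothetical countries, and the names pearson_r, y_together and y_opposite are just labels chosen for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, measured around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much x and y each vary on their own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up measurements for five hypothetical countries (assumption).
x = [1, 2, 3, 4, 5]
y_together = [2, 1, 4, 3, 5]   # tends to rise as x rises
y_opposite = [5, 3, 4, 1, 2]   # tends to fall as x rises

r_pos = pearson_r(x, y_together)
r_neg = pearson_r(x, y_opposite)
print(f"r = {r_pos:+.2f}")   # prints r = +0.80
print(f"r = {r_neg:+.2f}")   # prints r = -0.80
```

The plus and minus signs in the output are the positive and negative correlations we've just been talking about.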
Now so far, we've made no
judgements on what causes what.
So, we can take the analysis further
when we do make a hypothesis that
one, x, is actually the cause of the other,
y.
And when we do this we automatically
make the implicit hypothesis that
there might not be any relationship
at all between the two.
And this is called the Null Hypothesis.
So, let's get back to our graph.
We assume that changes
in x cause changes in y,
and we have the causal variable
on the horizontal axis.
The next challenge now is to
draw a line through the graph
that provides the best fit for
all the observations.
Now, I've taken our graph and
tried to draw a line.
Or in fact draw two lines that
seem best to fit the data.
One is in red.
The other is in blue.
And they both seem to
do the job quite well.
But that's not really scientific.
What I need is a formula that will allow
me to draw what really is the best
fit line,
a line that minimizes the squared distances
between the data points
and the hypothetical line or
relationship I'm trying to establish.
Such a formula exists, but
I don't need you to know it.
Well, not now anyway, and
nowadays a computer does it for you.
This line is called
the Least Squares Regression Line.
Now, the regression line is
usually expressed as a formula,
which stipulates how high or low on
the y axis the line sits at the zero value of x,
and the direction and
gradient of the line.
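For those who are curious, here is a rough sketch of how the two parts of that formula, the starting height and the gradient, can be worked out with the usual least squares calculation. The data is invented, and least_squares_line is simply a name picked for this example.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line y = intercept + slope * x
    that minimizes the sum of squared vertical distances to the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                     # gradient: change in y per unit of x
    intercept = mean_y - slope * mean_x   # height of the line where x is zero
    return intercept, slope

# Made-up measurements for five hypothetical countries (assumption).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

intercept, slope = least_squares_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x")   # prints y = 0.60 + 0.80 * x
```

A positive slope means the line climbs from left to right; a negative one means it falls.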
Note that I've extended the axes
since the relationships expressed in
the regression line should hold for
the missing data as well.
So, if I add the results
of ten more countries,
they should show the same pattern, and
it should also predict the pattern for
any other matching set of
data juxtaposing x and y.
For example, this could be the data for
the same countries, but
for 2013 instead of 2012.
But how sure are we of this relationship?
How confident are we that
this is not a chance result?
Well basically, our confidence depends
on the closeness of the relationship,
the correlation,
and on the number and
the range of the observations.
There are statistical tables for
doing this.
But nowadays,
they're embedded in computer programs.
And such tests can confirm the degree
of confidence you can have,
statistically, in the result.
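To give a feel for how the number of observations matters, here is one possible sketch. It is not necessarily the test your statistics program uses: it relies on the Fisher z-transformation, which is a large-sample approximation, and the function name approx_p_value is made up for this example. The p-value it returns is roughly the chance of seeing a correlation at least this strong if there were really no relationship at all.

```python
import math

def approx_p_value(r, n):
    """Approximate two-sided p-value for a correlation r based on n
    observations, using the Fisher z-transformation (an approximation)."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    # Under 'no relationship', z is approximately standard normal.
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal beyond |z|.
    return math.erfc(abs(z) / math.sqrt(2))

# The same r = 0.7, but with more and more countries in the sample.
for n in (10, 20, 40):
    print(n, approx_p_value(0.7, n))
```

The same correlation becomes harder and harder to dismiss as a chance result as the number of countries grows.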
Confident, for example,
that the data for the following year for
the same countries,
would show the same relationship.
Or confident that the same
relationship would appear with
a different group of countries.
So, after seeing our r value and
the regression line formula,
social scientists should also
tell you the confidence level.
And depending on that confidence level,
you can go ahead on the basis of
the supposition that x causes y.
The confidence level is usually 99%.
But sometimes, 95% is okay.
And this is all you need to know for now.
But note.
First, this is a purely
statistical relationship.
Second, we still need to check
that the initial data is accurate.
Third, we need to check whether
the hypothesis is possible.
Fourth, we need to ask ourselves whether
there might not be a reverse causation.
And last but not least, we need to
check whether the author gives us
the confidence level or
error margin in the results.
There should always be one,
but often there isn't.
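On the fourth point, reverse causation, it may help to see that the correlation itself cannot tell you which way the cause runs. In this small sketch, with made-up numbers and the same standard Pearson formula, swapping x and y gives exactly the same r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up measurements for six hypothetical countries (assumption).
x = [3, 5, 6, 8, 9, 12]
y = [10, 14, 13, 19, 18, 25]

r_xy = pearson_r(x, y)   # treating x as the possible cause of y
r_yx = pearson_r(y, x)   # treating y as the possible cause of x
print(r_xy == r_yx)      # the two values are identical
```

So the direction of the arrow has to come from the hypothesis and the reasoning behind it, not from the statistic.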
Now despite all of this, there are still
disagreements among social scientists.
Why should that occur?
Well, sometimes the data is incomplete.
How many countries in total
are in the comparison?
Is there a bias in
the ones that are missing?
Often the truth is uncertain.
Time after time, variables are entered
into the calculation ignoring the fact
that they have
error margins of their own.
And again, the historical periods chosen
for comparison might be different.
And therefore the quality of the data
might have changed in the interval.
As it often does.
Another reason is that the data
is chosen as a proxy for reality:
an artificial construction,
labeled as representative of
something else.
But does it really cover
the issue as is claimed?
So, let's sum up now.
In this video, we've examined the way
in which social scientists try
to establish statistical relationships
between sets of variables.
We've dealt with correlation,
regression and confidence levels.
Now, that's not bad, but
we need to internalize these concepts,
because they're absolutely necessary
whenever anybody tells you that more of
x leads to more of y.
And believe me,
they're telling you this all the time.
Now next week, we'll look at the various
ways in which society might be fragmented
and what the implications might be for
levels of trust and for governments.
