
Generalized Causal Inference

Source: Times Daily 1998
Conditions for Causality

John Stuart Mill (1806-1873)

1. Temporal precedence (X precedes Y)

2. Covariation (Y changes with X)

3. No plausible alternative explanations


Are causes necessary and sufficient conditions

for their effects?

J.L. Mackie would disagree . . .

Consider: "Short circuits cause house fires"
A short circuit is not necessary for a fire to burn: houses may catch fire for many other reasons
A short circuit is not sufficient: other conditions (i.e., oxygen and inflammable material) must be present

Causality (cont'd)

INUS condition:
An insufficient but non-redundant part of an
unnecessary but sufficient condition
(Mackie 1974, p. 62)

Insufficient but
Necessary part of a condition which is itself
Unnecessary but
Sufficient for the result
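
The INUS idea can be made concrete with a small Boolean sketch. The function below is a hypothetical toy model (the name `house_fire` and the `other_cause` flag are illustrative assumptions, not part of Mackie's text): the short circuit is one part of a conjunction that is sufficient for a fire, while the conjunction as a whole is not necessary because other causes exist.

```python
def house_fire(short_circuit, oxygen, flammable_material, other_cause=False):
    """Toy model of the short-circuit example (hypothetical, for illustration)."""
    # One sufficient condition: all three parts must be present together.
    electrical_fire = short_circuit and oxygen and flammable_material
    # The condition above is unnecessary: some other cause can also start a fire.
    return electrical_fire or other_cause


# Insufficient: a short circuit alone (no oxygen) does not produce a fire.
print(house_fire(True, False, True))
# Non-redundant: without the short circuit, the same conjunction fails.
print(house_fire(False, True, True))
# Sufficient conjunction: all parts present.
print(house_fire(True, True, True))
# Unnecessary: a fire can occur with no short circuit at all.
print(house_fire(False, True, True, other_cause=True))
```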


Effect: the difference between what did happen and the counterfactual, i.e., something that is contrary to the fact

Counterfactual: What would have happened to the same individuals who received a treatment if they simultaneously had not received that treatment?

Two roads diverged in a yellow wood,
And sorry I could not travel both
Robert Frost

We cannot observe a counterfactual: "Acts demolish their alternatives, that is the paradox" (Salter 1975, p. 36)
Hence we try to
Create reasonable but imperfect approximations to the counterfactual
Measure how these approximations differ from the counterfactual
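
A simulation can show why the counterfactual has to be approximated. The sketch below assumes a hypothetical world in which every unit has two potential outcomes and the treatment adds exactly 5 points. All numbers and variable names are made up for illustration. In the simulation both outcomes are visible, so the true effect is known. In real data only one outcome per unit is revealed, and a comparison group stands in for the missing one.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Each unit has two potential outcomes; in reality only one is ever observed.
units = []
for _ in range(1000):
    y0 = rng.gauss(50, 10)  # outcome without treatment
    y1 = y0 + 5             # outcome with treatment (assumed constant effect)
    units.append((y0, y1))

# Known only because this is a simulation.
true_effect = statistics.mean(y1 - y0 for y0, y1 in units)

# What a researcher sees: one outcome per unit, depending on the condition.
observed = []
for y0, y1 in units:
    treated = rng.random() < 0.5
    observed.append((treated, y1 if treated else y0))

treated_mean = statistics.mean(y for t, y in observed if t)
control_mean = statistics.mean(y for t, y in observed if not t)

# The control group is an imperfect approximation to the counterfactual.
estimate = treated_mean - control_mean
print(true_effect, estimate)
```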

Research Designs
Experiment: A study in which an intervention is deliberately
introduced to observe its effects.
Randomized Experiment: An experiment in which units are assigned
to receive the treatment or an alternative condition by a random
process such as the toss of a coin or a table of random numbers.
Quasi-Experiment: An experiment in which units are not assigned to
conditions randomly.
Natural Experiment: Not really an experiment because the cause
usually cannot be manipulated; a study that contrasts a naturally
occurring event such as an earthquake with a comparison condition.
Correlational Study: Usually synonymous with non-experimental or
observational study; a study that simply observes the size and
direction of a relationship among variables.
Source: Shadish et al. 2002, p. 12
The Origins of Experimentation

Empedocles (495 BC-430 BC): empirical demonstrations

against Parmenides
Leonardo da Vinci (1452-1519)
William Gilbert (1544-1603)
Francis Bacon (1561-1626)
Hacking (1983) says of early experimenter Sir Francis Bacon: "He taught that not only must we observe nature in the raw, but that we must also 'twist the lion's tail', that is, manipulate our world in order to learn its secrets" (p. 149)
Galileo Galilei (1564-1642)
Scientific Revolution in the 17th century


Natural philosophy → Modern science

Part of our ordinary life

What happens to my grades if I study more?
What happens to my weight if I exercise more?
. . .

Natural Philosophy vs. Modern Science

Natural Philosophy                         Modern Science

First principles → Theory                  Observation → Correction of errors in theory
Theory → Observations to support theory
Passive observation                        Deliberate interventions and observation of effects
No control of extraneous influences        Control of extraneous influences (e.g.,
Randomized Experiments

Random assignment creates two or more

groups of units that are probabilistically
similar to each other on the average.

Hence, any outcome differences between

groups are very likely due to treatment.
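
The claim that random assignment produces groups that are similar on average can be checked with a short simulation. This is a minimal sketch under assumed conditions: a hypothetical pre-treatment covariate (age) and a fair coin toss for each unit. None of the numbers come from the slides.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility

# Hypothetical pre-treatment covariate for 2000 units.
ages = [rng.uniform(18, 65) for _ in range(2000)]

# Random assignment: a coin toss decides each unit's condition.
assignment = [rng.random() < 0.5 for _ in ages]

treatment_ages = [a for a, t in zip(ages, assignment) if t]
control_ages = [a for a, t in zip(ages, assignment) if not t]

# The groups are not identical, but their averages should be close.
balance_gap = abs(statistics.mean(treatment_ages) - statistics.mean(control_ages))
print(len(treatment_ages), len(control_ages), balance_gap)
```

The same balance is expected, on average, for covariates that were never measured, which is what makes outcome differences attributable to the treatment.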

Causal Description vs. Causal Explanation

Randomized experiments are the Cadillac

of research designs for describing the effects
of manipulations (Molar causation)

Randomized experiments do not help much

in explaining the mechanisms through which
and the conditions under which the causal
relationship holds (Molecular causation)

Mediators and Moderators

A mediator is a variable that accounts for all

or some of the observed relationship
between a predictor and an outcome.

A moderator is a variable that affects the strength or direction of the relationship between a predictor and an outcome; in other words, the effect of the predictor on the outcome depends on the level of the moderator.
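
Moderation can be illustrated by estimating the predictor's slope separately at each level of a moderator. The sketch below uses simulated data with assumed slopes of 1 and 3. The variable names and the helper `slope` are hypothetical. It covers moderation only, not mediation.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single predictor)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Simulated data: the effect of x on y is 1 when m == 0 and 3 when m == 1.
data = []
for _ in range(2000):
    x = rng.gauss(0, 1)
    m = rng.choice([0, 1])  # binary moderator
    y = (1.0 + 2.0 * m) * x + rng.gauss(0, 1)
    data.append((x, m, y))

slope_m0 = slope([x for x, m, _ in data if m == 0], [y for _, m, y in data if m == 0])
slope_m1 = slope([x for x, m, _ in data if m == 1], [y for _, m, y in data if m == 1])

# Different slopes across levels of m indicate that m moderates the effect.
print(slope_m0, slope_m1)
```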

Quasi-Experiments

Unlike with randomized experiments, in quasi-experiments assignment to conditions is by means of
Self-selection: units choose treatment for themselves
(e.g. individuals voluntarily enrolling in job training
Selection by non-random mechanisms: the decision of
who gets which treatment is based on some non-random
criterion (e.g. school principal allocating teachers to
Treatment and control groups may differ in many systematic (non-random) ways other than the presence of the treatment
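
A simulation can show how self-selection distorts a naive group comparison. The sketch below assumes a hypothetical unobserved trait ("motivation") that drives both enrollment and the outcome, with a true treatment effect of 2. All values are invented for illustration.

```python
import random

rng = random.Random(11)  # fixed seed for reproducibility

TRUE_EFFECT = 2.0  # assumed effect of the treatment in this toy world

treated_outcomes, control_outcomes = [], []
for _ in range(5000):
    motivation = rng.gauss(0, 1)
    enrolls = motivation > 0  # self-selection: motivated units choose treatment
    outcome = 10.0 * motivation + (TRUE_EFFECT if enrolls else 0.0) + rng.gauss(0, 1)
    if enrolls:
        treated_outcomes.append(outcome)
    else:
        control_outcomes.append(outcome)

# The naive comparison mixes the treatment effect with the motivation gap.
naive_estimate = (sum(treated_outcomes) / len(treated_outcomes)
                  - sum(control_outcomes) / len(control_outcomes))
print(TRUE_EFFECT, naive_estimate)
```

The naive estimate lands far above the true effect because the groups already differed before treatment.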

Quasi-Experiments (cont'd)

Systematic differences between treatment and control groups may constitute potential alternative explanations for the observed effect
We try to rule out alternative explanations

Quasi-Experiments (cont'd)

The ruling out of alternative explanations is

related to a falsificationist logic (Popper
Many confirming observations are not sufficient
to prove an hypothesis
One single disconfirming observation is sufficient
to falsify an hypothesis
Scientists should try to falsify the conclusions
they wish to draw

Natural Experiments

Study of a natural setting that appears to assign a

treatment in a reasonably random manner
Often treatment is not manipulable (e.g.
Example (Card and Krueger 1993)
In 1992, New Jersey raised its state-mandated minimum wage from $4.25 to $5.05.
RQ: What was the effect of a minimum wage increase on the employment of low-skill teenagers in the fast food industry?

Natural Experiments (cont'd)

Example (Card and Krueger 1993)

Treatment group: fast food workers in NJ
Control group: fast food workers in neighboring
Pennsylvania, which did not increase its minimum wage
NJ and PA are similar states
Teenagers' families' decisions to live in one or the other are very unlikely to be correlated with NJ's decision to raise its minimum wage in 1992
Two sources of counterfactual evidence:
Treated vs. non-treated (NJ vs. Pennsylvania)
Before vs. after treatment (before vs. after the minimum wage increase)
(Card and Krueger) vs. (Neumark and Wascher)
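
The two sources of counterfactual evidence can be combined in a difference-in-differences calculation. This is a generic sketch of that arithmetic, not the authors' estimation procedure. The function name `diff_in_diff` and the input numbers are hypothetical and are not figures from the study.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical average employment per store (made-up numbers).
effect = diff_in_diff(
    treated_before=10.0, treated_after=12.0,   # treated state
    control_before=10.0, control_after=11.0,   # comparison state
)
print(effect)  # 1.0
```

The control group's change approximates what would have happened to the treated group without the treatment, which rests on the assumption that both groups would otherwise have followed similar trends.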

Correlational Studies

Synonyms: non-experimental or passive-observational designs
Counterfactual inference is difficult due to the lack of
Pretests and
Control groups
Causal claims are particularly problematic when
We don't know all alternative plausible explanations
Alternative explanations cannot be measured
Statistical models are not well-specified

Confounds and Spurious Correlations

Source: http://tylervigen.com/spurious-correlations, last accessed on Sept. 6, 2016
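
A confound can produce a correlation between two variables that have no effect on each other. The sketch below simulates one such case under assumed conditions: a hypothetical third variable `z` drives both `x` and `y`. The data are simulated and the helper `correlation` is written out by hand for illustration.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# z is the confound: it influences both x and y; x and y never touch each other.
z = [rng.gauss(0, 1) for _ in range(5000)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_raw = correlation(x, y)

# Removing the part of x and y that is due to z makes the association vanish.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_adjusted = correlation(x_resid, y_resid)

print(r_raw, r_adjusted)
```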

Formal Statistical Inference

The process of drawing conclusions about a population based on sample data
Practical questions
How much uncertainty is associated with
sample data?
Do my results constitute strong evidence or
just a lucky draw/chance finding?
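
One way to ask whether a result is "just a lucky draw" is a permutation test. The sketch below is one possible approach, chosen here for illustration, on simulated data with an assumed true difference of 2. Group labels are shuffled many times to see how often chance alone produces a difference as large as the observed one.

```python
import random

rng = random.Random(3)  # fixed seed for reproducibility


def mean(values):
    return sum(values) / len(values)


# Simulated samples; the groups truly differ by 2 in this toy example.
group_a = [rng.gauss(0.0, 1.0) for _ in range(50)]
group_b = [rng.gauss(2.0, 1.0) for _ in range(50)]
observed_diff = mean(group_b) - mean(group_a)

# Shuffle the pooled data and recompute the difference under "no effect".
pooled = group_a + group_b
n_permutations = 2000
at_least_as_extreme = 0
for _ in range(n_permutations):
    rng.shuffle(pooled)
    shuffled_diff = mean(pooled[50:]) - mean(pooled[:50])
    if abs(shuffled_diff) >= abs(observed_diff):
        at_least_as_extreme += 1

# Share of shuffles that match or beat the observed difference.
p_value = at_least_as_extreme / n_permutations
print(observed_diff, p_value)
```

A small share suggests the observed difference is unlikely to be a chance finding. A large share means the data are compatible with no difference.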


Validity is the approximate truth of an inference

Validity is a property of inferences, not a property of designs
We will study
Four types of validity
Threats to each type of validity
Possible remedies

Four Types of Validity (SKC definitions)

Statistical Conclusion Validity: The validity of inferences

about the correlation (covariation) between treatment X and
outcome Y
Internal Validity: The validity of inferences about whether
observed covariation between X (the presumed treatment)
and Y (the presumed outcome) reflects a causal relationship
from X to Y as those variables were manipulated or measured
Construct Validity: The validity of inferences about the
higher order constructs that represent sampling particulars
External Validity: The validity of inferences about whether
the cause-effect relationship holds over variation in persons,
settings, treatment variables, and measurement variables
Source: Shadish et al. 2002, p. 38
Four Types of Validity (More intuitive)

Statistical Conclusion Validity: Is the use of statistics appropriate to infer whether X and Y covary?
Internal Validity: Does observed covariation between X and Y result from a causal relationship?
Construct Validity: Are we actually measuring the concepts that we want to measure?
External Validity: Does the causal relationship between X and Y hold over varied persons, treatments, outcome measures, and settings?

Random Sampling

Simple random sampling is the basic

sampling technique where we select a group
of subjects (a sample) for study from a larger
group (a population)
Each individual is chosen by chance and each
member of the population has an equal chance
of being included in the sample
Every possible sample of a given size has the
same chance of selection
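
Simple random sampling can be sketched with Python's standard library. The population below is a hypothetical list of ID numbers. `random.Random.sample` draws without replacement, so every subset of the requested size is equally likely.

```python
import random

rng = random.Random(2024)  # fixed seed for reproducibility

# Hypothetical population: 10,000 individuals identified by ID.
population = list(range(1, 10001))

# Draw a simple random sample of 100 distinct individuals.
sample = rng.sample(population, 100)

print(len(sample), sample[:5])
```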

Random Assignment

An aspect of an experimental design in which study participants are assigned to the treatment or control group using a random process
Random assignment creates two or more groups of units that are probabilistically similar to each other on the average
Hence, any outcome differences between groups
are very likely due to treatment

Random Sampling vs. Random Assignment


                     RANDOM ASSIGNMENT        NO RANDOM ASSIGNMENT

RANDOM SAMPLING      High Internal Validity   Low Internal Validity    Generalizability to Population
                     High External Validity   High External Validity

NO RANDOM SAMPLING   High Internal Validity   Low Internal Validity    No Generalizability to Population
                     Low External Validity    Low External Validity

                     Causation                Correlation

Main Types of Data

Cross-sectional data: multiple cases (such as

individuals, firms, countries, or regions) are
observed at the same point of time, or without
regard to differences in time
Panel data (longitudinal data or cross-sectional time series data): multiple cases (people, firms, countries, etc.) are observed at two or more points in time
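
The difference between the two data types shows up in how the records are laid out. The sketch below uses made-up firm records and a hypothetical helper, `periods_per_case`, to count how many time points each case is observed at.

```python
# Cross-sectional data: each case appears once, at a single point in time.
cross_section = [
    {"firm": "A", "year": 2020, "employees": 12},
    {"firm": "B", "year": 2020, "employees": 30},
]

# Panel data: the same cases are observed at two or more points in time.
panel = [
    {"firm": "A", "year": 2020, "employees": 12},
    {"firm": "A", "year": 2021, "employees": 15},
    {"firm": "B", "year": 2020, "employees": 30},
    {"firm": "B", "year": 2021, "employees": 28},
]


def periods_per_case(rows):
    """Count the distinct time points at which each case is observed."""
    periods = {}
    for row in rows:
        periods.setdefault(row["firm"], set()).add(row["year"])
    return {firm: len(years) for firm, years in periods.items()}


print(periods_per_case(cross_section))  # {'A': 1, 'B': 1}
print(periods_per_case(panel))          # {'A': 2, 'B': 2}
```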

Variables, Attributes, Values




Levels of Measurement

Nominal: no ordering is implied

e.g., party affiliation, gender, race, university major
Ordinal: the attributes can be rank-ordered but distances
between attributes do not have any meaning
e.g., pain measurement scale: no pain, mild, moderate, severe, worst pain
Interval: distance between attributes does have meaning but
there is no meaningful zero
e.g., temperature in Celsius: the difference between a temperature of 80°C and 70°C is the same difference as between 50°C and 40°C; however, 0°C does not mean no heat
Ratio: has all the properties of an interval variable, and also
has a clear definition of zero, i.e., when the variable equals
zero there is none of that variable
e.g., height, weight, number of children in a family, temperature in Kelvin
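
The interval versus ratio distinction can be checked numerically with the temperature example. The sketch below uses the standard Celsius to Kelvin conversion. The helper name is an illustrative choice. Differences survive the change of scale, but ratios are only meaningful on the scale with a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return celsius + 273.15


# Differences are meaningful on an interval scale and agree across scales.
diff_celsius = 80 - 70
diff_kelvin = celsius_to_kelvin(80) - celsius_to_kelvin(70)

# Ratios are not: 80 degrees C is not "twice as hot" as 40 degrees C.
ratio_celsius = 80 / 40
ratio_kelvin = celsius_to_kelvin(80) / celsius_to_kelvin(40)

print(diff_celsius, diff_kelvin)
print(ratio_celsius, ratio_kelvin)
```

The Celsius ratio is 2.0 while the Kelvin ratio is roughly 1.13, which shows that the apparent doubling is an artifact of where the Celsius zero happens to sit.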


Observations made on the units

Setting of the experiment