
Basic Quantitative Tools (Prof. Campbell)
last updated: Friday, March 17, 2008

(INFERENTIAL STATISTICS -- inferring from a sample to a population using the laws of probability)

How confident can one be that the sample mean (or proportion) represents the population as a whole?
    confidence interval (mean) -- data needed: one interval variable
    inverse: given a specific confidence interval, what is the needed sample size?
    confidence interval (proportion) -- data needed: one nominal variable

Do differences found in a sample (a subset of the population) reflect differences in the population as a whole? (commonly used to generalize from survey results)
    chi-square -- data needed: two categorical variables
    difference of means -- data needed: an interval variable divided into two categories
    difference of proportions -- data needed: a nominal variable divided into two categories
    ANOVA (Analysis of Variance) -- data needed: an interval variable divided into three or more categories

What is the relationship between two variables?
    correlation analysis (including an example of ecological fallacy) -- data needed: two interval variables

How many total jobs are dependent on basic (export-based) jobs?
    Multiplier -- data needed: number of basic (export) jobs, number of total jobs (export + locally serving)

What is the relative concentration of local employment by sector?
    Location Quotients -- data needed: employment (total and by sector) for both the locality and the nation

How can we estimate interaction (e.g., trade, traffic) between two cities?
    Gravity Model -- data needed: population of two cities, distance, constant

How do we measure growth over time?
    Growth Rates (3 types) -- data needed: population levels over time

How do we compare costs and benefits (e.g., of a project) over time?
    Cost-benefit analysis -- data needed: quantified costs and benefits for each year, discount rate
251437191.xls.ms_office

Overview

11/17/2014 6:52 AM

calculate a confidence interval (with interval data)

that is, how confident are you that your sample estimate comes close to the population mean?
one interval variable

enter data in yellow cells

Data needed:
sample mean (X)
std dev of sample (s)
sample size (n)
value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

$\mu = \bar{X} \pm t_{.025} \, \frac{s}{\sqrt{n}}$

Data
Case    Hhd Income
1       24,000
2       36,000
3       12,000
4       74,000
5       46,000
6       27,000
7       23,000
8       69,000
9       107,000
10      53,000
11      29,000
12      34,000
13      43,000
14      28,000
15      24,000
16      43,000

MEAN     42,000
STDEV    24,105
n        16
t        2.131

set the confidence level (2-tail): 0.05

SO:
u = 42,000 +/- 12,845

lower end of confidence interval    29,155
upper end of confidence interval    54,845
range                               25,690

[Chart: Confidence Interval, horizontal axis 20,000 to 100,000]
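The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the workbook's own method: the function name `confidence_interval` is hypothetical, and the t-score is passed in by hand (2.131, as shown on the sheet) because the Python standard library has no t distribution.

```python
import math
import statistics


def confidence_interval(values, t_crit):
    """Return (mean, margin, lower, upper) for mean +/- t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)        # sample std dev (n - 1 in the denominator)
    margin = t_crit * s / math.sqrt(n)  # the +/- part of the interval
    return mean, margin, mean - margin, mean + margin


# The 16 household incomes listed on the sheet.
incomes = [24000, 36000, 12000, 74000, 46000, 27000, 23000, 69000,
           107000, 53000, 29000, 34000, 43000, 28000, 24000, 43000]

# 2.131 is the two-tail .05 t-score for 15 degrees of freedom, copied from the sheet.
mean, margin, lower, upper = confidence_interval(incomes, t_crit=2.131)
print(f"{mean:,.0f} +/- {margin:,.0f}  ({lower:,.0f} to {upper:,.0f})")
```

Because the t-score is rounded to three decimals here, the margin comes out a few units below the sheet's 12,845.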

calculate a confidence interval

that is, how confident are you that your sample estimate comes close to the population mean?
one interval variable
Here we will skip using the raw data and instead calculate with the summary data (mean, std dev., n)

enter data in yellow cells

Data needed:
sample mean (X)
std dev of sample (s)
sample size (n)
value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

$\mu = \bar{X} \pm t_{.025} \, \frac{s}{\sqrt{n}}$

MEAN     42,000
STDEV    5,000
n        384
t        1.966

set the confidence level (2-tail): 0.05

SO:
u = 42,000 +/- 502

lower end of confidence interval    41,498
upper end of confidence interval    42,502
range                               1,003

[Chart: Confidence Interval, horizontal axis 20,000 to 100,000]

calculate a minimum sample size needed to achieve a specific confidence interval range

that is, how confident are you that your sample estimate comes close to the population mean?
one interval variable
Here we will skip using the raw data and instead calculate with the summary data (mean, std dev., c)

enter data in yellow cells

Data needed:
sample mean (X)
std dev of sample (s)
c (confidence interval range, i.e., the +/- amount)
value of t-score for .025 (two-tail test) -- from t-table or let Excel calculate

MEAN                             42,000
STDEV                            5,000
c (confidence interval range)    500
t                                1.960

set the confidence level (2-tail): 0.05

here is the formula to calculate a confidence interval:

$\mu = \bar{X} \pm t_{.025} \, \frac{s}{\sqrt{n}}$

The +/- term is the confidence interval range c, so:

$\sqrt{n} = \frac{t_{.025} \, s}{c}$

solving for n (sample size) leads to this equation (so, to estimate sample size, you need to know Stdev, the confidence interval, and the value of t):

$n = \left( \frac{t_{.025} \, s}{c} \right)^2$

given values of stdev and c and confidence level, we calculate "n":

sample size needed    384

SO:
u = 42,000 +/- 500

lower end of confidence interval    41,500
upper end of confidence interval    42,500
range                               1,000

NOTES:
1. For the value of "t", we simply assumed a large sample size (t --> Z), e.g., for 95% confidence interval (2-tailed), t = 1.96.
2. We are also assuming a large population size (M), so that N/M --> 0.

[Chart: Confidence Interval, horizontal axis 20,000 to 100,000]
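As a quick check of the sample-size equation, here is a small Python sketch. The function name `required_sample_size` is made up for illustration; the default t = 1.96 is the large-sample assumption stated in the notes.

```python
def required_sample_size(s, c, t=1.96):
    """n = (t * s / c) ** 2, with s the std dev and c the +/- range wanted."""
    return (t * s / c) ** 2


# Values from the sheet: std dev 5,000 and a +/- range of 500.
n = required_sample_size(s=5000, c=500)
print(round(n))
```

The raw result is slightly above 384; the sheet reports it rounded to 384, while a cautious analyst might round up to the next whole case instead.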

calculate a confidence interval using proportions (nominal data)

for large n
one nominal variable (proportions)

enter data in yellow cells

Data needed:
sample proportion (P)
sample size (n)

the population proportion is

$p = P \pm 1.96 \sqrt{\frac{P(1-P)}{n}}$

set the confidence level (2-tail): 0.05

P    50%
n    100
t    1.984

SO:
p = 0.500 +/- 0.099

                                              in percent
+/-                                 0.099     9.9%
lower end of confidence interval    0.401     40.1%
upper end of confidence interval    0.599     59.9%
range                               0.198     19.8%

[Chart: Confidence Interval, horizontal axis 0% to 100%]
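The proportion interval can be reproduced with the sketch below. The function name `proportion_interval` is hypothetical. The formula on the sheet uses 1.96, but the displayed result of 0.099 matches the t-score of 1.984 that also appears there, so the example passes 1.984 explicitly; treating that number as the critical value actually used is an assumption.

```python
import math


def proportion_interval(p, n, crit=1.96):
    """Return (lower, upper, margin) for p +/- crit * sqrt(p * (1 - p) / n)."""
    margin = crit * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin, margin


# P = 50%, n = 100, critical value 1.984 as displayed on the sheet.
lower, upper, margin = proportion_interval(p=0.50, n=100, crit=1.984)
print(f"{lower:.3f} to {upper:.3f} (+/- {margin:.3f})")
```

With the default 1.96 the margin would be 0.098 rather than 0.099, which is why the critical value is left as a parameter.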

Chi-Square

does the distribution of outcomes (observed) significantly differ from a random distribution (expected)?

CHI-SQUARE TEST (EXCEL: FUNCTION)

enter data in yellow cells

ACTUAL (OBSERVED)
          city    suburb    rural    total
strong    2       1         1        4
medium    1       2         1        4
weak      1       1         2        4
total     4       4         4        12

PREDICTED/EXPECTED (based on multiplying row and column totals)
          city      suburb    rural     total
strong    1.3333    1.3333    1.3333    4
medium    1.3333    1.3333    1.3333    4
weak      1.3333    1.3333    1.3333    4
total     4         4         4         12

Chi-square test (Calculated by Excel): "CHITEST"    ###
(probability of this sample outcome if no difference in population)
range: 0 to 1

Difference between predicted and actual
          city       suburb     rural      total
strong    -0.6667    0.3333     0.3333     0
medium    0.3333     -0.6667    0.3333     0
weak      0.3333     0.3333     -0.6667    0
total     0          0          0          0

Other values shown in the data-entry cells of the sheet:
20, 16, 12
8, 12, 16
20, 12, 4

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

$f_o$ = observed frequencies
$f_e$ = expected frequencies
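The chi-square formula can be worked through in Python as follows. This is an illustrative sketch, not a replacement for Excel's CHITEST: the function names are hypothetical, and the p-value helper relies on the closed-form tail probability of the chi-square distribution that holds only when the degrees of freedom are even (true here, with 4).

```python
import math


def chi_square(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected frequency
            stat += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def p_value_even_df(stat, df):
    """Upper-tail probability; valid only for an even number of degrees of freedom."""
    half = stat / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# Observed table from the sheet (rows: strong, medium, weak; columns: city, suburb, rural).
observed = [[2, 1, 1],
            [1, 2, 1],
            [1, 1, 2]]
stat, df = chi_square(observed)
p = p_value_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```

For this small table the probability is far above 0.05, so the observed counts would not be judged different from the expected ones.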

The t distribution is used for hypothesis testing with small samples (e.g., smaller than about 100 cases).
The t distribution is similar to the Z distribution, but is "flatter" because of the smaller sample size.
When the sample size gets large (e.g., over 50-100), the t distribution approaches that of the Z distribution (a normal curve).

Probabilities and t-scores for various degrees of freedom (two-tail test)
(probability of this outcome if no difference in population)

d.f.       5        10       50       1000
tails      2        2        2        2

t-score
0.0        1.000    1.000    1.000    1.000
0.1        0.924    0.922    0.921    0.920
0.2        0.849    0.845    0.842    0.842
0.3        0.776    0.770    0.765    0.764
0.4        0.706    0.698    0.691    0.689
0.5        0.638    0.628    0.619    0.617
0.6        0.575    0.562    0.551    0.549
0.7        0.515    0.500    0.487    0.484
0.8        0.460    0.442    0.427    0.424
0.9        0.409    0.389    0.372    0.368
1.0        0.363    0.341    0.322    0.318
1.1        0.321    0.297    0.277    0.272
1.2        0.284    0.258    0.236    0.230
1.3        0.250    0.223    0.200    0.194
1.4        0.220    0.192    0.168    0.162
1.5        0.194    0.165    0.140    0.134
1.6        0.170    0.141    0.116    0.110
1.7        0.150    0.120    0.095    0.089
1.8        0.132    0.102    0.078    0.072
1.9        0.116    0.087    0.063    0.058
2.0        0.102    0.073    0.051    0.046
2.1        0.090    0.062    0.041    0.036
2.2        0.079    0.052    0.032    0.028
2.3        0.070    0.044    0.026    0.022
2.4        0.062    0.037    0.020    0.017
2.5        0.054    0.031    0.016    0.013
2.6        0.048    0.026    0.012    0.009
2.7        0.043    0.022    0.009    0.007
2.8        0.038    0.019    0.007    0.005
2.9        0.034    0.016    0.006    0.004
3.0        0.030    0.013    0.004    0.003

[Chart: probability of this outcome if no difference in population (vertical axis, 0.000 to 1.000) vs. standardized sample differences (t-scores) (horizontal axis, 0 to 3), one curve each for 5, 10, 50 and 1000 degrees of freedom]

the 0.05 level is by convention used as the threshold of statistical significance (though sometimes we use an even more strict level, such as 0.01 or even 0.001)

the larger the sample size, the lower the value of the critical t ...
when the sample size gets large (e.g., over 50 - 100), then the critical t level (.05, 2 tail) approaches 1.96
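The large-sample limit described above can be checked with the normal curve alone. The sketch below is only that limiting case: it uses `statistics.NormalDist` (Python 3.8+) in place of a t distribution, so it approximates the d.f. = 1000 column and is not suitable for small samples. The function name is hypothetical.

```python
from statistics import NormalDist


def two_tail_p_large_sample(t):
    """Two-tail probability of a standardized difference under the normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


for t in (1.0, 1.96, 2.0, 3.0):
    print(t, round(two_tail_p_large_sample(t), 3))
```

At t = 1.96 the probability is 0.05, which is why 1.96 serves as the critical value once the sample is large.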

difference of means

Small Standard Deviation

            Male Income    Female Income
Factor      50             45
Case
1           69,000         49,000
2           77,000         67,000
3           46,000         69,000
4           59,000         64,000
5           55,000         30,000
6           50,000         68,000
7           38,000         73,000
8           63,000         61,000
9           50,000         61,000
10          56,000         48,000
11          74,000         72,000
12          50,000         57,000
Mean        57,250         59,917
Std Dev.    11,702         12,428

[Chart: Male and Female incomes with the mean of each group, horizontal axis 20,000 to 100,000]

Larger Standard Deviation

            Male Income    Female Income
Factor      80             75
Case
1           40,000         72,000
2           42,000         34,000
3           83,000         65,000
4           100,000        34,000
5           100,000        86,000
6           86,000         86,000
7           104,000        67,000
8           70,000         64,000
9           37,000         79,000
10          62,000         78,000
11          88,000         85,000
12          72,000         83,000
Mean        73,667         69,417
Std Dev.    24,092         18,372

[Chart: Male and Female incomes with the mean of each group, horizontal axis 20,000 to 100,000]

t-Test: Two-Sample Assuming Equal Variances (Small Standard Deviation data)

                                Male Income    Female Income
Mean                            57250          59916.66667
Variance                        136931818.2    154446969.7
Observations                    12             12
Pooled Variance                 145689393.9
Hypothesized Mean Difference    0
df                              22
t Stat                          -0.541
P(T<=t) one-tail                0.297
t Critical one-tail             1.717
P(T<=t) two-tail                0.594
t Critical two-tail             2.074

fail to reject H0

t-Test: Two-Sample Assuming Equal Variances (Larger Standard Deviation data)

                                Male Income    Female Income
Mean                            73666.6667     69416.6667
Variance                        580424242      337537879
Observations                    12             12
Pooled Variance                 458981061
Hypothesized Mean Difference    0
df                              22
t Stat                          0.486
P(T<=t) one-tail                0.316
t Critical one-tail             1.717
P(T<=t) two-tail                0.632
t Critical two-tail             2.074

fail to reject H0

difference of means

Data Needed:
number of cases for each of the two groups
sample means for the two groups
standard deviation for each group

Hypothesis (no difference between the two population means):

$\mu_1 = \mu_2$, i.e., $\mu_1 - \mu_2 = 0$

How to calculate t (note: EXCEL will do this all for you -- so you don't really need to use this formula)

$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{X}_1 - \bar{X}_2}}$

SINCE WE HYPOTHESIZE U1=U2, OR U1 -U2 = 0, then the (u1-u2) drops out of the numerator of the equation for t

$t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$

the formula for the standard error (the denominator of the equation for t)

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}$

if we can assume the same standard deviation of the populations ("equal variance")

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sigma \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$
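For readers who want to see the equal-variance calculation outside Excel, here is a short Python sketch. The function name `pooled_t` is made up; the pooled variance is the usual weighted average of the two sample variances, which is an assumption about how the common standard deviation is estimated but does reproduce the Excel output for the Small Standard Deviation data.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    # standard error of the difference: sigma * sqrt((N1 + N2) / (N1 * N2))
    se = math.sqrt(pooled * (n1 + n2) / (n1 * n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, n1 + n2 - 2


# Small Standard Deviation data from the sheet.
male = [69000, 77000, 46000, 59000, 55000, 50000,
        38000, 63000, 50000, 56000, 74000, 50000]
female = [49000, 67000, 69000, 64000, 30000, 68000,
          73000, 61000, 61000, 48000, 72000, 57000]

t, df = pooled_t(male, female)
print(f"t = {t:.3f}, df = {df}")
```

Since |t| is well below the critical value of 2.074, the result agrees with the sheet's "fail to reject H0".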

difference of means

Bigger DOM

            Male Income    Female Income
Factor      80             70
Case
1           37,000         55,000
2           105,000        40,000
3           52,000         82,000
4           58,000         97,000
5           107,000        56,000
6           105,000        44,000
7           96,000         39,000
8           86,000         70,000
9           82,000         77,000
10          104,000        79,000
11          71,000         44,000
12          100,000        47,000
Mean        83,583         60,833
Std Dev.    23,922         19,441

[Chart: Male and Female incomes with the mean of each group, horizontal axis 20,000 to 120,000]

t-Test: Two-Sample Assuming Equal Variances

                                Male Income    Female Income
Mean                            83583.3333     60833.3333
Variance                        572265152      377969697
Observations                    12             12
Pooled Variance                 475117424
Hypothesized Mean Difference    0
df                              22
t Stat                          2.557
P(T<=t) one-tail                0.009
t Critical one-tail             1.717
P(T<=t) two-tail                0.018
t Critical two-tail             2.074

reject H0

Diff of Proportions

A special case of the difference of means test

Do you Own a Car? 1= yes, 0=no

Case          City Residents    Suburban Residents
1             0                 0
2             0                 0
3             0                 1
4             0                 1
5             0                 1
6             0                 1
7             0                 1
8             1                 1
9             1                 1
10            1                 1
11            1                 1
12            1                 1
13            1                 1
14            1                 1
15            1                 1
16            1                 1
17            1                 1
18            1                 1
19            1                 1
20            1                 1
21            1                 1
22            1                 1
Mean          68.2%             90.9%
n of cases    22                22
degrees of freedom (n1 +n2-2)   42

[Chart: Percent of Residents Who Own a Car, City Residents vs. Suburban Residents, vertical axis 0.0% to 100.0%]

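for simplicity and conservatism, we could have also assumed that the population proportions are 50% and 50%

t-score
Numerator (difference of the two sample proportions):    -22.7%
pu (pooled proportion for both groups combined):         0.79545
sqrt(pu*qu), where qu = 1 - pu:                          0.40337
denominator = sqrt(pu*qu) * sqrt((n1+n2)/(n1*n2)):       0.12162

t-score    -1.86871
Prob-t     0.068649

see Blalock, p. 234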

The central question: does the difference found in the sample reflect an actual difference among the entire population? [the research hypothesis]

Alternative: the difference is due merely to random sample variation, and that there is no difference in the population as a whole. [the null hypothesis]

Remember: generally if |t| > 2 (i.e., if t < -2 or t > 2), then it is "statistically significant" at the .05 level. That is, there is less than a 5% chance that one could get this difference in the sample drawn from a population where there is no difference between city and suburban residents.

Diff of Proportions (2)

Here, if given just the mean, n of cases

                                 Group 1    Group 2
Mean                             10.0%      20.0%
n of cases                       150        120
degrees of freedom (n1 +n2-2)    268

t-score
Numerator:       -10.0%
pu               0.14444
sqrt(pu*qu)      0.35154
denominator=     0.04305

t-score    -2.3226
Prob-t     0.0209

see Blalock, p. 234

Note that as the mean values deviate from 50%, we can be more accurate:
e.g., compare 10% to 20%, vs. 40% to 50% or 80% to 90%
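The steps behind the t-score (numerator, pu, denominator) can be sketched in Python as below. The function name `proportions_t` is hypothetical, and the formulas are inferred from the intermediate values shown on the sheet, which they reproduce; the source itself only points to Blalock, p. 234.

```python
import math


def proportions_t(p1, n1, p2, n2):
    """Return (t, df) for a difference of two sample proportions."""
    pu = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled proportion
    qu = 1 - pu
    denom = math.sqrt(pu * qu) * math.sqrt((n1 + n2) / (n1 * n2))
    return (p1 - p2) / denom, n1 + n2 - 2


# Summary-data example: 10% of 150 cases vs. 20% of 120 cases.
t, df = proportions_t(0.10, 150, 0.20, 120)
print(f"t = {t:.4f}, df = {df}")

# Car-ownership example: 15 of 22 city residents vs. 20 of 22 suburban residents.
t_cars, df_cars = proportions_t(15 / 22, 22, 20 / 22, 22)
print(f"t = {t_cars:.5f}, df = {df_cars}")
```

The first t-score exceeds 2 in absolute value and the second does not, matching the Prob-t values of 0.0209 and 0.068649.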

ANOVA

AUTO MILES DRIVEN PER WEEK

Case    City Residents    Suburban Residents    Rural Residents
1       20                50                    40
2       0                 80                    50
3       50                90                    60
4       100               350                   70
5       70                240                   80
6       35                120                   90
7       12                90                    100
8       150               80                    100
9       120               70                    20
10      0                 60                    30
11      18                90                    40
12      35                111                   50
13      42                122                   60
14      67                133                   250
15      95                144                   170
16      66                155                   120
17      77                96                    150
18      123               23                    170
19      0                 65                    180
20      18                24                    111
21      24                17                    130
22      75                85                    75
Mean    54.4              104.3                 97.5

[Chart: AUTO MILES DRIVEN PER WEEK, sorted values for urban, suburban and rural residents with the mean of each group]
AUTO MILES DRIVEN PER WEEK

Anova: Single Factor

SUMMARY
Groups                Count    Sum     Average       Variance
City Residents        22       1397    63.5          2672.64286
Suburban Residents    22       2295    104.318182    5471.65584
Rural Residents       22       2146    97.5454545    3391.11688

ANOVA
Source of Variation    SS            df    MS            F             P-value       F crit
Between Groups         21054.6364    2     10527.3182    2.73782547    0.07241657    3.14280868
Within Groups          242243.727    63    3845.13853
Total                  263298.364    65

Why use ANOVA?

Use it in situations where you are comparing the means from more than two groups.
In a difference of means test, you compare x2-x1. For more than two groups, you can't compare x3-x2-x1, so you look at the variation (sum of squares) within vs. between groups.
Intuitively, sample groups with low internal variation, but high variation across groups, will likely represent real differences in the population as a whole, while sample groups with high internal variation and low variation across groups have a greater chance of representing populations with no real differences.
Anova: Single Factor

SUMMARY
Groups                Count    Sum     Average       Variance
City Residents        22       1197    54.4090909    1890.82468
Suburban Residents    22       2295    104.318182    5471.65584
Rural Residents       22       2146    97.5454545    3391.11688

ANOVA
Source of Variation    SS            df    MS            F             P-value       F crit
Between Groups         32248.5758    2     16124.2879    4.49829595    0.01492455    3.14280868
Within Groups          225825.545    63    3584.53247
Total                  258074.121    65

ANOVA

[Chart: AUTO MILES DRIVEN PER WEEK, sorted values for Rural Residents with the group mean of 97.5, horizontal axis 0 to 400]

$F = \frac{SS_{between} / d.f._{between}}{SS_{within} / d.f._{within}}$

SSbetween = sum of squares between the groups
SSwithin = sum of squares within the groups
d.f. = degrees of freedom
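The F ratio can be computed directly from the raw mileage data. The following Python sketch uses a hypothetical function name, `one_way_anova`, and returns only the sums of squares and F; the P-value and F crit shown in the Excel output need an F distribution, which the standard library does not provide.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, F) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f


# Auto miles driven per week, as listed in the data table (22 cases per group).
city = [20, 0, 50, 100, 70, 35, 12, 150, 120, 0, 18,
        35, 42, 67, 95, 66, 77, 123, 0, 18, 24, 75]
suburban = [50, 80, 90, 350, 240, 120, 90, 80, 70, 60, 90,
            111, 122, 133, 144, 155, 96, 23, 65, 24, 17, 85]
rural = [40, 50, 60, 70, 80, 90, 100, 100, 20, 30, 40,
         50, 60, 250, 170, 120, 150, 170, 180, 111, 130, 75]

ss_between, ss_within, f = one_way_anova([city, suburban, rural])
print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}, F = {f:.3f}")
```

These data correspond to the Excel output in which the City Residents sum is 1197; F is above the listed F crit of 3.14, so the group means would be judged different at the 0.05 level.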

correlation (range: -1 < r < +1)

Case    x      y
1       0      0
2       0.1    0.1
3       0.2    0.2
4       0.3    0.3
5       0.4    0.3
6       0.5    0.4
7       0.6    0.5
8       0.7    0.6
9       0.8    0.7
10      0.8    0.8
11      0.9    0.9
12      1      0.9
correlation    +0.99
F              ###
p value        0.00

[Chart: scatterplot of y vs. x, both axes 0 to 1]

Case    x      y
1       0      1
2       0.1    0.9
3       0.2    0.8
4       0.3    0.7
5       0.4    0.6
6       0.5    0.5
7       0.6    0.3
8       0.7    0.4
9       0.8    0.2
10      0.8    0.2
11      0.9    0.1
12      1      0
correlation    -0.99
F              ###
p value        0.00

[Chart: scatterplot of y vs. x, both axes 0 to 1]

Case    x      y
1       0.2    0.5
2       0.3    0.4
3       0.4    0.6
4       0.4    0.4
5       0.5    0.8
6       0.5    0.3
7       0.5    0.5
8       0.5    0.6
9       0.6    0.3
10      0.6    0.8
11      0.7    0.2
12      0.8    0.5
correlation    -0.09
F              0.1
p value        0.79

[Chart: scatterplot of y vs. x, both axes 0 to 1]

Case    x      y
1       0      0
2       0.1    0.2
3       0.2    0.4
4       0.3    0.6
5       0.4    0.8
6       0.5    1
7       0.6    1
8       0.7    0.8
9       0.8    0.6
10      0.8    0.4
11      0.9    0.2
12      1      0
correlation    +0.07
F              0.0
p value        0.84

[Chart: scatterplot of y vs. x, both axes 0 to 1]

using a random number generator
Hit "recalculate now" to see a new set of numbers (note: on a MAC this is "COMMAND =")

Case    x         y
1       0.8634    0.3088
2       0.5109    0.2311
3       0.691     0.9983
4       0.95      0.1494
5       0.2146    0.7479
6       0.7882    0.6823
7       0.5534    0.4609
8       0.5532    0.9613
9       0.4124    0.3507
10      0.3131    0.9722
11      0.1139    0.3687
12      0.1786    0.3059

correlation (range: -1 < r < +1)    -0.1
F                                   0.1
p-value                             0.745

P-Value: this is the probability that the x-y relationship found in the sample cases -- expressed as an r-value -- is simply due to random variation, and that if one looked at the population as a whole, there would be no relationship. If p<.05, we generally conclude that the relationship is statistically significant.

[Chart: scatterplot of y vs. x, both axes 0 to 1]

Comparing values of correlation coefficient r (with a range of -1 to +1) from a sample of size n and the corresponding probability of its outcome if no relationship in the population as a whole

r         -0.95
F=        92.5641
sign F    2.3E-06
n         12

[Chart: Probability of this outcome (based on the F-test), vertical axis 0 to 1, vs. Value of r, horizontal axis -1 to +1; red line: critical value .05]

ABOVE the .05 line: relationship NOT statistically significant at the 0.05 level
BELOW the .05 line: relationship is statistically significant at the 0.05 level

Note: as r gets farther away from zero, both the strength of the relationship and the statistical significance increase.
Also: as the sample size (n) increases, the statistical significance increases.
As a result: If you want to demonstrate a statistically significant bivariate relationship, you will need either an r value that is far from zero and/or a large sample.
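To make the link between r, n and the F-test concrete, here is a small Python sketch. The function names are hypothetical. The F formula, F = r^2 (n - 2) / (1 - r^2), is the standard one for a two-variable relationship and is assumed here rather than quoted from the sheet, although it does reproduce the sheet's F of 92.5641 for r = -0.95 and n = 12.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def f_statistic(r, n):
    """F for testing r against zero, with 1 and n - 2 degrees of freedom."""
    return r ** 2 * (n - 2) / (1 - r ** 2)


# One of the example data sets from the sheet: y rises and then falls as x increases.
x = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.8, 0.9, 1]
y = [0, 0.2, 0.4, 0.6, 0.8, 1, 1, 0.8, 0.6, 0.4, 0.2, 0]

r = pearson_r(x, y)
print(f"r = {r:+.2f}, F = {f_statistic(r, len(x)):.1f}")
print(f"F for r = -0.95, n = 12: {f_statistic(-0.95, 12):.4f}")
```

The rise-and-fall data set shows that an r near zero means no linear relationship, not necessarily no relationship at all.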

"The ecological fallacy consists in thinking that relationships observed for groups necessarily hold for individuals..."
source: "Ecological Inference and the Ecological Fallacy", David A. Freedman (Department of Statistics, University of California, Berkeley). Prepared for the International Encyclopedia of the Social & Behavioral Sciences, Technical Report No. 549, 15 October 1999. pdf file accessed Jan. 13, 2002, http://www.stanford.edu/class/ed260/freedman549.pdf

This example:
Is there a relationship between use of public transit and hhd income?
Aggregate data (unit of analysis: city): positive relationship
Individual data (unit of analysis: hhd): negative relationship
DANGER: making an ecological fallacy -- using aggregate data to

INDIVIDUAL DATA

case    city    percent of hhd trips using public transit    annual hhd income
1       A       64%                                          $57,748
2       A       61%                                          $58,767
3       A       52%                                          $61,738
4       A       58%                                          $59,687
5       A       69%                                          $55,707
6       A       67%                                          $56,473
7       A       75%                                          $53,775
8       A       57%                                          $60,074
9       A       81%                                          $51,761
10      A       67%                                          $56,494
11      B       66%                                          $51,894
12      B       57%                                          $55,087
13      B       45%                                          $59,280
14      B       50%                                          $57,581
15      B       53%                                          $56,602
16      B       53%                                          $56,337
17      B       68%                                          $51,028
18      B       60%                                          $53,913
19      B       63%                                          $52,819
20      B       67%                                          $51,662
21      C       46%                                          $53,907
22      C       43%                                          $54,781
23      C       57%                                          $49,885
24      C       58%                                          $49,842
25      C       38%                                          $56,742
26      C       51%                                          $52,198
27      C       39%                                          $56,205
28      C       40%                                          $56,016
29      C       35%                                          $57,779
30      C       42%                                          $55,416
31      D       48%                                          $51,254
32      D       20%                                          $60,928
33      D       52%                                          $49,790
34      D       46%                                          $51,994
35      D       38%                                          $54,799
36      D       24%                                          $59,676
37      D       27%                                          $58,654
38      D       26%                                          $58,771
39      D       50%                                          $50,643
40      D       30%                                          $57,515
41      E       11%                                          $59,173
42      E       10%                                          $59,437
43      E       14%                                          $58,202
44      E       10%                                          $59,436
45      E       11%                                          $59,076
46      E       41%                                          $48,778
47      E       22%                                          $55,147
48      E       29%                                          $52,822
49      E       20%                                          $55,841
50      E       19%                                          $56,426
51      F       3%                                           $58,852
52      F       6%                                           $57,908
53      F       17%                                          $53,971
54      F       30%                                          $49,420
55      F       1%                                           $59,539
56      F       3%                                           $59,072
57      F       13%                                          $55,532
58      F       20%                                          $53,016
59      F       10%                                          $56,452
60      F       8%                                           $57,257

correlation    -0.43

Invisible helper columns on the sheet (one value per city):
city    city transit factor (range 0 to 1)    hhd income factor
A       0.5                                   80000
B       0.4                                   75000
C       0.3                                   70000
D       0.2                                   68000
E       0.1                                   63000
F       0                                     60000

[Chart: Scatterplot; unit of analysis: individual household -- one sees patterns in the individual data by cities. Vertical axis: annual hhd income, $40,000 to $65,000; horizontal axis: Percent of hhd trips by public transit, 0% to 90%]

AGGREGATED DATA

CITY    percent of hhd trips using public transit    hhd income
A       65%                                          $57,222
B       58%                                          $54,620
C       45%                                          $54,277
D       36%                                          $55,402
E       19%                                          $56,434
F       11%                                          $56,102

correlation    -0.12

[Chart: Scatterplot; unit of analysis: cities -- yes: cities with more transit have higher income, but... Vertical axis: mean annual hhd income, $54,000 to $57,500; horizontal axis: Percent of hhd trips by public transit, 0% to 70%]

Multiplier

Multiplier: the relationship between local and export employment

[Diagram labels: R.O.W. (rest of world); Twin Peaks; Timber; Revenues from Timber; Local Services]

Export Jobs (Basic) + Non-Export Jobs (NonBasic) = TOTAL JOBS

Imagine a simple economy of Twin Peaks, an isolated timber economy:

Service Jobs            2,000
Timber Jobs (export)    1,000
Total Jobs              3,000

Multiplier (Total Jobs / Timber Jobs)    3.0

So, can use a multiplier to estimate the impact of a change in basic employment (up or down) on total employment.
[assumes a simple, linear relationship]

Change in Basic Employment    Change in Total Employment
100                           300
500                           1500
1200                          3600
-100                          -300
-500                          -1500

Location Quotients

Location Quotient (LQ) - a measure of relative local employment concentration in a specific sector
Used to also estimate local vs. export (i.e., non-basic vs. basic) employment
(Can also use to help understand the level of industrial diversification in a local economy)

$LQ = \frac{e_i / e}{E_i / E}$

ei = local employment in sector i
e = total local employment
Ei = national employment in sector i
E = total national employment

EXAMPLE: You are given data for the town of Icarus in the far-away country of Daedalus.

                                   Icarus     Daedalus
Population                         20,000     2,500,000
Annual Gross Per Capita Income     $17,000    $25,000
Total Employment                   10,000     1,000,000
Agricultural Emp.                  1,000      50,000
Govt Employment                    300        100,000
Private Service Emp.               4,000      500,000
Airplane Manufacturing Emp.        700        10,000
Non-airplane Manufacturing Emp.    1,000      200,000
All Other Employment               3,000      140,000

Based upon this data, which sectors of the Icarus economy likely are exporting goods or services outside the community?
Estimate the share of each sector's employment that could be due to exports
(and explain how you did these estimates and the name of the technique(s) you used).
Finally, explain why these estimates may not be accurate.

                                   Percent of Total Employment    LOCATION QUOTIENT:
                                   Icarus     Daedalus            Ratio of Local to National Percentages
Total Employment                   100%       100%
Agricultural Emp.                  10%        5%                  2.00 export industry
Govt Employment                    3%         10%                 0.30
Private Service Emp.               40%        50%                 0.80
Airplane Manufacturing Emp.        7%         1%                  7.00 export industry
Non-airplane Manufacturing Emp.    10%        20%                 0.50
All Other Employment               30%        14%                 2.14 export industry

Take the location quotients to estimate amount of export jobs (if any):
TOTAL JOBS = LOCAL JOBS + EXPORT JOBS

                                   Total Jobs    Local Jobs    Export Jobs
Agricultural Emp.                  1,000         500           500
Govt Employment                    300           300
Private Service Emp.               4,000         4,000
Airplane Manufacturing Emp.        700           100           600
Non-airplane Manufacturing Emp.    1,000         1,000
All Other Employment               3,000         1,400         1,600
TOTAL                              10,000        7,300         2,700

[Chart: Local Jobs and Export Jobs by sector, vertical axis 0 to 5,000]
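The Icarus example can be worked through in Python as follows. This is a sketch under one stated assumption: a sector's local-serving jobs are taken to be what Icarus would have if it matched the national share of employment, and anything above that is counted as export jobs. That rule reproduces the Local Jobs and Export Jobs columns on the sheet. The function and variable names are hypothetical.

```python
def location_quotients(local, national):
    """Return {sector: (LQ, estimated export jobs)} from employment by sector."""
    e = sum(local.values())     # total local employment
    E = sum(national.values())  # total national employment
    result = {}
    for sector, ei in local.items():
        Ei = national[sector]
        lq = (ei / e) / (Ei / E)
        local_serving = e * Ei / E  # jobs expected at the national share
        export = max(0.0, ei - local_serving)
        result[sector] = (lq, export)
    return result


icarus = {"Agricultural": 1000, "Govt": 300, "Private Service": 4000,
          "Airplane Manufacturing": 700, "Non-airplane Manufacturing": 1000,
          "All Other": 3000}
daedalus = {"Agricultural": 50000, "Govt": 100000, "Private Service": 500000,
            "Airplane Manufacturing": 10000, "Non-airplane Manufacturing": 200000,
            "All Other": 140000}

results = location_quotients(icarus, daedalus)
for sector, (lq, export) in results.items():
    print(f"{sector:28s} LQ = {lq:.2f}  export jobs = {export:,.0f}")
total_export = sum(export for _, export in results.values())
print(f"total export jobs = {total_export:,.0f}")
```

Sectors with an LQ above 1 are the ones flagged as export industries; the estimate is only as good as the assumption that local consumption patterns mirror the nation's.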

Gravity Model

Using Newton's Universal Law of Gravitation for social processes

$F = G \left( \frac{m_1 m_2}{r^2} \right)$

where F = force of gravity between m1 and m2
G is the universal constant
r is the distance between m1 and m2

To convert to society:
F becomes the interaction between m1 and m2 (e.g., traffic, trade, etc.)
m1 and m2 become population (or employment, or GDP, etc.)
r is distance
G is a constant

Example:
           Population            Distance    Interaction (e.g., car trips/day)
G          m1        m2          r           F
0.001      10,000    20,000      20          500
0.001      20,000    20,000      20          1,000
0.001      20,000    40,000      20          2,000
0.001      10,000    20,000      10          2,000
0.001      10,000    20,000      5           8,000
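A few lines of Python are enough to reproduce the example table. The function name `gravity_interaction` is hypothetical, and the constant G = 0.001 is simply the value used on the sheet.

```python
def gravity_interaction(m1, m2, r, g=0.001):
    """Interaction F = G * m1 * m2 / r**2 between two places."""
    return g * m1 * m2 / r ** 2


# Rows of the example table: (m1, m2, r).
examples = [(10000, 20000, 20), (20000, 20000, 20), (20000, 40000, 20),
            (10000, 20000, 10), (10000, 20000, 5)]
for m1, m2, r in examples:
    print(m1, m2, r, round(gravity_interaction(m1, m2, r)))
```

Halving the distance quadruples the predicted interaction, which is the main behavior the table is meant to show.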

3 Growth Rates

Three Growth Rates

Simple, Linear Growth (e.g., average annual growth)

$P_n = P_0 (1 + nr)$

Discrete Compounded Growth (e.g., annual)

$P_n = P_0 (1 + r)^n$

Compounded continuously (with exponent)
[almost the same results as discrete]

$P_n = P_0 e^{rn}$

where e = 2.7183....
remember that ln (e) = 1

A Comparison of these Three Growth Patterns

                        Linear    Discrete      Continuously
                                  Compounded    Compounded
Po     n      r         Pn        Pn            Pn
100    0      0.05      100       100.0         100.0
100    1      0.05      105       105.0         105.1
100    2      0.05      110       110.3         110.5
100    3      0.05      115       115.8         116.2
100    4      0.05      120       121.6         122.1
100    5      0.05      125       127.6         128.4
100    6      0.05      130       134.0         135.0
100    7      0.05      135       140.7         141.9
100    8      0.05      140       147.7         149.2
100    9      0.05      145       155.1         156.8
100    10     0.05      150       162.9         164.9
100    15     0.05      175       207.9         211.7
100    20     0.05      200       265.3         271.8
100    25     0.05      225       338.6         349.0
100    30     0.05      250       432.2         448.2
100    40     0.05      300       704.0         738.9
100    50     0.05      350       1146.7        1218.2
100    75     0.05      475       3883.3        4252.1
100    100    0.05      600       13150.1       14841.3

[Chart: Continuously Compounded, Discrete Compounded and Linear growth; vertical axis 0.0 to 16000.0, horizontal axis n from 0 to 120]
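The three formulas translate directly into Python. The function names below are hypothetical; the inputs (Po = 100, r = 0.05) are those of the comparison table.

```python
import math


def linear_growth(p0, r, n):
    """Simple, linear growth: Pn = P0 * (1 + n * r)."""
    return p0 * (1 + n * r)


def discrete_growth(p0, r, n):
    """Discrete compounded growth: Pn = P0 * (1 + r) ** n."""
    return p0 * (1 + r) ** n


def continuous_growth(p0, r, n):
    """Continuously compounded growth: Pn = P0 * e ** (r * n)."""
    return p0 * math.exp(r * n)


for n in (10, 50, 100):
    print(n,
          round(linear_growth(100, 0.05, n), 1),
          round(discrete_growth(100, 0.05, n), 1),
          round(continuous_growth(100, 0.05, n), 1))
```

Over short periods the three patterns stay close together; by n = 100 the compounded values are more than twenty times the linear one.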

Cost-benefit

Cost-Benefit Thinking
TWO CHALLENGES:
1. how to sum up all the costs and benefits.
2. How to deal with time: discounting. --->>> time preferences.

Present value (PV) = $\frac{B_t}{(1+r)^t}$
where B(t) is the benefit in year t, r is the discount rate.

Net Present Value:

$NPV = \sum_{t=0}^{n} \frac{B_t - C_t}{(1+r)^t}$

Bt = benefits in year t
Ct = costs in year t
t = year
NPV = net present value (benefits adjusted for cost)
r = discount rate (e.g., 6% per year or 0.06)

why is money worth less in the future?
1 people are impatient (and mortal)
2 opportunity cost of investing the capital elsewhere.
The argument for discounting is referred to as the 'marginal productivity of capital' argument, the use of the word 'marginal' indicating that it is the productivity of additional units of capital that is relevant. [99]

AND THE TRICK IS TO INCLUDE ENVIRONMENTAL COSTS AND BENEFITS. [99]
if $\sum_t \frac{B_t - C_t \pm E_t}{(1+r)^t} > 0$, then the project is a net good project.

The Problems with Discounting for the Environment
a way to shift heavy costs to future generations.
note: it is hard to shift capital costs to future generations, since lenders want paybacks. e.g., 30 year loans. but it is easier to shift non-monetary costs to the future, since the lenders are around to complain! they don't have a contractual agreement

1 actual damage may be far larger than the discounted value.
2 long-term benefits are also not strongly valued (even though today's actions are required for those 50 years from now to enjoy them). i.e., they should not be discounted like capital.
3 will lead to greater exhaustion of exhaustible resources, esp. with a high discount rate.

However: "There is, in fact, no unique relationship between high discount rates and environmental deterioration." [103]
How to select a discount rate: simply the rate of economic growth for a nation? the interest rate? [104]
Taking sustainability into account:
EX: "require that any environmental damage be compensated by projects specifically designed to improve the environment." [106]

note how the r can really change the outcome, especially if costs and benefits patterns vary over time. (see graph).

EXAMPLE
Compare front-loading and backloading costs and changing discount rates

                                                                          Net Benefit discounted    Cumulative Net
      Benefit    Cost         Net Benefit    discount rate                for present value         Present Value (NPV)
t     B(t)       C(t)         B(t) - C(t)    r        (1+r)^t             (B(t) - C(t)) / (1+r)^t
0     0          1,000,000    -1,000,000     0.02     1.00                -1,000,000                -1,000,000
1     100,000    100,000      0              0.02     1.02                0                         -1,000,000
2     110,000    100,000      10,000         0.02     1.04                9,612                     -990,388
3     120,000    100,000      20,000         0.02     1.06                18,846                    -971,542
4     130,000    100,000      30,000         0.02     1.08                27,715                    -943,827
5     140,000    100,000      40,000         0.02     1.10                36,229                    -907,597
6     150,000    100,000      50,000         0.02     1.13                44,399                    -863,199
7     160,000    100,000      60,000         0.02     1.15                52,234                    -810,965
8     170,000    100,000      70,000         0.02     1.17                59,744                    -751,221
9     180,000    100,000      80,000         0.02     1.20                66,940                    -684,280
10    190,000    100,000      90,000         0.02     1.22                73,831                    -610,449
11    200,000    100,000      100,000        0.02     1.24                80,426                    -530,023
12    210,000    100,000      110,000        0.02     1.27                86,734                    -443,288
13    220,000    100,000      120,000        0.02     1.29                92,764                    -350,525
14    230,000    100,000      130,000        0.02     1.32                98,524                    -252,001
15    240,000    100,000      140,000        0.02     1.35                104,022                   -147,979
16    250,000    100,000      150,000        0.02     1.37                109,267                   -38,712
17    260,000    100,000      160,000        0.02     1.40                114,266                   75,554
18    270,000    100,000      170,000        0.02     1.43                119,027                   194,581
19    280,000    100,000      180,000        0.02     1.46                123,558                   318,139
20    290,000    100,000      190,000        0.02     1.49                127,865                   446,003

[Chart: Benefit, Cost, Net Benefit and Cumulative Net Present Value (NPV) by Year; vertical axis -1,500,000 to 1,500,000]
the year when the green line crosses over the x axis (where y=0) is the year when the cumulative impact shifts from a net cost to a net benefit.
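The example table can be regenerated with a short Python sketch, which also makes it easy to try other discount rates. The function name `cumulative_npv` is hypothetical; the benefit and cost streams are those of the example (a 1,000,000 up-front cost, then annual costs of 100,000 and benefits rising by 10,000 a year from 100,000).

```python
def cumulative_npv(benefits, costs, r):
    """Return the running total of (B(t) - C(t)) / (1 + r) ** t for t = 0, 1, ..."""
    total = 0.0
    running = []
    for t, (b, c) in enumerate(zip(benefits, costs)):
        total += (b - c) / (1 + r) ** t
        running.append(total)
    return running


benefits = [0] + [100_000 + 10_000 * k for k in range(20)]  # years 0 to 20
costs = [1_000_000] + [100_000] * 20

series = cumulative_npv(benefits, costs, r=0.02)
break_even = next(t for t, value in enumerate(series) if value > 0)
print(f"cumulative NPV after year 20: {series[-1]:,.0f}")
print(f"first year with a positive cumulative NPV: {break_even}")
```

Re-running with a higher r (for example 0.06) shows how sensitive the break-even year is to the discount rate when costs come early and benefits come late.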

MEASURES OF INEQUALITY/DISPARITY: how to calculate a Gini Coefficient

gini    0.386
RANGE: 0 (PERFECT EQUALITY) to 1 (PERFECT INEQUALITY)
n       20

Insert income amounts for each of the 20 people here -- be sure to arrange from LOW to HIGH
Do NOT enter data in any of the other columns -- those are calculated.
Try entering both a fairly equal income distribution -- and then try a broadly unequal one.

Person    Income    calculated    calculated         calculated
"i"       X(i)      x(i)          CUMULATIVE x(i)    x(i)*i
1         1,000     0.003         0.003              0.00
2         3,000     0.009         0.013              0.02
3         4,000     0.013         0.025              0.04
4         5,000     0.016         0.041              0.06
5         6,000     0.019         0.059              0.09
6         8,000     0.025         0.084              0.15
7         8,000     0.025         0.109              0.18
8         9,000     0.028         0.138              0.23
9         11,000    0.034         0.172              0.31
10        12,000    0.038         0.209              0.38
11        14,000    0.044         0.253              0.48
12        17,000    0.053         0.306              0.64
13        19,000    0.059         0.366              0.77
14        21,000    0.066         0.431              0.92
15        23,000    0.072         0.503              1.08
16        27,000    0.084         0.588              1.35
17        29,000    0.091         0.678              1.54
18        32,000    0.100         0.778              1.80
19        33,000    0.103         0.881              1.96
20        38,000    0.119         1.000              2.38
SUM       320000    1                                14.36
mean                0.05

[Chart: GINI COEFFICIENT, CUMULATIVE X (vertical axis 0.000 to 1.000) for persons 1 to 20, with the LINE OF EQUALITY]
the LORENZ CURVE -- see how the curve deviates from the line of equality as the gini coefficient

source of formula and text: U.S. Census Bureau. The Changing Shape of the Nation's Income Distribution, 1947-1998, Current Population Reports, by Arthur F. Jones Jr. and Daniel H. Weinberg (Issued June 2000)
http://www.census.gov/prod/2000pubs/p60-204.pdf
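The Gini calculation can be sketched in Python as below. The sheet does not spell out its formula, so this example assumes the common rank-weighted form, G = 2 * sum(i * X(i)) / (n * sum(X)) - (n + 1) / n with incomes sorted from low to high; it reproduces the sheet's 0.386 from the same 20 incomes. The function name `gini` is hypothetical.

```python
def gini(incomes):
    """Gini coefficient via the rank-weighted formula (assumed, see lead-in)."""
    xs = sorted(incomes)  # the sheet asks for LOW to HIGH order
    n = len(xs)
    total = sum(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))  # sum of i * X(i)
    return 2 * weighted / (n * total) - (n + 1) / n


# The 20 incomes from the sheet.
incomes = [1000, 3000, 4000, 5000, 6000, 8000, 8000, 9000, 11000, 12000,
           14000, 17000, 19000, 21000, 23000, 27000, 29000, 32000, 33000, 38000]

print(round(gini(incomes), 3))
print(round(gini([5000] * 20), 3))  # a perfectly equal distribution
```

Trying a fairly equal list and then a broadly unequal one, as the sheet suggests, shows the coefficient moving from near 0 toward 1.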