
Serik Sagitov, Chalmers Tekniska Högskola, December 21, 2012

Chapter 7. Survey sampling


1 Random sampling
Population = set of elements $\{1, 2, \ldots, N\}$ labeled by values $\{x_1, x_2, \ldots, x_N\}$.
PD = population distribution of x-values. A single value of a random element: $X \sim \mathrm{PD}$.
Types of x-values (data): continuous, discrete, categorical, dichotomous (2 categories).
General population parameters: population mean $\mu = E(X)$, population standard deviation $\sigma = \sqrt{\mathrm{Var}(X)}$, population proportion $p$ (dichotomous data).

Two methods of studying PD and population parameters:
enumeration: expensive, sometimes impossible,
random sample: $n$ random observations $(X_1, \ldots, X_n)$.
Randomisation is a guard against the investigator's biases, even unconscious ones.

IID sample, sampling with replacement: Independent Identically Distributed observations.
Simple random sample, sampling without replacement: negative dependence
$$\mathrm{Cov}(X_i, X_j) = -\frac{\sigma^2}{N-1}, \quad i \ne j.$$
Proof: for $n = N$ the sum $X_1 + \ldots + X_N$ is constant, so its variance is zero. Use the addition rule of variance: $0 = N\sigma^2 + N(N-1)\,\mathrm{Cov}(X_1, X_2)$.

Example. Students' heights: height in cm = discrete data, gender = dichotomous data.
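The negative covariance under sampling without replacement can be checked exactly on a small population. The sketch below is only an illustration: the five population values are hypothetical, and exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical population of N = 5 values (illustrative only).
population = [2, 3, 5, 7, 11]
N = len(population)

# Population mean and variance (variance divides by N, as in the notes).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# Under sampling without replacement every ordered pair (X1, X2)
# of distinct elements is equally likely.
pairs = list(permutations(population, 2))
cov = sum((x - mu) * (y - mu) for x, y in pairs) / len(pairs)

print("Cov(X1, X2)      =", cov)
print("-sigma^2 / (N-1) =", -sigma2 / (N - 1))
```

Both printed values agree, as the addition rule of variance predicts.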

Point estimates

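Population parameter $\theta$ estimation uses a point estimate $\hat\theta = \hat\theta(X_1, \ldots, X_n)$.
Sampling distribution of $\hat\theta$ around unknown $\theta$: different values of $\hat\theta$ are observed for different samples.
Mean square error
$$E(\hat\theta - \theta)^2 = (E(\hat\theta) - \theta)^2 + \sigma_{\hat\theta}^2,$$
$E(\hat\theta) - \theta$ = systematic error, bias, lack of accuracy; $\sigma_{\hat\theta}$ = random error, lack of precision.

Desired properties of point estimates:
$\hat\theta$ is an unbiased estimate of $\theta$, if $E(\hat\theta) = \theta$,
$\hat\theta$ is consistent, if $E(\hat\theta - \theta)^2 \to 0$ as $n \to \infty$.

Sample mean $\bar X = \frac{X_1 + \ldots + X_n}{n}$ is an unbiased and consistent estimate of $\mu$,
$$\mathrm{Var}(\bar X) = \begin{cases} \dfrac{\sigma^2}{n} & \text{if IID sample,} \\[2mm] \dfrac{\sigma^2}{n}\left(1 - \dfrac{n-1}{N-1}\right) & \text{if simple random sample.} \end{cases}$$

Finite population correction $1 - \frac{n-1}{N-1}$ can be neglected if the sample proportion $\frac{n}{N}$ is small.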
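As a sanity check of the finite population correction, the variance of the sample mean can be computed by enumerating every simple random sample from a small population. This is a sketch under assumptions: the population values and the sample size $n = 3$ are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population and sample size (illustrative only).
population = [2, 3, 5, 7, 11]
N, n = len(population), 3

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# Under simple random sampling every subset of size n is equally likely.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

# Formula from the notes: sigma^2 / n * (1 - (n - 1) / (N - 1)).
formula = sigma2 / n * (1 - Fraction(n - 1, N - 1))

print("E(sample mean)   =", mean_of_means, " mu      =", mu)
print("Var(sample mean) =", var_of_means, " formula =", formula)
```

The enumeration reproduces both the unbiasedness of the sample mean and the corrected variance exactly.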

Dichotomous data: $P(X_i = 1) = p$, $P(X_i = 0) = q$, $\mu = p$, $\sigma^2 = pq$, population proportion $p$. Sample proportion $\hat p = \bar X$ is an unbiased and consistent estimate of $p$.

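Sample variance $s^2 = \frac{1}{n-1} \sum (X_i - \bar X)^2$, where $s$ is the sample standard deviation.
Other formulae:
$s^2 = \frac{n}{n-1} \left(\overline{X^2} - \bar X^2\right)$, where $\overline{X^2} = \frac{1}{n}(X_1^2 + \ldots + X_n^2)$,
dichotomous data case: $s^2 = \frac{n}{n-1}\, \hat p \hat q$.

Sample variance is an unbiased estimate of $\sigma^2$ for an IID sample; for a simple random sample its mean is $\sigma^2 \frac{N}{N-1}$:
$$E(s^2) = \begin{cases} \sigma^2 & \text{if IID sample,} \\[1mm] \sigma^2 \dfrac{N}{N-1} & \text{if simple random sample.} \end{cases}$$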
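The two expected values of $s^2$ can be verified by brute force on a small population: all ordered samples with replacement for the IID case, and all subsets for the simple random sample. The population values, the sample size and the helper name `sample_variance` below are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

def sample_variance(xs):
    """Sample variance s^2 with divisor n - 1, in exact arithmetic."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Hypothetical population and sample size (illustrative only).
population = [2, 3, 5, 7, 11]
N, n = len(population), 3

mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# IID sampling: all N**n ordered samples with replacement are equally likely.
iid_values = [sample_variance(s) for s in product(population, repeat=n)]
e_s2_iid = sum(iid_values) / len(iid_values)

# Simple random sampling: all subsets of size n are equally likely.
srs_values = [sample_variance(s) for s in combinations(population, n)]
e_s2_srs = sum(srs_values) / len(srs_values)

print("IID: E(s^2) =", e_s2_iid, " sigma^2           =", sigma2)
print("SRS: E(s^2) =", e_s2_srs, " sigma^2 * N/(N-1) =", sigma2 * Fraction(N, N - 1))
```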


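Standard errors of $\bar X$ and $\hat p$ for simple random sample:
$$s_{\bar X} = \frac{s}{\sqrt n} \sqrt{1 - \frac{n}{N}}, \qquad s_{\hat p} = \sqrt{\frac{\hat p \hat q}{n-1}} \sqrt{1 - \frac{n}{N}}.$$

Standard errors for IID sampling:
$$s_{\bar X} = \frac{s}{\sqrt n}, \qquad s_{\hat p} = \sqrt{\frac{\hat p \hat q}{n-1}}.$$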
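A minimal sketch of these standard errors in code, assuming hypothetical function names `se_mean` and `se_proportion` and a made-up 0/1 data set. Passing a population size `N` switches on the factor $\sqrt{1 - n/N}$ for a simple random sample; leaving it out gives the IID version.

```python
import math

def se_mean(sample, N=None):
    """Standard error of the sample mean; N enables the finite population factor."""
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    se = math.sqrt(s2 / n)
    if N is not None:
        se *= math.sqrt(1 - n / N)
    return se

def se_proportion(p_hat, n, N=None):
    """Standard error of the sample proportion, sqrt(p q / (n - 1))."""
    se = math.sqrt(p_hat * (1 - p_hat) / (n - 1))
    if N is not None:
        se *= math.sqrt(1 - n / N)
    return se

# Hypothetical dichotomous sample of size 10 from a population of 40.
data = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
p_hat = sum(data) / len(data)

print("IID:", se_mean(data), se_proportion(p_hat, len(data)))
print("SRS:", se_mean(data, N=40), se_proportion(p_hat, len(data), N=40))
```

For 0/1 data the two functions return the same number, which reflects the identity $s^2 = \frac{n}{n-1}\hat p \hat q$.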

Confidence intervals

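Approximate sampling distribution $\bar X \overset{a}{\sim} N(\mu, \frac{\sigma^2}{n})$:
$$P(\bar X - z s_{\bar X} < \mu < \bar X + z s_{\bar X}) = P\left(-z < \frac{\bar X - \mu}{s_{\bar X}} < z\right) \approx 1 - 2(1 - \Phi(z)).$$
Approximate $100(1-\alpha)\%$ two-sided CI for $\mu$ and $p$:
$\bar X \pm z_{\alpha/2}\, s_{\bar X}$ and $\hat p \pm z_{\alpha/2}\, s_{\hat p}$, if $n$ is large.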

$100(1-\alpha)\%$   68%    80%    90%    95%    99%    99.7%
$z_{\alpha/2}$      1.00   1.28   1.64   1.96   2.58   3.00

The higher the confidence level, the wider the CI; the larger the sample, the narrower the CI.
A 95% CI is a random interval: out of 100 intervals computed for 100 samples, $\mathrm{Bin}(100, 0.95) \approx N(95, (2.18)^2)$ will cover the true value.
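The critical values and the interval itself are easy to compute with the Python standard library. The sketch below assumes the hypothetical helper names `z_half_alpha` and `confidence_interval`; `NormalDist().inv_cdf` is the standard normal quantile function.

```python
import math
from statistics import NormalDist

def z_half_alpha(level):
    """Critical value z_{alpha/2} for a two-sided interval at the given level."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(estimate, std_error, level=0.95):
    """Approximate two-sided CI: estimate +/- z_{alpha/2} * standard error."""
    z = z_half_alpha(level)
    return estimate - z * std_error, estimate + z * std_error

for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_half_alpha(level):.2f}")

# Standard deviation of the number of covering intervals, Bin(100, 0.95).
coverage_sd = math.sqrt(100 * 0.95 * 0.05)
print("sd of Bin(100, 0.95):", round(coverage_sd, 2))
```

The 68% and 99.7% entries of the table appear to be the usual rule-of-thumb values for one and three standard deviations; the exact quantiles for those two levels are slightly below 1 and 3.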

Stratified random sampling

Population consists of $L$ strata with $L$ known strata fractions $W_1 + \ldots + W_L = 1$ and unknown strata means $\mu_l$ and standard deviations $\sigma_l$.
Population mean $\mu = W_1 \mu_1 + \ldots + W_L \mu_L$,
population variance $\sigma^2 = \overline{\sigma^2} + \sum W_l (\mu_l - \mu)^2$,
average variance $\overline{\sigma^2} = W_1 \sigma_1^2 + \ldots + W_L \sigma_L^2$,
average standard deviation $\bar\sigma = W_1 \sigma_1 + \ldots + W_L \sigma_L$.

Stratified random sampling: take $L$ independent samples, one from each stratum, with sample means $\bar X_1, \ldots, \bar X_L$.
Stratified sample mean: $\bar X_s = W_1 \bar X_1 + \ldots + W_L \bar X_L$.
$\bar X_s$ is an unbiased and consistent estimate of $\mu$:
$$E(\bar X_s) = W_1 E(\bar X_1) + \ldots + W_L E(\bar X_L) = \mu.$$
Sample variance $s_{\bar X_s}^2 = (W_1 s_{\bar X_1})^2 + \ldots + (W_L s_{\bar X_L})^2$.
Approximate CI for $\mu$: $\bar X_s \pm z_{\alpha/2}\, s_{\bar X_s}$.
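The stratified estimate, its standard error and the approximate CI can be sketched as follows. The function name `stratified_estimate` and the two small samples are hypothetical, and the within-stratum standard errors are taken in their IID form $s_l / \sqrt{n_l}$, which is an assumption of this sketch.

```python
import math
from statistics import NormalDist, mean, variance

def stratified_estimate(samples, weights, level=0.95):
    """Return the stratified mean, its standard error and an approximate CI.

    samples: one list of observations per stratum
    weights: known strata fractions W_l, summing to 1
    """
    xs_bar = sum(w * mean(s) for w, s in zip(weights, samples))
    # (W_l * s_{X_l})^2 with s_{X_l}^2 = s_l^2 / n_l (IID within each stratum).
    var = sum(w ** 2 * variance(s) / len(s) for w, s in zip(weights, samples))
    se = math.sqrt(var)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return xs_bar, se, (xs_bar - z * se, xs_bar + z * se)

# Hypothetical heights (cm) in two strata with W1 = W2 = 0.5.
samples = [[160, 165, 170], [175, 180, 185, 190]]
weights = [0.5, 0.5]

xs_bar, se, ci = stratified_estimate(samples, weights)
print("stratified mean:", xs_bar)
print("standard error: ", se)
print("95% CI:         ", ci)
```

With samples this small the normal approximation is not trustworthy; the numbers only illustrate how the formulas combine.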

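Pooled sample mean $\bar X_p = \frac{1}{n}(n_1 \bar X_1 + \ldots + n_L \bar X_L)$, pooled sample size $n = n_1 + \ldots + n_L$.
$$E(\bar X_p) = \frac{n_1}{n} \mu_1 + \ldots + \frac{n_L}{n} \mu_L = \mu + \sum \left(\frac{n_l}{n} - W_l\right) \mu_l,$$
$$\mathrm{bias}(\bar X_p) = \sum \left(\frac{n_l}{n} - W_l\right) \mu_l.$$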

Example. Students' heights: $L = 2$, $W_1 = W_2 = 0.5$, compare $\bar X_s$ with $\bar X_p$.
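To make the comparison concrete, the expected values of the two estimates can be computed for made-up numbers. Everything numeric below is hypothetical: the stratum means of 167 cm and 181 cm and the unequal stratum sample sizes 30 and 10 are chosen only to show how the pooled mean becomes biased when $n_l / n \ne W_l$.

```python
# Hypothetical strata: W1 = W2 = 0.5 as in the example, invented means (cm)
# and deliberately unbalanced stratum sample sizes.
W = [0.5, 0.5]
mu_l = [167.0, 181.0]
n_l = [30, 10]

n = sum(n_l)
mu = sum(w * m for w, m in zip(W, mu_l))

# The stratified mean is unbiased, so its expectation is mu.
expected_stratified = mu

# The pooled mean weights each stratum by n_l / n instead of W_l.
expected_pooled = sum(k / n * m for k, m in zip(n_l, mu_l))

# Bias formula: sum over strata of (n_l / n - W_l) * mu_l.
bias_formula = sum((k / n - w) * m for k, w, m in zip(n_l, W, mu_l))

print("population mean:", mu)
print("E(pooled mean): ", expected_pooled)
print("bias:           ", expected_pooled - mu, "=", bias_formula)
```

With proportional stratum sizes, for example 20 and 20 here, the bias term vanishes and the pooled mean coincides with the stratified mean.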


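Optimal allocation: $n_l = n \frac{W_l \sigma_l}{\bar\sigma}$, $\mathrm{Var}(\bar X_{so}) = \frac{1}{n}\, \bar\sigma^2$.
$\bar X_{so}$ minimizes the standard error of $\bar X_s$. Weakness: $\sigma_l$ and $\bar\sigma$ are usually unknown.
Proportional allocation: $n_l = n W_l$, $\mathrm{Var}(\bar X_{sp}) = \frac{1}{n}\, \overline{\sigma^2}$.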

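Compare three unbiased estimates of $\mu$: $\mathrm{Var}(\bar X_{so}) \le \mathrm{Var}(\bar X_{sp}) \le \mathrm{Var}(\bar X)$.
Variability in $\sigma_l$ across strata:
$$\mathrm{Var}(\bar X_{sp}) - \mathrm{Var}(\bar X_{so}) = \frac{1}{n}\left(\overline{\sigma^2} - \bar\sigma^2\right) = \frac{1}{n} \sum W_l (\sigma_l - \bar\sigma)^2.$$
Variability in means $\mu_l$ across strata:
$$\mathrm{Var}(\bar X) - \mathrm{Var}(\bar X_{sp}) = \frac{1}{n}\left(\sigma^2 - \overline{\sigma^2}\right) = \frac{1}{n} \sum W_l (\mu_l - \mu)^2.$$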
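The ordering of the three variances and the two decompositions can be checked numerically. The strata fractions, means, standard deviations and total sample size below are hypothetical, sampling is treated as IID within strata (no finite population correction), and the optimal $n_l$ are left as non-integers for simplicity.

```python
import math

# Hypothetical strata parameters (illustrative only).
W = [0.5, 0.3, 0.2]
mu_l = [10.0, 20.0, 40.0]
sigma_l = [2.0, 5.0, 9.0]
n = 100

mu = sum(w * m for w, m in zip(W, mu_l))
avg_var = sum(w * s ** 2 for w, s in zip(W, sigma_l))   # average variance
avg_sd = sum(w * s for w, s in zip(W, sigma_l))         # average standard deviation
sigma2 = avg_var + sum(w * (m - mu) ** 2 for w, m in zip(W, mu_l))

# Allocations (not rounded to integers in this sketch).
n_opt = [n * w * s / avg_sd for w, s in zip(W, sigma_l)]
n_prop = [n * w for w in W]

def var_stratified(n_alloc):
    """Var of the stratified mean: sum of W_l^2 * sigma_l^2 / n_l."""
    return sum(w ** 2 * s ** 2 / k for w, s, k in zip(W, sigma_l, n_alloc))

var_opt = var_stratified(n_opt)     # should equal avg_sd**2 / n
var_prop = var_stratified(n_prop)   # should equal avg_var / n
var_simple = sigma2 / n             # plain sample mean, IID sample

print("optimal:     ", var_opt)
print("proportional:", var_prop)
print("simple:      ", var_simple)
```

The gap between proportional and optimal allocation grows with the spread of the $\sigma_l$, while the gap between the simple sample mean and proportional allocation grows with the spread of the $\mu_l$.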
