
Sampling With and Without Replacement and the Hypergeometric and Binomial Distributions

Jonathan D. Cryer

Suppose that a population consists of N members. Each member is either a success (a red marble or a vote for candidate A or ...) or a failure. Suppose there are r successes, or a proportion of successes p = r/N, in the population. To make inferences about r or p we take a simple random sample of size n without replacement. Interest centers here on the distribution of the number of successes, Y, in the sample. We will consider two different ways of carrying out the sampling. Note that sampling without replacement is appropriate since it is inefficient to ever collect redundant data from the same member of the population.

Suppose we sample by randomly scooping out a subset of size n, ensuring that all subsets of size n have an equal chance of being selected. With this method there is no first member drawn, no second member, etc. (With this sampling method it may also be argued that each member of the population has an equal chance of being selected. However, each member being equally likely is not sufficient to ensure that all subsets of size n are equally likely.) Since all subsets of size n are equally likely, we have the Hypergeometric Distribution for Y:

    Pr(Y = k) = \binom{r}{k} \binom{N - r}{n - k} \Big/ \binom{N}{n}, \quad \max(0,\, n + r - N) \le k \le \min(r, n)

Alternatively, we could sample sequentially without replacement. Let Y_i be the number of successes on the ith draw (trial). The Y_i are binary valued since on each draw we get either a success (Y_i = 1) or we don't (Y_i = 0). The Y_i are not independent, so the trials are not Bernoulli trials. Clearly Pr(Y_1 = 1) = r/N = p, the proportion of successes in the population. Now consider Y_2. We have

    Pr(Y_2 = 1 \mid Y_1 = 1) = \frac{r - 1}{N - 1}

    Pr(Y_2 = 1 \mid Y_1 = 0) = \frac{r}{N - 1}

According to the Law of Total Probability, Pr(A) = Pr(A \mid B)\,Pr(B) + Pr(A \mid B^c)\,Pr(B^c). So

    Pr(Y_2 = 1) = Pr(Y_2 = 1 \mid Y_1 = 1)\,Pr(Y_1 = 1) + Pr(Y_2 = 1 \mid Y_1 = 0)\,Pr(Y_1 = 0)
                = \frac{r - 1}{N - 1} \cdot \frac{r}{N} + \frac{r}{N - 1} \cdot \frac{N - r}{N}
                = \frac{r(N - 1)}{(N - 1)N}
                = \frac{r}{N} = p
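As a numerical check (not part of the original note), the hypergeometric probabilities above can be computed directly from binomial coefficients. The values of N, r, and n below are arbitrary illustrative choices.

```python
from math import comb

def hypergeom_pmf(k, N, r, n):
    """Pr(Y = k): probability of k successes in a sample of size n
    drawn without replacement from N members, r of them successes."""
    lo, hi = max(0, n + r - N), min(r, n)
    if not lo <= k <= hi:
        return 0.0
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

N, r, n = 20, 8, 5          # illustrative values
probs = [hypergeom_pmf(k, N, r, n) for k in range(n + 1)]
print(probs)
print(sum(probs))           # the probabilities sum to 1
```

Any k outside the support max(0, n + r − N) ≤ k ≤ min(r, n) correctly gets probability zero.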

By induction we can show that Pr(Y_i = 1) = r/N = p for all trials i. The (unconditional) success probability is the same for all trials, just as in Bernoulli trials! These are not Bernoulli trials, however, since the trials are dependent. (Note that if we were sampling with replacement, the trials would be Bernoulli trials and Y would have a Binomial Distribution with n trials and success probability p = r/N.)

The distribution of Y is the same whether we sample randomly by scooping out a subset of members or sample one-at-a-time without replacement. Thinking one-at-a-time permits us to calculate the mean and variance of Y fairly easily. Compare the following calculation with finding these moments directly from the Hypergeometric probability function. First, E(Y_i) = 0(1 − p) + 1·p = p = r/N for all trials. Thus, since Y = Y_1 + Y_2 + ... + Y_n, we have

    E(Y) = E(Y_1) + E(Y_2) + \cdots + E(Y_n) = np = nr/N

just like the Binomial distribution. Finding Var(Y) is more difficult since the Y's are not independent. We have

    Var(Y) = Var\!\left( \sum_{i=1}^{n} Y_i \right) = \sum_{i=1}^{n} Var(Y_i) + 2 \sum_{i < j} Cov(Y_i, Y_j)
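The claim that every draw has the same unconditional success probability p = r/N can be verified by brute force for a small population: enumerate all N! equally likely orderings of sequential draws and count how often a success lands in each position. (A sketch; N = 5 and r = 2 are arbitrary small values.)

```python
from fractions import Fraction
from itertools import permutations

N, r = 5, 2                       # small illustrative population
pop = [1] * r + [0] * (N - r)     # 1 = success, 0 = failure

# All N! orderings of the members, each equally likely.
orders = list(permutations(range(N)))
for i in range(N):
    hits = sum(pop[perm[i]] for perm in orders)
    prob = Fraction(hits, len(orders))
    print(f"Pr(Y_{i+1} = 1) =", prob)   # r/N = 2/5 on every draw
```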

Now, since the Y's are binary and all have the same success probability p, Var(Y_i) = p(1 − p) for all i. In general, writing μ_i = E(Y_i),

    Cov(Y_i, Y_j) = E[(Y_i - \mu_i)(Y_j - \mu_j)] = E(Y_i Y_j) - \mu_i \mu_j

To evaluate E(Y_i Y_j), we note that in this case Y_i Y_j must be either 0 or 1 since the Y's are binary. Thus

    E(Y_i Y_j) = 1 \cdot Pr(Y_i Y_j = 1) = Pr(Y_i = 1, Y_j = 1) = Pr(Y_j = 1 \mid Y_i = 1)\,Pr(Y_i = 1) = \frac{r - 1}{N - 1} \cdot \frac{r}{N}

and

    Cov(Y_i, Y_j) = \frac{r - 1}{N - 1} \cdot \frac{r}{N} - \left( \frac{r}{N} \right)^2
                  = \frac{r[N(r - 1) - r(N - 1)]}{(N - 1)N^2}
                  = -\frac{r(N - r)}{(N - 1)N^2}
                  = -\frac{p(1 - p)}{N - 1}

(A negative covariance, and hence correlation, makes sense since a success on one trial tends to reduce slightly the conditional probability of a success on another trial.)
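The covariance result Cov(Y_i, Y_j) = −p(1 − p)/(N − 1) can likewise be checked exactly by enumeration, using exact rational arithmetic so no rounding intrudes. (A sketch; N = 6, r = 2, and the two pairs of draw positions are arbitrary choices.)

```python
from fractions import Fraction
from itertools import permutations

N, r = 6, 2                       # illustrative values
pop = [1] * r + [0] * (N - r)     # 1 = success, 0 = failure
orders = list(permutations(range(N)))
M = len(orders)

p = Fraction(r, N)
for (i, j) in [(0, 1), (1, 4)]:   # two arbitrary pairs of draw positions
    # E(YiYj) = Pr(success at both positions i and j)
    e_yy = Fraction(sum(pop[perm[i]] * pop[perm[j]] for perm in orders), M)
    cov = e_yy - p * p
    print(cov, -p * (1 - p) / (N - 1))   # both print -2/45
```

By symmetry the covariance is the same for every pair of distinct draws, which is what the enumeration confirms.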

Finally, there are n identical variances and n(n − 1)/2 identical covariances to add to obtain Var(Y). Putting this all together we have

    Var(Y) = np(1 - p) + 2 \cdot \frac{n(n - 1)}{2} \cdot \left( -\frac{p(1 - p)}{N - 1} \right) = np(1 - p)\left( 1 - \frac{n - 1}{N - 1} \right)

Note that np(1 − p) is the variance of the corresponding Binomial distribution. The factor \sqrt{1 - (n - 1)/(N - 1)}, which multiplies the standard deviation of the Binomial to obtain the standard deviation of the corresponding Hypergeometric, is called the finite population correction factor, or fpc. (Caution: some authors call 1 − (n − 1)/(N − 1) the fpc.) Note that if, as is usually the case, the sample size is much smaller than the population size, n ≪ N, the fpc will be close to 1, so that the standard deviation from a Binomial distribution gives an excellent approximation to the correct standard deviation of the Hypergeometric. A guideline given in many textbooks is that the sample size should not exceed 10% of the population size. Since \sqrt{1 - 0.10} ≈ 0.95, the fpc can be neglected in calculating the standard deviation in this situation. In fact, when n ≪ N, the Binomial distribution gives an excellent approximation to the correct Hypergeometric distribution. Very often we go one step further. If n is large but still n ≪ N, we use the Central Limit Theorem to approximate the distribution of Y with a normal distribution with mean np and variance np(1 − p). Equivalently, we approximate the distribution of the sample proportion, \hat{p} = Y/n, by a normal distribution with mean p and variance p(1 − p)/n.
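To tie the pieces together, the closed-form mean and variance can be checked against the moments computed directly from the hypergeometric probability function, along with the fpc relationship between the two standard deviations. (A sketch; N = 50, r = 20, n = 10 are arbitrary illustrative values.)

```python
from math import comb, sqrt

N, r, n = 50, 20, 10              # illustrative values
p = r / N

# Hypergeometric pmf over its support.
pmf = {k: comb(r, k) * comb(N - r, n - k) / comb(N, n)
       for k in range(max(0, n + r - N), min(r, n) + 1)}

mean = sum(k * pk for k, pk in pmf.items())
var = sum((k - mean) ** 2 * pk for k, pk in pmf.items())

formula = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
fpc = sqrt(1 - (n - 1) / (N - 1))

print(mean, n * p)                              # both equal np (up to rounding)
print(var, formula)                             # direct and closed-form variance agree
print(sqrt(var), sqrt(n * p * (1 - p)) * fpc)   # binomial sd times the fpc
```

Here n/N = 0.20, so the fpc is noticeably below 1; shrinking n relative to N pushes it toward 1, illustrating why the Binomial approximation works when n ≪ N.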
