
SAMPLING

Population, Sample, Variable, Parameter, Statistic

POPULATION
Population: the entire group of objects about which information is wanted. The population is defined in terms of what we want to generalize to. If we want information about all US college students, that is our population; if we are interested in all Davidson students, then that is the population. It is important to define the population of interest clearly. The actual list of population members from which the sample is drawn is called the sampling frame.

SAMPLE
Sample: a part or subset of the population used to gain information about the whole. The key is to draw a representative sample, so that we can legitimately generalize from the sample to the population of interest. Drawing a representative sample is not easy. The focus here is on samples of people, but we can just as well sample news stories, TV programs, countries, or whatever we are interested in.

VARIABLE
Variable: a measurable characteristic of the people, places, things, or events being sampled, e.g., their party identification (PID), gender, or ideology. We have separate names for the numbers that describe variables in samples and in populations.

PARAMETER
Parameter: a numerical characteristic of the population; a fixed number whose value we typically do not know, e.g., the number of Republicans who like the President.

The parameter is unknown; it is what you want to find out.

STATISTIC
KEY:
Statistic: a numerical characteristic of a sample, used to estimate the corresponding population parameter (e.g., a population mean or proportion).

The value of a statistic is obtained from the sample. KEY: statistics vary from sample to sample. There are many different ways of sampling.

SAMPLES VARY
The fact that samples vary is critical to statistical inference. Important point: the people who show up in a sample differ from sample to sample, so sample statistics vary from sample to sample. Whereas the parameter is unknown, the number of people in the sample with some characteristic is known. That is the statistic: an estimate of the parameter.
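A minimal simulation can make "samples vary" concrete. It assumes a made-up population of 10,000 people in which exactly 40% have some characteristic. Because we built the population ourselves, the parameter is known (0.40). The sketch draws five separate SRSs of 100 people and computes the sample proportion from each. The population, the seed, and the sample size are all hypothetical.

```python
import random

def srs_statistic(population, n, rng):
    """Draw an SRS of size n and return the fraction of 1s in it (the statistic)."""
    sample = rng.sample(population, n)  # sampling without replacement
    return sum(sample) / n

# Hypothetical population: 1 = has the characteristic, 0 = does not.
population = [1] * 4_000 + [0] * 6_000
parameter = sum(population) / len(population)  # known only because we built it

rng = random.Random(2024)  # fixed seed so the run is reproducible
sample_stats = [srs_statistic(population, 100, rng) for _ in range(5)]

print("parameter:", parameter)
print("statistics from five samples:", sample_stats)
```

The parameter stays fixed at 0.40. The five statistics scatter around it because each sample contains different people.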

SIMPLE RANDOM SAMPLE


The best is a simple random sample (SRS). An SRS of size n is chosen so that every set of n people in the population has the same chance of being the sample actually selected. Someone is or is not in the sample by the luck of the draw.
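With a computer, an SRS can be drawn from a sampling frame held as a list. Python's `random.sample` draws without replacement and gives every subset of size n the same chance of being selected. The sketch below assumes a hypothetical frame of 500 labelled students. The function name `simple_random_sample` is illustrative.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return an SRS of size n: every set of n units in `frame` is equally likely.

    random.sample draws without replacement, so no unit can appear twice.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame of 500 labelled students.
frame = [f"student_{i:03d}" for i in range(500)]
srs = simple_random_sample(frame, 10, seed=7)
print(srs)
```

Passing a seed makes a draw reproducible, which helps when documenting how a sample was chosen. Omit the seed for a fresh draw each time.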

USES OF SAMPLES
Public opinion polls, such as the Gallup Poll, the CBS/New York Times poll, and the National Election Studies (NES), are designed to measure public opinion and to explain voting behavior and policy preferences. National public opinion surveys typically interview 1,000 to 1,500 people. Market research aims to discover consumer preferences and product usage.

Nielsen Media Research develops ratings for TV programs, broken down by the type of person watching each program; the data are used to price commercials. It draws an SRS of about 4,000 households to represent the 100+ million households with TVs, using a people meter that records the TV programs each household watches.

Song royalties for radio play: ASCAP, an organization of music composers, collects royalties for song composers and performers by charging radio stations a fee to play music. ASCAP randomly samples and tapes about 60,000 hours of radio broadcasts from across the country.

The tapes/CDs are shipped to New York, where monitors count the number of plays. ASCAP pays composers and performers according to these counts from the samples.

ALTERNATIVES TO SRS
The decennial census is required by the U.S. Constitution. Population: about 100 million households. It is not a sample; it attempts to canvass the entire population for basic information.

PROBLEMS:
The population is too large: counting everyone is too expensive and too time-consuming to produce timely information.

Why do Republicans oppose Census Bureau sampling?


The 1990 Census was estimated to have missed 1.4% of the American population, including an estimated 5.9% of Blacks and 2% of Hispanics, mostly in inner cities.

COMMON PITFALLS OF SAMPLING:


Convenience sampling: looking at what's readily available (e.g., man-in-the-street interviews). A convenience sample is not representative. Problem: selection bias (self-selection). Not all units are equally likely to be sampled!

Example:
Based on a snowball sample of volunteer respondents, the Kinsey Report found that 10-12% of males are homosexual. John Gagnon (Sex in America) replicated the study, making a great effort to get everyone sampled to complete the questionnaire, and found only about 2-3%.

COMMON PITFALLS OF SAMPLING:


Voluntary response: people choose whether to participate, e.g., by agreeing to a phone interview, returning a mail questionnaire, answering an internet poll, or calling an 800 number. Problem: response bias. Some units are more likely to yield a measurement than others.

EXAMPLE
Women and Love, by Shere Hite (a 1987 best seller): 100,000 questionnaires about relations between the sexes were distributed through various women's groups and women's magazines. The responses showed a great deal of anger with men: 91% of divorcees said they had initiated the divorce and were happy to have gotten rid of their spouse. What is the problem?

SELF-SELECTION BIAS
Only 4.5% of the questionnaires were returned. Angry people are more likely than others to make the effort to respond.

Characteristics of Good Samples


Simple random samples (SRS) guard against volunteerism and self-selection bias: they let some form of impersonal CHANCE choose the sample, rather than voluntary participation. Note that an SRS is defined by its method: every possible sample of people has the same chance of being chosen. An SRS is not systematically biased; no part of the population is over-represented.

Ways to obtain SRS (1)


Physical mixing: identify every unit in the sampling frame with a tag or number, mix the tags, and draw one blindly. Then draw a second tag without returning the first; this gives an SRS of size n = 2. Continue drawing until the full sample is chosen. Every unit has the same chance to be chosen. Lotteries work this way.
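The bowl-of-tags procedure can be mimicked in code. The sketch shuffles one tag per unit, which stands in for mixing, and then pops tags off one at a time without putting them back. The function `draw_tags` and the "6 of 49" lottery format in the usage line are hypothetical.

```python
import random

def draw_tags(units, n, seed=None):
    """Mimic drawing tags blindly from a well-mixed bowl, without replacement."""
    rng = random.Random(seed)
    bowl = list(units)   # one tag per unit in the sampling frame
    rng.shuffle(bowl)    # "mix the tags"
    drawn = []
    for _ in range(n):
        drawn.append(bowl.pop())  # draw one tag; it does not go back into the bowl
    return drawn

# Hypothetical lottery: draw 6 numbers from the tags 1..49.
winning = draw_tags(range(1, 50), 6, seed=11)
print(sorted(winning))
```

Because each drawn tag leaves the bowl, no unit can be chosen twice. This matches the "without replacement" character of an SRS.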

Ways to obtain SRS (2)


Table of Random Digits:
A long list of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with the following properties: each entry in the list is equally likely to be any of the 10 digits; and entries in different positions are independent, in the sense that the value of one has no influence on the value of any other.

Using Table of Random Digits


Characteristics of a random number table: any pair of entries is equally likely to be any of the 100 possible pairs 00, 01, ..., 99; any triplet of entries is equally likely to be any of the 1,000 possible triplets 000, 001, ..., 999; and so on for groups of any size.
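A common way to use the table is to give each unit a two-digit label (00, 01, ...). You then read the digits in pairs and keep any pair that matches a label not already chosen. The sketch below follows that procedure. `random_digit_table` generates a stand-in table from Python's pseudo-random generator; it is not a published table. The digit string in the usage example is illustrative only.

```python
import random

def random_digit_table(length, seed=None):
    """Generate a stand-in table of random digits: each entry is equally
    likely to be 0-9, independently of every other entry."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) for _ in range(length))

def srs_from_digits(digits, population_size, n):
    """Select an SRS by reading the table two digits at a time.

    Units are labelled 00, 01, ..., population_size - 1 (at most 100 units).
    Pairs that are not a valid label, or that repeat a chosen label, are skipped.
    """
    if not 0 < population_size <= 100:
        raise ValueError("two-digit labels cover at most 100 units")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = int(digits[i:i + 2])
        if label < population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")

# Illustrative digits, 30 units labelled 00-29, sample of size 3:
# pairs read are 19, 22, 39 (skip), 50 (skip), 34 (skip), 05.
print(srs_from_digits("19223950340575628713", 30, 3))

# Using a generated table instead:
table = random_digit_table(200, seed=1)
print(srs_from_digits(table, 30, 5))
```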

ANN LANDERS
Asked readers: "If you had it to do over again, would you have children?" She received 10,000 responses, almost 70% saying NO! What is the problem?

This is an example of voluntary response, so the results are untrustworthy.

NEWSDAY POLL
Newsday commissioned a nationwide SRS poll of n = 1373 parents, which found that 91% would have children again! Which figure is correct?

HALLMARK OF A GOOD SRS


An SRS has no bias: in the Newsday poll, every one of the 55 million families had the same chance of being interviewed, whereas in the Ann Landers poll, angry and sad parents were agitated enough to respond.

WHY BE MORE CONFIDENT IN NEWSDAY POLL?


Why should we rely on the responses of n = 1373 people to generalize about 55 million families, when Ann Landers had a sample of 10,000? We are interested in generalizing about some characteristic of the population, here the percent of families who would or would not have children again.

In the Newsday poll, the estimate is the fraction of the 1373 people in the SRS who said YES. This statistic is the sample proportion, p̂. If 1249 of the 1373 people in the sample said Yes, then the sample estimate is p̂ = 1249/1373 ≈ 0.91, or 91%. This is an estimate of the unknown population parameter.
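The arithmetic behind the 91% figure is a single division. The helper name `sample_proportion` below is illustrative.

```python
def sample_proportion(yes_count, n):
    """p-hat: the fraction of the sample giving the answer of interest."""
    if not 0 <= yes_count <= n:
        raise ValueError("yes_count must be between 0 and n")
    return yes_count / n

p_hat = sample_proportion(1249, 1373)
print(f"p-hat = {p_hat:.4f} ({p_hat:.0%})")  # prints: p-hat = 0.9097 (91%)
```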

SAMPLING VARIABILITY
Note that another SRS of n = 1373 people would almost certainly yield a somewhat different estimate. Repeated random sampling produces predictable patterns. If we can determine what those patterns are, we can say how likely it is that we will get a good or a bad sample.
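The predictable pattern can be seen by simulating many repetitions of the Newsday poll. The sketch assumes, for illustration only, that the true population proportion is 0.91. It also assumes the population (55 million families) is so large relative to n = 1373 that each respondent can be treated as an independent draw. The function name and the number of repetitions are hypothetical choices.

```python
import random
import statistics

def simulate_p_hats(p, n, reps, seed=None):
    """Repeat a poll `reps` times and return the p-hat from each repetition.

    Assumes each of the n respondents independently says Yes with probability p,
    a good approximation when the population is far larger than the sample.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p_hats = simulate_p_hats(p=0.91, n=1373, reps=1000, seed=1)
print("mean of p-hats:", round(statistics.mean(p_hats), 4))
print("spread (std. dev.):", round(statistics.stdev(p_hats), 4))
```

The p-hats cluster tightly around the assumed parameter. Most land within about a percentage point and a half of 0.91, and that regular spread is the pattern inference relies on.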

Sampling Variability Allows Us


to determine how much the sample statistic is likely to vary from the population proportion owing to sampling variability. This is the key to making inferences from samples to populations.

How good is the estimate?


Population Parameter = Sample Statistic + Random Error

It Depends on how much Sampling Error there is!

How Much Sampling Error is There?


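$$\text{Random sampling error} \propto \frac{\text{Variability}}{\sqrt{\text{Sample size}}}$$

More variability increases sampling error; a larger sample size decreases it. These are the key components of the standard error formula, $SE = \sigma/\sqrt{n}$, where σ measures the variability in the population and n is the sample size.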
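For a yes/no question, the variability term is p(1 − p), so the standard error of a sample proportion from an SRS is √(p(1 − p)/n). The sketch plugs in p = 0.91 as an illustrative value. It shows that quadrupling the sample size only halves the error. The function name is hypothetical.

```python
import math

def standard_error_proportion(p, n):
    """Standard error of a sample proportion from an SRS: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Each step multiplies n by 4; the standard error is cut in half each time.
for n in (100, 400, 1373, 5492):
    print(n, round(standard_error_proportion(0.91, n), 4))
```

At n = 1373 the standard error is under one percentage point. This is why a well-drawn sample of that size can be trusted over a much larger voluntary-response sample, whose error comes from bias rather than chance.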

SRS is difficult to do
Often a sampling frame (a list of every person in the population) is not available. It is also expensive to send interviewers to remote parts of the country whenever a respondent there is randomly selected.

ALTERNATIVE TO SRS: MULTISTAGE SAMPLING


1. Randomly select a sample of counties.
2. Randomly select townships or wards from within each selected county.
3. From an aerial map, randomly select a sample of small areas, such as blocks.
4. Randomly select households from the sampled blocks.
5. Interview one adult from each selected household.
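Each stage of the design draws an SRS only from the units kept at the previous stage. The sketch assumes a hypothetical nested frame of the form county → ward → block → list of households. It selects units with equal probability at every stage, which is a simplification: real surveys often select areas with probability proportional to size. Choosing the adult to interview within each household is left out.

```python
import random

def multistage_sample(frame, n_counties, n_wards, n_blocks, n_households, seed=None):
    """Four-stage sample: counties -> wards -> blocks -> households.

    `frame` is a hypothetical nested dict: county -> ward -> block -> list of households.
    Only the lists for the selected areas are ever consulted.
    """
    rng = random.Random(seed)
    selected = []
    for county in rng.sample(sorted(frame), n_counties):
        wards = frame[county]
        for ward in rng.sample(sorted(wards), n_wards):
            blocks = wards[ward]
            for block in rng.sample(sorted(blocks), n_blocks):
                households = blocks[block]
                k = min(n_households, len(households))
                selected.extend(rng.sample(households, k))
    return selected

# Toy frame: 10 counties x 4 wards x 5 blocks x 20 households.
frame = {
    f"c{c}": {
        f"w{w}": {f"b{b}": [f"c{c}-w{w}-b{b}-h{h}" for h in range(20)] for b in range(5)}
        for w in range(4)
    }
    for c in range(10)
}
households = multistage_sample(frame, n_counties=3, n_wards=2, n_blocks=2, n_households=5, seed=3)
print(len(households))  # 3 * 2 * 2 * 5 = 60
```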

Advantages to Multistage sampling design


You do not need a list of all households in the US, only a list of households in the small areas selected (this information is readily available from town halls and county courts). Interview sites are clustered in small areas, so local interviewers can be hired.

Telephone Surveys
Typically:
1. Area codes are randomly selected from a list of all area codes, e.g., 516, 212.
2. Exchanges are then randomly selected within each selected area code, e.g., 751.
3. The last four digits are then randomly generated.
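The three telephone-sampling steps can be sketched as a random-digit-dialing generator. The area codes 516 and 212 and the exchange 751 come from the slide. The other exchanges and the function name are hypothetical, and a real survey would draw from complete lists of working exchanges.

```python
import random

def random_phone_numbers(area_codes, exchanges_by_area, count, seed=None):
    """Random-digit dialing: pick an area code, then an exchange, then 4 random digits."""
    rng = random.Random(seed)
    numbers = []
    for _ in range(count):
        area = rng.choice(area_codes)                    # step 1: area code
        exchange = rng.choice(exchanges_by_area[area])   # step 2: exchange within it
        last_four = f"{rng.randrange(10_000):04d}"       # step 3: 0000-9999
        numbers.append(f"({area}) {exchange}-{last_four}")
    return numbers

# Hypothetical exchange lists (only 751 appears in the slide).
exchanges = {"516": ["751", "483"], "212": ["964", "327"]}
nums = random_phone_numbers(["516", "212"], exchanges, 5, seed=4)
print(nums)
```

Generating the last four digits at random reaches unlisted numbers too, which a printed directory would miss.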

Stratified Sampling
When we want to compare special subpopulations, we use stratified samples of counties or telephone exchanges. Stratified sampling is useful when the subpopulations are too small to show up in large numbers in a national SRS, e.g., Muslims in the U.S., where an SRS of 1,500 would include only 2-3 Muslims.

METHOD FOR CREATING STRATIFIED SAMPLE


1. Divide the sampling frame into groups, called strata (here, areas where the subpopulations of interest are known from the census to be concentrated).
2. Take a separate SRS in each stratum.
3. Combine the samples from all strata to make up your stratified sample.
Stratified sampling is used to gather data on the opinions of minorities, because an SRS of 1,500 people contains too few of them for in-depth analyses.
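The three steps map directly onto a loop over strata. The sketch assumes two hypothetical strata: a small area of high concentration that is deliberately oversampled, and the rest of the frame. The sizes and names are illustrative only.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a separate SRS within each stratum, then combine them.

    strata: dict mapping stratum name -> list of units in that stratum
    sizes:  dict mapping stratum name -> number of units to draw from it
    Returns (stratum name, unit) pairs so each unit's stratum stays known.
    """
    rng = random.Random(seed)
    combined = []
    for name, units in strata.items():
        combined.extend((name, unit) for unit in rng.sample(units, sizes[name]))
    return combined

# Hypothetical strata: 2,000 units in the high-concentration stratum, 98,000 elsewhere.
strata = {
    "high_concentration": [f"hc{i}" for i in range(2_000)],
    "rest": [f"r{i}" for i in range(98_000)],
}
sample = stratified_sample(strata, {"high_concentration": 300, "rest": 1_200}, seed=5)
print(len(sample))
```

Drawing 300 from the small stratum gives enough cases for in-depth analysis of that group. Because that stratum is oversampled, any estimate for the population as a whole has to weight each stratum back to its true share.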
