3. Select a sample. The choices for sampling designs for this class are:
• voluntary response (the only design listed that is not random; not recommended)
• simple random sample
• stratified random sample
• multistage sample
• capture-recapture sample
7. Analyze your data using either exploratory data analysis (looking for
trends/relationships in the actual data) or formal statistical inference
(answering statistical questions with a known degree of confidence).
What can go wrong?
• Bias (response bias, nonresponse, undercoverage)
• Variability
• Poor experimental design (not using a control, not randomizing,
not replicating)
• Other (poor choice of sampling design, date of survey)
Vocabulary Terms
Major Idea: We are interested in one or more variables associated with
a population of units. Because it is impossible or too expensive to
measure the variables of interest on all the units in the population, we
only measure the variables on a subset or a sample of units. We use the
sample to draw conclusions about the population. To be useful,
however, the sample must represent the population.
Sampling Design
The design of the sample refers to the method used to choose the sample
from the population.
Stratified Random Sample: The population is first divided into groups
of similar units. An SRS is then selected from each of the groups.
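The definition above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name stratified_sample, the on/off-campus population, and the stratum sizes are all hypothetical.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Draw an SRS of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    # Step 1: divide the population into groups (strata) of similar units.
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # Step 2: take an SRS (without replacement) from each stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 60 on-campus and 40 off-campus students.
students = [("on", i) for i in range(60)] + [("off", i) for i in range(40)]
sample = stratified_sample(students, lambda s: s[0], n_per_stratum=5, seed=1)

for stratum, chosen in sample.items():
    print(stratum, len(chosen))
```

Every stratum is guaranteed to appear in the sample, which a single SRS of the whole population does not guarantee.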
Capture-Recapture Sample: A type of repeated SRS sampling that
biologists use to estimate the size of animal populations. This type of
sample may also be used by the government when estimating the
number of households in an area. Take an SRS from the population and
label (or tag) the selected units. Later take a new SRS and find the
percent of this sample that was in the original sample. Assume the
proportion tagged in the second sample is equal to the proportion of the
population that was tagged in the original sample. The population size
can thus be estimated.
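The arithmetic behind this estimate can be sketched as follows. If n1 units are tagged, the second sample has n2 units, and m of those are tagged, then setting m / n2 equal to n1 / N and solving for N gives N = n1 * n2 / m (commonly called the Lincoln-Petersen estimator). The function name and the counts below are hypothetical.

```python
def estimate_population_size(n_tagged, n_second_sample, n_recaptured):
    """Estimate N from m / n2 = n1 / N, i.e. N = n1 * n2 / m."""
    if n_recaptured == 0:
        # With no tagged units in the second sample the ratio is undefined.
        raise ValueError("no tagged units recaptured; estimate is undefined")
    return n_tagged * n_second_sample / n_recaptured

# Hypothetical counts: tag 50 animals, later catch 40, of which 8 are tagged.
estimate = estimate_population_size(n_tagged=50, n_second_sample=40, n_recaptured=8)
print(estimate)  # 250.0
```

Here 8 of 40 (20%) of the second sample is tagged, so the 50 tagged animals are assumed to be 20% of the population.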
In order to make sure that a random sample is really random, you should
use either a Table of Random Digits or a statistical computer application
like SPSS to select the sample.
• The digit in any position in the list has the same chance of being
any one of the digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
• The digits in different positions are independent in the sense
that the value of one has no influence on the value of any other.
Steps for selecting a random sample using SPSS:
Enter all names in one column. Click on the column, then click Data >
Select Cases. Select “Random sample of cases”. Click on the “Sample”
button. Select “Exactly” and type in “4” cases from the first “20” cases.
Click “Continue” and click “OK”.
The data editor will have a column of 0’s and 1’s. The units with 1’s
are the units selected into your sample:
Name       Selected    Name       Selected
Abel       1           Adams      0
Whit       0           Kan        1
John       0           Dritu      0
Woods      0           Haste      0
Williams   1           Howell     0
Fisher     0           Smith      0
Johns      0           Lu         0
Brown      0           Wilsey     0
South      1           Krintz     0
French     0           Hamilton   0
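For readers without SPSS, the same kind of selection can be sketched with Python's standard random module. This is an illustration only: the seed is an arbitrary choice, and the units chosen will generally differ from the SPSS output shown above.

```python
import random

names = ["Abel", "Whit", "John", "Woods", "Williams",
         "Fisher", "Johns", "Brown", "South", "French",
         "Adams", "Kan", "Dritu", "Haste", "Howell",
         "Smith", "Lu", "Wilsey", "Krintz", "Hamilton"]

rng = random.Random(2024)            # fixed (arbitrary) seed so the run is repeatable
chosen = set(rng.sample(names, 4))   # exactly 4 of the 20 names, without replacement

# Build a 0/1 indicator column like the one described above.
indicator = {name: int(name in chosen) for name in names}
for name, flag in indicator.items():
    print(f"{name:10s}{flag}")
```

Because random.sample draws without replacement, every set of 4 names has the same chance of being chosen, which is what makes this a simple random sample.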
Nonresponse Bias: Nonresponse occurs when a selected unit either
cannot be contacted or refuses to cooperate. If nonrespondents differ
systematically from respondents with respect to the variable of interest,
bias will be introduced into the study.
Example where the race or sex of the interviewer might result in biased
answers:
Other problems:
Advantages of Experiments:
• In principle, experiments can give good evidence for causation.
Observational studies are not as good at this.
• Experiments allow us to study the specific factors we are interested
in, while controlling the effects of lurking variables.
• Experiments allow us to study the combined effects of several
factors. Interactions between factors can be very important.
Disadvantage of Experiments:
• Experiments often lack realism.
Example:
A sports engineer is interested in determining the effects that speed
and air pressure have on throwing distance for his mechanical trainer
football throwing machine. Two speeds (40 mph, 55 mph) and three
air pressures (175 psi, 200 psi, 230 psi) were chosen for the study.
Treatments were randomly assigned to thirty footballs.
Experimental Unit: a football
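The random assignment in this example can be sketched as follows. The six treatments are the 2 x 3 combinations of speed and air pressure. The source does not say how many footballs receive each treatment, so the balanced split of 5 per treatment, the football numbering, and the seed are assumptions made for illustration.

```python
import itertools
import random

speeds = [40, 55]             # mph
pressures = [175, 200, 230]   # psi
# Each treatment is one (speed, pressure) combination: 2 x 3 = 6 treatments.
treatments = list(itertools.product(speeds, pressures))

footballs = list(range(1, 31))   # hypothetical labels for the 30 footballs
rng = random.Random(7)           # arbitrary seed for repeatability
rng.shuffle(footballs)           # random order drives the assignment

# Assumption: balanced design, so 30 / 6 = 5 footballs per treatment.
per_treatment = len(footballs) // len(treatments)
assignment = {}
for i, treatment in enumerate(treatments):
    for ball in footballs[i * per_treatment:(i + 1) * per_treatment]:
        assignment[ball] = treatment

for ball in sorted(assignment):
    speed, pressure = assignment[ball]
    print(f"football {ball:2d}: {speed} mph, {pressure} psi")
```

Having several footballs per treatment is what provides replication, and crossing both factors is what lets the engineer look for an interaction between speed and air pressure.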
Simplest type of experiment: apply a single treatment and observe the
response.
• This is ok in very controlled situations, but you may miss lurking
variables, especially if you are using living subjects.
• May miss confounding with the placebo effect: a patient responds
favorably to being treated, not to the treatment itself.
• Bias: the study systematically favors certain outcomes (if you
have no control group, your study will be biased towards finding
the new medicine effective).
• Lack of realism: if the subjects know they’re in an experiment,
they might not behave naturally during the treatment.
3. Replication of the experiment on many subjects reduces chance
variation in the results.
• All subjects receive both treatments. The order of the treatments
may be randomly assigned to see if the order of the treatments
matters.
Example - Do students retain information better if there is noise in the
background while they are studying? A group of 20 students is randomly
selected to participate in an experiment. Each will be put in a room where
they will read a list of 50 statements. After a break they will be given a
quiz to see how much of the information they retained. Ten of the students
will read the statements in a silent room, ten in a room with background
noise piped in. The groups will then switch places and repeat the process
in the opposite environment. Students are randomly assigned to a starting
group.
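The random assignment to a starting group can be sketched as follows. The student IDs and the seed are hypothetical; the only details taken from the example are the 20 students, the two environments, and the 10/10 split.

```python
import random

students = [f"S{i:02d}" for i in range(1, 21)]   # hypothetical IDs for the 20 students

rng = random.Random(11)   # arbitrary seed for repeatability
shuffled = students[:]    # copy so the original roster order is preserved
rng.shuffle(shuffled)

silent_first = sorted(shuffled[:10])   # start in the silent room
noise_first = sorted(shuffled[10:])    # start in the room with background noise

# Every student receives both treatments; only the order differs.
schedule = {s: ("silent", "noise") for s in silent_first}
schedule.update({s: ("noise", "silent") for s in noise_first})

for student in students:
    first, second = schedule[student]
    print(f"{student}: {first} then {second}")
```

Randomizing the order makes it possible to check whether doing the task a second time, rather than the environment, explains a difference in quiz scores.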
Statistical Inference
In statistical inference we calculate a value from our sample and use that
value as an estimate of the corresponding population value.
Sampling Variability:
Sampling variability is the variation in the value of a statistic across
repeated samples of the same size, selected from the population using
the same probability sampling design.
Goal:
We want to estimate a population value such as µ (a parameter) using
a value we calculate from a sample, such as x̄ (a statistic). Naturally
we want x̄ to be as close to µ as possible. To achieve this we need x̄
to have low bias and low variability.
Variability describes how spread out the sampling distribution is for the
statistic. This spread is determined by the sampling design and the
sample size n. Larger samples have smaller variability. As long as the
population is much larger than the sample, the population size has
little effect on variability.
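A small simulation can illustrate both claims. This is a sketch under assumed conditions: the populations are made-up credit-hour values between 1 and 18, and the population sizes, number of repetitions, and seed are arbitrary.

```python
import random
import statistics

def sd_of_sample_means(population, n, repetitions, rng):
    """Spread of the sample mean across repeated SRSs of size n."""
    means = [statistics.mean(rng.sample(population, n))
             for _ in range(repetitions)]
    return statistics.stdev(means)

rng = random.Random(0)   # arbitrary seed for repeatability
# Hypothetical populations of credit-hour values, differing only in size.
small_pop = [rng.randint(1, 18) for _ in range(5_000)]
large_pop = [rng.randint(1, 18) for _ in range(50_000)]

sd_n10 = sd_of_sample_means(large_pop, 10, 500, rng)
sd_n100 = sd_of_sample_means(large_pop, 100, 500, rng)
sd_n100_small = sd_of_sample_means(small_pop, 100, 500, rng)

print(f"n = 10,  N = 50,000: {sd_n10:.2f}")
print(f"n = 100, N = 50,000: {sd_n100:.2f}")
print(f"n = 100, N = 5,000:  {sd_n100_small:.2f}")
```

The spread for n = 100 should come out at roughly a third of the spread for n = 10, while the two n = 100 results should be nearly the same even though one population is ten times larger than the other.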
Think of the true population parameter as the bull’s-eye of a target
and the dots as our sample statistics.
Example:
The president of Purdue is interested in the average number of credit
hours taken by Purdue students. Some statistics students at Purdue are
asked to submit potential plans for gathering sample information.
Comment on the plans listed below. Specifically comment on bias and
variability.
Protocol A: Stand outside one of the residence halls and ask every third
person how many credit hours they are currently taking until you have
recorded 10 values.
Protocol D: Obtain a list of all students who have registered for at least
one class at Purdue and randomly select 10 students from the list.
Protocol E: Obtain a list of all students who have registered for at least
one class at Purdue and randomly select 100 students from the list.
Protocol F: Obtain a list of all students who have registered for at least
one class at Purdue and group the students on the list by whether they
live on or off campus. Then randomly select 50 students who live on
campus and 50 students who live off campus to survey.
Additional Problems: