
CHAPTER 3 – EXPERIMENTAL AND SAMPLING DESIGN

Overview of how to answer a research question:

1. Pick a specific question you want to answer.

2. Decide on your population.

3. Select a sample. The choices for sampling designs for this class are:
• voluntary response (the only design that is not random, and the weakest)
• simple random sample
• stratified random sample
• multistage sample
• capture-recapture sample

4. Decide whether to conduct an observational study or an experiment.

If observational study, just state the sampling design.
If experiment, the choices for experimental designs for this class are:
• completely randomized design
• block design
• matched pairs

5. Choose your response variable and your explanatory variables.
Decide on your treatments (for experiments).

6. Collect the data.

7. Analyze your data using either exploratory data analysis (looking for
trends/relationships in the actual data) or formal statistical inference
(answering statistical questions with a known degree of confidence).

8. State your conclusions.

What can go wrong?
• Bias (response bias, nonresponse, undercoverage)
• Variability
• Poor experimental design (not using a control, not randomizing,
not replicating)
• Other (poor choice of sampling design, date of survey)

Causation is not the same thing as association! Is the relationship
between your explanatory and response variables based on:
♦ Causation
♦ Confounding
♦ Common response

Principles of Ethical Experiments


• Planned studies should be reviewed by a board to protect subjects
from harm.
• All subjects must give their informed consent before data are
collected.
• All individual data must be kept confidential. Only summaries can
be made public. (Anonymity is not the same as confidentiality.)

Vocabulary Terms

Unit: An individual person, animal or object upon which the variable of
interest (called the response variable) is measured. Units are called
“individuals” when they refer to people.

Population: The entire group of units or individuals about which we
desire information.

Census: An attempt to contact every individual in the entire population.

Sample: The part of the population selected to be measured or observed
in order to gather data for analysis.

Major Idea: We are interested in one or more variables associated with
a population of units. Because it is impossible or too expensive to
measure the variables of interest on all the units in the population, we
only measure the variables on a subset or a sample of units. We use the
sample to draw conclusions about the population. To be useful,
however, the sample must represent the population.

Sampling Design

The design of the sample refers to the method used to choose the sample
from the population.

There are two approaches to selecting a sample from a population:

Voluntary Response Sample: A sample which consists of people who
choose themselves by responding to a general appeal. (Also called
non-random or convenience sampling.)

Random or Probability-Based Sampling: A sample that is selected in
such a way that each unit in the population has a non-zero chance of
being chosen. (SRS, Stratified Random Sample, Multistage Sample,
Capture-Recapture)

Types of Random Sampling Designs

Simple Random Sample (SRS) of size n: A sample that is selected from
the population in such a way that every set of n units has an equal
chance of being the selected sample. (We use SPSS or a random
number table to select the sample.)
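
As an illustration only, the SRS definition above can be sketched in Python instead of SPSS or a table. The function name, the seed, and the unit labels are assumptions made for this sketch; the standard library's `random.sample` draws without replacement, so every set of n units is equally likely.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Return an SRS of size n drawn without replacement from `units`."""
    rng = random.Random(seed)      # the seed only makes the illustration repeatable
    return rng.sample(units, n)    # every set of n units is equally likely

# Hypothetical population of 20 labeled units.
population = [f"unit_{i:02d}" for i in range(20)]
srs = simple_random_sample(population, 4, seed=1)
print(srs)
```

Leaving `seed` as `None` gives a different sample on each run, which is what you would want in a real study.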

Stratified Random Sample: The population is first divided into groups
of similar units (called strata). An SRS is then selected from each of
the groups.
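
A minimal sketch of a stratified random sample, assuming each unit carries a label that says which group it belongs to. The student records and the on/off campus labels below are made up for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Divide units into groups, then take an SRS from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # group similar units together
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical students tagged by where they live: 40 "on" campus, 20 "off".
students = [(f"s{i:03d}", "on" if i % 3 else "off") for i in range(60)]
sample = stratified_sample(students, lambda s: s[1], 5, seed=2)
print(sample)
```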

Multistage sample: A sample in which successively smaller groups
within the population are selected in stages. It is typically used when
our population is so large that it is difficult or impossible to get a list of
all of the units in the population. Here you start by splitting the
population into groups and randomly selecting a number of the groups.
You can split these selected groups again using another variable.
Finally, when you have the number of units down to a manageable size,
you select an SRS of units from each of the selected groups.
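
The two-stage version of this idea can be sketched as follows. The county and household names are hypothetical, and a real multistage design may use more stages and unequal group sizes.

```python
import random

def multistage_sample(groups, n_groups, n_per_group, seed=None):
    """Stage 1: randomly select groups. Stage 2: take an SRS within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_groups)          # stage 1
    return {g: rng.sample(groups[g], n_per_group) for g in chosen}  # stage 2

# Hypothetical population: 10 counties with 50 households each.
counties = {f"county_{c}": [f"c{c}_household_{h}" for h in range(50)]
            for c in range(10)}
ms = multistage_sample(counties, n_groups=3, n_per_group=5, seed=3)
print(ms)
```

Note that only the selected counties need a complete household list, which is the practical advantage described above.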

Capture-Recapture Sample: A type of repeated SRS sampling that
biologists use to estimate the size of animal populations. This type of
sample may also be used by the government when estimating the
number of households in an area. Take an SRS from the population and
label (or tag) its members. Later take a new SRS and find the proportion
of this sample that was also in the original sample. Assume the proportion
tagged in the second sample is equal to the proportion of the
population that was tagged in the original sample. The population size
can thus be estimated.

(Number in original sample) / (Total population)
   = (Number tagged in second sample) / (Total number in second sample)
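
Solving the proportion above for the total population gives (number in original sample) × (total number in second sample) / (number tagged in second sample). A small sketch of that arithmetic, using made-up counts:

```python
def estimate_population(n_original, n_second, n_tagged_in_second):
    """Solve n_original / N = n_tagged_in_second / n_second for N."""
    if n_tagged_in_second == 0:
        raise ValueError("no tagged units were recaptured; cannot estimate N")
    return n_original * n_second / n_tagged_in_second

# Hypothetical counts: 200 animals tagged, then 150 caught later, 30 of them tagged.
print(estimate_population(200, 150, 30))  # 1000.0
```

Here 30 of 150 (20%) in the second sample were tagged, so the 200 tagged animals are assumed to be 20% of the population.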

In order to make sure that a random sample is really random, you should
use either a Table of Random Digits or a statistical computer application
like SPSS to select the sample.

Table of Random Digits:

A table of random digits is a list of digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
that has the following properties:

• The digit in any position in the list has the same chance of being
any one of the digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
• The digits in different positions are independent in the sense
that the value of one has no influence on the value of any other.

Steps for using the Table of Random Digits:

1. Assign a number to each unit in the population. Make sure all
assigned numbers have the same number of digits. (If you have
10 or fewer items, use single-digit numbers 0 through 9. If you have
11 to 100 items, use two-digit numbers starting with 00, etc.)
2. Look at the Table of Random Digits (in your textbook or under the
Additional Resources tab of the course website).
3. Draw a line under every group of digits. The number of digits in the
groups should match the number of digits used in number 1 above.
4. The first unique, non-repeated numbers that match those assigned to
units in the population determine the units selected to be in the
sample.

Example: From the population below, select a simple random sample of
n=4 using Line 124 from the Table of Random Digits. (I have typed
part of the table below.)

Abel (00)       Fisher (01)   Adams (02)    Smith (03)
White (04)      Johns (05)    Kan (06)      Lu (07)
Johns (08)      Brown (09)    Dritu (10)    Wilsey (11)
Woods (12)      South (13)    Haste (14)    Krintz (15)
Williams (16)   French (17)   Howell (18)   Hamilton (19)

(Line 124)  71035 09001 43367 49497
            72719 96758 27611 91596

Individuals selected to be in the sample:
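
The table steps above can be traced by hand, or checked with a short sketch like the one below. The function name is an assumption for illustration; the labels and the digits are the ones typed in this example.

```python
def select_with_digit_table(digits, labels, n):
    """Read fixed-width groups of digits; keep the first n distinct valid labels."""
    width = len(next(iter(labels)))        # all labels have the same number of digits
    stream = "".join(digits.split())       # ignore the spacing printed in the table
    chosen = []
    for i in range(0, len(stream) - width + 1, width):
        group = stream[i:i + width]
        if group in labels and group not in chosen:   # skip unused and repeated numbers
            chosen.append(group)
            if len(chosen) == n:
                break
    return chosen

names = ["Abel", "Fisher", "Adams", "Smith", "White", "Johns", "Kan", "Lu",
         "Johns", "Brown", "Dritu", "Wilsey", "Woods", "South", "Haste",
         "Krintz", "Williams", "French", "Howell", "Hamilton"]
labels = {f"{i:02d}": name for i, name in enumerate(names)}   # "00" .. "19"

line_124 = "71035 09001 43367 49497 72719 96758 27611 91596"
picked = select_with_digit_table(line_124, labels, 4)
print([(p, labels[p]) for p in picked])
# [('03', 'Smith'), ('01', 'Fisher'), ('19', 'Hamilton'), ('15', 'Krintz')]
```

Reading two digits at a time gives 71, 03, 50, 90, 01, 43, ...; only 03, 01, 19 and 15 match labels in the population.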

Steps for selecting a random sample using SPSS:
Enter all names in one column. Click on the column, then click Data >
Select Cases. Select “Random sample of cases”. Click on the “Sample”
button. Select “Exactly” and type in “4” cases from the first “20” cases.
Click “Continue” and click “OK”.

The data editor will have a column of 0’s and 1’s. The units with 1’s
are the units selected into your sample:
Abel      1     Adams     0
White     0     Kan       1
John      0     Dritu     0
Woods     0     Haste     0
Williams  1     Howell    0
Fisher    0     Smith     0
Johns     0     Lu        0
Brown     0     Wilsey    0
South     1     Krintz    0
French    0     Hamilton  0

(Abel, Williams, South and Kan made it into our sample.)

Potential Sources of Bias in Sampling:

Sampling Bias occurs when the sample systematically favors certain
parts of the population over others.

Undercoverage bias: Occurs when the sample systematically excludes a
portion of the population. Note: If the excluded portion systematically
differs with respect to the response from those units that are available for
sampling, sampling bias will be introduced into the study.

Example – If you wanted to take a survey of people in Lafayette,
could you use the phone book to select your sample? Why not?

Nonresponse Bias: Nonresponse occurs when a selected unit either
cannot be contacted or refuses to cooperate. If the non-respondents
systematically differ from those who respond with respect to the variable
of interest, then sampling bias will be introduced into the study.

Example – If you were able to find phone numbers for everyone in
your sample group, what might happen if you called each one and
asked if they would take part in your survey?

Response bias: Occurs when the behavior of the respondent or the
interviewer changes the sample result. Examples of this include the
respondent lying, poor interviewing techniques, wording of questions,
race or sex of the interviewer influencing the respondent, etc.

Example where respondents might lie:

Example where the wording of the question might be a problem:

Example where the race or sex of the interviewer might result in biased
answers:

Example where the interviewer might influence the respondent:

Other problems:

What should you do now that you have a sample?


• Observational Study: observe units or individuals and measure
variables of interest but do not manipulate the units or their
environment in any way.
• Experiment: deliberately impose some treatment on the units in order
to observe their responses.
• Anecdotal evidence: information based on haphazardly selected
individual cases, which often come to our attention because they are
striking in some way.

Advantages of Experiments:
• In principle, experiments can give good evidence for causation.
Observational studies are not as good at this.
• Experiments allow us to study the specific factors we are interested
in, while controlling the effects of lurking variables.
• Experiments allow us to study the combined effects of several
factors. Interactions between factors can be very important.

Disadvantage of Experiments:
• Experiments often lack realism.

Experiment Vocabulary Terms

Experimental Unit: an individual or object on which the experiment is done.
Subjects: Experimental units that are human beings.
Treatment: A specific experimental condition applied to the units.
Factors: The explanatory variable(s) being studied.
Factor level: A specific level (option) for the factor.

Example:
A sports engineer is interested in determining the effects that speed
and air pressure have on throwing distance for his mechanical trainer
football throwing machine. Two speeds (40 mph, 55 mph) and three
air pressures (175 psi, 200 psi, 230 psi) were chosen for the study.
Treatments were randomly assigned to thirty footballs.
Experimental Unit: a football

Treatments: 40 mph and 175 psi    55 mph and 175 psi
            40 mph and 200 psi    55 mph and 200 psi
            40 mph and 230 psi    55 mph and 230 psi

Factors: speed and air pressure

Factor Levels: For speed - 40 mph & 55 mph.
               For air pressure - 175 psi, 200 psi, 230 psi.
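
Each treatment in this example is one combination of a speed level and an air pressure level. As a sketch, the six treatments can be listed with `itertools.product`; the variable names are assumptions for illustration.

```python
import itertools

speeds_mph = [40, 55]             # levels of the factor "speed"
pressures_psi = [175, 200, 230]   # levels of the factor "air pressure"

# One treatment = one level of each factor.
treatments = list(itertools.product(speeds_mph, pressures_psi))
for speed, pressure in treatments:
    print(f"{speed} mph and {pressure} psi")

print(len(treatments))  # 2 levels x 3 levels = 6 treatments
```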

Simplest type of experiment: apply a single treatment and observe the
response.
• This is ok in very controlled situations, but you may miss lurking
variables, especially if you are using living subjects.
• May miss confounding with the placebo effect: a patient responds
favorably to being treated, not to the treatment itself.
• Bias: the study systematically favors certain outcomes (if you
have no control group, your study will be biased towards finding
the new medicine effective).
• Lack of realism: if the subjects know they’re in an experiment,
they might not behave naturally during the treatment.

It is much more useful to do comparative experiments using a control
group. A control group is a set of experimental units that receive no
active treatment. If the units are subjects, then the control group usually
receives a placebo, and any response of this group to the dummy
treatment is called the placebo effect.

In experiments we hope to see a difference in the response that is too
large to happen merely due to random chance. In other words, we are
looking for a statistically significant result.

Principles of Experimental Design:

1. Control the effects of irrelevant (lurking) variables on the response.
Doing a comparative experiment with a control group is the
simplest form of control. Double-blind experiments are even
better, as neither the subjects themselves nor the personnel who
work with them know which treatment any subject has received.
This method avoids unconscious bias.
2. Randomization: the use of impersonal chance to assign subjects to
treatments. Randomization makes sure that characteristics of the
units and/or the judgment of the experimenter do not influence the
selection or the treatment assignment.

3. Replication of the experiment on many subjects reduces chance
variation in the results.

***In order to say an experiment has shown that a change in the
explanatory variable causes a change in the response variable, we
must show that all three of these principles have been met.***

Types of Experimental Designs:

Completely Randomized Design: In this method of randomization, each
unit is randomly assigned to a treatment group.

Example – What instructional method helps students learn new material
better: self-study from the textbook, traditional classroom, or online
instruction? Each student in a sample is randomly assigned to one of these
three groups. Scores on an exam are measured to compare these methods of
teaching.
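
A minimal sketch of the random assignment in a completely randomized design. The group sizes are not given in the example, so 30 students split into equal groups of 10 is an assumption, as are the student labels.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                      # impersonal chance decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

students = [f"student_{i:02d}" for i in range(30)]   # hypothetical sample
methods = ["self-study", "classroom", "online"]
crd = completely_randomized(students, methods, seed=5)
for method, group in crd.items():
    print(method, len(group))
```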

Randomized Block Design: In this design, a blocking variable
distinguishes between different groups going into the experiment. The
random assignment of the units to the treatments is carried out separately
within each block.
Example – Men and women may have different learning styles. Does this
affect the instructional method that is most effective? A sample of 30 men
and 30 women is used. The men are randomly assigned to the three
instructional methods above, and the women are randomly assigned
separately. An exam is given to determine the amount of material retained.
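
In a block design the shuffle is repeated inside each block, so every treatment receives units from every block. A sketch under the assumption of equal group sizes (10 men and 10 women per method), with made-up labels:

```python
import random

def randomized_block(units_by_block, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    k = len(treatments)
    design = {}
    for block, units in units_by_block.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                  # separate randomization per block
        design[block] = {t: shuffled[i::k] for i, t in enumerate(treatments)}
    return design

blocks = {
    "men": [f"m{i:02d}" for i in range(30)],     # hypothetical subjects
    "women": [f"w{i:02d}" for i in range(30)],
}
methods = ["self-study", "classroom", "online"]
design = randomized_block(blocks, methods, seed=7)
for block, groups in design.items():
    print(block, {method: len(group) for method, group in groups.items()})
```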

Matched Pairs Design: A design that compares just two treatments.
This can occur in one of two ways:
• Choose pairs of subjects that are closely matched. Within each
pair, randomly assign one individual to each treatment.

• All subjects receive both treatments. The order of the treatments
may be randomly assigned to see if the order of the treatments
matters.
Example - Do students retain information better if there is noise in the
background while they are studying? A group of 20 students is randomly
selected to participate in an experiment. Each will be put in a room where
they will read a list of 50 statements. After a break they will be given a quiz
to see how much of the information they retained. Ten of the students will
read the statements in a silent room, ten in a room with background noise
piped in. The groups will switch places and repeat the process using the
opposite environment. Students are randomly assigned to a starting group.
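
The random assignment to a starting group can be sketched as below. Every student receives both treatments; only the order is randomized. The student labels and the function name are hypothetical.

```python
import random

def randomize_order(subjects, conditions=("silent", "noise"), seed=None):
    """Give every subject both conditions; randomize which one comes first."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = conditions
    order = {s: (first, second) for s in shuffled[:half]}        # start in condition 1
    order.update({s: (second, first) for s in shuffled[half:]})  # start in condition 2
    return order

students = [f"student_{i:02d}" for i in range(20)]
order = randomize_order(students, seed=8)
starts_silent = [s for s, seq in order.items() if seq[0] == "silent"]
print(len(starts_silent), "students start in the silent room")
```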

Statistical Inference
In statistical inference we calculate a value from our sample and use that
value as an estimate for our population.

A parameter is a number that describes the population. A parameter is
a fixed number, but in practice we do not know its value.

A statistic is a number that describes a sample. This value can change
from sample to sample.

We are using a statistic to estimate an unknown parameter.

Sampling Variability:
Sampling variability represents the variation associated with the value of
the statistic that is generated by repeatedly selecting samples of the same
size, using the same probability sampling design from the population.

Sampling Distribution of a Statistic:

The sampling distribution of a statistic is the distribution of values taken
by the statistic over ALL possible samples of EQUAL SIZE selected
from the SAME POPULATION using the SAME PROBABILITY
SAMPLING DESIGN. In other words, the distribution tells us what
values the statistic may take on and how often each of these values will
show up.
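
For a very small population the sampling distribution can be listed in full. The sketch below uses a made-up population of six credit-hour values and lists every possible SRS of size 2. It also shows that the mean of all the sample means equals the population mean.

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

population = [12, 13, 15, 15, 16, 18]   # hypothetical credit hours for 6 students
n = 2

# The sample mean for ALL possible samples of size n (15 samples here).
means = [Fraction(sum(sample), n) for sample in combinations(population, n)]
distribution = Counter(means)            # value of the statistic -> how often it occurs

for value in sorted(distribution):
    print(float(value), distribution[value])

mu = Fraction(sum(population), len(population))   # the parameter
mean_of_means = sum(means) / len(means)           # center of the sampling distribution
print(mean_of_means == mu)  # True
```

Exact fractions are used so the comparison in the last line is not affected by rounding.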

Goal:
We want to estimate a value from a population such as µ (parameter)
using a value we calculate from a sample such as x̄ (statistic). Naturally
we want x̄ to be as close to µ as possible. To achieve this we need x̄ to
have low bias and low variability.

Bias vs. variability:

Bias concerns the center of the sampling distribution. Your results are
biased if the center of the sampling distribution of your statistic is not
at the population parameter.

Variability describes how spread out the sampling distribution is for the
statistic. This spread is determined by the sampling design and the
sample size n. Larger samples have smaller variability. The population
size is not important to variability, provided the population is much
larger than the sample.

We want both small bias and small variability!

Think of the true population parameter as the bull’s-eye of a target
and the dots as our sample statistics.

How do we obtain low bias?


1. Make sure all members of the population have a non-zero chance
of being selected into our sample. (This means we need a list of
the entire population if possible!)
2. Use random sampling.

Bias is related to the center of the sampling distribution. If the mean of
the sampling distribution of the statistic is equal to the parameter it is
estimating, we say the statistic is an unbiased estimator of the parameter.

How do we obtain low variability?


1. Increase our sample size.
2. Use a sampling design that enables us to get a sample that best
reflects our population.

Variability is related to the spread of the sampling distribution. The
variability of a statistic from a random sample does not depend on the
size of the population, as long as the population is at least 100 times
larger than the sample.
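
A rough simulation can illustrate both points. The populations below are invented (credit hours drawn uniformly from 6 to 18) and the number of repetitions is arbitrary, so the output is a sketch rather than a result about real students.

```python
import random
import statistics

def spread_of_sample_means(population, n, reps, rng):
    """Standard deviation of the sample mean over `reps` simple random samples."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(6)  # fixed seed so the illustration is repeatable
# Two hypothetical populations that differ only in size.
populations = {
    10_000: [rng.randint(6, 18) for _ in range(10_000)],
    100_000: [rng.randint(6, 18) for _ in range(100_000)],
}

spread = {}
for size, pop in populations.items():
    for n in (10, 100):
        spread[(size, n)] = spread_of_sample_means(pop, n, reps=500, rng=rng)
        print(f"population {size:>7}, n = {n:>3}: spread = {spread[(size, n)]:.2f}")
```

Raising n from 10 to 100 should shrink the spread noticeably, while a population ten times larger should leave it almost unchanged.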

Example:
The president of Purdue is interested in the average number of credit
hours taken by Purdue students. Some statistics students at Purdue are
asked to submit potential plans for gathering sample information.
Comment on the plans listed below. Specifically comment on bias and
variability.

Protocol A: Stand outside one of the residence halls and ask every third
person how many credit hours they are currently taking until you have
recorded 10 values.

Protocol B: Leave forms in numerous places on campus for students to
fill out along with boxes for them to place the forms in.

Protocol C: Obtain a list of all students in the residence halls and
randomly select 10 students from the list.

Protocol D: Obtain a list of all students who have registered for at least
one class at Purdue and randomly select 10 students from the list.

Protocol E: Obtain a list of all students who have registered for at least
one class at Purdue and randomly select 100 students from the list.

Protocol F: Obtain a list of all students who have registered for at least
one class at Purdue and group the students on the list by whether they
live on or off campus. Then randomly select 50 students who live on
campus and 50 students who live off campus to survey.

Additional Problems:

