
F77SA1 Introduction to Statistical Science A

Lecture notes
Jennie Hansen
George Streftaris

Contents

Introduction iii

1 Collecting Data 1
1.1 Sampling 1
1.2 Experimentation 23
1.3 Measurement 40
1.4 Looking at data intelligently 45



Introduction
Statistical arguments and the conclusions of statistical analyses guide and inform many decisions in society. Some examples include:
Global warming: The scientific claims both for and against global
warming are often based on statistical analyses - so how can we judge
the validity of the underlying statistical analyses on both sides of the
argument?
Clinical research: NICE (National Institute for Health and Clinical Excellence) recommends whether new drugs should be made available on the NHS. Their recommendations are based on statistical analysis of the effectiveness of the drug. How are the clinical experiments designed and the data collected in order to prove/disprove clinical claims of effectiveness?
Psychological research: You may have heard people say that women
are better at multi-tasking than men, whereas men are better at map
reading than women. What is the basis (if any!) of such claims? How
did researchers design an experiment to investigate these differences between men and women and how do they know that the differences that
were observed between men and women were statistically significant
and could be attributed to a difference in gender?
Opinion polls (e.g. market research, television ratings, etc.): Many corporate decisions are based on survey results. It is important to understand what the percentages in such surveys mean, how they were obtained, and how accurate the information is - otherwise costly mistakes can be made.
Life insurance and pensions: Life insurance premiums and pension calculations are based on estimates of life expectancy. These estimates
are based on statistical analyses of mortality data. Life insurance companies and pension providers would find it difficult to meet their obligations if the underlying statistical calculations are incorrect!

In all of the above examples, the aim of the underlying statistical analysis is
to provide insight by means of numbers. This process usually involves three
stages:
1. Collecting data
2. Organising data
3. Drawing conclusions from the data (inference)
In this module, we look at the statistical principles and techniques used at
each of these three stages in an analysis. At the end of this module students
should be able to
understand the logic of statistical reasoning
understand how statisticians come to their conclusions
be able to evaluate the use of statistical methods in a variety of applications (e.g. science, finance, the media, etc.)
Developing these skills is an important step in the development of the critical
thinking skills which are required in every aspect of professional life.

Chapter 1
Collecting Data

1.1 Sampling

Samuel Johnson is reported to have said, "You don't have to eat the whole
ox to know that the meat is tough." This is one way of describing the idea
behind sampling. Sampling is a way to gain information about the whole by
examining only a part of the whole. Sampling is widely used by industry,
social scientists, political parties, the media, etc.
Example 1.1.1 When a newspaper reports that 34% of the Scottish electorate support independence for Scotland, someone, somewhere, has to have
asked some voters (but clearly not every voter) their opinion. The reported
percentage is based on the sample of voters that were questioned. This is an
example of sampling in order to obtain information about the whole.
Whenever statisticians discuss sampling they use certain precise terms to
describe the procedure of sampling:
Population - this is the entire group of objects about which information is
sought.
Unit - any individual member of the population.
Sample - a subset of the population which is used to gain information about
the population.
Sampling frame - this is the list or group of units from which the sample
is chosen.
Variable - a characteristic of a unit which is to be measured for those units
in the sample.

Discussion
In the above example, the population is the entire Scottish electorate. The
sampling frame is not usually the same as the population. For example, if
the sample was chosen by selecting a subset from the electoral roll, then
the sampling frame would be the electoral roll - however, the electoral roll
does not necessarily contain the name of every eligible Scottish voter. In the
example the variable to be measured is opinion on Scottish independence,
e.g. support/don't support independence.
Example 1.1.2 Acceptance sampling is used by purchasers to check the
quality of the items in a large shipment. In practice, manufacturers often
incorporate acceptance-sampling procedures in their contracts with suppliers.
The purchaser and supplier agree that a sample of parts from a large shipment will be selected and inspected and, based on the number of parts in the sample which meet the purchaser's specifications, the shipment is either accepted or rejected.
Discussion
In the above example, the population and the sampling frame are the same,
i.e. the shipment of items, and the variable measured is whether or not a
part is defective.
Sampling design
We have seen that we start with a question about a population and then
take a sample from the population in order to answer our question about the
population. This approach will give us a meaningful answer provided we can
be confident that the sample is (roughly) representative of the population.
Problem: How should we go about selecting a (representative) sample from
the population?
One possibility is to select a sample consisting of the entire population - this
is what happens when the government conducts a census. In this case we
would obtain the exact answer to our question. It might seem that this
would be the ideal approach, but it is usually problematic to sample the
entire population. Some problems with this approach include:
expensive, time-consuming,
problems with acceptance sampling if units are destroyed as part of the
sampling procedure (e.g. testing the tolerance of fuses in a sample).


Alternatively, we could select a smaller sample.


Smaller samples - a potential pitfall - Selecting a (relatively) small
subset from the population can lead to errors due to the fact that our method
of selecting the sample may have a tendency to select an unrepresentative
sample from the population.
Example 1.1.3 Convenience sampling
Selection of whichever units of the population are easily accessible is called
convenience sampling. For example, companies sometimes seek information
from consumers by hiring interviewers to stop and interview shoppers on
Princes Street in Edinburgh. But a sample of shoppers on Princes Street may
not be representative of the entire population of consumers. For example, car
owners may prefer shopping at out-of-town shopping centres, so any sample
of shoppers on Princes Street would under-represent these consumers and
over-represent others such as elderly consumers or inner city residents.
The example above is one illustration of bias in sampling:
A sampling method is biased if the results from the samples are consistently
and repeatedly different, in the same way, from the truth about the population
(i.e. there is a tendency to always over-estimate or always under-estimate).
Example 1.1.4 Voluntary response sampling
A voluntary response sample chooses itself by, for example, responding to
questions asked by post or during radio or television broadcasts. People who
have strong opinions about an issue are more likely to take the trouble to
respond and their opinions may not represent the population as a whole.
Simple random sampling
We saw in the discussion above that sometimes we can detect when a sampling method is biased. In this section we look at how to develop a method
of sampling which is unbiased. In order to develop an unbiased method of
choosing a sample, we should think a bit about why some of the sampling
methods described in the examples in the last section were biased.
Observation: In the two examples above, the problem with the sampling
method was that certain subsets from the sampling frame were more likely
to be selected than other subsets. So a first step in developing an unbiased
sampling method is to select a sample of n units from the sampling frame in such a way that no subset of n units is more likely to be selected than any other.
A simple random sample (SRS) of size n is a sample chosen in such a
way that every collection of n units from the sampling frame has the same
chance of being selected.
Remark
By taking a simple random sample, we can avoid some of the problems
associated with convenience sampling because no part of the population is
systematically over- (or under-) represented in a simple random sample.
Question: How can we select a simple random sample?
One method is to use physical mixing. Physical mixing is the method that
is used to select the lottery numbers each week. The lottery draw works as
follows:
Start with 49 identical balls and label the balls with the numbers 1 to
49. Then thoroughly mix the balls and select a ball at random from the
49 balls (i.e. choose it mechanically or blindly from the 49 balls).
Key point: at this first stage, every ball has the same chance to be
selected!
After selecting the first ball, the remaining balls are thoroughly mixed
and a second ball is selected at random from the 48 balls.
Note: if the mixing at the second stage is thorough, then any of the
remaining 48 balls has the same chance of being selected. So, after two
stages in the draw, we have a simple random sample of size 2.
This procedure is repeated 4 more times to obtain a total of 6 balls and
because at each stage any of the remaining balls is equally likely to be
selected, the resulting sample of 6 balls is a simple random sample of
size 6 from the original 49 balls, and the numbers on the selected balls
correspond to a simple random sample from the numbers 1,2,..., 49.
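The staged draw described above can be sketched in a few lines of Python (the function name is my own; `random.shuffle` stands in for the physical mixing of the machine):

```python
import random

def lottery_draw(n_balls=49, n_drawn=6, seed=None):
    """Simulate the staged lottery draw: thoroughly mix the remaining
    balls, select one at random, and repeat until 6 balls are drawn."""
    rng = random.Random(seed)
    balls = list(range(1, n_balls + 1))
    drawn = []
    for _ in range(n_drawn):
        rng.shuffle(balls)         # thorough mixing before each stage
        drawn.append(balls.pop())  # every remaining ball is equally likely
    return sorted(drawn)

print(lottery_draw(seed=1))
```

Because the balls are mixed before every draw, each stage treats the remaining balls symmetrically, so every 6-ball subset of the 49 is equally likely - exactly the simple random sample property.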
Remarks
Selecting a random sample using physical mixing is not as easy as it might
appear. One key feature of this approach is that at each stage we must be
able to thoroughly mix the set of objects from which we are selecting a random
object. In the case of the lottery, much effort goes into verifying that the
machine that mixes the balls does, in fact, thoroughly mix the balls before each draw and that there is no bias in the way that the machine selects
each ball (just imagine the uproar if the mixing stage of the procedure was
found to be biased in some way!)
In other situations it can be more difficult to guarantee that the objects
have been thoroughly mixed. For example, when playing card games, someone usually shuffles the deck before dealing the cards to the players. The
purpose of shuffling the deck is to physically mix the deck in such a way that
each player has the same chance of getting any particular card in their hand.
However, does shuffling a deck really thoroughly mix the deck?! Also, how
many times should a fresh deck be shuffled in order to ensure that it has
been thoroughly mixed? Casinos are interested in the answers to these questions and have even commissioned statisticians to work out the answers! It
turns out that a fresh deck should be shuffled 7 times in order to thoroughly
mix the deck (see http://en.wikipedia.org/wiki/Persi_Diaconis and
http://homepage.mac.com/bridgeguys/SGlossary/ShuffleofCards.html).
Another problem with using physical mixing to select a simple random
sample is that the sampling frame may be too large for this method to be
practical. For example, suppose the Admissions Office at Heriot-Watt would
like to select a sample of 50 first-year students to interview regarding their
reasons for choosing to go to university. It wouldn't be practical to get several
hundred identical balls, put student names on the balls, mix the balls, and
select a sample of 50 balls in the same way that the lottery balls are drawn!
Even if we tried to do this (and could find a container big enough to hold all
the balls), there would still be the problem of making sure that the balls were
thoroughly mixed before each draw. It was exactly these sorts of difficulties
that were encountered when the US army conducted the first lottery in 1970
to determine who would be drafted into the army. The order in which men
between 19 and 25 were to be drafted was determined by drawing capsules
which contained birthdays out of a box - men with the first birthday to
be drawn would be drafted first, then the men with the next birthday, etc.
However, because of a flawed mixing process, in the 1970 lottery it turned
out that capsules containing birthdays later in the year were more likely to
be drawn than capsules containing birthdays in January!
Question: So, if it is difficult and sometimes not practical to use physical
mixing to select a simple random sample, what can we do instead?
Answer: One way to select a simple random sample (even from a very large
sampling frame) is to use a table of random digits.


A table of random digits is a list of the 10 digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 having the properties:
1. The digit in any position in the list has the same chance of being any
one of the digits 0,1,2,3,4,5,6,7,8, or 9.
2. The digits in different positions are independent of each other in the
sense that if I know which digit appears in one position in the table,
then it is still the case that the digit in any other position in the table
has the same chance of being any one of 0,1,2,3,4,5,6,7,8,or 9.
A great deal of mathematical ingenuity and computing expertise has been
devoted to generating tables of random digits (they are also important in
applications of cryptology, computer security, etc). However, for this course
we do not need to consider how such lists are compiled. Instead, it is enough
to know how to use such a table. To use a table, we need to know the
following consequences of properties 1 and 2 above:
Any pair of digits in the table has the same chance of being any of the
100 possible pairs: 00, 01, 02, 03,...., 98, 99.
Also, if I know the digits in two adjacent positions in the table, then
the digits in any other two adjacent positions in the table have the
same chance of being any one of the 100 possible pairs, provided neither
position in the pair overlaps with the first pair.
Any triple of digits in the table has the same chance of being any of
the 1000 possible triples: 000, 001, 002, 003,...., 998, 999.
Also, if I know the digits in three adjacent positions in the table, then
the digits in any other three adjacent positions in the table have the
same chance of being any one of the 1000 possible triples, provided none
of the positions in the triple overlaps with the first triple.
The same principles hold for groups of four or more digits from the
table.
To illustrate how to use a table of random digits, we will do some examples
in class.
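Ahead of those examples, here is a sketch of the standard procedure: read successive two-digit groups from the table, treat each group as a label for a unit in the sampling frame, and skip out-of-range labels and repeats. The frame size of 80 and the 01-to-80 labelling are illustrative assumptions of mine, and a pseudo-random generator stands in for a printed table.

```python
import random

def srs_from_digit_table(frame_size, sample_size, seed=0):
    """Select an SRS by reading two-digit groups 00-99 from a table
    of random digits, skipping out-of-range labels and repeats."""
    rng = random.Random(seed)
    # stand-in for a printed table of random digits
    digits = "".join(str(rng.randint(0, 9)) for _ in range(2000))
    chosen, pos = [], 0
    while len(chosen) < sample_size:
        label = int(digits[pos:pos + 2])   # next two-digit group
        pos += 2
        if 1 <= label <= frame_size and label not in chosen:
            chosen.append(label)           # label points at a unit in the frame
    return chosen

print(srs_from_digit_table(frame_size=80, sample_size=5))
```

Property 2 of the table (independence of positions) is what makes the skipping rule harmless: discarding an out-of-range pair does not disturb the chances of the pairs that follow.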
Getting information about the population from the sample
So far we have looked at how to pick the sample (i.e. subset) from a population in such a way that we haven't shown any favouritism. However, once we have selected a simple random sample from the population, there
are some questions that we need to resolve:
How can we justify using information from the sample to tell us something about the population - especially if the sample is only a small
subset of the population?
How does information from the sample tell us something about the
population?
To begin thinking about these questions we need to introduce some more
terminology.
A parameter is a numerical characteristic of the population. It is a
fixed number, but we (usually) do not know its value.
A statistic is a numerical characteristic of the sample. The value of
the statistic is known once we have taken a sample, but its value
changes from sample to sample.
Typically, when we want to know something about a population, our question
about the population can be expressed in terms of an unknown parameter.
After taking a sample from the population, we compute the value of an
associated statistic and use the value of this statistic to estimate the value
of the unknown population parameter.
Example 1.1.5 In the 1980s Newsday (an American weekly news magazine) surveyed a random sample of 1373 parents. The magazine wanted to determine the proportion, p, of parents in the American population who, if given the choice, would have children again. In the sample, 1249 parents said that they would, if given the choice, have children again. So, the proportion in the sample was p̂ = 1249/1373 ≈ 0.91. Note: The fraction p̂ which was computed from the sample is a statistic and we use it to estimate the population parameter p.
Now suppose Newsday selected another random sample of size 1373. We would not expect the number in the second sample who would have children again to be exactly the same as the number in the first sample, but we might still expect the proportion in the sample to be close to 0.91, the proportion in the first sample. In fact, we intuitively expect that if we were to repeatedly select random samples of size 1373 from the population of parents, the corresponding sample proportions would be clustered together.
The way to visualize this clustering is to make a bar graph which reflects the pattern of the values of the sample proportion p̂ when we repeat the sampling several times. To illustrate this, we will think about a simple example (i.e. we will do a statistical thought experiment).
Note! Our observations from this simple example will also apply to more complicated practical situations in statistics.
Example 1.1.6 Suppose that I have a (large) box which contains 5000 beads, all of which are identical except for their colour. Of the 5000 beads, 1000 beads are black and 4000 beads are white, and suppose that you can't see the beads in the box. You would like to determine the proportion, p, of black beads in the box. To do this, you follow this procedure:
You select a simple random sample of size 25 from the beads in the box.
You count the number of black beads in your sample and compute the proportion, p̂, of black beads in your sample. This proportion is your estimate of the proportion of black beads in the box.
Discussion
1. Although this example is somewhat artificial, it gives us a simple model for many situations (including Example 1.1.5 above) - e.g. whenever we wish to determine what proportion of a population would answer Yes to a question, we can model the population as beads where black = Yes and white = No.
2. The proportion, p̂, of black beads in the sample may not be equal to p = 0.2, the proportion of black beads in the population, but we would expect p̂ to be close to 0.2.
3. If we return the beads in the sample to the box, thoroughly mix the beads in the box, and select another simple random sample, the proportion of black beads in the second sample may not be the same as the proportion of black beads in the first sample (but we would still expect it to be close to the population proportion p = 0.2).

4. The proportion of black beads in the entire population is the population parameter and the proportion of black beads in the sample is a sample statistic.
Now to see how the values of the sample proportions cluster together, we will repeat this sampling procedure several times and look at the pattern of the corresponding sample proportions. This pattern is called the sampling distribution of the sample proportion p̂.
It would be very tedious to actually repeat this experiment several times (not to mention the fact that I would need to start with a box containing 5000 beads!). Fortunately, we can use a computer to do this experiment instead. The results of 200 repetitions of sampling 25 beads from 5000 are summarised in the table and bar graph below.
No. black beads:  0     1     2     3     4     5     6     7     8     9     10    11
Proportion:       0.00  0.04  0.08  0.12  0.16  0.20  0.24  0.28  0.32  0.36  0.40  0.44
Frequency:        2     5     17    25    35    38    31    23    12    8     2     2

[Bar graph: Sampling distribution for sample proportions: sample size 25 (200 repetitions). Horizontal axis: proportion (0.0 to 1.0); vertical axis: frequency.]
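A short simulation reproduces this computer experiment (the names below are my own; the frequencies will not match the table exactly because the random draws differ, but the shape of the distribution should be similar):

```python
import random
from collections import Counter

def bead_experiment(reps=200, sample_size=25, n_black=1000, n_total=5000, seed=2):
    """Repeatedly take an SRS of beads from the box and tally
    how many black beads appear in each sample."""
    rng = random.Random(seed)
    box = [1] * n_black + [0] * (n_total - n_black)  # 1 = black, 0 = white
    return Counter(sum(rng.sample(box, sample_size)) for _ in range(reps))

counts = bead_experiment()
for k in sorted(counts):
    print(f"{k:2d} black, p_hat = {k / 25:.2f}: frequency {counts[k]}")
```

Each repetition is one simple random sample of 25 beads; the tallied counts, divided by 25, are the sample proportions whose pattern forms the sampling distribution.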


Discussion
1. The bar graph above (which describes the sampling distribution) shows
that the values of the sample proportions obtained from the 200 repetitions are (more or less) symmetrically clustered around the population
parameter p = 0.2. This symmetry arises because taking a simple random sample is an unbiased sampling procedure. In this example, we
know the value of the population parameter p, but even if p were unknown, it would still be true that the sample proportions from repeated
sampling would be clustered around the population parameter p.
2. On the other hand, if our sampling procedure is biased, the values of the sample proportions will tend to be clustered on one side or the other of the population parameter p, and the bulge in the sampling distribution will be on one side or the other of the population parameter p. This is because bias in the sampling method means that the sample statistic tends to either overestimate or underestimate the population parameter when we repeatedly sample from the population.
3. The spread of the values of the sample proportions gives an indication
of the precision of the sampling method. We will always see some
spread in the values taken by a sample statistic when we repeatedly
sample because there is sampling variability. Since we cannot eliminate
sampling variability, our goal must be to try to reduce the spread in
the sampling distribution of our statistic (i.e. to increase the precision
of the sample statistic).
Question: How can we improve the precision of a statistic obtained from a
simple random sample?
In this (and many other) situations, the precision of a statistic which is based on a simple random sample can be increased by increasing the size of the simple random sample. To illustrate this, I used a computer to perform 200 repetitions of sampling 100 beads from 5000 (sample size = 100) and 200 repetitions of sampling 250 beads from 5000 (sample size = 250). The sampling distributions for the corresponding sample proportions are represented below by bar graphs.

[Bar graph: Sampling distribution for sample proportions: sample size 100 (200 repetitions). Horizontal axis: proportion (0.0 to 1.0); vertical axis: frequency.]

[Bar graph: Sampling distribution for sample proportions: sample size 250 (200 repetitions). Horizontal axis: proportion (0.0 to 1.0); vertical axis: frequency.]


Discussion
Let's look a bit more closely at the bar graphs above which correspond to the sampling distribution of the sample proportion p̂ when the sample size equals 25, 100, and 250 respectively. We can see that as the sample size increases, the values of p̂ (obtained from repeated samples) become more tightly clustered around the population parameter p = 0.2:
When the sample size is 25, the values of p̂ range between 0.0 and 0.44. However, not many of the observed values were as far away from the true proportion p = 0.2 as either the value 0.0 or 0.44. In fact, 194 of the values of p̂ (97% of the 200 values) lie somewhere between 0.04 and 0.36. This indicates that if we were to take another simple random sample of size 25 from the 5000 beads then there is a good chance that the difference between the sample proportion p̂ computed from our sample and the true proportion p = 0.2 is no more than 0.16.
When the sample size is 100, the values of p̂ range between 0.08 and 0.28. Again, not many of the observed values were as far away from the true proportion p = 0.2 as either the value 0.08 or 0.28. In fact, 197 of the values of p̂ (98.5% of the 200 values) lie somewhere between 0.12 and 0.26. In this case, if we were to take another simple random sample of size 100 from the 5000 beads then there is a good chance that the difference between the sample proportion p̂ computed from our sample and the true proportion p = 0.2 is no more than 0.08.
When the sample size is 250, the values of p̂ range between 0.14 and 0.26. Again, not many of the observed values were as far away from the true proportion p = 0.2 as either the value 0.14 or 0.26. In fact, 194 of the values of p̂ (97% of the 200 values) lie somewhere between 0.15 and 0.25. In this case, if we were to take another simple random sample of size 250 from the 5000 beads then there is a good chance that the difference between the sample proportion p̂ computed from our sample and the true proportion p = 0.2 is no more than 0.05.
So, by looking at the sampling distribution, we can work out the likely range of values for p̂, and we can see that this range of values is smaller the larger the sample size. So, precision increases as the sample size increases.
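The tightening of the sampling distribution can be checked directly by simulation, under the same set-up as before (5000 beads, p = 0.2, 200 repetitions; the function name is my own):

```python
import random
import statistics

def phat_spread(sample_size, reps=200, n_black=1000, n_total=5000, seed=3):
    """Standard deviation of the sample proportion p_hat over
    repeated simple random samples of the given size."""
    rng = random.Random(seed)
    box = [1] * n_black + [0] * (n_total - n_black)  # 1 = black, 0 = white
    phats = [sum(rng.sample(box, sample_size)) / sample_size
             for _ in range(reps)]
    return statistics.stdev(phats)

for n in (25, 100, 250):
    print(f"sample size {n:3d}: spread of p_hat = {phat_spread(n):.3f}")
```

The printed spreads shrink as the sample size grows, matching what the three bar graphs show.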
In the example above we knew the value of the population proportion p = 0.2, but our observation that the precision of the sample proportion p̂ increases as the sample size increases holds no matter what the population proportion p equals. In fact, statisticians have studied examples like our model and have worked out the following rule of thumb for determining the precision of p̂ in terms of the sample size:
Rule of thumb: Suppose that you select a simple random sample of size n from a (much larger) population. Then there is a good chance that the magnitude of the difference between the sample proportion, p̂, and the population parameter p is less than 1/√n.
One consequence of this rule of thumb:
The precision of a sample statistic depends on the size of the sample
but not on the size of the population provided the population is much
larger than the sample.
For example, (provided the population is very large) the sample proportion p̂ computed from a simple random sample of size 1000 from the population is likely to differ from the (unknown!) population parameter by at most 1/√1000 ≈ 0.03.
Another consequence of the rule of thumb is that it allows us to determine how big the sample must be in order to achieve a prescribed level of
precision. For example, suppose a national radio station wishes to determine
the proportion p of the population that listen to their station at least once
during a typical week.
Question: Given that the station is able to select a simple random sample from the population of radio listeners, how big a sample should they select in order to have a good chance that the sample proportion p̂ differs from the population proportion by at most 0.02 (i.e. there is at most a 2% error)?
Answer: The statistician's Rule of Thumb says that we should choose the sample size n so that
1/√n = 0.02.
We can re-arrange this equation to get √n = 50, i.e. n = 2500.
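This calculation generalises: for any target error e, solving 1/√n = e gives n = (1/e)². A quick sketch (the function name is my own):

```python
import math

def sample_size_for_error(e):
    """Rule of thumb: choose n with 1/sqrt(n) = e, i.e. n = (1/e)^2."""
    return math.ceil((1 / e) ** 2)

print(sample_size_for_error(0.02))  # 2500, as in the calculation above
print(sample_size_for_error(0.01))  # 10000: halving the error quadruples n
```

Note the quadratic cost of precision: each halving of the target error requires four times as large a sample.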

Note: Now that you have a rule of thumb for determining the precision of a sample proportion you can look out for mistaken criticisms of statistics in the media. For example, a journalist criticised the Newsday results about parenting (see Example 1.1.5 above) by saying that a random sample of size 1373 was too small to give any meaningful information about a population of several million. But our rule of thumb says that the sample proportion p̂ calculated from a simple random sample of size 1373 is likely to differ from the true proportion in the population by at most 1/√1373 ≈ 0.027!
Summary:
Despite the sampling variability of a statistic computed from a simple
random sample (i.e. the value of the statistic varies from sample to
sample), the values of the statistic have a sampling distribution which
can be observed by looking at a frequency bar graph for the values of
the statistic which are obtained by repeated sampling.
When the sampling frame consists of the entire population (as it did in our example of sampling beads), then the values of the statistic computed from repeated simple random samples from the entire population
neither consistently overestimate nor consistently underestimate the
value of the population parameter that we wish to estimate. In other
words, simple random sampling produces unbiased estimates and the
sampling distribution of the statistic bulges around the value of the
population parameter.
The sampling distribution associated with a statistic computed from a
sample gives an indication of the precision of the statistic (i.e. we can
get a rough idea from the sampling distribution of the magnitude of
the typical difference between the value of the statistic computed from
the sample and the value of the population parameter). The precision
of a statistic computed from a simple random sample depends on the
size of the sample and can be made as high as desired by taking a large
enough sample.
Errors in sampling
In the discussion above we saw that we can always expect that there will be
a difference between the value of a sample statistic and the (unknown) population parameter that we wish to determine. In Example 1.1.6 above, the
discrepancy between the sample proportion and the population proportion
p = 0.2 was caused by chance in selecting the random sample (i.e. due to chance, we can't guarantee that the proportion of black beads in the sample will be exactly the same as the proportion of black beads in the population). We also saw that for this very simple example, we could reduce the discrepancy between the value of the sample proportion p̂ and the population proportion p = 0.2 by increasing the sample size. Unfortunately, not all discrepancies between sample statistics and population parameters are the result of chance errors (which can be reduced by increasing sample size). In particular, whenever we are sampling from a human population there are other sources of error that we need to watch out for. Some typical examples of these are described below.
Sampling errors
Sampling errors are errors that arise from the act of taking a sample and cause the sample statistic to differ from the population parameter. Sampling errors arise because the sample is a subset of the entire population.
There are two types of sampling error:
Random sampling errors are the deviations between the sample statistic and the population parameter which are caused by chance when we
select a random sample. The deviations between the sample proportions and the population proportion observed in the example above are
random sampling errors.
Nonrandom sampling errors arise from improper sampling methods.
These errors can arise because the sampling method is inherently biased
(e.g. convenience sampling). Nonrandom sampling errors can also arise
when the sampling frame (the list from which the sample is drawn)
differs systematically from the population.
Example 1.1.7 Suppose that a polling organisation has been commissioned to determine the proportion of Edinburgh's population who favour the introduction of a congestion charge. The polling organisation decides to use the Edinburgh telephone directory as a sampling frame (i.e. list) from which to select a random sample to survey.
Question: Will a sample chosen in this way be representative of the population (i.e. adults who live in Edinburgh)?
Answer: The problem in this example is that using the telephone directory
means people without landline phones (e.g. students and others who rely

16

CHAPTER 1. COLLECTING DATA

primarily on mobile phones and people who cant afford a telephone landline)
and those who are ex-directory will not be part of any sample chosen using
this method. The random sample selected by the polling organisation will be
representative of the population of landline phone users but won't necessarily
be representative of the population under investigation (i.e. adults who live in
Edinburgh). If the views of the excluded members of the population differ
significantly from those who are listed in the telephone directory, then the
sample statistic will be a biased estimate of the population parameter.
Note: If the sampling frame differs systematically from the population, sample statistics will be biased no matter how the sample is selected from the
sampling frame. In other words, simple random sampling cannot give us
unbiased statistics if the sampling frame differs systematically from the population.
Nonsampling errors
Nonsampling errors are errors that are not related to the act of selecting
a sample from the population. These errors can occur even if we used
the entire population as our sample. Here are some typical sources of
nonsampling errors:
Missing data: Sometimes information from a sample is incomplete
because it was not possible to contact some members of the sample or
because some members of the sample refuse to respond. Even if the
entire population was used as the sample, missing data could cause the
results of a survey to be biased if the people who can't be contacted or
who refuse to respond differ in some specific way from the population
as a whole.
Response errors: Some members of a sample may give wrong answers when surveyed. For example, subjects may lie about their age,
weight, income, use of alcohol, cigarettes and drugs, etc. Even subjects that are trying to answer a question truthfully may answer incorrectly because they can't estimate very accurately exactly how many
times they go up and down stairs in a day or how much tea they drink
in a day, etc. Other subjects may exaggerate their answers.
Processing errors: These errors usually occur at some stage in the
process of entering raw data into a computer. Sometimes big errors can
occur simply because a zero has been added or deleted as a number is
recorded. These errors are often spotted by asking whether the results
make numerical sense.
Errors due to the method used to collect data once the sample
has been selected: Once the sample has been selected, the data has
to be collected. In the case of surveys (e.g. market research or opinion
polls) a decision must be made whether to contact subjects in the
sample by post, telephone, online, or by personal interview. Each of
these methods can lead to bias in the results.
Postal surveys are relatively inexpensive but response rates can
be low and, depending on the nature of the survey, there can also
be voluntary response bias.
Telephone surveys use computers to randomly dial numbers (so
even unlisted numbers can be reached). They are also relatively
inexpensive. However, there are still households (mostly poorer
ones) that do not have a telephone, so this leads to some nonrandom sampling error. It is also important in telephone sampling to
try the same number several times and at different times of the
day - otherwise the sample will only contain those who are usually
at home at a certain time of the day.
Some organisations, such as YouGov, now carry out surveys online. Again, not everyone is online so there is potential for some
nonrandom sampling error since those who cannot be reached by
an online survey may be different in some important way from
those that can be contacted online.
Personal (e.g. face-to-face) interviews can result in a higher response rate but can be expensive to conduct. Also, in some cases
face-to-face interviews can lead to response errors. For example,
a face-to-face interview about one's health or lifestyle might involve some embarrassing questions which some subjects would be
reluctant to answer.
Errors due to the wording of the questions in a survey: The
problem is that the wording of a question can be slanted to favour one
response over another. One way to slant a question is to pose it in
terms of a desirable outcome. For example, consider the following two
questions:


Do you favour a 9pm curfew for children under 14 years of age?
Do you favour a 9pm curfew for children under 14 years of age in order
to reduce anti-social behaviour on the streets after dark?
The second question is one example of how to slant a question in order
to try to influence the answer. In this case, the question is phrased to
try to encourage people to say "Yes".
Other sampling designs

In practice, it is often not practical to take a simple random sample from the
population of interest. Some of the practical problems include:
The population may be so large that it is very difficult or too time-consuming to construct a (complete) sampling frame (e.g. it would be
quite time-consuming to construct a complete sampling frame for the
population of all Scottish high school students.)
The sampling frame may be so large (e.g. the electoral roll for the UK)
that it is technically difficult to select a simple random sample from it.
A simple random sample taken from a very large population (e.g. a
simple random sample from all UK adults) is likely to be geographically dispersed. If the sampling data is to be collected by interview,
then tracking down all the members of the simple random sample for
interview is both time-consuming and expensive.
To deal with these and other problems, statisticians and opinion pollsters
have developed more sophisticated methods for selecting representative
samples. Some examples of these more elaborate methods are described
below.
Multistage sampling
Let's consider the problem of interviewing a sample of size 500 from the
population of Scottish high school students. A simple random sample of size
500 from this population (supposing that we can obtain such a sample) is
very likely to be dispersed throughout Scotland and would be expensive to
interview. Instead, we could use the following approach to select a sample:


Select a random sample of size 20 from a list of all Scottish high schools.
Get the school roll for each high school in the sample of 20 schools and
select a random sample of size 25 from each school roll. This gives us
(in total) a sample of size 500.
Discussion
1. The key feature of this multistage sampling example is that at each
stage we make selections at random.
2. This procedure does not select a simple random sample since there are
some subsets of 500 students that are impossible to select by using the
procedure described above. For example, this procedure will never select a subset of students who attend 500 different schools. Nevertheless,
since at each stage we select schools and students at random, we avoid
some of the problems with bias which arise when we don't make sample
selections at random.
3. The other advantage of this multistage sampling design is that the interviewers only have to visit 20 schools rather than travelling to (possibly)
hundreds of different schools across Scotland.
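The two-stage selection just described is easy to express in code. A sketch, assuming a hypothetical sampling frame of schools and school rolls (the school and pupil names are invented):

```python
import random

random.seed(2)  # reproducible illustration

# Hypothetical sampling frame: 350 schools, each with its own roll of pupils.
schools = {f"school_{i}": [f"pupil_{i}_{j}" for j in range(120)]
           for i in range(350)}

# Stage 1: select a simple random sample of 20 schools.
chosen_schools = random.sample(sorted(schools), 20)

# Stage 2: select a simple random sample of 25 pupils from each chosen roll.
sample = [pupil
          for school in chosen_schools
          for pupil in random.sample(schools[school], 25)]

print(len(sample))  # 500 pupils in total
```

Note that chance is used at both stages, which is exactly what distinguishes this design from a convenience sample of 20 handy schools.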
Stratified sampling
To construct a stratified sample, we divide the population into distinct groups
which are called strata. Next, we decide how many units from each stratum
will be included in the sample (the number selected from each stratum will
often depend on what we want to know about the population). Finally, we
select a (simple) random sample of the designated size from each stratum and
combine these samples to form the stratified sample. To illustrate stratified
sampling, we will consider two examples.
Example 1.1.8 Suppose that the University library wants to conduct a survey of Heriot-Watt students studying on campus in order to determine student views on the service provided by the Riccarton library. The population
for this survey is the 6191 students studying on the Riccarton campus, of
which 4699 (75.9%) are undergraduates and 1492 (24.1%) are postgraduates.
A stratified sample of size 200 is selected by selecting a simple random sample of size 152 from the undergraduates and a simple random sample of size
48 from the postgraduate students.


Discussion
1. By selecting a stratified sample, the library can guarantee that 76% of
the students in the sample are undergraduates and 24% of the sample are postgraduates and this matches the percentages in the whole
population of students.
2. By selecting a simple random sample from each group, we can avoid
sampling bias and we can use data from each group to obtain unbiased
estimates for each group separately and for the whole population. For
example, suppose that 87 of the 152 undergraduates surveyed and 34 of
the 48 postgraduates surveyed said that they "Strongly favour" longer
library opening hours. From these data we can estimate pUG = 87/152 = 0.572,
the proportion of undergraduates who strongly favour longer opening
hours, and pPG = 34/48 = 0.708, the proportion of postgraduates who strongly
favour longer opening hours. To estimate the proportion of all students who
strongly favour longer opening hours, we work backwards and estimate
how many students in each group strongly favour longer opening hours
as follows:

0.572 × 4699 ≈ 2688 undergraduates
0.708 × 1492 ≈ 1056 postgraduates

So, we estimate that, in total, 2688 + 1056 = 3744 students out of the
6191 students strongly favour longer opening hours. This gives us an
estimated proportion of

pTotal = 3744/6191 = 0.605.
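The working above can be reproduced in a few lines (the figures are those given in the example; rounding only at the final step gives the same answer):

```python
# Stratum sizes and survey responses from Example 1.1.8.
n_ug, n_pg = 4699, 1492      # undergraduates and postgraduates on campus
p_ug = 87 / 152              # sample proportion of undergraduates in favour
p_pg = 34 / 48               # sample proportion of postgraduates in favour

# Work backwards: estimated number in favour in each stratum,
# then the estimated overall proportion.
estimated_in_favour = p_ug * n_ug + p_pg * n_pg
p_total = estimated_in_favour / (n_ug + n_pg)
print(round(p_total, 3))     # 0.605
```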
Now let's look at another example.
Example 1.1.9 Suppose that the Admissions Office at Heriot-Watt wants
to conduct a survey of undergraduate entrants to find out what the students
thought of the service provided by the Admissions Office. The Office plans
to select a sample of 160 entrants to survey. Based on the data which is
summarised in the table displayed below, the Admissions Office identifies 3
distinct groups of entrants: Home/EU students on the Edinburgh campus,
Overseas students on the Edinburgh campus, and students on the Borders
Campus.


Heriot-Watt Undergraduate Entrants

                     Home/EU   Overseas   Total
Edinburgh Campus        1270        148    1418
Borders Campus           169          1     170
Total                   1439        149    1588
In addition to finding out about general customer satisfaction, the Admissions Office would also like to investigate any differences between these
groups with respect to their satisfaction with the service provided. Now if
they select a simple random sample of size 160 from the 1588 entrants there
will be only a few Overseas students and Borders students in the sample because there are (relatively) few such students in the population of entrants.
To get better (i.e. more precise) information about these two groups it is
necessary to have more of these students in the sample. To accomplish this
the Admissions Office decides to select a stratified sample, and selects simple random samples of size 80, 40, and 40 from the Home/EU students
on the Edinburgh campus, the Overseas students on the Edinburgh campus,
and the students on the Borders Campus, respectively, in order to obtain a
sample of size 160 from the new entrants.

Discussion
1. In this example, the numbers selected from each group do not correspond to the relative sizes of these three groups in the population of
Heriot-Watt undergraduate entrants. This is because the Admissions
Office wants to get more precise information about Overseas students
on the Edinburgh campus and students on the Borders Campus, so it
selects relatively more students from these two groups. Nevertheless,
since we use simple random sampling to select the samples from each
stratum, we can use the data to obtain unbiased estimates for each
stratum. Here is the data:
68 out of 80 Home/EU students (Edinburgh campus),
27 of the 40 Overseas students (Edinburgh campus),
23 out of the 40 Borders campus students
reported that they were "Very satisfied" with the service provided by
the Admissions Office.


The sample proportions for Home/EU, Overseas, and Borders students
are

pH = 68/80 = 0.85,   pO = 27/40 = 0.675,   pB = 23/40 = 0.575.

We can use these proportions to estimate the numbers of Home/EU
students (Edinburgh campus), Overseas students (Edinburgh campus),
and Borders campus students that were "Very satisfied". These are,
respectively:

pH × 1270 = 1079.5,   pO × 148 = 99.9,   pB × 170 = 97.75.

So, we estimate the overall proportion of students who were "Very satisfied" to be

pT = (1079.5 + 99.9 + 97.75)/1588 = 0.804.
2. Taken as a whole, a stratified sample constructed as described above
would over-represent the opinions of Overseas students on the Edinburgh campus and students on the Borders Campus and would underrepresent the opinions of Home/EU students on the Edinburgh campus.
So, for example, we would need to be careful about using the sample to
estimate the proportion of all entrants who are "Very satisfied" with the
service provided by the Admissions Office. In fact, if we just computed
a simple proportion of those in the entire sample who were "Very satisfied" we would certainly obtain a biased estimate! Fortunately, much
as we did in Example 1.1.8, provided that we know the size of each stratum and the size of the sample from each stratum, we can use the data
from this stratified sample to obtain unbiased estimates of population
parameters.
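The weighting described above amounts to multiplying each stratum's sample proportion by its stratum size before combining; a sketch in code, using the figures from Example 1.1.9:

```python
# Stratum sizes and sample proportions from Example 1.1.9.
strata = {
    "Home/EU (Edinburgh)":  (1270, 68 / 80),
    "Overseas (Edinburgh)": (148, 27 / 40),
    "Borders":              (170, 23 / 40),
}

# Estimated number of "Very satisfied" entrants in each stratum, then
# the (unbiased) estimate of the overall proportion.
estimated_satisfied = sum(size * prop for size, prop in strata.values())
population_size = sum(size for size, _ in strata.values())
p_overall = estimated_satisfied / population_size
print(round(p_overall, 3))   # 0.804
```

A simple unweighted proportion over the whole sample of 160 would instead give a biased answer, because the strata were deliberately sampled at different rates.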
Systematic random sampling
As a final example in the section, let's see how to construct a systematic
random sample of size 25 from the students enrolled in this module:
As with selecting a simple random sample, we start with an (ordered) class
list and we assign to each student on the list one (or more) 3-digit number(s).
We then use a table of random digits to select the first person in the sample.
Once the first person is selected, we then select every fifth person (say) on
the list, starting with the randomly selected first person, until we have a
sample of size 25.
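A sketch of this procedure, assuming an invented ordered class list of 125 students (so that taking every fifth person yields exactly 25):

```python
import random

random.seed(3)  # reproducible illustration

class_list = [f"student_{i:03d}" for i in range(125)]  # hypothetical ordered list
step = 5

# Choose the starting person at random from the first `step` names on the
# list, then take every fifth person from that point onwards.
start = random.randrange(step)
sample = class_list[start::step]

print(len(sample))  # 25
```

Only one random choice is needed; the rest of the selection is mechanical, which is what makes systematic sampling so quick in practice.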


Discussion
1. We can select a systematic random sample much more quickly than a
simple random sample because we only need to select the first person
at random. The rest of the sample is obtained by a systematic selection
from the list, starting from a random person on the list.
2. A systematic random sample is not a simple random sample since not
all subsets of size 25 have the same chance of being selected (for example, systematic random sampling will never select the first 25 people
on the class list). Nevertheless, since we select the starting point for
the sample at random, every person on the list has the same chance of
being selected by a systematic random sample. This means that there
is no favouritism in the selection mechanism (i.e. we do not have sampling bias provided there is no underlying bias in the way the names
appear on the class list).
Note: The key features of the sampling designs described above are that each is
based upon a well-defined procedure for selecting the sample, and
uses chance to select units from the population.
We also note that it is possible to combine these methods to construct even
more elaborate random sampling designs. All the sampling methods described above are examples of probability sampling:
A probability sample is a sample chosen in such a way that we know
what samples are possible (not all need be possible) and we know
the chance each possible sample has to be chosen.
Key point: Provided we are working with a probability sample, we can still
use the data obtained from the sample to obtain (unbiased) estimates of the
population parameters we are interested in and we can work out the sampling
distribution for our estimates. By looking at the sampling distribution we
can also work out the likely magnitude of our sampling error.

1.2 Experimentation

Almost everyone has performed some sort of experiment at some time in
order to answer a question of the form:



What happens if we ...... (fill in the blank!)?

We perform experiments because we would like to investigate cause and effect.


Establishing a cause and effect relationship is accomplished by deliberately
imposing a treatment on the objects in the experiment. In contrast, we use
sampling to get a representative profile of the population of interest. It is
very important to remember that sampling and other types of observational
studies are not good ways to establish cause and effect.
Example 1.2.1 Suppose that I select a random sample of adult men in the
50-60 year old age group and conduct a detailed health survey of the members
of the sample. Upon analysing the results of this survey I discover that in
the sample those men who are heavy smokers also tend to have high blood
pressure.
Question: Can I conclude from the pattern observed in the data that smoking causes high blood pressure in men of age 50-60?
No! The data show that there is an association between smoking and high
blood pressure, but it doesn't prove that smoking causes high blood pressure.
The problem is that individuals make their own choices about whether to
smoke or not. It may be that men who choose to smoke differ in some
other way from non-smokers and that this other difference between the two
groups may be the cause of the high blood pressure in the men who smoke.
Nevertheless, data from observational studies can prompt us to ask questions
regarding cause and effect and can prompt further investigation into possible
cause and effect relationships.
In order to explore some of the statistical issues associated with investigating
cause and effect we need to introduce some vocabulary:
Units - These are the basic objects on which the experiment is performed.
When the units are people, we call them subjects (or participants).
Variable - This is a characteristic of a unit which is measured in the experiment.
Response variable - This is a variable whose value we wish to study.
Explanatory variable - This is a variable that explains or causes a change
in the response variable. Explanatory variables are sometimes called factors.
Treatment - A treatment is any specific experimental condition that is
applied to the units in an experiment. A treatment is often a combination
of specific values (called levels) of each explanatory variable.


Discussion
1. In the above example the response variable is blood pressure and the explanatory variable is whether or not the participant in the study smoked. This study is not an experiment because the researcher did
not impose a treatment (i.e. smoking or not smoking) on the participants in the study. So, even though we can identify explanatory and
response variables, this does not mean that the study is an experiment.
2. We justify the use of experimentation to establish causation as follows:
Suppose that we change the level of one or more explanatory variables (and all other experimental conditions remain the same), then
any resulting change in the response variable must be caused by
the changes in the levels of the explanatory variables. For example,
we could investigate the effect of water temperature on colour fastness
of a dye by washing dyed fabric at different temperatures. Provided
we could keep all the conditions in the experiment the same (except
for water temperature), any changes in the colour of the fabric after
washing could be attributed to the effect of the water temperature on
the dyed fabric. Unfortunately, in many situations it can be difficult to
guarantee that nothing affects the response variable except the changes
in the explanatory variables!
The need for an experimental design
In our discussion of sampling we looked at how to sample in order to obtain
a representative sample of the population and to minimise the errors in our
results. Likewise, in experimentation we have to be concerned with how an
experiment is designed.
The most basic type of experiment follows one of these patterns:

(1) Treatment → Observation

or

(2) Observation → Treatment → Observation
In experiment (1), a treatment is applied and its effect is observed. In experiment (2), before-and-after measurements are taken. Now under ideal
conditions (e.g. an experiment in a carefully set up laboratory), experiments
following one of the designs above can give us good results. Unfortunately, it
is not always possible to design an ideal experiment and just as we need to
sample with care, we also need to do experiments carefully in order to draw
conclusions from them.
With sampling we needed to look out for sampling procedures which could
lead to sampling bias. With experimentation (and observational studies)
the problem is usually invalid data from which we are unable to draw any
conclusions, i.e. we cannot determine if the treatment had an effect on the
response variable. Here are some typical situations which result in invalid
data:
Confounding of variables
Sometimes we cannot determine the effect of an explanatory variable on the
response variable because the response variable may be influenced by other
variables which are not part of the treatment used in the experiment. Any
variable which is not an explanatory variable in the experiment but which
may influence the response variable is called an extraneous variable.
Example 1.2.2 An educational researcher who has developed a new method
to teach reading to primary school children decides to test its effectiveness
by asking several head teachers in Edinburgh to introduce the scheme in
their schools. At the end of the academic year the children who have been
taught using the new scheme are tested and the results are compared with the
reading test results from schools that did not participate in the scheme. The
results for the pupils that were taught under the new method were higher,
on average, than those of the children from non-participating schools.
Question: Do the results from the above experiment show that the new
scheme improves reading attainment?
No! The problem is that there may be other factors which have also influenced the performance of the children in the participating schools. For example, the researcher may have favoured contacting head teachers of schools in
more prosperous areas of Edinburgh where average performance on reading
tests is already higher than the city average before the experiment. Also,
head teachers did not have to participate in the experiment. So perhaps
the ones that chose to let their schools participate in the new scheme were
already more motivated to improve reading attainment in their schools than
the ones who didn't participate. The motivation and enthusiasm of the participating head teachers may have helped to improve the reading attainment
of the pupils at least as much as the new scheme! So we cannot draw any
conclusions from this experiment because factors other than the new reading
scheme may have influenced the results of the experiment. This is an example
of confounding:
The effects of two or more variables on a response variable are said
to be confounded if these effects cannot be distinguished from one
another.
In the example above, the motivation of the participating teachers (an extraneous variable) and the method of teaching (the explanatory variable) are
confounded.
Data from observational studies
As already mentioned above, it is usually difficult to determine cause and
effect based on data from observational studies. In particular, we often have
problems with confounding of variables in observational studies. Here's another example of a (comparative) observational study:
Example 1.2.3 A large study used health service records to investigate the
effectiveness of two ways to treat prostate cancer. One treatment was traditional surgery and the other was based on chemotherapy. In each case, the
patient's consultant determined which treatment would be given. The study
found that the patients who received chemotherapy were less likely to survive
for more than 5 years.
Discussion
In this example the response variable is post-treatment survival and the explanatory variable is the type of cancer treatment (i.e. surgery or chemotherapy). This study is not an experiment because the researcher did not impose
the treatment on the patients (each patient's consultant determined the treatment).
Question: Do the results from the above comparative study show that
chemotherapy is less effective as a treatment for prostate cancer?
No! There are other variables that may be confounding the explanatory variable (which is cancer treatment). In particular, the choice of treatment for
each patient was determined by the patient's doctor (not by the researcher
who was doing the study), and the doctor is likely to consider a variety of
factors when deciding which treatment is appropriate. For example, some
patients may have been in such poor health or have other complicating health
problems that surgery would have been too dangerous for these patients. In
these cases, the doctor would be more likely to recommend chemotherapy
instead of surgery. If unwell patients tend to be recommended more often
than healthier patients for chemotherapy, this could also explain why the patients who received chemotherapy tended to have a lower survival rate. So,
in this example, the patients' general health before treatment (an extraneous
variable) and the cancer treatment received (the explanatory variable) are
confounded.
Placebo effect
The response by patients to any treatment in which they have confidence is
called the placebo effect. There are many surprising examples of the power
of the placebo effect in the medical literature. Here are a few:
Many studies have shown that placebos relieve pain in 30-40% of patients, even those recovering from major surgery.
One study found that when a group of balding men was given a placebo,
42% of the men either experienced increased hair growth or did not
continue to lose hair.
In an experiment to investigate the effectiveness of vitamin C in preventing colds it was found that those who thought that they were being
given vitamin C (but in fact received a placebo) had fewer colds than
those who thought that they were being given a placebo (even though
they were, in fact, receiving vitamin C)!
Remark: Because of the placebo effect, clinical trials of drugs and other
medical treatments have to be carefully designed. For example, suppose that
I wish to determine whether a certain medication was effective in reducing
blood pressure. A naive design for an experiment to investigate this question
might be to measure the blood pressure of 40 patients, give each patient
the medication, and then measure their blood pressure after a week on the
medication. The problem with this approach is that any improvement in
a patient's blood pressure might be due to the fact that they expect the
treatment to work (i.e. might be due to the placebo effect) rather than due
to the action of the medication. In other words the placebo effect and the
effect of the medication are confounded.
Experimental design


We've seen above that data from experiments and observational studies can
be invalid due to the confounding of variables. In order to avoid generating
invalid data, we need to develop statistically sound methods for conducting
experiments. In other words, we need a good experimental design (i.e. a good
plan for the experiment). The key idea behind most good experimental designs is comparative experimentation. The basic features of comparative
experimentation are:
1. Start with two equivalent groups.
2. Give the treatment to one of the groups (this group is called the experimental group). The other group (which is called the control group) is
treated in exactly the same way except that this group does not receive
the treatment.
Key point: Extraneous variables influence both groups, whereas the treatment only affects one group.
Warning: Although this experimental design addresses the problem of confounding variables, there is still some room for improving this design! In
particular, comparison will eliminate the problem of confounding only if we
have equivalent groups of subjects. The weakness in the design described
above is that it relies on dividing the units into two equivalent groups. But
how can we be sure that the groups are equivalent? How can we make
sure that one of the groups isn't different from the other in some way that
leads to bias in the experimental results? In particular, how can we avoid
some hidden bias arising due to the way that units or subjects are assigned
to groups?
Just as in sampling we were able to eliminate (some) sources of bias by selecting a random sample, so in experimentation we can use random allocation
to improve our experimental design and reduce any bias in the results.
Implementation of random allocation
The implementation of random allocation of units to experimental groups is
similar to the method of selecting a simple random sample.
Example 1.2.4 Suppose that I want to test a new organic fertiliser on
tomato plants. I start with 40 plants and assign a number to each plant.
Then using a table of random digits, I select 20 of the 40 plants to be fertilised. The other 20 plants receive no fertiliser, but in every other way are
treated exactly the same as the 20 plants in the treatment group. At the end
of the growing season I record the yield of each plant. This is an example of
a randomised comparative experiment because the units are randomly
assigned to groups.
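The random allocation in this example can be carried out with a random-number generator instead of a table of random digits; a minimal sketch:

```python
import random

random.seed(4)  # reproducible illustration

plants = list(range(1, 41))              # the 40 numbered tomato plants

# Randomly select 20 plants to receive the fertiliser;
# the remaining 20 form the control group.
treatment_group = set(random.sample(plants, 20))
control_group = [p for p in plants if p not in treatment_group]

print(len(treatment_group), len(control_group))  # 20 20
```

Every subset of 20 plants is equally likely to become the treatment group, so no favouritism can creep into the allocation.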
Discussion: Because we have allocated the tomato plants randomly to the
two groups, there has been no favouritism in the allocation of the plants, i.e. the composition of each group is roughly the same with respect to the
other extraneous variables such as the health and vigour of the plants. In
other words, neither group is more likely to consist of the healthiest plants
(or the weaker plants). This random allocation of plants to groups averages
out the effect of extraneous variables and ensures that the groups are roughly
equivalent.
The importance of randomised comparative experiments stems from the fact
that we can use the following argument to establish cause-and-effect based
on the results of a randomised comparative experiment:
Randomisation produces groups of subjects that should be similar in
all respects before we apply the treatments.
Comparative design ensures that extraneous variables other than the
experimental treatments operate equally on all groups.
Therefore, (significant) differences in the response variable between
groups must be due to the differences (and the effects) of the treatments.
Further discussion: The more subjects used in a randomised comparative
experiment, the more likely it will be that the treatment groups in the experiment will be roughly equivalent. For example, in the tomato experiment
described above, there is still a chance that when I randomly allocate plants
to groups I will (by chance) allocate many more healthy plants to the group
that gets the fertiliser than to the other group (this would be unlucky but
still possible). To reduce the chance that, in spite of random allocation, I end
up with unbalanced groups, I should make the treatment groups as large as
possible, since this would reduce the chance that one group has many more
healthy plants than the other.


Summary - Principles of experimental design


1. Control the effects of extraneous variables on the response variable by
comparing two or more treatments.
2. Randomly allocate subjects to treatments.
3. Use enough subjects in each group in order to reduce the chance
variation in the results.
Important: In medical experiments it is also necessary to make sure that
the control group gets a placebo!
Example 1.2.5 Suppose that I want to investigate whether Echinacea (a
plant extract) boosts immunity against colds. I start as in the fertiliser
experiment, and randomly allocate subjects to groups. Now suppose that
one group takes Echinacea every day over the winter months and the other
group takes nothing, and suppose that I interview the participants on a
regular basis to find out whether they have contracted colds over the winter.
Discussion: The problem with this experimental design is that I cannot be
sure at the end of the experiment whether any difference in the incidence of
colds between the groups is due to the effect of the Echinacea or to the placebo effect.
The problem is that one group knows that it is taking medication to prevent
colds whereas the other group knows that it is not receiving treatment. So
a difference between the groups may be due to the placebo effect, and in any
case the explanatory variable (Echinacea treatment) and the placebo effect
are confounded.
Question: How can we improve the design of this experiment to eliminate
the confounding of the treatment and the placebo effect?
Answer: The solution is to conduct a randomised, double-blind experiment with a placebo control. In a double-blind experiment neither
the subjects nor the people who work with them know which treatment each
subject is receiving. So, to improve the Echinacea experiment, in addition
to randomising the allocation of subjects to groups, one group should receive the Echinacea and the other group should receive a placebo, and the
participants shouldn't know whether they are receiving the Echinacea or the
placebo. In addition, anyone else involved in the experiment (e.g. anyone
who is involved in recording data from the subjects) shouldn't know who is
receiving the Echinacea and who is getting the placebo. Once the experiment
has ended (e.g. all the data has been collected), the blinds can be removed
so that a statistician can analyse the results.
Note: It is generally accepted that whenever possible, it is desirable that
medical experiments with human subjects are randomized, double-blind with
a placebo control.
Interpretation of results
How can we know that the differences observed between the treatment group
and the control group are significant - i.e. the differences are due to something other than just chance?
Example 1.2.3 again: Let's consider the tomato fertiliser experiment once more.
Suppose at the end of the growing season I harvest the tomatoes from each
plant and weigh them. I then use my data to compute the average yield for
each group and I obtain:
Average yield for fertilised plants: 3.95 kgs
Average yield for unfertilised plants: 3.58 kgs
Question: Since the average yield for the fertilised plants is greater than
the average yield for the unfertilised plants, do these results prove that the
organic fertiliser increases yield?
Discussion: We need to be careful about coming to hasty conclusions! The
problem is that even if both groups received no fertiliser there would still
be a difference between the yield of the first group of plants and the second
group of plants. This is because there will always be some chance differences
between the plants in the groups and this will result in chance differences
between their yields. In order to decide whether these results prove that
the fertiliser increases yield, a statistician must first work out how much
chance variation in the yields we would expect to see between the groups
if neither group is fertilised. Next, the statistician looks at the observed
difference between the unfertilised and fertilised groups. Now if this observed
difference is much greater than the difference that we would expect to see
between two untreated groups, we can conclude that the difference in the
yields is unlikely to be due to just chance variation, and that the best
interpretation of the data is that the fertiliser increases yield. The
statistician would report that the difference between the yields is statistically
significant.
An observed effect or difference of a size that would rarely occur by
chance is called a statistically significant effect or difference.
In practice, whether an observed effect or difference is statistically significant
will depend on both the magnitude of the observed effect or difference and
on the number of subjects in the experiment. You will learn more about
exactly how statisticians determine whether an observed effect or difference
is statistically significant in subsequent statistics modules. For now the main
point is that if you read that a result of an investigation is statistically
significant, you can conclude that the investigators found good statistical
evidence to support the claim that differences in the levels of the response
variable(s) are due to differences in the treatments imposed.
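The statistician's reasoning above can be mimicked with a permutation (randomisation) test: repeatedly re-allocate the observed yields to two groups at random and record how often chance alone produces a difference in means at least as large as the one observed. The individual yields below are hypothetical, chosen so that the group means match the 3.95 kg and 3.58 kg quoted earlier.

```python
import random

def permutation_p_value(treated, control, reps=2000):
    """Proportion of random re-allocations in which the difference in
    group means is at least as large as the observed difference."""
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = treated + control
    n = len(treated)
    count = 0
    for _ in range(reps):
        random.shuffle(pooled)  # one random re-allocation into two groups
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if diff >= observed:
            count += 1
    return count / reps

# Hypothetical yields (kg) for 10 fertilised and 10 unfertilised plants
fertilised = [3.9, 4.1, 4.0, 3.8, 4.2, 3.7, 4.0, 4.1, 3.9, 3.8]
unfertilised = [3.5, 3.7, 3.6, 3.4, 3.8, 3.5, 3.6, 3.7, 3.5, 3.5]

random.seed(2)
p = permutation_p_value(fertilised, unfertilised)
print(p)  # a small value: the observed difference is unlikely to be chance alone
```

A small proportion here corresponds to a statistically significant difference: chance allocation alone would rarely produce such a large gap between the group means.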
Difficulties and issues in experimentation
In the section on sampling we saw that even when we use an unbiased sampling method, there can still be problems with sampling that cannot be
avoided by using a good sampling method (e.g. non-sampling errors such
as nonresponse errors, leading questions, processing errors, etc.). Likewise,
randomised comparative experiments go a long way towards avoiding the
problem of invalid data, but we still need to be on the lookout for difficulties
in experimentation and in the interpretation of experimental results. Here
are (just a few) problems and issues that we need to watch out for:
Applicability of the results (can the results be extended?)
A common problem with the interpretation of experimental results is that
the applicability of the results can be over-stated. We always need to look
carefully at how an experiment was conducted in order to determine to what
population the conclusions apply. In many experiments the researcher has
to select subjects from an available pool of subjects which may not be representative of the population to which the researcher would like to apply the
results. In this case the results will probably be valid for the subject pool
but the researcher must justify why the results can be applied to the larger
population.
Example 1.2.6 Various well-designed clinical trials have shown that using
drugs to reduce blood cholesterol in middle-aged men with high cholesterol
also decreases their risk of a heart attack. Can we conclude from these trials
that, in general, reducing blood cholesterol decreases the risk of heart disease?
Discussion: The problem with drawing general conclusions about blood
cholesterol and the risk of heart attack from the results of these experiments
is that there may be important physiological differences between men and
women (or between men of different ages) which mean that blood cholesterol
level is not as important a risk factor for these other groups as it is for middle-aged men with high cholesterol. Doctors, for example, need to be careful not
to assume that the results of clinical trials are applicable to types of patients
that were not part of the relevant trials.
Lack of realism in the experiment
Another (related) problem with the interpretation of experimental results is
that the experimental treatments (or some other feature of the experiment)
may be unrealistic.
Example 1.2.7 In order to determine whether a food additive is safe, it is
standard practice to test high doses of the additive on laboratory rats. The
additive is deemed unsafe if the experimental group develops significantly
more tumours than the control group.
Discussion: The decision to ban a food additive based on an animal experiment is an example of erring on the side of caution. It is important
to remember that such an experiment does not necessarily prove that the
additive is actually dangerous for human consumption. The problem is that
the experiment is not realistic: humans are not rats and typical doses of the
additive are usually much smaller than the doses given to the rats.
Psychologists and other social scientists often have to devise ingenious experiments to investigate psychological responses to various factors. The difficulty
with some of these experiments is that they are (necessarily) somewhat artificial - e.g. they are conducted in a laboratory, the subjects are aware that
they are participating in a psychological experiment, etc. As a result, one
needs to be careful about generalising the findings of such experiments to
real-world situations.
Dropouts, nonadherers, and refusals in experiments with human
subjects
Experiments with human subjects can be compromised by human behaviour!
When this happens, statisticians and researchers have to figure out how to
make appropriate adjustments in order to try to reduce any bias that may
result from human behaviour. Some typical problems include:
Dropouts: Experiments that continue over a long period of time often
have subjects that drop out before the end of the experiment. It is very
important that researchers try to determine the reasons that participants
drop out. In particular, the researchers should try to determine
whether the reason for dropping out is related to a feature of the experiment. For example, perhaps the subjects receiving one particular
treatment experienced unpleasant side effects and as a result decided to
stop participating. Clearly, their reason for dropping out is very relevant
to the experiment, and the results may be biased because the dropouts did
not complete the experiment.
Nonadherers: A subject who participates in an experiment but who
doesn't follow the experimental treatment is called a nonadherer. There
are many reasons why a subject may 'break the rules'. For example,
an experiment might require participants to take a medication according
to a very careful schedule over several weeks. The difficulty with
such an experiment is that people sometimes aren't very good about
remembering to take medication. If subjects are not taking the medication
according to the experimental guidelines it will be difficult to
determine what the true effect of the medication is!
Refusals: Human subjects have to agree to participate in experiments
and that means that individuals can refuse to participate! Now if there
is no particular reason or pattern to the refusals, then non-participation
of some of the selected subjects may not make any difference to the
validity of the experimental results. On the other hand, if those who
refuse to participate differ in some systematic way from those who
participate then bias can result.
More complicated experimental designs
There are a variety of ways that randomised comparative experiments can
be developed to make more complicated comparisons. We describe some
common variants below.
Completely randomized design with multiple factors/levels


Randomised comparative experiments can be used to investigate the combined effect of more than one variable on the response variable. Variables
can also be set at different levels in order to investigate the effect of the
level of a variable (e.g. dose of a certain drug) on the response variable. Here
is an example of an experiment with two explanatory variables that are set
at various levels:
Example 1.2.8 Clothing manufacturers usually recommend both the temperature (i.e. 30°, 40°, etc.) and the cycle setting ('Cotton' wash, 'Synthetic'
wash, etc.) at which a garment should be washed. To determine the optimal
temperature and cycle setting for a particular material, we can perform
a randomised comparative experiment with multiple factors and levels. In
this case the factors (i.e. explanatory variables) are temperature and cycle
setting and the levels are the various settings for temperature and cycle.
All possible combinations of temperature and cycle settings give us a total of
20 different treatments as shown in the table below (labelled by Roman
numerals):

             30°    40°    50°    60°    90°
Cotton       I      II     III    IV     V
Synthetic    VI     VII    VIII   IX     X
Delicate     XI     XII    XIII   XIV    XV
Wool         XVI    XVII   XVIII  XIX    XX

To carry out the experiment, the researcher obtains 200 pieces of the same
fabric which have all been stained with the same substance, and randomly
allocates 10 pieces of fabric to each of the 20 treatments. At the end of the
experiment the washed pieces of fabric are examined to see how well they
have been cleaned, whether the dye in the fabric has run, etc.
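The treatment combinations and the random allocation step can be sketched as follows. This is an illustrative sketch: the seed and the numbering of the fabric pieces are arbitrary.

```python
import random

temperatures = [30, 40, 50, 60, 90]
cycles = ["Cotton", "Synthetic", "Delicate", "Wool"]
# All 4 x 5 = 20 factor combinations, i.e. the 20 treatments
treatments = [(cycle, temp) for cycle in cycles for temp in temperatures]

pieces = list(range(200))  # 200 stained fabric pieces, numbered 0..199
random.seed(3)
random.shuffle(pieces)     # a random ordering gives a random allocation
# Give 10 randomly chosen pieces to each of the 20 treatments
allocation = {trt: pieces[i * 10:(i + 1) * 10]
              for i, trt in enumerate(treatments)}

print(len(treatments))                # 20
print(len(allocation[("Wool", 90)]))  # 10
```

Shuffling the full list of pieces once and then cutting it into 20 consecutive slices of 10 is equivalent to drawing each treatment group at random without replacement.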
Remark: This experiment allows the manufacturer to discover how the
interaction between the two explanatory variables (temperature and
cycle setting) affects the response variable(s).
Randomized Block Designs
Matching subjects in various ways can be used in conjunction with randomization to produce more precise results than would be obtained by a
simple randomised comparative experiment. This is particularly useful in
the design of experiments where it is thought that extraneous variables (i.e.
variables that are not part of the treatment) may have a big impact on the
response variable. In order to control the effects of these extraneous variables
in the experiment a block design can be used:
A block is a group of experimental units or subjects that are similar
with respect to some extraneous variables that are thought to affect
the response to the treatment in the experiment.
In a randomised block design experiment, the subjects are first grouped
into blocks and then, within the blocks, the subjects are randomly assigned
to treatments.
Note: In a randomised block design, the allocation of subjects to blocks is
not random! The subjects or units are grouped together according to some
characteristics that they have in common. After the subjects have been put
in blocks, the subjects within a block are randomly allocated to treatments.
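This two-stage allocation can be sketched in Python. It is a minimal sketch: the block labels, subject identifiers and treatment names are hypothetical.

```python
import random

def randomise_within_blocks(blocks, treatments, seed=None):
    """Randomly allocate subjects to treatments separately within each
    block, keeping the treatment group sizes as equal as possible."""
    rng = random.Random(seed)
    allocation = {}
    for label, subjects in blocks.items():
        shuffled = subjects[:]
        rng.shuffle(shuffled)  # randomisation happens *within* the block
        for i, subject in enumerate(shuffled):
            # Cycle through the treatments so each one appears
            # (almost) equally often within the block
            allocation[subject] = treatments[i % len(treatments)]
    return allocation

# Hypothetical blocks: patients grouped by gender and initial LDL level
blocks = {
    "men, high LDL": ["m1", "m2", "m3", "m4", "m5", "m6"],
    "women, high LDL": ["w1", "w2", "w3"],
}
alloc = randomise_within_blocks(blocks, ["new drug", "drug A", "drug B"], seed=4)
print(alloc)
```

Shuffling within each block and then cycling through the treatments keeps the treatment groups balanced inside every block, which is the whole point of blocking.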
Here's a simple example of a randomised block design:
Example 1.2.9 A pharmaceutical company wishes to compare the effectiveness of a new drug for reducing levels of LDL cholesterol to the effectiveness
of two commonly used treatments for high levels of LDL cholesterol. It is
thought that the effectiveness of any drug for reducing LDL levels is affected
both by the gender of the patient and by the initial level of LDL in the bloodstream. A total of 600 men and 400 women have agreed to participate in
the clinical trial of these treatments. In order to control for these extraneous
variables, the men are divided into blocks of men with similar levels of LDL
cholesterol and the women are divided into blocks of women with similar
levels of cholesterol. Within each block the subjects are randomly allocated
to treatments. Also, because this is an experiment with human subjects, the
experiment is double-blind - i.e. the subjects do not know which treatment
they are receiving and the staff running the experiment do not know which
treatment a patient has received.
Discussion: Blocking in the experiment above allows a researcher to get a
clearer picture of the differences between the treatments. This is because the
blocks have been chosen to equalise important (and unavoidable) sources
of variation between the subjects. Less important sources of variation are
then averaged out by randomly allocating treatments within the blocks. In
addition, by grouping similar subjects together before randomly allocating
treatments, the researcher can also separately investigate the responses of
subjects in each block to the different treatments.


Question: What would have happened if the researcher had not chosen to
use a block design for this experiment?
Answer: If the researcher used a simpler design, e.g. a double-blind experiment where the subjects are randomly assigned to treatment groups with
no regard to gender or initial levels of LDL, the data from the experiment
could still be used to investigate differences between the responses to the
treatments, but it would be harder to make precise statements about the
magnitude of these differences because each treatment group contains a mixture of (dissimilar) subjects.
A matched pairs design is a special case of a block design. In a matched
pairs design only two treatments are compared. The blocks consist of pairs
of subjects that are matched as closely as possible and the treatments are
randomly allocated. Sometimes a pair can consist of a single subject who
gets both treatments. In such experiments, the order in which the treatments
are given is randomised (since the order of the treatments may influence the
subject's response). Here is an example of such an experiment.
Example 1.2.10 A large food manufacturer has developed a new recipe for
one of the pasta sauces that it manufactures. In order to test whether this
new product will be more popular with consumers than the old product, the
company asks 50 of its employees to participate in a matched pairs experiment. In the experiment the employee sits in a cubicle which has a small
sliding door that connects to the company's product development kitchen.
The employee is given a small dish of pasta with sauce to taste and a questionnaire to fill out. After the employee has tasted the first dish and completed
the form, the dish and form are removed and the employee is given a second
dish of pasta to taste and a second questionnaire to fill out. At the end
of the tasting session, the employee is asked which sauce they preferred. The
order in which the employees are given the two different sauces is random
(i.e. essentially by flipping a coin).
Discussion: In the experiment described above each employee is a (perfectly!) matched pair since all extraneous variables (such as personal food
preferences, etc) are the same for each treatment. The order in which the
subjects taste the food is randomised to average out any effect that order
might have on the response variables.
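The coin-flip randomisation of tasting order can be sketched as follows; the employee labels and recipe names are hypothetical.

```python
import random

def randomise_tasting_order(employees, seed=None):
    """For each employee (a perfectly matched pair), decide at random
    which sauce is tasted first and which second."""
    rng = random.Random(seed)
    order = {}
    for employee in employees:
        first = rng.choice(["old recipe", "new recipe"])  # the coin flip
        second = "new recipe" if first == "old recipe" else "old recipe"
        order[employee] = (first, second)
    return order

order = randomise_tasting_order([f"employee {i}" for i in range(1, 51)], seed=5)
print(order["employee 1"])
```

Because each employee tastes both sauces, only the order needs randomising; an independent coin flip per employee averages out any order effect across the 50 participants.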


Some comments on observational studies


We have seen in this section that properly designed randomised comparative
experiments give us a powerful tool for trying to answer questions about
cause and effect. Unfortunately, there are situations where it is either not
possible (or ethical!) to set up a randomised comparative experiment to
investigate questions of cause and effect.
Example 1.2.11 It is not possible to design a randomised comparative experiment to establish that smoking increases the risk of heart attack. The
problem is that if we suspect that smoking is harmful to human health, it
would be unethical to investigate the effects of smoking by randomly dividing
a group of healthy subjects into two treatment groups and then require one
group to take up smoking in order to discover if the smokers are more likely
to have a heart attack than non-smokers!
Example 1.2.12 Much research is done to investigate various differences
between men and women. In such studies, gender is the explanatory variable.
In a randomized comparative experiment we would like to randomly allocate
participants to treatments (i.e. levels of the explanatory variable) but we
cannot randomly assign people to be either male or female! So, in this case,
we cannot design a truly randomized comparative experiment to investigate
differences between men and women.
Discussion: In the examples above (and in many other situations where it
is not practical to perform a randomised comparative experiment) we have to
rely on data from observational studies and this makes establishing causation
much more difficult. Nevertheless, we can use some of the principles of good
experimental design to improve an observational study. In particular, we can
use comparison and matching. For example, to study whether smoking
increases the risk of heart attack we could start with a randomly selected
group of smokers. We would then select from a large group of non-smokers
individuals who match the individuals in the smoking group with respect
to any extraneous variable that we think might also be a risk factor for heart
attacks (e.g. age, gender, weight, blood cholesterol level, etc). Of course,
there may be other extraneous variables that affect the risk of a heart attack
of which we are unaware and for which we have not matched the groups!
In addition, a statistician may also try to make statistical adjustments for
confounding variables (such as weight, age, etc) in order to make a fairer
comparison between smokers and non-smokers. Of course, statisticians need
to be careful about making such adjustments as this can introduce other
problems of bias.
Moral: We need to be careful whenever we encounter the results of an observational study. The best observational studies will be based on comparison
of matched groups and may have statistical adjustment for confounding variables. Nevertheless, adjustments can sometimes lead to bias and matching
won't be able to control for unknown confounding variables. So, we need to
be wary of any claims that an observational study has proved a cause-and-effect relationship!

1.3 Measurement

In both experimentation and sampling we are interested in studying some
property of the units in the study - e.g. political opinions, physical stamina,
reading ability, intelligence, blood pressure, etc. In order to make concrete
statements about the property, we (usually) attempt to find a way to measure the property numerically:
In statistical science, to measure a property means to assign numbers
to units as a way to represent the property.
Note: Deciding how to measure a property is an important part of any
statistical study. In other words, after we have decided how to sample (i.e.
how to select the sample and how to contact the sample) or how to conduct
the experiment (i.e. the experimental design, etc), the next problem is to
decide how to measure the property of interest.
To take a measurement we must have:
An object to be measured.
A (well-defined) property of the object to be measured.
A measuring instrument that actually assigns a number to represent
the property.
Example 1.3.1 Suppose that a researcher wants to investigate whether a
certain treatment for asthma is effective. The property to be measured in
the experiment is lung function before and after the treatment. To measure
this property, the participant exhales into a peak flow meter and the level of
peak flow is recorded.
Note: Once the researcher has decided how to measure lung function, the
variable is defined in terms of the method of measurement. In this case, the
variable is the peak flow because that is what the researcher actually measures.
Now deciding how to measure a property is easiest when everyone clearly
understands the property that we propose to measure (e.g. height, weight,
etc.) The problems arise when the definition of the property to be measured
is imprecise or disputed.
Example 1.3.2 Suppose that a psychologist wants to measure intelligence.
In this case, there is an immediate problem because human intelligence is
a complex property and there is no universally accepted definition of it.
Without a clear understanding of intelligence, it is difficult for researchers to
agree how to measure it. For example, there is much debate about whether
the standard IQ test is an appropriate measure of a property that is as
complex as intelligence!
Here's another example that illustrates some of the issues that arise when we
try to measure properties in complicated situations.
Example 1.3.3 Suppose we wish to measure an individuals employment
status. Before we can measure this property, we need to define what we
mean when we say that someone is 'employed', 'unemployed' or 'economically
inactive'.
Note: Different organisations may have different ideas about what it means
to say that someone is employed! In the UK, the Office for National Statistics
has adopted the following definitions:
1. A person (aged 16 or over) is employed if in the previous week they
did at least one hour of paid work, or are temporarily away from a job
(e.g. on holiday), or are on a government training scheme, or have done
unpaid work for a family business.
2. A person (aged 16 or over) is unemployed if in the previous week they
did not have a job but they were available to start work within the next
two weeks and had either been looking for a job during the last four
weeks or are waiting to start a job that they have already secured.



3. A person (aged 16 or over) is economically inactive if they are neither
employed nor unemployed (e.g. someone who is looking after the family
and/or is retired).

Once we have precise definitions, we can measure employment status by
assigning each individual a number 1, 2 or 3, according to which of the
above categories the individual belongs to. So the variable in this study is
the category to which each individual is assigned.
Note: In the example above the measurement does not quantify employment status, rather it categorises it.
Discussion: Precise definitions are important if we want to interpret statements such as: 'The unemployment rate in the period from May to July 2008
was 5.5%.' This number is computed from a probability sample of households
across the UK and corresponds to the ratio
(number unemployed in the sample) / (total number of unemployed and employed in the sample).
We note that someone who does not have a job and has stopped looking for
work because they are discouraged is not classified as unemployed (they are
classified as economically inactive)! If we changed the definition of unemployed so that it included such people we would see an increase in the unemployment rate. Not surprisingly, most governments would object to such
a change in the definition! So measuring employment status (though not as
difficult as measuring intelligence) can still be controversial because there can
be political disagreements over the best way to define the categories. Similar
issues arise when we want to measure properties like inflation, poverty levels,
crime levels, etc.
The above examples illustrate some of the problems inherent in trying to
describe numerically a complex property or situation. Since there may not
be an ideal way to measure a given property, we need to develop criteria to
help us decide if one method of measurement (though not perfect) is better
than another method.
Validity
One important question to ask about a variable is whether it is a valid
measure of the property:


A variable is a valid measure of a property if it is relevant and appropriate as a representation of that property.
Discussion: The method of measuring employment status described above
is an example of a valid (though not perfect) measure of the property. It is
both relevant and appropriate.
In contrast, here's a (silly) example of an invalid measurement:
Example 1.3.4 Suppose that I want to measure intelligence. To do this, I
decide to measure an individual's height in centimetres. Clearly, it is invalid
to measure intelligence by measuring someone's height because height is neither relevant nor appropriate as a numerical representation of someone's
intelligence!
Here is a more subtle example of invalid measurement:
Example 1.3.5 Suppose that a small business wishes to measure the level
of employee job satisfaction. It surveys its employees and discovers that 65
of its employees who are under 50 are 'Satisfied' or 'Very satisfied' with their
job, whereas only 43 of its employees who are 50 or older are 'Satisfied' or
'Very satisfied' with their job. Can we conclude from these numbers that the
younger employees are more satisfied with their jobs than the older employees?
No! The company employs 56 people who are 50 or older and 175 people
who are under 50. So the satisfaction rate for younger employees is
65/175 = 0.371, or 37.1%,
whereas the satisfaction rate for older employees is
43/56 = 0.768, or 76.8%.
This satisfaction rate is a more valid measure of the level of job satisfaction
than a simple count of the numbers.
Note: The above example illustrates that often the rate (i.e. fraction,
proportion, or percent) at which something occurs is a more valid measure
than a simple count of occurrences.
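The comparison of counts and rates in Example 1.3.5 is easy to reproduce:

```python
# Counts of satisfied employees and group sizes from Example 1.3.5
satisfied = {"under 50": 65, "50 or older": 43}
employees = {"under 50": 175, "50 or older": 56}

for group in satisfied:
    rate = satisfied[group] / employees[group]
    print(f"{group}: {satisfied[group]} satisfied -> rate {rate:.1%}")
# under 50: 65 satisfied -> rate 37.1%
# 50 or older: 43 satisfied -> rate 76.8%
```

Although the raw count of satisfied employees is larger for the under-50 group, the rate tells the opposite (and more valid) story.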


Lastly, let's think again about the problem of how to measure intelligence.
Because there continue to be debates about how to define intelligence, there
continue to be debates about how to measure it - e.g. is the score of the
standard IQ test a valid measure of intelligence?
One way to resolve debates over whether a variable is a valid measure is to
claim (instead) that the variable has predictive validity:
A measurement of a property has predictive validity if it can be used
to predict success on tasks that are related to the property to be
measured.
Discussion: So, for example, instead of arguing about whether an IQ score
is a valid measurement of intelligence, we could claim (instead) that an IQ
score is valid for predicting success on (for example) school assessments.
Key Point: We can use data to investigate whether a variable has predictive
validity - e.g. by looking at an individual's IQ score and their results on
school assessments we can investigate the claim that IQ scores are valid
for predicting success on assessments. In contrast, data consisting of only IQ
scores can't really help us decide whether an IQ score is a valid measurement
of intelligence!
Accuracy in measurement
Once we have decided how to measure a property (i.e. we have decided what
variable to measure), we need to consider the process of taking the measurements. Ideally, we want our measurement process to be unbiased and
reliable.
A measurement process is unbiased if it does not systematically overstate or understate the true value of the variable measured.
A measurement process has random error if repeated measurements on
the same individual give different results. If the random error is
small, we say that the process is reliable - i.e. repeated measurements
on the same individuals give the same (or approximately the same)
results.
Example 1.3.6 Let's look again at the process of measuring peak flow (as
a representation of lung function). The measurement process is reliable if,
when I take repeated measurements on the same person, I get (more or less)
the same reading on the meter. On the other hand, if the meter is faulty (e.g.
perhaps it gets stuck), then the measurement process will be biased because
the meter will always tend to record a peak flow value that is smaller than
the true peak flow.
If our measurement process appears to be reliable even though it is subject
to random errors, we can improve our measurement by averaging several
measurements taken on the same unit to obtain a more reliable (less variable)
measurement.
Example 1.3.7 Every week I monitor my rabbit's health by weighing him
using my kitchen scales. I take three measurements because the measurements tend to vary by about 25g (plus or minus). I average these measurements to obtain a more reliable measurement of my rabbit's weight.
Note: By averaging measurements we can improve the reliability of our
measurement process, but this does not reduce bias. Bias depends on how
good the measurement instrument is! In the example above, this means that
we need to know whether I have a good set of scales - e.g. do they tend to
give (more or less) the right answer? Or, do they tend to give readings that
are either too large or too small?
Further Discussion: In many situations (as in the examples above), we
can check whether a measurement process is reliable by taking repeated measurements on the same units. However, sometimes researchers have to use
more complicated ways of checking reliability. For example, it is difficult to
check the reliability of psychometric tests because, if the same person takes
a particular type of test over and over again, they will learn how to take
the test and their scores will increase with repeated attempts.

1.4 Looking at data intelligently

Whenever we look at data or summaries of data, there are a few questions
to keep in mind!
Where did this data come from and can it be verified?
It is important to know the source of the data in order to decide whether to
trust it. We can also check the numbers and the statistical methodology if
we know the source! Here are some typical categories of data:

Anecdotal data: Anecdotal data is usually based on a few cases or
examples of some phenomenon. For example, a few years ago there
were a number of stories in the press about the (supposed) increase in
the incidence of certain childhood cancers amongst families living close
to electricity substations. The problem with this sort of data is that it
has not been collected in a statistically sound way, so any conclusions
from such data are dubious (at best). Subsequently, more scientifically
rigorous investigations have shown that the difference between the incidence rate near substations and the rate amongst children who do not
live close to substations is not statistically significant (i.e. the observed
difference can be explained by chance variation).
Moral: Anecdotal data can motivate further research, but on its own
it is unreliable.
Expert opinion: Sometimes data is presented by experts (e.g. government economists, consultants, scientific advisers to environmental groups, etc.) to make a case for some action or to prove a point. When weighing up data presented by an expert, we must consider whether we trust the expert (i.e. what are the expert's credentials and how much experience do they have). It is also worth considering: Where did the numbers come from and how carefully were they analysed? Is the expert subject to various political pressures? Does the expert's judgment more or less agree with the judgment of others in the field, and if not, what explains the discrepancy?
Warning: Experts are often tempted to extrapolate from data to predict future trends and this can be very risky!
Sampling or experimental data: We should be wary about accepting conclusions from survey or experimental data if we do not have
information about how the survey or experiment was conducted. If in
doubt, we should be able to obtain information about the survey methods or the experimental design. Be wary of published results which
cannot be verified!

Does the data make sense?


You should always look closely at numbers and summary statistics and ask
whether the numbers make sense! There are lots of different ways that
numbers can be silly. Here are just a few examples:


Meaningless comparisons (or numbers that are clearly guesses):


Example A cigarette manufacturer claims that its brand contains "13% less nicotine". But what does that statement mean? 13% less than what? This is a meaningless comparison.
Example A newspaper reports that three million people suffered from
the winter vomiting bug last year. This number is a guess based on
2000 cases that were confirmed in the lab. The calculation was based
on a guess that for every confirmed case there are roughly 1500 cases
in the community. The figure of three million (roughly 5% of the UK
population) is eye-catching but not substantiated.
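The back-of-the-envelope arithmetic behind this headline figure is easy to reproduce. This sketch uses the numbers quoted above, together with a rough UK population of 60 million (an assumption consistent with the "roughly 5%" figure):

```python
confirmed = 2000            # lab-confirmed cases
multiplier = 1500           # guessed community cases per confirmed case
uk_population = 60_000_000  # rough figure, assumed for this sketch

estimate = confirmed * multiplier
print(estimate)                               # 3 million
print(round(100 * estimate / uk_population))  # about 5 (%)
```

The calculation is trivial; the point is that the eye-catching three million rests entirely on the guessed multiplier of 1500.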
Internal inconsistency: Do the numbers fit together in a way that
makes sense?
Example Suppose that 20 students take an exam which is marked out of 100. Does the statement "23% of the students scored 70 or above on the exam" make sense? No! Any percentage of 20 has to be a multiple of 5, so the numbers stated above don't fit together in a consistent way.
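A quick consistency check of this kind can be scripted in a few lines (a sketch, using the class size of 20 from the example above):

```python
class_size = 20
# Each student contributes 100/20 = 5 percentage points, so the only
# achievable percentages are multiples of 5 (100 divides exactly by 20 here).
achievable = [100 * k // class_size for k in range(class_size + 1)]

print(achievable)        # multiples of 5 from 0 to 100
print(23 in achievable)  # False: a report of "23%" cannot be right
```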
Implausible numbers: It is also a good idea to ask whether the
numbers are plausible:
Does the order of magnitude seem correct?
Are the numbers in line with a quick "back of the envelope" calculation?
Implausible numbers often arise as the result of typos (e.g. accidentally adding or dropping a 0 can make a big difference!)
Numbers which are too precise or regular: Numbers which are
too precise or regular are sometimes an indication that the data are
phony. A little checking is always a good idea.
Arithmetic mistakes: Watch out for bad arithmetic, especially calculations involving rates or percentages.
Example: I own shares in VBB (Very Big Bank) worth 100. Yesterday their value fell by 20%, but today their value increased by 20%. I am relieved because my shares are still worth 100!
No! 20% of 100 is 20, so yesterday the value of my shares fell to 100 − 20 = 80.
Now, 20% of 80 is 16, so today the value of my shares is 80 + 16 = 96 ≠ 100!
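The share-price arithmetic above can be checked in a couple of lines (a sketch of exactly the calculation in the example):

```python
value = 100.0
value *= 1 - 0.20   # falls by 20% of 100, i.e. by 20, to 80
value *= 1 + 0.20   # rises by 20% of 80, i.e. by 16, to 96

print(round(value, 2))  # 96.0, not 100: each percentage applies to a
                        # different base amount
```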
Another warning: An increase of 100% means that the quantity
in question has doubled, whereas a decrease of 100% means that the
quantity is now 0. A quantity cannot decrease by more than 100%!
Example: The Accommodation Office reports that 13% of first year female students live at home and 15% of first year male students live at home. So, overall, 28% of first year students live at home.
No! Percentages which correspond to distinct groups cannot be added! To compute the correct answer, we need to know either the number of female and the number of male entrants or the percentage of entrants who are female and who are male, respectively:
Given that there are 635 female entrants and 953 male entrants, the proportion of entrants that live at home equals

((0.13)(635) + (0.15)(953)) / 1588 = 0.142, or 14.2%.

Alternatively: 60% of the entrants are men and 40% of the entrants are women, so the percentage of entrants that live at home equals

(0.13)(40%) + (0.15)(60%) = 14.2%.
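Both versions of the calculation can be reproduced directly, using the entrant counts from the example:

```python
female, male = 635, 953
total = female + male               # 1588 entrants in all

# Weight each group's percentage by the size of the group.
overall = (0.13 * female + 0.15 * male) / total
print(round(100 * overall, 1))      # 14.2, not 13 + 15 = 28

# Equivalent: weight each percentage by the proportion in each group.
overall_alt = 0.13 * (female / total) + 0.15 * (male / total)
print(round(100 * overall_alt, 1))  # 14.2 again
```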

Is the information complete?


Always think about whether the information is complete and relevant. A
common way that data can be used to mislead is to omit information that
would allow us to correctly interpret the given information. Here are a couple of examples:
Example A Maltesers advert announced:
"At less than 11 calories each, you'll need new ways to be naughty: Maltesers - the lighter way to enjoy chocolate."


The problem with this information is that it gives the impression that
Maltesers are a low calorie sweet because each one only has 11 calories.
However, under EU regulations a low calorie food is one that has fewer
than 40 calories per 100 g. Maltesers contain 505 calories per 100 g!
Note: The information given was correct but the advert didn't tell you the full story!
Example In order to provide consumers with information about energy efficiency, many products are now rated using a letter scale with A corresponding to the greatest energy efficiency and G corresponding to the lowest energy efficiency.
In a brochure produced by a well-known double-glazing company, the company points out that its standard windows are all B-rated. The brochure also points out that "the minimum standard of energy efficiency required for all new windows installed in Scotland is a D-rating". This information creates the impression that the company's windows are much more energy efficient than its competitors'! What is missing is information about the energy ratings of windows supplied by other companies - i.e. it is not really relevant what the minimum standard is if most companies also supply B-rated windows. (A little checking on the Web shows that many of this company's competitors also supply B-rated windows as standard!)
Again, the information provided is correct but we need more information in order to properly understand how special (or otherwise) the company's windows are.
