
Chapter 1

Challenge: page 31

Section B: Proportional reasoning


1.
(a) (i) 974/4597 = 0.2119
(ii) 2399/4597 = 0.5219

96/242 = 0.3967

96/2198 = 0.0437

(b)

The corrected versions of the false statements are:


d.

945/4597 = 0.2056

In tables designed to convey information quickly and easily, include row or column
2.

averages if appropriate.
l.

If a plot of numeric data has a long right tail we say the data are positively skewed.

p.

An outlier should be removed from the data set only if it is found to be a mistake.

r.

For highly skewed data, the sample median is a more sensible measure of the centre than
the sample mean

u.

About 75% of the observations have a value less than the upper quartile.

(a)
Under 40

Alternative: About 25% of the observations have a value greater than the upper quartile.
v.

(c)

(iii)

bb. It is possible to explore more than two variables at a time, by using techniques such as
colour gradients and subsetting.

                 Under 40    40 or over    Total
Mild cases          16           20          36
Serious cases       15           35          50
Total               31           55          86

16/86 = 0.1860

(b)

(i)

(c)

15/50 = 0.3

Patterns we see in the data may not be facts.

40 or over

(ii)
(d)

(20 + 35 + 15)/86 = 0.8140
20/55 = 0.3636

(iii)

35/86 = 0.4070

3. (a)

Sample Exam / Cecil Test Questions: Pages 32 to 39


1.

(c)

2.

9.

(a)

3.

(a)

4.

(e)

5.

(c)

6.

(e)

7.

(a)

8.

(a)

(b)

10. (d)

11. (c)

12. (d)

13. (e)

14. (c)

15. (c)

16. (c)

17. (a)

18. (b)

19. (b)

20. (d)

21. (a)

22. (c)

23. (c)

24. (a)

25. (a)

26. (e)

Tutorial: Pages 40 to 43

(a)

Scatter plot

(b)

Side-by-side plot on the same scale. (Dot plot, box plot, histogram.)

(c)

Bar graphs of the three countries, showing proportions for each level of Inversions.

(d)
2.

              In default          Not in default        Total
High-risk     40% of 500 = 200    300                   500
Low-risk      2185                77% of 9500 = 7315    9500
Total         2385                7615                  10000

Don't Drink Daily

Total

23.85%

(c)

200/2385 = 0.0839

(d)

300/7615 = 0.0394

(e)

(200/2385) / (300/7615) = 0.0839/0.0394 = 2.129
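The arithmetic in parts (c) to (e) can be checked with a few lines of code. The following is a minimal sketch, not part of the original answers; the function and variable names are illustrative, and the counts are the ones quoted in the answers above.

```python
def proportion(count, total):
    """Return count / total as a proportion."""
    return count / total

# Counts quoted in the answers above (200 of 2385, 300 of 7615).
prop_c = proportion(200, 2385)
prop_d = proportion(300, 7615)

# Part (e): how many times larger the first proportion is than the second.
ratio = prop_c / prop_d

print(f"(c) {prop_c:.4f}  (d) {prop_d:.4f}  (e) {ratio:.3f}")
```

Working from the unrounded proportions gives the same ratio, to three decimal places, as dividing the two rounded values.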

4.
Drink Daily

Side-by-side plot on the same scale. (Dot plot, box plot, histogram.)

Answers will be released on Canvas after the tutorial.

(2)

Lecture and Tutorial Answers: Chapter 1

Low-risk

(b)

Section A: Exploring data


1.

High-risk
40% of 500 = 200

In default

                  Drink Daily           Don't Drink Daily    Total
Male drinker      19% of 5100 = 969     4131                 5100
Female drinker    10% of 4900 = 490     4410                 4900
Total             1459                  8541                 10000

490/1459 = 0.3358


Chapter 2

Tutorial: Page 9

Challenge: page 7

1.

(i)

The corrected versions of the false statements are:


d.

Study 2: We are comparing males and females.

Media reports which describe two variables as being linked or associated cannot be

Study 3: We are comparing the old style and new style of television commercials.

interpreted as a change in one variable will cause a change in the other variable.
h.

(ii)

If one of the treatment levels in an experiment involving people is a placebo, then blinding
should be used.

2. (e)

Study 1: To make the comparison, we are measuring the survival of the cats.
Study 2: To make the comparison, we are measuring the clothing expenditure for the
next 3 months.

Sample Exam / Cecil Test Questions: Page 8


1. (a)

Study 1: We are comparing cats that fell from 1 or 2 storeys, cats that fell from 3 to 5
storeys and cats that fell from 6 or more storeys.

Study 3: To make the comparison, we are measuring the recall scores for the
commercials.

3. (b)

(iii) Study 1: An observational study. There is no allocation (by the researcher) of subjects
(cats) to the number of storeys of the fall. Results are simply observed for cases that
happen.
Study 2: An observational study. There is no allocation (by the researcher) of subjects
(students) to the groups (male or female).
Study 3: An experiment. The researcher allocates which commercial is to be watched
by each subject (shopper).
(iv) It is not possible to do an experiment for study 1 due to ethical and moral
considerations. To do an experiment a sample of cats would have to be allocated a
height and then thrown out of a window at that height.
It is not possible to do an experiment for study 2 as the researcher cannot allocate a
gender to a student.
2.

(2)

False.
The corrected version of the false statement is:
Random allocation of treatments to subjects does not guarantee comparability of
treatment groups, even when we only have small numbers of subjects.

Lecture and Tutorial Answers: Chapter 2


Chapter 3

3.

(a)

(i)

Blinding was used in this study: the technician doing the cleaning and assessing
the results was not aware of which version of oven cleaner was used. (The ovens
were also unaware of which cleaner was used.)

(ii)

The control group is the current version of the oven cleaner.

Challenge: page 13
The corrected version of the false statement is:
i. When the tail proportion is large then the observed difference is not unusual when chance
is acting alone, therefore chance COULD have been acting alone. All we can say is that we
have no evidence against "chance was acting alone" but this cannot be interpreted as
"chance was acting alone".

(iii) There was no blocking in this study. (No other factors apart from version of oven
cleaner were taken into account.)

(b)

The results of this study can be used to establish that any difference in mean
effectiveness score between the two versions of oven cleaner was caused by the
differences in the oven cleaners as there was random allocation of ovens to the two
treatment groups (current and new versions of oven cleaner).

(c)

Our tail probability of 0.17 means the observed difference is not unusual when chance
is acting alone, therefore chance COULD be acting alone.

(d)

We have no evidence that the new version of the oven cleaner gives higher
effectiveness scores, on average, than the current version.

(e)

We cannot conclude that the new version of the oven cleaner has the same
effectiveness scores, on average, as the current version - chance COULD be acting
alone OR something else as well as chance COULD also be acting (we don't have
enough information to determine which one of these two possibilities applies).

(a)

Blinding could be used in this study, if the male patients did not know the type of
surgery they had.

(b)

The Re-randomised data plot shows one of the 1000 re-randomisations under
chance alone. In this re-randomisation the difference between the re-randomised
proportions, under chance alone, is 20/228 − 22/231 = −0.008.

(c)

Our tail proportion of 0.069 means that, when chance is acting alone, a difference
between the group proportions of 0.045 or more is highly unlikely. In the actual study,
an observed difference between the two proportions of 0.045 or more would have been
highly unlikely if chance had been acting alone, therefore we are pretty sure that
chance was not acting alone in the actual study.

(d)

We can conclude that the type of surgery had an impact on whether the male patient
had a major event (death, heart attack or stroke) within 2 years of their surgery. We
are pretty sure that chance was not acting alone in the study and the study is a
well-designed experiment in which the male patients were randomly allocated to one
of the two types of surgery so a causal claim may be made.

Sample Exam / Cecil Test Questions Pages 13 and 14


1. (c)

2. (b)

3. (e)

4. (c)

Tutorial: Pages 15 to 17
1.

In a randomisation test we are assessing the plausibility of the explanation that an observed
difference between two groups is solely due to chance, i.e., is due to chance acting alone.
We can say that the chance alone explanation is implausible if our observed difference is
unusual when chance is acting alone.
In a randomisation test, we randomly re-assign each unit to a group and, with only chance
acting, we calculate the difference between the two groups.
We repeat this re-randomisation a large number of times, and plot all of the differences to
get the re-randomisation distribution. We can then see whether our observed difference
is unlikely under chance alone.

(a)

The tail proportion is the proportion of times we get a difference at least as big as our
observed difference when chance is acting alone.
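The re-randomisation procedure and tail proportion described above can be sketched in code. This is an illustrative sketch only: the function name, the default of 1000 re-randomisations, the fixed seed and the example scores are assumptions, and it counts differences in one direction only (at least as big as the observed difference of means, first group minus second group).

```python
import random
from statistics import mean

def tail_proportion(group_a, group_b, n_rerandomisations=1000, seed=1):
    """Proportion of re-randomised differences at least as big as the observed one."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_big = 0
    for _ in range(n_rerandomisations):
        rng.shuffle(pooled)              # randomly re-assign every unit to a group
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_big += 1
    return at_least_as_big / n_rerandomisations

# Hypothetical effectiveness scores, invented purely for illustration.
new_scores = [7, 9, 8, 10, 9]
current_scores = [6, 8, 7, 7, 8]
print(tail_proportion(new_scores, current_scores))
```

A small returned value corresponds to an observed difference that would be unusual when chance is acting alone.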

(b)

(c)

4.

When the tail proportion in the re-randomisation distribution is less than 5% then:

- the observed difference would be unlikely when chance is acting alone,
  therefore it's a fairly safe bet chance isn't acting alone.
- we have evidence against "chance is acting alone"
- we have evidence that chance is not acting alone

When the tail proportion in the re-randomisation distribution is bigger than 5%
then:

- the observed difference is not unusual when chance is acting alone, therefore
  chance COULD be acting alone
- we have NO evidence against "chance is acting alone"
- chance COULD be acting alone OR something else as well as chance COULD
  also be acting (we don't have enough information to determine which one of these
  two possibilities applies).

Lecture and Tutorial Answers: Chapter 3


Chapter 4
Challenge: Page 7
The corrected versions of the false statements are:
d. Sampling errors are smaller in larger samples than in smaller samples.
j. This is a common misconception. The size of the sampling error is not dependent on the
size of the population.
n. Using random sampling in statistical surveys does not guarantee that each sample(s) will be
representative but it ensures that in the long run over repeated samples of data, the
samples will, on average, be representative.

Sample Exam / Cecil Test Questions: Page 8


1. (b)

2. (a)

3. (c)

4. (e)

Tutorial: Page 9

1.

(4)

Selection bias and self-selection bias.

2.

(3)

The Hobbit, Wild Swans, The Power of One, April Fools Day.

3.

(3)

False.
The corrected version of the false statement is:
It is unlikely that sophisticated sampling projections can correct the results if the
population you are sampling from is different to the one of interest.

4.

(5)

Lecture and Tutorial Answers: Chapter 4


Chapter 5

6.

(a)

The parameter we are interested in is the proportion of New Zealand adults who, in
2015, had trust and confidence in Members of Parliament.

(b)

0.25

(c)

The blue vertical lines in the re-sample plot represent 1000 re-sample proportions
taken with replacement from the original sample. They show the extent of the variation
in the proportions in these 1000 re-samples.

Challenge: Page 13
The corrected version of the false statements is:
d. A bootstrap re-sample is obtained by random sampling, with replacement, from the
population.


(d)

Sample Exam / Cecil Test Questions: Pages 13 to 15


1. (b)

2. (d)

3. (d)

4. (e)

7.

(a)

5. (c)
(b)

Yes. 10 percentage points is inside the bootstrap confidence interval.

(c)

It is a fairly safe bet that the proportion of New Zealand adults who, in 2015, had trust
and confidence in Members of Parliament is somewhere between 2 and 12 percentage
points higher than the corresponding proportion in 2013.

(d)

We don't know but it is a fairly safe bet that it is in this interval.

Tutorial: Pages 16 to 18
1.

A parameter is a numerical characteristic of a population or distribution.


An estimate is a known quantity calculated from the (sample) data to estimate an unknown
parameter.

It is a fairly safe bet that the proportion of New Zealand adults who, in 2015, had trust
and confidence in Members of Parliament is somewhere between 0.21 and 0.28.
The lower and upper limits of the bootstrap confidence interval were obtained by not
including the bottom and top 2.5% re-sample proportions.

The process of using sample data to try and make useful statements about an unknown
parameter is called sample-to-population inference.
2.

We form intervals of believable values rather than just stating individual estimates to give
an indication of the level of uncertainty in the estimate.

3.

(a)

The parameter we are interested in is μFemale, the mean number of text messages sent
per day by females for the population.

4.

5.

(b)

We do not know the value of the parameter.

(c)

Our estimate of the parameter is the sample mean: x̄Female = 20.92.

(d)

The re-samples are randomly selected from the original sample with replacement using
the same sample size as the original sample.

(e)

It is a fairly safe bet that the mean number of text messages sent per day by females
is somewhere between 14.3 and 28.4.
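The bootstrap procedure described in these answers (re-sample with replacement at the original sample size, then drop the bottom and top 2.5% of the re-sample estimates) can be sketched as follows. The function name, the seed and the data are hypothetical; the course's own software may differ in detail.

```python
import random
from statistics import mean

def bootstrap_interval(sample, n_resamples=1000, seed=1):
    """Bootstrap interval for a mean: middle 95% of the re-sample means."""
    rng = random.Random(seed)
    n = len(sample)
    # Each re-sample is drawn with replacement and has the original sample size.
    resample_means = sorted(
        mean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    cut = n_resamples * 25 // 1000       # number of means in each 2.5% tail
    return resample_means[cut], resample_means[-cut - 1]

# Hypothetical numbers of text messages per day, invented for illustration.
texts_per_day = [5, 12, 20, 8, 45, 30, 15, 2, 25, 18, 60, 11]
lower, upper = bootstrap_interval(texts_per_day)
print(f"Fairly safe bet: the mean is between {lower:.1f} and {upper:.1f}")
```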

(a)

Each point in the bootstrap distribution represents the difference between the female
and male means when the original sample has been re-sampled with replacement.

(b)

It is a fairly safe bet that the mean number of text messages sent per day by females
is somewhere between 12.7 lower and 14.5 higher than the mean number of text
messages sent per day by males.

(c)

As 0 is in the bootstrap confidence interval, it is believable that, on average, males and


females send the same number of text messages per day.

Selection bias: the sample is only from one company and is also only for the number of
calls made on Tuesdays.

Lecture and Tutorial Answers: Chapter 5


Chapter 6

4.

Challenge: Page 20
The corrected versions of the false statements are:
q

(a)

(3)

(b)

(5)

5. (a)

(i)

Parameter

(ii)

Estimate = x̄, the mean exam mark for the sample of thirty STATS 108
students = 31.97 marks.

There is no way of knowing whether a confidence interval actually contains the true

unknown value of the parameter. We simply take comfort in the fact that the method
works (i.e., produces confidence intervals that do contain the true value) most of the
time. For example, approximately 95% of 95% confidence intervals contain the true
value of the parameter.
Increasing the level of confidence increases the value of the t-multiplier and hence
increases the width of the confidence interval but has no effect on the value of the
estimate.

(iii) Formula = estimate ± t × se(estimate) gives x̄ ± t × se(x̄)

(iv) se(x̄) = 1.7251
(v)

2. (c)

3. (d)

4. (b)

5. (a)

x̄ ± t × se(x̄) = 31.97 ± 2.045 × 1.7251 = 31.97 ± 3.5278 = (28.44, 35.50)

(vii) There are many ways of interpreting a confidence interval. Two different ways
follow.

6. (a)

(b)
Tutorial: Pages 22 to 25
Section A: Confidence intervals for a single mean or proportion
1.

2.

3.

t-multiplier = 2.045

(vi) 95% confidence interval for μ:

Sample Exam / Cecil Test Questions: Page 21


1. (b)

= μ, the population mean exam mark for the STATS 108 exam.

(1)

With 95% confidence, we estimate that the population mean exam mark is
somewhere between 28.44 and 35.50 marks.

(2)

With 95% confidence, we estimate that the population mean exam mark is
31.97 with a margin of error of 3.53.
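Both interpretations come from the same calculation, estimate ± t-multiplier × standard error. A minimal sketch of that arithmetic, using the values from this answer (the function name is illustrative and the t-multiplier is entered by hand rather than looked up):

```python
def confidence_interval(estimate, t_multiplier, standard_error):
    """Return (lower limit, upper limit, margin of error)."""
    margin = t_multiplier * standard_error
    return estimate - margin, estimate + margin, margin

# Values from this answer: sample mean 31.97, t-multiplier 2.045, se 1.7251.
lower, upper, margin = confidence_interval(31.97, 2.045, 1.7251)
print(f"({lower:.2f}, {upper:.2f}), margin of error {margin:.2f}")
```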

We don't know. The population mean mark is not known so we don't know whether
this particular 95% confidence interval contains the population mean. However, in the
long run, the population mean will be contained in 95% of the 95% confidence intervals
calculated from such samples.

A measure of the amount a sample estimate varies from sample to sample is called the
standard error of the estimate.

Section B: Confidence interval for a difference in means or proportions

It roughly measures the average distance between an estimate and the population
parameter over all possible samples of a given size that can be taken from the population.

1.

(4)

2.

(a)

Situation (b): Single sample, several response categories

(b)

Situation (a): Two independent samples

(c)

Situation (c): Single sample, two or more Yes/No items

(d)

Situation (a): Two independent samples

(a)
(b)

Situation (c): Single sample, two or more Yes/No items .


(i) Parameter = pW − pG, the true difference in the proportion of white Spanish

If we were to take a huge number of samples of 40 days and calculate their sample means
then we estimate that the average distance these sample means would be from the true
population mean would be roughly 2.12.
The formula for calculating a confidence interval is:
estimate ± t-multiplier × standard error(estimate)

3.

For a specified level of confidence and number of degrees of freedom, a t-multiplier is the
number of standard errors between the estimate and each confidence limit.
For a given level of confidence, the t-multiplier decreases as the degrees of freedom
increase.
The t-multiplier multiplied by the standard error of the estimate is called the margin of error
and is half the width of a confidence interval.

Lecture and Tutorial Answers: Chapter 6

(ii)

prisoners who were infected with TB and the proportion of Gypsy Spanish
prisoners who were infected with TB.
Estimate = p̂W − p̂G, the difference in the proportion of the sample of white
Spanish prisoners who were infected with TB and the proportion of the sample of
Gypsy Spanish prisoners who were infected with TB
= 496/886 − 74/152 = 0.5598 − 0.4868 = 0.0730
(iii) Formula = estimate ± t × se(estimate) gives (p̂W − p̂G) ± t × se(p̂W − p̂G)


(iv) se(p̂W − p̂G)
Sampling situation (a)
se(p̂W − p̂G) = 0.0438
(v)

For a 95% confidence interval with df = ∞ use t = 1.96

(vi) 95% c.i. is: (p̂W − p̂G) ± t × se(p̂W − p̂G)
= 0.0730 ± 1.96 × 0.043837
= 0.0730 ± 0.08592
= (−0.0129, 0.1589)
(vii) With 95% confidence, we estimate the proportion of white prisoners who were
infected with TB to be somewhere between 1.3 percentage points lower than and
16 percentage points higher than the proportion of Gypsy prisoners who were
infected with TB.
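The standard error quoted in (iv) is given without working. The sketch below assumes the usual formula for a difference between proportions from two independent samples, sqrt(p1(1 − p1)/n1 + p2(1 − p2)/n2); that formula is an assumption here (it is not written out in these answers), although it reproduces the quoted value of 0.0438. The function name is illustrative.

```python
from math import sqrt

def diff_proportions_ci(x1, n1, x2, n2, t_multiplier=1.96):
    """Estimate, standard error and interval for p1 - p2 (independent samples)."""
    p1, p2 = x1 / n1, x2 / n2
    estimate = p1 - p2
    # Assumed formula: standard error for two independent samples.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = t_multiplier * se
    return estimate, se, (estimate - margin, estimate + margin)

# Counts from this answer: 496 of 886 white prisoners, 74 of 152 Gypsy prisoners.
estimate, se, (lower, upper) = diff_proportions_ci(496, 886, 74, 152)
print(f"estimate {estimate:.4f}, se {se:.4f}, interval ({lower:.4f}, {upper:.4f})")
```

The lower limit is negative, which is why the interpretation in (vii) runs from "lower than" to "higher than".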
5.

(3)


Chapter 7

3.

Challenge: Page 18
The corrected versions of the false statements are
h. In a t-test, the sidedness of the alternative hypothesis (one-sided or two-sided) is
determined by what we expect to be true if the null hypothesis is not true, i.e., if parameter
= hypothesised value is not true, do we expect parameter > hypothesised value (1-sided),
or parameter < hypothesised value (1-sided), or don't we know which direction, in which
case we use parameter ≠ hypothesised value (2-sided). We must not use the data to decide
which relation (>, < or ≠) to use in the alternative hypothesis.

l. A large P-value means that we have nothing against the null hypothesis so it could be true
. . . which doesn't mean that it is true! (See Question (p).)

q. Same as l above. Nonsignificant results (large P-values) do not mean that the null
hypothesis is true, they mean it could be true.

x. See Example 1, page 12, (birth month effect on height example). It is possible to have
established the existence of an effect (small P-value, statistical significance) but for that
effect to be so small as to be of no practical importance/significance. (See Question (w).)

P-value     Evidence against H0
> 0.10      none
0.10        weak
0.05        some
0.01        strong
0.001       very strong

4.

P-value < 5%

5.

Nothing.

6.

A confidence interval.

7.

A one-tailed test is used when the investigators have good grounds, before the study began,
for believing the departure from the null hypothesis goes in one particular direction.
Otherwise, or if in doubt, a two-tailed test is used. Good grounds mean that there is prior
information or there is a theory to tell the investigators which way the study is likely to go.

8.

The t-test statistic measures the number of standard errors the estimate is away from the
hypothesised value.
The more standard errors the estimate is away from the hypothesised value, the larger the
magnitude of the t-test statistic.
The larger the magnitude of the t-test statistic, the smaller the resulting P-value and hence
the stronger the evidence against the null hypothesis.

Sample Exam / Cecil Test Questions: Page 19


1. (b)

2. (c)

3. (e)

4. (b)

5 (d)

The smaller the magnitude of the t-test statistic, the larger the resulting P-value and hence
the weaker the evidence against the null hypothesis.
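The link between the size of the t-test statistic and the P-value can be illustrated with a short calculation. This is a sketch under a stated assumption: the P-value is computed from the standard Normal distribution, which corresponds to a t-distribution with infinite degrees of freedom; for small samples a t-distribution should be used instead. The numbers are the estimate (0.0730) and standard error (0.0438) from the TB example in these answers, and the function names are illustrative.

```python
from statistics import NormalDist

def t_statistic(estimate, hypothesised_value, standard_error):
    """Number of standard errors the estimate is from the hypothesised value."""
    return (estimate - hypothesised_value) / standard_error

def two_sided_p_value(t0):
    """Two-sided P-value, assuming infinite degrees of freedom (Normal)."""
    return 2 * (1 - NormalDist().cdf(abs(t0)))

t0 = t_statistic(0.0730, 0, 0.0438)
p_value = two_sided_p_value(t0)
print(f"t0 = {t0:.4f}, P-value = {p_value:.3f}")
```

A statistic of larger magnitude passed to `two_sided_p_value` gives a smaller P-value.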

6 (e)
9.

Tutorial: Pages 20 to 22

When we deal with studies in which the data have been produced by:

random assignment of units to treatment groups (an experiment) we can make


experiment-to-causation inferences.

random sampling of units from a population or populations we can make sample-to-population inferences.

Section A: Quiz
1.

We test the null hypothesis and determine how much evidence we have against it.
The null hypothesis usually takes a sceptical point of view: the researcher's hunch is
nonsense, there is nothing new or interesting happening, there is no effect.
In most situations the researcher hopes to disprove or reject H0.
The alternative hypothesis corresponds to the research hypothesis. It usually takes the
form that something is happening, there is a difference or an effect, there is a relationship.
In most situations the researcher hopes to give support to H1 by showing that H0 is not
believable.

2.

To measure the strength of evidence against the null hypothesis, we calculate a P-value.
The P-value is the conditional probability of observing a test statistic at least as extreme as
that observed, given that the null hypothesis is true.
We can estimate P-values either by a theory-based approach (e.g. t-tests) or a simulation-based approach (e.g. randomisation tests).

Lecture and Tutorial Answers: Chapter 7


Section B: Doing Tests by Hand


1.
(a) Parameter = pW − pG, the true difference in the proportion of white Spanish prisoners

Section C: Interpreting Output and Interpretation Issues


1.
(a) Parameter = μ1 − μ2, the difference between the mean daily revenue for laundry 1 and

who were infected with TB and the proportion of Gypsy Spanish prisoners who were
(b)

infected with TB.


H0: pW − pG = 0

(c)

H1: pW − pG ≠ 0 (2-sided hypothesis)

(d)

Estimate = p̂W − p̂G, the difference in the proportion of the sample of white Spanish

(e)

(c)

H1: μ1 − μ2 ≠ 0

(d)

t0

(e)

(ii)

(f)

Use estimate ± t × se(estimate)

estimate = 0.0730, se(estimate) = 0.043837, t = 1.96
95% confidence interval is: 0.0730 ± 1.96 × 0.0438 = (−0.0128, 0.1588)

We have weak evidence that the proportion of white Spanish prisoners who were
infected with TB is different to the proportion of Gypsy Spanish prisoners. We estimate
that the proportion of White prisoners who had TB is somewhere between 1
percentage point lower than and 16 percentage points higher than the proportion of
Gypsy prisoners who had TB.


With 95% confidence, we estimate that the mean daily revenue of the first laundry
is somewhere between $1.96 less than and $69.41 more than the mean daily
revenue of the second laundry.
If the true difference in population means is somewhere in this interval, then it
could be as small as $1.96 (which is not of practical importance) or as big as
$69.41 (which is of practical importance).

The observed difference, 0.073, is not statistically significant at the 5% level (even
though we have weak evidence against H0, it is not strong enough for the test to be
statistically significant at the 5% level).

(i)

We have some evidence:


- against H0 in favour of H1.

The observed difference, $33.70, is not a statistically significant result at the 5%


level (even though we have evidence against H0, it is not strong enough for the
test to be statistically significant at the 5% level).

We have weak evidence:


- against H0 in favour of H1.

With 95% confidence, we estimate that the proportion of White prisoners who had TB
is somewhere between 1.3 percentage points lower than and 16 percentage points
higher than the proportion of Gypsy prisoners who had TB.

(33.724 − 0)/17.449 = 1.933.

- that the mean daily revenue for laundry 1 is not the same as that for laundry 2.

(0.0730 − 0)/0.0438 = 1.6667

(h)

(2-sided hypothesis)

- that a laundry effect exists for daily revenue.

- that the proportion of white Spanish prisoners who were infected with TB is not the
same as the proportion of Gypsy Spanish prisoners who were infected with TB.

(g)

(i)

Sampling situation (a)

(iii) t0
(f)

that for laundry 2.


H0: μ1 − μ2 = 0

The estimated difference is 1.933 standard errors away from the hypothesised
difference.

prisoners who were infected with TB and the proportion of the sample of Gypsy
Spanish prisoners who were infected with TB.
496/886 − 74/152 = 0.5598 − 0.4868 = 0.0730
(i) t0 = (estimate − hypothesised value) / std error
(ii)

(b)

We have some evidence that there is a difference between the mean daily revenues
for the two laundries, with the mean daily income for laundry 1 being higher. We do
not have sufficient information to be able to determine whether the true difference in
mean daily incomes is big enough to be of any practical importance.
Even though it is plausible that the difference between the two laundries is so small as
to be of no practical importance, it is also plausible that the mean daily income of
laundry 1 is sufficiently greater than that of laundry 2 as to be of practical importance.
Therefore we should recommend laundry 1.

2.

(i)

P-value < 0.05

(ii)

P-value > 0.05


Chapter 8

2.

Exercises for discussion


Page 4
1.

similar as possible with respect to all other variables, such as parents' education level. It
allows us to classify all other explanations for the observed difference, apart from

What might be an explanation (other than the breastfeeding) for the significant difference
in the mean GCI scores between the breastfed and the non-breastfed infants?

breastfeeding, as chance explanations. If we find a significant difference between the two

This observational study allows us to claim that breastfeeding (the factor of interest) and
GCI scores (the response) are related:

Breastfeeding

The study would need to be a randomised experiment. We would randomly determine


which mothers would breastfeed and which mothers would not. The random assignment of
mothers to breastfeed and not to breastfeed is an attempt to have the two infant groups as

groups then we may conclude that the breastfeeding is the real cause of the difference.
Whether it would be possible or even ethical to randomly direct mothers as to how to feed
their infants is another issue altogether.

GCI score
3.

a.

The population to which the link between breastfeeding and GCI may apply should
not be systematically different from the sample of infants recruited in this study. We
would need to be able to reasonably assume that the 323 recruited infants were a
random sample from the described population. For example, the link between
breastfeeding and GCI may not hold true for a population which included infants of
other races or ethnicities.

b.

The infants in the study should be randomly selected from all New Zealand infants.
That is, the infants should be a random sample of all New Zealand infants.

Then, this relationship could be a causal relationship:

Breastfeeding

GCI score

Breastfeeding results in an increase (is the cause of the increase /explains the increase) in
the mean GCI score.
OR
There could be another variable (a lurking or confounding variable) which has an effect
on both breastfeeding and GCI score and, as such, is the real cause of the difference
between the mean GCI scores.
For example, maybe breastfeeding is affected by parents' education level (better educated
parents tend to breastfeed their infants) and maybe there is a causal relationship between
parents' education level and GCI score (parents' higher education levels result in a higher
GCI score for their children at age 4 years).

Breastfeeding

GCI score

Parents ed.
level
Then an alternative explanation for the increase in the mean GCI scores would be the
higher level of education of the parents.
In reality in this case, we would not be able to identify the real cause of the higher mean
GCI score for the breastfed infants; it could have been the breastfeeding or it could have
been the education level of the parents or it could have even been a combination of both
breastfeeding and the education level of the parents or even some other unidentified
lurking variable.
In an observational study the real cause of a significant difference is able to be identified
very rarely. The real cause could be the factor of interest or it could be a lurking or
confounding variable.
Lecture and Tutorial Answers: Chapter 8


Challenge:

Sample Exam / Cecil Test Questions

Part I, page 8

Part I page 9

The corrected version of the false statements is:

1. (c)

g.

2. (d)

3. (b)

4. (e)

5 (b)

4. (e)

5. (a)

6. (a)

7. (b)

4. (d)

5 (d)

6. (d)

7. (e)

A two sample t-test can still work quite well (especially for large samples) even if there are
clear indications in the data that the Normality assumption is not true.

Part II, page 17

Part II, page 18 and 19


1. (e)

2. (a)

3. (b)

Corrected versions of the false statements are:


b.

With paired data, each observation in one group is paired with an observation in the other
group and hence the two groups of data are NOT independent.

e.

With paired data, we analyse the differences. The paired data t-test is mechanically
equivalent to a one sample t-test on the differences. The necessary conditions for
conducting the test are checked by plotting the differences NOT the 2 groups of data.

f.

Same reason as e. Plot the differences NOT the 2 groups of data.

r.

The one sample t-test can still work quite well (especially for large samples) even if there
are clear indications in the data that the Normality assumption is not true.

u.

A large P-value provides no evidence against the hypothesised value but it does not mean
that the hypothesised value is true.

Part III, pages 31 to 33


1. (d)

2. (b)

3. (e)

8. (c)

9. (e)

10. (b)

Part III, page 30


Corrected versions of the false statements are:
b.

The null hypothesis in an F-test for one-way analysis of variance is that all of the underlying
means are equal.

d.

The alternative hypothesis in an F-test for one-way analysis of variance is that some of the
underlying means are different.

f.

The alternative hypothesis in an F-test for one-way analysis of variance is that at least two
of the underlying means are different.

j.

If the P-value for an F-test for one-way analysis of variance is large then the null hypothesis
is believable.

m.

If the P-value for an F-test for one-way analysis of variance is very small this suggests that
there are differences between at least two of the underlying means, but we would need to
look at pairwise confidence intervals to estimate the size of any differences.

p.

The assumption that the underlying distributions are Normally distributed is not critical for
the F-test for one-way analysis of variance. The F-test for one-way analysis of variance is
reasonably robust to departures from the Normality assumption.

r.

One of the assumptions for the F-test for one-way analysis of variance is that the underlying
population standard deviations are all equal.


Short Response Questions

Tutorial: Pages 35 to 39

Page 34

Section A: Two Independent Samples or Paired Comparisons

1.

1.

(a)

2.

(a)

It is a method which uses a comparison of a measure of spread between group means


with a measure of overall spread within groups to determine whether the data provide
evidence that the underlying group means are not all the same.
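The comparison described here, spread between group means against spread within groups, can be sketched as the F statistic of a one-way analysis of variance. The function name and the data are hypothetical; the sketch computes only the test statistic f0, not the P-value.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Spread between the group means, weighted by group size.
    between_ss = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Spread of the observations about their own group mean.
    within_ss = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (between_ss / (k - 1)) / (within_ss / (n_total - k))

# Hypothetical data for three groups, invented for illustration.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f0 = f_statistic(example_groups)
print(f"f0 = {f0:.2f}")
```

Large values of f0 arise when the group means are far apart relative to the variation within the groups.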

2.

When we are testing for equality of the underlying means of more than two groups.

3.

H0: The underlying means are all identical.

Paired data.

(b)

Two independent samples.

(c)

Paired data.

A t-test on the differences is more appropriate. A pair of observations is made on the same
subject so this is paired comparison data.

(b)

Since we have paired data we look at the dot plot of the differences. The dot plot shows
differences centred below 0 (current purchases higher than previous purchases) and slight
negative skewness.

H1: Differences exist between some of the underlying means.

H0: μDiff = 0 vs H1: μDiff ≠ 0

4.

Large values of f0 provide evidence against H0.

5.

A large P-value tells us that we have no evidence against the underlying means all being
equal, i.e., it is believable that the underlying means are equal.
A small P-value tells us that we have evidence against the underlying means all being equal,
i.e., we have evidence that differences exist between some (possibly all) of the underlying
means.

P-value = 0.033

A small P-value tells us nothing about which means differ from one another, and it also tells
us nothing about the size of any differences.

6.
7.
8.

9.

(c)

We have some evidence against there being no difference between the mean amounts of
current and previous spending. It appears that, on average, access to the cable network
is associated with an increase in spending by viewers. With 95% confidence, we estimate
that viewers spend, on average, between $3.10 and $62.50 more when they have access
to the cable network.

From the Tukey confidence intervals for differences between pairs of underlying means.
The observations within each sample are independent. (Critical)
The samples are independent. (Critical)
The underlying distributions are Normally distributed. (The test is reasonably robust against
departures from this assumption, especially when the sample sizes are similar and the total
sample size is large)
The standard deviations of the underlying distributions are equal. (The test is reasonably
robust against departures from this assumption, but the confidence intervals are not.)
The multiple comparisons problem

10. The F-test is reasonably robust against departures from this assumption so we can rely on
the P-value.
The confidence intervals are not robust against departures from this assumption so we
cannot rely on them.

(d)

The dot plot shows slight skewness, but the t-test is robust to such departures from
Normality. The results of the t-test should be valid in this situation.

Section B: More Than Two Independent Samples


1.

(a)

H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ (The underlying/population means are all equal.)

H1: At least one of the underlying/population means is different from the other three.

(b)

Assumption: The samples are random.


Check:

Ensure observations within the sample are independent (read the story).

Assumption: The samples are independent of each other.


Check:

Ensure independence in the design of the experiment or study (read the story).

Assumption: The underlying distribution of each group is Normally distributed.


Check:

By plotting the data. The choice of plot will depend on the sample sizes.

Assumption: The population standard deviations of each group are equal.


Check:

By plotting the data and/or looking at the sample standard deviations (We
require that the ratio of the largest sample standard deviation to the smallest
sample standard deviation is less than 2.).


(c)

$s_B^2$ measures the variability between the sample means.

$s_W^2$ measures the variability within the samples (that is, the internal variability within the
samples themselves).

(d)

ANOVA table:

DF
(d)

$f_0 = \dfrac{s_B^2}{s_W^2}$

$s_B^2$ : smaller / same / larger

(e)

f0 : smaller / larger

$s_W^2$ : smaller / same / larger

P-value : smaller / larger

$s_B^2$ : smaller / same / larger

$s_W^2$ : smaller / same / larger

$s_W^2$ : smaller / same / larger

f0 : smaller / larger

f0 : smaller / larger

P-value : smaller / larger

$s_B^2$ : smaller / same / larger

f0 : smaller / larger

more / less evidence against H0

(a)

(b)

Error: DF = 72, SS = 24868, MS = 345

Total: DF = 74, SS = 28198
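The mean squares and the F statistic can be re-derived from the sums of squares in the ANOVA table. The sketch below is only an arithmetic check, not the course's own working: the treatment degrees of freedom are not printed next to the table, so they are inferred here as total minus error, and the variable names are illustrative.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA table.
ss_treatment = 3330.0
ss_error = 24868.0
ss_total = 28198.0
df_error = 72
df_total = 74

# Assumption: treatment DF is inferred as total DF minus error DF.
df_treatment = df_total - df_error

# Mean squares: between-group (s_B^2) and within-group (s_W^2) variability.
ms_treatment = ss_treatment / df_treatment
ms_error = ss_error / df_error

# F statistic is the ratio of the two mean squares.
f0 = ms_treatment / ms_error

print(f"Treatment DF = {df_treatment}")
print(f"MS treatment = {ms_treatment:.0f}")
print(f"MS error     = {ms_error:.1f}")
print(f"f0           = {f0:.2f}")
```

The result lands close to the tabulated F of 4.83; the small gap is presumably due to rounding in the printed sums of squares.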

(i)

Drug and Neither.

(ii)

Drug and Neither. Placebo and Neither.

(i)

Yes. The Neither level of treatment. We have strong evidence that the mean for
the neither group is higher than the mean for the drug group and we have some
to weak evidence that the mean for the neither group is higher than the mean for
the placebo group.

(ii)

No. We have no evidence of a difference between the underlying means of the
Drug and Placebo groups.

(h)

Section C: Identifying Appropriate Type of Analysis


Scenario 1
(i)

Exam numeric, Attend categorical.

There are three independent random samples.

(ii)

Side-by-side dot plot or box plot on the same scale.

Though there are signs of moderate positive skewness in all three of the groups, with
the equal sample sizes and the moderate size of the three groups this should not cause
any concern with the validity of the F-test.

(iii) D: Two-sample t-test on a difference between two means

Ratio of largest to smallest sample standard deviation: $\frac{23.14}{15.22} = 1.520$.
H0: The underlying mean numbers of minutes to fall asleep are the same, i.e.,
$\mu_{\text{Drug}} = \mu_{\text{Neither}} = \mu_{\text{Placebo}}$, where $\mu_{\text{Drug}}$ is the mean time for patients to fall asleep if
all 75 patients had been given the new drug, and similarly for $\mu_{\text{Neither}}$ and $\mu_{\text{Placebo}}$.
H1: At least one of the three underlying mean numbers of minutes to fall asleep is
different from the other two.

P
0.011

(g)

P-value : smaller / larger

The times for the Neither group are centred higher and more spread out than those
for the Drug and Placebo groups. There does not appear to be a great difference
between the centre and spread of times for the Drug and Placebo groups. There are
signs of moderate positive skewness in all three of the samples.

F
4.83

With 95% confidence we estimate that the underlying mean time for people taking the
placebo to fall asleep is somewhere between 8.9 minutes shorter and 16.2 minutes
longer than the underlying mean time for people taking the drug to fall asleep.

$s_W^2$ : smaller / same / larger

The assumption of equality of the standard deviations is reasonable as the ratio of the
largest sample standard deviation to the smallest standard deviation is less than 2.
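The rule of thumb used here (largest sample standard deviation divided by smallest is less than 2) is easy to check in code. This is a minimal sketch: the helper name `sd_ratio_ok` is hypothetical, and only the largest and smallest standard deviations (23.14 and 15.22) are taken from these answers, since the third group's value is not shown.

```python
def sd_ratio_ok(sample_sds, threshold=2.0):
    """Return (ratio, passes) for the largest-to-smallest SD rule of thumb."""
    ratio = max(sample_sds) / min(sample_sds)
    return ratio, ratio < threshold


# Largest and smallest sample standard deviations from the sleep study answer.
ratio, ok = sd_ratio_ok([23.14, 15.22])
print(f"ratio = {ratio:.3f}, equal-spread assumption reasonable: {ok}")
```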

(c)

1665

(f)

more / less evidence against H0


2.

3330

The P-value of 0.011 means that we have strong evidence against the null hypothesis.
We have strong evidence that at least one of the groups has a different underlying
mean number of minutes for people to fall asleep.

P-value : smaller / larger

more / less evidence against H0

MS

(e)

more / less evidence against H0


$s_B^2$ : smaller / same / larger

SS

Treatment

Scenario 2
(i)

Pass categorical.

(ii)

One-way table of counts (frequency table) or bar graph comparing the counts or
proportion who pass and fail.

(iii) B: One-sample t-test on a proportion


Scenario 3
(i)

Assign numeric, Test numeric.

(ii)

Dot plot, box plot or histogram of the differences between Test and Assign (or vice
versa).

(iii) C: One-sample t-test on a mean of differences / Paired-data t-test


Scenario 4
(i)

Exam numeric, Degree categorical.

(ii)

Side-by-side dot plot or box plot on the same scale.

(iii) F: F-test for one-way analysis of variance


Lecture and Tutorial Answers: Chapter 9

Page 13

Chapter 9

Tutorial: Page 19 and 20

Challenge: Page 15

1.

(a)

One sample, cross-classified by two factors.

The corrected version of the false statement is

(b)

Yes.

f.

If, for several cells in a table of counts, there are relatively large differences between the
observed counts and the expected counts under the null hypothesis, then the P-value for a
Chi-square test will be small.

(c)

Yes.

(d)

All three hypotheses could be tested.

(e)

Several of the expected cell counts are very low, so there may be problems with the
assumptions for the Chi-square test.

(a)

Independence is satisfied as subjects were randomly assigned to groups.

(b)

No. The distribution of courses will reflect the chosen sample sizes, not the true
distribution.

(c)

Yes.

(d)

Hypothesis (ii) and (iii) could be tested.

(e)

Hypothesis (i) and (iii) could be tested.

(a)

Yes. We could consider the samples of people under 30, people in the 30-49 age group
and the people in the 50 and over age group as three independent sub-samples and
carry out a Chi-square test that the distribution of primary news source is the same for
each age group.

(b)

Degrees of freedom = $(3 - 1)(3 - 1) = 2 \times 2 = 4$

(c)

Expected count for the (Under 30, Radio) cell =

(d)

Cell contribution =

Sample Exam / Cecil Test Questions: Pages 16 to 18


1. (e)

2. (b)

3. (d)

4. (d)

5. (e)

6. (d)

7. (b)

2.

8. (e)

3.

(e)

$\frac{225 \times 250}{1000} = 56.25$

$\frac{(100 - 51.625)^2}{51.625} = 45.330$

The P-value = 0.000 to 3 decimal places.


We have very strong evidence to suggest that there is a relationship between a
person's age and their primary news source.

(f)


The results are valid because all of the expected counts are greater than 5.
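The Chi-square quantities worked out in parts (b) to (d) follow three short formulas, sketched below. The function names are illustrative, and which of 225 and 250 is the row total and which the column total is an assumption, since the table itself is not reproduced in these answers (the product is the same either way).

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a Chi-square test on an r x c table of counts."""
    return (n_rows - 1) * (n_cols - 1)


def expected_count(row_total, col_total, grand_total):
    """Expected cell count under the null hypothesis."""
    return row_total * col_total / grand_total


def cell_contribution(observed, expected):
    """One cell's contribution to the Chi-square test statistic."""
    return (observed - expected) ** 2 / expected


print(degrees_of_freedom(3, 3))                    # part (b)
print(expected_count(225, 250, 1000))              # part (c)
print(round(cell_contribution(100, 51.625), 3))    # part (d)
```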


Chapter 10

Short Response Questions

Challenge

Part I, page 21

Part I, page 19

No. Causation can only be assigned when the data come from a well designed and well
executed experiment.

The corrected versions of the false statements are:


e.

If we want to use the y-values to make predictions about x-values then the regression line
would be different and its equation would have different values.

l.

Residuals and prediction errors are different names for the same thing: the
difference between the observed and predicted values.

r.

The Y-variable is called the response, outcome or dependent variable and the
X-variable is called the explanatory, predictor or independent variable.

y.

The sample correlation coefficient, r, and the slope of the least squares line are measuring
different things and so will not usually be the same.

z.

The sign (+ or -) of the correlation coefficient and the sign of the slope of the least squares
regression line are both indicating the direction of the association and hence will be the
same.

Part II, page 25


1.

A linear trend and constant scatter about that trend.

2.

There is a linear relationship between x and the mean value of Y at X = x.


The random errors are Normally distributed with mean zero and all have the same standard
deviation regardless of the value of x.
The random errors are independent.

3.

H0: $\beta_1 = 0$

4.

A confidence interval for the mean estimates the mean value of Y at a given value of x.
A prediction interval estimates the value of Y at a given value of x.

Part II, page 22

5.

Corrected versions of the false statements are:


k.

For an observation $(x_i, y_i)$, the residual is calculated by $u_i = y_i - \hat{y}_i$, where $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.

l.

When testing for no linear relationship between X and Y, we test


H0: $\beta_1 = 0$.

s.

For a given value of x, the 95% prediction interval and the corresponding 95% confidence
interval for the mean have the same centres.

The two sources of uncertainty are:


1.

uncertainty about the true values of $\beta_0$ and $\beta_1$, and

2.

uncertainty due to random scatter about the true line.

A confidence interval for the mean only allows for uncertainty about the true values of $\beta_0$
and $\beta_1$.
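The two sources of uncertainty show up directly in the usual standard-error formulas for simple linear regression. The notation below is standard textbook notation rather than anything printed in these answers: $s$ is the residual standard deviation, $n$ the sample size, $\bar{x}$ the mean of the x-values and $x_0$ the value at which the estimate or prediction is made.

```latex
% Confidence interval for the mean of Y at x = x_0:
% allows only for uncertainty about \beta_0 and \beta_1.
\operatorname{se}(\text{mean at } x_0)
  = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

% Prediction interval for a new Y at x = x_0:
% the extra "1" under the square root allows for random scatter about the true line.
\operatorname{se}(\text{prediction at } x_0)
  = s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
```

Both intervals are centred on the same fitted value $\hat{\beta}_0 + \hat{\beta}_1 x_0$, which is why their centres coincide while the prediction interval is always wider.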

Sample Exam / Cecil Test Questions


Part I, pages 20 and 21
1. (b)

2. (b)

3. (b)

4. (a)

5. (d)

6. (c)

7. (b)

5. (d)

6. (e)

7. (c)

8. (d)

9. (c)

10. (b)

Part II, pages 23 and 24


1. (c)

2. (b)

3. (e)

4. (d)

Lecture and Tutorial Answers: Chapter 10

Page 15

Tutorial: Page 26 and 27


1.

(a)
(b)

$\hat{y} = 11.238 + 1.309x$
For each 3 year increase in smoking, we expect lung capacity to increase by $1.309 \times 3 = 3.927$.

(c)
(d)

Predicted lung capacity = $11.238 + 1.309 \times 30 = 50.5$

Predicted lung capacity = $11.238 + 1.309 \times 25 = 44.0$
Residual = Observed value - predicted value = $55 - 44.0 = 11$

(e)

Years smoking is used to predict lung capacity.


Years smoking is a numeric variable and Lung capacity is continuous and random.

(f)

There is a possible linear trend but the observations (28, 30) and (33, 35) are possible
outliers which cause concern with the appropriateness of the model.
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
P-value = 0.0086
There is strong evidence that an increase in years of smoking is associated with an
increase in lung capacity.
With 95% confidence, we estimate that for every additional year of smoking an
emphysema patient's lung capacity increases by between 0.44 and 2.18 units.

(g)

With 95% confidence, we estimate that the mean lung capacity for people like those in
the study that spent 30 years smoking will be somewhere between 42.16 and 58.86.
With 95% confidence, we predict that the lung capacity for a person like those in the
study that spent 30 years smoking will be somewhere between 23.33 and 77.70.
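The arithmetic in parts (a) to (d) of question 1 can be reproduced with a few lines of code. This is a sketch only: the intercept and slope are the fitted values quoted in the answers, while the function and constant names are illustrative.

```python
# Fitted least squares line from the answer to part (a).
INTERCEPT = 11.238
SLOPE = 1.309


def predict(years_smoking):
    """Predicted lung capacity for a given number of years smoking."""
    return INTERCEPT + SLOPE * years_smoking


def residual(observed, years_smoking):
    """Residual = observed value minus predicted value."""
    return observed - predict(years_smoking)


print(f"Change per 3 years:   {SLOPE * 3:.3f}")
print(f"Prediction at 30 yrs: {predict(30):.1f}")
print(f"Prediction at 25 yrs: {predict(25):.1f}")
print(f"Residual for (25, 55): {residual(55, 25):.1f}")
```

Working from the unrounded prediction gives a residual of about 11.04, which agrees with the answer of 11 obtained from the rounded prediction of 44.0.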

2.

(5)

3.

(2)

4.

(3)

5.

(2)
