Macquarie City Campus


STAT170 Introductory Statistics
Semester 3, 2011
Assignment 3 SOLUTIONS

Due: Week 12 (in your tutorial class)

This assignment is worth 5% of your final assessment of the unit.

Instructions for submission:
1. You can either word-process this assignment, or write neatly by hand.
2. The assignment may be done individually or by a group of TWO students.
3. Each student should attempt ALL questions in the entire assignment independently
in the first instance. This should be done during the week after the assignment has
been distributed.
4. When all students in the group have attempted all questions, the group should meet
to discuss their solutions. The groups should then write up a final version of their
solution.
5. Only ONE assignment should be submitted per group. Each student in the group
will receive the same mark allocated for that assignment, provided each student
contributed equally.

Note: As each part of this assignment covers different materials from the unit, it is
important that each student attempts all questions. The purpose of group work is to
give students an opportunity to work together as a team by discussing their solutions
with fellow students. Under NO circumstances should one student in the group
attempt one question and another student attempt another. You are reminded that
mistakes are also shared among all students in your group.

Declaration:
All students signing below certify that they have contributed equally to the attached
work and take responsibility for the answers to ALL questions.
We carried out this assignment without significant assistance from anyone else
outside our group apart from general discussion.

Student ID Surname Given name(s) Signature
1.
2.

In the case where one member's contribution was significantly less than the other
member's contribution, this should be drawn to the attention of the lecturer.

Introduction
A study of students who were enrolled in a second year bioinformatics unit was
carried out to investigate students' performance in their assignments as well as in the
final exam.

A tutorial class of 26 students was used for the study. During the semester students
were asked to complete two assignments. Both assignments were marked out of a
total of 30. Marks in the final exam were also recorded for those students. Data from
this study include:

Sex: 1 = male, 2 = female
Ass1: First assignment mark (out of 30)
Ass2: Second assignment mark (out of 30)
Exam: Final exam mark (out of 100)
Attendance: 1 = attended at least 50%, 2 = attended less than 50% of classes

The data file is marks.xls.

Question 1
Research Question: Is there a change in students' performance (i.e. change in marks)
between Assignment 1 (Ass1) and Assignment 2 (Ass2)?

Perform an appropriate hypothesis test to answer the above research question.
Remember to justify any assumptions.

Note: 1. You will need to do some work in the Excel file first, even though the
hypothesis test itself is to be done by hand.
2. Obtain the required statistics from the data file, then carry out your
calculations by hand; do NOT use the EcStat hypothesis testing output.

This is a paired t-test, so we need to form a new variable (column) in
Excel:
diff = Ass2 - Ass1 (or Ass1 - Ass2)
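The same column can be formed outside Excel. The sketch below assumes Python and uses six made-up mark pairs purely for illustration; they are NOT the values in marks.xls, so the printed summary will not match the EcStat output.

```python
from statistics import mean, stdev

# Hypothetical marks for six students (placeholders, not the real marks.xls data).
ass1 = [18, 22, 15, 25, 20, 17]
ass2 = [21, 22, 19, 26, 23, 18]

# One difference per student, keeping the pairing intact: diff = Ass2 - Ass1.
diff = [a2 - a1 for a1, a2 in zip(ass1, ass2)]

n = len(diff)
d_bar = mean(diff)   # sample mean of the differences
s_d = stdev(diff)    # sample standard deviation (divides by n - 1)

print(n, d_bar, round(s_d, 4))
```

Whichever order of subtraction is chosen, it has to be used consistently when interpreting the sign of the mean difference.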

The numerical summary for diff (from EcStat) is:
Numerical Summary: diff
Variable Size Mean StDev
diff 26 1.7500 2.9572

H: $H_0: \mu_d = 0$
A: The histogram of diff suggests that the differences could come from a
normal population. (Alternatively, $n = 26 \ge 25$, so by the CLT $\bar{y}_d$ is
approximately normally distributed.)
T: $t = \dfrac{\bar{y}_d - 0}{s_d/\sqrt{n}} = \dfrac{1.75 - 0}{2.9572/\sqrt{26}} = 3.01727$

df = n - 1 = 25
P: From the t-table with df = 25, 0.005 < p-val < 0.01. Hence reject $H_0$.
C: Evidence shows that students had higher marks in Assignment 2 than
in Assignment 1 on average.
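As a cross-check on the hand calculation, the test statistic and a two-sided p-value can be recomputed from the summary statistics alone. This is a sketch using only the Python standard library; the helper names `t_pdf` and `two_sided_p` are illustrative, and the p-value is obtained by numerically integrating the t density rather than by reading a table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3   # P(-|t| < T < |t|), using symmetry
    return 1 - central_area

# Summary statistics for diff, taken from the EcStat numerical summary.
n, d_bar, s_d = 26, 1.75, 2.9572

t_stat = (d_bar - 0) / (s_d / math.sqrt(n))
df = n - 1
p_value = two_sided_p(t_stat, df)

print(round(t_stat, 3), df, round(p_value, 4))
```

With the rounded summary values the statistic agrees with the hand-worked value to three decimal places, and the p-value falls inside the bracket read from the t-table.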
95% C.I. for $\mu_d$: $\bar{y}_d \pm t_{n-1}\,\dfrac{s_d}{\sqrt{n}} = 1.75 \pm 2.060 \times \dfrac{2.9572}{\sqrt{26}}$
= (0.555, 2.945)
We are 95% confident that the average difference in marks between
Assignment 2 and Assignment 1 lies between 0.555 and 2.945.
(Check that the CI above excludes the null value 0.)
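The interval arithmetic can be reproduced in a few lines. This sketch assumes Python and takes the critical value 2.060 (df = 25) directly from the working above rather than computing it.

```python
import math

n, d_bar, s_d = 26, 1.75, 2.9572   # summary statistics for diff
t_crit = 2.060                     # t-table value for df = 25, 95% confidence

margin = t_crit * s_d / math.sqrt(n)
ci = (d_bar - margin, d_bar + margin)

# The two-sided test at the 5% level rejects H0 exactly when 0 is outside the CI.
excludes_zero = not (ci[0] <= 0 <= ci[1])

print(round(ci[0], 3), round(ci[1], 3), excludes_zero)
```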


Question 2
Research Question: Is there a difference in the marks obtained in Assignment 2 (Ass2)
between students who attended at least 50% of classes and those who did not?

Perform, by hand, an appropriate hypothesis test to address the above research question. Use
the following information to help you. Do NOT use EcStat to do the hypothesis test.




Attendance Size Mean SE StDev
Attend >50% 15 19.467 1.088 4.764
Attend <50% 11 15.500 1.271 3.294




H: $H_0: \mu_1 = \mu_2$

A:
The 2 histograms indicate that the 2 samples could come from 2
normal populations.
The two sample standard deviations are close, and so are the 2
corresponding IQRs (boxes). Thus it is reasonable to assume that the 2
population standard deviations are equal, i.e. $\sigma_1 = \sigma_2$.
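T:
$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{14 \times 4.764^2 + 10 \times 3.294^2}{14 + 10}} = 4.2143$

$t = \dfrac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{19.467 - 15.5}{4.2143\sqrt{\dfrac{1}{15} + \dfrac{1}{11}}} = 2.3713$
df = 15 + 11 - 2 = 24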
P: From the t-table with df = 24, 0.02 < p-val < 0.05. Hence reject $H_0$.
C: The average Assignment 2 mark is higher for students who attended at
least 50% of classes than for those who attended less than 50%.
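The pooled calculation lends itself to a small helper. The sketch below assumes Python; `pooled_t` is an illustrative name, and the comparison uses the df = 24 critical value 2.064 from the t-table instead of computing a p-value.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled standard deviation: variances weighted by their degrees of freedom.
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    se = sp * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference
    return sp, (mean1 - mean2) / se, df

# Group 1: attended at least 50%; group 2: attended less than 50%.
sp, t_stat, df = pooled_t(15, 19.467, 4.764, 11, 15.500, 3.294)

reject_h0 = abs(t_stat) > 2.064   # two-sided test at the 5% level, df = 24

print(round(sp, 4), round(t_stat, 4), df, reject_h0)
```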
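[Figure: box plots of Ass2 (axis from 5 to 30) by Attendance: Attend <50%, Attend >50%]

95% CI for $\mu_1 - \mu_2$:
$(\bar{y}_1 - \bar{y}_2) \pm t_{24} \times SE = (\bar{y}_1 - \bar{y}_2) \pm t_{24}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
$= (19.467 - 15.5) \pm 2.064 \times 4.2143\sqrt{\dfrac{1}{15} + \dfrac{1}{11}}$
= (0.514, 7.420)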
We are 95% confident that the average Assignment 2 mark is between 0.514
and 7.420 marks higher for those who attended at least 50% of classes
than for those who attended less than 50%.
(Check that the CI excludes the null value 0.)
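A short numerical check of this interval, assuming Python and taking the pooled standard deviation 4.2143 and the critical value 2.064 as given in the working:

```python
import math

n1, n2 = 15, 11
mean_diff = 19.467 - 15.500   # sample mean difference (>=50% minus <50%)
s_p = 4.2143                  # pooled standard deviation from the test step
t_crit = 2.064                # t-table value for df = 24, 95% confidence

se_diff = s_p * math.sqrt(1 / n1 + 1 / n2)
margin = t_crit * se_diff
ci = (mean_diff - margin, mean_diff + margin)

print(round(se_diff, 4), round(ci[0], 3), round(ci[1], 3))
```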


Question 3
Research Question: Is the mark obtained by a student in Assignment 1 (Ass1) a
useful predictor for his or her mark in the final exam (Exam)?





(a) Perform, by hand, an appropriate hypothesis test to address the above research question.
Use the above information to help you. Do NOT use EcStat to do the hypothesis test.


H: $H_0: \beta = 0$
A: From the scatter plot, the relation looks linear. The residuals appear to
be normally distributed with constant spread.
T: $t = \dfrac{b}{SE(b)} = \dfrac{3.276}{0.461} = 7.106$

df = 26 - 2 = 24
P: From the t-table, using df = 24, p-val < 0.0005. Hence reject $H_0$.
C: There is a positive linear relation between Exam and Assignment 1
marks.
For each additional mark in Assignment 1, the Exam mark increases by
3.276 marks on average.
95% CI for $\beta$: $b \pm t_{24} \times SE(b) = 3.276 \pm 2.064 \times 0.461 = (2.325,\ 4.228)$
We are 95% confident that the true average increase in Exam mark per
additional Assignment 1 mark in the population lies between 2.325 and 4.228.
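The slope test and its interval use only the coefficient, its standard error and a table value, so they are easy to verify. A minimal sketch, assuming Python and the critical value 2.064 for df = 24:

```python
b = 3.276        # estimated slope for Ass1
se_b = 0.461     # standard error of the slope
n = 26
df = n - 2
t_crit = 2.064   # t-table value for df = 24, 95% confidence

t_stat = b / se_b                          # tests H0: beta = 0
ci = (b - t_crit * se_b, b + t_crit * se_b)

print(round(t_stat, 3), df, ci)
```

The statistic computed from the rounded coefficient and standard error differs slightly from a software value based on unrounded figures; that gap is rounding, not a different method.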


(b) Write down the value of the goodness-of-fit statistic. Interpret the meaning of this
value.

$r^2 = 0.677$
67.7% of the variation in Exam marks can be explained (accounted for) by
the variation in Assignment 1 marks.

(c) Calculate the value of the correlation coefficient. Interpret the meaning of this
value.

$r = +\sqrt{0.677} = +0.8228$ (positive, because the fitted slope is positive)
There is a strong positive linear relationship between Exam marks and
Assignment 1 marks.
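Taking $r^2 = 0.677$ from part (b), the correlation can be recovered with the sign of the slope. The sketch below (Python assumed) also applies the simple-regression identity $t = r\sqrt{(n-2)/(1-r^2)}$ as a consistency check against the slope test in part (a).

```python
import math

r_squared = 0.677   # goodness-of-fit statistic from the regression output
slope = 3.276       # fitted slope; r takes the same sign as the slope
n = 26

r = math.copysign(math.sqrt(r_squared), slope)

# In simple linear regression the slope t statistic can be written in terms of r.
t_from_r = r * math.sqrt((n - 2) / (1 - r_squared))

print(round(r, 4), round(t_from_r, 3))
```

The value of `t_from_r` comes out at about 7.09, in line with the slope test; the small discrepancy reflects $r^2$ being rounded to three decimals.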
[Figure: scatter plot of Exam (axis from 30 to 100) against Ass1 (axis from 8 to 28)]

outcome: Exam                                   df: 24
predictor   coeff     SE      t        p-value   95% C.I.
constant    13.6238   7.573   1.7990   0.085     (-2.006, 29.254)
Ass1        3.2760    0.461
r-sq: 0.677   Resid SS: 1602.188   s: 8.171
Question 4
Research Question: Which of the following 4 variables, Ass1, Ass2, Gender and
Attendance, are significant in affecting Exam?

(a) Use EcStat to perform analysis on each of the independent variables with Exam.
Paste the outputs in the spaces below. Do NOT write anything here.













[Figure: scatter plot of Exam (axis from 30 to 100) against Ass1 (axis from 8 to 28)]

outcome: Exam                                   df: 24
predictor   coeff     SE      t        p-value   95% C.I.
constant    13.6238   7.573   1.7990   0.085     (-2.006, 29.254)
Ass1        3.2760    0.461   7.0989   0.000     (2.324, 4.229)
r-sq: 0.677   Resid SS: 1602.188   s: 8.171
Fitted line: Exam = 13.6238 + 3.276 Ass1

[Figure: scatter plot of Exam (axis from 30 to 100) against Ass2 (axis from 5 to 30)]

outcome: Exam                                   df: 24
predictor   coeff     SE      t        p-value   95% C.I.
constant    17.9088   5.402   3.3149   0.003     (6.759, 29.059)
Ass2        2.7129    0.294   9.2138   0.000     (2.105, 3.321)
r-sq: 0.780   Resid SS: 1094.590   s: 6.753
Fitted line: Exam = 17.9088 + 2.7129 Ass2

[Figure: box plots of Exam (axis from 30 to 90) by Gender: female, male]

Two-sample t-test: Exam
Gender    Size   Mean     SE      StDev
male      12     63.290   4.075   14.662
female    14     68.633   3.773   13.636
Resid SS: 4781.95   r-sq:
factor    df   t       p-val    s        diff
Gender    24   0.962   0.3456   14.116   5.343

[Figure: box plots of Exam (axis from 30 to 90) by Attendance: Attend <50%, Attend >50%]

Two-sample t-test: Exam
Attendance    Size   Mean     SE      StDev
Attend >50%   15     67.606   3.687   14.830
Attend <50%   11     64.204   4.305   13.469
Resid SS: 4892.98   r-sq: 0.01
factor       df   t       p-val    s        diff    CI/2
Attendance   24   0.600   0.5540   14.278   3.402   11.698

(b) Using your EcStat outputs in (a), write a brief statistical report to address the
research question. Your report MUST contain the four sections: Introduction,
Methods, Results and Conclusion. Some marks will be allocated to the organization
of your report. You are advised to word-process the report on A4 paper and limit the
length to at most 2 pages.

Hints:
1. Although not compulsory, it is advisable to summarize the results into an
appropriate table.
2. To cull the bad variables and to select the good ones, we suggest that you
follow these steps:
Step 1: Look for any case where assumptions of the relevant tests are violated, and
then disqualify those variables.
Step 2: To select the relevant independent variables affecting Exam, discard those
having p-values > 0.05.


INTRODUCTION
Researchers are interested in determining which of the 4 independent
variables, Ass1, Ass2, Gender and Attendance, are significant in
affecting the dependent variable Exam.

METHODS
The sample consisted of 26 students, assumed to be randomly selected from all
students enrolled in a second year bioinformatics unit. The target
population is all students enrolled in the bioinformatics unit. Of
the 4 independent variables, Ass1 and Ass2 are numerical, while Gender and
Attendance are categorical (and binary). The first two call for
regressions, while the latter two call for 2-sample t-tests.

RESULTS
We shall look at the two methods separately.
A. Regression
For the 2 regressions involving Ass1 and Ass2 with Exam, the 2 scatter
plots show that the 3 conditions for regression, namely linearity, constant
spread for residuals, and normality of residuals are satisfied. The
results are summarized in the table below.

Independent   Assumptions   p-val   Significant predictor?           $r^2$   Result
variable      satisfied?            (Reject $H_0: \beta = 0$?)
Ass1          Yes           0.000   Yes                              0.677   sig predictor
Ass2          Yes           0.000   Yes                              0.780   sig predictor

Both variables Ass1 and Ass2 have p-values < 5%, so both are significant
predictors for Exam.

(Note: $r^2$ is actually NOT required here, since $r^2$ is only used to select the
best predictor. But the research question does not ask for the BEST
predictor.)

B. 2-sample t-test
The results are summarized in the table below.

Independent   Assumptions            p-val    Significant variable?   Result
variable      satisfied?                      (Reject $H_0$?)
Gender        Equal spread - Yes     0.3456   No (p-val > 5%)         -----
              Normality - ?
Attendance    Equal spread - Yes     0.5540   No (p-val > 5%)         -----
              Normality - ?

In each of the 2-sample t-tests, the equal spread assumption seems to be
satisfied, according to the box plots and the corresponding sample
standard deviations. The normality condition is not directly verifiable,
as the sample sizes are small and no histograms or stem-and-leaf plots
are available unless we draw them ourselves. However, the p-values are
larger than 5% in both cases. Hence both variables Gender and
Attendance are discarded, and whether the normality condition is met or
not thus becomes irrelevant.

CONCLUSION
Of the 4 given independent variables Assignment 1, Assignment 2,
Gender and Attendance, only Assignment 1 and Assignment 2 are
significant in affecting the dependent variable Exam.
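The two-step screening used in the report can be sketched as a simple filter. This assumes Python; the dictionary layout and field names are illustrative, the p-values are those shown in the EcStat outputs (0.000 being a rounded display), and `None` marks an assumption that could not be verified rather than one that was violated.

```python
ALPHA = 0.05

# p-values as displayed in the EcStat outputs; assumptions_ok is True when the
# assumptions look satisfied, False when violated, None when not verifiable.
results = {
    "Ass1":       {"test": "regression",      "p_value": 0.000,  "assumptions_ok": True},
    "Ass2":       {"test": "regression",      "p_value": 0.000,  "assumptions_ok": True},
    "Gender":     {"test": "2-sample t-test", "p_value": 0.3456, "assumptions_ok": None},
    "Attendance": {"test": "2-sample t-test", "p_value": 0.5540, "assumptions_ok": None},
}

significant = [
    name
    for name, res in results.items()
    if res["assumptions_ok"] is not False   # Step 1: drop clear violations
    and res["p_value"] <= ALPHA             # Step 2: discard p-values > 0.05
]

print(significant)
```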
