
TAKE HOME EXAM UKP

Question 1

All of the following questions use the UKP1.sav data file.

a. Using a bar chart, examine the number of students in each section of the class along with whether or not
students attended the review session. Does there appear to be a relation between these variables?
Using a line graph, examine how attendance at the review session and section relate to the
final exam score. What does this relationship look like?

b. Create a boxplot of quiz 1 scores. What does this tell you about the distribution of the quiz scores? Create a
boxplot of quiz 2 scores. How does the distribution of this quiz differ from the distribution of quiz 1? Which
case number is the outlier?

c. Based on an examination of a histogram, does it appear that students' previous GPAs are normally
distributed? Given your answer, what kind of statistical test should be used? Show the result.
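
As a cross-check on part b, the outlier rule behind a boxplot can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made up (they are not taken from UKP1.sav), the function name boxplot_outliers is hypothetical, and the quartiles use Python's "inclusive" method, which may differ slightly from the hinges your statistics package computes.

```python
from statistics import quantiles


def boxplot_outliers(scores):
    """Return (case_number, value) pairs lying outside the 1.5 * IQR fences.

    Case numbers are 1-based row positions, similar to the case labels
    a statistics package prints next to outliers in a boxplot.
    """
    # Quartile method is an assumption; packages differ in how they interpolate.
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(case, value)
            for case, value in enumerate(scores, start=1)
            if value < lower or value > upper]


# Made-up quiz scores for illustration only (NOT from UKP1.sav).
example_scores = [7, 8, 6, 9, 7, 8, 1, 7, 8, 9]
print(boxplot_outliers(example_scores))
```

With these made-up scores the only value outside the fences is the score of 1 in case 7.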

Question 2

Using the UKP1.sav data file, circle the two mean values that are being compared and circle the appropriate
significance value (be sure to consider equal or unequal variances).

a. Compare men with women (gender) for quiz1, quiz2, quiz3, quiz4, quiz5, final, total. Are there any
significant differences?

b. Determine whether the following pairings produce significant differences: quiz1 with quiz2, quiz1 with
quiz3, quiz1 with quiz4, quiz1 with quiz5.

c. Compare the GPA variable (gpa) with the mean GPA of the university of 2.89. What do you find?
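
The three comparisons in parts a to c correspond to an independent-samples, a paired-samples, and a one-sample t test. The sketch below shows how each t statistic is formed, assuming made-up numbers rather than the contents of UKP1.sav. The function names are hypothetical, and the independent-samples version is the Welch form, i.e. the case where equal variances are not assumed. P-values are left out because they require the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    t = (mean(sample) - mu0) / sqrt(variance(sample) / n)
    return t, n - 1


def paired_t(first, second):
    """Paired-samples t: a one-sample test on the pairwise differences."""
    diffs = [a - b for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)


def welch_t(group_a, group_b):
    """Independent-samples t without assuming equal variances (Welch)."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1)
                           + vb ** 2 / (len(group_b) - 1))
    return t, df


# Made-up values for illustration only (NOT from UKP1.sav).
print(one_sample_t([2.5, 3.0, 3.5, 3.0], 2.89))   # part c: gpa vs. 2.89
print(paired_t([5, 6, 7, 8], [4, 6, 5, 7]))       # part b: quiz1 vs. quiz2
print(welch_t([1, 2, 3], [2, 4, 6]))              # part a: two gender groups
```

When the variances can be assumed equal, the pooled-variance form of the independent-samples test applies instead; it is not shown here.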

Question 3

An investigation was carried out in order to seek an answer to the question as to whether any significant differences
exist in the characteristics of advertisements among different magazines or groups of magazines. Thirty magazines
were ranked by educational level of their readers. Three magazines were randomly selected from, respectively, the
first ten, second ten, and third ten magazines. Six advertisements were randomly selected from each of the nine
selected magazines. The magazines were grouped as follows:

Group 1 highest educational level: 1. Scientific American, 2. Fortune, 3. The New Yorker.
Group 2 medium educational level: 4. Sports Illustrated, 5. Newsweek, 6. People.
Group 3 lowest educational level: 7. National Enquirer, 8. Grit, 9. True Confessions.

For each advertisement, the number of words and sentences in the advertisement copy was counted. The data file
UKP2.sav contains 54 cases with the following variables:

1. Words = number of words in advertisement copy.
2. Sentences = number of sentences in advertisement copy.
3. Magazine = magazine (1 through 9 as above).
4. Group = educational level (as above).

Analyze the data thoroughly as follows:

1. Start with a comparison of the number of words in advertisements in the three groups and formulate a null
hypothesis and a corresponding alternative hypothesis as a basis for the statistical analysis.

2. Draw a simple error bar plot. What do you see?


3. Investigate whether any of the assumptions behind the ANOVA are violated.

4. Test your hypothesis at the 5% level of significance. If the result of the omnibus test is significant, follow up
with a suitable post hoc analysis to determine which groups actually differ with respect to the number of
words counted.

5. Let us, for a while, disregard the actual design of the study and treat the data as independent samples from
9 selected magazines. Investigate if there are any significant differences between these selected
magazines, following the same procedure (steps 1 through 4) as above.

a. The null hypothesis in this case is that there is no difference between the selected magazines (with respect
to average number of words in the advertisements).

b. The Kolmogorov-Smirnov test is significant for two magazines, namely magazine
1 (p = 0.044) and magazine 6 (p = 0.030), so the p-values from the ANOVA may not be reliable. It should be
noted that we now have quite small data sets, with only six observations for each magazine, and this makes
the statistical inference more uncertain.

c. Levene's test of homogeneity of variances gives p = 0.031, and the null hypothesis of equal variances for
the magazines should be rejected (which means that at least one magazine has a different variance).

d. From the ANOVA table we see that the Between Groups variation (i.e. the variation between magazines) is
statistically significant (p = 0.001) but we should keep in mind that not all data follow a normal distribution,
and not all groups have equal variances. Following the advice in the course book (Field & Hole, 2003), the
Games-Howell procedure can be used for a post hoc test. This test shows that magazines 3, 7, and 8 have
significantly lower means than magazine 1, while no other differences are statistically significant.

6. There is one more dependent variable in the data set, namely the number of sentences in each
advertisement. Follow the same procedure as above and compare the number of sentences in
advertisements in the three groups.

The hypotheses are formulated along the same pattern as in the previous case. In the Kolmogorov-Smirnov
test the null hypothesis of normally distributed data is rejected for group 2 (p = 0.012) but retained for the
other two groups (p = 0.127 for group 1 and p = 0.094 for group 3). As a consequence, the results of the
ANOVA may not be fully reliable. Levene's test does not indicate a significant inhomogeneity of variance
across the groups (p = 0.122). Looking at the ANOVA table (keeping in mind that the data from group 2 have
been shown to deviate from the normal distribution) we find that the overall model is not significant (p =
0.885), i.e. there is no significant difference between the groups with respect to number of sentences in
advertisements.

7. In an investigation like this, it is important to carefully consider which characteristics to use or look for.
Looking at the number of sentences and words separately may tell us something, but maybe we can get an
even better understanding if we look at them taken together. One simple way of doing this is to plot the two
variables in a scatter plot with different symbols or colours for the different groups. Try this: what do you
see? Another way to do it is to look at the average number of words per sentence, which we would expect to
give us a better picture of the complexity of the language used in the advertisements.
Compute a new variable representing the number of words per sentence for each case (use Compute
Variable from the Transform pull-down menu) and investigate if there are any (statistically) significant
differences between groups.
a. An error bar graph shows that the average number of words per sentence is slightly higher for group 1
than for the other two groups, but so is the within-group variation (as indicated by the error bars), so it's
hard to reach any conclusion just by looking at the graph. Furthermore, the differences in within-group
variance may be an issue when performing an analysis of variance later on.

b. Testing for normality, the Kolmogorov-Smirnov test does not show any significant results (p ≥ 0.200 for
all groups), so with respect to the normality assumption, we can proceed to the ANOVA.

c. Our suspicion concerning equal variances, awakened when looking at the error bar chart, is confirmed
by Levene's test, which leads us to reject the null hypothesis of homogeneity of variance (p = 0.002).
Thus, we should not blindly trust the p-value from the ANOVA.

d. As we, at the moment, do not have any other methods than the ANOVA to rely on, we take a look at the
analysis of variance table and see that p = 0.115, which means that the overall model is not significant.
We conclude that there is no significant difference in average number of words per sentence in
advertisements from the different groups of magazines.
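
To make the omnibus test and Levene's test used throughout this question less of a black box, here is a minimal sketch of how the two statistics are built. It assumes made-up group data, not the values in UKP2.sav, and the function names are hypothetical. Levene's statistic is computed in its mean-based form, which is simply a one-way ANOVA on the absolute deviations from each group mean. P-values are omitted because the F distribution is not available in the Python standard library.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def levene_w(groups):
    """Mean-based Levene statistic: one-way ANOVA on absolute deviations."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(deviations)


# Made-up word counts for three groups (NOT from UKP2.sav).
example_groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
print(one_way_f(example_groups))   # large F: the group means differ
print(levene_w(example_groups))    # W of zero: the spreads are identical
```

The same two functions apply unchanged to the nine-magazine comparison in step 5 and to the words-per-sentence variable in step 7, since only the grouping of the cases changes.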

Question 4

The data are stored in the file UKP3.sav.


1. Start with a visual inspection of the data: draw a scatter plot with the variable Preparation on the x-axis and
Mark on the y-axis. Does the graph indicate any association between the two variables?

2. To obtain a quantitative measure of the degree of association between the two variables, select Correlate >
Bivariate from the Analyze pull-down menu. In the dialogue box, move the two variables Preparation and
Mark into the Variables frame. Make sure that Pearson is selected in the Correlation Coefficients frame
and that Two-tailed is selected in the Test of Significance frame. Click OK. Take a look at the output table:
What is the correlation between the two variables? Is it significant?

3. To perform a linear regression analysis, select Regression > Linear from the Analyze pull-down menu. In
the Linear Regression dialogue box select Mark as Dependent and Preparation as Independent. To get
some graphs for the model validation, click the Plots button and select *SDRESID as Y and *ZPRED as X.
Select Histogram and Normal probability plot in the Standardized Residual Plots frame. Click Continue
and then OK in the Linear Regression box to perform the analysis. Take a look at the output:

a. Compare R in the Model Summary table with the correlation coefficient that you obtained above.

b. From the results presented in the ANOVA table, what can you say about the overall linear regression
model: is it significant?
c. What are the intercept and slope of the estimated regression line, and are they both significantly different
from zero? Also compare the sign of the estimated slope with the sign of the correlation coefficient:
should they have equal or opposite signs?

d. The ANOVA and the t-tests used in the analysis assume that the errors are normally distributed and
have constant variance. Are there any signs in the plots of the residuals that these assumptions are not
met?

4. To insert a regression line in the scatter plot of the data, right-click on the graph and choose Edit Content >
In Separate Window from the pop-up menu. In the Chart Editor select Fit Line at Total from the Elements
pull-down menu. Make sure that Linear is selected in the Fit Method box in the Fit Line tab. Then click
Close to proceed. The least squares linear regression line should now be visible in the graph. Close the
Chart Editor.
5. Perform the corresponding analysis with the variable Pub (which represents hours spent at the pub instead
of preparing for the exam) as independent variable and Mark as dependent variable.

6. Finally, do the same with the variable Commuting (which represents hours spent commuting to and from the
university) as independent variable and Mark as dependent variable.
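
The quantities asked for in steps 2 and 3 can also be computed by hand, which helps when comparing R in the Model Summary table with the correlation coefficient. The sketch below assumes made-up preparation hours and marks rather than the contents of UKP3.sav, and the function names are hypothetical. It computes the Pearson correlation and the least squares intercept and slope from the same sums of squares.

```python
from math import sqrt
from statistics import mean


def _sums(x, y):
    """Return (Sxy, Sxx, Syy) around the sample means of x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxy, sxx, syy = _sums(x, y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = a + b * x."""
    sxy, sxx, _syy = _sums(x, y)
    slope = sxy / sxx
    intercept = mean(y) - slope * mean(x)
    return intercept, slope


# Made-up values for illustration only (NOT from UKP3.sav).
preparation = [1, 2, 3, 4, 5]
mark = [52, 55, 61, 64, 68]
print(pearson_r(preparation, mark))
print(least_squares_line(preparation, mark))
```

Both the slope and the correlation share the numerator Sxy and have positive denominators, so they always have the same sign. With a single predictor, R in the Model Summary table is the absolute value of the correlation.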
