MI451, MI4M1a, GV4M1a, PS448a, PS4M1a, PS4M3, PS4M4, PS4M5a, SO4M1a, SO4M3, MI411, GV4M3, MC4M1, MC4M2a, MC4M4a Quantitative Analysis I: Description and Inference
Suitable for all candidates
Instructions to candidates

Time allowed: 2 hours.

The examination is in two parts. You should answer all questions in both parts. Part A is worth 50 marks, and Part B is worth 30 marks. Thus the total possible number of marks over the two parts of the examination is 80. The number of marks each question is worth is stated in parentheses after the question.

In Part A, you are asked to answer a series of specific questions, either by circling the letter of the correct answer (in the case of a multiple choice question) or by writing in the blank box or line provided for your answer. There should be enough space for your answer. However, if you need extra space, continue your answer in a blank LSE answer book, clearly indicating which question you are answering. Make sure that any extra answer books are securely tied to this question paper with the string which can be provided.

Answer the questions in Part B by making reference to the SPSS printouts attached. Each question should be answered with reference to a single printout, which is indicated in the question. Each question in Part B is worth 10 marks.

This is an open book examination. You are free to use any written material you find useful, including your own notes and annotations. In addition, two standard statistical tables are included at the end of this examination paper. You are also allowed to use a calculator (as prescribed by examination regulations).
LSE 2008/MI451
Page 1 of 22
Candidate No. _______________

PART A (50 marks)

Question 1
Citizens of Northern Ireland were asked in a survey whether they thought the establishment of a Truth and Reconciliation Commission (TRC) would be a good idea.

1(a) Respondents, when asked about their religion, were classified as Catholic, Protestant, or No Religion. The measurement level of this variable is best described as: [1 mark]
A Skewed
B Interval
C Ordinal
D Nominal

1(b) Respondents were asked whether they thought it was important that Northern Ireland should have a TRC. Their responses were coded as: 1 = very important, 2 = quite important, or 3 = not important. The measurement level of this variable is best described as: [1 mark]
A Ascending
B Ordinal
C Descending
D Nominal

1(c) Look at the bar chart below on the importance of establishing a TRC. Which of the following statements is the best description of the pattern found in it? [1 mark]
A The data have a clear negative skew.
B More Catholic than Protestant respondents view the establishment of a TRC as very important.
C Protestant respondents are evenly divided on the subject.
D Most Protestants view the establishment of a TRC as not important.

[Bar chart: importance of establishing a TRC (Very important, Quite important, Not important), with separate bars for Catholic and Protestant respondents; vertical axis scaled from 0 to 70.]
Question 2
A sample of students were asked how many hours they had spent preparing for a particular examination. Below is a stem and leaf plot of their responses.

0 | 1122246
1 | 000002235588
2 | 01235566
3 | 00244
4 | 234
5 | 05
6 | 2
7 | 0
8 | 2
9 | 1
In questions 2(a) to 2(e), write down the answer on the line provided.

2(a) How many students reported that they had spent more than 30 hours preparing for the examination? [1 mark] ______________

2(b) What is the range of the variable (hours spent preparing for the examination) in the sample? [1 mark] ______________

2(c) What is the mode of the variable? [1 mark] ______________

2(d) What is the sample size? [1 mark] ______________

2(e) What is the median of the variable? [2 marks] ______________

2(f) The distribution of the variable appears to be skewed... (Circle the letter of the best answer.) [1 mark]
A negatively
B insufficiently
C positively
D robustly
E significantly
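A stem and leaf plot can be turned back into raw values before any summary is computed. The following is a minimal sketch, assuming that the stems are tens of hours, the leaves are single hours, and each stem pairs with the leaf string in the same position of the plot; the names `STEM_LEAF`, `expand` and `summarise` are illustrative and not part of the examination.

```python
from statistics import median, multimode

# Stem-and-leaf rows read from the plot in Question 2.
# Assumption: stems are tens, leaves are units, paired in order.
STEM_LEAF = {
    0: "1122246",
    1: "000002235588",
    2: "01235566",
    3: "00244",
    4: "234",
    5: "05",
    6: "2",
    7: "0",
    8: "2",
    9: "1",
}


def expand(stem_leaf):
    """Return the sorted raw values encoded by a stem-and-leaf mapping."""
    return sorted(
        stem * 10 + int(leaf)
        for stem, leaves in stem_leaf.items()
        for leaf in leaves
    )


def summarise(values):
    """Collect the descriptive statistics asked about in Question 2."""
    return {
        "n": len(values),
        "range": max(values) - min(values),
        "modes": multimode(values),
        "median": median(values),
    }


hours = expand(STEM_LEAF)
print(summarise(hours))
```

Working from the expanded list rather than the plot makes it easy to check a count such as "more than 30 hours" with `sum(h > 30 for h in hours)`.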
Question 3
In an article entitled "Does Cyber-Campaigning Win Votes?", Gibson and McAllister (2006) used data from the 2004 Australian Election Survey to examine whether or not maintaining a personal campaigning website affected a candidate's vote in the election. They had information on the following variables for each candidate:

VOTE%: The percentage of the vote which the candidate received in the 2004 Australian House of Representatives federal election. This is the response variable.
WEBSITE: A dummy variable capturing the candidate's use of campaign websites during the election, coded as follows: 1 = Candidate has a personal campaign website; 0 = Candidate does not have a personal campaign website.
LABOR: A dummy variable, coded as follows: 1 = Candidate is a member of the Australian Labor party; 0 = Candidate is a member of the Liberal-National coalition.
LEG.EXP: Legislative experience of the candidate. This is a dummy variable, coded as follows: 1 = Candidate has previously been elected at state or federal level; 0 = Candidate has no elected experience.
MEMB.EXP: Length of candidate's party membership, in years.
SUPPORT: Support received from the candidate's party, in terms of leaflets, funds, and visits by the party leader. This is measured on a scale from 0 to 7, where 0 indicates no support and 7 indicates lots of support. For present purposes, this variable is treated as an interval level variable.
WORKERS: The number of party workers working on the candidate's campaign.
PREPARATION: Answer to the question: "How long before the election did you begin to organise your campaign?" Responses were recorded in numbers of months.

The results of a linear regression model, regressing VOTE% on the other variables, are presented in the table on the next page. Answer Questions 3(a) to 3(e) by referring to the table.
Dependent variable: VOTE%

                 Unstandardized coefficients (b)
(Constant)       29.46
WEBSITE           2.31
LABOR            -9.97
LEG.EXP           5.69
MEMB.EXP          0.27
SUPPORT           0.46
WORKERS           0.23
PREPARATION       0.54
Observations     373
R2                0.87
3(a)
Write down the fitted regression equation for these variables. [4 marks]
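A fitted regression equation is the constant plus each coefficient multiplied by the value of its variable. The sketch below illustrates this with the coefficients from the table; the function name `predict_vote` and the example candidate are hypothetical and chosen only to show how the equation is evaluated.

```python
# Unstandardized coefficients copied from the regression table for VOTE%.
COEFFICIENTS = {
    "(Constant)": 29.46,
    "WEBSITE": 2.31,
    "LABOR": -9.97,
    "LEG.EXP": 5.69,
    "MEMB.EXP": 0.27,
    "SUPPORT": 0.46,
    "WORKERS": 0.23,
    "PREPARATION": 0.54,
}


def predict_vote(candidate):
    """Fitted VOTE%: the constant plus the sum of b * x over the supplied variables."""
    fitted = COEFFICIENTS["(Constant)"]
    for name, value in candidate.items():
        fitted += COEFFICIENTS[name] * value
    return fitted


# Hypothetical candidate, invented purely for illustration.
example = {
    "WEBSITE": 1,
    "LABOR": 0,
    "LEG.EXP": 1,
    "MEMB.EXP": 10,
    "SUPPORT": 4,
    "WORKERS": 20,
    "PREPARATION": 6,
}
print(round(predict_vote(example), 2))
```

Changing one variable by one unit while holding the others fixed changes the fitted value by exactly that variable's coefficient, which is the basis for interpreting each b.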
3(b)
Interpret the coefficient of PREPARATION. Is it statistically significant, at the 5% level of significance? [3 marks]
3(c)
3(d)
Interpret the coefficient for WEBSITE. Is it statistically significant, at the 5% level of significance? [3 marks]
3(e)
Question 4
In a public opinion survey of the British public, respondents were asked whether they would be prepared to give part of their income to help fund initiatives for reducing pollution. Their answers were recorded in a variable called ENVIRONMENT, with possible responses "Yes" or "No". You are interested in how responses to ENVIRONMENT might be related to people's exposure to relevant issues in the media. You cross-tabulate ENVIRONMENT with a binary variable called MEDIA, which describes how often respondents read about environmental issues in the news. The possible responses are "Less than once a week" or "At least once a week". The contingency table is presented below.
ENVIRONMENT * MEDIA Crosstabulation

                                  MEDIA
ENVIRONMENT                       Less than once a week   At least once a week
No      Count                     239                     220
        % within MEDIA            60.7%                   43.0%
Yes     Count                     155                     292
        % within MEDIA            39.3%                   57.0%
Total   Count                     394                     512
        % within MEDIA            100.0%                  100.0%
4(a) Which of the sentences below best describes what the table shows? Circle the letter of the best answer. [1 mark]
A 57 per cent of those who would contribute financially to pollution reduction initiatives read about environmental stories in the media at least once a week.
B 43 per cent of those who read about environmental stories in the media at least once a week would contribute financially to pollution reduction initiatives.
C The higher the level of media exposure to environmental issues, the greater the likelihood that a respondent is willing to contribute financially to pollution reduction initiatives.
D The lower the level of media exposure to environmental issues, the greater the likelihood that a respondent is willing to contribute financially to pollution reduction initiatives.

4(b) You run a chi-squared test on the contingency table. What is the null hypothesis for the test? Circle the letter of the best answer below. [1 mark]
A In the population, there is no association between MEDIA and ENVIRONMENT.
B In the sample, there is no association between MEDIA and ENVIRONMENT.
C In the population, there is an association between MEDIA and ENVIRONMENT.
D In the sample, there is an association between MEDIA and ENVIRONMENT.
4(c) The results of the chi-squared test are presented below. How would you interpret them? Circle the letter of the best answer below. [1 mark]

Chi-Square Tests
                               Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square             27.880 (b)   1    .000
Continuity Correction (a)      27.176       1    .000
Likelihood Ratio               28.047       1    .000
Linear-by-Linear Association   27.849       1    .000
N of Valid Cases               906
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 194.39.

A I reject the null hypothesis at the 1% level of significance; I infer that in the population there is an association between MEDIA and ENVIRONMENT.
B I fail to reject the null hypothesis at the 1% level of significance; I infer that in the population there is an association between MEDIA and ENVIRONMENT.
C I reject the null hypothesis at the 1% level of significance; I infer that in the sample there is no association between MEDIA and ENVIRONMENT.
D I fail to reject the null hypothesis at the 1% level of significance; I infer that in the sample there is an association between MEDIA and ENVIRONMENT.
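The Pearson chi-square value in the output can be traced back to the observed counts: each expected count is (row total x column total) / N, and the statistic sums (observed - expected)^2 / expected over the cells. The sketch below is an illustration of that calculation using the counts from the ENVIRONMENT by MEDIA table; the function names are illustrative, and the p-value helper applies only to tests with 1 degree of freedom.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Rows: ENVIRONMENT No, Yes. Columns: MEDIA less than once a week, at least once a week.
observed = [[239, 220], [155, 292]]
chi2 = pearson_chi_square(observed)
print(round(chi2, 3), p_value_df1(chi2))
```

A p-value printed by SPSS as .000 means that it is smaller than .0005, not that it is exactly zero.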
4(d) A colleague suggests that you need to control for people's level of education when describing the relationship between ENVIRONMENT and MEDIA. You measure educational level with the variable EDUCATION, which has three levels: "Compulsory education or less"; "Basic or intermediate qualifications"; and "Higher education". You produce a three-way contingency table, and a chi-square test of independence for each of the partial tables. What does the output, shown below, tell you about the association between ENVIRONMENT and MEDIA, when we control for EDUCATION? [6 marks]
ENVIRONMENT * MEDIA * EDUCATION Crosstabulation

EDUCATION: Compulsory or less
                                  MEDIA
ENVIRONMENT                       Less than once a week   At least once a week   Total
No      Count                     123                     91                     214
        % within MEDIA            66.8%                   52.3%                  59.8%
Yes     Count                     61                      83                     144
        % within MEDIA            33.2%                   47.7%                  40.2%
Total   Count                     184                     174                    358
        % within MEDIA            100.0%                  100.0%                 100.0%

EDUCATION: Basic or intermediate qualifications
                                  MEDIA
ENVIRONMENT                       Less than once a week   At least once a week   Total
No      Count                     93                      79                     172
        % within MEDIA            55.0%                   39.3%                  46.5%
Yes     Count                     76                      122                    198
        % within MEDIA            45.0%                   60.7%                  53.5%
Total   Count                     169                     201                    370
        % within MEDIA            100.0%                  100.0%                 100.0%

EDUCATION: Higher education
                                  MEDIA
ENVIRONMENT                       Less than once a week   At least once a week   Total
No      Count                     7                       37                     44
        % within MEDIA            33.3%                   36.6%                  36.1%
Yes     Count                     14                      64                     78
        % within MEDIA            66.7%                   63.4%                  63.9%
Total   Count                     21                      101                    122
        % within MEDIA            100.0%                  100.0%                 100.0%
Chi-Square Tests

EDUCATION: Compulsory or less
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             7.873 (b)   1    .005
Continuity Correction (a)      7.280       1    .007
Likelihood Ratio               7.898       1    .005
Linear-by-Linear Association   7.851       1    .005
N of Valid Cases               358

EDUCATION: Basic or intermediate qualifications
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             9.127 (c)   1
Continuity Correction (a)      8.506       1
Likelihood Ratio               9.155       1
Linear-by-Linear Association   9.102       1
N of Valid Cases               370

EDUCATION: Higher education
                               Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .082 (d)    1
Continuity Correction (a)      .001        1
Likelihood Ratio               .083        1
Linear-by-Linear Association   .081        1
N of Valid Cases               122

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 69.99.
c. 0 cells (.0%) have expected count less than 5. The minimum expected count is 78.56.
d. 0 cells (.0%) have expected count less than 5. The minimum expected count is 7.57.
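Controlling for a third variable means examining the association separately within each of its levels. As a rough sketch, the loop below recomputes, for each EDUCATION level, the percentage answering Yes within each MEDIA group and the Pearson chi-square using the 2x2 shortcut N(ad - bc)^2 / (product of the four marginal totals). The dictionary and function names are illustrative only.

```python
# Counts per EDUCATION level. Rows: ENVIRONMENT No, Yes.
# Columns: MEDIA less than once a week, at least once a week.
PARTIAL_TABLES = {
    "Compulsory or less": [[123, 91], [61, 83]],
    "Basic or intermediate qualifications": [[93, 79], [76, 122]],
    "Higher education": [[7, 37], [14, 64]],
}


def percent_yes_by_media(table):
    """Percentage answering Yes within each MEDIA column."""
    (no_low, no_high), (yes_low, yes_high) = table
    return (
        100 * yes_low / (no_low + yes_low),
        100 * yes_high / (no_high + yes_high),
    )


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table via the marginal-totals shortcut."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


for level, table in PARTIAL_TABLES.items():
    low, high = percent_yes_by_media(table)
    print(f"{level}: Yes {low:.1f}% vs {high:.1f}%, chi-square {chi_square_2x2(table):.3f}")
```

Comparing the three lines of output side by side shows whether the MEDIA difference seen in the two-way table persists, shrinks, or disappears within each education level.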
Question 5
In another British survey, respondents were asked how much money they had donated in the last year to charities devoted to the arts. Your colleague is interested to know whether women tend to donate larger sums to arts charities than men do, or vice versa. From the survey, she collects the data given below:

Gender   Mean amount donated to arts charities,   Standard deviation   Sample size
         in British pounds sterling (£)
Male     62.50                                    5.81                 102
Female   56.75                                    5.95                 96
5(a)
In the sample data, do women donate larger sums to arts charities than men do, on average? Or do men tend to donate more than women? By how much? [2 marks]
5(b)
Your colleague wants to know if she can claim that there is a statistically significant difference between the mean amount donated by men and women. However, she does not know how to carry out a t-test. Calculate the t-statistic for the difference in mean amounts donated by men and women. Show your working. Technical tip: in your calculation, assume that the population standard deviations for men and women are equal. [5 marks]
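Under the equal-variances assumption, the t-statistic is the difference in sample means divided by a standard error built from a pooled standard deviation. The sketch below uses one common form of the pooled estimator, weighting each sample variance by n - 1; course materials sometimes use a slightly different weighting, so treat the formula as an assumption rather than the prescribed method. The function name `pooled_t` is illustrative.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and degrees of freedom, equal variances assumed."""
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, n1 + n2 - 2


# Summary statistics from the table in Question 5 (men first, then women).
t_stat, df = pooled_t(62.50, 5.81, 102, 56.75, 5.95, 96)
print(round(t_stat, 2), df)
```

With samples of this size the t distribution is very close to the standard normal, so the result can be compared with the critical value from either of the statistical tables.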
5(c)
Referring to your result from Question 5(b), is the difference between male and female mean donations statistically significant, at the 5% level of significance? Justify your answer. [3 marks]
Question 6 You are interested in the attention young people give to reading different types of blogs on the internet. You run a survey of 900 young people, and ask them which type of blog they read most often. The frequency table below shows the information you collect.
6(a)
What percentage of people in the sample most often read political blogs? [2 marks]
6(b)
Calculate a 95% confidence interval around the proportion of young people who most often read political blogs. State the upper and lower limits of the interval. [4 marks]
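A 95% confidence interval for a proportion takes the sample proportion plus and minus 1.96 standard errors, where the standard error is the square root of p(1 - p)/n. The sketch below shows the calculation; because the frequency table is not reproduced here, the count of political-blog readers is a hypothetical placeholder, and only the sample size of 900 comes from the question.

```python
import math


def proportion_ci(count, n, z=1.96):
    """Large-sample confidence interval for a proportion (z = 1.96 gives 95%)."""
    p = count / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Hypothetical count, used only to illustrate the calculation.
HYPOTHETICAL_COUNT = 180
SAMPLE_SIZE = 900  # stated in Question 6

lower, upper = proportion_ci(HYPOTHETICAL_COUNT, SAMPLE_SIZE)
print(f"{lower:.3f} to {upper:.3f}")
```

Replacing `HYPOTHETICAL_COUNT` with the frequency for political blogs from the table gives the interval the question asks for.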
You are investigating internet use in Britain. You draw on data from the 2005 Oxford Internet Survey, and focus on the following variables:

VICTIM: The number of times (over the past year) that the respondent has been a victim of crime or has been harassed/abused via the internet.
INTERNETUSE: The length of time, in months, that the respondent has been using the internet.
CONFIDENCE: A measure of the respondent's degree of confidence in using the internet, on a scale of 1 to 5, where 1 = high levels of confidence, and 5 = low levels of confidence. For present purposes, this variable is treated as an interval level variable.
GENDER: The respondent's gender, coded as follows: 0 = Male; 1 = Female.
AGE: The respondent's age, in years.
Questions 7, 8 and 9 continue on the following pages, and are based on this data set.
Question 7
Output A

Group Statistics
              GENDER   N     Mean      Std. Deviation   Std. Error Mean
INTERNETUSE   Male     583   10.3298   14.23866         1.46861
              Female   745   6.0763    6.06708          .53008

Independent Samples Test
Levene's Test for Equality of Variances: Sig. .000

t-test for Equality of Means
                                            t       df        Mean Difference   Std. Error Difference
INTERNETUSE   Equal variances assumed       3.056   223       4.25345           1.39176
              Equal variances not assumed   2.724   117.385   4.25345           1.56134
7(a)
What is the mean difference in INTERNETUSE, comparing males and females? Which group (males or females) has been using the internet for longest on average? [3 marks]
7(b)
What would the mean difference in INTERNETUSE for males and females be under the assumption of the null hypothesis of the t-test? [3 marks]
7(c)
Is the association between INTERNETUSE and GENDER statistically significant? State the relevant obtained test statistic and degrees of freedom for the test, and its p-value (use the statistic calculated under the assumption of equal variances assumed). Indicate whether you would reject or fail to reject the null hypothesis. [4 marks]
Question 8
Output B

Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .450 a   .203       .202                .78942112

Model   df     F         Sig.
1       1      306.866   .000 a
        1206
        1207

Model             t
1   (Constant)    16.763
    INTERNETUSE   14.352
8(a)
In the figure below, draw the fitted regression line (it does not have to be exact, but it does have to give a reasonable indication of the slope and intercept). [4 marks]
[Blank plot for drawing the line: VICTIM on the vertical axis (0 to 6), INTERNETUSE on the horizontal axis (0 to 12).]
8(b)
What would you predict the value of VICTIM to be when INTERNETUSE equals 12 months? And what would you predict the value of VICTIM to be when INTERNETUSE equals 24 months? [6 marks]
Question 9
Output C

Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .493 a   .243       .239                .76835515

Model   df     F        Sig.
1       5      76.312   .000 a
        1191
        1196

Coefficients (a)
                  Unstandardized Coefficients
Model             B       Std. Error   t        Sig.
1   (Constant)    .152    .276         .549     .583
    INTERNETUSE   .250    .150         1.729    .084
    GENDER        -.282   .092         -3.065   .002
    AGE           -.002   .003         -.504    .614
    CONFIDENCE    .292    .056         5.213    .000
9(a)
Has the fit of the regression model presented in Output C increased, compared to the fit of the regression model presented in Output B? Justify your answer. [3 marks]
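When models with different numbers of explanatory variables are compared, adjusted R Square is the more informative figure because it penalises extra predictors. As an illustrative sketch, it can be recovered from R Square and the residual and total degrees of freedom printed in each output; small discrepancies in the third decimal place are to be expected because the printed R Square values are rounded. The function name is illustrative.

```python
def adjusted_r_squared(r_squared, df_residual, df_total):
    """Adjusted R Square: 1 - (1 - R^2) * df_total / df_residual."""
    return 1 - (1 - r_squared) * df_total / df_residual


# R Square and degrees of freedom as printed in Output B and Output C.
output_b = adjusted_r_squared(0.203, df_residual=1206, df_total=1207)
output_c = adjusted_r_squared(0.243, df_residual=1191, df_total=1196)

print(f"Output B: {output_b:.3f}")
print(f"Output C: {output_c:.3f}")
print(f"Change:   {output_c - output_b:+.3f}")
```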
9(b)
Is the partial effect of CONFIDENCE statistically significant, at the 5% level of significance? Justify your answer. [3 marks]
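Reading significance from a coefficients table amounts to comparing each Sig. value with the chosen level. The short sketch below applies that rule to the rows of Output C; the data structure and function name are illustrative, and a Sig. printed as .000 is entered as 0.000 even though it only indicates a value below .0005.

```python
# (B, Std. Error, t, Sig.) copied from the Coefficients table in Output C.
COEFFICIENTS = {
    "(Constant)": (0.152, 0.276, 0.549, 0.583),
    "INTERNETUSE": (0.250, 0.150, 1.729, 0.084),
    "GENDER": (-0.282, 0.092, -3.065, 0.002),
    "AGE": (-0.002, 0.003, -0.504, 0.614),
    "CONFIDENCE": (0.292, 0.056, 5.213, 0.000),
}


def significant_terms(rows, alpha=0.05):
    """Names of the terms whose reported Sig. is below alpha."""
    return [name for name, (_, _, _, sig) in rows.items() if sig < alpha]


def t_ratio(b, std_error):
    """t statistic recomputed as coefficient divided by its standard error."""
    return b / std_error


print(significant_terms(COEFFICIENTS))
b, se, reported_t, _ = COEFFICIENTS["CONFIDENCE"]
print(round(t_ratio(b, se), 2), reported_t)
```

Recomputed t ratios can differ slightly from the printed ones because B and Std. Error are shown to only three decimal places.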
9(c)
Comparing Output B to Output C, describe the effect of INTERNETUSE on VICTIM now that you are controlling for the effect of GENDER, AGE and CONFIDENCE. [4 marks]