
Summer 2008 examination

Candidate No. __________________ Desk No. __________________ Date of the examination __________________

MI451, MI4M1a, GV4M1a, PS448a, PS4M1a, PS4M3, PS4M4, PS4M5a, SO4M1a, SO4M3, MI411, GV4M3, MC4M1, MC4M2a, MC4M4a

Quantitative Analysis I: Description and Inference
Suitable for all candidates
Instructions to candidates

Time allowed: 2 hours.

The examination is in two parts. You should answer all questions in both parts. Part A is worth 50 marks, and Part B is worth 30 marks. Thus the total possible number of marks over the two parts of the examination is 80. The number of marks each question is worth is stated in parentheses after the question.

In Part A, you are asked to answer a series of specific questions, either by circling the letter of the correct answer (in the case of a multiple choice question) or by writing in the blank box or line provided for your answer. There should be enough space for your answer. However, if you need extra space, continue your answer in a blank LSE answer book, clearly indicating which question you are answering. Make sure that any extra answer books are securely tied to this question paper with the string which can be provided.

Answer the questions in Part B by making reference to the SPSS printouts attached. Each question should be answered with reference to a single printout, which is indicated in the question. Each question in Part B is worth 10 marks.

This is an open book examination. You are free to use any written material you find useful, including your own notes and annotations. In addition, two standard statistical tables are included at the end of this examination paper. You are also allowed to use a calculator (as prescribed by examination regulations).
LSE 2008/MI451


PART A (50 marks)

Question 1

Citizens of Northern Ireland were asked in a survey whether they thought the establishment of a Truth and Reconciliation Commission (TRC) would be a good idea.

1(a) Respondents, when asked about their religion, were classified as Catholic, Protestant, or No Religion. The measurement level of this variable is best described as: [1 mark]

A Skewed
B Interval
C Ordinal
D Nominal

1(b) Respondents were asked whether they thought it was important that Northern Ireland should have a TRC. Their responses were coded as: 1 = very important, 2 = quite important, or 3 = not important. The measurement level of this variable is best described as: [1 mark]

A Ascending
B Ordinal
C Descending
D Nominal

1(c) Look at the bar chart below on the importance of establishing a TRC. Which of the following statements is the best description of the pattern found in it? [1 mark]

A The data have a clear negative skew.
B More Catholic than Protestant respondents view the establishment of a TRC as very important.
C Protestant respondents are evenly divided on the subject.
D Most Protestants view the establishment of a TRC as not important.
[Bar chart: Importance of establishing a TRC (Very important, Quite important, Not important), with separate bars for Catholic and Protestant respondents. Vertical axis: per cent of respondents, 0 to 70.]


Question 2

A sample of students were asked how many hours they had spent preparing for a particular examination. Below is a stem and leaf plot of their responses.

First digit        Second digit
(Tens of hours)    (Single hours)
0                  1122246
1                  000002235588
2                  01235566
3                  00244
4                  234
5                  05
6                  2
7                  0
8                  2
9                  1

In questions 2(a) to 2(e), write down the answer on the line provided.

2(a) How many students reported that they had spent more than 30 hours preparing for the examination? [1 mark] ______________

2(b) What is the range of the variable (hours spent preparing for the examination) in the sample? [1 mark] ______________

2(c) What is the mode of the variable? [1 mark] ______________

2(d) What is the sample size? [1 mark] ______________

2(e) What is the median of the variable? [2 marks] ______________

2(f) The distribution of the variable appears to be skewed... (Circle the letter of the best answer.) [1 mark]

A negatively
B insufficiently
C positively
D robustly
E significantly


Question 3

In an article entitled "Does Cyber-Campaigning Win Votes?", Gibson and McAllister (2006) used data from the 2004 Australian Election Survey to examine whether or not maintaining a personal campaigning website affected a candidate's vote in the election. They had information on the following variables for each candidate:

VOTE%          The percentage of the vote which the candidate received in the 2004 Australian House of Representatives federal election. This is the response variable.

WEBSITE        A dummy variable capturing the candidate's use of campaign websites during the election, coded as follows:
               1 = Candidate has a personal campaign website
               0 = Candidate does not have a personal campaign website.

LABOR          A dummy variable, coded as follows:
               1 = Candidate is a member of the Australian Labor party
               0 = Candidate is a member of the Liberal-National coalition.

LEG.EXP        Legislative experience of the candidate. This is a dummy variable, coded as follows:
               1 = Candidate has previously been elected at state or federal level
               0 = Candidate has no elected experience.

MEMB.EXP       Length of candidate's party membership, in years.

SUPPORT        Support received from the candidate's party, in terms of leaflets, funds, and visits by the party leader. This is measured on a scale from 0 to 7, where 0 indicates no support and 7 indicates lots of support. For present purposes, this variable is treated as an interval level variable.

WORKERS        The number of party workers working on the candidate's campaign.

PREPARATION    Answer to the question: "How long before the election did you begin to organise your campaign?". Responses were recorded in numbers of months.

The results of a linear regression model, regressing VOTE% on the other variables, are presented in the table on the next page. Answer Questions 3(a) to 3(e) by referring to the table.


                 Unstandardized coefficients (b)    p-values
(Constant)       29.46                              0.001
WEBSITE          2.31                               0.036
LABOR            -9.97                              0.003
LEG.EXP          5.69                               0.017
MEMB.EXP         0.27                               0.009
SUPPORT          0.46                               0.196
WORKERS          0.23                               0.005
PREPARATION      0.54                               0.019
Observations     373
R²               0.87
Dependent variable: VOTE%

3(a) Write down the fitted regression equation for these variables. [4 marks]

3(b) Interpret the coefficient of PREPARATION. Is it statistically significant, at the 5% level of significance? [3 marks]


3(c) Interpret the coefficient for SUPPORT. Is it statistically significant, at the 5% level of significance? [3 marks]

3(d) Interpret the coefficient for WEBSITE. Is it statistically significant, at the 5% level of significance? [3 marks]

3(e) Interpret the value of the R² statistic. [2 marks]


Question 4

In a public opinion survey of the British public, respondents were asked whether they would be prepared to give part of their income to help fund initiatives for reducing pollution. Their answers were recorded in a variable called ENVIRONMENT, with possible responses "Yes" or "No". You are interested in how responses to ENVIRONMENT might be related to people's exposure to relevant issues in the media. You cross-tabulate ENVIRONMENT with a binary variable called MEDIA, which describes how often respondents read about environmental issues in the news. The possible responses are "Less than once a week" or "At least once a week". The contingency table is presented below.

ENVIRONMENT * MEDIA Crosstabulation

                                     MEDIA
                                     Less than       At least
                                     once a week     once a week     Total
ENVIRONMENT  No     Count            239             220             459
                    % within MEDIA   60.7%           43.0%           50.7%
             Yes    Count            155             292             447
                    % within MEDIA   39.3%           57.0%           49.3%
Total               Count            394             512             906
                    % within MEDIA   100.0%          100.0%          100.0%

4(a) Which of the sentences below best describes what the table shows? Circle the letter of the best answer. [1 mark]

A 57 per cent of those who would contribute financially to pollution reduction initiatives read about environmental stories in the media at least once a week.
B 43 per cent of those who read about environmental stories in the media at least once a week would contribute financially to pollution reduction initiatives.
C The higher the level of media exposure to environmental issues, the greater the likelihood that a respondent is willing to contribute financially to pollution reduction initiatives.
D The lower the level of media exposure to environmental issues, the greater the likelihood that a respondent is willing to contribute financially to pollution reduction initiatives.

4(b) You run a chi-squared test on the contingency table. What is the null hypothesis for the test? Circle the letter of the best answer below. [1 mark]

A In the population, there is no association between MEDIA and ENVIRONMENT.
B In the sample, there is no association between MEDIA and ENVIRONMENT.
C In the population, there is an association between MEDIA and ENVIRONMENT.
D In the sample, there is an association between MEDIA and ENVIRONMENT.


4(c) The results of the chi-squared test are presented below. How would you interpret them? Circle the letter of the best answer below. [1 mark]

Chi-Square Tests

                                 Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square               27.880(b)  1     .000
Continuity Correction(a)         27.176     1     .000
Likelihood Ratio                 28.047     1     .000
Linear-by-Linear Association     27.849     1     .000
N of Valid Cases                 906

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 194.39.

A I reject the null hypothesis at the 1% level of significance; I infer that in the population there is an association between MEDIA and ENVIRONMENT.
B I fail to reject the null hypothesis at the 1% level of significance; I infer that in the population there is an association between MEDIA and ENVIRONMENT.
C I reject the null hypothesis at the 1% level of significance; I infer that in the sample there is no association between MEDIA and ENVIRONMENT.
D I fail to reject the null hypothesis at the 1% level of significance; I infer that in the sample there is an association between MEDIA and ENVIRONMENT.


4(d) A colleague suggests that you need to control for people's level of education when describing the relationship between ENVIRONMENT and MEDIA. You measure educational level with the variable EDUCATION, which has three levels: "Compulsory education or less"; "Basic or intermediate qualifications"; and "Higher education". You produce a three-way contingency table, and a chi-square test of independence for each of the partial tables. What does the output, shown below, tell you about the association between ENVIRONMENT and MEDIA, when we control for EDUCATION? [6 marks]

ENVIRONMENT * MEDIA * EDUCATION Crosstabulation

                                                           MEDIA
                                                           Less than      At least
EDUCATION                ENVIRONMENT                       once a week    once a week    Total
Compulsory or less       No      Count                     123            91             214
                                 % within MEDIA            66.8%          52.3%          59.8%
                         Yes     Count                     61             83             144
                                 % within MEDIA            33.2%          47.7%          40.2%
                         Total   Count                     184            174            358
                                 % within MEDIA            100.0%         100.0%         100.0%
Basic or intermediate    No      Count                     93             79             172
qualifications                   % within MEDIA            55.0%          39.3%          46.5%
                         Yes     Count                     76             122            198
                                 % within MEDIA            45.0%          60.7%          53.5%
                         Total   Count                     169            201            370
                                 % within MEDIA            100.0%         100.0%         100.0%
Higher education         No      Count                     7              37             44
                                 % within MEDIA            33.3%          36.6%          36.1%
                         Yes     Count                     14             64             78
                                 % within MEDIA            66.7%          63.4%          63.9%
                         Total   Count                     21             101            122
                                 % within MEDIA            100.0%         100.0%         100.0%

Chi-Square Tests

EDUCATION                                                Value       df    Asymp. Sig. (2-sided)
Compulsory or less       Pearson Chi-Square              7.873(b)    1     .005
                         Continuity Correction(a)        7.280       1     .007
                         Likelihood Ratio                7.898       1     .005
                         Linear-by-Linear Association    7.851       1     .005
                         N of Valid Cases                358
Basic or intermediate    Pearson Chi-Square              9.127(c)    1     .003
qualifications           Continuity Correction(a)        8.506       1     .004
                         Likelihood Ratio                9.155       1     .002
                         Linear-by-Linear Association    9.102       1     .003
                         N of Valid Cases                370
Higher education         Pearson Chi-Square              .082(d)     1     .774
                         Continuity Correction(a)        .001        1     .971
                         Likelihood Ratio                .083        1     .773
                         Linear-by-Linear Association    .081        1     .775
                         N of Valid Cases                122

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 69.99.
c. 0 cells (.0%) have expected count less than 5. The minimum expected count is 78.56.
d. 0 cells (.0%) have expected count less than 5. The minimum expected count is 7.57.
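As a study aid, the Pearson chi-square values in the SPSS output for Question 4 can be reproduced from the cell counts alone. The sketch below is not part of the examination paper; the function name `pearson_chi_square` is chosen here for illustration, and it implements only the uncorrected Pearson statistic (not the continuity-corrected or likelihood-ratio versions). It compares each observed count with the count expected under independence.

```python
def pearson_chi_square(table):
    """Return (chi-square, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: ENVIRONMENT = No, Yes.
# Columns: MEDIA = less than once a week, at least once a week.
overall = [[239, 220], [155, 292]]
chi2, df = pearson_chi_square(overall)
print(f"Pearson chi-square = {chi2:.3f}, df = {df}")
```

Applied to the overall table, the result should agree with the 27.880 reported in the output for 4(c). The same function can be applied to each of the three partial tables in 4(d), one EDUCATION level at a time.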


[Answer space for Question 4(d)]


Question 5

In another British survey, respondents were asked how much money they had donated in the last year to charities devoted to the arts. Your colleague is interested to know whether women tend to donate larger sums to arts charities than men do, or vice versa. From the survey, she collects the data given below:

Gender    Mean amount donated to arts charities,    Standard     Sample
          in British pounds sterling (£)            deviation    size
Male      62.50                                     5.81         102
Female    56.75                                     5.95         96

5(a) In the sample data, do women donate larger sums to arts charities than men do, on average? Or do men tend to donate more than women? By how much? [2 marks]

5(b) Your colleague wants to know if she can claim that there is a statistically significant difference between the mean amount donated by men and women. However, she does not know how to carry out a t-test. Calculate the t-statistic for the difference in mean amounts donated by men and women. Show your working. Technical tip: in your calculation, assume that the population standard deviations for men and women are equal. [5 marks]


5(c) Referring to your result from Question 5(b), is the difference between male and female mean donations statistically significant, at the 5% level of significance? Justify your answer. [3 marks]
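For readers checking their working on Question 5(b), the calculation can be sketched in a few lines. This is not part of the examination paper. It assumes the standard pooled-variance form of the two-sample t-statistic, which matches the "equal population standard deviations" tip; the course materials may present the standard error in a slightly different form, and the function name `pooled_t_statistic` is illustrative.

```python
import math


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t-statistic assuming equal population variances."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Estimated standard error of the difference between the two sample means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Summary statistics from the Question 5 table (males first, then females).
t, df = pooled_t_statistic(62.50, 5.81, 102, 56.75, 5.95, 96)
print(f"t = {t:.2f}, df = {df}")
```

The resulting t-statistic is then compared with the critical value from the t table for the given degrees of freedom, which is the step asked for in 5(c).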


Question 6

You are interested in the attention young people give to reading different types of blogs on the internet. You run a survey of 900 young people, and ask them which type of blog they read most often. The frequency table below shows the information you collect.

Type of blog read most often    Number of readers
Political                       207
Music                           54
Gossip                          360
Sport                           279

6(a) What percentage of people in the sample most often read political blogs? [2 marks]

6(b) Calculate a 95% confidence interval around the proportion of young people who most often read political blogs. State the upper and lower limits of the interval. [4 marks]
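The interval asked for in Question 6(b) can be sketched as follows. This is not part of the examination paper. It assumes the usual large-sample (normal approximation) interval for a proportion with the multiplier 1.96; the function name `proportion_confidence_interval` is illustrative.

```python
import math


def proportion_confidence_interval(count, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p = count / n
    # Estimated standard error of the sample proportion.
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# 207 of the 900 respondents most often read political blogs.
lower, upper = proportion_confidence_interval(207, 900)
print(f"95% CI: {lower:.4f} to {upper:.4f}")
```

Multiplying both limits by 100 expresses the interval in percentage terms, consistent with the percentage asked for in 6(a).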


PART B (30 marks)

You are investigating internet use in Britain. You draw on data from the 2005 Oxford Internet Survey, and focus on the following variables:

VICTIM         The number of times (over the past year) that the respondent has been a victim of crime or has been harassed/abused via the internet.

INTERNETUSE    The length of time, in months, that the respondent has been using the internet.

CONFIDENCE     A measure of the respondent's degree of confidence in using the internet, on a scale of 1 to 5, where 1 = high levels of confidence, and 5 = low levels of confidence. For present purposes, this variable is treated as an interval level variable.

GENDER         The respondent's gender, coded as follows:
               0 = Male
               1 = Female.

AGE            The respondent's age, in years.

Questions 7, 8 and 9 continue on the following pages, and are based on this data set.


Question 7

Output A

Group Statistics

               GENDER    N      Mean       Std. Deviation    Std. Error Mean
INTERNETUSE    Male      583    10.3298    14.23866          1.46861
               Female    745    6.0763     6.06708           .53008

Independent Samples Test

                                              Levene's Test for
                                              Equality of Variances    t-test for Equality of Means
                                                                                                                  Mean          Std. Error
                                              F         Sig.           t        df         Sig. (2-tailed)       Difference    Difference
INTERNETUSE    Equal variances assumed        30.166    .000           3.056    223        .003                  4.25345       1.39176
               Equal variances not assumed                             2.724    117.385    .007                  4.25345       1.56134

7(a) What is the mean difference in INTERNETUSE, comparing males and females? Which group (males or females) has been using the internet for longest on average? [3 marks]

7(b) What would the mean difference in INTERNETUSE for males and females be under the assumption of the null hypothesis of the t-test? [3 marks]


7(c) Is the association between INTERNETUSE and GENDER statistically significant? State the obtained test statistic, the degrees of freedom for the test, and its p-value (use the figures from the "Equal variances assumed" row). Indicate whether you would reject or fail to reject the null hypothesis. [4 marks]


Question 8

Output B

Model Summary

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .450(a)    .203        .202                 .78942112

a. Predictors: (Constant), INTERNETUSE

ANOVA(b)

Model              Sum of Squares    df      Mean Square    F          Sig.
1    Regression    191.234           1       191.234        306.866    .000(a)
     Residual      751.562           1206    .623
     Total         942.796           1207

a. Predictors: (Constant), INTERNETUSE
b. Dependent Variable: VICTIM

Coefficients(a)

                       Unstandardized Coefficients
Model                  B       Std. Error              t         Sig.
1    (Constant)        .933    .056                    16.763    .000
     INTERNETUSE       .573    .040                    14.352    .000

a. Dependent Variable: VICTIM

8(a) In the figure below, draw the fitted regression line (it does not have to be exact, but it does have to give a reasonable indication of the slope and intercept). [4 marks]

[Blank plot for the answer: VICTIM on the vertical axis (0 to 6), INTERNETUSE on the horizontal axis (0 to 12).]


8(b) What would you predict the value of VICTIM to be when INTERNETUSE equals 12 months? And what would you predict the value of VICTIM to be when INTERNETUSE equals 24 months? [6 marks]
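When reading SPSS regression output such as Output B, it can help to see how the Model Summary figures follow from the ANOVA table. The sketch below is not part of the examination paper; it uses the standard definitions of R Square, Adjusted R Square and the F statistic, and the function name `anova_summary` is illustrative.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Derive R-squared, adjusted R-squared and F from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_total = df_regression + df_residual
    # Share of the total variation in the response accounted for by the model.
    r_squared = ss_regression / ss_total
    # The adjustment penalises the degrees of freedom used by the predictors.
    adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
    # Ratio of the regression mean square to the residual mean square.
    f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, adjusted_r_squared, f_statistic


# Sums of squares and degrees of freedom taken from the ANOVA table in Output B.
r2, adj_r2, f = anova_summary(191.234, 751.562, 1, 1206)
print(f"R Square = {r2:.3f}, Adjusted R Square = {adj_r2:.3f}, F = {f:.3f}")
```

The printed values should agree, up to rounding, with the .203, .202 and 306.866 shown in Output B.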


Question 9

Output C

Model Summary

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .493(a)    .243        .239                 .76835515

a. Predictors: (Constant), INTERNETUSE, GENDER, AGE, CONFIDENCE

ANOVA(b)

Model              Sum of Squares    df      Mean Square    F         Sig.
1    Regression    225.262           5       45.052         76.312    .000(a)
     Residual      703.130           1191    .590
     Total         928.392           1196

a. Predictors: (Constant), INTERNETUSE, GENDER, AGE, CONFIDENCE
b. Dependent Variable: VICTIM

Coefficients(a)

                       Unstandardized Coefficients
Model                  B        Std. Error             t         Sig.
1    (Constant)        .152     .276                   .549      .583
     INTERNETUSE       .250     .150                   1.729     .084
     GENDER            -.282    .092                   -3.065    .002
     AGE               -.002    .003                   -.504     .614
     CONFIDENCE        .292     .056                   5.213     .000

a. Dependent Variable: VICTIM


9(a) Has the fit of the regression model presented in Output C increased, compared to the fit of the regression model presented in Output B? Justify your answer. [3 marks]

9(b) Is the partial effect of CONFIDENCE statistically significant, at the 5% level of significance? Justify your answer. [3 marks]

9(c) Comparing Output B to Output C, describe the effect of INTERNETUSE on VICTIM now that you are controlling for the effect of GENDER, AGE and CONFIDENCE. [4 marks]
