SPSS Workbook

GEOG11092 APPROACHES TO GEOGRAPHICAL RESEARCH
QUANTITATIVE ANALYSIS
SPSS Workbook and Assessment Questions
ANALYSIS OF UNEMPLOYMENT WITHIN GREATER MANCHESTER CONURBATION
-1-
AIMS
This practical session has three broad aims:
a) To provide an understanding of basic descriptive and inferential statistics, and a capability to apply them appropriately to geographical questions;
b) To develop computing skills in using the statistical package SPSS, version 14.0;
c) To explore the variation in unemployment in a conurbation using the 1991 Census.
INSTRUCTIONS
The Block contains the information you require to understand and use descriptive and basic inferential statistics, and to use the computing package SPSS. This workbook gives you the opportunity to put some of the statistical theory introduced in the lectures into practice using SPSS. The questions given at the back of this booklet form the assessment for this block and must be submitted on Blackboard by Friday 1st April at 3pm.

The block is structured so that you can use the statistical theory and computing instructions as a reference guide throughout your studies. The intention is that you work through the booklet independently or in small groups. The booklet guides you through using SPSS to perform some of the statistical analyses introduced in the lectures. At key points you are referred to the questions at the back of the booklet (p. 17-19). Answer the questions as you go along, as some of them refer to specific SPSS tasks. The questions on p. 17-19 are the same as those that appear on Blackboard. You may work at your own speed, but you must complete all the sections and questions before you submit your answers on Blackboard.

If you have any questions or difficulties with the SPSS software, use the Blackboard frequently asked questions discussion board. Support is also available on Wednesdays 12-1pm and Thursdays 12-2pm, but please email Jennifer.obrien@manchester.ac.uk in advance to say you will be attending. Don't forget that you can also ask your PASS leaders for guidance.

The data are in an Excel spreadsheet, Quantitative.xls, which is available from Blackboard. Before you start the analysis you must save the file into your own working directory (p drive or removable pen drive) so that you can make changes to it. You can find more information about the data used in this practical session in Appendix A on p. 15 & 16.
THE RESEARCH QUESTIONS
The overall aim is to investigate variations in unemployment within a conurbation. Data are analysed from the 1991 Census of Population at the ward scale for Greater Manchester. Three main research questions are examined:
a) How does the unemployment rate vary spatially across the conurbation?
b) How does the relationship between unemployment rates of men and women vary across the conurbation?
c) To what extent is the female unemployment rate associated with the percentage of households headed by female lone parents?
-2-
WEEK 1: MEASUREMENT SCALES; MEASURES OF CENTRAL TENDENCY AND DISPERSION; GRAPHIC ANALYSIS
*********************************************************************************
Turn to page 17 and answer Q1 & Q2
*********************************************************************************
Part 1: Exploring the data
1.1: The data for this exercise are available in an Excel file, Quantitative.xls, located on Blackboard and on the shared drive. Save Quantitative.xls to your working directory (p drive or removable pen drive).
1.2: Open SPSS by going to Start > All Programs > Programs Core > Statistics > SPSS 14.0.2 > SPSS 14.0.2. A window pops up asking what you would like to do:
Accept the default option, Open an existing data source, and click OK. In the Files of type dropdown box select Excel (*.xls), then navigate to your working folder and select Quantitative.xls. Click Open. Check the box to read the variable names and click OK.
-3-
The SPSS worksheet is now populated with the Census data. At the top of each column is the variable name. There are 9 variables in this dataset and they are described in the table below. If you scroll down the spreadsheet you will see that there are 214 entries or observations.
Column  Variable  Description of Variable
A       lad       Local Authority District
B       ward      Ward
C       unemrate  Total unemployment rate
D       munemr    Male unemployment rate
E       funemr    Female unemployment rate
F       blackp    Percentage of Black people
G       pakistp   Percentage of Pakistani people
H       indianp   Percentage of Indian people
I       flonepp   Percentage of households headed by female lone parents
1.3: Save the SPSS Workbook to your working directory by selecting File > Save As from the menu bar. Navigate to your working folder and call the file ApproachesToResearch.sav
1.4: You are currently in Data View, which allows you to see a spreadsheet of the raw data. Many of the features of Data View are similar to those found in other spreadsheet applications:
- Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a questionnaire is a case.
- Columns are variables. Each column represents a variable or characteristic that is being measured. For example, each item on a questionnaire is a variable.
- Cells contain values. Each cell contains a single value of a variable for a case. The cell is where the case and the variable intersect.
- Cells contain only data values. Unlike spreadsheet programs (e.g. Excel), cells in the Data Editor cannot contain formulas.
- The data file is rectangular. The dimensions of the data file are determined by the number of cases and variables. You can enter data in any cell. If you enter data in a cell outside the boundaries of the defined data file, the data rectangle is extended to include any rows and/or columns between that cell and the file boundaries. There are no "empty" cells within the boundaries of the data file. For numeric variables, blank cells are converted to the system-missing value. For string variables, a blank is considered a valid value.
-4-
We will now switch to Variable View. To do this, click on the Variable View tab at the bottom left of the spreadsheet. Variable View contains descriptions of the attributes of each variable in the data file. In Variable View:
- Rows are variables.
- Columns are variable attributes.
You can add or delete variables and modify attributes of variables, including the following:
- Variable name
- Data type
- Number of digits or characters
- Number of decimal places
- Descriptive variable and value labels
- User-defined missing values
- Column width
- Measurement level
If you click in a cell under the Type column, an ellipsis will appear next to the variable type. Click the ellipsis and a list of the variable types accepted by SPSS appears.
*********************************************************************************
Turn to page 17 and answer Q3 & Q4
*********************************************************************************
-5-
1.5: Click Cancel and return to Data View. In Data View we can examine the data as we would in an Excel spreadsheet.
*********************************************************************************
Turn to page 17 and answer Q5 & Q6
*********************************************************************************
Part 2: Descriptive Statistics
2.1: With the SPSS Data Editor in Data View select Analyze from the menu bar. Then choose Descriptive Statistics > Frequencies. The Frequencies dialogue box opens:
We will compare male and female unemployment rates. Double click munemr from the list of variables. It appears in the Variable(s) pane. Repeat for funemr. Uncheck the option to Display frequency tables. Click the Statistics button. A new dialogue box appears which displays a range of statistics which we can choose to calculate:
Check the boxes to calculate Mean, Median, Mode, Std. Deviation and Range. Click Continue. Click OK. A new window opens. This is the Output Viewer window, which displays the results (tables, statistics, and charts) produced from data, and opens automatically the first time a command that produces output is performed. The Output Viewer displays a table containing the statistics you requested for the variables you selected. It also tells you how many observations (cases), N, were used to calculate the statistics.
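For readers who want to see what the Frequencies statistics in step 2.1 amount to, the following is a minimal Python sketch using only the standard library. It is not part of SPSS, and the values in `rates` are hypothetical numbers invented for illustration, not taken from Quantitative.xls. The function name `describe` is likewise an assumption.

```python
import statistics

# Hypothetical ward unemployment rates (%), invented for illustration only.
rates = [4.0, 6.5, 6.5, 9.0, 12.5, 15.0, 23.5]


def describe(values):
    """Return the statistics ticked in step 2.1 for a single variable."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        # Sample standard deviation (n - 1 denominator).
        "Std. Deviation": statistics.stdev(values),
        "Range": max(values) - min(values),
    }


summary = describe(rates)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Running the same function on two variables and comparing the Std. Deviation and Range entries mirrors the male versus female comparison made in this part.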
-6-
2.2: Right click on the table in the Output Viewer. Select the option to Copy. Open a new Word document and paste the output statistics into the document.
*********************************************************************************
Turn to page 17 and answer Q7 & Q8
*********************************************************************************
Part 3: Graphic Analysis
3.1: Select Graphs from the menu bar. The list of graphing options is displayed. Choose Histogram. The histogram dialogue box opens:
3.2: Double click on unemrate to add it to the Variable pane. There is an option to display a normal curve if you wish. Click on the Titles button. This allows you to input a title, subtitle or footnote for your graph. In the first text box type: Unemployment Rates in Greater Manchester Wards. Click Continue. Click OK.
3.3: The graph is displayed in the Output Viewer. If you right click on the chart you will see that you get the option to Copy the chart. This is a quick and simple way of copying and pasting SPSS output into a Word document. You can also Export the chart as a picture (e.g. .jpg or .tif) by right clicking and selecting Export.
3.4: In the Export Output dialogue box select to export Charts Only from the dropdown menu. Browse to your working folder and give the file an appropriate name (e.g. unemrate_histogram). Make sure the option is to export the selected chart only and the file type is JPEG, then click OK. The file is saved to your working folder and can be inserted into other document types (e.g. Microsoft Word, PowerPoint) as required.
-7-
3.5: Double click on the histogram in the Output Viewer. The Chart Editor opens:
The Chart Editor allows you to customise your chart and explore your data. If you click on any of the chart elements (e.g. the title, axis label, histogram bars) the element is highlighted with a blue line. If you then click the Properties icon, you can edit the properties of the selected element. For example, clicking on the histogram bars opens the Properties window, which allows you to modify the colour, amongst other properties, of the chart.
-8-
Experiment with the Chart Editor to improve the presentation of your chart. For example, it would be useful to give the x-axis a more meaningful label. Resave your graph after you have finished making modifications.
[Figure: histogram titled "Unemployment Rates in Greater Manchester Wards". X-axis: Unemployment Rate (%); Y-axis: Frequency. Mean = 12.4926, Std. Dev. = 6.86059, N = 214]
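The histogram produced in steps 3.1 to 3.5 simply counts how many wards fall into each interval of the unemployment rate. As a rough illustration of that counting, here is a short Python sketch. The data values and the bin width of 10 are assumptions chosen for the example, and SPSS picks its own bin width when it draws the chart.

```python
from collections import Counter

# Hypothetical unemployment rates (%), invented for illustration only.
rates = [3.2, 7.8, 8.1, 9.9, 11.4, 12.0, 14.6, 18.3, 21.7, 34.5]
BIN_WIDTH = 10  # assumed interval width; SPSS chooses its own


def histogram_counts(values, width):
    """Count the values falling in each interval [lower, lower + width)."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


counts = histogram_counts(rates, BIN_WIDTH)
for lower, n in counts.items():
    # One '#' per observation gives a crude text histogram.
    print(f"{lower:>2}-{lower + BIN_WIDTH:<2} | {'#' * n}")
```

A long run of short bars to the right of the tallest bars is the visual sign of positive skew.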
3.6: Return to the SPSS Data Editor window. We will now examine the relationship between the percentage of households headed by female lone parents and the total unemployment rate. Select Graphs > Scatter/Dot from the menu bar. Choose the default option to draw a Simple Scatter and click Define. Enter flonepp as the X Axis variable and unemrate as the Y Axis variable. You may wish to add a title as you did in step 3.2. Click OK. Edit the chart using the Chart Editor, if required, and export the chart to your working folder.
*********************************************************************************
Turn to page 18 and answer Q9 & Q10
*********************************************************************************
WEEK 2: TRANSFORMING DATA; HYPOTHESIS TESTING; INFERENTIAL STATISTICS
*********************************************************************************
Turn to page 18 and answer Q11 & Q12
*********************************************************************************
Part 4: Transforming Data
4.1: Some of the variables in the dataset cannot be described using a normal distribution. It is necessary to transform these data so that they are normally distributed in order to apply parametric methods. In SPSS you cannot enter formulae to calculate new values. Instead you choose from a list of functions and build a calculation expression to compute new data values based on numeric transformations of existing variables.
-9-
The first step is to establish the distribution of the data using a histogram and descriptive statistics. The histogram for munemr is shown below (you can compute this yourself by referring to the instructions in Part 3):
[Figure: histogram of munemr. X-axis: munemr; Y-axis: Frequency. Mean = 14.8101, Std. Dev. = 8.04656, N = 214]
4.2: Examining the histogram and the descriptive statistics suggests that the variables are positively skewed. In this instance it is appropriate to use the Base 10 Logarithm to transform the data. From the menu bar select Transform > Compute. The Compute Variable window appears:
At the top left corner is the Target Variable box, beneath which is a box with the list of variables. At the top right is the Numeric Expression box, in which the form of the calculation will be displayed. On the right is a box called Function Group containing the different function groups that can be used in SPSS. Click on All in the Function Group box. The bottom right box is populated with the available functions.
- 10 -
4.3: In the Target Variable box type: tmunemr. This is the name of the new variable we are creating (transformed male unemployment rate: tmunemr). In the Functions and Special Variables pane scroll down to Lg10 and double click. The LG10 function appears in the Numeric Expression pane. Now from the list of variables double click munemr. The Numeric Expression should now read: LG10(munemr). If it doesn't say this you can type the expression yourself. Click OK.
4.4: Scroll along the Data Editor window and you will see that a new column has been created containing your new variable:
4.5: Repeat Steps 4.2-4.4 for female unemployment rates (funemr), calling the new variable tfunemr.
- 11 -
4.6: Using Part 3 to help you, produce a histogram for the transformed variables, tmunemr and tfunemr. Do you notice a difference in the distribution?
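The effect of the LG10 transformation in steps 4.2 to 4.6 can be sketched outside SPSS as follows. This is an illustrative Python example under stated assumptions: the values in `munemr` are hypothetical, and the helper `skewness` uses a simple moment-based formula that may differ slightly from the skewness statistic SPSS reports.

```python
import math
import statistics

# Hypothetical positively skewed rates (%); log10 requires values above zero.
munemr = [5.0, 6.0, 7.0, 8.0, 10.0, 12.0, 15.0, 20.0, 40.0]


def skewness(values):
    """Moment-based skewness: a positive result means a long right-hand tail."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Equivalent of the SPSS expression LG10(munemr).
tmunemr = [math.log10(v) for v in munemr]

print(f"skewness before: {skewness(munemr):.2f}")
print(f"skewness after:  {skewness(tmunemr):.2f}")
```

The logarithm compresses large values more than small ones, which pulls in the right-hand tail and moves the distribution closer to a symmetric shape.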
*********************************************************************************
Turn to page 18 and answer Q13 & Q14
*********************************************************************************
Part 5: Correlation Analysis
5.1: We will conduct a correlation analysis in order to answer the research question: Is there a strong relationship (or association) between the spatial distributions of male unemployment rates and female unemployment rates in the wards of the conurbation? Select Analyze from the menu bar and select Correlate > Bivariate. The Bivariate Correlations window opens:
The default options are to calculate a Pearson's correlation coefficient with a two-tailed test of significance. This is what we require. Select the variables tmunemr and tfunemr, which you created in Part 4, from the left hand pane and add them to the Variables pane. Click OK. The output correlation matrix appears in the SPSS Output Viewer.
5.2: The correlations, r, appear in the Pearson Correlation rows. Because the results are presented as a matrix you also get the correlation between a variable and itself (which will always be 1). The Sig. (2-tailed) row of the matrix tells us whether the correlation is significant. If the value is less than 0.05 (or 0.01 for the 99% confidence level) we can reject the null hypothesis. The N row tells us how many observations were used to calculate this statistic. In this case N = 214. The matrix is symmetrical along the diagonal, so the correlation appears twice (x versus y and y versus x).
Correlations
                            x      y
x   Pearson Correlation     1
    Sig. (2-tailed)
    N
y   Pearson Correlation
    Sig. (2-tailed)
    N
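To make the contents of the correlation matrix more concrete, here is a minimal Python sketch of the Pearson correlation coefficient and of the decision rule described in step 5.2. The data values are hypothetical, and the function names `pearson_r` and `reject_null` are invented for this illustration. The sketch does not compute the Sig. value itself; it only applies the comparison against the chosen threshold.

```python
import math

# Hypothetical log-transformed rates, invented for illustration only.
tmunemr = [0.90, 1.00, 1.10, 1.20, 1.35, 1.50]
tfunemr = [0.70, 0.78, 0.85, 1.00, 1.05, 1.20]


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def reject_null(sig, alpha=0.05):
    """Decision rule from step 5.2: reject H0 when Sig. is below alpha."""
    return sig < alpha


r = pearson_r(tmunemr, tfunemr)
print(f"r = {r:.3f}")
print(reject_null(0.022), reject_null(0.022, alpha=0.01))
```

Swapping the two arguments gives the same r, which is why the matrix is symmetrical, and correlating a variable with itself gives 1, which is why the diagonal is always 1.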
- 12 -
5.3: Copy and paste your correlation matrix from the Output Viewer into a Word document and save the Word document to your working folder.
*********************************************************************************
Turn to page 19 and answer Q15 & Q16
*********************************************************************************
Part 6: Regression Analysis
6.1: Transform the flonepp variable using the same process you followed in Part 4. Call the new variable tflonepp.
6.2: We will explore the form of the relationship between female unemployment rate and the percentage of households headed by female lone parents. From the menu bar select Analyze > Regression > Linear. The Linear Regression window appears:
6.3: We will use the transformed variables to compute the regression analysis, as the method requires the data to be normally distributed. Input tfunemr as the dependent variable and tflonepp as the independent variable. For the Method select Stepwise from the dropdown menu. Click OK.
6.4: The results of the regression analysis appear in the Output Viewer as a series of four tables. An example of the output is given below. The results of chief interest have been highlighted in bold in this example. The Model Summary gives the correlation coefficient (r value), which in this case is 0.708, and the coefficient of determination (r²), which in this case is 0.501. The ANOVA table gives you the computed F-statistic. This would be compared to the values in a look-up table to test for significance, but SPSS does this for you automatically and the significance is given in the end column. In the example below the significance is less than 0.05, so the result is significant at the 0.05 level. The actual model parameters are given in the Coefficients table.
- 13 -
Variables Entered/Removed
Model  Variables Entered  Variables Removed  Method
1      VAR00003           .                  Stepwise
a All requested variables entered.
b Dependent Variable: VAR00002

Model Summary
Model  R     R Square  Adjusted R Square  Std. Error of the Estimate
1      .708  .501      .438               1.42203
a Predictors: (Constant), VAR00003

ANOVA
Model 1     Sum of Squares  df  Mean Square  F      Sig.
Regression  16.223          1   16.223       8.027  .022
Residual    16.177          8   2.022
Total       32.4            9
a Predictors: (Constant), VAR00003
b Dependent Variable: VAR00001

Coefficients
              Unstandardized Coefficients  Standardized Coefficients
Model         B      Std. Error            Beta                       t      Sig.
1 (Constant)  1.579  1.422                                            1.110  .299
  VAR00003    .670   .237                  .708                       2.832  .022
a Dependent Variable: VAR00002
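The figures in the example output above are linked by simple arithmetic, and checking a few of them can help when reading your own tables. The Python sketch below recomputes R Square, Adjusted R Square, the residual Mean Square and F from the sums of squares and degrees of freedom printed in the example ANOVA table. The variable names are chosen for this illustration only.

```python
# Values copied from the example ANOVA table (rounded as printed there).
ss_regression = 16.223
ss_residual = 16.177
ss_total = 32.4
df_regression = 1
df_residual = 8

# Coefficient of determination: share of total variation explained.
r_squared = ss_regression / ss_total

# Adjusted R Square penalises for the number of predictors.
df_total = df_regression + df_residual
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

print(f"R Square          = {r_squared:.3f}")
print(f"Adjusted R Square = {adjusted_r_squared:.3f}")
print(f"Residual MS       = {ms_residual:.3f}")
print(f"F                 = {f_statistic:.3f}")
```

The recomputed F comes out very slightly below the tabulated 8.027, presumably because the table entries used here are already rounded.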
6.5: Recall that a simple linear regression equation is of the form: y = a + bx. We use the Coefficients table to retrieve the values for the regression equation. In this example the equation is: VAR00002 = 1.579 + (0.670 x VAR00003).
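As an illustration of where the values a and b in y = a + bx come from, the following Python sketch fits a least-squares line to a small set of hypothetical (x, y) pairs. The data and the function name `fit_line` are assumptions made for this example; SPSS carries out the equivalent calculation when you run Analyze > Regression > Linear.

```python
# Hypothetical paired observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 2.9, 3.4, 4.3, 4.7]


def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


a, b, r_squared = fit_line(x, y)
print(f"y = {a:.3f} + ({b:.3f} x X), r squared = {r_squared:.3f}")

# Using the fitted equation to predict y for a new x value.
predicted = a + b * 6.0
print(f"predicted y at x = 6: {predicted:.2f}")
```

In the SPSS Coefficients table, a corresponds to the B value in the (Constant) row and b to the B value in the row of the independent variable.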
*********************************************************************************
Turn to page 19 and answer Q17-Q20
*********************************************************************************
- 14 -
APPENDIX A
SOURCE OF DATA: THE CENSUS OF POPULATION
The Census of Population is a major source of information about a country, and is used by organisations such as national government, local authorities, and health authorities in planning provision and allocating finance. The census is also used by businesses to locate and target particular market segments, and by academics for social and economic research. In the United Kingdom a census has been collected every ten years since 1801. The only year a census was not taken was 1941. The information about each person is collected at the household level. Completing a census return is compulsory.

The information that is collected about the population of the United Kingdom has changed over the last 190 years. Today, the information collected about every individual includes their employment status, attainment of higher education qualifications, possession of a car and recent changes of address, as well as personal details such as name, date of birth and marital status, and details about their accommodation such as the tenure of the property, number of rooms and whether or not it has central heating. The information collected varies over time; for example, in the 1991 Census every person in the household had to identify the ethnic group to which they belong, in contrast to previous censuses where the only indication of ethnicity was the country of birth of the head of the household.

The census data can be accessed from the publications of the Office of Population Censuses and Surveys (OPCS), and also from a computer program known as 'SASPAC'. We have downloaded the data you will use from 'SASPAC 91'.

There are a number of problems in using census data. Some of these arise from the need to preserve the confidentiality of the individual and the household. To this end, census data are protected by law with what is called the 'hundred year rule'. This states that the original census returns collected by the enumerator are not available to the general public, or to researchers, until 100 years after that census was collected. For censuses after 1891 the data are only available for spatial units at the following scales:

a) The Enumeration District level. This is the smallest areal unit for which data collected after 1891 are currently available. The actual number of households included varies with the size of the enumeration district, but each enumeration district contains data that have been collated from approximately 200 households (equivalent to approximately 500 people). Not all the information that is collected in the census is available at this level. This is because at this scale it may still be possible to identify a particular individual or household from certain characteristics collected about them in the census. As an additional means of preserving confidentiality, the data available at this level have been amended by randomly adding +1 or -1 to randomly selected cell values in published tables. This process is called 'Barnardisation'.
- 15 -
For example, at the enumeration district level we do not know whether the actual number of unemployed people is one greater, or one less, than the number of people unemployed reported in the census, or whether the figure given is the actual number of unemployed people in the enumeration district. Most of the census tables of data are subject to Barnardisation.

b) The census data have been aggregated at three further spatial scales: the ward level, the metropolitan district or shire county level, and the national level. You will be using data at the ward level. The ward, the areal unit that is used for local elections, is composed of a number of enumeration districts. It contains the data on approximately 4,000 households (equivalent to approximately 10,000 people). These tables are not Barnardised.

There are also a number of problems associated with the data used in this Block. These problems arise from people not declaring information about themselves or the household, or not giving honest replies to the questions asked on the census forms. For example, the figures for male and female unemployment rates may not represent exactly the proportion of men and women unemployed because of the census undercount and non-declaration of casual employment.

There was a significant undercount of the population in the 1991 Census. This resulted from people not filling in a census form, for example because they were homeless. Others did not declare themselves for political reasons, such as the misconception that the census could be used to identify illegal immigrants and those avoiding the 'community charge'.

The census does ask for the number of hours worked, as well as whether people are employed. Of those people who completed the census form, some may not have declared casual employment where they are paid cash in hand, perhaps to avoid being taxed. More women than men are employed on a casual basis, and surveys have found that those who work from home on a casual basis are less likely to record themselves as employed.

The unemployment rate derived from the census is the percentage of the economically active population who are unemployed. The total number of economically active is derived from those aged between 18 and 65 for males, and between 18 and 60 for females, who are employed, on a government training scheme, or unemployed.

The data on unemployment can also be obtained from 'NOMIS', an on-line information service. These data have the advantage of being available for dates after the census. However, they are different from the census because they are based on those registered as unemployed (not stated as unemployed as in the census). They are available for the 1981 wards, but not for the 1991 ones, as there have been significant boundary changes since 1981. They are therefore not as easily correlated with other variables from the 1991 Census.

Data on ethnicity in this Block are shown as three groups: Black, Pakistani and Indian. Those people who identify themselves as Black are derived from three subsets: those saying they were 'Caribbean', 'African' or 'Other' Blacks. The Census also records White, Bangladeshis, Chinese and other Asians, but these are not included here.
- 16 -
BLACKBOARD QUIZ QUESTIONS

Q1. For each of the following variables, indicate the correct scale of measurement:
(a) A three-fold classification of slopes into steep, moderate and gentle
    Nominal   Ordinal   Interval   Ratio
(b) The average number of persons per household in each district of London
    Nominal   Ordinal   Interval   Ratio
(c) River discharge
    Nominal   Ordinal   Interval   Ratio
(d) Ethnic origin
    Nominal   Ordinal   Interval   Ratio
Q2. Calculate the mean, median and mode of the following data set: 98, 25, 32, 64, 25, 32, 89, 16, 72, 25, 33
Mean (to 2 decimal places): _______________________
Median: _____________________
Mode: _______________________
Q3. List 3 variable types which can be stored by SPSS
___________________________________________________________
Q4. What type of data are stored in the variable pakistp? If the percentage of Pakistani people in a ward is 7.3076523869, how would this number be presented in the Data View?
______________________________________________
Q5. How many wards are there in Bury Local Authority District?
____________________________________________________________
Q6. Which ward in Salford has the highest Total Unemployment Rate?
- 17 -
_____________________________________________________________
Q7. What are the mean male and female unemployment rates for wards in Greater Manchester (to 2 decimal places)?
_____________________________________________________________
Q8. Which variable is more dispersed: male or female unemployment rate? Explain your answer.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Q9. Using the histogram you produced in Part 3, how would you describe the distribution of the total unemployment rate for wards in Greater Manchester?
    Normally distributed   Negatively Skewed   Positively Skewed   Bi-modal
Q10. Using the scatterplot you produced in Part 3, how would you describe the relationship between the percentage of households headed by female lone parents and the total unemployment rate?
    Weak, negative association   Weak, positive association   Strong, negative association   Strong, positive association
Q11. Indicate whether the following statements are true or false: (a) Rejection of the null hypothesis implies that the alternative hypothesis must be true.
________________________
(b) If the null hypothesis is true, α = 0.05 implies that there is a 5% chance of wrongly rejecting it.
________________________
(c) Setting α = 0.05 means that if the null hypothesis is correct there is a 0.01 probability of rejecting it.
________________________
Q12. If the research question is: 'Is there a strong relationship (or association) between the spatial distributions of male unemployment rates and female unemployment rates in the wards of the conurbation?' (a) State the null hypothesis, H0:
- 18 -
_________________________________________________ _________________________________________________
(b) State the research hypothesis, H1:
_________________________________________________ _________________________________________________
Q13. Why is it necessary to transform the data for the variables munemr and funemr (Part 4)?
_________________________________________________ _________________________________________________
Q14. What is the new mean for the transformed male unemployment rate, tmunemr, which you created in Part 4 (to 2 decimal places)? _____________________________ Q15. What is the correlation coefficient, r, between the transformed male and female unemployment rates you calculated in Part 5?
________________________
Q16. Is the correlation coefficient between male and female unemployment rates significant at the 0.01 level? Explain your answer.
_________________________________________________
Q17. What is the regression equation for describing the relationship between female unemployment rates and the percentage of households headed by a lone female parent?
_________________________________________________
Q18. What percentage of the female unemployment rate is explained by the percentage of households headed by a female lone parent (Hint: Refer to your lecture notes on the Coefficient of Determination)?
________________________
Q19. Is the regression equation describing the relationship between female unemployment rate and the percentage of households headed by a female lone parent significant at the 0.01 level? Explain your answer.
_________________________________________________
Q20. With reference to the null and research hypotheses, what can we conclude about the relationship between female unemployment rate and the percentage of households headed by a female lone parent?
_________________________________________________ _________________________________________________
- 19 -
