
XAVIER INSTITUTE OF MANAGEMENT, BHUBANESWAR

SOCIAL RESEARCH METHODS


A study on the relative importance of different factors when purchasing land.
Submitted to Prof. Prahlad Mishra

Submitted by-

HYPOTHESIS

Questionnaire
The questionnaire that we used for our survey is given below.

Name:
Gender:
Occupation:
Income (Annual):
- Less than 75,000
- 75,000-1,50,000
- 1,50,000-3,00,000
- 3,00,000-5,00,000
- Above 5,00,000

Educational Qualification:
- Matriculate
- Intermediate
- Diploma holder
- Graduate
- Post Graduate
- Others

1. Do you own a land already? Y/N
   If yes, then where and what is the year of purchase? ____________________

2. What is the approximate size of the land that you are willing to buy? (in terms of square feet) ___________________________

3. What is the purpose behind your purchasing a land?
   a. Building a home
   b. Setting up some industry
   c. Investment purposes
   d. Real estate business

4. Rate the following financial items by the importance you give them (3 = most important, 1 = least important)
   1. Initial purchase price _______
   2. Resale price _______
   3. Availability of various financing options _______

5. What is the kind of the location of the land you are looking at?
   a. Tier-1 city
   b.

6. Which is more important to you, mileage or pick up? (1 being highest for mileage and 5 being highest for pickup)
   1.   2.   3.   4.   5.

7. Are you willing to pay extra for electric start? Y/N
   If yes, then how much? ________

8. Rate according to importance on a scale of 1 to 5 (5 being the most important).
   Factor
   Brand name
   Colour
   Discounts/free goodies
   Word of mouth/Peer pressure
   Top speed
   Ground clearance
   Tyre width
   Manoeuvrability
   Comfort
   Availability of service centre

9. What will be the major usage of your land? ______________

Assignment 1
The assignment was to conduct a bivariate analysis of data. The data are taken from the book by Mankiw and cover fifteen years of Real GNP and Consumption Expenditure. Of the four models we fitted, the cubic model gave the highest R Square (.883), so Consumption Expenditure is best described as cubically dependent on Real GNP.

DATA
Year   Real GNP   Consumption Expenditure
1929   203.6      139.6
1930   183.5      130.4
1931   169.5      126.1
1932   144.2      114.8
1933   141.5      112.8
1934   154.3      118.1
1935   169.5      125.5
1936   193.2      138.4
1937   203.2      143.1
1938   192.9      140.2
1939   209.4      148.2
1940   227.2      115.7
1941   230.4      149.5
1942   234.6      135.8
1943   231.5      151.4

VARIABLE USED
The Real GNP is the independent variable used for the analysis (shown on the X-axis). The Consumption Expenditure is the dependent variable (shown on the Y-axis).

A PRIORI REASONING
The Real GNP and the Consumption Expenditure are directly related: as the Real GNP increases, Consumption Expenditure also increases.

NULL HYPOTHESIS: H0: There is no relation between Consumption Expenditure and Real GNP.
ALTERNATIVE HYPOTHESIS: H1: Consumption Expenditure and Real GNP are statistically related.

Linear
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.919   .844       .832                5.563
The independent variable is X.

ANOVA
             Sum of Squares   df   Mean Square   F        Sig.
Regression   2181.278         1    2181.278      70.474   .000
Residual     402.371          13   30.952
Total        2583.649         14
The independent variable is X.

Coefficients
             Unstandardized Coefficients   Standardized Coefficients
             B         Std. Error          Beta                        t       Sig.
X            .395      .047                .919                        8.395   .000
(Constant)   59.269    9.171                                           6.463   .000

Linear Equation:

Yi^ = 59.269+ 0.395 Xi

When we ran the regression analysis using the linear model, we got an R Square value of .844. This means that 84.4% of the variability in the Consumption Expenditure (Y) is explained by the change in the Real GNP (X). The constant (59.269) gives the autonomous or subsistence level of consumption, and the slope coefficient (0.395) is the marginal propensity to consume.
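The figures in the Model Summary can be cross-checked from the ANOVA table alone. The following is a minimal Python sketch, not part of the SPSS output; the function name anova_summary and its parameters are hypothetical, and the inputs are the sums of squares reported for the linear model with 15 yearly observations and 1 predictor.

```python
def anova_summary(ss_regression, ss_residual, n_obs, n_predictors):
    """Derive R Square, Adjusted R Square and F from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    r_squared = ss_regression / ss_total
    # Adjusted R Square penalises the fit for the number of predictors used.
    adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
    # F is the ratio of the regression mean square to the residual mean square.
    f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r_squared, adj_r_squared, f_stat


# Sums of squares copied from the linear-model ANOVA table.
r2, adj_r2, f_stat = anova_summary(2181.278, 402.371, n_obs=15, n_predictors=1)
print(round(r2, 3), round(adj_r2, 3), round(f_stat, 2))
```

The same function can be applied to the other ANOVA tables in this assignment by passing 2 predictors for the quadratic and cubic models.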

Logarithmic
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.929   .864       .853                5.205
The independent variable is X.

ANOVA
             Sum of Squares   df   Mean Square   F        Sig.
Regression   2231.474         1    2231.474      82.371   .000
Residual     352.175          13   27.090
Total        2583.649         14
The independent variable is X.

Coefficients
             Unstandardized Coefficients   Standardized Coefficients
             B          Std. Error         Beta                        t        Sig.
ln(X)        74.190     8.174              .929                        9.076    .000
(Constant)   -253.986   42.914                                         -5.918   .000

Logarithmic Equation:

Yi^ = -253.986 + 74.190 ln(Xi)

When we ran the regression analysis using the logarithmic model, we got an R Square value of .864. This means that 86.4% of the variability in Consumption Expenditure (Y) is explained by the change in the Real GNP.

Quadratic
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.938   .881       .861                5.069
The independent variable is X.

ANOVA
             Sum of Squares   df   Mean Square   F        Sig.
Regression   2275.274         2    1137.637      44.270   .000
Residual     308.376          12   25.698
Total        2583.649         14
The independent variable is X.

Coefficients
             Unstandardized Coefficients   Standardized Coefficients
             B         Std. Error          Beta                        t        Sig.
X            1.535     .598                3.572                       2.568    .025
X ** 2       -.003     .002                -2.660                      -1.913   .080
(Constant)   -45.827   55.583                                          -.824    .426

Quadratic Equation:

Yi^ = -45.827 + 1.535 Xi - .003 Xi^2

When we ran the regression analysis using the quadratic model, we got an R Square value of .881. This means that 88.1% of the variability in Consumption Expenditure (Y) is explained by the change in the Real GNP.

Cubic
Model Summary
R      R Square   Adjusted R Square   Std. Error of the Estimate
.940   .883       .864                5.014
The independent variable is X.

ANOVA
             Sum of Squares   df   Mean Square   F        Sig.
Regression   2281.943         2    1140.972      45.381   .000
Residual     301.706          12   25.142
Total        2583.649         14
The independent variable is X.

Coefficients
             Unstandardized Coefficients   Standardized Coefficients
             B            Std. Error       Beta                        t        Sig.
X            .995         .303             2.316                       3.284    .007
X ** 3       -5.498E-6    .000             -1.411                      -2.001   .069
(Constant)   -14.145      37.609                                       -.376    .713

Cubic Equation:

Yi^ = -14.145 + .995 Xi - 5.498E-6 Xi^3

When we ran the regression analysis using the cubic model, we got an R Square value of .883. This means that 88.3% of the variability in Consumption Expenditure (Y) is explained by the change in the Real GNP.
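To see how the four fitted equations compare in practice, they can be evaluated at the same Real GNP value. The sketch below is illustrative only: the coefficients are the B values from the four Coefficients tables, while the function names and the evaluation point of 200 are assumptions made for the example. Because SPSS prints B to three decimals (for example -.003 for the squared term), the predictions are approximate.

```python
import math

# B values taken from the four Coefficients tables in this assignment.
def linear(x):
    return 59.269 + 0.395 * x

def logarithmic(x):
    return -253.986 + 74.190 * math.log(x)  # natural log, as in ln(X)

def quadratic(x):
    return -45.827 + 1.535 * x - 0.003 * x ** 2

def cubic(x):
    # Constant taken from the Coefficients table (-14.145).
    return -14.145 + 0.995 * x - 5.498e-6 * x ** 3

models = {
    "linear": linear,
    "logarithmic": logarithmic,
    "quadratic": quadratic,
    "cubic": cubic,
}

gnp = 200.0  # hypothetical Real GNP value, chosen inside the observed range
predictions = {name: model(gnp) for name, model in models.items()}
for name, value in predictions.items():
    print(f"{name:12s} {value:8.2f}")
```

At a value inside the observed range the four models give similar predictions, which is consistent with their R Square values lying close together (.844 to .883).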


Assignment 2
In this assignment, we had a dependent variable (Production) and four independent variables (Area, Yield, % Coverage under irrigation and Fertilizer). We studied the relation of the dependent variable with the independent variables.

Null hypothesis: There is no significant relationship between the four independent variables and the agricultural production.

Iteration 1
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       1.000a   1.000      .999                .61255
a. Predictors: (Constant), FERTILIZER, AREA, YIELD, COVERAGE

Coefficientsa
                       Unstandardized Coefficients   Standardized Coefficients
Model                  B          Std. Error         Beta                        t         Sig.
1       (Constant)     -156.907   12.308                                         -12.749   .000
        AREA           1.293      .069               .183                        18.733    .000
        YIELD          .128       .003               1.114                       41.923    .000
        COVERAGE       -.303      .230               -.055                       -1.318    .204
        FERTILIZER     .012       .022               .018                        .546      .592
a. Dependent Variable: PRODUCTION


Coefficient Correlationsa
Model 1                      FERTILIZER   AREA     YIELD       COVERAGE
Correlations   FERTILIZER    1.000        -.400    -.197       -.702
               AREA          -.400        1.000    -.519       .776
               YIELD         -.197        -.519    1.000       -.543
               COVERAGE      -.702        .776     -.543       1.000
Covariances    FERTILIZER    .001         .000     -1.349E-5   -.004
               AREA          .000         .005     .000        .012
               YIELD         -1.349E-5    .000     9.362E-6    .000
               COVERAGE      -.004        .012     .000        .053
a. Dependent Variable: PRODUCTION

Here we find that there is high correlation between area and coverage (.776). Further, coverage is less significant (Sig. = .204) than area (Sig. = .000). Therefore we remove coverage and conduct the regression again. We removed the variables manually using reasoning rather than automated methods such as backward or forward selection.

Iteration 2
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       1.000a   .999       .999                .62434
a. Predictors: (Constant), FERTILIZER, AREA, YIELD

Coefficientsa
                       Unstandardized Coefficients   Standardized Coefficients
Model                  B          Std. Error         Beta                        t         Sig.
1       (Constant)     -171.313   5.775                                          -29.666   .000
        AREA           1.363      .044               .193                        30.734    .000
        YIELD          .126       .003               1.095                       48.141    .000
        FERTILIZER     -.009      .016               -.012                       -.522     .607
a. Dependent Variable: PRODUCTION

Coefficient Correlationsa
Model 1                      FERTILIZER   AREA        YIELD
Correlations   FERTILIZER    1.000        .321        -.965
               AREA          .321         1.000       -.185
               YIELD         -.965        -.185       1.000
Covariances    FERTILIZER    .000         .000        -4.118E-5
               AREA          .000         .002        -2.152E-5
               YIELD         -4.118E-5    -2.152E-5   6.859E-6
a. Dependent Variable: PRODUCTION

By similar reasoning we now remove fertilizer and conduct regression.
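The manual elimination reasoning used in Iterations 1 and 2 can be written down as a small rule: find the most strongly correlated pair of predictors in the Coefficient Correlations table and drop whichever member of the pair has the higher Sig. value. The sketch below is only an illustration of that reasoning, not an SPSS procedure; the function name variable_to_drop and the 0.75 correlation cut-off are assumptions, and the input numbers are copied from the tables above.

```python
def variable_to_drop(sig, coef_corr, corr_cutoff=0.75):
    """Return the less significant member of the most correlated predictor pair.

    sig        maps predictor name to its Sig. value from the Coefficients table
    coef_corr  maps (name_a, name_b) to the value in the Coefficient Correlations table
    Returns None when no pair exceeds the (assumed) cut-off.
    """
    flagged = [(abs(r), pair) for pair, r in coef_corr.items() if abs(r) > corr_cutoff]
    if not flagged:
        return None
    _, (a, b) = max(flagged)
    return a if sig[a] > sig[b] else b


# Iteration 1: four predictors.
sig_1 = {"AREA": 0.000, "YIELD": 0.000, "COVERAGE": 0.204, "FERTILIZER": 0.592}
corr_1 = {
    ("FERTILIZER", "AREA"): -0.400,
    ("FERTILIZER", "YIELD"): -0.197,
    ("FERTILIZER", "COVERAGE"): -0.702,
    ("AREA", "YIELD"): -0.519,
    ("AREA", "COVERAGE"): 0.776,
    ("YIELD", "COVERAGE"): -0.543,
}

# Iteration 2: coverage removed.
sig_2 = {"AREA": 0.000, "YIELD": 0.000, "FERTILIZER": 0.607}
corr_2 = {
    ("FERTILIZER", "AREA"): 0.321,
    ("FERTILIZER", "YIELD"): -0.965,
    ("AREA", "YIELD"): -0.185,
}

drop_1 = variable_to_drop(sig_1, corr_1)
drop_2 = variable_to_drop(sig_2, corr_2)
print(drop_1, drop_2)
```

Note that this rule differs from standard backward elimination, which would drop the predictor with the highest Sig. value regardless of the correlations.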


Iteration 3
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       1.000a   .999       .999                .61289
a. Predictors: (Constant), YIELD, AREA

Coefficientsa
                       Unstandardized Coefficients   Standardized Coefficients
Model                  B          Std. Error         Beta                        t         Sig.
1       (Constant)     -171.437   5.664                                          -30.268   .000
        AREA           1.371      .041               .194                        33.243    .000
        YIELD          .125       .001               1.084                       185.528   .000
a. Dependent Variable: PRODUCTION

Coefficient Correlationsa
Model 1                 YIELD      AREA
Correlations   YIELD    1.000      .505
               AREA     .505       1.000
Covariances    YIELD    4.522E-7   1.399E-5
               AREA     1.399E-5   .002
a. Dependent Variable: PRODUCTION


ANALYSIS:
We started with four independent variables: area, coverage of irrigated land, fertiliser and yield. The null hypothesis was that there is no significant relationship between these variables and agricultural food grain production.

In the first SPSS multiple regression run, coverage was not significant (Sig. = .204) and its coefficient correlation with area was high (.776). We therefore eliminated coverage and ran the regression again with the remaining three variables. The t-values of area and yield improved, but the coefficient correlation between yield and fertiliser was very high (-.965) and fertiliser was not significant (Sig. = .607), so we eliminated fertiliser and ran the regression a third time.

In the third run both remaining variables, yield and area under cultivation, were highly significant (Sig. = .000), so we reject the null hypothesis for these two variables. The coefficient correlation between them (.505) was also lower than the correlations that led to the earlier eliminations, which supports the validity of the result. For this secondary sample of Indian agricultural data, food grain production increases with yield and with area under cultivation. Because both B coefficients are positive, the relationship is direct.


Assignment 3
Factor Analysis
Factor analysis is a statistical method used to describe variability among observed variables in terms of fewer unobserved variables called factors. The general purpose is to condense the information contained in a number of original variables into a smaller set of new, composite dimensions or variates (factors) with a minimum loss of information.
In our analysis we had the following manifest variables:

Manifest Variables
- Initial Price
- Resale Price
- Financing Options
- Mileage vs. Pickup
- Discounts
- Top Speed
- Ground Clearance
- Tyre Width
- Maneuverability
- Service Centre

KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .601
Bartlett's Test of Sphericity     Approx. Chi-Square     236.175
                                  df                     45
                                  Sig.                   .000


Interpretation: With a KMO value of .601, the degree of common variance among the ten variables is "mediocre". If a factor analysis is conducted, the factors extracted will account for a fair amount of variance but not a substantial amount.

Analysis: Since we were doing an exploratory analysis we used the Principal Component method. Based on the analysis we reduced the 10 manifest variables to 4 factors. Some of the variables were also found to be redundant and were removed.

Assumptions:
1) The cut-off point taken for factor loadings is 0.5.
2) Factor analysis was done using 10 manifest variables instead of 14, for better explanatory power.

Factors Identified after Factor Reduction Method:
1) Technical Specification (explains 33.462% of the total variance)
   - Ground clearance (0.849)
   - Tyre width (0.888)
   - Top speed (0.810)
   - Mileage vs. pickup (0.566)
   - Maneuverability (0.811)
2) Additional Incentives (explains 15.959% of the variance)
   - Discounts (0.775)
   - Service centre (0.734)
3) Initial Price (explains 14.239% of the variance)
4) Financing Options (explains 13.201% of the variance)


Total factors formed after reduction = 4, which together accounted for 76.86% of the total variance. The variable found to be redundant, based on the respondent data collected, was:
1) Resale Price
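Two of the figures reported for the factor analysis can be checked with simple arithmetic: the degrees of freedom of Bartlett's test, which for p variables is p(p-1)/2, and the total variance explained, which is the sum of the four factor percentages. The snippet below is a small sketch of that check; the variable names are hypothetical and the numbers are those reported above.

```python
# Bartlett's test of sphericity has p * (p - 1) / 2 degrees of freedom
# for p variables; the analysis used 10 manifest variables.
n_variables = 10
bartlett_df = n_variables * (n_variables - 1) // 2

# Percentage of variance explained by each extracted factor, as reported.
variance_explained = {
    "Technical Specification": 33.462,
    "Additional Incentives": 15.959,
    "Initial Price": 14.239,
    "Financing Options": 13.201,
}
total_variance = sum(variance_explained.values())

# Reported loadings, screened against the 0.5 cut-off used in the analysis.
loadings = {
    "Ground clearance": 0.849,
    "Tyre width": 0.888,
    "Top speed": 0.810,
    "Mileage vs. pickup": 0.566,
    "Maneuverability": 0.811,
    "Discounts": 0.775,
    "Service centre": 0.734,
}
cutoff = 0.5
retained = [name for name, loading in loadings.items() if loading >= cutoff]

print(bartlett_df, round(total_variance, 2), len(retained))
```

The degrees of freedom agree with the df of 45 in the KMO and Bartlett's Test table, and the sum agrees with the 76.86% total reported.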

Cluster Analysis
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. The basis employed can be socio-economic, demographic, psychological, etc. Here the variables used for cluster analysis are:
- Occupation
- Annual Income
- Education
- Ownership of a Bike

The SPSS package was used for the analysis. The methodology employed was agglomerative clustering based on Euclidean distance (within the group). The Euclidean distance taken was 5. The following three clusters were identified:

CLUSTER 1: Medium price owned bike holders - working professionals
- 25 Respondents
- Working Professionals
- Annual Income: 150K-500K
- Medium price owned bike holders

CLUSTER 2: High price owned bike holders - professionals
- 24 Respondents
- Professionals
- Annual Income: above 500K
- High price owned bike holders

CLUSTER 3: Non bike holders - graduation students
- 5 Respondents
- Graduation students
- Non bike holders
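The agglomerative procedure can be illustrated with a short pure-Python sketch. It is not the SPSS implementation: it assumes average linkage between groups with raw Euclidean distance, the function names are hypothetical, and the six coded respondents (occupation, income band, education, bike ownership) are made-up values for illustration, not survey data. The cut-off of 2.0 is likewise an assumption on the raw distance scale, whereas SPSS reports a rescaled distance from 0 to 25.

```python
import math
from itertools import combinations


def average_linkage(cluster_a, cluster_b, points):
    """Mean Euclidean distance over all cross-cluster pairs of points."""
    total = sum(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerate(points, cutoff):
    """Merge the two closest clusters until no pair is within the cut-off."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average-linkage distance.
        (a, b), dist = min(
            (((a, b), average_linkage(clusters[a], clusters[b], points))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > cutoff:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Hypothetical coded respondents: (occupation, income band, education, owns bike).
respondents = [
    (1, 3, 4, 1), (1, 3, 5, 1),   # similar to each other
    (2, 5, 5, 1), (2, 5, 4, 1),   # similar to each other
    (3, 1, 3, 0), (3, 1, 4, 0),   # similar to each other
]
groups = agglomerate(respondents, cutoff=2.0)
print(groups)
```

Raising the cut-off merges more clusters and lowering it leaves more of them separate, which is how the choice of distance determines the number of clusters reported.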


[Figure: Dendrogram using Average Linkage (Between Groups). Horizontal axis: Rescaled Distance Cluster Combine, 0 to 25. Vertical axis: case label numbers.]

