
Multiple Regression Analysis
Case #28, Housing Prices II
Keller Graduate School of Management, GM533
Ryan D. Lee

Executive Summary:
In this report I use a multiple regression analysis to predict an appropriate selling price for my home in Eastville, Oregon. Multiple regression is a statistical technique that quantifies the relationship between several selling features (independent variables) and the selling price of a home (dependent variable). Its value is that it provides a systematic procedure that can be duplicated and used to help potential For Sale By Owner homeowners who are unsure how to price their home.

Introduction:
I am a homeowner in Eastville, Oregon. Like many homeowners these days, I am looking for ways not only to save but also to find additional income. I have been thinking about selling my home but do not want to pay Realtor commissions, as my home has already lost some value in the declining economy. As a result, I decided to take a systematic approach to determining the value of my home, based on features that buyers commonly look for. This approach has helped me determine an appropriate selling price for my home, and it can be duplicated by anyone else selling a home.

As an entrepreneur, I have decided to market my approach to other potential For Sale By Owner (FSBO) homeowners. The average commission paid to Realtors is between 5% and 6% of the selling price of a home [ (Commissions, 2011) ]. Realtors offer important services, and completing a FSBO transaction can be quite daunting without the right information. As I explain in this report, I have already done the groundwork, and I walk through the process of a multiple regression analysis step by step. The comparables that I used in my analysis are easily transferred to any market in the country. It is a turnkey approach for determining an appropriate value for your home, allowing you to be a successful and more profitable home seller.
The data used in this analysis came from a sample (n) of 108 homes with varying features, all located in Eastville, Oregon. The features, or independent variables (X), compared in this analysis are listed below, numbered as they appear in the MegaStat output in Appendix I:
* Square Feet SQ FT (X1), total square feet
* Bedrooms BEDS (X2), number of bedrooms
* Bathrooms BATHS (X3), number of bathrooms
* Heating HEAT (X4), gas = 0, electric = 1
* Architectural Style STYLE (X5), tri-level = 0, two story = 1, ranch styled = 2
* Garage GARAGE (X6), number of cars that can fit in the garage
* Basement BASEMENT (X7), no basement = 0, basement present = 1
* Age AGE (X8), age of the home in years
* Fire FIRE (X9), no fireplace present = 0, at least one fireplace present = 1
* School SCHOOL (X10), Eastville school district = 0, Apple Valley school district = 1

As previously mentioned, I used a multiple regression analysis to interpret the data and determine the relationship between the dependent variable (Y, Price) and the independent variables (X, selling features). Multiple regression models employ more than one independent variable to more accurately describe, predict and control a dependent variable [ (Bruce L. Bowerman, 2010) ]. Thus, the main objective of this report is to use a multiple regression model with several independent variables (X) to understand how those variables relate to the dependent variable (Y), and then to use that understanding to predict the selling price of a house.

Methods and Analysis:
Since the main objective of this process is to determine the selling price of homes, the dependent variable (Y) in this analysis is Price. We begin by stating our null hypothesis (Ho) and alternative hypothesis (Ha):

Ho: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$, meaning none of the independent variables is significantly related to Y (selling price).
Ha: At least one of the $\beta_i$ does not equal 0, meaning at least one of the independent variables (X) is significantly related to Y (selling price).

Each independent variable (X) has its own regression coefficient, which describes the relationship between that particular independent variable (X) and the dependent variable (Y), holding all other variables constant. Building a multiple regression model is a process of elimination that systematically compares a dependent variable (Y) with a number of independent variables (X). With each output you check every regression coefficient to see whether it is significant. If some regression coefficients are not significant, you re-run the regression without the corresponding independent variables (X), and repeat until every independent variable (X) left in the model has a significant regression coefficient.
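The screening step of this elimination procedure can be sketched in a few lines. The sketch below is only illustrative: it hard-codes the p-values reported for the ten-variable model in Appendix I, the names `P_VALUES` and `screen` are hypothetical, and the 0.05 cutoff is an assumption chosen to match the variables this report ends up keeping. It performs a single pass over existing output; it does not re-fit the regression, which is still done in MegaStat.

```python
# p-values copied from the Appendix I regression output (ten-variable model).
P_VALUES = {
    "SQ FT": 2.19e-17,
    "BEDS": 0.0139,
    "BATHS": 0.3380,
    "HEAT": 0.0369,
    "STYLE": 0.1672,
    "GARAGE": 0.0001,
    "BASEMENT": 0.0098,
    "AGE": 0.0004,
    "FIRE": 0.1856,
    "SCHOOL": 0.0713,
}


def screen(p_values, alpha=0.05):
    """Split variables into those significant at `alpha` and those that are not."""
    keep = [name for name, p in p_values.items() if p < alpha]
    drop = [name for name, p in p_values.items() if p >= alpha]
    return keep, drop


keep, drop = screen(P_VALUES)
print("keep for the next run:", keep)
print("candidates to remove: ", drop)
```

After removing the flagged variables the regression has to be run again, because the coefficients and p-values of the remaining variables change once the others are gone.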
Another important aspect of the multiple regression model is the R² output, which is the proportion of the total variation in the n observed values of the dependent variable that is explained by the overall regression model [ (Bruce L. Bowerman, 2010) ]. The closer R² is to 1, the more of the variation in the dependent variable (Y) is explained by the independent variables (X). Using a statistics program called MegaStat we are able to run the multiple regression analysis and analyze the data. Refer to Appendix I for the detailed MegaStat output of the first analysis with 10 independent variables (X). Using all 10 independent variables (X) we have an adjusted R² value of .808, which means that about 80.8% of the variation in the dependent variable Y (selling price) is explained by our independent variables (X) after adjusting for the number of variables in the model.

If the p-value for testing Ho is less than:
* 0.10, we have some evidence that Ho is false.
* 0.05, we have strong evidence that Ho is false.
* 0.01, we have very strong evidence that Ho is false.
* 0.001, we have extremely strong evidence that Ho is false.
Levels of significance used to interpret the weight of evidence against the null hypothesis (Ho)

Next we look at the F-test and its corresponding P-value. The F-test is used to test the significance of the overall relationship between the independent variables (X) and the dependent variable (Y). A high F-value with a low corresponding P-value, below the three main levels of significance of .10, .05 and .01, indicates that the model as a whole explains a significant share of the variation in Y. In Appendix I the F-value is 45.91 and the corresponding P-value is 2.21E-32. Because the P-value is less than all of the levels of significance, we can conclude that there is extremely strong evidence that at least one of the 10 independent variables (X) is significantly related to Y (sales price). Therefore, we can reject the null hypothesis and accept the alternative hypothesis, indicating that at least one of the independent variables (X) is significantly related to Y (selling price). From this we can attempt to improve our model by examining the P-value of each independent variable (X) and eliminating any independent variables that do not help explain the variation in the dependent variable (Y).

All of the metrics described above help us decide whether to fail to reject the null hypothesis (Ho) or to reject it in favor of the alternative hypothesis (Ha). In our case the null hypothesis (Ho) states that there is no relationship between the selling price of a house and any of the 10 independent variables (X) that we are testing. Conversely, the alternative hypothesis (Ha) states that there is a relationship between all or some of the independent variables (X) and the dependent variable (Y).

The multiple regression model that relates Y to X1, X2, ..., Xk is represented as

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$

where $\beta_0$ is the intercept, $\beta_1, \ldots, \beta_k$ are the regression coefficients and $\varepsilon$ is the error term. This equation describes the relationship between the dependent variable (Y) and the independent variables (X). Thus, we can use the equation to predict Y (selling price of a home) when values for all 10 independent variables (X) are known. We will use this equation in the second MegaStat analysis that we run.

Now that we have examined the overall model, we will analyze the P-value of each independent variable and eliminate the independent variables with high P-values to improve the efficiency of our multiple regression equation. Referring to the MegaStat output in Appendix I, we see that independent variables X3 (Baths), X5 (Style) and X9 (Fire) all have P-values greater than the .10 level of significance (.3380, .1672 and .1856), and X10 (School) has a P-value of .0713, which is higher than the .05 level of significance.
For these four independent variables we fail to reject the null hypothesis (Ho) that their coefficients are zero, indicating that Bathrooms, Style, Fireplaces and School district have no statistically significant effect on the selling price of a home in this model. Thus, we will eliminate them from our equation and run another MegaStat analysis. By contrast, the P-values for X1 (Square Feet), X6 (Garage), X7 (Basement) and X8 (Age) are all less than the .01 level of significance, which gives very strong evidence that Ho is false for these variables and that they contribute significantly to the selling price of homes. The P-values for X2 (Bedrooms) and X4 (Heat) are less than the .05 level of significance, providing strong evidence that Ho is false.

After examining all the independent variables (X) individually, we are ready to run a new MegaStat analysis with only the independent variables (X) that contribute to the variation in the dependent variable (Y). We run the same MegaStat analysis described above with only 6 independent variables (X); see Appendix II. The analysis is listed in summary form below:

| Statistic | Value | Interpretation |
| Adjusted R² | 0.797 | 79.7% of the variation in Y can be explained by the independent variables (X) used in the equation |
| F-value | 70.82 | A high F-value with a correspondingly low P-value indicates that the model is a good fit for describing the relationship between the independent variables and the dependent variable |
| Corresponding P-value | 5.79E-34 | The P-value is less than the three main levels of significance, meaning we have extremely strong evidence that Ho is false |
| Independent variable (X) with smallest P-value | Square Feet (X1) | Square feet is the independent variable (X) most strongly related to the variation in Y. This indicates that homes with more square feet will sell for a higher price |

The values in the summary above indicate that the multiple regression analysis with only 6 independent variables (X) describes the relationship between X and Y more efficiently than the first model: the F-value rises from 45.91 to 70.82, while adjusted R² falls only slightly, from .808 to .797. Because the independent variable square feet (X1) has the smallest P-value, it is the variable most strongly related to the variation in Y, the selling price of a home. To be able to see this visually we will use a scatter plot diagram. The value of a scatter plot is that you can see the relationship between the independent and dependent variables visually.
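The summary statistics quoted for the two models can be re-derived from the ANOVA sums of squares in Appendices I and II. The sketch below is a cross-check under stated assumptions, not part of the MegaStat workflow: `fit_summary` is a hypothetical helper name, and the formulas are the standard definitions R² = SSR/SST, adjusted R² = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)] and F = (SSR/k) / [SSE/(n − k − 1)].

```python
def fit_summary(ss_regression, ss_residual, n, k):
    """Return R^2, adjusted R^2 and the overall F statistic from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    mean_square_error = ss_residual / (n - k - 1)
    return {
        "r2": ss_regression / ss_total,
        "adj_r2": 1 - mean_square_error / (ss_total / (n - 1)),
        "f": (ss_regression / k) / mean_square_error,
    }


# Sums of squares copied from the ANOVA tables in Appendix I and Appendix II.
ten_variable = fit_summary(61703.8105, 13037.8312, n=108, k=10)
six_variable = fit_summary(60387.1261, 14354.5156, n=108, k=6)

for label, stats in (("10 variables", ten_variable), ("6 variables", six_variable)):
    print(f"{label}: R2={stats['r2']:.3f}  adj R2={stats['adj_r2']:.3f}  F={stats['f']:.2f}")
```

Dropping four variables frees four degrees of freedom, which is why F increases even though the regression sum of squares goes down slightly.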

Using the equation $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$ along with the 6 independent variables (X), numbered as in Appendix II, we will predict the selling price of three homes from our sample (n) of 108 homes. We will use the smallest, median and largest square footage in our sample while keeping all other independent variables (X) constant.
* X1 values to be used: 816 (smallest sq ft), 1758 (median sq ft), 2809 (largest sq ft)
* Constant independent variables: X2 (Beds) = 3, X3 (Heat) = 1, X4 (Garage) = 2, X5 (Basement) = 1, X6 (Age) = 11

Example 1: Smallest Square Footage Home
Y = -12.5988 + 0.0383*X1 + 4.3573*X2 - 14.5371*X3 + 16.0610*X4 + 11.3576*X5 - 1.2168*X6
Y = -12.5988 + 0.0383(816) + 4.3573(3) - 14.5371(1) + 16.0610(2) + 11.3576(1) - 1.2168(11)
Y = $47.28

Example 2: Median Square Footage Home
Y = -12.5988 + 0.0383*X1 + 4.3573*X2 - 14.5371*X3 + 16.0610*X4 + 11.3576*X5 - 1.2168*X6
Y = -12.5988 + 0.0383(1758) + 4.3573(3) - 14.5371(1) + 16.0610(2) + 11.3576(1) - 1.2168(11)
Y = $83.36

Example 3: Largest Square Footage Home
Y = -12.5988 + 0.0383*X1 + 4.3573*X2 - 14.5371*X3 + 16.0610*X4 + 11.3576*X5 - 1.2168*X6
Y = -12.5988 + 0.0383(2809) + 4.3573(3) - 14.5371(1) + 16.0610(2) + 11.3576(1) - 1.2168(11)
Y = $123.62

As we can see, holding all other independent variables (X) constant, we can change the value for square footage and determine the appropriate sales price.

Conclusion:
Using a multiple regression analysis we are able to determine the relationship between a dependent variable (Y) and several independent variables (X). By refining the model, we were able to predict the appropriate selling price of a home. From the multiple regression analyses that we performed, we can reject the null hypothesis (Ho) and accept the alternative hypothesis (Ha), indicating that at least one of the independent variables (X) contributed to the variation in the selling price of a home (Y).
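The worked predictions in the Methods and Analysis section can be reproduced with a short sketch that evaluates the six-variable equation using the intercept and coefficients from Appendix II. The names `INTERCEPT`, `COEFFICIENTS`, `BASE_HOME` and `predict_price` are illustrative, and the result is in whatever units the PRICE variable uses. The AGE term is subtracted because its coefficient in Appendix II is negative; treating it as positive would raise each prediction by 2 × 1.2168 × 11 ≈ 26.77.

```python
# Intercept and coefficients copied from the Appendix II regression output.
INTERCEPT = -12.5988
COEFFICIENTS = {
    "SQ FT": 0.0383,
    "BEDS": 4.3573,
    "HEAT": -14.5371,
    "GARAGE": 16.0610,
    "BASEMENT": 11.3576,
    "AGE": -1.2168,
}

# Feature values held constant in the three worked examples.
BASE_HOME = {"BEDS": 3, "HEAT": 1, "GARAGE": 2, "BASEMENT": 1, "AGE": 11}


def predict_price(home):
    """Evaluate the fitted equation for one home given as a name -> value mapping."""
    return INTERCEPT + sum(coef * home[name] for name, coef in COEFFICIENTS.items())


for square_feet in (816, 1758, 2809):
    home = {**BASE_HOME, "SQ FT": square_feet}
    print(f"{square_feet} sq ft -> predicted PRICE {predict_price(home):.2f}")
```

Keeping the coefficients in a mapping keyed by variable name avoids depending on the X1 to X6 numbering, which differs between the ten-variable and six-variable outputs.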
While the adjusted R² values were fairly similar in both of our attempts, the F-test and the corresponding P-value in the second multiple regression analysis indicated that the second model was a better fit for predicting Y with the six values of X. Based on the P-values for the six independent variables (X), the equation that best predicts the variation in the selling price of a house (Y), with the variables numbered as in Appendix II, is:

Y = -12.5988 + 0.0383*X1 + 4.3573*X2 - 14.5371*X3 + 16.0610*X4 + 11.3576*X5 - 1.2168*X6

By performing a similar procedure in any area of the United States, a FSBO seller could more accurately predict the appropriate selling price for their home. The independent variables can be changed to better fit the unique selling features in your area.

Appendix I
MegaStat Analysis output with 10 independent variables (X)

Regression Analysis
| R² | 0.826 |
| Adjusted R² | 0.808 |
| R | 0.909 |
| Std. Error | 11.594 |
| n | 108 |
| k | 10 |
| Dep. Var. | PRICE (Y) |

ANOVA table
| Source | SS | df | MS | F | p-value |
| Regression | 61,703.8105 | 10 | 6,170.3811 | 45.91 | 2.21E-32 |
| Residual | 13,037.8312 | 97 | 134.4106 | | |
| Total | 74,741.6417 | 107 | | | |

Regression output (confidence interval: 95% lower, 95% upper)
| variables | coefficients | std. error | t (df=97) | p-value | 95% lower | 95% upper |
| Intercept | -15.2124 | 9.8179 | -1.549 | .1245 | -34.6982 | 4.2734 |
| SQ FT (X1) | 0.0376 | 0.0036 | 10.365 | 2.19E-17 | 0.0304 | 0.0448 |
| BEDS (X2) | 4.9237 | 1.9647 | 2.506 | .0139 | 1.0244 | 8.8231 |
| BATHS (X3) | -2.9115 | 3.0240 | -0.963 | .3380 | -8.9132 | 3.0902 |
| HEAT (X4) | -12.9097 | 6.1009 | -2.116 | .0369 | -25.0184 | -0.8010 |
| STYLE (X5) | 2.2877 | 1.6437 | 1.392 | .1672 | -0.9746 | 5.5501 |
| GARAGE (X6) | 15.7593 | 3.8246 | 4.121 | .0001 | 8.1686 | 23.3501 |
| BASEMENT (X7) | 9.0772 | 3.4454 | 2.635 | .0098 | 2.2390 | 15.9154 |
| AGE (X8) | -1.0342 | 0.2813 | -3.676 | .0004 | -1.5925 | -0.4758 |
| FIRE (X9) | 5.3054 | 3.9794 | 1.333 | .1856 | -2.5927 | 13.2035 |
| SCHOOL (X10) | 4.6217 | 2.5341 | 1.824 | .0713 | -0.4079 | 9.6513 |

Appendix II
MegaStat Analysis output with 6 independent variables (X)

Regression Analysis
| R² | 0.808 |
| Adjusted R² | 0.797 |
| R | 0.899 |
| Std. Error | 11.922 |
| n | 108 |
| k | 6 |
| Dep. Var. | PRICE (Y) |

ANOVA table
| Source | SS | df | MS | F | p-value |
| Regression | 60,387.1261 | 6 | 10,064.5210 | 70.82 | 5.79E-34 |
| Residual | 14,354.5156 | 101 | 142.1239 | | |
| Total | 74,741.6417 | 107 | | | |

Regression output (confidence interval: 95% lower, 95% upper)
| variables | coefficients | std. error | t (df=101) | p-value | 95% lower | 95% upper |
| Intercept | -12.5988 | 9.4172 | -1.338 | .1839 | -31.2799 | 6.0823 |
| SQ FT (X1) | 0.0383 | 0.0032 | 11.976 | 4.25E-21 | 0.0319 | 0.0446 |
| BEDS (X2) | 4.3573 | 1.9124 | 2.278 | .0248 | 0.5636 | 8.1510 |
| HEAT (X3) | -14.5371 | 6.1467 | -2.365 | .0199 | -26.7305 | -2.3437 |
| GARAGE (X4) | 16.0610 | 3.9271 | 4.090 | .0001 | 8.2706 | 23.8513 |
| BASEMENT (X5) | 11.3576 | 3.2806 | 3.462 | .0008 | 4.8497 | 17.8655 |
| AGE (X6) | -1.2168 | 0.2810 | -4.331 | 3.51E-05 | -1.7742 | -0.6595 |

References:
Bruce L. Bowerman, R. T. (2010). Essentials of Business Statistics 3rd Edition. New York: McGraw-Hill Irwin.
Commissions, R. E. (2011). Real Estate Sales Commissions Rates - North American Real Estate. Retrieved April 14, 2011, from Accelerated Real Estate House Sales: http://www.oneifbyland.com/salescommission.htm
