BY GROUP NO. 5
AKSHAY RAM (1111004) ARUN PRABU (1111010) BHARTI VISHAL (1111016) DHANASHREE VINAYAK SHIRODKAR (1111022) GHULE NILESH VISHNU (1111028) AMOL DEVNATH KUMBHARE (1111034) MUDAVATH SWETHA (1111040) SUPREET KUMAR(1111046) RAJA SIMON J (1111052) SAGAR BEHERA (1111058) SHREYA SETHI (1111065) SWATI MURARKA (1111071) AJUSAL SUGATHAN (1111077)
Executive Summary
Reyem Affiar has recently found the condominium described below in Mid-Cambridge that he wants to purchase.

Street Address    : 236 Ellery Street
Last Price        : $169,000
Area & Area Code  : M/9
Bed               : 2
Bath              : 1
Rooms             : 5
Interior          : 1040
Condo             : $175
Tax               : $1121
RC                : 1 (restrictions on monthly rent that owner may charge)
Even though Affiar is financially capable of paying the asking price of $169,000, negotiation by the buyer's agent generally keeps the selling price below the last asking price. Given the above information and the data that Reyem Affiar has on condominiums sold in Cambridge over the past five years, we need to help him decide on a fair offer price.
Solution Approach
An estimate of the selling price of the above condominium needs to be made. Hence selling price is the dependent variable Y for the regression model. First date, close date and the number of days between the two (Days) cannot be part of the independent variable set, since this information is not yet available for the 236 Ellery Street condominium (the sale has not taken place). Further, the condominium of interest lies in area M (9), so one could analyze only the data on the 111 condominiums from the same area and ignore the rest. On the other hand, if we set up dummy variables for the areas/area codes, these can be incorporated into the regression model, giving a bigger sample of 456 data points and a more accurate prediction for Affiar. This is explained in detail in the model description. Stepwise regression in SPSS has been adopted for variable selection. This method, being a combination of the forward selection and backward elimination techniques for variable selection, helps reduce the errors that multi-collinearity can introduce into a regression model.
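The dummy-variable idea described above can be sketched in a few lines of Python. This is only an illustration: the function name `area_dummies`, the assumption that the area codes run from 1 to 16, and the choice of area 1 as the omitted baseline are hypothetical and are not taken from the case data. The only facts used from the report are that 15 dummy variables were created and that they are named like A2, A5, A12 and A16.

```python
def area_dummies(area_code, all_codes, baseline):
    """Encode one area code as 0/1 dummy variables, leaving out the baseline area.

    With k area codes this yields k - 1 dummies; a condominium in the
    baseline area is represented by all dummies being 0.
    """
    if area_code not in all_codes:
        raise ValueError(f"unknown area code: {area_code}")
    return {
        f"A{code}": int(code == area_code)
        for code in all_codes
        if code != baseline
    }


# Assumptions for illustration only: codes 1..16, area 1 as the baseline.
ALL_CODES = list(range(1, 17))
BASELINE = 1

# The Ellery Street condominium lies in area M (code 9).
ellery = area_dummies(9, ALL_CODES, BASELINE)
print(len(ellery), ellery["A9"], sum(ellery.values()))
```

Each row of the 456-point data set would get one such set of indicators, which then enter the stepwise procedure alongside the numeric variables.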
condominium, hence we remove First Price from our list of possible independent variables. As stated before in section 1.1, we cannot use the number of days between the first and last date as an independent variable either, since the sale of the condominium has not happened and we don't have information on the first date the condominium was put on sale. Finally, we can intuitively see that there will be a positive correlation between interior space and the number of rooms, bathrooms and bedrooms. Since interior space is representative of all three, it can act as a good proxy for them in our regression model, which avoids the issue of multi-collinearity. We will also show this through the output generated in the model description section. Further, one can expect Last Price and interior space to have positive coefficients, while condominium taxes, property taxes and RC should have negative coefficients. The effect of the dummy variables for areas/area codes needs to be explored by running the regression model.
\[
\begin{aligned}
& 156757.758 \pm t_{0.025,\,(456-10)} \sqrt{30268.70125^{2} + 9.162 \times 10^{8}} \\
&= 156757.758 \pm 1.9653 \times \sqrt{30268.70125^{2} + 9.162 \times 10^{8}} \\
&= 156757.758 \pm 84127.57 \\
&= \{72630.188,\ 240885.328\}
\end{aligned}
\]
The standard error and MSE are taken from the regression output table (Appendix). Now, a 95% Confidence Interval for the Selling Price (conditional mean) of 236 Ellery Street Condominium would be given by:
\[
156757.758 \pm 7902.471 = \{148855.29,\ 164660.23\}
\]

The standard error of the mean predicted value is taken from the Residual Statistics table (Appendix).

Exhibit 1: Regression Model Coefficients
Let us check whether the model's regression assumptions are satisfied through residual analysis. From the histogram of residuals shown in the figure below, the normality assumption appears to be satisfied, since the standardized residuals seem to be normally distributed. The normal P-P plot confirms the same. Homoscedasticity can be seen from the residual scatter plot, where the residuals are scattered randomly around the mean 0 with no observable pattern. Finally, independence among the independent variables is inherently taken care of by the step-wise regression technique, which checks for multi-collinearity after each stage (as shown in Figure 1) with Pin = 0.05 and Pout = 0.10. Hence the algorithm automatically removes variables that are correlated with each other and keeps only the most significant independent variables in the model. The individual plots of residual error vs. each independent variable are shown in the Appendix.
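The Coefficients table in the Appendix also lists Tolerance and VIF under Collinearity Statistics, and these give a direct check on multi-collinearity. The sketch below shows the relationship between the two (VIF is the reciprocal of tolerance, and tolerance is 1 minus the R² obtained by regressing one predictor on the others). The function names and the `vif_limit` cut-off of 10 are assumptions for illustration; the cut-off is a common rule of thumb, not something stated in the report.

```python
def vif_from_tolerance(tolerance):
    """Variance inflation factor as the reciprocal of tolerance (tolerance = 1 - R_j^2)."""
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must lie in (0, 1]")
    return 1.0 / tolerance


def flag_collinear(tolerances, vif_limit=10.0):
    """Return the predictors whose VIF exceeds vif_limit (an assumed rule of thumb)."""
    return [
        name
        for name, tol in tolerances.items()
        if vif_from_tolerance(tol) > vif_limit
    ]


# A Tolerance/VIF pair that appears in the Appendix: .414 and 2.416.
# The dictionary keys are placeholders, not an assignment to specific rows.
appendix_pair = {"predictor_1": 0.414, "predictor_2": 0.414}
print(round(vif_from_tolerance(0.414), 3))
print(flag_collinear(appendix_pair))
```

A VIF close to 1 means a predictor is nearly uncorrelated with the others, while larger values indicate that its coefficient estimate is being inflated by overlap with other predictors.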
Model 2:
In Model 1, we accounted for the areas/area codes of the condominiums by starting the step-wise regression analysis with the 15 dummy variables. One could argue that condominiums outside Mid-Cambridge should not be considered for the analysis. Hence step-wise regression was also run with only the 111 data points from Mid-Cambridge condominiums, with Pin = 0.05 and Pout = 0.10. The input independent variables were Last Price, Bed, Bath, Rooms, Interior, Condo, Tax and RC. As can be seen from the Appendix, Last Price and RC were the only independent variables with a significant impact (based on the step-wise partial F-test) on Selling Price. The model can be summarized as below:
For the Ellery Street condominium, we have: Selling Price = 0.96 * 169000 + 1935.903 * 1 - 2181.178 = $161,994.725
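The point estimate above can be reproduced with a short Python sketch. The coefficients are those quoted in the report for Model 2 (with the Last Price coefficient rounded to 0.96, as in the calculation above); the dictionary layout and the function name `predict_selling_price` are illustrative choices, not part of the original analysis.

```python
# Model 2 coefficients as quoted in the report (Last Price rounded to 0.96).
MODEL_2 = {"intercept": -2181.178, "LastPrice": 0.96, "RC": 1935.903}


def predict_selling_price(last_price, rc, coef=MODEL_2):
    """Linear prediction: intercept + b_LastPrice * last_price + b_RC * rc."""
    return coef["intercept"] + coef["LastPrice"] * last_price + coef["RC"] * rc


# 236 Ellery Street: asking price $169,000 taken as Last Price, RC = 1.
ellery_price = predict_selling_price(169000, 1)
print(round(ellery_price, 3))
```

Because the Last Price coefficient is close to 1 and the other terms are small, the prediction stays within a few thousand dollars of the asking price.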
Similar to Model 1, the 95% prediction interval for the selling price of the 236 Ellery Street condominium is given by:

\[
\begin{aligned}
& 161994.725 \pm t_{0.025,\,(111-3)} \sqrt{4422.945^{2} + 1.956 \times 10^{7}} \\
&= 161994.725 \pm 1.98217 \times \sqrt{4422.945^{2} + 1.956 \times 10^{7}} \\
&= 161994.725 \pm 12398.064 \\
&= \{149596.661,\ 174392.789\}
\end{aligned}
\]

The standard error and MSE are taken from the regression output table (Appendix).
Now, a 95% Confidence Interval for the Selling Price (conditional mean) of 236 Ellery Street Condominium would be given by:
\[
\begin{aligned}
& 161994.725 \pm t_{0.025,\,(111-3)} \times 698.994 \\
&= 161994.725 \pm 1.98217 \times 698.994 \\
&= 161994.725 \pm 1385.525 \\
&= \{160609.2,\ 163380.25\}
\end{aligned}
\]

The standard error of the mean predicted value is taken from the Residual Statistics table (Appendix).
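The interval arithmetic for Model 2 can be checked with the sketch below. It reproduces the calculation exactly as written in the report, using the report's own inputs (t value 1.98217, standard error 4422.945, MSE 1.956 × 10^7, standard error of the mean predicted value 698.994). The function names are hypothetical, and the t value is hard-coded from the report rather than looked up from a t-distribution.

```python
import math


def interval(center, half_width):
    """Symmetric interval (lower, upper) around a point estimate."""
    return (center - half_width, center + half_width)


def prediction_half_width(t_crit, std_error, mse):
    """Half-width as computed in the report: t * sqrt(std_error^2 + MSE)."""
    return t_crit * math.sqrt(std_error ** 2 + mse)


def mean_half_width(t_crit, se_mean):
    """Half-width for the conditional mean: t * SE of the mean predicted value."""
    return t_crit * se_mean


T_CRIT = 1.98217          # t[0.025, 111 - 3], value taken from the report
POINT = 161994.725        # Model 2 point estimate for 236 Ellery Street

pi2 = interval(POINT, prediction_half_width(T_CRIT, 4422.945, 1.956e7))
ci2 = interval(POINT, mean_half_width(T_CRIT, 698.994))
print([round(x, 2) for x in pi2])
print([round(x, 2) for x in ci2])
```

The same two helper functions apply to Model 1 by substituting its point estimate, t value (1.9653), standard error (30268.70125) and MSE (9.162 × 10^8).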
As explained for Model 1, there is more uncertainty about the predicted value than about the average value of Y given the values of Xi. Based on the confidence interval, the recommendation for Affiar would be to bid no more than the upper limit of $163,380, since he can be 97.5% confident (100% − 5%/2) that the mean selling price of the condominium would be below this number. So $163,380 is the maximum that he should bid on the condominium. If he were to be more conservative in his bid, he could go by the prediction interval. Since the upper limit of the prediction interval, $174,393, is greater than the asking price of $169,000, his bid should be $169,000 in this case. The maximum he can afford to bid for the house with a 95% confidence level would be $174,393.
Coefficients
                    Unstandardized Coefficients   Standardized Coefficients
Model  Variable     B           Std. Error        Beta                        t         Sig.
1      (Constant)   -544.824    1357.461                                      -.401     .689
       LastPrice    .958        .008              .998                        123.128   .000
2      (Constant)   -2181.178   1541.383                                      -1.415    .160
       LastPrice    .960        .008              .996                        124.529   .000
       RC           1935.903    909.479           .017                        2.129     .036
Let us check whether the model's regression assumptions are satisfied through residual analysis. From the histogram of residuals shown in the figure below, the normality assumption appears to be satisfied, since the standardized residuals seem to be normally distributed. The normal P-P plot confirms the same. Homoscedasticity can be seen from the residual scatter plot, where the residuals are scattered randomly around the mean 0 with no observable pattern. Finally, independence among the independent variables is inherently taken care of by the step-wise regression technique, which checks for multi-collinearity after each stage (as shown in Figure 1) with Pin = 0.05 and Pout = 0.10. Hence the algorithm automatically removes variables that are correlated with each other and keeps only the most significant independent variables in the model. The step-wise regression method works the same way as explained for Model 1. Here only 2 iterations were required to arrive at the final model, as shown in the Appendix. The individual plots of residual error vs. each independent variable are shown in the Appendix.
Other Models:
In addition to the above 2 best-fit models, a number of other regression models with different combinations of input independent variables were tried. For instance, areas were grouped by location (with the help of the map provided) to form a smaller number of dummy variables (e.g., grouping Agassiz, Harvard Square and Radcliffe). Multiple such combinations were formed to see how area can best be fitted into the model. Rooms was tried as a proxy for Interior (due to their high correlation, as seen in the Appendix). Each model was assessed for fit based on R² values, significance of coefficients and residual plots, and the best 2 models have been presented in the case solution. In each model, the given price for the Ellery Street condominium has been assumed to be the Last Price, as stated before.
Model  Mean Selling Price ($)  Prediction Interval ($)     Confidence Interval ($)  Recommended bid price ($)
1      156757.758              {72630.188, 240885.328}     {148855.29, 164660.23}   164,660 / 240,885
2      161,994.725             {149596.661, 174392.789}    {160609.2, 163380.25}    163,380 / 174,393
Comparing the adjusted R² values of the two models, we see that Model 2 is able to explain 99.3% of the variation in Sale Price against Model 1's 88.6%. Hence one might be tempted to use Model 2. But on a closer look, Last Price and RC are the only independent variables used in Model 2. In this case there is not a large difference between the recommended prices for Affiar under Model 1 and Model 2, but in reality a buyer can't base his or her offer only on the seller's stated Last Price. A number of other factors like interior space, tax, apartment maintenance fee, area, etc., need to be considered. From the given data, Model 1 makes a comprehensive attempt to form the best possible regression fit by using the maximum number of data points. Hence the recommendation would be to go by Model 1. In this specific case of the Ellery Street house, however, since the predicted selling prices from the two models do not differ much, it is left to Affiar to make an initial offer of either $164,660 or $163,380.
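The adjusted R² and overall F statistic quoted for Model 1 can be cross-checked from the final-step sums of squares in the ANOVA table of the Appendix (regression 3.264E12, residual 4.086E11, total 3.673E12, with n = 456 observations and k = 9 predictors). The sketch below is a consistency check under those rounded inputs; the function names are illustrative, and small differences from the SPSS output are expected because the sums of squares are printed to four significant figures.

```python
def adjusted_r2(sse, sst, n, k):
    """Adjusted R^2 = 1 - [SSE / (n - k - 1)] / [SST / (n - 1)]."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


def f_statistic(ssr, sse, n, k):
    """Overall F = (SSR / k) / (SSE / (n - k - 1))."""
    return (ssr / k) / (sse / (n - k - 1))


# Final step of the Model 1 stepwise run, values as printed in the ANOVA table.
N_OBS, N_PREDICTORS = 456, 9
SSR, SSE, SST = 3.264e12, 4.086e11, 3.673e12

adj_r2_model_1 = adjusted_r2(SSE, SST, N_OBS, N_PREDICTORS)
f_model_1 = f_statistic(SSR, SSE, N_OBS, N_PREDICTORS)
print(round(adj_r2_model_1, 3), round(f_model_1, 1))
```

Adjusted R² penalizes each added predictor, which is why it is the fairer measure when comparing a 9-predictor model with a 2-predictor model fitted on a different sample.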
Appendix Model 1
Variables Entered/Removed
Model  Variables Entered  Variables Removed
1      Last Price         .
2      Tax                .
3      Interior           .
4      Condo              .
5      A12                .
6      A5                 .
7      RC                 .
8      A16                .
9      A2                 .

Method (all steps): Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
Model Summary
                 Change Statistics
Model  R       F Change   df1  df2  Sig. F Change
1      .872a   1437.412   1    454  .000
2      .925b   300.700    1    453  .000
3              33.258     1    452  .000
4              31.979     1    451  .000
5              20.174     1    450  .000
6              14.420     1    449  .000
7              9.167      1    448  .003
8              6.037      1    447  .014
9      .943i   5.018      1    446  .026

a. Predictors: (Constant), Last Price
b. Predictors: (Constant), Last Price, Tax
c. Predictors: (Constant), Last Price, Tax, Interior
d. Predictors: (Constant), Last Price, Tax, Interior, Condo
e. Predictors: (Constant), Last Price, Tax, Interior, Condo, A12
f. Predictors: (Constant), Last Price, Tax, Interior, Condo, A12, A5
g. Predictors: (Constant), Last Price, Tax, Interior, Condo, A12, A5, RC
h. Predictors: (Constant), Last Price, Tax, Interior, Condo, A12, A5, RC, A16
i. Predictors: (Constant), Last Price, Tax, Interior, Condo, A12, A5, RC, A16, A2
j. Dependent Variable: Sale Price
ANOVA
Model  Source      Sum of Squares  df   Mean Square  F         Sig.
1      Regression  2.791E12        1                 1.437E3   .000a
       Residual    8.816E11        454
       Total       3.673E12        455
2      Regression  3.143E12        2    1.571E12     1.343E3   .000b
       Residual    5.299E11        453  1.170E9
       Total       3.673E12        455
3      Regression  3.179E12        3    1.060E12     970.531   .000c
       Residual    4.935E11        452  1.092E9
       Total       3.673E12        455
4      Regression  3.212E12        4    8.030E11     785.781   .000d
       Residual    4.609E11        451  1.022E9
       Total       3.673E12        455
5      Regression  3.232E12        5    6.463E11     659.386   .000e
       Residual    4.411E11        450  9.802E8
       Total       3.673E12        455
6      Regression  3.245E12        6    5.409E11     568.279   .000f
       Residual    4.274E11        449  9.518E8
       Total       3.673E12        455
7      Regression  3.254E12        7    4.649E11     497.265   .000g
       Residual    4.188E11        448  9.348E8
       Total       3.673E12        455
8      Regression  3.260E12        8    4.074E11     440.754   .000h
       Residual    4.132E11        447  9.244E8
       Total       3.673E12        455
9      Regression  3.264E12        9    3.627E11     395.860   .000i
       Residual    4.086E11        446  9.162E8
       Total       3.673E12        455

a. Predictors: (Constant), LastPrice
b. Predictors: (Constant), LastPrice, Tax
c. Predictors: (Constant), LastPrice, Tax, Interior
d. Predictors: (Constant), LastPrice, Tax, Interior, Condo
e. Predictors: (Constant), LastPrice, Tax, Interior, Condo, A12
f. Predictors: (Constant), LastPrice, Tax, Interior, Condo, A12, A5
g. Predictors: (Constant), LastPrice, Tax, Interior, Condo, A12, A5, RC
h. Predictors: (Constant), LastPrice, Tax, Interior, Condo, A12, A5, RC, A16
i. Predictors: (Constant), LastPrice, Tax, Interior, Condo, A12, A5, RC, A16, A2
j. Dependent Variable: SalePrice
Coefficients
                    Unstandardized Coefficients  Standardized           Collinearity Statistics
Model  Variable     B            Std. Error      Beta     t        Tolerance  VIF
1      (Constant)                                         9.587
       LastPrice                                 .872     37.913
2      (Constant)                                         7.102
       LastPrice                                 .504     18.154   .414       2.416
       Tax                                       .481     17.341   .414       2.416
3      (Constant)   2954.638     4728.282                 .625
       LastPrice    .385         .023            .466     16.921   .391       2.555
       Tax          42.937       2.767           .434     15.516   .379       2.636
       Interior     33.058       5.732           .127     5.767    .614       1.628
4      (Constant)   -8782.305    5022.983                 -1.748
       LastPrice    .351         .023            .424     15.336   .356       2.809
       Tax          33.071       3.195           .335     10.350   .264       3.788
       Interior     43.555       5.848           .167     7.448    .552       1.811
       Condo        123.044      21.759          .148     5.655    .404       2.472
5      (Constant)   -6822.788    4938.803                 -1.381
       LastPrice    .365         .023            .442     16.136   .363       2.753
       Tax          31.758       3.143           .321     10.104   .266       3.755
       Interior     42.998       5.729           .165     7.506    .552       1.810
       Condo        118.486      21.334          .143     5.554    .405       2.467
       A12          -37401.549   8327.091        -.074    -4.492   .974       1.026
6      (Constant)   -4031.904    4921.946                 -.819
       LastPrice    .348         .023            .421     15.299
       Tax          32.865       3.111           .332     10.564
       Interior     43.420       5.646           .167     7.690
       Condo        99.625       21.602          .120     4.612
       A12          -35648.270   8218.611        -.071    -4.338
       A5                                        .068     3.797
7      (Constant)                                         -2.408
       LastPrice                                 .414     15.101
       Tax                                       .349     11.018
       Interior                                  .170     7.907
       Condo                                     .126     4.888
       A12                                       -.058    -3.434
       A5           29784.464    6601.659        .084     4.512
       RC           10435.164    3446.591        .056     3.028
8      (Constant)   -14679.636   5912.150                 -2.483
       LastPrice                 .023            .407     14.883   .336       2.974
       Tax                       3.147           .361     11.325   .248       4.027
       Interior                  5.572           .170     7.960    .550       1.817
       Condo                     21.362          .126     4.900    .380       2.628
       A12                       8391.027        -.058    -3.461   .905       1.105
       A5                                        .081     4.413    .741       1.350
       RC                                        .062     3.325    .729       1.373
       A16                                       -.040    -2.457   .951       1.052
9      (Constant)   -15967.736                            -2.700
       LastPrice                                 .403     14.763   .335       2.988
       Tax                                       .364     11.462   .248       4.035
       Interior                                  .173     8.097    .549       1.821
       Condo                                     .127     4.942    .380       2.629
       A12                                       -.056    -3.345   .902       1.108
       A5                                        .084     4.548    .738       1.354
       RC                                        .059     3.190    .726       1.378
       A16                                       -.037    -2.271   .944       1.059
       A2           12290.704    5486.742        .036     2.240    .967       1.034
Residuals Statistics
                                    Minimum      Maximum     Mean         Std. Deviation  N
Predicted Value                     2.1894E4     7.3736E5    1.7108E5     84699.37571     456
Std. Predicted Value                -1.761       6.686       .000         1.000           456
Standard Error of Predicted Value   1971.030     2.458E4     4.021E3      1982.252        456
Adjusted Predicted Value            1.6813E4     1.1794E6    1.7253E5     95574.81320     456
Residual                            -3.59573E5   1.37644E5   .00000       29967.84529     456
Std. Residual                       -11.879      4.547       .000         .990            456
Stud. Residual                      -20.352      4.861       -.017        1.268           456
Deleted Residual                    -1.05539E6   1.57295E5   -1.45182E3   55783.52632     456
Stud. Deleted Residual              -76.135      4.990       -.139        3.664           456
Mahal. Distance                     .932         298.983     8.980        16.348          456
Cook's Distance                     .000         80.153      .179         3.753           456
Centered Leverage Value             .002         .657        .020         .036            456

a. Dependent Variable: SalePrice
            Standard Error  t Stat    P-value
Intercept   42.08789622     -1.82366  0.068861
Rooms       8.999672999     26.21065  6.77E-93

[Figure: residual plot. Labels: Interior, Residuals; x-axis: Rooms]