
AJ Arena

HW 9
Due 11/07/2014
The file eBayLogistic.csv contains information on 1972 auctions transacted on eBay.com during
May-June 2004. The goal is to use these data to build a model that will distinguish competitive
auctions from noncompetitive ones. A competitive auction is defined as an auction with at least
one bid placed on the item auctioned. Details of the predictors and the response are as follows.
SellerRating   A rating by eBay
Duration       Number of days the auction lasted
ClosePrice     Price item sold (in USD)
OpenPrice      Initial price set by the seller
currencyUS     Does the auction use US currency?
               currencyUS = 1 (yes), currencyUS = 0 (no)
currencyGBP    Does the auction use GBP currency?
               currencyGBP = 1 (yes), currencyGBP = 0 (no)
competitive    Whether or not the auction is competitive:
               1 = competitive (yes), 0 = noncompetitive (no)

The goal is to predict whether or not the auction will be competitive using logistic regression.
(a) Use stepwise selection to find the model that optimizes the AIC criterion. Report the results
from R. What predictors are eliminated from the model?
Stepwise Model Path
Analysis of Deviance Table

Initial Model:
competitive ~ sellerRating + Duration + ClosePrice + OpenPrice +
    currencyUS + currencyGBP

Final Model:
competitive ~ sellerRating + ClosePrice + OpenPrice + currencyUS +
    currencyGBP

        Step Df   Deviance Resid. Df Resid. Dev      AIC
1                               1965   2106.264 2120.264
2 - Duration  1 0.07262561      1966   2106.337 2118.337

Duration was eliminated
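
As a quick sanity check on the stepwise table, the reported AIC values can be reproduced from the residual deviances. The sketch below is written in Python purely for the arithmetic (the analysis itself was done in R) and assumes that, for a 0/1 response, the residual deviance equals -2 times the log-likelihood, so that AIC = deviance + 2k with k counting the intercept plus the predictors. The variable and function names are illustrative, not part of the original R session.

```python
# Residual deviances and AIC values copied from the stepwise table above.
steps = [
    # (label, number of predictors, residual deviance, reported AIC)
    ("full model", 6, 2106.264, 2120.264),
    ("without Duration", 5, 2106.337, 2118.337),
]


def aic_from_deviance(deviance, n_predictors):
    """AIC = deviance + 2k, where k = intercept + number of predictors.

    Assumes a binary (0/1) response, so deviance = -2 * log-likelihood.
    """
    return deviance + 2 * (n_predictors + 1)


recomputed = {label: aic_from_deviance(dev, p) for label, p, dev, _ in steps}

for label, _, _, reported in steps:
    print(f"{label}: recomputed AIC = {recomputed[label]:.3f}, reported = {reported:.3f}")
```

Dropping Duration raises the deviance by only 0.0726 while saving one parameter (2 AIC units), which is why the smaller model has the lower AIC.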


(b) Use subset selection to find the model that optimizes the AIC criterion. Report the results
from R. What predictors are eliminated from the model?
AIC
BICq equivalent for q in (0.744755471705781, 0.977181236157727)
Best Model:
                  Estimate   Std. Error    z value     Pr(>|z|)
(Intercept)  -0.7004119364 1.173723e-01  -5.967438 2.410073e-09
sellerRating -0.0000245737 1.076549e-05  -2.282636 2.245180e-02
ClosePrice    0.1092010701 8.432643e-03  12.949804 2.355559e-38
OpenPrice    -0.1317106637 9.484008e-03 -13.887659 7.525521e-44
currencyUS    0.5808800026 1.301518e-01   4.463097 8.078364e-06
currencyGBP   1.1163934380 2.160161e-01   5.168103 2.364818e-07

> #show results for the top two models
> out$BestModels
  sellerRating Duration ClosePrice OpenPrice currencyUS currencyGBP Criterion
1         TRUE    FALSE       TRUE      TRUE       TRUE        TRUE  2116.337
2         TRUE     TRUE       TRUE      TRUE       TRUE        TRUE  2118.264

Duration was eliminated.
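
The Criterion values reported in part (b) (2116.337 and 2118.264) are not the same numbers as the AIC values in the stepwise table of part (a) (2118.337 and 2120.264). The Python sketch below only checks the arithmetic: both Criterion values are consistent with deviance + 2 × (number of predictors), that is, with the intercept left out of the parameter count. This is an inference from the reported numbers, not a statement taken from the package documentation. Because the offset is the same constant for both models, the ranking is unchanged.

```python
# Residual deviances from the stepwise table in part (a).
deviance = {"without Duration": 2106.337, "full model": 2106.264}
n_predictors = {"without Duration": 5, "full model": 6}

# Criterion values reported by out$BestModels in part (b).
criterion_reported = {"without Duration": 2116.337, "full model": 2118.264}

offsets = {}
for model, dev in deviance.items():
    aic_with_intercept = dev + 2 * (n_predictors[model] + 1)  # part (a) convention
    aic_without_intercept = dev + 2 * n_predictors[model]     # matches part (b)
    offsets[model] = aic_with_intercept - criterion_reported[model]
    print(model, round(aic_without_intercept, 3), round(offsets[model], 3))
```

Both models show the same offset of 2, so the two procedures agree on which model is best.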


Currency EUR is eliminated
(c) Write the estimated equation from part (b).
logit(P(Y = 1)) = -0.7004119364 - 0.0000245737*sellerRating + 0.1092010701*ClosePrice
- 0.1317106637*OpenPrice + 0.5808800026*currencyUS + 1.1163934380*currencyGBP
where logit(p) = ln(p / (1 - p)) is the log-odds that the auction is competitive.
(d) Using the model from part (b), predict P(Y = 1), the probability that the auction is
competitive (yes), for a new record that uses US currency and has SellerRating = 1000,
OpenPrice = $1.5, ClosePrice = $2, and Duration = 5 days. Into which class would you
classify this new record?
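The model from part (b) does not include Duration, so the 5-day duration does not enter the prediction.
logit = -0.7004119364 - 0.0000245737*1000 + 0.1092010701*2 - 0.1317106637*1.5
+ 0.5808800026*1 + 1.1163934380*0 = -0.1232695
P(Y = 1) = 1 / (1 + e^(-logit)) = 1 / (1 + e^(0.1232695))
p = 0.46922159
The probability is about 46.922%.
Because this is below the usual 0.5 cutoff, I would classify the new record as 0 (not competitive).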
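
The prediction in part (d) can be checked by plugging the part (b) coefficients into the logistic function. The following is a minimal Python sketch of that arithmetic (the original analysis was done in R); the dictionary names and the 0.5 classification cutoff are assumptions made for illustration.

```python
import math

# Coefficients of the best AIC model from part (b).
coef = {
    "(Intercept)": -0.7004119364,
    "sellerRating": -0.0000245737,
    "ClosePrice": 0.1092010701,
    "OpenPrice": -0.1317106637,
    "currencyUS": 0.5808800026,
    "currencyGBP": 1.1163934380,
}

# New record from part (d). Duration is omitted because the model does not use it.
new_record = {
    "sellerRating": 1000,
    "ClosePrice": 2.0,
    "OpenPrice": 1.5,
    "currencyUS": 1,
    "currencyGBP": 0,
}

# Linear predictor (log-odds), then the inverse logit.
logit = coef["(Intercept)"] + sum(coef[name] * value for name, value in new_record.items())
p = 1.0 / (1.0 + math.exp(-logit))

# Assumed cutoff of 0.5: classify as competitive only if p >= 0.5.
predicted_class = int(p >= 0.5)

print(f"logit = {logit:.7f}, p = {p:.4f}, class = {predicted_class}")
```

The result should be a probability of about 0.4692 and class 0, in agreement with the answer above.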
(e) From the model from part (b), interpret the meaning of the coefficients for closing price,
and currencyGBP.

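If the closing price increases by 1 unit ($1), the odds that the auction is competitive change by
(e^(0.1092010701) - 1)(100)%, an increase of about 11.5% (holding all other predictors constant).
If currencyGBP changes from 0 to 1 (the auction uses GBP), the odds that the auction is competitive
change by (e^(1.1163934380) - 1)(100)%, an increase of about 205%, so the odds are about 3.05 times
as large (holding all other predictors constant).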
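
The percentage changes in the odds quoted in part (e) follow from exponentiating the coefficients. A short Python sketch of that calculation, with an illustrative helper name that is not part of the original work:

```python
import math

# Coefficients from the part (b) model.
coef_close_price = 0.1092010701
coef_currency_gbp = 1.1163934380


def pct_change_in_odds(beta):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (math.exp(beta) - 1.0) * 100.0


pct_close = pct_change_in_odds(coef_close_price)
pct_gbp = pct_change_in_odds(coef_currency_gbp)
odds_ratio_gbp = math.exp(coef_currency_gbp)

print(f"ClosePrice: odds change by {pct_close:.2f}% per $1")
print(f"currencyGBP: odds change by {pct_gbp:.2f}% (odds ratio {odds_ratio_gbp:.3f})")
```

For an indicator such as currencyGBP, the "one-unit increase" is the switch from 0 to 1, so the exponentiated coefficient is the odds ratio between GBP auctions and otherwise identical auctions with currencyGBP = 0.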

(f)


(g) Use subset selection to find the model that yields the smallest 10-fold cross validation
error rate. Report the results from R. What predictors are eliminated from the model?
CV(K = 10, REP = 1)
BICq equivalent for q in (0, 0.0111867243652535)
Best Model:
              Estimate  Std. Error    z value     Pr(>|z|)
(Intercept) -0.3002928 0.064634959  -4.645982 3.384632e-06
ClosePrice   0.1037808 0.008146831  12.738796 3.599078e-37
OpenPrice   -0.1247768 0.009099840 -13.711976 8.608147e-43

sellerRating, Duration, currencyUS, and currencyGBP were all eliminated.

(h) Compare the models in part (a), (b) and (f) in terms of the predictors in the models.
The models from parts (a) and (b) both eliminated only the predictor Duration, while the 10-fold
cross-validation model eliminated every predictor except ClosePrice and OpenPrice.
