
Assignment 3: Multiple Regression

This data set consists of a sample of over eight hundred used cars in this country. The retail price of
these cars was calculated from the tables provided by the association of car manufacturers. You are
provided with a data set containing the following variables:

Price: suggested retail price of the used car in excellent condition. The condition of a car can
greatly affect price. All cars in this data set were less than one year old when priced and
considered to be in excellent condition.
Mileage: number of miles the car has been driven
Make: manufacturer of the car.
Model: specific models for each car manufacturer.
Trim (of car): specific type of car model such as SE Sedan 4D, Quad Coupe 2D
Type: body type such as sedan, coupe, etc.
Cylinder: number of cylinders in the engine
Liter: a more specific measure of engine size
Doors: number of doors
Cruise: indicator variable representing whether the car has cruise control (1 = cruise)
Sound: indicator variable representing whether the car has upgraded speakers (1 = upgraded)
Leather: indicator variable representing whether the car has leather seats (1 = leather)

Perform the following tasks on this data set:


1.

Use simple linear regression to explore the intuitive relationship between miles traveled
and retail price.
From the simple regression results, answer the following questions:
a. In general, what happens to price when there is one more mile on the car?
b. Does mileage help you predict price? What does the p-value tell you?
c. Does mileage help you predict price? What does the R-Sq value tell you?
Answers

Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Mileage (b)         .                   Enter

a. Dependent Variable: Price
b. All requested variables entered.

Coefficients (a)

                 Unstandardized Coefficients   Standardized
Model            B            Std. Error       Beta      t        Sig.
1  (Constant)    24764.559    904.363                    27.383   .000
   Mileage       -.173        .042             -.143     -4.093   .000

a. Dependent Variable: Price

a. On average, the predicted price falls by about 17.3 cents (0.173 price units) with each added mile on the car.
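
The fitted line from the Coefficients table can be evaluated directly to see what the slope means in practice. This is a minimal sketch, assuming price is expressed in dollars (the source does not state the currency); the helper name `predicted_price` is hypothetical.

```python
# Coefficients copied from the simple regression output above.
INTERCEPT = 24764.559   # (Constant)
SLOPE = -0.173          # Mileage coefficient: price change per extra mile


def predicted_price(mileage: float) -> float:
    """Fitted retail price for a car with the given mileage."""
    return INTERCEPT + SLOPE * mileage


# One extra mile changes the fitted price by the slope itself.
per_mile = predicted_price(20001) - predicted_price(20000)
# Ten thousand extra miles scale that change linearly.
per_10k = predicted_price(30000) - predicted_price(20000)

print(f"change per mile: {per_mile:.3f}")
print(f"change per 10,000 miles: {per_10k:.0f}")
```

The per-mile change is small, but over the mileage range in the sample it adds up to a difference of well over a thousand price units.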


b. Yes. The p-value for Mileage (Sig. = .000, i.e. p < .001) is below 0.05, so the relationship
between mileage and price is statistically significant.
The coefficient is negative, so price decreases as mileage increases.
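
The t statistic behind that p-value is the coefficient divided by its standard error. The sketch below recomputes it from the rounded table values and approximates the two-sided p-value with a standard normal distribution, which is an assumption made here to stay within the standard library (SPSS uses the t distribution, which is nearly identical with about 800 residual degrees of freedom).

```python
import math

b = -0.173    # unstandardized Mileage coefficient
se = 0.042    # its standard error (rounded in the printed table)

# Close to the reported -4.093; the gap comes from rounding in b and se.
t_stat = b / se

# Two-sided p-value under a standard normal approximation (assumption).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.3f}, p = {p_value:.6f}")
```

A p-value this small is printed by SPSS as .000, which should be read as p < .001 rather than as exactly zero.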
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .143 (a)   .020       .019                9789.288

a. Predictors: (Constant), Mileage

c. Yes, but only slightly. R-Sq (R²) is the correlation coefficient squared (.143² = .020), referred to
as the coefficient of determination. This value indicates the proportion of the total variation of
Y (Price) explained by the regression model containing mileage.
Mileage explains only about 2% of the variation in price; the remaining 98% is due to other factors.
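
The R Square and Adjusted R Square values can be checked by hand. This is a small sketch, assuming a sample size of n = 804 (the N reported in the descriptive statistics later in this report) and one predictor.

```python
r = -0.143   # correlation between Mileage and Price (equals the standardized Beta here)
n = 804      # sample size, taken from the descriptive statistics in part 4
k = 1        # number of predictors in the simple regression

r_squared = r ** 2
# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(round(r_squared, 3), round(adj_r_squared, 3))
```

Both values agree with the Model Summary table after rounding to three decimals.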

2.

Taking price as the dependent variable, perform stepwise multiple regression on this
data set.
What is your final model? How many variables were dropped from the
model? Explain why.
Variables Entered/Removed (a)

Model   Variables Entered   Variables Removed   Method
1       Cylinder            .                   Stepwise
2       Cruise              .                   Stepwise
3       Leather             .                   Stepwise
4       Mileage             .                   Stepwise
5       Doors               .                   Stepwise
6       Sound               .                   Stepwise

Stepwise criteria at every step: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100.

a. Dependent Variable: Price

Coefficients (a)

                 Unstandardized Coefficients   Standardized
Model            B            Std. Error       Beta      t         Sig.
1  (Constant)    -17.057      1126.944                   -.015     .988
   Cylinder      4054.203     206.852          .569      19.600    .000
2  (Constant)    -1046.431    1082.655                   -.967     .334
   Cylinder      3392.587     211.273          .476      16.058    .000
   Cruise        6000.366     678.841          .262      8.839     .000
3  (Constant)    -2978.398    1129.554                   -2.637    .009
   Cylinder      3276.233     209.189          .460      15.662    .000
   Cruise        6362.343     671.901          .278      9.469     .000
   Leather       3139.484     608.259          .142      5.161     .000
4  (Constant)    412.562      1296.815                   .318      .750
   Cylinder      3232.656     206.188          .454      15.678    .000
   Cruise        6492.035     662.181          .284      9.804     .000
   Leather       3161.569     599.032          .143      5.278     .000
   Mileage       -.165        .032             -.137     -5.087    .000
5  (Constant)    5530.335     1709.446                   3.235     .001
   Cylinder      3257.643     203.798          .457      15.985    .000
   Cruise        6319.636     655.373          .276      9.643     .000
   Leather       2978.887     593.246          .135      5.021     .000
   Mileage       -.167        .032             -.139     -5.214    .000
   Doors         -1402.112    310.015          -.121     -4.523    .000
6  (Constant)    7323.164     1770.837                   4.135     .000
   Cylinder      3200.125     202.983          .449      15.765    .000
   Cruise        6205.511     651.463          .271      9.525     .000
   Leather       3327.143     597.114          .151      5.572     .000
   Mileage       -.171        .032             -.141     -5.352    .000
   Doors         -1463.399    308.274          -.126     -4.747    .000
   Sound         -2024.401    570.718          -.096     -3.547    .000

a. Dependent Variable: Price

Model Summary (g)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .569 (a)   .324       .323                8133.162
2       .620 (b)   .384       .382                7768.193
3       .635 (c)   .404       .402                7646.769
4       .650 (d)   .423       .420                7530.569
5       .661 (e)   .437       .433                7440.529
6       .668 (f)   .446       .442                7387.114                     .304

a. Predictors: (Constant), Cylinder
b. Predictors: (Constant), Cylinder, Cruise
c. Predictors: (Constant), Cylinder, Cruise, Leather
d. Predictors: (Constant), Cylinder, Cruise, Leather, Mileage
e. Predictors: (Constant), Cylinder, Cruise, Leather, Mileage, Doors
f. Predictors: (Constant), Cylinder, Cruise, Leather, Mileage, Doors, Sound
g. Dependent Variable: Price

In the Model Summary, Liter does not appear among the predictors of any of the six models, so it was never entered.

Excluded Variables (a)

                                                   Partial        Collinearity Statistics
Model     Variable   Beta In   t         Sig.     Correlation    Tolerance
1 (b)     Mileage    -.126     -4.401    .000     -.154          .999
          Liter      .158      1.563     .118     .055           .082
          Doors      -.140     -4.890    .000     -.170          1.000
          Cruise     .262      8.839     .000     .298           .874
          Sound      -.074     -2.543    .011     -.090          .992
          Leather    .115      3.981     .000     .139           .994
2 (c)     Mileage    -.136     -4.966    .000     -.173          .998
          Liter      .037      .383      .702     .014           .081
          Doors      -.128     -4.655    .000     -.162          .997
          Sound      -.058     -2.094    .037     -.074          .988
          Leather    .142      5.161     .000     .180           .983
3 (d)     Mileage    -.137     -5.087    .000     -.177          .998
          Liter      .004      .037      .970     .001           .080
          Doors      -.119     -4.377    .000     -.153          .993
          Sound      -.084     -3.048    .002     -.107          .960
4 (e)     Liter      .017      .180      .858     .006           .080
          Doors      -.121     -4.523    .000     -.158          .992
          Sound      -.088     -3.243    .001     -.114          .959
5 (f)     Liter      -.108     -1.108    .268     -.039          .074
          Sound      -.096     -3.547    .000     -.125          .956
6 (g)     Liter      -.088     -.908     .364     -.032          .074

a. Dependent Variable: Price
b. Predictors in the Model: (Constant), Cylinder
c. Predictors in the Model: (Constant), Cylinder, Cruise
d. Predictors in the Model: (Constant), Cylinder, Cruise, Leather
e. Predictors in the Model: (Constant), Cylinder, Cruise, Leather, Mileage
f. Predictors in the Model: (Constant), Cylinder, Cruise, Leather, Mileage, Doors
g. Predictors in the Model: (Constant), Cylinder, Cruise, Leather, Mileage, Doors, Sound

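The final model is Model 6:
Price = 7323.164 + 3200.125 Cylinder + 6205.511 Cruise + 3327.143 Leather - 0.171 Mileage - 1463.399 Doors - 2024.401 Sound
Only one variable, Liter, is dropped. In each of Models 1 to 6 its p-value in the Excluded Variables
table is greater than 0.05 (.118 to .970), so it never meets the entry criterion (probability of F to
enter <= .050). Its tolerance is also very low (.074 to .082), which indicates that Liter is highly
collinear with the predictors already in the model.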
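
The Model 6 coefficients can be turned into a small prediction function, and the tolerance reported for Liter can be converted into a variance inflation factor (VIF = 1 / tolerance). This is an illustrative sketch only: the function names and the example car are hypothetical, and the coefficients are copied from the stepwise Coefficients table.

```python
# Unstandardized coefficients of stepwise Model 6, copied from the table above.
MODEL_6 = {
    "const": 7323.164,
    "Cylinder": 3200.125,
    "Cruise": 6205.511,
    "Leather": 3327.143,
    "Mileage": -0.171,
    "Doors": -1463.399,
    "Sound": -2024.401,
}


def predict_price(car: dict) -> float:
    """Fitted price: intercept plus coefficient times value for each predictor."""
    return MODEL_6["const"] + sum(
        coef * car[name] for name, coef in MODEL_6.items() if name != "const"
    )


def vif(tolerance: float) -> float:
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance


# Hypothetical car, for illustration only.
example = {"Cylinder": 4, "Cruise": 1, "Leather": 0,
           "Mileage": 20000, "Doors": 4, "Sound": 1}

print(round(predict_price(example), 2))
print(round(vif(0.074), 1))   # Liter's tolerance in Models 5 and 6
```

A VIF well above 10 is commonly read as a sign of strong collinearity, which is consistent with Liter adding little once Cylinder is in the model.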

3.

Transform price to log price and take this new variable as your dependent variable.
Perform multiple regression by including variables in (2) as independent variables.
Discuss the results.
Variables Entered/Removed (a)

Model   Variables Entered                                      Variables Removed   Method
1       Leather, Mileage, Doors, Cylinder, Sound, Cruise (b)   .                   Enter

a. Dependent Variable: LgPrice
b. All requested variables entered.

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .695 (a)   .484       .480                .12847

Change Statistics
R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
.484              124.410    6     797   .000            .376

a. Predictors: (Constant), Leather, Mileage, Doors, Cylinder, Sound, Cruise
b. Dependent Variable: LgPrice

With log price as the dependent variable and the six predictors from (2) entered together, Liter is
again left out and Cylinder is included. The model explains 48.4% of the variation in log price
(R Square = .484).


Coefficients (a)

                 Unstandardized Coefficients   Standardized
Model            B            Std. Error       Beta      t          Sig.
1  (Constant)    3.996        .031                       129.744    .000
   Mileage       -3.206E-6    .000             -.148     -5.786     .000
   Cylinder      .057         .004             .440      16.018     .000
   Doors         -.016        .005             -.077     -3.007     .003
   Cruise        .139         .011             .338      12.298     .000
   Sound         -.038        .010             -.099     -3.816     .000
   Leather       .053         .010             .132      5.078      .000

                 95.0% Confidence Interval for B   Correlations                    Collinearity Statistics
Model            Lower Bound   Upper Bound         Zero-order   Partial   Part     Tolerance   VIF
1  (Constant)    3.935         4.056
   Mileage       .000          .000                -.148        -.201     -.147    .997        1.003
   Cylinder      .050          .063                .583         .493      .408     .857        1.167
   Doors         -.027         -.006               -.092        -.106     -.077    .989        1.011
   Cruise        .117          .162                .494         .399      .313     .859        1.165
   Sound         -.057         -.018               -.139        -.134     -.097    .956        1.046
   Leather       .032          .073                .130         .177      .129     .952        1.050

a. Dependent Variable: LgPrice

Excluded Variables (a)

                                                   Partial        Collinearity Statistics
Model     Variable   Beta In   t         Sig.     Correlation    Tolerance
1 (b)     Mileage    -.137     -4.876    .000     -.170          1.000
          Cylinder   .215      2.170     .030     .076           .082
          Doors      -.045     -1.578    .115     -.056          .994
          Cruise     .316      10.999    .000     .362           .857
          Sound      -.101     -3.561    .000     -.125          .996
          Leather    .079      2.777     .006     .098           .992
2 (c)     Mileage    -.147     -5.646    .000     -.196          .998
          Cylinder   .243      2.636     .009     .093           .082
          Doors      -.039     -1.480    .139     -.052          .993
          Sound      -.080     -3.017    .003     -.106          .990
          Leather    .113      4.271     .000     .149           .980
3 (d)     Cylinder   .223      2.463     .014     .087           .082
          Doors      -.042     -1.610    .108     -.057          .993
          Sound      -.084     -3.220    .001     -.113          .990
          Leather    .114      4.393     .000     .154           .980
4 (e)     Cylinder   .236      2.632     .009     .093           .082
          Doors      -.036     -1.375    .169     -.049          .990
          Sound      -.106     -4.060    .000     -.142          .963
5 (f)     Cylinder   .204      2.284     .023     .081           .081
          Doors      -.042     -1.642    .101     -.058          .986
6 (g)     Doors      -.062     -2.346    .019     -.083          .916

a. Dependent Variable: TrPrice
b. Predictors in the Model: (Constant), Liter
c. Predictors in the Model: (Constant), Liter, Cruise
d. Predictors in the Model: (Constant), Liter, Cruise, Mileage
e. Predictors in the Model: (Constant), Liter, Cruise, Mileage, Leather
f. Predictors in the Model: (Constant), Liter, Cruise, Mileage, Leather, Sound
g. Predictors in the Model: (Constant), Liter, Cruise, Mileage, Leather, Sound, Cylinder

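Doors is the only variable that does not enter the stepwise model for the transformed price.
In Models 1 to 5 its p-value is above 0.05 (.101 to .169), so it does not meet the entry criterion
at those steps. In Model 6 the table reports Sig. = .019 for Doors, which is below 0.05, so the
p-value alone does not explain its exclusion at that step in the output shown.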
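
Because the dependent variable is a logarithm, each coefficient of the LgPrice regression describes a proportional change in price rather than a fixed amount. The sketch below converts the unstandardized coefficients into percent changes. It assumes the transformation is log base 10, which is inferred from the LgPrice mean of 4.2904 reported in part 4 and is not stated in the source; the function name is hypothetical.

```python
# Unstandardized coefficients from the LgPrice regression (Enter method) in (3).
B_LOG10 = {
    "Cylinder": 0.057,
    "Doors": -0.016,
    "Cruise": 0.139,
    "Sound": -0.038,
    "Leather": 0.053,
}


def percent_change(b: float) -> float:
    """Percent change in price for a one-unit increase, assuming log base 10."""
    return (10 ** b - 1) * 100


for name, b in B_LOG10.items():
    print(f"{name}: {percent_change(b):+.1f}%")
```

Under that assumption, the coefficient for Cruise corresponds to a price roughly 38% higher for cars with cruise control, holding the other predictors constant.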

4.

Since Type (Sedan, Hatchback, Convertible or Coupe) and Make (A, B, C, D, E or F) are
also criteria considered by many car buyers, perform another regression by considering
these two variables. Discuss the results.

A dummy variable, or indicator variable, is an artificial variable created to represent an
attribute with two or more distinct categories/levels. Regression analysis treats all
independent (X) variables in the analysis as numerical. Numerical variables are interval or
ratio scale variables whose values are directly comparable. For multiple regression analysis,
all but one of the dummy variables are entered as independent variables for each of the original
categorical variables. With dummy variables, the regression coefficients indicate the
difference in the dependent variable between the category specified by the dummy variable
and the category omitted from the analysis.

After Type and Make are converted into dummy variables, the data are analysed.
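
The coding scheme described above can be sketched in a few lines. This is an illustrative helper, not the procedure used in SPSS: the function name `dummy_code` and the sample values are hypothetical, and Make F is taken as the omitted category because only dummies A to E are entered in the regression that follows.

```python
def dummy_code(values, reference):
    """Return one 0/1 column per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical sample of the Make column, for illustration only.
makes = ["A", "C", "F", "B", "C", "E", "D"]
dummies = dummy_code(makes, reference="F")

for name, column in dummies.items():
    print(name, column)
```

A car of Make F has zeros in all five columns, so each dummy coefficient is read as the difference from Make F.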

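Descriptive Statistics

Variable          Mean        Std. Deviation   N
LgPrice           4.2904      .17811           804
Mileage           19831.93    8196.320         804
Cylinder          5.27        1.388            804
Doors             3.53        .850             804
Cruise            .75         .432             804
Sound             .68         .467             804
Leather           .72         .447             804
[label missing]   .10         .300             804
[label missing]   .10         .300             804
[label missing]   .40         .490             804
[label missing]   .14         .349             804
[label missing]   .07         .263             804
Sedan             .42         .494             804
Convertible       .06         .242             804
Hatchback         .05         .218             804
Coupe             .17         .379             804

Rows marked [label missing] have no row label in the source; they sit where the Make dummies would appear.
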

Variables Entered/Removed (a)

Model   Variables Entered                                        Variables Removed   Method
1       Coupe, Mileage, Cruise, Leather, Convertible, Sound,     .                   Enter
        Hatchback, E, A, B, D, C, Cylinder, Sedan (b)

a. Dependent Variable: LgPrice
b. Tolerance = .000 limit reached.


Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .960 (a)   .922       .921                .05008

Change Statistics
R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
.922              669.112    14    789   .000            .274

a. Predictors: (Constant), Coupe, Mileage, Cruise, Leather, Convertible, Sound, Hatchback, E, A, B, D, C, Cylinder,
Sedan
b. Dependent Variable: LgPrice

Coefficients (a)

                      Unstandardized Coefficients   Standardized
Model                 B          Std. Error         Beta      t          Sig.
1  (Constant)         3.903      .013                         306.064    .000
   Cylinder           .072       .002               .560      36.209     .000
   Cruise             .010       .005               .024      1.963      .050
   Sound              .002       .004               .004      .425       .671
   Leather            .017       .004               .042      3.927      .000
   [label missing]    .067       .009               .113      7.316      .000
   [label missing]    .213       .010               .359      21.339     .000
   [label missing]    -.007      .006               -.019     -1.167     .243
   [label missing]    .310       .008               .608      38.839     .000
   [label missing]    .008       .009               .012      .896       .371
   Sedan              -.032      .006               -.089     -5.275     .000
   Convertible        .113       .009               .154      13.072     .000
   Hatchback          -.083      .010               -.102     -8.737     .000
   Coupe              .002       .006               .005      .364       .716

                      95.0% Confidence Interval for B   Correlations                    Collinearity Statistics
Model                 Lower Bound   Upper Bound         Zero-order   Partial   Part     Tolerance   VIF
1  (Constant)         3.877         3.928
   Cylinder           .068          .076                .583         .790      .359     .412        2.429
   Cruise             .000          .020                .494         .070      .019     .663        1.507
   Sound              -.006         .010                -.139        .015      .004     .884        1.131
   Leather            .008          .025                .130         .138      .039     .845        1.183
   [label missing]    .049          .085                .044         .252      .073     .414        2.416
   [label missing]    .194          .233                .580         .605      .212     .348        2.870
   [label missing]    -.018         .005                -.467        -.042     -.012    .377        2.654
   [label missing]    .294          .326                .402         .810      .385     .402        2.486
   [label missing]    -.010         .026                -.237        .032      .009     .561        1.781
   Sedan              -.044         -.020               .029         -.185     -.052    .349        2.862
   Convertible        .096          .130                .440         .422      .130     .712        1.404
   Hatchback          -.102         -.065               -.263        -.297     -.087    .727        1.375
   Coupe              -.010         .014                -.178        .013      .004     .613        1.630

a. Dependent Variable: LgPrice

The Mileage row is missing from the source. Rows marked [label missing] have no row label in the source; they sit where the five Make dummies would appear.


Coefficient Correlations (a)

[Correlation and covariance matrices of the coefficient estimates for Coupe, Mileage, Cruise, Leather,
Convertible, Sound, Hatchback, E, A, B, D, C, Cylinder and Sedan. The cell values cannot be recovered
from the source.]

a. Dependent Variable: LgPrice

After including Type and Make as dummy variables, the model contains 14 predictors: Mileage,
Cylinder, Cruise, Sound, Leather, five Make dummies (A to E) and four Type dummies (Sedan,
Convertible, Hatchback and Coupe). Doors is not entered because the tolerance limit was reached,
which suggests that it adds almost no information beyond the other predictors.
The model is statistically significant (F Change = 669.112, df = 14 and 789, Sig. = .000) and
accounts for 92.2% of the variance of log price (R Square = .922), compared with 48.4% in (3).
Cylinder (Beta = .560) and two of the Make dummies (Beta = .608 and .359) have the largest
standardized coefficients, while Sound (Sig. = .671), Coupe (Sig. = .716) and two of the Make
dummies (Sig. = .243 and .371) are not significant at the 0.05 level.
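
The overall F statistics reported in the Model Summary tables can be approximated from R Square and the degrees of freedom. This is a rough check using the rounded R Square values, so small differences from the SPSS output are expected; the function name is hypothetical.

```python
def f_statistic(r_squared: float, k: int, df_resid: int) -> float:
    """Overall F for a regression with k predictors and df_resid residual df."""
    return (r_squared / k) / ((1 - r_squared) / df_resid)


# Part 3: six predictors, LgPrice as the dependent variable.
f_q3 = f_statistic(0.484, k=6, df_resid=797)
# Part 4: fourteen predictors after adding the Type and Make dummies.
f_q4 = f_statistic(0.922, k=14, df_resid=789)

print(round(f_q3, 1), round(f_q4, 1))
```

Both results land close to the reported F Change values of 124.410 and 669.112; the gaps come from R Square being rounded to three decimals.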

GRADUATE SCHOOL OF BUSINESS (UKM GSB)

ZCZA6043
MULTIVARIATE ANALYSIS
ASSIGNMENT 3
MULTIPLE REGRESSION

PREPARED FOR :
PROF. MADYA DR. RASIDAH MOHAMAD SAID

PREPARED BY:
AL AZMI BIN ABDUL RAHMAN
(ZP 02311)
