
Executive Report

Jona Sassenhagen

Summary
For the available sample, cars with manual transmission had somewhat better MPG (by about 3 miles/gallon),
and we are somewhat confident that this finding is generalizable.

Caveat
This estimate includes correction for a number of confounds, such as the observation that heavier and faster
cars have worse MPG, but automatic cars tend to be heavier and slightly slower. However, a number of other
possible confounds, especially interactions, could not be adequately controlled for due to the small sample
size. Also, while the estimated beneficial effect of manual transmission on MPG was statistically significant,
great uncertainty remains about the absolute extent of the improvement; based on the available data, we
cannot exclude that the true improvement is trivially small. We also cannot exclude interaction effects, such
that the possible benefits of manual transmissions might be different, or even reversed, for different car
types, e.g. heavier cars.

Analysis steps
Preliminary analysis
Initially, a simple Welch's two-sample t-test for the difference in means was conducted, comparing cars with
automatic and manual transmissions.
This test preliminarily indicated that, comparing cars' MPG by transmission mode, manual transmission
cars are more fuel efficient (t(18)=-3.8, p=0.001; the negative sign reflects automatic minus manual).
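As a minimal sketch of how this preliminary test could be reproduced, the following R code runs a Welch test on the built-in mtcars data. The variable names (cars, welch, group_means) and the relabelling of am as automatic/manual are assumptions made for illustration; the report's original code is not shown.

```r
# Welch two-sample t-test of MPG by transmission mode on the built-in mtcars data.
# In mtcars, am is coded 0 = automatic, 1 = manual.
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# t.test() uses var.equal = FALSE by default, which is the Welch variant.
welch <- t.test(mpg ~ am, data = cars)

# Group means, to read off the direction of the difference.
group_means <- tapply(cars$mpg, cars$am, mean)

print(welch)
print(group_means)
```

With this factor ordering the statistic is computed as automatic minus manual, so a negative t value corresponds to a lower mean MPG in the automatic group.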

Exploration of possible confounds


However, plotting all pairwise correlations (see appendix) indicates that many variables are correlated,
including very strong correlations between MPG and weight, horsepower, displacement, rear axle ratio, vs,
and cylinders, and correlations between weight, gears and rear axle ratio and transmission type.
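The correlations described above can be inspected numerically as well as graphically. The sketch below assumes plain Pearson correlations on the untransformed mtcars columns; the names corr, mpg_corr and mpg_sorted are illustrative.

```r
# Pairwise Pearson correlations for all mtcars columns.
corr <- cor(mtcars)

# Correlations of every other variable with mpg, sorted by magnitude.
mpg_corr <- corr["mpg", colnames(corr) != "mpg"]
mpg_sorted <- mpg_corr[order(abs(mpg_corr), decreasing = TRUE)]
print(round(mpg_sorted, 2))

# Correlations of transmission type (am) with weight, gears and rear axle ratio.
print(round(corr["am", c("wt", "gear", "drat")], 2))
```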

Multiple Regression
We thus employ multiple regression to disentangle the various confounds. However, the dataset is small
relative to the number of predictors, especially considering the strong correlations between various
characteristics of the cars. Clearly, the data are insufficient to investigate the full range of interaction effects
even up to the second order, as the number of terms including interactions is much larger than the number of
observations.
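To make the size argument concrete, the terms can be counted directly. This sketch assumes that "second order" means all pairwise interactions between the ten predictors, without squared terms; the variable names are illustrative.

```r
# Count the terms in a model with all main effects and all two-way interactions.
n_obs <- nrow(mtcars)               # number of observations (cars)
n_pred <- ncol(mtcars) - 1          # predictors, excluding the response mpg
n_two_way <- choose(n_pred, 2)      # pairwise interaction terms
n_terms <- 1 + n_pred + n_two_way   # intercept + main effects + interactions

cat("observations:", n_obs, "\n")
cat("terms up to second order:", n_terms, "\n")
```

Under this assumption the model would need more coefficients than there are cars, so it could not be estimated uniquely by least squares.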
We choose to treat all predictors as linear (scalar) even though some (e.g. the number of cylinders) may
inherently better correspond to categorical or otherwise nonlinear effects. This is done for reasons of parsimony,
but we feel confident in this procedure because visual inspection of the dependence of MPG on these variables
(not shown for reasons of brevity) indicates no strong divergence from linear dependence.

Because the range of parameters that could influence fuel efficiency is too large to be fully estimated, we
conducted model selection in the form of stepwise regression on the simple, no-interaction linear model.
Backward stepwise regression (which iteratively removes the weakest predictor from the model until
removing a further predictor would significantly worsen the fit relative to the model that retains it) indicates
that, at the very least, weight, speed (quarter-mile time, qsec) and transmission mode jointly predict MPG to a
significant degree.
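One way such a backward selection could be run in R is sketched below. It uses step(), which eliminates terms by AIC rather than by a significance test, so it is an assumption about the procedure and not necessarily the exact method behind this report; the names cars, full and reduced are illustrative.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Full no-interaction model with every other column as a linear predictor.
full <- lm(mpg ~ ., data = cars)

# Backward elimination; trace = 0 suppresses the step-by-step log.
reduced <- step(full, direction = "backward", trace = 0)

print(formula(reduced))
```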
While controlling for the effects of weight and speed, cars with manual transmission had better MPG, and
this effect was statistically significant. However, a wide range of values is compatible with the data;
the 95% confidence interval indicates that improvements as small as 0.05 miles/gallon cannot be excluded
with confidence.
We show the full result of the best model resulting from stepwise regression. The relationship between weight,
transmission mode and speed is also visualised in the appendix.

Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4811 -1.5555 -0.7257  1.4110  4.6610

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.6178     6.9596   1.382 0.177915
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
ammanual      2.9358     1.4109   2.081 0.046716 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
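
The 95% confidence interval for the transmission effect can be recomputed from the printed coefficient table alone. The sketch below copies the estimate, standard error and residual degrees of freedom for ammanual from the output above; the variable names are illustrative.

```r
# Values copied from the coefficient table row for ammanual.
estimate <- 2.9358
std_error <- 1.4109
df_resid <- 28

# Two-sided 95% critical value of the t distribution.
t_crit <- qt(0.975, df_resid)

# Confidence interval: estimate plus/minus critical value times standard error.
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 2))
```

The lower bound comes out at roughly 0.05 miles/gallon, which is why a trivially small improvement cannot be excluded even though the effect is significant at the 5% level.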

In addition, because this analysis stage indicated significant influences of weight and speed on
MPG, we also consider possible interactions of transmission mode with weight and speed (we do not
attempt to investigate the full range of interactions between all factors due to insufficient data). For this, we
investigate a model including all main effects as well as all interactions with transmission mode (not shown
for reasons of brevity).
Stepwise regression on this model indicates that the interaction between
weight and transmission is indeed statistically significant (95% confidence interval: -6.57, -1.71). This means
that the effect of transmission mode depends on the weight of the car: for heavier cars, the
benefit of manual transmission was reduced. However, as noted, this observation results from a reductionist
process of model simplification, so richer data sets might uncover a different latent structure.
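
A weight-by-transmission interaction of this kind can be examined with a model such as the one sketched below. The formula is an assumption (main effects of weight, qsec and transmission plus the weight-by-transmission term), since the final interaction model is not shown in the report, so the bounds it prints may differ slightly from the interval quoted above.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Assumed model: wt, am and qsec main effects plus the wt:am interaction.
m_int <- lm(mpg ~ wt * am + qsec, data = cars)

# 95% confidence interval for the interaction coefficient only.
ci_int <- confint(m_int, "wt:ammanual", level = 0.95)

print(coef(m_int)["wt:ammanual"])
print(ci_int)
```

A negative interaction coefficient means that the estimated advantage of manual transmission shrinks as weight increases.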

Conclusion
Results indicate fuel efficiency is improved for cars with manual transmission. However, a complex interaction
structure (such as different effects for cars of different weight classes) as well as trivially small improvements
cannot be excluded given the limited number of observations at hand.

Appendix
Dependence of MPG on speed, weight and transmission mode
[Figure: mpg (vertical axis, 10 to 35) against transmission mode (automatic, manual), with legends for wt (2 to 5) and qsec (15.0 to 22.5).]


Cross-correlation matrix plot showing strong collinearities
Values are reproduced as printed on the plot, without signs.

        carb    wt    hp   cyl  disp  qsec    vs   mpg  drat    am
wt      0.43
hp      0.75  0.66
cyl     0.53  0.78  0.83
disp    0.39  0.89  0.79  0.90
qsec    0.66  0.17  0.71  0.59  0.43
vs      0.57  0.55  0.72  0.81  0.71  0.74
mpg     0.55  0.87  0.78  0.85  0.85  0.42  0.66
drat    0.09  0.71  0.45  0.70  0.71  0.09  0.44  0.68
am      0.06  0.69  0.24  0.52  0.59  0.23  0.17  0.60  0.71
gear    0.27  0.58  0.13  0.49  0.56  0.21  0.21  0.48  0.70  0.79

Model 1 residual density plot showing roughly normally distributed residuals
[Figure: kernel density of resid(m2); N = 32, bandwidth = 0.9962.]

Model 2 residual density plot showing roughly normally distributed residuals
[Figure: kernel density of resid(m4); N = 32, bandwidth = 0.8479.]