
CARS    HH SIZE
1       1
2       2
2       3
2       4
3       5

[Charts: CARS regressed on HHSIZE; scatter of No. of cars (y-axis) against No. of household members (x-axis), with five fitted trendlines]

Linear:        f(x) = 0.4x + 0.8                                   R² = 0.8
Logarithmic:   f(x) = 0.9962543387 ln(x) + 1.0460881159            R² = 0.8017047516
Exponential:   f(x) = 0.9767186839 exp(0.2197224577 x)             R² = 0.7683868434
Power:         f(x) = 1.0893387646 x^0.574455186                   R² = 0.8484911839
Polynomial:    f(x) = -8.90158753083262E-17 x² + 0.4x + 0.8        R² = 0.8

Logarithmic: y = c + b*ln(x) + u
This is a linear-log relationship.


Exponential: y = c*exp(bx)*u
This is a log-linear relationship since taking logs gives ln(y) = ln(c) + b*x + ln(u)


Power: y = a*(x^b)*u
This is a log-log relationship since taking logs gives ln(y) = ln(a) + b*ln(x) + ln(u)


Polynomial: y = a + b*x + c*x^2 + d*x^3 + .... + u

This is, for example, a quadratic relationship if the polynomial is of order 2.
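Each of these alternative trendlines can be reproduced outside Excel by running a linear regression on suitably transformed data. The following sketch (Python with NumPy, an illustration only; the variable names are not part of the original spreadsheet) fits the linear, logarithmic, exponential and power forms to the CARS/HHSIZE data and should recover coefficients close to those shown in the charts above.

# Illustrative sketch: reproduce Excel's trendline fits via transformed OLS.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # HH SIZE
y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])   # CARS

# Linear: y = a + b*x
b_lin, a_lin = np.polyfit(x, y, 1)                # approx 0.4, 0.8

# Logarithmic: y = a + b*ln(x)   (regress y on ln x)
b_log, a_log = np.polyfit(np.log(x), y, 1)        # approx 0.9963, 1.0461

# Exponential: y = c*exp(b*x)    (regress ln y on x, then exponentiate the intercept)
b_exp, lnc = np.polyfit(x, np.log(y), 1)          # b approx 0.2197
c_exp = np.exp(lnc)                               # approx 0.9767

# Power: y = a*x^b               (regress ln y on ln x)
b_pow, lna = np.polyfit(np.log(x), np.log(y), 1)  # b approx 0.5745
a_pow = np.exp(lna)                               # approx 1.0893

print(a_lin, b_lin, a_log, b_log, c_exp, b_exp, a_pow, b_pow)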

CORRELATION COEFFICIENT

CARS    HH SIZE
1       1
2       2
2       3
2       4
3       5

The correlation coefficient between two series, say x and y, equals

Covariance(x,y) / [Sqrt(Variance(x)) * Sqrt(Variance(y))]

where
Covariance(x,y) is the sample covariance between x and y: (1/(n-1)) × Σ i (xi - xbar)(yi - ybar)
Variance(x) is the sample variance of x: (1/(n-1)) × Σ i (xi - xbar)²
Variance(y) is the sample variance of y: (1/(n-1)) × Σ i (yi - ybar)²

CALCULATION USING THE DATA ANALYSIS ADD-IN

            CARS        HH SIZE
CARS        1
HH SIZE     0.894427    1

The correlation coefficient is 0.894427.

This can be extended to several series.
For example if there are data in columns A, B, C, D and E then the array chosen is A1:E6 and produces a 5 x 5 table of correlations.

CALCULATION USING THE CORREL FUNCTION

0.894427    On the Formulas tab select the Function Library group, then More Functions and Statistical.
            Select CORREL and fill out the dialog box with the two data ranges.

0.894427    Alternatively directly type =CORREL(A1:A6,B1:B6), which yields 0.894427.

            Note that Excel dropped the first row (the labels).

0.894427 = CORREL(A2:A6,B2:B6) yields the same result
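
As a cross-check, the same number can be obtained directly from the covariance/variance formula above. A minimal sketch in Python/NumPy (illustrative only, not part of the spreadsheet):

# Correlation = Covariance(x,y) / [sqrt(Var(x)) * sqrt(Var(y))], all with the 1/(n-1) convention.
import numpy as np

cars = np.array([1.0, 2.0, 2.0, 2.0, 3.0])
hhsize = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

cov_xy = np.cov(cars, hhsize, ddof=1)[0, 1]        # sample covariance = 1.0
r = cov_xy / (np.std(cars, ddof=1) * np.std(hhsize, ddof=1))
print(r)                                           # 0.894427..., matching CORREL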

COVARIANCE

This is obtained in a similar way to correlation.
We can use the Data Analysis Add-in and Covariance.

            CARS    HH SIZE
CARS        0.4
HH SIZE     0.8     2

The covariance between CARS and HH SIZE is 0.8.
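
Note that 0.8 here corresponds to dividing the cross-product sum by n (as the Data Analysis Covariance tool does), whereas the 1/(n-1) convention used in the correlation formula above gives 1.0 for these data. A short illustrative check (Python/NumPy, not part of the spreadsheet):

# Two covariance conventions for the CARS / HH SIZE data.
import numpy as np

cars = np.array([1.0, 2.0, 2.0, 2.0, 3.0])
hhsize = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(cars)

cross = np.sum((cars - cars.mean()) * (hhsize - hhsize.mean()))  # = 4.0
print(cross / n)        # 0.8  (divide by n; matches the Data Analysis table above)
print(cross / (n - 1))  # 1.0  (divide by n-1; the sample covariance in the correlation formula)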
TWO-VARIABLE LINEAR REGRESSION

CARS    HH SIZE
1       1
2       2
2       3
2       4
3       5

The population regression model is: y = β1 + β2 x + u
We wish to estimate the regression line: y = b1 + b2 x

We obtain
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.894427
R Square            0.8
Adjusted R Square   0.733333
Standard Error      0.365148
Observations        5

ANOVA
df SS MS F Significance F
Regression 1 1.6 1.6 12 0.040519
Residual 3 0.4 0.133333
Total 4 2

            Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   0.8           0.382971        2.088932  0.127907  -0.418784   2.018784   -0.418784    2.018784
HH SIZE     0.4           0.11547         3.464102  0.040519  0.032523    0.767477   0.032523     0.767477

INTERPRETING THE REGRESSION SUMMARY OUTPUT

The key output is given in the Coefficients column in the last set of output:

b1 = 0.8 (the Intercept coefficient)


b2 = 0.4 (the Coefficient of HH SIZE : the slope coefficient)

Thus the fitted line is: y = 0.8 + 0.4 x


or CARS = 0.8 + 0.4 HHSIZE

The regression statistics output gives measures of how well the model fits the data. In particular

R2 = 0.8 which measures the fit of the model


This means that 80% of the variation of yi around ybar is explained by the regressor xi

Standard error = 0.365 which measures the standard deviation of yi around its fitted value.

The remaining output (ANOVA table and t Stat, p-value, .... ) is used for statistical inference.
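
The numbers in the summary output can be reproduced from the usual OLS formulas. A minimal sketch (Python/NumPy, illustrative only; this is not how Excel itself computes them):

# Two-variable OLS by hand: slope, intercept, R-squared and standard error of the regression.
import numpy as np

y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])   # CARS
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # HH SIZE
n = len(y)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # 0.4
b1 = y.mean() - b2 * x.mean()                                               # 0.8

yhat = b1 + b2 * x
rss = np.sum((y - yhat) ** 2)          # residual SS = 0.4
tss = np.sum((y - y.mean()) ** 2)      # total SS    = 2.0
r2 = 1 - rss / tss                     # 0.8
s = np.sqrt(rss / (n - 2))             # standard error of the regression = 0.365148

print(b1, b2, r2, s)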
Statistical Inference for Two-variable Regression

CARS    HH SIZE
1       1
2       2
2       3
2       4
3       5

REGRESSION USING THE DATA ANALYSIS ADD-IN


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.894427
R Square            0.8
Adjusted R Square   0.733333
Standard Error      0.365148
Observations        5

ANOVA
df SS MS F Significance F
Regression 1 1.6 1.6 12 0.040519
Residual 3 0.4 0.133333
Total 4 2

            Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   0.8           0.382971        2.088932  0.127907  -0.418784   2.018784   -0.418784    2.018784
HH SIZE     0.4           0.11547         3.464102  0.040519  0.032523    0.767477   0.032523     0.767477

The regression output has three components:

Regression statistics table


ANOVA table
Regression coefficients table.

INTERPRET REGRESSION STATISTICS TABLE

Regression Statistics              Explanation

Multiple R          0.894427       R = square root of R²
R Square            0.8            R² = coefficient of determination
Adjusted R Square   0.733333       Adjusted R² used if more than one x variable
Standard Error      0.365148       This is the sample estimate of the standard deviation of the error u
Observations        5              Number of observations used in the regression (n)

The standard error here refers to the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)).
where SSE = Residual (or error) sum of squares. Here sqrt(0.4/(5-2)) = 0.365148.

INTERPRET ANOVA TABLE


ANOVA
df SS MS F Significance F
Regression 1 1.6 1.6 12 0.040519
Residual 3 0.4 0.133333
Total 4 2

The ANOVA (analysis of variance) table splits the sum of squares into its components.

Total sums of squares


= Residual (or error) sum of squares + Regression (or explained) sum of squares.

Thus Σ i (yi - ybar)² = Σ i (yi - yhati)² + Σ i (yhati - ybar)²

where yhati is the value of yi predicted from the regression line


and ybar is the sample mean of y.

For example:
R2 = 1 - Residual SS / Total SS (general formula for R2)
= 1 - 0.4/2.0 (from data in the ANOVA table)
= 0.8 (which equals R² given in the Regression Statistics table).

The remainder of the ANOVA table is described in more detail in Excel: Multiple Regression.
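
The decomposition can also be verified numerically; a short illustrative check in Python/NumPy, using the fitted values from the regression above:

# Check: Total SS = Residual SS + Regression SS for the CARS/HHSIZE regression.
import numpy as np

y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])        # CARS
yhat = np.array([1.2, 1.6, 2.0, 2.4, 2.8])     # predicted CARS from y = 0.8 + 0.4x

tss = np.sum((y - y.mean()) ** 2)       # 2.0
rss = np.sum((y - yhat) ** 2)           # 0.4
ess = np.sum((yhat - y.mean()) ** 2)    # 1.6
print(tss, rss + ess, 1 - rss / tss)    # 2.0  2.0  0.8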

INTERPRET REGRESSION COEFFICIENTS TABLE

            Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   0.8           0.382971        2.088932  0.127907  -0.418784   2.018784   -0.418784    2.018784
HH SIZE     0.4           0.11547         3.464102  0.040519  0.032523    0.767477   0.032523     0.767477
CONFIDENCE INTERVALS FOR SLOPE COEFFICIENT

From the output, the 95% confidence interval for the slope coefficient β2 is (0.032523, 0.767477).

TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")

For HH SIZE the t Stat is 3.464102 with p-value 0.040519 < 0.05, so the slope coefficient is statistically significant at level 0.05.

TEST HYPOTHESIS OF SLOPE COEFFICIENT EQUAL TO VALUE OTHER THAN ZERO

For example, to test H0: β2 = 1:

t = (0.4 - 1.0) / 0.11547 = -5.196155
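
These intervals and tests can be reproduced from the slope estimate, its standard error and the t distribution with n - 2 = 3 degrees of freedom. A sketch using Python with SciPy (illustrative only; the Excel output above is the source of the numbers):

# 95% CI and t-tests for the slope in the two-variable regression.
from scipy import stats

b2, se_b2, df = 0.4, 0.11547, 3          # slope, its standard error, n - 2

tcrit = stats.t.ppf(0.975, df)           # 3.1824, like TINV(0.05, 3)
print(b2 - tcrit * se_b2, b2 + tcrit * se_b2)   # approx (0.0325, 0.7675)

t0 = b2 / se_b2                          # test H0: beta2 = 0  ->  3.4641
print(t0, 2 * stats.t.sf(abs(t0), df))   # two-tailed p-value approx 0.0405

t1 = (b2 - 1.0) / se_b2                  # test H0: beta2 = 1  ->  -5.196
print(t1, 2 * stats.t.sf(abs(t1), df))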

FITTED VALUES AND RESIDUALS FROM REGRESSION LINE

RESIDUAL OUTPUT

Observation   Predicted CARS   Residuals
1             1.2              -0.2
2             1.6               0.4
3             2                 0
4             2.4              -0.4
5             2.8               0.2

RESIDUAL OUTPUT

Observation   Predicted CARS   Residuals   Standard Residuals
1             1.2              -0.2        -0.632456
2             1.6               0.4         1.264911
3             2                 0           0
4             2.4              -0.4        -1.264911
5             2.8               0.2         0.632456

PROBABILITY OUTPUT

Percentile   CARS
10           1
30           2
50           2
70           2
90           3

[Charts: HH SIZE Residual Plot (Residuals against HH SIZE), HH SIZE Line Fit Plot (CARS and Predicted CARS against HH SIZE), and Normal Probability Plot (CARS against Sample Percentile)]
CARS HH SIZE
1 1
2 2
2 3
2 4
3 5

REGRESSION USING EXCEL FUNCTIONS INTERCEPT, SLOPE, RSQ, STEYX and FORECAST

The population regression model is: y = β1 + β2 x + u

We wish to estimate the regression line: y = b1 + b2 x

The individual functions INTERCEPT, SLOPE, RSQ, STEYX and FORECAST can be used to get key results for
two-variable regression

0.8 INTERCEPT(A1:A6,B1:B6) yields the OLS intercept estimate of 0.8


0.4 SLOPE(A1:A6,B1:B6) yields the OLS slope estimate of 0.4
0.8 RSQ(A1:A6,B1:B6) yields the R-squared of 0.8
0.365148 STEYX(A1:A6,B1:B6) yields the standard error of the regression of 0.365148
3.2 FORECAST(6,A1:A6,B1:B6) yields the OLS forecast value of Yhat=3.2 for X=6 (forecast 3.2 cars for
household of size 6).
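
For comparison, the same five quantities can be computed outside Excel. The small helpers below (Python/NumPy; the function names simply mirror the Excel ones and are not an existing library API) sketch what INTERCEPT, SLOPE, RSQ, STEYX and FORECAST return for these data:

# Rough Python analogues of the Excel functions used above (illustrative only).
import numpy as np

cars = np.array([1.0, 2.0, 2.0, 2.0, 3.0])     # known y's (A2:A6)
hhsize = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # known x's (B2:B6)

def slope(y, x):      # like SLOPE(y, x)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

def intercept(y, x):  # like INTERCEPT(y, x)
    return y.mean() - slope(y, x) * x.mean()

def rsq(y, x):        # like RSQ(y, x)
    yhat = intercept(y, x) + slope(y, x) * x
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def steyx(y, x):      # like STEYX(y, x): standard error of the regression
    yhat = intercept(y, x) + slope(y, x) * x
    return np.sqrt(np.sum((y - yhat) ** 2) / (len(y) - 2))

def forecast(x0, y, x):  # like FORECAST(x0, y, x)
    return intercept(y, x) + slope(y, x) * x0

print(intercept(cars, hhsize), slope(cars, hhsize), rsq(cars, hhsize),
      steyx(cars, hhsize), forecast(6.0, cars, hhsize))
# 0.8  0.4  0.8  0.365148...  3.2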

REGRESSION USING THE EXCEL FUNCTION LINEST

First in cell D2 enter the function LINEST(A2:A6,B2:B6,1,1).


CARS HH SIZE
1 1 0.4
2 2
2 3
2 4
3 5

Then Highlight the desired array D2:E6


Hit the F2 key (then Edit appears at the bottom left of the spreadsheet).
CARS HH SIZE
1 1 0.4
2 2
2 3
2 4
3 5
Finally Hit CTRL-SHIFT-ENTER.
This yields
CARS HH SIZE
1 1 0.4 0.8
2 2 0.11547 0.382971
2 3 0.8 0.365148
2 4 12 3
3 5 1.6 0.4

where the results in D2:E6 represent

Slope coef           Intercept coef
St.error of slope    St.error of intercept
R-squared            St.error of regression
F-test overall       Degrees of freedom (n-k)
Regression SS        Residual SS

In particular, the fitted regression is

CARS = 0.8 + 0.4 HH SIZE with R2 = 0.8


The estimated coefficients have standard errors of, respectively, 0.11547 (slope) and
0.382971 (intercept).

To get just the coefficients give the LINEST command with the last entry 0 rather than 1, i.e.
LINEST(A2:A6,B2:B6,1,0),
and then highlight cells A8:B8, say, hit the F2 key, and hit CTRL-SHIFT-ENTER.

CARS HH SIZE
1 1
2 2
2 3
2 4
3 5

0.4 0.8

PREDICTION USING EXCEL FUNCTION TREND

CARS HH SIZE New Values


1 1 6
2 2 7
2 3
2 4
3 5

The fitted model is y = 0.8 + 0.4*x, so TREND yields predictions of 3.2 for HH SIZE = 6 and 3.6 for HH SIZE = 7.
LOGEST: The LOGEST function is the same as the LINEST function, except that an exponential relationship is estimated rather than a linear relationship.

CARS    HH SIZE
1       1       1.245731    0.976719
2       2       0.069647    0.230995
2       3       0.768387    0.220245
2       4       9.952632    3
3       5       0.48278     0.145523

1.245731    0.976719

The first row gives the fitted exponential relationship CARS = 0.976719 × 1.245731^HHSIZE, equivalently CARS = 0.976719*exp(0.219722*HHSIZE) since ln(1.245731) = 0.219722.
Exponential: y = c*exp(bx)*u
This is a log-linear relationship since taking logs gives ln(y) = ln(c) + b*x + ln(u)
[Chart: CARS regressed on HHSIZE, exponential trendline f(x) = 0.9767186839 exp(0.2197224577 x), R² = 0.7683868434; x-axis: No. of household members, y-axis: No. of cars]
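
The LOGEST numbers can be reproduced by a log-linear regression: regress ln(CARS) on HH SIZE and exponentiate. A minimal sketch (Python/NumPy, illustrative only):

# Reproduce the first row of LOGEST: fit ln(y) = ln(c) + b*x by OLS, then report exp(b) and c.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # HH SIZE
y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])   # CARS

b, lnc = np.polyfit(x, np.log(y), 1)
print(np.exp(b), np.exp(lnc))   # approx 1.245731, 0.976719 (LOGEST's m and b)

lny_hat = lnc + b * x
r2 = 1 - np.sum((np.log(y) - lny_hat) ** 2) / np.sum((np.log(y) - np.log(y).mean()) ** 2)
print(r2)                        # approx 0.768387 (R-squared of the regression in logs)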
EXCEL 2007: Multiple Regression

CARS    HH SIZE    CUBED HH SIZE
1       1          1
2       2          8
2       3          27
2       4          64
3       5          125

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.895828
R Square            0.802508
Adjusted R Square   0.605016
Standard Error      0.444401
Observations        5

ANOVA
df SS MS F Significance F
Regression 2 1.605016 0.802508 4.063492 0.197492
Residual 2 0.394984 0.197492
Total 4 2

               Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept      0.896552      0.764398        1.172886  0.361624  -2.392388   4.185491   -2.392388    4.185491
HH SIZE        0.336468      0.422704        0.79599   0.509507  -1.482279   2.155216   -1.482279    2.155216
CUBED HH SIZE  0.00209       0.013114        0.159364  0.888021  -0.054334   0.058514   -0.054334    0.058514

The regression output has three components:

Regression statistics table


ANOVA table
Regression coefficients table.

INTERPRET REGRESSION STATISTICS TABLE

This is the following output. Of greatest interest is R Square.

Regression Statistics              Explanation

Multiple R          0.895828       R = square root of R²
R Square            0.802508       R² = coefficient of determination
Adjusted R Square   0.605016       Adjusted R² used if more than one x variable
Standard Error      0.444401       This is the sample estimate of the standard deviation of the error u
Observations        5              Number of observations used in the regression (n)
The above gives the overall goodness-of-fit measures:
R2 = 0.8025
Correlation between y and y-hat is 0.8958 (when squared gives 0.8025).
Adjusted R2 = R2 - (1-R2 )*(k-1)/(n-k) = .8025 - .1975*2/2 = 0.6050.

The standard error here refers to the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)).
where SSE = Residual (or error) sum of squares. Here sqrt(0.394984/(5-3)) = 0.444401.

INTERPRET ANOVA TABLE

For example:
R2 = 1 - Residual SS / Total SS (general formula for R2)
= 1 - 0.3950/2.0 (from data in the ANOVA table)
= 0.8025 (which equals R² given in the Regression Statistics table).

The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least
one of β2 and β3 does not equal zero.

Aside: Excel computes F as:

F = [Regression SS/(k-1)] / [Residual SS/(n-k)] = [1.6050/2] / [0.39498/2] = 4.0635.

The column labeled significance F has the associated P-value.


Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05.

Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors
including the intercept; here k = 3.

Here FDIST(4.0635,2,2) = 0.1975.
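
The same F statistic and its p-value can be computed from the ANOVA sums of squares; a short illustrative check using Python/SciPy:

# Overall F test for the multiple regression: F = (Regression SS / (k-1)) / (Residual SS / (n-k)).
from scipy import stats

reg_ss, resid_ss = 1.605016, 0.394984
n, k = 5, 3                        # observations, regressors including the intercept

F = (reg_ss / (k - 1)) / (resid_ss / (n - k))
p = stats.f.sf(F, k - 1, n - k)    # right-tail probability, like FDIST(F, k-1, n-k)
print(F, p)                        # approx 4.0635, 0.1975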

INTERPRET REGRESSION COEFFICIENTS TABLE

t Stat = (coeff. / std error)

Intercept:       0.896552 / 0.764398 = 1.1729
HH SIZE:         0.336468 / 0.422704 = 0.7960
CUBED HH SIZE:   0.002090 / 0.013114 = 0.1594

A simple summary of the above output is that the fitted line is

y = 0.8966 + 0.3365*x + 0.0021*z    where x = HH SIZE and z = CUBED HH SIZE
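
The fitted line can be verified by running the multiple regression directly. A minimal sketch using Python/NumPy least squares (illustrative only; Excel's Regression tool produced the output above):

# Multiple regression of CARS on HH SIZE and CUBED HH SIZE via least squares.
import numpy as np

y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])          # CARS
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # HH SIZE
X = np.column_stack([np.ones_like(x), x, x**3])  # intercept, x, x cubed

coef, resid, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coef)                                      # approx [0.8966, 0.3365, 0.0021]

yhat = X @ coef
r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2)                                        # approx 0.8025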

CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS

From the Excel output, the 95% confidence interval for the slope coefficient β2 is (-1.4823, 2.1552).

Excel computes this as

b2 ± t_.025(2) × se(b2)
= 0.33647 ± TINV(0.05, 2) × 0.42270
= 0.33647 ± 4.303 × 0.42270
= 0.33647 ± 1.8189
= (-1.4823, 2.1552).
TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")

There are 5 observations and 3 regressors (intercept, HH SIZE and CUBED HH SIZE) so we use t(5-3) = t(2).


For example, for HH SIZE p = TDIST(0.796, 2, 2) = 0.5095.

TEST HYPOTHESIS ON A REGRESSION PARAMETER

Suppose, for example, we wish to test H0: β2 = 1. Then

t = (b2 - H0 value of β2) / (standard error of b2)
= (0.33647 - 1) / 0.4227
= -1.5697

Using the p-value approach


p-value = TDIST(1.569, 2, 2) = 0.257. [Here n=5 and k=3 so n-k=2].
Do not reject the null hypothesis at level .05 since the p-value is > 0.05.

Using the critical value approach


We computed t = -1.569
The critical value is t_.025(2) = TINV(0.05,2) = 4.303. [Here n=5 and k=3 so n-k=2].
So do not reject the null hypothesis at level .05 since |t| = 1.569 < 4.303.

OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS


We test H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.

From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975.
Since the p-value is not less than 0.05 we do not reject the null hypothesis that the
regression parameters are zero at significance level 0.05.
Conclude that the parameters are jointly statistically insignificant at significance level 0.05.

Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors
including the intercept. Here FDIST(4.0635,2,2) = 0.1975.

PREDICTED VALUE OF Y GIVEN REGRESSORS

Consider the case where x = 4, in which case CUBED HH SIZE = x^3 = 4^3 = 64.
yhat = b1 + b2*x + b3*x^3 = 0.8966 + 0.3365×4 + 0.0021×64 ≈ 2.3762

EXCEL LIMITATIONS

Excel restricts the number of regressors (only up to 16 regressors).

Excel requires that all the regressor variables be in adjoining columns.


You may need to move columns to ensure this.
e.g. If the regressors are in columns B and D you need to copy at least one of columns
B and D so that they are adjacent to each other.

Excel standard errors and t-statistics and p-values are based on the assumption that the
error is independent with constant variance (homoskedastic).

Excel does not provide alternatives, such as heteroskedastic-robust or autocorrelation-robust
standard errors and t-statistics and p-values.
More specialized software such as STATA, EVIEWS, SAS, LIMDEP, PC-TSP, ... is needed.
