
CARS    HH SIZE
1       1
2       2
2       3
2       4
3       5

[Charts: CARS regressed on HHSIZE; scatter of No. of cars (y-axis) against No. of household members (x-axis), with five fitted trendlines]

Linear:        f(x) = 0.4x + 0.8                                   R² = 0.8
Logarithmic:   f(x) = 0.9962543387 ln(x) + 1.0460881159            R² = 0.8017047516
Exponential:   f(x) = 0.9767186839 exp(0.2197224577 x)             R² = 0.7683868434
Power:         f(x) = 1.0893387646 x^0.574455186                   R² = 0.8484911839
Polynomial:    f(x) = -8.90158753083262E-17 x² + 0.4x + 0.8        R² = 0.8

Logarithmic: y = c + b*ln(x) + u
This is a linear-log relationship.


Exponential: y = c*exp(bx)*u
This is a log-linear relationship since taking logs gives ln(y) = ln(c) + b*x + ln(u)


Power: y = a*(x^b)*u
This is a log-log relationship since taking logs gives ln(y) = ln(a) + b*ln(x) + ln(u)


Polynomial: y = a + b*x + c*x^2 + d*x^3 + .... + u

This is, for example, a quadratic relationship if the polynomial is of order 2.
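Each of these alternative trendlines can be reproduced outside Excel by running a linear regression on suitably transformed data. The following sketch (Python with NumPy, an illustration only; the variable names are not part of the original spreadsheet) fits the linear, logarithmic, exponential and power forms to the CARS/HHSIZE data and should recover coefficients close to those shown in the charts above.

# Illustrative sketch: reproduce Excel's trendline fits via transformed OLS.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # HH SIZE
y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])   # CARS

# Linear: y = a + b*x
b_lin, a_lin = np.polyfit(x, y, 1)                # approx 0.4, 0.8

# Logarithmic: y = a + b*ln(x)   (regress y on ln x)
b_log, a_log = np.polyfit(np.log(x), y, 1)        # approx 0.9963, 1.0461

# Exponential: y = c*exp(b*x)    (regress ln y on x, then exponentiate the intercept)
b_exp, lnc = np.polyfit(x, np.log(y), 1)          # b approx 0.2197
c_exp = np.exp(lnc)                               # approx 0.9767

# Power: y = a*x^b               (regress ln y on ln x)
b_pow, lna = np.polyfit(np.log(x), np.log(y), 1)  # b approx 0.5745
a_pow = np.exp(lna)                               # approx 1.0893

print(a_lin, b_lin, a_log, b_log, c_exp, b_exp, a_pow, b_pow)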

CORRELATION COEFFICIENT

CARS    HH SIZE
1       1
2       2
2       3
2       4
3       5

The correlation coefficient between two series, say x and y, equals

Covariance(x,y) / [Sqrt(Variance(x)) * Sqrt(Variance(y))]

where
Covariance(x,y) is the sample covariance between x and y: (1/(n-1)) × Σ i (xi - xbar)(yi - ybar)
Variance(x) is the sample variance of x: (1/(n-1)) × Σ i (xi - xbar)²
Variance(y) is the sample variance of y: (1/(n-1)) × Σ i (yi - ybar)²

CALCULATION USING THE DATA ANALYSIS ADD-IN

            CARS        HH SIZE
CARS        1
HH SIZE     0.894427    1

The correlation coefficient is 0.894427.

This can be extended to several series.
For example if there are data in columns A, B, C, D and E then the array chosen is A1:E6 and produces a 5 x 5 table of correlations.

CALCULATION USING THE CORREL FUNCTION

0.894427    On the Formulas tab select the Function Library group, then More Functions and Statistical.
            Select CORREL and fill out the dialog box with the two data ranges.

0.894427    Alternatively directly type =CORREL(A1:A6,B1:B6), which yields 0.894427.

            Note that Excel dropped the first row (the labels).

0.894427 = CORREL(A2:A6,B2:B6) yields the same result
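
As a cross-check, the same number can be obtained directly from the covariance/variance formula above. A minimal sketch in Python/NumPy (illustrative only, not part of the spreadsheet):

# Correlation = Covariance(x,y) / [sqrt(Var(x)) * sqrt(Var(y))], all with the 1/(n-1) convention.
import numpy as np

cars = np.array([1.0, 2.0, 2.0, 2.0, 3.0])
hhsize = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

cov_xy = np.cov(cars, hhsize, ddof=1)[0, 1]        # sample covariance = 1.0
r = cov_xy / (np.std(cars, ddof=1) * np.std(hhsize, ddof=1))
print(r)                                           # 0.894427..., matching CORREL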

COVARIANCE

This is obtained in a similar way to correlation.
We can use the Data Analysis Add-in and Covariance.

            CARS    HH SIZE
CARS        0.4
HH SIZE     0.8     2

The covariance between CARS and HH SIZE is 0.8.
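
Note that 0.8 here corresponds to dividing the cross-product sum by n (as the Data Analysis Covariance tool does), whereas the 1/(n-1) convention used in the correlation formula above gives 1.0 for these data. A short illustrative check (Python/NumPy, not part of the spreadsheet):

# Two covariance conventions for the CARS / HH SIZE data.
import numpy as np

cars = np.array([1.0, 2.0, 2.0, 2.0, 3.0])
hhsize = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(cars)

cross = np.sum((cars - cars.mean()) * (hhsize - hhsize.mean()))  # = 4.0
print(cross / n)        # 0.8  (divide by n; matches the Data Analysis table above)
print(cross / (n - 1))  # 1.0  (divide by n-1; the sample covariance in the correlation formula)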
TWO-VARIABLE LINEAR REGRESSION

CARS    HH SIZE
1       1
2       2
2       3
2       4
3       5

The population regression model is: y = β1 + β2 x + u
We wish to estimate the regression line: y = b1 + b2 x

We obtain
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.894427
R Square            0.8
Adjusted R Square   0.733333
Standard Error      0.365148
Observations        5

ANOVA
df SS MS F Significance F
Regression 1 1.6 1.6 12 0.040519
Residual 3 0.4 0.133333
Total 4 2

            Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   0.8           0.382971        2.088932  0.127907  -0.418784   2.018784   -0.418784    2.018784
HH SIZE     0.4           0.11547         3.464102  0.040519  0.032523    0.767477   0.032523     0.767477

INTERPRETING THE REGRESSION SUMMARY OUTPUT

The key output is given in the Coefficients column in the last set of output:

b1 = 0.8 (the Intercept coefficient)


b2 = 0.4 (the Coefficient of HH SIZE : the slope coefficient)

Thus the fitted line is: y = 0.8 + 0.4 x


or CARS = 0.8 + 0.4 HHSIZE

The regression statistics output gives measures of how well the model fits the data. In particular

R2 = 0.8 which measures the fit of the model


This means that 80% of the variation of yi around ybar is explained by the regressor xi

Standard error = 0.365 which measures the standard deviation of yi around its fitted value.

The remaining output (ANOVA table and t Stat, p-value, .... ) is used for statistical inference.
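
The numbers in the summary output can be reproduced from the usual OLS formulas. A minimal sketch (Python/NumPy, illustrative only; this is not how Excel itself computes them):

# Two-variable OLS by hand: slope, intercept, R-squared and standard error of the regression.
import numpy as np

y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])   # CARS
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # HH SIZE
n = len(y)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # 0.4
b1 = y.mean() - b2 * x.mean()                                               # 0.8

yhat = b1 + b2 * x
rss = np.sum((y - yhat) ** 2)          # residual SS = 0.4
tss = np.sum((y - y.mean()) ** 2)      # total SS    = 2.0
r2 = 1 - rss / tss                     # 0.8
s = np.sqrt(rss / (n - 2))             # standard error of the regression = 0.365148

print(b1, b2, r2, s)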
Statistical Inference for Two-variable Regression

CARS    HH SIZE
1       1
2       2
2       3
2       4
3       5

REGRESSION USING THE DATA ANALYSIS ADD-IN


SUMMARY OUTPUT

Regression Statistics
Multiple R          0.894427
R Square            0.8
Adjusted R Square   0.733333
Standard Error      0.365148
Observations        5

ANOVA
df SS MS F Significance F
Regression 1 1.6 1.6 12 0.040519
Residual 3 0.4 0.133333
Total 4 2

            Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   0.8           0.382971        2.088932  0.127907  -0.418784   2.018784   -0.418784    2.018784
HH SIZE     0.4           0.11547         3.464102  0.040519  0.032523    0.767477   0.032523     0.767477

The regression output has three components:

Regression statistics table


ANOVA table
Regression coefficients table.

INTERPRET REGRESSION STATISTICS TABLE

Regression Statistics              Explanation

Multiple R          0.894427       R = square root of R²
R Square            0.8            R² = coefficient of determination
Adjusted R Square   0.733333       Adjusted R² used if more than one x variable
Standard Error      0.365148       This is the sample estimate of the standard deviation of the error u
Observations        5              Number of observations used in the regression (n)

The standard error here refers to the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)).
where SSE = Residual (or error) sum of squares. Here sqrt(0.4/(5-2)) = 0.365148.

INTERPRET ANOVA TABLE


ANOVA
df SS MS F Significance F
Regression 1 1.6 1.6 12 0.040519
Residual 3 0.4 0.133333
Total 4 2

The ANOVA (analysis of variance) table splits the sum of squares into its components.

Total sums of squares


= Residual (or error) sum of squares + Regression (or explained) sum of squares.

Thus Σ i (yi - ybar)² = Σ i (yi - yhati)² + Σ i (yhati - ybar)²

where yhati is the value of yi predicted from the regression line


and ybar is the sample mean of y.

For example:
R2 = 1 - Residual SS / Total SS (general formula for R2)
= 1 - 0.4/2.0 (from data in the ANOVA table)
= 0.8 (which equals R² given in the Regression Statistics table).

The remainder of the ANOVA table is described in more detail in Excel: Multiple Regression.
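
The decomposition can also be verified numerically; a short illustrative check in Python/NumPy, using the fitted values from the regression above:

# Check: Total SS = Residual SS + Regression SS for the CARS/HHSIZE regression.
import numpy as np

y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])        # CARS
yhat = np.array([1.2, 1.6, 2.0, 2.4, 2.8])     # predicted CARS from y = 0.8 + 0.4x

tss = np.sum((y - y.mean()) ** 2)       # 2.0
rss = np.sum((y - yhat) ** 2)           # 0.4
ess = np.sum((yhat - y.mean()) ** 2)    # 1.6
print(tss, rss + ess, 1 - rss / tss)    # 2.0  2.0  0.8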

INTERPRET REGRESSION COEFFICIENTS TABLE

            Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept   0.8           0.382971        2.088932  0.127907  -0.418784   2.018784   -0.418784    2.018784
HH SIZE     0.4           0.11547         3.464102  0.040519  0.032523    0.767477   0.032523     0.767477
CONFIDENCE INTERVALS FOR SLOPE COEFFICIENT

From the output, the 95% confidence interval for the slope coefficient β2 is (0.032523, 0.767477).

TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")

For HH SIZE the t Stat is 3.464102 with p-value 0.040519 < 0.05, so the slope coefficient is statistically significant at level 0.05.

TEST HYPOTHESIS OF SLOPE COEFFICIENT EQUAL TO VALUE OTHER THAN ZERO

For example, to test H0: β2 = 1:

t = (0.4 - 1.0) / 0.11547 = -5.196155
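
These intervals and tests can be reproduced from the slope estimate, its standard error and the t distribution with n - 2 = 3 degrees of freedom. A sketch using Python with SciPy (illustrative only; the Excel output above is the source of the numbers):

# 95% CI and t-tests for the slope in the two-variable regression.
from scipy import stats

b2, se_b2, df = 0.4, 0.11547, 3          # slope, its standard error, n - 2

tcrit = stats.t.ppf(0.975, df)           # 3.1824, like TINV(0.05, 3)
print(b2 - tcrit * se_b2, b2 + tcrit * se_b2)   # approx (0.0325, 0.7675)

t0 = b2 / se_b2                          # test H0: beta2 = 0  ->  3.4641
print(t0, 2 * stats.t.sf(abs(t0), df))   # two-tailed p-value approx 0.0405

t1 = (b2 - 1.0) / se_b2                  # test H0: beta2 = 1  ->  -5.196
print(t1, 2 * stats.t.sf(abs(t1), df))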

FITTED VALUES AND RESIDUALS FROM REGRESSION LINE

RESIDUAL OUTPUT

Observation   Predicted CARS   Residuals
1             1.2              -0.2
2             1.6               0.4
3             2                 0
4             2.4              -0.4
5             2.8               0.2

RESIDUAL OUTPUT

Observation   Predicted CARS   Residuals   Standard Residuals
1             1.2              -0.2        -0.632456
2             1.6               0.4         1.264911
3             2                 0           0
4             2.4              -0.4        -1.264911
5             2.8               0.2         0.632456

PROBABILITY OUTPUT

Percentile   CARS
10           1
30           2
50           2
70           2
90           3

[Charts: HH SIZE Residual Plot (Residuals against HH SIZE), HH SIZE Line Fit Plot (CARS and Predicted CARS against HH SIZE), and Normal Probability Plot (CARS against Sample Percentile)]
CARS HH SIZE
1 1
2 2
2 3
2 4
3 5

REGRESSION USING EXCEL FUNCTIONS INTERCEPT, SLOPE, RSQ, STEYX and FORECAST

The population regression model is: y = β1 + β2 x + u

We wish to estimate the regression line: y = b1 + b2 x

The individual functions INTERCEPT, SLOPE, RSQ, STEYX and FORECAST can be used to get key results for
two-variable regression

0.8 INTERCEPT(A1:A6,B1:B6) yields the OLS intercept estimate of 0.8


0.4 SLOPE(A1:A6,B1:B6) yields the OLS slope estimate of 0.4
0.8 RSQ(A1:A6,B1:B6) yields the R-squared of 0.8
0.365148 STEYX(A1:A6,B1:B6) yields the standard error of the regression of 0.365148
3.2 FORECAST(6,A1:A6,B1:B6) yields the OLS forecast value of Yhat=3.2 for X=6 (forecast 3.2 cars for
household of size 6).
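
For comparison, the same five quantities can be computed outside Excel. The small helpers below (Python/NumPy; the function names simply mirror the Excel ones and are not an existing library API) sketch what INTERCEPT, SLOPE, RSQ, STEYX and FORECAST return for these data:

# Rough Python analogues of the Excel functions used above (illustrative only).
import numpy as np

cars = np.array([1.0, 2.0, 2.0, 2.0, 3.0])     # known y's (A2:A6)
hhsize = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # known x's (B2:B6)

def slope(y, x):      # like SLOPE(y, x)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

def intercept(y, x):  # like INTERCEPT(y, x)
    return y.mean() - slope(y, x) * x.mean()

def rsq(y, x):        # like RSQ(y, x)
    yhat = intercept(y, x) + slope(y, x) * x
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def steyx(y, x):      # like STEYX(y, x): standard error of the regression
    yhat = intercept(y, x) + slope(y, x) * x
    return np.sqrt(np.sum((y - yhat) ** 2) / (len(y) - 2))

def forecast(x0, y, x):  # like FORECAST(x0, y, x)
    return intercept(y, x) + slope(y, x) * x0

print(intercept(cars, hhsize), slope(cars, hhsize), rsq(cars, hhsize),
      steyx(cars, hhsize), forecast(6.0, cars, hhsize))
# 0.8  0.4  0.8  0.365148...  3.2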

REGRESSION USING THE EXCEL FUNCTION LINEST

First in cell D2 enter the function LINEST(A2:A6,B2:B6,1,1).


CARS HH SIZE
1 1 0.4
2 2
2 3
2 4
3 5

Then Highlight the desired array D2:E6


Hit the F2 key (then Edit appears at the bottom left of the spreadsheet).
CARS HH SIZE
1 1 0.4
2 2
2 3
2 4
3 5
Finally Hit CTRL-SHIFT-ENTER.
This yields
CARS HH SIZE
1 1 0.4 0.8
2 2 0.11547 0.382971
2 3 0.8 0.365148
2 4 12 3
3 5 1.6 0.4

where the results in D2:E6 represent

Slope coef           Intercept coef
St.error of slope    St.error of intercept
R-squared            St.error of regression
F-test overall       Degrees of freedom (n-k)
Regression SS        Residual SS

In particular, the fitted regression is

CARS = 0.8 + 0.4 HH SIZE with R2 = 0.8


The estimated coefficients have standard errors of, respectively, 0.11547 (slope) and
0.382971 (intercept).

To get just the coefficients give the LINEST command with the last entry 0 rather than 1, i.e.
LINEST(A2:A6,B2:B6,1,0),
and then highlight cells A8:B8, say, hit the F2 key, and hit CTRL-SHIFT-ENTER.

CARS HH SIZE
1 1
2 2
2 3
2 4
3 5

0.4 0.8

PREDICTION USING EXCEL FUNCTION TREND

CARS HH SIZE New Values


1 1 6
2 2 7
2 3
2 4
3 5

The fitted model is y = 0.8 + 0.4*x, so TREND yields predictions of 3.2 for HH SIZE = 6 and 3.6 for HH SIZE = 7.
LOGEST: The LOGEST function is the same as the LINEST function, except that an exponential relationship is estimated rather than a linear relationship.

CARS    HH SIZE
1       1       1.245731    0.976719
2       2       0.069647    0.230995
2       3       0.768387    0.220245
2       4       9.952632    3
3       5       0.48278     0.145523

1.245731    0.976719

The first row gives the fitted exponential relationship CARS = 0.976719 × 1.245731^HHSIZE, equivalently CARS = 0.976719*exp(0.219722*HHSIZE) since ln(1.245731) = 0.219722.
Exponential: y = c*exp(bx)*u
This is a log-linear relationship since taking logs gives ln(y) = ln(c) + b*x + ln(u)
[Chart: CARS regressed on HHSIZE, exponential trendline f(x) = 0.9767186839 exp(0.2197224577 x), R² = 0.7683868434; x-axis: No. of household members, y-axis: No. of cars]
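
The LOGEST numbers can be reproduced by a log-linear regression: regress ln(CARS) on HH SIZE and exponentiate. A minimal sketch (Python/NumPy, illustrative only):

# Reproduce the first row of LOGEST: fit ln(y) = ln(c) + b*x by OLS, then report exp(b) and c.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # HH SIZE
y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])   # CARS

b, lnc = np.polyfit(x, np.log(y), 1)
print(np.exp(b), np.exp(lnc))   # approx 1.245731, 0.976719 (LOGEST's m and b)

lny_hat = lnc + b * x
r2 = 1 - np.sum((np.log(y) - lny_hat) ** 2) / np.sum((np.log(y) - np.log(y).mean()) ** 2)
print(r2)                        # approx 0.768387 (R-squared of the regression in logs)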
EXCEL 2007: Multiple Regression

CARS    HH SIZE    CUBED HH SIZE
1       1          1
2       2          8
2       3          27
2       4          64
3       5          125

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.895828
R Square            0.802508
Adjusted R Square   0.605016
Standard Error      0.444401
Observations        5

ANOVA
df SS MS F Significance F
Regression 2 1.605016 0.802508 4.063492 0.197492
Residual 2 0.394984 0.197492
Total 4 2

               Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%  Lower 95.0%  Upper 95.0%
Intercept      0.896552      0.764398        1.172886  0.361624  -2.392388   4.185491   -2.392388    4.185491
HH SIZE        0.336468      0.422704        0.79599   0.509507  -1.482279   2.155216   -1.482279    2.155216
CUBED HH SIZE  0.00209       0.013114        0.159364  0.888021  -0.054334   0.058514   -0.054334    0.058514

The regression output has three components:

Regression statistics table


ANOVA table
Regression coefficients table.

INTERPRET REGRESSION STATISTICS TABLE

This is the following output. Of greatest interest is R Square.

Regression Statistics              Explanation

Multiple R          0.895828       R = square root of R²
R Square            0.802508       R² = coefficient of determination
Adjusted R Square   0.605016       Adjusted R² used if more than one x variable
Standard Error      0.444401       This is the sample estimate of the standard deviation of the error u
Observations        5              Number of observations used in the regression (n)
The above gives the overall goodness-of-fit measures:
R2 = 0.8025
Correlation between y and y-hat is 0.8958 (when squared gives 0.8025).
Adjusted R2 = R2 - (1-R2 )*(k-1)/(n-k) = .8025 - .1975*2/2 = 0.6050.

The standard error here refers to the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)).
where SSE = Residual (or error) sum of squares. Here sqrt(0.394984/(5-3)) = 0.444401.

INTERPRET ANOVA TABLE

For example:
R2 = 1 - Residual SS / Total SS (general formula for R2)
= 1 - 0.3950/2.0 (from data in the ANOVA table)
= 0.8025 (which equals R² given in the Regression Statistics table).

The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least
one of β2 and β3 does not equal zero.

Aside: Excel computes F as:

F = [Regression SS/(k-1)] / [Residual SS/(n-k)] = [1.6050/2] / [0.39498/2] = 4.0635.

The column labeled significance F has the associated P-value.


Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05.

Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors
including the intercept; here k = 3.

Here FDIST(4.0635,2,2) = 0.1975.
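
The same F statistic and its p-value can be computed from the ANOVA sums of squares; a short illustrative check using Python/SciPy:

# Overall F test for the multiple regression: F = (Regression SS / (k-1)) / (Residual SS / (n-k)).
from scipy import stats

reg_ss, resid_ss = 1.605016, 0.394984
n, k = 5, 3                        # observations, regressors including the intercept

F = (reg_ss / (k - 1)) / (resid_ss / (n - k))
p = stats.f.sf(F, k - 1, n - k)    # right-tail probability, like FDIST(F, k-1, n-k)
print(F, p)                        # approx 4.0635, 0.1975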

INTERPRET REGRESSION COEFFICIENTS TABLE

t Stat = (coeff. / std error)

Intercept:       0.896552 / 0.764398 = 1.1729
HH SIZE:         0.336468 / 0.422704 = 0.7960
CUBED HH SIZE:   0.002090 / 0.013114 = 0.1594

A simple summary of the above output is that the fitted line is

y = 0.8966 + 0.3365*x + 0.0021*z    where x = HH SIZE and z = CUBED HH SIZE
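
The fitted line can be verified by running the multiple regression directly. A minimal sketch using Python/NumPy least squares (illustrative only; Excel's Regression tool produced the output above):

# Multiple regression of CARS on HH SIZE and CUBED HH SIZE via least squares.
import numpy as np

y = np.array([1.0, 2.0, 2.0, 2.0, 3.0])          # CARS
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # HH SIZE
X = np.column_stack([np.ones_like(x), x, x**3])  # intercept, x, x cubed

coef, resid, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coef)                                      # approx [0.8966, 0.3365, 0.0021]

yhat = X @ coef
r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2)                                        # approx 0.8025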

CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS

From the Excel output, the 95% confidence interval for the slope coefficient β2 is (-1.4823, 2.1552).

Excel computes this as

b2 ± t_.025(2) × se(b2)
= 0.33647 ± TINV(0.05, 2) × 0.42270
= 0.33647 ± 4.303 × 0.42270
= 0.33647 ± 1.8189
= (-1.4823, 2.1552).
TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")

There are 5 observations and 3 regressors (intercept, HH SIZE and CUBED HH SIZE) so we use t(5-3) = t(2).


For example, for HH SIZE p = TDIST(0.796, 2, 2) = 0.5095.

TEST HYPOTHESIS ON A REGRESSION PARAMETER

Suppose, for example, we wish to test H0: β2 = 1. Then

t = (b2 - H0 value of β2) / (standard error of b2)
= (0.33647 - 1) / 0.4227
= -1.5697

Using the p-value approach


p-value = TDIST(1.569, 2, 2) = 0.257. [Here n=5 and k=3 so n-k=2].
Do not reject the null hypothesis at level .05 since the p-value is > 0.05.

Using the critical value approach


We computed t = -1.569
The critical value is t_.025(2) = TINV(0.05,2) = 4.303. [Here n=5 and k=3 so n-k=2].
So do not reject the null hypothesis at level .05 since |t| = 1.569 < 4.303.

OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS


We test H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.

From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975.
Since the p-value is not less than 0.05 we do not reject the null hypothesis that the
regression parameters are zero at significance level 0.05.
Conclude that the parameters are jointly statistically insignificant at significance level 0.05.

Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors
including the intercept. Here FDIST(4.0635,2,2) = 0.1975.

PREDICTED VALUE OF Y GIVEN REGRESSORS

Consider the case where x = 4, in which case CUBED HH SIZE = x^3 = 4^3 = 64.
yhat = b1 + b2*x + b3*x^3 = 0.8966 + 0.3365×4 + 0.0021×64 ≈ 2.3762

EXCEL LIMITATIONS

Excel restricts the number of regressors (only up to 16 regressors).

Excel requires that all the regressor variables be in adjoining columns.


You may need to move columns to ensure this.
e.g. If the regressors are in columns B and D you need to copy at least one of columns
B and D so that they are adjacent to each other.

Excel standard errors and t-statistics and p-values are based on the assumption that the
error is independent with constant variance (homoskedastic).

Excel does not provide alternatives, such as heteroskedastic-robust or autocorrelation-robust
standard errors and t-statistics and p-values.
More specialized software such as STATA, EVIEWS, SAS, LIMDEP, PC-TSP, ... is needed.
