http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis...
Minitab.com (http://www.minitab.com)
http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-to-interpret-the-constanty-intercept
In this post, I'll show you everything you need to know about the constant in linear
regression analysis.
I'll use fitted line plots to illustrate the concepts because they really bring the math to
life. However, a 2D fitted line plot can only display the results from simple
regression, which has one predictor variable and the response. The concepts hold
true for multiple linear regression, but I can't graph the higher dimensions that are
required.
12/12/2014 11:48 AM
I've often seen the constant described as the mean response value when all
predictor variables are set to zero. Mathematically, that's correct. However, a zero
setting for all predictors in a model is often an impossible/nonsensical
combination, as it is in the following example.
In my last post about the interpretation of regression p-values and coefficients
(http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpretregression-analysis-results-p-values-and-coefficients), I used a fitted line plot to
illustrate a weight-by-height regression analysis. Below, I've changed the scale of
the y-axis on that fitted line plot, but the regression results are the same as before.
If you follow the blue fitted line down to where it intercepts the y-axis, it is a fairly
negative value. From the regression equation, we see that the intercept value is
-114.3. If height is zero, the regression equation predicts that weight is -114.3
kilograms!
Clearly this constant is meaningless and you shouldn't even try to give it meaning.
No human can have zero height or a negative weight!
Now imagine a multiple regression analysis with many predictors. It becomes even
more unlikely that ALL of the predictors can realistically be set to zero.
If all of the predictors can't be zero, it is impossible to interpret the value of the
constant. Don't even try!
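To make the "prediction at zero" reading concrete, here is a minimal sketch in plain Python. The heights and weights are made-up numbers, not the data behind the fitted line plot, and `fit_line` is a hypothetical helper written for this illustration. It shows that the constant is simply what the equation returns when height is 0, a value far outside anything that was measured.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical heights (m) and weights (kg), invented for illustration only.
heights = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
weights = [30.0, 34.0, 41.0, 44.0, 50.0, 57.0, 60.0, 67.0]

b0, b1 = fit_line(heights, weights)

# The constant is the model's prediction when the predictor equals zero.
predicted_at_zero = b0 + b1 * 0.0

print(f"constant (intercept) = {b0:.1f} kg")
print(f"slope = {b1:.1f} kg per m")
print(f"prediction at height 0 = {predicted_at_zero:.1f} kg")
print(f"smallest observed height = {min(heights)} m")
```

With these invented numbers the constant comes out negative, just as in the post, even though every observed weight is positive.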
Even if it's possible for all of the predictor variables to equal zero, that data point
might be outside the range of the observed data.
You should never use a regression model to make a prediction for a point that is
outside the range of your data because the relationship between the variables
might change. The value of the constant is a prediction for the response value
when all predictors equal zero. If you didn't collect data in this all-zero range, you
can't trust the value of the constant.
The height-by-weight example illustrates this concept. These data are from middle
school girls and we can't estimate the relationship between the variables outside of
the observed weight and height range. However, we can get a sense that the
relationship changes by marking the average weight and height for a newborn
baby on the graph. That's not quite zero height, but it's as close as we can get.
I drew the red circle near the origin to approximate the newborn's average height
and weight. You can clearly see that the relationship must change as you extend
the data range!
So the relationship we see for the observed data is locally linear, but it changes
beyond that. That's why you shouldn't predict outside the range of your data...and
another reason why the regression constant can be meaningless.
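The "locally linear" idea can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: `true_weight` is a made-up curved relationship (not a real growth model), the heights are hypothetical, and 0.5 m is used only as a rough stand-in for a newborn's length. The straight line fits well inside the observed range and fails badly once you leave it.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def true_weight(height_m):
    """Made-up curved relationship, used only for this illustration."""
    return 3.5 + 13.0 * height_m ** 3

# "Observed" data cover only a narrow band of heights (hypothetical values).
heights = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
weights = [true_weight(h) for h in heights]

b0, b1 = fit_line(heights, weights)

# Inside the observed range, the straight line tracks the curve closely.
worst_in_range = max(abs(w - (b0 + b1 * h)) for h, w in zip(heights, weights))

# Far outside the range, the same line is badly off.
newborn_height = 0.5  # rough stand-in for a newborn's length, in meters
line_prediction = b0 + b1 * newborn_height
extrapolation_error = abs(true_weight(newborn_height) - line_prediction)

print(f"largest error inside the data range: {worst_in_range:.2f} kg")
print(f"line's prediction at {newborn_height} m: {line_prediction:.1f} kg")
print(f"curve's value at {newborn_height} m: {true_weight(newborn_height):.1f} kg")
print(f"constant (prediction at 0 m): {b0:.1f} kg")
```

The constant is the most extreme extrapolation of all, because it sits at zero, further from the data than the newborn point.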
Even if you collect data within that all-zero range, the constant might still be meaningless!
The constant term is in part estimated by the omission of predictors from a
regression analysis. In essence, it serves as a garbage bin for any bias that is not
accounted for by the terms in the model. You can picture this by imagining that the
regression line floats up and down (by adjusting the constant) to a point where the
mean of the residuals is zero, which is a key assumption for residual analysis
(http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-checkyour-residual-plots-for-regression-analysis). This floating is not based on what
makes sense for the constant, but rather what works mathematically to produce
that zero mean.
The constant guarantees that the residuals don't have an overall positive or
negative bias, but also makes it harder to interpret the value of the constant
because it absorbs the bias.
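Here is a small sketch of the "garbage bin" behavior, under stated assumptions: the data are invented, `z` is a hypothetical predictor that really affects `y` but is left out of the model, and `z` was chosen to be uncorrelated with `x` so the slope is not disturbed. The fitted constant ends up far from the true constant because it soaks up the average effect of the omitted predictor, and the residual mean lands on zero.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical data: y truly depends on x AND on a second predictor z.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [6.0, 4.0, 5.0, 5.0, 4.0, 6.0]  # mean 5, chosen to be uncorrelated with x
y = [1.0 + 2.0 * xi + 3.0 * zi for xi, zi in zip(x, z)]  # true constant is 1

# Fit y on x alone, omitting z from the model.
b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)

# Nudge the constant up by 1: every residual drops by 1, and so does their mean.
shifted_mean = sum(yi - (b0 + 1.0 + b1 * xi) for xi, yi in zip(x, y)) / len(y)

print(f"fitted constant = {b0:.2f}")
print(f"fitted slope    = {b1:.2f}")
print(f"mean residual   = {mean_residual:.2e}")
print(f"mean residual with constant shifted by +1 = {shifted_mean:.2f}")
```

In this toy setup the true constant is 1 and the omitted predictor contributes 3 × 5 = 15 on average, so the fitted constant is their sum. The number is correct for keeping the residuals centered, but it no longer describes anything you could interpret on its own.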
Next, I'll overlay the line for this equation on the previous fitted line plot so we can
compare the model with and without the constant.
The blue line is the fitted line for the regression model with the constant while the
green line is for the model without the constant. Clearly, the green line just doesn't
fit. The slope is way off and the predicted values are biased. For the model without
the constant, the weight predictions tend to be too high for shorter subjects and
too low for taller subjects.
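The comparison can be reproduced in spirit with a short sketch. The heights and weights below are invented (they are not the data in the plot), and both helper functions are hypothetical names written for this example. One fit includes the constant; the other forces the line through the origin, where the least-squares slope is sum(x*y) / sum(x*x).

```python
def fit_with_constant(x, y):
    """Least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def fit_through_origin(x, y):
    """Least squares for y = b1 * x (no constant); returns b1."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical heights (m) and weights (kg), invented for illustration only.
heights = [1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60, 1.65]
weights = [30.0, 34.0, 41.0, 44.0, 50.0, 57.0, 60.0, 67.0]

b0, slope_with = fit_with_constant(heights, weights)
slope_origin = fit_through_origin(heights, weights)

# Residual = observed - predicted, for the model WITHOUT the constant.
residuals_origin = [w - slope_origin * h for h, w in zip(heights, weights)]
mean_residual_origin = sum(residuals_origin) / len(residuals_origin)

print(f"slope with constant:    {slope_with:.1f} kg per m")
print(f"slope without constant: {slope_origin:.1f} kg per m")
print(f"residual for shortest subject (no constant): {residuals_origin[0]:.1f} kg")
print(f"residual for tallest subject (no constant):  {residuals_origin[-1]:.1f} kg")
print(f"mean residual (no constant): {mean_residual_origin:.2f} kg")
```

A negative residual means the prediction was too high and a positive one means it was too low, so the signs at the two ends of the height range show the same pattern of bias described above.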
In closing, the regression constant is generally not worth interpreting. Despite this,
it is almost always a good idea to include the constant in your regression analysis.
In the end, the real value of a regression model is the ability to understand how the
response variable changes when you change the values of the predictor variables.
Don't worry too much about the constant!
If you're learning about regression, read my regression tutorial
(http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorialand-examples)!
Comments
Name: Tim McDaniel Sunday, September 15, 2013
Very nice. It is amazing, and I think understandable, how desperately new-to-regression students want to
attach a substantively meaningful interpretation to the intercept term. I tell students that one could interpret
the intercept as a "correction factor" when using particular values of the x's to predict y.
5 Comments
Regression line drawn as Y=c+1075x, when x was 2, Y was 239, given that Y intercept was
11,. Calculate the residual.
Mod
John K.
Jim, can you elaborate on the purpose and meaning of assessing the significance of a
constant. The significance measure is included in regression results and occasionally is way
above .05 (in my example: .559). Thank you!
Mod
Hi John,
The strict technical meaning of the p-value for the constant is that it measures how
compatible your data are with the null hypothesis that the constant equals zero. If you
have a sufficiently low p-value for the constant, you can reject the null hypothesis and
conclude that the constant does not equal zero. In other words, the regression line
does not go through the origin.
Your higher p-value indicates that you cannot reject the null that the constant equals
zero. Your constant could be zero.
However, because the value of the constant is generally meaningless, determining