
POLYNOMIAL REGRESSION MODELS

Polynomial regression models are multiple linear regression models that contain higher-order terms. They are useful when there is reason to believe that the relationship between two variables is curvilinear.
In a polynomial regression model of order p, the relationship between the predictor(s) and the response variable is modeled as a p-th order polynomial function.

Polynomial model in one variable:


The general form of a polynomial regression model in one variable $X$ is given by

$$y = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_k X^k + \varepsilon$$

The predictors in the model are higher powers of predictor X.

This model is referred to as the k-th order polynomial model.

When $k=2$, the polynomial model is referred to as the quadratic model, and the parameters $\beta_1$ and $\beta_2$ are the linear and quadratic effects respectively.

When $k=3$, the polynomial model is referred to as the cubic model, and the parameters $\beta_1$, $\beta_2$ and $\beta_3$ are the linear, quadratic and cubic effects respectively.

Estimation of regression coefficients:


Since these models are multiple linear regression models, we use the least squares criterion to estimate the regression coefficients. One major problem in this case is that the predictors (powers of $X$) are highly correlated, so the matrix $X'X$ is ill-conditioned (nearly singular) and its inverse $(X'X)^{-1}$ cannot be computed reliably. To deal with the collinearity we use centered values of $X$ instead, i.e. $X^* = X - \bar{X}$: subtract the mean from each value of $X$, as the sketch below illustrates.
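To see why centering helps, compare the correlation between the linear and quadratic terms before and after centering. A minimal sketch in Python, using the x values from the worked example at the end of these notes:

```python
import numpy as np

# x values from the worked example later in these notes
x = np.array([11, 9, 5, 5, 4, 10, 3, 10, 11, 8], dtype=float)

# Correlation between the linear and quadratic predictors
print(np.corrcoef(x, x**2)[0, 1])    # raw x:      about  0.99
xc = x - x.mean()                    # centered x
print(np.corrcoef(xc, xc**2)[0, 1])  # centered x: about -0.42
```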
The model is therefore

$$y_i = \beta_0 + \beta_1 (X_i - \bar{X}) + \beta_2 (X_i - \bar{X})^2 + \dots + \beta_k (X_i - \bar{X})^k + \varepsilon_i, \quad i = 1, 2, \dots, n$$

The least squares estimates are then

$$\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = (X'X)^{-1} X' y
\quad \text{where} \quad
X = \begin{pmatrix}
1 & (x_1 - \bar{x}) & (x_1 - \bar{x})^2 & \cdots & (x_1 - \bar{x})^k \\
1 & (x_2 - \bar{x}) & (x_2 - \bar{x})^2 & \cdots & (x_2 - \bar{x})^k \\
1 & (x_3 - \bar{x}) & (x_3 - \bar{x})^2 & \cdots & (x_3 - \bar{x})^k \\
\vdots & \vdots & \vdots & & \vdots \\
1 & (x_n - \bar{x}) & (x_n - \bar{x})^2 & \cdots & (x_n - \bar{x})^k
\end{pmatrix}$$
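A minimal sketch of this computation in Python, building the centered design matrix and solving the normal equations (the function name `fit_centered_poly` is illustrative, not from these notes):

```python
import numpy as np

def fit_centered_poly(x, y, k):
    """Least-squares fit of a k-th order polynomial in the centered predictor.

    Builds the design matrix with columns 1, (x - xbar), ..., (x - xbar)^k
    and solves the normal equations (X'X) beta = X'y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                             # centered predictor
    X = np.vander(xc, N=k + 1, increasing=True)   # columns (x - xbar)^0 ... (x - xbar)^k
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X'X)^{-1} X'y, without forming the inverse
    return beta_hat, X
```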
Significance tests for determining the best polynomial model fit:
The higher the powers of X, the more complex the model becomes, hence it is advisable to keep the order as low as possible.
To determine the best fit, we successively compare two nested models: the k-th order model (reduced model) and the (k+1)-th order model (full model). Thus we compare simple linear against quadratic, quadratic against cubic, cubic against fourth order, and so on.

The hypotheses being tested are


H0: the k-th order polynomial model is a better fit than the (k+1)-th order model
vs.
H1: the (k+1)-th order polynomial model is a better fit than the k-th order model

The test statistic is

$$F = \frac{SSE_R - SSE_F}{MSE_F} \sim F(1, n - k - 2)$$

where $SSE_R$ is the error sum of squares for the reduced model and $SSE_F$ is the error sum of squares for the full model.
The best fit is attained when the calculated value of $F$ is less than the critical value $F(1, n-k-2; \alpha)$.
Alternatively, we test the hypotheses

$$H_0: \beta_{k+1} = 0 \quad \text{vs.} \quad H_1: \beta_{k+1} \neq 0$$

If we fail to reject H0, we conclude that the k-th order polynomial is the better fit and hence the best fit.
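Both forms of the test are straightforward to compute once the two models have been fitted. A sketch, assuming SciPy is available for the critical value (the helper name `partial_f_test` is chosen here for illustration):

```python
from scipy import stats

def partial_f_test(sse_reduced, sse_full, n, k, alpha=0.05):
    """Compare the k-th order (reduced) and (k+1)-th order (full) models.

    F = (SSE_R - SSE_F) / MSE_F with (1, n - k - 2) degrees of freedom,
    where MSE_F = SSE_F / (n - k - 2) is the full model's mean square error.
    """
    df_error = n - k - 2                          # error df of the full model
    mse_full = sse_full / df_error
    f_stat = (sse_reduced - sse_full) / mse_full
    f_crit = stats.f.ppf(1 - alpha, 1, df_error)
    return f_stat, f_crit                         # reject H0 if f_stat > f_crit
```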

Example:
(i) Fit a linear, quadratic and cubic regression model to the data below:
y    9.42   4.88   3.36   3.28   1.67   7.35   6.30   4.67   9.33   5.04
x    11     9      5      5      4      10     3      10     11     8

(ii) Determine the model that best fits the data
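The fits reported below can be reproduced with a short script; a sketch using `numpy.linalg.lstsq` on centered x (the tabulated intercept of the simple linear model appears to be on the raw-x scale, so that single value will differ; the remaining estimates and error sums of squares should agree up to rounding):

```python
import numpy as np

y = np.array([9.42, 4.88, 3.36, 3.28, 1.67, 7.35, 6.30, 4.67, 9.33, 5.04])
x = np.array([11, 9, 5, 5, 4, 10, 3, 10, 11, 8], dtype=float)
xc = x - x.mean()                                # center the predictor

for k in (1, 2, 3):                              # linear, quadratic, cubic
    X = np.vander(xc, N=k + 1, increasing=True)  # 1, (x - xbar), ..., (x - xbar)^k
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(np.sum((y - X @ beta_hat) ** 2)) # error sum of squares
    print(f"order {k}: coefficients {beta_hat.round(4)}, SSE = {sse:.4f}")
```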


Simple linear model:
Parameter estimates:

Variable     Estimate   s.e.(estimate)
intercept    1.1006
x            0.5828     0.2139

ANOVA TABLE:

Source of variation   d.f.   Sum of squares   Mean sum of squares   F-value
Regression            1      28.6689          28.6689               7.4268
Error                 8      30.882           3.8602

Quadratic model:
Parameter estimates:

Variable           Estimate   s.e.(estimate)
intercept          3.3649
$(x - \bar{x})$    0.7958     0.1623
$(x - \bar{x})^2$  0.2565     0.0818

ANOVA TABLE:

Source of variation   d.f.   Sum of squares   Mean sum of squares   F-value
Regression            2      46.704           23.352                12.7245
Error                 7      12.846           1.8352
Cubic model:
Parameter estimates:

Variable           Estimate   s.e.(estimate)
intercept          3.4066
$(x - \bar{x})$    0.9031     0.4204
$(x - \bar{x})^2$  0.2437     0.0991
$(x - \bar{x})^3$  -0.0095    0.0339

ANOVA TABLE:

Source of variation   d.f.   Sum of squares   Mean sum of squares   F-value
Regression            3      46.87            15.6233               7.3925
Error                 6      12.68            2.1134

(ii) Comparing linear and quadratic


Using the regression coefficient: $H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$

$$t = \frac{0.2565}{0.0818} = 3.136$$

The t-critical value is $t(7, 0.025) = 2.365$; since $|t| > 2.365$, the quadratic term is significant. The quadratic model is a better fit.
Alternatively, comparing the two models, with linear as the reduced model and quadratic as the full model:

$$F = \frac{30.882 - 12.846}{1.8352} = 9.8278$$

The F-critical value is $F(1, 7, 0.05) = 5.59$; since $F > 5.59$ we reject $H_0$: the full (quadratic) model is a better fit.
Comparing quadratic and cubic
Using the regression coefficient: $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 \neq 0$

$$t = \frac{-0.0095}{0.0339} = -0.28$$

The t-critical value is $t(6, 0.025) = 2.447$; since $|t| < 2.447$, the cubic term is not significant. The quadratic model is a better fit.
Alternatively, comparing the two models, with quadratic as the reduced model and cubic as the full model:

$$F = \frac{12.846 - 12.68}{2.1134} = 0.079$$

The F-critical value is $F(1, 6, 0.05) = 5.99$; since $F < 5.99$ we fail to reject $H_0$: the reduced (quadratic) model is a better fit.
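Both comparisons can be verified numerically with the hypothetical `partial_f_test` helper sketched earlier, plugging in the error sums of squares from the ANOVA tables:

```python
# Linear (reduced) vs quadratic (full): n = 10, k = 1
f_stat, f_crit = partial_f_test(sse_reduced=30.882, sse_full=12.846, n=10, k=1)
print(f_stat, f_crit)  # about 9.83 and 5.59 -> reject H0

# Quadratic (reduced) vs cubic (full): n = 10, k = 2
f_stat, f_crit = partial_f_test(sse_reduced=12.846, sse_full=12.680, n=10, k=2)
print(f_stat, f_crit)  # about 0.079 and 5.99 -> fail to reject H0
```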

Hence we conclude that the quadratic model is the one that best describes the relationship
between y and x.
