
STAT 786 (Regression Analysis) 09/27/2011 Assignment-2


Chapter 3 (Problem #3) (c) The plot of the residuals (ei) against the fitted values of the Grade Point Average data is given below in Figure 1.

Figure 1 Residual vs. fitted plot of the Grade Point Average data

From the plot above, we see that the residuals do not follow any systematic pattern of being positive or negative. This suggests that the errors have constant variance and that a linear regression function fits the data reasonably well.

(d) A normal probability plot of the residuals has been prepared and is given below in Figure 2. The SAS code for the plot is given in the appendix.

Figure 2 Normal probability plot of the residuals

The coefficient of correlation between the ordered residuals and their expected values under normality is calculated in SAS; from the output (Table 1 below), it is 0.97373. From Table B.6 (page 673 of the textbook), the critical value for alpha = 0.05 and n = 100 is 0.987, and the critical value increases with n. Hence, for n = 120 the observed coefficient of correlation, 0.97373, is less than the critical value, and there is strong evidence that the error terms depart substantially from normality.

Pearson Correlation Coefficients, N = 120 (Prob > |r| under H0: Rho = 0)

            resid              expected
resid       1.00000            0.97373 (<.0001)
expected    0.97373 (<.0001)   1.00000

Table 1
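The correlation test in part (d) is carried out in SAS; the same computation can be sketched in Python to make the steps concrete. The residuals below are simulated stand-ins (the actual 120 GPA residuals are not reproduced here), and the plotting positions (k - 0.375)/(n + 0.25) follow the textbook's formula for expected values under normality.

```python
import numpy as np
from scipy import stats

# Simulated stand-in residuals (the actual 120 GPA-model residuals are in SAS)
rng = np.random.default_rng(0)
resid = rng.normal(0.0, 0.42, size=120)

n = len(resid)
ordered = np.sort(resid)
# Expected values of ordered residuals under normality:
# sqrt(MSE) * z((k - 0.375) / (n + 0.25))
root_mse = np.sqrt(np.sum(resid**2) / (n - 2))
k = np.arange(1, n + 1)
expected = root_mse * stats.norm.ppf((k - 0.375) / (n + 0.25))

r = np.corrcoef(ordered, expected)[0, 1]
print(round(r, 5))
```

With truly normal residuals this correlation typically lands well above the Table B.6 critical value; the observed 0.97373 falling below 0.987 is exactly what signals non-normality.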

(e) In SAS, the Brown-Forsythe test is carried out to determine whether or not the error variance varies with the level of X. The data are divided into two groups, X < 26 and X >= 26, as suggested by the problem, and alpha is chosen to be 0.01. The SAS code is given in the appendix.

Null hypothesis (H0): the error variance is constant.

Alternative hypothesis (H1): the error variance is not constant.

Decision rule: reject H0 if |t*| > t(1 - alpha/2, n - 2) = t(0.995, 118) = 2.617 (using table values of t).

Calculations: from the Excel work, we have

s^2 = 0.1741192, so s = 0.417276
n1 = 55, n2 = 65, so n1 + n2 = 120
d1bar = 0.437961, d2bar = 0.506515

Numerator = d1bar - d2bar = 0.437961 - 0.506515 = -0.068553
Denominator = s * sqrt(1/n1 + 1/n2) = 0.417276 * sqrt(0.033564) = 0.0764476

So, t* = Numerator/Denominator = -0.068553/0.0764476 = -0.8967


Thus, from the above calculations, |t*| = 0.8967.

Conclusion: since |t*| = 0.8967 < 2.617, we fail to reject H0 and conclude that the error variance is constant.

Yes, this conclusion supports my preliminary finding in part (c): in both cases we conclude that the error variance is constant.
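The t* statistic above can be checked with a short Python sketch using only the summary figures quoted from the Excel work (the raw absolute deviations are not reproduced here).

```python
import numpy as np
from scipy import stats

# Summary figures quoted from the Excel work; raw deviations are not reproduced
n1, n2 = 55, 65
d1bar, d2bar = 0.437961, 0.506515   # group means of the absolute deviations
s = 0.417276                         # pooled std. dev. of the absolute deviations

t_star = (d1bar - d2bar) / (s * np.sqrt(1/n1 + 1/n2))
t_crit = stats.t.ppf(0.995, n1 + n2 - 2)   # alpha = 0.01, two-sided
print(round(t_star, 4), round(t_crit, 3))
```

Note that `stats.t.ppf(0.995, 118)` returns about 2.618; the hand calculation uses the tabled value 2.617, which leads to the same decision.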

(f) Two plots, one of the residuals against X2 and the other of the residuals against X3, are produced using SAS. The corresponding code is given in the appendix, and the plots are shown below in Figures 3 and 4.

Figure 3 Plot of Residuals against X2

Figure 4 Plot of Residuals against X3

In Figure 3, the residuals show a systematic positive trend with X2, whereas in Figure 4 there is no trend of the residuals with X3. From this we conclude that the model can be improved by including X2; since the residuals show no trend with X3, X3 is not added to the model.

(g) We have,

Yh(new) = 3.4, b0 = 2.11405, b1 = 0.03883, MSE = 0.38828, Xbar = 24.75, Var(X) = (4.472)^2 = 19.99

Xh(new) = (Yh(new) - b0)/b1 = (3.4 - 2.11405)/0.03883 = 33.1174

sum(Xi - Xbar)^2 = (n - 1)*Var(X) = 119 * 19.99 = 2379.85

s^2{Xh(new)} = (MSE/b1^2) * [1 + 1/n + (Xh(new) - Xbar)^2 / sum(Xi - Xbar)^2]
             = (0.38828/(0.03883)^2) * [1 + 1/120 + (33.1174 - 24.75)^2/2379.85]
             = 267.70

s{Xh(new)} = sqrt(267.70) = 16.361

Now, the required confidence interval is 33.1174 ± 1.98 * 16.361, i.e. (0.7642, 65.47) at the 95% level of confidence.
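As a cross-check, the inverse-prediction computation in part (g) can be reproduced in Python from the quoted summary statistics. The variable names are mine; this sketch uses the exact t quantile, so the printed interval differs slightly from the hand value that rounds t to 1.98.

```python
import numpy as np
from scipy import stats

# Summary statistics quoted in part (g); variable names are mine
b0, b1 = 2.11405, 0.03883
mse = 0.38828
n = 120
xbar = 24.75
ssx = 2379.85            # (n - 1) * Var(X)
y_h_new = 3.4

x_hat = (y_h_new - b0) / b1                       # point estimate of X
s2 = (mse / b1**2) * (1 + 1/n + (x_hat - xbar)**2 / ssx)
s = np.sqrt(s2)
t = stats.t.ppf(0.975, n - 2)                     # exact quantile (~1.980)
lo, hi = x_hat - t * s, x_hat + t * s
print(round(x_hat, 4), round(lo, 3), round(hi, 3))
```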
(h) The 95% Bonferroni joint confidence intervals for β0 and β1 are obtained from the SAS output given below (Table 2).

Parameter Estimates

Variable    DF   Estimate   Standard Error   t Value   Pr > |t|   97.5% Confidence Limits
Intercept   1    2.11405    0.32089          6.59      <.0001     1.38550   2.84260
score       1    0.03883    0.01277          3.04      0.0029     0.00983   0.06783

Table 2

The 95% Bonferroni joint confidence interval for β0 is (1.38550, 2.84260), and for β1 it is (0.00983, 0.06783).
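The Bonferroni limits come from running each of the g = 2 intervals at level 1 - alpha/g, i.e. 97.5% each for a joint 95% level. A Python sketch of that arithmetic, using only the estimates and standard errors quoted from the SAS output:

```python
from scipy import stats

# Estimates and standard errors quoted from Table 2
b0, se0 = 2.11405, 0.32089
b1, se1 = 0.03883, 0.01277
n, g = 120, 2                      # sample size, number of joint intervals

# Bonferroni multiple: each interval at level 1 - alpha/g, alpha = 0.05
B = stats.t.ppf(1 - 0.05 / (2 * g), n - 2)

lo0, hi0 = b0 - B * se0, b0 + B * se0
lo1, hi1 = b1 - B * se1, b1 + B * se1
print(round(lo0, 5), round(hi0, 5))
print(round(lo1, 5), round(hi1, 5))
```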

Problem #2 (a) A simple linear regression model is fitted to the given data set. The least squares estimates b0 and b1 (from Table 3 below) are found to be -0.00190 and 0.01396, respectively. Hence the fitted regression model is ŷ = -0.00190 + 0.01396x.

Parameter Estimates

Variable    DF   Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    -0.00190   0.00451          -0.42     0.6758
x           1    0.01396    0.00184          7.58      <.0001

Table 3
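The least squares estimates can be verified with the standard normal-equation formulas. This Python sketch uses made-up (x, y) pairs of the same general shape, since the full data set appears only in abbreviated form in Appendix B.

```python
import numpy as np

# Made-up (x, y) pairs of the same general shape as the Problem 2 data
x = np.array([0.0, 0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 4.0])
e = np.array([0.003, -0.002, 0.001, -0.001, 0.002, -0.003, 0.001, -0.001])
y = 0.014 * x - 0.002 + e        # true line plus small errors

# Normal-equation formulas for simple linear regression
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0 = y.mean() - b1 * x.mean()
print(round(b0, 5), round(b1, 5))
```

The recovered b0 and b1 land close to the true line used to generate the data, mirroring how SAS recovers the Table 3 estimates from the actual data.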

To perform a residual analysis, a plot of the residuals against the fitted values (Figure 5) and a normal probability plot (Figure 6) are prepared and are given below.

Figure 5 Plot of residuals against fitted values

Figure 6 Normal probability plot

From the plot of residuals against the fitted values (Figure 5), we see that ... Again, from the normal probability plot (Figure 6), it is seen that ... Thus, from both plots (Figures 5 and 6), we come to the same conclusion that ...

(b)
The TRANSREG Procedure
Box-Cox Transformation Information for y

Lambda    R-Square   Log Like
-2.0      0.74       220.0697
-0.8      0.89       267.1507
-0.7      0.90       268.9500 *
-0.6      0.90       269.9332 *
-0.5 +    0.90       270.0027 <
-0.4      0.89       269.1157 *
 1.9      0.29       135.3258
 2.0      0.27       127.6604

< - Best Lambda   * - 95% Confidence Interval   + - Convenient Lambda

Table 4

Here, a Box-Cox transformation is carried out to determine a suitable power transformation. The table given above (Table 4) suggests that the best as well as the convenient value of lambda is -0.5. This means that the suitable power transformation is

y* = 1/sqrt(y)    (1)
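PROC TRANSREG profiles the Box-Cox log-likelihood over a lambda grid; the same idea can be sketched in Python by minimizing the SSE of a straight-line fit to the standardized transform w = (y^lambda - 1)/(lambda * gm^(lambda - 1)), where gm is the geometric mean of y. The data below are synthetic, generated so the true transformation is lambda = -0.5.

```python
import numpy as np

# Synthetic data with true transformation lambda = -0.5:
# y = (2 + 1.5 x)^(-2) up to small multiplicative noise, so 1/sqrt(y) is linear in x
rng = np.random.default_rng(1)
x = np.linspace(0.5, 4.0, 40)
y = (2.0 + 1.5 * x) ** -2 * np.exp(rng.normal(0.0, 0.05, size=x.size))

def sse_for_lambda(lam):
    """SSE of a straight-line fit to the standardized Box-Cox transform of y."""
    gm = np.exp(np.mean(np.log(y)))                 # geometric mean of y
    if abs(lam) < 1e-8:
        w = gm * np.log(y)                          # limiting case lambda = 0
    else:
        w = (y**lam - 1.0) / (lam * gm**(lam - 1.0))
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, w, rcond=None)
    return float(np.sum((w - X @ beta) ** 2))

grid = np.arange(-2.0, 2.01, 0.1)
best = min(grid, key=sse_for_lambda)
print(round(float(best), 1))
```

Minimizing the standardized SSE is equivalent to maximizing the Box-Cox log-likelihood that TRANSREG reports, so the grid search lands near lambda = -0.5 for data of this form.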

(c) Using the Box-Cox transformation, an estimated linear regression function for the transformed data is obtained. The estimates of the intercept and slope are 11.93274 and -1.93867, respectively (from Table 5 below). Hence the estimated regression function for the transformed data is

ŷ* = 11.93274 - 1.93867X    (2)

Parameter Estimates

Variable    DF   Estimate    Standard Error   t Value   Pr > |t|
Intercept   1    11.93274    0.23040          51.79     <.0001
x           1    -1.93867    0.09406          -20.61    <.0001

Table 5

The estimated regression line and the transformed data are plotted in Figure 7 below.

Figure 7 Estimated regression line and the transformed data

Once the data are transformed, the regression line appears to fit the transformed data well.

(e) To perform the residual analysis, a residual plot and a normal probability plot are prepared, which are given below in Figures 8 and 9.

Figure 8 Plot of residuals against fitted values for transformed data

Figure 9 Normal probability plot of residuals for transformed data

From these plots, we conclude that ..

(f) Here, our suitable power transformation is

y* = 1/sqrt(y)    (3)

and the fitted regression equation on the transformed scale is

y* = b0 + b1*x    (4)

From (3) and (4), we have

1/sqrt(y) = b0 + b1*x
sqrt(y) = 1/(b0 + b1*x)
y = 1/(b0 + b1*x)^2 = 1/(11.933 - 1.9387*x)^2

which is our estimated regression function.
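Back-transforming fitted values through equation (3) can be wrapped in a small helper. This Python sketch uses the Table 5 estimates; note the back-transform is only meaningful while b0 + b1*x stays positive.

```python
import numpy as np

# Fitted line on the transformed scale, estimates from Table 5
b0, b1 = 11.93274, -1.93867

def predict_y(x):
    """Invert y* = 1/sqrt(y): y = 1/(b0 + b1*x)^2, valid while b0 + b1*x > 0."""
    ystar = b0 + b1 * np.asarray(x, dtype=float)
    return 1.0 / ystar**2

print(np.round(predict_y([0.0, 2.0, 4.0]), 5))
```

Since b1 is negative, y* decreases in x and the back-transformed y increases in x, consistent with the shape of the original data.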

Appendix A: Problem 3.3

data GPA;
  input GPA score;
  cards;
3.897 21
3.885 14
1.860 16
2.948 28
;
run;

proc sort data=GPA out=GPA1;
  by score;
run;

SAS code for (c), (d) and (g):

proc reg;
  model GPA=score;
  plot r.*p.;
  plot r.*npp.;
  output out=regout r=resid p=pred;
run;

proc reg data=GPA alpha=0.025;
  model GPA=score/clb;
run;

SAS code for (e):

data GPA2;
  set GPA1;
  if score < 26 then group = 1;
  else group = 2;
run;

proc reg data=GPA2 alpha=0.01;
  model GPA=score;
  output out=Res_data r=residuals;
run;

proc print data=Res_data;
run;

proc anova data=Res_data;
  class group;
  model residuals = group;
  means group / hovtest=BF;
run;

SAS code for (f):

data GPA11;
  input GPA score x2 x3;
  cards;
3.897 21 122 99
3.885 14 132 71
3.778 28 119 95
..
1.860 16 111 65
2.948 28 110 85
;
run;

proc print data=GPA11;
run;

proc reg data=GPA11;
  model GPA=score x2 x3;
  output out=Fit_Res_data p=predicted r=residuals;
  plot r.*x2;
  plot r.*x3;
run;

Appendix B: SAS code for Problem 2

data assig2;
  input x y;
  datalines;
0 0.0088
0 0.0069
...
4 0.0475
4 0.1375
4 0.1041
;
run;

proc print data=assig2;
run;

proc reg data=assig2;
  model y=x;
  *plot GPA*test_score;
  output out=Fit_Res_data p=predicted r=residuals;
  plot r.*p.;
  plot r.*npp.;
run;

proc print data=Fit_Res_data;
run;

proc transreg data=assig2;
  * model boxcox(y / lambda = -1 -0.9 -0.7 -0.5 -0.3 -0.1 0 0.1 0.3 0.5 0.7) = identity(x);
  model boxcox(y / lambda = -2 to 2 by 0.1 convenient) = identity(x);
run;

/* SAS code for (c) and (d) */

data two;
  input x y;
  yt = 1/sqrt(y);
  datalines;
0 0.0088
0 0.0069
0 0.0084
0 0.0061
...
4 0.0475
4 0.1375
4 0.1041
;
run;

proc print data=two;
run;

proc reg data=two;
  model yt=x;
  plot yt*x;
  output out=Fit_Res_data p=predicted r=residuals;
  plot r.*p.;
  plot r.*npp.;
run;
