
# Just following Chapter 11

11.1 Show that in simple regression analysis, the hat value is $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n}(x_j - \bar{x})^2}$. Hint: for

## Attempt #1: Clean algebra

Cut and paste from the Chapter 9 homework (simple regression):

## Two things to do to this calculation to eventually get to the hat-value result

a. Leave the sum as above, pulling out the fraction (it is a constant) and rewriting it via the following equivalence:

b. The subscripts i and j were confusing: Weisberg uses i and j to suggest we could calculate all hat values (including the off-diagonal entries), but the simpler approach is to use i for the particular hat value we are calculating and j as the index that is summed over.



## Attempt #2: Messy algebra

Simplify according to Weisberg (p. 139) by multiplying through by n/n:

## (Weisberg, p. 9) where SXX is the corrected sum of squares for the $x_i$'s: $SXX = \sum_{i=1}^{n}(x_i - \bar{x})^2$

Weisberg (p. 113): the matrix H is n × n, and even for simple regression the number of entries in H, namely n², may be quite large, and it is rarely computed in full. However, using $H = X(X'X)^{-1}X'$, a formula may be obtained for an individual $h_{ij}$. We find $h_{ij} = x_i'(X'X)^{-1}x_j$.

 

Break it up:


Nuts and bolts: the given equation is for an individual element of the hat matrix, so it is pretty easy to refute or confirm whether it is true. Using some pretend data:

```r
# So, it's a simple regression:
col1 <- c(1, 1, 1, 1, 1)
col2 <- c(14, 14, 9, 5, 7)
X <- cbind(col1, col2)
y <- c(23, 25, 27, 22, 12)
model <- lm(y ~ col2)

# So the hat matrix is:
hat <- X %*% solve(t(X) %*% X) %*% t(X)

# The hat values are the diagonals of that matrix:
hatvalues(model)

# The sum of the hat values is 2 = k + 1:
sum(hatvalues(model))

# Finally, each hat value can be found as the sum of two fractions
```

```r
# Fraction #1: 1/n = 1/5
# Fraction #2:
SSX <- sum((col2 - mean(col2))^2)
(col2 - mean(col2))^2 / SSX

# So add the 2 fractions and get the hat values (diagonals of the hat matrix):
(col2 - mean(col2))^2 / SSX + 0.2

# Now let's try the one-at-a-time formula for hat values:
X
xrow1 <- t(as.matrix(c(1, 14)))
xrow2 <- t(as.matrix(c(1, 14)))
xrow3 <- t(as.matrix(c(1, 9)))
xrow4 <- t(as.matrix(c(1, 5)))
xrow5 <- t(as.matrix(c(1, 7)))
t(X)
XpXinv <- solve(t(X) %*% X)
xrow1 %*% XpXinv %*% t(xrow1)  # matches hat[1, 1]
xrow1 %*% XpXinv %*% t(xrow4)  # matches hat[1, 4]
```

Comment on 11.1: In lecture we are given a similar matrix that is more general, i.e., it is for more than just the dimensions corresponding to simple regression.

vs.

## H is symmetric (H = H′) and idempotent (H² = H)

Symmetric (H = H′):

Idempotent (H² = H):
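Both properties are easy to confirm numerically; a minimal sketch using the same pretend data as in 11.1 (the variable names are just for illustration):

```r
# Build the hat matrix from the pretend simple-regression data
col2 <- c(14, 14, 9, 5, 7)
X <- cbind(1, col2)                   # model matrix with intercept
H <- X %*% solve(t(X) %*% X) %*% t(X)

all.equal(H, t(H))       # symmetric: H = H'
all.equal(H, H %*% H)    # idempotent: H^2 = H
```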

Other important aspects of the hat matrix (Myers, p. 135): 1. tr(H) = p. The trace of the hat matrix equals the number of model parameters. This implies that $\sum_{i=1}^{n} \mathrm{Var}(\hat{y}_i)/\sigma^2 = p$,

a result which indicates that, apart from $\sigma^2$, the prediction variance, summed over the locations of the data points, equals the number of model parameters. The implication of this result may or may not be apparent to the reader at this point. However, it would seem to lend some credibility to the choice of simple models in the exercise of model building. 2. For a model containing a constant term, each individual hat value, given by a particular row of the hat matrix equation, satisfies $1/n \le h_{ii} \le 1$, which implies that:

it suggests that the precision in a prediction at the location of a data point is no worse than the error variance in an observation, i.e., $\mathrm{Var}(\hat{y}_i) \le \sigma^2$. In addition, the precision in prediction can be no better than the precision of the average response if all of the observations were taken at the same location, i.e., $\mathrm{Var}(\hat{y}_i) \ge \sigma^2/n$.

11.3 Show that in a one-way ANOVA with equal numbers of observations in the several groups, all the hat values are equal to each other. By extension, this result implies that the hat values in any balanced ANOVA are equal. Why? Let's look at the given information:

But the design matrix will vary depending on how we parameterize the model. Attempt #1: dummy coding (3 groups with 3 observations per group).

 

```r
column1 <- c(1, 1, 1, 1, 1, 1, 1, 1, 1)
column2 <- c(1, 1, 1, 0, 0, 0, 0, 0, 0)
column3 <- c(0, 0, 0, 1, 1, 1, 0, 0, 0)
X <- cbind(column1, column2, column3)
xtxinv <- solve(t(X) %*% X)
hat <- X %*% xtxinv %*% t(X)
```

The above matrix shows that the hat values are indeed identical within each category, which makes sense: 1. the hat matrix is a function of the X's alone; 2. all of the X's (the dummies) are identical within each group!
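As a cross-check (a sketch, with arbitrary response values), fitting the same balanced one-way layout through lm() with a factor gives the identical hat value, here 1/3, for every observation:

```r
# Balanced one-way ANOVA: 3 groups, 3 observations per group
g <- factor(rep(c("g1", "g2", "g3"), each = 3))
y <- c(5, 6, 7, 2, 3, 4, 9, 8, 7)    # arbitrary response values
fit <- lm(y ~ g)
hatvalues(fit)                        # all nine hat values equal 1/3
```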

11.4 = Conductors. Using Duncan's regression of occupational prestige on the educational and income levels of occupations, verify that the influence vector for the deletion of conductors on the regression coefficients, $d_i = b - b_{(-i)}$, can be written as: $d_i = \frac{(X'X)^{-1} x_i e_i}{1 - h_i}$

Where $x_i$ is the ith row of the model matrix X (i.e., the row for conductors) written as a column. (A much more difficult problem is to show that this formula works in general; see, e.g., Belsley et al. (1980, pp. 69-83) or Velleman and Welsch (1981).)

```r
# Obtain full model
library(car)
col1 <- as.matrix(Duncan$education)
col2 <- as.matrix(Duncan$income)
X <- cbind(1, col1, col2)
tXXinv <- solve(t(X) %*% X)
Y <- as.matrix(Duncan$prestige)
tXY <- t(X) %*% Y
betas <- tXXinv %*% tXY
betas
```

```r
# Now obtain estimates sans conductor
outs <- which(rownames(Duncan) %in% c("conductor"))
model.2 <- lm(prestige ~ education + income, data = Duncan, subset = -outs)
Duncan[outs, ]
```

Difference between betas with and without conductor (confirmation):
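For the conductor row itself, a confirmation sketch in the same style as the minister calculation: compute $d_i$ from the formula and compare it with the brute-force difference in coefficients (assumes the car package and its Duncan data are available):

```r
library(car)  # for the Duncan data

model <- lm(prestige ~ education + income, data = Duncan)
X <- cbind(1, Duncan$education, Duncan$income)
tXXinv <- solve(t(X) %*% X)

i <- which(rownames(Duncan) == "conductor")
xi <- matrix(X[i, ])                   # conductor row written as a column
ei <- residuals(model)[i]              # conductor residual
hi <- hatvalues(model)[i]              # conductor hat value

di <- tXXinv %*% xi * (ei / (1 - hi))  # influence vector from the formula
model.2 <- lm(prestige ~ education + income, data = Duncan, subset = -i)
di
coef(model) - coef(model.2)            # should match di
```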

11.4 = Ministers. Using Duncan's regression of occupational prestige on the educational and income levels of occupations, verify that the influence vector for the deletion of ministers on the regression coefficients, $d_i = b - b_{(-i)}$, can be written as: $d_i = \frac{(X'X)^{-1} x_i e_i}{1 - h_i}$

Where $x_i$ is the ith row of the model matrix X (i.e., the row for ministers) written as a column. (A much more difficult problem is to show that this formula works in general; see, e.g., Belsley et al. (1980, pp. 69-83) or Velleman and Welsch (1981).)

```r
# Obtain full model
library(car)
model <- lm(prestige ~ education + income, data = Duncan)
col1 <- as.matrix(Duncan$education)
col2 <- as.matrix(Duncan$income)
X <- cbind(1, col1, col2)
tXXinv <- solve(t(X) %*% X)
Y <- as.matrix(Duncan$prestige)
tXY <- t(X) %*% Y
betas <- tXXinv %*% tXY

# Now obtain estimates sans minister (row 6 of Duncan)
col1 <- as.matrix(Duncan$education[-6])
col2 <- as.matrix(Duncan$income[-6])
Y <- Duncan$prestige[-6]
Xatheist <- cbind(1, col1, col2)
atheistBetas <- solve(t(Xatheist) %*% Xatheist) %*% t(Xatheist) %*% Y
atheistBetas
```

```r
# Get minister separated out: head(Duncan) gives the minister row
# minister: type=prof, income=21, education=84, prestige=87
MinisterColumn <- as.matrix(c(1, 84, 21))

# Minister residual value
residuals(model)
```

```r
MinisterHat <- 0.17305816
MinisterResidual <- residuals(model)["minister"]

# Difference between betas using the formula
tXXinv %*% MinisterColumn * (MinisterResidual / (1 - MinisterHat))
```

```r
# Difference between betas with and without minister (confirmation)
betas - atheistBetas
```

**Note:** there may have been a better/easier way to separate out minister:

```r
outs <- which(rownames(Duncan) %in% c("minister"))
model.2 <- lm(prestige ~ education + income, data = Duncan, subset = -outs)
# to do table, see lec8
```

(Myers, pp. 459-460) The result in this section serves as the basis for the modern single-data-point diagnostics discussed in Chapter 6, as well as the PRESS statistic presented in Chapter 4. Essentially it offers ease of computation of important regression statistics for the case in which the ith data point is removed or set aside. Here we give the fundamental result in a very general form. In Appendix B, the result is used to explain the development of certain important diagnostic tools. Consider a square nonsingular matrix A, which is p × p, and a p-dimensional column vector z. In our application, A is the X′X matrix. The vector z is the ith row of the X matrix. Thus (A − zz′) becomes the X′X matrix with the ith data point not involved. The theorem states:

$$(A - zz')^{-1} = A^{-1} + \frac{A^{-1} z z' A^{-1}}{1 - z' A^{-1} z}$$

We can prove this result by merely demonstrating that multiplication of the right-hand side by $(A - zz')$ gives the identity matrix:
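A quick numerical check of the theorem with a small, arbitrary nonsingular A and vector z (chosen so that $z'A^{-1}z \ne 1$):

```r
# Verify (A - zz')^{-1} = A^{-1} + A^{-1} z z' A^{-1} / (1 - z' A^{-1} z)
A <- matrix(c(4, 1, 1, 3), nrow = 2)   # arbitrary nonsingular 2 x 2 matrix
z <- c(1, 2)

Ainv <- solve(A)
lhs <- solve(A - z %*% t(z))
rhs <- Ainv + (Ainv %*% z %*% t(z) %*% Ainv) / as.numeric(1 - t(z) %*% Ainv %*% z)
all.equal(lhs, rhs)                     # TRUE
```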