Analysis of Variance and Design of Experiments - I

MODULE - I
LECTURE - 1

SOME RESULTS ON LINEAR ALGEBRA, MATRIX THEORY AND DISTRIBUTIONS

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur
We need some basic knowledge to understand the topics in analysis of variance.

Vectors
A vector Y is an ordered n-tuple of real numbers. A vector can be expressed as a row vector or a column vector:

Y = (y1, y2, ..., yn)' is a column vector of order n x 1, and
Y' = (y1, y2, ..., yn) is a row vector of order 1 x n.

If yi = 0 for all i = 1, 2, ..., n, then Y' = (0, 0, ..., 0) is called the null vector.
If

X = (x1, x2, ..., xn)',  Y = (y1, y2, ..., yn)',  Z = (z1, z2, ..., zn)'

then

X + Y = (x1 + y1, x2 + y2, ..., xn + yn)',  kY = (ky1, ky2, ..., kyn)'

and the following properties hold:

X + (Y + Z) = (X + Y) + Z
X'(Y + Z) = X'Y + X'Z
k(X'Y) = (kX)'Y = X'(kY)
k(X + Y) = kX + kY
X'Y = x1 y1 + x2 y2 + ... + xn yn

where k is a scalar.

Orthogonal vectors
Two vectors X and Y are said to be orthogonal if X'Y = Y'X = 0.
The null vector is orthogonal to every vector X and is the only such vector.

Linear combination
If x1, x2, ..., xm are m vectors of the same order and k1, k2, ..., km are scalars, then

t = k1 x1 + k2 x2 + ... + km xm

is called a linear combination of x1, x2, ..., xm.
Linear independence

If x1, x2, ..., xm are m vectors, they are said to be linearly independent if, for scalars k1, k2, ..., km,

k1 x1 + k2 x2 + ... + km xm = 0  implies  ki = 0 for all i = 1, 2, ..., m.

If there exist k1, k2, ..., km with at least one ki nonzero such that k1 x1 + k2 x2 + ... + km xm = 0, then
x1, x2, ..., xm are said to be linearly dependent.

Any set of vectors containing the null vector is linearly dependent.

Any set of non-null pairwise orthogonal vectors is linearly independent.

If m > 1 vectors are linearly dependent, it is always possible to express at least one of them as a linear
combination of the others.
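These facts are easy to check numerically. The following short sketch is not part of the original notes; it assumes NumPy is available and uses the matrix rank as a test of linear independence.

```python
import numpy as np

# Three vectors of order 3 x 1, stacked as the columns of a matrix.
x1 = np.array([1.0, 1.0, 0.0])
x2 = np.array([1.0, -1.0, 0.0])
x3 = np.array([0.0, 0.0, 2.0])
V = np.column_stack([x1, x2, x3])

# Pairwise orthogonality: xi'xj = 0 for i != j.
print(np.isclose(x1 @ x2, 0), np.isclose(x1 @ x3, 0), np.isclose(x2 @ x3, 0))

# Non-null pairwise orthogonal vectors are linearly independent:
# the rank of [x1 x2 x3] equals the number of vectors.
print(np.linalg.matrix_rank(V) == 3)

# A dependent set: x1, x2 and x1 + x2 has rank 2 < 3.
W = np.column_stack([x1, x2, x1 + x2])
print(np.linalg.matrix_rank(W))   # 2
```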

Linear function
Let K = (k1, k2, ..., km)' be an m x 1 vector of scalars and X = (x1, x2, ..., xm)' be an m x 1 vector of variables. Then

K'X = k1 x1 + k2 x2 + ... + km xm

is called a linear function or linear form. The vector K is called the coefficient vector.

For example, the mean of x1, x2, ..., xm can be expressed as

xbar = (1/m) Σ_{i=1}^{m} xi = (1/m)(1, 1, ..., 1)(x1, x2, ..., xm)' = (1/m) 1m' X

where 1m is an m x 1 vector with all elements unity.

Contrast
The linear function K'X = k1 x1 + k2 x2 + ... + km xm is called a contrast in x1, x2, ..., xm if
k1 + k2 + ... + km = 0.

For example, the linear functions

x1 - x2,  2x1 - 3x2 + x3

are contrasts.

A linear function K'X is a contrast if and only if it is orthogonal to the linear function x1 + x2 + ... + xm or,
equivalently, to the linear function xbar = (1/m)(x1 + x2 + ... + xm).

The contrasts x1 - x2, x1 - x3, ..., x1 - xj are linearly independent for all j = 2, 3, ..., m.

Every contrast in x1, x2, ..., xm can be written as a linear combination of the (m - 1) contrasts
x1 - x2, x1 - x3, ..., x1 - xm.
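As a small numerical illustration (not from the original notes; it assumes NumPy), the coefficient vectors of the contrasts above sum to zero, which is exactly orthogonality to the coefficient vector of the mean.

```python
import numpy as np

m = 3
ones = np.ones(m)

# Coefficient vectors of the contrasts x1 - x2 and 2x1 - 3x2 + x3.
k1 = np.array([1.0, -1.0, 0.0])
k2 = np.array([2.0, -3.0, 1.0])

for k in (k1, k2):
    # A contrast has coefficients summing to zero ...
    print(np.isclose(k.sum(), 0.0))
    # ... equivalently, it is orthogonal to x1 + x2 + ... + xm
    # (the coefficient vector of the mean up to the factor 1/m).
    print(np.isclose(k @ ones, 0.0))
```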

Matrix
A matrix is a rectangular array of real numbers. For example,

A = [ a11  a12 ... a1n
      a21  a22 ... a2n
      ...
      am1  am2 ... amn ]

is a matrix of order m x n with m rows and n columns.

If m = n, then A is called a square matrix.
If aij = 0 for all i ≠ j and m = n, then A is a diagonal matrix and is denoted as A = diag(a11, a22, ..., ann).
If m = n (square matrix) and aij = 0 for i > j, then A is called an upper triangular matrix. On the other hand, if
m = n and aij = 0 for i < j, then A is called a lower triangular matrix.
If A is an m x n matrix, then the matrix obtained by writing the rows of A as columns and the columns of A as rows
is called the transpose of A and is denoted as A'.
If A = A', then A is a symmetric matrix.
If A = -A', then A is a skew-symmetric matrix.
A matrix whose elements are all equal to zero is called the null matrix.
An identity matrix is a square matrix of order p whose diagonal elements are unity (ones) and all off-diagonal
elements are zero. It is denoted as Ip.
If A and B are matrices of order m x n, then (A + B)' = A' + B'.

If A and B are matrices of order m x n and n x p respectively and k is any scalar, then

(AB)' = B'A'
(kA)B = A(kB) = k(AB) = kAB.

If A is m x n and B and C are both n x p, then A(B + C) = AB + AC.

If A is m x n, B is n x p and C is p x q, then (AB)C = A(BC).

If A is a matrix of order m x n, then Im A = A In = A.

Trace of a matrix
The trace of an n x n matrix A, denoted as tr(A) or trace(A), is defined to be the sum of all the diagonal elements
of A, i.e., tr(A) = a11 + a22 + ... + ann.

If A is of order m x n and B is of order n x m, then tr(AB) = tr(BA).

If A is an n x n matrix and P is any nonsingular n x n matrix, then tr(A) = tr(P^(-1) A P).

If P is an orthogonal matrix, then tr(A) = tr(P' A P).

If A and B are n x n matrices and a and b are scalars, then tr(aA + bB) = a tr(A) + b tr(B).

If A is an m x n matrix, then tr(A'A) = tr(AA') = Σ_i Σ_j aij^2, and tr(A'A) = tr(AA') = 0 if and only if A = 0.

If A is an n x n matrix, then tr(A') = tr(A).

Rank of a matrix
The rank of an m x n matrix A is the number of linearly independent rows in A.
Let B be another matrix of order n x q.

A square matrix of order m is called non-singular if it has full rank.

rank(AB) <= min(rank(A), rank(B)).
rank(A + B) <= rank(A) + rank(B).
The rank of A is equal to the maximum order among all nonsingular square sub-matrices of A.
rank(AA') = rank(A'A) = rank(A) = rank(A').
A is of full row rank if rank(A) = m < n.
A is of full column rank if rank(A) = n < m.

Inverse of a matrix
The inverse of a square matrix A of order m is a square matrix of order m, denoted as A^(-1), such that
A^(-1) A = A A^(-1) = Im.

The inverse of A exists if and only if A is non-singular.

(A^(-1))^(-1) = A.

If A is non-singular, then (A')^(-1) = (A^(-1))'.

If A and B are non-singular matrices of the same order, then their product, if defined, is also non-singular and
(AB)^(-1) = B^(-1) A^(-1).

Idempotent matrix
A square matrix A is called idempotent if A^2 = AA = A.
If A is an n x n idempotent matrix with rank(A) = r <= n, then
the eigenvalues of A are 1 or 0,
trace(A) = rank(A) = r, and
if A is of full rank n, then A = In.
If A and B are idempotent and AB = BA, then AB is also idempotent.
If A is idempotent, then (I - A) is also idempotent and A(I - A) = (I - A)A = 0.
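A familiar idempotent matrix is the projection matrix of a full column rank matrix X. The short sketch below is an added illustration (assuming NumPy); it checks A^2 = A, that the eigenvalues are 0 or 1, and that trace(A) = rank(A).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                 # any 6 x 2 matrix of full column rank

# Projection matrix A = X (X'X)^(-1) X' is idempotent and symmetric.
A = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.allclose(A @ A, A))                # A^2 = A
eig = np.linalg.eigvalsh(A)
print(np.all(np.isclose(eig, 0) | np.isclose(eig, 1)))   # eigenvalues are 0 or 1
print(np.isclose(np.trace(A), 2), np.linalg.matrix_rank(A) == 2)   # trace = rank = 2

# I - A is also idempotent and A(I - A) = 0.
I = np.eye(6)
print(np.allclose((I - A) @ (I - A), I - A), np.allclose(A @ (I - A), 0))
```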

Analysis of Variance and Design of Experiments - I

MODULE - I
LECTURE - 2

SOME RESULTS ON LINEAR ALGEBRA, MATRIX THEORY AND DISTRIBUTIONS

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Quadratic forms
If A is a given matrix of order m x n and X and Y are two given vectors of order m x 1 and n x 1 respectively, then
the form

X'AY = Σ_{i=1}^{m} Σ_{j=1}^{n} aij xi yj

is defined, where the aij's are the nonstochastic elements of A.

If A is a square matrix of order m and X = Y, then

X'AX = a11 x1^2 + ... + amm xm^2 + (a12 + a21) x1 x2 + ... + (a_{m-1,m} + a_{m,m-1}) x_{m-1} xm.

If A is also symmetric, then

X'AX = a11 x1^2 + ... + amm xm^2 + 2 a12 x1 x2 + ... + 2 a_{m-1,m} x_{m-1} xm = Σ_i Σ_j aij xi xj

is called a quadratic form in the m variables x1, x2, ..., xm, or a quadratic form in X.

To every quadratic form corresponds a symmetric matrix and vice versa.
The matrix A is called the matrix of the quadratic form.

The quadratic form X'AX and the matrix A of the form are called
positive definite if X'AX > 0 for all X ≠ 0,
positive semi-definite if X'AX >= 0 for all X ≠ 0,
negative definite if X'AX < 0 for all X ≠ 0,
negative semi-definite if X'AX <= 0 for all X ≠ 0.

If A is a positive semi-definite matrix, then aii >= 0, and if aii = 0 then aij = 0 for all j and aji = 0 for all j.
If P is any nonsingular matrix and A is any positive definite (or positive semi-definite) matrix, then P'AP is also
positive definite (or positive semi-definite).
A matrix A is positive definite if and only if there exists a non-singular matrix P such that A = P'P.
A positive definite matrix is nonsingular.
If A is an m x n matrix and rank(A) = m < n, then AA' is positive definite and A'A is positive semi-definite.
If A is an m x n matrix and rank(A) = k < m < n, then both A'A and AA' are positive semi-definite.

Simultaneous linear equations

The set of m linear equations in n unknowns x1, x2, ..., xn with scalars aij and bi, i = 1, 2, ..., m, j = 1, 2, ..., n,
of the form

a11 x1 + a12 x2 + ... + a1n xn = b1
a21 x1 + a22 x2 + ... + a2n xn = b2
...
am1 x1 + am2 x2 + ... + amn xn = bm

can be formulated as AX = b, where

A = ((aij)) is an m x n real matrix of known scalars, called the coefficient matrix,
X = (x1, x2, ..., xn)' is an n x 1 vector of variables, and
b = (b1, b2, ..., bm)' is an m x 1 real vector of known scalars.
If A is an n x n nonsingular matrix, then AX = b has a unique solution.
Let B = [A, b] be the augmented matrix. A solution to AX = b exists if and only if rank(A) = rank(B).
If A is an m x n matrix of rank m, then AX = b has a solution.
The linear homogeneous system AX = 0 has a solution other than X = 0 if and only if rank(A) < n.
If AX = b is consistent, then AX = b has a unique solution if and only if rank(A) = n.

Orthogonal matrix
A square matrix A is called an orthogonal matrix if A'A = AA' = I or, equivalently, if A^(-1) = A'.
An orthogonal matrix is non-singular.
If A is orthogonal, then A' is also orthogonal.
If A is an n x n matrix and P is an n x n orthogonal matrix, then the determinants of A and P'AP are the same.
If aii is the ith diagonal element of an orthogonal matrix, then -1 <= aii <= 1.
Let the n x n matrix A be partitioned as A = [a1, a2, ..., an], where ai is the n x 1 vector of the elements of the ith
column of A. A necessary and sufficient condition for A to be an orthogonal matrix is:
(i) ai'ai = 1 for i = 1, 2, ..., n
(ii) ai'aj = 0 for i ≠ j, i, j = 1, 2, ..., n.

Random vectors
Let Y1, Y2, ..., Yn be n random variables; then Y = (Y1, Y2, ..., Yn)' is called a random vector.
The mean vector of Y is

E(Y) = (E(Y1), E(Y2), ..., E(Yn))'.

The covariance matrix or dispersion matrix of Y is

Var(Y) = [ Var(Y1)      Cov(Y1, Y2)  ...  Cov(Y1, Yn)
           Cov(Y2, Y1)  Var(Y2)      ...  Cov(Y2, Yn)
           ...
           Cov(Yn, Y1)  Cov(Yn, Y2)  ...  Var(Yn)    ]

which is a symmetric matrix.

If Y1, Y2, ..., Yn are pairwise uncorrelated, then the covariance matrix is a diagonal matrix.
If Var(Yi) = σ^2 for all i = 1, 2, ..., n, then Var(Y) = σ^2 In.

Linear function of random variables

If Y1, Y2, ..., Yn are n random variables and k1, k2, ..., kn are scalars, then k1 Y1 + k2 Y2 + ... + kn Yn is called a
linear function of the random variables Y1, Y2, ..., Yn.

If Y = (Y1, Y2, ..., Yn)' and K = (k1, k2, ..., kn)', then K'Y = Σ_{i=1}^{n} ki Yi,

the mean of K'Y is E(K'Y) = K'E(Y) = Σ_{i=1}^{n} ki E(Yi), and

the variance of K'Y is Var(K'Y) = K' Var(Y) K.
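The two moment formulas translate directly into matrix code. The sketch below is an added illustration (assuming NumPy; the mean vector, covariance matrix and coefficients are made-up values) computing E(K'Y) and Var(K'Y).

```python
import numpy as np

# Mean vector and covariance (dispersion) matrix of Y = (Y1, Y2, Y3)'.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
K = np.array([1.0, -2.0, 1.0])     # coefficients of the linear function K'Y

mean_KY = K @ mu                   # E(K'Y) = K' E(Y)
var_KY = K @ Sigma @ K             # Var(K'Y) = K' Var(Y) K
print(mean_KY, var_KY)
```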

Multivariate normal distribution
A random vector Y = (Y1, Y2, ..., Yn)' has a multivariate normal distribution with mean vector μ = (μ1, μ2, ..., μn)'
and dispersion matrix Σ if its probability density function is

f(Y | μ, Σ) = (2π)^(-n/2) |Σ|^(-1/2) exp[ -(1/2)(Y - μ)' Σ^(-1) (Y - μ) ]

assuming Σ is a nonsingular matrix.

Chi-square distribution
If Y1, Y2, ..., Yk are identically and independently distributed random variables following the normal distribution
with common mean 0 and common variance 1, then the distribution of Σ_{i=1}^{k} Yi^2 is called the χ^2-distribution
with k degrees of freedom.

The probability density function of the χ^2-distribution with k degrees of freedom is given as

f_{χ^2}(x) = 1/[Γ(k/2) 2^(k/2)] x^(k/2 - 1) exp(-x/2);  0 < x < ∞.

If Y1, Y2, ..., Yk are independently distributed following the normal distribution with common mean 0 and common
variance σ^2, then Σ_{i=1}^{k} Yi^2 / σ^2 has the χ^2-distribution with k degrees of freedom.

If the random variables Y1, Y2, ..., Yk are normally distributed with non-null means μ1, μ2, ..., μk but common
variance 1, then the distribution of Σ_{i=1}^{k} Yi^2 has the non-central χ^2-distribution with k degrees of freedom
and non-centrality parameter λ = Σ_{i=1}^{k} μi^2.

If Y1, Y2, ..., Yk are independently distributed following the normal distribution with means μ1, μ2, ..., μk but
common variance σ^2, then Σ_{i=1}^{k} Yi^2 / σ^2 has the non-central χ^2-distribution with k degrees of freedom and
non-centrality parameter λ = Σ_{i=1}^{k} μi^2 / σ^2.
If U has a Chi-square distribution with k degrees of freedom, then E(U) = k and Var(U) = 2k.

If U has a noncentral Chi-square distribution with k degrees of freedom and noncentrality parameter λ, then
E(U) = k + λ and Var(U) = 2k + 4λ.

If U1, U2, ..., Uk are independently distributed random variables with each Ui having a noncentral Chi-square
distribution with ni degrees of freedom and noncentrality parameter λi, i = 1, 2, ..., k, then Σ_{i=1}^{k} Ui has a
noncentral Chi-square distribution with Σ_{i=1}^{k} ni degrees of freedom and noncentrality parameter
Σ_{i=1}^{k} λi.

Let X = (X1, X2, ..., Xn)' have a multivariate normal distribution with mean vector μ and positive definite covariance
matrix Σ. Then X'AX is distributed as noncentral χ^2 with k degrees of freedom (and noncentrality parameter μ'Aμ) if
and only if AΣ is an idempotent matrix of rank k.

Let X = (X1, X2, ..., Xn)' have a multivariate normal distribution with mean vector μ and positive definite covariance
matrix Σ. Let the two quadratic forms be such that

X'A1X is distributed as χ^2 with n1 degrees of freedom and noncentrality parameter μ'A1μ, and
X'A2X is distributed as χ^2 with n2 degrees of freedom and noncentrality parameter μ'A2μ.

Then X'A1X and X'A2X are independently distributed if A1 Σ A2 = 0.

t-distribution
If
X has a normal distribution with mean 0 and variance 1,
Y has a χ^2 distribution with n degrees of freedom, and
X and Y are independent random variables,
then the distribution of the statistic T = X / sqrt(Y/n) is called the t-distribution with n degrees of freedom.

The probability density function of T is

f_T(t) = Γ((n+1)/2) / [sqrt(nπ) Γ(n/2)] (1 + t^2/n)^(-(n+1)/2);  -∞ < t < ∞.

If the mean of X is nonzero, say μ, then the distribution of X / sqrt(Y/n) is called the noncentral t-distribution
with n degrees of freedom and noncentrality parameter μ.

F-distribution
If X and Y are independent random variables with χ^2-distributions with m and n degrees of freedom respectively,
then the distribution of the statistic F = (X/m)/(Y/n) is called the F-distribution with m and n degrees of freedom.
The probability density function of F is

f_F(f) = Γ((m+n)/2) / [Γ(m/2) Γ(n/2)] (m/n)^(m/2) f^(m/2 - 1) [1 + (m/n) f]^(-(m+n)/2);  0 < f < ∞.

If X has a noncentral Chi-square distribution with m degrees of freedom and noncentrality parameter λ, Y has a χ^2
distribution with n degrees of freedom, and X and Y are independent random variables, then the distribution of
F = (X/m)/(Y/n) is the noncentral F-distribution with m and n degrees of freedom and noncentrality parameter λ.
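A quick Monte Carlo sanity check of this construction (added here, not part of the original notes; it assumes NumPy and SciPy are available): the ratio of two independent chi-square variables, each divided by its degrees of freedom, matches the F(m, n) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n, N = 4, 10, 200_000

# Independent chi-square variables with m and n degrees of freedom.
X = rng.chisquare(m, size=N)
Y = rng.chisquare(n, size=N)
F = (X / m) / (Y / n)

# Compare the simulated upper 5% quantile with the F(m, n) critical value.
print(np.quantile(F, 0.95))        # simulated quantile
print(stats.f.ppf(0.95, m, n))     # exact critical value F_{0.95}(m, n)
```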

Analysis of Variance and Design of Experiments - I

MODULE - I
LECTURE - 3

SOME RESULTS ON LINEAR ALGEBRA, MATRIX THEORY AND DISTRIBUTIONS

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Linear model
Suppose there are n observations. In the linear model, we assume that these observations are the values taken by n
random variables Y1, Y2, ..., Yn satisfying the following conditions:

E(Yi) is a linear combination of p unknown parameters β1, β2, ..., βp with

E(Yi) = xi1 β1 + xi2 β2 + ... + xip βp,  i = 1, 2, ..., n,

where the xij's are known constants.

Y1, Y2, ..., Yn are uncorrelated and normally distributed with variance Var(Yi) = σ^2.

The linear model can be rewritten by introducing independent normal random variables εi following N(0, σ^2), as

Yi = xi1 β1 + xi2 β2 + ... + xip βp + εi,  i = 1, 2, ..., n.

These equations can be written using matrix notation as

Y = Xβ + ε

where Y is an n x 1 vector of observations, X is an n x p matrix of n observations (xij's) on each of the variables
X1, X2, ..., Xp, β is a p x 1 vector of parameters and ε is an n x 1 vector of random error components with
ε ~ N(0, σ^2 In). Here Y is called the study or dependent variable, X1, X2, ..., Xp are called explanatory or
independent variables and β1, β2, ..., βp are called the regression coefficients.

Alternatively, since Y ~ N(Xβ, σ^2 I), the linear model can also be expressed in the expectation form as a normal
random variable Y with

E(Y) = Xβ
Var(Y) = σ^2 I.

Note that β and σ^2 are unknown but X is known.

Estimable function
A linear parametric function λ'β of the parameter β is said to be an estimable parametric function (or estimable) if
there exists a linear function Λ'Y of the random variables Y = (Y1, Y2, ..., Yn)' such that

E(Λ'Y) = λ'β

with Λ = (Λ1, Λ2, ..., Λn)' and λ = (λ1, λ2, ..., λp)' being vectors of known scalars.

Best Linear Unbiased Estimates (BLUE)

The unbiased minimum variance linear estimate Λ'Y of an estimable function λ'β is called the best linear unbiased
estimate of λ'β.

Suppose Λ1'Y and Λ2'Y are the BLUEs of λ1'β and λ2'β respectively.
Then (a1 Λ1 + a2 Λ2)'Y is the BLUE of (a1 λ1 + a2 λ2)'β.

If λ'β is estimable, its best estimate is λ'β̂, where β̂ is any solution of the normal equations X'Xβ̂ = X'Y.

Least squares estimation

The least squares estimate of β in Y = Xβ + ε is the value β̂ of β which minimizes the error sum of squares ε'ε.
Let

S = ε'ε = (Y - Xβ)'(Y - Xβ)
  = Y'Y - 2β'X'Y + β'X'Xβ.

Minimizing S with respect to β involves

∂S/∂β = 0  =>  X'Xβ = X'Y

which is termed the normal equation. This normal equation has a unique solution given by

β̂ = (X'X)^(-1) X'Y

assuming rank(X) = p. Note that ∂^2 S/∂β∂β' = 2X'X is a positive definite matrix, so β̂ = (X'X)^(-1) X'Y is the value
of β which minimizes ε'ε and is termed the ordinary least squares estimator of β.

In this case β1, β2, ..., βp are estimable and consequently all linear parametric functions are estimable.

E(β̂) = (X'X)^(-1) X' E(Y) = (X'X)^(-1) X'Xβ = β
Var(β̂) = (X'X)^(-1) X' Var(Y) X (X'X)^(-1) = σ^2 (X'X)^(-1)

If λ'β̂ and μ'β̂ are the estimates of λ'β and μ'β respectively, then

Var(λ'β̂) = λ' Var(β̂) λ = σ^2 [λ'(X'X)^(-1) λ]
Cov(λ'β̂, μ'β̂) = σ^2 [λ'(X'X)^(-1) μ].

Y - Xβ̂ is called the residual vector, and E(Y - Xβ̂) = 0.
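These estimators are straightforward to compute. The sketch below is an added illustration (assuming NumPy; the data are simulated, not from the notes) obtaining β̂ by solving the normal equations and reporting its estimated covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3
X = rng.normal(size=(n, p))                # design matrix of full column rank
beta = np.array([1.0, -2.0, 0.5])          # "true" parameters used only for simulation
sigma = 1.5
y = X @ beta + rng.normal(scale=sigma, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y               # ordinary least squares estimator
resid = y - X @ beta_hat                   # residual vector
sigma2_hat = resid @ resid / (n - p)       # divides by n - p (unbiased estimator, see later)
cov_beta_hat = sigma2_hat * XtX_inv        # estimated Var(beta_hat) = sigma^2 (X'X)^(-1)

print(beta_hat, sigma2_hat)
```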

Linear model with correlated observations

In the linear model

Y = Xβ + ε

with E(ε) = 0, Var(ε) = Σ and ε normally distributed, we find

E(Y) = Xβ,  Var(Y) = Σ.

Assuming Σ to be positive definite, we can write

Σ = P'P

where P is a nonsingular matrix. Premultiplying Y = Xβ + ε by P, we get

PY = PXβ + Pε

or

Y* = X*β + ε*

where Y* = PY, X* = PX and ε* = Pε.

Note that β and Σ are unknown but X is known.

Distribution of Λ'Y

In the linear model Y = Xβ + ε, ε ~ N(0, σ^2 I), consider a linear function Λ'Y, which is normally distributed with

E(Λ'Y) = Λ'Xβ,
Var(Λ'Y) = σ^2 (Λ'Λ).

Then

Λ'Y / [σ sqrt(Λ'Λ)] ~ N( Λ'Xβ / [σ sqrt(Λ'Λ)], 1 ).

Further, (Λ'Y)^2 / (σ^2 Λ'Λ) has a noncentral Chi-square distribution with one degree of freedom and noncentrality
parameter (Λ'Xβ)^2 / (σ^2 Λ'Λ).

Degrees of freedom
A linear function Λ'Y of the observations (Λ ≠ 0) is said to carry one degree of freedom. A set of r linear functions
L'Y, where L is an r x n matrix, is said to have M degrees of freedom if there exist M linearly independent functions
in the set and no more. Alternatively, the degrees of freedom carried by the set L'Y equals rank(L). When the set L'Y
are the estimates of Λ'β, the degrees of freedom of the set L'Y will also be called the degrees of freedom for the
estimates of Λ'β.

Sum of squares
If Λ'Y is a linear function of the observations, then the projection of Y on Λ is the vector (Y'Λ / Λ'Λ) Λ. The square
of the length of this projection, (Λ'Y)^2 / (Λ'Λ), is called the sum of squares (SS) due to Λ'Y. Since Λ'Y has one
degree of freedom, the SS due to Λ'Y has one degree of freedom.
The sums of squares and the degrees of freedom arising out of mutually orthogonal sets of functions can be added
together to give the sum of squares and degrees of freedom for the set of all the functions taken together, and vice
versa.

Let X = (X1, X2, ..., Xn)' have a multivariate normal distribution with mean vector μ and positive definite covariance
matrix Σ. Let the two quadratic forms be such that

X'A1X is distributed as χ^2 with n1 degrees of freedom and noncentrality parameter μ'A1μ, and
X'A2X is distributed as χ^2 with n2 degrees of freedom and noncentrality parameter μ'A2μ.

Then X'A1X and X'A2X are independently distributed if A1 Σ A2 = 0.

Fisher-Cochran theorem
If X = (X1, X2, ..., Xn)' has a multivariate normal distribution with mean vector μ and positive definite covariance
matrix Σ, and

X' Σ^(-1) X = Q1 + Q2 + ... + Qk

where Qi = X'AiX with rank(Ai) = Ni, i = 1, 2, ..., k, then the Qi's are independently distributed as noncentral
Chi-square with Ni degrees of freedom and noncentrality parameters μ'Aiμ if and only if

Σ_{i=1}^{k} Ni = n,

in which case

μ' Σ^(-1) μ = Σ_{i=1}^{k} μ'Aiμ.

Derivatives of quadratic and linear forms

Let X = (x1, x2, ..., xn)' and let f(X) be any function of the n independent variables x1, x2, ..., xn. Then

∂f(X)/∂X = ( ∂f(X)/∂x1, ∂f(X)/∂x2, ..., ∂f(X)/∂xn )'.

If K = (k1, k2, ..., kn)' is a vector of constants, then ∂(K'X)/∂X = K.

If A is an n x n matrix, then ∂(X'AX)/∂X = (A + A')X, which equals 2AX when A is symmetric.
Independence of linear and quadratic forms

Let Y be an n x 1 vector having the multivariate normal distribution N(μ, I) and let B be an m x n matrix. Then the
m x 1 linear form BY is independent of the quadratic form Y'AY if BA = 0, where A is a symmetric matrix of known
elements.

Let Y be an n x 1 vector having the multivariate normal distribution N(μ, Σ) with rank(Σ) = n. If BΣA = 0, then the
quadratic form Y'AY is independent of the linear form BY, where B is an m x n matrix.

Analysis of Variance and Design of Experiments - I

MODULE - II
LECTURE - 4

GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Regression model for the general linear hypothesis

Let Y1, Y2, ..., Yn be a sequence of n independent random variables associated with responses. Then we can write

E(Yi) = Σ_{j=1}^{p} βj xij,  i = 1, 2, ..., n,
Var(Yi) = σ^2.

This is the linear model in the expectation form, where β1, β2, ..., βp are the unknown parameters and the xij's are
the known values of the independent covariates X1, X2, ..., Xp.

Alternatively, the linear model can be expressed as

Yi = Σ_{j=1}^{p} βj xij + εi,  i = 1, 2, ..., n,

where the εi's are identically and independently distributed random error components with mean 0 and variance σ^2,
i.e., E(εi) = 0, Var(εi) = σ^2 and Cov(εi, εj) = 0 (i ≠ j).

In matrix notation, the linear model can be expressed as

Y = Xβ + ε

where

Y = (Y1, Y2, ..., Yn)' is the n x 1 vector of observations on the response variable,

X = [ x11  x12 ... x1p
      x21  x22 ... x2p
      ...
      xn1  xn2 ... xnp ]

is the n x p matrix of n observations on the p independent covariates X1, X2, ..., Xp,

β = (β1, β2, ..., βp)' is a p x 1 vector of unknown regression parameters (or regression coefficients) β1, β2, ..., βp
associated with X1, X2, ..., Xp respectively, and

ε = (ε1, ε2, ..., εn)' is an n x 1 vector of random errors or disturbances.

We assume that E(ε) = 0, the covariance matrix V(ε) = E(εε') = σ^2 In, and rank(X) = p.

In the context of analysis of variance and design of experiments,

the matrix X is termed the design matrix;

the unknown β1, β2, ..., βp are termed effects;

the covariates X1, X2, ..., Xp are counter variables or indicator variables, where xij counts the number of times the
effect βj occurs in the ith observation xi;

xij mostly takes the values 1 or 0, but not always.

The value xij = 1 indicates the presence of effect βj in xi and xij = 0 indicates the absence of effect βj in xi.

Note that in the linear regression model, the covariates are usually continuous variables.
When some of the covariates are counter variables and the rest are continuous variables, the model is called a mixed
model and is used in the analysis of covariance.

Relationship between the regression model and the analysis of variance model

The same linear model is used in linear regression analysis as well as in the analysis of variance. So it is important
to understand the role of the linear model in the context of linear regression analysis and analysis of variance.
Consider the multiple linear model

Y = β0 + X1 β1 + X2 β2 + ... + Xp βp + ε.

In the case of the analysis of variance model,

the one-way classification considers only one covariate,
the two-way classification model considers two covariates,
the three-way classification model considers three covariates, and so on.

If α, β and γ denote the effects associated with the covariates X, Z and W, which are counter variables, then

One-way model:   Y = μ + αX + ε
Two-way model:   Y = μ + αX + βZ + ε
Three-way model: Y = μ + αX + βZ + γW + ε,  and so on.

Consider an example of agricultural yield. The study variable denotes the yield, which depends on various covariates
X1, X2, ..., Xp. In the case of regression analysis, the covariates X1, X2, ..., Xp are different variables like
temperature, quantity of fertilizer, amount of irrigation, etc.

Now consider the case of the one-way model and try to understand its interpretation in terms of the multiple
regression model. The covariate X is now measured at different levels; e.g., if X is the quantity of fertilizer, then
suppose there are p possible values, say 1 Kg., 2 Kg., ..., p Kg. Then X1, X2, ..., Xp denote these p values in the
following way. The linear model can now be expressed as

Y = β0 + β1 X1 + β2 X2 + ... + βp Xp + ε

by defining

X1 = 1 if the effect of 1 Kg. fertilizer is present, 0 if the effect of 1 Kg. fertilizer is absent
X2 = 1 if the effect of 2 Kg. fertilizer is present, 0 if the effect of 2 Kg. fertilizer is absent
...
Xp = 1 if the effect of p Kg. fertilizer is present, 0 if the effect of p Kg. fertilizer is absent.

If the effect of 1 Kg. of fertilizer is present, then the other effects will obviously be absent and the linear model
is expressible as

Y = β0 + β1 (X1 = 1) + β2 (X2 = 0) + ... + βp (Xp = 0) + ε
  = β0 + β1 + ε.

If the effect of 2 Kg. of fertilizer is present, then

Y = β0 + β1 (X1 = 0) + β2 (X2 = 1) + ... + βp (Xp = 0) + ε
  = β0 + β2 + ε.

If the effect of p Kg. of fertilizer is present, then

Y = β0 + β1 (X1 = 0) + β2 (X2 = 0) + ... + βp (Xp = 1) + ε
  = β0 + βp + ε,

and so on.

If the experiment with 1 Kg. of fertilizer is repeated n1 times, then n1 observations on the response variable are
recorded, which can be represented as

Y11  = β0 + β1.1 + β2.0 + ... + βp.0 + ε11
Y12  = β0 + β1.1 + β2.0 + ... + βp.0 + ε12
...
Y1n1 = β0 + β1.1 + β2.0 + ... + βp.0 + ε1n1.

If X2 = 1 is repeated n2 times, then on the same lines n2 observations on the response variable are recorded, which
can be represented as

Y21  = β0 + β1.0 + β2.1 + ... + βp.0 + ε21
Y22  = β0 + β1.0 + β2.1 + ... + βp.0 + ε22
...
Y2n2 = β0 + β1.0 + β2.1 + ... + βp.0 + ε2n2.

The experiment is continued, and if Xp = 1 is repeated np times, then on the same lines

Yp1  = β0 + β1.0 + β2.0 + ... + βp.1 + εp1
Yp2  = β0 + β1.0 + β2.0 + ... + βp.1 + εp2
...
Ypnp = β0 + β1.0 + β2.0 + ... + βp.1 + εpnp.

All these n1 + n2 + ... + np observations can be represented in matrix form as

Y = Xβ + ε

where Y = (y11, y12, ..., y1n1, y21, ..., y2n2, ..., yp1, ..., ypnp)', β = (β0, β1, ..., βp)',
ε = (ε11, ε12, ..., ε1n1, ε21, ..., ε2n2, ..., εp1, ..., εpnp)', and the design matrix X has a leading column of ones
(for β0) followed by the p indicator columns: its first n1 rows are (1, 1, 0, ..., 0), the next n2 rows are
(1, 0, 1, ..., 0), and so on until the last np rows, which are (1, 0, 0, ..., 1).
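The block pattern of this design matrix is easy to generate. A minimal sketch (added here, assuming NumPy; the group sizes are arbitrary illustrative values) builds the intercept column plus the p indicator columns for n1, n2, ..., np replicates.

```python
import numpy as np

n_i = [3, 2, 4]                          # n1, n2, ..., np replicates per level
p = len(n_i)
n = sum(n_i)

# Column of ones for beta_0, then one indicator column per fertilizer level.
X = np.zeros((n, p + 1))
X[:, 0] = 1.0
row = 0
for j, nj in enumerate(n_i):
    X[row:row + nj, j + 1] = 1.0         # effect j present in these nj observations
    row += nj
print(X)
```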

In the two-way analysis of variance model there are two covariates, and the linear model is expressible as

Y = β0 + β1 X1 + β2 X2 + ... + βp Xp + γ1 Z1 + γ2 Z2 + ... + γq Zq + ε

where X1, X2, ..., Xp denote, e.g., the p levels of quantity of fertilizer, say 1 Kg., 2 Kg., ..., p Kg., and
Z1, Z2, ..., Zq denote, e.g., the q levels of irrigation, say 10 cms., 20 cms., ..., 10q cms., etc. The levels
X1, X2, ..., Xp, Z1, Z2, ..., Zq are defined as counter variables indicating the presence or absence of the effect as
in the earlier case. If the effects of X1 and Z1 are present, i.e., 1 Kg of fertilizer and 10 cms. of irrigation are
used, then the linear model is written as

Y = β0 + β1.1 + β2.0 + ... + βp.0 + γ1.1 + γ2.0 + ... + γq.0 + ε
  = β0 + β1 + γ1 + ε.

If X2 = 1 and Z2 = 1 are used, then the model is Y = β0 + β2 + γ2 + ε.
The design matrix can be written accordingly, as in the one-way analysis of variance case.

In the three-way analysis of variance model,

Y = β0 + β1 X1 + ... + βp Xp + γ1 Z1 + ... + γq Zq + δ1 W1 + ... + δr Wr + ε.

The regression parameters β's can be fixed or random.

If all β's are unknown constants, they are called the parameters of the model and the model is called a fixed-effects
model or model I. The objective in this case is to make inferences about the parameters and the error variance σ^2.

If for some j, xij = 1 for all i = 1, 2, ..., n, then βj is termed an additive constant. In this case, βj occurs with
every observation, and so it is also called the general mean effect.

If all β's except the additive constant are observable random variables, then the linear model is termed a
random-effects model, model II or variance components model. The objective in this case is to make inferences about
the variances of the β's, i.e., σ^2_{β1}, σ^2_{β2}, ..., σ^2_{βp}, and the error variance σ^2, and/or certain
functions of them.

If some parameters are fixed and some are random variables, then the model is called a mixed-effects model or
model III. In a mixed-effects model, at least one β is a constant and at least one β is a random variable. The
objective is to make inferences about the fixed effect parameters, the variances of the random effects and the error
variance σ^2.

Analysis of Variance and Design of Experiments - I

MODULE - II
LECTURE - 5

GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Analysis of variance
Analysis of variance is a body of statistical methods for analyzing measurements assumed to be structured as

yi = β1 xi1 + β2 xi2 + ... + βp xip + εi,  i = 1, 2, ..., n,

where the xij are integers, generally 0 or 1, usually indicating the absence or presence of the effects βj, and the
εi's are assumed to be identically and independently distributed with mean 0 and variance σ^2. It may be noted that
the εi's can additionally be assumed to follow a normal distribution N(0, σ^2). This assumption is needed from the
beginning of the analysis for the maximum likelihood estimation of parameters, but in least squares estimation it is
needed only when conducting tests of hypothesis and constructing confidence intervals for the parameters. The least
squares method does not require any distributional assumption such as normality up to the stage of estimation of
parameters.

We need some basic concepts to develop the tools.

Least squares estimate of β

Let y1, y2, ..., yn be a sample of observations on Y1, Y2, ..., Yn. The least squares estimate of β is the value β̂ of
β for which the sum of squares due to errors, i.e.,

S^2 = Σ_{i=1}^{n} εi^2 = ε'ε = (y - Xβ)'(y - Xβ)
    = y'y - 2β'X'y + β'X'Xβ,

is minimum, where y = (y1, y2, ..., yn)'. Differentiating S^2 with respect to β and setting it to zero, the normal
equations are obtained as

dS^2/dβ = 2X'Xβ - 2X'y = 0,  or  X'Xβ = X'y.

If X has full rank, then (X'X) has a unique inverse and the unique least squares estimate of β is

β̂ = (X'X)^(-1) X'y

which is the best linear unbiased estimator of β in the sense of having minimum variance in the class of linear and
unbiased estimators. If the rank of X is not full, then a generalized inverse is used for finding the inverse of
(X'X).

If L'β is a linear parametric function, where L = (ℓ1, ℓ2, ..., ℓp)' is a non-null vector, then the least squares
estimate of L'β is L'β̂.

A question arises: what are the conditions under which a linear parametric function L'β admits a unique least squares
estimate in the general case?

The concept of estimable function is needed to find such conditions.

Estimable functions
A linear function λ'β of the parameters, with λ known, is said to be an estimable parametric function (or estimable)
if there exists a linear function L'Y of Y such that

E(L'Y) = λ'β for all β ∈ R^p.

Note that not all parametric functions are estimable.

The following results will be useful in understanding the further topics.

Theorem 1
A linear parametric function L'β admits a unique least squares estimate if and only if L'β is estimable.

Theorem 2 (Gauss-Markoff theorem)
If the linear parametric function L'β is estimable, then the linear estimator L'β̂, where β̂ is a solution of
X'Xβ̂ = X'Y, is the best linear unbiased estimator of L'β in the sense of having minimum variance in the class of all
linear and unbiased estimators of L'β.

Theorem 3
If the linear parametric functions φ1 = l1'β, φ2 = l2'β, ..., φk = lk'β are estimable, then any linear combination of
φ1, φ2, ..., φk is also estimable.

Theorem 4
All linear parametric functions in β are estimable if and only if X has full rank.

If X is not of full rank, then some linear parametric functions do not admit unbiased linear estimators and nothing
can be inferred about them. The linear parametric functions which are not estimable are said to be confounded. A
possible solution to this problem is to add linear restrictions on β so as to reduce the linear model to full rank.

Theorem 5
Let L1'β and L2'β be two estimable parametric functions and let L1'β̂ and L2'β̂ be their least squares estimators.
Then

Var(L1'β̂) = σ^2 L1'(X'X)^(-1) L1
Cov(L1'β̂, L2'β̂) = σ^2 L1'(X'X)^(-1) L2

assuming that X is a full rank matrix. If not, the generalized inverse of X'X can be used in place of the unique
inverse.

Estimator of σ^2 based on least squares estimation

Consider an estimator of σ^2 as

σ̂^2 = 1/(n - p) (y - Xβ̂)'(y - Xβ̂)
    = 1/(n - p) [y - X(X'X)^(-1)X'y]'[y - X(X'X)^(-1)X'y]
    = 1/(n - p) y'[I - X(X'X)^(-1)X'][I - X(X'X)^(-1)X']y
    = 1/(n - p) y'[I - X(X'X)^(-1)X']y

where the matrix [I - X(X'X)^(-1)X'] is an idempotent matrix with trace

tr[I - X(X'X)^(-1)X'] = tr I - tr X(X'X)^(-1)X'
                      = n - tr (X'X)^(-1)X'X   (using the result tr(AB) = tr(BA))
                      = n - tr Ip
                      = n - p.

Note that, using E(y'Ay) = μ'Aμ + σ^2 tr(A) with μ = Xβ (so that μ'[I - X(X'X)^(-1)X']μ = 0), we have

E(σ̂^2) = σ^2/(n - p) tr[I - X(X'X)^(-1)X'] = σ^2

and so σ̂^2 is an unbiased estimator of σ^2.
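The trace identity used above can be checked numerically. The sketch below is an added illustration (assuming NumPy); it forms the matrix I - X(X'X)^(-1)X' and verifies that it is idempotent with trace n - p.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 4
X = rng.normal(size=(n, p))

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # I - X(X'X)^(-1)X'

print(np.allclose(M @ M, M))             # idempotent
print(np.isclose(np.trace(M), n - p))    # tr[I - X(X'X)^(-1)X'] = n - p
```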

Maximum likelihood estimation

The least squares method does not use any distribution of the random variables in the estimation of parameters. We
need the distributional assumption in the case of least squares only while constructing the tests of hypothesis and
the confidence intervals. For maximum likelihood estimation, we need the distributional assumption from the beginning.

Suppose y1, y2, ..., yn are independently and identically distributed following a normal distribution with mean
E(yi) = Σ_{j=1}^{p} βj xij and variance Var(yi) = σ^2 (i = 1, 2, ..., n). Then the likelihood function of
y1, y2, ..., yn is

L(y | β, σ^2) = 1/[(2π)^(n/2) (σ^2)^(n/2)] exp[ -1/(2σ^2) (y - Xβ)'(y - Xβ) ]

where y = (y1, y2, ..., yn)'. Then

ln L(y | β, σ^2) = -(n/2) ln(2π) - (n/2) ln σ^2 - 1/(2σ^2) (y - Xβ)'(y - Xβ).

Differentiating the log likelihood with respect to β and σ^2, we have

∂ ln L/∂β = 0    =>  X'X β̃ = X'y,
∂ ln L/∂σ^2 = 0  =>  σ̃^2 = (1/n) (y - Xβ̃)'(y - Xβ̃).

Assuming the full rank of X, the normal equations are solved and the maximum likelihood estimators are obtained as

β̃ = (X'X)^(-1) X'y
σ̃^2 = (1/n) (y - Xβ̃)'(y - Xβ̃) = (1/n) y'[I - X(X'X)^(-1)X']y.

The second order differentiation conditions can be checked, and they are satisfied for β̃ and σ̃^2 to be the maximum
likelihood estimators.

Note that the maximum likelihood estimator β̃ is the same as the least squares estimator, and

β̃ is an unbiased estimator of β, i.e., E(β̃) = β, like the least squares estimator, but

σ̃^2 is not an unbiased estimator of σ^2, i.e., E(σ̃^2) = [(n - p)/n] σ^2, unlike the least squares estimator.

Now we use the following theorems for developing the tests of hypothesis.

Theorem 6
Let Y = (Y1, Y2, ..., Yn)' follow a multivariate normal distribution N(μ, Σ) with mean vector μ and positive definite
covariance matrix Σ. Then Y'AY follows a noncentral chi-square distribution with p degrees of freedom and
noncentrality parameter μ'Aμ, i.e., χ^2(p, μ'Aμ), if and only if AΣ is an idempotent matrix of rank p.

Theorem 7
Let Y = (Y1, Y2, ..., Yn)' follow a multivariate normal distribution N(μ, Σ) with mean vector μ and positive definite
covariance matrix Σ. Let Y'A1Y follow χ^2(p1, μ'A1μ) and Y'A2Y follow χ^2(p2, μ'A2μ). Then Y'A1Y and Y'A2Y are
independently distributed if A1ΣA2 = 0.

Theorem 8
Let Y = (Y1, Y2, ..., Yn)' follow a multivariate normal distribution N(Xβ, σ^2 I). Then the maximum likelihood (or
least squares) estimator L'β̃ of an estimable linear parametric function is independently distributed of σ̃^2; L'β̃
follows N(L'β, σ^2 L'(X'X)^(-1)L) and nσ̃^2/σ^2 follows χ^2(n - p), where rank(X) = p.

Proof: Consider β̃ = (X'X)^(-1)X'Y; then

E(L'β̃) = L'(X'X)^(-1)X' E(Y) = L'(X'X)^(-1)X'Xβ = L'β
Var(L'β̃) = L' Var(β̃) L = L' E[(β̃ - β)(β̃ - β)'] L = σ^2 L'(X'X)^(-1)L.

Since β̃ is a linear function of Y and L'β̃ is a linear function of β̃, L'β̃ follows a normal distribution
N(L'β, σ^2 L'(X'X)^(-1)L).

Let A = I - X(X'X)^(-1)X' and B = L'(X'X)^(-1)X'; then L'β̃ = L'(X'X)^(-1)X'Y = BY and

nσ̃^2 = (Y - Xβ̃)'(Y - Xβ̃) = Y'AY.

So, using Theorem 6 with rank(A) = n - p, nσ̃^2/σ^2 follows χ^2(n - p). Also

BA = L'(X'X)^(-1)X' - L'(X'X)^(-1)X'X(X'X)^(-1)X' = 0.

So, using the result on the independence of linear and quadratic forms, BY = L'β̃ and Y'AY are independently
distributed.
Analysis of Variance and Design of Experiments - I

MODULE - II
LECTURE - 6

GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Tests of hypothesis in the linear regression model

First we discuss the development of the tests of hypothesis concerning the parameters of a linear regression model.
These tests of hypothesis will be used later in the development of tests based on the analysis of variance.

Analysis of variance: The technique in the analysis of variance involves breaking down the total variation into
orthogonal components. Each orthogonal component represents the variation due to a particular factor contributing to
the total variation.

Model
Let Y1, Y2, ..., Yn be independently distributed following a normal distribution with mean E(Yi) = Σ_{j=1}^{p} βj xij
and variance σ^2. Denoting by Y = (Y1, Y2, ..., Yn)' an n x 1 column vector, this assumption can be expressed in the
form of a linear regression model Y = Xβ + ε, where X is an n x p matrix, β is a p x 1 vector and ε is an n x 1 vector
of disturbances with

E(ε) = 0,
Cov(ε) = σ^2 I,

and ε follows a normal distribution. This implies that

E(Y) = Xβ,
E[(Y - Xβ)(Y - Xβ)'] = σ^2 I.

Now we consider four different types of tests of hypothesis.
In the first two cases, we develop the likelihood ratio test for the null hypotheses related to the analysis of
variance. Note that later we will also derive the same test on the basis of the least squares principle. An important
idea behind the development of this test is to demonstrate that the test used in the analysis of variance can be
derived using the least squares principle as well as the likelihood ratio test.

Case 1: Test of H0: β = β0

Consider the null hypothesis H0: β = β0, where β = (β1, β2, ..., βp)' and β0 = (β10, β20, ..., βp0)' is specified and
σ^2 is unknown.

This null hypothesis is equivalent to

H0: β1 = β10, β2 = β20, ..., βp = βp0.

Assume that all βi's are estimable, i.e., rank(X) = p (full column rank). We now develop the likelihood ratio test.

The (p + 1)-dimensional parametric space Ω is the collection of points

Ω = {(β, σ^2); -∞ < βi < ∞, σ^2 > 0, i = 1, 2, ..., p}.

Under H0, all β's are known (equal to β0) and Ω reduces to the one-dimensional space

ω = {(β0, σ^2); σ^2 > 0}.

The likelihood function of y1, y2, ..., yn is

L(y | β, σ^2) = [1/(2πσ^2)]^(n/2) exp[ -1/(2σ^2) (y - Xβ)'(y - Xβ) ].

The likelihood function is maximized over Ω when β and σ^2 are substituted with their maximum likelihood estimators,
i.e.,

β̂ = (X'X)^(-1) X'y
σ̂^2 = (1/n) (y - Xβ̂)'(y - Xβ̂).

Substituting β̂ and σ̂^2 in L(y | β, σ^2) gives

Max_Ω L(y | β, σ^2) = [1/(2πσ̂^2)]^(n/2) exp[ -1/(2σ̂^2) (y - Xβ̂)'(y - Xβ̂) ]
                    = [ n / (2π (y - Xβ̂)'(y - Xβ̂)) ]^(n/2) exp(-n/2).

Under H0, the maximum likelihood estimator of σ^2 is

σ̂_ω^2 = (1/n) (y - Xβ0)'(y - Xβ0).

The maximum value of the likelihood function under H0 is

Max_ω L(y | β, σ^2) = [1/(2πσ̂_ω^2)]^(n/2) exp[ -1/(2σ̂_ω^2) (y - Xβ0)'(y - Xβ0) ]
                    = [ n / (2π (y - Xβ0)'(y - Xβ0)) ]^(n/2) exp(-n/2).
The likelihood ratio test statistic is

λ = Max_ω L(y | β, σ^2) / Max_Ω L(y | β, σ^2)

  = [ (y - Xβ̂)'(y - Xβ̂) / (y - Xβ0)'(y - Xβ0) ]^(n/2)

  = [ (y - Xβ̂)'(y - Xβ̂) / {(y - Xβ̂) + X(β̂ - β0)}'{(y - Xβ̂) + X(β̂ - β0)} ]^(n/2)

  = [ 1 + (β̂ - β0)'X'X(β̂ - β0) / (y - Xβ̂)'(y - Xβ̂) ]^(-n/2)

  = [ 1 + q1/q2 ]^(-n/2)

where

q1 = (β̂ - β0)'X'X(β̂ - β0)  and  q2 = (y - Xβ̂)'(y - Xβ̂).

The expressions for q1 and q2 can be further simplified as follows.

Consider

q1 = (β̂ - β0)'X'X(β̂ - β0)
   = [(X'X)^(-1)X'y - β0]' X'X [(X'X)^(-1)X'y - β0]
   = [(X'X)^(-1)X'(y - Xβ0)]' X'X [(X'X)^(-1)X'(y - Xβ0)]
   = (y - Xβ0)' X(X'X)^(-1)X'X(X'X)^(-1)X' (y - Xβ0)
   = (y - Xβ0)' X(X'X)^(-1)X' (y - Xβ0),

q2 = (y - Xβ̂)'(y - Xβ̂)
   = [y - X(X'X)^(-1)X'y]'[y - X(X'X)^(-1)X'y]
   = y'[I - X(X'X)^(-1)X']y
   = [(y - Xβ0) + Xβ0]'[I - X(X'X)^(-1)X'][(y - Xβ0) + Xβ0]
   = (y - Xβ0)'[I - X(X'X)^(-1)X'](y - Xβ0).

The other terms become zero using [I - X(X'X)^(-1)X']X = 0.
In order to find the decision rule for H0 based on λ, we first need to determine whether λ is a monotonic increasing
or decreasing function of q1/q2. We proceed as follows.

Let g = q1/q2, so that λ = (1 + g)^(-n/2). Then

dλ/dg = -(n/2)(1 + g)^(-n/2 - 1),

so as g increases, λ decreases. Thus λ is a monotonic decreasing function of q1/q2.

The decision rule is to reject H0 if λ <= λ0, where λ0 is a constant to be determined on the basis of the size of the
test. Let us simplify this in our context:

λ <= λ0
or  (1 + g)^(-n/2) <= λ0
or  (1 + g)^(n/2) >= 1/λ0
or  g >= λ0^(-2/n) - 1
or  g >= C

where C is a constant to be determined by the size condition of the test.

So reject H0 whenever

q1/q2 >= C.
q1
can also be obtained by the least squares method as follows. The least squares methodology will
q2
also be discussed in further lectures.

Note that the statistic

q1 = ( 0 ) X X ( 0 )
q1
Min( y X )( y X )
=

Min( y X )( y X )

sum of

sum

sum

squares due

of squares

of

to deviation

due to H o

squares

from H o

OR

due to error

OR
sum of squares
due to

Total sum of
squares

9
Theorem 9
Let

Z = Y - Xβ0
Q1 = Z' X(X'X)^(-1)X' Z
Q2 = Z' [I - X(X'X)^(-1)X'] Z.

Then Q1 and Q2 are independently distributed. Further, when H0 is true, Q1/σ^2 ~ χ^2(p) and Q2/σ^2 ~ χ^2(n - p),
where χ^2(m) denotes the χ^2 distribution with m degrees of freedom.

Proof: Under H0,

E(Z) = Xβ0 - Xβ0 = 0
Var(Z) = Var(Y) = σ^2 I.

Further, Z is a linear function of Y and Y follows a normal distribution, so Z ~ N(0, σ^2 I).

The matrices X(X'X)^(-1)X' and [I - X(X'X)^(-1)X'] are idempotent matrices, so

tr[X(X'X)^(-1)X'] = tr[(X'X)^(-1)X'X] = tr(Ip) = p
tr[I - X(X'X)^(-1)X'] = tr In - tr[X(X'X)^(-1)X'] = n - p.

So, using Theorem 6, we can write that under H0,

Q1/σ^2 ~ χ^2(p)  and  Q2/σ^2 ~ χ^2(n - p),

where the degrees of freedom p and (n - p) are obtained as the traces of X(X'X)^(-1)X' and I - X(X'X)^(-1)X',
respectively.

Since [I - X(X'X)^(-1)X'] X(X'X)^(-1)X' = 0, using Theorem 7 the quadratic forms Q1 and Q2 are independent under H0.
Hence the theorem is proved.

Since Q1 and Q2 are independently distributed, under H0

[Q1/p] / [Q2/(n - p)] = [(n - p)/p] (Q1/Q2) follows a central F-distribution F(p, n - p).

Hence the constant C in the likelihood ratio test is given by C = F_{1-α}(p, n - p), where F_{1-α}(n1, n2) denotes the
upper 100α% point of the F-distribution with n1 and n2 degrees of freedom.

The computations for this test of hypothesis can be represented in the form of an analysis of variance table.

ANOVA table for testing H0: β = β0

Source of    Degrees of   Sum of                 Mean         F-value
variation    freedom      squares                squares
---------------------------------------------------------------------------------
Due to β     p            q1                     q1/p         [(n - p)/p] (q1/q2)
Error        n - p        q2                     q2/(n - p)
Total        n            (y - Xβ0)'(y - Xβ0)
---------------------------------------------------------------------------------
H0 is rejected when the F-value exceeds C = F_{1-α}(p, n - p).
Analysis of Variance and Design of Experiments - I

MODULE - II
LECTURE - 7

GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Case 2: Test of a subset of parameters, H0: βk = βk0, k = 1, 2, ..., r < p, when σ^2 and βr+1, βr+2, ..., βp are
unknown

In Case 1, the test of hypothesis was developed when all the β's are considered, in the sense that we test for each
βi = βi0, i = 1, 2, ..., p. Now consider another situation, in which the interest is to test only a subset of
β1, β2, ..., βp, i.e., not all but only a few parameters. This type of test of hypothesis can be used, e.g., in the
following situation. Suppose five levels of voltage are applied to check the rotations per minute (rpm) of a fan at
160 volts, 180 volts, 200 volts, 220 volts and 240 volts. It can be realized in practice that when the voltage is low,
then the rpm at 160, 180 and 200 volts can be observed easily. At 220 and 240 volts, the fan rotates at full speed and
there is not much difference in the rotations per minute at these voltages. So the interest of the experimenter lies
in testing the hypothesis related to only the first three effects, viz., β1 for 160 volts, β2 for 180 volts and β3 for
200 volts.

The null hypothesis in this case can be written as

H0: β1 = β10, β2 = β20, β3 = β30

when β4, β5 and σ^2 are unknown.

Note that under Case 1, the null hypothesis would be

H0: β1 = β10, β2 = β20, β3 = β30, β4 = β40, β5 = β50.

Let β1, β2, ..., βp be the p parameters. We can divide them into two parts, β1, β2, ..., βr and βr+1, ..., βp, and we
are interested in testing a hypothesis about a subset of them.

Suppose we want to test the null hypothesis H0: βk = βk0, k = 1, 2, ..., r < p, when βr+1, βr+2, ..., βp and σ^2 are
unknown. The alternative hypothesis under consideration is H1: βk ≠ βk0 for at least one k = 1, 2, ..., r.

In order to develop a test for such a hypothesis, the linear model Y = Xβ + ε under the usual assumptions can be
rewritten as follows. Partition X = (X1 X2) and β = (β(1)', β(2)')', where β(1) = (β1, β2, ..., βr)' and
β(2) = (βr+1, βr+2, ..., βp)', with the orders of the matrices as X1: n x r, X2: n x (p - r), β(1): r x 1 and
β(2): (p - r) x 1. The model can be rewritten as

Y = Xβ + ε = (X1 X2)(β(1)', β(2)')' + ε = X1 β(1) + X2 β(2) + ε.
The null hypothesis of interest is now

H0: β(1) = β(1)0 = (β10, β20, ..., βr0)'

where β(2) and σ^2 are unknown.

The complete parametric space is

Ω = {(β, σ^2); -∞ < βi < ∞, σ^2 > 0, i = 1, 2, ..., p}

and the parametric space under H0 is

ω = {(β(1)0, β(2), σ^2); -∞ < βi < ∞, σ^2 > 0, i = r + 1, r + 2, ..., p}.

The likelihood function is

L(y | β, σ^2) = [1/(2πσ^2)]^(n/2) exp[ -1/(2σ^2) (y - Xβ)'(y - Xβ) ].

The maximum value of the likelihood function under Ω is obtained by substituting the maximum likelihood estimates of
β and σ^2, i.e.,

β̂ = (X'X)^(-1) X'y
σ̂^2 = (1/n) (y - Xβ̂)'(y - Xβ̂),

as

Max_Ω L(y | β, σ^2) = [1/(2πσ̂^2)]^(n/2) exp[ -1/(2σ̂^2) (y - Xβ̂)'(y - Xβ̂) ]
                    = [ n / (2π (y - Xβ̂)'(y - Xβ̂)) ]^(n/2) exp(-n/2).
Now we find the maximum value of the likelihood function under H0. The model under H0 becomes
Y = X1 β(1)0 + X2 β(2) + ε. The likelihood function under H0 is

L(y | β, σ^2) = [1/(2πσ^2)]^(n/2) exp[ -1/(2σ^2) (y - X1 β(1)0 - X2 β(2))'(y - X1 β(1)0 - X2 β(2)) ]
             = [1/(2πσ^2)]^(n/2) exp[ -1/(2σ^2) (y* - X2 β(2))'(y* - X2 β(2)) ]

where y* = y - X1 β(1)0. Note that β(2) and σ^2 are the unknown parameters. This likelihood function looks as if it
were written for y* ~ N(X2 β(2), σ^2 I).

This helps in writing the maximum likelihood estimators of β(2) and σ^2 directly as

β̂(2) = (X2'X2)^(-1) X2' y*
σ̂_ω^2 = (1/n) (y* - X2 β̂(2))'(y* - X2 β̂(2)).

Note that X2'X2 is a principal minor of X'X. Since X'X is a positive definite matrix, X2'X2 is also positive definite.
Thus (X2'X2)^(-1) exists and is unique.

Thus the maximum value of the likelihood function under H0 is obtained as

Max_ω L(y* | β, σ^2) = [1/(2πσ̂_ω^2)]^(n/2) exp[ -1/(2σ̂_ω^2) (y* - X2 β̂(2))'(y* - X2 β̂(2)) ]
                     = [ n / (2π (y* - X2 β̂(2))'(y* - X2 β̂(2))) ]^(n/2) exp(-n/2).

The likelihood ratio test statistic for H0: β(1) = β(1)0 is

λ = Max_ω L(y | β, σ^2) / Max_Ω L(y | β, σ^2)

  = [ (y - Xβ̂)'(y - Xβ̂) / (y* - X2 β̂(2))'(y* - X2 β̂(2)) ]^(n/2)

  = [ { (y* - X2 β̂(2))'(y* - X2 β̂(2)) - (y - Xβ̂)'(y - Xβ̂) + (y - Xβ̂)'(y - Xβ̂) } / (y - Xβ̂)'(y - Xβ̂) ]^(-n/2)

  = [ 1 + { (y* - X2 β̂(2))'(y* - X2 β̂(2)) - (y - Xβ̂)'(y - Xβ̂) } / (y - Xβ̂)'(y - Xβ̂) ]^(-n/2)

  = [ 1 + q1/q2 ]^(-n/2)

where q1 = (y* - X2 β̂(2))'(y* - X2 β̂(2)) - (y - Xβ̂)'(y - Xβ̂) and q2 = (y - Xβ̂)'(y - Xβ̂).


Now we simplify q1 and q2.

Consider

(y* - X2 β̂(2))'(y* - X2 β̂(2)) = [y* - X2(X2'X2)^(-1)X2'y*]'[y* - X2(X2'X2)^(-1)X2'y*]
    = y*'[I - X2(X2'X2)^(-1)X2']y*
    = [(y - X1 β(1)0 - X2 β(2)) + X2 β(2)]'[I - X2(X2'X2)^(-1)X2'][(y - X1 β(1)0 - X2 β(2)) + X2 β(2)]
    = (y - X1 β(1)0 - X2 β(2))'[I - X2(X2'X2)^(-1)X2'](y - X1 β(1)0 - X2 β(2)).

The other terms become zero using the result X2'[I - X2(X2'X2)^(-1)X2'] = 0.

Consider

(y - Xβ̂)'(y - Xβ̂) = [y - X(X'X)^(-1)X'y]'[y - X(X'X)^(-1)X'y]
    = y'[I - X(X'X)^(-1)X']y
    = [(y - X1 β(1)0 - X2 β(2)) + X1 β(1)0 + X2 β(2)]'[I - X(X'X)^(-1)X'][(y - X1 β(1)0 - X2 β(2)) + X1 β(1)0 + X2 β(2)]
    = (y - X1 β(1)0 - X2 β(2))'[I - X(X'X)^(-1)X'](y - X1 β(1)0 - X2 β(2)),

and the other terms become zero using the result X'[I - X(X'X)^(-1)X'] = 0. Note that under H0, the term
X1 β(1)0 + X2 β(2) can be expressed as (X1 X2)(β(1)0', β(2)')'. Thus

q1 = (y* - X2 β̂(2))'(y* - X2 β̂(2)) - (y - Xβ̂)'(y - Xβ̂)
   = y*'[I - X2(X2'X2)^(-1)X2']y* - y'[I - X(X'X)^(-1)X']y
   = (y - X1 β(1)0 - X2 β(2))'[X(X'X)^(-1)X' - X2(X2'X2)^(-1)X2'](y - X1 β(1)0 - X2 β(2)),

q2 = (y - Xβ̂)'(y - Xβ̂)
   = y'[I - X(X'X)^(-1)X']y
   = (y - X1 β(1)0 - X2 β(2))'[I - X(X'X)^(-1)X'](y - X1 β(1)0 - X2 β(2)).

Other cross terms become zero. Note that in simplifying q1 and q2, we have written them as quadratic forms in the same
variable (y - X1 β(1)0 - X2 β(2)).
Using the same argument as in Case 1, we can say that since λ is a monotonic decreasing function of q1/q2, the
likelihood ratio test rejects H0 whenever

q1/q2 > C

where C is a constant to be determined by the size of the test.

The likelihood ratio test statistic can also be obtained through the least squares method as follows:

(q1 + q2) : minimum value of (y - Xβ)'(y - Xβ) when H0: β(1) = β(1)0 holds true, i.e., the sum of squares due to H0,
q2        : sum of squares due to error,
q1        : sum of squares due to the deviation from H0, or sum of squares due to β(1) adjusted for β(2).

If β(1)0 = 0, then

q1 = (y - X2 β̂(2))'(y - X2 β̂(2)) - (y - Xβ̂)'(y - Xβ̂)
   = (y'y - β̂(2)'X2'y) - (y'y - β̂'X'y)
   = β̂'X'y - β̂(2)'X2'y,

i.e., the reduction in the sum of squares: the sum of squares due to β minus the sum of squares due to β(2) ignoring
β(1).


Th
Theorem
10
Let

0
Z = Y X 1 (1)
X 2 (2)

Q1 = Z AZ
Q2 = Z BZ
A = X ( X X )1 X ' X 2 ( X 2' X 2 ) 1 X 2'

where

B = I X ( X X ) 1 X '.
Then

Q1

and

Q2

are independently distributed.


distributed Further

Q1

~ 2 ( q ) and

Q2

~ 2 ( n p ).
)

Thus under H 0 ,
Q1 / r
n p Q1 follow a F-distribution F ( r , n p ).
=
Q2 / ( n p )
r Q2

Hence the constant C in

is

C = F1 ( r, n p )
where F1 ( r, n p) denotes the upper 100 % points on F-distribution with r and (n - p) degrees of freedom.

10

The analysis of variance table for this null hypothesis is as follows:

ANOVA for testing H 0 : (1) = (1)0


Source of
variation
Due to (1)

Degrees of
freedom
r

(n p)

C = F1 ( p, n p )
q2
(n p )

q2
H0 : = 0

Total

n -(p q)

Mean
squares
q1
r

q1

Error

Sum of
squares

q1 + q2

F - value
n p q1

r q2

Analysis of Variance and Design of Experiments - I

MODULE - II
LECTURE - 8

GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Case 3: Test of H0: L'β = δ

Let us consider the test of hypothesis related to a linear parametric function. Assume that the linear parametric
function L'β is estimable, where L = (ℓ1, ℓ2, ..., ℓp)' is a p x 1 vector of known constants and
β = (β1, β2, ..., βp)'. The null hypothesis of interest is

H0: L'β = δ

where δ is some specified constant.

Consider the set-up of the linear model Y = Xβ + ε, where Y = (Y1, Y2, ..., Yn)' follows N(Xβ, σ^2 I). The maximum
likelihood estimators of β and σ^2 are

β̃ = (X'X)^(-1) X'y  and  σ̃^2 = (1/n) (y - Xβ̃)'(y - Xβ̃),

respectively. The maximum likelihood estimate of the estimable function L'β is L'β̃, with

E(L'β̃) = L'β
Var(L'β̃) = σ^2 L'(X'X)^(-1)L
L'β̃ ~ N(L'β, σ^2 L'(X'X)^(-1)L)

and

nσ̃^2/σ^2 ~ χ^2(n - p),

assuming X to be a full column rank matrix. Further, L'β̃ and nσ̃^2/σ^2 are also independently distributed.

Under H0: L'β = δ, the statistic

t = sqrt(n - p) (L'β̃ - δ) / sqrt( nσ̃^2 L'(X'X)^(-1)L )

follows a t-distribution with (n - p) degrees of freedom. So the test for H0: L'β = δ against H1: L'β ≠ δ rejects H0
whenever

|t| >= t_{1-α/2}(n - p)

where t_{1-α}(n1) denotes the upper 100α% point of the t-distribution with n1 degrees of freedom.

Case 4: Test of H0: φ1 = δ1, φ2 = δ2, ..., φk = δk

Now we develop the test of hypothesis related to more than one linear parametric function. Let the ith estimable
linear parametric function be φi = Li'β, and suppose there are k such functions, with Li and β both being p x 1
vectors as in Case 3. Our interest is to test the hypothesis

H0: φ1 = δ1, φ2 = δ2, ..., φk = δk

where δ1, δ2, ..., δk are known constants.

Let φ = (φ1, φ2, ..., φk)' and δ = (δ1, δ2, ..., δk)'. Then H0 is expressible as

H0: φ = Lβ = δ

where L is a k x p matrix of constants associated with L1, L2, ..., Lk.

The maximum likelihood estimator of φi is φ̃i = Li'β̃. Then φ̃ = (φ̃1, φ̃2, ..., φ̃k)' = Lβ̃, with E(φ̃) = φ and
Cov(φ̃) = σ^2 V, where V = ((Li'(X'X)^(-1)Lj)) and Li'(X'X)^(-1)Lj is the (i, j)th element of V. Thus

(φ̃ - φ)'V^(-1)(φ̃ - φ) / σ^2

follows a χ^2 distribution with k degrees of freedom, and nσ̃^2/σ^2 follows a χ^2 distribution with (n - p) degrees of
freedom, where σ̃^2 = (1/n)(y - Xβ̃)'(y - Xβ̃) is the maximum likelihood estimator of σ^2.

Further, (φ̃ - φ)'V^(-1)(φ̃ - φ) and nσ̃^2 are also independently distributed.

Thus, under H0: φ = δ,

F = [ (φ̃ - δ)'V^(-1)(φ̃ - δ) / k ] / [ nσ̃^2 / (n - p) ]

follows an F-distribution with k and (n - p) degrees of freedom. So the hypothesis H0: φ = δ is rejected against
H1: at least one φi ≠ δi for i = 1, 2, ..., k whenever F >= F_{1-α}(k, n - p), where F_{1-α}(k, n - p) denotes the
upper 100α% point of the F-distribution with k and (n - p) degrees of freedom.
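The same quantities translate directly into code. A minimal sketch (added for illustration, assuming NumPy and SciPy; the data and the two restrictions are invented) tests k linear restrictions Lβ = δ with the F statistic of Case 4, using nσ̃^2 = residual sum of squares.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p, alpha = 60, 4, 0.05
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.5, -1.0]) + rng.normal(size=n)

# k = 2 restrictions L beta = delta: beta2 = beta3 and beta4 = 0.
L = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 0.0,  0.0, 1.0]])
delta = np.zeros(2)
k = L.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
rss = np.sum((y - X @ beta_hat) ** 2)          # = n * sigma_tilde^2

phi_hat = L @ beta_hat
V = L @ XtX_inv @ L.T                          # V = ((Li'(X'X)^(-1) Lj))
F = (phi_hat - delta) @ np.linalg.solve(V, phi_hat - delta) / k / (rss / (n - p))
print(F, stats.f.ppf(1 - alpha, k, n - p))     # reject H0 if F exceeds the critical value
```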

One-way classification with fixed effect linear models of full rank

The objective in the one-way classification is to test the hypothesis about the equality of means on the basis of
several samples which have been drawn from univariate normal populations with different means but the same variance.
Let there be p univariate normal populations, and let samples of different sizes be drawn from each of the
populations. Let yij (j = 1, 2, ..., ni) be a random sample from the ith normal population with mean βi and variance
σ^2, i = 1, 2, ..., p, i.e.,

Yij ~ N(βi, σ^2),  j = 1, 2, ..., ni;  i = 1, 2, ..., p.

The random samples from the different populations are assumed to be independent of each other.
These observations follow the set-up of the linear model

Y = Xβ + ε

where

Y = (Y11, Y12, ..., Y1n1, Y21, ..., Y2n2, ..., Yp1, Yp2, ..., Ypnp)'
y = (y11, y12, ..., y1n1, y21, ..., y2n2, ..., yp1, yp2, ..., ypnp)'
β = (β1, β2, ..., βp)'
ε = (ε11, ε12, ..., ε1n1, ε21, ..., ε2n2, ..., εp1, εp2, ..., εpnp)'

and X is the n x p design matrix with elements

xij = 1 if effect βj occurs in (is present in) the corresponding observation, and xij = 0 if effect βj is absent,

with n = n1 + n2 + ... + np.

So X is a matrix of order n x p, β is fixed, the first n1 rows of X are (1, 0, 0, ..., 0), the next n2 rows of X are
(0, 1, 0, ..., 0), and similarly the last np rows of X are (0, 0, ..., 0, 1).

Obviously, rank(X) = p, E(Y) = Xβ and Cov(Y) = σ^2 I.

This completes the representation of a fixed effect linear model of full rank.

The null hypothesis of interest is

H0: β_1 = β_2 = ... = β_p = β (say)

and

H1: at least one β_i ≠ β_j (i ≠ j),

where β and σ² are unknown.

We would develop here the likelihood ratio test. It may be noted that the same test can also be derived through the least squares method; this will be demonstrated in the next module, so that the readers will understand both the methods. We have already developed the likelihood ratio test for the hypothesis H0: β_1 = β_2 = ... = β_p in Case 1.

The whole parametric space Ω is a (p + 1)-dimensional space

Ω = {(β, σ²): −∞ < β_i < ∞, σ² > 0, i = 1, 2,...,p}.

Note that there are (p + 1) parameters: β_1, β_2,...,β_p and σ².

Under H0, Ω reduces to the two-dimensional space

ω = {(β, σ²): −∞ < β < ∞, σ² > 0}.


The likelihood function under Ω is

L(y | β, σ²) = (1/(2πσ²))^(n/2) exp[ −(1/(2σ²)) Σ_i Σ_j (y_ij − β_i)² ]

so that

ln L(y | β, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_i Σ_j (y_ij − β_i)².

Setting the derivatives equal to zero gives

∂ ln L/∂β_i = 0  ⇒  β̂_i = (1/n_i) Σ_j y_ij = y_io

∂ ln L/∂σ² = 0  ⇒  σ̂² = (1/n) Σ_i Σ_j (y_ij − y_io)².

The dot sign (o) in y_io indicates that the average has been taken over the second subscript j. The Hessian matrix of second-order partial derivatives of ln L with respect to β_i and σ² is negative definite at β = y_io and σ² = σ̂², which ensures that the likelihood function is maximized at these values.

Thus the maximum value of L(y | β, σ²) over Ω is

Max_Ω L(y | β, σ²) = [ n / (2π Σ_i Σ_j (y_ij − y_io)²) ]^(n/2) exp(−n/2).

The likelihood function under ω is

L(y | β, σ²) = (1/(2πσ²))^(n/2) exp[ −(1/(2σ²)) Σ_i Σ_j (y_ij − β)² ]

and

ln L(y | β, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_i Σ_j (y_ij − β)².

The normal equations and the least squares estimates are obtained as follows:

∂ ln L(y | β, σ²)/∂β = 0  ⇒  β̂ = (1/n) Σ_i Σ_j y_ij = y_oo

∂ ln L(y | β, σ²)/∂σ² = 0  ⇒  σ̂² = (1/n) Σ_i Σ_j (y_ij − y_oo)².
The maximum value of the likelihood function over ω under H0 is

Max_ω L(y | β, σ²) = [ n / (2π Σ_i Σ_j (y_ij − y_oo)²) ]^(n/2) exp(−n/2).

The likelihood ratio test statistic is

λ = Max_ω L(y | β, σ²) / Max_Ω L(y | β, σ²) = [ Σ_i Σ_j (y_ij − y_io)² / Σ_i Σ_j (y_ij − y_oo)² ]^(n/2).

We have

Σ_i Σ_j (y_ij − y_oo)² = Σ_i Σ_j [ (y_ij − y_io) + (y_io − y_oo) ]²
                       = Σ_i Σ_j (y_ij − y_io)² + Σ_i n_i (y_io − y_oo)².
Thus

λ^(−2/n) = [ Σ_i Σ_j (y_ij − y_io)² + Σ_i n_i (y_io − y_oo)² ] / Σ_i Σ_j (y_ij − y_io)² = 1 + q_1/q_2

where

q_1 = Σ_i n_i (y_io − y_oo)²
q_2 = Σ_i Σ_j (y_ij − y_io)².

Note that if the least squares principle is used, then

q_1 : sum of squares due to deviations from H0, or the between-population sum of squares,
q_2 : sum of squares due to error, or the within-population sum of squares,
q_1 + q_2 : sum of squares due to H0, or the total sum of squares.

Using Theorems 6 and 7, let

Q_1 = Σ_i n_i (Y_io − Y_oo)²
Q_2 = Σ_i S_i²

where

S_i² = Σ_j (Y_ij − Y_io)², Y_oo = (1/n) Σ_i Σ_j Y_ij, Y_io = (1/n_i) Σ_j Y_ij;

then under H0

Q_1/σ² ~ χ²(p − 1), Q_2/σ² ~ χ²(n − p),

and Q_1 and Q_2 are independently distributed.

Thus under H0

F = [ Q_1/(p − 1) ] / [ Q_2/(n − p) ] ~ F(p − 1, n − p).

Since λ is a monotone function of q_1/q_2, the likelihood ratio test rejects H0 whenever q_1/q_2 > C, i.e., whenever

F = [(n − p)/(p − 1)] (q_1/q_2) > F_{1−α}(p − 1, n − p),

so the constant C is determined from F_{1−α}(p − 1, n − p).

The analysis of variance table for the one-way classification in the fixed effect model is:

Source of variation  | Degrees of freedom | Sum of squares | Mean squares | F-value
Between populations  | p − 1              | q_1            | q_1/(p − 1)  | F = [(n − p)/(p − 1)]·(q_1/q_2)
Within populations   | n − p              | q_2            | q_2/(n − p)  |
Total                | n − 1              | q_1 + q_2      |              |

Note that

E[ Q_2/(n − p) ] = σ²

E[ Q_1/(p − 1) ] = σ² + Σ_i n_i(β_i − β̄)²/(p − 1), where β̄ = (1/p) Σ_i β_i.
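A minimal sketch of this one-way F-test is given below; the three samples are illustrative, and the hand computation of q_1 and q_2 is cross-checked against scipy.stats.f_oneway.

```python
# One-way ANOVA sketch: q1 (between), q2 (within) and the F statistic (illustrative data).
import numpy as np
from scipy import stats

samples = [np.array([23., 25., 21., 24.]),
           np.array([28., 27., 30.]),
           np.array([22., 20., 23., 21., 24.])]
p = len(samples)
n = sum(len(s) for s in samples)
grand_mean = np.concatenate(samples).mean()

q1 = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)   # between populations
q2 = sum(((s - s.mean()) ** 2).sum() for s in samples)             # within populations
F = (q1 / (p - 1)) / (q2 / (n - p))
p_value = 1 - stats.f.cdf(F, p - 1, n - p)
print(F, p_value)
print(stats.f_oneway(*samples))     # should agree with the hand computation
```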

Analysis of Variance and Design of Experiments-I
MODULE II
LECTURE - 9
GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Case of rejection of H0

If F > F_{1−α}(p − 1, n − p), then H0: β_1 = β_2 = ... = β_p is rejected. This means that at least one β_i is different from the other effects and is responsible for the rejection. So the objective is to investigate and find out such β_i and divide the population into groups such that the means of populations within a group are the same. This can be done by pairwise testing of the β's.

Test H0: β_i = β_k (i ≠ k) against H1: β_i ≠ β_k.

This can be tested using the following t-statistic:

t = (Y_io − Y_ko) / √( s²(1/n_i + 1/n_k) )

which follows the t-distribution with (n − p) degrees of freedom under H0, where s² = q_2/(n − p).

Thus the decision rule is to reject H0 at level α if the observed difference

|y_io − y_ko| > t_{1−α/2, n−p} √( s²(1/n_i + 1/n_k) ).

The quantity t_{1−α/2, n−p} √( s²(1/n_i + 1/n_k) ) is called the critical difference.

Thus the following steps are followed:

1. Compute all possible critical differences arising out of all possible pairs (β_i, β_k), i ≠ k = 1, 2,...,p.
2. Compare them with their observed differences.
3. Divide the p populations into different groups such that the populations in the same group have the same means.

The computation is simplified if n_i = n for all i. In such a case, the common critical difference (CCD) is

CCD = t_{1−α/2, n−p} √(2s²/n)

and the observed differences |y_io − y_ko|, i ≠ k, are compared with the CCD.

If |y_io − y_ko| > CCD,

then the corresponding effects/means y_io and y_ko come from populations with different means.
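The sketch below carries out this pairwise comparison with the critical difference for illustrative data, continuing the one-way set-up with s² = q_2/(n − p).

```python
# Pairwise comparison via the critical difference (illustrative data).
import numpy as np
from scipy import stats
from itertools import combinations

samples = [np.array([23., 25., 21., 24.]),
           np.array([28., 27., 30.]),
           np.array([22., 20., 23., 21., 24.])]
p = len(samples)
n = sum(len(s) for s in samples)
s2 = sum(((s - s.mean()) ** 2).sum() for s in samples) / (n - p)   # pooled error variance
alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, n - p)

for i, k in combinations(range(p), 2):
    cd = tcrit * np.sqrt(s2 * (1 / len(samples[i]) + 1 / len(samples[k])))
    diff = abs(samples[i].mean() - samples[k].mean())
    print(i + 1, k + 1, diff > cd)     # True -> the two means are declared different
```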

Note: In general we say that if there are three effects β_1, β_2, β_3, then

if H01: β_1 = β_2 (denote as event A) is accepted,
and if H02: β_2 = β_3 (denote as event B) is accepted,
then H03: β_1 = β_3 (denote as event C) will be accepted.

The question arises here in what sense we conclude such a statement about the acceptance of H03. The reason is as follows. Since event A ∩ B ⊂ C, we have

P(A ∩ B) ≤ P(C).

In this sense, the probability of the event C is higher than that of the intersection of the events, i.e., the probability that H03 is accepted is higher than the probability of acceptance of H01 and H02 both. So we conclude, in general, that the acceptance of H01 and H02 implies the acceptance of H03.

Multiple comparison tests

One interest in the analysis of variance is to decide whether the population means are equal or not. If the hypothesis of equal means is rejected, then one would like to divide the populations into subgroups such that all populations with the same means come to the same subgroup. This can be achieved by the multiple comparison tests.

A multiple comparison test procedure conducts the test of hypothesis for all the pairs of effects and compares them at a significance level α, i.e., it works on a per-comparison basis. This is based mainly on the t-statistic. If we want to ensure that the significance level α holds simultaneously for all group comparisons of interest, the appropriate multiple test procedure is one that controls the error rate on a per-experiment basis.

There are various multiple comparison tests available. We will discuss some of them in the context of one-way classification. In two-way or higher classifications, they can be used on similar lines.

1. Studentized range test

It is assumed in the Studentized range test that the p samples, each of size n, have been drawn from p normal populations. Let their sample means be y_1o, y_2o,...,y_po. These means are ranked and arranged in ascending order as y_1*, y_2*,...,y_p*, where y_1* = Min y_io and y_p* = Max y_io, i = 1, 2,...,p.

Find the range

R = y_p* − y_1*.

The Studentized range is defined as

q_{p, n−p} = R√n / s

where q_{α, p, γ} is the upper α·100% point of the Studentized range distribution with γ = n − p. The tables for q_{α, p, γ} are available.

The testing procedure involves the comparison of q_{p, n−p} with q_{α, p, γ} in the usual way as follows:

- if q_{p, n−p} < q_{α, p, n−p}, then conclude that β_1 = β_2 = ... = β_p;
- if q_{p, n−p} > q_{α, p, n−p}, then all β's in the group are not the same.
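A small sketch of this test follows; it assumes equal sample sizes, uses the within-group degrees of freedom from the one-way fit, and relies on scipy.stats.studentized_range (available in SciPy 1.7 and later). The data are illustrative.

```python
# Studentized range test sketch for p samples of equal size n (illustrative data).
import numpy as np
from scipy import stats

data = np.array([[23., 25., 21., 24.],
                 [28., 27., 30., 29.],
                 [22., 20., 23., 21.]])          # p rows (groups), n columns (observations)
p, n = data.shape
nu = p * (n - 1)                                  # error degrees of freedom
s2 = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum() / nu
means = data.mean(axis=1)
R = means.max() - means.min()
q_obs = R * np.sqrt(n) / np.sqrt(s2)
q_crit = stats.studentized_range.ppf(0.95, p, nu)   # upper 5% point q_{0.05, p, nu}
print(q_obs, q_crit, q_obs > q_crit)              # True -> not all means are equal
```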

2. Student–Newman–Keuls test

The Student–Newman–Keuls test is similar to the Studentized range test in the sense that the range is compared with α·100% points of the critical Studentized range W_p given by

W_p = q_{α, p, γ} √(s²/n).

The observed range R = y_p* − y_1* is now compared with W_p.

If R < W_p, then stop the process of comparison and conclude that β_1 = β_2 = ... = β_p.

If R > W_p, then
i. divide the ranked means y_1*, y_2*,...,y_p* into two subgroups containing (y_p*, y_{p−1}*,...,y_2*) and (y_{p−1}*, y_{p−2}*,...,y_1*);
ii. compute the ranges R_1 = y_p* − y_2* and R_2 = y_{p−1}* − y_1*, and compare the ranges R_1 and R_2 with W_{p−1}.

If either range R_1 or R_2 is smaller than W_{p−1}, then the means (or β_i's) in that group are equal.
If R_1 and/or R_2 are greater than W_{p−1}, then the (p − 1) means (or β_i's) in the group concerned are divided into two groups of (p − 2) means each, and the ranges of the two groups are compared with W_{p−2}.
Continue with this procedure until a group of i means (or β_i's) is found whose range does not exceed W_i.

By this method, the difference between any two means under test is significant when the range of the observed means of each and every subgroup containing the two means under test is significant according to the Studentized critical range. This procedure can be easily understood by the following flow chart.

Flow chart of the Student–Newman–Keuls procedure (summary):

1. Arrange the y_io's in increasing order: y_1* ≤ y_2* ≤ ... ≤ y_p*.
2. Compute R = y_p* − y_1* and compare it with W_p = q_{α, p, γ} √(s²/n).
   - If R < W_p: stop and conclude β_1 = β_2 = ... = β_p.
   - If R > W_p: continue.
3. Divide the ranked means into two groups (y_p*,...,y_2*) and (y_{p−1}*,...,y_1*); compute R_1 = y_p* − y_2* and R_2 = y_{p−1}* − y_1* and compare each with W_{p−1}. Four possibilities arise:
   - R_1 < W_{p−1} and R_2 < W_{p−1}: then β_2 = β_3 = ... = β_p and β_1 = β_2 = ... = β_{p−1}, hence β_1 = β_2 = ... = β_p.
   - R_1 < W_{p−1} and R_2 > W_{p−1}: then β_2 = β_3 = ... = β_p and at least one β_i ≠ β_j, i ≠ j = 1, 2,...,p−1, the odd one being β_1; one subgroup is (β_2, β_3,...,β_p) and the other group has only β_1.
   - R_1 > W_{p−1} and R_2 < W_{p−1}: then β_1 = β_2 = ... = β_{p−1} and at least one β_i ≠ β_j, i ≠ j = 2, 3,...,p, the odd one being β_p; one subgroup is (β_1, β_2,...,β_{p−1}) and the other group has only β_p.
   - R_1 > W_{p−1} and R_2 > W_{p−1}: divide the ranked means into four groups: (y_p*,...,y_3*), (y_{p−1}*,...,y_2*), (y_{p−1}*,...,y_2*) (same as the second) and (y_{p−2}*,...,y_1*); compute R_3 = y_p* − y_3*, R_4 = y_{p−1}* − y_2*, R_5 = y_{p−2}* − y_1* and compare them with W_{p−2}.
4. Continue till we get subgroups with the same β's.

Analysis of Variance and Design of Experiments-I
MODULE II
LECTURE - 10
GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

3. Duncan's multiple comparison test

The test procedure in Duncan's multiple comparison test is the same as in the Student–Newman–Keuls test except that the observed ranges are compared with Duncan's α_p·100% critical range

D_p = q*_{α_p, p, γ} √(s²/n)

where α_p = 1 − (1 − α)^(p−1) and q*_{α_p, p, γ} denotes the upper α_p·100% point of the Studentized range based on Duncan's range. Tables for Duncan's range are available.

Duncan feels that this test is better than the Student–Newman–Keuls test for comparing the differences between any two ranked means. Duncan regarded the Student–Newman–Keuls method as too stringent in the sense that the true differences between the means will tend to be missed too often. Duncan notes that in testing the equality of a subset of k, (2 ≤ k ≤ p), means through the null hypothesis, we are in fact testing whether (p − 1) orthogonal contrasts between the β's differ from zero or not. If these contrasts were tested in separate independent experiments, each at level α, the probability of incorrectly rejecting the null hypothesis would be 1 − (1 − α)^(p−1). So Duncan proposed to use α_p = 1 − (1 − α)^(p−1) in place of α in the Student–Newman–Keuls test.

[Reference: Contributions to Order Statistics, Wiley, 1962, Chapter 9 (Multiple decision and multiple comparisons, H. A. David, pages 147–148).]

Case of unequal sample sizes

When the sample means are not based on the same number of observations, the procedures based on the Studentized range, the Student–Newman–Keuls test and Duncan's test are not applicable. Kramer proposed that in Duncan's method, if a set of p means is to be tested for equality, then replace

q*_{α_p, p, γ} · s/√n   by   q*_{α_p, p, γ} · s · √[ (1/2)(1/n_U + 1/n_L) ]

where n_U and n_L are the numbers of observations corresponding to the largest and smallest means in the data. This procedure is only an approximate procedure but will tend to be conservative, since means based on a small number of observations will tend to be overrepresented in the extreme groups of means.

Another option is to replace n by the harmonic mean of n_1, n_2,...,n_p, i.e., p / Σ_i (1/n_i).
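The short sketch below computes both adjustments; the sample sizes, the pooled standard deviation and the identification of n_U and n_L are assumed for illustration.

```python
# Two adjustments for unequal sample sizes (illustrative values).
import numpy as np

n_sizes = np.array([4, 7, 5, 6])      # n_1, ..., n_p
n_U, n_L = 6, 4                       # sizes attached to the largest and smallest means (assumed)
s = 2.1                               # pooled standard deviation (assumed)

# Kramer's replacement for s/sqrt(n) in Duncan's method:
kramer_se = s * np.sqrt(0.5 * (1 / n_U + 1 / n_L))

# Alternative: replace n by the harmonic mean of n_1, ..., n_p:
n_harmonic = len(n_sizes) / np.sum(1.0 / n_sizes)
harmonic_se = s / np.sqrt(n_harmonic)
print(kramer_se, harmonic_se)
```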

The Least Significant Difference (LSD)

In the usual testing of H0: β_i = β_k against H1: β_i ≠ β_k, the t-statistic

t = (y_io − y_ko) / √( Var(y_io − y_ko) )

is used (with the variance estimated from the data), which follows a t-distribution, say with df degrees of freedom. Thus H0 is rejected whenever

|t| > t_{df, 1−α/2}

and it is concluded that β_i and β_k are significantly different. The inequality |t| > t_{df, 1−α/2} can be equivalently written as

|y_io − y_ko| > t_{df, 1−α/2} √( Var(y_io − y_ko) ).

If, for a pair of samples, |y_io − y_ko| exceeds t_{df, 1−α/2} √( Var(y_io − y_ko) ), then this indicates that the difference between β_i and β_k is significant. So, according to this, the quantity t_{df, 1−α/2} √( Var(y_io − y_ko) ) is the least difference of y_io and y_ko for which it will be declared that the difference between β_i and β_k is significant. Based on this idea, we use the pooled variance of the two samples, s², in Var(y_io − y_ko), and the Least Significant Difference (LSD) is defined as

LSD = t_{df, 1−α/2} √( s²(1/n_i + 1/n_k) ).

If n_i = n_k = n, then

LSD = t_{df, 1−α/2} √(2s²/n).

Now all p(p − 1)/2 pairs of y_io and y_ko (i ≠ k = 1, 2,...,p) are compared with the LSD. Use of the LSD criterion may not lead to good results if it is used for comparisons suggested by the data (largest/smallest sample mean) or if all pairwise comparisons are done without correction of the test level. If the LSD is used for all the pairwise comparisons, then these tests are not independent. Such correction for test levels was incorporated in Duncan's test.

Tukey's Honestly Significant Difference (HSD)

In this procedure, the Studentized range values q_{1−α/2, p, γ} are used in place of the t-quantiles, and the standard error of the difference of pooled means is used in place of the standard error of the mean in the common critical difference for testing H0: β_i = β_k against H1: β_i ≠ β_k. Tukey's Honestly Significant Difference is computed as

HSD = q_{1−α/2, p, γ} √( MS_error / n )

assuming all samples are of the same size n. All p(p − 1)/2 pairs |y_io − y_ko| are compared with the HSD. If |y_io − y_ko| > HSD, then β_i and β_k are significantly different.
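In practice the HSD comparisons can be obtained directly from statsmodels, as the sketch below shows with illustrative data (equal group sizes, as assumed by the HSD formula).

```python
# Tukey HSD sketch using statsmodels' pairwise_tukeyhsd (illustrative data).
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

y = np.array([23., 25., 21., 24.,
              28., 27., 30., 29.,
              22., 20., 23., 21.])
groups = np.repeat(['A', 'B', 'C'], 4)
result = pairwise_tukeyhsd(endog=y, groups=groups, alpha=0.05)
print(result.summary())     # a pair is flagged 'reject' when its difference exceeds the HSD
```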

We notice that all the multiple comparison test procedures discussed up to now are based on the testing of hypothesis. There is a one-to-one relationship between the testing of hypothesis and confidence interval estimation. So the confidence interval can also be used for such comparisons. Since H0: β_i = β_k is the same as H0: β_i − β_k = 0, we first establish this relationship and then describe Tukey's and Scheffé's procedures for the multiple comparison tests which are based on the confidence interval. We need the following concepts.

Contrast

A linear parametric function L = ℓ'β = Σ_i ℓ_iβ_i, where β = (β_1, β_2,...,β_p)' and ℓ = (ℓ_1, ℓ_2,...,ℓ_p)' are p×1 vectors of parameters and constants respectively, is said to be a contrast when Σ_i ℓ_i = 0.

For example, β_1 − β_2 = 0 and β_1 + 2β_2 − 3β_3 = 0 are contrasts, whereas β_1 + β_2 = 0, β_1 + β_2 + β_3 + β_4 = 0 and β_1 − 2β_2 − 3β_3 = 0 are not contrasts.

Orthogonal contrast

If L_1 = ℓ'β = Σ_i ℓ_iβ_i and L_2 = m'β = Σ_i m_iβ_i are contrasts such that ℓ'm = 0, or Σ_i ℓ_i m_i = 0, then L_1 and L_2 are called orthogonal contrasts.

For example, L_1 = β_1 + β_2 − β_3 − β_4 and L_2 = β_1 − β_2 + β_3 − β_4 are contrasts. They are also orthogonal contrasts.

The condition Σ_i ℓ_i m_i = 0 ensures that L̂_1 and L̂_2 are independent in the sense that Cov(L̂_1, L̂_2) = σ² Σ_i ℓ_i m_i = 0.
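A tiny numpy check of these two defining conditions (coefficients sum to zero; inner product of the two coefficient vectors is zero) is sketched below with the example vectors above.

```python
# Check that coefficient vectors define contrasts and that two contrasts are orthogonal.
import numpy as np

l1 = np.array([1., 1., -1., -1.])    # L1 = b1 + b2 - b3 - b4
l2 = np.array([1., -1., 1., -1.])    # L2 = b1 - b2 + b3 - b4

print(l1.sum() == 0 and l2.sum() == 0)   # both are contrasts
print(np.dot(l1, l2) == 0)               # orthogonal: sum of products of coefficients is 0
```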

Mutually orthogonal contrasts

If there are more than two contrasts, then they are said to be mutually orthogonal if they are pairwise orthogonal. It may be noted that the number of mutually orthogonal contrasts is the number of degrees of freedom.

Coming back to the multiple comparison tests: if the null hypothesis of equality of all effects is rejected, then it is reasonable to look for the contrasts which are responsible for the rejection. In terms of contrasts, it is desirable to have a procedure

i. that permits the selection of the contrasts after the data are available, and
ii. with which a known level of significance is associated.

Such procedures are Tukey's and Scheffé's procedures. Before discussing these procedures, let us consider the following example which illustrates the relationship between the testing of hypothesis and confidence intervals.

Example

Consider the test of hypothesis for

H0: β_i = β_j (i ≠ j = 1, 2,...,p)
or H0: β_i − β_j = 0
or H0: contrast = 0
or H0: L = 0.

The test statistic for H0: β_i = β_j is

t = [ (β̂_i − β̂_j) − (β_i − β_j) ] / √( Var(β̂_i − β̂_j) ) = (L̂ − L) / √( Var(L̂) )

where β̂ denotes the maximum likelihood (or least squares) estimator of β, the variance is estimated from the data, and t follows a t-distribution with df degrees of freedom. This statistic, in fact, can be extended to any linear contrast, e.g.,

L = β_1 + β_2 − β_3 − β_4, L̂ = β̂_1 + β̂_2 − β̂_3 − β̂_4.

The decision rule is: reject H0: L = 0 against H1: L ≠ 0 if

|L̂| > t_df √( Var(L̂) ).

The 100(1 − α)% confidence interval of L is obtained from

P[ −t_df ≤ (L̂ − L)/√( Var(L̂) ) ≤ t_df ] = 1 − α

or

P[ L̂ − t_df √( Var(L̂) ) ≤ L ≤ L̂ + t_df √( Var(L̂) ) ] = 1 − α,

so that the 100(1 − α)% confidence interval of L is

[ L̂ − t_df √( Var(L̂) ), L̂ + t_df √( Var(L̂) ) ]

and

L̂ − t_df √( Var(L̂) ) ≤ L ≤ L̂ + t_df √( Var(L̂) ).

If this interval includes L = 0 between the lower and upper confidence limits, then H0: L = 0 is accepted. Our objective is thus to know whether the confidence interval contains zero or not.

Suppose for some given data the confidence intervals for β_1 − β_2 and β_1 − β_3 are obtained as

−3 ≤ β_1 − β_2 ≤ 2 and 2 ≤ β_1 − β_3 ≤ 4.

Thus we find that the interval of β_1 − β_2 includes zero, which implies that H0: β_1 − β_2 = 0 is accepted. Thus β_1 = β_2. On the other hand, the interval of β_1 − β_3 does not include zero, and so H0: β_1 − β_3 = 0 is not accepted. Thus β_1 ≠ β_3.

If the interval of β_1 − β_3 were −1 ≤ β_1 − β_3 ≤ 1, then H0: β_1 = β_3 would be accepted. If both H0: β_1 = β_2 and H0: β_1 = β_3 are accepted, then we can conclude that β_1 = β_2 = β_3.

Analysis of Variance and Design of Experiments-I
MODULE II
LECTURE - 11
GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Tukey's procedure for multiple comparison (T-method)

The T-method uses the distribution of the Studentized range statistic. (The S-method, discussed next, utilizes the F-distribution.) The T-method can be used to make simultaneous confidence statements about contrasts (β_i − β_j) among a set of parameters {β_1, β_2,...,β_p} and an estimate s² of the error variance if certain restrictions are satisfied.

These restrictions have to be viewed according to the given conditions. For example, one of the restrictions is that all β̂_i's have equal variances. In the set-up of one-way classification, β̂_i has mean Ȳ_i and its variance is σ²/n_i. This reduces to the simple condition that all n_i's are the same, i.e., n_i = n for all i, so that all the variances are the same.

Another assumption is that β̂_1, β̂_2,...,β̂_p are statistically independent and the only contrasts considered are the p(p − 1)/2 differences {β_i − β_j, i ≠ j = 1, 2,...,p}.

We make the following assumptions:

i. β̂_1, β̂_2,...,β̂_p are statistically independent;
ii. β̂_i ~ N(β_i, a²σ²), i = 1, 2,...,p, where a > 0 is a known constant;
iii. s² is an independent estimate of σ² with γ degrees of freedom, i.e., γs²/σ² ~ χ²(γ) (here γ = n − p); and
iv. s² is statistically independent of β̂_1, β̂_2,...,β̂_p.

The statement of the T-method is as follows.

Under the assumptions (i)–(iv), the probability is (1 − α) that the values of all contrasts L = Σ_i C_iβ_i (with Σ_i C_i = 0) simultaneously satisfy

L̂ − Ts·(1/2)Σ_i|C_i| ≤ L ≤ L̂ + Ts·(1/2)Σ_i|C_i|

where L̂ = Σ_i C_iβ̂_i, β̂_i is the maximum likelihood (or least squares) estimate of β_i, and T = a q_{α, p, γ}, with q_{α, p, γ} being the upper α·100% point of the distribution of the Studentized range.

Note that if L is a contrast like β_i − β_j (i ≠ j), then (1/2)Σ_i|C_i| = 1 and, since Var(β̂_i) = σ²/n here, a = 1/√n, so the interval simplifies to

(β̂_i − β̂_j) − Ts ≤ β_i − β_j ≤ (β̂_i − β̂_j) + Ts

where T = q_{α, p, γ}/√n. Thus the maximum likelihood (or least squares) estimate L̂ = β̂_i − β̂_j of L = β_i − β_j is said to be significantly different from zero according to the T-criterion if the interval (β̂_i − β̂_j − Ts, β̂_i − β̂_j + Ts) does not cover β_i − β_j = 0, i.e., if

|β̂_i − β̂_j| > Ts,

or, more generally, if

|L̂| > Ts·(1/2)Σ_i|C_i|.

The testing now involves the following steps:

- Compute L̂ or β̂_i − β̂_j.
- Compute all possible pairwise differences.
- Compare all the differences with (s/√n)·q_{α, p, γ}·(1/2)Σ_i|C_i|.
- If |L̂| (or |β̂_i − β̂_j|) > Ts·(1/2)Σ_i|C_i|, then β_i and β_j are significantly different, where T = q_{α, p, γ}/√n.

Tables for T are available.

When the sample sizes are not equal, the Tukey–Kramer procedure suggests comparing L̂ with

q_{α, p, γ} · s · √[ (1/2)(1/n_i + 1/n_j) ] · (1/2)Σ_i|C_i|.

The Scheffé's method (S-method) of multiple comparison

The S-method generally gives shorter confidence intervals than the T-method. It can be used in a number of situations where the T-method is not applicable, e.g., when the sample sizes are not equal.

A set L of estimable functions {ψ} is called a p-dimensional space of estimable functions if there exist p linearly independent estimable functions (ψ_1, ψ_2,...,ψ_p) such that every ψ in L is of the form ψ = Σ_i C_iψ_i, where C_1, C_2,...,C_p are known constants. In other words, L is the set of all linear combinations of ψ_1, ψ_2,...,ψ_p.

Under the assumption that the parametric space is Y ~ N(Xβ, σ²I) with rank(X) = p, β = (β_1,...,β_p)' and X an n×p matrix, consider a p-dimensional space L of estimable functions generated by a set of p linearly independent estimable functions {ψ_1, ψ_2,...,ψ_p}.

For any ψ ∈ L, let ψ̂ = Σ_{i=1}^{n} C_i y_i be its least squares (or maximum likelihood) estimator, with

Var(ψ̂) = σ² Σ_i C_i² = σ²_ψ̂ (say)

and

σ̂²_ψ̂ = s² Σ_i C_i²,

where s² is the mean square due to error with (n − p) degrees of freedom.

The statement of the S-method is as follows.

Under the parametric space Ω, the probability is (1 − α) that simultaneously for all ψ ∈ L,

ψ̂ − S σ̂_ψ̂ ≤ ψ ≤ ψ̂ + S σ̂_ψ̂

where the constant S = √( p F_{1−α}(p, n − p) ).

Method: For a given space L of estimable functions and confidence coefficient (1 − α), the least squares (or maximum likelihood) estimate ψ̂ of ψ ∈ L will be said to be significantly different from zero according to the S-criterion if the confidence interval

(ψ̂ − S σ̂_ψ̂, ψ̂ + S σ̂_ψ̂)

does not cover ψ = 0, i.e., if |ψ̂| > S σ̂_ψ̂.

The S-method is less sensitive to the violation of the assumptions of normality and homogeneity of variances.
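A minimal sketch of a Scheffé simultaneous interval for one contrast in the one-way set-up follows; the data and the contrast coefficients are illustrative, and S is computed from the document's formula S = √(p F_{1−α}(p, n − p)).

```python
# Scheffe interval sketch for a contrast psi = sum(C_i * beta_i) (illustrative data).
import numpy as np
from scipy import stats

samples = [np.array([23., 25., 21., 24.]),
           np.array([28., 27., 30.]),
           np.array([22., 20., 23., 21., 24.])]
p = len(samples)
n = sum(len(s) for s in samples)
s2 = sum(((s - s.mean()) ** 2).sum() for s in samples) / (n - p)   # mean square error

C = np.array([1.0, -0.5, -0.5])                    # an assumed contrast among the 3 means
psi_hat = sum(c * s.mean() for c, s in zip(C, samples))
se_psi = np.sqrt(s2 * sum(c**2 / len(s) for c, s in zip(C, samples)))
S = np.sqrt(p * stats.f.ppf(0.95, p, n - p))
print(psi_hat - S * se_psi, psi_hat + S * se_psi)  # interval covering 0 -> psi not significant
```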

Comparison of Tukey's and Scheffé's methods

1. Tukey's method can be used only with equal sample sizes for all factor levels, but the S-method is applicable whether the sample sizes are equal or not.
2. Although Tukey's method is applicable for any general contrast, the procedure is more powerful when comparing simple pairwise differences and not making more complex comparisons.
3. If only pairwise comparisons are of interest, and all factor levels have equal sample sizes, Tukey's method gives shorter confidence intervals and thus is more powerful.
4. In the case of comparisons involving general contrasts, Scheffé's method tends to give narrower confidence intervals and provides a more powerful test.
5. Scheffé's method is less sensitive to the violations of the assumptions of normal distribution and homogeneity of variances.

Analysis of Variance and Design of Experiments-I
MODULE III
LECTURE - 12
EXPERIMENTAL DESIGN MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

We consider the models which are used in designing an experiment.

The experimental conditions, the experimental set-up and the objective of the study essentially determine what type of design is to be used and hence which type of design model can be used for the further statistical analysis to arrive at the decisions. These models are based on one-way classification, two-way classification (with or without interactions), etc. We discuss them now in detail in a few set-ups which can be extended further to any order of classification. We discuss them here under the set-up of one-way and two-way classifications.

It may be noted that it has already been described how to develop the likelihood ratio tests for testing the hypothesis of equality of more than two means from normal distributions, and now we will concentrate more on deriving the same tests through the least squares principle under the set-up of the linear regression model. The design matrix is assumed to be not necessarily of full rank and consists of 0's and 1's only.

One-way classification

Let p random samples from p normal populations with the same variance but different means and different sample sizes have been independently drawn.

Let the observations follow the linear regression model set-up and let Y_ij denote the j-th observation of the dependent variable Y when the effect of the i-th level of the factor is present. Then the Y_ij are independently normally distributed with

E(Y_ij) = μ + α_i, i = 1, 2,...,p, j = 1, 2,...,n_i
Var(Y_ij) = σ²

where

μ is the general mean effect:
 - it is fixed;
 - it gives an idea about the general conditions of the experimental units and treatments;

α_i is the effect of the i-th level of the factor:
 - it can be fixed or random.

Example

Consider a medicine experiment in which there are three different dosages of a medicine — 2 mg, 5 mg, 10 mg — which are given to patients having fever. These are the 3 levels of the medicine. Let Y denote the time taken by the medicine to reduce the body temperature from high to normal. Suppose two patients have been given the 2 mg dosage; then Y_11 and Y_12 will denote their responses. So we can write that when α_1 = 2 mg is given to the two patients, then

E(Y_1j) = μ + α_1; j = 1, 2.

Similarly, if the 5 mg and 10 mg dosages are given to 4 and 7 patients respectively, then the responses follow the models

E(Y_2j) = μ + α_2; j = 1, 2, 3, 4
E(Y_3j) = μ + α_3; j = 1, 2,...,7.

Here μ denotes the general mean effect, which may be thought of as follows: the human body has a tendency to fight against the fever, so the time taken by the medicine to bring down the temperature depends on many factors like body weight, height, etc. So μ denotes the general effect of all these factors, which is present in all the observations.

In the terminology of the linear regression model, μ denotes the intercept term, which is the value of the response variable when all the independent variables are set to zero. In experimental designs, the models with intercept term are more commonly used, and so generally we consider these types of models.

Also, we can express

Y_ij = μ + α_i + ε_ij; i = 1, 2,...,p, j = 1, 2,...,n_i

where ε_ij is the random error component in Y_ij. It indicates the variations due to uncontrolled causes which can influence the observations. We assume that the ε_ij's are identically and independently distributed as N(0, σ²) with E(ε_ij) = 0 and Var(ε_ij) = σ².

Note that the general linear model considered is E(Y) = Xβ, for which Y_ij can be written as E(Y_ij) = β_i when all the entries in X are 0's or 1's. This model can also be re-expressed in the form E(Y_ij) = μ + α_i. This gives rise to some more issues.

Consider

E(Y_ij) = β_i = μ + (β_i − μ) = μ + α_i

where

μ = (1/p) Σ_i β_i and α_i = β_i − μ.

Now let us see the changes in the structure of the design matrix and the vector of regression coefficients. The model E(Y_ij) = β_i = μ + α_i can now be written as

E(Y) = X*β*, Cov(Y) = σ²I

where β* = (μ, α_1, α_2,...,α_p)' is a (p + 1)×1 vector and X* = [1_n, X] is an n×(p + 1) matrix whose first column 1_n consists of all ones, while X denotes the earlier defined design matrix with

first n_1 rows as (1, 0, 0,...,0),
second n_2 rows as (0, 1, 0,...,0), and
last n_p rows as (0, 0, 0,...,1).

We earlier assumed that rank(X) = p, but can we also say that X* has full column rank, i.e., rank(X*) = p + 1, in the present case? Since the first column of X* is the vector sum of all its remaining p columns,

rank(X*) = p.

It is thus apparent that not all the linear parametric functions of μ, α_1, α_2,...,α_p are estimable. The question now arises: what kind of linear parametric functions are estimable?

Consider any linear estimator

L = Σ_i Σ_j a_ij Y_ij with C_i = Σ_j a_ij.

Now

E(L) = Σ_i Σ_j a_ij E(Y_ij)
     = Σ_i Σ_j a_ij (μ + α_i)
     = μ Σ_i Σ_j a_ij + Σ_i α_i Σ_j a_ij
     = μ Σ_i C_i + Σ_i C_i α_i.

Thus Σ_i C_iα_i is estimable if and only if Σ_i C_i = 0, i.e., if Σ_i C_iα_i is a contrast.

Thus, in general, neither Σ_i α_i nor any of μ, α_1, α_2,...,α_p is estimable. If it is a contrast, then it is estimable.

This effect and outcome can also be seen from the following explanation based on the estimation of the parameters μ, α_1, α_2,...,α_p.

Consider the least squares estimation of μ, α_1, α_2,...,α_p with estimators μ̂, α̂_1, α̂_2,...,α̂_p respectively.

Minimize the sum of squares due to the ε_ij's,

S = Σ_i Σ_j ε_ij² = Σ_i Σ_j (y_ij − μ − α_i)²,

to obtain μ̂, α̂_1,...,α̂_p:

(a) ∂S/∂μ = 0  ⇒  Σ_i Σ_j (y_ij − μ − α_i) = 0

(b) ∂S/∂α_i = 0  ⇒  Σ_j (y_ij − μ − α_i) = 0, i = 1, 2,...,p.

Note that (a) can be obtained from (b) or vice versa. So (a) and (b) are linearly dependent in the sense that there are (p + 1) unknowns and only p linearly independent equations. Consequently μ̂, α̂_1,...,α̂_p do not have a unique solution. The same applies to the maximum likelihood estimation of μ, α_1,...,α_p.

If a side condition

Σ_i n_iα_i = 0 or Σ_i n_iα̂_i = 0

is imposed, then (a) and (b) have a unique solution:

μ̂ = (1/n) Σ_i Σ_j y_ij = y_oo, where n = Σ_i n_i,

α̂_i = (1/n_i) Σ_j y_ij − μ̂ = y_io − y_oo.

In case all the sample sizes are the same, the condition Σ_i n_iα_i = 0 or Σ_i n_iα̂_i = 0 reduces to Σ_i α_i = 0 or Σ_i α̂_i = 0.

So the model y_ij = μ + α_i + ε_ij needs to be rewritten so that all the parameters can be uniquely estimated. Thus

Y_ij = μ + α_i + ε_ij
     = (μ + ᾱ) + (α_i − ᾱ) + ε_ij
     = μ* + α_i* + ε_ij

where

μ* = μ + ᾱ, α_i* = α_i − ᾱ, ᾱ = (1/p) Σ_i α_i

and Σ_i α_i* = 0.

This is a reparameterized form of the linear model.

Thus in a linear model, when X is not of full rank, the parameters do not have unique estimates. A restriction Σ_i α_i = 0 (or equivalently Σ_i n_iα_i = 0 in case all the n_i's are not the same) can be added, and then the least squares (or maximum likelihood) estimators obtained are unique. The model

E(Y_ij) = μ* + α_i*, with Σ_i α_i* = 0,

is called a reparameterization of the original linear model.

Analysis of Variance and Design of Experiments-I
MODULE III
LECTURE - 13
EXPERIMENTAL DESIGN MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Let us now consider the analysis of variance with the additional constraint. Let

Y_ij = μ_i + ε_ij, i = 1, 2,...,p; j = 1, 2,...,n_i
     = μ + (μ_i − μ) + ε_ij
     = μ + α_i + ε_ij

with

μ = μ̄ = (1/p) Σ_i μ_i, α_i = μ_i − μ, Σ_i n_iα_i = 0, n = Σ_i n_i,

and the ε_ij's identically and independently distributed with mean 0 and variance σ².

The null hypothesis is

H0: α_1 = α_2 = ... = α_p = 0

and the alternative hypothesis is

H1: at least one α_i ≠ α_j for some i ≠ j.

This model is a one-way layout in the sense that the observations y_ij's are assumed to be affected by only one treatment effect α_i. So the null hypothesis is equivalent to testing the equality of the p population means or, equivalently, the equality of the p treatment effects.
We use the principle of least squares to estimate the parameters μ, α_1, α_2,...,α_p.

Minimize the error sum of squares

E = Σ_i Σ_j ε_ij² = Σ_i Σ_j (y_ij − μ − α_i)²

with respect to μ, α_1, α_2,...,α_p.

The normal equations are obtained as

∂E/∂μ = 0  ⇒  −2 Σ_i Σ_j (y_ij − μ − α_i) = 0, or nμ + Σ_i n_iα_i = Σ_i Σ_j y_ij   (1)

∂E/∂α_i = 0  ⇒  −2 Σ_j (y_ij − μ − α_i) = 0, or n_iμ + n_iα_i = Σ_j y_ij (i = 1, 2,...,p).   (2)

Using Σ_i n_iα_i = 0 in (1) gives

μ̂ = (1/n) Σ_i Σ_j y_ij = G/n = y_oo

where G = Σ_i Σ_j y_ij is the grand total of all the observations.

Substituting μ̂ in (2) gives

α̂_i = (1/n_i) Σ_j y_ij − μ̂ = T_i/n_i − y_oo = y_io − y_oo

where T_i = Σ_j y_ij is the treatment total due to the i-th effect α_i, i.e., the total of all the observations receiving the i-th treatment, and y_io = (1/n_i) Σ_j y_ij.
Now the fitted model is y_ij = μ̂ + α̂_i, and the error sum of squares after substituting μ̂ and α̂_i in E becomes

E = Σ_i Σ_j (y_ij − μ̂ − α̂_i)²
  = Σ_i Σ_j [ (y_ij − y_oo) − (y_io − y_oo) ]²
  = Σ_i Σ_j (y_ij − y_oo)² − Σ_i n_i (y_io − y_oo)²
  = [ Σ_i Σ_j y_ij² − G²/n ] − [ Σ_i T_i²/n_i − G²/n ]

where the total sum of squares (TSS) is

TSS = Σ_i Σ_j (y_ij − y_oo)² = Σ_i Σ_j y_ij² − G²/n,

and G²/n is called the correction factor (CF).
To obtain a measure of variation due to treatments, let

H0: α_1 = α_2 = ... = α_p = 0

be true. Then the model becomes

Y_ij = μ + ε_ij, i = 1, 2,...,p; j = 1, 2,...,n_i.

Minimizing the error sum of squares

E_1 = Σ_i Σ_j (y_ij − μ)²

with respect to μ, the normal equation is obtained as

∂E_1/∂μ = 0  ⇒  −2 Σ_i Σ_j (y_ij − μ) = 0  ⇒  μ̂ = G/n = y_oo.

Substituting μ̂ in E_1, the error sum of squares becomes

E_1 = Σ_i Σ_j (y_ij − μ̂)² = Σ_i Σ_j (y_ij − y_oo)² = Σ_i Σ_j y_ij² − G²/n.

Note that
E_1 contains variation due to both treatment and error,
E contains variation due to error only,
so E_1 − E contains variation due to treatment only.

The sum of squares due to treatment is given by

SSTr = E_1 − E = Σ_i n_i (y_io − y_oo)² = Σ_i T_i²/n_i − G²/n.

The following quantity is called the error sum of squares or sum of squares due to error (SSE):

SSE = Σ_i Σ_j (y_ij − y_io)².

These sums of squares form the basis for the development of the tools in the analysis of variance. We can write

TSS = SSTr + SSE.

The distribution of degrees of freedom among these sums of squares is as follows.

The total sum of squares is based on n quantities subject to the constraint Σ_i Σ_j (y_ij − y_oo) = 0, so TSS carries (n − 1) degrees of freedom.

The sum of squares due to treatments is based on p quantities subject to the constraint Σ_i n_i(y_io − y_oo) = 0, so SSTr has (p − 1) degrees of freedom.

The sum of squares due to error is based on n quantities subject to the p constraints Σ_j (y_ij − y_io) = 0, i = 1, 2,...,p, so SSE carries (n − p) degrees of freedom.

Also note that, since TSS = SSTr + SSE, the TSS has been divided into two orthogonal components, SSTr and SSE. Moreover, TSS, SSTr and SSE can all be expressed in quadratic forms. Since the ε_ij are assumed to be identically and independently distributed following N(0, σ²), the y_ij are also independently distributed following N(μ + α_i, σ²). Now using Theorems 7 and 8 with q_1 = SSTr and q_2 = SSE, we have under H0

SSTr/σ² ~ χ²(p − 1) and SSE/σ² ~ χ²(n − p).

Moreover, SSTr and SSE are independently distributed.

The mean square is defined as the sum of squares divided by its degrees of freedom. So the mean square due to treatment is

MSTr = SSTr/(p − 1)

and the mean square due to error is

MSE = SSE/(n − p).

Thus, under H0,

F = MSTr/MSE ~ F(p − 1, n − p).

The decision rule is to reject H0 if

F > F_{1−α}(p − 1, n − p)

at the 100α% level of significance.

If H0 does not hold true, then

MSTr/MSE ~ noncentral F(p − 1, n − p, δ)

where δ = Σ_i n_iα_i²/σ² is the noncentrality parameter.

Note that the test statistic MSTr/MSE can also be obtained from the likelihood ratio test.

If H0 is rejected, then we go for the multiple comparison tests and try to divide the population into several groups having the same effects.

The analysis of variance table is as follows:

Source of variation | Degrees of freedom | Sum of squares | Mean squares | F-value
Treatments          | p − 1              | SSTr           | MSTr         | MSTr/MSE
Error               | n − p              | SSE            | MSE          |
Total               | n − 1              | TSS            |              |
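The sketch below builds this table by hand from treatment totals T_i and the correction factor G²/n, using illustrative data; it verifies the identity TSS = SSTr + SSE and computes F = MSTr/MSE.

```python
# One-way decomposition TSS = SSTr + SSE via treatment totals (illustrative data).
import numpy as np

samples = [np.array([23., 25., 21., 24.]),
           np.array([28., 27., 30.]),
           np.array([22., 20., 23., 21., 24.])]
p = len(samples)
n = sum(len(s) for s in samples)
G = sum(s.sum() for s in samples)                 # grand total
CF = G**2 / n                                     # correction factor

TSS = sum((s**2).sum() for s in samples) - CF
SSTr = sum(s.sum()**2 / len(s) for s in samples) - CF
SSE = TSS - SSTr
MSTr, MSE = SSTr / (p - 1), SSE / (n - p)
print(TSS, SSTr + SSE, MSTr / MSE)                # TSS equals SSTr + SSE; F = MSTr/MSE
```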

Analysis of Variance and Design of Experiments-I
MODULE III
LECTURE - 14
EXPERIMENTAL DESIGN MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Now we find the expectations of SSTr and SSE.

E(SSTr) = E[ Σ_i n_i (y_io − y_oo)² ]
        = E[ Σ_i n_i { (μ + α_i + ε_io) − (μ + ε_oo) }² ]

where

ε_io = (1/n_i) Σ_j ε_ij, ε_oo = (1/n) Σ_i Σ_j ε_ij and Σ_i n_iα_i/n = 0.

Hence

E(SSTr) = E[ Σ_i n_i { α_i + (ε_io − ε_oo) }² ]
        = Σ_i n_iα_i² + Σ_i n_i E(ε_io − ε_oo)² + 0.

Since

E(ε_io²) = Var(ε_io) = (1/n_i²) · n_iσ² = σ²/n_i,
E(ε_oo²) = Var(ε_oo) = (1/n²) · nσ² = σ²/n,
E(ε_io ε_oo) = Cov(ε_io, ε_oo) = (1/(n_i n)) Cov( Σ_j ε_ij, Σ_i Σ_j ε_ij ) = n_iσ²/(n_i n) = σ²/n,

we get E(ε_io − ε_oo)² = σ²/n_i − σ²/n, and therefore

E(SSTr) = Σ_i n_iα_i² + Σ_i n_i (σ²/n_i − σ²/n)
        = Σ_i n_iα_i² + (p − 1)σ²

or

E[ SSTr/(p − 1) ] = σ² + Σ_i n_iα_i²/(p − 1),

i.e.,

E(MSTr) = σ² + (1/(p − 1)) Σ_i n_iα_i².

Next,

E(SSE) = E[ Σ_i Σ_j (y_ij − y_io)² ]
       = E[ Σ_i Σ_j { (μ + α_i + ε_ij) − (μ + α_i + ε_io) }² ]
       = E[ Σ_i Σ_j (ε_ij − ε_io)² ]
       = Σ_i Σ_j E( ε_ij² + ε_io² − 2ε_ij ε_io )
       = Σ_i Σ_j ( σ² + σ²/n_i − 2σ²/n_i )
       = σ² Σ_i Σ_j (n_i − 1)/n_i
       = σ² Σ_i (n_i − 1)
       = (n − p)σ²,

so

E[ SSE/(n − p) ] = σ², i.e., E(MSE) = σ².

Thus MSE is an unbiased estimator of σ².

Two-way classification under fixed-effects model

Suppose the response of an outcome is affected by two factors A and B. For example, suppose I varieties of mangoes are grown on I different plots of the same size in each of J different locations. All plots are given the same treatment, like an equal amount of water, an equal amount of fertilizer, etc. So there are two factors in the experiment which affect the yield of mangoes:

- location (A), and
- variety of mangoes (B).

Such an experiment is called a two-factor experiment. The different locations correspond to the different levels of A and the different varieties correspond to the different levels of factor B. The observations are collected on a per-plot basis.

The combined effect of the two factors (A and B in our case) is called the interaction effect (of A and B). Mathematically, let a and b be the levels of factors A and B respectively; then a function f(a, b) is called a function of no interaction if and only if there exist functions g(a) and h(b) such that f(a, b) = g(a) + h(b). Otherwise the factors are said to interact. For a function f(a, b) of no interaction,

f(a_1, b) = g(a_1) + h(b)
f(a_2, b) = g(a_2) + h(b)
f(a_1, b) − f(a_2, b) = g(a_1) − g(a_2),

which is independent of b. Such no-interaction functions are called additive functions.

Now there are two options:

- only one observation per plot is collected, or
- more than one observation per plot is collected.

If there is only one observation per plot, then there cannot be any interaction effect among the observations and we assume it to be zero. If there is more than one observation per plot, then the interaction effect among the observations can be considered.

We consider here two cases:

1. One observation per plot, in which the interaction effect is zero.
2. More than one observation per plot, in which the interaction effect is present.

Two-way classification without interaction

Let y_ij be the response of the observation from the i-th level of the first factor, say A, and the j-th level of the second factor, say B. Assume the y_ij are independently distributed as N(μ_ij, σ²), i = 1, 2,...,I, j = 1, 2,...,J.

This can be represented in the form of a linear model as

E(Y_ij) = μ_ij
        = μ_oo + (μ_io − μ_oo) + (μ_oj − μ_oo) + (μ_ij − μ_io − μ_oj + μ_oo)
        = μ + α_i + β_j + γ_ij

where

μ = μ_oo
α_i = μ_io − μ_oo
β_j = μ_oj − μ_oo
γ_ij = μ_ij − μ_io − μ_oj + μ_oo

with

Σ_i α_i = Σ_i (μ_io − μ_oo) = 0 and Σ_j β_j = Σ_j (μ_oj − μ_oo) = 0.

Here

α_i : effect of the i-th level of factor A, or the excess of the mean of the i-th level of A over the general mean;
β_j : effect of the j-th level of B, or the excess of the mean of the j-th level of B over the general mean;
γ_ij : interaction effect of the i-th level of A and the j-th level of B.

Here we assume γ_ij = 0, as we have only one observation per plot.

We also assume that the model E(Y_ij) = μ_ij is a full-rank model, so that μ_ij and all linear parametric functions of μ_ij are estimable.

The total number of observations is I·J, which can be arranged in a two-way classified I×J table where the rows correspond to the different levels of A and the columns correspond to the different levels of B.
The observation vector Y = (y_11, y_12,...,y_1J,..., y_I1, y_I2,...,y_IJ)' and the design matrix X in this case consist of 0's and 1's: X has one column for μ (all entries 1), one column for each α_i (i = 1, 2,...,I) and one column for each β_j (j = 1, 2,...,J); the row corresponding to y_ij has the entry 1 in the columns of μ, α_i and β_j, and 0 elsewhere.

If the design matrix is not of full rank, then the model can be reparameterized. In such a case, we can start the analysis by assuming that the model E(Y_ij) = μ + α_i + β_j is obtained after reparameterization.

There are two null hypotheses of interest:

H0α: α_1 = α_2 = ... = α_I = 0
H0β: β_1 = β_2 = ... = β_J = 0.

Now we derive the least squares estimators (or equivalently the maximum likelihood estimators) of μ, α_i and β_j, i = 1, 2,...,I, j = 1, 2,...,J, by minimizing the error sum of squares

E = Σ_i Σ_j (y_ij − μ − α_i − β_j)².

The normal equations are obtained as

∂E/∂μ = 0  ⇒  −2 Σ_i Σ_j (y_ij − μ − α_i − β_j) = 0
∂E/∂α_i = 0  ⇒  −2 Σ_j (y_ij − μ − α_i − β_j) = 0, i = 1, 2,...,I
∂E/∂β_j = 0  ⇒  −2 Σ_i (y_ij − μ − α_i − β_j) = 0, j = 1, 2,...,J.

Solving the normal equations and using Σ_i α_i = 0 and Σ_j β_j = 0, the least squares estimators are obtained as

μ̂ = (1/(IJ)) Σ_i Σ_j y_ij = G/(IJ) = y_oo

α̂_i = (1/J) Σ_j y_ij − y_oo = T_i/J − y_oo = y_io − y_oo, i = 1, 2,...,I

β̂_j = (1/I) Σ_i y_ij − y_oo = B_j/I − y_oo = y_oj − y_oo, j = 1, 2,...,J

where
T_i : treatment total due to the i-th effect α_i, i.e., the sum of all the observations receiving the i-th treatment, and
B_j : block total due to the j-th effect β_j, i.e., the sum of all the observations in the j-th block.

Thus the error sum of squares is

SSE = Min over μ, α_i, β_j of Σ_i Σ_j (y_ij − μ − α_i − β_j)²
    = Σ_i Σ_j (y_ij − μ̂ − α̂_i − β̂_j)²
    = Σ_i Σ_j [ (y_ij − y_oo) − (y_io − y_oo) − (y_oj − y_oo) ]²
    = Σ_i Σ_j (y_ij − y_io − y_oj + y_oo)²
    = Σ_i Σ_j (y_ij − y_oo)² − J Σ_i (y_io − y_oo)² − I Σ_j (y_oj − y_oo)²

which carries

IJ − (I − 1) − (J − 1) − 1 = (I − 1)(J − 1)

degrees of freedom.

Analysis of Variance and Design of Experiments-I
MODULE III
LECTURE - 15
EXPERIMENTAL DESIGN MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Next we consider the estimation of μ and β_j under the null hypothesis H0α: α_1 = α_2 = ... = α_I = 0 by minimizing the error sum of squares

E_1 = Σ_i Σ_j (y_ij − μ − β_j)².

The normal equations are obtained from

∂E_1/∂μ = 0 and ∂E_1/∂β_j = 0, j = 1, 2,...,J,

which on solving give the least squares estimates

μ̂ = y_oo, β̂_j = y_oj − y_oo.

The sum of squares due to H0α is

Min_{μ, β_j} E_1 = Σ_i Σ_j (y_ij − μ̂ − β̂_j)²
               = J Σ_i (y_io − y_oo)² + Σ_i Σ_j (y_ij − y_io − y_oj + y_oo)²
               = [sum of squares due to factor A] + [error sum of squares].

Thus the sum of squares due to deviation from H0α (or the sum of squares due to rows, or the sum of squares due to factor A) is

SSA = J Σ_i (y_io − y_oo)² = J Σ_i y_io² − IJ y_oo²

and it carries

(IJ − J) − (I − 1)(J − 1) = I − 1

degrees of freedom.
Now we find the estimates of μ and α_i under H0β: β_1 = β_2 = ... = β_J = 0 by minimizing

E_2 = Σ_i Σ_j (y_ij − μ − α_i)².

The normal equations are

∂E_2/∂μ = 0 and ∂E_2/∂α_i = 0, i = 1, 2,...,I,

which on solving give the estimators

μ̂ = y_oo, α̂_i = y_io − y_oo.

The minimum value of the error sum of squares is obtained as

Min_{μ, α_i} E_2 = Σ_i Σ_j (y_ij − μ̂ − α̂_i)² = Σ_i Σ_j (y_ij − y_io)²
               = I Σ_j (y_oj − y_oo)² + Σ_i Σ_j (y_ij − y_io − y_oj + y_oo)²
               = [sum of squares due to factor B] + [error sum of squares].

The sum of squares due to deviation from H0β (or the sum of squares due to columns, or the sum of squares due to factor B) is

SSB = I Σ_j (y_oj − y_oo)² = I Σ_j y_oj² − IJ y_oo²

and its degrees of freedom are

(IJ − I) − (I − 1)(J − 1) = J − 1.

Note that the total sum of squares is

TSS = Σ_i Σ_j (y_ij − y_oo)²
    = Σ_i Σ_j [ (y_io − y_oo) + (y_oj − y_oo) + (y_ij − y_io − y_oj + y_oo) ]²
    = J Σ_i (y_io − y_oo)² + I Σ_j (y_oj − y_oo)² + Σ_i Σ_j (y_ij − y_io − y_oj + y_oo)²
    = SSA + SSB + SSE.
The partitioning of degrees of freedom into the corresponding groups is

IJ 1 = ( I 1) + ( J 1) + ( I 1)( J 1).
Note that SSA, SSB and SSE are mutually orthogonal and that is why the degrees of freedom can be divided like this.
Now using the theory explained while discussing the likelihood ratio test or assuming

yij ' s

to be independently

distributed as N ( + i + j , ), i = 1, 2,..., I ; j = 1, 2,..., J , and using the Theorems 6 and 7, we can write
2

SSA

SSB

SSE

~ 2 ( I 1)
~ 2 ( J 1)
~ 2 (( I 1)( J 1)).

So the test statistic for H 0 is obtained as:

SSA / 2

I 1

F1 =
SSE / 2

( I 1)( J 1)
( I 1)( J 1) SSA
.
( I 1)
SSE
MSA
=
~ F (( I 1),
1) ( I 1) ( J 1)) under
d H 0
MSE
SSA
SSE
where MSA =
and MSE =
.
( I 1)( J 1)
I 1
=

6
Same statistic is also obtained using the likelihood ratio test for H 0 .
The decision rule is
Reject H 0 if F1 > F1 [ ( I 1), ( I 1) ( J 1) ] .

Under H1 , F1 follows a noncentral F distribution F ( , ( J 1), ( I 1)( J 1)) where


noncentrality parameter.
parameter

Similarly, the test statistic for H 0 is obtained as

SSB / 2

J 1

F2 =
SSE / 2

)( J 1))
( I 1)(
( I 1)( J 1) SSB
( J 1) SSE
MSB
=
~ F (( J 1), ( I 1)( J 1)) under H 0
MSE
SSB
where MSB =
.
J 1
=

The decision rule is


Reject

H 0 if F2 > F1 (( J 1), ( I 1)( J 1)).

The same test statistic can also be obtained from the likelihood ratio test.

J i2
i =1

is the associated

The analysis of variance table is as follows:

Source of variation   | Degrees of freedom | Sum of squares       | Mean squares | F-value
Factor A (or rows)    | I − 1              | SSA                  | MSA          | F_1 = MSA/MSE
Factor B (or columns) | J − 1              | SSB                  | MSB          | F_2 = MSB/MSE
Error                 | (I − 1)(J − 1)     | SSE (by subtraction) | MSE          |
Total                 | IJ − 1             | TSS                  |              |

It can be found, on similar lines as in the case of one-way classification, that

E(MSA) = σ² + [J/(I − 1)] Σ_i α_i²
E(MSB) = σ² + [I/(J − 1)] Σ_j β_j²
E(MSE) = σ².

If the null hypothesis is rejected, then we use the multiple comparison tests to divide the α_i's (or β_j's) into groups such that the α_i's (or β_j's) belonging to the same group are equal and those belonging to different groups are different.

Generally, in practice, the interest of the experimenter lies more in using the multiple comparison tests for the treatment effects rather than for the block effects. So the multiple comparison tests are generally used for the treatment effects only.
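The sketch below computes SSA, SSB and SSE directly from an I×J table with one observation per cell and forms F_1 and F_2; the table of responses is illustrative.

```python
# Two-way classification, one observation per cell: SSA, SSB, SSE (illustrative data).
import numpy as np
from scipy import stats

y = np.array([[12., 15., 14.],
              [18., 20., 19.],
              [11., 13., 12.],
              [16., 17., 18.]])       # I = 4 rows (factor A), J = 3 columns (factor B)
I, J = y.shape
y_oo = y.mean()
y_io = y.mean(axis=1)                 # row means
y_oj = y.mean(axis=0)                 # column means

SSA = J * ((y_io - y_oo) ** 2).sum()
SSB = I * ((y_oj - y_oo) ** 2).sum()
TSS = ((y - y_oo) ** 2).sum()
SSE = TSS - SSA - SSB
dfe = (I - 1) * (J - 1)
F1 = (SSA / (I - 1)) / (SSE / dfe)
F2 = (SSB / (J - 1)) / (SSE / dfe)
print(F1, 1 - stats.f.cdf(F1, I - 1, dfe))
print(F2, 1 - stats.f.cdf(F2, J - 1, dfe))
```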

Analysis of Variance and Design of Experiments-I
MODULE III
LECTURE - 16
EXPERIMENTAL DESIGN MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Two-way classification with interactions

Consider the two-way classification with an equal number, say K, of observations per cell. Let y_ijk denote the k-th observation in the (i, j)-th cell, i.e., receiving the i-th level of factor A and the j-th level of factor B, i = 1, 2,...,I; j = 1, 2,...,J; k = 1, 2,...,K, and let the y_ijk be independently drawn from N(μ_ij, σ²), so that the linear model under consideration is

y_ijk = μ_ij + ε_ijk

where the ε_ijk are identically and independently distributed following N(0, σ²). Thus

E(y_ijk) = μ_ij
         = μ_oo + (μ_io − μ_oo) + (μ_oj − μ_oo) + (μ_ij − μ_io − μ_oj + μ_oo)
         = μ + α_i + β_j + γ_ij

where

μ = μ_oo
α_i = μ_io − μ_oo
β_j = μ_oj − μ_oo
γ_ij = μ_ij − μ_io − μ_oj + μ_oo

with

Σ_i α_i = 0, Σ_j β_j = 0, Σ_i γ_ij = 0, Σ_j γ_ij = 0.

Assume that the design matrix X is of full rank so that all the parametric functions of μ_ij are estimable.

The null hypotheses are

H0α: α_1 = α_2 = ... = α_I = 0
H0β: β_1 = β_2 = ... = β_J = 0
H0γ: all γ_ij = 0 for all i, j.

The corresponding alternative hypotheses are

H1α: at least one α_i ≠ α_j for some i ≠ j
H1β: at least one β_i ≠ β_j for some i ≠ j
H1γ: at least one γ_ij ≠ γ_ik for some j ≠ k.

Minimizing the error sum of squares

E = Σ_i Σ_j Σ_k (y_ijk − μ − α_i − β_j − γ_ij)²,

the normal equations are obtained as

∂E/∂μ = 0, ∂E/∂α_i = 0 for all i, ∂E/∂β_j = 0 for all j, and ∂E/∂γ_ij = 0 for all i and j.
The least squares estimates are obtained as

= yooo =

1
IJK

i =1

j =1

k =1

i = yioo yooo =

1
JK

j = yojo yooo =

1
IK

ijk

y
i =1

ijk

y
j =1

ijk

yooo
yooo

ij = yijo yioo yojo + yooo


=

1
K

y
i =1

ijk

j =1

yioo yojo + yooo .

The error sum of square is


I

( y

SSE = Min

, i , j , ij i =1 j =1 k =1
I

ijk

i j ij ) 2

= ( yijk i j ij ) 2
i =1 j =1 k =1
I

= ( yijk yijo ) 2
i =1 j =1 k =1

with

SSE

~ 2 ( IJ ( K 1)).

Now minimizing the error sum of squares under H0α: α_1 = α_2 = ... = α_I = 0, i.e., minimizing

E_1 = Σ_i Σ_j Σ_k (y_ijk − μ − β_j − γ_ij)²

with respect to μ, β_j and γ_ij and solving the normal equations

∂E_1/∂μ = 0, ∂E_1/∂β_j = 0 for all j, and ∂E_1/∂γ_ij = 0 for all i and j,

gives the least squares estimates

μ̂ = y_ooo, β̂_j = y_ojo − y_ooo, γ̂_ij = y_ijo − y_ioo − y_ojo + y_ooo.

The sum of squares due to H0α is

Min_{μ, β_j, γ_ij} Σ_i Σ_j Σ_k (y_ijk − μ − β_j − γ_ij)²
 = Σ_i Σ_j Σ_k (y_ijk − μ̂ − β̂_j − γ̂_ij)²
 = Σ_i Σ_j Σ_k (y_ijk − y_ijo)² + JK Σ_i (y_ioo − y_ooo)²
 = SSE + JK Σ_i (y_ioo − y_ooo)².

Thus the sum of squares due to deviation from H0α, or the sum of squares due to effect A, is

SSA = (sum of squares due to H0α) − SSE = JK Σ_i (y_ioo − y_ooo)²

with

SSA/σ² ~ χ²(I − 1).
Minimizing the error sum of squares under H0β: β_1 = β_2 = ... = β_J = 0, i.e., minimizing

E_2 = Σ_i Σ_j Σ_k (y_ijk − μ − α_i − γ_ij)²

and solving the normal equations

∂E_2/∂μ = 0, ∂E_2/∂α_i = 0 for all i, and ∂E_2/∂γ_ij = 0 for all i and j,

yields the least squares estimators

μ̂ = y_ooo, α̂_i = y_ioo − y_ooo, γ̂_ij = y_ijo − y_ioo − y_ojo + y_ooo.

The minimum error sum of squares is

Σ_i Σ_j Σ_k (y_ijk − μ̂ − α̂_i − γ̂_ij)² = SSE + IK Σ_j (y_ojo − y_ooo)²

and the sum of squares due to deviation from H0β, or the sum of squares due to effect B, is

SSB = (sum of squares due to H0β) − SSE = IK Σ_j (y_ojo − y_ooo)²

with

SSB/σ² ~ χ²(J − 1).

Next, minimizing the error sum of squares under H0γ: all γ_ij = 0 for all i, j, i.e., minimizing

E_3 = Σ_i Σ_j Σ_k (y_ijk − μ − α_i − β_j)²

with respect to μ, α_i and β_j and solving the normal equations

∂E_3/∂μ = 0, ∂E_3/∂α_i = 0 for all i, and ∂E_3/∂β_j = 0 for all j,

yields the least squares estimators

μ̂ = y_ooo, α̂_i = y_ioo − y_ooo, β̂_j = y_ojo − y_ooo.

The sum of squares due to H0γ is

Min_{μ, α_i, β_j} Σ_i Σ_j Σ_k (y_ijk − μ − α_i − β_j)²
 = Σ_i Σ_j Σ_k (y_ijk − μ̂ − α̂_i − β̂_j)²
 = SSE + K Σ_i Σ_j (y_ijo − y_ioo − y_ojo + y_ooo)².

Thus the sum of squares due to deviation from H0γ, or the sum of squares due to the interaction effect AB, is

SSAB = (sum of squares due to H0γ) − SSE = K Σ_i Σ_j (y_ijo − y_ioo − y_ojo + y_ooo)²

with

SSAB/σ² ~ χ²( (I − 1)(J − 1) ).

The total sum of squares can be partitioned as

TSS = SSA + SSB + SSAB + SSE

where SSA, SSB, SSAB and SSE are mutually orthogonal. So, either using the independence of SSA, SSB, SSAB and SSE together with their respective χ² distributions, or using the likelihood ratio test approach, the decision rules for the null hypotheses at the α level of significance are based on the F-statistics

F_1 = [ IJ(K − 1)/(I − 1) ] · SSA/SSE ~ F( I − 1, IJ(K − 1) ) under H0α,

F_2 = [ IJ(K − 1)/(J − 1) ] · SSB/SSE ~ F( J − 1, IJ(K − 1) ) under H0β,

F_3 = [ IJ(K − 1)/((I − 1)(J − 1)) ] · SSAB/SSE ~ F( (I − 1)(J − 1), IJ(K − 1) ) under H0γ.

So:

Reject H0α if F_1 > F_{1−α}[ (I − 1), IJ(K − 1) ]
Reject H0β if F_2 > F_{1−α}[ (J − 1), IJ(K − 1) ]
Reject H0γ if F_3 > F_{1−α}[ (I − 1)(J − 1), IJ(K − 1) ].

If H0α or H0β is rejected, one can use the t-test or a multiple comparison test to find which pairs of α_i's or β_j's are significantly different. If H0γ is rejected, one would not usually explore it further, but theoretically the t-test or multiple comparison tests can be used.

It can also be shown that

E(MSA) = σ² + [ JK/(I − 1) ] Σ_i α_i²

E(MSB) = σ² + [ IK/(J − 1) ] Σ_j β_j²

E(MSAB) = σ² + [ K/((I − 1)(J − 1)) ] Σ_i Σ_j γ_ij²

E(MSE) = σ².
The analysis of variance table is as follows:

Source of variation | Degrees of freedom | Sum of squares | Mean squares                    | F-value
Factor A            | I − 1              | SSA            | MSA = SSA/(I − 1)               | F_1 = MSA/MSE
Factor B            | J − 1              | SSB            | MSB = SSB/(J − 1)               | F_2 = MSB/MSE
Interaction AB      | (I − 1)(J − 1)     | SSAB           | MSAB = SSAB/[(I − 1)(J − 1)]    | F_3 = MSAB/MSE
Error               | IJ(K − 1)          | SSE            | MSE = SSE/[IJ(K − 1)]           |
Total               | IJK − 1            | TSS            |                                 |
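In practice this table can be produced directly from a fitted linear model, as the sketch below shows using statsmodels' formula interface on a small balanced data set with K = 2 observations per cell (the data are illustrative).

```python
# Two-way ANOVA with interaction via statsmodels (balanced, illustrative data).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    'A': ['a1'] * 4 + ['a2'] * 4,
    'B': ['b1', 'b1', 'b2', 'b2'] * 2,
    'y': [12., 13., 15., 14., 18., 19., 22., 23.],
})
model = ols('y ~ C(A) * C(B)', data=data).fit()   # main effects plus interaction
print(sm.stats.anova_lm(model, typ=2))            # SSA, SSB, SSAB, SSE and the F values
```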

Analysis of Variance and Design of Experiments-I
MODULE III
LECTURE - 17
EXPERIMENTAL DESIGN MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Tukey's test for nonadditivity

Consider the set-up of two-way classification with one observation per cell and interaction:

y_ij = μ + α_i + β_j + γ_ij + ε_ij, i = 1, 2,...,I, j = 1, 2,...,J

with Σ_i α_i = 0 and Σ_j β_j = 0.

The distribution of degrees of freedom in this case is as follows:

Source           | Degrees of freedom
A                | I − 1
B                | J − 1
AB (interaction) | (I − 1)(J − 1)
Error            | 0
Total            | IJ − 1

There is no degree of freedom for error. The problem is that the two-factor interaction effect and the random error component are subsumed together and cannot be separated out. There is no estimate for σ².

If no interaction exists, then H0: γ_ij = 0 for all i, j is accepted and the additive model y_ij = μ + α_i + β_j + ε_ij is adequate to test the hypotheses H0α: α_i = 0 and H0β: β_j = 0 with the error carrying (I − 1)(J − 1) degrees of freedom.

If interaction exists, then H0: γ_ij = 0 is rejected. In such a case, if we assume that the structure of the interaction effect is such that it is proportional to the product of the individual effects, i.e.,

γ_ij = λ α_i β_j,

then a test for testing H0: λ = 0 can be constructed. Such a test will serve as a test for nonadditivity. It will help in knowing the effect of the presence of the interaction effect and whether the interaction enters the model additively. Such a test is given by Tukey's test for nonadditivity, which requires one degree of freedom, leaving

(I − 1)(J − 1) − 1

degrees of freedom for error.

Let us assume that departure from additivity can be specified by introducing a product term and writing the model as

E(y_ij) = μ + α_i + β_j + λα_iβ_j; i = 1, 2,...,I, j = 1, 2,...,J, with Σ_i α_i = 0, Σ_j β_j = 0.

When λ ≠ 0, the model becomes a nonlinear model and the least squares theory for linear models is not applicable.

Note that, using Σ_i α_i = 0 and Σ_j β_j = 0, we have

y_oo = (1/(IJ)) Σ_i Σ_j y_ij = (1/(IJ)) Σ_i Σ_j (μ + α_i + β_j + λα_iβ_j + ε_ij)
     = μ + (1/I) Σ_i α_i + (1/J) Σ_j β_j + (λ/(IJ)) (Σ_i α_i)(Σ_j β_j) + ε_oo
     = μ + ε_oo

so E(y_oo) = μ and μ̂ = y_oo.

Next,

y_io = (1/J) Σ_j y_ij = (1/J) Σ_j (μ + α_i + β_j + λα_iβ_j + ε_ij)
     = μ + α_i + (1/J) Σ_j β_j + (λα_i/J) Σ_j β_j + ε_io
     = μ + α_i + ε_io

so E(y_io) = μ + α_i and α̂_i = y_io − μ̂ = y_io − y_oo.

Similarly,

E(y_oj) = μ + β_j and β̂_j = y_oj − μ̂ = y_oj − y_oo.

Thus μ̂, α̂_i and β̂_j remain the unbiased estimators of μ, α_i and β_j respectively, irrespective of whether λ = 0 or not.

Also,

E( y_ij − y_io − y_oj + y_oo ) = λα_iβ_j

or

E[ (y_ij − y_oo) − (y_io − y_oo) − (y_oj − y_oo) ] = λα_iβ_j.

Consider the estimation of μ, α_i, β_j and λ based on the minimization of

S = Σ_i Σ_j (y_ij − μ − α_i − β_j − λα_iβ_j)² = Σ_i Σ_j S_ij².

The normal equations are solved as

∂S/∂μ = 0  ⇒  Σ_i Σ_j S_ij = 0  ⇒  μ̂ = y_oo

∂S/∂α_i = 0  ⇒  Σ_j (1 + λβ_j) S_ij = 0

∂S/∂β_j = 0  ⇒  Σ_i (1 + λα_i) S_ij = 0

∂S/∂λ = 0  ⇒  Σ_i Σ_j α_iβ_j S_ij = 0

or

Σ_i Σ_j α_iβ_j (y_ij − μ − α_i − β_j − λα_iβ_j) = 0

or, using Σ_i α_i = 0 and Σ_j β_j = 0,

λ̂ = Σ_i Σ_j α_iβ_j y_ij / ( Σ_i α_i² Σ_j β_j² ) = λ* (say),

which can be estimated provided α_i and β_j are assumed to be known.

Since α_i and β_j can be estimated by

α̂_i = y_io − y_oo and β̂_j = y_oj − y_oo

irrespective of whether λ = 0 or not, we can substitute them in place of α_i and β_j in λ*, which gives

λ̂ = Σ_i Σ_j α̂_iβ̂_j y_ij / ( Σ_i α̂_i² Σ_j β̂_j² ) = IJ Σ_i Σ_j (y_io − y_oo)(y_oj − y_oo) y_ij / ( S_A S_B )

where

S_A = J Σ_i α̂_i² = J Σ_i (y_io − y_oo)², S_B = I Σ_j β̂_j² = I Σ_j (y_oj − y_oo)².

Assuming α_i and β_j to be known, we find

Var(λ*) = Var[ Σ_i Σ_j α_iβ_j y_ij / ( Σ_i α_i² Σ_j β_j² ) ]
        = Σ_i Σ_j α_i²β_j² Var(y_ij) / ( Σ_i α_i² Σ_j β_j² )²
        = σ² Σ_i α_i² Σ_j β_j² / ( Σ_i α_i² Σ_j β_j² )²
        = σ² / ( Σ_i α_i² Σ_j β_j² ),

using Var(y_ij) = σ² and Cov(y_ij, y_kl) = 0 for (i, j) ≠ (k, l).

Since α_i and β_j can be estimated by α̂_i and β̂_j, substituting them back in the expression of Var(λ*) and treating it as Var(λ̂) gives

Var(λ̂) = σ² / ( Σ_i α̂_i² Σ_j β̂_j² ) = IJσ² / ( S_A S_B )

for given α_i and β_j.

Analysis of Variance and Design of Experiments-I
MODULE III
LECTURE - 18
EXPERIMENTAL DESIGN MODELS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

2
Note that if = 0, then

I J

i j yij

E / i , j for all i, j = E i =1I j =1 J

2
2
i j
j =1
i =1

I J

i j ( + i + j + 0 + ij )

= E i =1 j =1
I
J

2
2
(

)(

i
j

i =1
j =1

0
= I
J
2
( i ) ( j2 )
i =1

j =1

= 0.
0
As i and j

remains valid irrespective of

distributed as

IJ 2

~ N 0,
.
S
S
A B

= 0,

or not, in this sense is a function of yij and hence normally

3
Thus the statistic

I J

i j yij
IJ

( ) 2
i =1 j =1

= 2
S ASB
Var ( )

I J

IJ ( yio yoo )( yoj yoo ) yij


i =1 j =1

=
2
S ASB

I J

IJ ( yio yoo )( yoj yoo )( yij yio yoj + yoo )


i =1 j =1

=
2
S A SB
=

follows a

SN

2 - distribution with one degree of freedom where

I J

IJ ( yio yoo )( yoj yoo )( yij yio yoj + yoo )


i =1 j =1

SN =
S ASB
i th
is
the sum off squares due
d tto non-additivity
dditi it .

Note that
I

S AB

follows

( y
i =1 j =1

ij

yio yoj + yoo )2

2 (( I 1)( J 1)),
))

S N S AB
2 is nonnegative and follows 2 [ ( I 1)( J 1) 1] .
2

so

The reason for this is as follows :

yij = + i + j + non-additivity + ij
and so

TSS = SSA + SSB + S N + SSE


SSE = TSS SSA SSB S N
has degrees of freedom

= ( IJ 1) ( I 1) ( J 1) 1
= ( I 1)( J 1) 1.

We need to ensure that SSE > 0.


So using the result

If Q, Q1 and Q2 are quadratic forms such that Q = Q1 + Q2 with Q ~ 2 ( a ), Q2 ~ 2 (b) and Q2 is


non-negative, then Q1 ~ 2 ( a b ),"

ensures that the difference

SN

SAB

is nonnegative.
Moreover S N ( SS due to nonadditivity ) and SSE are orthogonal.
Thus the F test for nonadditivity is

SN / 2

F=

SSE / 2

( I 1)( J 1) 1
= [ ( I 1)( J 1) 1]

SSN
SSE

~ F [1, ( I 1)( J 1) 1] under H 0 .


So the decision rule is
Reject H 0 : = 0 whenever

F > F1 [1, ( I 1)( J 1) 1] .

The analysis of variance table for the model including a term for nonadditivity is as follows:

Source of
variation

Degrees of
freedom

Sum of squares

Factor A

( I - 1)

SA

Factor B

(J - 1)

SB

Non-additivity

SN

Error

(I - 1) (J - 1) - 1

SSE
(By subtraction)

Total

(I J 1)

TSS

Mean
squares

MS A =

MS B =

SA
I 1

SB
J 1

MS N = S N

MSE =

F - value

SSE
( I 1)( J 1) 1

MS N
MSE

Comparison of variances
One of the basic assumptions
p
in the analysis
y of variance is that the samples
p
are drawn from different normal
populations with different means but same variances. So before going for the analysis of variance, the test of hypothesis
about the equality of variance is needed to be done.
We discuss the test of equality of two variances and more than two variances.

Case 1: Equality of two variances


H 0 : 12 = 22 = 2 .
Suppose there are two independent random samples

A : x1 ,x2 ,...,xn1 ; xi ~ N( A , A2 )
B : y1 , y2 ,..., yn2 ; yi ~ N( B , B2 ).
The sample
p variance corresponding
p
g to the two samples
p
are

1 n1
( xi x ) 2
s =

n1 1 i =1
2
x

1 n2
s =
( yi y ) 2 .

n2 1 i =1
2
y

Under

H 0 : A2 = B2 = 2 ,
(n1 1) sx2

(n2 1) s y2

~ 2 (n1 1))
~ 2 (n2 1).

8
Moreover, the sample variances sx2 and s 2y are independent. So

(
n

1)
s

x
1
2

n1 1

sx2

= 2 ~ Fn1 1, n2 1.
(n2 1) s y2 s y


2

n2 1

So for testing H 0 : 12 = 22 versus H1 : 12 22 , the null hypothesis

F>F

F<F

H 0 is rejected if

1 ;n1 1,n2 1
2

or
1 ;n1 1,n2 1
2

where

F
2

;n1 1,n2 1;

1
F

1 ;n2 1,n1 1
2

If the null hypothesis H 0 : 12 = 22 is rejected, then the problem is termed as Fisher-Behrans problem. The solution are
available for this problem.

Case 2: Equality of more than two variances: Bartlett


Bartletts
s test
H 0 : 12 = 22 = ... = k2 and H1 : i2 2j for atleast one i j = 1,2,..., k.
Let there be k independent normal population N ( i , i2 ) each of size ni , i = 1, 2,..., k . Let
independent unbiased estimators of population variances
2

i si2
s =
i =1
k

where

i = ni 1, = i .
i =1

Bartlett has shown that under H 0 ,

s2

i ln 2
si
i =1

1 k 1 1
1 +

3
1
(
k
)
i =1 i

is distributed as

be k

12 , 22 ,..., k2 respectively with 1 , 2 ,..., k degrees of

freedom. Under H 0 , all the variances are same as , say and an unbiased estimate of
2

s12 , s22 ,..., sk2

2 ( k 1) based on which H 0 can be tested.

is

Analysis of Variance and


Design
g of Experiments
Experimentsp
-I
MODULE IV
LECTURE - 19

EXPERIMENTAL DESIGNS AND


THEIR ANALYSIS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

2
Design of experiment means how to design an experiment in the sense that how the observations or measurements should
be obtained to answer a q
query
y in a valid,, efficient and economical way.
y The designing
g g of experiment
p
and the analysis
y
of
obtained data are inseparable.

If the experiment is designed properly keeping in mind the question, then the data

generated is valid and proper analysis of data provides the valid statistical inferences. If the experiment is not well
designed, the validity of the statistical inferences is questionable and may be invalid.

It is important to understand first the basic terminologies used in the experimental design.

Experimental unit
For conducting an experiment, the experimental material is divided into smaller parts and each part is referred to as
experimental unit. The experimental unit is randomly assigned to a treatment. The phrase randomly assigned is very
important in this definition.

Experiment
A way of getting an answer to a question which the experimenter wants to know.

Treatment
Different objects or procedures which are to be compared in an experiment are called treatments.

Sampling unit
The object that is measured in an experiment is called the sampling unit. This may be different from the experimental unit.

Factor
A factor is a variable defining a categorization
categorization.
A factor can be fixed or random in nature.

A factor is termed as fixed factor if all the levels of interest are included in the experiment.

A factor is termed as random factor if all the levels of interest are not included in the experiment and those that
are can be considered to be randomly chosen from all the levels of interest.

Replication
It is the repetition of the experimental situation by replicating the experimental unit.

Experimental error
The unexplained random part of variation in any experiment is termed as experimental error. An estimate of experimental
p
error can be obtained byy replication.

Treatment design
A treatment design is the manner in which the levels of treatments are arranged in an experiment.

Example: (Reference: Statistical Design, G. Casella, Chapman and Hall, 2008)


Suppose some varieties of fish food is to be investigated on some species of fishes. The food is placed in the water tanks
containing the fishes. The response is the increase in the weight of fish. The experimental unit is the tank, as the treatment
is applied to the tank, not to the fish. Note that if the experimenter had taken the fish in hand and placed the food in the
mouth of fish, then the fish would have been the experimental unit as long as each of the fish got an independent scoop of
food.

Design of experiment
One of the main objectives of designing an experiment is how to verify the hypothesis in an efficient and economical way.
In the contest of the null hypothesis of equality of several means of normal populations having same variances, the
analysis of variance technique can be used. Note that such techniques are based on certain statistical assumptions. If
these assumptions are violated, the outcome of the test of hypothesis then may also be faulty and the analysis of data may
be meaningless. So the main question is how to obtain the data such that the assumptions are met and the data is readily
available for the application of tools like analysis of variance. The designing of such mechanism to obtain such data is
achieved by the design of experiment. After obtaining the sufficient experimental unit, the treatments are allocated to the
experimental units in a random fashion.
fashion Design of experiment provides a method by which the treatments are placed at
random on the experimental units in such a way that the responses are estimated with the utmost precision possible.

Principles of experimental design


g
There are three basic principles of design which were developed by Sir Ronald A. Fisher.
i.

Randomization

ii
ii.

Replication

iii. Local control.

i.

Randomization

The principle of randomization involves the allocation of treatment to experimental units at random to avoid any bias in the
experiment

resulting from the influence of some extraneous unknown factor that may affect the experiment. In the

development of analysis of variance, we assume that the errors are random and independent. In turn, the observations
also become random. The principle of randomization ensures this.

The random assignment of experimental units to treatments results in the following outcomes.
a) It eliminates the systematic bias.
b) It is needed to obtain a representative sample from the population.
c) It helps in distributing the unknown variation due to confounded variables throughout the experiment and breaks the
confounding influence.

Randomization forms a basis of valid experiment but replication is also needed for the validity of the experiment.
If the randomization process is such that every experimental unit has an equal chance of receiving each treatment, it is
called a complete randomization.

ii.

Replication
p

In the replication principle, any treatment is repeated a number of times to obtain a valid and more reliable estimate than
which is possible with one observation only. Replication provides an efficient way of increasing the precision of

an

experiment. The precision increases with the increase in the number of observations. Replication provides more
observations when the same treatment is used, so it increases the precision. For example, if variance of X is
variance of sample mean based on n observation is

So as n increases,

than

decreases.

iii. Local control (error control)


The replication is used with local control to reduce the experimental error. For example, if the experimental units are divided
into different groups such that they are homogeneous within the blocks, than the variation among the blocks is eliminated
and ideally the error component will contain the variation due to the treatments only. This will in turn increase the efficiency.

Complete
p
and incomplete
p
block designs
g
In most of the experiments, the available experimental units are grouped into blocks

having more or less identical

characteristics to remove the blocking effect from the experimental error. Such design are termed as block designs.

The number of experimental units in a block is called the block size.


If
size of block = number of treatments
and
each treatment in each block is randomly allocated,
then it is a full replication and the design is called as complete block design.

In case, the number of treatments is so large that a full replication in each block makes it too heterogeneous with respect to
the characteristic under study, then smaller but homogeneous blocks can be used. In such a case, the blocks do not
contain a full replicate of the treatments. Experimental designs with blocks containing an incomplete replication of the
treatments are called incomplete block designs.

Completely randomized design (CRD)


The CRD is the simplest design. Suppose there are v treatments to be compared.
All experimental units are considered the same and no division or grouping among them exist.
In CRD, the v treatments are allocated randomly to the whole set of experimental units, without making any effort to
group the experimental units in any way for more homogeneity.
Design is entirely flexible in the sense that any number of treatments or replications may be used.
Number of replications for different treatments need not be equal and may vary from treatment to treatment depending
on the knowledge (if any) on the variability of the observations on individual treatments as well as on the accuracy
required for the estimate of individual treatment effect.

Example: Suppose there are 4 treatments and 20 experimental units, then


the treatment 1 is replicated, say 3 times and is given to 3 experimental units,
the treatment 2 is replicated, say 5 times and is given to 5 experimental units,
the treatment 3 is replicated, say 6 times and is given to 6 experimental units
and
finally, the treatment 4 is replicated [20-(6+5+3)=]6 times and is given to the remaining 6 experimental units.

9
All the variability among the experimental units goes into experimented error.
CRD is used when the experimental material is homogeneous.
CRD is often inefficient.
CRD is more useful when the experiments are conducted inside the lab.
CRD is well suited for the small number of treatments and for the homogeneous experimental material

Layout of CRD
Following steps are needed to design a CRD:
Divide the entire experimental material or area into a number of experimental units, say n.
Fix the number of replications for different treatments in advance
((for g
given total number of available experimental
p
units).
)
No local control measure is provided as such except that the error variance can be reduced by choosing a
homogeneous set of experimental units.

10

Procedure

Let the v treatments are numbered from 1,2,...,v and ni be the number of replications required for ith treatment
k

such that

n
i =1

= n.

Select n1 units out of n units randomly and apply treatment 1 to these n1 units.
(Note: This is how the randomization principle is utilized is CRD.)
Select n2 units out of (n n1 ) units randomly and apply treatment 2 to these n2 units.
Continue with this procedure until all the treatments have been utilized.
Generally equal number of treatments are allocated to all the experimental units unless no practical limitation dictates or
some treatments are more variable or/and of more interest.

Analysis of Variance and


Design
g of Experiments
Experimentsp
-I
MODULE IV
LECTURE - 20

EXPERIMENTAL DESIGNS AND


THEIR ANALYSIS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Analysis
There is only one factor which is affecting the outcome treatment effect. So the setup of one way analysis of variance is to
be used.
yij : Individual measurement of jth experimental units for ith treatment i = 1,2,...,v , j = 1,2,...,ni.
yij : Independently distributed following N ( + i , ) with
2

n
i =1

= 0.

: overall mean
t t
t effect
ff t
i : ith treatment
H 0 : 1 = 2 = ... = v = 0
H1 : All i' s are not equal.
Treatments

The data set is arranged as follows:

... v

_____________

y11
y12
#
y1n1

y21 ...
y22 ...
# %
y2 n2 ...

yv1
yv 2
#
yvnv

_____________
where Ti =

T1

ni

y
j =1

ij

is the treatment total due to

T2
ith

... Tv

effect, G =

ni

T = y
i =1

i =1 j =1

ij

is the grand total of all the observations.

3
In order to derive the test for H 0 , we can use either the likelihood ratio test or the principle of least squares. Since the
likelihood ratio test has already been derived earlier
earlier, so we choose to demonstrate the use of least squares principle
principle.

The linear model under consideration is

yij = + i + ij , i = 1, 2,..., v , j = 1, 2,..., ni


where

ij ' s are identically and independently distributed random errors with mean 0 and variance 2 . The normality

assumption of

is not needed for the estimation of parameters but will be needed for deriving the distribution of various

involved statistics and in deriving the test statistics.

Let
v

ni

ni

S = = ( yij i )2 .
i =1 j =1

2
ij

i =1 j =1

Minimizing S with respect to and

i , the normal equations are obtained as

v
S
= 0 n + ni i = 0

i =1
ni
S
= 0 ni + ni i = yij , i = 1, 2,..., v.
i
j =1

4
v

Solving them using

= yoo
i = yio yoo
1
yioi =
ni

where

n
i =1

= 0 , we get

ni

y
j =1

ij

is the mean of observation receiving the ith treatment and

1 v ni
yoo = yij
n i =1 j =1

observations.
The fitted model is obtained after substituting the estimate and i in the linear model, we get

yij = + i + ij
or

yij = yoo + ( yio yoo ) + ( yij yio )

or

( yijj yoo ) = ( yio yoo ) + ( yijj y ).

Squaring both sides and summing over all the observation, we have
v

ni

( y
i =1 j =1

ij

or

i =1

i =1 j =1

Sum of squares

+ Sum of squares
=
due
to
treatment

due to error
squares

effects

TSS
=
SSTr
+
SSE

Total sum
of

ni

yoo ) = ni ( yio yoo ) + ( yij yoo ) 2

or

is the mean of all the

Since

ni

( y

ij

i =1 j =1

yoo ) = 0, so TSS is based on the sum of (n 1) squared quantities. Thus TSS carries only (n 1)

degrees of freedom.
v

Since
Si

n (y
i

i =1

io

yoo ) = 0,
0

so SSTr
SST is
i b
based
d only
l on th
the sum off ((v -1)
1) squared
d quantities.
titi
Th
Thus SSTr
SST carries
i
only
l

(v -1) degrees of freedom.


ni

Since

n (y
i =1

constraints

ij

yio ) = 0 for all i = 11,2,...,v,


2 v so SSE is based on the sum of squaring n quantities like ( yij yio ) with v

ni

(y
j =1

ij

yio ) = 0, So SSE carries (n v) degrees of freedom.

Using
g the Fisher-Cochran theorem,
TSS = SSTr + SSE
with degrees of freedom partitioned as
( 1)) = (v
(n
( - 1)) + (n
( v).
)

6
Moreover, the equality in TSS = SSTr + SSE has to hold exactly. In order to ensure that the equality holds exactly, we find
one of the sum of squares through subtraction. Generally, it is recommended to find SSE by subtraction as
SSE= TSS - SSTr
where
ni

TSS = ( yij yio ) 2


i =1 j =1
v

ni

= yij2
i =1 j =1
v

G2
n

ni

G = yij
i =1 j =1

G2
: correction factor
n
ni

SSTr = ni ( yio yoo ) 2


j =1

Ti 2 G 2
=
n
i =1 ni
v

ni

Ti = yij .
j =1

7
Now under H 0 : 1 = 2 = ... = v = 0 , the model become

Yij = + ij ,
v

ni

S = ij2

and minimizing

i =1 j =1

with respect
p
to g
gives

S
G
= 0 = = yoo .

n
The SSE under H 0 becomes
ni

SSE = ( yij yoo ) 2


i =1 j =1

and thus TSS = SSE


SSE.

This TSS under H 0 contains the variation only due to the random error whereas the earlier TSS = SSTr + SSE
contains the variation due to treatments and errors both. The difference between the two will provides the effect of
treatments in terms of sum of squares as
v

SSTr = ni ( yi yoo ) 2 .
i=1

Expectations
v

ni

E ( SSE ) = E ( yij yio )

i =1

i =1 j =1

ni

= E ( ij io )

i =1 j =1

= ni E ( i + io oo ) 2
i =1

= ni + ni io2 n oo2
i =1
i =1

v
v 2
2
2
= ni i + ni
n
n
n
i =1
i
i =1
v

ni

E ( SSTr ) = ni E ( yio yoo ) 2

E (
i =1 j =1

2
ij

) ni E ( io2. )
i =1

i =1

ni

= n ni
2

= (n v)

SSE
2
E ( MSE ) = E
=
nv

2
i

= ni i2 + (v 1) 2
i =1

1 v
SStr
E ( MSTr ) = E
ni i2 + 2 .

=
v 1 v 1 i =1
2
In general E ( MSTr )
but under H 0 , all

i = 0

and so

E ( MSTr ) = 2

Distributions and decision rules


Using the normal distribution property of

ij ' s,

we find that yij ' s are also normal as they are the linear

combination of ij ' s.

SSTr

SSE

SSTr

~ 2 (v 1)

under

H0

~ 2 (n v)

under

H0

and

SSE

are independently distributed

MStr
~ F (v 1, n v) under H 0 .
MSE

So the decision rule is to


reject

H0

at

level of significance if F

> F *, 1,n .

[Note: We denote the level of significance here by * because has been used for denoting the factor]

10

The analysis of variance table in this case is given as following:

Source of
variation

Degrees of
freedom

Sum of
squares

Mean
squares

F - value

Between
treatments

v1

SSTr

MSTr

MSTr / MSE

Error

(n v)

SSE

MSE

Total

n-1

TSS

Analysis of Variance and


Design
g of Experiments
Experimentsp
-I
MODULE IV
LECTURE - 21

EXPERIMENTAL DESIGNS AND


THEIR ANALYSIS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Randomized block design


If large number of treatments are to be compared, then large number of experimental units are required. This will
increase the variation among the responses and CRD may not be appropriate to use. In such a case when the
experimental material is not homogeneous and there are v treatments to be compared, then it may be possible
to
group the experimental material into blocks of sizes v units.
Blocks are constructed such that the experimental units within a block are relatively homogeneous and
resemble to each other more closely than the units in the different blocks.
If there are b such blocks, we say that the blocks are at b levels. Similarly, if there are v treatments, we say
that the treatments are at v levels.

The responses from the b levels of blocks and the v levels of treatments can be arranged in a two
two-way
way
layout. The observed data set is arranged as follows:

Treatments

Blocks

Block
totals

y11

y21

yi1

yb1

B1 = yo1

y12

y22

yi2

yb2

B2 = yo22

.
.
.

.
.
.

.
.
.

.
.
.

.
.
.

y1j

y2j

ybj

Bj = yoj

.
.
.

.
.
.

.
.
.
v
Treatment
totals

.
.
.

.
.
.

yij

.
.
.

y1v

y2v

yiv

ybv

Bb = yob

T1 = y1o

T2 = y2o

Ti = yio

Tb = yvo

Grand
total
G = yoo

Layout
A two-way layout is called a randomized block design (RBD) or a randomized complete block design (RCB) if within each
block, the v treatments are randomly assigned to the v experimental units such that each of the v! ways of assigning the
treatments to the units has the same p
probability
y of being
g adopted
p
in the experiment
p
and the assignment
g
in different blocks
are statistically independent.
The RBD utilizes the principles of design- randomization, replication and local control- in the following way:

1. Randomization
Number the v treatments 1, 2,,v.
Number
N b th
the units
it iin each
h bl
block
k as 1
1, 2
2,...,v.
Randomly allocate the v treatments to the v experimental units in each block.

2 Replication
2.
Since each treatment is appearing in the each block, so every treatment will appear in all the blocks. So each treatment
can be considered as if replicated the number of times as the number of blocks. Thus in RBD, the number of blocks and
the number of replications are same.

5
3. Local Control
Local control is adopted in RBD in following way:
First form the homogeneous blocks of the experimental units.
Then allocate each treatment randomly in each block.
The error variance now will be smaller because of homogeneous blocks and some variance will be parted away from the
error variance due to the difference among the blocks.

Example
Suppose there are 7 treatment denoted as T1 , T2 ,.., T7 corresponding to 7 levels of a factor to be included in 4 blocks. So
one possible layout of the assignment of 7 treatments to 4 different blocks in a RBD is as follows:

Block 1

T2

T7

T3

T5

T1

T4

T6

Block 2

T1

T6

T7

T4

T5

T3

T2

Block 3

T7

T5

T1

T6

T4

T2

T3

Block 4

T4

T1

T5

T6

T2

T7

T3

Analysis
Let
yij : Individual measurements of jth treatment in ith block, i = 1, 2,...,b, j = 1, 2,...,v.
yij s are independently
p
y distributed following
g N ( + i + j , 2 )
where : overall mean effect

i : ith block effect


j : jth treatment
t t
t effect
ff t
b

such that

i = 0,
i =1

j =1

=0.

There are two null hypotheses to be tested.


-

related to the block effects

H 0 B : 1 = 2 = .... = b = 0.
-

related to the treatment effects

H 0T : 1 = 2 = .... = v = 0.
0
The linear model in this case is a two-way model as

yij = + i + j + ij , i = 1, 2,.., b; j = 1, 2,.., v


where

ij

are identically and independently distributed random errors following a normal distribution with mean 0 and

variance .
2

7
The tests of hypothesis can be derived using the likelihood ratio test or the principle of least squares. The use of likelihood
ratio test has already been demonstrated earlier, so we now use the principle of least squares.
b

Minimizing

S = ij2 = ( yij i j ) 2
i =1 j =1

i =1 j =1

and solving the normal equation

S
S
S
= 0,
= 0,
= 0 for all i = 1, 2,.., b, j = 1, 2,.., v,

i
j
the least squares estimators are obtained as

= yoo ,
i = yio yoo ,
j = yojj yoo .
The fitted model is

yij = + i + j + ij
= yoo + ( yio yoo ) + ( yoj yoo ) + ( yij yio yoj + yoo )).
Squaring both sides and summing over i and j gives
b

( y
i =1 j =1

or

TSS

ij

yoo ) = v ( yio yoo ) + b ( yoj yoo ) +


2

i =1

SSBl

j =1

SSTr

(v 1)

( y
i =1 j =1

ij

yio yoj + yoo ) 2


SSE

with degrees of freedom partitioned as

bv 1

(b 1)

(b 1)(v 1).

The reason for the number of degrees of freedom for different sums of squares is the same as in the case of CRD.
b

Here

TSS = ( yij yoo ) 2


i =1 j =1
b

= yij2
i =1 j =1

G2
bv

G2
: correction factor
bv

G = yij : Grand total of all the observation.


i =1 j =1

SSBl = v ( yio yoo ) 2


i =1

Bi2 G 2

b bv

=
i =1
v

Bi = yijj : i th block total


j=1

SSTr = b ( yoj yoo ) 2


j =1

=
j =1

T j2

G2

v bv

T j = yij : j th

treatment total

i =1
i=

SSE = ( yij yio yoj + yoo ) 2 .


i =1 j =1

The expectations of mean squares are

v b 2
SSBl
2
E ( MSBl ) = E

=
+
i

b 1 i =1
b 1
b v 2
SSTr
2
E ( MSTr ) = E

+
=
j

v 1 j =1
v 1

SSE
2
E ( MSE ) = E
= .

(
b
1)(
v
1)

Moreover,

(b 1)

SSBl

(v 1)

SSTr

(b 1)(v 1)

~ 2 (b 1)
~ 2 (v 1)
SSE

~ 2 (b 1)(v 1).

10

Under

H 0 B : 1 = 2 = ... = b = 0,
0
E ( MSBl ) = E ( MSE )

and

SSBl and SSE are independent, so


Fbl =

MSBl
~ F ((b 1, (b 1)(v 1)).
MSE

Similarly, under

H 0T : 1 = 2 = ... = v = 0,

E ( MSTr ) = E ( MSE ).
)
Also,

SSTr and SSE are independent, so


FTr =

MSTr
~ F (v 1), (b 1)(v 1)).
MSE

Reject

H 0 B if Fbe > F ((b 1), (b 1)(v 1)

Reject

H 0T if FTr > F ((v 1), (b 1)(v 1))

If H0B is accepted, then it indicates that the blocking is not necessary for future experimentation.
If H0T is rejected then it indicates that the treatments are different. Then the multiple comparison tests are used to divide
the entire set of treatments into different subgroups such that the treatments in the same subgroup have the same
treatment effect and those in the different subgroups have different treatment effects.

11

The analysis of variance table is as follows:

Source of
variation

Degrees of
freedom

Sum of
squares

Mean
squares

F - value

Bl k
Blocks

b 1

SSBl

MSBI

FBl

Treatments

v -1

SSTr

MSTr

FTr

Errors

(b - 1)(v - 1)

SSE

MSE

Total

bv - 1

TSS

Analysis of Variance and


Design
g of Experiments
Experimentsp
-I
MODULE IV
LECTURE - 22

EXPERIMENTAL DESIGNS AND


THEIR ANALYSIS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Latin square design


The treatments in the RBD are randomly assigned to b blocks such that each treatment must occur in each block rather
than assigning them at random over the entire set of experimental units as in the CRD.

There are only two factors block and treatment effects which are taken

into account and the total number of

experimental units needed for complete replication are bv where b and v are the numbers of blocks and treatments
respectively.

If there are three factors and suppose there are b,v and k levels of each factor, then the total number of experimental
units needed for a complete replication are bvk. This increases the cost of experimentation and the required number of
experimental units over RBD.

In Latin square design (LSD), the experimental material is divided into rows and columns, each having the same number
of experimental units which is equal to the number of treatments. The treatments are allocated to the rows and the columns
such that each treatment occurs once and only once in the each row and in the each column.

In order to allocate the treatment to the experimental units in rows and columns, we take the help from Latin squares.

Latin square
A Latin square of order p is an arrangement of p symbols in p2 cells arranged in p rows and p columns such that each
symbol occurs once and only once in each row and in each column.

For example, to write a Latin square of order 4, choose four symbols A,B,C and D. These letters are Latin letters which
are used as symbols. Write them in a way such that each of the letters out of A,B,C and D occurs once and only once is
each row and each column.
column For example,
example as
A

This is a Latin square.


We consider first the following example to illustrate how a Latin square is used to allocate the treatments and in getting the
response.

4
Example:
Suppose different brands of petrol are to be compared with respect to the mileage per liter achieved in motor cars.
Important factors responsible for the variation in the mileage are

difference between individual cars.

difference in the driving habits of drivers.

We have three factors cars, drivers and petrol brands. Suppose we have

4 types of cars denoted as 1, 2, 3, 4.

4 drivers that are represented by a, b, c, d.

4 brands of petrol are indicated by A, B, C, D.

Now the complete replication will require 4 X 4 X 4 = 64 number of experiments. We choose only 16 experiments. To
choose such 16 experiments, we take the help of Latin square. Suppose we choose the following Latin square:
A

Write them in rows and columns and choose rows for cars, columns for drivers and letter for petrol brands.
Thus 16 observations are recorded as per this plan of treatment combination (as shown in the next figure) and further
analysis is carried out. Since such design is based on Latin square, so it is called as a Latin square design.

Drivers

Cars

C
Driver d will use
petrol C in car 4
4.

Driver a will use petrol A in car 1.

Driver b will use petrol C in car 2.

Another choice of a Latin square of order 4 is


C

This will again give a design different from the previous one. The 16 observations will be recorded again but based on
different treatment combinations.

Since we use only 16 out of 64 possible observations, so it is an incomplete 3 way layout in which each of the 3 factors
cars, drivers and petrol brands are at 4 levels and the observations are recorded only on 16 of the 64 possible treatment
combinations.

7
Thus in a LSD,

the treatments are grouped into replication in two ways


once in rows and
and in columns,

rows and columns variations are eliminated from the within treatment variation.
In RBD, the experimental units are divided into homogeneous blocks according to the blocking factor. Hence it
eliminates the difference among blocks from the experimental error.
In LSD, the experimental units are grouped according to two factors. Hence two effects (like as two block effects)
are removed from the experimental error.
So the error variance can be considerably reduced in LSD.

The LSD is an incomplete three-way layout in which each of the three factors, viz., rows, columns and treatments, is at v
levels each and observations only on of the possible treatment combinations are taken. Each treatment combination
contains one level of each factor.

The analysis of data in a LSD is conditional in the sense it depends on which Latin square is used for allocating the
treatments. If the Latin square changes, the conclusions may also change.

We note that Latin squares play an important role is a LSD, so first we study more about these Latin squares before
describing the analysis of variance.

Standard form of Latin square


A Latin square is in the standard form if the symbols in the first row and first columns are in the natural order (Natural order
means the order of alphabets like A,B,C,D,).

Given a Latin square, it is possible to rearrange the columns so that the first row and first column remain in natural order.
Example: Four standard forms of Latin square are as follows:
A B CD

A B CD

A B CD

A B C D

B A DC

B C DA

B D AC

B A D C

C D BA

C D AB

C A DB

C D A B

D C AB

D A BC

D CB A

D CB A

For each standard Latin square of order p, the p rows can be permuted in p! ways. Keeping a row fixed, vary and permute
(p - 1) columns in (p - 1)! ways. So there are p!(p - 1)! different Latin squares.

For illustration:

Size of square

Number of

Value of

Total number of

standard squares

p!(p -1)!

different squares

3 X 3

12

12

4 X 4

144

576

5 X 5

56

2880

161280

6 X 6

9408

86400

812851250

Conjugate
Two standard Latin squares are called conjugate if the rows of one are the columns of other .
For example
A

and

are conjugate. In fact, they are self conjugate.

A Latin square is called self conjugate if its arrangement in rows and columns are the same.

Transformation set
A set of all the Latin squares obtained from a single Latin square by permuting its rows, columns and symbols is called a
transformation set.

From a Latin square of order p,


p p!(p-1)! Different Latin squares can be obtained by making p! permutations of columns and
(p-1)! permutations of rows which leaves the first row in place. Thus

Number of different
Latin squares of order
p in a transformation set

p!(p-1)! x number of standard Latin


=

squares in the set

10

Orthogonal Latin squares


If two Latin squares of the same order but with different symbols are such that when they are superimposed on each other,
every ordered pair of symbols (different) occurs exactly once in the Latin square, then they are called orthogonal.

Greco-Latin square
A pair of orthogonal Latin squares, one with Latin symbols and the other with Greek symbols forms a Greco-Latin square.
For example

A B C D
B A D C
C D A B
D C B A

is a Greco-Latin square of order 4.

Greco Latin squares design enables to consider one more factor than the factors in Latin square design. For example, in
the earlier example, if there are four drivers, four cars, four petrol and each petrol has four varieties, as , , and ,
then Greco-Latin square helps in deciding the treatment combination as follows:

11

Cars
1

Drivers

Now

means: Driver a will use the

variant of petrol A in Car 1.

means: Driver c will use the

variant of petrol B in Car 4

and so on
on.

Mutually orthogonal Latin square


A set of Latin squares of the same order is called a set of mutually orthogonal Latin square (or a hyper Greco-Latin square)
if every pair in the set is orthogonal. The total number of mutually orthogonal Latin squares of order p is at most (p -1).

Analysis of Variance and


Design
g of Experiments
Experimentsp
-I
MODULE IV
LECTURE - 23

EXPERIMENTAL DESIGNS AND


THEIR ANALYSIS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Analysis of LSD (one observation per cell)


In designing a LSD of order p,
choose one Latin square
q
at random from the set of all p
possible Latin squares
q
of order p
p.
Select a standard Latin square from the set of all standard Latin squares with equal probability.
Randomize all the rows and columns as follows:
Choose
Ch
a random
d
number,
b lless th
than p, say n1 and
d th
then 2nd row is
i th
the n1th row.
Choose another random number less than p, say n2 and then 3rd row is the n2th row and so on.
Then do the same for column.

For Latin squares of order less than 5, fix first row and then randomize rows and then randomize columns. In Latin
squares of order 5 or more, need not to fix even the first row. Just randomize all rows and columns.

Example
Suppose following Latin square is chosen
A B C

B C D

D E A

E A B

C D E

Now randomize rows,, e.g.,


g , 3rd row becomes 5th row and 5th row becomes 3rd row. The Latin square
q
becomes
A B C

B C D

C D E

E A B

C D E

Now randomize columns, say 5th column becomes 1st column, 1st column becomes 4th column and 4th column becomes
5th column

E B C

A C D

D A B

C E A

B D E

Now use this Latin square for the assignment of treatments.

yijk : Observation on kth treatment in ith row and jth block, i = 1, 2,...,v, j = 1, 2,...,v, k = 1, 2,...,v.

Triplets (i, j, k) take on only the v2 values indicated by the chosen particular Latin square selected for the experiment.
yijks are independently distributed as

N ( + i + j + k , 2 ) .

Linear model is

yijk = + i + j + k + ijk , i = 1, 2,..., v; j = 1, 2,..., v; k = 1, 2,..., v


where

ijk

are random errors which are identically and independently distributed following N (0, 2 ).

with
v

i =1

= 0,

j =1

= 0,

k =1

= 0,

i : main effect of rows


j : main effect of columns

k :

main effect of treatments.

The null hypothesis under consideration are

H 0 R : 1 = 2 = .... = v = 0
H 0C : 1 = 2 = .... = v = 0
H 0T : 1 = 2 = .... = v = 0.

5
The analysis of variance can be developed on the same lines as earlier.
Minimizing S =


i =1

j =1

k =1

with respect to , i , j and

2
ijk

gives the least squares estimate as

= yooo
i = yioo yooo

i = 1, 2,..., v

j = yojo yooo

j = 1, 2,..., v

k = yook yooo

k = 1, 2,..., v.

Using the fitted


f
model based on these estimators, the total sum off squares can be partitioned into mutually orthogonal sum
of squares SSR, SSC, SSTr and SSE as

TSS = SSR + SSC + SSTr + SSE


where

G2
( yijk yooo ) = y 2

v
i =1 j =1 k =1
i =1 j =1 k =1
v

TSS: Total sum of squares =

SSR: Sum of squares due to rows =

v ( yioo
i =1

2
ijk

v
v
Ri2 G 2
yooo ) =
2 ; Ri = yijk
v
i =1 v
j =1 k =1
v

SSC: Sum of squares due to column =

v ( yojoj yooo ) =
2

j =1

j =1

SSTr : Sum of squares due to treatment =

v ( yook
k =1

C 2j

v
v
G2
2 ; C j = yijkj
v
v
i =1 k =1

Tk2 G 2
yooo ) =
2 ;
v
v
k =1
v

Tk = yijk
i =1 j =1

6
Degrees of freedom carried by SSR, SSC and SSTr are (v - 1) each.
Degrees of freedom carried by TSS are v2 1.
1
Degrees of freedom carried by SSE are (v - 1)(v - 2).
The expectations of mean squares are obtained as

v v 2
SSR
E ( MSR) = E

=
+
i

v 1 i =1
v 1
v v 2
SSC
2
E ( MSC ) = E
j
= +
v 1 j =1
v 1
v v 2
SSTr
2
E ( MSTr ) = E
k
= +
v 1 k =1
v 1

SSE
2
E ( MSE ) = E
= .
(v 1)(v 2)
Thus

- under H 0 R , FR =

MSR
~ F ((v 1), (v 1)(v 2))
MSE

- under H 0C , FC =

MSC
~ F ((v 1), (v 1)(v 2))
MSE

- under H 0T , FT =

MSTr
~ F ((v 1),
), (v 1)(
)(v 2)).
))
MSE

7
Decision rules:
Reject H 0 R at level

if FR > F1 ;v ( 1),( v 1)( v 2)

Reject H 0 C at level

if FC > F1 ;( v 1),( v 1)( v 2)

Reject H 0T at level

if FT > F1 ;( v 1),( v 1)( v 2) .

If any null hypothesis is rejected, then use multiple comparison test.


The analysis of variance table is as follows:

Source of
variation

Degrees of
freedom

Sum of
squares

Mean
squares

F - value

Rows

v1

SSR

MSR

FR

Columns

v1

SSC

MSC

FC

Treatments

v1

SSTr

MSTr

FT

Error

(v 1)(v 2)

SSE

MSE

Total

v2 - 1

TSS

Missing plot technique


It happens many time in conducting the experiments that some observation are missed
missed. This may happen due to several
reasons.

For example,
p in a clinical trial, suppose
pp
the readings
g of blood p
pressure are to be recorded after three days
y of g
giving
g the
medicine to the patients. Suppose the medicine is given to 20 patients and one of the patient doesnt turn up for providing
the blood pressure reading. Similarly, in an agricultural experiment, the seeds are sown and yields are to be recorded after
f
few
months.
th Suppose
S
some cattle
ttl destroys
d t
th crop off any plot
the
l t or the
th crop off any plot
l t is
i destroyed
d t
d due
d to
t storm,
t
i
insects
t etc.
t

In such cases, one option is to


somehow estimate the missing value on the basis of available data
data,
replace it back in the data and make the data set complete.
Now conduct the statistical analysis on the basis of completed data set as if no value was missing by making necessary
adjustments in the statistical tools to be applied. Such an area comes under the purview of missing data models and lot of
development has taken place.
pp
, e.g.
g
Several books on this issue have appeared,
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd edition, New York: John Wiley.
Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall, London etc.

We discuss here the classical missing plot technique proposed by Yates which involve the following steps:
Estimate the missing observations by the values which makes the error sum of squares to be minimum.
Substitute the unknown values by the missing observations.
Express the error sum of squares as a function of these unknown values.
Minimize the error sum of squares using principle of maxima/minima, i.e., differentiating it with respect to the missing
value and put it to zero and form a linear equation.
Form as many linear equation as the number of unknown values (i.e., differentiate error sum of squares with respect to
each unknown value).
Solve all the linear equations simultaneously and solutions will provide the missing values.
values
Impute the missing values with the estimated values and complete the data.
Apply analysis of variance tools.
The error sum of squares thus obtained is corrected but treatment sum of squares are not corrected.
The number of degrees of freedom associated with the total sum of squares are subtracted by the number of missing
values and adjusted in the error sum of squares. No change in the degrees of freedom of sum of squares due to
treatment is needed.

Analysis of Variance and


Design
g of Experiments
Experimentsp
-I
MODULE IV
LECTURE - 24

EXPERIMENTAL DESIGNS AND


THEIR ANALYSIS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Missing observations in RBD


One missing observation:
Suppose one observation in (i, j)th cell is missing and let this be x.
The arrangement of observations in RBD then will look like as follows:

Treatments
s

Blocks
1

y11

y21

yi1

yb1

B1 = yo1

y12

y22

yi2

yb2

B2 = yo2

.
.
.

.
.
.

.
.
.

.
.
.

.
.
.

y1j

y2j

ybj

Bj=

.
.
.

.
.
.

ybv

Bb= yob

yvo

Grand Total
'
G = yoo
+x

.
.
.
v
Treatment
totals

where
here

Block totals
i

.
.
.

.
.
.
y1v

y2v

T1 = y1o

T2 = y2o

yij = x

.
.
.

yiv

'

Ti = yio + x

'
yoo
: total of kno
known
n observations.
obser ations
'
yoj : total of known observations in jth block
yio' : total of known observations in ith treatment..

yoj' + x

Correction factor

(CF ) =
b

'
+ x)2
(G ') 2 ( yoo
=
n
bv

TSS = yij2 CF
i =1 j =1

= ( x 2 + terms which are constant with respect to x) CF


1
SSBl = [( yio' + x) 2 + terms which are constant with respect to x] CF
b
1
SSTr = [( yo' j + x) 2 + terms which are constant with respect to x] CF
v
SSE = TSS SSBl SSTr
'
( yoo
+ x)2
1 '
1 '
2
2
= x ( yio + x) ( yoj + x) +
+ (terms which are constant with respect to x) CF .
b
v
bv
2

Find x such that SSE is minimum


'
2( yio' + x) 2( yoj + x) 2( yoo
+ x)
( SSE )
= 0 2x

+
=0
x
b
v
bv
b
'
vyio' + byoj' yoo
x=
.
(b 1)(v 1)
'

or

Two missing observations


If there are two missing observation, then let they be x and y.
Let the corresponding row sums (block totals) are
Column sums (treatment totals) are
Total of known observations is S.

Then

1
1
1
SSE = x 2 + y 2 [( R1 + x ) 2 + ( R2 + y ) 2 ] [(C1 + x ) 2 + (C2 + y ) 2 ] + ( S + x + y ) 2 + terms independent of x and y.
b
v
bv

Now differentiate SSE with respect to x and y , as

R + x C1 + x S + x + y
( SSE )
=0 x 1

+
=0
x
b
b
bv
R + y C2 + y S + x + y
( SSE )
=0 y 2

+
= 0.
y
v
v
bv

Thus solving the following two linear equations in x and y, we obtain the estimated missing values

(b 1)(v 1) x = bR1 + vC1 S y


(b 1)(v 1) y = bR2 + vC2 S x.

Adjustments to be done in analysis of variance


i.

Obtain the within block sum of squares from incomplete data.

ii.

Subtract correct error sum of squares from (i) . This given the correct treatment sum of squares.

iii.

Reduce the degrees of freedom of error sum of squares by the number of missing observations.

iv.

No adjustments in other sum of squares are required.

Missing observations in LSD


Let
- x be the missing observation in (i, j, k)th cell, i.e. yijk , i = 1,,2,.., v , j = 1, 2,.., v , k = 1, 2,.., v.
- R:
R Total
T t l off known
k
observations
b
ti
in
i ith row
- C: Total of known observations in jth column
- T: Total of known observation receiving the kth treatment.
- S: Total of known observations
Now
Correction factor (CF ) =

( S + x) 2
v2

Total sum of squares (TSS ) = x 2 + term which are constant with respect to x - CF
Row sum of squares ( SSR) =

( R + x)2
+ term which are constant with respect to x - CF
v

Column sum of squares ( SSC ) =

(C + x)2
+ term which are constant with respect to x - CF
v

Treatment sum of squares ( SSTr ) =

(T + x) 2
+ term which are constant with respect to x - CF
v

Sum of squares due to error ( SSE ) = TSS - SSR - SSC - SSTr


1
2( S + x)2
= x 2 ( R + x) 2 + (C + x)2 + (T + x)2 +
.
v
v2

Ch
Choose
x such
h th
thatt SSE iis minimum
i i
.
So

d ( SSE )
=0
dx
2x

2
4( S + x )
( R + C + T + 3x )) + 2
v
v

or
x=

V ( R + C + T ) 2S
.
(v 1)(v 2)

Adjustment to be done in analysis of variance


Do all the steps as in the case of RBD.
To get the correct treatment sum of squares, proceed as follows:
Ignore the treatment classification and consider only row and classification .
Substitute the estimated values at the place of missing observation
observation.
Obtain the error sum of squares from complete data, say SSE1 .
Let SSE2 be the error sum of squares based on LSD obtained earlier.
Find corrected treatment sum of squares = SSE 2 SSE1.
Reduce of degrees of freedom of error sum of squares by the number of missing values.

Analysis of Variance and


Design
g of ExperimentExperiment
p
-I
MODULE V
LECTURE - 25

FACTORIAL EXPERIMENTS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Factorial experiments
p
involve simultaneouslyy more than one factor each at two or more levels. Several factors affect
simultaneously the characteristic under study in factorial experiments and the experimenter is interested in the main
effects and the interaction effects among different factors.
Fi t we consider
First
id an example
l to
t understand
d t d the
th utility
tilit off factorial
f t i l experiments.
i
t

Example: Suppose the yield from different plots in an agricultural experiment depend upon
1.

(i)

variety of crop and

(ii)

type of fertilizer.

Both the factors are in the control of experimenter.


2.

(iii) Soil fertility. This factor is not in the control of experimenter.

In order to compare the different crop varieties


- assign it to different plots keeping other factors like irrigation, fertilizer, etc. fixed and the same for all the plots.
- The conclusions for this will be valid only for the crops grown under similar conditions with respect to the factors like
fertilizer, irrigation etc.

3
In order to compare different fertilizers (or different dosage of fertilizers)
sow single crop on all the plots and vary the quantity of fertilizer from plot to plot
plot.
The conclusions will become invalid if different varieties of crop are sown.
It is quite possible that one variety may respond differently than another to a particular type of fertilizer.

Suppose we wish to compare,


- two crop varieties a and b, keeping the fertilizer fixed and
- three varieties of fertilizers A, B and C.
This can be accomplished with two randomized block designs (RBD) by assigning the treatments at random to three plots
in any block and two crop varieties at random.
The possible arrangement of the treatments may appear as follows.

aA aB

bB bA bC
bC bB

bA

and

bA bC bB

aC

aC aA aB
aB aC aA

With these two RBDs,


- the difference among two fertilizers can be estimated
- but the difference among the crop varieties cannot be estimated. The difference among the crop varieties is entangled
with the difference in blocks.

On the other hand


hand, if we use three sets of three blocks each and each block having two plots
plots, then
randomize the varieties inside each block and
assign treatments at random to three sets.
The possible arrangement of treatment combinations in blocks can be as follows:

bB

aB

aC

bC

aB

bB

bC

aC

bB

aB

aC

bC

and

aA

bA

bA

aA

bA

aA

Here the difference between crop varieties is estimable but the difference between fertilizer treatment is not estimable.

Factorial experiments overcome this difficulty and combine each crop with each fertilizer treatment. There are six treatment
combinations as
aA, aB, aC, bA, bB, bC.

Keeping the total number of observations to be 18 (as earlier), we can use RBD with three blocks with six plots each, e.g.
bA aC

aB bB

aA bC

aA aC

bC aB

bB

bB aB
B

bA aC
C

aA
A bC

bA

Now we can estimate the


- difference between crop varieties and
- difference between fertilizer treatments.

F t i l experiments
Factorial
i
t involves
i
l
simultaneously
i lt
l more than
th one factor
f t each
h att two
t
or more levels.
l
l
If the number of levels for each factor is the same, we call it as symmetrical factorial experiment.
If the number of levels of each factor is not the same, then we call it as a symmetrical or mixed factorial experiment.
We consider only symmetrical factorial experiments.

Through the factorial experiments, we can study the individual effect of each factor and interaction effect.

Now we consider a 22 factorial experiment with an example and try to develop and understand the theory and notations
through this example.

A general notation for representing the factors is to use capital letters, e.g., A,B,C etc. and levels of a factor are
represented in small letters.

For example, if there are two levels of A, they are denoted as a0 and a1.
Similarly the two levels of B are represented as b0 and b1 .
Other alternative representation to indicate the two levels of A is 0 (for a0 ) and 1 (for a1 ).
b1 )).
The factors of B are then 0 ((for b0 ) and 1 (for
(

Note: An important point to remember is that the factorial experiments are conducted in a design of experiment. For
example, the factorial experiment is conducted as an RBD.

Factorial experiments with factors at two levels (22 factorial experiment)


Suppose in an experiment, the values of current and voltage in an experiment affect the rotation per minutes (rpm) of a fan
speed. Suppose there are two levels of current.
5 Ampere, call it as level 1 (C1) and denote it as a0
10 Ampere,
Ampere call it as level 2 (C2) and denote it as a1.
Similarly, the two levels of voltage are
200 volts, call it as level 1 (V0) and denote it as b0
220 volts,
lt callll it as llevell 2 (V1) and
dd
denote
t it as b1.

The two factors are denoted as A, say for current and B, say for voltage.
In order to make an experiment
experiment, there are 4 different combinations of values of current and voltage.
voltage
1.

Current = 5 Ampere and Voltage = 200 Volts, denoted as C 0V0 a0 b0 .

2.

Current = 5 Ampere and Voltage = 220 Volts, denoted as C0V1 a0b1 .

3.

Current = 10 Ampere and Voltage = 200 Volts, denoted as C1V0 a1b0 .

4.

Current = 10 Ampere and Voltage = 220 Volts, denoted as C1V1 a1b1 .

The response from those treatment combinations are represented by a0b0 (1), (a0b1 ) (b),
respectively
(a1b0 ) (a ) and ( a1b1 ) ( ab),
) respectively.

8
Now consider the following :

(C0V0 ) + (C0V1 )
2

I.

: Average effect of voltage for current level C0


:

( a0b0 ) + ( a0b1 ) (1) + (b)

.
2
2

(C1V0 ) + (C1V1 )
: Average effect of voltage for current level C1
2
: ( a1b0 ) + ( a1b1 ) ( a ) + ( ab ) .
2
2

II.

Compare these two group means (or totals) as follows:


Average effect of V1 level Average effect at V0 level

(b) + (ab) (1) + (a )

2
2

= Main effect of voltage


= Main effect of B.

Comparison
p
like

(C0V1 ) (C0V0 ) ( a ) (1) indicate the effect of voltage at current level C0


and
(C1V1 ) (C1V0 ) (ab) (b) indicate the effect of voltage at current level C1.

9
The average interaction effect of voltage and current can be obtained as
Average effect of voltage Average effect of voltage

I
I
at
current
level
at
current
level
0
1

= Average effect of voltage at different levels of current.


=

(C1V1 ) (C1V0 ) (C0V1 ) (C0V0 )

2
2

(ab) (b) (a ) (1)

2
2

= Average interaction effect.

Similarly,

(C0V0 ) + (C1Vo ) (1) + (b)


=
: Average effect of current at voltage level V0 .
2
2
(C0V1 ) + (C1V1 ) (a ) + (ab)
=
: Average
A
effect
ff t off currentt att voltage
lt
level
l
l V1.
2
2
Comparison of these two as

Average effect of current Average effect of current

at
voltagelevel
V
at
voltage
level
V
0

(C0V1 ) + (C1V1 ) (C0V0 ) + (C1V0 )

2
2

(a ) + ( ab) (1) + (b)

2
2

= Main effect of current


= Main effect of A.

10

Comparison
p
like

(C1V0 ) (C0V0 ) = (b) (1) : Effect of current at voltage level V0


(C1V1 ) (C0V1 ) = (ab) (a ) : Effect of current at voltage level V1.
The average interact effect of current and voltage can be obtained as

Average effect of current Average effect of current

at voltage level V0
at voltage level V1
= Average effect of current at different levels of voltage
=

(C1V1 ) (C0V1 ) (C1V0 ) (C0V0 )

2
2

(ab) (a ) (b) (1)

2
2

= Average interaction effect


= Same as average effects of voltage at different levels of current.
(It is expected that the interaction effect of current and voltage is same as the
interaction effect of voltage and
and current)
current).

The quantity

(C0V0 ) + (C1V0 ) + (C0V1 ) + (C1V1 ) (1) + (a ) + (b) + ( ab)


=
4
4
gives the general mean effect of all the treatment combination.

Analysis of Variance and


Design
g of ExperimentExperiment
p
-I
MODULE V
LECTURE - 26

FACTORIAL EXPERIMENTS
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

2
Treating (ab) as (a)(b) symbolically (mathematically and conceptually, it is incorrect), we can now express all the main
effects interaction effect and general mean effect as follows:
effects,

Main effect of

A=

(a ) + (ab) (1) + (b) 1


(a 1)(b + 1)

= [ (ab) (b) + (a ) (1) ] =


2
2
2
2

Main effect of

B=

(b) + (ab) (1) + (a ) 1


(a + 1)(b 1)

= [ (ab) (a ) + (b) (1) ] =


2
2
2
2

Interaction effect of

A and B =

General mean effect

(M ) =

(ab) (b) (a ) (1) 1


(a 1)(b 1)

= [ (ab) (a ) + (1) (b) ] =


2
2
2
2

(1) + (a) + (b) + (abb) 1


(a + 1)(b + 1)
.
= [ (1) + (a) + (b) + (ab) ] =
4
4
4

Notice the roles of + and signs as well as the divisor.


There are two effects related to A and B.
To obtain the effect of a factor, write the corresponding factor with sign and others with + sign.
For example,
example in the main effect of A,
A a occurs with sign as in (a - 1) and b occurs with + sign as in (b + 1).
1)
In AB, both the effects are present so a and b both occur with + signs as in (a + 1)(b + 1).
Also note that the main and interaction effects are obtained by considering the typical differences of averages, so they
have divisor 2 whereas general mean effect is based on all the treatment combinations and so it has divisor 4.
There is a well defined statistical theory behind this logic but this logic helps in writing the final treatment combination
easily. This is demonstrated later with appropriate reasoning.

Other popular notations of treatment combinations are as follows:

a0b0 0 0 I
a0b1 0 1 a
a1b0 1 0 b
a1b1 1 1 ab.
Sometimes 0 is referred to as low level and 1 is referred to high level.

I denotes that both factors are at lower levels (a0b0 or 0 0).


) This is called as control treatment.
These effects can be represented in the following table:
Factorial effects

Treatment combinations

Divisor

((1))

((a))

((b))

((ab))

AB

The model corresponding to 22 factorial experiment is

yijk = + Ai + B j + ( AB )ij + ijk , i = 1,, 2,, j = 1,, 2,, k = 1,, 2,...,


, ,n
where n observations are obtained for each treatment combinations.

When the experiments are conducted factor by factor, then much more resources are required in comparison to the factorial
experiment. For example, if we conduct RBD for three level of voltage V0, V1 and V2 and two levels of current l0 and l1,
then to have 10 degrees of freedom for the error variance, we need
6 replications on voltage,
11 replications
li i
on current.
So total number of fans needed are 40.
For the factorial experiment with 6 combinations of 2 factors, total number of fans needed are 18 for the same precision.

We have considered the situation up to now by assuming only one observation for each treatment combination, i.e., no
replication. If r replicated observations for each of the treatment combinations are obtained, then the expressions for the
main and interaction effects can be expressed as

A=

1
[ (ab) + (a) b (1)]
2r

B=

1
[ (ab) + (b) a (1)]
2r

AB =
M=

1
[ (ab) + (1) a (b)]
2r

1
[(ab) + (a) + (b) + (1)].
4r
4r

5
Now we detail the statistical theory and concepts related to these expressions.
Let Y* = ((1), a, b, ab) ' be the vector of total response values. Then

A=

1 '
1
A AY* = (1 1 1 1)Y*
2r
2r

B=

1 '
1
A BY* = (1 1 1 1)Y*
2r
2r

AB =

1 '
1
A ABY* = (1 1 1 1)Y* .
2r
2r

Note that A, B and AB are the linear contrasts. Recall that a linear parametric function is estimable only when it is in the
form of linear contrast. Moreover, A, B and AB are the linear orthogonal contrasts in the total response values (1), a, b, ab
except for the factor 1/2r.
The sum of squares of a linear parametric function $\ell' y$ is given by $(\ell' y)^2/(\ell'\ell)$. If there are r replicates, the sum of squares is $(\ell' y)^2/(r\,\ell'\ell)$. It may also be recalled that under normality of the y's this sum of squares has a $\sigma^2\chi^2_1$ distribution, i.e., a chi-square distribution with one degree of freedom after division by $\sigma^2$. Thus the sums of squares due to A, B and AB are

$SSA = \dfrac{(\ell_A' Y_*)^2}{r\,\ell_A'\ell_A} = \dfrac{1}{4r}\left[(ab) + (a) - (b) - (1)\right]^2$

$SSB = \dfrac{(\ell_B' Y_*)^2}{r\,\ell_B'\ell_B} = \dfrac{1}{4r}\left[(ab) + (b) - (a) - (1)\right]^2$

$SSAB = \dfrac{(\ell_{AB}' Y_*)^2}{r\,\ell_{AB}'\ell_{AB}} = \dfrac{1}{4r}\left[(ab) + (1) - (a) - (b)\right]^2$.
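A minimal numerical sketch of these sum-of-squares formulas (not from the lecture notes): the treatment totals and the number of replicates below are hypothetical, and the contrast vectors are the ones given above.

```python
# Sketch: SSA, SSB, SSAB from the treatment totals of a 2^2 factorial with r replicates.
import numpy as np

r = 3                                          # replicates per treatment combination (assumed)
Y_star = np.array([10.0, 18.0, 13.0, 25.0])    # totals ((1), (a), (b), (ab)), hypothetical

ell_A  = np.array([-1,  1, -1,  1])            # contrast vector for A
ell_B  = np.array([-1, -1,  1,  1])            # contrast vector for B
ell_AB = np.array([ 1, -1, -1,  1])            # contrast vector for AB

def sum_of_squares(ell, totals, r):
    """SS = (ell' Y*)^2 / (r * ell' ell), as in the text."""
    return (ell @ totals) ** 2 / (r * (ell @ ell))

SSA  = sum_of_squares(ell_A,  Y_star, r)
SSB  = sum_of_squares(ell_B,  Y_star, r)
SSAB = sum_of_squares(ell_AB, Y_star, r)
print(SSA, SSB, SSAB)
```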

Each of SSA, SSB and SSAB has a $\sigma^2\chi^2_1$ distribution under normality of y.

The sum of squares due to total is computed as usual as

$TSS = \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{r} y_{ijk}^2 - \dfrac{G^2}{4r}$

where $G = \sum_i\sum_j\sum_k y_{ijk}$ is the grand total of all the observations.

The TSS has a $\sigma^2\chi^2$ distribution with $(2^2 r - 1)$ degrees of freedom.

The sum of squares due to error is also computed as usual as

SSE = TSS − SSA − SSB − SSAB,

which has a $\sigma^2\chi^2$ distribution with $(4r - 1) - 1 - 1 - 1 = 4(r - 1)$ degrees of freedom.

The mean squares are

MSA = SSA/1,  MSB = SSB/1,  MSAB = SSAB/1,  MSE = SSE/[4(r − 1)].

The F-statistics corresponding to A, B and AB are

$F_A = \dfrac{MSA}{MSE} \sim F(1, 4(r-1))$ under $H_0$,

$F_B = \dfrac{MSB}{MSE} \sim F(1, 4(r-1))$ under $H_0$,

$F_{AB} = \dfrac{MSAB}{MSE} \sim F(1, 4(r-1))$ under $H_0$.

The ANOVA table in the case of a 2^2 factorial experiment is given as follows:

Source   Sum of squares   Degrees of freedom   Mean squares   F
A        SSA              1                    MSA            F_A = MSA/MSE
B        SSB              1                    MSB            F_B = MSB/MSE
AB       SSAB             1                    MSAB           F_AB = MSAB/MSE
Error    SSE              4(r − 1)             MSE
Total    TSS              4r − 1

The decision rule is to reject the concerned null hypothesis when the value of the corresponding F-statistic satisfies

$F_{effect} > F_{1-\alpha}(1, 4(r-1))$.
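A sketch of the full 2^2 analysis with r replicates, assuming hypothetical observations (the data and r are made up); it reproduces TSS, SSA, SSB, SSAB, SSE, the mean squares and the F tests of the table above. SciPy is used only for the F distribution.

```python
# Sketch: complete ANOVA for a 2^2 factorial with r replicates per treatment combination.
import numpy as np
from scipy import stats

r = 4
data = {                                   # hypothetical observations per treatment combination
    '(1)': np.array([19.0, 20.5, 18.7, 21.1]),
    'a':   np.array([25.2, 24.1, 26.0, 23.8]),
    'b':   np.array([22.3, 21.8, 23.5, 22.9]),
    'ab':  np.array([30.1, 29.4, 31.2, 28.8]),
}
totals = np.array([data[t].sum() for t in ['(1)', 'a', 'b', 'ab']])
all_obs = np.concatenate(list(data.values()))
G, N = all_obs.sum(), all_obs.size         # grand total, N = 4r

TSS = (all_obs ** 2).sum() - G ** 2 / N

contrasts = {'A':  np.array([-1,  1, -1, 1]),
             'B':  np.array([-1, -1,  1, 1]),
             'AB': np.array([ 1, -1, -1, 1])}
SS = {name: (c @ totals) ** 2 / (4 * r) for name, c in contrasts.items()}

SSE = TSS - sum(SS.values())
df_error = 4 * (r - 1)
MSE = SSE / df_error

for name, ss in SS.items():
    F = (ss / 1) / MSE                     # each effect has 1 degree of freedom
    p = 1 - stats.f.cdf(F, 1, df_error)
    print(f"{name}: SS = {ss:.3f}, F = {F:.3f}, p = {p:.4f}")
```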

Analysis of Variance and Design of Experiments-I

MODULE V
LECTURE - 27

FACTORIAL EXPERIMENTS

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

2^3 Factorial experiment

Suppose that in a complete factorial experiment there are three factors, A, B and C, each at two levels, viz., a0, a1; b0, b1; and c0, c1, respectively. There are in total eight treatment combinations:

a0b0c0, a0b0c1, a0b1c0, a0b1c1, a1b0c0, a1b0c1, a1b1c0, a1b1c1.

Each treatment combination has r replicates, so the total number of observations is N = 2^3 r = 8r; these are to be analyzed for their influence on the response.

Assume the total response values are

$Y_* = [(1), a, b, ab, c, ac, bc, abc]'$.

The response values can be arranged in a three-dimensional contingency table. The effects are determined by the linear contrasts

$\ell_{effect}'\, Y_* = \ell_{effect}'\,((1), a, b, ab, c, ac, bc, abc)'$

using the following table of signs:

Factorial effect   (1)   a    b    ab   c    ac   bc   abc
I (M)               +    +    +    +    +    +    +    +
A                   −    +    −    +    −    +    −    +
B                   −    −    +    +    −    −    +    +
AB                  +    −    −    +    +    −    −    +
C                   −    −    −    −    +    +    +    +
AC                  +    −    +    −    −    +    −    +
BC                  +    +    −    −    −    −    +    +
ABC                 −    +    +    −    +    −    −    +

Note that once a few rows have been determined in this table, the rest can be obtained by simple multiplication of the symbols.
For example, consider the column corresponding to a: A has a + sign and B has a − sign, so AB has a − sign (= sign of A x sign of B).
Once AB has a − sign and C has a − sign, ABC has sign (sign of AB x sign of C), which is a + sign, and so on.
The first row is a basic element. With it, the total $\mathbf{1}'Y_*$ of all response values can be computed, where $\mathbf{1}$ is a column vector with all elements unity. If any other row is multiplied with the first row, it stays unchanged (therefore we call the first row the identity and denote it by I). Every other row has the same number of + and − signs. If + is replaced by 1 and − is replaced by −1, we obtain the vectors of orthogonal contrasts, each with squared norm 8 (= 2^3).
If each row is multiplied by itself, we obtain I (the first row). The product of any two rows leads to another row of the table. For example,

A · B = AB
AB · B = AB^2 = A
AC · BC = ABC^2 = AB.
The structure in the table helps in estimating the average effect.

For example, the average effect of A is

$A = \frac{1}{4r}\left[(a) - (1) + (ab) - (b) + (ac) - (c) + (abc) - (bc)\right]$,

which has the following explanation:

(i) average effect of A at the low level of B and the low level of C, $(a_1b_0c_0) - (a_0b_0c_0)$: $\frac{1}{r}[(a) - (1)]$;

(ii) average effect of A at the high level of B and the low level of C, $(a_1b_1c_0) - (a_0b_1c_0)$: $\frac{1}{r}[(ab) - (b)]$;

(iii) average effect of A at the low level of B and the high level of C, $(a_1b_0c_1) - (a_0b_0c_1)$: $\frac{1}{r}[(ac) - (c)]$;

(iv) average effect of A at the high level of B and the high level of C, $(a_1b_1c_1) - (a_0b_1c_1)$: $\frac{1}{r}[(abc) - (bc)]$.

Hence, over all combinations of B and C, the average effect of A is the average of the four average effects in (i)-(iv).

Similarly, the other main and interaction effects are as follows:

$B = \frac{1}{4r}\left[(b) + (ab) + (bc) + (abc) - (1) - (a) - (c) - (ac)\right] = \frac{(a+1)(b-1)(c+1)}{4r}$

$C = \frac{1}{4r}\left[(c) + (ac) + (bc) + (abc) - (1) - (a) - (b) - (ab)\right] = \frac{(a+1)(b+1)(c-1)}{4r}$

$AB = \frac{1}{4r}\left[(1) + (ab) + (c) + (abc) - (a) - (b) - (ac) - (bc)\right] = \frac{(a-1)(b-1)(c+1)}{4r}$

$AC = \frac{1}{4r}\left[(1) + (b) + (ac) + (abc) - (a) - (ab) - (c) - (bc)\right] = \frac{(a-1)(b+1)(c-1)}{4r}$

$BC = \frac{1}{4r}\left[(1) + (a) + (bc) + (abc) - (b) - (ab) - (c) - (ac)\right] = \frac{(a+1)(b-1)(c-1)}{4r}$

$ABC = \frac{1}{4r}\left[(abc) + (a) + (b) + (c) - (ab) - (ac) - (bc) - (1)\right] = \frac{(a-1)(b-1)(c-1)}{4r}$.
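A small sketch (hypothetical treatment totals) showing how the sign table of the 2^3 factorial can be built from the three main-effect columns and used to compute all seven effects with divisor 4r:

```python
# Sketch: 2^3 factorial effects from treatment totals via elementwise products of sign columns.
import numpy as np

r = 2
# totals in standard order (1), a, b, ab, c, ac, bc, abc (hypothetical values)
totals = np.array([40.0, 52.0, 45.0, 60.0, 43.0, 57.0, 48.0, 66.0])

# main-effect sign columns in standard order: -1 at the low level, +1 at the high level
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1])
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1])
C = np.array([-1, -1, -1, -1, 1, 1, 1, 1])

signs = {'A': A, 'B': B, 'C': C,
         'AB': A * B, 'AC': A * C, 'BC': B * C, 'ABC': A * B * C}

effects = {name: s @ totals / (4 * r) for name, s in signs.items()}  # divisor r * 2^(n-1)
print(effects)
```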

The various sums of squares in the 2^3 factorial experiment are obtained as

$SS(Effect) = \dfrac{(\text{linear contrast})^2}{8r} = \dfrac{(\ell_{effect}'\,Y_*)^2}{r\,\ell_{effect}'\ell_{effect}}$,

which, divided by $\sigma^2$, follows a chi-square distribution with one degree of freedom under normality of $Y_*$. The corresponding mean squares are obtained as

$MS(Effect) = \dfrac{SS(Effect)}{\text{degrees of freedom}}$.

The corresponding F-statistics are obtained as

$F_{effect} = \dfrac{MS(Effect)}{MS(Error)}$,

which follows an F-distribution with 1 and the error degrees of freedom under the respective null hypothesis. The decision rule is to reject the corresponding null hypothesis at the $\alpha$ level of significance whenever

$F_{effect} > F_{1-\alpha}(1, df_{error})$.

These outcomes are presented in the following ANOVA table.

Source   Sum of squares   Degrees of freedom   Mean squares                      F
A        SSA              1                    MSA = SSA/1                       F_A
B        SSB              1                    MSB = SSB/1                       F_B
AB       SSAB             1                    MSAB = SSAB/1                     F_AB
C        SSC              1                    MSC = SSC/1                       F_C
AC       SSAC             1                    MSAC = SSAC/1                     F_AC
BC       SSBC             1                    MSBC = SSBC/1                     F_BC
ABC      SSABC            1                    MSABC = SSABC/1                   F_ABC
Error    SS(Error)        8(r − 1)             MS(Error) = SS(Error)/{8(r − 1)}
Total    TSS              8r − 1

Analysis of Variance and Design of Experiments-I

MODULE V
LECTURE - 28

FACTORIAL EXPERIMENTS

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

2^n Factorial experiment

Based on the theory developed for the 2^2 and 2^3 factorial experiments, we now extend it to the 2^n factorial experiment.
Capital letters A, B, C, ... denote the factors; they also denote the main effect contrasts for the factors A, B, C, ...

AB, AC, BC, ... denote the first order or 2-factor interactions,
ABC, ABD, BCD, ... denote the second order or 3-factor interactions, and so on.

Each main effect and interaction effect carries one degree of freedom.

Total number of main effects = $\binom{n}{1}$ = n.
Total number of first order interactions = $\binom{n}{2}$.
Total number of second order interactions = $\binom{n}{3}$, and so on.

Standard order for treatment combinations

The list of treatments can be expressed in a standard order.
For one factor A, the standard order is (1), a.

For two factors A and B, the standard order is obtained by adding b and ab to the standard order of one factor A. These are derived by multiplying (1) and a by b, i.e.

b x {(1), a}  gives  (1), a, b, ab.

For three factors, add c, ac, bc and abc, which are derived by multiplying the standard order of A and B by c, i.e.

c x {(1), a, b, ab}  gives  (1), a, b, ab, c, ac, bc, abc.

Thus the standard order for any number of factors is obtained step by step by multiplying the preceding standard order by each additional letter.
For example, the standard order of A, B, C and D in a 2^4 factorial experiment is

(1), a, b, ab, c, ac, bc, abc,  d x {(1), a, b, ab, c, ac, bc, abc}
= (1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, abcd.
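A minimal sketch of this multiplication rule in code (the letter names a, b, c, ... are the usual conventional labels, not taken from any particular data set):

```python
# Sketch: generate the standard (Yates) order of treatment combinations for n factors.
def standard_order(n):
    letters = "abcdefghijklmnopqrstuvwxyz"[:n]
    order = ["(1)"]
    for letter in letters:
        # multiply every existing combination by the new letter and append the results
        new = [letter if t == "(1)" else t + letter for t in order]
        order = order + new
    return order

print(standard_order(3))
# ['(1)', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
print(standard_order(4))
# appends d, ad, bd, abd, cd, acd, bcd, abcd to the list above
```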

How to find the contrasts for main effects and interaction effects

Recall that earlier we illustrated the concept of writing the contrasts for main and interaction effects. For example, in a 2^2 factorial experiment we had expressed

$A = \frac{1}{2}(a-1)(b+1) = \frac{1}{2}\left[-(1) + (a) - (b) + (ab)\right]$

$AB = \frac{1}{2}(a-1)(b-1) = \frac{1}{2}\left[(1) - (a) - (b) + (ab)\right]$.

Note that each effect has two components - the divisor and the contrast. When the order of the factorial increases, it is cumbersome to derive such expressions. Some methods have been suggested for writing the expressions for the factorial effects. First we detail how to write the divisor and then illustrate the methods for obtaining the contrasts.

How to write the divisor

In a 2^n factorial experiment,
the general mean effect has divisor 2^n and
any effect (main or interaction) has divisor 2^(n-1).

For example, in a 2^6 factorial experiment, the general mean effect has divisor 2^6 and any main effect or interaction effect of any order has divisor 2^(6-1) = 2^5.

If r replicates of each treatment combination are available, then
the general mean effect has divisor r 2^n and
any main effect or interaction effect of any order has divisor r 2^(n-1).
How to write the contrasts

Method 1: The contrasts belonging to the main effects and the interaction effects are written as follows:

A = (a − 1)(b + 1)(c + 1)...(z + 1)
B = (a + 1)(b − 1)(c + 1)...(z + 1)
C = (a + 1)(b + 1)(c − 1)...(z + 1)
...
AB = (a − 1)(b − 1)(c + 1)...(z + 1)
BC = (a + 1)(b − 1)(c − 1)...(z + 1)
...
ABC = (a − 1)(b − 1)(c − 1)...(z + 1)
...
ABC...Z = (a − 1)(b − 1)(c − 1)...(z − 1).

Look at the pattern of assigning + and − signs on the right hand side. The letters common to the left and right hand sides of the equality sign (irrespective of small or capital letters) carry a − sign and the rest carry a + sign.
The expressions on the right hand side, when simplified algebraically, give the contrasts in terms of the treatment combinations. For example, in a 2^3 factorial,

$A = \frac{1}{2^{3-1}}(a-1)(b+1)(c+1) = \frac{1}{4}\left[-(1) + (a) - (b) + (ab) - (c) + (ac) - (bc) + (abc)\right]$

$M = \frac{1}{2^{3}}(a+1)(b+1)(c+1) = \frac{1}{8}\left[(1) + (a) + (b) + (ab) + (c) + (ac) + (bc) + (abc)\right]$.

Method 2:
Form a table such that
the rows correspond to the main or interaction effects and
the columns correspond to the treatment combinations (or the other way round).
The + and − signs in the table indicate the sign of each treatment combination in the contrast of each main or interaction effect.
The signs are determined by the rule of odds and evens, given as follows (a short coded sketch of this rule follows the flow diagram below):
if the interaction has an even number of letters (AB, ABCD, ...), a treatment combination having an even number of letters in common with the interaction enters with a + sign and one with an odd number of letters in common enters with a − sign;
if the interaction has an odd number of letters (A, ABC, ...), the rule is reversed.
Once a few rows are filled in, the others can be obtained through the multiplication rule. For example, the sign of ABCD is obtained as (sign of A x sign of BCD) or (sign of AB x sign of CD).
Treatment combination (1) is taken to have an even number (zero) of letters in common with every interaction.

This rule of assignment of + or − signs is illustrated in the following flow diagram:

[Flow diagram: for an interaction with an even number of letters (AB, ABCD, ...), count the letters common between the treatment combination and the interaction; an even count gives a + sign and an odd count gives a − sign. For an interaction with an odd number of letters (A, ABC, ...), an even count gives a − sign and an odd count gives a + sign.]
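A minimal sketch of the rule of odds and evens (not from the notes); it reproduces the sign rows of the 2^3 table for illustrative effect and treatment labels:

```python
# Sketch: sign of a treatment combination in an interaction contrast by the odds-and-evens rule.
def sign(interaction, treatment):
    """interaction like 'ABC' (capital letters); treatment like 'ab' or '(1)'."""
    common = 0 if treatment == "(1)" else sum(1 for x in treatment if x.upper() in interaction)
    if len(interaction) % 2 == 0:             # even number of letters in the interaction
        return +1 if common % 2 == 0 else -1
    return -1 if common % 2 == 0 else +1      # odd number of letters: rule is reversed

treatments = ["(1)", "a", "b", "ab", "c", "ac", "bc", "abc"]
for effect in ["A", "B", "AB", "C", "AC", "BC", "ABC"]:
    print(effect, [sign(effect, t) for t in treatments])
```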

For example, in a 2^3 factorial experiment, write
rows for the main and interaction effects and
columns for the treatment combinations in standard order.
Take treatment combination (1) to have an even number (zero) of letters in common with every interaction.

This gives the following table:

Factorial effect   (1)   a    b    ab   c    ac   bc   abc
A                   −    +    −    +    −    +    −    +
B                   −    −    +    +    −    −    +    +
AB                  +    −    −    +    +    −    −    +
C                   −    −    −    −    +    +    +    +
AC                  +    −    +    −    −    +    −    +
BC                  +    +    −    −    −    −    +    +
ABC                 −    +    +    −    +    −    −    +

Analysis of Variance and Design of Experiments-I

MODULE V
LECTURE - 29

FACTORIAL EXPERIMENTS

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Sums of squares

Suppose a 2^n factorial experiment is carried out in a randomized block design with r replicates.
Denote the total yield (output) from the r plots (experimental units) receiving a particular treatment combination by the same symbol within square brackets. For example, [ab] denotes the total yield from the plots receiving the treatment combination (ab).

In a 2^2 factorial experiment, the factorial effect totals are, for instance,

[A] = [ab] − [b] + [a] − [1],

where
[ab] = treatment total, i.e., the sum of the r observations in which both factors A and B are at the second level;
[a] = treatment total, i.e., the sum of the r observations in which factor A is at the second level and factor B is at the first level;
[b] = treatment total, i.e., the sum of the r observations in which factor A is at the first level and factor B is at the second level;
[1] = treatment total, i.e., the sum of the r observations in which both factors A and B are at the first level.

Thus

$[A] = \sum_{i=1}^{r}\left(y_{i(ab)} - y_{i(b)} + y_{i(a)} - y_{i(1)}\right) = \ell_A'\, y_A$ (say),

where $\ell_A$ is a vector of +1's and −1's and $y_A$ is the vector of responses from ab, b, a and (1). Similarly, the other effect totals can be found.

The sum of squares due to a particular effect is obtained as

$\dfrac{[\text{effect total}]^2}{\text{total number of observations}}$.

In a 2^2 factorial experiment in an RBD, the sum of squares due to A is

$SSA = \dfrac{(\ell_A'\, y_A)^2}{r\,2^2}$.

In a 2^n factorial experiment in an RBD, the divisor is $r\,2^n$. If a Latin square design based on a 2^n x 2^n Latin square is used, then r is replaced by 2^n.

Yates method of computation of sums of squares

Yates method gives a systematic approach to finding the sums of squares. We are not presenting the complete method here; only the part used for computing the sums of squares is presented, and the part used to verify them is omitted.
It has the following steps (a small sketch of the procedure in code follows this list):
1. First write the treatment combinations in the standard order in a column at the beginning of the table, called the treatment column.
2. Find the total yield for each treatment. Write this as the second column of the table, called the yield column.
3. Obtain columns (1), (2), ..., (n) successively:
(i) obtain column (1) from the yield column:
a) the upper half is obtained by adding the yields in pairs;
b) the second half is obtained by taking differences in pairs, each difference being the second term of the pair minus the first term;
(ii) the columns (2), (3), ..., (n) are obtained from the preceding ones in the same manner as used for getting column (1) from the yield column.
4. This process of finding columns is repeated n times in a 2^n factorial experiment.
5. Sum of squares due to an effect = $\dfrac{[\text{column}(n)\ \text{entry}]^2}{\text{total number of observations}}$.
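A minimal sketch of the Yates procedure for the add/subtract passes, using the same hypothetical 2^2 treatment totals as earlier:

```python
# Sketch: Yates algorithm; column (n) holds the effect totals [M], [A], [B], [AB], ...
def yates(totals):
    col = list(totals)
    n = len(totals).bit_length() - 1          # number of factors (len(totals) = 2^n)
    for _ in range(n):
        half = len(col) // 2
        sums  = [col[2*i] + col[2*i + 1] for i in range(half)]
        diffs = [col[2*i + 1] - col[2*i] for i in range(half)]
        col = sums + diffs
    return col

# 2^2 example with totals ((1), a, b, ab); SS(effect) = [effect]^2 / (r * 2^n)
r = 3
totals = [10.0, 18.0, 13.0, 25.0]             # hypothetical treatment totals
effect_totals = yates(totals)                 # [ [M], [A], [B], [AB] ]
for name, tot in zip(["A", "B", "AB"], effect_totals[1:]):
    print(name, tot, "SS =", tot ** 2 / (r * len(totals)))
```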

Example: Yates procedure for a 2^2 factorial experiment

Treatment combination   Yield (total from all r replicates)   (1)            (2)
(1)                     (1)                                   (1) + (a)      (1) + (a) + (b) + (ab) = [M]
a                       (a)                                   (b) + (ab)     −(1) + (a) − (b) + (ab) = [A]
b                       (b)                                   (a) − (1)      −(1) − (a) + (b) + (ab) = [B]
ab                      (ab)                                  (ab) − (b)     (1) − (a) − (b) + (ab) = [AB]

Note: the columns are obtained 2 times because this is a 2^2 factorial experiment.

Now

$SSA = \dfrac{[A]^2}{4r}$,  $SSB = \dfrac{[B]^2}{4r}$,  $SSAB = \dfrac{[AB]^2}{4r}$.
Example: Yates procedure for a 2^3 factorial experiment

Treatment   Yield (total from all r replicates)   (1)                     (2)              (3)
(1)         (1)                                   u1 = (1) + (a)          v1 = u1 + u2     w1 = v1 + v2 = [M]
a           (a)                                   u2 = (b) + (ab)         v2 = u3 + u4     w2 = v3 + v4 = [A]
b           (b)                                   u3 = (c) + (ac)         v3 = u5 + u6     w3 = v5 + v6 = [B]
ab          (ab)                                  u4 = (bc) + (abc)       v4 = u7 + u8     w4 = v7 + v8 = [AB]
c           (c)                                   u5 = (a) − (1)          v5 = u2 − u1     w5 = v2 − v1 = [C]
ac          (ac)                                  u6 = (ab) − (b)         v6 = u4 − u3     w6 = v4 − v3 = [AC]
bc          (bc)                                  u7 = (ac) − (c)         v7 = u6 − u5     w7 = v6 − v5 = [BC]
abc         (abc)                                 u8 = (abc) − (bc)       v8 = u8 − u7     w8 = v8 − v7 = [ABC]

The sums of squares are obtained as follows when the design is an RBD:

$SS(Effect) = \dfrac{[\text{Effect}]^2}{r\,2^3}$.

For the analysis of a 2^n factorial experiment, the analysis of variance involves the partitioning of the treatment sum of squares so as to obtain the sums of squares due to the main and interaction effects of the factors. These sums of squares are mutually orthogonal, so

Treatment SS = total of all the SS due to main and interaction effects.

For example, in a 2^2 factorial experiment in an RBD with r replications, the division of the degrees of freedom and the treatment sum of squares is as follows:

Source          Degrees of freedom   Sum of squares
Replications    r − 1
Treatments      4 − 1 = 3            [A]^2/4r + [B]^2/4r + [AB]^2/4r
  A             1                    [A]^2/4r
  B             1                    [B]^2/4r
  AB            1                    [AB]^2/4r
Error           3(r − 1)
Total           4r − 1

The decision rule is to reject the concerned null hypothesis when the related F-statistic satisfies

$F_{effect} > F_{1-\alpha}(1, 3(r-1))$.

Analysis of Variance and Design of Experiments-I

MODULE VI
LECTURE - 30

CONFOUNDING

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

If the number of factors or levels increases in a factorial experiment, then the number of treatment combinations increases rapidly. When the number of treatment combinations is large, it may be difficult to get blocks of sufficiently large size to accommodate all the treatment combinations. Under such situations, one may use either connected incomplete block designs, e.g., balanced incomplete block designs (BIBD), where all the main effect and interaction contrasts can be estimated, or unconnected designs, where not all of these contrasts can be estimated.
Non-estimable contrasts are said to be confounded.
Note that a linear function $\lambda'\beta$ is said to be estimable if there exists a linear function $l'y$ of the observations on the random variable y such that $E(l'y) = \lambda'\beta$. Now two questions arise: firstly, what does confounding mean and, secondly, how does it compare with using a BIBD.

In order to understand confounding, let us consider a simple example of a 2^2 factorial with factors A and B, each at two levels. The four treatment combinations are (1), a, b and ab. Suppose each batch of raw material to be used in the experiment is enough only for two treatment combinations to be tested, so two batches of raw material are required. Thus two out of the four treatment combinations must be assigned to each block. Suppose this 2^2 factorial experiment is being conducted in a randomized block design. Then the corresponding model is

$E(y_{ij}) = \mu + \beta_i + \tau_j$,

and then

$A = \frac{1}{2r}\left[(ab) + (a) - (b) - (1)\right]$,  $B = \frac{1}{2r}\left[(ab) + (b) - (a) - (1)\right]$,  $AB = \frac{1}{2r}\left[(ab) + (1) - (a) - (b)\right]$.

Suppose the following block arrangement is opted for:

Block 1: (1), ab
Block 2: a, b

If the block effects of blocks 1 and 2 are $\beta_1$ and $\beta_2$, respectively, then the average responses corresponding to the treatment combinations a, b, ab and (1) are

$E[y(a)] = \mu + \beta_2 + \tau(a)$,
$E[y(b)] = \mu + \beta_2 + \tau(b)$,
$E[y(ab)] = \mu + \beta_1 + \tau(ab)$,
$E[y(1)] = \mu + \beta_1 + \tau(1)$,

respectively. Here $y(a), y(b), y(ab), y(1)$ and $\tau(a), \tau(b), \tau(ab), \tau(1)$ denote the responses and treatment effects corresponding to a, b, ab and (1), respectively.

Ignoring the factor 1/2r in A, B, AB and using $E[y(a)], E[y(b)], E[y(ab)], E[y(1)]$, the effect A is expressible as follows:

A = [μ + β1 + τ(ab)] + [μ + β2 + τ(a)] − [μ + β2 + τ(b)] − [μ + β1 + τ(1)]
  = τ(ab) + τ(a) − τ(b) − τ(1).

So the block effect is not present in A and it is not mixed up with the treatment effects. In this case, we say that the main effect A is not confounded.

Similarly, for the main effect B, we have

B = [μ + β1 + τ(ab)] + [μ + β2 + τ(b)] − [μ + β2 + τ(a)] − [μ + β1 + τ(1)]
  = τ(ab) + τ(b) − τ(a) − τ(1).

So there is no block effect present in B and thus B is not confounded.

For the interaction effect AB, we have

AB = [μ + β1 + τ(ab)] + [μ + β1 + τ(1)] − [μ + β2 + τ(a)] − [μ + β2 + τ(b)]
   = 2(β1 − β2) + τ(ab) + τ(1) − τ(a) − τ(b).

Here the block effects β1 and β2 are present in AB. In fact, the block effects are mixed up with the treatment effects and cannot be separated individually from them in AB. So AB is said to be confounded (or mixed up) with the blocks.

Alternatively, suppose the arrangement of treatments in the blocks is as follows:

Block 1: a, ab
Block 2: (1), b

Then the main effect A is expressible as

A = [μ + β1 + τ(ab)] + [μ + β1 + τ(a)] − [μ + β2 + τ(b)] − [μ + β2 + τ(1)]
  = 2(β1 − β2) + τ(ab) + τ(a) − τ(b) − τ(1).

Observe that the block effects β1 and β2 are present in this expression. So the main effect A is confounded with the blocks in this arrangement of treatments.

We notice that it is in our control to decide which of the effects is to be confounded. The order in which the treatments are run within a block is determined randomly. The choice of which block to run first is also decided randomly.

The following observation emerges from the allocation of treatments to blocks:

For a given effect, when the two treatment combinations with the same sign are assigned to one block and the other two treatment combinations with the opposite sign are assigned to the other block, then the effect gets confounded.

For example, when AB is confounded,
ab and (1), with + signs, are assigned to block 1, whereas
a and b, with − signs, are assigned to block 2.

Similarly, when A is confounded,
a and ab, with + signs, are assigned to block 1, whereas
(1) and b, with − signs, are assigned to block 2.

The reason behind this observation is that if every block contains the treatment combinations in such a way that the effect remains a linear contrast of treatment effects, then the effect is estimable and thus unconfounded. This is also evident from the theory of linear estimation: a linear parametric function is estimable if it is in the form of a linear contrast.

The contrasts which are not estimable are said to be confounded with the differences between blocks (or block effects). The contrasts which are estimable are said to be unconfounded with blocks, or free from block effects.

Comparison of balanced incomplete block design (BIBD) versus confounding in a factorial

Note: This section can be understood after reading the module on balanced incomplete block designs (BIBD) in the second part of the course.

Now we explain how confounding and BIBD compare. Consider a 2^3 factorial experiment, which needs a block of size 8. Suppose the raw material available to conduct the experiment is sufficient only for blocks of size 4. One can use a BIBD in this case with parameters v = 8, b = 14, k = 4, r = 7 and λ = 3 (such a BIBD exists). For this BIBD, the efficiency factor is

$E = \dfrac{\lambda v}{rk} = \dfrac{6}{7}$

and

$Var(\hat\tau_j - \hat\tau_{j'})_{BIBD} = \dfrac{2k\sigma^2}{\lambda v} = \dfrac{\sigma^2}{3}$  $(j \neq j')$.

Consider now an unconnected design in which 7 out of the 14 blocks get the treatment combinations

a  b  c  abc

and the remaining 7 blocks get the treatment combinations

(1)  ab  bc  ac.

In this case, all the effects A, B, C, AB, BC and AC are estimable, but ABC is not estimable because the treatment combinations with all + signs and all − signs in

ABC = (a − 1)(b − 1)(c − 1) = (a + b + c + abc) − ((1) + ab + bc + ac)

are contained in the same blocks (the first set in block 1, the second set in block 2). In this case, the variance of the estimates of the unconfounded main effects and interactions is $8\sigma^2/7$. Note that in the case of an RBD,

$Var(\hat\tau_j - \hat\tau_{j'})_{RBD} = \dfrac{2\sigma^2}{r} = \dfrac{2\sigma^2}{7}$,

and there are four such elementary contrasts involved, so the total variance is $4 \times 2\sigma^2/7 = 8\sigma^2/7$, which is smaller than the corresponding variance under the BIBD.

We observe that, at the cost of not being able to estimate ABC, we have better estimates of A, B, C, AB, BC and AC with the same number of replicates as in the BIBD. Since higher order interactions are difficult to interpret and are usually not large, it is much better to use confounding arrangements which provide better estimates of the interactions in which we are more interested.

Note that this example is for understanding only. As such, the concepts behind incomplete block designs and confounding are different.

Confounding arrangement

The arrangement of the treatment combinations in different blocks, whereby some pre-determined effect (either main or interaction) contrasts are confounded, is called a confounding arrangement.

For example, when the interaction ABC is confounded in a 2^3 factorial experiment, the confounding arrangement consists of dividing the eight treatment combinations into the following two sets:

a  b  c  abc
and
(1)  ab  bc  ac.

With the treatments of each set being assigned to the same block, and each of these sets being replicated the same number of times in the experiment, we say that we have a confounding arrangement of a 2^3 factorial in two blocks.

It may be noted that any confounding arrangement has to be such that only the predetermined interactions are confounded and the estimates of the interactions which are not confounded remain orthogonal whenever those interactions are orthogonal.

Defining contrast

The interactions which are confounded are called the defining contrasts of the confounding arrangement.

A confounded contrast has treatment combinations with the same sign within each block of the confounding arrangement. For example, if the effect AB = (a − 1)(b − 1)(c + 1) is to be confounded, then put all the treatment combinations with a + sign, i.e., (1), ab, c and abc, in one block and all the treatment combinations with a − sign, i.e., a, b, ac and bc, in another block. So the block size reduces from 8 to 4 when one effect is confounded in a 2^3 factorial experiment.

Suppose that along with ABC we also want to confound C. To obtain such blocks, consider the blocks where ABC is confounded and divide them into further halves. So the block

a  b  c  abc

is divided into the following two blocks:

a  b    and    c  abc,

and the block

(1)  ab  bc  ac

is divided into the following two blocks:

(1)  ab    and    bc  ac.

These blocks of 4 treatments are divided into blocks of 2 treatments each in the following way. If only C is confounded, then the block with the + sign treatment combinations of C is

c  ac  bc  abc

and the block with the − sign treatment combinations of C is

(1)  a  b  ab.

Now look at
(i) the following block with + signs when ABC = (a − 1)(b − 1)(c − 1) is confounded:

a  b  c  abc

(ii) the following block with + signs when C = (a + 1)(b + 1)(c − 1) is confounded:

c  ac  bc  abc

(iii) the table of + and − signs in the case of a 2^3 factorial experiment.

Identify the treatment combinations having a + sign in common in the two blocks in (i) and (ii). These treatment combinations are c and abc, so assign them to one block. The remaining treatment combinations out of a, b, c and abc are a and b, which go into another block.

Similarly, look at
(a) the following block with − signs when ABC is confounded:

(1)  ab  bc  ac

(b) the following block with − signs when C is confounded:

(1)  a  b  ab

(c) the table of + and − signs in the case of a 2^3 factorial experiment.

Identify the treatment combinations having a − sign in common in the two blocks in (a) and (b). These treatment combinations are (1) and ab, which go into one block, and the remaining two treatment combinations ac and bc out of (1), ab, bc and ac go into another block. So the blocks where both ABC and C are confounded together are

(1) ab,   a b,   ac bc   and   c abc.

While making these assignments of the treatment combinations into four blocks, each of size two, we notice that another effect, viz., AB, also gets confounded automatically. Thus we see that when we confound two effects, a third effect automatically gets confounded. This situation is quite general: the defining contrasts for a confounding arrangement cannot be chosen arbitrarily. If some defining contrasts are selected, then certain others will also get confounded.

Analysis of Variance and Design of Experiments-I

MODULE VI
LECTURE - 31

CONFOUNDING

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Now we present some definitions which are useful in describing confounding arrangements.

Generalized interaction
Given any two interactions, their generalized interaction is obtained by multiplying the factors (in capital letters) and ignoring all the terms with an even exponent.
For example, the generalized interaction of the factors ABC and BCD is ABC x BCD = AB^2C^2D = AD, and the generalized interaction of the factors AB, BC and ABC is AB x BC x ABC = A^2B^3C^2 = B.

Independent set
A set of main effects and interaction contrasts is called independent if no member of the set can be obtained as a generalized interaction of the other members of the set.

For example, the set of factors AB, BC and AD is an independent set, but the set of factors AB, BC, CD and AD is not an independent set because AB x BC x CD = AB^2C^2D = AD, which is already contained in the set.

Orthogonal treatment combinations
The treatment combination $a^p b^q c^r \ldots$ is said to be orthogonal to the interaction $A^x B^y C^z \ldots$ if $(px + qy + rz + \ldots)$ is divisible by 2. Since $p, q, r, \ldots, x, y, z, \ldots$ are either 0 or 1, a treatment combination is orthogonal to an interaction if they have an even number of letters in common. Treatment combination (1) is orthogonal to every interaction.
If $a^{p_1} b^{q_1} c^{r_1}\ldots$ and $a^{p_2} b^{q_2} c^{r_2}\ldots$ are both orthogonal to $A^x B^y C^z \ldots$, then the product $a^{p_1+p_2} b^{q_1+q_2} c^{r_1+r_2}\ldots$ is also orthogonal to $A^x B^y C^z \ldots$. Similarly, if two interactions are orthogonal to a treatment combination, then their generalized interaction is also orthogonal to it.
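A minimal sketch of the generalized-interaction rule (not from the notes): multiplying effects and dropping even exponents is the symmetric difference of their letter sets, so a few lines of code reproduce the examples above.

```python
# Sketch: generalized interaction of two or more effects via symmetric difference of letters.
def generalized_interaction(*effects):
    result = set()
    for effect in effects:
        result ^= set(effect)            # symmetric difference: letters with even powers cancel
    return "".join(sorted(result)) or "I"

print(generalized_interaction("ABC", "BCD"))           # AD
print(generalized_interaction("AB", "BC", "ABC"))      # B
print(generalized_interaction("ACE", "CDE", "ABDE"))   # BE
```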

Now we give some general results for a confounding arrangement. Suppose we wish to have a confounding arrangement in 2^p blocks of a 2^k factorial experiment. Then we have the following observations:

1. The size of each block is 2^(k−p).
2. The number of elements in the defining contrasts is (2^p − 1), i.e., (2^p − 1) interactions have to be confounded.
Proof: If p independent interactions are confounded, then the number of generalized interactions formed from m of them is $\binom{p}{m}$, (m = 1, 2, ..., p). So the total number of confounded effects is $\sum_{m=1}^{p}\binom{p}{m} = 2^p - 1$.
3. If any two interactions are confounded, then their generalized interaction is also confounded.
4. The number of independent contrasts out of the (2^p − 1) defining contrasts is p, and the rest are obtained as their generalized interactions.
5. The number of effects getting confounded automatically is (2^p − p − 1).

To illustrate this, consider a 2^5 factorial (k = 5) with five factors, viz., A, B, C, D and E, to be confounded in 2^3 blocks (p = 3). The size of each block is 2^(5−3) = 4. The number of defining contrasts is 2^3 − 1 = 7. The number of independent contrasts which can be chosen arbitrarily is 3 (i.e., p) out of the 7 defining contrasts. Suppose we choose the p = 3 independent contrasts

i.   ACE
ii.  CDE
iii. ABDE;

then the remaining 4 of the 7 defining contrasts are obtained as

iv.  (ACE)(CDE) = AC^2DE^2 = AD
v.   (ACE)(ABDE) = A^2BCDE^2 = BCD
vi.  (CDE)(ABDE) = ABCD^2E^2 = ABC
vii. (ACE)(CDE)(ABDE) = A^2BC^2D^2E^3 = BE.

Alternatively, if we choose another set of independent contrasts,

i.   ABCD
ii.  ACDE
iii. ABCDE,

then the remaining defining contrasts are obtained as

iv.  (ABCD)(ACDE) = A^2BC^2D^2E = BE
v.   (ABCD)(ABCDE) = A^2B^2C^2D^2E = E
vi.  (ACDE)(ABCDE) = A^2BC^2D^2E^2 = B
vii. (ABCD)(ACDE)(ABCDE) = A^3B^2C^3D^3E^2 = ACD.

In this case, the main effects B and E also get confounded.

As a rule, try to confound, as far as possible, only higher order interactions, because they are difficult to interpret.

After selecting p independent defining contrasts, divide the 2^k treatment combinations into 2^p groups of 2^(k−p) combinations each, with each group going into one block.

Principal (key) block

The group containing the combination (1) is called the principal block or key block. It contains all the treatment combinations which are orthogonal to the chosen independent defining contrasts.

If there are p independent defining contrasts, then any treatment combination in the principal block is orthogonal to all p independent defining contrasts. In order to obtain the principal block,
write the treatment combinations in standard order and
check each one of them for orthogonality.
If two treatment combinations belong to the principal block, their product also belongs to the principal block.
When a few treatment combinations of the principal block have been determined, the other treatment combinations can be obtained by the multiplication rule.

Now we illustrate these steps in the following example.

Example
Consider the set-up of a 2^5 factorial experiment in which we want to divide the treatment combinations into 2^3 groups by confounding the three effects AD, BE and ABC. The generalized interactions in this case are ABDE, BCD, ACE and CDE.

In order to find the principal block, first write the treatment combinations in standard order as follows:

(1), a, b, ab, c, ac, bc, abc, d, ad, bd, abd, cd, acd, bcd, abcd,
e, ae, be, abe, ce, ace, bce, abce, de, ade, bde, abde, cde, acde, bcde, abcde.

Place a treatment combination in the principal block if it has an even number of letters in common with each of the confounded effects AD, BE and ABC. The principal block is (1), acd, bce and abde (= acd x bce).
Obtain the other blocks of the confounding arrangement from the principal block by multiplying the treatment combinations of the principal block by a treatment combination not occurring in it or in any other block already obtained.
In other words, choose treatment combinations not yet appearing and multiply them with the principal block. Choose only distinct blocks. In this case, the other blocks are obtained by multiplying by a, b, ab, c, ac, bc and abc, as in the following table.

Arrangement of treatments in blocks when AD, BE and ABC are confounded

Principal block 1:  (1)    acd    bce    abde
Block 2 (x a):      a      cd     abce   bde
Block 3 (x b):      b      abcd   ce     ade
Block 4 (x ab):     ab     bcd    ace    de
Block 5 (x c):      c      ad     be     abcde
Block 6 (x ac):     ac     d      abe    bcde
Block 7 (x bc):     bc     abd    e      acde
Block 8 (x abc):    abc    bd     ae     cde

For example, block 2 is obtained by multiplying a with each treatment combination in the principal block:

(1) x a = a,  acd x a = a^2cd = cd,  bce x a = abce,  abde x a = a^2bde = bde;

block 3 is obtained by multiplying b with (1), acd, bce and abde, and similarly the other blocks are obtained.
If any other treatment combination is chosen to be multiplied with the treatments in the principal block, we get a block which is one of the blocks 1 to 8. For example, if ae is multiplied with the treatments in the principal block, the block obtained consists of

(1) x ae = ae,  acd x ae = cde,  bce x ae = abc,  abde x ae = bd,

which is the same as block 8.

Alternatively, if ACD, ABCD and ABCDE are to be confounded, then the independent defining contrasts are ACD, ABCD and ABCDE, and the principal block is (1), ac, ad and cd (= ac x ad).
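A minimal sketch of the principal-block construction for the example above (AD, BE and ABC confounded in a 2^5 factorial), using the even-letters-in-common rule and the multiplication rule; the treatment labels a, ..., e are the conventional ones:

```python
# Sketch: principal block by orthogonality, and another block by multiplication.
from itertools import combinations

letters = "abcde"
defining = ["AD", "BE", "ABC"]            # independent defining contrasts (from the example)

def combos(letters):
    out = [""]                            # "" plays the role of (1)
    for k in range(1, len(letters) + 1):
        out += ["".join(c) for c in combinations(letters, k)]
    return out

def orthogonal(treatment, effect):
    """Even number of letters in common -> orthogonal."""
    return len(set(treatment) & set(effect.lower())) % 2 == 0

principal = [t for t in combos(letters)
             if all(orthogonal(t, e) for e in defining)]
print(principal)                          # ['', 'acd', 'bce', 'abde']

def multiply(t1, t2):
    return "".join(sorted(set(t1) ^ set(t2)))   # squared letters cancel

block_2 = [multiply(t, "a") for t in principal]
print(block_2)                            # ['a', 'cd', 'abce', 'bde']
```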

Analysis of variance in the case of confounded effects

When an effect is confounded, it means that it is not estimable. The following steps are followed to conduct the analysis of variance in factorial experiments with confounded effects:
Obtain the sums of squares due to the main and interaction effects in the usual way, as if no effect were confounded.
Drop the sums of squares corresponding to the confounded effects and retain only the sums of squares due to the unconfounded effects.
Find the total sum of squares.
Obtain the sum of squares due to error and the associated degrees of freedom by subtraction.
Conduct the tests of hypotheses in the usual way.

Analysis of Variance and Design of Experiments-I

MODULE VII
LECTURE - 32

ANALYSIS OF COVARIANCE

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Any scientific experiment is performed to learn something that is unknown about a group of treatments and to test certain hypotheses about the corresponding treatment effects.

When the variability of the experimental units is small relative to the treatment differences and the experimenter does not wish to use an experimental design, one can simply take a large number of observations on each treatment and compute their means. The variation around the means can be made as small as desired by taking more observations.

When there is considerable variation among observations on the same treatment and it is not possible to take an unlimited number of observations, the techniques used for reducing the variation are

(i) use of a proper experimental design and
(ii) use of concomitant variables.

The use of concomitant variables is accomplished through the technique of analysis of covariance.

If both techniques fail to control the experimental variability, then the number of replications of the different treatments (in other words, the number of experimental units) needs to be increased to a point where adequate control of variability is attained.

Introduction to the analysis of covariance model

In the linear model

$Y = X_1\beta_1 + X_2\beta_2 + \ldots + X_p\beta_p + \varepsilon$,

if the explanatory variables are quantitative variables as well as indicator variables, i.e., some of them are qualitative and some are quantitative, then the linear model is termed an analysis of covariance (ANCOVA) model.
Note that indicator variables do not provide as much information as quantitative variables. For example, quantitative observations on age can be converted into an indicator variable. Let an indicator variable be

D = 1 if age ≥ 17 years,  D = 0 if age < 17 years.

Then the following quantitative values of age are changed into indicator values:

Age (in years):                14   15   16   17   20   21   22
Age (as indicator variable):    0    0    0    1    1    1    1

In many real applications, some variables may be quantitative and others may be qualitative. In such cases, ANCOVA provides a way out.

ANCOVA helps in reducing the sum of squares due to error, which in turn is reflected in better model adequacy diagnostics.
To see how this works:

In the one way model $Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$, we have $TSS_1 = SSA_1 + SSE_1$.
In the two way model $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$, we have $TSS_2 = SSA_2 + SSB_2 + SSE_2$.
In the three way model $Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ijk}$, we have $TSS_3 = SSA_3 + SSB_3 + SSC_3 + SSE_3$.

If we have a given data set, then ideally

$TSS_1 = TSS_2 = TSS_3$,  $SSA_1 = SSA_2 = SSA_3$,  $SSB_2 = SSB_3$,

so $SSE_1 \ge SSE_2 \ge SSE_3$.

Note that in the construction of the F-statistics we use

$\dfrac{SS(\text{effect})/df}{SSE/df}$,

so the F-statistic essentially depends on the SSE: a smaller SSE gives a larger F and hence a greater chance of rejection.

Since SSA, SSB, etc. here are based on dummy (indicator) variables, it is clear that if SSA, SSB, etc. were based on quantitative variables, they would provide more information. Such ideas are used in ANCOVA models, and we construct the model by incorporating the quantitative explanatory variables into ANOVA models.

In another example, suppose our interest is to compare several different kinds of feed for their ability to put weight on animals. If we use ANOVA, then we use the final weights at the end of the experiment. However, the final weights of the animals depend upon their initial weights at the beginning of the experiment as well as upon the difference in feeds.
The use of ANCOVA models enables us to adjust for, or correct, these initial differences.
ANCOVA is useful for improving the precision of an experiment.

Suppose the response Y is linearly related to a covariate X (or concomitant variable), and suppose the experimenter cannot control X but can observe it. ANCOVA involves adjusting for the effect of X.
If such an adjustment is not made, then X can inflate the error mean square and make the true differences in Y due to treatments harder to detect.

If, for a given experimental material, the use of a proper experimental design cannot control the experimental variation, the use of concomitant variables (which are related to the experimental material) may be effective in reducing the variability.
Consider the one way classification model

$E(Y_{ij}) = \mu_i$,  $Var(Y_{ij}) = \sigma^2$,  $i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, N_i$.

If the usual analysis of variance for testing the hypothesis of equality of the treatment effects shows a highly significant difference in the treatment effects due to some factor affecting the experiment, then consider the model which takes this effect into account:

$E(Y_{ij}) = \mu_i + \beta t_{ij}$,  $Var(Y_{ij}) = \sigma^2$,  $i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, N_i$,

where the $t_{ij}$ are observations on a concomitant variable (which is related to the $Y_{ij}$) and $\beta$ is the regression coefficient associated with $t_{ij}$. With this model, the variability of the treatment effects can be considerably reduced.

For example, in an agricultural experiment, if the experimental units are plots of land, then $t_{ij}$ can be a measure of the fertility of the jth plot receiving the ith treatment and $Y_{ij}$ can be the yield.
In another example, the experimental units are animals and the objective is to compare the growth rates of groups of animals receiving different diets. Note that the observed differences in growth rates can be attributed to diet only if all the animals are similar in the observable characteristics, like weight and age, which influence the growth rates.
In the absence of such similarity, use $t_{ij}$, the weight or age of the jth animal receiving the ith treatment.

If we consider quadratic regression in $t_{ij}$, then

$E(Y_{ij}) = \mu_i + \beta_1 t_{ij} + \beta_2 t_{ij}^2$,  $Var(Y_{ij}) = \sigma^2$,  $i = 1, \ldots, p;\ j = 1, \ldots, n_i$.

ANCOVA in this case is the same as ANCOVA with the two concomitant variables $t_{ij}$ and $t_{ij}^2$.

In the two way classification with one observation per cell,

$E(Y_{ij}) = \mu + \alpha_i + \beta_j + \gamma t_{ij}$,  $i = 1, \ldots, I;\ j = 1, \ldots, J$,

or

$E(Y_{ij}) = \mu + \alpha_i + \beta_j + \gamma_1 t_{ij} + \gamma_2 w_{ij}$,

with $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$; then $(y_{ij}, t_{ij})$ or $(y_{ij}, t_{ij}, w_{ij})$ are the observations in the (i, j)th cell and $t_{ij}, w_{ij}$ are the concomitant variables.

The concomitant variables can be fixed or random. We consider only the case of fixed concomitant variables.

One-way classification

Let $Y_{ij}$ $(j = 1, \ldots, n_i;\ i = 1, \ldots, p)$ be a random sample of size $n_i$ from the ith normal population with mean

$\mu_{ij} = E(Y_{ij}) = \alpha_i + \beta t_{ij}$,  $Var(Y_{ij}) = \sigma^2$,

where $\alpha_i$, $\beta$ and $\sigma^2$ are unknown parameters and the $t_{ij}$ are known constants, the observations on a concomitant variable.
The null hypothesis is $H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_p$.

Let

$y_{io} = \frac{1}{n_i}\sum_j y_{ij}$,  $t_{io} = \frac{1}{n_i}\sum_j t_{ij}$,  $y_{oo} = \frac{1}{n}\sum_i\sum_j y_{ij}$,  $t_{oo} = \frac{1}{n}\sum_i\sum_j t_{ij}$,  $n = \sum_i n_i$.

Under the whole parametric space $(\Omega)$, we use the likelihood ratio test, for which we obtain the estimators of $\alpha_i$ and $\beta$ by the least squares principle (or maximum likelihood estimation) as follows.

Minimize $S = \sum_i\sum_j (y_{ij} - \mu_{ij})^2 = \sum_i\sum_j (y_{ij} - \alpha_i - \beta t_{ij})^2$.

Setting $\partial S/\partial\alpha_i = 0$ for fixed $\beta$ gives $\hat\alpha_i = y_{io} - \hat\beta\, t_{io}$.

Substituting this in S, i.e., minimizing $\sum_i\sum_j \left[(y_{ij} - y_{io}) - \beta(t_{ij} - t_{io})\right]^2$ with respect to $\beta$, gives

$\hat\beta = \dfrac{\sum_i\sum_j (y_{ij} - y_{io})(t_{ij} - t_{io})}{\sum_i\sum_j (t_{ij} - t_{io})^2}$.

Thus

$\hat\alpha_i = y_{io} - \hat\beta\, t_{io}$,  $\hat\mu_{ij} = \hat\alpha_i + \hat\beta\, t_{ij}$.

Since $y_{ij} - \hat\mu_{ij} = y_{ij} - \hat\alpha_i - \hat\beta t_{ij} = (y_{ij} - y_{io}) - \hat\beta(t_{ij} - t_{io})$, we have

$\sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2 = \sum_i\sum_j (y_{ij} - y_{io})^2 - \dfrac{\left[\sum_i\sum_j (y_{ij} - y_{io})(t_{ij} - t_{io})\right]^2}{\sum_i\sum_j (t_{ij} - t_{io})^2}$.

Under $H_0: \alpha_1 = \ldots = \alpha_p = \alpha$ (say), consider

$S_w = \sum_i\sum_j (y_{ij} - \alpha - \beta t_{ij})^2$

and minimize $S_w$ under the sample space $(\omega)$, i.e., solve $\partial S_w/\partial\alpha = 0$ and $\partial S_w/\partial\beta = 0$. This gives

$\hat{\hat\alpha} = y_{oo} - \hat{\hat\beta}\, t_{oo}$,  $\hat{\hat\beta} = \dfrac{\sum_i\sum_j (y_{ij} - y_{oo})(t_{ij} - t_{oo})}{\sum_i\sum_j (t_{ij} - t_{oo})^2}$,  $\hat{\hat\mu}_{ij} = \hat{\hat\alpha} + \hat{\hat\beta}\, t_{ij}$.

Hence

$\sum_i\sum_j (y_{ij} - \hat{\hat\mu}_{ij})^2 = \sum_i\sum_j (y_{ij} - y_{oo})^2 - \dfrac{\left[\sum_i\sum_j (y_{ij} - y_{oo})(t_{ij} - t_{oo})\right]^2}{\sum_i\sum_j (t_{ij} - t_{oo})^2}$

and

$\sum_i\sum_j (\hat{\hat\mu}_{ij} - \hat\mu_{ij})^2 = \sum_i\sum_j \left[(y_{io} - y_{oo}) + \hat\beta(t_{ij} - t_{io}) - \hat{\hat\beta}(t_{ij} - t_{oo})\right]^2$.

The likelihood ratio test statistic in this case is given by

$\lambda = \dfrac{\max_{\omega} L(\alpha, \beta, \sigma^2)}{\max_{\Omega} L(\alpha, \beta, \sigma^2)}$,  which is a monotone function of  $\dfrac{\sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2}{\sum_i\sum_j (y_{ij} - \hat{\hat\mu}_{ij})^2}$.
Now we use the following theorems:

Theorem 1: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Then $Y'AY$ follows a noncentral chi-square distribution with p degrees of freedom and noncentrality parameter $\mu'A\mu$, i.e., $\chi^2(p, \mu'A\mu)$, if and only if $\Sigma A$ is an idempotent matrix of rank p.

Theorem 2: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let $Y'A_1Y$ follow $\chi^2(p_1, \mu'A_1\mu)$ and $Y'A_2Y$ follow $\chi^2(p_2, \mu'A_2\mu)$. Then $Y'A_1Y$ and $Y'A_2Y$ are independently distributed if $A_1\Sigma A_2 = 0$.

Theorem 3: Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \sigma^2 I)$. Then the maximum likelihood (or least squares) estimator $L\hat\beta$ of an estimable linear parametric function $L\beta$ is independently distributed of $\hat\sigma^2$; $L\hat\beta$ follows $N\left(L\beta,\ \sigma^2 L(X'X)^{-1}L'\right)$ and $\dfrac{n\hat\sigma^2}{\sigma^2}$ follows $\chi^2(n - p)$, where rank(X) = p.

Using these theorems on the independence of quadratic forms and dividing the numerator and denominator by their respective degrees of freedom, we have

$F = \dfrac{n - p - 1}{p - 1}\cdot\dfrac{\sum_i\sum_j (y_{ij} - \hat{\hat\mu}_{ij})^2 - \sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2}{\sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2} \sim F(p - 1,\ n - p - 1)$ under $H_0$.

So reject $H_0$ whenever $F \ge F_{1-\alpha}(p - 1, n - p - 1)$ at the $\alpha$ level of significance.

The terms involved in the F statistic can be simplified for computational convenience as follows. We can write

$\sum_i\sum_j (y_{ij} - \hat{\hat\mu}_{ij})^2 = T_{yy} - \dfrac{T_{yt}^2}{T_{tt}}$,  $\sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2 = E_{yy} - \dfrac{E_{yt}^2}{E_{tt}}$,

so that

$\sum_i\sum_j (\hat{\hat\mu}_{ij} - \hat\mu_{ij})^2 = \left(T_{yy} - \dfrac{T_{yt}^2}{T_{tt}}\right) - \left(E_{yy} - \dfrac{E_{yt}^2}{E_{tt}}\right)$,

where

$T_{yy} = \sum_i\sum_j (y_{ij} - y_{oo})^2$,  $T_{tt} = \sum_i\sum_j (t_{ij} - t_{oo})^2$,  $T_{yt} = \sum_i\sum_j (y_{ij} - y_{oo})(t_{ij} - t_{oo})$,

$E_{yy} = \sum_i\sum_j (y_{ij} - y_{io})^2$,  $E_{tt} = \sum_i\sum_j (t_{ij} - t_{io})^2$,  $E_{yt} = \sum_i\sum_j (y_{ij} - y_{io})(t_{ij} - t_{io})$.
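A minimal numerical sketch of these one-way ANCOVA computations (the response and covariate values below are hypothetical): it forms the T and E quantities, the adjusted sums of squares and the F statistic F = [(n − p − 1)/(p − 1)] (q1/q2).

```python
# Sketch: one-way ANCOVA F test from the T and E sums of squares and products.
import numpy as np
from scipy import stats

# y[i], t[i]: response and covariate observations for treatment i (hypothetical data)
y = [np.array([12.1, 13.4, 11.8]), np.array([15.2, 16.0, 14.7]), np.array([18.1, 17.5, 19.0])]
t = [np.array([2.0, 3.1, 1.8]),    np.array([2.5, 3.0, 2.2]),    np.array([3.5, 3.0, 3.8])]

p = len(y)
n = sum(len(yi) for yi in y)
y_all, t_all = np.concatenate(y), np.concatenate(t)

Tyy = np.sum((y_all - y_all.mean()) ** 2)
Ttt = np.sum((t_all - t_all.mean()) ** 2)
Tyt = np.sum((y_all - y_all.mean()) * (t_all - t_all.mean()))

Eyy = sum(np.sum((yi - yi.mean()) ** 2) for yi in y)
Ett = sum(np.sum((ti - ti.mean()) ** 2) for ti in t)
Eyt = sum(np.sum((yi - yi.mean()) * (ti - ti.mean())) for yi, ti in zip(y, t))

q2 = Eyy - Eyt ** 2 / Ett                     # residual SS under the full model
q1 = (Tyy - Tyt ** 2 / Ttt) - q2              # extra SS explained by the treatments

F = ((n - p - 1) / (p - 1)) * (q1 / q2)
p_value = 1 - stats.f.cdf(F, p - 1, n - p - 1)
print(F, p_value)
```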

The analysis of covariance table for the one way classification is as follows:

Source                  Degrees of freedom   Adjusted sum of squares
Treatments (adjusted)   p − 1                $q_1 = (T_{yy} - T_{yt}^2/T_{tt}) - (E_{yy} - E_{yt}^2/E_{tt})$
Error                   n − p − 1            $q_2 = E_{yy} - E_{yt}^2/E_{tt}$

with $F = \dfrac{n - p - 1}{p - 1}\cdot\dfrac{q_1}{q_2}$.

Analysis of Variance and Design of Experiments-I

MODULE VII
LECTURE - 33

ANALYSIS OF COVARIANCE

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Two way classification (with one observation per cell)

Consider the case of a two way classification with one observation per cell.
Let $y_{ij} \sim N(\mu_{ij}, \sigma^2)$ be independently distributed with

$E(y_{ij}) = \mu + \alpha_i + \beta_j + \gamma t_{ij}$,  $i = 1, \ldots, I;\ j = 1, \ldots, J$,  $Var(y_{ij}) = \sigma^2$,

where
$\mu$ : grand mean,
$\alpha_i$ : effect of the ith level of A, satisfying $\sum_i \alpha_i = 0$,
$\beta_j$ : effect of the jth level of B, satisfying $\sum_j \beta_j = 0$,
$t_{ij}$ : known observation on the concomitant variable.

The null hypotheses under consideration are

$H_{0\alpha}: \alpha_1 = \alpha_2 = \ldots = \alpha_I = 0$  and  $H_{0\beta}: \beta_1 = \beta_2 = \ldots = \beta_J = 0$.

Dimension of the whole parametric space $(\Omega)$: I + J.
Dimension of the sample space under $H_{0\alpha}$: J + 1.
Dimension of the sample space under $H_{0\beta}$: I + 1.

The respective alternative hypotheses are

$H_{1\alpha}$: at least one pair of $\alpha$'s is not equal;
$H_{1\beta}$: at least one pair of $\beta$'s is not equal.

Consider the estimation of the parameters under the whole parametric space $(\Omega)$. Find the minimum value of $\sum\sum (y_{ij} - \mu_{ij})^2$ under $\Omega$. To do this, minimize

$\sum_i\sum_j (y_{ij} - \mu - \alpha_i - \beta_j - \gamma t_{ij})^2$.

For fixed $\gamma$, solving the least squares (or maximum likelihood) equations gives the estimates of the respective parameters as

$\hat\mu = y_{oo} - \hat\gamma\, t_{oo}$,
$\hat\alpha_i = (y_{io} - y_{oo}) - \hat\gamma (t_{io} - t_{oo})$,                                (1)
$\hat\beta_j = (y_{oj} - y_{oo}) - \hat\gamma (t_{oj} - t_{oo})$.

With these values of $\mu$, $\alpha_i$ and $\beta_j$, the sum of squares reduces to

$\sum_{i=1}^{I}\sum_{j=1}^{J} \left[(y_{ij} - y_{io} - y_{oj} + y_{oo}) - \gamma (t_{ij} - t_{io} - t_{oj} + t_{oo})\right]^2$.      (2)

Minimization of (2) with respect to $\gamma$ gives

$\hat\gamma = \dfrac{\sum_i\sum_j (y_{ij} - y_{io} - y_{oj} + y_{oo})(t_{ij} - t_{io} - t_{oj} + t_{oo})}{\sum_i\sum_j (t_{ij} - t_{io} - t_{oj} + t_{oo})^2}$.

Using $\hat\gamma$, we get from (1)

$\hat\mu = y_{oo} - \hat\gamma\, t_{oo}$,  $\hat\alpha_i = (y_{io} - y_{oo}) - \hat\gamma (t_{io} - t_{oo})$,  $\hat\beta_j = (y_{oj} - y_{oo}) - \hat\gamma (t_{oj} - t_{oo})$.

Hence

$\sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2 = \sum_i\sum_j (y_{ij} - y_{io} - y_{oj} + y_{oo})^2 - \dfrac{\left[\sum_i\sum_j (y_{ij} - y_{io} - y_{oj} + y_{oo})(t_{ij} - t_{io} - t_{oj} + t_{oo})\right]^2}{\sum_i\sum_j (t_{ij} - t_{io} - t_{oj} + t_{oo})^2} = E_{yy} - \dfrac{E_{yt}^2}{E_{tt}}$,

where

$E_{yy} = \sum_i\sum_j (y_{ij} - y_{io} - y_{oj} + y_{oo})^2$,
$E_{yt} = \sum_i\sum_j (y_{ij} - y_{io} - y_{oj} + y_{oo})(t_{ij} - t_{io} - t_{oj} + t_{oo})$,
$E_{tt} = \sum_i\sum_j (t_{ij} - t_{io} - t_{oj} + t_{oo})^2$.

Case (i): Test of $H_{0\alpha}$

Minimize $\sum\sum (y_{ij} - \mu - \beta_j - \gamma t_{ij})^2$ with respect to $\mu$, $\beta_j$ and $\gamma$. This gives the least squares (or maximum likelihood) estimates of the respective parameters as

$\hat{\hat\mu} = y_{oo} - \hat{\hat\gamma}\, t_{oo}$,
$\hat{\hat\beta}_j = (y_{oj} - y_{oo}) - \hat{\hat\gamma} (t_{oj} - t_{oo})$,                              (3)
$\hat{\hat\gamma} = \dfrac{\sum_i\sum_j (y_{ij} - y_{oj})(t_{ij} - t_{oj})}{\sum_i\sum_j (t_{ij} - t_{oj})^2}$,
$\hat{\hat\mu}_{ij} = \hat{\hat\mu} + \hat{\hat\beta}_j + \hat{\hat\gamma}\, t_{ij}$.

Substituting these estimates in (3), we get

$\sum_i\sum_j (y_{ij} - \hat{\hat\mu}_{ij})^2 = \sum_i\sum_j (y_{ij} - y_{oj})^2 - \dfrac{\left[\sum_i\sum_j (y_{ij} - y_{oj})(t_{ij} - t_{oj})\right]^2}{\sum_i\sum_j (t_{ij} - t_{oj})^2} = (E_{yy} + A_{yy}) - \dfrac{(E_{yt} + A_{yt})^2}{E_{tt} + A_{tt}}$,

where

$A_{yy} = J\sum_i (y_{io} - y_{oo})^2$,  $A_{tt} = J\sum_i (t_{io} - t_{oo})^2$,  $A_{yt} = J\sum_i (y_{io} - y_{oo})(t_{io} - t_{oo})$,

and $E_{yy}$, $E_{tt}$, $E_{yt}$ are as defined above.

Thus the likelihood ratio test for $H_{0\alpha}$ is based on the difference

$\sum_i\sum_j (y_{ij} - \hat{\hat\mu}_{ij})^2 - \sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2$

relative to $\sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2$. Adjusting by the degrees of freedom and using the earlier results on the independence of the two quadratic forms and their distributions,

$F_1 = \dfrac{IJ - I - J}{I - 1}\cdot\dfrac{\sum_i\sum_j (y_{ij} - \hat{\hat\mu}_{ij})^2 - \sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2}{\sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2} \sim F(I - 1,\ IJ - I - J)$ under $H_{0\alpha}$.

So the decision rule is to reject $H_{0\alpha}$ whenever $F_1 > F_{1-\alpha}(I - 1, IJ - I - J)$.

Case (ii): Test of $H_{0\beta}$

Minimize $\sum\sum (y_{ij} - \mu - \alpha_i - \gamma t_{ij})^2$ with respect to $\mu$, $\alpha_i$ and $\gamma$. This gives the least squares (or maximum likelihood) estimates of the respective parameters as

$\tilde\mu = y_{oo} - \tilde\gamma\, t_{oo}$,
$\tilde\alpha_i = (y_{io} - y_{oo}) - \tilde\gamma (t_{io} - t_{oo})$,                              (4)
$\tilde\gamma = \dfrac{\sum_i\sum_j (y_{ij} - y_{io})(t_{ij} - t_{io})}{\sum_i\sum_j (t_{ij} - t_{io})^2}$,
$\tilde\mu_{ij} = \tilde\mu + \tilde\alpha_i + \tilde\gamma\, t_{ij}$.

From (4), we get

$\sum_i\sum_j (y_{ij} - \tilde\mu_{ij})^2 = \sum_i\sum_j (y_{ij} - y_{io})^2 - \dfrac{\left[\sum_i\sum_j (y_{ij} - y_{io})(t_{ij} - t_{io})\right]^2}{\sum_i\sum_j (t_{ij} - t_{io})^2} = (E_{yy} + B_{yy}) - \dfrac{(E_{yt} + B_{yt})^2}{E_{tt} + B_{tt}}$,

where

$B_{yy} = I\sum_j (y_{oj} - y_{oo})^2$,  $B_{tt} = I\sum_j (t_{oj} - t_{oo})^2$,  $B_{yt} = I\sum_j (y_{oj} - y_{oo})(t_{oj} - t_{oo})$.

Thus the likelihood ratio test statistic for testing $H_{0\beta}$ is

$F_2 = \dfrac{IJ - I - J}{J - 1}\cdot\dfrac{\sum_i\sum_j (y_{ij} - \tilde\mu_{ij})^2 - \sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2}{\sum_i\sum_j (y_{ij} - \hat\mu_{ij})^2} \sim F(J - 1,\ IJ - I - J)$ under $H_{0\beta}$.

So the decision rule is to reject $H_{0\beta}$ whenever $F_2 \ge F_{1-\alpha}(J - 1, IJ - I - J)$.
If $H_{0\alpha}$ is rejected, use multiple comparison methods to determine which of the contrasts in the $\alpha_i$ are responsible for the rejection. The same applies to $H_{0\beta}$.
The analysis of covariance table for the two way classification is as follows:

Source of variation    Degrees of       Sums of squares and products    Adjusted sum of squares                                  Degrees of    F
                       freedom          yy      yt      tt                                                                       freedom
Between levels of A    I − 1            A_yy    A_yt    A_tt            q_0 = q_3 − q_2                                          I − 1         F_1 = [(IJ − I − J)/(I − 1)] q_0/q_2
Between levels of B    J − 1            B_yy    B_yt    B_tt            q_1 = q_4 − q_2                                          J − 1         F_2 = [(IJ − I − J)/(J − 1)] q_1/q_2
Error                  (I − 1)(J − 1)   E_yy    E_yt    E_tt            q_2 = E_yy − E_yt^2/E_tt                                 IJ − I − J
Total                  IJ − 1           T_yy    T_yt    T_tt                                                                     IJ − 2
Error + levels of A    IJ − J                                           q_3 = (A_yy + E_yy) − (A_yt + E_yt)^2/(A_tt + E_tt)
Error + levels of B    IJ − I                                           q_4 = (B_yy + E_yy) − (B_yt + E_yt)^2/(B_tt + E_tt)
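A minimal sketch of these two-way (one observation per cell) ANCOVA quantities, with hypothetical y and t values; it computes the A, B and E sums of squares and products, q0 through q4, and the two F statistics of the table above.

```python
# Sketch: two-way ANCOVA with one observation per cell (q0...q4, F1, F2).
import numpy as np
from scipy import stats

# rows = I levels of A, columns = J levels of B (hypothetical data)
y = np.array([[10.0, 12.0, 14.0],
              [13.0, 15.0, 18.0],
              [11.0, 14.0, 16.0]])
t = np.array([[1.0, 1.5, 2.0],
              [1.2, 1.8, 2.5],
              [1.1, 1.6, 2.2]])
I, J = y.shape

def means(z):
    return z.mean(axis=1, keepdims=True), z.mean(axis=0, keepdims=True), z.mean()

yio, yoj, yoo = means(y)
tio, toj, too = means(t)

Ayy = J * np.sum((yio - yoo) ** 2); Att = J * np.sum((tio - too) ** 2)
Ayt = J * np.sum((yio - yoo) * (tio - too))
Byy = I * np.sum((yoj - yoo) ** 2); Btt = I * np.sum((toj - too) ** 2)
Byt = I * np.sum((yoj - yoo) * (toj - too))

ry = y - yio - yoj + yoo                     # interaction-type residuals
rt = t - tio - toj + too
Eyy, Ett, Eyt = np.sum(ry ** 2), np.sum(rt ** 2), np.sum(ry * rt)

q2 = Eyy - Eyt ** 2 / Ett
q3 = (Ayy + Eyy) - (Ayt + Eyt) ** 2 / (Att + Ett)
q4 = (Byy + Eyy) - (Byt + Eyt) ** 2 / (Btt + Ett)
q0, q1 = q3 - q2, q4 - q2

df_err = I * J - I - J
F1 = (df_err / (I - 1)) * q0 / q2            # test of H0: all alpha_i = 0
F2 = (df_err / (J - 1)) * q1 / q2            # test of H0: all beta_j = 0
print(F1, 1 - stats.f.cdf(F1, I - 1, df_err))
print(F2, 1 - stats.f.cdf(F2, J - 1, df_err))
```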

Analysis of Variance and Design of Experiments-I

MODULE VIII
LECTURE - 34

ANALYSIS OF VARIANCE IN RANDOM-EFFECTS MODEL AND MIXED-EFFECTS MODEL

Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Fixed-effects model
All the factors in a fixed-effects model for an experiment have a predetermined set of levels, i.e., they are fixed. The statistical inferences are drawn only for those levels of the factors which are actually used in the experiment.

Random-effects model
In a random-effects model for an experiment, the levels of the factors used in the experiment are randomly drawn from a population of possible levels. The statistical inferences drawn from the data apply to all levels of the factors in the population from which the levels were selected, and not only to the levels used in the experiment.

For example, in the case of a quality control experiment about the daily production of five machines from an assembly line, we have the following set-ups of fixed- and random-effects models:
i. Fixed effects: the daily production of five particular machines from an assembly line.
ii. Random effects: the daily production of five machines, chosen at random, that represent the machines as a class.

Many studies involve both factors having a predetermined set of levels and factors whose levels are randomly selected from a population of levels.
For example, the blocks in a randomized complete block design may represent a random sample of b plots of land taken from a population of plots on an agricultural research station. Then the effects due to the blocks are considered to be random effects.

Suppose the treatments are four new varieties of wheat that have been developed to be resistant to a specific bacterium. The levels of the treatment are fixed because these are the only varieties of interest to the researchers, whereas the levels of the plots of land are random because the researchers are not interested only in those plots of land but in the effects of the treatments over a wide range of plots of land.

When some of the factors to be used in the experiment have levels randomly selected from a population of possible levels and other factors have predetermined levels, the model used to relate the response variable to the levels of the factors is referred to as a mixed-effects model.

Mixed-effects model
In a mixed-effects model for an experiment, the levels of some of the factors used in the experiment are randomly selected from a population of levels, whereas the levels of the other factors are predetermined.

The inferences from the data concerning factors with fixed levels apply only to the levels of those factors used in the experiment, whereas the inferences concerning factors with randomly selected levels apply to all levels of those factors in the population from which the levels were selected.

Analysis of variance in the one way random-effects model

The model with random effects has the same structure as the model with fixed effects, given as

$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$,  $i = 1, 2, \ldots, s;\ j = 1, 2, \ldots, n_i$.

Here $\mu$ is the general mean effect as usual and the $\varepsilon_{ij}$ are the usual random error components. The meaning of the parameter $\alpha_i$, however, has now changed: the $\alpha_i$ are now the random effects of the ith treatment (ith machine). Hence the $\alpha_i$ are random variables whose distributions we have to specify. We assume

$E(\alpha_i) = 0$,  $Var(\alpha_i) = \sigma_\alpha^2$,  $E(\varepsilon_{ij}\alpha_i) = 0$,  $E(\alpha_i\alpha_j) = 0$  $(i \neq j)$.

Then

$y_{ij} \sim (\mu,\ \sigma^2 + \sigma_\alpha^2)$

holds.

In the model with fixed effects, the treatment effect was represented by the parameter estimates $\hat\alpha_i$, or $\hat\mu_i = \hat\mu + \hat\alpha_i$, respectively. In the model with random effects, a treatment effect is expressed through the variance components: the variance $\sigma_\alpha^2$ is estimated as a component of the entire variance, and the absolute or relative size of this component then makes conclusions about the treatment effect possible.

The estimation of the variances $\sigma_\alpha^2$ and $\sigma^2$ requires no assumptions about the distribution. For the tests of hypotheses and the computation of confidence intervals, however, we assume the normal distribution, i.e.,

$\varepsilon_{ij} \sim N(0, \sigma^2)$, independent of each other,
$\alpha_i \sim N(0, \sigma_\alpha^2)$, independent of each other,

and hence

$y_{ij} \sim N(\mu,\ \sigma^2 + \sigma_\alpha^2)$.

Unlike the model with fixed effects, the response values $y_{ij}$ of a level i of the treatment (i.e., of the ith sample) are no longer uncorrelated:

$E(y_{ij} - \mu)(y_{ij'} - \mu) = E(\alpha_i + \varepsilon_{ij})(\alpha_i + \varepsilon_{ij'}) = E(\alpha_i^2) = \sigma_\alpha^2$,  $j \neq j'$.
On the other hand, the response values of different samples are still uncorrelated ($i \neq i'$, for any $j, j'$):

$E(y_{ij} - \mu)(y_{i'j'} - \mu) = E(\alpha_i\alpha_{i'}) + E(\alpha_i\varepsilon_{i'j'}) + E(\alpha_{i'}\varepsilon_{ij}) + E(\varepsilon_{ij}\varepsilon_{i'j'}) = 0$.

In the case of a normal distribution, uncorrelatedness can be replaced by independence.

Test of the null hypothesis $H_0: \sigma_\alpha^2 = 0$ against $H_1: \sigma_\alpha^2 > 0$

The hypothesis of "no treatment effect" for the two models is:
- fixed effects: $H_0: \alpha_i = 0$ for all i;
- random effects: $H_0: \sigma_\alpha^2 = 0$.

If $\sigma_\alpha^2 = 0$, then the random effects are identically 0. In this case, each $\alpha_i$ estimate $(i = 1, 2, \ldots, s)$ should be close to 0 relative to the MSE.
If $\sigma_\alpha^2 > 0$, then the random effects are not identically 0. In this case, the variability of the $\alpha_i$ estimates $(i = 1, 2, \ldots, s)$ should be large relative to the MSE.
Testing hypotheses about the equality of means is meaningless in the random-effects case; therefore, we do not perform a multiple comparison procedure to compare the means.

The ANOVA table for a random factor is the same as the ANOVA table for a fixed factor, with

$SS_{Total} = SS_{Treatment} + SS_{Error}$.

To see this, we need to look at the expected mean squares for the random-effects model. We can partly adopt some of the results of the fixed-effects model: for the random-effects model we have

$E(MS_{Error}) = \sigma^2$,

i.e., $\hat\sigma^2 = MS_{Error}$ is an unbiased estimator of $\sigma^2$.

We compute $E(MS_{Tr})$ as follows:
$SS_{Tr} = \sum_{i=1}^{s} \sum_{j=1}^{n_i} (\bar{y}_{io} - \bar{y}_{oo})^2,$
$\bar{y}_{io} = \mu + \alpha_i + \bar{\epsilon}_{io}, \qquad \bar{y}_{oo} = \mu + \bar{\alpha} + \bar{\epsilon}_{oo}, \qquad \bar{\alpha} = \sum_i n_i \alpha_i / n,$
$(\bar{y}_{io} - \bar{y}_{oo}) = (\alpha_i - \bar{\alpha}) + (\bar{\epsilon}_{io} - \bar{\epsilon}_{oo}).$

Then
$E(\bar{y}_{io} - \bar{y}_{oo})^2 = E(\alpha_i - \bar{\alpha})^2 + E(\bar{\epsilon}_{io} - \bar{\epsilon}_{oo})^2,$
$E(\alpha_i - \bar{\alpha})^2 = E(\alpha_i^2) + E(\bar{\alpha}^2) - 2E(\alpha_i\bar{\alpha}) = \sigma_\alpha^2\left(1 + \frac{\sum_m n_m^2}{n^2} - \frac{2n_i}{n}\right),$
$E(\bar{\epsilon}_{io} - \bar{\epsilon}_{oo})^2 = E(\bar{\epsilon}_{io}^2) + E(\bar{\epsilon}_{oo}^2) - 2E(\bar{\epsilon}_{io}\bar{\epsilon}_{oo}) = \sigma^2\left(\frac{1}{n_i} - \frac{1}{n}\right).$

Hence
$\sum_{j=1}^{n_i} E(\bar{y}_{io} - \bar{y}_{oo})^2 = n_i E(\bar{y}_{io} - \bar{y}_{oo})^2 = \sigma_\alpha^2\left(n_i + \frac{n_i \sum_m n_m^2}{n^2} - \frac{2n_i^2}{n}\right) + \sigma^2\left(1 - \frac{n_i}{n}\right)$
and
$E(SS_{Tr}) = \sum_{i=1}^{s} n_i E(\bar{y}_{io} - \bar{y}_{oo})^2 = \sigma_\alpha^2\left(n - \frac{1}{n}\sum_i n_i^2\right) + \sigma^2(s-1).$

We have now:
i. In the unbalanced case, i.e., when all the sample sizes $n_i$ are not the same, we have
$E(MS_{Tr}) = \frac{1}{s-1}E(SS_{Tr}) = \sigma^2 + k\sigma_\alpha^2$
with
$k = \frac{1}{s-1}\left(n - \frac{1}{n}\sum_i n_i^2\right);$
ii. In the balanced case, we have ($n_i = r$ for all $i$, $n = rs$)
$k = \frac{1}{s-1}\left(rs - \frac{1}{rs}\, s\, r^2\right) = r, \qquad E(MS_{Tr}) = \sigma^2 + r\sigma_\alpha^2.$

This yields the unbiased estimate of $\sigma_\alpha^2$ as follows:
(i) in the unbalanced case,
$\hat{\sigma}_\alpha^2 = \frac{MS_{Tr} - MS_{Error}}{k};$
(ii) in the balanced case,
$\hat{\sigma}_\alpha^2 = \frac{MS_{Tr} - MS_{Error}}{r}.$
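The following is a minimal numerical sketch of these estimators for an unbalanced one-way random-effects layout. The observations and group sizes are hypothetical and serve only to illustrate the formulas above.

```python
import numpy as np

# hypothetical observations for s = 3 treatments (machines), unequal n_i
groups = [
    np.array([10.2, 11.1, 9.8, 10.5]),            # n_1 = 4
    np.array([12.0, 12.4, 11.7]),                  # n_2 = 3
    np.array([9.5, 10.0, 10.3, 9.9, 10.1]),        # n_3 = 5
]

s = len(groups)
n_i = np.array([len(g) for g in groups])
n = n_i.sum()
grand_mean = np.concatenate(groups).mean()

# between- and within-treatment sums of squares and mean squares
ss_tr = sum(ni * (g.mean() - grand_mean) ** 2 for ni, g in zip(n_i, groups))
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_tr = ss_tr / (s - 1)
ms_error = ss_error / (n - s)                      # unbiased estimate of sigma^2

# k = [n - (1/n) sum n_i^2] / (s - 1); it equals r in the balanced case
k = (n - (n_i ** 2).sum() / n) / (s - 1)
sigma2_alpha_hat = (ms_tr - ms_error) / k          # may turn out negative
print(ms_error, k, sigma2_alpha_hat)
```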

In the case of an assumed normal distribution we have
$\frac{(n-s)MS_{Error}}{\sigma^2} \sim \chi^2_{n-s}$
and
$\frac{(s-1)MS_{Tr}}{\sigma^2 + k\sigma_\alpha^2} \sim \chi^2_{s-1}.$
The two distributions are independent; hence the ratio
$\frac{MS_{Tr}}{MS_{Error}} \cdot \frac{\sigma^2}{\sigma^2 + k\sigma_\alpha^2}$
has a central $F$-distribution. Under $H_0: \sigma_\alpha^2 = 0$ we thus have
$\frac{MS_{Tr}}{MS_{Error}} \sim F_{s-1,\, n-s}.$
Hence $H_0: \sigma_\alpha^2 = 0$ is tested with the same test statistic as $H_0: \alpha_i = 0$ (all $i$) in the model with fixed effects. The table of the analysis of variance remains unchanged. It is given as follows:

Source     | Sum of squares | Degrees of freedom | E(MS): Fixed effects                             | E(MS): Random effects
Treatments | $SS_{Tr}$      | $s - 1$            | $\sigma^2 + \dfrac{\sum_i n_i \alpha_i^2}{s-1}$  | $\sigma^2 + k\sigma_\alpha^2$
Error      | $SS_{Error}$   | $n - s$            | $\sigma^2$                                       | $\sigma^2$
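Continuing the sketch above, the $F$ statistic for $H_0: \sigma_\alpha^2 = 0$ and its $p$-value can be computed as follows, assuming scipy is available.

```python
from scipy import stats

F = ms_tr / ms_error                   # uses the quantities computed in the previous sketch
p_value = stats.f.sf(F, s - 1, n - s)  # upper-tail probability of F_{s-1, n-s}
print(F, p_value)
```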

Note: The estimate of $\sigma_\alpha^2$ can also be negative ($\hat{\sigma}_\alpha^2 < 0$). But we know that a variance component cannot be negative. The following are three possible ways to handle this situation:

1. Assume $\sigma_\alpha^2 = 0$ and that the negative estimate occurred due to random sampling. The problem is that using zero instead of a negative number can affect the other estimates.

2. Estimate $\sigma_\alpha^2$ using the restricted maximum likelihood (REML) method, because it always yields a nonnegative estimate. This method will adjust the other variance component estimates.

3. Assume the model is incorrect, and examine the problem in another way. For example, add or remove an effect from the model, and then analyze the new model.
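As a rough illustration of option 2, the sketch below fits a random-intercept model by REML using the statsmodels package, assuming it is installed; the data frame, its column names ("y", "machine") and the values are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "y":       [10.2, 11.1, 9.8, 10.5, 12.0, 12.4, 11.7, 9.5, 10.0, 10.3],
    "machine": ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"],
})

# random intercept for each machine, fitted with the REML criterion
model = smf.mixedlm("y ~ 1", data=df, groups=df["machine"])
result = model.fit(reml=True)
print(result.cov_re)    # estimated variance of the random effects (sigma_alpha^2)
print(result.scale)     # estimated residual variance (sigma^2)
```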

Analysis of Variance and Design of Experiments-I
MODULE VIII
LECTURE - 35
ANALYSIS OF VARIANCE IN RANDOM-EFFECTS MODEL AND MIXED-EFFECTS MODEL
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Confidence intervals for variance components

Given the normality and independence assumptions of the random-effects model, we can generate various confidence intervals related to the variance components. We consider now several cases.

Case 1: For $\sigma^2$

Because $\dfrac{(n-s)MS_{Error}}{\sigma^2} \sim \chi^2_{n-s}$, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$\left[\frac{(n-s)MS_{Error}}{\chi^2_{\alpha/2,\, n-s}}, \; \frac{(n-s)MS_{Error}}{\chi^2_{1-\alpha/2,\, n-s}}\right].$

Case 2: For $\sigma_\alpha^2$

There is no closed form for a confidence interval for $\sigma_\alpha^2$.

Case 3: For $\sigma_\alpha^2 / \sigma^2$

If all sample sizes are the same, i.e., $n_i = r$, then because $MS_{Error}$ and $MS_{Tr}$ are independent,
$\frac{MS_{Tr} / (\sigma^2 + r\sigma_\alpha^2)}{MS_{Error} / \sigma^2} \sim F(s-1, n-s).$
Hence
$P\left[F_{1-\alpha/2}(s-1, n-s) \leq \frac{MS_{Tr}}{MS_{Error}} \cdot \frac{\sigma^2}{\sigma^2 + r\sigma_\alpha^2} \leq F_{\alpha/2}(s-1, n-s)\right] = 1-\alpha$
$\Rightarrow P\left[\frac{MS_{Tr}}{MS_{Error}} \cdot \frac{1}{F_{\alpha/2}(s-1, n-s)} \leq \frac{\sigma^2 + r\sigma_\alpha^2}{\sigma^2} \leq \frac{MS_{Tr}}{MS_{Error}} \cdot \frac{1}{F_{1-\alpha/2}(s-1, n-s)}\right] = 1-\alpha$
$\Rightarrow P\left[\frac{1}{r}\left(\frac{MS_{Tr}}{MS_{Error}} \cdot \frac{1}{F_{\alpha/2}(s-1, n-s)} - 1\right) \leq \frac{\sigma_\alpha^2}{\sigma^2} \leq \frac{1}{r}\left(\frac{MS_{Tr}}{MS_{Error}} \cdot \frac{1}{F_{1-\alpha/2}(s-1, n-s)} - 1\right)\right] = 1-\alpha.$

Thus, a $100(1-\alpha)\%$ confidence interval for $\sigma_\alpha^2/\sigma^2$ is $(L, U)$, where
$L = \frac{1}{r}\left(\frac{MS_{Tr}}{MS_{Error}} \cdot \frac{1}{F_{\alpha/2}(s-1, n-s)} - 1\right)$
and
$U = \frac{1}{r}\left(\frac{MS_{Tr}}{MS_{Error}} \cdot \frac{1}{F_{1-\alpha/2}(s-1, n-s)} - 1\right).$
Case 4: For $\sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma^2)$

Note that
$1-\alpha = P\left[L \leq \frac{\sigma_\alpha^2}{\sigma^2} \leq U\right]$
$= P\left[1 + L \leq \frac{\sigma_\alpha^2 + \sigma^2}{\sigma^2} \leq 1 + U\right]$
$= P\left[\frac{1}{1+U} \leq \frac{\sigma^2}{\sigma_\alpha^2 + \sigma^2} \leq \frac{1}{1+L}\right]$
$= P\left[\frac{L}{1+L} \leq \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma^2} \leq \frac{U}{1+U}\right].$

Thus, $\left(\dfrac{L}{1+L}, \dfrac{U}{1+U}\right)$ is a $100(1-\alpha)\%$ confidence interval for $\sigma_\alpha^2/(\sigma_\alpha^2 + \sigma^2)$, which represents the proportion of the total variability attributable to the variability among the treatments.
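A minimal sketch of these intervals for the balanced case, assuming scipy is available; the mean squares passed in the final line are hypothetical values chosen only to exercise the function.

```python
from scipy import stats

def variance_component_cis(ms_tr, ms_error, s, r, alpha=0.05):
    n = r * s
    # Case 1: CI for sigma^2 from (n - s) MS_Error / sigma^2 ~ chi^2_{n-s}
    ci_sigma2 = ((n - s) * ms_error / stats.chi2.ppf(1 - alpha / 2, n - s),
                 (n - s) * ms_error / stats.chi2.ppf(alpha / 2, n - s))
    # Case 3: CI (L, U) for sigma_alpha^2 / sigma^2
    f_upper = stats.f.ppf(1 - alpha / 2, s - 1, n - s)   # F_{alpha/2}(s-1, n-s)
    f_lower = stats.f.ppf(alpha / 2, s - 1, n - s)       # F_{1-alpha/2}(s-1, n-s)
    L = (ms_tr / ms_error / f_upper - 1) / r
    U = (ms_tr / ms_error / f_lower - 1) / r
    # Case 4: CI for the proportion sigma_alpha^2 / (sigma_alpha^2 + sigma^2)
    ci_proportion = (L / (1 + L), U / (1 + U))
    return ci_sigma2, (L, U), ci_proportion

print(variance_component_cis(ms_tr=30.0, ms_error=4.0, s=5, r=6))
```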

Analysis of variance in mixed-effects models

Suppose that we have a mixed-effects model for a design where one effect is fixed and the other is random. Let us consider a mixed model for a general $a \times b$ factorial treatment structure in a completely randomized design. The model is
$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk},$
where we use the following conditions, with the levels of factor A fixed and the levels of factor B randomly selected:
1. $\mu$ is the unknown overall mean response.
2. $\tau_i$ is a fixed effect corresponding to the $i$th level of factor A, with $\sum_i \tau_i = 0$.
3. $\beta_j$ is a random effect due to the $j$th level of factor B. The $\beta_j$'s have independent normal distributions with mean 0 and variance $\sigma_\beta^2$.
4. $(\tau\beta)_{ij}$ is a random effect due to the interaction of the $i$th level of factor A with the $j$th level of factor B. The $(\tau\beta)_{ij}$'s have independent normal distributions with mean 0 and variance $\sigma_{\tau\beta}^2$.
5. The $\beta_j$'s, $(\tau\beta)_{ij}$'s and $\epsilon_{ijk}$'s are mutually independent.

Using these assumptions, the analysis of variance table for a fixed, random, or mixed model in a two-factor experiment is shown in the following table.

ANOVA table for an $a \times b$ factorial treatment structure, with $n$ observations per cell

Source | Sum of squares | Degrees of freedom | Mean squares | E(MS): Fixed effects | E(MS): Random effects | E(MS): Mixed effects (A fixed, B random)
A      | SSA  | $a-1$        | MSA  | $\sigma^2 + bn\dfrac{\sum_i \tau_i^2}{a-1}$ | $\sigma^2 + n\sigma_{\tau\beta}^2 + bn\sigma_\tau^2$ | $\sigma^2 + n\sigma_{\tau\beta}^2 + bn\dfrac{\sum_i \tau_i^2}{a-1}$
B      | SSB  | $b-1$        | MSB  | $\sigma^2 + an\dfrac{\sum_j \beta_j^2}{b-1}$ | $\sigma^2 + n\sigma_{\tau\beta}^2 + an\sigma_\beta^2$ | $\sigma^2 + n\sigma_{\tau\beta}^2 + an\sigma_\beta^2$
AB     | SSAB | $(a-1)(b-1)$ | MSAB | $\sigma^2 + n\dfrac{\sum_{i,j}(\tau\beta)_{ij}^2}{(a-1)(b-1)}$ | $\sigma^2 + n\sigma_{\tau\beta}^2$ | $\sigma^2 + n\sigma_{\tau\beta}^2$
Error  | SSE  | $ab(n-1)$    | MSE  | $\sigma^2$ | $\sigma^2$ | $\sigma^2$
Total  | TSS  | $nab-1$      |      |            |            |

The test for $H_0: \sigma_{\tau\beta}^2 = 0$ is the same in the mixed model as in the random-effects model. That is, to test $H_0: \sigma_{\tau\beta}^2 = 0$ versus $H_1: \sigma_{\tau\beta}^2 > 0$, we use the statistic
$F = \frac{MSAB}{MSE}.$
Reject $H_0$ if $F$ exceeds the upper $\alpha$ critical value of the $F$-distribution with $(a-1)(b-1)$ and $ab(n-1)$ degrees of freedom.

No matter what the result of our test for $\sigma_{\tau\beta}^2$, we would proceed to use the following tests for factors A and B. For factor A we have
$H_0: \tau_1 = \ldots = \tau_a = 0$
$H_a:$ at least one of the $\tau_i$'s differs from the rest.
The test statistic is
$F = \frac{MSA}{MSAB}$
based on $df_1 = (a-1)$ and $df_2 = (a-1)(b-1)$.

For factor B, we have
$H_0: \sigma_\beta^2 = 0$
$H_1: \sigma_\beta^2 > 0.$
The test statistic is
$F = \frac{MSB}{MSAB}$
based on $df_1 = (b-1)$ and $df_2 = (a-1)(b-1)$.

The analysis of variance procedure outlined for a mixed-effects model for an $a \times b$ factorial treatment structure can be used as well for a randomized block design, where treatments are fixed, blocks are assumed to be random, and there are observations for each block-treatment combination.
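A minimal sketch of the three tests above, given the mean squares and dimensions; scipy is assumed available, and the mean squares in the example call are hypothetical.

```python
from scipy import stats

def mixed_model_tests(msa, msb, msab, mse, a, b, n):
    tests = {
        "AB (H0: sigma_tb^2 = 0)":          (msab / mse, (a - 1) * (b - 1), a * b * (n - 1)),
        "A  (H0: tau_1 = ... = tau_a = 0)": (msa / msab, a - 1, (a - 1) * (b - 1)),
        "B  (H0: sigma_beta^2 = 0)":        (msb / msab, b - 1, (a - 1) * (b - 1)),
    }
    return {name: (F, stats.f.sf(F, df1, df2))     # (statistic, p-value)
            for name, (F, df1, df2) in tests.items()}

# hypothetical mean squares with a = 3 fixed levels, b = 4 random levels, n = 2 per cell
print(mixed_model_tests(msa=40.0, msb=25.0, msab=8.0, mse=3.0, a=3, b=4, n=2))
```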

Analysis of Variance and Design of Experiments-I
MODULE IX
LECTURE - 36
ANALYSIS OF NONORTHOGONAL DATA
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Orthogonal data

The concept of orthogonality of data is associated with two- or higher-way classified data. Consider the set-up of two-way classified data.
Let A and B be two factors at $p$ and $q$ levels, respectively.
Let $n_{ij}$ : number of observations in the $(i, j)$th cell.
Let $y_{ijk}$ : $k$th observation in the $(i,j)$th cell, $i = 1, 2, \ldots, p; \; j = 1, 2, \ldots, q; \; k = 1, 2, \ldots, n_{ij}$.
$A_i = \sum_j \sum_k y_{ijk} = \sum_j T_{ij}$ : marginal total corresponding to the $i$th level of A, where $T_{ij} = \sum_k y_{ijk}$ is the cell total.
$B_j = \sum_i \sum_k y_{ijk} = \sum_i T_{ij}$ : marginal total corresponding to the $j$th level of B.

Let
$\sum_i A_i = \sum_j B_j = G$ : grand total,
$\sum_j n_{ij} = n_{io}, \qquad \sum_i n_{ij} = n_{oj}, \qquad \sum_i n_{io} = \sum_j n_{oj} = n,$
$\bar{t}_i = \dfrac{A_i}{n_{io}}$ : marginal mean of A,
$\bar{b}_j = \dfrac{B_j}{n_{oj}}$ : marginal mean of B.
If any contrast $\sum_i l_i \bar{t}_i$ of the marginal means of A is orthogonal to any contrast $\sum_j m_j \bar{b}_j$ of the other marginal means, then the data are called orthogonal, otherwise non-orthogonal.

If each cell has a constant number of observations, say $n$, then the data are orthogonal, as shown below. Note that in this case
$n_{io} = \sum_j n_{ij} = nq, \qquad n_{oj} = np.$
The definition extends to higher classifications if we treat the marginal means of every pair of factors of classification.
$L = \sum_i l_i \bar{t}_i = \sum_i \frac{l_i A_i}{qn} = \frac{1}{qn}\sum_i l_i (T_{i1} + \ldots + T_{iq}).$
Similarly,
$M = \sum_j m_j \bar{b}_j = \sum_j \frac{m_j B_j}{pn} = \frac{1}{pn}\sum_j m_j (T_{1j} + \ldots + T_{pj}).$
The sum of products of the coefficients of identical observations in the two contrasts is
$\frac{1}{pqn^2}\sum_i \sum_j l_i m_j\, n = \frac{1}{pqn}\sum_i l_i \sum_j m_j = 0.$
So the data are orthogonal, as the two contrasts are orthogonal.

When the cell frequencies are proportional, i.e.,
$\frac{n_{ij}}{n_{ij'}} = \frac{C_j}{C_{j'}} \quad (j, j' = 1, 2, \ldots, q; \; i = 1, \ldots, p),$
then also the data are orthogonal.

The same is true for higher-order classifications if the number of observations in the ultimate cells is constant.
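A small sketch of checking the proportional-frequency condition $n_{ij}\,n = n_{io}\,n_{oj}$ for a table of cell counts; the two example tables are hypothetical.

```python
import numpy as np

def is_orthogonal(n_ij):
    n_ij = np.asarray(n_ij, dtype=float)
    n_io = n_ij.sum(axis=1, keepdims=True)      # marginal counts for factor A
    n_oj = n_ij.sum(axis=0, keepdims=True)      # marginal counts for factor B
    n = n_ij.sum()
    return bool(np.allclose(n_ij * n, n_io * n_oj))

print(is_orthogonal([[2, 4, 6], [3, 6, 9]]))    # proportional frequencies -> True
print(is_orthogonal([[2, 5, 6], [3, 6, 9]]))    # not proportional        -> False
```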

Analysis of non-orthogonal two-way data

When the data are non-orthogonal, the analysis is no longer simple, as a straightforward solution of the normal equations is not available. Consider the following usual two-way model:
$y_{ijk} = \mu + \alpha_i + \beta_j + \epsilon_{ijk}, \quad i = 1, \ldots, I; \; j = 1, \ldots, J; \; k = 1, \ldots, n_{ij}.$
Assume that the $\epsilon_{ijk}$'s are identically and independently distributed as $N(0, \sigma^2)$.

Using the least squares principle, we minimize the sum of squares due to error
$E = \sum_i\sum_j\sum_k \epsilon_{ijk}^2 = \sum_i\sum_j\sum_k (y_{ijk} - \mu - \alpha_i - \beta_j)^2,$
which gives the normal equations
$\frac{\partial E}{\partial \mu} = 0 \Rightarrow n_{oo}\hat{\mu} + \sum_i n_{io}\hat{\alpha}_i + \sum_j n_{oj}\hat{\beta}_j = \sum_i\sum_j\sum_k y_{ijk} = G \quad (1)$
$\frac{\partial E}{\partial \alpha_i} = 0 \Rightarrow n_{io}\hat{\mu} + n_{io}\hat{\alpha}_i + \sum_j n_{ij}\hat{\beta}_j = \sum_j\sum_k y_{ijk} = A_i \quad (i = 1, 2, \ldots, I) \quad (2)$
$\frac{\partial E}{\partial \beta_j} = 0 \Rightarrow n_{oj}\hat{\mu} + \sum_i n_{ij}\hat{\alpha}_i + n_{oj}\hat{\beta}_j = \sum_i\sum_k y_{ijk} = B_j \quad (j = 1, 2, \ldots, J). \quad (3)$

From (3), we obtain
$\hat{\beta}_j = \frac{B_j}{n_{oj}} - \hat{\mu} - \frac{1}{n_{oj}}\sum_m n_{mj}\hat{\alpha}_m \quad (j = 1, 2, \ldots, J)$
(we use $m$ instead of $i$ to avoid confusion).

Substituting $\hat{\beta}_j$ in (2):
$n_{io}\hat{\mu} + n_{io}\hat{\alpha}_i + \sum_j n_{ij}\left[\frac{B_j}{n_{oj}} - \hat{\mu} - \frac{1}{n_{oj}}\sum_m n_{mj}\hat{\alpha}_m\right] = A_i$
or
$A_i - \sum_j \frac{n_{ij}B_j}{n_{oj}} = n_{io}\hat{\mu} + n_{io}\hat{\alpha}_i - \hat{\mu}\sum_j n_{ij} - \sum_j \frac{n_{ij}}{n_{oj}}\sum_m n_{mj}\hat{\alpha}_m$
$= n_{io}\hat{\alpha}_i - \sum_j \frac{n_{ij}}{n_{oj}}\sum_m n_{mj}\hat{\alpha}_m$
$= \hat{\alpha}_i\left(n_{io} - \sum_j \frac{n_{ij}^2}{n_{oj}}\right) - \sum_{m \neq i}\hat{\alpha}_m\sum_j \frac{n_{ij}n_{mj}}{n_{oj}}$
or
$Q_i = C_{ii}\hat{\alpha}_i + \sum_{m \neq i} C_{im}\hat{\alpha}_m \quad (i = 1, 2, \ldots, I). \quad (4)$

These are referred to as the reduced normal equations.


$Q_i$ is called the adjusted treatment total of the $i$th level of A. Here
$Q_i = A_i - \sum_j \frac{n_{ij}B_j}{n_{oj}}, \qquad C_{ii} = n_{io} - \sum_j \frac{n_{ij}^2}{n_{oj}}, \qquad C_{im} = -\sum_j \frac{n_{ij}n_{mj}}{n_{oj}}, \qquad C_{im} = C_{mi}.$

These equations in (4) are not independent. To see this, write
$Q_1 = C_{11}\hat{\alpha}_1 + \sum_{m \neq 1} C_{1m}\hat{\alpha}_m$
$Q_2 = C_{22}\hat{\alpha}_2 + \sum_{m \neq 2} C_{2m}\hat{\alpha}_m$
$\vdots$
$Q_I = C_{II}\hat{\alpha}_I + \sum_{m \neq I} C_{Im}\hat{\alpha}_m$
and sum them all on the left- and right-hand sides of the equality sign:
$\sum_{i=1}^{I} Q_i = \sum_{i=1}^{I} A_i - \sum_{i=1}^{I}\sum_j \frac{n_{ij}B_j}{n_{oj}}.$
Using the normal equations,
$\sum_{i=1}^{I} Q_i = \sum_i\left(n_{io}\hat{\mu} + n_{io}\hat{\alpha}_i + \sum_j n_{ij}\hat{\beta}_j\right) - \sum_i\sum_j \frac{n_{ij}}{n_{oj}}\left(n_{oj}\hat{\mu} + \sum_m n_{mj}\hat{\alpha}_m + n_{oj}\hat{\beta}_j\right)$
$= n_{oo}\hat{\mu} + \sum_i n_{io}\hat{\alpha}_i + \sum_j n_{oj}\hat{\beta}_j - n_{oo}\hat{\mu} - \sum_m n_{mo}\hat{\alpha}_m - \sum_j n_{oj}\hat{\beta}_j$
$= 0.$
Similarly, considering the right-hand side of (4) and summing over $i$, we get
$\sum_i\left(C_{ii}\hat{\alpha}_i + \sum_{m \neq i} C_{im}\hat{\alpha}_m\right) = 0$
(using the normal equations). So the $I$ equations in (4) are not independent, and no unique solution exists: if $\hat{\alpha}_i \; (i = 1, \ldots, I)$ is a set of solutions, then $\hat{\alpha}_i + c \; (i = 1, 2, \ldots, I)$ is also a set of solutions, where $c$ is a constant.

To get a unique solution, we impose the condition $\sum_i \hat{\alpha}_i = 0$, i.e., the $\hat{\alpha}_i$'s are estimated as deviations from their mean. As a matter of fact, the restriction need not be $\sum_i \hat{\alpha}_i = 0$ always; it can be any linear function of the $\hat{\alpha}_i$'s other than their contrasts. Such a restriction changes $\hat{\mu}$ only.
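Before proceeding, the sketch below shows how $Q_i$, the matrix of coefficients $C_{ii}, C_{im}$ and a solution of (4) under the restriction $\sum_i \hat{\alpha}_i = 0$ can be computed numerically; the cell counts and cell totals used are hypothetical.

```python
import numpy as np

n_ij = np.array([[2, 1, 3],
                 [1, 2, 2],
                 [3, 1, 1]], dtype=float)        # I x J cell frequencies (hypothetical)
T_ij = np.array([[10.0,  6.0, 15.0],
                 [ 7.0, 11.0, 12.0],
                 [18.0,  5.0,  6.0]])            # I x J cell totals (hypothetical)

A_i = T_ij.sum(axis=1)                           # marginal totals of A
B_j = T_ij.sum(axis=0)                           # marginal totals of B
n_io = n_ij.sum(axis=1)
n_oj = n_ij.sum(axis=0)

Q = A_i - (n_ij * (B_j / n_oj)).sum(axis=1)      # adjusted treatment totals Q_i
C = np.diag(n_io) - (n_ij / n_oj) @ n_ij.T       # C_ii on the diagonal, C_im off it

# C is singular (its rows sum to zero), so append the restriction sum_i alpha_i = 0
I = len(A_i)
M = np.vstack([C, np.ones(I)])
alpha_hat, *_ = np.linalg.lstsq(M, np.append(Q, 0.0), rcond=None)
print(Q.round(3), alpha_hat.round(3))
```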
After obtaining a set of solutions $\hat{\alpha}_i$ from (4), we can obtain the solutions $\hat{\beta}_j$ from (3), if so required. Further, the error sum of squares is
$E = \sum_i\sum_j\sum_k (y_{ijk} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j)^2$
$= \sum_i\sum_j\sum_k y_{ijk}^2 - \hat{\mu}G - \sum_i \hat{\alpha}_i A_i - \sum_j \hat{\beta}_j B_j$
$= \sum_i\sum_j\sum_k y_{ijk}^2 - \hat{\mu}G - \sum_i \hat{\alpha}_i A_i - \sum_j B_j\left(\frac{B_j}{n_{oj}} - \hat{\mu} - \frac{1}{n_{oj}}\sum_m n_{mj}\hat{\alpha}_m\right)$
$= \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat{\alpha}_i Q_i. \quad (5)$

Here, in this case, we eliminated $\beta_j$ and obtained the error sum of squares via the $\hat{\alpha}_i$.
Now we consider it the other way round, i.e., eliminate $\alpha_i$, obtain $\hat{\beta}_j$, and then obtain the sum of squares due to error. Doing so, we obtain the error sum of squares as
$E = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_i \frac{A_i^2}{n_{io}} - \sum_j \hat{\beta}_j R_j \quad (6)$
where $R_j$ is the adjusted total of the $j$th level of B, given by
$R_j = B_j - \sum_i \frac{n_{ij}A_i}{n_{io}}.$

Both error sums of squares (5) and (6) are the same, so
$\sum_i \frac{A_i^2}{n_{io}} + \sum_j \hat{\beta}_j R_j = \sum_j \frac{B_j^2}{n_{oj}} + \sum_i \hat{\alpha}_i Q_i$
or
$\sum_i \frac{A_i^2}{n_{io}} - \sum_j \frac{B_j^2}{n_{oj}} = \sum_i \hat{\alpha}_i Q_i - \sum_j \hat{\beta}_j R_j. \quad (7)$

Analysis of Variance and Design of Experiments-I
MODULE IX
LECTURE - 37
ANALYSIS OF NONORTHOGONAL DATA
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

Now under
$H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_I = 0,$
we get the model
$y_{ijk} = \mu + \beta_j + \epsilon_{ijk}.$
Now minimize the sum of squares due to error
$E^* = \sum_i\sum_j\sum_k (y_{ijk} - \mu - \beta_j)^2:$
$\frac{\partial E^*}{\partial \mu} = 0 \Rightarrow n_{oo}\hat{\mu} + \sum_j n_{oj}\hat{\beta}_j = G$
$\frac{\partial E^*}{\partial \beta_j} = 0 \Rightarrow n_{oj}\hat{\mu} + n_{oj}\hat{\beta}_j = B_j$
or
$\hat{\mu} + \hat{\beta}_j = \frac{B_j}{n_{oj}}.$
The error sum of squares is then
$E_1 = \sum_i\sum_j\sum_k (y_{ijk} - \hat{\mu} - \hat{\beta}_j)^2 = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}}. \quad (8)$

The sum of squares due to A (adjusted) is $E_1 - E$, obtained from equation (5) and equation (8) as
$E_1 - E = \sum_i \hat{\alpha}_i Q_i,$
which is called the adjusted sum of squares due to A, whereas the unadjusted sum of squares due to A is
$\sum_i \frac{A_i^2}{n_{io}} - \frac{G^2}{n_{oo}}.$
Once the adjusted sum of squares due to A is obtained, the adjusted sum of squares due to B can be obtained from (7). The analysis of variance table is given as follows.
Source         | Degrees of freedom | Sum of squares                                       | Mean squares                                   | F
A (adjusted)   | $I - 1$            | $\sum_i \hat{\alpha}_i Q_i$                           | $MSA = \dfrac{SS_{A(\text{adjusted})}}{I-1}$    | $\dfrac{MSA}{MSE}$
B (unadjusted) | $J - 1$            | $\sum_j \dfrac{B_j^2}{n_{oj}} - \dfrac{G^2}{n_{oo}}$  | $MSB = \dfrac{SS_{B(\text{unadjusted})}}{J-1}$  |
Error          | $IJ - I - J + 1$   | $SS_{Error}$ (by subtraction)                         | $MSE = \dfrac{SS_{Error}}{IJ - I - J + 1}$      |
Total          | $IJ - 1$           | $\sum_i\sum_j\sum_k y_{ijk}^2 - \dfrac{G^2}{n_{oo}}$  |                                                 |
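Continuing the earlier numerical sketch, the entries of this table can be assembled as follows; the value of $\sum_{ijk} y_{ijk}^2$ requires the raw observations and is therefore set to a hypothetical number here.

```python
# uses n_ij, T_ij, A_i, B_j, n_oj, Q and alpha_hat from the previous sketch
G = A_i.sum()
n_oo = n_ij.sum()
sum_y2 = 580.0                                   # hypothetical sum of squared observations

ss_total = sum_y2 - G ** 2 / n_oo
ss_A_adj = float(alpha_hat @ Q)                  # adjusted sum of squares due to A
ss_B_unadj = (B_j ** 2 / n_oj).sum() - G ** 2 / n_oo
ss_error = ss_total - ss_B_unadj - ss_A_adj      # by subtraction, as in (9) below
print(ss_A_adj, ss_B_unadj, ss_error)
```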

The sum of squares due to error is
$E = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat{\alpha}_i Q_i$
$= \left(\sum_i\sum_j\sum_k y_{ijk}^2 - \frac{G^2}{n_{oo}}\right) - \left(\sum_j \frac{B_j^2}{n_{oj}} - \frac{G^2}{n_{oo}}\right) - \sum_i \hat{\alpha}_i Q_i$
$= \text{Total S.S.} - \text{S.S. block (unadjusted)} - \text{S.S. treatment (adjusted)} \quad (9)$
and also
$E = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_i \frac{A_i^2}{n_{io}} - \sum_j \hat{\beta}_j R_j$
$= \left(\sum_i\sum_j\sum_k y_{ijk}^2 - \frac{G^2}{n_{oo}}\right) - \left(\sum_i \frac{A_i^2}{n_{io}} - \frac{G^2}{n_{oo}}\right) - \sum_j \hat{\beta}_j R_j$
$= \text{Total S.S.} - \text{S.S. treatment (unadjusted)} - \text{S.S. block (adjusted)}. \quad (10)$

For the Fisher-Cochran theorem to be used here, we need to have
Total S.S. = S.S. (treatment) + S.S. (block) + S.S. (error),
which is possible only under (9) or (10);
Total S.S. $\neq$ S.S. treatment (adjusted) + S.S. block (adjusted) + S.S. error.

When the interaction effect is present

When the interaction effect is present, the model is
$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}, \quad i = 1, 2, \ldots, I; \; j = 1, 2, \ldots, J.$
The normal equations for the $\gamma_{ij}$'s give
$T_{ij} = n_{ij}(\hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \hat{\gamma}_{ij}).$
The error sum of squares is
$E_2 = \sum_i\sum_j\sum_k \left(y_{ijk} - \hat{\mu} - \hat{\alpha}_i - \hat{\beta}_j - \hat{\gamma}_{ij}\right)^2 = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_i\sum_j \frac{T_{ij}^2}{n_{ij}}.$
Under the null hypothesis that all $\gamma_{ij}$'s are zero, we have already derived the error sum of squares as
$E = \sum_i\sum_j\sum_k y_{ijk}^2 - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat{\alpha}_i Q_i.$
Hence, the sum of squares due to interaction is
$E - E_2 = \sum_i\sum_j \frac{T_{ij}^2}{n_{ij}} - \sum_j \frac{B_j^2}{n_{oj}} - \sum_i \hat{\alpha}_i Q_i.$
It has $(I-1)(J-1)$ degrees of freedom if there is at least one observation in each cell; otherwise it is reduced by the number of cells having no observation.
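Continuing the same numerical sketch, the interaction sum of squares and its degrees of freedom can be computed, for example, as

```python
# uses T_ij, n_ij, B_j, n_oj, alpha_hat and Q from the earlier sketch
ss_cells = (T_ij ** 2 / n_ij).sum()              # sum_ij T_ij^2 / n_ij
ss_interaction = ss_cells - (B_j ** 2 / n_oj).sum() - float(alpha_hat @ Q)
df_interaction = (n_ij.shape[0] - 1) * (n_ij.shape[1] - 1)
print(ss_interaction, df_interaction)
```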

Estimate of a treatment contrast and its variance

Let $L = \sum_i l_i \alpha_i$ be a treatment contrast. The estimate of $\alpha_i$ is a linear function of the $Q_i$'s; hence $L$ is estimated by a linear function of the $Q_i$'s. Let
$\hat{L} = \sum_i l_i \hat{\alpha}_i = \sum_i q_i Q_i.$
If the $\hat{\alpha}_i$'s are available as linear functions of the $Q_i$'s, then the $q_i$'s can be obtained easily. However, the $q_i$'s can also be obtained as follows.

Since
$Q_i = C_{ii}\hat{\alpha}_i + \sum_{m \neq i} C_{im}\hat{\alpha}_m, \quad i = 1, 2, \ldots, I,$
we have
$\sum_i l_i \hat{\alpha}_i = \sum_i q_i\left[C_{i1}\hat{\alpha}_1 + C_{i2}\hat{\alpha}_2 + \ldots + C_{iI}\hat{\alpha}_I\right].$
Equating the coefficients of $\hat{\alpha}_i$ in this identity, we get
$l_i = \sum_m q_m C_{mi} \quad (i = 1, 2, \ldots, I).$
This is the same as the normal equation
$C_{ii}\hat{\alpha}_i + \sum_{m \neq i} C_{im}\hat{\alpha}_m = Q_i$
except that the $Q_i$'s are substituted by the $l_i$'s and the unknown $\hat{\alpha}_i$'s have been written as $q_i$'s. Hence the $q_i$'s can be obtained from the solution of the same normal equations.

Now
$Var(Q_i) = Var\left(A_i - \sum_j \frac{n_{ij}B_j}{n_{oj}}\right) = C_{ii}\sigma^2,$
$Cov(Q_i, Q_m) = Cov\left(A_i - \sum_j \frac{n_{ij}B_j}{n_{oj}},\; A_m - \sum_j \frac{n_{mj}B_j}{n_{oj}}\right) = C_{im}\sigma^2.$
So
$Var\left(\sum_i q_i Q_i\right) = \sigma^2\left(\sum_i q_i^2 C_{ii} + \sum_{i \neq m} q_i q_m C_{im}\right) = \sigma^2\sum_i\sum_m q_i q_m C_{im} = \sigma^2\sum_i q_i l_i.$
In particular,
$Var(\hat{\alpha}_i - \hat{\alpha}_m) = \sigma^2(q_i - q_m),$
where $q_i$ and $q_m$ are the coefficients of $Q_i$ and $Q_m$, respectively, in the expression giving the estimate of $(\alpha_i - \alpha_m)$.

The sum of squares of a contrast is the square of the contrast divided by the coefficient of $\sigma^2$ in its variance. Hence, the sum of squares of the estimate of $\sum_i l_i \alpha_i$ is
$\frac{\left(\sum_i q_i Q_i\right)^2}{\sum_i l_i q_i}.$
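Continuing the earlier numerical sketch, the $q_i$'s for a given contrast, the contrast estimate, the coefficient of $\sigma^2$ in its variance and the corresponding sum of squares can be obtained as below; the contrast $\alpha_1 - \alpha_2$ is chosen arbitrarily for illustration.

```python
# uses C and Q from the earlier sketch; l is the contrast coefficient vector
l = np.array([1.0, -1.0, 0.0])                   # contrast alpha_1 - alpha_2 (sum of l_i is 0)

# solve the same reduced normal equations with l in place of Q; the added
# restriction sum q_i = 0 just picks one solution and does not affect the results
M = np.vstack([C, np.ones(len(l))])
q, *_ = np.linalg.lstsq(M, np.append(l, 0.0), rcond=None)

contrast_hat = float(q @ Q)                      # estimate of the contrast
var_coeff = float(l @ q)                         # Var(estimate) = sigma^2 * (sum l_i q_i)
ss_contrast = contrast_hat ** 2 / var_coeff      # sum of squares of the estimated contrast
print(contrast_hat, var_coeff, ss_contrast)
```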

When data are proportionate to cell frequencies

When the cell frequencies are the same in a column (row), though they may vary from column to column, a non-orthogonal layout of this special type is obtained. Suppose the frequency in each row of the $i$th column is $n_i$. Then
$C_{ii} = rn_i - \frac{rn_i^2}{N}, \qquad C_{im} = -\frac{rn_in_m}{N},$
where $N = \sum_i n_i$ and $r$ is the number of rows.

The reduced normal equations become
$rn_i\hat{\alpha}_i - \frac{rn_i^2}{N}\hat{\alpha}_i - \frac{r}{N}n_i\sum_{m \neq i} n_m\hat{\alpha}_m = Q_i$
or
$n_i\left(\hat{\alpha}_i - \frac{1}{N}\sum_m n_m\hat{\alpha}_m\right) = \frac{Q_i}{r}.$
Imposing the restriction $\sum_m n_m\hat{\alpha}_m = 0$, the solution is obtained as
$\hat{\alpha}_i = \frac{Q_i}{rn_i}.$
In this case the adjusted sum of squares due to A can be obtained from
$\sum_i \frac{A_i^2}{rn_i} - \frac{G^2}{rN}$
and the unadjusted sum of squares due to B is
$\sum_j \frac{B_j^2}{N} - \frac{G^2}{rN}.$
Thus
$Var(\hat{\alpha}_i - \hat{\alpha}_m) = \frac{\sigma^2}{r}\left(\frac{1}{n_i} + \frac{1}{n_m}\right).$
