
MATHEMATICAL ECONOMICS

IV Semester
COMPLEMENTARY COURSE

B Sc MATHEMATICS
(2011 Admission)

UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION

Calicut University P.O. Malappuram, Kerala, India 673 635



UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
STUDY MATERIAL
COMPLEMENTARY COURSE

B Sc Mathematics
IV Semester

MATHEMATICAL ECONOMICS
Prepared by:

Sri. Shabeer K P,
Assistant Professor,
Dept. of Economics,
Govt College Kodanchery.

Scrutinised by:

Layout:

Computer Section, SDE

Reserved


CONTENTS

                                                             PAGE No.
MODULE I     INTRODUCTION TO ECONOMETRICS                         5
MODULE II    TWO VARIABLE REGRESSION MODEL                       17
MODULE III   THE CLASSICAL NORMAL LINEAR REGRESSION MODEL        37
MODULE IV    EXTENSION OF TWO VARIABLE REGRESSION MODEL          52


MODULE I
INTRODUCTION TO ECONOMETRICS
1.1 Definition and Scope of Econometrics
Literally interpreted, econometrics means economic measurement. Econometrics
deals with the measurement of economic relationships. It is a science which
combines economic theory with economic statistics and tries, by mathematical and
statistical methods, to investigate the empirical support of the general economic laws
established by economic theory. Econometrics, therefore, makes concrete certain
economic laws by utilising economics, mathematics and statistics. The term
econometrics is formed from two words of Greek origin: oikonomia (οἰκονομία),
meaning economy, and metron (μέτρον), meaning measure.
Although measurement is an important part of econometrics, the scope of
econometrics is much broader, as can be seen from the following quotations. In the
words of Arthur S. Goldberger, econometrics may be defined as "the social science in
which the tools of economic theory, mathematics and statistical inference are
applied to the analysis of economic phenomena". Gerhard Tintner points out that
econometrics, as a result of a certain outlook on the role of economics, consists of the
application of mathematical statistics to economic data to lend empirical support to
the models constructed by mathematical economics and to obtain numerical
results. For H. Theil, econometrics is concerned with "the empirical determination of
economic laws". In the words of Ragnar Frisch, the mutual penetration of
quantitative economic theory and statistical observation is the essence of
econometrics.
Thus, econometrics may be considered as the integration of economics,
mathematics and statistics for the purpose of providing numerical values for the
parameters of economic relationships and verifying economic theories. It is a special
type of economic analysis and research in which the general economic theory,
formulated in mathematical terms, is combined with empirical measurement of
economic phenomena. Econometrics is the art and science of using statistical
methods for the measurement of economic relations. In the practice of
econometrics, economic theory, institutional information and other assumptions are
relied upon to formulate a statistical model, or a set of statistical hypotheses to
explain the phenomena in question.
Economic theory makes statements or hypotheses that are mostly qualitative in
nature. Econometrics gives empirical content to most economic theory.
Econometrics differs from mathematical economics. The main concern of
mathematical economics is to express economic theory in mathematical form
(equations) without regard to measurability or empirical verification of the theory. As
noted above, econometrics is mainly interested in the empirical verification of

economic theory. The econometrician often uses the mathematical equations
proposed by the mathematical economist but puts these equations in such a form that
they lend themselves to empirical testing. Further, although econometrics
presupposes the expression of economic relationships in mathematical form, unlike
mathematical economics it does not assume that economic relationships are
exact. On the contrary, econometrics assumes that economic relationships are not
exact but stochastic. Econometric methods are designed to take into account
random disturbances which create deviations from the exact behavioural patterns
suggested by economic theory and mathematical economics.
Econometrics differs both from mathematical statistics and economic statistics.
An economic statistician gathers empirical data, records or charts them, and
then attempts to describe the pattern of their development over time and to detect
some relationships between various economic magnitudes. Economic statistics is
mainly the descriptive aspect of economics. It does not provide explanations of the
development of the various variables and measurement of the parameters of
economic relationships. On the contrary, mathematical statistics deals with
methods of measurement which are developed on the basis of controlled
experiments in laboratories. Statistical methods of measurement are not
appropriate for economic relationships, which cannot be measured on the basis of
evidence provided by controlled experiments, because such experiments cannot be
designed for economic phenomena. For instance, in studying the economic
behaviour of human beings one cannot change only one factor while keeping all
other factors constant. In the real world, all variables change continuously and
simultaneously, so controlled experiments are not possible. Econometrics uses
statistical methods after adapting them to the problems of economic life. These
adapted statistical methods are called econometric methods. In particular,
econometric methods are adjusted so that they become appropriate for the
measurement of economic relationships which are stochastic, that is, they include
random elements.
1.2 Methodology of Econometrics

Broadly speaking, traditional or classical econometric methodology consists of
the following steps:
1) Statement of the theory or hypothesis
2) Specification of the mathematical model of the theory
3) Specification of the econometric model of the theory
4) Obtaining the data
5) Estimation of the parameters of the econometric model
6) Hypothesis testing
7) Forecasting or prediction
8) Using the model for control or policy purposes

To illustrate the preceding steps, let us consider the well known psychological law of
consumption.

1) Statement of theory or hypothesis


Keynes stated: "The fundamental psychological law ... is that men (women) are
disposed, as a rule and on average, to increase their consumption as their income
increases, but not as much as the increase in their income." In short, Keynes
postulated that the marginal propensity to consume (MPC), that is, the rate of
change of consumption for a unit change in income, is greater than zero but
less than one. That is, 0 < MPC < 1.
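The definition of the MPC can be illustrated with a small numerical sketch; the income and consumption figures below are hypothetical, chosen only to show the calculation:

```python
# Hypothetical figures: income rises from 300 to 400 while
# consumption rises from 250 to 320 (same currency units).
income_before, income_after = 300.0, 400.0
consumption_before, consumption_after = 250.0, 320.0

# MPC = change in consumption / change in income
mpc = (consumption_after - consumption_before) / (income_after - income_before)
print(mpc)  # 0.7, which satisfies 0 < MPC < 1 as Keynes postulated
```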
2) Specification of the mathematical model of consumption
Economic theory may or may not indicate the precise mathematical form of the
relationship or the number of equations to be included in the economic model.
Mathematical model is specifying mathematical equations that describe the
relationships between economic variables as proposed by the economic theory.
Although Keynes postulated a positive relationship between consumption and
income, he did not specify the precise form of functional relationship between the
two. For simplicity, a mathematical economist might suggest the following form of
the Keynesian consumption function:
Y = β1 + β2X,   0 < β2 < 1        (1.1)

where Y = consumption expenditure, X = income, and β1 and β2, known as the
parameters of the model, are the intercept and slope coefficients respectively. The
slope coefficient β2 measures the MPC. This equation, which states that consumption is
linearly related to income, is an example of mathematical model of relationship
between consumption and income that is called the consumption function in
economics. A model is simply a set of mathematical equations. If the model has only
one equation, it is called single equation model. If the model has more than one
equation it is called a multiple equation model.
In the above equation (1.1), the variable appearing on the left side of the equality
sign is called the dependent variable and the variables on the right side are called
the independent or explanatory variables. Thus, in the Keynesian consumption
function, consumption expenditure is the dependent variable and income is the
explanatory variable.
(3) Specification of the econometric model of consumption
The purely mathematical model of the consumption function as in equation
(1.1) is of limited interest to the econometrician because it assumes that there is an
exact or deterministic relationship between consumption and income. But
relationships between economic variables are generally inexact. Thus, if we obtain
data on consumption and income, we would not expect all the observations to lie on
the straight line. This is because, in addition to income, other
variables affect consumption expenditure. For example, size of family, ages of the
members in the family, family religion etc are likely to exert some influence on
consumption.
Mathematical Economics

Page 7

School of Distance Education

To allow for the inexact relationship between economic variables, the


econometrician would modify the deterministic consumption function as follows
Y = β1 + β2X + u            (1.2)

where u, known as the disturbance or error term, is a random or stochastic


variable. The disturbance term u represents all those factors that affect
consumption but are not taken into account explicitly. This equation is an example
of an econometric model. More technically, it is an example of a linear regression
model. The econometric consumption function hypothesizes that the dependent
variable Y (consumption) is linearly related to the explanatory variable X (income),
but that the relationship between the two is not exact; it is subject to individual
variation. The econometric model of the consumption function is shown below.

[Figure: observed consumption-income points scattered around the regression line,
with consumption expenditure on the vertical axis and income on the horizontal axis.]
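The role of the disturbance term can be made concrete by simulating hypothetical data from equation (1.2); the parameter values, income levels, and error spread below are assumptions for illustration only:

```python
import random

random.seed(0)

beta1, beta2 = 24.0, 0.51          # hypothetical intercept and MPC
incomes = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

# Each family's consumption deviates from the deterministic line by a
# random disturbance u, so observations cluster around, not on, the line.
observations = []
for x in incomes:
    u = random.gauss(0, 5)          # stochastic disturbance term
    y = beta1 + beta2 * x + u       # equation (1.2): Y = b1 + b2*X + u
    observations.append((x, y))

for x, y in observations:
    print(x, round(y, 2))
```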

(4) Obtaining Data


To estimate the econometric model given in equation (1.2), that is, to obtain
the numerical values of β1 and β2, we need data. Three types of data may be
available for empirical analysis. They are time series, cross-sectional and pooled
data. A time series is a set of observations on the values that a variable takes at
different times. That is, time series data give information about the numerical
values of variables from period to period. Such data may be collected at regular
time intervals such as daily, weekly, monthly, quarterly, annually, quinquennially or
decennially. The data thus collected may be quantitative or qualitative. Qualitative
variables also called dummy variables or categorical variables can be every bit as
important as the quantitative variables. Thus, data on one or more variables
collected over a period of time are called time series data; that is, when the values of
one or more variables for several time periods pertaining to a single economic entity
are given, such a data set is a time series. Cross-sectional data are data on one
or more variables collected at the same point of time; they give information on
the variables concerning individual agents at a given point in time. Pooled data are a
combination of time series and cross-sectional data; that is, pooled data contain
elements of both time series and cross-sectional data.
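As a rough sketch of the three data structures (all entities and figures below are hypothetical), one might represent them as:

```python
# Hypothetical illustration of the three data types.

# Time series: one entity observed over several periods.
time_series = {2019: 1.9, 2020: 2.1, 2021: 2.4}   # e.g. consumption by year

# Cross-section: several entities observed at one point in time (say, 2021).
cross_section = {"family_A": 2.4, "family_B": 1.8, "family_C": 3.1}

# Pooled (panel) data: several entities observed over several periods,
# so each observation is keyed by (entity, period).
pooled = {
    ("family_A", 2020): 2.2, ("family_A", 2021): 2.4,
    ("family_B", 2020): 1.7, ("family_B", 2021): 1.8,
}

print(len(time_series), len(cross_section), len(pooled))  # → 3 3 4
```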

(5) Estimation of the econometric model


After the model has been specified and data has been collected, the
econometrician must proceed with its estimation. In other words, he must obtain
the numerical estimates of the coefficients of the model. In our case, the task is to
estimate the parameters of the consumption function, that is, β1 and β2. The
numerical estimates of the parameters give empirical content to the consumption
function. The statistical tool of regression analysis is the main tool used to obtain
the estimates. Choice of the appropriate econometric technique for the estimation of
the function and critical examination of the assumptions of the chosen technique is
a crucial step.
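As a sketch of what this estimation step involves, the least-squares formulas used in regression analysis (taken up in Module II) can be applied to hypothetical weekly income and consumption data:

```python
# Hypothetical weekly income (X) and consumption (Y) data.
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# OLS slope: b2 = sum((Xi - mean_x)(Yi - mean_y)) / sum((Xi - mean_x)^2)
b2 = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
     sum((x - mean_x) ** 2 for x in X)
b1 = mean_y - b2 * mean_x          # intercept from the sample means

print(round(b1, 4), round(b2, 4))  # → 24.4545 0.5091
```

Here the estimated MPC of about 0.51 lies between 0 and 1, consistent with the Keynesian hypothesis.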
(6) Hypothesis Testing
A hypothesis is a theoretical proposition that is capable of empirical
verification or disproof. It may be viewed as an explanation of some event or events,
an explanation which may be true or false. Assuming that the fitted model is a
reasonably good approximation of reality, we have to develop suitable criteria to find
out whether the estimates obtained are in accordance with the expectation of the
theory that is being tested. According to Milton Friedman, a theory or hypothesis
that is not verifiable by appeal to empirical evidence may not be admissible as a part
of scientific theory. Confirmation or refutation of economic theories on the basis of
sample evidence is based on a branch of statistical theory known as statistical
inference or hypothesis testing.
(7) Forecasting or Prediction
The objective of any econometric research is to obtain good numerical
estimates of the coefficients of economic relationships and use them for the
prediction of the values of economic variables. Forecasting is one of the prime aims
of econometric research. Econometric methods are used to estimate the parameters
of the model, to test the hypothesis concerning them and to generate forecasts from
the model. If the model confirms the hypotheses or theory under consideration, we
may use it to predict the future values of the dependent or forecast variable Y on
the basis of the known or expected values of the explanatory, or predictor,
variable X. It is conceivably possible that the model is economically meaningful and
statistically and econometrically correct for the sample period for which the model
has been estimated, yet it may well not be suitable for forecasting. The reasons may
be either (a) the values of the explanatory variables used in the forecast may be
inaccurate or (b) the estimates of the coefficients may be incorrect due to
deficiencies of the sample data.
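A minimal sketch of the forecasting step, assuming hypothetical estimated coefficients for the consumption function:

```python
# Suppose estimation gave b1 = 24.45 and b2 = 0.51 (hypothetical values).
b1, b2 = 24.45, 0.51

def forecast_consumption(income):
    """Point forecast of mean consumption from the estimated line."""
    return b1 + b2 * income

# Forecast mean weekly consumption at an income level of 300.
print(forecast_consumption(300))  # 24.45 + 0.51 * 300 = 177.45
```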
(8) Use the model for control or policy purposes
Suppose that we have estimated the Keynesian consumption function, then
the government can use it for control or policy purposes such as to determine the
level of income that will guarantee the target amount of consumption expenditure.
In other words, an estimated model may be used for control or policy purposes. By
appropriate fiscal and monetary policy mix, the government can manipulate the
control variable X to produce the desired level of the target variable Y.

1.3 The Nature of Regression Analysis


The term regression was introduced by Francis Galton. In a famous paper,
"Family Likeness in Stature", Galton found that, although there was a tendency for
tall parents to have tall children and for short parents to have short children, the
average height of children born of parents of a given height tended to move or
"regress" toward the average height of the population as a whole. In other words, the
height of the children of unusually tall or unusually short parents tends to move
toward the average height of the population. Galton's law of universal regression
was confirmed by Karl Pearson, who collected more than a thousand records of
heights of members of family groups.
However, the modern interpretation of regression is quite different. Broadly
speaking, we may say that regression analysis is concerned with the study of the
dependence of one variable, the dependent variable, on one or more other variables,
the explanatory variables, with a view to estimating and/or predicting the
population mean or average value of the former in terms of the known or fixed
values of the latter.
In regression analysis, we are concerned with what is known as the statistical,
not functional or deterministic, dependence among variables. In statistical
relationships among variables we essentially deal with random or stochastic
variables, that is, variables that have probability distributions. A random or
stochastic variable is a variable that can take on any set of values, positive or
negative, with a given probability. In functional or deterministic dependency, on
the other hand, we also deal with variables, but these variables are not random or
stochastic.
In other words, a relation between X and Y characterised as Y = f(X) is said to
be deterministic or non-stochastic if for each value of the independent variable X there
is one and only one corresponding value of the dependent variable Y. On the other
hand, a relation between X and Y is said to be stochastic if for a particular value of
X there is a whole probability distribution of values of Y.
Although regression analysis deals with the dependence of one variable on
other variables, it does not necessarily imply causation. In the words of Kendall and
Stuart, "a statistical relationship, however strong and however suggestive, can never
establish causal connection: our ideas of causation must come from outside
statistics, ultimately from some theory or other". That is, a statistical relationship per se
cannot logically imply causation. To ascribe causality, one must appeal to a priori or
theoretical considerations.
1.4 Regression and Correlation
Closely related to but conceptually very much different from regression
analysis is correlation analysis, where the primary objective is to measure the
strength or degree of linear association between two variables. The correlation
coefficient measures this strength of linear association. For example, we may be
interested in finding the correlation coefficient between marks of mathematics and

statistics examinations, between smoking and lung cancer, and so on. In regression
analysis, we are not primarily interested in such a measure. Instead, we try to
estimate or predict the average value of one variable on the basis of the fixed values
of other variables. Thus, we may want to know whether we can predict the average
mark on a mathematics examination from a student's marks on a statistics
examination.
In regression analysis, there is an asymmetry in the way the dependent and
explanatory variables are treated. The dependent variable is assumed to be
statistical, random or stochastic, that is, to have a probability distribution. On the
other hand, the explanatory variables are assumed to have fixed values (in repeated
sampling). But in correlation analysis we treat any two variables symmetrically;
there is no distinction between the dependent and explanatory variables. The
correlation between marks of mathematics and statistics examinations is the same
as that between marks of statistics and mathematics examinations. Moreover, in
correlation analysis both variables are assumed to be random. Most of the correlation
theory is based on the assumption of randomness of variables, whereas most of the
regression theory is based on the assumption that the dependent variable is
stochastic but the explanatory variables are fixed or non-stochastic.
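This asymmetry can be checked numerically: the correlation coefficient is the same whichever variable comes first, while the slope of the regression of Y on X differs from the slope of the regression of X on Y. A sketch with hypothetical examination marks:

```python
# Hypothetical marks in mathematics (X) and statistics (Y).
X = [35, 48, 52, 60, 67, 71, 80, 88]
Y = [40, 45, 55, 58, 70, 68, 82, 85]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
sxx = sum((x - mx) ** 2 for x in X)
syy = sum((y - my) ** 2 for y in Y)

r = sxy / (sxx * syy) ** 0.5      # correlation: symmetric in X and Y
slope_y_on_x = sxy / sxx          # regression of Y on X
slope_x_on_y = sxy / syy          # regression of X on Y: a different line

print(round(r, 3))                          # the same whichever way round
print(round(slope_y_on_x, 3), round(slope_x_on_y, 3))
```

Note that the product of the two regression slopes equals r squared, a standard link between the two analyses.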
1.5 Two-Variable Regression Analysis
As noted above, regression analysis is largely concerned with estimating
and/or predicting the population mean or average value of the dependent variable
on the basis of the known values of the explanatory variable. We start by a simple
linear regression model, that is, by the relationship between two variables, one
dependent and one explanatory, related by a linear function. If we are studying
the dependence of a variable on only a single explanatory variable, such as that of
consumption expenditure on income, such a study is known as simple or two-variable regression analysis.
Suppose we are interested in studying the relationship between weekly
consumption expenditure Y and weekly after-tax or disposable family income X.
More specifically, we want to predict the population mean level of weekly
consumption expenditure knowing the family's weekly income. For each of the
conditional probability distributions of Y we can compute its mean or average value,
known as the conditional mean or conditional expectation, denoted as E(Y/X = Xi) and
read as the expected value of Y given that X takes the specific value Xi, which for
simplicity is written as E(Y/Xi). An expected value is simply a population mean or
average value.
Each conditional mean E(Y/Xi) will be a function of Xi. Symbolically,
E(Y/Xi) = f(Xi)            (1.3)

where f(Xi) denotes some function of the explanatory variable Xi. We assume


that E(Y/Xi) is a linear function of Xi. Equation (1.3) is known as the two variable
Population Regression Function (PRF) or Population Regression (PR) for short. It
states merely that the population mean of the distribution of Y given X i is

functionally related to Xi. In other words, it tells how the mean or average response
of Y varies with X. The form of the function f(Xi) is important because in real
situations we do not have the entire population available for examination. Therefore,
the functional form of the PRF is an empirical question. For example, an economist might
hypothesize that consumption expenditure is linearly related to income. Thus, we
assume that the PRF E(Y/Xi) is a linear function of Xi, say of the type

E(Y/Xi) = β1 + β2Xi            (1.4)

where β1 and β2 are unknown but fixed parameters known as the regression
coefficients; β1 and β2 are also known as the intercept and slope coefficients
respectively. Equation (1.4) is known as the linear population regression function, or
simply the linear population regression. Some alternative expressions used are
linear population regression model and linear population regression equation. In
regression analysis our interest is in estimating PRFs like equation (1.4), that is,
estimating the values of the unknowns β1 and β2 on the basis of observations on Y and
X.
1.6 The Meaning of the term Linear
The term linear can be interpreted in two different ways, namely, linearity in
variables and linearity in parameters. The first and perhaps more natural meaning
of linearity is that the conditional expectation of Y is a linear function of Xi, such as
equation (1.4). Geometrically, the regression curve in this case is a straight line. In
this interpretation, a regression function such as E(Y/Xi) = β1 + β2Xi² is not a linear
function because the variable X appears with a power of 2.
The second interpretation of linearity, that is, linearity in parameters, is that
the conditional expectation of Y, E(Y/Xi), is a linear function of the parameters, the
β's. It may or may not be linear in the variable X. In this interpretation E(Y/Xi) =
β1 + β2Xi² is a linear regression model but E(Y/Xi) = β1 + β2²Xi is not. The latter is an
example of a regression model that is nonlinear in the parameters.
Of the two interpretations of linearity, linearity in the parameters is relevant
for the development of regression theory. Thus, for our analysis, the term linear
regression will always mean a regression that is linear in the parameters, the β's.
In other words, the parameters are raised to the first power only. The model may or
may not be linear in the explanatory variables, the X's. It can be noted that E(Y/Xi) =
β1 + β2Xi is linear in both the parameters and the variable.
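The practical payoff of linearity in parameters is that a model such as E(Y/Xi) = β1 + β2Xi², although nonlinear in the variable, can still be estimated by ordinary least squares simply by regressing Y on the constructed regressor Z = X². A sketch with hypothetical data generated near the curve 2 + X²:

```python
# A model linear in parameters but not in the variable:
# E(Y/Xi) = b1 + b2 * Xi**2 can be fitted by least squares after
# transforming the variable (not the parameters) to Z = X**2.
X = [1, 2, 3, 4, 5, 6]
Y = [3.1, 6.0, 11.2, 18.9, 28.8, 39.1]   # hypothetical data near 2 + X**2

Z = [x ** 2 for x in X]                  # constructed regressor
n = len(Z)
mz, my = sum(Z) / n, sum(Y) / n
b2 = sum((z - mz) * (y - my) for z, y in zip(Z, Y)) / \
     sum((z - mz) ** 2 for z in Z)
b1 = my - b2 * mz
print(round(b1, 2), round(b2, 2))        # close to 2 and 1
```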
1.7 Stochastic Specification of PRF
The stochastic nature of the regression model implies that for every value of X
there is a whole probability distribution of values of Y. In other words, the value of Y
can never be predicted exactly. The form of equation (1.4) implies that the
relationship between consumption expenditure and income is exact, that is, that all
the variation in Y is due solely to changes in X, and that there are no other factors
affecting the dependent variable. Given the income level Xi, an individual family's
consumption expenditure will cluster around the average consumption of all
families at that Xi, that is, around the conditional expectation. Therefore, we can
express the deviation of an individual Yi around the expected value as follows.

ui = Yi - E(Y/Xi)
or
Yi = E(Y/Xi) + ui            (1.5)

where the deviation ui is an unobservable random variable taking positive or
negative values. Technically, ui is known as the stochastic disturbance or stochastic
error term. This is because ui is supposed to disturb the exact linear relationship
which is assumed to exist between X and Y. Equation (1.5) means that the
expenditure of an individual family, given its income level, can be expressed as the
sum of two components: first, E(Y/Xi), the mean consumption expenditure of all
families with the same level of income, known as the systematic or deterministic
component; and second, ui, the random or non-systematic component. In other
words, [variation in Yi] = [systematic variation] + [random variation], or
[variation in Yi] = [explained variation] + [unexplained variation].
Since E(Y/Xi) is assumed to be linear in Xi, equation (1.5) may be written as
Yi = E(Y/Xi) + ui
Yi = β1 + β2Xi + ui            (1.6)

Equation (1.6) posits that the consumption expenditure of a family is linearly
related to its income plus the disturbance term. Now, if we take the expectation,
conditional upon the given X's, on both sides of equation (1.5), we obtain
E(Yi/Xi) = E[E(Y/Xi)] + E(ui/Xi)
E(Yi/Xi) = E(Y/Xi) + E(ui/Xi)            (1.7)

since the expected value of a constant is that constant itself. Again, because E(Yi/Xi) is
the same thing as E(Y/Xi), equation (1.7) implies that
E(ui/Xi) = 0            (1.8)

Thus, the assumption that the regression line passes through the conditional
means of Y implies that the conditional mean values of ui (conditional upon the given
X's) are zero. The stochastic specification clearly shows that there are other
variables besides income that affect consumption expenditure and that an
individual family's consumption expenditure cannot be fully explained by the
variables included in the regression model.
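A sample-level analogue of E(ui/Xi) = 0 is that, whenever the fitted model includes an intercept, the least-squares residuals sum to zero. A sketch with hypothetical data:

```python
# Sample analogue of E(u/X) = 0: OLS residuals from a model with an
# intercept always sum to zero. Hypothetical income/consumption data:
X = [80, 100, 120, 140, 160]
Y = [72, 88, 94, 110, 121]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
b2 = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / \
     sum((x - mx) ** 2 for x in X)
b1 = my - b2 * mx

residuals = [y - (b1 + b2 * x) for x, y in zip(X, Y)]
print(round(abs(sum(residuals)), 10))  # → 0.0 (up to floating-point rounding)
```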
1.8 The Significance of the Stochastic Disturbance Term
As noted above, the disturbance term ui is a substitute for all those variables that
are omitted from the model but that collectively affect Y. The reasons for not
introducing those variables into the model, and the significance of the stochastic
disturbance term ui, are explained below.

a) Vagueness of theory
The theory determining the behaviour of Y may be incomplete. We might know
for certain that weekly income X influences weekly consumption expenditure Y, but
we might be ignorant or unsure about other variables affecting Y. Therefore, u i may
be used as a substitute for all the excluded or omitted variables from the model.
b) Unavailability of Data
Even if we know some of the excluded variables and therefore consider a multiple
regression rather than a simple regression, we may not have quantitative
information about these variables. For example, we could introduce family wealth as
an explanatory variable in addition to the income variable to explain family
consumption expenditure. But unfortunately, information about family wealth is not
generally available.
c) Core variables vs. Peripheral variables
Assume in the consumption-income example that besides income X1, the
number of children per family X2, sex X3, religion X4, and education X5 also affect
consumption expenditure. But it is quite possible that the joint influence of all or
some of these variables is very small and non-systematic or random. Thus, as a
practical matter and for cost considerations, we do not introduce them into the model
explicitly. Their combined effect can be treated as a random variable ui.
d) Intrinsic Randomness in Human Behaviour
Human behaviour is not predictable. Even if we succeed in introducing all the
relevant variables into the model, there is bound to be some intrinsic randomness in
individual Y's that cannot be explained no matter how hard we try. The disturbances,
the u's, may very well reflect this intrinsic randomness.
e) Poor Proxy Variables
Although regression model assumes that the variables Y and X are measured
accurately, in practice the data may be plagued by errors of measurement. The
deviations of the points from the true regression line may be due to errors of
measurement of the variables, which are inevitable due to the methods of collecting
and processing statistical information. The disturbance term u i also represents the
errors of measurement.
f) Principle of Parsimony
Following Occam's razor, which states that descriptions should be kept as simple as
possible until proved inadequate, we would like to keep our regression model as
simple as possible. If we can explain the behaviour of Y substantially with two or
three explanatory variables and if our theory is not strong enough to suggest that
other variables might be included, there is no need to introduce other variables. Let
ui represent all other variables.

g) Wrong Functional Form


Even if we have the theoretically correct variables explaining a phenomenon and even
if we can obtain data on these variables, very often we do not know the form of the
functional relationship between the regressand and the regressors. We may have
linearised a possibly nonlinear relationship. Or we may have left out of the model
some equations. The economic phenomena are much more complex than a single
equation may reveal, no matter how many explanatory variables it contains. Thus,
the disturbance term ui represents such errors which may be due to wrong
specification of the functional relationship.
Thus, for all these reasons, the stochastic disturbance term ui assumes an extremely
critical role in regression analysis.
1.9 The Sample Regression Function (SRF)
The linear relationship Yi = β1 + β2Xi + ui holds for the population of the values of
X and Y, so we could obtain the numerical values of β1 and β2 only if we had all
the conceivably possible values of X, Y and u which form the population of these
variables. But in practice one rarely has access to the entire population of interest.
Thus, in most practical situations we have only sample Y values corresponding to
some fixed X's. Therefore, our task is to estimate the PRF on the basis of the sample
information.
Analogous to the PRF, we can develop the concept of the Sample Regression
Function (SRF) to represent the sample regression line. The sample counterpart of
equation (1.4) may be written as

Ŷi = β̂1 + β̂2Xi            (1.9)

where Ŷ is read as "Y-hat" or "Y-cap". In the above equation,
Ŷi = estimator of E(Y/Xi)
β̂1 = estimator of β1
β̂2 = estimator of β2
An estimator, also known as a sample statistic, is simply a rule or formula or
method that tells us how to estimate the population parameter from the information
provided by the sample at hand. The particular numerical value obtained by the
estimator in an application is known as an estimate. Now we can express the SRF in
its stochastic form as

Yi = β̂1 + β̂2Xi + ûi            (1.10)

where ûi denotes the (sample) residual term. Conceptually, ûi is analogous to ui
and can be regarded as an estimate of ui. It is introduced in the SRF for the same
reason as ui was introduced in the PRF.

Thus, to sum up, the primary objective in regression analysis is to estimate the
PRF
Yi = β1 + β2Xi + ui
on the basis of the SRF
Yi = β̂1 + β̂2Xi + ûi
But because of sampling fluctuations, our estimate of the PRF based on the SRF
is at best an approximate one. This approximation is shown in the figure below.

[Figure: the sample and population regression lines. Weekly consumption
expenditure (Y) is on the vertical axis and weekly income (X) on the horizontal
axis. The SRF, Ŷi = β̂1 + β̂2Xi, lies close to but not on the PRF,
E(Y/Xi) = β1 + β2Xi; for a given Xi the figure marks the observed Yi, its
estimate Ŷi with residual ûi, and the true conditional mean E(Y/Xi). The two
lines cross at point A.]

For X = Xi, we have one sample observation Y = Yi. In terms of the SRF, the observed
Yi can be expressed as

Yi = Ŷi + ûi          (1.11)

and in terms of the PRF, it can be expressed as

Yi = E(Y/Xi) + ui          (1.12)

In the figure, Ŷi overestimates the true E(Y/Xi) for the Xi shown. But at the same
time, for any Xi to the left of point A, the SRF will underestimate the true PRF.
Such over- and underestimation is inevitable because of sampling fluctuations. The
important task is to devise a rule or method that will make the approximation as
close as possible. That is, the SRF should be constructed such that β̂1 is as close
as possible to the true β1 and β̂2 is as close as possible to the true β2, even
though we never know the true β1 and β2.

MODULE II
TWO VARIABLE REGRESSION MODEL
2.1 The Method of Ordinary Least Squares
Our task is to estimate the population regression function (PRF) on the basis
of the sample regression function (SRF) as accurately as possible. Though there are
several methods of constructing the SRF, the most popular method to estimate the
PRF from SRF is the method of Ordinary Least Squares (OLS). The method of
ordinary least squares is attributed to the German mathematician Carl Friedrich Gauss.
Under certain assumptions, the method of least squares has some very attractive
statistical properties that have made it one of the most popular methods of
regression analysis.
The relationship between X and Y in the PRF is
Yi = β1 + β2Xi + ui
Since the PRF is not directly observable, we estimate it from the SRF,
Yi = β̂1 + β̂2Xi + ûi
Yi = Ŷi + ûi
where Ŷi is the estimated (conditional mean) value of Yi. Now, to determine the SRF,
we have

ûi = Yi − Ŷi
ûi = Yi − β̂1 − β̂2Xi          (2.1)

which shows that the residuals ûi are simply the differences between the actual
and estimated Y values. Given n pairs of observations on Y and X, we would like to
determine the SRF in such a manner that it is as close as possible to the actual Y.
To this end, we might adopt the following criterion: choose the SRF in such a
way that the sum of residuals Σûi = Σ(Yi − Ŷi) is as small as possible. The
rationale of this criterion is easy to understand: it is intuitively obvious that the
smaller the deviations from the regression line, the better the fit of the
line to the scatter of observations. Although intuitively appealing, this is not a good
criterion, as can be seen from the following scatter diagram.


[Figure: scatter diagram of four observations, X1, X2, X3 and X4, around an SRF.
The residuals û1 and û4 are much larger in absolute value than û2 and û3, yet
all four receive the same weight in the simple sum of residuals.]

If we adopt the criterion of minimising Σûi, then all the residuals receive
the same weight in the sum, although û1 and û4 are much more widely scattered around
the SRF than û2 and û3. In other words, all the residuals receive equal importance
no matter how close or how widely scattered the individual observations are from
the SRF. A consequence of this is that it is quite possible for the algebraic sum of
the ûi to be small, or even zero, although the ûi are widely scattered around the SRF. That is,
while summing these deviations the positive values offset the negative values, so
that the final algebraic sum of these residuals might equal zero.
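The offsetting of positive and negative residuals can be seen directly: any line drawn through the point of sample means (X̄, Ȳ) has a residual sum of exactly zero, however badly it fits. The following sketch (with made-up data) illustrates this:

```python
# Demonstration that minimising the plain sum of residuals is a bad
# criterion: ANY line passing through (x_bar, y_bar) has a residual sum
# of zero, no matter how badly it fits.  Data are illustrative only.

X = [1, 2, 3, 4, 5]
Y = [2, 1, 4, 3, 5]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)

def residual_sum(slope):
    """Sum of residuals for the line through (x_bar, y_bar) with this slope."""
    intercept = y_bar - slope * x_bar
    return sum(y - (intercept + slope * x) for x, y in zip(X, Y))

# Wildly different slopes, yet each residual sum is (numerically) zero:
for b2 in (-10.0, 0.0, 0.8, 10.0):
    print(b2, round(residual_sum(b2), 10))
```

Because every such line "balances" the observations above and below it, the simple sum of residuals cannot distinguish a good fit from a terrible one.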
To avoid this problem the best solution is to square the deviations and
minimise the sum of squares. That is, we can adopt the least squares criterion,
which states that the SRF should be fixed in such a way that

Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂1 − β̂2Xi)²          (2.2)

is as small as possible, where ûi² are the squared residuals. The reason for calling
this method the least squares method is that it seeks the minimisation of the sum of
squares of the deviations of actual observations from the line. By squaring ûi, this
method gives more weight to residuals such as û1 and û4 in the figure above than to
the residuals û2 and û3. As noted previously, under the minimum Σûi criterion the
sum can be small even though the ûi are widely spread around the SRF. But this is not
possible under the least squares method, because the larger the absolute value of ûi,
the larger will be ûi². A further justification of the least squares method lies in
the fact that the estimators obtained by it have some very desirable statistical
properties.

It is obvious from equation (2.2) that

Σûi² = f(β̂1, β̂2)          (2.3)

That is, the sum of squared residuals is some function of the estimators β̂1 and β̂2.
For any given set of data, choosing different values for β̂1 and β̂2 will give
different û's and hence different values of Σûi². The method of least squares
chooses β̂1 and β̂2 in such a manner that, for a given sample or set of data,
Σûi² is as small as possible. In other words, for a given sample, the method of least
squares provides us with unique estimators of β1 and β2 that give the smallest possible
value of Σûi². This can be accomplished through the application of differential
calculus. The process of differentiation yields the following equations for estimating
β1 and β2:

ΣYi = nβ̂1 + β̂2ΣXi          (2.4)

ΣXiYi = β̂1ΣXi + β̂2ΣXi²          (2.5)

where n is the sample size. These simultaneous equations are known as the
normal equations.
2.1.1 Formal Derivation of the Normal Equations
We have to minimise the function

Σûi² = Σ(Yi − β̂1 − β̂2Xi)²

with respect to β̂1 and β̂2. The necessary condition for a minimum is that the
first derivatives of the function be equal to zero. That is,

∂Σûi²/∂β̂1 = 0 and ∂Σûi²/∂β̂2 = 0

To obtain the above derivatives we apply the function-of-a-function (chain) rule of
differentiation. The partial derivative with respect to β̂1 is

∂Σûi²/∂β̂1 = Σ2(Yi − β̂1 − β̂2Xi)(−1) = 0

This can be written as

Σ(Yi − β̂1 − β̂2Xi) = 0

ΣYi = nβ̂1 + β̂2ΣXi          (2.4)

The partial derivative with respect to β̂2 is

∂Σûi²/∂β̂2 = Σ2(Yi − β̂1 − β̂2Xi)(−Xi) = 0

This can be written as

Σ(XiYi − β̂1Xi − β̂2Xi²) = 0

ΣXiYi = β̂1ΣXi + β̂2ΣXi²          (2.5)

2.1.2 Derivation of Least Squares Estimators
We have the normal equations

ΣYi = nβ̂1 + β̂2ΣXi          (2.4)

ΣXiYi = β̂1ΣXi + β̂2ΣXi²          (2.5)

Applying Cramer's rule for solving, the determinant of the coefficient matrix is

|A| = | n     ΣXi  |
      | ΣXi   ΣXi² |  = nΣXi² − (ΣXi)²

For solving β̂2, we form the special matrix A2 by replacing the second column with
the right-hand side:

|A2| = | n     ΣYi   |
       | ΣXi   ΣXiYi |  = nΣXiYi − ΣXiΣYi

so that

β̂2 = |A2| / |A| = [nΣXiYi − ΣXiΣYi] / [nΣXi² − (ΣXi)²]

Dividing both numerator and denominator by n, we have

β̂2 = [ΣXiYi − nX̄Ȳ] / [ΣXi² − nX̄²]

which can be written as

β̂2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

where X̄ and Ȳ are the sample means of X and Y. By defining xi = (Xi − X̄)
and yi = (Yi − Ȳ), that is, letting the lower-case letters denote deviations from their
mean values, we have

β̂2 = Σxiyi / Σxi²          (2.6)

To derive β̂1, let us reproduce the first normal equation, that is equation (2.4):

ΣYi = nβ̂1 + β̂2ΣXi

Dividing both sides by n, we have

Ȳ = β̂1 + β̂2X̄, that is, β̂1 = Ȳ − β̂2X̄          (2.7)

The estimators β̂1 and β̂2 obtained are known as ordinary least squares estimators
because they are derived from the least squares principle.
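The algebra above can be checked numerically: solve the normal equations (2.4)-(2.5) by Cramer's rule, then confirm that the deviation formula (2.6) gives the same slope. The data below are made up purely for the sketch; any small data set works:

```python
# Solve the normal equations
#   ΣY  = n·b1 + b2·ΣX
#   ΣXY = b1·ΣX + b2·ΣX²
# for b1, b2 by Cramer's rule, then compare with the deviation
# formulas (2.6) and (2.7).  Data are illustrative.

X = [1, 2, 3, 4]
Y = [2, 3, 5, 4]
n = len(X)

Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Sxy = sum(x * y for x, y in zip(X, Y))

det_A = n * Sxx - Sx ** 2            # |A|  = nΣX² − (ΣX)²
b2 = (n * Sxy - Sx * Sy) / det_A     # |A2| / |A|
b1 = (Sy - b2 * Sx) / n              # from the first normal equation

# Deviation-form check: b2 = Σxy / Σx² with x, y as deviations from means
x_bar, y_bar = Sx / n, Sy / n
b2_dev = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
          / sum((x - x_bar) ** 2 for x in X))

print(b1, b2, b2_dev)  # the two routes to the slope agree
```

Both routes give the same slope because (2.6) is just the Cramer's-rule solution rewritten in deviations.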
Example:
To illustrate the use of the above formulae we will estimate the supply function of
commodity z. We have 12 pairs of observations as shown in the following table.
Number of
Observations    Quantity    Price
     1             69          9
     2             76         12
     3             52          6
     4             56         10
     5             57          9
     6             77         10
     7             58          7
     8             55          8
     9             67         12
    10             53          6
    11             72         11
    12             64          8
The following is the worksheet for the estimation of the supply function of
commodity z.

 n    Quantity (Yi)    Price (Xi)    yi = Yi − Ȳ    xi = Xi − X̄    xiyi    xi²
 1         69              9              6               0           0       0
 2         76             12             13               3          39       9
 3         52              6            −11              −3          33       9
 4         56             10             −7               1          −7       1
 5         57              9             −6               0           0       0
 6         77             10             14               1          14       1
 7         58              7             −5              −2          10       4
 8         55              8             −8              −1           8       1
 9         67             12              4               3          12       9
10         53              6            −10              −3          30       9
11         72             11              9               2          18       4
12         64              8              1              −1          −1       1

n = 12    ΣYi = 756    ΣXi = 108    Σyi = 0    Σxi = 0    Σxiyi = 156    Σxi² = 48

Ȳ = ΣYi / n = 756/12 = 63
X̄ = ΣXi / n = 108/12 = 9

β̂2 = Σxiyi / Σxi² = 156/48 = 3.25
β̂1 = Ȳ − β̂2X̄ = 63 − (3.25)(9) = 33.75
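The hand computation can be replicated in a few lines (a sketch; the observations used are those implied by the worksheet sums):

```python
# OLS estimates for the commodity-z supply function using the
# deviation formulas (2.6) and (2.7): b2 = Σx·y / Σx², b1 = Ȳ − b2·X̄.

Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]  # quantity
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]          # price

n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n                  # 9 and 63

Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))  # Σx·y = 156
Sxx = sum((x - x_bar) ** 2 for x in X)                      # Σx²  = 48

b2 = Sxy / Sxx            # 156/48 = 3.25
b1 = y_bar - b2 * x_bar   # 63 − 3.25·9 = 33.75
print(f"estimated supply function: Y-hat = {b1} + {b2}*X")
```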

Thus, the estimated supply function is

Ŷi = 33.75 + 3.25Xi

2.2 Numerical Properties of OLS Estimators
Numerical properties are those that hold as a consequence of the use of ordinary
least squares, regardless of how the data were generated. The following are
numerical properties of OLS estimators
I. The OLS estimators are expressed solely in terms of the observable (that is,
sample) quantities of X and Y. Therefore they can be easily computed.


II. OLS estimators are point estimators. That is, given the sample, each
estimator will provide only a single (point) value of the relevant population
parameter.
III. Once the OLS estimates are obtained from the sample data, the sample
regression line can be easily obtained. The regression line thus obtained has
the following properties:
1) It passes through the sample means of Y and X. This fact is obvious
from equation (2.7), which can also be written as Ȳ = β̂1 + β̂2X̄. This is
shown in the following figure.
[Figure: the SRF, Ŷi = β̂1 + β̂2Xi, passes through the point of sample means (X̄, Ȳ).]
2) The mean value of the estimated Y, that is Ŷi, is equal to the mean value of the
actual Y; the mean of Ŷi equals Ȳ.
Proof:
Ŷi = β̂1 + β̂2Xi
Since β̂1 = Ȳ − β̂2X̄, we will get
Ŷi = (Ȳ − β̂2X̄) + β̂2Xi
Ŷi = Ȳ + β̂2(Xi − X̄)
Applying summation,

ΣŶi = nȲ + β̂2Σ(Xi − X̄)

Dividing by n, we have

ΣŶi/n = nȲ/n + β̂2Σ(Xi − X̄)/n

Since Σ(Xi − X̄) = 0, we will get mean(Ŷi) = Ȳ.
3) The mean value of the residuals, ûi, is zero.
Proof:
From our earlier first-order condition, we have

Σ2(Yi − β̂1 − β̂2Xi)(−1) = 0

or

Σ(Yi − β̂1 − β̂2Xi) = 0

But since ûi = Yi − β̂1 − β̂2Xi, the preceding equation reduces to Σûi = 0, and
hence the mean residual is zero.

As a result of the preceding property, the sample regression

Yi = β̂1 + β̂2Xi + ûi          (1.10)

can be expressed in an alternative form where both Y and X are expressed as
deviations from their mean values. Taking the sum on both sides of (1.10), we
will get

ΣYi = nβ̂1 + β̂2ΣXi + Σûi

Since Σûi = 0, we have

ΣYi = nβ̂1 + β̂2ΣXi          (2.8)

Dividing equation (2.8) through by n, we obtain

Ȳ = β̂1 + β̂2X̄          (2.9)

Note that equation (2.9) is the same as equation (2.7). Subtracting (2.9) from (1.10),
we obtain

Yi − Ȳ = β̂2(Xi − X̄) + ûi

or

yi = β̂2xi + ûi          (2.10)

where yi and xi are deviations from their respective sample mean values. Equation
(2.10) is known as the deviation form. Note that the intercept term β̂1 is absent; yet
we can find it from our earlier equation (2.7). In the deviation form, the SRF can be
written as

ŷi = β̂2xi          (2.11)

4) The residuals ûi are uncorrelated with the predicted Yi; that is, ΣŶiûi = 0
(since Σûi = 0, this is the same as Σŷiûi = 0, with ŷi in deviation form).
Proof:

Σŷiûi = Σβ̂2xiûi
      = β̂2Σxi(yi − β̂2xi)          as per equation (2.11)
      = β̂2Σxiyi − β̂2²Σxi²

Since β̂2 = Σxiyi/Σxi², we have β̂2²Σxi² = β̂2Σxiyi, and therefore

Σŷiûi = β̂2Σxiyi − β̂2Σxiyi = 0

5) The residuals ûi are uncorrelated with Xi; that is, ΣûiXi = 0.
Proof:
From our earlier normal equations, we have

Σ2(Yi − β̂1 − β̂2Xi)(−Xi) = 0

That is, −2ΣûiXi = 0, or ΣûiXi = 0.
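Properties 2) to 5) can be verified numerically for any fitted line. The following sketch, with made-up data, checks that Σû = 0, ΣûX = 0, ΣŶû = 0 and that the mean of Ŷ equals the mean of Y:

```python
# Numerical properties of OLS residuals, checked on a small data set:
#   Σû = 0,  ΣûX = 0,  ΣŶû = 0,  and mean(Ŷ) = mean(Y).

X = [1, 2, 3, 4, 5]
Y = [3, 5, 4, 8, 9]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b1 = y_bar - b2 * x_bar

Y_hat = [b1 + b2 * x for x in X]
u_hat = [y - yh for y, yh in zip(Y, Y_hat)]

print(sum(u_hat))                                  # ≈ 0
print(sum(u * x for u, x in zip(u_hat, X)))        # ≈ 0
print(sum(u * yh for u, yh in zip(u_hat, Y_hat)))  # ≈ 0
print(sum(Y_hat) / n, y_bar)                       # equal means
```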

2.3 The Classical Linear Regression Model: The Assumptions Underlying the
Method of Least Squares
If our objective were to estimate β1 and β2 only, then the method of OLS would be
sufficient. But our aim is not only to obtain β̂1 and β̂2 but also to draw inferences
about the true β1 and β2. We would like to know how close β̂1 and β̂2 are to their
counterparts in the population, or how close Ŷi is to the true E(Y/Xi). The PRF,
Yi = β1 + β2Xi + ui, shows that Yi depends on both Xi and ui. Therefore, unless we are
specific about how Xi and ui are generated, there is no way we can make any
statistical inference about Yi, β1 and β2. The classical linear regression model, which
is the cornerstone of most econometric theory, makes 10 assumptions, which
are explained below.
1. Linear Regression Model
The regression model is linear in the parameters, as shown in the PRF, Yi = β1 + β2Xi + ui.
2. X Values are fixed in repeated sampling
Values taken by the regressor X are considered fixed in repeated samples. More
technically, X is assumed to be nonstochastic. Our regression analysis is
conditional regression analysis, that is, conditional upon the given values of the
regressor X. The Xi's are a set of fixed values in the hypothetical process of repeated
sampling which underlies the linear regression model. This means that, in taking a
large number of samples on Y and X, the Xi values are the same in all the samples,
but the ui and Yi do differ from sample to sample.
3. Zero mean value of the disturbance ui
Given the value of X, the mean or expected value of the random disturbance term
ui is zero. Technically, the conditional mean value of ui is zero. Symbolically, we have

E(ui/Xi) = 0          (2.12)

The equation states that the mean value of ui conditional upon the given Xi is
zero. This means that for each X, u may assume various values, some greater than
zero and some smaller than zero, but if we consider all the possible values of u for
any given value of X, they would have an average value of zero. This assumption
implies that E(Yi/Xi) = β1 + β2Xi.
4. Homoscedasticity or equal variance of ui
Given the value of X, the variance of ui is the same for all observations. That is, the
conditional variances of ui are identical. Symbolically, we have

var(ui/Xi) = E[ui − E(ui/Xi)]²
var(ui/Xi) = E(ui²/Xi)          because of assumption 3
var(ui/Xi) = σ²          (2.13)

where var stands for variance. Equation (2.13) states that the variance of ui for each Xi is
some positive constant number equal to σ². Technically it represents the
assumption of homoscedasticity, or equal spread, or equal variance. Stated
differently, the equation means that the Y populations corresponding to the X values
have the same variance. The assumption also implies that the conditional variances
of Yi are also homoscedastic. That is,

var(Yi/Xi) = σ²          (2.14)

5. No autocorrelation between the disturbances
Given any two X values, Xi and Xj (i≠j), the correlation between any two ui and uj
(i≠j) is zero. Symbolically,

cov(ui, uj / Xi, Xj) = E{[ui − E(ui/Xi)][uj − E(uj/Xj)]}
cov(ui, uj / Xi, Xj) = E(ui/Xi)E(uj/Xj)          since the disturbances are assumed independently distributed
cov(ui, uj / Xi, Xj) = 0          since E(ui) = E(uj) = 0          (2.15)

where i and j are two different observations and where cov means covariance.
Equation (2.15) postulates that the disturbances ui and uj are uncorrelated.
Technically, this is the assumption of no serial correlation or no autocorrelation.
That is, the covariance of any ui with any other uj is equal to zero: the value
which the random term assumes in one period does not depend on the value
which it assumed in any other period.
6. Zero covariance between ui and Xi, or E(uiXi) = 0
Formally,

cov(ui, Xi) = E{[ui − E(ui)][Xi − E(Xi)]}
cov(ui, Xi) = E[ui(Xi − E(Xi))]          since E(ui) = 0
cov(ui, Xi) = E(uiXi) − E(Xi)E(ui)
cov(ui, Xi) = E(uiXi)          since E(ui) = 0
cov(ui, Xi) = 0, by assumption          (2.16)
The above assumption states that the disturbance u and the explanatory variable X
are uncorrelated. The rationale for this assumption is as follows: when we express
the PRF as Yi = β1 + β2Xi + ui, we assume that the explanatory variable X and u, which
represents the influence of all omitted variables, have separate and additive influences
on Y. But if X and u are correlated, it is impossible to assess their individual effects
on Y.
7. The number of observations n must be greater than the number of
parameters to be estimated.
Alternatively, the number of observations n must be greater than the number of
explanatory variables. For instance, if we had only one pair of observations on Y and
X, there is no way to estimate the two unknowns, namely β1 and β2. We need at least
two pairs of observations to estimate the two unknowns.
8. Variability in X values
The X values in a given sample must not all be the same. Technically, var(X)
must be a finite positive number. If all the X values were identical, then Xi = X̄ and
the denominator of the equation β̂2 = Σxiyi/Σxi² would be zero. Thus, it would be
impossible to estimate β2 and therefore β1. Variation in both Y and X is essential
for regression analysis. In short, variables must vary.
9. The regression model is correctly specified
Alternatively, there is no specification bias or error in the model used in
empirical analysis. An econometric investigation begins with the specification of the
econometric model underlying the phenomenon of interest. Some important
questions that arise in the specification of the model include the following: (a) what
variables should be included in the model? (b) What is the functional form of the

model? Is it linear in the parameters, the variables, or both? (c) What are the
probabilistic assumptions made about the Yi, the Xi and the ui entering the model?
10. There is no perfect multicollinearity
That is, there is no perfect linear relationship among the explanatory variables. If
there is more than one explanatory variable in the relationship, it is assumed that
they are not perfectly correlated with each other. Indeed, the regressors should not
even be strongly correlated; they should not be highly multicollinear.
2.4 Properties of Least Squares Estimators: The Gauss Markov Theorem
As noted earlier, given the assumptions of classical linear regression model,
the least squares estimates possess some ideal or optimum properties. These
properties are contained in the well known Gauss Markov theorem. To understand
this theorem, first we need to consider the best linear unbiasedness properties of an
estimator, which is explained below.
An estimator is best when it has the smallest variance as compared to any
other estimator obtained from other econometric methods. Symbolically, assume
that β has two estimators, β̂ and β*. β̂ is the best if

E[β̂ − E(β̂)]² < E[β* − E(β*)]², or var(β̂) < var(β*)

An estimator is linear if it is a linear function of the sample observations; that is, it
is determined by a linear combination of the sample data. Given the sample observations
Y1, Y2, Y3, ..., Yn, a linear estimator will have the form

k1Y1 + k2Y2 + k3Y3 + ... + knYn
where the ki's are some constants. The bias of an estimator is defined as the
difference between its expected value and the true parameter:

Bias = E(β̂) − β

The estimator is unbiased if its bias is zero, that is, E(β̂) = β. This means that
the average of the estimates converges to the true value of the parameter as the number
of (hypothetical) samples increases: an unbiased estimator gives on the average the true
value of the parameter.
An estimator β̂ is the best linear unbiased estimator (BLUE) if it is linear,
unbiased and has the smallest variance as compared to all other linear unbiased
estimators of the true β. The BLU estimator has the minimum variance within the class
of linear unbiased estimators of the true β.
An estimator, say the ordinary least squares estimator β̂2, is said to be BLUE
of β2 if the following hold:
1) It is linear, that is, a linear function of a random variable, such as the
dependent variable Y in the regression model.
2) It is unbiased, that is, its average or expected value, E(β̂2), is equal to the
true value β2.
3) It has the minimum variance in the class of all such linear unbiased
estimators; an unbiased estimator with the least variance is known as an
efficient estimator.
In the regression context it can be proved that the OLS estimators are BLUE.
This is the essence of the Gauss Markov theorem, which can be stated as follows:
given the assumptions of the classical linear regression model, the least squares
estimators, in the class of unbiased linear estimators, have the minimum variance;
that is, they are BLUE.
2.4.1 Proof of the Gauss Markov Theorem
1) Property of Linearity
The least squares estimators β̂1 and β̂2 are linear functions of the observed
sample values of Yi. Given that the Xi's appear always with the same values in the
hypothetical repeated sampling process, it can be shown that the least squares
estimates depend on the values of Yi only; that is, β̂1 = f(Yi) and β̂2 = f(Yi).
We have

β̂2 = Σxiyi / Σxi²
   = Σxi(Yi − Ȳ) / Σxi²
   = (ΣxiYi − ȲΣxi) / Σxi²
   = ΣxiYi / Σxi²          since Σxi = 0

or

β̂2 = ΣkiYi,   where ki = xi / Σxi²

The values of the X's are fixed in hypothetical repeated sampling. Hence the ki's are
fixed constants from sample to sample and may be regarded as constant weights
assigned to the individual values of Y. We write

β̂2 = ΣkiYi = k1Y1 + k2Y2 + ... + knYn = f(Y)

The estimate β̂2 is a linear function of the Y's, that is, a linear combination of the
values of the dependent variable.

Similarly,

β̂1 = Ȳ − β̂2X̄
   = ΣYi/n − X̄ΣkiYi
   = Σ(1/n − X̄ki)Yi

Since the X values are constants, X̄ and the ki are fixed constants from sample to
sample. Thus, β̂1 depends only on the values of Yi; that is, β̂1 is a linear function of
the sample values of Y.
2) Property of Unbiasedness
The property of unbiasedness of β̂1 and β̂2 can be proved by establishing
E(β̂1) = β1 and E(β̂2) = β2. The meaning of this property is that the estimates
average out to the true value of the parameters as we increase the number of
hypothetical samples.
We have

β̂2 = ΣkiYi = Σki(β1 + β2Xi + ui) = β1Σki + β2ΣkiXi + Σkiui

We can prove that Σki = 0 and ΣkiXi = 1. That is,

Σki = Σ(xi/Σxi²) = Σxi/Σxi² = 0          since Σxi = 0

Similarly,

ΣkiXi = ΣxiXi/Σxi² = Σ(Xi − X̄)Xi/Σxi² = (ΣXi² − X̄ΣXi)/Σxi²

Since Σxi² = Σ(Xi − X̄)² = ΣXi² − X̄ΣXi, we have

ΣkiXi = (ΣXi² − X̄ΣXi) / (ΣXi² − X̄ΣXi) = 1

Therefore we have

β̂2 = β2 + Σkiui

Taking the expected values, we have

E(β̂2) = β2 + ΣkiE(ui)

Since E(ui) = 0, we will get E(β̂2) = β2. Therefore, β̂2 is an unbiased
estimator of β2.

Similarly, we can prove that β̂1 is an unbiased estimator of β1. We have

β̂1 = Σ(1/n − X̄ki)Yi

Taking the expected values, we have

E(β̂1) = Σ(1/n − X̄ki)E(Yi)
E(β̂1) = Σ(1/n − X̄ki)(β1 + β2Xi)          since E(Yi) = β1 + β2Xi
E(β̂1) = β1Σ(1/n) + (β2/n)ΣXi − X̄β1Σki − X̄β2ΣkiXi
E(β̂1) = β1(n/n) + β2X̄ − β2X̄          since Σki = 0 and ΣkiXi = 1
E(β̂1) = β1

Thus, β̂1 is an unbiased estimator of β1.

3) Property of Minimum Variance


In this section we will prove the Gauss Markov theorem which states that the
least squares estimates have the smallest variance as compared with any other
linear unbiased estimators.
It can be proved that

var(β̂2) = E[β̂2 − E(β̂2)]²

Since β̂2 = ΣkiYi, we get

var(β̂2) = var(ΣkiYi) = Σki²var(Yi)          since the ki are constants

We have var(Yi) = σ² and

Σki² = Σ(xi/Σxi²)² = Σxi²/(Σxi²)² = 1/Σxi²

so

var(β̂2) = σ²/Σxi²

Similarly, it can be proved that

var(β̂1) = E[β̂1 − E(β̂1)]²

var(β̂1) = var[Σ(1/n − X̄ki)Yi] = Σ(1/n − X̄ki)²var(Yi)

Since var(Yi) = σ², we have

var(β̂1) = σ²Σ(1/n − X̄ki)²
var(β̂1) = σ²Σ(1/n² − 2X̄ki/n + X̄²ki²)
var(β̂1) = σ²(1/n + X̄²Σki²)          since Σki = 0
var(β̂1) = σ²(1/n + X̄²/Σxi²)

Now taking the LCM, we have

var(β̂1) = σ²(Σxi² + nX̄²)/(nΣxi²)

Since Σxi² = ΣXi² − nX̄², we have Σxi² + nX̄² = ΣXi², so that

var(β̂1) = σ²ΣXi²/(nΣxi²)

Thus, we have obtained the variances of both ordinary least squares estimators.


Now we need to prove that any other linear unbiased estimator of the true parameter,
say β*2, obtained from any other econometric method, has a bigger variance
than the OLS estimator β̂2. That is, we must show

var(β̂2) < var(β*2)

The new estimator β*2 is by assumption a linear combination of the Yi's: a
weighted sum of the sample values of Yi, the weights being different from the weights
ki (= xi/Σxi²) of the OLS estimates. For example, let us assume

β*2 = ΣciYi,   where ci = ki + di

ki being the weights defined earlier for the OLS estimates and di an arbitrary set of
weights.
The new estimator β*2 is also assumed to be an unbiased estimator of β2. That is,
E(β*2) = β2. We have

β*2 = ΣciYi = Σci(β1 + β2Xi + ui) = β1Σci + β2ΣciXi + Σciui

Taking the expected value on both sides,

E(β*2) = β1Σci + β2ΣciXi + ΣciE(ui) = β1Σci + β2ΣciXi          since E(ui) = 0

For β*2 to be unbiased we therefore require Σci = 0 and ΣciXi = 1; then

E(β*2) = β2

Now, the variance of the new estimator is

var(β*2) = var(ΣciYi) = Σci²var(Yi)          since the ci are constants
var(β*2) = σ²Σci² = σ²Σ(ki + di)²
var(β*2) = σ²Σki² + σ²Σdi² + 2σ²Σkidi

But Σkidi = Σdixi/Σxi² = (ΣdiXi − X̄Σdi)/Σxi² = 0, because the unbiasedness
conditions Σci = 0 and ΣciXi = 1, together with Σki = 0 and ΣkiXi = 1, imply
Σdi = 0 and ΣdiXi = 0. Hence

var(β*2) = σ²Σki² + σ²Σdi² = var(β̂2) + σ²Σdi²

Given that the di's are defined as arbitrary constant weights not all of which are
zero, the second term is positive; that is, Σdi² > 0. Therefore,

var(β*2) > var(β̂2)

Thus, in the group of linear unbiased estimators of the true β2, the least squares
estimator has the minimum variance. In a similar way, we can prove that the least
squares intercept coefficient β̂1 has the minimum variance.
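The theorem can also be illustrated by simulation (a sketch, not a substitute for the proof): hold the X's fixed, draw fresh disturbances in each replication, and compare the OLS slope with another linear unbiased estimator, here the "endpoint" slope (Yn − Y1)/(Xn − X1), which is linear in the Y's and unbiased for fixed X's. The true parameter values below are made up for the illustration:

```python
import random

# Monte Carlo illustration of the Gauss-Markov theorem (a sketch).
# True model: Y = 2 + 0.5*X + u, with the X values held fixed across
# replications and u drawn afresh each time (uniform on [-1, 1], so
# E(u) = 0 and var(u) = 1/3 -- no normality is needed here).

random.seed(0)
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
beta1, beta2 = 2.0, 0.5
n = len(X)
x_bar = sum(X) / n
Sxx = sum((x - x_bar) ** 2 for x in X)

ols_slopes, endpoint_slopes = [], []
for _ in range(20000):
    u = [random.uniform(-1, 1) for _ in X]
    Y = [beta1 + beta2 * x + e for x, e in zip(X, u)]
    y_bar = sum(Y) / n
    # OLS slope via the deviation formula (2.6)
    ols_slopes.append(
        sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / Sxx)
    # A rival linear unbiased estimator: the slope through the endpoints
    endpoint_slopes.append((Y[-1] - Y[0]) / (X[-1] - X[0]))

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

print(mean(ols_slopes), mean(endpoint_slopes))  # both close to 0.5
print(var(ols_slopes), var(endpoint_slopes))    # OLS variance is smaller
```

Both estimators average out to the true slope, but the spread of the OLS estimates across replications is visibly smaller, exactly as the theorem asserts.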
2.5 The Coefficient of Determination: A Measure of Goodness of Fit
After the estimation of the parameters and the determination of the least
squares regression line, we need to know how good the fit of this line to the
sample observations of Y and X is. That is, we need to measure the dispersion of the
observations around the regression line. This knowledge is essential: the closer the
observations are to the line, the better the goodness of fit, and the better the
explanation of the variation of Y by the changes in the explanatory variables.
If all the observations were to lie on the sample regression line, we would obtain a
perfect fit, but this is rarely the case. Generally, there will be some positive ûi and
some negative ûi. What we hope for is that these residuals around the regression line are
as small as possible. The square of the correlation coefficient, known as the coefficient of
determination (r² in the two-variable case, R² in multiple regression), is a summary
measure that tells how well the sample regression line fits the data. As a measure of
goodness of fit, r² shows the percentage of the total variation of the dependent
variable Y that is explained by the independent variable X.
To compute the coefficient of determination (r²) we proceed as follows. Let

Σyi² = Σ(Yi − Ȳ)² : the total variation of the actual Y values about their sample mean,
which may be called the Total Sum of Squares (TSS). We compute the total variation of the
dependent variable by comparing each value of Y to the mean value Ȳ and adding all
the resulting squared deviations. Note that in order to find the TSS, we square the simple
deviations, since by definition the sum of simple deviations of any variable around
its mean is identically equal to zero.

Σŷi² = Σ(Ŷi − Ȳ)² : the variation of the estimated Y values about their mean, which
may be called the sum of squares due to regression, or explained by regression, or
simply the Explained Sum of Squares (ESS). This is the part of the total variation
of Yi which is explained by the regression line.

Σûi² = Σ(Yi − Ŷi)² : the variation of the dependent variable which is not explained by
the regression line and is attributed to the existence of the disturbance term. The
Residual Sum of Squares (RSS) is the sum of the squared residuals, which gives the
total unexplained variation of the dependent variable Y around its mean.
In summary,

yi = Yi − Ȳ : deviation of Y from its mean
ŷi = Ŷi − Ȳ : deviation of the regressed (estimated) value Ŷ from the mean
ûi = Yi − Ŷi : deviation of Y from the regression line

Thus we have

Yi = Ŷi + ûi, or in deviation form, yi = ŷi + ûi

Squaring the deviation form and taking the sum,

Σyi² = Σŷi² + 2Σŷiûi + Σûi²

Since Σŷiûi = 0, we get

Σyi² = Σŷi² + Σûi²

That is,

TSS = ESS + RSS          (2.17)

or

Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σûi²

The above equation shows that the total variation in the observed Y values
about their mean value can be partitioned into two parts: one attributable to the
regression line and the other to random forces, because not all actual Y observations
lie on the fitted line.
Now dividing equation (2.17) by TSS on both sides, we obtain

1 = ESS/TSS + RSS/TSS

1 = Σ(Ŷi − Ȳ)²/Σ(Yi − Ȳ)² + Σûi²/Σ(Yi − Ȳ)²

We now define r² as

r² = ESS/TSS = Σ(Ŷi − Ȳ)²/Σ(Yi − Ȳ)²

or, alternatively,

r² = 1 − RSS/TSS = 1 − Σûi²/Σ(Yi − Ȳ)²

The quantity r² thus defined is known as the coefficient of determination and is
the most commonly used measure of the goodness of fit of a regression line.
Verbally, r² measures the proportion or percentage of the total variation in Y
explained by the regression model. For example, if r² = 0.90, this means that the
regression line explains 90 per cent of the total variation of the Y values around their
mean. The remaining 10 per cent of the total variation in Y is unaccounted for by
the regression line and is attributed to the factors included in the disturbance
variable u. The following are two properties of r².
1) r² is a non-negative quantity.
2) The limits of r² are 0 ≤ r² ≤ 1. An r² of 1 means a perfect fit, that is, Ŷi = Yi
for each i: the total variation of Y is explained completely by the estimated
regression line, so there is no unexplained variation (RSS = 0) and hence r² = 1.
On the other hand, an r² of zero means that there is no relationship whatsoever
between the regressand and the regressor (β̂2 = 0). In such a case Ŷi = β̂1 = Ȳ;
that is, the best prediction of any Y value is simply its mean value, and the
regression line will be horizontal to the X axis. Here Σ(Ŷi − Ȳ)² = 0, so the
whole of the variation of Y around its mean is unexplained and r² = 0. In between,
if the regression line explains only part of the variation in Y, there will be some
unexplained variation (RSS > 0) and therefore r² will be smaller than 1.
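The decomposition TSS = ESS + RSS and both formulas for r² can be checked on the commodity-z example from this module (a sketch; the observations used are those implied by the worksheet sums):

```python
# TSS = ESS + RSS and the coefficient of determination r² = ESS/TSS.

Y = [69, 76, 52, 56, 57, 77, 58, 55, 67, 53, 72, 64]
X = [9, 12, 6, 10, 9, 10, 7, 8, 12, 6, 11, 8]

n = len(Y)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b1 = y_bar - b2 * x_bar
Y_hat = [b1 + b2 * x for x in X]

TSS = sum((y - y_bar) ** 2 for y in Y)               # total variation
ESS = sum((yh - y_bar) ** 2 for yh in Y_hat)         # explained part
RSS = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))  # unexplained part

r2 = ESS / TSS
print(TSS, ESS + RSS)     # equal (up to rounding)
print(r2, 1 - RSS / TSS)  # two equivalent ways to get r²
```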


MODULE III
THE CLASSICAL NORMAL LINEAR REGRESSION MODEL
3.1 The Probability Distribution of Disturbances
For the application of the method of ordinary least squares (OLS) to the
classical linear regression model, we did not make any assumptions about the
probability distribution of the disturbances ui. The only assumptions made about ui
were that they had zero expectations, were uncorrelated and had constant variance.
With these assumptions we saw that the OLS estimators satisfy several desirable
statistical properties, such as unbiasedness and minimum variance. If our objective
is point estimation only, the OLS method will be sufficient. But point estimation is
only one aspect of statistical inference, the other being hypothesis testing.
Thus, our interest is not only in obtaining, say, β̂2, but also in using it to make
statements or inferences about the true β2. That is, the goal is not merely to obtain the
Sample Regression Function (SRF) but to use it to draw inferences about the
Population Regression Function (PRF). Since our objective is estimation as well as
hypothesis testing, we need to specify the probability distribution of disturbances u i.
In Module II we proved that the OLS estimators β̂1 and β̂2 are both linear
functions of ui, which is random by assumption. Therefore, the sampling or
probability distribution of OLS estimators will depend upon the assumptions made
about the probability distribution of ui. Since the probability distributions of these
estimators are necessary to draw inferences about their population values, the
nature of the probability distribution of ui assumes an extremely important role in
hypothesis testing.
But since the method of OLS does not make any assumptions about the
probabilistic nature of ui, it is of little help for the purpose of drawing inferences
about the PRF from the SRF. This can be solved if we assume that the u's follow some
probability distribution. In the regression context, it is usually assumed that the u's
follow the normal distribution.
3.2 The Normality Assumption
The classical normal linear regression model assumes that each ui is
distributed normally, with

Mean: E(ui) = 0          (3.1)
Variance: E(ui²) = σ²          (3.2)
Covariance: cov(ui, uj) = 0,  i ≠ j          (3.3)

These assumptions may also be compactly stated as

ui ~ N(0, σ²)          (3.4)

where ~ means "distributed as" and N stands for the normal distribution.
The terms in the parentheses represent the two parameters of the normal
distribution, namely the mean and the variance. Thus ui is normally distributed around
a zero mean with a constant, finite variance σ². For each ui there is a distribution of
the type of (3.4). The meaning is that small values of u have a higher probability of
being observed than large values; extreme values of u become more and more unlikely the
more extreme we get.
For two normally distributed variables, zero covariance or correlation means
independence of the two variables. Therefore, with the normality assumption,
equation (3.3) means that ui and uj are not only uncorrelated but also independently
distributed. Therefore, we can write equation (3.4) as

ui ~ NID(0, σ²)          (3.5)

Where NID stands for normally and independently distributed. There are several
reasons for the use of normality assumption, which are summarised below.
1) As noted earlier, ui represents the combined influence of a large number of
independent variables that are not explicitly introduced in the regression
model. We hope that the influence of these omitted or neglected variables is
small or at best random. By the central limit theorem of statistics, it can be
shown that if there is a large number of independent and identically
distributed random variables, then, with few exceptions, the distribution of
their sum tends to a normal distribution as the number of such variables
increases indefinitely. It is this central limit theorem that provides a
theoretical justification for the assumption of normality of ui.
2) A variant of central limit theorem states that even if the number of variables is
not very large or if these variables are not strictly independent, their sum may
still be normally distributed.
3) With the normality assumption, the probability distribution of the OLS
estimators can be easily derived because one property of the normal
distribution is that any linear function of normally distributed variables is
itself normally distributed.
4) The normal distribution is a comparatively simple distribution involving only
two parameters, namely mean and variance.
5) The assumption of normality is necessary for conducting the statistical tests
of significance of the parameter estimates and for constructing confidence
intervals. If this assumption is violated, the estimates of β1 and β2 are still
unbiased and best, but we cannot assess their statistical reliability by the
classical tests of significance, because the latter are based on the normal
distribution.
3.3 Properties of OLS Estimators under the Normality Assumption
With the assumption of normality, the OLS estimators have the following properties:
1. They are unbiased.
2. They have the minimum variance. Combined with property 1, this means that
they are minimum-variance unbiased or efficient estimators.
3. As the sample size increases indefinitely, the estimators converge to their
population values. That is, they are consistent.
4. β̂2 is normally distributed with
Mean: E(β̂2) = β2
Variance: var(β̂2) = σ²/Σxi²
or more compactly,
β̂2 ~ N(β2, σ²/Σxi²)    (3.6)
Then, by the properties of the normal distribution, the variable Z, which is defined
as Z = (β̂2 − β2)/σ_β̂2, follows the standardised normal distribution, that is, the
normal distribution with zero mean and unit variance, or Z ~ N(0, 1).
5. β̂1 is normally distributed with
Mean: E(β̂1) = β1
Variance: var(β̂1) = σ² ΣXi² / (n Σxi²)
or more compactly,
β̂1 ~ N(β1, var(β̂1))    (3.7)
and Z = (β̂1 − β1)/σ_β̂1 follows the standardised normal distribution.
6. (n − 2)σ̂²/σ² is distributed as the χ² (chi-square) distribution with n − 2
degrees of freedom.
7. (β̂1, β̂2) are distributed independently of σ̂².
8. β̂1 and β̂2 have the minimum variance in the entire class of unbiased
estimators, whether linear or not. Therefore, we can say that the least squares
estimators are best unbiased estimators (BUE).
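The distributional claims in properties 4 and 5 can be illustrated with a small Monte Carlo sketch in Python (hypothetical parameter values; standard library only):

```python
import random

# Monte Carlo sketch (hypothetical parameter values) of property 4:
# with normal disturbances, the OLS slope estimator beta2_hat has
# mean beta2 and variance sigma^2 / sum(xi^2), where the xi are the
# deviations of Xi from their mean.
random.seed(0)

beta1, beta2, sigma = 2.0, 0.5, 1.0
X = list(range(1, 11))                      # fixed regressor values
xbar = sum(X) / len(X)
sum_x2 = sum((x - xbar) ** 2 for x in X)    # sum of squared deviations

def ols_slope(Y):
    ybar = sum(Y) / len(Y)
    return sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sum_x2

slopes = []
for _ in range(5000):
    Y = [beta1 + beta2 * x + random.gauss(0, sigma) for x in X]
    slopes.append(ols_slope(Y))

mean_slope = sum(slopes) / len(slopes)
var_slope = sum((b - mean_slope) ** 2 for b in slopes) / len(slopes)

print(round(mean_slope, 3))             # close to beta2 = 0.5
print(round(var_slope, 4))              # close to sigma^2 / sum_x2
print(round(sigma ** 2 / sum_x2, 4))    # theoretical variance
```

Across repeated samples, the simulated mean and variance of β̂2 come out close to the theoretical values β2 and σ²/Σxi².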
3.4 The Method of Maximum Likelihood
Like the OLS, the method of Maximum Likelihood (ML) is a method for
obtaining estimates of the parameters of population from the random sample. The
method was developed by R A Fisher and is an important procedure of estimation in
econometrics. In the ML, we take a fixed random sample. This sample might have
been generated by many different normal populations, each having its own
parameters of mean and variance. Which of these possible alternative populations is
most probable to have given rise to the observed n sample values? To answer this
question we must estimate the joint probability of obtaining all the n values for each
possible normal population. Then choose the population whose parameters
maximise the joint probability of observed sample values.
The ML method chooses among all possible estimates of the parameters, those
values, which make the probability of obtaining the observed sample as large as
possible. The function which defines the joint (total) probability of any sample being
observed is called the likelihood function of the variable X.
The general expression of the likelihood function is
L(X1, X2, ..., Xn; θ1, θ2, ..., θk)
where θ1, θ2, ..., θk denote the parameters of the function which we want to estimate.
In the case of a normal distribution of X, the likelihood function in its general form is
L(X1, X2, ..., Xn; μ, σ²)

The ML method consists of maximisation of the likelihood function. Following the


general condition of maximisation, the maximum of the function is attained at the
value where the first derivative of the function with respect to its parameters equals
zero. The estimated values of the parameters are the maximum likelihood estimates
of the population parameters. The various stages of the ML method are outlined below.
1) Form the likelihood function, which gives the total probability of the
particular sample values being observed.
2) Take the partial derivatives of the likelihood function with respect to the
parameters which we want to estimate and set them equal to zero.
3) Solve the equations of the partial derivatives for the unknown parameters to
obtain their maximum likelihood estimates.
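The three stages can be sketched for a normal population with hypothetical sample values (a standard result: the closed-form ML estimates are the sample mean and the sample variance with divisor n):

```python
import math

# Sketch of the three ML stages for a normal population with
# hypothetical sample values. For X1,...,Xn from N(mu, sigma^2),
# setting the partial derivatives of
#   log L = -(n/2) log(2*pi*sigma^2) - sum((Xi - mu)^2) / (2*sigma^2)
# to zero gives mu_hat = X-bar and sigma2_hat = sum((Xi - X-bar)^2) / n.
sample = [4.2, 5.1, 3.8, 4.9, 5.6, 4.4]
n = len(sample)

mu_hat = sum(sample) / n                                  # stage 3: solve
sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n

def log_likelihood(mu, sigma2):                           # stage 1: form log L
    return (-n / 2) * math.log(2 * math.pi * sigma2) \
           - sum((x - mu) ** 2 for x in sample) / (2 * sigma2)

# The ML solution maximises log L: nearby parameter values do worse.
best = log_likelihood(mu_hat, sigma2_hat)
for mu in (mu_hat - 0.3, mu_hat + 0.3):
    for s2 in (sigma2_hat * 0.7, sigma2_hat * 1.3):
        assert log_likelihood(mu, s2) < best

print(round(mu_hat, 3), round(sigma2_hat, 3))
```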
3.5 Maximum Likelihood Estimation of Two Variable Regression Model
We already established that in the two variable regression model,
Yi = β1 + β2Xi + ui
The Yi are normally distributed with mean β1 + β2Xi and variance σ². As a
result, the joint probability density function, given the mean and the variance, can
be written as
f(Y1, Y2, ..., Yn | β1 + β2Xi, σ²)
As the Ys are independent, this probability density function can be written as a
product of n individual PDFs as
f(Y1, Y2, ..., Yn | β1 + β2Xi, σ²) = f(Y1 | β1 + β2X1, σ²) f(Y2 | β1 + β2X2, σ²) ... f(Yn | β1 + β2Xn, σ²)    (3.8)
where
f(Yi) = [1/(σ√(2π))] exp{−(1/2)(Yi − β1 − β2Xi)²/σ²}    (3.9)
which is the density function of a normally distributed variable with the given mean
and variance. Substituting equation (3.9) in (3.8), we get
f(Y1, Y2, ..., Yn | β1 + β2Xi, σ²) = [1/(σⁿ(√(2π))ⁿ)] exp{−(1/2)Σ(Yi − β1 − β2Xi)²/σ²}    (3.10)
If Y1, Y2, ..., Yn are known or given, but β1, β2 and σ² are not known, the function in
(3.10) is called a likelihood function, denoted by LF(β1, β2, σ²), and written as
LF(β1, β2, σ²) = [1/(σⁿ(√(2π))ⁿ)] exp{−(1/2)Σ(Yi − β1 − β2Xi)²/σ²}    (3.11)

In the method of maximum likelihood, we estimate the unknown parameters in
such a manner that the probability of observing the given Ys is as high as
possible. Therefore, we have to find the maximum of equation (3.11). This is a
straightforward exercise in differential calculus, as shown below.
For differentiation, it is easier to express equation (3.11) in log form as
Log LF = −n log σ − (n/2) log(2π) − (1/2)Σ(Yi − β1 − β2Xi)²/σ²
or
Log LF = −(n/2) log σ² − (n/2) log(2π) − (1/2)Σ(Yi − β1 − β2Xi)²/σ²
Differentiating with respect to β1 and setting the result equal to zero,
∂Log LF/∂β1 = −(1/2σ²) Σ 2(Yi − β1 − β2Xi)(−1) = 0
which gives
ΣYi = nβ̂1 + β̂2ΣXi
which is the same as the first normal equation of the least squares theory.
Similarly,
∂Log LF/∂β2 = −(1/2σ²) Σ 2(Yi − β1 − β2Xi)(−Xi) = 0
which gives
ΣYiXi = β̂1ΣXi + β̂2ΣXi²
which is the same as the second normal equation of the least squares theory.
Therefore, the ML estimators of the βs are the same as the OLS estimators.
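This equivalence can be checked numerically on hypothetical data: the ML and OLS point estimates of β1 and β2 coincide, while (a standard result not derived above) the ML estimate of σ² divides the residual sum of squares by n instead of n − 2.

```python
# Numerical check on hypothetical data: the ML estimates of beta1 and
# beta2 coincide with the OLS estimates (both solve the same normal
# equations), while the ML estimate of sigma^2 divides the residual sum
# of squares by n and the unbiased OLS estimate divides by n - 2.
X = [1, 2, 3, 4, 5, 6]
Y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) \
     / sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar               # solves Sum(Y) = n*b1 + b2*Sum(X)

rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
sigma2_ml = rss / n                 # ML estimator (biased downward)
sigma2_ols = rss / (n - 2)          # unbiased OLS estimator

print(round(b1, 4), round(b2, 4))
print(round(sigma2_ml, 5), round(sigma2_ols, 5))
```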

3.6 Two Variable Regression Model: Interval Estimation and Hypothesis Testing
3.6.1 Interval Estimation: Some Basic Ideas

In statistics, the reliability of a point estimator is measured by its standard error.
Therefore, instead of relying on the point estimate alone, we may construct an
interval around the point estimator, say within two or more standard errors on
either side of the point estimator, such that this interval has, say, a 95% probability
of including the true parameter value. This is roughly the idea behind interval
estimation.
Assume that we want to know how close β̂2 is to β2. For this purpose, we try to
find out two positive numbers, δ and α, the latter lying between 0 and 1, such that
the probability that the random interval (β̂2 − δ, β̂2 + δ) contains the true β2 is 1 − α.
Symbolically,
Pr(β̂2 − δ ≤ β2 ≤ β̂2 + δ) = 1 − α    (3.12)

Such an interval is known as a confidence interval; 1 − α is known as the confidence
coefficient, and α (0 < α < 1) is known as the significance level. It is also the probability
of committing a Type I error. A Type I error consists in rejecting a true hypothesis,
whereas a Type II error consists in accepting a false hypothesis. In other words,
whenever we happen to incorrectly reject the null hypothesis, we make an error of Type
I, and whenever we happen to incorrectly accept the null hypothesis, we make an error
of Type II. As a simile, a Type I error is convicting an innocent person, while a Type II
error is letting a guilty one go. Thus, the significance level (α) is the probability of
making the wrong decision, that is, the probability of rejecting the hypothesis when
it is actually true, or the probability of committing a Type I error. We choose a level of
significance for deciding whether to accept or reject our hypothesis.
The probability of a Type II error may be denoted as β. Then 1 − β = P is
known as the power of the test, that is, the probability of rejecting a false hypothesis.
The usual procedure is to keep the significance level at a minimum and the power at
a maximum. The end points of the confidence interval are known as confidence limits.
Equation (3.12) shows that an interval estimator, in contrast to the point
estimator, is an interval constructed in such a manner that it has a specified
probability 1 − α of including within its limits the true value of the parameter. For
example, if α = 0.05 or 5%, then the probability that the random interval includes
the true β2 is 0.95 or 95%. Thus, the interval estimator gives a range of values
within which the true β2 may lie. In practice, a level of significance (α) of
0.05 or 0.01 is customary.

3.6.2 Confidence Interval for Regression Coefficients


(1) If σ² is known
Under the normality assumption,
β̂2 ~ N(β2, σ²/Σxi²)
so that the variable
Z = (β̂2 − β2)/σ_β̂2    (3.13)
follows the standard normal distribution. From the normal distribution tables,
P(Z > Zα/2) = α/2    (3.14)
P(Z < −Zα/2) = α/2    (3.15)
That is,
P(−Zα/2 < Z < Zα/2) = 1 − α    (3.16)
Substituting equation (3.13) in (3.16), we get
P(−Zα/2 < (β̂2 − β2)/σ_β̂2 < Zα/2) = 1 − α    (3.17)
To remove β2 from the denominator, we multiply all the elements by σ_β̂2 and,
subtracting β̂2 from each element, we get
−β̂2 − Zα/2 σ_β̂2 < −β2 < −β̂2 + Zα/2 σ_β̂2    (3.18)
Rearranging, we have
P(β̂2 − Zα/2 σ_β̂2 < β2 < β̂2 + Zα/2 σ_β̂2) = 1 − α    (3.19)
Therefore, [β̂2 − Zα/2 σ_β̂2, β̂2 + Zα/2 σ_β̂2] is the 100(1 − α)% confidence interval
for β2. Substituting σ_β̂2 = σ/√(Σxi²), as σ² is known, we get
[β̂2 − Zα/2 σ/√(Σxi²), β̂2 + Zα/2 σ/√(Σxi²)]    (3.20)
(2) If σ² is unknown
If Z ~ N(0, 1) and Z1 ~ χ² with n degrees of freedom, and the two are independent,
then Z/√(Z1/n) follows the t distribution with n degrees of freedom. Here, the variable
t = (β̂2 − β2)/se(β̂2)    (3.21)
follows the t distribution with (n − 2) degrees of freedom, where
se(β̂2) = σ̂/√(Σxi²)    (3.22)
That is,
t = (β̂2 − β2)√(Σxi²)/σ̂    (3.23)
Proceeding in a similar way as we did earlier, we have
P(−tα/2 < t < tα/2) = 1 − α    (3.24)
Substituting equation (3.23) in (3.24), we get
P(−tα/2 < (β̂2 − β2)/se(β̂2) < tα/2) = 1 − α    (3.25)
Multiplying all the elements by se(β̂2) and subtracting β̂2, we have
P(−β̂2 − tα/2 se(β̂2) < −β2 < −β̂2 + tα/2 se(β̂2)) = 1 − α
Rearranging, we have
P[β̂2 − tα/2 se(β̂2) < β2 < β̂2 + tα/2 se(β̂2)] = 1 − α    (3.26)
Therefore,
[β̂2 − tα/2 se(β̂2), β̂2 + tα/2 se(β̂2)], or β̂2 ± tα/2 se(β̂2),    (3.27)
is the 100(1 − α)% confidence interval for β2.
By following the same procedure, we can get the confidence interval for β1 also:
P[β̂1 − tα/2 se(β̂1) < β1 < β̂1 + tα/2 se(β̂1)] = 1 − α
Therefore,
[β̂1 − tα/2 se(β̂1), β̂1 + tα/2 se(β̂1)], or β̂1 ± tα/2 se(β̂1),    (3.28)
is the 100(1 − α)% confidence interval for β1.

An important feature of the confidence interval given in (3.27) and (3.28) may
be noted. In both the cases, the width of the confidence interval is proportional to
the standard error of the estimator. That is, the larger the standard error, the larger
is the width of the confidence interval. Put differently, the larger is the standard
error of the estimator, the greater is the uncertainty of estimating the true value of
the unknown parameter. Thus, the standard error of an estimator is often described
as a measure of the precision of the estimator. That is, how precisely the estimator
measures the true population value.
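A sketch of the interval (3.27) on hypothetical data; the critical value t(0.025, 6 df) = 2.447 is taken from a standard t table:

```python
import math

# Sketch of the 95% confidence interval for beta2 on hypothetical data.
# With n = 8 observations there are 6 df, and t_{0.025, 6} = 2.447 from
# a standard t table.
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [3.1, 4.2, 5.8, 6.1, 7.9, 9.2, 9.8, 11.5]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
b1 = ybar - b2 * xbar

rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
se_b2 = math.sqrt(rss / (n - 2) / sxx)      # se(beta2_hat)

t_crit = 2.447                              # t_{0.025, 6 df}
lower, upper = b2 - t_crit * se_b2, b2 + t_crit * se_b2
print(round(lower, 3), round(b2, 3), round(upper, 3))
```

The width of the interval is 2·tα/2·se(β̂2), which illustrates the point above: a larger standard error produces a wider, less precise interval.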
3.6.3 Confidence Interval for σ²
As pointed out in the properties of OLS estimators under the normality
assumption (Property 6), the variable
χ² = (n − 2)σ̂²/σ² ~ χ² with (n − 2) degrees of freedom    (3.29)
Therefore, we can use the χ² distribution to establish a confidence interval for σ²:
Pr(χ²(1−α/2) ≤ χ² ≤ χ²(α/2)) = 1 − α    (3.30)
Substituting for χ² from (3.29), we have
Pr(χ²(1−α/2) ≤ (n − 2)σ̂²/σ² ≤ χ²(α/2)) = 1 − α
Taking the reciprocal, rearranging, and multiplying all the elements by (n − 2)σ̂²,
we get
Pr[(n − 2)σ̂²/χ²(α/2) ≤ σ² ≤ (n − 2)σ̂²/χ²(1−α/2)] = 1 − α    (3.31)
which gives the 100(1 − α)% confidence interval for σ².
3.6.4 Hypothesis Testing

Very often in practice, we have to make decisions about populations on the
basis of sample information. In attempting to arrive at such decisions, we make
assumptions about the populations. Such assumptions about populations are called
statistical hypotheses and, in general, are assumptions about the form of the distribution
of the population or the values of the parameters involved.
If a hypothesis determines the population completely, that is, if it specifies the
form of the distribution of the population and the values of all parameters involved,
it is called simple hypothesis; otherwise it is called composite hypotheses. Testing of
hypothesis is a procedure by which we accept or reject a hypothesis on the basis of
sample taken from population.
The hypothesis which is tested for possible rejection under the assumption
that it is true is called the null hypothesis and is denoted by H0. Rejection of H0
naturally results in acceptance of some other hypothesis, which is called the alternative
hypothesis and is denoted by H1. The testable hypothesis is called the null
hypothesis. The term null refers to the idea that there is no difference between the
true value and the value we hypothesise. Since the null hypothesis is a testable
hypothesis, there must also exist a counter proposition to it in order to test the
hypothesised proposition. This counter proposition is called the alternative hypothesis.
In other words, the stated hypothesis is known as the null hypothesis, and it
is usually tested against the alternative hypothesis, which is also known as the
maintained hypothesis.
The theory of hypothesis testing is concerned with developing rules or
procedures for deciding whether to reject or not reject the null hypothesis. There are
two mutually complementary approaches for devising such rules, namely the
confidence interval approach and the test of significance approach. Both these
approaches presume that the variable (statistic or estimator) under consideration has
some probability distribution and that hypothesis testing involves making statements
or assertions about the values of the parameters of such a distribution.
In the confidence interval approach to hypothesis testing, we construct a
100(1 − α)% confidence interval for the parameter, say β2. If the value of β2 under H0
falls within this confidence interval, we accept H0; but if it falls outside this interval,
we reject H0. When we reject the null hypothesis, we say that our finding is statistically
significant. On the other hand, when we do not reject the null hypothesis, we say
that our finding is not statistically significant.
An alternative but complementary approach to the confidence interval method
of testing statistical hypothesis is the test of significance approach developed by R A
Fisher, Neyman and Pearson. Broadly speaking, a test of significance is a procedure
by which sample results are used to verify the truth or falsity of a null hypothesis.
The key idea of test of significance is that of a test statistic (estimator) and the
sampling distribution of such a statistic under the null hypothesis. The decision to
accept or reject H0 is made on the basis of the value of the test statistic obtained
from the data at hand. In the language of significance tests, a statistic is said to be
statistically significant if the value of the test statistic lies in the regions of rejection
(H0) or the critical region. In this case, the null hypothesis is rejected. By the same
token, a test is said to be statistically insignificant if the value of the test statistic
lies in the region of acceptance (of the null hypothesis). In this situation, the null
hypothesis is not rejected.
Thus, the first step in hypothesis testing is that of formulation of the null
hypothesis and its alternative. The next step consists of devising a criterion of test
that would enable us to decide whether the null hypothesis is to be rejected or not.
For this purpose the whole set of values of the population is divided into two
regions, namely the acceptance region and the rejection region. The acceptance
region includes the values of the population which have a high probability of being
observed and the rejection region or critical region includes those values which are
highly unlikely to be observed. Then the test is performed with reference to test
statistic. The empirical tests that are used for testing the hypothesis are called tests
of significance. If the value of the test statistic falls in the critical region, the null
hypothesis is rejected; while if the value of test statistic falls in the acceptance
region, the null hypothesis is not rejected.
The following tables summarise the test of significance approach to
hypothesis testing.
(1) Normal Test of Significance — σ² known
Suppose β̂2 ~ N(β2, σ²/Σxi²). Then the test statistic used is
Z = (β̂2 − β2*)/σ_β̂2
Decision Rules — Z Test

  Type of        H0: The Null    H1: The Alternative    Critical Region
  Hypothesis     Hypothesis      Hypothesis
  Two Tail       β2 = β2*        β2 ≠ β2*               |Z| > Zα/2
  Right Tail     β2 ≤ β2*        β2 > β2*               Z > Zα
  Left Tail      β2 ≥ β2*        β2 < β2*               Z < −Zα

Here, β2* is the hypothesised numerical value of β2. |Z| means the absolute
value of Z. Zα or Zα/2 means the critical Z value at the α or α/2 level of significance.
The same procedure holds to test hypotheses about β1.
(2) t-Test of Significance — σ² unknown
When σ² is unknown, we use the t distribution as the test statistic. Under the
normality assumption, the variable
t = (β̂2 − β2*)/se(β̂2)
follows the t distribution with n − 2 degrees of freedom.
Decision Rules — t Test

  Type of        H0: The Null    H1: The Alternative    Critical Region
  Hypothesis     Hypothesis      Hypothesis
  Two Tail       β2 = β2*        β2 ≠ β2*               |t| > tα/2
  Right Tail     β2 ≤ β2*        β2 > β2*               t > tα
  Left Tail      β2 ≥ β2*        β2 < β2*               t < −tα

The same procedure holds to test hypotheses about β1.
(3) Testing the Significance of σ²: the Chi-Square Test
For testing the significance of σ², as noted earlier, we use the chi-square test,
based on
χ² = (n − 2)σ̂²/σ0² ~ χ² with n − 2 degrees of freedom

  Type of        H0: The Null    H1: The Alternative    Critical Region
  Hypothesis     Hypothesis      Hypothesis
  Two Tail       σ² = σ0²        σ² ≠ σ0²               χ² > χ²(α/2) or χ² < χ²(1−α/2)
  Right Tail     σ² ≤ σ0²        σ² > σ0²               χ² > χ²(α)
  Left Tail      σ² ≥ σ0²        σ² < σ0²               χ² < χ²(1−α)

where σ0² is the value of σ² under the null hypothesis.
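These decision rules can be sketched on hypothetical data; the critical values (2.306 for t with 8 df, and 17.535 and 2.180 for the chi-square 0.025 and 0.975 points with 8 df) are taken from standard tables:

```python
import math

# Sketch of the two-tail t test of H0: beta2 = 0 and the two-tail
# chi-square test of H0: sigma^2 = sigma0^2, on hypothetical data.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [2.3, 2.1, 3.8, 4.4, 4.1, 5.9, 6.2, 7.1, 7.4, 8.3]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
b1 = ybar - b2 * xbar
rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))
sigma2_hat = rss / (n - 2)

# t test of H0: beta2 = 0 (hypothesised value beta2* = 0)
t_stat = b2 / math.sqrt(sigma2_hat / sxx)
reject_slope = abs(t_stat) > 2.306           # t_{0.025, 8 df}

# chi-square test of H0: sigma^2 = 1.0 (two tail)
sigma0_sq = 1.0
chi2_stat = (n - 2) * sigma2_hat / sigma0_sq
reject_var = chi2_stat > 17.535 or chi2_stat < 2.180

print(round(t_stat, 2), reject_slope)
print(round(chi2_stat, 2), reject_var)
```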

3.7 Regression Analysis and Analysis of Variance


In this section, we analyse regression analysis from the point of view of the
analysis of variance and try to develop a complementary way of looking at the
statistical inference problem. We have developed in the previous module,
Σyi² = Σŷi² + Σûi² = β̂2²Σxi² + Σûi²
That is, TSS= ESS+RSS. In other words, the total sum of squares composed of
explained sum of squares and the residual sum squares. A study of these
components of TSS is known as the analysis of variance (ANOVA) from the
regression view point. ANOVA is a statistical method developed by R A Fisher for the
analysis of experimental data.

Associated with any sum of squares is its degrees of freedom (df), that is, the
number of independent observations on which it is based. TSS has n − 1 df because
we lose 1 df in computing the sample mean Ȳ. RSS has n − 2 df, and ESS has 1 df,
which follows from the fact that ESS = β̂2²Σxi² is a function of β̂2 only, as Σxi²
is known. Both results hold only in the two variable regression model. The following
table presents the various sums of squares and their associated df, which is the
standard form of the AOV table, sometimes also called the ANOVA table.
  Source of variation        Sum of Squares (SS)    Degrees of    Mean Sum of
                                                    Freedom       Squares (MSS)
  Due to regression (ESS)    Σŷi² = β̂2²Σxi²         1             β̂2²Σxi²
  Due to residuals (RSS)     Σûi²                   n − 2         Σûi²/(n − 2) = σ̂²
  TSS                        Σyi²                   n − 1

In the table, the MSS is obtained by dividing the SS by their df. From the table, let
us consider
F = MSS of ESS / MSS of RSS = β̂2²Σxi² / [Σûi²/(n − 2)] = β̂2²Σxi²/σ̂²    (3.32)

If we assume that the disturbances ui are normally distributed and that H0: β2 = 0
holds, it can be shown that the F of equation (3.32) follows the F distribution with 1
and n − 2 df.
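The decomposition and the F statistic of equation (3.32) can be verified on hypothetical data:

```python
# Numerical check of the ANOVA identity TSS = ESS + RSS and the F
# statistic F = (ESS/1) / (RSS/(n-2)) on hypothetical data.
X = [1, 2, 3, 4, 5, 6, 7]
Y = [1.8, 3.1, 3.9, 5.2, 5.8, 7.1, 8.2]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
b1 = ybar - b2 * xbar

tss = sum((y - ybar) ** 2 for y in Y)                      # total SS, n-1 df
ess = b2 ** 2 * sxx                                        # explained SS, 1 df
rss = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y))    # residual SS, n-2 df

F = (ess / 1) / (rss / (n - 2))
print(round(tss, 4), round(ess + rss, 4))                  # identity holds
print(round(F, 2))
```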
3.8 Application of Regression Analysis: The Problem of Prediction
On the basis of sample data, we obtained the following sample regression:
Ŷi = β̂1 + β̂2Xi
where Ŷi is the estimator of the true E(Yi). We want to use it to predict or forecast Y
corresponding to some given level of X. There are two kinds of predictions, namely:
1) Prediction of the conditional mean value of Y corresponding to a chosen X, say
X0, that is, the point on the population regression line itself. This prediction
is known as mean prediction.
2) Prediction of an individual Y value corresponding to X0, which is known as
individual prediction.

Mean Prediction
Given Xi = X0, the mean prediction E(Y0 | X0) is given by
E(Y0 | X0) = β1 + β2X0    (3.33)
We estimate it from
Ŷ0 = β̂1 + β̂2X0    (3.34)
Taking the expectation of equation (3.34), given X0, we get
E(Ŷ0) = E(β̂1) + E(β̂2)X0 = β1 + β2X0    (3.35)
because β̂1 and β̂2 are unbiased estimators. That is,
E(Ŷ0) = E(Y0 | X0) = β1 + β2X0    (3.36)
That is, Ŷ0 is an unbiased predictor of E(Y0 | X0).
var(Ŷ0) = var(β̂1 + β̂2X0)    (3.37)
Now using the property that var(a + b) = var(a) + var(b) + 2 cov(a, b), we obtain
var(Ŷ0) = var(β̂1) + X0² var(β̂2) + 2X0 cov(β̂1, β̂2)    (3.38)
Using the formulas for the variances and covariance of β̂1 and β̂2, we get
var(Ŷ0) = σ² ΣXi²/(n Σxi²) + X0² σ²/Σxi² − 2X0 X̄ σ²/Σxi²
Rearranging and manipulating the terms, we obtain
var(Ŷ0) = σ²[1/n + (X0 − X̄)²/Σxi²]    (3.39)
By replacing the unknown σ² by its estimator σ̂², it follows that the variable
t = [Ŷ0 − (β1 + β2X0)]/se(Ŷ0)    (3.40)
follows the t distribution with n − 2 df. Therefore, the t distribution can be used to
derive confidence intervals for the true E(Y0 | X0) and test hypotheses about it in the
usual manner. That is,
P[β̂1 + β̂2X0 − tα/2 se(Ŷ0) < β1 + β2X0 < β̂1 + β̂2X0 + tα/2 se(Ŷ0)] = 1 − α    (3.41)
where se(Ŷ0) is obtained from equation (3.39).

Individual Prediction
We want to predict an individual Y corresponding to a given X value, say X0. That is,
we want to obtain
Y0 = β1 + β2X0 + u0    (3.42)
We predict this as Ŷ0 = β̂1 + β̂2X0. The prediction error, Y0 − Ŷ0, is
Y0 − Ŷ0 = β1 + β2X0 + u0 − (β̂1 + β̂2X0)
That is,
Y0 − Ŷ0 = (β1 − β̂1) + (β2 − β̂2)X0 + u0    (3.43)
Taking expectations on both sides of equation (3.43), we have
E(Y0 − Ŷ0) = E(β1 − β̂1) + E(β2 − β̂2)X0 + E(u0) = 0    (3.44)
since β̂1 and β̂2 are unbiased, X0 is a fixed number, and E(u0) = 0 by assumption.
The variance of the prediction error is
var(Y0 − Ŷ0) = E(Y0 − Ŷ0)² = E[(β1 − β̂1) + (β2 − β̂2)X0 + u0]²    (3.45)
Expanding,
var(Y0 − Ŷ0) = var(β̂1) + X0² var(β̂2) + 2X0 cov(β̂1, β̂2) + var(u0)    (3.46)
Using the variance and covariance formulas for β̂1 and β̂2, noting that var(u0) = σ²,
and slightly rearranging equation (3.46), we have
var(Y0 − Ŷ0) = σ²[1 + 1/n + (X0 − X̄)²/Σxi²]    (3.47)
Further, it can be shown that Y0 follows the normal distribution. Substituting σ̂² for
the unknown σ², it follows that
t = (Y0 − Ŷ0)/se(Y0 − Ŷ0)    (3.48)
also follows the t distribution. Therefore, the t distribution can be used to draw
inferences about the true Y0.
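The two kinds of prediction can be contrasted at a hypothetical X0 (the critical value 3.182 is t(0.025) with 3 df from a standard t table); the individual interval is wider because the variance formula carries the extra σ² term:

```python
import math

# Sketch of mean vs individual prediction at a hypothetical X0: the
# individual-prediction variance exceeds the mean-prediction variance
# by exactly sigma2_hat, so its interval is always wider.
X = [2, 4, 6, 8, 10]
Y = [3.2, 5.1, 6.8, 9.2, 10.9]
n = len(X)

xbar, ybar = sum(X) / n, sum(Y) / n
sxx = sum((x - xbar) ** 2 for x in X)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / sxx
b1 = ybar - b2 * xbar
sigma2_hat = sum((y - b1 - b2 * x) ** 2 for x, y in zip(X, Y)) / (n - 2)

X0 = 7.0
y0_hat = b1 + b2 * X0
var_mean = sigma2_hat * (1 / n + (X0 - xbar) ** 2 / sxx)        # mean prediction
var_indiv = sigma2_hat * (1 + 1 / n + (X0 - xbar) ** 2 / sxx)   # individual

t_crit = 3.182  # t_{0.025, 3 df} from the t table
half_mean = t_crit * math.sqrt(var_mean)
half_indiv = t_crit * math.sqrt(var_indiv)
print(round(y0_hat, 3), round(half_mean, 3), round(half_indiv, 3))
```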

MODULE IV
EXTENSION OF TWO VARIABLE REGRESSION MODEL
4.1 Introduction

Some aspects of linear regression analysis can be easily introduced within the
framework of the two variable linear regression model that we have been
discussing so far. First we consider the case of regression through the origin, i.e., a
situation where the intercept term, β1, is absent from the model. Then we consider
the question of the functional form of the linear regression model; here we consider
models that are linear in the parameters but not in the variables. Finally we consider
the question of units of measurement, i.e., how the X and Y variables are measured
and whether a change in the units of measurement affects the regression results.
4.2 Regression through origin

There are occasions when the two variable PRF assumes the following form:
Yi = β2Xi + ui    (4.1)
In this model the intercept term is absent or zero, hence the name regression through
the origin. How do we estimate models like (4.1), and what special problems do they
pose? To answer these questions, let us first write the SRF of (4.1), namely
Yi = β̂2Xi + ûi    (4.2)
ûi = Yi − β̂2Xi    (4.3)
Now applying the ordinary least squares (OLS) method to (4.2), we obtain the
following formulas for β̂2 and its variance. We want to minimise
Σûi² = Σ(Yi − β̂2Xi)²
with respect to β̂2. Differentiating with respect to β̂2, we obtain
dΣûi²/dβ̂2 = 2Σ(Yi − β̂2Xi)(−Xi)    (4.4)
Setting (4.4) equal to zero and simplifying, we get
β̂2 = ΣXiYi/ΣXi²    (4.5)
Now substituting the PRF, Yi = β2Xi + ui, into this equation, we obtain
β̂2 = β2 + ΣXiui/ΣXi²    (4.6)
Note that E(β̂2) = β2. Therefore,
E(β̂2 − β2)² = E(ΣXiui/ΣXi²)²    (4.7)
Expanding the right hand side of (4.7) and noting that the Xi are nonstochastic and
the ui are homoscedastic and uncorrelated, we obtain
Var(β̂2) = E(β̂2 − β2)² = σ²/ΣXi²    (4.8)
where σ² is estimated by
σ̂² = Σûi²/(n − 1)    (4.9)

It is interesting to compare these formulas with those obtained when the
intercept term is included in the model:
β̂2 = Σxiyi/Σxi²    (4.10)
Var(β̂2) = σ²/Σxi²    (4.11)
σ̂² = Σûi²/(n − 2)    (4.12)

The difference between the two sets of formulas should be obvious: in the model
with the intercept term absent, we use raw sums of squares and cross products, but
in the intercept present model, we use adjusted (from the mean) sums of squares and
cross products. Second, the degrees of freedom for computing σ̂² are (n − 1) in the
first case and (n − 2) in the second case.
Although the zero intercept model may be appropriate on occasions, there
are some features of this model that need to be noted. First, Σûi, which is always
zero for the model with the intercept term, need not be zero when that term is
absent. In short, Σûi need not be zero for the regression through the origin.
Suppose we want to impose the condition that Σûi = 0. In that case, summing
Yi = β̂2Xi + ûi over the sample, we have
ΣYi = β̂2ΣXi + Σûi = β̂2ΣXi    (4.13)
This expression then gives
β̂2 = ΣYi/ΣXi = Ȳ/X̄    (4.14)
But this estimator is not the same as equation (4.5). And, like the β̂2 of (4.5), the β̂2
of (4.14) is unbiased. Incidentally, note that from (4.4), after equating it to zero,
we get
ΣûiXi = 0    (4.15)
The upshot is that, in regression through the origin, we cannot have both ΣûiXi
and Σûi equal to zero. The only condition that is satisfied is that ΣûiXi = 0. Recall
Yi = Ŷi + ûi    (4.16)
Summing this equation on both sides and dividing by n, we get
Ȳ = mean(Ŷi) + ū    (4.17)
Since for the zero intercept model Σûi and ū need not be zero, it then follows that
Ȳ ≠ mean(Ŷi)    (4.18)
That is, the mean of the actual Y values need not be equal to the mean of the
estimated Y values; the two mean values are identical for the intercept present
model.
Second, r², the coefficient of determination, which is always non-negative for
the conventional model, can on occasion turn out to be negative for the
interceptless model. Therefore, the conventionally computed r² may not be
appropriate for the regression through origin model.
r² = 1 − RSS/TSS = 1 − Σûi²/Σyi²    (4.19)
Note that, for the conventional or intercept present model,
RSS = Σûi² = Σyi² − β̂2²Σxi² ≤ Σyi²
unless β̂2 is zero. That is, for the conventional model, RSS ≤ TSS, or r² can never
be negative.
For the zero intercept model, it can be shown analogously that
RSS = Σûi² = ΣYi² − β̂2²ΣXi²    (4.20)

Now there is no guarantee that this RSS will always be less than TSS, which
suggests that RSS can be greater than TSS, implying that r², as conventionally
defined, can be negative. The conventional r² is thus not appropriate for the regression
through origin model. But we can compute what is known as the raw r² for such
models, which is defined as
Raw r² = (ΣXiYi)²/(ΣXi² ΣYi²)    (4.21)
Although this raw r² satisfies the relation 0 < r² < 1, it is not directly comparable to
the conventional r² value.
Because of these special features of this model, one needs to exercise great
caution in using the zero intercept model. Unless there is a strong a priori
expectation, one would be well advised to stick to the conventional, intercept present
model.
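The special features noted above can be checked numerically on hypothetical data: with an intercept the residuals sum to zero, while through the origin only ΣûiXi = 0 holds.

```python
# Sketch contrasting regression through the origin with the intercept
# model on hypothetical data: the through-origin slope uses raw sums
# of squares, and its residuals need not sum to zero.
X = [1, 2, 3, 4, 5]
Y = [2.2, 3.9, 6.3, 7.8, 10.1]
n = len(X)

# Through the origin: b2 = sum(XiYi) / sum(Xi^2)  -- raw sums
b2_origin = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)
resid_origin = [y - b2_origin * x for x, y in zip(X, Y)]

# With intercept: adjusted (from the mean) sums of squares
xbar, ybar = sum(X) / n, sum(Y) / n
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) \
     / sum((x - xbar) ** 2 for x in X)
b1 = ybar - b2 * xbar
resid = [y - b1 - b2 * x for x, y in zip(X, Y)]

print(round(sum(resid), 10))          # zero (up to rounding) with intercept
print(round(sum(resid_origin), 10))   # generally nonzero through the origin
print(round(sum(x * r for x, r in zip(X, resid_origin)), 10))  # this is zero
```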
4.3 Functional forms of regression models
So far we have considered models that are linear in parameters as well as in
the variables. Here we consider some commonly used regression models that may be
nonlinear in the variables but are linear in the parameters or that can be made so
by suitable transformation of the variables. In particular we discuss the following
regression models
1. Log linear model
2. Semi log models
3. Reciprocal models
4.4 How to measure elasticity: the log linear model
Consider the following model, known as the exponential regression model:
Yi = β1 Xi^β2 e^(ui)    (4.23)
which may be expressed alternatively as
ln Yi = ln β1 + β2 ln Xi + ui    (4.24)
where ln = natural log, i.e., log to the base e, and where e = 2.718. If we write
equation (4.24) as
ln Yi = α + β2 ln Xi + ui    (4.25)
where α = ln β1, this model is linear in the parameters α and β2, linear in the
logarithms of the variables Y and X, and can be estimated by OLS regression.
Because of this linearity, such models are called log-log, double log or log linear
models.
If the assumptions of the classical linear regression model are fulfilled, the
parameters of equation (4.25) can be estimated by the OLS method by letting
Yi* = α + β2Xi* + ui    (4.26)
where Yi* = ln Yi and Xi* = ln Xi. The OLS estimators α̂ and β̂2 obtained will be best
linear unbiased estimators of α and β2 respectively.

One attractive feature of the log-log model, which has made it popular in
applied work, is that the slope coefficient β2 measures the elasticity of Y with
respect to X, that is, the percentage change in Y for a given small percentage change
in X. Thus, if Y represents the quantity of a commodity demanded and X its unit
price, β2 measures the price elasticity of demand.
In the two variable model, the simplest way to decide whether the log linear
model fits the data is to plot the scatter diagram of ln Yi against ln Xi and see
whether the scatter points lie approximately on a straight line.
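A sketch of estimating a price elasticity from the log-log model, using hypothetical price-quantity data constructed with an elasticity of roughly −1.25:

```python
import math

# Sketch: estimating a constant-elasticity demand function by running
# OLS on the logs. The price-quantity pairs are hypothetical, generated
# so that the true price elasticity is about -1.25.
price    = [2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
quantity = [100.0, 60.5, 42.0, 31.5, 25.0, 17.5]

lnX = [math.log(p) for p in price]
lnY = [math.log(q) for q in quantity]
n = len(lnX)

xbar, ybar = sum(lnX) / n, sum(lnY) / n
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(lnX, lnY)) \
     / sum((x - xbar) ** 2 for x in lnX)
alpha = ybar - b2 * xbar

print(round(b2, 3))                 # estimated price elasticity of demand
print(round(math.exp(alpha), 2))    # implied beta1 = e^alpha
```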
4.5 Semilog models: log-lin and lin-log models
4.5.1 How to measure the growth rate: the log-lin model
Economists, business people and governments are often interested in finding
out the rate of growth of certain economic variables such as population GDP, money
supply, employment etc.
Suppose we want to find out the growth rate of personal consumption
expenditure on services. Let Yt denote real expenditure on services at time t and Y0
the initial value of the expenditure on services. We may recall the following
well-known compound interest formula:
Yt = Y0(1 + r)^t    (4.27)
where r is the compound (that is, over time) rate of growth of Y. Taking the natural
logarithm of equation (4.27), we can write
ln Yt = ln Y0 + t ln(1 + r)    (4.28)
Now letting
β1 = ln Y0    (4.29)
β2 = ln(1 + r)    (4.30)
we can write equation (4.28) as
ln Yt = β1 + β2t    (4.31)
Adding the disturbance term to equation (4.31), we obtain
ln Yt = β1 + β2t + ut    (4.32)
This model is like any other linear regression model in that the parameters
β1 and β2 are linear. The only difference is that the regressand is the logarithm of Y
and the regressor is time, which will take values of 1, 2, 3, etc.
Models like (4.31) are called semilog models because only one variable (in
this case the regressand) appears in the logarithmic form. For descriptive purposes, a
model in which the regressand is logarithmic will be called a log-lin model. A model
in which the regressand is linear but the regressor is logarithmic is called a lin-log
model.
Let us briefly examine the properties of the model. In this model the slope
coefficient measures the constant proportional or relative change in Y for a given
absolute change in the value of the regressor (in this case the variable t), that is,
β2 = (relative change in regressand)/(absolute change in regressor)    (4.33)

If we multiply the relative change in Y by 100, equation (4.33) will then give
the percentage change, or the growth rate, in Y for an absolute change in X, the
regressor. That is, 100 times β2 gives the growth rate in Y; 100 times β2 is known in
the literature as the semielasticity of Y with respect to X.
The slope coefficient of the growth model, β2, gives the instantaneous (at a
point in time) rate of growth and not the compound (over a period of time) rate of
growth. But the latter can be easily found from (4.32) by taking the antilog of the
estimated β2, subtracting 1 from it, and multiplying the difference by 100.
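A sketch of the growth-rate calculation on a hypothetical series built with roughly 5% compound growth:

```python
import math

# Sketch of the log-lin growth model: regress ln(Yt) on t. 100*b2 is the
# instantaneous growth rate; the compound rate is (antilog(b2) - 1)*100.
# The series is hypothetical, built with roughly 5% compound growth.
Y = [100.0, 105.0, 110.3, 115.8, 121.6, 127.6, 134.0]
t = list(range(1, len(Y) + 1))
lnY = [math.log(y) for y in Y]
n = len(Y)

tbar, ybar = sum(t) / n, sum(lnY) / n
b2 = sum((ti - tbar) * (y - ybar) for ti, y in zip(t, lnY)) \
     / sum((ti - tbar) ** 2 for ti in t)

instantaneous = 100 * b2
compound = (math.exp(b2) - 1) * 100     # antilog, minus 1, times 100
print(round(instantaneous, 2), round(compound, 2))
```

Note that the compound rate always exceeds the instantaneous rate when growth is positive, since e^b − 1 > b for b > 0.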
Linear trend model: Instead of estimating model (4.32), researchers sometimes estimate the following model:

Yt = β1 + β2t + ut        (4.34)

That is, instead of regressing the log of Y on time, they regress Y on time, where Y is the regressand under consideration. Such a model is called a linear trend model and the time variable t is known as the trend variable. If the slope coefficient is positive, there is an upward trend in Y, whereas if it is negative, there is a downward trend in Y.
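As a small illustration (hypothetical data), the linear trend model regresses Y itself, not ln Y, on time, and the sign of the slope indicates the direction of the trend:

```python
import numpy as np

# Hypothetical series with a deterministic upward trend: Y_t = 50 + 2t
t = np.arange(1, 31)
Y = 50.0 + 2.0 * t

# OLS of Y (not ln Y) on t is the linear trend model (4.34)
slope, intercept = np.polyfit(t, Y, 1)

# slope > 0 signals an upward trend in Y; slope < 0 a downward trend
print(slope > 0)   # prints True
```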
4.5.2 The Lin-Log Model
Unlike the growth model just discussed, in which we were interested in finding the per cent growth in Y for an absolute change in X, suppose we now want to find the absolute change in Y for a per cent change in X. A model that can accomplish this purpose can be written as

Yi = β1 + β2 ln Xi + ui        (4.35)

For descriptive purposes we call such a model a lin-log model. Let us interpret the slope coefficient. As usual,

β2 = (change in Y) / (change in ln X) = (change in Y) / (relative change in X)

The second step follows from the fact that a change in the log of a number is a relative change. Symbolically, we have

β2 = ΔY / (ΔX/X)

so that

ΔY = β2 (ΔX/X)        (4.36)

This equation states that the absolute change in Y (= ΔY) is equal to the slope times the relative change in X. If the latter is multiplied by 100, then (4.36) gives the absolute change in Y for a percentage change in X. Thus, if ΔX/X changes by 0.01 unit (or 1%), the absolute change in Y is 0.01(β2); if in an application one finds that β2 = 500, the absolute change in Y is (0.01)(500) = 5.0. Therefore, when a regression like (4.35) is estimated by OLS, do not forget to multiply the value of the estimated slope coefficient by 0.01.
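The 0.01 rule can be checked numerically. A minimal sketch, using hypothetical data built with β2 = 500, of estimating the lin-log model (4.35) and reading off the effect of a 1% change in X:

```python
import numpy as np

# Hypothetical lin-log relation: Y = 10 + 500 ln X
X = np.linspace(1.0, 100.0, 50)
Y = 10.0 + 500.0 * np.log(X)

# OLS of Y on ln X estimates the lin-log model (4.35)
b2, b1 = np.polyfit(np.log(X), Y, 1)

# With b2 = 500, a 1% change in X changes Y by about 0.01 * b2 = 5 units
effect_of_one_percent = 0.01 * b2
print(round(effect_of_one_percent, 2))   # prints 5.0
```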
4.6 Reciprocal Models
Models of the following type are known as reciprocal models:

Yi = β1 + β2 (1/Xi) + ui        (4.37)

Although this model is non-linear in the variable X, because X enters inversely or reciprocally, the model is linear in the parameters β1 and β2 and is therefore a linear regression model.

This model has the following salient feature: as X increases indefinitely, the term β2(1/Xi) approaches zero (note that β2 is a constant) and Y approaches the limiting or asymptotic value β1. Therefore, models like (4.37) have built into them an asymptote or limit value that the dependent variable will take when the value of the X variable increases indefinitely.

We conclude our discussion of reciprocal models by considering the logarithmic reciprocal model, which takes the following form:

ln Yi = β1 − β2 (1/Xi) + ui        (4.38)

In this model Y initially increases at an increasing rate and then at a decreasing rate; such a model may therefore be appropriate for short-run production functions.
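Because (4.37) is linear in the parameters, it can be estimated by OLS after transforming the regressor to 1/X. A minimal sketch with hypothetical data built around a known asymptote:

```python
import numpy as np

# Hypothetical reciprocal relation: Y = 80 + 120 * (1/X); asymptote beta1 = 80
X = np.linspace(1.0, 50.0, 40)
Y = 80.0 + 120.0 / X

# The model is linear in the parameters, so OLS on the transformed regressor 1/X works
b2, b1 = np.polyfit(1.0 / X, Y, 1)

# As X grows without bound, b2/X vanishes and Y approaches the asymptote b1
print(round(b1, 1), round(b2, 1))   # prints 80.0 120.0
```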
4.7 Scaling and Units of Measurement
Here we consider how the Y and X variables are measured and whether a change in the units of measurement affects the regression results. Let

Yi = β̂1 + β̂2Xi + ûi        (4.39)

Define

Yi* = w1Yi        (4.40)

Xi* = w2Xi        (4.41)

where w1 and w2 are constants, called the scale factors; w1 may be equal to w2 or may be different. From (4.40) and (4.41) it is clear that Yi* and Xi* are rescaled Yi and Xi. Thus, if Yi and Xi are measured in billions of dollars and one wants to express them in millions of dollars, we will have Yi* = 1000Yi and Xi* = 1000Xi; here w1 = w2 = 1000.

Now consider the regression using the Yi* and Xi* variables:

Yi* = β̂1* + β̂2*Xi* + ûi*        (4.42)

where ûi* = w1ûi (why?).

We want to find out the relationships between the following pairs:

1. β̂1 and β̂1*
2. β̂2 and β̂2*
3. var(β̂1) and var(β̂1*)
4. var(β̂2) and var(β̂2*)
5. σ̂² and σ̂*²
6. r²xy and r²x*y*

From least squares theory we know that

β̂1 = Ȳ − β̂2X̄

β̂2 = Σxiyi / Σxi²

var(β̂1) = (ΣXi² / nΣxi²) σ̂²

var(β̂2) = σ̂² / Σxi²

Applying OLS to (4.42), we similarly obtain

β̂1* = Ȳ* − β̂2*X̄*

β̂2* = Σxi*yi* / Σxi*²

var(β̂1*) = (ΣXi*² / nΣxi*²) σ̂*²

var(β̂2*) = σ̂*² / Σxi*²

From these results it is easy to establish the relationships between the two sets of parameter estimates. All that one has to do is recall the definitional relationships Yi* = w1Yi (or yi* = w1yi), Xi* = w2Xi (or xi* = w2xi) and ûi* = w1ûi. Making use of these definitions, the reader can easily verify that

β̂2* = (w1/w2) β̂2        (4.43)

β̂1* = w1 β̂1        (4.44)

σ̂*² = w1² σ̂²        (4.45)

var(β̂1*) = w1² var(β̂1)        (4.46)

var(β̂2*) = (w1/w2)² var(β̂2)        (4.47)

r²xy = r²x*y*        (4.48)

From the preceding results, it should be clear that, given the regression results based on one scale of measurement, one can derive the results based on another scale of measurement once the scaling factors, the w's, are known. In practice, though, one should choose the units of measurement sensibly; there is little point in carrying around all those zeros when expressing numbers in millions or billions of dollars.

From the results given in (4.43) through (4.48) one can easily derive some special cases. For instance, if w1 = w2, that is, the scaling factors are identical, the slope coefficient and its standard error remain unaffected in going from the (Yi, Xi) to the (Yi*, Xi*) scale, which should be intuitively clear; however, the intercept and its standard error are both multiplied by w1. But if the X scale is not changed (i.e. w2 = 1) and the Y scale is changed by the factor w1, the slope as well as the intercept coefficients and their respective standard errors are all multiplied by the same factor w1. Finally, if the Y scale remains unchanged (i.e. w1 = 1) but the X scale is changed by the factor w2, the slope coefficient and its standard error are multiplied by the factor 1/w2, but the intercept coefficient and its standard error remain unaffected. It should be noted that the transformation from the (Yi, Xi) to the (Yi*, Xi*) scale does not affect the properties of the OLS estimators.
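The scaling relationships (4.43) and (4.44) can be verified numerically. A minimal sketch using hypothetical data, rescaling both variables by w1 = w2 = 1000 (billions to millions) and re-running OLS:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data measured in "billions"
X = rng.uniform(1.0, 10.0, 100)
Y = 2.0 + 3.0 * X + rng.normal(0.0, 0.5, 100)

# Rescale to "millions": w1 = w2 = 1000
w1, w2 = 1000.0, 1000.0
Ys, Xs = w1 * Y, w2 * X

b2, b1 = np.polyfit(X, Y, 1)      # original scale
b2s, b1s = np.polyfit(Xs, Ys, 1)  # rescaled data

# Check (4.43) and (4.44): slope scales by w1/w2, intercept by w1.
# Here w1 = w2, so the slope is unchanged and the intercept is multiplied by 1000.
print(np.isclose(b2s, (w1 / w2) * b2), np.isclose(b1s, w1 * b1))   # prints True True
```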


4.8 Regression on Standardised Variables


A variable is said to be standardised if we subtract the mean value of the variable from its individual values and divide the difference by the standard deviation of that variable. Thus, in the regression of Y on X, if we redefine these variables as

Yi* = (Yi − Ȳ) / SY        (4.49)

Xi* = (Xi − X̄) / SX        (4.50)

where Ȳ = sample mean of Y, SY = standard deviation of Y, X̄ = sample mean of X and SX = standard deviation of X, the variables Yi* and Xi* are called standardised variables. An interesting property of a standardised variable is that its mean value is always zero and its standard deviation is always one.
As a result, it does not matter in what units the regressand and regressor are measured. Therefore, instead of running the standard bivariate regression

Yi = β1 + β2Xi + ui        (4.51)

we could run the regression on the standardised variables as

Yi* = β1* + β2*Xi* + ui*        (4.52)

      = β2*Xi* + ui*        (4.53)

since it is easy to show that, in a regression involving a standardised regressand and regressor, the intercept term is always zero. The regression coefficients of the standardised variables, denoted by β1* and β2*, are known as the beta coefficients. Incidentally, notice that (4.53) is a regression-through-the-origin model.
How do we interpret the beta coefficients? The interpretation is that if the standardised regressor increases by one standard deviation, then on average the standardised regressand increases by β2* standard deviation units. Thus, unlike the traditional model, we measure the effect not in the original units in which Y and X are expressed, but in standard deviation units.
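Both properties claimed above can be checked numerically: the intercept of the standardised regression vanishes, and (in the bivariate case) the beta coefficient equals the sample correlation between X and Y. A minimal sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(10.0, 2.0, 200)
Y = 5.0 + 1.5 * X + rng.normal(0.0, 1.0, 200)

# Standardise: subtract the mean, divide by the standard deviation
Ys = (Y - Y.mean()) / Y.std()
Xs = (X - X.mean()) / X.std()

beta, intercept = np.polyfit(Xs, Ys, 1)

# The intercept is (numerically) zero, and in the bivariate case the beta
# coefficient equals the sample correlation coefficient of X and Y
print(abs(intercept) < 1e-8, np.isclose(beta, np.corrcoef(X, Y)[0, 1]))   # prints True True
```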
References
1) Gujarati, D N and Sangeetha: Basic Econometrics, Fourth Edition. Tata McGraw Hill Education Pvt Ltd, New Delhi.
2) Koutsoyiannis, A: Theory of Econometrics, Second Edition. Macmillan Press Ltd, London.
3) Shyamala, S, Navadeep Kaur and Arul Pragasam: A Text Book on Econometrics: Theory and Applications. Vishal Publishing Co, Delhi.
4) Gregory C Chow: Econometrics. McGraw Hill Book Co, Singapore.
5) Madnani, G M K: Introduction to Econometrics: Principles and Applications. Oxford and IBH Publishing Co, New Delhi.
