
ASSIGNMENT-5

CLRM (Assumptions with explanations)

ABDULSATTAR GORAYA
BBA-VII-C
SUBMITTED TO: DR.WAQAR AKRAM & MAAM SUMAN SHEIKH

The Gaussian, standard, or classical linear regression model (CLRM):


The classical linear regression model is a statistical model that describes a data generation
process. The specification of the classical linear regression model is defined by the following set
of assumptions.
Assumption 1: Linear regression model. The regression model is linear in the parameters.
Assumption 2: X values are fixed in repeated sampling. Values taken by the regressor X are
considered fixed in repeated samples. More technically, X is assumed to be non-stochastic.
Assumption 3: Zero mean value of disturbance Ui. Given the value of X, the mean, or expected,
value of the random disturbance term Ui is zero. Technically, the conditional mean value of Ui is
zero. Symbolically, we have E (Ui|Xi) = 0
Assumption 4: Homoscedasticity or equal variance of Ui. Given the value of X, the variance of
Ui is the same for all observations. That is, the conditional variances of Ui are identical.
Assumption 5: No autocorrelation between the disturbances. Given any two X values, Xi and Xj
(i ≠ j), the correlation between any two Ui and Uj (i ≠ j) is zero.
Assumption 6: Zero covariance between Ui and Xi, or E (UiXi) =0.
Assumption 7: The number of observations n must be greater than the number of parameters to
be estimated. Alternatively, the number of observations n must be greater than the number of
explanatory variables.
Assumption 8: Variability in X values. The X values in a given sample must not all be the same.
Technically, var(X) must be a finite positive number.
Assumption 9: The regression model is correctly specified. Alternatively, there is no
specification bias or error in the model used in empirical analysis.
Assumption 10: There is no perfect multicollinearity. That is, there are no perfect linear
relationships among the explanatory variables.

Assumptions with Explanation:


Assumption 1: Linear regression model. The regression model is linear in the parameters.
Yi = β1 + β2Xi + Ui
Assumption 1 requires that the dependent variable Y is a linear combination of the explanatory
variables X and the error term (epsilon). Assumption 1 requires the specified model to be linear
in parameters, but it does not require the model to be linear in variables. Equations 1 and 2 depict
a model that is linear in both parameters and variables. Note that Equations 1 and 2 show the
same model in different notation, the disturbance being written as u in the first and as ε (epsilon)
in the second:

Yi = β1 + β2Xi + ui (1)
Yi = β1 + β2Xi + εi (2)
Suppose we have two random variables X and Y, and we assume that Y depends on X; that is, when
variable X takes a specific value, we expect a response in the random variable Y. In other words,
the value taken by X influences the value of Y, so X is the independent variable and Y is the
dependent variable. If we plot the data, we find that Y depends on X: for example, higher
expenditure is seen to be associated with higher income. However, since we do not know the
precise form of the dependence, what we can do is assume the simplest possible relation, a linear
one. In other words, we fit a straight line that most closely represents the plot. The points
on the straight line give us the expected values of Y for every possible value of X.
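The idea of fitting the straight line that most closely represents the plot can be sketched with ordinary least squares. The income and expenditure figures below are invented for illustration only:

```python
import numpy as np

# Hypothetical weekly family income (X) and consumption expenditure (Y),
# invented for illustration only.
X = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], dtype=float)
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)

# Fit the straight line Y = b1 + b2*X by ordinary least squares.
b2, b1 = np.polyfit(X, Y, deg=1)  # polyfit returns the highest-degree coefficient first

# Expected (fitted) values of Y for every X value in the sample.
Y_hat = b1 + b2 * X
print(f"intercept b1 = {b1:.2f}, slope b2 = {b2:.4f}")
```

Each fitted value on the line is the expected consumption expenditure for that income level.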
Assumption 2: X values are fixed in repeated sampling. Values taken by the regressor X may be
considered fixed in repeated samples (the case of fixed regressors) or they may be sampled along
with the dependent variable Y (the case of stochastic regressors). In the latter case, it is assumed
that the X variable(s) and the error term are independent, that is, cov(Xi, Ui) = 0.
Consider the various Y populations corresponding to the levels of income. Keeping the value of
income X fixed, say, at level $80, we draw at random a family and observe its weekly family
consumption expenditure Y as, say, $60. Still keeping X at $80, we draw at random another
family and observe its Y value as $75. In each of these drawings (i.e., repeated sampling), the
value of X is fixed at $80. We can repeat this process for all the X values. As a matter of fact, the
sample data were drawn in this fashion. What all this means is that our regression analysis is
conditional regression analysis, that is, conditional on the given values of the regressor(s) X.
For example: A farmer may divide his land into several parcels and apply different amounts of
fertilizer to these parcels to see its effect on crop yield. Likewise, a department store may decide
to offer different rates of discount on a product to see its effect on consumers. Sometimes we
may want to fix the X values for a specific purpose.
Assumption 3: Zero mean value of disturbance Ui. Given the value of X, the mean, or expected,
value of the random disturbance term Ui is zero. Technically, the conditional mean value of Ui is
zero.
Symbolically, we have E (Ui|Xi) = 0

Assumption 3 simply says that the factors not explicitly included in the model, and therefore
subsumed in Ui, do not systematically affect the mean value of Y; in other words, the positive Ui
values cancel out the negative Ui values so that their average or mean effect on Y is zero.
It is important to point out that Assumption 3 implies that there is no specification bias or
specification error in the model used in empirical analysis. In other words, the regression model
is correctly specified. Leaving out important explanatory variables, including unnecessary
variables, or choosing the wrong functional form of the relationship between the Y and X
variables are some examples of specification error.
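A quick way to see the sample analogue of this assumption: when the fitted model includes an intercept, the OLS residuals average to (numerically) zero by construction, mirroring E(Ui|Xi) = 0. The data below are simulated under assumed parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=200)
# True disturbances drawn with mean zero, satisfying Assumption 3.
Y = 3.0 + 2.0 * X + rng.normal(0, 1, size=200)

b2, b1 = np.polyfit(X, Y, deg=1)
residuals = Y - (b1 + b2 * X)

# With an intercept in the model, the residuals sum to zero by construction:
# positive residuals cancel out negative ones.
print(f"mean residual = {residuals.mean():.2e}")
```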
Assumption 4: Homoscedasticity or equal variance of Ui. Given the value of X, the variance of
Ui is the same for all observations. That is, the conditional variances of Ui are identical.
Var(Ui|Xi) = E[Ui − E(Ui|Xi)]²
= E(Ui²|Xi), because of Assumption 3
= σ²
The equation above states that the variance of Ui for each Xi (i.e., the conditional variance of Ui) is
some positive constant number equal to σ². Technically, the equation represents the assumption of
homoscedasticity, or equal (homo) spread (scedasticity), or equal variance. The word comes from
the Greek verb skedannumi, which means to disperse or scatter. Stated differently, the equation means
that the Y populations corresponding to various X values have the same variance. Put simply, the
variation around the regression line (which is the line of average relationship between Y and X)
is the same across the X values; it neither increases nor decreases as X varies.
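The contrast between equal and unequal conditional variances can be sketched with simulated disturbances; the particular distributions below are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
X = rng.uniform(1, 10, size=n)

u_homo = rng.normal(0, 2.0, size=n)        # homoscedastic: Var(u|X) = 4 for every X
u_hetero = rng.normal(0, 1.0, size=n) * X  # heteroscedastic: Var(u|X) = X**2, grows with X

# Compare the conditional variance of u for small vs large X values.
low, high = X < 3, X > 8
print("homoscedastic  :", u_homo[low].var(), u_homo[high].var())    # both near 4
print("heteroscedastic:", u_hetero[low].var(), u_hetero[high].var())
```

Under homoscedasticity the spread is the same at every X; in the heteroscedastic case the variance clearly increases with X, violating Assumption 4.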
Assumption 5: No autocorrelation between the disturbances. Given any two X values, Xi and Xj
(i ≠ j), the correlation between any two Ui and Uj (i ≠ j) is zero. In short, the observations are
sampled independently.
Symbolically, cov(ui, uj | Xi, Xj) = 0, or
cov(ui, uj) = 0 if X is non-stochastic,
where i and j are two different observations and where cov means covariance. This is the
assumption of no serial correlation, or no autocorrelation. It means that, given Xi, the deviations
of any two Y values from their mean value do not exhibit patterns. If the u's are positively
correlated, a positive u tends to be followed by a positive u and a negative u by a negative u; if
the u's are negatively correlated, a positive u tends to be followed by a negative u, and vice versa.
Whether this assumption holds depends on the type of data used in the analysis. If the data are
cross-sectional and are obtained as a random sample from the relevant population, this
assumption can often be justified. However, if the data are time series, the assumption of
independence is difficult to maintain, for successive observations of a time series, such as GDP,
are highly correlated.
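The difference between independent and serially correlated disturbances can be illustrated with a simulated AR(1) process; the coefficient 0.8 is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Independent disturbances, as in a cross-sectional random sample.
u_iid = rng.normal(size=n)

# Serially correlated disturbances, typical of time series:
# u_t = 0.8 * u_{t-1} + e_t  (an AR(1) process)
u_ar = np.empty(n)
u_ar[0] = rng.normal()
for t in range(1, n):
    u_ar[t] = 0.8 * u_ar[t - 1] + rng.normal()

def lag1_corr(u):
    """Sample correlation between u_t and u_{t-1}."""
    return np.corrcoef(u[:-1], u[1:])[0, 1]

print(f"iid   lag-1 correlation: {lag1_corr(u_iid):.3f}")  # near 0
print(f"AR(1) lag-1 correlation: {lag1_corr(u_ar):.3f}")   # near 0.8
```

For the independent draws a positive u carries no information about the next u; for the AR(1) series a positive u tends to be followed by another positive u, violating Assumption 5.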
Assumption 6: Zero covariance between Ui and Xi, or E(UiXi) = 0. That is, the disturbance term
and the regressor are uncorrelated, so the omitted factors subsumed in Ui bear no systematic
relationship to X.
Assumption 7: The number of observations n must be greater than the number of parameters to
be estimated. Alternatively, the number of observations must be greater than the number of
explanatory variables.
This assumption is not as innocuous as it seems. Imagine that we had only the first pair of
observations on Y and X (4 and 1). From this single observation there is no way to estimate the
two unknowns, β1 and β2. We need at least two pairs of observations to estimate the two
unknowns.
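A minimal sketch of why a single observation cannot pin down two parameters, using the pair (X, Y) = (1, 4) from the text:

```python
# One observation (X, Y) = (1, 4): two unknowns, one equation.
x0, y0 = 1.0, 4.0

# Infinitely many (b1, b2) pairs fit this single point exactly:
for b2 in [0.0, 1.0, 2.5, -3.0]:
    b1 = y0 - b2 * x0          # choose the intercept so the line passes through the point
    assert b1 + b2 * x0 == y0  # every such line reproduces the observation perfectly
    print(f"b1 = {b1:+.1f}, b2 = {b2:+.1f} fits (1, 4) exactly")
```

With only one data point, every slope is consistent with the data, so neither parameter is identified; a second, distinct observation is the minimum needed.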
Assumption 8: Variability in X values. The X values in a given sample must not all be the same.
Technically, var(X) must be a finite positive number. Furthermore, there can be no outliers in the
values of the X variable, that is, values that are very large in relation to the rest of the
observations.
This assumption too is not as innocuous as it looks. If all the X values are identical, then Xi = X̄
and the denominator of the slope formula will be zero, making it impossible to estimate β2 and
therefore β1. Intuitively, we readily see why this assumption is important. Looking at our family
consumption expenditure example, if there is very little variation in family income, we will not
be able to explain much of the variation in the consumption expenditure. The reader should keep
in mind that variation in both Y and X is essential to use regression analysis as a research tool. In
short, the variables must vary.
Assumption 9: The regression model is correctly specified. Alternatively, there is no
specification bias or error in the model used in empirical analysis.
This assumption can be explained informally as follows. An econometric investigation begins
with the specification of the econometric model underlying the phenomenon of interest.
Assumption 10: There is no perfect multicollinearity. That is, there are no perfect linear
relationships among the explanatory variables.
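A short sketch of how perfect multicollinearity manifests computationally: the data matrix loses full column rank, so X′X is singular and the OLS formula (X′X)⁻¹X′y breaks down. The variables below are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = 2.0 * x1            # perfect linear relationship: X2 = 2 * X1
x3 = rng.normal(size=n)  # no exact linear tie to x1

X_collinear = np.column_stack([np.ones(n), x1, x2])  # violates Assumption 10
X_ok = np.column_stack([np.ones(n), x1, x3])         # satisfies it

# With perfect multicollinearity the data matrix has deficient column rank,
# so X'X cannot be inverted and the coefficients are not identified.
print(np.linalg.matrix_rank(X_collinear))  # 2, not 3
print(np.linalg.matrix_rank(X_ok))         # 3
```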
