Two variable regression ppt

© All Rights Reserved

11 visualizações

Two variable regression ppt

© All Rights Reserved

- Probability of Detection AHi
- CCP403
- PLS-Proc
- Coefficient of Determination
- tmp6AD4.tmp
- 16_siddareddy.Bathini_13
- hbpl19_02
- Gravity Model Final
- 1 agdd
- 56888-204036-1-PB
- bus 173
- nfsc440-and evidence analysis worksheet
- STA2020F+Test+2+2009
- The Major Role Accountants Play
- Lecture 01
- Gibrat's Law
- Regression_notes.pdf
- rough draft feasibility
- DecisionTree.doc
- openSAP_ds1_Week_3_Transcript.pdf

Você está na página 1de 32

TWO-VARIABLE REGRESSION

ANALYSIS:

SOME BASIC IDEAS

By

Domodar N. Gujarati

Regression analysis is concerned with the study of the

dependence of one variable, the dependent variable, on one or

more other variables, the explanatory variables.

Regression analysis, as known as the bivariate, or two

variable.

Dependent variable (the regressand)

Explanatory variable (the regressor)

3

Estimating and/or predicting the (population) mean

value of the dependent variable on the basis of the

known or fixed values of the explanatory variable(s).

Total population of 60 families and their weekly

income (X) and weekly consumption

expenditure (Y). The 60 families are divided into

10 income groups.

Consumption Expenditure is increasing as

income increases, but not as much as the

increase in income.

The marginal propensity to consume (MPC) for a

unit change in income is greater than zero but less

than unit

Y =

1

+

2

X

0 <

2

< 1

1

and

2

are parameters whereas

1

is intercept and

2

is slope

The dark circled points show the

conditional mean values of Y against

the various X values.

By joining them we obtain what is

known as the population regression line

(PRL), or more generally, the

population regression curve

(regression of Y on X).

As weekly income level of $80, the mean

consumption expenditure is $65,where

total is 325 while corresponding to the

income level of $200, it is expenditure is

$137 where total is 685.

Total 10 mean values for the 10

subpopulations of Y which is conditional

expected values, as they depend on the

values of the (conditioning) variable X.

Symbolically

E(Y | X)

The expected value of Y given the value of X.

This figure shows that for each X (i.e., income

level) there is a population of Y values (weekly

consumption expenditures) that are spread

around the (conditional) mean of those Y values.

Y values are distributed symmetrically around

their respective (conditional) mean values. And

the regression line (or curve) passes through

these (conditional) mean values

10

Each conditional mean E(Y | X

i

) is a function of X

i

.

Symbolically

E(Y | X

i

) = f (X

i

)

which is known as the conditional expectation

function (CEF) or population regression function

(PRF) or population regression (PR) for short.

PRF is an empirical question, we may assume that

the PRF E(Y | X

i

) is a linear function of X

i

E(Y | X

i

) =

1

+

2

X

i

Linearity in the Variables

Linearity is that the conditional expectation of Y is a

linear function of X

i

, the regression curve in this case is a

straight line. But it is not a linear function

E(Y | X

i

) =

1

+

2

X

2

i

Linearity in the Parameters

Linearity is that the conditional expectation of Y, E(Y |

X

i

), is a linear function of the parameters, the s; it may

or may not be linear in the variable X.

E(Y | X

i

) =

1

+

2

X

2

i

It is a linear (in the parameter) regression model.

Now consider the model:

E(Y | X

i

) =

1

+

2

2

X

i

.

The preceding model is an example of a nonlinear (in

the parameter) regression model.

Linear Regression will always mean a regression

that is linear in the parameters; the s (that is, the

parameters are raised to the first power only).

The deviation of an individual Y

i

around its expected

value as follows:

u

i

= Y

i

E(Y | X

i

)

or

Y

i

= E(Y | X

i

) + u

i

Technically, u

i

is known as the stochastic disturbance

or stochastic error term.

How do we interpret ?

The expenditure of an individual family, given its

income level, can be expressed as the sum of two

components:

E(Y | X

i

), the mean consumption of all families

with the same level of income. This component is

known as the systematic, or deterministic,

component,

u

i

, which is the random, or nonsystematic,

component.

The Stochastic disturbance term is a proxy for all the

omitted or neglected variables that may affect Y but are

not included in the regression model.

If E(Y | X

i

) is assumed to be linear in X

i

, may be written as:

Y

i

= E(Y | X

i

) + u

i

=

1

+

2

X

i

+ u

i

This Equation posits that the consumption expenditure of

a family is linearly related to its income plus the

disturbance term. Thus, the individual consumption

expenditures, given X = $80 can be expressed as:

Y1 = 55 =

1

+

2

(80) + u

1

Y2 = 60 =

1

+

2

(80) + u

2

Y3 = 65 =

1

+

2

(80) + u

3

Y4 = 70 =

1

+

2

(80) + u

4

Y5 = 75 =

1

+

2

(80) + u

5

Now if we take the expected value of (2.4.1) on both sides,

we obtain

E(Y

i

| X

i

) = E[E(Y | X

i

)] + E(u

i

| X

i

)

= E(Y | X

i

) + E(u

i

| X

i

)

Where expected value of a constant is that constant itself.

Since E(Y

i

| X

i

) is the same thing as E(Y | X

i

), implies that

E(u

i

| X

i

) = 0

Thus, the assumption that the regression line passes

through the conditional means of Y implies that the

conditional mean values of u

i

(conditional upon the given

Xs) are zero.

It is clear that

E(Y | X

i

) =

1

+

2

X

i

and

Y

i

=

1

+

2

X

i

+ u

i

Better are equivalent forms if E(u

i

| X

i

) = 0.

But the stochastic specification (2.4.2)

has the advantage that it clearly shows

that there are other variables besides

income that affect consumption

expenditure and that an individual

familys consumption expenditure

cannot be fully explained only by the

variable(s) included in the regression

model.

Why dont we introduce them into the model explicitly? The

reasons are many:

1. Vagueness of theory: The theory, if any, determining the

behavior of Y may be, and often is, incomplete. We might be

ignorant or unsure about the other variables affecting Y.

2. Unavailability of data: Lack of quantitative information

about these variables, e.g., information on family wealth

generally is not available.

3. Core variables versus peripheral variables: Assume that

besides income X

1

, the number of children per family X

2

, sex X

3

,

religion X

4

, education X

5

, and geographical region X

6

also affect

consumption expenditure. But the joint influence of all or

some of these variables may be so small and it does not pay

to introduce them into the model explicitly. One hopes that

their combined effect can be treated as a random variable ui.

4. Intrinsic randomness in human behavior: Even if we

succeed in introducing all the relevant variables into the

model, there is bound to be some intrinsic randomness in

individual Ys that cannot be explained no matter how hard

we try. The disturbances, the us, may very well reflect this

intrinsic randomness.

5. Poor proxy variables: For example, Friedman regards

permanent consumption (Y

p

) as a function of permanent

income (X

p

). But since data on these variables are not directly

observable, in practice we use proxy variables, such as

current consumption (Y) and current income (X), there is the

problem of errors of measurement, u may in this case then

also represent the errors of measurement.

6. Principle of parsimony: we would like to keep our

regression model as simple as possible. If we can explain the

behavior of Y substantially with two or three explanatory

variables and if our theory is not strong enough to suggest

what other variables might be included, why introduce more

variables? Let u

i

represent all other variables.

7. Wrong functional form: Often we do not know the form

of the functional relationship between the regressand

(dependent) and the regressors. Is consumption

expenditure a linear (in variable) function of income or a

nonlinear (invariable) function? If it is the former,

Y

i

=

1

+ B

2

X

i

+ u

i

is the proper functional relationship

between Y and X, but if it is the latter,

Y

i

=

1

+

2

X

i

+

3

X

2

i

+ u

i

may be the correct functional form.

In two-variable models the functional form of the

relationship can often be judged from the scattergram.

But in a multiple regression model, it is not easy to

determine the appropriate functional form, for graphically

we cannot visualize scattergrams in multipledimensions.

The data of Table 2.1 represent the population, not a sample. In

most practical situations what we have is a sample of Y values

corresponding to some fixed Xs.

Pretend that the population of Table 2.1 was not known to us and

the only information we had was a randomly selected sample of Y

values for the fixed Xs as given in Table 2.4. each Y (given X

i

) in

Table 2.4 is chosen randomly from similar Ys corresponding to the

same X

i

from the population of Table 2.1.

Can we estimate the PRF from the sample data? We may not be

able to estimate the PRF accurately because of sampling

fluctuations. To see this, suppose we draw another random

sample from the population of Table 2.1, as presented in Table 2.5.

Plotting the data of Tables 2.4 and 2.5, we obtain the scatter gram

given in Figure 2.4. In the scatter gram two sample regression

lines are drawn so as

Which of the two regression lines represents the true

population regression line? There is no way we can be

absolutely sure that either of the regression lines

shown in Figure 2.4 represents the true population

regression line (or curve). Supposedly they represent

the population regression line, but because of

sampling fluctuations they are at best an

approximation of the true PR. In general, we would

get N different SRFs for N different samples, and these

SRFs are not likely to be the same.

We can develop the concept of the sample

regression function (SRF) to represent the sample

regression line. The sample counterpart may be

written as

Y

i

=

1

+

2

X

i

where Y is read as Y-hat or Y-cap

Y

i

= estimator of E(Y | X

i

)

1

= estimator of

1

2

= estimator of

2

An estimator, also known as a (sample) statistic, is

simply a rule or formula or method that tells how to

estimate the population parameter from the

information provided by the sample at hand.

Now just as we expressed the PRF in two equivalent

forms, (2.2.2) and (2.4.2), we can express the SRF (2.6.1) in

its stochastic form as follows:

Y

i

=

1

+

2

X

i

+u

i

u

i

denotes the (sample) residual term. Conceptually u

i

is

analogous to u

i

and can be regarded as an estimate of u

i

. It is

introduced in the SRF for the same reasons as u

i

was

introduced in the PRF.

To sum up, then, we find our primary objective in

regression analysis is to estimate the PRF

Y

i

=

1

+

2

X

i

+ u

i

on the basis of the SRF

Y

i

=

1

+

2

X

i

+u

i

Because more often than not our analysis is based upon a

single sample from some population. But because of

sampling fluctuations our estimate of

the PRF based on the SRF is at best an approximate

one. This approximation is shown diagrammatically

in Figure 2.5. For X = X

i

, we have one (sample)

observation Y = Y

i

. In terms of the SRF, the observed

Y

i

can be expressed as:

Y

i

= Y

i

+u

i

(2.6.3)

and in terms of the PRF, it can be expressed as

Y

i

= E(Y | X

i

) + u

i

(2.6.4)

Now obviously in Figure 2.5 Y

i

overestimates the

true E(Y | X

i

) for the X

i

shown therein. By the same

token, for any X

i

to the left of the point A, the SRF

will underestimate the true PRF.

The Critical Question

Granted that the SRF is but an approximation

of the PRF, can we devise a rule or a method

that will make this approximation as close

as possible? In other words, how should the

SRF be constructed so that

1

is as close as

possible to the true

1

and

2

is as close as

possible to the true

2

even though we will

never know the true

1

and

2

?

32

- Probability of Detection AHiEnviado porPDDELUCA
- CCP403Enviado porapi-3849444
- PLS-ProcEnviado porKalyan Raghav Bollampally
- Coefficient of DeterminationEnviado porJazib Ali Javed
- tmp6AD4.tmpEnviado porFrontiers
- 16_siddareddy.Bathini_13Enviado poriiste
- hbpl19_02Enviado porSandra Radic
- Gravity Model FinalEnviado porAkshay Aggarwal
- 1 agddEnviado porKruh Kwabena Isaac
- 56888-204036-1-PBEnviado porahgilan
- bus 173Enviado porsalim_sajid
- nfsc440-and evidence analysis worksheetEnviado porapi-242439244
- STA2020F+Test+2+2009Enviado porAnn Kim
- The Major Role Accountants PlayEnviado porZlatan Mesanovic
- Lecture 01Enviado pornarendra
- Gibrat's LawEnviado porSandiep Singh
- Regression_notes.pdfEnviado porRavi Rathod
- rough draft feasibilityEnviado porapi-341020751
- DecisionTree.docEnviado porPham Tin
- openSAP_ds1_Week_3_Transcript.pdfEnviado porqwerty_qwerty_2009
- DETERMINANTS OF MARKET PRICE OF GOATS IN CASE OF ASSAYITA MARKET, AFAR REGION, ETHIOPIAEnviado porAnonymous 6uJ2smoJXL
- exam1Enviado porAmalAbdlFattah
- 3277941Enviado porawidyas
- POS 260 Proposal Paper (Edited)Enviado porEugene Albert Olarte Javillonar
- My ResumeEnviado porAnonymous aN2vf9Zn0q
- Empirical ReviewEnviado porchioma
- Stock SelectionEnviado porMonchai Phaichitchan
- Albanese 2008 A metric method for sex determination using the proximal femur and fragmentary hipbone.pdfEnviado porManu Juanma
- MATH F432Enviado porChetan Chauhan
- Certificación OHSAS VS AccidentesEnviado porjimmy ortiz

- McDonaldsEnviado porWajiha Asad Kiyani
- Lecture 2A HTMLEnviado porWajiha Asad Kiyani
- Project MgmtEnviado porWajiha Asad Kiyani
- Manager's Guide 4th Ed 09.pptxEnviado porFaiza Shah
- Water Management System in PakistanEnviado porWajiha Asad Kiyani
- Summary of Leadership TheoriesEnviado porWajiha Asad Kiyani
- Role of World BankEnviado porWajiha Asad Kiyani
- Bank Alfalah ProjectEnviado porWajiha Asad Kiyani
- Manual of FDI PakEnviado porrazzaqambreen2806
- Fact Sheet.docxEnviado porWajiha Asad Kiyani
- finalllllllEnviado porZohaib_Ahmad_3063
- Development EconomicsEnviado porWajiha Asad Kiyani
- EFinal QuestionnaireEnviado porWajiha Asad Kiyani
- Basel Accords AssignmentEnviado porWajiha Asad Kiyani
- The Determinants of Foreign Direct Investment in PakistanEnviado porusmanfsd99
- Impact of Incentives on ProductivityEnviado porWajiha Asad Kiyani
- Engro Corporation Limited Project FinalEnviado porWajiha Asad Kiyani
- Car Selection AhpEnviado porWajiha Asad Kiyani
- project.docxEnviado porWajiha Asad Kiyani
- Waseem Variability of BetaEnviado porWajiha Asad Kiyani
- Impact of Employee Training and Development on ProductivityEnviado porWajiha Asad Kiyani
- ConflictEnviado porWajiha Asad Kiyani
- Sanam Nasim Interpetation of BetaEnviado porWajiha Asad Kiyani
- Case Study Notes_CSIEnviado porRinder Sidhu
- Business PlanEnviado porWajiha Asad Kiyani
- MBA711 - Chapter 10 - Answers to All QuestionsEnviado porWajiha Asad Kiyani
- Econometrics Chap 2Enviado porWajiha Asad Kiyani

- Introduction to ML TheoryEnviado porTanat Tonguthaisri
- psy 233 course outline (u of s)Enviado porapi-290174387
- One sample test of hypotensisEnviado porDanang Dwi Cahyadi
- excelEnviado porAnanna Ananna
- RStata.pdfEnviado porpabloti
- MIL-STD-105E_text.pdfEnviado porAntonios Papadopoulos
- 0471925179_engEnviado porMentari Salma Nurbaiti
- Explain why business organizations use statistical analysis of market dataEnviado porThung Wong
- statistics syllabusEnviado porapi-240639978
- Reliabilitas WinniEnviado porzahratharfi
- Https Tutorials Iq Harvard Edu R Rstatistics Rstatistics HTMLEnviado porcsscs
- Minitab BookEnviado porangelokyo
- Panel Data, Stratified Samples, And More Efficient EstimatorsEnviado porKevin Walker
- (Boos, Hughes-Oliver) Applications of Basu's TheoremEnviado porJohann Sebastian Claveria
- Quartiles for Box PlotsEnviado porcmollinedoa
- Analisis Data Kuantitatif - PengenalanEnviado porIli Nurízzati Abdul Rahman
- 15 Building Regression Models Part2Enviado porRama Dulce
- Excel Pronto PizzaEnviado porPatty
- Confidence IntervalsEnviado porzuber2111
- Measures of PositionsadasEnviado porBen Bash Sarapuddin
- Dt 33726730Enviado porAnonymous 7VPPkWS8O
- Bootstrap for Panel Data (Ppt), HounkannounonEnviado poranon_183065072
- metaEnviado pormphil.ramesh
- Definition Mathsppt-140225094645-Phpapp01Enviado porpradeep
- Lab Report Zoology 2Enviado porRebecca Uy
- On-tests-and-indices-for-evaluating-structural-models_2007_Personality-and-Individual-Differences.pdfEnviado porLuis Anunciacao
- Fin06answers JwEnviado porsimao_sabrosa7794
- A new look at the statistical model identification.pdfEnviado porAnonymous kiXSYsJ
- Tutorial Questions 1Enviado porayariseifallah
- Simplis ExamplesEnviado portm515

## Muito mais do que documentos

Descubra tudo o que o Scribd tem a oferecer, incluindo livros e audiolivros de grandes editoras.

Cancele quando quiser.