
Economics 2110 Fall Semester 2007

Marcelo J. Moreira Harvard University

Lect. 6: Joint Distributions, Conditional Distributions, Independence of Random Variables

Definition 1 An $n$-dimensional random variable is a function from the sample space $\Omega$ to $\mathbb{R}^n$.

Definition 2 The cumulative distribution function for a bivariate random vector $(X, Y)$ is
$$F_{X,Y}(x, y) = \Pr(\{\omega \in \Omega \mid X(\omega) \le x,\ Y(\omega) \le y\}).$$

Definition 3 Let $(X, Y)$ be a discrete bivariate vector. Then the function $f_{XY}(x, y)$ from $\mathbb{R}^2$ to $\mathbb{R}$ defined by
$$f_{XY}(x, y) = \Pr(\{\omega \in \Omega \mid X(\omega) = x,\ Y(\omega) = y\})$$
is the joint probability mass function of $(X, Y)$.

Example A: Toss a coin three times. The sample space is
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$
An example of a bivariate random variable is $(X, Y)$, where $X(\omega)$ is the number of heads in the first two tosses and $Y(\omega)$ is the number of heads in the last two tosses. Assuming the coin is fair, so that the probability of heads is 1/2 at every toss and different tosses are independent events, the joint pmf is


$$f_{XY}(x, y) = \begin{cases}
1/8 & \text{if } (x, y) = (0, 0) \quad (\omega \in \{TTT\}), \\
1/8 & \text{if } (x, y) = (0, 1) \quad (\omega \in \{TTH\}), \\
1/8 & \text{if } (x, y) = (1, 0) \quad (\omega \in \{HTT\}), \\
2/8 & \text{if } (x, y) = (1, 1) \quad (\omega \in \{HTH, THT\}), \\
1/8 & \text{if } (x, y) = (1, 2) \quad (\omega \in \{THH\}), \\
1/8 & \text{if } (x, y) = (2, 1) \quad (\omega \in \{HHT\}), \\
1/8 & \text{if } (x, y) = (2, 2) \quad (\omega \in \{HHH\}), \\
0 & \text{otherwise.}
\end{cases}$$
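Since each pmf value is just a count of outcomes divided by 8, the table can be checked by brute-force enumeration. Here is a minimal Python sketch (illustrative code, not part of the original notes):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Enumerate the 8 equally likely outcomes of three fair coin tosses and
# tabulate the joint pmf of X (heads in tosses 1-2) and Y (heads in tosses 2-3).
joint = Counter()
for w in ("".join(t) for t in product("HT", repeat=3)):
    x = w[:2].count("H")   # heads in the first two tosses
    y = w[1:].count("H")   # heads in the last two tosses
    joint[(x, y)] += Fraction(1, 8)

for xy in sorted(joint):
    print(xy, joint[xy])   # (1, 1) gets 2/8 = 1/4; all other support points get 1/8
```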


The fact that we define two random variables does not change the definition of the probability mass function for either variable. Ignoring the definition of $Y(\omega)$, the pmf of $X$ is


$$f_X(x) = \begin{cases}
1/4 & x = 0 \quad (\omega \in \{TTT, TTH\}), \\
2/4 & x = 1 \quad (\omega \in \{HTH, HTT, THH, THT\}), \\
1/4 & x = 2 \quad (\omega \in \{HHT, HHH\}), \\
0 & \text{elsewhere.}
\end{cases}$$
Similarly we can calculate the pmf of $Y$. In the context of joint random variables we typically refer to these single-variable probability mass functions as marginal probability mass functions. In general they can be calculated from the joint pmf as
$$f_X(x) = \Pr(\{\omega \in \Omega \mid X(\omega) = x\}) = \Pr(\{\omega \in \Omega \mid X(\omega) = x,\ -\infty < Y(\omega) < \infty\}) = \sum_y f_{XY}(x, y),$$
and
$$f_Y(y) = \sum_x f_{XY}(x, y).$$

Given the joint pmf we can calculate probabilities involving both random variables. For example,
$$\Pr(X \le 1, Y \ge 1) = \Pr(\{\omega \in \Omega \mid X(\omega) \le 1,\ Y(\omega) \ge 1\}) = \sum_{x \le 1,\, y \ge 1} f_{XY}(x, y) = f(0, 1) + f(1, 1) + f(1, 2) = 1/2.$$
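Both the marginals and event probabilities of this kind are sums over the joint pmf, which is easy to mechanize. A short Python sketch (illustrative, not from the notes):

```python
from fractions import Fraction

# Joint pmf of Example A, stored on its support points only.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (1, 0): Fraction(1, 8),
         (1, 1): Fraction(2, 8), (1, 2): Fraction(1, 8), (2, 1): Fraction(1, 8),
         (2, 2): Fraction(1, 8)}

# Marginal of X: sum the joint pmf over y.
f_X = {x0: sum(p for (x, y), p in joint.items() if x == x0) for x0 in (0, 1, 2)}
print(f_X)   # {0: 1/4, 1: 1/2, 2: 1/4}

# Pr(X <= 1, Y >= 1): sum the joint pmf over the event.
print(sum(p for (x, y), p in joint.items() if x <= 1 and y >= 1))   # 1/2
```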


Alternatively we can have jointly continuous random variables.

Definition 4 Let $(X, Y)$ be a continuous bivariate vector. Then the function $f_{XY}(x, y)$ from $\mathbb{R}^2$ to $\mathbb{R}$ is the joint probability density function of $(X, Y)$ if it satisfies
$$\int\!\!\int_A f_{XY}(x, y)\, dx\, dy = \Pr(\{\omega \in \Omega \mid (X(\omega), Y(\omega)) \in A\}),$$
for all sets $A$.


Example B: Pick a point randomly in the triangle with vertices $(0, 0)$, $(0, 1)$, $(1, 0)$. Let $X$ be the $x$-coordinate and $Y$ be the $y$-coordinate. A reasonable choice for the joint pdf appears to be
$$f_{XY}(x, y) = c,$$
for $\{(x, y) \mid 0 < x < 1,\ 0 < y < 1,\ x + y < 1\}$ and zero otherwise (that is, constant on the domain). What is the appropriate value for $c$? Note that we know
$$1 = \Pr(\omega \in \Omega) = \Pr\big((X, Y) \in \{(x, y) \mid 0 < x < 1,\ 0 < y < 1,\ x + y < 1\}\big)$$
$$= \int\!\!\int_{\{(x,y) \mid 0<x<1,\, 0<y<1,\, x+y<1\}} f_{XY}(x, y)\, dx\, dy = \int_0^1 \int_0^{1-y} c\, dx\, dy = c \int_0^1 (1 - y)\, dy = c/2.$$
Hence $c = 2$.
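As a quick numerical sanity check, the normalization can be confirmed with a standard quadrature routine. A minimal sketch in Python (assuming SciPy is available; the code is illustrative, not part of the notes):

```python
from scipy.integrate import dblquad

# Check that c = 2 integrates to one over the triangle. dblquad integrates
# func(y, x) with x outer in (0, 1) and y inner in (0, 1 - x).
total, err = dblquad(lambda y, x: 2.0, 0, 1, lambda x: 0.0, lambda x: 1.0 - x)
print(total)   # 1.0 up to quadrature error
```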
Note here that in general we can write such integrals in different ways:
$$\int\!\!\int_{\{(x,y) \mid 0<x<1,\, 0<y<1,\, x+y<1\}} g(x, y)\, dx\, dy = \int_0^1 \int_0^{1-y} g(x, y)\, dx\, dy = \int_0^1 \int_0^{1-x} g(x, y)\, dy\, dx.$$

The marginal pdfs are in this case
$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy = \int_0^{1-x} 2\, dy = 2 - 2x,$$
for $0 < x < 1$ and zero otherwise. Similarly,
$$f_Y(y) = 2 - 2y, \quad 0 < y < 1, \text{ and zero otherwise.}$$
Conditional Probability

Recall how the conditional probability of events was defined:
$$\Pr(E_1 \mid E_2) = \frac{\Pr(E_1 \cap E_2)}{\Pr(E_2)},$$
provided $\Pr(E_2) > 0$. We can do the same thing for bivariate discrete random variables:


Definition 5 Given two discrete random variables $X$ and $Y$ with joint probability mass function $f_{XY}(x, y)$, the conditional probability mass function for $X$ given $Y = y$ is
$$f_{X|Y}(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)},$$
provided $f_Y(y) > 0$.

Example A (cont'd): Consider the distribution of $X$ given that $Y = 1$. In that case we are conditioning on $\{HTH, HHT, THT, TTH\}$:
$$f_{X|Y}(x|Y = 1) = \begin{cases}
1/4 & x = 0 \quad (\omega \in \{TTH\}), \\
2/4 & x = 1 \quad (\omega \in \{HTH, THT\}), \\
1/4 & x = 2 \quad (\omega \in \{HHT\}), \\
0 & \text{elsewhere.}
\end{cases}$$
Note that this is the same as the marginal pmf of $X$. This is not true if we condition on $Y = 2$. In that case we condition on $\{HHH, THH\}$:
$$f_{X|Y}(x|Y = 2) = \begin{cases}
1/2 & x = 1 \quad (\omega \in \{THH\}), \\
1/2 & x = 2 \quad (\omega \in \{HHH\}), \\
0 & \text{elsewhere.}
\end{cases}$$
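Both conditional pmfs follow mechanically from the joint pmf by dividing by the marginal. A short Python sketch (illustrative; the helper name is hypothetical):

```python
from fractions import Fraction

joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (1, 0): Fraction(1, 8),
         (1, 1): Fraction(2, 8), (1, 2): Fraction(1, 8), (2, 1): Fraction(1, 8),
         (2, 2): Fraction(1, 8)}

def cond_pmf_X_given_Y(y0):
    """f_{X|Y}(x|y0) = f_{XY}(x, y0) / f_Y(y0), defined when f_Y(y0) > 0."""
    f_y = sum(p for (x, y), p in joint.items() if y == y0)
    return {x: p / f_y for (x, y), p in joint.items() if y == y0}

print(cond_pmf_X_given_Y(1))   # {0: 1/4, 1: 1/2, 2: 1/4}, equal to the marginal of X
print(cond_pmf_X_given_Y(2))   # {1: 1/2, 2: 1/2}, differs from the marginal
```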

For continuous bivariate random variables things are a little more complicated because the probability that they take on a particular value is zero, and conditional probabilities are not defined when the probability of the conditioning event is zero. Nevertheless, the definition of conditional probability density functions is very similar to that of conditional probability mass functions:

Definition 6 Given two continuous random variables $X$ and $Y$ with joint probability density function $f_{XY}(x, y)$, the conditional probability density function for $X$ given $Y = y$ is
$$f_{X|Y}(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)}.$$


To see why this works we first look at the conditional cumulative distribution function where we condition on $Y \in (y, y + \varepsilon)$:
$$\Pr(X \le x \mid Y \in (y, y + \varepsilon)) = \frac{\Pr(X \le x,\ y < Y < y + \varepsilon)}{\Pr(y < Y < y + \varepsilon)} = \frac{\int_{-\infty}^{x} \int_y^{y+\varepsilon} f_{XY}(u, v)\, dv\, du}{\int_{-\infty}^{\infty} \int_y^{y+\varepsilon} f_{XY}(u, v)\, dv\, du}.$$
Now take the limit as $\varepsilon$ goes to zero. In that case both numerator and denominator go to zero, so to get the limit we have to use l'Hôpital's rule and differentiate both numerator and denominator with respect to $\varepsilon$. In that case we get
$$F_{X|Y}(x|y) = \frac{\int_{-\infty}^{x} f_{XY}(u, y)\, du}{\int_{-\infty}^{\infty} f_{XY}(u, y)\, du} = \frac{\int_{-\infty}^{x} f_{XY}(u, y)\, du}{f_Y(y)}.$$
Take the derivative with respect to $x$ to get the conditional probability density function:
$$f_{X|Y}(x|y) = \frac{f_{XY}(x, y)}{f_Y(y)}.$$
This is not, however, the conditional density of $X$ given the event $Y = y$. Its interpretation is that of the limit of the conditional densities, conditioning on $y < Y < y + \varepsilon$ and taking the limit as $\varepsilon$ goes to zero.
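This limit interpretation can be illustrated by simulation in Example B, where $f_{X|Y}(x|y) = 2/(2 - 2y)$, so that $X$ given $Y = y$ is uniform on $(0, 1 - y)$. A Python sketch (illustrative, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform draws on the triangle of Example B by rejection from the unit square.
pts = rng.uniform(size=(2_000_000, 2))
pts = pts[pts.sum(axis=1) < 1]
x, y = pts[:, 0], pts[:, 1]

# Condition on the band y0 < Y < y0 + eps. As eps shrinks, the draws of X
# approach f_{X|Y}(x|y0) = 2 / (2 - 2*y0), i.e. Uniform(0, 1 - y0).
y0, eps = 0.5, 0.01
x_band = x[(y > y0) & (y < y0 + eps)]
print(x_band.mean())   # approximately 0.25, the mean of Uniform(0, 0.5)
```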

Definition 7 Let $(X, Y)$ be a bivariate random vector with pmf/pdf $f_{XY}(x, y)$. Then the random variables $X$ and $Y$ are independent if
$$f_{XY}(x, y) = f_X(x) \cdot f_Y(y),$$
where $f_X(x)$ and $f_Y(y)$ are the marginal pmf/pdfs of $X$ and $Y$.

Result 1 The two random variables $X$ and $Y$ are independent if and only if there exist functions $g$ and $h$ such that the joint pmf/pdf can be written as
$$f_{XY}(x, y) = g(x) \cdot h(y),$$
for all $-\infty < x < \infty$ and $-\infty < y < \infty$.


Result 2 Let $X$ and $Y$ be two independent random variables. Then
(i) $f_{X|Y}(x|y) = f_X(x)$, and
(ii) $f_{Y|X}(y|x) = f_Y(y)$.

Example B (cont'd): In the previous example we had
$$f_{XY}(x, y) = 2, \quad 0 < x < 1,\ 0 < y < 1,\ x + y < 1, \text{ and zero elsewhere.}$$
It may appear that $X$ and $Y$ are independent by the previous results: choose $g(x) = 2$ and $h(y) = 1$. The reason this does not work is the restrictions on the values of $X$ and $Y$. To see this, write
$$f_{XY}(x, y) = 2 \cdot 1\{0 < x < 1\} \cdot 1\{0 < y < 1\} \cdot 1\{x + y < 1\},$$
for all $-\infty < x < \infty$ and $-\infty < y < \infty$. The indicator function $1\{A\}$ is equal to one if the argument is true and zero if it is false. The indicator function $1\{x + y < 1\}$ cannot be separated into a function of $x$ times a function of $y$, and therefore $X$ and $Y$ are not independent.
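A single point is enough to confirm the failure of factorization numerically, for example $(x, y) = (0.6, 0.6)$. A tiny Python sketch (illustrative):

```python
# At (0.6, 0.6) the joint density is zero (since x + y > 1),
# but the product of the marginals is not, so the joint cannot factor.
f_XY = lambda x, y: 2.0 if (0 < x < 1 and 0 < y < 1 and x + y < 1) else 0.0
f_X = lambda x: 2 - 2 * x if 0 < x < 1 else 0.0
f_Y = lambda y: 2 - 2 * y if 0 < y < 1 else 0.0

print(f_XY(0.6, 0.6))        # 0.0
print(f_X(0.6) * f_Y(0.6))   # 0.64, not equal, so X and Y are not independent
```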
To see how tricky conditional probability density functions can be with continuous random variables, consider the Borel–Kolmogorov paradox. The joint density of two random variables $X$ and $Y$ is
$$f_{YX}(y, x) = \exp(-x - y),$$
for $x, y > 0$. The marginal density function of $X$ is
$$f_X(x) = \exp(-x).$$
The conditional density of $X$ given $Y = 1$ is
$$f_{X|Y}(x|y = 1) = \exp(-x),$$
because $X$ and $Y$ are obviously independent.


Now consider the transformation from $(X, Y)$ to $(X, Z)$, where $Z = (Y - 1)/X$. The inverse of the transformation is $X = g_1(X, Z) = X$ and $Y = g_2(X, Z) = XZ + 1$, with Jacobian
$$\left| \det \begin{pmatrix} \frac{\partial g_1}{\partial x}(x, z) & \frac{\partial g_2}{\partial x}(x, z) \\ \frac{\partial g_1}{\partial z}(x, z) & \frac{\partial g_2}{\partial z}(x, z) \end{pmatrix} \right| = \left| \det \begin{pmatrix} 1 & z \\ 0 & x \end{pmatrix} \right| = |x|.$$


The joint density of $Z$ and $X$ is then
$$f_{ZX}(z, x) = x \exp(-x - xz - 1),$$
for $z > -1/x$ and $x > 0$.


Let us calculate the marginal density of $Z$. For $z > 0$, the marginal density is
$$f_Z(z) = \int_0^{\infty} x \exp(-x(z+1))\, e^{-1}\, dx = \frac{1}{e(z+1)^2}.$$
For $z < 0$, the constraint $z > -1/x$ limits the integration to $0 < x < -1/z$, and integrating by parts, the marginal density is
$$f_Z(z) = \int_0^{-1/z} x \exp(-x(z+1))\, e^{-1}\, dx = \frac{1}{e(z+1)} \left[ -x \exp(-x(z+1)) \Big|_0^{-1/z} + \int_0^{-1/z} \exp(-x(z+1))\, dx \right]$$
$$= \frac{1}{e(z+1)} \left[ \frac{1}{z} \exp\!\left(\frac{z+1}{z}\right) - \frac{1}{z+1} \exp(-x(z+1)) \Big|_0^{-1/z} \right] = \frac{1}{e(z+1)^2} \left[ \frac{1}{z} \exp\!\left(\frac{z+1}{z}\right) + 1 \right].$$
Take the limit as $z \to 0$ to get in both cases $f_Z(z) \to e^{-1}$.
The conditional density of $X$ given $Z = 0$ is then
$$f_{X|Z}(x|z = 0) = \frac{f_{XZ}(x, z)}{f_Z(z)} \bigg|_{z=0} = x \exp(-x),$$
for $x > 0$.
for x > 0. The paradox is Z = 0 implies and is implied by Y = 1, yet the


conditional density of X given Z = 0 differs from that of X given Y = 1.
The difference is in the way the limit is taken. In one case we take the
limit for 1 < Y < 1 + . In the other case we take the limit < Z <
which is the same as the limit 1 x < Y < 1 + x which is clearly different


For fixed $\varepsilon$, the latter conditioning set obviously includes more large values of $X$.
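The two conditioning schemes can be compared directly by simulation: condition on a thin band around $Y = 1$ versus a thin band around $Z = 0$ and look at the draws of $X$. A Python sketch (illustrative, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
x = rng.exponential(size=n)   # X ~ Exp(1)
y = rng.exponential(size=n)   # Y ~ Exp(1), independent of X
z = (y - 1) / x

eps = 0.01
print(x[np.abs(y - 1) < eps].mean())   # ~1: conditioning on 1-eps < Y < 1+eps,
                                       #     so X behaves like Exp(1)
print(x[np.abs(z) < eps].mean())       # ~2: conditioning on -eps < Z < eps,
                                       #     matching the density x * exp(-x)
```

The two conditional means differ (about 1 versus about 2), even though both bands shrink to the same event, which is exactly the paradox: the $Z$-band keeps more outcomes with large $X$.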
Expectation

Expectations are calculated in pretty much the same way as before, but now involving double (or, in general, multiple) sums or integrals:
$$E[r(X, Y)] = \int_x \int_y r(x, y) f_{XY}(x, y)\, dy\, dx,$$
for continuous bivariate random variables, and
$$E[r(X, Y)] = \sum_x \sum_y r(x, y) f_{XY}(x, y),$$
for discrete bivariate random variables.


Example B (expectation): Consider the expected value of $XY$:
$$E(XY) = \int_0^1 \int_0^{1-x} 2xy\, dy\, dx = \int_0^1 xy^2 \Big|_0^{1-x} dx = \int_0^1 x(1 - x)^2\, dx$$
$$= \int_0^1 (x^3 - 2x^2 + x)\, dx = \left[ \frac{1}{4}x^4 - \frac{2}{3}x^3 + \frac{1}{2}x^2 \right]_0^1 = \frac{1}{12}.$$
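The same number can be recovered numerically (a sketch assuming SciPy, for illustration only):

```python
from scipy.integrate import dblquad

# E[XY] = double integral of x*y*f(x, y) over the triangle, with f = 2.
val, err = dblquad(lambda y, x: 2.0 * x * y, 0, 1, lambda x: 0.0, lambda x: 1.0 - x)
print(val)   # 0.08333... = 1/12
```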
Consider two random variables $X$ and $Y$ with means $\mu_X$ and $\mu_Y$ and variances $\sigma_X^2$ and $\sigma_Y^2$, respectively. The covariance of $X$ and $Y$ is defined as
$$C(X, Y) = E[(X - \mu_X)(Y - \mu_Y)],$$
and the correlation coefficient as
$$\rho_{XY} = C(X, Y)/(\sigma_X \sigma_Y).$$
The mean and variance of the sum $X + Y$ are
$$E[X + Y] = \mu_X + \mu_Y,$$
and
$$V(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\, C(X, Y).$$
If the two random variables are independent, the variance simplifies to the sum of the variances, $\sigma_X^2 + \sigma_Y^2$.
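The variance identity is easy to verify on simulated data (an illustrative Python sketch assuming NumPy; the correlated pair is a hypothetical construction):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)   # correlated with x by construction

# V(X + Y) = V(X) + V(Y) + 2 C(X, Y), checked on the sample moments.
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(lhs, rhs)   # equal up to floating-point error
```

With `bias=True`, `np.cov` uses the same normalization as `np.var`, so the identity holds exactly in the sample, not just in expectation.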
