Çağatay Candan
On Gaussian Distribution
Gaussian distribution is defined as follows:

$$f_x(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{(x-\eta_x)^2}{2\sigma_x^2}}$$
The function $f_x(x)$ is clearly positive valued. Before calling this function a probability density function, we should check whether the area under the curve is equal to 1 or not.
R1.1: Area under Gaussian Distribution [Signal Analysis, Papoulis]
Let

$$I = \int_{-\infty}^{\infty} e^{-\frac{(x-\eta_x)^2}{2\sigma_x^2}}\,dx = \int_{-\infty}^{\infty} e^{-\frac{x^2}{2\sigma_x^2}}\,dx\,;$$

then

$$I^2 = \int_{-\infty}^{\infty} e^{-\frac{x_1^2}{2\sigma_x^2}}\,dx_1 \int_{-\infty}^{\infty} e^{-\frac{x_2^2}{2\sigma_x^2}}\,dx_2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{x_1^2 + x_2^2}{2\sigma_x^2}}\,dx_1\,dx_2 = \int_{0}^{2\pi}\!\!\int_{0}^{\infty} e^{-\frac{r^2}{2\sigma_x^2}}\, r\,dr\,d\theta = 2\pi\sigma_x^2.$$

Hence $I = \sqrt{2\pi}\,\sigma_x$, and the area under $f_x(x)$ is $I/(\sqrt{2\pi}\,\sigma_x) = 1$. This shows that $f_x(x)$ is a valid probability density function.
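For the curious reader, the unit-area claim can also be checked numerically. Below is a minimal sketch using NumPy and SciPy; the parameter values $\eta_x = 1.5$ and $\sigma_x = 2$ are arbitrary choices for illustration.

```python
import numpy as np
from scipy.integrate import quad

def gaussian_pdf(x, eta=1.5, sigma=2.0):
    """Gaussian density with mean eta and standard deviation sigma."""
    return np.exp(-(x - eta)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

# quad supports infinite integration limits.
area, _ = quad(gaussian_pdf, -np.inf, np.inf)
print(area)  # ~1.0, i.e. unit area under the curve
```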
Mean: Since $f_x(x)$ has even symmetry around $\eta_x$, the mean of the distribution is $\bar{x} = \eta_x$.

We can show this result as follows. By a simple change of variables, we get

$$\bar{x} = \int_{-\infty}^{\infty} x f_x(x)\,dx = \int_{-\infty}^{\infty} (\eta_x + x) f_x(\eta_x + x)\,dx = \int_{-\infty}^{\infty} (\eta_x - x) f_x(\eta_x - x)\,dx.$$

But since we have $f_x(\eta_x + x) = f_x(\eta_x - x)$,

$$\bar{x} = \int_{-\infty}^{\infty} (\eta_x + x) f_x(\eta_x + x)\,dx = \int_{-\infty}^{\infty} (\eta_x - x) f_x(\eta_x + x)\,dx\,;$$

then

$$2\bar{x} = \int_{-\infty}^{\infty} (\eta_x + x) f_x(\eta_x + x)\,dx + \int_{-\infty}^{\infty} (\eta_x - x) f_x(\eta_x + x)\,dx = 2\eta_x \int_{-\infty}^{\infty} f_x(\eta_x + x)\,dx = 2\eta_x.$$
Variance: This one is more tricky, [Probability, Papoulis]. The area under the Gaussian distribution is equal to 1, that is,

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{(x-\eta_x)^2}{2\sigma_x^2}}\,dx = 1.$$

Differentiating both sides with respect to $\eta_x$, we get

$$\int_{-\infty}^{\infty} \frac{x-\eta_x}{\sigma_x^2}\,\frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{(x-\eta_x)^2}{2\sigma_x^2}}\,dx = 0 \qquad\Longrightarrow\qquad \int_{-\infty}^{\infty} (x-\eta_x) f_x(x)\,dx = 0.$$

Differentiating this relation with respect to $\eta_x$ for a second time, we get

$$\int_{-\infty}^{\infty} \frac{(x-\eta_x)^2}{\sigma_x^2}\, f_x(x)\,dx + \int_{-\infty}^{\infty} (-1)\, f_x(x)\,dx = 0,$$

showing that the variance of the Gaussian distribution is $\sigma_x^2$.
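Both moments can be spot-checked by simulation. Below is a minimal sketch with NumPy; the values $\eta_x = 1.5$ and $\sigma_x = 2$ are arbitrary, and the sample moments converge to $\eta_x$ and $\sigma_x^2$ by the law of large numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, sigma = 1.5, 2.0
samples = rng.normal(eta, sigma, size=1_000_000)

print(samples.mean())  # ~1.5, the mean is eta
print(samples.var())   # ~4.0, the variance is sigma**2
```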
A first note is that the relation $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_x} e^{-\frac{(x-\eta_x)^2}{2\sigma_x^2}}\,dx = 1$ is valid for every value of $\eta_x$; this is what allows us to differentiate with respect to $\eta_x$.

A second note is on the interchange of the derivative and integral operators. This step looks innocent but it is indeed treacherous. We have to verify the interchange in general. If you have one-sided or double-sided integrals (a one-sided integral has its upper or lower limit at $\pm\infty$; a double-sided integral has $-\infty$ and $\infty$ as its limits), the integrals may or may not converge. The interchange of the integral and derivative operators is the interchange of two limiting operations. (Remember that the derivative is also defined as a limit, $h \to 0$, of a difference quotient scaled by $1/h$.) One has to be careful about this interchange. The integrals studied in the probability course in general behave nicely (positive and unit-area functions), but the caveat should be kept in mind.
Scaling and biasing a Gaussian random variable: Let $y = ax + b$. The density of $y$ follows from the fundamental theorem on functions of random variables:

$$f_y(y) = \frac{1}{|a|}\, f_x\!\left(\frac{y-b}{a}\right).$$

When the definition for the Gaussian density is substituted for $f_x(x)$, we get

$$f_y(y) = \frac{1}{|a|\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{\left(\frac{y-b}{a}-\eta_x\right)^2}{2\sigma_x^2}} = \frac{1}{\sqrt{2\pi}\,|a|\,\sigma_x}\, e^{-\frac{(y-b-a\eta_x)^2}{2a^2\sigma_x^2}}.$$
From the last equation, we note that $y$ is also Gaussian distributed, with mean $(a\eta_x + b)$ and variance $a^2\sigma_x^2$.

We reach an important conclusion: by scaling and biasing a Gaussian r.v., we get another Gaussian r.v. with a different mean and variance.
The mean and variance of the new Gaussian random variable can be easily calculated using the expectation operator, $\eta_y = E\{y\} = E\{ax + b\} = a\eta_x + b$ (and a similar relation for the variance). The distribution of $y$ can then be written using the calculated mean and variance. For example, if $z$ is N(0,1) (Gaussian distributed or normal distributed with zero mean and unit variance), then $w = 10z - 10$ is N(-10,100), since the mean of $10z - 10$ is $-10$ and its variance is $100$.
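A quick simulation confirms the example; this is a minimal sketch with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # z : N(0, 1)
w = 10 * z - 10                     # scale by 10, bias by -10

print(w.mean(), w.var())            # ~-10.0 and ~100.0, i.e. w : N(-10, 100)
```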
Moment generating function: Let $z$ be N(0,1). Its moment generating function is

$$\Phi_z(s) = E\{e^{sz}\} = \int_{-\infty}^{\infty} e^{sz}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\,dz.$$

The exponent $sz - \frac{z^2}{2}$ can be written as a perfect square and a correction term as follows:

$$sz - \frac{z^2}{2} = -\frac{(z-s)^2}{2} + \frac{s^2}{2}.$$

This process is known as completing the square and is frequently used. When this relation is substituted into $\Phi_z(s)$, we get

$$\Phi_z(s) = \int_{-\infty}^{\infty} e^{sz}\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\,dz = e^{\frac{s^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(z-s)^2}{2}}\,dz = e^{\frac{s^2}{2}}.$$

(The moment generating function is valid for all $s$ values.)
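The closed form can be compared against direct numerical integration; a minimal sketch with SciPy, for a few arbitrary values of $s$:

```python
import numpy as np
from scipy.integrate import quad

def mgf_integrand(z, s):
    """Integrand of E{exp(s z)} for z : N(0, 1)."""
    return np.exp(s * z) * np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

for s in (0.5, 1.0, 2.0):
    numeric, _ = quad(mgf_integrand, -np.inf, np.inf, args=(s,))
    print(numeric, np.exp(s**2 / 2))  # the two columns should agree
```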
For $x = \sigma z + \eta$, we have

$$\Phi_x(s) = E\{e^{s(\sigma z + \eta)}\} = e^{s\eta}\, E\{e^{(s\sigma)z}\} = e^{s\eta + \frac{s^2\sigma^2}{2}}.$$

Using the Taylor series $e^u = \sum_{k=0}^{\infty} \frac{u^k}{k!}$, we can write $\Phi_x(s)$ as follows:

$$\Phi_x(s) = 1 + \left(s\eta + \frac{s^2\sigma^2}{2}\right) + \frac{\left(s\eta + \frac{s^2\sigma^2}{2}\right)^2}{2} + \frac{\left(s\eta + \frac{s^2\sigma^2}{2}\right)^3}{6} + \dots$$

The coefficient of $s^k$ in $\Phi_x(s)$ is equal to $E\{x^k\}/k!$. When $x : N(0, \sigma^2)$, we have

$$E\{x^k\} = m_k = \begin{cases} \dfrac{(2n)!}{2^n\, n!}\,\sigma^{2n} = 1\cdot 3\cdot 5\cdots(2n-1)\,\sigma^{2n}, & k = 2n \\[2mm] 0, & k = 2n+1 \end{cases}$$
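The even-moment formula is easy to test by simulation; a minimal sketch with NumPy, where $\sigma = 1.3$ is an arbitrary choice (higher moments need large sample sizes for accuracy):

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
sigma = 1.3
x = rng.normal(0.0, sigma, size=2_000_000)

for n in (1, 2, 3):
    k = 2 * n
    closed_form = factorial(2 * n) / (2**n * factorial(n)) * sigma**(2 * n)
    print((x**k).mean(), closed_form)  # sample moment vs (2n-1)!! sigma^(2n)
```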
Sum of independent Gaussian random variables: Let $x : N(\eta_x, \sigma_x^2)$ and $y : N(\eta_y, \sigma_y^2)$ be independent random variables. Then $z = x + y$ is also Gaussian, a fact frequently used in practice. This result can be easily shown using moment generating functions. By independence,

$$\Phi_z(s) = E\{e^{s(x+y)}\} = E\{e^{sx}\}\, E\{e^{sy}\}.$$

We substitute the moment generating function of the Gaussian distribution and get

$$\Phi_z(s) = e^{s\eta_x + \frac{s^2\sigma_x^2}{2}}\, e^{s\eta_y + \frac{s^2\sigma_y^2}{2}} = e^{s(\eta_x + \eta_y) + \frac{s^2(\sigma_x^2 + \sigma_y^2)}{2}}.$$

Hence $z = x + y$ is Gaussian with mean $\eta_x + \eta_y$ and variance $\sigma_x^2 + \sigma_y^2$.
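A simulation check of the last claim; a minimal sketch with NumPy/SciPy and arbitrary example parameters $x : N(1, 4)$, $y : N(-3, 2.25)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=500_000)    # N(1, 4)
y = rng.normal(-3.0, 1.5, size=500_000)   # N(-3, 2.25), independent of x
z = x + y

print(z.mean(), z.var())                  # ~-2.0 and ~6.25
# Kolmogorov-Smirnov test against N(-2, 6.25); p-value should not be small.
print(stats.kstest(z, "norm", args=(-2.0, np.sqrt(6.25))).pvalue)
```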
In the next section we define joint Gaussianity and show that arbitrary linear combinations of Gaussian random variables result in a Gaussian distribution.
The joint (bivariate) Gaussian density is defined as

$$f_{xy}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\eta_x)^2}{\sigma_x^2} - \frac{2\rho(x-\eta_x)(y-\eta_y)}{\sigma_x\sigma_y} + \frac{(y-\eta_y)^2}{\sigma_y^2}\right]\right\}.$$

The function $f_{xy}(x,y)$ is clearly positive valued and of finite area (since the exponential is upper bounded by $e^{-|x|}$ for large $x$), therefore with proper scaling it can be utilized as a density function. Note that $f_{xy}(x,y)$ is a function of two independent variables and has five parameters, $\{\eta_x, \eta_y, \sigma_x^2, \sigma_y^2, \rho\}$. We will attach the labels of mean, variance and correlation coefficient to these parameters after justifying why we do so.
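As a cross-check of the definition, the formula can be compared against a standard implementation; a minimal sketch using scipy.stats.multivariate_normal, with arbitrary example parameters:

```python
import numpy as np
from scipy.stats import multivariate_normal

def joint_gaussian_pdf(x, y, ex, ey, sx, sy, rho):
    """Bivariate Gaussian density, written directly from the definition."""
    q = ((x - ex)**2 / sx**2
         - 2 * rho * (x - ex) * (y - ey) / (sx * sy)
         + (y - ey)**2 / sy**2)
    return np.exp(-q / (2 * (1 - rho**2))) / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2))

ex, ey, sx, sy, rho = 1.0, -2.0, 1.5, 0.8, 0.6
cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]

print(joint_gaussian_pdf(0.3, -1.0, ex, ey, sx, sy, rho))
print(multivariate_normal(mean=[ex, ey], cov=cov).pdf([0.3, -1.0]))  # should match
```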
An alternative (but equivalent) definition for joint Gaussianity can be given as follows: $x$ and $y$ are called jointly Gaussian if $z = \alpha_x x + \alpha_y y$ is Gaussian distributed, in the sense previously examined (also called univariate Gaussian), for any value of $(\alpha_x, \alpha_y)$. We take for granted the equivalence of the two definitions and proceed without a proof for now. (The function $\Phi_g(\alpha_x, \alpha_y)$ and the related descriptions given in R2.4 (given below) can be formulated as a proof.)
The joint density can also be written in matrix form:

$$f_{xy}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{-\frac{1}{2}\begin{bmatrix} x-\eta_x & y-\eta_y \end{bmatrix} \mathbf{C}^{-1} \begin{bmatrix} x-\eta_x \\ y-\eta_y \end{bmatrix}\right\}.$$

Note that the argument of the exponential mimics the one-dimensional Gaussian distribution. If you try not to see the terms related to the $y$ variable, then you have

$$\exp\!\left\{-\frac{1}{2}(x-\eta_x)\,\frac{1}{\sigma_x^2}\,(x-\eta_x)\right\} = \exp\!\left(-\frac{(x-\eta_x)^2}{2\sigma_x^2}\right).$$

The matrix $\mathbf{C}^{-1}$ in the quadratic form is

$$\mathbf{C}^{-1} = \begin{bmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{bmatrix}^{-1} = \frac{1}{\sigma_x^2\sigma_y^2(1-\rho^2)} \begin{bmatrix} \sigma_y^2 & -\rho\sigma_x\sigma_y \\ -\rho\sigma_x\sigma_y & \sigma_x^2 \end{bmatrix} = \frac{1}{1-\rho^2} \begin{bmatrix} 1/\sigma_x^2 & -\rho/(\sigma_x\sigma_y) \\ -\rho/(\sigma_x\sigma_y) & 1/\sigma_y^2 \end{bmatrix}.$$

For zero mean variables, the joint density in matrix form is therefore

$$f_{xy}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{-\frac{1}{2}\begin{bmatrix} x & y \end{bmatrix} \frac{1}{\sigma_x^2\sigma_y^2(1-\rho^2)}\begin{bmatrix} \sigma_y^2 & -\rho\sigma_x\sigma_y \\ -\rho\sigma_x\sigma_y & \sigma_x^2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}\right\}.$$
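The inverse written above is simple to verify numerically; a minimal sketch with NumPy and arbitrary parameter values:

```python
import numpy as np

sx, sy, rho = 1.5, 0.8, 0.6
C = np.array([[sx**2,         rho * sx * sy],
              [rho * sx * sy, sy**2        ]])

C_inv_formula = np.array([[ sy**2,         -rho * sx * sy],
                          [-rho * sx * sy,  sx**2        ]]) / (sx**2 * sy**2 * (1 - rho**2))

print(np.allclose(C_inv_formula, np.linalg.inv(C)))  # True
```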
R2.2: Let new random variables $x_2$ and $y_2$ be defined from $x$ and $y$, where $x$ and $y$ are jointly Gaussian r.v.s with zero mean (the joint pdf is given above), such that

$$x_2 = x + \eta_x, \qquad y_2 = y + \eta_y.$$

The distribution of the new random variables is $f_{x_2 y_2}(x,y) = f_{xy}(x - \eta_x,\, y - \eta_y)$ (by the fundamental theorem), and we see that the distribution of $x_2$ and $y_2$ has the form of the general joint Gaussian density. Since mean values only shift the center of symmetry to a point on the $(x,y)$ plane, sometimes we prefer to work with the zero mean Gaussian distribution and say that a result derived for it is also valid for non-zero mean Gaussian distributions. This is due to the mentioned shift in the center of symmetry. We have some examples of this situation in these notes.

R2.3: Two independent Gaussian random variables are jointly Gaussian. Let $x$ and $y$ be independent Gaussian random variables. (As noted in R2.2, the value of the mean is not important in this argument. We prefer to take the mean values as zero to simplify the presentation.) Then $f_{xy}(x,y) = f_x(x) f_y(y)$, and when the definition for the univariate Gaussian distribution is inserted, we get the definition of the joint Gaussian distribution for zero mean variables (with $\rho = 0$). Since the joint density of two independent Gaussian random variables can be written in the form required by the joint Gaussian density, two independent Gaussian random variables are indeed jointly Gaussian. (This is not very surprising; the labels of the distributions give away the final conclusion.)
A more surprising result is the next one. If $x$ and $y$ are known to be Gaussian distributed (but not independent), the joint distribution of $x$ and $y$ is not necessarily Gaussian. That is, if the marginal densities of $x$ and $y$ are Gaussian, the joint density of $x$ and $y$ is not necessarily Gaussian.

A simple example is as follows. Let $x : N(0,1)$ and $y = c\,x$. Here $c$ can take the values of $1$ and $-1$ with equal probability. It is not very difficult (for EE230 students) to show that $y$ is also N(0,1). Clearly $y$ and $x$ are related to each other through the random mapping $c$; therefore $x$ and $y$ are not independent. Furthermore, the conditional density for $y$ given $x = 1$, which is $0.5\,\delta(y-1) + 0.5\,\delta(y+1)$, gives away that $x$ and $y$ are not jointly Gaussian. For a constructive example see [Probability, Papoulis].
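The example is easy to reproduce numerically; a minimal sketch with NumPy/SciPy. Note that $x + y$ equals zero whenever $c = -1$, which is impossible for jointly Gaussian variables (any linear combination would have to be Gaussian).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.standard_normal(500_000)
c = rng.choice([-1.0, 1.0], size=x.size)  # random sign, independent of x
y = c * x

# Marginally, y passes a normality test ...
print(stats.kstest(y, "norm").pvalue)     # should not reject N(0,1)
# ... yet x + y is exactly zero half the time:
print(np.mean(x + y == 0.0))              # ~0.5, so (x, y) cannot be jointly Gaussian
```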
R2.4: Moment Generating Function of joint Gaussian r.v.s x and y (zero mean)
Independent Case:

$$\Phi(s_x, s_y) = E\{e^{s_x x + s_y y}\}$$

is equal to $E\{e^{s_x x}\}\, E\{e^{s_y y}\}$ by independence. Then

$$\Phi(s_x, s_y) = e^{s_x\eta_x + \frac{s_x^2\sigma_x^2}{2}}\, e^{s_y\eta_y + \frac{s_y^2\sigma_y^2}{2}} = e^{s_x\eta_x + s_y\eta_y}\, e^{\frac{s_x^2\sigma_x^2 + s_y^2\sigma_y^2}{2}},$$

which reduces to $e^{(s_x^2\sigma_x^2 + s_y^2\sigma_y^2)/2}$ for zero mean variables.
Dependent Case: Let $x'$ and $y'$ be independent, zero mean, unit variance Gaussian random variables. Let's define the random variables $x$ and $y$ from $x'$ and $y'$ as follows:

$$x = a x', \qquad y = c x' + d y'. \qquad (1)$$

Then

$$\begin{aligned} \Phi_g(\alpha_x, \alpha_y) &= E\{\exp(\alpha_x x + \alpha_y y)\} \\ &= E\{\exp(\alpha_x a x' + \alpha_y (c x' + d y'))\} \\ &= E\{\exp((\alpha_x a + \alpha_y c)\, x' + \alpha_y d\, y')\} \\ &= E\{\exp((\alpha_x a + \alpha_y c)\, x')\}\, E\{\exp(\alpha_y d\, y')\} \\ &= \Phi_{x'}(\alpha_x a + \alpha_y c)\, \Phi_{y'}(\alpha_y d). \end{aligned}$$

Here $\alpha_x, \alpha_y$ are some scalars. When the moment generating functions for $x'$ and $y'$ are substituted, we get

$$\Phi_g(\alpha_x, \alpha_y) = \Phi_{x'}(\alpha_x a + \alpha_y c)\, \Phi_{y'}(\alpha_y d) = e^{\frac{1}{2}(\alpha_x a + \alpha_y c)^2}\, e^{\frac{1}{2}(\alpha_y d)^2} = e^{\frac{1}{2}\left(\alpha_x^2 a^2 + 2\alpha_x\alpha_y ac + \alpha_y^2(c^2 + d^2)\right)}.$$

The moment generating function $\Phi(s_x, s_y) = E\{e^{s_x x + s_y y}\}$ is defined wherever the expectation result is finite. For univariate Gaussian distributions such as $x'$, the $\Phi_{x'}(s_{x'})$ function is defined for all complex $s_{x'}$ values. The same is also valid for the $y'$ variable. We can then say that the $\Phi_g(\alpha_x, \alpha_y)$ function is defined for arbitrary complex valued pairs of $(\alpha_x, \alpha_y)$.

Since the parameters of the $\Phi_g(\alpha_x, \alpha_y)$ function can be arbitrary, we choose to relabel them as $\alpha_x = s_x$ and $\alpha_y = s_y$ to get:
$$\Phi_g(\alpha_x, \alpha_y)\Big|_{\alpha_x = s_x,\ \alpha_y = s_y} = E\{\exp(s_x x + s_y y)\} = \Phi(s_x, s_y),$$

and we get the following for the moment generating function of $x$ and $y$:

$$\Phi(s_x, s_y) = e^{\frac{1}{2}\left(s_x^2 a^2 + 2 s_x s_y ac + s_y^2(c^2 + d^2)\right)}. \qquad (2)$$
Next we use the fundamental theorem to express the distribution of $x$ and $y$. (We can also invert $\Phi(s_x, s_y)$ given in (2) to get the distribution of $x$ and $y$, but we prefer using the fundamental theorem.) From (1),

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a & 0 \\ c & d \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix} \qquad\Longrightarrow\qquad \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1/a & 0 \\ -c/(ad) & 1/d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}.$$

Then

$$f_{xy}(x,y) = \frac{1}{|J|}\, f_{x'y'}\!\left(\frac{x}{a},\ \frac{y}{d} - \frac{c}{ad}\,x\right) = A \exp\!\left(-\frac{x^2}{2a^2}\right) \exp\!\left(-\frac{1}{2}\left(\frac{y}{d} - \frac{c}{ad}\,x\right)^{\!2}\right). \qquad (3)$$

Here $J$ is the Jacobian of the transformation (which is $J = ad$) and $A$ is a scalar whose exact value is not of interest for now.
Finally we fix $(a, c, d)$ given in (1) to the special values

$$a = \sigma_x, \qquad c = \rho\,\sigma_y, \qquad d = \sigma_y\sqrt{1-\rho^2}.$$

Then (2) and (3) become

$$\Phi(s_x, s_y) = e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)}$$

$$f_{xy}(x,y) = A \exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right]\right\}.$$
From (2) and (3), with this choice of $(a, c, d)$, we get the moment generating function and the density of jointly distributed Gaussian zero mean random variables $x$ and $y$.
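The construction in (1), with the special $(a, c, d)$ above, is also a convenient way to generate correlated Gaussian pairs in practice; a minimal sketch with NumPy and arbitrary target parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
sx, sy, rho = 1.5, 0.8, 0.6
xp = rng.standard_normal(1_000_000)                  # x' : N(0, 1)
yp = rng.standard_normal(1_000_000)                  # y' : N(0, 1), independent

x = sx * xp                                          # a = sigma_x
y = rho * sy * xp + sy * np.sqrt(1 - rho**2) * yp    # c = rho*sigma_y, d = sigma_y*sqrt(1-rho^2)

print(x.std(), y.std())           # ~1.5 and ~0.8
print(np.corrcoef(x, y)[0, 1])    # ~0.6, the target correlation coefficient
```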
R2.5: Moment Generating Function of joint Gaussian r.v.s x and y (non-zero mean)
Let $x'$ and $y'$ be jointly distributed zero mean Gaussian random variables. The moment generating function of $x'$ and $y'$ is given in R2.4 equation (2). Define new random variables as

$$x = x' + \eta_x, \qquad y = y' + \eta_y.$$

Then

$$\Phi(s_x, s_y) = E\{e^{s_x x + s_y y}\} = E\{e^{s_x(x' + \eta_x) + s_y(y' + \eta_y)}\} = e^{s_x\eta_x + s_y\eta_y}\, E\{e^{s_x x' + s_y y'}\},$$

which is

$$\Phi(s_x, s_y) = e^{s_x\eta_x + s_y\eta_y + \frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)}.$$

Note that when $\rho = 0$, the moment generating function given in the last equation reduces to the one found for the independent variables in R2.4.
R2.6: The parameter $\rho$ appearing in the joint density is the correlation coefficient of $x$ and $y$. Its definition is not affected by the mean values of $x$ and $y$. We also know that $E\{xy\} = \frac{\partial^2}{\partial s_x\, \partial s_y}\Phi(s_x, s_y)\Big|_{s_x = s_y = 0}$. We take the moment generating function of jointly Gaussian random variables (with zero mean) and calculate its partial derivative as follows:
$$\frac{\partial^2}{\partial s_y\, \partial s_x}\, \Phi(s_x, s_y) = \frac{\partial^2}{\partial s_y\, \partial s_x}\, e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)}$$

$$\frac{\partial}{\partial s_x}\, \Phi(s_x, s_y) = \left(s_x\sigma_x^2 + s_y\rho\sigma_x\sigma_y\right) e^{\frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)}$$

$$\frac{\partial^2}{\partial s_y\, \partial s_x}\, \Phi(s_x, s_y) = \rho\sigma_x\sigma_y\, e^{\frac{1}{2}(\cdot)} + \left(s_x\sigma_x^2 + s_y\rho\sigma_x\sigma_y\right)\left(s_y\sigma_y^2 + s_x\rho\sigma_x\sigma_y\right) e^{\frac{1}{2}(\cdot)},$$

where $(\cdot)$ stands for $s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2$.
When zero is substituted for $s_x$ and $s_y$, we get $E\{xy\} = \rho\sigma_x\sigma_y$, and from here we see that

$$\rho = \frac{E\{xy\}}{\sigma_x\sigma_y}.$$

Note that the right hand side of the last relation is the definition of the correlation coefficient for zero mean random variables. Therefore the $\rho$ appearing in the definition of the joint density is the actual value of the correlation coefficient between $x$ and $y$.
R2.7: If the correlation coefficient of jointly Gaussian distributed r.v.s is zero, then the random variables are independent.

In R2.5 we have noted that when $\rho = 0$ the moment generating function reduces to the one found for the independent variables in R2.4; and we have shown in R2.6 that $\rho$ is the correlation coefficient between $x$ and $y$. These two observations together establish the claim.

A related observation is marginalization. The moment generating function of a marginal density can be expressed from the moment generating function of the joint distribution,

$$\Phi(s_x, s_y) = E\{e^{s_x x + s_y y}\} = e^{s_x\eta_x + s_y\eta_y + \frac{1}{2}\left(s_x^2\sigma_x^2 + 2 s_x s_y \rho\sigma_x\sigma_y + s_y^2\sigma_y^2\right)},$$

by taking $s_y = 0$:

$$E\{e^{s_x x}\} = e^{s_x\eta_x}\, e^{\frac{1}{2} s_x^2\sigma_x^2}.$$

This is the moment generating function of the univariate Gaussian distribution with mean $\eta_x$ and variance $\sigma_x^2$. Hence when we marginalize a joint Gaussian density, we get a univariate Gaussian density.
i. The conditional density of $y$ given $x$ can be found from the joint density

$$f_{xy}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x-\eta_x)^2}{\sigma_x^2} - \frac{2\rho(x-\eta_x)(y-\eta_y)}{\sigma_x\sigma_y} + \frac{(y-\eta_y)^2}{\sigma_y^2}\right]\right\}.$$

Taking the means as zero (see R2.2; the means only shift the center of symmetry), we have

$$f_{y|x}(y|x) = \frac{f_{xy}(x,y)}{f_x(x)} = \frac{\dfrac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\dfrac{1}{2(1-\rho^2)}\left[\dfrac{x^2}{\sigma_x^2} - \dfrac{2\rho xy}{\sigma_x\sigma_y} + \dfrac{y^2}{\sigma_y^2}\right]\right\}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_x} \exp\left\{-\dfrac{x^2}{2\sigma_x^2}\right\}}$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right] + \frac{x^2}{2\sigma_x^2}\right\}.$$

The argument of the exponential can be organized as follows:

$$\begin{aligned}
-\frac{1}{2(1-\rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right] + \frac{x^2}{2\sigma_x^2}
&= -\frac{1}{2(1-\rho^2)\sigma_y^2}\left[y^2 - \frac{2\rho\sigma_y}{\sigma_x}\,xy + \left\{\frac{\rho\sigma_y}{\sigma_x}\,x\right\}^2 - \left\{\frac{\rho\sigma_y}{\sigma_x}\,x\right\}^2 + \frac{\sigma_y^2}{\sigma_x^2}\,x^2 - (1-\rho^2)\frac{\sigma_y^2}{\sigma_x^2}\,x^2\right] \\
&= -\frac{1}{2(1-\rho^2)\sigma_y^2}\left(y - \rho\frac{\sigma_y}{\sigma_x}\,x\right)^{\!2},
\end{aligned}$$

since the last three terms inside the brackets cancel each other. Note that in the first line we have added and subtracted the same term, shown with curly brackets, i.e. $\{\dots\}^2$. This process is called completion to a square. Now $f_{y|x}(y|x)$ can be easily written as follows:

$$f_{y|x}(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}\, \exp\!\left\{-\frac{\left(y - \rho\frac{\sigma_y}{\sigma_x}\,x\right)^2}{2(1-\rho^2)\sigma_y^2}\right\}.$$

Hence $y$ given $x$ is Gaussian with mean $\rho\frac{\sigma_y}{\sigma_x}x$ and variance $(1-\rho^2)\sigma_y^2$.
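The conditional mean and variance can be verified by conditioning a large sample on a narrow window around a chosen value of $x$; a minimal sketch with NumPy and arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
sx, sy, rho = 1.5, 0.8, 0.6
xp, yp = rng.standard_normal((2, 2_000_000))
x = sx * xp
y = rho * sy * xp + sy * np.sqrt(1 - rho**2) * yp  # jointly Gaussian, coefficient rho

near = np.abs(x - 1.0) < 0.01                      # condition on x ~ 1.0
print(y[near].mean(), rho * sy / sx * 1.0)         # conditional mean rho*(sy/sx)*x
print(y[near].var(), (1 - rho**2) * sy**2)         # conditional variance (1-rho^2)*sy^2
```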
ii. The same calculation can be repeated when the observation is a vector; $y$ and $\mathbf{x}$ are assumed to be zero mean without any loss of generality. The goal is to express the conditional density of $y$ given the observation vector $\mathbf{x}$:

$$f_{y|\mathbf{x}}(y|\mathbf{x}) = \frac{f_{\mathbf{x}y}(\mathbf{x}, y)}{f_{\mathbf{x}}(\mathbf{x})} = \frac{(2\pi)^{-\frac{N+1}{2}}\, |\mathbf{C}_z|^{-\frac{1}{2}}\, \exp\!\left(-\frac{1}{2}\mathbf{z}^T \mathbf{C}_z^{-1} \mathbf{z}\right)}{(2\pi)^{-\frac{N}{2}}\, |\mathbf{C}_x|^{-\frac{1}{2}}\, \exp\!\left(-\frac{1}{2}\mathbf{x}^T \mathbf{C}_x^{-1} \mathbf{x}\right)},$$

where $\mathbf{z} = [\mathbf{x}^T\ y]^T$ and

$$\mathbf{C}_z = \begin{bmatrix} \mathbf{C}_x & \mathbf{r}_{xy} \\ \mathbf{r}_{xy}^T & \sigma_y^2 \end{bmatrix}.$$

In the equation above, $\mathbf{r}_{xy} = E\{\mathbf{x}\, y\}$ is the cross-correlation of the observations and $y$. The matrix inverse in the equation above can be taken using the partitioned matrix inversion lemma.

There are various types of matrix inversion lemmas; they are abundant in books, but for the sake of simplicity we only quote the form of the lemma that we would like to use (originally shown as a screenshot of a webpage): for a symmetric, invertible block matrix,

$$\begin{bmatrix} \mathbf{A} & \mathbf{b} \\ \mathbf{b}^T & c \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{A}^{-1} + \frac{1}{k}\,\mathbf{A}^{-1}\mathbf{b}\,\mathbf{b}^T\mathbf{A}^{-1} & -\frac{1}{k}\,\mathbf{A}^{-1}\mathbf{b} \\ -\frac{1}{k}\,\mathbf{b}^T\mathbf{A}^{-1} & \frac{1}{k} \end{bmatrix}, \qquad k = c - \mathbf{b}^T\mathbf{A}^{-1}\mathbf{b}.$$
Substituting $\mathbf{A}$ with $\mathbf{C}_x$ and $\mathbf{b}$ with $\mathbf{r}_{xy}$, we can express $\mathbf{C}_z^{-1}$ as follows using the lemma:

$$\mathbf{C}_z^{-1} = \begin{bmatrix} \mathbf{C}_x & \mathbf{r}_{xy} \\ \mathbf{r}_{xy}^T & \sigma_y^2 \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{C}_x^{-1} + \frac{1}{k}\,\mathbf{C}_x^{-1}\mathbf{r}_{xy}\mathbf{r}_{xy}^T\mathbf{C}_x^{-1} & -\frac{1}{k}\,\mathbf{C}_x^{-1}\mathbf{r}_{xy} \\ -\frac{1}{k}\,\mathbf{r}_{xy}^T\mathbf{C}_x^{-1} & \frac{1}{k} \end{bmatrix},$$

with $k = \sigma_y^2 - \mathbf{r}_{xy}^T\mathbf{C}_x^{-1}\mathbf{r}_{xy}$.
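The partitioned inverse can be sanity-checked against a direct numerical inverse; a minimal sketch with NumPy on a random positive definite covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
B = rng.standard_normal((N + 1, N + 1))
Cz = B @ B.T + np.eye(N + 1)                 # random positive definite covariance
Cx, rxy, sy2 = Cz[:N, :N], Cz[:N, N], Cz[N, N]

Cx_inv = np.linalg.inv(Cx)
k = sy2 - rxy @ Cx_inv @ rxy
w = Cx_inv @ rxy                             # C_x^{-1} r_xy
block_inv = np.block([[Cx_inv + np.outer(w, w) / k, (-w / k)[:, None]],
                      [(-w / k)[None, :],           np.array([[1 / k]])]])

print(np.allclose(block_inv, np.linalg.inv(Cz)))  # True
```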
$$f_{y|\mathbf{x}}(y|\mathbf{x}) = \frac{f_{\mathbf{x}y}(\mathbf{x}, y)}{f_{\mathbf{x}}(\mathbf{x})} = \frac{1}{\sqrt{2\pi}}\, \frac{|\mathbf{C}_x|^{\frac{1}{2}}}{|\mathbf{C}_z|^{\frac{1}{2}}}\, \frac{\exp\!\left(-\frac{1}{2}\mathbf{z}^T \mathbf{C}_z^{-1} \mathbf{z}\right)}{\exp\!\left(-\frac{1}{2}\mathbf{x}^T \mathbf{C}_x^{-1} \mathbf{x}\right)} = K\, \frac{\exp\!\left(-\frac{1}{2}\mathbf{z}^T \mathbf{C}_z^{-1} \mathbf{z}\right)}{\exp\!\left(-\frac{1}{2}\mathbf{x}^T \mathbf{C}_x^{-1} \mathbf{x}\right)}.$$

In the last equation, $K$ denotes a constant scalar. The ratio of $\exp\!\left(-\frac{1}{2}\mathbf{z}^T \mathbf{C}_z^{-1}\mathbf{z}\right)$ and $\exp\!\left(-\frac{1}{2}\mathbf{x}^T \mathbf{C}_x^{-1}\mathbf{x}\right)$ can be written as

$$\frac{\exp\!\left(-\frac{1}{2}\mathbf{z}^T \mathbf{C}_z^{-1} \mathbf{z}\right)}{\exp\!\left(-\frac{1}{2}\mathbf{x}^T \mathbf{C}_x^{-1} \mathbf{x}\right)} = \exp\!\left(-\frac{1}{2}\begin{bmatrix}\mathbf{x}^T & y\end{bmatrix} \mathbf{C}_z^{-1} \begin{bmatrix}\mathbf{x} \\ y\end{bmatrix} + \frac{1}{2}\begin{bmatrix}\mathbf{x}^T & y\end{bmatrix} \begin{bmatrix}\mathbf{C}_x^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0\end{bmatrix} \begin{bmatrix}\mathbf{x} \\ y\end{bmatrix}\right) = \exp\!\left(-\frac{1}{2}\begin{bmatrix}\mathbf{x}^T & y\end{bmatrix} \left(\mathbf{C}_z^{-1} - \begin{bmatrix}\mathbf{C}_x^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0\end{bmatrix}\right) \begin{bmatrix}\mathbf{x} \\ y\end{bmatrix}\right).$$
With the substitution of $\mathbf{C}_z^{-1}$, we get the following for the matrix in the quadratic product:

$$\mathbf{C}_z^{-1} - \begin{bmatrix}\mathbf{C}_x^{-1} & \mathbf{0} \\ \mathbf{0}^T & 0\end{bmatrix} = \frac{1}{k}\begin{bmatrix} \mathbf{C}_x^{-1}\mathbf{r}_{xy}\mathbf{r}_{xy}^T\mathbf{C}_x^{-1} & -\mathbf{C}_x^{-1}\mathbf{r}_{xy} \\ -\mathbf{r}_{xy}^T\mathbf{C}_x^{-1} & 1 \end{bmatrix},$$

with $k = \sigma_y^2 - \mathbf{r}_{xy}^T\mathbf{C}_x^{-1}\mathbf{r}_{xy}$. Then the density becomes:
$$f_{y|\mathbf{x}}(y|\mathbf{x}) = K \exp\!\left(-\frac{1}{2k}\begin{bmatrix}\mathbf{x}^T & y\end{bmatrix} \begin{bmatrix} \mathbf{C}_x^{-1}\mathbf{r}_{xy}\mathbf{r}_{xy}^T\mathbf{C}_x^{-1} & -\mathbf{C}_x^{-1}\mathbf{r}_{xy} \\ -\mathbf{r}_{xy}^T\mathbf{C}_x^{-1} & 1 \end{bmatrix} \begin{bmatrix}\mathbf{x} \\ y\end{bmatrix}\right)$$

$$f_{y|\mathbf{x}}(y|\mathbf{x}) = K \exp\!\left(-\frac{1}{2k}\left(\mathbf{x}^T\mathbf{C}_x^{-1}\mathbf{r}_{xy}\mathbf{r}_{xy}^T\mathbf{C}_x^{-1}\mathbf{x} - 2y\,\mathbf{r}_{xy}^T\mathbf{C}_x^{-1}\mathbf{x} + y^2\right)\right)$$
This not-so-friendly relation is our old friend in wolf's clothing. To recognize our friend, we just need to define $\hat{y} = \mathbf{r}_{xy}^T\mathbf{C}_x^{-1}\mathbf{x}$ and rewrite the same expression as follows:

$$f_{y|\mathbf{x}}(y|\mathbf{x}) = K \exp\!\left(-\frac{1}{2k}\left(y^2 - 2y\hat{y} + \hat{y}^2\right)\right) = K \exp\!\left(-\frac{1}{2k}\,(y - \hat{y})^2\right).$$
Hence the conditional density is $N\!\left(\mathbf{r}_{xy}^T\mathbf{C}_x^{-1}\mathbf{x},\ \sigma_y^2 - \mathbf{r}_{xy}^T\mathbf{C}_x^{-1}\mathbf{r}_{xy}\right)$. (This is one of the most important results for estimation theory. It shows that, for jointly Gaussian variables, the linear minimum mean square error estimator (a topic of EE503) coincides with the conditional mean, which is the optimal estimator for the general minimum mean square error estimation problem having no linearity constraints.)
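To close, the conditional mean and variance formulas can be verified by simulation; a minimal sketch with NumPy, using a random positive definite covariance (the orthogonality of the error and the estimate is also visible):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3
B = rng.standard_normal((N + 1, N + 1))
Cz = B @ B.T + np.eye(N + 1)                  # covariance of z = [x^T y]^T
Cx, rxy, sy2 = Cz[:N, :N], Cz[:N, N], Cz[N, N]

z = rng.multivariate_normal(np.zeros(N + 1), Cz, size=1_000_000)
x, y = z[:, :N], z[:, N]

w = np.linalg.solve(Cx, rxy)                  # C_x^{-1} r_xy
y_hat = x @ w                                 # conditional mean r_xy^T C_x^{-1} x
err = y - y_hat

print(err.var(), sy2 - rxy @ w)               # ~ sigma_y^2 - r_xy^T C_x^{-1} r_xy
print(np.corrcoef(err, y_hat)[0, 1])          # ~0, the error is orthogonal to y_hat
```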