
Lecture 6: Entropy Rate

Entropy rate H(X)


Random walk on graph

Dr. Yao Xie, ECE587, Information Theory, Duke University

Coin tossing versus poker


Toss a fair coin and see a sequence
Head, Tail, Tail, Head

p(x1, x2, . . . , xn) ≈ 2^{−nH(X)}


Play a card game with a friend and see a sequence of cards: K, Q, 10, . . .

p(x1, x2, . . . , xn) ≈ ?


How to model dependence: Markov chain


A stochastic process X1, X2, . . .
States {X1, . . . , Xn}, each state Xi ∈ X
The next state depends only on the current state:

p(xn+1|xn, . . . , x1) = p(xn+1|xn).


Transition probability

Pij: the transition probability of i → j

p(xn+1) = ∑xn p(xn) p(xn+1|xn)
p(x1, x2, . . . , xn) = p(x1) p(x2|x1) · · · p(xn|xn−1)
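Both formulas are easy to exercise numerically. A minimal sketch (the 2-state transition matrix below is a made-up example) of the marginal update p(xn+1) = ∑xn p(xn) p(xn+1|xn) and of sampling a path via the chain-rule factorization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state transition matrix: P[i, j] = p(next = j | current = i)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Marginal update: p(x_{n+1}) = sum over x_n of p(x_n) p(x_{n+1} | x_n)
p = np.array([1.0, 0.0])        # start in state 0 with probability 1
for _ in range(5):
    p = p @ P                   # one step of the chain

# Sample a path using p(x1, ..., xn) = p(x1) prod p(x_k | x_{k-1})
x = [0]
for _ in range(10):
    x.append(rng.choice(2, p=P[x[-1]]))
print(p, x)
```

The row-vector product p @ P is exactly the marginal-update sum written out above.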

Hidden Markov model (HMM)


Used extensively in speech recognition, handwriting recognition,
machine learning.

Markov process X1, X2, . . . , Xn, unobservable


Observe a random process Y1, Y2, . . . , Yn, such that Yi ∼ p(yi|xi)
We can build a probability model

p(x^n, y^n) = p(x1) ∏_{i=1}^{n−1} p(xi+1|xi) ∏_{i=1}^{n} p(yi|xi)
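A minimal sketch of sampling from this factorization (the initial, transition, and emission distributions below are made-up illustrative values, not from any real recognizer):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical HMM: 2 hidden states, 2 observation symbols
P = np.array([[0.95, 0.05],     # p(x_{i+1} | x_i)
              [0.10, 0.90]])
E = np.array([[0.8, 0.2],       # p(y_i | x_i): emission probabilities
              [0.3, 0.7]])
pi0 = np.array([0.5, 0.5])      # p(x_1)

def sample_hmm(n):
    """Draw (x, y) from p(x^n, y^n) = p(x1) prod p(x_{i+1}|x_i) prod p(y_i|x_i)."""
    x = [rng.choice(2, p=pi0)]          # hidden chain
    for _ in range(n - 1):
        x.append(rng.choice(2, p=P[x[-1]]))
    y = [rng.choice(2, p=E[xi]) for xi in x]  # one observation per hidden state
    return x, y

x, y = sample_hmm(20)
print(x, y)
```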

Time invariance Markov chain


A Markov chain is time invariant if the conditional probability p(xn|xn−1)
does not depend on n:

p(Xn+1 = b|Xn = a) = p(X2 = b|X1 = a), for all a, b ∈ X
For this kind of Markov chain, define the transition matrix

P = [ P11  · · ·  P1n ]
    [  ⋮           ⋮  ]
    [ Pn1  · · ·  Pnn ]

Simple weather model


X = {Sunny: S, Rainy: R}
p(S|S) = 1 − α, p(R|R) = 1 − β, p(R|S) = α, p(S|R) = β

P = [ 1 − α      α   ]
    [   β      1 − β ]

(State diagram: S and R with self-loop probabilities 1 − α and 1 − β, transition α from S to R and β from R to S.)

Probability of seeing a sequence SSRR:


p(SSRR) = p(S) p(S|S) p(R|S) p(R|R) = p(S)(1 − α) α (1 − β)

How will this sequence behave after many days of observations?

What sequences of observations are more typical?


What is the probability of seeing a typical sequence?
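As a concrete check of the product formula, one can plug in numbers; α = 0.2 and β = 0.4 below are made-up values, and the chain is assumed to start from its stationary distribution π(S) = β/(α + β):

```python
# Hypothetical parameters: alpha = p(R|S), beta = p(S|R)
alpha, beta = 0.2, 0.4
p_S = beta / (alpha + beta)            # start from the stationary distribution

# p(SSRR) = p(S) p(S|S) p(R|S) p(R|R)
p_SSRR = p_S * (1 - alpha) * alpha * (1 - beta)
print(p_SSRR)                          # (2/3) * 0.8 * 0.2 * 0.6 = 0.064
```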


Stationary distribution
Stationary distribution: a distribution on the states such that the
distribution at time n + 1 is the same as the distribution at time n.
Our weather example:
If π(S) = β/(α + β), π(R) = α/(α + β), and

P = [ 1 − α      α   ]
    [   β      1 − β ]

then

p(Xn+1 = S) = p(S|S)π(S) + p(S|R)π(R)
            = (1 − α) β/(α + β) + β α/(α + β)
            = β/(α + β)
            = π(S).

How to calculate stationary distribution


The stationary distribution πi, i = 1, . . . , |X|, satisfies

πi = ∑j πj Pji   (i.e., π = πP), and

∑_{i=1}^{|X|} πi = 1.
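These two conditions can be solved together as one linear system. A small sketch (the weather-model numbers α = 0.2, β = 0.4 are illustrative):

```python
import numpy as np

def stationary(P):
    """Solve pi = pi P together with sum(pi) = 1 via least squares."""
    n = P.shape[0]
    # Stack (P^T - I) pi = 0 with the normalization row 1^T pi = 1
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

alpha, beta = 0.2, 0.4
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
pi = stationary(P)
print(pi)   # matches [beta, alpha] / (alpha + beta)
```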

Detailed balance: πi Pij = πj Pji for all i, j (a sufficient condition for stationarity).

(Diagram: probability flow πi Pij from state i to j balanced by πj Pji from j to i.)

Stationary process
A stochastic process is stationary if the joint distribution of any subset is
invariant to time shifts:

p(X1 = x1, . . . , Xn = xn) = p(X2 = x1, . . . , Xn+1 = xn).


Example: coin tossing
p(X1 = head, X2 = tail) = p(X2 = head, X3 = tail) = p(1 p).


Entropy rate
When the Xi are i.i.d., entropy H(X^n) = H(X1, . . . , Xn) = ∑_{i=1}^{n} H(Xi) = nH(X)

With a dependent sequence Xi, how does H(X^n) grow with n? Still linearly?
Entropy rate characterizes the growth rate.

Definition 1: average entropy per symbol

H(X) = lim_{n→∞} H(X^n)/n

Definition 2: rate of information innovation

H′(X) = lim_{n→∞} H(Xn|Xn−1, . . . , X1)

H′(X) exists, for Xi stationary


H(Xn|X1, . . . , Xn−1) ≤ H(Xn|X2, . . . , Xn−1)      (1) conditioning reduces entropy
                      = H(Xn−1|X1, . . . , Xn−2)     (2) stationarity

H(Xn|X1, . . . , Xn−1) decreases as n increases

H(Xn|X1, . . . , Xn−1) ≥ 0

A decreasing sequence bounded below must have a limit.


H(X) = H′(X), for Xi stationary


By the chain rule,

(1/n) H(X1, . . . , Xn) = (1/n) ∑_{i=1}^{n} H(Xi|Xi−1, . . . , X1)

Each H(Xn|X1, . . . , Xn−1) → H′(X)

Cesàro mean: if an → a and bn = (1/n) ∑_{i=1}^{n} ai, then bn → a.

So

(1/n) H(X1, . . . , Xn) → H′(X)

AEP for stationary process


−(1/n) log p(X1, . . . , Xn) → H(X)

p(X1, . . . , Xn) ≈ 2^{−nH(X)}

Typical sequences lie in a typical set of size ≈ 2^{nH(X)}
We can use nH(X) bits to represent a typical sequence
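A rough empirical check of this AEP statement, using the two-state weather chain with assumed parameters α = 0.2, β = 0.4: sample one long path and compare −(1/n) log2 p(path) with the entropy rate H(X2|X1):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 0.2, 0.4
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
pi = np.array([beta, alpha]) / (alpha + beta)

# Sample a long path from the stationary chain
n = 50_000
u = rng.random(n)
x = np.empty(n, dtype=int)
x[0] = 0 if u[0] < pi[0] else 1
for i in range(1, n):
    x[i] = 0 if u[i] < P[x[i - 1], 0] else 1

# -(1/n) log2 p(x1,...,xn) for the sampled path
logp = np.log2(pi[x[0]]) + np.sum(np.log2(P[x[:-1], x[1:]]))
rate = -logp / n

h = -np.sum(pi[:, None] * P * np.log2(P))   # entropy rate H(X2|X1)
print(rate, h)                               # the two should be close
```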


Entropy rate for Markov chain


For a Markov chain,

H(X) = lim H(Xn|Xn−1, . . . , X1) = lim H(Xn|Xn−1) = H(X2|X1)

By definition,

p(X2 = j|X1 = i) = Pij

Entropy rate of a Markov chain:

H(X) = − ∑_{i,j} πi Pij log Pij


Calculating the entropy rate is fairly easy


1. Find the stationary distribution πi
2. Use the transition probabilities Pij:

H(X) = − ∑_{i,j} πi Pij log Pij
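The two steps can be sketched directly (the 2-state chain and its stationary distribution below are an illustrative example, with 0 log 0 taken as 0):

```python
import numpy as np

def entropy_rate(P, pi):
    """H(X) = -sum_{i,j} pi_i P_ij log2 P_ij, with 0 log 0 = 0."""
    terms = np.where(P > 0, P * np.log2(np.where(P > 0, P, 1.0)), 0.0)
    return -np.sum(pi[:, None] * terms)

# Step 1: stationary distribution of a hypothetical 2-state chain
P = np.array([[0.50, 0.50],
              [0.25, 0.75]])
pi = np.array([1/3, 2/3])      # solves pi = pi P
# Step 2: plug pi and P into the formula
h = entropy_rate(P, pi)
print(h)                       # (1/3) H(0.5) + (2/3) H(0.25), in bits
```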


Entropy rate of weather model


Stationary distribution π(S) = β/(α + β), π(R) = α/(α + β)

P = [ 1 − α      α   ]
    [   β      1 − β ]

H(X) = − β/(α + β) [α log α + (1 − α) log(1 − α)]
       − α/(α + β) [β log β + (1 − β) log(1 − β)]

     = β/(α + β) H(α) + α/(α + β) H(β)

     ≤ H(2αβ/(α + β))     (by concavity of H)

Maximum when α = β = 1/2: the chain degenerates to an independent process
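The closed form β/(α + β) H(α) + α/(α + β) H(β) can be checked against the general formula −∑ πi Pij log2 Pij (α = 0.2, β = 0.4 below are made-up values):

```python
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha, beta = 0.2, 0.4

# Closed form for the two-state weather chain
h_closed = (beta * H2(alpha) + alpha * H2(beta)) / (alpha + beta)

# Direct formula -sum_{i,j} pi_i P_ij log2 P_ij for comparison
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
pi = np.array([beta, alpha]) / (alpha + beta)
h_direct = -np.sum(pi[:, None] * P * np.log2(P))
print(h_closed, h_direct)   # the two agree
```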

Random walk on graph


An undirected graph with m nodes {1, . . . , m}
Edge i–j has weight Wij ≥ 0 (Wij = Wji)
A particle walks randomly from node to node
Random walk X1, X2, . . .: a sequence of vertices
Given Xn = i, the next node is chosen from among the neighbors with probability

Pij = Wij / ∑k Wik

Entropy rate of random walk on graph


Let

Wi = ∑j Wij,    W = ∑_{i,j: i>j} Wij

The stationary distribution is

πi = Wi / (2W)

Can verify this is a stationary distribution: πP = π
The stationary distribution ∝ the weight of edges emanating from node i
(locality)
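The verification πP = π is a one-liner to check numerically; the 4-node symmetric weight matrix below is a made-up example:

```python
import numpy as np

# Hypothetical symmetric weight matrix for a 4-node undirected graph
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 2],
              [1, 1, 0, 1],
              [0, 2, 1, 0]], dtype=float)

Wi = W.sum(axis=1)              # W_i = sum_j W_ij
Wtot = W.sum() / 2              # W = sum over edges i > j of W_ij
pi = Wi / (2 * Wtot)            # claimed stationary distribution

P = W / Wi[:, None]             # P_ij = W_ij / sum_k W_ik
print(np.allclose(pi @ P, pi))  # verify pi P = pi

# Entropy rate of the walk: -sum pi_i P_ij log2 P_ij (0 log 0 = 0)
terms = np.where(P > 0, P * np.log2(np.where(P > 0, P, 1.0)), 0.0)
h = -np.sum(pi[:, None] * terms)
print(h)
```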

Summary
AEP for a stationary process X1, X2, . . . , Xi ∈ X: as n → ∞,

p(x1, . . . , xn) ≈ 2^{−nH(X)}

Entropy rate

H(X) = lim_{n→∞} H(Xn|Xn−1, . . . , X1) = lim_{n→∞} (1/n) H(X1, . . . , Xn)

Random walk on graph

πi = Wi / (2W)
