
Lecture 6: Entropy Rate

Entropy rate H(X)


Random walk on graph

Dr. Yao Xie, ECE587, Information Theory, Duke University

Coin tossing versus poker


Toss a fair coin and see a sequence
Head, Tail, Tail, Head

p(x1, x2, . . . , xn) ≈ 2^{−nH(X)}


Play a card game with a friend and see a sequence of cards: K, Q, 10, . . .

p(x1, x2, . . . , xn) ≈ ?


How to model dependence: Markov chain


A stochastic process X1, X2, . . .
States {X1, . . . , Xn}, each state Xi ∈ X
The next state depends only on the current state:

p(xn+1|xn, . . . , x1) = p(xn+1|xn).


Transition probability

Pij: the transition probability of i → j

p(xn+1) = ∑xn p(xn) p(xn+1|xn)
p(x1, x2, . . . , xn) = p(x1) p(x2|x1) · · · p(xn|xn−1)
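Both formulas are easy to exercise numerically. A minimal sketch (the 2-state transition matrix below is a made-up example) of the marginal update p(xn+1) = ∑xn p(xn) p(xn+1|xn) and of sampling a path via the chain-rule factorization:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state transition matrix: P[i, j] = p(next = j | current = i)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Marginal update: p(x_{n+1}) = sum over x_n of p(x_n) p(x_{n+1} | x_n)
p = np.array([1.0, 0.0])        # start in state 0 with probability 1
for _ in range(5):
    p = p @ P                   # one step of the chain

# Sample a path using p(x1, ..., xn) = p(x1) prod p(x_k | x_{k-1})
x = [0]
for _ in range(10):
    x.append(rng.choice(2, p=P[x[-1]]))
print(p, x)
```

The row-vector product p @ P is exactly the marginal-update sum written out above.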

Hidden Markov model (HMM)


Used extensively in speech recognition, handwriting recognition,
machine learning.

Markov process X1, X2, . . . , Xn, unobservable


Observe a random process Y1, Y2, . . . , Yn, such that Yi ∼ p(yi|xi)
We can build a probability model

p(x^n, y^n) = p(x1) ∏_{i=1}^{n−1} p(xi+1|xi) ∏_{i=1}^{n} p(yi|xi)
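A minimal sketch of sampling from this factorization (the initial, transition, and emission distributions below are made-up illustrative values, not from any real recognizer):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical HMM: 2 hidden states, 2 observation symbols
P = np.array([[0.95, 0.05],     # p(x_{i+1} | x_i)
              [0.10, 0.90]])
E = np.array([[0.8, 0.2],       # p(y_i | x_i): emission probabilities
              [0.3, 0.7]])
pi0 = np.array([0.5, 0.5])      # p(x_1)

def sample_hmm(n):
    """Draw (x, y) from p(x^n, y^n) = p(x1) prod p(x_{i+1}|x_i) prod p(y_i|x_i)."""
    x = [rng.choice(2, p=pi0)]          # hidden chain
    for _ in range(n - 1):
        x.append(rng.choice(2, p=P[x[-1]]))
    y = [rng.choice(2, p=E[xi]) for xi in x]  # one observation per hidden state
    return x, y

x, y = sample_hmm(20)
print(x, y)
```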

Time invariance Markov chain


A Markov chain is time invariant if the conditional probability p(xn|xn−1)
does not depend on n:

p(Xn+1 = b|Xn = a) = p(X2 = b|X1 = a), for all a, b ∈ X
For this kind of Markov chain, define the transition matrix

P = [ P11  · · ·  P1n ]
    [  ⋮           ⋮  ]
    [ Pn1  · · ·  Pnn ]

Simple weather model


X = {Sunny: S, Rainy: R}
p(S|S) = 1 − α, p(R|R) = 1 − β, p(R|S) = α, p(S|R) = β

P = [ 1 − α      α   ]
    [   β      1 − β ]

(State diagram: S and R with self-loop probabilities 1 − α and 1 − β, transition α from S to R and β from R to S.)

Probability of seeing a sequence SSRR:


p(SSRR) = p(S) p(S|S) p(R|S) p(R|R) = p(S)(1 − α) α (1 − β)

How will this sequence behave after many days of observations?

What sequences of observations are more typical?


What is the probability of seeing a typical sequence?
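As a concrete check of the product formula, one can plug in numbers; α = 0.2 and β = 0.4 below are made-up values, and the chain is assumed to start from its stationary distribution π(S) = β/(α + β):

```python
# Hypothetical parameters: alpha = p(R|S), beta = p(S|R)
alpha, beta = 0.2, 0.4
p_S = beta / (alpha + beta)            # start from the stationary distribution

# p(SSRR) = p(S) p(S|S) p(R|S) p(R|R)
p_SSRR = p_S * (1 - alpha) * alpha * (1 - beta)
print(p_SSRR)                          # (2/3) * 0.8 * 0.2 * 0.6 = 0.064
```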


Stationary distribution
Stationary distribution: a distribution on the states such that the
distribution at time n + 1 is the same as the distribution at time n.
Our weather example:
If π(S) = β/(α + β), π(R) = α/(α + β), and

P = [ 1 − α      α   ]
    [   β      1 − β ]

then

p(Xn+1 = S) = p(S|S)π(S) + p(S|R)π(R)
            = (1 − α) β/(α + β) + β α/(α + β)
            = β/(α + β)
            = π(S).

How to calculate stationary distribution


The stationary distribution πi, i = 1, . . . , |X|, satisfies

πi = ∑j πj Pji   (i.e., π = πP), and

∑_{i=1}^{|X|} πi = 1.
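These two conditions can be solved together as one linear system. A small sketch (the weather-model numbers α = 0.2, β = 0.4 are illustrative):

```python
import numpy as np

def stationary(P):
    """Solve pi = pi P together with sum(pi) = 1 via least squares."""
    n = P.shape[0]
    # Stack (P^T - I) pi = 0 with the normalization row 1^T pi = 1
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

alpha, beta = 0.2, 0.4
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
pi = stationary(P)
print(pi)   # matches [beta, alpha] / (alpha + beta)
```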

Detailed balance: πi Pij = πj Pji for all i, j (a sufficient condition for stationarity).

(Diagram: probability flow πi Pij from state i to j balanced by πj Pji from j to i.)

Stationary process
A stochastic process is stationary if the joint distribution of any subset is
invariant to time shifts:

p(X1 = x1, . . . , Xn = xn) = p(X2 = x1, . . . , Xn+1 = xn).


Example: coin tossing
p(X1 = head, X2 = tail) = p(X2 = head, X3 = tail) = p(1 p).


Entropy rate
When the Xi are i.i.d., entropy H(X^n) = H(X1, . . . , Xn) = ∑_{i=1}^{n} H(Xi) = nH(X)

With a dependent sequence Xi, how does H(X^n) grow with n? Still linearly?
Entropy rate characterizes the growth rate.

Definition 1: average entropy per symbol

H(X) = lim_{n→∞} H(X^n)/n

Definition 2: rate of information innovation

H′(X) = lim_{n→∞} H(Xn|Xn−1, . . . , X1)

H′(X) exists, for Xi stationary


H(Xn|X1, . . . , Xn−1) ≤ H(Xn|X2, . . . , Xn−1)      (1) conditioning reduces entropy
                      = H(Xn−1|X1, . . . , Xn−2)     (2) stationarity

H(Xn|X1, . . . , Xn−1) decreases as n increases

H(Xn|X1, . . . , Xn−1) ≥ 0

A decreasing sequence bounded below must have a limit.


H(X) = H′(X), for Xi stationary


By the chain rule,

(1/n) H(X1, . . . , Xn) = (1/n) ∑_{i=1}^{n} H(Xi|Xi−1, . . . , X1)

Each H(Xn|X1, . . . , Xn−1) → H′(X)

Cesàro mean: if an → a and bn = (1/n) ∑_{i=1}^{n} ai, then bn → a.

So

(1/n) H(X1, . . . , Xn) → H′(X)

AEP for stationary process


−(1/n) log p(X1, . . . , Xn) → H(X)

p(X1, . . . , Xn) ≈ 2^{−nH(X)}

Typical sequences lie in a typical set of size ≈ 2^{nH(X)}
We can use nH(X) bits to represent a typical sequence
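A rough empirical check of this AEP statement, using the two-state weather chain with assumed parameters α = 0.2, β = 0.4: sample one long path and compare −(1/n) log2 p(path) with the entropy rate H(X2|X1):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 0.2, 0.4
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
pi = np.array([beta, alpha]) / (alpha + beta)

# Sample a long path from the stationary chain
n = 50_000
u = rng.random(n)
x = np.empty(n, dtype=int)
x[0] = 0 if u[0] < pi[0] else 1
for i in range(1, n):
    x[i] = 0 if u[i] < P[x[i - 1], 0] else 1

# -(1/n) log2 p(x1,...,xn) for the sampled path
logp = np.log2(pi[x[0]]) + np.sum(np.log2(P[x[:-1], x[1:]]))
rate = -logp / n

h = -np.sum(pi[:, None] * P * np.log2(P))   # entropy rate H(X2|X1)
print(rate, h)                               # the two should be close
```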


Entropy rate for Markov chain


For a Markov chain,

H(X) = lim H(Xn|Xn−1, . . . , X1) = lim H(Xn|Xn−1) = H(X2|X1)

By definition,

p(X2 = j|X1 = i) = Pij

Entropy rate of a Markov chain:

H(X) = − ∑_{i,j} πi Pij log Pij


Calculating the entropy rate is fairly easy


1. Find the stationary distribution πi
2. Use the transition probabilities Pij:

H(X) = − ∑_{i,j} πi Pij log Pij
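The two steps can be sketched directly (the 2-state chain and its stationary distribution below are an illustrative example, with 0 log 0 taken as 0):

```python
import numpy as np

def entropy_rate(P, pi):
    """H(X) = -sum_{i,j} pi_i P_ij log2 P_ij, with 0 log 0 = 0."""
    terms = np.where(P > 0, P * np.log2(np.where(P > 0, P, 1.0)), 0.0)
    return -np.sum(pi[:, None] * terms)

# Step 1: stationary distribution of a hypothetical 2-state chain
P = np.array([[0.50, 0.50],
              [0.25, 0.75]])
pi = np.array([1/3, 2/3])      # solves pi = pi P
# Step 2: plug pi and P into the formula
h = entropy_rate(P, pi)
print(h)                       # (1/3) H(0.5) + (2/3) H(0.25), in bits
```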


Entropy rate of weather model


Stationary distribution π(S) = β/(α + β), π(R) = α/(α + β)

P = [ 1 − α      α   ]
    [   β      1 − β ]

H(X) = − β/(α + β) [α log α + (1 − α) log(1 − α)]
       − α/(α + β) [β log β + (1 − β) log(1 − β)]

     = β/(α + β) H(α) + α/(α + β) H(β)

     ≤ H(2αβ/(α + β))     (by concavity of H)

Maximum when α = β = 1/2: the chain degenerates to an independent process
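The closed form β/(α + β) H(α) + α/(α + β) H(β) can be checked against the general formula −∑ πi Pij log2 Pij (α = 0.2, β = 0.4 below are made-up values):

```python
import numpy as np

def H2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha, beta = 0.2, 0.4

# Closed form for the two-state weather chain
h_closed = (beta * H2(alpha) + alpha * H2(beta)) / (alpha + beta)

# Direct formula -sum_{i,j} pi_i P_ij log2 P_ij for comparison
P = np.array([[1 - alpha, alpha], [beta, 1 - beta]])
pi = np.array([beta, alpha]) / (alpha + beta)
h_direct = -np.sum(pi[:, None] * P * np.log2(P))
print(h_closed, h_direct)   # the two agree
```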

Random walk on graph


An undirected graph with m nodes {1, . . . , m}
Edge i–j has weight Wij ≥ 0 (Wij = Wji)
A particle walks randomly from node to node
Random walk X1, X2, . . .: a sequence of vertices
Given Xn = i, the next node is chosen from among the neighbors with probability

Pij = Wij / ∑k Wik

Entropy rate of random walk on graph


Let

Wi = ∑j Wij,    W = ∑_{i,j: i>j} Wij

The stationary distribution is

πi = Wi / (2W)

Can verify this is a stationary distribution: πP = π
The stationary distribution ∝ the weight of edges emanating from node i
(locality)
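The verification πP = π is a one-liner to check numerically; the 4-node symmetric weight matrix below is a made-up example:

```python
import numpy as np

# Hypothetical symmetric weight matrix for a 4-node undirected graph
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 2],
              [1, 1, 0, 1],
              [0, 2, 1, 0]], dtype=float)

Wi = W.sum(axis=1)              # W_i = sum_j W_ij
Wtot = W.sum() / 2              # W = sum over edges i > j of W_ij
pi = Wi / (2 * Wtot)            # claimed stationary distribution

P = W / Wi[:, None]             # P_ij = W_ij / sum_k W_ik
print(np.allclose(pi @ P, pi))  # verify pi P = pi

# Entropy rate of the walk: -sum pi_i P_ij log2 P_ij (0 log 0 = 0)
terms = np.where(P > 0, P * np.log2(np.where(P > 0, P, 1.0)), 0.0)
h = -np.sum(pi[:, None] * terms)
print(h)
```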

Summary
AEP for a stationary process X1, X2, . . . , Xi ∈ X: as n → ∞,

p(x1, . . . , xn) ≈ 2^{−nH(X)}

Entropy rate

H(X) = lim_{n→∞} H(Xn|Xn−1, . . . , X1) = lim_{n→∞} (1/n) H(X1, . . . , Xn)

Random walk on graph

πi = Wi / (2W)
