Entropy rate
For $n$ random variables, the entropy rate describes how the entropy of the sequence of random variables behaves as $n$ grows.
The entropy rate of a stochastic process $\{X_i\}$ can be defined in two ways.
The first:
$$H(\mathcal{X})=\lim\limits_{n\rightarrow\infty}\frac{H(X_1, X_2, \cdots,X_n)}{n}$$
When $X_1, X_2, \cdots, X_n$ are independent and identically distributed,
$$H(\mathcal{X})=\lim\limits_{n\rightarrow\infty}\frac{H(X_1, X_2, \cdots,X_n)}{n}=\lim\limits_{n\rightarrow\infty}\frac{nH(X_1)}{n}=H(X_1)$$
That is, $H(\mathcal{X})=H(X_i)=H(X_1)$.
Entropy is the average number of bits needed to encode a single symbol. The entropy rate is likewise an average number of bits per symbol (the entropy rate per symbol, or the per-symbol entropy of the $n$ random variables), so in the i.i.d. case the two coincide.
Let $\{X_i\}$ be a stochastic process such that all $X_i$ are i.i.d. Recall that the entropy is the average number of bits needed to encode a single source symbol. Since all $X_i$ are i.i.d., each random variable emits symbols according to the same distribution, so the output of each random variable can be encoded using $H(X_i)$ bits. If the entropy rate of a stochastic process is the average number of bits used to encode a source symbol, it makes sense that for an i.i.d. stochastic process the entropy rate is equal to the entropy of its random variables.
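The i.i.d. case can be checked numerically. A minimal sketch (the distribution `p` below is a hypothetical example, not from the text): it builds the joint pmf of $n$ independent draws as a product of marginals and verifies that the per-symbol entropy stays at $H(X_1)$.

```python
import itertools
import math

def entropy(p):
    """Entropy in bits of a discrete pmf given as a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def joint_entropy_iid(p, n):
    # The joint pmf of n i.i.d. draws is the product of the marginals.
    joint = [math.prod(combo) for combo in itertools.product(p, repeat=n)]
    return entropy(joint)

p = [0.5, 0.25, 0.25]  # hypothetical source distribution, H(X_1) = 1.5 bits
for n in range(1, 5):
    print(n, joint_entropy_iid(p, n) / n)  # per-symbol entropy stays at 1.5
```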
The second definition: the conditional entropy of the last random variable given all the preceding ones (the conditional entropy of the last random variable given the past),
$$H^\prime(\mathcal{X})=\lim\limits_{n\rightarrow\infty} H(X_n\mid X_{n-1}, X_{n-2},\cdots , X_1)$$
For a stationary stochastic process, $H(\mathcal{X})=H^\prime(\mathcal{X})$; both $\frac{H(X_1, X_2, \cdots,X_n)}{n}$ and $H(X_n\mid X_{n-1}, X_{n-2},\cdots , X_1)$ are nonincreasing in $n$; and $H(X_n\mid X_{n-1}, X_{n-2},\cdots , X_1)\leq \frac{H(X_1, X_2, \cdots,X_n)}{n}$ for all $n$.
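This decrease toward a common limit can be made concrete. A sketch under stated assumptions (the two-state chain `P` and its stationary distribution `mu` are hypothetical examples): by the chain rule together with the Markov property and stationarity, $H(X_1,\cdots,X_n)=H(X_1)+(n-1)H(X_2\mid X_1)$, so the per-symbol entropy falls monotonically toward $H(X_2\mid X_1)$ and never drops below it.

```python
import math

def h(p):
    # Entropy in bits of a discrete pmf.
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical two-state stationary Markov chain (not from the text).
P = [[0.9, 0.1],
     [0.4, 0.6]]
mu = [0.8, 0.2]  # stationary distribution: mu_j = sum_i mu_i * P_ij

H1 = h(mu)                                   # H(X_1)
Hc = sum(mu[i] * h(P[i]) for i in range(2))  # H(X_2 | X_1)

# H(X_1..X_n) = H(X_1) + (n-1) * H(X_2|X_1), so the per-symbol entropy
# decreases with n and stays above the conditional entropy Hc.
for n in (1, 2, 5, 50):
    per_symbol = (H1 + (n - 1) * Hc) / n
    assert Hc <= per_symbol
    print(n, per_symbol)
```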
Entropy rate of a stationary Markov process
$$H^\prime(\mathcal{X})=\lim\limits_{n\rightarrow\infty} H(X_n\mid X_{n-1}, X_{n-2},\cdots , X_1)=\lim\limits_{n\rightarrow\infty} H(X_n\mid X_{n-1})=H(X_2\mid X_1)=H(X_1,X_2)-H(X_1)$$

where the second equality uses the Markov property and the third uses stationarity.
Combining the formula for conditional entropy, $H(Y\mid X)=\sum\limits_{x\in\mathfrak{X}}p(x)H (Y \mid X = x)$, with the stationarity condition of the Markov chain, $\mu_j=\sum_i\mu_iP_{ij}$,
the entropy rate of a stationary Markov process is:
$$H^\prime ( \mathcal{X} ) = H (X_2 \mid X_1 ) =\sum\limits_i \mu_i \left(\sum\limits_j(-P_{ij}\log P_{ij})\right)=-\sum\limits_{ij}\mu_i P_{ij}\log P_{ij}$$
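This formula is easy to evaluate once the stationary distribution is known. A minimal sketch (the two-state transition matrix `P` is a hypothetical example): it finds $\mu$ by power iteration on $\mu_j=\sum_i\mu_iP_{ij}$, then evaluates $-\sum_{ij}\mu_i P_{ij}\log_2 P_{ij}$.

```python
import math

# Hypothetical two-state transition matrix, used only for illustration.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P, iters=1000):
    # Power iteration: repeatedly apply P until mu converges to mu = mu P.
    n = len(P)
    mu = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu

def entropy_rate(P, mu):
    # H'(X) = -sum_{ij} mu_i * P_ij * log2(P_ij), skipping zero entries.
    return -sum(mu[i] * P[i][j] * math.log2(P[i][j])
                for i in range(len(P))
                for j in range(len(P))
                if P[i][j] > 0)

mu = stationary(P)
print(mu)                   # approximately [0.8, 0.2]
print(entropy_rate(P, mu))  # about 0.57 bits per symbol
```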