Entropy
Definition
- Let $X$ be a discrete random variable with alphabet $\mathcal{X}$ and probability mass function $p(x) = \Pr(X = x),\ x \in \mathcal{X}$.
- The entropy of $X$ is defined as
  $$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$
  It is a measure of the uncertainty of a random variable.
- $H(X)$ depends only on $p(x)$; we also write $H(p)$ for $H(X)$.
- $H(X) \ge 0$
- When $X$ is uniform over $\mathcal{X}$, then $H(X) = \log \lvert \mathcal{X} \rvert$.
- $H_{b}(X) = (\log_{b} a)\, H_{a}(X)$ (change of the logarithm base).
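As a quick numerical check of these properties, here is a minimal sketch in Python (the pmf values and the helper name `entropy` are my own illustrative assumptions, not from the notes) that computes $H(X)$ and verifies nonnegativity, the bound attained by the uniform distribution, and the change-of-base identity $H_{b}(X) = (\log_{b} a)\, H_{a}(X)$.

```python
import math

def entropy(pmf, base=2.0):
    """H(X) = -sum_x p(x) log p(x); terms with p(x) = 0 contribute 0."""
    return -sum(p * math.log(p, base) for p in pmf if p > 0)

# A small example distribution (illustrative values).
p = [0.5, 0.25, 0.125, 0.125]

H2 = entropy(p, base=2)                 # entropy in bits
assert H2 >= 0                          # H(X) >= 0
assert H2 <= math.log2(len(p))          # H(X) <= log |X|

# Uniform distribution over |X| = 4 symbols attains the bound: H = log2 4 = 2 bits.
uniform = [0.25] * 4
assert math.isclose(entropy(uniform, base=2), math.log2(4))

# Change of base: H_b(X) = log_b(a) * H_a(X), here with a = e (nats), b = 2 (bits).
H_nats = entropy(p, base=math.e)
assert math.isclose(H2, math.log2(math.e) * H_nats)
print(f"H(X) = {H2:.3f} bits = {H_nats:.3f} nats")
```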
Example
- Binary entropy function $H(p)$:
  Let
  $$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases}$$
  $$H(X) = -p \log p - (1-p) \log (1-p)$$
- $H(X) = -E_{p}[\log p(X)]$
- For a discrete random variable $X$ defined on $\mathcal{X}$,
  $$0 \le H(X) \le \log \lvert \mathcal{X} \rvert$$
  with equality on the right if and only if $p(x) = 1/\lvert \mathcal{X} \rvert$ (the uniform distribution maximizes entropy; see the sketch after this list).
- Jensen's inequality for a concave function $f$ is widely applied in these proofs:
  $$\sum_{i} p_{i} f(x_{i}) \le f\Big(\sum_{i} p_{i} x_{i}\Big)$$
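To make the binary entropy example and the maximum-entropy bound concrete, here is a small sketch (the function name `binary_entropy` and the probability values are illustrative assumptions). It evaluates $H(p) = -p\log p - (1-p)\log(1-p)$ in bits and checks that it vanishes at the endpoints and peaks at $p = 1/2$ with value $\log_{2} 2 = 1$ bit.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with 0 log 0 taken as 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# H(p) is 0 at the endpoints (no uncertainty) and maximal at p = 1/2.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  H(p) = {binary_entropy(p):.4f} bits")

assert binary_entropy(0.5) == 1.0                                    # uniform over {0, 1}
assert all(binary_entropy(p) <= 1.0 for p in (0.1, 0.3, 0.7, 0.9))   # 0 <= H(p) <= log|X|
```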
Joint Entropy
- Two random variables $X$ and $Y$ can be considered to be a single vector-valued random variable.
- The joint entropy $H(X,Y)$ of a pair of discrete random variables $(X,Y)$ with joint distribution $p(x,y)$ is defined as
  $$H(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x,y)$$
- $H(X,Y) = -E[\log p(X,Y)]$
- $H(X,X) = H(X)$
- $H(X,Y) = H(Y,X)$
- $H(X_{1}, X_{2}, \dots, X_{n}) = -\sum p(x_{1}, x_{2}, \dots, x_{n}) \log p(x_{1}, x_{2}, \dots, x_{n})$
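A minimal sketch of the joint-entropy definition, assuming a small made-up joint table `p_xy` (the values are illustrative, not from the notes); it checks the symmetry $H(X,Y) = H(Y,X)$ and the identity $H(X,X) = H(X)$.

```python
import math

def joint_entropy(p_xy, base=2.0):
    """H(X,Y) = -sum_{x,y} p(x,y) log p(x,y) over a dict {(x, y): prob}."""
    return -sum(p * math.log(p, base) for p in p_xy.values() if p > 0)

# A small made-up joint distribution p(x, y).
p_xy = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.40, (1, 1): 0.10,
}

# Symmetry: H(X,Y) = H(Y,X).
p_yx = {(y, x): p for (x, y), p in p_xy.items()}
assert math.isclose(joint_entropy(p_xy), joint_entropy(p_yx))

# H(X,X) = H(X): the distribution of the pair (X, X) puts p(x) on the diagonal.
p_x = {0: 0.5, 1: 0.5}
p_xx = {(x, x): p for x, p in p_x.items()}
assert math.isclose(joint_entropy(p_xx),
                    -sum(p * math.log2(p) for p in p_x.values()))

print(f"H(X,Y) = {joint_entropy(p_xy):.4f} bits")
```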
Conditional Entropy
- Entropy of the conditional distribution $p(Y \mid X = x)$:
  $$H(Y \mid X = x) = -\sum_{y} p(y \mid X = x) \log p(y \mid X = x) = -E[\log p(Y \mid X = x)]$$
- The conditional entropy averages this over $x$: $H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x) = -E[\log p(Y \mid X)]$.
- When $X$ is known, uncertainty about $Y$ cannot increase: $H(Y \mid X) \le H(Y)$.
- In general, $H(X \mid Y) \ne H(Y \mid X)$.
- $H(X \mid Y) + H(Y) = H(Y \mid X) + H(X) = H(X,Y)$ (chain rule).
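These relations can be checked numerically. The sketch below (reusing the made-up joint table from the previous example; the helper names are my own) computes $H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x)$ and verifies the chain rule $H(X,Y) = H(X) + H(Y \mid X)$ as well as $H(Y \mid X) \le H(Y)$.

```python
import math
from collections import defaultdict

def H(pmf):
    """Entropy in bits of a pmf given as an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

# Same made-up joint distribution as before.
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.40, (1, 1): 0.10}

# Marginals p(x) and p(y).
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in p_xy.items():
    p_x[x] += p
    p_y[y] += p

# H(Y|X) = sum_x p(x) H(Y | X = x), where p(y | X = x) = p(x, y) / p(x).
H_Y_given_X = sum(
    p_x[x] * H(p_xy[(x, y)] / p_x[x] for y in p_y if (x, y) in p_xy)
    for x in p_x
)

H_XY = H(p_xy.values())
assert math.isclose(H_XY, H(p_x.values()) + H_Y_given_X)  # chain rule
assert H_Y_given_X <= H(p_y.values()) + 1e-12             # H(Y|X) <= H(Y)
print(f"H(Y|X) = {H_Y_given_X:.4f} bits, H(X,Y) = {H_XY:.4f} bits")
```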
Zero Entropy
- If $H(Y \mid X) = 0$:
- then $Y$ is a function of $X$;
- equivalently, $H(Y \mid X = x) = 0$ for every $x$ with $p(x) > 0$, i.e. given $X = x$, $Y$ takes a single value.
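As a sanity check of this last point, here is a tiny sketch (with an assumed deterministic map $Y = f(X)$, here $f(x) = x \bmod 2$, and made-up values for $p(x)$) showing that $H(Y \mid X) = 0$ when $Y$ is a function of $X$.

```python
import math

# Made-up p(x) and a deterministic map Y = f(X).
p_x = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}

def f(x):
    return x % 2

H_Y_given_X = 0.0
for x, px in p_x.items():
    # Given X = x, Y = f(x) with probability 1, so every term p(y|x) log p(y|x) is 0.
    cond = {f(x): 1.0}
    H_Y_given_X += px * -sum(q * math.log2(q) for q in cond.values() if q > 0)

assert H_Y_given_X == 0.0   # Y a function of X  <=>  H(Y | X) = 0
print("H(Y|X) =", H_Y_given_X)
```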