Entropy(1): A representation of uncertainty and its basic properties

I first encountered entropy in physics, where, in thermodynamics, it measures how energy disperses at a given temperature. Later I learned about information entropy in communication theory. And then in machine learning, it is widely used as a representation of uncertainty.
Here we are talking about Shannon entropy, defined as $-\sum_{i=1}^K p_i \log_2 p_i$. There are other measures of uncertainty, but people tend to choose Shannon entropy because of its good properties [1].
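
As a quick illustration of the definition, here is a minimal NumPy sketch (the helper name `shannon_entropy` is mine):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits, with the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # zero-probability outcomes drop out
    return -np.sum(p * np.log2(p))

print(shannon_entropy([0.5, 0.5]))     # 1.0 bit: a fair coin
print(shannon_entropy([0.9, 0.1]))     # ~0.47 bits: a biased coin is less uncertain
```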

The following are mainly summarized and extended from [1].

What are the basic properties of Shannon entropy?

Property 1: Uniform distribution has max entropy

This can be proved using the weighted AM–GM inequality [2]:
$$\frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w} \geq \sqrt[w]{x_1^{w_1} x_2^{w_2} \cdots x_n^{w_n}}, \qquad w = w_1 + w_2 + \cdots + w_n,$$
by letting $\frac{w_i}{w} = p_i$ and $x_i = \frac{1}{p_i}$. The left side becomes $\sum_i p_i \cdot \frac{1}{p_i} = K$ and the right side becomes $\prod_i p_i^{-p_i} = 2^{H(p)}$, so $H(p) \leq \log_2 K$, with equality exactly when every $p_i = \frac{1}{K}$.
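
A quick numerical sanity check of this bound, reusing the `shannon_entropy` helper sketched above:

```python
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
K = 4

# No distribution over K outcomes beats the uniform one: H <= log2(K).
for _ in range(1000):
    p = rng.dirichlet(np.ones(K))      # a random point on the simplex
    assert shannon_entropy(p) <= np.log2(K) + 1e-12

print(shannon_entropy(np.full(K, 1 / K)), np.log2(K))   # 2.0 2.0
```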

Property 2: Additivity of independent events

In equation form:
$$H(X, Y) = H(X) + H(Y) \quad \text{if } X \perp Y.$$
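
A small numerical check of additivity under independence (a sketch, again reusing the `shannon_entropy` helper from above):

```python
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

px = np.array([0.2, 0.5, 0.3])
py = np.array([0.6, 0.4])
joint = np.outer(px, py)             # independence: p(x, y) = p(x) * p(y)

print(shannon_entropy(joint))                         # H(X, Y) ~ 2.456
print(shannon_entropy(px) + shannon_entropy(py))      # H(X) + H(Y), same value
```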

Another function, $-\sum_{i=1}^K p_i^2$, satisfies the first property but fails this one. That is why the trace of a covariance matrix, as a representation of uncertainty, may not be as good as entropy.
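
To see the failure concretely, here is a sketch comparing the two sides for an independent pair (the helper name `neg_sum_sq` is mine for the alternative function):

```python
import numpy as np

def neg_sum_sq(p):
    """The alternative measure -sum(p_i^2): maximized by the uniform
    distribution, but not additive over independent variables."""
    return -np.sum(np.asarray(p, dtype=float) ** 2)

px = np.array([0.2, 0.5, 0.3])
py = np.array([0.6, 0.4])
joint = np.outer(px, py)                   # independent joint distribution

print(neg_sum_sq(joint))                   # -0.1976
print(neg_sum_sq(px) + neg_sum_sq(py))     # -0.90, so additivity fails
```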

Property 3: Zero-probability outcomes do not contribute to entropy

$$H(p_1, p_2, \dots, p_n) = H(p_1, p_2, \dots, p_n, p_{n+1} = 0)$$
This relies on the convention $0 \log_2 0 = 0$, which is justified by the limit $\lim_{p \to 0^+} p \log_2 p = 0$.
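
A one-line check of this property with the `shannon_entropy` sketch from above:

```python
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # implements the 0 * log(0) = 0 convention
    return -np.sum(p * np.log2(p))

# Appending an impossible outcome leaves the entropy unchanged.
print(shannon_entropy([0.5, 0.3, 0.2]))        # ~1.485
print(shannon_entropy([0.5, 0.3, 0.2, 0.0]))   # same value
```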

Property 4: Continuity in all arguments

Some other measures also satisfy this property, such as the trace or the determinant of a covariance matrix.

Note: there is a Uniqueness Theorem [1]

Khinchin (1957) showed that the only family of functions satisfying the four basic properties described above is of the following form:
$$H(p_1, p_2, \dots, p_K) = -\lambda \sum_{i=1}^K p_i \log_2 p_i$$
where $\lambda$ is a positive constant. Khinchin referred to this as the Uniqueness Theorem. Setting $\lambda = 1$ and using the binary logarithm gives us the Shannon entropy.
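
The constant $\lambda$ amounts to a choice of units: switching the base of the logarithm only rescales the entropy by a fixed factor. A quick sketch:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])

bits = -np.sum(p * np.log2(p))    # lambda = 1 with the binary logarithm
nats = -np.sum(p * np.log(p))     # natural logarithm: the lambda = ln(2) member

print(nats, np.log(2) * bits)     # identical values
```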
To reiterate, entropy is used because it has desirable properties and is the natural choice among the family of functions that satisfy all items on the basic wish list (properties 1–4).

Beyond the four basic properties discussed above, there are other interesting facts about entropy that I will explore in later posts.

References

[1] Sebastian Kwiatkowski, “Entropy is a measure of uncertainty.”
[2] “Inequality of arithmetic and geometric means,” Wikipedia.
