Understanding Self-Information and Information Entropy: Why Is Self-Information Computed This Way?

Latest recommended article published on 2022-05-13 20:17:33

Colin_Downey


Category: Essays. Tags: information entropy, machine learning, probability theory

Article link: https://blog.csdn.net/Colin_Downey/article/details/105217487



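I never had a really intuitive feel for Shannon's information entropy, so today I used some slack time at work to read up on it.

Let's first look at how Shannon viewed "information" in communication: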

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.”

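Put simply, Shannon approaches "information" mainly from an engineering standpoint. The definition of "information" here is therefore probabilistic rather than semantic. What a message is meant to convey does not matter here. What we need to focus on is the random process by which messages (usually sequences of characters) appear.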

An event's self-information (Self