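I never had a truly intuitive feel for Shannon's information entropy, so today I slacked off a bit and studied the question.
Let's first look at how Shannon viewed "information" in communication: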
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.”
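Simply put, Shannon approaches "information" mostly from an engineering standpoint. The definition of "information" here is therefore probabilistic rather than semantic. The meaning a message is supposed to convey does not matter here. What we care about is the random process by which messages (usually sequences of characters) are produced.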
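To make the "probabilistic, not semantic" point concrete, here is a minimal sketch. It assumes a deliberately crude model in which a message is just a bag of characters, and the helper name `symbol_probabilities` is made up for this illustration. Under that model, two sentences with different meanings can be indistinguishable.

```python
from collections import Counter


def symbol_probabilities(message):
    """Estimate each character's probability from its relative frequency.

    Hypothetical helper for illustration only: it ignores character order
    and meaning, and looks purely at how often each symbol occurs.
    """
    counts = Counter(message)
    total = len(message)
    return {symbol: count / total for symbol, count in counts.items()}


# Two messages with very different meanings...
a = "dog bites man"
b = "man bites dog"

# ...but exactly the same character statistics.
print(symbol_probabilities(a) == symbol_probabilities(b))  # True
```

A real source model would also account for the order of the symbols, but even this toy version shows that the description never needs to know what the message means.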
The self-information of an event (Self