Amount of Information - Formula Expression of Entropy - KL Divergence - Cross Entropy


Author: 大写的D

Tags: deep learning, generative adversarial networks

Article link: https://blog.csdn.net/weixin_41932943/article/details/120725153


Amount of Information

When we read papers or news, why isn't every article equally important to us? Some are truly enlightening, while others are like the proverbial old lady's foot-binding cloth, long and smelly (老太太的裹脚布，又臭又长), full of nonsense. The reason is that they carry different amounts of information. But how can we measure that?

It is easy to see that:
The amount of information depends only on the probability of an event.

The smaller the probability of an event, the more information its occurrence carries, like Yasuo getting a penta kill on your team;
The larger the probability, the less information it carries. For example, "my first blog post will be read by many people" (it is certain to happen, so it carries no information).

Formula Expression of Entropy

Let's derive the formula for Entropy together:
When facing a new concept, we should consider its mathematical properties. Here the key property is additivity, which means

F(event A and B and C…) = F(event A) + F(event B) + F(event C) + …

and, as discussed above, the amount of information depends only on probability, so:

F(p(A, B, C…)) = F(p(A)) + F(p(B)) + F(p(C)) + …

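And as we all know, if events A and B are independent, then: p(A, B) = p(A)*p(B). So F has to turn a product of probabilities into a sum, which is exactly what the logarithm does.
In addition, the amount of information should not be negative. Since p(A) ≤ 1 makes log(p(A)) ≤ 0, we add a minus sign and define:

F(p(A)) = -log(p(A))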
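
The design above can be checked numerically. The following is a minimal sketch, not part of the original derivation: the function name `information` and the two probabilities are illustrative assumptions. It checks that -log turns the product of independent probabilities into a sum, and that rarer events carry more information.

```python
import math


def information(p: float) -> float:
    """Amount of information -log(p) of an event with probability p (natural log)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p)


# Hypothetical probabilities of two independent events A and B.
p_a, p_b = 0.5, 0.1

# F(p(A, B)) with p(A, B) = p(A) * p(B) for independent events.
joint = information(p_a * p_b)
# F(p(A)) + F(p(B))
separate = information(p_a) + information(p_b)

print(joint, separate)          # the two values agree up to rounding
print(information(p_b) > information(p_a))  # the rarer event carries more information
```

An event with probability 1 gives `information(1.0) == 0`, matching the intuition that something certain to happen tells us nothing.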

As for the choice of base, it only sets the unit of measurement. It is just like the Celsius scale, where 0° and 100° are the melting point of ice and the boiling point of water at standard pressure. Theoretically, defining 0° and 100° based on alcohol would not cause any problem.
In information theory, the base is generally 2, which can be understood as taking "tossing a coin and getting heads" as the benchmark: F(p(heads)) = -log2(1/2) = 1, so one fair coin toss carries exactly 1 bit. Why 2? Because computers are binary, so information transmission is easy to calculate in bits.
In neural networks, e is generally used as the base, simply because the derivative of the natural logarithm is easy to calculate for gradient descent.
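
To make the role of the base concrete, here is a small sketch (the variable names are illustrative assumptions). It computes the information of a fair coin toss in base 2 and in base e; the two results differ only by a constant factor, so the choice of base is a choice of unit.

```python
import math

p_heads = 0.5  # probability of heads for a fair coin

bits = -math.log2(p_heads)  # base 2: information measured in bits
nats = -math.log(p_heads)   # base e: information measured in nats

# Changing the base only rescales the result by a constant factor (ln 2 here).
ratio = nats / bits

print(bits, nats, ratio)
```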

Let's get down to business. Entropy is defined as the degree of uncertainty of information. Mathematically, it equals the expectation of the amount of information over all possible outcomes x, so we get:
H(p) = Σ_x p(x) * F(p(x)) = -Σ_x p(x) * log(p(x))
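
Entropy as "the expectation of the amount of information" can be sketched in a few lines. This is a minimal illustration for a discrete distribution given as a list of probabilities; the function name `entropy` and the example distributions are assumptions made for this sketch.

```python
import math


def entropy(probs, base=2.0):
    """Entropy of a discrete distribution: the expectation of -log(p)."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped: p * log(p) tends to 0 as p tends to 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)


fair_coin = entropy([0.5, 0.5])    # most uncertain two-outcome case
biased_coin = entropy([0.9, 0.1])  # less uncertain
certain = entropy([1.0, 0.0])      # no uncertainty at all

print(fair_coin, biased_coin, certain)
```

The more predictable the outcome, the lower the entropy: the fair coin gives the largest value of the three and the certain event gives zero.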

Relative Entropy (KL Divergence)

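KL Divergence measures how much one probability distribution differs from another. It is often described as a distance, but it is not symmetric: swapping the two distributions generally changes the result.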

For two distributions p and q over the same outcomes x:
D_KL(p || q) = Σ_x p(x) * log(p(x) / q(x))
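In my understanding, KL Divergence measures event B from A's point of view. Maybe it is clearer in this form, where p and q are the distributions of A and B and F is the amount of information defined above:
D_KL(p || q) = Σ_x p(x) * (F(q(x)) - F(p(x)))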
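
The "from A's point of view" reading can be tried out numerically. Below is a minimal sketch using the standard definition D_KL(p || q) = Σ_x p(x) * log(p(x) / q(x)); the function name `kl_divergence` and the example distributions are illustrative assumptions. The outcomes are weighted by p, so swapping the two arguments changes the answer.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions over the same outcomes, in nats."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue          # outcomes that never happen under p contribute nothing
        if qx == 0:
            return math.inf   # p allows an outcome that q rules out
        total += px * math.log(px / qx)
    return total


p = [0.5, 0.5]  # distribution of A (hypothetical)
q = [0.9, 0.1]  # distribution of B (hypothetical)

d_pq = kl_divergence(p, q)  # B measured from A's point of view
d_qp = kl_divergence(q, p)  # A measured from B's point of view

print(d_pq, d_qp)
```

Both values are positive but different, and `kl_divergence(p, p)` is zero: a distribution measured against itself shows no difference.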

Cross Entropy

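Expanding the logarithm in the KL Divergence:
D_KL(p || q) = Σ_x p(x) * log(p(x)) - Σ_x p(x) * log(q(x))
The first part is the negative entropy of p, -H(p). The second part is called Cross Entropy:
H(p, q) = -Σ_x p(x) * log(q(x))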

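It is widely used as a loss function, because p(x) usually denotes the real distribution of the data, which does not change during training, so the first part is a constant. Minimizing the KL Divergence is therefore the same as minimizing the second part, which is what we want to decrease.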
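
The relation between the three quantities can be verified with a short sketch. The helper names and the distributions below are illustrative assumptions, and the code assumes q(x) > 0 wherever p(x) > 0. It checks that cross entropy equals the entropy of p plus the KL Divergence, so with p fixed, comparing two predictions by cross entropy is the same as comparing them by KL Divergence.

```python
import math


def entropy(p):
    """H(p) = -sum_x p(x) * log(p(x)), in nats."""
    return -sum(px * math.log(px) for px in p if px > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log(q(x)); assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)


def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)); same assumption on q."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)


p = [0.7, 0.2, 0.1]       # "real" data distribution, fixed during training (hypothetical)
q_good = [0.6, 0.3, 0.1]  # a prediction close to p (hypothetical)
q_bad = [0.1, 0.2, 0.7]   # a prediction far from p (hypothetical)

ce_good = cross_entropy(p, q_good)
ce_bad = cross_entropy(p, q_bad)

# Cross entropy = entropy of p (a constant) + KL Divergence.
print(ce_good, entropy(p) + kl_divergence(p, q_good))
print(ce_bad, entropy(p) + kl_divergence(p, q_bad))
```

Because `entropy(p)` is the same in both lines, the prediction with the lower cross entropy is also the one with the lower KL Divergence.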

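I wrote this post mainly to make sure I understand the material myself: if I can explain it clearly in my English, then I really do understand it. Corrections to the content or the grammar are welcome!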
