Date | Unknown | Interpretations | Source |
---|---|---|---|
2018-05-16 09:14:18; 2018-05-17 18:42:15 | Interpreting $D_{KL}(P\Vert Q)$ under Bayesian inference | 1. $D_{KL}(P\Vert Q)$ is a measure of the information gained when one revises one's beliefs from the prior probability distribution Q to the posterior probability distribution P. In other words, it is the amount of information lost when Q is used to approximate P. In applications, P typically represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution, while Q typically represents a theory, model, description, or approximation of P. To find a distribution Q that is closest to P, we can minimize the KL divergence and compute an information projection. 2. Imagine a coder designed for a source that generates symbols according to a probability distribution Q. What happens if the source instead generates symbols drawn from a different distribution P? If the coder had been designed for P (instead of Q), it would need $H(P)$ bits per symbol. But since our coder was designed for Q, it ends up generating $H(P,Q)$ bits per symbol (the "cross entropy" between P and Q). The difference between $H(P,Q)$ and $H(P)$ is exactly $D_{KL}(P\Vert Q)$: the average number of extra bits per symbol paid for using a code matched to Q instead of P. | |
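To make the second (coding) interpretation concrete, here is a minimal NumPy sketch that checks the identity $H(P,Q) = H(P) + D_{KL}(P\Vert Q)$ on a small discrete example. The distributions `P` and `Q` below are made-up illustrative values, not part of the original notes.

```python
import numpy as np

# Two discrete distributions over the same 3-symbol alphabet (illustrative values).
P = np.array([0.5, 0.25, 0.25])   # "true" source distribution
Q = np.array([0.25, 0.5, 0.25])   # distribution the coder was designed for

def entropy(p):
    """H(P) = -sum p*log2(p): bits per symbol for a code matched to P."""
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(P,Q) = -sum p*log2(q): bits per symbol when the code is built for Q
    but the symbols are actually drawn from P."""
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """D_KL(P||Q) = sum p*log2(p/q): the extra bits per symbol."""
    return np.sum(p * np.log2(p / q))

H_P  = entropy(P)
H_PQ = cross_entropy(P, Q)
D    = kl_divergence(P, Q)

print(f"H(P)       = {H_P:.4f} bits/symbol")   # 1.5000
print(f"H(P,Q)     = {H_PQ:.4f} bits/symbol")  # 1.7500
print(f"D_KL(P||Q) = {D:.4f} bits/symbol")     # 0.2500

# The coding interpretation: cross entropy = entropy + KL divergence.
assert np.isclose(H_PQ, H_P + D)
```

With these numbers the coder designed for Q spends 1.75 bits per symbol instead of the optimal 1.5, and the 0.25-bit overhead is precisely $D_{KL}(P\Vert Q)$.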
Entropy, Bayesian definitions — 2018-07-02 07:29:09