KL distance and methods for handling zero values

wyx9027

Category: Data Mining

Article link: https://blog.csdn.net/u014689510/article/details/50358178

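The KL divergence is an asymmetric measure of the difference between two probability distributions, introduced by Solomon Kullback and Richard Leibler in 1951. It gives the expected number of extra bits needed to encode samples from P with a code based on Q instead of a code based on P. It is not a true distance metric, but it is a special case of the f-divergences. When the KL divergence is computed on sparse data sets, a zero denominator can occur (q_i = 0 where p_i > 0), which makes the divergence infinite. The MATLAB function KLDIV computes the KL divergence between two distributions, including symmetric and Jensen-Shannon variants.

(Summary generated automatically by CSDN.)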
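The summary mentions two practical issues: the zero denominator (some q_i = 0 where p_i > 0) and the symmetric and Jensen-Shannon variants. The Python sketch below illustrates both. It is not the KLDIV code and not necessarily the zero-handling method this article goes on to describe: the additive smoothing in the hypothetical helper smooth() (with an assumed eps of 1e-6) is only one common workaround, and the names kl_divergence, symmetric_kl and jensen_shannon are illustrative.

```python
import math


def kl_divergence(p, q, base=2.0):
    """D_KL(P || Q) for two discrete distributions given as equal-length sequences.

    Terms with p_i == 0 contribute nothing (convention 0 * log(0 / q_i) = 0).
    If p_i > 0 while q_i == 0 (the "zero denominator" case) the result is infinite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi, base)
    return total


def smooth(dist, eps=1e-6):
    """Hypothetical helper (assumption): additive smoothing, add eps to every entry and renormalise."""
    shifted = [x + eps for x in dist]
    total = sum(shifted)
    return [x / total for x in shifted]


def symmetric_kl(p, q, base=2.0):
    """One symmetrised form, D(P || Q) + D(Q || P); some definitions divide this by 2."""
    return kl_divergence(p, q, base) + kl_divergence(q, p, base)


def jensen_shannon(p, q, base=2.0):
    """Jensen-Shannon divergence: mean KL of P and Q to their mixture M = (P + Q) / 2.

    M is positive wherever P or Q is, so no zero denominator can occur.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)


p = [0.5, 0.5, 0.0]
q = [0.5, 0.0, 0.5]

print(kl_divergence(p, q))                  # inf: q[1] == 0 while p[1] == 0.5
print(symmetric_kl(p, q))                   # inf as well
print(kl_divergence(smooth(p), smooth(q)))  # finite, but its size depends on eps
print(jensen_shannon(p, q))                 # 0.5 (bits)
```

Smoothing makes the result finite, but the value depends on eps, so numbers computed with different eps are not directly comparable. The Jensen-Shannon form sidesteps the problem because the mixture is positive wherever either distribution is, at the cost of measuring a different (bounded, symmetric) quantity.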

Pasted from: http://www.cppblog.com/sosi/archive/2010/10/16/130127.aspx

In probability theory and information theory, the Kullback–Leibler divergence[1][2][3] (also information divergence, information gain, relative entropy, or KLIC) is a non-symmetric measure of the difference between two probability distributions P and Q. KL measures the expected number of extra bits required to code samples from P when using a code based on Q, rather than using a code based on P. Typically P represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution. The measure Q typically represents a theory, model, description, or approximation of P.
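The "extra bits" reading can be checked numerically: the average code length for samples from P under a code built for Q is the cross-entropy H(P, Q), the optimum is the entropy H(P), and their difference equals D_KL(P || Q) = Σ_i p_i log2(p_i / q_i). The Python sketch below assumes small discrete distributions with q_i > 0 wherever p_i > 0; the example distributions and function names are made up for illustration.

```python
import math


def entropy_bits(p):
    """H(P): average code length in bits of an optimal code built for P itself."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy_bits(p, q):
    """H(P, Q): average code length in bits when samples from P are coded with a code built for Q.

    Assumes q_i > 0 wherever p_i > 0 (otherwise math.log2 raises ValueError).
    """
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_bits(p, q):
    """D_KL(P || Q) = sum_i p_i * log2(p_i / q_i), same assumption as above."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.25, 0.25]  # "true" distribution (illustrative)
q = [1/3, 1/3, 1/3]    # model / approximation (illustrative)

extra_bits = cross_entropy_bits(p, q) - entropy_bits(p)
print(round(extra_bits, 4), round(kl_bits(p, q), 4))  # 0.085 0.085
```

Here the optimal code for P needs 1.5 bits per symbol, while the uniform code built for Q needs log2(3) ≈ 1.585 bits, and the gap is exactly the KL divergence.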

Although it is often intuited as a distance metric, the KL divergence is not a true metric: it is not symmetric (the KL divergence from P to Q is generally not the same as the KL divergence from Q to P), and it does not satisfy the triangle inequality.
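A two-outcome example makes the asymmetry concrete. The Python sketch below uses natural logarithms and illustrative distributions, and assumes q_i > 0 wherever p_i > 0; it simply evaluates the divergence in both directions.

```python
import math


def kl(p, q):
    """D_KL(P || Q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.9, 0.1]
q = [0.5, 0.5]

print(round(kl(p, q), 3))  # 0.368
print(round(kl(q, p), 3))  # 0.511
```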

KL divergence is a special case of a broader class of divergences called f-divergences. Originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions, it is not the same as a divergence in calculus. The KL divergence can also be derived as a Bregman divergence, namely the one generated by the negative entropy function.
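Both derivations can be checked numerically. With the generator f(t) = t ln t, the f-divergence Σ_i q_i f(p_i / q_i) reduces to Σ_i p_i ln(p_i / q_i); with the negative entropy F(x) = Σ_i x_i ln x_i, the Bregman divergence F(p) - F(q) - <∇F(q), p - q> gives the same value for normalised distributions. The Python sketch below assumes strictly positive entries; all function names are illustrative.

```python
import math


def kl(p, q):
    """Direct definition in nats: sum_i p_i * ln(p_i / q_i); all entries assumed > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def f_divergence(p, q, f):
    """Generic f-divergence D_f(P || Q) = sum_i q_i * f(p_i / q_i), assuming q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    """f(t) = t ln t, the generator that turns the f-divergence into KL."""
    return t * math.log(t)


def bregman(p, q, F, grad_F):
    """Bregman divergence D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    return F(p) - F(q) - sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))


def neg_entropy(x):
    """Negative entropy F(x) = sum_i x_i ln x_i (convex on the positive orthant)."""
    return sum(xi * math.log(xi) for xi in x)


def neg_entropy_grad(x):
    """Gradient of the negative entropy: dF/dx_i = ln x_i + 1."""
    return [math.log(xi) + 1 for xi in x]


p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

print(kl(p, q))
print(f_divergence(p, q, kl_generator))
print(bregman(p, q, neg_entropy, neg_entropy_grad))
# All three agree up to floating-point rounding.
```

For vectors that do not sum to 1, the Bregman form yields the generalised KL divergence Σ_i p_i ln(p_i / q_i) - Σ_i p_i + Σ_i q_i, which reduces to the ordinary KL divergence only when both vectors are normalised.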