Date | Unknown | Interpretations | Source |
---|---|---|---|
2018-05-16 09:14:18; 2018-05-17 18:42:15 | Interpreting $D_{KL}(P\Vert Q)$ under Bayesian inference | 1. $D_{KL}(P\Vert Q)$ is a measure of the information gained when one revises one's beliefs from the prior probability distribution Q to the posterior probability distribution P. In other words, it is the amount of information lost when Q is used to approximate P. In applications, P typically represents the "true" distribution of data, observations, or a precisely calculated theoretical distribution, while Q typically represents a theory, model, description, or approximation of P. To find a distribution Q that is closest to P, we can minimize the KL divergence and compute an information projection. 2. Imagine a coder designed for a source that generates symbols according to a probability distribution Q. What happens if the source instead generates symbols drawn from a different distribution P? If the coder had been designed for P (instead of Q), it would need $H(P)$ bits per symbol. But since our coder was designed for Q, it ends up generating $H(P,Q)$ bits per symbol (the "cross entropy" between P and Q). The difference between $H(P,Q)$ and $H(P)$ is exactly $D_{KL}(P\Vert Q)$: the average number of extra bits per symbol paid for using a code matched to Q instead of P. | |
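To make the second (coding) interpretation concrete, here is a minimal NumPy sketch that checks the identity $H(P,Q) = H(P) + D_{KL}(P\Vert Q)$ on a small discrete example. The distributions `P` and `Q` below are made-up illustrative values, not part of the original notes.

```python
import numpy as np

# Two discrete distributions over the same 3-symbol alphabet (illustrative values).
P = np.array([0.5, 0.25, 0.25])   # "true" source distribution
Q = np.array([0.25, 0.5, 0.25])   # distribution the coder was designed for

def entropy(p):
    """H(P) = -sum p*log2(p): bits per symbol for a code matched to P."""
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(P,Q) = -sum p*log2(q): bits per symbol when the code is built for Q
    but the symbols are actually drawn from P."""
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """D_KL(P||Q) = sum p*log2(p/q): the extra bits per symbol."""
    return np.sum(p * np.log2(p / q))

H_P  = entropy(P)
H_PQ = cross_entropy(P, Q)
D    = kl_divergence(P, Q)

print(f"H(P)       = {H_P:.4f} bits/symbol")   # 1.5000
print(f"H(P,Q)     = {H_PQ:.4f} bits/symbol")  # 1.7500
print(f"D_KL(P||Q) = {D:.4f} bits/symbol")     # 0.2500

# The coding interpretation: cross entropy = entropy + KL divergence.
assert np.isclose(H_PQ, H_P + D)
```

With these numbers the coder designed for Q spends 1.75 bits per symbol instead of the optimal 1.5, and the 0.25-bit overhead is precisely $D_{KL}(P\Vert Q)$.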
Entropy, Bayesian definitions — 2018-07-02 07:29:09