Mutual Information Neural Estimation
Paper: https://arxiv.org/pdf/1801.04062v4.pdf
Code: https://github.com/mzgubic/MINE
Tips: an ICML 2018 paper.
(Reading notes)
1.Main idea
- The estimation of mutual information between high-dimensional continuous random variables can be achieved by gradient descent over neural networks.
- We present a Mutual Information Neural Estimator (MINE).
- The experiments demonstrate several applications.
2.Intro
- The paper proposes MINE, which is scalable, flexible, and completely trainable via back-prop, together with a thorough theoretical analysis.
- Mutual information yields better results than some GAN objective functions.
- MINE can be used to apply the Information Bottleneck method, which is designed for finding the best tradeoff between accuracy and complexity.
- Mutual information and $f$-divergences are assumed familiar. The Donsker-Varadhan representation gives an alternative expression for the KL divergence, where $T$ ranges over functions:

$$D_{\mathrm{KL}}(\mathbb{P}\,\|\,\mathbb{Q}) = \sup_{T:\Omega \rightarrow \mathbb{R}} \mathbb{E}_{\mathbb{P}}[T] - \log \left( \mathbb{E}_{\mathbb{Q}} \left[ e^{T} \right] \right) \tag{1}$$
The derivation (for the discrete case) is as follows:

$$\begin{aligned} \mathbb{E}_{\mathbb{P}}[T] - \log \left( \mathbb{E}_{\mathbb{Q}} [\exp(T)] \right) &= \sum_{i} p_i t_i - \log \sum_i q_i e^{t_i}\\ \frac{\partial \left[ \sum_{i} p_i t_i - \log \sum_i q_i e^{t_i} \right]}{\partial t_j} = 0 &\Rightarrow p_j - \frac{q_j e^{t_j}}{\sum_{i} q_i e^{t_i}} = 0 \\ &\Rightarrow p_j \sum_{i} q_i e^{t_i} = q_j e^{t_j}\\ &\Rightarrow t_j = \log \frac{p_j}{q_j} + \log \sum_{i} q_i e^{t_i} \end{aligned} \tag{2}$$

Substituting the optimal $t_j$ back into the objective, the constant term $\log \sum_i q_i e^{t_i}$ cancels and the supremum equals $\sum_i p_i \log \frac{p_i}{q_i} = D_{\mathrm{KL}}(\mathbb{P}\,\|\,\mathbb{Q})$.
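The fixed point can be checked numerically: plugging the optimal critic $t_j = \log(p_j/q_j)$ (the additive constant cancels) into the Donsker-Varadhan objective recovers the KL divergence exactly, while any other $T$ gives a strictly lower value. A minimal sketch, where `p` and `q` are arbitrary example distributions:

```python
import numpy as np

# Two discrete distributions on the same support (arbitrary example values).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

# Direct KL divergence.
kl = np.sum(p * np.log(p / q))

# Optimal critic from the derivation: t_j = log(p_j / q_j) + const
# (the constant cancels inside the DV objective, so we drop it).
t = np.log(p / q)

# Donsker-Varadhan objective: E_P[T] - log E_Q[e^T].
dv = np.sum(p * t) - np.log(np.sum(q * np.exp(t)))

# Any suboptimal T gives a smaller value, since DV is a lower bound on KL.
t_bad = t + 0.5 * np.random.default_rng(0).standard_normal(3)
dv_bad = np.sum(p * t_bad) - np.log(np.sum(q * np.exp(t_bad)))

print(kl, dv, dv_bad <= dv)
```

At the optimum the two quantities coincide; the perturbed critic confirms the supremum is attained there.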

MINE (Mutual Information Neural Estimation) is a method that estimates the mutual information between high-dimensional continuous random variables using neural networks and gradient descent. It provides a scalable, flexible estimator that is fully trainable via back-propagation. MINE has a range of applications, such as improving GAN objectives and the Information Bottleneck method.
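To make the idea concrete, here is a toy MINE-style estimate in NumPy. It is only a sketch, not the paper's implementation: the neural-network critic is replaced by a hand-picked quadratic family $T_\theta(x,y) = \theta_0 xy + \theta_1 x^2 + \theta_2 y^2$, and SGD by a crude grid search; marginal samples are obtained by shuffling $y$, a standard trick for sampling the product of marginals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated Gaussian pair; the true MI is -0.5 * log(1 - rho^2).
rho, n = 0.8, 50_000
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)
true_mi = -0.5 * np.log(1 - rho**2)

# Shuffling y breaks the dependence, giving samples from the
# product of the marginals.
y_shuf = rng.permutation(y)

def dv_bound(theta):
    # Toy quadratic critic standing in for the neural network in MINE.
    def T(a, b):
        return theta[0] * a * b + theta[1] * a**2 + theta[2] * b**2
    joint = T(x, y).mean()                          # E_{P_XY}[T]
    marginal = np.log(np.exp(T(x, y_shuf)).mean())  # log E_{P_X P_Y}[e^T]
    return joint - marginal

# Crude grid search over theta (the paper instead runs SGD on network weights).
best = max(
    dv_bound((t0, t1, t1))
    for t0 in np.linspace(0.0, 2.5, 26)
    for t1 in np.linspace(-1.0, 0.0, 11)
)

print(f"true MI = {true_mi:.3f}, DV lower bound = {best:.3f}")
```

For a bivariate Gaussian the exact log density ratio is itself quadratic in $(x, y)$, so this small family can get close to the true mutual information; with a richer critic (a neural network) the same maximization principle applies in high dimensions.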
