[1] Information Bottleneck (Chinese blog post).
[2] Shao, Jiawei, Yuyi Mao, and Jun Zhang. “Learning task-oriented communication for edge inference: An information bottleneck approach.” IEEE Journal on Selected Areas in Communications 40.1 (2022): 197-211.
[3] Tishby, Naftali, Fernando C. Pereira, and William Bialek. “The information bottleneck method.” arXiv preprint physics/0004057 (2000).
[4] An Introduction to HSIC: An Interesting Approach to Measuring Dependence (Chinese blog post); see also Ma, Wan-Duo Kurt, J. P. Lewis, and W. Bastiaan Kleijn. “The HSIC bottleneck: Deep learning without back-propagation.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.
[5] Tian, Xudong, et al. “Variational distillation for multi-view learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).
[6] Zhang, Changqing, et al. “Deep partial multi-view learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence 44.5 (2020): 2402-2415.
[7] Andrew, Galen, et al. “Deep canonical correlation analysis.” International Conference on Machine Learning. PMLR, 2013.
[8] Hardoon, David R., Sandor Szedmak, and John Shawe-Taylor. “Canonical correlation analysis: An overview with application to learning methods.” Neural Computation 16.12 (2004): 2639-2664.
[9] A Ramble on Reparameterization: From the Normal Distribution to Gumbel Softmax (Chinese blog post).
[10] Deep INFOMAX (Chinese blog post by Jianlin Su).
[11] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 586–595.
[12] H. Wu, Y. Shao, E. Ozfatura, K. Mikolajczyk, and D. Gündüz, “Transformer-aided wireless image transmission with channel feedback,” IEEE Transactions on Wireless Communications, doi: 10.1109/TWC.2024.3386052.
[13] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
1-IB
The main idea of the Information Bottleneck is to view a multi-layer neural network as a Markov chain through which information is passed layer by layer. Along the chain, the information is progressively compressed: whatever is irrelevant to the output is discarded, and whatever is relevant is retained. In other words, the mutual information between each layer and the model input should gradually decrease, while the mutual information between each layer and the model output should gradually increase [1].
Following [2] and [3], the IB idea is expressed by the following objective:
$$
\begin{aligned}
\mathcal{L}_{IB}(\boldsymbol{\phi}) &= \underbrace{-I(\hat{Z}, Y)}_{\text{Distortion}} + \beta \underbrace{I(\hat{Z}, X)}_{\text{Rate}} \\
&= \mathbf{E}_{p(\boldsymbol{x}, \boldsymbol{y})}\left\{\mathbf{E}_{p_{\boldsymbol{\phi}}(\hat{\boldsymbol{z}} \mid \boldsymbol{x})}\left[-\log p(\boldsymbol{y} \mid \hat{\boldsymbol{z}})\right] + \beta D_{KL}\left(p_{\boldsymbol{\phi}}(\hat{\boldsymbol{z}} \mid \boldsymbol{x}) \,\|\, p(\hat{\boldsymbol{z}})\right)\right\} - H(Y) \\
&\equiv \mathbf{E}_{p(\boldsymbol{x}, \boldsymbol{y})}\left\{\mathbf{E}_{p_{\boldsymbol{\phi}}(\hat{\boldsymbol{z}} \mid \boldsymbol{x})}\left[-\log p(\boldsymbol{y} \mid \hat{\boldsymbol{z}})\right] + \beta D_{KL}\left(p_{\boldsymbol{\phi}}(\hat{\boldsymbol{z}} \mid \boldsymbol{x}) \,\|\, p(\hat{\boldsymbol{z}})\right)\right\},
\end{aligned}
$$
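Here the cross-entropy term recovers $-I(\hat{Z}, Y)$ up to the constant $H(Y)$ (hence the $\equiv$ in the last line), and the KL term equals the rate $I(\hat{Z}, X)$ when $p(\hat{\boldsymbol{z}})$ is the true marginal, or upper-bounds it when a fixed variational prior is substituted. To make this concrete, below is a minimal PyTorch sketch of the resulting variational IB loss, assuming a Gaussian encoder $p_{\boldsymbol{\phi}}(\hat{\boldsymbol{z}} \mid \boldsymbol{x})$, a standard normal prior, and the reparameterization trick of [9]; the module name `VIBClassifier`, the helper `vib_loss`, and all layer sizes are illustrative choices, not taken from [2].

```python
# Minimal sketch (assumed, not from [2]) of the variational IB objective:
# Gaussian encoder p_phi(z_hat | x) = N(mu(x), diag(sigma(x)^2)),
# standard normal prior p(z_hat) = N(0, I), classifier head for p(y | z_hat).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBClassifier(nn.Module):
    def __init__(self, in_dim=784, z_dim=32, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu_head = nn.Linear(256, z_dim)      # mean of p_phi(z_hat | x)
        self.logvar_head = nn.Linear(256, z_dim)  # log-variance of p_phi(z_hat | x)
        self.decoder = nn.Linear(z_dim, num_classes)  # parameterizes p(y | z_hat)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization trick [9]: z_hat = mu + sigma * eps, eps ~ N(0, I)
        z_hat = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z_hat), mu, logvar

def vib_loss(logits, y, mu, logvar, beta=1e-3):
    # Distortion: Monte-Carlo estimate of E[-log p(y | z_hat)] via cross-entropy.
    distortion = F.cross_entropy(logits, y)
    # Rate: closed-form D_KL( N(mu, diag(sigma^2)) || N(0, I) ), batch-averaged.
    rate = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
    return distortion + beta * rate
```

As in the equation, $\beta$ trades rate against distortion: a larger $\beta$ pushes $p_{\boldsymbol{\phi}}(\hat{\boldsymbol{z}} \mid \boldsymbol{x})$ toward the prior and discards more information about $X$, while a smaller $\beta$ preserves more task-relevant detail.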
$\hat{Z} \rightarrow X \rightarrow Y$ forms a Markov chain, where $X$ is the source, $Y$ is the task result (the label, in image classification), $Z$ is the channel-input signal obtained by encoding the source $X$, and $\hat{Z}$ is the signal after passing through the channel. Our goal is to minimize the conditional mutual information $I(X, \hat{Z} \mid Y)$; this mode of compression makes $I(\hat{Z}, X)$