Chen, C., et al. (2019). “This looks like that: deep learning for interpretable image recognition.” Advances in neural information processing systems 32.
The model has a ‘transparent reasoning process’: it identifies several parts of the image where it thinks that *this* part of the image looks like *that* prototypical part of some class, and makes its prediction from a weighted combination of the similarity scores between image parts and the learned prototypes.
Architecture
$f\to g_p\to h$
$f$: convolutional layers
$g_p$: prototype layer
$h$: fully-connected layer
conv layer
$H_0\times W_0\times D_0$: input dimensions
$H\times W\times D$: output dimensions
$224\times 224\times 3\to 7\times 7\times D$ with $D\in\{128,256,512\}$ in this work
$w_{conv}$: parameters
prototype layer
$P=\{p_j\}_{j=1}^m$: prototype set
$P_k\subset P$: prototypes allocated to class $k$; $|P_k|=m_k$, $\sum_k m_k=m$
$k\in\{1,\dots,K\}$; $K=10$ in this work
$H_1\times W_1\times D$: prototype shape, with $H_1\le H$ and $W_1\le W$
$H_1=W_1=1$ in this work
Each prototype represents some prototypical activation pattern in a patch of the convolutional output, which in turn corresponds to some prototypical image patch in the original pixel space.
$g_{p_j}(z)=\max\limits_{\tilde z\in\mathrm{patches}(z)}\log\left(\dfrac{\|\tilde z-p_j\|_2^2+1}{\|\tilde z-p_j\|_2^2+\epsilon}\right)$
all patches of $z$ have the same shape as $p_j$
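A minimal NumPy sketch of this activation, assuming $H_1=W_1=1$ as in this work (so every spatial position of the conv output is a candidate patch); the function name and the $\epsilon$ value are illustrative:

```python
import numpy as np

def prototype_similarity(z, p, eps=1e-4):
    """Similarity score g_{p_j}(z) for a single prototype.

    z: conv output, shape (H, W, D); p: one prototype, shape (D,),
    i.e. H1 = W1 = 1, so every spatial position of z is a patch.
    """
    d2 = ((z - p) ** 2).sum(axis=-1)           # squared L2 distance per patch, (H, W)
    scores = np.log((d2 + 1.0) / (d2 + eps))   # large when a patch is close to p
    return scores.max()                        # max-pool over all patches

# planting the prototype in the map gives the maximal score log(1/eps)
rng = np.random.default_rng(0)
z = rng.normal(size=(7, 7, 128))
p = z[3, 4].copy()
print(np.isclose(prototype_similarity(z, p), np.log(1 / 1e-4)))  # True
```

The log form is monotone decreasing in the distance, so the max over patches is attained by the patch closest to $p_j$.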
FC layer
$m$: input dimension (the $m$ similarity scores produced by the prototype layer)
$w_h$: weight matrix
Training
the following 3 steps can be cycled more than once
SGD of the layers before the last layer
fix $w_h$:
$w_h^{(k,j)}=1$ when $p_j\in P_k$
$w_h^{(k,j)}=-0.5$ when $p_j\notin P_k$
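A small sketch of this fixed initialization, assuming every class owns the same number of prototypes and that class $k$'s prototypes occupy a contiguous block (the block layout and function name are my assumptions, not from the paper):

```python
import numpy as np

def init_last_layer(K, m_k):
    """Fixed w_h: +1 on a class's own prototypes, -0.5 elsewhere.

    Assumes class k owns the contiguous prototype block
    k*m_k .. (k+1)*m_k - 1. Returns shape (K, K * m_k).
    """
    w_h = np.full((K, K * m_k), -0.5)
    for k in range(K):
        w_h[k, k * m_k:(k + 1) * m_k] = 1.0    # w_h^{(k,j)} = 1 for p_j in P_k
    return w_h
```

With these weights fixed, a high similarity to a class's own prototypes raises that class's logit, while similarity to other classes' prototypes lowers it.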
loss func.
$\min\limits_{P,w_{conv}}\dfrac{1}{n}\sum\limits_{i=1}^n\mathrm{CrsEnt}(h\circ g_P\circ f(x_i),y_i)+\lambda_1\mathrm{Clst}+\lambda_2\mathrm{Sep}$
$\mathrm{Clst}=\dfrac{1}{n}\sum\limits_{i=1}^n\min\limits_{j:p_j\in P_{y_i}}\ \min\limits_{z\in\mathrm{patches}(f(x_i))}\|z-p_j\|_2^2$
$\mathrm{Sep}=-\dfrac{1}{n}\sum\limits_{i=1}^n\min\limits_{j:p_j\notin P_{y_i}}\ \min\limits_{z\in\mathrm{patches}(f(x_i))}\|z-p_j\|_2^2$ (the inner minimum is over prototypes *not* of $x_i$'s class, so minimizing $\mathrm{Sep}$ pushes them away from every patch)
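Both costs can be sketched in NumPy for the $1\times 1$-prototype case (variable names are illustrative; this assumes every class in the batch has at least one prototype and vice versa):

```python
import numpy as np

def clst_sep(Z, proto, labels, proto_class):
    """Cluster and separation costs for 1x1 prototypes.

    Z: (n, H, W, D) conv outputs; proto: (m, D) prototypes;
    labels: (n,) class of each image; proto_class: (m,) class of each prototype.
    """
    n, H, W, D = Z.shape
    patches = Z.reshape(n, H * W, D)
    # (n, H*W, m) squared distances between every patch and every prototype
    d2 = ((patches[:, :, None, :] - proto[None, None, :, :]) ** 2).sum(-1)
    d2_min = d2.min(axis=1)                        # nearest patch per prototype, (n, m)
    own = proto_class[None, :] == labels[:, None]  # True where p_j is of x_i's class
    clst = np.where(own, d2_min, np.inf).min(axis=1).mean()   # pull own prototypes close
    sep = -np.where(~own, d2_min, np.inf).min(axis=1).mean()  # push other-class ones away
    return clst, sep
```

Minimizing `clst` encourages each training image to have a patch close to some prototype of its own class; minimizing `sep` (which is negated) keeps all patches far from prototypes of other classes.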
prototype projection
$p_j\gets\arg\min\limits_{z\in\mathcal Z_j}\|z-p_j\|_2$
$\mathcal Z_j=\{\tilde z:\tilde z\in\mathrm{patches}(f(x_i))\ \forall i\ \text{s.t.}\ y_i=k\}$ for $p_j\in P_k$
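A minimal sketch of the projection step for $1\times 1$ prototypes (names are illustrative): each prototype is snapped to the nearest latent patch drawn from training images of its own class, so it becomes identifiable with a real image patch.

```python
import numpy as np

def project_prototypes(proto, proto_class, Z, labels):
    """Push each 1x1 prototype onto its nearest same-class latent patch.

    proto: (m, D) prototypes; proto_class: (m,) class of each prototype;
    Z: (n, H, W, D) conv outputs; labels: (n,) class of each image.
    """
    m, D = proto.shape
    new = proto.copy()
    for j in range(m):
        # Z_j: all patches z~ of f(x_i) over training images of p_j's class
        pool = Z[labels == proto_class[j]].reshape(-1, D)
        d2 = ((pool - proto[j]) ** 2).sum(axis=1)
        new[j] = pool[d2.argmin()]            # arg min_z ||z - p_j||_2
    return new
```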
convex optimization of last layer
fix $P$ and $w_{conv}$
$\min\limits_{w_h}\dfrac{1}{n}\sum\limits_{i=1}^n\mathrm{CrsEnt}(h\circ g_P\circ f(x_i),y_i)+\lambda\sum\limits_{k=1}^K\ \sum\limits_{j:p_j\notin P_k}|w_h^{(k,j)}|$
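A rough sketch of this convex step using plain (sub)gradient descent, with the similarity scores $g_P(f(x_i))$ precomputed and held fixed; the optimizer choice, step sizes, and names are my assumptions, only the objective comes from the paper:

```python
import numpy as np

def fit_last_layer(S, y, own_mask, lam=1e-2, lr=0.05, steps=200):
    """Optimize w_h: softmax cross-entropy + L1 on off-class weights.

    S: (n, m) similarity scores; y: (n,) labels;
    own_mask: (K, m) boolean, True where p_j is in P_k.
    The objective is convex in w_h since everything else is fixed.
    """
    n, m = S.shape
    K = own_mask.shape[0]
    w = np.where(own_mask, 1.0, -0.5)     # start from the fixed weights
    Y = np.eye(K)[y]                      # one-hot labels, (n, K)
    for _ in range(steps):
        logits = S @ w.T                                   # (n, K)
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)                  # softmax probabilities
        grad = (p - Y).T @ S / n                           # cross-entropy gradient
        grad += lam * np.sign(w) * ~own_mask               # L1 subgradient, off-class only
        w -= lr * grad
    return w
```

The L1 term acts only on $w_h^{(k,j)}$ with $p_j\notin P_k$, sparsifying the negative cross-class connections so the final model relies mostly on positive evidence from each class's own prototypes.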