Few-Shot Classification with Feature Map Reconstruction Networks: Paper Notes & Code

Article link: https://blog.csdn.net/qq_37252519/article/details/121415774

Pipeline

[Figure: overview of the FRN pipeline for a 3-way 3-shot episode]
In an $N$-way $K$-shot episode (in the figure $N=K=3$: 3 classes with 3 images each), $X_s$ denotes the set of support images. Given a single query image $x_q$, we want to predict its label $y_q$.

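The gray trapezoid represents the convolutional feature extractor. Passing $x_q$ through it produces a feature map of size $r \times d$ (denoted $Q$), where $r = h \times w$ is the spatial resolution and $d$ is the number of channels.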

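For each of the $C$ classes, we pass the $k$ support images of class $c$ through the same convolutional feature extractor and pool the resulting feature maps into a matrix $S_c$ of size $kr \times d$ (there are $k$ feature maps, each of size $r \times d$).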
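The pooling step can be sketched in NumPy as follows. This is only an illustration, assuming the extractor returns an array of shape [k, d, h, w] for one class; the function name `pool_support_features` is made up for this note and is not taken from the FRN repository.

```python
import numpy as np

def pool_support_features(feats):
    """Reshape one class's support feature maps [k, d, h, w] into S_c of shape [k*h*w, d]."""
    k, d, h, w = feats.shape
    # Move channels last so that every spatial location becomes one d-dimensional row.
    return feats.transpose(0, 2, 3, 1).reshape(k * h * w, d)

# Toy example: k=3 support images, d=4 channels, 2x2 spatial grid (r=4).
feats = np.arange(3 * 4 * 2 * 2, dtype=float).reshape(3, 4, 2, 2)
S_c = pool_support_features(feats)
print(S_c.shape)  # (12, 4)
```

Each row of `S_c` is the channel vector at one spatial location of one support image, so the rows of image 0 come first, then those of image 1, and so on.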

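We want a matrix $W$ (of size $r \times kr$) such that $W S_c \approx Q$, i.e. each row of $Q$ is reconstructed as a weighted sum of the rows of $S_c$. Finding the optimal $\bar{W}$ amounts to solving a regularized linear least-squares problem:
$\bar{W}=\underset{W}{\arg \min }\left\|Q-W S_{c}\right\|^{2}+\lambda\|W\|^{2}$
The ridge regression formula gives the optimal $\bar{W}$ and the reconstruction $\bar{Q}_c$ in closed form:
$\begin{aligned} &\bar{W}=Q S_{c}^{T}\left(S_{c} S_{c}^{T}+\lambda I\right)^{-1} \\ &\bar{Q}_{c}=\bar{W} S_{c} \end{aligned}$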
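As a sanity check on the closed form, the NumPy sketch below computes $\bar{W}$ and confirms that a slightly perturbed $W$ scores worse on the ridge objective. The helper names (`ridge_weights`, `objective`), the random data and the value of `lam` are assumptions chosen for illustration; the official implementation is in PyTorch and is organized differently.

```python
import numpy as np

def ridge_weights(Q, S, lam):
    """Closed-form ridge solution W = Q S^T (S S^T + lam I)^{-1}, shape [r, kr]."""
    A = S @ S.T + lam * np.eye(S.shape[0])  # [kr, kr], symmetric positive definite for lam > 0
    # Solve A X = S Q^T instead of forming the inverse explicitly; then W = X^T.
    return np.linalg.solve(A, S @ Q.T).T

def objective(Q, S, W, lam):
    """||Q - W S||^2 + lam ||W||^2 with Frobenius norms."""
    return np.sum((Q - W @ S) ** 2) + lam * np.sum(W ** 2)

rng = np.random.default_rng(0)
r, k, d, lam = 4, 2, 16, 0.1
Q = rng.normal(size=(r, d))
S = rng.normal(size=(k * r, d))

W_bar = ridge_weights(Q, S, lam)
Q_bar = W_bar @ S  # reconstruction of Q from the support features

best = objective(Q, S, W_bar, lam)
perturbed = objective(Q, S, W_bar + 0.01 * rng.normal(size=W_bar.shape), lam)
print(best < perturbed)  # the closed-form solution should score lower
```

Because the objective is strictly convex for $\lambda > 0$, any nonzero perturbation of $\bar{W}$ increases it.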
For a given class $c$, the distance between $Q$ and $\bar{Q}_c$ is defined as the squared Euclidean distance, scaled by $\frac{1}{r}$. The distances to all $C$ classes are multiplied by $-\gamma$ and passed through a softmax:
$\begin{aligned} \left\langle Q, \bar{Q}_{c}\right\rangle &=\frac{1}{r}\left\|Q-\bar{Q}_{c}\right\|^{2} \\ P\left(y_{q}=c \mid x_{q}\right) &=\frac{e^{\left(-\gamma\left\langle Q, \bar{Q}_{c}\right\rangle\right)}}{\sum_{c^{\prime} \in C} e^{\left(-\gamma\left\langle Q, \bar{Q}_{c^{\prime}}\right\rangle\right)}} \end{aligned}$
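The distance-to-probability step is small enough to write out directly. The sketch below assumes the per-class reconstructions are already available; `class_probabilities` and the toy inputs are hypothetical, and `gamma=1.0` is an arbitrary value rather than a learned one.

```python
import numpy as np

def class_probabilities(Q, reconstructions, gamma):
    """Softmax over -gamma * (1/r) * ||Q - Q_bar_c||^2, one entry per class c."""
    r = Q.shape[0]
    dists = np.array([np.sum((Q - Q_bar) ** 2) / r for Q_bar in reconstructions])
    logits = -gamma * dists
    logits -= logits.max()  # subtracting the max leaves the softmax unchanged but avoids overflow
    exp = np.exp(logits)
    return exp / exp.sum()

Q = np.array([[1.0, 0.0], [0.0, 1.0]])      # r=2 locations, d=2 channels
recons = [Q.copy(), np.zeros_like(Q), -Q]   # class 0 reconstructs Q exactly
probs = class_probabilities(Q, recons, gamma=1.0)
print(probs.argmax())  # 0
```

Here the three distances are 0, 1 and 4, so the class with the exact reconstruction gets the largest probability.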

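The difficulty of the reconstruction problem $\bar{W}=\underset{W}{\arg \min }\left\|Q-W S_{c}\right\|^{2}+\lambda\|W\|^{2}$ varies with the shapes involved:

If $kr > d$, reconstruction is comparatively easy, because the $kr$ support features can span the whole $d$-dimensional feature space.
If $kr < d$, reconstruction is hard, so the formulation needs to be adjusted.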

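To keep training stable, $\lambda$ is therefore rescaled by $\frac{kr}{d}$, which has the added benefit of making the model somewhat robust to the number of shots.
In addition, $\lambda$ should be a learned parameter. Changing $\lambda$ has several effects: a large $\lambda$ discourages over-reliance on particular weights in $W$, but it also shrinks the norm of the reconstruction, which increases the reconstruction error and limits discriminative power.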
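The effect of $\lambda$ on the reconstruction can be observed numerically. The sketch below uses random features and hand-picked $\lambda$ values (both are assumptions for illustration, with `reconstruct` a hypothetical helper) and records the norm of the reconstruction and the reconstruction error for each value.

```python
import numpy as np

def reconstruct(Q, S, lam):
    """Ridge reconstruction Q S^T (S S^T + lam I)^{-1} S."""
    A = S @ S.T + lam * np.eye(S.shape[0])
    return np.linalg.solve(A, S @ Q.T).T @ S

rng = np.random.default_rng(0)
r, k, d = 4, 2, 16
Q = rng.normal(size=(r, d))
S = rng.normal(size=(k * r, d))

lams = [0.01, 1.0, 100.0]
norms = [np.linalg.norm(reconstruct(Q, S, lam)) for lam in lams]
errors = [np.linalg.norm(Q - reconstruct(Q, S, lam)) for lam in lams]
for lam, n, e in zip(lams, norms, errors):
    print(f"lambda={lam:g}  ||Q_bar||={n:.3f}  ||Q - Q_bar||={e:.3f}")
```

As $\lambda$ grows, the reconstruction norm falls and the error rises, so the magnitude of the reconstruction is tied to the strength of the regularizer.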

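"We therefore disentangle the degree of regularization from the magnitude of Qc by introducing a learned recalibration term ρ:"
In other words, a learned recalibration term $\rho$ is introduced so that the degree of regularization is decoupled from the magnitude of $\bar{Q}_c$: $\lambda$ sets how strongly $W$ is penalized, and $\rho$ rescales the reconstruction afterwards.
~~I did not understand this sentence /(ㄒoㄒ)/~~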

This gives the formula $\bar{Q}_{c}=\rho \bar{W} S_{c}$

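$\lambda$ and $\rho$ are parameterized through $e^{\alpha}$ and $e^{\beta}$ to guarantee non-negativity, and $\alpha$ and $\beta$ are initialized to zero.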

In summary, the final prediction is given by:
$\begin{gathered} \lambda=\frac{k r}{d} e^{\alpha} \quad \rho=e^{\beta} \\ \bar{Q}_{c}=\rho \bar{W} S_{c}=\rho Q S_{c}^{T}\left(S_{c} S_{c}^{T}+\lambda I\right)^{-1} S_{c} \\ P\left(y_{q}=c \mid x_{q}\right)=\frac{e^{\left(-\gamma\left\langle Q, \bar{Q}_{c}\right\rangle\right)}}{\sum_{c^{\prime} \in C} e^{\left(-\gamma\left\langle Q, \bar{Q}_{c^{\prime}}\right\rangle\right)}} \end{gathered}$
The algorithm introduces only 3 learnable parameters: $\alpha,\beta,\gamma$
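Putting the three equations together, a single-query prediction could look like the NumPy sketch below. It is a simplified illustration, not the repository's PyTorch code: `frn_probabilities` is a hypothetical name, the features are random, and `gamma=1.0` is an arbitrary stand-in for the learned value.

```python
import numpy as np

def frn_probabilities(Q, supports, alpha=0.0, beta=0.0, gamma=1.0):
    """Class probabilities for one query feature map Q of shape [r, d].

    supports: list of per-class support matrices S_c, each of shape [k*r, d].
    """
    r, d = Q.shape
    dists = []
    for S in supports:
        kr = S.shape[0]
        lam = kr / d * np.exp(alpha)   # lambda = (kr / d) * e^alpha
        rho = np.exp(beta)             # rho = e^beta
        A = S @ S.T + lam * np.eye(kr)
        Q_bar = rho * np.linalg.solve(A, S @ Q.T).T @ S
        dists.append(np.sum((Q - Q_bar) ** 2) / r)
    logits = -gamma * np.array(dists)
    logits -= logits.max()             # numerical stability only
    exp = np.exp(logits)
    return exp / exp.sum()

rng = np.random.default_rng(0)
r, k, d, n_classes = 4, 3, 32, 3
supports = [rng.normal(size=(k * r, d)) for _ in range(n_classes)]
# Query features: a noisy copy of the first support image of class 1.
Q = supports[1][:r] + 0.05 * rng.normal(size=(r, d))
probs = frn_probabilities(Q, supports)
print(probs.argmax())  # 1
```

With `alpha = beta = 0` this reduces to $\lambda = kr/d$ and $\rho = 1$, which is the state of the model at initialization.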

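The formula $\bar{Q}_{c}=\rho \bar{W} S_{c}=\rho Q S_{c}^{T}\left(S_{c} S_{c}^{T}+\lambda I\right)^{-1} S_{c}$

is cheap to compute when $kr \lt d$: the most expensive step is inverting a $k r \times k r$ matrix, which does not depend on $d$, and evaluating the matrix product from left to right avoids storing a potentially large $d \times d$ matrix in memory.
However, when the feature maps are large or $d \lt kr$, this formula becomes expensive. It can then be rewritten as below, where the most expensive step is inverting a $d \times d$ matrix, and evaluating the product from right to left avoids holding a large $k r \times k r$ or $b r \times k r$ matrix in memory.
$\bar{Q}_{c}=\rho \bar{W} S_{c}=\rho Q\left(S_{c}^{T} S_{c}+\lambda I\right)^{-1} S_{c}^{T} S_{c}$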
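The two forms agree because $\left(S_{c}^{T} S_{c}+\lambda I\right) S_{c}^{T}=S_{c}^{T}\left(S_{c} S_{c}^{T}+\lambda I\right)$; multiplying by both inverses gives $S_{c}^{T}\left(S_{c} S_{c}^{T}+\lambda I\right)^{-1}=\left(S_{c}^{T} S_{c}+\lambda I\right)^{-1} S_{c}^{T}$. The NumPy sketch below checks this numerically on random data; the function names and sizes are assumptions made for the example.

```python
import numpy as np

def reconstruct_kr(Q, S, lam, rho=1.0):
    """rho * Q S^T (S S^T + lam I)^{-1} S, evaluated left to right (inverts a kr x kr matrix)."""
    inv = np.linalg.inv(S @ S.T + lam * np.eye(S.shape[0]))
    return rho * (((Q @ S.T) @ inv) @ S)

def reconstruct_d(Q, S, lam, rho=1.0):
    """rho * Q (S^T S + lam I)^{-1} S^T S, evaluated right to left (inverts a d x d matrix)."""
    gram = S.T @ S  # [d, d]
    inv = np.linalg.inv(gram + lam * np.eye(S.shape[1]))
    return rho * (Q @ (inv @ gram))

rng = np.random.default_rng(0)
r, k, d, lam = 4, 5, 8, 0.5
Q = rng.normal(size=(r, d))
S = rng.normal(size=(k * r, d))  # kr = 20 > d = 8

same = np.allclose(reconstruct_kr(Q, S, lam), reconstruct_d(Q, S, lam))
print(same)  # True
```

In practice one would pick whichever variant inverts the smaller matrix, i.e. the first when $kr < d$ and the second when $d < kr$.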

Code

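The red boxes mark the parameters that the model learns.
[Figure: screenshot of the code with the learnable parameters boxed in red]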

Some questions

The authors answered my questions by email.

I could not find the implementation of the Auxiliary Loss in the code

The author's reply:

The code for the auxiliary loss can be found in trainers/frn_train.py, lines 9-26.

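The paper and the code obtain $S_c$ in different ways
In the paper, $S_c$ is clearly obtained by reshaping the feature maps, yet in the code it has become a learnable parameter, which is odd.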

The paper and code get S_c in different ways.
In the code, S_c is a learnable parameter in the FRN model.
In Figure 2 of the paper, the support images (N-way K-shot) are passed through the convolutional feature extractor to get a feature map [k,d,h,w], which is then reshaped into [khw,d].

The author's reply:

The source of S_c depends on whether you are training or pretraining the FRN model.
When pretraining, S_c is indeed a learned layer at the top of the network. This is only used for mini-ImageNet experiments in the pretraining stage.
Otherwise, S_c corresponds to the reshaped pool of support features (using get_neg_l2_distance() instead of forward_pretrain() in models/FRN.py).

I could not find the implementation of the Reconstruction Visualization in the code

The author's reply:

We hadn’t planned on publishing the visualization code but I’m happy to share what we do have.
Unfortunately we had to do a fair amount of adaptation from our working code to the github version, so what I’m sending is mildly sanitized working code and not nicely integrated.
It’s a jupyter notebook - you will have to set some values and imports manually, but after that it should run straightforwardly (load a feature extractor, train the image decoder, evaluate reconstruction error, generate figures).

Download: FRN_CUB_image_recon_sharing.ipynb