CVPR 2022 | Cross-Image Relational Knowledge Distillation for Semantic Segmentation


有为少年


Article link: https://blog.csdn.net/P_LarT/article/details/125671757


Paper: https://arxiv.org/abs/2204.06986
Code: https://github.com/winycg/CIRKD
Explainer: https://mp.weixin.qq.com/s/MsvRpR_r2X-BtcXfEFIm7A
Original document: https://www.yuque.com/lart/gw5mta/bbnaym

Current Knowledge Distillation (KD) methods for semantic segmentation often guide the student to mimic the teacher’s structured information generated from individual data samples. However, they ignore the global semantic relations among pixels across various images that are valuable for KD. This paper proposes a novel Cross-Image Relational KD (CIRKD), which focuses on transferring structured pixel-to-pixel and pixel-to-region relations among the whole images. The motivation is that a good teacher network could construct a well-structured feature space in terms of global pixel dependencies. CIRKD makes the student mimic better structured semantic relations from the teacher, thus improving the segmentation performance.

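In short: existing KD methods for semantic segmentation have the student mimic structured information that the teacher derives from individual samples, and so miss the global semantic relations among pixels across images. CIRKD instead transfers pixel-to-pixel and pixel-to-region relations across whole images, on the premise that a good teacher builds a well-structured feature space in terms of global pixel dependencies.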

The complete algorithm is as follows:

The complete loss is as follows:

Cross-image pairwise pixel similarity distillation

Mini-batch-based Pixel-to-Pixel Distillation

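Here S_ij denotes the pairwise pixel similarity matrix between images i and j in the batch. Its shape is A×A, where A = H×W. The features F_i and F_j (each A×d) used to compute S are l2-normalized. When the distributions are aligned with the KL divergence, each row a of S is first normalized by a softmax with temperature τ.

The pairwise pixel-to-pixel relation loss is then averaged over all image pairs in the batch.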
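The description above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the official implementation (the linked repository uses PyTorch). The function names, the inclusion of the i = j pairs in the average, and the KL direction (teacher distribution as the target) are assumptions made for the example. Because S_ij is always A×A, the teacher and student embedding dimensions do not need to match here.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each embedding (last axis) to unit l2 norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def rowwise_kl(sim_t, sim_s, tau):
    """Mean over rows a of KL(softmax(sim_t[a] / tau) || softmax(sim_s[a] / tau)).

    Assumption: the teacher distribution is the target of the KL term.
    """
    log_p_t = log_softmax(sim_t / tau)
    log_p_s = log_softmax(sim_s / tau)
    kl_per_row = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl_per_row.mean())


def batch_p2p_loss(feats_t, feats_s, tau=1.0):
    """feats_t: (N, A, d_t), feats_s: (N, A, d_s), with A = H * W pixels per image."""
    feats_t = l2_normalize(feats_t)
    feats_s = l2_normalize(feats_s)
    n = feats_t.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            s_t = feats_t[i] @ feats_t[j].T  # teacher S_ij, shape (A, A)
            s_s = feats_s[i] @ feats_s[j].T  # student S_ij, shape (A, A)
            total += rowwise_kl(s_t, s_s, tau)
    return total / (n * n)  # average over all image pairs in the batch
```

The loss is zero when the student reproduces the teacher's similarity structure exactly, and positive otherwise.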

Memory-based Pixel-to-Pixel Distillation

In semantic segmentation the batch on each GPU is usually very small, typically 1 or 2, so the loss above alone is not enough to model global cross-image dependencies. Inspired by memory-based contrastive learning, the paper adds an online memory bank: "To address this problem, we introduce an online pixel queue that can store massive pixel embeddings in the memory bank generated from the past mini-batches." Knowledge from past batches can then be retrieved from this online memory bank.

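Most pixels within the same object region of an image are homogeneous, so storing every pixel embedding would likely teach redundant relations and slow down distillation. Keeping only the last few batches could also reduce the diversity of the stored pixel embeddings. The paper therefore uses a selective storage strategy: for each image in a batch, only a small number of pixel embeddings are randomly sampled per class and pushed into the queue. Following SEED: Self-supervised Distillation for Visual Representation, the teacher and the student share the pixel queue.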

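After each distillation iteration, V embeddings per class are sampled from the pixel embeddings produced by the teacher and pushed into the queue.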
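The selective storage strategy can be sketched as a per-class ring buffer. This is a minimal sketch under stated assumptions, not the official code: the class name `PixelQueue`, the first-in-first-out overwrite policy, the per-image call signature, and the handling of ignore labels (any label outside the valid class range, such as 255, is skipped) are all hypothetical choices made for illustration.

```python
import numpy as np


class PixelQueue:
    """Hypothetical per-class ring buffer of pixel embeddings, shape (C, N_p, d)."""

    def __init__(self, num_classes, size_per_class, dim):
        self.bank = np.zeros((num_classes, size_per_class, dim), dtype=np.float32)
        self.ptr = np.zeros(num_classes, dtype=np.int64)    # next write slot per class
        self.count = np.zeros(num_classes, dtype=np.int64)  # valid slots per class

    def enqueue(self, feats, labels, v, rng):
        """Push up to v randomly sampled teacher embeddings per class.

        feats:  (A, d) teacher pixel embeddings of one image
        labels: (A,) class id of each pixel
        """
        num_classes, size, _ = self.bank.shape
        for c in np.unique(labels):
            if c < 0 or c >= num_classes:
                continue  # assumption: out-of-range ids are ignore labels
            idx = np.flatnonzero(labels == c)
            picked = rng.choice(idx, size=min(v, idx.size), replace=False)
            for f in feats[picked]:
                self.bank[c, self.ptr[c]] = f  # overwrite the oldest entry
                self.ptr[c] = (self.ptr[c] + 1) % size
                self.count[c] = min(self.count[c] + 1, size)
```

Sampling only a few pixels per class keeps the bank class-balanced and avoids filling it with near-identical embeddings from large homogeneous regions.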

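The queue Q_p has shape C×N_p×d, where C is the number of classes, N_p is the number of pixel embeddings kept per class, and d is the embedding dimension. In effect, a separate embedding queue is maintained for each class. Each time the similarity is computed, the teacher's and the student's own pixel embeddings F_n, each of size A×d, are compared against a set V_p of K_p contrastive embeddings obtained by class-balanced sampling from the pixel queue Q_p, which yields an A×K_p similarity matrix for each network.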
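A sketch of this memory-based comparison follows. It is an illustration under assumptions rather than the paper's exact formulation: the function names are hypothetical, the student embeddings are assumed to be already projected to the same dimension d as the queue entries, an equal number of embeddings is drawn from each non-empty class, and the KL direction (teacher as target) is assumed.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each embedding (last axis) to unit l2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(x):
    """Numerically stable log-softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))


def sample_contrast_set(bank, count, k_per_class, rng):
    """Class-balanced sampling from the queue.

    bank:  (C, N_p, d) stored embeddings
    count: (C,) number of valid entries per class
    Returns V_p with shape (K_p, d); empty classes contribute nothing.
    """
    picked = []
    for c in range(bank.shape[0]):
        valid = int(count[c])
        if valid == 0:
            continue
        idx = rng.choice(valid, size=min(k_per_class, valid), replace=False)
        picked.append(bank[c, idx])
    if not picked:
        return np.zeros((0, bank.shape[2]), dtype=bank.dtype)
    return np.concatenate(picked, axis=0)


def memory_p2p_loss(feat_t, feat_s, contrast, tau=1.0):
    """feat_t, feat_s: (A, d) embeddings of one image; contrast: (K_p, d), K_p > 0."""
    feat_t, feat_s, contrast = (l2_normalize(x) for x in (feat_t, feat_s, contrast))
    log_p_t = log_softmax(feat_t @ contrast.T / tau)  # teacher rows over K_p entries
    log_p_s = log_softmax(feat_s @ contrast.T / tau)  # student rows over K_p entries
    kl_per_row = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl_per_row.mean())  # average over the A pixels
```

Both networks are compared against the same sampled set, which reflects the shared queue described above.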