[CVPR 2025]OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

Paper: [2502.20087] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

Code: GitHub - LMMMEng/OverLoCK: [CVPR 2025 Oral] OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels

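The English below is typed by hand! It is a summarizing and paraphrasing of the original paper. Some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome. This post leans towards personal notes, so read with caution.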

Contents

1. Takeaways

2. Section-by-section reading of the paper

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Methodology

2.4.1. Deep-stage Decomposition

2.4.2. Dynamic Convolution with Context-Mixing

2.4.3. Network Architecture

2.5. Experiments

2.5.1. Image Classification

2.5.2. Object Detection and Instance Segmentation

2.5.3. Semantic Segmentation

2.5.4. Ablation Studies

2.6. Conclusion

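1. Takeaways

(1) Hoping to catch some of that Oral luck

(2) A very standard ConvNet paper in how it is written; the model can be dropped in and run directly

2. Section-by-section reading of the paper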

2.1. Abstract

        ①Challenge: the feature pyramid (progressive downsampling) design does not realize a top-down attention mechanism

2.2. Introduction

        ①Key property of the top-down attention mechanism: guidance from a feedback signal

        ②Effective Receptive Fields (ERF) at stages 3 and 4

Other models fail to localize the object at stage 3 because of their dependence on the classification loss

        ③Performance chart of OverLoCK and other compared models:

biomimetic  adj. bionic; relating to technology that imitates biological systems

2.3. Related Work

        ①Covers classic ConvNets, dynamic convolutions, and biomimetic models

2.4. Methodology

2.4.1. Deep-stage Decomposition

        ①The overview of OverLoCK:

where the red lines are applied only in the pre-training stage

        ②Structures of each block:

where the inputs are the feature map \mathbf{Z}_{i}\in\mathbb{R}^{C_{z}\times H\times W} and the context prior \mathbf{P}_{i}\in\mathbb{R}^{C_{p}\times H\times W}, and the outputs are the updated feature map \mathbf{Z}_{i+1}\in\mathbb{R}^{C_z\times H\times W} and the updated context prior \mathbf{P}_{i}^{\prime}\in\mathbb{R}^{C_p\times H\times W}. The initial context prior \mathbf{P}_{o} is added back to prevent the context prior from being diluted:

\mathbf{P}_{i+1}=\alpha\mathbf{P}_{i}^{\prime}+\beta\mathbf{P}_{o}

where \alpha and \beta are learnable scalars
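The prior update above is just a learnable weighted sum. A minimal NumPy sketch, assuming plain arrays stand in for the feature tensors; the function name `update_context_prior` and the starting values of `alpha` and `beta` are hypothetical and not taken from the paper or its repository:

```python
import numpy as np

def update_context_prior(p_prime, p_init, alpha, beta):
    """Blend the updated prior P'_i with the initial prior P_o.

    Implements P_{i+1} = alpha * P'_i + beta * P_o elementwise.
    """
    return alpha * p_prime + beta * p_init

rng = np.random.default_rng(0)
C_p, H, W = 4, 8, 8
p_init = rng.standard_normal((C_p, H, W))   # P_o, the initial context prior
p_prime = rng.standard_normal((C_p, H, W))  # P'_i, output of the current block

# Hypothetical starting values; in the model these are learnable scalars.
alpha, beta = 1.0, 1.0
p_next = update_context_prior(p_prime, p_init, alpha, beta)
print(p_next.shape)  # (4, 8, 8)
```

With beta set to 0 the initial prior is never re-injected, which is the dilution case the extra term is meant to prevent.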

2.4.2. Dynamic Convolution with Context-Mixing

        ①The pipeline of ContMix:

where \mathbf{Q}=\mathrm{Re}(\mathbf{W}_q\mathbf{X})\in\mathbb{R}^{C\times HW}, \mathbf{K}=\mathrm{Re}(\mathbf{W}_{k}\mathrm{Pool}(\mathbf{X}))\in\mathbb{R}^{C\times S^2}, and \mathrm{Re} denotes the reshape operator
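To make the shapes concrete, here is a rough NumPy sketch of the two projections. It assumes \mathbf{W}_q and \mathbf{W}_k act as 1x1 convolutions (so they reduce to C×C matrices on the flattened map) and that Pool is an adaptive average pooling to S×S; the helper names `adaptive_avg_pool` and `project_qk` are made up for illustration:

```python
import numpy as np

def adaptive_avg_pool(x, s):
    """Average-pool a (C, H, W) array down to (C, s, s).

    Assumes H and W are divisible by s (a simplification for this sketch).
    """
    c, h, w = x.shape
    return x.reshape(c, s, h // s, s, w // s).mean(axis=(2, 4))

def project_qk(x, w_q, w_k, s):
    """Return Q of shape (C, HW) and K of shape (C, S^2)."""
    c, h, w = x.shape
    q = w_q @ x.reshape(c, h * w)                        # Re(W_q X)
    k = w_k @ adaptive_avg_pool(x, s).reshape(c, s * s)  # Re(W_k Pool(X))
    return q, k

rng = np.random.default_rng(0)
C, H, W, S = 8, 14, 14, 7
x = rng.standard_normal((C, H, W))
w_q = rng.standard_normal((C, C))  # hypothetical 1x1-conv weights
w_k = rng.standard_normal((C, C))
q, k = project_qk(x, w_q, w_k, S)
print(q.shape, k.shape)  # (8, 196) (8, 49)
```

Because \mathbf{K} is built from the pooled map, it has only S^2 columns, so every spatial position in \mathbf{Q} is later compared against a fixed, small set of region descriptors.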

        ②Evenly divide the channels of \mathbf{Q} and \mathbf{K} into G groups, obtaining \{\mathbf{Q^{g}}\}_{g=1}^{G} and \{\mathbf{K^{g}}\}_{g=1}^{G}, where \mathbf{Q^{g}}\in\mathbb{R}^{\frac{C}{G}\times HW} and \mathbf{K^{g}}\in\mathbb{R}^{\frac{C}{G}\times S^{2}}. The affinity matrices are then computed as:

\{\mathbf{A^{g}}\}_{g=1}^{G}=\{{\mathbf{Q^{g}}}^{\top}\mathbf{K^{g}}\}_{g=1}^{G}

where \mathbf{A^{g}}\in\mathbb{R}^{HW\times S^{2}}

        ③Define a learnable linear kernel \mathbf{W}_d\in\mathbb{R}^{S^2\times K^2} (here K is the dynamic kernel size, not the matrix \mathbf{K}), and compute:

\mathbf{D}^\mathbf{g}=\mathrm{softmax}(\mathbf{A}^\mathbf{g}\mathbf{W}_d)\in\mathbb{R}^{HW\times K^2}
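Steps ② and ③ can be sketched together in NumPy. This is only an illustration under stated assumptions: the softmax is taken over the last (K^2) axis, \mathbf{W}_d is shared across groups, and the function name `dynamic_kernels` is hypothetical rather than the repository's API:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_kernels(q, k, w_d, groups):
    """q: (C, HW), k: (C, S^2), w_d: (S^2, K^2) -> D of shape (G, HW, K^2)."""
    c, hw = q.shape
    qg = q.reshape(groups, c // groups, hw)  # {Q^g}: (G, C/G, HW)
    kg = k.reshape(groups, c // groups, -1)  # {K^g}: (G, C/G, S^2)
    # A^g = (Q^g)^T K^g for every group at once: (G, HW, S^2)
    a = np.einsum("gch,gcs->ghs", qg, kg)
    # D^g = softmax(A^g W_d); softmax over K^2 is an assumption of this sketch
    return softmax(a @ w_d, axis=-1)

rng = np.random.default_rng(0)
C, HW, S, K, G = 8, 16, 3, 5, 4
q = rng.standard_normal((C, HW))
k = rng.standard_normal((C, S * S))
w_d = rng.standard_normal((S * S, K * K))
d = dynamic_kernels(q, k, w_d, G)
print(d.shape)  # (4, 16, 25)
```

Each row of \mathbf{D}^\mathbf{g} can be read as one K×K kernel for one spatial position, so every position in every group gets its own kernel weights.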

2.4.3. Network Architecture

        ①Variants of OverLoCK: Extreme-Tiny (XT), Tiny (T), Small (S), and Base (B), which differ in channel widths, number of blocks, kernel sizes, and number of groups

2.5. Experiments

2.5.1. Image Classification

        ①Dataset: ImageNet-1k

        ②Optimizer: AdamW

        ③Stochastic depth rate: 0.1, 0.15, 0.4, and 0.5 for the OverLoCK-XT, -T, -S, and -B models, respectively
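For reference, stochastic depth randomly skips a residual branch per sample during training. A minimal NumPy sketch, assuming the common "drop path" formulation with 1/(1-rate) rescaling. The notes above only give the rates, so the function `drop_path`, the dictionary name, and the choice of a single constant rate (rather than a schedule across blocks) are all assumptions:

```python
import numpy as np

# Rates listed in the notes above, keyed by model variant.
DROP_PATH_RATE = {"XT": 0.1, "T": 0.15, "S": 0.4, "B": 0.5}

def drop_path(branch, rate, training, rng):
    """Randomly zero the whole residual branch for some samples.

    branch: array of shape (N, ...), the output of a residual branch.
    Kept samples are scaled by 1 / (1 - rate) so the expectation is unchanged.
    """
    if not training or rate == 0.0:
        return branch
    keep = 1.0 - rate
    # One Bernoulli draw per sample, broadcast over the remaining axes.
    mask_shape = (branch.shape[0],) + (1,) * (branch.ndim - 1)
    mask = rng.random(mask_shape) < keep
    return branch * mask / keep

rng = np.random.default_rng(0)
branch = np.ones((16, 3, 2, 2))
out_train = drop_path(branch, DROP_PATH_RATE["B"], training=True, rng=rng)
out_eval = drop_path(branch, DROP_PATH_RATE["B"], training=False, rng=rng)
print(out_train.shape, out_eval.shape)
```

At inference time the branch passes through unchanged, so the rate only affects training.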

        ④Image classification performance:

where #F and #P denote the FLOPs and the number of parameters of a model, respectively. #T refers to the model type, where "C", "T", "M", and "H" refer to ConvNet, Transformer, Mamba, and hybrid models, respectively

2.5.2. Object Detection and Instance Segmentation

        ①Dataset: COCO 2017

        ②Frameworks: both Mask R-CNN and Cascade Mask R-CNN

        ③The backbone is pre-trained on ImageNet-1K and then fine-tuned on COCO

        ④Performance of object detection on Mask R-CNN framework:

        ⑤Performance of object detection on Cascade Mask R-CNN framework:

2.5.3. Semantic Segmentation

        ①Dataset: ADE20K

        ②Framework: UperNet

        ③The backbone is pre-trained on ImageNet-1K and then fine-tuned on ADE20K

        ④Semantic segmentation performance on ADE20K:

2.5.4. Ablation Studies

        ①Module ablation:

        ②Module comparison:

2.6. Conclusion

        ~
