[TGRS 2023] SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery

Paper URL: SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery | IEEE Journals & Magazine | IEEE Xplore

Code: https://github.com/icey-zhang/SuperYOLO

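The English here is typed entirely by hand! It is a summary and paraphrase of the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution.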

Contents

1. Impressions

2. Section-by-Section Close Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Object Detection With Multimodal Data

2.3.2. Super Resolution in Object Detection

2.4. Baseline Architecture

2.5. SuperYOLO Architecture

2.5.1. Focus Removal

2.5.2. Multimodal Fusion

2.5.3. Super Resolution

2.5.4. Loss Function

2.6. Experimental Results

2.6.1. Dataset

2.6.2. Implementation Details

2.6.3. Accuracy Metrics

2.6.4. Ablation Study

2.6.5. Comparisons With Previous Methods

2.6.6. Generalization to Single Modal Remote Sensing Images

2.7. Conclusion and Future Work

3. Reference


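1. Impressions

(1) After the last TPAMI paper fried my brain, I'm back on an easy, pleasant channel. People really shouldn't push themselves too hard. My brain was physically overloaded and overheating

(2) Still so many titles. I'd love a title too: Sherlily, Member, Joker

(3) In every OD paper I read, the experiments start with the ablation study, hhh

2. Section-by-Section Close Reading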

2.1. Abstract

        ①Challenges in small object detection: accuracy and timeliness

        ②Existing problem: heavy computing costs

2.2. Introduction

        ①Detection task under different modalities:

2.3. Related Work

2.3.1. Object Detection With Multimodal Data

        ①Lists possible modalities: RGB, synthetic aperture radar (SAR), Light Detection and Ranging (LiDAR), IR, panchromatic (PAN), and multispectral (MS)

        ②Fusion strategy: among pixel-level, feature-level, and decision-level fusion methods, they choose pixel-level fusion to reduce computational cost

aperture  n. an opening, hole, or gap; (especially of a camera) the lens opening or its diameter

2.3.2. Super Resolution in Object Detection

        ①Lists other data augmentation methods and points out their assisted SR module

2.4. Baseline Architecture

        ①Backbone of YOLOv5 aims to extract low-level texture and high-level semantic features

        ②Overall framework of SuperYOLO:

where the Focus operation is removed: it was introduced in YOLOv5 to cut computation, but it halves the input resolution, which hurts small objects

        ③Backbone of YOLOv5:

where deep convs cause a sharp reduction in feature map size and loss of small object information
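To make the Focus operation mentioned in ② concrete, here is a minimal sketch of space-to-depth slicing in the style of YOLOv5's Focus layer. It is an illustration only, not the authors' code: the function name `focus_slice` and the nested-list representation are assumptions made for this note, and the convolution that normally follows the slicing is left out. The point is that no pixel is discarded, but everything downstream works on a grid with half the height and width.

```python
def focus_slice(image):
    """Space-to-depth slicing in the style of YOLOv5's Focus layer.

    image: nested list of shape [C][H][W] with even H and W.
    Returns a nested list of shape [4*C][H/2][W/2].
    """
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    assert height % 2 == 0 and width % 2 == 0, "H and W must be even"
    out = []
    # Four interleaved sub-grids, given as (row offset, column offset).
    for dy, dx in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for c in range(channels):
            out.append([[image[c][y][x] for x in range(dx, width, 2)]
                        for y in range(dy, height, 2)])
    return out


# Toy 1-channel 4x4 "image" holding the values 0..15 (hypothetical data).
img = [[[4 * y + x for x in range(4)] for y in range(4)]]
sliced = focus_slice(img)
shape = (len(sliced), len(sliced[0]), len(sliced[0][0]))
print(shape)      # (4, 2, 2): four times the channels, half the resolution
print(sliced[0])  # [[0, 2], [8, 10]]
```

Removing this step keeps the first convolution on the full-resolution grid, at the price of more computation per layer.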

2.5. SuperYOLO Architecture

2.5.1. Focus Removal

        ①Multimodal fusion (MF) module:

2.5.2. Multimodal Fusion

        ①Both X_{\mathrm{RGB}},X_{\mathrm{IR}}\in\mathbb{R}^{C\times H\times W} are downsampled to I_{\mathrm{RGB}},I_{\mathrm{IR}}\in\mathbb{R}^{C\times(H/n)\times(W/n)} by SE and are further combined into I\in\mathbb{R}^{C\times(H/n)\times(W/n)} by the whole MF:

I=D(X)

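        ②Once again I can't help wondering what the formulas in CNN papers are really for; the figures are far more intuitive. So I am going to skip the MF formulas (perhaps they are there for rigor, but y=conv(x) reads like a children's book no matter how you look at it)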
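For readers who do want something executable for the shapes in ①, here is a rough sketch of downsampling two modalities by a factor n and fusing them at pixel level. Everything in it is an assumption made for illustration: average pooling stands in for D, fusion is plain channel stacking, and the SE blocks and convolutions of the real MF module are omitted, so the output channel count is simply the sum of the two inputs.

```python
def downsample(image, n):
    """Average-pool a [C][H][W] nested list by a factor n (stand-in for D)."""
    height, width = len(image[0]), len(image[0][0])
    assert height % n == 0 and width % n == 0, "H and W must be divisible by n"
    out = []
    for channel in image:
        rows = []
        for y in range(0, height, n):
            row = []
            for x in range(0, width, n):
                # Mean over one n x n block of pixels.
                block = [channel[y + i][x + j]
                         for i in range(n) for j in range(n)]
                row.append(sum(block) / (n * n))
            rows.append(row)
        out.append(rows)
    return out


def fuse_pixel_level(rgb, ir, n):
    """Downsample both modalities, then stack them along the channel axis."""
    return downsample(rgb, n) + downsample(ir, n)


# Hypothetical inputs: a 3x4x4 "RGB" image and a 1x4x4 "IR" image.
rgb = [[[float(c) for _ in range(4)] for _ in range(4)] for c in range(3)]
ir = [[[10.0 for _ in range(4)] for _ in range(4)]]
fused = fuse_pixel_level(rgb, ir, n=2)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 2 2
```

Fusing this early means only one backbone has to run on the combined tensor, which is where the cost saving of pixel-level fusion comes from.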

2.5.3. Super Resolution

        ①SR module:

        ②Backbone feature of YOLOv5s, YOLOv5x, and SuperYOLO:

where the three feature maps are the first-layer features, the low-level features, and the high-level features

2.5.4. Loss Function

        ①Total loss:

L_\mathrm{total}=c_1L_o+c_2L_s

where L_o denotes the detection loss, L_s denotes the SR construction loss, and c_1 and c_2 are their weights

        ②SR construction loss: L1 loss:

L_s=\left\|S-X\right\|_1

        ③Detection loss:

L_o=\lambda_\mathrm{loc}\sum_{l=0}^2a_lL_\mathrm{loc}+\lambda_\mathrm{obj}\sum_{l=0}^2b_lL_\mathrm{obj}+\lambda_\mathrm{cls}\sum_{l=0}^2c_lL_\mathrm{cls}
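The three formulas above can be put together in a few lines. This is a sketch under my own reading of the notation, not the repository's implementation: a_l, b_l, c_l are taken as per-scale weights over the three detection scales l = 0, 1, 2, the \lambda terms as per-component weights, and L_s as a plain sum of absolute differences (real implementations often average instead). All numbers are made up for the example.

```python
def sr_loss(s, x):
    """L_s = ||S - X||_1 for two equally long flat lists of pixel values."""
    assert len(s) == len(x), "S and X must have the same number of pixels"
    return sum(abs(a - b) for a, b in zip(s, x))


def detection_loss(loc, obj, cls, a, b, c, lam_loc, lam_obj, lam_cls):
    """L_o: loc/obj/cls hold one loss value per scale l = 0, 1, 2."""
    return (lam_loc * sum(w * v for w, v in zip(a, loc))
            + lam_obj * sum(w * v for w, v in zip(b, obj))
            + lam_cls * sum(w * v for w, v in zip(c, cls)))


def total_loss(l_o, l_s, c1, c2):
    """L_total = c1 * L_o + c2 * L_s."""
    return c1 * l_o + c2 * l_s


# Hypothetical per-scale losses and weights.
ones = [1.0, 1.0, 1.0]
l_o = detection_loss(loc=[0.5, 0.25, 0.25], obj=[1.0, 0.5, 0.5],
                     cls=[0.25, 0.25, 0.5], a=ones, b=ones, c=ones,
                     lam_loc=0.5, lam_obj=1.0, lam_cls=0.5)
l_s = sr_loss([0.5, 1.0, 0.0], [1.0, 1.0, 0.5])
print(l_o, l_s, total_loss(l_o, l_s, c1=1.0, c2=0.5))  # 3.0 1.0 3.5
```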

2.6. Experimental Results

2.6.1. Dataset

        ①Dataset: Vehicle Detection in Aerial Imagery (VEDAI)

        ②Image size: 1024*1024 or 512*512 pixels

        ③Original image from Utah Automated Geographic Reference Center (AGRC): 16000*16000 pixels with 12.5cm * 12.5cm per pixel

        ④Modality: RGB and IR

        ⑤Sample: 1246

        ⑥Classes: 11 vehicle categories

2.6.2. Implementation Details

        ①Cross validation: 10-fold for the comparison with other methods, and only the first fold for the ablation study

        ②Data split: 1089 images for training, 121 for testing

        ③Categories: N=8, excluding classes with fewer than 50 instances

        ④Optimizer: stochastic gradient descent (SGD)

        ⑤Momentum: 0.937

        ⑥Weight decay: 0.0005

        ⑦Batch size: 2

        ⑧Learning rate: 0.01

        ⑨Epoch: 300
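The setup above can be sketched with two small helpers. Both are illustrations under stated assumptions rather than the authors' training code: `kfold_splits` assumes that 1210 images (1089 + 121, as ② implies) take part in the split, and `sgd_step` uses the plain momentum update with L2 weight decay in the ordering PyTorch's SGD uses, with the hyperparameters from ⑤, ⑥, and ⑧ as defaults. The function names and the shuffling seed are hypothetical.

```python
import random


def kfold_splits(num_images, k, seed=0):
    """Return k (train, test) index lists; leftover images stay in train."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)
    fold_size = num_images // k
    splits = []
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        splits.append((train, test))
    return splits


def sgd_step(param, grad, velocity,
             lr=0.01, momentum=0.937, weight_decay=0.0005):
    """One scalar SGD update with momentum and L2 weight decay."""
    grad = grad + weight_decay * param        # weight decay folded into grad
    velocity = momentum * velocity + grad     # momentum buffer
    return param - lr * velocity, velocity


splits = kfold_splits(1210, 10)
train, test = splits[0]                       # the fold used for ablation
print(len(train), len(test))                  # 1089 121

new_param, new_velocity = sgd_step(param=1.0, grad=0.0, velocity=0.0)
print(new_param, new_velocity)
```

With ten folds of 121 images each, every image is tested exactly once across the comparison runs.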

2.6.3. Accuracy Metrics

        ①Lists used metrics
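The note does not name the metrics. As an assumption, the sketch below covers the usual building blocks of detection accuracy: the IoU between a predicted and a ground-truth box, and precision and recall from TP/FP/FN counts. The box format (x1, y1, x2, y2) and the function names are my own choices, and the counts in the example are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))   # intersection 1, union 7
pr = precision_recall(tp=8, fp=2, fn=2)     # hypothetical counts
print(overlap, pr)
```

A detection usually counts as a TP when its IoU with a ground-truth box exceeds a threshold such as 0.5; averaging precision over recall levels and classes then gives mAP.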

2.6.4. Ablation Study

        ①Choice of baseline:

        ②Focus:

        ③Fusion methods ablation:

where fusion1, fusion2, fusion3, and fusion4 represent the concatenation fusion operation performed in the first, second, third, and fourth blocks

        ④Different fusion methods:

where (a) and (b) show feature-level fusion and (c) shows multistage feature-level fusion

        ⑤Resolution ablation:

        ⑥Module ablation:

        ⑦SR ablation:

2.6.5. Comparisons With Previous Methods

        ①Vis of results:

where the red circles represent the false alarms, the yellow ones denote the FP detection results, and the blue ones are FN detection results

        ②Comparison table:

2.6.6. Generalization to Single Modal Remote Sensing Images

        ①Generalize to DOTA, NWPU VHR-10 and DIOR:

2.7. Conclusion and Future Work

        ~

3. Reference

Zhang, J. et al. (2023) SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery, IEEE Transactions on Geoscience and Remote Sensing, 61. doi: 10.1109/TGRS.2023.3258666
