Awesome Fine-Grained Image Analysis – Papers, Codes and Datasets

Table of contents

  1. Introduction

  2. Tutorials

  3. Survey papers

  4. Benchmark datasets

  5. Fine-grained image recognition

    1. Fine-grained recognition by localization-classification subnetworks

      1. Employing detection or segmentation techniques

      2. Utilizing deep filters / activations

      3. Leveraging attention mechanisms

      4. Other methods

    2. Fine-grained recognition by end-to-end feature encoding

      1. High-order feature interactions

      2. Specific loss functions

      3. Other methods

    3. Fine-grained recognition with external information

      1. Fine-grained recognition with web data / auxiliary data

      2. Fine-grained recognition with multi-modality data

      3. Fine-grained recognition with humans in the loop

  6. Fine-grained image retrieval

    1. Content-based fine-grained image retrieval

    2. Sketch-based fine-grained image retrieval

  7. Future directions of FGIA

    1. Fine-grained few-shot learning

    2. Fine-grained hashing

    3. Fine-grained domain adaptation

    4. Fine-grained image generation

    5. FGIA within more realistic settings

  8. Recognition leaderboard

Introduction

This homepage lists representative papers, code, and datasets on deep learning based fine-grained image analysis (FGIA), including fine-grained image recognition, fine-grained image retrieval, etc. If you have any questions, please feel free to contact Prof. Xiu-Shen Wei.

Tutorials

  • Fine-Grained Image Analysis.
    Xiu-Shen Wei and Jianxin Wu. Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2018.

Survey papers

Benchmark datasets

Summary of popular fine-grained image datasets. "BBox" indicates whether the dataset provides object bounding box supervision; "Part anno." indicates key part localizations; "HRCHY" corresponds to hierarchical labels; "ATR" represents attribute labels (e.g., wing color, male, female); "Texts" indicates whether fine-grained text descriptions of the images are supplied. A minimal loading sketch for CUB200-2011 follows the table.

| Dataset name | Year | Meta-class | # images | # categories | BBox | Part anno. | HRCHY | ATR | Texts |
|---|---|---|---|---|---|---|---|---|---|
| Oxford Flower | 2008 | Flowers | 8,189 | 102 | | | | | √ |
| CUB200-2011 | 2011 | Birds | 11,788 | 200 | √ | √ | | √ | √ |
| Stanford Dog | 2011 | Dogs | 20,580 | 120 | √ | | | | |
| Stanford Car | 2013 | Cars | 16,185 | 196 | √ | | | | |
| FGVC Aircraft | 2013 | Aircraft | 10,000 | 100 | √ | | √ | | |
| Birdsnap | 2014 | Birds | 49,829 | 500 | √ | √ | | √ | |
| NABirds | 2015 | Birds | 48,562 | 555 | √ | √ | | | |
| DeepFashion | 2016 | Clothes | 800,000 | 1,050 | √ | √ | | √ | |
| Fru92 | 2017 | Fruits | 69,614 | 92 | | | √ | | |
| Veg200 | 2017 | Vegetables | 91,117 | 200 | | | √ | | |
| iNat2017 | 2017 | Plants & Animals | 859,000 | 5,089 | √ | | √ | | |
| RPC | 2019 | Retail products | 83,739 | 200 | √ | | √ | | |
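
To make the annotation columns concrete, below is a minimal PyTorch `Dataset` sketch for CUB200-2011. It assumes the official archive layout (`CUB_200_2011/images.txt`, `image_class_labels.txt`, `train_test_split.txt`, `bounding_boxes.txt`); the class name and structure are illustrative, not a reference loader.

```python
# Minimal CUB200-2011 loader sketch (illustrative, not a reference implementation).
# Assumes the official archive is extracted under `root`, i.e. root/CUB_200_2011/...
import os
from PIL import Image
from torch.utils.data import Dataset

class CUB200(Dataset):
    def __init__(self, root, train=True, transform=None):
        base = os.path.join(root, "CUB_200_2011")
        def read(name):
            with open(os.path.join(base, name)) as f:
                return [line.strip().split() for line in f]
        images = {i: p for i, p in read("images.txt")}
        labels = {i: int(c) - 1 for i, c in read("image_class_labels.txt")}  # 0-based labels
        splits = {i: s == "1" for i, s in read("train_test_split.txt")}      # 1 = training image
        boxes  = {r[0]: tuple(map(float, r[1:])) for r in read("bounding_boxes.txt")}  # (x, y, w, h)
        self.samples = [(os.path.join(base, "images", images[i]), labels[i], boxes[i])
                        for i in images if splits[i] == train]
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label, box = self.samples[idx]
        img = Image.open(path).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, label  # `box` stays available for localization-based methods
```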

Fine-grained image recognition

Fine-grained recognition by localization-classification subnetworks

Employing detection or segmentation techniques

Utilizing deep filters / activations

Leveraging attention mechanisms

Other methods

Fine-grained recognition by end-to-end feature encoding

High-order feature interactions
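
The Bilinear CNN entries in the leaderboard below are the canonical method in this family: feature maps from two CNN streams are combined by a location-wise outer product, sum-pooled over all locations, then passed through signed square-root and l2 normalization. A minimal sketch of that pooling step (function and variable names are ours):

```python
# Minimal bilinear pooling sketch: location-wise outer product of two feature
# maps, sum pooling, signed square root, then l2 normalization.
import torch
import torch.nn.functional as F

def bilinear_pool(fa: torch.Tensor, fb: torch.Tensor) -> torch.Tensor:
    """fa: (B, C1, H, W), fb: (B, C2, H, W) -- feature maps from two CNN streams."""
    b, c1, h, w = fa.shape
    c2 = fb.shape[1]
    fa = fa.reshape(b, c1, h * w)
    fb = fb.reshape(b, c2, h * w)
    x = torch.bmm(fa, fb.transpose(1, 2)) / (h * w)       # (B, C1, C2) pooled outer products
    x = x.reshape(b, c1 * c2)
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-10)  # signed square root
    return F.normalize(x, dim=1)                          # l2 normalization
```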

Specific loss functions
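
As a representative example, MaxEnt (NeurIPS 2018) in the leaderboard below regularizes the usual cross-entropy with a maximum-entropy term, penalizing over-confident predictions among visually similar categories. A minimal sketch (the weight name `beta` and its default value are our assumptions):

```python
# Maximum-entropy regularized cross-entropy sketch (illustrative).
# Encourages higher prediction entropy, i.e. less over-confident outputs.
import torch
import torch.nn.functional as F

def maxent_loss(logits: torch.Tensor, targets: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    ce = F.cross_entropy(logits, targets)
    log_p = F.log_softmax(logits, dim=1)
    entropy = -(log_p.exp() * log_p).sum(dim=1).mean()  # H(p), averaged over the batch
    return ce - beta * entropy  # maximizing entropy == subtracting it from the loss
```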

Other methods

Fine-grained recognition with external information

Fine-grained recognition with web data / auxiliary data

Fine-grained recognition with multi-modality data

Fine-grained recognition with humans in the loop

Fine-grained image retrieval

Content-based fine-grained image retrieval
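
A typical content-based pipeline extracts one descriptor per image with a pretrained CNN and ranks the gallery by cosine similarity to the query descriptor. The sketch below shows only this generic ranking step, not any specific paper's method:

```python
# Generic retrieval ranking sketch: rank a gallery by cosine similarity between
# l2-normalized global descriptors (not any specific paper's method).
import torch
import torch.nn.functional as F

def rank_gallery(query: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """query: (D,) descriptor; gallery: (N, D) descriptors. Returns indices, best match first."""
    q = F.normalize(query, dim=0)
    g = F.normalize(gallery, dim=1)
    scores = g @ q                                 # cosine similarities, shape (N,)
    return torch.argsort(scores, descending=True)
```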

Sketch-based fine-grained image retrieval

  • Sketch Me That Shoe.
    Qian Yu, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M. Hospedales, and Chen Change Loy. CVPR, 2016.
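
Sketch Me That Shoe trains a deep triplet ranking model: the embedding of a sketch (anchor) should lie closer to its matching photo (positive) than to a non-matching photo (negative). A minimal sketch of that objective using PyTorch's built-in triplet margin loss; the embedding networks are omitted and the margin value is our assumption:

```python
# Triplet ranking objective sketch for sketch-based retrieval (embedding
# networks omitted): anchor = sketch, positive = matching photo,
# negative = non-matching photo.
import torch
import torch.nn as nn

loss_fn = nn.TripletMarginLoss(margin=0.3)  # margin value is our assumption

anchor   = torch.randn(16, 128)  # stand-ins for sketch-branch embeddings
positive = torch.randn(16, 128)  # stand-ins for matching-photo embeddings
negative = torch.randn(16, 128)  # stand-ins for non-matching-photo embeddings
loss = loss_fn(anchor, positive, negative)  # pulls positives in, pushes negatives away
```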

Future directions of FGIA

Fine-grained few-shot learning

Fine-grained hashing

Fine-grained domain adaptation

Fine-grained image generation

FGIA within more realistic settings

Recognition leaderboard

This section is continually updated. Since CUB200-2011 is the most widely used fine-grained dataset, we list the fine-grained recognition leaderboard with it as the test bed.

| Method | Published | BBox? | Part? | External information? | Base model | Image resolution | Accuracy |
|---|---|---|---|---|---|---|---|
| PB R-CNN | ECCV 2014 | √ | | | Alex-Net | 224x224 | 73.9% |
| MaxEnt | NeurIPS 2018 | | | | GoogLeNet | TBD | 74.4% |
| PB R-CNN | ECCV 2014 | √ | √ | | Alex-Net | 224x224 | 76.4% |
| PS-CNN | CVPR 2016 | √ | √ | | CaffeNet | 454x454 | 76.6% |
| MaxEnt | NeurIPS 2018 | | | | VGG-16 | TBD | 77.0% |
| Mask-CNN | PR 2018 | | √ | | Alex-Net | 448x448 | 78.6% |
| PC | ECCV 2018 | | | | ResNet-50 | TBD | 80.2% |
| DeepLAC | CVPR 2015 | √ | √ | | Alex-Net | 227x227 | 80.3% |
| MaxEnt | NeurIPS 2018 | | | | ResNet-50 | TBD | 80.4% |
| Triplet-A | CVPR 2016 | √ | | Manual labour | GoogLeNet | TBD | 80.7% |
| Multi-grained | ICCV 2015 | | | WordNet etc. | VGG-19 | 224x224 | 81.7% |
| Krause et al. | CVPR 2015 | √ | | | CaffeNet | TBD | 82.0% |
| Multi-grained | ICCV 2015 | √ | | WordNet etc. | VGG-19 | 224x224 | 83.0% |
| TS | CVPR 2016 | | | | VGGD+VGGM | 448x448 | 84.0% |
| Bilinear CNN | ICCV 2015 | | | | VGGD+VGGM | 448x448 | 84.1% |
| STN | NeurIPS 2015 | | | | GoogLeNet+BN | 448x448 | 84.1% |
| LRBP | CVPR 2017 | | | | VGG-16 | 224x224 | 84.2% |
| PDFS | CVPR 2016 | | | | VGG-16 | TBD | 84.5% |
| Xu et al. | ICCV 2015 | √ | √ | Web data | CaffeNet | 224x224 | 84.6% |
| Cai et al. | ICCV 2017 | | | | VGG-16 | 448x448 | 85.3% |
| RA-CNN | CVPR 2017 | | | | VGG-19 | 448x448 | 85.3% |
| MaxEnt | NeurIPS 2018 | | | | Bilinear CNN | TBD | 85.3% |
| PC | ECCV 2018 | | | | Bilinear CNN | TBD | 85.6% |
| CVL | CVPR 2017 | | | Texts | VGG | TBD | 85.6% |
| Mask-CNN | PR 2018 | | √ | | VGG-16 | 448x448 | 85.7% |
| GP-256 | ECCV 2018 | | | | VGG-16 | 448x448 | 85.8% |
| KP | CVPR 2017 | | | | VGG-16 | 224x224 | 86.2% |
| T-CNN | IJCAI 2018 | | | | ResNet | 224x224 | 86.2% |
| MA-CNN | ICCV 2017 | | | | VGG-19 | 448x448 | 86.5% |
| MaxEnt | NeurIPS 2018 | | | | DenseNet-161 | TBD | 86.5% |
| DeepKSPD | ECCV 2018 | | | | VGG-19 | 448x448 | 86.5% |
| OSME+MAMC | ECCV 2018 | | | | ResNet-101 | 448x448 | 86.5% |
| StackDRL | IJCAI 2018 | | | | VGG-19 | 224x224 | 86.6% |
| DFL-CNN | CVPR 2018 | | | | VGG-16 | 448x448 | 86.7% |
| Bi-Modal PMA | IEEE TIP 2020 | | | | VGG-16 | 448x448 | 86.8% |
| PC | ECCV 2018 | | | | DenseNet-161 | TBD | 86.9% |
| KERL | IJCAI 2018 | | | Attributes | VGG-16 | 224x224 | 87.0% |
| HBP | ECCV 2018 | | | | VGG-16 | 448x448 | 87.1% |
| Mask-CNN | PR 2018 | | √ | | ResNet-50 | 448x448 | 87.3% |
| DFL-CNN | CVPR 2018 | | | | ResNet-50 | 448x448 | 87.4% |
| NTS-Net | ECCV 2018 | | | | ResNet-50 | 448x448 | 87.5% |
| HSnet | CVPR 2017 | √ | √ | | GoogLeNet+BN | TBD | 87.5% |
| Bi-Modal PMA | IEEE TIP 2020 | | | | ResNet-50 | 448x448 | 87.5% |
| CIN | AAAI 2020 | | | | ResNet-50 | 448x448 | 87.5% |
| MetaFGNet | ECCV 2018 | | | Auxiliary data | ResNet-34 | TBD | 87.6% |
| Cross-X | ICCV 2019 | | | | ResNet-50 | 448x448 | 87.7% |
| DCL | CVPR 2019 | | | | ResNet-50 | 448x448 | 87.8% |
| ACNet | CVPR 2020 | | | | VGG-16 | 448x448 | 87.8% |
| TASN | CVPR 2019 | | | | ResNet-50 | 448x448 | 87.9% |
| ACNet | CVPR 2020 | | | | ResNet-50 | 448x448 | 88.1% |
| CIN | AAAI 2020 | | | | ResNet-101 | 448x448 | 88.1% |
| DBTNet-101 | NeurIPS 2019 | | | | ResNet-101 | 448x448 | 88.1% |
| Bi-Modal PMA | IEEE TIP 2020 | | | Texts | VGG-16 | 448x448 | 88.2% |
| GCL | AAAI 2020 | | | | ResNet-50 | 448x448 | 88.3% |
| S3N | ICCV 2019 | | | | ResNet-50 | 448x448 | 88.5% |
| Sun et al. | AAAI 2020 | | | | ResNet-50 | 448x448 | 88.6% |
| FDL | AAAI 2020 | | | | ResNet-50 | 448x448 | 88.6% |
| Bi-Modal PMA | IEEE TIP 2020 | | | Texts | ResNet-50 | 448x448 | 88.7% |
| DF-GMM | CVPR 2020 | | | | ResNet-50 | 448x448 | 88.8% |
| PMG | ECCV 2020 | | | | VGG-16 | 550x550 | 88.8% |
| FDL | AAAI 2020 | | | | DenseNet-161 | 448x448 | 89.1% |
| PMG | ECCV 2020 | | | | ResNet-50 | 550x550 | 89.6% |
| API-Net | AAAI 2020 | | | | DenseNet-161 | 512x512 | 90.0% |
| Ge et al. | CVPR 2019 | | | | GoogLeNet+BN | Shorter side is 800 px | 90.3% |