Table of Contents
Landmark recognition and image retrieval
Google Landmark Recognition 2019
Google Landmark Retrieval 2019
Useful features: Deep Local Features (DELF)
Related information:
CVPR2018
http://cvpr2018.thecvf.com/program/workshops
Landmarks Workshop
https://landmarksworkshop.github.io/CVPRW2018/
2018 competitions
Google Landmark Retrieval Challenge 2018
Given an image, can you find all of the same landmarks in a dataset?
https://www.kaggle.com/c/landmark-retrieval-challenge/overview
Google Landmark Recognition Challenge
Label famous (and not-so-famous) landmarks in images
https://www.kaggle.com/c/landmark-recognition-challenge
Datasets:
Google-Landmarks Dataset
https://www.kaggle.com/google/google-landmarks-dataset#train.csv
Google Landmark Boxes dataset
Only image URLs are provided; the images themselves can be downloaded with a Python script (a sketch follows below).
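A minimal sketch of such a download script, assuming a CSV with id and url columns (as train.csv has); the output directory and error handling are illustrative:

```python
import csv
import os

import requests  # third-party: pip install requests

def download_images(csv_path, out_dir):
    """Download every image listed in a CSV with 'id' and 'url' columns."""
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            dest = os.path.join(out_dir, row["id"] + ".jpg")
            if os.path.exists(dest):
                continue  # already downloaded; makes the script resumable
            try:
                resp = requests.get(row["url"], timeout=10)
                resp.raise_for_status()
            except requests.RequestException:
                continue  # dead links are common in this dataset
            with open(dest, "wb") as img:
                img.write(resp.content)

download_images("train.csv", "train_images")
```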
Dataset contents:
Test images serve both tasks: for the recognition task, predict a landmark label for each test image; for the retrieval task, retrieve relevant index images for each test image.
Training images are associated with landmark labels and can be used to train models for the recognition and retrieval tasks.
Index images are used for the retrieval task.
Note: training data from the recognition task may be used to train models useful for the retrieval task, but the training/index sets of the two tasks have no landmarks in common, e.g. in a weakly supervised setup.
Train on the train set (which has labels).
Retrieval: find similar images in the index set.
Recognition: output a label from the classes defined by the train labels.
Role of each split:
Index images: for retrieval.
Test images: for recognition and retrieval.
Training images: for recognition and retrieval; can also be used to train models for retrieval.
Training-set boxes mentioned in the paper:
(M. Teichmann*, A. Araujo*, M. Zhu and J. Sim, “Detect-to-Retrieve: Efficient Regional Aggregation for Image Search”, Proc. CVPR'19)
Column 1: image id
Column 2: box coordinates [top, left, bottom, right], normalized to [0, 1]
Validation images (within the training data) mentioned in the paper:
(M. Teichmann*, A. Araujo*, M. Zhu and J. Sim, “Detect-to-Retrieve: Efficient Regional Aggregation for Image Search”, Proc. CVPR'19)
Column 1: image id
Column 2: box coordinates [top, left, bottom, right], normalized to [0, 1]
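A small pandas sketch for reading one of these box files, assuming the four coordinates sit space-separated inside the second column; the file name and column names are placeholders:

```python
import pandas as pd

boxes = pd.read_csv("boxes.csv", names=["id", "box"], header=0)  # hypothetical name

def parse_box(s):
    """Parse 'top left bottom right' normalized [0, 1] floats into a tuple."""
    top, left, bottom, right = map(float, s.split())
    return top, left, bottom, right

boxes["box"] = boxes["box"].apply(parse_box)
print(boxes.head())
```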
The two 2019 competitions
Google Landmark Recognition 2019
https://www.kaggle.com/c/landmark-recognition-2019
Google Landmark Retrieval 2019
https://www.kaggle.com/c/landmark-retrieval-2019/overview
Both competitions share the same test set.
Unlike object recognition in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which has 1k classes,
there are 200k landmark categories here.
Evaluation metric:
Global Average Precision (GAP) / micro Average Precision (microAP)
Label and score
Submissions are evaluated using Global Average Precision (GAP) at k, where k=1. This metric is also known as micro Average Precision (microAP), as per [1]. It works as follows:
For each query image, you will predict one landmark label and a corresponding confidence score. The evaluation treats each prediction as an individual data point in a long list of predictions (sorted in descending order by confidence scores), and computes the Average Precision based on this list.
If a submission has N predictions (label/confidence pairs) sorted in descending order by their confidence scores, then the Global Average Precision is computed as:
GAP = \frac{1}{M} \sum_{i=1}^{N} P(i) \, \mathrm{rel}(i)
where:
- N is the total number of predictions returned by the system, across all queries
- M is the total number of queries with at least one landmark from the training set visible in it (note that some queries may not depict landmarks)
- P(i) is the precision at rank i
- rel(i) denotes the relevance of prediction i: it's 1 if the i-th prediction is correct, and 0 otherwise
[1] F. Perronnin, Y. Liu, and J.-M. Renders, "A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval," Proc. CVPR'09
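The metric transcribes directly into code; a sketch (function and argument names are mine):

```python
def global_average_precision(predictions, ground_truth):
    """predictions: list of (query_id, predicted_label, confidence).
    ground_truth: dict mapping query_id -> true label, containing only
    the M queries that actually depict a training-set landmark."""
    # One long list of predictions across all queries, best confidence first.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    M = len(ground_truth)
    correct, gap = 0, 0.0
    for i, (query_id, label, _) in enumerate(ranked, start=1):
        rel = int(ground_truth.get(query_id) == label)  # rel(i)
        correct += rel
        gap += rel * correct / i  # rel(i) * P(i), with P(i) = correct / i
    return gap / M if M else 0.0
```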
Top leaderboard score: 0.37606
Submission file format:
query image id, landmark id, confidence score
Data:
train.csv: one landmark per image
test.csv: an image may contain several landmarks, or none at all
Two competition phases:
Phase 1: download via Python script
Phase 2: download via CVDF: https://github.com/cvdfoundation/google-landmark
Google Landmark Retrieval 2019
Image retrieval: given a query image, find similar images in a dataset.
Two competition phases:
Phase 1: test data, 700k images
Phase 2: index data, 100k distinct landmarks
test.csv: query images
index.csv: index images, each with a unique id, downloadable via URL
The 2019 dataset
https://github.com/cvdfoundation/google-landmark
wget is the most common download command on Linux; basic usage is wget followed by the URL of the file to download, e.g. wget https://s3.amazonaws.com/google-landmark/train/images_000.tar (the TAR URLs are listed in the repository above).
Directory layout:
Index TARs are extracted into an index directory; train TARs are extracted into a train directory.
Each image is stored in a directory ${a}/${b}/${c}/${id}.jpg, where ${a}, ${b} and ${c} are the first three letters of the image id, and ${id} is the image id found in train.csv. For example, an image with the id 0123456789abcdef would be stored in 0/1/2/0123456789abcdef.jpg.
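The id-to-path mapping is easy to reproduce in code; a sketch of the layout just described:

```python
import os

def image_path(root, image_id):
    """E.g. image_path('train', '0123456789abcdef') == 'train/0/1/2/0123456789abcdef.jpg'."""
    a, b, c = image_id[0], image_id[1], image_id[2]
    return os.path.join(root, a, b, c, image_id + ".jpg")
```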
Dataset sizes:
Train set:
ls takes roughly a minute per thousand files
4,132,914 images (~4.13M)
500 TAR files
train.csv columns: id, url, landmark_id
train_attribution.csv (image attribution information)
Query images are listed in test.csv, while the "index" images you retrieve from are listed in index.csv.
Index set:
761,757 images (~0.76M)
100 TAR files
Test set:
117,577 images
20 TAR files
Evaluation metric:
mean Average Precision (mAP), computed as:
\mathrm{mAP} = \frac{1}{Q} \sum_{q=1}^{Q} \frac{1}{m_q} \sum_{k=1}^{n_q} P_q(k) \, \mathrm{rel}_q(k)
where:
- Q is the number of query images that depict landmarks from the index set
- m_q is the number of index images containing a landmark in common with the query image q (note that this is only for queries which depict landmarks from the index set, so m_q ≠ 0)
- n_q is the number of predictions made by the system for query q
- P_q(k) is the precision at rank k for the q-th query
- rel_q(k) denotes the relevance of prediction k for the q-th query: it's 1 if the k-th prediction is correct, and 0 otherwise
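A direct transcription of these definitions into code (a sketch; names are mine):

```python
def mean_average_precision(retrievals, relevant):
    """retrievals: dict query_id -> ranked list of retrieved index ids.
    relevant: dict query_id -> set of relevant index ids, containing only
    the Q queries that depict landmarks from the index set (so m_q != 0)."""
    ap_sum = 0.0
    for q, rel_set in relevant.items():
        hits, precisions = 0, 0.0
        for k, idx_id in enumerate(retrievals.get(q, []), start=1):  # n_q preds
            if idx_id in rel_set:            # rel_q(k) = 1
                hits += 1
                precisions += hits / k       # P_q(k)
        ap_sum += precisions / len(rel_set)  # AP for query q, divided by m_q
    return ap_sum / len(relevant)            # mean over the Q queries
```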
Top leaderboard score: 0.37229
Useful features: Deep Local Features (DELF)
https://github.com/tensorflow/models/tree/master/research/delf
Overview of the 2019 1st-place retrieval / 3rd-place recognition solution
First-place solution: Landmark2019-1st-and-3rd-Place-Solution
Related links:
Kaggle: https://www.kaggle.com/c/landmark-retrieval-2019/discussion/94735#latest-551277
(code) https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution
(paper) https://arxiv.org/abs/1906.04087
(poster) https://www.dropbox.com/s/y3c3ovdiizz59j4/cvpr19_smlyaka_slides.pdf?dl=0
Pipeline
1. Data cleaning: keep images whose labels are confirmed by DELF + spatial verification (SV)
2. FishNet-150, ResNet-101, and SE-ResNeXt-101 as backbones, trained with cosine-based softmax losses (ArcFace and CosFace)
3. Accumulate top-k (k=3) similarity in descriptor space and the inlier count from spatial verification
Post-processing
Distractors are defined as labels predicted more than 30 times; multiply their confidence by -1 to treat them as non-landmark classes, e.g. flowers, airplanes, cars, people.
Paper notes:
Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset
1. Problem analysis
Cosine-softmax based losses, commonly used in face recognition.
2. Automated data cleaning
Spatial verification: RANSAC + affine transform over DELF local features, with an inlier-count threshold of 2.
Each image is represented by a descriptor from a model trained on the v1 dataset.
Step 1: use kNN to find the K (=1000) nearest neighbors of each image x_i.
Step 2: run spatial verification only on the 100 nearest neighbors that share the same label, which lowers the computational cost.
Step 3: if the verified count exceeds the threshold (2), add the image to the cleaned dataset. A sketch of this loop follows below.
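A sketch of the cleaning loop, assuming global descriptors desc, labels, and an inlier_count(i, j) helper that runs DELF + RANSAC on an image pair (all names are mine; the exact threshold semantics follow the steps above):

```python
from sklearn.neighbors import NearestNeighbors

def clean_dataset(desc, labels, inlier_count, k=1000, sv_limit=100, threshold=2):
    """desc: (n, d) array of descriptors; labels: length-n landmark ids."""
    nn = NearestNeighbors(n_neighbors=k).fit(desc)
    _, neighbors = nn.kneighbors(desc)                  # Step 1: kNN per image
    keep = []
    for i, nbrs in enumerate(neighbors):
        same = [j for j in nbrs if j != i and labels[j] == labels[i]]
        # Step 2: spatially verify only the nearest same-label neighbors.
        verified = sum(inlier_count(i, j) >= threshold for j in same[:sv_limit])
        if verified > threshold:                        # Step 3: count > 2 -> keep
            keep.append(i)
    return keep
```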
3. Model
Backbone:
FishNet-150 [20], ResNet-101 [8] and SE-ResNeXt-101 [9]
First trained on ImageNet and the v1 dataset, then fine-tuned on the cleaned dataset.
+
Loss:
cosine-softmax based losses from face recognition: ArcFace [4] and CosFace [21], with a margin of 0.3
metric learning
pooling:
generalized mean-pooling (GeM) [19], p = 3.0, kept fixed during training
+
FC: reduces overfitting and computational cost
512
+
one-dimensional Batch Normalization: improves generalization (see the head sketch below)
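A PyTorch sketch of this head, per the description above (GeM with fixed p = 3.0, FC to 512 dims, then BatchNorm1d); the class name is mine:

```python
import torch.nn as nn
import torch.nn.functional as F

class GeMHead(nn.Module):
    """Backbone feature map -> GeM pooling -> FC 512 -> 1-d BatchNorm."""
    def __init__(self, in_channels, out_dim=512, p=3.0, eps=1e-6):
        super().__init__()
        self.p, self.eps = p, eps            # p is fixed, not learned
        self.fc = nn.Linear(in_channels, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, x):                    # x: (B, C, H, W)
        # GeM: mean of x^p over the spatial dims, then the p-th root.
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.avg_pool2d(x, (x.size(-2), x.size(-1))).pow(1.0 / self.p)
        return self.bn(self.fc(x.flatten(1)))  # (B, 512) descriptor
```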
Framework: PyTorch
Training settings:
stochastic gradient descent with momentum
initial learning rate 0.001, decayed with cosine annealing
momentum 0.9
weight decay 0.00001
batch size 32
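These settings map directly onto PyTorch's stock optimizer and scheduler; a sketch with a placeholder model:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)   # placeholder for the real backbone + head
num_epochs = 12              # 5 (phase 1) + 7 (phase 2), per the phases below

optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... one training epoch over the dataset goes here ...
    scheduler.step()         # cosine-anneal the learning rate once per epoch
```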
Two training phases:
Phase 1: 5 epochs, mild data augmentation (random crop and scaling)
Phase 2: 7 epochs, strong data augmentation (brightness, shear, crop, scaling)
New image sizes are sampled from [352, 384, 448, 512].
In the final epoch, BN layers are frozen and sizes are sampled from [544, 608, 704, 800], which helps preserve spatial information.
Model ensemble:
6 models, 512 * 6 = 3072-dimensional concatenated descriptor
At inference, a multi-scale representation with scales [0.75, 1.0, 1.25], averaging the descriptors (sketch below).
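A sketch of this multi-scale, multi-model descriptor extraction (helper names are mine; each model is assumed to map an image batch to a 512-d descriptor):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def extract_descriptor(models, image, scales=(0.75, 1.0, 1.25)):
    """image: (1, 3, H, W) tensor. Returns a (1, 512 * len(models)) descriptor."""
    per_model = []
    for model in models:                      # 6 models -> 3072 dims total
        descs = []
        for s in scales:                      # multi-scale representation
            resized = F.interpolate(image, scale_factor=s,
                                    mode="bilinear", align_corners=False)
            descs.append(F.normalize(model(resized), dim=1))
        # Average the per-scale descriptors, then re-normalize.
        per_model.append(F.normalize(torch.stack(descs).mean(0), dim=1))
    return torch.cat(per_model, dim=1)
```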
4. Retrieval Track
The traditional approach finds similar landmarks via L2-normalized descriptors and Euclidean nearest-neighbor search.
This breaks down for landmarks that have both indoor and outdoor views.
A new reranking method is proposed:
Run recognition to obtain a landmark id for every sample in test and index.
For a query image, index images whose label matches the query's are positives; mismatched ones are negatives.
Positives with high scores are moved to the front of the ranked list and negatives to the back, inserting one sample at a time.
(Similar to Discriminative Query Expansion, but no discriminative model has to be trained; see the sketch below.)
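A sketch of this reranking step, assuming recognition has already attached a predicted label to every candidate:

```python
def rerank(candidates, query_label):
    """candidates: list of (index_id, score, predicted_label), sorted by
    descending score. Label matches go to the front, mismatches to the back;
    within each group the original score order is preserved."""
    positives = [c for c in candidates if c[2] == query_label]
    negatives = [c for c in candidates if c[2] != query_label]
    return positives + negatives
```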
5. Recognition Track
Three steps: Euclidean search, soft voting, post-processing.
Step 1: brute-force Euclidean search for the k nearest neighbors.
Step 2: soft voting based on the cosine similarity between the top-3 neighbors and the query image.
Score = sum of the cosine similarities.
RANSAC + inlier-count based scoring reduces false positives and makes the similarity score more robust.
RANSAC introduction: https://blog.csdn.net/robinhjwy/article/details/79174914
t: the threshold for deciding whether a data point fits the model; set to 70.
Step 3: suppress the influence of distractors with a heuristic.
Under GAP, non-landmark images that get predicted with high confidence (e.g. flowers, airplanes) must be suppressed: multiply their score by -1. A sketch of steps 2 and 3 follows.
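A sketch of the soft-voting and suppression logic (names are mine; the RANSAC inlier-based rescoring of step 2 is left out):

```python
from collections import defaultdict

def soft_vote(neighbors, distractor_labels, top=3):
    """neighbors: list of (train_label, cosine_similarity), best first.
    Returns (predicted_label, score); distractor scores are flipped."""
    votes = defaultdict(float)
    for label, sim in neighbors[:top]:    # top-3 neighbors vote
        votes[label] += sim               # score = sum of cosine similarities
    label, score = max(votes.items(), key=lambda kv: kv[1])
    if label in distractor_labels:        # step 3: heuristic suppression
        score = -score                    # score * -1
    return label, score
```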