Table of Contents
Landmark recognition and image retrieval
Google Landmark Recognition 2019
Google Landmark Retrieval 2019
Useful features: Deep Local Features (DELF)
Related information:
CVPR2018
http://cvpr2018.thecvf.com/program/workshops
Landmarks Workshop
https://landmarksworkshop.github.io/CVPRW2018/
2018 competitions
Google Landmark Retrieval Challenge 2018
Given an image, can you find all of the same landmarks in a dataset?
https://www.kaggle.com/c/landmark-retrieval-challenge/overview
Google Landmark Recognition Challenge
Label famous (and not-so-famous) landmarks in images
https://www.kaggle.com/c/landmark-recognition-challenge
Datasets:
Google-Landmarks Dataset
https://www.kaggle.com/google/google-landmarks-dataset#train.csv
Google Landmark Boxes dataset
Only image URLs are provided; the images themselves can be downloaded with a Python script (a sketch follows below).
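A minimal sketch of such a download script, assuming a CSV with id and url columns (as train.csv has); the output directory and error handling are illustrative:

```python
import csv
import os

import requests  # third-party: pip install requests

def download_images(csv_path, out_dir):
    """Download every image listed in a CSV with 'id' and 'url' columns."""
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            dest = os.path.join(out_dir, row["id"] + ".jpg")
            if os.path.exists(dest):
                continue  # already downloaded; makes the script resumable
            try:
                resp = requests.get(row["url"], timeout=10)
                resp.raise_for_status()
            except requests.RequestException:
                continue  # dead links are common in this dataset
            with open(dest, "wb") as img:
                img.write(resp.content)

download_images("train.csv", "train_images")
```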
Dataset contents:
Test images serve both tasks: for the recognition task, predict a landmark label for each test image; for the retrieval task, retrieve relevant index images for each test image.
Training images are associated with landmark labels and can be used to train models for the recognition and retrieval tasks.
Index images are used for the retrieval task.
Note: training data from the recognition task may be used to train models useful for the retrieval task, but the training/index sets of the two tasks have no landmarks in common, e.g. in a weakly supervised setup.
Train on the train set (which has labels).
Retrieval: find similar images in the index set.
Recognition: output a label from the classes defined by the train labels.
Role of each split:
Index images: for retrieval.
Test images: for recognition and retrieval.
Training images: for recognition and retrieval; can also be used to train models for retrieval.
Training-set boxes mentioned in the paper:
(M. Teichmann*, A. Araujo*, M. Zhu and J. Sim, “Detect-to-Retrieve: Efficient Regional Aggregation for Image Search”, Proc. CVPR'19)
Column 1: image id
Column 2: box coordinates [top, left, bottom, right], normalized to [0, 1]
Validation images (within the training data) mentioned in the paper:
(M. Teichmann*, A. Araujo*, M. Zhu and J. Sim, “Detect-to-Retrieve: Efficient Regional Aggregation for Image Search”, Proc. CVPR'19)
Column 1: image id
Column 2: box coordinates [top, left, bottom, right], normalized to [0, 1]
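A small pandas sketch for reading one of these box files, assuming the four coordinates sit space-separated inside the second column; the file name and column names are placeholders:

```python
import pandas as pd

boxes = pd.read_csv("boxes.csv", names=["id", "box"], header=0)  # hypothetical name

def parse_box(s):
    """Parse 'top left bottom right' normalized [0, 1] floats into a tuple."""
    top, left, bottom, right = map(float, s.split())
    return top, left, bottom, right

boxes["box"] = boxes["box"].apply(parse_box)
print(boxes.head())
```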
The two 2019 competitions
Google Landmark Recognition 2019
https://www.kaggle.com/c/landmark-recognition-2019
Google Landmark Retrieval 2019
https://www.kaggle.com/c/landmark-retrieval-2019/overview
Both competitions share the same test set.
Unlike object recognition in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which has 1k classes,
there are 200k landmark categories here.
Evaluation metric:
Global Average Precision (GAP) / micro Average Precision (microAP)
Label and score
Submissions are evaluated using Global Average Precision (GAP) at k, where k=1. This metric is also known as micro Average Precision (microAP), as per [1]. It works as follows:
For each query image, you will predict one landmark label and a corresponding confidence score. The evaluation treats each prediction as an individual data point in a long list of predictions (sorted in descending order by confidence scores), and computes the Average Precision based on this list.
If a submission has N predictions (label/confidence pairs) sorted in descending order by their confidence scores, then the Global Average Precision is computed as:
GAP = \frac{1}{M} \sum_{i=1}^{N} P(i) \, \mathrm{rel}(i)
where:
- N is the total number of predictions returned by the system, across all queries
- M is the total number of queries with at least one landmark from the training set visible in it (note that some queries may not depict landmarks)
- P(i) is the precision at rank i
- rel(i) denotes the relevance of prediction i: it's 1 if the i-th prediction is correct, and 0 otherwise
[1] F. Perronnin, Y. Liu, and J.-M. Renders, "A Family of Contextual Measures of Similarity between Distributions with Application to Image Retrieval," Proc. CVPR'09
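The metric transcribes directly into code; a sketch (function and argument names are mine):

```python
def global_average_precision(predictions, ground_truth):
    """predictions: list of (query_id, predicted_label, confidence).
    ground_truth: dict mapping query_id -> true label, containing only
    the M queries that actually depict a training-set landmark."""
    # One long list of predictions across all queries, best confidence first.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    M = len(ground_truth)
    correct, gap = 0, 0.0
    for i, (query_id, label, _) in enumerate(ranked, start=1):
        rel = int(ground_truth.get(query_id) == label)  # rel(i)
        correct += rel
        gap += rel * correct / i  # rel(i) * P(i), with P(i) = correct / i
    return gap / M if M else 0.0
```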
Top leaderboard score: 0.37606
Submission file format:
query image id, landmark id, confidence score
Data:
train.csv: one landmark per image
test.csv: an image may contain several landmarks, or none at all
Two competition phases:
Phase 1: download via Python script
Phase 2: download via CVDF: https://github.com/cvdfoundation/google-landmark
Google Landmark Retrieval 2019
Image retrieval: given a query image, find similar images in a dataset.
Two competition phases:
Phase 1: test data, 700k images
Phase 2: index data, 100k distinct landmarks
test.csv: query images
index.csv: index images, each with a unique id, downloadable via URL
The 2019 dataset
https://github.com/cvdfoundation/google-landmark
wget is the most common download command on Linux; basic usage is wget followed by the URL of the file to download, e.g. wget https://s3.amazonaws.com/google-landmark/train/images_000.tar (the TAR URLs are listed in the repository above).
Directory layout:
Index TARs are extracted into an index directory; train TARs are extracted into a train directory.
Each image is stored in a directory ${a}/${b}/${c}/${id}.jpg, where ${a}, ${b} and ${c} are the first three letters of the image id, and ${id} is the image id found in train.csv. For example, an image with the id 0123456789abcdef would be stored in 0/1/2/0123456789abcdef.jpg.
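The id-to-path mapping is easy to reproduce in code; a sketch of the layout just described:

```python
import os

def image_path(root, image_id):
    """E.g. image_path('train', '0123456789abcdef') == 'train/0/1/2/0123456789abcdef.jpg'."""
    a, b, c = image_id[0], image_id[1], image_id[2]
    return os.path.join(root, a, b, c, image_id + ".jpg")
```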
Dataset sizes:
Train set:
ls takes roughly a minute per thousand files
4,132,914 images (~4.13M)
500 TAR files
train.csv columns: id, url, landmark_id
train_attribution.csv (image attribution information)
Query images are listed in test.csv, while the "index" images you retrieve from are listed in index.csv.
Index set:
761,757 images (~0.76M)
100 TAR files
Test set:
117,577 images
20 TAR files
Evaluation metric:
mean Average Precision (mAP), computed as:
\mathrm{mAP} = \frac{1}{Q} \sum_{q=1}^{Q} \frac{1}{m_q} \sum_{k=1}^{n_q} P_q(k) \, \mathrm{rel}_q(k)
where:
- Q is the number of query images that depict landmarks from the index set
- m_q is the number of index images containing a landmark in common with the query image q (note that this is only for queries which depict landmarks from the index set, so m_q ≠ 0)
- n_q is the number of predictions made by the system for query q
- P_q(k) is the precision at rank k for the q-th query
- rel_q(k) denotes the relevance of prediction k for the q-th query: it's 1 if the k-th prediction is correct, and 0 otherwise
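A direct transcription of these definitions into code (a sketch; names are mine):

```python
def mean_average_precision(retrievals, relevant):
    """retrievals: dict query_id -> ranked list of retrieved index ids.
    relevant: dict query_id -> set of relevant index ids, containing only
    the Q queries that depict landmarks from the index set (so m_q != 0)."""
    ap_sum = 0.0
    for q, rel_set in relevant.items():
        hits, precisions = 0, 0.0
        for k, idx_id in enumerate(retrievals.get(q, []), start=1):  # n_q preds
            if idx_id in rel_set:            # rel_q(k) = 1
                hits += 1
                precisions += hits / k       # P_q(k)
        ap_sum += precisions / len(rel_set)  # AP for query q, divided by m_q
    return ap_sum / len(relevant)            # mean over the Q queries
```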
Top leaderboard score: 0.37229
Useful features: Deep Local Features (DELF)
https://github.com/tensorflow/models/tree/master/research/delf
Overview of the 2019 1st-place retrieval / 3rd-place recognition solution
First-place solution: Landmark2019-1st-and-3rd-Place-Solution
Related links:
Kaggle: https://www.kaggle.com/c/landmark-retrieval-2019/discussion/94735#latest-551277
(code) https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution
(paper) https://arxiv.org/abs/1906.04087
(poster) https://www.dropbox.com/s/y3c3ovdiizz59j4/cvpr19_smlyaka_slides.pdf?dl=0
Pipeline
1. Data cleaning: keep images whose labels are confirmed by DELF + spatial verification (SV)
2. FishNet-150, ResNet-101, and SE-ResNeXt-101 as backbones, trained with cosine-based softmax losses (ArcFace and CosFace)
3. Accumulate top-k (k=3) similarity in descriptor space and the inlier count from spatial verification
Post-processing
Distractors are defined as labels predicted more than 30 times; multiply their confidence by -1 to treat them as non-landmark classes, e.g. flowers, airplanes, cars, people.
Paper notes:
Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset
1. Problem analysis
Cosine-softmax based losses, commonly used in face recognition.
2. Automated data cleaning
Spatial verification: RANSAC + affine transform over DELF local features, with an inlier-count threshold of 2.
Each image is represented by a descriptor from a model trained on the v1 dataset.
Step 1: use kNN to find the K (=1000) nearest neighbors of each image x_i.
Step 2: run spatial verification only on the 100 nearest neighbors that share the same label, which lowers the computational cost.
Step 3: if the verified count exceeds the threshold (2), add the image to the cleaned dataset. A sketch of this loop follows below.
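A sketch of the cleaning loop, assuming global descriptors desc, labels, and an inlier_count(i, j) helper that runs DELF + RANSAC on an image pair (all names are mine; the exact threshold semantics follow the steps above):

```python
from sklearn.neighbors import NearestNeighbors

def clean_dataset(desc, labels, inlier_count, k=1000, sv_limit=100, threshold=2):
    """desc: (n, d) array of descriptors; labels: length-n landmark ids."""
    nn = NearestNeighbors(n_neighbors=k).fit(desc)
    _, neighbors = nn.kneighbors(desc)                  # Step 1: kNN per image
    keep = []
    for i, nbrs in enumerate(neighbors):
        same = [j for j in nbrs if j != i and labels[j] == labels[i]]
        # Step 2: spatially verify only the nearest same-label neighbors.
        verified = sum(inlier_count(i, j) >= threshold for j in same[:sv_limit])
        if verified > threshold:                        # Step 3: count > 2 -> keep
            keep.append(i)
    return keep
```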
3. Model
Backbone:
FishNet-150 [20], ResNet-101 [8] and SE-ResNeXt-101 [9]
First trained on ImageNet and the v1 dataset, then fine-tuned on the cleaned dataset.
+
Loss:
cosine-softmax based losses from face recognition: ArcFace [4] and CosFace [21], with a margin of 0.3
metric learning
pooling:
generalized mean-pooling (GeM) [19], p = 3.0, kept fixed during training
+
FC: reduces overfitting and computational cost
512
+
one-dimensional Batch Normalization: improves generalization (see the head sketch below)
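A PyTorch sketch of this head, per the description above (GeM with fixed p = 3.0, FC to 512 dims, then BatchNorm1d); the class name is mine:

```python
import torch.nn as nn
import torch.nn.functional as F

class GeMHead(nn.Module):
    """Backbone feature map -> GeM pooling -> FC 512 -> 1-d BatchNorm."""
    def __init__(self, in_channels, out_dim=512, p=3.0, eps=1e-6):
        super().__init__()
        self.p, self.eps = p, eps            # p is fixed, not learned
        self.fc = nn.Linear(in_channels, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, x):                    # x: (B, C, H, W)
        # GeM: mean of x^p over the spatial dims, then the p-th root.
        x = x.clamp(min=self.eps).pow(self.p)
        x = F.avg_pool2d(x, (x.size(-2), x.size(-1))).pow(1.0 / self.p)
        return self.bn(self.fc(x.flatten(1)))  # (B, 512) descriptor
```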
Framework: PyTorch
Training settings:
stochastic gradient descent with momentum
initial learning rate 0.001, decayed with cosine annealing
momentum 0.9
weight decay 0.00001
batch size 32
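These settings map directly onto PyTorch's stock optimizer and scheduler; a sketch with a placeholder model:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10)   # placeholder for the real backbone + head
num_epochs = 12              # 5 (phase 1) + 7 (phase 2), per the phases below

optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... one training epoch over the dataset goes here ...
    scheduler.step()         # cosine-anneal the learning rate once per epoch
```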
Two training phases:
Phase 1: 5 epochs, mild data augmentation (random crop and scaling)
Phase 2: 7 epochs, strong data augmentation (brightness, shear, crop, scaling)
New image sizes are sampled from [352, 384, 448, 512].
In the final epoch, BN layers are frozen and sizes are sampled from [544, 608, 704, 800], which helps preserve spatial information.
Model ensemble:
6 models, 512 * 6 = 3072-dimensional concatenated descriptor
At inference, a multi-scale representation with scales [0.75, 1.0, 1.25], averaging the descriptors (sketch below).
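A sketch of this multi-scale, multi-model descriptor extraction (helper names are mine; each model is assumed to map an image batch to a 512-d descriptor):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def extract_descriptor(models, image, scales=(0.75, 1.0, 1.25)):
    """image: (1, 3, H, W) tensor. Returns a (1, 512 * len(models)) descriptor."""
    per_model = []
    for model in models:                      # 6 models -> 3072 dims total
        descs = []
        for s in scales:                      # multi-scale representation
            resized = F.interpolate(image, scale_factor=s,
                                    mode="bilinear", align_corners=False)
            descs.append(F.normalize(model(resized), dim=1))
        # Average the per-scale descriptors, then re-normalize.
        per_model.append(F.normalize(torch.stack(descs).mean(0), dim=1))
    return torch.cat(per_model, dim=1)
```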
4. Retrieval Track
The traditional approach finds similar landmarks via L2-normalized descriptors and Euclidean nearest-neighbor search.
This breaks down for landmarks that have both indoor and outdoor views.
A new reranking method is proposed:
Run recognition to obtain a landmark id for every sample in test and index.
For a query image, index images whose label matches the query's are positives; mismatched ones are negatives.
Positives with high scores are moved to the front of the ranked list and negatives to the back, inserting one sample at a time.
(Similar to Discriminative Query Expansion, but no discriminative model has to be trained; see the sketch below.)
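A sketch of this reranking step, assuming recognition has already attached a predicted label to every candidate:

```python
def rerank(candidates, query_label):
    """candidates: list of (index_id, score, predicted_label), sorted by
    descending score. Label matches go to the front, mismatches to the back;
    within each group the original score order is preserved."""
    positives = [c for c in candidates if c[2] == query_label]
    negatives = [c for c in candidates if c[2] != query_label]
    return positives + negatives
```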
5. Recognition Track
Three steps: Euclidean search, soft voting, post-processing.
Step 1: brute-force Euclidean search for the k nearest neighbors.
Step 2: soft voting based on the cosine similarity between the top-3 neighbors and the query image.
Score = sum of the cosine similarities.
RANSAC + inlier-count based scoring reduces false positives and makes the similarity score more robust.
RANSAC introduction: https://blog.csdn.net/robinhjwy/article/details/79174914
t: the threshold for deciding whether a data point fits the model; set to 70.
Step 3: suppress the influence of distractors with a heuristic.
Under GAP, non-landmark images that get predicted with high confidence (e.g. flowers, airplanes) must be suppressed: multiply their score by -1. A sketch of steps 2 and 3 follows.
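A sketch of the soft-voting and suppression logic (names are mine; the RANSAC inlier-based rescoring of step 2 is left out):

```python
from collections import defaultdict

def soft_vote(neighbors, distractor_labels, top=3):
    """neighbors: list of (train_label, cosine_similarity), best first.
    Returns (predicted_label, score); distractor scores are flipped."""
    votes = defaultdict(float)
    for label, sim in neighbors[:top]:    # top-3 neighbors vote
        votes[label] += sim               # score = sum of cosine similarities
    label, score = max(votes.items(), key=lambda kv: kv[1])
    if label in distractor_labels:        # step 3: heuristic suppression
        score = -score                    # score * -1
    return label, score
```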