1. Paper: Unifying Deep Local and Global Features for Image Search
https://arxiv.org/pdf/2001.05027.pdf
https://github.com/tensorflow/models/tree/master/research/delf
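Commentary
Unifying Deep Local and Global Features for Image Search (2020) (Part 14)
Note: as of the end of October, the official code does not include the reconstruction loss described in the paper.
1. Issue: https://github.com/tensorflow/models/issues/9189
The author says training works better when pooling is not applied directly to the local descriptors (doing so would yield a lower-dimensional pooled feature):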
We found that pooling on the local descriptor directly can be difficult, because it will lead to a lower dimensionality pooled feature, which may have a difficult time optimizing the classifier (consequently, the attention layers cannot be learned so well). Also, if we allow the local descriptors to be tuned directly based on this loss, it may produce abstract/high-level representations which may be good for the attention loss optimization, but not necessarily for local descriptor matching (the local descriptors may become less localizable).
Attention classifier should be able to reuse ArcFace as well; it requires an additional hyperparameter to be set though (the ArcFace margin). The goal of the attention layer is not to produce a powerful global feature, but rather to learn well the attention keypoint detection; so the ArcFace loss may not contribute much to this goal.
Excerpts from the relevant TensorFlow implementation
https://github.com/tensorflow/models/blob/4437d7b4b17c5535d516bcb4038ff9397ae9eef9/research/delf/delf/python/training/model/delf_model.py#L83
# How keypoints are obtained from the local features (post-processing)
https://github.com/tensorflow/models/issues/3387
Great to hear you were able to train it!
The step you seem to be missing is to apply some post-processing operations to the extracted features. Essentially, you need to call the ExtractKeypointDescriptor function (from the feature_extractor.py file), which will give you the boxes, features, etc (note that this function requires a model_fn argument, which you should set to the output of BuildModel). A simplified example of how to use ExtractKeypointDescriptor can be seen in the file feature_extractor_test.py.
After extracting those, you can then call DelfFeaturePostProcessing to obtain the final locations and descriptors.
Hope this helps!
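The post-processing idea in the reply above can be pictured with a small example. This is only a minimal NumPy sketch, not the DELF `ExtractKeypointDescriptor` implementation: the helper name `select_keypoints`, the score threshold, and the `stride`/`offset` used to map feature-map cells to image coordinates are all assumptions made for illustration.

```python
import numpy as np


def select_keypoints(attn_scores, features, score_threshold=0.5,
                     max_keypoints=1000, stride=16.0, offset=0.0):
    """Hypothetical helper: keep feature-map cells with high attention.

    attn_scores: [H, W] attention score per cell.
    features:    [H, W, C] dense feature map.
    stride/offset: assumed mapping from cell index to image pixels.
    """
    h, w = attn_scores.shape
    flat_scores = attn_scores.reshape(-1)
    flat_feats = features.reshape(h * w, -1)

    # Keep cells above the threshold, strongest first.
    keep = np.flatnonzero(flat_scores > score_threshold)
    keep = keep[np.argsort(-flat_scores[keep])][:max_keypoints]

    rows, cols = np.unravel_index(keep, (h, w))
    # (y, x) location of each kept cell in image coordinates.
    locations = np.stack([rows, cols], axis=1) * stride + offset
    return locations, flat_feats[keep], flat_scores[keep]


scores = np.array([[0.1, 0.9], [0.7, 0.2]])
feats = np.arange(8, dtype=np.float32).reshape(2, 2, 2)
locs, descs, kept = select_keypoints(scores, feats)
print(locs)   # image-space (y, x) of the selected cells
print(descs)  # descriptors at those cells
```

The real pipeline also handles image pyramids and receptive-field boxes, which this sketch leaves out.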
Loss computation: only two terms; the autoencoder reconstruction loss (MSE loss) from the paper is absent.
desc_loss = compute_loss(labels, desc_logits)  # global descriptor loss
# Calculate attention loss by applying the attention block classifier.
attn_logits = model.attn_classification(attn_prelogits)
attn_loss = compute_loss(labels, attn_logits) # attention loss
# Cumulate global loss and attention loss.
total_loss = desc_loss + FLAGS.attention_loss_weight * attn_loss
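To make the weighted sum concrete, here is a minimal NumPy sketch. It assumes `compute_loss` is a softmax cross-entropy over class logits (its definition is not shown in the excerpt), and the helper `softmax_cross_entropy` and the weight value are illustrative stand-ins rather than the repository's code.

```python
import numpy as np


def softmax_cross_entropy(labels, logits):
    """Mean cross-entropy for integer labels and [N, num_classes] logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


labels = np.array([0, 1])
desc_logits = np.array([[2.0, 0.0], [0.0, 2.0]])  # global-descriptor head
attn_logits = np.array([[0.0, 0.0], [0.0, 0.0]])  # attention head

attention_loss_weight = 1.0  # assumed value, stands in for the flag
desc_loss = softmax_cross_entropy(labels, desc_logits)
attn_loss = softmax_cross_entropy(labels, attn_logits)
total_loss = desc_loss + attention_loss_weight * attn_loss
print(desc_loss, attn_loss, total_loss)
```

Both heads are trained against the same class labels; the weight only controls how much the attention classifier contributes to the total.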
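Attention module structure
The second layer (conv2) has 1 filter, not 512.
feat = L2-normalized input features * attention weights (then averaged over spatial positions)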
class AttentionModel(tf.keras.Model):
  """Instantiates attention model.

  Uses two [kernel_size x kernel_size] convolutions and softplus as activation
  to compute an attention map with the same resolution as the featuremap.
  Features l2-normalized and aggregated using attention probabilities as weights.
  """

  def __init__(self, kernel_size=1, decay=_DECAY, name='attention'):
    """Initialization of attention model.

    Args:
      kernel_size: int, kernel size of convolutions.
      decay: float, decay for l2 regularization of kernel weights.
      name: str, name to identify model.
    """
    super(AttentionModel, self).__init__(name=name)

    # First convolutional layer (called with relu activation).
    self.conv1 = layers.Conv2D(
        512,
        kernel_size,
        kernel_regularizer=reg.l2(decay),
        padding='same',
        name='attn_conv1')
    self.bn_conv1 = layers.BatchNormalization(axis=3, name='bn_conv1')

    # Second convolutional layer, with softplus activation.
    self.conv2 = layers.Conv2D(
        1,
        kernel_size,
        kernel_regularizer=reg.l2(decay),
        padding='same',
        name='attn_conv2')
    self.activation_layer = layers.Activation('softplus')

  def call(self, inputs, training=True):
    x = self.conv1(inputs)
    x = self.bn_conv1(x, training=training)
    x = tf.nn.relu(x)

    score = self.conv2(x)
    prob = self.activation_layer(score)

    # L2-normalize the featuremap before pooling.
    inputs = tf.nn.l2_normalize(inputs, axis=-1)
    feat = tf.reduce_mean(tf.multiply(inputs, prob), [1, 2], keepdims=False)

    # Note: compared with the 2018 DELF, DELG adds a part here: feat.
    # feat goes on to the pooling layer and the attention loss.
    # prob and the layer3 output are used for the reconstruction loss.
    return feat, prob, score
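The pooling step at the end of `call` can be reproduced outside TensorFlow to check shapes and values. The following is a minimal NumPy sketch under stated assumptions: `attention_pool` is a hypothetical name, the raw `score` is passed in directly instead of being produced by the two convolutions, and the `eps` guard is meant to mirror the default behaviour of `tf.nn.l2_normalize`.

```python
import numpy as np


def attention_pool(features, score, eps=1e-12):
    """features: [B, H, W, C]; score: [B, H, W, 1] raw attention scores."""
    prob = np.logaddexp(0.0, score)  # softplus(score) = log(1 + exp(score))
    sq_norm = (features ** 2).sum(axis=-1, keepdims=True)
    normalized = features / np.sqrt(np.maximum(sq_norm, eps))  # L2 over channels
    # Weight every location by its attention value, then average over H and W.
    feat = (normalized * prob).mean(axis=(1, 2))
    return feat, prob


rng = np.random.default_rng(0)
features = rng.normal(size=(2, 4, 4, 8))
score = rng.normal(size=(2, 4, 4, 1))
feat, prob = attention_pool(features, score)
print(feat.shape, prob.shape)  # (2, 8) (2, 4, 4, 1)
```

Because the single-channel `prob` broadcasts across all channels, each spatial location contributes its whole normalized descriptor scaled by one attention value.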
# The following are methods of the enclosing model class (excerpt; class header not shown).
def global_and_local_forward_pass(self, images, training=True):
  """Run a forward pass to calculate global descriptor and attention prelogits.

  Args:
    images: Tensor containing the dataset on which to run the forward pass.
    training: Indicator of whether the forward pass is running in training mode
      or not.

  Returns:
    Global descriptor prelogits, attention prelogits, attention scores,
      backbone weights.
  """
  backbone_blocks = {}
  desc_prelogits = self.backbone.build_call(
      images, intermediates_dict=backbone_blocks, training=training)

  # Prevent gradients from propagating into the backbone. See DELG paper:
  # https://arxiv.org/abs/2001.05027.
  block3 = backbone_blocks['block3']  # pytype: disable=key-error
  block3 = tf.stop_gradient(block3)

  attn_prelogits, attn_scores, _ = self.attention(block3, training=training)
  return desc_prelogits, attn_prelogits, attn_scores, backbone_blocks

def build_call(self, input_image, training=True):
  (global_feature, _, attn_scores,
   backbone_blocks) = self.global_and_local_forward_pass(input_image,
                                                         training)
  features = backbone_blocks['block3']  # pytype: disable=key-error
  return global_feature, attn_scores, features

def call(self, input_image, training=True):
  _, probs, features = self.build_call(input_image, training=training)
  return probs, features
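For reference, the reconstruction loss that these notes describe as missing from the official code could look roughly like this. It is a sketch of my reading of the paper (a 1x1 convolution that reduces the channel dimension, a 1x1 convolution with ReLU that expands it back, and a mean squared error against the original feature map), not code from the repository; the function name, the weight shapes, and the omission of biases are assumptions.

```python
import numpy as np


def reconstruction_loss(block3, w_enc, w_dec):
    """MSE between a feature map and its autoencoder reconstruction.

    block3: [B, H, W, C] feature map.
    w_enc:  [C, D] weights of a 1x1 conv that reduces C to D channels.
    w_dec:  [D, C] weights of a 1x1 conv that expands D back to C.
    Biases are omitted; a 1x1 conv is a per-location matrix product.
    """
    reduced = block3 @ w_enc                          # low-dimensional descriptors
    reconstructed = np.maximum(reduced @ w_dec, 0.0)  # ReLU on the decoder output
    return np.mean((reconstructed - block3) ** 2)


rng = np.random.default_rng(0)
block3 = np.abs(rng.normal(size=(2, 3, 3, 6)))  # assumed non-negative features
w_enc = rng.normal(size=(6, 4)) * 0.1
w_dec = rng.normal(size=(4, 6)) * 0.1
recon_loss = reconstruction_loss(block3, w_enc, w_dec)
print(recon_loss)
```

In a training loop such a term would be added to the total loss with its own weight, alongside the descriptor and attention losses.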