An Image Relevance Ranking Model: Paper Summary and Implementation

Paper: Learning to Compare: Relation Network for Few-Shot Learning

Paper link: https://arxiv.org/abs/1711.06025

 

Background:

While browsing a QQ group one day, I saw someone ask about similarity-based image ranking: given a query image, how do you produce a similarity ordering over a candidate set of images? I had previously worked on ranking models for text similarity and on ranking candidate texts against a search query; examples can be found in:

A Note on QA Pair Ranking Models

Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

and other such papers.

Image similarity ranking, then, probably differs only in the feature-extraction stage. I soon came across this paper, which largely matched that expectation, and its network structure is fairly simple. So I took the opportunity to get familiar with data processing and feature extraction for computer vision problems, and with the use of PyTorch.

 

Paper overview:

Few-shot learning refers to problems in which each class has only a small number of samples. For this kind of problem, ranking can have an advantage over classification, both because of the sample sizes involved and because of the difference in what is being solved. Classification generally seeks a relatively static decision rule, which usually does not transfer when the set of classes changes. Ranking, from another angle, seeks a distance metric: once the features of different classes can be summarized by such a metric, it transfers to samples of those classes, including classes never seen during training. Under these conditions, few-shot learning fits naturally with a ranking formulation.

Since the network structure in this paper is fairly simple, I will describe it only briefly (covering just the few-shot setting).

First, a word on the dataset: I use the mini-ImageNet dataset, available at:

https://drive.google.com/file/d/0B3Irx3uQNoBMQ1FlNXJsZUdYWEE/view

The prefix of each image file name in this dataset identifies its class. Three CSV files then split the images into a training set (train.csv) and other sets (test.csv, val.csv) whose classes are disjoint from the training classes (it is exactly this disjointness of classes that makes the ranking formulation meaningful). The corresponding CSV files can be found at:

https://github.com/twitter/meta-learning-lstm

The task is to measure class-level similarity among the images within each split: given a query image, decide which class of images it most resembles.

 

K-shot image similarity ranking:

Given an image dataset composed of many classes, where every class contains at least K samples (K small), the paper's approach is as follows: first randomly select C classes from all classes in the dataset, then draw K samples from each selected class. The features of each group of K samples are reduced into a single "slot" against which query samples will be matched for class decisions, and the resulting C slots summarize the support side. The K*C support samples can be written as:

$$\mathcal{S} = \{(x_i, y_i)\}_{i=1}^{m}, \qquad m = K \times C$$

Next, additional samples are drawn from the same C classes, outside the K already chosen. The same feature extraction is applied to these query samples (the only difference from the support side is that features of the same class are not reduced), and each query feature is "inserted" into every slot, i.e. fused (concatenated) with the support-side feature. The fused pair features are then fed into a scoring layer, and the scores are regressed with an MSE loss (label 1 for a matching pair, 0 otherwise).
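
In the paper's notation, with embedding module $f_\varphi$, relation module $g_\phi$, and feature concatenation $\mathcal{C}(\cdot,\cdot)$, the relation score of a query sample $x_j$ against a support sample $x_i$ and the training objective are:

$$r_{i,j} = g_\phi\big(\mathcal{C}(f_\varphi(x_i), f_\varphi(x_j))\big)$$

$$\varphi, \phi \leftarrow \operatorname*{arg\,min}_{\varphi, \phi} \sum_{i=1}^{m} \sum_{j=1}^{n} \big(r_{i,j} - \mathbf{1}(y_i = y_j)\big)^2$$

In the K-shot case, $f_\varphi(x_i)$ is replaced by the element-wise sum over the embeddings of class $i$'s K support samples, i.e. the slot described above.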

The feature extraction on both the support and query sides corresponds to the embedding module in the paper; in the few-shot setting it is just a stack of four convolutional layers. The scoring layer applied after the fuse (the relation module) is two convolutional layers followed by two fully connected layers. A visualization of these layers is shown below:


[Figure: layer visualization of the embedding and relation modules]

The overall model is shown below:

[Figure: overall Relation Network architecture]

Below I briefly cover the data processing and the PyTorch implementation.

 

Data processing:

First, the following script is called to resample (resize) the images. Original link:

https://github.com/cbfinn/maml/blob/master/data/miniImagenet/proc_images.py

Below is the script as modified for Windows 10:

import csv
import glob
import os
import shutil

from PIL import Image
path_to_images = r"C:\tempCodingUsage\python\LearningToCompare_FSL-master_c\LearningToCompare_FSL-master\datas\images"

all_images = glob.glob(path_to_images + r'\*')
# Resize images

for i, image_file in enumerate(all_images):
    im = Image.open(image_file)
    im = im.resize((84, 84), resample=Image.LANCZOS)
    im.save(image_file)
    if i % 500 == 0:
        print(i)

# Put in correct directory
for datatype in ['train', 'val', 'test']:
    os.system('mkdir ' + datatype)

    with open(datatype + '.csv', 'r') as f:
        reader = csv.reader(f, delimiter=',')
        last_label = ''
        for i, row in enumerate(reader):
            if i == 0:  # skip the headers
                continue
            label = row[1]
            image_name = row[0]

            if label != last_label:
                cur_dir = datatype + '/' + label + '/'
                os.system(r"md {}\{}".format(datatype, label))
                last_label = label
            src = "images/" + image_name
            target = cur_dir
            shutil.copy2(src, target)

The script above performs the data split and the interpolation-based resampling of the images.

 

The data loading and network construction below use PyTorch throughout. As a beginner, I give a few notes; they are all basic and can be skipped:

A PyTorch network basically needs to subclass nn.Module, after which the many interfaces defined on the base class, such as .cuda(), provide conveniently predefined operations. (Unlike TensorFlow, this is distinctly Pythonic in style, and the lack of code-completion hints is normal.)

Defining a stack of layers with nn.Sequential is similar to Keras.

To get a "SAME"-style conv2d in PyTorch, you have to pad first with F.pad (F being shorthand for torch.nn.functional); SAME is not supported by default. (A relatively convenient shortcut is the padding argument of nn.Conv2d, as shown in the sketch after this list.)

Weight initialization in PyTorch commonly follows this pattern: define the network (a subclass of nn.Module), then call its apply method with an initialization function, which usually distinguishes parameters by variable name or module type (e.g. nn.Linear) and initializes each parameter's data in place with methods such as zero_ or normal_.

The basic unit of model storage in PyTorch can be viewed as a block of parameters, and models can be saved at the granularity of an nn.Module. This differs from TensorFlow, which generally manages variables by scope and variable name: an nn.Module manages its own internal parameters and operations (each part can be seen as a dynamic subgraph of defined variables and their connections).

For both an optimizer and a learning-rate scheduler, the step method must be called manually to apply the optimization update or the rate decay.

During training, gradients must first be reset by calling zero_grad on the nn.Module.

torch.unsqueeze corresponds to tf.expand_dims.

tensor.view corresponds to tf.reshape.
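
To make these notes concrete, here is a minimal sketch (the TinyNet module and all names in it are illustrative only, not part of the implementation below):

import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        # padding=1 with a 3x3 kernel keeps the spatial size, i.e. a "SAME"-style conv
        self.model = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU()
        )

    def forward(self, x):
        return self.model(x)

def init_weights(m):
    # dispatch on module type and mutate parameter data in place
    if isinstance(m, nn.Conv2d):
        m.weight.data.normal_(0.0, 0.01)
        m.bias.data.zero_()

net = TinyNet()
net.apply(init_weights)        # recursively applies init_weights to every submodule

x = torch.randn(4, 3, 84, 84)
y = net(x)                     # shape [4, 8, 84, 84]
flat = y.view(y.size(0), -1)   # view plays the role of tf.reshape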

 

Data loading:

import random
from collections import defaultdict
import os
from PIL import Image
from torchvision.transforms import transforms
import torch

random.seed(0)

def walk_over_dir(rootDir):
    # map each class directory name to the list of image file paths it contains
    req_dict = defaultdict(list)
    for root, dirs, files in os.walk(rootDir):
        root_key = os.path.basename(root).strip()
        for filespath in files:
            full_file_path = os.path.join(root, filespath)
            req_dict[root_key].append(full_file_path)
    return dict(req_dict)

def global_transform_part():
    # the common transforms operate directly on pixel values
    normalize = transforms.Normalize(mean=[0.5] * 3, std = [0.1] * 3)

    # ToTensor converts PIL Image objects (and numpy.ndarray) to Tensors,
    # so the Tensor conversion already happens in this step
    toTensor = transforms.ToTensor()
    return transforms.Compose([toTensor, normalize])

gp = global_transform_part()
# input img file
def transform_img_to_np_array(img_f, img_transform = gp):
    image = Image.open(img_f)
    image = image.convert("RGB")
    # img_transform already yields a tensor with requires_grad=False;
    # re-wrapping it in torch.tensor is unnecessary, just move it to the GPU
    image = img_transform(image).cuda()
    return image

# this loader samples C classes with K shots each from the full dataset
# r is the number of query samples per class, drawn outside the K shots
def data_loader(C = 3, K = 3, r = 3, type = "train"):
    def generate_category_img_dict(type = "train"):
        assert type in ["train", "val", "test"]
        img_data_path_format = r"C:\tempCodingUsage\python\LearningToCompare_FSL-master_c\LearningToCompare_FSL-master\datas\{}"
        img_data_path = img_data_path_format.format(type)
        key_filelist_dict = walk_over_dir(img_data_path)
        return key_filelist_dict

    key_filelist_dict = generate_category_img_dict(type)
    label2key = dict((label, key) for label, key in enumerate(key_filelist_dict.keys()))

    while True:
        tensor_list = []
        label_list = []

        residual_tensor_list = []
        residual_label_list = []

        C_labels = random.sample(list(label2key.keys()), C)  # random.sample needs a sequence, not a dict view
        for c_label in C_labels:
            c_key = label2key[c_label]
            filelist = key_filelist_dict[c_key]
            total_filelist = random.sample(filelist, len(filelist))
            k_files = total_filelist[:K]
            r_files = total_filelist[K:K + r]

            label = c_label
            for k_file in k_files:
                img_tensor = transform_img_to_np_array(k_file)
                tensor_list.append(img_tensor)
                label_list.append(label)
            for r_file in r_files:
                img_tensor = transform_img_to_np_array(r_file)
                residual_tensor_list.append(img_tensor)
                residual_label_list.append(label)

        yield (tensor_list, label_list, residual_tensor_list, residual_label_list)
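
As a quick sanity check, the generator can be exercised on its own (a minimal sketch, assuming the directory layout produced by the preprocessing script above):

# draw one episode: C classes, with K support and r query samples per class
gen = data_loader(C = 5, K = 5, r = 5, type = "train")
tensor_list, label_list, residual_tensor_list, residual_label_list = next(gen)

print(len(tensor_list))           # C * K support tensors, each shaped [3, 84, 84]
print(len(residual_tensor_list))  # C * r query tensors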
Model construction and training:

import torch
import torch.nn as nn
import torch.nn.functional as F
from data_preprocess.data_loader import data_loader
from collections import defaultdict
from torch import optim
from functools import reduce
from sklearn.utils import shuffle
from visualize import make_dot  # graph-visualization helper, see the link near the end of the post

#torch.set_num_threads(2)
visualize_graph = True

class EmbeddingModule(nn.Module):
    def __init__(self, conv_filters = 64, kernel_size = 3,
                 conv_activation = nn.ReLU, max_pooling_size = 2, init_in_channels = 3):
        super(EmbeddingModule, self).__init__()

        self.conv1 = nn.Conv2d(init_in_channels, conv_filters, kernel_size)
        self.norm1 = nn.BatchNorm2d(conv_filters, affine=True)
        self.relu1 = conv_activation()
        self.max1 = nn.MaxPool2d(max_pooling_size)

        self.conv2 = nn.Conv2d(conv_filters, conv_filters, kernel_size)
        self.norm2 = nn.BatchNorm2d(conv_filters, affine=True)
        self.relu2 = conv_activation()
        self.max2 = nn.MaxPool2d(max_pooling_size)

        self.conv3 = nn.Conv2d(conv_filters, conv_filters, kernel_size)
        self.norm3 = nn.BatchNorm2d(conv_filters,  affine=True)
        self.relu3 = conv_activation()

        self.model = nn.Sequential(
            self.conv1,
            self.norm1,
            self.relu1,
            self.max1,
            self.conv2,
            self.norm2,
            self.relu2,
            self.max2,
            self.conv3,
            self.norm3,
            self.relu3
        )

    def forward(self, input):
        return self.model(input)

# init_in_channels is determined dynamically by the size of the fused features
class RelationModule(nn.Module):
    def __init__(self, init_in_channels, conv_filters = 64, kernel_size = 3,
                 conv_activation = nn.ReLU, max_pooling_size = 2, hidden_size = 8):
        super(RelationModule, self).__init__()
        self.hidden_size = hidden_size

        self.conv1 = nn.Conv2d(init_in_channels, conv_filters, kernel_size)
        self.norm1 = nn.BatchNorm2d(conv_filters, affine=True)
        self.relu1 = conv_activation()
        self.max1 = nn.MaxPool2d(max_pooling_size)

        self.conv2 = nn.Conv2d(conv_filters, conv_filters, kernel_size)
        self.norm2 = nn.BatchNorm2d(conv_filters, affine=True)
        self.relu2 = conv_activation()
        self.max2 = nn.MaxPool2d(max_pooling_size)

        self.model = nn.Sequential(
            self.conv1,
            self.norm1,
            self.relu1,
            self.max1,
            self.conv2,
            self.norm2,
            self.relu2,
            self.max2
        )

        # with 84x84 inputs the conv stack above ends with 64 channels of spatial size 2x2,
        # hence 64 * 2 * 2 = 256 input features
        self.fc1 = nn.Linear(in_features=256, out_features=self.hidden_size)
        self.fc2 = nn.Linear(in_features=self.hidden_size, out_features=1)

    def forward(self, input):
        out = self.model(input)
        batch_size = out.size(0)
        out_reshape = out.view([batch_size, -1])
        out = F.relu(self.fc1(out_reshape))
        out = torch.sigmoid(self.fc2(out))  # F.sigmoid is deprecated in favor of torch.sigmoid
        return out

def reduce_k_fold_features_to_one(tensor_features, label_list):
    feature_dict = defaultdict(list)
    for idx in range(len(label_list)):
        label = label_list[idx]
        feature_dict[label].append(tensor_features[[idx]])
    req_dict = dict()
    for k, feature_list in feature_dict.items():
        req_dict[k] = torch.cat(feature_list, 0).sum(0)

    # dict with label as key and values of shape [channels, height, width]
    return req_dict

# label_feature_dict: key is the label, value is the reduced feature tensor
def generate_dual_part(res_tensor_features, residual_label_list, label_feature_dict):
    # elements are in {0, 1}; 0 indicates the labels are not identical

    nest_label_list = []
    nest_fuse_feature_list = []
    for i in range(len(residual_label_list)):
        residual_label = residual_label_list[i]
        residual_feature = res_tensor_features[[i]]
        dual_label_list = []
        dual_fuse_feature_list = []
        for k, v in label_feature_dict.items():
            label = k
            feature = v.unsqueeze(0)
            dual_label_list.append(0 if residual_label != label else 1)
            # [1, 64 * 2 , h, w]
            dual_fuse_feature_list.append(torch.cat([residual_feature, feature], 1))
        nest_label_list.append(dual_label_list)
        nest_fuse_feature_list.append(dual_fuse_feature_list)
    return (nest_label_list, list(map(lambda x: torch.cat(x, 0),nest_fuse_feature_list)))

def weight_init(var):
    if hasattr(var, "weight") and var.weight is not None:
        var.weight.data.normal_(0.0, 0.01)
    if hasattr(var, "bias") and var.bias is not None:
        var.bias.data.fill_(0.1)

def model_construct():
    global visualize_graph
    train_gen = data_loader(C = 5, K = 5, r = 5 ,type = "train")
    #train_gen = data_loader(C = 2, K = 2, r = 2 ,type = "train")

    embeddingModule_ext = EmbeddingModule().cuda()
    embeddingModule_ext.apply(weight_init)
    relationModule_ext = RelationModule(init_in_channels= 64 * 2).cuda()
    relationModule_ext.apply(weight_init)

    optimizer = optim.Adam(list(embeddingModule_ext.parameters()) + list(relationModule_ext.parameters()), lr = 0.0001)

    train_samples_every_epoch = int(1e10)
    while True:
        for i in range(train_samples_every_epoch):
            tensor_list, label_list, residual_tensor_list, residual_label_list = train_gen.__next__()

            # tensor_list: list of tensor shaped [channels, height, width]
            # tensor_cat [list_len ,channels, height, width]
            tensor_cat = torch.cat(list(map(lambda x: x.unsqueeze(0),tensor_list)), 0)

            tensor_features = embeddingModule_ext(tensor_cat)
            label_feature_dict = reduce_k_fold_features_to_one(tensor_features, label_list)

            res_tensor_cat = torch.cat(list(map(lambda x: x.unsqueeze(0),residual_tensor_list)), 0)
            res_tensor_features = embeddingModule_ext(res_tensor_cat)
            nest_batch_label_list, nest_fused_feature = generate_dual_part(res_tensor_features, residual_label_list, label_feature_dict)
            nest_batch_label_list, nest_fused_feature = shuffle(nest_batch_label_list, nest_fused_feature, random_state = 0)

            fused_feature = torch.cat(nest_fused_feature, 0)
            batch_label_list = reduce(lambda x, y: x + y, nest_batch_label_list)

            batch_score = relationModule_ext(fused_feature).squeeze()
            batch_label = torch.tensor(batch_label_list, requires_grad = False).float().cuda()

            if visualize_graph:
                visualize_graph = False
                print("make_dot :")
                g = make_dot(batch_score)
                g.view()
                print("have made view")

            loss = nn.MSELoss().cuda()  # reduce=True, size_average=True were the (deprecated) defaults
            output = loss(batch_score, batch_label)

            embeddingModule_ext.zero_grad()
            relationModule_ext.zero_grad()

            output.backward()  # each episode builds a fresh graph, so retain_graph is unnecessary
            torch.nn.utils.clip_grad_norm_(embeddingModule_ext.parameters(), 0.5)
            torch.nn.utils.clip_grad_norm_(relationModule_ext.parameters(), 0.5)

            optimizer.step()
            if (i + 1) % 100 == 0:
                # output is a 0-dim tensor, so indexing it with [0] would fail; print it directly
                print("episode :", i + 1, "loss ", output.data)

if __name__ == "__main__":
    model_construct()
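
The code above stops at the training loop. For completeness, here is a minimal sketch of how the two trained modules could be used at test time to classify query images against an episode's support set; this helper is my own addition following the shapes used above, not part of the original implementation:

def predict_labels(embedding_module, relation_module,
                   tensor_list, label_list, residual_tensor_list):
    # switch to eval mode (this matters for BatchNorm) and disable gradient tracking
    embedding_module.eval()
    relation_module.eval()
    with torch.no_grad():
        # embed the support samples and reduce each class's K features to one slot
        support_cat = torch.cat([t.unsqueeze(0) for t in tensor_list], 0)
        label_feature_dict = reduce_k_fold_features_to_one(
            embedding_module(support_cat), label_list)

        # embed the queries and score each against every slot; the highest score wins
        query_cat = torch.cat([t.unsqueeze(0) for t in residual_tensor_list], 0)
        query_features = embedding_module(query_cat)

        predictions = []
        for i in range(query_features.size(0)):
            query_feature = query_features[[i]]              # [1, 64, h, w]
            best_label, best_score = None, -1.0
            for label, slot in label_feature_dict.items():
                fused = torch.cat([query_feature, slot.unsqueeze(0)], 1)
                score = relation_module(fused).item()
                if score > best_score:
                    best_label, best_score = label, score
            predictions.append(best_label)
    return predictions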

The PyTorch version needs somewhat less code. I also used a PyTorch graph-visualization tool here, which helps when checking the network structure; see:

    https://blog.csdn.net/GYGuo95/article/details/78821617

Below is the network graph for the C = 2, K = 2, r = 2 case:

[Figure: network graph for C = 2, K = 2, r = 2]

Here is the loss over the first few training steps:

episode : 200 loss  tensor(0.2306, device='cuda:0')
episode : 300 loss  tensor(0.1886, device='cuda:0')
episode : 400 loss  tensor(0.1636, device='cuda:0')
episode : 500 loss  tensor(0.1606, device='cuda:0')
episode : 600 loss  tensor(0.1581, device='cuda:0')
episode : 700 loss  tensor(0.1495, device='cuda:0')
episode : 800 loss  tensor(0.1604, device='cuda:0')
episode : 900 loss  tensor(0.1559, device='cuda:0')
episode : 1000 loss  tensor(0.1589, device='cuda:0')
episode : 1100 loss  tensor(0.1513, device='cuda:0')
episode : 1200 loss  tensor(0.1646, device='cuda:0')
episode : 1300 loss  tensor(0.1452, device='cuda:0')
episode : 1400 loss  tensor(0.1536, device='cuda:0')
episode : 1500 loss  tensor(0.1560, device='cuda:0')
episode : 1600 loss  tensor(0.1540, device='cuda:0')
episode : 1700 loss  tensor(0.1632, device='cuda:0')
episode : 1800 loss  tensor(0.1379, device='cuda:0')
episode : 1900 loss  tensor(0.1557, device='cuda:0')
episode : 2000 loss  tensor(0.1599, device='cuda:0')
episode : 2100 loss  tensor(0.1445, device='cuda:0')
episode : 2200 loss  tensor(0.1565, device='cuda:0')
episode : 2300 loss  tensor(0.1415, device='cuda:0')
episode : 2400 loss  tensor(0.1502, device='cuda:0')
episode : 2500 loss  tensor(0.1376, device='cuda:0')
episode : 2600 loss  tensor(0.1400, device='cuda:0')
episode : 2700 loss  tensor(0.1528, device='cuda:0')
episode : 2800 loss  tensor(0.1455, device='cuda:0')
episode : 2900 loss  tensor(0.1308, device='cuda:0')
episode : 3000 loss  tensor(0.1503, device='cuda:0')
episode : 3100 loss  tensor(0.1514, device='cuda:0')
episode : 3200 loss  tensor(0.1605, device='cuda:0')
episode : 3300 loss  tensor(0.1576, device='cuda:0')
episode : 3400 loss  tensor(0.1380, device='cuda:0')
episode : 3500 loss  tensor(0.1728, device='cuda:0')
episode : 3600 loss  tensor(0.1437, device='cuda:0')
episode : 3700 loss  tensor(0.1310, device='cuda:0')
episode : 3800 loss  tensor(0.1528, device='cuda:0')
episode : 3900 loss  tensor(0.1316, device='cuda:0')
episode : 4000 loss  tensor(0.1582, device='cuda:0')
episode : 4100 loss  tensor(0.1356, device='cuda:0')
episode : 4200 loss  tensor(0.1532, device='cuda:0')
episode : 4300 loss  tensor(0.1415, device='cuda:0')
episode : 4400 loss  tensor(0.1296, device='cuda:0')
episode : 4500 loss  tensor(0.1521, device='cuda:0')
episode : 4600 loss  tensor(0.1168, device='cuda:0')
episode : 4700 loss  tensor(0.1612, device='cuda:0')
episode : 4800 loss  tensor(0.1445, device='cuda:0')
episode : 4900 loss  tensor(0.1515, device='cuda:0')
episode : 5000 loss  tensor(0.1411, device='cuda:0')
episode : 5100 loss  tensor(0.1235, device='cuda:0')
episode : 5200 loss  tensor(0.1425, device='cuda:0')
episode : 5300 loss  tensor(0.1096, device='cuda:0')
episode : 5400 loss  tensor(0.1258, device='cuda:0')
episode : 5500 loss  tensor(0.1218, device='cuda:0')
episode : 5600 loss  tensor(0.1316, device='cuda:0')
episode : 5700 loss  tensor(0.1213, device='cuda:0')
episode : 5800 loss  tensor(0.1312, device='cuda:0')
episode : 5900 loss  tensor(0.1410, device='cuda:0')
episode : 6000 loss  tensor(0.1251, device='cuda:0')
episode : 6100 loss  tensor(0.1204, device='cuda:0')
episode : 6200 loss  tensor(0.1312, device='cuda:0')
episode : 6300 loss  tensor(0.1312, device='cuda:0')
episode : 6400 loss  tensor(0.1430, device='cuda:0')
episode : 6500 loss  tensor(0.1357, device='cuda:0')
episode : 6600 loss  tensor(0.1497, device='cuda:0')
episode : 6700 loss  tensor(0.1255, device='cuda:0')
episode : 6800 loss  tensor(0.1265, device='cuda:0')
episode : 6900 loss  tensor(0.1214, device='cuda:0')
episode : 7000 loss  tensor(0.1355, device='cuda:0')
episode : 7100 loss  tensor(0.1671, device='cuda:0')
episode : 7200 loss  tensor(0.1252, device='cuda:0')
episode : 7300 loss  tensor(0.1276, device='cuda:0')
episode : 7400 loss  tensor(1.00000e-02 *
       9.9594, device='cuda:0')
episode : 7500 loss  tensor(0.1147, device='cuda:0')


Copyright notice: this is an original post by the author; do not repost without permission. https://blog.csdn.net/sinat_30665603/article/details/80686826