Running the code of "Learning Deep Representations of Fine-Grained Visual Descriptions"

Latest recommended article published 2021-04-18 21:52:37

HackerTom



Column: Machine Learning. Tags: CV, lua, torch

Article link: https://blog.csdn.net/HackerTom/article/details/86435478


Notes

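What I ultimately want is to use the paper's CNN-RNN architecture to extract text features. ~~I don't want to reimplement it,~~ so here I am fighting with Lua.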

Links

Paper: Learning Deep Representations of Fine-Grained Visual Descriptions
Code: reedscot/cvpr2016
CUB dataset: Caltech-UCSD Birds dataset
Flowers dataset: Oxford Flowers102
cuDNN download link (cuDNN v5 for CUDA 7.5)
libmatio.so.2 (64-bit) download page

Reference

Reading notes on the paper and code of "Learning Deep Representations of Fine-Grained Visual Descriptions"
Lua tutorial
Installing Torch7 on CentOS 7
Error: 'libcudnn not found in library path'
soumith/matio-ffi.torch

Environment Preparation

lua
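The system is CentOS 7, which already came with Lua 5.1.4 (or had someone installed it earlier? Not sure).
- Lua indices and tensor dimensions start from 1
- # is the length operator, not a comment
- Lua comments are -- YOUR_COMMENT (single line) and --[[ YOUR_COMMENT ]] (block)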
torch
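- If Torch was installed before and you want to reinstall it, first clean out the old version:
  curl -s https://raw.github.com/torch/ezinstall/master/clean-old.sh | bash
- If Torch was never installed, or once the old version has been cleaned out, install it with:
  curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-all | bash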
matio
- Download libmatio.so.2 (link above) and install it with rpm -i mat.... Afterwards /usr/lib64 contains libmatio.so.2 and libmatio.so.2.1.0
- As when configuring libcudnn, create a symlink to libmatio.so.2 in a directory that is on LD_LIBRARY_PATH
- Then run: luarocks install matio
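The Lua notes above (1-based indexing, `#` as the length operator, and the comment syntax) can be checked with a few lines of plain Lua 5.1. No Torch is needed. This is only an illustration, not code from the repository. Torch tensors follow the same 1-based convention, so `x[{1, {}}]` is the first row.

```lua
-- Plain Lua 5.1, no Torch required.
local t = {10, 20, 30}   -- a table used as an array

local first = t[1]       -- indexing starts at 1, so this is 10
local zeroth = t[0]      -- there is no element 0, so this is nil
local len = #t           -- # is the length operator: 3
local slen = #"lua"      -- it also works on strings (byte length): 3

-- this is a single-line comment
--[[ this is a block comment,
     it can span several lines ]]

print(first, zeroth, len, slen)  -- prints 10, nil, 3, 3 separated by tabs
```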

Run

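I ran it on the CUB dataset; unpack the data first.
The scripts directory of the code contains 6 run scripts: the shell commands for training, classification and retrieval on each of the two datasets.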

train

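Copy train_cub_hybrid.sh to the root of the code directory and change the value of -data_dir so that it points to the CUB/ directory I extracted. Then run it. It fails because nngraph is missing.

Install nngraph: go to /usr/local/bin and run sudo ./luarocks install nngraph.

When it finishes, there is a new cv/ folder in the program directory. The file in it named lm_sje_cub_c10_hybrid_0.00070_1_10_trainvalids.txt.t7 is the trained model checkpoint.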

retrieval

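Copy eval_cub_ret.sh to the root of the code directory, change -data_dir the same way, and run it.
It fails because libcudnn is missing, so a libcudnn.so.5 has to be added:

From the NVIDIA website, download "cuDNN v5 Library for Linux" under "cuDNN v5 (May 12, 2016), for CUDA 7.5" (download link above). Extract it, rename the folder to cuda7.5, and copy it to /usr/local.
In /etc/ld.so.conf.d/, create a new file cuda7.5.conf. The name reportedly doesn't matter, but it should end in .conf, because on CentOS 7 /etc/ld.so.conf includes ld.so.conf.d/*.conf. Its content is: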
```
# cuda7.5.conf
/usr/local/cuda7.5/lib64
```
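In other words, add the path of the lib64 folder inside the extracted directory. Save the file, then run sudo ldconfig.

A new results/ folder appears in the program directory with the output: mAP@50: 0.4540, close to the 45.6% reported in the paper.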

classify

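Copy eval_cub_cls.sh to the root of the code directory, change -data_dir the same way, and run it.
A classification result file appears in results/: Average top-1 val/test accuracy: 0.5387, close to the 54.0% reported in the paper.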

Extract the Trained Model

In cv/, write a Lua script that loads the model. For convenience, rename the model checkpoint above to saved_model.t7.

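```lua
-- play.lua
require 'nn'
require 'cudnn'
require 'cunn'
require 'nngraph'
require 'torch'

--[[ write-file helper: print `what` into the file `name` ]]
function wf(what, name)
    local f = io.open(name, 'w')
    io.output(f) -- redirect the default output to the file
    print(what)
    io.close(f)
    io.output(io.stdout) -- point the default output back at stdout
end

--[[ load the model ]]
local model = torch.load('saved_model.t7')

--[[ dump the whole model checkpoint ]]
wf(model, 'model.txt')

--[[ networks in the model: write the keys of protos ]]
local protos = model["protos"]
local protos_f = io.open('protos.txt', 'w')
io.output(protos_f)
for k, v in pairs(protos) do
    print(k)
end
io.close(protos_f)

--[[ parameters of the enc_image network
     (parameters() also returns the gradients as a second value; only the first is passed to wf) ]]
wf(protos.enc_image:parameters(), 'img_param.txt')

--[[ parameters of the enc_doc network ]]
wf(protos.enc_doc:parameters(), 'txt_param.txt')
--[[ first parameter tensor of enc_doc (usually the weights of its first layer) ]]
wf(protos.enc_doc:parameters()[1], 'txt_param_1.txt')

--[[ redirect to stdout ]]
io.output(io.stdout)
print('hello world!')
```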

Running this script writes the corresponding files under cv/. So far they don't seem very useful.
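In plain Lua 5.1, `print` always writes to standard output and ignores `io.output`. The redirect-then-print trick in `wf` therefore depends on how `th` handles `print`. Below is a sketch that writes through the file handle directly and avoids that dependency. `write_lines` and `sorted_keys` are hypothetical helper names. The table in the usage line is a stand-in for `model.protos`.

```lua
-- Hypothetical helper: writes each value on its own line via the file handle,
-- so the result does not depend on where print sends its output.
local function write_lines(name, values)
    local f = assert(io.open(name, 'w'))
    for _, v in ipairs(values) do
        f:write(tostring(v), '\n')
    end
    f:close()
end

-- Hypothetical helper: the keys of a table (e.g. model.protos), sorted as strings.
local function sorted_keys(tbl)
    local keys = {}
    for k in pairs(tbl) do
        keys[#keys + 1] = tostring(k)
    end
    table.sort(keys)
    return keys
end

-- Usage with a stand-in table; in play.lua this would be model.protos.
local fake_protos = { enc_image = {}, enc_doc = {} }
write_lines('protos.txt', sorted_keys(fake_protos))  -- writes enc_doc, then enc_image
```

Sorting the keys makes the output stable between runs, since `pairs` does not guarantee any order.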

Extract Feature

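The file retrieval_sje_tcnn.lua contains several functions, such as extract_img and extract_txt, that extract image and text features.
Here I reuse some of that file's code to extract the text and image features of the CUB dataset and save them in .mat format.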
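Before the text encoder runs, extract_txt_char turns each stored character-index sequence into a one-hot matrix of width `#alphabet`, which is 70. It stops at the first index 0, which marks padding. Below is a plain-Lua sketch of that step using nested tables instead of a Torch tensor. The `encode` helper is hypothetical: it assumes the stored indices are 1-based positions in the alphabet string, which I have not checked against the dataset's own preprocessing. It only accepts lowercase text.

```lua
-- Sketch in plain Lua; get.lua builds a Torch tensor instead.
-- Assumption: stored indices are 1-based positions in this alphabet.
local alphabet = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{} "

-- Hypothetical helper: string -> index sequence of length max_len, padded with 0.
local function encode(s, max_len)
    local idx = {}
    for j = 1, max_len do
        local c = s:sub(j, j)
        if c == '' then
            idx[j] = 0  -- past the end of the string: padding
        else
            -- plain find (4th arg true), so punctuation is not treated as a pattern
            idx[j] = assert(alphabet:find(c, 1, true), 'not in alphabet: ' .. c)
        end
    end
    return idx
end

-- One-hot matrix of size #idx x dim, stopping at the first 0,
-- like the `if on_ix == 0 then break end` loop in extract_txt_char.
local function one_hot(idx, dim)
    local mat = {}
    for j = 1, #idx do
        mat[j] = {}
        for k = 1, dim do mat[j][k] = 0 end
    end
    for j = 1, #idx do
        if idx[j] == 0 then break end
        mat[j][idx[j]] = 1
    end
    return mat
end

local idx = encode("ab c", 6)        -- {1, 2, 70, 3, 0, 0}: the space is the last character
local mat = one_hot(idx, #alphabet)  -- 6 rows x 70 columns
```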

Aside: CUB has a 2010 and a 2011 version. This code uses the 2011 version, with 11,788 images in total; the 2010 version has only 6,000+.

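In the dataset provided by the author, every image has 10 sentence descriptions, as both the paper and the code comments mention. The text feature is averaged per class. In other words, the extraction yields only one feature vector per class, of shape (1,1024), matching the dimension of the image feature.
I don't want the per-class average. I only want each image's 10 sentences averaged, so I slightly modified the extract_txt_char function (extract_txt_word is unchanged because I don't use it for now).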
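The per-image averaging comes down to index bookkeeping. After `permute(1,3,2)` and `reshape`, row `(i-1)*10 + j` of the batch is sentence `j` of image `i`. Reshaping the encoder output to `(num_images, 10, 1024)` and taking the mean over dimension 2 therefore gives one vector per image. Below is the same logic on plain Lua tables, as a sketch. `average_per_image` is a hypothetical name, and `n_sent` plays the role of the 10 captions.

```lua
-- rows: flattened list of embeddings in image-major order, i.e.
-- rows[(i - 1) * n_sent + j] is sentence j of image i.
local function average_per_image(rows, n_sent)
    assert(#rows % n_sent == 0, 'row count must be a multiple of n_sent')
    local n_img = math.floor(#rows / n_sent)
    local dim = #rows[1]
    local out = {}
    for i = 1, n_img do
        local mean = {}
        for d = 1, dim do mean[d] = 0 end
        for j = 1, n_sent do
            local row = rows[(i - 1) * n_sent + j]
            for d = 1, dim do
                mean[d] = mean[d] + row[d] / n_sent
            end
        end
        out[i] = mean
    end
    return out
end

-- Two images, each with two 2-dimensional sentence embeddings.
local rows = { {1, 2}, {3, 4}, {10, 20}, {30, 40} }
local means = average_per_image(rows, 2)  -- { {2, 3}, {20, 30} }
```

This mapping only holds while the rows keep their original order. If the rows are shuffled or truncated first, as the txt_limit option does, the grouping no longer corresponds to images.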

get.lua

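Create a file get.lua in the code directory, as below.
Before running th get.lua, create the folders the .mat files are saved into. For the code below that means mkdir cub_text_c10 and mkdir cub_image; otherwise saving fails.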

In the paths below, CNN_RNN is the root directory of the code.
Reminder: Lua indices and tensor dimensions start from 1, and # is the length operator, not a comment.

```lua
-- get.lua
require('nn')
require('nngraph')
require('cutorch')
require('cunn')
require('cudnn')
require('lfs')
local matio = require('matio')
local model_util = require('util.model_utils')
--------------------------------------------------------------------------------
cmd = torch.CmdLine()

cmd:option('-data_dir','../CUB/','data directory.')
cmd:option('-savefile','sje_tcnn','filename to autosave the checkpoint to. Will be inside checkpoint_dir/')
cmd:option('-checkpoint_dir', 'cv', 'output directory where checkpoints get written')
cmd:option('-symmetric',1,'symmetric sje')
cmd:option('-learning_rate',0.0001,'learning rate')
cmd:option('-testclasses', 'testclasses.txt', 'validation or test classes to be used in evaluation')
cmd:option('-ids_file', 'trainvalids.txt', 'file specifying which class labels were used for training.')
cmd:option('-model','cv/saved_model.2019.1.12.t7','model to load. If blank then above options will be used.')
cmd:option('-txt_limit',0,'if 0 then use all available text. Otherwise limit the number of documents per class')
cmd:option('-num_caption',10,'number of captions per image to be used for training')
cmd:option('-outfile', 'results/roc.csv', 'output csv file with ROC curves.')
cmd:option('-ttype','char','word|char')

opt = cmd:parse(arg)
--------------------------------------------------------------------------------
MODEL = opt.model -- default: 'cv/saved_model.2019.1.12.t7'
DATASET_SIZE = 200
DATA_PATH = opt.data_dir -- default: '../CUB/'
IMG_PATH = DATA_PATH .. 'images/'
TXT_PATH = DATA_PATH .. 'text_c10/' -- it only worked after switching to this directory, not sure why
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{} " -- the last character is a space!
TXT_EMB_DIM = 1024 -- value given in the paper's Experimental results section
--------------------------------------------------------------------------------
local model = torch.load(MODEL)
local protos = model.protos
protos.enc_doc:evaluate()
protos.enc_image:evaluate()
-- print(model)
--------------------------------------------------------------------------------
--[[ extract image features ]]
function extract_img(filename)
    -- keep index 1 of the last dimension (presumably the first of several image crops)
    local fea = torch.load(filename)[{{},{},1}]
    fea = fea:float():cuda()
    local out = protos.enc_image:forward(fea):clone()
    return out:cuda()
end
--------------------------------------------------------------------------------
--[[ extract text features -> dispatcher ]]
function extract_txt(filename)
    if opt.ttype == 'word' then
        return extract_txt_word(filename)
    else -- 'char'
        return extract_txt_char(filename)
    end
end
--------------------------------------------------------------------------------
--[[ actual text feature extraction -> word level (unused here) ]]
-- Note: vocab_size is not defined anywhere in this script, so this function
-- would fail if called; it is kept only for reference.
function extract_txt_word(filename)
    -- average all text features together.
    local txt = torch.load(filename):permute(1,3,2)
    txt = txt:reshape(txt:size(1)*txt:size(2),txt:size(3)):float():cuda()
    if opt.txt_limit > 0 then
        local actual_limit = math.min(txt:size(1), opt.txt_limit)
        local txt_order = torch.randperm(txt:size(1)):sub(1,actual_limit)
        local tmp = txt:clone()
        for i = 1,actual_limit do
            txt[{i,{}}]:copy(tmp[{txt_order[i],{}}])
        end
        txt = txt:narrow(1,1,actual_limit)
    end

    if (model.opt.num_repl ~= nil) then
        local tmp = txt:clone()
        txt = torch.ones(txt:size(1),model.opt.num_repl*txt:size(2))
        for i = 1,txt:size(1) do
            local cur_sen = torch.squeeze(tmp[{i,{}}]):clone()
            local cur_len = cur_sen:size(1) - cur_sen:eq(1):sum()
            local txt_ix = 1
            for j = 1,cur_len do
                for k = 1,model.opt.num_repl do
                    txt[{i,txt_ix}] = cur_sen[j]
                    txt_ix = txt_ix + 1
                end
            end
        end
    end

    local txt_mat = torch.zeros(txt:size(1), txt:size(2), vocab_size+1)
    for i = 1,txt:size(1) do
        for j = 1,txt:size(2) do
            local on_ix = txt[{i, j}]
            if on_ix == 0 then
                break
            end
            txt_mat[{i, j, on_ix}] = 1
        end
    end
    txt_mat = txt_mat:float():cuda()
    local out = protos.enc_doc:forward(txt_mat):clone()
    -- out = torch.mean(out,1) --> the original averaged over the class here
    return out
end
--------------------------------------------------------------------------------
--[[ actual text feature extraction -> character level ]]
function extract_txt_char(filename)
    -- average the text features of each image.
    --[[ right after loading, txt has shape (num_images, 201, 10),
         where 10 is the number of sentences per image ]]
    local txt = torch.load(filename):permute(1,3,2) -- now (num_images, num_sentences (10), 201)
    local one = txt:size(1) -- number of images
    local two = txt:size(2) -- number of sentences per image
    -- this reshape stacks the descriptions of all images into one batch:
    -- row (i-1)*two + j is sentence j of image i
    txt = txt:reshape(txt:size(1)*txt:size(2),txt:size(3)):float():cuda()

    if opt.txt_limit > 0 then
        local actual_limit = math.min(txt:size(1), opt.txt_limit)
        local txt_order = torch.randperm(txt:size(1)):sub(1,actual_limit)
        local tmp = txt:clone()
        for i = 1,actual_limit do
            txt[{i,{}}]:copy(tmp[{txt_order[i],{}}])
        end
        txt = txt:narrow(1,1,actual_limit)
    end
    -- one-hot encode: 0 marks padding, so stop at the first 0
    local txt_mat = torch.zeros(txt:size(1), txt:size(2), #alphabet)
    for i = 1,txt:size(1) do
        for j = 1,txt:size(2) do
            local on_ix = txt[{i, j}]
            if on_ix == 0 then
                break
            end
            txt_mat[{i, j, on_ix}] = 1
        end
    end
    txt_mat = txt_mat:float():cuda()
    local out = protos.enc_doc:forward(txt_mat):clone()
    -- skipping the mean below should give one encoding per description
    -- return torch.mean(out,1) --> the original averaged over the whole class here
    -- Does this reshape restore the original order? With txt_limit == 0 it should:
    -- both reshapes are row-major, so the rows come back image by image.
    -- With txt_limit > 0 the rows are shuffled and truncated, and this breaks.
    out = out:reshape(one, two, TXT_EMB_DIM)
    out = torch.mean(out, 2) -- average the 10 sentences of each image -> (num_images, 1, TXT_EMB_DIM)
    return out
end
--------------------------------------------------------------------------------
--[[ names of all .t7 files in path (directories excluded) ]]
function all_files(path)
    local j = 1
    local file_name = {}
    for file in lfs.dir(path) do
        -- '%.t7$' means a literal ".t7" at the end of the name
        if file ~= '.' and file ~= '..' and string.find(file, '%.t7$') ~= nil then
            local f = path .. file
            local attr = lfs.attributes(f)
            if attr.mode ~= 'directory' then
                file_name[j] = file
                j = j + 1
            end
        end
    end
    return file_name
end
--------------------------------------------------------------------------------
--[[ extract text features (character level) and save them ]]
local txt_files = all_files(TXT_PATH)
-- print(txt_files)

for i = 1, #txt_files do
    print(i)
    local f = TXT_PATH .. txt_files[i]
    local vec = extract_txt_char(f):float() -- character level

    -- run `mkdir cub_text_c10` first, otherwise saving fails
    matio.save(string.format('/home/tom/jupyter_code/CNN_RNN/cub_text_c10/%s.mat', txt_files[i]), vec)
end
--------------------------------------------------------------------------------
--[[ extract image features and save them ]]
local img_files = all_files(IMG_PATH)
-- print(img_files)

for i = 1, #img_files do
    print(i)
    local f = IMG_PATH .. img_files[i]
    local vec = extract_img(f):float()
    if i == 1 then
        -- sanity check on the first file only
        print('--- vec size ---')
        print(vec:size())
        print('--- vec ---')
        print(vec[{3, {}}])
    end
    -- run `mkdir cub_image` first, otherwise saving fails
    matio.save(string.format('/home/tom/jupyter_code/CNN_RNN/cub_image/%s.mat', img_files[i]), vec)
end
```
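A note on the `.t7` check in `all_files`: `string.find` treats its second argument as a Lua pattern, where `.` matches any character. As a result, `string.find(file, '.t7')` also accepts names such as `my_t7_notes.txt`. An escaped, anchored pattern (`'%.t7$'`) or a direct suffix comparison avoids this. `has_suffix` below is a hypothetical helper, shown in plain Lua.

```lua
-- Hypothetical helper: true if `name` ends with the literal string `suffix`.
local function has_suffix(name, suffix)
    return name:sub(-#suffix) == suffix
end

-- '.' is a wildcard in patterns, so this matches "_t7" inside the name.
local loose = string.find('my_t7_notes.txt', '.t7') ~= nil           -- true
-- Escaped and anchored: only a real ".t7" at the end matches.
local strict = string.find('my_t7_notes.txt', '%.t7$') ~= nil        -- false
local ok = has_suffix('class_001.t7', '.t7')                          -- true
```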