Running Neural Baby Talk on Ubuntu
GitHub repository: https://github.com/jiasenlu/NeuralBabyTalk
1. First change into the /NeuralBabyTalk/pooling/roi_align directory and run the following command:
sh make.sh
This fails with error 1:
Traceback (most recent call last):
File "build.py", line 4, in <module>
from torch.utils.ffi import create_extension
File "/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/__init__.py", line 14, in <module>
raise ImportError("torch.utils.ffi requires the cffi package")
ImportError: torch.utils.ffi requires the cffi package
The cause is that the cffi package is missing, so install it:
pip install cffi
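Before rerunning make.sh, it can help to confirm that the package is actually visible to the interpreter that runs build.py (python2.7 in the traceback above). The following is a minimal sketch; the helper name missing_packages is made up for this note and is not part of NeuralBabyTalk.

```python
import importlib


def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # An empty list means cffi can be imported by this interpreter.
    print(missing_packages(["cffi"]))
```

The same check can be reused for nltk and stanfordcorenlp, which turn out to be needed in step 3.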
2. Run it again:
sh make.sh
This produces error 2:
error: /home/×××/NeuralBabyTalk/pooling/roi_align/src/roi_align_kernel.cu.o: No such file or directory
Traceback (most recent call last):
File "build.py", line 36, in <module>
ffi.build()
File "/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/__init__.py", line 189, in build
_build_extension(ffi, cffi_wrapper_name, target_dir, verbose)
File "/usr/local/lib/python2.7/dist-packages/torch/utils/ffi/__init__.py", line 111, in _build_extension
outfile = ffi.compile(tmpdir=tmpdir, verbose=verbose, target=libname)
File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 723, in compile
compiler_verbose=verbose, debug=debug, **kwds)
File "/usr/local/lib/python2.7/dist-packages/cffi/recompiler.py", line 1526, in recompile
compiler_verbose, debug)
File "/usr/local/lib/python2.7/dist-packages/cffi/ffiplatform.py", line 22, in compile
outputfilename = _build(tmpdir, ext, compiler_verbose, debug)
File "/usr/local/lib/python2.7/dist-packages/cffi/ffiplatform.py", line 58, in _build
raise VerificationError('%s: %s' % (e.__class__.__name__, e))
cffi.VerificationError: LinkError: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Solution:
Edit make.sh and add the following lines at the top:
export CUDA_PATH=/usr/local/cuda/
export CXXFLAGS="-std=c++11"
export CFLAGS="-std=c99"
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export CPATH=/usr/local/cuda-8.0/include${CPATH:+:${CPATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
This resolves the problem.
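The PATH, CPATH and LD_LIBRARY_PATH exports rely on the shell expansion ${VAR:+:${VAR}}, which appends ":" plus the old value only when the variable is already set and non-empty. That avoids a dangling colon when the variable was empty. The sketch below mimics that rule in Python purely to illustrate it; prepend_path is a hypothetical helper, not something make.sh or the repository provides.

```python
def prepend_path(new_dir, current):
    """Mimic the shell idiom  new_dir${VAR:+:${VAR}}.

    `current` is the existing value of the variable (None or "" if unset/empty).
    """
    if current:
        # Variable already has a value: put the new directory in front of it.
        return new_dir + ":" + current
    # Variable unset or empty: no separator is added.
    return new_dir


if __name__ == "__main__":
    print(prepend_path("/usr/local/cuda-8.0/bin", "/usr/bin:/bin"))
    print(prepend_path("/usr/local/cuda-8.0/include", ""))
```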
3. Next I wanted to evaluate on robust-coco. Following the command given on GitHub, run this in the /NeuralBabyTalk directory:
python main.py --path_opt cfgs/robust_coco.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30 --inference_only True --beam_size 3 --start_from save/robust_coco_nbt_1024
The download of the GloVe vectors then stalled at this point:
.vector_cache/glove.6B.zip: 84%|████████▎ | 721M/862M [13:19:57<2:36:56, 15.0kB/s]
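I suspected that some packages were missing. Checking the Dockerfile confirmed it: I had not installed nltk and stanfordcorenlp. After installing them,
error 3 appears: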
IOError: [Errno 2] No such file or directory: 'data/robust_coco/dic_coco.json'
Solution: download the required files as the Dockerfile does, unpack them, and put them in the locations it specifies:
# ----------------------------------------------------------------------------
# -- download pretrained imagenet weights for resnet-101
# ----------------------------------------------------------------------------
RUN mkdir /workspace/neuralbabytalk/data/imagenet_weights && \
cd /workspace/neuralbabytalk/data/imagenet_weights && \
wget --quiet https://www.dropbox.com/sh/67fc8n6ddo3qp47/AAACkO4QntI0RPvYic5voWHFa/resnet101.pth
# ----------------------------------------------------------------------------
# -- download Karpathy's preprocessed captions datasets and corenlp jar
# ----------------------------------------------------------------------------
RUN cd /workspace/neuralbabytalk/data && \
wget --quiet http://cs.stanford.edu/people/karpathy/deepimagesent/caption_datasets.zip && \
unzip caption_datasets.zip && \
mv dataset_coco.json coco/ && \
mv dataset_flickr30k.json flickr30k/ && \
rm caption_datasets.zip dataset_flickr8k.json
RUN cd /workspace/neuralbabytalk/prepro && \
wget --quiet https://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip && \
unzip stanford-corenlp-full-2017-06-09.zip && \
rm stanford-corenlp-full-2017-06-09.zip
RUN cd /workspace/neuralbabytalk/tools/coco-caption && \
sh get_stanford_models.sh
# ----------------------------------------------------------------------------
# -- download preprocessed COCO detection output HDF file and pretrained model
# ----------------------------------------------------------------------------
RUN cd /workspace/neuralbabytalk/data/coco && \
wget --quiet https://www.dropbox.com/s/2gzo4ops5gbjx5h/coco_detection.h5.tar.gz && \
tar -xzvf coco_detection.h5.tar.gz && \
rm coco_detection.h5.tar.gz
RUN mkdir -p /workspace/neuralbabytalk/save && \
cd /workspace/neuralbabytalk/save && \
wget --quiet https://www.dropbox.com/s/6buajkxm9oed1jp/coco_nbt_1024.tar.gz && \
tar -xzvf coco_nbt_1024.tar.gz && \
rm coco_nbt_1024.tar.gz
Then run:
python prepro/prepro_dic_coco.py --input_json data/coco/dataset_coco.json --split robust --output_dic_json data/robust_coco/dic_coco.json --output_cap_json data/robust_coco/cap_coco.json
The output is:
from ._conv import register_converters as _register_converters
parsed input parameters:
{
"output_dic_json": "data/robust_coco/dic_coco.json",
"input_json": "data/coco/dataset_coco.json",
"word_count_threshold": 5,
"max_length": 16,
"output_cap_json": "data/robust_coco/cap_coco.json",
"split": "robust"
}
top words and their counts:
(1019785, u'a')
(224758, u'on')
(212689, u'of')
(206178, u'the')
(191793, u'in')
(161216, u'with')
(146755, u'and')
(102390, u'is')
(75957, u'man')
(71183, u'to')
(55190, u'sitting')
(51987, u'an')
(50467, u'two')
(44506, u'at')
(44297, u'standing')
(43707, u'people')
(42776, u'are')
(38867, u'next')
(37898, u'white')
(35372, u'woman')
('total words:', 6454115)
number of bad words: 18443/27929 = 66.04%
number of words in vocab would be 9486
number of UNKs: 32382/6454115 = 0.50%
('max length sentence in raw data: ', 49)
sentence length distribution (count, number of words):
0: 0 0.000000%
1: 0 0.000000%
2: 0 0.000000%
3: 0 0.000000%
4: 0 0.000000%
5: 1 0.000162%
6: 14 0.002270%
7: 4851 0.786521%
8: 101387 16.438461%
9: 134531 21.812289%
10: 132558 21.492395%
11: 95206 15.436299%
12: 60590 9.823807%
13: 35233 5.712530%
14: 20016 3.245310%
15: 11476 1.860670%
16: 6922 1.122304%
17: 4313 0.699292%
18: 2755 0.446684%
19: 1913 0.310166%
20: 1312 0.212722%
21: 923 0.149651%
22: 665 0.107820%
23: 503 0.081554%
24: 328 0.053181%
25: 258 0.041831%
26: 194 0.031454%
27: 156 0.025293%
28: 97 0.015727%
29: 74 0.011998%
30: 52 0.008431%
31: 65 0.010539%
32: 41 0.006648%
33: 48 0.007783%
34: 43 0.006972%
35: 35 0.005675%
36: 21 0.003405%
37: 24 0.003891%
38: 20 0.003243%
39: 21 0.003405%
40: 19 0.003081%
41: 21 0.003405%
42: 11 0.001783%
43: 19 0.003081%
44: 18 0.002918%
45: 13 0.002108%
46: 6 0.000973%
47: 7 0.001135%
48: 3 0.000486%
49: 4 0.000649%
inserting the special UNK token
('wrote ', 'data/robust_coco/dic_coco.json')
('wrote ', 'data/robust_coco/cap_coco.json')
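The summary lines in this output are related by simple arithmetic, which is easy to double-check. The sketch below assumes that "bad words" are the distinct words dropped from the vocabulary and that the UNK percentage is taken over all tokens; that reading is my interpretation of the log, and vocab_stats is a hypothetical helper, not a function from prepro_dic_coco.py.

```python
def vocab_stats(bad_types, total_types, unk_tokens, total_tokens):
    """Recompute the vocabulary summary from the raw counts in the log."""
    return {
        # Distinct words kept, before the special UNK token is inserted.
        "vocab_size": total_types - bad_types,
        # Share of distinct words that are dropped.
        "bad_pct": 100.0 * bad_types / total_types,
        # Share of all tokens that will be replaced by UNK.
        "unk_pct": 100.0 * unk_tokens / total_tokens,
    }


if __name__ == "__main__":
    stats = vocab_stats(18443, 27929, 32382, 6454115)
    print(stats["vocab_size"])
    print("%.2f%%" % stats["bad_pct"])
    print("%.2f%%" % stats["unk_pct"])
```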
4. Run:
python main.py --path_opt cfgs/robust_coco.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30 --inference_only True --beam_size 3 --start_from save/robust_coco_nbt_1024
This produces error 4:
Traceback (most recent call last):
File "main.py", line 213, in <module>
dataset = DataLoader(opt, split='train')
File "/home/×××/NeuralBabyTalk/misc/dataloader_coco.py", line 112, in __init__
self.dataloader_hdf = HDFSingleDataset(self.opt.proposal_h5)
File "/home/×××/NeuralBabyTalk/misc/dataloader_hdf.py", line 59, in __init__
super().__init__(
TypeError: super() takes at least 1 argument (0 given)
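Calling super() with no arguments is Python 3 syntax, but the code is running under Python 2 here.
Change the call in misc/dataloader_hdf.py to:
super(HDFSingleDataset, self).__init__(
    os.path.dirname(hdf_path),
    shard_names=[os.path.basename(hdf_path)],
    primary_key=primary_key,
    stride=stride
)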
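The explicit two-argument form works under both Python 2 and Python 3, so the change is safe either way. Here is a small self-contained sketch of the pattern; ShardedDataset and SingleFileDataset are toy stand-ins invented for this note, not the real classes in misc/dataloader_hdf.py.

```python
import os


class ShardedDataset(object):
    """Toy base class: a directory plus a list of shard file names."""

    def __init__(self, directory, shard_names=None):
        self.directory = directory
        self.shard_names = shard_names or []


class SingleFileDataset(ShardedDataset):
    """Toy subclass that treats one file as a single shard."""

    def __init__(self, path):
        # Name the class and instance explicitly; Python 2 has no zero-argument super().
        super(SingleFileDataset, self).__init__(
            os.path.dirname(path),
            shard_names=[os.path.basename(path)],
        )


if __name__ == "__main__":
    ds = SingleFileDataset("data/coco/coco_detection.h5")
    print(ds.directory, ds.shard_names)
```

Note that under Python 2 the base class has to be a new-style class (it inherits from object), otherwise super() raises a TypeError of its own.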
5. Run the same command again:
python main.py --path_opt cfgs/robust_coco.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30 --inference_only True --beam_size 3 --start_from save/robust_coco_nbt_1024
This produces error 5:
Traceback (most recent call last):
File "main.py", line 267, in <module>
model = AttModel.TopDownModel(opt)
File "/home/×××/NeuralBabyTalk/misc/AttModel.py", line 214, in __init__
self.ccr_core = CascadeCore(opt)
File "/home/×××/NeuralBabyTalk/misc/AttModel.py", line 246, in __init__
self.fg_mask = Parameter(opt.fg_mask)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/parameter.py", line 24, in __new__
return torch.Tensor._make_subclass(cls, data, requires_grad)
RuntimeError: Only Tensors of floating point dtype can require gradients
Solution: install torch 0.4.0.
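Since the fix depends on one specific release, it is worth checking which version the interpreter actually picks up. The sketch below compares a version string against 0.4.0; version_tuple and is_expected_torch are hypothetical helpers written for this note. I have not verified why 0.4.0 in particular avoids the error, only that it did in my setup.

```python
import re


def version_tuple(version):
    """Turn '0.4.0' or '0.4.1.post2' into a tuple of leading integers."""
    parts = []
    # Drop any local build suffix such as '+cu90' before splitting.
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_expected_torch(version, expected=(0, 4, 0)):
    """True when the first three version components equal `expected`."""
    return version_tuple(version)[:3] == expected


if __name__ == "__main__":
    try:
        import torch
        print(torch.__version__, is_expected_torch(torch.__version__))
    except ImportError:
        print("torch is not installed for this interpreter")
```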
Put the COCO_train2014 images in the data/coco/images/train2014 folder,
for example data/coco/images/train2014/COCO_train2014_000000398494.jpg,
and put the COCO_val2014 images in the data/coco/images/val2014 folder,
for example data/coco/images/val2014/COCO_val2014_000000223648.jpg.
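Most of the errors in this walkthrough came from files that were not where main.py expected them, so a pre-flight check saves a few failed runs. The sketch below lists paths taken from the configuration and log lines that main.py prints for cfgs/robust_coco.yml; the list is my own summary and may be incomplete, and missing_paths is a hypothetical helper rather than part of the repository.

```python
import os

# Paths relative to the NeuralBabyTalk directory, as they appear in main.py's output.
REQUIRED_PATHS = [
    "data/robust_coco/dic_coco.json",
    "data/robust_coco/cap_coco.json",
    "data/coco/coco_detection.h5",
    "data/imagenet_weights/resnet101.pth",
    "save/robust_coco_nbt_1024/model-best.pth",
    "data/coco/images/train2014",
    "data/coco/images/val2014",
]


def missing_paths(root, relative_paths=REQUIRED_PATHS):
    """Return the entries of `relative_paths` that do not exist under `root`."""
    return [p for p in relative_paths
            if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    # Run from the NeuralBabyTalk directory; an empty list means nothing is missing.
    for path in missing_paths(os.getcwd()):
        print("missing: " + path)
```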
Namespace(att_feat_size=2048, att_hid_size=512, att_model='topdown', batch_size=20, beam_size=3, cached_tokens='coco-all-idxs', cbs=False, cbs_mode='all', cbs_tag_size=3, checkpoint_path='save/robust_coco_1024', cider_df='corpus', cnn_backend='res101', cnn_learning_rate=1e-05, cnn_optim='adam', cnn_optim_alpha=0.8, cnn_optim_beta=0.999, cnn_weight_decay=0, cuda=True, data_path='data', dataset='coco', decode_noc=False, det_oracle=False, disp_interval=100, drop_prob_lm=0.5, fc_feat_size=2048, finetune_cnn=False, fixed_block=1, grad_clip=0.1, id='', image_crop_size=512, image_path='data/coco/images', image_size=576, inference_only=True, input_dic='data/robust_coco/dic_coco.json', input_encoding_size=512, input_json='data/robust_coco/cap_coco.json', language_eval=1, learning_rate=0.0005, learning_rate_decay_every=3, learning_rate_decay_rate=0.8, learning_rate_decay_start=1, load_best_score=1, losses_log_every=10, mGPUs=False, max_epochs=30, num_layers=1, num_workers=20, optim='adam', optim_alpha=0.9, optim_beta=0.999, optim_epsilon=1e-08, path_opt='cfgs/robust_coco.yml', proposal_h5='data/coco/coco_detection.h5', rnn_size=1024, rnn_type='lstm', scheduled_sampling_increase_every=5, scheduled_sampling_increase_prob=0.05, scheduled_sampling_max_prob=0.25, scheduled_sampling_start=-1, self_critical=False, seq_length=20, seq_per_img=5, start_from='save/robust_coco_nbt_1024', val_every_epoch=3, val_images_use=-1, val_split='test', weight_decay=0)
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
DataLoader loading json file: data/robust_coco/dic_coco.json
vocab size is 9488
DataLoader loading json file: data/robust_coco/cap_coco.json
loading annotations into memory...
Done (t=18.72s)
creating index...
index created!
loading annotations into memory...
Done (t=10.12s)
creating index...
index created!
assigned 110234 images to split train
DataLoader loading json file: data/robust_coco/dic_coco.json
vocab size is 9488
DataLoader loading json file: data/robust_coco/cap_coco.json
loading annotations into memory...
Done (t=22.57s)
creating index...
index created!
loading annotations into memory...
Done (t=4.28s)
creating index...
index created!
assigned 9138 images to split test
Loading pretrained weights from data/imagenet_weights/resnet101.pth
Loading the model save/robust_coco_nbt_1024/model-best.pth...
Use adam as optmization method
/home/×××/NeuralBabyTalk/misc/model.py:520: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
conv_feats, fc_feats = self.cnn(Variable(img.data, volatile=True))
image 223648: a wooden table topped with a wooden table
image 113588: a man sitting at a desk with a laptop
image 173350: a dog and a toilet in a room
image 81922: a large jetliner flying over a city
image 310391: a green truck parked in the grass near a forest
image 462341: a clock tower with a sky background
image 122851: a man riding a motorcycle with a bunch of banana
image 452684: a glass of wine sitting on a table
image 350341: a bowl of food on a table
image 550529: a motorcycle is parked on a wooden shelf
image 281533: a dog sitting on the floor watching tv
image 291380: a man sitting in the back seat of a car
image 560623: a view of a plane in a window
image 522713: a bench sitting on top of a lush green field
image 354533: a motorcycle is parked on a dirt field
image 29913: a fire hydrant on the side of the street
image 38029: a red truck with a red top is on a street
image 17756: a boat that is sitting in the grass
image 155885: a black and white photo of a harbor with many boating
image 231408: a couple of cats are standing in the grass
0
100
200
300
400
Total image to be evaluated 9138
loading annotations into memory...
Done (t=1.05s)
creating index...
index created!
using 3020/9138 predictions
Loading and preparing results...
DONE (t=0.06s)
creating index...
index created!
tokenization...
PTBTokenizer tokenized 193335 tokens at 479991.21 tokens per second.
PTBTokenizer tokenized 30305 tokens at 183404.06 tokens per second.
setting up scorers...
computing Bleu score...
{'reflen': 27837, 'guess': [27286, 24266, 21246, 18226], 'testlen': 27286, 'correct': [20678, 11036, 5281, 2558]}
ratio: 0.980206200381
Bleu_1: 0.743
Bleu_2: 0.575
Bleu_3: 0.432
Bleu_4: 0.325
computing METEOR score...
METEOR: 0.251
computing Rouge score...
ROUGE_L: 0.534
computing CIDEr score...
CIDEr: 0.958
computing SPICE score...
Parsing reference captions
Initiating Stanford parsing pipeline
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
done [0.6 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.5 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [2.8 sec].
Threads( StanfordCoreNLP ) [44.74 seconds]
Threads( StanfordCoreNLP ) [19.930 seconds]
Parsing test captions
Threads( StanfordCoreNLP ) [6.653 seconds]
SPICE evaluation took: 1.457 min
SPICE: 0.185
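As a sanity check, the four BLEU values can be reproduced from the counts printed on the line just before "ratio". The sketch below assumes the standard corpus-level definition: the geometric mean of the n-gram precisions (correct / guess) times a brevity penalty based on testlen and reflen. The scorer in coco-caption may apply small smoothing constants, which this sketch leaves out; corpus_bleu is a name chosen for this note.

```python
import math


def corpus_bleu(correct, guess, testlen, reflen):
    """Return [BLEU_1, ..., BLEU_n] from clipped n-gram match counts.

    Assumes every entry of `correct` and `guess` is non-zero.
    """
    # Brevity penalty: 1 if the candidates are at least as long as the references.
    if testlen >= reflen:
        bp = 1.0
    else:
        bp = math.exp(1.0 - float(reflen) / testlen)
    scores = []
    log_sum = 0.0
    for n, (c, g) in enumerate(zip(correct, guess), start=1):
        # Accumulate log precisions so BLEU_n is the geometric mean of the first n.
        log_sum += math.log(float(c) / g)
        scores.append(bp * math.exp(log_sum / n))
    return scores


if __name__ == "__main__":
    scores = corpus_bleu(
        correct=[20678, 11036, 5281, 2558],
        guess=[27286, 24266, 21246, 18226],
        testlen=27286,
        reflen=27837,
    )
    print(["%.3f" % s for s in scores])
```

With the counts from the log, the results agree with the reported Bleu_1 to Bleu_4 to three decimal places, and testlen / reflen gives the printed ratio of about 0.9802.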