Code being reproduced: https://github.com/yunjey/show-attend-and-tell
I. Dependencies:
numpy, matplotlib, scipy, scikit-image, hickle, Pillow
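Before running anything, it can help to confirm that every dependency imports in the interpreter you plan to use. The sketch below is not part of the repository; the helper name find_missing is made up for illustration. Note that scikit-image and Pillow are imported as skimage and PIL.

```python
import importlib

# pip package name -> module name used in "import ..."
# (scikit-image and Pillow are imported under different names)
REQUIRED = {
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "scikit-image": "skimage",
    "hickle": "hickle",
    "Pillow": "PIL",
}


def find_missing(modules):
    """Return the pip names whose modules cannot be imported."""
    missing = []
    for pip_name, import_name in sorted(modules.items()):
        try:
            importlib.import_module(import_name)
        except ImportError:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies import correctly.")
```

Run it with the same python (or python2) that will run prepro.py and train.py, since packages installed with pip2 are only visible to Python 2.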
II. Download the MSCOCO image dataset and the VGGNet19 model, put them in the expected locations,
then run python resize.py
III. Run python prepro.py
1. pip2 install hickle
2. pip2 install pandas
3. Problem:
hudou@Amax-Super-Server:~/仿真/show,attend,and tell/show-attend-and-tell-master/show-attend-and-tell-master$ python prepro.py
Traceback (most recent call last):
File "prepro.py", line 212, in <module>
main()
File "prepro.py", line 138, in main
max_length=max_length)
File "prepro.py", line 15, in _process_caption_data
with open(caption_file) as f:
IOError: [Errno 2] No such file or directory: 'data/annotations/captions_train2014.json'
Solution:
Put the annotations folder inside the data folder.
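A quick pre-flight check avoids waiting for prepro.py to fail on a missing file. This is only a sketch: the helper missing_paths is hypothetical, and the only path listed is the one from the traceback above. Add whatever other files your copy of prepro.py opens.

```python
import os

# Path taken from the IOError above; extend this list with any other
# files your copy of prepro.py reads.
EXPECTED = [
    "data/annotations/captions_train2014.json",
]


def missing_paths(paths, root="."):
    """Return the entries of `paths` that do not exist under `root`."""
    return [p for p in paths if not os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    missing = missing_paths(EXPECTED)
    for path in missing:
        print("Missing: " + path)
    if not missing:
        print("All expected files are in place.")
```

Run it from the repository root, because prepro.py resolves data/... relative to the current working directory.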
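4. Problem: tensorflow.python.framework.errors_impl.ResourceExhaustedError: <exception str() failed>
Solution:
First, search online (Baidu) for the error and the fixes others have suggested;
then read the source and add print statements to pin down the exact line that fails.
The failing location should be either:
with tf.Session() as sess:
or:
tf.global_variables_initializer().run()
(which replaced the original tf.initialize_all_variables().run())
The error raised during training is: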
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: <exception str() failed>
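As an alternative to scattering print statements, the standard traceback module can report the innermost file and line even when the exception message itself cannot be rendered. A minimal sketch, with locate_failure as a made-up helper name; it is plain Python and does not depend on TensorFlow.

```python
import sys
import traceback


def locate_failure(func, *args, **kwargs):
    """Call func; if it raises, return (file, line, function, error type).

    Returns None when func completes without raising.
    """
    try:
        func(*args, **kwargs)
    except Exception as exc:
        tb = sys.exc_info()[2]
        # The last traceback entry is the frame where the error was raised.
        frame = traceback.extract_tb(tb)[-1]
        return frame[0], frame[1], frame[2], type(exc).__name__
    return None


if __name__ == "__main__":
    def fake_step():
        # Stand-in for the real training step.
        raise MemoryError("simulated failure")

    print(locate_failure(fake_step))
```

To use it on the real code, wrap the suspect call (for example the function that opens the session) instead of fake_step.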
5. Final successful run (partial output):
(omitted)
Processed 82744 train features..
Processed 82752 train features..
Processed 82760 train features..
Processed 82768 train features..
Processed 82776 train features..
Processed 82784 train features..
Saved ./data/train/train.features.hkl..
doudou2
doudou3
Loaded ./data/val/val.annotations.pkl..
Processed 8 val features..
Processed 16 val features..
Processed 24 val features..
Processed 32 val features..
(omitted)
Processed 4016 val features..
Processed 4024 val features..
Processed 4032 val features..
Processed 4040 val features..
Processed 4048 val features..
Processed 4056 val features..
Saved ./data/val/val.features.hkl..
doudou2
doudou3
Loaded ./data/test/test.annotations.pkl..
(omitted)
Processed 3928 test features..
Processed 3936 test features..
Processed 3944 test features..
Processed 3952 test features..
Processed 3960 test features..
Processed 3968 test features..
Processed 3976 test features..
Processed 3984 test features..
Processed 3992 test features..
Processed 4000 test features..
Processed 4008 test features..
Processed 4016 test features..
Processed 4024 test features..
Processed 4032 test features..
Processed 4040 test features..
Processed 4048 test features..
Saved ./data/test/test.features.hkl..
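IV. Run python train.py
1. Error: MemoryError
Solution 1: check whether the Python build is 32-bit or 64-bit. Here it turned out to be 64-bit, so this was not the cause.
The reason for checking comes from a post I found: "I later learned that 32-bit Python raises this error once it uses more than 2 GB of memory, with no other message. I switched to 64-bit Python right away."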
hudou@Amax-Super-Server:~/仿真/show,attend,and tell/show-attend-and-tell-master/show-attend-and-tell-master$ python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.architecture()
('64bit', 'ELF')
Solution 2: run the program on a server with more memory.
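The same bitness check can be done from a script rather than the interactive prompt. A small sketch; pointer_bits is a hypothetical helper name, and it relies only on the standard library.

```python
import platform
import struct
import sys


def pointer_bits():
    """Return the pointer size of the running interpreter in bits."""
    # struct format "P" is a C void*, so its size reflects the build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("platform.architecture(): %s" % (platform.architecture(),))
    print("pointer size: %d bits" % pointer_bits())
    print("sys.maxsize > 2**32: %s" % (sys.maxsize > 2 ** 32))
```

If this reports 64 bits and MemoryError still appears, the process is most likely running out of physical memory, which is what Solution 2 addresses.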
2. Error: tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
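Solution: uninstall tensorflow-gpu (conda uninstall tensorflow-gpu), as the session under item 3 shows.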
3. Final run:
$ conda uninstall tensorflow-gpu
Solving environment: done
## Package Plan ##
environment location: /home/syh-lld/anaconda2
removed specs:
- tensorflow-gpu
The following packages will be REMOVED:
tensorflow-gpu: 1.3.0-0
Proceed ([y]/n)? y
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
$ python
Python 2.7.15 |Anaconda, Inc.| (default, Oct 23 2018, 18:31:10)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> exit()
$ ls
anaconda2 numpy-1.15.3+mkl-cp35-cp35m-win_amd64.whl scipy-1.1.0-cp35-cp35m-win_amd64.whl ttf.py
Anaconda2-5.2.0-Linux-x86_64.sh perl5 show-attend-and-tell-master
$ cd show-attend-and-tell-master/
[syh-lld@localhost show-attend-and-tell-master]$ python train.py
doudou1
image_idxs <type 'numpy.ndarray'> (399998,) int32
file_names <type 'numpy.ndarray'> (82783,) <U55
word_to_idx <type 'dict'> 23110
features <type 'numpy.ndarray'> (82783, 196, 512) float32
captions <type 'numpy.ndarray'> (399998, 17) int32
Elapse time: 321.70
doudou2
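The shapes and dtypes printed by train.py are enough to estimate how much RAM the loaded arrays need, which likely explains the earlier MemoryError: the features array alone comes to roughly 31 GiB. The sketch below only does the arithmetic. It assumes the arrays are held fully in memory and ignores any temporary copies made while loading, so the real peak may be higher.

```python
from functools import reduce
import operator

# Bytes per element for the dtypes that appear in the log above.
ITEMSIZE = {"float32": 4, "int32": 4}

# Shapes and dtypes copied from the train.py output.
ARRAYS = {
    "features": ((82783, 196, 512), "float32"),
    "captions": ((399998, 17), "int32"),
    "image_idxs": ((399998,), "int32"),
}


def array_bytes(shape, dtype):
    """Return the size in bytes of a dense array with this shape and dtype."""
    count = reduce(operator.mul, shape, 1)
    return count * ITEMSIZE[dtype]


if __name__ == "__main__":
    for name, (shape, dtype) in sorted(ARRAYS.items()):
        size = array_bytes(shape, dtype)
        print("%-10s %8.3f GiB" % (name, size / float(2 ** 30)))
```

The captions and image_idxs arrays are negligible by comparison, so the machine's memory has to be sized around the features array.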