Running Google's im2txt (Show and Tell, Inception v3) under TensorFlow

My setup: Ubuntu 14.04 + GPU

TensorFlow 1.0.1


Related paper: "Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge"

https://arxiv.org/abs/1609.06647

The code was open-sourced only last September.

github:https://github.com/tensorflow/models/tree/master/im2txt#generating-captions


Following the GitHub README, first install the required dependencies.

Bazel
Following the official site:

$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install bazel
This failed with:
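Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming.
The following information may help to resolve the situation:
 
The following packages have unmet dependencies:
 bazel : Depends: google-jdk but it is not installable
                  java8-jdk but it is not installable
                  java8-sdk but it is not installable
                  oracle-java8-installer but it is not installable
E: Unable to correct problems, you have held broken packages.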
 
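I tried countless fixes found online and switched package mirrors several times, all to no effect, until I noticed this line on the official site: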
If you want to use the JDK 7, please replace jdk1.8 with jdk1.7 and if you want to install the testing version of Bazel, replace stable with testing.
 
The cause is most likely that my system is Ubuntu 14.04, which uses JDK 7:
$ update-java-alternatives -l
# java-1.7.0-openjdk-amd64 1071 /usr/lib/jvm/java-1.7.0-openjdk-amd64

Continuing with the official instructions, this time with jdk1.7:
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.7" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install bazel
$ sudo apt-get upgrade bazel
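
The choice between the jdk1.7 and jdk1.8 repositories can be derived from the name that update-java-alternatives -l reports. The following is a minimal Python sketch, assuming the alternative name embeds the Java version the way java-1.7.0-openjdk-amd64 does. The helper name bazel_apt_line is hypothetical and is not part of Bazel.

```python
import re

BAZEL_REPO = "http://storage.googleapis.com/bazel-apt"


def bazel_apt_line(java_alternative, channel="stable"):
    """Build the Bazel apt source line for an installed Java alternative.

    java_alternative is the first column printed by
    `update-java-alternatives -l`, e.g. "java-1.7.0-openjdk-amd64".
    Assumption: the name contains "java-1.<minor>".
    """
    match = re.search(r"java-1\.(\d+)", java_alternative)
    if match is None or match.group(1) not in ("7", "8"):
        raise ValueError("expected a Java 1.7 or 1.8 name, got %r" % java_alternative)
    return "deb [arch=amd64] %s %s jdk1.%s" % (BAZEL_REPO, channel, match.group(1))


print(bazel_apt_line("java-1.7.0-openjdk-amd64"))
```

The printed line is what gets piped into sudo tee /etc/apt/sources.list.d/bazel.list in the commands above.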

Check that the installation succeeded:
$ /usr/bin/bazel version


NumPy
Install it as described in the official documentation:
$ python -m pip install --upgrade pip
$ pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
Test:
$ python
>>> import scipy
>>> import numpy
>>> scipy.test()
>>> numpy.test()
Some posts online say it can also be installed as follows; I am not sure how this differs from the method at the URL linked from GitHub:
$ sudo apt-get install python-scipy
$ sudo apt-get install python-numpy
$ sudo apt-get install python-matplotlib
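
Whichever installation route is used, it helps to confirm that the interpreter can actually import the packages before going further. A small sketch of such a check follows. The function name missing_modules is made up for illustration, and the module list in the usage comment is only an example.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that fail to import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Example usage (module list chosen for illustration):
#   print(missing_modules(["numpy", "scipy", "matplotlib", "nltk"]))
```

An empty list means every listed package is importable from the Python that will run the im2txt scripts.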

Natural Language Toolkit (NLTK):
First install NLTK:
$ sudo pip install -U nltk
$ sudo pip install -U numpy
$ python
>>> import nltk

Then install the NLTK data:
$ sudo python
>>> import nltk
>>> nltk.download()
Set the download directory to /usr/share/nltk_data.

Verify that the data is installed:
>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

Once this step is done, also run:
>>> import nltk
>>> nltk.download('punkt')
This avoids the following error during download and preprocessing:

LookupError:
**********************************************************************
  Resource u'tokenizers/punkt/english.pickle' not found.  Please
  use the NLTK Downloader to obtain the resource: >>>
  nltk.download()
  Searched in:
    - '/home/ubuntu/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - u''
**********************************************************************
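
The check and the download can be combined so that the punkt tokenizer is fetched only when the lookup fails. The sketch below relies on nltk.data.find raising LookupError for a missing resource. The wrapper ensure_nltk_resource and its find/download parameters are hypothetical, added only so the logic can be exercised without touching the network.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download an NLTK package only if its resource cannot be found.

    Returns True if a download was triggered, False if the resource
    was already present. `find` and `download` default to
    nltk.data.find and nltk.download.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be tested with fakes
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True


# Example usage before running the preprocessing script:
#   ensure_nltk_resource("tokenizers/punkt", "punkt")
```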


Preprocessing
# Location to save the MSCOCO data.
MSCOCO_DIR="${HOME}/im2txt/data/mscoco"

# Build the preprocessing script.
bazel build im2txt/download_and_preprocess_mscoco

# Run the preprocessing script.
bazel-bin/im2txt/download_and_preprocess_mscoco "${MSCOCO_DIR}"
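This step is fairly simple, but on a poor network connection the download often restarts from scratch for no obvious reason. I had to download several times, and each attempt took a very long time, which was maddening.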


When you see this, preprocessing has succeeded.
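
Because the download can silently restart, it may be worth confirming that the training shards are complete before starting a long run. The training step reads files matching train-?????-of-00256, so the sketch below lists any shard names that are absent. It assumes the shards are numbered from 00000 to 00255 and sit directly in the data directory. The function name missing_train_shards is hypothetical.

```python
def missing_train_shards(present_names, num_shards=256):
    """Return expected training shard names not found in present_names.

    Assumption: shards are named train-00000-of-00256 through
    train-00255-of-00256 (for the default num_shards).
    """
    present = set(present_names)
    expected = ["train-%05d-of-%05d" % (i, num_shards) for i in range(num_shards)]
    return [name for name in expected if name not in present]


# Example usage, with MSCOCO_DIR as set in the shell commands above:
#   import os
#   print(missing_train_shards(os.listdir(os.path.expanduser("~/im2txt/data/mscoco"))))
```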

Training

$ MSCOCO_DIR="/path/to/MSCOCO"
$ INCEPTION_CHECKPOINT="/path/to/inception_v3.ckpt"
$ MODEL_DIR="/path/to/models/im2txt/model"
$ bazel build -c opt im2txt/...
$ bazel-bin/im2txt/train \
> --input_file_pattern="${MSCOCO_DIR}/train-?????-of-00256" \
> --inception_checkpoint_file="${INCEPTION_CHECKPOINT}" \
> --train_dir="${MODEL_DIR}/train" \
> --train_inception=false \
> --number_of_steps=1000000

Error:
AttributeError: 'module' object has no attribute '_base'

Fix:
$ pip install --upgrade html5lib==1.0b8

After this fix, the model starts training.


Now it is just a matter of waiting. Posts online say training takes one to two weeks;
mine finished in about three or four days.
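
While waiting, progress toward --number_of_steps=1000000 can be read off the checkpoint files in the training directory. This is a rough sketch, assuming checkpoint files there begin with model.ckpt-<step>, as in the model.ckpt-1000000 name used later in this post. The function latest_checkpoint_step is a made-up helper.

```python
import re


def latest_checkpoint_step(filenames):
    """Return the highest step found in names like model.ckpt-<step>*, or None."""
    steps = []
    for name in filenames:
        match = re.match(r"model\.ckpt-(\d+)", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Example usage (the directory is whatever --train_dir was set to):
#   import os
#   print(latest_checkpoint_step(os.listdir("/path/to/models/im2txt/model/train")))
```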

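The next stage is fine-tuning; before that, test how well the trained model performs.

If you would rather save the time, you can skip the training step and use my trained model directly.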


Using an existing model
My trained model is available here:

https://github.com/withyou1771/im2txt

$ CHECKPOINT_PATH="/path/to/model.ckpt-1000000"
$ VOCAB_FILE="/path/to/word_counts.txt"
$ IMAGE_FILE="/path/to/models/im2txt/1.jpg"
$ bazel build -c opt im2txt/run_inference
$ bazel-bin/im2txt/run_inference \
> --checkpoint_path=${CHECKPOINT_PATH} \
> --vocab_file=${VOCAB_FILE} \
> --input_files=${IMAGE_FILE}


$ CHECKPOINT_PATH="/path/to/model.ckpt-1000000"
$ IMAGE_FILE="/path/to/1.jpg"
$ VOCAB_FILE="/path/to/word_counts.txt"
$ bazel build -c opt im2txt/run_inference
INFO: Found 1 target...
Target //im2txt:run_inference up-to-date:
bazel-bin/im2txt/run_inference
INFO: Elapsed time: 0.138s, Critical Path: 0.00s
(tensorflow)ubuntu@ubuntu-All-Series:/home/data1/tf/models/im2txt$ bazel-bin/im2txt/run_inference --checkpoint_path=${CHECKPOINT_PATH} --vocab_file=${VOCAB_FILE} --input_files=${IMAGE_FILE}
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
INFO:tensorflow:Building model.
INFO:tensorflow:Initializing vocabulary from file: /word_counts.txt
INFO:tensorflow:Created vocabulary with 11520 words
INFO:tensorflow:Running caption generation on 1 files matching/1.jpg
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:0a:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x541a970
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:09:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x541e2f0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:06:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x5421c70
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.57GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: Y Y Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:0a:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1080, pci bus id: 0000:09:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX 1080, pci bus id: 0000:06:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
INFO:tensorflow:Loading model from checkpoint: /home/data1/tf/model4/model.ckpt-1000000
INFO:tensorflow:Successfully loaded checkpoint: model.ckpt-1000000

Captions for image 1.jpg:
  0) a cat laying on top of a grass covered field . (p=0.002806)
  1) a black and white cat laying on top of a grass covered field . (p=0.000498)
  2) a black and white cat laying on top of a green field . (p=0.000412)
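
To use the captions in another program, the printed lines can be parsed back into rank, text, and probability. The following sketch assumes each caption line has the shape "N) caption text (p=0.123456)" shown above. The helper parse_captions is hypothetical and not part of im2txt.

```python
import re

# Matches lines such as "  0) a cat ... field . (p=0.002806)".
CAPTION_RE = re.compile(r"^\s*(\d+)\)\s*(.*?)\s*\(p=([0-9.]+)\)\s*$")


def parse_captions(output):
    """Extract (rank, caption, probability) tuples from run_inference output."""
    captions = []
    for line in output.splitlines():
        match = CAPTION_RE.match(line)
        if match:
            rank, text, prob = match.groups()
            captions.append((int(rank), text, float(prob)))
    return captions


sample = (
    "Captions for image 1.jpg:\n"
    "  0) a cat laying on top of a grass covered field . (p=0.002806)\n"
    "  1) a black and white cat laying on top of a grass covered field . (p=0.000498)\n"
)
for rank, text, prob in parse_captions(sample):
    print(rank, text, prob)
```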



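Processing a single image takes about 8 seconds, most of which is spent loading CUDA.
Several images can be processed in one run by separating their paths with commas. Processing 5 images together took 26 seconds, an average of about 5 seconds per image.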
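
Since the startup cost is paid once per run (26 s / 5 images ≈ 5.2 s each, against 8 s for a single image), batching images into one invocation is worthwhile. Below is a sketch of assembling the command from a list of paths. The function run_inference_command is a hypothetical helper, and it assumes no path contains a comma.

```python
def run_inference_command(checkpoint_path, vocab_file, image_files):
    """Build the run_inference argument list for one or more images."""
    return [
        "bazel-bin/im2txt/run_inference",
        "--checkpoint_path=%s" % checkpoint_path,
        "--vocab_file=%s" % vocab_file,
        # Multiple images are passed as a single comma-separated value.
        "--input_files=%s" % ",".join(image_files),
    ]


cmd = run_inference_command(
    "/path/to/model.ckpt-1000000",
    "/path/to/word_counts.txt",
    ["/path/to/1.jpg", "/path/to/2.jpg"],
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.call from the directory where bazel-bin lives.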

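Because Python and TensorFlow versions differ, you may run into many errors even when using someone else's model.
Error 1:
ValueError: No checkpoint file found in: None
The word_counts.txt file is in the wrong format.
Replace line 49 of vocabulary.py with: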
reverse_vocab = [eval(line.split()[0]) for line in reverse_vocab]
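
The eval call turns the first token of each line into a Python value, which only helps if that token is a Python literal. The sketch below is a more defensive variant. It assumes the malformed file contains lines such as b'cat' 1234 (a bytes literal written out under Python 3), which the post does not state explicitly. It uses ast.literal_eval instead of eval and leaves plain words untouched. The name parse_vocab_word is made up for illustration.

```python
import ast


def parse_vocab_word(line):
    """Return the word from a 'word count' line, unwrapping b'...' if present."""
    token = line.split()[0]
    looks_like_bytes_literal = (
        len(token) >= 3 and token[0] == "b" and token[1] in "'\"" and token[-1] == token[1]
    )
    if looks_like_bytes_literal:
        value = ast.literal_eval(token)  # safe: evaluates literals only
        return value.decode("utf-8") if isinstance(value, bytes) else value
    return token


print(parse_vocab_word("b'cat' 1234"))
print(parse_vocab_word("cat 1234"))
```

Unlike a bare eval, this does not fail on lines that are already in the plain format, and it cannot execute arbitrary code from the vocabulary file.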

Error 2:
NotFoundError (see above for traceback): Tensor name "lstm/basic_lstm
ts" not found in checkpoint files /home/data1/tf/model2/model.ckpt-30
[[Node: save/RestoreV2_381 = RestoreV2[dtypes=[DT_FLOAT], _d
:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV
r_names, save/RestoreV2_381/shape_and_slices)]]
[[Node: save/RestoreV2_295/_177 = _Recv[client_terminated=fa
evice="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:loca
ca:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_1333
reV2_295", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/ta
]()]]

TensorFlow 1.0 changed the default variable names of BasicLSTMCell, so they no longer match the names stored in the checkpoint.
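
One way to deal with this is to rewrite the checkpoint so its variables carry the new names. The sketch below covers only the name-mapping step, not the checkpoint rewrite itself. The old and new names in ASSUMED_RENAMES are assumptions about the pre-1.0 and 1.0 naming schemes and should be checked against the variables actually present in your checkpoint.

```python
# Assumed old -> new names for the LSTM variables; verify before relying on them.
ASSUMED_RENAMES = {
    "lstm/BasicLSTMCell/Linear/Matrix": "lstm/basic_lstm_cell/weights",
    "lstm/BasicLSTMCell/Linear/Bias": "lstm/basic_lstm_cell/biases",
}


def plan_renames(checkpoint_names, renames=None):
    """Return {old_name: new_name} for the variables that need renaming."""
    renames = ASSUMED_RENAMES if renames is None else renames
    return {name: renames[name] for name in checkpoint_names if name in renames}


names = ["global_step", "lstm/BasicLSTMCell/Linear/Matrix", "lstm/BasicLSTMCell/Linear/Bias"]
for old, new in sorted(plan_renames(names).items()):
    print(old, "->", new)
```

Variables not listed in the mapping, such as global_step here, keep their names.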