Running the official pointSIFT source code on Google Colab

Official pointSIFT GitHub repository

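This post records the problems I ran into while running the source code. Later I plan to analyze the source as well, along with code for PointNet and other point-cloud deep learning work.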

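The official source code targets TensorFlow 1.4.1. I also tried running it under TensorFlow 2.x and hit many pitfalls; in the end some functions still had to be ported from 1.4 to 2.x and had no one-to-one equivalent, so I gave up. Here we use Google Colab, so the environment should be the same for everyone, and following the steps in order should reproduce the result.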

Colab link

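1. Create a new notebook and select the GPU runtime.
2. Colab no longer supports TensorFlow 1.x directly; it has to be selected manually (and even the manual option may stop working in the future).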

%tensorflow_version 1.x

This line only takes effect if it is run at the very start of the runtime, before TensorFlow is imported.

3. Clone the code from GitHub

!git clone https://github.com/MVIG-SJTU/pointSIFT

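4. Set environment variables. These lines were probably copied over from other papers in the same series. They can be skipped; add them back only if something fails to run.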

!TF_INC=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())')
!TF_LIB=$(python -c 'import tensorflow as tf; print(tf.sysconfig.get_lib())')
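
One caveat with the two lines above: in Colab each `!` command runs in its own temporary shell, so a variable assigned there is gone by the next line. Below is a minimal sketch of the difference using only the standard library. The variable name `DEMO_TF_INC` and its value are made up for illustration and are not part of pointSIFT. Setting the value through `os.environ` is one way to make it visible to later `!` commands.

```python
import os
import subprocess

# Each call below starts a fresh shell, much like a separate `!` line in Colab.
# DEMO_TF_INC is a hypothetical name used only for this demonstration.
subprocess.run("DEMO_TF_INC=/some/include/path", shell=True, check=True)
lost = subprocess.run("echo ${DEMO_TF_INC:-unset}", shell=True,
                      capture_output=True, text=True).stdout.strip()

# Setting the variable on the Python process makes child shells inherit it.
os.environ["DEMO_TF_INC"] = "/some/include/path"
kept = subprocess.run("echo ${DEMO_TF_INC:-unset}", shell=True,
                      capture_output=True, text=True).stdout.strip()

print(lost)  # the assignment made in the first shell did not survive
print(kept)  # inherited from os.environ
```

Since the compile scripts in step 6 use hard-coded paths, this only matters if you later change them to read `TF_INC` and `TF_LIB`.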

5. Check the TensorFlow paths

import tensorflow as tf
# include path
print(tf.sysconfig.get_include())
# library path 
print(tf.sysconfig.get_lib())

6. Go into the source tree and edit the files ending in .sh (shell scripts) in the four subfolders of /content/pointSIFT/tf_utils/tf_ops. For example, tf_grouping_compile.sh:

#!/bin/bash
/usr/local/cuda-10.0/bin/nvcc tf_grouping_g.cu -o tf_grouping_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

# TF1.2
#g++ -std=c++11 tf_grouping.cpp tf_grouping_g.cu.o -o tf_grouping_so.so -shared -fPIC -I /usr/local/lib/python2.7/dist-packages/tensorflow/include -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0

# TF1.4
g++ -std=c++11 tf_grouping.cpp tf_grouping_g.cu.o -o tf_grouping_so.so -shared -fPIC -I /tensorflow-1.15.2/python3.7/tensorflow_core/include -I /usr/local/cuda-10.0/include -I /tensorflow-1.15.2/python3.7/tensorflow_core/include/external/nsync/public -lcudart -L /usr/local/cuda-10.0/lib64/ -L/tensorflow-1.15.2/python3.7/tensorflow_core -ltensorflow_framework -O2 -D_GLIBCXX_USE_CXX11_ABI=0

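The key changes: replace CUDA 8.0 with 10.0 (look under /usr/local/ to see which CUDA versions Colab provides), and change the TensorFlow paths to the ones Colab uses. Be careful: some of the paths end in include, some have no include, and some have further components after include, so edit each one accordingly. The article "pointSIFT尝鲜" is another useful reference. As a reminder, the .sh files in all four folders need to be modified.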
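
Editing four scripts by hand is error-prone, so the substitution can also be scripted. The following is only a sketch: `patch_compile_script` is a hypothetical helper, and it assumes the unmodified scripts reference paths of the form /usr/local/cuda-X.Y and /usr/local/lib/pythonX.Y/dist-packages/tensorflow, as the commented-out TF1.2 line above does. The target paths are the Colab ones used in this post and may differ in your runtime.

```python
import re

# Assumed Colab locations, taken from the script shown above; adjust if yours differ.
CUDA_DIR = "/usr/local/cuda-10.0"
TF_DIR = "/tensorflow-1.15.2/python3.7/tensorflow_core"

def patch_compile_script(text, cuda_dir=CUDA_DIR, tf_dir=TF_DIR):
    """Return the script text with CUDA and TensorFlow paths rewritten."""
    # Any versioned CUDA root (e.g. /usr/local/cuda-8.0) becomes cuda_dir.
    text = re.sub(r"/usr/local/cuda-\d+\.\d+", cuda_dir, text)
    # Any dist-/site-packages TensorFlow root becomes tf_dir; whatever follows it
    # (nothing, /include, /include/external/...) is left in place.
    text = re.sub(r"/usr/local/lib/python[\d.]+/(?:dist|site)-packages/tensorflow",
                  tf_dir, text)
    return text

sample = ("g++ -I /usr/local/lib/python2.7/dist-packages/tensorflow/include "
          "-I /usr/local/cuda-8.0/include "
          "-L/usr/local/lib/python2.7/dist-packages/tensorflow -ltensorflow_framework")
patched = patch_compile_script(sample)
print(patched)
```

To apply it, read each *_compile.sh file, pass its contents through the helper, write the result back, and then inspect the file before compiling.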

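7. Fix the .so file. This step is another big pitfall. The corresponding directory contains libtensorflow_framework.so.1 (or .so.2 for TensorFlow 2), but the linker needs a plain .so with no trailing version number. My workaround was simply to make a copy under the unversioned name in the same directory.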

%cd /tensorflow-1.15.2/python3.7/tensorflow_core
!cp libtensorflow_framework.so.1 libtensorflow_framework.so
!ls
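
The same copy can be done from Python in a way that does not depend on whether the suffix is .1 or .2. This is a sketch, not part of the pointSIFT repository: `ensure_unversioned_lib` is a hypothetical helper, and the demonstration runs on a temporary directory standing in for tensorflow_core.

```python
import shutil
import tempfile
from pathlib import Path

def ensure_unversioned_lib(lib_dir, name="libtensorflow_framework.so"):
    """Copy e.g. name.1 or name.2 to name if the unversioned file is missing."""
    lib_dir = Path(lib_dir)
    target = lib_dir / name
    if target.exists():
        return target
    versioned = sorted(lib_dir.glob(name + ".*"))
    if not versioned:
        raise FileNotFoundError(f"no {name}.* found in {lib_dir}")
    shutil.copy(versioned[0], target)
    return target

# Demonstration on a throwaway directory with a stub file.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "libtensorflow_framework.so.1").write_bytes(b"stub")
result = ensure_unversioned_lib(demo_dir)
print(result.name)
```

On Colab you would pass the real directory, for example /tensorflow-1.15.2/python3.7/tensorflow_core, instead of `demo_dir`.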

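8. Set the library path environment variable (optional)

import os
os.environ["LD_LIBRARY_PATH"] = os.environ.get("LD_LIBRARY_PATH", "") + ":/tensorflow-1.15.2/python3.7/tensorflow_core"

This adds the directory containing the .so from step 7 to the library search path. LD_LIBRARY_PATH takes directories rather than files, and a `!export` would only last for that single shell command, so the variable is set through os.environ instead. I added this while debugging; in principle, copying the .so is enough on its own.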

9. Run all the .sh files

%cd /content/pointSIFT/tf_utils/tf_ops/pointSIFT_op
!sh /content/pointSIFT/tf_utils/tf_ops/pointSIFT_op/tf_pointSIFT_compile.sh
%cd /content/pointSIFT/tf_utils/tf_ops/interpolation
!sh /content/pointSIFT/tf_utils/tf_ops/interpolation/tf_interpolate_compile.sh
%cd /content/pointSIFT/tf_utils/tf_ops/grouping/
!sh /content/pointSIFT/tf_utils/tf_ops/grouping/tf_grouping_compile.sh
%cd /content/pointSIFT/tf_utils/tf_ops/sampling
!chmod +x tf_sampling_compile.sh
!./tf_sampling_compile.sh

Note that if step 6 was not done correctly, you will get errors such as "not found" or an abnormal exit, for example:

/content/pointSIFT/tf_utils/tf_ops/pointSIFT_op
/content/pointSIFT/tf_utils/tf_ops/pointSIFT_op/tf_pointSIFT_compile.sh: 2: /content/pointSIFT/tf_utils/tf_ops/pointSIFT_op/tf_pointSIFT_compile.sh: /usr/local/cuda-8.0/bin/nvcc: not found
g++: error: pointSIFT_g.cu.o: No such file or directory

If so, check the relevant files carefully.

A successful run prints warnings, but they are harmless. Successful output looks like this:

/content/pointSIFT/tf_utils/tf_ops/pointSIFT_op
main.cpp: In lambda function:
main.cpp:22:48: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(0), 3, &dims1);
                                                ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~
main.cpp:24:48: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(1), 3, &dims2);
                                                ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~
main.cpp: In lambda function:
main.cpp:35:47: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(0), 3, &dim1); // batch_size * npoint * 3
                                               ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~
main.cpp: In lambda function:
main.cpp:46:47: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(0), 3, &dim1); // batch_size * npoint * 3
                                               ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~
main.cpp: In lambda function:
main.cpp:57:47: warning: ignoring return value of ‘tensorflow::Status tensorflow::shape_inference::InferenceContext::WithRank(tensorflow::shape_inference::ShapeHandle, tensorflow::int64, tensorflow::shape_inference::ShapeHandle*)’, declared with attribute warn_unused_result [-Wunused-result]
             c->WithRank(c->input(0), 3, &dim1); // batch_size * npoint * 3
                                               ^
In file included from main.cpp:8:0:
/tensorflow-1.15.2/python3.7/tensorflow_core/include/tensorflow/core/framework/shape_inference.h:394:10: note: declared here
   Status WithRank(ShapeHandle shape, int64 rank,
          ^~~~~~~~

If errors appear, the most likely causes are that the paths in the .sh files were not updated correctly or that the .so was not copied.
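
Because warnings and errors scroll past quickly, it can help to check afterwards that each op folder actually contains a compiled library. The sketch below assumes each compile script writes a .so into its own folder, as tf_grouping_compile.sh does with tf_grouping_so.so. `ops_missing_so` is a hypothetical helper, and the demonstration uses a temporary directory tree rather than the real repository.

```python
import tempfile
from pathlib import Path

# The four op folders named in step 9.
OP_DIRS = ["pointSIFT_op", "interpolation", "grouping", "sampling"]

def ops_missing_so(tf_ops_root, op_dirs=OP_DIRS):
    """Return the op folders that contain no compiled .so file."""
    root = Path(tf_ops_root)
    return [d for d in op_dirs if not list((root / d).glob("*.so"))]

# Demonstration on a throwaway tree in which only 'grouping' has been "compiled".
demo_root = Path(tempfile.mkdtemp())
for d in OP_DIRS:
    (demo_root / d).mkdir()
(demo_root / "grouping" / "tf_grouping_so.so").write_bytes(b"")
missing = ops_missing_so(demo_root)
print(missing)
```

On Colab, calling `ops_missing_so("/content/pointSIFT/tf_utils/tf_ops")` should return an empty list once all four scripts have succeeded.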

10. Download the dataset

%cd /content
!wget --no-check-certificate https://shapenet.cs.stanford.edu/media/scannet_data_pointnet2.zip


11. Unzip the dataset

%cd /content/pointSIFT/
!unzip /content/scannet_data_pointnet2.zip


12. Run the training script

%cd /content/pointSIFT
!python train_and_eval_scannet.py --batch_size 8
/content/pointSIFT
train size 1201 and test size 312
WARNING:tensorflow:From /content/pointSIFT/models/pointSIFT_pointnet.py:10: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From train_and_eval_scannet.py:164: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

WARNING:tensorflow:From train_and_eval_scannet.py:94: The name tf.train.exponential_decay is deprecated. Please use tf.compat.v1.train.exponential_decay instead.

WARNING:tensorflow:From train_and_eval_scannet.py:100: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

WARNING:tensorflow:From /content/pointSIFT/tf_utils/pointSIFT_util.py:98: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /content/pointSIFT/tf_utils/tf_util.py:21: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /content/pointSIFT/tf_utils/pointSIFT_util.py:252: calling reduce_max_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/util/deprecation.py:574: calling conv1d (from tensorflow.python.ops.nn_ops) with data_format=NHWC is deprecated and will be removed in a future version.
Instructions for updating:
`NHWC` for data_format is deprecated, use `NWC` instead
WARNING:tensorflow:From /content/pointSIFT/tf_utils/pointSIFT_util.py:341: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /content/pointSIFT/tf_utils/tf_util.py:613: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From train_and_eval_scannet.py:197: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.

build graph in gpu 0
WARNING:tensorflow:From /content/pointSIFT/models/pointSIFT_pointnet.py:72: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

WARNING:tensorflow:From /tensorflow-1.15.2/python3.7/tensorflow_core/python/ops/losses/losses_impl.py:121: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /content/pointSIFT/models/pointSIFT_pointnet.py:74: The name tf.add_to_collection is deprecated. Please use tf.compat.v1.add_to_collection instead.

WARNING:tensorflow:From train_and_eval_scannet.py:179: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

WARNING:tensorflow:From train_and_eval_scannet.py:216: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING:tensorflow:From train_and_eval_scannet.py:218: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.

WARNING:tensorflow:From train_and_eval_scannet.py:221: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From train_and_eval_scannet.py:226: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-11-03 03:02:46.702606: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299995000 Hz
2021-11-03 03:02:46.704836: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f203f4840 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-11-03 03:02:46.704873: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-11-03 03:02:46.761618: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-11-03 03:02:46.974400: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:46.975314: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f203f52c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-11-03 03:02:46.975351: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2021-11-03 03:02:46.975731: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:46.976454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:04.0
2021-11-03 03:02:46.983310: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-03 03:02:47.178342: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-11-03 03:02:47.203608: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-11-03 03:02:47.262938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-11-03 03:02:47.487159: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-11-03 03:02:47.521078: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-11-03 03:02:47.987856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-03 03:02:47.988116: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:47.989027: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:47.989755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0
2021-11-03 03:02:47.990029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-11-03 03:02:47.991877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-03 03:02:47.991915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2021-11-03 03:02:47.991945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2021-11-03 03:02:47.992562: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:47.993393: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-11-03 03:02:47.996088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10813 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7)
WARNING:tensorflow:From train_and_eval_scannet.py:227: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

WARNING:tensorflow:From train_and_eval_scannet.py:229: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

2021-11-03 03:02:57.247497: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:02:57.281148: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:02:57.297586: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:02:57.313636: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:02:57.330606: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 16777216 exceeds 10% of system memory.
2021-11-03 03:03:29.105629: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-11-03 03:03:35.225050: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
epoch 1 , loss is 16.904939 take 320.285 s
epoch 2 , loss is 14.630943 take 280.282 s

I did not let the training run to completion; a full run takes a long time.
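
To get a rough idea of how long a full run would take, the per-epoch timings can be pulled out of the log. This is a sketch that assumes the log lines keep the exact "epoch N , loss is X take T s" format shown above. `parse_epochs` is a hypothetical helper, and the total number of epochs is something you would need to look up in the training script.

```python
import re

# Matches lines such as: epoch 1 , loss is 16.904939 take 320.285 s
EPOCH_LINE = re.compile(r"epoch\s+(\d+)\s*,\s*loss is\s+([\d.]+)\s+take\s+([\d.]+)\s*s")

def parse_epochs(log_text):
    """Return (epoch, loss, seconds) tuples for every matching log line."""
    return [(int(e), float(loss), float(sec))
            for e, loss, sec in EPOCH_LINE.findall(log_text)]

log = """epoch 1 , loss is 16.904939 take 320.285 s
epoch 2 , loss is 14.630943 take 280.282 s"""
rows = parse_epochs(log)
avg_seconds = sum(sec for _, _, sec in rows) / len(rows)
print(rows)
print(avg_seconds)
```

Multiplying `avg_seconds` by the planned number of epochs gives a ballpark estimate of the total training time on the same GPU.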

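Closing remarks: there really are a lot of pitfalls, and the incompatibilities between TensorFlow versions are a real nuisance. PointNet has a PyTorch version, which makes it much easier to reproduce. I have uploaded the modified, directly runnable code and the notebook to GitHub; if you are also using Colab, it should run as-is.
My GitHub repository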
