Installing Torch7 (not PyTorch) on Ubuntu 18.04 with CUDA 10.0

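My machine:
Ubuntu 18.04 + CUDA 10.0 + cuDNN 7.4.1.5

Problem: the installation method given on the Torch7 website does not work with CUDA 10.0. Below is a record of how I eventually installed Torch7 successfully,
together with the problems I ran into (and how I solved them) when running the code for paper 8, "Deep depth completion of a single RGB-D image".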

I. Preparation

1. Check the CUDA version

There are two ways to check the CUDA version:

nvcc --version
cat /usr/local/cuda/version.txt
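If you want a script to stop early when the wrong toolkit is active, the release number can be extracted from either command's output. A minimal sketch; cuda_release is a hypothetical helper name, and the patterns assume the usual formats "Cuda compilation tools, release 10.0, V10.0.130" (nvcc) and "CUDA Version 10.0.130" (version.txt):

```shell
# Hypothetical helper (not part of CUDA): prints the major.minor release
# (e.g. "10.0") found on stdin, which may be the output of `nvcc --version`
# or the contents of /usr/local/cuda/version.txt.
cuda_release() {
  sed -n \
    -e 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' \
    -e 's/^CUDA Version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' |
    head -n 1
}
```

Usage: nvcc --version | cuda_release, or cuda_release < /usr/local/cuda/version.txt. On this setup both should print 10.0.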

2. Check that the NVIDIA driver works

nvidia-smi

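If the NVIDIA driver is not working, uninstall it and install it again:
(1) First switch gcc to gcc-8 (otherwise the driver installation fails later): https://blog.csdn.net/u013928488/article/details/107288413/

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 100
sudo update-alternatives --config gcc

(2) Then install the driver following this post:
https://blog.csdn.net/weixin_43820996/article/details/100676292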

3. Install cmake-3.17.2

Uninstall the existing cmake:

sudo apt remove --purge cmake
hash -r

Install cmake-3.17.2:

sudo apt install build-essential libssl-dev
wget https://github.com/Kitware/CMake/releases/download/v3.17.2/cmake-3.17.2.tar.gz
tar -zxvf cmake-3.17.2.tar.gz
cd cmake-3.17.2
./bootstrap
make 
sudo make install 
cmake --version
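Ubuntu 18.04's packaged CMake is 3.10.2, below the 3.12.2 that CUDA 10.0 needs according to the cutorch problem in section III, which is why the packaged version is removed first. A sketch for checking the result of cmake --version; version_ge and cmake_ok_for_cuda10 are hypothetical helper names:

```shell
# Hypothetical helper: succeeds (exit status 0) if dotted version $1 >= $2,
# comparing field by field numerically (3.17.2 >= 3.12.2, 3.10.2 < 3.12.2).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Hypothetical helper: $1 is the output of `cmake --version`;
# succeeds when the reported version is at least 3.12.2.
cmake_ok_for_cuda10() {
  v=$(printf '%s\n' "$1" | sed -n 's/^cmake version \([0-9.]*\).*/\1/p')
  [ -n "$v" ] && version_ge "$v" 3.12.2
}
```

Usage: cmake_ok_for_cuda10 "$(cmake --version)" && echo "CMake is new enough"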

4. Install gcc 7.5.0 (CUDA 10.0 does not support gcc versions newer than 7)

See https://blog.csdn.net/liaoze22/article/details/107821653
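To confirm that the active compiler meets this limit before building cutorch, the version string can be tested directly. A sketch; gcc_ok_for_cuda10 is a hypothetical helper, and it accepts both the short (7) and the full (7.5.0) forms that gcc -dumpversion may print:

```shell
# Hypothetical helper: succeeds if the GCC version string $1 (e.g. "7.5.0",
# "7" or "8.4.0") has a major version of at most 7, the CUDA 10.0 limit above.
gcc_ok_for_cuda10() {
  major=${1%%.*}
  [ "$major" -le 7 ]
}
```

Usage: gcc_ok_for_cuda10 "$(gcc -dumpversion)" || echo "switch to gcc <= 7 first"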

II. Install Torch7

1. Download Torch7 into ~/torch

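Press Ctrl+Alt+T to open a terminal and clone one of the following. Both commands clone into ~/torch, so run only one of them (git will not clone into an existing non-empty directory):

git clone https://github.com/torch/distro.git ~/torch --recursive      # official distro
git clone https://github.com/nagadomi/distro.git ~/torch --recursive   # alternative: a fork of the distro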

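2. Edit the install-deps file

cd ~/torch
sudo gedit install-deps  # open the file to be edited

Then, on lines 178 and 261 of the file, change sudo apt-get install -y python-software-properties to sudo apt-get install -y software-properties-common and save the file (the python-software-properties package is no longer available on Ubuntu 18.04).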
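Instead of editing the file by hand in gedit, the same replacement can be made with sed. A sketch assuming GNU sed; fix_install_deps is a hypothetical helper name, and it replaces every occurrence rather than relying on the line numbers 178 and 261, which may differ between checkouts of the distro:

```shell
# Hypothetical helper: replaces the package name python-software-properties
# with software-properties-common everywhere in the given install-deps file,
# keeping the original as <file>.bak (GNU sed -i syntax).
fix_install_deps() {
  sed -i.bak 's/python-software-properties/software-properties-common/g' "$1"
}
```

Usage: cd ~/torch && fix_install_deps install-deps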

3. Install Torch7's dependencies

cd ~/torch
git config --global url."https://".insteadOf git://
bash install-deps

4. Delete FindCUDA.cmake

rm -fr cmake/3.6/Modules/FindCUDA*    # run inside ~/torch

5. Add the file atomic.patch

cd ~/torch
cd extra/cutorch
vim atomic.patch

Paste the following content into it:

diff --git a/lib/THC/THCAtomics.cuh b/lib/THC/THCAtomics.cuh
index 400875c..ccb7a1c 100644
--- a/lib/THC/THCAtomics.cuh
+++ b/lib/THC/THCAtomics.cuh
@@ -94,6 +94,7 @@ static inline __device__ void atomicAdd(long *address, long val) {
 }
 
 #ifdef CUDA_HALF_TENSOR
+#if !(__CUDA_ARCH__ >= 700 || !defined(__CUDA_ARCH__) )
 static inline  __device__ void atomicAdd(half *address, half val) {
   unsigned int * address_as_ui =
       (unsigned int *) ((char *)address - ((size_t)address & 2));
@@ -117,6 +118,7 @@ static inline  __device__ void atomicAdd(half *address, half val) {
    } while (assumed != old);
 }
 #endif
+#endif

Then save the file and quit vim: press Esc and type :wq!, then apply the patch:

patch -p1 < atomic.patch
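To confirm that the patch went in (for example before rerunning install.sh after clean.sh), you can look for the new guard in the patched header. A sketch with a hypothetical helper name; running patch -p1 --dry-run < atomic.patch beforehand is another way to see whether the patch would apply cleanly:

```shell
# Hypothetical helper: succeeds if the "__CUDA_ARCH__ >= 700" guard added by
# atomic.patch is present in the given header.
# $1: lib/THC/THCAtomics.cuh, relative to ~/torch/extra/cutorch.
atomic_patch_applied() {
  grep -q '__CUDA_ARCH__ >= 700' "$1"
}
```

Usage (inside ~/torch/extra/cutorch): atomic_patch_applied lib/THC/THCAtomics.cuh && echo patched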

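6. Install

First obtain the required permissions by switching to the root user (important):

su root

Then install. Build Torch with Lua 5.2 rather than LuaJIT, because with LuaJIT, require 'cutorch', require 'nn' and so on fail later:

cd /home/**/torch       # absolute path of torch; mine is /home/lt/torch
./clean.sh          # run the clean.sh script
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
TORCH_LUA_VERSION=LUA52 ./install.sh          # run the install.sh script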


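When install.sh asks for confirmation at the end, enter yes (this adds the torch-activate line to ~/.bashrc that step 8 checks for).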
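After install.sh finishes, one way to confirm that the Lua 5.2 build (and not LuaJIT) was installed is to look for the install/lib/lua/5.2 directory, the same path that appears in the error messages later in this post. A sketch; torch_uses_lua52 is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the Torch tree at $1 (e.g. /home/lt/torch)
# has compiled modules under install/lib/lua/5.2, as a
# TORCH_LUA_VERSION=LUA52 build does.
torch_uses_lua52() {
  [ -d "$1/install/lib/lua/5.2" ]
}
```

Usage (replace the path with your own Torch directory): torch_uses_lua52 /home/lt/torch && echo "Lua 5.2 build"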

7. Apply the configuration

su root
source ~/.bashrc
source ~/.profile

8. Check that Torch7 was installed successfully

(1)

su root
sudo gedit ~/.bashrc

Check that a line similar to the following has been appended to the end of ~/.bashrc:
. /home/XXX/torch/install/bin/torch-activate
(2) Start th and load the main packages:

th
require'torch'
require'nn'
require'cutorch'

Reboot.
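Check (1) can also be done without opening an editor. A sketch that greps for the torch-activate line; has_torch_activate is a hypothetical helper name, and you pass whichever bashrc file install.sh modified:

```shell
# Hypothetical helper: succeeds if the given shell startup file
# (e.g. ~/.bashrc) contains the line that sources torch-activate.
has_torch_activate() {
  grep -q 'torch/install/bin/torch-activate' "$1"
}
```

Usage: has_torch_activate ~/.bashrc && echo "torch-activate is sourced"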

III. Problems with require 'cutorch'

1. Problem: module 'cutorch' not found

module 'cutorch' not found:Failed loading module cutorch in LuaRocks rock cutorch scm-1

Fix: open a terminal in /home/**/torch/extra/cutorch and run:

su root
export TORCH_NVCC_FLAGS="-D__CUDA_NO_HALF_OPERATORS__"
luarocks make rocks/cutorch-scm-1.rockspec
th
require'cutorch'

2. Problem when installing cutorch:

CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
CUDA_cublas_device_LIBRARY (ADVANCED)

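Cause: the CUDA version and the CMake version conflict; CUDA 10.0 needs CMake 3.12.2 or newer.
Fix: reinstall CMake (here 3.14.3).
(1) Download and extract it (if you download it from the official website yourself, you may end up with a package that has no bootstrap file):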

sudo apt-get purge cmake
wget https://cmake.org/files/v3.14/cmake-3.14.3.tar.gz 
sudo tar -zxv -f cmake-3.14.3.tar.gz

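(2) Extraction produces a single folder named cmake-3.14.3. If this folder or the files inside it show a lock icon, their permissions are restricted; change them with chmod -R 777 cmake-3.14.3. Skip this if there is no lock.
(3) Check that gcc and g++ are installed:

gcc --version
g++ --version

(4) Install.
Open a terminal in the extracted directory, or cd /home/**/cmake-3.14.3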

sudo ./bootstrap
sudo make
sudo make install
cmake --version     # check that the installation succeeded

Then go back to problem 1.

Problem: CUDA 10.0 requires gcc <= 7.
Fix: reinstall gcc, then reinstall cmake.

Problem: error on require 'cutorch'

cannot load '/home/lt/torch/install/lib/lua/5.2/libcutorch.so'

Fix:
sudo chmod -R 777 /home/lt/torch
After this, th followed by require 'cutorch' and require 'cunn' succeeded as a normal user, but still failed after su root.

Problem:

'libcudnn (R5) not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure files named as libcudnn.so.5 or libcudnn.5.dylib are placed in
your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

Alternatively, set the path to libcudnn.so.5 or libcudnn.5.dylib
to the environment variable CUDNN_PATH and rerun torch.
For example: export CUDNN_PATH="/usr/local/cuda/lib64/libcudnn.so.5"

stack traceback:
	[C]: in function 'error'
	/home/lt/torch/install/share/lua/5.2/trepl/init.lua:389: in function 'require'
	main_test_bound_realsense.lua:4: in main chunk
	[C]: in function 'dofile'

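Fix: install cuDNN.
Installation from the Debian (.deb) files: https://blog.csdn.net/weixin_45591044/article/details/104608506
cuDNN was installed successfully, but the problem remained. So I copied the library to the location and file name that the error message asks for:

sudo find / -name "libcudnn.*"   # locate the cuDNN libraries
sudo cp /usr/lib/x86_64-linux-gnu/libcudnn.so.7.4.2 /usr/local/lib   # copy
cd /usr/local/lib
sudo mv libcudnn.so.7.4.2 libcudnn.so.5   # rename
sudo cp /usr/local/lib/libcudnn.so.5 /usr/local/cuda/lib64/libcudnn.so.5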

IV. Problems when running the code of the paper "Deep depth completion of a single RGB-D image"

1. Problems encountered when running the following command, and their fixes

th main_test_bound_realsense.lua -test_model ../pre_train_model/bound.t7 -test_file ./data_list/realsense_list.txt -root_path ../data/realsense/
Problem 1: could not load library /usr/local/lib:
Found Environment variable CUDNN_PATH = /usr/local/lib:/home/**/torch/install/bin/lua: /home/**/torch/install/share/lua/5.2/trepl/init.lua:389: /home/**/torch/install/share/lua/5.2/trepl/init.lua:389: /home/**/torch/install/share/lua/5.2/cudnn/ffi.lua:1743: could not load library /usr/local/lib:
stack traceback:
	[C]: in function 'error'
	/home/**/torch/install/share/lua/5.2/trepl/init.lua:389: in function 'require'
	main_test_bound_realsense.lua:4: in main chunk
	[C]: in function 'dofile'
	...e/**/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: in ?

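Fix: CUDNN_PATH must point to the library file itself, not to a directory. Before running, set export CUDNN_PATH=/usr/local/cuda/lib64/libcudnn.so
To make it permanent:

sudo gedit  /etc/profile
export CUDNN_PATH=/usr/local/cuda/lib64/libcudnn.so        # add this at the end of the file
source /etc/profile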
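Problem 1 came from CUDNN_PATH holding a directory followed by a ':'-separated list instead of the path of one library file, which is what the cuDNN error message earlier asks for. A sketch of a sanity check to run before th; check_cudnn_path is a hypothetical helper name:

```shell
# Hypothetical helper (not part of cudnn.torch): CUDNN_PATH should name one
# library file, so reject ':'-separated lists and anything that is not a file.
check_cudnn_path() {
  case "$1" in
    *:*)
      echo "CUDNN_PATH should be a single file, not a ':'-separated list: $1" >&2
      return 1
      ;;
  esac
  if [ ! -f "$1" ]; then
    echo "CUDNN_PATH is not a regular file: $1" >&2
    return 1
  fi
}
```

Usage: check_cudnn_path "$CUDNN_PATH" && th main_test_bound_realsense.lua ...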
Problem 2: Are you using an older or newer version of CuDNN?

cuDNN 7.4 is too new for these Torch7 cuDNN bindings, which expect cuDNN 5.x:

Found Environment variable CUDNN_PATH = /usr/local/cuda/lib64/libcudnn.so/home/**/torch/install/bin/lua: /home/**/torch/install/share/lua/5.2/trepl/init.lua:389: /home/**/torch/install/share/lua/5.2/trepl/init.lua:389: /home/**/torch/install/share/lua/5.2/cudnn/ffi.lua:1618: These bindings are for CUDNN 5.x (5005 <= cudnn.version > 6000) , while the loaded CuDNN is version: 7401  
Are you using an older or newer version of CuDNN?
stack traceback:
	[C]: in function 'error'
	/home/**/torch/install/share/lua/5.2/trepl/init.lua:389: in function 'require'
	main_test_bound_realsense.lua:4: in main chunk
	[C]: in function 'dofile'
	...e/**/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: in ?
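The number 7401 in this message is the cuDNN version encoded as MAJOR*1000 + MINOR*100 + PATCHLEVEL, i.e. cuDNN 7.4.1, while the bindings require the 5.x range given in the message (from 5005 up to 6000). Renaming the file to libcudnn.so.5, as done earlier, does not change the version the library reports. A sketch that computes the same number from the header; cudnn_version_number is a hypothetical helper, and the header location is an assumption (for cuDNN 7 usually cudnn.h under /usr/include or /usr/local/cuda/include):

```shell
# Hypothetical helper: prints the cuDNN version number in the form used by the
# error above, MAJOR*1000 + MINOR*100 + PATCHLEVEL (cuDNN 7.4.1 -> 7401).
# $1: path to cudnn.h (location is an assumption; adjust for your install).
cudnn_version_number() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major * 1000 + minor * 100 + patch }
  ' "$1"
}
```

Usage: cudnn_version_number /usr/include/cudnn.h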

Fix: following this post, install the R7 branch of cudnn.torch, which provides Torch bindings for cuDNN 7:
https://blog.csdn.net/Geek_of_CSDN/article/details/80461129

cd /home/lt/torch
git clone https://github.com/soumith/cudnn.torch.git -b R7 && cd cudnn.torch && luarocks make cudnn-scm-1.rockspec
Problem 3: cannot open <../pre_train_model/bound.t7> in mode r
/home/**/torch/install/bin/lua: cannot open <../pre_train_model/bound.t7> in mode r  at /home/**/torch/pkg/torch/lib/TH/THDiskFile.c:673
stack traceback:
	[C]: in ?
	[C]: in function 'DiskFile'
	/home/**/torch/install/share/lua/5.2/torch/File.lua:405: in function 'load'
	main_test_bound_realsense.lua:38: in main chunk
	[C]: in function 'dofile'
	...e/**/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: in ?

Fix: copy the pre-trained model bound.t7 into ../pre_train_model/, the path passed with -test_model.

Problem 4: attempt to call global 'unpack' (a nil value)
/home/**/torch/install/bin/lua: BatchIterator_scannet.lua:24: attempt to call global 'unpack' (a nil value)
stack traceback:
	BatchIterator_scannet.lua:24: in function 'nextBatchRealsense'
	main_test_bound_realsense.lua:50: in main chunk
	[C]: in function 'dofile'
	...e/**/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: in ?

Cause: in Lua 5.2 the global unpack function was moved into the table library, so it must be called as table.unpack.
Fix: in the DeepCompletionRelease-master/torch directory, change unpack on line 24 of BatchIterator_scannet.lua to table.unpack.
Success!
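The one-line edit can also be scripted. A sketch assuming GNU sed (for -E and -i); use_table_unpack is a hypothetical helper name, and it rewrites only bare unpack( calls on the given line, so existing table.unpack( calls stay unchanged if it runs twice:

```shell
# Hypothetical helper: on line $2 of Lua file $1, rewrite bare "unpack(" calls
# to "table.unpack(". Calls already preceded by "." or an identifier character
# (table.unpack, myunpack) are not matched, so the edit is idempotent.
use_table_unpack() {
  sed -E -i "${2}s/(^|[^.[:alnum:]_])unpack\(/\1table.unpack(/g" "$1"
}
```

Usage (in DeepCompletionRelease-master/torch): use_table_unpack BatchIterator_scannet.lua 24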

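Lesson learned: when a problem comes up, read the error message first and only then look for solutions online; don't start searching the web the moment something fails.

2. Problems encountered when running the following command, and their fixes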

th main_test_realsense.lua -test_model ../pre_train_model/normal_scannet.t7 -test_file ./data_list/realsense_list.txt -root_path ../data/realsense/
Problem 1: module 'hdf5' not found: No LuaRocks module found for hdf5

A. Open a terminal in /usr/lib/x86_64-linux-gnu and run:

sudo ln -s libhdf5_serial.so.100.0.1 libhdf5.so
sudo ln -s libhdf5_serial_hl.so libhdf5_hl.so

Then, following https://blog.csdn.net/weixin_43165871/article/details/88992354:

sudo apt-get install libhdf5-serial-dev hdf5-tools
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="/usr/lib/x86_64-linux-gnu/"

Problem: Missing dependencies for hdf5: totem
First build totem, following https://stackoverflow.com/questions/45499973/missing-dependency-for-hdf5-totem-error-failed-cloning-git-repository-git-clo

git clone https://github.com/deepmind/torch-totem.git
cd torch-totem
cp rocks/totem-0-0.rockspec ./   # copy the rockspec file into the root directory of the project
luarocks make

Then go back to the torch-hdf5 directory and run again:
luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="/usr/lib/x86_64-linux-gnu/"
Problem 2: module 'bit' not found: No LuaRocks module found for bit

A: (1) First install lua and luarocks.
Install lua 5.2.3: https://blog.csdn.net/hp_cpp/article/details/87641222
Install luarocks 3.0.4: https://blog.csdn.net/hp_cpp/article/details/87643911
(2) Then run:

luarocks install luabitop

Failed! The module could not be installed with luarocks install.

B: Install the bit module another way:
1. Download the library from http://bitop.luajit.org/download.html
2. Extract it: tar xvzf LuaBitOp-1.0.2.tar.gz
3. Open a terminal in ~/LuaBitOp-1.0.2 and run make
4. Run make install
Success!

Problem 3: /torch/install/share/lua/5.2/hdf5/ffi.lua:73: expected align(#) on line 679

Fix: switch both gcc and g++ to version 4.8:

sudo update-alternatives --config gcc
sudo update-alternatives --config g++

Solved!

Problem 4: /home/lt/torch/install/share/lua/5.2/hdf5/ffi.lua:88: Unsupported HDF5 version: 1.10.4

Approach A: install HDF5 1.8.12:
(1) Download it from https://support.hdfgroup.org/ftp/HDF5/releases/hdf5-1.8/hdf5-1.8.12/src/
(2) Build and install:

sudo tar -xvf hdf5-1.8.12.tar.gz
cd hdf5-1.8.12/
sudo ./configure --prefix=/usr/local/hdf5         # installation path
sudo make
sudo make check   
sudo make install
make check-install 

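This alone did not fix the error.

Approach B: following https://github.com/deepmind/torch-hdf5/issues/76#issuecomment-292811730, edit config.lua in /home/lt/torch/install/share/lua/5.2/hdf5 so that it points to the HDF5 installed under /usr/local/hdf5: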

hdf5._config = {
    HDF5_INCLUDE_PATH = "/usr/local/hdf5/include/",
    HDF5_LIBRARIES = "/usr/local/hdf5/lib/libhdf5.so;/usr/lib/x86_64-linux-gnu/librt.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/home/lt/anaconda3/lib/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so"
}
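Every entry in HDF5_LIBRARIES has to exist on your own machine; the libz.so entry above, for example, points into the author's Anaconda installation. A sketch that lists missing entries; check_hdf5_libraries is a hypothetical helper name, and you pass it the string from your own config.lua:

```shell
# Hypothetical helper: checks a ';'-separated HDF5_LIBRARIES string (as in
# hdf5/config.lua), prints each path that does not exist, and fails if any
# path is missing.
check_hdf5_libraries() {
  missing=0
  old_ifs=$IFS
  IFS=';'
  for lib in $1; do
    if [ ! -e "$lib" ]; then
      echo "missing: $lib" >&2
      missing=1
    fi
  done
  IFS=$old_ifs
  [ "$missing" -eq 0 ]
}
```

Usage: check_hdf5_libraries "/usr/local/hdf5/lib/libhdf5.so;/usr/lib/x86_64-linux-gnu/libdl.so"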

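Success!!
Add the numbers of the images you want to test (e.g. 050) to /media/lt/c8470c47-e40b-4a4a-9a40-c8c0736564fe/lt/lsn/depth/code8/code8/torch/data_list/realsense_list.txt,
and you will get the occlusion boundaries and surface normals of those images.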
