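First, make sure CUDA and cuDNN are already installed and configured.
1. Create a virtual environment:
conda create -n py2ssd python=2.7 (on my machine the environment is named fdpy2)
2. Install the base libraries: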
sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler
sudo apt-get install --no-install-recommends libboost-all-dev
sudo apt-get install libopenblas-dev liblapack-dev libatlas-base-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev
sudo apt-get install git cmake build-essential
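3. Install protoc (Caffe requires a protoc version no newer than 2.6.1)
Reference: https://blog.csdn.net/lwplwf/article/details/76532804
First, whereis protoc shows every path where protoc is installed.
which protoc shows the protoc that is picked up by default.
protoc --version shows the current protoc version.
The system default protobuf path is /usr/bin/protoc.
If the version does not meet the requirement, install protoc 2.6.1:
(1) Download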
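The version check above can also be scripted. The following is only a sketch: version_le and protoc_ok are hypothetical helper names (not part of protobuf or Caffe), and the comparison assumes GNU sort with the -V (version sort) option is available.

```shell
# bash sketch; helper names are illustrative assumptions.

# version_le A B: succeed if version A <= version B (relies on GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# protoc_ok "<output of protoc --version>": succeed if that version is <= 2.6.1.
protoc_ok() {
    local ver="${1#libprotoc }"   # strip the leading "libprotoc " prefix
    version_le "$ver" "2.6.1"
}

# Usage:
#   protoc_ok "$(protoc --version)" && echo "protoc version is acceptable"
```

If protoc_ok fails, continue with the 2.6.1 installation steps.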
https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz
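(2) Install
tar -zxvf protobuf-2.6.1.tar.gz # unpack
cd protobuf-2.6.1/ # enter the source directory
./configure # configure the build (installs to /usr/local/bin by default; pass --prefix to use a custom location if you need several protoc versions side by side)
make # compile
make check # run the test suite on the build
sudo make install # install
(3) Check the installed version
protoc --version
On success it prints: libprotoc 2.6.1
If you get an error, or the old version number is still shown, the cause is that protobuf installs its libraries to /usr/local/lib by default, and /usr/local/lib is not in Ubuntu's default LD_LIBRARY_PATH, so the library cannot be found.
In that case open ~/.bashrc (gedit ~/.bashrc; sudo is not needed for a file in your own home directory), add the following line, then reload the file: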
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
source ~/.bashrc
protoc is now installed.
4. Install caffe-ssd
Download caffe-ssd from this link (the author has already added the ReLU6 layer): https://github.com/chuanqi305/ssd
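Re-running this kind of setup tends to leave duplicate export lines in ~/.bashrc. A minimal sketch of an idempotent append follows; add_line_once is a hypothetical helper name introduced here for illustration only.

```shell
# bash sketch; add_line_once is an illustrative helper, not a standard tool.

# add_line_once FILE LINE: append LINE to FILE only if an identical line is not already there.
add_line_once() {
    local file="$1" line="$2"
    touch "$file"                                   # make sure the file exists
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (single quotes keep $LD_LIBRARY_PATH unexpanded inside the file):
#   add_line_once ~/.bashrc 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib'
```

grep -x matches whole lines only and -F treats the text literally, so the $ and : characters in the export line need no escaping.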
cd caffe-ssd
sudo cp Makefile.config.example Makefile.config
sudo gedit Makefile.config
4-1:
Edit Makefile.config:
sudo gedit Makefile.config
(1)USE_CUDNN := 1
(2)USE_OPENCV := 1
(3)OPENCV_VERSION := 3
(4)CUDA_DIR := /usr/local/cuda
(5) With CUDA 9.0 or later, remove the compute capability 20 and 21 entries, as follows:
CUDA_ARCH := -gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_52,code=sm_52 \
-gencode arch=compute_60,code=sm_60 \
-gencode arch=compute_61,code=sm_61 \
-gencode arch=compute_61,code=compute_61
(6) Note: these paths point at the Python packages of the virtual environment (the py2ssd environment created at the start), not at the system-wide Python.
ANACONDA_HOME := $(HOME)/anaconda2/envs/py2ssd
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
$(ANACONDA_HOME)/include/python2.7 \
$(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include
If the virtual environment uses Python 3.6 instead, item (6) becomes:
ANACONDA_HOME := $(HOME)/anaconda2/envs/py3
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
$(ANACONDA_HOME)/include/python3.6m \
$(ANACONDA_HOME)/lib/python3.6/site-packages/numpy/core/include
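Also uncomment the PYTHON_LIBRARIES line and change it to PYTHON_LIBRARIES := boost_python-py35 python3.6m (my system only has boost_python-py35; there is no boost_python-py36). Important: find libpython3.6m.so inside the virtual environment and copy it to /usr/lib/x86_64-linux-gnu. Also confirm that libboost_python-py35.so exists under /usr/lib/x86_64-linux-gnu. If there is no file with the py35 suffix but another py3* file exists, for example libboost_python-py34.so, set PYTHON_LIBRARIES in Makefile.config to boost_python-py34 python3.6m.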
(7) PYTHON_LIB := $(ANACONDA_HOME)/lib
(8) WITH_PYTHON_LAYER := 1
(9) Replace the following two lines:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
with these:
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
(10) BUILD_DIR := build
(11) TEST_GPUID := 0
(12) Q ?= @
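The libboost_python lookup described under item (6) can be sketched as a small shell helper. This is an illustration under assumptions: suggest_python_libraries is a hypothetical name, and it simply takes the first libboost_python-py3*.so it finds in the directory you pass in.

```shell
# bash sketch; suggest_python_libraries is an illustrative helper name.

# suggest_python_libraries LIBDIR PYLIB
# Print a PYTHON_LIBRARIES line using the first libboost_python-py3*.so found in LIBDIR.
# Returns non-zero if no such library exists.
suggest_python_libraries() {
    local libdir="$1" pylib="$2" lib name
    for lib in "$libdir"/libboost_python-py3*.so; do
        [ -e "$lib" ] || continue          # the glob matched nothing
        name="$(basename "$lib" .so)"      # e.g. libboost_python-py35
        name="${name#lib}"                 # the linker name has no "lib" prefix
        echo "PYTHON_LIBRARIES := $name $pylib"
        return 0
    done
    return 1
}

# Usage:
#   suggest_python_libraries /usr/lib/x86_64-linux-gnu python3.6m
```

The printed line is only a suggestion to paste into Makefile.config; check that the chosen boost build is one you actually want to link against.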
4-2:
Edit the Makefile:
sudo gedit Makefile
(1) Replace this line:
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
with:
LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
(2) Specify the protoc path. Change:
$(Q)protoc --proto_path=$(PROTO_SRC_DIR) --cpp_out=$(PROTO_BUILD_DIR) $<
to:
$(Q)/usr/local/bin/protoc --proto_path=$(PROTO_SRC_DIR) --cpp_out=$(PROTO_BUILD_DIR) $<
(3) Change this line:
#NVCCFLAGS += -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
to:
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
Note: if you are building Caffe in a Python 3 virtual environment, also change PYTHON_LIBRARY := boost_python python2.7 (around line 215 of the Makefile) to PYTHON_LIBRARY := boost_python python35 (when your Python is 3.6).
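Edit (1) above can also be applied non-interactively. The sketch below uses sed; patch_hdf5_libs is a hypothetical helper name, and it assumes the LIBRARIES line ends with "hdf5_hl hdf5" exactly as shown in this article.

```shell
# bash sketch; patch_hdf5_libs is an illustrative helper name.

# patch_hdf5_libs MAKEFILE: rewrite a trailing "hdf5_hl hdf5" to the serial
# library names. The previous file content is saved as MAKEFILE.bak.
patch_hdf5_libs() {
    sed -i.bak 's/hdf5_hl hdf5\([[:space:]]*\)$/hdf5_serial_hl hdf5_serial\1/' "$1"
}

# Usage:
#   patch_hdf5_libs Makefile
```

Running it a second time leaves the line unchanged, because the pattern no longer matches once the serial names are in place (the .bak copy is overwritten each run).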
5. Build Caffe:
make all -j4
make test -j4
make runtest -j4
make pycaffe
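The four build commands can be chained so that a failure stops the sequence instead of letting later steps run against a broken build. This is a sketch only: build_caffe and the MAKE_CMD override are hypothetical names introduced here so the sequence can be dry-run.

```shell
# bash sketch; build_caffe and MAKE_CMD are illustrative names.

# build_caffe [JOBS]: run the build steps in order, stopping at the first failure.
# Set MAKE_CMD="echo make" to print the commands instead of running them.
build_caffe() {
    local jobs="${1:-4}"
    local make_cmd="${MAKE_CMD:-make}"
    $make_cmd all -j"$jobs" &&
    $make_cmd test -j"$jobs" &&
    $make_cmd runtest -j"$jobs" &&
    $make_cmd pycaffe
}

# Usage (from the caffe-ssd directory):
#   build_caffe 4
#   MAKE_CMD="echo make" build_caffe 4    # dry run
```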
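6. import caffe (add environment variables)
If you get ImportError: No module named caffe, Caffe's python directory has to be added to your environment. Open ~/.bashrc (vim ~/.bashrc; sudo is not needed) and append the following two lines at the end, replacing the paths with your own:
export PYTHONPATH="/home/lz/Documents/caffe-ssd/python:$PYTHONPATH"
export LD_LIBRARY_PATH=/home/lz/Documents/caffe-ssd/build/lib:$LD_LIBRARY_PATH
Remember to run source ~/.bashrc afterwards. Without these lines you can only import caffe when Python is started from inside that directory.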
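Since the two export lines differ only in the checkout path, they can be generated from it. A minimal sketch, where caffe_env_lines is a hypothetical helper name and the python and build/lib subdirectories are the ones used in this article:

```shell
# bash sketch; caffe_env_lines is an illustrative helper name.

# caffe_env_lines CAFFE_ROOT: print the two export lines for a given checkout path.
caffe_env_lines() {
    local root="${1%/}"    # drop a trailing slash, if any
    printf 'export PYTHONPATH="%s/python:$PYTHONPATH"\n' "$root"
    printf 'export LD_LIBRARY_PATH="%s/build/lib:$LD_LIBRARY_PATH"\n' "$root"
}

# Usage:
#   caffe_env_lines "$HOME/Documents/caffe-ssd" >> ~/.bashrc
#   source ~/.bashrc
#   python -c "import caffe"
```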
Troubleshooting:
1. Error: nvcc fatal : Unknown option 'fPIC'
Fix: a space was missing before -Xcompiler. The correct line is:
NVCCFLAGS += -D_FORCE_INLINES -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
2. Error: undefined reference to 'PyErr_Print'; Py_NoneStruct and similar symbols are also reported as undefined.
Fix: set PYTHON_LIBRARIES := boost_python-py35 python3.6m (my Python is 3.6)
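3. Error: F0104 17:15:55.187031 27536 math_functions.cu:42] Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
Fix: this error usually appears during make runtest. Stop the processes running in other terminals, run make clean, then rebuild; the tests should then pass.
4. Error: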
.build_release/lib/libcaffe.so: undefined reference to `boost::cpp_regex_traits<char>::toi(char const*&, char const*, int) const'
.build_release/lib/libcaffe.so: undefined reference to `boost::re_detail::get_default_error_string(boost::regex_constants::error_type)'
.build_release/lib/libcaffe.so: undefined reference to `boost::re_detail::cpp_regex_traits_implementation<char>::transform(char const*, char const*) const'
.build_release/lib/libcaffe.so: undefined reference to `boost::re_detail::put_mem_block(void*)'
.build_release/lib/libcaffe.so: undefined reference to `boost::basic_regex<char, boost::regex_traits<char, boost::cpp_regex_traits<char> > >::do_assign(char const*, char const*, unsigned int)'
.build_release/lib/libcaffe.so: undefined reference to `boost::re_detail::raise_runtime_error(std::runtime_error const&)'
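Fix: this happens when cross-compiling Caffe and the linker reports undefined reference to `boost::xxxxxx for boost functions.
Edit the Makefile and append the boost library that provides the missing boost::xxxxxx symbols to LIBRARIES; in this case add boost_regex: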
# We will also explicitly add stdc++ to the link target.
LIBRARIES += boost_regex boost_atomic boost_thread stdc++
5. Error:
CXX/LD -o .build_release/test/test_all.testbin src/caffe/test/test_caffe_main.cpp
.build_release/cuda/src/caffe/test/test_im2col_kernel.o: In function `caffe::Im2colKernelTest_Test2D_Test::TestBody()':
tmpxft_000065d0_00000000-5_test_im2col_kernel.compute_61.cudafe1.cpp:(.text._ZN5caffe28Im2colKernelTest_Test2D_TestIdE8TestBodyEv[_ZN5caffe28Im2colKernelTest_Test2D_TestIdE8TestBodyEv]+0xd1f): undefined reference to `void caffe::im2col_gpu_kernel(int, double const*, int, int, int, int, int, int, int, int, int, int, int, int, double*)'
.build_release/cuda/src/caffe/test/test_im2col_kernel.o: In function `caffe::Im2colKernelTest_Test2D_Test::TestBody()':
Fix: the answer is here: https://github.com/BVLC/caffe/issues/6790
6. When running sudo apt-get install libboost-all-dev:
Error:
The following packages have unmet dependencies:
libboost-all-dev : Depends: libboost-iostreams-dev but it is not going to be installed
Depends: libboost-python-dev but it is not going to be installed
Depends: libboost-regex-dev but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
Fix:
Check for held packages:
$ dpkg --get-selections | grep hold
If there are none, install with aptitude instead:
$ sudo apt-get install aptitude
$ sudo aptitude install libboost-all-dev
Alternatively, switch to a different package mirror.
7. Error:
[libprotobuf FATAL google/protobuf/stubs/common.cc:78] This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.0.0). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "/build/mir-O8_xaj/mir-0.26.3+16.04.20170605/obj-x86_64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)
Fix:
This is a protoc version mismatch or conflict. Installing protoc is covered at the start of this article; see also: https://blog.csdn.net/zhou4411781/article/details/100676193
8. Error: Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERRO
Fix: switch to different CUDA and cuDNN versions. Moving from 9.0 to 9.2 resolved it for me.
9. Error: src/caffe/data_transformer.cpp:2:33: fatal error: opencv2/core/core.hpp: No such file or directory
compilation terminated.
Makefile:580: recipe for target '.build_release/src/caffe/data_transformer.o' failed
make: *** [.build_release/src/caffe/data_transformer.o] Error 1
make: *** Waiting for unfinished jobs....
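Fix: sudo apt-get install libopencv-dev
10. Error: cannot find -lopencv_imgcodecs
Fix: imgcodecs ships with OpenCV 3, and my OpenCV is version 3, so I do not know why this fails. As a workaround, keeping USE_OPENCV := 1 enabled in Makefile.config and commenting out OPENCV_VERSION := 3 made the error go away.
11. Error: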
.build_release/tools/caffe: error while loading shared libraries: libcudart.so.9.0: cannot open shared object file: No such file or directory
Fix: sudo cp /usr/local/cuda/lib64/libcudart.so.9.0 /usr/local/lib/libcudart.so.9.0 && sudo ldconfig (most likely the CUDA environment variables were not added after CUDA was installed)
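To see which shared libraries a binary still cannot resolve, the output of ldd can be filtered for "not found" entries. The sketch below is an illustration: missing_libs is a hypothetical helper name, and it assumes the usual ldd line format "name => not found".

```shell
# bash sketch; missing_libs is an illustrative helper name.

# missing_libs: read ldd output on stdin and print the libraries marked "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd .build_release/tools/caffe | missing_libs
```

An empty result means the dynamic linker can locate every library the binary depends on.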