caffe errors

mjiansun

Category: Caffe

Article link: https://blog.csdn.net/u013066730/article/details/79379694

1. Caffe needs no introduction. Start by installing the basic dependencies:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler  
sudo apt-get install --no-install-recommends libboost-all-dev  
sudo apt-get install python-skimage ipython python-pil python-h5py python-gflags python-yaml
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev

2. Clone Caffe

cd ~/git  
git clone https://github.com/BVLC/caffe.git  
cd caffe  
cp Makefile.config.example Makefile.config

3. If cuDNN is installed, uncomment USE_CUDNN := 1 in Makefile.config

sed -i 's/# USE_CUDNN := 1/USE_CUDNN := 1/' Makefile.config

4. If OpenBLAS is installed, change the BLAS setting

sed -i 's/BLAS := atlas/BLAS := open/' Makefile.config
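
The two sed edits in steps 3 and 4 can also be expressed as a small Python helper, which is handy when the same Makefile.config is patched repeatedly. This is only a sketch: the function name patch_makefile_config and its parameters are hypothetical, and it assumes the stock lines "# USE_CUDNN := 1" and "BLAS := atlas" are present as in Makefile.config.example.

```python
import re


def patch_makefile_config(text, use_cudnn=False, blas=None):
    """Return Makefile.config text with the step 3 / step 4 edits applied.

    Hypothetical helper: it mirrors the two sed commands above and
    touches nothing else in the file.
    """
    if use_cudnn:
        # Uncomment the cuDNN switch (step 3).
        text = re.sub(r"^#\s*USE_CUDNN := 1", "USE_CUDNN := 1", text, flags=re.M)
    if blas is not None:
        # Replace the BLAS backend, e.g. "atlas" -> "open" (step 4).
        text = re.sub(r"^BLAS := \w+", "BLAS := " + blas, text, flags=re.M)
    return text
```

To use it, read Makefile.config into a string, call patch_makefile_config(text, use_cudnn=True, blas="open"), and write the result back.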

5. Install the Python requirements, then build and test Caffe and compile PyCaffe

sudo pip install -r python/requirements.txt  
make all -j $(($(nproc) + 1))  
make test -j $(($(nproc) + 1))  
make runtest -j $(($(nproc) + 1))  
make pycaffe -j $(($(nproc) + 1))

6. Add the Caffe environment variables (run this from the caffe directory)

echo "export CAFFE_ROOT=$(pwd)" >> ~/.bashrc
echo 'export PYTHONPATH=$CAFFE_ROOT/python:$PYTHONPATH' >> ~/.bashrc  
source ~/.bashrc
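
Before trying import caffe, it can help to confirm that the two variables from step 6 are set the way you expect. The sketch below only inspects an environment mapping; the function names are hypothetical and it assumes the layout used above (pycaffe living in $CAFFE_ROOT/python).

```python
import os


def caffe_python_dir(environ=None):
    """Return $CAFFE_ROOT/python, or None when CAFFE_ROOT is not set."""
    env = os.environ if environ is None else environ
    root = env.get("CAFFE_ROOT")
    return os.path.join(root, "python") if root else None


def pythonpath_has_caffe(environ=None):
    """Return True when PYTHONPATH contains $CAFFE_ROOT/python."""
    env = os.environ if environ is None else environ
    target = caffe_python_dir(env)
    if target is None:
        return False
    entries = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    return os.path.normpath(target) in [os.path.normpath(p) for p in entries]
```

Calling pythonpath_has_caffe() with no arguments checks the current process environment; a False result suggests ~/.bashrc was not sourced or CAFFE_ROOT points somewhere unexpected.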

7. Test the Caffe Python interface

ipython  
>>> import caffe  
>>> exit()

In theory everything above installs cleanly, but errors can still occur. This blog post describes them in detail:

http://blog.csdn.net/u012576214/article/details/68947893

The fixes below are listed in the order in which the errors appear. For convenience, you can apply all of them up front and then install Caffe.
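
Errors 1 to 4 below are all missing-header errors, so a quick check for those headers can tell you in advance which packages are absent. This is a minimal sketch: the helper name missing_headers is hypothetical, and the include directories you pass in (for example /usr/include and /usr/include/hdf5/serial) are assumptions about your system.

```python
import os

# Headers named in errors 1-4 below.
REQUIRED_HEADERS = ["gflags/gflags.h", "cblas.h", "hdf5.h", "lmdb.h"]


def missing_headers(include_dirs, headers=REQUIRED_HEADERS):
    """Return the headers that are not found under any of include_dirs."""
    missing = []
    for header in headers:
        found = any(os.path.isfile(os.path.join(d, header)) for d in include_dirs)
        if not found:
            missing.append(header)
    return missing
```

For example, missing_headers(["/usr/include", "/usr/local/include", "/usr/include/hdf5/serial"]) returns the subset of headers the compiler will probably fail to find.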

1. ./include/caffe/common.hpp:5:27: fatal error: gflags/gflags.h: No such file or directory

Fix:

sudo apt-get install libgflags-dev

2. ./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory

Fix:

sudo apt-get install libblas-dev

3. ./include/caffe/util/hdf5.hpp:6:18: fatal error: hdf5.h: No such file or directory

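The same fix covers linker errors about -lhdf5 and -lhdf5_hl.

Fix: in Makefile.config, find the lines below and append the extra paths (shown in blue in the original post).

Important: below the line "# Whatever else you find you need goes here.", change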
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
to:
INCLUDE_DIRS :=  $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial
This is needed because Ubuntu 16.04 moved the header and library locations, in particular those of hdf5, so the paths have to be updated.

cd /usr/lib/x86_64-linux-gnu

Then, depending on your system (the version suffixes may differ), run the following two commands:
sudo ln -s libhdf5_serial.so.10.1.0 libhdf5.so
sudo ln -s libhdf5_serial_hl.so.10.0.2 libhdf5_hl.so
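
Because the version suffixes of the hdf5 libraries differ between systems, it may be easier to look up the actual file names before creating the symlinks. The sketch below picks a target for each link from a directory listing; the function name pick_hdf5_links is hypothetical, and it assumes the libhdf5_serial naming used above.

```python
import re


def pick_hdf5_links(filenames):
    """Map each unversioned link name to the versioned hdf5_serial file to link to."""
    wanted = {
        "libhdf5.so": r"libhdf5_serial\.so\.[\d.]+$",
        "libhdf5_hl.so": r"libhdf5_serial_hl\.so\.[\d.]+$",
    }
    links = {}
    for link_name, pattern in wanted.items():
        matches = sorted(f for f in filenames if re.match(pattern, f))
        if matches:
            # Prefer the longest name, i.e. the most specific version suffix.
            links[link_name] = max(matches, key=len)
    return links
```

Passing os.listdir("/usr/lib/x86_64-linux-gnu") gives the names to use in place of libhdf5_serial.so.10.1.0 and libhdf5_serial_hl.so.10.0.2 in the ln -s commands.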

4. ./include/caffe/util/db_lmdb.hpp:8:18: fatal error: lmdb.h: No such file or directory

Fix:

sudo apt install liblmdb-dev

5. /usr/bin/ld: cannot find -lcblas
/usr/bin/ld: cannot find -latlas

Fix:

sudo apt install libatlas-base-dev

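6. libopencv-dev has broken dependencies and cannot be installed

When running sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler, the only package that failed was libopencv-dev.

The error below appeared. I tried a pile of tutorials from the web without luck and finally switched package sources, which took many more attempts before I found a suitable one.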

libopencv-dev : Depends: libopencv-objdetect-dev (= 2.4.8+dfsg1-2ubuntu1) ...

The following is based on http://blog.csdn.net/wopawn/article/details/52302164

Add the sources

In a terminal, run:

cd /etc/apt/  
sudo cp sources.list sources.list.bak

Then

sudo gedit /etc/apt/sources.list

Append the following sources at the end of the file:

deb http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse

Then refresh the package index and upgrade the installed packages.
In a terminal, run:

sudo apt-get update
sudo apt-get upgrade

Installing the dependencies should now work:

sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libhdf5-serial-dev protobuf-compiler

7. "libcudart.so.8.0: cannot open shared object file: No such file or directory"

The fix is to copy the library into /usr/local/lib:

# Mind your own CUDA version number!

sudo cp /usr/local/cuda-8.0/lib64/libcudart.so.8.0 /usr/local/lib/libcudart.so.8.0 && sudo ldconfig

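Alternatively:

Yesterday I installed darknet on a server to run Real-Time Object Detection, which requires CUDA. CUDA was already installed on the server, so I edited the Makefile:

Set GPU, CUDNN and OPENCV to 1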

nvcc=/usr/local/cuda-8.0/bin/nvcc

Then make failed with:

libcudart.so.8.0: cannot open shared object file: No such file or directory

After a long search online I found a fix:

sudo ldconfig /usr/local/cuda/lib64

That single command fixed my problem, so I looked up what ldconfig does. ldconfig normally runs at system boot; after installing a new shared library, you have to run it by hand.
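
One way to confirm that ldconfig had the intended effect is to ask the dynamic linker to open the library by name. The sketch below does that through Python's ctypes module; the helper name can_load is hypothetical, and the library name you pass (for example libcudart.so.8.0) depends on your CUDA version.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic linker can open library_name, else False."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be found.
        return False
```

If can_load("libcudart.so.8.0") is still False after running sudo ldconfig /usr/local/cuda/lib64, the CUDA library directory is probably different from the one passed to ldconfig.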

8. Failed to compile cuda_ndarray.cu: libcublas.so.7.5: cannot open shared object file (CUDA 7.5)

sudo ldconfig /usr/local/cuda-7.5/lib64

9. .build_release/lib/libcaffe.so: undefined reference to cv::imread(cv::String const&, int)

OpenCV was already set up on my machine. The installed version can be checked with:

$ pkg-config --modversion opencv

Since I had built it myself, the output was 3.1.0, as expected.

So the error above is most likely a linking problem with opencv_imgcodecs. A fairly reliable fix is to add the libraries OpenCV needs to the Makefile: find LIBRARIES (the line just before PYTHON_LIBRARIES := boost_python python2.7) and change it to:

LIBRARIES += glog gflags protobuf leveldb snappy \
             lmdb boost_system hdf5_hl hdf5 m \
             opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs

Problem 10: syncedmem.hpp:18 Check failed: error == cudaSuccess (30 vs. 0)

Update, 2016.06.28

Today a colleague (Qian) told me her Torch would not run. I took a look and suspected that CUDA was broken. I recompiled Caffe on the server and, sure enough, ran into the following:

F0628 15:34:16.652927 50205 syncedmem.hpp:18] Check failed: error == cudaSuccess (30 vs. 0) unknown error
*** Check failure stack trace: ***
@ 0x2ab5de98fdaa (unknown)
@ 0x2ab5de98fce4 (unknown)
@ 0x2ab5de98f6e6 (unknown)
@ 0x2ab5de992687 (unknown)
@ 0x2ab5e0959ef9 caffe::SyncedMemory::mutable_cpu_data()
@ 0x2ab5e0957618 caffe::Blob<>::Reshape()
@ 0x2ab5e0957c7a caffe::Blob<>::Reshape()
@ 0x57643c caffe::MemoryDataLayerTest<>::SetUp()
@ 0x8fa70a testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x8efd71 testing::Test::Run()
@ 0x8efec7 testing::TestInfo::Run()
@ 0x8f0005 testing::TestCase::Run()
@ 0x8f027d testing::internal::UnitTestImpl::RunAllTests()
@ 0x8fa28a testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x8ef641 testing::UnitTest::Run()
@ 0x46d027 main
@ 0x2ab5e1933f45 (unknown)
@ 0x4748e9 (unknown)
@ (nil) (unknown)
make: *** [runtest] Aborted (core dumped)

I found the answer online: installing one package is enough: sudo apt-get install nvidia-modprobe

After that, run make runtest -j again, and it passes!

Problem 11: undefined reference to imdecode()

Today, while compiling Caffe for a colleague (Ji), I hit the following errors:

.build_release/lib/libcaffe.so: undefined reference to cv::imdecode(cv::_InputArray const&, int)
.build_release/lib/libcaffe.so: undefined reference to cv::imread(cv::String const&, int)

Possible fixes include:
1. In Makefile.config, uncomment the pkg-config option, USE_PKG_CONFIG := 1 (tested, works)
2. In Makefile.config, uncomment OPENCV_VERSION := 3 (tested, works)
3. In Makefile, find LIBRARIES (the line just before PYTHON_LIBRARIES := boost_python python2.7) and change it to (tested, works):

LIBRARIES += glog gflags protobuf leveldb snappy \
             lmdb boost_system hdf5_hl hdf5 m \
             opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs
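
The reason opencv_imgcodecs matters is that imread and imdecode live in the imgcodecs module starting with OpenCV 3. The sketch below shows the version-dependent choice of link libraries; the function names are hypothetical, and the base library list is simply copied from the LIBRARIES line above.

```python
def opencv_link_libs(opencv_major_version):
    """Return the OpenCV libraries to link for a given major version."""
    libs = ["opencv_core", "opencv_highgui", "opencv_imgproc"]
    if opencv_major_version >= 3:
        # imread/imdecode are provided by the imgcodecs module in OpenCV 3.
        libs.append("opencv_imgcodecs")
    return libs


def libraries_line(opencv_major_version):
    """Build a LIBRARIES line like the one shown above."""
    base = ["glog", "gflags", "protobuf", "leveldb", "snappy",
            "lmdb", "boost_system", "hdf5_hl", "hdf5", "m"]
    return "LIBRARIES += " + " ".join(base + opencv_link_libs(opencv_major_version))
```

With an OpenCV 2.x install, opencv_link_libs(2) omits opencv_imgcodecs, since that library does not exist there.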

Reference: https://blog.csdn.net/u011636440/article/details/82660697?utm_source=blogxgwz9

Problem 12: libopencv_core.so.3.1: cannot open shared object file: No such file or directory

Right after that I hit another problem, this time during make runtest -j:

Error while loading libraries: libopencv_core.so.3.1: cannot open shared object file: No such file or directory

After some googling, these two references led me to the fix:
1. https://github.com/BVLC/caffe/issues/3700
2. http://www.cnphp6.com/archives/141601

The error means the OpenCV 3 libraries cannot be found. Add their directory to the library search path:

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

After that, make runtest -j passed completely.
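
When the tests are launched from a script rather than an interactive shell, the same search-path change can be prepared in Python and handed to the child process. This is a sketch only: prepend_path is a hypothetical helper, and it merely builds the new value without modifying the environment itself.

```python
import os


def prepend_path(current, new_dir):
    """Return a path list with new_dir first and no duplicate of it later."""
    entries = [p for p in current.split(os.pathsep) if p]
    entries = [p for p in entries if p != new_dir]
    return os.pathsep.join([new_dir] + entries)
```

A typical use would be env = dict(os.environ); env["LD_LIBRARY_PATH"] = prepend_path(env.get("LD_LIBRARY_PATH", ""), "/usr/local/lib"), then passing env to subprocess when running make runtest. Changing the variable inside an already running process does not affect libraries that process loads itself.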

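Problem 13: Check failed: error == cudaSuccess (8 vs. 0) invalid device function

Today, while reading a paper, I needed Facebook's Caffe fork for 3D convolution: C3D: a modified version of BVLC caffe to support 3D ConvNets. So I cloned it and set it up. The Caffe that Facebook used is a very early one; judging by the source, it dates from 2014.
During setup, make all -j and make test -j both passed; only make runtest -j got stuck, which stumped even a self-styled "50 years of configuring Caffe" veteran like me. After some googling I did find a fix.
My fix may not apply to your case, but if it helps, great! ^_^

My problem looked like this: [screenshot in the original post]

After googling, I traced the problem to this part of Makefile.config: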

# CUDA architecture setting: going with all of them (up to CUDA 5.5 compatible).
# For the latest architecture, you need to install CUDA >= 6.0 and uncomment
# the *_50 lines below.
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
             -gencode arch=compute_20,code=sm_21 \
             -gencode arch=compute_30,code=sm_30 \
             -gencode arch=compute_35,code=sm_35
             #-gencode=arch=compute_50,code=sm_50 \
             #-gencode=arch=compute_50,code=compute_50 \

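This is the relevant part of my unmodified Makefile.config. The error occurs when the configured architectures do not match the GPU's compute capability, so change CUDA_ARCH to the value that matches your GPU.
[Table in the original post: compute capabilities of common GPUs]

Mine is a TITAN X with compute capability 5.2, so I changed CUDA_ARCH in Makefile.config as follows:

#CUDA_ARCH := -gencode arch=compute_20,code=sm_20
#             -gencode arch=compute_20,code=sm_21
#             -gencode arch=compute_30,code=sm_30
#             -gencode arch=compute_35,code=sm_35
#             -gencode=arch=compute_50,code=sm_50
#             -gencode=arch=compute_50,code=compute_50
CUDA_ARCH := -gencode arch=compute_52,code=compute_52

In other words, everything else is commented out and a single line for your own GPU's compute capability remains: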

CUDA_ARCH := -gencode arch=compute_52,code=compute_52
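
The mapping from a compute capability such as 5.2 to the -gencode flags is mechanical, so it can be generated. The sketch below is an illustration under one assumption that differs from the line above: it emits both a native binary target (code=sm_XX) and a PTX target (code=compute_XX), whereas the author's setting uses only the PTX form. The function name cuda_arch_flags is hypothetical.

```python
def cuda_arch_flags(compute_capability):
    """Build -gencode flags for a compute capability string such as "5.2"."""
    major, minor = compute_capability.split(".")
    arch = major + minor
    # sm_XX embeds native code for that GPU; compute_XX embeds PTX that the
    # driver can JIT-compile at load time.
    return ("-gencode arch=compute_{0},code=sm_{0} "
            "-gencode arch=compute_{0},code=compute_{0}").format(arch)
```

The returned string can be pasted after CUDA_ARCH := in Makefile.config.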

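Recompile Caffe, then run make runtest -j again: [screenshot in the original post]

As for YOU HAVE 2 DISABLED TESTS, see my other blog post; it can simply be ignored and has no effect.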

Reference:
1. http://blog.csdn.net/u013078356/article/details/51009470
2. http://www.cnblogs.com/yymn/articles/5389904.html

Problem 14: fatal error: pyconfig.h: No such file or directory

In the same environment as the previous problem, make pycaffe then failed with:

/usr/include/boost/python/detail/wrap_python.hpp:50:23: fatal error: pyconfig.h: No such file or directory
compilation terminated.
make: *** [python/caffe/_caffe.so] Error 1

I found the fix on a web page. Following its advice, run:

$ export CPLUS_INCLUDE_PATH=/usr/include/python2.7

Done.

After that, import caffe also succeeds.

Problem 15: cuDNN version mismatch causes errors in `cudnn_conv_layer` during `make`

Today, while compiling the Caffe that ships with fast-rcnn, I got the following errors:

src/caffe/layers/cudnn_conv_layer.cu: error: argument of type cudnnAddMode_t is incompatible with parameter of type const void *
detected during instantiation of void caffe::CuDNNConvolutionLayer Dtype Forward_gpu(const std vector caffe Blob Dtype *, std allocator caffe Blob Dtype &, const std vector caffe Blob Dtype , std allocator caffe Blob Dtype &) [with Dtype=float]
…………
src/caffe/layers/cudnn_conv_layer.cu: error: argument of type “const void *” is incompatible with parameter of type “cudnnTensorDescriptor_t”
…………
src/caffe/layers/cudnn_conv_layer.cu: error: argument of type “const void *” is incompatible with parameter of type “cudnnTensorDescriptor_t”
…………

20 errors detected in the compilation of “/tmp/tmpxft_000045c5_00000000-16_cudnn_conv_layer.compute_50.cpp1.ii”.

make: *** [.build_debug/cuda/src/caffe/layers/cudnn_conv_layer.o] Error 1
make: *** Waiting for unfinished jobs....

This is usually caused by a cuDNN version mismatch: either upgrade cuDNN or downgrade it. In my case I either upgrade cuDNN v2 to cuDNN v4 or downgrade cuDNN v4 to cuDNN v2. cuDNN has already reached v5, but so far these two approaches have solved every case I ran into.
After that, fast-rcnn compiled successfully.

Problem 16: caffe/proto/caffe.pb.h: No such file or directory

I also hit this one while compiling fast-rcnn:

In file included from ./include/caffe/util/device_alternate.hpp:40:0,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from ./include/caffe/layer.hpp:8,
from src/caffe/layer_factory.cpp:3:

./include/caffe/util/cudnn.hpp:8:34: fatal error: caffe/proto/caffe.pb.h: No such file or directory

compilation terminated.

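The fix is based on this blog post: http://blog.csdn.net/xmzwlw/article/details/48270225

Use protoc to generate caffe.pb.h and caffe.pb.cc from caffe/src/caffe/proto/caffe.proto. Run it from the src/caffe/proto directory and write into include/caffe/proto (create it first if missing), since the failing include is caffe/proto/caffe.pb.h:

$ protoc --cpp_out=/home/chenxp/caffe/include/caffe/proto/ caffe.proto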
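
Since the failing include is caffe/proto/caffe.pb.h, the generated header has to end up in include/caffe/proto. The sketch below assembles a protoc command line with that output directory for an arbitrary checkout; protoc_command is a hypothetical helper, and the directory layout is assumed to match the standard Caffe source tree.

```python
import os


def protoc_command(caffe_root):
    """Return the protoc argument list that regenerates caffe.pb.h / caffe.pb.cc."""
    proto_dir = os.path.join(caffe_root, "src", "caffe", "proto")
    out_dir = os.path.join(caffe_root, "include", "caffe", "proto")
    return [
        "protoc",
        "--proto_path=" + proto_dir,   # where caffe.proto lives
        "--cpp_out=" + out_dir,        # where the generated C++ files are written
        os.path.join(proto_dir, "caffe.proto"),
    ]
```

The output directory must exist before the command runs, for example via os.makedirs(out_dir, exist_ok=True) on Python 3, after which the list can be passed to subprocess.check_call.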

This fix works almost every time.

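Problem 17: "Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal" when testing an SSD-caffe setup

This is caused by requesting a GPU that the machine does not have. When training on your own data, it is enough to set device_id in solver.prototxt to the index of a GPU that exists. Indices start at 0: the first GPU is 0, the second is 1, and so on.

The SSD example, however, wraps the training commands in a Python script, ssd_pascal.py, so that script has to be edited. For the setup and training procedure, see this reposted article: http://blog.csdn.net/xunan003/article/details/78427446

Fix: on line 332 of ssd_pascal.py, change gpus = "0,1,2,3" to gpus = "0", that is, delete the trailing 1,2,3. Then start training again.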
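
The edit above amounts to dropping every GPU index that the machine does not have. A minimal sketch of that filtering is shown below; usable_gpus is a hypothetical helper, and the number of available GPUs is something you supply yourself (for example from nvidia-smi).

```python
def usable_gpus(gpus, available_count):
    """Keep only the GPU indices in a "0,1,2,3" style string that exist."""
    ids = [int(x) for x in gpus.split(",") if x.strip()]
    kept = [i for i in ids if 0 <= i < available_count]
    return ",".join(str(i) for i in kept)
```

On a single-GPU machine, usable_gpus("0,1,2,3", 1) gives the value to assign to gpus in ssd_pascal.py.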

Problem 18: SSD fails with "Check failed: error == cudaSuccess (2 vs. 0) invalid ..."

Since I have only one GPU and limited memory, I also had to halve batch_size = 32 (line 337) and accum_batch_size = 32 (line 338) in ssd_pascal.py, i.e. reduce the batch size. Otherwise the error "Check failed: error == cudaSuccess (2 vs. 0) invalid ..." appears.
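
If you would rather shrink only batch_size and keep the accumulated batch the same, gradient accumulation (the solver's iter_size) can make up the difference. The sketch below computes the number of accumulation steps; it is an assumption about how the two numbers relate, the function name iter_size_for is hypothetical, and it is not taken from the post, which simply halves both values.

```python
import math


def iter_size_for(batch_size, accum_batch_size, num_gpus=1):
    """Return how many forward/backward passes to accumulate per update."""
    # Images processed per GPU in one pass.
    per_device = int(math.ceil(float(batch_size) / num_gpus))
    # Passes needed so that the accumulated batch reaches accum_batch_size.
    return int(math.ceil(float(accum_batch_size) / (per_device * num_gpus)))
```

For example, with one GPU, batch_size = 8 and accum_batch_size = 32, four passes are accumulated per weight update, so memory use drops while the effective batch stays at 32.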

Problem 19: /usr/bin/ld: warning: libcudart.so.8.0, needed by /usr/local/lib/libopencv_core.so, not found (try using -rpath or -rpath-link)

The error may also read: error while loading shared libraries: libcudart.so.8.0: cannot open shared object file: No such file or directory

Fix: first make sure that /etc/profile adds the CUDA 8.0 install path and its library directory:

export PATH=$PATH:/usr/local/cuda-8.0/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-8.0/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-8.0/lib64

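$ source /etc/profile

This applies the configuration. Then run the build again.

If the same error persists, run the following commands to copy the libraries into /usr/local/lib: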

sudo cp /usr/local/cuda-8.0/lib64/libcudart.so.8.0 /usr/local/lib/libcudart.so.8.0 && sudo ldconfig

sudo cp /usr/local/cuda-8.0/lib64/libcublas.so.8.0 /usr/local/lib/libcublas.so.8.0 && sudo ldconfig

sudo cp /usr/local/cuda-8.0/lib64/libcurand.so.8.0 /usr/local/lib/libcurand.so.8.0 && sudo ldconfig

P.S. ldconfig manages the dynamic linker's library cache; running it makes newly added shared libraries available system-wide.

Problem 20

F0913 14:55:17.167636 4775 relu_layer.cu:26] Check failed: error == cudaSuccess (11 vs. 0) invalid argument
*** Check failure stack trace: ***
    @     0x7f43fea005cd google::LogMessage::Fail()
    @     0x7f43fea02433 google::LogMessage::SendToLog()
    @     0x7f43fea0015b google::LogMessage::Flush()
    @     0x7f43fea02e1e google::LogMessageFatal::~LogMessageFatal()
    @     0x7f43ff30c1aa caffe::ReLULayer<>::Forward_gpu()
    @     0x7f43ff21e972 caffe::Net<>::ForwardFromTo()
    @     0x7f43ff21ea97 caffe::Net<>::Forward()
    @     0x7f43ff2d0d80 caffe::Solver<>::Step()
    @     0x7f43ff2d180e caffe::Solver<>::Solve()
    @     0x7f43ff2ee9f4 caffe::P2PSync<>::Run()
    @           0x40b0d0 train()
    @           0x4077c8 main
    @     0x7f43fd196830 __libc_start_main
    @           0x408099 _start
    @              (nil) (unknown)
Aborted (core dumped)

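Don't panic when this happens. There are two cases, each with its own fix.

Case 1:

If it shows up at compile time, all I can say is: get a better GPU.

Case 2:

If compilation succeeds but the error appears when running a program, recompiling Caffe fixes it: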

cd /home/caffe
make clean
make all -j $(($(nproc) + 1))  
make test -j $(($(nproc) + 1))  
make runtest -j $(($(nproc) + 1))  
make pycaffe -j $(($(nproc) + 1))

Then open the environment file with gedit ~/.bashrc and add the following on a new line at the end:

export CAFFE_ROOT=/home/mjsun/git/caffe-segnet
export PYTHONPATH=$CAFFE_ROOT/python:$PYTHONPATH

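Finally, run source ~/.bashrc and you are done.

Mine was the second case. I had assumed all sorts of things were broken and nearly reinstalled the GPU driver and CUDA, but I held back, recompiled Caffe just to see, and it worked. o_0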

Problem 21

AttributeError: 'LayerParameter' object has no attribute 'cpm_transform_param'

Fix:

1) Check whether the layer has actually been implemented.

2) If it has, check whether pycaffe was recompiled afterwards.

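Problem 22

Unknown bottom blob 'data' (layer 'conv1', bottom index 0)

Cause:

The corresponding data blob is missing.

Fix:

A likely cause is that test iterations were configured for training while the network file defines no test network.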

http://blog.csdn.net/zziahgf/article/details/72900948

https://blog.csdn.net/u010167269/article/details/50703923
