1. Error during training:
Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
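Cause: the GPU is fully occupied. Run nvidia-smi to check GPU usage.
Solution: Option 1, close other applications that are using the GPU. Option 2, comment out USE_CUDNN := 1 in Makefile.config to disable cuDNN acceleration, then rebuild with make all -j8 (-jx runs x parallel jobs; choose x to suit your machine).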
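The nvidia-smi check can also be scripted. Below is a minimal sketch that parses the CSV output of nvidia-smi's memory query and lists the GPUs that look nearly full. The helper names and the 90% threshold are illustrative assumptions, not part of Caffe or nvidia-smi.

```python
import subprocess

# nvidia-smi query that prints one "used, total" line (in MiB) per GPU
QUERY = ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines into a list of (used, total) integer tuples."""
    usage = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        usage.append((used, total))
    return usage


def busy_gpus(usage, threshold=0.9):
    """Return indices of GPUs whose memory use exceeds the assumed threshold."""
    return [i for i, (used, total) in enumerate(usage)
            if used / total > threshold]


def query_gpu_memory():
    """Run nvidia-smi and return parsed usage (requires an NVIDIA driver)."""
    return parse_gpu_memory(subprocess.check_output(QUERY).decode())
```

On a machine with an NVIDIA driver, busy_gpus(query_gpu_memory()) gives the indices of GPUs that are probably too full to start another training job.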
2. import caffe fails with: No module named _caffe
Cause: pycaffe was not rebuilt after Caffe was recompiled.
Solution: make pycaffe
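A quick way to tell whether pycaffe has been built is to look for the compiled _caffe extension under python/caffe. The sketch below assumes the default Makefile layout, where make pycaffe places the _caffe shared object in that directory; pycaffe_is_built is a hypothetical helper name.

```python
import glob
import os


def pycaffe_is_built(caffe_root):
    """Return True if a compiled _caffe extension exists under python/caffe."""
    pattern = os.path.join(caffe_root, "python", "caffe", "_caffe*.so")
    return len(glob.glob(pattern)) > 0
```

If this returns False for your Caffe checkout, run make pycaffe before importing caffe.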
3. import caffe fails with: No module named caffe
Cause: the caffe/python directory is not on PYTHONPATH.
Solution: add export PYTHONPATH="/home/abc/caffe/python:$PYTHONPATH" to ~/.bashrc, then run source ~/.bashrc
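If editing ~/.bashrc is not convenient, the same directory can be added to the module search path from inside a script. This is a sketch of an alternative, not the method described above; add_caffe_to_path is a hypothetical helper and /home/abc/caffe is the example path from the export line.

```python
import os
import sys


def add_caffe_to_path(caffe_root):
    """Put <caffe_root>/python at the front of sys.path (once) and return it."""
    python_dir = os.path.join(os.path.expanduser(caffe_root), "python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir
```

Call add_caffe_to_path("/home/abc/caffe") before import caffe. The change only lasts for the current Python process.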
4. Running compute_image_mean fails with:
terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc Aborted (core dumped)
Cause: the labels in label.txt do not start from 0.
Solution: regenerate label.txt so that the labels start from 0, for example:
/home/wangjiachun/Documents/caffe/data/att_faces/train/s1/1.jpg 0
/home/wangjiachun/Documents/caffe/data/att_faces/train/s1/2.jpg 0
/home/wangjiachun/Documents/caffe/data/att_faces/train/s1/3.jpg 0
/home/wangjiachun/Documents/caffe/data/att_faces/train/s1/4.jpg 0
/home/wangjiachun/Documents/caffe/data/att_faces/train/s1/5.jpg 0
/home/wangjiachun/Documents/caffe/data/att_faces/train/s1/6.jpg 0
/home/wangjiachun/Documents/caffe/data/att_faces/train/s2/1.jpg 1
/home/wangjiachun/Documents/caffe/data/att_faces/train/s2/2.jpg 1
/home/wangjiachun/Documents/caffe/data/att_faces/train/s2/3.jpg 1
/home/wangjiachun/Documents/caffe/data/att_faces/train/s2/4.jpg 1
/home/wangjiachun/Documents/caffe/data/att_faces/train/s2/5.jpg 1
/home/wangjiachun/Documents/caffe/data/att_faces/train/s2/6.jpg 1
/home/wangjiachun/Documents/caffe/data/att_faces/train/s3/1.jpg 2
/home/wangjiachun/Documents/caffe/data/att_faces/train/s3/2.jpg 2
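Regenerating the label file can be automated. The sketch below rewrites "<path> <label>" lines so that the labels start at 0; it also makes them contiguous, which is an assumption beyond what the note requires. remap_labels is a hypothetical helper name.

```python
def remap_labels(lines):
    """Rewrite '<path> <label>' lines so labels are contiguous and start at 0."""
    parsed = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last whitespace so paths containing spaces survive
        path, label = line.rsplit(None, 1)
        parsed.append((path, int(label)))
    # Map each distinct original label to its rank in sorted order
    distinct = sorted({label for _, label in parsed})
    mapping = {old: new for new, old in enumerate(distinct)}
    return ["%s %d" % (path, mapping[label]) for path, label in parsed]
```

For example, a file whose labels run 1, 2, 3 comes back with labels 0, 1, 2 and the paths unchanged.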
5. import caffe fails with the following error:
Failed to include caffe_pb2, things might go wrong!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/abc/workplace/caffe/python/caffe/__init__.py", line 4, in <module>
    from .proto.caffe_pb2 import TRAIN, TEST
  File "/home/abc/workplace/caffe/python/caffe/proto/caffe_pb2.py", line 17, in <module>
    serialized_pb='\n\x0b\x63\x61\x66\x66\x65.proto\x12\x05\x63\x61\x66\x66\x65\"\x1c\n\tBlobShape\x12\x0f\n\x03\x64i...'
  File "/home/abc/anaconda3/lib/python3.6/site-packages/google/protobuf/descriptor.py", line 824, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: expected bytes, str found
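Cause: the generated caffe_pb2.py is faulty. As the traceback shows, it passes serialized_pb as a str where the Python 3 protobuf package expects bytes.
Solution: replace caffe/python/caffe/proto/caffe_pb2.py with a correct copy of caffe_pb2.py.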
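Before swapping the file, it can help to confirm that caffe_pb2.py really has this problem. The sketch below assumes the faulty file assigns serialized_pb a plain quoted string, as in the traceback, while a usable file wraps it in a bytes literal or a helper call. has_str_serialized_pb is a hypothetical helper name.

```python
import re

# Matches `serialized_pb=` followed directly by a quote, i.e. a plain str
# literal; `serialized_pb=b'...'` or `serialized_pb=_b('...')` do not match.
_PLAIN_STR = re.compile(r"serialized_pb\s*=\s*['\"]")


def has_str_serialized_pb(source_text):
    """Return True if the source assigns serialized_pb a plain str literal."""
    return _PLAIN_STR.search(source_text) is not None
```

Read caffe/python/caffe/proto/caffe_pb2.py into a string and pass it to has_str_serialized_pb. A True result points to the file itself, and not the protobuf installation, as the thing to replace.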
6. Error during training:
Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
Solution: reduce the batch size.