[ RUN ] PowerLayerTest/3.TestPowerOneGradient
F0319 15:50:19.414253 22426 math_functions.cu:92] Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
*** Check failure stack trace: ***
@ 0x7f574fe7e78d google::LogMessage::Fail()
@ 0x7f574fe80d43 google::LogMessage::SendToLog()
@ 0x7f574fe7e31b google::LogMessage::Flush()
@ 0x7f574fe7fc8e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f574e348e4e caffe::caffe_gpu_scal<>()
@ 0x7f574e336906 caffe::PowerLayer<>::Forward_gpu()
@ 0x455772 caffe::Layer<>::Forward()
@ 0x496300 caffe::GradientChecker<>::CheckGradientSingle()
@ 0x50ea94 caffe::GradientChecker<>::CheckGradientEltwise()
@ 0x6b69b5 caffe::PowerLayerTest<>::TestBackward()
@ 0x7ac6b3 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x7a5cca testing::Test::Run()
@ 0x7a5e18 testing::TestInfo::Run()
@ 0x7a5ef5 testing::TestCase::Run()
@ 0x7a71cf testing::internal::UnitTestImpl::RunAllTests()
@ 0x7a74f3 testing::UnitTest::Run()
@ 0x44bfa9 main
@ 0x7f574d5a8830 __libc_start_main
@ 0x451c39 _start
Makefile:478: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)
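First, to be clear: this error comes from make runtest, so it should not be a problem in the code itself. It is almost certainly a problem with my configuration. The Caffe I built is the version modified by the SegNet authors, but that should not be the cause. Even so, I plan to run the official Caffe as a check.
My earlier installation tutorial, http://www.cnblogs.com/SweetBeens/p/8525131.html, mentioned that installing CUDA 9 on Ubuntu 16.04 might cause problems, and now the problem has arrived. The official documentation describes this error as follows: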
The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons.
To correct: check that the hardware, an appropriate version of the driver, and the cuBLAS library are correctly installed.
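The "(13 vs. 0)" in the log is the numeric cublasStatus_t value that Caffe's check received, next to the expected success value. As a minimal sketch, the helper below decodes that part of a glog line; the function name decode_cublas_check is hypothetical, and the table lists the enum values as declared in the cuBLAS headers.

```python
import re

# Numeric values of cublasStatus_t as declared in the cuBLAS headers.
CUBLAS_STATUS = {
    0: "CUBLAS_STATUS_SUCCESS",
    1: "CUBLAS_STATUS_NOT_INITIALIZED",
    3: "CUBLAS_STATUS_ALLOC_FAILED",
    7: "CUBLAS_STATUS_INVALID_VALUE",
    8: "CUBLAS_STATUS_ARCH_MISMATCH",
    11: "CUBLAS_STATUS_MAPPING_ERROR",
    13: "CUBLAS_STATUS_EXECUTION_FAILED",
    14: "CUBLAS_STATUS_INTERNAL_ERROR",
    15: "CUBLAS_STATUS_NOT_SUPPORTED",
    16: "CUBLAS_STATUS_LICENSE_ERROR",
}

# glog prints a failed equality check as "(actual vs. expected)".
CHECK_RE = re.compile(
    r"Check failed: status == CUBLAS_STATUS_SUCCESS \((\d+) vs\. (\d+)\)"
)


def decode_cublas_check(line):
    """Return (code, name) for a failed cuBLAS check line, or None if absent."""
    match = CHECK_RE.search(line)
    if match is None:
        return None
    actual = int(match.group(1))
    return actual, CUBLAS_STATUS.get(actual, "unknown status")
```

Feeding it the fatal line from the log returns the code 13 together with its name, which is the status the documentation excerpt above describes.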
There is a fairly popular discussion thread on GitHub:
https://github.com/BVLC/caffe/issues/2417
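The fixes people report there are to install CUDA 8.0, and some say to build without cuDNN.
But I am stubborn and will not give up until every other option is exhausted, so I do not want to use CUDA 8.
That leaves the following ideas:
1. Run the official Caffe to see whether the cause is that segnet_caffe is too old or something similar.
2. Change the gcc version.
3. Change the driver and CUDA versions.
Let me try them. It is also strange that Ubuntu 16.04 with CUDA 9.1 works for some other people.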
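Before changing gcc, the driver, or CUDA, it helps to record which versions are actually installed. The following is a rough sketch, not a definitive check: it assumes gcc, nvcc, and nvidia-smi are on the PATH, and the helper names run, parse_nvcc_release, parse_gcc_version, and report are made up for this example. Any tool that is missing is reported as None.

```python
import re
import shutil
import subprocess


def run(cmd):
    """Return the command's output, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout if result.returncode == 0 else None


def parse_nvcc_release(text):
    """Extract '9.0' from 'Cuda compilation tools, release 9.0, V9.0.176'."""
    match = re.search(r"release (\d+\.\d+)", text or "")
    return match.group(1) if match else None


def parse_gcc_version(text):
    """Extract the dotted version printed by `gcc -dumpversion`."""
    match = re.search(r"(\d+(?:\.\d+)*)", text or "")
    return match.group(1) if match else None


def report():
    """Collect the gcc, CUDA toolkit, and NVIDIA driver versions."""
    driver = run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    )
    driver_lines = driver.strip().splitlines() if driver else []
    return {
        "gcc": parse_gcc_version(run(["gcc", "-dumpversion"])),
        "cuda_toolkit": parse_nvcc_release(run(["nvcc", "--version"])),
        "driver": driver_lines[0] if driver_lines else None,
    }


if __name__ == "__main__":
    for name, value in report().items():
        print(name, "=", value if value is not None else "not found")
```

Running it before and after each change gives a record of which combination of gcc, driver, and CUDA toolkit was in place when make runtest failed or passed.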
Planned troubleshooting process
1. I ran the official Caffe and got the following error:
[ FAILED ] EmbedLayerTest/3.TestForward, where TypeParam = caffe::GPUDevice<double> (1 ms)
[ RUN ] EmbedLayerTest/3.TestGradient
[ OK ] EmbedLayerTest/3.TestGradient (101 ms)
[ RUN ] EmbedLayerTest/3.TestSetUp
[ OK ] EmbedLayerTest/3.TestSetUp (0 ms)
[ RUN ] EmbedLayerTest/3.TestForwardWithBias
F0319 17:26:25.959848 30839 math_functions.cu:42] Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
*** Check failure stack trace: ***
@ 0x7f787028d78d google::LogMessage::Fail()
@ 0x7f787028fd43 google::LogMessage::SendToLog()
@ 0x7f787028d31b google::LogMessage::Flush()
@ 0x7f787028ec8e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f786e105672 caffe::caffe_gpu_gemm<>()
@ 0x7f786e13d75a caffe::EmbedLayer<>::Forward_gpu()
@ 0x476522 caffe::Layer<>::Forward()
@ 0x4f5ef6 caffe::EmbedLayerTest_TestForwardWithBias_Test<>::TestBody()
@ 0x90b393 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x9049aa testing::Test::Run()
@ 0x904af8 testing::TestInfo::Run()
@ 0x904bd5 testing::TestCase::Run()
@ 0x905eaf testing::internal::UnitTestImpl::RunAllTests()
@ 0x9061d3 testing::UnitTest::Run()
@ 0x469fed main
@ 0x7f786d2ec830 __libc_start_main
@ 0x471a69 _start
Makefile:532: recipe for target 'runtest' failed
make: *** [runtest] Aborted (core dumped)
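The SegNet build and the official build fail in different tests, so it is worth comparing where each trace enters Caffe. The sketch below (the summarize helper is a name invented here) pulls the last test that started and the first caffe:: frame out of a runtest log. The two sample logs are trimmed copies of the traces in this post.

```python
import re

RUN_RE = re.compile(r"\[\s*RUN\s*\]\s+(\S+)")
FRAME_RE = re.compile(r"@\s+0x[0-9a-f]+\s+(caffe::\S+)")


def summarize(log_text):
    """Return (last test started, first caffe:: frame after it)."""
    test_name = None
    first_frame = None
    for line in log_text.splitlines():
        match = RUN_RE.search(line)
        if match:
            test_name = match.group(1)
            first_frame = None  # a new test started; forget older frames
            continue
        match = FRAME_RE.search(line)
        if match and first_frame is None:
            first_frame = match.group(1)
    return test_name, first_frame


segnet_log = """[ RUN ] PowerLayerTest/3.TestPowerOneGradient
@ 0x7f574fe7e78d google::LogMessage::Fail()
@ 0x7f574e348e4e caffe::caffe_gpu_scal<>()
@ 0x7f574e336906 caffe::PowerLayer<>::Forward_gpu()
"""

official_log = """[ RUN ] EmbedLayerTest/3.TestForwardWithBias
@ 0x7f787028d78d google::LogMessage::Fail()
@ 0x7f786e105672 caffe::caffe_gpu_gemm<>()
@ 0x7f786e13d75a caffe::EmbedLayer<>::Forward_gpu()
"""

if __name__ == "__main__":
    print(summarize(segnet_log))
    print(summarize(official_log))
```

In both cases the first Caffe frame is one of Caffe's thin wrappers around a cuBLAS call (caffe_gpu_scal and caffe_gpu_gemm), which fits the idea that the fault lies in the CUDA and cuBLAS setup and not in the SegNet changes.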
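Solution
Two commenters below said that downloading the patch solves the problem:
“Problem solved, thanks! Here is the link, by the way, since it took me quite a while to find: https://developer.nvidia.com/cuda-90-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal This is for the CUDA 9.0 series; download Patch 2 (Released Mar 5, 2018) under Patches.”
This may be a useful reference.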
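I myself ended up simply using Anaconda, together with the latest graphics driver, so whichever version I install works without problems.
In practice, with Anaconda you hardly need to think about version issues or prerequisite dependencies. It is a Python distribution that is widely used internationally for deep learning work. I recommend it to everyone: it is like LaTeX, in that once you have learned it you never want to go back to Word.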