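Installing is much easier now than it was a year ago, with far fewer fiddly steps.
Pay attention to the version compatibility between CUDA, cuDNN, the GPU (driver), and TensorFlow.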
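The version pairing can be kept in a small lookup table. The sketch below is an illustration only: the values are transcribed from the tested build configurations in the TensorFlow install guide for Windows and should be re-checked against that page, and `TESTED_CONFIGS` and `expected_cuda` are hypothetical names.

```python
# Tested tensorflow-gpu build configurations on Windows (assumption: transcribed
# from the TensorFlow install guide; verify against the official table).
TESTED_CONFIGS = {
    "1.15": {"cuda": "10.0", "cudnn": "7.4"},
    "2.0": {"cuda": "10.0", "cudnn": "7.4"},
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
}


def expected_cuda(tf_version):
    """Return the CUDA/cuDNN pair listed for a TensorFlow version, or None."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CONFIGS.get(major_minor)


print(expected_cuda("1.15.2"))
```

None of the listed releases pairs with CUDA 10.2, which is consistent with the DLL problems described in sections 3 and 4.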
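References
Installing TensorFlow (GPU version) on Windows 10: a detailed guide, starting from scratch
CUDA Toolkit and GPU driver version compatibility table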
1. Install CUDA
Google search: cuda 10.2.141 driver
https://developer.nvidia.com/cuda-10.2-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal
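To avoid installing unnecessary components, choose the custom installation here.
My C: drive has plenty of free space, so I left the install path unchanged.
Installation finished.
As for the CUDA environment variables, on my machine they were added automatically once the installation completed.
Launch an example from the CUDA samples folder and check its output; it shows how long the sample takes on the GPU and on the CPU.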
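2. Install cuDNN
Visit this page (url) to check which cuDNN version matches which CUDA version.
cuDNN download page: https://developer.nvidia.com/rdp/form/cudnn-download-survey
Pick the build for your system; I chose cuDNN Library for Windows 10.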
After downloading and extracting the archive:
Copy F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\bin\cudnn*.dll
to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin
Copy F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\include\cudnn*.h
to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include
Copy the files in F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\lib\x64
to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
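The three copy steps can also be scripted. This is a minimal sketch under a few assumptions: `copy_cudnn` is a hypothetical helper, the files in lib\x64 are taken to match `cudnn*.lib`, and the script runs from a prompt with permission to write under Program Files.

```python
import glob
import os
import shutil


def copy_cudnn(cudnn_root, cuda_root):
    """Copy cuDNN DLLs, headers and import libraries into a CUDA install."""
    patterns = [
        (os.path.join("bin", "cudnn*.dll"), "bin"),
        (os.path.join("include", "cudnn*.h"), "include"),
        # Assumption: only the cudnn*.lib files in lib\x64 are needed.
        (os.path.join("lib", "x64", "cudnn*.lib"), os.path.join("lib", "x64")),
    ]
    copied = []
    for pattern, dest_subdir in patterns:
        dest_dir = os.path.join(cuda_root, dest_subdir)
        os.makedirs(dest_dir, exist_ok=True)
        for src in sorted(glob.glob(os.path.join(cudnn_root, pattern))):
            shutil.copy2(src, dest_dir)  # keeps timestamps of the original file
            copied.append(os.path.join(dest_subdir, os.path.basename(src)))
    return copied
```

It would be called with the extracted `...\cuda` folder as `cudnn_root` and the `...\CUDA\v10.2` folder as `cuda_root`; the return value lists what was copied.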
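Open a cmd window and type control sysdm.cpl
Add an environment variable:
Variable name: CUDA_PATH
Variable value: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
On my machine this variable had already been added automatically when CUDA was installed.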
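A quick way to confirm the variable is in place is to inspect the environment from Python. The helper below is a hypothetical sketch; the name `check_cuda_env` and the extra check that the `bin` folder is on PATH are assumptions added here, and the comparison uses Windows conventions (`;` separators, case-insensitive paths).

```python
import os


def check_cuda_env(env, expected_root):
    """Return a list of problems found in CUDA-related environment variables."""
    def normalize(path):
        return path.rstrip("\\/").lower()

    problems = []
    cuda_path = env.get("CUDA_PATH")
    if cuda_path is None:
        problems.append("CUDA_PATH is not set")
    elif normalize(cuda_path) != normalize(expected_root):
        problems.append("CUDA_PATH points to " + cuda_path)

    bin_dir = expected_root.rstrip("\\/") + "\\bin"
    path_entries = [normalize(p) for p in env.get("PATH", "").split(";")]
    if normalize(bin_dir) not in path_entries:
        problems.append(bin_dir + " is not on PATH")
    return problems


print(check_cuda_env(dict(os.environ),
                     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2"))
```

An empty list means both checks passed.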
If you develop with Visual Studio, you also need to add cudnn.lib to your project:
under "Project" -> "Properties" -> "Linker" -> "Input" -> "Additional Dependencies", add cudnn.lib and confirm.
Test whether TensorFlow is using the GPU
Create a new Python file
and run the following:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
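Check the output: if the setup worked, a GPU device is listed. Note that in the output below only an XLA_GPU device appears and the log says "Skipping registering GPU devices..." because cudart64_101.dll was not found; section 3 deals with that.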
2020-07-20 19:46:00.994189: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-07-20 19:46:00.994416: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-07-20 19:46:03.496997: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-20 19:46:03.506445: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27217c3dcf0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-20 19:46:03.506899: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-07-20 19:46:03.518994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-20 19:46:03.575822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 6 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-07-20 19:46:03.579216: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-07-20 19:46:03.662971: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-20 19:46:03.714476: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-20 19:46:03.735304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-20 19:46:03.804328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-20 19:46:03.838115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-20 19:46:03.947359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-20 19:46:03.947536: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-07-20 19:46:04.065192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-20 19:46:04.065428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-07-20 19:46:04.065533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-07-20 19:46:04.070126: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27217c3c970 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-20 19:46:04.070448: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 11094447684184939916
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11168722219354010654
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 5706144976707029859
physical_device_desc: "device: XLA_GPU device"
]
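The listing above contains CPU, XLA_CPU and XLA_GPU entries but no plain GPU entry, which matches the "Skipping registering GPU devices..." warning. A minimal sketch for checking this programmatically follows; `has_usable_gpu` and `local_device_types` are hypothetical helper names, and treating a missing "GPU" entry as "not usable" is an assumption based on that warning.

```python
def has_usable_gpu(device_types):
    """True only if a plain "GPU" device is present (XLA_GPU alone does not count)."""
    return "GPU" in set(device_types)


def local_device_types():
    """Device types seen by TensorFlow; requires TensorFlow to be installed."""
    from tensorflow.python.client import device_lib
    return [d.device_type for d in device_lib.list_local_devices()]


# Device types taken from the listing above:
print(has_usable_gpu(["CPU", "XLA_CPU", "XLA_GPU"]))  # False
```

On a machine with TensorFlow installed, `has_usable_gpu(local_device_types())` performs the same check on the live device list.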
3. Running the test file in Jupyter Notebook fails to use the GPU
It keeps reporting: Could not load dynamic library 'cudart64_101.dll'
Going back to step 2, the output contains this message:
2020-07-20 19:46:00.994189: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
In C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin
there is a cudart64_102.dll (the CUDA 10.2 runtime),
but no cudart64_101.dll, which is the file this TensorFlow build looks for.
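The warning quoted above already names the file TensorFlow wants, so the required runtime version can be read off the log. The sketch below is illustrative: both function names are hypothetical, and the rule "last digit is the minor version" is an assumption that fits the CUDA 10.x file names seen here.

```python
import re

MISSING_DLL = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the distinct library names reported as not loadable, in order."""
    seen = []
    for name in MISSING_DLL.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def cuda_version_from_cudart(dll_name):
    """Map e.g. 'cudart64_101.dll' to '10.1' (assumes CUDA 10.x style naming)."""
    match = re.fullmatch(r"cudart64_(\d+)(\d)\.dll", dll_name)
    if match is None:
        return None
    return match.group(1) + "." + match.group(2)


line = ("W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
        "Could not load dynamic library 'cudart64_101.dll'; "
        "dlerror: cudart64_101.dll not found")
for name in missing_libraries(line):
    print(name, "->", cuda_version_from_cudart(name))
```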
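Solutions:
Method 1: Rename the file (cudart64_102.dll to cudart64_101.dll).
Method 2: Download cudart64_101.dll from the web
and place it in that folder as CUDART64_101.DLL.
Run again: OK!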
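For Method 1, copying instead of renaming keeps the original cudart64_102.dll available to anything else that needs it. A minimal sketch, with `alias_dll` as a hypothetical helper name; this is a workaround, and section 4 describes a case where it was not enough.

```python
import os
import shutil


def alias_dll(bin_dir, existing_name, wanted_name):
    """Copy existing_name to wanted_name inside bin_dir, keeping the original."""
    src = os.path.join(bin_dir, existing_name)
    dst = os.path.join(bin_dir, wanted_name)
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    if os.path.exists(dst):
        return False  # the wanted DLL is already there, leave it alone
    shutil.copy2(src, dst)
    return True
```

A call such as `alias_dll(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin", "cudart64_102.dll", "cudart64_101.dll")` needs write access to that folder.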
References:
Fixing "cudart64_101.dll not found"
https://blog.csdn.net/qq_32939413/article/details/105525025
4. Fixing the "status: Internal: invalid device function" error
Cause in brief: the CUDA and TensorFlow versions are incompatible.
tensorflow-gpu error | Non-OK-status: GpuLaunchKernel | status: Internal: invalid device function
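I took over a new model that uses TensorFlow 1.15.2. After installing tensorflow-gpu==1.15.2, it reported at run time that loading a dynamic library had failed. Since the CUDA 10.2 folder contained DLL files with the same prefix and only a different trailing number, I renamed them and ran again. The dynamic libraries now loaded, but the run ended with "status: Internal: invalid device function".
With no other option, I downloaded CUDA 10.0 from the official site, installed it, and configured the environment variables.
Alternatively, you can copy the DLL files from 10.0 into the 10.2 folder.
As shown:
Ran it again, and the GPU was used successfully.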
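When two CUDA versions are installed side by side, the environment for one process can be pointed at the version it needs. The following is a hedged sketch rather than the procedure used above: `with_cuda_first` is a hypothetical helper, the v10.0 path is assumed to mirror the default v10.2 location, and whether a PATH change made inside Python is picked up depends on it happening before TensorFlow is imported.

```python
def with_cuda_first(env, cuda_root):
    """Return a copy of env whose CUDA_PATH and PATH prefer cuda_root."""
    new_env = dict(env)
    new_env["CUDA_PATH"] = cuda_root
    cuda_bin = cuda_root.rstrip("\\/") + "\\bin"
    entries = [p for p in env.get("PATH", "").split(";") if p]
    # Drop any existing entry for this bin folder so it appears only once, first.
    entries = [p for p in entries if p.rstrip("\\/").lower() != cuda_bin.lower()]
    new_env["PATH"] = ";".join([cuda_bin] + entries)
    return new_env


# Assumed default install location for CUDA 10.0:
CUDA_10_0 = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"
```

Usage would be `os.environ.update(with_cuda_first(os.environ, CUDA_10_0))` at the very top of the training script, before `import tensorflow`.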
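5. Use PyTorch
Go to the PyTorch website (link), choose the options that match your setup (as in the screenshot), and install with conda using the command it generates: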
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
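After the install, it is worth confirming that PyTorch was built for CUDA and can see the card. A minimal sketch; `torch_cuda_summary` is a hypothetical helper name, and it returns None when PyTorch is not installed in the active environment.

```python
def torch_cuda_summary():
    """Report what PyTorch sees; returns None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else None,
    }


print(torch_cuda_summary())
```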
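6. Offline migration of a conda environment
Go to anaconda\envs under the Anaconda installation path
and find the folder named after the environment you want to migrate. Compress it into an archive, copy it to the same path on the new machine, and extract it there. If conda env list does not show the copied environment, restart.
This method only works within the same major version of Anaconda; going from Anaconda2 to Anaconda3, for example, will not work.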
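The archive-and-copy step can be done with the standard library. This is a sketch under stated assumptions: `pack_env` and `unpack_env` are hypothetical helpers, zip is chosen arbitrarily as the archive format, and the envs folder on the new machine sits at the same path as on the old one.

```python
import os
import shutil


def pack_env(envs_dir, env_name, out_dir):
    """Zip <envs_dir>/<env_name> and return the path of the archive."""
    base = os.path.join(out_dir, env_name)
    # base_dir keeps the environment folder itself as the top-level entry.
    return shutil.make_archive(base, "zip", root_dir=envs_dir, base_dir=env_name)


def unpack_env(archive_path, envs_dir):
    """Extract the archive so the environment lands in <envs_dir>/<env_name>."""
    shutil.unpack_archive(archive_path, envs_dir)
```

On the old machine, `pack_env` produces `<env_name>.zip`; on the new machine, `unpack_env` restores the folder under `anaconda\envs`.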
References:
Anaconda tutorial, plus fixes for problems encountered when migrating an environment by copying it directly
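7. Training a TensorFlow model and a PyTorch model on one GPU at the same time
Later I had another model to train, this one written in PyTorch.
At first I started PyTorch first; when I then started TensorFlow, it reported that no device was available.
The error message: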
2020-09-21 23:12:59.250765: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-09-21 23:12:59.253410: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "run_train_win.py", line 46, in <module>
run_train()
File "run_train_win.py", line 42, in run_train
train(args=args)
File "../..\keras_bert_ner\train.py", line 138, in train
validation_data=devs)
......
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(6272, 11), b.shape=(11, 11), m=6272, n=11, k=11
[[{{node loss/CRF_loss/crf_loss/MatMul_1}}]]
[[Mean/_831]]
(1) Internal: Blas GEMM launch failed : a.shape=(6272, 11), b.shape=(11, 11), m=6272, n=11, k=11
[[{{node loss/CRF_loss/crf_loss/MatMul_1}}]]
0 successful operations.
0 derived errors ignored.
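After some searching, I found an answer saying that TensorFlow by default claims nearly all of the GPU's memory at startup, so when TensorFlow starts second it finds the GPU already in use and fails to load properly.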
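One commonly used way to stop TensorFlow from claiming the memory up front is the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable described in the TensorFlow GPU guide. The sketch below only sets that variable; `request_gpu_memory_growth` is a hypothetical helper, and whether this resolves the error above was not tested here.

```python
import os


def request_gpu_memory_growth(env=os.environ):
    """Ask TensorFlow to allocate GPU memory on demand instead of up front.

    Has to run before TensorFlow initializes the GPU to have any effect.
    """
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


request_gpu_memory_growth()
# import tensorflow as tf   # import only after the variable has been set
```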
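Reference:
https://www.zhihu.com/question/353248304
周军:
"Here is one that has nothing to do with GPU memory: on a single card you have to load TF first and then PyTorch, otherwise you get a cuDNN initialization error."
After switching to starting TensorFlow first and PyTorch second, both started up fine.
However, PyTorch, started second, threw an error partway through: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Error output: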
2020-09-21T23:37:43.852019 step: 1700, loss: 7405.79
Traceback (most recent call last):
File "train.py", line 162, in <module>
train(model, train_iter, optimizer, criterion, device)
File "train.py", line 32, in train
loss.backward()
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\autograd\__init__.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Exception raised from _cudnn_rnn_backward_input at ..\aten\src\ATen\native\cudnn\RNN.cpp:923 (most recent call first):
00007FFC409A75A200007FFC409A7540 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFB87654F3600007FFB87654E80 torch_cuda.dll!at::native::Descriptor<cudnnRNNStruct,&cudnnCreateRNNDescriptor,&cudnnDestroyRNNDescriptor>::Descriptor<cudnnRNNStruct,&cudnnCreateRNNDescriptor,&cudnnDestroyRNNDescriptor> [<unknown file> @ <unknown line number>]
00007FFB8766BDBB00007FFB87669770 torch_cuda.dll!at::native::_cudnn_rnn_backward [<unknown file> @ <unknown line number>]
00007FFB87669CD000007FFB87669770 torch_cuda.dll!at::native::_cudnn_rnn_backward [<unknown file> @ <unknown line number>]
00007FFB876C284800007FFB8767E0A0 torch_cuda.dll!at::native::set_storage_cuda_ [<unknown file> @ <unknown line number>]
00007FFB876D107D00007FFB8767E0A0 torch_cuda.dll!at::native::set_storage_cuda_ [<unknown file> @ <unknown line number>]
00007FFBDE95BBF100007FFBDE8CD9D0 torch_cpu.dll!at::native::mkldnn_sigmoid_ [<unknown file> @ <unknown line number>]
00007FFBDE9AB9DA00007FFBDE9A8FA0 torch_cpu.dll!at::bucketize_out [<unknown file> @ <unknown line number>]
00007FFBDE992ECA00007FFBDE992D40 torch_cpu.dll!at::_cudnn_rnn_backward [<unknown file> @ <unknown line number>]
00007FFBDFC9088900007FFBDFC4E010 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]
00007FFBDFC9D12D00007FFBDFC4E010 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]
00007FFBDE95BBF100007FFBDE8CD9D0 torch_cpu.dll!at::native::mkldnn_sigmoid_ [<unknown file> @ <unknown line number>]
00007FFBDE9AB9DA00007FFBDE9A8FA0 torch_cpu.dll!at::bucketize_out [<unknown file> @ <unknown line number>]
00007FFBDE992ECA00007FFBDE992D40 torch_cpu.dll!at::_cudnn_rnn_backward [<unknown file> @ <unknown line number>]
00007FFBDFB9C12D00007FFBDFB9BAF0 torch_cpu.dll!torch::autograd::generated::CudnnRnnBackward::apply [<unknown file> @ <unknown line number>]
00007FFBDFB87E9100007FFBDFB87B50 torch_cpu.dll!torch::autograd::Node::operator() [<unknown file> @ <unknown line number>]
00007FFBE00EF9BA00007FFBE00EF300 torch_cpu.dll!torch::autograd::Engine::add_thread_pool_task [<unknown file> @ <unknown line number>]
00007FFBE00F03AD00007FFBE00EFFD0 torch_cpu.dll!torch::autograd::Engine::evaluate_function [<unknown file> @ <unknown line number>]
00007FFBE00F4FE200007FFBE00F4CA0 torch_cpu.dll!torch::autograd::Engine::thread_main [<unknown file> @ <unknown line number>]
00007FFBE00F4C4100007FFBE00F4BC0 torch_cpu.dll!torch::autograd::Engine::thread_init [<unknown file> @ <unknown line number>]
00007FFBC5FF0A7700007FFBC5FCA150 torch_python.dll!THPShortStorage_New [<unknown file> @ <unknown line number>]
00007FFBE00EBF1400007FFBE00EB780 torch_cpu.dll!torch::autograd::Engine::get_base_engine [<unknown file> @ <unknown line number>]
00007FFC819803BA00007FFC81980360 ucrtbase.dll!o_exp [<unknown file> @ <unknown line number>]
00007FFC82567E9400007FFC82567E80 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFC84F87AD100007FFC84F87AB0 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]
After some searching, I found the following fix:
import torch.backends.cudnn
torch.backends.cudnn.enabled = False
Tracing it to its definition in PyTorch:
# Add type annotation for the replaced module
enabled: bool
deterministic: bool
benchmark: bool
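Instead of switching cuDNN off globally, the flag can be limited to the code that fails. The sketch below uses the `torch.backends.cudnn.flags` context manager around a small LSTM forward and backward pass; the layer sizes and the function name are made up for illustration, and the function returns None when PyTorch is not installed.

```python
def lstm_step_without_cudnn():
    """Run one LSTM forward/backward pass with cuDNN disabled for that block."""
    try:
        import torch
    except ImportError:
        return None
    device = "cuda" if torch.cuda.is_available() else "cpu"
    lstm = torch.nn.LSTM(input_size=8, hidden_size=4, batch_first=True).to(device)
    x = torch.randn(2, 5, 8, device=device)
    # Inside the block the other cuDNN flags take the context manager's defaults.
    with torch.backends.cudnn.flags(enabled=False):
        out, _ = lstm(x)
        out.sum().backward()
    return tuple(out.shape)


print(lstm_step_without_cudnn())
```

Disabling cuDNN makes the RNN fall back to PyTorch's native kernels, which is usually slower, so scoping the change keeps the rest of the model unaffected.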
So far I have found this one description:
What the PyTorch torch.backends.cudnn settings do
and some other material that may be useful:
https://blog.csdn.net/qq_39938666/article/details/86611474
https://github.com/pytorch/pytorch/issues/17543
https://github.com/NVIDIA/tacotron2/issues/109
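Error log
CUDA out of memory
Cause: the chosen epochs and batch_size were too large; batch_size is the setting that drives GPU memory use, so reduce it first.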
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 6001/6001 [00:02<00:00, 2226.86it/s]
Load Data Done
Initial model...
Initial model Done
Start Train...
Traceback (most recent call last):
File "train.py", line 165, in <module>
train(model, train_iter, optimizer, criterion, device)
File "train.py", line 28, in train
loss = model.neg_log_likelihood(x, y) # logits: (N, T, VOCAB), y: (N, T)
File "D:\programing\Bert-BiLSTM-CRF-pytorch\Bert-BiLSTM-CRF-pytorch\crf.py", line 150, in neg_log_likelihood
feats = self._get_lstm_features(sentence) #[batch_size, max_len, 16]
File "D:\programing\Bert-BiLSTM-CRF-pytorch\Bert-BiLSTM-CRF-pytorch\crf.py", line 159, in _get_lstm_features
embeds = self._bert_enc(sentence) # [8, 75, 768]
File "D:\programing\Bert-BiLSTM-CRF-pytorch\Bert-BiLSTM-CRF-pytorch\crf.py", line 108, in _bert_enc
encoded_layer, _ = self.bert(x)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 733, in forward
output_all_encoded_layers=output_all_encoded_layers)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 406, in forward
hidden_states = layer_module(hidden_states, attention_mask)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 392, in forward
intermediate_output = self.intermediate(attention_output)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 365, in forward
hidden_states = self.intermediate_act_fn(hidden_states)
File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 124, in gelu
return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 754.00 MiB (GPU 0; 11.00 GiB total capacity; 4.27 GiB already allocated; 524.59 MiB free; 8.10 GiB reserved in total by PyTorch)
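A simple way to find a batch size that fits is to halve it until a training step no longer raises this error. The sketch below is framework-agnostic: `largest_fitting_batch_size` is a hypothetical helper and `run_step` stands for any callable that runs one training step at the given batch size.

```python
def largest_fitting_batch_size(run_step, start, minimum=1):
    """Halve the batch size until run_step(batch_size) stops raising CUDA OOM."""
    batch_size = start
    while batch_size >= minimum:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # unrelated failure, do not hide it
            batch_size //= 2
    raise RuntimeError("no batch size >= %d fits in GPU memory" % minimum)


def fake_step(batch_size):
    """Stand-in for a real training step: pretends anything above 8 is too big."""
    if batch_size > 8:
        raise RuntimeError("CUDA out of memory. Tried to allocate ...")


print(largest_fitting_batch_size(fake_step, 64))  # 8
```

With a real PyTorch step, calling `torch.cuda.empty_cache()` between attempts helps release memory held over from the failed try.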