Notes on setting up CUDA, cuDNN, TensorFlow, and PyTorch on Windows 10


Installing these now is much more convenient than it was a year ago; there are far fewer fiddly steps.

Pay attention to the version compatibility between CUDA, cuDNN, the GPU driver, and TensorFlow.

References

TensorFlow official site

Steps for installing tensorflow-gpu on Windows 10

Installing CUDA on Windows

cuDNN official documentation

Checking whether TensorFlow is running on the CPU or the GPU

Detailed guide to installing TensorFlow (GPU version) on Windows 10, from scratch

CUDA Toolkit and GPU driver version compatibility table (screenshot)

1. Installing CUDA


Google search: cuda 10.2.141 driver
https://developer.nvidia.com/cuda-10.2-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exelocal
To avoid installing unnecessary components, choose the custom installation here.
My C drive has plenty of space, so I leave the install location unchanged.
Installation complete.
As for the CUDA environment variables, in my case they were added automatically after the installation finished.

Run the samples in the CUDA Samples folder and check the output; you can see the timings on the GPU and the CPU when the sample programs run.
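As a quick sanity check you can also print the toolkit version and run the prebuilt demo binaries; as far as I know the Windows installer puts deviceQuery and bandwidthTest under extras\demo_suite, but the exact contents depend on which components you selected:

nvcc --version
nvidia-smi
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\demo_suite\deviceQuery.exe"
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\extras\demo_suite\bandwidthTest.exe"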

2. Installing cuDNN

Visit this page (url) to check which cuDNN versions match which CUDA versions.
cuDNN download page: https://developer.nvidia.com/rdp/form/cudnn-download-survey
Based on my OS version, I picked cuDNN Library for Windows 10.
After downloading and extracting:
Copy F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\bin\cudnn*.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin
Copy F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\include\cudnn*.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include
Copy F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\lib\x64 to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
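For reference, the same three copy steps can be done from an administrator command prompt (a sketch; the source path is just my download location and will differ on your machine):

copy "F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\bin\cudnn*.dll" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin"
copy "F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\include\cudnn*.h" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include"
copy "F:\下载\ChromeDownload\cudnn-10.2-windows10-x64-v7.6.5.32\cuda\lib\x64\*" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64"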
Open a cmd window and type control sysdm.cpl
Add an environment variable:
Variable name: CUDA_PATH
Variable value: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
In my case this variable had already been added automatically when I installed CUDA.
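If the variable is missing on your machine, it can also be set from the command line instead of the System Properties dialog (a sketch; setx writes a user-level variable and only takes effect in newly opened terminals):

setx CUDA_PATH "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2"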

If you develop with Visual Studio, you also need to add cudnn.lib to your project: go to Project -> Properties -> Linker -> Input -> Additional Dependencies, add cudnn.lib, and click OK.

Testing whether TensorFlow uses the GPU

Create a new Python file and run the following:

from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())

Looking at the output, the GPU device shows up, which means it worked~~

2020-07-20 19:46:00.994189: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-07-20 19:46:00.994416: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-07-20 19:46:03.496997: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-07-20 19:46:03.506445: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27217c3dcf0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-07-20 19:46:03.506899: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-07-20 19:46:03.518994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-07-20 19:46:03.575822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1050 Ti computeCapability: 6.1
coreClock: 1.62GHz coreCount: 6 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 104.43GiB/s
2020-07-20 19:46:03.579216: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-07-20 19:46:03.662971: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-07-20 19:46:03.714476: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-07-20 19:46:03.735304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-07-20 19:46:03.804328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-07-20 19:46:03.838115: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-07-20 19:46:03.947359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-07-20 19:46:03.947536: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-07-20 19:46:04.065192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-20 19:46:04.065428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-07-20 19:46:04.065533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-07-20 19:46:04.070126: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27217c3c970 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-07-20 19:46:04.070448: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 11094447684184939916
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11168722219354010654
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 5706144976707029859
physical_device_desc: "device: XLA_GPU device"
]
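Besides device_lib, a shorter check works too (a minimal sketch; tf.config.list_physical_devices exists from TensorFlow 2.1, older versions have it under tf.config.experimental):

import tensorflow as tf

# prints one PhysicalDevice entry per usable GPU; an empty list means CPU only
print(tf.config.list_physical_devices('GPU'))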

3. Running the test file in Jupyter Notebook fails to use the GPU

It kept complaining that it could not load the dynamic library 'cudart64_101.dll'.

Going back to step 2, the output already contained this warning:
2020-07-20 19:46:00.994189: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found

Looking inside C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin, there is a cudart64_102.dll but no cudart64_101.dll.

Solutions:
Option 1: rename (or copy) the existing DLL to the expected name, as sketched below.
Option 2: download cudart64_101.dll from the internet and drop it into that folder (CUDART64_101.DLL).
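For option 1, one copy command is enough (a sketch; copying rather than renaming keeps the original cudart64_102.dll in place):

copy "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudart64_102.dll" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudart64_101.dll"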

Run it again, and it works!

References:
Fix for "cudart64_101.dll not found":
https://blog.csdn.net/qq_32939413/article/details/105525025

4. Fixing the "status: Internal: invalid device function" error

Cause, in short: the CUDA and TensorFlow versions are incompatible.

tensorflow-gpu error | Non-OK-status: GpuLaunchKernel | status: Internal: invalid device function

I took over a new model that pinned TensorFlow to 1.15.2. After installing tensorflow-gpu==1.15.2, it failed at runtime with dynamic-library loading errors. Since the CUDA 10.2 install already had DLLs with the same prefix, just different version suffixes, I renamed them and ran again. The libraries then loaded, but the run eventually failed with "status: Internal: invalid device function".

With no better option, I downloaded CUDA 10.0 from the official site, installed it, and configured the environment variables.

Alternatively, you can copy the DLLs from the 10.0 installation into the 10.2 directory.

Running again, the GPU was used successfully.

5. Using PyTorch

Go to the PyTorch website, select the options that match your setup, and install with conda using the command it generates.
Link

conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
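After installation, a quick check that this PyTorch build can actually see the GPU (a minimal sketch):

import torch

print(torch.__version__)
print(torch.cuda.is_available())      # True means the CUDA build found a usable GPU
print(torch.cuda.get_device_name(0))  # e.g. GeForce GTX 1050 Ti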

6. Offline migration of a conda environment

Go to anaconda\envs under your Anaconda install path and find the folder named after the environment you want to migrate. Pack it into an archive, copy it to the same path on the new machine, and extract it there. If conda env list does not show the copied environment, a restart fixes it. (A rough command sketch follows the note below.)

This only works within the same major version of Anaconda; going from Anaconda2 to Anaconda3 this way does not work.
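Roughly, the steps above look like this from a command prompt (a sketch: the environment name myenv is hypothetical, the Anaconda path is the one from the tracebacks later in this post, and the built-in tar is only present on recent Windows 10 builds). For cross-machine moves there is also the dedicated conda-pack tool, which I did not use here:

:: on the old machine
cd /d D:\main\Anaconda3\envs
tar -czf myenv.tar.gz myenv

:: copy myenv.tar.gz to the same envs folder on the new machine, then
cd /d D:\main\Anaconda3\envs
tar -xzf myenv.tar.gz
conda env list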

Reference:
Anaconda tutorial + issues encountered when copying environments directly

7. Training a TensorFlow model and a PyTorch model on the same GPU

Later another model needed training, this time in PyTorch.

At first I launched the PyTorch job first; when the TensorFlow job was started afterwards, it reported that no usable device was available.
The error was:

2020-09-21 23:12:59.250765: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-09-21 23:12:59.253410: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "run_train_win.py", line 46, in <module>
    run_train()
  File "run_train_win.py", line 42, in run_train
    train(args=args)
  File "../..\keras_bert_ner\train.py", line 138, in train
    validation_data=devs)
......
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(6272, 11), b.shape=(11, 11), m=6272, n=11, k=11
         [[{{node loss/CRF_loss/crf_loss/MatMul_1}}]]
         [[Mean/_831]]
  (1) Internal: Blas GEMM launch failed : a.shape=(6272, 11), b.shape=(11, 11), m=6272, n=11, k=11
         [[{{node loss/CRF_loss/crf_loss/MatMul_1}}]]
0 successful operations.
0 derived errors ignored.

After some searching, I found answers saying that TensorFlow grabs the entire GPU by default when it starts, so when it is launched second and finds the device already occupied, it cannot initialize properly. (A common mitigation, sketched just below, is to stop TensorFlow from reserving all of the GPU memory up front.)
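This sketch is not from the original post; allow_growth is the TF 1.x knob (the model here runs TF 1.15.2), and TF 2.x has tf.config.experimental.set_memory_growth instead:

import tensorflow as tf

# allocate GPU memory on demand instead of reserving the whole card at startup
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
# or cap the fraction of memory this process may use:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5
sess = tf.compat.v1.Session(config=config)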

Reference:
https://www.zhihu.com/question/353248304
周军:
"Here is one that has nothing to do with GPU memory: on a single card you have to load TF first and then PyTorch, otherwise you get a cuDNN initialization error."

After switching the order, starting TensorFlow first and then PyTorch, both jobs started up without problems.

The PyTorch job (started second) later threw: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

The full error:

2020-09-21T23:37:43.852019 step: 1700, loss: 7405.79
Traceback (most recent call last):
  File "train.py", line 162, in <module>
    train(model, train_iter, optimizer, criterion, device)
  File "train.py", line 32, in train
    loss.backward()
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\autograd\__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Exception raised from _cudnn_rnn_backward_input at ..\aten\src\ATen\native\cudnn\RNN.cpp:923 (most recent call first):
00007FFC409A75A200007FFC409A7540 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFB87654F3600007FFB87654E80 torch_cuda.dll!at::native::Descriptor<cudnnRNNStruct,&cudnnCreateRNNDescriptor,&cudnnDestroyRNNDescriptor>::Descriptor<cu
dnnRNNStruct,&cudnnCreateRNNDescriptor,&cudnnDestroyRNNDescriptor> [<unknown file> @ <unknown line number>]
00007FFB8766BDBB00007FFB87669770 torch_cuda.dll!at::native::_cudnn_rnn_backward [<unknown file> @ <unknown line number>]
00007FFB87669CD000007FFB87669770 torch_cuda.dll!at::native::_cudnn_rnn_backward [<unknown file> @ <unknown line number>]
00007FFB876C284800007FFB8767E0A0 torch_cuda.dll!at::native::set_storage_cuda_ [<unknown file> @ <unknown line number>]
00007FFB876D107D00007FFB8767E0A0 torch_cuda.dll!at::native::set_storage_cuda_ [<unknown file> @ <unknown line number>]
00007FFBDE95BBF100007FFBDE8CD9D0 torch_cpu.dll!at::native::mkldnn_sigmoid_ [<unknown file> @ <unknown line number>]
00007FFBDE9AB9DA00007FFBDE9A8FA0 torch_cpu.dll!at::bucketize_out [<unknown file> @ <unknown line number>]
00007FFBDE992ECA00007FFBDE992D40 torch_cpu.dll!at::_cudnn_rnn_backward [<unknown file> @ <unknown line number>]
00007FFBDFC9088900007FFBDFC4E010 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]
00007FFBDFC9D12D00007FFBDFC4E010 torch_cpu.dll!torch::autograd::GraphRoot::apply [<unknown file> @ <unknown line number>]
00007FFBDE95BBF100007FFBDE8CD9D0 torch_cpu.dll!at::native::mkldnn_sigmoid_ [<unknown file> @ <unknown line number>]
00007FFBDE9AB9DA00007FFBDE9A8FA0 torch_cpu.dll!at::bucketize_out [<unknown file> @ <unknown line number>]
00007FFBDE992ECA00007FFBDE992D40 torch_cpu.dll!at::_cudnn_rnn_backward [<unknown file> @ <unknown line number>]
00007FFBDFB9C12D00007FFBDFB9BAF0 torch_cpu.dll!torch::autograd::generated::CudnnRnnBackward::apply [<unknown file> @ <unknown line number>]
00007FFBDFB87E9100007FFBDFB87B50 torch_cpu.dll!torch::autograd::Node::operator() [<unknown file> @ <unknown line number>]
00007FFBE00EF9BA00007FFBE00EF300 torch_cpu.dll!torch::autograd::Engine::add_thread_pool_task [<unknown file> @ <unknown line number>]
00007FFBE00F03AD00007FFBE00EFFD0 torch_cpu.dll!torch::autograd::Engine::evaluate_function [<unknown file> @ <unknown line number>]
00007FFBE00F4FE200007FFBE00F4CA0 torch_cpu.dll!torch::autograd::Engine::thread_main [<unknown file> @ <unknown line number>]
00007FFBE00F4C4100007FFBE00F4BC0 torch_cpu.dll!torch::autograd::Engine::thread_init [<unknown file> @ <unknown line number>]
00007FFBC5FF0A7700007FFBC5FCA150 torch_python.dll!THPShortStorage_New [<unknown file> @ <unknown line number>]
00007FFBE00EBF1400007FFBE00EB780 torch_cpu.dll!torch::autograd::Engine::get_base_engine [<unknown file> @ <unknown line number>]
00007FFC819803BA00007FFC81980360 ucrtbase.dll!o_exp [<unknown file> @ <unknown line number>]
00007FFC82567E9400007FFC82567E80 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFC84F87AD100007FFC84F87AB0 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]

After searching around, I found the following workaround:

import torch.backends.cudnn
torch.backends.cudnn.enabled = False
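Disabling cuDNN globally like this gives up its speedups everywhere. PyTorch also has a context manager that turns it off only around the failing call (a sketch; I have not verified that this alone is enough for this particular model):

import torch.backends.cudnn

# fall back to the plain CUDA kernels only inside this block
with torch.backends.cudnn.flags(enabled=False):
    loss.backward()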

Locating the setting's definition in the PyTorch source:

# Add type annotation for the replaced module
enabled: bool
deterministic: bool
benchmark: bool

For now I have found this write-up describing it:

What the torch.backends.cudnn settings do in PyTorch

Plus some other possibly useful material:
https://blog.csdn.net/qq_39938666/article/details/86611474

https://github.com/pytorch/pytorch/issues/17543

https://github.com/NVIDIA/tacotron2/issues/109

Error log

CUDA out of memory

The epochs and batch_size I picked were too large; batch_size is what actually drives GPU memory use, so reducing it is the fix.

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 6001/6001 [00:02<00:00, 2226.86it/s]
Load Data Done
Initial model...
Initial model Done
Start Train...
Traceback (most recent call last):
  File "train.py", line 165, in <module>
    train(model, train_iter, optimizer, criterion, device)
  File "train.py", line 28, in train
    loss = model.neg_log_likelihood(x, y) # logits: (N, T, VOCAB), y: (N, T)
  File "D:\programing\Bert-BiLSTM-CRF-pytorch\Bert-BiLSTM-CRF-pytorch\crf.py", line 150, in neg_log_likelihood
    feats = self._get_lstm_features(sentence)  #[batch_size, max_len, 16]
  File "D:\programing\Bert-BiLSTM-CRF-pytorch\Bert-BiLSTM-CRF-pytorch\crf.py", line 159, in _get_lstm_features
    embeds = self._bert_enc(sentence)  # [8, 75, 768]
  File "D:\programing\Bert-BiLSTM-CRF-pytorch\Bert-BiLSTM-CRF-pytorch\crf.py", line 108, in _bert_enc
    encoded_layer, _  = self.bert(x)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 733, in forward
    output_all_encoded_layers=output_all_encoded_layers)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 406, in forward
    hidden_states = layer_module(hidden_states, attention_mask)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 392, in forward
    intermediate_output = self.intermediate(attention_output)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 365, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "D:\main\Anaconda3\envs\Bert-BiLSTM-CRF-pytorch\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 124, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 754.00 MiB (GPU 0; 11.00 GiB total capacity; 4.27 GiB already allocated; 524.59 MiB free; 8.10
 GiB reserved in total by PyTorch)
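Before lowering batch_size further, it can help to watch how close training is to the limit; PyTorch exposes the same counters that appear in the error message above (a minimal sketch):

import torch

gib = 1024 ** 3
print(torch.cuda.get_device_properties(0).total_memory / gib)  # total GPU memory, in GiB
print(torch.cuda.memory_allocated(0) / gib)                    # memory currently held by tensors
print(torch.cuda.memory_reserved(0) / gib)                     # memory reserved by the caching allocator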