Multimodal Experiment Notes: MMIM

aglo

Last modified: 2024-03-13 12:12:51

Tags: nlp

First published: 2024-03-12 16:34:55

Article link: https://blog.csdn.net/w946612410/article/details/136654901

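1. Problems encountered

1.1 Environment setup:

With a recent GPU such as the NVIDIA GeForce RTX 4090, older PyTorch builds may not support the card's architecture. PyTorch then reports an error such as "capability sm_86 is not compatible" (for the RTX 4090 the reported capability is sm_89), and the same output lists what the current build does support, for example "The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75".

The RTX 4090 has compute capability 8.9, so code pinned to an old torch version will not run on it as-is. Install a torch build that matches your CUDA version.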

# torch 1.11.0 (built against CUDA 11.3; there is no cu118 wheel for torch 1.x)
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu113
# torch 1.13.1 (built against CUDA 11.7)
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

conda install cudatoolkit=11.8
pip install tensorflow==2.11.0

pip install torch-sparse -f https://pytorch-geometric.com/whl/torch-1.13.1+cu117.html
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.13.1+cu117.html
pip install torch-cluster -f https://pytorch-geometric.com/whl/torch-1.13.1+cu117.html
pip install torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.13.1+cu117.html
pip install torch_geometric
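
After installing, it can help to confirm that the installed build actually covers the GPU. The sketch below is a simplified check, not PyTorch's own logic: it assumes that a binary built for sm_XY runs on a device with the same major version and an equal or higher minor version, and it ignores PTX (compute_XY) entries. The helper name capability_is_covered is hypothetical; torch.cuda.get_device_capability() and torch.cuda.get_arch_list() are the real calls that supply its inputs.

```python
def capability_is_covered(capability, arch_list):
    """Return True if some sm_XY entry in arch_list can run on the device.

    capability: (major, minor) tuple, e.g. (8, 9) for an RTX 4090.
    arch_list:  strings such as "sm_75" or "compute_80"; only sm_* is checked.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # PTX entries (compute_XY) are ignored in this sketch
        digits = arch[len("sm_"):]
        if not digits.isdigit():
            continue  # skip variants with a letter suffix
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        # Assumption: same major version, and built for an equal or lower minor.
        if arch_major == major and arch_minor <= minor:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability()
        archs = torch.cuda.get_arch_list()
        print(cap, archs, capability_is_covered(cap, archs))
    else:
        print("torch with CUDA is not available in this environment")
```

With the architecture list quoted in the error message above, the check returns False for a compute capability of (8, 9), which matches the failure described in 1.1.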

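1.2 Handling GPUs with a high compute capability

1. Install a newer CUDA version

Whatever CUDA version the project you are reusing asks for (for example a CUDA 10.x release), a card like the 4070 Ti can generally run CUDA 12 or later. Run nvidia-smi to see the highest CUDA version your GPU driver supports, then install CUDA accordingly. After installation, run nvcc -V to confirm which CUDA version was installed.

Newer CUDA versions are backward compatible: a newer driver still runs programs built against older CUDA toolkits.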
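
The two checks above can be scripted. This is a minimal sketch that assumes the usual output shapes, namely a "CUDA Version: X.Y" field in the nvidia-smi header and a "release X.Y" field in the nvcc -V output. The function names are hypothetical, and the regular expressions may need adjusting if your tools print something different.

```python
import re
import shutil
import subprocess


def parse_driver_cuda_version(smi_output):
    """Extract the 'CUDA Version: X.Y' field from the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_nvcc_release(nvcc_output):
    """Extract 'release X.Y' from the output of nvcc -V."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def run_tool(command):
    """Run a command and return its stdout, or None if the tool is missing."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True).stdout


if __name__ == "__main__":
    smi = run_tool(["nvidia-smi"])
    nvcc = run_tool(["nvcc", "-V"])
    driver_max = parse_driver_cuda_version(smi) if smi else None
    toolkit = parse_nvcc_release(nvcc) if nvcc else None
    print("driver supports up to CUDA:", driver_max)
    print("installed toolkit:", toolkit)
    if driver_max and toolkit and toolkit > driver_max:
        print("toolkit is newer than the driver supports; update the driver")
```

Comparing the versions as integer tuples avoids the string-comparison trap where "11.10" would sort before "11.8".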

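2. Install a newer torch version

Whatever torch version the project you are reusing asks for, try installing it first. If it does not work, install torch 1.13 directly.

Alternatively, install the newest torch version that your CUDA version supports.

Newer torch versions are mostly backward compatible with code written for older ones.

3. Install with pip
[Image]

Correspondence between cudatoolkit and PyTorch versions:
[Image]

Version correspondence table for TensorFlow, Python, CUDA, and cuDNN:
[Image]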

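1.3 Installing an additional CUDA version

For example: the machine's CUDA version is 11.0, which is lower than the CUDA 11.8 that PyTorch 2.0 needs, so I decided to install a newer CUDA alongside it.

1. Create a new conda virtual environment
To avoid disturbing the other CUDA versions, create a new virtual environment first. Python 3.10 is installed here.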

conda create -n env_name python==3.10

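2. Install CUDA 11.8
There are many tutorials online, and they are complicated. I then noticed that the conda website offers one-line commands for installing CUDA packages, so I gave it a try.
[Image]
Filter by label to find the CUDA version you need, copy the install command, and run it in the terminal.
Once it is installed, run nvcc -V to see the corresponding CUDA version.

3. Install torch 2.0
Here I install from a downloaded .whl package. On the PyTorch wheel download site, pick the torch build you need. I chose the first one, that is, torch 2.0 + CUDA 11.8 + Python 3.10 for Linux (win_amd64 denotes Windows).
Right-click the entry and copy its link. Then, inside the conda environment created earlier, download it with wget followed by the link. For example: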

wget https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=4b690e2b77f21073500c65d8bb9ea9656b8cb4e969f357370bbc992a3b074764
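
The wheel file name itself tells you which build you picked. Wheel names follow the pattern {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. As a small sketch, the hypothetical helper parse_wheel_url below splits the URL from the wget command into those fields, so you can confirm the Python version and platform before downloading.

```python
from urllib.parse import unquote, urlsplit


def parse_wheel_url(url):
    """Split a wheel URL into the fields encoded in its file name."""
    # urlsplit separates the '#sha256=...' fragment from the path;
    # unquote turns '%2B' back into '+'.
    filename = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # The optional build tag sits between version and python tag.
        name, version, _build, python_tag, abi_tag, platform_tag = parts
    elif len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return {
        "filename": filename,
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


if __name__ == "__main__":
    url = (
        "https://download.pytorch.org/whl/cu118/"
        "torch-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl"
        "#sha256=4b690e2b77f21073500c65d8bb9ea9656b8cb4e969f357370bbc992a3b074764"
    )
    for key, value in parse_wheel_url(url).items():
        print(f"{key}: {value}")
```

Here cp310 means CPython 3.10, which has to match the Python version of the conda environment, and linux_x86_64 is the platform tag to look for on a Linux server.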

Install

After the download finishes, install the file with pip install <package_file_name>.whl:

pip install torch-2.0.0+cu118-cp310-cp310-linux_x86_64.whl

Check that it worked
Start python, import torch, and query torch.__version__:

python
import torch
torch.__version__

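The result is shown in the figure:
[Image]

At this point the new environment has both CUDA and torch set up, and the project code finally runs without errors.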
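
Beyond the version string, a few more attributes are worth checking. The sketch below wraps them in a hypothetical helper, summarize_torch_install, which returns None when torch cannot be imported. The torch attributes it reads (torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.get_device_name()) are standard.

```python
def summarize_torch_install():
    """Return basic facts about the installed torch build, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    info = {
        "torch_version": str(torch.__version__),
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are visible to this process.
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(summarize_torch_install())
```

If built_for_cuda is None, or cuda_available is False, the install most likely ended up as a CPU-only build or the driver is not visible from this environment.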

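1.4 Miscellaneous

1. Running ABSA-PyTorch fails with ImportError: cannot import name 'SAVE_STATE_WARNING' from 'torch.optim.lr_scheduler'

This is a version mismatch between transformers and torch. The environment originally required transformers==4.0.0, which is too old for the newer torch. In my tests, switching to 4.9.2 or 4.17.0 works.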
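
To see which versions are installed before changing anything, something like the following can be used. It is only a sketch: version_tuple and installed_version are hypothetical helpers, and the 4.9.2 threshold is simply the oldest version reported to work above, not an official minimum.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '4.17.0' into (4, 17, 0); stops at the first non-numeric part."""
    parts = []
    # Drop a local suffix such as '+cu117' before splitting on dots.
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    for package in ("torch", "transformers"):
        print(package, installed_version(package))
    found = installed_version("transformers")
    if found is not None and version_tuple(found) < version_tuple("4.9.2"):
        print("transformers is older than 4.9.2; consider upgrading it")
```

Versions are compared as integer tuples because a plain string comparison would rank "4.17.0" below "4.9.2".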

2. Fixing "Expected a 'cuda' device type for generator but found 'cpu'"
The code change is as follows:

import torch
from torch.utils.data import DataLoader

# Original code
loader = DataLoader(dataset, batch_size=num_classes, shuffle=True)
# After adding generator=torch.Generator(device='cuda'):
loader = DataLoader(dataset, batch_size=num_classes, shuffle=True, generator=torch.Generator(device='cuda'))

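3. Is the "Some weights of the model were not used" warning normal when BERT was pre-trained only with MLM?
The warning tells you that some weights were randomly initialized (here, the classification head). This is normal, because you are instantiating the pre-trained model for a different task.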

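2. Code explanation

2.1 torch.backends.cudnn.deterministic

Why can the same network architecture, with the same learning rate, number of iterations, and batch size, give completely different results? Fixing the random seed is very important. If you use a framework such as PyTorch, also check whether the framework's seed is fixed. And if you use CUDA, do not forget CUDA's random seed. You also need torch.backends.cudnn.deterministic here.

What is torch.backends.cudnn.deterministic? As the name suggests, setting this flag to True makes cuDNN return a deterministic convolution algorithm every time, namely the default one. Combined with a fixed Torch random seed, this should make the network produce the same output for the same input on every run. The code looks roughly like this: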

import torch


def init_seeds(seed=0):
    # Set the seed for generating random numbers.
    torch.manual_seed(seed)
    # Seed the current GPU; silently ignored if CUDA is not available.
    torch.cuda.manual_seed(seed)
    # Seed all GPUs; silently ignored if CUDA is not available.
    torch.cuda.manual_seed_all(seed)

    # As written, the deterministic cuDNN settings apply only when seed == 0.
    if seed == 0:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
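
The paragraph above mentions three layers of seeding: the general random seed, the framework seed, and the CUDA seed. As a sketch of how they can be combined, the hypothetical seed_everything helper below seeds Python's random module, then NumPy and PyTorch if they are installed. Unlike init_seeds, it is an assumption of this sketch that the deterministic cuDNN flags are set for every seed, not only for seed 0.

```python
import random


def seed_everything(seed=0):
    """Seed Python, NumPy and PyTorch RNGs; libraries that are missing are skipped."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        # Safe to call without CUDA; it is silently ignored in that case.
        torch.cuda.manual_seed_all(seed)
        # Assumption of this sketch: always prefer reproducibility over speed.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


if __name__ == "__main__":
    seed_everything(0)
    first = random.random()
    seed_everything(0)
    second = random.random()
    print("same value after reseeding:", first == second)
```

Call it once at the start of the training script, before building the model or the data loaders, so that weight initialization and shuffling both draw from the seeded generators.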
