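【Problem encountered】
Roughly, the message says that the GPU's compute capability does not match the compute capabilities the installed PyTorch build was compiled for (the 3090 has compute capability 8.6, while the installed PyTorch build only supports 3.7, 5.0, 6.0, 6.1, 7.0, 7.5).
Reference: an older GPU can run under a relatively newer CUDA version. For example, a GPU with compute capability 8.0 works with a CUDA version that supports up to 8.6, but not the other way round. Likewise, a card with compute capability 8.x cannot run under a CUDA version whose highest supported compute capability is 7.x.
So it is best to install a build that targets a newer CUDA version whenever possible.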
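The matching rule described above can be sketched in a few lines of Python. This is a simplified model, not PyTorch's own check: the helper names `parse_arch` and `gpu_is_supported` are made up for illustration, and the rule assumed here is that an `sm_XY` binary runs on GPUs with the same major version and an equal or higher minor version, while a `compute_XY` (PTX) entry can be compiled at load time for any newer GPU.

```python
import re


def parse_arch(tag):
    """Turn a tag such as 'sm_86' or 'compute_75' into (kind, (major, minor))."""
    match = re.fullmatch(r"(sm|compute)_(\d{2,})", tag)
    if match is None:
        raise ValueError(f"unrecognised architecture tag: {tag!r}")
    kind, digits = match.groups()
    # The last digit is the minor version, everything before it is the major.
    return kind, (int(digits[:-1]), int(digits[-1]))


def gpu_is_supported(gpu_cc, arch_list):
    """Return True if a GPU with capability gpu_cc can use one of the build's archs."""
    for tag in arch_list:
        kind, (major, minor) = parse_arch(tag)
        # Binary kernels: same major version, build minor <= GPU minor.
        if kind == "sm" and major == gpu_cc[0] and minor <= gpu_cc[1]:
            return True
        # PTX: can be JIT-compiled for any GPU at least as new as the PTX target.
        if kind == "compute" and (major, minor) <= tuple(gpu_cc):
            return True
    return False


# The arch list from the warning in these notes, written as sm_ tags.
OLD_BUILD = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75"]

print(gpu_is_supported((8, 6), OLD_BUILD))           # False
print(gpu_is_supported((8, 6), ["sm_80", "sm_86"]))  # True
```

On a machine with PyTorch installed, the real inputs should be available from `torch.cuda.get_device_capability(0)` and `torch.cuda.get_arch_list()`.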
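After reinstalling a slightly newer version, this cell (the model definition) no longer timed out, but the warning was still there (after switching to the latest version the warning went away).
Then, when validating the model, I ran into a new problem.
So I went back and organized the whole procedure from the beginning.
After going through everything from scratch, it finally ran successfully TOT
【Upload the dataset to the server】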
scp -P 20022 -r G:\AAA\Tasks\pytorch-master\pytorch-master\bert-sst2\sst2_shuffled.tsv suntingyu@xx.xx.xx.xxx:/data/home/suntingyu/BertProject
#Upload a file from the local machine. Use -r when uploading a folder; a single file does not need it.
【cmd output】
C:\Users\Administrator>scp -P 20022 -r G:\AAA\Tasks\pytorch-master\pytorch-master\bert-sst2\sst2_shuffled.tsv suntingyu@xx.xx.xx.xxx:/data/home/suntingyu/BertProject
sst2_shuffled.tsv 100% 1144KB 1.7MB/s 00:00
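If the upload has to be repeated, the same `scp` call can be wrapped in a small Python helper that adds `-r` only for directories. This is a minimal sketch: `build_scp_command` and `upload` are hypothetical names, the remote target in the example is a placeholder, and it assumes `scp` is on the PATH.

```python
import os
import subprocess


def build_scp_command(local_path, remote_target, port=22):
    """Build the scp argument list; -r is added only when local_path is a directory."""
    command = ["scp", "-P", str(port)]
    if os.path.isdir(local_path):
        command.append("-r")
    command += [local_path, remote_target]
    return command


def upload(local_path, remote_target, port=22):
    """Run scp and raise CalledProcessError if the transfer fails."""
    subprocess.run(build_scp_command(local_path, remote_target, port), check=True)


# Placeholder target; replace user, host and directory with real values.
print(build_scp_command("sst2_shuffled.tsv", "user@example-host:/remote/dir", port=20022))
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain backslashes or spaces.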
————————————————————————————————————————
【Uninstall Anaconda】
rm -rf /data/home/suntingyu/anaconda3
Comment out the Anaconda path entries in .bashrc
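Commenting out the entries by hand is usually enough. For reference, the same step could be scripted roughly as below. This is only a sketch: `comment_out` is a hypothetical helper, the sample `export PATH` line is an assumed example of what such an entry may look like, and the function works on text without touching any file.

```python
def comment_out(text, needle="anaconda3"):
    """Prefix every uncommented line that mentions needle with '# '."""
    result = []
    for line in text.splitlines():
        if needle in line and not line.lstrip().startswith("#"):
            result.append("# " + line)
        else:
            result.append(line)
    return "\n".join(result) + "\n"


# Assumed example content, not the actual .bashrc from this server.
sample = 'export PATH="/data/home/suntingyu/anaconda3/bin:$PATH"\nalias ll="ls -l"\n'
print(comment_out(sample), end="")
```

To apply it for real, read `~/.bashrc`, keep a backup copy, and write the returned text back.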
【Reinstall Anaconda】
#Find the matching version at https://repo.anaconda.com/archive/
wget https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
suntingyu@ps-30802:~$ chmod +x Anaconda3-2019.10-Linux-x86_64.sh
suntingyu@ps-30802:~$ /data/home/suntingyu/Anaconda3-2019.10-Linux-x86_64.sh
conda env list
#Create a virtual environment
conda create -n pytorch python==3.7
==> WARNING: A newer version of conda exists. <==
current version: 4.7.12
latest version: 22.9.0
Please update conda by running
$ conda update -n base -c defaults conda
conda env list
# conda environments:
#
base * /data/home/suntingyu/anaconda3
pytorch /data/home/suntingyu/anaconda3/envs/pytorch
(base) suntingyu@ps-30802:~$ conda --version
conda 22.9.0
#Activate the virtual environment
conda activate pytorch
#Leave the environment
conda deactivate
(base) suntingyu@ps-30802:~$ nvidia-smi
Sat Oct 22 22:00:52 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:18:00.0 Off | N/A |
| 30% 33C P8 12W / 350W | 2MiB / 12053MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:3B:00.0 Off | N/A |
| 30% 29C P8 19W / 350W | 2MiB / 12053MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... On | 00000000:AF:00.0 Off | N/A |
| 0% 27C P8 29W / 350W | 3413MiB / 24268MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 2 N/A N/A 3477 C ...oyao/anaconda3/bin/python 3411MiB |
+-----------------------------------------------------------------------------+
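Since GPU 2 is already in use by another process, it can help to check free memory programmatically before choosing a card. The sketch below assumes `nvidia-smi` supports the `--query-gpu` and `--format=csv,noheader,nounits` options; the helper names are hypothetical, and `SAMPLE` is hand-built from the used/total figures in the table above rather than captured command output.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        gpus.append({"index": index, "free_mib": total - used})
    return gpus


def freest_gpu(csv_text):
    """Return the index of the GPU with the most free memory."""
    return max(parse_gpu_memory(csv_text), key=lambda gpu: gpu["free_mib"])["index"]


def query_gpus():
    """Run nvidia-smi on the current machine and return its CSV output."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


# Values copied from the table above: GPU index, used MiB, total MiB.
SAMPLE = "0, 2, 12053\n1, 2, 12053\n2, 3413, 24268\n"
print(freest_gpu(SAMPLE))  # prints 2
```

On the server itself, `freest_gpu(query_gpus())` would give the index to put in `CUDA_VISIBLE_DEVICES`. Note that the card with the most free memory is not necessarily idle, as GPU 2 shows here.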
———————————————————————————————————————————————————————————————————————
#Switch to the Tsinghua (TUNA) mirror
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/peterjc123/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes
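The order of these commands matters, because `conda config --add` puts each new channel at the top of the list. The small simulation below illustrates the resulting priority order. It is only a model of that behaviour: `add_channel` is a hypothetical function, and it assumes the channel list starts out as just `defaults`.

```python
def add_channel(channels, url):
    """Mimic `conda config --add channels URL`: the new channel goes to the top."""
    return [url] + [channel for channel in channels if channel != url]


MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/anaconda"
# Same order as the commands above.
ADDED = [
    "cloud/msys2/",
    "cloud/conda-forge/",
    "pkgs/free/",
    "pkgs/main/",
    "cloud/peterjc123/",
    "cloud/pytorch/",
]

channels = ["defaults"]  # assumed starting point
for suffix in ADDED:
    channels = add_channel(channels, f"{MIRROR}/{suffix}")

# Highest priority first.
for channel in channels:
    print(channel)
```

Under these assumptions the `cloud/pytorch/` mirror ends up first, so it is searched before the other channels. The real list can be inspected with `conda config --show channels`.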
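#Install PyTorch
conda install pytorch torchvision torchaudio cudatoolkit=11.3 #drop the trailing "-c pytorch" from the command on the official site so that the mirror channels configured above are used
#Install a package
conda install package_name
#Uninstall a package
conda remove package_name
#List all installed packages
conda list
#Delete a virtual environment
conda remove -n env_name --all
#Check that PyTorch was installed correctly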
(pytorch) suntingyu@ps-30802:~$ python
Python 3.7.0 (default, Oct 9 2018, 10:31:47)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.12.1
>>> print(torch.cuda.is_available())
True
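`torch.cuda.is_available()` may still print `True` when the build has no kernels for the GPU, so a slightly fuller check is useful given the problem described at the top of these notes. The sketch below uses a hypothetical helper, `cuda_report`, which also lists the architectures the build was compiled for and runs one tiny operation on the GPU. It returns early if PyTorch is not installed.

```python
def cuda_report():
    """Collect version, device and architecture info, then try one GPU operation."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    report = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if not report["cuda_available"]:
        return report

    # Architectures this PyTorch build ships kernels for, e.g. sm_80, sm_86.
    report["arch_list"] = torch.cuda.get_arch_list()
    for i in range(torch.cuda.device_count()):
        report["devices"].append(
            {
                "name": torch.cuda.get_device_name(i),
                "capability": torch.cuda.get_device_capability(i),
            }
        )
    # A real kernel launch is what fails when the architectures do not match.
    try:
        x = torch.ones(2, device="cuda")
        report["gpu_test"] = "ok" if float((x + x).sum().item()) == 4.0 else "wrong result"
    except RuntimeError as error:
        report["gpu_test"] = f"failed: {error}"
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If each device's `capability` is covered by `arch_list` and `gpu_test` is `ok`, the mismatch warning should not come back.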
pip install transformers