1、Pip
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip. Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue. To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
python -m pip -V
ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/dist-packages/dataclasses.py'
Consider using the --user option or check the permissions.
pip install --upgrade pip --user  (add the --user flag)
2、requirements.txt
- Generate a requirements.txt file:
pip freeze > requirements.txt
- Install the dependencies listed in requirements.txt:
pip install -r requirements.txt
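For reference, a freeze-generated requirements.txt pins exact versions, one package per line (package names and versions below are hypothetical):

```
numpy==1.18.1
torch==1.4.0
torchvision==0.5.0
```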
3、Distributed training (running on multiple GPUs)
- At the top of the script:
USE_CUDA = torch.cuda.is_available()
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2'
- Load the model, parameters, data, etc.:
model = Model()
params = output_name_and_params(model)
loss_fn = MarginLoss()
model.cuda()
model = nn.DataParallel(model, device_ids=[0,1])
loss_fn.cuda()
- Run:
CUDA_VISIBLE_DEVICES=1,2 python train.py > result/reproduce_segcap_Thyroid.txt
- After training with DataParallel, the keys in the saved state dict carry a `module.` prefix, which raises an error when the checkpoint is loaded into a non-parallel model.
- Fix 1: build a new dict from the loaded state dict, dropping the unwanted `module.` prefix from each key.
# original saved file with DataParallel
state_dict = torch.load('checkpoint.pt')  # the model may be saved as a .pth or a .pt file
# create a new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.`: slice off the first 7 characters, which is exactly the prefix
    new_state_dict[name] = v  # each new key maps to its original value
# load params
model.load_state_dict(new_state_dict)  # reload the model from the cleaned dict
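The key renaming above can be sketched without torch on a toy dict (layer names and values below are made up for illustration):

```python
from collections import OrderedDict

# toy DataParallel-style state dict (hypothetical layer names, dummy values)
state_dict = OrderedDict([('module.conv.weight', 1), ('module.fc.bias', 2)])

# strip the 7-character 'module.' prefix from every key
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    new_state_dict[k[7:]] = v

print(list(new_state_dict))  # → ['conv.weight', 'fc.bias']
```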
- Fix 2: replace 'module.' with the empty string '' directly.
model.load_state_dict({k.replace('module.', ''): v for k, v in torch.load('checkpoint.pt').items()})  # substitutes '' for 'module.', so each key becomes exactly the name the model expects
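One practical difference from Fix 1, shown on a toy dict (keys below are hypothetical): str.replace leaves a key that never had the prefix untouched, while the k[7:] slice would blindly truncate it.

```python
state_dict = {'module.fc.weight': 1, 'fc.bias': 2}  # second key has no prefix

by_replace = {k.replace('module.', ''): v for k, v in state_dict.items()}
by_slice = {k[7:]: v for k, v in state_dict.items()}

print(by_replace)  # → {'fc.weight': 1, 'fc.bias': 2}
print(by_slice)    # → {'fc.weight': 1, '': 2}  ('fc.bias'[7:] is empty)
```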
- Fix 3: wrap the model in DataParallel first, then call load_state_dict.
model = VGG()  # instantiate your own model
checkpoint = torch.load('checkpoint.pt', map_location='cpu')  # load the checkpoint file; .pt and .pth both work
if torch.cuda.device_count() > 1:
    # with multiple GPUs, parallelize the model with DataParallel; this prefixes every key with `module.`
    model = nn.DataParallel(model)
model.load_state_dict(checkpoint)  # now the saved parameters load straight into the model