1、Pip
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip. Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue. To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
python -m pip -V
ERROR: Could not install packages due to an EnvironmentError: [Errno 13] Permission denied: '/usr/local/lib/python3.6/dist-packages/dataclasses.py'
Consider using the --user option or check the permissions.
pip install --upgrade pip --user  (add the --user flag)
2、requirements.txt
- Generate a requirements.txt file:
pip freeze > requirements.txt
- Install the dependencies listed in requirements.txt:
pip install -r requirements.txt
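For reference, a freeze-generated requirements.txt pins exact versions, one package per line (package names and versions below are hypothetical):

```
numpy==1.18.1
torch==1.4.0
torchvision==0.5.0
```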
3、Distributed training (running on multiple GPUs)
- At the top of the script:
USE_CUDA = torch.cuda.is_available()
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2'
- Load the model, parameters, data, etc.:
model = Model()
params = output_name_and_params(model)
loss_fn = MarginLoss()
model.cuda()
model = nn.DataParallel(model, device_ids=[0,1])
loss_fn.cuda()
- Run:
CUDA_VISIBLE_DEVICES=1,2 python train.py > result/reproduce_segcap_Thyroid.txt
- After training with DataParallel, the keys in the saved state dict carry a `module.` prefix, which raises an error when the checkpoint is loaded into a non-parallel model.
- Fix 1: build a new dict from the loaded state dict, dropping the unwanted `module.` prefix from each key.
# original saved file with DataParallel
state_dict = torch.load('checkpoint.pt')  # the model may be saved as a .pth or a .pt file
# create a new OrderedDict that does not contain `module.`
from collections import OrderedDict
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove `module.`: slice off the first 7 characters, which is exactly the prefix
    new_state_dict[name] = v  # each new key maps to its original value
# load params
model.load_state_dict(new_state_dict)  # reload the model from the cleaned dict
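The key renaming above can be sketched without torch on a toy dict (layer names and values below are made up for illustration):

```python
from collections import OrderedDict

# toy DataParallel-style state dict (hypothetical layer names, dummy values)
state_dict = OrderedDict([('module.conv.weight', 1), ('module.fc.bias', 2)])

# strip the 7-character 'module.' prefix from every key
new_state_dict = OrderedDict()
for k, v in state_dict.items():
    new_state_dict[k[7:]] = v

print(list(new_state_dict))  # → ['conv.weight', 'fc.bias']
```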
- Fix 2: replace 'module.' with the empty string '' directly.
model.load_state_dict({k.replace('module.', ''): v for k, v in torch.load('checkpoint.pt').items()})  # substitutes '' for 'module.', so each key becomes exactly the name the model expects
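One practical difference from Fix 1, shown on a toy dict (keys below are hypothetical): str.replace leaves a key that never had the prefix untouched, while the k[7:] slice would blindly truncate it.

```python
state_dict = {'module.fc.weight': 1, 'fc.bias': 2}  # second key has no prefix

by_replace = {k.replace('module.', ''): v for k, v in state_dict.items()}
by_slice = {k[7:]: v for k, v in state_dict.items()}

print(by_replace)  # → {'fc.weight': 1, 'fc.bias': 2}
print(by_slice)    # → {'fc.weight': 1, '': 2}  ('fc.bias'[7:] is empty)
```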
- Fix 3: wrap the model in DataParallel first, then call load_state_dict.
model = VGG()  # instantiate your own model
checkpoint = torch.load('checkpoint.pt', map_location='cpu')  # load the checkpoint file; .pt and .pth both work
if torch.cuda.device_count() > 1:
    # with multiple GPUs, parallelize the model with DataParallel; this prefixes every key with `module.`
    model = nn.DataParallel(model)
model.load_state_dict(checkpoint)  # now the saved parameters load straight into the model