Contents
AttributeError: module 'distutils' has no attribute 'version'
RuntimeError: Distributed package doesn't have NCCL built in | PyTorch pitfalls
"import torch" on Windows fails with "OSError: [WinError 1455] The paging file is too small for this operation to complete"
train.py: AttributeError: "DataContainer" object has no attribute 'type'
AttributeError: module 'distutils' has no attribute 'version'
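This error is raised when training is about to start.
Fix: this is a setuptools version problem; the installed version is too new.
Recommended version: setuptools 57.5.0
pip uninstall setuptools
pip install setuptools==57.5.0  # must be lower than the version you had installed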
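Before downgrading, it can help to check which setuptools version is installed. The following is a minimal sketch, not part of the original fix: the helper names `version_tuple`, `needs_downgrade`, and `installed_setuptools` are hypothetical, and treating anything above 57.5.0 as "too new" is an assumption taken from the recommendation above, not a documented cutoff.

```python
from importlib import metadata  # available in Python 3.8+

PINNED = "57.5.0"  # the version recommended in this post


def version_tuple(text):
    """Convert '57.5.0' to (57, 5, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def needs_downgrade(installed, pinned=PINNED):
    """Assumption: any version newer than the pinned one may trigger the error."""
    return version_tuple(installed) > version_tuple(pinned)


def installed_setuptools():
    """Return the installed setuptools version string, or None if it is absent."""
    try:
        return metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None


current = installed_setuptools()
if current is None:
    print("setuptools is not installed in this environment")
else:
    print(current, "-> downgrade suggested:", needs_downgrade(current))
```

If the script suggests a downgrade, the two pip commands above apply.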
Note: the errors below occur on Windows.
RuntimeError: Distributed package doesn't have NCCL built in | PyTorch pitfalls
The following error occurred while reproducing the GANet lane-detection network on Windows:
raise RuntimeError("Distributed package doesn’t have NCCL "
RuntimeError: Distributed package doesn’t have NCCL built in
Cause: NCCL is not supported on Windows, so the backend should be changed to gloo.
Fix: in distributed_c10d.py, add the following line right after prefix_store = PrefixStore(group_name, store):
backend = "gloo"
The modified fragment looks like this:
    prefix_store = PrefixStore(group_name, store)
    backend = "gloo"
    if backend == Backend.GLOO:
        pg = ProcessGroupGloo(
            prefix_store,
            rank,
            world_size,
            timeout=timeout)
        _pg_map[pg] = (Backend.GLOO, store)
        _pg_names[pg] = group_name
    elif backend == Backend.NCCL:
        if not is_nccl_available():
            raise RuntimeError("Distributed package doesn't have NCCL "
                               "built in")
Other fixes circulating online also add backend='gloo' somewhere in this same file, but the method above is the one that worked for me.
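Editing a file inside the installed torch package affects every project in that environment. As an alternative, the backend can be chosen in your own launch code. The sketch below is only an illustration and is not what I did: `pick_backend` and `init_distributed` are hypothetical helpers, and how GANet or mmcv passes the backend to PyTorch is not covered here.

```python
import sys


def pick_backend(platform_name=sys.platform, nccl_available=False):
    """Return 'gloo' on Windows or when NCCL is missing, otherwise 'nccl'."""
    if platform_name.startswith("win") or not nccl_available:
        return "gloo"
    return "nccl"


def init_distributed(**kwargs):
    """Initialise torch.distributed with a backend that suits this machine.

    torch is imported lazily so pick_backend stays usable without it.
    """
    import torch.distributed as dist

    backend = pick_backend(nccl_available=dist.is_nccl_available())
    dist.init_process_group(backend=backend, **kwargs)
    return backend


print("backend for this platform without NCCL:", pick_backend())
```

`init_distributed` forwards its keyword arguments (for example `init_method`, `rank`, `world_size`) to `torch.distributed.init_process_group` unchanged.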
"import torch" on Windows fails with "OSError: [WinError 1455] The paging file is too small for this operation to complete"
Fix: in mmdet\datasets\builder.py, find num_workers and set it to 0.
    """
    rank, world_size = get_dist_info()
    if dist:
        # DistributedGroupSampler will definitely shuffle the data to satisfy
        # that images on each GPU are in the same group
        if shuffle:
            sampler = DistributedGroupSampler(dataset, samples_per_gpu,
                                              world_size, rank)
        else:
            sampler = DistributedSampler(
                dataset, world_size, rank, shuffle=False)
        batch_size = samples_per_gpu
        num_workers = workers_per_gpu
    else:
        sampler = GroupSampler(dataset, samples_per_gpu) if shuffle else None
        batch_size = num_gpus * samples_per_gpu
        # num_workers = num_gpus * workers_per_gpu
        num_workers = 0
    init_fn = partial(
        worker_init_fn, num_workers=num_workers, rank=rank,
        seed=seed) if seed is not None else None
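Hard-coding num_workers = 0 also disables worker processes on Linux, where they usually work fine. A possible refinement, shown as a sketch only, is to make the value depend on the platform. `choose_num_workers` is a hypothetical helper that does not exist in mmdet. The reasoning behind it is my understanding rather than something verified here: on Windows each DataLoader worker is a separate process that imports torch again, which adds to the paging file pressure.

```python
import sys


def choose_num_workers(num_gpus, workers_per_gpu, platform_name=sys.platform):
    """Use no worker processes on Windows, the usual product elsewhere."""
    if platform_name.startswith("win"):
        return 0
    return num_gpus * workers_per_gpu


print("num_workers on this machine:", choose_num_workers(1, 2))
```

In the non-distributed branch above, the line `num_workers = 0` could then become `num_workers = choose_num_workers(num_gpus, workers_per_gpu)`.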
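If that still does not help, you may need to increase the Windows paging file size.
Reference: "Fixing OSError: [WinError 1455] The paging file is too small for this operation to complete in PyCharm", 程序那点事, 博客园 (cnblogs.com)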
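train.py: AttributeError: "DataContainer" object has no attribute 'type'
This problem puzzled me for two days. My CUDA version is 11.2, so torch 1.6.0 could not be used. Trying torch 1.8.0 through 1.10 did not work either. Finally I tried torch 1.7.0 with CUDA 11.0, and that combination ran successfully!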
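When trying different torch and CUDA combinations like this, it is useful to print what the current environment actually contains before each run. The sketch below is an assumption-labeled helper (`describe_torch_env` is a hypothetical name). It uses only `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and returns placeholders when torch is not installed.

```python
def describe_torch_env():
    """Report the torch version, the CUDA version torch was built with,
    and whether a CUDA device is usable right now."""
    try:
        import torch
    except ImportError:
        # torch is missing: return placeholders instead of failing
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in describe_torch_env().items():
    print(key, "=", value)
```

Note that `cuda_build` is the CUDA version the torch wheel was compiled against, which can differ from the CUDA toolkit installed on the system.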
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\DINGZH~1\\AppData\\Local\\Temp\\tmpmn2vq6tw\\tmpxi4eg_tv.py'
Traceback (most recent call last):
  File "e:/codingprogram/project/GANet/lane_application/lane_detection.py", line 28, in <module>
    model,data_loader,show_dst,args = load_model(config_file, checkpoint_file, device=device)
  File "e:/codingprogram/project/GANet/lane_application/lane_detection.py", line 14, in load_model
    model,data_loader,show_dst,args = load(image_path,save_path)
  File "e:\codingprogram\project\ganet\tools\ganet\culane\test_dataset.py", line 309, in load
    cfg = mmcv.Config.fromfile(args.config)
  File "D:\software\Anaconda\envs\ganet\lib\site-packages\mmcv\utils\config.py", line 165, in fromfile
    cfg_dict, cfg_text = Config._file2dict(filename)
  File "D:\software\Anaconda\envs\ganet\lib\site-packages\mmcv\utils\config.py", line 92, in _file2dict
    osp.join(temp_config_dir, temp_config_name))
  File "D:\software\Anaconda\envs\ganet\lib\shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\DINGZH~1\\AppData\\Local\\Temp\\tmpmn2vq6tw\\tmpxi4eg_tv.py'
Fix:
In short, replacing one line of code in config.py is enough.
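The traceback above fails inside shutil.copyfile while writing to a file in a temporary directory. A likely cause, and this is an assumption rather than something confirmed here, is that the temporary file is still held open when the copy tries to open it again, which Windows does not allow. The sketch below illustrates that general pattern with the standard library only. It is not the mmcv code and not the exact one-line replacement, and `copy_to_temp` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def copy_to_temp(src_path, suffix=".py"):
    """Copy src_path to a fresh temporary file and return the new path.

    delete=False plus an explicit close() releases the handle first,
    so the copy can reopen the file by name on Windows as well.
    """
    tmp_dir = tempfile.mkdtemp()
    handle = tempfile.NamedTemporaryFile(dir=tmp_dir, suffix=suffix, delete=False)
    handle.close()  # release the handle before anything else opens the file
    shutil.copyfile(src_path, handle.name)
    return handle.name


# Small demonstration with a throwaway config file.
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "demo_config.py")
with open(demo_src, "w") as f:
    f.write("model = dict(type='Demo')\n")
demo_copy = copy_to_temp(demo_src)
print("copied to", demo_copy)
```

The caller is responsible for removing the temporary directory afterwards, for example with `shutil.rmtree(os.path.dirname(demo_copy))`.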