Remote server training: training a DeepLabv3 model (PyTorch version) on Cityscapes on a remote Linux server

把夏天绑在鞋带上

References

https://blog.csdn.net/qq_45389690/article/details/111591713?utm_medium=distribute.pc_relevant_download.none-task-blog-baidujs-2.nonecase&depth_1-utm_source=distribute.pc_relevant_download.none-task-blog-baidujs-2.nonecase

https://blog.csdn.net/weixin_41919571/article/details/107906066

Code

https://github.com/jfzhang95/pytorch-deeplab-xception

Problem

ImportError: No module named pycocotools.coco

Solution

https://blog.csdn.net/u011961856/article/details/77676461

https://blog.csdn.net/joejeanjean/article/details/78839318?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-6&spm=1001.2101.3001.4242

https://blog.csdn.net/haiyonghao/article/details/80472713?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-11&spm=1001.2101.3001.4242

Be sure to copy every file in the PythonAPI directory except setup.py into the pytorch-deeplab-xception-master folder.
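The copy step above can be scripted. This is a minimal sketch, not the method from the linked posts: the helper name copy_python_api is hypothetical, the source path in the usage comment is an assumption about where the COCO API was unpacked, and dirs_exist_ok needs Python 3.8 or newer.

```python
import shutil
from pathlib import Path


def copy_python_api(src_dir, dst_dir, skip=("setup.py",)):
    """Copy every entry of src_dir into dst_dir, except the names in `skip`."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src.iterdir()):
        if entry.name in skip:
            continue  # leave setup.py behind, as the post recommends
        target = dst / entry.name
        if entry.is_dir():
            # dirs_exist_ok (Python 3.8+) lets the copy be re-run safely
            shutil.copytree(entry, target, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, target)
        copied.append(entry.name)
    return copied


# Hypothetical usage (paths are assumptions, adjust to your layout):
#   copy_python_api("cocoapi/PythonAPI", "pytorch-deeplab-xception-master")
```

After the copy, the pycocotools folder sits next to train.py, so "from pycocotools.coco import COCO" resolves from the project root.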

Problem

ImportError: No module named 'Queue'

Solution

https://blog.csdn.net/DarrenXf/article/details/82962412
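A common way to handle this error is an import that works under both Python versions, since the module is called Queue in Python 2 and queue in Python 3. This is a sketch of that general pattern, assuming the failing line is a plain "import Queue". It is not necessarily the exact fix described in the linked post.

```python
try:
    import queue            # Python 3 module name
except ImportError:
    import Queue as queue   # Python 2 module name

# The rest of the code can use the single name `queue` either way.
q = queue.Queue()
q.put("batch-0")
print(q.get())  # batch-0
```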

Problem

from utils.loss import SegmentationLosses

ImportError: No module named loss

References

https://blog.csdn.net/Diliduluw/article/details/103742766

Solution

Create an empty __init__.py in the utils folder.
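The file can also be created from Python, which is handy on a remote machine without an editor. A minimal sketch: the helper name ensure_package is hypothetical, and the "utils" path in the usage comment assumes the script is run from the project root.

```python
from pathlib import Path


def ensure_package(directory):
    """Create an empty __init__.py in `directory` if it is missing."""
    init_file = Path(directory) / "__init__.py"
    init_file.touch(exist_ok=True)  # leaves an existing file untouched
    return init_file


# Hypothetical usage, run from the project root:
#   ensure_package("utils")
```

With the file in place, Python 2 treats utils as a package and "from utils.loss import SegmentationLosses" can be resolved.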

Problem

AttributeError: 'module' object has no attribute 'kaiming_normal_'

References

https://blog.csdn.net/songchunxiao1991/article/details/83104893?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.control&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.control

https://blog.csdn.net/Aug0st/article/details/42707709

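I then deleted all the .pyc files (mainly because I did not dare update Python27\Lib\urllib2.pyc, for fear of breaking other algorithms).

Delete command: find /dir -name "*.pyc" | xargs rm -rf

But that did not help.

So I set up a new environment with Python 3.5 and PyTorch 0.4.1.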
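The underscore-suffixed initializers such as kaiming_normal_ were introduced in PyTorch 0.4, which is why the newer environment gets past this error. If upgrading is not possible, one alternative is to fall back to the older name at runtime. This is only a sketch: the helper resolve_init is hypothetical, and the PyTorch usage is left as a comment so the block runs without PyTorch installed.

```python
def resolve_init(init_module, base_name):
    """Return `<base_name>_` from init_module if present, else `<base_name>`."""
    for name in (base_name + "_", base_name):
        fn = getattr(init_module, name, None)
        if fn is not None:
            return fn
    raise AttributeError(
        "%r has neither %s_ nor %s" % (init_module, base_name, base_name)
    )


# Intended use (requires PyTorch, so shown as a comment only):
#   import torch.nn as nn
#   kaiming_normal = resolve_init(nn.init, "kaiming_normal")
#   kaiming_normal(m.weight)   # m is some conv layer in the model
```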

Problem

RuntimeError: CUDA error: out of memory

Solution

In train.py, change the default of the batch-size argument to 2.
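The edit amounts to changing one argparse default. The sketch below rebuilds only that argument. The help text and the other keyword values are assumptions, not copied from the repository's train.py.

```python
import argparse

parser = argparse.ArgumentParser(description="DeepLab training (sketch)")
# A smaller batch lowers GPU memory use per training step.
parser.add_argument("--batch-size", type=int, default=2, metavar="N",
                    help="input batch size for training")

# An empty list means "use the defaults and ignore sys.argv".
args = parser.parse_args([])
print(args.batch_size)  # 2
```

If the script exposes the option this way, passing --batch-size 2 on the command line has the same effect without editing the file.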

Problem

ValueError: Expected more than 1 value per channel when training, got input size [1, 256, 1, 1]

References

https://blog.csdn.net/weixin_43925119/article/details/109755329

https://www.cnblogs.com/zmbreathing/p/pyTorch_BN_error.html

https://blog.csdn.net/sinat_39307513/article/details/87917537

https://blog.csdn.net/qq_42079689/article/details/102587401?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control

https://blog.csdn.net/jining11/article/details/111478935?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-2&spm=1001.2101.3001.4242

https://blog.csdn.net/qq_36321330/article/details/108954588?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-2.control

https://blog.csdn.net/qq_34124009/article/details/109100053?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control

https://blog.csdn.net/qq_21230831/article/details/103711545?utm_medium=distribute.pc_relevant.none-task-blog-OPENSEARCH-7.control&depth_1-utm_source=distribute.pc_relevant.none-task-blog-OPENSEARCH-7.control

https://blog.csdn.net/lrs1353281004/article/details/108262018?utm_medium=distribute.pc_relevant.none-task-blog-baidujs_title-7&spm=1001.2101.3001.4242

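The model contains nn.BatchNorm layers. In training mode these need a batch_size greater than 1 to compute the mean and standard deviation of the current batch. When the number of samples divided by batch_size leaves a remainder of exactly 1, the final batch holds a single sample and this error is raised.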
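The remainder rule is easy to check before starting a run. The sketch below assumes the 2975-image Cityscapes fine-annotation training split, so substitute your own sample count. The helper name last_batch_size is hypothetical.

```python
def last_batch_size(num_samples, batch_size):
    """Size of the final batch when the incomplete batch is kept."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


NUM_TRAIN = 2975  # assumption: Cityscapes fine-annotation training images

for bs in (2, 3, 4, 5):
    size = last_batch_size(NUM_TRAIN, bs)
    status = "fails in BatchNorm" if size == 1 else "ok"
    print(bs, size, status)
```

Under that assumption a batch size of 2 leaves a final batch of one image, which matches the error, while 3, 4 and 5 do not.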

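Solution

Change batch_size so that the remainder is not 1. When I changed it to 5, though, GPU memory ran out again and the run failed.

So I located /usr/local/lib/python3.5/dist-packages/torch/utils/data/dataloader.py and set drop_last=True in it (its default is False).

Because I could not open this file directly on the remote server, I copied it to my local machine, edited it, and copied it back.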
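drop_last is also a constructor argument of torch.utils.data.DataLoader, so it can be passed where the project builds its training loader instead of patching the installed library. This affects only that loader and survives a PyTorch reinstall. The pure-Python sketch below imitates the batching behaviour to show what the flag does. The function iter_batches is hypothetical and is not PyTorch code.

```python
def iter_batches(samples, batch_size, drop_last=False):
    """Yield lists of batch_size items; optionally skip a short final batch."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Anything left over is an incomplete batch.
    if batch and not drop_last:
        yield batch


sizes_keep = [len(b) for b in iter_batches(range(7), 2)]
sizes_drop = [len(b) for b in iter_batches(range(7), 2, drop_last=True)]
print(sizes_keep)  # [2, 2, 2, 1]
print(sizes_drop)  # [2, 2, 2]

# With PyTorch the equivalent call would look like this (not executed here):
#   DataLoader(train_set, batch_size=2, shuffle=True, drop_last=True)
```

With drop_last enabled the single leftover sample is skipped in each epoch, so BatchNorm never sees a batch of one.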

Source: https://blog.csdn.net/weixin_49636863/article/details/113673775
