Reproduction notes: Simple Baselines for Human Pose Estimation

冰糖狮子头

Last modified 2023-04-16 20:31:43

Tags: pytorch, python, deep learning

First published 2023-04-16 13:30:02

Article link: https://blog.csdn.net/yuandeyixinren11/article/details/129641703

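1. Disable cuDNN for batch_norm:

1.1 Find where PyTorch is installed in the current environment:

Type python in a terminal or command prompt to enter the Python interpreter and look up the location of the torch package (shown as a screenshot in the original post).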
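Since the screenshot is not reproduced here, the lookup can be sketched as follows. This is a minimal example, not necessarily what the original screenshot showed; `package_dir` is a hypothetical helper name, and it uses only the standard library so it also works for packages other than torch.

```python
import importlib.util
import os


def package_dir(name):
    """Return the install directory of a top-level package, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin is the path of the package's __init__.py; its parent is the package directory.
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # With PyTorch installed this prints a path ending in site-packages/torch.
    print(package_dir("torch"))
```

The printed directory is the value to assign to PYTORCH in step 1.2.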

1.2 Set a shell variable for the PyTorch path:

~$ PYTORCH=/home/liuman/anaconda3/envs/pytorch02/lib/python3.8/site-packages/torch

2. Clone the repository. The cloned directory is referred to as ${POSE_ROOT}.

3. Install the dependencies:

(pytorch02) liuman@gpu01-beiserver:~$ cd /home/liuman/HPE/code/SimpleBaseline/
(pytorch02) liuman@gpu01-beiserver:~/HPE/code/SimpleBaseline$ ls
CONTRIBUTING.md  experiments  lib  LICENSE  pose_estimation  README.md  requirements.txt  SECURITY.md
(pytorch02) liuman@gpu01-beiserver:~/HPE/code/SimpleBaseline$ pip install -r requirements.txt

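This fails with an error (shown as a screenshot in the original post).
Fix:
In requirements.txt, change the pinned opencv-python version to a nearby release, such as 3.4.11.41.
(If that version also fails to install, try another one.)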
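The edit to requirements.txt can also be scripted, which helps when several versions have to be tried. The sketch below is an illustration only: `repin` is a hypothetical helper, it handles only simple `name==version` lines, and the old version in the sample text is a placeholder rather than the value from the repository.

```python
def repin(requirements_text, package, new_version):
    """Return requirements.txt content with `package` pinned to `new_version`."""
    out = []
    for line in requirements_text.splitlines():
        name = line.split("==")[0].strip()
        # Only touch exact pins (name==version) of the requested package.
        if "==" in line and name.lower() == package.lower():
            line = f"{package}=={new_version}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "EasyDict==1.7\nopencv-python==0.0.0\nCython\n"  # placeholder content
    print(repin(sample, "opencv-python", "3.4.11.41"))
```

To apply it, read requirements.txt, pass its content through `repin`, write the result back, and rerun pip install -r requirements.txt.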

4. Build the libraries:

$ cd lib
$ make

This runs, in order, the commands in the Makefile under the lib directory:

all:
	cd nms; python setup.py build_ext --inplace; rm -rf build; cd ../../
clean:
	cd nms; rm *.so; cd ../../

5. Install the COCO API:

# COCOAPI=/path/to/clone/cocoapi
git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
cd $COCOAPI/PythonAPI
# Install into global site-packages
make install
# Alternatively, if you do not have permissions or prefer
# not to install the COCO API into global site-packages
python3 setup.py install --user

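6. Download the PyTorch pretrained models for ImageNet, COCO, and MPII (I did not download the caffe-style ones).

(I had already downloaded the pretrained models locally, so I did not copy them into this project's model directory; I change the paths when they are needed.)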

7. Create the output directory (for trained models) and the log directory (for TensorBoard logs):

mkdir output
mkdir log
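The same two directories can be created from Python, which is convenient when the setup is rerun, because an existing directory is then not an error. This is a small sketch; `make_run_dirs` is a hypothetical helper name.

```python
from pathlib import Path


def make_run_dirs(root):
    """Create the output and log directories under `root`; existing ones are left untouched."""
    created = []
    for name in ("output", "log"):
        directory = Path(root) / name
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling `make_run_dirs(".")` from ${POSE_ROOT} has the same effect as the two mkdir commands.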

(I did not download the data again either; I change the paths when it is needed.)

8. Train on the COCO train2017 dataset:

python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml

This fails with an error (shown as a screenshot in the original post).
Fix:
Leave the pyyaml version unchanged and replace the call to load() with safe_load().
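The screenshot is not available, so the exact error is an assumption. A likely cause is that PyYAML 6.0 and later require an explicit Loader argument, so a bare `yaml.load(f)` raises a TypeError. `yaml.safe_load` needs no Loader and only builds plain Python types. The sketch below parses an inline string with made-up keys instead of the real experiment file.

```python
import yaml

# Hypothetical config fragment; the keys are illustrative, not copied from the repository.
CONFIG_TEXT = """
GPUS: '0'
TRAIN:
  LR: 0.001
"""

# safe_load accepts a string or an open file object and needs no Loader argument.
cfg = yaml.safe_load(CONFIG_TEXT)
print(cfg["TRAIN"]["LR"])
```

In the project the same change would apply wherever the YAML file is read, for example `yaml.safe_load(f)` in place of `yaml.load(f)` on the open config file.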

A further error follows (shown as a screenshot in the original post).
Fix:
Upgrade tensorboardX:

pip install --upgrade tensorboardx

Training again produces a warning:

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

Fix:
Move lr_scheduler.step() so that it runs after each epoch's training has finished.
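To see why the order matters, the sketch below models a multi-step decay schedule in plain Python. It is a stand-in written for illustration, not PyTorch code: `step_decay_lr` and `lrs_used` are hypothetical names, and the base rate, decay factor, and milestone are example values.

```python
def step_decay_lr(base_lr, gamma, milestones, steps_taken):
    """Learning rate after the scheduler has been stepped `steps_taken` times."""
    drops = sum(1 for m in milestones if steps_taken >= m)
    return base_lr * gamma ** drops


def lrs_used(num_epochs, base_lr, gamma, milestones, step_first):
    """Return the learning rate that each training epoch actually runs with."""
    steps_taken = 0
    used = []
    for _ in range(num_epochs):
        if step_first:
            steps_taken += 1  # old order: scheduler stepped before the epoch trains
        used.append(step_decay_lr(base_lr, gamma, milestones, steps_taken))
        if not step_first:
            steps_taken += 1  # recommended order: scheduler stepped after the epoch
    return used


if __name__ == "__main__":
    print(lrs_used(4, 1e-3, 0.1, [2], step_first=True))
    print(lrs_used(4, 1e-3, 0.1, [2], step_first=False))
```

With a milestone at 2, stepping first trains only one epoch at the base rate before the decay, while stepping after the epoch trains two. The first value of the schedule is skipped, which is what the warning describes.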

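Training again then reports Fail to read for one image. The image does exist, so it presumably just could not be loaded.

When debugging train.py, the following error appears even though the cfg path was given: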

usage: train.py [-h] --cfg CFG
train.py: error: the following arguments are required: --cfg

Fix: remove required=True:

# parser.add_argument('--cfg',
#                     help='experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml',
#                     required=True,
#                     type=str)
parser.add_argument('--cfg',
                    help='experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml',
                    type=str)
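Note that help only sets the text shown by -h. With required=True removed and no default, args.cfg is None unless the debugger passes the flag. One possible alternative, sketched here as a suggestion and not as the author's change, is to give the path as a default so the script also runs with no arguments. `DEFAULT_CFG` and `build_parser` are hypothetical names.

```python
import argparse

DEFAULT_CFG = "experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml"


def build_parser():
    """Build a parser whose --cfg falls back to DEFAULT_CFG when the flag is omitted."""
    parser = argparse.ArgumentParser(description="Train keypoints network")
    parser.add_argument("--cfg",
                        help="experiment configuration file",
                        default=DEFAULT_CFG,
                        type=str)
    return parser


if __name__ == "__main__":
    # An empty argument list simulates launching the script from a debugger without flags.
    print(build_parser().parse_args([]).cfg)
```

Passing --cfg on the command line still overrides the default, so normal runs behave as before.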

While debugging, trying to step into model = eval(…) produced:
Couldn't apply path mapping to the remote file.
The fix is to start an SSH session.

Error during training (the image exists but cannot be read):

ValueError: Caught ValueError in DataLoader worker process 3.

ValueError: Fail to read /home/liuman/HPE/dataset/mscoco/2017/images/train2017/000000468530.jpg

Fix:
Set num_workers to 0 in train_loader.

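The program uses GPU 0 as the primary GPU by default, but I want to use GPU 4. After switching, some parameters are still on GPU 0 while the module expects them on GPU 4: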

RuntimeError: module must have its parameters and buffers on device cuda:4 (device_ids[0]) but found one of them on device: cuda:0

Fix:

CUDA_VISIBLE_DEVICES=4 python pose_estimation/train.py \
    --cfg experiments/coco/resnet50/256x192_d256x3_adam_lr1e-3.yaml
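This works because CUDA renumbers the visible devices inside the process: with only physical GPU 4 visible, it appears as cuda:0, so the default device and the device the module expects are the same. The sketch below shows that mapping and the Python equivalent of the shell prefix. `logical_to_physical` is a hypothetical helper and assumes the variable holds integer ids, not GPU UUIDs.

```python
import os


def logical_to_physical(visible_devices):
    """Map in-process CUDA indices (cuda:0, cuda:1, ...) to physical GPU ids."""
    ids = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return dict(enumerate(ids))


# Setting the variable from Python only takes effect if done before CUDA is
# initialised, so the safe place is ahead of `import torch`.
os.environ["CUDA_VISIBLE_DEVICES"] = "4"
print(logical_to_physical(os.environ["CUDA_VISIBLE_DEVICES"]))  # {0: 4}
```

When the variable is set this way, any GPU id in the configuration has to refer to the renumbered devices, which is 0 in this case.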
