SPIN源码复现

KangXi_TangYuan

已于 2023-11-15 05:35:24 修改

阅读量337

点赞数 1

文章标签： python 人工智能计算机视觉

于 2023-11-03 04:59:48 首次发布

本文链接：https://blog.csdn.net/KangXi_TangYuan/article/details/131406128

版权

新手小白记录SPIN github源码复现流程

原文标题：Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop
源码链接：https://github.com/nkolot/SPIN

环境：
ubuntu
python 3.9
cuda 11.0
GPU 3090或TITAN

一、安装库

直接复制粘贴运行：

virtualenv spin -p python3
source spin/bin/activate
pip install -U pip
pip install -r requirements.txt

直接pip install这个txt文件，遇到了各种各样的报错，主要来源于neural-renderer-pytorch和torch，因此以以下顺序手动安装。

1.手动下载 torch

先查看cuda版本：

nvcc --version

在pytorch官网根据cuda版本选择相应torch版本，我cuda版本11.0，选择v1.7.1安装：

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

检查torch版本：

python3
>>>import torch
>>>torch.__version__

后面所有的包都可以这样检查。

2.手动下载neural-renderer-pytorch
由于torch版本不匹配，直接pip install这个还是会报错（看网上人说要torch1.5.0以前的版本才不会报错,但已经无法安装），直接推荐这篇，很好用。
直接下载人家改好的包，我纠结了一下应该放到什么地方，最后放到了整个project (SPIN-master)的同级目录下，但环境还是要在spin里，运行：

cd到 neural_renderer-master 文件夹下，执行 python setup.py install

检查一下：

python3
>>>import neural_renderer

没报错就没问题。

3.下载其他库
将源txt文件里的neural-renderer-pytorch和torch==1.1.0两行注释掉（前面加#即可），然后按原码安装其他库：

pip install -r requirements.txt

由于文件里的scipy==1.0.0下载会报错，我直接改成了scipy，下载的版本为1.11.0，重新执行就没有报错了。

4.检查所有安装库
按文件顺序挨个检查：

python3
>>>import neural_renderer
>>>import numpy
>>>import cv2
>>>import OpenGL
>>>import pyrender
>>>import skimage
>>> import scipy
>>> import tensorboard
>>> import chumpy
>>>import smplx
>>>import spacepy
>>>import torch
>>> import torchgeometry
>>> import torchvision
>>> import tqdm
>>> import trimesh

其中只有chumpy报错：ImportError: cannot import name 'bool' from 'numpy'

解决：卸载当前numpy:

pip uninstall numpy

并重新安装特定版本：

pip install numpy==1.23.1

重新import一下发现没问题了，至此，支持库终于安装完了。

忽略pycdf那一步。

二、运行demo

1.下载extra data:

bash fetch_data.sh

全都下载至data文件夹

2.下载SMPL模型：
由于运行demo只需要neutral model，因此暂时忽略另外那个male and female models。前往该链接下载，需要先注册一下。
在这里插入图片描述
下载好的压缩包叫做smplify_public，需要的model是里面的：smplify_public\code\models\basicModel_neutral_lbs_10_207_0_v1.0.0.pkl，但此时我不知道应该把它放哪…，所以先直接下一步。

3.运行demo
作者给了三种方式运行，我选择了第二种Bounding Box的：

python3 demo.py --checkpoint=data/model_checkpoint.pt --img=examples/im1010.jpg --bbox=examples/im1010_bbox.json

这一步就是反反复复跟着报错改并重新运行。

报错1：ImportError: cannot import name 'ModelOutput' from 'smplx.body_models'
解决：将models/smpl.py中第5行、第27行的ModelOutput改为SMPLOutput

报错2：Path data/smpl does not exist!
去data目录发现确实没有smpl这个东西，看了一下代码，这个时候就知道上面的model就是要放在这，因此：
先将前面下载的basicModel_neutral_lbs_10_207_0_v1.0.0.pkl放到data目录下；然后将config.py最后一行的SMPL_MODEL_DIR = 'data/smpl'改为
data/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl

报错3：cannot import name 'OSMesaCreateContextAttribs' from 'OpenGL.osmesa'
解决：pip install --upgrade pyopengl==3.1.4

报错4：scipy.misc is deprecated and has no attribute imresize
解决：这是最耗时的一个。理论上需要将scipy改为版本1.2.1，但失败了。
因此使用np.array(Image.fromarray(arr).resize()代替 scipy.misc.imresize()，具体为：

在utils/imutils.py中：

在导包的部分增加：
from PIL import Image
将第80行的：new_img = scipy.misc.imresize(new_img, res) 改为：
new_img = np.array(Image.fromarray(np.uint8(new_img)).resize(size=(res)))

这里new_img是ndarray类型，res是元组(224,224)
Image.fromarray()就是将ndarray数据类型转换为图片类型
np.uint8()是根据进一步的报错KeyError: ((1, 1, 3), '<f8')改的

如果没有PIL则安装：

pip install pillow

终于demo运行成功！在example目录下生成了两个.png图片.

三、运行training code

这段太糟心了，耗时三天，简单记录一下我走过的弯路吧，由于仅凭记忆，可能有顺序不对或者忘记的地方。

运行命令用的是作者给的example：

python3 train.py --name train_example --pretrained_checkpoint=data/model_checkpoint.pt --run_smplify

首先就是下载数据集，这里我只下载了MPI-INF-3DHP，原因如下：
由于只是想要让程序能先跑起来，本来想着随便下一个小的数据集试一下，但由于3D数据集只有三个：Human3.6M、3DPW和MPI-INF-3DHP，作者没有给H36M的npz文件，自己下载又很麻烦又拿不到MoShed data，而3DPW只用于evaluation，所以好像只能选择MPI-INF-3DHP，然而使用这个数据集我踩了个天坑！后面详说。

以下所有部分都仅基于MPI-INF-3DHP数据集，(这个数据集非常大，最终占用空间150G）

1.下载数据集
首先按照流程下载数据集，有一个包含matlab文件的zip文件，随便解压到什么地方都行，根据README的指示修改conf.ig文件：要先提前手动设置好数据集的根文件夹放，并修改路径，subjects=(1 2 3 4 5 6 7 8)，由于我不知道mask是什么，所以将download_masks设置成了0，其他都是1，（虽然没有求证，但貌似应该都设置成1），然后运行get_dataset.sh下载数据集,我顺便把testset也下了。下载好的数据集应该是7+25个G (还没完！)

下载好的目录结构是S1-S8，每个S下面有Seq1,Seq2，每个Seq下面的imageSequence里都有13个video.（我发现了一个bug，S1-Seq1-video_0和同组其他video不一样，貌似是给错了，给成了Seq2的，目前还未解决。）

2.修改相关文件
（1）首先二话不说先改一个bug：
datasets/preprocess/mpi_fin_3dhp.py文件第140行左右有个：
for i, img_i in enumerate(img_list):
改成 for i, img_i in enumerate(sorted(img_list)):

这是最重要的一个文件，后面都用mpi_fin_3dhp.py代替

（2）下载好的数据库根目录路径添加到config.py文件下MPI_INF_3DHP_ROOT = ''位置"（最后面不要有‘/’）

（3）修改datasets/mixed_dataset.py，由于作者使用了6个数据集的混合数据集，每个数据集还有一定的占比，这里直接改成MPI-INF-3DHP专属版：

"""
This file contains the definition of different heterogeneous datasets used for training
"""
import torch
import numpy as np

from .base_dataset import BaseDataset

class MixedDataset(torch.utils.data.Dataset):

    def __init__(self, options, **kwargs):
        # self.dataset_list = ['h36m', 'lsp-orig', 'mpii', 'lspet', 'coco', 'mpi-inf-3dhp']
        # self.dataset_dict = {'h36m': 0, 'lsp-orig': 1, 'mpii': 2, 'lspet': 3, 'coco': 4, 'mpi-inf-3dhp': 5}
        self.dataset_list = ['mpi-inf-3dhp']
        self.dataset_dict = {'mpi-inf-3dhp': 0}

        self.datasets = [BaseDataset(options, ds, **kwargs) for ds in self.dataset_list]
        total_length = sum([len(ds) for ds in self.datasets])
        length_itw = sum([len(ds) for ds in self.datasets[1:-1]])
        self.length = max([len(ds) for ds in self.datasets])
        """
        Data distribution inside each batch:
        30% H36M - 60% ITW - 10% MPI-INF
        """
        # self.partition = [.3, .6*len(self.datasets[1])/length_itw,
        #                   .6*len(self.datasets[2])/length_itw,
        #                   .6*len(self.datasets[3])/length_itw,
        #                   .6*len(self.datasets[4])/length_itw,
        #                   0.1]
        # self.partition = np.array(self.partition).cumsum()

    def __getitem__(self, index):
        # p = np.random.rand()
        # for i in range(6):
        #     if p <= self.partition[i]:
        #         return self.datasets[i][index % len(self.datasets[i])]
        return self.datasets[0][index]

    def __len__(self):
        return self.length

（4）这时候直接运行train会在base_dataset.py报错，发现它想要读取某jpg文件。但我哪有jpg我只有video! 这个时候我还没意识到之前下载的mpi_inf_3dhp_train.npz是干嘛用的，我以为我缺少了一步数据预处理，于是踩进了坑里！
我直接去运行了preprocess_datasets.py，发现报错，说什么没有H36M之类的，于是直接另起一个文件改成MPI-INF-3DHP专属版：

#!/usr/bin/python
"""
Preprocess datasets and generate npz files to be used for training testing.
It is recommended to first read datasets/preprocess/README.md
"""

import config as cfg
from datasets.preprocess import mpi_inf_3dhp_extract


# define path to store extra files
out_path = cfg.DATASET_NPZ_PATH
openpose_path = cfg.OPENPOSE_PATH

# MPI-INF-3DHP dataset preprocessing (training set)
mpi_inf_3dhp_extract(cfg.MPI_INF_3DHP_ROOT, openpose_path, out_path, 'train', extract_img=True, static_fits=cfg.STATIC_FITS_DIR)

（5）运行preprocess_datasets.py，其实运行这个文件就是在运行mpi_fin_3dhp.py，如果我没记错顺序的话这个时候会发现还是报错，说没有openpose.json之类的，找了一圈解决办法无果。于是仔细看了代码和mpi_inf_3dhp_train.npz（在data/dataset_extra里）才搞懂。由于前面run demo的时候已经知道了，就算有了img(jpg文件)，还需要三种办法对它进行预处理得到bounding box，这不是我能操作的，又不想去执行OpenPose（看见都头大）。但其实作者已经将完全处理好的数据给我们了，就封装在npz文件里（可以用numpy查看），唯独没有img(没有jpg文件)。mpi_fin_3dhp.py就是作者制作npz的源码，先提取img再将需要的东西保存到npz，于是我自然而然地想到：npz已经有了，那我只做提取不就行了嘛？

【避雷！！！】于是我将mpi_fin_3dhp.py中train_data()函数内的，从‘# per frame’以后的部分都注释掉，重新运行了preprocess_datasets.py。千万不要这么做！千万不要这么做！千万不要这么做！因为extract的img是所有视频的所有帧（不是所有视频，每个Seq只提取 video_0,1,2,4,5,6,7,8），每个video就有6000多张图片，有的视频甚至有12000多张图片，每个图片大概500kb，这样全部都提取完保守估计也要有400G！实际上我拉了8张卡多线程提了一宿，不知道什么时候停了（虚拟空间已经满了），大概成功了3/4，已经显示有600+G了。

（6）实际上作者是每10张图片才存一张到npz，所以npz里只有不到一万张图片的信息。只看npz前几个imgname我以为它全都是每十帧取一帧，所以帧数为1,11,21,31…但其实不是。如果把第一个video的全部6000多个imgname打出来会发现，它会在不知什么时候突然变了，连跳几百帧。所以不能直接写一个每10帧一提取，而是要根据imgname提取。代码如下。这时候再重新运行preprocess_datasets.py就ok了，8线程同时提取也就几分钟。最终数据150G。

def train_data(dataset_path, openpose_path, out_path, joints_idx, scaleFactor, extract_img=False, fits_3d=None):
    joints17_idx = [4, 18, 19, 20, 23, 24, 25, 3, 5, 6, 7, 9, 10, 11, 14, 15, 16]

    h, w = 2048, 2048
    imgnames_, scales_, centers_ = [], [], []
    parts_, Ss_, openposes_ = [], [], []

    # training data
    user_list = range(1,9)
    #user_list = range(8, 9) 
    seq_list = range(1,3)
    vid_list = list(range(3)) + list(range(4, 9))

    counter = 0



    a = np.load('/vol/research/yy01071_sound/kxy-MSc_project/SPIN-master/data/dataset_extras/mpi_inf_3dhp_train.npz')

    b = a["imgname"]




    for user_i in user_list:
        for seq_i in seq_list:
            seq_path = os.path.join(dataset_path,
                                    'S' + str(user_i),
                                    'Seq' + str(seq_i))
            # mat file with annotations
            annot_file = os.path.join(seq_path, 'annot.mat')
            annot2 = sio.loadmat(annot_file)['annot2']
            annot3 = sio.loadmat(annot_file)['annot3']
            # calibration file and camera parameters
            calib_file = os.path.join(seq_path, 'camera.calibration')
            Ks, Rs, Ts = read_calibration(calib_file, vid_list)

            for j, vid_i in enumerate(vid_list):

                # image folder
                imgs_path = os.path.join(seq_path,
                                         'imageFrames',
                                         'video_' + str(vid_i))

                # extract frames from video file
                if extract_img:

                    # if doesn't exist
                    if not os.path.isdir(imgs_path):
                        os.makedirs(imgs_path)

                    # video file
                    vid_file = os.path.join(seq_path,
                                            'imageSequence',
                                            'video_' + str(vid_i) + '.avi')
                    vidcap = cv2.VideoCapture(vid_file)








                    pre_list = []

                    l = len(b)
                    for i in range(l):
                        cur_b = b[i]
                        sp = cur_b.split("/")
                        s = sp[0]
                        seq = sp[1]
                        c_video = sp[3]
                        c_frame = sp[-1]
                        if s == 'S' + str(user_i):
                            if seq == 'Seq' + str(seq_i):
                                if c_video == 'video_' + str(vid_i):
                                    num = c_frame[6:12]
                                    pre_list.append(int(num))



                    # process video
                    frame = 0
                    index = 0
                    num=pre_list[index]
                    while 1:
                        # extract all frames
                        success, image = vidcap.read()
                        if not success:
                            break
                        frame += 1

                        # 后加
                        if frame != num:
                            continue
                        index +=1

                        # image name
                        imgname = os.path.join(imgs_path,
                                               'frame_%06d.jpg' % frame)
                        # save image
                        cv2.imwrite(imgname, image)
                        try:
                            num = pre_list[index]
                        except:
                            pass

3.重新运行training code
重新运行train又会有一些小的报错。

报错1: scipy.misc is deprecated and has no attribute imrotate
解决：#new_img = scipy.misc.imrotate(new_img, rot)
scipy.ndimage.interpolation.rotate(new_img, rot)

报错2：Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or logical_not() operator instead.
解决：/spin/lib/python3.9/site-packages/torchgeometry/core/conversions.py 302行：
# mask_c0 = mask_d2 * mask_d0_d1
# mask_c1 = mask_d2 * (1 - mask_d0_d1)
# mask_c2 = (1 - mask_d2) * mask_d0_nd1
# mask_c3 = (1 - mask_d2) * (1 - mask_d0_nd1)
mask_c0 = mask_d2 * mask_d0_d1
mask_c1 = mask_d2 * ~(mask_d0_d1)
mask_c2 = ~(mask_d2) * mask_d0_nd1
mask_c3 = ~(mask_d2) * ~(mask_d0_nd1)

报错3：result type Byte can’t be cast to the desired output type Bool
解决：trainer 227行
#valid_fit = valid_fit | has_smpl
valid_fit = valid_fit | has_smpl.bool()

报错4：内存超了
解决：utils/train_options.py里batch_size=8

以上，终于可以正常train啦！
可以加上wandb查看训练过程

4.附所有datasets.npz的数据大小

train dataset：
lsp, coco, lspet, mpii
[‘imgname’, ‘center’, ‘scale’, ‘part’, ‘openpose’]
lsp (1000, )
lspet (9428, )
mpii (14810, )
coco (28344, )

mpi_inf_3dhp (96507,)
[‘imgname’, ‘center’, ‘scale’, ‘part’, ‘S’, ‘pose’, ‘shape’, ‘has_smpl’, ‘openpose’]
S (96507, 24, 4)
pose (96507, 72)
shape (96507, 10)

validation dataset：
h36m-1 (109867,)
h36m-2 (27558,)
[‘imgname’, ‘center’, ‘scale’, ‘S’]

mpi_inf_3dhp (2929, )
[‘imgname’, ‘center’, ‘scale’, ‘part’, ‘S’]

lsp (1000, )
[‘imgname’, ‘maskname’, ‘partname’, ‘center’, ‘scale’, ‘part’]

3dpw (35515,)
[‘imgname’, ‘center’, ‘scale’, ‘pose’, ‘shape’, ‘gender’]