Project Practice | Multi-Person Pose Estimation (Code + Weights = One-Click Run)

Latest recommended article published on 2024-04-12 09:41:11

AI算法修炼营


Tap the blue 【AI算法修炼营】 at the very top to follow the public account.

Reply 【姿态估计】 to get the complete project code and documentation.

1. Introduction to pose estimation

2. Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields

2.1 Model architecture (with Keras code)

2.2 Algorithm flow

2.3 Test results

References

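1. Introduction to pose estimation

Pose estimation is the problem of determining the orientation of a three-dimensional target object. It is used in many fields, including robot vision, motion tracking, and single-camera calibration. The sensors used for pose estimation differ from field to field.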

2. Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields

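2.1 Model architecture

The network has two branches: the upper branch of convolutional layers predicts the confidence maps (S), and the lower branch predicts the part affinity fields, or PAFs (L). The network is organized into multiple stages, with intermediate supervision at the end of each stage. After each stage, S and L are concatenated with the shared feature map F (stage0_out in the code below) and passed to the next stage. Both branches are trained with an L2 loss between the prediction and the ground truth.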



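# Imports the listing needs (not shown in the original post). The module
# paths assume Keras 2.x; newer releases may expose the initializers as
# RandomNormal / Constant instead.
from keras.models import Model
from keras.layers import (Input, Activation, Lambda, Conv2D,
                          MaxPooling2D, Multiply, Concatenate)
from keras.regularizers import l2
from keras.initializers import random_normal, constant


def relu(x):
    # Apply ReLU as a separate activation layer
    return Activation('relu')(x)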




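def conv(x, nf, ks, name, weight_decay):
    # weight_decay is a (kernel_decay, bias_decay) tuple, or None.
    # The testing model ends up passing (None, 0), and a non-empty tuple is
    # always truthy, so check each entry instead of the tuple itself.
    kernel_reg = l2(weight_decay[0]) if weight_decay and weight_decay[0] else None
    bias_reg = l2(weight_decay[1]) if weight_decay and weight_decay[1] else None

    # 'same' padding keeps the spatial size; only the pooling layers downsample
    x = Conv2D(nf, (ks, ks), padding='same', name=name,
               kernel_regularizer=kernel_reg,
               bias_regularizer=bias_reg,
               kernel_initializer=random_normal(stddev=0.01),
               bias_initializer=constant(0.0))(x)
    return x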




def pooling(x, ks, st, name):
    x = MaxPooling2D((ks, ks), strides=(st, st), name=name)(x)
    return x




def vgg_block(x, weight_decay):
    # Block 1
    x = conv(x, 64, 3, "conv1_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 64, 3, "conv1_2", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool1_1")


    # Block 2
    x = conv(x, 128, 3, "conv2_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "conv2_2", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool2_1")


    # Block 3
    x = conv(x, 256, 3, "conv3_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_2", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_3", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 256, 3, "conv3_4", (weight_decay, 0))
    x = relu(x)
    x = pooling(x, 2, 2, "pool3_1")


    # Block 4
    x = conv(x, 512, 3, "conv4_1", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 512, 3, "conv4_2", (weight_decay, 0))
    x = relu(x)


    # Additional non vgg layers
    x = conv(x, 256, 3, "conv4_3_CPM", (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "conv4_4_CPM", (weight_decay, 0))
    x = relu(x)


    return x




def stage1_block(x, num_p, branch, weight_decay):
    # Block 1
    x = conv(x, 128, 3, "Mconv1_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "Mconv2_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 3, "Mconv3_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, 512, 1, "Mconv4_stage1_L%d" % branch, (weight_decay, 0))
    x = relu(x)
    x = conv(x, num_p, 1, "Mconv5_stage1_L%d" % branch, (weight_decay, 0))


    return x




def stageT_block(x, num_p, stage, branch, weight_decay):
    # Block 1
    x = conv(x, 128, 7, "Mconv1_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv2_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv3_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv4_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 7, "Mconv5_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, 128, 1, "Mconv6_stage%d_L%d" % (stage, branch), (weight_decay, 0))
    x = relu(x)
    x = conv(x, num_p, 1, "Mconv7_stage%d_L%d" % (stage, branch), (weight_decay, 0))


    return x




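def apply_mask(x, mask1, mask2, num_p, stage, branch):
    # Multiply a stage output by its loss mask so that masked-out regions
    # contribute nothing to the loss.
    w_name = "weight_stage%d_L%d" % (stage, branch)
    if num_p == 38:
        w = Multiply(name=w_name)([x, mask1])  # PAF branch: vec_weight
    else:
        w = Multiply(name=w_name)([x, mask2])  # confidence-map branch: vec_heat
    return w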




def get_training_model(weight_decay):


    stages = 6
    np_branch1 = 38
    np_branch2 = 19


    img_input_shape = (None, None, 3)
    vec_input_shape = (None, None, 38)
    heat_input_shape = (None, None, 19)


    inputs = []
    outputs = []


    img_input = Input(shape=img_input_shape)
    vec_weight_input = Input(shape=vec_input_shape)
    heat_weight_input = Input(shape=heat_input_shape)


    inputs.append(img_input)
    inputs.append(vec_weight_input)
    inputs.append(heat_weight_input)


    img_normalized = Lambda(lambda x: x / 256 - 0.5)(img_input) # [-0.5, 0.5]


    # VGG
    stage0_out = vgg_block(img_normalized, weight_decay)


    # stage 1 - branch 1 (PAF)
    stage1_branch1_out = stage1_block(stage0_out, np_branch1, 1, weight_decay)
    w1 = apply_mask(stage1_branch1_out, vec_weight_input, heat_weight_input, np_branch1, 1, 1)


    # stage 1 - branch 2 (confidence maps)
    stage1_branch2_out = stage1_block(stage0_out, np_branch2, 2, weight_decay)
    w2 = apply_mask(stage1_branch2_out, vec_weight_input, heat_weight_input, np_branch2, 1, 2)


    x = Concatenate()([stage1_branch1_out, stage1_branch2_out, stage0_out])


    outputs.append(w1)
    outputs.append(w2)


    # stage sn >= 2
    for sn in range(2, stages + 1):
        # stage SN - branch 1 (PAF)
        stageT_branch1_out = stageT_block(x, np_branch1, sn, 1, weight_decay)
        w1 = apply_mask(stageT_branch1_out, vec_weight_input, heat_weight_input, np_branch1, sn, 1)


        # stage SN - branch 2 (confidence maps)
        stageT_branch2_out = stageT_block(x, np_branch2, sn, 2, weight_decay)
        w2 = apply_mask(stageT_branch2_out, vec_weight_input, heat_weight_input, np_branch2, sn, 2)


        outputs.append(w1)
        outputs.append(w2)


        if (sn < stages):
            x = Concatenate()([stageT_branch1_out, stageT_branch2_out, stage0_out])


    model = Model(inputs=inputs, outputs=outputs)


    return model




def get_testing_model():
    stages = 6
    np_branch1 = 38
    np_branch2 = 19


    img_input_shape = (None, None, 3)


    img_input = Input(shape=img_input_shape)


    img_normalized = Lambda(lambda x: x / 256 - 0.5)(img_input) # [-0.5, 0.5]


    # VGG
    stage0_out = vgg_block(img_normalized, None)


    # stage 1 - branch 1 (PAF)
    stage1_branch1_out = stage1_block(stage0_out, np_branch1, 1, None)


    # stage 1 - branch 2 (confidence maps)
    stage1_branch2_out = stage1_block(stage0_out, np_branch2, 2, None)


    x = Concatenate()([stage1_branch1_out, stage1_branch2_out, stage0_out])


    # stage t >= 2
    stageT_branch1_out = None
    stageT_branch2_out = None
    for sn in range(2, stages + 1):
        stageT_branch1_out = stageT_block(x, np_branch1, sn, 1, None)
        stageT_branch2_out = stageT_block(x, np_branch2, sn, 2, None)


        if (sn < stages):
            x = Concatenate()([stageT_branch1_out, stageT_branch2_out, stage0_out])


    model = Model(inputs=[img_input], outputs=[stageT_branch1_out, stageT_branch2_out])


    return model
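
To make the tensor shapes in the listing above easier to follow, here is a small bookkeeping sketch in plain Python. The constants are read off the Keras code (three 2x2 poolings in vgg_block, 128 channels out of conv4_4_CPM, 38 PAF channels, 19 confidence-map channels); the 368x368 input size is only an illustrative assumption, and the helper names are made up for this example.

```python
# Values read off the Keras listing; helper names are hypothetical.
POOL_STRIDES = [2, 2, 2]   # pool1_1, pool2_1, pool3_1 in vgg_block
BACKBONE_CHANNELS = 128    # output channels of conv4_4_CPM
PAF_CHANNELS = 38          # np_branch1
HEATMAP_CHANNELS = 19      # np_branch2


def output_stride(strides=POOL_STRIDES):
    """Total downsampling factor of the VGG front end."""
    total = 1
    for s in strides:
        total *= s
    return total


def stage_input_channels():
    """Channels after Concatenate([paf, heatmap, stage0_out])."""
    return PAF_CHANNELS + HEATMAP_CHANNELS + BACKBONE_CHANNELS


def feature_map_size(height, width):
    """Spatial size of every stage output for a given input size."""
    stride = output_stride()
    return height // stride, width // stride


print(output_stride())              # 8
print(stage_input_channels())       # 185
print(feature_map_size(368, 368))   # (46, 46), for an assumed 368x368 input
```

So every stage from the second one onward sees a 185-channel input at 1/8 of the image resolution, and the predicted maps have to be scaled back up by the same factor before keypoints are read off in image coordinates.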

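The loss includes a spatial weighting because some datasets do not annotate every person; the mask they provide marks regions that may contain unlabeled people. The final loss is the sum of the losses of all stages.

The paper reports very good results on both the MPII and COCO datasets, and the demo also works very well, although detection of people at small scales is weaker than with other algorithms.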
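
As a rough illustration of the masked, multi-stage loss described above, the following plain-Python sketch computes it for a single channel of a single branch. The function names and the toy 2x2 maps are assumptions made for this example; in the Keras listing the masking is done by apply_mask on whole tensors.

```python
def masked_l2(pred, target, mask):
    """Sum of mask * (pred - target)^2 over a 2D map given as lists of rows."""
    total = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            total += m * (p - t) ** 2
    return total


def total_loss(stage_preds, target, mask):
    """Add up the masked loss of every stage (intermediate supervision)."""
    return sum(masked_l2(pred, target, mask) for pred in stage_preds)


target = [[0.0, 1.0], [0.0, 0.0]]
mask = [[1.0, 1.0], [1.0, 0.0]]    # bottom-right pixel may hold an unlabeled person
stage1 = [[0.0, 0.5], [0.0, 0.9]]  # the 0.9 response is not penalized
stage2 = [[0.0, 1.0], [0.0, 0.9]]
print(total_loss([stage1, stage2], target, mask))  # 0.25
```

The only error that counts is the 0.5 miss in stage 1; the strong response in the masked pixel is ignored, which is the point of the spatial weighting.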

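Method proposed in the paper

1) Joint detection with confidence maps

Each joint has its own confidence map, which assigns a confidence value to every pixel of the image; the value at each point depends on its distance to the ground-truth joint location. For multiple people, the confidence maps of the K people are merged by taking the per-pixel maximum over people. The maximum is used rather than the average so that peaks stay distinct even when they are close together, which preserves precision. At test time, non-maximum suppression is applied to obtain the body-part candidates.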
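
The two operations in this paragraph can be sketched in plain Python as follows. This is a minimal illustration, not the project's implementation: the Gaussian width sigma, the threshold, and the 4-neighbour peak test are assumptions, and the function names are made up for this example.

```python
import math


def confidence_map(joints, height, width, sigma=1.0):
    """Ground-truth map for one joint type: per-pixel maximum over people
    of a Gaussian centred on each person's joint location (x, y)."""
    heat = [[0.0] * width for _ in range(height)]
    for jx, jy in joints:
        for y in range(height):
            for x in range(width):
                d2 = (x - jx) ** 2 + (y - jy) ** 2
                value = math.exp(-d2 / sigma ** 2)
                heat[y][x] = max(heat[y][x], value)  # max, not average
    return heat


def find_peaks(heat, threshold=0.1):
    """Simple non-maximum suppression: keep pixels above the threshold
    that are strictly greater than their 4 neighbours."""
    height, width = len(heat), len(heat[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            v = heat[y][x]
            if v < threshold:
                continue
            neighbours = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            if all(v > heat[ny][nx] for nx, ny in neighbours
                   if 0 <= nx < width and 0 <= ny < height):
                peaks.append((x, y))
    return peaks


# Same joint type for two people, three pixels apart
heat = confidence_map([(2, 2), (5, 2)], height=5, width=8)
print(find_peaks(heat))  # [(2, 2), (5, 2)]
```

Because the maps are merged with a maximum, both joints keep a peak value of 1.0 even though they are close, and the peak search recovers them as two separate candidates.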

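2) Body-part association with PAFs

With multiple people, the parts of different people are all detected, but the parts of each person still have to be assembled into a full body. The method used for this, and the core of the paper, is the PAF, whose advantage is that it encodes both position and orientation. For each type of limb there is an affinity region between the two associated body parts, and every pixel in it carries a 2D vector describing the limb direction. The affinity map of one limb therefore has dimensions w*h*2 (because the vectors are two-dimensional). Where several people overlap at a point, the vectors of the k people are summed and then divided by the number of people.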
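
A ground-truth PAF for one limb type can be sketched as below. This is a simplified illustration under stated assumptions: limbs are straight segments, the affinity region is every pixel within limb_width of the segment, and the function name and parameters are hypothetical.

```python
import math


def limb_paf(limbs, height, width, limb_width=1.0):
    """limbs: list of ((x1, y1), (x2, y2)) segments, one per person.
    Returns a height x width grid of (vx, vy): the unit limb direction
    inside each limb's region, averaged where limbs overlap."""
    sums = [[[0.0, 0.0] for _ in range(width)] for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for (x1, y1), (x2, y2) in limbs:
        length = math.hypot(x2 - x1, y2 - y1)
        if length == 0:
            continue
        ux, uy = (x2 - x1) / length, (y2 - y1) / length
        for y in range(height):
            for x in range(width):
                dx, dy = x - x1, y - y1
                along = dx * ux + dy * uy        # position along the limb
                across = abs(dx * uy - dy * ux)  # distance from the limb axis
                if 0 <= along <= length and across <= limb_width:
                    sums[y][x][0] += ux
                    sums[y][x][1] += uy
                    counts[y][x] += 1
    return [[(sums[y][x][0] / counts[y][x], sums[y][x][1] / counts[y][x])
             if counts[y][x] else (0.0, 0.0)
             for x in range(width)] for y in range(height)]


# A horizontal and a vertical limb of the same type crossing at (3, 2)
paf = limb_paf([((1, 2), (5, 2)), ((3, 0), (3, 4))],
               height=5, width=7, limb_width=0.5)
print(paf[2][1])  # (1.0, 0.0): only the horizontal limb
print(paf[2][3])  # (0.5, 0.5): sum of both unit vectors divided by 2
```

Stacking vx and vy as two channels gives the w*h*2 map mentioned above for one limb type; the 38 PAF channels in the Keras code correspond to 19 such limb types.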

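3) Bottom-up method

Given the confidence maps and the PAFs, the next question is how to use them to find the optimal pairwise connections between body parts, which turns into a graph-theory problem. The paper uses the Hungarian algorithm. The nodes of the graph are the detected body-part candidates, and the edges are the possible connections between them. The weight of each edge is the PAF aggregated along that connection. The matching problem is then to find a set of connections in which no two edges share a node and whose total weight is maximal.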
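
The edge weight and the matching step can be sketched as follows. Note the substitution: for brevity this example finds the maximum-weight matching by exhaustive search over permutations, which reaches the same optimum as the Hungarian algorithm on tiny inputs but does not scale. The sampling scheme, the function names, and the toy PAF are assumptions for illustration.

```python
import math
from itertools import permutations


def connection_score(paf, a, b, samples=10):
    """Approximate the PAF line integral along a -> b: the mean dot product
    between the PAF vector at sampled points and the unit vector a -> b."""
    (ax, ay), (bx, by) = a, b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return 0.0
    ux, uy = (bx - ax) / length, (by - ay) / length
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        x = int(round(ax + t * (bx - ax)))
        y = int(round(ay + t * (by - ay)))
        vx, vy = paf[y][x]
        total += vx * ux + vy * uy
    return total / samples


def best_matching(scores):
    """Maximum-weight one-to-one assignment by brute force.
    scores[i][j] scores candidate i of part A against candidate j of
    part B; assumes len(A) <= len(B)."""
    n_a, n_b = len(scores), len(scores[0])
    best, best_total = None, -math.inf
    for perm in permutations(range(n_b), n_a):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best, best_total = list(enumerate(perm)), total
    return best


# Toy PAF: two horizontal limbs pointing right, on rows 1 and 3
paf = [[(1.0, 0.0) if y in (1, 3) else (0.0, 0.0) for x in range(5)]
       for y in range(5)]
part_a = [(0, 1), (0, 3)]   # e.g. two shoulder candidates
part_b = [(4, 3), (4, 1)]   # e.g. two elbow candidates
scores = [[connection_score(paf, a, b) for b in part_b] for a in part_a]
print(best_matching(scores))  # [(0, 1), (1, 0)]
```

Each candidate in part_a is paired with the candidate on its own row, because the diagonal connections cross a region where the PAF is zero and so score lower. For realistic candidate counts, a library routine such as scipy.optimize.linear_sum_assignment could replace the brute-force search.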

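2.2 Algorithm flow

The pipeline, shown as panels (a)-(e) in the figure, is:

1. Take an image as input.

2. Predict the keypoint heatmaps and the PAFs for that image.

3. Associate the keypoints into limbs by bipartite matching.

4. Obtain the full poses of all people in the image.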
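
The last step, turning matched limbs into per-person poses, can be sketched as below. This is a deliberately simplified illustration: it assumes limb types are processed in an order where the first part of each limb has already been assigned, and it ignores missing parts and the merging of partial skeletons. The part names and the function name are hypothetical.

```python
def assemble_people(connections):
    """connections maps a limb type (part_a, part_b) to its matched
    candidate pairs [(id_a, id_b), ...]. Returns one dict per person,
    mapping part name -> candidate id."""
    people = []
    for (part_a, part_b), pairs in connections.items():
        for id_a, id_b in pairs:
            for person in people:
                if person.get(part_a) == id_a:
                    # This limb continues an existing person
                    person[part_b] = id_b
                    break
            else:
                # No person owns this part yet: start a new one
                people.append({part_a: id_a, part_b: id_b})
    return people


connections = {
    ("neck", "r_shoulder"): [(0, 0), (1, 1)],
    ("r_shoulder", "r_elbow"): [(0, 1), (1, 0)],
}
for person in assemble_people(connections):
    print(person)
```

Limbs that share a body-part candidate end up in the same person, so two matched limb types here yield two three-joint skeletons.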

2.3 Test results

References:

https://blog.csdn.net/qq_36165459/article/details/78322184

https://zhuanlan.zhihu.com/p/79594205

https://blog.csdn.net/diligent_321/article/details/86659763

https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation

Note: the dataset is the COCO dataset, which you can download yourself.



