Keypoint Model Algorithm:
Overview: the basic idea of a clothing keypoint detection algorithm is simple: feed a clothing image into the network and get back a set of keypoints.
First, some questions to think about:
- If we can mark an item of clothing (or any object) precisely, can we then segment it accurately by drawing lines through the marked points?
- Could we also easily cut the clothing (object) out of the image?
- And with the cut-out region, do other things: deformation, dilation, stretching, and so on.
- So how does a model actually derive the keypoints? Put differently, what does the model have to learn before it can predict keypoint locations on its own?
- Model structure: what should we consider when designing the input, the output, and the intermediate layers?
The following sections briefly describe the model design, the network structure, and the test results.
Data IO:
Input: labeled image x
Output: annotation result Y
The keypoint detection model is trained from these (x, Y) pairs.
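To make the IO concrete, here is a minimal training-loop sketch built around the CoordRegressionNetwork class defined below. It uses the loss functions from the dsntnn package (euclidean_losses plus js_reg_losses, averaged with average_loss); the data loader, learning rate, and sigma_t value are assumptions for illustration, not part of the original project.

import torch
import dsntnn

model = CoordRegressionNetwork(n_locations=24)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# `loader` is a hypothetical DataLoader yielding (images, target_coords),
# with target_coords already normalized to dsntnn's (-1, 1) coordinate range.
for images, target_coords in loader:
    coords, heatmaps = model(images)
    # Distance between predicted and ground-truth coordinates, per keypoint
    euc_losses = dsntnn.euclidean_losses(coords, target_coords)
    # Encourage each heatmap to look like a Gaussian around the target point
    reg_losses = dsntnn.js_reg_losses(heatmaps, target_coords, sigma_t=1.0)
    loss = dsntnn.average_loss(euc_losses + reg_losses)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()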
Model Creation:
Overall flow of the algorithm:
model = CoordRegressionNetwork(n_locations=24)
Here the whole algorithm is wrapped in a numerical-coordinate regression class, CoordRegressionNetwork. Inside the class:
from torch import nn
import dsntnn

class CoordRegressionNetwork(nn.Module):
    def __init__(self, n_locations):
        super().__init__()
        self.fcn = KeyPointsModel()
        self.hm_conv = nn.Conv2d(24, n_locations, kernel_size=1, bias=False)

    def forward(self, images):
        # Run the images through the fully convolutional feature extractor
        fcn_out = self.fcn(images)
        # Use a 1x1 conv to get one unnormalized heatmap per keypoint location
        unnormalized_heatmaps = self.hm_conv(fcn_out)
        # Normalize each heatmap so it sums to 1 (a spatial probability map)
        heatmaps = dsntnn.flat_softmax(unnormalized_heatmaps)
        # Regress numerical coordinates from the normalized heatmaps
        coords = dsntnn.dsnt(heatmaps)
        return coords, heatmaps
From the forward(self, images) method we can see that the model takes images as input and returns two outputs: the coordinates coords and the heatmaps heatmaps.
The CoordRegressionNetwork class wraps and calls two components: KeyPointsModel, a feature-extraction network for keypoint information that generates the heatmaps, and dsntnn, a coordinate-regression module used to produce the final set of keypoint coordinates along with a normalized heatmap for each detected point.
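As a quick sanity check, the sketch below pushes a random batch through the network; the 256x256 input size and batch size are assumptions for illustration. Because every pool in KeyPointsModel is immediately unpooled and all its convolutions preserve spatial size, the heatmaps come out at the input resolution, and dsnt returns one (x, y) pair per keypoint in the normalized range (-1, 1).

import torch

model = CoordRegressionNetwork(n_locations=24)
images = torch.randn(2, 3, 256, 256)   # hypothetical batch of 2 RGB images
coords, heatmaps = model(images)
print(coords.shape)    # torch.Size([2, 24, 2])
print(heatmaps.shape)  # torch.Size([2, 24, 256, 256])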
The keypoint feature-extraction network KeyPointsModel
subroutine1: the sub-module
import torch
from torch import nn
from collections import OrderedDict

class KeyPointsModel(nn.Module):
    def __init__(self):
        super(KeyPointsModel, self).__init__()
        # These layers have no ReLU after them
        no_relu_layers = ['conv6_2_CPM', 'Mconv7_stage2', 'Mconv7_stage3',
                          'Mconv7_stage4', 'Mconv7_stage5', 'Mconv7_stage6']
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0, return_indices=True)
        self.maxunpool = nn.MaxUnpool2d(2, stride=2)
        # stage 1
        # Each entry is: name -> [in_channels, out_channels, kernel_size, stride, padding]
        block1_0_0 = OrderedDict([
            ('conv1_1', [3, 64, 3, 1, 1]),
            ('conv1_2', [64, 64, 3, 1, 1]),
        ])
        block1_0_1 = OrderedDict([
            ('conv2_1', [64, 128, 3, 1, 1]),
            ('conv2_2', [128, 128, 3, 1, 1]),
        ])
        block1_0_2 = OrderedDict([
            ('conv3_1', [128, 256, 3, 1, 1]),
            ('conv3_2', [256, 256, 3, 1, 1]),
            ('conv3_3', [256, 256, 3, 1, 1]),
            ('conv3_4', [256, 256, 3, 1, 1]),
        ])
        block1_0_3 = OrderedDict([
            ('conv4_1', [256, 512, 3, 1, 1]),
            ('conv4_2', [512, 512, 3, 1, 1]),
            ('conv4_3', [512, 512, 3, 1, 1]),
            ('conv4_4', [512, 512, 3, 1, 1]),
            ('conv5_1', [512, 512, 3, 1, 1]),
            ('conv5_2', [512, 512, 3, 1, 1]),
            ('conv5_3_CPM', [512, 128, 3, 1, 1])
        ])
        block1_1 = OrderedDict([
            ('conv6_1_CPM', [128, 512, 1, 1, 0]),
            ('conv6_2_CPM', [512, 24, 1, 1, 0])
        ])
        blocks = {}
        blocks['block1_0_0'] = block1_0_0
        blocks['block1_0_1'] = block1_0_1
        blocks['block1_0_2'] = block1_0_2
        blocks['block1_0_3'] = block1_0_3
        blocks['block1_1'] = block1_1
        # stages 2-6: each refinement stage sees the 24 stage-1 maps
        # concatenated with the 128 shared feature maps (24 + 128 = 152 channels)
        for i in range(2, 7):
            blocks['block%d' % i] = OrderedDict([
                ('Mconv1_stage%d' % i, [152, 128, 7, 1, 3]),
                ('Mconv2_stage%d' % i, [128, 128, 7, 1, 3]),
                ('Mconv3_stage%d' % i, [128, 128, 7, 1, 3]),
                ('Mconv4_stage%d' % i, [128, 128, 7, 1, 3]),
                ('Mconv5_stage%d' % i, [128, 128, 7, 1, 3]),
                ('Mconv6_stage%d' % i, [128, 128, 1, 1, 0]),
                ('Mconv7_stage%d' % i, [128, 24, 1, 1, 0])
            ])
        for k in blocks.keys():
            blocks[k] = make_layers(blocks[k], no_relu_layers)
        self.model1_0_0 = blocks['block1_0_0']
        self.model1_0_1 = blocks['block1_0_1']
        self.model1_0_2 = blocks['block1_0_2']
        self.model1_0_3 = blocks['block1_0_3']
        self.model1_1 = blocks['block1_1']
        self.model2 = blocks['block2']
        self.model3 = blocks['block3']
        self.model4 = blocks['block4']
        self.model5 = blocks['block5']
        self.model6 = blocks['block6']

    def forward(self, x):
        # block0: shared feature extraction; each pool is immediately
        # unpooled, so the spatial resolution of the input is preserved
        out1_0_0 = self.model1_0_0(x)
        output, indices = self.maxpool(out1_0_0)
        output_un_pool = self.maxunpool(output, indices)
        out1_0_1 = self.model1_0_1(output_un_pool)
        output, indices = self.maxpool(out1_0_1)
        output_un_pool = self.maxunpool(output, indices)
        out1_0_2 = self.model1_0_2(output_un_pool)
        output, indices = self.maxpool(out1_0_2)
        output_un_pool = self.maxunpool(output, indices)
        out1_0 = self.model1_0_3(output_un_pool)
        # block1: first keypoint prediction (24 feature maps)
        out1_1 = self.model1_1(out1_0)
        # stages 2-6: each stage refines the previous prediction,
        # conditioned on the shared features out1_0
        concat_stage2 = torch.cat([out1_1, out1_0], 1)
        out_stage2 = self.model2(concat_stage2)
        concat_stage3 = torch.cat([out_stage2, out1_0], 1)
        out_stage3 = self.model3(concat_stage3)
        concat_stage4 = torch.cat([out_stage3, out1_0], 1)
        out_stage4 = self.model4(concat_stage4)
        concat_stage5 = torch.cat([out_stage4, out1_0], 1)
        out_stage5 = self.model5(concat_stage5)
        concat_stage6 = torch.cat([out_stage5, out1_0], 1)
        out_stage6 = self.model6(concat_stage6)
        return out_stage6
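Note that the constructor calls a make_layers helper that is not shown in this excerpt. A minimal sketch consistent with the specs above (each value being [in_channels, out_channels, kernel_size, stride, padding], with a ReLU after every convolution except those in no_relu_layers) might look like the following; the actual helper in the project may differ.

from collections import OrderedDict
from torch import nn

def make_layers(block, no_relu_layers):
    # Assumed spec format: name -> [in_ch, out_ch, kernel_size, stride, padding]
    layers = []
    for name, (in_ch, out_ch, kernel, stride, padding) in block.items():
        layers.append((name, nn.Conv2d(in_ch, out_ch, kernel_size=kernel,
                                       stride=stride, padding=padding)))
        if name not in no_relu_layers:
            layers.append(('relu_' + name, nn.ReLU(inplace=True)))
    return nn.Sequential(OrderedDict(layers))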
The number of layers in the network can be adjusted to trade accuracy against efficiency. If the code feels abstract, compare it against the flow diagram below; the input, output, and channel counts of each layer are annotated there as well.
A detailed diagram of the subroutine1 network follows.
I include the detailed layer design to reinforce the design ideas for myself: it makes the feature-extraction process easier to picture, and raises the question of how the AI actually learns. Which units act as the learned memory, and where is that memory stored?
Numerical coordinate regression network: DSNTNN
def forward(self, images):
    # Run the images through the fully convolutional feature extractor
    fcn_out = self.fcn(images)
    # Use a 1x1 conv to get one unnormalized heatmap per keypoint location
    unnormalized_heatmaps = self.hm_conv(fcn_out)
    # Normalize each heatmap so it sums to 1
    heatmaps = dsntnn.flat_softmax(unnormalized_heatmaps)
    # Regress numerical coordinates from the normalized heatmaps
    coords = dsntnn.dsnt(heatmaps)
    return coords, heatmaps
The dsntnn package must be installed in advance (pip install dsntnn).
In unnormalized_heatmaps = self.hm_conv(fcn_out), the hm_conv convolution layer is the one defined as self.hm_conv = nn.Conv2d(24, n_locations, kernel_size=1, bias=False), where n_locations is the number of keypoints this clothing model outputs.
After the 1x1 convolution (unnormalized_heatmaps = self.hm_conv(fcn_out)) produces one coarse heatmap per keypoint, dsntnn.flat_softmax normalizes each map into a standard probability heatmap. The heatmaps are then regressed to numerical coordinates with coords = dsntnn.dsnt(heatmaps), yielding the keypoint set coords with n_locations points.
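To see what these two calls do, here is a small standalone example; the 5x5 heatmap size and the peak position are arbitrary choices for illustration. dsnt returns coordinates in the normalized range (-1, 1), so a peak toward the bottom-right yields positive x and y.

import torch
import dsntnn

# One image, one keypoint: a 5x5 unnormalized map peaked at row 3, column 4
raw = torch.zeros(1, 1, 5, 5)
raw[0, 0, 3, 4] = 10.0
heatmaps = dsntnn.flat_softmax(raw)  # softmax over the flattened 5x5 grid
print(heatmaps[0, 0].sum())          # ~1.0: each map is now a probability map
coords = dsntnn.dsnt(heatmaps)       # expected (x, y) under that distribution
print(coords)                        # approximately [[[0.8, 0.4]]]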
Test Results:
Note: the keypoint counts to be learned for each category are tallied below.
Test categories (a per-category configuration sketch follows the list):
- trousers: 7 keypoints
- skirt: 4 keypoints
- outwear: 15 keypoints
- dress: 15 keypoints
- blouse: 13 keypoints
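Since each category needs a different n_locations, one simple way to organize this (a sketch, not code from the original project) is a category-to-count mapping used to build one model per garment type:

# Hypothetical per-category configuration based on the counts listed above
KEYPOINT_COUNTS = {
    'trousers': 7,
    'skirt': 4,
    'outwear': 15,
    'dress': 15,
    'blouse': 13,
}
models = {name: CoordRegressionNetwork(n_locations=n)
          for name, n in KEYPOINT_COUNTS.items()}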
Resources:
Paper: Key Points Detection Algorithm of Object Based on Full Convolution Network
Project code (my Gitee): https://gitee.com/rpr/key-points-detection-method-with-torch
The project requires the dsntnn package.
Training and test datasets: https://pan.baidu.com/s/1ZdmzHcJA9FbNDpZF1bm4Hg
Extraction code: ps4v
The pre-trained KPDEM_model.zip contains the keypoint detection models obtained by training the keypoint detection (Keypoints Detection) algorithm on the clothing data: trousers, skirt, outwear, coat, and dress.
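To load one of the pre-trained models, a sketch along these lines should work, assuming the checkpoints are saved as state dicts; the file name 'trousers.pth' is hypothetical, so substitute the actual name after extracting the zip.

import torch

model = CoordRegressionNetwork(n_locations=7)  # 7 keypoints for trousers
state_dict = torch.load('trousers.pth', map_location='cpu')  # hypothetical name
model.load_state_dict(state_dict)
model.eval()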
If you need the datasets or any other information, please leave me a message with your email address.