Multi-person part parsing -- Towards Real World Human Parsing: Multiple-Human Parsing in the Wild

lien0906

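This post introduces the MHP dataset and the MH-Parser algorithm for multiple-human parsing in real-world images, and a self-supervised structure-sensitive learning method that improves segmentation accuracy.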

Towards Real World Human Parsing: Multiple-Human Parsing in the Wild
https://arxiv.org/abs/1705.07206

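The dataset has not been released yet!

Existing human parsing datasets are mostly annotated with a single person per image, whereas real images often contain several people. To address this, the paper proposes the Multiple-Human Parsing (MHP) dataset, with typically 2 to 16 people per image. It then proposes the Multiple-Human Parser (MH-Parser) algorithm, which considers both global context and local cues when parsing each person, and achieves good results.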

First, a look at the dataset:
[Figure]

Sizes of the existing datasets:
[Figure]

Dataset statistics:
[Figure]

MH-Parser:
[Figure]

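MH-Parser consists of five main modules:
1) Representation learner: a CNN feature extractor whose features are shared by all of the following modules. A fully convolutional network is used so that spatial information is preserved.

2) Global parser: captures global information from the whole image and generates a semantic parsing map of the whole image.

3) Candidate nominator: contains three sub-modules, a Region Proposal Network (RPN), a bounding box classifier and a bounding box regressor. Similar to Faster R-CNN, it detects each person and outputs a bounding box.

4) Local parser: predicts semantic labels inside each bounding box that contains a person.

5) Global-local aggregator: takes the hidden representations of both the local parser and the global parser as input and produces the semantic parsing prediction for each person's bounding box.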
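To make the data flow between the five modules concrete, here is a minimal sketch in plain Python. It only illustrates the wiring described above; it is not the authors' implementation. The names `mh_parser_forward`, `crop` and the `modules` dictionary are hypothetical, and the stand-in modules in the toy example (identity functions, fixed boxes, element-wise max as the aggregator) are assumptions chosen to keep the example runnable.

```python
def crop(grid, box):
    """Cut a (top, left, bottom, right) box out of a 2D grid (list of rows)."""
    top, left, bottom, right = box
    return [row[left:right] for row in grid[top:bottom]]


def mh_parser_forward(image, modules):
    """Wire the five modules together; `modules` maps names to callables."""
    feats = modules["representation_learner"](image)   # shared features
    global_map = modules["global_parser"](feats)       # whole-image parsing
    boxes = modules["candidate_nominator"](feats)      # one box per person
    results = []
    for box in boxes:
        local_map = modules["local_parser"](crop(feats, box))
        # The aggregator sees local cues and the matching global region.
        fused = modules["aggregator"](local_map, crop(global_map, box))
        results.append((box, fused))
    return results


# Toy stand-ins (assumptions): the "image" already holds label ids, and
# each learned module is replaced by a trivial function.
toy_modules = {
    "representation_learner": lambda img: [row[:] for row in img],
    "global_parser": lambda feats: [row[:] for row in feats],
    "candidate_nominator": lambda feats: [(0, 1, 2, 2), (0, 3, 2, 4)],
    "local_parser": lambda region: [row[:] for row in region],
    "aggregator": lambda loc, glob: [
        [max(a, b) for a, b in zip(lr, gr)] for lr, gr in zip(loc, glob)
    ],
}

toy_image = [[0, 1, 0, 2],
             [0, 1, 0, 2]]
toy_results = mh_parser_forward(toy_image, toy_modules)
```

In a real system each dictionary entry would be a trained network; the point of the sketch is only that the backbone runs once and its features are reused by the global parser, the candidate nominator and the local parser.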

4.2 Detect-and-parse baseline

In this baseline the detection stage and the parsing stage are separate:
In the detection stage, we use the representation learner and the candidate nominator as the detection model.

In the parsing stage, we use the representation learner and the local prediction as the parsing model.
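The two-stage baseline can be sketched as follows: a detector proposes person boxes, a separate parser labels each crop independently, and each result is pasted back into a full-size map for that person. This is a minimal illustration under stated assumptions, not the paper's code: `detect_and_parse`, `detect` and `parse_crop` are hypothetical names, the detector and parser are passed in as plain callables, and label 0 is assumed to mean background.

```python
def detect_and_parse(image, detect, parse_crop):
    """Run detection, then parse each detected box on its own.

    detect(image)     -> list of (top, left, bottom, right) boxes
    parse_crop(crop)  -> 2D list of part labels, same size as the crop
    Returns one full-size label map per detected person.
    """
    height, width = len(image), len(image[0])
    instance_maps = []
    for top, left, bottom, right in detect(image):
        region = [row[left:right] for row in image[top:bottom]]
        labels = parse_crop(region)
        # Paste the crop-level labels back at the box position.
        canvas = [[0] * width for _ in range(height)]  # 0 = background (assumed)
        for r, label_row in enumerate(labels):
            for c, label in enumerate(label_row):
                canvas[top + r][left + c] = label
        instance_maps.append(canvas)
    return instance_maps


# Toy stand-ins (assumptions): one fixed box and a threshold "parser".
toy_image = [[0, 5, 0],
             [0, 7, 0]]
toy_maps = detect_and_parse(
    toy_image,
    detect=lambda img: [(0, 1, 2, 2)],
    parse_crop=lambda region: [[1 if v > 6 else 2 for v in row] for row in region],
)
```

Because each crop is parsed in isolation, this baseline has no access to the whole-image context that the global parser and the global-local aggregator provide in MH-Parser.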

[Figure]

Image segmentation: "LIP: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing"

Original post, 2017-07-28 14:59:43

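Dataset: http://hcp.sysu.edu.cn/lip
Code: https://github.com/Engineering-Course/LIP_SSL
The paper tackles human part segmentation and builds a new dataset, "LIP", with 19 semantic labels. It incorporates structure information into training to improve segmentation quality.
Applications of human parsing include person re-identification and behavior analysis.
Complexity comparison of the three current human part datasets, ATR, Pascal-Person-Part and LIP:
[Figure]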

Results of current mainstream segmentation methods (FCN-8s, SegNet, DeepLabV2 and Attention) on the LIP dataset:
[Figure]

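Main problems of current methods:
1. In back-view images, the left and right arms are easily confused.
2. Results are worst when the head is not visible in the image, which suggests that the head is an important cue for human parsing.
3. Small objects, such as shoes, are handled poorly.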

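Self-supervised Structure-sensitive Learning
The proposed method uses human body structure to guide training. It defines 9 joints to build a pose structure: the center points of the head, upper body, lower body, left arm, right arm, left leg, right leg, left shoe and right shoe regions. The network structure is shown in the figure below.
[Figure]
For each parsing result and its corresponding ground truth, the joints are extracted as heatmaps, and the Euclidean distance between them is used to evaluate the generated structure. The joint structure loss is then used to weight the pixel-wise segmentation loss, giving the structure-sensitive loss.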
That is,

$$L_{\text{Structure}} = L_{\text{Joint}} \cdot L_{\text{Parsing}}$$
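A small worked sketch of the structure-sensitive loss follows. It is a simplification under stated assumptions, not the released LIP_SSL code: joints are represented directly by region center coordinates instead of heatmaps, joints missing from either map are skipped, the squared distances are averaged with a 1/(2N) factor, and the pixel-wise parsing loss is passed in as a precomputed number. All function names are hypothetical.

```python
def joint_centers(label_map, joint_labels):
    """Return {label: (row, col)} center of each labelled region that is present."""
    centers = {}
    for label in joint_labels:
        points = [(r, c)
                  for r, row in enumerate(label_map)
                  for c, value in enumerate(row)
                  if value == label]
        if points:
            centers[label] = (sum(p[0] for p in points) / len(points),
                              sum(p[1] for p in points) / len(points))
    return centers


def joint_loss(pred_map, gt_map, joint_labels):
    """Mean squared Euclidean distance between predicted and true joint centers."""
    pred = joint_centers(pred_map, joint_labels)
    true = joint_centers(gt_map, joint_labels)
    shared = [lab for lab in joint_labels if lab in pred and lab in true]
    if not shared:
        return 0.0  # assumption: no comparable joints means no structure penalty
    total = sum((pred[lab][0] - true[lab][0]) ** 2 +
                (pred[lab][1] - true[lab][1]) ** 2 for lab in shared)
    return total / (2 * len(shared))


def structure_sensitive_loss(pred_map, gt_map, parsing_loss, joint_labels):
    """Weight the pixel-wise parsing loss by the joint structure loss."""
    return joint_loss(pred_map, gt_map, joint_labels) * parsing_loss


# Toy example with two regions (labels 1 and 2) on a 2 x 2 map.
gt = [[1, 1],
      [2, 2]]
pred = [[1, 1],
        [1, 2]]
loss = structure_sensitive_loss(pred, gt, parsing_loss=2.0, joint_labels=[1, 2])
```

Because the joint term multiplies the parsing loss, a prediction whose region centers drift away from the ground truth is penalized more heavily than one with the same pixel-wise error but a correct body structure.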