Describing People: A Poselet-Based Approach to Attribute Classification

DuinoDu

Category: Fine-grained recognition

Article link: https://blog.csdn.net/DuinoDu/article/details/52717659


1. Abstract

The paper uses a part-based approach built on poselets. (Poselets were proposed by Lubomir Bourdev in 2009.)

2. Introduction

The paper casts fine-grained recognition as an attribute classification problem. Deciding a single attribute requires combining many cues. Detecting and aligning the parts is very important for classification, but localizing body parts is a tough task.

The training input is a set of images in which the people of interest are specified by their visible bounds and the values of their attributes. The method uses a three-layer feed-forward network. The three layers correspond to three processing steps; a "layer" here is not a layer in the deep-learning sense.

In the first layer (first step), predict 9 attributes (is-male, has-hat, has-t-shirt, …) for each human part.

In the second layer (second step), combine the information from all such predictions, such as the gender given the face, the legs, and other parts, into one single attribute classification.

In the third layer (third step), leverage dependencies between different attributes, such as the fact that gender is correlated with the presence of long hair.

In effect, the paper treats poselets as a general tool for decomposing viewpoint and pose.

3. Algorithm

Step 1

Detect the poselets in the test image and obtain $q^i$, the detection probability of poselet type $i$.

Step 2

For each poselet type $i$, extract a feature vector consisting of HOG cells at three scales, a color histogram, and skin-mask features.
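As a rough illustration of just the color-histogram part of this feature vector, the sketch below bins the RGB pixels of a patch into a normalized joint histogram. The function name `color_histogram`, the bin count, and the pixel representation are assumptions made for illustration; the paper's actual HOG, color, and skin-mask features are not reproduced here.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of a patch (illustrative only).

    pixels: iterable of (r, g, b) tuples with integer values in 0..255.
    Returns a list of bins_per_channel**3 values that sum to 1
    (or all zeros for an empty patch).
    """
    hist = [0.0] * (bins_per_channel ** 3)
    step = 256 / bins_per_channel  # width of one bin per channel
    count = 0
    for r, g, b in pixels:
        ri, gi, bi = int(r // step), int(g // step), int(b // step)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


# Hypothetical 4-pixel patch: two reddish pixels, one blue, one green.
patch = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (10, 200, 30)]
features = color_histogram(patch)
```

In a full pipeline this vector would be concatenated with the HOG and skin-mask features of the same poselet patch.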

Step 3 (first layer)

For each poselet type $i$ and each attribute $j$, evaluate a classifier $r^i_j$ for attribute $j$ conditioned on poselet $i$. These classifiers are called poselet-level attribute classifiers. Each classifier is a linear SVM followed by a logistic function $g$. (What is the relationship between the SVM and the logistic here?)
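One plausible reading of "a linear SVM followed by a logistic" is that the raw linear SVM score is passed through the logistic function, which squashes it into (0, 1) so that scores from different poselets become comparable. The sketch below assumes exactly that; the names `logistic` and `poselet_level_score` and the toy weights are hypothetical, not taken from the paper.

```python
import math


def logistic(x):
    """Numerically stable logistic function g(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def poselet_level_score(weights, bias, features):
    """Linear SVM score w.x + b, mapped into (0, 1) by the logistic."""
    svm_score = sum(w * f for w, f in zip(weights, features)) + bias
    return logistic(svm_score)


# Toy example: the linear score is 0.5 - 0.75 + 0.25 = 0, so r is 0.5.
r = poselet_level_score([0.5, -1.0], 0.25, [1.0, 0.75])
```

Under this reading, a score of 0.5 means the poselet gives no evidence either way for the attribute.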

Step 4 (second layer)

The outputs of the poselet-level attribute classifiers are zero-centered (shifted so that their center is zero) and modulated by the poselet detection probabilities $q^i$ (multiplied by $q^i$). The result is the input to a second classifier, called the person-level attribute classifier, whose goal is to combine the evidence from all body parts.
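A minimal sketch of this step is shown below. It assumes the poselet-level scores lie in (0, 1), so zero-centering means subtracting 0.5, and it assumes a linear person-level classifier; both are illustrative assumptions, as are the function names and the toy numbers.

```python
def person_level_input(r, q, center=0.5):
    """Zero-center poselet-level scores r and weight them by detection
    probabilities q. A poselet that was not detected (q = 0) contributes 0."""
    return [qi * (ri - center) for ri, qi in zip(r, q)]


def person_level_score(weights, bias, r, q):
    """Assumed linear combination of the modulated poselet-level scores."""
    x = person_level_input(r, q)
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Three poselet types: a confident detection voting "yes", a weak detection
# voting "no", and an undetected poselet.
r = [0.75, 0.25, 0.5]
q = [1.0, 0.5, 0.0]
x = person_level_input(r, q)
s = person_level_score([1.0, 1.0, 1.0], 0.0, r, q)
```

The multiplication by $q^i$ means that unreliable or missing poselet detections have little or no influence on the person-level decision.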

Step 5 (third layer)

For each attribute $j$, evaluate a third classifier, called the context-level attribute classifier. Its input feature vector consists of the scores $s_j$ of the person-level classifiers for all attributes. The classifier is an SVM with a quadratic kernel.
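To make the quadratic-kernel step concrete, the sketch below evaluates a kernel SVM decision value on a vector of person-level scores. The kernel form $(u \cdot v + c)^2$, the function names, and the toy support vectors and coefficients are assumptions for illustration; in practice they would come from training.

```python
def quadratic_kernel(u, v, c=1.0):
    """Inhomogeneous quadratic kernel (u.v + c)^2 (assumed form)."""
    return (sum(a * b for a, b in zip(u, v)) + c) ** 2


def context_level_score(support_vectors, alphas, bias, s):
    """Kernel SVM decision value for one attribute.

    alphas are signed dual coefficients (alpha_k * y_k), one per support
    vector; s is the vector of person-level scores for all attributes.
    """
    return sum(a * quadratic_kernel(sv, s)
               for a, sv in zip(alphas, support_vectors)) + bias


# Toy model with two attributes and two hypothetical support vectors.
support_vectors = [[1.0, 0.0], [0.0, 1.0]]
alphas = [1.0, -1.0]
s = [2.0, 1.0]
score = context_level_score(support_vectors, alphas, 0.0, s)
```

A quadratic kernel lets the classifier use pairwise products of attribute scores, which is one way to capture correlations such as gender and long hair.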