DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations – CVPR 2016
Paper (project): http://personal.ie.cuhk.edu.hk/~lz013/projects/DeepFashion.html
1 challenges of clothes recognition
Large variation in style, texture, and cutting;
Clothing items are frequently subject to deformation and occlusion;
Clothes images often exhibit serious variations when taken under different scenarios, such as selfies vs. online shopping photos.
2 multi-tasks: predicting clothing categories, attributes, and landmarks, as well as cross-pose/cross-domain correspondences of clothes pairs
The additional landmark locations may improve recognition;
Massive attributes lead to a better partition of the clothes feature space, facilitating the recognition and retrieval of cross-domain clothes images.
---
50 categories
1,000 attributes
4~8 landmarks per image
>300K pairs of cross-pose/cross-domain correspondences
bounding boxes
>800K images
3 core ideas
Handling clothing deformation/occlusion by pooling/gating feature maps upon estimated landmark locations;
Supervised by massive attribute labels;
4 contributions
a) Building a large-scale clothes dataset of over 800K images, namely DeepFashion, annotated with categories, attributes, landmarks, and cross-pose/cross-domain pair correspondences;
b) Developing FashionNet to jointly predict attributes and landmarks; the estimated landmarks are then employed to pool/gate the learned features;
c) Defining benchmark datasets and evaluation protocols for three widely accepted tasks in clothes recognition and retrieval.
5 approaches -> FashionNet
Simultaneously predicting landmarks and attributes
A multi-task network based on VGGNet, in which landmarks are used to extract local features that are fused with global features;
the network then simultaneously predicts the category, attributes, and a triplet output (corresponding to pair correspondence; at test time this branch may not be used).
This procedure is performed in an iterative manner, and the whole framework can be learned end-to-end.
Red branch: capturing global features of the entire clothing item
Green branch: capturing local features pooled over the estimated clothing landmarks
Blue branch: predicting landmark locations as well as their visibility (i.e., occluded or not)
Moreover, the outputs of the branches in red and green are concatenated together in 'fc7_fusion' to jointly predict the clothes categories and attributes and to model clothes pairs.
Forward Pass
A clothes image is fed into the network;
it is passed through the branch in blue to predict the landmarks' locations;
the estimated landmarks are employed to pool or gate the features in 'pool5_local',
which is a landmark pooling layer, leading to local features that are invariant to the deformations and occlusions of clothes;
the global features of 'fc6_global' and the landmark-pooled local features of 'fc6_local' are concatenated together in 'fc7_fusion' (see the sketch below).
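Below is a minimal PyTorch-style sketch of this forward pass, assuming a reduced conv backbone in place of the full VGG-16 layers; names such as `SimpleFashionNet`, `gate_landmark_features`, and the layer sizes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_LANDMARKS = 8      # the dataset provides 4~8 landmarks per image
NUM_CATEGORIES = 50
NUM_ATTRIBUTES = 1000

def gate_landmark_features(feat, locs, vis):
    """Pick the feature vector at each predicted landmark and gate it by the
    landmark's visibility probability (a simplified stand-in for the landmark
    pooling layer, which is sketched in more detail further below)."""
    B, C, H, W = feat.shape
    xs = (locs[..., 0] * (W - 1)).long().clamp(0, W - 1)    # (B, L)
    ys = (locs[..., 1] * (H - 1)).long().clamp(0, H - 1)
    out = []
    for b in range(B):
        vecs = feat[b, :, ys[b], xs[b]].t()                 # (L, C) features at landmarks
        out.append((vecs * vis[b].unsqueeze(1)).flatten())  # gate by visibility, then flatten
    return torch.stack(out)                                 # (B, L * C)

class SimpleFashionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # shared conv backbone (a small stand-in for the VGG conv layers)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # blue branch: landmark locations (normalized x, y) and visibility states
        self.landmark_loc = nn.Linear(256, NUM_LANDMARKS * 2)
        self.landmark_vis = nn.Linear(256, NUM_LANDMARKS * 2)
        # red branch: global features / green branch: landmark-pooled local features
        self.fc6_global = nn.Linear(256, 1024)
        self.fc6_local = nn.Linear(NUM_LANDMARKS * 256, 1024)
        # fusion and task heads
        self.fc7_fusion = nn.Linear(2048, 1024)
        self.category_head = nn.Linear(1024, NUM_CATEGORIES)
        self.attribute_head = nn.Linear(1024, NUM_ATTRIBUTES)

    def forward(self, images):
        feat = self.backbone(images)                        # (B, 256, H, W)
        pooled = F.adaptive_avg_pool2d(feat, 1).flatten(1)  # (B, 256)

        # blue branch: predict landmark locations and visibility
        locs = torch.sigmoid(self.landmark_loc(pooled)).view(-1, NUM_LANDMARKS, 2)
        vis_logits = self.landmark_vis(pooled).view(-1, NUM_LANDMARKS, 2)
        vis = vis_logits.softmax(dim=-1)[..., 1]            # probability of "visible"

        # green branch: pool/gate local features at the predicted landmarks
        local = gate_landmark_features(feat, locs, vis)     # (B, L * 256)

        # red branch + fusion
        global_feat = F.relu(self.fc6_global(pooled))
        local_feat = F.relu(self.fc6_local(local))
        fused = F.relu(self.fc7_fusion(torch.cat([global_feat, local_feat], dim=1)))

        return {
            "category": self.category_head(fused),          # clothes category logits
            "attributes": self.attribute_head(fused),       # attribute logits
            "embedding": fused,                             # used for the triplet loss
            "landmark_locs": locs,
            "landmark_vis": vis_logits,
        }
```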
Backward Pass
Four types of loss functions (a combined sketch follows this list):
Landmark localization: regression loss (L2 norm gated by visibility variables) -> errors are not back-propagated when a landmark is occluded
Landmark visibility: softmax loss
Clothes categories: softmax loss
Attribute prediction: weighted cross-entropy loss
Two coefficients are determined by the ratio of the numbers of positive and negative samples (to control the imbalance between attributes)
Pairwise clothes images: triplet loss for metric learning
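A minimal sketch of the four loss types, under the assumptions of the forward-pass sketch above; the triplet margin, the epsilon constants, and the batch-level computation of the attribute weights (the paper derives them from positive/negative ratios, ideally over the whole training set) are illustrative choices.

```python
import torch
import torch.nn.functional as F

def landmark_losses(pred_locs, pred_vis_logits, gt_locs, gt_vis):
    """L2 regression on landmark locations, masked so that errors are not
    back-propagated for occluded landmarks, plus a softmax (cross-entropy)
    loss on the visibility states."""
    mask = gt_vis.float().unsqueeze(-1)                               # (B, L, 1): 1 if visible
    loc_loss = (mask * (pred_locs - gt_locs) ** 2).sum() / mask.sum().clamp(min=1.0)
    vis_loss = F.cross_entropy(pred_vis_logits.flatten(0, 1), gt_vis.flatten().long())
    return loc_loss, vis_loss

def attribute_weights(attr_targets):
    """Per-attribute positive/negative weights from the ratio of positive to
    negative samples, to counter label imbalance (one possible instantiation,
    here computed per batch for brevity)."""
    pos = attr_targets.sum(dim=0).clamp(min=1.0)
    neg = (attr_targets.shape[0] - pos).clamp(min=1.0)
    return neg / (pos + neg), pos / (pos + neg)                       # w_pos, w_neg

def attribute_loss(attr_logits, attr_targets, w_pos, w_neg):
    """Weighted cross-entropy over the binary attribute labels."""
    p = torch.sigmoid(attr_logits)
    loss = -(w_pos * attr_targets * torch.log(p + 1e-8)
             + w_neg * (1 - attr_targets) * torch.log(1 - p + 1e-8))
    return loss.mean()

def total_loss(outputs, targets, anchor, positive, negative, lambdas):
    """Weighted sum of the four loss types; the lambdas implement the loss
    weights used by the two-stage training strategy described below."""
    loc_loss, vis_loss = landmark_losses(
        outputs["landmark_locs"], outputs["landmark_vis"],
        targets["landmark_locs"], targets["landmark_vis"])
    cat_loss = F.cross_entropy(outputs["category"], targets["category"])
    w_pos, w_neg = attribute_weights(targets["attributes"])
    attr_loss = attribute_loss(outputs["attributes"], targets["attributes"], w_pos, w_neg)
    trip_loss = F.triplet_margin_loss(anchor, positive, negative, margin=0.3)
    return (lambdas["loc"] * loc_loss + lambdas["vis"] * vis_loss
            + lambdas["cat"] * cat_loss + lambdas["attr"] * attr_loss
            + lambdas["triplet"] * trip_loss)
```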
Iterative training strategy:
Similar to the 4-stage training of Faster R-CNN, but with 2 stages here:
Stage 1 (joint training): all four losses are back-propagated simultaneously, but the emphasis is on landmark localization,
which is implemented by setting different loss weights:
treat the branch in blue as the main task and the remaining branches as the auxiliary tasks,
i.e., set the loss weights of landmark visibility and localization to be large, while the others have small weights.
The assumption is clearly that joint optimization of correlated multi-tasks speeds up convergence.
Stage 2: predicting clothing categories and attributes, as well as learning the pairwise relations between clothes images:
this stage mainly learns category classification, attribute classification, and pairwise correspondences, while the landmark locations and visibility are used by the branch in green to extract local features.
(The branch in blue may or may not keep learning here; if it does, its loss weights are very small. An illustrative weight schedule follows.)
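A tiny illustration of how the two-stage strategy could be expressed as loss-weight dictionaries for the `total_loss` sketch above; the specific weight values and epoch counts are assumptions, not the paper's settings.

```python
# Stage 1 emphasizes the blue branch (landmark localization and visibility);
# Stage 2 emphasizes categories, attributes, and pairwise relations.
stage1_lambdas = {"loc": 5.0, "vis": 5.0, "cat": 0.1, "attr": 0.1, "triplet": 0.1}
stage2_lambdas = {"loc": 0.1, "vis": 0.1, "cat": 1.0, "attr": 1.0, "triplet": 1.0}

for stage, lambdas, epochs in [(1, stage1_lambdas, 10), (2, stage2_lambdas, 20)]:
    for epoch in range(epochs):
        pass  # train one epoch, calling total_loss(..., lambdas=lambdas) from the sketch above
```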
Landmark Pooling Layer
Similar to an RoI-pooling layer, except that the bounding box is replaced by a landmark's location and visibility.
If a landmark is visible, a region around its projected location on the feature map (e.g., conv4) is selected
and max-pooled; if it is invisible, the corresponding output features are filled with zeros.
Finally, the local features from all landmarks are concatenated.
***** I am curious how the region size is determined from the location ***** (the sketch below simply treats it as a hyperparameter)
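A minimal sketch of such a landmark pooling layer, assuming a fixed square region of size `region` around each visible landmark (the exact region size used in the paper is the open question noted above); `landmark_roi_pool` and its arguments are illustrative names.

```python
import torch

def landmark_roi_pool(feat, locs, visible, region=3):
    """feat: (B, C, H, W) feature map (e.g. conv4); locs: (B, L, 2) normalized
    (x, y) landmark locations; visible: (B, L) boolean visibility.
    Max-pools a small square region around each visible landmark; invisible
    landmarks contribute all-zero features. Returns (B, L * C)."""
    B, C, H, W = feat.shape
    L = locs.shape[1]
    half = region // 2
    out = feat.new_zeros(B, L, C)
    for b in range(B):
        for l in range(L):
            if not visible[b, l]:
                continue                                    # occluded -> keep zeros
            cx = int(locs[b, l, 0].item() * (W - 1))        # project to feature-map coords
            cy = int(locs[b, l, 1].item() * (H - 1))
            x0, x1 = max(cx - half, 0), min(cx + half + 1, W)
            y0, y1 = max(cy - half, 0), min(cy + half + 1, H)
            out[b, l] = feat[b, :, y0:y1, x0:x1].amax(dim=(1, 2))  # max-pool the region
    return out.flatten(1)                                   # concatenate all landmarks
```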
6 evaluation protocols
Category and Attribute Prediction:
63720 images
Top-k accuracy for categories
Top-k recall for attributes
In-Shop Clothes Retrieval
54642 images, 11735 clothing items
Top-k retrieval accuracy
Consumer-to-Shop Clothes Retrieval
251361 consumer-to-shop image pairs from Mogujie
Top-k retrieval accuracy (metric sketches follow)
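Minimal sketches of the top-k metrics used by these protocols (top-k category accuracy and top-k retrieval accuracy based on embedding similarity); the cosine-similarity retrieval and all variable names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def topk_accuracy(category_logits, labels, k=5):
    """Top-k accuracy for category prediction: the true category must appear
    among the k highest-scoring categories."""
    topk = category_logits.topk(k, dim=1).indices           # (N, k)
    return (topk == labels.unsqueeze(1)).any(dim=1).float().mean().item()

def topk_retrieval_accuracy(query_emb, gallery_emb, query_ids, gallery_ids, k=20):
    """Top-k retrieval accuracy: a query counts as a hit if at least one of its
    k nearest gallery images shares the same clothing-item id."""
    q = F.normalize(query_emb, dim=1)
    g = F.normalize(gallery_emb, dim=1)
    sims = q @ g.t()                                         # cosine similarities
    topk = sims.topk(k, dim=1).indices                       # (Nq, k)
    hits = (gallery_ids[topk] == query_ids.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()
```

Top-k recall for attributes can be computed analogously: for each image, count how many of its ground-truth attributes appear among the k attributes with the highest predicted scores, divided by the number of ground-truth attributes.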
7 experiments
From the experiments, multi-task joint training brings the largest gain for category classification; both the landmark-pooled/gated features and the attribute classification task improve category classification.
The paper also mentions that human pose estimation could replace landmark localization, but it does not perform as well as landmark localization.
---
Overall, the framework is fairly classic: multi-task joint training,
e.g., landmark localization assists local feature extraction, so that both global and local information is learned and the feature space becomes more discriminative.
The drawback is that such rich annotation information has to be constructed.
In addition, the paper does not analyze how much the triplet training contributes to the results.
---
It seems that in recent years clothes parsing & segmentation, clothes category classification, and clothes/instance search & retrieval have become quite popular,
with papers on these topics at all the major conferences.
Young folks, you might consider this direction; maybe the next first-author paper will be yours.