**Goal:**
Leverage abundantly available unlabeled crowd imagery in a learning-to-rank framework to count crowds.
**Why did the learning-to-rank framework work?**
Acquiring labeled data for crowd counting is laborious, so we propose a self-supervised task for crowd counting which exploits crowd images that are not hand-labeled with person counts during training.
Rather than regressing to the absolute number of persons in the image, we train a network which compares images and ranks them according to the number of persons in the images.
The basic idea is that any patch contained within a larger patch must contain an equal or smaller number of persons than the larger patch.
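This containment constraint can be illustrated with a toy example (numpy stands in for a real density map here; the shapes and values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "density map": non-negative values whose sum acts as a person count.
density = rng.random((448, 448))

# Three square patches sharing the same center, each contained in the next.
center = (224, 224)
sizes = [112, 224, 448]
counts = []
for s in sizes:
    half = s // 2
    patch = density[center[0] - half:center[0] + half,
                    center[1] - half:center[1] + half]
    counts.append(patch.sum())

# Because density is non-negative, nested patch counts are non-decreasing,
# which is exactly the ranking signal the network is trained on.
assert counts[0] <= counts[1] <= counts[2]
```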
**How to collect a large dataset?**
Keyword query: we collect a crowd scene dataset from Google Images using crowd-related keywords.
Query-by-example image retrieval: for each existing crowd counting dataset, we collect an additional dataset by using its training images as queries to the Google Images visual search engine.
**How to generate ranked datasets?**
Given a collected crowd image, we sample a set of nested square patches (Algorithm 1): an anchor point is chosen within the central region of the image, the largest square patch centered there that still fits inside the image is cropped, and the remaining patches are progressively smaller crops around the same center. Since every smaller patch is contained in the larger ones, the patches are ordered by person count by construction, with no annotation required.
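A minimal sketch of this nested-patch sampling (the parameter names `k`, `scale`, and `anchor_frac` are assumptions made here for illustration, not taken from the paper):

```python
import random

def sample_ranked_patches(img_h, img_w, k=5, scale=0.75, anchor_frac=0.25):
    """Return k square patch boxes (y0, x0, size), each contained in the
    previous one, so their person counts are ordered by containment."""
    # 1) Pick an anchor point inside a centered region covering a fraction
    #    of the image, so even the largest patch stays inside the image.
    cy = int(img_h * (0.5 + anchor_frac * (random.random() - 0.5)))
    cx = int(img_w * (0.5 + anchor_frac * (random.random() - 0.5)))
    # 2) Largest square patch centered at the anchor that fits in the image.
    size = 2 * min(cy, cx, img_h - cy, img_w - cx)
    boxes = []
    # 3) Crop k patches, shrinking by `scale` each time, all centered at
    #    the anchor point; smaller patches are subsets of larger ones.
    for _ in range(k):
        half = size // 2
        boxes.append((cy - half, cx - half, 2 * half))
        size = int(size * scale)
    return boxes
```

Each returned box is contained in the one before it, so sorting by box size directly yields the ranking used for training.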
**How do we learn from ranked datasets?**
This question can be divided into three parts, as follows:
- Crowd density estimation network:
We consider a network trained on available crowd counting datasets with ground-truth annotations as the baseline to which we compare. The baseline network is derived from VGG-16: we remove its two fully connected layers and the max-pooling layer (pool5) to prevent further reduction of spatial resolution, and in their place add a single convolutional layer which directly regresses the crowd density map.
As the counting loss, we use the Euclidean distance between the estimated and ground truth density maps.
Instead of using the whole image as input, we randomly sample square patches of varying size (from 56 to 448 pixels); we verify that this multi-scale sampling is important for good performance.
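The multi-scale patch sampling and the Euclidean counting loss can be sketched as follows (numpy arrays stand in for images and network outputs; the function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_square_patch(image, min_size=56, max_size=448):
    """Randomly crop a square patch of varying size (56 to 448 px)."""
    h, w = image.shape[:2]
    size = int(rng.integers(min_size, min(max_size, h, w) + 1))
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    return image[y:y + size, x:x + size]

def counting_loss(pred_density, gt_density):
    """Euclidean (squared L2) distance between estimated and
    ground-truth density maps."""
    return float(np.sum((pred_density - gt_density) ** 2))

image = rng.random((480, 640))
patch = random_square_patch(image)
assert patch.shape[0] == patch.shape[1]          # square
assert 56 <= patch.shape[0] <= 448               # multi-scale range
assert counting_loss(patch, patch) == 0.0        # perfect prediction
```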
- Crowd ranking network:
This data has no crowd density maps; only ranking information is available via the sampling procedure described in Algorithm 1. We therefore replace the Euclidean loss by an average pooling layer followed by a ranking loss. The average pooling layer converts the density map into an estimate of the number of persons per spatial unit. When the network outputs the correct ranking, there is no backpropagated gradient. When the network's estimates disagree with the correct ranking, the backpropagated gradient causes the network to increase its estimate for the patch with the lower score and to decrease its estimate for the patch with the higher score.
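A minimal sketch of such a pairwise ranking loss on the pooled count estimates (the `margin` parameter is an assumption added here; the behavior matches the description above: zero loss, hence zero gradient, for a correct ranking):

```python
def pairwise_rank_loss(count_small, count_large, margin=0.0):
    """Hinge-style ranking loss: zero when the larger patch's estimated
    count is at least the smaller patch's, positive otherwise."""
    return max(0.0, count_small - count_large + margin)

# Correct ranking: larger patch has the higher estimate -> zero loss,
# so nothing is backpropagated.
assert pairwise_rank_loss(3.0, 5.0) == 0.0
# Violated ranking: the positive loss pushes the smaller patch's
# estimate down and the larger patch's estimate up.
assert pairwise_rank_loss(7.0, 5.0) == 2.0
```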
- Combining counting and ranking data:
We discuss three approaches to combining ground-truth-labeled crowd scenes with data for which only rank information is available:
a) Ranking plus fine-tuning; b) Alternating-task training; c) Multi-task training, with combined loss L = Lc + αLr.
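The multi-task objective in c) can be sketched in one line (the value of α here is arbitrary and purely illustrative; the paper treats it as a hyperparameter weighting the ranking term):

```python
def multitask_loss(counting_loss, ranking_loss, alpha):
    """Multi-task objective L = Lc + alpha * Lr: the counting loss on
    labeled patches plus the weighted ranking loss on unlabeled ones."""
    return counting_loss + alpha * ranking_loss

# Example: Lc = 2.0 on a labeled batch, Lr = 0.5 on an unlabeled batch.
assert multitask_loss(2.0, 0.5, alpha=10.0) == 7.0
```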