Crowd Counting: L2R - Leveraging Unlabeled Data for Crowd Counting by Learning to Rank

**Goal:**

Leverage abundantly available unlabeled crowd imagery in a learning-to-rank framework to count crowds.

**Why did the learning-to-rank framework work?**

Acquiring labeled data for crowd counting is laborious, so we propose a self-supervised task for crowd counting that exploits, during training, crowd images which are not hand-labeled with person counts.

Rather than regressing to the absolute number of persons in the image, we train a network which compares images and ranks them according to the number of persons in the images.

The basic idea is that any patch contained within a larger patch must contain fewer persons than, or the same number of persons as, the larger patch.

**How to collect a large dataset?**

Keyword query: We collect a crowd scene dataset from Google Images using a variety of keywords.

Query-by-example image retrieval: For each existing crowd counting dataset, we collect a dataset by submitting its training images as queries to the Google Images visual search engine.

**How to generate ranked datasets?**

[Figure not available in the source.]
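
The containment idea stated earlier (a sub-patch cannot hold more people than the patch that contains it) suggests one way to produce ranked patches without any labels. The following is a minimal sketch under stated assumptions, not the paper's exact procedure: the function name `nested_patches`, the number of patches `k`, the shrink factor `scale`, and the choice of sampling the center from the middle half of the image are all hypothetical.

```python
import random

def nested_patches(img_w, img_h, k=5, scale=0.75, rng=None):
    """Return k square boxes (left, top, size), largest first.

    Every box shares the same center and lies inside the previous one,
    so count(box[i]) >= count(box[i + 1]) holds without any annotation.
    k, scale and the central sampling region are illustrative assumptions.
    """
    rng = rng or random.Random()
    # Pick the shared center away from the border so the largest patch is not tiny.
    cx = rng.randint(img_w // 4, 3 * img_w // 4)
    cy = rng.randint(img_h // 4, 3 * img_h // 4)
    # Half-width of the largest square centered at (cx, cy) that fits in the image.
    half = min(cx, cy, img_w - cx, img_h - cy)
    boxes = []
    for _ in range(k):
        boxes.append((cx - half, cy - half, 2 * half))
        half = int(half * scale)  # shrink around the same center
    return boxes

# Usage: crop these boxes from an unlabeled image; earlier boxes rank higher or equal.
boxes = nested_patches(640, 480, k=4, scale=0.5, rng=random.Random(0))
```

Any pair taken in order from the returned list forms one training pair for a ranking loss.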

**How do we learn from ranked datasets?**

This question can be broken down into three steps:

  1. Crowd density estimation network:

Consider a network trained on available crowd counting datasets with ground truth annotations; this is the baseline method against which we compare. The baseline network is derived from VGG-16. We remove its two fully connected layers and the last max-pooling layer (pool5) to prevent further reduction of spatial resolution. In their place we add a single convolutional layer that directly regresses the crowd density map.

As the counting loss, we use the Euclidean distance between the estimated and ground truth density maps.
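
As a rough illustration of this counting loss, the sketch below computes the squared Euclidean distance between predicted and ground truth density maps, averaged over a batch. Squaring the distance and averaging over the batch are assumptions (the notes say only "Euclidean distance"), and the function names are hypothetical. Maps are plain lists of rows to keep the example dependency-free.

```python
def estimated_count(density_map):
    """A density map sums to the number of persons it represents."""
    return sum(v for row in density_map for v in row)

def euclidean_counting_loss(pred_maps, gt_maps):
    """Mean over the batch of the squared per-pixel distance between maps.

    Assumption: squared distance, averaged over the number of maps.
    """
    total = 0.0
    for pred, gt in zip(pred_maps, gt_maps):
        total += sum((p - g) ** 2
                     for pred_row, gt_row in zip(pred, gt)
                     for p, g in zip(pred_row, gt_row))
    return total / len(pred_maps)

# Usage: one 2x2 prediction that puts the person in the wrong cell.
pred = [[[1.0, 0.0], [0.0, 0.0]]]
gt = [[[0.0, 0.0], [0.0, 1.0]]]
loss = euclidean_counting_loss(pred, gt)
```

Note that in this usage example both maps sum to the same count, yet the loss is non-zero: the density-map loss penalizes wrong locations, not only wrong totals.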

Instead of using the whole image as input, we randomly sample square patches of varying size (from 56 to 448 pixels). We verify that this multi-scale sampling is important for good performance.
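
The multi-scale sampling could look roughly like the sketch below. Only the 56 to 448 pixel range comes from the text; drawing the size and position uniformly, and the name `sample_square_patch`, are assumptions made for illustration.

```python
import random

def sample_square_patch(img_w, img_h, min_size=56, max_size=448, rng=None):
    """Return (left, top, size) of a random square patch inside the image.

    Assumption: side length and position are drawn uniformly.
    """
    rng = rng or random.Random()
    # The patch cannot be larger than the shorter image side.
    upper = min(max_size, img_w, img_h)
    if upper < min_size:
        raise ValueError("image is smaller than the minimum patch size")
    size = rng.randint(min_size, upper)
    left = rng.randint(0, img_w - size)
    top = rng.randint(0, img_h - size)
    return left, top, size

# Usage: one training patch from a 1024x768 image.
left, top, size = sample_square_patch(1024, 768, rng=random.Random(42))
```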

  2. Crowd ranking network:

For this data there are no crowd density maps; only ranking information is available, obtained via the sampling procedure described in Algorithm 1. We replace the Euclidean loss with an average pooling layer followed by a ranking loss. The average pooling layer converts the density map into an estimate of the number of persons per spatial unit. When the network outputs the correct ranking, no gradient is backpropagated. When the network's estimates disagree with the correct ranking, the backpropagated gradient causes the network to increase its estimate for the patch with the lower score and to decrease its estimate for the patch with the higher score.

  3. Combining counting and ranking data:

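We discuss three approaches to combining crowd scenes that have ground truth labels with data for which only rank information is available:

a) Ranking plus fine-tuning; b) Alternating-task training; c) Multi-task training, with the combined loss L = Lc + αLr, where Lc is the counting loss, Lr is the ranking loss, and α weights the ranking term.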
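
To make the ranking loss and the multi-task objective concrete, here is a minimal sketch. It assumes a pairwise hinge loss with a hypothetical `margin` parameter, and it assumes both patches of a pair are resized to the same input size so that the average-pooled density is comparable between them. Neither detail is stated in these notes, and the function names are illustrative.

```python
def average_pool(density_map):
    """Global average pooling: mean density per spatial unit."""
    values = [v for row in density_map for v in row]
    return sum(values) / len(values)

def pairwise_ranking_loss(score_small, score_large, margin=0.0):
    """Hinge loss on one ranked pair (assumed form).

    Zero when the larger patch already scores at least as high as its
    sub-patch, so a correctly ranked pair contributes no gradient.
    Otherwise the gradient pushes score_small down and score_large up.
    """
    return max(0.0, score_small - score_large + margin)

def multi_task_loss(counting_loss, ranking_loss, alpha=1.0):
    """L = Lc + alpha * Lr, with alpha weighting the ranking term."""
    return counting_loss + alpha * ranking_loss

# Usage: a correctly ranked pair adds nothing to the combined loss.
large = [[0.5, 0.5], [0.5, 0.5]]
small = [[0.25, 0.25], [0.25, 0.25]]
lr = pairwise_ranking_loss(average_pool(small), average_pool(large))
total = multi_task_loss(counting_loss=2.0, ranking_loss=lr, alpha=0.5)
```

In approaches a) and b) the same two loss terms would be used one at a time rather than summed.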