Crowd Counting: Cross-scene Crowd Counting via Deep Convolutional Neural Networks

**Goal:**

The paper proposes a deep CNN with two related learning objectives: crowd density maps and crowd counts.

**Contributions:**

  1. Our CNN model is trained for crowd scenes by a switchable learning process with two learning objectives, crowd density maps and crowd counts. The two different but related objectives alternately assist each other to reach better local optima.
  2. The target scenes require no extra labels in our framework for cross-scene counting. The pre-trained CNN model is fine-tuned for each target scene to overcome the domain gap between different scenes. The fine-tuned model is specifically adapted to the new target scene.
  3. The framework does not rely on foreground segmentation results, because only appearance information is considered in our method. Whether the crowd is moving or not, its texture is captured by the CNN model, which yields a reasonable counting result.
  4. We also introduce a new dataset named WorldExpo’10 for evaluating cross-scene crowd counting methods. To the best of our knowledge, this is the largest dataset for evaluating crowd counting algorithms.

**Architecture:**

[Figure: the Crowd CNN model architecture]
The main objective for our crowd CNN model is to learn a mapping F : X → D, where X is the set of low-level features extracted from training images and D is the crowd density map of the image.

**Training:**

Training set:

Perspective normalization is necessary to estimate pedestrian scale. Patches randomly selected from the training images serve as training samples, and the density maps of the corresponding patches serve as the ground truth for the crowd CNN model.

The input consists of image patches cropped from the training images. To obtain pedestrians at similar scales, the size of each patch is chosen according to the perspective value at its center pixel.

Here we constrain each patch to cover a 3-meter by 3-meter square in the actual scene, as shown in Figure 3. The patches are then warped to 72 × 72 pixels as the input to the Crowd CNN model.
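
A minimal sketch of this perspective-normalized cropping is given below, assuming the perspective map stores pixels-per-meter at each pixel; the function name and parameters are illustrative, not the authors' code.

```python
import cv2  # used here only for resizing; any image library would do


def crop_normalized_patch(image, perspective_map, cy, cx, meters=3.0, out_size=72):
    """Crop a patch covering roughly a `meters` x `meters` square around (cy, cx),
    then warp it to `out_size` x `out_size` pixels.
    Assumption: perspective_map[y, x] gives pixels per meter at that pixel."""
    ppm = perspective_map[cy, cx]                    # pixels per meter at the patch center
    half = max(1, int(round(meters * ppm / 2)))      # half of the patch side, in pixels
    y0, y1 = max(0, cy - half), min(image.shape[0], cy + half)
    x0, x1 = max(0, cx - half), min(image.shape[1], cx + half)
    patch = image[y0:y1, x0:x1]
    return cv2.resize(patch, (out_size, out_size))   # warp to the fixed network input size
```

Random centers are drawn from each training image, and the corresponding ground-truth density map patch is cropped and warped in the same way.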

Training target:

The two loss functions, one for the estimated density map and one for the estimated crowd count:

[Equations: the density-map loss and the crowd-count loss]
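
Following the description above, both objectives can be written as squared-error losses over the $N$ training patches; the notation and normalization below are an assumed reconstruction, not the paper's exact formulas:

$$
L_D(\Theta) = \frac{1}{2N}\sum_{i=1}^{N}\big\lVert F_d(X_i;\Theta) - D_i \big\rVert_2^2,
\qquad
L_Y(\Theta) = \frac{1}{2N}\sum_{i=1}^{N}\big\lVert F_y(X_i;\Theta) - Y_i \big\rVert_2^2,
$$

where $X_i$ is a training patch, $D_i$ its ground-truth density map, $Y_i$ its ground-truth count, and $F_d$, $F_y$ the density and count outputs of the CNN.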
Training process:
[Figure: the switchable training process alternating between the two objectives]
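
A minimal sketch of the switchable idea, assuming a model with two output heads and simply alternating the objective every epoch (the paper's actual switching criterion may differ), could look like this:

```python
import torch


def switchable_training(model, loader, optimizer, epochs):
    """Alternate the training objective between the density map and the global count.
    `model` is assumed to return (pred_density, pred_count); this is an
    illustrative sketch, not the authors' implementation."""
    mse = torch.nn.MSELoss()
    for epoch in range(epochs):
        train_density = (epoch % 2 == 0)                 # switch the objective each epoch
        for patches, density_maps, counts in loader:
            pred_density, pred_count = model(patches)
            if train_density:
                loss = mse(pred_density, density_maps)   # objective 1: density map
            else:
                loss = mse(pred_count, counts)           # objective 2: crowd count
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```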

**Cross-scene Crowd Counting:**

In order to bridge the distribution gap between the training and test scenes, we design a nonparametric fine-tuning scheme to adapt our pre-trained CNN model to unseen target scenes.

Given a target video from an unseen scene, samples with similar properties are retrieved from the training scenes and added to the training data to fine-tune the crowd CNN model. The retrieval task consists of two steps: candidate scene retrieval and local patch retrieval.

Two steps: (a) candidate scenes are retrieved by matching the perspective maps of the training scenes against that of the test scene; (b) local patches similar to those in the test scene are retrieved from the candidate scenes.
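
A rough sketch of this two-step retrieval is shown below; the scene attributes, the perspective descriptor, and the patch-similarity criterion are illustrative assumptions, since the paper's exact similarity measures are not reproduced here.

```python
import numpy as np


def perspective_descriptor(pmap, n=32):
    """Summarize a perspective map by its per-row mean, resampled to a fixed length n."""
    row_means = pmap.mean(axis=1)
    return np.interp(np.linspace(0, 1, n),
                     np.linspace(0, 1, len(row_means)), row_means)


def retrieve_finetuning_samples(test_scene, train_scenes, num_scenes=5, num_patches=200):
    """Two-step retrieval sketch. Assumed data layout: `scene.perspective` is the
    perspective map and `scene.patches` is a list of (patch, estimated_density) pairs."""
    # Step (a): candidate scene retrieval by perspective-map similarity.
    target_desc = perspective_descriptor(test_scene.perspective)
    candidates = sorted(
        train_scenes,
        key=lambda s: np.linalg.norm(perspective_descriptor(s.perspective) - target_desc),
    )[:num_scenes]

    # Step (b): local patch retrieval -- keep training patches whose estimated
    # density is closest to the densities predicted on the target scene.
    target_density = np.mean([d for _, d in test_scene.patches])
    pool = [p for scene in candidates for p in scene.patches]
    pool.sort(key=lambda p: abs(p[1] - target_density))
    return pool[:num_patches]
```

The retrieved patches are then added to the training data to fine-tune the pre-trained Crowd CNN model for the target scene.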
