Crowd Counting: Cross-scene Crowd Counting via Deep Convolutional Neural Networks

**Goal:**

The paper proposes a deep CNN trained with two related learning objectives: crowd density maps and crowd counts.

**Contributions:**

  1. Our CNN model is trained on crowd scenes with a switchable learning process that alternates between two learning objectives: crowd density maps and crowd counts. The two different but related objectives alternately assist each other in reaching better local optima.
  2. Target scenes require no extra labels for cross-scene counting in our framework. The pre-trained CNN model is fine-tuned for each target scene to overcome the domain gap between scenes, so the fine-tuned model is specifically adapted to the new target scene.
  3. The framework does not rely on foreground segmentation, because only appearance information is used. Whether or not the crowd is moving, the CNN model captures the crowd texture and can still produce a reasonable count.
  4. We also introduce a new dataset named WorldExpo’10 for evaluating cross-scene crowd counting methods. To the best of our knowledge, this is the largest dataset for evaluating crowd counting algorithms.

**Architecture:**

[Figure: architecture of the crowd CNN model]
The main objective for our crowd CNN model is to learn a mapping F : X → D, where X is the set of low-level features extracted from training images and D is the crowd density map of the image.
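Because D is a density map, the number of people in any region equals the integral (here, the sum) of D over that region; this is what ties the density objective to the count objective. A minimal pure-Python sketch, assuming the density map is stored as a list of rows; `count_from_density` is a hypothetical helper, not code from the paper.

```python
def count_from_density(density):
    """Integrate a 2-D density map (list of rows) to obtain the crowd count."""
    return sum(sum(row) for row in density)


# Toy 3x3 density map in which the mass of two people is spread over several pixels.
toy_density = [
    [0.25, 0.25, 0.0],
    [0.25, 0.25, 0.5],
    [0.0, 0.0, 0.5],
]
print(count_from_density(toy_density))  # 2.0
```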

**Training:**

Training set:

Perspective normalization is necessary because pedestrian scale varies with position in the image. Patches randomly selected from the training images serve as training samples, and the density maps of the corresponding patches serve as the ground truth for the crowd CNN model.
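One common way to build such ground-truth density maps is to place a normalized Gaussian kernel at every annotated head, with the kernel width scaled by the local perspective value (pixels per meter). The sketch below makes several assumptions: it uses only a head Gaussian (the paper's exact kernel may differ), `sigma_scale=0.2` is an assumed value, and `gaussian_density_map` and the `perspective` callback are hypothetical names.

```python
import math


def gaussian_density_map(height, width, heads, perspective, sigma_scale=0.2):
    """Build a ground-truth density map from annotated head positions.

    heads: list of integer (row, col) head coordinates.
    perspective: function (row, col) -> pixels per meter at that location.
    Each head's kernel is truncated to the image and renormalized, so every
    head contributes exactly 1 and the map sums to len(heads).
    """
    density = [[0.0] * width for _ in range(height)]
    for hr, hc in heads:
        # Kernel width grows with the perspective value (people look bigger).
        sigma = max(sigma_scale * perspective(hr, hc), 1e-6)
        radius = int(math.ceil(3 * sigma))
        weights = {}
        for r in range(max(0, hr - radius), min(height, hr + radius + 1)):
            for c in range(max(0, hc - radius), min(width, hc + radius + 1)):
                d2 = (r - hr) ** 2 + (c - hc) ** 2
                weights[(r, c)] = math.exp(-d2 / (2 * sigma ** 2))
        total = sum(weights.values())
        for (r, c), w in weights.items():
            density[r][c] += w / total  # normalize so this head adds exactly 1
    return density


# Hypothetical perspective: 10 px/m at the top row, growing by 1 px/m per row.
dm = gaussian_density_map(30, 30, [(5, 5), (15, 20)], lambda r, c: 10 + r)
print(round(sum(sum(row) for row in dm), 6))
```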

The inputs are image patches cropped from the training images. To obtain pedestrians at similar scales, the size of each patch is chosen according to the perspective value of its center pixel.

Here we constrain each patch to cover a 3-meter by 3-meter square in the actual scene, as shown in Figure 3. Each patch is then warped to 72 × 72 pixels as input to the crowd CNN model.
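The cropping rule above can be sketched as follows, assuming the perspective value at a pixel is the number of pixels that span one meter there, so a 3 m square has a side of 3 × perspective pixels. `crop_perspective_patch` is a hypothetical helper; nearest-neighbour resampling and edge clamping are simplifications chosen for this sketch, not necessarily what the authors used.

```python
def crop_perspective_patch(image, center, pixels_per_meter, meters=3.0, out_size=72):
    """Crop a meters x meters square around `center` and warp it to out_size x out_size.

    image: 2-D list (rows of pixel values); center: (row, col).
    pixels_per_meter: perspective value at the center pixel.
    """
    h, w = len(image), len(image[0])
    side = max(1, int(round(meters * pixels_per_meter)))  # patch side in pixels
    top = center[0] - side // 2
    left = center[1] - side // 2
    patch = []
    for i in range(out_size):
        # Nearest-neighbour mapping from output row i to a source row,
        # clamped to the image (edge replication near borders).
        src_r = min(max(top + int(i * side / out_size), 0), h - 1)
        row = []
        for j in range(out_size):
            src_c = min(max(left + int(j * side / out_size), 0), w - 1)
            row.append(image[src_r][src_c])
        patch.append(row)
    return patch


# Synthetic 200x200 image whose pixel value encodes its own position.
img = [[r * 1000 + c for c in range(200)] for r in range(200)]
patch = crop_perspective_patch(img, (100, 100), pixels_per_meter=24)  # 3 m -> 72 px
print(len(patch), len(patch[0]))
```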

Training target:

The two loss functions, one for the density map and one for the crowd count:
[Figure: density-map loss and crowd-count loss]
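Assuming both objectives are Euclidean (squared-error) losses averaged over the N patches of a batch, as is typical for density regression, they can be sketched in pure Python as below. The 1/(2N) scaling and the helper names `l2_loss`, `density_loss` and `count_loss` are assumptions for illustration.

```python
def l2_loss(predictions, targets):
    """Euclidean loss 1/(2N) * sum_i ||pred_i - target_i||^2 over N samples.

    predictions/targets: lists of N equally long vectors.
    """
    n = len(predictions)
    total = 0.0
    for p, t in zip(predictions, targets):
        total += sum((a - b) ** 2 for a, b in zip(p, t))
    return total / (2 * n)


def density_loss(pred_maps, gt_maps):
    """Density objective: each map is flattened into one vector."""
    return l2_loss(pred_maps, gt_maps)


def count_loss(pred_counts, gt_counts):
    """Count objective: each count is treated as a 1-element vector."""
    return l2_loss([[c] for c in pred_counts], [[c] for c in gt_counts])


print(count_loss([3.0, 5.0], [2.0, 5.0]))        # 0.25
print(density_loss([[0.5, 0.5]], [[0.0, 1.0]]))  # 0.25
```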
Training process:
[Figure: switchable training process]
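The switchable process alternates between the two objectives. A minimal scheduler sketch, assuming the switch happens when the active loss stops improving by at least `tol` for `patience` consecutive checks; this plateau rule and the `SwitchableObjective` class are hypothetical, not the paper's exact criterion. In a real loop, each iteration would back-propagate only the loss named by `active`.

```python
class SwitchableObjective:
    """Alternate between the 'density' and 'count' objectives on loss plateaus."""

    def __init__(self, patience=3, tol=1e-3, start="density"):
        self.active = start
        self.patience = patience
        self.tol = tol
        self.best = float("inf")
        self.stale = 0  # consecutive checks without sufficient improvement

    def update(self, loss):
        """Record the active objective's loss; return the objective to use next."""
        if loss < self.best - self.tol:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1
        if self.stale >= self.patience:
            self.active = "count" if self.active == "density" else "density"
            self.best = float("inf")  # restart plateau tracking for the new objective
            self.stale = 0
        return self.active


sched = SwitchableObjective(patience=2)
trace = [sched.update(loss) for loss in [1.0, 0.5, 0.5, 0.5, 0.9, 0.8]]
print(trace)  # ['density', 'density', 'density', 'count', 'count', 'count']
```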

**Cross-scene Crowd Counting:**

In order to bridge the distribution gap between the training and test scenes, we design a nonparametric fine-tuning scheme to adapt our pre-trained CNN model to unseen target scenes.

Given a target video from an unseen scene, samples with similar properties are retrieved from the training scenes and added to the training data to fine-tune the crowd CNN model. Retrieval consists of two steps: candidate scene retrieval and local patch retrieval.

The two steps: (a) retrieve candidate scenes by matching the perspective maps of the training scenes against that of the test scene; (b) from the candidate scenes, retrieve local patches similar to those in the test scene.
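The two retrieval steps can be sketched as below. Several choices here are assumptions rather than the paper's method: perspective maps are compared by mean absolute difference (and assumed to share one size), and patches are matched by Euclidean nearest neighbour on a hypothetical per-patch feature vector (for example, a count predicted by the pre-trained model). All function names are hypothetical.

```python
def perspective_distance(map_a, map_b):
    """Mean absolute difference between two equally sized perspective maps."""
    diffs = [abs(a - b) for ra, rb in zip(map_a, map_b) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def retrieve_candidate_scenes(target_map, train_maps, k):
    """Step (a): names of the k training scenes with the closest perspective maps."""
    ranked = sorted(train_maps, key=lambda name: perspective_distance(target_map, train_maps[name]))
    return ranked[:k]


def retrieve_local_patches(target_feats, candidate_patches, per_query=1):
    """Step (b): for each target-patch feature, keep the nearest candidate patches.

    candidate_patches: list of (patch_id, feature_vector) from the candidate scenes.
    """
    selected = []
    for q in target_feats:
        nearest = sorted(candidate_patches, key=lambda item: sum((x - y) ** 2 for x, y in zip(q, item[1])))
        for patch_id, _ in nearest[:per_query]:
            if patch_id not in selected:  # avoid adding the same patch twice
                selected.append(patch_id)
    return selected


train_maps = {"A": [[10, 21], [30, 40]], "B": [[50, 50], [50, 50]], "C": [[12, 22], [32, 42]]}
print(retrieve_candidate_scenes([[10, 20], [30, 40]], train_maps, k=2))  # ['A', 'C']
patches = [("A/p1", [1.0]), ("A/p2", [5.0]), ("C/p1", [9.0])]
print(retrieve_local_patches([[4.5], [8.0]], patches))  # ['A/p2', 'C/p1']
```

The retrieved patches would then be appended to the training set before fine-tuning, so no labels from the target scene are needed.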
