Crowd Counting: SFANet (Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting)

There are two main challenges in crowd counting:

large variations in head scale caused by camera perspective, and diverse crowd distributions in scenes with heavy background noise.

To address both problems, we propose a model named SFANet.

**Abstract:**
The proposed SFANet has two main components. The front-end is a VGG backbone convolutional neural network (CNN) that extracts feature maps. The back-end consists of dual-path multi-scale fusion networks that generate the density map. The two paths share the same structure. One path generates an attention map that highlights crowd regions in the image. The other path fuses multi-scale features with that attention map to produce the final high-quality, high-resolution density map.

**Contributions:**

  1. We design a multi-scale fusion network architecture that fuses feature maps from multiple layers. This makes the network more robust to head scale variation and background noise, and it also produces high-resolution density maps.
  2. We incorporate an attention model into the network by adding a second multi-scale feature fusion path, the attention map path. This makes the method focus on head regions during density map regression. As a result, it is more robust to complex backgrounds and diverse crowd distributions.
  3. We propose a novel multi-task training loss that combines a Euclidean loss and an attention map loss, so that the network converges faster and performs better. The former minimizes the pixel-wise error, and the latter focuses on locating head regions.

**Architecture:**

[Figure: SFANet architecture]
The architecture is made up of three parts:

  1. feature map extractor (FME)
  2. density map path (DMP)
  3. attention map path (AMP)

FME:
Our network adopts the first 13 layers of VGG16-bn (its 13 convolutional layers) as the front-end feature map extractor (FME). The FME extracts multi-scale feature maps that carry semantic information at different levels and features at different scales.
Low-level, small-scale features represent detailed edge patterns well, and these patterns are essential for regressing density values in congested regions.
High-level, large-scale features carry semantic information that helps eliminate background noise. We therefore use both together.
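To make the multi-scale idea concrete, the following sketch traces tensor shapes through the 13 convolutional layers of the standard VGG16 configuration. It uses plain Python and no deep-learning framework. It assumes the usual VGG16 layout: 3×3 convolutions with padding 1, 2×2 max pooling with stride 2, and the pooling layer after the 13th convolution dropped. Batch normalization does not change shapes, so it is omitted. The names `VGG16_FRONT`, `STAGES` and `block_outputs` are hypothetical.

```python
# Standard VGG16 convolutional configuration: numbers are output channels of
# 3x3 convs (padding 1), "M" is 2x2 max pooling with stride 2.
# The pool after the 13th conv is dropped. BatchNorm (VGG16-bn) keeps shapes.
VGG16_FRONT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
               512, 512, 512, "M", 512, 512, 512]
STAGES = ["conv1_2", "conv2_2", "conv3_3", "conv4_3", "conv5_3"]


def block_outputs(height, width, cfg=VGG16_FRONT):
    """Return (channels, height, width) at the end of each conv block."""
    outputs = []
    channels = 3  # RGB input
    for layer in cfg:
        if layer == "M":
            outputs.append((channels, height, width))  # block output before pooling
            height, width = height // 2, width // 2
        else:
            channels = layer  # padded 3x3 conv keeps the spatial size
    outputs.append((channels, height, width))  # last block has no pooling
    return outputs


for name, shape in zip(STAGES, block_outputs(512, 512)):
    print(name, shape)
```

For a 512×512 input, the five blocks end at 1, 1/2, 1/4, 1/8 and 1/16 of the input resolution. This is the range of scales a back-end can fuse. Which of these stages SFANet actually taps is not stated in this summary.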

DMP:
We design a multi-scale feature fusion path, the density map path (DMP), to combine the complementary strengths of features from different levels. This fusion structure has a further advantage: upsampling lets us obtain a high-resolution density map.
However, DMP regresses every pixel of the density map without explicitly paying more attention to head regions during training and testing. Its output therefore remains vulnerable to high background noise.
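The core fusion step can be sketched in plain Python: upsample a deeper, lower-resolution feature map by 2× and concatenate it channel-wise with a shallower map of matching size. This is an illustrative sketch with three assumptions. It uses nearest-neighbour upsampling for simplicity, whereas the original network may use bilinear upsampling. It represents feature maps as nested `C × H × W` lists. It omits the convolutions that would follow each concatenation. `upsample2x_nearest` and `fuse` are hypothetical names.

```python
def upsample2x_nearest(fmap):
    """Nearest-neighbour 2x upsampling of a C x H x W map given as nested lists."""
    return [
        [[v for v in row for _ in range(2)] for row in channel for _ in range(2)]
        for channel in fmap
    ]


def fuse(deep, shallow):
    """Upsample the deeper map and concatenate it with the shallower map along channels."""
    up = upsample2x_nearest(deep)
    if (len(up[0]), len(up[0][0])) != (len(shallow[0]), len(shallow[0][0])):
        raise ValueError("spatial sizes must match after upsampling")
    return up + shallow  # channel concatenation: deep channels first


# Example: a 1 x 2 x 2 deep map fused with a 2 x 4 x 4 shallow map.
deep = [[[1, 2], [3, 4]]]
shallow = [[[0] * 4 for _ in range(4)] for _ in range(2)]
fused = fuse(deep, shallow)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 3 4 4
print(fused[0])  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Repeating this step from the deepest stage upward is what lets a fusion path end at a higher resolution than the deepest feature map.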

AMP:
To further tackle background noise, we add the attention map path (AMP). It has the same structure as DMP and learns a probability map that indicates likely head regions. This attention map suppresses non-head regions in the last feature maps of DMP, so that DMP focuses its regression on high-probability head regions only.
We also introduce a multi-task loss by adding an attention map loss for AMP. This loss gives a more explicit supervision signal, which improves network performance.
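The gating step described above can be sketched as follows. The AMP output is passed through a sigmoid to obtain per-pixel probabilities, and every channel of the DMP feature map is multiplied by that probability map. The sigmoid activation and the channel-wise broadcast are assumptions about how the probability map is applied. `apply_attention` is a hypothetical helper that operates on nested lists.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def apply_attention(features, att_logits):
    """Scale every channel of a C x H x W feature map by sigmoid(att_logits) (H x W)."""
    att = [[sigmoid(v) for v in row] for row in att_logits]
    gated = [
        [[f * a for f, a in zip(frow, arow)] for frow, arow in zip(channel, att)]
        for channel in features
    ]
    return gated, att


# Two channels, one row of two pixels; the second pixel is treated as background.
features = [[[2.0, 2.0]], [[4.0, 4.0]]]
gated, att = apply_attention(features, [[0.0, -20.0]])
print(gated[0][0][0], gated[1][0][0])  # 1.0 2.0
```

The background pixel is scaled by a factor of about 2e-9, so its features are effectively suppressed in every channel.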

**Loss Function:**
The Euclidean loss measures the estimation error at the pixel level and is defined as follows:
[Equation image not preserved: Euclidean loss]
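Because the equation image is missing, here is a hedged sketch of a pixel-wise Euclidean loss in the form commonly used for density-map regression: L_den = 1/(2N) · Σ_i ||D_i^pred − D_i^GT||², where N is the number of images. The 1/(2N) normalization is an assumption and may differ from the paper's exact definition. The sketch also shows that the estimated count is the sum of the density map. The function names are hypothetical.

```python
def euclidean_loss(pred_maps, gt_maps):
    """Assumed form: L_den = 1/(2N) * sum_i ||pred_i - gt_i||_2^2 over N density maps."""
    if len(pred_maps) != len(gt_maps) or not pred_maps:
        raise ValueError("need the same, non-zero number of predicted and GT maps")
    total = 0.0
    for pred, gt in zip(pred_maps, gt_maps):
        for prow, grow in zip(pred, gt):
            for p, g in zip(prow, grow):
                total += (p - g) ** 2  # squared pixel-wise error
    return total / (2 * len(pred_maps))


def count(density_map):
    """Estimated crowd count = integral (sum) of the density map."""
    return sum(sum(row) for row in density_map)


pred = [[[0.5, 0.25], [0.25, 0.0]]]
gt = [[[0.5, 0.5], [0.0, 0.0]]]
print(euclidean_loss(pred, gt))  # 0.0625
print(count(pred[0]), count(gt[0]))  # 1.0 1.0
```

The example shows why a pixel-level loss is used instead of a count-only loss. Both maps give a count of 1.0, yet the loss is non-zero because the predicted mass sits in the wrong pixel.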

The attention map loss is a binary cross-entropy loss, defined as:

[Equation image not preserved: attention map loss]
α is a weighting factor, set to 0.1 in the experiments.
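A matching sketch of the attention map loss and the multi-task objective follows. It rests on two assumptions. First, the binary cross-entropy is the standard form, averaged over all pixels. Second, the two losses are combined as L = L_den + α·L_att with α = 0.1. The exact normalization and combination in the paper's missing equation images may differ. Predicted probabilities are clipped by a small `eps` to avoid `log(0)`. `attention_bce` and `total_loss` are hypothetical names.

```python
import math


def attention_bce(pred_probs, gt_binary, eps=1e-7):
    """Mean binary cross-entropy between predicted attention probabilities and a
    0/1 attention groundtruth, averaged over all pixels of all images."""
    total, n = 0.0, 0
    for pred, gt in zip(pred_probs, gt_binary):
        for prow, grow in zip(pred, gt):
            for p, y in zip(prow, grow):
                p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
                total -= y * math.log(p) + (1 - y) * math.log(1 - p)
                n += 1
    return total / n


def total_loss(l_den, l_att, alpha=0.1):
    """Assumed multi-task objective: L = L_den + alpha * L_att (alpha = 0.1 in the text)."""
    return l_den + alpha * l_att


l_att = attention_bce([[[0.5, 0.5]]], [[[1, 0]]])
print(round(l_att, 4))  # 0.6931
print(round(total_loss(0.0625, l_att), 4))  # 0.1318
```

With α = 0.1, the attention term acts as an auxiliary signal and the density regression remains the dominant objective.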

**Attention map groundtruth:**

Based on the density map groundtruth, we again use a Gaussian kernel to compute the attention map groundtruth as follows:
[Equation images not preserved: Equations (7) and (8), attention map groundtruth]
where th is a threshold, set to 0.001 in our experiments. Equations (7) and (8) yield a binary attention map groundtruth that guides the AMP to focus on head regions and their surrounding areas. In the experiments, we set µ = 3 and ρ = 2 when generating the attention map groundtruth.
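The thresholding idea can be sketched in plain Python. First, place a normalized 2D Gaussian at each annotated head to obtain a smoothed map. Then mark every pixel whose value exceeds th = 0.001 as 1. This only approximates Equations (7) and (8), whose images are missing. The fixed `sigma` below is an illustrative assumption and does not reproduce how µ = 3 and ρ = 2 enter the original formula. `gaussian_map` and `attention_groundtruth` are hypothetical names.

```python
import math


def gaussian_map(height, width, heads, sigma=4.0):
    """Sum of normalized 2D Gaussians centred on (row, col) head positions.
    sigma is an assumed fixed value, not derived from mu or rho."""
    radius = int(3 * sigma)
    offsets = [(dr, dc) for dr in range(-radius, radius + 1)
               for dc in range(-radius, radius + 1)]
    weights = [math.exp(-(dr * dr + dc * dc) / (2 * sigma * sigma)) for dr, dc in offsets]
    norm = sum(weights)  # each head has mass 1 when its kernel fits inside the image
    dmap = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        for (dr, dc), w in zip(offsets, weights):
            r, c = r0 + dr, c0 + dc
            if 0 <= r < height and 0 <= c < width:
                dmap[r][c] += w / norm
    return dmap


def attention_groundtruth(dmap, th=0.001):
    """Binary attention groundtruth: 1 where the smoothed map exceeds th, else 0."""
    return [[1 if v > th else 0 for v in row] for row in dmap]


dmap = gaussian_map(32, 32, heads=[(16, 16)])
att = attention_groundtruth(dmap)
print(round(sum(map(sum, dmap)), 6))  # 1.0
print(att[16][16], att[16][26], att[0][0])  # 1 0 0
```

With sigma = 4, the binary region is a disc of roughly 8 pixels around the head. This is smaller than the kernel's 12-pixel support because the Gaussian tails fall below th. A larger kernel would widen the region to cover more of the surrounding area.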
