Crowd Counting: SFANet - Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting

Crowd counting faces two main challenges:

large head-scale variations caused by camera perspective, and diverse crowd distributions in scenes with heavy background noise.

To address these two problems, we propose a model named SFANet.

**Abstract:**
The proposed SFANet contains two main components: a VGG backbone convolutional neural network (CNN) as the front-end feature map extractor, and a dual-path multi-scale fusion network as the back-end that generates the density map. The two paths share the same structure: one path generates an attention map by highlighting crowd regions in the image, while the other fuses multi-scale features together with the attention map to produce the final high-quality, high-resolution density map.

**Contribution:**

  1. We design a multi-scale fusion network architecture that fuses feature maps from multiple layers, making the network more robust to head-scale variation and background noise while also generating high-resolution density maps.
  2. We incorporate an attention model into the network by adding a second multi-scale feature-fusion path as the attention map path, which makes the method focus on head regions during density map regression, thereby improving its robustness to complex backgrounds and diverse crowd distributions.
  3. We propose a novel multi-task training loss that combines a Euclidean loss and an attention map loss, yielding faster convergence and better performance. The former minimizes the pixel-wise error; the latter focuses on locating the head regions.

**Architecture:**

(Figure: overall SFANet architecture)
The architecture is made up of three paths:

  1. feature map extractor (FME)
  2. density map path (DMP)
  3. attention map path (AMP)

FME:
Our network adopts the first 13 layers of VGG16-bn as the front-end feature map extractor (FME), extracting multi-scale feature maps that carry different levels of semantic information at different scales.
The low-level, small-scale features represent the fine edge patterns that are essential for regressing values in the congested regions of the density map.
The high-level, large-scale features carry semantic information that helps eliminate background noise. We therefore use them together.
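As a rough sketch of where those multi-scale taps sit, the first 13 conv layers of VGG16 can be described by the standard stage configuration, from which the stride of each candidate feature map follows. (The specific taps the paper fuses are not spelled out in this summary, so the names below are the conventional VGG layer names, not a claim about SFANet's exact choices.)

```python
# Standard VGG16 conv configuration: numbers are output channels,
# "M" marks a 2x2 max-pool that halves the spatial resolution.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512]  # first 13 conv layers

def feature_strides(cfg):
    """Return (conv_name, stride vs. input) for the last conv of each stage."""
    strides, stage, conv_in_stage, stride = [], 1, 0, 1
    for v in cfg:
        if v == "M":
            strides.append((f"conv{stage}_{conv_in_stage}", stride))
            stage += 1
            conv_in_stage = 0
            stride *= 2
        else:
            conv_in_stage += 1
    strides.append((f"conv{stage}_{conv_in_stage}", stride))
    return strides

for name, s in feature_strides(VGG16_CFG):
    print(name, "stride", s)
# conv1_2 stride 1 ... conv5_3 stride 16
```

This shows why fusion plus upsampling is needed: the deepest (most semantic) features sit at 1/16 of the input resolution, while the shallow (edge-level) features are near full resolution.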

DMP:
We design a multi-scale feature-fusion path as the density map path (DMP) to combine the advantages of these different feature levels. Another advantage of the multi-scale fusion structure is that a high-resolution density map can be obtained through upsampling.
One remaining issue is that the DMP regresses every density map pixel without explicitly paying more attention to head regions during training and testing; as a result, the background noise remains high.
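A minimal sketch of the fuse-and-upsample step follows. Nearest-neighbor upsampling and plain channel concatenation stand in for the paper's actual upsample and conv blocks, whose exact details are not given in this summary; the shapes are invented for illustration.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse(coarse, fine):
    """Upsample the coarser map and concatenate it with the finer one
    along the channel axis, so later convs can mix both feature levels."""
    up = upsample2x(coarse)
    assert up.shape[1:] == fine.shape[1:], "spatial sizes must match"
    return np.concatenate([up, fine], axis=0)

conv5 = np.random.rand(512, 8, 8)    # stride-16 features (semantic)
conv4 = np.random.rand(512, 16, 16)  # stride-8 features (more detail)
fused = fuse(conv5, conv4)
print(fused.shape)  # (1024, 16, 16)
```

Repeating this step toward the shallower layers is what lets the DMP emit a density map at a higher resolution than the deepest feature map.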

AMP:
To further tackle this background-noise issue, the attention map path (AMP) is added. It has the same structure as the DMP and learns a probability map indicating likely head regions. This attention map is used to suppress non-head regions in the last feature maps of the DMP, which makes the DMP focus its regression on high-probability head regions.
We also introduce a multi-task loss by adding an attention map loss for the AMP, which improves the network's performance by providing a more explicit supervision signal.
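The suppression step itself is just an element-wise product between the attention map and the feature map; a small numpy illustration (the values are invented for the example):

```python
import numpy as np

feat = np.array([[0.9, 0.8],
                 [0.7, 0.6]])   # one channel of the last DMP feature map
att = np.array([[1.0, 0.1],
                [0.0, 0.9]])    # attention map, close to 1 on head regions

refined = feat * att            # responses outside head regions are damped
print(refined)
```

Positions where the attention map is near zero contribute almost nothing to the subsequent density regression, which is exactly how background clutter gets suppressed.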

**Loss Function:**
The Euclidean loss measures the estimation error at the pixel level and is defined as follows:
(equation image omitted)
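The Euclidean loss for density-map regression is conventionally written as below; note this is a reconstruction of the standard form, since the original equation image is not preserved here, and the symbols are my own rather than copied from the paper:

$$
L_{den}(\Theta) = \frac{1}{2N}\sum_{i=1}^{N}\bigl\| D(X_i;\Theta) - D_i^{gt}\bigr\|_2^2
$$

where $N$ is the number of training images, $D(X_i;\Theta)$ is the estimated density map for image $X_i$ under network parameters $\Theta$, and $D_i^{gt}$ is its ground-truth density map.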

The attention map loss is a binary cross-entropy, defined as:

(equation image omitted)

α is a weighting factor, set to 0.1 in the experiments.
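Under this formulation the combined multi-task loss is L = L_den + α·L_att, which can be sketched in a few lines of numpy. The clipping constant and the per-pixel averaging below are my choices for numerical safety and illustration, not details taken from the paper.

```python
import numpy as np

def euclidean_loss(pred_den, gt_den):
    """Pixel-wise squared error between predicted and ground-truth density maps."""
    return 0.5 * np.mean((pred_den - gt_den) ** 2)

def attention_bce_loss(pred_att, gt_att, eps=1e-7):
    """Binary cross-entropy between predicted and ground-truth attention maps."""
    p = np.clip(pred_att, eps, 1 - eps)  # avoid log(0)
    return -np.mean(gt_att * np.log(p) + (1 - gt_att) * np.log(1 - p))

def total_loss(pred_den, gt_den, pred_att, gt_att, alpha=0.1):
    """Multi-task loss L = L_den + alpha * L_att, with alpha = 0.1."""
    return euclidean_loss(pred_den, gt_den) + alpha * attention_bce_loss(pred_att, gt_att)

pred_den = np.zeros((4, 4)); gt_den = np.zeros((4, 4))
pred_att = np.full((4, 4), 0.5); gt_att = np.ones((4, 4))
print(total_loss(pred_den, gt_den, pred_att, gt_att))
```

Because the attention target is binary and per-pixel, the BCE term gives the AMP a direct, easy-to-optimize supervision signal alongside the density regression, which is the intuition behind the faster convergence claimed above.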

**Attention map groundtruth:**

Based on the density map ground truth, we again use a Gaussian kernel to compute the attention map ground truth as follows:
(equation image omitted)
where th is a threshold set to 0.001 in our experiments. With Equations 7 and 8, we obtain a binary attention map ground truth that guides the AMP to focus on head regions and their surrounding areas. In the experiments, we set µ = 3 and ρ = 2 when generating the attention map ground truth.
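Ignoring the re-smoothing step (per the text above, a Gaussian kernel with µ = 3 and ρ = 2 is applied first), the thresholding itself is a one-liner; the toy density values below are invented for illustration:

```python
import numpy as np

def attention_groundtruth(density_gt, th=0.001):
    """Binarize a (Gaussian-smoothed) density map: pixels above th become
    head-region labels (1.0), everything else background (0.0)."""
    return (density_gt > th).astype(np.float32)

# toy smoothed density map: one small blob of mass in the center
den = np.array([[0.0000, 0.0004, 0.0000],
                [0.0004, 0.0200, 0.0004],
                [0.0000, 0.0004, 0.0000]])
print(attention_groundtruth(den))
```

The wider Gaussian ensures the blob spills slightly beyond the annotated head point before thresholding, which is what makes the attention target cover the head's surrounding area as well.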
