There are two main challenges in crowd counting:
large head scale variations caused by camera perspective, and diverse crowd distributions in scenes with heavy background noise.
Faced with these problems, we propose a model named SFANet to solve both.
**
Abstract:
**
The proposed SFANet contains two main components: a VGG backbone convolutional neural network (CNN) as the front-end feature map extractor, and a dual-path multi-scale fusion network as the back-end to generate density maps. The two paths of the fusion network have the same structure: one path generates an attention map by highlighting crowd regions in images, and the other fuses multi-scale features together with the attention map to produce the final high-quality, high-resolution density maps.
**
Contribution:
**
- We design a multi-scale fusion network architecture that fuses feature maps from multiple layers, making the network more robust to head scale variation and background noise while also generating high-resolution density maps.
- We incorporate an attention model into the network by adding a second multi-scale feature fusion path as the attention map path, which makes the proposed method focus on head regions for the density map regression task, thereby improving its robustness to complex backgrounds and diverse crowd distributions.
- We propose a novel multi-task training loss, combining a Euclidean loss and an attention map loss, to make the network converge faster and perform better. The former minimizes the pixel-wise error, while the latter focuses on locating the head regions.
**
Architecture:
**
The architecture is made up of three paths:
- feature map extractor (FME)
- density map path (DMP)
- attention map path (AMP)
FME:
Our network adopts the first 13 layers of VGG16-bn as the front-end feature map extractor (FME) to extract multi-scale feature maps carrying different levels of semantic information and different scales of spatial information.
The low-level, small-scale features capture fine edge patterns, which are essential for regressing the values of congested regions in the density map.
The high-level, large-scale features carry semantic information useful for eliminating background noise, so we use both together.
DMP:
We design a multi-scale feature fusion path as the density map path (DMP) to combine the advantages of features from different levels. Another benefit of the multi-scale fusion structure is that we can obtain a high-resolution density map through upsampling operations.
One remaining issue is that the DMP regresses every density map pixel without explicitly paying more attention to head regions during training and testing; as a result, the output still suffers from high background noise.
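One fusion step of such a path can be sketched as follows: upsample the deeper feature map to the shallower one's resolution, concatenate, and merge with convolutions. The channel sizes and the 1x1+3x3 merge are assumptions for illustration, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseBlock(nn.Module):
    """One multi-scale fusion step of the density map path (DMP):
    upsample the deeper features, concatenate with the shallower
    ones, and merge with 1x1 + 3x3 convolutions."""
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.merge = nn.Sequential(
            nn.Conv2d(deep_ch + shallow_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, shallow):
        # Bilinear upsampling doubles the deep map to the shallow map's size,
        # which is how the fused output keeps a higher resolution.
        deep = F.interpolate(deep, size=shallow.shape[2:],
                             mode="bilinear", align_corners=False)
        return self.merge(torch.cat([deep, shallow], dim=1))
```

Chaining such blocks from the deepest VGG stage back toward the shallower ones yields the high-resolution density features described above.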
AMP:
To further tackle the high background noise issue, we add an attention map path (AMP) with the same structure as the DMP to learn a probability map indicating likely head regions. This attention map is used to suppress non-head regions in the last feature maps of the DMP, which makes the DMP focus on the regression task only in high-probability head regions.
We also introduce a multi-task loss by adding an attention map loss for the AMP, which improves network performance through a more explicit supervision signal.
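The gating mechanism can be sketched as below: a sigmoid turns the AMP output into per-pixel head probabilities, which then multiply the last DMP feature maps element-wise before the final density regression. The 1x1 heads and channel count are assumptions for illustration:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Apply the AMP's attention map to the DMP's last feature maps:
    sigmoid(attention logits) gates the density features element-wise,
    suppressing non-head regions before the final regression layer."""
    def __init__(self, feat_ch):
        super().__init__()
        self.att_head = nn.Conv2d(feat_ch, 1, kernel_size=1)  # AMP: attention logits
        self.den_head = nn.Conv2d(feat_ch, 1, kernel_size=1)  # DMP: density regression

    def forward(self, dmp_feat, amp_feat):
        attention = torch.sigmoid(self.att_head(amp_feat))  # head-region probability
        gated = dmp_feat * attention                        # suppress background
        density = self.den_head(gated)
        return density, attention
```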
**
Loss Function:
**
The Euclidean loss measures the estimation error at the pixel level, and is defined as follows:
The attention map loss is a binary cross-entropy loss, defined as:
α is a weighting factor, set to 0.1 in the experiments.
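The equations themselves are not reproduced in these notes; a hedged sketch of the combined loss, assuming the standard pixel-wise MSE and binary cross-entropy forms the text describes:

```python
import torch
import torch.nn.functional as F

def sfanet_loss(pred_density, gt_density, pred_attention, gt_attention, alpha=0.1):
    """Multi-task loss: Euclidean (pixel-wise MSE) loss on the density
    map plus an alpha-weighted binary cross-entropy loss on the
    attention map. A sketch assuming the standard forms of both terms;
    pred_attention is assumed to already be a probability in (0, 1)."""
    l_den = F.mse_loss(pred_density, gt_density)                   # Euclidean loss
    l_att = F.binary_cross_entropy(pred_attention, gt_attention)   # attention map loss
    return l_den + alpha * l_att
```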
**
Attention map groundtruth:
**
Based on the density map ground truth, we again use a Gaussian kernel to compute the attention map ground truth as follows:
where th is a threshold set to 0.001 in our experiments. With Equations 7 and 8, we obtain a binary attention map ground truth that guides the AMP to focus on the head regions and their surroundings. In the experiments, we set µ = 3 and ρ = 2 for generating the attention map ground truth.
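A minimal sketch of the generation step, assuming it amounts to Gaussian-smoothing the density ground truth and thresholding at th = 0.001 (the kernel parameter `sigma` below is a stand-in for the paper's kernel setting, which these notes do not fully specify):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def attention_groundtruth(density_gt, sigma=3.0, th=0.001):
    """Binary attention map ground truth: re-smooth the density map
    ground truth with a Gaussian kernel, then threshold it so head
    regions and their surroundings become 1 and the background 0.
    th = 0.001 as stated in the text; sigma is an assumed kernel width."""
    smoothed = gaussian_filter(density_gt, sigma)
    return (smoothed > th).astype(np.float32)
```

Thresholding the smoothed map (rather than the raw head annotations) is what makes the mask cover a small neighborhood around each head, not just the annotated point.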