Crowd Counting: DRSAN -- Crowd Counting using Deep Recurrent Spatial-Aware Network

Goal:

Estimate the total number of people in unconstrained crowded scenes.

**Highlight:**

Crowd counting currently faces two difficulties: variation in crowd scale, and camera perspective, which causes huge appearance variations in people's scales and rotations. The paper tackles both.
We propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network, which adaptively addresses the two issues in a learnable spatial transform module with a region-wise refinement process.

**Specifically:** our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module that iteratively conducts two components:

i) a Spatial Transformer Network that dynamically locates an attentional region from the crowd density map and transforms it to the suitable scale and rotation for optimal crowd estimation;

ii) a Local Refinement Network that refines the density map of the attended region with residual learning.
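
To make the "scale and rotation" of the attended region concrete, here is a minimal sketch of the kind of affine sampling grid a spatial transformer uses. The parameterization (one isotropic scale `s`, an angle `theta`, and a centre `(tx, ty)` in normalized [-1, 1] coordinates) and the function names are assumptions for illustration; this summary does not give the paper's exact parameterization.

```python
import math

def affine_matrix(s, theta, tx, ty):
    """2x3 affine matrix: isotropic scale s, rotation theta (radians), centre (tx, ty)."""
    c, sn = math.cos(theta), math.sin(theta)
    return [[s * c, -s * sn, tx],
            [s * sn,  s * c, ty]]

def sampling_grid(A, h, w):
    """Map each pixel of an h x w output patch (h, w >= 2) to a source point in [-1, 1] coords."""
    grid = []
    for i in range(h):
        y = -1.0 + 2.0 * i / (h - 1)
        row = []
        for j in range(w):
            x = -1.0 + 2.0 * j / (w - 1)
            row.append((A[0][0] * x + A[0][1] * y + A[0][2],
                        A[1][0] * x + A[1][1] * y + A[1][2]))
        grid.append(row)
    return grid

# Hypothetical parameters: a half-size region, rotated 90 degrees, centred at (0.25, 0).
A = affine_matrix(s=0.5, theta=math.pi / 2, tx=0.25, ty=0.0)
grid = sampling_grid(A, h=3, w=3)
```

In a spatial transformer the feature or density values at these source points are then read off by interpolation, so the whole crop stays differentiable with respect to `s`, `theta`, `tx`, and `ty`.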

**Contribution:**

• We provide an adaptive mode to simultaneously handle the effect of both scale and rotation variation by introducing a spatial transform module for crowd counting. To the best of our knowledge, we are the first to address the issue of the rotation variation on this task.

• We propose a novel deep recurrent spatial-aware network framework to recurrently select a region (with learnable scale and rotation parameters) from an initial density map for refinement, dependent on feature warping and residual learning.

**Architecture:**

The framework includes a Global Feature Embedding (GFE) module and a Recurrent Spatial-Aware Refinement (RSAR) module. Specifically, the GFE module takes the whole image as input for global feature extraction, and the extracted feature is further used to estimate an initial crowd density map. The RSAR module is then applied to iteratively locate image regions with a spatial transformer-based attention mechanism and refine the attended density map region with residual learning.


There are two modules in the architecture: GFE and RSAR.

Global Feature Embedding
Goal: transform the input image into high-dimensional feature maps, which are further used to generate an initial crowd density map of the image.

The GFE module is composed of three columns of CNNs, each of which has seven convolutional layers with different kernel sizes and channel numbers as well as three max-pooling layers.
Given an image I, we extract its global feature g by feeding it into the GFE module and concatenating the outputs of all the columns. After obtaining the global feature g, we generate the initial crowd density map M0 of image I using a convolutional layer with a kernel size of 1 × 1.
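
The last step, concatenating the column outputs and applying a 1 × 1 convolution, amounts to a per-pixel weighted sum over channels. Below is a minimal pure-Python sketch with toy feature maps and made-up weights. The channel counts, the values, and the names `concat_channels` and `conv1x1` are assumptions for illustration, not the paper's. The final line uses the usual convention in density-map-based counting that the estimated count is the sum over the map.

```python
def concat_channels(*columns):
    """Concatenate feature maps along the channel axis; each is a [C][H][W] nested list."""
    return [channel for col in columns for channel in col]

def conv1x1(features, weights, bias=0.0):
    """1x1 convolution to one output channel: weighted sum of channels at every pixel."""
    h, w = len(features[0]), len(features[0][0])
    return [[bias + sum(wt * ch[i][j] for wt, ch in zip(weights, features))
             for j in range(w)] for i in range(h)]

# Toy outputs of the three columns, one channel each, on a 2x2 grid (hypothetical values).
col_a = [[[1.0, 0.0], [0.0, 1.0]]]
col_b = [[[0.0, 2.0], [2.0, 0.0]]]
col_c = [[[1.0, 1.0], [1.0, 1.0]]]

g = concat_channels(col_a, col_b, col_c)     # global feature g: 3 channels
M0 = conv1x1(g, weights=[0.5, 0.25, 0.1])    # initial density map
count = sum(sum(row) for row in M0)          # estimated count = sum over the density map
```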

Recurrent Spatial-Aware Refinement:

The Recurrent Spatial-Aware Refinement (RSAR) module iteratively refines the crowd density map. It consists of two alternately performed components:
i) a Spatial Transformer Network dynamically locates an attentional region from the crowd density map;
ii) a Local Refinement Network refines the density map of the selected region with residual learning.
After n refinement iterations, the result is a high-quality crowd density map with an accurately estimated crowd count.
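
The alternation described above can be sketched as a simple loop. This is a simplification under stated assumptions: the attended region is an axis-aligned box rather than a scaled and rotated one, and `locate_region` and `predict_residual` are hypothetical stand-ins for the Spatial Transformer Network and the Local Refinement Network, both of which are learned in the paper.

```python
def locate_region(density, step):
    """Hypothetical stand-in for the STN: returns (top, left, height, width)."""
    h, w = len(density), len(density[0])
    # Alternate between the top half and the bottom half of the map.
    top = 0 if step % 2 == 0 else h // 2
    return top, 0, h // 2, w

def predict_residual(patch):
    """Hypothetical stand-in for the Local Refinement Network: one correction per pixel."""
    return [[0.1 * v for v in row] for row in patch]

def refine(density, n):
    """Run n iterations of 'attend to a region, then add a residual to it'."""
    current = [row[:] for row in density]    # copy, so the initial map stays untouched
    for step in range(n):
        top, left, rh, rw = locate_region(current, step)
        patch = [row[left:left + rw] for row in current[top:top + rh]]
        residual = predict_residual(patch)
        for i in range(rh):
            for j in range(rw):
                # Residual learning: only a correction is added to the attended region.
                current[top + i][left + j] += residual[i][j]
    return current

M0 = [[1.0, 1.0], [2.0, 2.0]]    # toy initial density map
Mn = refine(M0, n=2)
```

Only the attended region changes in each iteration, so the rest of the density map carries over unchanged from the previous step.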

[Figure: architectures of the two components]
