[Paper Notes CVPR2020] Attention Scaling for Crowd Counting

bridgeqiqi

Article link: https://blog.csdn.net/bridgeqiqi/article/details/107435658

Paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Jiang_Attention_Scaling_for_Crowd_Counting_CVPR_2020_paper.pdf

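Abstract

CNN-based methods usually treat crowd counting as a regression task: they learn a mapping from image content to a density map. For regions of different crowd density, however, these networks tend to overestimate or underestimate the count. To address this, the paper proposes a new method and network architecture.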

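The model consists of two parts, DANet and ASNet.
DANet = Density Attention Network
ASNet = Attention Scaling Network

First, DANet performs density-level semantic segmentation over regions of different crowd density and produces attention masks for the regions of each density level. Then ASNet produces density maps and scaling factors, and the density map, mask, and scaling factor belonging to each density level are multiplied together. The scaling factors help reduce estimation errors in regions of different density.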
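The multiplication step can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function name `attention_scale`, the use of one scalar scaling factor per density level, and the final sum over levels are assumptions made only for illustration.

```python
import numpy as np

def attention_scale(density_maps, masks, scales):
    """Combine per-level density maps, attention masks and scaling factors.

    density_maps, masks: shape (N, H, W), one slice per density level.
    scales: shape (N,), one scalar per level (an assumption of this sketch).
    """
    density_maps = np.asarray(density_maps, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    scales = np.asarray(scales, dtype=np.float64).reshape(-1, 1, 1)
    # Element-wise product per level, then a sum over levels (assumed).
    return (density_maps * masks * scales).sum(axis=0)

# Toy example: two density levels on a 2x2 image.
density = np.ones((2, 2, 2))
masks = np.array([[[1, 1], [0, 0]],   # level 0 covers the top row
                  [[0, 0], [1, 1]]])  # level 1 covers the bottom row
scales = np.array([1.2, 0.8])         # hypothetical scaling factors
final_map = attention_scale(density, masks, scales)
print(final_map)  # top row scaled by 1.2, bottom row by 0.8
```

In this toy case each mask selects one row, so each scaling factor corrects only the density estimate of its own region.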

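For this model, the paper proposes a new loss function, the Adaptive Pyramid Loss, to optimize the network: the density map is divided into several regions, a local normalized loss is computed for each region, and these are summed to give the final estimation loss.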
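The idea of summing locally normalized losses can be sketched as follows. This is a simplified illustration and not the paper's formula: it uses a fixed `grid` x `grid` split instead of an adaptive pyramid, and the normalization by the ground-truth count plus `eps` is an assumption, as is the function name `pyramid_loss`.

```python
import numpy as np

def pyramid_loss(pred, gt, grid=2, eps=1.0):
    """Sum of squared count errors per cell, each normalized by the cell's GT count."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    h, w = gt.shape
    # Index groups that split rows and columns into `grid` parts each.
    rows = np.array_split(np.arange(h), grid)
    cols = np.array_split(np.arange(w), grid)
    loss = 0.0
    for r in rows:
        for c in cols:
            p = pred[np.ix_(r, c)].sum()  # predicted count in this cell
            g = gt[np.ix_(r, c)].sum()    # ground-truth count in this cell
            loss += (p - g) ** 2 / (g + eps)  # assumed normalization
    return loss

# Toy example: the prediction overshoots by 2 people in the top-left cell.
gt = np.zeros((4, 4))
gt[0, 0] = 3.0
gt[3, 3] = 1.0
pred = gt.copy()
pred[0, 0] = 5.0
print(pyramid_loss(pred, gt))  # (5 - 3)^2 / (3 + 1) = 1.0
```

Normalizing by the local count keeps dense cells from dominating the total, which is the intuition behind computing the loss per region.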

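Motivation

The authors observe that counting networks tend to overestimate the number of people in high-density regions and underestimate it in low-density regions.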

Contributions

Propose a novel attention scaling convolutional neural network (ASNet) that learns scaling factors to automatically adjust the density estimation of each corresponding sub-region, which reduces the local estimation error.
Propose a density attention network (DANet) that provides ASNet with attention masks concerning regions of different density levels.
Propose a novel adaptive pyramid loss that can ease the training bias and strengthen the generalization ability of the counting network.
Compared with 16 other recently reported state-of-the-art results, the proposed approach demonstrates its superiority on four challenging crowd datasets.

Density-level ground-truth generation

This is a pixel-wise density segmentation task.
Each pixel is assigned a density level, and the pixels that share a density level form the region (mask) for that level.

Steps for generating the pixel-wise ground truth

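Use a 64×64 sliding window and record the people count of every non-zero region, then compute the maximum $MaxCnt$, the minimum $MinCnt$, and the mean $AvgCnt_{11}$. These form a set of density-level thresholds $\{MinCnt, AvgCnt_{11}, MaxCnt\}$, which divides the density into two levels, low and high.
Similarly, the split can be repeated iteratively: computing the means within the intervals $[MinCnt, AvgCnt_{11}]$ and $[AvgCnt_{11}, MaxCnt]$ gives $AvgCnt_{21}$ and $AvgCnt_{22}$, and a new threshold set $\{MinCnt, AvgCnt_{21}, AvgCnt_{11}, AvgCnt_{22}, MaxCnt\}$.
The resulting labels are used to train DANet. After the network outputs $N$ foreground attention masks, a single dilation operation is applied to enlarge each mask.
Where masks overlap, the average is taken over the overlapped region.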
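The threshold construction in the steps above can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the helper names `window_counts` and `density_thresholds` are hypothetical, the window stride is taken equal to the window size, the intervals are treated as closed, and the level assignment with `np.digitize` is one possible boundary convention.

```python
import numpy as np

def window_counts(density, win=64):
    """People count per win x win window (stride = win is an assumption)."""
    h, w = density.shape
    counts = [density[top:top + win, left:left + win].sum()
              for top in range(0, h, win)
              for left in range(0, w, win)]
    return np.array(counts)

def density_thresholds(counts, depth=2):
    """Recursively split [MinCnt, MaxCnt] at the mean of the counts in each interval."""
    counts = np.asarray(counts, dtype=np.float64)
    counts = counts[counts > 0]  # keep only windows that contain people

    def split(lo, hi, d):
        inside = counts[(counts >= lo) & (counts <= hi)]
        if d == 0 or inside.size == 0:
            return []
        avg = inside.mean()
        return split(lo, avg, d - 1) + [avg] + split(avg, hi, d - 1)

    lo, hi = counts.min(), counts.max()
    return [lo] + split(lo, hi, depth) + [hi]

# Window counts from a toy 128x128 density map with two occupied windows.
toy = np.zeros((128, 128))
toy[0, 0] = 1.0
toy[100, 100] = 3.0
toy_counts = window_counts(toy)

# Threshold sets for hand-picked window counts (the zero is ignored).
counts = [1, 2, 3, 6, 8, 10, 0]
two_levels = density_thresholds(counts, depth=1)   # MinCnt, AvgCnt_11, MaxCnt
four_levels = density_thresholds(counts, depth=2)  # adds AvgCnt_21 and AvgCnt_22

# Assign a density level to each non-empty window using the inner thresholds.
levels = np.digitize([1, 2, 3, 6, 8, 10], four_levels[1:-1])
print(two_levels, four_levels, levels)
```

With `depth=1` the sketch yields the low/high split from the first step; `depth=2` yields the five-element threshold set from the second step.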