【人群密度估计】-论文CCTrans: Simplifying and Improving Crowd Counting with Transformer 以及工程落地记录

weixin_40293999

已于 2023-08-17 18:51:07 修改

阅读量966

点赞数

分类专栏：深度学习 linux pytorch 文章标签： transformer 深度学习人工智能

于 2023-07-17 17:29:53 首次发布

本文链接：https://blog.csdn.net/weixin_40293999/article/details/131769507

版权

深度学习同时被 3 个专栏收录

93 篇文章 21 订阅

订阅专栏

linux

55 篇文章 2 订阅

订阅专栏

pytorch

19 篇文章 2 订阅

订阅专栏

提示：文章写完后，目录可以自动生成，如何生成可参考右边的帮助文档

前言

CCTrans: Simplifying and Improving Crowd Counting with Transformer

论文地址：https://arxiv.org/pdf/2109.14483v1.pdf

代码：https://github.com/wfs123456/CCTrans

该论文使用的Twins-large模型权重链接已上传百度云：

链接：https://pan.baidu.com/s/1QSTA7P-81_uHD6wqzk93Og
提取码：8wd4

1. 论文

abstract：
最近用于人群计数的方法是基于卷积神经网络(CNN)的，它具有很强的提取局部特征的能力。但是，由于有限的接受域，CNN在建模全球背景方面天生就失败了。但是，transformer可以很容易地为全局上下文建模。在本文中，我们提出了一种称为CCTrans的简单方法来简化设计pipeline。具体来说，我们利用金字塔视觉transformer主干来捕获全局人群信息，即金字塔特征聚合(PFA)模型，以结合低级和高级特征,一个有效的回归头与多尺度扩张卷积（MDC）来预测密度图。此外，我们还为我们的管道定制了损失函数。在没有花哨功能的情况下，大量的实验表明，我们的方法在弱监督和完全监督的人群计数的几个基准上取得了新的最先进的结果。此外，我们目前在NWPU-Crowd的排行榜上排名为No. 1。
introduction：
从图像生成的人群计数中仍然存在两个挑战:
1）在相机附近的人群有更大的尺度和更低的密度；
2）在原理相机镜头附近人群有更小的尺度和更高的密度；
transformer的一大优点是能够捕获长上下文依赖关系并享受全局接受域。
本论文的贡献有4各方面：
1.To simplify the pipelines, we utilize the transformer to
construct a simple but high-performance crowd counting
model called CCTrans, which can extract semantic features with global context information.
构建了CCTrans，可以从全局上下文信息中提取语义特征。
2. We design an effective feature aggregation block and a simple regression head with multi-scale receptive fields. With two simple designed blocks, we can strengthen the captured features and get accurate regression results.
我们设计了一个有效的特征聚合块和一个简单的多尺度接受域回归头。通过两个简单的块设计，我们可以增强捕获的特征，得到准确的回归结果。
3. To further strengthen our method, we tailor the loss functions in both weakly and fully-supervised manners for our method. Specifically, we utilize a smooth weighted loss for the former and smooth L1 for the latter.
为了进一步加强我们的方法，我们以弱监督和全监督的方式为我们的方法定制损失函数。具体来说，我们对前者使用平滑加权损失，对后者使用平滑L1。
4. Extensive experiments across five popular benchmarks
including UCF CC 50, ShanghaiTech Part A and Part B,UCF QNRF, and NWPU-Crowd, show that our method
achieves new state-of-the-art results under both weakly and fully-supervised settings. Moreover, CCTrans ranks
No.1 on the leaderboard of NWPU-Crowd.
在很多数据集上表现的很好。

网络结构：

在这里插入图片描述
首先将输入图像变换成一维序列，然后将输出图像送入transformer-based骨干。我们采用了一个金字塔转换器(Chu et al. 2021a)，通过各种方式捕捉全局上下文，将采样阶段。每个阶段的输出被重构成二维特征图，用于金字塔特征聚合。最后,使用一个简单的多尺度接受域回归头对最终结果进行回归。我们支持两种方式监督。1）对于完全监督的方式，CCTrans回归密度图。2）对于弱监管方式，CCTrans总结道将预测密度图的所有像素值作为计数回归的人群数。
Transformer Backbone：
a pyramid transformer backbone Twins 的特点：本地和全局交替的关注方案，拥有本地和全聚德感受野，捕获长短期关系。

2. 构建代码

----20230717----待续
代码：https://github.com/wfs123456/CCTrans

(py39_torch1.10.1) [jianming_ge@localhost CCTrans-CrownCount]$ python test_image_patch.py
Traceback (most recent call last):
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/test_image_patch.py", line 6, in <module>
    from Networks import ALTGVT
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/Networks/ALTGVT.py", line 11, in <module>
    from timm.models.layers import DropPath, to_2tuple, trunc_normal_
ModuleNotFoundError: No module named 'timm'
(py39_torch1.10.1) [jianming_ge@localhost CCTrans-CrownCount]$
(py39_torch1.10.1) [jianming_ge@localhost CCTrans-CrownCount]$
(py39_torch1.10.1) [jianming_ge@localhost CCTrans-CrownCount]$ pip install timm
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple/, https://pypi.ngc.nvidia.com

报错：No module named ‘timm’
安装：pip install timm

继续运行：

(py39_torch1.10.1) [jianming_ge@localhost CCTrans-CrownCount]$ python test_image_patch.py

usage: test_image_patch.py [-h] [--device DEVICE] [--batch-size BATCH_SIZE]
                           [--crop-size CROP_SIZE] --model-path MODEL_PATH
                           [--data-path DATA_PATH] [--dataset DATASET]
                           [--pred-density-map-path PRED_DENSITY_MAP_PATH]
test_image_patch.py: error: the following arguments are required: --model-path

要下载权重文件，哪索性就看看 test_image_patch.py 都需要前置哪些东西？

parser = argparse.ArgumentParser(description='Test ')
parser.add_argument('--device', default='0', help='assign device')
parser.add_argument('--batch-size', type=int, default=8,
                        help='train batch size')
parser.add_argument('--crop-size', type=int, default=256,
                    help='the crop size of the train image')
parser.add_argument('--model-path', type=str, required=True,
                    help='saved model path')
parser.add_argument('--data-path', type=str,
                    help='dataset path')
parser.add_argument('--dataset', type=str, default='sha',
                    help='dataset name: qnrf, nwpu, sha, shb, custom')
parser.add_argument('--pred-density-map-path', type=str, default='inference_results',
                    help='save predicted density maps when pred-density-map-path is not empty.')

所以命令应该是:

python test_image_patch.py --device 0 --batch-size 8 --model-path xxxx   --data-path  xxxx --pred-density-map-path xxxx

在这里插入图片描述
权重在git仓库的这个位置下载，上载到项目目录的 weights/best_model_mae 54.36.pth
执行：

python test_image_patch.py --device 0 --batch-size 1 --model-path weights/best_model_mae\ 54.36.pth  --data-path example_images

报错：

(py39_torch1.10.1) [jianming_ge@localhost CCTrans-CrownCount]$ python test_image_patch.py --device 0 --batch-size 1 --model-path weights/best_model_mae\ 54.36.pth  --data-path example_images 
number of img [val]: 0
Traceback (most recent call last):
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/test_image_patch.py", line 127, in <module>
    test(args, isSave= True)
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/test_image_patch.py", line 60, in test
    model = ALTGVT.alt_gvt_large(pretrained=True)
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/Networks/ALTGVT.py", line 554, in alt_gvt_large
    model = ALTGVT(
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/Networks/ALTGVT.py", line 501, in __init__
    super(ALTGVT, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads,
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/Networks/ALTGVT.py", line 488, in __init__
    super(PCPVT, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads,
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/Networks/ALTGVT.py", line 422, in __init__
    super(CPVTV2, self).__init__(img_size, patch_size, in_chans, num_classes, embed_dims, num_heads, mlp_ratios,
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/Networks/ALTGVT.py", line 314, in __init__
    _block = nn.ModuleList([
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/Networks/ALTGVT.py", line 315, in <listcomp>
    block_cls(
  File "/home/jianming_ge/workplace/zhongwaiyun/CCTrans-CrownCount/Networks/ALTGVT.py", line 241, in __init__
    super(GroupBlock, self).__init__(dim, num_heads, mlp_ratio, qkv_bias, drop, attn_drop,
  File "/home/jianming_ge/miniconda3/envs/py39_torch1.10.1/lib/python3.9/site-packages/timm/models/vision_transformer.py", line 141, in __init__
    self.ls1 = LayerScale(dim, init_values=init_values) if init_values else nn.Identity()
  File "/home/jianming_ge/miniconda3/envs/py39_torch1.10.1/lib/python3.9/site-packages/timm/models/vision_transformer.py", line 107, in __init__
    self.gamma = nn.Parameter(init_values * torch.ones(dim))
TypeError: unsupported operand type(s) for *: 'type' and 'Tensor'

暂时搁置，这个代码的git仓库，是作者根据另外一个git仓库修改的，为的是复现论文。

3.另外一个git

git来源：https://github.com/cvlab-stonybrook/DM-Count
解压后运行，报错

(py39_torch1.10.1) [jianming_ge@localhost DM-Count]$ python demo.py 
Traceback (most recent call last):
  File "/home/jianming_ge/workplace/zhongwaiyun/DM-Count/demo.py", line 3, in <module>
    import gdown
ModuleNotFoundError: No module named 'gdown'
(py39_torch1.10.1) [jianming_ge@localhost DM-Count]$ pip install gdown

pip install gdown

继续报错，需要安装gradio
pip install gradio

再运行还报错：
在这里插入图片描述
这应该是去下载权重文件了，没有权重文件。
手动下载下。

import torch
from models import vgg19
import gdown
from PIL import Image
from torchvision import transforms
import gradio as gr
import cv2
import numpy as np
import scipy

model_path = "pretrained_models/model_qnrf.pth"
url = "https://drive.google.com/uc?id=1nnIHPaV9RGqK8JHL645zmRvkNrahD9ru"
gdown.download(url, model_path, quiet=False)

device = torch.device('cpu')  # device can be "cpu" or "gpu"

model = vgg19()
model.to(device)
model.load_state_dict(torch.load(model_path, device))
model.eval()


def predict(inp):
    inp = Image.fromarray(inp.astype('uint8'), 'RGB')
    inp = transforms.ToTensor()(inp).unsqueeze(0)
    inp = inp.to(device)
    with torch.set_grad_enabled(False):
        outputs, _ = model(inp)
    count = torch.sum(outputs).item()
    vis_img = outputs[0, 0].cpu().numpy()
    # normalize density map values from 0 to 1, then map it to 0-255.
    vis_img = (vis_img - vis_img.min()) / (vis_img.max() - vis_img.min() + 1e-5)
    vis_img = (vis_img * 255).astype(np.uint8)
    vis_img = cv2.applyColorMap(vis_img, cv2.COLORMAP_JET)
    vis_img = cv2.cvtColor(vis_img, cv2.COLOR_BGR2RGB)
    return vis_img, int(count)


title = "Distribution Matching for Crowd Counting"
desc = "A demo of DM-Count, a NeurIPS 2020 paper by Wang et al. Outperforms the state-of-the-art methods by a " \
       "large margin on four challenging crowd counting datasets: UCF-QNRF, NWPU, ShanghaiTech, and UCF-CC50. " \
       "This demo uses the QNRF trained model. Try it by uploading an image or clicking on an example " \
       "(could take up to 20s if running on CPU)."
examples = [
    ["example_images/3.png"],
    ["example_images/2.png"],
    ["example_images/1.png"],
]
inputs = gr.inputs.Image(label="Image of Crowd")
outputs = [gr.outputs.Image(label="Predicted Density Map"), gr.outputs.Label(label="Predicted Count")]
gr.Interface(fn=predict, inputs=inputs, outputs=outputs, title=title, description=desc, examples=examples,
             allow_flagging=False).launch()

demo.py 已经给地址了。
model_path = “pretrained_models/model_qnrf.pth”
url = “https://drive.google.com/uc?id=1nnIHPaV9RGqK8JHL645zmRvkNrahD9ru”

在这里插入图片描述
而且已经存在了，所以屏蔽这两行。

再次运行。
可以看到它去下载vgg的权重了。

cp pretrained_models/vgg19-dcbb9e9d.pth  /home/jianming_ge/.cache/torch/hub/checkpoints/

修改一下demo.py,

import torch
from models import vgg19
from PIL import Image
from torchvision import transforms
import cv2
import numpy as np
import scipy
import os
import time
model_path = "pretrained_models/model_nwpu.pth"
# url = "https://drive.google.com/uc?id=1nnIHPaV9RGqK8JHL645zmRvkNrahD9ru"
# gdown.download(url, model_path, quiet=False)

device = torch.device('cuda')  # device can be "cpu" or "gpu"

model = vgg19()
model.to(device)
model.load_state_dict(torch.load(model_path, device))
model.eval()


def predict(inp):
    inp = Image.fromarray(inp.astype('uint8'), 'RGB')
    inp = transforms.ToTensor()(inp).unsqueeze(0)
    inp = inp.to(device)
    with torch.set_grad_enabled(False):
        outputs, _ = model(inp)
    count = torch.sum(outputs).item()
    vis_img = outputs[0, 0].cpu().numpy()
    # normalize density map values from 0 to 1, then map it to 0-255.
    vis_img = (vis_img - vis_img.min()) / (vis_img.max() - vis_img.min() + 1e-5)
    vis_img = (vis_img * 255).astype(np.uint8)
    vis_img = cv2.applyColorMap(vis_img, cv2.COLORMAP_JET)
    vis_img = cv2.cvtColor(vis_img, cv2.COLOR_BGR2RGB)
    return vis_img, int(count)


root = "./example_images/"
examples = os.listdir(root)


for k, i in enumerate(examples):
    t0 = time.time()
    img = cv2.imread(root + i)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    vis_img, count = predict(img)
    cv2.imwrite("ni" + i, vis_img)
    print(i,"       ", count, time.time()-t0)

问题处理完成，
使用的样本数据
在这里插入图片描述

model:
model_sh_A.pth
1.png 167 0.20009064674377441
2.png 108 0.07396221160888672
3.png 973 0.06588959693908691
4.png 1 0.01804351806640625
5.png 5 0.036794185638427734
6.png 5 0.03268027305603027

model_sh_B.pth
1.png 28 0.20793676376342773
2.png 11 0.08083581924438477
3.png 59 0.06795501708984375
4.png 1 0.029388427734375
5.png 0 0.03298187255859375
6.png 4 0.03132462501525879

model_qnrf.pth
1.png 181 0.20715713500976562
2.png 451 0.08031249046325684
3.png 789 0.05513763427734375
4.png 2 0.031174182891845703
5.png 26 0.03781270980834961
6.png 7 0.0357813835144043

model_nwpu.pth
1.png 199 0.2031877040863037
2.png 314 0.06879138946533203
3.png 1321 0.054349422454833984
4.png 2 0.013224124908447266
5.png 36 0.025160789489746094
6.png 7 0.0295562744140625

在这里插入图片描述

model_nwpu.pth 和 model_qnrf.pth 的效果是不错的。
ref:https://zhuanlan.zhihu.com/p/417535737

weixin_40293999

关注

0
点赞
踩
10

收藏

觉得还不错? 一键收藏
1
评论
【人群密度估计】-论文CCTrans: Simplifying and Improving Crowd Counting with Transformer 以及工程落地记录

论文地址：https://arxiv.org/pdf/2109.14483v1.pdf代码：https://github.com/wfs123456/CCTrans该论文使用的Twins-large模型权重链接已上传百度云：链接：https://pan.baidu.com/s/1QSTA7P-81_uHD6wqzk93Og提取码：8wd4。
复制链接

扫一扫