I went through the CSRNet code closely and skimmed the paper, and noticed a few details.
The image.py script handles data augmentation:
    import random
    from PIL import Image
    import numpy as np
    import h5py
    import cv2

    def load_data(img_path, train=True):
        gt_path = img_path.replace('.jpg', '.h5').replace('images', 'ground_truth')
        img = Image.open(img_path).convert('RGB')
        gt_file = h5py.File(gt_path, 'r')
        target = np.asarray(gt_file['density'])
        # data augmentation while training
        # if False:  # excuse me ???
        if train:
            crop_size = (img.size[0] // 2, img.size[1] // 2)
            # if random.randint(0, 9) <= -1:  # what ??? maybe <= 4
            if random.randint(0, 9) <= 4:
                # one of the four non-overlapping quarters, with a certain probability
                dx = random.randint(0, 1) * img.size[0] // 2
                dy = random.randint(0, 1) * img.size[1] // 2
            else:
                # otherwise crop a half-height, half-width window at a random offset
                dx = int(random.random() * img.size[0] / 2)
                dy = int(random.random() * img.size[1] / 2)
            img = img.crop((dx, dy, crop_size[0] + dx, crop_size[1] + dy))
            target = target[dy:crop_size[1] + dy, dx:crop_size[0] + dx]
            if random.random() > 0.8:  # horizontal flip with probability 0.2
                target = np.fliplr(target)
                img = img.transpose(Image.FLIP_LEFT_RIGHT)
        # downsample GT to 1/8 resolution and rescale by 64 to preserve the count
        target = cv2.resize(target,
                            (target.shape[1] // 8, target.shape[0] // 8),
                            interpolation=cv2.INTER_CUBIC) * 64
        return img, target
The code itself has some problems. For example, the training branch originally read `if False`, which disables augmentation entirely.
Moreover, the first branch under training mode, `if random.randint(0, 9) <= -1`, obviously can never execute. The intent was presumably to, with some probability, split the image into four equal quarters and take one of them, and otherwise take a random crop.
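The intended branching can be sketched in isolation as follows (a hypothetical reconstruction; the helper name `crop_offsets` is illustrative, not from the original code):

```python
# Illustrative sketch of the presumably intended crop logic:
# with probability 0.5, snap the offset to one of the four
# non-overlapping quarters; otherwise pick a random half-size window.
import random

def crop_offsets(w, h):
    if random.randint(0, 9) <= 4:
        # quarter crop: offset is 0 or exactly half the image size
        dx = random.randint(0, 1) * w // 2
        dy = random.randint(0, 1) * h // 2
    else:
        # random crop: offset anywhere in [0, w/2) x [0, h/2)
        dx = int(random.random() * w / 2)
        dy = int(random.random() * h / 2)
    return dx, dy

dx, dy = crop_offsets(1024, 768)
print(dx, dy)  # a valid top-left corner for a 512x384 crop
```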
One detail is a real weakness of the paper: after the model's three max-pooling stages the resolution drops to 1/8, so when computing the loss against it, the ground-truth density map must also first be resized to 1/8 and then multiplied by 64 as a whole, so that the integral around each head still sums to roughly 1; load_data in image.py reflects this.
As a result, both the output and the GT suffer severe precision loss, which is why follow-up papers take an FPN-style approach: at the very least upsample, so that a high-resolution output density map is compared against a high-quality GT in the loss.
(At first I worried that some pixel might end up exceeding 1, but a rough estimate shows this generally does not happen. Suppose a 600×600 image contains 2500 people; the average head spacing is then 12 pixels, so σ = 12 × 0.3 = 3.6. Plugging into the peak of a 2D Gaussian: 1/(2πσ²) ≈ 0.012, and 64 × 0.012 ≈ 0.768, which is roughly the upper bound.)
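The back-of-envelope estimate above is easy to reproduce in code (using the unrounded Gaussian peak, the product comes out slightly higher, around 0.79, still safely below 1):

```python
# Reproducing the estimate: 2500 heads on a 600x600 image.
import math

people, side = 2500, 600
spacing = math.sqrt(side * side / people)  # average head spacing: 12 px
sigma = 0.3 * spacing                      # geometry-adaptive sigma: 3.6
peak = 1 / (2 * math.pi * sigma ** 2)      # 2D Gaussian peak: ~0.0123
print(spacing, sigma, peak, 64 * peak)     # 64 * peak stays below 1
```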