I went through the CSRNet code closely and skimmed the paper, and noticed a few details.
The image.py script handles data augmentation:
    import random
    from PIL import Image
    import numpy as np
    import h5py
    import cv2

    def load_data(img_path, train=True):
        gt_path = img_path.replace('.jpg', '.h5').replace('images', 'ground_truth')
        img = Image.open(img_path).convert('RGB')
        gt_file = h5py.File(gt_path, 'r')
        target = np.asarray(gt_file['density'])
        # data augmentation while training
        # if False:  # excuse me ???
        if train:
            crop_size = (img.size[0] // 2, img.size[1] // 2)
            # if random.randint(0, 9) <= -1:  # what ??? maybe <= 4
            if random.randint(0, 9) <= 4:
                # one of the four non-overlapping quarters, with a certain probability
                dx = random.randint(0, 1) * img.size[0] // 2
                dy = random.randint(0, 1) * img.size[1] // 2
            else:
                # otherwise crop a half-height, half-width window at a random offset
                dx = int(random.random() * img.size[0] / 2)
                dy = int(random.random() * img.size[1] / 2)
            img = img.crop((dx, dy, crop_size[0] + dx, crop_size[1] + dy))
            target = target[dy:crop_size[1] + dy, dx:crop_size[0] + dx]
            if random.random() > 0.8:  # horizontal flip with probability 0.2
                target = np.fliplr(target)
                img = img.transpose(Image.FLIP_LEFT_RIGHT)
        # downsample GT to 1/8 resolution and rescale by 64 to preserve the count
        target = cv2.resize(target,
                            (target.shape[1] // 8, target.shape[0] // 8),
                            interpolation=cv2.INTER_CUBIC) * 64
        return img, target
The code itself has some problems. For example, the training branch originally read `if False`, which disables augmentation entirely.
Moreover, the first branch under training mode, `if random.randint(0, 9) <= -1`, obviously can never execute. The intent was presumably to, with some probability, split the image into four equal quarters and take one of them, and otherwise take a random crop.
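The intended branching can be sketched in isolation as follows (a hypothetical reconstruction; the helper name `crop_offsets` is illustrative, not from the original code):

```python
# Illustrative sketch of the presumably intended crop logic:
# with probability 0.5, snap the offset to one of the four
# non-overlapping quarters; otherwise pick a random half-size window.
import random

def crop_offsets(w, h):
    if random.randint(0, 9) <= 4:
        # quarter crop: offset is 0 or exactly half the image size
        dx = random.randint(0, 1) * w // 2
        dy = random.randint(0, 1) * h // 2
    else:
        # random crop: offset anywhere in [0, w/2) x [0, h/2)
        dx = int(random.random() * w / 2)
        dy = int(random.random() * h / 2)
    return dx, dy

dx, dy = crop_offsets(1024, 768)
print(dx, dy)  # a valid top-left corner for a 512x384 crop
```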
One detail is a real weakness of the paper: after the model's three max-pooling stages the resolution drops to 1/8, so when computing the loss against it, the ground-truth density map must also first be resized to 1/8 and then multiplied by 64 as a whole, so that the integral around each head still sums to roughly 1; load_data in image.py reflects this.
As a result, both the output and the GT suffer severe precision loss, which is why follow-up papers take an FPN-style approach: at the very least upsample, so that a high-resolution output density map is compared against a high-quality GT in the loss.
(At first I worried that some pixel might end up exceeding 1, but a rough estimate shows this generally does not happen. Suppose a 600×600 image contains 2500 people; the average head spacing is then 12 pixels, so σ = 12 × 0.3 = 3.6. Plugging into the peak of a 2D Gaussian: 1/(2πσ²) ≈ 0.012, and 64 × 0.012 ≈ 0.768, which is roughly the upper bound.)
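The back-of-envelope estimate above is easy to reproduce in code (using the unrounded Gaussian peak, the product comes out slightly higher, around 0.79, still safely below 1):

```python
# Reproducing the estimate: 2500 heads on a 600x600 image.
import math

people, side = 2500, 600
spacing = math.sqrt(side * side / people)  # average head spacing: 12 px
sigma = 0.3 * spacing                      # geometry-adaptive sigma: 3.6
peak = 1 / (2 * math.pi * sigma ** 2)      # 2D Gaussian peak: ~0.0123
print(spacing, sigma, peak, 64 * peak)     # 64 * peak stays below 1
```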