A serious problem remains: because labeled data is scarce, many methods suffer from over-fitting to varying degrees.
**Contributions:**
- We are the first to develop a data collector and labeler for crowd counting, which can automatically collect and annotate images without any labor costs. Using them, we create the first large-scale, synthetic and diverse crowd counting dataset.
- We present a pretraining scheme to improve the original method's performance on real data, which reduces estimation errors more effectively than random initialization or an ImageNet-pretrained model. Further, with this strategy, our proposed SFCN achieves state-of-the-art results.
- We are the first to propose a crowd counting method via domain adaptation, which does not use any labels of the real data. With our SE Cycle GAN, the domain gap between the synthetic and real data can be significantly reduced. Finally, the proposed method outperforms the two baselines.
**GCC dataset:**
The full name of GCC is GTA5 Crowd Counting. It has four highlights:
- free collection and annotation
- larger data volume and higher resolution
- more diversified scenes
- more accurate annotations
The process of generating an image for training:
a) select a location and set up the cameras
b) segment the region of interest (ROI) for the crowd
c) set the weather and time.
Place persons:
a) create persons in the ROI and get the head positions
b) obtain the person mask from the stencil
c) integrate multiple images into one image
d) remove the positions of occluded heads.
How to use GCC?
- Random splitting: the dataset is randomly split into a training set and a testing set.
- Cross-camera splitting: for each location, one surveillance camera is randomly selected for testing and the others are used for training.
- Cross-location splitting: we randomly choose 75/25 locations for training/testing.
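The three splitting strategies above can be sketched in a few lines of Python. This is only an illustration, not the dataset's official tooling: the record layout `(image_id, location_id, camera_id)`, the function names, and the 0.75 training ratio for the random split are assumptions made for the example (the notes give 75/25 only for the cross-location split).

```python
import random
from collections import defaultdict

# Hypothetical record layout: (image_id, location_id, camera_id).

def random_split(images, train_ratio=0.75, seed=0):
    """Shuffle all images and cut once; train_ratio is an assumed value."""
    rng = random.Random(seed)
    shuffled = list(images)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def cross_camera_split(images, seed=0):
    """For each location, hold out one randomly chosen camera for testing."""
    rng = random.Random(seed)
    cams_by_loc = defaultdict(set)
    for _, loc, cam in images:
        cams_by_loc[loc].add(cam)
    test_cam = {loc: rng.choice(sorted(cams))
                for loc, cams in sorted(cams_by_loc.items())}
    train = [im for im in images if im[2] != test_cam[im[1]]]
    test = [im for im in images if im[2] == test_cam[im[1]]]
    return train, test

def cross_location_split(images, train_ratio=0.75, seed=0):
    """Split by location, so no test location is ever seen in training."""
    rng = random.Random(seed)
    locations = sorted({loc for _, loc, _ in images})
    rng.shuffle(locations)
    cut = int(len(locations) * train_ratio)
    train_locs = set(locations[:cut])
    train = [im for im in images if im[1] in train_locs]
    test = [im for im in images if im[1] not in train_locs]
    return train, test

# Toy data: 4 locations, 4 cameras per location, 2 images per camera.
demo_images = [(f"img_{l}_{c}_{k}", l, c)
               for l in range(4) for c in range(4) for k in range(2)]
for split in (random_split, cross_camera_split, cross_location_split):
    train, test = split(demo_images)
    print(split.__name__, len(train), len(test))
```

The cross-camera and cross-location splits are stricter than the random one, because the test images come from viewpoints or scenes that the model has not seen during training.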
This table shows the advantage of using GCC to pretrain the model:
**Generating the density map:**
There are two ways to estimate the density map:
1. Supervised crowd counting: pretrain the model on GCC, then fine-tune it on the real dataset.
2. Crowd counting via domain adaptation: learn a mapping between the synthetic domain S and the real-world domain R, then train the SFCN only on GCC.
The relationship between them is shown below:
Supervised crowd counting:
We design a spatial FCN (SFCN) to produce the density map, which adopts VGG-16 or ResNet-101 as the backbone. For the ResNet-101 backbone, we change the stride of conv4_x to 1, so that conv4_x outputs feature maps at 1/8 the size of the input image. The spatial encoder applies a sequence of convolutions along four directions (down, up, left-to-right and right-to-left). After the spatial encoder, a regression layer is added, which directly outputs the density map at 1/8 of the input size.
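To make the four-direction idea concrete, here is a minimal pure-Python sketch of a directional pass over a single-channel feature map. It is a simplification under stated assumptions, not the SFCN implementation: the real encoder works on multi-channel feature maps with learned kernels, and the update rule used here (each slice adds the ReLU of a 1-D convolution of the previously updated slice) is an assumed message-passing form. All function names are hypothetical.

```python
def relu(v):
    return v if v > 0.0 else 0.0

def conv1d_same(row, kernel):
    """1-D cross-correlation with zero padding; output has the input's length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(row):
                acc += w * row[j]
        out.append(acc)
    return out

def pass_down(fmap, kernel):
    """Top-to-bottom pass: each row receives a message from the updated row above."""
    out = [list(row) for row in fmap]
    for r in range(1, len(out)):
        msg = conv1d_same(out[r - 1], kernel)
        out[r] = [x + relu(m) for x, m in zip(out[r], msg)]
    return out

def flip_rows(fmap):
    return fmap[::-1]

def transpose(fmap):
    return [list(col) for col in zip(*fmap)]

def spatial_encoder(fmap, kernel):
    """Apply the directional pass in the order down, up, left-to-right, right-to-left."""
    x = pass_down(fmap, kernel)                                           # down
    x = flip_rows(pass_down(flip_rows(x), kernel))                        # up
    x = transpose(pass_down(transpose(x), kernel))                        # left-to-right
    x = transpose(flip_rows(pass_down(flip_rows(transpose(x)), kernel)))  # right-to-left
    return x

# A single activation in the centre of a 3x3 map, with a fixed (not learned) kernel.
fmap = [[0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0]]
encoded = spatial_encoder(fmap, kernel=[0.0, 1.0, 0.0])
for row in encoded:
    print(row)
```

After the four passes, the single central activation has reached every cell of the map, which is the point of the design: each output location can draw on context from the whole image rather than from a local window only.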
Crowd Counting via Domain Adaptation:
The proposed domain-adaptation method learns specific patterns or features from the synthetic data and transfers them to the real world.
Specifically, we present an SSIM Embedding (SE) Cycle GAN to transform synthetic images into photo-realistic images. We then train an SFCN on the translated data, without fine-tuning on the real dataset. This yields a satisfactory result:
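As a side note on the SSIM part of the SE Cycle GAN, the structural similarity index itself can be sketched as follows. This is a simplified version under stated assumptions: it computes SSIM over one global window, whereas the usual definition averages over local (often Gaussian-weighted) windows. The notes do not say how SSIM enters the Cycle GAN objective, so the `se_loss` term `1 - SSIM` is an assumed, hypothetical form rather than the paper's exact loss.

```python
def ssim_global(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale images (lists of rows)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("images must be non-empty and of equal size")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    vx = sum((a - mx) * (a - mx) for a in xs) / n
    vy = sum((b - my) * (b - my) for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    # Standard stabilising constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))

def se_loss(x, y, dynamic_range=255.0):
    """Assumed loss form: zero for identical images, larger as structure diverges."""
    return 1.0 - ssim_global(x, y, dynamic_range)

original = [[0.0, 255.0], [255.0, 0.0]]
inverted = [[255.0, 0.0], [0.0, 255.0]]
print(se_loss(original, original))
print(se_loss(original, inverted))
```

A term like this penalizes differences in local structure between two images, which a plain pixel-wise difference does not capture as directly.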