While reading the DeepLab paper, I noticed that it first introduces the PASCAL VOC 2012 dataset, and then says that an augmented dataset is used for training. The paper puts it this way:
The proposed models are evaluated on the PASCAL VOC 2012 semantic segmentation benchmark [1] which contains 20 foreground object classes and one background class. The original dataset contains 1,464 (train), 1,449 (val), and 1,456 (test) pixel-level annotated images. We augment the dataset by the extra annotations provided by [76], resulting in 10,582 (trainaug) training images. The performance is measured in terms of pixel intersection-over-union averaged across the 21 classes (mIOU).
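The mIOU metric mentioned at the end of the quote is easy to state concretely: per-class IoU is TP / (TP + FP + FN) computed over all pixels, then averaged across the 21 classes. A minimal sketch (the `mean_iou` helper and the 3-class toy confusion matrix are illustrative, not from the paper):

```python
import numpy as np

def mean_iou(conf):
    """mIoU from a pixel confusion matrix.

    Per-class IoU = TP / (TP + FP + FN), averaged over all classes
    (21 for PASCAL VOC 2012: 20 object classes + background).
    """
    tp = np.diag(conf).astype(float)       # true positives per class
    fp = conf.sum(axis=0) - tp             # predicted as class c but wrong
    fn = conf.sum(axis=1) - tp             # class-c pixels predicted as other
    iou = tp / (tp + fp + fn)
    return iou.mean()

# Toy 3-class confusion matrix (rows: ground truth, cols: prediction)
conf = np.array([[5, 1, 0],
                 [1, 4, 1],
                 [0, 0, 3]])
print(round(mean_iou(conf), 4))  # -> 0.6786
```

Note that the per-class IoU is accumulated over the pixels of the whole evaluation set, not averaged per image.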
Next, let's look at the difference between this original dataset and the augmented dataset.
1. PASCAL VOC 2012 segmentation
The official VOC 2012 release states this clearly: 1,464 (train), 1,449 (val), and 1,456 (test).
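The jump from 1,464 to 10,582 training images comes from merging the VOC train list with the extra annotations of [76] while keeping the VOC val images out, so the augmented set never overlaps the evaluation split. A sketch of that construction, using toy IDs in place of real image names like `2007_000032` (the `build_trainaug` helper is hypothetical; in practice the ID lists come from `VOCdevkit/VOC2012/ImageSets/Segmentation/*.txt` and the extra dataset's own split files):

```python
def build_trainaug(voc_train, extra_ids, voc_val):
    """Union the VOC train IDs with the extra annotated IDs,
    then drop anything that appears in VOC val to avoid
    train/val leakage. Returns a sorted, de-duplicated list."""
    return sorted((set(voc_train) | set(extra_ids)) - set(voc_val))

# Toy IDs standing in for real VOC image names
voc_train = ["a", "b", "c"]
voc_val   = ["d", "e"]
extra_ids = ["b", "d", "f", "g"]  # partially overlaps both splits

trainaug = build_trainaug(voc_train, extra_ids, voc_val)
print(trainaug)  # -> ['a', 'b', 'c', 'f', 'g']
```

With the real ID lists the same union-minus-val recipe yields the 10,582 trainaug images quoted above.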
The detailed breakdown is as follows: