PytorchInsight
This is a PyTorch library with state-of-the-art architectures, pretrained models, and real-time updated results.
This repository aims to accelerate Deep Learning research by making results reproducible and research easier to conduct, all in PyTorch.
Included papers (to be updated):
Attention Models
SENet: Squeeze-and-Excitation Networks (see the sketch after this list)
SKNet: Selective Kernel Networks
CBAM: Convolutional Block Attention Module
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
BAM: Bottleneck Attention Module
SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks
SRMNet: SRM: A Style-based Recalibration Module for Convolutional Neural Networks
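For reference, here is a minimal sketch of the channel-attention idea behind SENet, which the other modules above extend along spatial or grouping dimensions. This is an illustrative re-implementation, not necessarily the exact code in this repo:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Minimal Squeeze-and-Excitation gate; reduction=16 is the paper's default.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # excitation: per-channel gates in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight the channels of the input feature map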
Non-Attention Models
OctNet: Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution
imagenet_tricks.py: Bag of Tricks for Image Classification with Convolutional Neural Networks
Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer
Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay
mixup: Beyond Empirical Risk Minimization (see the sketch after this list)
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
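As a pointer for the mixup entry above: training inputs are blended as x' = lam * x_i + (1 - lam) * x_j with lam ~ Beta(alpha, alpha), and the loss is interpolated the same way. A minimal sketch of the standard formulation (alpha=0.2 is a typical choice, not necessarily this repo's default):

import torch

def mixup_step(model, criterion, x, y, alpha=0.2):
    # One mixup training step: blend input pairs and interpolate the loss.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mixed = lam * x + (1 - lam) * x[perm]
    logits = model(x_mixed)
    return lam * criterion(logits, y) + (1 - lam) * criterion(logits, y[perm])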
Trained Models and Performance Table
Single-crop validation accuracy on ImageNet-1k (center 224x224 crop from an image resized so that the shorter side = 256).
Classification training settings for medium and large models
Details
RandomResizedCrop, RandomHorizontalFlip; initial lr 0.1, 100 epochs in total, lr decayed by 10x every 30 epochs; SGD with plain softmax cross-entropy loss, weight decay 1e-4, momentum 0.9; 8 GPUs, 32 images per GPU
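In PyTorch terms, this recipe corresponds roughly to the following sketch (model and data loop omitted; names are illustrative):

import torch

# SGD, lr 0.1 decayed by 10x every 30 epochs, weight decay 1e-4, momentum 0.9.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    # ... train one epoch with nn.CrossEntropyLoss ...
    scheduler.step()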
Examples
ResNet50
Note
The newest code adds one default operation: setting the weight decay of all biases to 0 (see the theoretical analysis in "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay", to appear). This slightly boosts training accuracy.
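A minimal sketch of that default, assuming the usual param-group approach (presumably what the --nowd-* flags in the commands below toggle; the helper name is illustrative):

import torch

def split_weight_decay(model, weight_decay=1e-4):
    # Biases (and other 1-D parameters such as BN affine weights) get wd = 0.
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        (no_decay if p.ndim <= 1 or name.endswith(".bias") else decay).append(p)
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]

optimizer = torch.optim.SGD(split_weight_decay(model), lr=0.1, momentum=0.9)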
Classification training settings for mobile/small models
Details
RandomResizedCrop, RandomHorizontalFlip; initial lr 0.4, 300 epochs in total, 5 linear warm-up epochs, cosine lr decay; SGD with softmax cross-entropy loss and label smoothing 0.1, weight decay 4e-5 on conv weights and 0 on all other weights, momentum 0.9; 8 GPUs, 128 images per GPU
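A sketch of this schedule (linear warm-up then cosine decay) and loss, assuming PyTorch >= 1.10 for the label_smoothing argument; conv-only weight decay would use a param-group split like the helper above:

import math
import torch

criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.4, momentum=0.9, weight_decay=4e-5)

def lr_at(epoch, base_lr=0.4, warmup=5, total=300):
    # Linear warm-up for the first `warmup` epochs, then cosine decay to zero.
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    t = (epoch - warmup) / (total - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))

for epoch in range(300):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(epoch)
    # ... train one epoch ...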
Examples
ShuffleNetV2
Typical Training & Testing Tips:
Small Models
ShuffleNetV2_1x
python -m torch.distributed.launch --nproc_per_node=8 imagenet_mobile.py --cos -a shufflenetv2_1x --data /path/to/imagenet1k/ \
--epochs 300 --wd 4e-5 --gamma 0.1 -c checkpoints/imagenet/shufflenetv2_1x --train-batch 128 --opt-level O0 --nowd-bn # Training
python -m torch.distributed.launch --nproc_per_node=2 imagenet_mobile.py -a shufflenetv2_1x --data /path/to/imagenet1k/ \
-e --resume ../pretrain/shufflenetv2_1x.pth.tar --test-batch 100 --opt-level O0 # Testing, ~69.6% top-1 Acc
Large Models
SGE-ResNet
python -W ignore imagenet.py -a sge_resnet101 --data /path/to/imagenet1k/ --epochs 100 --schedule 30 60 90 \
--gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --gpu-id 0,1,2,3,4,5,6,7 # Training
python -m torch.distributed.launch --nproc_per_node=8 imagenet_fast.py -a sge_resnet101 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --train-batch 32 \
--opt-level O0 --wd-all --label-smoothing 0. --warmup 0 # Training (faster)
python -W ignore imagenet.py -a sge_resnet101 --data /path/to/imagenet1k/ --gpu-id 0,1 -e \
--resume ../pretrain/sge_resnet101.pth.tar # Testing ~78.8% top-1 Acc
python -m torch.distributed.launch --nproc_per_node=2 imagenet_fast.py -a sge_resnet101 --data /path/to/imagenet1k/ -e --resume \
../pretrain/sge_resnet101.pth.tar --test-batch 100 --opt-level O0 # Testing (faster) ~78.8% top-1 Acc
WS-ResNet with e-shifted L2 regularizer, e = 1e-3
python -m torch.distributed.launch --nproc_per_node=8 imagenet_fast.py -a ws_resnet50 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/es1e-3_ws_resnet50 --train-batch 32 \
--opt-level O0 --label-smoothing 0. --warmup 0 --nowd-conv --mineps 1e-3 --el2
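Here WS presumably stands for Weight Standardization (Qiao et al., 2019), which standardizes each conv filter before the convolution; the e-shifted L2 term itself lives in the training script (--el2, --mineps) and is not reproduced here. A minimal sketch of the WS convolution, not necessarily this repo's exact code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class WSConv2d(nn.Conv2d):
    # Conv2d whose weight is standardized per output filter (zero mean, unit std).
    def forward(self, x):
        w = self.weight
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        std = w.std(dim=(1, 2, 3), keepdim=True) + 1e-5  # eps for stability
        return F.conv2d(x, (w - mean) / std, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)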
Results of "SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks"
Note the following results (old) do not set the bias wd = 0 for large models
Classification

| Model | #P | GFLOPs | Top-1 Acc | Top-5 Acc | Download1 | Download2 | log |
|-------|----|--------|-----------|-----------|-----------|-----------|-----|
Detection

| Model | #P | GFLOPs | Detector | Neck | AP50:95 (%) | AP50 (%) | AP75 (%) | Download |
|-------|----|--------|----------|------|-------------|----------|----------|----------|
| ResNet50 | 23.51M | 88.0 | Faster RCNN | FPN | 37.5 | 59.1 | 40.6 | |
| SGE-ResNet50 | 23.51M | 88.1 | Faster RCNN | FPN | 38.7 | 60.8 | 41.7 | |
| ResNet50 | 23.51M | 88.0 | Mask RCNN | FPN | 38.6 | 60.0 | 41.9 | |
| SGE-ResNet50 | 23.51M | 88.1 | Mask RCNN | FPN | 39.6 | 61.5 | 42.9 | |
| ResNet50 | 23.51M | 88.0 | Cascade RCNN | FPN | 41.1 | 59.3 | 44.8 | |
| SGE-ResNet50 | 23.51M | 88.1 | Cascade RCNN | FPN | 42.6 | 61.4 | 46.2 | |
| ResNet101 | 42.50M | 167.9 | Faster RCNN | FPN | 39.4 | 60.7 | 43.0 | |
| SE-ResNet101 | 47.28M | 168.3 | Faster RCNN | FPN | 40.4 | 61.9 | 44.2 | |
| SGE-ResNet101 | 42.50M | 168.1 | Faster RCNN | FPN | 41.0 | 63.0 | 44.3 | |
| ResNet101 | 42.50M | 167.9 | Mask RCNN | FPN | 40.4 | 61.6 | 44.2 | |
| SE-ResNet101 | 47.28M | 168.3 | Mask RCNN | FPN | 41.5 | 63.0 | 45.3 | |
| SGE-ResNet101 | 42.50M | 168.1 | Mask RCNN | FPN | 42.1 | 63.7 | 46.1 | |
| ResNet101 | 42.50M | 167.9 | Cascade RCNN | FPN | 42.6 | 60.9 | 46.4 | |
| SE-ResNet101 | 47.28M | 168.3 | Cascade RCNN | FPN | 43.4 | 62.2 | 47.2 | |
| SGE-ResNet101 | 42.50M | 168.1 | Cascade RCNN | FPN | 44.4 | 63.2 | 48.4 | |
Results of "Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer"
Note that the following models are trained with bias weight decay = 0.
Classification

| Model | Top-1 | Download |
|-------|-------|----------|
| WS-ResNet50 | 76.74 | |
| WS-ResNet50 (e = 1e-3) | 76.86 | |
| WS-ResNet101 | 78.07 | |
| WS-ResNet101 (e = 1e-6) | 78.29 | |
| WS-ResNeXt50 (e = 1e-3) | 77.88 | |
| WS-ResNeXt101 (e = 1e-3) | 78.80 | |
| WS-DenseNet201 (e = 1e-8) | 77.59 | |
| WS-ShuffleNetV1 (e = 1e-8) | 68.09 | |
| WS-ShuffleNetV2 (e = 1e-8) | 69.70 | |
| WS-MobileNetV1 (e = 1e-6) | 73.60 | |
Results of "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay"
To appear
Citation
If you find our related works useful in your research, please consider citing the papers:
@inproceedings{li2019selective,
  title={Selective Kernel Networks},
  author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Yang, Jian},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  year={2019}
}

@article{li2019spatial,
  title={Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks},
  author={Li, Xiang and Hu, Xiaolin and Xia, Yan and Yang, Jian},
  journal={arXiv preprint arXiv:1905.09646},
  year={2019}
}

@article{li2019understanding,
  title={Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer},
  author={Li, Xiang and Chen, Shuo and Yang, Jian},
  journal={arXiv preprint arXiv:},
  year={2019}
}

@article{li2019generalization,
  title={Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay},
  author={Li, Xiang and Chen, Shuo and Gong, Chen and Xia, Yan and Yang, Jian},
  journal={arXiv preprint arXiv:},
  year={2019}
}