PyTorch model compression: pruning, sparsity, quantization
Sparsity
Pruning individual weight elements is called element-wise pruning; it is also sometimes referred to as fine-grained or unstructured pruning.
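A minimal sketch of element-wise pruning using PyTorch's built-in `torch.nn.utils.prune` API (the layer here is a hypothetical example, not from the original text):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical small conv layer, used only for illustration.
conv = nn.Conv2d(3, 8, kernel_size=3)

# Element-wise (fine-grained) pruning: zero out the 30% of weights
# with the smallest L1 magnitude, regardless of their position.
prune.l1_unstructured(conv, name="weight", amount=0.3)

# The layer now holds weight_orig plus a binary weight_mask;
# conv.weight is their product, so ~30% of entries are exactly zero.
sparsity = float((conv.weight == 0).sum()) / conv.weight.numel()
print(f"sparsity: {sparsity:.2f}")
```

The mask stays attached to the module, so subsequent forward passes (and fine-tuning) see the sparse weights; `prune.remove(conv, "weight")` would make the zeros permanent.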
Filter Pruner
Transformation of consecutive convolution layers: pruning a filter removes an output channel of one conv, so the corresponding input channel of the next conv must be removed as well.
conv + bn + conv transformation: the BatchNorm parameters (gamma, beta, running mean/var) of the pruned channel must also be dropped before adjusting the following conv.
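Related to the conv + bn case, PyTorch can fold a BatchNorm into the preceding convolution so the pair becomes a single conv. A minimal sketch (module and attribute names are illustrative):

```python
import torch
import torch.nn as nn

# Minimal conv + bn pair; names "conv"/"bn" are only for this example.
class ConvBN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, bias=False)
        self.bn = nn.BatchNorm2d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

model = ConvBN().eval()  # conv-bn fusion requires eval mode
fused = torch.quantization.fuse_modules(model, [["conv", "bn"]])

# After fusion, the single conv reproduces the conv+bn output.
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(model(x), fused(x), atol=1e-5))
```

Fusion rewrites the conv weights as W' = W * gamma / sqrt(var + eps) and folds the BN shift into the conv bias, which is also why a pruned channel's BN statistics must be dropped together with its filter.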
Non-serial data dependencies: for example, the outputs of two convolutions are element-wise summed and fed into a third convolution. Channel pruning must then select the same channels in both branches, or the summation becomes invalid.
Channel Pruner
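Filter/channel pruning can be sketched with PyTorch's structured pruning API, which zeroes entire filters rather than individual weights (the layer below is again a hypothetical example):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical conv layer for illustration.
conv = nn.Conv2d(16, 32, kernel_size=3)

# Structured (filter) pruning: remove whole output filters at once.
# dim=0 indexes output channels; n=2 ranks filters by their L2 norm.
prune.ln_structured(conv, name="weight", amount=0.5, n=2, dim=0)

# Half of the 32 filters are now entirely zero.
zero_filters = int((conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum())
print(zero_filters)
```

Note this only masks the filters; physically shrinking the tensor (and the dependent BN and next-layer input channels described above) requires rebuilding the layers with the surviving channel indices.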
MobileNet quantization: experiment results
Image size: 27 x 33 pixels

model | size | top-1 accuracy | inference time
---|---|---|---
unquantized (FP32) | 9.2 MB | 99.1 | 7.46 ms (CPU) / 4.85 ms (GPU)
quantized | 3.1 MB | 98.9 | 1.75 ms (GPU)
half precision (FP16) | 4.7 MB | 99.1 | 5.28 ms (GPU)
Image size: 136 x 91 pixels

model | size | top-1 accuracy | inference time
---|---|---|---
unquantized (FP32) | 9.2 MB | 99.1 | 16 ms (CPU) / 4.83 ms (GPU)
quantized | 3.1 MB | 98.9 | 5.99 ms (GPU)
half precision (FP16) | 4.7 MB | 99.1 | 5.13 ms (GPU)
Image size: 620 x 827 pixels

model | size | top-1 accuracy | inference time
---|---|---|---
unquantized (FP32) | 9.2 MB | 99.1 | 480 ms (CPU) / 21.2 ms (GPU)
quantized | 3.1 MB | 98.9 | 123 ms (GPU)
half precision (FP16) | 4.7 MB | 99.1 | 17.6 ms (GPU)
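The tables above do not specify the exact quantization recipe, so the following is only a sketch of two common PyTorch options that produce similar size reductions: post-training dynamic quantization (~4x smaller for the quantized layers) and FP16 conversion (~2x smaller). The toy model here is hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for MobileNet's classifier head.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

# Option 1: post-training dynamic quantization -- weights stored as
# int8, activations quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Option 2: half precision -- cast all parameters to FP16.
half = model.float().half()

x = torch.randn(4, 64)
print(quantized(x).shape)             # forward pass still works
print(next(half.parameters()).dtype)  # torch.float16
```

Dynamic quantization targets weight-heavy layers such as Linear/LSTM; quantizing a full convolutional network like MobileNet would instead use static post-training quantization with calibration data.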