Network Slimming For YOLOv2/v3
Paper: Learning Efficient Convolutional Networks through Network Slimming
GitHub (official open-source code): https://github.com/liuzhuang13/slimming
Other implementations:
- https://github.com/talebolano/yolov3-network-slimming
- https://github.com/Lam1360/YOLOv3-model-pruning
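- and others (only YOLOv2/v3 experiments are listed here; many general-purpose implementations also exist)
This experiment follows the first of the implementations listed above and covers only the full YOLOv2 model (Darknet-19 backbone). YOLOv3 results will be added after further experiments, along with study notes on the Network Slimming channel-pruning algorithm.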
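As background, the Network Slimming paper trains with an L1 penalty on the batch-normalization scaling factors (gamma) and then prunes the channels whose |gamma| falls below a global threshold. The following is a minimal sketch of those two steps in plain Python. The function names, the `>=` comparison, and the rule of keeping at least one channel per layer are illustrative assumptions and are not taken from any of the repositories above.

```python
def l1_subgradient(gammas, sparsity_weight):
    """Extra gradient from the penalty sparsity_weight * sum(|gamma|)."""
    def sign(g):
        return (g > 0) - (g < 0)
    return [sparsity_weight * sign(g) for g in gammas]


def select_channels(gammas_per_layer, prune_ratio):
    """Return, per BN layer, the indices of the channels to keep."""
    # Pool |gamma| from all layers and pick one global threshold.
    all_abs = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    cut = int(len(all_abs) * prune_ratio)
    threshold = all_abs[cut] if cut < len(all_abs) else float("inf")

    kept = []
    for layer in gammas_per_layer:
        idx = [i for i, g in enumerate(layer) if abs(g) >= threshold]
        if not idx:
            # Assumption: never remove a whole layer; keep its largest gamma.
            idx = [max(range(len(layer)), key=lambda i: abs(layer[i]))]
        kept.append(idx)
    return kept


# Toy example with two BN layers and a 50% global prune ratio.
toy_gammas = [[0.9, 0.01, 0.5], [0.02, 0.03, 0.8]]
print(select_channels(toy_gammas, 0.5))
```

Because the threshold is global, layers are pruned unevenly, which is consistent with the irregular filter counts in the slimmed network tables in this document.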
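I. YOLOv2 Tests
Test dataset: experiments were run on my own pedestrian and vehicle dataset. A demo is still in progress. The network structure, model size, and performance differences are listed below.
1. Model Structure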
Before slimming
layer filters size input output
0 conv 32 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 32
1 max 2 x 2 / 2 608 x 608 x 32 -> 304 x 304 x 32
2 conv 64 3 x 3 / 1 304 x 304 x 32 -> 304 x 304 x 64
3 max 2 x 2 / 2 304 x 304 x 64 -> 152 x 152 x 64
4 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
5 conv 64 1 x 1 / 1 152 x 152 x 128 -> 152 x 152 x 64
6 conv 128 3 x 3 / 1 152 x 152 x 64 -> 152 x 152 x 128
7 max 2 x 2 / 2 152 x 152 x 128 -> 76 x 76 x 128
8 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
9 conv 128 1 x 1 / 1 76 x 76 x 256 -> 76 x 76 x 128
10 conv 256 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 256
11 max 2 x 2 / 2 76 x 76 x 256 -> 38 x 38 x 256
12 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
13 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
14 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
15 conv 256 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 256
16 conv 512 3 x 3 / 1 38 x 38 x 256 -> 38 x 38 x 512
17 max 2 x 2 / 2 38 x 38 x 512 -> 19 x 19 x 512
18 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
19 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
20 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
21 conv 512 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 512
22 conv 1024 3 x 3 / 1 19 x 19 x 512 -> 19 x 19 x1024
23 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
24 conv 1024 3 x 3 / 1 19 x 19 x1024 -> 19 x 19 x1024
25 route 16
26 conv 64 1 x 1 / 1 38 x 38 x 512 -> 38 x 38 x 64
27 reorg / 2 38 x 38 x 64 -> 19 x 19 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 19 x 19 x1280 -> 19 x 19 x1024
30 conv 35 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 35
31 detection
After slimming
layer filters size input output
0 conv 28 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 28
1 max 2 x 2 / 2 608 x 608 x 28 -> 304 x 304 x 28
2 conv 61 3 x 3 / 1 304 x 304 x 28 -> 304 x 304 x 61
3 max 2 x 2 / 2 304 x 304 x 61 -> 152 x 152 x 61
4 conv 117 3 x 3 / 1 152 x 152 x 61 -> 152 x 152 x 117
5 conv 61 1 x 1 / 1 152 x 152 x 117 -> 152 x 152 x 61
6 conv 127 3 x 3 / 1 152 x 152 x 61 -> 152 x 152 x 127
7 max 2 x 2 / 2 152 x 152 x 127 -> 76 x 76 x 127
8 conv 223 3 x 3 / 1 76 x 76 x 127 -> 76 x 76 x 223
9 conv 128 1 x 1 / 1 76 x 76 x 223 -> 76 x 76 x 128
10 conv 241 3 x 3 / 1 76 x 76 x 128 -> 76 x 76 x 241
11 max 2 x 2 / 2 76 x 76 x 241 -> 38 x 38 x 241
12 conv 351 3 x 3 / 1 38 x 38 x 241 -> 38 x 38 x 351
13 conv 211 1 x 1 / 1 38 x 38 x 351 -> 38 x 38 x 211
14 conv 208 3 x 3 / 1 38 x 38 x 211 -> 38 x 38 x 208
15 conv 248 1 x 1 / 1 38 x 38 x 208 -> 38 x 38 x 248
16 conv 476 3 x 3 / 1 38 x 38 x 248 -> 38 x 38 x 476
17 max 2 x 2 / 2 38 x 38 x 476 -> 19 x 19 x 476
18 conv 394 3 x 3 / 1 19 x 19 x 476 -> 19 x 19 x 394
19 conv 479 1 x 1 / 1 19 x 19 x 394 -> 19 x 19 x 479
20 conv 448 3 x 3 / 1 19 x 19 x 479 -> 19 x 19 x 448
21 conv 499 1 x 1 / 1 19 x 19 x 448 -> 19 x 19 x 499
22 conv 847 3 x 3 / 1 19 x 19 x 499 -> 19 x 19 x 847
23 conv 672 3 x 3 / 1 19 x 19 x 847 -> 19 x 19 x 672
24 conv 778 3 x 3 / 1 19 x 19 x 672 -> 19 x 19 x 778
25 route 16
26 conv 13 1 x 1 / 1 38 x 38 x 476 -> 38 x 38 x 13
27 reorg / 2 38 x 38 x 13 -> 19 x 19 x 52
28 route 27 24
29 conv 625 3 x 3 / 1 19 x 19 x 830 -> 19 x 19 x 625
30 conv 35 1 x 1 / 1 19 x 19 x 625 -> 19 x 19 x 35
31 detection
After second slimming
layer filters size input output
0 conv 21 3 x 3 / 1 608 x 608 x 3 -> 608 x 608 x 21
1 max 2 x 2 / 2 608 x 608 x 21 -> 304 x 304 x 21
2 conv 51 3 x 3 / 1 304 x 304 x 21 -> 304 x 304 x 51
3 max 2 x 2 / 2 304 x 304 x 51 -> 152 x 152 x 51
4 conv 107 3 x 3 / 1 152 x 152 x 51 -> 152 x 152 x 107
5 conv 60 1 x 1 / 1 152 x 152 x 107 -> 152 x 152 x 60
6 conv 119 3 x 3 / 1 152 x 152 x 60 -> 152 x 152 x 119
7 max 2 x 2 / 2 152 x 152 x 119 -> 76 x 76 x 119
8 conv 215 3 x 3 / 1 76 x 76 x 119 -> 76 x 76 x 215
9 conv 126 1 x 1 / 1 76 x 76 x 215 -> 76 x 76 x 126
10 conv 228 3 x 3 / 1 76 x 76 x 126 -> 76 x 76 x 228
11 max 2 x 2 / 2 76 x 76 x 228 -> 38 x 38 x 228
12 conv 291 3 x 3 / 1 38 x 38 x 228 -> 38 x 38 x 291
13 conv 193 1 x 1 / 1 38 x 38 x 291 -> 38 x 38 x 193
14 conv 188 3 x 3 / 1 38 x 38 x 193 -> 38 x 38 x 188
15 conv 233 1 x 1 / 1 38 x 38 x 188 -> 38 x 38 x 233
16 conv 449 3 x 3 / 1 38 x 38 x 233 -> 38 x 38 x 449
17 max 2 x 2 / 2 38 x 38 x 449 -> 19 x 19 x 449
18 conv 356 3 x 3 / 1 19 x 19 x 449 -> 19 x 19 x 356
19 conv 472 1 x 1 / 1 19 x 19 x 356 -> 19 x 19 x 472
20 conv 423 3 x 3 / 1 19 x 19 x 472 -> 19 x 19 x 423
21 conv 499 1 x 1 / 1 19 x 19 x 423 -> 19 x 19 x 499
22 conv 361 3 x 3 / 1 19 x 19 x 499 -> 19 x 19 x 361
23 conv 285 3 x 3 / 1 19 x 19 x 361 -> 19 x 19 x 285
24 conv 259 3 x 3 / 1 19 x 19 x 285 -> 19 x 19 x 259
25 route 16
26 conv 11 1 x 1 / 1 38 x 38 x 449 -> 38 x 38 x 11
27 reorg / 2 38 x 38 x 11 -> 19 x 19 x 44
28 route 27 24
29 conv 117 3 x 3 / 1 19 x 19 x 303 -> 19 x 19 x 117
30 conv 35 1 x 1 / 1 19 x 19 x 117 -> 19 x 19 x 35
31 detection
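To relate the layer tables above to model size, the convolution weights can be counted directly from the printed rows. This is a rough sketch, assuming the single-space row layout shown in this document. The regular expression and the function name `conv_weight_count` are hypothetical helpers, and the count ignores biases and batch-norm parameters.

```python
import re

# Matches a darknet "conv" row such as:
#   29 conv 1024 3 x 3 / 1 19 x 19 x1280 -> 19 x 19 x1024
CONV_ROW = re.compile(
    r"^\s*\d+\s+conv\s+(\d+)\s+(\d+)\s*x\s*(\d+)\s*/\s*\d+"
    r"\s+\d+\s*x\s*\d+\s*x\s*(\d+)\s*->"
)


def conv_weight_count(table_text):
    """Sum k_h * k_w * c_in * c_out over every conv row in the table."""
    total = 0
    for line in table_text.splitlines():
        m = CONV_ROW.match(line)
        if m:
            filters, k_h, k_w, c_in = map(int, m.groups())
            total += k_h * k_w * c_in * filters
    return total


# A few rows copied from the "Before slimming" table.
sample = """\
28 route 27 24
29 conv 1024 3 x 3 / 1 19 x 19 x1280 -> 19 x 19 x1024
30 conv 35 1 x 1 / 1 19 x 19 x1024 -> 19 x 19 x 35
31 detection
"""
weights = conv_weight_count(sample)
print(weights, "weights,", weights * 4 / 2**20, "MiB as float32")
```

Applied to the full "Before slimming" table, this estimate comes to roughly 50.5 million weights, or about 193 MiB at 4 bytes per weight, which is in line with the 192MB file size reported for the original model.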
II. Performance Comparison
yolov2-darknet19 --original
class_id = 0, name = person, ap = 89.36 %
class_id = 1, name = car, ap = 90.27 %
for thresh = 0.25, precision = 0.90, recall = 0.93, F1-score = 0.91
for thresh = 0.25, average IoU = 73.99 %
mean average precision (mAP) = 0.898142, or 89.81 %
yolov2-darknet19 --network slimming
class_id = 0, name = person, ap = 90.15 %
class_id = 1, name = car, ap = 90.35 %
for thresh = 0.25, precision = 0.90, recall = 0.95, F1-score = 0.92
for thresh = 0.25, average IoU = 76.74 %
mean average precision (mAP) = 0.902466, or 90.25 %
Accuracy actually improved slightly after the first pruning pass :)
yolov2-darknet19 --network slimming second
class_id = 0, name = person, ap = 80.00 %
class_id = 1, name = car, ap = 88.22 %
for thresh = 0.25, precision = 0.78, recall = 0.89, F1-score = 0.83
for thresh = 0.25, average IoU = 61.19 %
mean average precision (mAP) = 0.841116, or 84.11 %
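The reported F1-score and mAP values can be sanity-checked from the per-class AP, precision, and recall figures above. The snippet below is a small sketch using the standard definitions (F1 as the harmonic mean of precision and recall, mAP as the mean of per-class AP). Small differences from the logged values are expected, since the log prints rounded inputs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def mean_ap(per_class_ap):
    """Mean of the per-class average precision values."""
    return sum(per_class_ap) / len(per_class_ap)


# Values copied from the three evaluation logs above.
runs = {
    "original": {"ap": [89.36, 90.27], "precision": 0.90, "recall": 0.93},
    "first slimming": {"ap": [90.15, 90.35], "precision": 0.90, "recall": 0.95},
    "second slimming": {"ap": [80.00, 88.22], "precision": 0.78, "recall": 0.89},
}

for name, r in runs.items():
    f1 = f1_score(r["precision"], r["recall"])
    print(f"{name}: mAP = {mean_ap(r['ap']):.3f} %, F1 = {f1:.3f}")
```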
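III. Model Size Comparison
Original model: 192 MB
After the first pruning pass: 97 MB (compression ratio: about 1.98x)
After the second pruning pass: 37.3 MB (compression ratio: about 5.15x)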
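The compression ratios above are simply the original file size divided by the pruned file size. A short sketch of that arithmetic follows; the function name `compression_ratio` is only illustrative.

```python
def compression_ratio(original_mb, pruned_mb):
    """Original size divided by pruned size."""
    return original_mb / pruned_mb


ORIGINAL_MB = 192.0
pruned_sizes_mb = {"first pruning": 97.0, "second pruning": 37.3}

for name, size in pruned_sizes_mb.items():
    ratio = compression_ratio(ORIGINAL_MB, size)
    saved_percent = 100 * (1 - size / ORIGINAL_MB)
    print(f"{name}: {ratio:.2f}x smaller, {saved_percent:.1f}% of the file removed")
```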
IV. Inference Time Comparison (GTX 1050 Ti)
Input dimensions: 640 x 480; loop count: 100
Original
MAX: 86.39 ms
MIN: 69.282 ms
AVE: 71.5727 ms
First Network Slimming
MAX: 64.965 ms
MIN: 53.079 ms
AVE: 55.2884 ms
Second Network Slimming
MAX: 49.047 ms
MIN: 41.019 ms
AVE: 42.8398 ms
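To make the timings easier to compare, the average latencies can be converted into speedup factors and frames per second. This is a small sketch of that arithmetic using the AVE values above; the helper names are illustrative.

```python
def speedup(baseline_ms, new_ms):
    """How many times faster the new model is than the baseline."""
    return baseline_ms / new_ms


def fps(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


# Average latencies (ms) copied from the measurements above.
avg_ms = {
    "original": 71.5727,
    "first slimming": 55.2884,
    "second slimming": 42.8398,
}

baseline = avg_ms["original"]
for name, ms in avg_ms.items():
    print(f"{name}: {fps(ms):.1f} FPS, {speedup(baseline, ms):.2f}x vs original")
```

The speedup is smaller than the file-size reduction, which is plausible because latency also depends on feature-map resolution and fixed per-layer overhead, not only on the number of weights.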