Why network compression: some devices, such as wearables, have limited resources (memory and compute), so we need to compress these networks to fit on such devices.
Q: Why is a larger network easier to optimize?
A plausible explanation (the lottery ticket hypothesis) is that a large network contains many small sub-networks, and every group of initial parameters is a lottery ticket. If you use a small network, you hold few tickets and the chance of drawing a winning one is small; if you use a large network, you hold many more tickets, so the probability that optimization finds a good result is larger.
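A minimal PyTorch sketch of the lottery-ticket experiment, under assumptions not in the notes: the tiny model, random data, and 80% pruning ratio are made up for illustration. The procedure is: train the large network, prune the small-magnitude weights, then rewind the surviving weights to their original initial values (the "winning ticket" is the sub-network plus its initialization).

```python
import copy
import torch
import torch.nn as nn

# Hypothetical tiny model and data, just for illustration.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
init_state = copy.deepcopy(model.state_dict())  # remember the "tickets"

x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# 1) Train the large network normally.
for _ in range(100):
    opt.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()

# 2) Prune: keep only the largest-magnitude weights (here, the top 20%).
masks = {}
for name, p in model.named_parameters():
    if p.dim() > 1:  # prune weight matrices, not biases
        threshold = p.abs().flatten().kthvalue(int(0.8 * p.numel())).values
        masks[name] = (p.abs() > threshold).float()

# 3) Rewind the surviving weights to their ORIGINAL initial values.
with torch.no_grad():
    for name, p in model.named_parameters():
        if name in masks:
            p.copy_(init_state[name] * masks[name])

# (The experiment then retrains this sub-network,
#  re-applying the masks after each update.)
```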
Another hypothesis: a small network can also be trained to good results directly, which is diametrically opposed to the lottery ticket hypothesis.
GPUs only speed up regular matrix computation; an irregular (sparse) structure is hard to compute efficiently. So in practice we keep the original architecture and just set the pruned weights to zero. But then the actual network is still the same size, so this kind of weight pruning yields no GPU speed-up.
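A small sketch of why that is; the 512x512 size and the magnitude threshold of 0.5 are arbitrary stand-ins for a real layer and a real pruning criterion. Zeroing entries does not shrink the weight tensor, so the dense matmul the GPU executes costs exactly the same.

```python
import torch

W = torch.randn(512, 512)          # dense weight matrix of one layer
mask = (W.abs() > 0.5).float()     # hypothetical pruning criterion
W_pruned = W * mask                # "pruned" weights are just set to zero

x = torch.randn(64, 512)
y = x @ W_pruned.t()               # still a full dense 512x512 matmul:
                                   # the GPU multiplies the zeros too

print(W_pruned.shape)                         # torch.Size([512, 512]) -- unchanged
print((W_pruned == 0).float().mean().item())  # fraction zeroed: sparsity,
                                              # but no structural saving
```

To actually get a speed-up, the pruning must be structured (e.g. removing whole neurons or channels) so the weight matrices genuinely shrink.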