For most quantized networks, the quantized activations of each hidden layer are bounded to a very limited range [dorefa, wage, TRAINED TERNARY QUANTIZATION…], which implies that the range of their mean and variance is also very limited, as shown in the figure. This makes it possible to estimate the mean and variance with high precision from only a subset of the activations. We propose a simple but effective sampling method, 'diagonal-scatter sampling' (DSS), to estimate the mean and variance of a batch of data; it reduces the computational workload to 1/8 or 1/16 without significant accuracy loss. Here we consider only convolutional networks.
DSS requires a diagonal matrix of size K and a scatter rate S to build a sample convolution kernel. If S is 0, the sample kernel is a diagonal matrix with 1s on the diagonal and 0s elsewhere. If S is not 0, the number of 0s on the diagonal of the sample kernel is S times the number of 1s. DSS is illustrated in Figure 2.
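A minimal sketch of this kernel construction and the resulting statistics estimate, in NumPy. The function names are illustrative, not from the paper; for S > 0 we assume the 1s are spaced evenly along the diagonal with S zeros between consecutive 1s (so zeros outnumber ones by a factor of S), and that the feature-map size is divisible by K:

```python
import numpy as np

def dss_kernel(k, s):
    """Build a K x K sample convolution kernel for diagonal-scatter sampling.

    With scatter rate s, a 1 is placed on the diagonal every (s + 1) steps,
    so the diagonal carries s times as many 0s as 1s (assumed spacing).
    """
    kernel = np.zeros((k, k))
    idx = np.arange(0, k, s + 1)
    kernel[idx, idx] = 1.0
    return kernel

def dss_mean_var(x, k, s):
    """Estimate per-channel mean/variance of x with shape (N, C, H, W)
    using only the positions selected by the DSS kernel tiled over
    each feature map (H and W assumed divisible by k)."""
    mask = np.tile(dss_kernel(k, s),
                   (x.shape[2] // k, x.shape[3] // k)).astype(bool)
    sampled = x[:, :, mask]                 # shape (N, C, num_sampled)
    return sampled.mean(axis=(0, 2)), sampled.var(axis=(0, 2))
```

With K = 4 and S = 0 the mask keeps only the 4 diagonal positions of each 4x4 window, i.e. 1/4 of the pixels; larger S thins the sample further.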
We test our method on CIFAR-10. The results show that with a proper sampling scheme, the computational effort can be reduced without any significant accuracy loss. We found that one of the best trade-offs between accuracy and sample rate is to set kernel_size = image_size and scatter_rate = 0.
The test results are shown in Figure 3.
According to [norm matters], compared with L2 batch normalization, L1 batch normalization not only reduces the computational effort but also makes batch normalization more suitable for low-precision training. We