We tested the ChannelsLast optimization on an RTX_2080_Ti. It gave no speedup at all; in fact, training time increased. We also asked the MosaicML developers about this on Slack.
Test log:
For the first test, run on Colab, they replied:
Daya Khudia: The gpu used with colab is Tesla K80 (torch.cuda.get_device_name(0)) and channels_last is helpful on GPUs with tensor cores (i.e., Volta or newer).
In other words, apply_channels_last() only has an effect on GPUs with a Volta or newer architecture, i.e. GPUs with tensor cores.
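As a quick check before enabling the optimization, you can query the GPU's compute capability: Volta corresponds to compute capability 7.0, while Colab's Tesla K80 is 3.7. A minimal sketch:

```python
import torch

# Minimal sketch: channels_last is only expected to help on GPUs with
# tensor cores, i.e. compute capability 7.0 (Volta) or newer.
# Colab's Tesla K80 is compute capability 3.7, so it does not qualify.
name = torch.cuda.get_device_name(0)
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {name}, compute capability: {major}.{minor}")
if (major, minor) >= (7, 0):
    print("Tensor cores available: channels_last may help.")
else:
    print("No tensor cores: channels_last is unlikely to help.")
```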
For the second test, run on the RTX_2080_Ti, the reply was:
Daya Khudia: I tried this on NVIDIA GeForce RTX 3080 and it seems channels_last is slower for this model. This model uses mostly depthwise and pointwise (1x1) convolutions and channels_last doesn’t always perform better than non-channels last.
This model takes 22 secs for the train part without channels_last and 35 secs with channels_last.
One of the individual 1x1 conv takes .17 ms without channels_last and .40 ms with channels_last so definitely 1x1 convs are slower with channels_last.
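This pattern can be reproduced with a micro-benchmark of a single pointwise (1x1) convolution in both memory formats. The sketch below uses illustrative shapes (not the exact layer from shufflenet_v2_x1_0) and CUDA events for accurate GPU timing:

```python
import torch
import torch.nn as nn

def time_conv(memory_format):
    # A pointwise (1x1) convolution; 116 channels is typical of
    # shufflenet_v2_x1_0's middle stages, but the shape is an assumption.
    conv = nn.Conv2d(116, 116, kernel_size=1).cuda().to(memory_format=memory_format)
    x = torch.randn(32, 116, 28, 28, device="cuda").to(memory_format=memory_format)
    # Warm up so cuDNN autotuning and lazy init don't distort the timing.
    for _ in range(10):
        conv(x)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        conv(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / 100  # average ms per call

print(f"contiguous:    {time_conv(torch.contiguous_format):.3f} ms")
print(f"channels_last: {time_conv(torch.channels_last):.3f} ms")
```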
In short, channels_last provides no speedup at all for shufflenet_v2_x1_0.
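For reference, here is a rough sketch of the end-to-end comparison using plain PyTorch. Composer's apply_channels_last() amounts roughly to converting the model to the channels_last memory format; converting the inputs as well is our assumption about how the comparison was run, and the batch size, input shape, and iteration count are illustrative:

```python
import time
import torch
from torchvision.models import shufflenet_v2_x1_0

def time_train_step(channels_last):
    model = shufflenet_v2_x1_0().cuda()
    x = torch.randn(64, 3, 224, 224, device="cuda")
    if channels_last:
        # Plain-PyTorch equivalent of the channels_last optimization.
        model = model.to(memory_format=torch.channels_last)
        x = x.to(memory_format=torch.channels_last)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(20):
        opt.zero_grad()
        model(x).sum().backward()  # dummy loss, enough to exercise conv kernels
        opt.step()
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / 20  # average seconds per step

print(f"default:       {time_train_step(False):.4f} s/step")
print(f"channels_last: {time_train_step(True):.4f} s/step")
```

On the RTX_2080_Ti this kind of comparison showed the channels_last variant running no faster, consistent with the Slack discussion above.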