Test notes on operator optimization with composer

Latest recommended article published on 2023-03-06 15:36:40

songyuc

Tags: composer, deep learning, python

Article link: https://blog.csdn.net/songyuc/article/details/126609073


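We tested the ChannelsLast optimization on an RTX 2080 Ti. It gave no speedup at all; the run time actually increased. We also asked the MosaicML developers about this on Slack.
Test log:
After the first test, which ran on Colab, they replied: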

Daya Khudia: The gpu used with colab is Tesla K80 (torch.cuda.get_device_name(0)) and channels_last is helpful on GPUs with tensor cores (i.e., Volta or newer).

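In other words, apply_channels_last() only helps on GPUs with tensor cores, that is, the Volta architecture or newer.
The second test ran on the RTX 2080 Ti, and the reply was: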
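The rule of thumb from that reply can be checked before enabling the optimization. The sketch below is only a heuristic and is not part of Composer: the helper names `has_tensor_cores` and `current_device_capability` are made up for illustration. It assumes Volta corresponds to CUDA compute capability 7.0 (the Tesla K80 reports 3.7, the RTX 2080 Ti 7.5). The capability number is only an approximation, because some Turing cards without tensor cores also report 7.5.

```python
from typing import Optional, Tuple


def has_tensor_cores(capability: Tuple[int, int]) -> bool:
    """Heuristic: treat compute capability 7.0 (Volta) or newer as tensor-core capable.

    This is an approximation; a few cards at 7.5 ship without tensor cores.
    """
    major, _minor = capability
    return major >= 7


def current_device_capability() -> Optional[Tuple[int, int]]:
    """Return (major, minor) of GPU 0, or None when no CUDA device is visible."""
    import torch  # imported lazily so has_tensor_cores() works without PyTorch

    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)
```

For example, `has_tensor_cores((3, 7))` is False for the Colab K80, while `has_tensor_cores((7, 5))` is True for the RTX 2080 Ti. As the second reply shows, passing this check does not guarantee a speedup for a given model.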

Daya Khudia: I tried this on NVIDIA GeForce RTX 3080 and it seems channels_last is slower for this model. This model uses mostly depthwise and pointwise (1x1) convolutions and channels_last doesn’t always perform better than non-channels last.
This model takes 22 secs for the train part without channels_last and 35 secs with channels_last.
One of the individual 1x1 conv takes .17 ms without channels_last and .40 ms with channels_last so definitely 1x1 convs are slower with channels_last.

In short, channels_last gives no speedup at all for shufflenet_v2_x1_0.
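A comparison like the one quoted above can be reproduced with a small timing script. This is a minimal sketch rather than the original test code: it uses plain PyTorch memory formats instead of Composer's apply_channels_last(), and the function names, batch size, image size, and iteration counts are all assumptions chosen for illustration. It requires torch and torchvision when `compare_memory_formats()` is called.

```python
import time
from typing import Callable, Optional


def mean_time_ms(fn: Callable[[], object], warmup: int = 3, iters: int = 10,
                 sync: Optional[Callable[[], None]] = None) -> float:
    """Average wall-clock time of fn() in milliseconds.

    sync is called before each clock read so that pending asynchronous GPU
    work is finished; pass torch.cuda.synchronize when timing CUDA code.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) * 1000.0 / iters


def compare_memory_formats(batch_size: int = 32, image_size: int = 224) -> dict:
    """Time one forward+backward step of shufflenet_v2_x1_0 in both memory formats."""
    import torch
    import torchvision

    device = "cuda" if torch.cuda.is_available() else "cpu"
    sync = torch.cuda.synchronize if device == "cuda" else None
    results = {}
    for name, fmt in (("contiguous", torch.contiguous_format),
                      ("channels_last", torch.channels_last)):
        # Fresh, randomly initialised model for each memory format.
        model = torchvision.models.shufflenet_v2_x1_0().to(device)
        model = model.to(memory_format=fmt)
        x = torch.randn(batch_size, 3, image_size, image_size, device=device)
        x = x.to(memory_format=fmt)

        def step():
            model.zero_grad(set_to_none=True)
            model(x).sum().backward()

        results[name] = mean_time_ms(step, sync=sync)
    return results
```

Calling `compare_memory_formats()` returns a dict with the mean step time in milliseconds for each format. The numbers depend on the GPU, driver, and library versions, so they should be measured rather than assumed.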