Checking performance bottlenecks in PyTorch

While reading a book, I found out that PyTorch actually ships with its own bottleneck-analysis tool: torch.utils.bottleneck
Usage:

python -m torch.utils.bottleneck /path/to/script_under_test.py [args]
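According to the PyTorch documentation, torch.utils.bottleneck summarizes a run of your script with both the Python profiler (cProfile) and PyTorch's autograd profiler. The cProfile part can be approximated with the standard library alone. The sketch below is not bottleneck's actual source: `workload` is a hypothetical stand-in for the script under test, and `profile_top` is a hypothetical helper that sorts by `tottime` and keeps 15 rows, which is what "Ordered by: internal time" and "restriction <15>" in the sample output correspond to.

```python
import cProfile
import copy
import io
import pstats


def workload():
    """Hypothetical stand-in for the script under test: repeatedly deep-copies nested lists."""
    data = [[float(i * j) for j in range(100)] for i in range(100)]
    for _ in range(3):
        data = copy.deepcopy(data)
    return data


def profile_top(func, limit=15):
    """Run func under cProfile and return the top `limit` rows sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # "tottime" is printed by pstats as "Ordered by: internal time"
    stats.sort_stats("tottime").print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_top(workload))
```

Sorting by internal time (`tottime`) ranks functions by the time spent in their own bodies, which is why the cheap but extremely frequent calls (`deepcopy`, `dict.get`, `id`) float to the top of the real report.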

Sample output:

--------------------------------------------------------------------------------
  Environment Summary
--------------------------------------------------------------------------------
PyTorch 1.7.1+cu101 DEBUG compiled w/ CUDA 10.1
Running with Python 3.8 and CUDA 10.1.243

`pip3 list` truncated output:
numpy==1.19.2
torch==1.7.1+cu101
torch-cluster==1.5.9
torch-geometric==1.6.3
torch-scatter==2.0.6
torch-sparse==0.6.9
torch-spline-conv==1.2.1
torchaudio==0.7.2
torchsummary==1.5.1
torchvision==0.8.2+cu101
--------------------------------------------------------------------------------
  cProfile output
--------------------------------------------------------------------------------
         705723930 function calls (591115631 primitive calls) in 252.079 seconds

   Ordered by: internal time
   List reduced from 960 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
114583910/48   99.432    0.000  197.735    4.119 /usr/lib/python3.8/copy.py:128(deepcopy)
  9648/48   40.553    0.004  197.735    4.119 /usr/lib/python3.8/copy.py:200(_deepcopy_list)
229172740   25.338    0.000   25.338    0.000 {method 'get' of 'dict' objects}
       24   16.446    0.685   16.446    0.685 {method 'run_backward' of 'torch._C._EngineBase' objects}
114605951   11.720    0.000   11.720    0.000 {built-in method builtins.id}
114574262   11.361    0.000   11.361    0.000 /usr/lib/python3.8/copy.py:182(_deepcopy_atomic)
117258341    9.575    0.000    9.575    0.000 {method 'append' of 'list' objects}
      576    5.353    0.009    5.353    0.009 {built-in method equal}
  1658851    4.531    0.000    4.531    0.000 {method 'uniform' of 'numpy.random.mtrand.RandomState' objects}
      100    4.132    0.041    4.132    0.041 {built-in method __new__ of type object at 0x908780}
  1005442    3.727    0.000    3.727    0.000 {method 'randint' of 'numpy.random.mtrand.RandomState' objects}
  2080353    2.508    0.000    2.508    0.000 {built-in method builtins.min}
     1918    1.864    0.001    2.046    0.001 /home3/jmh/scnet/TransformerFlow/data_builder.py:292(augment_expoential)
        1    1.658    1.658    2.691    2.691 {built-in method _pickle.load}
     9600    1.564    0.000    1.940    0.000 /home3/jmh/scnet/TransformerFlow/data_builder.py:132(refine_packet_length)


--------------------------------------------------------------------------------
  autograd profiler output (CPU mode)
--------------------------------------------------------------------------------
        top 15 events sorted by cpu_time_total

----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                              Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                 EmbeddingBackward         6.68%     664.353ms         6.68%     664.353ms     664.353ms       0.000us           NaN       0.000us       0.000us             1  
          aten::embedding_backward         6.68%     664.345ms         6.68%     664.345ms     664.345ms       0.000us           NaN       0.000us       0.000us             1  
    aten::embedding_dense_backward         6.68%     664.340ms         6.68%     664.340ms     664.340ms       0.000us           NaN       0.000us       0.000us             1  
                 EmbeddingBackward         6.67%     663.371ms         6.67%     663.371ms     663.371ms       0.000us           NaN       0.000us       0.000us             1  
          aten::embedding_backward         6.67%     663.367ms         6.67%     663.367ms     663.367ms       0.000us           NaN       0.000us       0.000us             1  
    aten::embedding_dense_backward         6.67%     663.363ms         6.67%     663.363ms     663.363ms       0.000us           NaN       0.000us       0.000us             1  
                 EmbeddingBackward         6.66%     663.205ms         6.66%     663.205ms     663.205ms       0.000us           NaN       0.000us       0.000us             1  
          aten::embedding_backward         6.66%     663.200ms         6.66%     663.200ms     663.200ms       0.000us           NaN       0.000us       0.000us             1  
                 EmbeddingBackward         6.66%     663.198ms         6.66%     663.198ms     663.198ms       0.000us           NaN       0.000us       0.000us             1  
    aten::embedding_dense_backward         6.66%     663.196ms         6.66%     663.196ms     663.196ms       0.000us           NaN       0.000us       0.000us             1  
          aten::embedding_backward         6.66%     663.192ms         6.66%     663.192ms     663.192ms       0.000us           NaN       0.000us       0.000us             1  
    aten::embedding_dense_backward         6.66%     663.189ms         6.66%     663.189ms     663.189ms       0.000us           NaN       0.000us       0.000us             1  
                 EmbeddingBackward         6.66%     663.103ms         6.66%     663.103ms     663.103ms       0.000us           NaN       0.000us       0.000us             1  
          aten::embedding_backward         6.66%     663.098ms         6.66%     663.098ms     663.098ms       0.000us           NaN       0.000us       0.000us             1  
    aten::embedding_dense_backward         6.66%     663.095ms         6.66%     663.095ms     663.095ms       0.000us           NaN       0.000us       0.000us             1  
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 9.952s
CUDA time total: 0.000us

--------------------------------------------------------------------------------
  autograd profiler output (CUDA mode)
--------------------------------------------------------------------------------
        top 15 events sorted by cpu_time_total

	Because the autograd profiler uses the CUDA event API,
	the CUDA time column reports approximately max(cuda_time, cpu_time).
	Please ignore this output if your code does not use CUDA.

----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                              Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls  
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                 EmbeddingBackward         6.71%     161.735ms         6.71%     161.735ms     161.735ms     511.000us         6.63%     511.000us     511.000us             1  
          aten::embedding_backward         6.71%     161.720ms         6.71%     161.720ms     161.720ms     508.500us         6.60%     508.500us     508.500us             1  
    aten::embedding_dense_backward         6.71%     161.708ms         6.71%     161.708ms     161.708ms     503.500us         6.53%     503.500us     503.500us             1  
                 EmbeddingBackward         6.67%     160.718ms         6.67%     160.718ms     160.718ms     550.000us         7.14%     550.000us     550.000us             1  
                 EmbeddingBackward         6.67%     160.708ms         6.67%     160.708ms     160.708ms     496.000us         6.44%     496.000us     496.000us             1  
                 EmbeddingBackward         6.67%     160.677ms         6.67%     160.677ms     160.677ms     496.000us         6.44%     496.000us     496.000us             1  
                 EmbeddingBackward         6.66%     160.661ms         6.66%     160.661ms     160.661ms     528.000us         6.85%     528.000us     528.000us             1  
                 EmbeddingBackward         6.66%     160.504ms         6.66%     160.504ms     160.504ms     494.000us         6.41%     494.000us     494.000us             1  
                 EmbeddingBackward         6.66%     160.453ms         6.66%     160.453ms     160.453ms     552.000us         7.16%     552.000us     552.000us             1  
                 EmbeddingBackward         6.66%     160.446ms         6.66%     160.446ms     160.446ms     496.000us         6.44%     496.000us     496.000us             1  
                 EmbeddingBackward         6.65%     160.314ms         6.65%     160.314ms     160.314ms     488.000us         6.33%     488.000us     488.000us             1  
                 EmbeddingBackward         6.65%     160.290ms         6.65%     160.290ms     160.290ms     512.000us         6.65%     512.000us     512.000us             1  
                 EmbeddingBackward         6.65%     160.238ms         6.65%     160.238ms     160.238ms     576.000us         7.48%     576.000us     576.000us             1  
                 EmbeddingBackward         6.65%     160.237ms         6.65%     160.237ms     160.237ms     480.000us         6.23%     480.000us     480.000us             1  
                 EmbeddingBackward         6.64%     160.183ms         6.64%     160.183ms     160.183ms     514.000us         6.67%     514.000us     514.000us             1  
----------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
Self CPU time total: 2.411s
CUDA time total: 7.705ms


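The report contains an environment summary, Python-level cProfile statistics, and autograd profiler results in both CPU mode and CUDA mode, so it covers both CPU and GPU execution time.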
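The two autograd sections come from PyTorch's autograd profiler. If you want a similar table for just one training step, a minimal CPU-only sketch follows. The toy model, its sizes, and the helper `profile_embedding_step` are hypothetical, chosen only because the sample report is dominated by `EmbeddingBackward`. Note that `key_averages()` groups events by name, whereas the bottleneck report lists 15 individual events (hence `# of Calls` is 1 on every row).

```python
import torch
import torch.nn as nn


def profile_embedding_step(vocab_size=10000, dim=64, batch=32, seq_len=50, row_limit=15):
    """Profile one forward/backward pass of a hypothetical embedding model on the CPU."""
    # Toy model and sizes are illustrative assumptions, not the original author's script
    model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, 1))
    tokens = torch.randint(0, vocab_size, (batch, seq_len))

    with torch.autograd.profiler.profile() as prof:
        loss = model(tokens).sum()
        loss.backward()

    # Aggregate events by operator name and sort by total CPU time, like the report
    return prof.key_averages().table(sort_by="cpu_time_total", row_limit=row_limit)


if __name__ == "__main__":
    print(profile_embedding_step())
```

In recent PyTorch releases `torch.autograd.profiler.profile` is still available, although newer code often uses the `torch.profiler` module instead.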
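From this report I could easily see that deepcopy is currently a bottleneck in my program: in the cProfile section, copy.deepcopy has a cumulative time of 197.7 s out of the 252.1 s total run time.
Having identified this, I went on to optimize the deep copies.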
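The original post does not say how the deep copies were optimized, so what follows is only one possible approach, based on an inference from the report. In the deepcopy row, `114583910/48` means 114,583,910 total calls but only 48 top-level (primitive) calls; the rest are recursive. That total equals 9,648 `_deepcopy_list` calls plus 114,574,262 `_deepcopy_atomic` calls, which suggests the copied data consists only of lists and immutable scalars. If that assumption holds, copying just the list containers gives an equally independent copy with far fewer Python-level calls. `copy_nested` is a hypothetical helper that assumes exactly two levels of nesting.

```python
import copy
import timeit


def copy_nested(rows):
    """Copy a list of lists whose leaves are immutable (int, float, str, ...).

    Only the list containers are duplicated; the immutable leaves are shared,
    which is safe because they cannot be modified in place.
    Assumes exactly two levels of nesting.
    """
    return [list(row) for row in rows]


if __name__ == "__main__":
    # Hypothetical data: nested lists of scalars, the shape the profile suggests
    data = [[float(i * j) for j in range(1000)] for i in range(100)]

    fast = copy_nested(data)
    slow = copy.deepcopy(data)
    assert fast == slow and fast is not data and fast[0] is not data[0]

    print("deepcopy:   ", timeit.timeit(lambda: copy.deepcopy(data), number=5))
    print("copy_nested:", timeit.timeit(lambda: copy_nested(data), number=5))
```

Often the best fix is to avoid copying at all, for example by not mutating the source data; `copy_nested` only helps when a real, independent copy is still required.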
