torch profiler performance analysis | Bugs related to versions and settings

Galaxy_Husky

Category: pytorch  Tags: pytorch

Article link: https://blog.csdn.net/qq_32044697/article/details/129137136

torch profiler

Version issues
Parameter settings
Other errors

These are notes on problems I ran into while profiling with torch profiler, and how I resolved them.

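Version issues

The profiler tutorial on the PyTorch website mentions that, in DDP mode, a Distributed View is available for analyzing synchronization time and data-transfer time in multi-GPU distributed training.
[Figure: profiler_distributed_view]
In practice, however, the view did not show up. According to the GitHub issues, it only appears after downgrading PyTorch to 1.11.0. In my tests, the view displayed correctly after downgrading from 2.0.0 to 1.11.0.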
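Since the view depended on the installed version here, a small guard can flag a mismatch before a long profiling run. This is a minimal sketch: the helper names `parse_major_minor` and `matches_reported_working_version` are hypothetical, and the "working" version (1.11) is only what was observed above, not an officially documented requirement.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '2.0.0+cu117'."""
    base = version.split("+")[0]           # drop a local build tag like '+cu117'
    major, minor = base.split(".")[:2]     # keep only the first two components
    return int(major), int(minor)


def matches_reported_working_version(version, working=(1, 11)):
    """True if `version` is in the release series where the view was seen to work.

    The default (1, 11) reflects the observation in this post only.
    """
    return parse_major_minor(version) == working


if __name__ == "__main__":
    for v in ("1.11.0", "2.0.0+cu117"):
        print(v, matches_reported_working_version(v))
```

In a real script the argument would be `torch.__version__`, which is a string of the same form.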

Parameter settings

The example settings from the official site are:

import torch.profiler

# Profiler settings from the official example
prof = torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=2),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./log/resnet18'),
    record_shapes=True,
    with_stack=True)

With these settings, my own program crashes. In non-DDP mode it fails with "Segmentation fault (core dumped)"; in DDP mode it fails with "raise ProcessExitedException( torch.multiprocessing.spawn.ProcessExitedException: process 2 terminated with signal SIGSEGV".

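By contrast, the ddp example from the GitHub source ran without any problems. Comparing the two showed that my program only runs when with_stack is set to False. This parameter records source information (file and line number) for each operation; the exact cause of the crash is still unknown.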
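The workaround can be sketched as a small, self-contained loop with the same schedule and `with_stack=False`. This is only an illustration under stated assumptions: the tiny `Linear` model, the CPU-only `activities` list, and the helper name `run_profiled_steps` are placeholders and are not taken from the original program.

```python
import tempfile

import torch
from torch.profiler import (
    ProfilerActivity,
    profile,
    schedule,
    tensorboard_trace_handler,
)


def run_profiled_steps(log_dir, num_steps=10, with_stack=False):
    """Profile a toy training loop; with_stack=False is the workaround."""
    model = torch.nn.Linear(16, 4)          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    prof = profile(
        activities=[ProfilerActivity.CPU],  # CPU only, so the sketch runs anywhere
        schedule=schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=tensorboard_trace_handler(log_dir),
        record_shapes=True,
        with_stack=with_stack,              # True crashed in the case described above
    )
    with prof:
        for _ in range(num_steps):
            x = torch.randn(8, 16)
            loss = model(x).sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            prof.step()                     # tell the profiler a step has finished
    return prof


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    run_profiled_steps(out_dir)
    print("traces written to", out_dir)
```

If stack information is needed, switching `with_stack` back to True in a small loop like this is a cheap way to check whether the crash is tied to a particular program or environment.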

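Other errors

Depending on the version and the program, the following messages may appear; they do not affect execution.

Running my own program produced "[W CPUAllocator.cpp:219] Memory block of unknown size was allocated before the profiling started, profiler results will not include the deallocation event"
Both after switching to PyTorch 2.0.0 and after setting the skip_first parameter, the program prints "[W kineto_shim.cpp:343] Profiler is not initialized: skipping step() invocation"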
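For context, `skip_first` is an argument of `torch.profiler.schedule` that keeps the profiler idle for the first N steps before the wait/warmup/active cycle begins. The sketch below only lists the action the schedule returns at each step; it does not establish the cause of the warning. The value `skip_first=2` and the helper name `schedule_actions` are illustrative assumptions.

```python
from torch.profiler import schedule


def schedule_actions(num_steps, skip_first=2):
    """Return the name of the profiler action chosen for each step index."""
    sched = schedule(skip_first=skip_first, wait=1, warmup=1, active=3, repeat=2)
    # The schedule is a callable mapping a step index to a ProfilerAction.
    return [sched(step).name for step in range(num_steps)]


if __name__ == "__main__":
    for step, action in enumerate(schedule_actions(13)):
        print(step, action)
```

Steps reported as NONE are ones where the profiler is not recording, so `prof.step()` calls made there have nothing to record.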