OutOfMemoryError: CUDA out of memory. Tried to allocate 6.31 GiB. GPU 0 has a total capacity of 39.38 GiB of which 3.86 GiB is free. Process 580195 has 35.51 GiB memory in use. Of the allocated memory 20.52 GiB is allocated by PyTorch, and 14.48 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
- Set the environment variable: enable "expandable_segments:True" to optimize memory allocation and reduce fragmentation.
set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
echo %PYTORCH_CUDA_ALLOC_CONF%
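The `set`/`echo` commands above are Windows cmd syntax; on Linux/macOS the equivalent is `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. As a minimal alternative sketch, the variable can also be set from inside the script itself, provided this happens before `torch` is first imported:

```python
import os

# Must be set before the first `import torch`; once the CUDA caching
# allocator has been initialized, changes to this variable are ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # import torch only after the variable is set

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # sanity check
```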
- Optimize memory usage: avoid unnecessary allocations in the code, reuse tensors where possible, delete tensors that are no longer needed, and release cached memory promptly.
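A minimal sketch of the "delete and release" pattern described above; the tensor size here is just a placeholder for a large intermediate activation:

```python
import gc

import torch

x = torch.randn(1024, 1024)   # stand-in for a large intermediate tensor
y = (x * 2).sum()             # keep only the small result we actually need

del x                         # drop the Python reference to the big tensor
gc.collect()                  # ensure the object is actually collected
torch.cuda.empty_cache()      # return cached blocks to the driver
                              # (no-op on CPU-only builds)
```

`empty_cache()` does not increase the memory available to PyTorch itself; it only hands cached, unallocated blocks back to the driver so other processes can use them.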
- Monitor memory usage: use tooling to watch GPU memory in real time and understand when memory is allocated and released.
# Get the current GPU's memory usage (max_memory_* report the peak since the start of the program or the last reset)
print(f"Peak GPU memory allocated: {torch.cuda.max_memory_allocated() / 1024**2:.2f} MB")
print(f"Peak GPU memory reserved: {torch.cuda.max_memory_reserved() / 1024**2:.2f} MB")
print(f"Memory allocated: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
print(f"Memory cached: {torch.cuda.memory_reserved(0) / 1e9:.2f} GB")
- Adjust the model and training parameters: if memory is still insufficient, consider reducing the batch size, optimizing the model architecture, or using multi-GPU training to spread the memory load.
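A smaller batch size can be combined with gradient accumulation to keep the effective batch size unchanged. A sketch, where the tiny model, optimizer, and random data are placeholders:

```python
import torch
from torch import nn

# Placeholder model and data purely for illustration.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accumulation_steps = 4  # effective batch = micro-batch size * 4
micro_batches = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches, start=1):
    # Scale the loss so accumulated gradients average over the effective batch.
    loss = loss_fn(model(x), y) / accumulation_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches
    if step % accumulation_steps == 0:
        optimizer.step()       # one update per accumulation_steps micro-batches
        optimizer.zero_grad()
```

Only one micro-batch's activations live in memory at a time, which is what reduces the peak usage.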