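Today, while running an image refocusing program, I hit the error "CUDA out of memory. Tried to allocate 114.00 MiB". I concluded that the GPU simply did not have enough memory, so I'm recording the approaches I tried below: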
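The "Tried to allocate 114.00 MiB" figure is the size of a single new block that did not fit. As a rough sketch (assuming a dense tensor; the helper name tensor_mib and the example shape are hypothetical, not taken from the refocus program), you can estimate such sizes from a tensor's shape and element size:

```python
import math

def tensor_mib(shape, bytes_per_element=4):
    """Size in MiB of a dense tensor with the given shape.

    bytes_per_element defaults to 4 (float32); use 2 for float16.
    """
    numel = math.prod(shape)                  # total number of elements
    return numel * bytes_per_element / 2**20  # 1 MiB = 2**20 bytes

# Hypothetical batch of 8 RGB images at 1024x1024 in float32:
print(tensor_mib((8, 3, 1024, 1024)))                       # 96.0
# Same batch in float16 needs half the memory:
print(tensor_mib((8, 3, 1024, 1024), bytes_per_element=2))  # 48.0
```

Because the size scales linearly with every dimension, halving the batch size or the image resolution is often the quickest way to make such an allocation fit.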
(1) Switching from a single GPU to multiple GPUs. This still failed, and raised a new error: RuntimeError: Caught RuntimeError in replica 0 on device 0.
For ways to fix this error, see:
①https://blog.csdn.net/liu_yuan_kai/article/details/109290375?utm_medium=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.control&dist_request_id=1331996.9521.16188988554774151&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7EBlogCommendFromMachineLearnPai2%7Edefault-1.control
②https://blog.csdn.net/senius/article/details/96599955?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase
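The "Caught RuntimeError in replica 0 on device 0" message is how torch.nn.DataParallel reports an exception raised inside one of its per-GPU replicas; the original traceback printed underneath it shows the real cause, which may still be an out-of-memory error. A minimal sketch of the multi-GPU setup, assuming DataParallel is what was used (the small Sequential model is a hypothetical stand-in for the refocus network):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the refocus network; substitute your own model.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch along dim 0 across the visible
    # GPUs, runs one replica per GPU, and gathers the outputs on device 0.
    model = nn.DataParallel(model)
model = model.to(device)

x = torch.randn(4, 3, 64, 64, device=device)
y = model(x)
print(y.shape)  # torch.Size([4, 3, 64, 64])
```

Since only the batch is split, per-GPU memory drops only when the batch size is larger than 1, and device 0 also holds the gathered outputs, so it tends to run out first.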
(2) Staying on a single GPU. What finally worked was adding the following right before the line that raised the error:
import torch

if hasattr(torch.cuda, 'empty_cache'):
    torch.cuda.empty_cache()  # release cached, unused GPU memory
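PyTorch's caching allocator keeps freed blocks reserved for reuse, and torch.cuda.empty_cache() returns those unused cached blocks to the driver. It cannot free memory still referenced by live tensors, so drop references (for example with del) first; per the PyTorch docs it mainly helps with fragmentation and with memory visible to other processes. A small sketch to observe this, assuming a CUDA build (the helper name report is hypothetical, and it returns None on a CPU-only machine):

```python
import torch

def report(tag):
    """Print and return (allocated, reserved) CUDA memory in MiB, or None without a GPU."""
    if not torch.cuda.is_available():
        return None
    alloc = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"{tag}: allocated={alloc:.1f} MiB, reserved={reserved:.1f} MiB")
    return alloc, reserved

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, 16, device=device)  # 64 MiB of float32
report("after alloc")
del x  # last reference gone; the allocator keeps the block cached
report("after del")
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
report("after empty_cache")
```

On a GPU you would typically see "allocated" drop after del while "reserved" only drops after empty_cache.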
In addition, I wrapped the test process in:
import torch

with torch.no_grad():
    pass  # test process: run the model's inference code here
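Inside torch.no_grad(), autograd records no computation graph, so intermediate activations are not kept alive for a backward pass, which is why inference uses noticeably less memory. A minimal sketch showing the effect on the output tensor (the nn.Linear model is a hypothetical stand-in for the trained network; the model.eval() call is a common companion step, not part of the original fix):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # hypothetical stand-in for the trained network
x = torch.randn(8, 16)

model.eval()  # switch layers such as dropout/batchnorm to inference behaviour
with torch.no_grad():
    # No autograd graph is recorded here.
    y = model(x)
print(y.requires_grad)       # False

y_grad = model(x)            # outside no_grad, a graph is built again
print(y_grad.requires_grad)  # True
```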