Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
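Yesterday I reproduced the code from a paper, and today I wanted to feed more data through per training step, so I raised batch_size from 144 to 500. That produced this error: Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.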
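The cause is that GPU memory usage got too high (the batch was too large): the memory needed came close to or exceeded 100% of what the card has, so the GPU ran out of memory (OOM).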
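To see why the jump from 144 to 500 matters, here is a rough back-of-the-envelope sketch. It assumes memory is a fixed cost (weights, optimizer state) plus a per-sample activation cost that grows linearly with batch_size. The function name and both numbers are hypothetical, chosen only for illustration, not measured from the paper's model.

```python
def estimate_memory_mib(batch_size, per_sample_mib, fixed_mib):
    """Rough linear model: fixed cost plus per-sample activation cost."""
    return fixed_mib + batch_size * per_sample_mib


# Hypothetical figures, for illustration only (not measured).
PER_SAMPLE_MIB = 20.0   # activation memory per sample
FIXED_MIB = 1500.0      # weights, optimizer state, workspace

small = estimate_memory_mib(144, PER_SAMPLE_MIB, FIXED_MIB)
large = estimate_memory_mib(500, PER_SAMPLE_MIB, FIXED_MIB)
print(f"batch_size=144: {small:.0f} MiB")
print(f"batch_size=500: {large:.0f} MiB")
```

Under these assumed numbers the smaller batch fits comfortably on an 8 GiB card while the larger one does not, which matches the behaviour described above.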
So reducing batch_size until the batch fits in GPU memory makes the error go away.
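One way to find a workable value without guessing is to halve batch_size each time a step runs out of memory. The sketch below is framework-agnostic: `find_fitting_batch_size` and `fake_train_step` are hypothetical names, and the 200-sample limit is made up so the example runs without a GPU. With TensorFlow you would presumably pass `tf.errors.ResourceExhaustedError` as `oom_error`, since that is the exception raised on OOM.

```python
def find_fitting_batch_size(train_step, start, minimum=1, oom_error=MemoryError):
    """Halve the batch size until train_step(batch_size) stops raising oom_error."""
    minimum = max(minimum, 1)  # guard against looping forever at 0
    batch_size = start
    while batch_size >= minimum:
        try:
            train_step(batch_size)
            return batch_size
        except oom_error:
            # Out of memory: retry with half the batch.
            batch_size //= 2
    raise RuntimeError("no batch size >= minimum fits in memory")


def fake_train_step(batch_size):
    """Stand-in for a real training step; pretends anything above 200 samples OOMs."""
    if batch_size > 200:
        raise MemoryError(f"simulated OOM at batch_size={batch_size}")


chosen = find_fitting_batch_size(fake_train_step, start=500)
print("usable batch_size:", chosen)
```

Halving is a coarse search; once a size fits, you can try values between it and the last failing size if you want to use more of the available memory.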