Trying to backward through the graph a second time, but the buffers have already been freed

MarToony|名角

Category: Pytorch框架学习 (PyTorch framework study)

Article link: https://blog.csdn.net/m0_38052500/article/details/119479046

Error message:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
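For context, here is a minimal sketch that triggers this error on purpose. It is not the code from my project; the tensors `w` and `shared` are made up for illustration. Two losses share one sub-graph, and the first backward() frees the buffers that the second one needs. The exact wording of the message may differ slightly between PyTorch versions.

```python
import torch

# Hypothetical toy tensors, not taken from the original project.
w = torch.ones(3, requires_grad=True)
shared = (w * w).sum()        # the multiply saves its inputs for backward

loss1 = shared * 2
loss1.backward()              # frees the saved buffers of the shared graph

loss2 = shared * 3            # built on top of the already-freed graph
try:
    loss2.backward()
    second_backward_failed = False
except RuntimeError as err:
    second_backward_failed = True
    print(err)

# With retain_graph=True on the first call, the second call succeeds
# and the gradients of both losses accumulate in w2.grad.
w2 = torch.ones(3, requires_grad=True)
shared2 = (w2 * w2).sum()
(shared2 * 2).backward(retain_graph=True)
(shared2 * 3).backward()
```

Retaining the graph makes the toy case work, but it only hides the question of why the same graph is being traversed twice in the first place.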
Attempted fixes:

Approach 1: add retain_graph as the message suggests. This produced the following error instead:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 5]], which is output 0 of TBackward, is at version 8; expected version 7 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

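I could not locate the in-place operation that this message refers to.
My guess is that even if I had found it, it would not necessarily have been the root cause.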
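The hint in the message points at anomaly detection, so here is a small sketch of how it can be used. The tensors are made up for illustration: exp() saves its output for the backward pass, and the in-place add_ changes that output afterwards. Inside torch.autograd.detect_anomaly() (the context-manager form of set_detect_anomaly), PyTorch additionally emits a warning with the traceback of the forward call whose backward failed, which is what helps to find the offending line.

```python
import torch

# Hypothetical toy tensors, not taken from the original project.
x = torch.ones(3, requires_grad=True)
inplace_error = None

with torch.autograd.detect_anomaly():
    y = x.exp()               # exp keeps its result for the backward pass
    y.add_(1)                 # in-place edit bumps the version counter of y
    try:
        y.sum().backward()    # backward notices the saved result has changed
    except RuntimeError as err:
        inplace_error = str(err)

print(inplace_error)
```

In my message, TBackward is the backward node of a .t() transpose, which older PyTorch versions apply to the weight inside nn.Linear. One plausible explanation, which I have not confirmed, is that optimizer.step() updated that weight in place between two backward passes through the same retained graph.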

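Approach 2: "What is the hardest bug to debug you have ever encountered?" - answer by 索罗格 on Zhihu
That author's way of debugging was to tear everything down and rewrite it. It turned out that a variable in their custom loss function had not been converted to a tensor and had not been moved to the GPU.
Inspired by this, I went back through my own code. It also has a custom loss function, so the cause could likewise have been a variable that was not a tensor. On inspection, however, nothing looked wrong there.
While checking, though, I remembered one thing: some time earlier, when moving a certain tensor to the GPU, I had removed the .data access on it. After putting .data back, the code ran normally again.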

# Before
centroids = centroid_init(trainloader, encoder, k, d).to(output_device)
# After the fix
centroids = centroid_init(trainloader, encoder, k, d).data.to(output_device)
# centroid_init is implemented as follows
# (output_device and update_clusters are defined elsewhere in my code):
def centroid_init(trainloader, encoder, k, d):
    # Running per-cluster sums and counts, accumulated over the whole loader
    centroid_sums = torch.zeros(k, d).to(output_device)
    centroid_counts = torch.zeros(k).to(output_device)
    for batch in trainloader:
        X_var, y_var = batch["data"].to(output_device), batch["target"].to(output_device)
        # Assign every sample in the batch to a random cluster in [0, k)
        cluster_assignments = torch.LongTensor(X_var.size(0)).random_(k).to(output_device)
        # Forward pass runs with autograd enabled (no torch.no_grad() here)
        embeddings = encoder(X_var)
        update_clusters(centroid_sums, centroid_counts, cluster_assignments, embeddings)
    # Mean embedding per cluster
    centroid_means = centroid_sums / centroid_counts[:, None]
    return centroid_means.clone()

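When I print the tensor, the values shown are the same with or without .data, and I honestly do not understand why .data is needed when copying a tensor to the GPU. Any advice is welcome.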
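A likely explanation, sketched below under assumptions: the copy to the GPU is not the important part, because .to() is differentiable and keeps the autograd graph. What matters is that centroid_init computes its result from encoder outputs, so the centroids stay attached to a graph reaching back to the encoder. If every training step then uses those centroids in the loss, every backward() also walks that one initialization graph, and its buffers are freed after the first step. .data cuts the link. In current PyTorch, .detach() or wrapping the initialization in torch.no_grad() is the recommended way to do the same thing. The encoder, data and loss here are toy stand-ins, not my real model.

```python
import torch

torch.manual_seed(0)

# Toy stand-ins (assumptions): a tiny encoder and one fixed batch.
encoder = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.Tanh())
X = torch.randn(8, 4)

# Like centroid_init: computed from encoder outputs with autograd enabled,
# so the result carries a graph back to the encoder's parameters.
centroids_attached = encoder(X).mean(dim=0, keepdim=True)
# Same values, but cut off from that graph.
centroids_detached = centroids_attached.detach()

def two_training_steps(centroids):
    """Run two forward/backward passes that both use the same centroids."""
    for _ in range(2):
        encoder.zero_grad()
        loss = ((encoder(X) - centroids) ** 2).sum()
        loss.backward()

try:
    two_training_steps(centroids_attached)
    attached_failed = False
except RuntimeError as err:
    attached_failed = True    # second step re-enters the freed init graph
    print(err)

two_training_steps(centroids_detached)   # runs without error
```

This would also fit the first attempt: with retain_graph=True the stale initialization graph survives, and an in-place weight update between steps (for example by optimizer.step()) could then produce the version-mismatch error. Printing does show a difference, too: the attached tensor is printed with a grad_fn and the detached one is not, although the values are identical.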
