.detach() and .clone() in PyTorch

wwweiyx

Category: pytorch notes. Tags: pytorch, deep learning, python

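Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.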

Article link: https://blog.csdn.net/weiyuxin107/article/details/125150999

.clone() and .detach() on PyTorch tensors

Using detach()

In practice you will often see tensor.detach().clone() used to create a new tensor with the same values as the original tensor.

Why are both .clone() and .detach() needed? The code below walks through the reason.
1. Create two tensors and compute their gradients
```
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0], requires_grad=True)
loss = a @ b  # dot product of a and b
loss.backward()
print(a, b)
print(a.grad, b.grad)
```
  Output:
  
  tensor([1., 1.], requires_grad=True) tensor([2., 2.], requires_grad=True)
  tensor([2., 2.]) tensor([1., 1.])
  
  The gradients of a and b are [2., 2.] and [1., 1.] respectively.
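  These values follow from the dot product itself: for loss = a·b, the gradient with respect to a is b, and the gradient with respect to b is a. A minimal check, as a sketch that reuses the same values (the names grad_a_matches_b and grad_b_matches_a are only illustrative):

```python
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0], requires_grad=True)
loss = a @ b
loss.backward()

# For a dot product, d(loss)/da equals b and d(loss)/db equals a.
grad_a_matches_b = torch.equal(a.grad, b.detach())
grad_b_matches_a = torch.equal(b.grad, a.detach())
print(grad_a_matches_b, grad_b_matches_a)
```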
2. Detaching from the computation graph with a_ = a.detach()
  
  Add a_ = a.detach() to the code above and use a_ for the computation and backward():
```
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0], requires_grad=True)
a_ = a.detach()  # same values, but cut off from the graph
loss = a_ @ b
loss.backward()
print(a, b)
print(a.grad, b.grad)
```
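  Output:
  
  tensor([1., 1.], requires_grad=True) tensor([2., 2.], requires_grad=True)
  None tensor([1., 1.])
  
  The gradient of a is now None, because .detach() returns a new tensor that is detached from the computation graph. Gradients produced by operations on a_ are not propagated back to a.
  
  See the official documentation for more details on detach().
  
  The resulting a_ does not track gradients: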
```
print(a_.requires_grad)
# out: False
```
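  A short sketch of what else distinguishes the detached tensor: it has no grad_fn and is treated as a leaf, while holding the same values as a. requires_grad, grad_fn and is_leaf are standard torch.Tensor attributes; the values are the ones used in the examples above.

```python
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
a_ = a.detach()

# The detached tensor carries the same values but no autograd history.
print(a_.requires_grad)    # False
print(a_.grad_fn)          # None
print(a_.is_leaf)          # True
print(torch.equal(a, a_))  # True
```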
3. Note that the tensor returned by .detach() shares memory with the original tensor
```
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0], requires_grad=True)
a_ = a.detach()
# modify a_ in place; a sees the change because the memory is shared
a_[0] += 2
loss = a @ b
loss.backward()
print(a, b)
print(a.grad, b.grad)
```
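  Output:
  
  tensor([3., 1.], requires_grad=True) tensor([2., 2.], requires_grad=True)
  tensor([2., 2.]) tensor([3., 1.])
  
  The gradients of a and b are now [2., 2.] and [3., 1.], whereas they were originally [2., 2.] and [1., 1.]. Modifying a_[0] also changed a[0], so the gradient of b changed with it.
  
  In the code above, .detach() produced a new tensor, which was then modified in place. When backward() was run on the original a@b, the gradient of b had changed, because modifying a_ also modified a. This is why .clone() is needed.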
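  One way to see the sharing directly is to compare the storage addresses with Tensor.data_ptr(). This is a minimal sketch; the name same_storage is only illustrative.

```python
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
a_ = a.detach()

# Both tensors point at the same underlying memory.
same_storage = a.data_ptr() == a_.data_ptr()
print(same_storage)

# An in-place write through a_ is therefore visible through a.
a_[0] += 2
print(a)
```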
4. Creating a new tensor with .clone()
```
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0], requires_grad=True)
a_ = a.clone()  # copy into new memory
a_[0] += 2      # modifies only the copy
loss = a @ b
loss.backward()
print(a, b)
print(a.grad, b.grad)
```
  Output:
  
  tensor([1., 1.], requires_grad=True) tensor([2., 2.], requires_grad=True)
  tensor([2., 2.]) tensor([1., 1.])
  
  In the code above, a_ is created by a.clone(). Modifying a_ does not affect the original a.
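  The same data_ptr() comparison can be applied to a clone. As a sketch under the same setup, the copy lives in separate memory but is still attached to the graph (separate_storage is an illustrative name):

```python
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
a_ = a.clone()

# clone() allocates new memory, so the two tensors no longer alias.
separate_storage = a.data_ptr() != a_.data_ptr()
print(separate_storage)

# The copy is still part of the graph: it requires grad, has a grad_fn,
# and is not a leaf tensor.
print(a_.requires_grad, a_.grad_fn is not None, a_.is_leaf)
```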
5. .clone() and .detach()
  
  The tensor returned by .clone() is differentiable: during backward, gradients are propagated back through it.
  
  The official documentation for clone() states:
  
  This function is differentiable, so gradients will flow back from the result of this operation to input. To create a tensor without an autograd relationship to input see detach().
```
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0], requires_grad=True)
a_ = a.clone()
a_[0] += 2
loss = a_ @ b  # this time the clone is used in the computation
loss.backward()
print(a, b)
print(a.grad, b.grad)
```
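  Output:
  
  tensor([1., 1.], requires_grad=True) tensor([2., 2.], requires_grad=True)
  tensor([2., 2.]) tensor([3., 1.])
  
  The gradient of b is [3., 1.] rather than [1., 1.], because the loss was computed with the modified a_. The gradient of a is [2., 2.] rather than None.
  
  This is because a tensor produced by .clone() stays in the computation graph, so the gradients from the backward pass flow back to the input node a.
  
  So to get a copy that neither shares memory with the original nor sends gradients back to it, .clone() and .detach() are generally used together.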
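  Putting the two together, here is a minimal sketch of the combined call with the same values as the earlier examples. The copy can be modified freely, a keeps its values, and no gradient reaches a.

```python
import torch

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([2.0, 2.0], requires_grad=True)

# detach() drops the autograd link; clone() then copies into new memory.
a_ = a.detach().clone()
a_[0] += 2  # does not touch a

loss = a_ @ b
loss.backward()

print(a)       # values of a are unchanged
print(a.grad)  # None: no gradient flows back to a
print(b.grad)  # reflects the modified copy
```

  Calling a.clone().detach() should give an equivalent tensor. Detaching first is often preferred because the copy is then never recorded in the graph.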
