Notes on Using max_norm with PyTorch Embedding


qq_46111795


Article link: https://blog.csdn.net/qq_46111795/article/details/125536423


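Summary: This post looks at how the max_norm parameter of PyTorch's Embedding layer affects the weight tensor. When max_norm is not None, the forward method modifies the weight in place, so any differentiable operation applied to the weight before calling forward must work on a clone of the weight. An example shows that without the clone, backpropagation fails because the weight saved for the gradient computation has since been modified.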


PyTorch Embedding

The official documentation attaches a note to the max_norm parameter of Embedding: "When max_norm is not None, Embedding’s forward method will modify the weight tensor in-place. Since tensors needed for gradient computations cannot be modified in-place, performing a differentiable operation on Embedding.weight before calling Embedding’s forward method requires cloning Embedding.weight when max_norm is not None."
In other words, when max_norm is not None, calling forward renormalizes the looked-up rows of Embedding.weight in place. If you want to apply a differentiable operation to the weight before calling forward, you have to apply it to a clone of the weight.
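The in-place modification can be observed directly. The following is a minimal sketch, not taken from the original post: the table size, the seed, and the factor of 10 used to push every row norm above max_norm are assumptions chosen only to make the effect easy to see.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Assumed setup: 3 rows of dimension 5, scaled up so each row norm exceeds max_norm.
embedding = nn.Embedding(3, 5, max_norm=1.0)
with torch.no_grad():
    embedding.weight.mul_(10.0)

# Snapshot of the weight before any lookup.
before = embedding.weight.detach().clone()

idx = torch.tensor([1, 2])
_ = embedding(idx)  # forward renormalizes rows 1 and 2 of the weight in place

after = embedding.weight.detach()

# Row 0 was never looked up, so it keeps its original (large) norm.
row0_unchanged = torch.equal(before[0], after[0])
# Rows 1 and 2 were looked up, so their norms are now at most max_norm.
looked_up_rows_clipped = bool((after[idx].norm(dim=1) <= 1.0 + 1e-4).all())

print(before.norm(dim=1))
print(after.norm(dim=1))
```

Only the rows that are actually indexed get renormalized; rows that are never looked up keep whatever norm they had.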

import torch
from torch import nn

n, d, m = 3, 5, 7
# max_norm is a float threshold; 1.0 is what max_norm=True would be interpreted as
embedding = nn.Embedding(n, d, max_norm=1.0)
W = torch.randn((m, d), requires_grad=True)
idx = torch.tensor([1, 2])
a = embedding.weight.clone() @ W.t()  # weight must be cloned for this to be differentiable
b = embedding(idx) @ W.t()  # modifies weight in-place
out = (a.unsqueeze(0) + b.unsqueeze(1))  # broadcasts (1, n, m) + (2, 1, m) -> (2, n, m)
loss = out.sigmoid().prod()
loss.backward()

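Embedding's forward modifies the weight because it calls F.embedding, and F.embedding renormalizes the weight tensor it receives in place whenever max_norm is set. In the code above, b = embedding(idx) calls forward, while the line computing a applies a differentiable operation to embedding.weight before that call. If a were computed as embedding.weight @ W.t() without the clone, autograd would save the weight for the backward pass, the computation of b would then modify that weight in place, and at backward time the saved tensor would no longer match the one that was recorded, which raises an error. Cloning gives a its own copy that the later in-place update does not touch. If the two lines for a and b are swapped, so that the differentiable operation on embedding.weight happens after forward, the clone is not needed.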
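The three orderings discussed above can be compared side by side. This is a small sketch under stated assumptions: the helper name run and the order labels are hypothetical, introduced only for this illustration, and the setup otherwise follows the example from the official documentation.

```python
import torch
from torch import nn


def run(order: str) -> str:
    """Return "ok" if backward succeeds, or "RuntimeError" if autograd rejects it."""
    torch.manual_seed(0)
    n, d, m = 3, 5, 7
    embedding = nn.Embedding(n, d, max_norm=1.0)
    W = torch.randn((m, d), requires_grad=True)
    idx = torch.tensor([1, 2])
    try:
        if order == "weight_first_no_clone":
            a = embedding.weight @ W.t()          # autograd saves weight for backward
            b = embedding(idx) @ W.t()            # forward then modifies weight in place
        elif order == "weight_first_clone":
            a = embedding.weight.clone() @ W.t()  # autograd saves the clone instead
            b = embedding(idx) @ W.t()            # in-place update leaves the clone alone
        elif order == "forward_first_no_clone":
            b = embedding(idx) @ W.t()            # in-place update happens first
            a = embedding.weight @ W.t()          # weight is saved after the update
        else:
            raise ValueError(f"unknown order: {order}")
        out = a.unsqueeze(0) + b.unsqueeze(1)
        out.sigmoid().prod().backward()
        return "ok"
    except RuntimeError:
        return "RuntimeError"


results = {
    order: run(order)
    for order in (
        "weight_first_no_clone",
        "weight_first_clone",
        "forward_first_no_clone",
    )
}
print(results)
```

Note that the two working variants are not numerically identical: with the clone, a is computed from the weight as it was before renormalization, whereas with forward called first, a already sees the renormalized rows.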
