PyTorch Functions Made Simple: torch.nn.init.orthogonal_

von Neumann

Last modified: 2023-09-25 19:48:15

First published: 2023-08-19 21:14:49

Article link: https://blog.csdn.net/hy592070616/article/details/132384274

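This article describes the orthogonal_ function in PyTorch's torch.nn.init module, which fills a neural network parameter tensor with a (semi-)orthogonal matrix following the method of Saxe et al. The function accepts tensors with at least two dimensions. A usage example and the implementation are shown below.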

All functions in the torch.nn.init module are meant for initializing neural network parameters, so they all run under torch.no_grad() and are not tracked by autograd.
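As a small check of this behavior, the sketch below initializes a leaf tensor that has requires_grad=True. An ordinary in-place operation on such a tensor outside torch.no_grad() is rejected by autograd, while the init function succeeds and leaves nothing in the autograd graph. The variable names are illustrative only.

```python
import torch
import torch.nn as nn

# A leaf tensor that autograd would normally track.
w = torch.empty(3, 5, requires_grad=True)

# A plain in-place op on a leaf that requires grad is rejected by autograd.
try:
    w.mul_(2.0)
    inplace_rejected = False
except RuntimeError:
    inplace_rejected = True

# The init function does its in-place writes under torch.no_grad(),
# so it works on the same tensor and records nothing in the graph.
nn.init.orthogonal_(w)

print("plain in-place op rejected:", inplace_rejected)
print("is_leaf:", w.is_leaf, "grad_fn:", w.grad_fn, "requires_grad:", w.requires_grad)
```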

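Fills the input tensor or variable with a (semi-)orthogonal matrix, following the method described by Saxe, A. et al. in "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks". The input tensor must have at least 2 dimensions. For tensors with more than 2 dimensions, the trailing dimensions are flattened: the tensor is treated as a 2-D matrix whose number of rows equals the size of the first dimension and whose number of columns equals the product of the remaining dimensions.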
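What "(semi-)orthogonal" means in practice can be checked numerically. The sketch below is an illustration under a few assumptions: the shapes are arbitrary, the tolerance of 1e-5 is a guess suited to float32, and the 4-D tensor merely imitates a convolution weight. A wide matrix ends up with orthonormal rows, a tall one with orthonormal columns, and a higher-dimensional tensor satisfies the same property after flattening its trailing dimensions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Wide matrix (rows < cols): the rows are orthonormal, so W @ W.T is the identity.
wide = nn.init.orthogonal_(torch.empty(3, 5))
wide_ok = torch.allclose(wide @ wide.T, torch.eye(3), atol=1e-5)

# Tall matrix (rows > cols): the columns are orthonormal, so W.T @ W is the identity.
tall = nn.init.orthogonal_(torch.empty(5, 3))
tall_ok = torch.allclose(tall.T @ tall, torch.eye(3), atol=1e-5)

# 4-D tensor shaped like a conv weight: treated as an 8 x (4*3*3) = 8 x 36 matrix.
conv = nn.init.orthogonal_(torch.empty(8, 4, 3, 3))
flat = conv.reshape(conv.size(0), -1)
conv_ok = torch.allclose(flat @ flat.T, torch.eye(8), atol=1e-5)

# A 1-D tensor is rejected with a ValueError.
try:
    nn.init.orthogonal_(torch.empty(5))
    rejected_1d = False
except ValueError:
    rejected_1d = True

print(wide_ok, tall_ok, conv_ok, rejected_1d)
```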

Syntax

torch.nn.init.orthogonal_(tensor, gain=1)

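Parameters

tensor: [Tensor] an $N$-dimensional torch.Tensor, where $N\geq 2$
gain: [optional] scaling factor by which the (semi-)orthogonal matrix is multiplied; defaults to 1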
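The effect of gain can be seen on the Gram matrix: since the whole matrix is multiplied by gain, a wide matrix W satisfies W @ W.T = gain² · I. The sketch below uses nn.init.calculate_gain("relu") as one plausible choice of gain; the shape and tolerance are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Recommended gain for ReLU, which is sqrt(2).
gain = nn.init.calculate_gain("relu")

w = nn.init.orthogonal_(torch.empty(4, 6), gain=gain)

# The rows are orthogonal with squared norm gain**2 instead of 1.
gram = w @ w.T
scaled_ok = torch.allclose(gram, gain**2 * torch.eye(4), atol=1e-5)

print("gain:", gain)
print("W @ W.T equals gain^2 * I:", scaled_ok)
```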

Return value

The input tensor itself (a torch.Tensor), which is also filled in place.
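Because the function returns the very object it was given, the call can be used either as a statement or as an expression. A minimal sketch with illustrative names:

```python
import torch
import torch.nn as nn

w = torch.empty(3, 5)
out = nn.init.orthogonal_(w)

# The return value is the input tensor, not a copy.
same_object = out is w

# The same holds for a module parameter.
layer = nn.Linear(5, 3)
returned = nn.init.orthogonal_(layer.weight)
same_parameter = returned is layer.weight

print(same_object, same_parameter)
```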

Example

import torch
import torch.nn as nn
w = torch.empty(3, 5)
nn.init.orthogonal_(w)
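In a real model the function is usually applied to layer weights rather than to a standalone tensor. The sketch below shows one common pattern, using nn.Module.apply with a helper. The helper name init_orthogonal, the choice to zero the biases, and the small network are all assumptions made for this illustration, not part of the original article.

```python
import torch
import torch.nn as nn


def init_orthogonal(module: nn.Module) -> None:
    """Hypothetical helper: orthogonal weights and zero biases for Linear/Conv2d."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.orthogonal_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

# apply() visits every submodule, so each Linear layer gets initialized.
model.apply(init_orthogonal)

first, last = model[0], model[2]
print(first.weight.shape, last.weight.shape)
```

Note that nn.Linear stores its weight as (out_features, in_features), so the first layer here is a tall 32 x 16 matrix with orthonormal columns and the last one is a wide 8 x 32 matrix with orthonormal rows.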

Implementation

def orthogonal_(tensor, gain=1):
    r"""Fills the input `Tensor` with a (semi) orthogonal matrix, as
    described in `Exact solutions to the nonlinear dynamics of learning in deep
    linear neural networks` - Saxe, A. et al. (2013). The input tensor must have
    at least 2 dimensions, and for tensors with more than 2 dimensions the
    trailing dimensions are flattened.

    Args:
        tensor: an n-dimensional `torch.Tensor`, where :math:`n \geq 2`
        gain: optional scaling factor

    Examples:
        >>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_LAPACK)
        >>> w = torch.empty(3, 5)
        >>> nn.init.orthogonal_(w)
    """
    if tensor.ndimension() < 2:
        raise ValueError("Only tensors with 2 or more dimensions are supported")

    if tensor.numel() == 0:
        # no-op
        return tensor
    rows = tensor.size(0)
    cols = tensor.numel() // rows
    flattened = tensor.new(rows, cols).normal_(0, 1)

    if rows < cols:
        flattened.t_()

    # Compute the qr factorization
    q, r = torch.linalg.qr(flattened)
    # Make Q uniform according to https://arxiv.org/pdf/math-ph/0609050.pdf
    d = torch.diag(r, 0)
    ph = d.sign()
    q *= ph

    if rows < cols:
        q.t_()

    with torch.no_grad():
        tensor.view_as(q).copy_(q)
        tensor.mul_(gain)
    return tensor
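The sign correction after the QR factorization (d, ph and q *= ph in the listing above) can be examined in isolation. The sketch below repeats those steps on a small Gaussian matrix and checks three things: flipping the columns of Q together with the matching rows of R leaves the product unchanged, Q stays orthonormal, and the diagonal of the corrected R becomes non-negative. The names q_fixed and r_fixed are illustrative, and the sketch assumes the diagonal of R contains no exact zeros, which holds with probability 1 for Gaussian input.

```python
import torch

torch.manual_seed(0)

a = torch.randn(5, 3)
q, r = torch.linalg.qr(a)        # reduced QR: q is 5x3, r is 3x3

ph = torch.diag(r, 0).sign()     # +1 or -1 for each column of q
q_fixed = q * ph                 # flips the sign of the selected columns of q
r_fixed = ph.unsqueeze(1) * r    # flips the matching rows of r

# The factorization still reproduces the input.
same_product = torch.allclose(q_fixed @ r_fixed, a, atol=1e-5)
# Sign flips do not affect orthonormality of the columns.
still_orthonormal = torch.allclose(q_fixed.T @ q_fixed, torch.eye(3), atol=1e-5)
# After the correction, the diagonal of r is non-negative.
diag_nonnegative = bool((torch.diag(r_fixed) >= 0).all())

print(same_product, still_orthonormal, diag_nonnegative)
```

Fixing the sign of the diagonal removes the ambiguity in the QR factorization, which is what the paper linked in the source comment relies on to obtain a uniformly distributed Q.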