Porting PyTorch Code to TensorFlow: References and Pitfalls (continuously updated)

While recently porting PyTorch code, I found that many functions have direct TensorFlow counterparts, while others need modification. This article records the functions I encountered and how I handled them, along with bugs and their fixes. I will keep updating this post as new issues come up.

1. Function Correspondence

PyTorch & TensorFlow function correspondence

| PyTorch | Description | TensorFlow | Description |
| --- | --- | --- | --- |
| masked_fill_ | Masking: fills the elements of the tensor with value at the positions where mask is 1; the mask must be broadcastable to the shape of the tensor being filled | tf.where | Use the mask as condition, the fill value as x, and the original tensor as y: x = tf.where(mask, x=mask_value, y=x) |
| nn.functional.pad | | tf.pad | |
| torch.finfo | Returns the numerical limits of the floating-point type given in parentheses | tf.experimental.numpy.finfo() | |
| torch.einsum | | tf.einsum | |
| torch.chunk() | Splits a tensor into chunks | tf.split() | |
| nn.Parameter() | | tf.Variable() | |
| torch.no_grad() | | tf.stop_gradient() | |
| torch.cumsum | | Combine tf.cast and tf.math.cumsum | |
| torch.arange | | tf.range | |
| torch.stack | | tf.stack | |
| torch.cat | | tf.concat | |
| torch.permute | | tf.transpose | |
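For example, the masked_fill_ row maps to tf.where like this (a minimal sketch; the values are illustrative):

import tensorflow as tf

x = tf.constant([[1., 2.], [3., 4.]])
mask = tf.constant([[True, False], [False, True]])
fill = tf.fill(tf.shape(x), -1e9)   # the fill value, matched to x's shape
x = tf.where(mask, x=fill, y=x)     # -> [[-1e9, 2.], [3., -1e9]]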
PyTorch & TensorFlow layer correspondence

| PyTorch | TensorFlow | Notes |
| --- | --- | --- |
| nn.Linear | tf.keras.layers.Dense | TF does not need the input dimension, only the output dimension |
| nn.Sequential | tf.keras.Sequential | Note that in TF the layers go in a [ ] list inside the ( ) |
| nn.ModuleList | (no direct equivalent) | Simply create an empty Python list and list.append() the layers |
| nn.Identity() | (no direct equivalent) | Build a small Module of your own that returns its input unchanged |
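Minimal sketches of the last two rows (the nn.ModuleList and nn.Identity() replacements), assuming a standard Keras subclassing setup:

import tensorflow as tf

class Identity(tf.keras.layers.Layer):
    # nn.Identity() replacement: return the input unchanged.
    def call(self, x):
        return x

class Blocks(tf.keras.layers.Layer):
    # nn.ModuleList replacement: a plain Python list plus append();
    # Keras tracks layers stored in list attributes automatically.
    def __init__(self, depth, dim):
        super().__init__()
        self.blocks = []
        for _ in range(depth):
            self.blocks.append(tf.keras.layers.Dense(dim))

    def call(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x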

2. Function Porting

1. trunc_normal_

Description: in PyTorch, trunc_normal_ fills a tensor with samples from a truncated normal distribution, restricting the values to a given range.

The PyTorch implementation is as follows:

import math
import warnings

import torch
from torch import Tensor


def _no_grad_trunc_normal_(tensor, mean, std, a, b):
    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf
    def norm_cdf(x):
        # Computes standard normal cumulative distribution function
        return (1. + math.erf(x / math.sqrt(2.))) / 2.

    if (mean < a - 2 * std) or (mean > b + 2 * std):
        warnings.warn("mean is more than 2 std from [a, b] in nn.init.trunc_normal_. "
                      "The distribution of values may be incorrect.",
                      stacklevel=2)

    with torch.no_grad():
        # Values are generated by using a truncated uniform distribution and
        # then using the inverse CDF for the normal distribution.
        # Get upper and lower cdf values
        l = norm_cdf((a - mean) / std)
        u = norm_cdf((b - mean) / std)

        # Uniformly fill tensor with values from [l, u], then translate to
        # [2l-1, 2u-1].
        tensor.uniform_(2 * l - 1, 2 * u - 1)

        # Use inverse cdf transform for normal distribution to get truncated
        # standard normal
        tensor.erfinv_()

        # Transform to proper mean, std
        tensor.mul_(std * math.sqrt(2.))
        tensor.add_(mean)

        # Clamp to ensure it's in the proper range
        tensor.clamp_(min=a, max=b)
        return tensor

def trunc_normal_(tensor: Tensor, mean: float = 0., std: float = 1., a: float = -2., b: float = 2.) -> Tensor:
    r"""Fills the input Tensor with values drawn from a truncated
    normal distribution. The values are effectively drawn from the
    normal distribution :math:`\mathcal{N}(\text{mean}, \text{std}^2)`
    with values outside :math:`[a, b]` redrawn until they are within
    the bounds. The method used for generating the random values works
    best when :math:`a \leq \text{mean} \leq b`.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        mean: the mean of the normal distribution
        std: the standard deviation of the normal distribution
        a: the minimum cutoff value
        b: the maximum cutoff value

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.trunc_normal_(w)
    """
    return _no_grad_trunc_normal_(tensor, mean, std, a, b)

Ported to TensorFlow:

import tensorflow as tf
import math
import warnings


def _no_grad_trunc_normal_(tensor, mean, std, a, b):
    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf
    def norm_cdf(x):
        # Computes standard normal cumulative distribution function
        return (1. + math.erf(x / math.sqrt(2.))) / 2.

    if (mean < a - 2 * std) or (mean > b + 2 * std):
        warnings.warn("mean is more than 2 std from [a, b] in nn.init.trunc_normal_. "
                      "The distribution of values may be incorrect.",
                      stacklevel=2)

    # with torch.no_grad(): has no direct TF equivalent; the tensor ops below
    # are wrapped in tf.stop_gradient instead.
    # Values are generated by using a truncated uniform distribution and
    # then using the inverse CDF for the normal distribution.
    # Get upper and lower cdf values (plain Python floats, so no gradient is
    # involved and tf.stop_gradient is unnecessary here)
    l = norm_cdf((a - mean) / std)
    u = norm_cdf((b - mean) / std)

    # Uniformly fill tensor with values from [l, u], then translate to
    # [2l-1, 2u-1].
    tensor = tf.stop_gradient(tf.random.uniform(shape=tf.shape(tensor), minval=2 * l - 1, maxval=2 * u - 1))
    # tensor.uniform_(2 * l - 1, 2 * u - 1)
    # Use inverse cdf transform for normal distribution to get truncated
    # standard normal
    tensor = tf.stop_gradient(tf.math.erfinv(x=tensor))
    # tensor.erfinv_()

    # Transform to proper mean, std
    tensor = tf.stop_gradient(tf.math.multiply(x=tensor, y=std * math.sqrt(2.)))
    # tensor.mul_(std * math.sqrt(2.))
    tensor = tf.stop_gradient(tf.math.add(x=tensor, y=mean))
    # tensor.add_(mean)

    # Clamp to ensure it's in the proper range
    tensor = tf.stop_gradient(tf.clip_by_value(t=tensor, clip_value_min=a, clip_value_max=b))
    # tensor.clamp_(min=a, max=b)
    return tensor

def trunc_normal_(tensor, mean: float = 0., std: float = 1., a: float = -2., b: float = 2.):
    r"""Fills the input Tensor with values drawn from a truncated
    normal distribution. The values are effectively drawn from the
    normal distribution :math:`\mathcal{N}(\text{mean}, \text{std}^2)`
    with values outside :math:`[a, b]` redrawn until they are within
    the bounds. The method used for generating the random values works
    best when :math:`a \leq \text{mean} \leq b`.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        mean: the mean of the normal distribution
        std: the standard deviation of the normal distribution
        a: the minimum cutoff value
        b: the maximum cutoff value

    Examples:
        # >>> w = torch.empty(3, 5)
        # >>> nn.init.trunc_normal_(w)
    """
    return _no_grad_trunc_normal_(tensor, mean, std, a, b)
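Unlike the PyTorch original, the port cannot fill the tensor in place, so assign the returned tensor back; a short usage sketch (the std value is illustrative):

w = tf.Variable(tf.zeros((3, 5)))
w.assign(trunc_normal_(w, std=0.02))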

2. flatten()

In PyTorch, torch.flatten(tensor, start_dim=0, end_dim=-1) (and the layer form torch.nn.Flatten(start_dim=1, end_dim=-1)) collapses any contiguous range of dimensions into one.

A TensorFlow implementation:

def flatten(tensor, start_dim=0, end_dim=-1):
    """
    TensorFlow counterpart of torch.flatten: the sizes of the dimensions from
    start_dim to end_dim (inclusive) are multiplied together, and the other
    dimensions are left unchanged. Because the defaults are start_dim=0 and
    end_dim=-1, flatten(t) returns a one-dimensional tensor, like torch.flatten(t).
    """
    input_dim = tf.shape(tensor).numpy().tolist()  # eager mode only
    # Make sure the indices do not exceed the tensor's rank.
    assert start_dim <= len(input_dim) - 1 and end_dim <= len(input_dim) - 1
    li = []
    if end_dim == -1:
        # Keep the leading dims and collapse everything from start_dim onward.
        li.extend(input_dim[:start_dim])
        li.append(-1)
    else:
        # Keep the leading and trailing dims and collapse the middle range.
        li.extend(input_dim[:start_dim])
        li.append(-1)
        li.extend(input_dim[end_dim + 1:])
    return tf.reshape(tensor, shape=li)
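For example (eager mode, since the helper calls .numpy()):

t = tf.zeros((2, 3, 4, 5))
print(flatten(t).shape)               # (120,)    like torch.flatten(t)
print(flatten(t, start_dim=1).shape)  # (2, 60)
print(flatten(t, 1, 2).shape)         # (2, 12, 5)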

3. nn.Module.apply (unsolved)

In PyTorch, this method traverses all submodules and is typically used to initialize the whole model's parameters in one pass. I do not yet know how to implement it in TensorFlow when subclassing Module; calling every layer by hand is too slow. Perhaps a global variable is one way; another possible direction is sketched below.
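An untested sketch of that direction, relying on the tf.Module.submodules property, which recursively yields every sub-module (the helper names apply_fn and init_weights are mine):

def apply_fn(module: tf.Module, fn):
    # Mimic nn.Module.apply: visit every sub-module, then the module itself.
    for sub in module.submodules:
        fn(sub)
    fn(module)
    return module

def init_weights(m):
    # Example visitor: re-initialize every built Dense kernel with the
    # trunc_normal_ port from above.
    if isinstance(m, tf.keras.layers.Dense) and m.built:
        m.kernel.assign(trunc_normal_(m.kernel, std=0.02))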

3. Calling Functions from Other Libraries

4. Bugs

1. einops

Problems when using it with TensorFlow:

1. Tensor type unknown to einops <class 'keras.engine.keras_tensor.KerasTensor'>

If you build the model with the functional API, especially when it calls other network modules internally, you may get:

Tensor type unknown to einops <class 'keras.engine.keras_tensor.KerasTensor'>

The fix is to switch to subclassing tf.keras.Model, instantiating the other module classes in __init__ and calling them in call(); a minimal sketch follows.
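A minimal sketch of that fix, with an einops rearrange inside call() (the class name, patch size, and projection width are illustrative):

from einops import rearrange

class PatchFlatten(tf.keras.Model):
    # Subclassing instead of the functional API: inside call(), x is a real
    # tf.Tensor rather than a KerasTensor, so einops recognizes it.
    def __init__(self, patch=4, dim=128):
        super().__init__()
        self.patch = patch
        self.proj = tf.keras.layers.Dense(dim)

    def call(self, x):
        # (batch, H, W, C) -> (batch, num_patches, patch * patch * C)
        x = rearrange(x, 'b (h p1) (w p2) c -> b (h w) (p1 p2 c)',
                      p1=self.patch, p2=self.patch)
        return self.proj(x)

y = PatchFlatten()(tf.zeros((2, 8, 8, 3)))  # -> shape (2, 4, 128)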

2. Single-line nested calls

Residual(PreNorm(dim, Attention(dim, heads=heads, dropout=dropout, num_keypoints=num_keypoints, scale_with_head=scale_with_head))),

Here Residual, PreNorm, and Attention are all separate Modules. In TensorFlow I handle this by creating one new class that inherits from Module and executes them in call order, as sketched below.
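A sketch of that wrapper; for brevity the custom Attention is stood in for by Keras's built-in MultiHeadAttention (a real port would use the ported Attention module):

class ResidualPreNormAttention(tf.keras.layers.Layer):
    # Executes PreNorm -> Attention -> Residual in order, replacing the
    # single-line nesting from the PyTorch snippet above.
    def __init__(self, dim, heads, dropout=0.):
        super().__init__()
        self.norm = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.attn = tf.keras.layers.MultiHeadAttention(
            num_heads=heads, key_dim=dim // heads, dropout=dropout)

    def call(self, x):
        y = self.norm(x)     # PreNorm
        y = self.attn(y, y)  # self-attention
        return x + y         # Residual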

3. Building networks by subclassing Module

1. With this approach the batch dimension is None, so broadcast-style operations need a concrete batch size; fix it at __init__ time, as in the sketch below.
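A minimal sketch of that workaround (the layer and parameter names are illustrative):

class AddPositionEmbedding(tf.keras.layers.Layer):
    # The batch size is fixed at construction time, because the symbolic
    # batch dimension is None when building the network by subclassing.
    def __init__(self, batch_size, num_tokens, dim):
        super().__init__()
        self.batch_size = batch_size
        self.pos = self.add_weight(name='pos', shape=(1, num_tokens, dim),
                                   initializer='zeros')

    def call(self, x):
        # Tile to the known batch size instead of broadcasting against None.
        return x + tf.tile(self.pos, [self.batch_size, 1, 1])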
