PyTorch Pitfall Series

1.torch.split()

torch.split(tensor, split_size_or_sections, dim=0)

Splits the input tensor into equally shaped chunks (when divisible).

If split_size_or_sections is an int, the tensor is split into equal-sized chunks; if the tensor's size along the given dimension is not divisible by split_size_or_sections, the last chunk will be smaller than the others.

If split_size_or_sections is a list, the tensor is split according to the list.

Parameters:

  • tensor (Tensor) – the tensor to split
  • split_size_or_sections (int or list) – size of a single chunk, or a list of chunk sizes
  • dim (int) – the dimension along which to split

Pitfall: in earlier versions the second argument of split() was named split_size (int); on a newer torch version, passing it by that keyword raises:

TypeError: split() got an unexpected keyword argument 'split_size'

The fix is simply to change split_size to split_size_or_sections.
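A quick check of both call forms (an illustrative snippet, with the printed results shown as comments):

import torch

x = torch.arange(10)

# int: equal chunks; the last one is smaller because 10 is not divisible by 3
print(torch.split(x, split_size_or_sections=3, dim=0))
# (tensor([0, 1, 2]), tensor([3, 4, 5]), tensor([6, 7, 8]), tensor([9]))

# list: explicit chunk sizes, which must sum to the size of dim 0
print(torch.split(x, split_size_or_sections=[2, 3, 5], dim=0))
# (tensor([0, 1]), tensor([2, 3, 4]), tensor([5, 6, 7, 8, 9]))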

2.torch.sort()

torch.sort(input, dim=None, descending=False, out=None) -> (Tensor, LongTensor)

Sorts the elements of input along the given dimension in ascending order. If dim is not given, it defaults to the last dimension of the input. If descending is True, the elements are sorted in descending order.

Returns a tuple (sorted_tensor, sorted_indices), where sorted_indices holds the indices of the elements in the original input.

Parameters:

  • input (Tensor) – the input tensor to sort
  • dim (int, optional) – the dimension to sort along
  • descending (bool, optional) – controls whether to sort in ascending or descending order
  • out (tuple, optional) – the output tuple of (Tensor, LongTensor) that can optionally be used as output buffers
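A quick example, which also previews the sort/unsort trick used for packing below: sorting the sort indices themselves gives the permutation that undoes the sort.

import torch

lengths = torch.tensor([2, 3, 4])
sorted_lengths, idx_sort = torch.sort(lengths, dim=0, descending=True)
print(sorted_lengths, idx_sort)    # tensor([4, 3, 2]) tensor([2, 1, 0])

# sorting the indices recovers the permutation back to the original order
_, idx_unsort = torch.sort(idx_sort, dim=0)
print(sorted_lengths[idx_unsort])  # tensor([2, 3, 4]), the original order restored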

I came to this function while handling variable-length sequences with an RNN in PyTorch. In my sequence labeling task, the sentences within a batch have different word counts, so the short sentences must be padded to the length of the longest sentence in the batch; fed into the RNN as-is, that padding would affect the model. Looking at torch.nn.RNN, the docs say this about the input:

The input can also be a packed variable length sequence. See torch.nn.utils.rnn.pack_padded_sequence() or torch.nn.utils.rnn.pack_sequence() for details.

So the pit-clearing began; the usage of these two functions follows.

3.torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True)

Packs a Tensor containing padded sequences of variable length.

input can be of size T x B x * where T is the length of the longest sequence (equal to lengths[0]), B is the batch size, and * is any number of dimensions (including 0). If batch_first is True, B x T x * input is expected.

For unsorted sequences, use enforce_sorted = False. If enforce_sorted is True, the sequences should be sorted by length in a decreasing order, i.e. input[:,0] should be the longest sequence, and input[:,B-1] the shortest one. enforce_sorted = True is only necessary for ONNX export.

NOTE

This function accepts any input that has at least two dimensions. You can apply it to pack the labels, and use the output of the RNN with them to compute the loss directly. A Tensor can be retrieved from a PackedSequence object by accessing its .data attribute.
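A small sketch of that note, with hypothetical shapes and names of my own for an NER-style setup: pack the padded labels the same way as the input, then compute the loss on the packed .data tensors, so the padding never enters the loss.

import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pack_padded_sequence

logits = torch.randn(3, 4, 6)          # (batch, max_len, num_tags), hypothetical
labels = torch.randint(0, 6, (3, 4))   # padded tag ids in the same layout
lengths = torch.tensor([4, 3, 2])      # already decreasing, so the default enforce_sorted=True is fine
packed_logits = pack_padded_sequence(logits, lengths, batch_first=True)
packed_labels = pack_padded_sequence(labels, lengths, batch_first=True)
# .data keeps only the valid timesteps
loss = F.cross_entropy(packed_logits.data, packed_labels.data)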

Parameters

  • input (Tensor) – padded batch of variable length sequences.

  • lengths (Tensor) – list of sequence lengths of each batch element.

  • batch_first (bool, optional) – if True, the input is expected in B x T x * format.

  • enforce_sorted (bool, optional) – if True, the input is expected to contain sequences sorted by length in a decreasing order. If False, this condition is not checked. Default: True.

Returns

PackedSequence object

This function matters when an RNN processes variable-length sequences, e.g. in an NER task: the sentences in an input batch inevitably have different lengths, so to keep the dimensions consistent we pad the short sentences to the length of the long ones, but we do not want those padded values to participate in training. This function is how you tell the RNN about the padding. Now to the parameters and input.

input (T x B x *): T is max_len, i.e. the longest sentence in the batch; B is the batch size; the trailing dimension is usually emb_dim.

lengths is the effective (unpadded) length of each sentence in the batch.

The enforce_sorted parameter apparently did not exist before version 1.1.0. It controls whether input and lengths must already be sorted by effective length in decreasing order: with the default True (and always, on older versions without the parameter), you have to do the descending sort yourself. I had never set this parameter, so I had to handle the sorting manually:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

word_emb = torch.randn(3, 4, 5)      # (batch, max_len, emb_dim)
print(word_emb)
lengths = torch.tensor([2, 3, 4])    # effective length of each sentence
# sort the batch by length in decreasing order, remembering how to undo it
_, idx_sort = torch.sort(lengths, dim=0, descending=True)
_, idx_unsort = torch.sort(idx_sort, dim=0)
word_emb = word_emb.index_select(0, idx_sort)
lengths = lengths[idx_sort].tolist()
word_emb_after_packed = pack_padded_sequence(word_emb, lengths, batch_first=True)
print(word_emb_after_packed)
'''
word_emb:
tensor([[[-0.8626, -0.3088, -1.3562, -0.2448, -0.2467],
         [-0.2388,  1.1539,  0.1325, -0.6413, -0.7694],
         [ 0.3657,  0.4548, -1.3313,  1.8620, -1.1079],
         [-0.5403, -0.9424,  2.2213,  0.7689,  0.8932]],

        [[-0.9333,  1.5342, -1.4053,  0.7799,  0.4838],
         [ 0.1614, -0.2331,  1.6667, -1.7032,  0.3099],
         [-0.6210, -0.4821, -1.5498, -0.1731, -0.6864],
         [ 0.0037,  1.0089, -2.5998, -0.3588,  0.0582]],

        [[ 0.1429, -0.6191,  0.1100, -0.6952,  0.7599],
         [ 1.0877, -0.6400,  1.9040, -1.6933, -0.7815],
         [-1.9465, -0.7313, -0.0445, -1.9152,  1.7431],
         [-1.3321,  1.3924, -0.4106, -1.5812,  0.2697]]])
word_emb_after_packed:
PackedSequence(data=tensor([[ 0.1429, -0.6191,  0.1100, -0.6952,  0.7599],
        [-0.9333,  1.5342, -1.4053,  0.7799,  0.4838],
        [-0.8626, -0.3088, -1.3562, -0.2448, -0.2467],
        [ 1.0877, -0.6400,  1.9040, -1.6933, -0.7815],
        [ 0.1614, -0.2331,  1.6667, -1.7032,  0.3099],
        [-0.2388,  1.1539,  0.1325, -0.6413, -0.7694],
        [-1.9465, -0.7313, -0.0445, -1.9152,  1.7431],
        [-0.6210, -0.4821, -1.5498, -0.1731, -0.6864],
        [-1.3321,  1.3924, -0.4106, -1.5812,  0.2697]]), batch_sizes=tensor([3, 3, 2, 1]), sorted_indices=None, unsorted_indices=None)
'''

Here the input has been rearranged so that, time step by time step, the embeddings of the words at the same position across the batch are laid out one after another. The PackedSequence field batch_sizes=tensor([3, 3, 2, 1]) says that the first four RNN steps see 3, 3, 2, and 1 valid words respectively.
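Incidentally, on torch >= 1.1.0 you can skip the manual sorting entirely with enforce_sorted=False; the PackedSequence then carries the sort/unsort bookkeeping itself (a minimal sketch):

import torch
from torch.nn.utils.rnn import pack_padded_sequence

packed = pack_padded_sequence(torch.randn(3, 4, 5), torch.tensor([2, 3, 4]),
                              batch_first=True, enforce_sorted=False)
print(packed.sorted_indices, packed.unsorted_indices)
# tensor([2, 1, 0]) tensor([2, 1, 0]); pad_packed_sequence will unsort for you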

Now we have the packed RNN input, and the model only consumes embeddings up to each valid length. But how do we restore the packed data to the original padded form? Equivalently, the model's output is packed too; how do we unpack it back into the padded layout?

4.torch.nn.utils.rnn.pad_packed_sequence

Pads a packed batch of variable length sequences.

It is an inverse operation to pack_padded_sequence().

The returned Tensor’s data will be of size T x B x *, where T is the length of the longest sequence and B is the batch size. If batch_first is True, the data will be transposed into B x T x * format.

Batch elements will be ordered decreasingly by their length.

Parameters

  • sequence (PackedSequence) – batch to pad

  • batch_first (bool, optional) – if True, the output will be in B x T x * format.

  • padding_value (float, optional) – values for padded elements.

  • total_length (int, optional) – if not None, the output will be padded to have length total_length. This method will throw ValueError if total_length is less than the max sequence length in sequence.

Returns

Tuple of Tensor containing the padded sequence, and a Tensor containing the list of lengths of each sequence in the batch.

Straight to the code:

from torch.nn.utils.rnn import pad_packed_sequence

word_padded = pad_packed_sequence(word_emb_after_packed, batch_first=True)
print(word_padded)
# undo the earlier sort to restore the original batch order
output = word_padded[0].index_select(0, idx_unsort)
'''
word_padded:
(tensor([[[ 0.1429, -0.6191,  0.1100, -0.6952,  0.7599],
         [ 1.0877, -0.6400,  1.9040, -1.6933, -0.7815],
         [-1.9465, -0.7313, -0.0445, -1.9152,  1.7431],
         [-1.3321,  1.3924, -0.4106, -1.5812,  0.2697]],

        [[-0.9333,  1.5342, -1.4053,  0.7799,  0.4838],
         [ 0.1614, -0.2331,  1.6667, -1.7032,  0.3099],
         [-0.6210, -0.4821, -1.5498, -0.1731, -0.6864],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],

        [[-0.8626, -0.3088, -1.3562, -0.2448, -0.2467],
         [-0.2388,  1.1539,  0.1325, -0.6413, -0.7694],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]), tensor([4, 3, 2]))
output:
tensor([[[-0.8626, -0.3088, -1.3562, -0.2448, -0.2467],
         [-0.2388,  1.1539,  0.1325, -0.6413, -0.7694],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],

        [[-0.9333,  1.5342, -1.4053,  0.7799,  0.4838],
         [ 0.1614, -0.2331,  1.6667, -1.7032,  0.3099],
         [-0.6210, -0.4821, -1.5498, -0.1731, -0.6864],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000]],

        [[ 0.1429, -0.6191,  0.1100, -0.6952,  0.7599],
         [ 1.0877, -0.6400,  1.9040, -1.6933, -0.7815],
         [-1.9465, -0.7313, -0.0445, -1.9152,  1.7431],
         [-1.3321,  1.3924, -0.4106, -1.5812,  0.2697]]])
'''

Compare word_emb_after_packed and word_padded, along with the final output, and it all becomes easy to follow.
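To close the loop, a minimal end-to-end sketch continuing the tensors above: feed the packed batch through an RNN, unpack the output, and unsort it (hidden size 7 is an arbitrary choice of mine):

import torch.nn as nn
from torch.nn.utils.rnn import pad_packed_sequence

rnn = nn.RNN(input_size=5, hidden_size=7, batch_first=True)
packed_out, h_n = rnn(word_emb_after_packed)   # the output is also a PackedSequence
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
out = out.index_select(0, idx_unsort)          # back to the original batch order
print(out.shape)   # torch.Size([3, 4, 7]); padded timesteps are all zeros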

5. Implementing a CNN

Finally, a from-scratch NumPy implementation of the forward pass of a 2D convolution. The trick is im2col: every ksize x ksize patch of the input is unrolled into one row, so the whole convolution reduces to a single matrix multiplication against the flattened kernels.

import math
import numpy as np

class Conv2D(object):
    def __init__(self, shape, output_channels, ksize=3, stride=1, method='VALID'):
        self.input_shape = shape
        self.output_channels = output_channels
        self.input_channels = shape[-1]
        self.batchsize = shape[0]
        self.stride = stride
        self.ksize = ksize
        self.method = method
        # He-style initialization: scale the weights by sqrt(fan_in / 2)
        weights_scale = math.sqrt(ksize * ksize * self.input_channels / 2)
        self.weights = np.random.standard_normal(
            (ksize, ksize, self.input_channels, self.output_channels)) / weights_scale
        self.bias = np.random.standard_normal(self.output_channels) / weights_scale
        # eta fixes the output shape (and doubles as the gradient buffer in backprop)
        if method == 'VALID':
            self.eta = np.zeros((shape[0], (shape[1] - ksize) // self.stride + 1,
                                 (shape[2] - ksize) // self.stride + 1, self.output_channels))
        if method == 'SAME':
            # note: forward() does not pad the input itself, so 'SAME' requires
            # padding the input by ksize // 2 before calling it
            self.eta = np.zeros((shape[0], shape[1] // self.stride,
                                 shape[2] // self.stride, self.output_channels))

    def forward(self, x):
        # flatten the kernels into a (ksize*ksize*in_channels, out_channels) matrix
        col_weights = self.weights.reshape([-1, self.output_channels])
        self.col_image = []
        conv_out = np.zeros(self.eta.shape)
        for i in range(self.batchsize):
            img_i = x[i][np.newaxis, :]
            # unroll the patches into rows: (out_h*out_w, ksize*ksize*in_channels)
            col_image_i = im2col(img_i, self.ksize, self.stride)
            self.col_image.append(col_image_i)
            # the convolution is now a single matrix multiply per image
            conv_out[i] = np.reshape(np.dot(col_image_i, col_weights) + self.bias,
                                     self.eta[0].shape)
        return conv_out


def im2col(image, ksize, stride):
    # image is a 4-d tensor (batchsize, height, width, channels);
    # each ksize x ksize x channels patch becomes one row, in row-major patch order
    image_col = []
    for i in range(0, image.shape[1] - ksize + 1, stride):
        for j in range(0, image.shape[2] - ksize + 1, stride):
            col = image[:, i:i + ksize, j:j + ksize, :].reshape([-1])
            image_col.append(col)
    return np.array(image_col)


if __name__ == '__main__':
    img = np.ones((1, 5, 5, 3)) * 2
    conv = Conv2D(img.shape, 12, 3, 1)
    out = conv.forward(img)    # shape (1, 3, 3, 12) for a 5x5 'VALID' convolution
    print(out)
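As a sanity check (my own addition, not part of the original post), the result can be compared against torch.nn.functional.conv2d; note the layout differences, NHWC here versus NCHW in PyTorch, and HWIO weights versus OIHW:

import torch
import torch.nn.functional as F

out_ref = F.conv2d(
    torch.from_numpy(img.transpose(0, 3, 1, 2)),            # NHWC -> NCHW
    torch.from_numpy(conv.weights.transpose(3, 2, 0, 1)),   # HWIO -> OIHW
    torch.from_numpy(conv.bias),
)
print(np.allclose(out.transpose(0, 3, 1, 2), out_ref.numpy()))  # True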

 
