PyTorch frequency filtering: keeping high-frequency or low-frequency items

Latest recommended article published on 2022-10-26 00:56:53

weixin_37763484

Categories: python, deep learning. Tags: pytorch, frequency filtering

Article link: https://blog.csdn.net/weixin_37763484/article/details/114592096

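In an experiment I ran into the following requirement: given a 2-D tensor of shape [batch_size, seq_length], find the most frequent item in each row (that is, in each sample of the batch), zero out all other items while keeping the original shape, and then look up its embedding.
For example, if the input is [[1,1,1,2,2],[3,3,4,4,4]], the expected result is [[1,1,1,0,0],[0,0,4,4,4]], after which the embeddings of 1 and 4 are looked up. (Obtaining [1,4] instead works in much the same way.) The code is as follows: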

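import numpy as np
import torch

# Key code; the data is shown further below.
# This fragment runs inside a model method: self.embedding is the model's nn.Embedding layer.
# tmp: integer tensor of shape [batch_size, seq_length], with 0 used as padding.
tmp_np = tmp.cpu().numpy()
top_item = []
for row in tmp_np:
    # Count the non-padding values in this row.
    values, counts = np.unique(row[row != 0], return_counts=True)
    # Most frequent value; fall back to 0 when the row contains only padding.
    top_item.append(values[counts.argmax()] if values.size else 0)
# Shape [batch_size, 1], so it broadcasts across seq_length (no hard-coded 20).
top_item = np.asarray(top_item, dtype=tmp_np.dtype)[:, None]
x = np.where(tmp_np == top_item, tmp_np, 0)
top_item_tensor = torch.from_numpy(x).long()
top_item_emb = self.embedding(top_item_tensor.to(tmp.device))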
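To try the approach on the toy input from the example, here is a minimal self-contained sketch. The function name keep_most_frequent and the pad parameter are made up for illustration and are not part of the original code; the embedding lookup is left out so that the snippet runs without a model.

```python
import numpy as np
import torch


def keep_most_frequent(batch: torch.Tensor, pad: int = 0) -> torch.Tensor:
    """Zero out everything except each row's most frequent non-padding value.

    Hypothetical helper: `batch` is an integer tensor of shape
    [batch_size, seq_length] and `pad` is the padding id (assumed to be 0).
    """
    arr = batch.cpu().numpy()
    # One "top item" per row, shaped [batch_size, 1] so it broadcasts over columns.
    top = np.full((arr.shape[0], 1), pad, dtype=arr.dtype)
    for i, row in enumerate(arr):
        values, counts = np.unique(row[row != pad], return_counts=True)
        if values.size:  # rows that contain only padding keep `pad` as top item
            top[i, 0] = values[counts.argmax()]
    kept = np.where(arr == top, arr, pad)
    return torch.from_numpy(kept).to(batch.device)


toy = torch.tensor([[1, 1, 1, 2, 2], [3, 3, 4, 4, 4]])
filtered = keep_most_frequent(toy)
print(filtered)
```

Because np.unique returns its values sorted and argmax picks the first maximum, a tie between two equally frequent items is resolved in favor of the smaller value in this sketch.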

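Here tmp is an input like [[1,1,1,2,2],[3,3,4,4,4]], and top_item_tensor is a tensor like [[1,1,1,0,0],[0,0,4,4,4]].
top_item is an auxiliary variable that records the intermediate result (the most frequent item of each row).
If your code runs on the CPU, you can ignore the cpu(), cuda() and similar calls in the code.
The data used looks like this: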

tmp
Out[273]:
tensor([[912, 912, 912,  ...,   0,   0,   0],
        [ 79,  79,  79,  ...,   0,   0,   0],
        [342, 871, 342,  ...,   0,   0,   0],
        ...,
        [882, 349, 346,  ...,   0,   0,   0],
        [722, 785, 873,  ...,   0,   0,   0],
        [785,   0,   0,  ...,   0,   0,   0]], device='cuda:0')
        
tmp.shape
Out[279]: torch.Size([128, 20])

top_item_tensor
Out[274]:
tensor([[912, 912, 912,  ...,   0,   0,   0],
        [ 79,  79,  79,  ...,   0,   0,   0],
        [342,   0, 342,  ...,   0,   0,   0],
        ...,
        [  0,   0,   0,  ...,   0,   0,   0],
        [  0,   0,   0,  ...,   0,   0,   0],
        [785,   0,   0,  ...,   0,   0,   0]])
top_item_tensor.shape
Out[278]: torch.Size([128, 20])


self.embedding
Out[276]: Embedding(12346, 64)

top_item_emb.shape
Out[277]: torch.Size([128, 1, 64])
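A note on the last shape: [128, 1, 64] is what nn.Embedding returns for an index tensor of shape [128, 1], whereas embedding the full [128, 20] tensor would give [128, 20, 64]. This matches the "[1,4]" variant mentioned at the start, in which only one item per row is kept. The post does not show code for that variant, so the following is only a sketch under assumptions: the helper name top_item_per_row and the embedding sizes are hypothetical, padding is assumed to be 0, and the counting is done in PyTorch rather than NumPy.

```python
import torch
import torch.nn as nn


def top_item_per_row(batch: torch.Tensor, pad: int = 0) -> torch.Tensor:
    """Return each row's most frequent non-padding value, shape [batch_size, 1]."""
    # same[b, i, j] is True when positions i and j of row b hold the same value.
    same = batch.unsqueeze(2) == batch.unsqueeze(1)
    # For every position, how often its value occurs in the row.
    counts = same.sum(dim=2)
    # Padding positions must never win the argmax.
    counts = counts.masked_fill(batch == pad, 0)
    idx = counts.argmax(dim=1, keepdim=True)
    return batch.gather(1, idx)


toy = torch.tensor([[1, 1, 1, 2, 2], [3, 3, 4, 4, 4]])
top = top_item_per_row(toy)  # [[1], [4]]
# The shape-preserving result can be rebuilt from `top` by broadcasting.
filtered = torch.where(toy == top, toy, torch.zeros_like(toy))

embedding = nn.Embedding(10, 8)  # hypothetical sizes, for illustration only
top_emb = embedding(top)
print(top.tolist(), filtered.tolist(), tuple(top_emb.shape))
```

The pairwise comparison needs memory proportional to batch_size × seq_length², which is negligible for seq_length = 20 but worth reconsidering for long sequences.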
