Adaptive Input Representations: Source Code Reading Notes

Latest recommended article published on 2022-05-15 13:00:34

菜小白—NLP

Category: NLP. Tags: natural language processing

Article link: https://blog.csdn.net/ACM_hades/article/details/104543812

1. Reference links

Theory walkthrough: https://blog.csdn.net/ACM_hades/article/details/104541116
Code reference link: https://editor.csdn.net/md?articleId=104543812

2. Code

import torch.nn as nn

class AdaptiveInput(nn.Module):
    """
    Adaptive input embedding layer. This implementation is adapted from its
    adaptive softmax counterpart in PyTorch:
    https://pytorch.org/docs/stable/_modules/torch/nn/modules/adaptive.html
    """
    def __init__(self, in_features, n_classes, cutoffs=None,div_value=4., head_bias=False):
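        """
        :param in_features: embedding dimension (the parameter d in the paper)
        :param n_classes: vocabulary size
        :param cutoffs: a list that controls how the vocabulary is partitioned. For example,
        cutoffs = [10, 100, 1000] splits the vocabulary into four clusters:
        [0, 9], [10, 99], [100, 999] and [1000, n_classes - 1]
        :param div_value: the parameter k in the paper (each successive cluster's embedding dimension is divided by k)
        :param head_bias: whether the head's linear projection uses a bias term
        """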
        super(AdaptiveInput, self).__init__()  # initialize the parent class
        if not cutoffs:
            cutoffs = [10000, 60000, 190000]
        cutoffs = list(cutoffs)
        # validate cutoffs
        if (cutoffs != sorted(cutoffs)) \
                or (min(cutoffs) <= 0) \
                or (max(cutoffs) >= (n_classes - 1)) \
                or (len(set(cutoffs)) != len(cutoffs)) \
                or any([int(c) != c for c in cutoffs]):
            raise ValueError("cutoffs should be a sequence of unique, positive "
                             "integers sorted in an increasing order, where "
                             "each value is between 1 and n_classes-1")

        self.in_features = in_features
        self.n_classes = n_classes
        self.cutoffs = cutoffs + [n_classes]  # append the vocabulary size as the final boundary
        self.div_value = div_value
        self.head_bias = head_bias
        # The first vocabulary subset V1 is called the head; the remaining subsets Vi are called clusters
        self.n_clusters = len(self.cutoffs) - 1  # number of tail clusters
        self.head_size = self.cutoffs[0]  # size of subset V1
        # Embedding matrix E1 and projection matrix W1 for V1
        self.head = nn.Sequential(nn.Embedding(self.head_size, self.in_features),
                                  nn.Linear(self.in_features, self.in_features, bias=self.head_bias))
        # Embedding matrices Ei and projection matrices Wi for the remaining Vi are stored in this list
        self.tail = nn.ModuleList()
        for i in range(self.n_clusters):
            hsz = int(self.in_features // (self.div_value ** (i + 1)))  # embedding dimension of this cluster
            osz = self.cutoffs[i + 1] - self.cutoffs[i]  # number of words in this cluster
            # define Ei and Wi
            projection = nn.Sequential(
                nn.Embedding(osz, hsz),
                nn.Linear(hsz, self.in_features, bias=False),
            )
            # append to the ModuleList
            self.tail.append(projection)


    def forward(self, input):
        """
        :param input: a 1-D LongTensor of shape [q_len] holding one sentence; each element is a word's index in the vocabulary
        """
        used_rows = 0
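        input_size = list(input.size())  # [q_len]

        output = input.new_zeros(input_size + [self.in_features]).float()  # [q_len, in_features]

        cutoff_values = [0] + self.cutoffs
        for i in range(len(cutoff_values) - 1):

            low_idx = cutoff_values[i]  # index of the first word in Vi (inclusive)
            high_idx = cutoff_values[i + 1]  # one past the index of the last word in Vi (exclusive)

            input_mask = (input >= low_idx) & (input < high_idx)  # True for words that belong to Vi, False otherwise
            # positions in the sentence whose words belong to Vi; squeeze(1) keeps the result 1-D even for a single match
            row_indices = input_mask.nonzero().squeeze(1)

            if row_indices.numel() == 0:  # the sentence contains no words from Vi
                continue
            # select the words that belong to Vi, shift them to cluster-local indices, and embed them
            out = self.head(input[input_mask] - low_idx) if i == 0 else self.tail[i - 1](input[input_mask] - low_idx)
            output.index_copy_(0, row_indices, out)  # write each embedding back at the word's original position in the sentence
            used_rows += row_indices.numel()  # count how many words have been converted to embeddings so far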

        if used_rows != input_size[0]:
            raise RuntimeError("Target values should be in [0, {}], "
                               "but values in range [{}, {}] "
                               "were found. ".format(self.n_classes - 1,
                                                     input.min().item(),
                                                     input.max().item()))
        return output
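
The routing that the masks perform in forward can be illustrated without tensors. The following is a minimal pure-Python sketch, not part of the original code: the helper name `locate` is hypothetical, and it assumes the same half-open ranges [low_idx, high_idx) that forward uses.

```python
from bisect import bisect_right


def locate(token_id, cutoffs, n_classes):
    """Return (cluster, local_index) for a token id.

    Cluster 0 is the head; cluster i >= 1 corresponds to self.tail[i - 1].
    `locate` is a hypothetical helper used only for illustration.
    """
    if not 0 <= token_id < n_classes:
        raise ValueError("token id out of range")
    # Same boundaries as cutoff_values in forward: [0] + cutoffs + [n_classes]
    boundaries = [0] + list(cutoffs) + [n_classes]
    # The cluster is the last boundary that is <= token_id
    cluster = bisect_right(boundaries, token_id) - 1
    # Subtracting the lower boundary gives the row in that cluster's embedding table
    return cluster, token_id - boundaries[cluster]


cutoffs, n_classes = [4, 8, 16], 100
for token_id in (0, 3, 4, 7, 8, 15, 16, 99):
    print(token_id, locate(token_id, cutoffs, n_classes))
```

With cutoffs = [4, 8, 16], token 4 is the first row of the first tail cluster, which is why forward subtracts low_idx before the embedding lookup.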


# Example
import torch
x = torch.arange(0,100).long()
inp = AdaptiveInput(128, 100, cutoffs=[4,8,16])
print(inp(x))
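
To see what the example above builds, the per-cluster sizes can be computed by hand. The sketch below is an illustration under stated assumptions rather than part of the original code: `cluster_layout` and `parameter_count` are hypothetical helpers that mirror the arithmetic in `__init__`, and the count assumes head_bias=False.

```python
def cluster_layout(in_features, n_classes, cutoffs, div_value=4.0):
    """List (vocab_size, embed_dim) per cluster, head first (hypothetical helper)."""
    bounds = list(cutoffs) + [n_classes]
    layout = [(bounds[0], in_features)]  # the head keeps the full dimension d
    for i in range(len(bounds) - 1):
        # same formula as hsz in __init__: d // k^(i+1)
        embed_dim = int(in_features // (div_value ** (i + 1)))
        layout.append((bounds[i + 1] - bounds[i], embed_dim))
    return layout


def parameter_count(in_features, layout):
    """Embedding weights plus bias-free projection weights for every cluster."""
    return sum(v * d + d * in_features for v, d in layout)


layout = cluster_layout(128, 100, [4, 8, 16])
print(layout)  # [(4, 128), (4, 32), (8, 8), (84, 2)]
print("adaptive:", parameter_count(128, layout))  # adaptive: 22632
print("plain embedding:", 100 * 128)  # plain embedding: 12800

# A larger, purely illustrative vocabulary (260000 is an arbitrary assumed size)
big = cluster_layout(1024, 260000, [10000, 60000, 190000])
print("adaptive:", parameter_count(1024, big), "plain embedding:", 260000 * 1024)
```

For the toy example the projection matrices dominate, so the adaptive layer is larger than a plain 100 x 128 embedding. The reduction only shows up when the vocabulary is large relative to the embedding dimension, as in the second configuration.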
