paper_longformer

愚昧之山绝望之谷开悟之坡

Article link: https://blog.csdn.net/qq_15821487/article/details/120223369


LongformerModel(
  (embeddings): LongformerEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=1)
    (position_embeddings): Embedding(512, 768, padding_idx=1)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): LongformerEncoder(
    (layer): ModuleList(
      (0): LongformerLayer(
        (attention): LongformerAttention(
          (self): LongformerSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (query_global): Linear(in_features=768, out_features=768, bias=True)
            (key_global): Linear(in_features=768, out_features=768, bias=True)
            (value_global): Linear(in_features=768, out_features=768, bias=True)
          )
          (output): LongformerSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): LongformerIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
        )
        (output): LongformerOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )

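Longformer
Input embedding layer (LongformerEmbeddings)
Encoder layers (LongformerEncoder), each containing multi-head self-attention that is split into local attention and global attention
Attention output layer (LongformerSelfOutput)
Intermediate linear layer (LongformerIntermediate)
Output layer (LongformerOutput)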
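
To make the split between local and global attention concrete, here is a minimal sketch in plain Python. It only builds the pattern of which positions may attend to which; it is not the library's implementation. The function name `longformer_style_mask` and its parameters are hypothetical, chosen for illustration. In the printed model, the global part additionally has its own projections (`query_global`, `key_global`, `value_global`).

```python
def longformer_style_mask(seq_len, window, global_positions):
    """Return a seq_len x seq_len matrix of booleans.

    mask[i][j] is True when query position i may attend to key position j.
    Hypothetical helper for illustration only.
    """
    half = window // 2
    global_set = set(global_positions)
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            # Local attention: only neighbours inside the sliding window.
            local = abs(i - j) <= half
            # Global attention: a global token attends everywhere,
            # and every token attends to the global token.
            is_global = i in global_set or j in global_set
            row.append(local or is_global)
        mask.append(row)
    return mask


mask = longformer_style_mask(seq_len=8, window=2, global_positions=[0])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each printed row is one query position. Position 0 is marked global here, so its row and its column are fully filled, while every other row only has a narrow band around the diagonal.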
Printing the model structure makes all of this clear at a glance.
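
A printout like the one above can be produced roughly as follows. This is a sketch that assumes the `transformers` and `torch` packages are installed. It builds a randomly initialised model from a `LongformerConfig`, so nothing is downloaded. Using a single layer is my own choice to keep the output short; the original post does not say which checkpoint or configuration it printed.

```python
from transformers import LongformerConfig, LongformerModel

# Assumption: library default sizes, with only one encoder layer
# so that the printed structure stays short.
config = LongformerConfig(num_hidden_layers=1)
model = LongformerModel(config)  # random weights, no download

# nn.Module defines __repr__, so print() shows the nested layer structure.
print(model)

# Top-level submodules, and the extra global-attention projection
# that sits next to the usual query/key/value layers.
top_level = [name for name, _ in model.named_children()]
print(top_level)
print(model.encoder.layer[0].attention.self.query_global)
```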
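The downloaded model files are cached in the directory below. The path is laid out the same way as on Windows; only the home-directory prefix may differ, and it can be abbreviated as ~.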
cd ~/.cache/huggingface/transformers
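
The same path can be resolved from Python, where `~` has to be expanded explicitly. This is a small sketch using only the standard library. The directory is the one named in this post; the actual cache location can differ between library versions and can be overridden through environment variables, so treat it as an assumption.

```python
from pathlib import Path

# "~" is not expanded automatically by pathlib; expanduser() does it.
cache_dir = Path("~/.cache/huggingface/transformers").expanduser()
print(cache_dir)

# List a few entries if the directory exists on this machine.
if cache_dir.is_dir():
    for entry in sorted(cache_dir.iterdir())[:10]:
        print(entry.name)
else:
    print("cache directory not found on this machine")
```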

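In Studio, Jupyter, or Colab you can display a variable directly without calling print: the notebook shows the value of the last expression in a cell by default. To see a tensor's shape, call .size() on the tensor.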
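
As a minimal sketch of checking tensor sizes, assuming PyTorch is installed: the tensor below is a dummy filled with zeros, shaped like a batch of hidden states with the 768-dimensional width seen in the printout. In a script, `print` is still needed; in a notebook, the last line of a cell could simply be `hidden.size()`.

```python
import torch

# Dummy tensor: (batch, sequence length, hidden size). Values are illustrative.
hidden = torch.zeros(2, 512, 768)

print(hidden.size())    # full shape as a torch.Size
print(hidden.size(-1))  # size of one dimension, here the last one
print(hidden.shape)     # .shape is an equivalent attribute
```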


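The Longformer model classes all inherit from LongformerPreTrainedModel. LongformerModel is the plain feature extractor with no downstream head attached; the other classes add a head for a specific downstream task. The shared base class takes care of defining and loading the parameters.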
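
The inheritance relationship can be checked directly. This sketch assumes the `transformers` package (with PyTorch) is installed, and the two task-specific classes are just examples I picked; the post does not list which ones it had in mind.

```python
from transformers import (
    LongformerForQuestionAnswering,
    LongformerForSequenceClassification,
    LongformerModel,
    LongformerPreTrainedModel,
)

# One bare feature extractor and two task-specific classes (example choice).
classes = [
    LongformerModel,
    LongformerForSequenceClassification,
    LongformerForQuestionAnswering,
]

# Record whether each class derives from the shared base class.
results = {cls.__name__: issubclass(cls, LongformerPreTrainedModel) for cls in classes}
for name, is_subclass in results.items():
    print(name, is_subclass)
```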
