A quick way to get intermediate-layer attention weights from a Hugging Face LLM during inference

Latest recommended article published on 2024-09-09 09:47:53

NOVAglow646

Tags: artificial intelligence, deep learning

Article link: https://blog.csdn.net/NOVAglow646/article/details/140757327

Taking GPT-2 as an example, all you need to do is pass output_attentions=True in the forward call at inference time:

import torch
from transformers import GPT2Config, GPT2Model

# GPT-2 configuration
configuration = GPT2Config(
    n_positions=200,
    n_embd=256,
    n_layer=12,
    n_head=8,
    resid_pdrop=0.0,
    embd_pdrop=0.0,
    attn_pdrop=0.0,
    use_cache=False,
)
model = GPT2Model(configuration)
input_embedding = torch.randn(64, 200, 256)  # [bsz, sequence_len, n_embd]
# Passing output_attentions=True to the forward pass is all that is needed
pred = model(inputs_embeds=input_embedding, output_attentions=True)

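With output_attentions=True, the returned pred is a dict-like output object with keys such as last_hidden_state and attentions (a model with a language-modeling head, such as GPT2LMHeadModel, returns logits in place of last_hidden_state). attentions is a tuple whose length equals the number of layers in the model; its i-th element is a tensor of shape [bsz, n_head, sequence_len, sequence_len], which holds the attention weights of layer i. This is a built-in feature of the models in the transformers library, so it is very convenient to use.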
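To make the structure of attentions concrete, here is a minimal sketch that checks the properties described above on a deliberately tiny model. The sizes (2 layers, 4 heads, sequence length 10) and all variable names are illustrative choices, not part of the original example. The attn_implementation="eager" setting is an assumption: on some recent transformers versions the default attention backend may not return weights, and how this is handled can differ between versions, so treat that line as something to verify against your installed version.

```python
import torch
from transformers import GPT2Config, GPT2Model

# Deliberately tiny, illustrative sizes so the check runs quickly on CPU.
config = GPT2Config(
    n_positions=16,
    n_embd=32,
    n_layer=2,
    n_head=4,
    resid_pdrop=0.0,
    embd_pdrop=0.0,
    attn_pdrop=0.0,
    use_cache=False,
    attn_implementation="eager",  # assumption: may be needed on some versions to get weights back
)
model = GPT2Model(config).eval()

bsz, seq_len = 2, 10
inputs_embeds = torch.randn(bsz, seq_len, config.n_embd)
with torch.no_grad():
    pred = model(inputs_embeds=inputs_embeds, output_attentions=True)

attentions = pred.attentions               # tuple with one tensor per layer
num_layers = len(attentions)
layer0_shape = tuple(attentions[0].shape)  # (bsz, n_head, seq_len, seq_len)

# Each query position's weights form a probability distribution over key positions.
row_sums = attentions[0].sum(dim=-1)
rows_sum_to_one = torch.allclose(row_sums, torch.ones_like(row_sums), atol=1e-5)

# GPT-2 is causal, so the weight placed on future positions (above the diagonal) should vanish.
future_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
future_weight = attentions[0][..., future_mask].abs().max().item()

# Stack into one tensor [n_layer, bsz, n_head, seq_len, seq_len], then average over heads.
stacked = torch.stack(attentions)
head_mean = stacked.mean(dim=2)            # [n_layer, bsz, seq_len, seq_len]
```

Stacking the tuple into a single tensor is just one convenient way to slice by layer or head afterwards; for large models or long sequences it may be cheaper to keep only the layers you need, since each layer's tensor grows quadratically with sequence_len.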
