Obtaining sentence embeddings for input text with the T5 model from transformers

 

from transformers import T5Tokenizer, T5Model
import torch

MODEL_NAME = 't5-small'
print(f'Loading {MODEL_NAME} Model...')

# Load the model and tokenizer
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5Model.from_pretrained(MODEL_NAME)

# Tokenize the input texts
text = ['Hello world!', 'Hello python!']
inputs = tokenizer(text, return_tensors='pt', padding=True)


output = model.encoder(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'], return_dict=True)
pooled_sentence = output.last_hidden_state # shape is [batch_size, seq_len, hidden_size]
# pooled_sentence holds one hidden state per token;
# sum or average over the sequence dimension to get one vector per sentence
pooled_sentence = torch.mean(pooled_sentence, dim=1)

# This yields an n_samples x 512 matrix of sentence vectors (t5-small's hidden size is 512)
print('pooled_sentence.shape', pooled_sentence.shape)
print(pooled_sentence)

# Output:
# pooled_sentence.shape torch.Size([2, 512])
# tensor([[ 0.0123,  0.0010,  0.0202,  ..., -0.0176,  0.0122, -0.1353],
#         [ 0.0854,  0.0613, -0.0568,  ...,  0.0230, -0.0131, -0.2288]],
#        grad_fn=<MeanBackward1>)
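One caveat with the plain `torch.mean` above: when the batch is padded, the padding positions' hidden states are averaged in as well. A common refinement (not part of the original snippet) is a mask-aware mean that pools only real tokens. The sketch below uses plain PyTorch with toy tensors so it runs without downloading a model; the function name `masked_mean_pool` is my own:

```python
import torch

def masked_mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions.

    hidden_states:  [batch_size, seq_len, hidden_size]
    attention_mask: [batch_size, seq_len] with 1 for real tokens, 0 for padding
    """
    # Expand the mask to [batch, seq_len, 1] so it broadcasts over hidden_size
    mask = attention_mask.unsqueeze(-1).float()
    # Zero out padding positions, then sum over the sequence dimension
    summed = (hidden_states * mask).sum(dim=1)
    # Divide by the number of real tokens per sentence (clamped to avoid /0)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

# Toy check: the third position is padding and must not affect the mean
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = torch.tensor([[1, 1, 0]])
print(masked_mean_pool(hidden, mask))  # tensor([[2., 3.]])
```

With the encoder output above, the call would be `masked_mean_pool(output.last_hidden_state, inputs['attention_mask'])` in place of `torch.mean(..., dim=1)`.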

Reference:

https://stackoverflow.com/questions/64579258/sentence-embedding-using-t5
