[Error Fix] ValueError: batch length of `text`: xx does not match batch length of `text_pair`: xx.

Example input and output that trigger the error

Sample code:

from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
special_tokens_dict = {'cls_token': '<CLS>'}
num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
text = ["this is the first sentences", "this is the second sentece, ", "this one is the third sentence"]
tp = ['first sentence']          # only 1 entry, but text has 3 entries
output = tokenizer(text, tp)     # raises ValueError
print(output)

The resulting error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-71da4f74dda3> in <module>()
      8 text = ["this is the first sentences", "this is the second sentece, ", "this one is the third sentence"]
      9 tp = ['first sentence']
---> 10 output = tokenizer(text,tp)
     11 print(output)

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2377             if text_pair is not None and len(text) != len(text_pair):
   2378                 raise ValueError(
-> 2379                     f"batch length of `text`: {len(text)} does not match batch length of `text_pair`: {len(text_pair)}."
   2380                 )
   2381             batch_text_or_text_pairs = list(zip(text, text_pair)) if text_pair is not None else text

ValueError: batch length of `text`: 3 does not match batch length of `text_pair`: 1.

In the sample code, note that text_pair has length 1 while text has length 3. This mismatch is the cause of the error.
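A quick length check before calling the tokenizer makes this kind of mismatch easy to spot. A minimal sketch, reusing the variable names from the sample above:

# text has 3 entries but tp has only 1, so the batched tokenizer call would fail.
text = ["this is the first sentences", "this is the second sentece, ", "this one is the third sentence"]
tp = ['first sentence']

if len(text) != len(tp):
    raise ValueError(f"len(text)={len(text)} but len(text_pair)={len(tp)}; they must match")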

Error analysis

Exception class: ValueError

Raise code

        if is_split_into_words:
            is_batched = isinstance(text, (list, tuple)) and text and isinstance(text[0], (list, tuple))
        else:
            is_batched = isinstance(text, (list, tuple))

        if is_batched:
            if isinstance(text_pair, str):
                raise TypeError(
                    "when tokenizing batches of text, `text_pair` must be a list or tuple with the same length as `text`."
                )
            if text_pair is not None and len(text) != len(text_pair):
                raise ValueError(
                    f"batch length of `text`: {len(text)} does not match batch length of `text_pair`: {len(text_pair)}."
                )
            batch_text_or_text_pairs = list(zip(text, text_pair)) if text_pair is not None else text
            return self.batch_encode_plus(...)

Source of the raise code: transformers/tokenization_utils_base.py (the file shown in the traceback)

Analysis:
This error is raised in the `__call__` method of the Transformers `PreTrainedTokenizerBase` class (tokenization_utils_base.py, as shown in the traceback). The method tokenizes and prepares for the model one or more sequences, or one or more pairs of sequences. When `text` is given as a batch, `text_pair` must be a list or tuple with the same length as `text`.
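In other words, when both arguments are batched, the tokenizer pairs text[i] with text_pair[i], which is why the two lists must have the same length. A small illustrative sketch (my own example sentences, not from the original post):

from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

text = ["how are you", "nice to meet you"]
text_pair = ["i am fine", "nice to meet you too"]

# Batched call: text[i] is encoded together with text_pair[i].
batched = tokenizer(text, text_pair)

# Equivalent per-pair calls produce the same input_ids, pair by pair.
per_pair = [tokenizer(t, p)["input_ids"] for t, p in zip(text, text_pair)]
print(batched["input_ids"] == per_pair)  # expected: True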

Solution

When `text` is a batch of sequences, make sure `text_pair` has the same length as `text`.
The corrected code:

from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')
special_tokens_dict = {'cls_token': '<CLS>'}
num_added_toks = tokenizer.add_special_tokens(special_tokens_dict)
text = ["this is the first sentences", "this is the second sentece, ", "this one is the third sentence"]
tp = ['first sentence', "second", "third"]   # now the same length as text (3 entries)
output = tokenizer(text,tp)
print(output)

Output:

{'input_ids': [[5661, 318, 262, 717, 13439, 11085, 6827], [5661, 318, 262, 1218, 1908, 68, 344, 11, 220, 12227], [5661, 530, 318, 262, 2368, 6827, 17089]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1]]}
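If the intent was to pair the same second segment with every sentence in the batch (a guess at the use case, not something stated in the original code), an alternative is to repeat it so the lengths line up:

# Hypothetical variant: reuse one pair sentence for the whole batch.
tp = ['first sentence'] * len(text)   # length 3, matches len(text)
output = tokenizer(text, tp)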