transformers PreTrainedTokenizer

class transformers.PreTrainedTokenizer

Class attributes (overridden by derived classes)

Attribute: Description
vocab_files_names (Dict[str, str])
pretrained_vocab_files_map (Dict[str, Dict[str, str]])
max_model_input_sizes (Dict[str, Optional[int]])
pretrained_init_configuration (Dict[str, Dict[str, Any]])
model_input_names (List[str])
padding_side (str)
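
The attributes above are plain class-level values that each concrete tokenizer overrides. The sketch below only illustrates the shape of each attribute. BaseTokenizerSketch, ToyTokenizer, describe, the "toy-base" checkpoint name, and the example.com URL are all hypothetical and not part of transformers.

```python
from typing import Any, Dict, List, Optional


class BaseTokenizerSketch:
    """Hypothetical stand-in for a tokenizer base class (not the transformers class)."""

    vocab_files_names: Dict[str, str] = {}
    pretrained_vocab_files_map: Dict[str, Dict[str, str]] = {}
    max_model_input_sizes: Dict[str, Optional[int]] = {}
    pretrained_init_configuration: Dict[str, Dict[str, Any]] = {}
    model_input_names: List[str] = ["input_ids", "token_type_ids", "attention_mask"]
    padding_side: str = "right"


class ToyTokenizer(BaseTokenizerSketch):
    # __init__ keyword name -> file name used when saving the vocabulary
    vocab_files_names = {"vocab_file": "vocab.txt"}
    # __init__ keyword name -> {checkpoint shortcut name -> URL of the vocabulary file}
    pretrained_vocab_files_map = {
        "vocab_file": {"toy-base": "https://example.com/toy-base/vocab.txt"}
    }
    # checkpoint shortcut name -> maximum input length (None means no limit)
    max_model_input_sizes = {"toy-base": 512}
    # checkpoint shortcut name -> extra arguments passed to __init__
    pretrained_init_configuration = {"toy-base": {"do_lower_case": True}}
    model_input_names = ["input_ids", "attention_mask"]
    padding_side = "left"


def describe(cls, checkpoint: str) -> Dict[str, Any]:
    """Collect the per-checkpoint settings that a derived class declares."""
    return {
        "files": {k: v[checkpoint] for k, v in cls.pretrained_vocab_files_map.items()},
        "max_length": cls.max_model_input_sizes.get(checkpoint),
        "init_kwargs": cls.pretrained_init_configuration.get(checkpoint, {}),
        "padding_side": cls.padding_side,
    }


info = describe(ToyTokenizer, "toy-base")
print(info)
```

The base class keeps the defaults, and the derived class replaces only the values that differ for its model family.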

Parameters

Parameter: Description
model_max_length (int, optional)
padding_side (str, optional)
model_input_names (List[str], optional)
bos_token (str or tokenizers.AddedToken, optional)
eos_token (str or tokenizers.AddedToken, optional)
unk_token (str or tokenizers.AddedToken, optional)
sep_token (str or tokenizers.AddedToken, optional)
pad_token (str or tokenizers.AddedToken, optional)
cls_token (str or tokenizers.AddedToken, optional)
mask_token (str or tokenizers.AddedToken, optional)
additional_special_tokens (tuple or list of str or tokenizers.AddedToken, optional)
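
As a rough illustration of what cls_token and sep_token are for, the sketch below frames a single sequence or a pair in the BERT-style layout [CLS] A [SEP] B [SEP]. That layout is an assumption: other model families place their special tokens differently. add_special_tokens_sketch is a hypothetical helper, not a library function.

```python
from typing import Dict, List, Optional


def add_special_tokens_sketch(
    tokens_a: List[str],
    tokens_b: Optional[List[str]] = None,
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> Dict[str, list]:
    """Wrap one or two token lists with BERT-style special tokens (assumed layout)."""
    tokens = [cls_token] + tokens_a + [sep_token]
    token_type_ids = [0] * len(tokens)  # segment 0 covers [CLS] A [SEP]
    special_tokens_mask = [1] + [0] * len(tokens_a) + [1]  # 1 marks a special token
    if tokens_b is not None:
        tokens = tokens + tokens_b + [sep_token]
        token_type_ids = token_type_ids + [1] * (len(tokens_b) + 1)  # segment 1 covers B [SEP]
        special_tokens_mask = special_tokens_mask + [0] * len(tokens_b) + [1]
    return {
        "tokens": tokens,
        "token_type_ids": token_type_ids,
        "special_tokens_mask": special_tokens_mask,
    }


pair = add_special_tokens_sketch(["hello", "world"], ["good", "day"])
print(pair["tokens"])
print(pair["token_type_ids"])
print(pair["special_tokens_mask"])
```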

__call__

Parameter: Description
text (str, List[str], List[List[str]]): a single sentence or a batch of sentences
text_pair (str, List[str], List[List[str]]): the paired single sentence or batch of sentences
add_special_tokens (bool, optional, defaults to True)
padding (bool, str or PaddingStrategy, optional, defaults to False): whether to pad
truncation (bool, str or TruncationStrategy, optional, defaults to False)
max_length (int, optional)
stride (int, optional, defaults to 0)
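is_pretokenized (bool, optional, defaults to False): whether the input is already split into words (pre-tokenized); this does not mean it has already been converted to IDs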
pad_to_multiple_of (int, optional)
return_tensors (str or TensorType, optional): 'tf' returns tf.constant, 'pt' returns torch.Tensor, 'np' returns np.ndarray
return_token_type_ids (bool, optional)
return_attention_mask (bool, optional)
return_overflowing_tokens (bool, optional, defaults to False)
return_special_tokens_mask (bool, optional, defaults to False)
return_offsets_mapping (bool, optional, defaults to False)
return_length (bool, optional, defaults to False)
verbose (bool, optional, defaults to True)
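
How padding, truncation, max_length, pad_to_multiple_of and padding_side interact can be sketched in plain Python. This illustrates the semantics and is not the library's implementation. pad_and_truncate is a hypothetical helper, pad_token_id=0 is an assumed ID, and truncation here simply cuts the tail, whereas a real tokenizer keeps its special tokens intact.

```python
from typing import Dict, List, Optional


def pad_and_truncate(
    batch: List[List[int]],
    max_length: Optional[int] = None,
    truncation: bool = False,
    padding: str = "longest",  # "longest", "max_length" or "do_not_pad"
    pad_to_multiple_of: Optional[int] = None,
    padding_side: str = "right",
    pad_token_id: int = 0,  # assumed ID of pad_token
) -> Dict[str, List[List[int]]]:
    """Pad and truncate already-encoded sequences (simplified sketch)."""
    if truncation and max_length is not None:
        batch = [ids[:max_length] for ids in batch]

    if padding == "longest":
        target = max(len(ids) for ids in batch)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("padding='max_length' needs max_length")
        target = max_length
    elif padding == "do_not_pad":
        target = 0
    else:
        raise ValueError(f"unknown padding strategy: {padding}")

    if pad_to_multiple_of and padding != "do_not_pad":
        # Round the target length up to the next multiple.
        target = -(-target // pad_to_multiple_of) * pad_to_multiple_of

    input_ids, attention_mask = [], []
    for ids in batch:
        gap = max(target - len(ids), 0)  # never pad by a negative amount
        mask = [1] * len(ids)  # 1 marks a real token, 0 marks padding
        if padding_side == "right":
            input_ids.append(ids + [pad_token_id] * gap)
            attention_mask.append(mask + [0] * gap)
        else:
            input_ids.append([pad_token_id] * gap + ids)
            attention_mask.append([0] * gap + mask)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = pad_and_truncate(
    [[101, 7, 8, 9, 102], [101, 7, 102]],
    padding="longest",
    pad_to_multiple_of=4,
)
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

With the longest sequence at 5 tokens and pad_to_multiple_of=4, both rows are padded to length 8.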

Returns

Field: Description
input_ids
token_type_ids
attention_mask
overflowing_tokens
num_truncated_tokens
special_tokens_mask
length
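
The relationship between input_ids, overflowing_tokens, num_truncated_tokens and length can be sketched for a single sequence as follows. The sketch assumes there are no special tokens, that tokens are removed from the end, and that stride is the number of kept tokens repeated at the start of the overflow. truncate_with_stride is a hypothetical helper, not a library function.

```python
from typing import Dict, List, Union


def truncate_with_stride(
    ids: List[int], max_length: int, stride: int = 0
) -> Dict[str, Union[List[int], int]]:
    """Cut a sequence to max_length and report what was cut off (simplified sketch)."""
    num_to_remove = len(ids) - max_length
    if num_to_remove <= 0:
        return {
            "input_ids": list(ids),
            "overflowing_tokens": [],
            "num_truncated_tokens": 0,
            "length": len(ids),
        }
    # The overflow holds the removed tokens plus `stride` tokens of overlap.
    window_len = min(len(ids), stride + num_to_remove)
    kept = ids[:-num_to_remove]
    return {
        "input_ids": kept,
        "overflowing_tokens": ids[-window_len:],
        "num_truncated_tokens": num_to_remove,
        "length": len(kept),
    }


result = truncate_with_stride(list(range(10)), max_length=6, stride=2)
print(result)
```

Here four tokens are removed, and the overflow starts two tokens earlier so that consecutive chunks share some context.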

https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.PreTrainedTokenizer
