class transformers.PreTrainedTokenizer
Class attributes (overridden by derived classes)
Attribute | Description |
---|---
vocab_files_names (Dict[str, str]) | Keys are the `__init__` keyword name of each vocabulary file required by the model, values the associated filename. |
pretrained_vocab_files_map (Dict[str, Dict[str, str]]) | High-level keys are the vocabulary file keyword names; low-level keys are shortcut names of pretrained models, with the URL of the file as value. |
max_model_input_sizes (Dict[str, Optional[int]]) | Keys are shortcut names of pretrained models; values are the maximum input length (in tokens) for that model, or None if the model has no maximum. |
pretrained_init_configuration (Dict[str, Dict[str, Any]]) | Keys are shortcut names of pretrained models; values are dicts of keyword arguments passed to `__init__` when loading with from_pretrained(). |
model_input_names (List[str]) | List of input names expected in the forward pass of the model (e.g. "input_ids", "attention_mask"). |
padding_side (str) | Default side on which the model expects padding to be applied, either 'right' or 'left'. |
Parameters
Parameter | Description |
---|---
model_max_length (int, optional) | Maximum length (in tokens) of inputs to the model. When loaded with from_pretrained(), this is taken from max_model_input_sizes; defaults to a very large integer if unset. |
padding_side (str, optional) | Side on which to pad, 'right' or 'left'; default taken from the class attribute of the same name. |
model_input_names (List[str], optional) | List of inputs accepted by the model's forward pass (e.g. "token_type_ids", "attention_mask"); default taken from the class attribute of the same name. |
bos_token (str or tokenizers.AddedToken, optional) | Special token representing the beginning of a sentence. |
eos_token (str or tokenizers.AddedToken, optional) | Special token representing the end of a sentence. |
unk_token (str or tokenizers.AddedToken, optional) | Special token representing an out-of-vocabulary token. |
sep_token (str or tokenizers.AddedToken, optional) | Special token separating two different sentences in the same input (used e.g. by BERT). |
pad_token (str or tokenizers.AddedToken, optional) | Special token used to pad sequences to the same length for batching; ignored by attention mechanisms and loss computation. |
cls_token (str or tokenizers.AddedToken, optional) | Special token representing the class of the input (used e.g. by BERT). |
mask_token (str or tokenizers.AddedToken, optional) | Special token representing a masked token, used in masked language modeling (e.g. BERT). |
additional_special_tokens (tuple or list of str or tokenizers.AddedToken, optional) | Additional special tokens; adding them here ensures they are not split by the tokenization process and are skipped when decoding with skip_special_tokens=True. |
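To illustrate the role of unk_token and the vocabulary mapping, here is a minimal pure-Python sketch; `toy_vocab` and `toy_encode` are hypothetical names for illustration only, not part of the transformers API:

```python
# Hypothetical toy vocabulary with BERT-style special tokens.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "hello": 4, "world": 5}
unk_token = "[UNK]"

def toy_encode(words):
    """Map each word to its id, falling back to the unk_token id
    for out-of-vocabulary words."""
    return [toy_vocab.get(w, toy_vocab[unk_token]) for w in words]

print(toy_encode(["hello", "there"]))  # → [4, 1] ("there" is OOV)
```

A real tokenizer additionally handles subword splitting, but the out-of-vocabulary fallback to unk_token works the same way.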
__call__
Parameter | Description |
---|---
text (str, List[str], List[List[str]]) | The sequence or batch of sequences to encode: a single sentence or multiple sentences. |
text_pair (str, List[str], List[List[str]]) | Optional second sequence or batch of sequences, for sentence-pair inputs. |
add_special_tokens (bool, optional, defaults to True) | Whether to add the model-specific special tokens (e.g. [CLS], [SEP]). |
padding (bool, str or PaddingStrategy, optional, defaults to False) | Padding strategy: True or 'longest' pads to the longest sequence in the batch; 'max_length' pads to max_length; False or 'do_not_pad' disables padding. |
truncation (bool, str or TruncationStrategy, optional, defaults to False) | Truncation strategy: True or 'longest_first' truncates token by token, removing from the longest sequence in the pair; 'only_first' / 'only_second' truncate only the first / second sequence of a pair; False or 'do_not_truncate' disables truncation. |
max_length (int, optional) | Maximum length used by the truncation/padding parameters; defaults to the model's maximum accepted input length if unset. |
stride (int, optional, defaults to 0) | When set together with return_overflowing_tokens=True, the overflowing tokens overlap the truncated sequence by this many tokens. |
is_pretokenized (bool, optional, defaults to False) | Whether the input is already pre-tokenized, i.e. split into words (not yet converted to ids). |
pad_to_multiple_of (int, optional) | If set, pad the sequence to a multiple of this value (useful e.g. for NVIDIA Tensor Cores). |
return_tensors (str or TensorType, optional) | 'tf' → tf.constant, 'pt' → torch.Tensor, 'np' → np.ndarray; if unset, returns lists of Python integers. |
return_token_type_ids (bool, optional) | Whether to return token type IDs; default depends on model_input_names. |
return_attention_mask (bool, optional) | Whether to return the attention mask; default depends on model_input_names. |
return_overflowing_tokens (bool, optional, defaults to False) | Whether to return the overflowing (truncated) token sequences. |
return_special_tokens_mask (bool, optional, defaults to False) | Whether to return a mask identifying special tokens. |
return_offsets_mapping (bool, optional, defaults to False) | Whether to return (char_start, char_end) offsets for each token; only available on fast tokenizers (PreTrainedTokenizerFast). |
return_length (bool, optional, defaults to False) | Whether to return the length of each encoded input. |
verbose (bool, optional, defaults to True) | Whether to print informational messages and warnings. |
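The interaction of padding, truncation, max_length, pad_to_multiple_of, and padding_side can be sketched with a toy re-implementation; `pad_and_truncate` is a hypothetical helper operating on already-converted id lists, not the real tokenizer logic:

```python
def pad_and_truncate(batch_ids, padding=False, truncation=False,
                     max_length=None, pad_to_multiple_of=None,
                     pad_id=0, padding_side="right"):
    """Toy sketch of the padding/truncation options on a batch of id lists."""
    # Truncation is applied first, per sequence.
    if truncation and max_length is not None:
        batch_ids = [ids[:max_length] for ids in batch_ids]
    if padding:
        if padding == "max_length" and max_length is not None:
            target = max_length          # pad everything to max_length
        else:                            # True / 'longest'
            target = max(len(ids) for ids in batch_ids)
        if pad_to_multiple_of:
            # Round target up to the next multiple (ceiling division).
            target = -(-target // pad_to_multiple_of) * pad_to_multiple_of
        if padding_side == "right":
            batch_ids = [ids + [pad_id] * (target - len(ids))
                         for ids in batch_ids]
        else:
            batch_ids = [[pad_id] * (target - len(ids)) + ids
                         for ids in batch_ids]
    return batch_ids

print(pad_and_truncate([[5, 6, 7], [8]], padding=True))
# → [[5, 6, 7], [8, 0, 0]]
```

Note that with padding=True (i.e. 'longest'), the batch is padded only to its own longest sequence, while padding='max_length' pads everything to max_length regardless of batch content.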
Returns
A BatchEncoding with the following fields:

Field | Description |
---|---
input_ids | List of token ids to be fed to the model. |
token_type_ids | Segment ids distinguishing the first from the second sequence (when return_token_type_ids=True or implied by model_input_names). |
attention_mask | 1 for tokens the model should attend to, 0 for padding (when return_attention_mask=True or implied by model_input_names). |
overflowing_tokens | Tokens removed by truncation (when return_overflowing_tokens=True and max_length is set). |
num_truncated_tokens | Number of tokens removed by truncation (when return_overflowing_tokens=True and max_length is set). |
special_tokens_mask | 1 for special tokens, 0 for regular tokens (when add_special_tokens=True and return_special_tokens_mask=True). |
length | Length of each encoded input (when return_length=True). |
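How these fields line up for a BERT-style sentence pair ([CLS] A [SEP] B [SEP]) can be sketched in plain Python; `build_pair_encoding` and the hard-coded ids are hypothetical, for illustration only:

```python
def build_pair_encoding(ids_a, ids_b, cls_id=2, sep_id=3):
    """Assemble the returned fields for a pair encoded as
    [CLS] A [SEP] B [SEP] (BERT-style layout)."""
    input_ids = [cls_id] + ids_a + [sep_id] + ids_b + [sep_id]
    # Segment 0 covers [CLS] + A + first [SEP]; segment 1 covers B + last [SEP].
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    # No padding here, so every position is attended to.
    attention_mask = [1] * len(input_ids)
    # 1 marks the special tokens [CLS]/[SEP], 0 the regular tokens.
    special_tokens_mask = [1] + [0] * len(ids_a) + [1] + [0] * len(ids_b) + [1]
    return {"input_ids": input_ids,
            "token_type_ids": token_type_ids,
            "attention_mask": attention_mask,
            "special_tokens_mask": special_tokens_mask,
            "length": len(input_ids)}

enc = build_pair_encoding([4, 5], [6])
print(enc["input_ids"])       # → [2, 4, 5, 3, 6, 3]
print(enc["token_type_ids"])  # → [0, 0, 0, 0, 1, 1]
```

The exact layout of special tokens and token type ids varies by model; this mirrors BERT's convention only.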
https://huggingface.co/transformers/main_classes/tokenizer.html#transformers.PreTrainedTokenizer