1. in-graph tokenizer
I came across TFBertTokenizer. The official documentation explains it as follows:
This is an in-graph tokenizer for BERT. It should be initialized similarly to other tokenizers, using the from_pretrained() method. It can also be initialized with the from_tokenizer() method, which imports settings from an existing standard tokenizer object.

In-graph tokenizers, unlike other Hugging Face tokenizers, are actually Keras layers and are designed to be run when the model is called, rather than during preprocessing. As a result, they have somewhat more limited options than standard tokenizer classes. They are most useful when you want to create an end-to-end model that goes straight from tf.string inputs to outputs.
My current understanding: because tokenization happens inside the model call, parameters such as padding can be resolved per batch at training time rather than being fixed in advance, so the padded length of the output varies with the data in each batch.
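A minimal, dependency-free sketch of what this per-batch ("dynamic") padding behavior means: each batch is padded only to the length of its own longest sequence, so the shape of the padded output changes from batch to batch. The names `PAD_ID` and `pad_batch` are hypothetical, not part of the transformers API; this is a toy illustration, not TFBertTokenizer itself.

```python
# Toy sketch of per-batch ("longest") padding, assuming pad token id 0.
PAD_ID = 0

def pad_batch(batch):
    """Pad every token-id sequence to this batch's own max length."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [PAD_ID] * (max_len - len(seq)) for seq in batch]

batch_a = [[101, 7592, 102], [101, 102]]           # longest sequence: 3
batch_b = [[101, 1, 2, 3, 4, 102], [101, 9, 102]]  # longest sequence: 6

print(pad_batch(batch_a))  # every row padded to length 3
print(pad_batch(batch_b))  # every row padded to length 6
```

The padded width (3 vs. 6 here) is decided by the batch itself, which is the behavior described above: nothing is "set in stone" at preprocessing time.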