The torchtext package consists of data processing utilities and popular datasets for natural language.
(1) batch
# Yield elements from data in chunks of batch_size.
torchtext.data.batch(data, batch_size, batch_size_fn=None)
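The chunking behaviour described above can be sketched in plain Python. This is only an illustration: the name `chunked` is hypothetical, and the sketch counts examples directly instead of supporting `batch_size_fn`, so it is not the library's implementation.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(data: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items (hypothetical helper)."""
    minibatch: List[T] = []
    for example in data:
        minibatch.append(example)
        if len(minibatch) == batch_size:
            yield minibatch
            minibatch = []
    if minibatch:  # emit the final, possibly shorter, chunk
        yield minibatch


print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

The last chunk is shorter whenever the number of examples is not a multiple of `batch_size`.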
(2) pool
'''
Sort within buckets, then batch, then shuffle batches.
Partitions data into chunks of size 100*batch_size, sorts the examples within each chunk using sort_key (passed as the key argument), then batches these examples and shuffles the batches.
'''
torchtext.data.pool(data, batch_size, key, batch_size_fn=lambda new, count, sofar: count, random_shuffler=None, shuffle=False, sort_within_batch=False)
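The three steps in the description (partition, sort inside each chunk, then batch and shuffle) can be sketched as follows. This is a minimal illustration that follows the description literally; `pool_sketch` and its `seed` parameter are hypothetical, and the `batch_size_fn`, `shuffle`, and `sort_within_batch` arguments from the signature are ignored.

```python
import random
from typing import Any, Callable, Iterator, List, Sequence


def pool_sketch(
    data: Sequence[Any],
    batch_size: int,
    key: Callable[[Any], Any],
    seed: int = 0,  # hypothetical: fixes the shuffle order for reproducibility
) -> Iterator[List[Any]]:
    rng = random.Random(seed)
    chunk_size = 100 * batch_size
    for start in range(0, len(data), chunk_size):
        # Sort only inside the current chunk, so similar examples end up adjacent.
        chunk = sorted(data[start:start + chunk_size], key=key)
        # Cut the sorted chunk into batches of batch_size examples.
        batches = [chunk[i:i + batch_size] for i in range(0, len(chunk), batch_size)]
        # Shuffle the order of the batches, not the examples inside them.
        rng.shuffle(batches)
        yield from batches


sentences = [["w"] * n for n in (5, 1, 4, 2, 3, 6)]
for b in pool_sketch(sentences, batch_size=2, key=len):
    print([len(s) for s in b])
```

Each printed batch holds sentences of neighbouring lengths, which keeps padding small, while the batch order itself is random.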
(3) get_tokenizer
torchtext.data.get_tokenizer(tokenizer, language='en')
(4) interleave_keys
'''
Interleave bits from two sort keys to form a joint sort key.
Examples that are similar in both of the provided keys will have similar values for the key defined by this function.
Useful for tasks with two text fields, such as machine translation or natural language inference.
'''
torchtext.data.interleave_keys(a, b)
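Bit interleaving can be illustrated with a short pure-Python sketch. The name `interleave_bits` and the `width` parameter are hypothetical; the sketch assumes both keys are non-negative integers that fit in `width` bits, and it is not necessarily how the library implements it.

```python
def interleave_bits(a: int, b: int, width: int = 16) -> int:
    """Interleave the bits of a and b, most significant bit first (hypothetical helper)."""
    for value in (a, b):
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in {width} bits")
    a_bits = format(a, f"0{width}b")
    b_bits = format(b, f"0{width}b")
    # Take one bit from a, then one from b, and so on.
    mixed = "".join(x + y for x, y in zip(a_bits, b_bits))
    return int(mixed, 2)


# 3 = 0011 and 5 = 0101 interleave to 00011011 = 27.
print(interleave_bits(3, 5, width=4))  # 27

# Sorting (source length, target length) pairs by the joint key.
pairs = [(12, 3), (4, 5), (4, 4), (13, 2)]
print(sorted(pairs, key=lambda p: interleave_bits(*p)))
```

Because the high bits of both keys land in the high bits of the result, pairs whose two lengths are both close tend to sort next to each other.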