tf.contrib.training.bucket_by_sequence_length

瑶光light

Last modified: 2022-01-26 10:53:23

Tags: deep learning, tensorflow

First published: 2022-01-26 10:22:02

Original link: https://docs.w3cub.com/tensorflow~python/tf/contrib/training/bucket_by_sequence_length


tf.contrib.training.bucket_by_sequence_length(
    input_length,
    tensors,
    batch_size,
    bucket_boundaries,
    num_threads=1,
    capacity=32,
    bucket_capacities=None,
    shapes=None,
    dynamic_pad=False,
    allow_smaller_final_batch=False,
    keep_input=True,
    shared_name=None,
    name=None
)

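Purpose: group sentences of similar length together, so that each batch contains sequences from the same length bucket.
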
Args:

input_length: int32 scalar Tensor, the sequence length of tensors.

tensors: The list or dictionary of tensors, representing a single element, to bucket. Nested lists are not supported.

batch_size: The new batch size pulled from the queue (all queues will have the same size). If a list is passed in, then each bucket will have a different batch_size. (Python int, int32 scalar, or iterable of integers of length num_buckets.)

bucket_boundaries: int list, increasing non-negative numbers. The edges of the buckets to use when bucketing tensors.
Two extra buckets are created, one for input_length < bucket_boundaries[0] and one for input_length >= bucket_boundaries[-1].

num_threads: An integer. The number of threads enqueuing tensors.

capacity: An integer. The maximum number of minibatches in the top queue, and also the maximum number of elements within each bucket.

bucket_capacities: (Optional) None or a list of integers, the capacities of each bucket. If None, capacity is used (default). If specified, it must be a list of integers of length one larger than bucket_boundaries. Its i-th element is used as capacity for the i-th bucket queue.

shapes: (Optional) The shapes for each example. Defaults to the inferred shapes for tensors.

dynamic_pad: Boolean. Allow variable dimensions in input shapes. The given dimensions are padded upon dequeue so that tensors within a batch have the same shapes.

allow_smaller_final_batch: (Optional) Boolean. If True, allow the final batches to be smaller if there are insufficient items left in the queues.

keep_input: A bool scalar Tensor. If provided, this tensor controls whether the input is added to the queue or not. If it evaluates True, then tensors are added to the bucket; otherwise they are dropped. This tensor essentially acts as a filtering mechanism.

shared_name: (Optional). If set, the queues will be shared under the given name across multiple sessions.

name: (Optional) A name for the operations.
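
The bucket_boundaries rule described above can be illustrated in plain Python. This is only a minimal sketch of the documented bucketing rule, not the library's implementation; the helper name bucket_index and the sample boundaries are hypothetical.

```python
from bisect import bisect_right


def bucket_index(input_length, bucket_boundaries):
    """Return the index of the bucket a sequence of this length falls into.

    Hypothetical helper for illustration only (not part of TensorFlow).
    Bucket 0 holds lengths < bucket_boundaries[0]; the last bucket holds
    lengths >= bucket_boundaries[-1].
    """
    # bisect_right counts how many boundaries are <= input_length,
    # which is exactly the bucket index under the rule above.
    return bisect_right(bucket_boundaries, input_length)


boundaries = [5, 10, 20]  # assumed sample boundaries
for length in [3, 5, 9, 10, 25]:
    print(length, "->", bucket_index(length, boundaries))
```

With n boundaries there are n + 1 buckets, which matches the requirement that bucket_capacities be one element longer than bucket_boundaries.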

Returns:

A tuple (sequence_length, outputs), where sequence_length is a 1-D Tensor of size batch_size and outputs is a list or dictionary of batched, bucketed outputs corresponding to the elements of tensors.
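
To make the shape of the return value concrete, here is a rough pure-Python simulation that uses lists in place of queues and tensors. It assumes the behaviour described for dynamic_pad=True (pad to the longest sequence in the batch) and for allow_smaller_final_batch; the function name bucket_and_batch and the pad_value argument are hypothetical and do not exist in TensorFlow.

```python
from bisect import bisect_right
from collections import defaultdict


def bucket_and_batch(sequences, batch_size, bucket_boundaries,
                     allow_smaller_final_batch=False, pad_value=0):
    """Group sequences by length bucket and emit padded batches.

    Illustrative only: returns a list of (sequence_length, outputs)
    pairs built from Python lists, not TensorFlow tensors.
    """
    buckets = defaultdict(list)
    batches = []

    def emit(items):
        lengths = [len(s) for s in items]
        width = max(lengths)
        # Pad every sequence to the longest one in this batch.
        padded = [s + [pad_value] * (width - len(s)) for s in items]
        batches.append((lengths, padded))

    for seq in sequences:
        idx = bisect_right(bucket_boundaries, len(seq))
        buckets[idx].append(seq)
        # A bucket yields a batch as soon as it holds batch_size items.
        if len(buckets[idx]) == batch_size:
            emit(buckets.pop(idx))

    if allow_smaller_final_batch:
        # Flush whatever is left in the partially filled buckets.
        for idx in sorted(buckets):
            emit(buckets[idx])
    return batches


sequences = [[1], [2, 3, 4, 5], [6, 7], [8, 9, 10], [11, 12, 13, 14, 15, 16]]
for lengths, outputs in bucket_and_batch(
        sequences, batch_size=2, bucket_boundaries=[3, 5],
        allow_smaller_final_batch=True):
    print(lengths, outputs)
```

Because each batch is drawn from a single bucket, the amount of padding per batch stays small, which is the point of bucketing by length.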

Raises:

· TypeError: if bucket_boundaries is not a list of Python integers.

· ValueError: if bucket_boundaries is empty or contains non-increasing values, or if batch_size is a list and its length doesn't equal the number of buckets.
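
The error conditions listed above can be mirrored with a small validator. This is a sketch written from the documented conditions only, not TensorFlow's actual checking code; the function name check_bucket_args and the error messages are hypothetical.

```python
def check_bucket_args(bucket_boundaries, batch_size):
    """Hypothetical validator mirroring the documented error conditions."""
    # TypeError: bucket_boundaries must be a list of Python integers.
    if not isinstance(bucket_boundaries, list) or not all(
            isinstance(b, int) for b in bucket_boundaries):
        raise TypeError("bucket_boundaries must be a list of Python integers")
    # ValueError: the list must not be empty.
    if not bucket_boundaries:
        raise ValueError("bucket_boundaries must not be empty")
    # ValueError: each boundary must be larger than the previous one.
    pairs = zip(bucket_boundaries, bucket_boundaries[1:])
    if any(nxt <= prev for prev, nxt in pairs):
        raise ValueError("bucket_boundaries must be increasing")
    # ValueError: a per-bucket batch_size needs one entry per bucket,
    # and there is one more bucket than there are boundaries.
    if isinstance(batch_size, (list, tuple)):
        if len(batch_size) != len(bucket_boundaries) + 1:
            raise ValueError("batch_size must have one entry per bucket")


check_bucket_args([5, 10, 20], batch_size=32)            # passes
check_bucket_args([5, 10, 20], batch_size=[8, 8, 4, 2])  # passes
```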

See also: tf.data.experimental.bucket_by_sequence_length
https://runebook.dev/zh-CN/docs/tensorflow/data/experimental/bucket_by_sequence_length

A good worked example:
https://github.com/wcarvalho/jupyter_notebooks/blob/ebe762436e2eea1dff34bbd034898b64e4465fe4/tf.bucket_by_sequence_length/bucketing%20practice.ipynb
