tf.contrib.training.bucket_by_sequence_length

tf.contrib.training.bucket_by_sequence_length(
    input_length,
    tensors,
    batch_size,
    bucket_boundaries,
    num_threads=1,
    capacity=32,
    bucket_capacities=None,
    shapes=None,
    dynamic_pad=False,
    allow_smaller_final_batch=False,
    keep_input=True,
    shared_name=None,
    name=None
)

Purpose: put sentences of similar length together in the same batch.
Args:

input_length: int32 scalar Tensor, the sequence length of tensors.

tensors: The list or dictionary of tensors, representing a single element, to bucket. Nested lists are not supported.

batch_size: The new batch size pulled from the queue (all queues will have the same size). If a list is passed in, then each bucket will have a different batch_size. (Python int, int32 scalar, or iterable of integers of length num_buckets.)

bucket_boundaries: int list, increasing non-negative numbers. The edges of the buckets to use when bucketing tensors.
Two extra buckets are created, one for input_length < bucket_boundaries[0] and one for input_length >= bucket_boundaries[-1].

num_threads: An integer. The number of threads enqueuing tensors.

capacity: An integer. The maximum number of minibatches in the top queue, and also the maximum number of elements within each bucket.

bucket_capacities: (Optional) None or a list of integers, the capacities of each bucket. If None, capacity is used (default). If specified, it must be a list of integers of length one larger than bucket_boundaries. Its i-th element is used as capacity for the i-th bucket queue.

shapes: (Optional) The shapes for each example. Defaults to the inferred shapes for tensors.

dynamic_pad: Boolean. Allow variable dimensions in input shapes. The given dimensions are padded upon dequeue so that tensors within a batch have the same shapes.

allow_smaller_final_batch: (Optional) Boolean. If True, allow the final batches to be smaller if there are insufficient items left in the queues.

keep_input: A bool scalar Tensor. If provided, this tensor controls whether the input is added to the queue or not. If it evaluates True, then tensors are added to the bucket; otherwise they are dropped. This tensor essentially acts as a filtering mechanism.

shared_name: (Optional). If set, the queues will be shared under the given name across multiple sessions.

name: (Optional) A name for the operations.
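
The way bucket_boundaries splits lengths into buckets can be illustrated without TensorFlow. The following is only a minimal pure-Python sketch of the rule described above (one bucket below bucket_boundaries[0], one at or above bucket_boundaries[-1], and one between each pair of adjacent boundaries); the helper name bucket_index and the sample values are hypothetical and are not part of the TensorFlow API.

```python
import bisect


def bucket_index(input_length, bucket_boundaries):
    """Return the bucket id for a sequence length (hypothetical helper).

    Bucket 0 holds lengths < bucket_boundaries[0]; the last bucket holds
    lengths >= bucket_boundaries[-1]; bucket i holds lengths in
    [bucket_boundaries[i - 1], bucket_boundaries[i]).
    """
    return bisect.bisect_right(bucket_boundaries, input_length)


# Sample values chosen only for illustration.
boundaries = [5, 10, 20]
lengths = [3, 5, 9, 10, 19, 20, 50]

# Group the sample lengths by bucket id.
buckets = {}
for n in lengths:
    buckets.setdefault(bucket_index(n, boundaries), []).append(n)

print(buckets)
```

With three boundaries there are four buckets, which matches the requirement that bucket_capacities be one element longer than bucket_boundaries.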

Returns:

A tuple (sequence_length, outputs) where sequence_length is a 1-D Tensor of size batch_size and outputs is a list or dictionary of batched, bucketed, outputs corresponding to elements of tensors.
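
To make the shape of the return value concrete, here is a rough pure-Python sketch of what one dequeued batch looks like when sequences in a bucket are padded to a common length, as with dynamic_pad=True. It does not call TensorFlow; the helper pad_batch, the sample batch, and the padding value 0 are assumptions made for illustration.

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences to the longest one (hypothetical helper).

    Returns (sequence_length, outputs), mirroring the documented return
    tuple: the original lengths, then the padded batch.
    """
    sequence_length = [len(s) for s in sequences]
    width = max(sequence_length)
    outputs = [list(s) + [pad_value] * (width - len(s)) for s in sequences]
    return sequence_length, outputs


# One batch drawn from a single bucket, so the lengths are similar.
batch = [[7, 8], [1, 2, 3], [4]]
sequence_length, outputs = pad_batch(batch)

print(sequence_length)
print(outputs)
```

Because every sequence in a bucket has a similar length, little padding is wasted, which is the point of bucketing.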

Raises:

· TypeError: if bucket_boundaries is not a list of Python integers.

· ValueError: if bucket_boundaries is empty or contains non-increasing values, or if batch_size is a list and its length doesn't equal the number of buckets.

tf.data.experimental.bucket_by_sequence_length
https://runebook.dev/zh-CN/docs/tensorflow/data/experimental/bucket_by_sequence_length

A good example:
https://github.com/wcarvalho/jupyter_notebooks/blob/ebe762436e2eea1dff34bbd034898b64e4465fe4/tf.bucket_by_sequence_length/bucketing%20practice.ipynb
