TensorFlow version: 1.13.1
1. padded_batch raises: TypeError: If shallow structure is a sequence, input must also be a sequence. Input has type: <class 'int'>.
A full example (the get_train_tfrecord function) is given at the end of this note. In that example, the error is caused by the padding_values argument of padded_batch. If the argument is simply not set, there is no error, i.e. the line can be changed to:
dataset = dataset.padded_batch(batch_size, padded_shapes=([None], [None]))
Without padding_values, the default padding value is 0, meaning both tensors listed in padded_shapes are padded with 0. If in some situation you want to pad with -1 instead of 0, you can write:
dataset = dataset.padded_batch(batch_size, padded_shapes=([1501], [None]), padding_values=(tf.constant(-1, tf.int64), tf.constant(0, tf.int64)))
Each example yields two tensors whose shapes are not fixed (padded_shapes shows two entries, both variable-length). padding_values therefore also has to supply one value per tensor: the two tensors returned per example correspond one-to-one to the two entries of the padding_values tuple.
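For illustration, a minimal sketch of the mismatch (assuming the failing call passed a single Python scalar as padding_values, which is what typically produces the <class 'int'> in the message): the dataset element here is a tuple of two tensors, so padding_values must be a matching tuple.

    # Assuming each dataset element is a (text_ids, label_ids) tuple of int64 tensors.
    # Fails with the TypeError above: a lone scalar cannot be matched against
    # the two-tensor element structure.
    # dataset = dataset.padded_batch(batch_size, padded_shapes=([None], [None]),
    #                                padding_values=-1)
    # Works: one padding value per tensor, with dtypes matching the tensors.
    dataset = dataset.padded_batch(batch_size,
                                   padded_shapes=([None], [None]),
                                   padding_values=(tf.constant(-1, tf.int64),
                                                   tf.constant(0, tf.int64)))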
2. TypeError: Batching of padded sparse tensors is not currently supported
If this error appears, it is because the features were declared as variable-length (tf.VarLenFeature) when parsing a single example; parse_single_example returns those as SparseTensors, which padded_batch cannot pad. The fix is simply to convert each VarLenFeature tensor to a dense tensor:
example["text_ids"] = tf.sparse_tensor_to_dense(example["text_ids"])
example["label_ids"] = tf.sparse_tensor_to_dense(example["label_ids"])
import tensorflow as tf

def get_train_tfrecord(tfrecord_path=None, num_epochs=1, batch_size=16, shuffle=True):
    dataset = tf.data.TFRecordDataset(tfrecord_path)
    # Fixed-length alternative (no padding needed if every example already has max_len ids):
    # feature_dict = {"text_ids": tf.FixedLenFeature([self.max_len], dtype=tf.int64),
    #                 "label_ids": tf.FixedLenFeature([self.max_len], dtype=tf.int64)}

    def parse_one_example(example):
        feature_dict = {"text_ids": tf.VarLenFeature(tf.int64),
                        "label_ids": tf.VarLenFeature(tf.int64)}
        example = tf.parse_single_example(example, feature_dict)
        # VarLenFeature is parsed into a SparseTensor; convert to dense, otherwise
        # padded_batch fails with "Batching of padded sparse tensors is not currently supported".
        example["text_ids"] = tf.sparse_tensor_to_dense(example["text_ids"])
        example["label_ids"] = tf.sparse_tensor_to_dense(example["label_ids"])
        # text_ids = tf.to_int32(example["text_ids"])
        # label_ids = tf.to_int32(example["label_ids"])
        text_ids = example["text_ids"]
        print("text_ids:", text_ids)
        label_ids = example["label_ids"]
        return text_ids, label_ids

    dataset = dataset.map(parse_one_example)
    if shuffle:
        dataset = dataset.shuffle(buffer_size=1000, reshuffle_each_iteration=True)
    # dataset = dataset.batch(batch_size=batch_size)
    # dataset = dataset.padded_batch(batch_size, padded_shapes=([None], [None]))
    # One padding value per tensor: text_ids padded with -1, label_ids padded with 0.
    dataset = dataset.padded_batch(batch_size,
                                   padded_shapes=([None], [None]),
                                   padding_values=(tf.constant(-1, tf.int64),
                                                   tf.constant(0, tf.int64)))
    dataset = dataset.repeat(num_epochs)
    iterator = dataset.make_one_shot_iterator()
    batch_data = iterator.get_next()
    return batch_data  # batch_text_ids = batch_data[0], batch_label_ids = batch_data[1]
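A minimal usage sketch in TF 1.x graph mode (the tfrecord path is illustrative):

    batch_text_ids, batch_label_ids = get_train_tfrecord(tfrecord_path="train.tfrecord",
                                                          num_epochs=1, batch_size=16)
    with tf.Session() as sess:
        try:
            while True:
                text_ids, label_ids = sess.run([batch_text_ids, batch_label_ids])
                # Padded to the longest sequence in the batch: text_ids with -1, label_ids with 0.
                print(text_ids.shape, label_ids.shape)
        except tf.errors.OutOfRangeError:
            pass  # the one-shot iterator is exhausted after num_epochs passes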