tf.keras.preprocessing.sequence.pad_sequences: truncating and padding sequences
import tensorflow as tf
import numpy as np
pad_sequences(sequences, maxlen=None, dtype='int32', padding='pre', truncating='pre', value=0.0)
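pad_sequences pads sequences to the same length. It truncates or pads a list of sequences so that they all share one length; by default, both truncation and padding are applied at the front of each sequence. When maxlen is None, the sequences are padded to the length of the longest one.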
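# Three random integer sequences of different lengths (10, 8 and 15 elements).
# .tolist() yields plain Python ints, so the lists print without NumPy scalar wrappers.
array_1 = np.random.randint(100, size=10).tolist()
array_2 = np.random.randint(100, size=8).tolist()
array_3 = np.random.randint(100, size=15).tolist()
seq = [array_1, array_2, array_3]
print('\n'.join(map(str, seq)))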
[45, 42, 11, 24, 54, 78, 24, 71, 45, 71]
[60, 65, 17, 72, 46, 51, 88, 24]
[53, 56, 7, 47, 67, 14, 2, 28, 89, 5, 58, 43, 59, 26, 25]
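In the example below, padding is applied at the end (padding='post'), the target length is 9, and truncation keeps its default of 'pre'. The first and third sequences have more than 9 elements, so their leading elements are dropped; the second sequence has only 8 elements, so a single 0 is appended.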
tf.keras.preprocessing.sequence.pad_sequences(seq, maxlen=9, padding='post', value=0)
array([[42, 11, 24, 54, 78, 24, 71, 45, 71],
       [60, 65, 17, 72, 46, 51, 88, 24,  0],
       [ 2, 28, 89,  5, 58, 43, 59, 26, 25]], dtype=int32)
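In the example below, both padding and truncation are applied at the end (padding='post', truncating='post'), again with a target length of 9. The first and third sequences have more than 9 elements, so their trailing elements are dropped; the second sequence has only 8 elements, so a single 0 is appended.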
tf.keras.preprocessing.sequence.pad_sequences(seq, maxlen=9, padding='post', truncating='post', value=0)
array([[45, 42, 11, 24, 54, 78, 24, 71, 45],
       [60, 65, 17, 72, 46, 51, 88, 24,  0],
       [53, 56,  7, 47, 67, 14,  2, 28, 89]], dtype=int32)
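To make the 'pre' and 'post' semantics concrete, here is a minimal pure-Python sketch of the same truncate-then-pad logic. The helper name pad_sequences_sketch is hypothetical and is not part of TensorFlow; it returns plain lists rather than a NumPy array and ignores the dtype argument, so treat it as an illustration of the behaviour shown above, not as the library's implementation.

```python
def pad_sequences_sketch(sequences, maxlen=None, padding="pre",
                         truncating="pre", value=0):
    """Hypothetical helper: truncate, then pad, each sequence to maxlen."""
    if padding not in ("pre", "post"):
        raise ValueError(f"unknown padding mode: {padding!r}")
    if truncating not in ("pre", "post"):
        raise ValueError(f"unknown truncating mode: {truncating!r}")
    if maxlen is None:
        # With no explicit length, use the longest sequence.
        maxlen = max((len(s) for s in sequences), default=0)

    result = []
    for s in sequences:
        s = list(s)
        if len(s) > maxlen:
            if truncating == "pre":
                s = s[len(s) - maxlen:]   # drop leading elements, keep the tail
            else:
                s = s[:maxlen]            # drop trailing elements, keep the head
        fill = [value] * (maxlen - len(s))
        s = fill + s if padding == "pre" else s + fill
        result.append(s)
    return result


# The three sequences printed earlier in this note.
seq = [
    [45, 42, 11, 24, 54, 78, 24, 71, 45, 71],
    [60, 65, 17, 72, 46, 51, 88, 24],
    [53, 56, 7, 47, 67, 14, 2, 28, 89, 5, 58, 43, 59, 26, 25],
]

post_pad_pre_trunc = pad_sequences_sketch(seq, maxlen=9, padding="post")
post_pad_post_trunc = pad_sequences_sketch(seq, maxlen=9, padding="post",
                                           truncating="post")
default_modes = pad_sequences_sketch(seq, maxlen=9)  # 'pre' for both

for row in default_modes:
    print(row)
```

With the defaults, the second sequence gets its 0 at the front instead of the end, while the first and third sequences are truncated exactly as in the first example.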