While working on movie-review text classification I hit an error; this post records how I tracked it down.
The error message:
InvalidArgumentError:
indices[401,33] = 77571 is not in [0, 10000)
[[{{node embedding/embedding_lookup}} = ResourceGather[Tindices=DT_INT32, dtype=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding/embeddings, embedding/Cast)]]
After reading a related post on GitHub, I walked back through my RNN code:
1. The first layer of the network, the Embedding layer, is defined with a vocabulary size of 10000:
vocab_size = 10000
model = tf.keras.Sequential()
model.add(tf.keras.layers.Embedding(vocab_size, 16))
model.add(tf.keras.layers.GlobalAveragePooling1D())
model.add(tf.keras.layers.Dense(16, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(1, activation=tf.nn.sigmoid))
model.summary()
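Under the hood, `embedding_lookup` is just a row lookup in the embedding weight matrix, so every word index in the input must lie in [0, vocab_size). A pure-numpy sketch of the check that fails here (illustrative only, not TensorFlow's actual implementation):

```python
import numpy as np

vocab_size = 10000
# Stand-in for the weights of Embedding(10000, 16): one 16-dim row per word.
embedding = np.random.rand(vocab_size, 16)

def lookup(indices):
    # Mimic embedding_lookup: every index must fall in [0, vocab_size),
    # because each index selects a row of the embedding matrix.
    if indices.max() >= vocab_size or indices.min() < 0:
        raise IndexError(
            f"index {indices.max()} is not in [0, {vocab_size})")
    return embedding[indices]

ok_batch = np.array([[1, 2, 3]])
print(lookup(ok_batch).shape)        # (1, 3, 16)

bad_batch = np.array([[1, 2, 77571]])  # contains the offending index 77571
try:
    lookup(bad_batch)
except IndexError as e:
    print(e)  # index 77571 is not in [0, 10000)
```

This is exactly the situation in the traceback: position [401, 33] of the batch held the word index 77571, which has no row in a 10000-row embedding table.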
2. At training time, the data actually fed to the model is 15000 × 256 (the 25000 samples minus the 10000 held out for validation):
# train_data.shape = (25000, 256)
# train_labels.shape = (25000,)
x_val = train_data[:10000]
partial_x_train = train_data[10000:]
y_val = train_labels[:10000]
partial_y_train = train_labels[10000:]
history = model.fit(partial_x_train,
partial_y_train,
epochs=40,
batch_size=512,
validation_data=(x_val, y_val),
verbose=1)
3. The root cause: the training data contains word indices (such as 77571) that fall outside the [0, 10000) range the Embedding layer was defined with. The mismatch is in the vocabulary index range, not the sequence length, and that is what triggers the error.
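For the Keras IMDB dataset the standard fix is to reload the data with a capped vocabulary, e.g. `tf.keras.datasets.imdb.load_data(num_words=vocab_size)`, so no index can exceed 10000. If reloading is not an option, out-of-range indices can be remapped to an out-of-vocabulary marker instead. A minimal numpy sketch (using `oov_index = 2`, the IMDB loader's default OOV marker, as an assumption here):

```python
import numpy as np

vocab_size = 10000
oov_index = 2  # assumption: IMDB's default out-of-vocabulary marker

def clip_to_vocab(sequences, vocab_size, oov_index):
    # Replace any word index >= vocab_size with the OOV marker, so that
    # every index falls inside the Embedding layer's [0, vocab_size) range.
    seq = np.asarray(sequences)
    return np.where(seq < vocab_size, seq, oov_index)

train_data = np.array([[5, 77571, 120],
                       [9999, 10000, 0]])
fixed = clip_to_vocab(train_data, vocab_size, oov_index)
print(fixed)  # 77571 and 10000 become 2; every index is now < 10000
```

After this remapping (or after reloading with `num_words=vocab_size`), `model.fit` runs without the `InvalidArgumentError`.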