1. The difference between sparse_categorical_crossentropy and categorical_crossentropy
This addresses errors like:
"logits and labels must have the same first dimension, got logits shape [32,28] and labels shape [3360]"
The error appears when the network's final output shape does not match the shape of the labels. In my case, the features and labels matrices have the same 2 dimensions (number of sentences, length of max sentence), i.e. both are (1257, 111).
See: categorical_crossentropy和sparse_categorical_crossentropy - 熊猫blue - 博客园
See: tensorflow - ValueError: Shapes (None, 1) and (None, 2) are incompatible - Stack Overflow
Both compute multi-class cross-entropy; they differ only in the format required for y (the labels):
1) With categorical_crossentropy, y must be one-hot encoded.
2) With sparse_categorical_crossentropy, y stays in its raw integer form, e.g. [1, 0, 2, 0, 2].
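A minimal sketch of the difference (the 3-class probabilities below are made up for illustration):

import tensorflow as tf

# Made-up softmax outputs for a 3-class problem.
y_pred = [[0.1, 0.8, 0.1],
          [0.7, 0.2, 0.1],
          [0.2, 0.2, 0.6]]

# sparse_categorical_crossentropy takes raw integer labels...
print(tf.keras.losses.sparse_categorical_crossentropy([1, 0, 2], y_pred))
# ...while categorical_crossentropy takes the same labels one-hot encoded;
# the per-sample losses are identical.
print(tf.keras.losses.categorical_crossentropy(
    [[0., 1., 0.], [1., 0., 0.], [0., 0., 1.]], y_pred))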
2. For a CNN, look into using padding="same" for your convolutional layer, which is sort of equivalent to using return_sequences=True in an RNN: both keep the time dimension in the output.
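A quick sketch with toy sizes (the shapes here are assumptions for illustration, not from the original error):

import tensorflow as tf

x = tf.random.normal((2, 10, 8))   # (batch, time, features), toy sizes

# padding="same" keeps the time dimension at 10...
print(tf.keras.layers.Conv1D(16, kernel_size=3, padding="same")(x).shape)   # (2, 10, 16)
# ...while the default padding="valid" shortens it to 10 - 3 + 1 = 8.
print(tf.keras.layers.Conv1D(16, kernel_size=3, padding="valid")(x).shape)  # (2, 8, 16)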
3. Apply tensorflow.keras.utils.to_categorical to convert y (the labels) to one-hot encoding.
See: python - ValueError: Shapes (None, 1) and (None, 3) are incompatible - Stack Overflow
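A sketch of that fix, assuming integer labels and a 3-class softmax output (the arrays are made up):

import numpy as np
from tensorflow.keras.utils import to_categorical

y = np.array([0, 2, 1, 2])                    # integer labels, shape (4,)
y_onehot = to_categorical(y, num_classes=3)   # one-hot labels, shape (4, 3)
print(y_onehot)
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
# y_onehot now matches a (None, 3) softmax output, so
# categorical_crossentropy no longer raises the shape error.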
4. For incompatible shapes, the main things to consider are: adding or removing a Flatten, GlobalMaxPooling1D, or MaxPooling layer; whether the CNN uses padding="same"; and whether every RNN layer sets return_sequences=True, or the last RNN layer omits return_sequences=True.
For example, the last RNN layer without return_sequences=True, keeping Flatten:
import tensorflow as tf

# Assumes a spaCy pipeline `nlp`, a max sequence length `max_tokens`,
# and a list of class names `labels` defined earlier.
model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(
        input_dim=len(nlp.vocab.vectors) + 1,
        input_length=max_tokens,
        output_dim=100),                                   # (None, max_tokens, 100)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        units=64, activation="relu",
        return_sequences=True)),                           # (None, max_tokens, 128)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        units=64, activation="relu")),                     # (None, 128): time axis dropped
    tf.keras.layers.Flatten(),                             # (None, 128): no-op on 2-D input
    tf.keras.layers.Dense(units=1024, activation='relu'),  # (None, 1024)
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),                             # still (None, 1024): redundant
    tf.keras.layers.Dense(units=len(labels), activation='softmax'),  # (None, len(labels))
])
Its model.summary() shows the per-layer output shapes annotated in the comments above; the final output is (None, len(labels)).
For example, every RNN layer with return_sequences=True, keeping Flatten:
model2 = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(
        input_dim=len(nlp.vocab.vectors) + 1,
        input_length=max_tokens,
        output_dim=100),                                   # (None, max_tokens, 100)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        units=64, activation="relu",
        return_sequences=True)),                           # (None, max_tokens, 128)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        units=64, activation="relu",
        return_sequences=True)),                           # (None, max_tokens, 128): time axis kept
    tf.keras.layers.Flatten(),                             # (None, max_tokens * 128)
    tf.keras.layers.Dense(units=1024, activation='relu'),  # (None, 1024)
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Flatten(),                             # still (None, 1024): redundant
    tf.keras.layers.Dense(units=len(labels), activation='softmax'),  # (None, len(labels))
])
Its model.summary() differs only at the Flatten step: (None, max_tokens, 128) is collapsed into (None, max_tokens * 128), and the final output is still (None, len(labels)).
For example, every RNN layer with return_sequences=True and Flatten removed:
model = tf.keras.models.Sequential([
    tf.keras.layers.Embedding(
        input_dim=len(nlp.vocab.vectors) + 1,
        input_length=max_tokens,
        output_dim=100),                                   # (None, max_tokens, 100)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        units=64, activation="relu",
        return_sequences=True)),                           # (None, max_tokens, 128)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        units=64, activation="relu",
        return_sequences=True)),                           # (None, max_tokens, 128)
    # tf.keras.layers.Flatten(),                           # removed: time axis survives
    tf.keras.layers.Dense(units=1024, activation='relu'),  # (None, max_tokens, 1024): Dense acts per timestep
    tf.keras.layers.Dropout(0.2),
    # tf.keras.layers.Flatten(),                           # removed
    tf.keras.layers.Dense(units=len(labels), activation='softmax'),  # (None, max_tokens, len(labels))
])
Its model.summary() shows the time dimension surviving all the way to the output: the final shape is (None, max_tokens, len(labels)), i.e. one prediction per token rather than one per sentence.
Summary:
return_sequences=True keeps the time dimension unchanged; omitting it on the last RNN layer drops that middle dimension.
Flatten multiplies the last two dimensions into one, i.e. it removes one dimension.
A Flatten or GlobalMaxPooling1D would squash your 1-vector-per-input into a single output, rather than leaving 1-output-per-input. MaxPooling1D, in contrast, only changes the size of the middle (time) dimension.
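These dimension rules can be checked directly; a minimal sketch with made-up sizes (batch=2, time=5, features=8):

import tensorflow as tf

x = tf.random.normal((2, 5, 8))   # (batch, time, features), toy sizes

print(tf.keras.layers.LSTM(4, return_sequences=True)(x).shape)  # (2, 5, 4): time axis kept
print(tf.keras.layers.LSTM(4)(x).shape)                         # (2, 4): time axis dropped
print(tf.keras.layers.Flatten()(x).shape)                       # (2, 40): last two axes multiplied
print(tf.keras.layers.GlobalMaxPooling1D()(x).shape)            # (2, 8): time axis pooled away
print(tf.keras.layers.MaxPooling1D(pool_size=2)(x).shape)       # (2, 2, 8): time axis shrunk, not removed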
5. Classifying not just one token at a time, but a series of tokens, requires changing the embedding layer's input_length.
For classifying just one token at a time: input_length=1;
for classifying a series of tokens: input_length=max_tokens:
tf.keras.layers.Embedding(
    input_dim=len(nlp.vocab.vectors) + 1,
    input_length=max_tokens,
    output_dim=100),
Classifying not just one token at a time, but a series of tokens, using a convolutional, recurrent, or transformer network. If you pursue this direction, you will likely want to replace the code

for token in tokens:
    ...
    train_features.append([token_index])

with code that groups tokens by sentence, e.g.:

for sent in nlp(text).sents:
    token_indexes = []
    for token in sent:
        ...
        token_indexes.append(token_index)
    train_features.append(token_indexes)

The shape of your matrices should now be (number of sentences, maximum number of tokens in a sentence), and you will need to take care to pad your matrices accordingly.
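A minimal sketch of that padding step, assuming train_features is the ragged list of per-sentence index lists built above (pad_sequences is the stock tf.keras utility; the padding/truncating choices here are assumptions):

from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pad (or truncate) every sentence's index list to max_tokens so the
# feature matrix has shape (number of sentences, max_tokens).
train_features = pad_sequences(train_features, maxlen=max_tokens,
                               padding='post', truncating='post', value=0)
print(train_features.shape)   # (number of sentences, max_tokens)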
References:
The paper discusses how we can identify name, location, and time keywords in a sentence.
Fixed Positional Encodings: positional encoding for the Attention model.
Splitting an array into evenly sized chunks (helpful for creating training batches): python - How do you split a list into evenly sized chunks? - Stack Overflow
The annotation guidelines are here: Overleaf, Online LaTeX Editor
python - How to use F-score as error function to train neural networks? - Stack Overflow