Fixing "Shapes (None, 111) and (None, 111) are incompatible"

1. The difference between sparse_categorical_crossentropy and categorical_crossentropy

This addresses errors of the form: "logits and labels must have the same first dimension, got logits shape [32,28] and labels shape [3360]".

The root cause of this error is that the network's final output shape does not match the shape of the labels.

My features and labels matrices have the same 2 dimensions (number of sentences, length of the longest sentence), i.e. both are (1257, 111).
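A minimal sketch of that setup (array sizes assumed from this post): with one integer tag per token, the model's output must be (None, 111, num_tags), i.e. one probability vector per token, to be compatible with these labels under sparse_categorical_crossentropy.

    import numpy as np

    # Assumed sizes from this post: 1257 sentences, 111 tokens each.
    X = np.zeros((1257, 111), dtype="int32")  # token indexes per sentence
    y = np.zeros((1257, 111), dtype="int32")  # one integer tag per token
    print(X.shape, y.shape)  # (1257, 111) (1257, 111)
    # With sparse_categorical_crossentropy, the model output must then be
    # (None, 111, num_tags): one probability vector per token.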

categorical_crossentropy and sparse_categorical_crossentropy - 熊猫blue - 博客园

tensorflow - ValueError: Shapes (None, 1) and (None, 2) are incompatible - Stack Overflow

Both compute multi-class cross-entropy; they differ only in the format required for y (the labels).

1) With categorical_crossentropy, y must be one-hot encoded.

2) With sparse_categorical_crossentropy, y stays in raw integer form, e.g. [1, 0, 2, 0, 2].
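A minimal runnable sketch of the two label formats (the three classes and random outputs here are made up for illustration):

    import numpy as np
    import tensorflow as tf

    y_int = np.array([1, 0, 2, 0, 2])                               # raw integer labels
    y_onehot = tf.keras.utils.to_categorical(y_int, num_classes=3)  # shape (5, 3)
    probs = tf.nn.softmax(tf.random.uniform((5, 3)))                # fake softmax outputs

    # categorical_crossentropy expects one-hot labels...
    loss_cat = tf.keras.losses.categorical_crossentropy(y_onehot, probs)
    # ...sparse_categorical_crossentropy expects the integers directly.
    loss_sparse = tf.keras.losses.sparse_categorical_crossentropy(y_int, probs)
    print(loss_cat.numpy())     # the two losses match element for element
    print(loss_sparse.numpy())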

2. For a CNN, look into using padding="same" for your convolutional layer, which is roughly the CNN counterpart of return_sequences=True in an RNN: both keep the timestep dimension intact.
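A quick sketch of that effect on shapes (batch, sequence, and filter sizes are assumed):

    import tensorflow as tf

    x = tf.random.uniform((32, 111, 100))  # (batch, timesteps, features)
    same = tf.keras.layers.Conv1D(64, kernel_size=5, padding="same")(x)
    valid = tf.keras.layers.Conv1D(64, kernel_size=5, padding="valid")(x)
    print(same.shape)   # (32, 111, 64): timestep dimension preserved
    print(valid.shape)  # (32, 107, 64): timesteps shrink by kernel_size - 1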

3. Apply tensorflow.keras.utils.to_categorical to convert y (the labels) to one-hot encoding, as sketched below.

python - ValueError: Shapes (None, 1) and (None, 3) are incompatible - Stack Overflow
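A sketch of to_categorical on per-token labels; the 28 tag classes are an assumption taken from the logits shape [32,28] in the error message above:

    import numpy as np
    from tensorflow.keras.utils import to_categorical

    y = np.random.randint(0, 28, size=(1257, 111))  # integer tag per token (assumed sizes)
    y_onehot = to_categorical(y, num_classes=28)
    print(y_onehot.shape)  # (1257, 111, 28): now suitable for categorical_crossentropy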

4. For shape-incompatibility errors, the main things to check are: adding or removing Flatten, GlobalMaxPooling1D, or MaxPooling layers; whether the CNN uses padding="same"; and whether every RNN layer sets return_sequences=True or the last RNN layer omits it.

For example, with the last RNN layer omitting return_sequences=True and Flatten included:

    model = tf.keras.models.Sequential([
        tf.keras.layers.Embedding(
            input_dim=len(nlp.vocab.vectors) + 1,
            input_length=max_tokens,
            output_dim=100),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=64, activation="relu", return_sequences=True)),
        # The last LSTM omits return_sequences, so its output collapses to (None, 128).
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=64, activation="relu")),
        tf.keras.layers.Flatten(),  # a no-op here: the input is already 2-D
        tf.keras.layers.Dense(units=1024, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=len(labels), activation='softmax'),
    ])

Its model.summary() (output not reproduced here) ends in a 2-D tensor: the second Bidirectional LSTM collapses the sequence to (None, 128), so the final Dense layer outputs (None, len(labels)).

For example, with return_sequences=True on every RNN layer and Flatten kept:

    model2 = tf.keras.models.Sequential([
        tf.keras.layers.Embedding(
            input_dim=len(nlp.vocab.vectors) + 1,
            input_length=max_tokens,
            output_dim=100),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=64, activation="relu", return_sequences=True)),
        # return_sequences=True on the last LSTM too: output stays (None, max_tokens, 128).
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=64, activation="relu", return_sequences=True)),
        tf.keras.layers.Flatten(),  # merges timesteps and features: (None, max_tokens * 128)
        tf.keras.layers.Dense(units=1024, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=len(labels), activation='softmax'),
    ])

Its model.summary() (output not reproduced here) also ends in a 2-D tensor: Flatten merges (None, max_tokens, 128) into (None, max_tokens * 128), and the final Dense layer outputs (None, len(labels)).

For example, with return_sequences=True on every RNN layer and Flatten removed:

    model = tf.keras.models.Sequential([
        tf.keras.layers.Embedding(
            input_dim=len(nlp.vocab.vectors) + 1,
            input_length=max_tokens,
            output_dim=100),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=64, activation="relu", return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units=64, activation="relu", return_sequences=True)),
        # No Flatten: each Dense layer applies per timestep, so the output stays 3-D.
        #tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=1024, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        #tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=len(labels), activation='softmax'),
    ])

Its model.summary() (output not reproduced here) ends in a 3-D tensor: without Flatten, each Dense layer applies per timestep, so the final layer outputs (None, max_tokens, len(labels)), one prediction per token.

Summary:

return_sequences=True keeps the timestep dimension; removing it drops that middle dimension.

Flatten multiplies the last two dimensions together into one, i.e. it removes a dimension.

Flatten or GlobalMaxPooling1D squashes your one-vector-per-timestep into a single output, rather than leaving one output per timestep.

MaxPooling1D changes the size of the middle (timestep) dimension.
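A small sketch that prints each of these effects (sizes assumed):

    import tensorflow as tf

    x = tf.random.uniform((32, 111, 128))  # (batch, timesteps, features)
    print(tf.keras.layers.Flatten()(x).shape)                  # (32, 14208): 111 * 128 merged
    print(tf.keras.layers.GlobalMaxPooling1D()(x).shape)       # (32, 128): timesteps squashed away
    print(tf.keras.layers.MaxPooling1D(pool_size=2)(x).shape)  # (32, 55, 128): timestep size halved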

5. Classifying not just one token at a time but a series of tokens requires changing the embedding layer's input_length to max_tokens.

For classifying just one token at a time: input_length=1;

For classifying a series of tokens: input_length=max_tokens;

        tf.keras.layers.Embedding(
            input_dim=len(nlp.vocab.vectors) + 1,
            input_length=max_tokens,
            output_dim=100),

To classify not just one token at a time but a series of tokens, using a convolutional, recurrent, or transformer network, you will likely want to replace the code

for token in tokens:
    ...
    train_features.append([token_index])

with code that groups tokens by sentence, e.g.:

for sent in nlp(text).sents:
    token_indexes = []
    for token in sent:
        ...
        token_indexes.append(token_index)
    train_features.append(token_indexes)

The shape of your matrices should now be (number of sentences, maximum number of tokens in a sentence) and you will need to take care to pad your matrices accordingly. 
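A padding sketch using Keras's pad_sequences (the toy index lists and max_tokens value are made up):

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    max_tokens = 4  # length of the longest sentence (assumed)
    train_features = [[4, 17, 9], [2, 8], [5, 3, 11, 6]]  # ragged per-sentence indexes
    padded = pad_sequences(train_features, maxlen=max_tokens, padding="post", value=0)
    print(padded.shape)  # (number of sentences, max_tokens) = (3, 4)
    # Pad the label sequences the same way so features and labels stay aligned.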

