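While running BERT-CRF I found that my work computer couldn't handle it, so I swapped BERT for a BiLSTM. I had never worked with Chinese named entity recognition code before, so I made a small mistake... and it took me half a day to find it. (Here I used a (bidirectional) maximum entropy Markov model, which serves the same purpose as a CRF and is used the same way, but is faster and simpler.)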
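For context, the difference between the two models can be sketched as follows. This is the usual textbook formulation rather than anything taken from the code in this post; the symbols f (the per-token label score produced by the network) and g (the label transition score) are notation introduced here for illustration.

```latex
% CRF: a single global normalizer Z(x) over all label sequences
P_{\mathrm{CRF}}(y \mid x) =
  \frac{\exp\Big(\sum_{t=1}^{n} f(y_t; x) + \sum_{t=2}^{n} g(y_{t-1}, y_t)\Big)}{Z(x)}

% MEMM: a product of per-step softmaxes, each normalized locally
P_{\mathrm{MEMM}}(y \mid x) = p(y_1 \mid x) \prod_{t=2}^{n} p(y_t \mid x, y_{t-1}),
\qquad
p(y_t \mid x, y_{t-1}) =
  \frac{\exp\big(f(y_t; x) + g(y_{t-1}, y_t)\big)}
       {\sum_{y'} \exp\big(f(y'; x) + g(y_{t-1}, y')\big)}
```

Because each MEMM factor is normalized on its own, training does not need the recursion that computes Z(x), which is roughly where the "faster and simpler" comes from.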
The original code was:
input_id = Input(batch_shape=(None, None), )
x = Embedding(21128, 8)(input_id)
x = Bidirectional(LSTM(256))(x)  # bug: return_sequences defaults to False, so the output is 2-D (batch, 512)
x = Dropout(0.3)(x)
x = Dense(1)(x)
output = Dense(len(categories) * 2 + 1)(x)
MME = MaximumEntropyMarkovModel(lr_multiplier=crf_lr_multiplier)
crf = CRF(7, sparse_target=True)
output = MME(output)
model = Model(input_id, output)
model.summary()
model.compile(
loss=MME.sparse_loss,
optimizer=Adam(learning_rate),
metrics=[MME.sparse_accuracy]
)
This uses Su Jianlin's bert4keras (he is great!), and it kept throwing this error:
ValueError: Invalid reduction dimension 2 for input with 2 dimensions.
for 'metrics/sparse_accuracy/All' (op: 'All') with input shapes: [?,7], [] and with computed input tensors: input[1] = <2>.
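Since it would not run no matter what, I went to look at how other people build this model in Keras, and only after calling model.summary() did I realize... the CRF layer expects a three-dimensional input of shape (batch, sequence length, number of labels), which means the LSTM layer needs return_sequences=True...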
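To see why the error complains about "dimension 2 for input with 2 dimensions", it may help to trace the shapes by hand. The sketch below is plain Python with no Keras dependency. `bilstm_shape`, `dense_shape`, and `trace` are hypothetical helpers written for illustration, and the value 7 assumes that `len(categories) * 2 + 1` equals 7, as the error message suggests.

```python
# Hand-rolled shape tracing; these helpers are illustrative, not Keras APIs.

def bilstm_shape(input_shape, units, return_sequences):
    """Output shape of Bidirectional(LSTM(units)) with the default concat merge."""
    batch, steps, _features = input_shape
    if return_sequences:
        return (batch, steps, 2 * units)  # one vector per token
    return (batch, 2 * units)  # only the final state; the time axis is dropped


def dense_shape(input_shape, units):
    """Dense only changes the size of the last axis."""
    return input_shape[:-1] + (units,)


def trace(return_sequences, num_labels=7):
    shape = (None, None)                    # Input: (batch, seq_len)
    shape = shape + (8,)                    # Embedding(21128, 8)
    shape = bilstm_shape(shape, 256, return_sequences)
    shape = dense_shape(shape, 1)           # Dropout leaves the shape unchanged
    shape = dense_shape(shape, num_labels)  # one score per label
    return shape


broken = trace(return_sequences=False)
fixed = trace(return_sequences=True)
print(broken, len(broken))  # rank 2: there is no axis 2 to reduce over
print(fixed, len(fixed))    # rank 3: (batch, seq_len, num_labels)
```

With `return_sequences=False` the label scores come out with shape `[?, 7]`, exactly the input shape quoted in the error, so any reduction over axis 2 has nothing to work on.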
After that change the problem went away:
input_id = Input(batch_shape=(None, None), )
x = Embedding(21128, 8)(input_id)
x = Bidirectional(LSTM(256, return_sequences=True))(x)  # keep the time axis: (batch, seq_len, 512)
x = Dropout(0.3)(x)
x = Dense(1)(x)
output = Dense(len(categories) * 2 + 1)(x)
MME = MaximumEntropyMarkovModel(lr_multiplier=crf_lr_multiplier)
crf = CRF(7, sparse_target=True)
output = MME(output)
model = Model(input_id, output)
model.summary()
model.compile(
loss=MME.sparse_loss,
optimizer=Adam(learning_rate),
metrics=[MME.sparse_accuracy]
)
model.summary() should look like this:
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, None)              0
_________________________________________________________________
embedding_1 (Embedding)      (None, None, 8)           169024
_________________________________________________________________
bidirectional_1 (Bidirection (None, None, 512)         542720
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 512)         0
_________________________________________________________________
dense_1 (Dense)              (None, None, 1)           513
_________________________________________________________________
dense_2 (Dense)              (None, None, 7)           14
_________________________________________________________________
maximum_entropy_markov_model (None, None, 7)           49
=================================================================
Total params: 712,320
Trainable params: 712,320
Non-trainable params: 0
_________________________________________________________________
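As a sanity check, the parameter counts in the summary can be reproduced with the standard formulas for each layer type. This is a rough sketch: the helper functions are hypothetical names, and reading the 49 parameters of the last layer as a 7 x 7 label transition matrix is an inference from the count, not something stated in the summary.

```python
# Standard parameter-count formulas; the helper names are illustrative.

def embedding_params(vocab_size, dim):
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


num_labels = 7
counts = {
    "embedding_1": embedding_params(21128, 8),
    "bidirectional_1": 2 * lstm_params(8, 256),  # forward + backward LSTM
    "dense_1": dense_params(512, 1),
    "dense_2": dense_params(1, num_labels),
    # Assumption: the layer holds one num_labels x num_labels transition matrix.
    "maximum_entropy_markov_model": num_labels * num_labels,
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name}: {n}")
print(f"total: {total}")
```

The per-layer numbers and the total line up with the Param # column and the "Total params" line above.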