Saving a spaCy NER model after every iteration (Python)

weixin_39914499

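I am trying to save my custom spaCy NER model after every iteration. Is there an API similar to the one in TensorFlow for saving the model weights after every iteration, or after a given number of iterations? I could then reload the saved model and continue training from there.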

Also, how can I use all the cores on my system under Linux? I see that only two of the four cores are being used. spaCy uses a multi-task CNN for NER, and I understand that retraining it on a CPU takes more time. Are there other ways to speed up training of the NER model?

import random
from pathlib import Path

import plac
import spacy

# TRAIN_DATA is not shown in the original post. The loops below expect a list
# of (text, {'entities': [(start, end, label), ...]}) tuples.


@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int))
def main(model=None, output_dir=None, n_iter=100):
    """Load the model, set up the pipeline and train the entity recognizer."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank('en')  # create blank Language class
        print("Created blank 'en' model")

    # create the NER component and add it to the pipeline if it is missing
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    # otherwise, get it so we can add labels
    else:
        ner = nlp.get_pipe('ner')

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.5,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)

    # save model to output directory (only once, after the last iteration)
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)


if __name__ == '__main__':
    plac.call(main)
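One possible way to get periodic checkpoints out of a loop like the one above is to reuse the `nlp.to_disk` call that the script already makes at the end, and write to a separate sub-directory per iteration. The following is only a minimal sketch under that assumption, not a built-in spaCy feature: the helper names `checkpoint_dir` and `save_checkpoint`, the `every` parameter, and the `iter_0000` directory naming are all hypothetical.

```python
from pathlib import Path


def checkpoint_dir(output_dir, itn):
    """Return the (hypothetical) directory for the checkpoint of iteration itn."""
    return Path(output_dir) / "iter_{:04d}".format(itn)


def save_checkpoint(nlp, output_dir, itn, every=1):
    """Save nlp after iteration itn when (itn + 1) is a multiple of `every`.

    `nlp` only needs a to_disk(path) method, as the spaCy Language object has.
    Returns the checkpoint path, or None when nothing was written.
    """
    if output_dir is None or (itn + 1) % every != 0:
        return None
    path = checkpoint_dir(output_dir, itn)
    path.mkdir(parents=True, exist_ok=True)
    nlp.to_disk(path)  # same call the script uses for its final save
    return path


# Inside the training loop, after print(losses), one could call for example:
#     save_checkpoint(nlp, output_dir, itn, every=10)
# A saved directory can later be loaded again with spacy.load(str(path)).
```

Passing the path of a checkpoint directory as the script's `model` argument would then load that checkpoint through `spacy.load`. Whether the subsequent `nlp.begin_training()` call keeps the loaded weights or re-initializes them may depend on the spaCy version, so that is worth verifying before relying on this to resume training.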
