Problems with Named Entity Recognition in spaCy using German de_dep_news_trf Pipeline

Summary:

Named entity recognition (NER) returns no entities when using spaCy's German de_dep_news_trf pipeline.

Background:

I'm currently working on a project using spaCy with the German trained pipeline de_dep_news_trf.

Unfortunately, I'm having issues with named entity recognition (NER).

When I run a simple sentence like "Berlin ist die Hauptstadt von Deutschland. Angela Merkel war die Bundeskanzlerin.", no entities are detected.

I followed these steps to set up my Python 3.12 environment on Windows, in a PyCharm Community project:

python.exe -m pip install --upgrade pip
pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download de_dep_news_trf --timeout 600
pip install spacy[transformers]

Here is a snippet of my code:

import spacy


def process_text_with_spacy(text_to_process):
    doc = nlp(text_to_process)
    data = {
        "text": text_to_process,
        "sentences": []
    }
    for sent in doc.sents:
        process_sentence_data = {
            "sentence": sent.text,
            "entities": []
        }
        for ent in sent.ents:
            process_sentence_data["entities"].append({
                "text": ent.text,
                "start": ent.start_char,
                "end": ent.end_char,
                "label": ent.label_
            })
        data["sentences"].append(process_sentence_data)
    return data


nlp = spacy.load('de_dep_news_trf')

sample_text = "Berlin ist die Hauptstadt von Deutschland. Angela Merkel war die Bundeskanzlerin."

processed_data = process_text_with_spacy(sample_text)

print("Text:", sample_text)
for sentence_data in processed_data["sentences"]:
    print("Sentence:", sentence_data["sentence"])
    print("Entities:", sentence_data["entities"])

Output:

Text: Berlin ist die Hauptstadt von Deutschland. Angela Merkel war die Bundeskanzlerin.
Sentence: Berlin ist die Hauptstadt von Deutschland.
Entities: []
Sentence: Angela Merkel war die Bundeskanzlerin.
Entities: []

When using de_core_news_lg, the output for each sentence is:

Text: Berlin ist die Hauptstadt von Deutschland. Angela Merkel war die Bundeskanzlerin.
Sentence: Berlin ist die Hauptstadt von Deutschland.
Entities: [{'text': 'Berlin', 'start': 0, 'end': 6, 'label': 'LOC'}, {'text': 'Deutschland', 'start': 30, 'end': 41, 'label': 'LOC'}]
Sentence: Angela Merkel war die Bundeskanzlerin.
Entities: [{'text': 'Angela Merkel', 'start': 43, 'end': 56, 'label': 'PER'}]

However, when I use de_dep_news_trf, the entity lists are empty. I chose de_dep_news_trf based on the "accuracy" option on the spaCy website.

Could someone explain why de_dep_news_trf does not return the same result? Is there a specific reason or setting that could cause this difference?

Thank you for your help!

Solution:

The problem is that this model has no component for recognizing entities.

See the documentation for de_dep_news_trf: its components are transformer, tagger, morphologizer, parser, lemmatizer, and attribute_ruler, but there is no ner component (EntityRecognizer).
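A quick way to confirm this on your own machine is to look at the pipeline's component names. The sketch below is only an illustration: has_ner and check_installed_pipeline are hypothetical helper names, and the check assumes the entity recognizer is registered under spaCy's default component name "ner".

```python
# Component list for de_dep_news_trf, copied from the answer above.
DE_DEP_NEWS_TRF_COMPONENTS = [
    "transformer", "tagger", "morphologizer",
    "parser", "lemmatizer", "attribute_ruler",
]


def has_ner(pipe_names):
    """Return True if the component names include an entity recognizer.

    Assumption: the recognizer uses the default component name "ner".
    """
    return "ner" in pipe_names


def check_installed_pipeline(model_name):
    """Load an installed pipeline and report whether it has an "ner" component."""
    import spacy  # imported lazily so has_ner() works without spaCy installed

    nlp = spacy.load(model_name)
    return has_ner(nlp.pipe_names)


print(has_ner(DE_DEP_NEWS_TRF_COMPONENTS))  # prints False
```

With the models installed, check_installed_pipeline("de_dep_news_trf") should return False as well, which is why doc.ents stays empty for that pipeline.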

So you may need to use one of the other models:

  • de_core_news_sm
  • de_core_news_md
  • de_core_news_lg
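As a rough sketch of the switch, the code from the question can stay almost unchanged: only the model name differs, plus an optional guard that fails early when the loaded pipeline has no entity recognizer. The names entities_by_sentence and extract_entities are made up for this example, and de_core_news_lg is assumed to be installed already (python -m spacy download de_core_news_lg).

```python
def entities_by_sentence(doc):
    """Collect the entities of each sentence in a processed spaCy Doc."""
    return [
        {
            "sentence": sent.text,
            "entities": [
                {
                    "text": ent.text,
                    "start": ent.start_char,
                    "end": ent.end_char,
                    "label": ent.label_,
                }
                for ent in sent.ents
            ],
        }
        for sent in doc.sents
    ]


def extract_entities(text, model_name="de_core_news_lg"):
    """Load a pipeline, verify that it has an "ner" component, and run it."""
    import spacy  # imported lazily; requires spaCy and the model to be installed

    nlp = spacy.load(model_name)
    if "ner" not in nlp.pipe_names:
        # Fail loudly instead of silently returning empty entity lists.
        raise ValueError(f"{model_name} has no 'ner' component: {nlp.pipe_names}")
    return entities_by_sentence(nlp(text))
```

Calling extract_entities(sample_text) with the sample text from the question should give output along the lines of the de_core_news_lg results shown there, while extract_entities(sample_text, "de_dep_news_trf") would raise the ValueError.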
