My solution to cs224n assignment3

pku_zzy

Article link: https://blog.csdn.net/PKU_ZZY/article/details/77731582

My solution

a primer on NER

NER (named entity recognition) is a sequence labeling problem: the input is a sentence, and the output is a sequence of labels, one per token.

The label types are:
- Person (PER) (pronouns such as "he" or "she" are not considered named entities)
- Organization (ORG)
- Location (LOC)
- Miscellaneous (MISC)
- O (not a named entity)

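Evaluation metrics for the labels:
Recall, precision, and F1-score are computed over the non-null labels only. Plain accuracy over all tokens would be misleading, because the O label is so frequent that it dominates the score.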
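The token-level metrics described above can be sketched as follows. This is a minimal illustration, not the assignment's own evaluation code: the function name `token_prf` and the toy label sequences are made up for the example.

```python
def token_prf(gold, guess, null_label="O"):
    """Token-level precision, recall and F1 over non-null labels only."""
    # A prediction counts as correct only if it is non-null and matches gold.
    correct = sum(1 for g, p in zip(gold, guess) if p != null_label and g == p)
    n_guess = sum(1 for p in guess if p != null_label)  # non-null predictions
    n_gold = sum(1 for g in gold if g != null_label)    # non-null gold labels
    precision = correct / n_guess if n_guess else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy example: 6 of 8 tokens are labeled correctly.
gold = ["PER", "PER", "O", "O", "ORG", "O", "O", "O"]
guess = ["PER", "O", "O", "O", "LOC", "O", "O", "O"]

accuracy = sum(g == p for g, p in zip(gold, guess)) / len(gold)
precision, recall, f1 = token_prf(gold, guess)
print(accuracy, precision, recall, f1)
```

In this toy case the accuracy is 0.75, while the precision is 0.5 and the recall is 1/3. The gap comes entirely from the many O tokens that are trivially correct.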

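There are also entity-level metrics:
Here recall, precision, and F1-score are computed over entities rather than tokens. An entity counts as correctly labeled only when every token of the phrase is labeled correctly.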
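A rough sketch of entity-level scoring is given below. It assumes that an entity is a maximal run of consecutive tokens sharing the same non-O label (there is no B-/I- prefix in the label set listed earlier). The helper names `get_chunks` and `entity_prf` are chosen for this example and are not claimed to match the assignment's code.

```python
def get_chunks(labels, null_label="O"):
    """Return entities as (label, start, end) with end exclusive.

    Assumption: a maximal run of identical non-null labels is one entity.
    """
    chunks = []
    start, current = None, null_label
    for i, label in enumerate(labels):
        if label != current:
            if current != null_label:
                chunks.append((current, start, i))  # close the open entity
            start, current = i, label
    if current != null_label:
        chunks.append((current, start, len(labels)))  # entity at sentence end
    return chunks


def entity_prf(gold, guess):
    """Entity-level precision, recall and F1 (exact span and label match)."""
    gold_chunks = set(get_chunks(gold))
    guess_chunks = set(get_chunks(guess))
    correct = len(gold_chunks & guess_chunks)
    precision = correct / len(guess_chunks) if guess_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the two-token PER entity is only half labeled,
# so it does not count as correct at the entity level.
gold = ["PER", "PER", "O", "ORG", "ORG"]
guess = ["PER", "O", "O", "ORG", "ORG"]
print(get_chunks(gold))
print(entity_prf(gold, guess))
```

Only the ORG entity matches exactly here, so one of two predicted entities and one of two gold entities are correct, even though four of the five tokens carry the right label.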

A more complete picture is given by the confusion matrix (rows are gold labels, columns are predicted labels), for example:

gold/guess	PER	ORG	LOC	MISC	O
PER	2973	59	41	14	62
ORG	152	1648	94	62	136
LOC	57	104	1868	25	40
MISC	47	58	45	1012	106
O	46	49	12	33	42619
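As a small illustration, per-label precision and recall can be read off the confusion matrix above: recall divides the diagonal entry by its row sum (all tokens with that gold label), and precision divides it by its column sum (all tokens predicted as that label). The function name `per_label_scores` is made up for this sketch; the numbers are copied from the table.

```python
LABELS = ["PER", "ORG", "LOC", "MISC", "O"]

# Rows: gold label, columns: predicted label (values from the table above).
CONFUSION = [
    [2973, 59, 41, 14, 62],
    [152, 1648, 94, 62, 136],
    [57, 104, 1868, 25, 40],
    [47, 58, 45, 1012, 106],
    [46, 49, 12, 33, 42619],
]


def per_label_scores(matrix, labels):
    """Return {label: (precision, recall)} from a gold-by-guess matrix."""
    scores = {}
    for k, label in enumerate(labels):
        true_positive = matrix[k][k]
        gold_total = sum(matrix[k])                  # row sum
        guess_total = sum(row[k] for row in matrix)  # column sum
        precision = true_positive / guess_total if guess_total else 0.0
        recall = true_positive / gold_total if gold_total else 0.0
        scores[label] = (precision, recall)
    return scores


for label, (precision, recall) in per_label_scores(CONFUSION, LABELS).items():
    print(f"{label:5s} precision={precision:.3f} recall={recall:.3f}")
```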

window into NER

　　
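The simplest approach is to predict the label of each word directly from the input x inside a window around it, using a neural network.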
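To make the window idea concrete, here is a minimal sketch of how the per-token windows could be built before they are fed to a network. The padding tokens `<s>` and `</s>`, the default `window_size`, and the function name `make_windows` are assumptions made for this example. The network itself (embedding lookup, hidden layer, softmax) is left out.

```python
def make_windows(tokens, window_size=1, start="<s>", end="</s>"):
    """Return one window of 2 * window_size + 1 tokens per input token.

    The sentence is padded on both sides so that the first and last
    tokens also get a full-width window.
    """
    padded = [start] * window_size + list(tokens) + [end] * window_size
    width = 2 * window_size + 1
    return [padded[i:i + width] for i in range(len(tokens))]


# Hypothetical sentence; each window is centered on one token.
for window in make_windows(["Jim", "bought", "shares"], window_size=1):
    print(window)
```

Each window would then be mapped to word vectors, concatenated, and classified into one of the five labels, so the prediction for the center word can use its neighbors as context.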


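Some of the questions and their answers follow:

(a) Ambiguity

Q: Provide 2 examples of sentences containing a named entity with an ambiguous type
A: This question only asks for examples and is fairly simple. It shows that named entity recognition involves ambiguity. The sentences given in the reference answer are:

“Spokesperson for Levis, Bill Murray, said …”, where it is ambiguous whether Levis is a person or an organization.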
“Heartbreak is a new virus,” where Heartbreak could either be a MISC named entity (it’s actually the name of a virus), or simply a noun.

Q: Why might it be important to use features apart from the word itself to predict named entity labels?
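A: Because a common noun is often also the name of an organization, and a person's name can be an organization name as well. The word alone therefore does not carry all the information needed, and the label has to be decided together with the surrounding context. This is the motivation for the window.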

Q: Describe at least two features (apart from the word) that would help in predicting whether a word is part of a named entity or not.
A: For example, the capitalization pattern of the word, its part of speech, and its prefixes and suffixes.
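A minimal sketch of such features is shown below. The function name `word_features`, the category names, and the affix length of 2 are all choices made for this illustration. Part-of-speech tags are omitted because they would require an external tagger.

```python
def word_features(word, affix_len=2):
    """Extract simple surface features: casing pattern, prefix and suffix."""
    if word.isupper():
        case = "ALL_CAPS"   # e.g. acronyms
    elif word[:1].isupper():
        case = "INIT_CAP"   # e.g. most proper names
    elif word.islower():
        case = "LOWER"
    else:
        case = "OTHER"      # digits, punctuation, other mixed forms
    return {
        "case": case,
        "prefix": word[:affix_len].lower(),
        "suffix": word[-affix_len:].lower(),
    }


for word in ["Levis", "IBM", "virus", "2024"]:
    print(word, word_features(word))
```

Features like these could be concatenated with the word vector for each position in the window, which gives the classifier a signal even for words it has rarely or never seen.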