Reading Notes on "Mastering Natural Language Processing with Python" (Deepti Chopra), Chapter 10: Evaluating NLP Systems

"Mastering Natural Language Processing with Python"

Deepti Chopra (India)
Translated by Wang Wei


Chapter 10 Evaluating NLP Systems: Analyzing Performance


10.1 Key points in evaluating NLP systems

Creating a gold-standard annotated corpus is a major task, and it is also very expensive. It is produced by manually annotating the given test data. The tags obtained in this way are treated as the standard tags and can be used to represent a wide range of information.
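As a minimal sketch (the sentence and its tags below are illustrative, not from the book), a gold-standard annotation is simply the hand-assigned tag sequence against which a system's output is compared:
# A hypothetical hand-tagged (gold-standard) sentence and a tagger's output for it
gold = [("the", "DT"), ("cat", "NN"), ("sat", "VBD")]
system = [("the", "DT"), ("cat", "NN"), ("sat", "NN")]

# Accuracy is the proportion of tokens whose system tag matches the gold tag
correct = sum(1 for g, s in zip(gold, system) if g[1] == s[1])
print(correct / len(gold))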

10.1.1 Evaluating NLP tools (POS taggers, stemmers, and morphological analyzers)
Training a unigram tagger:
import nltk
from nltk.corpus import brown
sentences = brown.tagged_sents(categories='news')
sent = brown.sents(categories='news')
unigram_sent = nltk.UnigramTagger(sentences)
print(unigram_sent.tag(sent[2008]))
print(unigram_sent.evaluate(sentences))
Training and testing a unigram tagger on separate data:
import nltk
from nltk.corpus import brown
sentences = brown.tagged_sents(categories='news')
sz = int(len(sentences) * 0.8)
print(sz)
training_sents = sentences[:sz]
testing_sents = sentences[sz:]
unigram_tagger = nltk.UnigramTagger(training_sents)
print(unigram_tagger.evaluate(testing_sents))
Using a bigram tagger:
import nltk
from nltk.corpus import brown
sentences = brown.tagged_sents(categories='news')
sent = brown.sents(categories='news')
sz = int(len(sentences) * 0.8)
training_sents = sentences[:sz]
testing_sents = sentences[sz:]
bigram_tagger = nltk.BigramTagger(training_sents)
print(bigram_tagger.tag(sent[2008]))

un_sent = sent[4203]
print(bigram_tagger.tag(un_sent))

print(bigram_tagger.evaluate(testing_sents))
Implementing a combined tagger with backoff:
import nltk
from nltk.corpus import brown
sentences = brown.tagged_sents(categories='news')
sz = int(len(sentences) * 0.8)
training_sents = sentences[:sz]
testing_sents = sentences[sz:]
s0 = nltk.DefaultTagger('NNP')
s1 = nltk.UnigramTagger(training_sents, backoff=s0)
s2 = nltk.BigramTagger(training_sents, backoff=s1)
print(s2.evaluate(testing_sents))
Evaluating a chunk parser:
import nltk
chunkparser = nltk.RegexpParser("")
print(nltk.chunk.accuracy(chunkparser, nltk.corpus.conll2000.chunked_sents(
	'train.txt', chunk_types=('NP',))))
Evaluating a naive chunk parser:
import nltk
grammar = r"NP: {<[CDJNP].*>+}"
cp = nltk.RegexpParser(grammar)
print(nltk.chunk.accuracy(cp, nltk.corpus.conll2000.chunked_sents(
		'train.txt', chunk_types=('NP',))))
Computing a conditional frequency distribution over the chunked data:
import nltk
def chunk_tags(train):
	"""Generate a list of the tags that appear inside chunks."""
	cfreqdist = nltk.ConditionalFreqDist()
	for t in train:
		for word, tag, chunktag in nltk.chunk.tree2conlltags(t):
			if chunktag == "O":
				cfreqdist[tag].inc(False)
			else:
				cfreqdist[tag].inc(True)
	return [tag for tag in cfreqdist.conditions() if cfreqdist[tag].max() == True]
training_sents = nltk.corpus.conll2000.chunked_sents('train.txt', chunk_types=('NP',))
print(chunk_tags(training_sents))
Performing chunker evaluation:
import nltk
correct = nltk.chunk.tagstr2tree("[ the/DT little/JJ cat/NN ] sat/VBD on/IN [ the/DT mat/NN ]")
print(correct.flatten())
grammar = r"NP: {<[CDJNP].*>+}"
cp = nltk.RegexpParser(grammar)
grammar = r"NP: {<PRP|DT|POS|JJ|CD|N.*>+}"
chunk_parser = nltk.RegexpParser(grammar)
tagged_tok = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunkscore = nltk.chunk.ChunkScore()
guessed = cp.parse(correct.flatten())
chunkscore.score(correct, guessed)
print(chunkscore)
Evaluating unigram and bigram chunkers:
import nltk
chunker_data = [[(t, c) for w, t, c in nltk.chunk.tree2conlltags(chtree)]
			for chtree in nltk.corpus.conll2000.chunked_sents('train.txt')]
unigram_chunk = nltk.UnigramTagger(chunker_data)
print(nltk.tag.accuracy(unigram_chunk, chunker_data))

bigram_chunk = nltk.BigramTagger(chunker_data, backoff=unigram_chunk)
print(nltk.tag.accuracy(bigram_chunk, chunker_data))
Using a feature extraction function that detects the suffixes of a given word and uses them to determine its part-of-speech tag:
import nltk
from nltk.corpus import brown
suffix_freqdist = nltk.FreqDist()
for wrd in brown.words():
	wrd = wrd.lower()
	suffix_freqdist[wrd[-1:]] += 1
	suffix_freqdist[wrd[-2:]] += 1
	suffix_freqdist[wrd[-3:]] += 1
common_suffixes = [suffix for (suffix, count) in suffix_freqdist.most_common(100)]
print(common_suffixes)

def pos_feature(wrd):
	feature = {}
	for suffix in common_suffixes:
		feature['endswith({})'.format(suffix)] = wrd.lower().endswith(suffix)
	return feature
tagged_wrds = brown.tagged_words(categories='news')
featureset = [(pos_feature(n), g) for (n, g) in tagged_wrds]
size = int(len(featureset) * 0.1)
train_set, test_set = featureset[size:], featureset[:size]
classifier1 = nltk.DecisionTreeClassifier.train(train_set)
print(nltk.classify.accuracy(classifier1, test_set))

classifier1.classify(pos_feature('cats'))
'NNS'

print(classifier1.pseudocode(depth=4))
if endswith(,) == True: return ','
if endswith(,) == False:
	if endswith(the) == True: return 'AT'
	if endswith(the) == False:
		if endswith(s) == True:
			if endswith(is) == True: return 'BEZ'
			if endswith(is) == False: return 'VBZ'
		if endswith(s) == False:
			if endswith(.) == True: return '.'
			if endswith(.) == False: return 'NN'
Building a regular expression tagger that assigns tags based on matching patterns:
import nltk
from nltk.corpus import brown
sentences = brown.tagged_sents(categories='news')
sent = brown.sents(categories='news')
pattern = [
	(r'.*ing$', 'VBG'),  				# gerunds
	(r'.*ed$', 'VBD'),  				# simple past
	(r'.*es$', 'VBZ'),  				# 3rd person singular present
	(r'.*ould$', 'MD'),  				# modals
	(r'.*\'s$', 'NN$'),  				# possessive nouns
	(r'.*s$', 'NNS'),  				# plural nouns
	(r'^-?[0-9]+(.[0-9]+)?$', 'CD'), 	# cardinal numbers
	(r'.*', 'NN')  					# nouns (default)
	]
regexp_tagger = nltk.RegexpTagger(pattern)
print(regexp_tagger.tag(sent[3]))

print(regexp_tagger.evaluate(sentences))
Building a lookup tagger (words that are not in the list of most frequent words are assigned the tag None):
import nltk
from nltk.corpus import brown
freqd = nltk.FreqDist(brown.words(categories='news'))
cfreqd = nltk.ConditionalFreqDist(brown.tagged_words(categories='news'))
mostfreq_words = freqd.most_common(100)
likely_tags = dict((word, cfreqd[word].max()) for (word, _) in mostfreq_words)
baseline_tagger = nltk.UnigramTagger(model=likely_tags)
print(baseline_tagger.evaluate(brown.tagged_sents(categories='news')))

sent = brown.sents(categories='news')[3]
print(baseline_tagger.tag(sent))

baseline_tagger = nltk.UnigramTagger(model=likely_tags,
				backoff=nltk.DefaultTagger('NN'))

def performance(cfreqd, wordlist):
	lt = dict((word, cfreqd[word].max()) for word in wordlist)
	baseline_tagger = nltk.UnigramTagger(model=lt, backoff=nltk.DefaultTagger('NN'))
	return baseline_tagger.evaluate(brown.tagged_sents(categories='news'))

def display():
	import pylab
	word_freqs = nltk.FreqDist(brown.words(categories='news')).most_common()
	words_by_freq = [w for (w, _) in word_freqs]
	cfd = nltk.ConditionalFreqDist(brown.tagged_words(categories='news'))
	sizes = 2 ** pylab.arange(15)
	perfs = [performance(cfd, words_by_freq[:size]) for size in sizes]
	pylab.plot(sizes, perfs, '-bo')
	pylab.title('Lookup Tagger Performance with Varying Model Size')
	pylab.xlabel('Model Size')
	pylab.ylabel('Performance')
	pylab.show()
display()
Stemming with LancasterStemmer (a stemmer is evaluated against gold test data; a sketch of such an evaluation follows the snippet below):
import nltk
from nltk.stem.lancaster import LancasterStemmer
stri = LancasterStemmer()
print(stri.stem('achievement'))
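The snippet above only stems a single word. As a minimal sketch of the gold-data evaluation mentioned in the lead-in (the word/stem pairs below are illustrative, not taken from the book), stemmer accuracy can be measured as the fraction of words whose stem matches a hand-prepared gold stem:
from nltk.stem.lancaster import LancasterStemmer

# Hypothetical gold data: (word, expected stem) pairs prepared by an annotator
gold = [('running', 'run'), ('achievement', 'achiev'), ('happily', 'happy')]
stemmer = LancasterStemmer()
correct = sum(1 for word, stem in gold if stemmer.stem(word) == stem)
print(correct / len(gold))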
Designing a classification-based chunker using the maximum entropy classifier:
import nltk

class ConseNPChunkTagger(nltk.TaggerI):
	def __init__(self, train_sents):
		train_set = []
		for tagsent in train_sents:
			untagsent = nltk.tag.untag(tagsent)
			history = []
			for i, (word, tag) in enumerate(tagsent):
				featureset = npchunk_features(untagsent, i, history)
				train_set.append((featureset, tag))
				history.append(tag)
		self.classifier = nltk.MaxentClassifier.train(train_set, algorithm='megam', trace=0)

	def tag(self, sentence):
		history = []
		for i, word in enumerate(sentence):
			featureset = npchunk_features(sentence, i, history)
			tag = self.classifier.classify(featureset)
			history.append(tag)
		return zip(sentence, history)

class ConseNPChunker(nltk.ChunkParserI):
	def __init__(self, train_sents):
		tagsent = [[((w, t), c) for (w, t, c) in nltk.chunk.tree2conlltags(sent)]
			for sent in train_sents]
		self.tagger = ConseNPChunkTagger(tagsent)

	def parse(self, sentence):
		tagsent = self.tagger.tag(sentence)
		conlltags = [(w, t, c) for ((w, t), c) in tagsent]
		return nltk.chunk.conlltags2tree(conlltags)
Performing chunker evaluation with a simple feature extractor:
def npchunk_features(sentence, i, history):
	word, pos = sentence[i]
	return {"pos": pos}
chunker = ConseNPChunker(train_sents)
print(chunker.evaluate(test_sents))
Chunker evaluation (similar to a bigram chunker, since the previous tag is used as a feature):
def npchunk_features(sentence, i, history):
	word, pos = sentence[i]
	if i == 0:
		previword, previpos = "<START>", "<START>"
	else:
		previword, previpos = sentence[i - 1]
	return {"pos": pos, "previpos": previpos}
chunker = ConseNPChunker(train_sents)
print(chunker.evaluate(test_sents))
Chunker evaluation (adding the current word as a feature to improve the chunker's performance):
def npchunk_features(sentence, i, history):
	word, pos = sentence[i]
	if i == 0:
		previword, previpos = "<START>", "<START>"
	else:
		previword, previpos = sentence[i-1]
	return {"pos": pos, "word": word, "previpos": previpos}
chunker = ConseNPChunker(train_sents)
print(chunker.evaluate(test_sents))
Chunker evaluation (adding a richer feature set to improve the chunker's performance):
def npchunk_features(sentence, i, history):
	word, pos = sentence[i]
	if i == 0:
		previword, previpos = "<START>", "<START>"
	else:
		previword, previpos = sentence[i-1]
	if i == len(sentence) - 1:
		nextword, nextpos = "<END>", "<END>"
	else:
		nextword, nextpos = sentence[i+1]
	return {"pos": pos,
			"word": word,
			"previpos": previpos,
			"nextpos": nextpos,
			"previpos+pos": "%s+%s" % (previpos, pos),
			"pos+nextpos": "%s+%s" % (pos, nextpos),
			"tags-since-dt": tags_since_dt(sentence, i)}
def tags_since_dt(sentence, i):
	tags = set()
	for word, pos in sentence[:i]:
		if pos == 'DT':
			tags = set()
		else:
			tags.add(pos)
	return '+'.join(sorted(tags))

chunker = ConseNPChunker(train_sents)
print(chunker.evaluate(test_sents))
10.1.2 Parser evaluation using gold data

The following two measures can be used to evaluate parser performance (a minimal sketch of computing them follows the list):

  1. Labelled Attachment Score (LAS)
  2. Labelled Exact Match (LEM)
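A minimal sketch of these two scores, assuming each parsed sentence is represented as a list of (head, label) pairs, one pair per token (this representation and the helper names are illustrative, not from the book):
def las(gold_sents, parsed_sents):
	# LAS: fraction of tokens whose predicted head AND dependency label both match the gold
	total = correct = 0
	for gold, parsed in zip(gold_sents, parsed_sents):
		for (g_head, g_label), (p_head, p_label) in zip(gold, parsed):
			total += 1
			if g_head == p_head and g_label == p_label:
				correct += 1
	return correct / total

def lem(gold_sents, parsed_sents):
	# LEM: fraction of sentences whose entire labelled analysis matches the gold exactly
	matches = sum(1 for gold, parsed in zip(gold_sents, parsed_sents) if gold == parsed)
	return matches / len(gold_sents)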

10.2 Evaluating IR systems

Evaluating an IR system involves the following aspects:

  • the resources required,
  • the presentation of documents,
  • market evaluation or appeal to users,
  • retrieval speed,
  • assistance in formulating queries,
  • the ability to find the required documents.

(IR systems are usually evaluated using precision, recall, and the F-measure, as in the sketch below.)
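A minimal sketch, assuming the set of relevant documents and the set of documents retrieved by the system are known (the document IDs are made up); nltk.metrics provides all three scores for sets:
from nltk.metrics import precision, recall, f_measure

relevant = set(['d1', 'd2', 'd3', 'd4'])     # gold judgement: documents that are actually relevant
retrieved = set(['d2', 'd3', 'd5'])          # documents returned by the IR system

print(precision(relevant, retrieved))        # |relevant & retrieved| / |retrieved| = 2/3
print(recall(relevant, retrieved))           # |relevant & retrieved| / |relevant| = 2/4
print(f_measure(relevant, retrieved))        # harmonic mean of precision and recall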


10.3 Metrics for error identification

Error identification is a very important aspect that affects the performance of an NLP system. It involves the following terms:

True Positive (TP): the set of relevant documents correctly identified as relevant
True Negative (TN): the set of irrelevant documents correctly identified as irrelevant
False Positive (FP): the set of irrelevant documents incorrectly identified as relevant
False Negative (FN): the set of relevant documents incorrectly identified as irrelevant

(Precision, recall, and the F-measure are usually computed from these counts; a worked sketch follows.)
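A minimal worked sketch with made-up counts, showing how the three measures follow from TP, FP, and FN:
TP, FP, FN = 70, 10, 20                                       # hypothetical counts from a system's output

precision = TP / (TP + FP)                                    # 70 / 80 = 0.875
recall = TP / (TP + FN)                                       # 70 / 90 = 0.778 (approx.)
f_measure = 2 * precision * recall / (precision + recall)     # 0.824 (approx.)
print(precision, recall, f_measure)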


10.4 Metrics based on lexical matching

Building a feature extractor to detect whether a given word is present in a document:
import random
import nltk
from nltk.corpus import movie_reviews
docs = [(list(movie_reviews.words(fileid)), category)
			for category in movie_reviews.categories()
			for fileid in movie_reviews.fileids(category)]
random.shuffle(docs)
all_wrds = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_wrds)[:2000]

def doc_features(doc):
	doc_words = set(doc)
	features = {}
	for word in word_features:
		features['contains({})'.format(word)] = (word in doc_words)
	return features
print(doc_features(movie_reviews.words('pos/cv957_8737.txt')))

# Build feature sets and train a Naive Bayes classifier (needed for the evaluation below)
featuresets = [(doc_features(d), c) for (d, c) in docs]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))

classifier.show_most_informative_features(5)
Most Informative Features
	contains(outstanding) = True		pos : neg = 11.1 : 1.0
	     contains(seagal) = True		neg : pos = 7.7 : 1.0
	contains(wonderfully) = True		pos : neg = 6.8 : 1.0
	      contains(damon) = True		pos : neg = 5.9 : 1.0
	     contains(wasted) = True		neg : pos = 5.8 : 1.0
Metrics used to determine whether a given output is identical to the expected output (edit distance):
from __future__ import print_function
from __future__ import division
def _edit_dist_init(len1, len2):
	lev = []
	for i in range(len1):
		lev.append([0] * len2) 		# initialize the 2D array to zero
	for i in range(len1):
		lev[i][0] = i 	# column 0: 0, 1, 2, 3, ...
	for j in range(len2):
		lev[0][j] = j 	# row 0: 0, 1, 2, 3, 4, ...
	return lev

def _edit_dist_step(lev, i, j, s1, s2, transpositions=False):
	c1 = s1[i - 1]
	c2 = s2[j - 1]
	# skipping a character in s1
	a = lev[i - 1][j] + 1
	# skipping a character in s2
	b = lev[i][j - 1] + 1
	# substitution
	c = lev[i - 1][j - 1] + (c1 != c2)
	# transposition
	d = c + 1 	# never picked by default
	if transpositions and i > 1 and j > 1:
		if s1[i - 2] == c2 and s2[j - 2] == c1:
			d = lev[i - 2][j - 2] + 1
	# pick the cheapest
	lev[i][j] = min(a, b, c, d)

def edit_distance(s1, s2, transpositions=False):
	# set up a 2D array
	len1 = len(s1)
	len2 = len(s2)
	lev = _edit_dist_init(len1 + 1, len2 + 1)

	# iterate over the array
	for i in range(len1):
		for j in range(len2):
			_edit_dist_step(lev, i + 1, j + 1, s1, s2, transpositions=transpositions)
	return lev[len1][len2]

def binary_distance(label1, label2):
	"""Simple equality test.

	0.0 if the labels are identical, 1.0 if they are different.

	>>> from nltk.metrics import binary_distance
	>>> binary_distance(1, 2)
	1.0
	>>> binary_distance(1, 3)
	1.0
	"""
	return 0.0 if label1 == label2 else 1.0

def jaccard_distance(label1, label2):
	"""Distance metric comparing set-similarity."""
	return (len(label1.union(label2)) - len(label1.intersection(label2))) / len(label1.union(label2))

def masi_distance(label1, label2):
	len_intersection = len(label1.intersection(label2))
	len_union = len(label1.union(label2))
	len_label1 = len(label1)
	len_label2 = len(label2)
	if len_label1 == len_label2 and len_label1 == len_intersection:
		m = 1
	elif len_intersection == min(len_label1, len_label2):
		m = 0.67
	elif len_intersection > 0:
		m = 0.33
	else:
		m = 0
	return 1 - (len_intersection / len_union) * m

def interval_distance(label1, label2):
	try:
		return pow(label1 - label2, 2)
#		return pow(list(label1)[0] - list(label2)[0], 2)
	except:
		print("non-numeric labels not supported with interval distance")

def presence(label):
	return lambda x, y: 1.0 * ((label in x) == (label in y))

def fractional_presence(label):
	return lambda x, y: \
		abs((1.0 / len(x)) - (1.0 / len(y))) * (label in x and label in y) \
		or 0.0 * (label not in x and label not in y) \
		or abs(1.0 / len(x)) * (label in x and label not in y) \
		or (1.0 / len(y)) * (label not in x and label in y)

def custom_distance(file):
	data = {}
	with open(file, 'r') as infile:
		for l in infile:
			labelA, labelB, dist = l.strip().split("\t")
			labelA = frozenset([labelA])
			labelB = frozenset([labelB])
			data[frozenset([labelA, labelB])] = float(dist)
	return lambda x, y: data[frozenset([x, y])]

def demo():
	edit_distance_examples = [
		("rain", "shine"), ("abcdef", "acbdef"), ("language", "lnaguaeg"),
		("language", "lnaugage"), ("language", "lngauage")]
	for s1, s2 in edit_distance_examples:
		print("Edit distance between '%s' and '%s':" % (s1, s2), edit_distance(s1, s2))
	for s1, s2 in edit_distance_examples:
		print("Edit distance with transpositions between '%s' and '%s':" % (s1, s2), edit_distance(s1, s2, transpositions=True))

	s1 = set([1, 2, 3, 4])
	s2 = set([3, 4, 5])
	print("s1:", s1)
	print("s2:", s2)
	print("Binary distance:", binary_distance(s1, s2))
	print("Jaccard distance:", jaccard_distance(s1, s2))
	print("MASI distance:", masi_distance(s1, s2))

if __name__ == '__main__':
	demo()

10.5 Metrics based on syntactic matching

Syntactic matching can be carried out by performing chunking. NLTK provides a module called nltk.chunk.api, which helps identify chunks and returns a parse tree for a given chunk sequence.
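As a minimal sketch (the sentence below is made up), nltk.chunk can turn a chunk sequence given as (word, POS, IOB-tag) triples into a parse tree:
import nltk

# A hypothetical IOB-tagged chunk sequence
conlltags = [("the", "DT", "B-NP"), ("little", "JJ", "I-NP"), ("cat", "NN", "I-NP"),
			("sat", "VBD", "O"), ("on", "IN", "O"),
			("the", "DT", "B-NP"), ("mat", "NN", "I-NP")]
tree = nltk.chunk.conlltags2tree(conlltags)
print(tree)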

Syntactic matching:
import nltk
from nltk.tree import Tree
print(Tree(1, [2, Tree(3, [4]), 5]))
ct = Tree('VP', [Tree('V', ['gave']), Tree('NP', ['her'])])
sent = Tree('S', [Tree('NP', ['I']), ct])
print(sent)
print(sent[1])
print(sent[1, 1])
t1 = Tree.fromstring("(S (NP I) (VP (V gave) (NP her)))")
print(sent == t1)
t1[1][1].set_label('X')
t1[1][1].label()
print(t1)
t1[0], t1[1, 1] = t1[1, 1], t1[0]
print(t1)
len(t1)

10.6 Metrics using shallow semantic matching

Computing WordNet similarity:
print(wordnet.N['dog'][0].path_similarity(wordnet.N['cat'][0]))

print(wordnet.V['run'][0].path_similarity(wordnet.N['walk'][0]))
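The calls above use an older WordNet interface; a minimal equivalent sketch with the current nltk.corpus.wordnet API (the specific synset names 'dog.n.01', 'cat.n.01', 'run.v.01', and 'walk.v.01' are chosen here as examples):
from nltk.corpus import wordnet as wn

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
print(dog.path_similarity(cat))      # path-based similarity between the two noun senses

run = wn.synset('run.v.01')
walk = wn.synset('walk.v.01')
print(run.path_similarity(walk))     # path-based similarity between the two verb senses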

(Links to the previous chapters are provided at the end of this post, if needed.)


"""***Author's note: These notes cover Chapter 10 of "Mastering Natural Language Processing with Python": evaluating NLP systems. It is the final chapter of the book. The chapter introduces the evaluation of NLP systems, which mainly comes down to computing a system's accuracy, and this post records every code snippet from the book. I hope it is helpful to readers of the book. FIGHTING... (Criticism, corrections, and discussion are warmly welcome.)
"Catch the moments of your life. Catch them while you are young and quick." -- This Is Us
***"""


(Chapter 9): Discourse Analysis (https://blog.csdn.net/cjx14060307101/article/details/88623202)
(Chapter 8): Information Retrieval (https://blog.csdn.net/cjx14060307101/article/details/88595396)
(Chapter 7): Sentiment Analysis (https://blog.csdn.net/cjx14060307101/article/details/88580981)
(Chapter 6): Semantic Analysis (https://blog.csdn.net/cjx14060307101/article/details/88541214)
(Chapter 5): Parsing (https://blog.csdn.net/cjx14060307101/article/details/88378177)
(Chapter 4): Parts-of-Speech Tagging (https://blog.csdn.net/cjx14060307101/article/details/88357016)
(Chapter 3): Morphology (https://blog.csdn.net/cjx14060307101/article/details/88316108)
(Chapter 2): Statistical Language Modeling (https://blog.csdn.net/cjx14060307101/article/details/88087305)
(Chapter 1): String Operations (https://blog.csdn.net/cjx14060307101/article/details/87980631)
