CRF++

想努力的人

Categories: python, natural language processing

Article link: https://blog.csdn.net/LFGxiaogang/article/details/95649338

https://blog.csdn.net/lilong117194/article/details/83106711 ---- Named entity recognition: place-name recognition with CRF++ (a very detailed article)

http://www.hankcs.com/nlp/the-crf-model-format-description.html ---- CRF++ model file format explained

https://taku910.github.io/crfpp/ ---- official documentation

https://www.jianshu.com/p/0c99ea1c730c ---- CRF experiments

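As for the B feature functions (here meaning specifically the simple form f(s', s), which depends only on the previous label s' and the current label s): during Viterbi decoding, once a previous label is fixed it can be plugged into the current B feature function to score every possible output label. Those scores are added to the running totals, and the highest total is kept.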
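The decoding step described above can be illustrated with a minimal Viterbi sketch. This is not CRF++'s own implementation; it assumes the per-position U-feature scores and the B feature weights f(s', s) have already been summed into plain dicts. The B/M/E/S label set, the `ALLOWED` transitions and all numbers in the example are hypothetical.

```python
NEG_INF = float("-inf")


def viterbi(unigram, bigram, labels):
    """Return the highest-scoring label sequence.

    unigram[t][y]     -- summed U-feature weights for label y at position t
    bigram[y_prev][y] -- weight of the B feature f(y_prev, y)
    """
    n = len(unigram)
    if n == 0:
        return []
    score = [{y: unigram[0][y] for y in labels}]  # best score ending in y at t=0
    back = [{}]                                    # back-pointers per position
    for t in range(1, n):
        score.append({})
        back.append({})
        for y in labels:
            # fix each candidate previous label, add the B feature weight f(prev, y)
            prev = max(labels, key=lambda p: score[t - 1][p] + bigram[p][y])
            score[t][y] = score[t - 1][prev] + bigram[prev][y] + unigram[t][y]
            back[t][y] = prev
    # pick the best final label, then follow back-pointers to the start
    y = max(labels, key=lambda l: score[n - 1][l])
    path = [y]
    for t in range(n - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


# Hypothetical example: B/M/E/S tags, illegal transitions get -inf
LABELS = ["B", "M", "E", "S"]
ALLOWED = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
           ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
BIGRAM = {p: {y: 0.0 if (p, y) in ALLOWED else NEG_INF for y in LABELS}
          for p in LABELS}
UNIGRAM = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 1.0},
    {"B": 0.0, "M": 1.0, "E": 2.0, "S": 0.5},
    {"B": 1.0, "M": 0.0, "E": 0.0, "S": 2.0},
]
print(viterbi(UNIGRAM, BIGRAM, LABELS))  # ['B', 'E', 'S']
```

Keeping only the best previous label for each current label is what makes this linear in sentence length rather than exponential: the B weight is the only term linking neighbouring positions.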

# Python 3; assumes the CRFPP SWIG binding was built for Python 3,
# so Tagger.add() takes str and Tagger.x() returns str
import sys

import CRFPP


def crf_segmenter(input_file, output_file, tagger):
    """Segment each line of input_file with a B/M/E/S CRF++ model."""
    with open(input_file, encoding='utf-8') as input_data, \
            open(output_file, 'w', encoding='utf-8') as output_data:
        for line in input_data:
            tagger.clear()
            # one token per character; whitespace characters are skipped
            for char in line.strip():
                if char.strip():
                    # multiple features: pad with placeholder columns ("o" feature,
                    # "B" answer) so the row is at least as wide as the training data
                    tagger.add(char + "\to\tB")
            tagger.parse()
            for i in range(tagger.size()):
                char = tagger.x(i, 0)   # column 0 holds the character itself
                tag = tagger.y2(i)      # predicted tag: B, M, E or S
                if tag == 'B':
                    output_data.write(' ' + char)
                elif tag == 'M':
                    output_data.write(char)
                elif tag == 'E':
                    output_data.write(char + ' ')
                else:  # S: single-character word
                    output_data.write(' ' + char + ' ')
            output_data.write('\n')


if __name__ == '__main__':
    if len(sys.argv) != 4:
        print("Usage: python " + sys.argv[0] + " model input output")
        sys.exit(1)
    crf_model = sys.argv[1]
    input_file = sys.argv[2]
    output_file = sys.argv[3]
    tagger = CRFPP.Tagger("-m " + crf_model)
    crf_segmenter(input_file, output_file, tagger)
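The output loop in the segmenter script maps B/M/E/S tags back to word boundaries. The same mapping, plus its inverse for building `crf_learn` training data, can be sketched in plain Python so it can be checked without the CRFPP binding. The helper names and the two-column `char<TAB>TAG` layout are hypothetical; the real column layout must match the template and training file the model was built with. CRF++ itself only requires one token per line and a blank line between sentences.

```python
def words_to_rows(words):
    """Tag each character of a pre-segmented sentence with B/M/E/S."""
    rows = []
    for w in words:
        if len(w) == 1:
            rows.append(w + "\tS")                    # single-character word
        else:
            rows.append(w[0] + "\tB")                 # word start
            rows.extend(c + "\tM" for c in w[1:-1])   # word middle
            rows.append(w[-1] + "\tE")                # word end
    return rows


def to_training_text(sentences):
    """One token per line; sentences separated by a blank line."""
    return "\n\n".join("\n".join(words_to_rows(s)) for s in sentences) + "\n"


def tags_to_words(chars, tags):
    """Inverse of words_to_rows: rebuild words from characters and tags."""
    words, current = [], ""
    for c, t in zip(chars, tags):
        if t == "B":
            if current:          # tolerate a B that follows an unfinished word
                words.append(current)
            current = c
        elif t == "M":
            current += c
        elif t == "E":
            words.append(current + c)
            current = ""
        else:                    # S (or any other tag): single-character word
            if current:
                words.append(current)
                current = ""
            words.append(c)
    if current:
        words.append(current)
    return words


rows = words_to_rows(["我们", "爱", "自然语言处理"])
chars = [r.split("\t")[0] for r in rows]
tags = [r.split("\t")[1] for r in rows]
print(" ".join(tags_to_words(chars, tags)))  # 我们 爱 自然语言处理
```

Returning a list of words instead of writing padded spaces avoids the double spaces the script can emit when an E tag is followed by a B or S tag.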
