Notes on using CRF++ for Chinese named entity recognition

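When you first start with CRF++, it is easy to feel a bit lost.
Teacher Zhan is right: documentation should be read in the original English, because blog posts translated into Chinese inevitably lose information.
CRF++ home page: https://taku910.github.io/crfpp/
CRF++-0.58.tar.gz download: http://code.google.com/p/crfpp/downloads/list
Tip: bring your own VPN.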

Installation

% ./configure
% make
% su
# make install
You can change the default install path by using the --prefix option of the configure script.
Try the --help option to find out other options.

Training and Test file formats

Both the training file and the test file must be written in a specific format for CRF++ to run properly. Generally speaking, training and test files must consist of multiple tokens, and each token consists of multiple (but a fixed number of) columns. The definition of a token depends on the task; in most typical cases, tokens simply correspond to words. Each token must be written on one line, with the columns separated by white space (spaces or tab characters). A sequence of tokens forms a sentence. An empty line marks the boundary between sentences.
You can provide as many columns as you like, as long as the number of columns is the same for every token. Furthermore, there are some kinds of "semantics" among the columns. For example, the 1st column is the word, the 2nd column is the POS tag, the 3rd column is the sub-category of the POS tag, and so on.
The last column holds the true answer tag, which is what CRF++ is trained to predict.
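The format rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `to_crfpp_lines` and `column_count`, the character-level tokens, and the BIO tag set are hypothetical and do not come from CRF++ itself.

```python
from typing import List, Tuple


def to_crfpp_lines(sentences: List[List[Tuple[str, str]]]) -> str:
    """Render sentences as CRF++ input text: one token per line,
    columns separated by a tab, and an empty line after each sentence."""
    out = []
    for sentence in sentences:
        for token, tag in sentence:
            out.append(f"{token}\t{tag}")
        out.append("")  # the empty line marks the sentence boundary
    return "\n".join(out) + "\n"


def column_count(text: str) -> int:
    """Return the number of columns, or raise if tokens disagree."""
    counts = {len(line.split()) for line in text.splitlines() if line.strip()}
    if len(counts) != 1:
        raise ValueError(f"inconsistent column counts: {sorted(counts)}")
    return counts.pop()


# Hypothetical two-sentence sample; the tags are an assumed BIO scheme.
sample = [
    [("我", "O"), ("在", "O"), ("北", "B-LOC"), ("京", "I-LOC")],
    [("你", "O"), ("好", "O")],
]
text = to_crfpp_lines(sample)
print(text, end="")
print("columns per token:", column_count(text))
```

Running a check like `column_count` before training is a cheap way to catch a token line that lost or gained a column.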

Note: the feature template section will be added when I have time.

Training (encoding)

% crf_learn template_file train_file model_file

Both template_file and train_file must be prepared in advance. crf_learn writes the trained model to model_file.
During training, crf_learn prints the following values for each iteration:
- iter: number of iterations processed
- terr: error rate with respect to tags (# of error tags / # of all tags)
- serr: error rate with respect to sentences (# of error sentences / # of all sentences)
- obj: current objective value. When this value converges to a fixed point, CRF++ stops iterating.
- diff: relative difference from the previous objective value
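To make the terr and serr definitions concrete, here is a minimal sketch that computes both ratios from gold and predicted tag sequences. The function name `error_rates` and the sample tags are hypothetical; this mirrors the definitions in the list above and is not CRF++'s internal code.

```python
from typing import List, Tuple


def error_rates(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float]:
    """Return (terr, serr): the fraction of wrong tags and the
    fraction of sentences containing at least one wrong tag."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")
    total_tags = wrong_tags = wrong_sentences = 0
    for g, p in zip(gold, pred):
        if len(g) != len(p):
            raise ValueError("sentence length mismatch")
        errors = sum(a != b for a, b in zip(g, p))
        total_tags += len(g)
        wrong_tags += errors
        wrong_sentences += 1 if errors else 0
    if total_tags == 0:
        raise ValueError("no tags to score")
    return wrong_tags / total_tags, wrong_sentences / len(gold)


# Hypothetical data: one wrong tag out of five, in one of two sentences.
gold = [["O", "B-LOC", "I-LOC"], ["O", "O"]]
pred = [["O", "B-LOC", "O"], ["O", "O"]]
terr, serr = error_rates(gold, pred)
print(f"terr={terr:.4f} serr={serr:.4f}")
```

A single wrong tag counts once toward terr but marks the whole sentence as wrong for serr, which is why serr is usually the larger of the two.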

Note: the choice of training parameters will be added when I have time.

Testing (decoding)

% crf_test -m model_file test_files > result.txt

Evaluate
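The result.txt written by crf_test keeps the input columns and appends the predicted tag as the last column, so when the test file carries gold tags, the second-to-last column is the gold tag. A minimal sketch for reading it back, assuming default output (no verbose or n-best options) and a hypothetical helper name `read_result`:

```python
from typing import List, Tuple

Row = Tuple[str, str, str]  # (token, gold tag, predicted tag)


def read_result(text: str) -> List[List[Row]]:
    """Split crf_test output into sentences of (token, gold, predicted)."""
    sentences: List[List[Row]] = []
    current: List[Row] = []
    for line in text.splitlines():
        if not line.strip():  # an empty line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        if len(cols) < 3:
            raise ValueError(f"expected token, gold and predicted columns: {line!r}")
        current.append((cols[0], cols[-2], cols[-1]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical output for two short sentences.
demo = "北\tB-LOC\tB-LOC\n京\tI-LOC\tO\n\n你\tO\tO\n"
for sentence in read_result(demo):
    print(sentence)
```

For a real run, the text would come from something like `open("result.txt", encoding="utf-8").read()`; the encoding is an assumption and should match your corpus.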

conlleval: http://www.cnts.ua.ac.be/conll2000/chunking/output.html

conlleval.pl -d "\t" < result.txt 
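If Perl is not at hand, chunk-level precision, recall and F1 can be approximated in Python. The sketch below assumes BIO-style tags such as B-LOC / I-LOC / O; the names `spans` and `prf` are hypothetical. It is not a port of conlleval.pl and may differ from it on edge cases, so treat conlleval as the reference.

```python
from typing import Iterable, List, Set, Tuple


def spans(tags: Iterable[str]) -> Set[Tuple[int, int, str]]:
    """Extract (start, end, label) entity spans from BIO tags."""
    result = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            result.add((start, i, label))
            start, label = None, None
        # B- always opens a span; a stray I- opens one if none is open.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return result


def prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return entity-level (precision, recall, F1) over all sentences."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = spans(g), spans(p)
        correct += len(gs & ps)  # a span counts only if boundaries and label match
        n_gold += len(gs)
        n_pred += len(ps)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: the model finds the LOC entity but misses the PER one.
gold_tags = [["O", "B-LOC", "I-LOC", "O", "B-PER"]]
pred_tags = [["O", "B-LOC", "I-LOC", "O", "O"]]
p, r, f = prf(gold_tags, pred_tags)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Scoring whole spans rather than single tags is the reason entity-level numbers are typically lower than the tag accuracy implied by terr.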


Note: an example trained on the People's Daily annotated corpus will be added when I have time this weekend.
