Installing CRFPP on Windows x64 and Using It for Address Recognition [Tested and Working] -- Python Natural Language Processing in Practice

Yolanda Yan 9

Categories: python, NLP. Tags: python, natural language processing

Article link: https://blog.csdn.net/Amy9_Miss/article/details/119490470

Installing CRFPP

Download CRF++-0.58 from Baidu Netdisk.

Link: click here

Extraction code: peub

To install on Windows x64, run the following two commands inside \CRF++-0.58\python\:
```
python setup.py build
python setup.py install
```

Note: python setup.py install may fail with an insufficient-permissions error. If it does, rerun it from a command prompt opened as administrator.

To verify the installation, import CRFPP in Python. If the import raises no error, the installation succeeded.
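
A minimal sketch of that check, assuming the build installed a module named `CRFPP` (the name imported by the model code later in this post). The helper name `crfpp_installed` is made up for illustration.

```python
import importlib.util


def crfpp_installed(module_name="CRFPP"):
    """Return True if the named module can be located by the import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if crfpp_installed():
        import CRFPP  # raises ImportError if the compiled extension is broken
        print("CRFPP imported successfully")
    else:
        print("CRFPP not found; rerun the build and install steps")
```

Locating the module first avoids a traceback when the package is simply missing, while the real `import` still surfaces a broken build.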

Model Training and Prediction

In a DOS (Command Prompt) window, change to the CRF++-0.58 directory and run the following command to train the model.

# Training
crf_learn -f 4 -p 8 -c 3 ../data/template ../data/train.txt ../result/model
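
The same training run could also be launched from Python. The sketch below only assembles the argument list and, optionally, hands it to `subprocess.run`. It assumes the `crf_learn` executable is on the PATH or in the working directory, and the function name `build_learn_cmd` is hypothetical. The option comments follow the meanings given in the CRF++ documentation for `-f`, `-p`, and `-c`.

```python
import subprocess


def build_learn_cmd(template, train, model, freq=4, threads=8, cost=3):
    """Assemble the crf_learn argument list used in the command above."""
    return [
        "crf_learn",
        "-f", str(freq),     # use only features seen at least `freq` times
        "-p", str(threads),  # number of worker threads
        "-c", str(cost),     # cost hyperparameter (fit vs. regularization)
        template, train, model,
    ]


if __name__ == "__main__":
    cmd = build_learn_cmd("../data/template", "../data/train.txt", "../result/model")
    print(" ".join(cmd))
    # Uncomment to actually train (requires the crf_learn executable):
    # subprocess.run(cmd, check=True)
```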

The dataset and template can be downloaded here.
Baidu Netdisk extraction code: 8jlu
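
CRF++ expects training data with one token per line, whitespace-separated columns with the label in the last column, and a blank line between sentences. The sketch below writes character-level lines in that shape. The B/M/E/S tags match the ones checked by the evaluation code in this post; the `O` tag for non-location characters and the helper name `to_crf_lines` are assumptions, since the downloaded dataset is not reproduced here.

```python
def to_crf_lines(text, locations):
    """Tag each character of `text` with B/M/E/S for locations, O otherwise.

    `locations` is a list of (start, end) index pairs, end exclusive.
    """
    tags = ["O"] * len(text)
    for start, end in locations:
        if end - start == 1:
            tags[start] = "S"              # single-character location
        else:
            tags[start] = "B"              # first character
            for i in range(start + 1, end - 1):
                tags[i] = "M"              # middle characters
            tags[end - 1] = "E"            # last character
    return ["{0}\t{1}".format(ch, tag) for ch, tag in zip(text, tags)]


if __name__ == "__main__":
    # "北京南路" occupies indices 3..6 of the sentence below.
    for line in to_crf_lines("打的去北京南路", [(3, 7)]):
        print(line)
    print()  # a blank line ends the sentence
```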

For prediction, run the following command.

# Prediction
crf_test -m ../result/model ../data/test.txt > ../result/test.rst
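
Each non-blank line of the redirected output holds the input columns followed by the predicted tag, and blank lines separate sentences. A small reader might look like the sketch below. It assumes the test file has exactly two columns (character and reference tag), so every output line has three; `read_crf_output` is a hypothetical name.

```python
def read_crf_output(lines):
    """Group crf_test output lines into sentences of (char, gold, pred)."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:                 # blank line marks a sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        char, gold, pred = line.split()
        current.append((char, gold, pred))
    if current:                      # last sentence may lack a trailing blank
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = ["北\tB\tB", "京\tE\tE", "", "去\tO\tO"]
    print(read_crf_output(sample))
```

Passing an open file object works as well, since the function only iterates over lines.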

The following code computes the model's performance on the test set.

def cal_f1(path):
    """Print tag-level precision, recall and F1 for the location tags."""
    states = ['B', 'M', 'E', 'S']
    all_tag = 0  # total number of tags
    loc_tag = 0  # number of reference location tags
    pred_loc_tag = 0  # number of predicted location tags
    correct_tag = 0  # number of correctly predicted tags
    correct_loc_tag = 0  # number of correctly predicted location tags

    with open(path, encoding='utf8') as f:
        for line in f:
            line = line.strip()
            if line == '':
                continue
            # Columns: character, reference tag, predicted tag
            _, r, p = line.split()
            all_tag += 1
            if r == p:
                correct_tag += 1
                if r in states:
                    correct_loc_tag += 1
            if r in states:
                loc_tag += 1
            if p in states:
                pred_loc_tag += 1

    # Guard against division by zero when a count is empty.
    loc_P = correct_loc_tag / pred_loc_tag if pred_loc_tag else 0.0
    loc_R = correct_loc_tag / loc_tag if loc_tag else 0.0
    loc_f1 = (2 * loc_P * loc_R) / (loc_P + loc_R) if loc_P + loc_R else 0.0
    print('loc_P:{0}, loc_R:{1}, loc_F1:{2}'.format(loc_P, loc_R, loc_f1))


if __name__ == '__main__':
    cal_f1('./result/test.rst')
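
The function above scores individual tags. A stricter alternative counts a location as correct only when its whole span matches. The sketch below is not part of the original evaluation; it assumes B/M/E/S tagging, and the helper names `extract_spans` and `span_prf` are illustrative.

```python
def extract_spans(tags):
    """Return the set of (start, end) spans encoded by B/M/E/S tags."""
    spans, start = set(), None
    for i, tag in enumerate(tags):
        if tag == "S":
            spans.add((i, i + 1))
            start = None
        elif tag == "B":
            start = i
        elif tag == "E" and start is not None:
            spans.add((start, i + 1))
            start = None
        elif tag != "M":
            start = None             # any other tag breaks an open span
    return spans


def span_prf(gold_tags, pred_tags):
    """Entity-level precision, recall and F1 over two tag sequences."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    gold = ["O", "B", "E", "O", "S"]
    pred = ["O", "B", "E", "O", "O"]
    print(span_prf(gold, pred))
```

Applying it sentence by sentence, rather than to the whole file at once, keeps a span from running across a sentence boundary.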

Using the Model

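Address recognition with CRF++ is implemented with the code below.

load_model(): loads the previously trained model
locationNER(): takes a string and returns the place names recognized in it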

import os

import CRFPP


def load_model(path):
    # -v 3: access deep information like alpha, beta, prob
    # -nN: enable nbest output. N should be >= 2
    if not os.path.exists(path):
        # Fail early instead of returning None and crashing later
        raise FileNotFoundError('model file not found: {0}'.format(path))
    return CRFPP.Tagger('-m {0} -v 3 -n2'.format(path))


def locationNER(text):
    tagger = load_model('./result/model')
    for c in text:
        tagger.add(c)
    result = []
    # parse() runs decoding and marks the internal state as 'parsed'
    tagger.parse()
    word = ''
    for i in range(0, tagger.size()):
        for j in range(0, tagger.xsize()):
            ch = tagger.x(i, j)
            tag = tagger.y2(i)
            if tag == 'B':
                word = ch
            elif tag == 'M':
                word += ch
            elif tag == 'E':
                word += ch
                result.append(word)
            elif tag == 'S':
                word = ch
                result.append(word)
    return result


if __name__ == '__main__':
    text = '我中午要去北京饭店，下午去中山公园，晚上回亚运村。'
    print(text, locationNER(text), sep='==>')

    text = '我去回龙观，不去南锣鼓巷'
    print(text, locationNER(text), sep='==>')

    text = '打的去北京南路'
    print(text, locationNER(text), sep='==>')
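
The tag-decoding loop in locationNER() can be separated from the tagger so that it can be checked without a trained model. The following is one possible sketch, not the original author's code. Unlike the loop above, it discards an unfinished word when a non-location tag interrupts it, which is an assumption about the desired behavior; the name `decode_bmes` is hypothetical.

```python
def decode_bmes(chars, tags):
    """Join characters into words according to their B/M/E/S tags."""
    words, word = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "B":
            word = ch                # start a new word
        elif tag == "M" and word:
            word += ch               # extend the open word
        elif tag == "E" and word:
            words.append(word + ch)  # close the open word
            word = ""
        elif tag == "S":
            words.append(ch)         # single-character word
            word = ""
        else:
            word = ""                # O or a stray M/E: drop partial word
    return words


if __name__ == "__main__":
    chars = list("打的去北京南路")
    tags = ["O", "O", "O", "B", "M", "M", "E"]
    print(decode_bmes(chars, tags))
```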