[python] Segmenting MySQL data with the Chinese Academy of Sciences NLPIR word segmentation tool

四海八荒第一野怪


Article link: https://blog.csdn.net/qq_25264951/article/details/56035334


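This post uses the Chinese Academy of Sciences word segmentation tool to segment text stored in a database.
Install Python on your computer, then import MySQLdb, the connector between Python and the database, and NLPIR, the Chinese Academy of Sciences segmentation tool (used here through the pynlpir package).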

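import codecs
import math

import MySQLdb
import pynlpir

# Presumably the author's own helper module (not shown in this post);
# it provides the search(), save() and savecount() functions used later
from search import *

# Initialize the NLPIR segmenter
pynlpir.open()

# Connect to the database (the database name is left blank in the original)
conn = MySQLdb.connect(host="127.0.0.1", user="root", passwd="123456", db="", charset="utf8")
cursor = conn.cursor()
# execute() returns the number of matching rows
n = cursor.execute("select * from test where id = 8")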
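The query above hard-codes the id and selects every column, so the later code depends on the text being at position 3. A minimal sketch of a parameterized query that names its columns is shown below. To keep it runnable without a MySQL server it uses the standard-library sqlite3 module as a stand-in, and the column names (title, author, content) are hypothetical, since the post does not show the schema of the test table. With MySQLdb the placeholder is %s rather than ?.

```python
import sqlite3

# Stand-in for the MySQL connection so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
# Hypothetical schema: the post only shows that column 3 holds the text.
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, title TEXT, author TEXT, content TEXT)"
)
conn.execute("INSERT INTO test VALUES (8, 't', 'a', '今天天气很好')")


def fetch_content(connection, row_id):
    """Return (id, content) rows for one id, using a bound parameter."""
    cur = connection.execute(
        "SELECT id, content FROM test WHERE id = ?", (row_id,)
    )
    return cur.fetchall()


rows = fetch_content(conn, 8)
```

Selecting only id and content means the text is always the second element of each row, whatever the order of the other columns in the table.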

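Stop words

# Open the stop-word file (GBK-encoded) and read it once, before the row loop
st = codecs.open('E:\\testword\\stopwords.txt', 'r', encoding='gbk')
stopwords = []
for line in st:
    line = line.strip()
    stopwords.append(line)
st.close()
print(stopwords)

Reading the data from the database

for row in cursor.fetchall():
    s = row[3]  # the text to segment is in the fourth column
    singletext_result = []
    print(row[0])
    # Each item is a pair: item[0] is the word, item[1] is its part of speech
    for item in pynlpir.segment(s):
        singletext_result.append(item[0])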
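Reading the stop words can be wrapped in a small helper that returns a set, which makes the later "word not in stopwords" test a constant-time lookup. The sketch below is written for Python 3 and is only an illustration: load_stopwords is a hypothetical name, and the demo writes a throwaway file rather than using the author's E:\testword\stopwords.txt.

```python
import os
import tempfile


def load_stopwords(path, encoding="gbk"):
    """Read one stop word per line and return them as a set."""
    with open(path, "r", encoding=encoding) as handle:
        # Strip whitespace and skip blank lines so "" never becomes a stop word.
        return {line.strip() for line in handle if line.strip()}


# Demo: write a small GBK-encoded stop-word file, then load it.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "stopwords.txt")
    with open(demo_path, "w", encoding="gbk") as handle:
        handle.write("的\n了\n\n是\n")
    stopwords = load_stopwords(demo_path)
```

Loading the file once, before iterating over the database rows, also avoids re-reading it for every row.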

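Filtering stop words

    # Filter out stop words; this is still inside the per-row loop
    delstopwords_singletxt = []
    location = 0
    for word in singletext_result:
        location = location + 1  # 1-based position of the word; not used further here
        if word not in stopwords:
            # Keep only Chinese words. This is a string comparison against the range
            # U+4E00..U+9FA5, so it is decided mainly by the word's first character.
            if u'\u4e00' <= word <= u'\u9fa5':
                delstopwords_singletxt.append(word)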
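The filtering step can also be written as a standalone function. The sketch below (Python 3) is a stricter variant than the code in the post: it assumes the intent is to keep only words made up entirely of Chinese characters, so it checks every character against U+4E00..U+9FA5 instead of comparing the whole string. The names is_chinese and filter_tokens are hypothetical.

```python
def is_chinese(word):
    """True if the word is non-empty and every character is in U+4E00..U+9FA5."""
    return bool(word) and all("\u4e00" <= ch <= "\u9fa5" for ch in word)


def filter_tokens(tokens, stopwords):
    """Keep tokens that are not stop words and consist only of Chinese characters."""
    return [t for t in tokens if t not in stopwords and is_chinese(t)]


tokens = ["我", "的", "Python", "数据库", "，", "分词"]
kept = filter_tokens(tokens, {"的", "了"})
# kept == ["我", "数据库", "分词"]
```

Here "的" is dropped as a stop word, while "Python" and the full-width comma are dropped because they fall outside the character range.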

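Building the vocabulary

    # Build the vocabulary; search(), savecount() and save() come from the search module
    for item in delstopwords_singletxt:
        if search(item):
            # The keyword is already known: update its count
            if savecount(item):
                print('success to add count')
        else:
            # New keyword: add it to the vocabulary
            if save(item):
                print('success to add keyword')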
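The search module is not shown in the post, so the following is only a guess at what search(), save() and savecount() might do, based on their names and the printed messages. It uses an in-memory sqlite3 database as a stand-in for MySQL, and the table name keywords and its columns word and freq are hypothetical.

```python
import sqlite3

# In-memory stand-in for the vocabulary table in MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (word TEXT PRIMARY KEY, freq INTEGER NOT NULL)")


def search(word):
    """Return True if the word is already in the vocabulary table."""
    row = conn.execute("SELECT 1 FROM keywords WHERE word = ?", (word,)).fetchone()
    return row is not None


def save(word):
    """Insert a new word with a frequency of 1."""
    conn.execute("INSERT INTO keywords (word, freq) VALUES (?, 1)", (word,))
    return True


def savecount(word):
    """Increment the frequency of an existing word; True if one row changed."""
    cur = conn.execute("UPDATE keywords SET freq = freq + 1 WHERE word = ?", (word,))
    return cur.rowcount == 1


# Same control flow as the vocabulary loop in the post.
for item in ["数据库", "分词", "数据库"]:
    if search(item):
        savecount(item)
    else:
        save(item)

vocab = dict(conn.execute("SELECT word, freq FROM keywords"))
# vocab == {"数据库": 2, "分词": 1}
```

If only the counts for a single run are needed, collections.Counter over the filtered words gives the same totals without a database.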
