nlp
Average article quality score: 92
陈為先
Publish more papers, exercise more.
Naive Bayes Classification
train.py: `#!/usr/bin/env python # -*- coding: UTF-8 -*- import os import shutil import jieba import pickle from sklearn.datasets.base import Bunch from sklearn.feature_extraction.text import TfidfVecto...` Original · 2019-03-22 16:08:10 · 428 views · 0 comments
Automatic Summarization (Extractive)
Wiki corpus processing and word2vec training. step1_wiki.py: `# -*- encoding:utf-8 -*- '''Function: parse the .bz2 corpus archive into .txt''' import requests from gensim.corpora.wikicorpus import extract_pages, filter_wiki import bz2file impo...` Original · 2019-03-22 16:45:56 · 2584 views · 0 comments
Entity Recognition (corenlp)
`# coding=utf-8 import json import logging from stanfordcorenlp import StanfordCoreNLP # nlp_entity = StanfordCoreNLP(r'stanford-corenlp-full-2018-02-27', lang='zh', quiet=False, logging_level=logging...` Original · 2019-03-22 16:52:22 · 1356 views · 2 comments
Entity Disambiguation (Linking to an Entity Repository)
disambiguation.py: `#!/usr/bin/python3 import pymysql import json import requests from SimilarityEN import similarity def findCandidates(entity): # open the database connection db = pymysql.connect(host=host, port...` Original · 2019-03-22 16:57:15 · 3193 views · 2 comments
Keyword Extraction
`# -*- encoding:utf-8 -*- '''Interface: KPextCN(text, keyphrasenum=4, stoppath='stopword.txt') SupportFile: stopword.txt Function: Chinese key-phrase extraction Algorithm: TextRank''' import json from textrank4zh import...` Original · 2019-03-22 17:04:40 · 507 views · 0 comments
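
The keyword extraction entry above names TextRank as its algorithm, but the preview is cut off before any of the logic appears. As a rough illustration of the idea only, here is a minimal pure-Python sketch of unweighted TextRank over a word co-occurrence graph. It is not the author's `KPextCN` code and it does not use the `textrank4zh` API: the function name `textrank_keywords` and the parameters `window`, `damping`, and `iterations` are all assumptions made for this example, and the input is assumed to be already segmented and stopword-filtered (for Chinese text, for instance with a tokenizer such as jieba).

```python
from collections import defaultdict


def textrank_keywords(tokens, top_k=4, window=2, damping=0.85, iterations=50):
    """Rank tokens with a PageRank-style score over a co-occurrence graph.

    Hypothetical helper for illustration; `tokens` is a list of words that
    has already been segmented and stripped of stopwords.
    """
    # Build an undirected graph: two distinct words are linked when they
    # appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Fixed number of synchronous updates of the unweighted TextRank score:
    # each word receives an equal share of every neighbor's current score.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

The default `top_k=4` loosely mirrors the `keyphrasenum=4` default shown in the excerpt. A real key-phrase extractor would additionally merge adjacent top-ranked words into phrases, which this sketch leaves out.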