Chinese Sentiment Analysis Example: WordNet

1. Install NLTK and jieba (jieba is used by the code below for Chinese word segmentation)

   pip install nltk jieba

2. Download the data files locally

----Chinese Open Wordnet (COW), which can be downloaded from:
http://compling.hss.ntu.edu.sg/cow/

----Stop words: based on the page below, with common punctuation marks added to the list:
http://blog.csdn.net/u010533386/article/details/51458591
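The stop-word list and the extra punctuation marks can be merged into one set so that each lookup is a single membership test. This is a minimal sketch, assuming a UTF-8 file with one stop word per line; the helper name `load_stopwords` and the exact punctuation characters in `CHINESE_PUNCT` are illustrative choices, not taken from the linked page.

```python
import io
import string
from typing import Iterable, Set

# Illustrative (assumed) set of common full-width Chinese punctuation marks
CHINESE_PUNCT = "，。！？、；：“”‘’（）《》【】…"

def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Hypothetical helper: build a stop-word set from one-word-per-line text."""
    # Strip newlines/spaces and ignore blank lines
    stopwords = {line.strip() for line in lines if line.strip()}
    # Add every ASCII and Chinese punctuation character as its own stop word
    stopwords.update(string.punctuation)
    stopwords.update(CHINESE_PUNCT)
    return stopwords

if __name__ == "__main__":
    # In real use, pass an open file: open(path, encoding="utf-8")
    sample = io.StringIO("的\n了\n\n是\n")
    sw = load_stopwords(sample)
    tokens = ["今天", "的", "天气", "，", "很", "好", "!"]
    print([t for t in tokens if t not in sw])  # ['今天', '天气', '很', '好']
```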

3. Download the WordNet corpora

import nltk

nltk.download()

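   ---After running this, a graphical window opens; select the second item, "all-corpora", then click Download. (The code below only needs the wordnet and sentiwordnet corpora, which can also be fetched without the GUI via nltk.download('wordnet') and nltk.download('sentiwordnet').)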

4. Code

# encoding=utf-8
import sys
import codecs

import jieba
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn

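def doSeg(filename):
    # Read the whole input text (assumed to be UTF-8)
    with open(filename, 'r', encoding='utf-8') as f:
        text = f.read()

    # Segment the Chinese text into words with jieba
    seg_list = jieba.cut(text)

    # Load stop words, one per line; a set makes membership tests fast
    stopwords = set()
    with open(r"D:\Work\Python学习\SVM\stop_words.txt", "r", encoding="utf-8") as sf:
        for word in sf:
            stopwords.add(word.strip())

    # Keep tokens that are neither stop words nor pure whitespace.
    # jieba yields str in Python 3, so compare str with str (not bytes)
    ll = []
    for seg in seg_list:
        if seg.strip() and seg not in stopwords:
            ll.append(seg)
    return ll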

def loadWordNet():
    # Load Chinese Open Wordnet as a set of (synset ID, lemma) pairs.
    # Each data line is tab-separated: synset, lemma[, status]
    known = set()
    with codecs.open(r"D:\Work\Python学习\SVM\cow-not-full.txt", "rb", "utf-8") as f:
        for l in f:
            if l.startswith('#') or not l.strip():
                continue
            row = l.strip().split("\t")
            if len(row) == 3:
                (synset, lemma, status) = row
            elif len(row) == 2:
                (synset, lemma) = row
                status = 'Y'
            else:
                print("ill-formed line: " + l.strip())
                continue  # skip it instead of reusing values from the previous line
            if status in ['Y', "0"]:
                known.add((synset.strip(), lemma.strip()))  # a set ignores duplicates
    return known

def findWordNet(known, key):
    # Return every synset ID whose lemma equals key (linear scan over all pairs)
    ll = []
    for synset_id, lemma in known:
        if lemma == key:
            ll.append(synset_id)
    return ll

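def id2ss(ID):
    # A COW ID is an 8-digit synset offset followed by "-" and the POS letter,
    # e.g. "00001740-n"; use NLTK's public lookup instead of the private alias
    return wn.synset_from_pos_and_offset(ID[-1], int(ID[:8]))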

def getSenti(synset):
    # Look up the SentiWordNet entry for a WordNet synset (None if it has none)
    return swn.senti_synset(synset.name())

if __name__ == '__main__':
    known = loadWordNet()
    words = doSeg(sys.argv[1])

    n = 0.0  # accumulated negative score of the whole text
    p = 0.0  # accumulated positive score of the whole text
    for word in words:
        ll = findWordNet(known, word)
        if len(ll) != 0:
            n1 = 0.0
            p1 = 0.0
            for wid in ll:
                desc = id2ss(wid)
                swninfo = getSenti(desc)
                if swninfo is None:  # this synset has no SentiWordNet entry
                    continue
                p1 += swninfo.pos_score()
                n1 += swninfo.neg_score()
            # Average the scores over all senses of the word
            if p1 != 0.0 or n1 != 0.0:
                print(word + ' -> n ' + str(n1 / len(ll)) + ", p " + str(p1 / len(ll)))
            p += p1 / len(ll)
            n += n1 / len(ll)
    print("n:" + str(n) + ", p:" + str(p))
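The main loop averages the SentiWordNet scores over all senses of each word, then sums those averages over the whole text. The arithmetic can be checked without any NLTK data; in this sketch the per-sense (positive, negative) scores are passed in as plain tuples, standing in for the COW and SentiWordNet lookups. The function name `score_text` and the toy scores are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Score = Tuple[float, float]  # (positive, negative) score of one sense

def score_text(words: Iterable[str], senses: Dict[str, List[Score]]) -> Tuple[float, float]:
    """Hypothetical helper: sum over words of the per-word average scores.

    `senses` maps a word to the scores of all of its synsets. Words that are
    missing (or have no senses) are skipped, as in the original loop.
    """
    p = 0.0
    n = 0.0
    for word in words:
        scores = senses.get(word)
        if not scores:
            continue
        # Average over the senses, then accumulate for the whole text
        p += sum(s[0] for s in scores) / len(scores)
        n += sum(s[1] for s in scores) / len(scores)
    return n, p  # same order as the original's final print

if __name__ == "__main__":
    # Toy scores, not real SentiWordNet values
    senses = {"好": [(0.75, 0.0), (0.25, 0.0)], "坏": [(0.0, 0.5)]}
    n, p = score_text(["好", "天气", "坏"], senses)
    print("n:" + str(n) + ", p:" + str(p))  # n:0.5, p:0.5
```

Here 好 contributes (0.75 + 0.25) / 2 = 0.5 to p, 坏 contributes 0.5 / 1 = 0.5 to n, and 天气 has no senses, so it is skipped.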

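`findWordNet` scans every (synset, lemma) pair for each segmented word, which gets slow with a full wordnet. One possible alternative, sketched here under the assumption that the input is the set returned by `loadWordNet` and that IDs have the "offset-pos" form that `id2ss` slices, is to group the IDs by lemma once, already split into the (pos, offset) pair that NLTK's `wn.synset_from_pos_and_offset` takes. The names `build_lemma_index` and `parse_cow_id` are hypothetical, and the IDs in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def parse_cow_id(synset_id: str) -> Tuple[str, int]:
    """Hypothetical helper: split an ID such as '00001740-n' into ('n', 1740).

    Mirrors id2ss: the last character is the POS letter and the first
    8 characters are the zero-padded synset offset.
    """
    return synset_id[-1], int(synset_id[:8])

def build_lemma_index(known: Iterable[Tuple[str, str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each lemma to all of its (pos, offset) pairs in a single pass."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for synset_id, lemma in known:
        index[lemma].append(parse_cow_id(synset_id))
    return dict(index)

if __name__ == "__main__":
    # Made-up pairs in the same shape as loadWordNet's output
    known = {("00000001-n", "甲"), ("00000002-a", "好"), ("00000003-n", "好")}
    index = build_lemma_index(known)
    print(sorted(index["好"]))     # [('a', 2), ('n', 3)]
    print(index.get("不存在", [])) # []
```

With such an index, the main loop could use index.get(word, []) in place of findWordNet(known, word) and pass each pair as wn.synset_from_pos_and_offset(pos, offset); the lookup cost then no longer grows with the size of the wordnet.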

5. References

https://blog.csdn.net/xieyan0811/article/details/62056558
