python chi-squared feature selection_python – Trouble understanding chi-squared feature selection

Latest recommended article published on 2021-02-19 22:52:02

weixin_39980841


Tags: python chi-squared feature selection

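I have been having trouble understanding chi-squared feature selection. I have two classes, positive and negative, each containing different terms and term counts. I need to perform chi-squared feature selection to extract the most representative terms for each class. The problem is that I end up with exactly the same terms for both the positive and the negative class. Here is my Python code for selecting features: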

    #!/usr/bin/python

    # import the necessary libraries
    import math


    class ChiFeatureSelector:

        def __init__(self, extCorpus, lookupCorpus):
            # store the extraction corpus and lookup corpus
            self.extCorpus = extCorpus
            self.lookupCorpus = lookupCorpus

        def select(self, outPath):
            # dictionary of chi-squared scores
            scores = {}

            # loop over the words in the extraction corpus
            for w in self.extCorpus.getTerms():
                # build the chi-squared table:
                # n11 / n10: count of w in the extraction / lookup corpus
                # n01 / n00: documents remaining after subtracting that count
                n11 = float(self.extCorpus.getTermCount(w))
                n10 = float(self.lookupCorpus.getTermCount(w))
                n01 = float(self.extCorpus.getTotalDocs() - n11)
                n00 = float(self.lookupCorpus.getTotalDocs() - n10)

                # perform the chi-squared calculation and store
                # the score in the dictionary
                a = n11 + n10 + n01 + n00
                b = ((n11 * n00) - (n10 * n01)) ** 2
                c = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
                chi = (a * b) / c
                scores[w] = chi

            # sort the scores in descending order
            scores = sorted([(v, k) for (k, v) in scores.items()], reverse=True)

            # print the ten highest-scoring terms
            i = 0
            for (v, k) in scores:
                print(str(k) + " : " + str(v))
                i += 1
                if i == 10:
                    break

This is how I use the class (some code is omitted for brevity, and yes, I have checked to make sure the two corpora do not contain exactly the same data):

    # perform positive ngram feature selection
    print("positive:\n")
    f = ChiFeatureSelector(posCorpus, negCorpus)
    f.select(posOutputPath)

    print("\nnegative:\n")

    # perform negative ngram feature selection
    f = ChiFeatureSelector(negCorpus, posCorpus)
    f.select(negOutputPath)

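I suspect the error comes from how I compute the term/document table, but I'm not sure. Maybe I'm misunderstanding something. Can someone point me in the right direction?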

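Best answer: In the two-class case, the chi-squared ranking of the features is the same if the two datasets are swapped. The top-ranked terms are the features that differ the most between the two classes.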
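The symmetry described in the answer can be checked with a small sketch. The function names `chi_squared` and `favoured_class` and the document counts below are made up for illustration and are not part of the original code. The sketch uses the same formula as the question and also shows one possible way to tell which class a term leans toward, by looking at the sign of `n11 * n00 - n10 * n01` before it is squared.

```python
def chi_squared(n11, n10, n01, n00):
    """Chi-squared statistic for a 2x2 term/class contingency table.

    n11: documents in class A that contain the term
    n10: documents in class B that contain the term
    n01: documents in class A that do not contain the term
    n00: documents in class B that do not contain the term
    """
    total = n11 + n10 + n01 + n00
    numerator = total * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    # A zero denominator means one row or column of the table is empty,
    # so the term says nothing about the classes; score it as 0.
    return numerator / denominator if denominator else 0.0


def favoured_class(n11, n10, n01, n00):
    """Return "A" or "B" for the class the term is over-represented in.

    Returns None when the term is equally frequent in both classes.
    Squaring in chi_squared discards exactly this sign.
    """
    diff = n11 * n00 - n10 * n01
    if diff > 0:
        return "A"
    if diff < 0:
        return "B"
    return None


# Hypothetical counts: the term occurs in 40 of 100 positive documents
# and in 5 of 100 negative documents.
pos_with, pos_total = 40, 100
neg_with, neg_total = 5, 100

# Positive corpus as the extraction corpus (class A = positive).
score_pos = chi_squared(pos_with, neg_with,
                        pos_total - pos_with, neg_total - neg_with)

# Negative corpus as the extraction corpus (class A = negative).
score_neg = chi_squared(neg_with, pos_with,
                        neg_total - neg_with, pos_total - pos_with)

# The score is identical in both directions, so both rankings agree.
print(score_pos == score_neg)

# The sign still tells the two directions apart.
print(favoured_class(pos_with, neg_with,
                     pos_total - pos_with, neg_total - neg_with))
print(favoured_class(neg_with, pos_with,
                     neg_total - neg_with, pos_total - pos_with))
```

Under these assumptions, keeping only the terms whose favoured class is "A" in each run would give a different list for the positive and the negative corpus, while the chi-squared scores themselves stay the same.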
