Simple Word Frequency Counting in Python

Latest recommended article published 2024-02-19 12:40:48

weixin_39894104

Tags: Python simple word frequency counting

The examples here were run in an IPython notebook.

1. The basic skeleton: open a file and read it, then write to a file

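```python
# Read the input file line by line
with open(in_file) as f:
    for line in f:
        continue  # process each line here

# Open the output file and write to it
with open(out_file, 'w') as out:
    out.write(result)  # result is a placeholder for the string to write
```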

2. A rough template for simple word frequency counting

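```python
def count(in_file, out_file):
    # Read the file and count word frequencies
    word_count = {}  # dictionary mapping each word to its count
    with open(in_file) as f:
        for line in f:
            words = line.strip().split(" ")
            for word in words:
                if word in word_count:
                    word_count[word] += 1
                else:
                    word_count[word] = 1
    with open(out_file, 'w') as out:  # open the output file
        for word in word_count:
            print(word, word_count[word])  # print each key and its value
            out.write(word + "--" + str(word_count[word]) + "\n")  # write to the file

# in_file and out_file hold the input and output file paths
count(in_file, out_file)
```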

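For a long English text, this code separates words only with split(" "), which is clearly not good enough. For example, in `"I will endeavor," said he,` the tokens `"I` and `he,` are each counted as a word with the punctuation still attached. The code above only shows the basic idea of word frequency counting. Now consider the following exercise.

1. Copy a passage of English text from the web (as long as possible), paste it into input.txt, count the frequency (number of occurrences) of each word, and write the results to out.txt in order of frequency, with each line in the form "word:count".

Using the template: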

```python
# Count word frequencies and write them to a file in frequency order
in_file = 'input_word.txt'
out_file = 'output_word.txt'

def count_word(in_file, out_file):
    word_count = {}  # dictionary mapping each word to its count
    with open(in_file) as f:
        for line in f:
            words = line.strip().split(" ")
            for word in words:
                if word in word_count:
                    word_count[word] += 1
                else:
                    word_count[word] = 1
    with open(out_file, 'w') as out:
        # Sort by count (most frequent first), not by the word itself
        for word in sorted(word_count, key=word_count.get, reverse=True):
            print(word, word_count[word])
            out.write('%s:%d\n' % (word, word_count[word]))

count_word(in_file, out_file)
```
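The counting and ordering steps in the template can also be sketched with `collections.Counter` from the standard library. This is not part of the original template, only an alternative under a few assumptions: the function names `count_words` and `format_counts` and the `sample` string are hypothetical, and the text is held in memory instead of being read from input_word.txt. It keeps the same split(" ") tokenization as the template.

```python
from collections import Counter


def count_words(text):
    """Count space-separated tokens, using the same split(" ") rule as the template."""
    counts = Counter()
    for line in text.splitlines():
        counts.update(line.strip().split(" "))
    return counts


def format_counts(counts):
    """Return 'word:count' lines, most frequent first."""
    return ["%s:%d" % (word, n) for word, n in counts.most_common()]


# Hypothetical sample text standing in for the contents of the input file
sample = "the cat and the dog and the bird"
lines = format_counts(count_words(sample))
print("\n".join(lines))
```

`Counter.most_common()` returns the (word, count) pairs sorted from highest to lowest count, which is the ordering the exercise asks for.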

The regular expression approach

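```python
import re

words = {}  # dictionary mapping each word to its count
rc = re.compile(r'\w+')  # matches runs of letters, digits, and underscores

with open('input_word.txt') as f:
    for l in f:
        w_l = rc.findall(l)  # all words on this line, punctuation dropped
        for w in w_l:
            if w in words:
                words[w] += 1
            else:
                words[w] = 1

with open('out.txt', 'w') as f:
    # Sort by count (most frequent first), as the exercise requires
    for k in sorted(words, key=words.get, reverse=True):
        print(k, words[k])
        f.write('%s:%d\n' % (k, words[k]))
```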
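To see why the regular expression helps, here is a small comparison of the two ways of splitting, run on the sentence quoted earlier. The variable names `sentence`, `by_space`, and `by_regex` are only for illustration.

```python
import re

sentence = '"I will endeavor," said he,'

# Splitting on single spaces leaves quotes and commas attached to the words
by_space = sentence.strip().split(" ")

# \w+ keeps only runs of letters, digits, and underscores
by_regex = re.findall(r"\w+", sentence)

print(by_space)  # ['"I', 'will', 'endeavor,"', 'said', 'he,']
print(by_regex)  # ['I', 'will', 'endeavor', 'said', 'he']
```

Note that `\w+` is still a rough rule: it splits a contraction such as `don't` into `don` and `t`, and it treats `The` and `the` as different words unless the text is lowercased first.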
