wanBlog

Notes written for my own reference.

Word frequency count
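# Read the whole file; the with block closes it automatically
with open('D:\\Walden.txt', 'r') as f:
    s = f.read()
# Strip punctuation, then normalize case
s = s.replace('.', '')
s = s.replace(',', '')
s = s.replace('\'', '')
s = s.replace('\"', '')
s = s.lower()

print(s)
words = s.split()
print(words)
wset = set(words)  # unique words
clist = list(wset)
for w in clist:
    # Count in the full word list; counting in clist would always give 1
    print(w, words.count(w))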

Using replace to strip unwanted characters is convenient.
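
Chaining replace calls works, but every additional punctuation mark needs another line. As a sketch of one alternative, str.maketrans and str.translate can delete a whole set of characters in a single pass. The sample sentence and the helper name strip_punctuation are made up for illustration, and string.punctuation covers ASCII punctuation only.

```python
import string

# Translation table that maps every ASCII punctuation character to None (delete).
_DELETE_PUNCT = str.maketrans('', '', string.punctuation)


def strip_punctuation(text):
    """Return text lowercased with ASCII punctuation removed (hypothetical helper)."""
    return text.translate(_DELETE_PUNCT).lower()


sample = "Let freedom ring, from the 'mighty' mountains of New York!"
cleaned = strip_punctuation(sample)
print(cleaned.split())
```

The table is built once and reused, so adding another character to remove means changing one string rather than adding another replace line.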

To sort the results by count, you can do this:

import operator

list2 = []
# The with block closes the file automatically
with open('D:\\Walden.txt', 'r') as f:
    s = f.read()

# Strip punctuation, then normalize case
s = s.replace('.', '')
s = s.replace(',', '')
s = s.replace('\'', '')
s = s.replace('!', '')
s = s.lower()

print(s)
words = s.split()
print(words)
wset = set(words)  # unique words
clist = list(wset)
for w in clist:
    wnum = [w, words.count(w)]  # [word, count] pair
    print(wnum)
    list2.append(wnum)
print(list2)
# Sort ascending by the count stored at index 1 of each pair
list2.sort(key=operator.itemgetter(1))
print(list2)
print('end')

The output always looks like this:

锘縇et freedom ring from the mighty mountains of new york !
let freedom ring from the heightening alleghenies of pennsylvania !
let freedom ring from the snowcapped rockies of colorado !
let freedom ring from the curvaceous slops of california !
but not only that let freedom ring from stone mountain of georgia !
let freedom ring from lookout mountain of tennessee !
let freedom ring from every hill and molehill of mississippi !
from every mountainside  let freedom ring !
['锘縇et', 'freedom', 'ring', 'from', 'the', 'mighty', 'mountains', 'of', 'new', 'york', '!', 'let', 'freedom', 'ring', 'from', 'the', 'heightening', 'alleghenies', 'of', 'pennsylvania', '!', 'let', 'freedom', 'ring', 'from', 'the', 'snowcapped', 'rockies', 'of', 'colorado', '!', 'let', 'freedom', 'ring', 'from', 'the', 'curvaceous', 'slops', 'of', 'california', '!', 'but', 'not', 'only', 'that', 'let', 'freedom', 'ring', 'from', 'stone', 'mountain', 'of', 'georgia', '!', 'let', 'freedom', 'ring', 'from', 'lookout', 'mountain', 'of', 'tennessee', '!', 'let', 'freedom', 'ring', 'from', 'every', 'hill', 'and', 'molehill', 'of', 'mississippi', '!', 'from', 'every', 'mountainside', 'let', 'freedom', 'ring', '!']
['mississippi', 1]
['but', 1]
['the', 4]
['california', 1]
['锘縇et', 1]
['freedom', 8]
['not', 1]
['curvaceous', 1]
['alleghenies', 1]
['hill', 1]
['molehill', 1]
['tennessee', 1]
['lookout', 1]
['heightening', 1]
['only', 1]
['slops', 1]
['of', 7]
['mountains', 1]
['!', 8]
['pennsylvania', 1]
['rockies', 1]
['snowcapped', 1]
['mighty', 1]
['ring', 8]
['and', 1]
['that', 1]
['stone', 1]
['every', 2]
['new', 1]
['mountain', 2]
['from', 8]
['colorado', 1]
['mountainside', 1]
['georgia', 1]
['york', 1]
['let', 7]
[['mississippi', 1], ['but', 1], ['the', 4], ['california', 1], ['锘縇et', 1], ['freedom', 8], ['not', 1], ['curvaceous', 1], ['alleghenies', 1], ['hill', 1], ['molehill', 1], ['tennessee', 1], ['lookout', 1], ['heightening', 1], ['only', 1], ['slops', 1], ['of', 7], ['mountains', 1], ['!', 8], ['pennsylvania', 1], ['rockies', 1], ['snowcapped', 1], ['mighty', 1], ['ring', 8], ['and', 1], ['that', 1], ['stone', 1], ['every', 2], ['new', 1], ['mountain', 2], ['from', 8], ['colorado', 1], ['mountainside', 1], ['georgia', 1], ['york', 1], ['let', 7]]
[['mississippi', 1], ['but', 1], ['california', 1], ['锘縇et', 1], ['not', 1], ['curvaceous', 1], ['alleghenies', 1], ['hill', 1], ['molehill', 1], ['tennessee', 1], ['lookout', 1], ['heightening', 1], ['only', 1], ['slops', 1], ['mountains', 1], ['pennsylvania', 1], ['rockies', 1], ['snowcapped', 1], ['mighty', 1], ['and', 1], ['that', 1], ['stone', 1], ['new', 1], ['colorado', 1], ['mountainside', 1], ['georgia', 1], ['york', 1], ['every', 2], ['mountain', 2], ['the', 4], ['of', 7], ['let', 7], ['freedom', 8], ['!', 8], ['ring', 8], ['from', 8]]
end

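At first this made me think the list had not been sorted. In fact I just had not looked closely: the last list is already sorted in ascending order of count.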
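
The odd first token 锘縇et in the output above is most likely the file's UTF-8 byte order mark (bytes EF BB BF) decoded with a GBK default codec. The mis-decoding swallows the capital L, so lower() cannot touch it and one "let" goes uncounted (7 instead of 8). A minimal sketch of the usual fix, assuming the text file is saved as UTF-8 with a BOM, is to pass encoding='utf-8-sig' to open. The temporary file below is a stand-in so the example runs anywhere.

```python
import os
import tempfile

# Create a stand-in file that starts with a UTF-8 byte order mark.
path = os.path.join(tempfile.mkdtemp(), 'speech.txt')
with open(path, 'wb') as f:
    f.write(b'\xef\xbb\xbf' + b'Let freedom ring\nlet freedom ring\n')

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
with open(path, 'r', encoding='utf-8-sig') as f:
    words = f.read().lower().split()

print(words.count('let'))
```

With the BOM stripped, the first word lowercases normally and is counted together with the other occurrences.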

To sort in descending order instead, just use list2.sort(key=operator.itemgetter(1), reverse=True).

With the most frequent words first, the result is clear at a glance:

import operator

list2 = []
# The with block closes the file automatically
with open('D:\\Walden.txt', 'r') as f:
    s = f.read()

# Strip punctuation, then normalize case
s = s.replace('.', '')
s = s.replace(',', '')
s = s.replace('\'', '')
s = s.replace('!', '')
s = s.lower()

print(s)
words = s.split()
print(words)
wset = set(words)  # unique words
clist = list(wset)
for w in clist:
    wnum = [w, words.count(w)]  # [word, count] pair
    print(wnum)
    list2.append(wnum)

# Sort descending by the count stored at index 1 of each pair
list2.sort(key=operator.itemgetter(1), reverse=True)
print(list2)
print('end')
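
Calling words.count(w) for every unique word rescans the whole word list each time. As an alternative sketch, not the approach used in the listings above, collections.Counter from the standard library tallies all words in one pass, and its most_common method returns the pairs ordered from most to least frequent. The sample text is made up for illustration.

```python
from collections import Counter

text = "let freedom ring from every hill let freedom ring"
counts = Counter(text.split())

# most_common(n) returns (word, count) tuples in descending order of count.
for word, n in counts.most_common(3):
    print(word, n)
```

This replaces the set, the counting loop, and the sort with two calls, at the cost of getting tuples instead of the [word, count] lists built above.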
