Print the n most frequent words (Python)

#!/usr/bin/python
#-*- coding: utf8 -*-
def word_count(f_name, topN):
    """
        Return the topN most frequent words with their counts
        @author: ken
        Assumes words are separated by commas,
        for example, contents in the text:
            bird,apple,yellow,apple,red,banana,apple,yellow
        if topN is 2, then output should be [('apple', 3), ('yellow', 2)]
        Possible extensions:
        1. the file is too large to be read into memory at once
        2. pick the topN items without sorting all the items, e.g. with a heap
    """
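    # Read the whole file; the with-block closes it even if reading fails
    with open(f_name, 'r') as f:
        words = f.read().split(',')
    w_c = {}
    for w in words:
        w = w.strip()
        if not w:
            continue  # skip empty tokens, e.g. from a trailing comma
        # dict.get replaces has_key(), which no longer exists in Python 3
        w_c[w] = w_c.get(w, 0) + 1
    # Sort by count, highest first; key= replaces the cmp function removed in Python 3
    s_w_c = sorted(w_c.items(), key=lambda item: item[1], reverse=True)
    # Slicing already copes with topN larger than the number of distinct words
    return s_w_c[:topN]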


if __name__ == "__main__":
    print(word_count('text', 2))
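
The two extensions listed in the docstring could be approached roughly as follows. This is only a sketch, not part of the original solution: the names `iter_words` and `word_count_stream` and the `sep` and `chunk_size` parameters are made up for illustration. It reads the file in fixed-size chunks so the text is never fully in memory, and uses `heapq.nlargest` to pick the top entries without sorting every item. Note that the counter itself still needs memory proportional to the number of distinct words.

```python
import heapq
from collections import Counter


def iter_words(f_name, sep=',', chunk_size=64 * 1024):
    """Yield separator-delimited words while reading the file chunk by chunk."""
    tail = ''
    with open(f_name, 'r') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts = (tail + chunk).split(sep)
            # The last piece may be cut in the middle of a word,
            # so hold it back and prepend it to the next chunk.
            tail = parts.pop()
            for p in parts:
                p = p.strip()
                if p:
                    yield p
    tail = tail.strip()
    if tail:
        yield tail


def word_count_stream(f_name, topN):
    """Return the topN (word, count) pairs without a full sort."""
    counts = Counter(iter_words(f_name))
    # nlargest keeps only topN candidates in a heap while scanning the counts
    return heapq.nlargest(topN, counts.items(), key=lambda item: item[1])
```

With the sample text from the docstring, `word_count_stream('text', 2)` gives the same result as `word_count('text', 2)`, namely `[('apple', 3), ('yellow', 2)]`.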