Counting the Most Frequent Words in Python

Latest recommended article published on 2022-11-25 22:30:48

weixin_39929566

Tags: Chinese character frequency statistics, python

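Problem

(From the Udacity Machine Learning Engineer Nanodegree preview course)

Implement a function count_words() in Python that takes a string s and a number n and returns the n most frequent words in s. The return value is a list of tuples containing the n most frequent words and their counts, i.e. [(<word1>, <count1>), (<word2>, <count2>), ... ], sorted by count in descending order.

You may assume that all input is lowercase and contains no punctuation or other characters (only letters and single spaces). Words with the same count are ordered alphabetically.

For example:

print(count_words("betty bought a bit of butter but the butter was bitter", 3))

Output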

[('butter', 2), ('a', 1), ('betty', 1)]

Solution

"""Count words."""

def count_words(s, n):
    """Return the n most frequently occurring words in s."""
    w = {}
    sp = s.split()

    # Count the number of occurrences of each word in s
    for i in sp:
        if i not in w:
            w[i] = 1
        else:
            w[i] += 1

    # Sort by count in descending order (alphabetically in case of ties)
    top = sorted(w.items(), key=lambda item: (-item[1], item[0]))
    top_n = top[:n]

    # Return the top n most frequent words
    return top_n


def test_run():
    """Test count_words() with some inputs."""
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))


if __name__ == '__main__':
    test_run()
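For comparison, the counting loop can also be sketched with collections.Counter from the standard library. This is not the solution given above, only an alternative under the same input assumptions (lowercase words separated by single spaces); the name count_words_counter is hypothetical and introduced here purely for illustration.

```python
from collections import Counter


def count_words_counter(s, n):
    """Hypothetical variant of count_words() built on collections.Counter."""
    # Counter tallies each word produced by split()
    counts = Counter(s.split())
    # Counter.most_common() does not break ties alphabetically,
    # so the explicit (count descending, word ascending) sort is still needed
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


print(count_words_counter("cat bat mat cat bat cat", 3))
# [('cat', 3), ('bat', 2), ('mat', 1)]
```

Counter only replaces the manual dictionary bookkeeping; the tie-breaking rule from the problem statement still has to come from the sort key.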

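Summary

Two small tricks do most of the work:

Use split() to break the input string apart on whitespace;

Use sorted() to order the dictionary items first by value, then by key. In particular, key=lambda item: (-item[1], item[0]) sorts by the second element of each item in descending order (because of the minus sign in front of it) and then by the first element in ascending order. The same idea extends to tuples with more elements.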
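The two tricks can be seen in isolation in the short sketch below. The sample string and the counts dictionary are made-up data chosen only to show the behaviour; they do not come from the exercise.

```python
# Hypothetical sample data, used only to illustrate the two techniques.
text = "b a  c a b a"

# split() with no arguments splits on any run of whitespace,
# so the double space does not produce an empty string
words = text.split()
print(words)  # ['b', 'a', 'c', 'a', 'b', 'a']

counts = {"pear": 2, "apple": 2, "fig": 3, "kiwi": 1}

# Tuples compare element by element: the negated count comes first
# (larger counts sort earlier), the word second (alphabetical on ties)
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
print(ranked)  # [('fig', 3), ('apple', 2), ('pear', 2), ('kiwi', 1)]
```

Negating the value works here because the counts are numbers; a string field cannot be negated this way, so sorting it in descending order would need a different approach.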
