Finding the Most Frequent Elements in a Sequence

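Given a large block of text, how do we find its most frequent words? First I split the text into a sequence of words (I won't cover that step here), and then I count the occurrences: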
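The splitting step is skipped above. As a rough sketch only, assuming English text and a simple regular expression (the `tokenize` helper, its pattern, and the sample sentence are illustrative assumptions, not part of the original post), it might look like this:

```python
import re

def tokenize(text):
    # Hypothetical helper: lowercase the text and extract runs of
    # letters and apostrophes, so "don't" stays a single word.
    return re.findall(r"[a-z']+", text.lower())

sample = "Look into my eyes, look into my eyes. Don't look around the eyes."
tokens = tokenize(sample)
print(tokens)
```

Real text usually needs more care (numbers, hyphens, non-English scripts), so treat the pattern as a starting point.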

The conventional approach

words = [
    'look','into','my','eyes','look','into','my','eyes','the',
    'eyes','the','eyes','the','eyes','not','around','the','eyes',
    "don't",'look','around','the','eyes','look','into','my','eyes',
    "you're",'under'
]
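# Deduplicate the list with a set so that each word is counted only once
words_set = set(words)
words_dict = {}
# Count the occurrences of each distinct word and store them in the dictionary
for i in words_set:
    words_dict[i] = words.count(i)
print(words_dict)
# Sort the dictionary items by value in descending order
print(sorted(words_dict.items(), key=lambda x: x[1], reverse=True))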

The output is as follows:

{'eyes': 8, 'not': 1, 'look': 4, 'into': 3, "you're": 1, "don't": 1, 'around': 2, 'the': 5, 'under': 1, 'my': 3}
[('eyes', 8), ('the', 5), ('look', 4),('into', 3), ('my', 3),
 ('around', 2), ('not', 1), ("you're", 1), ("don't", 1), ('under', 1)]
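Note that calling `words.count()` once per distinct word rescans the whole list each time. A single-pass variant, sketched here with a shortened word list (`sample_words` is an illustrative name, not from the original code), builds the same kind of dictionary with `dict.get`:

```python
sample_words = ['look', 'into', 'my', 'eyes', 'look',
                'into', 'my', 'eyes', 'the', 'eyes']

counts = {}
for word in sample_words:
    # dict.get returns 0 the first time a word is seen
    counts[word] = counts.get(word, 0) + 1

# Sort by count, highest first
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```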

Using the collections.Counter class

words = [
    'look','into','my','eyes','look','into','my','eyes','the',
    'eyes','the','eyes','the','eyes','not','around','the','eyes',
    "don't",'look','around','the','eyes','look','into','my','eyes',
    "you're",'under'
]
from collections import Counter
# Count every word directly; this returns a Counter (a dict subclass) whose printed form lists the elements in descending order of count
word_counts = Counter(words)
print(word_counts)
# Counter.most_common(4) returns the 4 most frequent words
top_four = word_counts.most_common(4)
print(top_four)

The output is as follows:

Counter({'eyes': 8, 'the': 5, 'look': 4, 'into': 3, 'my': 3, 'around': 2, 'not': 1, "don't": 1, "you're": 1, 'under': 1})
[('eyes', 8), ('the', 5), ('look', 4), ('into', 3)]
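A Counter can also be extended after it is created. The following minimal sketch, using shortened illustrative word lists, shows `update()` adding new counts in place, and shows that looking up a missing key returns 0 rather than raising `KeyError`:

```python
from collections import Counter

word_counts = Counter(['look', 'into', 'my', 'eyes', 'the', 'eyes'])
more_words = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']

# update() adds the new counts to the existing ones in place
word_counts.update(more_words)
eyes_count = word_counts['eyes']
missing_count = word_counts['missing']  # absent keys count as 0
print(eyes_count, missing_count)
```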

Counter instances also combine easily with mathematical operations, for example:

from collections import Counter
words = [
    'look','into','my','eyes','look','into','my','eyes','the',
    'eyes','the','eyes','the','eyes','not','around','the','eyes',
    "don't",'look','around','the','eyes','look','into','my','eyes',
    "you're",'under'
]
words1 = ['why','are','you','not','looking','in','my','eyes']
a = Counter(words)
print('a:',a)
b = Counter(words1)
print('b:',b)
print("a+b:",a+b)

The output is as follows:

a: Counter({'eyes': 8, 'the': 5, 'look': 4, 'into': 3, 'my': 3, 'around': 2, 'not': 1, "don't": 1, "you're": 1, 'under': 1})
b: Counter({'why': 1, 'are': 1, 'you': 1, 'not': 1, 'looking': 1, 'in': 1, 'my': 1, 'eyes': 1})
a+b: Counter({'eyes': 9, 'the': 5, 'look': 4, 'my': 4, 'into': 3, 'not': 2, 'around': 2, "don't": 1, "you're": 1, 'under': 1, 'why': 1, 'are': 1, 'you': 1, 'looking': 1, 'in': 1})
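Addition is not the only supported operation. As a small sketch with made-up word lists (`x` and `y` are illustrative names), Counter also supports subtraction, intersection, and union:

```python
from collections import Counter

x = Counter(['eyes', 'eyes', 'eyes', 'my', 'my', 'look'])
y = Counter(['eyes', 'my', 'why'])

difference = x - y    # subtract counts; results of zero or less are dropped
intersection = x & y  # minimum of each count
union = x | y         # maximum of each count
print(difference)
print(intersection)
print(union)
```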

The convenience of this approach is obvious.
