Interview Question [Python]: Count the Ten Most Frequent Words in an English Article and Their Counts

This article shows how to solve a Python interview question: counting the ten most frequently occurring words in an English article along with their counts. Two solutions are provided: one using plain container objects and one using the Counter class from Python's collections library.

While I was working on this written test question, the interviewer hinted that a Python built-in library could be used. The solutions below cover both approaches.

Problem

Using Python, count the ten most frequent words in an article and their corresponding counts.

Solution 1

The straightforward approach is to use an ordinary container object to record each word and its count, then pick out the ten most frequent words. Example code:

content = """The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
"""
import re

# Split on anything that is not a letter; keep apostrophes so "aren't" stays one word.
word_list = re.findall(r"[a-z']+", content.lower())

# Tally each word with a plain dict.
counts = {}
for word in word_list:
    counts[word] = counts.get(word, 0) + 1

# Sort by count, descending, and take the ten most frequent words.
top10 = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:10]
print(top10)
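Solution 2

The introduction mentions that Python's collections library offers a Counter object which handles the tallying and ranking for us. A minimal sketch of that approach, assuming the same regex-based tokenization as above (the short `content` string here is just a stand-in for the full article text):

```python
import re
from collections import Counter

content = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex."""  # substitute the full article text here

# Counter counts the words in one pass; most_common(10) returns
# the ten (word, count) pairs with the highest counts.
words = re.findall(r"[a-z']+", content.lower())
top10 = Counter(words).most_common(10)
print(top10)
```

Counter replaces both the manual dict tally and the sort from Solution 1, which is presumably the built-in the interviewer was hinting at.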