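Analyzing Word Frequency in an English Article with Python 1.0

⭐Preparation

1. Environment: Python 3

2. Module: matplotlib (a data visualization library)

3. PyCharm

First, prepare an English article: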

I come from a large family of nine brothers and sisters, and all of us have kids of our own. On each Christmas night, our entire family gathers at my oldest sister's home, exchanging gifts, watching the nativity skit put on by the smaller children, eating, singing and enjoying a visit from Santa himself.
 
The Christmas of 1988, my husband Bob and I had four children. Peter was eleven, Leigh-Ann was nine, Laura was six and Matthew was two. When Santa arrived, Matthew parked himself on Santa's lap and pretty much remained dazzled by him for the rest of the evening. Anyone who had their picture taken with Santa that Christmas also had their picture taken with little Matthew.
 
Little did any of us know how precious those photos with Santa and Matthew would become. Five days after Christmas, our sweet little Matthew died in an accident at home. We were devastated. We were lucky to have strong support from our families and friends to help us through. I learned that the first year after a death is the hardest, as there are so many firsts to get through without your loved one. Birthdays and special occasions become sad, instead of joyous.
 
When our first Christmas without Matthew approached, it was hard for me to get into the holiday spirit. Bob and I could hardly face putting up the decorations or shopping for special gifts for everyone. But we went through the motions for Peter, Leigh-Ann and Laura. Then, on December 13, something extraordinary happened to raise our spirits when we didn't think it was possible.
 
We were just finishing dinner when we heard a knock on the front door. When we went to answer it, no one was there. However, on the front porch was a card and gift. We opened the card and read that the gift-giver wanted to remain anonymous; he or she just wanted to help us get through a rough time by cheering us up.
 
In the gift bag was a cassette of favorite Christmas music, which was in a little cardboard Christmas tree. The card described it as being "a cartridge in a pine tree," a twist on the "partridge in a pear tree" verse in the song, "The Twelve Days of Christmas." We thought that it was a very clever gift, and the thoughtfulness of our "elf" touched our hearts. We put the cassette in our player and, song by song, the spirit of Christmas began to warm our hearts.
 
That was the beginning of a series of gifts from the clever giver, one for each day until Christmas. Each gift followed the theme of "The Twelve Days of Christmas" in a creative way. The kids especially liked "seven swans a-swimming," which was a basket of swan-shaped soaps plus passes to the local swimming pool, giving the kids something to look forward to when the warm days of spring arrived. "Eight maids a-milking" included eight bottles of chocolate milk, eggnog and regular milk in glass bottles with paper faces, handmade aprons and caps. Every day was something very special. The "five golden rings" came one morning just in time for breakfast - five glazed doughnuts just waiting to be eaten.
 
We would get calls from our family, neighbors and friends who would want to know what we had received that day. Together, we would chuckle at the ingenuity and marvel at the thoughtfulness as we enjoyed each surprise. We were so caught up in the excitement and curiosity of what would possibly come next, that our grief didn't have much of a chance to rob us of the spirit of Christmas. What our elf did was absolutely miraculous.
 
Each year since then, as we decorate our Christmas tree, we place on it the decorations we received that Christmas while we play the song "The Twelve Days of Christmas." We give thanks for our elf who was, we finally realized, our very own Christmas angel. We never did find out who it was, although we have our suspicions. We actually prefer to keep it that way. It remains a wondrous and magical experience - as mysterious and blessed as the very first Christmas.

Save it as angel.txt. For convenience, put it in PyCharm's current default directory, so the script can open it by file name alone.
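A relative file name such as angel.txt is resolved against the current working directory, so it can help to confirm where that is before running the script. The snippet below is only a minimal sketch: the helper name `locate` is hypothetical, and it assumes nothing beyond the standard library.

```python
from pathlib import Path


def locate(filename, base=None):
    """Return the absolute path that a relative file name resolves to.

    `locate` is a hypothetical helper for illustration; `base` defaults to
    the current working directory, which is where open('angel.txt') looks.
    """
    base = Path(base) if base is not None else Path.cwd()
    return (base / filename).resolve()


path = locate('angel.txt')
print(path, 'exists' if path.exists() else 'is missing')
```

If the file is reported as missing, either move angel.txt into the printed directory or pass a full path to `open()`.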

import string
from matplotlib import pyplot as plt

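# Step 1: read the article line by line and split each line into words.
# Each word later becomes a dictionary key whose value is the number of
# times that word occurs.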
hist = []

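def process_line(line, hist):
    """Clean the words in one line and append them to hist."""
    for word in line.split():
        # Strip punctuation and whitespace from both ends of the word
        word = word.strip(string.punctuation + string.whitespace)
        # Normalize case so that 'We' and 'we' count as the same word
        word = word.lower()
        # A token made only of punctuation (such as a lone '-') is now empty; skip it
        if word:
            # append() adds one element to the end of the list
            hist.append(word)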


with open('angel.txt', 'r', encoding='utf-8') as f:
    for line in f:
        process_line(line, hist)

#print(len(hist))
#print(hist)
# Everything above cleans the data

# Word-frequency table: word -> number of occurrences
res = {}
for word in hist:
    # get() stands in for this if/else:
    #     if word not in res:
    #         res[word] = 1
    #     else:
    #         res[word] += 1
    # D.get(k[, d]) returns D[k] if key k is in dictionary D; otherwise it returns d.
    res[word] = res.get(word, 0) + 1

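# See 《零基础学python》, page 63
# items() is used to iterate over a dictionary; in Python 3 it returns a view
# of (key, value) tuples. Sorting by d[1] in reverse puts the most frequent
# words first.
t = sorted(res.items(), key=lambda d: d[1], reverse=True)
print(t)

# An older approach: collect [value, key] pairs in a list and sort it in place.
#for key,value in res.items():
    #t.append([value,key])
#t.sort(reverse=True)
#print(t)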

# Plot
# Add the data: one bar for each of the 20 most frequent words
for word, count in t[:20]:
    plt.bar([word], [count])
plt.xlabel('word')
plt.ylabel('count')
plt.title('word frequency')
plt.show()
#plt.bar(['we'],[24])
#plt.bar(['of'],[23])
#plt.bar(['and'],[22])
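The counting and sorting steps above can also be written with `collections.Counter` from the standard library. The following is a sketch of an alternative, not the method used in this post; the function name `count_words` and the sample sentence are made up for illustration.

```python
import string
from collections import Counter


def count_words(text):
    """Count lowercase words in text after stripping edge punctuation.

    `count_words` is a hypothetical helper; it applies the same cleaning as
    process_line() and drops tokens that end up empty.
    """
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return Counter(w for w in words if w)


# Made-up sample text, not taken from angel.txt
sample = 'We were lucky. We were devastated - we learned.'
counts = count_words(sample)
print(counts.most_common(2))  # [('we', 3), ('were', 2)]
```

`most_common(n)` returns the `n` most frequent (word, count) pairs already sorted, so it could replace both the `res` dictionary loop and the `sorted()` call if the whole file is read in as one string.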

Output:
