Comprehensive exercise: English word frequency count

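  1. Preprocessing for word frequency counting
  2. Download the lyrics of an English song or an English article
  3. Replace all separators such as , . ? ! ' : with spaces
  4. Convert all uppercase letters to lowercase
  5. Generate the word list
  6. Generate the word frequency counts
  7. Sort
  8. Exclude grammatical words: pronouns, articles, conjunctions
  9. Output the top 10 words by frequency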
word = '''
It's the most controversial Formula One introduction since the 2016 qualifying elimination clock. But unlike that ill-fated change, the halo should last more than two races.
Formally known as the "cockpit head protection system," the halo is proving highly divisive.
Team principals, drivers and fans are split over whether it is the right safety solution when the new grand prix season roars to life in Melbourne this week.
After years of research and development, FIA settled on the halo - a thong-like titanium and carbon-fibre structure above the cockpit - to protect drivers from flying debris following the fatal crashes of Jules Bianchi at the 2014 Japanese Grand Prix and Justin Wilson in an IndyCar race in the US the following year.
Mercedes team boss Toto Wolff is firmly among the halo haters. "If you give me a chainsaw I would take it off," he said at the launch of the team's 2018 car last month.
"I think we need to look after the driver's safety, but we need to come up with a solution that simply looks better," he added.
Motor racing purists are aghast because they say grand prix racing is supposed to be an open-cockpit formula; other fans moan it is just plain ugly; some drivers have said it restricts vision.
World champion Lewis Hamilton doesn't like the halo's look, but said: "We have known for some time it was coming and I think after a few races we will forget it is even there."
'''

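# Replace each separator with a space (not an empty string) so adjacent words do not merge.
# ";" and '"' are included because both occur in the sample text.
symbol = [",", ".", "!", "?", "'", ":", "-", ";", '"']
for i in symbol:
    word = word.replace(i, ' ')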

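# Lowercase the text, then split the lowercased text into a word list
newword = word.lower()
split = newword.split()

# Count each word. list.count matches whole words only, whereas
# str.count would also match substrings (e.g. 'a' inside 'halo').
newsword = {}
for i in split:
    newsword[i] = split.count(i)

# Grammatical words (pronouns, articles, conjunctions, ...) to exclude
delwords = ''' a an the in on to at and of is was are were i he she you your they us their our it or for be too do no that s so as but it's '''
prep = delwords.split()
for i in prep:
    if i in newsword:
        del newsword[i]

# Sort by count in descending order and print the top 10
newsword = sorted(newsword.items(), key=lambda items: items[1], reverse=True)
for item in newsword[:10]:
    print(item)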
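
The same steps can be written more compactly with the standard library. The following is only a minimal sketch, not the original author's code. It assumes that `re.findall` with the pattern `[a-z]+` is an acceptable tokenizer (it splits "it's" into "it" and "s"). The helper name `top_words`, the sample sentence, and the shortened stop-word set are hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical, shortened stop-word set (illustration only)
STOP_WORDS = {"a", "an", "the", "in", "on", "to", "at", "and", "of", "is", "it", "but", "s"}

def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring stop words."""
    # Lowercase first, then keep runs of letters; everything else acts as a separator
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count whole words, skipping the stop words
    counts = Counter(t for t in tokens if t not in stop_words)
    # most_common returns (word, count) pairs sorted by count, descending
    return counts.most_common(n)

sample = "The halo is new. The halo is divisive, but the halo should last."
print(top_words(sample, n=1))  # [('halo', 3)]
```

`Counter` counts all words in a single pass, so it avoids calling `list.count` once per word, which matters little for a short article but grows quadratically on long texts.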

 

Reposted from: https://www.cnblogs.com/BennyKuang/p/8625589.html
