Word segmentation of Journey to the West (《西游记》) with jieba

import jieba
excludes={"一个","那里","怎么","我们","不知","两个","甚么","不是",
          "只见","原来","如何","这个","不曾","不敢","闻言","正是",
          "只是","那怪","出来","一声","真个","不得","这里","今日",
          "那个","取经","却说","如今","三个","这般","就是","不见",
          "铁棒","认得","不能","不要","果然","上前","有些","性命",
          "com","faloo","http"}
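# A raw string keeps the backslashes in the Windows path literal.
# encoding="utf-8" is an assumption; use "gbk" if the file was saved in that encoding.
with open(r"D:\学习\Python\第六章\西游记\西游记.txt", "r", encoding="utf-8") as f:
    txt = f.read()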
words = jieba.lcut(txt)
counts = {}
for word in words:
    if len(word) == 1:
        continue
    elif word in ("唐僧", "师父", "三藏"):
        # 三藏 (Sanzang) is another name for 唐僧 (Tang Seng), not for 沙僧 (Sha Seng)
        rword = "唐僧"
    elif word in ("老孙", "大圣", "悟空", "孙行者", "孙大圣"):
        rword = "悟空"
    else:
        rword=word
    counts[rword] = counts.get(rword,0) + 1
for word in excludes:
    # pop with a default avoids a KeyError when an excluded word never occurs
    counts.pop(word, None)
items = list(counts.items())
items.sort(key=lambda x:x[1], reverse=True) 
# Slicing avoids an IndexError when fewer than 20 distinct words remain.
for word, count in items[:20]:
    print("{0:<10}{1:>5}".format(word, count))
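
The counting step can also be written with `collections.Counter` and a lookup table for character aliases. The following is only a sketch of that idea, not the script above: the names `ALIASES` and `top_words` are hypothetical, and the function takes an already segmented token list (for example the result of `jieba.lcut(txt)`) so it can be tried without the novel's text file.

```python
from collections import Counter

# Hypothetical alias table: maps variant names to one canonical name.
ALIASES = {
    "师父": "唐僧", "三藏": "唐僧",
    "老孙": "悟空", "大圣": "悟空", "孙行者": "悟空", "孙大圣": "悟空",
}

def top_words(tokens, excludes=(), aliases=ALIASES, n=20):
    """Return the n most frequent multi-character tokens as (word, count) pairs."""
    counts = Counter(
        aliases.get(tok, tok)   # merge aliases into their canonical name
        for tok in tokens
        if len(tok) > 1         # skip single-character tokens
    )
    for word in excludes:
        counts.pop(word, None)  # ignore excluded words that never occurred
    return counts.most_common(n)

# Small hand-made token list standing in for jieba.lcut(txt)
sample = ["悟空", "大圣", "师父", "一个", "的", "三藏", "老孙"]
print(top_words(sample, excludes={"一个"}))
# [('悟空', 3), ('唐僧', 2)]
```

Keeping the aliases in a dictionary means a new variant name only needs one more entry rather than another `elif` branch.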
