Counting Words in a Text File with Python

1. Read the file and match with a regular expression

import re
from collections import OrderedDict

def statisticWord():
    words_dict = {}
    with open(r'D:\test\test.txt', encoding='utf-8') as a_file:
        for line in a_file:
            # This pattern matches HTML character references such as &#160; or &nbsp;
            # change it (for example to r'\w+') to count ordinary words instead
            words = re.findall(r'&#\d+;|&\w+;', line)
            for word in words:
                words_dict[word] = words_dict.get(word, 0) + 1  # current count of word, default is 0
    # sort the (word, count) pairs by count, highest first
    sort_words_dict = OrderedDict(sorted(words_dict.items(), key=lambda x: x[1], reverse=True))
    with open(r'D:\test\output.txt', encoding='utf-8', mode='w') as b_file:
        for k, v in sort_words_dict.items():
            b_file.write("%-15s:%15s\n" % (k, v))
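
The counting loop above can also be written with collections.Counter from the standard library. The following is only a minimal sketch, not the original author's code: the function name count_matches and the sample text are hypothetical, and it works on an in-memory string instead of D:\test\test.txt so that it can be run anywhere.

```python
import re
from collections import Counter

# Same idea as the pattern above: numeric references such as &#160;
# and named entities such as &nbsp;
ENTITY_PATTERN = re.compile(r'&#\d+;|&\w+;')


def count_matches(text, pattern=ENTITY_PATTERN):
    """Count every match of pattern in text, line by line (hypothetical helper)."""
    counts = Counter()
    for line in text.splitlines():
        counts.update(pattern.findall(line))
    return counts


# Made-up sample input, used only for illustration
sample = "a&nbsp;b&nbsp;c &#160; &lt;tag&gt;\n&nbsp;end"
counts = count_matches(sample)

# most_common() yields (match, count) pairs, highest count first
for token, n in counts.most_common():
    print("%-15s:%15s" % (token, n))
```

Counter.most_common() returns the pairs already sorted by descending count, so the separate sorted(...) and OrderedDict step is not needed in this variant.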

2. Take the file names from command-line arguments

import string
import sys

def statisticWord2():
    if len(sys.argv) == 1 or sys.argv[1] in {"-h", "--help"}:
        print("usage: filename_1 filename_2 ... filename_n")
        sys.exit()
    words = {}
    # string.punctuation already contains both quote characters
    strip = string.whitespace + string.punctuation + string.digits
    for filename in sys.argv[1:]:
        with open(filename) as a_file:  # close each file once it has been read
            for line in a_file:
                for word in line.split():
                    word = word.strip(strip)  # trim any of these characters from both ends
                    if len(word) >= 2:
                        words[word] = words.get(word, 0) + 1
    # print the words in alphabetical order
    for word in sorted(words):
        print("'{0}' occurs {1} times".format(word, words[word]))

if __name__ == "__main__":
    statisticWord2()
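
To see what the strip step actually keeps and discards, the same logic can be tried on a few in-memory lines. This is a small sketch under stated assumptions: the helper count_words, its min_length parameter, and the sample sentences are hypothetical and do not appear in the original script.

```python
import string

# Characters trimmed from both ends of each token
STRIP_CHARS = string.whitespace + string.punctuation + string.digits


def count_words(lines, min_length=2):
    """Count whitespace-separated tokens after trimming STRIP_CHARS (hypothetical helper)."""
    words = {}
    for line in lines:
        for word in line.split():
            word = word.strip(STRIP_CHARS)  # only the ends are trimmed
            if len(word) >= min_length:
                words[word] = words.get(word, 0) + 1
    return words


# Made-up sample input, used only for illustration
lines = ['"Hello," she said. Hello again!', "It's 2013; don't stop."]
result = count_words(lines)

for word in sorted(result):
    print("'{0}' occurs {1} times".format(word, result[word]))
```

Because str.strip only removes characters from the two ends, the apostrophes inside "It's" and "don't" survive, while the token "2013;" is reduced to an empty string and dropped by the length check. The count is case-sensitive, so "Hello" and "hello" would be counted separately.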

 

Reposted from: https://www.cnblogs.com/zyf7630/p/3209976.html
