Find the Chinese characters in a text, count them, and write them to a file in descending order of frequency

天天Jo

Column: python | Tags: python, re

Article link: https://blog.csdn.net/weixin_42784553/article/details/89874688

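Read the attached text below (mixed Chinese and English, with punctuation), find every Chinese character in it, sort the characters by how many times each one occurs in the text, in descending order, and write the result to a file (one character and its count per line).
You may write the code in Python, Golang, or C. Because the text can be long (for example, ten thousand characters or more), pay attention to execution speed and memory use, and consider better matching and sorting algorithms.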
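The memory and speed requirements above suggest reading the file line by line and counting with a hash map rather than loading everything at once. Below is a minimal sketch of that idea, assuming Python 3 and the same U+4E00..U+9FA5 range that the post's solution uses; the names `count_chinese` and `format_lines` are hypothetical, chosen for this example, and the output format `字:N个` mirrors the post's result file.

```python
import re
from collections import Counter

# Assumption: same range as the post, CJK Unified Ideographs U+4E00..U+9FA5.
# Characters outside it (e.g. CJK Extension A/B) are not counted.
CHINESE = re.compile(r"[\u4e00-\u9fa5]")


def count_chinese(lines):
    """Count Chinese characters over any iterable of lines.

    An open file object works directly, so only one line plus the Counter
    (a few thousand distinct characters at most) is held in memory.
    """
    counts = Counter()
    for line in lines:
        # One regex pass per line yields all Chinese characters in it.
        counts.update(CHINESE.findall(line))
    return counts


def format_lines(counts):
    """Yield 'character:count个' lines, most frequent character first."""
    for ch, n in counts.most_common():
        yield f"{ch}:{n}个\n"
```

Usage: open `a.txt` with `encoding="utf-8"`, pass the file object to `count_chinese`, then call `out.writelines(format_lines(counts))` on `result.txt` opened for writing. `Counter.most_common()` sorts with Python's built-in sort, so no hand-written sorting routine is needed in this variant.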

Part of the file content is shown below:

第一部分：六级英语阅读解析 Science, in practice, depends far less on the
experiments it prep是阿萨德阿萨德撒。。。。

The text mixes Chinese and English and contains punctuation.
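To see what "Chinese character" means for this task, a simple code-point range check is enough. The sketch below assumes the U+4E00..U+9FA5 range from the post's regex; `is_chinese` and `sample` are illustrative names. It shows that English letters, spaces, and full-width punctuation such as `：` (U+FF1A) and `。` (U+3002) all fall outside the range and are skipped.

```python
def is_chinese(ch):
    """Return True if the single character ch lies in U+4E00..U+9FA5.

    Comparing one-character strings compares their code points.
    """
    return "\u4e00" <= ch <= "\u9fa5"


# The sample text from the post, joined into one string.
sample = ("第一部分：六级英语阅读解析 Science, in practice, depends far less on the "
          "experiments it prep是阿萨德阿萨德撒。。。。")

chars = [ch for ch in sample if is_chinese(ch)]
print("".join(chars))  # 第一部分六级英语阅读解析是阿萨德阿萨德撒
```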

Solution


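import re

# Matches one Chinese character (CJK Unified Ideographs U+4E00..U+9FA5)
CHINESE = re.compile("[\u4E00-\u9FA5]")

# Count characters in a dict: {character: number of occurrences}
dict1 = {}
with open("a.txt", "r", encoding="utf-8") as f:
    for con in f:  # read line by line instead of loading the whole file with readlines()
        for ch in CHINESE.findall(con):  # one regex pass per line, not one per character
            dict1[ch] = dict1.get(ch, 0) + 1

# Quicksort the (character, count) pairs by count, descending.
# Three-way partition: many characters share the same count (often 1), and a
# two-way partition degrades to O(n) recursion depth on such ties, which can
# exceed Python's default recursion limit (1000) for texts with many distinct characters.
def QuickSort(myList, start, end):
    while start < end:
        base = myList[(start + end) // 2][1]  # pivot count taken from the middle
        lt, i, gt = start, start, end
        while i <= gt:
            if myList[i][1] > base:    # larger counts go to the front
                myList[lt], myList[i] = myList[i], myList[lt]
                lt += 1
                i += 1
            elif myList[i][1] < base:  # smaller counts go to the back
                myList[i], myList[gt] = myList[gt], myList[i]
                gt -= 1
            else:                      # equal counts stay in the middle
                i += 1
        # Recurse into the smaller side and loop on the larger one,
        # so the recursion depth stays O(log n)
        if lt - start < end - gt:
            QuickSort(myList, start, lt - 1)
            start = gt + 1
        else:
            QuickSort(myList, gt + 1, end)
            end = lt - 1
    return myList

# The usual built-in ways to sort a dict by value:
# list1 = sorted(dict1.items(), key=lambda a: a[1], reverse=True)
# list1 = sorted(dict1.items(), key=operator.itemgetter(1), reverse=True)  # needs `import operator`

list2 = list(dict1.items())
result_list = QuickSort(list2, 0, len(list2) - 1)

# Write the results to a file, one "character:count个" per line (个 is the Chinese measure word)
with open("result.txt", "w", encoding="utf-8") as f1:
    for ch, count in result_list:
        f1.write(ch + ":" + str(count) + "个\n")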

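The efficiency may not be very high and there is room for improvement; suggestions from more experienced readers are welcome.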
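One possible direction for the performance question, offered as a sketch rather than the author's method: the counts being sorted are positive integers no larger than the length of the text, so a bucket sort keyed by count (a counting-sort variant) can replace comparison sorting and run in O(n + max_count) time. `bucket_sort_desc` is a hypothetical helper name; it assumes its input is a dict like `dict1` mapping each character to a count of at least 1.

```python
def bucket_sort_desc(counts):
    """Sort {char: count} into [(char, count), ...] by count, descending.

    One bucket per possible count value; characters with equal counts keep
    the order in which they appear in the input dict.
    """
    if not counts:
        return []
    max_count = max(counts.values())
    buckets = [[] for _ in range(max_count + 1)]  # index = count
    for ch, n in counts.items():
        buckets[n].append(ch)
    result = []
    for n in range(max_count, 0, -1):  # walk from the highest count down to 1
        for ch in buckets[n]:
            result.append((ch, n))
    return result


print(bucket_sort_desc({"是": 1, "阿": 2, "萨": 2, "德": 3}))
# [('德', 3), ('阿', 2), ('萨', 2), ('是', 1)]
```

The extra memory is one list per count value up to the largest count, which for a text of tens of thousands of characters is small. In practice the built-in `sorted` is usually fast enough, since the number of distinct Chinese characters stays in the thousands.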