Python Language Basics, Part 4 (Word Frequency Counting)

SunChao3555

Category: Python. Tags: Python, word frequency counting

Article link: https://blog.csdn.net/SunChao3555/article/details/79137283

#coding:utf-8
import time
import string
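num = [6, 2, 7, 4, 1, 3, 5]
letters = 'dfjyfhbs'  # renamed from str so the built-in str is not shadowed
# Sort the numbers in descending order
print(sorted(num, reverse=True))
# zip pairs the items up and stops at the shorter sequence, so the final 's' is dropped
for a, b in zip(num, letters):
    print('{} is {}'.format(b, a))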
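# Time building a list with an explicit loop and append
a = []
t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
for i in range(1, 20000):
    a.append(i)
print(time.perf_counter() - t1)
# Time building a list with a list comprehension
# (same range as the loop above, so that the two timings are comparable)
t1 = time.perf_counter()
b = [i for i in range(1, 20000)]
# print(b)
print(time.perf_counter() - t1)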
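# List comprehension: the part after the bar '|' is the for-loop expression, and the part
# before the bar is the element we want to put in the list.
# The bar only marks the boundary in this schematic; it is not Python syntax:
# list = [item | for item in iterable]
c = [n for n in range(1, 10) if n % 2 == 0]
z = [letter.lower() for letter in 'ABCDEFG']
# c == [2, 4, 6, 8]
# z == ['a', 'b', 'c', 'd', 'e', 'f', 'g']
# print(c, '\n', z)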
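# Word frequency counting
path = r'C:\Users\Administrator\Desktop\s.txt'  # raw string; stray trailing space removed
with open(path, 'r') as text:
    # strip(string.punctuation) removes any run of punctuation attached to the start or
    # end of a word (not punctuation inside it); lower() converts capitalized words to lowercase
    words = [raw_word.strip(string.punctuation).lower() for raw_word in text.read().split()]
# set() converts the list into a set, which automatically drops all duplicate elements
words_index = set(words)
# Build a dict with each word as the key and its number of occurrences as the value
counts_dict = {index: words.count(index) for index in words_index}
print(words)
# Print the sorted result; key=lambda x: counts_dict[x] is called a lambda expression.
# For now it can be read as "use the value stored in the dict as the sort key".
for word in sorted(counts_dict, key=lambda x: counts_dict[x], reverse=True):
    print('{}---{} times'.format(word, counts_dict[word]))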
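
The listing above calls words.count() once per distinct word, so the whole word list is rescanned each time. As a possible alternative (not the approach used in this post), the standard library's collections.Counter tallies everything in a single pass. The sketch below makes some assumptions: the helper name count_words and the sample sentence are hypothetical and stand in for reading s.txt, and tokens that consist only of punctuation are skipped, which the original listing does not do.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its number of occurrences.

    count_words is a hypothetical helper name used only for this sketch.
    """
    # Same cleanup as the listing: strip punctuation at both ends, then lowercase
    words = (raw.strip(string.punctuation).lower() for raw in text.split())
    # Assumption: drop tokens that were pure punctuation and became empty strings
    return Counter(word for word in words if word)


# Hypothetical sample text, used here instead of the contents of s.txt
sample = "The cat saw the dog. The dog ran!"
counts = count_words(sample)

# most_common() yields (word, count) pairs from the highest count to the lowest
for word, n in counts.most_common():
    print('{}---{} times'.format(word, n))
```

Counter is a dict subclass, so counts[word] works the same way as counts_dict[word] in the listing, and most_common() takes the place of the sorted(..., key=lambda ...) call.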