Word frequency count:
Reposted from: https://www.cnblogs.com/sigmod/p/wordcount.html
import re
str_text="""The placement of replicas is critical to HDFS reliability and performance. reater than network bandwidth between machines in different racks."""
# Replace every character that is not a letter or digit with a space
str_text = re.sub(r'[^a-zA-Z0-9\n]', ' ', str_text)
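# This originally used split(' ') and was later changed to split(), which by default
# splits on any run of whitespace; with split(' ') the resulting list contains
# empty strings '' wherever the text has a line break.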
str_lyst1=str_text.split()
count_dict={}
for item in str_lyst1:
    if item in count_dict:
        count_dict[item] += 1
    else:
        count_dict[item] = 1
# Sort the (word, count) pairs by count, highest first
count_list = sorted(count_dict.items(), key=lambda x: x[1], reverse=True)
print(count_list)
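
The same counting steps can also be written with collections.Counter from the standard library. This is only a sketch of an alternative, not the original author's code: the function name count_words and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def count_words(text):
    """Return (word, count) pairs sorted by count, highest first.

    count_words is a hypothetical helper name used only for this sketch.
    """
    # Replace everything except ASCII letters and digits with a space.
    cleaned = re.sub(r'[^a-zA-Z0-9]', ' ', text)
    # split() with no argument splits on any run of whitespace,
    # so no empty strings end up in the list.
    words = cleaned.split()
    # Counter tallies each word; most_common() sorts by descending count.
    return Counter(words).most_common()


# Illustrative input, not taken from the original post.
sample = "the cat and the hat, and the bat."
result = count_words(sample)
print(result)
```

Counter removes the need for the explicit if/else branch, and most_common() replaces the sorted(...) call with the lambda key. Like the original, this version is case-sensitive, so "The" and "the" are counted as different words.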
Merging two sorted linked lists in Python