python英文词频统计代码_python词频统计_英文

weixin_39661405

于 2020-11-23 04:35:55 发布

阅读量451

点赞数

文章标签： python英文词频统计代码

python词频统计_英文

2020-08-15 05:22

阅读数 22

<>代码

大家都在写中文的词频统计，我接触了python都有好几年了，还写英文的，真的是，就。直接贴个代码吧。

text = """ British newspapers are much smaller than they used to be and their

readers are often in a hurry , so newspapermen write as few words as possible .

They tell their readers at once what happened , where , when and how it

happened and what was the result : how many people were killed , what change

was done and so on . Readers want the fact set out as fully and accurately as

possible . Readers are also interested in the people who have seen the accident

. So a newspaperman always likes to get some information from someone who was

there , which can be given in the person’s own words . Because he can use only

a few words , the newspaperman must choose those words carefully , every one

must be effective . Instead of “ he called out in a loud voice ” , he writes ”

he shouted ” ; instead of “the loose stones rolled noisily down the side of the

mountain ” , he will write ” they thundered down the mountainside ” . Because

many of the readers are not very clever, and most of them are in a hurry. """

def getTxt(txt): #对文本预处理（包括） txt = txt.lower()#将所有的单词全部转化成小写 for ch in

",，,.!、!@#$%^'”“;'’": #将所有除了单词以外的符号换成空格 txt=txt.replace(ch, ' ') return txt

txtArr= getTxt(text).split() counts = {} for word in txtArr: counts[word] =

counts.get(word, 0) + 1 countsList = list(counts.items()) countsList.sort(key=

lambda x:x[1], reverse=True) for i in range(20): word, count = countsList[i]

print('{0:<10}{1:>10}'.format(word,count))

<>代码解说

* 在百度找了一篇英语阅读，作为text统计词频。

* str.lower()，将所有的单词全部转化成小写然后返回转化结果，原str不变

* str.replace(‘a’, ‘b’)，将str中的所有的a字符换成b字符并返回换后结果，原str不变

*

str.split()，split()不带参数默认为以所有的空字符，包括空格、换行(\n)、制表符(\t)等为分隔符分割str，并返回分割结果（list）

* dic.get(“a”,val)，在字典dic中取出键为a对应的值，如果字典中不存在键为a的键值对，则返回val

* list.sort( key=None, reverse=False)

key – 主要是用来进行比较的元素，只有一个参数，具体的函数的参数就是取自于可迭代对象中，指定可迭代对象中的一个元素来进行排序。

reverse – 排序规则，reverse = True 降序， reverse = False 升序（默认）。

文中用了lambda表达式，lambda是声明符，后面跟参数，：前面是参数，冒号后面的表达式是lambda的处理结果，这个表达式中，参数是x，处理结果是

x[1]。sort中key参数会给后面的表达式赋值一个list中的元素。如：list为[('a':5),('b':3)]，执行sort时会分别把('a':5)和

('b':3)赋值给key后面的lambda表达式，也就是x参数会接受到这两个值。countsList.sort(key=lambda x:x[1],

reverse=True) #等同与 def takeSecond(elem): return elem[1] countsList.sort(key=

takeSecond, reverse=True)

* print在python3 中已经被函数化了，python2中可以print a，python3 中必须print(a).

* 在python3中可以help(print), (注意，在python2中是不能help(print)的，因为其不是一个函数)

* print('{0:<10}{1:>10}'.format(word,count))

参数括号里第一个大括号的0表示这个大括号是给format中第一个参数word占位的，：后<号表示这一列左对齐，10表示这一列长度为10。第二个大括号里的1表示这个大括号是给format中第二个参数count占位的，：后的>表示这一列右对齐，1010表示这一列长度为10。只有单位的话，有人弄清楚可以跟我讲。。。

<>运行结果

* 下一个做。。。中文的分词和词云图吧，看着好像挺好玩的。

weixin_39661405

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
python英文词频统计代码_python词频统计_英文

python词频统计_英文2020-08-15 05:22阅读数 22代码大家都在写中文的词频统计，我接触了python都有好几年了，还写英文的，真的是，就。直接贴个代码吧。text = """ British newspapers are much smaller than they used to be and theirreaders are often in a hurry , so ne...
复制链接

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。