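Python Study Notes: Word Frequency Counting

Count the frequency of every word in a CET-6 (College English Test, Band 6) exam paper and return a dictionary in the following form:

{'and':100,'abandon':5}

The exam text is located at ./artical.txt

Tip: how to read the file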

def get_artical(artical_path):
    with open(artical_path) as fr:
        data = fr.read()
    return data

get_artical('./artical.txt')
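
The helper above relies on the platform's default text encoding. As a minimal sketch, assuming the exam text is stored as UTF-8 (the source does not state this), the encoding can be passed explicitly. The name read_text and the demo file are hypothetical and only serve the illustration.

```python
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Read a whole text file using an explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding) as fr:
        return fr.read()

# Demonstrate on a throwaway file rather than the real ./artical.txt.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as fw:
        fw.write("Hello, world!\nSecond line.")
    demo_text = read_text(demo_path)

print(repr(demo_text))
```

If the file turns out to use another encoding, only the encoding argument needs to change.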

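Processing requirements

  • (a) '\n' is the newline character and must be removed
  • (b) Punctuation must be handled
['.', ',', '!', '?', ';', '\'', '\"', '/', '-', '(', ')']
  • (c) Arabic digits must be handled
['1','2','3','4','5','6','7','8','9','0']
  • (d) Mind the case: some words are capitalized only because they start a sentence, so every word must be converted to lowercase
'String'.lower()
  • (e) Bonus

Study regular expressions on your own and use them in the code (the re module).

Reference: https://docs.python.org/3.7/library/re.html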
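
Requirements (a) through (d) can be sketched on a small sample string before touching the real file. This is one possible approach, not the only one: newlines, punctuation and digits are all non-letters, so a single character class covers them. The function name clean_text and the sample sentence are made up for illustration. Note that this version also treats apostrophes as separators, so "don't" would be split into "don" and "t".

```python
import re

def clean_text(text):
    """Apply requirements (a)-(d) and return a list of lowercase words."""
    # (a), (b), (c): replace every character that is not an English letter
    # (newlines, punctuation, digits) with a space.
    letters_only = re.sub(r"[^A-Za-z]", " ", text)
    # (d): lowercase everything, then split on runs of whitespace.
    return letters_only.lower().split()

sample = "The cat sat.\nThe dog, aged 3, sat too!"
words = clean_text(sample)
print(words)
# ['the', 'cat', 'sat', 'the', 'dog', 'aged', 'sat', 'too']
```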

In [3]

# Write your code in the area below according to the processing requirements.
def get_artical(artical_path):
    with open(artical_path) as fr:
        data = fr.read()
    return data

# get_artical() is a helper function that reads the exam text at the given path.
# get_artical('./artical.txt')

In [4]

import re  # regular expression module

# Write your code in the area below according to the processing requirements.
def get_artical(artical_path):
    with open(artical_path) as fr:
        data = fr.read()
    return data

# Processing function: clean the text and count each word
def handle(data):
    counts = {}
    data = re.sub(r'\n', ' ', data)  # replace newlines with spaces
    # Keep only English letters and apostrophes, so that words
    # such as don't and isn't stay in one piece
    data = re.sub(r"[^A-Za-z']", ' ', data)
    data = data.lower()
    list_data = data.split()  # list of words
    # Walk the list and count each word
    for word in list_data:
        counts[word] = counts.get(word, 0) + 1
    return counts

# get_artical() is a helper function that reads the exam text at the given path.
data = get_artical('./artical.txt')
re_counts = handle(data)
# Sort by frequency, highest first
re_counts = sorted(re_counts.items(), key=lambda x: x[1], reverse=True)
print(re_counts)
[('the', 169), ('to', 95), ('a', 87), ('of', 82), ('and', 74), ('in', 61), ('is', 34), ('antarctic', 32), ('they', 30), ('their', 28), ('are', 25), ('on', 25), ('what', 25), ('that', 24), ('for', 22), ('b', 20), ('c', 20), ('d', 20), ('fishing', 20), ('with', 19), ('be', 18), ('as', 18), ('it', 18), ('penguins', 17), ('you', 15), ('king', 15), ('krill', 14), ('students', 14), ('we', 13), ('about', 13), ('have', 12), ('will', 12), ('from', 12), ('passage', 11), ('do', 11), ('them', 11), ('change', 11), ('breeding', 11), ('was', 10), ('new', 10), ('may', 10), ('not', 10), ('many', 10), ('can', 10), ('grounds', 10), ('ocean', 10), ('marine', 10), ('this', 9), ('our', 9), ('an', 9), ('or', 9), ('around', 9), ('but', 9), ('schools', 9), ('one', 8), ('attitudes', 8), ('your', 8), ('find', 8), ('does', 8), ('species', 8), ('at', 7), ('when', 7), ("don't", 7), ('only', 7), ('penguin', 7), ('climate', 7), ('global', 7), ('out', 7), ('sea', 7), ('whales', 7), ('trips', 7), ('those', 6), ('if', 6), ('attitude', 6), ('way', 6), ('all', 6), ('said', 6), ('by', 6), ('behavior', 6), ('some', 6), ('study', 6), ('colonies', 6), ('being', 6), ('should', 6), ('conservation', 6), ('southern', 6), ('school', 6), ('help', 6), ('could', 6), ('based', 5), ('her', 5), ('has', 5), ('there', 5), ('time', 5), ('think', 5), ('how', 5), ('according', 5), ('no', 5), ('waters', 5), ('report', 5), ('region', 5), ('which', 5), ('impact', 5), ('campaign', 5), ('areas', 5), ('sanctuary', 5), ('ccamlr', 5), ('its', 5), ('parents', 5), ('children', 5), ('want', 5), ('questions', 4), ('following', 4), ('she', 4), ('been', 4), ('better', 4), ('most', 4), ('people', 4), ('world', 4), ('even', 4), ('become', 4), ('feel', 4), ('found', 4), ('say', 4), ('believe', 4), ('learn', 4), ('immediate', 4), ('environment', 4), ("one's", 4), ('antarctica', 4), ('future', 4), ('great', 4), ('protect', 4), ('seas', 4), ('area', 4), ('these', 4), ('key', 4), ('society', 4), ('pupils', 4), ('disadvantaged', 4), ('such', 
4), ('community', 4), ('see', 4), ('author', 4), ('warming', 4), ('ecosystems', 4), ('islands', 4), ('food', 4), ('last', 3), ('child', 3), ('body', 3), ('few', 3), ('who', 3), ('operations', 3), ('would', 3), ('psychological', 3), ('turn', 3), ('into', 3), ('opportunities', 3), ('chance', 3), ('behaving', 3), ('events', 3), ('ways', 3), ('further', 3), ('though', 3), ('other', 3), ('increasingly', 3), ('consistent', 3), ('why', 3), ('line', 3), ('because', 3), ('good', 3), ('than', 3), ('changes', 3), ('level', 3), ("can't", 3), ('proposed', 3), ("'s", 3), ('over', 3), ('two', 3), ('industrial', 3), ('greenpeace', 3), ('ecosystem', 3), ('concern', 3), ('launched', 3), ('sanctuaries', 3), ("greenpeace's", 3), ('any', 3), ('huge', 3), ('just', 3), ('although', 3), ('between', 3), ('more', 3), ('migrate', 3), ('habitats', 3), ('pressures', 3), ('pounds', 3), ('cannot', 3), ('life', 3), ('travel', 3), ('trip', 3), ('face', 3), ('ingenuity', 3), ('field', 3), ('cover', 3), ('activities', 3), ('pristine', 3), ('disappear', 3), ('le', 3), ('bohec', 3), ('human', 3), ('water', 3), ('born', 2), ('outside', 2), ('likely', 2), ('asked', 2), ('without', 2), ('privilege', 2), ('example', 2), ('powerful', 2), ('us', 2), ('towards', 2), ('ideologies', 2), ('rather', 2), ('happens', 2), ('during', 2), ('both', 2), ('did', 2), ('family', 2), ('gender', 2), ('important', 2), ('same', 2), ('throughout', 2), ('goals', 2), ('also', 2), ('internally', 2), ('something', 2), ('however', 2), ('studies', 2), ('feelings', 2), ('thoughts', 2), ('easy', 2), ('beliefs', 2), ('behave', 2), ('awareness', 2), ('idea', 2), ('start', 2), ('already', 2), ('take', 2), ('consider', 2), ('case', 2), ('another', 2), ('contribute', 2), ('educational', 2), ('different', 2), ('suggest', 2), ('going', 2), ("person's", 2), ('come', 2), ('afford', 2), ('changing', 2), ('require', 2), ('threatening', 2), ("world's", 2), ('were', 2), ('feeding', 2), ('posed', 2), ('threat', 2), ('amid', 2), ('growing', 2), 
('industry', 2), ('protection', 2), ('wildlife', 2), ('weddell', 2), ('peninsula', 2), ('decide', 2), ('expected', 2), ('later', 2), ('science', 2), ('manager', 2), ('sustainable', 2), ('he', 2), ('season', 2), ('protected', 2), ('part', 2), ('scientific', 2), ('policy', 2), ('discussions', 2), ('long', 2), ('open', 2), ('dialogue', 2), ('data', 2), ('experience', 2), ('too', 2), ('close', 2), ('keep', 2), ('best', 2), ('horizons', 2), ('ambitious', 2), ('divided', 2), ('unequal', 2), ('hungry', 2), ('families', 2), ('poverty', 2), ('says', 2), ('fundraising', 2), ('up', 2), ('neighbours', 2), ('cost', 2), ('able', 2), ('fuel', 2), ('spirit', 2), ('income', 2), ('well', 2), ('together', 2), ('enable', 2), ('benefit', 2), ('rich', 2), ('understanding', 2), ('participate', 2), ('much', 2), ('kids', 2), ('rising', 2), ('temperatures', 2), ('overfishing', 2), ('pushed', 2), ('extinction', 2), ('wilderness', 2), ('percent', 2), ('forced', 2), ('current', 2), ('findings', 2), ('separate', 2), ('population', 2), ('potentially', 2), ('relocate', 2), ('second', 2), ('largest', 2), ('isolated', 2), ('ice', 2), ('polar', 2), ('front', 2), ('feed', 2), ('kill', 2), ('longer', 2), ('entire', 2), ('serve', 2), ('like', 2), ('higher', 2), ('levels', 2), ('indicators', 2), ('suitable', 2), ('rise', 2), ('year', 1), ('hospital', 1), ('uk', 1), ('heart', 1), ('babies', 1), ('survive', 1), ('rare', 1), ('condition', 1), ('must', 1), ('endure', 1), ('numerous', 1), ('complex', 1), ('needs', 1), ('mother', 1), ('interviewed', 1), ('three', 1), ('weeks', 1), ('after', 1), ("daughter's", 1), ('birth', 1), ('prepared', 1), ('might', 1), ('daunting', 1), ('task', 1), ('caring', 1), ('answered', 1), ('hesitation', 1), ('far', 1), ('concerned', 1), ('rarely', 1), ('power', 1), ('tools', 1), ('allow', 1), ('mistakes', 1), ('loss', 1), ('beginnings', 1), ('settled', 1), ('thinking', 1), ('feeling', 1), ('particular', 1), ('objects', 1), ('use', 1), ('filter', 1), ('interpret', 1), ('react', 
1), ("weren't", 1), ('learned', 1), ('number', 1), ('influences', 1), ('occur', 1), ('early', 1), ('childhood', 1), ('include', 1), ('happened', 1), ('directly', 1), ('presence', 1), ('acquire', 1), ('distinctive', 1), ('identity', 1), ('refined', 1), ('whom', 1), ('identify', 1), ('culture', 1), ('admire', 1), ('know', 1), ('personally', 1), ('friendships', 1), ('relationships', 1), ('particularly', 1), ('adolescence', 1), ('adulthood', 1), ('information', 1), ('receive', 1), ('especially', 1), ('ideas', 1), ('repeated', 1), ('association', 1), ('achievements', 1), ('attractive', 1), ('refines', 1), ('assume', 1), ('someone', 1), ('predicts', 1), ('necessarily', 1), ('predict', 1), ('general', 1), ('hold', 1), ('similar', 1), ("that's", 1), ('benefits', 1), ('recycling', 1), ('exercise', 1), ('views', 1), ('takes', 1), ('effort', 1), ('courage', 1), ('go', 1), ('beyond', 1), ('merely', 1), ('stating', 1), ('effective', 1), ("you'd", 1), ('prefer', 1), ('reflect', 1), ('anything', 1), ('burden', 1), ('so', 1), ('right', 1), ('now', 1), ('latter', 1), ('shapes', 1), ('improves', 1), ('wellbeing', 1), ('determines', 1), ('respond', 1), ('interact', 1), ('refinement', 1), ("idols'", 1), ('behaviors', 1), ('contact', 1), ('opposite', 1), ('interaction', 1), ('cultures', 1), ("people's", 1), ('person', 1), ('mentality', 1), ('expression', 1), ('interpersonal', 1), ('relations', 1), ('matter', 1), ('hypocritical', 1), ('lack', 1), ('willpower', 1), ('strategy', 1), ('things', 1), ('attention', 1), ('starting', 1), ('act', 1), ('embodies', 1), ('aspirations', 1), ('adjusting', 1), ('gradually', 1), ('period', 1), ('considering', 1), ('reducing', 1), ('burdens', 1), ('unspoilt', 1), ('wildernesses', 1), ('analysed', 1), ('movements', 1), ('vessels', 1), ('operating', 1), ('vicinity', 1), ('whale', 1), ('highlights', 1), ('incidents', 1), ('boats', 1), ('involved', 1), ('groundings', 1), ('oil', 1), ('spills', 1), ('accidents', 1), ('serious', 1), ('published', 1), 
('tuesday', 1), ('comes', 1), ('create', 1), ('network', 1), ('calling', 1), ('halt', 1), ('considered', 1), ('status', 1), ('frida', 1), ('bengtsson', 1), ('wants', 1), ('show', 1), ("it's", 1), ('responsible', 1), ('player', 1), ('then', 1), ('voluntarily', 1), ('getting', 1), ('instead', 1), ('backing', 1), ('tracts', 1), ('tract', 1), ('protecting', 1), ('banning', 1), ('created', 1), ('ross', 1), ('reserve', 1), ('vast', 1), ('third', 1), ('under', 1), ('consideration', 1), ('west', 1), ('commission', 1), ('living', 1), ('resources', 1), ('manages', 1), ('proposal', 1), ('conference', 1), ('australia', 1), ('october', 1), ('decision', 1), ('until', 1), ('keith', 1), ('reid', 1), ('organisation', 1), ('sought', 1), ('balance', 1), ('taking', 1), ('place', 1), ('nearer', 1), ('often', 1), ('happening', 1), ('empty', 1), ('creation', 1), ('system', 1), ('ongoing', 1), ('added', 1), ('term', 1), ('operation', 1), ('depends', 1), ('healthy', 1), ('thriving', 1), ('always', 1), ('had', 1), ('environmental', 1), ('non', 1), ('governmental', 1), ('organisations', 1), ('strongly', 1), ('intend', 1), ('continue', 1), ('including', 1), ('talks', 1), ('discuss', 1), ('improvements', 1), ('latest', 1), ('ones', 1), ('establishment', 1), ('hope', 1), ('positively', 1), ('knowledge', 1), ('caused', 1), ('depriving', 1), ('carried', 1), ('unprecedented', 1), ('purpose', 1), ('reduce', 1), ('establish', 1), ('regulate', 1), ('publicise', 1), ('recommendation', 1), ('opting', 1), ('operate', 1), ('away', 1), ('suggested', 1), ('volunteering', 1), ('endangered', 1), ('refraining', 1), ('showing', 1), ('sense', 1), ('responsibility', 1), ('leading', 1), ('aim', 1), ('raise', 1), ('public', 1), ('vulnerability', 1), ('ban', 1), ('commercial', 1), ('interference', 1), ('sustain', 1), ('damaging', 1), ('define', 1), ('role', 1), ('coordinator', 1), ('authority', 1), ('big', 1), ('analysis', 1), ('provider', 1), ('needed', 1), ('expertise', 1), ('initiator', 1), ('microcosm', 1), 
('mediate', 1), ('seek', 1), ('alleviate', 1), ('external', 1), ('while', 1), ('equipping', 1), ('understand', 1), ('handle', 1), ('once', 1), ('sheltering', 1), ('broadening', 1), ('circumstances', 1), ('ideals', 1), ('clash', 1), ('outright', 1), ('adults', 1), ('adventure', 1), ('lifetime', 1), ('treks', 1), ('bomeo', 1), ('sports', 1), ('tour', 1), ('barbados', 1), ('appear', 1), ('almost', 1), ('routine', 1), ('state', 1), ('thousands', 1), ('profit', 1), ('companies', 1), ('arrange', 1), ('meanwhile', 1), ('arrive', 1), ('breakfast', 1), ('action', 1), ('group', 1), ('nine', 1), ('every', 1), ('classroom', 1), ('fall', 1), ('below', 1), ('discrepancy', 1), ('startlingly', 1), ('apparent', 1), ('introducing', 1), ('requirement', 1), ('off', 1), ('tap', 1), ('richer', 1), ('aunts', 1), ('probing', 1), ('rock', 1), ('pools', 1), ('local', 1), ('beach', 1), ('practising', 1), ('french', 1), ('language', 1), ('exchange', 1), ('fire', 1), ("children's", 1), ('passions', 1), ('boost', 1), ('skills', 1), ('eyes', 1), ('possibilities', 1), ('outings', 1), ('bright', 1), ('get', 1), ('scores', 1), ('tests', 1), ('globalised', 1), ('age', 1), ('international', 1), ('manage', 1), ('abroad', 1), ('easily', 1), ('holiday', 1), ('immense', 1), ('mounting', 1), ('financial', 1), ('shown', 1), ('remarkable', 1), ('determination', 1), ('ensuring', 1), ('truly', 1), ('applauded', 1), ('methods', 1), ('whole', 1), ('proceeds', 1), ('pooled', 1), ('extend', 1), ('justified', 1), ('average', 1), ('initiatives', 1), ('doors', 1), ('pull', 1), ('expensive', 1), ('little', 1), ('party', 1), ('celebration', 1), ('guilt', 1), ('left', 1), ('behind', 1), ('department', 1), ('education', 1), ('guidance', 1), ('charge', 1), ('board', 1), ('lodging', 1), ('syllabus', 1), ('receiving', 1), ('government', 1), ('aid', 1), ('exempt', 1), ('costs', 1), ('seem', 1), ('ignore', 1), ('advice', 1), ('kind', 1), ('glamorous', 1), ('exotic', 1), ('becoming', 1), ('common', 1), ('bring', 1), 
('communities', 1), ('single', 1), ('handed', 1), ('least', 1), ('expect', 1), ('foster', 1), ('divisions', 1), ('exclude', 1), ('prepare', 1), ('challenge', 1), ('social', 1), ('motivate', 1), ('develop', 1), ('physical', 1), ('intellectual', 1), ('abilities', 1), ('encourage', 1), ('achieve', 1), ('backgrounds', 1), ('mix', 1), ('each', 1), ('widen', 1), ('gap', 1), ('privileged', 1), ('give', 1), ('relatives', 1), ('build', 1), ('aiming', 1), ('improve', 1), ('services', 1), ("students'", 1), ('mutual', 1), ('involving', 1), ('campus', 1), ('low', 1), ('regarding', 1), ('hard', 1), ('miss', 1), ('broaden', 1), ('despite', 1), ('adventures', 1), ('run', 1), ('risks', 1), ("author's", 1), ('expectation', 1), ('bringing', 1), ('resolving', 1), ('existing', 1), ('discrepancies', 1), ('avoiding', 1), ('creating', 1), ('gaps', 1), ('among', 1), ('giving', 1), ('poor', 1), ('preferential', 1), ('treatment', 1), ('populations', 1), ('brink', 1), ('end', 1), ('century', 1), ("study's", 1), ('states', 1), ('transforms', 1), ('either', 1), ('co', 1), ('celine', 1), ('university', 1), ('strasbourg', 1), ('france', 1), ('warned', 1), ("there're", 1), ('actions', 1), ('aimed', 1), ('halting', 1), ('controlling', 1), ('pace', 1), ('induced', 1), ('stays', 1), ('soon', 1), ('earlier', 1), ('month', 1), ('combination', 1), ('disastrous', 1), ('seals', 1), ("today's", 1), ('starkest', 1), ('yet', 1), ('devastating', 1), ('exploitation', 1), ("antarctic's", 1), ('delicate', 1), ('unless', 1), ('greenhouse', 1), ('gas', 1), ('emissions', 1), ('drop', 1), ('million', 1), ('pairs', 1), ('type', 1), ('breed', 1), ('specific', 1), ('where', 1), ('access', 1), ('warms', 1), ('called', 1), ('upward', 1), ('movement', 1), ('nutrient', 1), ('supports', 1), ('abundance', 1), ('south', 1), ('means', 1), ('fish', 1), ('leaving', 1), ('chicks', 1), ('distance', 1), ('fool', 1), ('prows', 1), ('wiped', 1), ('plight', 1), ('seabirds', 1), ('mammals', 1), ('occupy', 1), ('chain', 1), ('call', 1), 
('bio', 1), ('sensitive', 1), ('predicting', 1), ('impacts', 1), ('sub', 1), ('closer', 1), ('retreating', 1), ('source', 1), ('scarce', 1), ('handful', 1), ('sustaining', 1), ('large', 1), ('happen', 1), ('verge', 1), ('dying', 1), ('melting', 1), ('destroy', 1), ('forever', 1), ('shrinking', 1), ('force', 1), ('accelerated', 1), ('recent', 1), ('years', 1), ('fatal', 1), ('certain', 1), ('worsened', 1), ('pollution', 1), ('birds', 1), ('extinct', 1), ('primarily', 1), ('kinds', 1), ('majority', 1), ('baby', 1), ('live', 1), ('invade', 1), ("penguins'", 1), ('distances', 1), ('reluctant', 1), ('leave', 1), ('propagation', 1), ('ultimate', 1), ('retreat', 1), ('luge', 1)]
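
The output above contains a few tokens such as "'s" and "idols'", because every apostrophe is kept, including stray and trailing ones. As a hedged alternative (not the method used above), the sketch below matches words with re.findall so that an apostrophe is kept only between letters, and counts them with collections.Counter. The name word_counts and the sample sentence are hypothetical. A stand-alone "'s" would still be counted as "s" under this pattern.

```python
import re
from collections import Counter

# A word is a run of letters, optionally followed by groups of
# "apostrophe + letters" (don't, author's). Leading or trailing
# apostrophes are not part of the match.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")

def word_counts(text):
    """Return a plain dict mapping each lowercase word to its frequency."""
    words = WORD_PATTERN.findall(text.lower())
    return dict(Counter(words))

sample = "Don't panic. The students' books and the author's notes: don't!"
counts = word_counts(sample)

# Counter.most_common sorts by frequency, highest first.
top_two = Counter(counts).most_common(2)
print(top_two)
# [("don't", 2), ('the', 2)]
```

The function returns a dictionary, which matches the form requested in the task; sorting is left to the caller.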

 
