Python 3.7 Exercises (2): Word Frequency Counting in Text with Python

 

# Word frequency counting with Python

mytext = """Background 
Industrial Light & Magic (ILM) was started in 1975 by filmmaker George Lucas, in order to create the special effects for the original Star Wars film. Since then, ILM has grown into a visual effects powerhouse that has contributed not just to the entire Star Wars series, but also to films as diverse as Forrest Gump, Jurassic Park, Who Framed Roger Rabbit, Raiders of the Lost Ark, and Terminator 2. ILM has won numerous Academy Awards for Best Visual Effects, not to mention a string of Clio awards for its work on television advertisements.

While much of ILM's early work was done with miniature models and motion controlled cameras, ILM has long been on the bleeding edge of computer generated visual effects. Its computer graphics division dates back to 1979, and its first CG production was the 1982 Genesis sequence from Star Trek II: The Wrath of Khan.

In the early days, ILM was involved with the creation of custom computer graphics hardware and software for scanning, modeling, rendering, and compositing (the process of joining rendered and scanned images together). Some of these systems made significant advances in areas such as morphing and simulating muscles and hair.

Naturally, as time went by many of the early innovations at ILM made it into the commercial realm, but the company's position on the cutting edge of visual effects technology continues to rely on an ever-changing combination of custom in-house technologies and commercial products.

Today, ILM runs a batch processing environment capable of modeling, rendering and compositing tens of thousands of motion picture frames per day. Thousands of machines running Linux, IRIX, Compaq Tru64, OS X, Solaris, and Windows join together to provide a production pipeline that is used by approximately eight hundred users daily, many of whom write or modify code that controls every step of the production process. In this context, hundreds of commercial and in-house software components are combined to create and process each frame of computer-generated or enhanced film.

Making all this work, and keeping it working, requires a certain degree of technical wizardry, as well as a tool set that is up to the task of integrating diverse and frequently changing systems.

Enter Python
Back in 1996, in the 101 Dalmation days, ILM was exclusively an SGI IRIX shop, and the production pipeline was controlled by Unix shell scripting. 
At that time, ILM was producing 15-30 shots per show, typically only a small part of each feature length film to which they were contributing."""


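def wordcount(text):
    # Remove line breaks, lowercase everything, then split on single spaces.
    # Caveat: removing '\n' glues the last word of a line to the first word
    # of the next (e.g. 'advertisements.while'), and punctuation stays attached.
    word_list = text.replace('\n', '').lower().split(" ")

    # Tally how many times each token occurs.
    count_dict = {}
    for word in word_list:
        count_dict[word] = count_dict.get(word, 0) + 1
    # Sort the (word, count) pairs by count, highest first.
    return sorted(count_dict.items(), key=lambda x: x[1], reverse=True)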

print(wordcount(mytext))

Sample output:

E:\Python37\python.exe E:/PythonTest/Test/Test002.py
[('of', 22), ('the', 17), ('and', 16), ('to', 10), ('ilm', 8), ('was', 7), ('a', 7), ('in', 6), ('as', 6), ('that', 5), ('by', 4), ('for', 4), ('has', 4), ('visual', 4), ('on', 4), ('production', 4), ('effects', 3), ('star', 3), ('its', 3), ('early', 3), ('computer', 3), ('commercial', 3), ('create', 2), ('wars', 2), ('into', 2), ('not', 2), ('but', 2), ('diverse', 2), ('awards', 2), ('work', 2), ('with', 2), ('motion', 2), ('controlled', 2), ('edge', 2), ('graphics', 2), ('days,', 2), ('custom', 2), ('software', 2), ('modeling,', 2), ('compositing', 2), ('process', 2), ('made', 2), ('many', 2), ('at', 2), ('it', 2), ('an', 2), ('in-house', 2), ('thousands', 2), ('per', 2), ('pipeline', 2), ('is', 2), ('or', 2), ('this', 2), ('each', 2), ('background', 1), ('industrial', 1), ('light', 1), ('&', 1), ('magic', 1), ('(ilm)', 1), ('started', 1), ('1975', 1), ('filmmaker', 1), ('george', 1), ('lucas,', 1), ('order', 1), ('special', 1), ('original', 1), ('film.', 1), ('since', 1), ('then,', 1), ('grown', 1), ('powerhouse', 1), ('contributed', 1), ('just', 1), ('entire', 1), ('series,', 1), ('also', 1), ('films', 1), ('forrest', 1), ('gump,', 1), ('jurassic', 1), ('park,', 1), ('who', 1), ('framed', 1), ('roger', 1), ('rabbit,', 1), ('raiders', 1), ('lost', 1), ('ark,', 1), ('terminator', 1), ('2.', 1), ('won', 1), ('numerous', 1), ('academy', 1), ('best', 1), ('effects,', 1), ('mention', 1), ('string', 1), ('clio', 1), ('television', 1), ('advertisements.while', 1), ('much', 1), ("ilm's", 1), ('done', 1), ('miniature', 1), ('models', 1), ('cameras,', 1), ('long', 1), ('been', 1), ('bleeding', 1), ('generated', 1), ('effects.', 1), ('division', 1), ('dates', 1), ('back', 1), ('1979,', 1), ('first', 1), ('cg', 1), ('1982', 1), ('genesis', 1), ('sequence', 1), ('from', 1), ('trek', 1), ('ii:', 1), ('wrath', 1), ('khan.in', 1), ('involved', 1), ('creation', 1), ('hardware', 1), ('scanning,', 1), ('rendering,', 1), ('(the', 1), ('joining', 1), ('rendered', 1), ('scanned', 1), 
('images', 1), ('together).', 1), ('some', 1), ('these', 1), ('systems', 1), ('significant', 1), ('advances', 1), ('areas', 1), ('such', 1), ('morphing', 1), ('simulating', 1), ('muscles', 1), ('hair.naturally,', 1), ('time', 1), ('went', 1), ('innovations', 1), ('realm,', 1), ("company's", 1), ('position', 1), ('cutting', 1), ('technology', 1), ('continues', 1), ('rely', 1), ('ever-changing', 1), ('combination', 1), ('technologies', 1), ('products.today,', 1), ('runs', 1), ('batch', 1), ('processing', 1), ('environment', 1), ('capable', 1), ('rendering', 1), ('tens', 1), ('picture', 1), ('frames', 1), ('day.', 1), ('machines', 1), ('running', 1), ('linux,', 1), ('irix,', 1), ('compaq', 1), ('tru64,', 1), ('os', 1), ('x,', 1), ('solaris,', 1), ('windows', 1), ('join', 1), ('together', 1), ('provide', 1), ('used', 1), ('approximately', 1), ('eight', 1), ('hundred', 1), ('users', 1), ('daily,', 1), ('whom', 1), ('write', 1), ('modify', 1), ('code', 1), ('controls', 1), ('every', 1), ('step', 1), ('process.', 1), ('context,', 1), ('hundreds', 1), ('components', 1), ('are', 1), ('combined', 1), ('frame', 1), ('computer-generated', 1), ('enhanced', 1), ('film.making', 1), ('all', 1), ('work,', 1), ('keeping', 1), ('working,', 1), ('requires', 1), ('certain', 1), ('degree', 1), ('technical', 1), ('wizardry,', 1), ('well', 1), ('tool', 1), ('set', 1), ('up', 1), ('task', 1), ('integrating', 1), ('frequently', 1), ('changing', 1), ('systems.enter', 1), ('pythonback', 1), ('1996,', 1), ('101', 1), ('dalmation', 1), ('exclusively', 1), ('sgi', 1), ('irix', 1), ('shop,', 1), ('unix', 1), ('shell', 1), ('scripting.', 1), ('time,', 1), ('producing', 1), ('15-30', 1), ('shots', 1), ('show,', 1), ('typically', 1), ('only', 1), ('small', 1), ('part', 1), ('feature', 1), ('length', 1), ('film', 1), ('which', 1), ('they', 1), ('were', 1), ('contributing.', 1)]

Process finished with exit code 0
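
The output above shows two side effects of splitting on spaces after deleting line breaks: words on either side of a line break are merged ('advertisements.while', 'pythonback'), and punctuation stays attached, so 'effects', 'effects,' and 'effects.' are counted separately. One possible variant, sketched below, extracts words with a regular expression and counts them with collections.Counter. The function name wordcount_clean and the definition of a "word" used in the pattern are assumptions made for illustration, not part of the original exercise.

```python
import re
from collections import Counter


def wordcount_clean(text):
    """Count words, ignoring case, line breaks, and surrounding punctuation."""
    # Assumed definition of a word: a run of letters/digits, optionally
    # joined by apostrophes or hyphens (e.g. "ilm's", "in-house", "15-30").
    words = re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())
    # most_common() returns (word, count) pairs sorted by descending count.
    return Counter(words).most_common()


sample = "Star Wars film.\n\nWhile the film, the Star Wars series"
print(wordcount_clean(sample)[:3])
```

Calling wordcount_clean(mytext) would give counts that differ from the listing above, since tokens such as 'film.' and 'film' are then counted as the same word.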

 

 

Reposted from: https://www.cnblogs.com/dangzhengtao/p/9606382.html
