Is Python a data application? Python Data Science (3): Python and Data Science Applications (III)


1. Counting the words in a text with Python

speech_text = '''
I love you,
Not for what you are,
But for what I am When I am with you.
I love you,
Not only for what You have made of yourself,
But for what You are making of me.
I love you For the part of me That you bring out;
I love you For putting your hand Into my heaped-up heart And passing over All the foolish,
weak things That you can’t help Dimly seeing there,
And for drawing out Into the light All the beautiful belongings That no one else had looked Quite far enough to find.
I love you because you Are helping me to make Of the lumber of my life Not a tavern But a temple;
Out of the works Of my every day Not a reproach But a song.
I love you Because you have done More than any creed Could have done To make me good And more than any fate Could have done To make me happy.
You have done it Without a touch,
Without a word,
Without a sign.
You have done it By being yourself.
Perhaps that is what Being a friend means,
After all.
'''

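# Split the text on whitespace to get a list of words
speech = speech_text.split()

# Count how many times each word occurs
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

print(dic.items())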
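Because split() only breaks on whitespace, punctuation stays attached to the words, so "you," and "you" end up as separate keys. Below is a minimal sketch of one way to handle this, assuming a simple regular expression is good enough as a tokenizer. The count_words helper and its pattern are illustrative and not part of the original code.

```python
import re


def count_words(text):
    """Count words in text, ignoring case and surrounding punctuation."""
    # Assumption: a "word" is a run of letters, optionally joined by
    # internal apostrophes or hyphens (e.g. "can't", "heaped-up").
    words = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    counts = {}
    for word in words:
        # dict.get returns 0 for unseen words, replacing the if/else branch
        counts[word] = counts.get(word, 0) + 1
    return counts


sample = "I love you, Not for what you are, But for what I am When I am with you."
counts = count_words(sample)
print(counts["you"])  # 3
print(counts["i"])    # 3
```

With plain split(), the same sample would yield the separate keys "you,", "you" and "you.".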

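I kept running into errors when using nltk. The following two lines download the NLTK data (nltk itself must already be installed, for example with pip install nltk):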

import nltk

nltk.download()

This opens the NLTK downloader window, from which the data can be downloaded.

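If the download completes this way, skip the next step.

I tried many times and the download failed every time, so here is a second method.

Download a prepackaged archive directly. Download link 1: cloud drive, password znx7. Unzip the downloaded nltk_data.zip into the root of drive C; this is the safest choice, since it keeps nltk from failing to find the data. Download link 2: cloud drive, password 4cp3.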
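If you take the manual route, the archive can also be unpacked from Python, and nltk can be told where the data lives. This is a rough sketch, assuming nltk searches the directories listed in nltk.data.path. The two helper names are hypothetical, and the demo builds a tiny stand-in archive because the real layout of nltk_data.zip is not known here.

```python
import os
import tempfile
import zipfile


def extract_nltk_data(zip_path, target_dir):
    """Unzip a manually downloaded archive into target_dir and return that path."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir


def register_nltk_data_dir(data_dir):
    """Add data_dir to nltk's search path; return False if nltk is not installed."""
    try:
        import nltk
    except ImportError:
        return False
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    return True


# Demonstration with a tiny stand-in archive (not the real nltk_data.zip)
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "nltk_data.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("nltk_data/corpora/stopwords/english", "i\nme\nmy\n")

extract_nltk_data(demo_zip, work_dir)
data_dir = os.path.join(work_dir, "nltk_data")
print(os.path.isfile(os.path.join(data_dir, "corpora", "stopwords", "english")))  # True
print(register_nltk_data_dir(data_dir))
```

Registering the directory explicitly means the data does not have to sit in one of the default locations.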

Removing stop words

2. The second method: using Counter from Python's standard library

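# The code is as follows
from collections import Counter
from nltk.corpus import stopwords

c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Remove the stop words, then list the top ten again
stop_words = stopwords.words('english')
for sw in stop_words:
    del c[sw]
print(c.most_common(10))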

Counter is a subclass of dict that makes counting convenient.
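Here is a short sketch of the Counter behaviour that the snippet above relies on. The sample word list is made up for illustration.

```python
from collections import Counter

words = ["love", "you", "love", "me", "you", "love"]
c = Counter(words)

print(isinstance(c, dict))  # True: Counter is a dict subclass
print(c["love"])            # 3
print(c["temple"])          # 0: a missing key counts as zero, no KeyError
print(c.most_common(2))     # [('love', 3), ('you', 2)]

# Deleting a key that is absent does not raise an error, which is why
# looping over a whole stop-word list with `del c[sw]` is safe.
del c["the"]
del c["you"]
print(c.most_common(2))     # [('love', 3), ('me', 1)]
```

A plain dict would raise KeyError on both the missing lookup and the missing delete, so the stop-word loop would need an extra membership check.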

The complete code follows:

speech_text = '''
I love you,
Not for what you are,
But for what I am When I am with you.
I love you,
Not only for what You have made of yourself,
But for what You are making of me.
I love you For the part of me That you bring out;
I love you For putting your hand Into my heaped-up heart And passing over All the foolish,
weak things That you can’t help Dimly seeing there,
And for drawing out Into the light All the beautiful belongings That no one else had looked Quite far enough to find.
I love you because you Are helping me to make Of the lumber of my life Not a tavern But a temple;
Out of the works Of my every day Not a reproach But a song.
I love you Because you have done More than any creed Could have done To make me good And more than any fate Could have done To make me happy.
You have done it Without a touch,
Without a word,
Without a sign.
You have done it By being yourself.
Perhaps that is what Being a friend means,
After all.
'''

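# Lowercase everything so that "You" and "you" are counted together
speech = speech_text.lower().split()
print(speech)

# Method 1: count with a plain dict
dic = {}
for word in speech:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] = dic[word] + 1

# Sort the (word, count) pairs by count, highest first
import operator
swd = sorted(dic.items(), key=operator.itemgetter(1), reverse=True)
print(swd)

# Stop-word handling: the corpus name must be lowercase 'english'
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
for k, v in swd:
    if k not in stop_words:
        print(k, v)

# Method 2: count with Counter
from collections import Counter
c = Counter(speech)
print(c.most_common(10))  # the ten most frequent words

# Remove the stop words, then list the top ten again
for sw in stop_words:
    del c[sw]
print(c.most_common(10))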
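The whole pipeline can be condensed into one small function. This is a sketch only: top_words is a hypothetical helper, and the tiny DEMO_STOP_WORDS set stands in for the nltk stop-word list so the example runs without the nltk data.

```python
from collections import Counter

# Assumption: a small hand-picked stop-word set used only for this demo;
# in practice the nltk English list would be passed in instead.
DEMO_STOP_WORDS = {"i", "you", "for", "what", "am", "are", "with"}


def top_words(text, stop_words, n=10):
    """Return the n most common lowercase words in text, skipping stop words."""
    counts = Counter(
        word for word in text.lower().split() if word not in stop_words
    )
    return counts.most_common(n)


sample = (
    "I love you Not for what you are But for what I am "
    "When I am with you I love you"
)
print(top_words(sample, DEMO_STOP_WORDS, n=1))  # [('love', 2)]
```

Filtering the stop words before counting gives the same result as deleting them from the Counter afterwards, and it avoids storing counts that are thrown away.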

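These two methods make it easy to see why Python is used more and more in data analysis and scientific computing: beyond the strengths of the language itself, it has many good, easy-to-use libraries, both in the standard library and from third parties.

Life is short, so why not sing it in Python?

Author: 许胜利, columnist for the Python爱好者社区 (Python enthusiasts community). Please do not repost, thank you.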

Blog column: 许胜利's blog column

