Usage of NLTK Library Functions

nltk.sent_tokenize()

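    import nltk

    # reader: an iterable of rows (e.g. a csv.reader); r[0] holds one comment's text.
    # sent_tokenize needs NLTK's Punkt data: nltk.download('punkt'), or 'punkt_tab' in newer releases.
    for r in reader:
        print(r[0])                                # original comment
        print(nltk.sent_tokenize(r[0].lower()))    # lowercased, split into sentences
        print()                                    # blank line between comments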

Output:

They wont nerf it. I just hope people decide to run fun decks once TGT hits and stop being assholes.
['they wont nerf it.', 'i just hope people decide to run fun decks once tgt hits and stop being assholes.']

Seemed to start by a lot of falling over each other.
['seemed to start by a lot of falling over each other.']

That whole show was powerful. Landed a spot in my top 5
['that whole show was powerful.', 'landed a spot in my top 5']

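nltk.sent_tokenize() splits each comment into a list of sentences, breaking at sentence-ending punctuation such as '.', '?' and '!' (it uses NLTK's pre-trained Punkt sentence tokenizer). A final sentence without a closing period, like "landed a spot in my top 5", is still returned as its own sentence.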
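To make the idea concrete without needing the Punkt model, here is a minimal sketch of a naive regex splitter. It is not NLTK's algorithm: the function name naive_sent_tokenize is hypothetical, and unlike Punkt it knows nothing about abbreviations, so text like "Dr. Smith" would be split in two.

```python
import re

# Naive stand-in for nltk.sent_tokenize(), NOT the Punkt algorithm:
# split wherever '.', '?' or '!' is followed by whitespace.
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def naive_sent_tokenize(text: str) -> list[str]:
    """Split text into sentences at sentence-final punctuation."""
    return [s for s in _SENT_END.split(text.strip()) if s]


if __name__ == "__main__":
    comment = "That whole show was powerful. Landed a spot in my top 5"
    print(naive_sent_tokenize(comment.lower()))
    # ['that whole show was powerful.', 'landed a spot in my top 5']
```

On the three sample comments this gives the same split as the output above; the differences only show up on abbreviations and similar edge cases that Punkt was trained to handle.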

nltk.word_tokenize()

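    import nltk

    # reader: an iterable of rows (e.g. a csv.reader); r[0] holds one comment's text.
    for r in reader:
        print(r[0])                                # original comment
        print(nltk.word_tokenize(r[0].lower()))    # lowercased, split into word tokens
        print()                                    # blank line between comments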

Output:

Well, about that Ninth Circle...
['well', ',', 'about', 'that', 'ninth', 'circle', '...']

Goddamn you're retarded.
['goddamn', 'you', "'re", 'retarded', '.']

I'm in Tampa, you piece of shit. Come visit me.
['i', "'m", 'in', 'tampa', ',', 'you', 'piece', 'of', 'shit', '.', 'come', 'visit', 'me', '.']

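nltk.word_tokenize() splits the text into word tokens. Punctuation marks become separate tokens (',', '.', '...'), and contractions are split into two parts: "you're" becomes 'you' and "'re", and "I'm" becomes 'i' and "'m".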
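The behaviour described above can be approximated with a single regular expression. This is a simplified sketch, not NLTK's Treebank-style tokenizer: simple_word_tokenize is a hypothetical name, and it splits "can't" as 'can', "'t" where NLTK produces 'ca', "n't".

```python
import re

# Simplified stand-in for nltk.word_tokenize(), NOT NLTK's algorithm.
# Alternatives are tried left to right at each position:
#   '\w+          -> contraction suffixes such as 're, 'm, 's
#   \w+(?:-\w+)*  -> words, keeping hyphenated words like anti-vaxxers whole
#   \.\.\.        -> an ellipsis as a single token
#   [^\w\s]       -> any other punctuation mark as its own token
_TOKEN = re.compile(r"'\w+|\w+(?:-\w+)*|\.\.\.|[^\w\s]")


def simple_word_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens."""
    return _TOKEN.findall(text)


if __name__ == "__main__":
    print(simple_word_tokenize("well, about that ninth circle..."))
    # ['well', ',', 'about', 'that', 'ninth', 'circle', '...']
    print(simple_word_tokenize("i'm in tampa, come visit me."))
    # ['i', "'m", 'in', 'tampa', ',', 'come', 'visit', 'me', '.']
```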

nltk.FreqDist()

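    import itertools
    import nltk
    # sent_words: a list of token lists, e.g. one nltk.word_tokenize() result per comment
    word_freq = nltk.FreqDist(itertools.chain(*sent_words))  # flatten, then count each token
    for w in word_freq:
        print(w, word_freq[w])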

Output:

degrasse 1
hanks 1
marajuana 1
anti-vaxxers 1
felicidades 1
loader 1

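nltk.FreqDist() counts how many times each item occurs in the sequence it is given. Here the token lists are first flattened with itertools.chain, so the result is the number of occurrences of each word across all comments.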
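Since nltk.FreqDist is a subclass of collections.Counter, the counting step can be sketched with the standard library alone. The two token lists below are hypothetical stand-ins for sent_words, shaped like word_tokenize output, and count_words is a hypothetical helper name; chain.from_iterable(sent_words) is equivalent to the original itertools.chain(*sent_words).

```python
from collections import Counter
from itertools import chain


def count_words(sent_words: list[list[str]]) -> Counter:
    """Flatten the token lists and count each token (FreqDist builds on Counter)."""
    return Counter(chain.from_iterable(sent_words))


if __name__ == "__main__":
    # Hypothetical tokenized comments standing in for sent_words.
    sent_words = [
        ["well", ",", "about", "that", "ninth", "circle", "..."],
        ["i", "'m", "in", "tampa", ",", "come", "visit", "me", "."],
    ]
    word_freq = count_words(sent_words)
    # most_common() sorts by count; ties keep first-seen order.
    for w, n in word_freq.most_common(3):
        print(w, n)
    # , 2
    # well 1
    # about 1
```

The same most_common() call also works on a FreqDist, which is handy for listing the most frequent words instead of iterating over every entry as the original loop does.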
