python 去停用词

Try caching the stopwords object, as shown below. Constructing this each time you call the function seems to be the bottleneck.

    from nltk.corpus import stopwords cachedStopWords = stopwords.words("english") def testFuncOld(): text = 'hello bye the the hi' text = ' '.join([word for word in text.split() if word not in stopwords.words("english")]) def testFuncNew(): text = 'hello bye the the hi' text = ' '.join([word for word in text.split() if word not in cachedStopWords]) if __name__ == "__main__": for i in xrange(10000): testFuncOld() testFuncNew()

I ran this through the profiler: python -m cProfile -s cumulative test.py. The relevant lines are posted below.

nCalls Cumulative Time

10000 7.723 words.py:7(testFuncOld)

10000 0.140 words.py:11(testFuncNew)

So, caching the stopwords instance gives a ~70x speedup.

转载于:https://www.cnblogs.com/Donal/p/6902048.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值