Removing stop words —— Python Data Science CookBook

In text processing, we are interested in words or phrases that will help us differentiate the given text from the other text in the corpus. Let’s call these words or phrases as  key  phrases. Every text mining application needs a way to find out the key phrases. An information retrieval application needs key phrases for the easy retrieval and ranking of search results. A text classification system needs key phrases as its features that are to be fed to a classifier. This is where stop words come into the picture. “ Sometimes, some extremely common words which would appear to be of little value in helping select documents matching a user need are excluded from the vocabulary entirely. These words are called  stop words.” Introduction to Information Retrieval By Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze.

Tip

Remember that stop word removal is contextual and based on the application. If you are working on a sentiment analysis application on mobile or chat room text, emoticons are highly useful. You don’t remove them as they form a very good feature set for the downstream machine learning application. Typically, in a document, the frequency of stop words is very high. However, there may be other words in your corpus that may have a
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值