Using NLTK with PHP: rake-nltk, a Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK

rake-nltk

RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
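To make the frequency and co-occurrence idea concrete, here is a minimal pure-Python sketch of RAKE-style scoring. It is an illustration under stated assumptions, not rake-nltk's actual code: the tiny `STOPWORDS` set and the helper names `candidate_phrases` and `rank_phrases` are hypothetical, and the word score used is the degree-to-frequency ratio described in the RAKE paper.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption for this sketch;
# rake-nltk itself uses NLTK's English stopwords by default).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
             "it", "of", "on", "or", "the", "to", "with"}


def candidate_phrases(text):
    """Split text into runs of consecutive non-stopword words."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rank_phrases(text):
    """Return unique (score, phrase) pairs, highest score first."""
    phrases = candidate_phrases(text)
    frequency = defaultdict(int)  # how often each word appears
    degree = defaultdict(int)     # co-occurrences, counting the word itself
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            degree[word] += len(phrase)
    # Word score: degree divided by frequency.
    word_score = {w: degree[w] / frequency[w] for w in frequency}
    # Phrase score: sum of the scores of its words.
    scored = {(sum(word_score[w] for w in p), " ".join(p)) for p in phrases}
    return sorted(scored, reverse=True)


ranked = rank_phrases(
    "Keyword extraction is fun. Rapid keyword extraction of phrases is the goal."
)
```

In this sketch, words that tend to appear inside longer phrases get a higher score, so multi-word phrases built from them rank above isolated words.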

Setup

Using pip

pip install rake-nltk

Directly from the repository

git clone https://github.com/csurfer/rake-nltk.git

python rake-nltk/setup.py install

Quick start

from rake_nltk import Rake

# By default, uses NLTK's English stopwords and all punctuation characters.
r = Rake()

# Extract keywords from a single string; `text` is the text to process.
r.extract_keywords_from_text(text)

# Extract keywords from a list of strings, where each string is a sentence;
# `sentences` is that list.
r.extract_keywords_from_sentences(sentences)

# Get keyword phrases ranked from highest to lowest.
r.get_ranked_phrases()

# Get keyword phrases ranked from highest to lowest, with scores.
r.get_ranked_phrases_with_scores()

Debugging Setup

If you see a stopwords error, it means that you have not downloaded the stopwords corpus from NLTK. You can download it using the command below.

python -c "import nltk; nltk.download('stopwords')"

References

Why did I choose to implement it myself?

It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.

There are some rather popular implementations out there, in Python (aneesha/RAKE) and Node (waseem18/node-rake), but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation, I get the flexibility and power to extend it in other creative ways later, if I see fit, without having to implement everything myself.

I plan to use it in my other upcoming pet projects and wanted it to be modular and tunable; this way, I have complete control.

Contributing

Bug Reports and Feature Requests

Please use the issue tracker for reporting bugs or feature requests.

Development

Pull requests are most welcome.

Buy the developer a cup of coffee!

If you found the utility helpful, you can buy me a cup of coffee.
