Python text summarization: SumEval, a text summarization evaluation framework implemented in Python

weixin_39645019

Published 2020-11-26 10:47:08


Well tested & Multi-language evaluation framework for Text Summarization.


Well tested

The ROUGE-X scores are tested against the original Perl script (ROUGE-1.5.5.pl).

The BLEU score is calculated by SacréBLEU, which produces the same values as the official script (mteval-v13a.pl) used by WMT.

Multi-language

In addition to English, Japanese and Chinese are supported (see Dependencies). Support for other languages is easy to add.

Of course, the implementation is pure Python!

How to use

from sumeval.metrics.rouge import RougeCalculator

rouge = RougeCalculator(stopwords=True, lang="en")

rouge_1 = rouge.rouge_n(
    summary="I went to the Mars from my living town.",
    references="I went to Mars",
    n=1)

rouge_2 = rouge.rouge_n(
    summary="I went to the Mars from my living town.",
    references=["I went to Mars", "It's my living town"],
    n=2)

rouge_l = rouge.rouge_l(
    summary="I went to the Mars from my living town.",
    references=["I went to Mars", "It's my living town"])

# You need spaCy to calculate ROUGE-BE
rouge_be = rouge.rouge_be(
    summary="I went to the Mars from my living town.",
    references=["I went to Mars", "It's my living town"])

print("ROUGE-1: {}, ROUGE-2: {}, ROUGE-L: {}, ROUGE-BE: {}".format(
    rouge_1, rouge_2, rouge_l, rouge_be
).replace(", ", "\n"))
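To make clearer what a ROUGE-N score measures, here is a minimal sketch of the n-gram overlap computation for one summary and one reference. It is not sumeval's implementation: the function names ngrams and rouge_n_sketch are made up for illustration, tokenization is assumed to be lowercased whitespace splitting, and stopword removal and stemming are left out. The alpha parameter is assumed to weight precision against recall in the way the ROUGE-1.5.5 script does, with 0.5 giving the usual F1.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_sketch(summary, reference, n=1, alpha=0.5):
    """Simplified ROUGE-N F-measure for one summary and one reference.

    Assumption: lowercased whitespace tokenization, no stopwords, no stemming.
    """
    summary_grams = ngrams(summary.lower().split(), n)
    reference_grams = ngrams(reference.lower().split(), n)

    # Clipped overlap: an n-gram counts at most as often as it occurs in both texts.
    overlap = sum((summary_grams & reference_grams).values())
    summary_total = sum(summary_grams.values())
    reference_total = sum(reference_grams.values())
    if overlap == 0 or summary_total == 0 or reference_total == 0:
        return 0.0

    precision = overlap / summary_total
    recall = overlap / reference_total
    # alpha = 0.5 is the harmonic mean (F1); alpha close to 1 favors precision.
    return precision * recall / ((1 - alpha) * precision + alpha * recall)
```

For example, rouge_n_sketch("the cat sat on the mat", "the cat is on the mat", n=2) shares three of five bigrams on each side, so precision, recall, and the F-measure are all 0.6. Scores from RougeCalculator can differ because it applies its own tokenizer and options.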

from sumeval.metrics.bleu import BLEUCalculator

bleu = BLEUCalculator()
score = bleu.bleu("I am waiting on the beach",
                  "He is walking on the beach")

bleu_ja = BLEUCalculator(lang="ja")
score_ja = bleu_ja.bleu("私はビーチで待ってる", "彼がベンチで待ってる")
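As background for the BLEU score, the sketch below shows the core idea of clipped (modified) n-gram precision combined with a brevity penalty. It is a deliberately simplified, unsmoothed, single-reference version on a 0 to 1 scale with whitespace tokenization. It is not what SacréBLEU or BLEUCalculator computes, since those apply their own tokenization and smoothing, so the numbers will generally differ. The names modified_precision and bleu_sketch are hypothetical.

```python
import math
from collections import Counter


def modified_precision(hypothesis_tokens, reference_tokens, n):
    """Fraction of hypothesis n-grams that also occur in the reference (clipped)."""
    hyp = Counter(tuple(hypothesis_tokens[i:i + n])
                  for i in range(len(hypothesis_tokens) - n + 1))
    ref = Counter(tuple(reference_tokens[i:i + n])
                  for i in range(len(reference_tokens) - n + 1))
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    return sum((hyp & ref).values()) / total


def bleu_sketch(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU in [0, 1]; an illustration only."""
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    precisions = [modified_precision(hyp_tokens, ref_tokens, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, one n-gram order with no match zeroes the score.
        return 0.0
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp_tokens) >= len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))
    # Geometric mean of the n-gram precisions.
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)
```

With the English sentence pair used above, the unigram and bigram precisions are 0.5 and 0.4, while no 4-gram matches. The unsmoothed sketch therefore returns 0.0 for max_n=4, which is one reason practical implementations add smoothing for sentence-level scores.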

From the command line

sumeval r-nlb "I'm living New York its my home town so awesome" "My home town is awesome"

Output:

{
  "options": {
    "stopwords": true,
    "stemming": false,
    "word_limit": -1,
    "length_limit": -1,
    "alpha": 0.5,
    "input-summary": "I'm living New York its my home town so awesome",
    "input-references": [
      "My home town is awesome"
    ]
  },
  "averages": {
    "ROUGE-1": 0.7499999999999999,
    "ROUGE-2": 0.6666666666666666,
    "ROUGE-L": 0.7499999999999999,
    "ROUGE-BE": 0
  },
  "scores": [
    {
      "ROUGE-1": 0.7499999999999999,
      "ROUGE-2": 0.6666666666666666,
      "ROUGE-L": 0.7499999999999999,
      "ROUGE-BE": 0
    }
  ]
}

You can also use file input. For details, run sumeval -h.
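Because the command prints a JSON report, the scores are easy to read from a script. The following is a small sketch that parses a report shaped like the output shown above. It assumes the command writes only that JSON document to standard output; the helper name read_report and the abbreviated SAMPLE_OUTPUT string are illustrative, not part of sumeval.

```python
import json

# Abbreviated copy of the report shown above, used here as sample input.
SAMPLE_OUTPUT = """
{
  "options": {"stopwords": true, "stemming": false, "alpha": 0.5},
  "averages": {"ROUGE-1": 0.7499999999999999, "ROUGE-2": 0.6666666666666666,
               "ROUGE-L": 0.7499999999999999, "ROUGE-BE": 0},
  "scores": [
    {"ROUGE-1": 0.7499999999999999, "ROUGE-2": 0.6666666666666666,
     "ROUGE-L": 0.7499999999999999, "ROUGE-BE": 0}
  ]
}
"""


def read_report(cli_output):
    """Parse the JSON report and return (averages, per-item scores)."""
    report = json.loads(cli_output)
    return report["averages"], report["scores"]


averages, scores = read_report(SAMPLE_OUTPUT)
for metric, value in averages.items():
    print("{}: {:.3f}".format(metric, value))
```

In a real pipeline the text passed to read_report would be the captured standard output of the sumeval command rather than a hard-coded string.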

Install

pip install sumeval

Dependencies

BLEU depends on SacréBLEU.

To calculate ROUGE-BE, spaCy is required.

To use lang ja, janome or MeCab is required.

For ROUGE-BE scores in Japanese, GiNZA is additionally required.

To use lang zh, jieba is required.

For ROUGE-BE scores in Chinese, pyhanlp is additionally required.

Test

sumeval uses two packages to verify its scores.

pythonrouge

It calls the original Perl script.

pip install git+https://github.com/tagucci/pythonrouge.git

rougescore

It is a simple Python implementation of the ROUGE score.

pip install git+git://github.com/bdusell/rougescore.git

Contributions Welcome

🎉

Add supported language

The tokenization and dependency parsing for each language are located in sumeval/metrics/lang.

You can add a language class by inheriting from BaseLang.
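The subclassing pattern might look roughly like the sketch below. To keep it runnable without sumeval installed, it uses BaseLangStandIn, a hypothetical stand-in written for this example. The method names tokenize and is_stop_word, the constructor argument, and the LangFR class are all assumptions, so check the existing classes in sumeval/metrics/lang for the real interface before writing your own.

```python
import re


class BaseLangStandIn:
    """Hypothetical stand-in for sumeval's BaseLang; the real interface may differ."""

    def __init__(self, code):
        self.lang = code
        self._stopwords = set()

    def tokenize(self, text):
        # Each language is expected to supply its own tokenizer.
        raise NotImplementedError

    def is_stop_word(self, word):
        return word in self._stopwords


class LangFR(BaseLangStandIn):
    """Illustrative French support with a naive regex tokenizer (assumption)."""

    def __init__(self):
        super().__init__("fr")
        # Tiny illustrative stopword list, not a complete one.
        self._stopwords = {"le", "la", "les", "de", "et"}

    def tokenize(self, text):
        # Lowercase and split on runs of word characters.
        return re.findall(r"\w+", text.lower())


lang = LangFR()
tokens = lang.tokenize("Le chat et la souris")
content_words = [t for t in tokens if not lang.is_stop_word(t)]
print(content_words)
```

A real language class would inherit from sumeval's own BaseLang and would also need to cover dependency parsing if ROUGE-BE is to be supported for that language.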
