QA: Extracting Answers to Questions from Multiple Documents with BM25

Author: 步月听风

Article link: https://blog.csdn.net/yaogegegege/article/details/97106538

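Answer Sentence Extraction

1) Extract the answer sentence for a given question, that is, find the sentence containing the answer within the relevant document.

For example:

Question: The complexity of problems often depends on what?

Document containing the answer: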

<Context ID=1-31>

This motivates the concept of a problem being hard for a complexity class. A problem X is hard for a class of problems C if every problem in C can be reduced to X. Thus no problem in C is harder than X, since an algorithm for X allows us to solve any problem in C. Of course, the notion of hard problems depends on the type of reduction being used. For complexity classes larger than P, polynomial-time reductions are commonly used. In particular, the set of problems that are hard for NP is the set of NP-hard problems.

</Context>
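To make the task concrete, here is a minimal sketch of ranking the sentences of one context against the question with BM25. It is not the code from this post: the regex-based `tokenize` and `split_sentences` helpers are simplified stand-ins for the nltk tokenizers, and the parameter values `k1=1.5` and `b=0.75` are common defaults chosen here as assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (hypothetical stand-in for nltk)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bm25_scores(query, sentences, k1=1.5, b=0.75):
    """Return one BM25 score per sentence, treating each sentence as a document."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n or 1.0  # guard against all-empty input
    df = Counter()  # number of sentences containing each term
    for d in docs:
        df.update(set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


question = "The complexity of problems often depends on what?"
context = (
    "This motivates the concept of a problem being hard for a complexity class. "
    "A problem X is hard for a class of problems C if every problem in C can be "
    "reduced to X. Thus no problem in C is harder than X, since an algorithm for "
    "X allows us to solve any problem in C. Of course, the notion of hard "
    "problems depends on the type of reduction being used. For complexity "
    "classes larger than P, polynomial-time reductions are commonly used. In "
    "particular, the set of problems that are hard for NP is the set of NP-hard "
    "problems."
)
sentences = split_sentences(context)
scores = bm25_scores(question, sentences)
best_sentence = sentences[scores.index(max(scores))]
print(best_sentence)
```

On this example the top-ranked sentence is the one beginning "Of course, the notion of hard problems depends on", mainly because "depends" and "on" occur in only one sentence and therefore carry a high IDF weight.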

2) Submission format:

<Q>

The complexity of problems often depends on what?

<Context ID=1-31>

<answer_sent=Of course, the notion of hard problems depends on the type of reduction being used. >

</Q>
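The submission block above could be produced by a small helper like the following. This is only a sketch: the function name `format_submission` is hypothetical, and it assumes one tag per line with no blank lines in between, which may need adjusting to the exact layout the grader expects.

```python
def format_submission(question, context_id, answer_sentence):
    """Build one <Q> ... </Q> block in the submission layout shown above."""
    return "\n".join([
        "<Q>",
        question,
        f"<Context ID={context_id}>",
        f"<answer_sent={answer_sentence}>",
        "</Q>",
    ])


block = format_submission(
    "The complexity of problems often depends on what?",
    "1-31",
    "Of course, the notion of hard problems depends on the type of reduction being used.",
)
print(block)
```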

Data:

Link: https://pan.baidu.com/s/1rnuBut6_3ZRmJOVeeHwXhw
Extraction code: rgkh

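Straight to the code. The editor used is Jupyter Notebook, an open-source web application in which code is executed cell by cell, so the code in this post rarely defines functions or splits the logic into modules. (Write functions whenever you can and make it a habit; don't copy my laziness.)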

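# Import the packages used below
from nltk import *
import collections
import re
import math
from nltk import word_tokenize, pos_tag
from nltk.tokenize import WordPunctTokenizer

# Normalize text: strip surrounding whitespace and convert to lowercase.
# Note: despite its name, this function does not remove stop words.
def stopWord(text):
    new_text = text.strip().lower()
    return new_text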

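The following computes the similarity between two sentences. It is only used in the last few lines of the code, but since it consists of just two functions, they are written together here.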

Applications of TF-IDF and Cosine Similarity (Part 2 ...
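The two functions themselves are not shown here, so the following is only a sketch of how sentence similarity might be computed with cosine similarity. It assumes plain term-frequency vectors without IDF weighting, and the names `term_vector` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(sentence):
    """Map each lowercase alphanumeric token to its count in the sentence."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_similarity(sent_a, sent_b):
    """Cosine of the angle between the term-frequency vectors of two sentences."""
    vec_a, vec_b = term_vector(sent_a), term_vector(sent_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


similarity = cosine_similarity(
    "The complexity of problems often depends on what?",
    "Of course, the notion of hard problems depends on the type of reduction being used.",
)
print(round(similarity, 3))
```

The result lies between 0 (no shared terms) and 1 (identical term distributions), so it can be used to compare a candidate answer sentence against the question.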
