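QA: Extracting Answers to Questions from Multiple Documents with BM25

Answer sentence extraction

1) Extract the answer sentence for a given question (that is, from the relevant document, extract the sentence that contains the answer)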

For example:

Question: The complexity of problems often depends on what?

Document containing the answer:

<Context ID=1-31>

This motivates the concept of a problem being hard for a complexity class. A problem X is hard for a class of problems C if every problem in C can be reduced to X. Thus no problem in C is harder than X, since an algorithm for X allows us to solve any problem in C. Of course, the notion of hard problems depends on the type of reduction being used. For complexity classes larger than P, polynomial-time reductions are commonly used. In particular, the set of problems that are hard for NP is the set of NP-hard problems.

</Context>

2) Submission format for the results:

<Q>

       The complexity of problems often depends on what?

       <Context ID=1-31>

       <answer_sent=Of course, the notion of hard problems depends on the type of reduction being used. >

</Q>
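
To make the task concrete, here is a minimal sketch of how BM25 could rank the sentences of one context against a question and return the top-scoring sentence as the answer sentence. It is an illustration under stated assumptions, not the code used in this post: the regex tokenizer, the naive sentence splitter, the helper names (`tokenize`, `split_sentences`, `bm25_scores`, `best_sentence`, `format_submission`) and the parameters `k1=1.5`, `b=0.75` are all hypothetical choices, and the whitespace produced by `format_submission` only approximates the submission format above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(context):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]


def bm25_scores(question, sentences, k1=1.5, b=0.75):
    """Okapi BM25 score of each sentence, treating every sentence as a document."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many sentences does each term occur?
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(question))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # The "+ 1.0" inside the log keeps the IDF non-negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def best_sentence(question, context):
    """Return the sentence of `context` with the highest BM25 score."""
    sentences = split_sentences(context)
    if not sentences:
        return ""
    scores = bm25_scores(question, sentences)
    return sentences[scores.index(max(scores))]


def format_submission(question, context_id, answer):
    """Approximate the <Q> ... </Q> submission block (exact whitespace is assumed)."""
    return (
        f"<Q>\n{question}\n<Context ID={context_id}>\n"
        f"<answer_sent={answer}>\n</Q>"
    )
```

Calling `best_sentence(question, context)` on the example question and context 1-31 and passing the result to `format_submission` yields a block in the shape shown above. Scoring at the sentence level is a design choice of this sketch; the same function can score whole contexts first to pick the relevant document.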

Data:

Link: https://pan.baidu.com/s/1rnuBut6_3ZRmJOVeeHwXhw
Extraction code: rgkh
 

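Straight to the code. The editor used is Jupyter Notebook, an open-source web application in which code is executed cell by cell, so the programs in this post rarely define functions or split the work into modules. (Write functions whenever you can and make it a habit; don't copy my laziness.)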

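# Import the packages used below
from nltk import *
import collections
import re
import math
from nltk import word_tokenize, pos_tag
from nltk.tokenize import WordPunctTokenizer

# Normalize text: strip surrounding whitespace and convert to lowercase
# (despite its name, this function does not remove stop words)
def stopWord(text):
    new_text = text.strip().lower()
    return new_text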

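The next part computes the relevance between two sentences. It is used only in the last few lines of the code, but since it consists of just two functions, they are kept together here.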
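
As an illustration only, and not necessarily the two functions used in this post, one common way to measure how related two sentences are is cosine similarity over term-frequency vectors. The names `term_vector` and `cosine_similarity` and the regex tokenizer below are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(sentence):
    """Bag-of-words term frequencies for one sentence (lowercased, alphanumeric tokens)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_similarity(sent_a, sent_b):
    """Cosine of the angle between the two term-frequency vectors, in [0, 1]."""
    va, vb = term_vector(sent_a), term_vector(sent_b)
    # Only terms shared by both sentences contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as unrelated to everything
    return dot / (norm_a * norm_b)
```

A value near 1 means the two sentences use almost the same words in similar proportions, and 0 means they share no tokens.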
