Answer Sentence Extraction
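1) Extract the answer sentence for a given question, i.e., find the sentence in the relevant document that contains the answer.
For example:
Question: The complexity of problems often depends on what?
Document containing the answer: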
<Context ID=1-31>
This motivates the concept of a problem being hard for a complexity class. A problem X is hard for a class of problems C if every problem in C can be reduced to X. Thus no problem in C is harder than X, since an algorithm for X allows us to solve any problem in C. Of course, the notion of hard problems depends on the type of reduction being used. For complexity classes larger than P, polynomial-time reductions are commonly used. In particular, the set of problems that are hard for NP is the set of NP-hard problems.
</Context>
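To make the task concrete, here is a minimal baseline sketch. It is not necessarily the method this post builds later. It splits the document into sentences, turns the question and each sentence into bag-of-words counts, and picks the sentence with the highest cosine similarity to the question. The naive regex sentence splitter and the function names (`tokenize`, `cosine`, `split_sentences`, `pick_answer_sentence`) are assumptions for illustration. It uses only the standard library, so it runs without any NLTK data downloads.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphanumeric word tokens only (punctuation is dropped)
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters; 0.0 if either is empty
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def split_sentences(doc):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc.strip()) if s.strip()]

def pick_answer_sentence(question, doc):
    # Return the sentence whose word counts are most similar to the question's
    q = Counter(tokenize(question))
    return max(split_sentences(doc), key=lambda s: cosine(q, Counter(tokenize(s))))

question = "The complexity of problems often depends on what?"
doc = (
    "This motivates the concept of a problem being hard for a complexity class. "
    "A problem X is hard for a class of problems C if every problem in C can be reduced to X. "
    "Thus no problem in C is harder than X, since an algorithm for X allows us to solve any problem in C. "
    "Of course, the notion of hard problems depends on the type of reduction being used. "
    "For complexity classes larger than P, polynomial-time reductions are commonly used. "
    "In particular, the set of problems that are hard for NP is the set of NP-hard problems."
)
print(pick_answer_sentence(question, doc))
# → Of course, the notion of hard problems depends on the type of reduction being used.
```

On this example the baseline picks the same sentence as the reference answer. Raw word overlap is still easily swayed by sentences that share many function words ("the", "of") with the question.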
2) Submission format:
<Q>
The complexity of problems often depends on what?
<Context ID=1-31>
<answer_sent=Of course, the notion of hard problems depends on the type of reduction being used. >
</Q>
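A small helper can print a result in the submission layout shown above. `format_submission` is a hypothetical name. The layout is copied from the example, including the space before the closing `>` of the `answer_sent` tag. Whether the grader actually requires that space is an assumption.

```python
def format_submission(question, context_id, answer_sent):
    # Hypothetical helper: mirrors the example layout line by line.
    # The space before the final ">" is copied from the example (assumed significant).
    return "\n".join([
        "<Q>",
        question,
        f"<Context ID={context_id}>",
        f"<answer_sent={answer_sent} >",
        "</Q>",
    ])

print(format_submission(
    "The complexity of problems often depends on what?",
    "1-31",
    "Of course, the notion of hard problems depends on the type of reduction being used.",
))
```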
Data:
Link: https://pan.baidu.com/s/1rnuBut6_3ZRmJOVeeHwXhw
Extraction code: rgkh
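On to the code. The editor is Jupyter Notebook, an open-source web application that runs code cell by cell, so the program in this post defines few functions and is mostly executed as separate cells. (Wrap code in functions whenever you can and make it a habit; don't copy my shortcut.)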
# Import the packages used below
from nltk import *
import collections
import re
import math
from nltk import word_tokenize, pos_tag
from nltk.tokenize import WordPunctTokenizer
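# Normalize text: strip leading/trailing whitespace and convert to lowercase
# (despite its name, this function does not remove stop words)
def stopWord(text):
    new_text = text.strip().lower()
    return new_text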
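`stopWord` only strips and lowercases the text; it does not drop stop words. If you do want stop-word filtering before comparing sentences, one minimal sketch follows. The tiny `STOP_WORDS` set and the `remove_stop_words` helper are illustrative assumptions, not part of the original program. For a fuller English list, NLTK provides `nltk.corpus.stopwords.words('english')` after `nltk.download('stopwords')`.

```python
import re

# Tiny illustrative stop list (assumption, not NLTK's list)
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "is", "what", "to", "for"}

def stopWord(text):
    # Same normalization as the post: strip whitespace and lowercase
    return text.strip().lower()

def remove_stop_words(text):
    # Hypothetical helper: normalize, split into word tokens, drop stop words
    tokens = re.findall(r"[a-z0-9]+", stopWord(text))
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("  The complexity of problems often depends on what?  "))
# → ['complexity', 'problems', 'often', 'depends']
```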
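The following code computes the relevance between two sentences. It is only used in the last few lines of the program, but since it defines just two functions, I have put them together here.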