
Question Answering Text Summarization Datasets


Most current research on automatic text summarization uses news-article data: DUC, Gigaword, New York Times, CNN/Daily Mail, and so on; those are easy to look up if you are interested. This post instead covers some datasets for non-factoid QA and more abstractive summarization.

WikiHow:

  • Introduction:
    A dataset scraped from the wikihow website, introduced by Koupaee et al. in 2018. For details, see the original paper and the dataset's GitHub repo.

  • Dataset statistics:

    Dataset statistic         Value
    Dataset Size              230,843
    Average Article Length    579.8 (words)
    Average Summary Length    62.1 (words)
    Vocabulary Size           556,461
  • Suitable tasks:
    text summarization for non-factoid question answering.
  • Challenge:
    Compared with CNN/Daily Mail and similar news datasets, WikiHow is more abstractive. In news data the core content tends to sit in the first three sentences, so a lead-3 baseline already performs quite well; that shortcut fails on WikiHow, which keeps it a fairly challenging dataset (a minimal lead-3 sketch follows this list).
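To make the lead-3 point concrete, here is a minimal sketch of such a baseline scored with ROUGE. The naive period-based sentence splitting and the use of the `rouge_score` package are assumptions made for illustration, not part of the WikiHow release.

```python
# Minimal lead-3 baseline sketch, assuming plain-text articles and the
# `rouge_score` package (pip install rouge-score). The sentence splitting
# here is deliberately naive; a real pipeline would use nltk or spacy.
from rouge_score import rouge_scorer

def lead_3(article: str) -> str:
    """Take the first three sentences of an article as the 'summary'."""
    sentences = article.split(". ")
    return ". ".join(sentences[:3])

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def score_lead_3(article: str, reference_summary: str):
    """ROUGE scores of the lead-3 prediction against the reference summary."""
    return scorer.score(reference_summary, lead_3(article))
```

On news data this baseline is hard to beat; on WikiHow, where the steps of a how-to are spread across the whole article, its scores fall off, which is exactly the challenge described above.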

WikiHowQA:

  • Introduction:
    A dataset introduced by Deng et al. in 2020 that improves on WikiHow and adds further data scraped from the wikihow website. For details, see the original paper and the dataset's GitHub repo.

  • Dataset statistics:

    Dataset statistic                     train / dev / test
    #Questions                            76,687 / 8,000 / 22,354
    #QA Pairs                             904,460 / 72,474 / 211,255
    #Summaries                            142,063 / 18,909 / 42,624
    Avg question length (words)           7.20 / 6.84 / 6.69
    Avg answer length (words)             520.87 / 548.26 / 554.66
    Avg summary length (words)            67.38 / 61.84 / 74.42
    Avg candidate answers per question    11.79 / 9.06 / 9.45
  • Suitable tasks:
    answer selection for non-factoid question answering;
    text summarization for non-factoid question answering.
  • Challenge:
    The dataset is large enough to meet the training requirements of deep learning models, and it remains reasonably challenging (a simple answer-ranking sketch follows this list).
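Since WikiHowQA pairs each question with multiple candidate answers, the answer-selection task can be pictured as a ranking problem. Below is a hedged sketch using plain TF-IDF cosine similarity; this is an illustrative baseline only, not the neural model from the WikiHowQA paper.

```python
# Illustrative answer-selection baseline: rank a question's candidate
# answers by TF-IDF cosine similarity. Assumes scikit-learn is installed;
# this is not the ranking model used in the original paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(question: str, candidates: list) -> list:
    """Return (candidate, score) pairs sorted best-first."""
    matrix = TfidfVectorizer().fit_transform([question] + candidates)
    scores = cosine_similarity(matrix[0], matrix[1:]).ravel()
    return sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)

# Usage: rank_candidates("how do I sharpen a knife?",
#     ["Use a whetstone at a 20-degree angle ...", "Buy a new knife ..."])
# returns the lexically closest candidate first.
```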

PubMedQA:

  • Introduction:
    A dataset introduced by Jin et al. in 2019, collected from the PubMed website. For details, see the original paper and the dataset's GitHub repo. Each instance consists of a question, a context used to answer it, a summary of that context, and a yes/no/maybe label indicating whether the context answers the question.
  • Dataset statistics:

    Dataset statistic         PQA-L                  PQA-A
    Number of QA pairs        1.0K                   211.3K
    yes / no / maybe (%)      55.2 / 33.8 / 11.0     92.8 / 7.2 / 0.0
    Avg QLen / ALen / SLen    14.4 / 238.9 / 43.2    16.3 / 238.0 / 41.0

    (QLen / ALen / SLen = average question / answer / summary length in words.)
  • Suitable tasks:
    text summarization for non-factoid question answering;
    answer selection for non-factoid question answering (3-way yes/no/maybe classification; a record-reading sketch follows this list).
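The per-instance structure described above maps naturally onto a 3-way classification setup. The sketch below shows how one record might be read from the official JSON release; the file name and field names (QUESTION / CONTEXTS / LONG_ANSWER / final_decision) follow my reading of the PubMedQA repo and should be verified against it before use.

```python
# Sketch of reading one PubMedQA record. File and field names are
# assumptions based on the official repo's JSON release; verify them.
import json

with open("ori_pqal.json") as f:         # PQA-L split; path illustrative
    data = json.load(f)                   # mapping: PMID -> record

pmid, record = next(iter(data.items()))
question = record["QUESTION"]             # the non-factoid question
context = " ".join(record["CONTEXTS"])    # abstract sections used as context
summary = record["LONG_ANSWER"]           # conclusion, i.e. reference summary
label = record["final_decision"]          # "yes" / "no" / "maybe" target
```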

MEDIQA-AnS:

  • Introduction:
    A small but carefully curated medical dataset introduced by Savery et al. in 2020, built to make health information easier to access for people without medical expertise. For details, see the original paper and the dataset's GitHub repo. The dataset is small, containing 156 questions, answers to those questions, and summaries of those answers.
  • Other medical datasets of the same type include:
    BioASQ
    MedInfo
    Look them up if you are interested; more entries to be added later...