Medical text datasets; medical QA datasets; assorted open-source medical datasets

Latest recommended article published 2025-01-18 09:00:00

医学小达人

Categories: NLP, LLMs. Tags: medical datasets, datasets, QA datasets, medical text

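Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.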

Article link: https://blog.csdn.net/L_goodboy/article/details/144404613

Existing Medical QA & VQA Datasets

Multimodal Question Answering (QA) in the Medical Domain: A Summary of Existing Datasets and Systems

I prepared this summary for my CMU/LTI talk on multimodal QA. My slides are available at https://www.slideshare.net/benabacha/multimodal-question-answering-in-the-medical-domain-cmulti-2020
This list is not exhaustive. You can email me links and references to relevant medical QA datasets and systems, and I'll update the list as soon as possible. Also, several challenge-related datasets are no longer publicly available. You can contact the organizers to obtain the data.

*** Two Main Tasks: Medical Question Answering (QA) & Visual Question Answering (VQA) ***

I) Medical QA Datasets:

Corpus for Evidence Based Medicine Summarization (Mollá, 2010): https://sourceforge.net/projects/ebmsumcorpus
CLEF QA4MRE Alzheimer’s task (Peñas et al., 2012).
BioASQ datasets (2012-2020): see the challenge overview at bioasq.org
TREC LiveQA-Med (Ben Abacha et al, 2017): https://github.com/abachaa/LiveQA_MedicalTask_TREC2017
MEDIQA-2019 datasets on NLI, RQE, and QA (Ben Abacha et al., 2019): https://github.com/abachaa/MEDIQA2019
MEDIQA-AnS dataset of question-driven summaries of answers (Savery et al., 2020): https://osf.io/fyg46/ Paper: "Question-driven summarization of answers to consumer health questions" (Scientific Data)
MedQuAD Collection of 47k QA pairs (Ben Abacha and Demner-Fushman, 2019): https://github.com/abachaa/MedQuAD (47,457 QA pairs created from 12 NIH websites)
Medication QA Collection (Ben Abacha et al., 2019): https://github.com/abachaa/Medication_QA_MedInfo2019
Consumer Health Question Summarization (Ben Abacha and Demner-Fushman, 2019): https://github.com/abachaa/MeQSum
emrQA: QA on Electronic Medical Records (Pampari et al., 2018). Scripts to generate emrQA from i2b2 data: https://github.com/panushri25/emrQA
EPIC-QA dataset on COVID-19 (Goodwin et al., 2020): Epidemic QA at TAC 2020
BiQA Corpus (Lamurias et al., 2020): https://github.com/lasigeBioTM/BiQA Paper: "Generating Biomedical Question Answering Corpora From Q&A Forums" (IEEE Xplore)
HealthQA Dataset (Zhu et al., 2019): https://github.com/mingzhu0527/HAR Paper: https://dmkd.cs.vt.edu/papers/WWW19.pdf
MASH-QA Dataset on Multiple Answer Spans Healthcare Question Answering, with 35k QA pairs (Zhu et al., 2020): https://github.com/mingzhu0527/MASHQA Paper: https://www.aclweb.org/anthology/2020.findings-emnlp.342.pdf
MedMCQA, a large-scale (194k) multiple-choice QA dataset designed to address real-world medical entrance exam questions (Pal et al., CHIL, PMLR 2022): https://github.com/medmcqa/medmcqa Paper: "MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering"
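
The datasets above ship in different formats, so a common first step is to map records into one small structure before evaluating anything. The following is a minimal sketch for the multiple-choice case only. The JSON Lines layout and the keys `question`, `options`, and `answer_index` are hypothetical and do not describe the actual schema of MedMCQA or any other dataset listed here, so check each repository's documentation for its real field names. The sample records are toy examples written for this illustration, not taken from any dataset.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class MultipleChoiceItem:
    """One multiple-choice question in a dataset-independent form."""
    question: str
    options: List[str]
    answer_index: int  # 0-based index of the correct option


def parse_jsonl(lines: Iterable[str]) -> Iterator[MultipleChoiceItem]:
    """Parse JSON Lines records that use the hypothetical keys
    'question', 'options', and 'answer_index'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        options = list(record["options"])
        answer_index = int(record["answer_index"])
        if not 0 <= answer_index < len(options):
            raise ValueError(f"answer_index out of range: {record!r}")
        yield MultipleChoiceItem(record["question"], options, answer_index)


def accuracy(items: List[MultipleChoiceItem], predictions: List[int]) -> float:
    """Fraction of items whose predicted option index is the correct one."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    correct = sum(
        1 for item, pred in zip(items, predictions) if item.answer_index == pred
    )
    return correct / len(items)


# Toy records invented for this sketch (hypothetical schema, not real data).
SAMPLE = [
    '{"question": "Which vitamin deficiency causes scurvy?", '
    '"options": ["Vitamin A", "Vitamin C", "Vitamin D", "Vitamin K"], '
    '"answer_index": 1}',
    "",
    '{"question": "Which organ produces insulin?", '
    '"options": ["Liver", "Pancreas", "Kidney", "Spleen"], '
    '"answer_index": 1}',
]

if __name__ == "__main__":
    items = list(parse_jsonl(SAMPLE))
    # Score a made-up prediction list: first answer right, second wrong.
    print(accuracy(items, [1, 0]))
```

To use this with a real file, pass an open file handle to `parse_jsonl` and adapt the key names to the dataset's documented schema.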

II) Medical VQA Datasets (Radiology):

VQA-RAD (Lau et al., 2018): https://osf.io/89kps
VQA-Med 2018 (Hasan et al., 2018): ImageCLEF 2018 VQA-Med challenge page on AIcrowd
VQA-Med 2019 (Ben Abacha et al., 2019): https://github.com/abachaa/VQA-Med-2019
VQA-Med 2020 (Ben Abacha et al., 2020): https://github.com/abachaa/VQA-Med-2020

III) Online QA Systems:

-- I searched and tested several systems (e.g. AskHERMES, MiPACQ, SimQ). This list includes only the systems that are still maintained.

CHiQA (Consumer Health Question Answering System): chiqa.nlm.nih.gov
Neural Covidex: covidex.ai

IV) Medical Datasets Relevant to Question Answering:

i2b2 shared tasks (2006-2016): www.i2b2.org/NLP
n2c2 NLP clinical challenges (2018-2019): https://n2c2.dbmi.hms.harvard.edu https://dbmi.hms.harvard.edu/programs/national-nlp-clinical-challenges-n2c2
TREC Medical Records Track (2012-2013).
TREC Clinical Decision Support Track (2014-2016): http://www.trec-cds.org
TREC Precision Medicine Track (2017-2019): http://www.trec-cds.org
CLEF eHealth (2013-2020): https://clefehealth.imag.fr
COVID dataset (CORD-19): https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

V) Medical Datasets Relevant to VQA:

The entries below refer to task pages on the ImageCLEF / LifeCLEF site (Multimedia Retrieval in CLEF), given by page title.
ImageCLEF Medical Automatic Image Annotation (2008-2009): "Image 2008: Medical Automatic Image Annotation Task" and "ImageCLEF 2009 lung nodule detection and medical annotation task"
ImageCLEF Medical User-oriented Image Retrieval Task (2011): "Practical showcase of medical image retrieval systems"
ImageCLEF Medical Retrieval Task (2008-2012): "Medical Image Classification and Retrieval 2012"
ImageCLEF AMIA: Medical task (2013): "AMIA: Medical task"
ImageCLEFmed: Medical classification (2015): "ImageCLEFmed: Medical classification"
ImageCLEF Medical Clustering (2015): "Medical Clustering"
ImageCLEFmed (2016): "ImageCLEFmed: The Medical Task 2016"
ImageCLEFcaption (2017-2020): "ImageCLEFcaption"
ImageCLEFmedical tasks (2019-2020): two pages titled "ImageCLEFmedical"
MIMIC-CXR Database (2019): MIMIC-CXR Database v2.0.0
