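Hi friends~ In my recent research I've increasingly noticed that the datasets used for event extraction are becoming more and more diverse, so this time I've gathered them all in one place: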
Sentence-level EE
- ACE2005
- KBP2017
- MAVEN
  - Time: EMNLP2020
  - Paper: MAVEN: A Massive General Domain Event Detection Dataset
  - Link: https://github.com/THU-KEG/MAVEN-dataset
- FewED
- FMC
- CySecED
  - Time: EMNLP2020
  - Paper: Introducing a New Dataset for Event Detection in Cybersecurity Texts
- CASIE
  - Time: AAAI2020
  - Paper: CASIE: Extracting Cybersecurity Event Information from Text
  - Link: https://github.com/Ebiquity/CASIE
- Dialogue EE
  - Time: Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events, 2020
  - Paper: Automatic extraction of personal events from dialogue
  - Link: https://www.artie.com/data/personaleventsindialogue/
- Commodity News Corpus for Event Extraction
- Few-shot Financial Chinese event extraction dataset
- DuEE
- Genia Event Extraction (GE)
  - Time: 2011
  - Link: http://bionlp-st.dbcls.jp/GE/2011/eval-test/
- TimeBank
- LitBank
  - Time: ACL2019
  - Paper: https://aclanthology.org/P19-1353/
  - Link: https://github.com/dbamman/litbank
Doc-level EE
- MUC4
- DCFEE
- ChFinAnn
- RAMS
  - Time: ACL2020
  - Paper: Multi-Sentence Argument Linking
  - Link: https://nlp.jhu.edu/rams/
- WIKIEVENTS
  - Time: NAACL2021
  - Paper: Document-Level Event Argument Extraction by Conditional Generation
  - Link: https://github.com/raspberryice/gen-arg
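
If you'd like to query this list from a script, here is a minimal sketch of how it could be kept as structured records. The `Dataset` class, the `CATALOG` list, and the `with_links` helper are hypothetical names I made up for illustration, not part of any dataset's tooling, and only a handful of the entries above are included, with their venues and links copied from the list as-is.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    """One entry of the list above (hypothetical schema, for illustration only)."""
    name: str
    level: str                   # "sentence" or "document"
    venue: Optional[str] = None  # the "Time" field, when the list gives one
    link: Optional[str] = None   # the "Link" field, when the list gives one


# A subset of the entries above; fields missing from the list stay None.
CATALOG = [
    Dataset("MAVEN", "sentence", "EMNLP2020", "https://github.com/THU-KEG/MAVEN-dataset"),
    Dataset("CySecED", "sentence", "EMNLP2020"),
    Dataset("CASIE", "sentence", "AAAI2020", "https://github.com/Ebiquity/CASIE"),
    Dataset("LitBank", "sentence", "ACL2019", "https://github.com/dbamman/litbank"),
    Dataset("MUC4", "document"),
    Dataset("RAMS", "document", "ACL2020", "https://nlp.jhu.edu/rams/"),
    Dataset("WIKIEVENTS", "document", "NAACL2021", "https://github.com/raspberryice/gen-arg"),
]


def with_links(catalog: List[Dataset], level: str) -> List[str]:
    """Return the names of datasets at the given level that list a link."""
    return [d.name for d in catalog if d.level == level and d.link]


if __name__ == "__main__":
    for level in ("sentence", "document"):
        print(level, with_links(CATALOG, level))
```

Keeping the missing fields as `None` rather than guessing them makes it easy to see which entries still need a paper title or a download link filled in.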
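The list above is mainly aimed at current event extraction tasks, and it is also kept in sync on GitHub. If you spot anything missing, feel free to get in touch and discuss!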