TrecQA
------

The TrecQA dataset is commonly used to evaluate answer selection in question answering (QA). It was introduced and organized in the following papers:

+ Wang et al. [What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA.](http://www.aclweb.org/anthology/D07-1003) *EMNLP-CoNLL 2007*.
+ Heilman and Smith. [Tree Edit Models for Recognizing Textual Entailments, Paraphrases, and Answers to Questions.](http://www.aclweb.org/anthology/N10-1145) *NAACL 2010*.
+ Yao et al. [Answer Extraction as Sequence Tagging with Tree Edit Distance.](http://www.aclweb.org/anthology/N13-1106) *NAACL-HLT 2013*.

In particular, we use the version of the dataset prepared by Yao et al., which can be downloaded from http://cs.jhu.edu/~xuchen/packages/jacana-qa-naacl2013-data-results.tar.bz2

The MD5 checksum of `jacana-qa-naacl2013-data-results.tar.bz2` is `11f0275e95691594cd74825e0c341b7a`.

This document is a translation of the original README.

The `data` directory contains four XML-like files:

+ `TRAIN.xml`
+ `TRAIN-ALL.xml`
+ `DEV.xml`
+ `TEST.xml`

These four files correspond to the following source files in the original dataset:

```
train-less-than-40.manual-edit.xml: TRAIN in paper
train2393.cleanup.xml.gz: TRAIN-ALL in paper
dev-less-than-40.manual-edit.xml: DEV in paper
test-less-than-40.manual-edit.xml: TEST in paper
```
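
To check that a downloaded copy of `jacana-qa-naacl2013-data-results.tar.bz2` matches the MD5 checksum given above, a minimal sketch along these lines may help. The function names `md5_of_file` and `verify_archive` are hypothetical helpers written for illustration and are not part of the dataset or any accompanying tooling; the sketch assumes the archive has already been saved locally.

```python
import hashlib

# MD5 checksum stated in this README for the Yao et al. archive.
EXPECTED_MD5 = "11f0275e95691594cd74825e0c341b7a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_archive("jacana-qa-naacl2013-data-results.tar.bz2")` should return `True` for an intact download; a `False` result suggests the file is truncated or otherwise differs from the archive described here.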