EMNLP 2023 | Two Tencent AI Lab Studies Win Outstanding Paper Awards

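At EMNLP 2023, Tencent AI Lab presented two notable research results. The first builds IfQA, a counterfactual reasoning dataset that exposes the limits of large models on complex cognitive tasks. The second proposes InterSent, an interpretable sentence representation learning framework that makes semantic representation learning more transparent. Both won Outstanding Paper Awards.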


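Thank you for reading the 177th article from the Tencent AI Lab WeChat account. This article presents two Tencent AI Lab research results in natural language processing that won Outstanding Paper Awards at EMNLP 2023.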

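EMNLP, a top international conference in natural language processing and a front line of large language model research, was exceptionally lively this year, with a steady stream of cutting-edge work on large language models. Two Tencent AI Lab studies stood out from the field and received the conference's Outstanding Paper Award.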

Study 1: A dedicated dataset for teaching large models counterfactual reasoning

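The first paper provides a dataset for counterfactual reasoning in large models. Tencent AI Lab, together with researchers from the computer science department at the University of Notre Dame and the Allen Institute for AI (AI2), built a dataset of more than 3,800 questions that reveals the limitations of existing large models on this kind of complex cognitive task.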

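Counterfactual reasoning is one expression of intelligence, and it can also be seen as a thought process: imagining a scenario contrary to what actually happened in order to ask "What would the outcome be if things had gone differently?" Counterfactual reasoning is a core part of human cognition. It is essential for learning and adaptation because it lets people learn from past experience, predict future outcomes, and make decisions based on those predictions. For example, a person might think, "If I had left home earlier, I would not have been late." This kind of reasoning helps individuals assess the consequences of different choices and apply that information to future decisions.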

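For AI models, counterfactual reasoning means handling complex problem-solving and decision-making tasks more effectively. It lets an AI system predict the potential outcomes of different courses of action and learn from them, so that it makes more sensible decisions in similar situations. Counterfactual reasoning also helps AI understand and simulate human thought processes, which matters especially for natural language processing, automated decision support, and interactive learning.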

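Counterfactual reasoning has broad potential across many fields. In medicine, AI can use counterfactual analysis to help doctors understand the different outcomes that different treatment plans might produce. In finance, it can be used to assess how different investment strategies would have performed under past conditions, and so guide future investment decisions. In autonomous vehicle development, it can help a system assess the potential safety risks of different driving decisions and thereby improve driving safety. In education, it can help personalized learning systems better understand how students learn and what they need, and provide more tailored teaching content.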

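Because no large-scale counterfactual open-domain question answering benchmark has been available, it has been hard for the research community to evaluate and improve models on this ability. To fill this gap, the research team created a dataset named IfQA, in which every question is built on a counterfactual premise expressed through an "if" clause. Such questions require a model to go beyond retrieving direct factual knowledge from the web: it must identify the right information to retrieve and then reason about an imagined situation that may contradict the facts stored in its parameters. The questions in IfQA were annotated by crowdworkers on relevant Wikipedia pages.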

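The dataset is designed specifically to test how well models understand and reason under counterfactual conditions. Across a series of experiments, even today's advanced large models answered only about 25-30% of the counterfactual reasoning questions correctly. This finding highlights the difficulty current AI systems have in understanding and processing information that is non-standard or not directly factual. The study gives the AI field an important evaluation tool, and it also points to a direction that future work needs to prioritize: improving model performance on higher-order cognitive tasks, counterfactual reasoning in particular.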
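An accuracy figure of this kind is often computed with exact-match scoring, a common metric in open-domain question answering. The sketch below is a minimal illustration under that assumption, not the IfQA authors' evaluation script. The function names and the two sample questions are made up for this example, and the paper may use other or additional metrics.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def exact_match_accuracy(predictions: list[str], references: list[list[str]]) -> float:
    """Fraction of questions whose prediction exactly matches a gold answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, golds) for p, golds in zip(predictions, references))
    return hits / len(predictions)


# Hypothetical counterfactual questions, invented for illustration (not from IfQA).
questions = [
    "If the first crewed Moon landing had happened ten years later, in what year would it have occurred?",
    "If a week had eight days, how many days would two weeks have?",
]
references = [["1979"], ["16", "sixteen"]]

# The first prediction falls back on the memorized real-world fact (1969),
# which is exactly the failure mode a counterfactual premise is meant to expose.
predictions = ["1969", "16."]

print(exact_match_accuracy(predictions, references))  # prints 0.5
```

Normalizing before comparison keeps surface differences such as a trailing period from being counted as errors, so the score reflects whether the model reasoned from the "if" premise rather than how it formatted the answer.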

Paper: https://aclanthology.org/2023.emnlp-main.515.pdf

Study 2: InterSent, a representation learning framework that makes semantic representation learning more transparent and interpretable

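In the other study, researchers from Tencent AI Lab and the Department of Computer Science at the University of California, Davis proposed InterSent, a new framework for interpretable sentence representation learning. It maps between the continuous sentence representation space and the discrete text space, so that interpretable sentence operations can be carried out in the continuous space. Experiments show that InterSent significantly improves performance on several sentence generation tasks, including fusion, difference, and compression, which confirms that the learned representation space reliably supports sentence operations. The idea can be extended to the representation, manipulation, and semantic understanding of longer texts, and it is a technical exploration toward large models that understand human language more deeply.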

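Specifically, the framework consists of an encoder-decoder that works in text space and operator networks that work in the continuous space, and it aligns the two through joint optimization. InterSent combines two objectives, contrastive learning and generative modeling, and trains the operator networks together with the encoder-decoder model end to end. The operator networks learn to carry out sentence operations such as sentence fusion, sentence difference, and sentence compression. The encoder-decoder model acts as an information bottleneck, forcing the model to learn sentence representations that are informative enough to support reconstruction. Joint training makes operations in the continuous space correspond accurately to operations in text space, giving a two-way mapping between sentence representations in the continuous space (embedding space) and the discrete space (natural language). Compared with existing methods, the framework significantly improves performance across sentence generation tasks, which supports both the interpretability and the operability of the learned sentence representations.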
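How a contrastive objective and a generative objective can be combined into one training loss can be sketched in a few lines. This is a toy illustration under stated assumptions, not InterSent's implementation: the InfoNCE-style form of the contrastive term, the temperature value, the `gen_weight` parameter, and all function names are assumptions made for this example, and the paper's exact formulation may differ. Here `query` stands in for the output of an operator network, `candidates` for encoder embeddings of possible target sentences, and `target_token_probs` for the probabilities a decoder assigns to the target tokens.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(query, candidates, positive_index, temperature=0.05):
    """InfoNCE-style loss: pull the query toward its positive, away from the rest."""
    logits = [cosine(query, c) / temperature for c in candidates]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_denominator = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_denominator - logits[positive_index]


def generative_loss(target_token_probs):
    """Mean negative log-likelihood of the target tokens under the decoder."""
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


def joint_loss(query, candidates, positive_index, target_token_probs, gen_weight=1.0):
    """Weighted sum of the contrastive and generative terms."""
    return (contrastive_loss(query, candidates, positive_index)
            + gen_weight * generative_loss(target_token_probs))


# Toy values: a hypothetical "fused" vector close to the first candidate.
fused = [0.9, 0.1]
candidates = [[1.0, 0.0], [0.0, 1.0]]
token_probs = [0.9, 0.8, 0.95]

print(joint_loss(fused, candidates, positive_index=0, target_token_probs=token_probs))
```

The contrastive term rewards an operator output that lands near the embedding of the correct target sentence, while the generative term rewards a decoder that can turn that vector back into the target text. Optimizing both at once is what ties the continuous-space operation to its text-space counterpart.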

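Today's large models understand semantics mainly at the level of semantic association, and it is hard to explain the operations behind complex semantic transformations. Yet compositional operations over meanings in natural language, such as intersection, union, and relative complement, are essential. The framework establishes an interpretable mapping between operations in the continuous space and compositions in text space, which can deepen large models' algorithmic understanding of semantic composition.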

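In addition, the idea can advance the interpretability and transparency of semantic representations. Interpretability requires understanding what each semantic representation means and explaining the semantic transformation that an operation corresponds to. When operations in the continuous space accurately fit operations in text space, it becomes easier to understand the semantic content of a representation. This is critical for improving the safety and robustness of large models.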

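Operable representations can also strengthen semantic composition and multi-step reasoning in large models. Because continuous operations correspond to compositions of textual meaning, they can help with complex semantic understanding and generation tasks such as multi-turn dialogue and chain-style reasoning. Future work could extend the framework to representing and operating on longer, chained texts.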

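Finally, the idea can be extended to knowledge graph construction and fusion. Compositional operability may deepen the semantic understanding of the edges that relate entities, help extract different types of relations, and support building interpretable knowledge graphs (KGs).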

Paper: https://aclanthology.org/2023.emnlp-main.900.pdf
