A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task

Keywords

Examination, Analysis

Source

arXiv 2016.06.09

Problem

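The CNN/Daily Mail corpus does not allow the use of external knowledge and may contain coreference resolution errors. With this in mind, the paper investigates the following questions:

  1. How much noise do processing errors introduce into the corpus?
  2. What do the neural models actually learn? Compared with conventional classifiers, where do they improve?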

Approach

The neural network in this paper is based on the Attentive Reader, with several modifications (see the figure below).

[Figure: Model]

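Compared with the Attentive Reader, the main changes are:

  • The attention between the document and the query is computed with a bilinear term instead of a tanh layer.
  • Once the output (context) embedding is obtained, it is used directly for prediction, rather than first being combined with the query through another non-linear transformation.
  • The original model predicts over every word in the vocabulary; this model considers only the entities that appear in the passage.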
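The three changes can be illustrated with a minimal plain-Python sketch. It assumes the contextual embeddings p_i of the passage and the question vector q have already been computed (in the paper they come from bidirectional RNNs), and follows the idea alpha_i = softmax_i(q^T W_s p_i), o = sum_i alpha_i p_i, with the answer chosen by scoring o against an output vector for each entity in the passage. The function names and the weights `W_s` and `W_a` are hypothetical and are not taken from the released code.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [dot(row, v) for row in M]

def softmax(xs):
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def bilinear_attention(q, P, W_s):
    """Bilinear attention: alpha_i = softmax_i(q^T W_s p_i), o = sum_i alpha_i p_i.

    q: question vector (length h); P: list of contextual embeddings p_i (each length h);
    W_s: hypothetical h x h bilinear weight matrix.
    """
    scores = [dot(q, matvec(W_s, p)) for p in P]
    alpha = softmax(scores)
    o = [sum(a * p[k] for a, p in zip(alpha, P)) for k in range(len(P[0]))]
    return alpha, o

def predict_entity(o, W_a, passage_entities):
    """Score o directly against each entity's output vector (no extra non-linear
    layer) and pick the best among the entities that occur in the passage only."""
    return max(passage_entities, key=lambda e: dot(W_a[e], o))
```

Restricting the `max` to `passage_entities` is what replaces the full-vocabulary softmax of the original model: an entity that does not occur in the passage can never be predicted, however high its score.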

Resources

Paper: https://arxiv.org/abs/1606.02858
Code: https://github.com/danqi/rc-cnn-dailymail

Related Work

Entity-centric classifier: a conventional classifier built with LambdaMART on eight hand-picked features, used to find out which features are most useful.
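The entity-centric idea can be illustrated with a few features of the kind discussed here (entity frequency and n-gram match), plus the entity's first position and whether it already appears in the question. This is a minimal sketch under simplifying assumptions: the feature definitions are simplified, and a hand-weighted linear score stands in for LambdaMART, which learns its ranking from data and is not reproduced here. The function names and weights are hypothetical.

```python
def entity_features(entity, passage, question, placeholder="@placeholder"):
    """Simplified hand-crafted features for one candidate entity.

    passage and question are token lists; the question contains `placeholder`.
    """
    positions = [i for i, tok in enumerate(passage) if tok == entity]
    # Put the candidate into the blank, then keep question bigrams that touch it.
    filled = [entity if tok == placeholder else tok for tok in question]
    q_bigrams = {(a, b) for a, b in zip(filled, filled[1:]) if entity in (a, b)}
    p_bigrams = set(zip(passage, passage[1:]))
    return {
        "freq": len(positions),                                    # occurrences in passage
        "first_pos": positions[0] if positions else len(passage),  # smaller = earlier
        "in_question": int(entity in question),                    # already named in question?
        "ngram_match": len(q_bigrams & p_bigrams),                 # bigrams shared with passage
    }

def rank_entities(passage, question, entities, weights):
    """Sort candidates by a linear score over their features, best first.

    `weights` is a hypothetical hand-set dict standing in for a learned ranker.
    """
    def score(e):
        f = entity_features(e, passage, question)
        return sum(weights[name] * value for name, value in f.items())
    return sorted(entities, key=score, reverse=True)
```

For example, with the passage `@entity0 visited @entity1 on monday . @entity1 praised @entity0 .` and the question `@placeholder praised @entity0`, a negative weight on `in_question` pushes `@entity0` down because it is already named in the question, and `@entity1` ranks first.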

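Window-based MemN2Ns: experiments in the paper show that a 5-word window works best. Each window is encoded as the contextual embedding $\sum_{i=1}^{5} E_i(x_i)$, where $x_i$ is the $i$-th word of the window and $E_i(\cdot)$ is the embedding used for position $i$. The window around the placeholder in the query is encoded the same way, and all other query words are ignored. The relevance between the query and each contextual embedding is then computed with a dot product.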
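The window encoding and dot-product scoring can be sketched as follows. Several details are assumptions of this sketch rather than statements about the paper: each memory window is centred on one entity occurrence in the passage; the embedding tables `E` (one dict per window position) are supplied by the caller; and the dot-product scores are turned into a softmax over windows whose probabilities are summed per entity. Training (including any self-supervision) is omitted, and the function names are hypothetical.

```python
import math

def encode_window(tokens, center, E, dim):
    """Sum of position-specific embeddings E[i][word] over a window centred at `center`.

    len(E) is the window size (5 here). Positions falling outside the sequence,
    or words missing from a table, contribute nothing.
    """
    half = len(E) // 2
    vec = [0.0] * dim
    for i, table in enumerate(E):
        j = center - half + i
        if 0 <= j < len(tokens) and tokens[j] in table:
            vec = [a + b for a, b in zip(vec, table[tokens[j]])]
    return vec

def window_memnn(passage, question, entities, E, dim, placeholder="@placeholder"):
    """Score entity-centred passage windows against the placeholder window of the question.

    Returns a dict mapping each entity to the summed softmax weight of its windows.
    """
    q = encode_window(question, question.index(placeholder), E, dim)
    memories = [(tok, encode_window(passage, i, E, dim))
                for i, tok in enumerate(passage) if tok in entities]
    scores = [sum(a * b for a, b in zip(q, m)) for _, m in memories]  # dot-product relevance
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = {}
    for (tok, _), w in zip(memories, weights):
        probs[tok] = probs.get(tok, 0.0) + w / total
    return probs
```

With one-hot position-specific embeddings, the dot product simply counts how many words match at the same relative position, which makes the behaviour easy to inspect: the entity whose surrounding 5-word window best matches the words around `@placeholder` receives the most probability mass.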

Related Tasks

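  • MCTest consists of short fictional stories with multiple-choice questions, written at a level a 7-year-old child can understand. The dataset demands strong reasoning ability, but it is small.
  • The Children's Book Test contains four question types: named entities, common nouns, prepositions, and verbs. Local context alone gives good results on the last two, but the first two require reading the wider passage to make a prediction.
  • bAbI tests 20 different reasoning skills, but its vocabulary is tiny (100-200 words) and the language varies little, so it is far from real data.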

Commentary

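Experiments show that, for the feature-based classifier, the two most important features are n-gram match and entity frequency. Detailed results are in the table below.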

[Table: Feature-based classifier]
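Feature importance of this kind is commonly measured by ablation: remove one feature at a time and record how much accuracy drops. The sketch below assumes precomputed feature values for every candidate and a fixed linear scorer, and approximates "removing" a feature by zeroing its weight; a proper ablation would retrain the ranker (LambdaMART in this paper) without that feature. The data format, function names, and weights are hypothetical.

```python
def accuracy(examples, weights):
    """Fraction of examples where the top-scoring candidate equals the gold answer.

    Each example is (candidates, gold), where candidates maps an entity to a
    dict of feature values.
    """
    correct = 0
    for candidates, gold in examples:
        best = max(candidates,
                   key=lambda e: sum(weights[k] * v for k, v in candidates[e].items()))
        correct += best == gold
    return correct / len(examples)

def ablate(examples, weights):
    """Accuracy drop when each feature's weight is zeroed, one feature at a time."""
    full = accuracy(examples, weights)
    return {f: full - accuracy(examples, {**weights, f: 0.0}) for f in weights}
```

A larger drop means the scorer relies more on that feature; a drop of zero means the remaining features can make up for its absence on this data.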

Next, 100 examples were randomly sampled from the CNN development set and analyzed by hand, with the following results:

[Figure: Manual analysis results]

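Categories 5 and 6 (coreference errors, and ambiguous or very hard questions) cannot be handled and are effectively noise, so only about 75% of the sampled examples are answerable. Each category was then analyzed in detail, with the following results: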

[Figure: Error analysis]

This shows that, compared with the conventional classifier, the neural network improves considerably on the paraphrasing and partial-clue categories.
