QANet encodes both the embedding and modeling stages with CNNs and self-attention instead of RNNs, so it is fast and can process the input tokens in parallel.
Key point: the CNN captures local features of the text, while self-attention learns the global interaction between each pair of words.
The key motivation behind the design of our model is the following: convolution captures the local
structure of the text, while the self-attention learns the global interaction between each pair of words.
Network Architecture
1. Input Embedding Layer
Concatenates the word embedding (300d) with the char embedding (200d).
All OOV words are mapped to a randomly initialized word vector that is trained with the model.
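A minimal sketch of this layer, assuming PyTorch (the original gives no code, and module/argument names here are illustrative). Per-word char vectors are reduced by max-pooling over the characters of each word, as in the paper:

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Concatenate a 300d word embedding with a 200d char-derived embedding."""
    def __init__(self, word_vocab, char_vocab, word_dim=300, char_dim=200):
        super().__init__()
        # Pre-trained word vectors would normally be loaded here; OOV words
        # share a randomly initialized, trainable vector.
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        w = self.word_emb(word_ids)                     # (batch, seq_len, 300)
        c = self.char_emb(char_ids).max(dim=2).values   # max over chars -> (batch, seq_len, 200)
        return torch.cat([w, c], dim=-1)                # (batch, seq_len, 500)
```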
2. Embedding Encoder Layer
Consists of a stack of encoder blocks [convolution-layer × # + self-attention-layer + feed-forward-layer]; the input is 500d and the output is 128d (a sketch follows the details below).
- Convolution
Uses the method from Deep learning with depthwise separable convolutions.
kernel-size=7
the number of filters is d = 128
the number of conv layers within a block is 4
- self-attention
Uses the multi-head attention mechanism.
The number of heads is 8 throughout all the layers.
Each of these basic operations (conv/self-attention/ffn) is placed inside a residual block.
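A minimal sketch of one encoder block, assuming PyTorch (`DepthwiseSeparableConv` and `EncoderBlock` are illustrative names; positional encoding and the 500d→128d input projection are omitted):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv over the sequence followed by a pointwise (1x1) conv."""
    def __init__(self, d=128, k=7):
        super().__init__()
        self.depthwise = nn.Conv1d(d, d, k, padding=k // 2, groups=d)
        self.pointwise = nn.Conv1d(d, d, 1)

    def forward(self, x):  # x: (batch, d, seq_len)
        return self.pointwise(self.depthwise(x))

class EncoderBlock(nn.Module):
    """[conv x num_convs + self-attention + feed-forward], each in a residual block."""
    def __init__(self, d=128, k=7, num_convs=4, heads=8):
        super().__init__()
        self.convs = nn.ModuleList(DepthwiseSeparableConv(d, k) for _ in range(num_convs))
        self.conv_norms = nn.ModuleList(nn.LayerNorm(d) for _ in range(num_convs))
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d)
        self.ffn = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
        self.ffn_norm = nn.LayerNorm(d)

    def forward(self, x):  # x: (batch, seq_len, d)
        for norm, conv in zip(self.conv_norms, self.convs):
            y = conv(norm(x).transpose(1, 2)).transpose(1, 2)
            x = x + y                           # residual around each conv
        h = self.attn_norm(x)
        y, _ = self.attn(h, h, h)
        x = x + y                               # residual around self-attention
        return x + self.ffn(self.ffn_norm(x))   # residual around the FFN
```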
3. Context-Query Attention Layer
- Let $C$ and $Q$ be the encoded context and query matrices, respectively. Compute the similarity matrix $S \in \mathbb{R}^{n \times m}$, where $S_{i,j} = f(q, c) = W_0 \cdot [q, c, q \circ c]$.
- Context-to-query attention: $A = \mathrm{softmax}(S, \mathrm{axis{=}row}) \cdot Q^T \in \mathbb{R}^{n \times d}$
- Query-to-context attention: $B = \mathrm{softmax}(S, \mathrm{axis{=}row}) \cdot \mathrm{softmax}(S, \mathrm{axis{=}column})^T \cdot C^T$
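A minimal sketch of this layer, assuming PyTorch and storing $C$ as $(n, d)$ and $Q$ as $(m, d)$ so the transposes above disappear; `context_query_attention` and `W0` are illustrative names:

```python
import torch
import torch.nn.functional as F

def context_query_attention(C, Q, W0):
    """C: (n, d) encoded context; Q: (m, d) encoded query; W0: (3*d,) trilinear weights."""
    n, d = C.shape
    m = Q.shape[0]
    # S[i, j] = W0 . [q_j, c_i, q_j * c_i]  (trilinear similarity)
    c = C.unsqueeze(1).expand(n, m, d)
    q = Q.unsqueeze(0).expand(n, m, d)
    S = torch.cat([q, c, q * c], dim=-1) @ W0             # (n, m)
    A = F.softmax(S, dim=1) @ Q                           # context-to-query: (n, d)
    B = F.softmax(S, dim=1) @ F.softmax(S, dim=0).T @ C   # query-to-context: (n, d)
    return A, B
```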
4. Model Encoder Layer
The input at each position is $[c, a, c \circ a, c \circ b]$, where $a$ and $b$ are rows of the attention matrices $A$ and $B$, respectively.
The three stacked model encoder blocks share parameters.
the number of convolution layers within a block is 2;
the total number of blocks is 7
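A minimal sketch of this layer, reusing `EncoderBlock` from the sketch above (the `proj` linear map and the random inputs are illustrative; the conv kernel size is left at the earlier default):

```python
import torch
import torch.nn as nn

d = 128
proj = nn.Linear(4 * d, d)  # maps the 4d concatenation back to d
# One stack of 7 encoder blocks (2 convs each); the same stack is applied
# three times, so its weights are shared across the three repetitions.
stack = nn.Sequential(*[EncoderBlock(d=d, num_convs=2) for _ in range(7)])

C = torch.randn(1, 30, d)   # encoded context, batch of 1
A = torch.randn(1, 30, d)   # context-to-query attention rows
B = torch.randn(1, 30, d)   # query-to-context attention rows
x = torch.cat([C, A, C * A, C * B], dim=-1)  # [c, a, c∘a, c∘b] per position
M0 = stack(proj(x))
M1 = stack(M0)
M2 = stack(M1)
```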
5. Output Layer
- start point
A fully connected layer followed by softmax:
$p^1 = \mathrm{softmax}(W_{p^1}^T [M_0; M_1])$
- end point
$p^2 = \mathrm{softmax}(W_{p^2}^T [M_0; M_2])$
6. Loss Function
$L(\theta) = -\frac{1}{N}\sum_i^N \left[ \log\left(p^1_{y_i^1}\right) + \log\left(p^2_{y_i^2}\right) \right]$
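A minimal sketch of the output layer and loss, assuming PyTorch (`W1`, `W2`, and `span_loss` are illustrative names); `F.cross_entropy` combines the softmax with the negative log-likelihood and averages over the batch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 128
W1 = nn.Linear(2 * d, 1)  # plays the role of W_{p^1}
W2 = nn.Linear(2 * d, 1)  # plays the role of W_{p^2}

def span_loss(M0, M1, M2, y1, y2):
    """M0, M1, M2: (batch, n, d) model-encoder outputs; y1, y2: gold start/end indices."""
    logits1 = W1(torch.cat([M0, M1], dim=-1)).squeeze(-1)  # (batch, n)
    logits2 = W2(torch.cat([M0, M2], dim=-1)).squeeze(-1)  # (batch, n)
    # -1/N sum_i [ log p^1_{y_i^1} + log p^2_{y_i^2} ]
    return F.cross_entropy(logits1, y1) + F.cross_entropy(logits2, y2)
```

At inference, the span (i, j) with i ≤ j maximizing $p^1_i p^2_j$ is selected as the answer.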