QANet Paper Notes

QANet encodes both the embedding and the modeling stages with CNNs and self-attention instead of RNNs, so it is fast and can process the input tokens in parallel.
Key point: the CNN captures local features of the text, while self-attention learns the global interaction between each pair of words.

The key motivation behind the design of our model is the following: convolution captures the local
structure of the text, while the self-attention learns the global interaction between each pair of words.

Network Architecture

(Figure: overall QANet architecture)

1. Input Embedding Layer

Concatenate the word embedding (300d) and the char embedding (200d).
All OOV words are mapped to a single randomly initialized embedding that is trained along with the model.
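
A minimal PyTorch sketch of this layer, max-pooling the char embeddings over each word's characters to a fixed 200-d vector; the module and tensor names are my own:

```python
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    # Sketch: 300-d word embedding + 200-d char embedding per token -> 500-d.
    def __init__(self, vocab_size, char_vocab_size):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, 300)  # OOV ids share one trainable <UNK> row
        self.char_emb = nn.Embedding(char_vocab_size, 200)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, word_len)
        w = self.word_emb(word_ids)                    # (batch, seq_len, 300)
        c = self.char_emb(char_ids).max(dim=2).values  # max over characters -> (batch, seq_len, 200)
        return torch.cat([w, c], dim=-1)               # (batch, seq_len, 500)
```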

2. Embedding Encoder Layer

Composed of a stack of encoder blocks [convolution-layer × # + self-attention-layer + feed-forward-layer]; the input is 500d and the output is 128d.
(Figure: structure of one encoder block)

  • Convolution
    Uses the depthwise separable convolutions of "Deep learning with depthwise separable convolutions" (Xception).
    kernel size = 7
    the number of filters is d = 128
    the number of conv layers within a block is 4
  • Self-attention
    Uses the multi-head mechanism; the number of heads is 8 throughout all the layers.
Each of these basic operations (conv/self-attention/ffn) is placed inside a residual block (see the sketch below).
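
A minimal PyTorch sketch of one depthwise separable convolution wrapped in such a residual block, using the hyperparameters above (d = 128, kernel size 7); the layernorm-then-operation-then-residual arrangement follows the paper, but the class and variable names are my own:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConvBlock(nn.Module):
    # Depthwise conv (one filter per channel, groups=d) followed by a
    # pointwise 1x1 conv that mixes channels; far cheaper than a full conv.
    def __init__(self, d=128, k=7):
        super().__init__()
        self.depthwise = nn.Conv1d(d, d, kernel_size=k, padding=k // 2, groups=d)
        self.pointwise = nn.Conv1d(d, d, kernel_size=1)
        self.norm = nn.LayerNorm(d)

    def forward(self, x):
        # x: (batch, seq_len, d); residual form: x + conv(layernorm(x))
        y = self.norm(x).transpose(1, 2)   # Conv1d expects (batch, d, seq_len)
        y = torch.relu(self.pointwise(self.depthwise(y)))
        return x + y.transpose(1, 2)
```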

3. Context-Query Attention Layer

  • Let $C$ and $Q$ be the encoded context and query matrices. Compute the similarity matrix
    $S \in \mathbb{R}^{n \times m}$,
    where $S_{i,j} = f(q,c) = W_0 \cdot [q, c, q \circ c]$
  • Context-to-query attention $A$
    $A = \mathrm{softmax}(S, \text{axis=row}) \cdot Q^T \in \mathbb{R}^{n \times d}$
  • Query-to-context attention $B$
    $B = \mathrm{softmax}(S, \text{axis=row}) \cdot \mathrm{softmax}(S, \text{axis=column})^T \cdot C^T$
    (both attentions are sketched in code below)
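
A minimal sketch of this layer with row-vector conventions ($C$ as $n \times d$, $Q$ as $m \times d$, so the transposes above disappear); the function and weight names are hypothetical:

```python
import torch
import torch.nn.functional as F

def context_query_attention(C, Q, w0):
    # C: (n, d) encoded context, Q: (m, d) encoded query,
    # w0: (3*d,) trilinear weight, so S[i, j] = w0 . [q_j, c_i, q_j * c_i]
    n, m = C.size(0), Q.size(0)
    c = C.unsqueeze(1).expand(n, m, -1)        # (n, m, d)
    q = Q.unsqueeze(0).expand(n, m, -1)        # (n, m, d)
    S = torch.cat([q, c, q * c], dim=-1) @ w0  # similarity matrix, (n, m)
    S_row = F.softmax(S, dim=1)                # softmax over query positions
    S_col = F.softmax(S, dim=0)                # softmax over context positions
    A = S_row @ Q                              # context-to-query attention, (n, d)
    B = S_row @ S_col.t() @ C                  # query-to-context attention, (n, d)
    return A, B
```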

4. Model Encoder Layer

The input at each position is $[c, a, c \circ a, c \circ b]$,
where $a$ and $b$ are rows of the matrices $A$ and $B$ respectively.

The three stacked model encoder stages share parameters;
the number of convolution layers within a block is 2;
the total number of blocks is 7.
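
A sketch of the three shared passes; `project` is an assumed dimensionality-reduction layer (e.g. `nn.Linear(4 * d, d)`) that maps the 4d attention output back to d, and `encoder_stack` stands for one stack of 7 encoder blocks:

```python
import torch

def model_encoder(C, A, B, project, encoder_stack):
    # Per-position input [c, a, c*a, c*b]: (n, 4d) -> projected to (n, d).
    x = project(torch.cat([C, A, C * A, C * B], dim=-1))
    M0 = encoder_stack(x)    # the same 7-block stack (shared weights)
    M1 = encoder_stack(M0)   # is applied three times; its three outputs
    M2 = encoder_stack(M1)   # feed the output layer
    return M0, M1, M2
```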

5. Output Layer

  • Start pointer
    A fully connected layer:
    $p^1 = \mathrm{softmax}(W_{p^1}^T [M_0; M_1])$
  • End pointer
    $p^2 = \mathrm{softmax}(W_{p^2}^T [M_0; M_2])$
    (both pointers are sketched in code below)
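
A minimal sketch of the two pointer distributions; the weight names are hypothetical, and $M_0, M_1, M_2$ are the outputs of the three model-encoder passes:

```python
import torch
import torch.nn.functional as F

def pointer_distributions(M0, M1, M2, w1, w2):
    # M0, M1, M2: (n, d); w1, w2: (2*d,) trainable weights
    p1 = F.softmax(torch.cat([M0, M1], dim=-1) @ w1, dim=0)  # start distribution, (n,)
    p2 = F.softmax(torch.cat([M0, M2], dim=-1) @ w2, dim=0)  # end distribution, (n,)
    return p1, p2
```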

6. Loss Function

$L(\theta) = -\frac{1}{N}\sum_i^N \left[ \log\left(p^1_{y_i^1}\right) + \log\left(p^2_{y_i^2}\right) \right]$,
where $y_i^1$ and $y_i^2$ are the ground-truth start and end positions of example $i$.
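
A batched sketch of this loss; `y1` and `y2` are long tensors of gold start/end indices:

```python
import torch

def qa_loss(p1, p2, y1, y2):
    # p1, p2: (batch, n) start/end distributions; y1, y2: (batch,) gold indices
    nll = -(torch.log(p1.gather(1, y1.unsqueeze(1))) +
            torch.log(p2.gather(1, y2.unsqueeze(1))))
    return nll.mean()
```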

