Notes on QANet

Notation: Question Q = \left \{ q_1, q_2, ..., q_m \right \}, Context C = \left \{ c_1, c_2, ..., c_n \right \}, answer span S = \left \{ c_i, c_{i+1}, ..., c_{i+j} \right \}
x: represents both an original word and its embedding

Like most other Reading Comprehension models, QANet consists of five modules: an Embedding layer, an Embedding encoder layer, a Context-query attention layer, a Model encoder layer, and an Output layer.

1. Embedding Layer

Word:

  • 300-dim GloVe pre-trained word vectors
  • fixed during training
  • OOV words are mapped to <UNK>, whose vector is randomly initialized and trainable

Char:

  • 200-dim, max word length is 16
  • concatenate all char vectors of a word into a matrix, then take the maximum of each row (max-pooling over characters) to obtain a single vector
  • trained

The final vector of a word is \left [ x_w; x_c \right ] \in \mathbb{R}^{p_1+p_2}, which is then passed through a two-layer highway network.
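The per-word embedding above (char max-pooling, concatenation with the GloVe vector, then a highway network) can be sketched in NumPy. This is an illustrative sketch, not the paper's implementation; the weight names (`W_t`, `W_h`) and random values are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the text: p1 = word-emb dim, p2 = char-emb dim, L = max word length.
p1, p2, L = 300, 200, 16

def word_representation(x_w, char_mat):
    """x_w: (p1,) GloVe vector; char_mat: (p2, L) char embeddings of one word.
    Max-pool over the length axis (row-wise maximum), then concatenate."""
    x_c = char_mat.max(axis=1)          # (p2,)
    return np.concatenate([x_w, x_c])   # (p1 + p2,)

def highway_layer(x, W_t, b_t, W_h, b_h):
    """One highway layer: gate t decides how much of the transform h to keep."""
    t = 1.0 / (1.0 + np.exp(-(W_t @ x + b_t)))  # transform gate (sigmoid)
    h = np.maximum(0.0, W_h @ x + b_h)          # candidate (ReLU)
    return t * h + (1.0 - t) * x                # carry the rest of x through

x_w = rng.standard_normal(p1)
char_mat = rng.standard_normal((p2, L))
x = word_representation(x_w, char_mat)          # (500,)

d = p1 + p2
W_t, b_t = rng.standard_normal((d, d)) * 0.01, np.zeros(d)
W_h, b_h = rng.standard_normal((d, d)) * 0.01, np.zeros(d)
out = highway_layer(highway_layer(x, W_t, b_t, W_h, b_h), W_t, b_t, W_h, b_h)
```

The highway gate lets the network fall back to the raw embedding when the nonlinear transform is unhelpful, which is why it is commonly placed right after embedding lookup.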

2. Embedding Encoder Layer

A stack of building blocks: [conv-layer x # + self-attention-layer + feed-forward-layer]

  • depthwise separable convolutions, which are memory-efficient and generalize better
  • kernel size is 7, number of filters is d = 128, number of conv layers within a block is 4
  • self-attention uses multi-head attention with 8 heads
  • Each of these basic operations (conv/self-attention/ffn) is placed inside a residual block
  • for an input x and a given operation f, the output is f(layernorm(x)) + x
  • total number of encoder blocks is 1
  • input dim is p_1+p_2=500, output dim is d=128
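One residual sub-block of the encoder, f(layernorm(x)) + x with f a depthwise separable convolution, can be sketched as follows. This is a minimal NumPy sketch under assumed toy sizes (sequence length 10); the explicit loop stands in for an optimized conv kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 128, 10, 7                      # channels, toy sequence length, kernel size

def layernorm(x, eps=1e-6):
    """Normalize each position's feature vector. x: (n, d)."""
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def depthwise_separable_conv(x, W_dw, W_pw):
    """x: (n, d). W_dw: (k, d), one length-k filter per channel (depthwise step);
    W_pw: (d, d), 1x1 pointwise mixing. 'Same' padding keeps the sequence length."""
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    dw = np.stack([(xp[i:i + k] * W_dw).sum(axis=0) for i in range(n)])  # (n, d)
    return dw @ W_pw                                                     # (n, d)

W_dw = rng.standard_normal((k, d)) * 0.05
W_pw = rng.standard_normal((d, d)) * 0.05

x = rng.standard_normal((n, d))
out = depthwise_separable_conv(layernorm(x), W_dw, W_pw) + x   # f(layernorm(x)) + x
```

Splitting the convolution into depthwise and pointwise steps replaces a d×d×k weight tensor with k·d + d·d parameters, which is where the memory saving comes from.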

3. Context-Query Attention Layer

  • similarity matrix S \in \mathbb{R}^{n \times m}, computed with the trilinear similarity function f(q, c) = W_0[q, c, q \odot c]
  • softmax over each row of S gives \bar{S}, and context-to-query attention is A = \bar{S}Q^T \in \mathbb{R}^{n \times d}
  • query-to-context attention (as in DCN) benefits the model a bit: softmax over each column of S gives \bar{\bar{S}}, and then B = \bar{S} \bar{\bar{S}}^T C^T
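The two attention computations above can be sketched directly from the formulas. A minimal NumPy sketch with assumed toy sizes (n=6 context words, m=4 query words, d=8), treating C and Q as d×n and d×m matrices with one word per column:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, d = 6, 4, 8                        # toy context length, query length, hidden dim

C = rng.standard_normal((d, n))          # context, one column per word
Q = rng.standard_normal((d, m))          # query
W0 = rng.standard_normal(3 * d) * 0.1    # trilinear weights

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# S[i, j] = W0 . [q_j ; c_i ; q_j * c_i]  (trilinear similarity)
S = np.empty((n, m))
for i in range(n):
    for j in range(m):
        q, c = Q[:, j], C[:, i]
        S[i, j] = W0 @ np.concatenate([q, c, q * c])

S_bar = softmax(S, axis=1)               # softmax over each row
S_bbar = softmax(S, axis=0)              # softmax over each column

A = S_bar @ Q.T                          # context-to-query attention, (n, d)
B = S_bar @ S_bbar.T @ C.T               # query-to-context attention, (n, d)
```

Both A and B are n×d, so each context position ends up with a query-aware summary, ready to be concatenated with c for the model encoder.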

4. Model Encoder Layer

  • Input to this layer is [c, a, c \odot a, c \odot b], where a and b are rows of A and B respectively
  • parameters are the same as embedding encoder layer
  • number of blocks is 7
  • number of conv layers within a block is 2
  • share weights between the model encoders

5. Output Layer

Predict, for each context position, the probability of being the start or end of the answer span: p^{start} = softmax(W_1[M_0; M_1]) and p^{end} = softmax(W_2[M_0; M_2]), where M_0, M_1, M_2 are the outputs of the three stacked model encoders (bottom to top).
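The start/end prediction and span selection can be sketched as below. A NumPy sketch with assumed toy sizes; M_0, M_1, M_2 are stand-ins for the model-encoder outputs, and the brute-force span search is illustrative (a real implementation would bound the span length).

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 6, 8                              # toy context length and hidden dim

# Stand-ins for the outputs of the three stacked model encoders.
M0, M1, M2 = (rng.standard_normal((n, d)) for _ in range(3))
W1 = rng.standard_normal(2 * d) * 0.1
W2 = rng.standard_normal(2 * d) * 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

p_start = softmax(np.concatenate([M0, M1], axis=1) @ W1)   # (n,)
p_end   = softmax(np.concatenate([M0, M2], axis=1) @ W2)   # (n,)

# Best span (i, j) with i <= j, maximizing p_start[i] * p_end[j].
best = max(((i, j) for i in range(n) for j in range(i, n)),
           key=lambda ij: p_start[ij[0]] * p_end[ij[1]])
```

Reusing M_0 in both classifiers lets the start and end heads share the same base representation while M_1 and M_2 specialize.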
