Notation: Question $Q = \{q_1, q_2, \dots, q_m\}$, Context $C = \{c_1, c_2, \dots, c_n\}$, answer span $S = \{c_i, c_{i+1}, \dots, c_{i+j}\}$. Each symbol $x$ (for any $x \in C, Q$) represents both the original word and its embedding.
Like most other Reading Comprehension models, it consists of five modules: an Embedding layer, an Embedding encoder layer, a Context-query attention layer, a Model encoder layer, and an Output layer.
1. Embedding Layer
Word:
- 300-dim GloVe pre-trained word vectors
- fixed during training
- OOV words are mapped to <UNK>, whose vector is randomly initialized and trained
Char:
- each character is embedded as a 200-dim vector; every word is truncated or padded to a maximum length of 16 characters
- concatenate the character vectors of a word into a matrix and take the maximum of each row (max-pooling over the character positions) to obtain a single fixed-size vector
- trained from scratch
The final vector of a word is the concatenation $[x_w; x_c] \in \mathbb{R}^{p_1 + p_2}$ (word dim $p_1 = 300$, char dim $p_2 = 200$), which is then passed through a two-layer highway network, as sketched below.
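A minimal PyTorch sketch of this layer, assuming the dimensions above; the names (`WordCharEmbedding`, `Highway`, `glove_vectors`) are illustrative, not from the original implementation:

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    """Two-layer highway network: y = g * H(x) + (1 - g) * x."""
    def __init__(self, dim, num_layers=2):
        super().__init__()
        self.transforms = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.gates = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, x):
        for transform, gate in zip(self.transforms, self.gates):
            g = torch.sigmoid(gate(x))
            x = g * torch.relu(transform(x)) + (1 - g) * x
        return x

class WordCharEmbedding(nn.Module):
    def __init__(self, glove_vectors, num_chars, char_dim=200):
        super().__init__()
        # GloVe vectors are frozen; a separate trainable <UNK> row is omitted here.
        self.word_emb = nn.Embedding.from_pretrained(glove_vectors, freeze=True)
        self.char_emb = nn.Embedding(num_chars, char_dim)  # trained from scratch
        self.highway = Highway(glove_vectors.size(1) + char_dim)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq); char_ids: (batch, seq, 16)
        xw = self.word_emb(word_ids)        # (batch, seq, 300)
        xc = self.char_emb(char_ids)        # (batch, seq, 16, 200)
        xc, _ = xc.max(dim=2)               # max over the 16 char positions
        x = torch.cat([xw, xc], dim=-1)     # (batch, seq, 500)
        return self.highway(x)
```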
2. Embedding Encoder Layer
A stack of building blocks: [conv-layer x # + self-attention-layer + feed-forward-layer]
- depthwise separable convolutions, which are memory-efficient and generalize better
- kernel size is 7, number of filters is d = 128, number of conv layers within a block is 4
- self-attention uses multi-head attention with 8 heads
- each of these basic operations (conv/self-attention/ffn) is placed inside a residual block: for an input $x$ and a given operation $f$, the output is $f(\mathrm{layernorm}(x)) + x$
- total number of encoder blocks is 1
- input dim is $p_1 + p_2 = 500$, output dim is $d = 128$
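The block structure could be sketched in PyTorch as follows, with the stated hyperparameters (kernel size 7, $d = 128$, 8 heads, 4 convs per block); the class names are illustrative, and `nn.MultiheadAttention` stands in for the self-attention described above:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv along the sequence, then a pointwise (1x1) conv."""
    def __init__(self, dim=128, kernel_size=7):
        super().__init__()
        self.depthwise = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        self.pointwise = nn.Conv1d(dim, dim, kernel_size=1)

    def forward(self, x):  # x: (batch, seq_len, dim)
        y = self.pointwise(self.depthwise(x.transpose(1, 2)))
        return y.transpose(1, 2)

class SelfAttention(nn.Module):
    def __init__(self, dim=128, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return out

class Residual(nn.Module):
    """The residual pattern from above: output = f(layernorm(x)) + x."""
    def __init__(self, dim, op):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.op = op

    def forward(self, x):
        return self.op(self.norm(x)) + x

class EncoderBlock(nn.Module):
    """[conv-layer x num_convs + self-attention-layer + feed-forward-layer]."""
    def __init__(self, dim=128, num_convs=4):
        super().__init__()
        layers = [Residual(dim, DepthwiseSeparableConv(dim)) for _ in range(num_convs)]
        layers.append(Residual(dim, SelfAttention(dim)))
        layers.append(Residual(dim, nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))))
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)
```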
3. Context-Query Attention Layer
- similarity matrix $S \in \mathbb{R}^{n \times m}$, where $S_{ij} = f(q_j, c_i)$ and $f(q, c) = W_0[q; c; q \odot c]$ is the trilinear similarity function
- apply softmax to each row of $S$ to get $\bar{S}$; the context-to-query attention is $A = \bar{S} \cdot Q^T \in \mathbb{R}^{n \times d}$
- the query-to-context attention (as in DCN) adds a small benefit on top of context-to-query attention: apply softmax to each column of $S$ to get $\bar{\bar{S}}$, then compute $B = \bar{S} \cdot \bar{\bar{S}}^T \cdot C^T$
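A PyTorch sketch of this layer, assuming batch-first encoded inputs; materializing the full $(n, m, 3d)$ tensor for the trilinear function is simple but memory-hungry, and a real implementation would likely decompose it:

```python
import torch
import torch.nn as nn

class ContextQueryAttention(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        # Trilinear f(q, c) = W0 [q; c; q * c] as a single linear map.
        self.w0 = nn.Linear(3 * dim, 1, bias=False)

    def forward(self, C, Q):
        # C: (batch, n, d) context; Q: (batch, m, d) query
        n, m = C.size(1), Q.size(1)
        c = C.unsqueeze(2).expand(-1, -1, m, -1)   # (batch, n, m, d)
        q = Q.unsqueeze(1).expand(-1, n, -1, -1)   # (batch, n, m, d)
        S = self.w0(torch.cat([q, c, q * c], dim=-1)).squeeze(-1)  # (batch, n, m)
        S_row = torch.softmax(S, dim=2)   # row softmax  -> S_bar
        S_col = torch.softmax(S, dim=1)   # column softmax -> S_bar_bar
        A = torch.bmm(S_row, Q)           # context-to-query: (batch, n, d)
        B = torch.bmm(torch.bmm(S_row, S_col.transpose(1, 2)), C)  # query-to-context
        return A, B
```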
4. Model Encoder Layer
- input at each position is $[c; a; c \odot a; c \odot b]$, where $a$ and $b$ are the corresponding rows of the attention matrices $A$ and $B$, respectively
- parameters are the same as in the embedding encoder layer, except that:
- number of blocks is 7
- number of conv layers within a block is 2
- the model encoder is applied three times in sequence with shared weights, producing outputs $M_0$, $M_1$, $M_2$ for the output layer (see the sketch below)
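A sketch of how this layer's input is assembled and the three weight-shared passes are run, assuming the tensors from the previous sketches; note that the $4d$-dim input would need a projection back to $d = 128$ before the first block (public implementations typically use a 1×1 convolution; omitted here as an assumption):

```python
import torch

def model_encoder_input(C, A, B):
    """Assemble [c; a; c * a; c * b] per position: (batch, n, d) -> (batch, n, 4d)."""
    return torch.cat([C, A, C * A, C * B], dim=-1)

def run_model_encoders(encoder, x):
    """Apply the same 7-block encoder stack three times with shared weights."""
    M0 = encoder(x)
    M1 = encoder(M0)
    M2 = encoder(M1)
    return M0, M1, M2
```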
5. Output Layer
Predict the probability of each context position being the start or end of the answer span: $p^1 = \mathrm{softmax}(W_1[M_0; M_1])$ and $p^2 = \mathrm{softmax}(W_2[M_0; M_2])$, where $M_0$, $M_1$, $M_2$ are the outputs of the three model encoders, from bottom to top. Training minimizes the negative log-likelihood of the true start and end positions; at inference, the span $(s, e)$ with $s \le e$ maximizing $p^1_s p^2_e$ is selected.
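A PyTorch sketch of this layer; `SpanOutput` and the bias-free linear maps standing in for $W_1$ and $W_2$ are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SpanOutput(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.w1 = nn.Linear(2 * dim, 1, bias=False)
        self.w2 = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, M0, M1, M2):
        # M0, M1, M2: (batch, n, d) outputs of the three model encoders
        p1 = torch.softmax(self.w1(torch.cat([M0, M1], dim=-1)).squeeze(-1), dim=-1)
        p2 = torch.softmax(self.w2(torch.cat([M0, M2], dim=-1)).squeeze(-1), dim=-1)
        return p1, p2  # start / end distributions over the n context positions
```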