BiDAF
Char-CNN: character-level convolutional neural network for word embedding. It convolves over local windows of character embeddings within each word to generate the next-stage word representation. Multiple filters can be used to represent different mapping relations in different sub-spaces, i.e. different structural information.
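A minimal sketch of this idea (all sizes and the random weights are made up for illustration; a real model learns them): filters slide over character windows of a word and a max-over-time pool yields one fixed-size vector per word.

```python
import numpy as np

rng = np.random.default_rng(0)

char_vocab, char_dim = 50, 8     # assumed character vocabulary / char embedding size
n_filters, window = 16, 3        # assumed number of filters / filter width

char_emb = rng.normal(size=(char_vocab, char_dim))
filters  = rng.normal(size=(n_filters, window * char_dim))   # one row per filter

def charcnn_word_embedding(char_ids):
    """Map one word (a list of character ids) to an n_filters-dim vector."""
    x = char_emb[char_ids]                          # (word_len, char_dim)
    windows = [x[i:i + window].reshape(-1)          # sliding windows of characters
               for i in range(len(char_ids) - window + 1)]
    feats = np.stack(windows) @ filters.T           # (n_windows, n_filters)
    return np.maximum(feats, 0).max(axis=0)         # ReLU + max-over-time pooling

word = [3, 17, 9, 24, 5]                            # hypothetical char ids of one word
print(charcnn_word_embedding(word).shape)           # (16,)
```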
GloVe: pre-trained word embeddings, which emphasize linear relations between word embedding vectors (e.g., analogies expressed as vector differences).
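The "linear relation" property can be illustrated with the classic analogy-by-arithmetic trick; the 4-d vectors below are toy values invented for the example, real GloVe vectors are 50-300 dimensional and loaded from the released files.

```python
import numpy as np

emb = {
    "king":  np.array([0.8, 0.6, 0.1, 0.9]),
    "queen": np.array([0.8, 0.6, 0.9, 0.1]),
    "man":   np.array([0.2, 0.1, 0.1, 0.9]),
    "woman": np.array([0.2, 0.1, 0.9, 0.1]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# vec(king) - vec(man) + vec(woman) should be closest to vec(queen)
target = emb["king"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(emb[w], target))
print(best)  # queen
```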
Context2Query: attention that represents each context word as a weighted sum of query word embeddings, with weights given by that context word's attention distribution over the query words.
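A small sketch of that step, with made-up shapes and a plain dot-product similarity standing in for BiDAF's trainable similarity function:

```python
import numpy as np

rng = np.random.default_rng(0)
T, J, d = 5, 3, 4                    # context length, query length, hidden size (assumed)
H = rng.normal(size=(T, d))          # context word representations
U = rng.normal(size=(J, d))          # query word representations

S = H @ U.T                                              # affinity scores s(t, j), (T, J)
A = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)     # row-wise softmax over query words
U_tilde = A @ U                      # weighted sum of query embeddings, one per context word
print(U_tilde.shape)                 # (5, 4)
```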
Query2Context: for each context word t in [1, T], select j_t in [1, J] with the maximum affinity score (attention weight) s(t, j_t); the corresponding query word embedding is u_{j_t}. Let maxU = [u_{j_1}, u_{j_2}, ..., u_{j_T}] and S = [s(1, j_1), s(2, j_2), ..., s(T, j_T)]; then g is the weighted sum
g = sum_over_t { s(t, j_t) * u_{j_t} }. g is broadcast to all LSTM time steps in the modeling layer as input (see the sketch below).
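A sketch of this step as described in the note, with hypothetical shapes and random scores. (The BiDAF paper itself forms the attended vector from softmax-normalized context vectors; the code below follows the note's formulation with query embeddings and raw max scores.)

```python
import numpy as np

rng = np.random.default_rng(0)
T, J, d = 5, 3, 4                    # context length, query length, hidden size (assumed)
U = rng.normal(size=(J, d))          # query word embeddings u_1 .. u_J
S = rng.normal(size=(T, J))          # affinity scores s(t, j)

j_max = S.argmax(axis=1)             # j_t = argmax_j s(t, j) for each context word
maxU  = U[j_max]                     # [u_{j_1}, ..., u_{j_T}], shape (T, d)
w     = S[np.arange(T), j_max]       # [s(1, j_1), ..., s(T, j_T)]

g = (w[:, None] * maxU).sum(axis=0)  # g = sum_t s(t, j_t) * u_{j_t}, shape (d,)
G = np.tile(g, (T, 1))               # g broadcast to every modeling-layer time step
print(G.shape)                       # (5, 4)
```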