"End-To-End Memory Network" Keras Implementation
The Memory Network implemented on Kaggle in the previous post was a simplified version. The Memory Network in the original paper (Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus, "End-To-End Memory Networks") does not use an off-the-shelf RNN such as an LSTM.
The model from the original paper:
Single Layer
See the left figure: the single layer is what the Kaggle example imitates. It consists of three parts:
Input Memory Representation
The input story X is transformed by embedding layer A into m (None, story_len, embedding_size), and the input question q is transformed by embedding layer B into u (None, question_len, embedding_size). Take the dot product of u with each memory and apply a softmax: $p_i = \mathrm{Softmax}(u^T m_i)$
Output Memory Representation
The input story is embedded a second time, this time using embedding layer C, which gives the response vector: $o = \sum_i p_i c_i$
Final Prediction
$\hat{a} = \mathrm{Softmax}(W(o + u))$
The code for this was covered in the previous post, so it is not repeated here.
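As a quick standalone illustration of the three equations above, here is a minimal numpy sketch of the single-layer forward pass. The dimensions and random tensors are toy values chosen for this example, not from the post:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions (illustrative only)
story_len, embd_sz, vocab_size = 4, 8, 10
rng = np.random.default_rng(0)

m = rng.normal(size=(story_len, embd_sz))   # memory vectors (embedding A)
c = rng.normal(size=(story_len, embd_sz))   # output vectors (embedding C)
u = rng.normal(size=(embd_sz,))             # question embedding (embedding B)
W = rng.normal(size=(vocab_size, embd_sz))  # final projection

p = softmax(m @ u)            # p_i = Softmax(u^T m_i), shape (story_len,)
o = p @ c                     # o = sum_i p_i c_i,      shape (embd_sz,)
a_hat = softmax(W @ (o + u))  # answer distribution,    shape (vocab_size,)
```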
Multiple Layers
The authors designed a new structure that stacks multiple memory layers (hops), aiming for a better memory model.
See the right figure for a three-layer memory network. At the end of each layer, the output o is added to the previous u before being passed to the next layer: $u^{k+1} = o^k + u^k$.
The authors also propose two ways of tying the embedding weights:
- Adjacent: the output embedding of one layer is the input embedding of the next, i.e. $A^{k+1} = C^k$
- Layer-wise (RNN-like): $A^1 = A^2 = \dots = A^K$, and likewise $C^1 = C^2 = \dots = C^K$
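The multi-hop loop with adjacent weight tying can be sketched in numpy as follows. Keeping hops+1 embeddings in one list and letting layer k use entry k as A and entry k+1 as C realizes $A^{k+1} = C^k$ (this mirrors how the Keras code below indexes its embedding list; the dimensions here are toy values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

hops, story_len, embd_sz = 3, 4, 8
rng = np.random.default_rng(1)

# Adjacent tying: one list of hops+1 embedded stories; layer k
# reads E[k] as its memory (A^k) and E[k+1] as its output (C^k).
E = [rng.normal(size=(story_len, embd_sz)) for _ in range(hops + 1)]
u = rng.normal(size=(embd_sz,))

for k in range(hops):
    m, c = E[k], E[k + 1]
    p = softmax(m @ u)
    o = p @ c
    u = o + u  # u^(k+1) = o^k + u^k
```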
Model Details
The authors also designed representations that make it easier for the model to capture each word's position and temporal information:
- Reverse the sentence order when loading the text (so the most recent information more easily receives attention first).
- Position Encoding: weight each word according to its position within the sentence.
- Temporal Encoding: the model needs to know the order in which events happened to make better judgments.
$m_i = \sum_j A x_{ij} + T_A(i)$;  $c_i = \sum_j C x_{ij} + T_C(i)$, where $T_A$ and $T_C$ are learned during training.
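A small numpy sketch of both encodings, using the position-weight formula $l_{kj} = (1 - j/J) - (k/d)(1 - 2j/J)$ from the paper (the story size and random tensors below are toy values for illustration):

```python
import numpy as np

J, d = 5, 8      # sentence length, embedding size (toy values)
story_len = 4
rng = np.random.default_rng(2)

# Position-encoding weights l_kj (j = word position, k = embedding dim)
pe = np.zeros((J, d))
for j in range(1, J + 1):
    for k in range(1, d + 1):
        pe[j - 1, k - 1] = (1 - j / J) - (k / d) * (1 - 2 * j / J)

Ax = rng.normal(size=(story_len, J, d))  # embedded story words (embedding A)
T_A = rng.normal(size=(story_len, d))    # temporal term, learned in training

# m_i = sum_j l_j * A x_ij + T_A(i): weighted bag of words plus time signal
m = (Ax * pe).sum(axis=1) + T_A
```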
Keras Implementation
This implementation references the PyTorch version: https://github.com/jojonki/MemoryNetworks
import numpy as np
from keras import backend as K
from keras.models import Sequential, Model
from keras.layers import (Embedding, Dropout, Input, Lambda,
                          Activation, Dense, add, dot)

def get_embedding(input_dim, output_dim):
    encoder = Sequential()
    encoder.add(Embedding(input_dim=input_dim,
                          output_dim=output_dim))
    encoder.add(Dropout(0.2))
    return encoder
hops=3
embd_sz = 64
position_encoding = True
temporal_encoding = True
bs = 20
dropout = Dropout(0.2)
A = [get_embedding(vocab_size, embd_sz)
     for _ in range(hops + 1)]
B = A[0]
Position Encoding
if position_encoding:
    J = sentence_maxlen
    d = embd_sz
    pe = np.zeros((J, d))
    for j in range(1, J + 1):
        for k in range(1, d + 1):
            l_kj = (1 - j / J) - (k / d) * (1 - 2 * j / J)
            pe[j - 1][k - 1] = l_kj
    pe = pe[np.newaxis, np.newaxis, :, :]
    pe = pe.repeat(bs, axis=0).repeat(story_maxlen, axis=1)
    print(pe.shape)
    pe = K.variable(value=pe)  # (bs, story_len, sentence_len, embd_size)
from keras.engine.topology import Layer

class Position_Encoder(Layer):
    def __init__(self, pe, **kwargs):
        self.pe = pe
        super(Position_Encoder, self).__init__(**kwargs)

    def call(self, m):
        # Element-wise multiply the embedded words by the position weights
        return m * self.pe

PE = Position_Encoder(pe)
Temporal Encoding
class Temporal_Encoder(Layer):
    def __init__(self, **kwargs):
        super(Temporal_Encoder, self).__init__(**kwargs)

    def build(self, input_shape):
        # Trainable temporal term T(i): one vector per memory slot.
        # The batch dimension is excluded from the weight shape so the
        # same encoding is broadcast over every sample in the batch.
        self.kernel = self.add_weight(name='kernel',
                                      shape=input_shape[1:],
                                      initializer='uniform',
                                      trainable=True)
        super(Temporal_Encoder, self).build(input_shape)

    def call(self, x):
        return x + self.kernel

    def compute_output_shape(self, input_shape):
        return input_shape

temporal_A = Temporal_Encoder()
temporal_C = Temporal_Encoder()
Building the Model
story_input = Input((story_maxlen, sentence_maxlen,))
query_input = Input((query_maxlen,))

u = dropout(B(query_input))                # (bs, question_len, embd_sz)
u = Lambda(lambda x: K.sum(x, axis=1))(u)  # (bs, embd_sz)

for k in range(hops):
    m = A[k](story_input)  # (bs, story_len, sentence_len, embd_sz)
    if position_encoding:
        m = PE(m)
    m = Lambda(lambda x: K.sum(x, axis=2))(m)  # (bs, story_len, embd_sz)
    if temporal_encoding:
        m = temporal_A(m)
    # Adjacent weight tying: C^k = A^(k+1)
    c = A[k + 1](story_input)  # (bs, story_len, sentence_len, embd_sz)
    c = Lambda(lambda x: K.sum(x, axis=2))(c)  # (bs, story_len, embd_sz)
    if temporal_encoding:
        c = temporal_C(c)
    p = dot([m, u], axes=-1)    # (bs, story_len)
    p = Activation('softmax')(p)
    o = dot([p, c], axes=1)     # (bs, embd_sz)
    u = add([o, u])             # u^(k+1) = o^k + u^k

answer = Dense(vocab_size, kernel_initializer='random_normal')(u)
answer = Activation('softmax')(answer)
model = Model([story_input, query_input], answer)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])