Attention Code Reference (1)

1D Attention

For basic attention, see https://arxiv.org/pdf/1811.05544.pdf

Basic MLP Attention

Formula:

$$\sigma\left(\boldsymbol{w}_{2}^{T}\tanh\left(W_{1}[\boldsymbol{u};\boldsymbol{v}]+\boldsymbol{b}_{1}\right)+b_{2}\right)$$
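A minimal PyTorch sketch of this scoring function, assuming a query `u` of shape (batch, u_dim) and a candidate `v` of shape (batch, v_dim); the module and parameter names here are illustrative, not taken from the referenced survey:

```python
import torch
import torch.nn as nn

class MLPAttentionScore(nn.Module):
    """Scores a pair (u, v) as sigma(w2^T tanh(W1 [u; v] + b1) + b2)."""
    def __init__(self, u_dim, v_dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(u_dim + v_dim, hidden_dim)  # W1, b1
        self.score = nn.Linear(hidden_dim, 1)             # w2, b2

    def forward(self, u, v):
        # u: (batch, u_dim), v: (batch, v_dim)
        h = torch.tanh(self.proj(torch.cat([u, v], dim=-1)))
        return torch.sigmoid(self.score(h)).squeeze(-1)   # (batch,)
```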

The implementation picked out of stanza is slightly different.

The input has size (batch, dim), so it is clearly the query vector u used for matching, while context is the set of vectors being matched against.

$\alpha_j = \mathrm{softmax}\left(W \cdot \tanh(W_u u + W_v v_j + b_v)\right)$, i.e. u and v are combined by addition rather than concatenation.
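A sketch of that additive variant, with the softmax taken over the context positions; this is an illustrative re-implementation, not the actual stanza code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """alpha_j = softmax_j( w^T tanh(W_u u + W_v v_j + b_v) )."""
    def __init__(self, u_dim, v_dim, hidden_dim):
        super().__init__()
        self.W_u = nn.Linear(u_dim, hidden_dim, bias=False)
        self.W_v = nn.Linear(v_dim, hidden_dim)   # its bias plays the role of b_v
        self.w = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, u, context, mask=None):
        # u: (batch, u_dim), context: (batch, seq_len, v_dim)
        scores = self.w(torch.tanh(self.W_u(u).unsqueeze(1) + self.W_v(context))).squeeze(-1)
        if mask is not None:
            scores = scores.masked_fill(~mask, float('-inf'))
        alpha = F.softmax(scores, dim=-1)                        # (batch, seq_len)
        c = torch.bmm(alpha.unsqueeze(1), context).squeeze(1)    # weighted sum, (batch, v_dim)
        return c, alpha
```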

$$\tilde{\boldsymbol{h}}_{t}=\tanh\left(\boldsymbol{W}_{c}\left[\boldsymbol{c}_{t};\boldsymbol{h}_{t}\right]\right)$$
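The combination step above fuses the attention context $c_t$ with the hidden state $h_t$; again just a sketch under the same assumptions:

```python
import torch
import torch.nn as nn

class AttentionCombine(nn.Module):
    """h_tilde_t = tanh(W_c [c_t; h_t])."""
    def __init__(self, c_dim, h_dim, out_dim):
        super().__init__()
        self.W_c = nn.Linear(c_dim + h_dim, out_dim, bias=False)

    def forward(self, c_t, h_t):
        # c_t: (batch, c_dim), h_t: (batch, h_dim)
        return torch.tanh(self.W_c(torch.cat([c_t, h_t], dim=-1)))
```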

Here is an example of implementing attention on top of the VGG16 architecture in Keras:

```python
from keras.models import Model
from keras.layers import Dense, Dropout, GlobalMaxPooling2D, GlobalAveragePooling2D, Concatenate, Multiply
from keras.applications.vgg16 import VGG16

# Define input shape
input_shape = (224, 224, 3)

# Load VGG16 model with pre-trained weights
vgg16 = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)

# Freeze all layers in VGG16
for layer in vgg16.layers:
    layer.trainable = False

# Add attention layer
x = GlobalMaxPooling2D()(vgg16.output)
a = Dense(512, activation='relu')(x)
a = Dropout(0.5)(a)
a = Dense(1, activation='sigmoid')(a)
a = Multiply()([a, x])
a = Concatenate()([a, GlobalAveragePooling2D()(vgg16.output)])

# Add classification layers
y = Dense(512, activation='relu')(a)
y = Dropout(0.5)(y)
y = Dense(10, activation='softmax')(y)

# Create model
model = Model(inputs=vgg16.input, outputs=y)

# Compile model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train model
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_val, y_val))
```

In this example, we first load the pre-trained VGG16 model and freeze all its layers to prevent any changes to the pre-trained weights. We then add an attention branch on top of the VGG16 output, consisting of a dense layer followed by dropout and a sigmoid-activated scalar gate. This attention weight is multiplied with the GlobalMaxPooling2D output of VGG16 and concatenated with the GlobalAveragePooling2D output. Finally, we add classification layers on top, then compile and train the model.
