Keras for attention

Keras does not yet have an official implementation of the attention mechanism, but there are a few personal implementations around, so I ran a quick experiment on the MNIST dataset. The model is a bidirectional LSTM + attention + dropout; that said, a bidirectional LSTM on its own is already quite strong.
References:
https://github.com/philipperemy/keras-attention-mechanism
https://github.com/keras-team/keras/issues/1472
Environment: Windows 10, Python 2.7, Keras 2+
The code:

# mnist attention
import numpy as np
np.random.seed(1337)
from keras.datasets import mnist
from keras.utils import np_utils
from keras.layers import *
from keras.models import *
from keras.optimizers import Adam

TIME_STEPS = 28
INPUT_DIM = 28
lstm_units = 64

# data pre-processing
(X_train, y_train), (X_test, y_test) = mnist.load_data('mnist.npz')
X_train = X_train.reshape(-1, 28, 28) / 255.
X_test = X_test.reshape(-1, 28, 28) / 255.
y_train = np_utils.to_categorical(y_train, num_classes=10)
y_test = np_utils.to_categorical(y_test, num_classes=10)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)

# first way attention
def attention_3d_block(inputs):
    # inputs: (batch, TIME_STEPS, 2 * lstm_units) from the bidirectional LSTM
    a = Permute((2, 1))(inputs)                           # (batch, 2 * lstm_units, TIME_STEPS)
    a = Dense(TIME_STEPS, activation='softmax')(a)        # softmax over the time axis
    a_probs = Permute((2, 1), name='attention_vec')(a)    # back to (batch, TIME_STEPS, 2 * lstm_units)
    # Keras 1 API: output_attention_mul = merge([inputs, a_probs], name='attention_mul', mode='mul')
    output_attention_mul = multiply([inputs, a_probs], name='attention_mul')  # element-wise weighting
    return output_attention_mul

# build RNN model with attention
inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
drop1 = Dropout(0.3)(inputs)
lstm_out = Bidirectional(LSTM(lstm_units, return_sequences=True), name='bilstm')(drop1)
attention_mul = attention_3d_block(lstm_out)
attention_flatten = Flatten()(attention_mul)
drop2 = Dropout(0.3)(attention_flatten)
output = Dense(10, activation='sigmoid')(drop2)
model = Model(inputs=inputs, outputs=output)

# second way attention
# inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
# units = 32
# activations = LSTM(units, return_sequences=True, name='lstm_layer')(inputs)
#
# attention = Dense(1, activation='tanh')(activations)
# attention = Flatten()(attention)
# attention = Activation('softmax')(attention)
# attention = RepeatVector(units)(attention)
# attention = Permute([2, 1], name='attention_vec')(attention)
# attention_mul = multiply([activations, attention], name='attention_mul')  # Keras 1: merge(..., mode='mul')
# out_attention_mul = Flatten()(attention_mul)
# output = Dense(10, activation='sigmoid')(out_attention_mul)
# model = Model(inputs=inputs, outputs=output)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
print(model.summary())

print('Training------------')
model.fit(X_train, y_train, epochs=10, batch_size=16)

print('Testing--------------')
loss, accuracy = model.evaluate(X_test, y_test)

print('test loss:', loss)
print('test accuracy:', accuracy)
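To inspect what the attention layer has learned, you can read the output of the layer named 'attention_vec' through a second Model, as the philipperemy repo above does. A minimal sketch, assuming the trained model from the code above (attention_model and att_per_timestep are just illustrative names):

```python
# Side model that outputs the attention weights of the trained model above.
attention_model = Model(inputs=model.input,
                        outputs=model.get_layer('attention_vec').output)
att = attention_model.predict(X_test[:16])    # (16, TIME_STEPS, 2 * lstm_units)
att_per_timestep = att.mean(axis=-1)          # average over channels -> (16, TIME_STEPS)
print(att_per_timestep[0])                    # attention over the 28 rows of one test image
```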

Results: 98.43% accuracy on the training set and 98.95% on the test set. There does not appear to be any overfitting, so training could be continued further. I have previously run the TensorFlow MNIST example, where a bidirectional LSTM alone also reaches over 98% accuracy.
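For comparison, the attention-free baseline mentioned above needs only a few lines. This is not the exact TensorFlow example referred to, just a minimal Keras sketch reusing the data and hyperparameters from the code above:

```python
# Baseline: bidirectional LSTM without attention, same data pipeline as above.
baseline_in = Input(shape=(TIME_STEPS, INPUT_DIM))
baseline_h = Bidirectional(LSTM(lstm_units))(baseline_in)   # return_sequences=False -> final state only
baseline_h = Dropout(0.3)(baseline_h)
baseline_out = Dense(10, activation='softmax')(baseline_h)
baseline = Model(inputs=baseline_in, outputs=baseline_out)
baseline.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
baseline.fit(X_train, y_train, epochs=10, batch_size=16)
print(baseline.evaluate(X_test, y_test))
```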

Blog posts on attention:
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/
https://www.cnblogs.com/shixiangwan/p/7573589.html
https://codekansas.github.io/blog/2016/language.html
https://distill.pub/2016/augmented-rnns/

Papers:
"Neural Machine Translation by Jointly Learning to Align and Translate"
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
"Attention-Based Bidirectional LSTM for Relation Classification"
"Hierarchical Attention Networks for Document Classification"
Comments and discussion are welcome!

Below is example code that uses TensorFlow's Keras to build a model with an attention mechanism and a TCN (temporal convolutional network) to predict Boston house prices, using the past 5 steps to predict 1 step ahead:

```python
import numpy as np
import pandas as pd
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Dense, Conv1D, GlobalMaxPooling1D, concatenate,
                                     Activation, Dropout, Permute, Lambda, RepeatVector, Multiply)
from tensorflow.keras import backend as K

# Load and standardize the Boston housing data
boston_housing = pd.read_csv('https://storage.googleapis.com/tf-datasets/tf_datasets/boston_housing.csv')
train_data = boston_housing.values[:, :-1]
train_target = boston_housing.values[:, -1]
train_data = (train_data - np.mean(train_data, axis=0)) / np.std(train_data, axis=0)

# Build sliding windows: the past 5 steps predict the value 1 step ahead
seq_len = 5
n_features = train_data.shape[1]
X, y = [], []
for i in range(len(train_data) - seq_len):
    X.append(train_data[i:i + seq_len])
    y.append(train_target[i + seq_len])
X, y = np.array(X), np.array(y)

input_layer = Input(shape=(seq_len, n_features))

# Attention block: softmax over the time axis, averaged and broadcast over the channels
def attention_3d_block(inputs):
    n_channels = int(inputs.shape[-1])
    a = Permute((2, 1))(inputs)                                        # (channels, seq_len)
    a = Dense(seq_len, activation='softmax')(a)                        # weights over time per channel
    a = Lambda(lambda x: K.mean(x, axis=1), name='dim_reduction')(a)   # (seq_len,)
    a = RepeatVector(n_channels)(a)                                    # (channels, seq_len)
    a_probs = Permute((2, 1), name='attention_vec')(a)                 # (seq_len, channels)
    return Multiply()([inputs, a_probs])

# One TCN-style block: causal dilated convolution + ReLU + dropout
def tcn_block(x, n_filters, kernel_size, dilation_rate):
    d = Conv1D(filters=n_filters, kernel_size=kernel_size,
               dilation_rate=dilation_rate, padding='causal')(x)
    d = Activation('relu')(d)
    d = Dropout(0.2)(d)
    return d

n_filters = 64
kernel_size = 2
dilation_rates = [2 ** i for i in range(3)]
dilated_layers = [tcn_block(input_layer, n_filters, kernel_size, r) for r in dilation_rates]
concatenated_layers = concatenate(dilated_layers, axis=-1)

# Attention over the concatenated TCN features, then global max pooling and a linear output
attention_layer = attention_3d_block(concatenated_layers)
pooling_layer = GlobalMaxPooling1D()(attention_layer)
output_layer = Dense(1)(pooling_layer)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss='mse')

# Train
model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2)

# Predict 1 step ahead from the most recent window
last_window = train_data[-seq_len:]
predicted_price = model.predict(np.expand_dims(last_window, axis=0))[0][0]
print("Predicted price for the next time step:", predicted_price)
```

This model combines an attention block with TCN-style dilated convolutions to predict Boston house prices; in this example, the past 5 steps are used to predict 1 step ahead.