Keras Input layer and Embedding layer, fully explained

Welcome to follow my recommender-system WeChat public account: Tiany_RecoSystem

An example of defining a custom self-attention layer in Keras:

import pandas as pd
from keras import backend as K
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.layers import Layer, Input, Embedding, GlobalAveragePooling1D, Dropout

class Self_Attention(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(Self_Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # Create the trainable weights for this layer.
        # input_shape = (batch_size, seq_len, embed_size)
        self.kernel = self.add_weight(name='kernel',
                                      shape=(3, input_shape[2], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
        super(Self_Attention, self).build(input_shape)  # Must be called at the end

    def call(self, x):
        # Project the input into Query, Key and Value with the three kernel slices
        Query = K.dot(x, self.kernel[0])
        Key = K.dot(x, self.kernel[1])
        Value = K.dot(x, self.kernel[2])
        print("Query.shape", Query.shape)
        print("K.permute_dimensions(Key, [0, 2, 1]).shape", K.permute_dimensions(Key, [0, 2, 1]).shape)
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
        QK = K.batch_dot(Query, K.permute_dimensions(Key, [0, 2, 1]))
        QK = QK / (self.output_dim ** 0.5)
        QK = K.softmax(QK)
        print("QK.shape", QK.shape)
        Z = K.batch_dot(QK, Value)
        return Z

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[1], self.output_dim)
max_features = 20000  # total number of words/features in the vocabulary
print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
# Convert the labels to one-hot encoding
y_train, y_test = pd.get_dummies(y_train), pd.get_dummies(y_test)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

#%% Pad every review to the same length
maxlen = 64
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

batch_size = 32
embed_size = 128
S_inputs = Input(shape=(maxlen,), dtype='int32')
# Embedding(max_features, embed_size): a 20000 x 128 embedding_matrix
embeddings_S_inputs = Embedding(max_features, embed_size)(S_inputs)
# embeddings_S_inputs: (None, maxlen, embed_size)
# S_inputs holds the integer index of each word; each index is looked up in the embedding matrix
O_seq = Self_Attention(embed_size)(embeddings_S_inputs)
O_seq = GlobalAveragePooling1D()(O_seq)
O_seq = Dropout(0.5)(O_seq)
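The snippet above stops at the Dropout layer. Below is a minimal sketch of how the model could be completed and trained; the 2-way softmax head, the adam optimizer and the epoch count are illustration assumptions, not part of the original code:

from keras.layers import Dense
from keras.models import Model

# Hypothetical completion: a classification head on top of the pooled attention output
outputs = Dense(2, activation='softmax')(O_seq)
model = Model(inputs=S_inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

model.fit(x_train, y_train.values,
          batch_size=batch_size,
          epochs=5,
          validation_data=(x_test, y_test.values))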

[Figure: model summary showing the output shape of each layer]

Setting batch_size aside, the input is really a 64-dimensional vector of word indices, which corresponds to the maxlen above.

The Embedding(max_features, embed_size) layer we build holds a 20000 x 128 embedding_matrix.

S_inputs stores the integer index of each word, and each index is used to look up a row of that matrix, so embeddings_S_inputs has shape (None, maxlen, embed_size).

In the model summary this shows up as: embedding_1 (Embedding) with output shape (None, 64, 128).
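If you want to confirm these shapes in code, K.int_shape on the tensors built above prints them directly (just a quick check, not part of the original post):

print(K.int_shape(S_inputs))             # (None, 64): one word index per position
print(K.int_shape(embeddings_S_inputs))  # (None, 64, 128): each index replaced by a 128-dim vector
print(K.int_shape(O_seq))                # (None, 128) after GlobalAveragePooling1D and Dropout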

Therefore:

The job of the embedding layer is to turn positive-integer indices into fixed-size dense vectors, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]. Note that each index maps to exactly one vector, so the embedding layer is in essence a table lookup: we use the index to retrieve its corresponding vector.
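As a minimal, self-contained illustration of this lookup behaviour (the vocabulary size, output dimension and input indices below are made-up values, not from the article):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# Toy embedding table: 100 possible indices, each mapped to a 2-dim vector
toy = Sequential()
toy.add(Embedding(input_dim=100, output_dim=2, input_length=2))

# Two samples of two indices each; the layer simply looks up rows 4, 20, 7, 31
data = np.array([[4, 20], [7, 31]])
vectors = toy.predict(data)
print(vectors.shape)   # (2, 2, 2): (samples, sequence length, embedding dimension)
print(vectors[0][0])   # the 2-dim vector that index 4 maps to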
