Deep Learning with Python reading notes, Chapter 7 Part-1: shared layer weights


Multi-input models


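Using the functional API, we can build models with multiple inputs. Such models typically merge their input branches at some point with a layer that can combine several tensors.

The tensors can be combined by adding them, concatenating them, and so on.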
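
As a rough illustration of the two merge modes, here is a NumPy sketch (not Keras code; the array names and shapes are made up for the example). Concatenation joins branches of different widths along the last axis, while addition needs identical shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two encoded branches for a batch of 4 samples (widths chosen for illustration)
encoded_text = rng.normal(size=(4, 32))
encoded_question = rng.normal(size=(4, 16))

# Concatenation: widths may differ, the result is 32 + 16 = 48 wide
concatenated = np.concatenate([encoded_text, encoded_question], axis=-1)

# Addition: both tensors must have exactly the same shape
other_branch = rng.normal(size=(4, 32))
summed = encoded_text + other_branch

print(concatenated.shape, summed.shape)
```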

7-1 A two-input question-answering model built with the functional API


from keras.models import Model
from keras import layers
from keras import Input

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

# Build the model
# The text input is a variable-length sequence of integers
text_input = Input(shape = (None,), dtype = 'int32', name = 'text')

# Embed the input into a sequence of vectors of size 64
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)

# Encode the vector sequence into a single vector with an LSTM
encoded_text = layers.LSTM(32)(embedded_text)

# Process the question the same way (with its own layer instances)
question_input = Input(shape = (None, ), dtype = 'int32', name = 'question')

embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

# Concatenate the encoded question and the encoded text
concatenated = layers.concatenate([encoded_text, encoded_question], axis = -1)
# Add a softmax classifier on top
answer = layers.Dense(answer_vocabulary_size, activation = 'softmax')(concatenated)

# Instantiate the model
# At instantiation, specify the two inputs and the output
model = Model([text_input, question_input], answer)
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['acc'])

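Next, there are two ways to feed data to this model:

  • pass a list of Numpy arrays as the inputs
  • pass a dictionary that maps input names to Numpy arrays (possible only if the inputs are named)

7-2 Feeding data to a multi-input model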


import numpy as np
import keras
num_samples = 1000
max_length = 100

# Generate dummy Numpy data for the input text
text = np.random.randint(1, text_vocabulary_size, size = (num_samples, max_length))
# Generate dummy questions
question = np.random.randint(1, question_vocabulary_size, size = (num_samples, max_length))

# Generate dummy answers and one-hot encode them
answers = np.random.randint(answer_vocabulary_size, size=(num_samples))
answers = keras.utils.to_categorical(answers, answer_vocabulary_size)

# Fit using a list of inputs
model.fit([text, question], answers, epochs = 10, batch_size = 128)

# Fit using a dictionary of inputs (works because the inputs are named)
model.fit({'text': text, 'question': question}, answers, epochs = 10, batch_size = 128)
E:\develop_tools\Anaconda\envs\py36\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Epoch 1/10
1000/1000 [==============================] - 1s 1ms/step - loss: 6.2145 - acc: 0.0010
Epoch 2/10
1000/1000 [==============================] - 1s 789us/step - loss: 6.1957 - acc: 0.0290
Epoch 3/10
1000/1000 [==============================] - 1s 790us/step - loss: 6.1355 - acc: 0.0040
Epoch 4/10
1000/1000 [==============================] - 1s 801us/step - loss: 6.0581 - acc: 0.0090
Epoch 5/10
1000/1000 [==============================] - 1s 762us/step - loss: 5.9996 - acc: 0.0070
Epoch 6/10
1000/1000 [==============================] - 1s 760us/step - loss: 5.9465 - acc: 0.0070
Epoch 7/10
1000/1000 [==============================] - 1s 785us/step - loss: 5.8733 - acc: 0.0090
Epoch 8/10
1000/1000 [==============================] - 1s 769us/step - loss: 5.8046 - acc: 0.0150
Epoch 9/10
1000/1000 [==============================] - 1s 760us/step - loss: 5.7276 - acc: 0.0260
Epoch 10/10
1000/1000 [==============================] - 1s 906us/step - loss: 5.6451 - acc: 0.0240
Epoch 1/10
1000/1000 [==============================] - 1s 822us/step - loss: 5.5688 - acc: 0.0210
Epoch 2/10
1000/1000 [==============================] - 1s 804us/step - loss: 5.4997 - acc: 0.0250
Epoch 3/10
1000/1000 [==============================] - 1s 833us/step - loss: 5.4204 - acc: 0.0330
Epoch 4/10
1000/1000 [==============================] - 1s 776us/step - loss: 5.3676 - acc: 0.0320
Epoch 5/10
1000/1000 [==============================] - 1s 758us/step - loss: 5.2833 - acc: 0.0380
Epoch 6/10
1000/1000 [==============================] - 1s 820us/step - loss: 5.2338 - acc: 0.0350
Epoch 7/10
1000/1000 [==============================] - 1s 817us/step - loss: 5.1674 - acc: 0.0370
Epoch 8/10
1000/1000 [==============================] - 1s 873us/step - loss: 5.0907 - acc: 0.0430
Epoch 9/10
1000/1000 [==============================] - 1s 850us/step - loss: 5.0300 - acc: 0.0490
Epoch 10/10
1000/1000 [==============================] - 1s 796us/step - loss: 4.9771 - acc: 0.0530





<keras.callbacks.callbacks.History at 0x1c9a7adf278>

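Two details of listing 7-2 are easy to check by hand. First, to_categorical one-hot encodes the integer answers. Second, with 500 answer classes, a model that predicts every class with equal probability has a categorical crossentropy of ln(500) ≈ 6.21, which is close to the loss printed for the first epoch. A minimal NumPy sketch follows; the one_hot helper is a hypothetical stand-in written for these notes, not the Keras function.

```python
import numpy as np

num_classes = 500
num_samples = 1000

def one_hot(labels, num_classes):
    """Hypothetical stand-in: turn integer labels into one-hot rows."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

rng = np.random.default_rng(0)
labels = rng.integers(0, num_classes, size=num_samples)
targets = one_hot(labels, num_classes)

# A model that has learned nothing assigns equal probability to every class
uniform_predictions = np.full((num_samples, num_classes), 1.0 / num_classes)

# Categorical crossentropy, averaged over the samples
chance_loss = -np.mean(np.sum(targets * np.log(uniform_predictions), axis=-1))

print(targets.shape)
print(chance_loss)
```

Since the targets here are random, the falling loss in the training log only reflects the model memorizing the 1000 dummy samples.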
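Multi-output models


In the same way, the functional API can be used to build models with multiple outputs. A simple example: take a person's social media posts as input and predict that person's age, gender, and income level.

7-3 A three-output model built with the functional API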


from keras import layers
from keras import Input
from keras.models import Model

vocabulary_size = 50000
num_income_groups = 10

posts_input = Input(shape = (None, ), dtype = 'int32', name = 'posts')
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)

x = layers.Conv1D(128, 5, activation = 'relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation = 'relu')(x)
x = layers.Conv1D(256, 5, activation = 'relu')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation = 'relu')(x)
x = layers.Conv1D(256, 5, activation = 'relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation = 'relu')(x)

# Note that each output layer is given a name
age_prediction = layers.Dense(1, name = 'age')(x)
income_prediction = layers.Dense(num_income_groups, 
                                 activation = 'softmax', 
                                 name = 'income')(x)
gender_prediction = layers.Dense(1, activation = 'sigmoid', name = 'gender')(x)

# Instantiate the final model
model = Model(posts_input,
              [age_prediction, income_prediction, gender_prediction])

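Note that training such a model requires specifying a different loss function for each output. Gradient descent, however, minimizes a single scalar, so to train the model these losses must be combined into one value. The simplest way is to sum them.

7-4 Compile options for a multi-output model: multiple losses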


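# Option 1: a list of losses, in the same order as the model outputs
model.compile(optimizer = 'rmsprop', 
              loss = ['mse', 'categorical_crossentropy', 'binary_crossentropy'])

# Option 2: a dictionary keyed by output name
# Equivalent to option 1, but only possible when the output layers are named
model.compile(optimizer = 'rmsprop',
              loss = {'age' : 'mse',
                      'income' : 'categorical_crossentropy',
                      'gender' : 'binary_crossentropy'})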

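Note: when the losses are combined, severely imbalanced loss contributions make the model "play favorites": it optimizes the task with the largest individual loss at the expense of the others.

To counter this, we can weight each loss's contribution to the final loss. Typical loss magnitudes are:
Age regression (MSE): 3~5
Gender classification (crossentropy): as low as 0.1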
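
To make the imbalance concrete, the sketch below combines three per-task losses with and without weights. The loss values are illustrative: the age and gender figures follow the magnitudes above, while the income value and the weights are assumptions picked for the example.

```python
# Illustrative per-task loss values; the income value is an assumption
losses = {"age": 4.0, "income": 2.3, "gender": 0.1}

# Plain sum: the age loss dominates the total
unweighted_total = sum(losses.values())
unweighted_share = {name: value / unweighted_total for name, value in losses.items()}

# Assumed weights that shrink the large loss and boost the small one
loss_weights = {"age": 0.25, "income": 1.0, "gender": 10.0}
weighted = {name: loss_weights[name] * value for name, value in losses.items()}
weighted_total = sum(weighted.values())
weighted_share = {name: value / weighted_total for name, value in weighted.items()}

print(unweighted_share)
print(weighted_share)
```

With these numbers the age task accounts for more than 60% of the unweighted total, while after weighting the age and gender tasks contribute equally.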


7-5 Compile options for a multi-output model: loss weighting


# Loss weighting, option 1: lists in output order
model.compile(optimizer = 'rmsprop',
              loss = ['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights = [0.25, 1., 10.])

# Loss weighting, option 2: dictionaries keyed by output name
model.compile(optimizer = 'rmsprop',
              loss = {'age' : 'mse',
                      'income' : 'categorical_crossentropy',
                      'gender' : 'binary_crossentropy'},
              loss_weights = {'age' : 0.25,
                              'income' : 1.,
                              'gender' : 10.})

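7-6 Feeding data to a multi-output model


# posts, age_targets, income_targets and gender_targets are assumed to be Numpy arrays
# Option 1: a list of targets, in output order
model.fit(posts, [age_targets, income_targets, gender_targets],
          epochs = 10, batch_size = 64)

# Option 2: a dictionary of targets (only if the output layers are named)
model.fit(posts, {'age' : age_targets,
                  'income' : income_targets,
                  'gender' : gender_targets},
          epochs = 10, batch_size = 64)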

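Directed acyclic graphs of layers


Beyond multi-input and multi-output models, the functional API can also build networks with a complex internal topology: arbitrary directed acyclic graphs of layers. Two well-known components of this kind are Inception modules and residual connections.

  1. Inception modules

    • A stack of modules, each of which looks like a small independent network split into several parallel branches
    • The most basic form has three or four branches that start with a 1 * 1 convolution, follow it with a 3 * 3 convolution, and end by concatenating the resulting features
    • More complex forms are also possible
    • The 1 * 1 (pointwise) convolution is characteristic of Inception; it helps separate channel-wise feature learning from spatial feature learning
    • Inception V3 is one version of the Inception architecture
    • Xception takes the idea to its extreme and learns channel-wise features and spatial features fully separately
  2. Residual connections

    • Tackle two problems of deep models: vanishing gradients and representational bottlenecks
    • The output of an earlier layer is not concatenated with a later activation; it is added to it
    • There are two variants: an identity residual connection when the two shapes match, and a linear residual connection (a linear transformation of the earlier output) when they differ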
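
The two kinds of residual connection can be sketched in plain NumPy, using a toy dense layer in place of the convolutions (an assumption made to keep the example short; dense_relu and the weight names are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, weights):
    """Toy layer: a linear map followed by ReLU."""
    return np.maximum(x @ weights, 0.0)

x = rng.normal(size=(8, 128))

# Identity residual: the block keeps the width at 128, so x is added directly
w_same = rng.normal(size=(128, 128)) * 0.01
y_identity = dense_relu(x, w_same) + x

# Linear residual: the block changes the width to 64, so x is first
# projected to the new shape with a linear map (no activation)
w_narrow = rng.normal(size=(128, 64)) * 0.01
projection = rng.normal(size=(128, 64)) * 0.01
residual = x @ projection
y_linear = dense_relu(x, w_narrow) + residual

print(y_identity.shape, y_linear.shape)
```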

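Inception module implementation


# Inception module
# Assumes x is an existing 4D input tensor.
# padding = 'same' keeps the four branch outputs the same spatial size,
# which the final concatenation requires.
from keras import layers

# Branch a
branch_a = layers.Conv2D(128, 1, activation = 'relu', strides = 2, padding = 'same')(x)

# Branch b
branch_b = layers.Conv2D(128, 1, activation = 'relu', padding = 'same')(x)
branch_b = layers.Conv2D(128, 3, activation = 'relu', strides = 2, padding = 'same')(branch_b)

# Branch c
# In this branch the striding happens in the pooling layer
branch_c = layers.AveragePooling2D(3, strides = 2, padding = 'same')(x)
branch_c = layers.Conv2D(128, 3, activation = 'relu', padding = 'same')(branch_c)

# Branch d
branch_d = layers.Conv2D(128, 1, activation = 'relu', padding = 'same')(x)
branch_d = layers.Conv2D(128, 3, activation = 'relu', padding = 'same')(branch_d)
branch_d = layers.Conv2D(128, 3, activation = 'relu', strides = 2, padding = 'same')(branch_d) # stride 2

# Concatenate the branch outputs along the channel axis
output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis = -1)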
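
Concatenating the branches only works if they all produce the same height and width. The sketch below computes the output size of each branch for an assumed input side of 32 pixels, using the standard size rules for 'valid' padding, floor((n - k) / s) + 1, and 'same' padding, ceil(n / s). The helper names are made up for these notes.

```python
import math

def out_size(n, kernel, stride, padding):
    """Output length of a convolution or pooling op along one spatial axis."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1

def branch_sizes(n, padding):
    # Each branch is a list of (kernel, stride) pairs, in the order applied
    branches = {
        "a": [(1, 2)],
        "b": [(1, 1), (3, 2)],
        "c": [(3, 2), (3, 1)],
        "d": [(1, 1), (3, 1), (3, 2)],
    }
    sizes = {}
    for name, ops in branches.items():
        size = n
        for kernel, stride in ops:
            size = out_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes

valid_sizes = branch_sizes(32, "valid")
same_sizes = branch_sizes(32, "same")
print(valid_sizes)
print(same_sizes)
```

With 'valid' padding the four branches end at different sizes, so they cannot be concatenated along the channel axis; with 'same' padding they all match.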

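Identity residual connection


# Identity residual connection
# Assumes x is an existing 4D tensor with 128 channels, so its shape matches y
from keras import layers

x = ...
y = layers.Conv2D(128, 3, activation = 'relu', padding = 'same')(x)
y = layers.Conv2D(128, 3, activation = 'relu', padding = 'same')(y)
y = layers.Conv2D(128, 3, activation = 'relu', padding = 'same')(y)

# Add the original x back to the output features
y = layers.add([y, x])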

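Linear residual connection


# Linear residual connection
# Assumes x is an existing 4D tensor
from keras import layers

x = ...
y = layers.Conv2D(128, 3, activation = 'relu', padding = 'same')(x)
y = layers.Conv2D(128, 3, activation = 'relu', padding = 'same')(y)
y = layers.MaxPooling2D(2, strides = 2)(y)

# Use a 1 * 1 convolution to linearly downsample the original x tensor to the same shape as y
residual = layers.Conv2D(128, 1, strides = 2, padding = 'same')(x)

# Add the residual tensor (not x itself) back to the output features
y = layers.add([y, residual])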

Shared layer weights


from keras import layers
from keras import Input
from keras.models import Model

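# Instantiate a single LSTM layer, once
lstm = layers.LSTM(32)

# Build the left branch: inputs are variable-length sequences of vectors of size 128
left_input = Input(shape = (None, 128))
left_output = lstm(left_input)

# Build the right branch; calling the same layer instance again reuses its weights
right_input = Input(shape = (None, 128))
right_output = lstm(right_input)

# Build a classifier on top
merged = layers.concatenate([left_output, right_output], axis = -1)
predictions = layers.Dense(1, activation = 'sigmoid')(merged)

# Instantiate and train the model; training updates the LSTM weights based on both inputs.
# left_data, right_data and targets are assumed to be Numpy arrays.
model = Model([left_input, right_input], predictions)
# The model must be compiled before fit; this optimizer and loss are an assumed choice
model.compile(optimizer = 'rmsprop', loss = 'binary_crossentropy')
model.fit([left_data, right_data], targets)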
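
What sharing means can be sketched without Keras: one set of weights is applied to both inputs, and its gradient collects a contribution from each branch. The toy linear encoder below stands in for the shared LSTM (an assumption for brevity; encode and the toy loss are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

# One weight matrix plays the role of the single shared layer instance
shared_weights = rng.normal(size=(128, 32))

def encode(inputs, weights):
    """Toy linear encoder standing in for the shared LSTM."""
    return inputs @ weights

left_data = rng.normal(size=(5, 128))
right_data = rng.normal(size=(5, 128))

# Both branches use the same weights
left_output = encode(left_data, shared_weights)
right_output = encode(right_data, shared_weights)
merged = np.concatenate([left_output, right_output], axis=-1)

def toy_loss(weights):
    return encode(left_data, weights).sum() + encode(right_data, weights).sum()

# For this toy loss, the gradient with respect to the shared weights
# is the sum of one contribution per branch
ones = np.ones((5, 32))
grad_left = left_data.T @ ones
grad_right = right_data.T @ ones
grad_shared = grad_left + grad_right

print(merged.shape, grad_shared.shape)
```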

Models as layers


from keras import layers
from keras import applications
from keras import Input

xception_base = applications.Xception(weights = None, include_top = False)

# Inputs: 250 * 250 RGB images
left_input = Input(shape = (250, 250, 3))
right_input = Input(shape = (250, 250, 3))

# Call the same vision model on both inputs
left_features = xception_base(left_input)
right_features = xception_base(right_input)

# The merged features contain information from both the left and right visual inputs
merged_features = layers.concatenate([left_features, right_features], axis = -1)

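Final notes

Note: the code in this post comes from Deep Learning with Python and is uploaded as electronic notes for study and reference only. The author has run all of it successfully; if anything is missing, please contact the author of this post.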

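My knowledge is limited; if you find any mistakes, corrections are welcome.
This article is intended only for study and discussion and has no commercial purpose. If it raises any copyright concern, please contact the author promptly.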
