Keras Beginner Notes (3)

执念之年-转身

Category: Python study notes. Tags: Keras, Python, deep learning

Article link: https://blog.csdn.net/qq_37587850/article/details/83619709

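These notes follow the official Keras documentation.

Some basic concepts

Tensors

What is a tensor?

A rank-0 tensor is a scalar, a rank-1 tensor is a vector, a rank-2 tensor is a matrix, a rank-3 tensor is a cube of numbers, and so on.

For example, [[1,2],[3,4]] is a rank-2 tensor with two dimensions, or axes. (To match Python's counting, dimensions and axes are numbered from 0 in these notes.) Along axis 0 you see the two row vectors [1,2] and [3,4]; along axis 1 you see the two column vectors [1,3] and [2,4].

Running the following code makes "along an axis" concrete: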

import numpy as np

a = np.array([[1,2],[3,4]])
sum0 = np.sum(a, axis=0)
sum1 = np.sum(a, axis=1)

print(sum0)
print(sum1)

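The output is:

[4 6]
[3 7]

With axis=0, sum() adds along axis 0: the rows [1,2] and [3,4] are added element by element. With axis=1 it adds along axis 1, giving the sum of each row.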
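The same idea carries over to higher ranks. The sketch below uses made-up arrays to show that ndim reports the rank of a tensor and that summing along an axis removes that axis from the shape.

```python
import numpy as np

scalar = np.array(5)                     # rank 0
vector = np.array([1, 2, 3])             # rank 1
matrix = np.array([[1, 2], [3, 4]])      # rank 2
cube = np.arange(24).reshape(2, 3, 4)    # rank 3, shape (2, 3, 4)

ranks = [t.ndim for t in (scalar, vector, matrix, cube)]
print(ranks)  # [0, 1, 2, 3]

# Summing along axis k collapses that axis and keeps the others.
shapes_after_sum = [np.sum(cube, axis=k).shape for k in range(cube.ndim)]
print(shapes_after_sum)  # [(3, 4), (2, 4), (2, 3)]
```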

data_format

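Theano and TensorFlow disagree on how to lay out a batch of images. Theano uses "channels_first", with the channel axis near the front: (samples, channels, height, width). TensorFlow uses "channels_last", with the channel axis at the end: (samples, height, width, channels). In code, K.image_data_format() returns the layout currently in use.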
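Converting between the two layouts is just a reordering of axes. A minimal NumPy sketch, assuming a made-up batch of 8 RGB images of 32x24 pixels:

```python
import numpy as np

# channels_first: (samples, channels, height, width)
x_first = np.zeros((8, 3, 32, 24))

# Move the channel axis from position 1 to the end.
x_last = np.transpose(x_first, (0, 2, 3, 1))
print(x_last.shape)  # (8, 32, 24, 3)

# And back to channels_first.
x_back = np.transpose(x_last, (0, 3, 1, 2))
print(x_back.shape)  # (8, 3, 32, 24)
```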

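Functional model

Keras 1 and Keras 2 removed the old Graph model and added the "functional model API". This API makes it clearer that Sequential is a special case. (A Sequential model has a single input and a single output and runs straight through: layers connect only to their neighbours, with no cross-layer connections at all.) The general model is simply called Model, and Sequential remains as a shortcut for the simple case.

The functional model API is used in a functional-programming style, hence the name "functional model". In short, anything that takes one or more tensors as input and produces one or more tensors as output counts as a "model", whatever it happens to be.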

batch

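Deep learning is optimized with gradient descent. There are two basic ways to perform each parameter update:

Go through the whole dataset to compute the loss once, then compute the gradient of the loss with respect to every parameter and update the parameters. Every single update has to look at all samples in the dataset, so it is computationally expensive and slow, and it does not support online learning. This is batch gradient descent.
Compute the loss after every single sample, then compute the gradient and update the parameters. This is stochastic gradient descent. It is fast, but it converges less well: it may wander around the optimum without ever hitting it, and two successive updates may cancel each other out, making the objective oscillate violently.

To offset the weaknesses of both methods, a compromise is used in practice: mini-batch gradient descent. The data is split into batches and the parameters are updated once per batch. The samples in a batch jointly determine the gradient direction, so the descent is less likely to veer off course and is less random. At the same time, a batch is much smaller than the whole dataset, so each update is cheap. Gradient descent today is essentially always mini-batch based, which is why batch_size appears so often in Keras modules: it is the number of samples in one mini-batch.

Note: the SGD optimizer in Keras is also mini-batch based.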
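To make the mini-batch idea concrete, here is a minimal NumPy sketch that fits a line with mini-batch gradient descent. The data, learning rate and batch size are made up for illustration; it is not how Keras implements its optimizers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data for y = 3x + 1.
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 1

w, b = 0.0, 0.0
lr, batch_size, epochs = 0.1, 20, 200

for epoch in range(epochs):
    order = rng.permutation(len(X))           # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]
        err = w * xb + b - yb                 # prediction error on this batch
        # Gradient of the mean squared error over the batch.
        grad_w = 2 * np.mean(err * xb)
        grad_b = 2 * np.mean(err)
        w -= lr * grad_w                      # one update per batch
        b -= lr * grad_b

print(w, b)  # close to 3 and 1
```

Setting batch_size to len(X) turns this into batch gradient descent, and setting it to 1 gives per-sample stochastic gradient descent.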

epochs

epochs is the number of passes made over the training data during training.

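Keras FAQ: frequently asked questions

How do I make Keras use the GPU?

With TensorFlow as the backend, the code automatically runs on the GPU when one is available on the machine.

How do I use Keras on multiple GPUs?

When several GPUs are available, use TensorFlow as the backend. There are then two ways to run a single model on multiple GPUs: data parallelism and device parallelism. In most cases data parallelism is what you want.

Data parallelism

Data parallelism replicates the target model once on each device and uses each replica to process a different part of the input data. Keras provides a built-in utility, keras.utils.multi_gpu_model, which produces a data-parallel version of any model and supports up to 8 GPUs. For example: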

from keras.utils import multi_gpu_model

# Replicates `model` on 8 GPUs.
# This assumes that your machine has 8 available GPUs.
parallel_model = multi_gpu_model(model, gpus=8)
parallel_model.compile(loss='categorical_crossentropy',
                       optimizer='rmsprop')

# This `fit` call will be distributed on 8 GPUs.
# Since the batch size is 256, each GPU will process 32 samples.
parallel_model.fit(x, y, epochs=20, batch_size=256)
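The arithmetic in the last comment can be checked with a small NumPy sketch. It only illustrates how one batch is divided among replicas and merged again; the array stands in for real samples and no GPU is involved.

```python
import numpy as np

batch = np.arange(256)        # stand-in for one batch of 256 samples
n_replicas = 8                # one model replica per GPU

# Each replica receives one slice of the batch.
sub_batches = np.array_split(batch, n_replicas)
sizes = [len(s) for s in sub_batches]
print(sizes)  # [32, 32, 32, 32, 32, 32, 32, 32]

# The per-replica results are concatenated back into one batch.
merged = np.concatenate(sub_batches)
print(np.array_equal(merged, batch))  # True
```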

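Device parallelism

Device parallelism runs different parts of the same model on different devices. It works best for models with a parallel architecture, for example a model with two branches.

It can be achieved with TensorFlow device scopes. For example: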

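import keras
import tensorflow as tf

# Model where a shared LSTM is used to encode two different sequences in parallel
input_a = keras.Input(shape=(140, 256))
input_b = keras.Input(shape=(140, 256))

shared_lstm = keras.layers.LSTM(64)

# Process the first sequence on one GPU
with tf.device('/gpu:0'):
    encoded_a = shared_lstm(input_a)
# Process the next sequence on another GPU
with tf.device('/gpu:1'):
    encoded_b = shared_lstm(input_b)

# Concatenate results on CPU
with tf.device('/cpu:0'):
    merged_vector = keras.layers.concatenate([encoded_a, encoded_b],
                                             axis=-1)

What do "batch", "epoch" and "sample" mean?

Sample: one item of a dataset, for example one image in an image dataset or one audio clip in a speech dataset.

Batch: a group of samples. The batch is the basic unit of network optimization: each parameter update uses one batch, and the samples in a batch are processed in parallel. Compared with a single sample, a batch approximates the distribution of the dataset better, and the larger the batch the better the approximation, which in training shows up as updates that point in a "more correct" direction. On the other hand, one batch yields only one parameter update, so with large batches the parameters are updated less often per pass over the data. When evaluating or predicting, use the largest batch that resources allow, since that is more computationally efficient.

Epoch: if each batch corresponds to one update of the network, an epoch corresponds to one round of updates. The number of updates in a round can be chosen freely, but it is usually set so that the round covers the dataset exactly once, so one epoch normally means the model has seen the whole dataset once. The main purpose of epochs is to divide training into stages, which makes it easier to monitor and adjust. In Keras, when a validation set is specified, it is evaluated at the end of every epoch to measure the model's performance. Callbacks can also run operations before and after each epoch, such as adjusting the learning rate or printing information about the current model; see the Callbacks section of the Keras documentation for details.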
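The relationship between samples, batches and epochs comes down to a little arithmetic. A sketch with made-up sizes, assuming each epoch covers the dataset exactly once:

```python
import math

n_samples = 1000      # made-up dataset size
batch_size = 32
epochs = 5

# One parameter update per batch; the last batch of an epoch may be smaller.
updates_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (updates_per_epoch - 1) * batch_size
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, last_batch_size, total_updates)  # 32 8 160
```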

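How do I save a Keras model?

model.save(filepath) saves a Keras model, weights included, to a single HDF5 file that contains:

the architecture of the model, so that it can be re-created
the weights of the model
the training configuration (loss, optimizer, and so on)
the state of the optimizer, so that training can resume exactly where it left off

keras.models.load_model(filepath) re-instantiates the model. If the file contains a training configuration, load_model also compiles the model. For example: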

from keras.models import load_model

model.save('my_model.h5')  # creates a HDF5 file 'my_model.h5'
del model  # deletes the existing model

# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')

To save only the architecture of a model, without its weights or training configuration, use:

# save as JSON
json_string = model.to_json()

# save as YAML
yaml_string = model.to_yaml()

This serializes the model to JSON or YAML text. The result is human-readable and can even be edited by hand if needed. A model can then be rebuilt from the saved JSON or YAML:

# model reconstruction from JSON:
from keras.models import model_from_json
model = model_from_json(json_string)

# model reconstruction from YAML
from keras.models import model_from_yaml
model = model_from_yaml(yaml_string)

To save only the weights of a model, use HDF5 as follows (this requires the h5py library):

model.save_weights('my_model_weights.h5')

To load the weights into a different architecture that shares some layers, as in fine-tuning or transfer learning, load them by layer name:

model.load_weights('my_model_weights.h5', by_name=True)

How do I get the output of an intermediate layer?

Option 1: create a new Model whose output is the layer output you are interested in:

from keras.models import Model

model = ...  # create the original model

layer_name = 'my_layer'
intermediate_layer_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model.predict(data)

Option 2: build a Keras backend function:

from keras import backend as K

# with a Sequential model
get_3rd_layer_output = K.function([model.layers[0].input],
                                  [model.layers[3].output])
layer_output = get_3rd_layer_output([X])[0]

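Option 3: when the model contains components such as Dropout or batch normalization layers, which behave differently in training and testing, pass a learning_phase flag to the function as well: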

get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[3].output])

# output in test mode = 0
layer_output = get_3rd_layer_output([X, 0])[0]

# output in train mode = 1
layer_output = get_3rd_layer_output([X, 1])[0]

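How do I use Keras with a dataset that does not fit in memory?

Option 1: use model.train_on_batch(X, y) and model.test_on_batch(X, y) to train and evaluate one batch at a time.

Option 2: write a generator function that yields one batch of samples at a time and train with model.fit_generator(data_generator, steps_per_epoch, epochs).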
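A batch generator of the kind described in option 2 might look like the sketch below. The names batch_generator, load_chunk and fake_load_chunk are hypothetical; a real loader would read the requested rows from disk instead of fabricating them.

```python
import numpy as np

def batch_generator(load_chunk, n_samples, batch_size):
    """Yield (X, y) batches forever, holding only one batch in memory.

    `load_chunk(start, stop)` is a hypothetical loader returning the
    samples in [start, stop).
    """
    while True:  # the generator is expected to loop indefinitely
        for start in range(0, n_samples, batch_size):
            stop = min(start + batch_size, n_samples)
            yield load_chunk(start, stop)

# Stand-in loader for this sketch: fabricates data instead of reading a file.
def fake_load_chunk(start, stop):
    X = np.arange(start, stop, dtype=np.float32).reshape(-1, 1)
    y = (X[:, 0] % 2).astype(np.int64)
    return X, y

gen = batch_generator(fake_load_chunk, n_samples=10, batch_size=4)
batch_sizes = [len(next(gen)[0]) for _ in range(4)]
print(batch_sizes)  # [4, 4, 2, 4]
```

With 10 samples and a batch size of 4, one pass over the data takes 3 batches, so steps_per_epoch would be 3 here.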

How do I interrupt training when the validation loss stops decreasing?

Use the EarlyStopping callback:

from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(X, y, validation_split=0.2, callbacks=[early_stopping])
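The effect of patience is easiest to see on a list of numbers. The function below is a simplified sketch of the stopping rule, not the Keras implementation; stopping_epoch and the loss values are made up for illustration.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.60]
print(stopping_epoch(losses, patience=2))  # 4
```

Training stops after two epochs without improvement, so the later drop to 0.60 is never reached. A larger patience trades extra epochs for robustness against such temporary plateaus.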

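How is the validation set split off from the training set?

If validation_split is set in model.fit, the data is divided into a training set and a validation set. For example, with a value of 0.1 the last 10% of the data is used as the validation set. Note that the data is not shuffled before the validation set is split off.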
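The split described above amounts to slicing off the tail of the arrays. A NumPy sketch with made-up data, meant to mirror the behaviour rather than reproduce the Keras code:

```python
import numpy as np

X = np.arange(20).reshape(10, 2)   # 10 made-up samples
y = np.arange(10)
validation_split = 0.2

# The last 20% of the samples, in their original order, become validation data.
split_at = int(len(X) * (1 - validation_split))
X_train, X_val = X[:split_at], X[split_at:]
y_train, y_val = y[:split_at], y[split_at:]

print(y_train)  # [0 1 2 3 4 5 6 7]
print(y_val)    # [8 9]
```

Because nothing is shuffled first, data that is sorted by class should be shuffled by hand before calling fit, otherwise the validation set may contain only the last classes.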

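Is the training data shuffled during training?

Yes, if the shuffle argument of model.fit is True, which is the default. The training data is then reshuffled at every epoch. Validation data is never shuffled.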
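Per-epoch shuffling can be pictured as drawing a fresh permutation of the sample indices and applying it to inputs and labels alike. A NumPy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.arange(6).reshape(6, 1) * 10   # sample i has feature 10*i
y = np.arange(6)                      # and label i

for epoch in range(3):
    order = rng.permutation(len(X))   # a fresh order for every epoch
    X_epoch, y_epoch = X[order], y[order]
    # Each sample still carries its own label after shuffling.
    aligned = bool(np.all(X_epoch[:, 0] == 10 * y_epoch))
    print(epoch, y_epoch, aligned)
```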

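How do I record the training and validation loss and accuracy after each epoch?

model.fit returns a History object when it finishes. Its history attribute holds the loss values and any other metrics recorded at each epoch of training.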

hist = model.fit(X, y, validation_split=0.2)
print(hist.history)
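The history attribute is a plain dictionary with one list per metric and one entry per epoch, so ordinary Python is enough to inspect it. The numbers below are made up to show the shape of the data and do not come from a real training run.

```python
# Made-up values in the shape of hist.history.
history = {
    "loss":     [0.92, 0.61, 0.48, 0.41],
    "val_loss": [0.88, 0.66, 0.59, 0.62],
}

# Index of the epoch with the lowest validation loss.
best_epoch = min(range(len(history["val_loss"])),
                 key=history["val_loss"].__getitem__)
print(best_epoch, history["val_loss"][best_epoch])  # 2 0.59
```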

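How do I use stateful RNNs?

A stateful RNN reuses the final state of each sample in one batch as the initial state of the sample at the same position in the next batch. Using one therefore assumes that all batches contain the same number of samples and that, if X1 and X2 are successive batches, X2[i] is the follow-up sequence to X1[i] for every i.

To use a stateful RNN, the batch size must be specified explicitly. This can be done with the batch_input_shape argument of the first layer: for example, (32, 10, 16) describes input in batches of 32 samples, each a sequence of 10 timesteps with 16 features per timestep. In the RNN layer, set stateful=True.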
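One way to obtain batches that satisfy the "X2[i] follows X1[i]" requirement is to cut long sequences along the time axis only. The sketch below uses random made-up data of shape (32, 30, 16) and a window of 10 timesteps.

```python
import numpy as np

# 32 long sequences, 30 timesteps each, 16 features per timestep.
long_sequences = np.random.rand(32, 30, 16)
window = 10

# Slice along the time axis; the sample axis keeps its order, so row i
# of every batch comes from the same long sequence.
batches = [long_sequences[:, t:t + window, :]
           for t in range(0, long_sequences.shape[1], window)]

X1, X2 = batches[0], batches[1]
print(len(batches), X1.shape)  # 3 (32, 10, 16)

# Row 5 of X2 continues row 5 of X1.
rejoined = np.concatenate([X1[5], X2[5]])
print(np.array_equal(rejoined, long_sequences[5, :20]))  # True
```

Each batch has the shape (32, 10, 16) used in the batch_input_shape example, and the batches have to be fed in order without shuffling.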

To reset the states, use:

model.reset_states() to reset the states of all layers in the model
layer.reset_states() to reset the states of a specific layer

For example:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# X is our input data, of shape (32, 21, 16);
# we will feed it to our model in sequences of length 10

model = Sequential()
model.add(LSTM(32, input_shape=(10, 16), batch_size=32, stateful=True))
model.add(Dense(16, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# we train the network to predict the 11th timestep given the first 10:
model.train_on_batch(X[:, :10, :], np.reshape(X[:, 10, :], (32, 16)))

# the state of the network has changed. We can feed the follow-up sequences:
model.train_on_batch(X[:, 10:20, :], np.reshape(X[:, 20, :], (32, 16)))

# let's reset the states of the LSTM layer:
model.reset_states()

# another way to do it in this case:
model.layers[0].reset_states()

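Note that predict, fit, train_on_batch, predict_classes and similar methods all update the states of the stateful layers in a model. This lets you do stateful prediction as well as stateful training.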

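How do I "freeze" layers?

Freezing a layer means excluding it from training, so that its weights are never updated. This is often needed when fine-tuning a model, and also when using a fixed embedding layer for text input. Pass a trainable argument to the layer constructor to set whether the layer is trainable:

frozen_layer = Dense(32, trainable=False)

Alternatively, set the trainable attribute of a layer object to True or False after the model has been built. For the change to take effect, call compile() on the model afterwards. For example: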

x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)

frozen_model = Model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model.compile(optimizer='rmsprop', loss='mse')

layer.trainable = True
trainable_model = Model(x, y)
# with this model the weights of the layer will be updated during training
# (which will also affect the above model since it uses the same layer instance)
trainable_model.compile(optimizer='rmsprop', loss='mse')

frozen_model.fit(data, labels)  # this does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # this updates the weights

How do I remove a layer from a Sequential model?

Call .pop() on the model to remove its last layer.

How do I use pre-trained models in Keras?

Keras provides the following models:

VGG16
VGG19
ResNet50
Inception v3

They can be loaded from keras.applications:

from keras.applications.vgg16 import VGG16
from keras.applications.vgg19 import VGG19
from keras.applications.resnet50 import ResNet50
from keras.applications.inception_v3 import InceptionV3

model = VGG16(weights='imagenet', include_top=True)

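For usage examples, see the documentation for the Applications module. The blog post https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html shows how to use these pre-trained models for feature extraction and fine-tuning.

The VGG models are also the basis of many Keras example scripts.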

How do I use HDF5 inputs with Keras?

Use the HDF5Matrix class from keras.utils to read HDF5 input.

You can also use an HDF5 dataset directly:

import h5py
with h5py.File('input/file.hdf5', 'r') as f:
    X_data = f['X_data']
    model.predict(X_data)
