Machine Learning: The Keras Functional API

辣椒种子

Published 2023-12-15 23:48:48

Article link: https://blog.csdn.net/lijunhcn/article/details/135027070

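The code in this article comes from the book "deep learning on keras", a very good book. If your English is good, reading the book directly is recommended; if you are short on time, this series of articles is an alternative.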

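Why the functional API is needed

With the earlier classification and regression examples it may look as if we can handle anything, but it is too early to celebrate: so far we have only built networks by stacking layers (such as convolution layers) one on top of another. That approach works well, but it breaks down in the following three situations:

Models with multiple inputs
Models with multiple outputs
Models with connections that skip over layers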

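Multiple inputs

The book gives a price-prediction example in which three sources of information about the price are available:

Data submitted by the user
A text description given by the customer
Related pictures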

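We could of course train three separate models and take a weighted average of their predictions, but nothing guarantees that the three input sources are independent of each other; a model that sees them jointly may well give better results.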

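Multiple outputs

For example, given data about a collection of books, we may want to predict both the genre and the publication date. The two could again be trained separately, but since we cannot assume they are independent, training them together may work better.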

Graph-like structures

[Figures: a module with several parallel convolution branches, and a residual connection]

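The two figures show, respectively, a module with several parallel convolution branches and a residual connection. Both are hard to build with the stacking approach used so far, so this is where the functional API comes in for these slightly more complex networks.

How to use the functional API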

from keras import Input, layers
# This is a tensor.
input_tensor = Input(shape=(32,))
# A layer is a function.
dense = layers.Dense(32, activation='relu')
# A layer may be called on a tensor, and it returns a tensor.
output_tensor = dense(input_tensor)

Here dense is defined as a function, input_tensor is a tensor, and applying dense to it produces another tensor.
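To make the "a layer is a function" idea concrete, here is a minimal NumPy sketch of a callable object that behaves roughly like a dense layer. `TinyDense` is a hypothetical name used only for illustration; it is not part of Keras, and the weight initialization is arbitrary.

```python
import numpy as np

class TinyDense:
    """Hypothetical stand-in for a dense layer: a callable object holding weights."""

    def __init__(self, input_dim, units, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix and one bias vector, created once at construction
        self.kernel = rng.normal(scale=0.1, size=(input_dim, units))
        self.bias = np.zeros(units)

    def __call__(self, inputs):
        # Calling the object on an array returns a new array: relu(inputs @ W + b)
        return np.maximum(inputs @ self.kernel + self.bias, 0.0)

dense = TinyDense(input_dim=32, units=32)
input_batch = np.ones((4, 32))      # a batch of 4 vectors of size 32
output_batch = dense(input_batch)   # same call syntax as dense(input_tensor)
print(output_batch.shape)
```

The real Keras layer works on symbolic tensors rather than arrays, but the calling convention is the same: build the layer once, then call it on its input.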

from keras.models import Sequential, Model
from keras import layers
from keras import Input
# A Sequential model, which you already know all about.
seq_model = Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64,)))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))
# Its functional equivalent
input_tensor = Input(shape=(64,))
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)
# The Model class turns an input tensor and output tensor into a model
model = Model(input_tensor, output_tensor)
# Let's look at it!
model.summary()

model.summary() is used here to inspect the network we have built:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 64)                0
_________________________________________________________________
dense_1 (Dense)              (None, 32)                2080
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056
_________________________________________________________________
dense_3 (Dense)              (None, 10)                330
=================================================================
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
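The Param # column can be checked by hand. A dense layer has one weight per input/unit pair plus one bias per unit, so the short sketch below recomputes the counts for the 64 → 32 → 32 → 10 network. The helper name `dense_params` is introduced here for illustration only.

```python
def dense_params(input_dim, units):
    # One weight per (input, unit) pair plus one bias per unit
    return input_dim * units + units

# Input size followed by the number of units of each Dense layer
layer_sizes = [64, 32, 32, 10]
counts = [dense_params(i, o) for i, o in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)
print(counts, total)  # [2080, 1056, 330] 3466
```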

[Example] A question-answering system

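Building the question-answering system

Given a question and a reference text, this network finds the answer on its own: each input is processed by an LSTM layer, the two results are joined with concatenate, and the model is trained on the combined representation to produce the answer. It is a typical multi-input problem.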

from keras.models import Model
from keras import layers
from keras import Input
text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500
# Our text input is a variable-length sequence of integers.
# Note that we can optionally name our inputs!
text_input = Input(shape=(None,), dtype='int32', name='text')
# Which we embed into a sequence of vectors of size 64
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
# Which we encoded in a single vector via a LSTM
encoded_text = layers.LSTM(32)(embedded_text)
# Same process (with different layer instances) for the question
question_input = Input(shape=(None,), dtype='int32', name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)
# We then concatenate the encoded question and encoded text
concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)
# And we add a softmax classifier on top
answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)
# At model instantiation, we specify the two inputs and the output:
model = Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop',
  loss='categorical_crossentropy',
  metrics=['acc'])

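This is easy to follow: treat layers as functions, define one input tensor for each of the two inputs, and then, following the order of the graph, call each layer on the output of the previous one.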
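The merge step is worth a closer look. Concatenating along the last axis keeps the batch dimension and places the two feature vectors side by side, so a 32-dimensional text encoding and a 16-dimensional question encoding become one 48-dimensional vector. The NumPy sketch below mimics this with placeholder arrays; the batch size of 4 is an arbitrary assumption.

```python
import numpy as np

batch_size = 4  # arbitrary value for illustration
encoded_text = np.zeros((batch_size, 32))      # stands in for the LSTM(32) output
encoded_question = np.ones((batch_size, 16))   # stands in for the LSTM(16) output

# Joining along the last axis keeps the batch dimension and stacks the features
concatenated = np.concatenate([encoded_text, encoded_question], axis=-1)
print(concatenated.shape)
```

The softmax classifier that follows therefore sees 48 input features per sample.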

Generating random data as inputs and outputs

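import numpy as np
# Dummy dataset sizes (assumed values; any positive integers work)
num_samples = 1000
max_length = 100
# Let's generate some dummy Numpy data
text = np.random.randint(1, text_vocabulary_size, size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, size=(num_samples, max_length))
# Answers are one-hot encoded, not integers:
# draw one class index per sample, then turn it into a one-hot row
answer_indices = np.random.randint(0, answer_vocabulary_size, size=num_samples)
answers = np.eye(answer_vocabulary_size)[answer_indices]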
# Fitting using a list of inputs
model.fit([text, question], answers, epochs=10, batch_size=128)
# Fitting using a dictionary of inputs (only if inputs were named!)
model.fit({'text': text, 'question': question}, answers,
  epochs=10, batch_size=128)

Here we simply generate some random data and feed it to the network.
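Because the model is compiled with categorical_crossentropy, each target row is expected to be a one-hot vector: all zeros except a single 1 at the index of the correct class. As a small sketch of what that looks like, the snippet below builds one-hot rows with NumPy; the class count of 5 and the sample count of 8 are hypothetical values chosen to keep the arrays small.

```python
import numpy as np

num_classes = 5   # hypothetical small value so the rows are easy to inspect
rng = np.random.default_rng(0)
class_indices = rng.integers(0, num_classes, size=8)   # one class index per sample

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[class_indices]

print(one_hot.shape)
print(one_hot.sum(axis=1))  # every row sums to exactly 1
```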

[Example] Multiple outputs

from keras import layers
from keras import Input
from keras.models import Model
vocabulary_size = 50000
num_income_groups = 10
posts_input = Input(shape=(None,), dtype='int32', name='posts')
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)
x = layers.Conv1D(128, 5, activation='relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)
# Note that we are giving names to the output layers.
age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups, activation='softmax', name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)
model = Model(posts_input, [age_prediction, income_prediction, gender_prediction])

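The code above is not difficult either: everything up to and including feature extraction is shared, and only the final output layers differ. Next comes the choice of losses in compile: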

model.compile(optimizer='rmsprop',
  loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'])
# Equivalent (only possible if you gave names to the output layers!):
model.compile(optimizer='rmsprop',
  loss={'age': 'mse',
  'income': 'categorical_crossentropy',
  'gender': 'binary_crossentropy'})

Two ways of specifying the losses are shown here; the second one requires that the output layers have been given names.

model.compile(optimizer='rmsprop',
  loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
  loss_weights=[0.25, 1., 10.])
# Equivalent (only possible if you gave names to the output layers!):
model.compile(optimizer='rmsprop',
  loss={'age': 'mse',
  'income': 'categorical_crossentropy',
  'gender': 'binary_crossentropy'},
  loss_weights={'age': 0.25,
  'income': 1.,
  'gender': 10.})

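When some outputs matter more than others, we can assign weights to the losses; here the largest weight goes to the gender loss, meaning we care most about the gender prediction.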
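The quantity that is minimized is the weighted sum of the individual losses. The plain-Python sketch below shows how the weights change each output's contribution. The per-output loss values are hypothetical, picked only to illustrate that losses can sit on very different scales.

```python
def total_loss(losses, weights):
    # Weighted sum over all outputs
    return sum(weights[name] * losses[name] for name in losses)

# Hypothetical per-output loss values, chosen only to illustrate different scales
losses = {'age': 4.0, 'income': 1.5, 'gender': 0.1}
loss_weights = {'age': 0.25, 'income': 1.0, 'gender': 10.0}

contributions = {name: loss_weights[name] * losses[name] for name in losses}
print(contributions)
print(total_loss(losses, loss_weights))
```

With these numbers the unweighted age loss would dominate the sum, while after weighting the three outputs contribute comparable amounts.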

# age_targets, income_targets and gender_targets are assumed to be Numpy arrays
model.fit(posts, [age_targets, income_targets, gender_targets],
  epochs=10, batch_size=64)
# Equivalent (only possible if you gave names to the output layers!):
model.fit(posts, {'age': age_targets,
  'income': income_targets,
  'gender': gender_targets},
  epochs=10, batch_size=64)

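Feeding the data.

Building graph-shaped networks

The functional API is a powerful tool that can build almost any graph of layers, but one kind is out of reach: graphs with cycles. Everything discussed below is therefore an acyclic graph.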

from keras import layers
# We assume the existence of a 4D input tensor `x`
# Every branch has the same stride value (2), which is necessary to keep all
# branch outputs the same size, so as to be able to concatenate them.
# padding='same' is used throughout so that the spatial sizes actually match.
branch_a = layers.Conv2D(128, 1, activation='relu', strides=2, padding='same')(x)
# In this branch, the striding occurs in the spatial convolution layer
branch_b = layers.Conv2D(128, 1, activation='relu')(x)
branch_b = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_b)
# In this branch, the striding occurs in the average pooling layer
# (pooling layers take no activation argument)
branch_c = layers.AveragePooling2D(3, strides=2, padding='same')(x)
branch_c = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_c)
branch_d = layers.Conv2D(128, 1, activation='relu')(x)
branch_d = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_d)
# Finally, we concatenate the branch outputs to obtain the module output
output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)

Once layers are understood as functions, this is straightforward; the only thing to pay attention to is how concatenate is used.
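Concatenating along the channel axis only works if every branch produces the same height and width. The sketch below applies the usual output-size formulas for 'valid' and 'same' padding to the four branches, each written as a list of (kernel size, stride) pairs. The 32-pixel input side and the helper names are assumptions made for illustration.

```python
import math

def output_size(input_size, kernel_size, stride=1, padding='valid'):
    # Spatial output length of a convolution or pooling layer along one dimension
    if padding == 'same':
        return math.ceil(input_size / stride)
    return (input_size - kernel_size) // stride + 1

def branch_sizes(input_size, padding):
    # Each branch is a sequence of (kernel_size, stride) operations
    branches = {
        'a': [(1, 2)],
        'b': [(1, 1), (3, 2)],
        'c': [(3, 2), (3, 1)],
        'd': [(1, 1), (3, 1), (3, 2)],
    }
    sizes = {}
    for name, ops in branches.items():
        size = input_size
        for kernel_size, stride in ops:
            size = output_size(size, kernel_size, stride, padding)
        sizes[name] = size
    return sizes

print(branch_sizes(32, 'valid'))  # {'a': 16, 'b': 15, 'c': 13, 'd': 14}
print(branch_sizes(32, 'same'))   # {'a': 16, 'b': 16, 'c': 16, 'd': 16}
```

Under 'valid' padding the four branches end up with different sizes and cannot be concatenated, whereas 'same' padding gives every branch the same output size.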

Building a residual network

from keras import layers
# We assume the existence of a 4D input tensor `x`
x = ...
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)
# We use a 1x1 convolution to linearly downsample
# the original `x` tensor to the same shape as `y`
residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)
# We add the residual tensor back to the output features
y = layers.add([y, residual])

The key is simply knowing how to use add.
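Since add is an element-wise sum, both tensors must have exactly the same shape, which is why the shortcut branch has to be downsampled and projected first. The NumPy sketch below illustrates the idea with placeholder arrays: a 1x1 convolution with stride 2 (bias omitted) amounts to taking every other pixel and applying one linear map to its channels. All shapes and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8, 64))    # hypothetical input: batch, height, width, channels

# Stand-in for the main branch output: spatially halved, 128 channels
y = rng.normal(size=(1, 4, 4, 128))

# 1x1 convolution with stride 2: keep every other pixel, then map 64 -> 128 channels
projection = rng.normal(size=(64, 128))
residual = x[:, ::2, ::2, :] @ projection   # shape (1, 4, 4, 128)

out = y + residual                          # element-wise sum, as layers.add does
print(out.shape)
```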
We also run into cases where a layer needs to be shared, and Keras makes this easy as well.

from keras import layers
from keras import Input
from keras.models import Model
# We instantiate a single LSTM layer, once
lstm = layers.LSTM(32)
# Building the left branch of the model
# -------------------------------------
# Inputs are variable-length sequences of vectors of size 128
left_input = Input(shape=(None, 128))
left_output = lstm(left_input)
# Building the right branch of the model
# --------------------------------------
right_input = Input(shape=(None, 128))
# When we call an existing layer instance,
# we are reusing its weights
right_output = lstm(right_input)
# Building the classifier on top
# ------------------------------
merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)
# Instantiating and training the model
# ------------------------------------
model = Model([left_input, right_input], predictions)
# When you train such a model, the weights of the `lstm` layer
# are updated based on both inputs.
model.fit([left_data, right_data], targets)
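One visible effect of sharing is the parameter count: the LSTM's weights exist only once, however many times the layer is called. The sketch below uses the standard LSTM count of four weight sets (three gates plus the candidate cell state), each with input weights, recurrent weights and a bias. It assumes the default settings with biases enabled, and the helper names are introduced here for illustration.

```python
def lstm_params(input_dim, units):
    # Four weight sets, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per (input, unit) pair plus one bias per unit
    return input_dim * units + units

shared_lstm = lstm_params(input_dim=128, units=32)
classifier = dense_params(input_dim=32 + 32, units=1)   # concatenated 32 + 32 features

shared_total = shared_lstm + classifier          # one LSTM reused by both branches
separate_total = 2 * shared_lstm + classifier    # what two independent LSTMs would need
print(shared_total, separate_total)  # 20673 41281
```

The shared model has roughly half the parameters of a two-LSTM version, and both inputs contribute gradients to the same set of weights.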