CNTK: Feedforward Networks

shaw_xianyu

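The goal of this part of the CNTK 102 tutorial is to get familiar with the components of the CNTK Python library through a classification task. If you have already read the logistic regression post, you can skip the introduction below.

Introduction:

Problem: A cancer hospital has provided data and wants us to determine whether a patient has a fatal malignant tumor or a benign one. This is known as a classification problem. To help classify each patient, we are given their age and the size of the tumor. Intuitively, younger patients and/or patients with small tumors are less likely to have a malignant tumor. In the plot below, red indicates malignant and blue indicates benign. Note: this is a toy example for learning; in real life, a diagnosis or treatment decision draws on many features from different tests and examinations, together with a doctor's expertise.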

 
from IPython.display import Image  
Image(url="https://www.cntk.ai/jup/cancer_data_plot.jpg", width=400, height=400)  

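Goal: We want to learn a classifier that assigns any patient to the benign or malignant category based on two features (age and tumor size).

CNTK 101, covered in the previous post, introduced logistic regression, a linear classifier that still misclassifies some data points. In the real world, linear classifiers cannot model data accurately when we know little about how to construct good features. This often limits accuracy and calls for models with more complex decision boundaries. In this tutorial we combine multiple linear units into a nonlinear classifier.

Approach: Any learning algorithm typically has five stages: reading the data, preprocessing the data, creating the model, learning the model parameters, and evaluating (also known as testing or predicting with) the model.

Feedforward network model:

The dataset used here is the same as in the earlier logistic regression tutorial.

As the figure above shows, a feedforward neural network is one in which the connections between units do not form a cycle; it is the simplest kind of neural network. Information moves in one direction only: from the input nodes, through the hidden layers (if there are any), to the output layer.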

Checking the CNTK installation and importing components

# Import the relevant components
from __future__ import print_function # Use a function definition from future version (say 3.x from 2.7 interpreter)
import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np
import sys
import os

import cntk as C
import cntk.tests.test_utils
cntk.tests.test_utils.set_device_from_pytest_env() # (only needed for our build system)
C.cntk_py.set_fixed_random_seed(1) # fix a random seed for CNTK components

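Data generation

This works the same way as in CNTK 101 (the previous post): we use the numpy library to generate some synthetic cancer data. We define two input features and two output classes. Each data point in the training data carries exactly one label, benign or malignant, so this is a binary classification problem.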

# Ensure we always get the same amount of randomness
np.random.seed(0)

# Define the data dimensions
input_dim = 2
num_output_classes = 2

Inputs and labels

# Helper function to generate a random data sample
def generate_random_data_sample(sample_size, feature_dim, num_classes):
    # Create synthetic data using NumPy.
    Y = np.random.randint(size=(sample_size, 1), low=0, high=num_classes)

    # Make sure that the data is separable
    X = (np.random.randn(sample_size, feature_dim)+3) * (Y+1)
    X = X.astype(np.float32)
    # converting class 0 into the vector "1 0 0",
    # class 1 into vector "0 1 0", ...
    class_ind = [Y==class_number for class_number in range(num_classes)]
    Y = np.asarray(np.hstack(class_ind), dtype=np.float32)
    return X, Y

# Create the input variables denoting the features and the label data. Note: the input
# does not need additional info on the number of observations (samples) since CNTK
# creates only the network topology first
mysamplesize = 64
features, labels = generate_random_data_sample(mysamplesize, input_dim, num_output_classes)
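
The one-hot step inside the helper is easy to miss, so here is a minimal NumPy-only sketch of just that step. The four class ids in `Y` are made up for illustration; the two lines that build `class_ind` and `one_hot` mirror the helper.

```python
import numpy as np

num_classes = 2
# Integer class ids for four samples, shaped (4, 1) like Y in the helper
Y = np.array([[0], [1], [1], [0]])

# One boolean column per class; hstack places the columns side by side
class_ind = [Y == class_number for class_number in range(num_classes)]
one_hot = np.asarray(np.hstack(class_ind), dtype=np.float32)

print(one_hot.shape)  # (4, 2)
print(one_hot)
```

Each row has a single 1 in the column of its class, which is the label format the loss function later expects.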

Visualizing the generated data

# Plot the data
import matplotlib.pyplot as plt
%matplotlib inline

# given this is a 2 class
colors = ['r' if l == 0 else 'b' for l in labels[:,0]]

plt.scatter(features[:,0], features[:,1], c=colors)
plt.xlabel("Scaled age (in yrs)")
plt.ylabel("Tumor size (in cm)")
plt.show()

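If matplotlib.pyplot fails to run, running conda install matplotlib should fix it.

Model creation

The feedforward network here is fairly simple: it has two hidden layers with 50 nodes each.

In the example, the green nodes are the hidden-layer nodes; each layer has 50 of them, and the number of layers is set to 2.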

num_hidden_layers = 2
hidden_layers_dim = 50

Inputs and outputs

# The input variable (representing 1 observation, in our example of age and size) x, which
# in this case has a dimension of 2.
#
# The label variable has a dimensionality equal to the number of output classes in our case 2.

input = C.input_variable(input_dim)
label = C.input_variable(num_output_classes)

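The input to the network is a data point (a sample), which in this tutorial is one of the blue or red dots. The input has two dimensions, age and tumor size, and the output label is benign or malignant.

Setting up the feedforward network

The first layer is the input layer, which takes an input vector x of dimension m. It feeds the first hidden layer z1, which has dimension n. The output from the input layer to the first hidden layer is

$$z_1 = W \cdot x + b$$

where W is the weight matrix and b is an n-dimensional bias vector.

The linear_layer function performs two operations: (1) it multiplies the input by the weights W, and (2) it adds the bias term b.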

def linear_layer(input_var, output_dim):
    input_dim = input_var.shape[0]

    weight = C.parameter(shape=(input_dim, output_dim))
    bias = C.parameter(shape=(output_dim))

    return bias + C.times(input_var, weight)
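
To make the shapes concrete, here is a minimal NumPy sketch of the same computation. It is not CNTK code: `linear_layer_np` and the numbers are hypothetical, chosen only to show that an input of shape (batch, input_dim) times a weight of shape (input_dim, output_dim) plus a bias of shape (output_dim,) gives an output of shape (batch, output_dim).

```python
import numpy as np

def linear_layer_np(x, weight, bias):
    # x: (batch, input_dim), weight: (input_dim, output_dim), bias: (output_dim,)
    return x @ weight + bias

x = np.array([[1.0, 2.0]])                # one sample with two features
weight = np.array([[1.0, 0.0, -1.0],
                   [0.5, 1.0, 2.0]])      # maps 2 inputs to 3 outputs
bias = np.array([0.1, 0.2, 0.3])

out = linear_layer_np(x, weight, bias)
print(out.shape)  # (1, 3)
print(out)
```

Note that the weight is stored as (input_dim, output_dim) and multiplied from the right, matching the argument order of `C.times(input_var, weight)` in the function above.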

The next step is to transform the output of the linear layer with a nonlinear function; the activation function used here is sigmoid.

def dense_layer(input_var, output_dim, nonlinearity):
    l = linear_layer(input_var, output_dim)

    return nonlinearity(l)

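The nonlinearity parameter defined here makes it easy to try different activation functions.

We have now created one hidden layer; its output becomes the input of the next layer. This example has two hidden layers, so the code is

h1 = dense_layer(input_var, hidden_layer_dim, sigmoid)
h2 = dense_layer(h1, hidden_layer_dim, sigmoid)

To make the number of layers configurable, the code can instead be written as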

h = dense_layer(input_var, hidden_layer_dim, sigmoid)
for i in range(1, num_hidden_layers):
    h = dense_layer(h, hidden_layer_dim, sigmoid)

# Define a multilayer feedforward classification model
def fully_connected_classifier_net(input_var, num_output_classes, hidden_layer_dim,
                                   num_hidden_layers, nonlinearity):

    h = dense_layer(input_var, hidden_layer_dim, nonlinearity)
    for i in range(1, num_hidden_layers):
        h = dense_layer(h, hidden_layer_dim, nonlinearity)

    return linear_layer(h, num_output_classes)

The output of the network is denoted z.

# Create the fully connected classifier
z = fully_connected_classifier_net(input, num_output_classes, hidden_layers_dim,
                                   num_hidden_layers, C.sigmoid)
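
As a sanity check on the architecture, the sketch below reproduces the same forward pass in plain NumPy: two sigmoid hidden layers of 50 units followed by a linear output layer. The random weights, the `forward` helper, and the name `scores` are assumptions for illustration only; CNTK manages its own parameters and initialization.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, params):
    # params is a list of (weight, bias) pairs, one per layer.
    # Every layer except the last is followed by a sigmoid.
    h = x
    for weight, bias in params[:-1]:
        h = sigmoid(h @ weight + bias)
    weight, bias = params[-1]
    return h @ weight + bias  # raw scores; softmax is applied inside the loss

rng = np.random.default_rng(0)
dims = [2, 50, 50, 2]  # input, two hidden layers, output classes
params = [(rng.normal(size=(d_in, d_out)), np.zeros(d_out))
          for d_in, d_out in zip(dims[:-1], dims[1:])]

x = rng.normal(size=(25, 2))  # one minibatch of 25 samples
scores = forward(x, params)
print(scores.shape)  # (25, 2)
```

The last layer is deliberately left linear, just as `fully_connected_classifier_net` returns `linear_layer(h, num_output_classes)` without an activation.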

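Although the network above helps us understand how to build a network with CNTK, the layers library is quicker and more convenient. It provides predefined, commonly used "layers" (Lego-like blocks), which simplifies the design of networks composed of standard layers stacked on top of each other. For example, dense_layer is already available through the Dense layer function, which we can use to compose a deep model and then pass the input variable to that model to get the output.

Training

The softmax function converts the network output into a probability for each class. To train the classifier we need to define a loss function that measures the error between the output and the true labels, which training then minimizes.

Cross-entropy is a commonly used loss function. Its mathematical form is:

$$H(p) = -\sum_{j=1}^{|y|} y_j \log(p_j)$$

where p is the predicted probability computed by softmax and y is the true label. See CNTK 101 (the previous post) for details.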

loss = C.cross_entropy_with_softmax(z, label)
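
To see what this loss computes, here is a small NumPy sketch of softmax followed by cross-entropy for a single sample. The helper names and the input scores are hypothetical; it illustrates the formula and is not how CNTK implements the operation internally.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum first for numerical stability
    shifted = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(p, y):
    # H(p) = -sum_j y_j * log(p_j), computed per sample
    return -np.sum(y * np.log(p), axis=-1)

z_scores = np.array([[2.0, 0.0]])   # raw network output for one sample
y_true = np.array([[1.0, 0.0]])     # one-hot label: class 0

p = softmax(z_scores)
sample_loss = cross_entropy(p, y_true)
print(p, sample_loss)
```

Because the label is one-hot, only the log-probability of the true class contributes to the sum, so the loss shrinks toward 0 as the predicted probability of the correct class approaches 1.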

For evaluation, we compare the model's predictions with the actual labels.

eval_error = C.classification_error(z, label)
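
Conceptually, the classification error is the fraction of samples whose highest-scoring class differs from the labeled class. A minimal NumPy sketch of that idea follows; `classification_error_np` and the sample values are made up for illustration.

```python
import numpy as np

def classification_error_np(scores, one_hot_labels):
    predicted = np.argmax(scores, axis=1)        # class with the highest score
    actual = np.argmax(one_hot_labels, axis=1)   # class marked in the label
    return float(np.mean(predicted != actual))

scores = np.array([[2.0, -1.0],
                   [0.3, 0.9],
                   [1.5, 1.0],
                   [-0.2, 0.4]])
labels = np.array([[1, 0],
                   [0, 1],
                   [0, 1],
                   [0, 1]], dtype=np.float32)

print(classification_error_np(scores, labels))  # 0.25
```

Only the third sample is misclassified (predicted class 0, labeled class 1), so one out of four predictions is wrong.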

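During training we try to minimize the loss. Here we use stochastic gradient descent (SGD). Typically, the model parameters start from a random initialization. We then compute the error between the predictions and the true labels and apply gradient descent to produce a new set of model parameters. The parameters are updated repeatedly until the error no longer changes significantly or falls within some target range. One of the key parameters of this optimization is learning_rate, the learning rate, which scales how much the parameters change in each iteration.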
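
The update rule described above can be sketched on a toy problem. The example below uses plain (non-stochastic) gradient descent on the one-dimensional function f(w) = (w - 3)^2, standing in for the minibatch version that CNTK runs; the function name and the values are hypothetical.

```python
def gradient_descent(grad, w0, learning_rate, num_steps):
    # Repeatedly step against the gradient, scaled by the learning rate
    w = w0
    for _ in range(num_steps):
        w = w - learning_rate * grad(w)
    return w

def grad(w):
    # Gradient of f(w) = (w - 3)^2, whose minimum is at w = 3
    return 2.0 * (w - 3.0)

w_small_lr = gradient_descent(grad, w0=0.0, learning_rate=0.1, num_steps=20)
w_large_lr = gradient_descent(grad, w0=0.0, learning_rate=0.4, num_steps=20)
print(w_small_lr, w_large_lr)
```

With the same number of steps, the larger learning rate ends closer to the minimum on this particular function. On harder problems a learning rate that is too large can overshoot and diverge, which is why it usually needs tuning.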

# Instantiate the trainer object to drive the model training
learning_rate = 0.5
lr_schedule = C.learning_parameter_schedule(learning_rate)
learner = C.sgd(z.parameters, lr_schedule)
trainer = C.Trainer(z, (loss, eval_error), [learner])
# Define a utility function to compute the moving average sum.
# A more efficient implementation is possible with np.cumsum() function
def moving_average(a, w=10):
    if len(a) < w:
        return a[:]    # Need to send a copy of the array
    return [val if idx < w else sum(a[(idx-w):idx])/w for idx, val in enumerate(a)]


# Defines a utility that prints the training progress
def print_training_progress(trainer, mb, frequency, verbose=1):
    training_loss = "NA"
    eval_error = "NA"

    if mb%frequency == 0:
        training_loss = trainer.previous_minibatch_loss_average
        eval_error = trainer.previous_minibatch_evaluation_average
        if verbose:
            print ("Minibatch: {}, Train Loss: {}, Train Error: {}".format(mb, training_loss, eval_error))

    return mb, training_loss, eval_error
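
The comment on moving_average mentions that np.cumsum allows a more efficient version. One possible sketch is shown below, under the assumption that it should match the helper exactly: the first w entries are passed through unchanged, and each later entry is the mean of the w values before it. The name `moving_average_cumsum` is hypothetical.

```python
import numpy as np

def moving_average_cumsum(a, w=10):
    a = np.asarray(a, dtype=np.float64)
    if len(a) < w:
        return a.copy()
    # c[k] holds the sum of a[:k], so c[i] - c[i - w] is the sum of a[i-w:i]
    c = np.concatenate(([0.0], np.cumsum(a)))
    out = a.copy()                       # the first w entries stay as they are
    idx = np.arange(w, len(a))
    out[w:] = (c[idx] - c[idx - w]) / w
    return out

losses = [float((i * i) % 7) for i in range(30)]   # made-up loss values
smoothed = moving_average_cumsum(losses)
print(smoothed[:12])
```

This returns a NumPy array rather than a list, and it replaces the repeated window sums with a single cumulative sum.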

Training the model

With the basic parameters set, we can start training the simple feedforward network.

# Initialize the parameters for the trainer
minibatch_size = 25
num_samples = 20000
num_minibatches_to_train = num_samples / minibatch_size

# Run the trainer and perform model training
training_progress_output_freq = 20

plotdata = {"batchsize":[], "loss":[], "error":[]}

for i in range(0, int(num_minibatches_to_train)):
    features, labels = generate_random_data_sample(minibatch_size, input_dim, num_output_classes)

    # Specify the input variables mapping in the model to actual minibatch data for training
    trainer.train_minibatch({input : features, label : labels})
    batchsize, loss, error = print_training_progress(trainer, i,
                                                     training_progress_output_freq, verbose=0)

    if not (loss == "NA" or error =="NA"):
        plotdata["batchsize"].append(batchsize)
        plotdata["loss"].append(loss)
        plotdata["error"].append(error)

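Here we plot the loss. Note that the loss decreases overall, although not necessarily monotonically. One way to smooth the loss is to increase the minibatch_size set earlier.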

# Compute the moving average loss to smooth out the noise in SGD
plotdata["avgloss"] = moving_average(plotdata["loss"])
plotdata["avgerror"] = moving_average(plotdata["error"])

# Plot the training loss and the training error
import matplotlib.pyplot as plt

plt.figure(1)
plt.subplot(211)
plt.plot(plotdata["batchsize"], plotdata["avgloss"], 'b--')
plt.xlabel('Minibatch number')
plt.ylabel('Loss')
plt.title('Minibatch run vs. Training loss')

plt.show()

plt.subplot(212)
plt.plot(plotdata["batchsize"], plotdata["avgerror"], 'r--')
plt.xlabel('Minibatch number')
plt.ylabel('Label Prediction Error')
plt.title('Minibatch run vs. Label Prediction Error')
plt.show()

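After training the model, the next step is to evaluate it. With a real dataset, typically 70% of the data is used for training and the rest is held out as the test set. In this example, we simply generate some new data to use as test data.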

# Generate new data
test_minibatch_size = 25
features, labels = generate_random_data_sample(test_minibatch_size, input_dim, num_output_classes)

trainer.test_minibatch({input : features, label : labels})

The final run gave a result of 0.12, that is, an average classification error of 12% on this test minibatch.
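
For a real dataset, where fresh samples cannot simply be generated, the 70/30 split mentioned earlier could look roughly like the NumPy sketch below. The helper `train_test_split_np`, its parameters, and the toy arrays are hypothetical and not part of the CNTK tutorial.

```python
import numpy as np

def train_test_split_np(features, labels, train_fraction=0.7, seed=0):
    # Shuffle the sample indices, then cut them into a training and a test part
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(features))
    cut = int(round(train_fraction * len(features)))
    train_idx, test_idx = order[:cut], order[cut:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

X = np.arange(40, dtype=np.float32).reshape(20, 2)   # 20 toy samples, 2 features
Y = np.arange(20) % 2                                # toy class ids
X_train, Y_train, X_test, Y_test = train_test_split_np(X, Y)
print(X_train.shape, X_test.shape)  # (14, 2) (6, 2)
```

Shuffling before the cut matters when the data is stored in some order (for example, sorted by class), since otherwise the test set would not be representative.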