Creating a Neural Network from Scratch in Python (Part 18): Model Object

First published 2024-12-28 18:18:55 · Last modified 2024-12-29 01:45:03

License: CC 4.0 BY-SA

Tags: #python #neural-networks #programming-languages #deep-learning #artificial-intelligence #regression

Model Object

Introduction

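We have built a model that can perform forward passes, backward passes, and auxiliary tasks such as measuring accuracy. Getting there meant writing a fair amount of code and making changes inside some fairly large code blocks. At this point it starts to make more sense to turn the model itself into an object, especially since we will want to save and load this object for future prediction tasks. The object will also let us cut down on some repeated lines of code, which makes the current code base easier to work with and makes new models easier to build. To carry out the conversion, we will use the model we worked on most recently, the regression model trained on sine data: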

from nnfs.datasets import sine_data

X, y = sine_data()

With the data in hand, the first step in building the model class is to add the layers we want. We can begin the class like this:

# Model class
class Model:
    def __init__(self):
        # Create a list of network objects
        self.layers = []
        
    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

Now we can add layers through the model object's add method. This alone improves readability considerably. Let's add some layers:

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

We can also inspect the model at this point:

print(model.layers)

>>>
[<__main__.Layer_Dense object at 0x000001D1EB2A2900>, 
<__main__.Activation_ReLU object at 0x000001D1EB2A2180>, 
<__main__.Layer_Dense object at 0x000001D1EB2A3F20>, 
<__main__.Activation_ReLU object at 0x000001D1EB2B9220>, 
<__main__.Layer_Dense object at 0x000001D1EB2BB800>, 
<__main__.Activation_Linear object at 0x000001D1EB2BBA40>]

Besides adding layers, we also want to set a loss function and an optimizer for the model. To do this, we will create a method called set:

# Set loss and optimizer
def set(self, *, loss, optimizer):
    self.loss = loss
    self.optimizer = optimizer

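The asterisk (*) in the parameter list marks every parameter after it (here, loss and optimizer) as keyword-only. Because these parameters have no default values, they are required keyword arguments: they must be passed by name, which makes the calling code easier to read.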
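To see the keyword-only behavior in isolation, here is a minimal sketch. It uses a stripped-down stand-in for the Model class that keeps only the set method, and the strings "mse" and "adam" are placeholders for real loss and optimizer objects (assumptions made purely for illustration):

```python
# Stripped-down stand-in for the Model class, keeping only the set method
class Model:
    def set(self, *, loss, optimizer):
        self.loss = loss
        self.optimizer = optimizer


model = Model()

# Passing both arguments by name works
model.set(loss="mse", optimizer="adam")

# Passing them positionally is rejected by Python with a TypeError
try:
    model.set("mse", "adam")
    positional_rejected = False
except TypeError:
    positional_rejected = True

print(positional_rejected)
```

Omitting either argument raises a TypeError as well, since neither parameter has a default value.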

Now we can add a call to this method on our newly created model object and pass in the loss and optimizer objects:

# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )

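With the model's layers, loss function, and optimizer set, the next step is training, so we will add a train method. For now it is just a placeholder that we will fill in shortly: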

# Train the model
def train(self, X, y, *, epochs=1, print_every=1):
    # Main training loop
    for epoch in range(1, epochs+1):
        # Temporary
        pass

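Next, we can add a call to the train method to the model definition. We pass the training data, the number of epochs (10000, the value we have been using), and how often to print a training summary. We neither need nor want a printout at every step, so we make this configurable: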

# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )

model.train(X, y, epochs=10000, print_every=100)

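To train, we need to perform a forward pass. Doing this inside the object is slightly more involved, because we have to do it in a loop over the layers and need to know the previous layer's output to pass data along correctly. One problem with looking up the previous layer is that the first layer has none: the first layer we define is the first hidden layer. One option is therefore to create an "input layer". It is considered a layer of the neural network but has no weights or biases associated with it. The input layer only holds the training data, and we use it solely as the "previous layer" of the first layer when iterating over the layers in a loop. We will create a new class, named in the same style as Layer_Dense, called Layer_Input: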

# Input "layer"
class Layer_Input:
    # Forward pass
    def forward(self, inputs):
        self.output = inputs

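The forward method sets the training samples as self.output. This attribute is common to the other layers. There is no need to implement a backward method here, because we will never use it. Creating this class may seem superfluous right now, but how we use it should become clear soon. Next, we set previous-layer and next-layer attributes on every layer of the model. We will create a method called finalize in the Model class: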

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss

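This code creates an input layer and sets the next and prev references for every layer in the model object's self.layers list. We created the Layer_Input class so that the prev attribute of the first hidden layer can be set inside the loop, since we will call all layers in a uniform way. For the last layer, the next object is the loss function we have already created. Because finalize reads self.loss, the set method must be called before finalize.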
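The linking step can be tried on its own with placeholder objects. The sketch below is an illustration rather than the article's method: the Dummy class and the standalone link function are hypothetical names, and the condensed conditional expressions also handle a one-layer list, which the finalize method above does not (it would index past the end of the list).

```python
# Input "layer", as defined in the article
class Layer_Input:
    def forward(self, inputs):
        self.output = inputs


class Dummy:
    """Hypothetical placeholder for any layer, activation, or loss object."""
    def __init__(self, name):
        self.name = name


def link(layers, loss):
    """Set prev/next on each object, following the same idea as finalize."""
    input_layer = Layer_Input()
    layer_count = len(layers)
    for i in range(layer_count):
        # The first object points back at the input layer
        layers[i].prev = input_layer if i == 0 else layers[i - 1]
        # The last object points forward at the loss
        layers[i].next = loss if i == layer_count - 1 else layers[i + 1]
    return input_layer


layers = [Dummy("dense1"), Dummy("relu"), Dummy("dense2")]
loss = Dummy("loss")
input_layer = link(layers, loss)

next_names = [layer.next.name for layer in layers]
print(next_names)  # ['relu', 'dense2', 'loss']
```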

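With the layer information needed for the model object's forward pass in place, let's add a forward method. We will use it both during training and later, when only making predictions (also known as model inference). Continuing in the Model class: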

# Model class
class Model:
    ...
    # Performs forward pass
    def forward(self, X):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

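Here we take in the input data $X$ and simply pass it through the input_layer of the Model object, which creates an output attribute in that object. From there, we iterate over the layers in self.layers, which start with the first hidden layer. For each layer, we perform a forward pass on the previous layer's output, layer.prev.output. For the first hidden layer, layer.prev is self.input_layer. Calling a layer's forward method creates that layer's output attribute, which is then passed as input to the next layer's forward call. Once we have iterated over all layers, we return the output of the last layer.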
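The chaining described above can be traced with toy layers whose effect is easy to follow by hand. In this sketch the Scale class is a hypothetical layer invented for illustration (it multiplies its inputs by a constant), and the prev references are wired manually rather than through finalize:

```python
import numpy as np


# Input "layer", as defined in the article
class Layer_Input:
    def forward(self, inputs):
        self.output = inputs


class Scale:
    """Hypothetical toy layer: multiplies its inputs by a constant."""
    def __init__(self, factor):
        self.factor = factor

    def forward(self, inputs):
        self.output = inputs * self.factor


# Wire the chain by hand: input -> Scale(2) -> Scale(3)
input_layer = Layer_Input()
layers = [Scale(2.0), Scale(3.0)]
layers[0].prev = input_layer
layers[1].prev = layers[0]


def forward(X):
    # Store X as the input layer's output
    input_layer.forward(X)
    # Each layer consumes the output of the object before it
    for layer in layers:
        layer.forward(layer.prev.output)
    # After the loop, "layer" is the last object in the list
    return layer.output


X = np.array([[1.0], [2.0]])
output = forward(X)
print(output)  # each sample is multiplied by 2, then by 3
```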

That is a forward pass. Now let's add this forward call to the train method of the Model class:

import sys  # needed for the temporary sys.exit() call below

# Model class
class Model:
    ...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Temporary
            print(output)
            sys.exit()

The full Model class so far:

import sys  # needed for the temporary sys.exit() call in train

# Model class
class Model:
    def __init__(self):
        # Create a list of network objects
        self.layers = []
        
    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)
    
    # Set loss and optimizer
    def set(self, *, loss, optimizer):
        self.loss = loss
        self.optimizer = optimizer
    
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Temporary
            print(output)
            sys.exit()

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss

    # Performs forward pass
    def forward(self, X):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

Finally, we can add the finalize call to the main code (recall that, among other things, this method lets the model's layers know their previous and next layers).

# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )

# Finalize the model
model.finalize()

model.train(X, y, epochs=10000, print_every=100)

>>>
[[ 0.00000000e+00]
[-1.13209149e-08]
[-2.26418297e-08]
...
[-1.12869511e-05]
[-1.12982725e-05]
[-1.13095930e-05]]

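At this point, the Model class covers the model's forward pass. We still need to calculate loss and accuracy and perform backpropagation. Before that, we need to know which layers are "trainable", meaning they have weights and biases that we can adjust. To find out, we check whether a layer has a weights or biases attribute. We can do so with the following code: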

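            # If the layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])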

Here, $i$ is the index of a layer in the list of layers. We will add this code to the finalize method. The full code of that method so far:

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
            
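            # If the layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])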
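The hasattr check can be tried in isolation. In the sketch below, WithWeights and WithoutWeights are hypothetical stand-ins for a dense layer and an activation function, and a list comprehension replaces the indexed loop used in finalize:

```python
class WithWeights:
    """Hypothetical stand-in for a dense layer."""
    def __init__(self):
        self.weights = [[0.1, 0.2]]
        self.biases = [[0.0, 0.0]]


class WithoutWeights:
    """Hypothetical stand-in for an activation function."""


layers = [WithWeights(), WithoutWeights(), WithWeights(), WithoutWeights()]

# Keep only the objects that expose a "weights" attribute
trainable_layers = [layer for layer in layers if hasattr(layer, 'weights')]

print(len(trainable_layers))  # 2
```

Relying on an attribute check rather than a type check means any future layer class that defines weights is picked up without changes to the model code.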

Next, we will modify the common Loss class to contain the following:

# Common loss class
class Loss:
    ...
    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()   
        
    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

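The remember_trainable_layers method in the common Loss class "tells" the loss object which layers of the Model object are trainable. The calculate method has been modified to also return the value of self.regularization_loss() within the same call. The regularization_loss method currently requires a layer object, but with the self.trainable_layers attribute set by remember_trainable_layers, we can now iterate over all trainable layers to compute the regularization loss for the whole model rather than for one layer at a time: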

# Common loss class
class Loss:
    ...
    # Regularization loss calculation
    def regularization_loss(self):        
        # 0 by default
        regularization_loss = 0
        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:
            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
        return regularization_loss
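As a small worked example of the summation over trainable layers, the sketch below applies the same penalties to two made-up layers. FakeLayer is a hypothetical stand-in that exposes only the attributes the method reads, the logic is written as a standalone function instead of a Loss method, and all weights and factors are arbitrary illustration values:

```python
import numpy as np


class FakeLayer:
    """Hypothetical stand-in exposing only what regularization_loss reads."""
    def __init__(self, weights, biases, w_l1=0.0, w_l2=0.0, b_l1=0.0, b_l2=0.0):
        self.weights = np.array(weights, dtype=float)
        self.biases = np.array(biases, dtype=float)
        self.weight_regularizer_l1 = w_l1
        self.weight_regularizer_l2 = w_l2
        self.bias_regularizer_l1 = b_l1
        self.bias_regularizer_l2 = b_l2


def regularization_loss(trainable_layers):
    """Sum the L1/L2 penalties over every trainable layer."""
    total = 0.0
    for layer in trainable_layers:
        if layer.weight_regularizer_l1 > 0:
            total += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
        if layer.weight_regularizer_l2 > 0:
            total += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
        if layer.bias_regularizer_l1 > 0:
            total += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
        if layer.bias_regularizer_l2 > 0:
            total += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
    return total


layers = [
    # L1 on weights: 0.1 * (|1| + |-2|) = 0.3
    FakeLayer([[1.0, -2.0]], [[0.5]], w_l1=0.1),
    # L2 on weights: 0.01 * 3^2 = 0.09, L2 on biases: 0.5 * (-1)^2 = 0.5
    FakeLayer([[3.0]], [[-1.0]], w_l2=0.01, b_l2=0.5),
]

total = regularization_loss(layers)
print(round(total, 6))  # 0.3 + 0.09 + 0.5 = 0.89
```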

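To calculate accuracy, we need predictions. So far, predicting has required different code depending on the type of model. For a softmax classifier we use np.argmax(), but for regression the predictions are the outputs themselves, because the output layer uses a linear activation function. Ideally, we would have a prediction method that chooses the appropriate way of predicting for our model. To that end, we will add a predictions method to each activation function class: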

# Softmax activation
class Activation_Softmax:
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return np.argmax(outputs, axis=1)

# Sigmoid activation
class Activation_Sigmoid:
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return (outputs > 0.5) * 1

# Linear activation
class Activation_Linear:
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs

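All the calculations inside the predictions methods are the same as those performed for the corresponding models in previous chapters. Although we do not plan to use the ReLU activation function for an output layer, we include it here for completeness: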

# ReLU activation
class Activation_ReLU:  
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs
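To make the three prediction rules concrete, the sketch below applies the same expressions to small hand-written arrays. The arrays are made-up activation outputs, not results from a trained model:

```python
import numpy as np

# Softmax-style outputs: one row of class confidences per sample
softmax_out = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
# The prediction is the index of the largest value in each row
softmax_pred = np.argmax(softmax_out, axis=1)
print(softmax_pred)  # [0 2]

# Sigmoid-style outputs: each value is thresholded at 0.5 independently
sigmoid_out = np.array([[0.9, 0.2],
                        [0.4, 0.6]])
# Multiplying the boolean array by 1 converts True/False to 1/0
sigmoid_pred = (sigmoid_out > 0.5) * 1
print(sigmoid_pred)

# Linear outputs (regression): the prediction is the output itself
linear_out = np.array([[0.25], [-1.5]])
linear_pred = linear_out
print(linear_pred)
```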

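We still need to set a reference to the final layer's activation function in the Model object. Afterward we can call its predictions method, which will compute and return predictions from the outputs. We will set this reference in the finalize method of the Model class.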

# Model class
class Model:
    ...
    # Finalize the model
    def finalize(self):
        ...
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]

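Just as with the different prediction methods, we also need to calculate accuracy in different ways. We will implement this similarly to how the specific loss class objects are implemented: by creating specific accuracy classes and their objects and associating them with the model.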

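First, we write a common Accuracy class that for now contains a single method, calculate, which returns the accuracy computed from comparison results. We have already added a call to a self.compare method in the code, but this method does not exist yet. We will create it in other classes that inherit from Accuracy. For now it is enough to know that it returns a list of True and False values indicating whether each prediction matches the ground truth. We then calculate the mean of these values (True is treated as 1 and False as 0) and return it as the accuracy. The code: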

# Common accuracy class
class Accuracy:
    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):
        # Get comparison results
        comparisons = self.compare(predictions, y)
        # Calculate an accuracy
        accuracy = np