Dive into Deep Learning 7.7. Densely Connected Networks (DenseNet) - Notes & Exercises (PyTorch)

Course video for this section: none

Textbook for this section: 7.7. Densely Connected Networks (DenseNet) — Dive into Deep Learning 2.0.0 documentation (d2l.ai)

Source notebook for this section: ...>d2l-zh>pytorch>chapter_convolutional-modern>densenet.ipynb


Densely Connected Networks (DenseNet)

ResNet significantly changed the view of how to parametrize functions in deep networks. DenseNet (densely connected network) (1608.06993 (arxiv.org)) is to some extent a logical extension of ResNet. Let us first look at it from a mathematical perspective.

From ResNet to DenseNet

Recall the Taylor expansion of an arbitrary function. It decomposes the function into terms of increasingly higher order. Near x = 0,

f(x) = f(0) + f'(0) x + \frac{f''(0)}{2!} x^2 + \frac{f'''(0)}{3!} x^3 + \ldots.

Similarly, ResNet decomposes a function into

f(\mathbf{x}) = \mathbf{x} + g(\mathbf{x}).

That is, ResNet decomposes f into two parts: a simple linear term and a more complex nonlinear term. What if we wanted to go one step further and capture information from f in more than two parts? One solution is DenseNet.

As shown in Figure 7.7.1, the key difference between ResNet and DenseNet is that the DenseNet output is concatenated (denoted by [,] in the figure) rather than added as in ResNet. As a result, after applying an increasingly complex sequence of functions, we perform a mapping from \mathbf{x} to its expansion:

\mathbf{x} \to \left[ \mathbf{x}, f_1(\mathbf{x}), f_2([\mathbf{x}, f_1(\mathbf{x})]), f_3([\mathbf{x}, f_1(\mathbf{x}), f_2([\mathbf{x}, f_1(\mathbf{x})])]), \ldots\right].

In the end, all these functions are combined in an MLP to reduce the number of features again. The implementation is quite simple: rather than adding the terms, we concatenate them. The name DenseNet arises from the dense connections between variables: the last layer is densely connected to all the layers before it. The dense connections are shown in Figure 7.7.2.
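
The difference between adding and concatenating can be illustrated with a tiny sketch (an addition to these notes, not from the textbook):

import torch

X = torch.randn(1, 3, 8, 8)
G = torch.randn(1, 3, 8, 8)                # stand-in for g(X), same shape as X
resnet_out = X + G                         # ResNet: element-wise addition, shape (1, 3, 8, 8)
densenet_out = torch.cat([X, G], dim=1)    # DenseNet: channel concatenation, shape (1, 6, 8, 8)
print(resnet_out.shape, densenet_out.shape)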

A dense network mainly consists of two parts: dense blocks and transition layers. The former define how inputs and outputs are connected, while the latter control the number of channels so that it does not grow too large.

(Dense Blocks)

DenseNet uses the modified "batch normalization, activation, and convolution" structure of ResNet (see the exercise in Section 7.6). We first implement this structure.

import torch
from torch import nn
from d2l import torch as d2l


def conv_block(input_channels, num_channels):
    return nn.Sequential(
        nn.BatchNorm2d(input_channels), nn.ReLU(),
        nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1))

A dense block consists of multiple convolution blocks, each using the same number of output channels. In the forward propagation, however, we concatenate the input and output of each convolution block on the channel dimension.

class DenseBlock(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super(DenseBlock, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block(
                num_channels * i + input_channels, num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate the input and output of each block on the channel dimension
            X = torch.cat((X, Y), dim=1)
        return X

In the following example, we define a DenseBlock with 2 convolution blocks of 10 output channels each. When using an input with 3 channels, we get an output with 3 + 2 × 10 = 23 channels. The number of channels of the convolution blocks controls the growth of the number of output channels relative to the number of input channels; it is therefore also referred to as the growth rate.

blk = DenseBlock(2, 3, 10)
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape
torch.Size([4, 23, 8, 8])

[Transition Layers]

Since every dense block increases the number of channels, adding too many of them will lead to an overly complex model. A transition layer is used to control the complexity of the model. It reduces the number of channels with a 1×1 convolution layer, and halves the height and width with an average pooling layer of stride 2, further reducing the complexity of the model.

def transition_block(input_channels, num_channels):
    return nn.Sequential(
        nn.BatchNorm2d(input_channels), nn.ReLU(),
        nn.Conv2d(input_channels, num_channels, kernel_size=1),
        nn.AvgPool2d(kernel_size=2, stride=2))

We apply a transition layer with 10 output channels to the output of the dense block in the previous example. This reduces the number of output channels to 10, and halves the height and width.

blk = transition_block(23, 10)
blk(Y).shape
torch.Size([4, 10, 4, 4])

[DenseNet Model]

Let us construct a DenseNet model. DenseNet first uses the same single convolution layer and max pooling layer as ResNet.

b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

Then, similar to the 4 residual blocks used by ResNet, DenseNet uses 4 dense blocks. As with ResNet, we can set the number of convolution layers used in each dense block. Here we set it to 4, consistent with the ResNet-18 in Section 7.6. The number of channels of the convolution layers in each dense block (i.e., the growth rate) is set to 32, so every dense block adds 128 channels.

Between the modules, ResNet reduces the height and width with a residual block of stride 2, while DenseNet uses a transition layer that halves the height and width and also halves the number of channels.

# num_channels is the current number of channels
num_channels, growth_rate = 64, 32
num_convs_in_dense_blocks = [4, 4, 4, 4]
blks = []
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    blks.append(DenseBlock(num_convs, num_channels, growth_rate))
    # The number of output channels of the previous dense block
    num_channels += num_convs * growth_rate
    # Add a transition layer between dense blocks that halves the number of channels
    if i != len(num_convs_in_dense_blocks) - 1:
        blks.append(transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2

Similar to ResNet, a global pooling layer and a fully connected layer are connected at the end to produce the output.

net = nn.Sequential(
    b1, *blks,
    nn.BatchNorm2d(num_channels), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(num_channels, 10))
X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)

Output:
Sequential output shape: torch.Size([1, 64, 56, 56])
DenseBlock output shape: torch.Size([1, 192, 56, 56])
Sequential output shape: torch.Size([1, 96, 28, 28])
DenseBlock output shape: torch.Size([1, 224, 28, 28])
Sequential output shape: torch.Size([1, 112, 14, 14])
DenseBlock output shape: torch.Size([1, 240, 14, 14])
Sequential output shape: torch.Size([1, 120, 7, 7])
DenseBlock output shape: torch.Size([1, 248, 7, 7])
BatchNorm2d output shape: torch.Size([1, 248, 7, 7])
ReLU output shape: torch.Size([1, 248, 7, 7])
AdaptiveAvgPool2d output shape: torch.Size([1, 248, 1, 1])
Flatten output shape: torch.Size([1, 248])
Linear output shape: torch.Size([1, 10])

[Training the Model]

Since we are using a rather deep network here, in this section we reduce the input height and width from 224 to 96 to simplify the computation.

lr, num_epochs, batch_size = 0.1, 10, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())

Output:
loss 0.141, train acc 0.948, test acc 0.837
3424.4 examples/sec on cuda:0

Summary

  • In terms of cross-layer connections, unlike ResNet, where inputs and outputs are added together, DenseNet concatenates inputs and outputs on the channel dimension.
  • The main building blocks of DenseNet are dense blocks and transition layers.
  • We need to keep the dimensionality under control when composing the network by adding transition layers that reduce the number of channels again.

Exercises

  1. Why do we use average pooling rather than max pooling in the transition layer?

Answer:
Possible reasons:

  • Average pooling averages over all pixels in each window of the feature map, which helps preserve the overall information rather than only the maximum values, and gives smoother gradients and a more stable training process.
  • Max pooling may discard important information because it keeps only the maximum value of each region. In a DenseNet transition layer the incoming channels are concatenated features from all preceding layers, so averaging over all pixels reduces the information loss.
  • Average pooling also helps gradients flow through the network: in the backward pass it distributes the gradient over all inputs of a window, whereas max pooling routes the gradient only to the maximum element (see the small sketch below).

2. One of the advantages of DenseNet is that its model parameters are smaller than those of ResNet. Why is this the case?
Answer:
This is due to several design choices:

  • DenseNet concatenates the outputs of all preceding layers on the channel dimension to form the input of the next layer, so every layer receives the feature maps of all layers before it. The network can therefore reuse features instead of relearning the same features in every layer, which reduces the number of parameters.
  • Each convolution layer inside a dense block only produces a small, fixed number of new feature maps (the growth rate, e.g., 32), so every 3×3 convolution is narrow compared with the wide convolution layers used in ResNet.
  • DenseNet uses 1×1 convolutions and average pooling in its transition layers to shrink the number of channels and the spatial size, which reduces the input dimensions of subsequent layers and hence the number of parameters (a quick count is sketched below).
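
As a quick sanity check (an addition, not part of the original answer), the parameters of the DenseNet net defined above can be counted; the ResNet-18 from Section 7.6 can be counted the same way for comparison.

# Count the learnable parameters of the DenseNet built earlier in this section
num_params = sum(p.numel() for p in net.parameters())
print(f'DenseNet parameters: {num_params}')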

3. One problem for which DenseNet has been criticized is its excessive memory (GPU memory) consumption.

1) Is this really the case? Try changing the input shape to 224×224 and check the actual GPU memory consumption.

2) Is there another way to reduce the memory consumption? Would the framework need to be changed?

Answer:
1) DenseNet does indeed consume a lot of GPU memory. When the input shape is changed to 224×224 with a batch_size of 256, training raises an insufficient-GPU-memory error; the GPU memory involved is 8.08 GB. (The detailed memory calculation was given in a figure that is not reproduced here.)

2) The memory consumption can be reduced by shrinking the input size or the number of channels, which does not require changing the framework but may affect model performance; alternatively, the depth of the model can be reduced, which changes the network structure.
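
Another common way to cut the memory consumption (an addition to the answer above, in the spirit of the memory-efficient DenseNet follow-up work) is gradient checkpointing: the intermediate activations of each convolution block are recomputed during the backward pass instead of being stored. A minimal sketch, assuming the conv_block defined earlier in this section:

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class CheckpointedDenseBlock(nn.Module):
    """Like DenseBlock, but recomputes each conv block's activations in backward."""
    def __init__(self, num_convs, input_channels, num_channels):
        super().__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block(num_channels * i + input_channels, num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            # checkpoint() discards blk's intermediate activations in the forward
            # pass and recomputes them during backpropagation, trading compute for memory
            Y = checkpoint(blk, X, use_reentrant=False)
            X = torch.cat((X, Y), dim=1)
        return X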

4. Implement the different DenseNet versions presented in Table 1 of the DenseNet paper (Huang et al., 2017).
Answer:
Paper link: https://arxiv.org/pdf/1608.06993
Table 1 block configurations (growth rate k = 32): DenseNet-121 (6, 12, 24, 16), DenseNet-169 (6, 12, 32, 32), DenseNet-201 (6, 12, 48, 32), DenseNet-264 (6, 12, 64, 48).

The different DenseNet versions are implemented below; a compact builder covering all of them is sketched first.
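
The following helper is an addition to the original answer: it builds any of the Table 1 variants from its block configuration, assuming the DenseBlock and transition_block defined in this section. The expanded per-version code below is equivalent.

def make_densenet(num_convs_in_dense_blocks, growth_rate=32, num_classes=10):
    """Build a DenseNet for 1-channel input from a list of per-block conv counts."""
    b1 = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
        nn.BatchNorm2d(64), nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
    num_channels = 64
    blks = []
    for i, num_convs in enumerate(num_convs_in_dense_blocks):
        blks.append(DenseBlock(num_convs, num_channels, growth_rate))
        num_channels += num_convs * growth_rate
        # Halve the number of channels between dense blocks
        if i != len(num_convs_in_dense_blocks) - 1:
            blks.append(transition_block(num_channels, num_channels // 2))
            num_channels = num_channels // 2
    return nn.Sequential(
        b1, *blks,
        nn.BatchNorm2d(num_channels), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        nn.Linear(num_channels, num_classes))

# For example: net121 = make_densenet([6, 12, 24, 16])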

# DenseNet-121
def conv_block_v2(input_channels, num_channels):
    # Bottleneck layer from the paper: BN-ReLU-Conv(1x1) producing 4*k feature
    # maps, followed by BN-ReLU-Conv(3x3) producing k feature maps
    return nn.Sequential(
        nn.BatchNorm2d(input_channels), nn.ReLU(),
        nn.Conv2d(input_channels, 4 * num_channels, kernel_size=1),
        nn.BatchNorm2d(4 * num_channels), nn.ReLU(),
        nn.Conv2d(4 * num_channels, num_channels, kernel_size=3, padding=1))
class DenseBlock(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super(DenseBlock, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block_v2(
                num_channels * i + input_channels, num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate the input and output of each block on the channel dimension
            X = torch.cat((X, Y), dim=1)
        return X
# b1 unchanged
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
num_channels, growth_rate = 64, 32
# The number of convolution layers per dense block is changed
num_convs_in_dense_blocks = [6, 12, 24, 16]
blks = []
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    blks.append(DenseBlock(num_convs, num_channels, growth_rate))
    num_channels += num_convs * growth_rate
    if i != len(num_convs_in_dense_blocks) - 1:
        blks.append(transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2
net121 = nn.Sequential(
    b1, *blks,
    nn.BatchNorm2d(num_channels), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(num_channels, 10))

X = torch.rand(size=(1, 1, 224, 224))
for layer in net121:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)

Output:
Sequential output shape: torch.Size([1, 64, 56, 56])
DenseBlock output shape: torch.Size([1, 256, 56, 56])
Sequential output shape: torch.Size([1, 128, 28, 28])
DenseBlock output shape: torch.Size([1, 512, 28, 28])
Sequential output shape: torch.Size([1, 256, 14, 14])
DenseBlock output shape: torch.Size([1, 1024, 14, 14])
Sequential output shape: torch.Size([1, 512, 7, 7])
DenseBlock output shape: torch.Size([1, 1024, 7, 7])
BatchNorm2d output shape: torch.Size([1, 1024, 7, 7])
ReLU output shape: torch.Size([1, 1024, 7, 7])
AdaptiveAvgPool2d output shape: torch.Size([1, 1024, 1, 1])
Flatten output shape: torch.Size([1, 1024])
Linear output shape: torch.Size([1, 10])

# DenseNet-169
# b1 unchanged
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
num_channels, growth_rate = 64, 32
# The number of convolution layers per dense block is changed
num_convs_in_dense_blocks = [6, 12, 32, 32]
blks = []
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    blks.append(DenseBlock(num_convs, num_channels, growth_rate))
    num_channels += num_convs * growth_rate
    if i != len(num_convs_in_dense_blocks) - 1:
        blks.append(transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2
net169 = nn.Sequential(
    b1, *blks,
    nn.BatchNorm2d(num_channels), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(num_channels, 10))

X = torch.rand(size=(1, 1, 224, 224))
for layer in net169:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)

Output:
Sequential output shape: torch.Size([1, 64, 56, 56])
DenseBlock output shape: torch.Size([1, 256, 56, 56])
Sequential output shape: torch.Size([1, 128, 28, 28])
DenseBlock output shape: torch.Size([1, 512, 28, 28])
Sequential output shape: torch.Size([1, 256, 14, 14])
DenseBlock output shape: torch.Size([1, 1280, 14, 14])
Sequential output shape: torch.Size([1, 640, 7, 7])
DenseBlock output shape: torch.Size([1, 1664, 7, 7])
BatchNorm2d output shape: torch.Size([1, 1664, 7, 7])
ReLU output shape: torch.Size([1, 1664, 7, 7])
AdaptiveAvgPool2d output shape: torch.Size([1, 1664, 1, 1])
Flatten output shape: torch.Size([1, 1664])
Linear output shape: torch.Size([1, 10])

# DenseNet-201
# b1 unchanged
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
num_channels, growth_rate = 64, 32
# The number of convolution layers per dense block is changed
num_convs_in_dense_blocks = [6, 12, 48, 32]
blks = []
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    blks.append(DenseBlock(num_convs, num_channels, growth_rate))
    num_channels += num_convs * growth_rate
    if i != len(num_convs_in_dense_blocks) - 1:
        blks.append(transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2
net201 = nn.Sequential(
    b1, *blks,
    nn.BatchNorm2d(num_channels), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(num_channels, 10))

X = torch.rand(size=(1, 1, 224, 224))
for layer in net201:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)

Output:
Sequential output shape: torch.Size([1, 64, 56, 56])
DenseBlock output shape: torch.Size([1, 256, 56, 56])
Sequential output shape: torch.Size([1, 128, 28, 28])
DenseBlock output shape: torch.Size([1, 512, 28, 28])
Sequential output shape: torch.Size([1, 256, 14, 14])
DenseBlock output shape: torch.Size([1, 1792, 14, 14])
Sequential output shape: torch.Size([1, 896, 7, 7])
DenseBlock output shape: torch.Size([1, 1920, 7, 7])
BatchNorm2d output shape: torch.Size([1, 1920, 7, 7])
ReLU output shape: torch.Size([1, 1920, 7, 7])
AdaptiveAvgPool2d output shape: torch.Size([1, 1920, 1, 1])
Flatten output shape: torch.Size([1, 1920])
Linear output shape: torch.Size([1, 10])

# DenseNet-264
# b1 unchanged
b1 = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
num_channels, growth_rate = 64, 32
# The number of convolution layers per dense block is changed
num_convs_in_dense_blocks = [6, 12, 64, 48]
blks = []
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    blks.append(DenseBlock(num_convs, num_channels, growth_rate))
    num_channels += num_convs * growth_rate
    if i != len(num_convs_in_dense_blocks) - 1:
        blks.append(transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2
net264 = nn.Sequential(
    b1, *blks,
    nn.BatchNorm2d(num_channels), nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(num_channels, 10))

X = torch.rand(size=(1, 1, 224, 224))
for layer in net264:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)

Output:
Sequential output shape: torch.Size([1, 64, 56, 56])
DenseBlock output shape: torch.Size([1, 256, 56, 56])
Sequential output shape: torch.Size([1, 128, 28, 28])
DenseBlock output shape: torch.Size([1, 512, 28, 28])
Sequential output shape: torch.Size([1, 256, 14, 14])
DenseBlock output shape: torch.Size([1, 2304, 14, 14])
Sequential output shape: torch.Size([1, 1152, 7, 7])
DenseBlock output shape: torch.Size([1, 2688, 7, 7])
BatchNorm2d output shape: torch.Size([1, 2688, 7, 7])
ReLU output shape: torch.Size([1, 2688, 7, 7])
AdaptiveAvgPool2d output shape: torch.Size([1, 2688, 1, 1])
Flatten output shape: torch.Size([1, 2688])
Linear output shape: torch.Size([1, 10])

5. Design an MLP-based model by applying the DenseNet idea. Apply it to the house price prediction task in Section 4.10.

Answer:
With an MLP designed using the DenseNet idea and the same hyperparameters as in Section 4.10, the prediction submitted to Kaggle scores better: 0.15165 (compared with the original 0.16715).

import hashlib
import os
import tarfile
import zipfile
import requests
import numpy as np
import pandas as pd
import torch
from torch import nn
from d2l import torch as d2l
# Data preparation
def download(name, cache_dir=os.path.join('..', 'data')):  #@save
    """下载一个DATA_HUB中的文件,返回本地文件名"""
    assert name in DATA_HUB, f"{name} 不存在于 {DATA_HUB}"
    url, sha1_hash = DATA_HUB[name]
    os.makedirs(cache_dir, exist_ok=True)
    fname = os.path.join(cache_dir, url.split('/')[-1])
    if os.path.exists(fname):
        sha1 = hashlib.sha1()
        with open(fname, 'rb') as f:
            while True:
                data = f.read(1048576)
                if not data:
                    break
                sha1.update(data)
        if sha1.hexdigest() == sha1_hash:
            return fname  # Cache hit
    print(f'Downloading {fname} from {url}...')
    r = requests.get(url, stream=True, verify=True)
    with open(fname, 'wb') as f:
        f.write(r.content)
    return fname

DATA_HUB = dict()
DATA_URL = 'http://d2l-data.s3-accelerate.amazonaws.com/'
DATA_HUB['kaggle_house_train'] = (  #@save
    DATA_URL + 'kaggle_house_pred_train.csv',
    '585e9cc93e70b39160e7921475f9bcd7d31219ce')

DATA_HUB['kaggle_house_test'] = (  #@save
    DATA_URL + 'kaggle_house_pred_test.csv',
    'fa19780a7b011d9b009e8bff8e99922a8ee2eb90')

train_data = pd.read_csv(download('kaggle_house_train'))
test_data = pd.read_csv(download('kaggle_house_test'))
# Data preprocessing
all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))

numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
all_features[numeric_features] = all_features[numeric_features].apply(
    lambda x: (x - x.mean()) / (x.std()))
all_features[numeric_features] = all_features[numeric_features].fillna(0)

all_features = pd.get_dummies(all_features, dummy_na=True)
all_features = all_features * 1

n_train = train_data.shape[0]
train_features = torch.tensor(all_features[:n_train].values, dtype=torch.float32)
test_features = torch.tensor(all_features[n_train:].values, dtype=torch.float32)
train_labels = torch.tensor(
    train_data.SalePrice.values.reshape(-1, 1), dtype=torch.float32)
# Design an MLP following the DenseNet idea
def conv_block_1d(input_channels, num_channels):
    return nn.Sequential(
        nn.BatchNorm1d(input_channels), nn.ReLU(),
        nn.Conv1d(input_channels, num_channels, kernel_size=3, padding=1))

class DenseBlock_1d(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super(DenseBlock_1d, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block_1d(
                num_channels * i + input_channels, num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate the input and output of each block on the channel dimension
            X = torch.cat((X, Y), dim=1)
        return X

def transition_block_1d(input_channels, num_channels):
    return nn.Sequential(
        nn.BatchNorm1d(input_channels), nn.ReLU(),
        nn.Conv1d(input_channels, num_channels, kernel_size=1),
        nn.AvgPool1d(kernel_size=2, stride=2))

b1 = nn.Sequential(
    nn.Conv1d(1, 64, kernel_size=7, stride=2, padding=3),
    nn.BatchNorm1d(64), nn.ReLU(),
    nn.MaxPool1d(kernel_size=3, stride=2, padding=1))
num_channels, growth_rate = 64, 16
num_convs_in_dense_blocks = [4, 4, 4, 4]
blks = []
for i, num_convs in enumerate(num_convs_in_dense_blocks):
    blks.append(DenseBlock_1d(num_convs, num_channels, growth_rate))
    # The number of output channels of the previous dense block
    num_channels += num_convs * growth_rate
    # Add a transition layer between dense blocks that halves the number of channels
    if i != len(num_convs_in_dense_blocks) - 1:
        blks.append(transition_block_1d(num_channels, num_channels // 2))
        num_channels = num_channels // 2

def get_net():
    net = nn.Sequential(
        b1, *blks,
        nn.BatchNorm1d(num_channels), nn.ReLU(),
        nn.AdaptiveAvgPool1d((1)),
        nn.Flatten(),
        nn.Linear(num_channels, 1))
    return net
# Training
loss = nn.MSELoss()

def log_rmse(net, features, labels):
    # Reshape features to (batch_size, input_channels, width)
    features = features.unsqueeze(1)
    clipped_preds = torch.clamp(net(features), 1, float('inf'))
    rmse = torch.sqrt(loss(torch.log(clipped_preds),
                           torch.log(labels)))
    return rmse.item()

def train(net, train_features, train_labels, test_features, test_labels,
          num_epochs, learning_rate, weight_decay, batch_size):
    train_ls, test_ls = [], []
    train_iter = d2l.load_array((train_features, train_labels), batch_size)
    optimizer = torch.optim.Adam(net.parameters(),
                                 lr = learning_rate,
                                 weight_decay = weight_decay)
    for epoch in range(num_epochs):
        for X, y in train_iter:
            optimizer.zero_grad()
            # Reshape X to (batch_size, input_channels, width)
            X = X.unsqueeze(1)
            l = loss(net(X), y)
            l.backward()
            optimizer.step()
        train_ls.append(log_rmse(net, train_features, train_labels))
        if test_labels is not None:
            test_ls.append(log_rmse(net, test_features, test_labels))
    return train_ls, test_ls

def get_k_fold_data(k, i, X, y):
    assert k > 1
    fold_size = X.shape[0] // k
    X_train, y_train = None, None
    for j in range(k):
        idx = slice(j * fold_size, (j + 1) * fold_size)
        X_part, y_part = X[idx, :], y[idx]
        if j == i:
            X_valid, y_valid = X_part, y_part
        elif X_train is None:
            X_train, y_train = X_part, y_part
        else:
            X_train = torch.cat([X_train, X_part], 0)
            y_train = torch.cat([y_train, y_part], 0)
    return X_train, y_train, X_valid, y_valid

def k_fold(k, X_train, y_train, num_epochs, learning_rate, weight_decay,
           batch_size):
    train_l_sum, valid_l_sum = 0, 0
    for i in range(k):
        data = get_k_fold_data(k, i, X_train, y_train)
        net = get_net()
        train_ls, valid_ls = train(net, *data, num_epochs, learning_rate,
                                   weight_decay, batch_size)
        train_l_sum += train_ls[-1]
        valid_l_sum += valid_ls[-1]
        if i == 0:
            d2l.plot(list(range(1, num_epochs + 1)), [train_ls, valid_ls],
                     xlabel='epoch', ylabel='rmse', xlim=[1, num_epochs],
                     legend=['train', 'valid'], yscale='log')
        print(f'fold {i + 1}, train log rmse {float(train_ls[-1]):f}, '
              f'valid log rmse {float(valid_ls[-1]):f}')
    return train_l_sum / k, valid_l_sum / k

k, num_epochs, lr, weight_decay, batch_size = 5, 100, 5, 0, 64
train_l, valid_l = k_fold(k, train_features, train_labels, num_epochs, lr,
                          weight_decay, batch_size)
print(f'{k}-fold validation: avg train log rmse: {float(train_l):f}, '
      f'avg valid log rmse: {float(valid_l):f}')

Output:
fold 1, train log rmse 0.075542, valid log rmse 0.147438
fold 2, train log rmse 0.050742, valid log rmse 0.137624
fold 3, train log rmse 0.046990, valid log rmse 0.124749
fold 4, train log rmse 0.040194, valid log rmse 0.110081
fold 5, train log rmse 0.024984, valid log rmse 0.085519
5-fold validation: avg train log rmse: 0.047690, avg valid log rmse: 0.121082

# Prediction
def train_and_pred(train_features, test_features, train_labels, test_data,
                   num_epochs, lr, weight_decay, batch_size):
    net = get_net()
    train_ls, _ = train(net, train_features, train_labels, None, None,
                        num_epochs, lr, weight_decay, batch_size)
    d2l.plot(np.arange(1, num_epochs + 1), [train_ls], xlabel='epoch',
             ylabel='log rmse', xlim=[1, num_epochs], yscale='log')
    print(f'train log rmse: {float(train_ls[-1]):f}')
    # Apply the network to the test set.
    # Reshape test_features to (batch_size, input_channels, width)
    test_features = test_features.unsqueeze(1)
    preds = net(test_features).detach().numpy()
    # Reformat it for export to Kaggle
    test_data['SalePrice'] = pd.Series(preds.reshape(1, -1)[0])
    submission = pd.concat([test_data['Id'], test_data['SalePrice']], axis=1)
    submission.to_csv('submission.csv', index=False)

train_and_pred(train_features, test_features, train_labels, test_data,
               num_epochs, lr, weight_decay, batch_size)

Output:
train log rmse: 0.044846
